A little over a month ago, I wrote an article about the alleged workers’ rights abuses perpetrated by Remotasks, a subsidiary of Scale AI, a 14-billion-dollar company that packages training data for some of the most prominent AI companies in the world. Its army of data-labelers, many of whom hail from developing countries without adequate economic bargaining power or work infrastructure, has been instrumental to the development of modern-day LLMs such as ChatGPT and Gemini. Yet these workers have allegedly received less-than-adequate protection from harmful business practices, including exposure to traumatic material, suspended or missing payments, and a lack of communication on the platforms they work on. Now under investigation for compliance with the Fair Labor Standards Act, Scale AI’s practices have added a new facet to an already challenging discussion about how AI might revolutionize not only the efficiency of future workers but also the fields they will work in.

It All Comes Down to Data

Historically, AI has been trained on natural data, that is, data created by humans through classification or recording in various forms (e.g., the article you’re reading right now is natural data, since I (TY) wrote it and I am very much a real human). Yet natural data is very expensive to create, especially since the rise of huge modern LLMs with trillions of parameters has opened a virtual “black hole” for high-quality data: there can never be enough, and the more the merrier. As models demand more and more data, a problem arises: the big companies leading the charge in AI development are starting to run out of it. Thus, many of them have turned toward a new source: synthetic data. Many consider synthetic data high-quality, but there’s a catch: it is created by AIs or algorithms instead of humans (e.g., a picture of a cat generated by Midjourney would be synthetic data). Compared to its natural counterpart, synthetic data is inexpensive and efficient to produce, and many companies have accepted it as part of their training pipelines.

So what does this mean for workers in data labeling, such as those employed by firms like Scale AI? Well, it’s complicated. Synthetic data will almost certainly enable cheaper, domain-specific models that require less processing power to run. In fact, synthetic data plays an important role in model distillation and model compression, fascinating ways to improve the efficiency of current AI models that we will cover in the future. However, researchers have hypothesized that natural data is indispensable to the advancement of top-of-the-line AI models. A recent study published in Nature highlights a phenomenon known as catastrophic model collapse: when a model is trained on the outputs of a previous model, the resulting model starts to lose information the previous one had; after a few more such iterations, the model’s responses barely resemble those of the original. In the future, a huge data-labeling industry may still be required to make our existing AI models better.
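To make the collapse mechanism concrete, here is a minimal toy sketch in Python (my own illustration, not code from the Nature study): each “generation” fits a simple Gaussian model to data sampled entirely from the previous generation’s model, and the spread of the distribution tends to shrink until the tails of the original data are gone.

```python
import numpy as np

rng = np.random.default_rng(42)

# Generation 0: a small "natural" dataset from the true distribution.
data = rng.normal(loc=0.0, scale=1.0, size=50)

for generation in range(1, 301):
    # "Train" a model on the current data: here, just fit a Gaussian.
    mu, sigma = data.mean(), data.std()
    # The next generation sees only this model's synthetic outputs.
    data = rng.normal(loc=mu, scale=sigma, size=50)
    if generation % 50 == 0:
        print(f"generation {generation:3d}: sigma = {sigma:.4f}")

# Each refit slightly underestimates the spread, and the estimation
# noise compounds across generations, so sigma tends to drift toward
# zero: the tails of the original distribution vanish, a toy version
# of the collapse reported for recursively trained generative models.
```

Real models are vastly more complex than a two-parameter Gaussian, but the underlying dynamic is the same: information lost in one generation cannot be recovered by training on that generation’s outputs.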

What’s Wrong?

As mentioned in our previous article, a lot could go wrong with a massive, asynchronous workforce labeling data around the clock. Data corporations such as Scale AI tend to classify their workers as contractors instead of employees. This classification, probably meant to weaken workers’ ties to the companies they serve, reflects the informal nature of the industries that have formed to prop up AI development: while some workers treat the job as a side gig, others depend on it for their livelihood. To support the growth of frontier AI models, the industry may require tighter definitions of what data-labeling firms should and should not do with regard to their laborers. Consider the following example.

Efficiency and Psychology

In existing industries running the gamut from fast food to business logistics, there has been extensive debate about whether future AI systems (robotics and autonomous programs) will be a satisfactory replacement for human workers in various tasks. Although the general capability of AI will likely surpass that of human workers in the future, these debates are more nuanced than a simple matter of ability. Suppose you are consulting a doctor about a disease you have. While explaining what the ailment means in layman’s terms, the doctor reassures you that everything will be fine and that patients like you tend to recover quickly. The explanation makes you feel a lot less anxious, and may even bolster your will to recover.

Psychology is powerful, even today. When you tip a waitress, you don’t just pay for the actual service of delivering the food; you pay for the warm greeting, the pleasant tone of voice, the suggestions for an order, and then the delivery of the food. We may not think much about how big a role these subtle touches play in our demand for a service, but the occasional dissatisfaction with the imperfections of humanity pales in comparison with a hypothetical future of absolute autonomy. Imagine a program delivering the news that you will likely die within a few years, attempting to be calming with its words rather than its emotions. A human, in contrast, softens the blow simply by being human. An AI may do just as well, and most likely better, on benchmark tests, but it cannot replace all the tiny nuances that make us human.

The Price is Right

So what can AI help with? The answer is simple: humans will continue to do what they do best, interacting with and empathizing with other humans. This applies not only to services whose primary appeal is human contact (e.g., therapy, coaching, acting), but also to services where human contact is a byproduct (e.g., cooking, gardening). Although the latter will inevitably be mostly replaced by AI, I surmise that, with the widespread adoption of AI, various “classes” of goods and services will emerge whose price and perceived quality rise with the degree of human, rather than AI, involvement in their production.

Let me explain. You can buy a wooden chair at IKEA for a very low price, since the furniture company produces these chairs in bulk at a very low cost per unit. However, if you were to buy a one-of-a-kind artisan chair, you would have to pay a lot more, since the chair takes more time and energy to produce. We can apply the same principle to generative AI and intellectual property: a person who wants a custom painting will be able to either use a free AI tool to create it or commission a human artist. The difference in price will probably be substantial, since AI is (quantitatively speaking) far more efficient at producing goods than humans are, to the point where AI-generated art becomes far more prevalent than human-created art. That relative scarcity of human work may let human-made products sell at a premium, and the premium, if perceived as such, may even reinforce the idea that the human product is of higher quality, simply because it is more expensive.

So what about products that are not created entirely by AI? Today’s chatbots generate different responses depending on the detail and accuracy of the prompts they receive, and an AI like ChatGPT can perform drastically differently when used with skill. Depending on the autonomy and contextual perception of future AI, a market may arise for human professionals who specialize in applying AI to a specific field, such as short-form video or writing, maximizing the quality of AI output through a degree of human intervention. Again, it all depends on how capable future AI systems become. We could see many such people using AI as a tool to bolster their work, or a relative lack of these skilled AI workers as the populace turns to sufficiently capable generative AI to do the work for them. A toy sketch of what this specialization might look like follows below.
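Here is a minimal, hypothetical sketch in Python (the function, template, and example values are all invented for illustration): a would-be “AI specialist” encodes domain knowledge into a reusable, constraint-rich prompt rather than a vague one-line request.

```python
# A minimal, hypothetical sketch of a "prompt specialist" workflow:
# domain expertise is encoded as a reusable template rather than a
# vague one-off request. All names and template text are invented.

VAGUE_PROMPT = "Write a short video script about our product."

def specialist_prompt(product: str, audience: str, tone: str, seconds: int) -> str:
    """Build a detailed, constraint-rich prompt for a short-video script."""
    return (
        f"Write a {seconds}-second short-video script for {product}.\n"
        f"Audience: {audience}. Tone: {tone}.\n"
        "Structure: a 3-second hook, two concrete benefits, "
        "and a one-line call to action.\n"
        "Keep every line under 12 words so it fits on-screen captions."
    )

print(specialist_prompt("a reusable water bottle", "college students", "playful", 30))
```

The point is not the code itself but the asymmetry it illustrates: the same underlying model, given the vague prompt versus the specialist’s prompt, will produce very different work.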

This article is brought to you by Our AI, a student-founded and student-led AI Ethics organization seeking to diversify perspectives in AI beyond what is typically discussed in modern media. If you enjoyed this article, please check out our monthly publications and exclusive articles at https://www.our-ai.org/ai-nexus/read!

Oddly Darwinian

So, back to the age-old question: will AI replace jobs? Current research is deeply divided on the topic: some studies argue that AI will create more jobs than it eliminates, while others contend that it will not. In my opinion, AI will certainly create new industries (the data-labeling industry is one example), but it will also require many workers, especially those in intellectual-labor jobs, to start adapting to the use of AI.

How can workers become more prepared? As a start, they can learn about developments in AI, not just the facts on the surface but also the deeper implications of these advancements. For example, as computing centers grow larger and larger, they become enormous power consumers and may ultimately become hubs of energy production and distribution, which may in turn drive growth in the energy industry. Furthermore, it is not a bad time to start learning how to use AI effectively, whether by instructing it to write in a different tone or by practicing with trivial tasks (like writing an email or drafting some goals for work).
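As one hedged example of what that practice could look like, here is a minimal sketch using OpenAI’s Python SDK (the model name and prompt text are placeholders; any comparable chat API would work the same way): a system message sets the tone, and the user message carries the routine task.

```python
# A minimal sketch of using a chat API for a trivial task: drafting an
# email in a specific tone. Requires the `openai` package (v1+) and an
# OPENAI_API_KEY in the environment; the model name is a placeholder.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder; substitute any chat model
    messages=[
        # The system message steers tone and style.
        {"role": "system",
         "content": "You are a concise assistant. Write in a warm, professional tone."},
        # The user message carries the actual task.
        {"role": "user",
         "content": "Draft a short email asking my team to share their Q3 goals by Friday."},
    ],
)

print(response.choices[0].message.content)
```

Changing a single line, the system message, is often enough to shift the output from formal to casual, which is exactly the kind of low-stakes experimentation worth doing now.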

Although it’s vital that humans start getting used to working with AI, it’s equally important to remember that our humanity, by itself, makes us valuable in our own way.


Written by Thomas Yin