Young people from small towns who label large AI models

2026/04/07 15:09

Author: Sleepy.md

Datong, Shanxi, a city that once relied on coal for half of its economy, has shaken off its coal dust, picked up a sharp pickaxe, and is now swinging it at another mine, an invisible one.

In the office buildings of the Jinmao International Center in Pingcheng District, there are no mine hoists and no coal trucks.

In their place are thousands of tightly packed computer workstations. The Shanghai Runxun Cloud Valley Big Data Smart Service Base occupies several floors, where thousands of young employees wearing headphones stare at screens, clicking, dragging, and selecting.

According to official data, as of November 2025, Datong had 745,000 servers in operation, had attracted 69 data labeling companies, created over 30,000 local jobs, and generated 750 million yuan in output value. In this digital mine, 94% of the workers are local residents.

It's not just Datong. Among the first batch of data labeling bases designated by the National Data Administration, places in central and western China, such as Yonghe County in Shanxi, Bijie in Guizhou, and Mengzi in Yunnan, feature prominently. At the Yonghe County data labeling base, 80% of the employees are women. Most of them are stay-at-home mothers from rural areas, or young people who returned to their hometowns because they couldn't find suitable jobs.

A hundred years ago, the textile factories of Manchester, England, were crowded with farmers who had lost their land. Today, in front of computer screens in these remote county towns, sit young people who can't find their place in the real economy. They are engaged in a futuristic yet extremely primitive kind of piecework, producing the data feed that large models need, on behalf of artificial intelligence giants in Beijing, Shenzhen, and Silicon Valley.

No one sees anything wrong with this.

The essence of data annotation is teaching machines to recognize the world. Autonomous driving requires recognizing traffic lights and pedestrians, while large models need to distinguish between cats and dogs. A machine has no common sense of its own; a human must first draw a box on an image to tell it "this is a pedestrian," and only after absorbing millions of such images can it learn to recognize things by itself.
This job doesn't require a high level of education, only patience and an index finger that never stops clicking. In the golden age of 2017, a simple 2D box could pay over ten cents, with some companies offering as much as fifty. Fast annotators who worked over ten hours a day could earn five or six hundred yuan. In a county town, this was a well-paid and respectable job.

However, as large models evolved, the harsh reality of this production line began to emerge. By 2023, the unit price for simple image annotation had plummeted to 3 to 4 cents, a drop of over 90% from the top rates. Even the far more complex 3D point cloud work paid only 5 cents per box. A point cloud map is an image composed of dense points that must be magnified again and again before edges become visible, and annotators have to fit a 3D bounding box, with length, width, height, and tilt angle, so that it perfectly encloses each vehicle or pedestrian.

The direct consequence of the plummeting unit price is a dramatic increase in labor intensity. To hold on to their meager monthly base salary of two to three thousand yuan, annotators must keep pushing their speed ever higher.

This is by no means an easy white-collar job. In many annotation bases, management is suffocatingly strict: phone calls are not allowed during work hours, and mobile phones must be locked in storage compartments. The system precisely records each employee's mouse movements and dwell time. If a pause exceeds three minutes, a warning from the backend lashes out like a whip.

Even more frustrating is the accuracy requirement. The industry standard is typically above 95%, with some companies requiring 98% to 99%. Under the strictest standard, if you draw 100 boxes and just two are wrong, the entire batch is sent back for revision. Dynamic scenes are annotated frame by frame; vehicles changing lanes can be occluded, and annotators must infer from context where they are and pick them out one by one.
In 3D point cloud maps, any object made up of more than 10 points must be boxed. In a complex parking space project, quality control will always find fault with lines drawn too long or markings that were missed. Revising a frame four or five times is commonplace. In the end, an hour of work earns only a few cents.

A data annotator in Hunan posted her settlement statement on social media. In one day of work she drew over 700 boxes at 4 cents each, for a total income of 30.2 yuan.

The contrast is stark. On one side, glamorous tech leaders at press conferences discuss how AGI will liberate humanity; on the other, in county towns on the Loess Plateau and in the mountains of Southwest China, young people stare at screens for eight to ten hours a day, mechanically drawing thousands, even tens of thousands, of boxes, and tracing lane lines in the air even in their dreams.

Someone once said that artificial intelligence looks like a luxury car speeding by, but open the door and you'll find a hundred people on bicycles inside, gritting their teeth and pedaling furiously.

No one sees anything wrong with this.

The pieceworkers teaching machines "how to love"

Once the bottleneck in image recognition was broken, large models underwent a deeper evolution: they needed to learn to think, converse, and even show "empathy" like humans.

This gave rise to the most crucial and expensive part of large model training: RLHF (Reinforcement Learning from Human Feedback). In simple terms, real people rate AI-generated answers, telling the model which answers are better and more in line with human values and emotional preferences.

ChatGPT appears "human-like" because countless RLHF annotators have trained it. On crowdsourcing platforms, these annotation tasks are often openly priced: 3 to 7 yuan per task.
Annotators must give highly subjective emotional scores to the AI's answers, judging whether an answer is "warm," "empathetic," and "considerate of the user's emotions."

A low-level worker earning two or three thousand yuan a month, struggling in the mire of reality with no time to tend to their own emotions, is made to act inside the system as the AI's emotional mentor and judge of values.

They must forcibly break down extremely complex and subtle human feelings such as warmth and empathy into cold scores from 1 to 5. If their scores don't match the system's standard answers, they are judged to have a low accuracy rate, and money is deducted from their already meager piece-rate wages.

This is a kind of cognitive hollowing-out. Humanity's complex and subtle emotions, morality, and compassion are dragged into the funnel of algorithms and, on the cold scale of quantification and standardization, drained of their last bit of warmth.

While you marvel at the cyber behemoths on your screen that have learned to write poetry and compose music, to offer comfort, and even to put on a melancholic air, on the other side of the screen once-vibrant humans are degenerating, through day after day of mechanical judgment, into emotionless scoring machines.

This is the most hidden side of the entire industry chain, one that never appears in any financing news or technology white paper.

No one sees anything wrong with this.

985 Master's Degree and Small-Town Youth

The basic work of drawing boxes is being crushed under AI's conveyor belt. And this cyber assembly line is beginning to spread upward, devouring higher-level intellectual labor.

The appetite of large models has changed. They are no longer satisfied with chewing on simple common sense; they need to devour human professional knowledge and high-level logic.

A special kind of part-time job now appears frequently on major recruitment platforms, under titles such as "Large Model Logical Reasoning Labeling" and "AI Humanities Trainer." The bar for these jobs is extremely high: postings often require "a master's degree or above from a 985/211 university" (China's designated elite universities) and cover professional fields such as law, medicine, philosophy, and literature.

Many graduate students from prestigious universities are drawn into the outsourcing groups of these large companies. They quickly discover, however, that this is no easy mental workout but a form of mental torture.

Before officially accepting orders, they must read dozens of pages of documents on scoring dimensions and evaluation criteria and complete two to three rounds of trial annotation. Even after passing, if their accuracy rate falls below average during official annotation, they are disqualified and kicked out of the group chat.

What's most suffocating is that these standards aren't fixed at all. Scoring similar questions and answers with the same reasoning can yield completely opposite verdicts. It's like taking an exam that never ends and has no answer key. Accuracy cannot be improved through effort or study; you can only keep going in circles, burning through mental and physical energy.

This is the new form of exploitation in the era of large models: class folding.

Knowledge, once seen as the golden ladder for breaking through barriers and climbing upward, has become one more kind of digital fodder offered to algorithms, only more complex to chew.

Faced with the absolute power of algorithms and systems, master's students from the ivory towers of top universities and small-town youths from the Loess Plateau arrive at the most bizarre convergence. They have all fallen into this bottomless cyber mine, stripped of their halos, their differences erased, all reduced to cheap, replaceable gears on the assembly line.

The same is true abroad. In 2024, Apple cut an entire 121-person AI voice annotation team in San Diego. These employees were responsible for improving Siri's multilingual processing capabilities; they once thought they stood at the edge of a major company's core business, only to be plunged overnight into the abyss of unemployment.
In the eyes of tech giants, whether it's a small-town auntie drawing boxes or a logic trainer with a degree from a prestigious university, they are all essentially replaceable "consumables."

No one sees anything wrong with this.

The Tower of Babel, worth trillions, is built with pennies of blood and sweat.

According to data released by the China Academy of Information and Communications Technology (CAICT), the Chinese data labeling market reached 6.08 billion yuan in 2023 and is projected to reach 20-30 billion yuan in 2025. It is predicted that by 2030, the global data labeling and service market will surge to 117.1 billion yuan.

Behind these figures lies the valuation frenzy surrounding tech giants like OpenAI, Microsoft, and ByteDance, valued at hundreds of billions or even trillions of dollars.

However, this immense wealth has not flowed to those who actually "feed" AI.

China's data annotation industry exhibits a typical inverted pyramid outsourcing structure. At the top are the tech giants who tightly control the core algorithms; the second layer consists of large data service providers; the third layer comprises data annotation bases and small-to-medium-sized outsourcing companies scattered across the country; and at the bottom are the annotation workers, paid by the piece for labor-intensive work.

Each layer of outsourcing takes a cut. When a large company pays 50 cents per unit, after layer upon layer of skimming, the annotation worker in a county town might receive less than 5 cents.

Former Greek Finance Minister Yanis Varoufakis, in his book *Technofeudalism*, put forward a highly insightful view: today's tech giants are no longer capitalists in the traditional sense, but "cloud lords." What they own is not factories and machines but algorithms, platforms, and computing power, the digital territory of the cyber age.

In this new feudal system, users are not consumers, but digital tenants.
Every like, comment, and view we make on social media hands data to the cloud lords for free. And the data labelers scattered across lower-tier markets are the lowest-level digital serfs in this system. They not only produce data but also clean, classify, and score massive amounts of raw data, turning it into high-quality feed that large models can digest.

This is a covert cognitive land grab. Just as the Enclosure Movement in 19th-century England drove farmers into textile factories, today's AI wave is driving young people who cannot find their place in the real economy to the screen.

AI hasn't bridged the class divide; instead, it has built a "blood-and-sweat data conveyor belt" running from county towns in central and western China to the headquarters of tech giants in Beijing, Shanghai, Guangzhou, and Shenzhen.

The narrative of technological revolution is always grand and glamorous, but underneath it always lies the large-scale consumption of cheap labor.

No one sees anything wrong with this.

The cruelest ending is approaching, and faster than ever.

With the leap in the capabilities of large models, labeling tasks that once required round-the-clock human labor are being taken over by AI itself.

In April 2023, Li Xiang, founder of Li Auto, shared figures at a forum: Li Auto used to manually annotate approximately 10 million frames of autonomous driving images a year, at an outsourcing cost approaching 100 million yuan. After switching to large models for automated annotation, what used to take a year could be completed in approximately 3 hours.

This efficiency is 1,000 times that of humans, and this was back in 2023. Just last March, Li Auto also released its new generation MindVLA-o1 automatic annotation engine.

A self-deprecating saying has long circulated in the industry: "The more intelligence, the more manual labor."
But now, major companies' spending on data annotation outsourcing has dropped precipitously, by 40% to 50%.

Those young people from small towns who spent countless days and nights in front of their computers, their eyes bloodshot, fed a behemoth with their own hands. And now this behemoth is turning around and smashing their rice bowls.

Night falls, and the office buildings in Datong's Pingcheng District remain as bright as day. Young people changing shifts pass one another silently in the elevators, one weary body replacing another. In this folded space tightly bound by countless polygonal frames, no one cares what epic leap the Transformer architecture has made across the ocean, nor does anyone understand the roar of computing power behind hundreds of billions of parameters.

Their gaze is fixed solely on the red and green progress bar in the backend that represents the "pass rate," as they calculate whether those few cents or dimes of piecework can add up to a decent life by the end of the month.

On one side, the Nasdaq bell rings and tech media report relentlessly on the arrival of AGI; on the other, these digital serfs, who fed AI with their own flesh and blood, can only wait anxiously in their aching sleep for the beast they personally nurtured to carelessly kick their rice bowls away on some seemingly ordinary morning.

No one sees anything wrong with this.

