Appen declined to give an attributable comment.
“If we suspect that a user has violated the terms of service, Toloka will carry out an identity check and request an ID photo as well as a photo of the user holding the ID,” explains Geo Dzhikaev, who is responsible for Toloka operations.
Driven by a worldwide rush toward AI, the global data labeling and collection industry is expected to exceed $17.1 billion by 2030, according to Grand View Research, a market research and consulting firm. Crowdsourcing platforms such as Toloka, Appen, Clickworker, Teemwork.AI and OneForma connect millions of remote workers in the Global South to tech companies located in Silicon Valley. The platforms publish microtasks from their technology clients, which include Amazon, Microsoft Azure, Salesforce, Google, Nvidia, Boeing and Adobe. Many platforms also partner with Microsoft’s own data services platform, the Universal Human Relevance System (UHRS).
These workers, who label, evaluate and generate data, are primarily based in East Africa, Venezuela, Pakistan, India and the Philippines, and some even work from refugee camps. They are paid by the piece, with pay ranging from a penny to a few dollars per task, although the high end is considered a rare find, workers say. “The nature of the work often feels like digital servitude, but it’s a necessity to make a living,” says Hassan, who now also works for Clickworker and Appen.
Sometimes workers are asked to upload audio, images and videos, which contribute to the datasets used to train AI. Workers usually do not know how their submissions will be handled, yet these can be quite personal. On Clickworker’s worker jobs tab, one task reads: “Show us your baby/child! Help teach AI by taking 5 photos of your baby/child!” for €2 ($2.15). The next one says: “Let your minor (aged 13-17) participate in an interesting selfie project!”
Some tasks involve content moderation, helping AI distinguish between innocent content and content that contains violence, hate speech or adult imagery. Hassan shared screen recordings of the tasks available on the day he spoke with WIRED. One UHRS task asked him to identify “whore”, “c**t”, “dick” and “slut” in a body of text. For Toloka, he was shown pages and pages of partially naked bodies, including sexualized images, lingerie advertisements, a sculpture on display, and even a nude from a Renaissance-style painting. The task? Separate the adult from the benign, to help the algorithm distinguish between salacious and permissible torsos.