AI Content Moderation Struggles to Cover Africa’s 2,000 Languages

Nairobi: Bereket Tsegay spent his days watching videos he did not understand. He was hired to moderate content on TikTok at the company's Kenya hub, one of the main centers for AI-assisted content review across Africa. He spoke Amharic, Ethiopia's official language, but the videos in his queue came from across the continent in languages like Luo, Dholuo, Kikuyu, Dinka, and dozens more. When nothing in the visuals looked obviously wrong, and nobody had reported the video, he usually left it up. When it had been reported many times, he took it down. He has since left the job, and he is candid about what he saw: the system was doing its best with almost no real understanding of the content it was judging.

According to Global Voices, Bereket's account is one snapshot of a much wider problem. Africa has more than 2,000 languages, yet AI systems that moderate content across the continent were built primarily on English-language data, with some coverage of a handful of global languages. A 2025 study, "The State of Large Language Models for African Languages," found that only 42 African languages appear in any meaningful way across the systems reviewed. Just four languages (Amharic, Swahili, Afrikaans, and Malagasy) are handled with any degree of consistency. That leaves more than 98 percent of Africa's languages essentially invisible to the moderation systems that decide what stays up and what gets removed.

The consequences fall on real people. Jackson Busolo, a Kenyan TikTok creator who posts in Swahili, mostly about politics, experienced this firsthand when his account was inexplicably removed and later restored without explanation. According to TikTok's Q1 2025 Community Guidelines Enforcement data, as reported by Business Daily Africa, between January and March 2025, TikTok removed more than 450,000 videos from Kenya alone and banned over 43,000 accounts. By the second quarter, removals had climbed to 592,000.
The platform attributes most of this to automated systems but declined to specify which African languages its AI moderation tools cover.

When a moderation system cannot process a language, it is less likely to flag content for human review. It relies on indirect signals such as user reports, visual cues, or audio patterns from languages it does recognize. This leads to both false positives and false negatives, with content sometimes removed unjustly or harmful content in unrecognized languages left unchecked. Mercy Mutemi, executive director of the Oversight Lab, highlighted the issue of an English-trained algorithm tasked with moderating content in a multilingual environment.

The problem extends to journalists and civil society, where disinformation in African languages can gain more traction due to slower moderation responses. Fact-checkers have had to manually track posts in languages like Amharic during political tensions, performing work that should have been managed by platform systems.

Efforts to address this gap are underway but are scattered and under-resourced. Research groups like AfricaNLP are working on multilingual datasets and models for African languages. The 2025 AfricaNLP workshop included projects on hate speech detection and news classification in low-resource languages. Some commercial efforts are also emerging, with companies like Cohere partnering with initiatives to integrate African language datasets into their models. The African Union's Continental AI Strategy and national AI strategies, such as Nigeria's, emphasize the importance of linguistic diversity. However, strategy documents alone do not close the gap between current capabilities and the needs of Africa's diverse languages.

The language gap in AI content moderation is a known issue with known causes, primarily the economics of AI system development favoring languages with abundant digital text.
Regulatory pressures from the EU AI Act and the Digital Services Act, which enforce non-discrimination and transparency in AI systems, may compel platforms to address this issue. Africa's rapidly growing social media user base further underscores the need for systems that truly work for the continent's diverse linguistic landscape. Addressing this gap requires recognizing it as a problem rather than an acceptable trade-off.
