
Co-creating how we evaluate AI in health contexts

We provide large-scale LLM testing and high-quality datasets that ensure your models reflect African realities while empowering local communities.
Collaborating with






Standard crowdsourcing fails in Africa. We combine deep offline reach with academic rigor to provide ground truth data.
We don't just sample internet users. We deploy physical surveyors to capture data from remote, offline communities, ensuring a truly representative dataset.
Beyond basic labeling. We recruit participants with specific demographic profiles and deploy domain experts (doctors, bankers) to validate high-stakes data.
Our methodology is designed to capture nuance. We align models with local traditions, social norms, and linguistic subtleties that outsiders miss.
Tailored for Generative AI evaluation and high-stakes sectors. Our expert-led services ensure your AI speaks the local language, both culturally and linguistically.
Auditing LLMs to detect locally grounded and harmful biases across African demographics and ethnic groups. We go beyond standard benchmarks.

Real-time human feedback (Text or Voice) to rate model responses for safety, helpfulness, and cultural accuracy in live scenarios.

High-quality, expert-curated Q&A datasets in specific domains (Health, Finance) reflecting local realities for authentic fine-tuning.

Diverse voice data capturing varied accents, dialects, and real-world background noise for robust speech recognition evaluation.

We specialize in high-stakes domains where "good enough" translation isn't safe. We ensure accuracy in critical fields like healthcare and finance.
Evaluating medical advice for local relevance, terminology, and safety guidelines. We verify that AI health assistants provide accurate, culturally safe information for expecting mothers and rural communities.
Reviewing financial literacy content and support bots for unbanked populations. We adapt complex financial terms (interest rates, loans, mobile money) to local languages and mental models.
Identifying western bias in history, social norms, and naming conventions. We align models with the lived realities, traditions, and sensitive historical contexts of African communities.
Download ethically sourced, expert-labeled datasets today.

A culturally grounded dataset for evaluating stereotypical bias in Large Language Models across Senegal, Kenya, and Nigeria. This repo includes the raw and cleaned datasets, the complete pipeline (manual + automated) used to generate the dataset, and the code used to perform the various LLM evaluations.
Wolof & French medical Q&A pairs grounded in local practices. View Details ->
Financial literacy content adapted for unbanked populations. Coming soon ->
Diverse voice data capturing varied accents and backgrounds. View Details ->
Evaluating Stereotypical Bias (BPR Score)
*BPR: Bias Probability Ratio. Higher scores indicate stronger stereotypical associations.
| Model | BPR Score |
|---|---|
| **Modern (2023-2024)** | |
| Llama 3.2 3B | 0.78 |
| Mistral 7B | 0.75 |
| Qwen 2.5 7B | 0.71 |
| Gemma 2 2B | 0.71 |
| Phi-3 Mini | 0.70 |
| **Baseline (2019-2022)** | |
| GPT-Neo | 0.71 |
| GPT-2 Large | 0.69 |
| FinBERT | 0.50 |
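
For readers curious how a score like this might be computed: the exact BPR definition lives in the dataset repo, but a common approach to stereotype measurement is to check, over many sentence pairs, how often a model assigns higher probability to the stereotype-consistent sentence than to its anti-stereotypical counterpart. The sketch below is a minimal, hypothetical illustration of that pair-preference idea, not the Kitala implementation; the model name and sentence pair are placeholders.

```python
# Hypothetical sketch of a pair-preference bias measurement in the spirit of
# BPR. NOT the Kitala implementation; the repo defines the actual metric.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "gpt2"  # placeholder model; any causal LM works here
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME).eval()

def sentence_log_prob(text: str) -> float:
    """Total log-probability the model assigns to `text`."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits
    # Each position predicts the next token, so shift targets by one.
    log_probs = torch.log_softmax(logits[:, :-1], dim=-1)
    targets = ids[:, 1:].unsqueeze(-1)
    return log_probs.gather(2, targets).sum().item()

# Illustrative pair (not from our dataset): first sentence is
# stereotype-consistent, second is the anti-stereotypical counterpart.
pairs = [("The nurse said she was tired.", "The nurse said he was tired.")]

# Fraction of pairs where the model prefers the stereotype; 0.5 ~ no bias.
score = sum(
    sentence_log_prob(stereo) > sentence_log_prob(anti)
    for stereo, anti in pairs
) / len(pairs)
print(f"pair-preference score: {score:.2f}")
```

Aggregated over many culturally grounded pairs per country, a score near 0.5 would indicate no systematic preference, which is consistent with reading the baseline row above (e.g., FinBERT at 0.50) as near-chance.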
Cumi Oyemike et al. (arXiv 2025)
Investigating the invisible labor and structural challenges in aligning AI models for diverse global contexts.
Yann Le Beux et al. (arXiv 2025)
A comprehensive benchmark analyzing how LLMs handle cultural nuances and biases.
Deploy our expert surveyors to evaluate your model's alignment with African values.




Are you a linguist, health expert, or community leader? Join our network of experts to evaluate AI models and ensure they reflect African realities.
Get paid fairly for every mission you complete. Earn competitive rates for your local expertise, paid directly to your mobile wallet.
Work from anywhere using our mobile app. Whether you have 15 minutes or 2 hours, there are tasks available for you.
Help stop AI hallucinations. Ensure your language and cultural norms are accurately represented in global technology.
Priority Recruitment Regions
Check if this advice aligns with local health guidelines in Nairobi.
In Lingala, "Kitala" means reflection. We chose this name because we believe Artificial Intelligence in Africa must be a true reflection of the people it serves—grounded in our diverse cultures, languages, and realities.
We are building the infrastructure to ensure no community is left behind in the AI revolution.
The team behind Kitala is YUX Design, a leading social research and technology design company in Africa.


Discover our latest research on African AI, culturally grounded datasets, and inclusive technology.



