
Co-creating how we evaluate AI in health contexts

We provide large-scale LLM testing and high-quality datasets that ensure your models reflect African realities while empowering local communities.
Collaborating with






Standard crowdsourcing fails in Africa. We combine deep offline reach with academic rigor to provide ground truth data.
We don't just sample internet users. We deploy physical surveyors to capture data from remote, offline communities, ensuring a truly representative dataset.
Beyond basic labeling. We recruit participants with specific demographic profiles and deploy domain experts (doctors, bankers) to validate high-stakes data.
Our methodology is designed to capture nuance. We align models with local traditions, social norms, and linguistic subtleties that outsiders miss.
Tailored for Generative AI evaluation and high-stakes sectors. Our expert-led services ensure your AI speaks the local language, both culturally and linguistically.
Auditing LLMs to detect locally grounded and harmful biases across African demographics and ethnic groups. We go beyond standard benchmarks.

Real-time human feedback (Text or Voice) to rate model responses for safety, helpfulness, and cultural accuracy in live scenarios.

High-quality, expert-curated Q&A datasets in specific domains (Health, Finance) reflecting local realities for authentic fine-tuning.

Diverse voice data capturing varied accents, dialects, and real-world background noise for robust speech recognition evaluation.

We specialize in high-stakes domains where "good enough" translation isn't safe. We ensure accuracy in critical fields like healthcare and finance.
Evaluating medical advice for local relevance, terminology, and safety guidelines. We verify that AI health assistants provide accurate, culturally safe information for expecting mothers and rural communities.
Reviewing financial literacy content and support bots for unbanked populations. We adapt complex financial terms (interest rates, loans, mobile money) to local languages and mental models.
Identifying western bias in history, social norms, and naming conventions. We align models with the lived realities, traditions, and sensitive historical contexts of African communities.
Download ethically sourced, expert-labeled datasets today.

A culturally grounded dataset for evaluating stereotypical bias in Large Language Models across Senegal, Kenya, and Nigeria. This repo includes the raw and cleaned datasets, the complete pipeline (manual + automated) used to generate the dataset, and the code used to perform the various LLM evaluations.
Wolof & French medical Q&A pairs grounded in local practices. View Details ->
Financial literacy content adapted for unbanked populations. Coming soon ->
Diverse voice data capturing varied accents and backgrounds. View Details ->
Evaluating Stereotypical Bias (BPR Score)
*BPR: Bias Probability Ratio. Higher scores indicate stronger stereotypical associations.
| Model | BPR Score |
|---|---|
| **Modern (2023-2024)** | |
| Llama 3.2 3B | 0.78 |
| Mistral 7B | 0.75 |
| Qwen 2.5 7B | 0.71 |
| Gemma 2 2B | 0.71 |
| Phi-3 Mini | 0.70 |
| **Baseline (2019-2022)** | |
| GPT-Neo | 0.71 |
| GPT-2 Large | 0.69 |
| FinBERT | 0.50 |
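
For readers curious how a score like this might be computed: the exact BPR definition lives in the dataset repo, but a common approach to stereotype measurement is to check, over many sentence pairs, how often a model assigns higher probability to the stereotype-consistent sentence than to its anti-stereotypical counterpart. The sketch below is a minimal, hypothetical illustration of that pair-preference idea, not the Kitala implementation; the model name and sentence pair are placeholders.

```python
# Hypothetical sketch of a pair-preference bias measurement in the spirit of
# BPR. NOT the Kitala implementation; the repo defines the actual metric.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "gpt2"  # placeholder model; any causal LM works here
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME).eval()

def sentence_log_prob(text: str) -> float:
    """Total log-probability the model assigns to `text`."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits
    # Each position predicts the next token, so shift targets by one.
    log_probs = torch.log_softmax(logits[:, :-1], dim=-1)
    targets = ids[:, 1:].unsqueeze(-1)
    return log_probs.gather(2, targets).sum().item()

# Illustrative pair (not from our dataset): first sentence is
# stereotype-consistent, second is the anti-stereotypical counterpart.
pairs = [("The nurse said she was tired.", "The nurse said he was tired.")]

# Fraction of pairs where the model prefers the stereotype; 0.5 ~ no bias.
score = sum(
    sentence_log_prob(stereo) > sentence_log_prob(anti)
    for stereo, anti in pairs
) / len(pairs)
print(f"pair-preference score: {score:.2f}")
```

Aggregated over many culturally grounded pairs per country, a score near 0.5 would indicate no systematic preference, which is consistent with reading the baseline row above (e.g., FinBERT at 0.50) as near-chance.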
Cumi Oyemike et al. (arXiv 2025)
Investigating the invisible labor and structural challenges in aligning AI models for diverse global contexts.
Yann Le Beux et al. (arXiv 2025)
A comprehensive benchmark analyzing how LLMs handle cultural nuances and biases.
Deploy our expert surveyors to evaluate your model's alignment with African values.




Are you a linguist, health expert, or community leader? Join our network of experts to evaluate AI models and ensure they reflect African realities.
Get paid fairly for every mission you complete. Earn competitive rates for your local expertise, paid directly to your mobile wallet.
Work from anywhere using our mobile app. Whether you have 15 minutes or 2 hours, there are tasks available for you.
Help stop AI hallucinations. Ensure your language and cultural norms are accurately represented in global technology.
Priority Recruitment Regions
Check if this advice aligns with local health guidelines in Nairobi.
In Lingala, "Kitala" means reflection. We chose this name because we believe Artificial Intelligence in Africa must be a true reflection of the people it serves—grounded in our diverse cultures, languages, and realities.
We are building the infrastructure to ensure no community is left behind in the AI revolution.
The team behind Kitala is YUX Design, a leading social research and technology design company in Africa.


Discover our latest research on African AI, culturally grounded datasets, and inclusive technology.



