Building Africa's Human Data Infrastructure
A capacity-building workshop at DLI 2026 on community-led dataset creation and sustainable AI pipelines, focused on strengthening the people, skills, and systems that Africa's AI future depends on.
About the Workshop
The rapid growth of generative AI has intensified demand for high-quality datasets, yet progress in African AI remains constrained by gaps in human data infrastructure: the people, skills, and coordinated pipelines required to create, translate, validate, and maintain datasets for African languages and contexts.
Hosted by Tonative Africa at the Deep Learning Indaba 2026 in Nigeria, this session focuses on capacity building for creators, translators, validators, and annotators. Through short talks, breakout discussions, and collaborative roadmap design, participants will explore best practices for dataset creation, quality assurance, and long-term capacity development across African language communities.
The session aims to produce shared guidelines, identify priority challenges, and co-develop a roadmap for strengthening Africa's sovereign, sustainable, and locally owned AI ecosystems. All participants must be present in person.
Location
Nigeria: Deep Learning Indaba 2026 (in-person only)
Format
Forums and dialogues (in-person; virtual or hybrid not supported)
Duration
1 hour, 30 minutes
Registration
Via the DLI 2026 conference registration process
AV & Scheduling
Coordinated directly with the Indaba organising team
Session Agenda
Full schedule to be confirmed with the Indaba organising team.
Opening & Context
Welcome, session goals, and introduction to the Tonative Data Academy and African AI data challenges. Facilitated by Cynthia Amol.
Guest Talk
A short invited talk highlighting challenges in African dataset creation, community capacity-building initiatives, and sustainable data pipelines.
Breakout Discussions
Participants split into groups organised around three key pipeline stages (dataset creation and collection; translation, validation, and annotation; and dataset usage and evaluation) to identify challenges, needs, and opportunities.
Collaborative Roadmap Building
Groups share key insights and co-develop a shared roadmap for strengthening capacity, improving coordination across language communities, and designing scalable data pipelines.
Synthesis & Next Steps
Key takeaways, opportunities for collaboration, and post-Indaba follow-up plans including a shared resource toolkit and cross-community collaboration network.
Workshop Organisers

Alfred Kondoro
Head of Research, Tonative Africa
Lead Organiser
Alfred is a Tanzanian PhD researcher in Data Science at Hanyang University, Republic of Korea. He leads community-driven research initiatives at Tonative Africa aimed at strengthening African representation in AI, with work spanning NLP, HCI, and ICTD. His publications have appeared at EACL, AAAI, ACL, CHI, IMWUT, CIKM, CUI, and AfriCHI venues.
LinkedIn
Sharon Ibejih
Founder, Tonative Africa
Organiser
Sharon is a Senior Data Scientist at Ignite Energy Access and the Founder of Tonative Africa. Her work focuses on NLP, data curation, and AI pipeline design for low-resource African languages. She holds an MSc in Data Science and has presented at AfricaNLP, NeurIPS WiML, ICLR workshops, Deep Learning Indaba, and CVPR.
LinkedIn
Cynthia Amol
Co-Founder & Head of Data, Tonative Africa
Organiser
Cynthia is a PhD student in Computer Science and Google NLP Fellow at Maseno University, Kenya. A Deep Learning Indaba Alele-Williams Masters Award recipient, she leads the data validation pipeline at Tonative Africa and has co-organised workshops at NeurIPS, LREC-COLING, EACL, and CHI.
LinkedIn
Chinenye Anikwenze
Engineering Lead, Tonative Africa
Organiser
Chinenye is a Software Engineer and Automation Specialist focusing on defensive infrastructure and AI safety. As Engineering Lead at Tonative Africa, she manages technical infrastructure for 400+ contributors. Her research on Semantic Collapse and the security of tonal languages was recently presented at AFLC 2026 and Impact Fellowship Summit IREX 2026.
LinkedIn
Joy Olusanya
NLP Researcher & Training Manager, Tonative Africa
Organiser
Joy is a linguist and NLP researcher focusing on low-resource language technologies, multilingual NLP, and benchmark evaluation. She served as Workshop Chair for the CLRLCβLLMs Workshop at NeurIPS 2025 and is Founder and CEO of the Center for Low-Resource Languages and Cultures.
LinkedIn
Armand Bukama
Social Manager, Tonative Africa
Organiser
Armand is a Congolese computer scientist from the DRC with a degree from the Catholic University of Bukavu. He leads community engagement, outreach, and communications at Tonative Africa, and works at the intersection of AI, electronics, and sustainable energy for underserved communities.
LinkedIn
Faisal Muhammad Adam
Hausa Language Validation Lead, Tonative Africa
Organiser
Faisal is a lecturer and data science practitioner based in Kano, Nigeria, pursuing graduate studies in Applied Data Science at WorldQuant University. He serves as Hausa Language Validation Lead at Tonative Africa, coordinating contributors on multilingual dataset validation and quality assurance.
LinkedIn
Godspraise Okechukwu
Project Lead, Tonative Research Group
Organiser
Godspraise is a software engineer and NLP researcher based in Nigeria. As Project Lead in the Tonative Research Group, he contributes to dataset creation, validation, and multilingual resource development for African languages, building community-driven data pipelines for low-resource languages.
LinkedIn
Who Should Attend
This workshop is designed for a broad community of African AI practitioners and contributors.
Researchers
Working on African language technologies
Dataset Creators & Annotators
Building training data for African languages
Translators & Validators
Ensuring linguistic accuracy and contextual relevance
Open-Source Contributors
Supporting community-driven AI tools and pipelines
Students & Educators
Working on data-centric AI in academic settings
AI Practitioners
Building AI products and services for African users
African Linguists
With an interest in data creation for their languages
Expected Outputs
The workshop aims to produce tangible, community-owned resources and connections.
Community Guidelines
Shared guidelines for dataset creation and validation across African language communities.
Resource & Toolkit List
A curated list of tools, frameworks, and training resources for African language data pipelines.
Capacity-Building Roadmap
An actionable roadmap for scaling capacity-building initiatives and governance structures.
Collaboration Network
A cross-community network linking language teams, researchers, and practitioners.
Post-Indaba Summary Report
A public synthesis of insights and recommendations to inform future collaborations and publications.
Participation Requirements
What you need to know before attending
In-Person Attendance Required
This workshop is exclusively delivered in person at DLI 2026 in Nigeria. Virtual or hybrid participation cannot be supported.
Register via DLI 2026
To participate in the workshop, register through the official Deep Learning Indaba 2026 registration process. See the DLI website for registration deadlines and details.
Speaker & Organiser Deadline
All confirmed speakers and organisers must finalise their participation by the ticket allocation deadline communicated by the DLI team.
AV & Logistics
Audio-visual requirements and scheduling will be coordinated directly with the Indaba organising team ahead of the event.
Get Involved
Interested in collaborating, co-organising, or presenting at this workshop? Reach out to the Tonative team... We want to hear from you.