From access to analysis: IWAC as a testbed for DH and AI
Digital Humanities (DH) can enrich African history and the study of Muslim societies through alternative modes of knowledge production for both public scholarship and research. Yet the Global South, including Africa, remains under-represented in the field.
IWAC pushes beyond preservation and access to open new lines of enquiry on Islam and Muslim life in West Africa. Drawing on IWAC data, the project develops visualisations that show how DH can advance research on Muslim public life in the region. In doing so, it moves past the common digital history model of serving digitised sources without interpretation (Robertson & Mullen 2021): it analyses the IWAC dataset with computational methods and takes an exploratory approach to media representations of Islam and Muslims, alongside the intellectual and translocal histories that shape them.
Distant reading
"Distant reading", introduced to literary studies by Franco Moretti, contrasts with the intensive focus on individual texts of "close reading". In DH and the social sciences, it uses computational techniques to analyse large textual datasets, enabling researchers to detect patterns and themes that may be overlooked by traditional qualitative methods. Common approaches include topic modelling, sentiment analysis, and network analysis.
The method has its limitations. Algorithmic modelling can obscure nuances of context, tone and meaning that close reading would recover. The quality of optical character recognition (OCR) also matters, because transcription errors propagate into every subsequent analysis. For this reason, distant reading is best combined with close reading and domain expertise. IWAC takes this mixed-methods approach and provides interactive dashboards to facilitate exploratory analysis.
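The pairing of distant and close reading described above can be illustrated with a minimal sketch. It is not IWAC's actual code: the toy corpus, the function names and the tokenisation rule are all assumptions made for illustration. One function counts a term per year (the distant view), and the other returns each occurrence with a few words of context so that a researcher can read the hits closely.

```python
import re
from collections import Counter

# Hypothetical toy corpus: (year, text) pairs standing in for OCR'd articles.
CORPUS = [
    (1995, "The imam spoke at the mosque. The mosque was full."),
    (1995, "Pilgrims prepared for the hajj."),
    (2005, "A new mosque opened; the imam welcomed pilgrims."),
]


def tokenize(text):
    """Lowercase the text and return runs of letters (accents included)."""
    return re.findall(r"[^\W\d_]+", text.lower())


def term_counts_by_year(corpus, term):
    """Distant view: count occurrences of `term` per year."""
    counts = Counter()
    for year, text in corpus:
        counts[year] += tokenize(text).count(term)
    return dict(counts)


def keyword_in_context(corpus, term, width=3):
    """Close view: return each hit with up to `width` tokens on either side."""
    hits = []
    for year, text in corpus:
        tokens = tokenize(text)
        for i, token in enumerate(tokens):
            if token == term:
                left = tokens[max(0, i - width):i]
                right = tokens[i + 1:i + 1 + width]
                hits.append((year, " ".join(left + [token] + right)))
    return hits


if __name__ == "__main__":
    print(term_counts_by_year(CORPUS, "mosque"))
    for year, snippet in keyword_in_context(CORPUS, "mosque"):
        print(year, "|", snippet)
```

Because the counts are only as good as the underlying transcription, the context snippets also serve as a quick check on whether OCR errors are distorting the pattern.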
Interactive visualisations
Interactivity promotes open scholarship. Filtering, sorting and switching views enable a level of dynamic enquiry that static figures cannot match. IWAC demonstrates this potential by providing visual tools that serve as analytical instruments and vehicles for scholarly communication. Hover tooltips and other dynamic elements surface context without clutter, thereby improving accessibility.
Current applications
- Keyword mapping. Explores thematic evolution in the West African press using the Dublin Core "Subject" and "Spatial Coverage" fields.
- Topic modelling. Surfaces latent themes and their trajectories across time and publications.
- Sentiment analysis. Tracks tonal shifts towards Islam and Muslim actors in the press.
- Spatial & network visualisation. Maps places and relational patterns to support questions about translocal connections and media representation.
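As an illustration of the keyword-mapping idea, the following sketch aggregates subject keywords by year and pairs them with places. The record layout, the key names (`date`, `subject`, `spatial`) and the sample values are hypothetical stand-ins for the Dublin Core "Subject" and "Spatial Coverage" fields. They do not reflect IWAC's real export format.

```python
from collections import Counter, defaultdict

# Hypothetical records; "subject" and "spatial" stand in for the
# Dublin Core "Subject" and "Spatial Coverage" fields.
RECORDS = [
    {"date": "1998-03-12", "subject": ["Hajj", "Ramadan"], "spatial": ["Burkina Faso"]},
    {"date": "1998-07-01", "subject": ["Hajj"], "spatial": ["Benin"]},
    {"date": "2004-11-20", "subject": ["Ramadan"], "spatial": ["Burkina Faso", "Togo"]},
]


def subjects_by_year(records):
    """Map each publication year to a Counter of its subject keywords."""
    table = defaultdict(Counter)
    for record in records:
        year = int(record["date"][:4])  # assumes ISO dates (YYYY-MM-DD)
        table[year].update(record["subject"])
    return table


def subject_place_pairs(records):
    """Count how often each subject is tagged together with each place."""
    pairs = Counter()
    for record in records:
        for subject in record["subject"]:
            for place in record["spatial"]:
                pairs[(subject, place)] += 1
    return pairs


if __name__ == "__main__":
    for year, counts in sorted(subjects_by_year(RECORDS).items()):
        print(year, counts.most_common())
    print(subject_place_pairs(RECORDS).most_common(3))
```

Tables of this shape are the kind of input a timeline or map view could consume. The visual layer itself is left out here.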
The scale problem: 25 million words
IWAC now contains more than 25 million words, over 300 books' worth of text. At this scale, manual curation and keyword tagging are no longer viable. Converting sources into machine-readable text ("datafication") remains essential, but it must be automated to keep pace with growth.
From artisanal curation to an AI-assisted pipeline
IWAC began as an artisanal operation: careful digitisation and meticulous tagging of people, places, events, and topics. That labour produced reliable material but revealed a limit. Because the project has been a single-researcher effort apart from brief support in 2023 (Berlin Senate funding), it shifted to AI-assisted workflows to sustain expansion and to prototype reusable methods.
What AI enables—and where it fails
Python pipelines integrating LLMs now:
- improve OCR, including complex newspaper layouts;
- enhance named entity recognition (NER) for downstream analyses;
- accelerate ingestion and make under-resourced archives more accessible.
These gains come with risks: opaque corrections, hallucinations, and the standardisation of linguistic diversity that can erase historical orthography. More broadly, AI raises ethical, legal, and socio-technical concerns—bias, privacy, opacity—that demand accountable practice.
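One way to make automated corrections less opaque is to log every change a model makes and to flag outputs that drop spellings a researcher wants preserved. The sketch below assumes this approach and is not drawn from the IWAC pipelines. The function name, the `protected` list, the similarity threshold and the sample sentence are hypothetical, and no LLM is called: `corrected` stands in for whatever text a model returns.

```python
import difflib


def audit_correction(raw, corrected, protected=(), min_ratio=0.8):
    """Compare raw OCR with a corrected version, token by token.

    Returns the list of changes, any protected spellings that vanished,
    and a flag telling a human reviewer whether to look at the result.
    """
    raw_tokens = raw.split()
    new_tokens = corrected.split()
    matcher = difflib.SequenceMatcher(None, raw_tokens, new_tokens, autojunk=False)

    # Every non-equal opcode is a change to record in the audit log.
    changes = [
        (" ".join(raw_tokens[i1:i2]), " ".join(new_tokens[j1:j2]))
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]

    # Spellings present in the source but missing from the output.
    lost = [t for t in protected if t in raw_tokens and t not in new_tokens]

    return {
        "changes": changes,
        "lost_protected": lost,
        "needs_review": bool(lost) or matcher.ratio() < min_ratio,
    }


if __name__ == "__main__":
    # Invented example: the "correction" fixes accents but also
    # modernises a historical place-name spelling.
    raw = "L'imam de Wagadugu a visite la mosquee"
    corrected = "L'imam de Ouagadougou a visité la mosquée"
    report = audit_correction(raw, corrected, protected=("Wagadugu",))
    for before, after in report["changes"]:
        print(f"{before!r} -> {after!r}")
    print("needs review:", report["needs_review"])
```

A check like this does not detect hallucinated content on its own, but it keeps a record of what changed and gives a reviewer a place to intervene before normalised spellings reach the archive.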
Operating principles
IWAC treats AI as a set of tools for solving specific problems, emphasising transparency, awareness of bias, and scholarly authority. The aim is pragmatic: to sustain growth, improve discoverability and share methods that others can reuse, all without surrendering critical source analysis. To ensure reproducibility and community scrutiny, all AI-enhanced pipelines (code and prompts) are freely available on GitHub.
Digital minimalism
IWAC recognises the tension between advocating AI and the environmental and ethical costs of computation. Guided by "minimal computing" and "digital sobriety", we prioritise lightweight, efficient, open-source models and right-sized pipelines. In under-resourced contexts—such as libraries without dedicated archivists or archives with large unprocessed holdings—targeted AI that makes invisible materials findable and usable can be the lesser cost. To enable scrutiny and reuse, we document methods and publish code and prompts on GitHub. Used judiciously, LLMs support exploratory analysis and low-friction interfaces while we monitor bias, opacity, privacy, and energy use.