The following textual analysis was carried out on a corpus of 969 news clippings relating to Islam and Muslims from the Beninese national newspaper La Nation (formerly Daho-Express and Ehuzu), published between 1970 and 2022. The corpus contains a total of 300,813 words.
Preprocessing text data
The corpus was "lemmatised" using Stanza, a Python natural language processing library. Lemmatisation is the process of grouping together the different inflected forms of a word under a single base form, the lemma. For example, "musulman", "musulmane" and "musulmans" are all reduced to the lemma "musulman". Because all the inflected forms of a word are then treated as a single entity, the corpus becomes more consistent, which makes semantic analysis, frequency analysis and pattern recognition more accurate and reliable.
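The effect of lemmatisation on frequency counts can be illustrated with a minimal sketch. Running Stanza requires downloading a French model, so the sketch assumes a small hypothetical lookup table (`LEMMAS`) in place of Stanza's output; in a real pipeline each lemma would come from the `lemma` attribute of the words Stanza returns.

```python
from collections import Counter

# Hypothetical stand-in for Stanza's output: maps inflected surface forms
# to their lemma. A real pipeline would read word.lemma from Stanza instead.
LEMMAS = {
    "musulman": "musulman",
    "musulmane": "musulman",
    "musulmans": "musulman",
    "musulmanes": "musulman",
}

def lemmatise(tokens):
    """Replace each lower-cased token by its lemma; unknown tokens are kept as-is."""
    return [LEMMAS.get(tok.lower(), tok.lower()) for tok in tokens]

tokens = ["musulman", "musulmane", "musulmans", "mosquée"]
print(Counter(tokens))             # each surface form counted separately
print(Counter(lemmatise(tokens)))  # Counter({'musulman': 3, 'mosquée': 1})
```

Grouping by lemma turns three separate low counts into one count of 3, which is what makes frequency-based views of the corpus more reliable.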
Python's Natural Language Toolkit (NLTK) library, which contains a list of French stop words, was used to remove stop words and punctuation. Stop words such as "et", "le" and "du" occur very frequently but carry little semantic weight (although they may of course matter for a specific study), so they add noise to a basic statistical analysis. Removing them allows the analysis to focus on the words that contribute most to the overall meaning of the texts, which makes the results easier to interpret. Left in place, stop words dominate frequency counts and introduce noise into methods that rely on word distributions or co-occurrence networks, which can lead to misleading results or interpretations.
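Stop word and punctuation filtering can be sketched as follows. NLTK's French list is normally loaded with `nltk.corpus.stopwords.words("french")` after downloading the `stopwords` corpus; to keep the example self-contained, it assumes a deliberately tiny hypothetical subset (`STOP_WORDS`) and an assumed set of French typographic marks.

```python
import string

# Hypothetical, deliberately tiny subset of a French stop word list.
# NLTK's full list would be loaded with nltk.corpus.stopwords.words("french").
STOP_WORDS = {"et", "le", "la", "les", "du", "de", "des", "un", "une", "à"}

# ASCII punctuation plus some typographic marks common in French text
# (an assumption; extend as needed for the corpus).
PUNCTUATION = set(string.punctuation) | {"«", "»", "’", "…"}

def remove_stop_words(tokens):
    """Lower-case tokens, dropping stop words and tokens made only of punctuation."""
    kept = []
    for tok in tokens:
        low = tok.lower()
        if low in STOP_WORDS:
            continue
        if all(ch in PUNCTUATION for ch in low):
            continue
        kept.append(low)
    return kept

tokens = ["Le", "président", "du", "conseil", "et", "les", "imams", "«", "!"]
print(remove_stop_words(tokens))  # ['président', 'conseil', 'imams']
```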
Both lemmatisation and stop word removal contribute to the creation of a clean, efficient and semantically rich corpus. This paves the way for more robust and interpretable results, enabling researchers to draw nuanced conclusions from textual data.
Voyant Tools, a web-based environment for reading and analysing texts, was used to produce a variety of visualisations, including the most frequent words, the distribution of a word's occurrences across the corpus, keywords and terms that occur close together, geospatial aspects of the texts, and links between the people, places and organisations that occur together.
Keywords in context
The "Keywords in Context" tool shows each occurrence of a keyword together with a few words of surrounding text (its context). This makes it useful for studying more closely how a term is used in different contexts.
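The idea behind a keywords-in-context view can be sketched in a few lines. This is an illustration of the general technique, not Voyant Tools' own implementation; the function name, the `window` size and the sample text are hypothetical.

```python
def keywords_in_context(tokens, keyword, window=3):
    """Return a (left context, keyword, right context) triple for each hit.

    `window` is the number of tokens shown on each side of the keyword.
    """
    hits = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            hits.append((left, tok, right))
    return hits

# Invented, already preprocessed sample text (illustration only).
tokens = "grand imam mosquée central diriger prière imam adjoint lire coran".split()
for left, kw, right in keywords_in_context(tokens, "imam", window=2):
    # Right-align the left context so the keyword column lines up.
    print(f"{left:>25} | {kw} | {right}")
```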
Corpus Terms is a table view of term frequencies in the entire corpus.
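A term frequency table of this kind can be approximated with a counter over all documents. The sketch below is not Voyant's own computation; the function name, the sample documents and the choice of expressing relative frequency per 10,000 words are assumptions made for readability.

```python
from collections import Counter

def corpus_terms(documents, top=10):
    """Return (term, count, frequency per 10,000 words) for the top terms.

    `documents` is a list of token lists that have already been
    lemmatised and filtered for stop words.
    """
    counts = Counter(tok for doc in documents for tok in doc)
    total = sum(counts.values())
    return [
        (term, n, round(n / total * 10_000, 1))
        for term, n in counts.most_common(top)
    ]

docs = [["imam", "mosquée", "prière"], ["imam", "fête", "tabaski"]]
for term, n, rel in corpus_terms(docs, top=3):
    print(term, n, rel)
```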
Corpus Collocates is a table view of the terms that appear most frequently in proximity to a keyword across the entire corpus.
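Collocates can be approximated by counting the terms that fall within a fixed window around each occurrence of a keyword. This minimal sketch uses a plain co-occurrence count in a symmetric window; the `window` size is an assumption, and the code is an illustration rather than Voyant's own implementation.

```python
from collections import Counter

def collocates(documents, keyword, window=5):
    """Count terms occurring within `window` tokens of `keyword`.

    Each document is searched separately, so windows never cross
    document boundaries.
    """
    counts = Counter()
    for doc in documents:
        for i, tok in enumerate(doc):
            if tok != keyword:
                continue
            start = max(0, i - window)
            neighbours = doc[start:i] + doc[i + 1:i + 1 + window]
            counts.update(t for t in neighbours if t != keyword)
    return counts

docs = [["conseil", "imam", "mosquée", "prière"], ["imam", "conseil"]]
print(collocates(docs, "imam", window=1).most_common(3))
```

Keeping documents separate is a design choice: a clipping boundary usually also marks a change of topic, so co-occurrences across it would mostly be noise.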
Trends shows a line graph depicting the distribution of a word's occurrences across a corpus or a document.
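The series behind such a line graph can be sketched as the relative frequency of a term in each document, taken in corpus order. The sketch assumes the documents are already sorted chronologically; the function name and sample documents are hypothetical, and this is not Voyant's own implementation.

```python
def trend(documents, term):
    """Relative frequency of `term` in each document, in corpus order.

    Empty documents get a frequency of 0.0 to avoid dividing by zero.
    """
    return [doc.count(term) / len(doc) if doc else 0.0 for doc in documents]

# Invented sample documents, assumed to be in chronological order.
docs = [
    ["imam", "fête", "imam", "prière"],
    ["mosquée", "prière"],
    ["imam", "conseil", "dialogue", "paix", "religion"],
]
print(trend(docs, "imam"))  # [0.5, 0.0, 0.2]
```

Using relative rather than raw counts keeps long and short clippings comparable on the same graph.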