The future of the Collection beyond 2023: a showcase of the possibilities of digital humanities

Title

Authors

Frédérick Madore

Vincent Favier

Abstract

Second presentation by Frédérick Madore at the official launch of the Islam West Africa Collection

Date

9 November 2023

Is part of

Launch of the Islam West Africa Collection

Extent

17 minutes, 28 seconds

Language

English

Type

Scholarly presentation

Identifier

iwac-reference-0000847

Content

So what is the future of the Collection after 2023? Far from being the end of the project, this is just the beginning, as the database will continue to grow. Togo and Côte d'Ivoire will be at the center of this expansion. Thanks to Samba Koné, president of the Ivorian National Press Agency, we have already approached six newspapers, including Fraternité Matin, which Issouf mentioned, to ask for permission to add a total of 5,000 press articles that I have already digitized. I'm also in talks with Togo Presse, Togo's state-run newspaper, to do the same with about 1,500 press clippings.

I also see this project as a collaborative one, so I would like to work with other scholars in the region, such as Issouf, who have expressed a keen interest in uploading some of their own relevant primary sources, provided that copyright, data protection, and privacy issues are respected.

I have already mentioned that most of the documents in the database are in French. The corpus therefore offers more from the perspective of Francophone, Western-educated Muslims than from the perspective of Arabisants, Arabic speakers, and non-French speakers. To mitigate this Francophone bias, I would like to add oral testimonies in national languages, in audio format, from local scholars, imams, and prominent Muslim figures in major Islamic centers in these countries, about the broader history and development of Islam in these areas. The transcriptions, both in French or English and in national languages, would help to bridge what Ousmane Kane has described as the Europhone and non-Europhone knowledge of Islam. I'm also thinking beyond West Africa. At ZMO, we have several scholars who are working in East Africa on the Swahili coast, such as Kai Kresse, who will say a few words in a few minutes.

In addition to the expansion of the database, Digital Humanities, or DH for short, will be at the heart of the next phase of the project. This project is not only about digitizing and cataloging a collection of primary sources; it will also explore computational methods for engaging with printed sources in order to develop new arguments about Islam and Muslims in the region. I would now like to outline some of the possibilities of DH analysis that can be done with the data from the Collection. But before that, a word of acknowledgment. In many DH projects, the spotlight tends to shine on the kind of eye-catching results that you can see online. What is often overshadowed is the work of the research assistants who spend countless hours curating and preparing these data sets. What I'm about to show you is partly based on this work. So I would like to thank Vincent and Aleksei again for their work, and I would also like to invite Vincent to say just a few words about his experience and work over the last 10 months, and then I will show you some examples of DH analysis on the corpus of newspapers.

Thank you very much, Frédérick. I won't be long, because my task over the past 10 to 11 months was quite repetitive and focused largely on identifying and entering keywords into the database, since all the tools that Frédérick is now developing rely on the richness of these keywords. The objective was, of course, to cover as many topics as possible: to identify the names, the places, the events, and the associations that appear in the materials we processed, especially the sermons we encoded into the database and the newspaper articles. For the references, it was also important to process all the keywords based on the abstracts I found and to keep them up to date. The list is of course not exhaustive, because academic works are still being published, and this also needs to be taken into consideration for the future: we need to find a way to update the database regularly. Of course, I also have my own biases regarding keywords, about which ones are more relevant to me than others, so I think this needs refining as well, and that is why the collaborative dimension of the project should also be an important element to take into consideration. So, yes, that's it for the work in the shadows and what we did. Thank you.

Thank you, Vincent. And thank you, Aleksei, as well, for the work you have put into the database. So let me share my screen again. As I said, countless hours were spent doing the OCR and reviewing the results. At the moment, the newspaper articles from Burkina Faso and Benin have a total of 2,843,699 words, which is about 7,500 pages in Times New Roman, single-spaced. This staggering figure does not include Islamic publications. This data set is thus ideal for distant reading. Distant reading is a methodological approach in the social sciences that uses computational techniques to analyze large textual data sets. It originated in literary studies and contrasts with close reading, which involves in-depth analysis of individual texts. Distant reading allows researchers to identify patterns, themes, and phenomena that may not be observable using traditional qualitative methods. It's going to be a bumpy road. I don't have enough time to explain everything in detail, but my aim is to pique your curiosity and give you a very brief overview of what we can do. And if there are particular visualizations you're interested in, we can come back to them during the Q&A session.

And by the way, I haven't mentioned it yet, but the entire data set of the Collection can be downloaded in a variety of formats for reuse. Of particular interest, thanks to the help of our colleague Alisher, all the metadata of the database, everything, is available in the ZMO Institutional Repository. So you can download massive CSV files with everything, and it's this kind of file that is the backbone for the computational analysis.

If we go back to the website, there's a page called digital humanities. I won't present my methodology in detail, but on the website, you have everything. You can even access the Python code, so you can see how the visualizations were created. Let's start with temporal analysis here. Temporal analysis can reveal important trends and patterns in the corpus over time: how certain themes or topics have gained or lost prominence in Burkinabè and Beninese newspapers when discussing Islam and Muslims. The following graphs illustrate the yearly frequency of the top 10 most frequent keywords in each corpus. Here is the first one, for Benin. It's a bit crowded here, maybe fewer keywords would have been better, but for example an interesting trend for Benin is the Ahmadiyya in red, which you can see here around the 2010s. The Ahmadiyya community is not that large in Benin, but they have a very effective communication strategy. And if we look, for example, at Burkina Faso, then obviously since around, let's say, 2015, you have this huge spike here about terrorism and radicalization, which makes sense given the problems that the country has faced in recent years.
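As a minimal sketch of the kind of yearly keyword count behind such graphs, and not the code published on the website, the example below tallies keywords per year from a CSV export. The column names `year` and `keywords`, the `|` separator, and the sample rows are all assumptions made for illustration.

```python
import csv
import io
from collections import Counter, defaultdict

# Hypothetical sample standing in for a metadata export; the column names
# ("year", "keywords") and the "|" separator are assumptions.
SAMPLE_CSV = """year,keywords
2009,Ahmadiyya|Ramadan
2010,Ahmadiyya|Hajj
2010,Ahmadiyya
2016,Terrorism|Radicalization
2016,Terrorism
"""


def yearly_keyword_counts(csv_text, top_n=10):
    """Return {keyword: {year: count}} for the top_n most frequent keywords."""
    totals = Counter()
    per_year = defaultdict(Counter)
    for row in csv.DictReader(io.StringIO(csv_text)):
        year = int(row["year"])
        for keyword in row["keywords"].split("|"):
            keyword = keyword.strip()
            if keyword:
                totals[keyword] += 1
                per_year[keyword][year] += 1
    # Keep only the overall most frequent keywords, as in a "top 10" chart.
    top = [kw for kw, _ in totals.most_common(top_n)]
    return {kw: dict(per_year[kw]) for kw in top}


counts = yearly_keyword_counts(SAMPLE_CSV, top_n=2)
for keyword, series in counts.items():
    print(keyword, sorted(series.items()))
```

Each inner dictionary is one line on the chart: years on the x-axis, counts on the y-axis. Lowering `top_n` is one way to get the less crowded graph mentioned above.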

Another interesting thing you can do is to compare the annual frequencies of selected keywords, such as topics, Islamic associations, or Muslim leaders. Here, for example, with Benin, you can see this huge spike, which is the keyword cooperation with Arab states, in the mid-1970s and 1980s, when Benin developed strong relations with Gaddafi's Libya. I also did it for Islamic associations in Burkina and for prominent Muslim leaders. This is a good indication of the influence and favorable treatment that certain Muslim leaders or associations enjoy in the media. So for example, if I go down here, sorry, it's a bit fast, you have the annual frequency of selected Islamic associations. You can really observe which associations feature more prominently in the media. And you can also do this for Muslim leaders, for example, so you can see who the main imams or preachers featured in the newspapers are.

Sorry, I have to go a bit quickly just to give you an overview. So there's also what is called topic modeling. Topic modeling is a type of statistical method used to discover the latent topics that occur in a large collection of documents. An unsupervised algorithm, meaning that specific topics are not predetermined, processes the data to identify clusters of words according to their co-occurrences within the documents. So it can provide a way of understanding the thematic underpinnings of the corpus. Here we have five topics per country. For Benin, for example, you can again see that the first topic the code discovered is clearly about the Ahmadiyya, which is definitely something that needs to be studied in more depth: why the Ahmadiyya feature so prominently in Beninese newspapers. If we go down, topic two is clearly about Islamic holidays. We have Ramadan, prayer, holiday, fasting. Here you have a third one, which is clearly the cooperation with Arab states that I mentioned. In this one, we have Libya, cooperation, minister, project, development. We can see the pattern here. And then if we move to Burkina, I'm going a bit faster here. For Burkina, obviously, we have a topic that is clearly related to jihadism, with the keywords security, terrorism, attack, military. And if we take a look at the last one, for example, you see this one is clearly about the pilgrimage, the organization of the pilgrimage to Mecca, which is an important topic that comes up quite often in the corpus.
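The talk does not say which topic modeling algorithm was used, so the sketch below is not a topic model. It only illustrates the raw signal such algorithms build on: counting which words appear together in the same documents. The toy documents are hypothetical and assumed to be already tokenized and stripped of stop words.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy documents, already tokenized and stripped of stop words.
DOCS = [
    ["ramadan", "prayer", "fasting"],
    ["ramadan", "fasting", "holiday"],
    ["libya", "cooperation", "minister"],
    ["libya", "cooperation", "development"],
]


def cooccurrence_counts(docs):
    """Count how many documents each unordered word pair appears in together."""
    pairs = Counter()
    for tokens in docs:
        # set() so a word repeated within one document is counted once;
        # sorted() so each pair has a single canonical order.
        for a, b in combinations(sorted(set(tokens)), 2):
            pairs[(a, b)] += 1
    return pairs


pairs = cooccurrence_counts(DOCS)
# Pairs that co-occur in at least two documents hint at a shared theme.
strong = sorted(pair for pair, n in pairs.items() if n >= 2)
print(strong)
```

On this toy input the recurring pairs split into a holiday-related pair and a cooperation-related pair. A real topic model goes further by estimating word clusters probabilistically across the whole corpus.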

The third example I want to show you is what is called sentiment analysis, and I really like this one. Let's go here. Sentiment analysis can provide insights into the tone and emotional context of the corpus. This can be useful for examining how different issues are portrayed across newspapers and countries over time. I will skip the first two and show you this one here. This is not coming from an electrocardiogram test of me under a high level of stress; rather, it measures the general positivity or negativity of articles published over the years. Values range from minus one to one, with minus one indicating extremely negative sentiment, one indicating extremely positive sentiment, and zero indicating neutral sentiment. We can see some spikes here, this is for Benin, around maybe 1974, and here 1983. Obviously this is not self-explanatory, but it triggers some questions. Why is it so negative in 1982? Then you can do some close reading specifically on that period to try to understand what was happening in the newspaper at that time. Same for Burkina Faso. Sometimes it's super positive, sometimes super negative. You won't get any answers from those graphs, but they can help to orient you towards specific years or periods in the corpus.
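To make the minus one to one scale concrete, here is a minimal lexicon-based sketch. The talk does not specify which sentiment method was used, so the tiny word lists, the scoring formula, the sample articles, and the threshold of -0.5 are all assumptions for illustration.

```python
from collections import defaultdict

# Hypothetical mini-lexicon; a real analysis would use a full sentiment
# lexicon or a trained model suited to the language of the press articles.
POSITIVE = {"peace", "success", "solidarity"}
NEGATIVE = {"conflict", "corruption", "violence"}


def polarity(text):
    """Score a text in [-1, 1]: (pos - neg) / (pos + neg), or 0.0 if no hits."""
    words = text.lower().split()
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    return (pos - neg) / (pos + neg) if pos + neg else 0.0


def mean_polarity_by_year(articles):
    """Average the polarity of (year, text) pairs for each year."""
    scores = defaultdict(list)
    for year, text in articles:
        scores[year].append(polarity(text))
    return {year: sum(v) / len(v) for year, v in sorted(scores.items())}


# Hypothetical sample articles, not taken from the corpus.
ARTICLES = [
    (2001, "conflict and violence at the mosque"),
    (2001, "corruption reported"),
    (2002, "solidarity and peace after conflict"),
    (2003, "the minister visited the city"),
]

by_year = mean_polarity_by_year(ARTICLES)
# Years at or below an arbitrary threshold are candidates for close reading.
flagged = [year for year, score in by_year.items() if score <= -0.5]
print(by_year, flagged)
```

The flagged years play the role of the negative spikes on the graph: they do not explain anything by themselves, but they point to where close reading should start.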

You can also do topic-related sentiment. Sentiment towards certain issues, Islamic associations, or Muslim leaders may become more positive or negative in response to specific events or wider social changes. So here are some examples with the hajj, the pilgrimage to Mecca, and you can see, interestingly, that it's quite often very negative when it's about the hajj. You see those spikes, nearly minus one, which is the most negative. But it makes sense, because in both countries the organization of the pilgrimage to Mecca has often been a total disaster, with poor conditions for pilgrims, corruption, and embezzlement. So it makes sense that the tone of these articles was rather negative.

I also did it for the word imam. And also, interestingly, you can see those very negative spikes in several periods, and the same for Burkina Faso. And why, you might ask? In both countries, the imamate has consistently proved to be the main source of discord within the Muslim community. In some cases, the disputes have turned violent and the mosques involved have been closed by the authorities. So this might also explain why the tone was very negative or mostly negative over the years.

So if we go down, obviously with terrorism, it's negative. And I will skip a few. You see laïcité, secularism. Again, what happened here? You can use those methods to really highlight some specific years and then go into the corpus to do some close reading. And I want to finish this one with a specific association. You see the Mouvement Sunnite. The Mouvement Sunnite is the main Salafi association in Burkina Faso. In the 1990s and early 2000s, serious internal disagreements led to a shooting in a mosque, which left several injured and even one dead, if I remember correctly, and led to the closure of the mosque. And as you can see, in the 1990s and early 2000s, it was indeed very, very negative. So then again, the result that we got from the sentiment analysis makes sense.

And there's one last one. I created a heat map for the newspaper articles from Burkina Faso. This heat map shows how often different locations are mentioned in the articles. The visualization uses a color gradient to indicate the relative frequency of mentions, with areas of heightened focus appearing as hot spots through more intense coloring. You can zoom in and zoom out. Obviously cities in Burkina Faso feature more prominently in the newspaper articles, but you can also see that Saudi Arabia here and also Palestine are mentioned quite frequently in the corpus of newspaper articles.
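As a rough sketch of the data preparation behind such a heat map, and not the code used on the website, the example below counts place mentions and scales them to a relative intensity between 0 and 1. The small gazetteer with approximate coordinates and the sample sentences are hypothetical.

```python
from collections import Counter

# Hypothetical gazetteer: place name -> approximate (latitude, longitude).
GAZETTEER = {
    "ouagadougou": (12.37, -1.52),
    "bobo-dioulasso": (11.18, -4.30),
    "mecca": (21.39, 39.86),
}

# Hypothetical sample sentences, not taken from the corpus.
TEXTS = [
    "Pilgrims left Ouagadougou for Mecca.",
    "A new mosque opened in Ouagadougou.",
    "The imam of Bobo-Dioulasso spoke.",
]


def heat_points(texts, gazetteer):
    """Return [(lat, lon, weight)] with weight = mentions / highest mention count."""
    mentions = Counter()
    for text in texts:
        lowered = text.lower()
        for place in gazetteer:
            # Naive substring matching; real place-name detection needs more care.
            mentions[place] += lowered.count(place)
    peak = max(mentions.values(), default=0)
    if peak == 0:
        return []
    return [(*gazetteer[p], n / peak) for p, n in mentions.items() if n > 0]


points = heat_points(TEXTS, GAZETTEER)
for lat, lon, weight in points:
    print(lat, lon, weight)
```

A mapping library can then draw each point with a color intensity proportional to its weight, so the most mentioned place appears as the hottest spot.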

There's even more, but I think I will stop there. You can go to the website to see for yourself. I know it's a lot to digest and we could have spent several minutes on each visualization, but I hope that this has given you an idea of the wide range of computational analyses that we can perform on the data from the Islam West Africa Collection.