Temporal analysis can reveal important trends and patterns in the corpus over time, showing how certain themes or topics have gained or lost prominence.
Top 10 most frequent keywords
The graphs presented here show the yearly frequency of the top ten most frequent keywords in each corpus, as derived from the Dublin Core Subject metadata. This visualisation offers insights into the dynamics of keyword prevalence, highlighting three main trends:
- Variability: The frequency of some keywords remains relatively stable across years, while others show marked fluctuations. This variability may reflect changes in the focus of media coverage or shifts in public interest.
- Dominance: Certain keywords appear consistently more often than others, underlining their continued relevance or prominence within the corpus. This dominance suggests that these issues are central to the discourse captured in the dataset.
- Temporal shifts: Notable spikes or dips in the frequency of certain keywords in particular years may signal broader socio-political or cultural changes that influence discourse. These temporal shifts provide valuable clues to understanding the context and evolution of the topics discussed.
It is important to note that the attribution of these keywords was done manually and the selection is not exhaustive. Therefore, the analysis presented here should be seen as an initial overview that deserves further, more detailed examination to fully appreciate the complexities and subtleties of the dataset.
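The frequency computation behind these graphs can be sketched in miniature. Assuming each article reduces to a (year, subject) pair extracted from its Dublin Core metadata (the records below are invented for illustration, not drawn from the IWAC dataset), one counter over subjects selects the top keywords and a second over (year, keyword) pairs yields the yearly series that gets plotted:

```python
from collections import Counter

# Invented (year, subject) pairs standing in for the Dublin Core
# Subject/Date metadata extracted from each article.
records = [
    (2015, "Ramadan"), (2015, "Ramadan"), (2015, "Aïd al-Adha"),
    (2016, "Ramadan"), (2016, "Terrorism and radicalization"),
    (2017, "Terrorism and radicalization"), (2017, "Terrorism and radicalization"),
]

# Overall counts decide which keywords make the top-N cut ...
top_keywords = [kw for kw, _ in Counter(s for _, s in records).most_common(2)]

# ... and a second Counter over (year, keyword) gives the yearly frequencies.
yearly = Counter((year, subj) for year, subj in records if subj in top_keywords)

print(sorted(top_keywords))
print(yearly[(2017, "Terrorism and radicalization")])  # → 2
```

The same two-step shape (global top-N selection, then per-year grouping) is what the pandas code below performs at corpus scale.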
A comparative analysis of keyword frequency trends in Benin and Burkina Faso reveals both convergences and divergences that reflect the media landscape of each country. In both countries, Islamic holidays, in particular Aïd al-Adha, Ramadan and Aïd el-Fitr, emerge consistently in the media discourse, suggesting sustained coverage of these observances. The observed variability in keyword frequencies in both datasets may signal shifts in societal interests, media narratives, or the impact of external events in the region.
However, the data also highlight different emphases within each country's media focus. Burkina Faso's corpus specifically highlights the figure of Oumarou Kanazoé and themes of terrorism, with a marked escalation in the frequency of the keyword "Terrorism and radicalization" in recent years. This increase suggests that such events have had a significant impact on public discourse. Interestingly for Benin, despite growing concerns about the infiltration of jihadist movements in the Gulf of Guinea region, this issue is not prevalent in the Beninese data.
Furthermore, the presence of terms such as Association des Élèves et Étudiants Musulmans au Burkina and Islamic faith-based education in the Burkina Faso graph underlines a strong interest in Islamic education and Islamic organisations. This aspect is less pronounced in Benin's discourse, suggesting possible differences in the social and educational structures relating to Islamic communities in the two countries.
Such differences and patterns are not only indicative of different national narratives, but also of the different degrees of emphasis given to certain social, political and religious issues in the media of each country.
Python code
To ensure the transparency and reproducibility of our analysis, the following Python code snippet demonstrates how we interacted with the IWAC API to retrieve the necessary data and perform the keyword frequency analysis.
```python
import requests
import pandas as pd
import plotly.graph_objs as go
from tqdm.auto import tqdm
from collections import Counter
from concurrent.futures import ThreadPoolExecutor, as_completed

# Function to fetch all pages of a single item set
def fetch_data(api_url, item_set_id):
    page = 1
    items = []
    while True:
        response = requests.get(f"{api_url}/items", params={"item_set_id": item_set_id, "page": page})
        data = response.json()
        if data:
            items.extend(data)
            page += 1
        else:
            break
    return items

# Function to fetch and process data for all item sets in a country
def fetch_and_process_data(api_url, item_sets):
    all_items = []
    # Use ThreadPoolExecutor to parallelise requests
    with ThreadPoolExecutor(max_workers=5) as executor:
        future_to_id = {executor.submit(fetch_data, api_url, set_id): set_id for set_id in item_sets}
        for future in as_completed(future_to_id):
            all_items.extend(future.result())
    # Process items to extract subjects and date
    processed_data = []
    for item in all_items:
        subjects = [sub['display_title'] for sub in item.get('dcterms:subject', []) if sub.get('display_title')]
        date = item.get('dcterms:date', [{}])[0].get('@value')
        for subject in subjects:
            processed_data.append({
                'Subject': subject,
                'Date': pd.to_datetime(date, errors='coerce')
            })
    return pd.DataFrame(processed_data)

# Function to create an interactive keyword graph for each country
def create_interactive_keyword_graph(df, country, output_filename):
    top_keywords = Counter(df['Subject']).most_common(10)
    top_keywords = [keyword for keyword, count in top_keywords]
    df_top_keywords = df[df['Subject'].isin(top_keywords)]
    df_grouped = df_top_keywords.groupby(
        [df_top_keywords['Date'].dt.year, 'Subject']).size().reset_index(name='Frequency')
    fig = go.Figure()
    for keyword in top_keywords:
        df_keyword = df_grouped[df_grouped['Subject'] == keyword]
        fig.add_trace(go.Scatter(
            # The grouped 'Date' column holds plain year numbers; convert them
            # back to datetimes so the "date"-type axis renders them correctly
            x=pd.to_datetime(df_keyword['Date'].astype(int).astype(str), format='%Y'),
            y=df_keyword['Frequency'],
            mode='lines+markers',
            name=keyword
        ))
    fig.update_layout(
        title=f"Annual Frequency of Top 10 Keywords in {country}",
        xaxis=dict(title="Year", rangeslider=dict(visible=True), type="date"),
        yaxis=dict(title="Frequency")
    )
    fig.write_html(f"{output_filename}_{country}.html", full_html=True, include_plotlyjs='cdn')

# Example usage
api_url = "https://iwac.frederickmadore.com/api"
country_item_sets = {
    "Bénin": ["2187", "2188", "2189"],
    "Burkina Faso": ["2200", "2215", "2214", "2207", "2201"]
}

# Process and create graphs for each country
for country, item_sets in country_item_sets.items():
    df = fetch_and_process_data(api_url, item_sets)
    create_interactive_keyword_graph(df, country, "top_keywords_graph")
    tqdm.write(f"Interactive graph has been created for {country}.")
```
Multiple keyword comparison
These graphs show the annual frequencies of selected keywords (topics, Islamic associations and Muslim leaders), allowing for comparative analysis.
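Unlike the top-ten analysis, this comparison filters each article's subject entries against a fixed set of keyword IDs before counting. A minimal sketch of that filtering step, using invented item records shaped like the Omeka S JSON the script below consumes (the IDs and titles here are placeholders, not real IWAC identifiers):

```python
# Invented items mimicking the 'dcterms:subject' structure returned by
# the API; resource IDs and titles are placeholders for illustration.
items = [
    {"dcterms:subject": [
        {"value_resource_id": 898, "display_title": "Ramadan"},
        {"value_resource_id": 500, "display_title": "Elections"},
    ]},
    {"dcterms:subject": [
        {"value_resource_id": 861, "display_title": "Aïd el-Fitr"},
    ]},
]

selected_ids = {898, 861}  # the keywords chosen for comparison

# Keep only subjects whose resource ID is in the selected set
matches = [
    sub["display_title"]
    for item in items
    for sub in item.get("dcterms:subject", [])
    if sub.get("value_resource_id") in selected_ids
]
print(matches)  # → ['Ramadan', 'Aïd el-Fitr']
```

Matching on numeric resource IDs rather than display titles avoids missing records when the same keyword is labelled inconsistently.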
Python code
```python
import requests
import pandas as pd
import plotly.graph_objs as go
from tqdm.auto import tqdm
from concurrent.futures import ThreadPoolExecutor, as_completed

# Fetch all pages of a single item set
def fetch_data(api_url, item_set_id):
    page = 1
    items = []
    while True:
        response = requests.get(f"{api_url}/items", params={"item_set_id": item_set_id, "page": page})
        data = response.json()
        if data:
            items.extend(data)
            page += 1
        else:
            break
    return items

# Fetch the display title of a single keyword record
def fetch_title_for_id(api_url, keyword_id):
    response = requests.get(f"{api_url}/items/{keyword_id}")
    data = response.json()
    return data.get('dcterms:title', [{}])[0].get('@value', 'Unknown Title')

def fetch_and_process_data(api_url, item_sets, selected_keyword_ids):
    all_items = []
    with ThreadPoolExecutor(max_workers=5) as executor:
        future_to_id = {executor.submit(fetch_data, api_url, set_id): set_id for set_id in item_sets}
        for future in tqdm(as_completed(future_to_id), total=len(item_sets), desc="Fetching item sets"):
            all_items.extend(future.result())
    processed_data = []
    selected_keyword_ids_set = set(map(int, selected_keyword_ids))  # Integer set for faster lookup
    for item in tqdm(all_items, desc="Processing items"):
        subjects = item.get('dcterms:subject', [])
        date = item.get('dcterms:date', [{}])[0].get('@value')
        for subject in subjects:
            if subject.get('value_resource_id') in selected_keyword_ids_set:
                processed_data.append({
                    'Subject': subject['display_title'],
                    'Date': pd.to_datetime(date, errors='coerce'),
                    'ID': subject['value_resource_id']
                })
    return pd.DataFrame(processed_data)

def create_interactive_keyword_graph(api_url, df, selected_keyword_ids, output_filename):
    if df.empty:
        print("No data available for the selected keyword IDs.")
        return
    # Fetch titles for the keywords
    keyword_titles = {str(kw_id): fetch_title_for_id(api_url, kw_id)
                      for kw_id in tqdm(selected_keyword_ids, desc="Fetching titles")}
    df_grouped = df.groupby([df['Date'].dt.year, 'Subject', 'ID']).size().reset_index(name='Frequency')
    fig = go.Figure()
    for keyword_id in selected_keyword_ids:
        if keyword_id in df['ID'].astype(str).unique():
            subject_title = keyword_titles[keyword_id]
            df_keyword = df_grouped[df_grouped['ID'] == int(keyword_id)]
            fig.add_trace(go.Scatter(
                # Convert the grouped year numbers back to datetimes for the "date"-type axis
                x=pd.to_datetime(df_keyword['Date'].astype(int).astype(str), format='%Y'),
                y=df_keyword['Frequency'],
                mode='lines+markers',
                name=subject_title  # Use the fetched title as the trace name
            ))
        else:
            print(f"No data found for ID {keyword_id}. Skipping this ID.")
    fig.update_layout(
        title="Annual Frequency of Selected Muslim Leaders in Burkina Faso",
        xaxis=dict(title="Year", rangeslider=dict(visible=True), type="date"),
        yaxis=dict(title="Frequency"),
        legend_title="Keyword Title"
    )
    fig.write_html(f"{output_filename}.html", full_html=True, include_plotlyjs='cdn')
    print(f"Interactive graph has been created. File saved as '{output_filename}.html'")

# Example usage
api_url = "https://iwac.frederickmadore.com/api"
all_item_sets = ["2200", "2215", "2214", "2207", "2201"]
selected_keyword_ids = ["898", "861", "944", "960", "947", "855", "1102", "1053", "912"]

df = fetch_and_process_data(api_url, all_item_sets, selected_keyword_ids)
create_interactive_keyword_graph(api_url, df, selected_keyword_ids, "selected_keywords_graph")
```