Dashboard of transmedia indicators to analyze French televisual archives.
role
Data visualization, data analysis, full-stack development
conference
Ninon Lizé Masclef (2019). De Charlie au Bataclan : retour sur la médiatisation des attentats de 2015. INA, La Revue des médias.
keywords
data visualization, big data, cloud computing, research, media history
The French National Audiovisual Institute (INA) is the largest repository of audiovisual archives in France. It gathers heterogeneous data from TV, radio, press and web archives, and stores 18 million hours of recordings and documentation of programs since 1950. Its mission is to preserve and promote France's audiovisual heritage. When I started working at the archive center, it was undergoing a massive transformation of its data structure: compartmentalized data silos were being turned into a unified data lake, a move aimed at putting metadata at the heart of the data architecture.






Every month, media historians and documentalists publish thematic analyses of the media in a bulletin called InaStat. So far, they have mostly studied TV news, a program known to be well described in the database; other programs were rarely studied. My goal was twofold: to take advantage of the new data lake architecture, which opens new possibilities for archival analysis, by studying metadata that had so far been little used; and to develop new indicators to analyze the structure and content of television programs since 1950. This required collaboration between two teams that do not usually communicate with each other, historians and engineers, to create an interdisciplinary tool that would benefit both.

The InaStat bulletin


To expand the scope of media analysis, the first step was to define the scope of usable metadata and quantify its relevance. I then created a datamart with BigQuery and a visual interface: I designed and developed both the front end and back end of the "Baromètre+" REST API to automate reports and visualize them in an interactive dashboard. I acquired specialized skills in data visualization by comparing several tools: D3.js, Gephi, sigma.js, Google Data Studio and Tableau. My favorite data visualization library is D3.js, which I used extensively to work with multidimensional data and create custom interactive visualizations.
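
As a rough illustration of how such reports can be automated, the sketch below shows what one back-end endpoint of this kind could look like, assuming a Node/Express server and Google's @google-cloud/bigquery client. The route, dataset, table and column names (ina_datamart.programs, broadcast_date, genre, channel) are hypothetical, not the actual Baromètre+ schema.

```typescript
// Hypothetical report endpoint: yearly program counts per genre for a channel.
// Assumes a Node/Express back end and Google application-default credentials;
// the dataset, table and column names are illustrative, not INA's real schema.
import express from "express";
import { BigQuery } from "@google-cloud/bigquery";

const app = express();
const bigquery = new BigQuery();

app.get("/api/genres/:channel", async (req, res) => {
  const query = `
    SELECT EXTRACT(YEAR FROM broadcast_date) AS year,
           genre,
           COUNT(*) AS programs
    FROM \`ina_datamart.programs\`
    WHERE channel = @channel
    GROUP BY year, genre
    ORDER BY year`;
  const [rows] = await bigquery.query({
    query,
    params: { channel: req.params.channel }, // named parameter, not string concatenation
  });
  res.json(rows); // consumed by the D3.js dashboard
});

app.listen(3000);
```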
One data visualization allowed me to analyze the structure of TV grids across channels. It shows program genre (TV news, series, talk show, etc.), audience share, and production type (self-produced or purchased content). It reveals the strategies of channels and media groups (e.g. Lagardère and France Télévisions). France 2, for example, prefers to produce its own content rather than buy it, unlike TF1, which produces very few original programs.


Weekly TV program grid for the first two national channels (TF1 and France 2) from September 2012 to June 2013: genre, type of production and average audience share of programs by broadcast time slot.
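
A minimal sketch of how such a grid view can be drawn with D3.js, assuming the broadcast slots have already been fetched as objects carrying channel, time slot, genre, audience share and production type. The field names and the #grid SVG element are illustrative assumptions, not the actual Baromètre+ code.

```typescript
// Illustrative grid view: channels on the x axis, time slots on the y axis,
// color for genre, opacity for audience share, outline for in-house production.
import * as d3 from "d3";

interface Slot {
  channel: string;
  slot: string;      // e.g. "20:00-20:45"
  genre: string;
  audience: number;  // audience share, 0-1
  inHouse: boolean;  // self-produced vs. purchased
}

function drawGrid(slots: Slot[]): void {
  const channels = [...new Set(slots.map((d) => d.channel))];
  const timeSlots = [...new Set(slots.map((d) => d.slot))];
  const genres = [...new Set(slots.map((d) => d.genre))];

  const x = d3.scaleBand<string>().domain(channels).range([60, 560]).padding(0.1);
  const y = d3.scaleBand<string>().domain(timeSlots).range([20, 580]).padding(0.05);
  const color = d3.scaleOrdinal<string, string>(d3.schemeTableau10).domain(genres);

  d3.select("#grid")            // an existing <svg id="grid"> element
    .selectAll("rect")
    .data(slots)
    .join("rect")
    .attr("x", (d) => x(d.channel)!)
    .attr("y", (d) => y(d.slot)!)
    .attr("width", x.bandwidth())
    .attr("height", y.bandwidth())
    .attr("fill", (d) => color(d.genre))
    .attr("fill-opacity", (d) => 0.3 + 0.7 * d.audience)
    .attr("stroke", (d) => (d.inHouse ? "black" : "none"));
}
```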


Among the various indicators I developed with media historians is a temporal word cloud, which illustrates the evolution of the vocabulary used in a television program. In this data visualization, the time axis marks each year a given program was broadcast, and each word is linked to its years of use on the axis. Force-directed algorithms (Fruchterman-Reingold, ForceAtlas2) were applied to the graph, gathering in the center the transversal words used throughout the program's run and pushing less frequent words to the sides. Since "Joséphine, ange gardien" is a TV series broadcast over a long period of time, its analysis shows how the vocabulary changed over time. A word like "secte" (sect) appeared at one point and quickly disappeared, while others, such as "homosexualité" (homosexuality), entered the public vocabulary more slowly. Surprisingly, the vocabulary of a soap opera turns out to be an indicator of French social issues.


Temporal word cloud for the TV series « Joséphine, ange gardien » from 1997 to 2015
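
The sketch below illustrates the layout idea with d3-force as a stand-in for the ForceAtlas2 and Fruchterman-Reingold implementations actually used: year nodes are pinned along a time axis and each word is attracted to the years it appears in, so transversal words settle near the middle. The word/year occurrences shown are a tiny illustrative subset, not the real corpus.

```typescript
// Temporal word cloud layout idea, using d3-force as a stand-in for the
// ForceAtlas2 / Fruchterman-Reingold layouts used in the project.
import * as d3 from "d3";

type GraphNode = d3.SimulationNodeDatum & { id: string };

// Year nodes are pinned along a horizontal time axis (fx/fy fix their position).
const years: GraphNode[] = d3.range(1997, 2016).map((year, i) => ({
  id: String(year),
  fx: 60 + i * 40,
  fy: 300,
}));

// Illustrative occurrences only: word -> years in which it appears in the program's keywords.
const occurrences: Record<string, number[]> = {
  secte: [2001, 2002],
  homosexualité: [2004, 2008, 2012, 2015],
};
const words: GraphNode[] = Object.keys(occurrences).map((id) => ({ id }));
const links = Object.entries(occurrences).flatMap(([word, ys]) =>
  ys.map((year) => ({ source: word, target: String(year) }))
);

// Words linked to many years are pulled toward the middle of the axis,
// while words tied to a single year settle off to the side, near that year.
d3.forceSimulation<GraphNode>([...years, ...words])
  .force("link", d3.forceLink(links).id((d: any) => d.id).strength(0.5))
  .force("charge", d3.forceManyBody().strength(-30))
  .on("end", () => {
    for (const w of words) {
      console.log(w.id, Math.round(w.x ?? 0), Math.round(w.y ?? 0));
    }
  });
```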


From archive to data: toward a documentary expertise of metadata

The Baromètre+ application is not designed to automate human judgment and interpretation; it is designed to create a new hybrid expertise between computer science and documentation. This interaction between the humanities and data science will undoubtedly play an increasingly important role in historiography and archiving, both reconfigured by technology. The tool makes it possible to process a larger set of data and to use other types of information to analyze media history, allowing studies on a larger time scale. My study has shown that metadata (data about the data) are as important to a media study as the data themselves.







The data architect Gautier Poupeau and I were invited to meet with the Business Application Platform Lead at Google Cloud to discuss the prospects of cloud computing and our experience with BigQuery.






Finally, my work led to the publication of a study on the media coverage of the Charlie Hebdo terrorist attack. The aim was to compare the media coverage of this attack (January 2015) with that of the Bataclan attack (November 2015). The survey was commissioned by the French Research Center for the Study and Observation of Living Conditions (CREDOC). What was particularly interesting about this study was finding a way to identify programs that talked about a specific political event. Since there was no concept such as "Charlie Hebdo terrorist attack" in the metadata, I had to build a semantic field revolving around this historical event: names of public figures and places, and specific descriptive keywords. I also highlighted the strategies TV channels adopted to cover such an event and how the programming grid was restructured: some channels invited religious experts, political figures or relatives of the victims, while others preferred law enforcement officials. The article is available online on La Revue des médias.
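
A simplified sketch of that idea: a program is flagged as related to the January 2015 attacks when its descriptive metadata matches a curated semantic field. The terms and the helper function below are illustrative only; the actual study relied on a much larger, manually curated list.

```typescript
// Flag a program as related to the January 2015 attacks when its descriptive
// metadata matches a curated semantic field (people, places, keywords).
// This list is a small illustrative subset, not the one built for the study.
const semanticField: string[] = [
  "charlie hebdo",
  "hyper cacher",
  "kouachi",
  "coulibaly",
  "je suis charlie",
  "attentat",
  "terrorisme",
];

function mentionsJanuaryAttacks(description: string, keywords: string[]): boolean {
  const text = `${description} ${keywords.join(" ")}`.toLowerCase();
  return semanticField.some((term) => text.includes(term));
}

// Example:
// mentionsJanuaryAttacks("Édition spéciale : attentat contre Charlie Hebdo", ["terrorisme"]) // true
```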


Hourly airtime devoted to attacks and terrorism in France in 2015.