Posted on: 24 October 2018
By Edward Arnold, Assistant Professor in French and European Studies, Trinity College Dublin
A few months ago, the 14th International Textual Analysis Days (JADT) were held in Rome. This conference offers a fairly representative sample of computer-based text analysis, a field whose importance has grown steadily with the explosion of the Internet. Among its applications, one arouses sustained interest: identifying the author of a text that is anonymous or of doubtful origin. Simply reading a text is not enough to make this identification, even for a cultivated reader who knows the real author's works well. This is why literary history includes a very large number of works that are unattributed or whose attribution remains doubtful.
Elena Ferrante is the latest of these puzzles. The works published under this pseudonym are a worldwide success, especially L'Amie prodigieuse (My Brilliant Friend; four volumes published in French by Gallimard). But the author refuses to drop the mask.
At the JADT in Rome, two studies were presented with the aim of establishing the author's identity. For their part, researchers A. Tuzzi, M. Cortelazzo and J. Savoy used a large panel of 150 novels by about 40 contemporary authors, a number of them from Naples and its region, as would be the writer, man or woman, who is hiding behind E. Ferrante. In total, this "corpus" contains more than 10 million words. Beyond the possible identity of Ferrante, a panorama of contemporary Italian literary creation therefore emerges from these analyses.
The first point of interest of these two papers is that they use various author attribution methods, all of which identify the same pen behind the works published under the names Elena Ferrante and Domenico Starnone. These conclusions confirm the investigation by Claudio Gatti who, following the classic rule of asking "where does the money go?", identified Anita Raja as the recipient of the royalties paid by E. Ferrante's publisher.
Anita Raja is none other than the wife of Domenico Starnone. She may have served as a front for her husband, unless she also contributed to these novels. The chosen pseudonym is reminiscent of the name of Elsa Morante (1912-1985), who happens to have taken part in some works of her husband Alberto Moravia (1907-1990). As we can see, finding a single pen does not settle everything!
A. Tuzzi and M. Cortelazzo also note another interesting fact: the latest works published under Starnone's name come closer in style to those of Ferrante, as if the former had been contaminated by the latter. The same thing happened to Romain Gary who, while working on the Ajar books, could not help slipping "Ajarisms" into his official works.
The parallel with Gary does not stop there, since Domenico Starnone received the Italian equivalent of the Prix Goncourt (the Strega Prize) in 2001 (for Via Gemito) and E. Ferrante very nearly received it in 2015 (for the fourth volume of L'Amie prodigieuse).
Naturally, the analysis goes beyond simply identifying this pen in the shadows. A. Tuzzi, M. Cortelazzo and J. Savoy are aware that identifying the author leads to a second question: "What can this teach us about the author and the work?" For now, these researchers have identified a number of words and expressions common to Starnone and Ferrante that are not found (or are found significantly less often) in other contemporary authors. This would lay the groundwork for what should one day become "computer-assisted stylistics".
The second point of interest of these two studies is their conclusion that the most effective author attribution method is "intertextual distance", a method developed as a text classification tool. It consists of superimposing the texts in pairs and counting the differences between their respective vocabularies. Below a certain threshold, the two texts are taken to have the same author. The principle seems simple, but applying it requires a computing capacity beyond the reach of the human brain. For example, the calculation on the 150 novels used to identify Ferrante represents 11,175 comparisons (each comparable to a parallel reading of two novels), which the computer performs in a few seconds but which no reader could carry out, even by dedicating an entire life to it.
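To make the principle concrete, here is a minimal Python sketch of a pairwise vocabulary comparison. It is a simplified illustration, not the researchers' actual software: the naive tokenizer, the function names and the exact formula (half the sum of absolute differences between relative word frequencies) are assumptions made for this example. The last function simply checks the arithmetic behind the figure of 11,175 comparisons for 150 novels.

```python
from collections import Counter
from itertools import combinations
import re


def word_counts(text):
    """Count lowercase word tokens (a deliberately naive tokenizer, assumed here)."""
    return Counter(re.findall(r"\w+", text.lower()))


def intertextual_distance(text_a, text_b):
    """Half the sum of absolute differences between relative word frequencies.

    Simplified, illustrative formula: 0.0 means the same words in the same
    proportions, 1.0 means the two texts share no word at all.
    """
    counts_a, counts_b = word_counts(text_a), word_counts(text_b)
    total_a, total_b = sum(counts_a.values()), sum(counts_b.values())
    if total_a == 0 or total_b == 0:
        raise ValueError("both texts must contain at least one word")
    vocabulary = set(counts_a) | set(counts_b)
    return 0.5 * sum(
        abs(counts_a[w] / total_a - counts_b[w] / total_b) for w in vocabulary
    )


def pairwise_distances(texts):
    """Compare every unordered pair; `texts` maps a label to the text itself."""
    return {
        (a, b): intertextual_distance(texts[a], texts[b])
        for a, b in combinations(sorted(texts), 2)
    }


def number_of_comparisons(n_texts):
    """Number of unordered pairs among n texts: n * (n - 1) / 2."""
    return n_texts * (n_texts - 1) // 2


if __name__ == "__main__":
    print(number_of_comparisons(150))  # 150 * 149 / 2 = 11175
```

In practice, the threshold below which two texts are attributed to the same author would have to be calibrated on texts whose authors are known; no threshold value is given in the article, so none is assumed here.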
This method already has an important track record to its credit, and not only in the literary field. Two recent examples illustrate its solidity and the true scope of this research.
In 2015, it allowed Cyril Labbé, a lecturer and researcher at the University of Grenoble, to discover that the catalogues of two of the world's largest scientific publishers contained more than 120 fake scientific articles. The software that detected these "fake papers" uses intertextual distance. The world's leading scientific publishers now systematically submit articles in the fields of computer science and electronics to it.
In 2017, Jennifer Byrne, an Australian biologist, was named one of the ten scientists of the year by Nature for having detected about fifty cases of scientific fraud in cancer research. She made these discoveries using intertextual distance adapted to DNA sequences. Again, this software, developed by the same researcher from the University of Grenoble, is used by hundreds of geneticists around the world.
Beyond its anecdotal aspects, the Ferrante affair may help researchers in the social sciences and humanities (SSH) to understand, after scientific publishers, statisticians, computer scientists and biologists, all that applied mathematics and computer science can bring to the study of texts.
This article was originally published in French in The Conversation.