Voyant Tools is a free, web-based reading and text analysis environment for large digital collections. Stefan Sinclair and Geoffrey Rockwell of the University of Alberta developed the project with support from the Canadian Institute for Research Computing in the Arts, the Social Sciences and Humanities Research Council, and the Canada Foundation for Innovation, and it was designed specifically for the digital humanities. Humanists, such as historians, linguists, and literary scholars, can perform lexical analysis on large collections of work without opening a single volume. Statistics such as relative word frequency and word distribution reveal lexical patterns across large collections that a human scholar simply could not detect by reading alone. After a brief tutorial, any humanist can access Voyant, upload a collection, and use textual patterns to identify which documents best answer a historical question.
Getting started with Voyant is extremely easy; however, it works best under specific conditions. Like most text mining tools, Voyant needs a large collection of data. It is possible to input a single URL or a few PDFs, but the results would reveal few patterns and support no large lexical comparisons. Within those limits, Voyant is simple to operate. It reads any combination of plain text, HTML, XML, PDF, RTF, and Microsoft Word files (if you want the documents in a specific order, upload them in that order). Voyant also includes two built-in collections, the Humanist Listserv Archives and Shakespeare’s Plays. These large collections show Voyant’s greatest strength: the ability to analyze vast amounts of textual data both within and between individual texts.
I recommend selecting Shakespeare’s Works when first encountering Voyant. This corpus lets the beginner select words within individual documents or across the entire corpus and visualize their frequency relative to other terms, which is far more powerful than a simple keyword search. Because the default interface is complex and combines several analytic tools, it is best to experiment with each of them and consider which historical or linguistic questions their analytic models can realistically answer.
In fact, the default interface runs eight different tools in concert. The left side of the screen features Cirrus, Summary, and “Words in the Entire Corpus.” Cirrus creates a word cloud from a single document or the entire corpus, while Summary and “Words in the Entire Corpus” report the total number of words, their frequencies, and the work with the densest vocabulary. By default, Voyant includes stop words, such as “the,” “and,” and “it,” in its results. To filter out these over-represented words, click the options button on the toolbar and select the “English” stop word list. To apply the list across every tool, select “Apply Stop Words Globally.”
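Voyant performs these calculations without any coding, but for readers curious about what the Summary and word-frequency panels are counting, here is a minimal Python sketch under simplifying assumptions. The tokenizer (lower-cased runs of letters and apostrophes), the short stop-word list, and the `summarize` helper are hypothetical illustrations, not Voyant’s own implementation, and the sketch treats vocabulary density as the ratio of distinct words (types) to total words (tokens).

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; real stop lists are much longer.
STOP_WORDS = {"the", "and", "it", "a", "of", "to", "is", "in"}

def summarize(text, stop_words=frozenset()):
    """Count tokens and types, compute vocabulary density, and tally words."""
    # Simplified tokenizer: lower-cased runs of letters and apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    types = set(tokens)
    # Vocabulary density taken here as types / tokens.
    density = len(types) / len(tokens) if tokens else 0.0
    # Stop words are removed from the frequency list only, not from the totals.
    counts = Counter(t for t in tokens if t not in stop_words)
    return {"tokens": len(tokens), "types": len(types),
            "density": density, "counts": counts}

summary = summarize("To be, or not to be, that is the question.", STOP_WORDS)
print(summary["tokens"], summary["types"], summary["density"])
print(summary["counts"].most_common(3))
```

On Hamlet’s line this reports 10 tokens, 8 types, and a density of 0.8; once the stop words are filtered out, the most frequent remaining word is “be.” Keeping stop words in the totals while dropping them from the frequency list is one design choice among several.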
The middle panel features the Corpus and Corpus Reader. This is the most intuitive part of Voyant, as it resembles a Zotero database. One can either select a single document or scroll through the entire corpus. When a specific work is selected, it is highlighted on the Corpus Reader’s scroll bar; when a single word is searched, every work containing that term is highlighted there as well. One can even hover the cursor over any word in the Corpus Reader to see its frequency.
However, one cannot compare magnitudes of frequency or measure relative word frequencies without the three panels on the right side of the screen. Here, an experienced historian can use Word Trends, Keywords in Context, and Words in Documents to understand how word frequencies change across the corpus and how each word is used in context. One can select two or more terms and compare their relative frequencies in a single work or across the entire corpus.
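As a rough illustration of what Word Trends and Keywords in Context show, the Python sketch below computes a term’s relative frequency in each document and lists every occurrence with a few words of surrounding context. It is an assumption-laden sketch, not Voyant’s code: the `relative_frequency` and `kwic` helpers are hypothetical, the scale of occurrences per 10,000 words is an arbitrary choice, and the two-document corpus is a pair of short Shakespeare excerpts used only as toy input.

```python
import re

def tokenize(text):
    # Simplified tokenizer: lower-cased runs of letters and apostrophes.
    return re.findall(r"[a-z']+", text.lower())

def relative_frequency(corpus, term, per=10_000):
    """Occurrences of a lower-case `term` per `per` words in each document."""
    result = {}
    for title, text in corpus.items():
        tokens = tokenize(text)
        result[title] = tokens.count(term) * per / len(tokens) if tokens else 0.0
    return result

def kwic(text, term, window=3):
    """List (left context, keyword, right context) for each occurrence of `term`."""
    tokens = tokenize(text)
    hits = []
    for i, token in enumerate(tokens):
        if token == term:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            hits.append((left, token, right))
    return hits

# Toy corpus of two short Shakespeare excerpts.
corpus = {
    "Hamlet (excerpt)": "To be, or not to be, that is the question.",
    "Macbeth (excerpt)": "Tomorrow, and tomorrow, and tomorrow, "
                         "creeps in this petty pace from day to day.",
}

print(relative_frequency(corpus, "to"))
for left, word, right in kwic(corpus["Hamlet (excerpt)"], "be", window=2):
    print(f"{left:>10} [{word}] {right}")
```

Here “to” occurs twice in the 10-word Hamlet line (2,000 per 10,000 words) and once in the 14-word Macbeth line (about 714 per 10,000 words). Normalizing by document length is what makes documents of different sizes comparable.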
Once one has uploaded a collection and highlighted word trends and frequencies, there are several ways to export the analysis. The easiest is to export a temporary URL, which is useful for bookmarking or sharing with a colleague. (There is some risk that the corpus will become unavailable, but it should not be lost if it is accessed regularly within a two-week window.) One can also export a code snippet and paste it into one’s own web content, bringing a live Voyant tool into one’s own work that readers can manipulate. Finally, one can export results as raw data, such as XML, or as static images and finish the work in a more suitable application. All of these options can make Voyant findings collaborative and interactive.
However, while Voyant excels in application and adaptability, it is sorely lacking in customization. Voyant’s display is rigid and unforgiving: one must adjust settings on multiple tools for the whole interface to reflect the same results. It is also difficult to discard a collection and return to the initial page. One should therefore come to Voyant knowing exactly what historical question to ask.
Furthermore, Voyant offers a number of supplemental tools that are not accessible from the default interface. Tools such as Bubblelines, Bubbles, ScatterPlot, and Links (a force-directed graph) offer distinctive visualizations or display correspondence analysis. Correspondence analysis is especially important to the historian, since it shows how words cluster around a given set of documents. To use these advanced tools, one must export the corpus URL and then append the desired tool’s short code. This step is not especially difficult, but it puts complex data analysis out of reach for the first-time user. Upon entering the default page, one should be able to view all available functions and activate or deactivate them according to the type of inquiry; instead, one must reach the “skin builder” page through the export icon. Voyant could also offer a number of pre-built packages, each with a description of the kinds of projects its tools could complement. This would both improve the user’s experience and give Voyant’s developers helpful feedback about which tools work best for a given discipline.
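Correspondence analysis itself can be sketched in a few lines of NumPy. The version below is the textbook formulation (a singular value decomposition of the standardized residuals of a documents-by-words count table), not necessarily how Voyant’s own tools compute it; the `correspondence_analysis` helper, the document and word labels, and the counts are all invented purely for illustration. Documents and words that land near each other on the first two dimensions tend to be associated.

```python
import numpy as np

def correspondence_analysis(counts):
    """Correspondence analysis of a documents-by-words table of raw counts.

    Returns principal coordinates for the rows (documents) and columns (words),
    plus the principal inertia (squared singular value) of each dimension.
    Every row and column must contain at least one nonzero count.
    """
    N = np.asarray(counts, dtype=float)
    P = N / N.sum()                      # correspondence matrix (proportions)
    r = P.sum(axis=1)                    # row masses
    c = P.sum(axis=0)                    # column masses
    expected = np.outer(r, c)            # proportions expected under independence
    # Standardized residuals: D_r^(-1/2) (P - r c^T) D_c^(-1/2)
    S = (P - expected) / np.sqrt(expected)
    U, sigma, Vt = np.linalg.svd(S, full_matrices=False)
    rows = (U * sigma) / np.sqrt(r)[:, None]     # D_r^(-1/2) U Sigma
    cols = (Vt.T * sigma) / np.sqrt(c)[:, None]  # D_c^(-1/2) V Sigma
    return rows, cols, sigma ** 2

# Invented toy counts: three hypothetical documents by three words.
docs = ["doc_a", "doc_b", "doc_c"]
words = ["king", "war", "love"]
counts = [[10, 2, 1],
          [3, 8, 2],
          [1, 2, 9]]

rows, cols, inertia = correspondence_analysis(counts)
for label, (x, y) in zip(docs + words, np.vstack([rows, cols])[:, :2]):
    print(f"{label:>6}: ({x:+.3f}, {y:+.3f})")
print("share of inertia on first two dimensions:", inertia[:2].sum() / inertia.sum())
```

The total inertia equals the table’s chi-square statistic divided by its grand total, so the share printed at the end indicates how much of the association between documents and words the two-dimensional plot captures.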
Nevertheless, Voyant is a relatively easy tool that has contributed greatly to digital history projects large and small. One project, Datamining with Criminal Intent, uses Voyant to text mine the extensive Old Bailey Criminal Court Records. Spanning 1674 to 1913, the Old Bailey records contain more than 197,000 documents. By exporting up to one hundred of these digital files at a time, Criminal Intent can quickly identify trends in prisoner age, occupation, or gender, and one can also trace the range of offences and punishments over time. On a smaller scale, Lincoln Logarithms: Finding Memory in Sermons compares several sermons delivered after Lincoln’s death on April 15, 1865. In this blog, the northernmost and southernmost sermons were selected to analyze and compare the frequencies of “slavery” and “peace.” The author exported the Word Trends and Keywords in Context tools not only to highlight the difference in word frequencies but also to reveal a great difference in connotation.
Ultimately, Voyant offers easy-to-use tools that greatly aid lexical analysis of digital collections large and small; however, one must approach the application already aware of which historical trends to interrogate. Voyant is neither the most intuitive nor the most forgiving set of analytical tools, and it often requires patience, several applications, and a large digital text collection. Still, Voyant rewards the creativity of the user. Digital historians with little knowledge of coding or topic modeling can easily track shifts in tokens, types, and word frequencies. Historians can trace lexical changes within one author’s corpus or across broad spans of time, and account for geographical differences. Though Voyant takes some time to get used to, it is very much worth the effort.