Number rules the universe.


The man credited with discovering that the square of every right triangle’s hypotenuse equals the sum of the squares of its legs surely has some credit to his name in the world of mathematics, but might his words carry weight in the historical arena as well? Pythagoras was, of course, more of a philosopher and general thinker than an all-out math geek; thus, I’d argue his claim is more than a mere declaration of love for arithmetic. Numbers are everywhere we look, including in the humanities: it’s merely a matter of whether we choose to look at them and, if we do, how.

Pythagoras, ancient Greek philosopher and mathematician.

Just as any set of primary sources is rich in physical description, emotional characterization, cultural depiction, and societal personification on a holistic level, it is also rich in numbers on a fundamental level: word frequencies, correlations, collocations, patterns, and trends are built into any piece of text we wish to analyze. While close reading is often the historian’s most valuable tool, the rapid increase in computing power, the growth of open-source collaboration, and the rise of natural language processing research over the last decade have changed the game, adding another, contrasting implement to the historian’s toolbox: “far reading.”

“Far reading,” the strategy of analysis in which meta-trends are extracted from a corpus of text, often enables one to form new hypotheses and conclusions thanks to the simple fact that computers can read vastly faster than we can. While they might not be great at interpreting meaning, computers are excellent at finding numerical relationships between words in a text, something that would take you and me years upon years to do manually. With this new perspective, we’re able to ask questions and seek answers in ways we previously couldn’t have, making “far reading” an area ripe for research.

Armed with the University of North Carolina’s Documenting the American South: North American Slave Narratives collection, I set out to answer a question that would’ve been difficult to investigate without the tools of “far reading”: how did the role of religion change in the lives of enslaved Americans over time? Of course, I could’ve taken my time reading through all of the 295 narratives, synthesizing a thesis and finding supporting evidence along the way, but I wasn’t planning on going for a PhD in history quite yet. Instead, I took another approach, asking: what patterns might arise when I zoom out and view the trends of religious word frequencies and use contexts over the course of the 19th century? What conclusions regarding the role and characterization of religion might I be able to draw from such trends?

Before dumping all 295 narratives (available for download here) into Voyant, a free, online text analysis tool developed for the digital humanities, I ran UMass Amherst’s MALLET (Machine Learning for Language Toolkit) Topic Modeling Tool on the corpus to categorize the sources and pick out the 25 texts in which discussion of religion was most prevalent. While narrowing the scope of my investigation in this way might’ve slightly skewed my eventual findings compared to the insights one might uncover in the entire body of text, we live in a world of finite computing power, and there’s no way Voyant would’ve been able to handle the entire archive’s aggregate corpus. Thus, I figured it would make the most sense to investigate the narratives that dealt most explicitly with religion: I would have the most data to work with in the smallest relative number of words.

A list of the top 25 topic groupings (sorted by descending topic density) within UNC’s Documenting the American South: North American Slave Narratives corpus of text, as determined by UMass Amherst’s MALLET Topic Modeling Tool. Note that several of the top 25 topic groupings relate to religion; namely, #4, #8, #9, and #25. I conducted the analysis that follows on grouping #9, as it proved to contain the texts with the most evenly distributed collection of topic words.


A list of the top 25 texts contained within topic grouping #9 from above, sorted by descending topic density. Note that the numbers in parentheses indicate the number of words contained in each document corresponding to the topic.
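The selection step captioned above can be sketched in Python. This is only a minimal sketch, not the Topic Modeling Tool’s own code: it assumes the per-document topic weights have already been loaded into a dictionary, and `top_documents`, the file names, and the weights in the usage comment are all hypothetical.

```python
def top_documents(doc_topics, topic, n=25):
    """Return the names of the n documents with the highest weight for `topic`.

    `doc_topics` maps a document name to a {topic_id: weight} dictionary.
    Documents that never mention the topic are treated as weight 0.0.
    """
    ranked = sorted(
        doc_topics.items(),
        key=lambda item: item[1].get(topic, 0.0),
        reverse=True,  # densest documents first
    )
    return [name for name, _ in ranked[:n]]


# Hypothetical usage with made-up weights:
# top_documents({"a.txt": {9: 0.1}, "b.txt": {9: 0.6}}, topic=9, n=1)
```

How the weights get from MALLET’s output into such a dictionary depends on the export format, which I haven’t reproduced here.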

With the top 25 texts from one of the several religion-associated topic groupings (#9 in the list above) generated by MALLET in hand, I created a new directory containing the 25 text files and prepended each file name with the year in which the text was published, enabling chronological sorting of the texts in Voyant. Directory in hand, and VoyantServer installed, it was time for some fun.

Directory of top 25 religion-associated texts from topic #9 above, filenames prepended with year of publication to enable chronological analysis in Voyant.
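For anyone who would rather script the renaming than do it by hand, here is a minimal Python sketch. The helper name `prepend_years`, the underscore separator, and the file names and years in the usage comment are my own assumptions for illustration; the real mapping of files to publication years has to come from the archive’s metadata.

```python
from pathlib import Path


def prepend_years(directory, years):
    """Rename each file listed in `years` to '<year>_<original name>'.

    `years` maps a file name to its publication year. Files that are missing,
    or that already carry the prefix, are skipped. Returns the new names, sorted.
    """
    directory = Path(directory)
    renamed = []
    for name, year in years.items():
        source = directory / name
        prefix = f"{year}_"
        if not source.exists() or name.startswith(prefix):
            continue
        target = directory / f"{prefix}{name}"
        source.rename(target)
        renamed.append(target.name)
    return sorted(renamed)


# Hypothetical usage (file names and years are placeholders):
# prepend_years("religion_corpus", {"narrative_a.txt": 1845, "narrative_b.txt": 1890})
```

Because the year comes first in each name, a plain alphabetical sort of the directory is also a chronological one.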

Upon first loading the corpus into Voyant, I quickly realized that my pre-processing and preparatory work wasn’t yet complete: as the classic cirrus and trends graphics below show, my texts were diluted with meaningless, commonly used nouns and verbs. Although Voyant has its own stopword-removal functionality, I needed to give it a bit of help before I’d truly be able to learn from my texts.

Voyant cirrus visualization of religious corpus word frequencies, prior to manual stopword cleaning.
Voyant top-5 most frequent words trend visualization of corpus, prior to manual stopword cleaning.

Before I continue, though, I think it’s important to note how prevalent masculine nouns and pronouns were in the corpus. They made it difficult to investigate the emotional and social connotations of religion, since they dominated most of the visualizations I created and washed out less common but more powerful words; still, I initially found it interesting that the corpus was so male-focused. Deeper reading, of course, would be necessary to determine the exact reason behind such male-centrism, but Voyant’s contexts tool gives an interesting foundation from which to construct potential explanations: contrary to my initial intuition, it appears that many uses of “man” and “men” were not gendered references, but rather more general, abstract references to humans, or mankind. As this illustrates, “far reading” can surely bring up interesting anomalies, but actually interpreting those anomalies is where the work of the historian begins.

Sample use contexts of the word “man” in the corpus.
Sample use contexts of the word “men” in the corpus.

Having removed the most frequently occurring stopwords and meaningless words, such as “said,” “mr,” “man,” “men,” “time,” “say,” “come,” “went,” and a few others, I found that the picture of religion’s role in the enslaved Antebellum and pseudo-free Reconstruction-era South became clearer. As evidenced in the cirrus and top-5 most frequent words trend visualizations below, words like “great,” “good,” “new,” “young,” “conference,” “home,” “house,” and “school” were common in the corpus, leading me to suspect that positive notions of progress in politics and education paralleled discussions of religion in the texts. Seizing the chance to investigate this idea, I dove deeper.

Voyant cirrus visualization of religious corpus word frequencies, following manual stopword cleaning.
Voyant top-5 most frequent words trend visualization of corpus, following manual stopword cleaning.
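The counting behind a view like this is simple enough to sketch outside of Voyant. The Python below is a rough approximation under my own assumptions, not Voyant’s actual implementation: the tokenizer is a bare regular expression, `EXTRA_STOPWORDS` holds only the words named above (Voyant’s built-in list is not reproduced), and the function names are hypothetical.

```python
import re
from collections import Counter

# Only the extra stopwords named in this post; a real run would also need
# a general English stopword list ("the", "and", and so on).
EXTRA_STOPWORDS = {"said", "mr", "man", "men", "time", "say", "come", "went"}


def tokenize(text):
    """Lowercase the text and split it into runs of letters.

    Note that this crude rule splits "Lord's" into "lord" and "s".
    """
    return re.findall(r"[a-z]+", text.lower())


def top_words(text, stopwords=EXTRA_STOPWORDS, n=5):
    """Return the n most frequent non-stopword tokens with their counts."""
    counts = Counter(t for t in tokenize(text) if t not in stopwords)
    return counts.most_common(n)
```

Calling `top_words` on the concatenated corpus, with a fuller stopword set passed in, gives a plain-text stand-in for the most frequent words shown in the figures.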

I wanted to look at the chronological relationship between religion and sentiment within the corpus, so I created a custom word frequency trend visualization of the words “good,” “great,” “pain,” “suffering,” and “work.” To my chagrin, no clear upward or downward trend appeared; regardless, a few interesting takeaways may be noted from the figure below. First, “good” and “great” occurred much more frequently than “pain” or “suffering,” suggesting that the tone of the corpus was more uplifting than grieving; second, the trends of “great,” “good,” and “work” seemed to parallel each other at times, perhaps suggesting that religion offered the opportunity for self-betterment and empowerment through service and commitment.

Word frequency trend visualization investigating the chronology of religion and sentiment in the corpus.
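The numbers underneath a trend chart like this one can be approximated with a few lines of Python. This is a sketch under my own assumptions rather than a description of what Voyant computes internally: each document is treated as one point on the x-axis, frequencies are relative to that document’s total token count, and `relative_frequencies` is a hypothetical helper.

```python
import re
from collections import Counter


def relative_frequencies(documents, terms):
    """Map each term to its per-document relative frequency, in sorted order.

    `documents` is a list of (name, text) pairs. Sorting by name puts files
    whose names start with a year into chronological order.
    """
    trends = {term: [] for term in terms}
    for _, text in sorted(documents):
        tokens = re.findall(r"[a-z]+", text.lower())
        counts = Counter(tokens)
        total = len(tokens) or 1  # avoid dividing by zero on an empty file
        for term in terms:
            trends[term].append(counts[term] / total)
    return trends
```

Dividing by each document’s length matters here, since raw counts would mostly reflect which narratives happen to be longest.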

Indeed, upon taking a closer look at the various use contexts of “work” in the text, several suggest its positive role as a means for progress in the arenas of religion, politics, and education, such as “…he has been offered the superintendency of the church work in the West Indies, but respectfully declined,” “…it was due principally to his persistent work in that convention, that resolutions favoring universal suffrage were passed,” and “…exhibited great pluck and perseverance in fitting himself for the work he desired to undertake. He pursued with assiduity every study.” Moreover, investigating use contexts of “good” and “great” revealed their close association with the “good Lord” and the positive influence of religion, in quotes such as “…used to sing his spiritual songs with great glee of spirit,” “I preached, doing the best I could. It was a great time with the Methodists,” and “The meeting resulted in great good. The Colored Methodist Episcopal Church now held full sway,” along with “So I went to prayer and asked the good Lord to assist me if he pleased,” “I should, by the good Lord’s assistance, do all the good I can in this place,” and “I am happy to think that the good Lord has taken care of me up to the present.” Of course, these quotations by no means represent the entire sample, so it’s important to take them with a grain of salt; nonetheless, I believe there’s still value to be found in them.

Sample use contexts of the word “work” in the corpus.
Sample use contexts of the word “great” in the corpus.
Sample use contexts of the word “good” in the corpus.
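A keyword-in-context listing like the ones above is straightforward to approximate. The sketch below is my own minimal version, not Voyant’s contexts tool: `contexts` is a hypothetical helper, the window is measured in whitespace-separated words, and matching is done on letters only, so a possessive like “Lord’s” will not match the keyword “lord.”

```python
import re


def contexts(text, keyword, window=5):
    """Return each occurrence of `keyword` with `window` words on either side.

    Each hit is a (left context, matched word, right context) tuple.
    """
    words = text.split()
    hits = []
    for i, word in enumerate(words):
        # Compare on letters only so "Lord," and "work." still match.
        if re.sub(r"[^a-z]", "", word.lower()) == keyword.lower():
            left = " ".join(words[max(0, i - window):i])
            right = " ".join(words[i + 1:i + 1 + window])
            hits.append((left, word, right))
    return hits
```

Reading the hits side by side is what turned “good” from a bare count into the “good Lord” pattern quoted above.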

Reliability and representation of the whole aside, it seems that many of the insights I was able to derive from my first-ever “far reading” weren’t too far off the mark after all—UNC’s Documenting the American South: North American Slave Narratives “Guide to Religious Content” page comes to a similar conclusion:

…the narratives reveal the duality of black religious experience: the white-controlled message and practice, and the “invisible institution” the slave community established across the South embodying its own religious ideals and aspirations. Some report the conversion so central to evangelicalism. They capture the joy of being “in the spirit” and the rich sense of personal value religious commitment brought. Many individuals persisted in religious practice despite severe punishment. The narratives show how slaves interpreted the Bible, especially the Exodus story, as a metaphor for their own difficult lives and as promise of eventual liberation.

-Marcella Grendler, Andrew Leiter, and Jill Sexton

Thus, as a researcher, the natural question now becomes: what role does “far reading” serve, exactly? You’ll note that I’ve hedged many of my interpretations within this analysis, and as such, you are likely wondering why “far reading” is a productive investment of time given the uncertainty and ambiguity of its implications. While I’ll further emphasize (despite my general success in this investigation) that it can be dangerous to draw broad conclusions from “far reading” without deeper verification, I’ve found “far reading” to be a nice change of pace from close reading, actively engaging my curiosity and leading me to take a more interactive approach to understanding history, unlike any other reading I’ve done before. Even if it only provides a starting point upon which more rigorous research can be built, “far reading” enables one to ask specific questions of several sources en masse before committing to a single research topic that may or may not turn out to be suitably deep. It allows one to “zoom out” and view, within minutes, wide-scoped trends across a corpus that would otherwise take days to read in full, and to think about topic dynamics, relationships, and interdependence in new ways by visualizing, quantifying, and categorizing patterns with a refreshing sense of discreteness in an otherwise subjective world.

That last line probably reveals my bias as an aspiring computer scientist, mathematician, and statistician, but by no means should that dissuade you from seeking a new perspective—give “far reading” a shot yourself, and see where it takes you.