There are lies, damned lies, and statistics.

-Mark Twain

Although Twain wasn’t exactly known for his optimism, his words pack a punch square to the nose of today’s fastest growing technological field—call it statistics, data science, applied mathematics, or analytics. Silicon Valley can’t get enough data despite the fact that we’ll soon run out of silicon on which to store it—regardless, the age of the algorithm is here, and likely here to stay.

American author Mark Twain.

While I discussed historical data in last week’s post, I’d like to shift that discussion to focus more broadly on data journalism in this week’s installment: that is, telling stories and making arguments with data and visualizations. As digital historians, we’ve much to learn from modern examples of both well-curated and shoddily crafted visualizations and abstractions; whether the data represents economic metrics from 1900 or stock trends from 2019, many of the underlying fundamentals separating best practices from worst practices remain invariant. To orient our discussion, let’s take a deep dive into a beautiful visualization from the New York Times, applauding what it does well and critiquing where it falls short. Ideally, we’ll walk away with insights that improve our use of data in making historical arguments.

A visualization depicting “How Every Member got to Congress,” by the New York Times.

If you’re like me, you likely fell in love with the cotton-candy tangle of blue-raspberry and cherry sugar floss—er, I mean the time-series network graph—above at first sight. From a purely aesthetic perspective, the graphic is truly beautiful; one might consider it a piece of 21st-century abstract art. The juxtaposition of scale, shape, curvature, divergence, and convergence works to induce an organic sense of movement, growth, and life in the data. Clean labels, sleek text, and a refreshing sense of minimalism catch and engage the eye with ease, while internal symmetry builds resonance. Simple geometry intertwines to manufacture a sum greater than its parts; one-dimensional edges from professional birth to Congressional induction collaborate to form a multidimensional manifold.

From a purely aesthetic perspective, I have no complaints; in fact, from a purely aesthetic perspective, I’d call this an award-winning visualization. Add in the layer of interactivity which allows you to see the individual path of any given representative, and you’ve got more than just an eye-catching visualization: you have an addictive exhibit. Watching the cloud of lines sink into the background as a single representative’s meandering journey rises and emboldens is a testament to the abstract elegance enabled by today’s backbone of binary. Imagining that representative’s path vicariously through their nonlinear blue or red trace provides a sense of personal connection with what otherwise amounts to nothing more than a spreadsheet behind the scenes.

Michigan 8th District Representative Elissa Slotkin’s path to the House of Representatives, as depicted by the New York Times’ “How Every Member Got to Congress.”

From a purely aesthetic perspective, I commend the New York Times for their work. There’s a reason I’ve employed anaphora here, however: where the visualization excels in graphic design, it falls short in information design.

From a purely aesthetic perspective, I’ve fallen in love—and yet from a holistic perspective, I’ve walked away unimpressed. Inherently misrepresenting the very data with which it seeks to tell a story, the visualization raises a plethora of red flags when investigated under a lens polarized against superficial aesthetics—from a statistical, mathematical, journalistic, and argumentative perspective, the visualization falls flat. Let’s unpack, and see what we can learn.

Ask Why

I first encourage you to consider a question we should all ask upon seeing data in the wild: why? Why is the graphic in front of you at this very moment? What story is the author trying to tell with the visualization? Why is the author motivated to tell such a story? What argument is the author making with the data, and what role does the data play in supporting that argument? Why might a visualization serve as a more effective means of communication in the current frame of context than any other? Why might this visualization specifically be more effective in communicating a message than any other?

Of course, there are certain data-driven stories that simply lack structure without a core visualization, be it a graph, chart, or table. You’d be hard pressed to tell me what the temperature forecast looks like for the entire United States without a means of efficient visual summary and minimalism, just as you’d stumble to explain the recent trends of domestic unemployment in words only. Effective visualizations enable our comprehension by lending our minds an olive branch: by engaging our visual cortex, we’re able to form a better mental model of the story at hand than we’d be able to exclusively through text.

Effective visualizations answer the question: “why?” [source: The Weather Channel]
Effective visualizations communicate the message of a story in a manner which enables the reader’s deeper understanding and construction of a more complete mental model. This chart, from the Wall Street Journal, does exactly that.

That is, effective visualizations have a clear answer to the simplest question: why?

Ineffective visualizations, on the other hand, fail to bolster the overarching message of an author’s argument; rather, they dilute, contradict, blur, or distract from such a message. They overthrow a writer’s intentions, either by direct force (contradiction) or by mere unwelcome occupancy (distraction). Loitering visualizations often cause just as much harm to a piece as contradictory ones: whereas contradictory ones can, when used carefully, provide balance and acknowledge counterargument, informationless ones detract from the credibility of the author in the form of wasted space.

This leads us to a pair of fundamental questions in the context of the New York Times’ visualization: why did the authors choose to include this visualization in their article, and what argument are they making?

I’ll leave you to read the article to answer those questions for yourself, but in short, the article discusses the demographics of the House of Representatives with the goal of highlighting how such demographics misrepresent the Americans they serve. On second glance (the first glance captured only aesthetics), the visualization seems reasonable: after all, it’s quite literally showing “How Every Member Got to Congress.”

On third glance, however, things start to get suspicious. If the goal is to convince the reader that the House misrepresents Americans, why exclude the career paths of the aggregate American population? Why limit the visualization to a vacuum which inhibits comparison? What’s the point in knowing what the American House of Representatives looks like if we don’t know what America looks like? Moreover, what’s the point in knowing what the American House of Representatives looks like if we don’t know what foreign legislatures look like? (The article goes on to make various textual comparisons between the American House and foreign legislatures to support its larger argument of misrepresentation.)

On fourth glance, the suspicion turns to disappointment. What exactly did I learn from the visualization? What did the visualization express that couldn’t have been expressed more precisely in words or other, more comparative visualizations? On one hand, one might argue that the visualization more effectively captures a global view of the House than any alternative journalistic method could; on the other hand, one would ask: what’s the value added in capturing a global perspective when I’m unable to contextualize or derive conclusions from it? Sure, it’s nice to know that a large contingent of representatives passed through the gateways of private law, business management, and state legislature before arriving on Capitol Hill—but how, exactly, does that knowledge enhance my understanding of the problems we face in American politics from a quantitative and qualitative point of view?

It may be a beautiful piece of art, but I’m not sure I’d call it an “infographic”—to do so would be to contradict the first half of the word. By first asking “why?” we find ourselves at a loss for an answer.

Read the Fine Print

The next red flag that I noticed upon the “second glance,” so to speak, lies hidden in the fine print that accompanies the visualization only on the New York Times’ website. The astute reader will find:

Here are the paths that the members of the House of Representatives took to Congress. Each line represents a Democratic or Republican representative, and circles are the major educational, career and political milestones on their path to the House. Items are not exhaustive nor in chronological order.

In other words, the more astute reader will find that the visualization is admitting that it’s lying. Designing a time-series visualization to then make up a time-series path for each representative isn’t much better than releasing a trailer for a movie that hasn’t yet been filmed: instead of seeking representation following observation, it’s seeking observation to support a presupposed representation. There’s justifiable support for slight manipulation in the world of data science when doing so clarifies the message already told by the data, but manipulating data to tell the story you want it to tell is the central subject of a cautionary best-seller.

At least the authors are being honest. If you’re going to manipulate data, disclose it, but do your reader a favor and disclose it with more detail than the authors of the New York Times piece do. “Items are not exhaustive nor in chronological order” does nothing but set off alarm bells in my head; linking to a supplementary page in which you discuss what manipulations were made and why would be sufficient to silence those bells. It’s one thing to disclose an admission in fine print, and another to disclose it openly; in fact, to the critical consumer, the latter will bolster your argument more than the former. Revealing the process is just as important as revealing the final result when it comes to data journalism: transparency is often the best weapon to defend against misunderstanding, suspicion, skepticism, and criticism (or any combination of the above).

There’s one more reason I’m wary of fine print: in the age of the Internet, context is a rare commodity. As soon as an image is publicly accessible on a server, celebrities across the globe can tell you what happens next: that image is open to free interpretation. In the sphere of data journalism, one must consider the dangers of such free interpretation—from clickbait-driven headlines to misunderstood Facebook rants, visualizations without sufficient internal context are in danger of supporting adversarial arguments. Not all readers work hard to seek the truth, making it ever more important to embed that truth in the visualization itself.

Including sources, qualifiers, and other explanatory footnotes within visualizations builds credibility in a day and age of clickbait. [source: Wall Street Journal]
The honest data journalist should first seek to avoid the necessity of fine print—but when doing so is impossible, the honest data journalist will embed that fine print in footnotes rooted to the very pixels of the image.

Quantify

Numbers, like visualizations themselves, engage a different part of our brain than text alone—when properly leveraged, numbers enable us to more fully understand, characterize, and process an argument. Text qualifies, numbers quantify. Not all visualizations need numbers to effectively tell a story, especially when the story is more qualitative than quantitative; however, numbers can often add an additional dimension in support of a story, while a lack thereof is a missed opportunity.

Something as simple as percentages and fractions indicating how many representatives passed through each portal embedded in the New York Times’ visualization would go a long way to enhance the reader’s conceptual grasp of the House; something as simple as year-markers affixed to each individual representative’s “stop” along their journey would add considerable depth. Summary statistics listed along the bottom, capturing trends, means, medians, and modes would engage yet another method of reasoning within the reader’s digestive process, telling a more complete story.
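To make that suggestion concrete, here is a minimal sketch of how such numbers might be computed, assuming each representative's path is available as a simple list of milestones. The records, the milestone names, and the helper functions `milestone_shares` and `path_length_summary` are all hypothetical and invented purely for illustration; none of it is data or code from the New York Times piece.

```python
from collections import Counter
from statistics import mean, median, multimode

# Hypothetical toy records, invented for illustration only.
# They are NOT real data from the New York Times visualization.
paths = {
    "Rep. A": ["law school", "private law", "state legislature"],
    "Rep. B": ["business", "state legislature"],
    "Rep. C": ["military", "private law"],
    "Rep. D": ["business"],
}


def milestone_shares(paths):
    """Return {milestone: fraction of members whose path includes it}."""
    # set(stops) ensures a member is counted once per milestone,
    # even if the same milestone appears twice in their path.
    counts = Counter(m for stops in paths.values() for m in set(stops))
    total = len(paths)
    return {milestone: n / total for milestone, n in counts.items()}


def path_length_summary(paths):
    """Return the mean, median, and mode(s) of milestones per member."""
    lengths = [len(stops) for stops in paths.values()]
    return {
        "mean": mean(lengths),
        "median": median(lengths),
        "modes": multimode(lengths),
    }


shares = milestone_shares(paths)
summary = path_length_summary(paths)

# Print milestones from most to least common, as percentages.
for milestone, share in sorted(shares.items(), key=lambda kv: (-kv[1], kv[0])):
    print(f"{milestone}: {share:.0%}")
print(summary)
```

Percentages like these could label each circle in the graphic, while the summary figures could sit in a footer beneath it. Since the original notes that items are neither exhaustive nor chronological, any real version of this calculation would inherit the same caveats.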

The devil is in the details of data science, and there are gains to be had from granularity; moreover, such granularity is often best captured in the form of numbers. A global perspective is useful, but so too is a local perspective for juxtaposition. Visualizations capture the forest, while numbers capture the composing trees.

A spatial visualization of the global economy which effectively composes a metaphorical “forest” from numeric “trees.” Without percentages, the visualization would lose much of its communicative power. [source: howmuch.net]
My goal is not to step on the toes of Sahil Chinoy and Jessia Ma, the authors of the New York Times piece, with my commentary; in fact, I’d encourage you to read their article at length, as it’s extremely well written. Instead, my goal is to improve the collective work of data journalists (anyone telling stories with data) by noting, in the context of an example, the nuances one must consider when creating visualizations. Regardless of whether my thoughts have changed the way you consume, analyze, and construct visualizations, the nature of the game dictates that practice makes perfect; hence, I recommend you check out flowingdata.com and anychart.com for inspiration, and take a crack at creating your own visualization.

With great power comes great responsibility, and with great preparation comes great performance. To do is to learn, and to learn is to improve.

And there’s always room for improvement.

Stay tuned for next week, when I’ll take my own shot at creating a visualization.

Give the pupils something to do, not something to learn; and the doing is of such a nature as to demand thinking; learning naturally results.

-John Dewey