Two Revolutions

In the next week or two I’ll be talking to some social science students about tools for doing research and writing up results. Over the years I’ve accumulated various things on the topic, ranging from bits of advice to templates or things I use myself. My focus is on managing the various pieces of the work process in plain-text, especially when it comes to writing code you can read later, and keeping track of the work you’ve done. When talking to undergraduates or graduate students about this, and when teaching classes that use these tools, I increasingly run into the problem that it’s hard to get started on this topic without backing up a bit first in order to talk about how the computer they are using works.

I think the reason for this is the rise of the flat-screen, touch-based model of computing, most obviously on phones and then very secondarily on things like Apple’s iPad or Microsoft’s Surface tablet. Now, most people who need to write long documents (like papers or dissertations) or work in an involved way with data do not use a tablet as their primary device. But it does seem clear that some kind of touch-screen interaction is the future of computing for most people. Indeed, once you consider phones properly you realize it’s the present of computing for most people. While it is not strictly impossible, it remains very difficult to do your academic, social-science work on a device of this sort. This is likely to be the case for some time. The tools we have are not designed for such devices.

That’s not surprising. But I think there is an underappreciated tension here. Two ongoing computing revolutions are tending to pull in opposite directions. On one side, the mobile, cloud-centered, touch-screen, phone-or-tablet model has brought powerful computing to more people than ever before. This revolution is the one everyone is talking about, because it is happening on a huge scale and is where all the money is. In practice it puts single-purpose applications in the foreground and hides from the user both the workings of the operating system and (especially) the structure of the file system where items are stored and moved around.

On the other side, open-source tools for plain-text coding, data analysis, and writing are also better and more accessible than they have ever been. This has happened on a smaller scale than the first revolution, of course. But still, these tools really have revolutionized the availability and practice of data analysis and scientific computing generally. They continue to do so, too, as people work to make them better at everything from slurping up data on the web to presenting it there. These tools mostly work by gluing together separate, specialized widgets into a reproducible workflow. They are “bitty” or granular because the process of data analysis is that way as well. They do much less to hide the operating system layer—instead they often directly mesh with it—and they also presuppose a working knowledge of the file system underpinning the organization of the things the researcher is using or creating, from data files to code to figures and final papers.
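To make the "gluing together" idea a little more concrete, here is a minimal sketch in Python, using only the standard library. Everything specific in it is invented for illustration: the folder names (data, code, figures, paper), the file names, and the three made-up survey rows are assumptions, not a description of any particular project or tool. The point is only that each step is small, reads or writes a plain-text file, and depends on knowing where things live in the file system.

```python
import csv
import statistics
import tempfile
from pathlib import Path


def make_project(root: Path) -> dict:
    """Create a small, hypothetical project layout and return its folders."""
    folders = {name: root / name for name in ("data", "code", "figures", "paper")}
    for folder in folders.values():
        folder.mkdir(parents=True, exist_ok=True)
    return folders


def write_raw_data(path: Path) -> None:
    """Step 1: save some made-up survey rows as a plain-text CSV file."""
    rows = [("respondent", "age"), (1, 23), (2, 35), (3, 41)]
    with path.open("w", newline="") as f:
        csv.writer(f).writerows(rows)


def summarize(data_path: Path, out_path: Path):
    """Step 2: read the CSV back, compute the mean age, write it to a text file."""
    with data_path.open(newline="") as f:
        ages = [int(row["age"]) for row in csv.DictReader(f)]
    mean_age = statistics.mean(ages)
    out_path.write_text(f"Mean age: {mean_age}\n")
    return mean_age


def run_pipeline(root: Path):
    """Glue the steps together: layout, then raw data, then summary."""
    folders = make_project(root)
    data_file = folders["data"] / "survey.csv"
    write_raw_data(data_file)
    return summarize(data_file, folders["paper"] / "summary.txt")


if __name__ == "__main__":
    # Run everything inside a throwaway directory so nothing is left behind.
    with tempfile.TemporaryDirectory() as tmp:
        print(run_pipeline(Path(tmp)))
```

Nothing here is hidden behind an application: the data file, the summary, and the folders that hold them can all be opened and inspected directly, which is exactly the working knowledge these tools presuppose.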

The tension is that, increasingly, people who come into the world of social science wanting to work with data tend to have little or no prior experience with text-based, command-line, file-system-dependent tools. In many cases, they do not have much experience multi-tasking in a windowing environment, either, at least in the sense of making applications work together in the service of a single goal.¹ To be clear, this is not something to blame users for, and neither is it something to complain about in misguided nostalgia for the command line. Rather, it is an aspect of how computer use is changing at a very large scale. The coding and data analysis tools we have are powerful and for the most part meant to allow research products to be opened up and inspected. But the way they work clearly runs against the current of everyday, end-use computing, which increasingly hides many implementation details and focuses on single-purpose tasks. Again, specialized tools are necessarily specialized. The net result for the social sciences in the short to medium term, I think, is that we will have a suite of powerful tools that enable an amazing variety of scientific activity, developed in the open and mostly available for free. But it will get harder to teach people how to use them. Organizations like Data Carpentry have begun to address this challenge in a very positive way. I think we’ll need a lot more in that vein in the future.

¹ As opposed to multi-tasking in the less-interesting sense of trying to pay attention to a number of discrete tasks (writing, email, calendar, web-browsing), each controlled by a separate application.