Wanting to Know Everything

The NSA has assembled a gigantic database of telephone calls in the United States, with the help of all of the major telecommunications providers (except Qwest). The database holds not voice recordings but records of which calls were made. It constitutes data on a huge network of ties between people who call each other. In recent years, sociology and related fields have seen a lot of development in the dynamic modeling of social networks and in fast algorithms for analyzing large, sparse graphs. Entities with this kind of structure include the Internet, AOL’s instant messenger network, and the universe of telephone calls within the United States. Some of the papers in this edited volume, published by the National Academy of Sciences, give a sense of what people are doing. (The volume was co-edited by my colleague Ron Breiger.) For instance, you can read about Data Mining on Large Graphs, Identifying International Networks, the Key Player Problem, and the use of MTML models to study adversarial networks. I think it’s fair to say that techniques of this sort are of significant interest to the intelligence community.

Social scientists, in the normal course of things, are severely limited in the amount of good data they can collect on networks of this sort. The Internet Movie Database has proved a very useful source of data for developing theory and methods in this area because it’s comprehensive and publicly available. Other researchers have set out to collect very large datasets describing some network structure together with the attributes of the people in it. A recent paper by Gueorgi Kossinets and Duncan Watts, for example, analyzes all the emails sent over the course of a year by 43,000 students, faculty and staff at a large private university. But the traffic analyzed in that paper is just a drop in the ocean of the real flow of communication that travels by voice and email every day.

Social network analysts, and in fact any social scientists who work with quantitative data, often dream of ideal datasets: the kind of thing we would collect if money, time and ethics did not constrain us. When we daydream like this, our thoughts tend toward harmless megalomania: maximally comprehensive data on the whole population of interest, in real time, with vast computing power to analyze it, and no constraints on updating or extending it. And a pony, too. At the limit, something like Borges’ map is what we want: a perfect, one-to-one scale representation of the world.

Scientists and spies are not so different. The intelligence community’s drive to find the truth, to uncover the real structure of things, is similar to what motivates natural or social scientists. For that reason, I can easily understand why the people at the NSA would have been drawn to build a database like the one they have assembled. The little megalomaniac that lives inside any data-collecting scientist (“More detail! More variables! More coverage!”) thrills at the thought of what you could do with a database like that. Think of the possibilities! What’s frightening is that the NSA is much less constrained than the rest of us by money, or resources, or, it seems, the law. To them, Borges’ map must seem less like a daydream and more like a design challenge. In Kossinets and Watts’ study, the population of just one university generated more than 14 million emails. That gives you a sense of how enormous the NSA’s database of call records must be.

In the social sciences, Institutional Review Boards set rules about what you can do to people when you’re researching them. Social scientists often grumble about IRBs and their stupid regulations, but they exist for a good reason. To be blunt, scientists are happy to do just about anything in the pursuit of better knowledge, unless there are rules that say otherwise. The same is true of the government, and the people it employs to spy on our behalf. They only want to find things out, too. But just as in science, that’s not the only value that matters.