One of the defences of the NSA’s data collection, undertaken to improve the nation’s security, has been that they’re only actually capturing the metadata. They’re not capturing the underlying identities of those sending the messages or making the phone calls, nor are they analysing the actual messages or calls themselves. That last might still be true, but new research has shown that it is trivially easy to connect the metadata to the people from whom it originates.
The research is here:
So, just how easy is it to identify a phone number?
Trivial, we found. We randomly sampled 5,000 numbers from our crowdsourced MetaPhone dataset and queried the Yelp, Google Places, and Facebook directories. With little marginal effort and just those three sources—all free and public—we matched 1,356 (27.1%) of the numbers. Specifically, there were 378 hits (7.6%) on Yelp, 684 (13.7%) on Google Places, and 618 (12.3%) on Facebook.
What about if an organization were willing to put in some manpower? To conservatively approximate human analysis, we randomly sampled 100 numbers from our dataset, then ran Google searches on each. In under an hour, we were able to associate an individual or a business with 60 of the 100 numbers. When we added in our three initial sources, we were up to 73.
How about if money were no object? We don’t have the budget or credentials to access a premium data aggregator, so we ran our 100 numbers with Intelius, a cheap consumer-oriented service. 74 matched. Between Intelius, Google search, and our three initial sources, we associated a name with 91 of the 100 numbers.
If a few academic researchers can get this far this quickly, it’s difficult to believe the NSA would have any trouble identifying the overwhelming majority of American phone numbers.
Now of course it’s entirely possible that the NSA doesn’t actually do this. And I think it’s also obvious that they don’t in fact do this for all calls and messages sent by all 300 million Americans each year. Then again, they don’t actually need to either: if such a matching exercise were undertaken then of course the results would be kept in a database somewhere. And you’d only need to run such a matching exercise on those numbers that you had not already associated with a name or a definite identity. That database would both grow and become more accurate over time.
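To make that point concrete, here is a minimal sketch of such an incremental matching exercise. It is only an illustration of the idea, not a description of anything the NSA or the researchers actually run: the function name `resolve_numbers`, the `directory` of made-up numbers and names, and the use of a plain dictionary as the "database" are all assumptions for the example.

```python
from typing import Callable, Dict, Iterable, Optional


def resolve_numbers(
    numbers: Iterable[str],
    known: Dict[str, str],
    lookup: Callable[[str], Optional[str]],
) -> int:
    """Look up only numbers not already in `known`; return how many lookups ran."""
    lookups_run = 0
    for number in numbers:
        if number in known:
            continue  # already matched in an earlier pass, no new lookup needed
        lookups_run += 1
        name = lookup(number)
        if name is not None:
            known[number] = name  # the database grows with each successful match
    return lookups_run


# Hypothetical directory standing in for any public or commercial source.
directory = {"555-0100": "Alice Example", "555-0101": "Bob's Pizza"}

known: Dict[str, str] = {}
first = resolve_numbers(["555-0100", "555-0199"], known, directory.get)
second = resolve_numbers(["555-0100", "555-0101", "555-0199"], known, directory.get)
```

On the second pass the number matched earlier is skipped entirely, so the cost of each new batch depends only on the numbers that are still unidentified.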
There’s also good reason to think that the NSA will do rather better than the figures the researchers managed. Intelius, for example, uses a reverse phone lookup system for part of its work. And of course the phone companies have a much more accurate system than the one Intelius has constructed from public information. If the NSA has access to those phone company records then of course it will be able to match, near-instantly, any phone number to the presumed legal owner (i.e., whoever the phone company thinks owns it).
Whether the NSA does in fact access the phone companies’ accurate databases, none of us knows. But it is obviously true that the NSA has enough, with that metadata and those phone numbers, to be able to identify by name the vast majority of people connected to the numbers it knows about. And once you know the name there is a vast treasure trove of information that can be mined. And given what we’ve been told so far, who really wants to insist that they’re not doing that linking?