At a time when terrorist attacks are unfortunately a regular occurrence, the very topic of privacy and the protection of confidentiality is viewed by many with suspicion.
But is giving up privacy really a necessary evil for guaranteeing the safety of citizens, or, given the increasingly pervasive use of Big Data, does it risk backfiring?
The false trade-off between security and privacy
The idea has become commonplace that, to guarantee their safety, citizens must give up their “claims” to confidentiality.
In reality, as we will see shortly, an increase in the available information does not necessarily bring an increase in the “signal” (the genuinely relevant information); more likely it increases the “noise” (useless, misleading or simply random information).
The search for the signal in this case resembles the search for a needle in a haystack, with the aggravating circumstance that as the information grows the needle stays the same size, while the haystack grows out of all proportion!
Let’s try to clarify the concept with a numerical example.
Looking for a needle in a haystack that grows exponentially
Suppose we have the data on hotel reservations (data that obviously includes name, surname, address and so on) made by a very large number of tourists, on the order of 1 billion. We want to find out whether there are any potential terrorists among them: for example, two people of different nationality and residence who have met in the same hotel, anywhere in the world, on two different days. We treat such a pattern as suspicious and interpret it as a sign that a terrorist attack may be in preparation.
So, let’s summarize the data of our example [1] and try some simple calculations:
- the hotel bookings concern 1 billion (10^9) individuals of all nationalities;
- each tourist stays in a hotel 1 day out of 100;
- we focus our analysis on 100,000 hotels, each of which can accommodate 100 people;
- our analysis covers a period of 1,000 days.
Based on these assumptions, let’s start by evaluating the probability that two given people will be in the same hotel on two different days.
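The calculation can be sketched in a few lines of Python. This is only an illustration under two assumptions that the text does not state explicitly: people choose whether to stay in a hotel independently of one another, and they pick a hotel uniformly at random. The variable names are made up for the sketch. The capacity of 100 people per hotel does not enter the computation; it only makes the scenario consistent, since 1% of 1 billion people is 10 million guests per day, which is exactly what 100,000 hotels with 100 places each can hold.

```python
from math import comb

# Figures taken from the example in the text.
PEOPLE = 10**9       # individuals with hotel bookings
HOTELS = 10**5       # hotels under observation
DAYS = 1000          # length of the observation window, in days
P_IN_HOTEL = 0.01    # each person is in a hotel 1 day out of 100

# ASSUMPTION (not stated in the text): people behave independently and
# choose a hotel uniformly at random.
# Two given people are both in a hotel on a given day with probability
# 0.01 * 0.01, and in the same one with a further factor of 1 / HOTELS.
p_same_hotel_one_day = P_IN_HOTEL * P_IN_HOTEL / HOTELS

# The same coincidence on two given, different days.
p_same_hotel_two_days = p_same_hotel_one_day ** 2

# Number of candidate "events": every pair of people on every pair of days.
pairs_of_people = comb(PEOPLE, 2)
pairs_of_days = comb(DAYS, 2)

# Expected number of pairs that look suspicious purely by chance.
expected_suspicious_pairs = pairs_of_people * pairs_of_days * p_same_hotel_two_days

print(f"P(same hotel, one given day)  = {p_same_hotel_one_day:.1e}")
print(f"P(same hotel, two given days) = {p_same_hotel_two_days:.1e}")
print(f"Pairs of people               = {pairs_of_people:.2e}")
print(f"Pairs of days                 = {pairs_of_days:,}")
print(f"Expected suspicious pairs     = {expected_suspicious_pairs:,.0f}")
```

Under these assumptions the probability for any single pair is tiny (on the order of 10^-18), but it is multiplied by an enormous number of pairs of people and pairs of days, so the expected number of purely coincidental “suspicious” pairs comes out at roughly 250,000.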
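With roughly 250,000 pairs of people flagged purely by chance under these assumptions (the figure given in the source of the example), the police would not only intrude into the private lives of a disproportionate number of harmless (and innocent) citizens, but would also be called upon to make an investigative effort that is simply unsustainable in practice.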
It is for these reasons that a security project proposed by the Bush administration in 2002 under the evocative name “Total Information Awareness” was shut down early and its funding was not renewed.
But in 2002 the Big Data Analytics “craze” had not yet taken off, and since then many seem to have reconsidered the idea…
[1] The example is adapted from the one presented in the masterful book “Mining of Massive Datasets” by Anand Rajaraman, Jure Leskovec, and Jeffrey D. Ullman.