“If we torture data long enough, they will confess
(revealing the secret messages that God has sent us)”

R. Coase (freely adapted quote)

The search for “hidden” connections and obscure meanings within data, if not supported by rigorous, scientific methodological criteria, can lead to detecting correlations produced by pure chance (known as “spurious correlations”), which tend to emerge all the more easily as datasets grow in size.

In this sense, numerology, an ancient practice that has survived to the present day and come back into vogue thanks to Dan Brown’s famous book “The Da Vinci Code”, offers an instructive example.

The results obtained through such practices have no scientific value (and should therefore remain confined to fiction).

Science doesn’t play with numbers

Nevertheless, numbers retain their “halo” of plausibility, owing to their strong power of suggestion and to their “narrative” characteristics, such as:

  • reconstructing the “facts” in a way that yields a complete, coherent sense corresponding to the “truth” (by leveraging confirmation bias);

  • relying on “cherry-picked” numerical samples, thereby evoking an “appearance of scientificity” (by virtue of the “data-driven” fallacy, according to which “data always speak for themselves”).

We deal with this topic here precisely to avoid falling into the same methodological errors when handling large amounts of data, as in Big Data analytics.

Let’s see how to avoid falling prey to data illusions.
