
The Fallouts Of Ignoring Small Data Won’t Be Small 

Sujeet Mishra

Jul 27, 2017, 06:26 PM | Updated 06:26 PM IST


LEGO bricks (Ben Pruchnie/Getty Images)
  • Sure, focus on Big Data, but remember, Small Data insights hold the key to read the mind of the person who would finally write the cheque.
  • If the stories of Kodak, Xerox and Nokia are the staple diet on which modern narratives of innovation and disruption are predicated, then this LEGO story may be seen as the most persuasive and compelling argument on the limitations of Big Data. In it lie the reasons for the occasional failures of forecasting, whether of elections or of business strategy, and for why some politicians stay so well ahead of the curve.

    Small Data will be the leitmotif of tomorrow, when machine learning increasingly takes over the mundane, structured, rule-based components of our lives.

    In an interview with the online business analysis journal Knowledge@Wharton, Martin Lindstrom, author of Small Data: The Tiny Clues That Uncover Huge Trends, presciently observes:

    Big Data is all about finding correlations, but Small Data is all about finding the causation, the reason why.

    This conclusion is sure to warm the hearts of those who have been taking on engineers and scientists.

    Book cover of Small Data: The Tiny Clues That Uncover Huge Trends by Martin Lindstrom

    To an extent, this schadenfreude is justified, as the engineering community began treating Big Data as dogma. This is not to decry the strength of Big Data, but to moderate the attempts to oversell it, as many of us tend to do with ideas we are in love with. Such behaviour, I may add, has also been the curse of humanities education in India.

    In carrying out statistical analyses, machine learning tends to overlook causative factors, the limited bandwidth of databases and bias in sample selection, no matter how large the data set is. The limitations Google faced in forecasting flu outbreaks, or the failure of poll forecasts in the recent presidential election in the United States, should temper expectations from statistics.

    Big Data

    The internet has permitted creation and accretion of knowledge at an unprecedented scale. The sheer scale has led to the speciation of what was called, well, just data. Social media has become the new data generator.

    On YouTube alone, which was created on 14 February 2005, 300 hours of video are uploaded every minute, and 3.25 billion hours of video are watched each month.

    Twitter, which debuted in 2006, sees 500 million tweets each day, an average of about 6,000 tweets each second.

    Facebook, which opened to the world in 2006, now generates 4 new petabytes of data each day, adding to the 300 petabytes already stored; 250 billion photographs have been uploaded at an average of 350 million each day, with 4 million likes generated every minute (a thousand giga makes a tera, and a thousand tera makes a peta).

    These large data lakes are what the world looks up to as Big Data. In the millions of actions per minute (likes on Facebook, searches on Google, tweets on Twitter) one can read deep into the psyche of populations. A fair point if one is analysing what are essentially physical processes.

    But it is increasingly being realised that Big Data can be grossly misleading in some cases, as data sets generated by social media can carry very strong biases.

    Revolutions and unrest spread like contagion on social media. News reaches you before the editorial sieves let it through to the media outlets. So, can a revolution be predicted from tweets or search queries? Or would that go the way of the promise of flu prediction by search engines?

    Here comes Small Data

    Insights which lead to disruption, say the creation of the Steve Jobs-led Apple Inc., are clearly beyond the reach of Big Data-driven decision-making. The case of LEGO quoted by Martin Lindstrom, for example, is very instructive: a chance interaction with an eleven-year-old German boy made the company realise how misled it was about the aspirations of a new generation.

    As he observes, “every culture has its own corridors for desire and escape”, and understanding this unmet desire, of which even the person himself might be unaware, is the key to creating big, enduring value. Gifted political leaders are known to have the ability to capture Small Data and make counter-intuitive decisions. This approach also gives another very important insight.

    The times to come are likely to see the individual customer at the centre of the manufacturing universe, with products customised to her needs. In this scaleless world, would there be space for giants like IKEA and Walmart? What about the LEGOs, which make products that are neither easy to make nor easy to play with?

    Small Data insights hold the key to reading the mind of the person who would finally write the cheque. In his book, Lindstrom shares his 7C approach, a protocol for Small Data-driven decision-making: Collecting, Clues, Connecting, Causation, Correlation, Compensation and Concept.

    So, can we say Small Data is all about people while the domain of Big Data is exclusively inanimate? Or would the two co-exist?

    Contrary to the popular perception that the Internet of Things (IoT) is inherently about Big Data, a sensor, say a thermostat, would be generating Small Data on which an air conditioner would trip. So, while Small Data-driven decision-making in a domestic IoT network would act on the immediate (temperature, light intensity, etc.), it would also influence decisions taken at the grid level.

    (While on Small Data, this initiative of Cornell is also worth a closer look.)

    Lessons for the Government

    Large data sets can deliver high value in archival and compliance. Land ownership assurance, revenue transactions, farm subsidies, etc., can be flawlessly delivered in the most transparent manner. Big data sets like land records, remote sensing imagery collections and the Aadhaar database can be the foundational layer for rural governance.

    However, which crop would be sown in a region in a particular season is still a decision that needs ears on the ground: a case for Small Data collection and analysis. This data can typically best be captured and fed to a larger decision-making machinery by grass-roots government functionaries like patwaris, who remain the most resistant-to-change layer of governance and the main cause of discontent with governments in rural India. Online initiatives can surely make things easier, but that does not rule out the need for better-trained and motivated hands in the field.

    Policymakers need to keep the distinction between big and small always in sight, and not just be driven by over-promising consultants and marketeers. As stakes grow higher, we cannot survive the sin of inadvertence.

    Dr Sujeet Mishra is a railwayman and currently the OSD of the National Rail and Transportation Institute, which is in transition to become Gati Shakti Vishwavidyalaya, a central university.

