Many Organizations Put Big Data Aside To Focus on Small Data
Much of the buzz around data over the past few years has been about big data and how organizations are using large data sets to give them a business edge. However, due in no small part to the pandemic, small data is moving into the spotlight.
Small Data Is Not New
According to Bryan Philips, head of marketing at Cupertino, Calif.-based In Motion, small data is the opposite of big data. It is a term that describes data sets with fewer than 1,000 rows or columns. The term was coined in 2011 by researchers at IBM to describe data sets that are too small for traditional statistical methods. In contrast to big data, small data sets can be analyzed using estimation. Examples of small data sets include customer transactions, social media posts, and individual genome sequences.
Small data, or the use of small data sets, is not new. In 2019, Arun Ramaswamy, chief technology officer for NielsenIQ, pointed out in a CMSWire post that the era of big data is coming to an end. He wrote that it will be replaced by small data sets, both because AI is developing so that it can do more with less and because regulations like the GDPR and the California Consumer Privacy Act have made consumer data harder to access.
But small data goes back even further than that. In fact, as early as 2014, David Lavenda, a product expert with extensive experience leading information-intensive technology organizations, wrote that while specialized business analysts have been able to exploit it at a macro level, big data has failed to provide individual workers with the insights they need to act daily.
“There is nothing wrong with big data per se; but it’s not actionable for individual workers. What workers need is not big data, but rather, relevant data presented in smaller and smarter chunks,” he wrote in CMSWire.
Small Data Is Trending
It has clearly taken a long time for the message that small data is useful too to trickle down, but it has, and Gartner has now identified small data as one of the top 10 trends in the data and analytics space for this year. It is hard to know whether the pandemic caused the change in focus, but it certainly contributed to it.
“The speed at which the COVID-19 pandemic disrupted organizations has forced D&A leaders to have tools and processes in place to identify key technology trends and prioritize those with the biggest potential impact on their competitive advantage,” said Rita Sallam, distinguished research vice president at Gartner, in a statement about the trends.
The research also points out that with the evolution and widespread traction of artificial intelligence (AI) and machine learning (ML) across the enterprise, businesses can now apply new techniques for smarter, less data-hungry AI solutions.
More to the point, Gartner points out that the extreme business changes from the COVID-19 pandemic caused ML and AI models based on large amounts of historical data to become less relevant. Running parallel to that is the fact that decision making by humans and AI has become wider in scope and requires data from different sources for accurate responses to queries.
As a result, Gartner recommends that organizations adopt technologies that can use whatever data is available. That includes wider sets of data, which enable the analysis and combined use of a variety of small and large, structured and unstructured data sources, and small data, which is the application of analytical techniques that require less data but still offer useful insights.
Small Enterprises Join the Data Party
Small data analysis also enables small enterprises to join the data party, according to Lior Shamir, associate professor of computer science at Kansas State University. He said that small data is data that humans can read and understand without the need to use machines. Although small data can be analyzed without using computers, machine learning and statistical methods can be applied to better understand the data and identify patterns that are difficult to identify and quantify manually. Such patterns can then be examined manually to provide deeper understanding through the data. These insights can often be more informative than analysis of big data, which relies entirely on machine analysis and whose results are sometimes more difficult to translate safely into actions.
This will be particularly useful for B2B companies, which often struggle with how to use machine learning and AI because they think they need millions of transactions and events to make an accurate prediction, Amnon Mishor, founder and CTO at San Francisco-based Leadspace, told us. It is true that more data usually means greater accuracy, but B2B companies can often make a big impact by using a small sample, such as a few dozen deals over the past year or two, to point them to other companies that look like those wins. Each additional deal then provides an incremental lift in that prediction.
“But beyond just those preliminary models,” Mishor said, “many companies we have worked with focus on tapping into big data resources from third parties or other sources that tell you more about the full company structure, employees, what sub-industries they are in, technologies they use, specialties, etc. By uncovering the deeper DNA of your small data, you can increase the predictive capacity and results.”
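To make the lookalike idea Mishor describes more concrete, here is a minimal sketch in Python. It is an illustration only, not Leadspace's method: the feature columns, company names, and the choice of cosine similarity against an averaged profile of past wins are all assumptions made for the example.

```python
import math

def centroid(rows):
    """Average each feature column across the won deals."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def rank_lookalikes(won_deals, candidates):
    """Score each candidate against the average profile of past wins."""
    profile = centroid(won_deals)
    scored = [(name, cosine(vec, profile)) for name, vec in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Hypothetical features per company:
# [employees in hundreds, uses cloud (0/1), is a software company (0/1)]
won_deals = [
    [2.0, 1.0, 1.0],
    [3.0, 1.0, 1.0],
    [2.5, 1.0, 0.0],
]

# Hypothetical prospects to score.
candidates = {
    "Acme": [2.4, 1.0, 1.0],
    "Globex": [40.0, 0.0, 0.0],
    "Initech": [0.5, 0.0, 1.0],
}

ranking = rank_lookalikes(won_deals, candidates)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

Appending a newly closed deal to won_deals shifts the averaged profile slightly, which is one simple way to picture the incremental lift from each additional deal. Enriching the feature vectors with third-party attributes, as Mishor suggests, would amount to adding columns.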
There are other advantages too, Christos A. Makridis, research professor at Arizona State University and senior adviser at Gallup, told us. He pointed out that one of the fundamental limitations in ML is that you do not always know what success looks like, so you can train a classifier to predict outcomes that might not actually be optimal. Moreover, when there is a regime change, the models can produce wildly different, nonsensical predictions.
The emergence of small and unconventional data offers advantages for decision-makers and marketers. For example, email metadata can tell you who is opening an email and what sort of emails they are likely to open. Furthermore, soft information, including what comes up in conversations, can point you in the right direction as to what hard data you might want to look at.
“Ultimately, the most important phenomena are usually the most complex,” he said. “Neatly formatted numerical data might be easiest to work with, but it might not be the most revealing. New, unstructured, and smaller scale data offers even more potential.”
Enlist Some Analytics Dashboarding
Today, there are multiple cloud-based, front-end reporting and dashboarding tools that are both intuitive and inexpensive to leverage, said Craig Kelly, VP of analytics at Montreal-based Syntax. Ideally, organizations can find low-hanging metrics that are not overly complicated yet still provide value to the business, and that source data from one or two data sources directly, to build out those dashboards quickly.
With analytics, it is a great idea to start with a small scope to get some quick wins with users. This can increase user adoption rates early, and the frequent feedback will help organizations determine whether they are on the right track. Organizations can always add to what they have already built, assuming it was properly structured at the start.
“Obtaining quick wins in analytics is great because it removes skepticism from within the business by showing data analytics’ real value. It also creates stakeholders who have a vested interest in championing a larger implementation, which greatly increases your odds of success,” Kelly said. The result, he added, is that the team is encouraged to start actively using the applications, which helps everyone better understand the moving parts involved. This should provide perspective and input for developing an optimal end solution.
Most importantly, make sure that the data you are presenting to your end users is accurate. Users will inherently question the data they see, and if it proves to be incorrect, it will quickly kill the credibility of the solution for good.
About the Author
David is a full-time journalist based in Paris, who spends his time working between Ireland, the UK and France. A partisan of ‘green’ living and conservation, he is particularly interested in information management and how enterprise content management, analytics, big data and cloud computing impact on it.