Big Data and blockchain – a breakthrough in data analysis

The constant acceleration of data growth is part and parcel of today’s realities. Social networks, mobile devices, data from measurement devices, and business information are just a few types of sources capable of generating gigantic amounts of data.

Nowadays, the term Big Data has become quite common, yet not everyone is aware of how quickly and profoundly technologies for processing large amounts of data are changing society. These changes create new problems and challenges in many areas, including information security, where confidentiality, integrity, and availability should be in the foreground.

Unfortunately, many of today’s companies adopt Big Data technology without first building the infrastructure needed to reliably store the vast amounts of data they collect. Blockchain technology, on the other hand, is rapidly evolving to solve this and many other problems.

What is Big Data?

At first glance, the definition is self-evident: “big data” means managing and analyzing very large amounts of data. More broadly, it is information that cannot be processed by classical methods because of its large volume.

The term Big Data itself appeared relatively recently. According to Google Trends, it grew in popularity at the end of 2011.

In 2010, the first products and solutions directly related to big data began to appear. By 2011, the majority of major IT companies, including IBM, Oracle, Microsoft, and Hewlett-Packard, were actively using the term Big Data in their business strategies. Gradually, IT market analysts also began to actively research the concept.

Currently, the term has gained considerable popularity and is actively used in a variety of areas. However, we cannot say with certainty that Big Data is a fundamentally new phenomenon – on the contrary, big data sources have existed for many years. In marketing, they include databases of customer purchases, credit histories, lifestyles, etc. For years, analysts have used this data to help companies predict future customer needs, assess risks, shape consumer preferences, etc.

This has now changed in two ways:

  • More sophisticated tools and methods for analyzing and comparing different data sets have emerged;
  • The analysis tools have been supplemented by a host of new data sources due to the widespread transition to digital technology, as well as new methods of data collection and measurement.

Researchers predict that Big Data technologies will be most actively used in manufacturing, healthcare, commerce, government, and a variety of other sectors and industries.

Big Data is not any particular array of data, but a set of methods for processing it. The defining characteristic of big data is not only its volume, but also other categories that characterize the labor-intensive processes of data processing and analysis.

The input data for processing can be, for example:

  • logs of Internet user behavior;
  • data from Internet of Things devices;
  • social media;
  • meteorological data;
  • digitized books from major libraries;
  • GPS signals from vehicles;
  • transaction information from bank customers;
  • location data of mobile network subscribers;
  • information on purchases at major retail chains, etc.

The volume of data and the number of data sources are constantly growing, and against this background new methods of processing information are appearing and existing ones are being improved.

Basic principles of Big Data:

  • Horizontal scalability – data sets can be enormous, so a Big Data processing system must be able to expand dynamically as the data volume increases.
  • Fault tolerance – even if some hardware elements fail, the entire system must remain operational.
  • Data locality – in large distributed systems, data are usually spread across a large number of machines. Where possible, and in order to save resources, the data are processed on the same server where they are stored.
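
The three principles above can be illustrated with a toy, single-process sketch. It is not how any particular Big Data platform is implemented: the node names, the hash-based placement rule, and the helper functions are all hypothetical, and the sketch assumes the replica count does not exceed the number of nodes. Records are spread over several "nodes" (scalability), each record is kept on two nodes (fault tolerance), and every node sums only the records it stores itself (data locality).

```python
import hashlib


def replica_nodes(key, nodes, replicas):
    """Return the nodes that hold a key: a hashed primary plus its successors."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    first = int(digest, 16) % len(nodes)
    return [nodes[(first + i) % len(nodes)] for i in range(replicas)]


def place(records, nodes, replicas=2):
    """Distribute (key, value) records so each one is stored on `replicas` nodes."""
    storage = {node: [] for node in nodes}
    for key, value in records:
        for node in replica_nodes(key, nodes, replicas):
            storage[node].append((key, value))
    return storage


def partial_sum(node, storage, nodes, replicas, failed):
    """Work done on one live node: sum only the local records it must answer for."""
    total = 0
    for key, value in storage[node]:
        live = [n for n in replica_nodes(key, nodes, replicas) if n not in failed]
        # The first live replica answers for the key, so nothing is counted twice.
        if live[0] == node:
            total += value
    return total


def cluster_sum(storage, nodes, replicas=2, failed=frozenset()):
    """Combine the partial sums of every node that is still running."""
    return sum(
        partial_sum(node, storage, nodes, replicas, failed)
        for node in nodes
        if node not in failed
    )


records = [(f"user-{i}", i) for i in range(100)]  # values sum to 4950
nodes = ["node-a", "node-b", "node-c"]
storage = place(records, nodes)
print(cluster_sum(storage, nodes))                     # 4950
print(cluster_sum(storage, nodes, failed={"node-b"}))  # 4950, replicas cover the loss
```

Adding a fourth name to `nodes` and calling `place` again spreads the same records over more machines. Note that this simple modulo placement moves most keys when the node count changes; real systems typically use schemes such as consistent hashing to limit that reshuffling.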

New breakthrough technologies, such as blockchain, are needed to make all three principles work consistently and, therefore, store and process big data in a highly efficient manner.

What is Big Data for?

The scope of Big Data applications is constantly expanding:

  • Big data can be used in medicine. For example, a patient can be diagnosed not only by analyzing the patient’s medical history, but also by taking into account the experience of other doctors, information about the environmental situation of the area where the patient lives, and many other factors.
  • Big Data technologies can be used to organize the movement of unmanned vehicles.
  • By processing large amounts of data, it is possible to recognize faces in photos and videos.
  • Big Data technologies can be used by retailers – trading companies can actively use massive amounts of data from social networks to effectively adjust their advertising campaigns, which can be maximally oriented at a particular consumer segment.
  • This technology is actively used in election campaigns, including the analysis of political preferences in society.
  • The use of Big Data technologies is relevant for Revenue Assurance (RA) solutions, which include tools for the detection of inconsistencies and in-depth data analysis allowing timely identification of possible losses or distortions of information that can lead to a decrease in financial results.
  • Telecom providers can aggregate big data, including geolocation data; in turn, this information can be of commercial interest to advertising agencies, which can use it to display targeted and localized advertising, as well as to retailers and banks.
  • Big data can play an important role in the decision to open a point of sale in a certain location based on the presence of a strong target flow of people.

Thus, the most obvious practical application of Big Data technology lies in the field of marketing. Thanks to the development of the Internet and the proliferation of communication devices of all kinds, behavioral data (such as the number of calls, buying habits and purchases) is becoming available in real time.

Big Data technologies can also be effectively used in finance, for sociological research and in many other spheres. Experts argue that all these possibilities of using big data are only the visible part of the iceberg, because on a much larger scale, these technologies are used in intelligence and counterintelligence, in the military, and in everything that is commonly referred to as information warfare.

In general terms, the sequence of work with Big Data consists of collecting data, structuring the information obtained through reports and dashboards, and the subsequent formulation of recommendations for action.

Let us take a brief look at the possibilities of using Big Data technologies in marketing. For the marketer, information is the main tool for forecasting and strategizing. Big Data analysis has long been successfully used to determine the target audience, interests, demand, and consumer activity. In particular, it allows advertising (based on the RTB auction model – Real Time Bidding) to be shown only to those consumers who are interested in a product or service.

The use of Big Data in marketing allows business people to:

  • get to know their consumers better, attract a similar audience online;
  • assess the degree of customer satisfaction;
  • understand whether the service offered meets expectations and needs;
  • find and implement new ways to increase customer trust;
  • create projects that are in demand, etc.

For example, Google Trends can show the marketer the seasonal pattern of demand for a particular product, its fluctuations, and the geography of search interest. If you compare this information with statistical data collected by an analytics plugin on your own site, you can make a plan for the allocation of the advertising budget by month, region, and other parameters.
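
As a rough illustration of such a plan, the sketch below splits a budget across months in proportion to a combined demand score. The weighting rule (trend index multiplied by on-site orders), the function name, and all figures are hypothetical assumptions made for the example, not a method described by Google Trends or by any analytics tool.

```python
def allocate_budget(total_budget, trend_index, site_orders):
    """Split a budget across months in proportion to trend index x on-site orders."""
    # Hypothetical scoring rule: months missing from either source are skipped.
    scores = {
        month: trend_index[month] * site_orders[month]
        for month in trend_index
        if month in site_orders
    }
    total_score = sum(scores.values())
    if total_score == 0:
        raise ValueError("no demand signal to allocate against")
    return {
        month: round(total_budget * score / total_score, 2)
        for month, score in scores.items()
    }


# Made-up inputs: a 0-100 search-interest index and monthly orders from site statistics.
trend_index = {"Jan": 40, "Feb": 60, "Mar": 100}
site_orders = {"Jan": 50, "Feb": 50, "Mar": 50}
plan = allocate_budget(12000, trend_index, site_orders)
print(plan)
```

With these inputs the month with the highest search interest receives half of the budget. The same function could be applied per region instead of per month by changing the dictionary keys.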

According to many researchers, it was precisely through segmentation and the use of Big Data that Trump’s campaign succeeded. The team of the future U.S. president was able to correctly divide the audience, understand its desires and show exactly the message that voters wanted to see and hear. Thus, according to Irina Belysheva of Data-Centric Alliance, Trump’s victory was largely made possible by a non-standard approach to Internet marketing, which was based on Big Data, psychological-behavioral analysis and personalized advertising.

Trump’s political technologists and marketers used a specially developed mathematical model that allowed them to deeply analyze data from all U.S. voters and systematize it to make ultra-precise targeting not only by geography, but also by voters’ intentions, interests, psychotypes, behavioral characteristics, etc. The marketers then organized personalized communication with each group of citizens based on their needs, attitudes, political views, psychological traits and even skin color, using a different message for virtually every single voter.

As for Hillary Clinton, she used “time-tested” methods in her campaign, based on sociological data and standard marketing, dividing the electorate into only formally homogeneous groups (men, women, African Americans, Hispanics, poor, rich, etc.).

As a result, the winner was the one who appreciated the potential of new technologies and methods of analysis. It is noteworthy that Hillary Clinton’s campaign spending was twice as much as her opponent’s.

The main problems with using Big Data

In addition to the high cost, one of the main factors inhibiting the adoption of Big Data in various fields is the problem of choosing which data to process: that is, determining which data should be extracted, stored and analyzed, and which should not be taken into account.

Another problem with Big Data is of an ethical nature. In other words, a legitimate question arises: can such data collection (especially without the user’s knowledge) be considered a violation of privacy boundaries?

It is no secret that the information stored by the search engines Google and Yandex allows these IT giants to constantly refine their services, make them more user-friendly and create new interactive applications. To do this, search engines collect data about user activity on the Internet: IP addresses, geolocation, interests and online purchases, personal data, mail messages, etc. All of this allows them to display contextual ads that match the user’s online behavior. At the same time, users are not usually asked for their consent, and they are not given the opportunity to choose what information about themselves they want to provide. That is, by default everything is collected and then stored on these companies’ data servers.

From this comes the next important problem concerning the security of the storage and use of data. For example, is a particular analytics platform to which consumers are automatically submitting their data secure? In addition, many business representatives note the shortage of highly skilled analysts and marketers who can effectively operate with large volumes of data and solve specific business problems with their help.

Despite all the difficulties with the introduction of Big Data, businesses intend to increase investment in this area. According to Gartner research, the leading industries investing in Big Data are media, retail, telecom, banking and service companies.

Prospects for interaction between blockchain and Big Data technologies

Integrating distributed ledger technology with Big Data has synergies and opens up a wide range of new opportunities for businesses, including:

  • access detailed information about consumer preferences, from which analytical profiles for specific suppliers, products and product components can be built;
  • integrate detailed transactional data and consumption statistics for specific product groups;
  • obtain detailed analytical data on the supply and consumption chains, monitor product losses during transportation (e.g. weight losses due to drying and evaporation of certain types of goods);
  • counter product counterfeiting, strengthen anti-money-laundering and anti-fraud measures, etc.

Access to detailed data on the use and consumption of goods will largely unlock the potential of Big Data technology to optimize key business processes, reduce regulatory risks, open up new opportunities for monetization and the creation of products that will best meet current consumer preferences.

As we know, representatives of major financial institutions, including Citibank, Nasdaq, and Visa, are already showing significant interest in blockchain technology. According to Oliver Bussmann, IT manager of the Swiss financial holding company UBS, blockchain technology can “reduce transaction processing time from days to minutes.”

The potential for analyzing financial information from blockchain using Big Data technology is enormous. Distributed ledger technology ensures the integrity of information as well as the reliable and transparent storage of all transaction history. Big Data, in turn, provides new tools for effective analysis, forecasting, economic modeling, and thus opens up new possibilities for making more informed management decisions.

The tandem of blockchain and Big Data can be successfully used in healthcare. As we know, imperfect and incomplete patient health data multiplies the risk of misdiagnosis and improper treatment. Critical patient health data must be as secure as possible, immutable, verifiable, and not subject to any manipulation.

Blockchain information meets all of these requirements and can serve as high-quality and reliable raw data for in-depth analysis using new Big Data technologies. In addition, with blockchain, medical institutions would be able to share reliable data with insurance companies, justice agencies, employers, academic institutions and other organizations that need medical information.

Big Data and information security

Broadly understood, information security is the protection of information and its supporting infrastructure from accidental or intentional negative impacts of a natural or artificial nature.

Big Data faces the following challenges in the area of information security:

  • data protection and data integrity challenges;
  • the risk of unauthorized interference and leakage of confidential information;
  • improper storage of confidential information;
  • risk of loss of information, for example, due to someone else’s malicious actions;
  • risk of misuse of personal data by third parties, etc.

One of the main problems of big data that blockchain is intended to solve is information security. When all of its basic principles are enforced, blockchain technology can ensure data integrity and reliability, and by avoiding a single point of failure it makes information systems more stable. Distributed ledger technology can help solve the problem of trust in data, as well as enable the universal exchange of data.
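
The integrity property mentioned above can be sketched with a minimal hash chain. This is an illustrative assumption of how blocks might be linked, not the format of any real blockchain: the field names and helper functions are hypothetical, and the sketch leaves out distribution across nodes and consensus, which are what remove the single point of failure in practice.

```python
import hashlib
import json

GENESIS_PREV_HASH = "0" * 64  # placeholder "previous hash" for the first block


def block_hash(index, data, prev_hash):
    """Hash a block's contents together with the hash of the block before it."""
    payload = json.dumps(
        {"index": index, "data": data, "prev_hash": prev_hash}, sort_keys=True
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_block(chain, data):
    """Add a record to the end of the chain, linking it to its predecessor."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS_PREV_HASH
    index = len(chain)
    chain.append(
        {
            "index": index,
            "data": data,
            "prev_hash": prev_hash,
            "hash": block_hash(index, data, prev_hash),
        }
    )


def is_valid(chain):
    """Recompute every hash and link; any edit to stored data breaks the check."""
    prev_hash = GENESIS_PREV_HASH
    for index, block in enumerate(chain):
        if block["index"] != index or block["prev_hash"] != prev_hash:
            return False
        if block["hash"] != block_hash(index, block["data"], prev_hash):
            return False
        prev_hash = block["hash"]
    return True


chain = []
for amount in (100, 250, 75):  # made-up transaction records
    append_block(chain, {"amount": amount})

print(is_valid(chain))          # True
chain[1]["data"]["amount"] = 999  # tamper with a stored record
print(is_valid(chain))          # False
```

Because each block's hash covers the previous block's hash, changing one record invalidates that block and every block after it, which is what makes the stored history tamper-evident for later analysis.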

Information is a valuable asset, which means ensuring the basic aspects of information security must come first. To compete, companies must keep pace, which means they cannot ignore the potential opportunities and benefits that blockchain technology and Big Data tools offer.