Big Data and Blockchain: Combo or Opposition?

Big data and distributed ledger technologies remain among the most popular topics in IT. Their possible applications in every field, from banking to medicine, are discussed at conferences of all levels, corporate meetings and state councils [1]. In this article we analyze expert opinions and application cases to see whether combining Big Data and blockchain brings additional benefits, when combining these technologies is unnecessary, and whether there are examples of their joint use in practice.

Why is blockchain not needed in every Big Data project?

First of all, we should emphasize the fundamental difference between distributed ledger technologies and Big Data: Big Data involves integrating information from different sources, while in blockchain, on the contrary, copies of the same chain of records are stored on many different computers [2]. Decentralized storage and the sequential nature of data recording result in rather low data processing speed. In particular, the throughput of popular blockchain-based cryptocurrencies does not exceed 10 thousand transactions per second, despite many projects aimed at raising this figure. By comparison, the throughput of the international payment system Visa, which runs on centralized servers, is 24 thousand transactions per second. The concept of Big Data implies rapid processing of huge amounts of information, which blockchain cannot provide, at least not yet.
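The sequential nature of recording follows from the way each block embeds the hash of its predecessor: a new block cannot be built until the previous one is final. The following is a minimal single-machine sketch of that idea in Python, not a real distributed ledger. The function names (`block_hash`, `append_block`, `is_valid`) and the sample payments are illustrative assumptions, and there is no network, consensus, or mining here.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the first block


def block_hash(index, payload, prev_hash):
    """Hash a block's contents together with the previous block's hash."""
    body = json.dumps(
        {"index": index, "payload": payload, "prev_hash": prev_hash},
        sort_keys=True,
    )
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def append_block(chain, payload):
    """Append a block; it can only be built once the previous hash is known."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS_HASH
    index = len(chain)
    block = {
        "index": index,
        "payload": payload,
        "prev_hash": prev_hash,
        "hash": block_hash(index, payload, prev_hash),
    }
    chain.append(block)
    return block


def is_valid(chain):
    """Recompute every hash and check each link to the previous block."""
    prev_hash = GENESIS_HASH
    for block in chain:
        if block["prev_hash"] != prev_hash:
            return False
        expected = block_hash(block["index"], block["payload"], prev_hash)
        if block["hash"] != expected:
            return False
        prev_hash = block["hash"]
    return True


chain = []
for payment in ("alice->bob:5", "bob->carol:2", "carol->dave:1"):
    append_block(chain, payment)

print(is_valid(chain))  # True

# Rewriting an old record breaks the chain from that block onward.
chain[1]["payload"] = "bob->carol:200"
print(is_valid(chain))  # False
```

Because every write depends on the result of the previous one, appends cannot be parallelized the way independent inserts into a sharded Big Data store can, which is one reason throughput is limited.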

Data written to the blockchain stays there forever. It therefore only makes sense to use this technology for tasks that require permanent storage of immutable information, including information that is obsolete and no longer in use. There is no such need in fast-changing domains where each individual record has little value, such as information about a particular customer's purchases on a particular day. Today it is the marketing industry that uses big data most actively, to personalize promotional offers, as we have already discussed. In this case, Big Data solutions do not need blockchain at all to compile a detailed consumer profile or to manage corporate online reputation.

The concepts of blockchain storage and Big Data contradict each other


When is it useful to combine Big Data with blockchain?


Because of the way it records data, blockchain is well suited to tasks that demand the highest reliability and immutability of information [2], for example in the field of information security. Distributed ledger technology ensures data integrity and reliability and, because there is no single point of failure, the stability of information systems. Blockchain can solve the problem of trust in data and make universal data exchange possible [4].

The immutability and trustworthiness of blockchain records come in handy when organizing an automatic archive of data transactions, in particular for recording data pipelines. This helps avoid some of the worst Data Scientist errors in every phase of the CRISP-DM standard, which we wrote about earlier.
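As a rough illustration of such an archive, the sketch below keeps an append-only log of pipeline steps in which each entry stores a fingerprint of the step's output and the hash of the preceding entry. It is a local, in-memory sketch under stated assumptions: the `PipelineLog` class, the `fingerprint` helper, and the step names are hypothetical, and a real deployment would anchor the entries in an actual distributed ledger rather than a Python list.

```python
import hashlib
import json


def fingerprint(obj):
    """Deterministic SHA-256 fingerprint of a JSON-serializable object."""
    text = json.dumps(obj, sort_keys=True)
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


class PipelineLog:
    """Append-only log: each entry stores the hash of the entry before it."""

    def __init__(self):
        self.entries = []

    def record(self, step, rows):
        """Log a pipeline step together with a fingerprint of its output."""
        prev = self.entries[-1]["entry_hash"] if self.entries else ""
        entry = {"step": step, "data_hash": fingerprint(rows), "prev": prev}
        entry["entry_hash"] = fingerprint(entry)  # hash of the three fields above
        self.entries.append(entry)

    def verify(self):
        """Check that no entry was altered and that the links are intact."""
        prev = ""
        for e in self.entries:
            core = {"step": e["step"], "data_hash": e["data_hash"], "prev": e["prev"]}
            if e["prev"] != prev or e["entry_hash"] != fingerprint(core):
                return False
            prev = e["entry_hash"]
        return True


# Hypothetical two-step pipeline: load raw rows, then drop missing values.
raw = [{"id": 1, "amount": 10.0}, {"id": 2, "amount": None}]
cleaned = [r for r in raw if r["amount"] is not None]

log = PipelineLog()
log.record("load", raw)
log.record("drop_missing", cleaned)

print(log.verify())  # True
# A dataset can later be checked against the fingerprint recorded for its step.
print(log.entries[1]["data_hash"] == fingerprint(cleaned))  # True
```

With such a log, an analyst who receives a dataset can confirm which step produced it and that the recorded history of steps has not been edited after the fact.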

Blockchain can likewise support detailed supply chain and consumption analytics to track and control product loss in transit, such as weight loss of certain commodities due to drying and evaporation [4].

The combination of Big Data and blockchain can also be used in health care, so that important patient health data is as secure, immutable and verifiable as possible and is not subject to manipulation. Blockchain will also enable healthcare providers to share trusted information with insurance companies, justice agencies, employers, academic institutions, and other organizations that need health information. Read more about how to combine distributed ledger technology with Apache Kafka in our new article.

In addition, the decentralized nature of a distributed ledger makes it possible to eliminate intermediaries and interact directly with counterparties, without an intermediate layer such as a technology broker or insurance agent [2].

Blockchain is useful as an additional option for some Big Data projects

5 examples of successfully combining blockchain with Big Data

Storj and Filecoin are cloud-based big data storage services that promise high reliability, immutability and protection of data from unauthorized access. These services also promise to reduce the cost of data storage by 90% compared to similar solutions from Amazon Web Services [5].
Omnilytics is a system that combines blockchain with big data analytics for marketing, finance, auditing, trend forecasting and other applications in various industries. Users of the service can track their performance against the real-world performance of competitors and partners in their field. The service supports smart contracts, distributed data identification, information exchange through APIs and other protocols.
Datum is a decentralized information storage network, managed by a Data Access Token (DAT) and designed to monetize individual data.
Rublix is an international trading platform for cryptocurrency investors that verifies the authenticity and credibility of traders and provides access to market information to reduce confusion in the market. The immutability of blockchain guarantees reliable and verified investment data analytics [5].
Provenance is a service for storing and reporting product provenance data, aimed at consumers, producers and sellers. Customers get reliable information about what a product is made of, where it came from, and how it affects the environment. Manufacturers and retailers track each batch of products and, as the data accumulates, gain insight into customer needs and desires so they can tailor their products and services accordingly. Blockchain provides transparency throughout the supply chain, and Big Data tools provide the necessary analytics.