Blockchain Could Solve Data Quality Issues
Blockchain was originally created to record cryptocurrency transactions, and interest in it has grown along with interest in cryptocurrencies. It runs on a peer-to-peer (P2P) network: a large group of computers that, without a central server, store transactions in a secure, tamper-resistant form. The process works as follows: a transaction is requested; the P2P network receives the transaction; the network authenticates it; authenticated transactions are grouped into a block of data; and that block is then permanently linked to the existing blocks, forming a blockchain (Buterin et al.).
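The linking step can be illustrated with a minimal Python sketch. It assumes the transactions have already been authenticated and leaves out networking, digital signatures, and consensus entirely. The function names (`hash_block`, `add_block`, `is_valid`) and the sample transactions are hypothetical and are not taken from any real blockchain library.

```python
import hashlib
import json


def hash_block(index, transactions, previous_hash):
    """Return the SHA-256 hex digest of a block's contents."""
    payload = json.dumps(
        {"index": index, "transactions": transactions, "previous_hash": previous_hash},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def add_block(chain, transactions):
    """Append a block of (assumed already-authenticated) transactions."""
    previous_hash = chain[-1]["hash"] if chain else "0" * 64
    index = len(chain)
    block = {
        "index": index,
        "transactions": transactions,
        "previous_hash": previous_hash,
        "hash": hash_block(index, transactions, previous_hash),
    }
    chain.append(block)
    return block


def is_valid(chain):
    """Recompute every hash and check each block's link to the one before it."""
    for i, block in enumerate(chain):
        expected_prev = chain[i - 1]["hash"] if i > 0 else "0" * 64
        if block["previous_hash"] != expected_prev:
            return False
        recomputed = hash_block(block["index"], block["transactions"], block["previous_hash"])
        if block["hash"] != recomputed:
            return False
    return True


# Hypothetical sample data
chain = []
add_block(chain, [{"from": "alice", "to": "bob", "amount": 5}])
add_block(chain, [{"from": "bob", "to": "carol", "amount": 2}])
```

Because each block stores the hash of the block before it, editing an earlier transaction changes that block's hash and breaks the check in `is_valid`. This is the sense in which the blocks are "permanently" joined.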
Blockchain has also been found useful for storing not just cryptocurrency transactions but almost any type of transaction (Sarikaya). What makes blockchain unique is its decentralized design. In most systems, data is stored centrally, which gives cyber attackers a single target (Sarikaya). Blockchain instead stores the data across a decentralized network of nodes, which helps protect the integrity of the data (Sarikaya).
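To see why spreading copies across nodes helps, consider a simplified Python sketch in which each node holds its own copy of the records and an altered copy is detected by comparing fingerprints against the most common one. This is only an illustration of the idea. Real blockchain networks use far more involved consensus protocols, and the node names, record format, and function names here are hypothetical.

```python
import hashlib
from collections import Counter


def fingerprint(records):
    """Hash one node's copy of the records (a list of strings)."""
    digest = hashlib.sha256()
    for record in records:
        digest.update(record.encode("utf-8"))
        digest.update(b"\n")
    return digest.hexdigest()


def find_outlier_nodes(node_copies):
    """Return the names of nodes whose copy differs from the most common copy."""
    prints = {name: fingerprint(records) for name, records in node_copies.items()}
    majority_print, _ = Counter(prints.values()).most_common(1)[0]
    return sorted(name for name, fp in prints.items() if fp != majority_print)


# Hypothetical sample data: three nodes, one holding an altered record
node_copies = {
    "node_a": ["alice->bob:5", "bob->carol:2"],
    "node_b": ["alice->bob:5", "bob->carol:2"],
    "node_c": ["alice->bob:500", "bob->carol:2"],  # altered copy
}
outliers = find_outlier_nodes(node_copies)
```

An attacker who changes the data on one node is outvoted by the unchanged copies, whereas a centrally stored dataset has no second copy to compare against.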
Blockchain and data analytics both process data using algorithms, but they have different objectives (Sarikaya). Data analytics focuses on interpreting data and drawing predictions from it, whereas blockchain is used to validate and store data (Sarikaya). The Extract, Transform, Load (ETL) process, a subject covered in the MIS2502 course, prepares data for analysis by extracting it from source systems, transforming it, and loading it into a destination. One of the biggest challenges associated with ETL is maintaining data quality. Data analysts struggle to sustain data quality because of errors, missing values, or redundancies in the data.
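Two of these data quality problems, missing values and redundant rows, can be counted with a short Python sketch. It is a minimal illustration and not a method taken from the course or the cited sources. The field names, the sample rows, and the `quality_report` function are hypothetical.

```python
def quality_report(rows, required_fields):
    """Count rows with a missing required field and exact duplicate rows."""
    missing = 0
    duplicates = 0
    seen = set()
    for row in rows:
        # A required field counts as missing if it is absent, None, or empty.
        if any(row.get(field) in (None, "") for field in required_fields):
            missing += 1
        # Exact duplicates are detected by comparing all field/value pairs.
        key = tuple(sorted(row.items()))
        if key in seen:
            duplicates += 1
        else:
            seen.add(key)
    return {"rows": len(rows), "missing": missing, "duplicates": duplicates}


# Hypothetical extracted rows
rows = [
    {"id": 1, "customer": "Ann", "amount": 20.0},
    {"id": 2, "customer": "", "amount": 15.0},      # missing customer
    {"id": 1, "customer": "Ann", "amount": 20.0},   # duplicate of the first row
    {"id": 3, "customer": "Raj", "amount": None},   # missing amount
]
report = quality_report(rows, ["customer", "amount"])
```

A check like this runs during the transform step, before the data is loaded. It flags problems but cannot tell whether a value that is present is actually correct, which is where verification at the source matters.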
The course also notes that the primary method data scientists use to maintain data quality is to manually verify samples of the data. Blockchain could be an important tool for ETL because its verification process can help preserve data integrity. The data is stored and verified by a large network of authorized nodes (Sarikaya), which could reduce the need for manual verification. By helping to maintain data quality and securely storing validated data, blockchain could make data analysis significantly more efficient.
References
Buterin, D., Rosic, A., Martynov, V., Baksht, S., Ravaei, N., Wu, J., & Mitra, R. (n.d.). What is Blockchain Technology? A Step-by-Step Guide For Beginners. Retrieved from https://blockgeeks.com/guides/what-is-blockchain-technology
Sarikaya, S. (2019, January 05). How Blockchain Will Disrupt Data Science: 5 Blockchain Use Cases in Big Data. Retrieved from https://towardsdatascience.com/how-blockchain-will-disrupt-data-science-5-blockchain-use-cases-in-big-data-e2e254e3e0ab