MIS 2502 research project – Big Data

As the price of storage has dropped exponentially, the amount of data that organizations are willing to retain has grown at the same scale. Data sets become “Big Data” when they are so massive that a conventional database system can no longer handle them (Dumbill). A system that manages Big Data demands so much storage space, memory, processing power, and bandwidth that it must distribute both storage and processing across clusters of machines. Hadoop, developed by the Apache Software Foundation, is a popular open-source software framework for managing Big Data.
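To make the cluster idea concrete, below is a minimal sketch of the MapReduce programming model that Hadoop popularized, written in plain Python rather than against Hadoop’s real API; the sample documents and function names are illustrative assumptions, not Hadoop code.

from collections import defaultdict

def map_phase(document):
    # Emit (word, 1) pairs; in Hadoop, mappers run in parallel on the
    # cluster nodes that already hold each block of the data.
    for word in document.split():
        yield (word.lower(), 1)

def reduce_phase(pairs):
    # Sum the counts for each key; Hadoop shuffles pairs to reducers
    # grouped by key before this step runs.
    counts = defaultdict(int)
    for key, value in pairs:
        counts[key] += value
    return dict(counts)

# Toy "distributed" corpus -- made-up data for illustration only
documents = ["big data needs big clusters", "hadoop manages big data"]
pairs = [pair for doc in documents for pair in map_phase(doc)]
print(reduce_phase(pairs))  # {'big': 3, 'data': 2, 'needs': 1, ...}

In a real Hadoop job the two phases run in parallel across many machines; this toy version simply runs both in one process to show the shape of the computation.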

In MIS 2502, we learned that organizations keep many sets of data that serve different purposes. Transactional databases store real-time data, a process called online transaction processing (OLTP). As we covered in class, data is traditionally moved from a transactional database to an analytical database (OLAP) using extract, transform, load (ETL), where it can then be analyzed. In many ways, Big Data builds upon this process by integrating transactional database operations and ETL into one system, such as Hadoop. Hadoop can take in large amounts of messy, quickly changing data and, by applying tremendous processing power, convert it into something useful in near real time. The older method of simply bridging OLTP and OLAP with ETL created “snapshots” that could become outdated and required periodic reconstruction. When OLTP and OLAP are separated, activities like cluster analysis are only as recent as the last OLAP refresh. With Big Data, you can potentially watch visualized clusters form minute by minute as thousands of transactions are processed by the system.
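As a sketch of the traditional OLTP-to-OLAP pipeline described above, the following Python uses an in-memory SQLite database to stand in for both sides; the table names (sales, daily_sales) and the rows are hypothetical, and a production ETL job would run against real systems on a schedule.

import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# "OLTP" side: one row per transaction, written in real time
cur.execute("CREATE TABLE sales (id INTEGER, product TEXT, amount REAL, sold_on TEXT)")
cur.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", [
    (1, "widget", 9.99, "2015-04-01"),
    (2, "widget", 9.99, "2015-04-01"),
    (3, "gadget", 24.50, "2015-04-02"),
])

# Extract + Transform: aggregate raw transactions into an analysis-ready shape
cur.execute("""
    SELECT product, sold_on, SUM(amount), COUNT(*)
    FROM sales GROUP BY product, sold_on
""")
snapshot = cur.fetchall()

# Load: write the snapshot into the "OLAP" side. It is only as fresh as
# the last time this job ran -- the staleness the paragraph describes.
cur.execute("CREATE TABLE daily_sales (product TEXT, day TEXT, revenue REAL, units INTEGER)")
cur.executemany("INSERT INTO daily_sales VALUES (?, ?, ?, ?)", snapshot)
conn.commit()
print(cur.execute("SELECT * FROM daily_sales").fetchall())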

The analytical capabilities of Big Data grow with the progression of Moore’s law, which keeps increasing processing power and memory density. New technologies that increase the density of hard disks and reduce the cost of solid-state drives also fuel Big Data’s capabilities. One beneficiary of this growth is the incipient field of genomics (Feldman). The total cost of sequencing one human’s entire genome has dropped by about five orders of magnitude, from roughly $95 million in 2001 to about $1,000 in 2015, and comparing gene associations with human disease traits across many genomes is a job perfectly suited for Big Data, since each genome can be processed independently and in parallel.
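As a loose illustration of why such comparisons parallelize well, here is a toy association check in Python over made-up records; a real genome-wide study compares millions of variants across thousands of genomes, which is exactly the scale that pushes the work onto Big Data clusters.

# Hypothetical records: (genome_id, carries_variant, has_trait) per person
genomes = [
    ("genome_01", True,  True),
    ("genome_02", True,  False),
    ("genome_03", False, True),
    ("genome_04", True,  True),
    ("genome_05", False, False),
]

# Each genome is scored independently, so the records can be split
# across cluster nodes with no coordination until the final tally.
carriers = [g for g in genomes if g[1]]
non_carriers = [g for g in genomes if not g[1]]
rate_carriers = sum(1 for g in carriers if g[2]) / len(carriers)
rate_non = sum(1 for g in non_carriers if g[2]) / len(non_carriers)
print(f"trait rate, carriers: {rate_carriers:.0%}; non-carriers: {rate_non:.0%}")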

 

Works cited and referenced:

Dumbill, Edd. “Defining Big Data.” Forbes. Forbes Magazine, 07 May 2014. Web. 23 Apr. 2015. <http://www.forbes.com/sites/edddumbill/2014/05/07/defining-big-data/>.

“What Is Big Data?” SAS Institute Inc., n.d. Web. 23 Apr. 2015. <http://www.sas.com/en_us/insights/big-data/what-is-big-data.html>.

Henschen, Doug. “Big Data Debate: End Near For ETL?” InformationWeek. UBM Tech, 3 Dec. 2012. Web. 23 Apr. 2015. <http://www.informationweek.com/big-data/big-data-analytics/big-data-debate-end-near-for-etl/d/d-id/1107641>.

Feldman, Bonnie. “Genomics and the Role of Big Data in Personalizing the Healthcare Experience.” O’Reilly Radar. O’Reilly Media, Inc., 23 Aug. 2013. Web. 23 Apr. 2015. <http://radar.oreilly.com/2013/08/genomics-and-the-role-of-big-data-in-personalizing-the-healthcare-experience.html>.

 

