A note about that first reading: it’s a bit dated, and Hadoop has advanced since the article was written. Much of the focus in the open source community has been on side projects tied to Hadoop, and one common theme is that analytics and better user interfaces are being layered on top of it. Most companies use Hadoop through vendors like Cloudera and Hortonworks, which package Hadoop and sell services and support. To see what I mean about Hadoop and its related projects, see the main Apache page. For our purposes, we’ll keep Hadoop high level, but in a data science department, internship interviews, etc., you may want to know about projects like Hive, Cassandra, Pig, and Spark.