Here are some important things to do to prepare for Week 9:
Weekly quiz (you can click the highlights to get the related content)
- Please read the reading materials for Week 9 (two readings)
- Please finish the Week 9 quiz before class on 10/18 (Tuesday).
Additional extra credit task (due before the class on 10/20/2022)
- This task has three steps:
- Step 1: leave your information here
- Step 2: attend a mock interview
- Step 3: fill out a survey
- Before starting, there are several things you need to pay attention to:
- Complete the steps in order: step 1 first, then step 2, then step 3. Otherwise, half of your credit will be deducted because we cannot use your data.
- You need to finish all three steps; otherwise, half of the credit will be deducted.
- Please find a quiet place because the mock interview records your voice and video.
- The links for step 2 and step 3 also appear at the end of step 1, so following those links in the correct order works as well.
- The mock interview requires a laptop, and you must give it permission to access your microphone and camera.
- If you have taken this before, please use a different email address
- You don’t need to submit anything to me; I will put the additional extra credit in your gradebook next Thursday.
- If you want to get the $20 Amazon gift card, please take this task seriously and try to score above 85%.
- I will contact you once I see that you scored above 85%. Alternatively, you can email me about your score.
- Please let me know if you have any questions.
Week 8 key takeaways
- Relational database
- CSV and Excel are flat files
- A relational database stores different data in different tables.
- Benefits of a relational database
- integrity. It’s easier to maintain the integrity of data when the same item is recorded in one place only.
- flexibility. You can create different cuts of the data.
- efficiency. It’s faster to retrieve and update data when you don’t have to plough through lots of redundant values.
- However, relational databases are more complex to operate and use than flat files.
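To make the flat-file versus relational contrast concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The `customers` and `orders` tables and their sample rows are hypothetical, chosen only for illustration. Each customer's city is recorded in one place, so a single update keeps the data consistent (integrity), and a join produces one possible "cut" of the data (flexibility).

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with two related tables (hypothetical data)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE customers (
            customer_id INTEGER PRIMARY KEY,
            name TEXT NOT NULL,
            city TEXT NOT NULL
        );
        CREATE TABLE orders (
            order_id INTEGER PRIMARY KEY,
            customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
            amount REAL NOT NULL
        );
        """
    )
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?, ?)",
        [(1, "Ada", "Boston"), (2, "Ben", "Denver")],
    )
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [(10, 1, 25.0), (11, 1, 40.0), (12, 2, 15.0)],
    )
    conn.commit()
    return conn


def total_by_city(conn):
    """One 'cut' of the data: total order amount per customer city."""
    rows = conn.execute(
        """
        SELECT c.city, SUM(o.amount)
        FROM orders AS o
        JOIN customers AS c ON c.customer_id = o.customer_id
        GROUP BY c.city
        ORDER BY c.city
        """
    ).fetchall()
    return dict(rows)


conn = build_demo_db()
# Integrity: the city lives in one row, so one UPDATE fixes it for every order.
conn.execute("UPDATE customers SET city = 'Chicago' WHERE customer_id = 1")
totals = total_by_city(conn)
print(totals)
```

In a flat file, the city would be repeated on every order row, and each copy would need the same correction.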
- ETL: Extract, Transform, Load
- Why Do We Need ETL?
- The power of data analytics is often based on combining data from different sources
- However, data stored in different places are often formatted differently
- It can be very difficult to enforce a consistent schema
- Steps to consider when setting up an ETL process
- Read metadata (data dictionary)
- Choose the correct version of the data
- Set up rules for resolving inconsistencies, duplicates, omissions, and other problems in the data, and validate the data.
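The ETL steps above can be sketched with Python's standard library alone. Everything in this example is an assumption made for illustration: the two sample sources, their column names, the delimiters, and the rule that rows with missing values are dropped. It is not a production pipeline, only a small picture of extracting from differently formatted sources, transforming them into one schema, and loading the result without duplicates.

```python
import csv
import io
from datetime import datetime

# Two hypothetical sources that describe the same kind of data differently.
SOURCE_A = """customer,order_date,amount
Ada,10/18/2022,25.00
Ben,10/19/2022,
"""
SOURCE_B = """name;date;total
Ada;2022-10-18;25.00
Cy;2022-10-20;15.50
"""


def extract(text, delimiter):
    """Extract: read delimited text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))


def transform(rows, mapping, date_format):
    """Transform: rename columns, drop rows with omissions, normalize types."""
    clean = []
    for row in rows:
        record = {
            target: (row.get(source) or "").strip()
            for source, target in mapping.items()
        }
        if not all(record.values()):
            continue  # assumed rule: skip rows with any missing value
        parsed = datetime.strptime(record["date"], date_format)
        record["date"] = parsed.date().isoformat()
        record["amount"] = float(record["amount"])
        clean.append(record)
    return clean


def load(batches):
    """Load: combine batches into one table, keeping each record once."""
    seen = set()
    table = []
    for batch in batches:
        for record in batch:
            key = (record["name"], record["date"], record["amount"])
            if key not in seen:
                seen.add(key)
                table.append(record)
    return table


rows_a = transform(
    extract(SOURCE_A, ","),
    {"customer": "name", "order_date": "date", "amount": "amount"},
    "%m/%d/%Y",
)
rows_b = transform(
    extract(SOURCE_B, ";"),
    {"name": "name", "date": "date", "total": "amount"},
    "%Y-%m-%d",
)
warehouse = load([rows_a, rows_b])
print(warehouse)
```

The column mappings play the role of a small data dictionary: they record how each source's fields correspond to the shared schema.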
- ‘Big Data’ Is a Set of Technologies
- Hadoop: stores data in smaller chunks on different computers (nodes) across a network.
- MapReduce: processes the pieces of data in parallel on different nodes and combines the results.
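The MapReduce idea can be illustrated with a word count in plain Python. This is a single-process sketch, not the Hadoop API: the two strings in `chunks` stand in for pieces of data stored on different nodes, and the function names are hypothetical. On a real cluster, the map step would run on each node in parallel.

```python
from collections import defaultdict


def map_chunk(chunk):
    """Map: turn one chunk of text into (word, 1) pairs."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(mapped_chunks):
    """Shuffle: group the values from all chunks by their key (the word)."""
    groups = defaultdict(list)
    for pairs in mapped_chunks:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_counts(groups):
    """Reduce: combine the grouped values into one count per word."""
    return {key: sum(values) for key, values in groups.items()}


# Each string stands in for a chunk held on a separate node (assumption).
chunks = ["big data is big", "data is a set of technologies"]
mapped = [map_chunk(chunk) for chunk in chunks]  # parallel on a real cluster
word_counts = reduce_counts(shuffle(mapped))
print(word_counts)
```

Because each chunk is mapped independently, adding more nodes lets more chunks be processed at the same time; only the shuffle and reduce steps need to bring results together.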