-
Laurel Miller wrote a new post, Weekly Question #7: Due October 23, on the site Honors Data Science 1 month, 3 weeks ago
Leave your response as a comment on this post by the beginning of class on October 23. Remember, it only needs to be three or four sentences. For these weekly questions, I’m mainly interested in your o […]
-
Laurel Miller wrote a new post, In-Class Exercise 8.2: Visualizing KPIs, on the site Honors Data Science 1 month, 3 weeks ago
Here is the exercise.
And here is the spreadsheet to complete the exercise [In-Class Exercise 8.2 – OnTime Airline Stats [Jan 2014].xlsx].
-
Laurel Miller wrote a new post, 10/16 class, on the site Data-Centric Application Development 1 month, 3 weeks ago
Hello everyone,
In class today we will work through an exercise that is very similar in format to an MIS2402 exam.
Also, please be advised that an exam 1 study guide is available here.
Exercise instructions
Zip file -
Laurel Miller wrote a new post, Reading Quiz #7: Due October 21, on the site Honors Data Science 1 month, 3 weeks ago
Some quick instructions:
You must complete the quiz by the start of class on October 21.
When you click on the link, you may see a Google sign in screen. Use your AccessNet ID and password to sign in. […] -
Laurel Miller wrote a new post, In-Class Exercise 8.1: Identifying Key Performance Indicators, on the site Honors Data Science 1 month, 3 weeks ago
Here is the exercise.
-
Laurel Miller wrote a new post, Optional videos from assignment 2, on the site Data-Centric Application Development 1 month, 3 weeks ago
Optional 1
Optional 2 -
Laurel Miller wrote a new post, Weekly Question #6: Due October 16, on the site Honors Data Science 1 month, 4 weeks ago
Leave your response as a comment on this post by the beginning of class on October 16. Remember, it only needs to be three or four sentences. For these weekly questions, I’m mainly interested in your o […]
-
I haven’t had much prior experience with Excel or other data-sorting software, so I don’t recall any times I’ve made one of these data corruption mistakes. However, after looking at the list, I do think forgetting to back up your data as you’re working on it would be the most important to avoid. It’s a human-error mistake, and it would be pretty terrible to lose all your work because you forgot to hit ‘save’ every hour or so before your laptop crashed.
-
I do not have a lot of experience with Excel or other software, but I do think that fully backing up your data before and while you’re working on it is extremely important, and failing to do so would result in a lot of headaches. In addition, I think the top error, clicking ‘yes’ before fully evaluating the ‘do you want to remove this?’ message, is also extremely important to avoid. I feel like this would be the most challenging one for me, because I’m often quick to hastily dismiss messages like this when I’m working on a project.
-
I personally have made mistake #9: copying formulas that use relative coordinates. When I did this, I got a #REF! error in some of the formula cells, or even miscalculations in those cells. This happened when I didn’t lock some of the cells or columns. There are many issues when copying formulas that can cause these #REF! errors; not locking the cells is the one I run into most often.
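For anyone who hasn’t hit this before, here is a minimal sketch of the fix, using made-up cell addresses: a relative reference shifts every time a formula is copied, while dollar signs anchor it in place.
=B2*C1      copied down a column, this becomes =B3*C2, =B4*C3, and so on
=B2*$C$1    copied down, only the B reference shifts; $C$1 stays locked on one cell
Copying a relative reference past the edge of the sheet, or deleting the cell it points to, is what produces the #REF! errors mentioned above, and pressing F4 (in Excel for Windows) while editing a reference cycles through the dollar-sign options.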
-
Whenever I am working with a large data file and need to make a calculation, I almost always make the mistake of copying formulas that use relative coordinates. If I am trying to make a calculation that involves a certain fixed value, the relative coordinates will cause an error unless I lock the formula onto that cell. Additionally, I sometimes copy and paste formulas instead of values, and unless I notice that the number is wrong, it can lead to incorrect calculations. I would say that I usually make this error once and then remember to fix it whenever I am working with Excel formulas.
-
I also don’t use Excel a lot; I often use SAS or R to deal with data. For me, I think number six is very important to avoid. In R and other programming languages, the programmer has to specify types, so it is very important to understand the basic data types and know how to operate on them. When I first learned how to use R, Java, and Python, I was very frustrated dealing with this, but once I got used to it, types turned out to be very useful and help prevent errors. Thus, I think it is very important to avoid the ‘miss the data type’ mistake.
-
When working with Excel, I have run into mistake #9 (copying formulas with relative coordinates) on many occasions. It’s a common mistake among Excel users, and I have frequently gotten reference errors and miscalculations as a result of copying a formula incorrectly. This usually results from not locking the values or cells. While I have gotten better at avoiding such mistakes, they still come up occasionally.
-
I believe the tip on backing up data (mistake #3) is the most important to remember. Backing up data prior to working with it ensures that it is not lost or corrupted beyond recovery. The tip also mentions saving periodically with corresponding version numbers in the file names, which is a great organizational practice.
-
I have made a lot of these mistakes over the years. One I remember specifically is “number 6: miss the data type.” When doing batch student record imports for my job at Temple, I have to be very careful with dates, such as the date of birth or date of registration, since our registration system will not accept them in the wrong format. Sometimes I would have to re-check the Excel sheet many times before figuring out that this was the problem.
-
A mistake I commonly find myself making is #9. It is quite an easy mistake to make. It usually happens to me once or twice while working with data in Excel, and then I almost never make it again during that session. Things can also go wrong when you mean to copy a value but copy the formula along with it, which can lead to errors in the data later down the road. A way to avoid this mistake is to always check what is actually in the cell after you paste into it.
-
I make mistake #8 occasionally. When I use VLOOKUP to pull the data associated with a numerical input, I sometimes forget to set the “approximate match” argument to FALSE, and Excel returns data for an input that is not actually in the data range. This has caused me to think the input was in the data range when it actually was not, and I would not find out until subsequent errors appeared.
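As a quick sketch with hypothetical ranges, the last argument is what controls this behavior:
=VLOOKUP(A2, Data!$A$2:$B$100, 2, FALSE)   exact match; returns #N/A if A2 isn’t in the first column of the table
=VLOOKUP(A2, Data!$A$2:$B$100, 2)          omitted (or TRUE) means approximate match, so Excel returns the closest smaller value instead of an error
That approximate-match default is what silently produces a value for an input that isn’t really in the range.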
-
#2: I think that the most important mistake to avoid from this article is not backing up your data. A lot of the other mistakes are ones that could cause the data to be incorrect, but not saving your data can lead to a loss of all of the data.
-
I often find myself making a mistake related to #9, copying formulas using relative coordinates. All of my finance courses rely heavily on Excel, and it’s really easy to forget to lock a cell reference. Just recently, I was doing bond valuations for homework, and to calculate the YTM you have to calculate the coupon payments to maturity. I forgot to absolute-reference the coupon rate and bond principal cells, so when I copied the first formula down to my other 20 rows they all came out blank because they were referencing other blank cells. My mistake was obvious, but I can definitely see how people fall into the same trap under different circumstances.
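To illustrate with hypothetical cells, assuming the coupon rate is in B1 and the principal in B2: =$B$1*$B$2 copied down 20 rows keeps pointing at the same two inputs and returns the same coupon payment in every row, while =B1*B2 without the dollar signs walks down to =B2*B3, =B3*B4, and so on, eventually multiplying the blank cells described above.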
-
I have categorized cells as the wrong data type when trying to make a graph from the same table. I had put different dates in the cells, but Excel recognized them as integers rather than as dates, which created a nonsensical graph. Once I changed the data type to dates, I was able to adjust the graph.
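For context, a rough sketch of what’s going on: Excel stores dates internally as serial numbers, so the integer 43739 and the date 10/1/2019 are the same underlying value displayed two different ways. Formatting the column as a date (Format Cells → Date), or converting date text with =DATEVALUE("10/1/2019"), lets a chart treat the axis as dates rather than plain integers.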
-
As someone who didn’t really use Excel before, I believe the only mistake I have committed is not backing up data, as that can be applied to any software. After I once lost maybe five pages’ worth of writing, I learned my lesson, and I now back up my data quite frequently.
-
Because I don’t have much experience with Excel, I can’t say for sure that I’ve made the mistakes mentioned in the article (at least not in that specific software). However, I believe the error of not backing up a copy of the original data is one that’s common no matter what software you use. When saving files, I would frequently overwrite the original data by giving the new file the same name as the original and saving it in the same location. This made it a hassle to start over if I realized the new data contained a lot of errors. That is why I believe this mistake is one of the most important to avoid.
-
-
Laurel Miller wrote a new post, Assignment 3: Cleaning a Data Set: Due October 21, on the site Honors Data Science 1 month, 4 weeks ago
Here are the instructions (in Word) (and as a PDF). Make sure you read them carefully! This is an assignment that should be done individually.
And here is the data file you’ll need.
Assignment 3 is due O […]
-
Laurel Miller wrote a new post, Assignment 3 video, on the site Data-Centric Application Development 1 month, 4 weeks ago
Here is the video for assignment 3 if you had any troubles.
-
Laurel Miller wrote a new post, In-Class Exercise 7.2: Finding Bad Data in Excel, on the site Honors Data Science 1 month, 4 weeks ago
Here is the exercise.
And here is the dataset you’ll need [Vandelay Orders by Zipcode.xlsx]. -
Laurel Miller wrote a new post, 10/9 class, on the site Data-Centric Application Development 1 month, 4 weeks ago
In class today we will:
Talk about what to expect in the coming weeks
The JS Quiz
Exam 1
Lab exercise (Instructions, Zip file)
Please remember to upload your completed assignment to the […]
-
Laurel Miller wrote a new post, Reading Quiz #6: Due October 14, on the site Honors Data Science 2 months ago
Some quick instructions:
You must complete the quiz by the start of class on October 14.
When you click on the link, you may see a Google sign in screen. Use your AccessNet ID and password to sign […] -
Laurel Miller wrote a new post, In-Class Exercise 7.1: How Data Gets Dirty, on the site Honors Data Science 2 months ago
Here is the exercise.
-
Laurel Miller wrote a new post, 10/7 class, on the site Data-Centric Application Development 2 months ago
In class today we will (re)introduce jQuery. As you now know, we’ve been using a little bit of jQuery for some time now.
PowerPoint
Intro jQuery zip
Also, a study guide for the upcoming Java […] -
Laurel Miller wrote a new post, Discussion Question #4: How do you do it all???, on the site Co-Op Experience 2 months ago
It’s hard to balance your schoolwork and your internship. Tell us how you are handling it and what tips you have for keeping it all together.
-
Laurel Miller wrote a new post, Weekly Question #5: Due October 9, on the site Honors Data Science 2 months ago
Leave your response as a comment on this post by the beginning of class on October 9. Remember, it only needs to be three or four sentences. For these weekly questions, I’m mainly interested in your o […]
-
https://www.kcra.com/article/stockton-basic-universal-recipients-data/29354243
Again, as a journalism major, any news coverage that uses data to uncover, inform, and report stories is of interest to me. I think this story is interesting because research is a commonly covered type of story, but the data reported here is the main aspect of the story and acts as its own sort of reporting for journalists. I feel like news articles like this, focused on reporting and explaining the results of data, are kind of like metadata? -
https://www.fbi.gov/news/pressrel/press-releases/fbi-releases-2018-crime-statistics
I am preparing a research proposal for a panel data class for my master’s curriculum related to decarceration outcomes. This article from the FBI notes highlights from its recently published 2018 crime statistics, including the decrease in both violent and property crimes, estimates of losses and property damages, and arrest records. Something I found particularly interesting is that the article cautions the reader in interpreting crime rankings for cities and states created by entities that use these FBI data, as “rough rankings provide no insight into the numerous variables that mold crime in a particular town, city, county, state, tribal area, or region”. In other words, many organizations that conduct their own analyses of these crime data fail to account for the nuances of crime and why it happens, and come out with reports and rankings that are too simplistic. Given that highly specific crime data are classified and tend to be unavailable to the general public, this article was useful in helping me understand what kind of data are available to me for my project.
-
https://www.nytimes.com/interactive/2019/10/01/us/politics/democratic-fundraising.html
This article looks at the summer fundraising efforts by the Democrats who are in the 2020 presidential race. I have always been interested in politics and am trying to keep up with the election as much as possible. I like this data visualization primarily because it is simple and clear. Often, information about campaign finance and presidential races in general are bogged down in jargon. I like that this infographic conveys individual contributions from the FEC and campaign announcement sources through the use of color. -
http://www.startribune.com/insiders-drive-most-cyber-security-breaches-according-to-study-for-minnesota-s-code42/562174112/
This article talks about employees taking data with them when they switch jobs and how this is the biggest threat to an organization’s cybersecurity and data integrity. I found this interesting since I am taking a cybersecurity class and learning how difficult it actually is to keep data secure. I also had to do a lot of confidentiality training during my summer internship that I always thought was dramatic or unnecessary, but reading articles like this and taking the MIS security class has shown me how important employee practice is in maintaining confidentiality and what could go wrong if even one employee in a gigantic company doesn’t follow protocol. -
As a statistics major, it interests me to read about different metrics and how they relate to sports. This article goes into how the NFL saw a lot of unpredictable outcomes in week 4. Based on the data that 538 has collected, their Elo rating system picked over half the games wrong. This is interesting to me as I am a fan of football and of applying data-driven models to real-life scenarios. Looking forward, 538 expects the unpredictable trend to continue, as they went back, used their model on past data, and found some inconsistencies over time.
-
The article outlines how IPOs, which in the beginning of the year returned astronomical performances (looking at you Beyond Meat), have considerably cooled off due to late cycle economic fears and an underwhelming result from recent IPOs including Lyft, Uber, and Slack. According to Goldman Sachs, IPO stock performance for the year is at its worst levels since at least 1995. I find this particularly interesting because some of the hottest/most glamorous technology startups have already IPO’d or plan to do so in the near future, so it will be interesting to see how these companies shift their exit strategies in the wake of the current macroeconomic landscape.
-
This article lays out the issues that GE is facing in 2019 with its debt obligations. I chose this article in particular because I am an avid investor and find it interesting how General Electric is trying to restructure and become successful again. General Electric just put a new CEO in place, and I think he is doing a great job trying to reduce debt. By freezing 20,000 pensions, GE will rapidly reduce the debt it has to pay out. I think this is an appropriate step for General Electric’s future.
-
The MLB’s Postseason Teams Were (Mostly) Obvious From The Start
I found this article interesting as a sports fan. Everyone loves to hear the story of the underdog coming out of nowhere and winning everything. This MLB season seems to ruin that story. The data all season long pointed toward the same teams making the postseason. This almost never happens in sports, but this year nearly every team expected to make the postseason did so. This MLB season is an outlier in the normally unpredictable world of sports.
-
https://www.scmp.com/news/hong-kong/politics/article/3030862/police-battle-protesters-they-set-streets-ablaze-central
I have been paying attention to news from China and Hong Kong recently, and I saw this article a few days ago. Some of the data in this article shocked me. I did not expect that this was the 17th consecutive weekend of anti-government protests in Hong Kong. The numbers of injuries and arrests mentioned in this article are heartbreaking. Also, more than 200 shops and public utilities have been damaged in the unrest. However, I think violence cannot solve the problem, and I hope it can be resolved soon. -
This article contains data regarding how Americans feel about impeaching Trump, broken down by day and by party. The way the data is broken down by day is very interesting, because you can see how certain events changed opinions. This is interesting, and very pertinent due to the political climate and upcoming elections.
-
https://www.entrepreneur.com/article/340297
This article explores how data science is transforming the insurance industry. The application of big data and machine learning has helped customers find quotes, buy coverage, and file claims much more quickly. Unstructured data, such as driving behavior, can also be collected and analyzed for more customized pricing and coverage. It also helps prevent and detect insurance fraud, which can bring overall premium prices down. It looks like data science can benefit both sides of the insurance market.
-
https://www.theguardian.com/news/datablog/2019/sep/18/vaping-e-cigarettes-high-school-students-total
With the recent news of teenagers being hospitalized due to vaping, this article breaks down the data on smoking and vaping in the teenage population. I think the topic is discussed often, and the author did a great job portraying the data. I found it interesting because the data was broken down by age, race, and nicotine product. It showed that the percentage of high school students vaping has jumped from 12% to 21% in the last year and that white students are the most common e-cigarette users. With the growing use of nicotine among youth combined with the vaping-related illnesses in the news, the government is considering some form of ban.
-
This is an article about data surrounding Trump’s possible impeachment. I think that it’s very interesting to see the data reports of how many Americans support and don’t support this, especially along a timeline of events regarding the case. I think that this article is relevant to all of us because it has to do with the future of our president. It is also relevant to our class at the moment because it includes an example of data visualization.
-
This article illustrates the number of precedents that were overturned either “unanimous[ly], narrowly decided by one vote, or in between” under different chief justices of the Supreme Court. I have a curiosity for law, so it’s interesting to see how the court may overrule prior precedents under Chief Justice John Roberts. From the graph, we can see that several of the precedents overturned under his court were narrowly decided by just one vote, a significant change from past chief justices, whose precedent-altering cases were often decided by a large consensus. This indicates the possibility that several precedents in the future could be controversial and undermine the Supreme Court’s image as a “neutral arbiter.”
-
Aaron Rodgers Has Been Magic In The First Quarter — And A Pumpkin In The Fourth
This article piqued my interest because of its headline; it calls Aaron Rodgers a “pumpkin”. While used to describe his poor fourth quarter performance this season, it is both hilarious and seasonally appropriate. I knew Aaron Rodgers was struggling this year, but now I have a little more insight into why. Aaron Rodgers used to be considered clutch. What happened? -
As an advertising major, sometimes I wonder how the effectiveness of advertising can be measured and this article shows how different types of ads can be more efficient than others. It had great infographics and data visualizations like we studied in class.
-
Which Democratic Presidential Candidate Was Mentioned Most In The News Last Week?
This article looks at how often each Democratic presidential candidate has been mentioned or discussed on a few of the main news networks (CNN, MSNBC, and Fox News). Joe Biden has been mentioned the most often, which is not surprising as he is the frontrunner for the nomination. Bernie has also been mentioned 8% more recently. This is relevant to me as a political science major, as I have been following the race fairly closely.
-
-
Laurel Miller wrote a new post, Assignment 2 video, on the site Data-Centric Application Development 2 months ago
Here is the video for assignment 2 if you had any troubles. The optional problems video will be posted shortly.
-
Laurel Miller wrote a new post, 10/2 class, on the site Data-Centric Application Development 2 months ago
Hooray it’s lab day! Today’s challenge will focus on functions and events in JavaScript.
Links to all materials are as follows:
Instructions
Zip File
Please remember to upload your completed assignment to t […] -
Laurel Miller wrote a new post, Reading Quiz #5: Due October 7, on the site Honors Data Science 2 months, 1 week ago
Some quick instructions:
You must complete the quiz by the start of class on October 7.
When you click on the link, you may see a Google sign in screen. Use your AccessNet ID and password to sign in. […] -
Laurel Miller wrote a new post, 9/30 class, on the site Data-Centric Application Development 2 months, 1 week ago
Hello everyone!
In class today we will introduce what are (perhaps) the most important topics of the semester: JavaScript functions and objects.
Here’s the agenda for class:
Quick overview of events coming u […]
-
I usually wear a smart watch, which tells me plenty of KPIs relevant to my daily activity and health. One example is the step count it provides. This is specific and measurable, since the watch counts each step I take while wearing it. It is achievable, since I can walk more to meet different step goals I might have. It’s relevant, since being able to track steps and how much I move throughout the day can have an effect on my overall health. Finally, it is time-variant, since I am able to see step counts across any span of time through the app my watch connects to, and also get information such as weekly averages to compare.
I use a sleep cycle app that calculates a weekly average of how much I slept each night. For the SMART criteria, it is specific and measurable in the fact that it measures how many hours and minutes I spent asleep each night. It is achievable in that I can use the data to set new goals for sleep averages each week. It is relevant, as sleep is a big part of my daily life and directly affects my daily performance/energy levels. And it is time-variant, not in just the calculation, but in that I sleep at different times of each day and notice different trends with each night of the week’s sleep data.
One of the KPIs that I use on a regular basis is checking my class scores on Canvas. This KPI indicates how well I am doing in each individual class that I am taking. It is measurable, actionable, and relevant. For example, each class is scored out of 100, so it is measurable and you can see your performance against that maximum of 100. Next, it is actionable because on each Canvas page you can see a full breakdown of where your grades may be negatively impacting your overall grade. With this said, you can improve in those specific categories where you are lacking. Lastly, it is relevant because, as a full-time college student, my GPA and grades are among the few indicators of my raw academic performance.
On a daily basis, I wear an Apple Watch. One specific KPI it measures is my heart rate. It is specific and measurable because it measures my heartbeats per minute while I am wearing it. On the achievable side, I can look at my heart rate and try to bring it down by lowering my stress and anxiety levels. Similarly, it is relevant because it is important to my health to maintain a good heart rate, and I can monitor it through my watch. Lastly, it’s time-variant because it shows my heartbeat throughout the day; at any point, I am able to see what my heart rate is at that moment.
One thing I track in my personal life is how much money I spend. I can measure this through my bank account and act on it by changing my weekly budget or adjusting my spending habits. I usually review this every week and it is very relevant to my life because I need to pay rent, bills, etc. and have to be smart about my personal finances.
Similar to what Madison said, I use hours slept as a KPI for my personal life. I track this through the Samsung Health app on my phone. It is specific and measurable, as sleep is a necessity and can be easily tracked if you sleep on a regular pattern. It is achievable since I can change my night routine to get to bed earlier and fall asleep on schedule. It is relevant since, for me especially, if I do not get 7.5 hours of sleep I am usually tired the entire next day. It is time-variant, as I can go back and see how my sleeping patterns have changed over time and how I should alter things to sleep better.
I wear a smart watch by Garmin every day, which I use to track my steps, floors climbed, and heart rate. A KPI I regularly track via this watch is my heart rate, particularly when I am running or exercising in general. The watch tracks my heart rate in real time, so I can tell which runs were more difficult than others based on my average, highest, and lowest heart rates. It is specific and measurable because it gives me my heart rate in beats per minute over the course of my day. It is achievable because I can use the data it gives me on my heart rate and my experience on each run to learn more about why I found a run particularly difficult or easy and how I can push myself in the next run. It is relevant because I love running and like to improve as a runner.
Currently, I am using a vocabulary app to help me study new vocabulary for the GRE. It is specific and measurable, since it counts how many words I remembered each day and how many hours I spend memorizing them. It is achievable in that I can set new goals for how many days I want to spend finishing one vocabulary list. It is also time-variant, as I can go back and see how many words I remembered and forgot each day, week, and month, and see my own “Ebbinghaus forgetting curve.” Finally, it is relevant because it helps me know how my studying is going.
I like to track my meals as a KPI for my personal life; it is usually a good gauge of how busy/healthy I am. I usually just count meals, which is specific and measurable. It is achievable and relevant, as eating is an everyday necessity of life. It is also time-variant because I can track meals per 12 hours, day, or few days.
I wear a Fitbit every day, which tracks several things such as steps walked, hours of sleep, and calories burned. All three of these are relevant to my everyday life and health, they are measurable, and there are achievable goals I can set for myself.
I like checking the weekly report from the iPhone’s Screen Time feature, which shows me how many hours I log using my phone as well as other useful data that can function as KPIs. The reports are very specific and the time is clearly measured, as it even shows which apps I spent the most time on. It’s both achievable and relevant because I set goals to limit my phone usage so that I can be more productive doing other tasks such as reading and exercising.
My phone tracks how much time I spend on different apps every week, and, although I don’t always make the effort to reduce my screen time significantly, it does make me more aware of how much time I am spending on my phone. A KPI I could set is to reduce my screen time to an hour a day over a month-long period. My phone could track it, and time spent is quantifiable.
A KPI I use in my everyday life is my practice exam scores for actuarial exams. The website I use to study has charts that show your scores. This is specific and measurable because my scores are an exact measurement. It is achievable because I can try to get higher scores every time I take an exam. It is relevant because I need to pass many actuarial exams for my future career. Finally, it is time-variant, because I can look back at my scores for the week, the month, or any other amount of time.
One example of a KPI that I use regularly is the scheduling application my employer uses to track hours. This is specific and measurable because the number of hours and the times are precise. It is also achievable because I can request more hours or ask for time off, thereby changing my scheduled hours. It is relevant because it impacts my time commitment and paycheck. Finally, it is time-variant, since I can see how many hours I worked in the past day, week, or month.
One example of a KPI that I use on a regular basis is the number of sessions tutored per working hour. The number of sessions is specific and easily measurable. It is achievable because I can be more active in taking walk-in students to increase the number of sessions. It is relevant because number of sessions tutored is part of my overall performance review. Also, this metric is time-variant because I can measure the average number of sessions per working hour over a month, semester or school year.
A KPI that I use regularly is my phone’s ability to track the amount of time I spend looking at its screen. It is specific and measurable because of its detailed analysis of the number of hours and minutes I spend on individual apps. It is achievable because I can alter the amount of time I spend on my phone in order to get different results. It’s relevant because screen time has a large impact on my productivity. It is also time-variant because I can track this data over a day or several weeks.
I have a Fitbit, and one of my favorite KPIs it tracks is my sleep. It tells me how often I wake up at night and how much time I spent in each sleep cycle. By analyzing the quality of my sleep across multiple days, I can watch my habits and see what improves my sleep.