Lawrence Dignan

  • Here is the exercise.

    And here is the Excel workbook you’ll need [Pew Story Data (Jan – May 2012).xlsx]


  • Some quick instructions:

    You must complete the quiz by the start of class on March 27, 2017.
    When you click on the link, you may see a Google sign in screen. Use your AccessNet ID and password to sign […]

  • Here is the exercise.

    And here are the workbooks [2012 Presidential Election Results by District.xlsx and Portrait 113th Congress.xlsx]

  • Data integration

    GOP Analytics

    Dashboard best practices

    The One Skill You Really Need

  • An interesting read on how Foursquare is building its dashboards and analytics out for retail. Worth checking out.


  • Leave your response as a comment on this post by the beginning of class on March 27, 2017. Remember, it only needs to be three or four sentences. For these weekly questions, I’m mainly interested in your o […]

    • Grade Point Average (GPA) is an example of a KPI for students to see how well they are performing in school. It is specific and measurable, as it is on a 4.0 scale, and each class (depending on the credits) factors into the score. It is achievable, as students can increase or decrease their scores by doing better or worse in class. It is relevant, since having a higher GPA causes the student to have more chances to get a scholarship or job opportunities. And it is time-variant since a student’s GPA will change each time a teacher enters new grades (by semester or class schedule).

    • An example of a KPI is the Virtual Wallet app for PNC which tells me my account balance.
      Specific and Measurable: It tells me the precise amount of money I have in my account
      Achievable: I can deposit more money for a specific goal.
      Relevant: It determines whether or not I can withdraw money and whether or not I should deposit money if there aren’t enough funds.
      Time-variant: I can look at my balance over days, weeks, etc.

    • My Fitbit telling me the number of steps I take in a day is a KPI. Steps are specific and measurable because they are a precise count of the steps I took. It is achievable because I can set a goal for the number of steps I want to take in a day, and it will vibrate once I achieve my goal. It’s relevant because the number of steps taken in a day can be compared to how active or fit I was that day.

    • MyFitnessPal is an app I use to count daily calories.
      Specific purpose – to ensure I meet standard calorie goals for the day
      Measurable – uses exact calories for counting
      Achievable – I have a specific caloric intake goal for the day
      Relevant – it keeps me focused/on track with greater health goals
      Time Phased – it’s a daily process.

    • A KPI that is incorporated into my lifestyle is the set of metrics displayed on a treadmill at the gym, specifically distance, time, and calories burned. These are measurable because each can be represented as a number (e.g., 0.09 miles traveled, 2:20 minutes of use, and 230 calories burned so far). The metrics are achievable because I can alter my pace to return metrics more suitable to my workout plan. The metrics are relevant because distance is directly related to treadmill use, since users want to know how far they have run/jogged/walked to measure exercise milestones and to use a uniform measurement when visiting other gyms and using different machines. Lastly, the metrics are time-based because health improvements are a daily process and correlate to the frequency and intensity of exercise.

    • My alarm clock is an example of a KPI. It’s specific and measurable because I can set a specific alarm time, like seven hours later. It is achievable because it rings at the designated time. It’s relevant because it determines my sleeping time. It’s time-variant because the alarm time on weekends is later than on weekdays.

    • An example of a KPI that I use is called Sleep Cycle. It is specific and measurable because it measures the exact amount of time that I sleep, and it wakes you up after you finish cycles of REM sleep. It is achievable because I can choose what time I want to sleep and an approximate time that I would like to wake up. It is relevant because waking up during REM sleep will make you groggy. And it is time-variant because it logs the amount of sleep you get.

    • The pedometer on my phone that I use through the app MyFitnessPal is an example of a KPI to see how many steps I have taken over a period of time. It is specific and measurable because you can set your goal as the standard 10,000 steps or increase it based on how active your lifestyle is. It is achievable because people should walk on average 10,000 steps per day, but a person can increase the amount of walking they are doing to meet their goal. It is relevant because the amount of physical activity a person does can affect their health in a positive or negative manner. It is also time-variant because you can look at it daily, weekly, or monthly.

    • A key performance indicator used by all students is the Grade Point Average. This is a great example of a KPI because the GPA covers all of the SMART criteria:
      Specific- a broad spectrum to evaluate an individual’s grades
      Measurable- distinct metrics to approximate one’s grades overall
      Achievable- depicts where you score and clearly defines goals
      Relevant- stays up to date and summarizes your overall grades
      Time Phased- changes every semester

    • My phone telling me how much power I have left in the battery is a form of KPI I use on a regular basis. The color and image of the battery icon are specific and measurable. Charging my phone for more battery life is achievable. Lastly, it’s relevant to how much use I can get out of my phone in a specific timeframe.

    • A form of KPI that I use on a regular basis is the Dow Jones Industrial Average (DJIA). It has a specific purpose of averaging a number of stocks and consolidating that into a single amount. The DJIA is measurable, as it is a number that measures the average performance of the top companies in our economy. It is achievable, as it has a starting amount and is measured against that each day. It is relevant to the success of our economy, though as an average it doesn’t necessarily measure each company, but a multitude of them. And it is time-variant, as it fluctuates continuously during market operating hours.

    • An example of a KPI I use in my daily life is my grades in class. Depending on my performance, I receive a grade out of 100. I can use that measurement to gauge how well I am performing in a class. This KPI lasts until the end of the class, and it is achievable and a measure of success.

    • I think a cool KPI to talk about is within the sports industry. Scouts are constantly trying to identify key performance indicators to evaluate players. In basketball for example, metrics like points, assists, rebounds, turnovers, shooting percentages, and minutes are all KPIs to evaluate how well a player is performing.
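Several of the KPIs above (GPA, step counts, the DJIA) are weighted or running averages. As a minimal sketch of the credit-weighted GPA described in the first comment (the course grades and credit hours below are made-up examples, not from any real transcript):

```python
# Credit-weighted GPA on a 4.0 scale: each course's grade points are
# weighted by its credit hours, as the GPA comment above describes.
def gpa(courses):
    """courses: list of (grade_points, credits) pairs."""
    total_points = sum(grade * credits for grade, credits in courses)
    total_credits = sum(credits for _, credits in courses)
    return round(total_points / total_credits, 2)

# Hypothetical semester: an A (3 cr), a B (4 cr), and an A- (3 cr)
print(gpa([(4.0, 3), (3.0, 4), (3.7, 3)]))  # -> 3.51
```

This is also why a 4-credit B pulls the average down more than a 3-credit B would: the credit count is the weight.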

  • Here is the exercise.

    And here is the spreadsheet to complete the exercise [In-Class Exercise 8.2 – OnTime Airline Stats [Jan 2014].xlsx].

  • Leave your response as a comment on this post by the beginning of class on March 9, 2017. Remember, it only needs to be three or four sentences. For these weekly questions, I’m mainly interested in your o […]

    • I have not been in the position to make one of those mistakes, so I think the most important one to avoid is Number 3: Start working on the database without doing a full backup first. If there is a saved backup, then if something goes wrong there is at least a backup you can rely on. That way you do not lose something that you or others have put hours of work into. It may take an extra minute or so to do it, but it’s better to be safe than sorry.

    • The most common mistake that I make when manipulating data AND the most important mistake to avoid, in my opinion, is failing to back up data. On many occasions I have forgotten to back my data up and corrupted the file due to incorrect manipulations; then I would have to restart from square one, when I could have backed up each step of the way to save time and precious insights. Backing up is a must every step of the way, especially when the data is vulnerable or does not belong to you. Like Anthony said up top, better safe than sorry. It only takes a minute, so make this step instinctual.

    • Fortunately, I have never had the opportunity to make these kinds of mistakes while working with data. However, I think the most important mistake to avoid is Number 1: Click “yes” without carefully evaluating the message that says “do you want to remove this from the server?” Although not with data, I have clicked “yes” on dialogue boxes that pop up on other sites without reading what they say, and it has caused major problems: deleting files, closing servers, etc. Without reading the message properly, files may become corrupted, or altogether destroyed. Taking a bit more time in the short run, and being careful, will save a lot of time and effort in the long run. As Ben Franklin was fond of saying, “Haste makes waste.”

    • I have accidentally sorted the data in an Excel file without including all of the columns. Luckily I noticed soon enough that I could use the back arrow to correct my mistake. Something like this is really easy to do when you’re quickly clicking around, so I always make a backup file.

    • I’ve never made any of the mentioned mistakes, but I think the most important to avoid is #6, missing the data type. This one seems to be the trickiest. You could make an entire spreadsheet, thousands of rows and columns deep, and everything seems fine, but using the wrong data type for just one cell could ruin the whole sheet. For example, using a string when it is meant to be a date/time.

    • Sometimes I will forget to check the data type, and then it is hard to find out what is wrong with the data. So I think it is important to avoid #6, missing the data type, especially with dates and integers, because it is hard to tell the difference and it can ruin the whole dataset.

    • I have forgotten the data type before, and nothing happened: no data would sum! It was very annoying and it took a great deal of time to figure out. I would argue, however, that keeping a clean backup at all times is the most important thing, because if you don’t have a lot of time to dedicate to the issue, at least you can start at the beginning. Sometimes redoing it will reveal what you did wrong the first time.

    • I haven’t made these mistakes, and I think the third mistake, “start working on the database without doing a full backup first,” is the most important to avoid. If you make one of the other mistakes, you might waste some time fixing the problem, but if you start working on the database without doing a full backup, the mistake might cause irreparable loss, and you might not be able to start over.

    • I have done Number 10, “Open a CSV file directly into Excel”, many times without even knowing it could cause problems. I suppose in those prior situations the data I was working with wasn’t large enough for scientific notation to kick in, but it is good to know this tip to avoid problems in the future.

    • I have often opened a CSV file directly in Excel, and I was not even aware that doing so could cause serious problems. Since my past experience with Excel has mostly been small in-class assignments, I guess there was not much that could have gone wrong. Going forward I will need to be more careful about how this could affect my data.

    • I have used Excel many times, but I have been lucky to not have experienced any of these mistakes. I think the most important to avoid is Number 3: Start working on the database without doing a full backup first. If you forget to back up your work before using it, you risk losing it. You also have to be careful if you mess up because if it is not backed up, you might not be able to start over.

    • A common mistake I have made in the past is not backing up the data that I was working with and, at times, not creating different versions. This has particularly happened to me with Excel sheets for finance classes, such as intermediate corporate finance, in which I would work on a pro forma without saving my progress at various stages. Had I saved versions, then if I made a mistake after a certain stage, I could access the last stage and begin from there.

    • Mistake number 3, starting to work on the database without doing a full backup first, is the most important to avoid. Having a full backup is always important because if something happens and the file doesn’t save, you will still have the file available to you. Although this has never happened to me with an Excel file, it has happened to me in Word, and I had to start the file over again because I did not make a backup.

    • I have not made any of these mistakes before, but I believe that #6 would have the most impact, since it is quite tricky to catch when dates are stored as integers, or vice versa, in a huge data set. You would have to implement clearly defined columns and know the data that you are working with very well in order to avoid this type of mistake.

    • A very common mistake I have made before is number 3: start working on the database without doing a full backup. Often I start work right after receiving the job, and I always assume autosave will cover any accident that happens while working on the database. However, sometimes the system does not save all of the data you have, so to ensure you do not lose your data you have to fully back up the database as well as make checkpoint saves along the way!

    • I do not have much experience working with the kind of data mentioned in the article, and therefore I have not made any of the mistakes mentioned. However, I believe the worst mistake would be to not back up your workbook before making edits to it. If you make a monumental mistake in the beginning and don’t realize it until later, all of your work will be compromised because you didn’t save it. Not backing up your work is probably the biggest mistake you can make, whether you are working on something relating to data or not.

    • I have not made any of these mistakes, but I think the most important one to avoid would be working on a database without a full backup. Without running constant backups, you can run the risk of losing hours of work. Not backing up a database can also cause it to be out of date with relevant information because it has not been backed up recently.

    • Of the common mistakes made in Excel, the most common one I have committed is number 9: copying formulas that use relative references. Though I do not use Excel that frequently, this is the problem that I run into the most. It occurs when I try to copy a formula into another cell and it doesn’t produce an actual value; instead it gives an error in the cell or group of cells, or a number that isn’t close to what I’m attempting to calculate. I often forget to use absolute references and have to go back after the error occurs.

    • For number 3, I’ve done this a lot in the pre-autosave days. If I’m in a rush the last thing I will think about doing is setting up a backup to a database I’m working on. If it’s not a personal database, I can’t see anyone making this mistake. It’s just too risky not backing up all the data that’s being changed.

    • Luckily I have never made any of the mistakes listed in the article, but I think the most serious mistake is #3, doing work without a full backup. I’ve never had this error occur with data, but with basic assignments using PowerPoint and Word there have been times that I haven’t saved a file, only for my computer to shut down or for me to exit out of the tab accidentally, ultimately losing everything I’d spent hours working on. While most of the mistakes have some solution or cleanup that can be enacted after the incident occurs, losing material without a backup is essentially hitting the restart button on any work you’ve put time and effort into accomplishing.

    • A few semesters ago, I took digital analytics, which is a required course for advertising majors on the account management and media planning track. In the class we worked closely with Excel and Google AdWords. I found myself committing a few of the errors mentioned in the article. One that I would commit constantly was copying the wrong formulas; in return I would get the wrong values. I’ve also accidentally deleted some of my data multiple times and either did not notice until it was too late or picked up on it and had to start all over.

    • I have been doing a lot of work with Excel over the past month, and I have found that it is easy to make mistakes. The problem is that if you do not go over your work with someone else, you may not always see the errors. I think that it is important to explain your analysis to another person in simple language. I have been using the COUNTIFS, SUMIFS, VLOOKUP, and IFS functions. These are awesome but can be confusing when you overthink them. A specific challenge I have had is that when I want to count groups matching a specific criterion, it can be easy to overlook or click on an adjacent cell. This was a lesson I had to learn. My solution is: I do the analysis, set the project down, and then I explain it to another person to help give me clarity while checking my work.

    • I think the most important error to avoid is hitting “yes” without reading what the prompt actually asks. Too many times people are trying to figure out how to do something and just hit yes or allow without reading what they are agreeing to. This can lead to inaccurate calculations and can mess up your data. It is very easy to take the time and read what pop-ups say.

    • Working on something without creating a backup is one of the worst things you can do. There are so many factors that can play into something like that going wrong. It’s best to always create a backup.
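Two of the mistakes discussed above (opening a CSV directly in Excel, and missing a column's data type) can be sidestepped by reading the file programmatically. A minimal sketch, assuming Python is available; the file contents and column names here are hypothetical:

```python
# Read CSV fields as plain strings so a long numeric ID is never
# coerced into scientific notation (mistake #10), then convert the
# numeric column explicitly before summing it (mistake #6).
import csv
import io

raw = "order_id,amount\n12345678901234567890,10.5\n98765432109876543210,4.5\n"

with io.StringIO(raw) as f:            # stand-in for open("orders.csv")
    rows = list(csv.DictReader(f))     # every field stays a str

# The 20-digit ID survives intact; Excel would have displayed 1.23457E+19
assert rows[0]["order_id"] == "12345678901234567890"

# Cast the numeric column deliberately, so a stray string fails loudly
total = sum(float(row["amount"]) for row in rows)
print(total)  # -> 15.0
```

And, per mistake #3, a copy of the raw file before any of this runs is still the cheapest insurance.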

  • Some quick instructions:

    You must complete the quiz by the start of class on March 20, 2017.
    When you click on the link, you may see a Google sign in screen. Use your AccessNet ID and password to […]

  • Performance indicators

    Tyranny of success: Non profits and metrics

    Tracking health

    Wearable tech

  • Here is the exercise.

    And here is the dataset you’ll need [Vandelay Orders by Zipcode.xlsx].

  • Some quick instructions:

    You must complete the quiz by the start of class on February 28, 2017.
    When you click on the link, you may see a Google sign in screen. Use your AccessNet ID and password to […]

  • Here are the instructions (in Word)   (and as a PDF).  Make sure you read them carefully!  This is an assignment that should be done individually.

    And here is the data file you’ll need: VandelayOrders(Jan).xlsx.

  • Leave your response as a comment on this post by the beginning of class on March 6, 2017. Remember, it only needs to be three or four sentences. For these weekly questions, I’m mainly interested in your o […]

    • Antonio Brown Is A Good Reminder Not To Obsess Over Combine Results

      Yesterday, Antonio Brown, of the Pittsburgh Steelers, was named the highest paid wide receiver in NFL history. This article uses a data set to break down how well Brown performed at the NFL combine, which will be occurring this week in Indianapolis. Brown participated in the combine in 2010 and was in the 30th percentile for wide receivers in his 40-yard dash, but was average or below average for the rest of the events of the combine. Brown was drafted by Pittsburgh in the 6th round of the draft because of his underwhelming performance at the combine and being an undersized wide receiver. However, on the field, Brown has very few noticeable weaknesses, which is why the Steelers organization was willing to invest so much money in this diamond in the rough. I find it interesting that more and more data is proving the NFL combine to be increasingly obsolete.

    • http://www.papersearch.net/thesis/article.asp?key=3491385
      The effects of cardio (specifically Pilates) were tested in this study on the elderly to see if there is an increase in balance, strength, and flexibility. The study was done over a period of 8 weeks, where 10 volunteers exercised 60 minutes a day, three times a week. After each session, balance tests were done to measure the effects of the program. It was found that the program significantly improved the balance of the participants, and resulted in fewer falling incidents. As a future music therapist working with the elderly, seeing the effects of movement and body exercises will change my session plans to include more movement.

    • http://bleacherreport.com/articles/2695967-nfl-combine-2017-results-tracking-40-times-bench-press-and-all-drills
      NFL COMBINE 2017 RESULTS: TRACKING 40 TIMES, BENCH PRESS AND ALL DRILLS

      The article above displays the results of the NFL Combine 2017, which took place on March 3rd. I typically don’t watch this part of football because I don’t know many of the players, but since going to Temple, I was looking out for the few players expected to participate in this year’s combine to watch their performances. The data in the article includes the player’s name, school, 40-yard dash time, and their bench press, vertical jump, and broad jump results. This type of data helps NFL teams decide on their draft picks.

    • https://www.washingtonpost.com/graphics/national/united-states-of-oil/

      I found this article interesting because it takes a very popular topic on the news right now that is confusing to a lot of people and breaks it down using data visualizations to help readers better understand what areas of the United States would be affected most by Trump’s new gas and oil regulations. It maps out geographically where the most oil rich areas of the US are and where oil production is growing fastest. This article also directly compares natural gas to oil in all visualizations.

    • http://www.sciencemag.org/news/2017/03/dna-could-store-all-worlds-data-one-room

      This article discusses the world’s predicament of storing all the data we now collect. In the last two years, the article notes that humans have collected more data than in all of the preceding human history. However, since 2012, data scientists have been storing data in a nontraditional way… embedded into DNA. A single gram of DNA can store 1.28 petabytes of data. DNA also does not degrade over time like cassettes and traditional hard drives. I think it’s fascinating to talk about the long term challenges facing the amount of data we collect. It’s a real problem, but it is good to know there are smart minds already working on the issue. One idea not addressed in the article that I would be interested in hearing more about is the environmental impacts of storing data within DNA. Do DNA data storage facilities require less energy? Are they cheaper or more expensive to maintain? Hopefully these questions will also be worked out as research in this space evolves.

    • Data visualization GIFs


      This is a cool data visualization GIF that can get a message across pretty quickly. It shows political polarization in the American public from 1994 to 2015. This form of data visualization is interesting and makes it easy for the audience to remember. I actually do not care about this topic at all, but the way of representing it attracted me. So I think if every storyteller can make their data vivid like this, data science will never be a boring subject.

    • This article in the New York Times is about how U.S. labor and housing market data underscores the economy’s stamina. I found this article interesting because it discusses how data about the housing market, where prices are showing steady and stable growth, indicates stamina and resiliency in employment within our economy. This is interesting to me because it makes me curious as to the direction of causation. Do these two things really affect each other, and can we make inferences about employment activity simply based on housing market growth? Just because the housing market is strong does not necessarily mean it is because of strength in employment, or vice versa. A top segment of people could simply be doing well and investing in more homes, or developers could be buying properties in bulk. These two things, to me, do not exactly seem to have direct causation, and I would like to look at the data to check it out.

    • https://www.forbes.com/sites/bernardmarr/2015/09/08/4-ways-big-data-will-change-every-business/#380f52422729
      This article talks about four ways big data can affect every business. I found it interesting because it shows how big data can impact a business internally and externally, and a lot of businesses do not realize that. Every business can gather data and have access to different types of data, and this can be an asset for the business. Companies can rely on big data to learn more customer insights and then improve the customer experience. Companies can also use big data to improve their internal performance by analyzing data for each transaction process, as well as to find out more about what makes a good CEO for HR.

    • How to Boost Your Career in Big Data and Analytics

      As someone considering an MIS major and a future in the data science industry, this article put together a majority of the information I have been looking for, including a beautiful infographic. The most interesting part for me was the stat that most data scientists (92%) have an advanced degree and 48% obtained a PhD. I enjoyed reading repeatedly that now is a great time to enter this young field.

    • Cloudflare Data Leak: How to Protect Your Personal Data

      This article talks about the rising threat of personal information loss through data leaks and how users can protect themselves from falling victim to it. The California-based company Cloudflare had data leaked from its servers this past week, and anyone who has account information with Yelp, Uber, OKCupid, and other companies should change their passwords for those accounts. The article then goes on to list the ways users can help protect themselves from data losses by strengthening their passwords. These tips include: two-factor authentication, choosing passwords 8-10 characters in length, and never duplicating passwords.

    • https://www.theguardian.com/world/datablog/2017/mar/03/most-disapprove-of-phone-use-while-driving-but-plenty-still-do-it
      This article is about the United Kingdom starting to issue tougher penalties for people caught using their cellphones while driving. The penalties may include twice as many penalty points and increased fines. The article then shows a bar graph of the percentage of road users who support a zero tolerance approach to such drivers, and the UK comes out on top with over 60%. Another bar graph shows the percentage of people who admitted to using their phone while driving, and the UK is the lowest among the other countries. Although the UK has the highest rate of support for a zero tolerance approach, over 20% of its drivers still admitted to using their phone while driving.

    • https://www.forbes.com/sites/paularmstrongtech/2017/03/03/how-big-data-is-going-to-change-over-the-next-three-years/#470546eb7bd3
      This article explains how big data will evolve over the next three years. Data in its current state is sufficient for making business decisions; however, the lack of depth relative to what it may offer in the next few years causes managers and higher-ups to make misguided decisions. Data today is often misinterpreted and overvalued, without looking at biases and inconsistencies. Big data in the future will be much more in-depth and hopefully more accurate in terms of nullifying inconsistencies and outliers.


    • This article is about a new marketing strategy from Fitbit. Fitbit was once used only to track how many steps wearers walk every day, but now the company intends to track wearers’ sleep quality. The company collects data on users’ sleeping time and wants to provide a sleep tracker to its customers. Because Fitbit’s market is approaching saturation, the company is trying to add new functions to the Fitbit to attract more customers.

    • http://fortune.com/2017/03/02/exclusive-berkshire-hathaway-energy-deal-uptake-technologies/
      This article is about a company which specializes in interconnecting devices to extract data. Because data is useful in nearly every facet of our lives, they see being able to take data from nearly everything as a good way to improve things. One example given in the article is that they could use energy output data to tell if a wind turbine may be damaged, due to a lower-than-average output. I think that this kind of use of “big data” definitely has its uses in the world; however, I do find it frightening that nearly everything we do is being recorded and can be used by others.

    • Data Quality – A Simple 6 Step Process


      This article is about data quality, which is very important to anyone in the world who uses data. If you have bad data, it is not going to get you anywhere. The article walks through six important steps for getting data of good quality. You want data to be organized and of the best quality, and you want to continue analyzing and improving your data until it gives you the most accurate results possible.

    • https://fivethirtyeight.com/features/unc-duke-really-is-college-basketballs-best-rivalry/
      As a huge fan of college basketball (with a good friend who is a crazy Duke basketball fan), I found this article very interesting because it was able to quantify something that many people only judge qualitatively: a rivalry. While it is very easy to just throw out the term, FiveThirtyEight had to, as always, attempt to quantify exactly what a rivalry is. The writer specifically chose to categorize a rivalry as two teams playing each other at least every other year over the last five years, and then broke it down by each team’s national ranking and by how close the overall series score was. Duke and UNC, as expected, were near the top in almost every category. They are both always good, have nearly identical records against one another, are close in proximity, and have played the most games of any rivalry near the top, by far. This was a really cool example of how data can verify an age-old story.

    • https://fivethirtyeight.com/features/the-feds-favorite-inflation-predictors-arent-very-predictive/
      This article is essentially an argument that the Federal Reserve’s approach to inflation is too narrow. To predict inflation, the Fed uses a model that looks at labor market slack and inflation expectations. The authors argue that the Fed should use other data in their models to predict inflation. Inflation, the authors find, is highly persistent and not as sensitive to labor market slack or inflation expectations as once thought. The article provides a data visualization that shows trend inflation over the past 25 years. It shows that predicting inflation is actually more accurate when basing it off of the long-term trend line than it is when using traditional inflation models (that use labor market slack and inflation expectations). This article is interesting because it looks at very basic data and points to something very obvious (inflation follows its long-term trend). This is interesting because this prediction method is sometimes more accurate than the current Fed models.

    • https://www.wsj.com/articles/how-a-cyberattack-overwhelmed-the-911-system-1488554972?mod=e2fb
      The article is about a recent cyberattack on 911 emergency call centers. During the attack, at least a dozen U.S. states were affected by what was called the largest-ever cyberattack on the country’s emergency-response systems. Apparently, the 911 system is vulnerable to hackers, in part because no budget has been spent on cybersecurity since 2015. It is worrisome how weak the system is, given that this is one system we cannot afford to have easily hacked.

    • This Data-Tracking ‘Smart Condom’ Is the Weirdest Sex Wearable Yet

      There is a new “smart condom” on the market that can track your data in the sheets. The i.Con is a ring that fits over a regular condom and can track everything from thrust count and velocity to temperature, and it can even tell you if you are at risk for chlamydia or syphilis. I am curious to see whether the i.Con will become the next big thing. I guess there is a way to collect data on almost everything nowadays.

      This article is interesting because it talks about how, within a few years, almost all companies will be part of the cloud. Companies are switching over because of the amount of data being generated, and because the cloud requires less hands-on equipment and makes online networking easier.

      http://www.techrepublic.com/article/6-big-data-trends-to-watch-in-2017/

      This article is interesting because it talks about how companies are beginning to move to the cloud. This means less hands-on equipment and more online data processing. It will also be an easier way to store information, because companies can rely on cloud providers to help with any problems that come up.

      http://www.techrepublic.com/article/6-big-data-trends-to-watch-in-2017/

    • http://www.billboard.com/articles/columns/chart-beat/7678046/migos-no-1-album-culture-billboard-200
      This article is about Migos, a rap group that is extremely popular right now and has the number one album of 2017. The rapid rise of a new style of hip-hop has Migos at the top of the charts, selling over 200,000 albums in about three weeks. This article matters to me because music is very important to society.

    • How Data Runs The Sports World

      This article is interesting to me because of how much the sports world is driven by numbers. Data is everything in sports because it helps predict the future, which is what teams want most. Data reveals so much, making it extremely valuable despite being such a high-risk, low-reward system.

  • Damn Excel (amen to that)

    Data credibility

    Data corruption tricks

    Clean data top 10

  • Here is the study guide for the first midterm exam.

  • Here is the exercise.

    And here is the graphic file you’ll need: Philadelphia Area Obesity Rates.png.

    Right-click on the file and save it to your computer.

  • Leave your response as a comment on this post by the beginning of class on February 20, 2017. Remember, it only needs to be three or four sentences. For these weekly questions, I’m mainly interested in your o […]

    • I think the most important principle to follow in creating a data visualization is “simplify.” Very often people try to overcomplicate things to give their audience as much information as possible, but sometimes too much of something is simply too much. Simplification can take on different meanings, including limiting the research and data collection, displaying the data clearly, or only including helpful information. As Hoven says, “A good data visualization captures the essence of data without oversimplifying.”

    • “Simplify” is the most important principle to follow in creating a data visualization, especially because we are living in a time when people value their time more than ever. People get annoyed when someone calls them, because most would prefer a text so they can respond at their own convenience. People love services like Amazon, Uber, and HelloFresh because they all save time. Infographics are increasingly used to tell a story that people can understand quickly. When a person cannot clearly identify the point of a visualization within a few seconds, you will lose their interest, or they will discredit the entire data set. Personally, when looking for data to support my research, if a visualization does not capture my attention in two or three seconds, I move on to the next one. With so much access to information, there is no point wasting time trying to figure out what is going on.

    • I think that the “Be Skeptical” principle is the most important. After all of our readings on bad data, the filter bubble, and the pros and cons of open data, I have a new understanding of why it is so important to ask questions and be skeptical. Going back to check your own work is such an important step. Beyond business (since my knowledge of it is limited), being skeptical in general is good for discovering new information and forming hypotheses. It is important to analyze how the research was done and what it really covers. In class, we often talk about how different sources can reach completely different answers solely because of how the data was collected and other details, including sample size, how representative the sample is, and more.

    • I believe that “ask why” is the most important principle of data visualization. After a chart or graph appears, we may know what is happening, but without the why we are like a doctor only treating a patient’s symptoms. A person may no longer have a headache, but the cause of the headache is equally important to know. The same applies to data visualization. We may now know that there is a cholera outbreak in a certain neighborhood, but what is causing the outbreak is far more important than the outbreak itself. The problem will continue if we can’t figure out why it is happening. All of the charts in the world will never solve any issues without a why attached.

    • “Ask why” is the most important principle of data visualization. Many times we take a quick look at things and ask no questions, which leaves room for misinterpretation or false information. This principle matters less for minor data, but for data dealing with larger, more critical issues, we are often careless about understanding the information we are seeing. It is important to ask questions in order to verify the data and then to understand its relevance.

    • Simplify is most important to me for data visualization (not necessarily for data manipulation, where more detail is better) because many visualizations try to tell too much of what’s going on in a single frame, which can cause ambiguity and false assumptions. A data visualization needs to speak to the problem it is meant to solve; simplifying also leaves room for interpretation by a wide array of viewers. Less is more in visualization: display the need you are trying to fulfill, and only that need.

    • I believe the “ask why” principle is the most important one. Misusing data is common in instances where too much data is acquired, or where the interpretation of the results is inaccurate. Understanding “why” something is happening gets at the core of the problem, and if this question is answered, then in my opinion the data is more likely to actually reflect the hypothesis. Hoven states, “this is where actionable results come from,” and I completely agree.

    • Simplify is the most important principle in data visualization because you want the data to be simple enough for everyone to understand, while still showing the most important information. A visual that is too crowded and all over the place may contain more data, but it is not visually pleasing and is harder for people to understand. Too much data in a visual will also distract viewers from its main point.

    • I think “compare” is the most important principle in data visualization. It is important to show the relationships among different data in a visualization. A single datum does not mean much on its own; for data sets, it is the distribution of the data that matters. By comparing, we can see the big picture and better understand the data.

    • I think “Simplify” is the most important principle in the Hoven article. Having too much data can cause many problems. When a lot of the data is unnecessary, you need to get rid of it, because it is not helping you get results. In a data visualization, too much data can be confusing and not helpful at all. You want a visualization with just the right amount of data, so you are able to draw conclusions when looking at it and so it is easy to read.

    • I think “ask why” is the most important principle, because too often people want to find the quickest route to a solution and don’t delve deeper into an issue. You can get more out of anything you do the more you ask why. Especially when dealing with data, asking why can be crucial to understanding what you are looking at and why you are looking at it.

    • I think the most important principle is view diversity, because when analyzing data there is always more than one way to look at it. When you are making a business decision that depends on data, you want to take into account all the different ways you can analyze the data and determine which way is best for your specific business problem. When the success of the business depends on the data, you want to make sure the numbers are right and can be depended on.

    • Simplify is the most important principle, because nowadays people value their time over anything else. Data visualization is meant to make a visual impact: when people see it, they can tell what message it is trying to convey without much thinking. If people have to figure out on their own what a graphic is about, they will not waste their time on it.

    • Simplify is the most important principle, because if the visualization is complicated, then the information and message of the data can be lost. It doesn’t matter how good your information is; if you can’t visualize it simply, the information is worthless. Graphics are meant to be one of the simplest forms of communication, so if your audience spends more than a moment trying to understand what a graphic is saying, you have lost them.

    • Simplify is the most important principle. The whole purpose of data visualization is to compress huge amounts of data into something that anyone can understand, so simplification is vital for effective data visualization. What purpose does a data visual serve if it is confusing or not simplified? We might as well keep the raw data and not bother with the visualization at all. A good visualization captures the essence of the data without being exhaustive. Therefore, simplify is the most important principle.

    • Simplify is for sure the most important principle because, in the era of “big data,” we need to find a way to understand what the data means and turn it into a story that people can understand. For example, the NYSE uses visualization for all of its data to show how the market changes every second. To do so, stocks are simplified into a single measure, the S&P index, to show how the market moves as a whole.

    • I think simplify is definitely the most important principle, because without simplification many more mistakes can be made. Big data is much simpler to work with when made smaller. Keeping a lot of unnecessary data gives you hell in the future. Taking the time to simplify will save a lot of time later and increase accuracy.

    • The most important principle is to be skeptical about the data that you create and the data that you see. Too often, the media portrays infographics that are clearly trying to push the audience toward a particular view of the data, which is not the entire story. Sometimes the data is misleading (such as making some numbers look much larger than others, or creating a larger visual divide between numbers than is proportionally warranted). In the era of social media and infinite information, it is key to be skeptical of all data presented as absolute “fact.”

    • Stephen Few’s eight core principles of data visualization raise many important points. However, the single most important, in my opinion, is “Be Skeptical.” There is a lot of data in the world, which makes it very easy to tell a “data-driven” story any way you’d like. The methodology behind why data says what it says is very important, and it is something we need to keep at the forefront of our minds when consuming data visualizations. Data is good, but data can also be skewed, twisted, and remodeled in many different ways. Always ask: “What was the methodology behind this insight?”

    • I would say the most important principle would be Ask Why. When you can find out why something is happening, you can start to better understand the data. Knowing why the data exists allows you to make predictions on what will happen next or adjust the situation depending on what the data means.
