-
Shana Pote wrote a new post on the site MIS 0855: Data Science Fall 2016 8 years, 1 month ago
Here is the exercise.
And here is the graphic file you’ll need: Philadelphia Area Obesity Rates.png.
Right-click on the file and save it to your computer.
-
Shana Pote wrote a new post on the site MIS 0855: Data Science Fall 2016 8 years, 1 month ago
Class,
I’ve uploaded a video walkthrough of the first Tableau in-class activity that was started last Friday. Because it’s so integral to becoming acquainted with Tableau, and many elements are used in […]
-
Shana Pote wrote a new post on the site MIS 0855: Data Science Fall 2016 8 years, 1 month ago
As mentioned in class today, if you had points deducted from Assignment #1 because you didn’t meet the basic scenario requirements, (you will know this from my additional note at the top of your evaluation sheet), […]
-
Shana Pote wrote a new post on the site MIS 0855: Data Science Fall 2016 8 years, 1 month ago
Leave your response as a comment on this post by the beginning of class on October 12, 2016. Remember, it only needs to be three or four sentences. For these weekly questions, I’m mainly interested in your o […]
-
An article that I found interesting was on the Guardian Data Blog. The article collected data about natural disasters and when they occurred. It also uses the data to compare countries to see which are the safest and riskiest places to live. The article uses data visualizations with bubbles and color schemes to make it easy for the reader to understand what the data represents. This caught my attention since Hurricane Matthew is currently happening and is on its way to Florida. https://www.theguardian.com/global-development/datablog/2016/apr/25/where-is-the-riskiest-place-to-live-floods-storms
-
I found this article on FiveThirtyEight. I decided to explore the political side because I realized I have not shown enough interest in related topics. The article I found is on the Obamacare law. It explains how the law has increased insurance coverage all over the country. The article first cites the beginning of this year’s election, when most of the primary election candidates adopted this law as a campaign point. The fact that this stopped towards the end of the election helps me understand that the positive effects of this law have started to increase. The article also uses effective data visualization to help me understand at a glance that the number of people who are uninsured has fallen in states all over the country. I think the reason the candidates have stopped mentioning Obamacare in their campaigns is that they see that it will make substantial changes to the insurance coverage rate in the country. That is what I could evaluate just from the data visualization provided.
http://fivethirtyeight.com/features/obamacare-has-increased-insurance-coverage-everywhere/ -
The article I found was on FiveThirtyEight. This article talks about the Obamacare law and how it has improved health insurance coverage in the United States. At first there was a negative reaction to the establishment of this law, which is why many of the primary election candidates used it as a campaign point at the beginning of this year’s election. This article uses an effective data visualization to help me understand, at a glance, the reduction in the number of uninsured people in states all over the country. I think it is safe to infer from the data representation that there will be a further reduction in the future, which is the reason why candidates are no longer talking about this in their campaigns.
http://fivethirtyeight.com/features/obamacare-has-increased-insurance-coverage-everywhere/ -
The article I chose for this question was from FiveThirtyEight. The article is titled “How Many Times Did Trump Interrupt Clinton In The First Debate? Depends On How You Count.” I wanted to use this article for my explanation because I feel it has a lot to do with bad data and how it can be caused. This article was about the differences in the interruption count. Different media outlets all reported different numbers. We see this bias in the numbers because there is no exact definition of “interruption” for every reporter or viewer of the presidential debate to go by.
-
An article that I found interesting and that cites many sources of data is Nate Silver’s article on FiveThirtyEight about the effects of the recently discovered controversial recording of Donald Trump on his chances of winning the election. I find the article interesting because I think that the presidential election of 2016 itself is very scandalous and filled with controversies. Silver’s article cites nine polls that have conducted interviews since the tape’s release. Silver then writes about the effects of sampling error with respect to Trump’s ratings. Some polls show that Trump’s ratings had not changed, while other polls show a huge change. I find it interesting that different samples of voters can yield a wide range of results, especially in the context of something as large as the presidential election.
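To make the point about sampling error concrete, here is a minimal sketch of the textbook margin-of-error formula for a simple random sample. The sample size of 1,000 and the 50% support level are made-up illustration values, not figures from Silver’s article, and real polls use weighting that this sketch ignores.

```python
import math


def margin_of_error(p: float, n: int, z: float = 1.96) -> float:
    """Approximate 95% margin of error for a proportion p from a simple random sample of size n."""
    return z * math.sqrt(p * (1.0 - p) / n)


# Hypothetical poll: 1,000 respondents, candidate at 50% support.
moe = margin_of_error(0.5, 1000)
print(f"Margin of error: +/- {moe * 100:.1f} points")
```

Under these assumptions, two polls of the same electorate could plausibly differ by a few points from sampling alone, which is one reason different samples can tell different stories.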
-
The article that I found is on Forbes’ website and is about how big data is helping us to predict almost everything imaginable. I think that it’s really fascinating how we are able to use big data to make predictions and use those predictions to identify future potential problems and therefore potential solutions to those problems. For example, big data is allowing us to make predictions in high school graduation rates and then find ways to attempt to keep students in school. I think that this is very interesting because of how it will help us in the future to further develop society and become more knowledgeable and intelligent as people.
http://www.forbes.com/sites/bernardmarr/2016/10/10/5-amazing-things-big-data-helps-us-to-predict-now-plus-whats-on-the-horizon/#12e542ad2b63 -
The article that I found interesting was “Rangers And Jays Battle To The Postseason’s Most Exciting Game- So Far” on FiveThirtyEight.com. The article ranks the MLB playoff games by the “Excitement Index,” which estimates how likely each team is to win the game at any given moment, based on how they’ve done historically in similar situations. The index was originally used for basketball, but has been transformed for the baseball playoffs. The “exciting games” feature large swings in the win probability for each team, with the probability changing after each play such as a strikeout or a home run.
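The idea of “large swings in the win probability” can be sketched in a few lines. This is only an illustration under the assumption that excitement is measured by adding up the absolute change in one team’s win probability after each play; the post does not give FiveThirtyEight’s exact formula, and the two probability series below are invented.

```python
def total_swing(win_probs):
    """Sum of absolute play-to-play changes in one team's win probability."""
    return sum(abs(b - a) for a, b in zip(win_probs, win_probs[1:]))


# Hypothetical games: win probability for the eventual winner after each play.
quiet_game = [0.50, 0.55, 0.60, 0.70, 0.85, 1.00]
wild_game = [0.50, 0.80, 0.30, 0.75, 0.20, 1.00]

print(f"Quiet game swing: {total_swing(quiet_game):.2f}")
print(f"Wild game swing: {total_swing(wild_game):.2f}")
```

Both invented games end with the same winner, but the second one accumulates far more movement along the way, which is the sense in which it would rank as more exciting.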
-
The article that I found interesting was on the FiveThirtyEight website, and it was an article about poll information in relation to the current election. The main focus of the article is the fallout caused by the Trump tape that was released to the public. In the tape, Donald Trump makes disrespectful comments about women. The article showed me the changes in Trump’s ratings: people who were once for him have changed their minds because of the comments he made. http://fivethirtyeight.com/features/election-update-polls-show-potential-fallout-from-trump-tape/
-
The article that I found interesting and relevant was one on Amazon and how they plan to bring the grocery/food section of Amazon into real stores. In this article, Amazon shares their data about how they plan on making a series of corner stores around the country that focus on selling fresh groceries. However, this is not the most exciting part! They also plan to have drive-through spots at these corner store locations, where people driving home from work can just drive through, pick up what they need, and go home! There wouldn’t be a need to get out of the car after a long day of work to go shopping! Amazon is trying to turn their digital data world into real life, which is amazing to see! I love Amazon and shop there all of the time, which is why I found this very interesting and relevant for myself.
-
The article I found interesting is titled “2016 Election Forecast,” and it uses poll data to show the change in projected electoral and popular vote and subsequently the chances of winning for each candidate. The accompanying map (to the data) shows the breakdown of popular and electoral votes for each candidate by state and is color coded red (for Clinton) and blue (for Trump). The most recent aggregation of data reveals that Hillary Clinton has an 86.3% chance of winning and Donald Trump has a 16.4% chance of winning. This information is useful; however, I think being able to see the changes in the three aforementioned fields (electoral, popular and chances of winning) over time is particularly insightful. It shows the impact of debates, media stories, etc. The map graph and line chart included are both very easy to read and make picking up data easy. The upcoming election is an important event with far-reaching impacts, so it is covered by every news outlet. The graphs and polls on each network are different, so it is nice to use a credible data source to see an accumulation of polls in one place.
http://projects.fivethirtyeight.com/2016-election-forecast/?ex_cid=rrpromo
-
***Blue for Clinton and Red for Trump
-
http://fivethirtyeight.com/features/election-update-women-are-defeating-donald-trump/
The data-driven article I found interesting for this week pertains to the apparent gender gap in candidate preference that is showing itself in the current election. FiveThirtyEight explores hypothetical scenarios about election outcomes if only one gender were voting. The article showcases the data that backs up the statement about Clinton’s very wide margin among women voters as opposed to Trump’s margin among men. One interesting point noted in the article was that if men were the only ones voting, Trump would win essentially anything considered a swing state across the electoral map. The purpose of this article is mainly to highlight how polarized the overall electorate has become and how changing demographics have caused a substantial shift in how some candidates are received. -
The article I chose was “Hurricane Matthew’s US death toll rises to 33 as flooding chaos continues” which is found on the Guardian Data Blog. I always like to stay up to date with current events. This event is especially relevant to me as I am from Florida and all of my friends go to school back home. I kept up to date with this hurricane and kept in touch with my family and friends to make sure everyone was okay. Unfortunately, the new information from this article shows that 33 people have died because of Hurricane Matthew. Although it’s sad to read about, I think it is important to stay up to date with current events.
-
I chose an article that predicts the likelihood of winning for each baseball team that is participating in the postseason this year. The article is outdated now because it’s a week old, but I chose it out of spite because they gave the Giants a 3% chance of winning the World Series. The article ranks the teams with a scoring system used by FiveThirtyEight that gives points to teams according to a myriad of factors. It sorts the teams according to their general ability and performance, with the most well-rounded teams at the top and the teams that have had the shakiest seasons at the bottom.
http://fivethirtyeight.com/features/major-league-baseball-is-about-to-get-random/ -
I chose the article named ‘The End Of A Republican Party’, which discusses Election 2016, one of the most important events these days. The article used data to analyze the characteristics of the GOP and showed that, overall, the GOP is whiter, older and less educated. The tables then show the division within the party: people in the GOP who like or dislike Trump hold different views. This is interesting because through the data and data visualization we can see who is in the GOP and what views they actually hold. That can help us understand Election 2016 more deeply.
http://fivethirtyeight.com/features/the-end-of-a-republican-party/ -
The article I found is from the Journal of Accountancy and it discusses the retirement fears of Americans. Forty-one percent of CPAs reported that their clients’ number one concern is running out of money for retirement. This is because the baby boomer generation has been tasked with supporting not only their children, but their elderly parents as well. The elderly are living longer than once projected, which is causing them to run out of the money they had previously saved. I find this interesting because as an accounting student and hopeful future CPA, knowing what people are concerned about and how to help them with it will assist me in the future if I choose to open my own practice.
-
The article that I found interesting delved into the benefits of extracting data from the human genome to find patterns that help explain what causes certain viruses. In particular, this article looked at two major viruses that have plagued our population over the last few years, Zika and Ebola. The article interviews Pardis Sabeti, who is a computational geneticist at MIT. Ultimately, her lab is data-mining the human genome for cures. http://www.wired.co.uk/article/genetics-viruses-ebola-zika-tick
-
An article which I found interesting came from The Guardian Data Blog. The main premise of the article is to illustrate to society how the U.S. Government spends money in the 2016 federal budget. The data is presented in an interactive way that adjusts the amount of tax by category depending upon total income. The reader simply drags the bar according to their annual income which corresponds to the amount of tax per government spending area. This is an extremely effective way of illustrating data to a wide audience.
-
-
Shana Pote wrote a new post on the site MIS 0855: Data Science Fall 2016 8 years, 1 month ago
Here is the exercise.
Before you start, save this Tableau file and the studentloans2013 Excel workbook to your computer. Remember, to save the file right-click on the link and choose “Save As…” (don’ […]
-
Shana Pote wrote a new post on the site MIS 0855: Data Science Fall 2016 8 years, 1 month ago
Here is the study guide for the first midterm exam. Exam is October 7. Exam review is October 3.
Format for review is:
Unstructured, for my part. I do not have an agenda for topics to cover. I will field […]
-
Shana Pote wrote a new post on the site MIS 0855: Data Science Fall 2016 8 years, 1 month ago
Some quick instructions:
You must complete the quiz by the start of class on October 10, 2016.
When you click on the link, you may see a Google sign in screen. Use your AccessNet ID and password to […] -
Shana Pote wrote a new post on the site MIS 0855: Data Science Fall 2016 8 years, 1 month ago
Here is the exercise.
And here is the spreadsheet you’ll need to complete the exercise [In-Class Exercise 4.2 – FoodAtlas.xlsx].
Make sure you right-click on the Excel file link and select “Sa […]
-
Shana Pote wrote a new post on the site MIS 0855: Data Science Fall 2016 8 years, 1 month ago
Here is the assignment.
Here is the worksheet as a Word document to make it easy to fill in and submit (along with your Tableau file).
And here is the data file you will need to complete the assignment […] -
Shana Pote wrote a new post on the site MIS 0855: Data Science Fall 2016 8 years, 2 months ago
Leave your response as a comment on this post by the beginning of class on September 28, 2016. Remember, it only needs to be three or four sentences. For these weekly questions, I’m mainly interested in your o […]
-
While looking at Hoven’s article “Stephen Few on Data Visualization: 8 Core Principles,” I think the most important principle is simplify. Just from the in-class examples this week, we have seen that the more information you put on data visualizations, the harder they are to understand. For example, it is great to use color variations to show differences in graphs, but it is not a good idea to use patterns to show distinctions.
-
Reading through Hoven’s article “Stephen Few on Data Visualization: 8 Core Principles”, I believe the most important principle is explore. While exploring data, you get to discover new things. Uncovering new data is critical when doing data analysis. Also, when exploring data, people can be creative and innovative while finding new data.
-
In Hoven’s “Stephen Few on Data Visualization: 8 Core Principles,” simplifying is the most important. If the end user doesn’t understand what they are looking at, or what is meaningful about the data, the visualization has very little purpose. I believe if all of the other principles are satisfied, simplifying will come with greater ease.
-
From the article “Stephen Few On Data Visualization: 8 Core Principles” I believe that the most important principle is Simplify. This is the most important to me because I agree with Few when he states that data is like art. Even the simplest pieces of art can send the strongest messages to their viewers, and I believe that data visualizations can have the same effect. We should let the data speak for itself; less is indeed more.
-
After reading the article, I think the most important core principle is Explore. The reason that I feel it is the most important is that using data to discover other things is a major key in understanding and comparing data. When exploring data we are able to learn things in order to master the current data and also data that we never knew was available.
-
I believe that the most important core principle of data visualization in Hoven’s article is the principle of simplification. I think that this principle is the most important because the simplification of the visualization is what allows the viewer to not have to think about what the visualization is about or how it works. The viewer is simply able to see the visualization and understand the key concepts of why and how it works in the way that it is presented.
-
I think that the most important principle in the article is “respond.” You can put as much effort as you want into combing through data to find the best sources, simplifying the visualizations to make them digestible, and any other aspect of creating data visualizations – but the data and the information shown in the visualization are pointless in a vacuum where they won’t be critically considered. People interpret data through their personal biases, so the best way to make data matter is to put it in a forum where it has the opportunity to spread to different people who will see the data in different ways.
-
I think that “simplify” and “explore” are directly linked, and so they are equally important. Simplifying without oversimplifying allows a data visualization to be immediately understood by whoever looks at it, and at the same time leaves room for further exploration and deduction. This can also mean that it has to communicate well visually with the explorers while taking into account the language barrier.
-
In the article “Stephen Few on Data Visualization: 8 Core Principles,” I think the most important principle is simplify. Data visualizations can contain a lot of information, and too much information is not a good thing. A visualization needs to be simple and easy to understand. The eye of the reader cannot be focused on too many things within the visualization, or it will ruin the message being sent. Get to the point of the data; don’t worry about background information.
-
Niels Hoven’s article, “Stephen Few on Data Visualization: 8 Core Principles,” mentions many important principles; however, I believe the most important one is to simplify. It is extremely important for the viewers of the data to understand what the visualization is conveying, and that cannot happen if the data is not in a simple and pristine layout. As Hoven mentions, “an artist can capture the essence of an emotion with just a few lines, [and] good data visualization captures the essence of data – without oversimplifying.”
-
The most important principle out of the eight given in the Hoven article would be “explore.” Data visualization isn’t effective if it bores the end user. Humans are curious by nature, and it’s important to allow people to be able to discover on their own. “Explore” and “ask why” are closely related in this case, since “ask why” relates to why certain things happen and “explore” is finding out that they happen in the first place.
-
After reading “Stephen Few on Data Visualization: 8 Core Principles,” I think the most important principle is to ask why. We need to know why something is happening and many people do not focus on this key principle. When viewing data, most would just try to understand the relationships and trends. But, more importantly if we can understand why something is happening, there will be more advancement in data visualization. Also, without knowing why, we won’t be able to understand the whole picture of the data.
-
After reading the article “Stephen Few on Data Visualization: 8 Core Principles”, the principle of Simplify is the most important for me. Data visualization was invented so that people could analyze and present data more conveniently and more directly. It helps people understand quickly.
So, a data visualization must be simplified, and it should present the essence of the data clearly. -
After reading “Stephen Few on Data Visualization: 8 Core Principles,” I believe that the most important principle is “Be skeptical.” Being skeptical is a very important principle because most people do not dig deeper than the initial results. It is crucial that we go deeper than just the initial results, because there is always a bias in the data. Being skeptical about the initial results and looking at more data can help eliminate this bias because the more data that is used, the more the individual gets to see the bigger picture.
-
From the article “Stephen Few on Data Visualization: 8 Core Principles,” I believe the most important principle is simplify, because data visualizations are supposed to relay information in a convenient and easy-to-access way. If a particular data visualization is too difficult to understand, it has a high chance of being overlooked and ineffective.
-
After reading Hoven’s article “Stephen Few on Data Visualization: 8 Core Principles”, I found the most important principle to be “explore.” This principle struck me as the most important because it gives whoever is viewing the data the potential to uncover even more within the data visualization they are analyzing. While data visualizations tell a very specific story, and are sometimes found within an infographic, each visualization still allows the user to explore that particular data set and potentially discover aspects of it they may not have known existed.
-
-
Shana Pote wrote a new post on the site MIS 0855: Data Science Fall 2016 8 years, 2 months ago
Here is the exercise.
Here are the links in case you cannot click from the document.
History, Economics and Social Issues
Science and Health
English, Fine Arts and Entertainment
Remember […]
-
Rhea & Maris
Bad infographic:
http://www.nytimes.com/interactive/2008/10/25/opinion/20081025_opart.html -
Rhea & Maris
Good infographic:
http://www.nytimes.com/interactive/2009/03/10/us/20090310-immigration-explorer.html
-
-
Shana Pote wrote a new post on the site MIS 0855: Data Science Fall 2016 8 years, 2 months ago
Some quick instructions:
You must complete the quiz by the start of class on September 26, 2016.
When you click on the link, you may see a Google sign in screen. Use your AccessNet ID and password to […] -
Shana Pote posted a new activity comment 8 years, 2 months ago
Max, please paste the URL for the article into your reply for full credit.
Thanks!
Prof. Pote -
Shana Pote wrote a new post on the site MIS 0855: Data Science Fall 2016 8 years, 2 months ago
Here is the exercise.
-
Shana Pote wrote a new post on the site MIS 0855: Data Science Fall 2016 8 years, 2 months ago
Leave your response as a comment on this post by the beginning of class on September 21, 2016. Remember, it only needs to be three or four sentences. For these weekly questions, I’m mainly interested in your o […]
-
I believe the most important takeaway in last week’s discussion was that data can be biased. Data can be biased because the people who collect the data can be biased when collecting it; a person will only collect the data that he or she believes is important according to their beliefs. Also, a signal problem is important as well, since the data itself creates confusion. A recent example of how I used data to make a decision was when I was trying to find a great hair product. I was trying to find an organic hair product that would help my hair’s characteristics. I searched on Google and even asked family members. I found hundreds of different products. However, I noticed the products that were recommended to me were never compared to a different product. Once a person found a product he or she liked, the person only used that product. This resulted in a signal problem because I had to go out of my way to compare products (compare data) and find which product was the best for me.
-
The most important takeaway from last week’s discussion on biases and the signal problem is that data can be skewed by the creator. This can happen when the creator goes in to analyze data with a preconceived idea of how something should be, so they try to fit the data to their hypothesis. Or, the creator finds data that looks so correlated that it must have a cause-and-effect relationship, when in actuality it does not. These errors, or biases, in analysis can produce a signal problem, in which the conclusions are wrong due to the approach. Recently, I was shopping for a new phone, so I went online to certain forums to read reviews and opinions on the specifications of the phone. However, there always seem to be more reviews that are overwhelmingly negative or overwhelmingly positive; rarely are they right in the middle. I believe this is caused not by people who are the most objective, but by people who are the loudest, in a sense. A way to combat this is to go to a store and try out the phone myself instead of relying on subjective information.
-
In my opinion, the most important point made in last week’s class is how we can indirectly create bias by collecting data in a way that misrepresents a demographic or group. This is easy to overlook because it is easy to believe that everyone has a phone or internet access, just because the people around you do. In the article written by Crawford about hidden biases in data, Crawford writes about how Boston’s StreetBump app is out of reach for people who don’t own smartphones. When I bought my laptop for college, I used data to help me find the one I wanted. I looked through hundreds of Amazon reviews and recommendations to help me make my decision, but many of the reviews I found on Amazon were very one-sided. They were either five-star or one-star reviews. I know that Amazon combats this by sorting the reviews by how useful they are, but I think this is biased too, as not all reviews that are submitted are equally represented. To add to this, not all of a certain laptop’s owners are Amazon buyers, so not all individuals who own a certain laptop can write a review for it on Amazon. I believe the only way I could counteract this was to actually trial laptops by buying them and returning them if I was not satisfied, but the amount of effort this process requires is probably too much for most buyers.
-
I found our discussion last week particularly interesting because biased data is everywhere and affects everyone. I think it is shocking that there are people who don’t even realize biased data exists or is a problem. Personally, I buy a lot of things off Amazon, and for me, reading the reviews is essential. However, there are keywords and traits that always make me hesitant to trust a review, for example, when reviews are overly negative or overly positive. Also, I find responses that include pictures of the product very helpful. Recently, I was looking to buy a tapestry from Amazon and there was one 1-star review. In the review they said that the tapestry was too small and included a photo. However, the description of the item said it was a small tapestry and gave the same dimensions as the disgruntled reviewer. This made me completely ignore that person’s review because it was clear they hadn’t even read about the tapestry before they purchased it. Additionally, I found it very interesting to hear what my classmates thought and their other experiences. I never thought about a profile picture indicating a more reliable reviewer, but it does show an added level of legitimacy.
-
I think one of the most important parts of using data to make decisions is looking at present data comprehensively. I say that because I know that I don’t always do that. As a consumer I feel quite a bit of brand loyalty. For example, when it comes to technology, I tend to prefer brands like Samsung and Lenovo over most others and I will sometimes opt for a more expensive option because I’m buying from a brand I trust, even when compared to brands that are recognized as being of a high quality. Buying from a brand that I find reliable makes sense, but beyond that there’s not much data for why I should make decisions like that. I tend to do extensive research whenever I buy things, but still often settle on the brands that I know well. My situation makes me wonder how much any number of industries are impacted by people making decisions and purposely ignoring data that could lead them to make other conclusions.
-
I feel the most important thing that I learned from last week’s class is how biased data can be. I often use different sites to look up the ratings of different activities before I attend them. After last week’s class I learned that different websites will give me different options. In class on Friday we used Yelp and Google Maps; Yelp is mostly for food and how good the food is, while Google Maps mostly relates to how to get to a specific location. Another important thing that I learned was to look at the source of the data. If the person making the review always has negative things to say, then using them as a source for where to go to eat will not be the best choice. Paying attention to the source is very important; key factors such as past ratings and even a picture on the account page help determine how reliable they are. Another thing I took from last week’s class is to pay attention to who is making or in charge of the site, because the information can be biased.
-
The one thing that I took from last week’s discussion is that many pieces of data can be biased. This happens because data is not objective; we as humans give the data meaning through our interpretation. Many data sets that we come across are going to be biased, and it’s our job to try to point those out so that we can realize that a particular data set is not a fair one. A recent decision that I have made using sets of data is choosing valid summaries for a book I have been reading. My teacher insists that we use the internet to find summaries of the book so we can get a better understanding of it. However, while searching through the internet I noticed that one of my summaries had slightly different information than the other three. I decided to look into this more and I realized that the one article was a biased review of the book instead of the valid, unbiased summary I was looking for. This made me realize I always have to make sure that my information is fair and legitimate, or else a biased piece of information might go right over my head without my even noticing.
-
The thing I took away from class was that data can be biased by the creator of the data. Recently, I was looking at a fantasy football ranking site that listed who the best fantasy players would be this year. It gave a list of the best quarterbacks, running backs, wide receivers, etc. From this list I tried to draft the players that were ranked highly on this site. However, this site could have been biased. A lot of the players ranked highly were from the same team; I think the creator of this list was a fan of that team and was biased towards it. This was a problem because now I couldn’t trust this source and had to go out of my way to find a list with no bias.
-
The most important takeaway from last week’s discussion regarded bias and how individuals can skew data and other information a certain way. Unfortunately, if a human is doing a review, they will show bias towards a certain brand or product because of their past experience or loyalty. I typically use Consumer Reports, YouTube videos, and other review sites when I am looking to purchase a product. When I searched for my newest pair of headphones, I used these three different sources and received different answers, whether it was a different opinion from a technology blog or bias from a reviewer. While the data regarding the headphones tends to help in the decision-making process, it also makes your mind go crazy because you do not know who you should believe. As a result, the consumer should stick to the brand or reviewer they have shown loyalty towards in their past purchases. In the end, it is important to note that the bias in data and reviews sometimes makes the decision difficult.
-
I think that the most important takeaway from last week’s discussion on bias and the Signal Problem is that not all data can be trusted and we cannot assume that there is no influence of bias in the data. Recently, I was getting food at Johnson and Hardwick dining hall and came across the pizza. The dining hall at the time was fairly busy, but almost no pizza had been taken from the pizza station. I used this information to determine whether I should take the pizza or not, because it might not have been very high-quality pizza if no one was taking it. Another possibility was that everyone had just eaten all the pizza and this was fresh pizza in front of me. From this, I made the decision to take the pizza, and it was in fact of satisfactory quality. This decision was definitely biased by my attraction to pizza, but I tried not to let it affect me. This plan inevitably failed; however, I got some pizza I enjoyed. This scenario did result in a signal problem, and I made the decision based on the facts that were presented before me.
-
The most important takeaway from the discussion last class was how data can be biased, but sometimes you may not know it. For example, Boston could have failed to realize that not all of its citizens could download the app, and accepted the biased data as the complete picture. It’s easy to collect data and overlook any bias or sampling error that is made. This is why it is so important to really look into your data collection methods first before accepting the data as true. My most recent decision was whether or not I should do my homework, or just continue watching Avatar the Last Airbender. One bias I had was that I like Avatar a lot more than doing homework, and my natural laziness almost prevented me from doing the most logical thing and finishing my homework. (And yes, I did eventually get to my homework. Unfortunately, my laziness will have to wait).
-
Focusing on the bias present in all data was an important reminder to step back and reevaluate my rationale behind data-driven decisions. Most recently, I reexamined my reasoning behind the CAPSIM decisions I made for my Integrative Business Apps class. The data my team used for our Round 2 decisions was based on our results from the previous round. The way in which my team perceived this data was biased, since we each had different opinions on what to do for the first-round decisions; since we made a final decision as a team, we ultimately compromised and picked a decision that accommodated as many different opinions as possible. As a result, though we all looked at the same data from our first-round decisions (market share, financials, utilization, etc.), we each interpreted it differently, since we each had inherent biases in our outlook.
-
The most important takeaway I received from last week’s discussion is that you have to be aware of what data you’re looking at before you believe it is true. Oftentimes data can be biased or misinterpreted. Today, I looked around online in order to find a good-quality barber near campus. I found a few different places and used data such as the average number of stars given. I chose a place based on the number of reviews and the high ratings. Usually with reviews you have to be careful, because sometimes they can be fake, or the people who review are not reliable because they are either very negative or very positive. I made sure to try to find credible reviews on Yelp by seeing how many reviews someone had done and checking the post for grammar and length.
-
The most important takeaway from our discussion, in my opinion, is the prevalence of data bias and the importance of being aware of it. We covered an example of bias on review websites such as Yelp, with certain accounts leaving one-star reviews as anomalies among the rest. I use review websites (specifically Yelp) quite often in order to judge whether or not I will like a particular restaurant. When researching the Pita Chip restaurant on campus, I checked Yelp to see if I might enjoy it. Most of the reviews were positive; however, some one-star reviews skewed the overall rating of the restaurant. This bias did not affect my decision, as I realized some of the accounts were created solely for the purpose of leaving negative reviews, and they were few and far between. This bias can be counteracted by knowing the signs of faulty reviews, as well as by looking at the prevailing sentiment regarding a specific establishment.
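As a rough illustration of how a few one-star reviews can skew an overall rating, here is a minimal Python sketch. The ratings are made up for the example (they are not actual Yelp data), and `summarize` is a hypothetical helper name. It compares the mean, which the outliers pull down, with the median, which stays closer to the prevailing sentiment.

```python
from statistics import mean, median

# Hypothetical star ratings: mostly positive, plus two one-star outliers.
ratings = [5, 4, 5, 4, 5, 4, 5, 1, 1]

def summarize(stars):
    """Return (mean rounded to 2 places, median) for a list of star ratings."""
    return round(mean(stars), 2), median(stars)

overall = summarize(ratings)
without_outliers = summarize([s for s in ratings if s > 1])

print("all reviews:", overall)
print("one-star reviews excluded:", without_outliers)
```

Dropping every one-star review is only a way to see how much they move the average; in practice some of those reviews may be genuine, so this is a check on sensitivity, not a cleaning rule.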
-
In my view, the most important takeaway from last week’s discussion is that data is created with bias.
Last month I wanted to buy a watch for myself, and I searched for information on a Chinese question-and-answer website called ‘zhihu’ (like Quora in America).
Many answers recommend fashion watches like DW and post elegant pictures of them. It seems like everyone likes this style and the nylon strap. However, I found that most of these answers are similar and even use the same images with the same watermark. Those must be advertisements posing as answers.
So I ignored the answers with flowery words and beautiful photos, and just read answers with ‘daily-life’ pictures or ones that looked like real experience.
-
I found last week’s class discussion about the Signal Problem very interesting, as I believe that it is going to become a growing concern in our society. A recent example of using data to make a decision occurred during my fantasy football draft last month. Although the data said that another player may be better than Eagles wide receiver Jordan Matthews, I still selected him for my fantasy team. As an Eagles fan, I had a clear bias when selecting Matthews. One way to counteract this Signal Problem during the draft would have been to not look at the players’ names or teams, just their statistics from last season and their projected statistics for this upcoming season.
-
The biggest takeaway for me from last week’s discussion of the signal problem was just how biased data can be, no matter where you come across it, whether it be social media, review sites, or any sort of analytical review of information. Data can easily become inherently biased and is easily manipulated. Recently, while trying to pick out new parts for the computer I was repairing, I came across numerous reviews of products I was interested in, but most of the reviews were inconsistent and consisted of several five-star reviews and then several one-star reviews. I chalked this up to people having anecdotal experiences with the products, some of them negative, but it made me wonder if some bias was at play in the reviews that were more positive.
-
-
Shana Pote wrote a new post on the site MIS 0855: Data Science Fall 2016 8 years, 2 months ago
Here’s the site we discussed in today’s class, Spurious Correlations. Poke around and see what you can find. Great for conversation starters!
-
Shana Pote wrote a new post on the site MIS 0855: Data Science Fall 2016 8 years, 2 months ago
In class we talked about a few examples of open data. Here are some others you might want to check out throughout the course. Consider how having these data sets freely available to the public might transform […]
-
Shana Pote wrote a new post on the site MIS 0855: Data Science Fall 2016 8 years, 2 months ago
Some quick instructions:
You must complete the quiz by the start of class on September 19, 2016. The quiz is based on the readings for the whole week.
When you click on the link, you may […]
-
Shana Pote wrote a new post on the site MIS 0855: Data Science Fall 2016 8 years, 2 months ago
Here is the exercise.
As discussed in class today, please comment on this post with the following:
What dataset did you find
Where did you find it
Why did you think it was interesting
What did you […]
-
– The data set I found was The City of Philadelphia Outdoor Advertising which explained locational and relevant attribute data pertaining to billboard and outdoor advertising locations throughout the City of Philadelphia.
– I found this data at http://www.opendataphilly.org under the Organizations – City of Philadelphia tab.
– This information was interesting to me because I love seeing many different ways of advertising, billboards, flyers, outdoor publicity. All of these forms catch a civilian’s eye on a daily basis.
– I learned that they have incorporated many types of information into this data, ranging from location to building permits, two very different specifications but both equally important.
– This data could be used when trying to find out where to place a new advertisement or even how to update an existing advertisement. We could also learn from the data the general size of the advertisements and what works best to catch people’s eyes.
-
A data set that I thought was interesting was “Indego Bike Share Trips” found on opendataphilly.org. (https://www.opendataphilly.org/dataset/indego-bike-share-trips) It’s really interesting because a lot of useful and relevant data is included in the data set. As a student, having a cost-effective alternative to a car is very important. Just by looking at it, I know that most people who use Indego are Indego30 members and use the service for one-way trips. The trips include starting and ending latitude and longitude, which can be plotted onto a map, possibly showing the most bike-friendly streets and neighborhoods as well as the types of trips that bikes are used for. If many bike trips start at a bike station and end at a school, supermarket, or public transportation center, it can tell us about the usage habits of Indego users. This data isn’t hard to come by, either. I believe it’s collected automatically, because there are over 170,000 Indego bike rides from April 2016 to June 2016.
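The observation about Indego30 members and one-way trips could be checked with a simple tally. The Python sketch below is only an illustration: the rows are invented, and the column names (`passholder_type`, `trip_route_category`, and the station columns) are assumptions about how the trip export is laid out, so they would need to be matched to the real file.

```python
import csv
import io
from collections import Counter

# Invented sample rows; column names are assumed, not taken from the real export.
SAMPLE = """passholder_type,trip_route_category,start_station,end_station
Indego30,One Way,3004,3010
Indego30,One Way,3010,3004
Walk-up,Round Trip,3021,3021
Indego30,One Way,3004,3021
"""

def tally(csv_text, column):
    """Count how often each value appears in the given column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(row[column] for row in reader)

by_pass = tally(SAMPLE, "passholder_type")
by_route = tally(SAMPLE, "trip_route_category")

print(by_pass.most_common(1))   # most frequent pass type in the sample
print(by_route.most_common(1))  # most frequent route category in the sample
```

With the real file, the same `tally` function could be pointed at the downloaded CSV text to see which pass type and route category dominate.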
-
– The data set that I found is “Healthy Chinese Takeout,” which is a movement run by Temple University’s Center for Asian Health. The goal is to get Chinese takeout restaurants to reduce the amount of sodium used in their dishes to help prevent high blood pressure in Philly citizens.
– I found my data set “Healthy Chinese Takeout” at http://www.data.gov and my data set URL is here
http://metadata.phila.gov/#home/datasetdetails/555d710ffc0eee467ebcedd8/representationdetails/555d7150e23ac33e7ed09122/
– I thought that this data set was interesting because I, as well as many others, love to eat Chinese food, but also know how bad it is for us. I wanted to learn more about this and see if there was any way that I could enjoy my Chinese takeout, but in a healthier manner!
– I learned that so far there are approximately 200 restaurants in the Philly area that have taken part in this and are trying to reduce the amount of salt in their dishes. I now know that there are 3 Chinese takeout restaurants in my neighborhood that take part in this low-sodium push.
– The decision that I can now make from this information is where I can order my Chinese food without worrying about whether I will develop a health issue.
-
I found a data set called “Historic Streets” that catalogues and maps historic streets in Philadelphia at http://www.opendataphilly.org under the “Planning/Zoning” section.
It’s interesting to me because it shows the historic areas, and I don’t know that much about Philadelphia’s history because I’m not from here. I found it odd that most of the historic areas are clustered downtown (no surprise) but then there are a bunch more clustered in the extreme north of the city with not much in between.
The data showed me roughly where people settled in the city’s infancy and the formations in which they situated themselves. The data didn’t tell me what groups settled where, but it wasn’t hard to find that information with just a little extra research.
For most people the data wouldn’t have much practical value, but it would show people like city planners which parts of the city should be preserved and left out of plans for future building. It could also be useful for tourism because it could be used in maps for visitors.
-
The data set I found most interesting between the two sources provided was the “Bike Network” dataset found under transportation on http://www.opendataphilly.org. This comprehensive dataset is a vast collection of all of the bike lanes in Philadelphia throughout different neighborhoods that are deemed “bicycle friendly.” This data source not only provides very detailed comma-separated values that open in Excel, but also Keyhole Markup Language (KML) for even more interactive data presentation through Google Earth. This made the data much easier to interpret, thanks to actually being able to see the locations that correspond with the respective CSV rows.
This data is particularly interesting to me personally because I commute on my bike throughout the city somewhat frequently and have always found that even though bike lanes are present in certain areas, calling them safe would be a stretch in my opinion. Knowing that such thorough research into this topic was conducted, I will now be able to go through this data and know which areas might be better to ride in. The biggest takeaway that I got from reading this data is that no one particular neighborhood in the city of Philadelphia seems to be significantly “safer” than others, and that conventional bike lanes are found predominantly throughout the city as opposed to shared bike lanes, which put cyclists and drivers right on top of one another.
This data would be very helpful for any new commuter getting started with riding in the city of Philadelphia who wants a better understanding of which neighborhoods are deemed more bicycle friendly and which are simply not included in the data. In addition to seeing particular lanes associated with each neighborhood, actually seeing satellite imaging can help them determine whether or not they would feel safe biking in that area of the city.
-
-I found a data set that shows data from a pedestrian count at over 5,000 different locations in the Delaware Valley. It also records data such as weather, temperature, and pedestrians within the hour.
-I found it on OpenDataPhilly.org. It was the 14th one down on the database page.
-I think the data is interesting because you can find out what influences people going out and walking around. Things like weather, day, and hour influence the number of people walking around.
-I learned that things like time and weather affect pedestrian activity immensely. For example, on a rainy day there were only 43 people out the entire day, but days before, on a partly cloudy day, the count was 673 people. Also, at times like 2 am not many people are walking around, but at 2 pm there are many.
-This could be important data for local businesses. Local businesses need to run promotions, and without this data they would not know the best times and days to plan a promotion to maximize the number of pedestrians that are there.
-
The dataset I chose from OpenDataPhilly is called “Visualizing Philadelphia’s Neighborhood Change Process” (http://penniur.upenn.edu/publications/visualizing-philadelphias-neighborhood-change-process). The data demonstrates the varying levels of attention the neighborhoods of Philadelphia have received as investors look to get into real estate in Philadelphia once more. I found this data to be useful and interesting because, as I have accepted a full-time offer in Philadelphia for the coming year, I am on the lookout for potential neighborhoods to move into.
The article uses single-family homes as an indicator of change in a neighborhood. It states, “real estate prices reflect the willingness to pay for different neighborhood characteristics, and if a shift in these characteristics typifies the neighborhood change process, then we should expect a shift in prices as well.”
-
The dataset that I chose from data.gov is called “College Scorecard”. http://catalog.data.gov/dataset/college-scorecard. I thought the data was interesting because it explains to students how to make college affordable and how to pick the college that is the best fit for them using data. The data makes it easier for students to search for a college that is the best fit for them. I can use the data to see if Temple University is a good fit for me. Also, this data is good for future students who plan on going to college.
-
The data set that I found is called the College Scorecard, made by the Department of Education, on the website data.gov. I think that it is an interesting data set because it allows the user to easily search for colleges based on program, location, size, name, and other advanced search options. Once the user clicks on an individual school, they can see all the different types of data about it laid out in a very user-friendly manner. From the data, I was able to learn that the average student at Temple University has an ACT score between 22 and 28. This is just one of many different types of data I can learn from the data set. Using this data, someone would be able to make well-informed decisions about the colleges that they may be interested in applying to and what is the best school for them to attempt to get into.
-
The dataset I found was Impaired Driving Death Rate, by Age and Gender, 2012, All States on data.gov. I found it interesting to see which states have the worst impaired driving death rate and which states have few impaired driving deaths. Also, it was interesting to see what age was the most common for each state in impaired driving. I learned that for most states the driver is usually male in an impaired driving death. This data could be used to let certain state governments know they need to raise awareness and alert specific age groups about impaired driving to help reduce the death rate for impaired driving.
https://catalog.data.gov/dataset/impaired-driving-death-rate-by-age-and-gender-2012-all-states-587fd
-
1) I found a dataset analyzing the frequency of healthy food access for those who live in high-poverty areas in Philadelphia.
2) I discovered this on the website https://www.opendataphilly.org/dataset/philadelphia-food-access/resource/1813ac51-131a-4f49-90c0-ef00e99c970a
3) I thought it was interesting because I had never considered a correlation between those in poverty and having access to healthy food. I assumed one could just go to a local supermarket.
4) I learned from the data that the notoriously high obesity rate that Philadelphia is known for could be caused by the lack of access to healthy food by those in poverty-stricken areas.
5) This data could be used in regards to city planning and urban development, bringing more accessible healthy food into lower-income areas.
-
1) The data set that I found was Red Light Camera Locations in Philadelphia.
2) I found the data set on OpenDataPhilly.org (https://www.opendataphilly.org/dataset/red-light-cameras/resource/3a3c307e-1515-45d2-b5b8-8c90336630db)
3) I found the data set interesting because it breaks down the exact location of each red light camera along with the number of citations and warnings that have been handed out over the past year.
4) I learned from the data where to make sure you come to a full stop and not try to speed up to make the yellow light.
5) You could use the data to create a map of where you are most likely to get caught by a red light camera based on the locations of the cameras, as well as the frequency of citations and warnings given out by the cameras.
-
-The dataset I found is the distribution of English vocabulary size for foreign learners.
-I found the dataset on the website ‘Test Your Vocabulary’ http://testyourvocab.com/blog/2011-07-25-New-results-for-foreign-learners#mainchartNonnative
-After you fill out the questionnaire, the website gives you your vocabulary result, and it also has separate reports for native English adult speakers and foreign learners.
-Most native English adult speakers who have taken the test fall in the range 20,000–35,000 words. And for foreign learners of English, we’ve found that the most common vocabulary size is from 2,500–9,000 words.
-As an international student from China, I can compare my vocabulary with peers who are native speakers, and then I can have a better knowledge of the gap between my English skills and theirs.
For me it provides a reference standard as I improve my English skills.
-
From your ID name I think you are Chinese or have some connection to China.
Do you have a WeChat account? Or you can just search Chicheng_Zhang, which is my account.
I hope we can help each other in this class.
-
-
Shana Pote wrote a new post on the site MIS 0855: Data Science Fall 2016 8 years, 2 months ago
Class,
For the “Building a Data Dictionary” exercise that I asked you on Friday to do over the weekend, here’s some additional detail we didn’t have time to cover in the classroom:
1) A data dictionary is […]