-
Ermira Zifla wrote a new post on the site MIS 0855: Data Science Spring 2017 7 years, 8 months ago
Here is the exercise
Here are the links in case you cannot click from the document.
History, Economics and Social Issues
Science and Health
English, Fine Arts and Entertainment
Remember to […]
-
Ermira Zifla wrote a new post on the site MIS 0855: Data Science Spring 2017 7 years, 8 months ago
Leave your response as a comment on this post by the beginning of class on February 15, 2017. Remember, it only needs to be three or four sentences. For these weekly questions, I’m mainly interested in your o […]
-
After reading the article and learning more about data visualization in class, I now think that the “Explore” principle is most important. In my opinion, the “Explore” principle is the whole basis behind data visualization. We hope to display our analysis of the data and our findings, but we also hope that through the visualization, the viewer can evaluate the data as well, possibly drawing further conclusions and insights. If our data visualization tool allows for good exploration, new discoveries can possibly be made that were overlooked before.
-
I think the most important principle is to View Diversely. Being open to multiple viewpoints is essential because it creates opportunities for new and different ideas. Also, viewing diversely lets you see how things come together in numerous ways, as opposed to having tunnel vision.
-
After reading Hoven’s article and what we were taught in class, I think the “Ask Why” principle is the most important. Being able to ask questions about the data you have, or about its results, can really help you make better decisions. It can also help you get rid of any potential bias in your data, which will allow you to make the best decision possible for the company or business you work for.
-
When looking back at Hoven’s article, I thought the idea of “asking why” was the most important. I feel like anyone is capable of gathering data, but I think it takes a lot of refined skill to truly analyze why the data is the way that it is. When you begin to ask why, you start to dig into possible reasons for things occurring, which I believe is the real reason we bother to work with data at all. If you collect a data set, simplify it, and have it laid out neatly, but fail to ask why the data is like that, I feel like the whole process of collecting that data was in vain.
-
From the eight principles Stephen Few discussed in his article, I believe that “Simplify” is the most important. When visualizing data, you normally do so for an audience. While you are familiar with the data you are working with, often the people looking at the visualization are not. It is important to make the data accessible for them and allow them to understand what is displayed. Therefore, you have to simplify the data and decide what information is displayed, and how, to make it accessible for the viewer. As the author mentions, this is a critical step, and you have to put a lot of thought into it so that you neither oversimplify nor overwhelm the viewer with information.
-
In Hoven’s article, I think that the “asking why” principle is the most important. In order to properly collect data, people need to understand why that data was collected and presented. Anyone can do a data collection or data search, but not everyone can understand why that collection and search were done. Asking why gives people the ability to dive deeper into a data set and fully understand it. There’s no point in looking at a data set without understanding what it contains; the whole data set is wasted without the why. In addition, asking why allows people to understand not only the data but also its complications and the things potentially causing bias, because they are thinking critically about the data.
-
In my opinion, the most important of the 8 principles of data visualization is to ask why. This one in particular caught my attention the most because when analyzing or observing a data set or visualization, the data does not have a meaning until we give it one by questioning it. Critical thinking in response to presented data lets us look past the numbers and figure out why something is happening, and also lets us discover relationships between variables. Most importantly, once we know why something is the way it is, we can move into action to produce substantial results in response to the problem at hand. This is a powerful tool, and has innumerable applications to the real world and to issues that we face globally today.
-
I think the most important principle of visualizing data is “Attend.” If you want to create a data set that is relevant and understandable, besides needing simple tools, you’ll also need tools that help you figure out exactly which data is necessary to provide information. Stephen Few uses the example of the ball-passing and gorilla video, which demonstrates how the human mind is prone to distraction. Having a tool that makes it easier to distinguish relevant data from irrelevant data is important for any data visualization program.
-
The first core principle, “simplify”, is by far the most important one because it’s the most practical. If you want the audience to sufficiently understand the data you’re presenting, then it has to be simple. Presenting it in a complicated fashion, with many other things going on, can confuse viewers about what they’re looking at. It can even take away from the actual point you’re intending to prove.
-
Out of Stephen’s 8 principles, I believe that “Be Skeptical” is the most important. Most of us just accept whatever data we find first, maybe due to laziness or, as Stephen states, because “….exploring any further is too hard”. It is crucial to question your data and to find other sources, as opposed to sticking to just one data set that may not be correct; compiling more than one data set will give you a more validated answer.
-
I think Simplifying is the most important principle. If a visualization is complex and hard to understand, what is the point of this data? If it’s not simple, it is difficult to grasp the data. When you look at data, you want to easily understand it and gain a good understanding of what is being shown.
-
From my standpoint, simplicity is the most important. Having looked at various examples of visualization, good and bad, simplicity is clearly key. It is a pretty easy and mundane aspect, but it is vital in data visualization. What is the point of a visualization if people have a hard time understanding it? Why not just look at the raw data then? Simplicity plays a key role in getting your point across efficiently and effectively to an array of people.
-
In my opinion, the most crucial principle is to “Be Skeptical”. Without verification of collected data, the facts remain a mystery. A data set is incomplete as long as its data is not known to be factual. Being skeptical while conducting research with data collection is completely necessary in order to maintain a certain level of credibility.
-
To me the most important principle is #1, simplify. The reason I find this principle most important is because when I make an assessment as to whether I can utilize data visualizations, the first thing I do is make sure that it is readable. Put simply, if I cannot understand it I cannot garner anything from it. This is why I find that rule number 1, simplify, is the most important.
-
I found that viewing data diversely is the most important core value out of Stephen’s 8 principles. I agree that we should visualize data as openly as possible. In doing so, the same data, presented and seen in different ways, can show different results. Not only are there different results, but also more insight into the data.
-
From the eight principles, simplicity is often treated as essential for data visualization, but I believe the exploratory principle is the most important because it establishes greater meaning for data visualization. The ability to explore relationships within data is vital for transforming data into useful information. The capacity of a visualization to develop and present relationships in data is valuable for capitalizing on the explanatory power of data. Emphasizing exploration increases that capacity, which makes “explore” the most important principle.
-
In my opinion, the most important of the 8 core principles of data visualization is #2, to compare. Since not everyone has a photographic memory to actually remember the data to which we were exposed and then see what changed in another representation, being able to “compare” data should be the most important principle. Comparing data in businesses, for example, can help make strategic decisions and analyze and evaluate different data sets like performance, sales, etc. I find it useless to have two or more data sets presented with different information and not be able to analyze and realize what’s different. I find this a really important principle in data visualization as well as in the real world.
-
I believe that the most important principle is Simplify. When a graphic is overly complicated and more focused on being a cool or attractive visual, it tends to be much worse at conveying what it wants with the data. Therefore, if something is clear and simple, it will be much easier to understand and use the data.
-
In my opinion, the most important principle is #4, explore. I think the exploring aspect of data is most important because the data should not only give us exact answers, but should also allow us to dig deeper and discover new things based on what we see. It should be presented in a way that allows people to grow and learn more. If people are able to explore the data, it keeps it interesting and lets people stay creative, rather than being something just to “look” at.
-
I think that the simplify step is the most important core value of Stephen’s 8 principles. My reasoning is that the purpose of data science is to convert raw data into useful information, and the only way to make that data useful is to simplify it in a way that can be understood by everyone. By simplifying the data we are also able to begin visualizing it, so before we can do anything with the data we must simplify it, which is why I believe it is the most important step.
-
After reading the article, I think the most important principle is simplicity. When looking at a graph, I think it’s important that the viewer is able to understand what is going on. Simplicity is the foundation of creating a successful and effective graph. Without simplicity it would be extremely difficult to apply the other principles in the article.
-
I feel as though the most important Principle mentioned in the article is the principle of “asking why”. Any amount of data can be displayed and given to us to view, but if we do not delve into the reason for that data and ask ourselves why we should even care about the data, then there is no point to even looking at it. In order for a visualization or a set of visualizations to have any sort of meaning to us, we need to ask why so we can then take some sort of reasonable, logical action from the data.
-
In my opinion, simplifying is the most important core principle given by Hoven. I consider it the most important because without this guiding principle, visualizations would be unclear. The simplifying principle is a building block for the other seven core principles. Hoven stated, “Good data visualization captures the essence of data,” which makes it easy for viewers to interpret and understand the visualization.
-
I think that all 8 core principles are important, but principle #7, be skeptical, is the most important. This principle is the most important because it urges you to question the answers we get from our data. Being skeptical and questioning those answers allows us to further explore the data from different perspectives and to fully understand and connect the meaning. I also think it allows us to eliminate some of the personal biases we use to interpret data.
-
I believe that out of the eight principles number seven is the most important. When looking at data it is most important to be skeptical because of human error. There will always be cases where human error occurs and when the case does occur, it is important to be aware of it. Otherwise, conclusions and analyses will be skewed based on the incorrect data.
-
Out of the eight principles, I think that the most important is to be skeptical. It is very important to question every result and try to prove things wrong. That way you can make sure a conclusion is correct, that is, if you cannot prove it wrong after being skeptical. There are always paths for biases to come out in data, and skepticism is the best way to reveal them. Without the ability to mistrust data to a certain extent, accuracy and validity would always be in question. Being skeptical just helps us make sure that we aren’t slacking or getting lazy when it comes to data evaluation and collection.
-
The most important principle is the first one, simplify. When looking at a chart, graph, or any other visual, it should be clear enough to know what is going on. A simplified graphic will allow the viewer to see what is being presented and why it’s important. If it is too complex the viewer will either be confused or be annoyed by the difficulty and stop looking at it.
-
I think the most important principle is the seventh, be skeptical. Nowadays when we see data or any other information we take it at face value instead of being skeptical and challenging notions. One of science’s biggest tenets is skepticism because it is important to challenge things that are wrong and presented as scientific facts.
-
I think the most important principle is definitely to simplify it. When using different types of visual data, it can become tricky as all the data could get grouped all together. Since the point of data visualization is to present an analysis of some data, it would make sense for the data to be easy to understand.
-
Ermira Zifla wrote a new post on the site MIS 0855: Data Science Spring 2017 7 years, 8 months ago
Some quick instructions:
You must complete the quiz by the start of class on February 13, 2017.
When you click on the link, you may see a Google sign in screen. Use your AccessNet ID and password to […]
-
Ermira Zifla wrote a new post on the site MIS 0855: Data Science Spring 2017 7 years, 8 months ago
Here is the assignment.
Here is the worksheet as a Word document to make it easy to fill in and submit (along with your Tableau file).
And here is the data file you will need to complete the assignment […]
-
Ermira Zifla wrote a new post on the site MIS 0855: Data Science Spring 2017 7 years, 8 months ago
Here is the exercise
-
Ermira Zifla wrote a new post on the site MIS 0855: Data Science Spring 2017 7 years, 8 months ago
Here are some more examples to help you with Assignment 1.
Bad Hypotheses:
Philadelphia is a growing area in terms of job opportunities and competitive wages.
Philadelphia has great paying jobs. […]
-
Ermira Zifla wrote a new post on the site MIS 0855: Data Science Spring 2017 7 years, 8 months ago
Leave your response as a comment on this post by the beginning of class on February 8, 2017. Remember, it only needs to be three or four sentences. For these weekly questions, I’m mainly interested in your o […]
-
The most important takeaway from last week’s lesson for me was how prevalent bias in data is. Sometimes a bias can be obvious, such as in the political and marketing fields. Other times, though, a bias may be very hard to see or even to realize that it exists. There are also times when biases are not mutually exclusive and several biases affect the data in question. Recently, when choosing an institution at which to complete a BA in Finance, I had to compare data from a few different universities. Although most of the data I reviewed favored Temple, I was heavily biased toward this university regardless, because my brother graduated from Temple last year. This resulted in a signal problem because I was well aware of the bias and tried to look past it when making my final decision. To counteract the bias and signal problem in my decision, I could have made my choice strictly based on the ranking of the schools by a third party. This would show me which school was better strictly on merit, taking my bias toward Temple out of the decision.
-
What struck me most about our discussion regarding bias in data was the notion that it is an inherent part of data and information. These outlying pieces are in every dataset, because no data set is perfect. When I was shopping for Christmas presents this past December on Amazon, some of the reviews reflected a bias. The reviews on Amazon are sometimes dreadfully done, e.g. people review the fast shipping rather than the product, people don’t know how to use the product properly, people didn’t actually buy the product, etc. I could have looked at the highest-rated reviews to reduce bias, because other users would have acknowledged that someone wrote a good review as opposed to someone who did not.
-
What struck me the most in terms of bias and the signal problem was that there is no overall cure for bias. No matter what anyone does to make a sample better (i.e., make it random, use a bigger sample, etc.), bias will always be present. There is no foolproof method to fully eliminate bias. The best anyone can do is reduce it to the slightest extent, but any sample has to be taken with a grain of salt because some bias will be evident in any set of data. And even if the bias is not glaring, there are little hints of bias that show up when a person thinks critically about the data they are analyzing. Recently, I was searching for a new pair of shoes and was researching reviews on different types of shoes that interested me. However, all the reviews were biased because most people had strong views on the product, both good and bad, which influenced what they said in the review. Also, the shoes were on many different websites, so the demographics for each website may be different, and people of different demographics may have different points of view. A signal problem existed because I was aware that the opinions of non-professionals are not a good indicator of how to judge a product, idea, or anything else. In order to counteract this signal problem, I would have professional shoe raters test the shoe and give feedback on it. If their input was similar to the customers’ reviews, I would know it is trustworthy to either buy or not buy depending on the result. If their input was different, I would side with the professionals slightly more, but I would average the two scores and see the potential result from that.
-
What I took away from last week’s discussion on the signal problem and bias was that we must take the necessary steps to try to eliminate bias from datasets and to try to account for any signal problems with data before they happen. Whenever I plan to buy something online, no matter what it is, I always try to read the reviews thoroughly in order to get the thoughts and opinions of other people who have purchased the product. The problem with many online reviews is that the people who submit them are either really pleased with the product they received or really displeased. The majority of reviews are not in the middle ground of grading (such as 2–4 stars); they’re either 1-star or 5-star reviews. This normally results in a slight signal problem because the people who leave reviews have very strong feelings about the product, as opposed to those who were merely satisfied instead of amazed. I tend to counteract the signal problems with online product reviews by seeing how valid people’s reasons for reviewing are, and making a decision based on the less extreme reviews.
-
The most important part of our discussion of bias was how it can alter people’s decisions in bigger ways than I thought. I also thought it was interesting to find the bias on online sites like Yelp. The signal problem was interesting too, because I learned you can’t just accept the data as always being correct. You actually have to dig and do research in order to make sure the data you are looking at is accurate. A recent decision I made was getting a new laptop for Christmas. I had to look at many reviews online in order to make sure I was getting a good laptop. Most people are biased when they create a review, so I had to be aware of that and try to see if I could find more accurate reviews. I used several websites to make my decision, which allowed me to assess the reviews clearly and see which ones were potentially biased or not.
-
The most important takeaway is how not all data is complete or representative of everything/everyone. I’ve recently used Yelp to decide on a restaurant. I had a bias on price, so some restaurants were automatically eliminated as an option because they were too expensive. I possibly could have eliminated really good restaurants because of my price bias. Instead, I could have still considered the more expensive restaurants.
-
The most important takeaway from last week’s discussion about bias and the signal problem would be that when collecting data or implementing a change, we must really be aware and make a conscious effort to eliminate bias and signal problems as much as we can. When I was applying to colleges about a year ago, I used a website called Niche to see my chances of getting accepted into my desired colleges and to figure out which colleges I needed to really focus on. Niche had data on students’ SAT scores and GPAs and whether they got accepted, rejected, or waitlisted at a specific college. There were many biases in the website: the sample size was too small, it didn’t take into account any other information, and much of the information was probably only from students who really cared and were really happy or sad about the results. There was a signal problem in that many different types of people weren’t represented, only the select group that used the website. I realized that I should be using a more reliable source, and ultimately no website can truly tell me if I would get into my desired colleges because there are way too many variables colleges look at during the admissions process.
-
Recently I got a new phone, and I used data to help me decide which phone to get. After looking at reviews from multiple sites, I finally decided on the S7 Edge due to positive feedback, and its specs appealed to me. I was already pretty biased toward the phone before searching other alternatives, but the reviews sealed the decision for me.
-
Throughout the whole discussion of bias and signal problems in data, the main thing that stuck with me is that most of the data we use is biased in some way. Whether it is reviews of food, places, events, or products, the information found there will be skewed in some way. It isn’t always necessarily biased on purpose, but the subjective opinion is still present. For example, last week I was trying to decide between two pairs of shoes to buy for myself. I looked up reviews of both and got mixed reviews on which shoe to choose. The problem I noticed was that the reviewers factored in other shoes’ quality and design along with the current shoe, so it threw everything off. I ended up choosing by a flip of a coin. My best advice on how to counteract the inevitable bias in the data we use is to take it all “with a grain of salt”. We should all just try to experience or try things ourselves so we will know if they work for us or not. An opinion from another person always makes me iffy until I can make my own judgment.
-
I feel as though the most important takeaway from last week’s discussion on bias is the fact that bias is always there, and as long as we are aware of said bias, we can reduce the chance of making a mistake or influencing conclusions when analyzing data. I used price data when deciding whether to buy Ben and Jerry’s or 7-Eleven brand ice cream. The Ben and Jerry’s was more expensive, and with my bias, I usually tend to go for the cheaper option when purchasing something. However, my bias of knowing that I like Ben and Jerry’s and being loyal to the brand also influenced my hesitation in automatically buying the cheaper ice cream. There could have been a signal problem because there were only two brands of ice cream to choose from; not all brands of ice cream were there for me to base my decision on. I might have counteracted this bias by attempting to base my decision on the physical ice cream rather than on the price or my brand bias.
-
In my opinion, the most important takeaway from last week was the topic of data fundamentalism, which is the notion that correlation always indicates causation and that massive data sets and predictive analytics always reflect objective truth. There hasn’t really been a recent case in which I collected data; however, the first example I thought of for this question is the recent presidential election. Many people, based on data and statistics, believed that Hillary Clinton would win the election. The reason the data was incorrect and Donald Trump won the election is likely due to errors in the collection of data with regard to data bias and a signal problem.
-
My biggest takeaway from last week’s class on bias and the signal problem is being aware of the data. The data that you have or obtain should be questioned to check its integrity. We must ask the “why” and “how” questions to get higher-level insight into the results. Also, data must be checked for accuracy and to see whether it represents the population or just a sample. A quick example of me using data is choosing NFL (football) players for my fantasy sports lineup on FanDuel. The concept is fairly simple: look at the various players and their past performance data from previous weeks and versus other teams, then select the players who I think will play well in order to win. A lot of times I run into a dilemma with bias, choosing players from my favourite team even though the numbers might say otherwise. To deter any bias or signal problems, I can pick games in which my favourite team is not playing, which will eliminate personal preferences.
-
During the discussion on Monday, I found it most interesting that large data sets can be greatly skewed due to the signal problem. Data can be skewed by the fact that people in low-income areas cannot afford the smartphones that data is taken from; therefore, large portions of the population are left out of data sets. Recently I was online shopping for my best friend’s 21st birthday, and when I was looking through the Forever 21 website I was having a very difficult time finding something that I would like to buy. The main reason was that when I looked at the pictures of the model wearing the clothing, I thought the article of clothing was very pretty, but then when I read the reviews they were very extreme. For the most part, the only reviews there were ones highly praising the article of clothing or saying how terrible it was. These reviews could be affected by the signal problem if the people that bought the clothing do not have access to a computer or smartphone. Some articles of clothing that are sold online are also sold in stores, and the reviews of people who bought those clothes in stores are not being accounted for.
-
In my opinion, the most important takeaway from the discussion regarding bias and the signal problem is that biases are inevitable in Big Data and the signal problem will always under-represent the population. I recently used data when deciding what classes I would take this semester: I used the Rate My Professor website when picking classes. The data on the professors I researched is biased, with ratings going from 1 to 5 and mostly students bashing their professor if they did not get a good grade. A signal problem in the data on Rate My Professor is that not all of the students who have taken a professor leave a rating. This skews the data, making it biased toward students who have had only a great or terrible experience with the professor.
-
In my opinion, the most important takeaway from last week’s discussion on bias and the signal problem is to always be conscious and aware of data and to analyze it. Bias is widespread, and it can make decision-making difficult. Moreover, it is important to thoroughly analyze data and question it instead of simply accepting it. An example of how I used data to make a decision was when I was debating whether or not I wanted to buy a specific make-up product. I became interested in the product after watching a video of a make-up artist raving about it. However, when I read reviews of the product, people either loved it or disliked it, which wasn’t very helpful. Reading the reviews left me conflicted and skeptical, and I didn’t know whether or not I wanted to purchase it. To counteract the bias and signal problem, I decided to buy the make-up product and try it out, since I’m able to return it if I’m not satisfied.
-
For me, the most important takeaway from our discussion regarding bias and the signal problem is that biases are inevitable when working with data. Humans are always involved in collecting data: they can be the ones that provide the data or the ones that designed a way to collect it. It is human to be biased; humans create data, thus data will always be biased, too. It is important to remember that once you are aware of the bias, you can take action to reduce it and secure the integrity of the insights you gain from data. I recently used data to decide which smartphone I would buy. I looked into different manufacturers and operating systems and ended up buying a Samsung smartphone. One bias was that I wanted to buy an Android phone, since I am learning how to build apps for that OS; another was my previous negative experience with Sony. Although I also looked at Apple and Windows phones, I was looking more for reasons to get a Samsung than actually comparing the different phones’ technical data. Therefore, you can definitely say that there was a bias present, which also resulted in a signal problem. To counteract this signal problem, I could have copied only the technical data into a Word file, removed the name and brand of each phone, then looked for the best data and matched it with the phone. I would have made a decision based solely on data and not been influenced by emotions.
-
The most important takeaway from last week’s class was acknowledging that bias exists in data collection and interpretation and that the signal problem can affect the credibility of the data. Over the weekend I was deciding where to order food. I looked at reviews on Yelp and Google and decided on Italian Kitchen, a local pizzeria up the street from my house. I picked that pizzeria based on the reviews and on the fact that I had ordered from there before and the food was good, unlike Zesto’s, where I had a bad experience in the past. Due to my bad experience at Zesto’s, a bias that their service was bad already existed in my head before I read the reviews. When dealing with bias, it’s best to collect more data and increase the sample size to minimize errors. It also helps to be specific about the kind of data you are looking for.
-
The most important takeaway from last week’s class, in my opinion, was the identification of personal bias as a signal problem in data collection. An example of bias would be how I was looking for places to eat this weekend on Zomato, but almost all of the restaurants I saw had mixed reviews, either boasting about good food or complaining about bad food. These reviews weren’t reliable because only people who experienced great or poor meals at a restaurant would be likely to leave reviews, as opposed to people who’ve had average meals. In the end, I just ended up getting McDonald’s and calling it a night.
-
The most important thing I took away from reflecting on biases in data sets and the signal problem is the simple realization that data is not neutral. As Kate Crawford’s article put it, data sets are creations of human design. We are the ones who point out relationships between data sets, and we decide what they mean. Recently, I used data to decide on signing a lease for an apartment. I compared data and performed a cost-benefit analysis for each apartment with my prospective roommates, and we came to a decision. The biases I had during this process included my feelings toward a certain unit and its features, such as price, location, size, and amenities. I do not believe a signal problem arose from these biases, though, because I was not the only one making the decision, so compromise with my roommates was a factor. Also, the comparisons to other units, especially having toured them previously, prevented a blind assumption or inference about any one apartment. Regardless, bias in data sets is definitely something to keep in mind when comparing data in the future, as it could have major implications for an outcome.
-
My most important takeaway from last week’s discussion was the skepticism we always need when approaching data sets. Although we may have reduced error to a minimum, there is always something that can be questioned, whether that be the sampling method, the sample size, biases, or the signal problem. I recently used some data to find the best bagel to buy at Bagel Hut. I gathered information by asking customers what their favorite bagel was and why. Most people at that time recommended the spicy cheddar with cream cheese. I realize there was definitely bias in my little sample. For starters, it was a convenience sample; I just asked the people available at the time. Second, this was at 9 o’clock in the morning, and there is probably a difference in taste between those who come early and those who come later. Also, this wasn’t a very representative sample of the Bagel Hut customer base. When I usually come, there are professors, Caucasians, African Americans, Middle Easterners, men, and women; that day, however, there were mostly older professors, all Caucasian. I obviously had a biased sample. The signal problem also applies: I had lost the voices of people who didn’t have class in the morning or who woke up later. In the end, however, the spicy cheddar was delicious.
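The convenience-sampling effect described above can be sketched in a few lines of Python. The bagel preferences and group sizes below are entirely invented for illustration; the point is only that polling one time of day answers a different question than polling the whole customer base.

```python
import random

random.seed(0)

# Hypothetical model of the bias described above: morning customers and
# later customers have different favorite bagels, so a 9 a.m. convenience
# sample misrepresents the full customer base.
morning = ["spicy cheddar"] * 70 + ["plain"] * 30  # 70% favor spicy cheddar
later = ["spicy cheddar"] * 30 + ["plain"] * 70    # preferences flip later on

convenience_sample = random.sample(morning, 20)             # only morning customers asked
representative_sample = random.sample(morning + later, 40)  # draws from both groups

def share(sample, flavor="spicy cheddar"):
    """Fraction of a sample that favors the given flavor."""
    return sum(1 for s in sample if s == flavor) / len(sample)

print(f"convenience sample:    {share(convenience_sample):.0%} favor spicy cheddar")
print(f"representative sample: {share(representative_sample):.0%} favor spicy cheddar")
```

The convenience sample tends to report a much higher share for spicy cheddar, not because the whole customer base prefers it, but because of who happened to be available when the question was asked.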
-
The biggest takeaway for me was that bias truly cannot be eliminated from data; it can be minimized but not removed entirely. It made me think of all the statistics and polls we hear about in the news and how we have to take that information with a grain of salt, because even statistics carry heavy bias.
-
I believe the most important takeaway from last week’s discussion was that although data is a powerful tool, it is never the complete answer. Useful research will always take this into account and supplement findings with other measures for accuracy and completeness. For example, when I was using census data to test hypotheses for the first assignment, I most likely would not reach the most accurate conclusions about how the income per capita of a city represents the average wealth of its residents. That is because I would probably encounter a biased finding from a data set easily affected by a signal problem. Census data can create a signal problem if measures are not taken to correct for the underrepresentation of people with incomes below the poverty line, who face many factors that prevent them from fully partaking in census data collection. I might have accounted for this by supplementing my findings with specific measures of poverty to cross-check the accuracy of my initial conclusions.
-
The most important takeaway I had from the class discussion was the signal problem. I still think it is strange that the bias of the signal problem can’t be removed from the situation no matter how big the data pools are. I recently relied on Amazon reviews to make a purchase decision for a hair product, and I can see how bias would be encountered there, since it sounded like the customers who left reviews were either very satisfied with the product or completely disappointed with it, with not many people in the middle. I could have tried to counteract the bias by checking the item’s reviews on other websites or asking my family and friends if they had ever used it before.
-
The most important takeaway from last week’s discussion was that bias can be both deliberate and accidental. That being said, I also know how important it is to properly analyze data in order to avoid bias, and if the data is biased, to find a new data set. Recently, I was looking at average SAT scores by state, and I found that the southern states averaged higher than the northern states, which made no sense, since the northern states’ education systems were stronger. That’s when I realized the data didn’t account for the number of students who took the SATs. I had to go to a different data set that showed that variable, and I found that even though the northern states had lower scores, they all had more students (>50%) taking the test, while in some of the southern states the percentages were in the single digits (<10%). The bias made it appear as if states with good standards for education were in need of help, when in fact they were not.
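The SAT example above is a classic selection effect, and it can be sketched with made-up numbers. The scores and participation rates below are invented placeholders, not figures from the actual data sets:

```python
# Hypothetical records: state -> (average SAT score, participation rate).
# A low participation rate means mostly top students self-select into the
# test, which inflates the raw average.
states = {
    "Southern state (low participation)": (1150, 0.08),
    "Northern state (high participation)": (1020, 0.65),
}

for name, (avg_score, participation) in states.items():
    print(f"{name}: average {avg_score}, participation {participation:.0%}")

# Comparing raw averages without the participation variable makes the
# low-participation state look "better" purely because of who took the test.
```

Pulling in the second variable (participation) is exactly the step that dissolves the apparent paradox in the original averages.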
-
The most important takeaway from the bias and signal problem discussion, for me, was that you should always be skeptical of the data you are using. An example of this is when I was working on a project for another class on government surveillance. If I had only looked at what data the government was collecting using conspiracy websites, I would probably have concluded that the government was always watching me. However, by going to the NSA’s official website, I was able to pull a full list of the data that the NSA collects from citizens. In addition to the NSA’s website, I also used some of the articles we read in this class to get a better idea of what data was really being collected. The key point is to use data from multiple sources so you do not inadvertently introduce a bias.
-
One takeaway I garnered from this week’s lessons is that bias is within every study, by virtue of the fact that human beings are biased and human beings conduct studies. Although I was not specifically using the data, the dissenting opinions of the new president’s cabinet, and of the president himself, about the validity of polls from left-wing sources is something I find interesting. First of all, the election polls from many outlets were wrong, but many of them relied on the popular vote rather than checking on a state-by-state basis. A new myth is pervading political discourse: that statistical studies can be drawn up any way the analyst’s bias dictates. I do not know if this is true, but what I do know is that although there is bias within every study and news outlet, that alone should not be grounds to denounce anyone who disagrees with you. In fact, data and studies contradicting your opinions should be the ones you pay the most attention to.
-
The most important thing I took out of last week’s discussion on bias and the signal problem is to actually assess the data that is produced right in front of me. My most recent encounter with this was picking my fantasy basketball team for the week. I never thoroughly looked at the stats my players provide; rather, I just picked based on performance in previous games. I know now to look at the stats and apply them to game-time situations and to how players match up against the team at hand.
-
The most recent instance of data in my everyday life is looking at the first-week sales of some of my favorite music artists to gauge their popularity. I have decided that if an artist exceeds 50k in first-week sales, they have reached a new increment of popularity, then 100k, and so on. I believe first-week sales are the one thing we can use in music to quantify popularity, because that is when demand for the artist’s project is highest.
-
-
Ermira Zifla wrote a new post on the site MIS 0855: Data Science Spring 2017 7 years, 8 months ago
Some quick instructions:
You must complete the quiz by the start of class on February 6, 2017. The quiz is based on the readings for the whole week.
When you click on the link, you may […]
-
Ermira Zifla wrote a new post on the site MIS 0855: Data Science Spring 2017 7 years, 8 months ago
See the completed data dictionary here.
-
Ermira Zifla wrote a new post on the site MIS 0855: Data Science Spring 2017 7 years, 8 months ago
In class we talked about a few examples of open data. Here are some others:
Business: data.gov’s “Impact” section
Science: The Genomes Unzipped project
Government: New York City parking violations
Jour […] -
Ermira Zifla wrote a new post on the site MIS 0855: Data Science Spring 2017 7 years, 8 months ago
Here is the exercise.
-
Ermira Zifla wrote a new post on the site MIS 0855: Data Science Spring 2017 7 years, 8 months ago
Leave your response as a comment on this post by the beginning of class on February 3, 2017. Remember, it only needs to be three or four sentences. For these weekly questions, I’m mainly interested in your o […]
-
http://www.jdsupra.com/legalnews/cyber-ecurity-recent-developments-in-21890/
This site has a lot of great information regarding recent developments in cyber security relating to financial data. One entry states that FINRA (the Financial Industry Regulatory Authority) fined 12 brokerage firms $14.4 million for violating federal securities laws and rules. The firms failed to store records and communications in the WORM (write once, read many) file format. There is another report about plans to improve cyber risk management standards for large, interconnected financial firms in our country. I am glad to see that the agencies overseeing cyber security at financial institutions are staying diligent in protecting this data from threats.
-
http://fortune.com/2017/01/22/climate-data-trump-admin-hackers/
This article discusses the efforts underway to secure data collected by U.S. government agencies (the EPA and NOAA) in order to prevent its destruction under Trump’s administration. Some data regarding the climate has already been destroyed under the orders of the Canadian PM. The race against the government to store the data is truly interesting, albeit somewhat scary. -
http://motherboard.vice.com/read/big-data-cambridge-analytica-brexit-trump
The article I found was interesting to me as it relates to recent major political stories from around the world, but specifically in the United States. The focus was the dangers of big data and the sheer power and influence it possesses. Specifically, the main subject of the article was a company called Cambridge Analytica, the company behind the online campaigns of two major recent events: Brexit and the election of Donald Trump. Cambridge Analytica is a big data company, and it uses big data statistics for political purposes. The whole process started when a man named Michal Kosinski attended Cambridge University for his PhD. There, he learned of a field called psychometrics, which focuses on measuring psychological traits of individuals; from these measurements, one can determine someone’s personality traits or how they are likely to respond to certain statements. It started as a personality questionnaire app that allowed respondents to share their results on Facebook. However, more applications were discovered for the technique. Soon the method was being used to target specific voter bases in campaigns such as Brexit and the 2016 presidential election. Cambridge Analytica eventually used the method to identify voters who would likely support Hillary Clinton and fed them statements over social media platforms such as Facebook aimed at getting those voters upset with Clinton and suppressing their turnout. With this system in place, Trump gained an advantage and eventually went on to win. The article highlights a very powerful use of big data and shows that data can truly change the world when used in a certain way. -
This was a really interesting find from my regular browsing on reddit. It is less an article and more an interactive visualization. The visualization lets you pick a state and see where people moving out of that state go. It uses U.S. Census Bureau data to tally and relate moves from one state to another. From a basic standpoint this is very interesting, and looking deeper, as our class readings state, it can help bring forth more qualitative insights or theories.
-
This was a really interesting article to me because the past few months have been controversial over the fact that Donald Trump did not win the popular vote. It has come up even more frequently in recent weeks as Trump has taken office and made some questionable decisions. However, under a new electoral college system, Hillary Clinton could have won the popular vote by five percentage points but still lost the election. For Clinton to have won, many states would have had to switch their voting to the way Nebraska and Maine do it, with congressional districts being split up. Very interesting for people interested in politics, like myself, as well as people who believe the electoral college is ridiculous (also me).
-
http://www.infoworld.com/article/3161222/analytics/getting-off-the-data-treadmill.html
This article explains how many businesses just use analytics tools and draw connections between different sectors of their business from those tools. Mintz describes how that method does not show why certain connections happen, forcing businesses back to the workbook stage to figure out the meaning of the data all over again, causing a treadmill effect. The author makes a vague statement: “It’s much less about incorporating the latest machine learning algorithm that delivers a 3% improvement in behavioral prediction, and more about the seemingly simple task of putting the right information in front of the right person at the right time.” Although it is easy to call it the simple task of putting the right information in front of the right person at the right time, that statement doesn’t give much information on how to achieve it. -
This article stated that Finland won the most Olympic medals per capita, about 2.30 medals per million people. The article came to this conclusion by taking the total number of medals won and dividing it by the number of summer Olympics competed in. This was interesting to me because when you think of Olympic medals, you usually only think of who won the most overall, not how many were won relative to a country’s population. Even though this is a random fact, it is interesting nonetheless.
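The per-capita figure itself is simple arithmetic: medals divided by the population expressed in millions. A quick sketch with invented numbers (these are illustrative placeholders, not the article’s actual figures):

```python
# Medals per million residents = total medals / (population / 1,000,000).
# Both values below are made up purely to illustrate the calculation.
total_medals = 100
population = 5_500_000

medals_per_million = total_medals / (population / 1_000_000)
print(round(medals_per_million, 2))  # 18.18
```

Whichever exact normalization the article used, the idea is the same: dividing by a size measure lets small countries be compared fairly against large ones.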
-
This article talks about how “Big Data” can help improve machine learning. AMPLab was created with “a vision of understanding how machines and people could come together to process or to address problems in data.” That could mean more accurate predictions by machines, which will in turn relieve humans of some analytical responsibilities.
-
https://www.nytimes.com/reuters/2017/02/01/business/01reuters-usa-economy.html?_r=0
This article describes data showing that national factory activity has increased to its highest level in the past two years. This increase in factory activity indicates an expansion in manufacturing, which accounts for 12% of the U.S. economy. The data came from the Institute for Supply Management (ISM)’s production index. ISM is of great interest to me because I am a Supply Chain Management major and ISM will help me throughout my supply chain career. The article is intriguing because it describes the impact the economy can have on aspects of the supply chain; for instance, the increase in manufacturing came with an increase in both the quantity of orders and the prices of raw materials, which I will have to understand and adjust for in my career when I procure those raw materials or facilitate those orders.
-
https://www.bloomberg.com/news/articles/2017-02-02/factory-skills-gap-could-spell-trouble-for-trump-s-jobs-plan
This article is an interesting data-driven analysis of the US job market under Trump’s administration and whether the plan to bring back US factory jobs can succeed. The data shows roughly 340,000 possible manufacturing job openings this year alone, and the skills gap is one large factor standing between Americans and those jobs. Today’s labor market has about a 43 percent talent scarcity. While the data shows that the tech workforce is almost single-handedly offsetting the skills gap, factory workers are still left out of the loop as large manufacturing companies look to give jobs to robots. Trump and his administration now face the task of getting manufacturing companies back to America and having them employ humans. -
http://www.cnbc.com/2017/02/02/snap-ipo-s-1-filing.html
Snapchat is a very well-known app with more than 100 million downloads. This extremely successful private company has filed for an IPO, an initial public offering, which means Snapchat will no longer be private and can be owned, at least in part, by shareholders. Snapchat requires an active phone number and syncs your contacts with your phone and others’, giving it a huge database of messages exchanged via video, images, and text.
-
https://fivethirtyeight.com/features/trump-could-really-mess-up-mexicos-economy/
This article by Lucia He, published on fivethirtyeight.com, describes the negative influence of Donald Trump’s anti-Mexico politics. It analyzes data from the time he first introduced these ideas in his candidacy for president of the United States up until today. The author makes clear, based on economic data from the Mexican government and the WTO, that Mexico depends on the US as an important source of income (e.g. money sent back by Mexicans who emigrated to the US and now support their families), but also that both countries profit from their free trade. Trump’s anti-Mexico campaign would therefore hit Mexico very hard, which can already be seen in the data and in the projections of experts, such as those of the Mexican bank Banamex, but it would also hurt the US in the long run.
The article does a great job of supporting its arguments with recent data from credible sources and is therefore very interesting to me, as that is what I am trying to learn: finding useful information in a big set of data.
-
-
Ermira Zifla wrote a new post on the site MIS 0855: Data Science Spring 2017 7 years, 8 months ago
Here is the exercise.
And here is the spreadsheet you’ll need [In-Class Exercise 2.1 – 2015 Car Fuel Econ [Start]]
-
Ermira Zifla wrote a new post on the site MIS 0855: Data Science Spring 2017 7 years, 8 months ago
Some quick instructions:
You must complete the quiz by the start of class on February 1, 2017. The quiz is based on the readings for the whole week.
When you click on the link, you may see […]
-
Ermira Zifla wrote a new post on the site MIS 0855: Data Science Spring 2017 7 years, 9 months ago
Here is the exercise
-
Ermira Zifla wrote a new post on the site MIS 0855: Data Science Spring 2017 7 years, 9 months ago
Here are the instructions in word (and as a PDF). Make sure you read them carefully!
Please submit the assignment via Temple OWLbox.
When your assignment is complete, you should email the document in .docx […]
-
Ermira Zifla wrote a new post on the site MIS 0855: Data Science Spring 2017 7 years, 9 months ago
Leave your response as a comment on this post by the beginning of class on January 27, 2017. Remember, it only needs to be three or four sentences. For these weekly questions, I’m mainly interested in your o […]
-
Conventional wisdom says that eating food that is high in fat content will make you fat.
Data: In order to test this idea, we would need a sample group representing different kinds of people and keep them on strict diets, some receiving food high in fat and others eating food that is very low in fat or fat-free. All subjects would be limited to the same amount of physical activity, and all other variables would have to be identified and factored into the test. The subjects would be continuously weighed and have body fat measurements taken to see whether the fat content of food is the determining factor in gaining fat. -
One of the most common pieces of conventional wisdom I’ve ever heard is to never eat snacks or junk food before dinner because you’ll no longer have an appetite. To test it, I would use two sample groups of people. One group doesn’t eat anything before dinner, and the other group eats as much candy and junk food as they like. Then I’d give both groups the same amount of food at the same dinner, and they could have as many plates as they’d like. Finally, I’d compare how much groups #1 and #2 ate to see if this piece of wisdom holds up in practice.
-
One piece of conventional wisdom that comes to my mind is, “if it isn’t broke, don’t fix it.” The phrase means that if something is functioning as it should be, then there is no need to tinker around with how it is working. Some data I would want regarding this statement would be how efficiently the thing is working. If the thing is working, but it isn’t working to its full capacity, then it could possibly use some fixing.
-
One conventional wisdom that I’ve always been told is, “if you get wet in the rain you will get sick”.
I would collect data by observing how many people get sick after going out in the rain and getting wet. I would collect this data multiple times, with many different types of people, on different days. I would record how soon the subjects got sick and also note the weather on the days they got wet in the rain. -
One piece of conventional wisdom that I have heard is “No pain, no gain.” To collect data, I would use two weightlifters. I would have one go extremely hard, using more weight and fewer reps, until he starts to feel the pain of the lift. For the other lifter, I would use less weight but more repetitions. I would keep this up for a few months and see which lifter has gained more muscle. I would then be able to see whether pain allows a lifter to gain more muscle.
-
One widely spread piece of conventional wisdom is the assumption that males are naturally better at science than females. To test it, I would start by collecting data on the grades of boys and girls on standardized tests. The math section of the SAT could be a first source of data, as it is easy to obtain and has enough data points to be a reliable sample. Still, I would also try to get data from different sources, e.g. tests that ask questions differently, since we know that boys and girls respond differently to different styles of teaching and learning. That would help avoid bias, jumping to conclusions, and missing the actual reason behind the widely believed phenomenon that boys are better at science than girls.
-
The first conventional wisdom that comes to mind is “the more time you spend studying, the better you will do in class.” This conventional wisdom flatly states that the more time you spend studying, the better your grades will be. It asserts a relation, or correlation, between time spent studying and grades. The way I would test this is by collecting data on all the students in a class: their grades and the amount of time they spend studying.
-
“If you work hard you will be successful” is a piece of conventional wisdom I have heard all my life. This is not necessarily true; hard work doesn’t automatically equal success. There are people who have worked hard all their lives and are not successful, and there are people who just attempted something and became successful. To test this, I would survey people who are successful and compare how hard they worked to achieve their success.
-
A piece of conventional wisdom that I have heard is that being a gymnast makes you short. To test this statement, I would look at data stating the average height of both male and female gymnasts during childhood, adolescence, and adulthood. Then, I would examine the type of impact gymnasts endure on their bodies and determine through medical research if the kinds of falls and landings they perform do indeed stunt the body’s growth. Finally, I would compare the average height data set and the impact data set and look for a correlation between the two. Depending on this, I will be able to discern if being a gymnast actually makes someone short.
-
A piece of conventional wisdom that I have always been told is that you should wait at least 30 minutes after eating before going swimming because it can upset your stomach and affect your ability to swim. One could use data to test if there is any truth to this piece of conventional wisdom by conducting two separate studies on the same group of people. One day, the group of people could go swimming immediately after eating the same meal and evaluate their ability to swim and their health status. The next day, they could eat the same meal, but this time wait 30 minutes, and see if their swimming ability or health status changed because they waited to swim.
-
An example of Conventional Wisdom that comes to mind is, “You will not be successful without higher Education.” Even though higher education has some correlation with success, it does not guarantee it, nor does it mean that you will not succeed without education after high school. I would test this by doing research on individuals who have become successful with and without higher education, and look at what paths they took to get to that point.
-
A piece of conventional wisdom I have heard/been told is to wait at least half an hour after eating before swimming to avoid cramps. To test this, a sample group of people would be needed. On one day they would consume the same meal and swim in a pool immediately after. On the next day, they would again consume the same meal and wait half an hour before swimming in the pool. On both days, each person would be monitored and evaluated to see if they experienced muscle cramping. This could potentially prove whether or not this specific piece of conventional wisdom is true. However, I am aware that there are factors/variables involved (e.g. the weight of each person) that need to be acknowledged in this test.
-
An example of conventional wisdom would be that same-sex couples are unfit parents. I would test this by finding same-sex couples throughout the US with kids. I would collect subjective and objective data. The subjective data would come from surveying the parents and children through a series of questions like: “On a scale of 1-10 how happy are you?”. I would collect the objective data through school test scores and physical reports from doctors to see how the parents affect the children’s health and academics.
-
One piece of conventional wisdom that I hear often is “great things come to those who wait.” This statement does not always prove true; I am sure we can all recall stories about people who were in good positions but waited too long and let opportunities pass them by. Great things are sought after, not waited for. To support my claims against this old conventional wisdom, I would use big data. From a business perspective, I would test how long senior VPs, CEOs, CFOs, other executives, and managers were in base roles before moving up into their respective roles, compared to other business professionals who are roughly the same age and not yet advanced in their careers.
-Jibreel Murrray
-
One piece of conventional wisdom that I have heard is that eating green vegetables will make you stronger. I would take a random sample of 100 kids aged 16–18 who are roughly the same height and weight. I would evaluate each person’s muscle mass before any vegetable eating or working out has been performed. I would flip a coin for each person, heads being Group A and tails being Group B. Both groups would go to the gym five days a week, but only Group A would eat the vegetables, having a portion at dinner as stated by the government’s food plate guidelines. After about six months, I would retest the muscle mass of both groups and see which group has grown stronger, or if there is no difference at all.
-
One piece of popular conventional wisdom commonly used to ease a person’s concerns is “you can’t know what will happen in the future.” Obviously no one can be certain about the future, but I think it would be useful to test the correctness of this statement by evaluating whether we can, in fact, have a high enough level of certainty about future events. To test the accuracy of this statement, I would have two groups of people participate in the stock market, a marketplace commonly accepted as unpredictable. One control group would live by this conventional wisdom and just make decisions based on historical data, not concerning themselves with predictions of the future, since they believe they can never know what will actually happen. The other group would use data from the market and other sources to make predictions or forecasts about the future with a great enough level of certainty to be confident that something will likely happen or not. Then I would measure the accuracy of this group’s predictions against the market performance of the control group and evaluate whether or not it is simply more beneficial not to be concerned about knowing the future.
-
A conventional wisdom that I have heard is that drinking one glass of red wine per night is actually beneficial to your health. Although too much red wine can cause detrimental effects such as alcoholism, cancers, and high blood pressure, other studies show that a glass of red wine per night can lower incidences of cardiovascular disease, mortality, and type-2 diabetes. To conduct my own experiment, I would establish two coed groups, each with a range of ages from 21 to 60. Group A would test the idea and Group B would act as the control group. Group A would drink a specified amount of red wine every night for 3 years (not exceeding one glass per night). Group B would be asked to refrain from drinking red wine for the 3 years. Throughout the 3 years, the members of each group would have regular doctor’s appointments where changes in their health are specifically analyzed, and the two groups’ doctor reports would be compared.
-
A conventional wisdom I’ve constantly heard is “You are what you eat.” I personally don’t agree with this, because I am definitely not what I eat: I can eat as unhealthily or as healthily as I want, and it won’t show, at least currently. To test this, we could get two sample groups between the ages of 20 and 35, one that eats healthy food and one that eats junk and unhealthy foods. With both groups, we could evaluate each person’s body weight, size, etc., and keep track of what they eat daily.
-
One piece of conventional wisdom I’ve heard before is that the earlier you leave for work, the less traffic there will be. I would test this by measuring traffic speeds and car density on a particular stretch of road, like a highway, during every time period. It is possible that leaving early is too popular an idea and that the roads are actually more crowded earlier in the morning.
-
-
Ermira Zifla wrote a new post on the site MIS 0855: Data Science Spring 2017 7 years, 9 months ago
Here is the exercise
-
Ermira Zifla wrote a new post on the site MIS 0855: Data Science Spring 2017 7 years, 9 months ago
Some quick instructions:
You must complete the quiz by January 27, 2017 9:00 am.
When you click on the link, you may see a Google sign in screen. Use your AccessNet ID and password to sign in. […]
-
Ermira Zifla wrote a new post on the site MIS 0855: Data Science Spring 2017 7 years, 9 months ago
Here is the syllabus for the course.
You should read the syllabus carefully. Everything you need to know is in this document.