Shawn Niederriter's E-Portfolio

PhD Research Project

This spring, I helped Dr. Jing Gong and other members of the PhD research group collect and extract data from Google for their research on the ranking of Non-Organic Search Ads for specific search queries. I wrote a Python-based “Google Crawler”, a web crawler built specifically to search Google and collect information from its results. At its core, Google is essentially a giant “web crawler” itself, making its way around the web to grab every link it can find before ranking them from 1 to n. However, Google is very strict about stopping other “web crawlers” from freely collecting its data, so you have to be careful about how much information you request at one time. Overloading Google with too many requests at once will get your IP address blacklisted by its servers, leaving you unable to use any of its services.
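To illustrate what being careful about request volume can look like, here is a minimal sketch of a throttle that enforces a minimum gap between requests, with some random jitter so the timing is not perfectly regular. This is not the actual crawler code; the `Throttle` class, its parameters, and the delay values are all assumptions made for illustration.

```python
import random
import time


class Throttle:
    """Enforce a minimum gap (plus optional random jitter) between requests.

    Hypothetical helper for illustration; the clock and sleep functions
    are injectable so the pacing logic can be checked without waiting.
    """

    def __init__(self, min_interval, jitter=0.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval  # seconds between requests
        self.jitter = jitter              # extra random delay, 0..jitter seconds
        self._clock = clock
        self._sleep = sleep
        self._last = None                 # time of the previous request, if any

    def wait(self):
        """Pause until the next request is allowed; return seconds slept."""
        delay = 0.0
        if self._last is not None:
            target = self.min_interval + random.uniform(0.0, self.jitter)
            elapsed = self._clock() - self._last
            delay = max(0.0, target - elapsed)
            if delay > 0:
                self._sleep(delay)
        self._last = self._clock()
        return delay
```

In use, the crawler would call `throttle.wait()` right before each search request, for example with `Throttle(min_interval=10.0, jitter=5.0)`. Those numbers are placeholders, not values known to keep an IP address off a blacklist.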

A second important part of the research project was imitating specific locations, so that Google’s geo-location tags would render the relevant Non-Organic Search Ads and we could compare the same keywords across multiple locations. As I mentioned, Google does not take kindly to people collecting its data. Equally, it does not want people to imitate traffic from different locations. To imitate traffic coming from specific locations, I used a method called proxying, which routes a search through another server to mask where it actually came from. Since Google is very good at detecting proxies, it takes a good bit of trial and error to get it right.
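As a rough sketch of how proxying looks in Python, the standard library's `urllib.request.ProxyHandler` can send requests through a proxy server, so the destination sees the proxy's address rather than yours. The location labels and proxy URLs below are hypothetical placeholders, not the servers used in the project, and the real crawler may have used a different library.

```python
import urllib.request

# Hypothetical proxy endpoints, one per location to imitate.
# These are placeholders, not real servers.
PROXIES_BY_LOCATION = {
    "location_a": "http://proxy-a.example.com:8080",
    "location_b": "http://proxy-b.example.com:8080",
}


def proxy_settings(location):
    """Return the scheme-to-proxy mapping for one location label."""
    proxy_url = PROXIES_BY_LOCATION[location]  # KeyError if the label is unknown
    return {"http": proxy_url, "https": proxy_url}


def opener_for(location):
    """Build an opener whose requests are routed through that location's proxy."""
    handler = urllib.request.ProxyHandler(proxy_settings(location))
    return urllib.request.build_opener(handler)
```

Calling `opener_for("location_a").open(url)` would then fetch `url` through the first proxy. The location the site perceives depends on where the proxy server itself sits, which is why the choice of proxy matters more than the code.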

All in all, I got a lot out of this experience. Though I was familiar with how Google’s search methodology worked, I had never been able to successfully imitate my location to Google. I learned exactly how proxying works and was finally able to accomplish this. I am going to keep improving this project by adding steps that make it easier for the research group to continue running my Google Crawler after I graduate in the fall. I have also become interested in a PhD program myself, so getting to work with the group directly was a great experience that I value and will continue to value as we keep working on this project.

Shawn J Niederriter
