Python Workshop on Web Data Extraction
Workshop Information
This is a follow-up to Dr. Jing Gong’s workshop, Python and Web Data Extraction. The content will be mostly the same, with slight changes based on feedback from the first workshop. Click here for details of the workshop.
Date/Time: Wednesday, July 27, 2016 from 9:50 AM to 3:45 PM (EDT)
Location: Room 746, Alter Hall
Step 0: Preparation
- Please install and set up Python 2.7 before we start. Instructions can be found here: Quick-Guide-to-Installing-and-Setting-Up-Python-2.7. If you have any questions during the setup process, please feel free to contact one of our TAs for hands-on assistance.
a) Shawn Niederster: shawn.niederriter@temple.edu
b) Xue Guo: tug25690@temple.edu
c) Zhe Deng: zhe.deng@temple.edu
- Please install a browser (e.g., Google Chrome or Firefox) that can display the source code of a web page.
- Please download all the materials listed below before you come.
Topic 1: Python Basics
- Slides: 1 Python Basics
- Tutorial: Tutorial 1 – First Python Script
- Python Script (right click and save as): FirstPythonScript.py
Topic 2: Web Scraping
- Slides: 2 Web Scraping
- Tutorial: Tutorial 2 – Extracting Data from 10-K
- Python Scripts (right click and save as):
- 1GetIndexLinks.py
- 2Get10kLinks.py
- 3DownLoadHTML.py
- 4ReadHTML.py
- CSV file (right click and save as): CompanyList.csv
Topic 3: Introduction to Natural Language Processing
- Slides: 3 Intro to Natural Language Processing
- Tutorial: Tutorial 3 – Computing TF and TF-IDF
- Python Scripts (right click and save as): 5tfidf.py
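
For readers who want a preview of what Tutorial 3 computes, here is a minimal sketch of TF and TF-IDF. It is not the content of 5tfidf.py. The function names, the sample sentences, and the specific weighting (raw term frequency divided by document length, and the unsmoothed natural-log IDF) are assumptions chosen for illustration, and the tutorial may use a different variant.

```python
import math
from collections import Counter


def term_frequencies(tokens):
    """Return each term's share of the tokens in one document (TF)."""
    counts = Counter(tokens)
    total = float(len(tokens))  # float() keeps the division exact on Python 2.7
    return {term: n / total for term, n in counts.items()}


def inverse_document_frequencies(docs):
    """Return log(N / df) for every term, where df counts documents containing it."""
    n_docs = float(len(docs))
    doc_freq = Counter()
    for tokens in docs:
        doc_freq.update(set(tokens))  # count each term once per document
    return {term: math.log(n_docs / df) for term, df in doc_freq.items()}


def tf_idf(docs):
    """Return one {term: tf * idf} dictionary per document."""
    idf = inverse_document_frequencies(docs)
    return [
        {term: tf * idf[term] for term, tf in term_frequencies(tokens).items()}
        for tokens in docs
    ]


# Hypothetical sample sentences, already split into lowercase tokens.
docs = [
    "revenue increased this year".split(),
    "revenue decreased this year".split(),
    "the company faces litigation risk".split(),
]
scores = tf_idf(docs)

# Print the terms of the first document from highest to lowest weight.
for term, weight in sorted(scores[0].items(), key=lambda kv: -kv[1]):
    print("%-10s %.4f" % (term, weight))
```

In this sketch, a term that appears in only one document ("increased") scores higher than a term shared by several documents ("revenue"), and a term that appears in every document would score zero.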