Python Workshop on Web Data Extraction

Workshop Information

This is a follow-up to Dr. Jing Gong’s Python and Web Data Extraction workshop. The content will be mostly the same, with slight changes based on feedback from the first workshop. Click here for details of the workshop.

Date/Time: Wednesday, July 27, 2016 from 9:50 AM to 3:45 PM (EDT)

Location: Room 746, Alter Hall


Step 0: Preparation

  1. Please install and set up Python 2.7 before we start. Instructions can be found here: Quick-Guide-to-Installing-and-Setting-Up-Python-2.7. If you have any questions during the setup process, please feel free to contact one of our TAs for hands-on assistance:

        a) Shawn Niederster: shawn.niederriter@temple.edu

        b) Xue Guo: tug25690@temple.edu

        c) Zhe Deng: zhe.deng@temple.edu

  2. Please install a browser (e.g., Google Chrome/Firefox) that can view source code of a page.
  3. Please download all the materials listed before you come.

Topic 1: Python Basics

  1. Slides: 1 Python Basics
  2. Tutorial: Tutorial 1 – First Python Script
  3. Python Script (right click and save as): FirstPythonScript.py

Topic 2: Web Scraping

  1. Slides: 2 Web Scraping
  2. Tutorial: Tutorial 2 – Extracting Data from 10-K
  3. Python Scripts (right click and save as):
    1. 1GetIndexLinks.py
    2. 2Get10kLinks.py
    3. 3DownLoadHTML.py
    4. 4ReadHTML.py
  4. CSV file (right click and save as): CompanyList.csv
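
As a rough illustration of the link-extraction step that the script names above suggest, the following minimal sketch collects hyperlinks from an HTML page using only the standard library. It is not the workshop's code: the class `LinkCollector`, the function `extract_links`, and the sample HTML with its paths are all hypothetical and exist only for illustration. The import is written so the sketch should run on both Python 2.7 and Python 3.

```python
try:
    from html.parser import HTMLParser   # Python 3
except ImportError:
    from HTMLParser import HTMLParser    # Python 2.7


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag (hypothetical helper)."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html, suffix=".htm"):
    """Return all links in `html` whose path ends with `suffix` (case-insensitive)."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    return [link for link in collector.links if link.lower().endswith(suffix)]


# Made-up page content; the paths below do not refer to real filings.
SAMPLE_HTML = """
<html><body>
<a href="/filings/example-10k.htm">Annual report</a>
<a href="/about">About</a>
<a href="/filings/example-exhibit.HTM">Exhibit</a>
</body></html>
"""

links = extract_links(SAMPLE_HTML)
print(links)
```

In practice the HTML string would come from a downloaded page rather than a literal, and the filter would be adapted to whatever pattern identifies the documents of interest.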

Topic 3: Introduction to Natural Language Processing


  1. Slides: 3 Intro to Natural Language Processing
  2. Tutorial: Tutorial 3 – Computing TF and TF-IDF
  3. Python Script (right click and save as): 5tfidf.py
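
For readers who want a preview of TF and TF-IDF before the tutorial, here is a minimal sketch. It assumes one common definition: TF is a term's count divided by the number of tokens in the document, and IDF is the natural log of the number of documents divided by the number of documents containing the term. The tutorial and 5tfidf.py may use a different variant. The function names and the two toy documents are made up for illustration.

```python
import math
from collections import Counter


def term_frequencies(tokens):
    """Return {term: count / total tokens} for one tokenized document."""
    counts = Counter(tokens)
    total = float(len(tokens))  # float() keeps true division on Python 2.7
    return {term: n / total for term, n in counts.items()}


def inverse_document_frequencies(tokenized_docs):
    """Return {term: log(N / df)}, where df is the number of docs containing the term."""
    n_docs = float(len(tokenized_docs))
    doc_freq = Counter()
    for tokens in tokenized_docs:
        doc_freq.update(set(tokens))  # count each term once per document
    return {term: math.log(n_docs / df) for term, df in doc_freq.items()}


def tf_idf(tokenized_docs):
    """Return one {term: tf * idf} dictionary per document."""
    idf = inverse_document_frequencies(tokenized_docs)
    return [
        {term: tf * idf[term] for term, tf in term_frequencies(tokens).items()}
        for tokens in tokenized_docs
    ]


# Toy documents invented for this example.
docs = [
    "risk factors include market risk",
    "market conditions may change",
]
tokenized = [d.lower().split() for d in docs]
scores = tf_idf(tokenized)

print(scores[0]["risk"])    # term found only in the first document
print(scores[0]["market"])  # term found in every document
```

Under this definition, a term that appears in every document gets an IDF of zero, so its TF-IDF score is zero no matter how often it occurs. This is why some implementations add smoothing to the IDF term.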