code4lib Web Scraping Workshops

The DCI is pleased to host two web scraping workshops by code4lib Toronto.

Web Scraping Part I: In-Browser Scraping and Working with XPath

Part one, on Thursday, November 3, introduces the concepts of web scraping and uses browser extensions to get you started quickly. No programming experience is required.

What you’ll learn:

  • What web scraping is and why it is useful
  • Use browser extensions and web tools to quickly scrape data off a web page
  • Use XPath/XQuery to select elements on a page
  • Export extracted data to a file for processing in OpenRefine, Excel, or other software
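
For a rough sense of what selecting elements with XPath and exporting the results can look like, here is a minimal sketch in Python. It is not workshop material: the page fragment, the class names, and the helper functions extract_rows and to_csv are all made up for illustration. It relies only on the Python standard library, whose ElementTree module supports a limited subset of XPath and expects well-formed markup; real web pages usually need a more forgiving parser.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical, well-formed page fragment used only for illustration.
PAGE = """
<html><body>
  <table id="books">
    <tr class="book"><td class="title">Dune</td><td class="year">1965</td></tr>
    <tr class="book"><td class="title">Neuromancer</td><td class="year">1984</td></tr>
    <tr class="note"><td>Prices exclude tax</td></tr>
  </table>
</body></html>
"""


def extract_rows(page):
    """Select every <tr class="book"> and read its title and year cells."""
    root = ET.fromstring(page.strip())
    rows = []
    # ".//tr[@class='book']" matches book rows anywhere below the root
    # and skips the row with class "note".
    for tr in root.findall(".//tr[@class='book']"):
        rows.append({
            "title": tr.findtext("td[@class='title']"),
            "year": tr.findtext("td[@class='year']"),
        })
    return rows


def to_csv(rows):
    """Serialize the extracted rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["title", "year"])
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = extract_rows(PAGE)
csv_text = to_csv(rows)
print(csv_text)
```

Saving csv_text to a file with a .csv extension gives something that OpenRefine or Excel can open directly.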

Web Scraping Part II: Working with Python

Part two, on Thursday, November 10, is a deeper dive into the Web Scraping with Python lesson, building on your experience with Python and introducing you to Python libraries, APIs, and object-oriented programming.

What you’ll learn:

  • A brief recap of what web scraping is
  • Write a spider to scrape a website using Python and the Scrapy framework
  • Use popular web-based scraping services (e.g. morph.io)

Everyone is invited, although some familiarity with Python and HTML/XML will be helpful.

Note: Registration is capped at 20 attendees. To register, please email Kim Pham at kim.pham@utoronto.ca with the following details:

  1. Full Name
  2. Contact email
  3. Title (Student, Librarian, Archivist, N/A, etc.)
  4. Organization (if applicable)
  5. Why are you interested in this workshop?
  6. What is your previous experience with web scraping, HTML, XML, and Python?