DCI Lecture: “Unleashing the Infinite Archive”

Please join us at 4pm on March 16th for a lecture in BL 728 (Bissell building, 7th floor) by the inaugural Marshall McLuhan Centenary Fellow in Digital Sustainability, Professor Ian Milligan (Waterloo, History). Milligan’s lecture, “Unleashing the Infinite Archive: Exploring Born-Digital Cultural Heritage at Scale through Interdisciplinary Collaboration,” explores several issues that are central to the DCI’s mission. […]

How are Web Archives Made and Used? A report and discussion on work in progress at the DCI

The DCI invites anyone interested to join us for an informal presentation of our ongoing work with the Marshall McLuhan Centenary Fellow in Digital Sustainability, Ian Milligan, on Thursday February 9 from 12-1 in BL 417. We’ll have a few refreshments and are happy to extend the discussion afterwards. Prof. Ian Milligan, PhD student Emily Maemura […]

Panel Announcement: Studying the Past Through Technology

The Digital Curation Institute is pleased to announce a panel discussion on Thursday, March 2nd at 4pm in Bissell Room 507. All are welcome, and light refreshments will be provided. This panel, “Studying the Past Through Technology: An Interdisciplinary Roundtable,” brings diverse perspectives to bear on the question of how we can study the past […]

Free Software Carpentry workshop in January!

Do you want to use computational techniques in your research? Want to learn the basics of programming in a supportive, inclusive, and hands-on environment? Want to not spend a dollar? On January 26 and 27, the Digital Curation Institute, in collaboration with Software Carpentry, will host a free two-day workshop that will teach basic computing […]

code4lib Web Scraping Workshops

The DCI is pleased to be hosting two workshops by code4lib Toronto on Web Scraping. Web Scraping Part I: In-Browser Scraping and Working with X-Path. Part one, on Thursday, November 3, is an introduction to the concepts, using browser extensions to get started on scraping quickly. No programming experience is required. What you’ll learn: What is […]

Community Maps and Engagement: Lessons from designing a “Green Map”

Dawn Walker. Time and place: Semaphore Demo Room, Robarts first floor (1150), Tuesday, April 19, 9:30am. With the increase in urban agriculture and environmental groups focused on addressing pressing issues they see in their communities, the opportunity exists to better understand what technical barriers individuals face […]

Data Practices: Harnessing Cultural Heritage Data to Support Scholarly Practices

Join us for a talk and discussion with Prof. Unmil Karadkar at the Semaphore Demo Room at Robarts Library on March 17, 4pm-5pm. Abstract: In this talk, I will present two examples of projects that exemplify my research agenda, which synthesizes information-centered studies of scholarly work practices and user experience-centered studies for designing and evaluating software. […]

Web Archive Analytics Workshop: Archiving and Accessing Ten Years of Political Websites

In association with the DCI lecture on October 29, Prof. Ian Milligan is offering a 2-hour hands-on workshop on web archive visualization on October 30.

This workshop uses the Canadian Political Parties and Political Interest Groups collection to trace the web archiving workflow from collection development to analytics. It begins with an introduction by Nicholas Worby, Government Information & Statistics Librarian at the University of Toronto’s Robarts Library, to the Archive-It dashboard and the collection development process, showing how web archiving happens from a librarian’s perspective. With Ian Milligan, a professor of digital history at the University of Waterloo, we then move into the process of accessing, downloading, and interpreting web archival data, from the UK Web Archive’s Shine portal (which allows faceted, n-gram-style searches) to the warcbase platform for text and link analysis.

All software used will be open source, and will include warcbase, Shine, and Gephi.

Time and place: Friday, October 30, 10am-noon at the Semaphore Demo Room in Robarts Library, University of Toronto (room 1150).

Participation is open and free, but you need to register by emailing christoph.becker@utoronto.ca!

How to find it:

[Map: KMDI Semaphore Demo Room]

Ian Milligan: The Challenge of Digital Sources in the Web Age: Common Tensions Across Three Web Histories, 1994-2015

The first DCI lecture in Fall 2015: Prof. Ian Milligan from Waterloo.

Abstract:

The sheer amount of social, cultural, and political information that is generated and, crucially, preserved every day presents exciting new opportunities for historians. A large amount of this information is held in web archives, which contain billions of web pages. Scholars broaching topics dating back to the mid-1990s will find their projects enhanced by web data: military historians can use forum posts by soldiers, social historians can track aspects of everyday life through blogs and comments, political historians can study changing sentiment, tropes, and link structures, and economic historians can explore the rise and fall of business webpages. Yet this tremendous opportunity is tempered to some degree by the sheer challenge of dealing with all that data: we have more information than ever before, but the scale is overwhelming.

We face several common tensions, however, beyond the basic ones of having enough storage and computational power to deal with all of this information. I will focus on two. The first is that while historians largely want to work with content, technological limitations push us towards rich metadata. The second is that without a basic understanding of the conceptual structure of the web archive, from crawl structure to its biases, we can generate wildly misleading results, a problem historians face with most digitized sources.

In this talk, I explore these tensions as they have played out over three case studies: the Internet Archive’s March-December 2011 Wide Web Scrape (WARC files), the 2009 GeoCities end-of-life torrent (a wget-compiled collection of mirrored websites), and the 2005-Present Archive-It collections of Canadian political parties, unions, and organizations (WAT files, which contain derivative data). For each archive, I briefly discuss the usage, technical, and ethical challenges that such collections present for historians: problems of too much data, processing time, and the difficulties in applying cutting-edge natural language processing.

Biography:

Ian Milligan is an assistant professor of digital and Canadian history at the University of Waterloo. There, he is principal investigator of the web archives for historical research group (https://uwaterloo.ca/web-archive-group/), which is supported by an Ontario Early Researcher Award and SSHRC. Milligan serves as a co-editor of the Programming Historian (programminghistorian.org). He has published several articles looking at the impact of born-digital sources on historians and has a forthcoming co-authored book on digital methods, Exploring Big Historical Data: The Historian’s Macroscope, with Imperial College Press. His first book, Rebel Youth: 1960s Labour Unrest, Young Workers, and New Leftists in English Canada, appeared in 2014.

The lecture takes place at 16:00-17:30 on Thursday, 29th of October 2015, in Room 728 (7th floor) at the iSchool, Bissell Building, 140 St. George Street.

Prof. Milligan is also conducting a 2-hour hands-on workshop on web archive visualization on October 30!