No tempest in my teapot: Analysis of Crowdsourced Data and User Experiences at the California Digital Newspaper Collection
Session Type: Presentation/Panel
Session Description:
The National Library of Australia has enabled OCR text correction for its digitized
historical newspapers since they were first put online in mid-2008. By the end of
that year, volunteers from around the globe were correcting newspaper text in
Australia’s Trove collection at the rate of more than 300,000 lines per month.
This rate has since risen to approximately 2,400,000 lines per month (June
2012). The top producer alone has corrected more than 2,250,000 lines.
At the California Digital Newspaper Collection (CDNC), user OCR text correction began in August 2011. As of July 1, 2012, 636 registered volunteers had corrected approximately 370,000 lines of newspaper text, with the most dedicated user correcting more than 150,000 lines.
What motivates Trove and CDNC volunteers? Who are they? In this session we look at
1. Demographics: What are the ages, professions, and genders of text correctors,
and how did they learn about CDNC text correction?
2. Experiences: What do text correctors like about newspapers and about the
text correction?
3. Motivation: What makes users correct text?
4. Quality: OCR of newspaper text is often of very poor quality. What is the
quality of the crowdsourced corrected text?
5. Preferred data: What types of newspaper stories do text correctors
prefer?
6. Economics: What is the estimated economic value of corrected text? What are
the costs of providing a text correction infrastructure?
Finally, we look at the differences and similarities between the Trove and
CDNC collections and volunteers, and other non-newspaper library or archival
collections that have used crowdsourcing.
Session Leaders:
Brian Geiger, California Digital Newspaper Collection, University of
California Riverside
Frederick Zarndt, IFLA Newspapers Section
Session Notes:
View the community reporting Google doc for this session!