Wikipedia is mainly an effort to preserve existing knowledge. One thing Wikipedians like to do is to preserve old photos that have become part of the public domain. This can mean illustrating a Wikipedia article; but another goal is to give everybody direct access to the highest quality version available, to reuse however they see fit.

Your local historical society might take these same public domain photos and sell you a print or a high resolution scan, and might even imply that your reuse is restricted to non-commercial use. Google Books will offer mediocre scans, watermarked on every page with their logo, and again requesting only non-commercial reuse. But Wikimedia’s approach is driven by a desire to empower humanity.

In this video, I demonstrate how Wikipedia’s sister sites, Wikisource and Wikimedia Commons, build on the excellent work of Project Gutenberg and the Internet Archive to make high quality scans of photos from books broadly accessible. This 12-minute video is meant as a general survey of the topic; it’s somewhat technical, but should be accessible to most audiences. In later videos, I’ll delve more deeply into the technical details; so if you’d like to pitch in with these efforts, check back for more detailed instructions, or look through the links below.

I cover a lot of ground in this video. Here’s some background that may help:
- Project Gutenberg (Wikipedia article) started in the early 1970s, and continues to this day. Its original purpose was to transcribe cultural works into a digital format, and it invited volunteers to participate from the start. It takes a highly structured approach to volunteer engagement, as compared to Wikimedia’s “anyone can edit” approach. Also potentially of interest: Distributed Proofreaders
- The Internet Archive (Wikipedia article), founded in 1996, carried this a step further, by creating high quality scans of public domain books, and making them freely available. Many of them have optical character recognition (OCR) applied, so that the text is recognized, searchable, and selectable — at least approximately — by a computer algorithm. While unaffiliated volunteers can upload files to archive.org, there is no volunteer curation or improvement after upload; it’s primarily a centrally managed database, built by paid staff.
- Wikisource and Wikimedia Commons are both “projects” in the Wikimedia family of collaborative web sites, along with Wikipedia. Hundreds of thousands of people volunteer their time on these projects; some work actively on multiple projects.
- Wikimedia Commons is a media repository; one of its main purposes is to host images, videos, etc. that are embedded in Wikipedia articles, Wikisource texts, etc. When a file is on Commons, it is available to all Wikimedia projects; when it’s on Wikisource, it is only available for embedding on Wikisource.
- Beginner’s guide to adding texts to Wikisource
- HesperianBot, a script that will dig into the Internet Archive’s holdings to find high quality page scans
- Commons Helper, a tool for moving files from a site like Wikisource to Wikimedia Commons
- A user script by Superm401 that makes Commons Helper easier to access (requires a little wiki know-how; see also this related discussion)
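
For readers curious what "digging into the Internet Archive’s holdings" can look like in practice, here is a minimal sketch in Python. It is not HesperianBot’s code, and I have not based it on that script. It assumes that an item’s JSON metadata record (served at `https://archive.org/metadata/<identifier>`) lists its files with a `format` label, and that labels such as "Single Page Processed JP2 ZIP" mark the bundles of full-resolution page images. The identifier and file names in the sample are hypothetical, and the sample record is hard-coded so the sketch runs without a network connection.

```python
from typing import List
from urllib.parse import quote

# Assumption: format labels that typically mark page-image bundles on archive.org.
SCAN_FORMATS = ("Single Page Processed JP2 ZIP", "Single Page Original JP2 Tar")


def metadata_url(identifier: str) -> str:
    """Build the URL of the JSON metadata record for one Internet Archive item."""
    return f"https://archive.org/metadata/{quote(identifier)}"


def scan_download_urls(identifier: str, metadata: dict) -> List[str]:
    """Return download URLs for files whose format label suggests page scans."""
    urls = []
    for entry in metadata.get("files", []):
        if entry.get("format") in SCAN_FORMATS:
            urls.append(
                f"https://archive.org/download/{quote(identifier)}/{quote(entry['name'])}"
            )
    return urls


# Hypothetical metadata record, trimmed to the two fields used above.
sample = {
    "files": [
        {"name": "examplebook00auth.pdf", "format": "Text PDF"},
        {"name": "examplebook00auth_jp2.zip", "format": "Single Page Processed JP2 ZIP"},
    ]
}

print(metadata_url("examplebook00auth"))
print(scan_download_urls("examplebook00auth", sample))
```

In real use, the `sample` dictionary would be replaced by the JSON fetched from the metadata URL. The page images inside the downloaded bundle would still need to be extracted and cropped before uploading anything to Wikimedia Commons.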