Steps in Digitizing a Document ...


What does it take to get a document digitized and published online here at the Rudolf Steiner Archive? Here are the steps:

  [Image: From this ...]
  • Locate and acquire the document, either by purchasing it or by obtaining it from a library (see image).
  • Scan each page, saving it as a computer file (preferably, but not necessarily, a TIF [Tagged Image File Format] file). At this point each page is a graphic image, like a photo.
  • Run the files through OCR (Optical Character Recognition) software, which converts the alphabetic characters it finds in each image into actual “text” characters, producing text files. Accuracy ranges from 95% recognition for very clear documents to no recognition at all for some old manuscripts, which have to be keyed in (typed) by hand.
  • Proofread and correct each text file, comparing it against the original document (preferred) or the scanned images, and save the result as revised file(s). This includes:
    • correcting typographic errors, whether caused by OCR inaccuracy or present in the original document (it happens!),
    • verifying special characters, especially left and right quotation marks and diacritical marks such as umlauts.
  • Proofread to locate all footnotes and graphics (e.g., diagrams, drawings) in order to place them correctly in the online version.
  • Proofread to locate all references to other items available online in order to set up cross-reference links.
  • Convert to HTML. A single lecture becomes a single file. A book or collection needs multiple files, including the cover image, contents, prefaces, appendices, synopses, notes, footnotes, and cross-references. Much of this is automated, but the human eye is still needed, and a lot of it must be done manually.
  • Not all browsers are equal! It takes quite a bit of work to make a document render at least approximately the same way in every browser. What looks fine in one browser may look terrible in another, and fixing it in the second browser can break it in the first. We recommend Firefox!
  [Image: ... to this.]
  • Load the document into the database(s), cross-reference it with other documents, and create the index, keywords, and other information needed for our database and the search/research tools we have created.
  • Publish on the website (Whew! See image.)
  • From initial scanning to final appearance online, a 10-page lecture can take anywhere from one to eight hours. For a collection or book, handling all the indexing, cross-referencing, and formatting can take 10-50% more time. Graphics and diagrams can also take a lot of work to clean up after they have been scanned. Some of our materials are original typewritten manuscripts on very fragile, yellowed paper and are nearly impossible to process with OCR. Currently, there are 788 on-site volumes and 2896 individual documents here at the Archive!
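
Part of the special-character check in the proofreading step can be automated. The following is a minimal Python sketch, assuming the OCR output is plain Unicode text; the function name `flag_special_characters` and the sample text are hypothetical and are not part of the Archive's actual tools. It flags lines containing straight ASCII quotes, which may need to become left/right quotation marks, and lines where a diacritic such as an umlaut is stored as a base letter plus a combining mark instead of a single precomposed character.

```python
import unicodedata

# Straight ASCII quotes that OCR often emits where the printed page has
# typographic (left/right) quotation marks or apostrophes.
STRAIGHT_QUOTES = {'"', "'"}


def flag_special_characters(text):
    """Return (line_number, message) pairs a proofreader should check.

    Hypothetical helper for illustration. It flags:
      * straight quotes, which may need to become left/right quotes, and
      * lines not in Unicode NFC form, e.g. an umlaut stored as
        'u' + combining diaeresis (U+0308) instead of the single 'ü'.
    """
    findings = []
    for number, line in enumerate(text.splitlines(), start=1):
        if any(ch in STRAIGHT_QUOTES for ch in line):
            findings.append((number, "straight quote: use left/right quotation marks"))
        # NFC composes base letter + combining mark into one character,
        # so a line that changes under NFC contains a decomposed diacritic.
        if unicodedata.normalize("NFC", line) != line:
            findings.append((number, "decomposed diacritic: normalize to NFC"))
    return findings


if __name__ == "__main__":
    # Line 1: straight quotes; line 2: decomposed umlaut; line 3: already clean.
    sample = 'He said "Geisteswissenschaft".\nu\u0308ber die Seele\n“Already fine”, über'
    for number, message in flag_special_characters(sample):
        print(number, message)
```

The sketch only narrows down where to look; each flagged line still has to be checked against the original page. It also flags straight apostrophes (as in it's), since typeset text normally uses the typographic apostrophe ’.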

Most of the digitizing project is done in-house, but we have wonderful volunteers all over the world who acquire and scan documents, run the scans through their own OCR software (if they have it), and send us the resulting files. The final proofreading, cross-referencing, creation of HTML files, setup for our databases and tools, and online publication are all done in-house. And, of course, we provide the heavy-duty servers and broadband to make it all available to the world.

Our Search and Research Tools and Database Management

Jim Stewart has designed and created the online tools (the database, search capability, keyword indexing, cross-referencing, etc.) that enable users to access and research the online documents with ease. This has been an ongoing project for almost 30 years.
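
Keyword indexing of this kind is commonly built on an inverted index, which maps each word to the set of documents containing it. The sketch below is a minimal Python illustration under that assumption only; it does not describe how the Archive's own database works, and the function names `build_keyword_index` and `search` and the sample document IDs are hypothetical.

```python
import re
from collections import defaultdict


def build_keyword_index(documents):
    """Map each lowercase word to the set of document IDs containing it.

    `documents` is a dict of {doc_id: text}. The IDs used in the example
    below are hypothetical, not entries from the Archive's database.
    """
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in re.findall(r"\w+", text.lower()):
            index[word].add(doc_id)
    return index


def search(index, query):
    """Return sorted IDs of documents that contain every word in `query`."""
    words = re.findall(r"\w+", query.lower())
    if not words:
        return []
    # index.get() avoids inserting empty entries into the defaultdict.
    matches = set.intersection(*(index.get(w, set()) for w in words))
    return sorted(matches)


if __name__ == "__main__":
    docs = {
        "lecture-1": "The Philosophy of Freedom and the act of thinking",
        "lecture-2": "Thinking, feeling and willing",
        "book-1": "Knowledge of higher worlds",
    }
    idx = build_keyword_index(docs)
    print(search(idx, "thinking"))          # ['lecture-1', 'lecture-2']
    print(search(idx, "Thinking feeling"))  # ['lecture-2']
```

A real search tool would add stemming, stop-word handling, ranking, and phrase search; the sketch shows only the core lookup, in which a multi-word query returns the documents that contain every word.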

How to Help

We have a tremendous backlog of materials we want to get online, and so many irreplaceable resources worldwide are at risk! If you can afford to donate even a little to support this initiative, you will be helping to save these works and to make the information available to many others! Please check our Donation and Appeal pages to see how you can help!


  

  

      




The Rudolf Steiner Archive is maintained by the e.Librarian, James Stewart.
Copyright © The e.Lib, Inc. MCMLXXX thru MMXVI.




Page last updated on Sunday February 22, 2015 at 09:40:37. 


