Million Book
Although it was established only recently, ISIS already has several digital projects in progress. The Million Book Project is a joint scheme between the Bibliotheca Alexandrina (BA) and over twenty international institutions, ranging from universities to information institutes to development corporations, in the USA, China and India. This project realizes BA's aim of using digital technology to make the works of man permanently accessible to billions of people all over the world. Through a multi-step process, the project makes a free-to-read, searchable collection of one million books continuously available over the Internet. Through this, BA will achieve the goal of becoming a universal digital library and will improve global society by making a vast range of knowledge available.
One of the key activities is to work with libraries, universities and institutions worldwide that can adopt this model of exchange and/or donate some of their collections, either in digital form or by sending them for digitization and receiving them back. This would include books and journals as well as theses and research reports.
The long-term goal of the Million Book Project is to capture all published books in digital form, starting with a short-term aim of digitizing one million books (less than 1% of all books in all languages ever published) by the year 2005. The project was initiated at Carnegie Mellon University and will reach this short-term aim by dividing the work among several centers in different countries. The first million-book collection will then be assembled by swapping the digitized books produced by the different partners. This method not only shares the resources of different countries and divides the work among them, but also has the desirable feature that each partner holds a local mirror site of the million digitized books, thus guaranteeing fast access as well as reliability and availability.
The project involves a long and intricate process, which begins with identifying and selecting the books to be scanned. Books are chosen according to strict criteria: they must be works that are not widely available (out-of-print books, non-copyrighted works and government documents), children’s books, or science and technology dissertations.
The second phase is the scanning process, which comprises the actual scanning, image processing and quality control, and Optical Character Recognition. Five scanning stations were donated by Carnegie Mellon University and have been operational since October 2003. Thirty digital laboratory specialists have been recruited and trained; they work seven days a week in two shifts per day. The target is to scan and process 5000 pages/day/scanner (approximately 50 books/day).
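The scanning target can be checked with a little arithmetic. The sketch below uses only the figures quoted above (5000 pages per scanner per day, five scanners, roughly 50 books per day). The text does not say whether the 50 books are per scanner or for the whole laboratory, so both readings are computed; the function names are illustrative and not part of any project software.

```python
# Figures quoted in the text.
PAGES_PER_SCANNER_PER_DAY = 5000
SCANNERS = 5
BOOKS_PER_DAY = 50  # the text does not say: per scanner, or in total?


def pages_per_day(pages_per_scanner: int, scanners: int) -> int:
    """Total pages processed per day across all scanning stations."""
    return pages_per_scanner * scanners


def implied_pages_per_book(pages: int, books: int) -> float:
    """Average book length implied by a page count and a book count."""
    return pages / books


total_pages = pages_per_day(PAGES_PER_SCANNER_PER_DAY, SCANNERS)

# Reading 1: 50 books/day refers to a single scanner.
per_scanner_reading = implied_pages_per_book(PAGES_PER_SCANNER_PER_DAY, BOOKS_PER_DAY)

# Reading 2: 50 books/day refers to all five scanners together.
total_reading = implied_pages_per_book(total_pages, BOOKS_PER_DAY)

print(f"Pages per day, all scanners: {total_pages}")
print(f"Implied pages per book (per-scanner reading): {per_scanner_reading:.0f}")
print(f"Implied pages per book (whole-lab reading): {total_reading:.0f}")
```

Under the first reading the average book would be about 100 pages long; under the second, about 500 pages.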
Image Processing & Quality Control is the next step in the scanning process: a combination of manual and automated image processing tools is used to enhance the quality of the scanned images. The process mainly involves removing noise, reducing file size, trimming extra white space and margins, and correcting page curvature.
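One of these clean-up steps, trimming white margins, can be illustrated with a minimal sketch. It is not the tool used in the project: it assumes a page is held as a rectangular grid of grayscale values (0 for black, 255 for white), and the function name and the threshold of 250 are hypothetical choices made for the example.

```python
from typing import Iterable, List

Image = List[List[int]]  # rows of grayscale pixels, 0 = black, 255 = white


def crop_white_margins(image: Image, white_threshold: int = 250) -> Image:
    """Return the smallest rectangle that contains every non-white pixel."""

    def has_ink(values: Iterable[int]) -> bool:
        # A pixel darker than the threshold counts as ink.
        return any(v < white_threshold for v in values)

    rows = [i for i, row in enumerate(image) if has_ink(row)]
    if not rows:
        return []  # blank page: nothing to keep

    width = len(image[0])
    cols = [j for j in range(width) if has_ink(row[j] for row in image)]

    top, bottom = rows[0], rows[-1]
    left, right = cols[0], cols[-1]
    return [row[left:right + 1] for row in image[top:bottom + 1]]


# A tiny 4 x 5 "page" with a white border around three dark pixels.
page = [
    [255, 255, 255, 255, 255],
    [255,  10, 255,  30, 255],
    [255, 255,  20, 255, 255],
    [255, 255, 255, 255, 255],
]
cropped = crop_white_margins(page)
for row in cropped:
    print(row)
```

In this toy case the page shrinks from 20 pixels to 6, which shows how trimming margins also contributes to reducing file size.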
Optical Character Recognition (OCR) is the third step in the scanning process; it provides support for full-text indexing and searching. The availability of online search allows users to locate relevant information quickly and reliably, thus enhancing students’ success in their research endeavors. This resource would also provide an excellent test bed for language processing research in areas such as machine translation, summarization, intelligent indexing, and information retrieval. Usability studies are also to be conducted to ensure that the materials are easy to locate, navigate, and use. Appropriate metadata for navigation and management will also be created. The final stage of the project is the publishing stage. The result will be a unique resource accessible to anyone in the world 24 hours a day, 7 days a week, without regard to nationality or socioeconomic background.
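The full-text indexing and searching that OCR makes possible can be illustrated with a small inverted index, which maps each word to the pages on which it occurs. This is a sketch under stated assumptions, not the project's search system: the page identifiers and sample sentences are invented, and the tokenizer handles only unaccented Latin letters and digits, whereas a multilingual collection would need language-aware tokenization.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set

WORD = re.compile(r"[a-z0-9]+")  # simplification: ASCII letters and digits only


def build_index(pages: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map every word in the OCR text to the set of pages containing it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for page_id, text in pages.items():
        for word in WORD.findall(text.lower()):
            index[word].add(page_id)
    return index


def search(index: Dict[str, Set[str]], query: str) -> List[str]:
    """Return the pages that contain all of the words in the query."""
    words = WORD.findall(query.lower())
    if not words:
        return []
    hits = set.intersection(*(index.get(w, set()) for w in words))
    return sorted(hits)


# Hypothetical OCR output, keyed by invented page identifiers.
ocr_pages = {
    "book1-p1": "The library of Alexandria was a great library.",
    "book1-p2": "Scanning stations digitize rare books.",
    "book2-p1": "A digital library keeps rare books available.",
}

index = build_index(ocr_pages)
print(search(index, "rare books"))
print(search(index, "digital library"))
```

Requiring every query word to appear on a page keeps the example short; ranking results by relevance is a separate problem.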