Concepts of Digitization and Digital Library
As defined by Reitz (2008), digitization is “the process of converting data to digital format for processing by a computer. In information systems, digitization usually refers to the conversion of printed text or images (photographs, illustrations, maps, etc.) into binary signals using some kind of scanning device that enables the result to be displayed on a computer screen.” She also defines a digital library as a “library in which a significant proportion of the resources are available in machine-readable format (as opposed to print or micro-form), accessible by means of computers”.
Digital libraries store, organize and disseminate digital content. This content is created in one of three ways: by digitizing existing printed materials and media documents, by re-keying or re-composing existing printed materials and media documents, or by creating new documents directly in digital formats. Documents of the first kind are known as digitized documents, and those of the last kind are known as born-digital documents. Both kinds are available in Indian digital libraries. Digitized documents are stored in either image formats or text formats. If the original documents are in European languages such as English, French, German and Spanish, optical character recognition (OCR) software can automatically convert them into searchable digital text, because OCR accuracy for these languages is much higher. If, on the other hand, the original documents are in Indian languages such as Sanskrit, Hindi, Bengali, Oriya, Telugu and Tamil, the content is included in digital libraries either as images or as re-keyed text. OCR software for Indian languages is still at the development or testing stage, and its conversion accuracy remains well below an acceptable level. Full-text searching is possible in text documents but not in image documents.
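The difference between image-only and text-bearing digitized documents can be illustrated with a minimal Python sketch. The class name `DigitizedDocument`, its fields, the file paths and the sample records are all hypothetical and chosen only for illustration; they do not describe any particular digital library system.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DigitizedDocument:
    """Hypothetical record for one digitized item."""
    doc_id: str
    language: str
    image_path: str              # the scanned page image
    text: Optional[str] = None   # OCR output or re-keyed text; None if image-only


def full_text_search(documents: List[DigitizedDocument], query: str) -> List[str]:
    """Return the ids of documents whose text contains the query.

    Image-only documents (text is None) can never match, which is why
    full-text searching is unavailable for them.
    """
    needle = query.casefold()
    return [
        d.doc_id
        for d in documents
        if d.text is not None and needle in d.text.casefold()
    ]


# Illustrative sample collection (invented data).
collection = [
    DigitizedDocument("en-001", "English", "scans/en-001.tif",
                      text="Annual report of the public library"),
    DigitizedDocument("bn-001", "Bengali", "scans/bn-001.tif"),  # image only
    DigitizedDocument("hi-001", "Hindi", "scans/hi-001.tif",
                      text="पुस्तकालय का इतिहास"),  # re-keyed text
]

print(full_text_search(collection, "library"))    # ['en-001']
print(full_text_search(collection, "पुस्तकालय"))  # ['hi-001']
```

In this sketch the Bengali item can still be displayed from its image, but it is reachable only through its catalogue record, not through a search of its contents.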
Documents and Collections in Digital Library Systems
A digital library is a body of information stored digitally and made accessible to users through digital systems and networks, without a single physical location. It is therefore analogous to a library as a storehouse of information, but it exists virtually, in digital space. A digital library is essentially a fully automated information system with all of its resources in digital form. Many views of digital libraries stem from what libraries currently do. Traditional libraries collect, organize, provide access to, and preserve the objects in their collections. A library collection may include books, magazines, journals, theses, dissertations, manuscripts, audio-visual materials, maps, etc. The flexibility of digital technology allows it to handle new kinds of objects efficiently. Digital library collections can include things without direct physical analogs, such as algorithms or real-time data feeds. They may also include digitized representations of objects that have traditionally been held largely in museums and archives. With the rising cost of paper publications and library storage, the increasing use of computers, and decreasing budgets, many libraries have had to reduce their acquisition of books as well as their journal subscriptions. Documents in electronic form can become more easily available and more widely used because the cost of digital storage and processing is falling.
Documents are the heart of digital libraries; without documents there would be no digital libraries. In digital libraries, documents include not only what traditional libraries store (e.g., books, journals, pictures and videos) but also many works uncommon in those libraries: multilingual, multimedia, and structured documents (e.g., books broken into chapters, sections, subsections, figures with attached captions, colour graphics or images, attached or linked sound or video files, appendices, indexes, and ‘front matter’), as well as programs, algorithms, and bulletin board archives, among others. A document can have various representations depending on its intended use; for example, some applications require high-resolution images of documents with invisible watermarks for security purposes as well as low-resolution images for children to download from the Internet. Digital library collections range from small, self-contained, and narrowly defined collections to ones spread across physical and logical spaces. One of the common requirements for a digital library is the ability to deal with distributed collections of information.
Evaluation of Digitization Work and Digital Library System
A digital library may be evaluated from a number of perspectives, such as collaboration pattern, system, access and usability, user interfaces, information retrieval, content and domain, services, cost, and overall benefits and impact. An important issue under discussion across various communities is the set of metrics to be used for evaluating digital libraries. Digital library metrics should be selected from both system-oriented and user-oriented viewpoints. From the system’s perspective, we consider capacity (number of digital objects stored and number of users served simultaneously), content, and transaction speed (speed of search response). From the user’s perspective, we consider the impact of the system on the user (e.g., impact on patterns of association and attitudes toward digital libraries), effectiveness (relevance of the results; ability to produce a ranked list of results that are mostly relevant, with the best matches at the top), usability (e.g., ease of use, suitability to purpose, user’s effort), interactions with the system, and user satisfaction.
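Two of these metrics lend themselves to simple computation. The following Python sketch shows one possible way to quantify effectiveness (whether relevant results appear near the top of a ranked list) and transaction speed (mean search response time). Precision at k and average precision are standard information retrieval measures used here as an assumption; the text above does not prescribe any specific formula, and the function names and sample data are hypothetical.

```python
from typing import List, Sequence, Set


def precision_at_k(ranked_ids: Sequence[str], relevant_ids: Set[str], k: int) -> float:
    """Fraction of the top-k results that are relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for doc_id in ranked_ids[:k] if doc_id in relevant_ids)
    return hits / k


def average_precision(ranked_ids: Sequence[str], relevant_ids: Set[str]) -> float:
    """Average of the precision values at each rank holding a relevant result.

    The score is higher when relevant documents appear earlier in the list.
    """
    if not relevant_ids:
        return 0.0
    hits = 0
    total = 0.0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            hits += 1
            total += hits / rank
    return total / len(relevant_ids)


def mean_response_time(latencies_seconds: List[float]) -> float:
    """Mean search response time, a simple proxy for transaction speed."""
    if not latencies_seconds:
        raise ValueError("no latencies recorded")
    return sum(latencies_seconds) / len(latencies_seconds)


# Invented example: one query, five ranked results, three judged relevant.
ranked = ["d3", "d1", "d7", "d2", "d9"]
relevant = {"d1", "d2", "d3"}

print(round(precision_at_k(ranked, relevant, 3), 3))   # 0.667
print(round(average_precision(ranked, relevant), 3))   # 0.917
print(round(mean_response_time([0.2, 0.4]), 3))        # 0.3
```

User-oriented constructs such as satisfaction or ease of use cannot be computed this way and would typically require surveys or observation.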
In a general way, the constructs or elements for evaluation of digitization projects covered in this study are:
- Collaboration pattern for collection building;
- Collaboration pattern for resource mobilization and utilization;
- Selection of contents for digitization;
- Digitization workflow;
- Interpretation, representation and metadata;
- Access and distribution — open access versus campus-wide (closed) access;
- User interfaces — search and retrieval; and
- Integration, cooperation with other resources and libraries.