LIS Project

Building A UNL-based Library Information System

The UNL-LIS is an application for retrieving the metadata of books held in a library data store. It lets users access this information in their own native language, regardless of the language in which it was originally written. There are two versions of the UNL-LIS:

The First Version

Bibliotheca Alexandrina took part in the World Summit on the Information Society (WSIS) with this version. In this version, users search book catalogs by entering a query in their natural language about metadata such as the title or the author of a certain book, or the books by a certain author. The query is then Enconverted into UNL, and the search is performed on the collection of books residing in the system. Search results contain the metadata of the books matching the query. These results are then Deconverted by calling the corresponding language server and displayed to the user in his or her native language.

The automatic process starts by extracting the MARC21 records from the BA Library Information System (BA-LIS). These records are parsed using the MARC21 parser. Based on the parsed information, the Master Definitions (MDs) and the UNL documents associated with each book are generated. Finally, the KB Builder generates the files needed by the book browsing application.
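As a rough illustration of the parsing step, the sketch below pulls the title, author and publisher out of one MARC21-style record. The in-memory record layout (a list of tag/subfield pairs), the function name extract_metadata and the sample values are assumptions made for this example; the actual BA-LIS parser is not described here. The tags used (245 $a for title, 100 $a for main author, 260 $b for publisher) are the standard MARC21 bibliographic tags.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical in-memory form of one parsed record: (tag, {subfield code: value}).
Field = Tuple[str, Dict[str, str]]

# Standard MARC21 bibliographic tags and subfields for the metadata of interest.
WANTED = {
    "title": ("245", "a"),
    "author": ("100", "a"),
    "publisher": ("260", "b"),
}


def extract_metadata(record: List[Field]) -> Dict[str, Optional[str]]:
    """Return the first value found for each wanted tag/subfield, else None."""
    metadata: Dict[str, Optional[str]] = {name: None for name in WANTED}
    for tag, subfields in record:
        for name, (wanted_tag, code) in WANTED.items():
            if tag == wanted_tag and metadata[name] is None and code in subfields:
                # MARC values often end in ISBD punctuation such as " /" or ",".
                metadata[name] = subfields[code].rstrip(" /:,;")
    return metadata


# Invented sample record, for illustration only.
sample = [
    ("100", {"a": "Doe, Jane,"}),
    ("245", {"a": "Example title /", "c": "Jane Doe."}),
    ("260", {"a": "Cairo :", "b": "Example Press,", "c": "2005."}),
]
print(extract_metadata(sample))
```

Fields that are absent from a record simply come back as None, which leaves room for the manual verification described for the second version.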

The Second Version

The workflow of the second version of the UNL-LIS starts with manually extracting the metadata of books from the MARC21 records and verifying it. This metadata is then ready for semantic analysis (Enconversion) to determine the UNL relations between the words of each title. The UNL expressions of the titles are generated with the help of the Enconversion grammar and the UW dictionary stored in the language server of the respective language.

Afterwards, a UNL specialist checks the validity of the resulting UNL expressions. A UNL expression may contain a wrong relation or an undefined UW. If a relation is invalid, the specialist fixes it either manually or by modifying the Enconversion rules. If a UW is not included in the dictionary, it is marked as pending until a UNL-UW specialist defines it and inserts it in the dictionary, which helps the Enconverter output a valid UNL expression. The output UNL expressions are finally stored as a UNL book record, ready for Deconversion into any natural language.

After the UNL expressions have been verified, they pass through the Universal Words checker, which checks whether any UNL expression contains a new UW. If so, the new UWs are defined and inserted in the UW dictionary of the target language. The UNL expressions are then ready for Deconversion using the UNL Deconverter, the UNL-Target Language Dictionary and the Deconversion rules stored in the language server.
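The check for new UWs can be pictured with a small sketch. It assumes that each UNL expression is given as a list of binary relations written as rel(uw1, uw2), that attributes follow a UW as ".@name", and that the dictionary is a plain set of UW strings. The function names and the sample UWs are invented for illustration and are not taken from the actual Universal Words checker; scopes and other parts of the UNL notation are ignored.

```python
from typing import Iterable, List, Set


def split_relation(line: str) -> List[str]:
    """Split 'rel(arg1, arg2)' into its two arguments, respecting nested parentheses."""
    body = line[line.index("(") + 1 : line.rindex(")")]
    depth = 0
    for i, ch in enumerate(body):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == "," and depth == 0:
            # Only a comma outside any UW constraint list separates the arguments.
            return [body[:i].strip(), body[i + 1 :].strip()]
    raise ValueError(f"not a binary relation: {line!r}")


def strip_attributes(arg: str) -> str:
    """Drop trailing attributes such as '.@entry' from a UW."""
    return arg.split(".@", 1)[0]


def find_new_uws(expression: Iterable[str], dictionary: Set[str]) -> List[str]:
    """List, in order of first appearance, the UWs that the dictionary lacks."""
    new_uws: List[str] = []
    for line in expression:
        for arg in split_relation(line):
            uw = strip_attributes(arg)
            if uw not in dictionary and uw not in new_uws:
                new_uws.append(uw)
    return new_uws


# Invented sample data, for illustration only.
expression = ["mod(history(icl>past).@entry, library(icl>facility))"]
dictionary = {"history(icl>past)"}
print(find_new_uws(expression, dictionary))
```

Any UW returned by find_new_uws would be a candidate for definition and insertion in the target-language UW dictionary before Deconversion.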

After a UNL expression is Deconverted into the target language, a librarian working as an NL-Validator checks whether the Deconverted title is suitable for the book. If it is, it is stored as a book record in the target language. If not, the librarian returns it to the UNL specialist, and it is Deconverted again until the title is a valid natural-language book record in the target language.

Specialized versions of the dictionary and the grammar have been built in order to Enconvert the selected book titles. The LIS Specialized Dictionary was built by first extracting the words in the selected titles and choosing the UWs that accurately represent their meanings, and then assigning them the appropriate linguistic features. A specialized version of the Enconversion grammar has also been developed to deal with the linguistic phenomena found in the book titles.

Advantages of the Second Version:

ISAUC has completed the design and implementation of a prototype LIS capable of translating the metadata of books into the six official United Nations languages (Arabic, Chinese, English, French, Russian, and Spanish) as well as Portuguese. This system lets users either browse the available book titles or search the database using any item of a book's information (its publisher, author, classification, etc.), along with keywords to narrow the possible results.

Moreover, the application can show statistical information on the stored books and their metadata, such as the language they are written in. An editing option has also been designed for librarians to facilitate the cataloging of books. Using this option, the librarian can add an authority, merge existing authorities, add or edit book metadata and link books with their human translations, where available. The application can also merge synonymous authority values that differ because of different conventions of writing, abbreviation and name ordering.
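How the application detects synonymous authority values is not described here. One possible approach, sketched below purely as an assumption, is to reduce each name to a key that ignores case, punctuation and word order, and then group the names that share a key. The function names and sample names are invented for illustration. This sketch covers differences in writing and name ordering only; abbreviated forms such as initials would need additional rules.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List


def authority_key(name: str) -> str:
    """Build a key that ignores case, punctuation and word order."""
    tokens = re.findall(r"\w+", name.lower())
    return " ".join(sorted(tokens))


def group_authorities(names: Iterable[str]) -> Dict[str, List[str]]:
    """Group name variants by key and keep only groups with several spellings."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for name in names:
        groups[authority_key(name)].append(name)
    # Groups with more than one variant are merge candidates for a librarian to review.
    return {key: variants for key, variants in groups.items() if len(variants) > 1}


# Invented sample names, for illustration only.
names = ["Doe, Jane", "Jane Doe", "DOE, JANE.", "Smith, John"]
print(group_authorities(names))
```

Returning candidates rather than merging automatically keeps the final decision with the librarian, in line with the editing option described above.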

The system can also provide the user with statistical information about the number of books that have been stored, translated and verified, and about their authors, publishers, subjects, etc.

Finally, the system provides users with statistics about the Enconversion and Deconversion processes that show the translation status, such as the percentage of data not yet Enconverted, data Enconverted but not yet validated, and data already Deconverted.
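The status percentages mentioned above amount to a simple count over the stored records. The sketch below assumes each record carries a single status label; the label strings and the function name are hypothetical, since the way the system stores translation status is not described here.

```python
from collections import Counter
from typing import Dict, Iterable


def status_percentages(statuses: Iterable[str]) -> Dict[str, float]:
    """Return the percentage of records in each status, rounded to one decimal."""
    counts = Counter(statuses)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {status: round(100.0 * n / total, 1) for status, n in counts.items()}


# Hypothetical status labels, one per stored record.
records = [
    "not_enconverted",
    "enconverted_unvalidated",
    "deconverted",
    "deconverted",
]
print(status_percentages(records))
```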