The Arabic-UNL Dictionary
The dictionary has been carefully designed so that its content and format support the morphological, syntactic and semantic analysis and generation required for both Arabic EnConversion and DeConversion. The Arabic-UNL dictionary stores three types of linguistic information:
- Morphological information: The information responsible for the correctness of the sentences’ morphology.
- Syntactic information: The information responsible for generating well-formed Arabic structures.
- Semantic information: Information about the semantic classification of words, which allows the semantic information in graphs to be mapped to the syntactic structures of the generated sentences.
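The three information types listed above can be pictured as three groups of features attached to each dictionary entry. The following is only a minimal sketch under stated assumptions: the class name `DictionaryEntry`, its field names, the feature labels, and the sample words and concept strings are hypothetical and chosen for illustration. They do not reproduce the actual Arabic-UNL dictionary format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DictionaryEntry:
    """Hypothetical entry layout; all field names are illustrative assumptions."""
    headword: str          # Arabic word form
    universal_word: str    # concept the word form is linked to
    morphological: frozenset = frozenset()  # e.g. number, gender
    syntactic: frozenset = frozenset()      # e.g. part of speech
    semantic: frozenset = frozenset()       # e.g. semantic class


def entries_for_concept(entries, universal_word):
    """Return every entry linked to the given concept."""
    return [e for e in entries if e.universal_word == universal_word]


# Sample data invented for this sketch, not taken from the real dictionary.
entries = [
    DictionaryEntry(
        headword="كتاب",
        universal_word="book(icl>publication)",
        morphological=frozenset({"singular", "masculine"}),
        syntactic=frozenset({"noun"}),
        semantic=frozenset({"concrete_thing"}),
    ),
    DictionaryEntry(
        headword="كتب",
        universal_word="book(icl>publication)",
        morphological=frozenset({"plural", "masculine"}),
        syntactic=frozenset({"noun"}),
        semantic=frozenset({"concrete_thing"}),
    ),
    DictionaryEntry(
        headword="قلم",
        universal_word="pen(icl>writing_implement)",
        morphological=frozenset({"singular", "masculine"}),
        syntactic=frozenset({"noun"}),
        semantic=frozenset({"concrete_thing"}),
    ),
]

book_entries = entries_for_concept(entries, "book(icl>publication)")
print([e.headword for e in book_entries])
```

In this sketch, several entries can point to the same concept, which is one way a dictionary can hold more entries than concepts.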
The Sources Used:
Different sources have been used to expand the Arabic-UNL dictionary:
- The English WordNet: It was chosen because of the large number of concepts it contains and the information it provides for each one (glosses, examples, etc.).
- The International Corpus of Arabic (ICA).
The General Dictionary currently contains 140,000 entries representing 80,000 concepts.
In addition to the General Dictionary, many specialized dictionaries have been built:
- The specialized dictionary for the Encyclopedia of Life Support Systems (EOLSS), which contains 27,100 entries.
- The specialized dictionary for the UNL-based Library Information System (UNL-LIS), which contains 4,520 entries.
The Arabic UNL Dictionary:
The size of the main general dictionary reached 99,908 entries representing 52,572 universal concepts.