Digital Maktaba Project: Proposing a Metadata-Driven Framework for Arabic Library Digitization
Gagliardelli L.;
2025-01-01
Abstract
The rapid digitization of cultural heritage has underscored the need for robust digital libraries, particularly for underrepresented languages such as Arabic and Persian. This paper describes the methodologies and challenges involved in developing a metadata-driven Arabic digital library that uses bibliographic metadata extracted from the Diamond catalogue. It explores metadata schemas such as Dublin Core and integrates text recognition technologies and preservation strategies to address accessibility, scholarly use, and the long-term preservation of Arabic-script texts. The paper examines specific challenges of processing Arabic script, including calligraphy, diacritics, and ligatures, and introduces solutions such as the use of frontispiece images to train OCR systems. It also discusses how integrated metadata could both enhance text recognition and improve user engagement by enabling refined search and better resource discovery. Finally, the paper outlines future directions for expanding metadata frameworks to ensure interoperability and the long-term preservation of cultural heritage.
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.