Digital Libraries and Digital Humanities

The National Library of the Czech Republic and Digital Humanities

In the second decade of the 21st century, the National Library of the Czech Republic (NK CR) started exploring ways to make data from its vast digitised archives available to users. The acquisition of digital sources and the digitisation of existing collections provide a significant amount of resources for the digital humanities and quantitative research. At the same time, these vast riches raise the question of how best, both methodologically and practically, to support research projects and play a helpful role in basic as well as applied research. For this reason, the NK CR carries out projects that support the capabilities of researchers at all levels, for example through research on specific topics (e.g. authorship) and, above all, through the preparation of backbone infrastructure that provides researchers and the public with data from Internet sources and digitised books, which is the focus of the DL4DH project.

The DL4DH Project

The DL4DH project – The development of tools for the more effective use and mining of digital library data to enhance digital humanities research (DG20P02OVV002), implemented in 2020–2022, focuses on mining digital library content. The Library of the Academy of Sciences of the Czech Republic (KNAV), the National Library of the Czech Republic (NK CR), and the Moravian Library in Brno (MZK) took part in this project in cooperation with other humanities experts, who are also largely involved in the Czech Association for Digital Humanities, z. s.

As part of the project, tools have been created that enable humanities researchers to make better use of Czech digital libraries, which contain a significant amount of digitised data. The project has prepared software tools that work as a superstructure over the Kramerius system, the most widely used presentation software for digitised data in libraries and other memory institutions in the Czech Republic, and that allow accessible data to be mined for further processing. Specifically, the tools are Feeder (graphical user frontend and REST API endpoint) and Kramerius+ (backend and enriched-data database), which give users a wide range of options for accessing basic data from the Kramerius system as well as enriched data from Kramerius+. The enriched data are obtained by aggregating other services suitable for digital-humanities research, currently the LINDAT/CLARIAH-CZ services (tokenisation, lemmatisation, and morphological analysis through the UDPipe tool, and the identification of named entities through the NameTag tool). Moreover, Feeder and Kramerius+ are freely available for further use and possible collaborative development through GitHub under the GNU GPL 3 licence.
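To give a sense of what lemmatised and morphologically tagged text looks like in practice, the following minimal sketch parses a small sample in the CoNLL-U layout, a tabular format that UDPipe can produce. The sample sentence is hand-written for illustration and is not real UDPipe output; the function names are hypothetical, and the actual structure of enriched data stored in Kramerius+ may differ.

```python
from collections import Counter

# Hand-written illustrative sample in CoNLL-U layout (an assumption,
# not real UDPipe output). Columns are tab-separated:
# ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC.
SAMPLE_CONLLU = """\
# sent_id = 1
# text = Knihovny digitalizují knihy.
1\tKnihovny\tknihovna\tNOUN\t_\t_\t2\tnsubj\t_\t_
2\tdigitalizují\tdigitalizovat\tVERB\t_\t_\t0\troot\t_\t_
3\tknihy\tkniha\tNOUN\t_\t_\t2\tobj\t_\tSpaceAfter=No
4\t.\t.\tPUNCT\t_\t_\t2\tpunct\t_\t_
"""


def parse_conllu(text):
    """Return a list of sentences, each a list of (form, lemma, upos) tuples."""
    sentences, current = [], []
    for line in text.splitlines():
        if not line.strip():
            # A blank line closes the current sentence.
            if current:
                sentences.append(current)
                current = []
            continue
        if line.startswith("#"):
            continue  # sentence-level comment
        cols = line.split("\t")
        if len(cols) != 10:
            continue  # skip malformed rows
        if "-" in cols[0] or "." in cols[0]:
            continue  # skip multiword-token ranges and empty nodes
        current.append((cols[1], cols[2], cols[3]))
    if current:
        sentences.append(current)
    return sentences


def lemma_counts(sentences, skip_upos=("PUNCT",)):
    """Count lemmas across all sentences, ignoring the given UPOS tags."""
    return Counter(
        lemma
        for sentence in sentences
        for _form, lemma, upos in sentence
        if upos not in skip_upos
    )


sentences = parse_conllu(SAMPLE_CONLLU)
counts = lemma_counts(sentences)
```

Counting lemmas rather than surface forms groups inflected variants such as "knihy" under "kniha", which matters for a highly inflected language like Czech.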

Once a token has been allocated, data from the Kramerius library can be exported through the visual interface or through the REST API. Users may log into the interface using their identity from the academic identity federation EduID.cz, and manage their own data sets in the interface using unique object and sub-object identifiers (books, periodicals, multi-volumes and their parts). An export delivers zipped data packages for each part in CSV and JSON formats, containing text data, descriptive metadata, and technical paradata. For exports in XML, another output of the DL4DH project, the TEI Converter tool, is used to export data from the Kramerius system and enriched data from the Kramerius+ database in TEI P5 format. These outputs can then be used as input to custom user workflows designed separately for each research project. For more in-depth information on the architecture of the solution for data provision, and for suggestions on data use in the fields of archaeology, sociology, literary studies, history, and religious studies beyond the Map of Religious Meanings presented here, we recommend consulting the certified methodology developed for the project.
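As an illustration of how a zipped export package might feed a custom workflow, the sketch below reads every CSV and JSON member of a ZIP archive into Python structures. The member names, column names, and the demo package itself are assumptions made for this example; the real layout of Feeder exports is described in the project documentation and may differ.

```python
import csv
import io
import json
import zipfile


def read_export_package(zip_bytes):
    """Parse CSV and JSON members of a zipped package, keyed by member name."""
    tables, documents = {}, {}
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for name in archive.namelist():
            if name.endswith("/"):
                continue  # directory entry
            raw = archive.read(name).decode("utf-8")
            if name.lower().endswith(".csv"):
                # Each CSV becomes a list of row dictionaries.
                tables[name] = list(csv.DictReader(io.StringIO(raw)))
            elif name.lower().endswith(".json"):
                documents[name] = json.loads(raw)
    return tables, documents


def build_demo_package():
    """Create a tiny in-memory ZIP with hypothetical file names and fields."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("page_0001.csv", "token,lemma\nknihy,kniha\n")
        archive.writestr(
            "metadata.json", json.dumps({"title": "Demo", "pages": 1})
        )
    return buffer.getvalue()


tables, documents = read_export_package(build_demo_package())
```

For a downloaded package, the same function can be called with the bytes of the file, for example `read_export_package(open(path, "rb").read())`, where `path` points to the ZIP on disk.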

Availability of Resources

The National Digital Library (NDK) contains documents digitised as part of the cooperation between the National Library of the Czech Republic and the Moravian Library in Brno, which started with the project Creating the National Digital Library in 2012. The NDK thus contains digitised documents from both institutions, as well as documents transferred from the Kramerius 3 system, which held the older digitisation output of the National Library of the Czech Republic, and documents of other institutions that digitise within the VISK 7 programme and continue to transfer their data to the NDK.

All documents contained in the NDK are made accessible in accordance with copyright law. Works whose copyright has expired can be viewed remotely; their data and metadata (including full texts) can be accessed, and their images and PDFs can be downloaded. Documents that are still under copyright can be viewed from terminals in the study rooms of the National Library of the Czech Republic. A separate access category is reserved for so-called ‘works unavailable on the market’. This involves a special licence under which a registered and logged-in reader can remotely view works listed on the Unavailable Works List. More information on this topic is available at DNNT.
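The three access categories described above can be summarised as a small decision rule. This is only a sketch of the rules as stated in this text; the category names and the function are hypothetical and do not correspond to any actual NDK or Kramerius interface.

```python
from enum import Enum


class AccessCategory(Enum):
    """Access categories as described in the text (names are hypothetical)."""
    COPYRIGHT_EXPIRED = "copyright expired"
    UNDER_COPYRIGHT = "under copyright"
    UNAVAILABLE_ON_MARKET = "on the Unavailable Works List"


def can_view_remotely(category, registered_and_logged_in=False):
    """Return True if a reader may view a work remotely under the stated rules."""
    if category is AccessCategory.COPYRIGHT_EXPIRED:
        return True  # open to remote viewing and download
    if category is AccessCategory.UNAVAILABLE_ON_MARKET:
        return registered_and_logged_in  # special licence for registered readers
    # Works under copyright are viewed from study-room terminals instead.
    return False
```

A workflow that harvests data remotely could use such a check to filter a list of titles before requesting them.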

Technical Limitations of Outputs

Older digitisation outputs in FOXML format (often just a few years old) tend to have poor text recognition, especially compared with the results of today’s OCR tools. This limits their use, not only in the Map of Religious Meanings, but also in the main outputs of the project. The lack of adequate processes and standards for updating the OCR components of packages held in long-term storage further complicates the situation. Assuming that libraries want to maintain a high standard of presentation, content should first be enriched in a long-term repository and then exported to the presentation interface. However, such a procedure is beyond the libraries’ current capabilities. As a solution, the Kramerius+ interface offers the ability to integrate current versions of OCR modules and other third-party tools into its export workflow, allowing text data with significantly higher relevance to be delivered to researchers when needed.

The DL4DH Collections

As part of the DL4DH project, several areas were identified for which virtual collections were created in the NDK. These are proposed collections of digitised titles, each related to a specific topic. We shaped the collections in the NDK according to the team’s areas of expertise, i.e. Bohemia-related topics of the represented disciplines, with a special focus on religious studies, archaeology, and history. All of them are available at https://ndk.cz/ under the heading Virtual Collections:

  • Spiritualistic Literature in Bohemia between 1870 and 1945
  • Masonic Literature in Bohemia between 1870 and 1950
  • Czech Esoteric Literature between 1870 and 1950
  • Czech topography [to be defined]

In addition to non-digitised titles, titles that had already been digitised were traced in the Kramerius Digital Library. For non-digitised works, the selection of titles was based on research at several locations. The first step was to browse the relevant literature and identify important keywords, authors, publishers, etc., which were then used to conduct the actual search. The ALEPH library system was used to select non-digitised titles; relevant titles were then proposed for digitisation and digitised. In the case of digitised titles, both the library system and the National Digital Library were used. Here, individual titles were browsed and their relevance to the topic was evaluated using the same criteria as for non-digitised titles.

Project Researchers

    Magdaléna Vecková, Principal Investigator (Library of the Czech Academy of Sciences)

    Research team of the Library of the Czech Academy of Sciences:

    • Martin Duda
    • Tomáš Foltýn
    • Radim Hladík
    • Jana Křížová
    • Boris Lehečka
    • Pavel Straňák
    • Veronika Sladká

    Zdenko Vozár, co-investigator (National Library of the Czech Republic)

    Research team of the National Library of the Czech Republic:

    • Michaela Bežová
    • Jana Hrzinová
    • Martin Lhoták
    • David Novák
    • František Válek
    • David Zbíral

    Petr Žabička, co-investigator (Moravian Library)

    Research team of the Moravian Library:

    • Jan Holomek

This audiovisual map has been created as part of a project of the programme to support applied research and development in the field of national and cultural identity (NAKI II, Ministry of Culture of the Czech Republic) No. DG20P02OVV002 entitled ‘DL4DH – developing tools for the effective utilisation and mining of data from digital libraries to reinforce digital humanities research’.

Recommended citation format: Válek, František; Vozár, Zdenko; Zbíral, David; Bežová, Michaela; Hrzinová, Jana; Novák, David; 2022. A Religious Studies Map of Literary Meanings: Biblical Citations in the Press during the First Republic [online].