Description of Postdata Poetry Ontology

One stream of work in the digital humanities focuses on interoperability processes and the description of traditional concepts using computer-readable languages. In the case of literary studies, there has been some research into these topics, but the complexity of the knowledge domain remains an issue. This complexity is based on the different interpretations of concepts in different traditions, the use of isolated and private databases, unique applications of language and, thus, the richness of poetic information. All of this suggests the need to explore new options to represent the complexity in computer-readable language. This paper presents an ontology network designed to capture poetry domain knowledge. The ontologies in question relate to poetic works and their structural and prosodic components.


Introduction
The Poetry Standardization and Linked Open Data Project, POSTDATA, aims to provide a means for European poetry researchers to publish and access semantically enriched data. To achieve this goal, it was necessary to develop a poetry ontology. This ontology attempts to enhance interoperability in the European poetry research community and capture the concepts and properties that define the domain of European poetry knowledge. The development of the ontology began with an attempt to define a domain model of poetry based on an analysis of 23 poetry repertories (i.e. poetry research databases) (M. Curado Malta, E. González-Blanco, C. Martínez Cantón, and G. Rio Riande 2016; Postdata ERC project 2021). These repertories were selected because of their relevance, availability in digital format (i.e. all are implemented in databases) and the rich sample they provided of multilingual poetry. They covered the classical period (e.g. Pedecerto 1 ), the modern era (e.g. Corpus of Spanish Golden-Age Sonnets 2 ) and the Middle Ages (e.g. Cantigas de Santa Maria for singers 3 ). This first step allowed us to identify the most significant objects and properties that define a poetic work (i.e. a poem), taking into account the different traits related to this literary genre. The result was a domain model that reflected poetry's complexity and heterogeneity. We then transformed this domain model into an ontology network, which would allow for its effective and extensive use in computational frameworks as a POSTDATA computer-aided annotation tool. In this paper, we present the first version of our network's four most significant ontologies (i.e. version 1.0). 4 These ontologies relate to poetic works, their structural and prosodic components and information about relevant dates. To support their use, we incorporated the ontology definitions into OMEKA, a framework that facilitates the use of these ontologies in research tasks.
This article is structured as follows: In Section 2, we present some previous results related to ontologies of literature, especially of the poetry domain. Section 3 describes the methodology that we used to develop our ontologies. Section 4 presents a detailed description of the most relevant ontologies that we created. Finally, Section 5 outlines our conclusions and directions for future work.

Related works
The first attempt to build a poetry ontology took place within the ReMetca project (E. González-Blanco and Rodríguez 2016). The results of that project were used to define the TEI-Verse module. 5 However, TEI's tags did not fully capture the structural richness of poetic traditions. Alternative approaches remained available. If, for example, we understood the poetic work as a cultural heritage item, we might represent the knowledge associated with it by using the CIDOC Conceptual Reference Model (CIDOC-CRM), 6 which formally describes cultural heritage concepts and relationships. Similarly, we might apply the Functional Requirements for Bibliographic Records (FRBR) ontology 7 and FRBRoo, 8 which offer perspectives based on bibliographic and authority records (i.e. the standardised names for people and corporate bodies) (Tillett 2005). These ontologies might cover descriptive aspects of poetic works and the forms of their expression and manifestation; for a description of the poetic work, its expression and manifestation, see (David and Newman, Richard 2019; Home | FRBRoo 2021). However, they could not reflect structural aspects of the works or provide any literary analysis or prosody. In other words, they contained no modelling information for the analysis of textual features.
Aside from examining these well-known ontologies in the digital humanities, we also completed a general search for potentially relevant ontologies in more standard ontology repositories such as Linked Open Vocabularies, 9 Open Metadata Registry 10 and the Basel Register of Thesauri, Ontologies & Classifications. 11 We concluded that despite specific efforts to model the poetry domain and the possibility of reusing some foundational ontologies to deal with poetry, these tools were only relevant to limited features of the poetry domain. This situation confirmed the need to create a new, comprehensive ontology of poetry as a literary genre.

Ontology network methodology
The first step in this work was to build a conceptual domain model of European poetry based on an accurate picture of the knowledge domain. A conceptual model is a representation of the knowledge domain created from concepts and properties and their relationships. For this purpose, we analysed a set of 23 digital repertories (M. Curado Malta, E. González-Blanco, C. Martínez Cantón, and G. Rio Riande 2016; M. Curado Malta, Centenera, and E. González-Blanco 2017; M. Curado Malta, E. González-Blanco, C. Martínez Cantón, and G. Rio Riande 2020; Postdata ERC project 2021) that had been selected for their representation of different poetry traditions, languages, prosodic systems and cultures (Postdata ERC project 2021). These repertories arose from research projects and, thus, contained information that had been gathered or generated by experts and had more reliable and robust content, categories and structures. To build the ontology, we applied the NeOn methodology (Suárez-Figueroa 2010), which defines different working scenarios. Once the sources were selected, we applied scenario 2 (i.e. reusing and re-engineering non-ontological resources). According to this scenario, we began working with the conceptual structures (database structures) of the different repositories. Here we used a reverse engineering approach, i.e. we started from the database schemas and converted them into input for the ontology modelling process. All the concepts were extracted and compared so that we could build a common model out of them (Postdata 2020a; Postdata 2020b).
Based on this process, we developed the complete European Poetry Logical Domain Model (EP-DM). This work required us to deal with problems related to the potential inconsistency of concepts. These problems had previously been addressed by experts in the poetry domain (Bermúdez-Sabel, Mariana Curado Malta, and Gonzalez-Blanco 2017; Bermúdez-Sabel, Díez Platas, Ros Muñoz, and Elena González-Blanco 2019; Elena González-Blanco, G. d. Rio Riande, and Clara Martínez Cantón 2016). The final conceptual model included both descriptive and bibliographic aspects of the poetic works. It also included information about textual transmission, prosodic, literary and rhetorical features, poetic structures, significant publication elements and relationships with music (Malta, González-Blanco, Cantón, and Rio 2017; M. Curado Malta, E. González-Blanco, C. Martínez Cantón, and G. Rio Riande 2016) (see Figure 1). The EP-DM was highly complex because of the number of concepts and properties that had to be included in this model of poetry domain knowledge. It is worth remembering here that the formal research process in the humanities is open-ended; each investigation focuses on different issues in the field and interprets them in different ways. The size of this first approximation meant that our model lacked usability. We therefore decided to build an ontology network to represent information about European poetry. Our criteria for building each of the ontologies in the network were as follows:
• The classes, relationships and axioms of the ontology had to be thematically related or else necessary to complete the semantics of another ontology entity. The underlying semantics of each class had to relate to the area of knowledge.
• Weak coupling was required between ontological modules. Each ontology was built as a self-contained module that was related to the other ontologies through a small number of relationships.
• Each module had to be highly cohesive. In particular, the module had to contain the maximum number of property classes in order to ensure a highly cohesive ontology. As such, the ontology should be able to function while avoiding coupling with other ontologies as much as possible.
We completed the development process using an iterative-incremental model. Each ontology was built based on the principles of reusing ontologies, aligning vocabularies and properties to facilitate development, improving the semantic understanding of entities and facilitating interoperability (see Figure 2). After each iteration, the OWL specification (OWL -Semantic Web Standards 2021) was obtained for each ontology. Finally, to define property ranges, we identified a set of controlled vocabularies. These were specified using the data model of the Simple Knowledge Organization System (SKOS). 12 These controlled vocabularies allowed us to establish the standard terminologies that are used by the scientific community. The result was ontology version 1.0.
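As a rough illustration of how a SKOS controlled vocabulary can serve as the range of a property, the sketch below builds a small concept scheme as plain triples and checks whether a value belongs to it. The SKOS and RDF IRIs are the standard ones; the vocabulary IRI and the three terms are invented for this example and are not taken from the Postdata vocabularies.

```python
# Standard SKOS and RDF IRIs.
SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical vocabulary IRI and terms, invented for illustration only.
VOCAB = "http://example.org/vocab/genre/"
TERMS = ["lyric", "epic", "dramatic"]


def build_scheme(vocab_iri, terms):
    """Return SKOS triples declaring a concept scheme and one concept per term."""
    triples = [(vocab_iri, RDF_TYPE, SKOS + "ConceptScheme")]
    for term in terms:
        concept = vocab_iri + term
        triples.append((concept, RDF_TYPE, SKOS + "Concept"))
        triples.append((concept, SKOS + "inScheme", vocab_iri))
        triples.append((concept, SKOS + "prefLabel", term))
    return triples


def allowed_values(triples):
    """Collect the concept IRIs usable as values of a property ranging over this vocabulary."""
    return {s for s, p, o in triples if p == RDF_TYPE and o == SKOS + "Concept"}


scheme = build_scheme(VOCAB, TERMS)
valid = allowed_values(scheme)
print(VOCAB + "lyric" in valid)   # a value drawn from the vocabulary
print(VOCAB + "sonnet" in valid)  # a value outside the vocabulary
```

Restricting a property range to concepts of one scheme is what lets independently produced datasets agree on the same normalised terms.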

Ontology descriptions
This section describes Postdata ontology V1.0. As noted above, this version of the ontology network was formed from the four ontologies we had already developed: the postdata-core ontology, the postdata-prosodic ontology, the postdata-structural ontology and the postdata-dates ontology. These ontologies were the main outcomes of the Postdata project. A literary and transmission ontology will be completed before the end of the project.

Postdata-core ontology
This is the main ontology for poetic representation, 13 and its prefix is pdcore (i.e. poetry domain-core). It provides information about poetic works and their manifestation. A poetic work (i.e. poem) can be represented through its different manifestations or versions in the poetry domain. In literature, it is also common to find sets of poems grouped together, for example, in a book. These scenarios are represented through three main classes and their relationships: pdcore:PoeticWork, pdcore:Redaction and pdcore:Ensemble (see Figure 3).
• The PoeticWork class models the abstract concept of artistic creation. These creations must be in verse (poems, plays or songs), and their properties represent the descriptive metadata of the poetic work (title, abstract, creator or author and creation date). This class is implemented as a subclass of the frbroo:F1 class.
• The Redaction class is a subclass of the frbroo:F22 class, and it models editions of a poetic work. Each version of a poetic work is a redaction.
• The Ensemble class is a subclass of the frbroo:F17 class, and it enables the modelling of an Ensemble as a collection of poetic works (i.e. a book of songs or collection of poems).
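The three subclass relationships listed above can be written down as RDF triples. The following is a minimal sketch, not the published OWL file: the prefixes and class names come from the text, whereas the pdcore and frbroo namespace IRIs are placeholders assumed for the example (only the rdfs IRI is the real one).

```python
# Prefix-to-IRI bindings. The pdcore and frbroo IRIs are placeholders
# (assumptions); only the prefixes and class names come from the text.
PREFIXES = {
    "pdcore": "http://example.org/pdcore#",
    "frbroo": "http://example.org/frbroo#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
}

# Subclass axioms as stated in the text, written as prefixed names.
SUBCLASS_AXIOMS = [
    ("pdcore:PoeticWork", "rdfs:subClassOf", "frbroo:F1"),
    ("pdcore:Redaction", "rdfs:subClassOf", "frbroo:F22"),
    ("pdcore:Ensemble", "rdfs:subClassOf", "frbroo:F17"),
]


def expand(qname):
    """Expand a prefixed name such as 'pdcore:Redaction' into a full IRI."""
    prefix, local = qname.split(":", 1)
    return PREFIXES[prefix] + local


def to_ntriples(triples):
    """Serialise prefixed-name triples as N-Triples lines."""
    return "\n".join(
        "<{}> <{}> <{}> .".format(expand(s), expand(p), expand(o))
        for s, p, o in triples
    )


print(to_ntriples(SUBCLASS_AXIOMS))
```

Anchoring each class to an FRBRoo class means that a consumer who only understands FRBRoo can still interpret Postdata data at that coarser level.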
Besides these main classes, we included classes which, although not specific to poetry, model transversal knowledge of the poetic domain and provide relevant information. Some examples of these classes are:
• pdcore:Person and pdcore:Organisation. These classes model the agents that participate in the poetic work and the redaction according to their different roles.
• pdcore:Place and pdcore:Event. These classes represent places of origin and references to events and locations.
One aspect of a literary work that needed to be addressed was its authorship and the roles played by related agents (see Figure 4). To model this knowledge, we used the design pattern agentRole 14 and defined the class pdcore:Role and its subclass pdcore:CreatorRole. The pdcore:CreatorRole class is useful for dealing with authorship because it can support the representation of:
• Multiple authors through the multiple cardinality assignation of the pdcore:hasCreator property.
• An anonymous author using the pdcore:isAnonymous Boolean property.
• Wrongful attributions (i.e. cases where a work was written by another author) through the pdcore:isWrongAttribution property with a Boolean range.
We also identified a set of controlled vocabularies which were used in some of the class properties as ranges.
• In the PoeticWork class, we identified genre, poeticType and authorEducationLevel.
• In the Person class, we identified gender, literaryPeriod, school, socialStatus and religiousAffiliation.
• In the Role class, we identified roleFunction and typeOfCharacter.
• In the CreatorRole class, we identified typeOfDesignation.
• In the Redaction class, we identified typeOfTextualElement.
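To make the authorship pattern concrete, the sketch below mirrors pdcore:CreatorRole with its two Boolean flags and shows how multiple, anonymous and wrongly attributed creators can coexist on one work. The Python class layout, the helper `accepted_authors` and the sample names are assumptions made for illustration; only the property names come from the text.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class CreatorRole:
    """Illustrative counterpart of pdcore:CreatorRole."""
    agent_name: Optional[str]           # None when the author is unknown
    is_anonymous: bool = False          # mirrors pdcore:isAnonymous
    is_wrong_attribution: bool = False  # mirrors pdcore:isWrongAttribution


@dataclass
class PoeticWork:
    """Illustrative counterpart of pdcore:PoeticWork."""
    title: str
    # Several roles may be attached, mirroring the multiple cardinality
    # of pdcore:hasCreator.
    creator_roles: List[CreatorRole] = field(default_factory=list)

    def accepted_authors(self) -> List[str]:
        """Named agents whose attribution is not flagged as wrong (hypothetical helper)."""
        return [
            role.agent_name
            for role in self.creator_roles
            if not role.is_anonymous
            and not role.is_wrong_attribution
            and role.agent_name is not None
        ]


# Hypothetical data: two accepted authors and one rejected attribution.
work = PoeticWork(
    title="Sample poem",
    creator_roles=[
        CreatorRole("Author A"),
        CreatorRole("Author B"),
        CreatorRole("Author C", is_wrong_attribution=True),
    ],
)
print(work.accepted_authors())
```

Keeping the flags on the role rather than on the person allows the same person to be a confirmed author of one work and a rejected attribution for another.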

Postdata-structuralElements ontology
The postdata-structuralElements ontology 15 contains all the information related to the structural elements of a redaction. The ontology prefix is pdstruct. Each redaction of a poetic work is organised into lines and stanzas. The structuralElements ontology defines two classes: pdstruct:OrderedLineList and pdstruct:OrderedStanzaList.
These classes are related to the pdcore:Redaction class using the pdstruct:hasLineList property or the pdstruct:hasStanzaList property. Since a stanza is a list of lines, a line is a list of words and punctuation marks, and a word is composed of syllables, we defined five more classes to complete the ontology. These were pdstruct:Line, pdstruct:Stanza, pdstruct:Word, pdstruct:Syllable and pdstruct:Punctuation.
• A line is a unit of verse that usually ends in a visual or typographic break and is characterised by its length and metre.
• A stanza is a group of lines. Usually, this grouping forms the basic recurring verse unit of a poem.
• A word is a list of syllables.
• A syllable is a single unit of speech sound and may have written or spoken forms.
• Punctuation refers to punctuation symbols.
Based on Ordered List ontology (olo) semantics, these classes are subclasses of olo:Slot since they may each be understood as a slot in an ordered list. The first three (Line, Stanza and Word) are also subclasses of olo:OrderedList since each represents a list of ordered elements (see Figure 5). In this ontology, four controlled vocabularies were identified and used in some of the class properties as ranges.
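The nesting described above (stanzas containing lines, lines containing words and punctuation marks, each occupying a numbered slot) can be sketched with plain data structures. The code below is an illustration under several assumptions: stanzas are separated by blank lines, line numbering restarts in each stanza, tokenisation uses a simple regular expression, and syllables are left out because syllabification is language-specific. The sample text is invented.

```python
import re


def build_structure(text):
    """Split a plain-text redaction into numbered stanzas, lines and tokens."""
    stanzas = []
    blocks = re.split(r"\n\s*\n", text.strip())  # blank line = stanza break
    for stanza_no, block in enumerate(blocks, start=1):
        lines = []
        for line_no, raw in enumerate(block.splitlines(), start=1):
            tokens = []
            # A token is either a run of word characters or one punctuation mark.
            for index, tok in enumerate(re.findall(r"\w+|[^\w\s]", raw), start=1):
                kind = "Word" if re.fullmatch(r"\w+", tok) else "Punctuation"
                tokens.append({"index": index, "type": kind, "content": tok})
            lines.append({"index": line_no, "content": raw.strip(), "tokens": tokens})
        stanzas.append({"index": stanza_no, "lines": lines})
    return stanzas


# Invented sample text, used only to exercise the function.
SAMPLE = "First line, here\nsecond line.\n\nThird line"
structure = build_structure(SAMPLE)
for stanza in structure:
    for line in stanza["lines"]:
        print(stanza["index"], line["index"], [t["content"] for t in line["tokens"]])
```

Each `index` plays the part of a slot position in an ordered list, so the original order of the text can always be reconstructed from the data.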

Postdata-prosodicElements ontology
This ontology 17 contains the classes and properties required to structure the information extracted from the prosodic analysis of a poetic work. The prefix for this ontology is pdprosodic. The prosodic analysis of a poetic work provides information about the poem's metrical patterns (see Figure 6). These patterns are defined at three levels: poem, stanza and line. This ontology imports the postdata-structuralElements ontology since the latter's content relates to the metrical patterns of the line, the stanza and the poem. On this basis, we define three classes to represent these metrical patterns: pdprosodic:LinePattern, pdprosodic:StanzaPattern and pdprosodic:WorkPattern. The three classes belong to the pdprosodic:Pattern hierarchy.
• The LinePattern models the metrical pattern of the line. Some important properties related to the line pattern are:
  pdprosodic:accentedVowels: This represents stressed vowels in the order in which they occur in the text.
  pdprosodic:countingMetricalScheme: This represents the metrical scheme based on the number of syllables.
  pdprosodic:grammaticalStressPattern: This represents patterns based on the position of expected stresses according to grammatical rules, including the distribution of weak and strong positions.
• The StanzaPattern summarises specific properties of the stanza, such as its rhyme scheme. One of the most common conventions is the use of letters to identify rhyming lines.
• The WorkPattern shares some of the properties defined in the LinePattern and StanzaPattern classes, but it also presents other properties, such as pdprosodic:presentRhymeMatc. This property allows a poetic work to be categorised according to the extent of matches between different rhyming sounds (i.e. based on assonance or consonance). It also applies to the StanzaPattern class.
This ontology was enriched by adding more classes that store prosodic analysis data. These classes included:
• pdprosodic:Rhyme: This represents the repetition of similar-sounding words at the end of lines in poems or songs.
• pdprosodic:Foot: This is the unit of poetic metre used in most Indo-European poetic traditions, including English accentual-syllabic verse and Ancient Greek and Latin (classical) poetry.
• pdprosodic:metricalEncoding: This is the notation used to represent a metrical pattern; for example, the plus sign is used to encode a strong syllabic position.
In this ontology, controlled vocabularies are of special interest because they represent a normalised form of the values of prosodic properties. We defined 12 vocabularies.
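Two of the conventions mentioned above lend themselves to a short illustration: labelling rhyming lines with letters and encoding strong syllabic positions with a plus sign. The sketch below is a generic implementation of both conventions, not the Postdata analysis tools. The use of "-" for weak positions is an assumption (the text only specifies "+"), and the line endings other than "i.a" are invented.

```python
from string import ascii_uppercase


def rhyme_scheme(endings):
    """Label each line ending with a letter, in order of first appearance.

    Lines that share an ending share a letter. This sketch supports at
    most 26 distinct endings.
    """
    labels = {}
    scheme = []
    for ending in endings:
        if ending not in labels:
            labels[ending] = ascii_uppercase[len(labels)]
        scheme.append(labels[ending])
    return "".join(scheme)


def metrical_encoding(stresses):
    """Encode strong positions as '+' and weak ones as '-' ('-' is assumed)."""
    return "".join("+" if strong else "-" for strong in stresses)


# "i.a" is the ending quoted in the sample application; "or" is invented.
print(rhyme_scheme(["i.a", "or", "i.a", "or"]))
print(metrical_encoding([False, True, False, True]))
```

The string returned by `rhyme_scheme` is the kind of value a stanza-level pattern could hold, while `metrical_encoding` produces a line-level pattern.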

Postdata-dates ontology
Depending on the composition period, it may be difficult to date a poetic work or its versions with any precision. Moreover, in ancient or anonymous publications, it may be impossible to determine a composition date. It may be necessary to establish ranges or suggest a likely date. This problem arises when the form of a text's transmission or preservation does not support tracing the composition date. To address these situations, we have proposed an independent and reusable ontology for the literary or heritage domain that covers special dating needs (see Figure 7). In this ontology, two classes are provided 18 :
• pddates:DateEntity represents a temporal entity associated with the poetic work, its manifestations or an event.
• pddates:DateExpression forms the basis of the class hierarchy. This class and its subclasses provide different modes for representing a date related to the work's creation or a relevant event associated with an entity.
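One mode of date expression used in this paper's sample application is the open interval, bounded by notBefore and notAfter. The sketch below shows how such an interval might behave in code. The class and property names follow the paper; the Python design, the treatment of a missing bound as unknown and the `may_contain` helper are assumptions.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class OpenInterval:
    """A date known only to fall within bounds; a bound of None means unknown."""
    not_before: Optional[date] = None
    not_after: Optional[date] = None

    def may_contain(self, candidate: date) -> bool:
        """True if the candidate date is compatible with the known bounds."""
        if self.not_before is not None and candidate < self.not_before:
            return False
        if self.not_after is not None and candidate > self.not_after:
            return False
        return True

    def to_properties(self) -> dict:
        """Return the known bounds as ISO 8601 strings keyed by property name."""
        props = {}
        if self.not_before is not None:
            props["notBefore"] = self.not_before.isoformat()
        if self.not_after is not None:
            props["notAfter"] = self.not_after.isoformat()
        return props


# Bounds taken from the sample application in this paper.
composition = OpenInterval(date(1270, 1, 1), date(1282, 12, 31))
print(composition.to_properties())
print(composition.may_contain(date(1275, 6, 1)))
```

Representing the uncertainty explicitly avoids recording a spuriously precise date when only a range is attested.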

A sample application of the Postdata ontologies
To demonstrate the usefulness, versatility and user friendliness of the Postdata ontology V1.0, we present an example. This concerns the song "Mais nos faz Santa María a séu Fillo perdõar," written by Alfonso X, el Sabio. An analysed redaction is extracted from "Cantigas de Santa Maria for Singers" (http://www.cantigasdesantamaria.com), which was created by Andrew Casson. The RDF implementation presents the poetic work, whose author is Alfonso X, along with the redaction prepared by Casson. The date is shown as an open interval since it is not known exactly. An extract of the structure is also given in stanza and line form. One point that should be highlighted is that this poetic composition is a song with a refrain. The refrain is the first stanza, but it repeats after each of the others. For this reason, the refrain has no assigned number, and its lines are indicated as part of the stanza. Finally, we show the work's patterns, i.e. the stanzas and the lines. The relationship between the poetic work and the redaction is represented through the object property pdcore:isRealisedThrough. The object property pdcore:hasCreator is used to denote the author of the poetic work and the creator of the redaction, who are both a type of agent (Person). All these items are related to pdcore:CreatorRole objects.

The date of the poetic work is shown as an open interval since it is not known exactly. It is associated with the poetic work through the object property date that refers to pddates:DateEntity. This is expressed as a pddates:OpenInterval with the following properties:
• notBefore -"1270-01-01"
• notAfter -"1282-12-31"
In addition, because this work is a piece of medieval literature that is difficult to date, a note is created about who identified the date and how this took place:
• Note (pddates:dateNote) -"Dates set by Walter Mettmann. . . . . . ."
In addition to the URL and the creator, the redaction is described in terms of the structure of the text that it presents. In this regard, the redaction structure is a list of stanzas (pdstruct:OrderedStanzaList) that relate to the redaction via the property pdstruct:hasStanzaItem from the ontology of structural elements. As noted above, the representation should indicate that this poetic composition is a song with a refrain (its first stanza). This stanza repeats after each of the others. It is therefore assigned no number, and its lines are indicated as part of the refrain. As such, the stanzas are described by the following properties:
• Stanza number, a positive integer indicating the stanza's position in the list (pdstruct:stanzaNumber)
• If the stanza is a refrain, then -Is Refrain (pdstruct:isRefrain) -"true"
• If the stanza is not a refrain but must be followed by a refrain of which only one instance will be created -Is Refrain Omitted (pdstruct:isRefrainOmitted) -"true"
• The next stanza (pdstruct:nextStanza)
• The first and last line of the stanza (pdstruct:hasFirstLine, pdstruct:hasLastLine).
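The refrain handling described above stores the refrain once and marks every stanza that must be followed by it. The sketch below shows one way a consumer of the data could expand that compact representation into the full order in which the stanzas are sung. The field names mirror the pdstruct properties; the `reading_order` function, the assumption that the refrain is the first stanza in the chain, and the labels are illustrative.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Stanza:
    """Illustrative counterpart of pdstruct:Stanza."""
    label: str
    stanza_number: Optional[int] = None   # pdstruct:stanzaNumber (none for the refrain)
    is_refrain: bool = False              # pdstruct:isRefrain
    is_refrain_omitted: bool = False      # pdstruct:isRefrainOmitted
    next_stanza: Optional["Stanza"] = None  # pdstruct:nextStanza


def reading_order(first: Stanza) -> List[str]:
    """Follow nextStanza links, re-inserting the single refrain where it was omitted.

    Assumes the refrain, if any, is the first stanza in the chain.
    """
    refrain = first if first.is_refrain else None
    order = []
    current = first
    while current is not None:
        order.append(current.label)
        if current.is_refrain_omitted and refrain is not None:
            order.append(refrain.label)
        current = current.next_stanza
    return order


# A refrain followed by two numbered stanzas, each followed by the refrain.
refrain = Stanza("refrain", is_refrain=True)
first = Stanza("stanza 1", stanza_number=1, is_refrain_omitted=True)
second = Stanza("stanza 2", stanza_number=2, is_refrain_omitted=True)
refrain.next_stanza = first
first.next_stanza = second

print(reading_order(refrain))
```

Storing the refrain only once keeps the data free of duplicated lines while still allowing the performed sequence to be recovered.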
The rhyme is also represented by the pdprosodic:Rhyme class object associated with the line. Its properties include:
• The label (pdprosodic:label) -"A"
• The associated phonemes (pdprosodic:ending) -"i.a"

Conclusions and future work
In this paper, we have presented Postdata ontology V1.0, which was developed as part of the POSTDATA ERC Project. 19 This version of the ontology takes the form of a network that is composed of four ontologies: the core ontology, prosodic ontology, structural ontology and date ontology. The first three ontologies represent the poetic work and its essential properties including prosodic elements. The fourth ontology contains information about dates. For literary works, this information is particularly complex. The ontologies were published based on the best practices and recommendations for Linked Data vocabulary publishing.
Since developing this first version, we have begun to map poetry databases and repertories onto the ontology with the aim of populating the ontologies and sharing the information in an interoperable RDF format. We have also incorporated the ontology definitions into OMEKA 20 in the hope that this will be a straightforward and easy way to populate the ontology. This cultural heritage collection platform facilitates our support of researchers who are using the ontologies. In addition, its Open Source licence allows us to adapt the platform to the scientific description of the primary sources of the poetic texts, which can be found in archives and university libraries. The descriptive standards for these materials, which are reflected in the POSTDATA ontology network, respond to the needs of these institutions, which are the most regular users of the OMEKA system. Based on these processes, we plan to review the ontology descriptions and controlled vocabularies and take into account additional comments from the scholarly community. In this way, we should improve the representative capacity of these tools. It should not be forgotten that an ontology is only useful so long as it can accurately describe knowledge in the domain.