Dataset Structure

The body of a Dataset has the formal structure:

 

DATASET_BODY=

 

[ PERSON | ANIMAL | PLACE | GROUP | EVENT | CITATION | RESOURCE | SOURCE | MATRIX | NARRATIVE_TEXT ] ...

 

The main entities in the body of a Dataset are defined by the following elements:

 

 

Each of these entities has an alphanumeric Key associated with it by which it can be referenced from other elements, e.g. <Person Key=’Tony123’>. See Symbolic Names for more details.

 

Those top-level entities are primarily linked (i.e. ignoring narrative references, inheritance, and hierarchies) as follows:

 

Person → Event (for Birth & Death), Group, Source

Animal → Event (for Birth & Death), Group, Source

Place → Event (for Creation & Demise), Source

Group → Event (for Creation & Demise), Source

Event → Source

Source → Citation, Resource

Matrix → Source
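The primary links listed above can be written down as a small lookup table. The following is only a sketch: the names `PRIMARY_LINKS` and `is_primary_link` are illustrative and not part of STEMMA, and the table deliberately ignores narrative references, inheritance, and hierarchies, as the list itself does.

```python
# Primary links between top-level entities, transcribed from the list above.
# PRIMARY_LINKS and is_primary_link are hypothetical names, not STEMMA terms.
PRIMARY_LINKS = {
    "Person": {"Event", "Group", "Source"},
    "Animal": {"Event", "Group", "Source"},
    "Place": {"Event", "Source"},
    "Group": {"Event", "Source"},
    "Event": {"Source"},
    "Source": {"Citation", "Resource"},
    "Matrix": {"Source"},
}


def is_primary_link(source_type: str, target_type: str) -> bool:
    """Return True if the list above includes source_type -> target_type."""
    return target_type in PRIMARY_LINKS.get(source_type, set())
```

For example, `is_primary_link("Person", "Group")` is true, whereas `is_primary_link("Place", "Group")` is false because a Place links directly only to Events and Sources.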

 

These direct links from one entity to another must be unique, i.e. no duplicates. A reference to a Person, Place, Animal, Group, or Event can be identified purely by its target Key. Links to Resources and Citations are slightly different because those linked entities can be parameterised. However, the expanded URL of a Resource, and the source-type URI plus the effective parameter set (see Inheritance and Parameters) of a Citation, will identify corresponding unique links.
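The uniqueness rule can be illustrated by computing an identity for each link and looking for repeats. This is a minimal sketch under several assumptions: the function names and argument names are hypothetical, parameter values are treated as plain strings, and the expanded URL and effective parameter set are assumed to have been computed already by whatever implements Inheritance and Parameters.

```python
from typing import Hashable, Iterable, List, Mapping, Optional

# Entity types whose links are identified purely by the target Key.
KEYED_TYPES = {"Person", "Place", "Animal", "Group", "Event"}


def link_identity(
    target_type: str,
    key: Optional[str] = None,
    expanded_url: Optional[str] = None,
    source_type_uri: Optional[str] = None,
    effective_params: Optional[Mapping[str, str]] = None,
) -> Hashable:
    """Return a hashable value that is equal for two links to the same target."""
    if target_type in KEYED_TYPES:
        return (target_type, key)
    if target_type == "Resource":
        # A parameterised Resource is identified by its expanded URL.
        return (target_type, expanded_url)
    if target_type == "Citation":
        # A Citation is identified by source-type URI plus effective parameters;
        # a frozenset makes the comparison independent of parameter order.
        return (target_type, source_type_uri,
                frozenset((effective_params or {}).items()))
    raise ValueError(f"link identity not covered by this sketch: {target_type}")


def find_duplicate_links(identities: Iterable[Hashable]) -> List[Hashable]:
    """Return every identity that repeats an earlier one, in input order."""
    seen = set()
    duplicates = []
    for identity in identities:
        if identity in seen:
            duplicates.append(identity)
        else:
            seen.add(identity)
    return duplicates
```

The sketch would be applied to the links held by a single linking entity; an empty result from `find_duplicate_links` means those links satisfy the rule.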

 

STEMMA distinguishes a “link” from one entity to another from an embedded “reference” to some real-world subject or notion. The latter may occur when generating a reference from an entity Key or when marking-up a textual reference and equating it with a given entity (see Semantic Mark-up). The following elements are defined for linking entities in this fashion:

 

PERSON_LNK=

 

<PersonLnk [Key=’key’]>

[ PERSON_PROPERTY ] ...

[ TEXT_SEG ] ...

</PersonLnk>

 

ANIMAL_LNK=

 

<AnimalLnk [Key=’key’]>

[ ANIMAL_PROPERTY ] ...

[ TEXT_SEG ] ...

</AnimalLnk>

 

PLACE_LNK=

 

<PlaceLnk [Key=’key’]>

[ PLACE_PROPERTY ] ...

[ TEXT_SEG ] ...

</PlaceLnk>

 

GROUP_LNK=

 

<GroupLnk [Key=’key’]>

[ GROUP_PROPERTY ] ...

[ TEXT_SEG ] ...

</GroupLnk>

 

EVENT_LNK=

 

<EventLnk Key=’key’>

[ TEXT_SEG ]

</EventLnk>


RESOURCE_LNK=

 

<ResourceLnk Key=’key’>

[ PARAM_VALUE  ] ...

[ TEXT_SEG ] ...

</ResourceLnk>

 

CITATION_LNK=

 

<CitationLnk Key=’key’>

[ PARAM_VALUE ] ...

[ PARENT_CITATION_LNK ]

[ TEXT_SEG ] ...

</CitationLnk>

 

SOURCE_LNK=

 

<SourceLnk Key=’key’>

[ PERSON_PROPERTY ... | ANIMAL_PROPERTY ...

| PLACE_PROPERTY ... | GROUP_PROPERTY ...  ]

[ TEXT_SEG ] ...

</SourceLnk>

 

For SOURCE_LNK, the type of Property value allowed depends on what entity type the element is embedded within. See also EV_SOURCE_LNK.
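In the definitions above, the Key attribute is bracketed (optional) for PersonLnk, AnimalLnk, PlaceLnk, and GroupLnk, and unbracketed (required) for EventLnk, ResourceLnk, CitationLnk, and SourceLnk. The following sketch checks that distinction with Python's standard XML parser. It assumes un-namespaced element names and straight quotes around attribute values, which an XML parser requires; the function name `check_link_key` is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Link elements whose Key is shown in brackets (optional) in the grammar.
KEY_OPTIONAL = {"PersonLnk", "AnimalLnk", "PlaceLnk", "GroupLnk"}
# Link elements whose Key is shown without brackets (required).
KEY_REQUIRED = {"EventLnk", "ResourceLnk", "CitationLnk", "SourceLnk"}


def check_link_key(xml_text: str) -> Optional[str]:
    """Parse one link element and return its Key, or None if legitimately absent."""
    elem = ET.fromstring(xml_text)
    if elem.tag not in KEY_OPTIONAL | KEY_REQUIRED:
        raise ValueError(f"not a link element: {elem.tag}")
    key = elem.get("Key")
    if key is None and elem.tag in KEY_REQUIRED:
        raise ValueError(f"<{elem.tag}> requires a Key attribute")
    return key
```

For example, `check_link_key("<PersonLnk Key='Tony123'/>")` returns the Key, `check_link_key("<PlaceLnk/>")` returns `None`, and `check_link_key("<EventLnk/>")` raises `ValueError`.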

 

Sub-models

STEMMA has two notional sub-models: conclusional, which represents conclusions for presentation and sharing, and informational, which represents the assimilation and analysis of information used to form those conclusions. The basic interrelation between these top-level entities in STEMMA’s conclusional sub-model is as follows:

 

 

 

 

 

See Source for a mention of the complementary informational sub-model.

 

An Event is basically a representation of a date (or range of dates) for which source information exists, and provides a contextual container identifying the where, when, and who of that source information. The subject references within the sources are represented by the Event entity being connected to multiple subject entities, such as Persons. Source entities may reference supporting Citations, and Resources such as images, photographs, documents, etc.

 

Each element can contain narrative sub-elements, as defined below. The narrative text can freely embed references to the other top-level entities and this facility for ad hoc linkage allows arbitrary historical connections to be recorded.

 

Several entities may link to others of the same type in order to define a hierarchical structure: Place/Group Hierarchies, Person/Animal lineage, Citation chains, and hierarchical Events. An additional form of same-type linkage is used to define an inheritance mechanism for Events, Citations, and Resources.

 

A second diagram is necessary in order to explain how STEMMA associates Properties with its subject entities. These constitute extracted and summarised information from a supporting source, and are usually associated with an Event. This is because most such items are time-dependent (see Time-dependent Attributes) and an Event represents something that happened in a given place at a given time. It is possible to declare static properties within the respective subject entity, but these are rarer. In both cases, they are assembled in a corresponding <SourceLnk> element that identifies the supporting source. Properties for the Event itself, such as where and when, are provided separately in the <SourceLnk> element.

 

 

 

 

The entities Person, Animal, Place, Group, and Event can also be associated with entities in one or more external systems using instances of the following element:

 

EXTERNAL_ID=

 

<ExtID> prefix:id </ExtID>

 

The value prefix identifies the external system. It is a prefix associated with a namespace URI, as described in Extended Vocabularies. For instance, <ExtID>fs:KWVC-NG4</ExtID> links to an entry for a John Williams on FamilySearch.
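Splitting an ExtID value into its prefix and identifier, and resolving the prefix to a namespace URI, might look like the following sketch. The function name `parse_ext_id` and the namespace map are hypothetical, and the URI shown is a placeholder rather than a registered value; how prefixes are actually declared is covered by Extended Vocabularies.

```python
from typing import Mapping, Tuple


def parse_ext_id(text: str, namespaces: Mapping[str, str]) -> Tuple[str, str]:
    """Split 'prefix:id' and return (namespace URI, id)."""
    # The element template allows whitespace around the value, so strip it.
    prefix, sep, ident = text.strip().partition(":")
    if not sep or not prefix or not ident:
        raise ValueError(f"expected 'prefix:id', got {text!r}")
    if prefix not in namespaces:
        raise KeyError(f"undeclared prefix: {prefix}")
    return namespaces[prefix], ident


# Placeholder namespace URI, for illustration only.
EXAMPLE_NAMESPACES = {"fs": "http://example.org/familysearch/"}
```

With that map, `parse_ext_id(" fs:KWVC-NG4 ", EXAMPLE_NAMESPACES)` yields the placeholder URI together with the identifier `KWVC-NG4`.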

 

An alternative use of this feature is to export an associated identifier for use in some external system. For instance, when exporting STEMMA data to a Web-based system, ExtID could define the identifiers used on that Web site where these differ from STEMMA’s Key values.