There are two distinct date representations in STEMMA: the date-value and the date-entity. These are equivalent for many purposes but the date-entity affords greater flexibility and scope. A date-value is encoded in a single text string, while a date-entity is a combination of elements and attributes.
The general representation of date-values is mentioned under Locale-independence. STEMMA must accommodate multiple calendars but, at the time of writing, no international standard for representing dates from multiple calendars yet exists. It therefore introduces a practical date-value string representation for world calendars, and differentiates between the ISO and STEMMA forms accepted in the STEMMA syntax.
When a date is referenced in an element, it may have both a granularity and an imprecision (i.e. a margin of error) associated with it. The granularity is implicit in the date-value string (see under Dates and Calendars). The imprecision can be represented by a +/- offset or more explicit min/max limits in a date-entity.
<Date>
{ <Value [Calendar='name'] [Calc='boolean'] [Margin='err']> std-date </Value>
|
{ <Range [Calendar='name'] [Calc='boolean']> <Min> std-fulldate </Min> <Max> std-fulldate </Max> </Range> } } ...
[ TEXT_SEG ] ...
</Date>
The default calendar name is “Gregorian”. The calendar may be specified explicitly in the Calendar attribute or in the date-value (as described above), but they must not conflict.
This date-entity allows a specific date to be represented using a value, or a range of values, from one or more calendars, and this is used in modelling synchronised dates (also known as dual dates). The Calc attribute indicates that the value for that calendar was calculated, as opposed to being recorded as part of the original information. A discussion of this, with examples, may be found at: Synchronised Dates.
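As an illustration of the structure described above, the following Python sketch reads the Value and Range children of a Date element using the standard xml.etree.ElementTree module. The function name read_date and the returned dictionary keys are hypothetical. The sketch also assumes that Margin holds a whole number, and it returns the Calc attribute as a raw string because the accepted boolean literals are not stated here.

```python
import xml.etree.ElementTree as ET

DEFAULT_CALENDAR = "Gregorian"  # default calendar name stated in the text


def read_date(xml_text):
    """Return one dict per <Value> or <Range> child of a <Date> element."""
    root = ET.fromstring(xml_text)
    items = []
    for child in root:
        if child.tag not in ("Value", "Range"):
            continue  # anything else (e.g. narrative text) is skipped in this sketch
        item = {
            "kind": child.tag,
            "calendar": child.get("Calendar", DEFAULT_CALENDAR),
            "calc": child.get("Calc"),  # raw string; literal form is not assumed
        }
        if child.tag == "Value":
            item["value"] = (child.text or "").strip()
            # Assumption: the margin is a whole number of granularity units.
            item["margin"] = int(child.get("Margin", "0"))
        else:
            item["min"] = child.findtext("Min", "").strip()
            item["max"] = child.findtext("Max", "").strip()
        items.append(item)
    return items


print(read_date('<Date><Value Margin="2">1956-04</Value></Date>'))
```

A check that the Calendar attribute does not conflict with a calendar named inside the date-value is omitted, since it depends on the date-value syntax defined elsewhere.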
A date-value may imply a granularity other than one day by using truncated forms. For Gregorian dates, these include the normal yyyy-mm and yyyy, as in the ISO standard, but also yyyy-mm:xx and yyyy:xx. For comparative purposes (e.g. sorting and collation), each truncated variant is equivalent to a corresponding pair of <Min> and <Max> elements. The <Min> and <Max> must always be full-length dates (e.g. yyyy-mm-dd in the Gregorian case).
The default error margin is ± 0. The margin units depend on the granularity of the date-value. Hence, a full yyyy-mm-dd specification would expect a margin in days. If the date-value is truncated to yyyy-mm: then any margin would be in months. If the date-value is truncated to yyyy: then any margin would be in years.
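The equivalence between a truncated Gregorian date-value and a Min/Max pair can be sketched in Python as follows. This is a minimal illustration with a zero margin, covering only the plain yyyy, yyyy-mm and yyyy-mm-dd forms. The function name bounds is hypothetical, and Python's datetime.date (a proleptic Gregorian calendar limited to years 1 to 9999) stands in for a real calendar implementation.

```python
import calendar
from datetime import date


def bounds(date_value):
    """Map 'yyyy', 'yyyy-mm' or 'yyyy-mm-dd' to an inclusive (min, max) pair."""
    parts = [int(p) for p in date_value.split("-")]
    if len(parts) == 1:  # year granularity
        (year,) = parts
        return date(year, 1, 1), date(year, 12, 31)
    if len(parts) == 2:  # month granularity
        year, month = parts
        last_day = calendar.monthrange(year, month)[1]
        return date(year, month, 1), date(year, month, last_day)
    year, month, day = parts  # full date: min and max coincide
    return date(year, month, day), date(year, month, day)


print(bounds("1956-02"))
```

Both limits are always full-length dates, so two values of different granularity can be compared through their limits alone.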
A representation of yearly quarters (e.g. Q1 = January to March) is notably absent from the ISO 8601 standard. Given the way that it represents week numbers, it should have made provision for the format yyyy-Qq, e.g. 1956-Q2. STEMMA acknowledges the importance of this granularity for certain records and so accommodates it in its own world-calendar syntax. The units of any margin would then be quarters.
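A quarter, with an optional margin measured in quarters, might be expanded to full-length limits as in the Python sketch below. To avoid committing to a particular string syntax, the hypothetical function quarter_bounds takes the year and quarter number as integers. It assumes Gregorian dates and uses Python's datetime.date as a stand-in calendar.

```python
import calendar
from datetime import date


def quarter_bounds(year, quarter, margin=0):
    """Return inclusive (min, max) dates for a quarter +/- margin quarters."""
    if quarter not in (1, 2, 3, 4):
        raise ValueError("quarter must be 1 to 4")
    index = year * 4 + (quarter - 1)  # quarters counted on one continuous scale
    lo_year, lo_q = divmod(index - margin, 4)
    hi_year, hi_q = divmod(index + margin, 4)
    first_month = lo_q * 3 + 1  # first month of the lower quarter
    last_month = hi_q * 3 + 3   # last month of the upper quarter
    last_day = calendar.monthrange(hi_year, last_month)[1]
    return date(lo_year, first_month, 1), date(hi_year, last_month, last_day)


print(quarter_bounds(1956, 2))     # second quarter of 1956
print(quarter_bounds(1956, 1, 1))  # first quarter of 1956, give or take one quarter
```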
The following table indicates how a Margin specification is interpreted in the context of the date units to yield equivalent Min/Max values.
Date form | Margin units | Equivalent Min | Equivalent Max
yyyy-mm-dd | Days | The day - margin | The day + margin
yyyy-mm | Months | First day of (month - margin) | Last day of (month + margin)
yyyy | Years | First day of first month of (year - margin) | Last day of last month of (year + margin)
yyyy-mm:03 | Quarters | First day of first month of (quarter - margin) | Last day of last month of (quarter + margin)
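The day, month and year rows of this table can be sketched in Python as follows. The function names margin_bounds and _shift_month are hypothetical, the margin is assumed to be a whole number, and the quarter row is left out because its string syntax is not handled here. Python's datetime.date serves as a stand-in Gregorian calendar.

```python
import calendar
from datetime import date, timedelta


def _shift_month(year, month, delta):
    """Move a (year, month) pair by delta months, carrying into the year."""
    new_year, zero_based_month = divmod(year * 12 + (month - 1) + delta, 12)
    return new_year, zero_based_month + 1


def margin_bounds(date_value, margin=0):
    """Return inclusive (min, max) dates for a Gregorian date-value +/- margin."""
    parts = [int(p) for p in date_value.split("-")]
    if len(parts) == 3:  # yyyy-mm-dd: margin in days
        day = date(*parts)
        return day - timedelta(days=margin), day + timedelta(days=margin)
    if len(parts) == 2:  # yyyy-mm: margin in months
        lo_year, lo_month = _shift_month(parts[0], parts[1], -margin)
        hi_year, hi_month = _shift_month(parts[0], parts[1], margin)
        last_day = calendar.monthrange(hi_year, hi_month)[1]
        return date(lo_year, lo_month, 1), date(hi_year, hi_month, last_day)
    (year,) = parts  # yyyy: margin in years
    return date(year - margin, 1, 1), date(year + margin, 12, 31)


print(margin_bounds("1956-01", 2))
```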
When deterministic dates, such as our normal Gregorian ones, are loaded into an indexing system such as a database, it is expected that each will be stored as a pair of internal 'timestamp' values, one for the lower limit and one for the upper limit. Timestamps represent points in time along an absolute timeline that starts at some arbitrary base date (the epoch). Since these are usually represented as binary long integers, issues such as the external date representation, imprecision, and time zone all become irrelevant, and the values can all be handled efficiently in the same manner.
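As a minimal sketch of this idea, the Python fragment below converts a pair of limits into integers and sorts a small in-memory index on them. It assumes day-level timestamps and uses date.toordinal(), whose epoch is 1 January of year 1 in the proleptic Gregorian calendar. A real system might prefer a finer unit or a different epoch. The name to_stamps and the sample entries are hypothetical.

```python
from datetime import date


def to_stamps(lower, upper):
    """Convert inclusive date limits to a pair of integer day numbers."""
    return lower.toordinal(), upper.toordinal()


# Hypothetical index of (lower, upper, label) tuples.
entries = [
    (*to_stamps(date(1956, 4, 1), date(1956, 6, 30)), "second quarter of 1956"),
    (*to_stamps(date(1956, 1, 1), date(1956, 12, 31)), "1956"),
    (*to_stamps(date(1956, 5, 17), date(1956, 5, 17)), "17 May 1956"),
]
entries.sort()  # ordering uses the integers; the original string form plays no part

for lower, upper, label in entries:
    print(lower, upper, label)
```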
The following table indicates how comparisons should be implemented between dates when either one of them may be a simple discrete date (e.g. A) or an inclusive date range with an upper and lower limit (e.g. [A1,A2]). In the context of a date range, equality is roughly translated as “some degree of overlap”.
A op B | [A1,A2] op [B1,B2] | A op [B1,B2] | [A1,A2] op B
A > B | A1 > B2 | A > B2 | A1 > B
A < B | A2 < B1 | A < B1 | A2 < B
A = B | A2 >= B1 & A1 <= B2 | A >= B1 & A <= B2 | A2 >= B & A1 <= B
A >= B | A1 >= B1 | A >= B1 | A1 >= B
A <= B | A2 <= B2 | A <= B2 | A2 <= B
A <> B | A2 < B1 or A1 > B2 | A < B1 or A > B2 | A2 < B or A1 > B
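These rules can be sketched in Python by treating every date as an inclusive (lower, upper) pair of timestamps, with a discrete date having equal limits. Under that assumption, the two mixed columns of the table follow from the range-to-range column. The function names are hypothetical, and the integer limits in the usage lines are arbitrary sample values.

```python
def gt(a, b):
    """A > B: A starts after B ends."""
    return a[0] > b[1]


def lt(a, b):
    """A < B: A ends before B starts."""
    return a[1] < b[0]


def eq(a, b):
    """A = B: the two ranges overlap to some degree."""
    return a[1] >= b[0] and a[0] <= b[1]


def ge(a, b):
    """A >= B: compares the lower limits, as in the table."""
    return a[0] >= b[0]


def le(a, b):
    """A <= B: compares the upper limits, as in the table."""
    return a[1] <= b[1]


def ne(a, b):
    """A <> B: no overlap at all."""
    return a[1] < b[0] or a[0] > b[1]


a = (10, 20)      # range [A1,A2]
b = (15, 30)      # range [B1,B2]
point = (25, 25)  # discrete date
print(eq(a, b), gt(a, b), le(a, b), eq(point, b), gt(point, a))
```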
Q: Should the Dataset header specify a default Calendar or simply assume Gregorian as the default? Most Datasets will involve dates from one predominant Calendar and so it would be more convenient to specify a default for cases when no explicit one has been provided. See Locale-independence for potential Calendar names.