Probabilities

Accepted genealogical practice certainly accommodates qualitative assessments such as primary/secondary information, original/derivative source, and impartial/subjective viewpoint. STEMMA supports each of these assessments in the Source entity. They help when assessing evidence in order to associate a level of confidence with an item of data (e.g. a date or relationship), and that level of confidence will affect any conclusions and inferences derived from it. STEMMA goes further, though, by allowing qualitative assessments to be expressed quantitatively using probabilities (written as percentages).

Some people view this as a controversial feature because the choice of base probabilities is somewhat subjective, even though the mathematics of combining them to investigate scenarios is well defined. We accept the use of probabilities in gambling, even on subjective matters such as the “form” of a horse, yet we may feel that they imply inappropriate accuracy in our genealogical research. Our brains mostly handle probabilities in an analogue fashion, and we have no difficulty with ordered, non-numeric scales such as “very likely, likely, probably, maybe, unlikely, improbable”. From that perspective, associating numeric probabilities with such a scale is a relatively small step.

The subject of "Structured Indications of Uncertainty" is discussed in the context of TEI (the Text Encoding Initiative) under Structured Uncertainty, in section 17.1.2. A further discussion directly related to genealogy can be found in You're Probably Right.

Part of the STEMMA rationale for using percentages in the Surety attribute, rather than simple integers, was that percentages allow some basic arithmetic when assessing derived data. For instance, if A => B, and B => C, then the absolute surety of C is surety(A) * surety(B), with each percentage treated as a fraction (e.g. 90% * 80% = 72%).[1] Another potential advantage is that of ‘collective assessment’. Given three alternatives, X, Y, and Z, simple integers might allow an assessment of X against Y, or of X against Z, but not of X against all the remaining alternatives (i.e. Y+Z).
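
As a rough illustration of the two kinds of arithmetic described above, the following Python sketch treats each surety as a percentage in the range 0 to 100. The function names chained_surety and collective_assessment are hypothetical (they are not part of STEMMA), and the collective assessment assumes that the alternatives are mutually exclusive and exhaustive, so that their sureties sum to 100%.

```python
def chained_surety(*step_sureties: float) -> float:
    """Absolute surety (in percent) at the end of an inference chain.

    Hypothetical helper, not part of STEMMA. It applies the rule in the
    text, e.g. surety(C) = surety(A) * surety(B) for A => B => C, by
    multiplying the step sureties as fractions and returning a percentage.
    """
    result = 100.0  # start from certainty, expressed as a percentage
    for surety in step_sureties:
        if not 0 <= surety <= 100:
            raise ValueError(f"surety must be within 0..100%, got {surety}")
        result = result * surety / 100  # multiply as fractions, keep percent scale
    return result


def collective_assessment(alternatives: dict[str, float], chosen: str) -> tuple[float, float]:
    """Compare one alternative against all the remaining ones combined.

    Hypothetical helper, not part of STEMMA. Assumes the alternatives are
    mutually exclusive and exhaustive, so their sureties sum to 100%.
    """
    total = sum(alternatives.values())
    if abs(total - 100) > 1e-9:
        raise ValueError(f"sureties must sum to 100%, got {total}")
    # Adding the remaining sureties is only meaningful because they are percentages.
    others = sum(s for name, s in alternatives.items() if name != chosen)
    return alternatives[chosen], others


print(chained_surety(90, 80))  # 72.0
print(collective_assessment({"X": 50, "Y": 30, "Z": 20}, "X"))  # (50, 50)
```

The X-versus-(Y+Z) comparison works here because percentages for the remaining alternatives can simply be added; ordinal integers carry no meaningful sum.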

[1] The probability of ‘A or B’ being true is expressed as:

p(A ∪ B) = p(A) + p(B) - p(A ∩ B)

where p(A ∩ B) is the probability of ‘A and B’ being true. If A and B are independent of each other, then:

p(A ∩ B) = p(A) * p(B)

If A and B are mutually exclusive, then:

p(A ∩ B) = 0

and

p(A ∪ B) = p(A) + p(B)

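Bayesian probability takes this to a much deeper level, treating probabilities as degrees of belief that are updated as new evidence is considered, so that a simple true/false verdict is inadequate.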
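
The footnote's formulas can be checked with a short Python sketch using exact fractions. The function names are hypothetical, and the probabilities 1/2 and 1/3 are arbitrary illustrative values rather than anything drawn from STEMMA.

```python
from fractions import Fraction


def p_union(p_a: Fraction, p_b: Fraction, p_a_and_b: Fraction) -> Fraction:
    """p(A ∪ B) = p(A) + p(B) - p(A ∩ B)."""
    return p_a + p_b - p_a_and_b


def p_and_independent(p_a: Fraction, p_b: Fraction) -> Fraction:
    """p(A ∩ B) = p(A) * p(B), valid only when A and B are independent."""
    return p_a * p_b


p_a, p_b = Fraction(1, 2), Fraction(1, 3)

# Independent events: subtract the overlap p(A) * p(B).
print(p_union(p_a, p_b, p_and_independent(p_a, p_b)))  # 2/3

# Mutually exclusive events: there is no overlap, so p(A ∩ B) = 0.
print(p_union(p_a, p_b, Fraction(0)))  # 5/6
```

Fraction is used so that the results compare exactly; with floats, sums such as 1/2 + 1/3 - 1/6 may pick up small rounding errors.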