Date: Sun, 24 Oct 1993 01:28:30 -0400 (EDT)
Subject: A direct application of systematic methods to stemmatics
In response to the question "What's the point of comparing systematics and
stemmatics?" Jeff Wills mentioned a project I have been working on with Peter
Robinson, and I thought it might be helpful to give a short description of
that here, since many people will not be familiar with it.

Peter is a manuscript specialist at Oxford who has been working on the history
of Old Norse texts for some time.  He did his doctoral research on a narrative
poem called Svipdagsmal, which is known from about 45 manuscript copies written
from the late Middle Ages to the early 1800s.  For his doctoral work Peter
examined all or most of the Svipdagsmal manuscripts, and since he is also
interested in computing he created electronic versions of these manuscripts
and developed a program called _Collate_ to compare them.  From his study of
the many copies of this text Peter reconstructed a stemma (a genealogical
tree) for Svipdagsmal, showing how all of the known copies are related to one
another.  Once all the data were in hand the actual process of reconstructing
the stemma took Peter about six months.

Although _Collate_ had made direct comparison of the texts fairly efficient,
reconstructing the stemma was clearly still a very difficult task.  Peter
decided to post a challenge on the HUMANIST list (this was about two years
ago) to see whether anyone else could take his raw data -- a large table of
agreements and disagreements among the Svipdagsmal manuscripts -- and
reproduce his stemma by some other means.  I saw his challenge and recognized
the problem as one very much like phylogeny reconstruction, my own specialty,
so I requested a copy of his data.  Systematists have developed a number of
software packages for cladistic analysis in the last ten years that take
tables of data and estimate evolutionary trees from them, and I ran Peter's
data through one such program -- PAUP, "Phylogenetic Analysis Using Parsimony"
by David Swofford.  In about five minutes I produced a tree that was a
reasonable approximation of the stemma it had taken Peter six months to
reconstruct.  PAUP had been developed specifically for the reconstruction of
evolutionary trees; no thought whatever had been given to manuscript data or
the problems of stemmatics when it was written, and Dave Swofford may not have
even known about them at the time.  Nevertheless, this program was able to
take manuscript data and very quickly approximate the result that Peter had
obtained with considerably more effort.  The use of this software, developed
for research in systematics, now holds considerable promise for stemmatic
research, in particular for the reconstruction of large and complex manuscript
traditions, such as the Canterbury Tales, where the volume of data makes
analysis by inspection very difficult.  I also think it holds promise for the
reconstruction of language phylogeny, though this is yet to be explored in
detail.  Here, then, is one concrete example of the value of interdisciplinary
interaction among the historical sciences.
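
For readers who have not met parsimony analysis before, the following is a
minimal sketch of the scoring step such programs rely on: given a candidate
tree and a table of readings, count the fewest changes the tree requires.  It
uses Fitch's method for unordered characters and is not PAUP's actual code;
the manuscripts A-D, their readings, and the function names are all
hypothetical, made up purely for illustration.

```python
from typing import Dict, Set, Tuple, Union

# A tree is either a manuscript name (a leaf) or a pair of subtrees.
Tree = Union[str, Tuple["Tree", "Tree"]]


def fitch_changes(tree: Tree, readings: Dict[str, str]) -> Tuple[Set[str], int]:
    """Fitch's method for ONE variant location.

    Returns the set of readings the ancestor at this node could have had,
    and the minimum number of changes needed below this node.
    """
    if isinstance(tree, str):
        # A surviving manuscript: its reading is known, no change needed.
        return {readings[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch_changes(left, readings)
    right_set, right_cost = fitch_changes(right, readings)
    shared = left_set & right_set
    if shared:
        # The two branches can agree on a reading: no extra change.
        return shared, left_cost + right_cost
    # The branches disagree: at least one change happened here.
    return left_set | right_set, left_cost + right_cost + 1


def tree_length(tree: Tree, table: Dict[str, str]) -> int:
    """Total minimum number of changes over all variant locations."""
    n_locations = len(next(iter(table.values())))
    total = 0
    for i in range(n_locations):
        readings = {ms: row[i] for ms, row in table.items()}
        total += fitch_changes(tree, readings)[1]
    return total


# Hypothetical data: four manuscripts, four variant locations,
# with each reading coded as "0" or "1".
table = {
    "A": "0010",
    "B": "0011",
    "C": "1100",
    "D": "1101",
}

tree_1 = (("A", "B"), ("C", "D"))  # groups A with B, and C with D
tree_2 = (("A", "C"), ("B", "D"))  # groups A with C, and B with D

print(tree_length(tree_1, table))
print(tree_length(tree_2, table))
```

On this toy table the first tree requires fewer changes than the second, so a
parsimony analysis would prefer it.  A real program repeats this scoring over
a very large number of candidate trees, usually with heuristic search, and
that search is where most of the computing time goes.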

Peter and I published a preliminary report on our collaboration in the Bryn
Mawr Classical Review, and a fuller treatment is in press in the Oxford series
Research in Humanities Computing.  Full citations are:

Robinson, P. M. W., & R. J. O'Hara.  1992.  Report on the Textual Criticism
Challenge 1992.  _Bryn Mawr Classical Review_, 3:331-337.

Robinson, P. M. W., & R. J. O'Hara.  In press.  Cladistic analysis of an Old
Norse narrative tradition.  _Research in Humanities Computing_, 4.  Oxford:
Clarendon Press.

I have an e-version of the BMCR report, and will try to have it put up in the
Darwin-L archives shortly for people to retrieve if they like.  If anyone is
interested in experimenting with the available systematics software I would
recommend getting a copy of _MacClade_, another such program that permits
interactive analysis of trees and comes with an excellent introductory manual
on cladistic analysis.  (PAUP itself is probably a bit stiff for beginners.)
The citation for MacClade is:

Maddison, Wayne P., & David R. Maddison.  1992.  _MacClade: Analysis of
Phylogeny and Character Evolution_, Version 3.  Sunderland, Massachusetts:
Sinauer Associates.  (ISBN 0-87893-490-1)

For people who haven't worked with programs like PAUP before, I append here a
sample of the output from Peter's data, just to show what it looks like. The
root of the tree (the ancestor) is to the left, and each endpoint (St, J, 11,
Gu, etc.) represents an individual Svipdagsmal manuscript.  The horizontal
length of each branch is proportional to the number of changes taking place
along it; the lengths of the vertical lines are arbitrary, and branches may be
rotated about nodes arbitrarily as well.  This tree is by no means correct in
all details; it is an estimate, and it could have been made more precise with
additional effort on our part.  Our interest in presenting it has just been to
show that even with no special attention to the coding of the data it was
possible very quickly to come close to the result Peter had obtained earlier.

      /---- St
      |         /--------- J
      |        /---70   /----- 11
      |        |  \----69 /--- Gu
      |      /----73    \68---- 682
      |      |   | /- 289
      |      |   \--72------- 4877
    /--84      |    71---- E
    | |   /---77   /------- L
    | |   |  |  74----- 47
    | |   |  |  75 S
    | |  /-----81  \---76 223
    | |  |  | /-------- He
    | |  |  \80      /-- 3633
    | \-83    \-------------79/--- 6
    |  |         78---- 818b
    |  |  /------- 1870
    |  \-82------ 34
    |               /--- Ha
    |              49---- 215
    |             /-51 /--------------- 818
   /-67             |  \50------ 934
   |  |            /-54  /----- 1689
   |  |            |  \---53  /- 5
   |  |         /---------55   \---52- 329
   |  |        /---56    \------- 636
   |  |    /--------57  \------ O
   |  |   /-58   \------- 1872
   |  |   |  \-------------- 2797
 /--48  |   |       /------ 1108
 | |  |   |      /-60 /---- 1111
 | |  |  /---65    /----61  \--59-- 165
 | |  |  |  | /---62   \------ 1869
 | |  |  |  | |  \---- 4
 | |  \-66  \--64   /----- 1609
 /47 |   |    \------63-- 1867
 | | |   \--------- 1491
 | | \------ P
44 |   /--------- 1109
|| | /45-- 773
|| \--46- 1492
|\------- 1868
\---- Ra

Bob O'Hara, Darwin-L list owner

Robert J. O'Hara (darwin@iris.uncg.edu)
Center for Critical Inquiry and Department of Biology
100 Foust Building, University of North Carolina at Greensboro
Greensboro, North Carolina 27412 U.S.A.

