Darwin-L Message Log 7:95 (March 1994)

Academic Discussion on the History and Theory of the Historical Sciences

This is one message from the Archives of Darwin-L (1993–1997), a professional discussion group on the history and theory of the historical sciences.

<7:95>From DARWIN@iris.uncg.edu  Thu Mar 31 21:22:42 1994

Date: Thu, 31 Mar 1994 22:22:30 -0500 (EST)
From: DARWIN@iris.uncg.edu
Subject: Re: cladistics & distance data
To: darwin-l@ukanaix.cc.ukans.edu
Organization: University of NC at Greensboro

A few days ago Paul DeBenedictis asked a couple of questions that deserved
answers but didn't elicit any at the time.  I thought I'd take a shot at
them briefly.

Paul asked: Can one estimate cladistic relationships from distance data?

As Paul correctly noted for those who aren't familiar with this topic in
systematics, there are two basic classes of data that systematists might
use: distance data and character data.  A character datum would be a
statement that taxon x has character state k.  A distance datum would be a
statement that taxon x and taxon y are (say) 3.2 units apart.  Character
data is usually arranged in a table of taxa-by-characters, and distance data
is usually arranged in a table of taxa-by-taxa.
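
To make the two table shapes concrete, here is a minimal sketch in Python.
The taxon names, the character states, and the choice of a simple mismatch
count as the distance are all assumptions invented for illustration; they
are not data or a method from this discussion.  Distances need not be
derived from characters at all (DNA hybridization values, for instance, are
measured directly); the derivation is used here only so that the
taxa-by-characters and taxa-by-taxa layouts can be seen side by side.

```python
# Hypothetical example data: three taxa scored for four binary characters.
# Names and character states are invented for illustration only.
character_table = {
    "taxon_x": [1, 0, 1, 1],
    "taxon_y": [1, 1, 0, 1],
    "taxon_z": [0, 1, 0, 0],
}


def mismatch_distance(states_a, states_b):
    """Count the characters in which two taxa differ (one simple distance)."""
    return sum(a != b for a, b in zip(states_a, states_b))


def distance_table(characters):
    """Build a taxa-by-taxa table from a taxa-by-characters table."""
    taxa = sorted(characters)
    return {
        a: {b: mismatch_distance(characters[a], characters[b]) for b in taxa}
        for a in taxa
    }


distances = distance_table(character_table)
for taxon, row in distances.items():
    print(taxon, row)
```

Each row of character_table is a set of character data for one taxon, while
each cell of distances is a single distance datum for a pair of taxa.  Note
that the distance table no longer records which characters differ, only how
many.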

Paul phrases the question correctly when he asks whether distance data can
be used to _estimate_ cladistic relationships (the sequence of branching
events in the history of a clade).  It is best to think of phylogenetic
history as a complex branching sequence of events that we wish to estimate.
That being the case, I think the answer is certainly yes, distance data can
be used to estimate branching sequences, as can character data.  The
nuts-and-bolts question in any particular case is how well distance
methods or character methods estimate any particular evolutionary chronicle.
I could estimate the phylogeny of a collection of taxa by first joining all
those whose names begin with "A", and then all those whose names begin with
"B", and so on down the alphabet.  This resulting tree would be an estimate
of the phylogeny of the taxa in question.  It would probably not be a very
good one, of course.  I have found that thinking about phylogenetic
inference as the estimation of a sequence of events -- something that Greg
Mayer convinced me to do -- is very liberating, as it allows one to escape
some of the polemical rhetoric of the older (1960s and 1970s) systematic
literature, which was filled with accusations of "my method generates
testable hypotheses and yours doesn't", "distance approaches are worthless",
etc.  All these various approaches can give estimates of phylogeny; the
question is which estimates are better, and that is not easily answered as
it depends not just on theory but also the practice of each individual

Since I answered Paul's first question as I did, my answer to his second
question -- "What makes a technique cladistic?" -- may not be surprising.
I have come around to the view that Greg Mayer has expressed here once or
twice (he taught me everything I know), that we should take the terms
"cladistic" and "phenetic" to refer to intentions rather than particular
procedures, types of data, or algorithms.  A technique is cladistic if it is
used for the purpose of estimating phylogeny.  Sibley's intention in his DNA
hybridization work is clearly to estimate phylogeny, and so he is using
distance data in a cladistic manner.  Now whether the phylogenetic estimates
he produces are good ones is a separate issue.  I have been critical of them
here before.  But the fact that they may be poor estimates in some cases
does not, in my view, make them non-cladistic.

A follow-up for the linguists: Were the techniques of "lexicostatistics" and
"glottochronology" that were popular in the 1960s distance techniques?  That
is, did the lexicostatisticians calculate "distance" values among languages,
and use these to reconstruct language history, rather than using individual
lexical items (linguistic character data)?  Is that also perhaps what
Greenberg does with his technique of "mass comparison"?

Bob O'Hara, Darwin-L list owner

Robert J. O'Hara (darwin@iris.uncg.edu)
Center for Critical Inquiry and Department of Biology
100 Foust Building, University of North Carolina at Greensboro
Greensboro, North Carolina 27412 U.S.A.

© RJO 1995–2022