15.30
March 13 2009
Seminar Room C
Assessing Computational Methods in Historical Linguistics
Mark Donohue
Argumentation in historical linguistics has (for a number of good reasons; see Harrison 2003) been dominated by consideration of the comparative method, which relies on unpredictable form-meaning pairings in the lexicon or in morphological paradigms.
Recently various claims have been made about the usefulness of different methodologies (essentially lexical or typological/structural) in determining linguistic relationships, and about how well these methodologies replicate the results of traditional methods (e.g., Dunn et al. 2005, Gray et al. 2009; see Nichols and Warnow 2008). In light of our knowledge about the diffusibility of almost all elements of linguistic structure (Thomason and Kaufman 1988, Holman et al. 2007, Donohue et al. 2008), we must question whether these techniques detect linguistic families or linguistic areas. The matter is complicated by the fact that in most cases geographic distance does correlate with linguistic genetic distance.
I examine 48 languages of Eurasia, selected on the basis of maximum representation in the World Atlas of Language Structures (WALS; Haspelmath et al. 2005). Applying statistical analyses (Huson and Bryant 2006) to the typological data encoded for each language in WALS, I compare the resulting clusters to the phylogenies established by the comparative method, and to recognised 'linguistic areas', in order to determine empirically whether the analysis of typological features better detects spatial distance or phylogenetic distance (cf. Sokal 1988, Wichmann and Saunders 2007).
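As a rough illustration of the kind of comparison described above, the sketch below correlates pairwise typological distance with geographic distance and with a crude phylogenetic distance. Everything in it is an assumption made for illustration: the four languages, their coordinates, family labels and feature values are invented (they are not WALS codings), and the Hamming distance and Pearson correlation stand in for whatever statistics the talk actually uses.

```python
from itertools import combinations
from math import sqrt

# Hypothetical toy data: invented values, not real WALS codings.
# name: (family label, (x, y) position, typological feature values)
LANGS = {
    "A1": ("FamA", (0.0, 0.0), (1, 1, 2, 1)),
    "A2": ("FamA", (5.0, 0.0), (1, 2, 2, 1)),
    "B1": ("FamB", (1.0, 0.0), (1, 1, 2, 2)),
    "B2": ("FamB", (6.0, 0.0), (2, 2, 1, 2)),
}

def hamming(u, v):
    """Proportion of features on which two languages differ."""
    return sum(a != b for a, b in zip(u, v)) / len(u)

def euclid(p, q):
    """Straight-line distance between two positions."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

pairs = list(combinations(sorted(LANGS), 2))
typ = [hamming(LANGS[a][2], LANGS[b][2]) for a, b in pairs]
geo = [euclid(LANGS[a][1], LANGS[b][1]) for a, b in pairs]
# Crude phylogenetic distance: 0 within a family, 1 across families.
phy = [0.0 if LANGS[a][0] == LANGS[b][0] else 1.0 for a, b in pairs]

r_geo = pearson(typ, geo)  # typological vs. spatial distance
r_phy = pearson(typ, phy)  # typological vs. phylogenetic distance
print(f"r(typ, geo) = {r_geo:.3f}; r(typ, phy) = {r_phy:.3f}")
```

Because pairwise distances are not independent observations, a real analysis would need a permutation-based test (such as a Mantel test) rather than an ordinary significance test on these coefficients.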
Investigating lexical methods, I examine the reported replication of the Austronesian family tree by Gray et al. (2009), which relied on lexical methods (not comparative method techniques). The tree structure that they arrived at can be compared both to that reported by Blust, by examining the rate of successful replication of subgroups, and to the spatial distribution of the languages. This allows us to test whether the analysis of lexical cognate classes detects spatial distance or phylogenetic distance.
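One simple way to quantify a "rate of successful replication of subgroups" is to count how many subgroups of a reference classification appear as clades in the computed tree. The sketch below assumes this definition, which the abstract does not spell out; the tree, the language labels and the subgroup names are all hypothetical placeholders, not Austronesian data.

```python
def clades(tree):
    """Return (leaf set, set of clades) for a tree given as nested tuples.

    Leaves are strings; every internal node contributes one clade,
    namely the frozenset of the leaves below it.
    """
    if isinstance(tree, str):
        return frozenset([tree]), set()
    leaves = frozenset()
    found = set()
    for child in tree:
        child_leaves, child_clades = clades(child)
        leaves |= child_leaves
        found |= child_clades
    found.add(leaves)
    return leaves, found

def replication_rate(inferred_tree, reference_subgroups):
    """Share of reference subgroups recovered exactly as clades."""
    _, found = clades(inferred_tree)
    hits = [name for name, members in reference_subgroups.items()
            if frozenset(members) in found]
    return len(hits) / len(reference_subgroups), hits

# Hypothetical computed tree and reference subgroups (invented labels).
inferred = ((("a", "b"), "c"), ("d", "e"))
reference = {
    "G1": {"a", "b"},
    "G2": {"a", "b", "c"},
    "G3": {"c", "d"},
}

rate, hits = replication_rate(inferred, reference)
print(f"replicated {len(hits)} of {len(reference)} subgroups: {hits}")
```

Exact clade matching is a strict criterion; a subgroup that is recovered except for one misplaced language counts as a failure here, so a softer overlap measure may be preferable in practice.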
Finally, I show the results of an attempt to replicate the comparative method computationally, using data on the West Wapei subgroup of the Torricelli languages (Crowther 2001), for which both lexical and sound change data are available.
References
Crowther, Melissa. 2001. All the One language(s). Thesis, University of Sydney.
Donohue, Mark, Søren Wichmann and Mihai Albu. 2008. Typology, areality and diffusion. Oceanic Linguistics 47 (1): 223-232.
Dunn, Michael, Angela Terrill, Ger Reesink, Robert A. Foley and Stephen C. Levinson. 2005. Structural phylogenetics and the reconstruction of ancient language history. Science 309 (5743): 2072-2075.
Gray, R.D. et al. 2009. Language phylogenies reveal expansion pulses and pauses in Pacific settlement. Science 323: 479-483.
Harrison, Shelley. 2003. On the limits of the comparative method. In The Handbook of Historical Linguistics, eds. Brian D. Joseph and Richard D. Janda, 213-243. Oxford: Blackwell.
Haspelmath, Martin, Matthew S. Dryer, David Gil and Bernard Comrie, eds. 2005. The World Atlas of Language Structures. Oxford: Oxford University Press.
Holman, Eric W., Christian Schulze, Dietrich Stauffer and Søren Wichmann. 2007. On the relation between structural diversity and geographical distance among languages: observations and computer simulations. Linguistic Typology 11 (2): 395-423.
Huson, D. H. and D. Bryant. 2006. Application of phylogenetic networks in evolutionary studies. Molecular Biology and Evolution 23 (2): 254-267. Software available from http://www.splitstree.org/
Nichols, J. and T. Warnow. 2008. Tutorial on Computational Linguistic Phylogeny. Language and Linguistics Compass 2 (5): 760-820.
Sokal, R.R. 1988. Genetic, Geographic and Linguistic distances in Europe. Proceedings of the National Academy of Sciences of the U.S.A. 85 (5): 1722-1726.
Thomason, S.G. and T. Kaufman. 1988. Language contact, creolization, and genetic linguistics. Los Angeles: University of California Press.
Wichmann, S. and A. Saunders. 2007. How to use typological databases in historical linguistic research. Diachronica 24: 373-404.