Chapter One: Radical Linguistics and the Word List

Anyone who has never written a book could be forgiven for supposing that published writing represents exactly what the author intended to say. (R. Morris, Churches in the Landscape, 1997, 451)

When I first began to experiment with word lists, I was motivated by two factors: curiosity and frustration. I had a natural interest in the words I use every day and in related words in other languages, but that interest was frustrated. As I saw it then, and still see it now, those who control research into the origins of language stood between me and what I wanted to know. Instead of providing exciting insights into the evolution of language, they produced volumes of incomprehensible jargon. A few aspects of these theories as they relate to words are summarised in an appendix.

Words as a resource have been neglected by historical linguists. And yet the words we use are arguably the most important elements in our human heritage. As individuals we are adapted physically and neurologically to use words. Every human society on the planet has inherited a lexicon from its forebears, created out of their own resources and for their own needs. I have always felt that it should be possible to understand how and why, and perhaps even when and where, language was created, and this belief has been justified. If I now understand something of the origins of words, it is thanks to the words themselves. For words, unlike bones, stone tools or fragments of pottery, are not silent witnesses. Words carry meaning with them, and in the following chapters they largely speak for themselves.

My proposals and methods rest on certain assumptions that will seem strange to those familiar with current theories about language, and I need to warn them of this. They seemed very strange to me at first, but in fact I make no novel assumptions. The difficulties mainly concern barriers to acceptance: the method will seem simplistic as well as strange to those familiar with conventional linguistics. All the bits of this theory have been lying around in plain view for several generations, like fossil bones on the plains of East Africa, waiting only for someone to recognise what they are. The word list works very well as a practical tool and appears to reflect the reality of lexical transmission in an oral society. I should stress that it is only with oral societies that we are concerned here. Literacy introduces a game with entirely different rules.

A possible barrier to acceptance is the adjustment to the time-scale envisaged for certain linguistic events. Instead of the eight thousand years proposed by some historical linguists or the fifty thousand proposed by others (W. Noble and I. Davidson, Human Evolution, Language and Mind: a Psychological and Archaeological Enquiry, 1996), the date of the first use of language has receded to an enormous distance. One reference point is 1.4 million years ago, with the appearance of Broca’s area, the part of the brain which controls the sequential use of the vocal cords, a process essential for speech. But even this development follows on from, and is enhanced by, the use of language. It is not the earliest point.

We also need to reject most of what we think we know about the languages of Europe, notably the so-called Indo-European theory which proposes that they were brought to Europe by Neolithic farmers. This theory is wrong. It has never worked well – many books and papers on the topic include ‘problem’ or ‘search’ in their titles – and it should be rejected. But to do so and replace it with a more rational theory is no simple thing. Indo-European (or Indo-Germanic) theory has been installed in a dusty corner of our intellectual attic for a very long time. (The term 'Indo-European' was first cited in English in 1814, while 'Indo-Germanic' first appeared in 1823.) Sir William Jones noted the similarities between Sanskrit, Greek and Latin as long ago as 1786. He deduced that this unlikely trio were the descendants of a single language, though no-one has ever worked out where, when or what this was.

But to trace developments back a million years and perhaps even further is no longer so difficult to believe. Thanks to recent discoveries in human genetics and general advances in palaeontology we are all more familiar than we once were with the long view of human origins. And the long view allows us the time needed to accommodate the immensely complex development of modern words.

What I can now demonstrate is not entirely new after all. It has been proposed or predicted to some extent by thinkers such as Condillac, Darwin and Corballis.1 The most plausible idea is that language began among early men as a system of silent signing to which supplementary hoots, grunts and hisses were added for emphasis or definition. By 1.4 million years ago these verbal signs had evidently evolved into spoken language.

The discoveries laid out in the following pages began almost accidentally. I knew that the United States Census Bureau had adopted a system for indexing surnames by reducing the number of effective consonants (the Soundex system). Was it possible that Soundex worked for other elements of language? It took me some time to make the necessary adaptations and compile my first word list. This prototype eventually became the CM list, and other lists began and grew in a similar way.

Radical etymology

The word list provides a revolutionary way of looking at language. Every list so far compiled has produced surprising and coherent insights, and the technique holds great promise of further discoveries bearing on the way we understand the origins, history and evolution of language and even, since words mean something, aspects of cultural development. Radical linguistics is about as far as one can get from the way the lexicon is currently used in historical linguistics.

In Western Europe today the guiding principles in historical linguistics are still those of Indo-European theory, a venerable but unproductive concept which suffers from several logical flaws. It confuses assumption with proof and offers no mechanism for change beyond the mantra that ‘language changes’. Enquiries into the origins of words still rest, as they have done for centuries, on a word-by-word search for a similar word, either in a neighbouring language that is believed to be older (as if all languages were not of the same age) or in early written sources, and on the idea that the one ‘derives from’ the other. This merely moves the burden of explanation from one place to another. The old teaser remains valid: ‘If we got it from the Romans, where did they get it from?’ We will come back to this point.

Indo-European theory and Radical Linguistics agree on one point, and on one only: that the use of similar words with similar meanings in different languages is a sign of their common origins. Otherwise they disagree. Indo-European theory traces this common origin no further than the Neolithic and a hypothetical Proto-Indo-European language (PIE) which, it is supposed, was imported into Europe by farmers migrating from the Middle East perhaps 8,000 years ago: there is no agreed date but it is claimed that language cannot be traced beyond this date. This vision of Indo-European migrants flooding into Europe and replacing earlier populations is established as a fact in the minds of all historical linguists and a dwindling number of archaeologists. These experts seem to have forgotten that, as a way of explaining the origins of European language, Indo-European theory is nothing more than a hypothesis, a very disappointing one at that, which has created many problems and solved none. In other words, it does not work and is ripe for replacement. The earliest discoveries were certainly impressive but that is not a valid reason to continue to place one’s trust in a theory when every bit of new evidence suggests something very different.

In terms of genetic input, Neolithic immigrants into Europe contributed less than 20 per cent of the modern population.12 Most Europeans trace their ancestry back to the original hunters of the Upper Palaeolithic. Farmers using pots are certainly very visible against the slighter traces of Mesolithic hunters but are perhaps not as important as archaeologists have thought. Even the Linearbandkeramik, the best candidate for actual migration as opposed to the diffusion of ideas, is now seen to be less homogeneous and more hesitant than once supposed.13 After a relatively short time the visibility of these migrants declines and they disappear. Far from being easy to demonstrate, the IE language which supposedly swept over Europe at that time is still limited to a list of hypothetical words, uncertainly supported by phonetic extrapolation.

The inability of the Indo-European approach to explain in any coherent way the origins of European language can be shown by the word car. We devote an entire chapter to this word and its relatives. Gaelic alone uses it in fifty or sixty distinct ways (page ref). And yet the most that Indo-European scholars can say of *karos or *karsos is that it meant ‘cart, raft’,14 or simply ‘vehicle’.15 They describe it as ‘a wanderword of uncertain provenance’. This is not good enough. If historical linguists persist in claiming, as they do, that European languages derive from a single language brought into Europe by farmers in the Neolithic, they are under an obligation to explain how *karos also came to mean a stone, a necklace, a fish, the jaw, and a great many other things. To this and other similar questions there are no answers.

Radical Linguistics begins at the other end. It assumes that all languages are equally old, since they have all evolved in a continuous fashion from the first meaningful syllable to the present day, and that much of this lexical continuum remains in use. It assumes that the stimulus to lexical creativity is not a random process of change for no reason, which can be measured but not understood, but the need to find new names for new items of culture as they are invented, balanced against the more gradual loss of obsolete words for discarded items of culture. This is true of preliterate societies and very largely true of literate societies. If we want to know about the origins and development of the word car, we have only to pull together the relevant facts. There is no need to speculate and very little need to extrapolate. The CR words as a group tell us everything there is to know about their own history in the area examined. Some parts of the story are no doubt sketchy, obscure or missing altogether, and the categories are open to varying interpretations, but those proposed here show a logical outline of what has gone on. There are archaic words for archaic items and more recent words for more recent items. To define car as ‘vehicle’ is not wrong, but it is only one small, if currently fashionable, aspect of a complex development which may well have begun long before Europe was settled. CR words for fire and for symmetrical objects may be as old as mankind’s experience of these things.

The Table of Equivalent Consonants (TEC)

Word lists are based on the ordinary words listed in modern dictionaries. They do not define the oldest form of any words or account for the variety of present-day sounds. Radical linguistics has one tool, the Table of Equivalent Consonants (TEC), and the TEC has one application, the word-list. The TEC was inspired by Soundex, a phonetic indexing system which was developed by Robert C. Russell and Margaret K. Odell and patented in 1918 and 1922. It was designed to bring order to the wildly varied spellings inflicted on immigrant surnames by English-speaking clerks. The main feature of Soundex was a reduction in the number of effective consonants from the twenty or thirty in modern English to six or seven. This allowed archivists to convert surname variants to standard codes and reduced chaos to something approaching order. The efficacy of this empirical but effective system arises from the fact that these variant spellings are not only the result of sloppy speech, or imperfect education, or even laziness on the part of the clerks: they represent what actually happens to words when they are passed from mouth to ear and then written down. Human language has been transmitted orally for most of its existence and then transcribed in a similar way. I wondered if we could apply Soundex to language generally.
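For readers who have not met Soundex, the reduction it performs can be sketched in a few lines of Python. This is a minimal sketch of the rules usually described as American Soundex (keep the first letter, code the remaining consonants into six digit classes, ignore vowels, collapse neighbouring letters with the same code, then pad or truncate to four characters). It is not taken from the 1918 or 1922 patents, which may differ in detail, and the function name `soundex` is our own.

```python
import string

# Digit codes used by American Soundex; letters not listed (vowels, h, w, y) get no code.
CODES = {
    **dict.fromkeys("bfpv", "1"),
    **dict.fromkeys("cgjkqsxz", "2"),
    **dict.fromkeys("dt", "3"),
    "l": "4",
    **dict.fromkeys("mn", "5"),
    "r": "6",
}


def soundex(name: str) -> str:
    """Return the four-character American Soundex code for a surname."""
    letters = [c for c in name.lower() if c in string.ascii_lowercase]
    if not letters:
        return ""
    code = letters[0].upper()         # the first letter is kept as a letter...
    prev = CODES.get(letters[0], "")  # ...but its digit still blocks an immediate repeat
    for c in letters[1:]:
        digit = CODES.get(c, "")
        if digit and digit != prev:
            code += digit
        if c not in "hw":             # h and w do not separate letters with equal codes
            prev = digit              # a vowel resets prev to "", so a repeat is coded again
    return (code + "000")[:4]         # pad with zeros or truncate to four characters


if __name__ == "__main__":
    print(soundex("Robert"), soundex("Rupert"))  # R163 R163
    print(soundex("Tymczak"))                    # T522
```

Robert and Rupert fall together as R163, which is exactly the kind of convergence that made the system useful for indexing variant spellings.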

The resulting table of equivalents, or TEC, provides a framework within which we can recognise related words and index them despite their current variety in spelling. The various equations which operate within and between the languages of Europe can be reduced to six classes of consonants represented by B, M, C, D, L and R (Fig. 1.1). A measure of overlap between them is discussed below. These classes provide 36 disyllabic roots (Fig. 1.2). Vowels are disregarded because they often change to show case, gender, number or tense, and from one dialect to the next, none of which affects the meaning of the basic root. For our purposes E. sang, sing, song and sung are all the same word. Our selection of words rests entirely on their consonantal structure and ignores meaning. So far the method has been applied mainly to European languages, but it appears to have general validity.

Fig.1.1 The six TECs

TEC 1 B Mb P F V W
TEC 2 M N Ng Gn F V
TEC 3 C G J K Q S X Y Ch Sh H LL
TEC 4 D T Th Z
TEC 5 L LL
TEC 6 R RR
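The reduction described above can be made concrete with a minimal Python sketch. It is not the author's own procedure, which is applied by hand, and several choices in it are assumptions: F and V, which Fig. 1.1 lists under both TEC 1 and TEC 2, are filed under B; H and Y go to C because the figure lists them in TEC 3; Welsh LL is not distinguished from an ordinary double l; initial mb counts as B and medial mb or mp as M, following the treatment of Mb later in this chapter; neighbouring consonants of the same class count once unless a vowel separates them; and the root is taken to be the first two classes. The names `tec_classes` and `tec_root` are hypothetical.

```python
import unicodedata
from typing import List, Optional

# Two-letter spellings, checked before single letters.
DIGRAPHS = {"ch": "C", "sh": "C", "th": "D", "ng": "M", "gn": "M"}

# Single letters; F and V are filed under B here (an assumption, see above).
SINGLES = {
    **dict.fromkeys("bpfvw", "B"),
    **dict.fromkeys("mn", "M"),
    **dict.fromkeys("cgjkqsxyh", "C"),
    **dict.fromkeys("dtz", "D"),
    "l": "L",
    "r": "R",
}


def fold(word: str) -> str:
    """Lower-case and strip accents, so that 'Amaré' becomes 'amare'."""
    decomposed = unicodedata.normalize("NFD", word.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def tec_classes(word: str) -> List[str]:
    """Reduce a word to its sequence of TEC classes, ignoring vowels."""
    w = fold(word)
    classes: List[str] = []
    prev: Optional[str] = None  # class of the last consonant; a vowel resets it
    i = 0
    while i < len(w):
        pair = w[i:i + 2]
        if pair == "mb" and i == 0:
            cls, step = "B", 2  # initial Mb counts as B
        elif pair in ("mb", "mp"):
            cls, step = "M", 2  # medial Mb/Mp counts as M (lamp, number)
        elif pair in DIGRAPHS:
            cls, step = DIGRAPHS[pair], 2
        else:
            cls, step = SINGLES.get(w[i]), 1  # None for vowels and other marks
        if cls is None:
            prev = None
        elif cls != prev:
            classes.append(cls)
            prev = cls
        i += step
    return classes


def tec_root(word: str) -> Optional[str]:
    """The disyllabic root: the first two TEC classes, or None if there are fewer."""
    classes = tec_classes(word)
    return "".join(classes[:2]) if len(classes) >= 2 else None


if __name__ == "__main__":
    for w in ["sum", "sang", "sing", "song", "sung", "sanguis", "car"]:
        print(w, tec_root(w))  # every word except car gives CM; car gives CR
```

Under these assumptions sang, sing, song and sung collapse to the same root as sum, and the aspirated surnames Chamard and Hamard both reduce to CM.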

Fig.1.2 The 36 disyllabic roots
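The figure of 36 is the number of ordered pairs that can be drawn from six classes when a class may repeat (6 × 6 = 36). Assuming that reading, the roots can be enumerated mechanically. In this short Python sketch the first consonant runs down the side and the second across the top, a layout chosen here for illustration.

```python
from itertools import product

CLASSES = "BMCDLR"  # the six classes in the order given in the text

# Every ordered pair of classes, repetition allowed: 6 x 6 = 36 roots.
ROOTS = ["".join(pair) for pair in product(CLASSES, repeat=2)]

# Print as a 6 x 6 grid, one row per first consonant.
for row in range(len(CLASSES)):
    print(" ".join(ROOTS[row * 6:(row + 1) * 6]))  # first row: BB BM BC BD BL BR
```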


These six classes appear to have wide validity, though not all of them are used in every language. They embrace the most important of the changes known as lenition, mutation or aspiration, which are still in active use in Welsh and Gaelic but were once more widespread.5 This TEC is a working compromise rather than a fixed position. It could have more, fewer or different classes, but these six fit most European words. This suggests that they correspond to the range of sounds in use when Europe was populated some 50,000 years ago. Our TEC represents what we might call their foundation consonants.

Six consonants may seem excessively few but even this small number is reduced on occasion by the equivalence of B/C, B/M, and L/R, which are discussed below. Other less common equations will be discussed as they arise.

B and C

The evidence suggests that B and C derive from a single ambiguous sound which survived among the Gaels until the turn of the twentieth century. It was typical of one very isolated place in Britain: the island of St Kilda (Hirta), which lies in the Atlantic Ocean fifty miles west of Lewis. In 1898 Norman Heathcote and his sister spent the summer there. They had prepared themselves by learning Gaelic, as he planned to collect place-names. However, he was defeated by the local pronunciation for, as he noted, ‘when a St Kildan pronounces a word I never have the least idea whether it begins with a c, an f or a p; all I can detect is a sort of guttural sound’ (N. Heathcote, St Kilda, 1900, 132).

We can deduce from early spellings of other Gaelic place-names that this guttural sound was once widespread. In different spellings the second element of Balquhidder (Perthshire) is found with B, P, F, FF, Ch, Quh, and Wh. This reflects the fact that this sound has no equivalent in the Latin alphabet.

This leads to the supposed division of Celtic languages into ‘P’ and ‘Q’ varieties. This is a scholarly notion, based on the fact that in Welsh some words are spelled with P while in Gaelic they are spelled with C. In fact there are very few of these words – I have found a dozen pairs – and when we look beyond them things become much less clear. W. planta ‘children’ certainly begins with P, and G. clann ‘children’ certainly begins with C, but what are we to make of W. llanc, ‘young man, youth’ which is pronounced ‘hlanc’ or ‘chlanc’? And how can we fit W. chwaer and G. piuthar ‘sister’ and W. cus and G. pog ‘kiss’ into the P/Q classification? A simple page count shows that Welsh has three times as many C words as P words and, as we have seen, P, F and C were indistinguishable in certain Gaelic place-names. There are plenty of examples of confusion between P and Q but nothing that resembles a rule. Instead we may see planta, clann and llanc as variant spellings of a word which began with the intermediate B/F/C sound (the Heathcote consonant) and which meant ‘young men’ or ‘hunters’.

Gr. pente and Lat. quinque ‘five’ are also held up as examples of the P/Q divide but, again, establish no kind of precedent. Latin may use a Q word for ‘five’, but it uses two TEC 1 words, bos ‘ox’ and vacca ‘cow’, instead of a C word akin to cow. English has the B variants four and five, the C variants sister and kiss, and B/C equivalents such as bull/cow, can/pan, shall/will, sallow/willow, bake/cake, boil/coal and peep/peek/keek.

When the Heathcote consonant evolved into the two distinct sounds now represented by TEC 1 and TEC 3, existing words were converted on a more or less random basis to B words or C words. When transcribing a word which retained this old consonant, the clerk had to make a choice between B and C, since the Latin alphabet is incapable of representing a noise that is neither one nor the other. Had Mr Heathcote been employed by the Ordnance Survey to record the Gaelic place-names of St Kilda he might have wavered between C, F and P but must eventually have plumped for one of them. If he chose P, we might now believe that the natives of St Kilda spoke Welsh. The B/C confusion is not a reliable basis for a major division of European languages. It is significant only because we will find signs of this ambivalence in the older levels of language.

B, Mb and M

One other case of consonantal ambiguity has fortunately not attracted learned speculation. M is now an independent consonant with its own class (TEC 2), but occasionally we find it as an initial consonant where we would expect B or P, as in G. maolan ‘beacon’ for *baolan. This M is regarded as an aspect of B and has been designated Mb. A sign of their former identity is that both B and M aspirate to V. Mb is found in Bantu languages (mbula ‘rain’) and in Albanian (mbret ‘king’). The consonantal cluster found in E. lamp, lambent, nimble, number and shamble is treated here as M. Related words such as Du. nummer ‘number’, Sc. shammle ‘shamble’ and E. climb and limb mark its evolution to M (TEC 2).

L and R

The divergence of L and R is probably more recent. European languages generally distinguish between L (TEC 5) and R (TEC 6) but there are occasional signs of doubt, as between Lat. margarita and Heb. mar’ga’leet ‘pearl’. Japanese and Bantu languages still use an indeterminate sound. This suggests that they were already self-contained languages before L and R diverged. This is clear enough in the case of Bantu since the ancestral Bantu remained in Africa when the rest of us emigrated to Asia. It appears that by 50,000 BP, when Europe was settled, people distinguished between the two sounds, since a single foundation event is more probable than a multitude of identical but unrelated local events. Again, the main importance of this equivalence is that we should be prepared to equate L with R when exploring early levels of language.
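The three occasional equivalences (B/C, B/M and L/R) can be treated as optional mergers applied on top of the six classes when two roots are compared. A minimal Python sketch, assuming roots are written as strings of class letters such as 'BC' and that each merger is simply switched on or off; `make_merger` and `same_root` are illustrative names, and the roots in the usage lines (bake BC, cake CC) were read off Fig. 1.1 by hand.

```python
from typing import Callable, Dict, Iterable, Tuple


def make_merger(pairs: Iterable[Tuple[str, str]]) -> Callable[[str], str]:
    """Union-find over class letters: returns a function mapping each class to a representative."""
    parent: Dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            x = parent[x]
        return x

    for a, b in pairs:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[max(ra, rb)] = min(ra, rb)  # the alphabetically smaller letter represents the group
    return find


def same_root(r1: str, r2: str, pairs: Iterable[Tuple[str, str]] = ()) -> bool:
    """True if two roots match position by position once the chosen mergers are applied."""
    find = make_merger(pairs)
    return len(r1) == len(r2) and all(find(a) == find(b) for a, b in zip(r1, r2))


print(same_root("BC", "CC"))                # False: the six classes are kept apart
print(same_root("BC", "CC", [("B", "C")]))  # True once B/C is merged
```

Union-find is used so that overlapping mergers such as B/C and B/M combine correctly: switching on both places B, C and M in one group.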

Tested in Armenian

How well does our TEC work? Dr. Adour Yacoubian, who pioneered the transcription of Armenian into the Latin alphabet, discovered a great many consonantal equivalents which produce ‘better results when comparing with English words’ and, he added, ‘vice versa’. His equivalents are as follows (A.H. Yacoubian, English-Armenian and Armenian-English Dictionary, Los Angeles, 1944, 6):

Q for G; D for T; P for B; P’ for Ph; T’ for Th; G for Q, C, K, Ck, Ch; C’ for J;
J for Ch; H’ for L; Gh for L, R; H for F, V, W; Sh for S; Ch for Kh; Tz for F. 7

All is as it ought to be until we get to H’ for L and Gh for L, R. Examples of H’ for L and Gh for L and R are not common: Arm. deghi ‘place, spot’ for Lat. terra, Arm. gat’oghigea for E. ‘catholic’, Arm. ghampar for E. lamp, and Arm. h’amr for E. lame. H for F, V and W points to overlap between B and C (discussed above). H’ and Gh are evidently guttural sounds like Welsh LL and Parisian R, and I have added them to TEC 5 and TEC 6 as LL and RR. The final equivalence, Tz for F, is found in Arm. tzayn ‘sound, voice’, as E. tone and Gr. phone ‘voice, sound’, in Arm. tze’t’ ‘oil’, as E. fat, and in Arm. tziut’ ‘pitch’, as E. soot. This appears to be a variant of Heathcote’s consonant. We will find it again in the Fire list (page ref).


Aspiration is responsible for several regular changes in the sounds of consonants.

Fig.1.3 Aspiration
TEC 1 B, P > V or F > (-)
TEC 2 M > V or F > (-)
TEC 3 C, G, S > Ch > H > (-)
TEC 4 D, T > Th > H > (-)

These changes are still active in Welsh and Irish but were once observed in Europe generally.2 They are easiest to follow in place-names and surnames which are evidently old enough to have undergone a lengthy process of change. The following examples come from France and Britain.

Fig.1.4 French surnames (Maine-et-Loire)
Camar, Chamard, Hamard, Amaré
Chasseboeuf, Haudebault, Audebeau
Chaussebourg, Haudebourg, Audebert
Moreau, Forray, Horreau, Auray

Fig.1.5 English surnames
Callicott, Caldecot, Cawcutt, Hallegate, Alcock, Alcott, Aucott
Chadwick, Chattock, Shatlock, Shacklock, Hadcock, Atcock, Awcock
Charnock, Harnock, Arnot
Pallis, Challis, Hallis, Alice
Scatchard, Hatchard, Achard, Ackert
Zachery, Hacker, Ackery

Fig.1.6 English place-names
The suffixes ton, ing and ham are clerical affixes meaning ‘farm’ or ‘estate’.
Ballingham, Billingham, Willingham, Ellingham
Catforth, Chatworth, Hatford, Atworth
Coldingham, Holdingham, Aldingham
Bellingdon, Wellington, Hallington, Allington, Ellington.
Benwick, Fenwick, Anwick
Marlow, Farlow, Harlow
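The chains in Fig. 1.3 can be run forwards to spell out the forms a name would take at each later stage of aspiration. A minimal Python sketch, assuming only the initial consonant is affected and treating the V or F stage as two alternatives; names beginning with Sh, which the figure does not cover, are left alone, and `aspirated_forms` is a hypothetical name. For example, Camar gives Chamar, Hamar and Amar, which differ from the Chamard, Hamard and Amaré of Fig. 1.4 only in their endings, and Marlow gives Farlow, one of the forms in Fig. 1.6.

```python
from typing import Dict, List

# Later stages for each initial consonant, read left to right from Fig. 1.3.
# "" stands for the consonant lost altogether, written (-) in the figure.
LATER_STAGES: Dict[str, List[str]] = {
    "ch": ["h", ""], "th": ["h", ""],
    "sh": [],  # not covered by Fig. 1.3, so left alone in this sketch
    "b": ["v", "f", ""], "p": ["v", "f", ""], "m": ["v", "f", ""],
    "c": ["ch", "h", ""], "g": ["ch", "h", ""], "s": ["ch", "h", ""],
    "d": ["th", "h", ""], "t": ["th", "h", ""],
    "v": [""], "f": [""], "h": [""],
}


def aspirated_forms(name: str) -> List[str]:
    """Spell out the forms a name would take at each later stage of aspiration."""
    w = name.lower()
    # Try two-letter onsets (ch, th, sh) before single letters.
    for onset in sorted(LATER_STAGES, key=len, reverse=True):
        if w.startswith(onset):
            rest = w[len(onset):]
            return [(stage + rest).capitalize() for stage in LATER_STAGES[onset]]
    return []  # vowel-initial names have nothing left to lose


print(aspirated_forms("Camar"))   # ['Chamar', 'Hamar', 'Amar']
print(aspirated_forms("Marlow"))  # ['Varlow', 'Farlow', 'Arlow']
```

The sketch only generates candidate spellings; it says nothing about whether any particular pair of names is actually related.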

The evolution of consonants

The six effective consonants, B, M, C, D, L and R, appear to represent a horizon in the evolution of European language. However, the much larger number of sounds now in use in Europe not only defines national languages but shows us people being inventive with sounds, a process that has no doubt been driven by the need for additional lexical resources as human culture evolved. The Finns, living a quiet life in the northern forests, achieve phonetic exactitude with only 9 consonants (D K L M N P R S T) and 4 aspirates (H J V Y). Early alphabets were similarly economical: the basic Ogam alphabet had 20 letters including 15 consonants, and the later runic alphabet (the Younger Futhark) had only 16 letters including 12 consonants. Runic lacked graphs for D, G, P, and V/W but made do with T, K, B and U. Irish still has only 13 consonants but compensates with very complicated spelling. Greek originally had only 12 but after several revisions ended up with 18.10 Cyrillic is loosely based on Greek but has 22 consonants including 9 aspirates, while Arabic uses 27 consonants including 8 aspirates. All of this pales beside the complexity found in the Caucasus, where Abkhaz has 56 consonants or regular consonantal clusters, Bzyp, a dialect of Abkhaz, has 67, and Ubykh, recently extinct, had 80.3 In various places the ability to pronounce certain difficult sounds has been used as a way of distinguishing friend from foe. Is there a correlation between difficult sounds and ethnic confrontation?

As we all know, the end-result of this process of consonantal evolution is that every language in Europe now uses a distinctive range of sounds. If we work back in time we can imagine this modern variety dwindling down to the six or fewer effective consonants represented by our TEC. It provides a useful horizon, no more.

The word-list

The TEC allows us to construct word-lists, each consisting of words that share one of the disyllabic roots in the table (Fig.1.2). If we start with the English word sum, the root is CM and our list must include all the words in which any variant of C (TEC 3) is followed by any variant of M (TEC 2). This will normally generate a list of several hundred words, which can be allocated to categories like those of a thesaurus. A word-list can be long or short.

A long list is inclusive. It represents a complete slice of the lexical pie. A long list should, ideally, contain every word that fits the designated root in every contributing language. It is important that we should not be influenced by any preconceived ideas about which themes might or might not be relevant. If we begin the CM list with E. sum and Du. samen ‘together’ we might see no reason to include Lat. sanguis ‘blood’ or E. song but if they are not included the final picture will be incomplete and may be misleading. On the other hand we can limit duplication by the application of common sense. There is normally no need to bulk up a list by including identical words with identical meanings in closely-related languages such as Dutch and German or Latin and Spanish. The words listed may be nouns, adjectives or verbs but they should be clearly defined and, where possible, devoid of prefixes and aspirates. Obsolete meanings are in general more valuable than recent ones – it is more useful to know that a car was once a sledge than to know that it is now the French word for ‘bus’, though that also is part of the picture. Dialects are a useful source of obsolete words and learned or literary words are seldom of any value, though their Latin roots may be worth searching out.
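The two selection rules just described, inclusion by consonantal root regardless of meaning and common-sense removal of duplicates, can be sketched in Python. This is an illustration only: the `Entry` record, the `related` grouping of closely related languages and the function name `build_long_list` are hypothetical, and the roots in the sample are assigned by hand from Fig. 1.1 rather than computed. The sample uses only words cited in this chapter.

```python
from typing import Iterable, List, NamedTuple, Sequence, Set


class Entry(NamedTuple):
    lang: str   # abbreviation as used in the text: "E.", "Du.", "Lat."
    word: str
    gloss: str
    root: str   # first two TEC classes, worked out by hand


def build_long_list(
    entries: Iterable[Entry],
    root: str,
    related: Sequence[Set[str]] = (),
) -> List[Entry]:
    """Keep every entry with the given root, whatever it means.

    `related` groups closely related languages; an entry is skipped only when
    the same word with the same gloss has already been listed from its group.
    """
    seen = set()
    kept: List[Entry] = []
    for e in entries:
        if e.root != root:
            continue  # selection is by consonantal structure alone, never by meaning
        group = next((frozenset(g) for g in related if e.lang in g), frozenset({e.lang}))
        key = (group, e.word.lower(), e.gloss)
        if key not in seen:
            seen.add(key)
            kept.append(e)
    return kept


SAMPLE = [
    Entry("E.", "sum", "sum", "CM"),
    Entry("Du.", "samen", "together", "CM"),
    Entry("Lat.", "sanguis", "blood", "CM"),
    Entry("E.", "song", "song", "CM"),
    Entry("E.", "car", "vehicle", "CR"),
]

print([e.word for e in build_long_list(SAMPLE, "CM")])  # ['sum', 'samen', 'sanguis', 'song']
```

Allocating the surviving entries to thesaurus-like categories is left to judgement, as the text describes, and is not attempted here.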

The categories or subdivisions which I have used are by no means fixed or conclusive but certain of the groups are well defined and may even follow a pattern. The categories become less certain as we move to what appear to be more recent words which are fewer in number, more varied in form and range over a wider variety of topics. This no doubt reflects the fact that they are at the ends of a multitude of peripheral twigs.

A short word-list may be restricted to a specific theme, to a limited range of roots, or to a defined geographical area such as the British Isles. Since it is less objective, its conclusions are less reliable but it is a useful way of exploring a single word or theme and is particularly useful for place-names.

