Chapter 1: The Heathcote Sound and the Wordlist

In science, one tries to phrase questions narrowly and carefully, then gather the data so as to force nature to yield up unequivocal answers. We are just getting started at asking productive questions.
W.H. Calvin1

When I first began to experiment with words, I was motivated by two factors: curiosity and frustration. I was curious to understand the origins of the words I use every day and their relationship with words in other languages, but this curiosity was frustrated. At most, words were attributed to a similar word in a neighbouring language or in Latin. But where did those German and Latin words come from? To that question there were no satisfactory answers. As I saw it then, and still see it now, those who control research into the origins of language have not only monopolised the topic but have for years stood between me and what I wanted to know. Instead of providing workable theories and exciting insights into the evolution of language, they have produced volumes filled with incomprehensible jargon. Moreover, no two of them agree about anything.

Words have been abused and neglected by historical linguists. And yet our words are arguably the most important elements in our human heritage. As a species we have been adapted, physically and neurologically, to use words for more than a million years. Every human society on the planet has inherited a lexicon from its forebears, created out of their own resources and for their own needs. I have always felt that it should be possible to understand how and why, and perhaps even when and where, language was created, and this belief has been justified. If I now understand something of the origins of words, it is thanks to the words themselves. For words, unlike bones, stone tools or fragments of pottery, are not silent witnesses. Words carry meaning with them, and in the following chapters they largely speak for themselves.

My proposals and methods will no doubt seem strange at first to anyone familiar with current theories about language, and even simplistic to those who believe that language came into Europe with Neolithic farmers no more than six or eight thousand years ago. They seemed strange to me at first, but in fact they are based on existing, well-established rules. I have made no novel assumptions: all of them are supported by traditional linguistics. The difficulties arise mainly from the monopoly exercised by historical linguistics, in particular by the Indo-European theory. All the pieces of this new approach to the origins of language have been lying around in plain view for generations, like fossil bones on the plains of East Africa, waiting only for someone to recognise them for what they are and to see that they fit together. The phonetic approach of the wordlist may seem simplistic, but it works very well as a practical tool and appears to reflect the reality of lexical transmission in an oral society. I should stress that it is only with oral societies that we have to do here. Literacy introduces a game with entirely different rules.

The most critical barrier to acceptance is probably the very long time-scale envisaged for certain linguistic events. Instead of the eight thousand years proposed by some historical linguists, or even the fifty thousand proposed by others,2 I have pushed the date of the first use of language back to an enormous distance. One reference point, though by no means the earliest, is the appearance of Broca’s area 1.4 million years ago. This is the part of the brain which controls the sequential use of the vocal cords, a process essential for speech. But even this development marks a great expansion in the use of an existing language rather than its beginning. Very long dating is not new: it has been proposed or predicted to some extent by thinkers such as Condillac, Darwin and Corballis.3 A plausible idea is that language began among early men as a system of silent signing to which supplementary hoots, grunts and hisses were added for emphasis or definition. By 1.4 million years ago these verbal signs had evidently evolved into spoken language. But this is perhaps the first time that this long evolution has been repeatedly demonstrated.

This new approach rejects most of what we think we know about the languages of Europe, notably the Indo-European theory, which proposes that they were brought to Europe by Neolithic farmers. I discuss this theory in more detail elsewhere, but the reasons for rejecting it can be stated very briefly. As a theory it does not work well – most books and papers on the Indo-Europeans include words such as ‘problem’ or ‘search’ in their titles – and it is subject to several logical flaws which invalidate it beyond recovery. It confuses assumption with proof and offers no explanation of the vast changes alleged beyond the mantra that ‘language changes’. Enquiries into the origins of words still rest, as they have for centuries, on a word-by-word search for a similar word in a neighbouring language and on the idea that the lesser word (in terms of Germanic culture) ‘derives from’ the supposedly superior one. This, of course, merely moves the burden of explanation from one place to another.

Indo-European (or Indo-Germanic) theory is immensely influential. It has been installed in a dusty corner of our intellectual libraries for a very long time. Sir William Jones drew attention to the similarities between Sanskrit, Greek and Latin as long ago as 1786, and proposed that they were the descendants of a single language. Others, on even slighter evidence, added German to the list. The term ‘Indo-European’ was first cited in English in 1814 and ‘Indo-Germanic’ first appeared in 1823.

An objection to this genealogical proposal is that Sanskrit is an artificial learned or religious language. It is the only one of fifteen written languages currently used in India which does not have a corresponding spoken vernacular. Its lexicon is not the untidy growth of a natural language, trailing clouds of obsolete and obscure usage. It consists of a list of words, each with a single defined meaning. The constriction of sense, the lack of lexical depth, the absence of related words and the densely structured grammar are typical of a sacred language created within a learned tradition. Such a language is designed to permit communication between the priests and the gods, leaving no grounds for ambiguity of meaning. Sanskrit is only ever spoken, or recited, by the most learned of religious devotees. An artificial sacred language can tell us nothing about the origins of the natural spoken vernaculars used in a changing material world.

Indo-European theory and the new approach agree on a single point: that the use of similar words with similar meanings in different languages is a sign of their common origins. But Indo-European theory traces this common origin no further than the Neolithic and a hypothetical Proto-Indo-European language (PIE) while the new approach accepts no theoretical limits. The IE approach supposes that PIE was imported into Europe by farmers migrating from the Middle East perhaps 8,000 years ago: there is no agreed date but it is claimed that language cannot be traced beyond this date. The new approach can trace language back to the first word. But the vision of Indo-European migrants flooding into Europe and replacing earlier populations is established as a fact in the minds of all historical linguists, most archaeologists and even geneticists (who certainly should know better). These experts all seem to have forgotten that the Indo-Europeans are nothing more than a hypothesis, a very unproductive hypothesis, which has created many problems and solved none. In other words, the Indo-European theory does not work and is ripe for replacement.

I can demonstrate the inability of the Indo-European approach to explain in any coherent way the origins of European language by its treatment of the word car. I devote an entire chapter to this word and its relatives. Gaelic alone gives it fifty or sixty distinct meanings. And yet the most that Indo-European scholars can say of *karos or *karsos is that it meant ‘cart, raft’4 or simply ‘vehicle’.5 These authorities list as cognates Arm. kark ‘cart’, Gr. karron, Lat. carrus, O.Ir. cárr, Breton karr, W. car ‘raft’, Lith. karai ‘cart’ and L.Lat. dim. carruca ‘car’. They describe it as ‘a wanderword of uncertain provenance’. This is not good enough. If historical linguists persist in claiming, as they do, that all the related languages of Europe, including Gaelic, derive from a single language brought to Europe in the Neolithic by farmers, they are under an obligation to explain how *karos also came to mean a stone, a necklace, a fish, the jaw, and a great many other things. To this and other similar questions there are no answers.

Having said all that, there is by the standards of normal research no need to demolish this old theory before replacing it with a more rational and productive theory.

The New Approach to Language

Among its novelties, the new approach accepts that all languages are equally old, since they have all evolved in a continuous fashion from the first meaningful syllable to the present day. It assumes that lexical change is not a random process but a response to the need to find new names for new items of culture as they are invented, balanced against the more gradual loss of obsolete words for discarded items of culture. If we want to know about the origins and development of the word car we have only to pull together the relevant words. There is no need to speculate and very little need to extrapolate. The CR words as a group tell us everything there is to know about their own history. Some parts of the story may be sketchy, obscure or missing altogether, and the categories I have chosen are open to different interpretations, but they provide a logical outline of what happened. The CR list contains archaic names for archaic items and more recent names for more recent items. To define car as ‘vehicle’ is not wrong, but it is only one small aspect of a complex development which began long before Europe was settled.

The Table of Equivalent Consonants (TEC)

Before we can understand the wordlist we need to understand something about consonants. The new approach has one tool, the Table of Equivalent Consonants or TEC, and the TEC has one application, the wordlist. The TEC was inspired by Soundex, a phonetic indexing system devised early in the twentieth century by Robert C. Russell and Margaret K. Odell, who patented it in 1918 and 1922. It has been used successfully ever since to bring order to the spellings inflicted on immigrant surnames by English-speaking clerks and, often, by the owners of the names. In a similar way, the TEC has brought order to several million everyday words. The main feature of both systems is a reduction in the number of consonants from the dozens used in European languages to six effective consonants. Vowels are ignored. Both systems are effective because variant spellings are not the result of sloppy speech, imperfect education or laziness: they represent what actually happens to sounds when they are passed, many times, from mouth to ear and then written down, usually by a clerk who does not speak the same language. Every word in our dictionaries was transmitted orally for most of its existence and then transcribed in a similar way.
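For comparison, here is a minimal sketch of the standard American Soundex coding in Python. It is not part of the TEC; it only illustrates the idea the TEC borrows: letters are sorted into six consonant classes (the digits 1 to 6), vowels are dropped, and variant spellings of a surname collapse to a single code. The function name `soundex` is my own choice; the rules are those of the common American Soundex (first letter kept, H and W do not separate letters with the same code, result padded or cut to four characters).

```python
# Soundex consonant classes: six groups, as in the classic American Soundex.
SOUNDEX_CLASSES = {
    "1": "bfpv",
    "2": "cgjkqsxz",
    "3": "dt",
    "4": "l",
    "5": "mn",
    "6": "r",
}
CODES = {ch: digit for digit, letters in SOUNDEX_CLASSES.items() for ch in letters}


def soundex(name):
    """Return the four-character American Soundex code for a surname."""
    letters = [ch for ch in name.lower() if ch.isalpha()]
    if not letters:
        return ""
    first = letters[0]
    digits = []
    prev = CODES.get(first, "")  # the first letter's code blocks an immediate repeat
    for ch in letters[1:]:
        code = CODES.get(ch, "")  # vowels, H, W and Y have no code
        if code and code != prev:
            digits.append(code)
        # H and W do not separate two letters with the same code; vowels do.
        if ch not in "hw":
            prev = code
    return (first.upper() + "".join(digits) + "000")[:4]


print(soundex("Robert"), soundex("Rupert"))  # R163 R163
print(soundex("Ashcraft"), soundex("Tymczak"))  # A261 T522
```

Robert and Rupert receive the same code because B and P fall in one class and the vowels are ignored, which is exactly the kind of variation a clerk's spelling introduces.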

Fig.1.1 The TEC

1 B Mb P F V W
2 M N Ng Gn F V
3 C G J K Q S X Y Ch Sh H
4 D T Th Z
5 L LL
6 R RR

Fig.1.2 The 36 disyllabic roots
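BB BM BC BD BL BR
MB MM MC MD ML MR
CB CM CC CD CL CR
DB DM DC DD DL DR
LB LM LC LD LL LR
RB RM RC RD RL RR
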
The TEC reduces all the equivalences which operate within and between the languages of Europe to six effective consonants represented by B, M, C, D, L and R (Fig. 1.1). These consonants provide 36 disyllabic roots (Fig. 1.2). Vowels are disregarded because they often change to show case, gender, number or tense, and from one dialect to the next, none of which affects the structure of the basic root. For our purposes E. sang, sing, song and sung are all the same CM word. Our selection of words for a wordlist rests entirely on their consonantal structure and ignores meaning. So far it has been applied mainly to European languages but it appears to have general validity.
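The reduction of a word to its root can be sketched in Python. This is a minimal illustration under several assumptions of my own, not a procedure laid down in this chapter: the class letters follow Fig. 1.1; F and V, which the figure lists under both TEC 1 and TEC 2, are assigned to B; the digraphs Mb, Ng, Gn, Ch, Sh, Th, LL and RR are matched before single letters; a repeated class with no vowel between (as in nn or ck) counts once; and the root is simply the first two classes left once the vowels are dropped. Prefixes and initial aspirates are not removed, so words should be cleaned by hand first. The names `tec_classes` and `tec_root` are hypothetical.

```python
# Multi-letter spellings from Fig. 1.1, checked before single letters.
DIGRAPHS = {
    "mb": "B",  # TEC 1 (Fig. 1.1 lists Mb with B)
    "ng": "M", "gn": "M",  # TEC 2
    "ch": "C", "sh": "C",  # TEC 3
    "th": "D",  # TEC 4
    "ll": "L",  # TEC 5
    "rr": "R",  # TEC 6
}

# Single letters; F and V appear under TEC 1 and TEC 2 and are assigned to B here.
SINGLE = {
    "B": "bpfvw",
    "M": "mn",
    "C": "cgjkqsxyh",
    "D": "dtz",
    "L": "l",
    "R": "r",
}
LETTERS = {ch: cls for cls, letters in SINGLE.items() for ch in letters}

# The 36 disyllabic roots: every ordered pair of the six classes.
ALL_ROOTS = [a + b for a in "BMCDLR" for b in "BMCDLR"]


def tec_classes(word):
    """Return the TEC classes of a word in order, with vowels dropped."""
    w = word.lower()
    classes = []
    prev = None  # class of the previous sound; None after a vowel
    i = 0
    while i < len(w):
        if w[i:i + 2] in DIGRAPHS:
            cls = DIGRAPHS[w[i:i + 2]]
            i += 2
        else:
            cls = LETTERS.get(w[i])  # None for vowels and unlisted characters
            i += 1
        # A repeated class with no vowel between (e.g. "nn", "ck") counts once.
        if cls is not None and cls != prev:
            classes.append(cls)
        prev = cls
    return classes


def tec_root(word):
    """Return the first two TEC classes as a root such as 'CM', or '' if fewer."""
    classes = tec_classes(word)
    return "".join(classes[:2]) if len(classes) >= 2 else ""


for w in ["sang", "sing", "song", "sung", "sum", "sanguis", "car"]:
    print(w, tec_root(w))  # every word prints CM except car, which prints CR
```

Under these assumptions lamb gives LB rather than LM, because Fig. 1.1 places Mb with B; relating it to Du. lam needs the B/M equivalence discussed below.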

It took me some time to make the necessary adaptations and compile my first wordlist but it produced some remarkable insights which have been paralleled by other lists. It is clear that the wordlist, compiled according to the principles embodied in the TEC, provides a revolutionary way of looking at the origins, history, and evolution of language and even, since words have meanings, on aspects of cultural development. The TEC allows us to recognise related words within any language, despite their current variety in spelling and meaning.

The six-consonant TEC is a working compromise, not a fixed position. It could have more or fewer or different classes but these six consonants work in practice for most European words. The TEC covers the most important of the changes known as lenition, mutation or aspiration which are still used in Welsh and Gaelic (discussed below).

Tested in Armenian

How well does our TEC work? Dr. Adour Yacoubian, who pioneered the transcription of Armenian into the Latin alphabet, discovered a great many consonantal equivalents which produce ‘better results when comparing with English words’ and, he added, ‘vice versa’.6 His equivalents were as follows:

Q for G; D for T; P for B; P’ for Ph; T’ for Th; G for Q, C, K, Ck, Ch; C’ for J;
J for Ch; H’ for L; Gh for L, R; H for F, V, W; Sh for S; Ch for Kh; Tz for F.

This is all familiar territory, covered by the TEC, except for H’ for L and Gh for L, R. They can be treated as a single case. Examples include Arm. deghi ‘place, spot’ for Lat. terra, Arm. gat’oghigea for E. ‘catholic’, Arm. ghampar for E. lamp, and Arm. h’amr for E. lame. H’ and Gh are evidently akin to the guttural sound we find in Welsh LL and Parisian R. I have included them in TEC 5 and TEC 6 as LL and RR. We will have reason to refer to them again.

H for F, V and W is an example of overlap between B and C (an old feature which is discussed elsewhere).

Tz for F is found in Arm. tzayn ‘sound, voice’ and Gr. phone ‘voice, sound’, in Arm. tze’t’ ‘oil’ and E. fat, and in Arm. tziut’ ‘pitch’ and E. soot. This appears to be related to the Heathcote sound (an early sound which is discussed below).


Aspiration is responsible for several regular changes in the sounds of consonants.

Fig.1.3 Aspiration
1 B and P > V or F > (-)
2 M > V or F > (-)
3 C, G and S > Ch or Sh > H > (-)
4 D and T > Dh and Th > H > (-)
5 and 6 R and L do not aspirate.

These changes are still active in Welsh and Irish and there are signs of them elsewhere in Europe.7 They are easiest to identify in place-names and surnames which have survived through a lengthy process of change. France and Britain offer many examples.

Fig.1.4 French surnames (Maine-et-Loire)
Camar, Chamard, Hamard, Amaré
Chasseboeuf, Haudebault, Audebeau
Chaussebourg, Haudebourg, Audebert
Moreau, Forray, Horreau, Auray

Fig.1.5 English surnames
Callicott, Caldecot, Cawcutt, Hallegate, Alcock, Alcott, Aucott
Chadwick, Chattock, Shatlock, Shacklock, Hadcock, Atcock, Awcock
Charnock, Harnock, Arnot
Pallis, Challis, Hallis, Alice
Scatchard, Hatchard, Achard, Ackert
Zachery, Hacker, Ackery

Fig.1.6 English place-names
The suffixes -ton, -ing and -ham mean ‘farm’ or ‘estate’.
Ballingham, Billingham, Willingham, Ellingham
Catforth, Chatworth, Hatford, Atworth
Coldingham, Holdingham, Aldingham
Bellingdon, Wellington, Hallington, Allington, Ellington
Benwick, Fenwick, Anwick
Marlow, Farlow, Harlow
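The aspiration series of Fig. 1.3 can be written as chains and used to ask which original class a surname or place-name could have started from. This is a minimal sketch, assuming that only the initial consonant matters and that each chain ends in complete loss of the consonant, so that a vowel-initial form fits every chain that aspirates. The chains follow Fig. 1.3 letter for letter and ignore the wider TEC classes (K, Q and so on); the names `CHAINS`, `possible_sources` and `shared_sources` are hypothetical.

```python
# Initial-consonant chains from Fig. 1.3, strongest form first; "" means the
# consonant has been lost altogether.
CHAINS = {
    "B": ["b", "p", "v", "f", ""],
    "M": ["m", "v", "f", ""],
    "C": ["c", "g", "s", "ch", "sh", "h", ""],
    "D": ["d", "t", "dh", "th", "h", ""],
    "L": ["l"],  # L and R do not aspirate
    "R": ["r"],
}
VOWELS = set("aeiou")


def possible_sources(name):
    """Return the classes whose chain could produce the initial of `name`."""
    w = name.lower()
    if not w:
        return set()
    if w[0] in VOWELS:
        # Vowel-initial: only chains that end in loss of the consonant qualify.
        return {cls for cls, chain in CHAINS.items() if chain[-1] == ""}
    return {
        cls
        for cls, chain in CHAINS.items()
        if any(form and w.startswith(form) for form in chain)
    }


def shared_sources(names):
    """Return the classes from which every name in a series could derive."""
    sets = [possible_sources(n) for n in names]
    return set.intersection(*sets) if sets else set()


print(shared_sources(["Camar", "Chamard", "Hamard", "Amaré"]))  # {'C'}
print(shared_sources(["Chadwick", "Hadcock", "Atcock", "Awcock"]))  # {'C'}
```

A series such as Moreau, Forray, Horreau, Auray returns an empty set under these chains, because M and H share no chain in Fig. 1.3; such series depend on the overlap between B and C (H for F) discussed below.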

Some further equivalences

Six consonants may seem excessively few but even this small number is reduced on occasion by the equivalence of B/C, B/M, and L/R, which are discussed below. Other less common equations will be discussed as they arise.

B/C (conventionally P and Q)

Conventional linguistics teaches that Celtic languages are divided into ‘P’ and ‘Q’ varieties. This is based on the fact that in Welsh certain words are spelled with P while in Gaelic they are spelled with C. This is a rare event – I have found no more than a dozen pairs – and when we look beyond the quoted examples things are much less clear. W. planta ‘children’ begins with P, and G. clann ‘children’ begins with C, but what are we to make of W. llanc ‘young man, youth’, which is pronounced ‘hlanc’ or ‘chlanc’? And how do we fit W. chwaer ‘sister’ and W. cus ‘kiss’, both C words, and G. piuthar ‘sister’ and pog ‘kiss’, both B words, into the P/Q convention? A simple page count shows that Welsh has three times as many C words as P words. There is certainly confusion between P and Q, but nothing that resembles a rule. Perhaps instead planta, clann and llanc are variant spellings of a word which began with an intermediate B/F/C sound and which means ‘young men’ or ‘hunters’.

Gr. penta and Lat. quinque ‘five’ are also held up as examples of the P/Q divide but, again, they establish no kind of precedent. Latin uses a Q word for ‘five’ but two P words, bos ‘ox’ and vacca ‘cow’, instead of a C word akin to ‘cow’. English has the B words four and five, the C words sister and kiss, and B/C confusion in bull/cow, can/pan, shall/will, sallow/willow, bake/cake, boil/coal and peep/peek/keek.

Evidence suggests that B and C derive from a single ambiguous sound which survived until the turn of the twentieth century in the most isolated outpost of Scottish Gaelic. It was noted on the island of St Kilda (properly Hirta), which lies in the Atlantic Ocean fifty miles west of Lewis. In 1898 Norman Heathcote and his sister spent the summer there. They had prepared themselves by learning Gaelic, as he planned to collect place-names. However, he was defeated by the local pronunciation for, as he said, ‘when a St Kildan pronounces a word I never have the least idea whether it begins with a C, an F or a P; all I can detect is a sort of guttural sound’.8 We can deduce from early spellings of other Gaelic place-names that this guttural sound was once more widespread. In various documents the second element of Balquhidder (Perthshire) begins with B, P, F, FF, Ch, Quh and Wh. This reflects the fact that the sound has no equivalent in the Latin alphabet. Since it has no name in any alphabet, I have called this elusive sound the Heathcote sound, or Hs.

At some time the Heathcote sound evolved into the two distinct sounds now represented by B and C. When words began to be transcribed by clerks educated in Latin, Heathcote words were written down as B words or C words, with the results we have seen. When transcribing a word which began with Hs, the clerk had to make a choice between B, C, Ch, F, Quh and Wh, since the Latin alphabet is incapable of representing an aspirate. Had Mr Heathcote been employed by the Ordnance Survey to record the Gaelic place-names of St Kilda he would have wavered between B, C, F and the other options but must eventually have plumped for one or the other. (If he had chosen P, conventional linguists would now believe that the natives of St Kilda spoke a P-Celtic language akin to Welsh.) In other words, the B/C confusion is not a reliable basis for a major division of European languages. It is significant in applying the TEC because there are signs of this ambivalence in the older levels of language.

B, Mb and M

A similar case of ambiguity is easier to deal with as it has not attracted learned speculation. M is now an independent consonant with its own class but began as an offshoot of B, designated Mb. Occasionally we find M as an initial consonant where we would expect B. An example is G. maolan ‘beacon’ in place of *baolan. A sign of their former identity is that both B and M aspirate to V. Conversely, it is impossible to know, without good parallels, whether a word with initial V or W began originally with B or M. Mb still exists. It is found in Bantu languages (mbula ‘rain’) and in Albanian (mbret ‘king’). It is also found as the consonantal cluster in E. lamp, lambent, nimble, number and shamble. The conversion from Mb to M is shown by E. lamb and Du. lam, E. crumb and Du. kruim, E. limp and E. lame, E. bumble and E. bummel, E. thumb and Du. duim, E. number and Du. nummer.

L and R

The divergence of L and R may be quite recent. European languages generally distinguish between L and R, but there are occasional signs of doubt, as between Lat. margarita ‘pearl’ and Heb. mar’ga’leet ‘pearl’. Japanese and Bantu languages do not distinguish between them. This suggests that L and R began to diverge after Japanese became a self-contained language. This is clear enough in the case of Bantu, since the ancestral Bantu remained in Africa when the rest of mankind emigrated to Asia. As European languages in general distinguish between L and R, it appears that this was already the case before 50,000 BP, when Europe was settled, since a single foundation event is more probable than a multitude of identical but unrelated local events. The main importance of this equivalence for the moment is that we should be prepared to equate L with R when exploring early levels of language.
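When the B/C, B/M and L/R equivalences are admitted, two roots can be compared with the affected classes merged. The sketch below is an illustration of that idea, not a rule stated in this chapter: it merges classes with a small union-find, so that admitting both B/C and B/M also links C with M through B. The names `EQUIVALENCES`, `merge_map` and `roots_equivalent` are hypothetical, and the roots in the examples are assigned by hand (W. planta BL and G. clann CL; sallow CL and willow BL; E. lamb LB, counting Mb as B, and Du. lam LM).

```python
# Equivalences discussed in this chapter, as pairs of TEC classes.
EQUIVALENCES = {
    "B/C": ("B", "C"),  # the Heathcote sound
    "B/M": ("B", "M"),  # M as an offshoot of B (Mb)
    "L/R": ("L", "R"),
}


def merge_map(enabled):
    """Map each TEC class to a representative class, merging the enabled pairs."""
    parent = {c: c for c in "BMCDLR"}

    def find(c):
        while parent[c] != c:
            c = parent[c]
        return c

    for name in enabled:
        a, b = EQUIVALENCES[name]
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra
    return {c: find(c) for c in parent}


def roots_equivalent(r1, r2, enabled=()):
    """True if two roots match class by class once the enabled equivalences apply."""
    rep = merge_map(enabled)
    return len(r1) == len(r2) and all(rep[a] == rep[b] for a, b in zip(r1, r2))


print(roots_equivalent("BL", "CL"))  # planta / clann under the strict TEC: False
print(roots_equivalent("BL", "CL", ["B/C"]))  # with the Heathcote sound admitted: True
```

The transitive merge is a deliberate choice: admitting both B/C and B/M makes C and M interchangeable as well, which widens a list considerably, so each equivalence is best switched on only when exploring early levels of language.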

The word-list

Word-lists are structured lists of words which share one of the 36 disyllabic roots generated by the TEC (Fig.1.2). If we start with the English word sum the root is CM and our list must include all the words in which a variant of C (TEC 3) is combined with a variant of M (TEC 2): CM, CN, CNg, SM, SN, SNg, and so on. In the case of CM this generated a list of several hundred words which could be allocated without much difficulty to categories like those of a thesaurus. A word-list can be long or short.
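The expansion of a root into the spellings a word-list must cover (CM, CN, CNg, SM, SN, SNg and so on) is a Cartesian product of the two TEC classes. A minimal sketch, taking the class members from Fig. 1.1 exactly as printed; the function name `spelling_patterns` is hypothetical.

```python
from itertools import product

# Members of each TEC class, as listed in Fig. 1.1.
TEC = {
    "B": ["B", "Mb", "P", "F", "V", "W"],
    "M": ["M", "N", "Ng", "Gn", "F", "V"],
    "C": ["C", "G", "J", "K", "Q", "S", "X", "Y", "Ch", "Sh", "H"],
    "D": ["D", "T", "Th", "Z"],
    "L": ["L", "LL"],
    "R": ["R", "RR"],
}


def spelling_patterns(root):
    """List every consonant pairing that counts as the given two-class root."""
    if len(root) != 2 or any(c not in TEC for c in root):
        raise ValueError(f"not a TEC root: {root!r}")
    return [first + second for first, second in product(TEC[root[0]], TEC[root[1]])]


cm = spelling_patterns("CM")
print(len(cm))  # 66: 11 members of TEC 3 times 6 members of TEC 2
print(cm[:6])  # ['CM', 'CN', 'CNg', 'CGn', 'CF', 'CV']
```

Because F and V appear under both TEC 1 and TEC 2 in Fig. 1.1, a pattern such as CF turns up in both the CB and the CM expansions, so a list built this way still needs sorting by hand.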

A long list should contain every word that contains the designated root in every contributing language. Concrete nouns with specific meanings are more useful than descriptive or abstract words. It is important not to be influenced by preconceived ideas about which themes might or might not be relevant. If we begin the CM list from the point of view of E. sum, we might include Du. samen ‘together’ but not Lat. sanguis ‘blood’ or E. song. This would no doubt tell us something about hunting or rounding-up animals but if blood and singing are not included the final picture will be incomplete.

On the other hand there is no need to bulk up a list by including identical words with identical meanings in closely-related languages such as Dutch and German or Latin and Spanish. The words listed may be nouns, adjectives or verbs but they should be clearly defined and, where possible, devoid of prefixes and aspirates. Obsolete meanings are in general more valuable than recent ones – it is more useful to know that a car was once a sledge than to know that it is now the French word for ‘bus’, though that also is part of the picture. Dialects are a useful source of obsolete words and learned or literary words are seldom of any value, though their Latin roots may be worth searching out.

The categories or subdivisions which I have used are by no means fixed or conclusive but certain of the groups are well defined and may even follow a pattern. The categories become less certain as we move to what appear to be more recent words which are fewer in number, more varied in form and range over a wider variety of topics. This no doubt reflects the fact that they are at the ends of a multitude of peripheral twigs.

A short word-list may be restricted to a specific theme, to a limited range of roots, or to a defined geographical area such as the British Isles. Since it is less objective, its conclusions are less reliable but it is a useful way of exploring a single word or theme and is particularly useful for place-names.

The source of words is of no great importance so long as the form of the word and its meaning are certain but the better your dictionaries the better your list. I have etymological dictionaries for English, Latin and French and they often show that the modern spelling of a word is different from its original form. In general Europe is adequately covered by Gaelic, English, Dutch or German, one of the Slavonic languages, Lithuanian, and a few exotica such as Catalan and Albanian.

Since the lists consist of words gathered where we find them, they are not suitable for quantitative analysis. I have analysed the CR and CM lists, but the figures obtained are significant only insofar as they indicate general trends.

