Let’s Search for the Missing Links

When I was growing up, there was much talk about “the missing link.” It was an exciting time for fossil hunters. My frequent visits to the American Museum of Natural History left the impression that some vital evolutionary links were missing. We obviously did not spring from gorillas or chimpanzees; we grew up alongside them; they were our cousins, not ancestors.

In due time, the missing links were found. Today the paleontologist can draw a fairly accurate tree, with humans branching off a few million years ago as chimpanzees and gorillas proceeded along their own individual routes.

According to our egotistical judgment, of course, humans occupy the highest branch in the tree of life, for one cannot conceive of a more noble creature. But it is my purpose here to consider the other end of the tree of life – the lowest, lowliest, simplest of living units known to biologists. What is it like?

My “bible” on the subject is Cradle of Life, by J. William Schopf [1]. He serves up biology in a painless manner, diluting it with history and personal stories about Aleksandr I. Oparin, Salvador Dali, and others. According to the subtitle, the book is about The Discovery of Earth’s Earliest Fossils, but it also has pertinent information about the Earth’s living creatures. The input of my biologist daughter, Alice, is also invaluable.

We know a great deal about these simplest of living creatures (bacteria and blue-green algae), and we also know about the basic CHON elements – Carbon, Hydrogen, Oxygen, and Nitrogen – that are the foundation upon which organic structures are built. The list of simple CHON compounds includes water and carbon dioxide, of course, but we also have carbon monoxide (CO), methane (CH4), ammonia (NH3), hydrogen cyanide (HCN), formaldehyde (H2CO), and so forth. But there is a huge gap – how do we go from a few CHON compounds to a simple bacterium, for example? Some conjectured suggestions are given in the present essay.

Perhaps a few definitions are in order. A “simple bacterium” is a microbe known as a prokaryote because it does not contain a nucleus; eventually, some 1.9 billion years ago, much more complicated eukaryotes, which contain a nucleus, appeared.

The question really is – how do we go from a few CHON compounds to the deoxyribonucleic acid (DNA) molecule of the simple bacterium? A DNA molecule is not alive – it doesn’t wiggle if it is placed into a jar of water – but it is the blueprint upon which the genetic code is imprinted.

Today, the DNA molecule has become part of our culture. One can no longer plead ignorance about DNA – in fact, nowadays, it is used to convict people who plead innocent (and to sometimes free those who have been convicted). Deciphering the DNA blueprint and gene functions, for all animals and plants, has become one of the great cottage industries of the 21st century. My main point here is that this “simple” primitive cell contains one billion atoms of carbon, hydrogen, oxygen, and nitrogen, plus a sprinkling of other atoms such as sodium and chlorine (as if they add flavor to the soup). It may sound weird, but one billion atoms can fit inside the smallest bacterial cell, which has a diameter of 1000 angstroms (or 0.1 micrometer). (An angstrom is 10^-10 meter, approximately the diameter of an atom.) The arithmetic is trivial: for a spherical cell, atoms are so tiny that 1000 of them, sitting side-by-side, can reach from one side of the cell to the other side. Roughly, then, the total number of atoms is 1000^3 = 10^9, or one billion. This includes the atoms that constitute a coiled-up DNA molecule.
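For readers who like to check such arithmetic by machine, here is a minimal Python sketch of the estimate. It assumes, as the text does, that an atom is about one angstrom across; the function names are mine, chosen only for illustration. It also shows that treating the cell as a sphere rather than a cube changes the answer by only a factor of about two, so "one billion" remains the right order of magnitude.

```python
import math

def atoms_in_cube(cell_diameter_angstrom: float, atom_diameter_angstrom: float = 1.0) -> float:
    """Crude count: atoms along one edge, cubed (the estimate used in the text)."""
    per_side = cell_diameter_angstrom / atom_diameter_angstrom
    return per_side ** 3

def atoms_in_sphere(cell_diameter_angstrom: float, atom_diameter_angstrom: float = 1.0) -> float:
    """Same idea for a sphere: volume ratio is (pi / 6) times the cube estimate."""
    return (math.pi / 6.0) * atoms_in_cube(cell_diameter_angstrom, atom_diameter_angstrom)

cube_estimate = atoms_in_cube(1000.0)      # 1000 atoms across, as in the text
sphere_estimate = atoms_in_sphere(1000.0)  # roughly half the cube figure

print(f"cube estimate:   {cube_estimate:.2e} atoms")
print(f"sphere estimate: {sphere_estimate:.2e} atoms")
```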

The genetic code consists of a sequence of only four nucleotide bases: They are A = Adenine, C = Cytosine, G = Guanine, and T = Thymine. The code portion of a DNA molecule consists of two approximately parallel strands that tend to form a helix – hence, DNA is called a “double helix.” The architecture is depicted in Fig. 1.

Figure 1.  Architecture of a DNA molecule. The center pole does not exist; it has been added to the drawing to clarify the helical structure. The horizontal “ladder rungs” are the AT and CG base pairs.

Along the parallel strands, A is always cross linked with T, and C is always linked with G, so the same message (which is the code for amino acids) is duplicated since each strand fully determines the sequence of the other strand. The bases along a strand are around 3.4 angstroms apart in an axial direction, and there are 10 base pairs per single turn of each helix. The outer diameter is 20 angstroms. A backbone or scaffold is provided by groups of atoms that form phosphate and deoxyribose.
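The pairing rule can be written out in a few lines of Python. This is only a sketch: the names are hypothetical, and it pairs bases position by position, ignoring the fact that the two real strands run in opposite chemical directions. It illustrates why one strand fully determines the other, and uses the 3.4-angstrom spacing and 10 base pairs per turn quoted above to get the length of one helical turn.

```python
# Base-pairing rule from the text: A with T, C with G.
PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}

RISE_ANGSTROM = 3.4      # axial spacing between neighboring bases (from the text)
BASE_PAIRS_PER_TURN = 10 # base pairs in one full turn of the helix (from the text)

def partner_strand(strand: str) -> str:
    """Return the strand that pairs, position by position, with the given one."""
    return "".join(PAIR[base] for base in strand)

def turn_length_angstrom() -> float:
    """Axial length of one full helical turn."""
    return RISE_ANGSTROM * BASE_PAIRS_PER_TURN

example = "ATCGGCTA"
print(example, "pairs with", partner_strand(example))
print("one turn of the helix is about", turn_length_angstrom(), "angstroms long")
```

Applying `partner_strand` twice returns the original sequence, which is the sense in which the message is stored in duplicate.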

Just how many nucleotide bases are there in a DNA molecule? This depends, of course, on the animal or plant. Table 1, taken from Cradle of Life, is very revealing. From the simplest of animals, a bacterium, to the most complicated of humans, ALL are programmed by a DNA molecule. In every bacterial cell, and in every cell of the mammalian body, there is a string of A, C, G, and T nucleotides characteristic of that organism. Isn’t that remarkable – that all living cells are defined by the same four nucleotides?
[Table 1 (image not reproduced)]

Is it possible that other types of “nucleotides” existed when life first began? If so, they fought it out, and the present nucleotides won the battle (on a geological time scale of 400 million years), as discussed below. Furthermore, the principles of biochemistry indicate that A teams up with T, and C with G, because these combinations are more stable than the only other four possibilities: AC, AG, CT, and GT. But this has literally far-reaching consequences – to the end of the universe, in fact. If after 400 million years the nucleotides A, C, G, and T survived, then DNA molecules everywhere are made of AT and CG base pairs. This sounds like a great unsubstantiated leap into the unknown. But is it really so farfetched? The “everywhere” conclusion is strengthened by the fact that spectroscopic examination of simple molecules that populate space outside the solar system shows the same simple CHON molecules that are found here on Earth [2]. Since, after a tremendous effort, nobody has succeeded in demonstrating how life began on Earth, what is the missing ingredient? It is time: If they begin today, our organic chemists may, in 400 million years, duplicate what nature hath wrought. 

Let’s not be led into a philosophical diversion about how intelligent life has evolved everywhere in the universe. The special conditions under which “intelligent” life evolved must be extremely rare. Plenty of goo, but intelligent life? Rare indeed [3]. 

Returning to Table 1: At the bottom of the “pecking order” were (and still are) bacteria that derived energy by “eating” the “food” in their surroundings (which, of course, is the way we also derive energy). That is how the Escherichia in our digestive system live. Normally, they are harmless but, occasionally, they are responsible for a large chunk of the pharmaceutical industry. There wasn’t any atmospheric oxygen in the early history of life on the planet. There was plenty of nitrogen, water, carbon dioxide, and deadly ultraviolet light from the sun [4]. The following is taken from Schopf, page 150: “Among the earliest forms of life were some that lived by glycolysis, a form of fermentation (anaerobic [living in the absence of molecular oxygen] metabolism) in which a molecule of the six-carbon sugar glucose (C6H12O6) is split in half to make two molecules of a three-carbon compound called pyruvate. This produces energy, given off when the chemical bonds of glucose are broken apart, some of which is stored for later use in a chemical known as ATP (adenosine triphosphate)… Glycolysis dates from near life’s beginnings. It is fundamental to life, present in all organisms, a package of ten enzyme-speeded steps too large to have originated more than once. Moreover, it is chemically the simplest energy-making process in biology, takes place in the watery cytosol [intracellular cytoplasmic fluid] of cells (rather than needing membranes or organelles like later-evolved systems), yields much less energy than more advanced mechanisms, and is anaerobic like the early environment… Glycolysis requires glucose fuel. But Miller-type [explained below] early-Earth experiments show that many other sugars were present also in the primordial soup. Why was glucose pegged as the universal fuel of life? Probably because it is especially sturdy, the least susceptible of all six-carbon sugars to break down by changes in temperature, acidity, and the like.”

The chronological background against which life evolved is depicted in Table 2.

The earth was formed 4550 million years ago (or, using the standard abbreviation for “millions of years ago,” 4550 Ma). For the next 650 million years, until 3900 Ma, the earth was bombarded by asteroids and meteorites. This bombardment released a tremendous amount of energy. Until 3900 Ma, mainly as a result of these collisions, the earth was too hot (temperature greater, everywhere, than the boiling point of water) for life to exist as we know it. Then, according to Schopf, in only 400 million years, by 3500 Ma, life was “flourishing and widespread” (page 167). (With our ever-greater ability to detect asteroids, if not fictitious UFOs, we have become aware that some bombardment, although rare, is unavoidable.)  

[Table 2 (image not reproduced)]

But 400 million years is an enormous chunk of time. Please, Dr. Schopf, can you tell us when, during this period, did life begin? No, because the fossil record is missing, and that is the only reliable way to date the prokaryotes. As Schopf says (page 99), “The very first forms of life … probably didn’t even have fossilizable cell walls, … and were made of chemicals far too fragile to be geologically preserved.”

A tremendous evolutionary advance came next with the development of cyanobacteria, which use sunlight as a source of energy. We all know how that works: You take in carbon dioxide, use sunlight to power the synthesis of needed materials, and release oxygen as a byproduct. As Schopf puts it (pp. 97, 98): “… the presence of cyanobacteria in this nearly 3,500-Ma-old community tells us that early evolution proceeded very far very fast. All cyanobacteria are able to do the kind of photosynthesis that gives off oxygen; and, like higher plants and animals, all can breathe in oxygen (by the process known as aerobic respiration). Both of these, however, are advanced ways to live, evolved from more primitive ways in which free oxygen plays no part. So, if cyanobacteria existed at this early time, the earlier evolved processes must also have been present; the living world would have to have included organisms that photosynthesized without giving off oxygen (bacterial photosynthesizers) as well as those that produced it (cyanobacteria), and microbes that lived in the absence of oxygen (anaerobes) as well as those that breathed it (aerobes). These are precisely the same processes that power the present-day living world.”

The cyanobacteria didn’t really mean to cause great atmospheric changes, but slowly, to around 2000 Ma, atmospheric oxygen accumulated. At that time, despite 1500 million years of evolution, not much was happening; all of the life on earth consisted of one-celled plants or “animals.” (It is interesting to note that, according to Schopf, “cyanobacteria changed little or not at all since they came on the scene billions of years ago”). 

The progression from one species to the next follows the standard evolutionary pattern: random mutations of DNA followed by survival of the fittest. The random process results in many stretches of A, C, G, and T nucleotides that do not code for useful amino acids; these “nonsense” base pairs are called “introns,” and the meaningful base pairs are called “exons.” The introns survive because they are part of the three-dimensional structure of the DNA molecule. The exons are listed in the “Information-Containing DNA” column of Table 1. In human DNA, for example, although there are 3,500,000,000 base pairs, 2,870,000,000 of them are introns, and only 630,000,000 are exons. It is interesting to note that the salamander is almost as complicated as a human, with 570,000,000 exon pairs (but, nevertheless, it is in bad taste to say that somebody has the brain of a salamander).
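The proportions quoted here are easy to verify. The short Python sketch below uses only the three figures given in the paragraph above (the variable names are my own) to check that the two human numbers add up to the total and to express them as fractions.

```python
# Figures quoted in the text for Table 1 (base pairs).
HUMAN_TOTAL = 3_500_000_000
HUMAN_NONCODING = 2_870_000_000   # called "introns" in the text
HUMAN_CODING = 630_000_000        # called "exons" in the text
SALAMANDER_CODING = 570_000_000

# The two human parts should add up to the whole.
parts_add_up = (HUMAN_NONCODING + HUMAN_CODING == HUMAN_TOTAL)

coding_fraction = HUMAN_CODING / HUMAN_TOTAL
salamander_vs_human = SALAMANDER_CODING / HUMAN_CODING

print("parts add up:", parts_add_up)
print(f"information-containing share of human DNA: {coding_fraction:.0%}")
print(f"salamander coding DNA relative to human:   {salamander_vs_human:.0%}")
```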

Although the human and other entries of Table 1 are of interest, only the bacteria (such as Escherichia) are actually pertinent to the present essay. Presumably, if we can synthesize the simplest of living creatures, the more complicated ones are sure to follow. The question boils down to – how did a random process assemble the 4,000,000 base pairs of the “first” bacterium?

Does it really take four million base pairs to define a simple living ensemble? Perhaps the first living assembly had only 400,000 base pairs? Is there an error here somewhere? Probably not. Superficially, the cell consisted of some “cytoplasm” surrounded by a thin membrane. The DNA blueprint had to direct the construction of the membrane. It had to distinguish food from undesirable chemical combinations. It had to convert food into its internal mixture of thousands of proteins, in reasonably proper proportion. It had to define a mixture of internal proteins that can take in energy and use it to power living processes. When it reached a certain critical size (or “obesity,” to use a present-day popular expression), having taken in a sufficient amount of nutrient material, it slowly divided into two daughter cells via a process called asexual reproduction. The important point is that this “simple” cell, which doesn’t have much to do except eat and achieve immortality by dividing in two, is anything but simple. It contains the DNA molecule, and lots of associated genes, perhaps 1000 different proteins, and so forth. If somebody asked you to build a mechanical contraption that could do all of the above, how many small parts would you need? Evolution’s answer – four million – is quite believable.

So how, then, did life begin? (Try to ignore, for the remainder of this essay, the many books and papers that have been written on the subject.) A favorite “scientific” proposal is that, somehow, those one billion atoms came together randomly, and formed the first primitive living cell. In other words, in the primordial soup somewhere on earth, over 3500 Ma, the correct viable mix of atoms came together by chance, and – lo and behold – the atoms thus accidentally gave birth to the first cell. Well, there is only the slightest chance (and “chance” is used here in a probabilistic sense) that it happened that way. With four different elements and a total of one billion atoms, the total number of molecules that can be formed is 4^1,000,000,000, or 10^600,000,000 (or 1 followed by 600,000,000 zeros). But the total number of elementary particles (neutrons + protons + electrons) in the universe is “only” 10^80. (Remember that 10^81 is ten times as large as 10^80, and so forth.) The entire Earth contains around 10^51 elementary particles. The randomness scenario is as silly as proposing that a truck can dump a load of one billion bricks onto a foundation and, somehow, on at least one occasion, the bricks can fall so as to form four walls! The chance that this can happen is close to zero; the bricklayer’s union need not worry. (Nevertheless, the nature of the random “beast” is that it could happen with the first truck load.)
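Numbers this large cannot be written out, but their exponents can be compared directly. The following Python sketch (function name hypothetical) computes the power of ten corresponding to four choices at each of one billion positions. Using the unrounded value log 4 = 0.602... gives an exponent slightly above the round figure of 600,000,000 quoted above, which takes log 4 as 0.6; the conclusion is unaffected.

```python
import math

def log10_arrangements(kinds: int, positions: int) -> float:
    """Base-10 exponent of kinds**positions, computed without forming the huge number."""
    return positions * math.log10(kinds)

ATOMS_IN_CELL = 1_000_000_000    # one billion atoms, from the text
UNIVERSE_PARTICLES_EXPONENT = 80 # about 10^80 elementary particles, from the text

exponent = log10_arrangements(4, ATOMS_IN_CELL)

print(f"4^1,000,000,000 is about 10^{exponent:,.0f}")
print(f"that exceeds the particle count of the universe by a factor of "
      f"10^{exponent - UNIVERSE_PARTICLES_EXPONENT:,.0f}")
```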

If you just throw one billion atoms together, randomly, even given millions of years of a huge expanse of warm soup, you will never, ever, get a DNA molecule! The DNA molecule is just too complicated. It has to evolve from simpler ancestors, like everything else. The ancestors are my “missing links.”

Let’s consider the construction of a DNA molecule. Forget about its physical size for the time being. How do we make a nucleotide, A, C, G, or T? It turns out that they are similar to each other, so one can talk about an average nucleotide: It has 5 Cs, 4 Hs, 1 O, and 4 Ns for a total of 14 atoms. They do not form a linear array, which would be around 14 angstroms long; as mentioned above, the average length of a nucleotide is 3.4 angstroms. So we start out by mixing together carbon, hydrogen, oxygen, and nitrogen in a flask, and add a mild source of energy to “stir the pot.” Energy is the least of our problems – there are plenty of trouble makers, such as geothermal hot spots, cosmic rays, X-rays, ultraviolet rays, lightning, and so forth. If we leave out the vitally important details, that is essentially what Stanley L. Miller did in a famous Ph.D. thesis experiment in 1953, which has been repeated any number of times since then. (See Schopf’s book for the details.)

What do we expect? A simple (and simple-minded) approach is to say that, with four different elements and a total of 14 atoms, the total number of molecules that can be formed is 4^14, or 268 million. This is simple-minded because chemistry doesn’t work that way. Only a very few of the 14-atom molecules that can be formed from CHON are stable, held together by hydrogen and other bonds.

The great excitement that followed Miller’s experiment was that he was able to generate simple organic molecules using very simple apparatus. Furthermore, the spectrum of light from the stars and galaxies shows that these relatively simple CHON molecules exist everywhere in the universe.

So this is how life began – a random buildup of CHON atoms until the simplest of DNA molecules, a few million base-pairs long, came into being. The word “random” here has to be severely restricted: only a relatively few of the huge number of possible combinations are stable—such as A, C, G, and T, for example. 

In recent years, however, an important hint of which molecules are stable has emerged via “reverse engineering”: In order to read its genetic ACGT code, some of the DNA molecules of a particular species are broken down, by chemical or other means, into many relatively short sections. Each short section is tractable; that is, its code can be determined. By matching identical and overlapping sections, with a great deal of assist from computers, the entire original DNA sequence can be revealed [5].  
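The idea of matching overlapping sections can be illustrated with a toy example in Python. This is a deliberately simple greedy merge of my own devising, not the method of reference [5] and nothing like a production sequencing program; the function names and the sample fragments are invented for illustration. It repeatedly joins the two fragments that share the longest end-to-start overlap until one sequence remains.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of a that equals a prefix of b (0 if shorter than min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0

def greedy_assemble(fragments: list[str], min_len: int = 3) -> list[str]:
    """Merge fragments pairwise by best overlap; returns the remaining sequence(s)."""
    frags = list(dict.fromkeys(fragments))  # drop exact duplicates, keep order
    # Discard fragments that lie wholly inside another fragment.
    frags = [f for f in frags if not any(f != g and f in g for g in frags)]
    while len(frags) > 1:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_len:
                        best_len, best_i, best_j = k, i, j
        if best_len == 0:
            break  # nothing overlaps any more; leave the pieces separate
        merged = frags[best_i] + frags[best_j][best_len:]
        frags = [f for idx, f in enumerate(frags) if idx not in (best_i, best_j)]
        frags.append(merged)
    return frags

# Three made-up overlapping pieces of the made-up sequence ATGCGTACGTTAGC.
pieces = ["GTACGTTA", "ATGCGTAC", "GTTAGC"]
print(greedy_assemble(pieces))
```

With real data the pieces number in the millions, contain reading errors, and include repeated stretches, which is why the text speaks of "a great deal of assist from computers."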

In what follows, I refer to the original DNA as a “viable molecule,” while the short sections are “pre-viable” molecules (PVMs). The short sections are, of course, stable CHON precursors of the final DNA molecule. In principle, at least, the short sections can be strung together to synthesize the “viable” DNA.

A model for the synthesis of the DNA of the first living cell, that of a bacterium, is depicted in Fig. 2(a). The first step is the random assembly of a 20-base-long stable molecule. With an average of 14 atoms per ACGT base, the total base length is around 280 atoms. How does this differ from, say, a crystal of salt? In a typical crystal, a huge number of identical “cells” adhere to each other in a three-dimensional solid array. The pre-viable molecule, on the other hand, is a single unit, with a definite outer boundary, residing in a semi-liquid medium (not necessarily water). A living cell has a “definite outer boundary” in the form of a cell wall or membrane, which is actually a very simple structure, yet sophisticated because it has to allow food in, and undesirable “aliens” out. For the PVMs of Fig. 2, however, it is premature to speculate on their cell “walls,” if any. There are many scenarios in which molecules initially remain apart, only to be induced to combine by some sort of trigger. A pertinent example is that of the ACGT molecules, which join up, end-to-end, to form part of a DNA molecule.


Figure 2.  Models for the synthesis of the DNA of the first living cell, a bacterium. (a) In steps structured on 20^n, culminating in 20^5.  (b) In steps structured on 5^n, culminating in 5^9.

Returning to Fig. 2(a): Next, the 20-base-long molecule randomly attaches itself to 19 other 20-base-long molecules to form a 400-base-long stable molecule. Repeating these processes, the fifth step is 3,200,000 bases long. Each of the original PVM components has been randomly assembled, so each DNA molecule is unique. There are, therefore, a countless number of different DNA molecules. How does this differ from that 10^600,000,000 total? Now we have four different nucleotides and a total of four million base pairs (see Table 1), so the total number of different bacterial DNA molecules that can form is 4^4,000,000, or 10^2,400,000. This is much smaller than the 10^600,000,000 value, by a factor of 10^597,600,000. What I am proposing is that many of those 10^2,400,000 DNA molecules that turned up in 400 million years were viable. These molecules were unique because, for the first time, a molecule was able to replicate itself; that made it different from simple accumulations of organic molecules. Also, the replicating molecules had to be in their own separate sack or space so they wouldn’t diffuse away. In other words, there were many possible outcomes that were able to launch an organism that we would call a living bacterium. In due time, “survival of the fittest” manifested itself by a few DNA codes that overwhelmed, by reproducing more often, the “weaker” competition. These first living cells did not eat each other; they ate the surrounding non-organic, non-biological material.

I should mention that ribonucleic acid (RNA), a close relative of DNA, was probably synthesized first. Also, there are large organic molecules—enzymes—that are not really alive, but some of them take part in metabolism by using ambient sources of energy. They are the precursors of plants. But they, too, with a cargo of thousands of atoms, have survived a great deal of evolution. 

Figure 2(b) illustrates a more gradual buildup, in five-base-multiple steps. Here the ninth step is a 1,953,125-base-long bacterium. 
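The two build-up schemes of Fig. 2 are simply successive powers of the group size, and can be tabulated with a short Python sketch (the function name is mine, for illustration only). It also reproduces the figure of about 280 atoms for the first 20-base unit, using the average of 14 atoms per base.

```python
ATOMS_PER_BASE = 14  # average atoms per A, C, G, or T base, from the text

def assembly_lengths(group_size: int, steps: int) -> list[int]:
    """Length in bases after each step, when every step joins group_size units of the previous step."""
    return [group_size ** n for n in range(1, steps + 1)]

scheme_a = assembly_lengths(20, 5)  # Fig. 2(a): 20, 400, ..., 3,200,000
scheme_b = assembly_lengths(5, 9)   # Fig. 2(b): 5, 25, ..., 1,953,125

for label, lengths in (("(a)", scheme_a), ("(b)", scheme_b)):
    print(label, ", ".join(f"{n:,}" for n in lengths))

print("atoms in the first 20-base unit:", scheme_a[0] * ATOMS_PER_BASE)
```

Both schemes end within a factor of about two of the four million base pairs quoted for a bacterium; the difference between them is only how gradual the intermediate steps are.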

As Table 2 indicates, the system modeled in Fig. 2 took place in 400,000,000 years. That is an awfully long time. What were the environmental specifications under which life developed in this time? It had to be fluid because the pre-viable molecules were not alive; since they couldn’t move from one place to another, their surrounding material had to slowly change. Liquid water is probably too fluid, too dilute. A high-viscosity mud, or what I like to call “goo,” full of CHON molecules of various complexities, is a more likely environment. The temperature was high—close to the maximum tolerable that avoids instability—because thermal agitation literally moves small molecules from one location to another. [This is also known as Brownian movement. Robert Brown (1773-1858), a botanist looking at pollen grains in a microscope, was the first to notice the incessant random motion of minuscule particles.] In a high-viscosity fluid, movement is very slow, but with a span of 400 million years, there is no need to rush along with inter-molecular alliances [6].

What about pressure? Here there is a common misconception that high pressure kills. Yes—if you take an organism that has air spaces and drop it from the top of the ocean to a high-pressure depth, the air spaces become compressed, and the resulting physical distortion can be fatal. But if the organism spends its entire life under high pressure, atoms are slightly closer together than on the surface of the earth, but this has only a minor effect on physiological processes. In other words, the goo can be very far underground—underneath deep oil wells, or beneath the seas, for example [7].  

The exciting conclusion that logically follows is that the simplest pre-viable molecules are evolving right now, probably in goo deep inside the earth, on land, or underneath the sea. Not too deep – just deep enough to be hot, but not past the boiling point of water. Alas, A.D. tells me that “most scientists feel that the pre-viable molecules are evolving but, as soon as they are formed, they are ‘eaten’ by bacteria, so they are not around in detectable quantities.” In other words, since time immemorial, primitive bacteria have been plundering the planet by consuming CHON pre-viable molecules. They sound positively human! But this is no joking matter. It explains why the search for the origin of life has failed thus far.

Another exciting concept is that pre-viable molecules are evolving everywhere in the universe. Since when is the earth the only place fortunate enough (or unfortunate, considering what we are doing with it) to have primordial goo?

Despite the negative prognosis, there is always the chance that some useful information will turn up.  What should we be looking for? According to Fig. 2, it would be nice to find something around 300,000 bases long. Translated into meters, with a DNA length of 3.4 angstroms per base, we should be looking for molecules that have a physical length of one million angstroms, or 0.1 mm. Unfortunately, there is something unrealistic with regard to this calculation: A 300,000-base molecule, consisting of a string of A, C, G, and T nucleotides, would be folded into something that approximates a ball. That is how it really is with a viable DNA molecule. Perhaps the pre-viable molecules can be unfolded. All of this seems to require high-magnification electron-microscope work, perhaps with special containers that can preserve the temperature and pressure conditions of pre-viable goo. A great deal of useful feedback can come from the people that break up DNA molecules; it certainly seems logical for them to try to reverse the process and reconnect the pieces to get the original DNA molecule. (When life’s origins are thus revealed, one can expect a storm of protest from religious and conservative groups as they strive to suppress the scientific evidence. In the meantime, one cannot expect much in the way of government funds for this “liberal” type of activity.)
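The length conversion can be checked with a few lines of Python. The sketch assumes a fully stretched-out strand at 3.4 angstroms per base, which, as just noted, is not how such a molecule would actually sit; the function name is hypothetical.

```python
RISE_ANGSTROM = 3.4          # length per base along the strand, from the text
ANGSTROMS_PER_MM = 1e7       # 1 mm = 10^-3 m = 10^7 angstroms

def stretched_length_angstrom(n_bases: int) -> float:
    """End-to-end length of an unfolded strand, in angstroms."""
    return n_bases * RISE_ANGSTROM

length_a = stretched_length_angstrom(300_000)
length_mm = length_a / ANGSTROMS_PER_MM

print(f"300,000 bases: about {length_a:,.0f} angstroms, or {length_mm:.2f} mm")
```

The result, roughly one million angstroms or 0.1 mm, is about a thousand times the 1000-angstrom diameter quoted earlier for the smallest bacterial cell, which is why the molecule has to be folded up.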

Let’s dig deeper. Or, better still, above ground in the sterile laboratory of some clever organic chemist,  Stanley Miller’s experiment can be speeded up by a factor of 100 million, so that long pre-viable molecules can be synthesized in the lab, but in much less than 400 million years. It would be great for a Ph.D. thesis, or even a Nobel prize or two.


In working with powers of 10, it is best to use logarithms to the base 10. Taking log 4 as approximately 0.6, log(4^1,000,000,000) = 1,000,000,000 log 4 = 600,000,000, so that the antilog yields 10^600,000,000. Similarly, log(4^4,000,000) = 4,000,000 log 4 = 2,400,000, so that the antilog yields 10^2,400,000. Finally, 10^600,000,000 / 10^2,400,000 = 10^(600,000,000 – 2,400,000) = 10^597,600,000.


[1] J. William Schopf, “Cradle of Life,” Princeton Univ. Press, 1999.
[2] David Darling, “Life Everywhere,” Basic Books, 2001.
[3] Peter D. Ward & Donald C. Brownlee, “Rare Earth,” Springer-Verlag, 2000.
[4] James F. Kasting & Janet L. Siefert, “Life and the Evolution of Earth’s Atmosphere,” Science, 10 May 2002.
[5] Marie E. Csete & John C. Doyle, “Reverse Engineering of Biological Complexity,” Science, 1 March 2002.
[6] Richard A. Kerr, “Deep Life in the Slow, Slow Lane,” Science, 10 May 2002.
[7] Steven D’Hondt, Scott Rutherford, & Arthur J. Spivak, “Metabolic Activity of Subsurface Life in Deep-Sea Sediments,” Science, 15 March 2002.

*Published in a shorter version in IEEE Engineering in Medicine and Biology Magazine, Sept/Oct 2001.

