The Case for Large-Size Mutations

(Essay #10, Revised September 2007)

Abstract: There are no laws of physics or chemistry that forbid large mutations; therefore, the “size” of a random mutation should fit the mathematics of a Poisson point process. The number of viable generations of a set of mutations, N, versus mutation size, S, should obey an exponential relationship. Three numerical examples are examined: a simple 15-generation sequence; experimental data involving a sequence of 56,611 random action potentials; and a synthetic sequence of 65,535 generations of a pseudorandom set of mutations. In the last example, with an average S of 2.22 units, the largest S is a 25-unit giant punctuated equilibrium that would be associated with major changes.

Punctuated equilibrium (or equilibria) refers to sudden, relatively large, viable mutational changes in the DNA/RNA molecule of a species. The subject has been punctuated by disagreements between Niles Eldredge and Stephen Jay Gould [1] on one side and Richard Dawkins [2] and Daniel Dennett [3] on the other. In the present paper, I dismiss all of these disagreements, because a random mutational process inevitably has to yield large jumps, that is, punctuated equilibria.

Here is how a biologist put it: “… big, beneficial mutations were thought to come along so rarely that many models simply assumed that they play no part in adaptation. But as evolutionists begin to probe the genetic basis behind important adaptations, they are uncovering examples of such large mutations, dramatically revising how biologists think about evolutionary change.” [4]

There are no laws in physics or chemistry that forbid large mutational changes. Therefore, if we wait long enough, punctuated equilibria must occur. Furthermore, they fit a simple exponential model: a Poisson point process, named after Siméon Poisson (1781-1840), who first shed light on these random events [5-11]. Large jumps become increasingly rare, but they occur nevertheless [12]. Poisson's life preceded and overlapped that of Charles Darwin (1809-1882). Of course, the mechanisms that can explain mutations were unknown in Darwin’s time.

Three numerical examples are examined in the paper:
1) A simple 15-generation synthetic example that illustrates the mathematical basis.
2) A sequence of 56,611 nerve action potentials. Because it is impractical to present a long sequence of DNA/RNA mutations, we are fortunate to have one of the famous plots taken from the auditory sensory receptor of a cat, published by N.Y.S. Kiang et al. [15]. This random discharge sequence beautifully illustrates the point process.
3) A run of 65,535 generations synthesized using a pseudorandom square wave (PRSW) sequence.

Preprocessing of data
The data, especially in a biological application, have to be preprocessed to emphasize the salient features, thus simplifying the mathematics. This is illustrated in Fig. 1. A set of transmitted mutations (TMs) is represented by a vertical line. The width of the space to the right of the line is proportional to the “strength” of the TMs; this is a measure the biologist must determine.


Fig. 1—Three short synthesized sections of a pseudorandom sequence of 65,535 transmitted mutations (TMs). The size of each mutation corresponds to the width of the space to the right of each line. (a) The usual perception that only small mutational changes are viable. The average mutation size is 2.5 units wide for the section shown. (b) The section that contains the largest mutation, a major change that is S = 25 units wide. The mutations before and after this giant are small, as in (a). (c) The section that contains a medium-large mutation, S = 16. Again, the mutations before and after are small, as in (a). [Because of the pseudorandom sequence used to synthesize the sections, (b) and (c) are almost identical.]

A reasonably small set of TMs, as shown in Fig. 1(a), is defined as being “1 unit wide”. One can think of mutation size S = 1 as representing a change in one nucleotide: for example, from thymine to guanine or vice versa. Then 2, 3, 4, … sets of TMs that are transmitted to the next generation are represented by spaces S = 2, 3, 4, … units wide, respectively. (Mathematically, S need not be an integer, which becomes abundantly clear in the remainder of the paper.)

Figure 1(a) depicts 26 generations. Ten of the TM sets are minuscule (S = 1); six are of size S = 2; four of size S = 3; two each of S = 4 and 5; one of S = 6; and one “giant,” at the right end, of S = 8.
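A quick arithmetic check of these counts, as a minimal Python sketch (the dictionary simply transcribes the numbers above): they add up to 26 generations and to an average width of about 2.5 units, the figure quoted in the Fig. 1 caption.

```python
# Counts of TM sets in the Fig. 1(a) section, keyed by size S (transcribed from the text).
counts = {1: 10, 2: 6, 3: 4, 4: 2, 5: 2, 6: 1, 8: 1}

generations = sum(counts.values())                  # number of TM sets shown
total_size = sum(s * n for s, n in counts.items())  # summed widths, in units
average_size = total_size / generations             # mean width per TM set

print(generations, total_size, round(average_size, 2))  # 26 66 2.54
```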

As an example of punctuated equilibrium, Fig. 1(b) shows a section of a 65,535-generation sequence. Here the largest set of TMs, with S = 25, is pictured near the center of the sequence. The S = 25 value is not a mathematical error; it is mathematically certain that huge, viable transmitted mutation sets have to occur. Of course, in many cases any data that show such unexpected mutations would be discarded (the laboratory sheet would perhaps be torn up), hopefully accompanied by an explanation that the power failed, or a “bug” or gremlin momentarily took over. In this case, however, outliers must be allowed; if they are deleted, the data averages become distorted.
In Fig. 1(c), an S = 16 punctuated equilibrium is depicted.

A 15-generation synthetic example
If we plot the number of TM sets of a particular size, N, versus the mutation set size, S, we get a straight line on a semi-log plot, as in Fig. 2(a). Entirely in agreement with what evolution reveals, there are a large number of relatively small mutations and a small number of large mutations. Figure 2 is a simple numerical example with a sequence of only 15 generations, shown in Fig. 2(b). (I return to the 65,535-generation example below.)

Fig. 2. A simple 15-TM numerical example. (a) A plot of the number of generations, N, versus the size of each mutation, S. The size of each mutation is nominally given by the Δt = 1 unit wide bin into which it falls. Ideally, a point process yields a straight line on this semi-log paper. (b) The 15-mutation sequence. As listed in Table 1, the dots of (a) represent N = 5 mutations that fall into the S = 1 bin, N = 3 into the S = 2 bin, and N = 2 into the S = 3 bin. Two of the mutation sets, S = 0.2 and 0.4, are too small to fall into a valid bin. The sequence is too short to expect an accurate fit for the N = 1 values at S = 4, 5, and 6.

The straight line of Fig. 2(a) corresponds to a simple exponential equation,

$$N = N_0\, e^{-\lambda S} \qquad (1)$$

where N0 is the intercept value at S = 0 and λ is a slope coefficient. The line of Fig. 2(a) intercepts the N axis at N0 = 7.5, and λ = 0.45. Substituting these values, the plot of Fig. 2(a) is described by $N = 7.5\,e^{-0.45 S}$.
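To see how well Eq. (1) describes the 15-generation example, the short Python sketch below evaluates $N = 7.5\,e^{-0.45 S}$ at the bin centers S = 1, …, 6. The function name expected_count is ours, not the author's.

```python
import math

N0 = 7.5    # intercept of the Fig. 2(a) line at S = 0
LAM = 0.45  # slope coefficient (lambda)

def expected_count(s: float) -> float:
    """Eq. (1): expected number N of TM sets in the 1-unit bin centred on size s."""
    return N0 * math.exp(-LAM * s)

for s in range(1, 7):
    print(f"S = {s}: N = {expected_count(s):.2f}")
```

The fitted line gives about 4.8, 3.0, and 1.9 TM sets for S = 1, 2, and 3, close to the observed N = 5, 3, and 2, and less than one set from S = 5 onward, in line with the remark that a 15-generation sequence is too short for an accurate fit at large S.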

The smooth line of Fig. 2(a) is somewhat unrealistic because mutations are quantized. If thymine is “accidentally” and randomly replaced by guanine, it is a sudden, violent act. Normally, the change is not partial; it is the total replacement of one group of atoms by another. In Fig. 2(a), the plot is therefore quantized by a set of vertical lines, as shown, representing bins that are 1 unit of TM set size wide. These are the Δt = 1 bins. The S value of each mutation set nominally becomes that of the center of its bin. The half-bin extending from S = 0 to 0.5 is ignored because its S values represent insignificantly small mutations. An S value falling between 0.5 and 1.5 is regarded as having the value S = 1, and so forth.
Quantization not only reflects the physics and chemistry of mutations; it is a major convenience in that it allows the assignment of an inverse-size index, M. The relation between M and mutation set size is given by

$$M = \frac{N_0}{\lambda\,\Delta t}\, e^{-\lambda S} \qquad (2)$$

For a derivation of this equation, see [13] and [14]. For Fig. 2(a), we have $M = [7.5/(0.45 \times 1)]\,e^{-0.45 S} \approx 16.7\,e^{-0.45 S}$. Table 1 lists the S values for Fig. 2(a) as a function of M. The S values range from 6.3 for M = 1 to 0.2 for M = 15. The third column gives nominal S values based on the center of each bin. The last two M values, M = 14 and 15, do not fall into a valid bin and are therefore discarded. There are N = 5 bin values at S = 1, N = 3 at S = 2, and N = 2 at S = 3, each represented by a dot. We also have N = 1 at S = 4, 5, and 6, but these are not plotted because a 15-generation sequence is too short to expect an accurate fit of large-size mutation sets to Eq. (1).
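For readers without [13] or [14] at hand, here is one short route to Eq. (2). It is a sketch under two assumptions: M counts how many mutation sets are at least as large as S (so that M = 1 belongs to the widest one, as in Table 1), and the binned counts N, which are per bin of width Δt, can be treated as a continuous density N/Δt and integrated from S upward:

$$M(S) \approx \int_{S}^{\infty} \frac{N_0}{\Delta t}\, e^{-\lambda s}\, ds = \frac{N_0}{\lambda\,\Delta t}\, e^{-\lambda S}$$

Setting S = 0 gives the total number of mutation sets, $M = N_0/(\lambda\,\Delta t)$, which is the same relation used to estimate N0 in Eq. (3).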

Table 1. The M index is an inverse-size measure, so that M = 1 belongs to the widest mutation in a sequence. The second column describes the sequence of Fig. 2, and the third column lists the nominal mutation size for bins Δt = 1 unit wide.

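The entries of Table 1 can be recomputed from Eq. (2) solved for S, using N0 = 7.5, λ = 0.45, Δt = 1, and the binning rule described above (0.5 ≤ S < 1.5 counts as S = 1, and so on). The following Python sketch does this; the helper names are ours, and rounding S to one decimal place is an assumption about how the table was printed.

```python
import math
from collections import Counter
from typing import Optional

N0, LAM, DT = 7.5, 0.45, 1.0   # Fig. 2(a) parameters
M_MAX = 15                     # length of the Fig. 2(b) sequence

def size_from_index(m: int) -> float:
    """Eq. (2) solved for S: S = (1/lambda) * ln[N0 / (M * lambda * dt)]."""
    return math.log(N0 / (m * LAM * DT)) / LAM

def nominal_bin(s: float) -> Optional[int]:
    """Centre of the Delta-t = 1 bin holding s; None for the ignored 0 <= s < 0.5 half-bin."""
    if s < 0.5:
        return None
    return math.floor(s + 0.5)  # 0.5 <= s < 1.5 -> 1, 1.5 <= s < 2.5 -> 2, ...

rows = [(m, size_from_index(m), nominal_bin(size_from_index(m)))
        for m in range(1, M_MAX + 1)]
for m, s, b in rows:
    print(f"M = {m:2d}   S = {s:4.1f}   bin = {b if b is not None else 'discarded'}")

# N, the number of TM sets falling into each valid bin
bin_counts = Counter(b for _, _, b in rows if b is not None)
print(sorted(bin_counts.items()))
```

It reproduces the values quoted in the text: S ≈ 6.3 for M = 1, S ≈ 0.4 and 0.2 for the discarded M = 14 and 15, and bin counts N = 5, 3, 2, 1, 1, 1 for S = 1 through 6.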

A nerve action potential point process
The above equations are normally used to describe a time process rather than a “mutation size” process. However, before we look at a more complicated example of a mutation size sequence, it is good strategy to examine actual point process data. They are depicted in Fig. 3(a), which is taken from one of many examples of a point process given by Kiang et al. [15].


Fig. 3. Experimental point process, taken from [15], consisting of spontaneous discharges from an auditory sensory receptor neuron of a cat. Total sequence time was 12.41 min. (a) A plot of N versus time between action-potential spikes, t. The bins are Δt = 1 ms wide. (b) A short synthesized section of the random sequence of 56,611 action potentials. The average interval width is 13 ms.

Figure 3 is, of course, analogous to Fig. 2. The data points almost perfectly trace out a straight line on the semi-log paper. These are experimental data consisting of spontaneous (unstimulated, random) discharges from an auditory sensory receptor neuron of a cat. In (a), the interval histogram uses bins Δt = 1 ms wide. Total sequence time was 12.41 min. In (b), a synthesized train of action potentials is shown. The equation describing the plot is

$$N = 6842\, e^{-0.09461\, t},$$

where t is the interval between action potentials, in ms. Fig. 3 is associated with a total of Mmax = 56,611 action potentials.
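The average interval quoted in the Fig. 3 caption follows directly from the total recording time and the number of action potentials. A one-line check in Python (the constant names are ours):

```python
TOTAL_TIME_MIN = 12.41   # total recording time quoted for Fig. 3
SPIKES = 56_611          # number of action potentials, Mmax

mean_interval_ms = TOTAL_TIME_MIN * 60_000 / SPIKES   # 1 min = 60,000 ms
print(f"{mean_interval_ms:.1f} ms")
```

The result, about 13.2 ms, agrees with the 13 ms average quoted above.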

Although the refractory period at the upper left of the plot is very important where the nervous system is concerned, it is irrelevant to the present paper. Instead, we are drawn to the data at the lower-right end of the plot: out of 56,611 discharges, there were over 10 that were 74 ms wide. Visualize a 74-ms gap inserted in the sequence of Fig. 3(b), where the average width is 13 ms. The 74-ms gap would stand out “like a sore thumb” and be regarded with suspicion. But this is the main raison d’être of the present paper: the entire lower-right end of the plot of Fig. 3(a) consists of rare but perfectly legitimate wide gaps that inevitably occur in a point process.

A 65,535-generation synthetic example

Finally, consider a sequence of Mmax = 65,535 generations of TM sets. It is convenient to use the same value of λ as in Fig. 2, λ = 0.45, which corresponds to an average mutation size of S = 1/λ = 2.22.

In choosing a value for N0, it is a good idea to compensate for the S values that are discarded because they fall into the 0 < S < 0.5 half-bin. My method is to calculate a first estimate, N01, using Eq. (2) with S = 0 and M = Mmax. This obviously yields
$$N_{01} = M_{\max}\,\lambda\,\Delta t \qquad (3)$$
For the Mmax = 65,535 sequence, we get N01 = (65,535)(0.45)(1) = 29,491. Now increase this by the factor 7.5/7.2 [which is taken from Fig. 2(a)] to get N0 = 30,720. Equation (1) becomes $N = 30{,}720\, e^{-0.45 S}$.
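The two-step choice of N0 reduces to a few lines of arithmetic. A minimal Python sketch, taking the 7.5/7.2 correction factor as given in the text:

```python
M_MAX = 65_535   # number of generations in the long synthetic sequence
LAM = 0.45       # slope coefficient, as in Fig. 2
DT = 1.0         # bin width

n01 = M_MAX * LAM * DT    # Eq. (3): first estimate of N0
n0 = n01 * 7.5 / 7.2      # compensate for the discarded 0 < S < 0.5 half-bin
print(round(n01), round(n0))   # 29491 30720
```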

The plot of N versus S is shown in Fig. 4.


Fig. 4. Number of generations versus mutation size for a synthetic sequence of 65,535 pseudorandom mutations, as an example of a point process. The plot is based on the same bin size (Δt = 1) and slope coefficient (λ = 0.45) as Fig. 2(a). The value of N0 is 30,720. The three short sections of Fig. 1 are taken from this sequence.

Synthesized mutation sequences

Figure 4 is of little help, however, in visualizing what the mutation size sequence actually “looks” like. Synthesized sequences of this kind are shown in Figs. 1, 2(b), and 3(b). The synthesized S sequence is derived as follows:

First generate an M sequence. This in turn, using Eq. (2), yields the S sequence
$$S = \frac{1}{\lambda}\,\ln\!\left[\frac{N_0}{M\,\lambda\,\Delta t}\right] \qquad (4)$$
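As a sketch, Eq. (4) can be evaluated directly for the 65,535-generation example (N0 = 30,720, λ = 0.45, Δt = 1). Rounding S to the nearest bin center is our implementation of the binning rule described earlier, and the function names are illustrative.

```python
import math

N0, LAM, DT = 30_720, 0.45, 1.0   # 65,535-generation example

def size_from_index(m: int) -> float:
    """Eq. (4): S = (1/lambda) * ln[N0 / (M * lambda * dt)]."""
    return math.log(N0 / (m * LAM * DT)) / LAM

def nominal_bin(s: float) -> int:
    """Nearest Delta-t = 1 bin centre (0 marks the ignored 0 <= S < 0.5 half-bin)."""
    return math.floor(s + 0.5)

for m in (1, 2, 3, 63):
    s = size_from_index(m)
    print(f"M = {m:2d}: S = {s:5.2f}, bin {nominal_bin(s)}")
```

The largest mutation, M = 1, lands in the S = 25 bin; M = 2 and 3 give the next giants, S = 23 and 22; and M = 63 gives S = 16, the medium-large mutation of Fig. 1(c).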

How does one go about generating a suitable M sequence? One can refer to a table of random numbers, but it is better to use a pseudorandom sequence, because then any element can be recalled from the rules of the sequence. In the present paper, a pseudorandom square wave (PRSW) sequence is used [16]. Table 2 gives an example for the range from 0 to 2^4 − 1 = 15. Reading downward, the M value sequence is 15, 7, 11, …, 4, 8, 0 (the last value is not used because M = 0 is meaningless).

Table 2. Generation of M index integers using a PRSW sequence, shown for the mutation sequence of Fig. 2.


The PRSW sequence offers the following advantages:
1) It is easily constructed, as shown in Table 2.
2) Each integer in the sequence appears once and only once, without any omissions.
3) Two very wide or very narrow intervals in succession are impossible (for example, 15 adjacent to 14 or 1 adjacent to 2 do not occur).
4) Given any particular row, it is relatively easy to recall the rows preceding and following the given row. (See, for example, Table 3.)
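The text gives the outcome of the PRSW rule rather than the rule itself, so the following is a hedged reconstruction, not the author's procedure. We assume that Table 2 is a binary down-counter whose columns are read with weights 1, 2, 4, 8, … from left to right, which amounts to reversing the bit order of each count. Under that assumption, the sketch reproduces every value quoted in the text: 15, 7, 11, …, 4, 8, 0 for four bits; 65,535 and 32,767 as the first two values for sixteen bits; and M = 1 at the center of the sequence. The function name prsw_sequence is ours.

```python
def prsw_sequence(bits: int) -> list[int]:
    """Hypothetical reconstruction of the PRSW M sequence of Table 2.

    Assumes a binary down-counter (2**bits - 1, ..., 1, 0) whose columns are
    read with weights 1, 2, 4, ... from left to right, i.e. each count is
    bit-reversed.
    """
    def reverse_bits(k: int) -> int:
        r = 0
        for _ in range(bits):
            r = (r << 1) | (k & 1)   # shift the lowest bit of k into r
            k >>= 1
        return r

    return [reverse_bits(k) for k in range(2**bits - 1, -1, -1)]


seq = prsw_sequence(4)
print(seq)   # [15, 7, 11, 3, 13, 5, 9, 1, 14, 6, 10, 2, 12, 4, 8, 0]

# Property 2: every integer from 0 to 15 appears exactly once.
assert sorted(seq) == list(range(16))
# Property 3 examples: 15 never sits next to 14, nor 1 next to 2.
pairs = set(zip(seq, seq[1:])) | set(zip(seq[1:], seq))
assert (15, 14) not in pairs and (1, 2) not in pairs

long_seq = prsw_sequence(16)
print(long_seq[:2], long_seq.index(1))   # [65535, 32767] 32767
```

In the 16-bit case, M = 1 sits at index 32,767, with 32,767 usable values before it and 32,767 after it (the final 0 is unused).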

Table 3. Derivation of the sequence of Fig. 1(b), for which the central M index value is 1 and the corresponding S is 25 units wide. The table also shows the values for the 15 mutations preceding and following the giant M = 1 mutation. The S values are derived from the M values using Eq. (4). (The row values 1 to 31 are of no particular significance.)



The M sequence of Table 2 was used to generate Fig. 2(b) with the aid of Eq. (4).
Return now to the TM sequence of Fig. 1(a), which belongs to an unbroken string of 2^16 − 1 = 65,535 viable generations. The first PRSW-derived M value is 1 + 2 + 4 + … + 2^13 + 2^14 + 2^15 = 65,535. The second M value is 1 + 2 + 4 + … + 2^13 + 2^14 = 32,767, and so forth.

But the monster gap of Fig. 1(b), which is associated with M = 1, is the main focus of the present paper. According to Table 2, the value M = 1 is located in the center of the sequence, at 1 + 0 + 0 + …. For the 65,535-generation example, Table 3 lists the M = 1 row and the M values of the 15 rows above and below it. The third column lists the corresponding S values calculated from Eq. (4) with N0 = 30,720. The contents of the Δt = 1 bins appear in the last column. This is the sequence shown in Fig. 1(b).
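Putting the pieces together, the hedged sketch below rebuilds the bin column of a Table 3 style window (and the analogous M = 63 window of Fig. 1(c)) from a 16-bit sequence and Eq. (4). It assumes, as our own reading and not something stated in the text, that the PRSW rule amounts to reversing the bit order of a 16-bit down-counter; the function names are illustrative.

```python
import math

N0, LAM, DT, BITS = 30_720, 0.45, 1.0, 16   # 65,535-generation example

def reverse_bits(k: int, bits: int = BITS) -> int:
    """Reverse the order of the lowest `bits` bits of k (assumed PRSW rule)."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (k & 1)
        k >>= 1
    return r

def size_bin(m: int) -> int:
    """Eq. (4) followed by rounding to the nearest Delta-t = 1 bin centre."""
    s = math.log(N0 / (m * LAM * DT)) / LAM
    return math.floor(s + 0.5)

def window(center_m: int, half_width: int = 15) -> list[tuple[int, int]]:
    """(M, bin) for the row holding center_m plus half_width rows on each side.
    Rows count down from 2**BITS - 1, and row k holds M = reverse_bits(k)."""
    k0 = reverse_bits(center_m)   # bit reversal is its own inverse
    rows = range(k0 + half_width, k0 - half_width - 1, -1)
    return [(reverse_bits(k), size_bin(reverse_bits(k))) for k in rows]

giant = window(1)     # Fig. 1(b) / Table 3
medium = window(63)   # Fig. 1(c)
print([b for _, b in giant])
print([b for _, b in medium])
```

Under this assumption, both windows have 31 rows, the center rows fall in the S = 25 and S = 16 bins, and every neighbor falls in a small bin. The two printed lists agree everywhere except the center row, which is one way to see why Figs. 1(b) and 1(c) are almost identical.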

Similarly, Fig. 1(c) plots the bin value of the M = 63 row (for which S = 16), together with the bin values of the 15 rows above and below it. Because of the rules of the PRSW sequence as applied to a long string of 65,535 generations, Figs. 1(b) and 1(c) are almost identical.

We again ask ourselves: are the large mutation sizes valid? In my opinion, the answer is “yes,” for two reasons. First, the fact that humans have evolved from humble beginnings some 4 billion years ago surely testifies (but does not prove) that large mutations are sometimes viable. Second, and more convincing, there are no laws of physics or chemistry that forbid large mutations. They are rare but allowable and, sometimes, they are viable. If so, they trace out the lower-right portion of the point-process line down to around N = 1 transmitted mutation set.

There is no guarantee that, in a sequence of 65,535 generations, the huge S = 25, 23, 22, … changes will occur; all of them are expected values, not certainties. But Fig. 3 shows that, in an analogous situation, departures from the straight-line locus are minor. If we wait long enough, a giant mutation will show up sooner or later. In fact, it may appear much sooner. That is the nature of the random beast.

Evolution is a general organizing principle that applies to everything under the sun (or universe). It is tempting to conjecture that large evolutionary changes can be viewed as large “mutations.” The following are three examples:

1) There was a random accretion of material by planets as they were bombarded by asteroids and comets.  On a plot similar to that of Fig. 1, the interspike distance would obviously be proportional to the mass of each asteroid or comet.

2) The so-called “mysterious” creativity of the human mind is easily explained. The brain is a cauldron in which myriad signals dash about with a strong element of randomness. Creativity is the serendipitous output of the human mind when many noisy signals happen to converge simultaneously to create a meaningful “new concept” [14].

3) As stated above, we “evolved from humble beginnings some 4 billion years ago.” In a universal process that undoubtedly continues at the present time, and that a clever chemist will discover one of these days, the first living cell is an assembly of relatively few atoms. We can conjecture that, given relatively large-size mutations plus millions of years, the simplest DNA/RNA molecules will be synthesized; this is the beginning of life as we know it.


[1] N. Eldredge and S.J. Gould, “Punctuated equilibria: an alternative to phyletic gradualism,” in T.J.M. Schopf, ed., Models in Paleobiology, San Francisco: Freeman Cooper, 1972.
[2] R. Dawkins, The Blind Watchmaker, New York: W.W. Norton, 1996.
[3] D. Dennett, Darwin’s Dangerous Idea, New York: Simon & Schuster, 1995.
[4] V. Morell, “Size matters: The genes behind adaptation,” Science, vol. 284, p. 2106, 1999.
[5] D.L. Snyder, Random Point Processes, New York: Wiley, 1975.
[6] A. Papoulis, Probability, Random Variables, and Stochastic Processes, New York: McGraw-Hill, 1965.
[7] I. Gath, “Analysis of point process signals applied to motor unit firing patterns,” Math. Biosci., vol. 22, p. 211, 1974.
[8] A.V. Holden, Models of the Stochastic Activity of Neurones, Berlin: Springer-Verlag, 1976.
[9] J.P. Landolt and M.J. Correia, “Neuromathematical concepts of point process theory,” IEEE Trans. Biomed. Eng., vol. BME-25, p. 1, 1978.
[10] G.P. Moore, D.H. Perkel, and J.P. Segundo, “Statistical analysis and functional interpretation of neuronal spike data,” Ann. Rev. Physiol., vol. 28, p. 493, 1966.
[11] E. Parzen, Stochastic Processes, San Francisco: Holden-Day, 1962.
[12] S. Deutsch, “The case for large-size mutations,” IEEE Trans. Biomed. Eng., vol. 48, p. 124, 2001.
[13] S. Deutsch and E. Micheli-Tzanakou, Neuroelectric Systems, New York: New York Univ. Press, 1987.
[14] S. Deutsch and A. Deutsch, Understanding the Nervous System, Piscataway, NJ: IEEE Press, 1992.
[15] N.Y.S. Kiang, T. Watanabe, E.C. Thomas, and L.F. Clark, Discharge Patterns of Single Fibers in the Cat’s Auditory Nerve, Cambridge, MA: M.I.T. Press, 1965.
[16] S. Deutsch, “Pseudo-random dot scan television systems,” IEEE Trans. Broadcasting, vol. BC-11, p. 11, 1965.

