While this is a remarkable finding in itself, many other processes in nature have also been found to be Zipf-compliant. What makes the findings of Eugene Stanley et al. (1994) so amazing is that they also used information theory as a wholly separate test of the non-coding data. Using the theory that governs modern scientific treatments of data compression, storage and communication, they found that non-coding DNA has the properties necessary for maintaining data integrity even in error-prone conditions. This necessary form of redundancy, which is expressed in terms of “Shannon entropy,” was not found in the coding portion of the DNA, nor was it found in the random characters used as a control.
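To make the idea of redundancy a little more concrete, here is a minimal Python sketch of how Shannon entropy can be turned into a redundancy score for a string of DNA letters. It is an illustration only, not the procedure used in the 1994 study: the function names, the toy sequences, and the choice to compare against a four-letter maximum are all assumptions made for this example.

```python
import math
from collections import Counter


def shannon_entropy(seq, n=1):
    """Entropy, in bits, of the overlapping n-letter blocks found in seq."""
    blocks = [seq[i:i + n] for i in range(len(seq) - n + 1)]
    counts = Counter(blocks)
    total = len(blocks)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def redundancy(seq, n=1, alphabet_size=4):
    """1 - H / H_max, where H_max is the entropy of fully random n-letter blocks.

    Assumption for this sketch: a 4-letter alphabet (A, C, G, T).
    A score near 0 means "looks random"; a score near 1 means "highly repetitive".
    """
    h_max = n * math.log2(alphabet_size)
    return 1.0 - shannon_entropy(seq, n) / h_max


# Toy sequences, made up for illustration (not real genomic data).
uniform = "ACGT" * 25        # every letter appears equally often
skewed = "A" * 97 + "CGT"    # one letter dominates

print(redundancy(uniform))        # single letters: no redundancy
print(redundancy(skewed))         # single letters: high redundancy
print(redundancy(uniform, n=2))   # letter pairs reveal the repeating pattern
```

Note that the first toy sequence looks perfectly random when you count single letters, but shows clear redundancy once you look at pairs of letters. That is why tests of this kind examine blocks of several lengths rather than individual characters alone.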
The fact that two very different tests both detect linguistic characteristics in the non-coding sections strongly suggests that structured and useful information is contained within the non-coding sections of DNA. This initially led Hancock to posit that non-coding DNA, which makes up the vast majority (at least around 90%) of the human genome, might actually be a repository of knowledge which could in some way be accessible. Recent science has added a great deal to this mind-blowing hypothesis.
“Junk” DNA? What are we talking about?
DNA is a set of molecules whose shape and configuration of charges allow them to interact with various other molecules in ways that ultimately result in building and operating the super-complex micro machines we call cells. One little section of DNA, therefore, is like one of those complex little machines full of gears that can do fascinating but specific things. Now imagine creating enough different little machines to populate a factory whose purpose is to create even more tools, gears, and machines. This is a basic description of a cell.
In this analogy the configurations of atoms that make up DNA are like specially shaped multi-pole magnets which self-assemble and fit together to make up the gears of the machinery. The specific way all these gears and machines fit together and how they are arranged is crucial to the factory being able to create what it does. The sequence of events dictated by the arrangement of equipment can be said to be a sort of “program,” and that is why we often refer to DNA this way. The simplest pieces are all pretty much the same, so it’s the arrangement that makes the difference between “pieces of metal” and “a machine.” That configuration is information.
The problem is that when we look at the super-complex machinery of DNA, we can only really separate “machines” from “hunks of metal” by the fact that we can see certain components work together to create things. We call these sections of DNA “coding” sections because they take part in creating proteins. When we examined all we knew about DNA and its functioning, we found that only a small percentage of the DNA actually seemed to do anything at all.
This is like looking into a factory in which over 90% of what you see is machine parts in various states of configuration and assembly that don’t actually do anything and are never used at all in the production process. They’re just hanging around. Then, when the factory fully duplicates itself, it duplicates all the seemingly pointless junk as well. The question we have to ask is whether the non-coding DNA really is “junk” or not. Nature tends to get rid of superfluous things, so what is all the extra information for?