Friday, April 15, 2016

Linguistic characteristics of Non-coding DNA:


In chapter twenty of Graham Hancock’s Supernatural, a fascinating question about the contents of “junk” DNA is brought to light. At the time of publication, researchers had reported that only the non-coding sections of DNA showed long-range ordering, while the coding sections did not. This long-range ordering conforms to Zipf’s law, a pattern that is also found in spoken language.
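To make the Zipf test concrete, here is a minimal sketch of how one might check a sequence for Zipf-like behavior. It treats every overlapping run of k letters as a “word,” ranks the words by how often they occur, and fits a straight line to log(frequency) against log(rank). The function names and the choice of k = 3 are my own assumptions for illustration and are not taken from the original study; a slope near -1 is the classic Zipf signature.

```python
from collections import Counter
import math

def kmer_rank_frequency(seq, k=3):
    """Count overlapping k-letter 'words' and return the counts, highest first."""
    words = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    return sorted(Counter(words).values(), reverse=True)

def zipf_slope(freqs):
    """Least-squares slope of log(frequency) against log(rank).

    Needs at least two ranks; a slope close to -1 is Zipf-like.
    """
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx

if __name__ == "__main__":
    # Toy sequence only; a real test would use long genomic sequences.
    toy = "ATGCGATATATATGCGCGATATTTATATGCATATAT"
    print(zipf_slope(kmer_rank_frequency(toy, k=3)))
```

A toy string this short proves nothing, of course. The point is only to show what “Zipf compliant” means operationally: a straight line on a log-log plot of rank against frequency.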

While this is a remarkable finding in itself, there are many other processes in nature that have also been found to be Zipf compliant. What makes the findings of Mantegna, Stanley et al (1994) so amazing is that they also used information theory as a wholly separate test of the non-coding data. Using the theory which governs modern scientific treatments of data compression, storage and communication, they also found that non-coding DNA has the properties necessary for maintaining data integrity even in error-prone conditions. This necessary form of redundancy, which is expressed in terms of “Shannon Entropy,” was not found in the coding portion of the DNA, nor was it found in random characters used as a control.
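For readers who want to see what a redundancy measurement looks like, here is a small sketch under my own assumptions. It computes the Shannon entropy of the k-letter words in a sequence and compares it with the maximum possible entropy for a four-letter alphabet. A redundancy of 0 means the sequence looks random at that word length; values above 0 mean some of the capacity is being spent on repetition or structure. This is a simplified illustration, not the exact procedure of the 1994 paper.

```python
from collections import Counter
import math

def shannon_entropy(seq, k=1):
    """Shannon entropy, in bits, of the overlapping k-letter words in seq."""
    words = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    total = len(words)
    counts = Counter(words)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def redundancy(seq, k=1, alphabet_size=4):
    """1 - H / H_max, where H_max = k * log2(alphabet_size)."""
    max_entropy = k * math.log2(alphabet_size)
    return 1 - shannon_entropy(seq, k) / max_entropy

if __name__ == "__main__":
    print(redundancy("ACGTACGTACGT", k=1))  # every letter equally common
    print(redundancy("AAAAAAAAAAAT", k=1))  # heavily biased toward one letter
```

Note that short sequences give unreliable entropy estimates at larger k, so any real comparison of coding and non-coding regions would need long sequences and a random control, as in the study described above.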

The fact that these two very different tests for linguistic characteristics both come back positive for the non-coding sections leads very strongly to the conclusion that structured and useful information is contained within the non-coding sections of DNA. This initially led Hancock to posit that non-coding DNA, which makes up the vast majority (at least around 90%) of the human genome, might actually be a repository of knowledge which could in some way be accessible. Recent science has added a great deal to this mind-blowing hypothesis.

“Junk” DNA? What are we talking about?

DNA is a set of molecules whose shape and configuration of charges allow them to interact with various other molecules in ways that ultimately result in building and operating the super-complex micro machines we call cells. One little section of DNA, therefore, is like some of those complex little machines full of gears that can do fascinating but specific things. Now imagine creating enough different little machines to populate a factory and that factory’s purpose was to create even more tools, gears, and machines. This is a basic description of a cell.

In this analogy the configurations of atoms which make up DNA are like specially shaped multi-pole magnets which self-assemble and fit together to make up the gears of the machinery. The specific way all these gears and machines fit together and how they are arranged is crucial to making the factory actually able to create what it does. The sequence of events dictated by the arrangement of equipment can be said to be a sort of “program,” and that is why we often refer to DNA this way. The simplest pieces are all pretty much the same, so it’s the arrangement that makes the difference between “pieces of metal” and “a machine.” That configuration is information.

The problem is that when we look in at the super complex machinery of DNA, we can only really separate “machines” from “hunks of metal” by the fact that we can see certain components work together to create things. We call these sections of DNA “coding” sections because they take part in creating proteins. When we examined all we knew about DNA and its functioning, we found that only a small percentage of the DNA actually seemed to do anything at all.

This is like looking into a factory in which over 90% of what you see is machine parts in various states of configuration and assembly that don’t actually do anything and are never used at all in the production process. They are just hanging around. Then, when the factory fully duplicates itself, it duplicates all the seemingly pointless junk as well. The question we have to ask is whether the non-coding DNA really is “junk” or not. Nature tends to get rid of superfluous things, so what is all the extra information for?


Ups and downs in the recent history of “junk” DNA

One of the first advances of note in examining this idea is the recent finding that some of the non-coding areas of DNA previously labelled “junk” play a crucial role in modulating gene expression. Some of the previously labelled “junk” is, therefore, actually functional. In many reports on these findings, the very notion of “junk DNA” is strongly discouraged, as though all of the non-coding DNA has now been shown to have some purpose. This is the farthest thing from the truth, however!

Upon closer examination, it has long been known (as early as 1951, through Barbara McClintock) that gene regulation exists within the genome and that not all of these regulators are fully known or understood. Furthermore, the idea of “junk” DNA has been held up as support for an evolutionary development model and has been a hot-button issue for creation science proponents. Much of the literature and impetus to fight the idea of “junk” DNA has come through creation science sources with a specific axe to grind.

Before the completion of the human genome project, most arguments against “junk” DNA were based upon comparative genomics. It was found that upon comparing genomes between species, a number of sequences in non-coding areas are shared. The further apart the species are genetically, the more “highly conserved” an identical sequence found in both is considered to be.

Some sequences are considered “ultraconserved” because they are found in widely divergent species such as rats, humans, and cows. The first indication that these areas in particular may be useful in some way was when over-expression of these ultraconserved regions was linked with certain types of cancer (Calin et al 2007). This finding is a special case which runs contrary to many other findings discussed below.

Around the turn of the millennium, upon completion of the human genome project, it was known that a mere 1.5% or so of the DNA is actually coding. But in September 2012 a follow-up effort, the Encyclopedia of DNA Elements (ENCODE) project, published a series of papers in Nature which showed that up to 76% of non-coding DNA was being transcribed. This report, that around 80% of the genome is biologically active, also led to a great deal of conjecture and presumption. Scientifically speaking, however, these findings were not direct evidence of real biological functions, but evidence of the possibility of them.

As early as 2011, however, a separate research group had reported that a species of bladderwort was deleting the vast majority of its non-coding DNA, such that only 3% of its total genome was non-coding (Ibarra-Laclette et al 2011); this stands in contrast to the 98.5% found in humans. This extremely complex species was functioning quite well with a minuscule number of regulatory sequences.

While some of the non-coding DNA in humans has been directly shown to have regulatory action, the most recent findings now show that the amount of actually functioning non-coding DNA is still extremely small. This conclusion fits well with the bladderwort findings that only a small number of regulatory non-coding sequences were necessary to function. Most notably, Rands et al (2014) have reported that the total functional DNA in humans is no greater than 8.2%. This is, however, not proof of direct functioning of even that small percentage, but an estimate made, once again, on the basis of comparative genomics. Only a very, very small percentage of non-coding DNA has been directly shown to have regulatory action.

Comparative genomics in this context relies predominantly upon the conservation of genes between species, as mentioned above. It is automatically presumed that the conservation of genes occurs because of selection pressures acting upon the animal, meaning that the gene must either grant some survival advantage or otherwise play some functional role that can impact selection. Experimentation in deleting these genes, however, has shown this presumption to be highly questionable.

It is important to note that one of the crucial aspects of regulatory sequences is that their placement near coding genes is considered the mechanism by which they function. While the first experiments which deleted highly conserved areas showed zero effect in the animals (Nobrega et al 2004), these extremely unexpected results were explained as perhaps related to the ease of lab conditions, which keep the animals healthy and may hide possible frailties. In another experiment, however, researchers specifically chose ultraconserved areas in close proximity to genes which are known to cause severe abnormalities when not functioning normally. Once again, the deletion of these regions had absolutely zero impact upon morphology, longevity, reproduction, or metabolism (Ahituv et al 2007).

DNA data storage and information theory …again

In the past decade and a half, the computer industry has delved deeply into the use of DNA for developing long-term, space-saving data storage. The first major proof of the concept emerged in 2012 with the writing and reading of over 5 megabits of information in DNA storage (Church et al 2012). While there have been simultaneous advancements in numerous biological computation technologies (Varghese et al 2015), the most telling example is the development of a DNA storage system for data which is not only nearly as easy to edit and manipulate as an electronic computer system, but has been shown to potentially last as long as a million years in cold storage (Grass et al 2015). Even using less than perfect data compression, all of the data on Wikipedia and Facebook combined could fit in a couple of drops of liquid, and all the knowledge of mankind in the space of an elevator cabin.
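The basic idea of writing digital data into DNA is simple enough to sketch in a few lines. Since DNA has four letters, each base can in principle carry two bits. The mapping below (00 to A, 01 to C, 10 to G, 11 to T) is purely my own illustrative assumption; the published systems use more careful encodings, and I am not reproducing any of them here.

```python
BASES = "ACGT"  # assumed mapping: index 0..3 corresponds to bit pairs 00, 01, 10, 11

def bytes_to_dna(data: bytes) -> str:
    """Encode each byte as four bases, most significant bit pair first."""
    out = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            out.append(BASES[(byte >> shift) & 0b11])
    return "".join(out)

def dna_to_bytes(dna: str) -> bytes:
    """Decode a base string produced by bytes_to_dna back into bytes."""
    if len(dna) % 4 != 0:
        raise ValueError("length must be a multiple of 4 bases")
    out = bytearray()
    for i in range(0, len(dna), 4):
        byte = 0
        for base in dna[i:i + 4]:
            byte = (byte << 2) | BASES.index(base)
        out.append(byte)
    return bytes(out)

if __name__ == "__main__":
    strand = bytes_to_dna(b"library")
    print(strand)
    print(dna_to_bytes(strand))
```

At two bits per base, one byte takes four bases, which gives a feel for why the storage density of DNA is so attractive.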

The further breakthrough in this most recent development is the application of information theory to maintain data integrity even in conditions that might cause errors. DNA can be very unstable, so the largest challenge is overcoming this limitation. By using error correction codes, lost data can be reconstructed. This is accomplished by adding structured redundancy to the data; duplicating information is the simplest form of it.
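To show how redundancy lets lost data be reconstructed, here is a sketch of one of the oldest error correction codes, the Hamming(7,4) code. It is a stand-in chosen for brevity and is not the code used in the DNA storage work cited above. Four data bits are stored as seven bits; if any single one of the seven is flipped, the decoder can locate the damaged bit and repair it.

```python
def hamming_encode(d):
    """Encode 4 data bits [d1, d2, d3, d4] as a 7-bit Hamming(7,4) codeword."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4  # parity over positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # parity over positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # parity over positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]

def hamming_decode(codeword):
    """Correct up to one flipped bit and return the 4 data bits."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    error_position = s1 + 2 * s2 + 4 * s3  # 0 means no error detected
    if error_position:
        c[error_position - 1] ^= 1
    return [c[2], c[4], c[5], c[6]]

if __name__ == "__main__":
    word = hamming_encode([1, 0, 1, 1])
    damaged = list(word)
    damaged[4] ^= 1  # simulate a single error in storage
    print(hamming_decode(damaged))  # recovers the original four bits
```

The three extra parity bits are exactly the kind of redundancy that shows up as lowered entropy in a statistical test: the stored sequence is less random than the raw data it protects.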

This methodology of redundancy results in precisely what was searched for (and found) in non-coding DNA in the study mentioned at the beginning of this article. With this insight, it seems as though we may have come full circle with an ancient advanced civilization who had previously discovered and used this technology.

In the continued pursuit to design a data system which could eternally store knowledge, it is inevitable that any single storage place for a library will be subject to destruction, from simple mechanisms such as bombing to longer-term concerns such as subduction by tectonic plates at large enough time scales. Our technology is already advancing quickly from manipulating DNA towards using it as a form of wet nanotechnology (Zakeri et al 2015), which could accomplish all of the tasks that our current technologies do, with the added advantage of self-assembling, self-healing and self-replicating devices.

An obvious solution to the eternal storage problem would be to write our knowledge into creatures that will grow, replicate and most importantly adapt to an ever-changing environment through branching into every possible species, to colonize every possible space and environmental niche. It makes perfect sense to do again what might have been done before.

We may decide to build an eternal library and fill it with our own books. Chances are that, as we excavate the location to build this library, we will find, just under the surface, a library already built in the same way and in the same place as before.

Data means nothing without an interpretation key

One of the most crucial aspects to consider when designing this eternal library, however, is the leaving of a Rosetta stone, so that in a few million years, when a wholly different species with no concept of our language attempts to read it, it can be decoded. The Rosetta stone itself must not only be hidden in the DNA with the information, but also somehow cross language and culture barriers that are separated by an alien mental gulf of time and circumstance.

One conceivable method for creating a Rosetta stone is to cause library sections (non-coding) of the information to somehow also impact and interact with the sections of DNA which do actually form the creature (coding DNA). This could be a way of linking pure information with real world actions and objects. By combining the method of action of this “Rosetta stone” section of regulatory DNA with its location in relationship to other non-coding sections, it might represent a word within some larger statement. If this was combined with the component of the animal that a particular section of DNA represents, one might develop a seed for universal translation of the data through the symbolic actions and actors found within the DNA itself.

For instance, if the regulatory action causes a gene to “hide” or “diminish” then that sequence of DNA could be interpreted to have that meaning. If it “cuts” or “translates,” these might also give us a meaning to particular sequences. Then if it affects “blood” or an “arm” then we begin to have additional nouns to work with. All of these actions and components might be used in a metaphorical sense to build a language which can be unpacked through self-modification. Each relationship might lead to other relationships by which a well designed language might lead to ever more complex words based upon a system in which few words act as modifiers upon each other to create ever greater refinement and specificity of definition.

The discovery that a small portion of our non-coding DNA does directly regulate some genes follows naturally from, and further supports, the genetic library hypothesis.



Eliminating selection pressure impact on genetic information storage

 

One of the other important design considerations in using animals and their offspring to store crucial information is the ability to allow the normal developmental process of species to continue while keeping it from impacting the passing on of the data to offspring. There would need to be a selection mechanism within the genes that favors preservation of the data even though the data provides no survival advantage.

A recent development in genetics has actually provided such a mechanism, and it’s called a “gene drive” or a Mutagenic Chain Reaction (MCR). In applying the CRISPR gene editing technology with a targeting sequence, researchers have created an auto-catalysing reaction that results in a self-replicating gene editing system (Gantz & Bier 2015). This system targets and edits all copies of the targeted gene and therefore not only affects the genes of a given animal, but also the genes of its offspring. The editing system, passed down genetically, bypasses the normal Mendelian mechanics of dominant and recessive traits by overwriting any other copy received from a normal donor parent with the gene passed down from the gene drive parent. In the experiments, the gene drive system resulted in passing down a recessive trait in fruit flies with around 97% efficiency.
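A back-of-the-envelope model shows how strongly this differs from ordinary inheritance. The sketch below is a deliberately simple, textbook-style recursion of my own, not a model from the paper: it assumes random mating, no selection at all, and a single parameter, conversion, for the fraction of carriers with one drive copy in which the other copy gets overwritten. Borrowing the 97% figure for that parameter is an assumption made only for illustration. With conversion set to 0, the model reduces to plain Mendelian inheritance and the allele frequency simply stays where it started.

```python
def next_frequency(q, conversion=0.0):
    """Drive-allele frequency after one generation of random mating.

    Carriers with one drive copy transmit it with probability (1 + conversion) / 2;
    conversion = 0 is ordinary Mendelian inheritance.
    """
    return q * q + q * (1 - q) * (1 + conversion)

def simulate(q0, generations, conversion=0.0):
    """Return the list of allele frequencies from generation 0 onward."""
    freqs = [q0]
    for _ in range(generations):
        freqs.append(next_frequency(freqs[-1], conversion))
    return freqs

if __name__ == "__main__":
    print(simulate(0.01, 10, conversion=0.0)[-1])   # Mendelian: frequency stays put
    print(simulate(0.01, 10, conversion=0.97)[-1])  # drive: sweeps through the population
```

In this toy model an allele that starts in 1% of the gene pool is nearly universal within about ten generations, even though it offers no advantage whatsoever to the animals carrying it.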

This system so completely bypasses any selection pressures that it could even result in preferentially passing down traits which directly lower the survival rate of the offspring. This system, which is entirely encapsulated within the molecular mechanics, could provide a means by which DNA-based information storage is passed down preferentially in animals completely regardless of natural selection processes.

The end result of storing information in this way would be that after millions of years there would be ultraconserved DNA which could be directly deleted with zero impact on the morphology or other traits of the animals, precisely as was seen in the ultraconserved region deletion studies mentioned above.


Epigenetics, Genetic Memory, and the Library of the Ancients

Epigenetics is another crucial component to this puzzle, the knowledge of which has advanced a great deal in the past decade. We now know that the events and influences within a single animal’s life can be passed on to their offspring. Epigenetics is not a process that alters the actual DNA code but works, instead, through regulation of gene expression much like cell differentiation does. The reason we can have all the various cells of different organs appear from the single stem cell type is because of gene expression.

As one might expect, the sections of non-coding DNA which act in a regulatory role and impact the coding DNA have been found to be a part of the mechanism of epigenetics (Cao, J. 2014). These epigenetic changes can also, however, alter the likelihood that a real genetic mutation might occur in the affected area (Skinner 2015). The more short-term mechanisms of epigenetic change, when transmitted over longer trans-generational time scales, may, therefore, lead to direct genetic change.

The idea of genetic memory has been around for a very long time as an anecdotal conjecture. The ability of savants to know things it seems they have never learned has been explained in such a way, as have past life regressions etc. While the idea that instincts must exist as instructions hidden in genes which regulate brain morphology is widely assumed, these general instructions have generally been postulated to have formed purely through selection pressures.

With the recent understanding of epigenetics allowing for the experiences of a single individual to become heritable, the possibility of true genetic memory has become a scientifically viable area of inquiry. As such, a study was performed on mice in which parental traumatic exposure to cherry blossom scent passed the fear conditioning down to future generations, such that even the grandchildren reacted very aversively toward the scent. Furthermore, a gene was identified with sensitivity to that scent, and the expression of that gene had indeed been altered. Concurrent changes in brain structure were also observed in offspring (Dias & Ressler 2014).


Now that genetically transmitted memory is a scientifically validated concept, it is obvious that there is a data storage mechanism in DNA which interacts with mental experiences and it seems no large presumption that the interaction could possibly be two-way in nature.

Might we then suppose that our non-coding DNA may be composed of both trans-generational memories and a storehouse of more directly inserted information from long ago? Might the directly inserted older data play a role as a guiding principle for trans-generational memory, and might those memories, built upon that original key, provide a sort of interpretational Rosetta stone for understanding that original information?

From a programming and engineering perspective, it makes sense to integrate the interpretational system in such a fashion.

One final thought

In 2011 another new method of identifying spoken language emerged from the application of information theory in a novel fashion. While the “Shannon Entropy” (information density) of various languages differed, one aspect remained almost exactly the same across all languages. Real communicative language carries information in its structure that a word-salad mix of the same words does not. When we take a text in any language and rearrange the words randomly, the amount of ordering information lost per word is nearly the same regardless of language (Montemurro & Zanette 2011).
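The shuffling idea can be tried at home with a very crude proxy. The sketch below uses an ordinary compression library as a stand-in for a proper entropy estimator, which is my own simplification and is not the method used in the study. It compresses a text, then compresses the same words in random order, and reports the difference in compressed bits per word. The more a text relies on word order, the larger that difference should be.

```python
import random
import zlib

def compressed_bits_per_word(words):
    """Rough information estimate: compressed size in bits divided by word count."""
    data = " ".join(words).encode("utf-8")
    return 8 * len(zlib.compress(data, 9)) / len(words)

def order_information(words, seed=0):
    """Extra bits per word needed once the word order is destroyed.

    The seed makes the shuffle repeatable; it is an arbitrary choice.
    """
    shuffled = list(words)
    random.Random(seed).shuffle(shuffled)
    return compressed_bits_per_word(shuffled) - compressed_bits_per_word(words)

if __name__ == "__main__":
    sample = ("the cat sat on the mat and the dog sat on the log " * 40).split()
    print(order_information(sample))
```

A general-purpose compressor gives only a rough upper bound on entropy, and a toy sample like this one exaggerates the effect, so the number printed here should not be compared with the values reported in the paper.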

My personal opinion is that if non-coding DNA were subjected to this test, it would probably fail because the language of memories is likely more symbolic than syntax based. A reliance upon word ordering is usually reserved only for spoken language and not general data storage. Therefore a failure would not, in any way, falsify the library hypothesis. However, if this test were to come back positive, not only would the library hypothesis be utterly validated, but it would also likely mean that all genetic memory is encoded directly into the language once spoken by the ancients who created this library.

While this reliance upon syntax in the storage of genetic memory seems unlikely, it is not completely unsupported by scientific discoveries. Research at Carnegie Mellon has resulted in the discovery that language is stored so similarly from one human to the next that the physical location in the brain of the data that represents nouns can be modeled and predicted by a computer. Tests using fMRI to read the blood flow and activation of these areas in the brain resulted in the ability to read a subject’s mind with between 70% and 94% accuracy when the subject focused upon a noun. The accuracy was so great that this computer-modeling-based “mind-reading” technique was able to distinguish between thoughts of extremely similar objects like “pliers” and “hammer” (Mitchell et al 2008).

While this is not a reliance upon word ordering, it shows a standardization of language storage from one human to the next. Therefore the concept of storing additional information by the relationships between words is not necessarily a large leap, but is a worthy area of future inquiry. It does, however, lend itself to a representational mechanism, based upon physical position, by which genetic memory may be encoded. Specifically, it provides a structural system by which one language may be translated into a different one.

References:
R. N. Mantegna, S. V. Buldyrev, A. L. Goldberger, S. Havlin, C.-K. Peng, M. Simons, and H. E. Stanley.
Linguistic Features of Non-coding DNA Sequences,
Phys. Rev. Lett. 73, 3169-3172 (1994)
Calin, George A. et al.
Ultraconserved Regions Encoding ncRNAs Are Altered in Human Leukemias and Carcinomas
Cancer Cell, Volume 12, Issue 3, 215–229, 2007
Ibarra-Laclette, E., Albert, V. A., Pérez-Torres, C. A., Zamudio-Hernández, F., Ortega-Estrada, M. de J., Herrera-Estrella, A., & Herrera-Estrella, L.
Transcriptomics and molecular evolutionary rate analysis of the bladderwort (Utricularia), a carnivorous plant with a minimal genome.
BMC Plant Biology (2011), 11, 101. http://doi.org/10.1186/1471-2229-11-101
Rands CM, Meader S, Ponting CP, Lunter G.
8.2% of the Human Genome Is Constrained: Variation in Rates of Turnover across Functional Element Classes in the Human Lineage.
Schierup MH, ed. PLoS Genetics. 2014;10(7):e1004525. doi:10.1371/journal.pgen.1004525.
Nobrega, M.A. et al. (2004).
Megabase deletions of gene deserts result in viable mice.
Nature 431: 988-993.
Ahituv, N. et al. (2007).
Deletion of ultraconserved elements yields viable mice.
PLoS Biology 5(9): e234.
Church, G. M., Gao, Y. & Kosuri, S.
Next-generation digital information storage in DNA.
Science 337, 1628–1628 (2012)
Shaji Varghese, Johannes A. A. W. Elemans, Alan E. Rowan, Roeland J. M. Nolte,
Molecular computing: paths to chemical Turing machines,
Chem. Sci., 2015, 6, 11, 6050
Grass, R. N., Heckel, R., Puddu, M., Paunescu, D. and Stark, W. J. (2015), Robust Chemical Preservation of Digital Information on DNA in Silica with Error-Correcting Codes.
Angew. Chem. Int. Ed., 54: 2552–2555. doi:10.1002/anie.201411378
Bijan Zakeri, Timothy K Lu,
DNA nanotechnology: new adventures for an old warhorse,
Current Opinion in Chemical Biology, 2015, 28, 9
Valentino M. Gantz, Ethan Bier
The mutagenic chain reaction: A method for converting heterozygous to homozygous mutations
Science 24 Apr 2015 : 442-444
Cao, J.
The functional role of long non-coding RNAs and epigenetics.
Biological Procedures Online (2014), 16, 11. http://doi.org/10.1186/1480-9222-16-11
Skinner MK
Environmental Epigenetics and a Unified Theory of the Molecular Aspects of Evolution: A Neo-Lamarckian Concept that Facilitates Neo-Darwinian Evolution.
Genome Biol Evol. 2015 Apr 26;7(5):1296-302. doi: 10.1093/gbe/evv073.
Brian G Dias & Kerry J Ressler
Parental olfactory experience influences behavior and neural structure in subsequent generations
Nature Neuroscience 17, 89–96 (2014) doi:10.1038/nn.3594
Marcelo A. Montemurro and Damián H. Zanette.
Universal Entropy of Word Ordering Across Linguistic Families.
PLoS ONE, Vol. 6, Issue 5, May 13, 2011. DOI: 10.1371/journal.pone.0019875.
Tom M. Mitchell, Svetlana V. Shinkareva, Andrew Carlson, Kai-Min Chang, Vicente L. Malave, Robert A. Mason, Marcel Adam Just
Predicting Human Brain Activity Associated with the Meanings of Nouns
Science, 30 MAY 2008 : 1191-1195
