What is a Gene?
A geek’s view of this nerdy subject
This is an essay on what a gene is, written from a computer geek’s perspective. However, one does not need to know Unix or computers to understand it. The goal is to use the comparison with computers to offer a different angle on the subject. By necessity, it is a comparison and thus cannot be exact. At times the comparison may be a bit of a stretch, but please read it for the ideas and concepts being presented rather than objecting, as a purist, that things are not exactly as stated. One last word about terminology: in this essay, the words organism, host, and computer are used loosely to mean the same thing, a single entity in the biological or electronic world.
What is a gene?
Biologically, a gene can roughly be defined as a stretch of DNA that encodes the sequence of a protein. In the comparison, the source code of the programs that run on a computer plays the role of the gene. A large number of programs are put together to make a computer functional: for example, the command “ls” that lists files on the file system, “ps” that displays running processes, and so on. Similarly, a large number of genes make up an organism: the gene for hemoglobin to carry oxygen in the blood, the gene for actin, which forms the filaments found in muscle, and so on.
But ask any programmer, and s/he will explain that source code is not what runs on a computer. The human-readable code must first be compiled (translated) into machine-readable binary code before it can run. A similar conversion takes place in a biological system. DNA is the source code for a protein (the program); it must be transcribed into RNA and then translated into protein before it becomes functional. Genes are blueprints; proteins are the executing agents that produce traits, the phenotypes we can observe. To better understand genes, a few of their “behaviors” are explained below.
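To make the “compilation” step concrete, here is a minimal Python sketch of the two-stage process, assuming a DNA coding strand as input: transcription (here simply replacing T with U) followed by translation with the standard genetic code. The function names and the toy sequence are hypothetical, and the sketch ignores real-world details such as promoters, introns, and any reading frame other than the one starting at the first AUG.

```python
from itertools import product

# Standard genetic code, listed in UCAG order for the 1st, 2nd, and 3rd
# codon positions ("*" marks a stop codon).
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(coding_strand: str) -> str:
    """Copy a DNA coding strand into mRNA (a working copy of the 'source')."""
    return coding_strand.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """'Compile' mRNA into a protein: start at the first AUG, read codons
    three bases at a time, and stop at a stop codon or the end of the message."""
    start = mrna.find("AUG")
    if start == -1:
        return ""  # no start codon, so nothing is produced
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


if __name__ == "__main__":
    gene = "ATGGCCTTTTAA"  # hypothetical toy gene
    print(translate(transcribe(gene)))  # MAF
```

In this picture, transcription is closer to making a working copy of the source, while translation is where the “language” actually changes, from nucleotides to amino acids.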
Alternative splicing: Static code, different end product
Some programs are rather simple, and their source code compiles into one static binary. Larger programs, however, often contain conditional compilation directives (such as #ifdef in C) that check for conditions and produce different binaries tailored to the environment in which the program will run. For example, a development build may add a lot of debugging code, while a production build may add a variety of optimizations. The compiler may also emit CPU-specific instructions, such as the MMX multimedia extensions, that only certain processors support. In this way, the compilation of source code can be optimized for its intended environment by “splicing in” the appropriate code. This kind of conditional selection also exists in the biological world and is actually widespread. A gene in higher organisms (eukaryotes) can be spliced in several alternative ways, producing proteins with different behavior. Even HIV, which relies on its host’s splicing machinery, splices its genetic message into some 40 different variants, allowing its very small genome to produce a variety of proteins as circumstances require.
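The compile-time selection of code described above has a simple computational analogue in one form of alternative splicing, exon skipping. The minimal Python sketch below assumes a gene already cut into exon strings (introns removed) and enumerates every message that keeps the first and last exons plus any ordered subset of the internal ones. The exon sequences and the function name are hypothetical. Real cells also use other patterns (alternative splice sites, retained introns) and regulate which variants are actually made, so this only counts the possibilities.

```python
from itertools import combinations


def splice_variants(exons: list[str]) -> list[str]:
    """Enumerate exon-skipping variants of a gene.

    The first and last exons are always kept; each internal exon may be
    included or skipped, and kept exons stay in their original order.
    """
    if len(exons) < 2:
        return ["".join(exons)]
    first, *middle, last = exons
    variants = []
    for k in range(len(middle) + 1):
        for kept in combinations(middle, k):  # preserves original order
            variants.append(first + "".join(kept) + last)
    return variants


if __name__ == "__main__":
    # Hypothetical exons of a toy gene, already in mRNA letters.
    exons = ["AUGGCC", "UUU", "GGA", "UAA"]
    for mrna in splice_variants(exons):
        print(mrna)
```

With n internal exons there are 2^n possible variants, which is how a handful of exons can, in principle, yield many distinct messages from one stretch of “source code”.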
Homologous genes: code variations adapted to different hosts and environments
The source code of the program “ps”, which displays the running processes on a computer, is largely the same across the many variants of Unix, with slight differences that account for the specific system the computer is running. In the biological world, genes show such slight variation as homologs: the “source code” of these genes varies slightly, thereby “compiling” to slightly different proteins. One example is the family of oxygen-carrying proteins called hemoglobin. Hemoglobin is very similar across humans, chimpanzees, dogs, and many other mammals, so similar that one can consider it the same program adapted to run on different hosts while carrying out the same function.
Homologous genes can also be found within the same host. More than one protein may do essentially the same thing but with slightly different characteristics that make each optimal for the task at hand. In humans, hemoglobin comes in three forms: adult, fetal, and embryonic. The “source code” of the corresponding genes varies slightly, thereby “compiling” to slightly different proteins with slightly different oxygen affinities. Adults breathe in air, and oxygen enters the blood through the lungs. The fetus gets its oxygen from the mother’s blood through the placenta. The embryo is almost like an organ in overdrive, without its own network of blood capillaries. The task is the same, but each stage needs a slightly different upload/download mechanism, which the homologous proteins provide. This situation has its parallel in the Unix world as well: several programs can report on the processes running on a host, namely BSD-style ps, System V-style ps, and pstree. All of them can run on the same host and provide essentially the same information, but in slightly different forms, each particularly suited to a different task.
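One way to quantify how far two homologous “source files” have drifted apart is an edit distance, an idea closely related to what tools like diff do. The Python sketch below computes the Levenshtein distance (the minimum number of single-letter substitutions, insertions, and deletions) between two sequences. It is a deliberate simplification: real homology searches use alignment algorithms such as Needleman–Wunsch or Smith–Waterman with substitution matrices like BLOSUM62. The sequences in the example are hypothetical toy strings, not real hemoglobin data.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two sequences, computed row by row
    with dynamic programming (O(len(a) * len(b)) time, O(len(b)) memory)."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute (or match)
            ))
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Fraction of the longer sequence left unchanged (1.0 means identical)."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1 - edit_distance(a, b) / longest


if __name__ == "__main__":
    host_1 = "ACDEFGHIKLMN"  # hypothetical protein fragment
    host_2 = "ACDEYGHLKLMN"  # hypothetical homolog on another "host"
    print(edit_distance(host_1, host_2), round(similarity(host_1, host_2), 2))
```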
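The difference in oxygen affinity can be made concrete with the Hill equation, a classic but simplified model of hemoglobin’s oxygen-binding curve: S = pO2^n / (P50^n + pO2^n), where S is the fraction of hemoglobin carrying oxygen, pO2 is the oxygen partial pressure, P50 is the pressure at which half the hemoglobin is loaded, and n is the Hill coefficient. The Python sketch below uses approximate textbook values (P50 of roughly 27 mmHg for adult and roughly 19 mmHg for fetal hemoglobin, n of about 2.7); treat them as illustrative assumptions rather than precise physiological constants. Embryonic hemoglobin is left out.

```python
# Approximate, illustrative parameters (assumptions, not measured data).
ADULT_P50_MMHG = 26.8   # adult hemoglobin (HbA)
FETAL_P50_MMHG = 19.0   # fetal hemoglobin (HbF): higher affinity, lower P50
HILL_COEFFICIENT = 2.7


def saturation(po2_mmhg: float, p50_mmhg: float, n: float = HILL_COEFFICIENT) -> float:
    """Fraction of hemoglobin loaded with oxygen, per the Hill equation."""
    if po2_mmhg <= 0:
        return 0.0
    ratio = (po2_mmhg / p50_mmhg) ** n
    return ratio / (1 + ratio)  # equivalent to pO2^n / (P50^n + pO2^n)


if __name__ == "__main__":
    for po2 in (20, 30, 40, 100):
        adult = saturation(po2, ADULT_P50_MMHG)
        fetal = saturation(po2, FETAL_P50_MMHG)
        print(f"pO2={po2:>3} mmHg  adult={adult:.2f}  fetal={fetal:.2f}")
```

With the same Hill coefficient, the lower-P50 fetal curve sits above the adult one at every oxygen pressure, which is what lets fetal blood pull oxygen away from maternal blood across the placenta.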
What exactly constitutes a gene, really?
If one asks a software engineer what a program is, s/he would give a simple, pragmatic answer: the program is whatever binary the compilation of the source code produces. By analogy, does that mean every DNA sequence makes up a gene? And if one presses the engineer a bit more: do comments in the source code form part of the program? Is code that goes into a library, but not into the executable binary, a program? Would it be part of the parent program or a separate program of its own? What about source code that resides in a location that, because of the way it is written, can never be reached in the executable? The parallels to these questions in the biological world are, respectively: Are introns part of a gene? Is a specific exon that makes a specific protein motif a gene? Are pseudogenes, which no longer produce a functional product, still genes? Applying the software engineer’s pragmatic approach, where a program is the executable that users interact with (i.e., it has visible phenotypes), the answers would be no, no, and maybe.
Given all the nuances described above, it would be a major challenge to come up with a definition of a gene that encompasses everything a gene is while excluding everything it isn’t. Stephen Jay Gould, of punctuated equilibrium fame, once wrote in his Natural History book that “Science is a method for testing claims about the natural world, not an immutable compendium of absolute truth.” With that in mind, the reader should understand that the exact definition of a gene doesn’t matter. We are trying to describe billions of years of evolution and complex biochemical processes in a paragraph or two. By necessity, we can only arrive at a working definition that conveys the spirit of the idea, not the letter. We cannot expect biology to work like mathematics, where definitions can be laid down as law and followed strictly.
Genes are the smallest biological unit that evolution selects for
Lack of an exact definition aside, there is one principle of “what a gene is” that can guide biological research: the idea Richard Dawkins proposed in his book The Selfish Gene. The DNA that makes up the gene, with all the nuances described above, is what evolution selects for. It is here that the comparison with computer programs will, hopefully, be most fruitful.
A computer needs a large repertoire of programs to be useful, and a computer that isn’t useful may as well be considered dead. A large set of “genes” must therefore work together to keep the “host” alive. Any gene that does not play by the rules and causes things to break could cause the host to die, and the gene/program dies with it. But if the computer is like a bacterium and can undergo “conjugation” when under stress, it may be able to pick up new (DNA) code, compile it, and express its phenotype, restoring functioning order and allowing the host to survive. A bad gene/program may cause the whole host to be selected against by evolution, but the right gene may single-handedly restore its fitness. The host provides a tidy environment where the genes/programs come together and make a pact to work together, thereby ensuring their mutual survival.
Biological viruses are not considered “alive” under the usual biological definition of life, but each virus packages a small set of genes that natural selection can act on. While today’s computer viruses are harmful, it is not inconceivable that they could evolve into self-replicating, self-propagating vectors that deliver new code with a beneficial function to new hosts. In any case, viruses are very prolific in both the biological and the computer worlds.
Many biologists argue that evolution acts on organisms, and that genes are just components that make up an organism and thus cannot be acted on by evolution. But an organism is just a collection of proteins made to work in harmony, and these proteins are coded by genes. When an organism dies, perhaps because it was not biologically fit, it takes all its proteins and genes with it. In reality, though, its genes have siblings in other individuals of the same species and homologs in other species. Those genes could very well be producing many good proteins and helping many other individuals survive.
When a computer meets its fate, the genes/programs it carries may not die with it. They may be transferred to a new host and start “life” anew there. Not every program from the old computer gets propagated, however; only the most desirable genes are picked.
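The gene’s-eye view of the last few paragraphs can be made a little more quantitative with the textbook one-locus haploid selection model. Assume, for illustration only, an idealized asexual population in which carriers of a gene leave 1 + s offspring for every 1 left by non-carriers. Individual hosts still live and die, yet the gene’s frequency p changes each generation as p′ = p(1 + s) / (1 + ps). The Python sketch below iterates that recursion; the starting frequency, selection coefficient, and function names are hypothetical choices.

```python
def next_frequency(p: float, s: float) -> float:
    """One generation of haploid selection (assumes s > -1).

    Carriers of the gene have relative fitness 1 + s and non-carriers 1, so the
    population's mean fitness is p * (1 + s) + (1 - p) = 1 + p * s.
    """
    if s <= -1:
        raise ValueError("s must be greater than -1")
    return p * (1 + s) / (1 + p * s)


def trajectory(p0: float, s: float, generations: int) -> list[float]:
    """Gene frequency in every generation, starting from p0."""
    freqs = [p0]
    for _ in range(generations):
        freqs.append(next_frequency(freqs[-1], s))
    return freqs


if __name__ == "__main__":
    # A rare gene (1% of hosts) with a 5% advantage, followed for 200 generations.
    freqs = trajectory(p0=0.01, s=0.05, generations=200)
    for t in (0, 50, 100, 150, 200):
        print(f"generation {t:>3}: p = {freqs[t]:.3f}")
```

Because the odds p / (1 − p) simply grow by a factor of 1 + s each generation, even a small advantage carries a rare gene to high frequency, while the individual hosts that carried it come and go.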
Genes are specific segments of DNA that encode a protein (or another biological entity that can affect the phenotype of an organism). While the products genes encode may seem rather inanimate compared to an organism, it is genes that natural selection acts on, pressuring them to evolve and adapt. The proof of this is left as an exercise for the reader :)
Image credits, in order of presentation:
Fluorescent dyed chromosome: https://unsplash.com?utm_source=medium&utm_medium=referral
Gene Alternate Splicing: By Smedlib — Own work, CC BY-SA 4.0, https://commons.wikimedia.org/w/index.php?curid=60328096
Histone H1 sequence: By Thomas Shafee — Own work, CC BY 4.0, https://commons.wikimedia.org/w/index.php?curid=37188728