Friday, 27 March 2009

Jingo jargon

BELOW you will notice a short scientific highlight, entitled 'When autumn falls', published in this month's issue of Nature Reviews Molecular Cell Biology. I include it for one very simple reason: I wrote it. But in preparing it for this site, I realized just how much prior understanding the piece assumes, and so offer this as a short lesson in the jargon of science. What is with all those italics? What is a transcript, and what is a gene? What the devil is a microRNA?

Much like starch (that wonderful carbohydrate in our staple foodstuff, the potato), which is a string of individual glucose sugars (a polymer of glucose monomers), DNA is a string of individual nucleotides (the monomers of a nucleic acid). Each nucleotide is a little more complicated than a glucose molecule - comprising a sugar, a phosphate and a nitrogenous 'base'. Cleverly, there are two strings wound around each other (the famed 'double helix'), which provides a cunning mechanism for the DNA to copy itself - you see, there are four common types of nucleotide, identical except for the nitrogenous bases. The four different bases are called adenine, cytosine, guanine and thymine, or, for simplicity, A, C, G and T. Owing to their chemical structures, the bases 'pair' up across the helix, with A aligning with T, and C with G:

Strand 1 Strand 2
| |
A = T
C = G
| |
(diagram for illustrative purposes only: at ease, scientific pedants)

So, all you need to copy a strand of DNA is a plentiful supply of free nucleotide monomers and a way to unzip the double strand. Once unwound, the exposed single strand becomes a template: A binds to T, and C binds to G in the order of the complete strand, and then some enzymes come in, work their magic, and seal the monomers into a second strand.
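For the programmers in the audience, the pairing rule can be sketched in a few lines of Python. This is a toy model under the same caveat as the diagram: the sequence and the function name `partner_strand` are made up for illustration, and it ignores the fact that the two real strands run in opposite directions.

```python
# Pairing rules from the text: A pairs with T, and C pairs with G.
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}


def partner_strand(template: str) -> str:
    """Return the bases that line up opposite each base of the template."""
    return "".join(PAIRS[base] for base in template)


strand_1 = "ACGGTA"  # hypothetical sequence, for illustration only
strand_2 = partner_strand(strand_1)

print(strand_2)                              # TGCCAT
print(partner_strand(strand_2) == strand_1)  # True: copying the copy gives back the original
```

The second line of output is the point: because each base has exactly one partner, either strand carries enough information to rebuild the other.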

But, we hear that DNA is the 'information' in a cell. Packaged in the nucleus of every cell in the body (well, almost), DNA is the source of those elusive entities: genes. But how can a string of just four types of base be responsible for building you, hedgehogs and eyes of newt?

Contrary to common understanding, proteins are much more than a dietary requirement for making muscles. They are an enormously complicated family of molecules that do everything that makes you you - they are the enzymes that break down your food, they help to make hormones, they assist the division of your cells and, yes, they are structural components too.

DNA makes proteins. Specifically, it encodes them. First, a messenger molecule is made in much the same way as DNA is replicated: the double strand unzips and free nucleotides align with their counterparts to form a copy. The nucleotides in this case are RNA, not DNA, monomers. These differ chemically, and are more transient, but work in the same way - although RNA does not have thymine (it uses uracil (U) instead). This messenger (a transcript) is called a messenger RNA, or mRNA, and is single stranded. Second, the mRNA leaves the nucleus (as DNA cannot) and feeds into a complex structure called a ribosome, which is, to all intents and purposes, a protein factory. And here's the clever bit:

Proteins are built from 20 or so types of amino acid. To turn a genetic code of 4 letters into a 20-letter amino acid alphabet, the mRNA molecule is 'read' by the ribosome three letters at a time. So, for example, GGU equates to a glycine amino acid. Each amino acid attaches to another specific RNA molecule, a transfer RNA (tRNA), which exposes three specific RNA bases to pair up with the mRNA in the ribosome, thereby aligning the amino acids in the correct order, as specified by the original DNA in the nucleus.
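Why three letters and not two? A rough Python sketch makes the arithmetic concrete and then reads a message three letters at a time. The codon table here is a deliberately tiny slice of the real one (six genuine entries out of 64), and the mRNA sequence and the names `possible_words` and `translate` are invented for illustration.

```python
BASES = "ACGU"  # the four RNA letters


def possible_words(letters_per_word: int) -> int:
    """Count the distinct 'words' of a given length in a 4-letter alphabet."""
    return len(BASES) ** letters_per_word


for n in (1, 2, 3):
    print(n, possible_words(n))  # 1 4, then 2 16, then 3 64

# A small subset of the standard genetic code; note two codons for glycine.
CODON_TABLE = {
    "AUG": "Met",
    "GGU": "Gly",
    "GGC": "Gly",
    "GCU": "Ala",
    "UUU": "Phe",
    "AAA": "Lys",
}


def translate(mrna: str) -> list[str]:
    """Read the mRNA three letters at a time and look up each codon."""
    codons = [mrna[i:i + 3] for i in range(0, len(mrna) - 2, 3)]
    return [CODON_TABLE[codon] for codon in codons]


print(translate("AUGGGUGGCAAA"))  # ['Met', 'Gly', 'Gly', 'Lys']
```

Two letters would give only 16 words, too few for 20 amino acids; three give 64, which is more than enough, and the surplus is why more than one codon (GGU and GGC here) can mean the same amino acid.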

These are the fundamentals of genetics: DNA encodes mRNA, which leaves the nucleus and uses ribosomal wizardry to pair up with amino acid-carrying tRNAs, allowing the ribosome to combine the amino acids in the order assembled, thereby making a protein. A gene? Well, this is the stretch of DNA of the correct length to make a specific protein*.
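The first link in that chain, DNA encoding mRNA, can be sketched in Python too. As before, this is a simplification under stated assumptions: the six-letter template is hypothetical, `transcribe` is just a name chosen for the example, and strand direction is ignored.

```python
# RNA partners for each DNA base: as in DNA pairing, except that
# A on the template is matched by uracil (U) rather than thymine (T).
DNA_TO_RNA = {"A": "U", "T": "A", "C": "G", "G": "C"}


def transcribe(template: str) -> str:
    """Build the mRNA that pairs, base by base, with a DNA template strand."""
    return "".join(DNA_TO_RNA[base] for base in template)


template = "TACCCA"  # hypothetical stretch of template DNA
print(transcribe(template))  # AUGGGU
```

The last three letters of the output, GGU, form the glycine codon mentioned above, so this scrap of DNA ultimately specifies a glycine.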

In scientific literature, gene symbols are set in italics and protein names in roman. The use of capitals depends on the species involved.

In my article, I talk about Arabidopsis thaliana, a plant species from the brassica family that is used as a model organism - that is, it is the standard and best-characterized species for study. The authors uncovered a genetic loop of two genes - EIN2 and ORE1 (note italics) - and a microRNA called miR164. A microRNA is encoded by DNA but is never translated into a protein; instead, in RNA form, it suppresses the mRNA of other genes. How this works is not known with any certainty. It follows, based on the nomenclature stated above, that miR164 is the actual microRNA, and MIR164 is the gene that encodes it. Lastly, the article mentions ore1 and ein2 mutants - these are plants in which ORE1 and EIN2 have been altered in some way to function differently or not at all (the lower case is a specific style for Arabidopsis).

With all this in mind, hopefully my highlight now makes some kind of sense!

*Definitions vary.
