Wednesday, December 15, 2010

Genes and protein synthesis

Biologists continue to debate what a comprehensive definition of a gene should be, and finding one is not easy, if it is possible at all. For our purposes:
 
A gene is a continuous stretch of a genomic DNA molecule from which a complex molecular machinery can read information (encoded as a string of A, T, G, and C) and make a particular type of protein or a few different proteins.

This “definition” is not precise, and to better understand it we need to describe the molecular machinery making proteins based on the information encoded in genes. This process is called protein synthesis and has three essential stages: (1) transcription, (2) splicing, and (3) translation.
1. In the transcription phase, one strand of the DNA molecule is copied into a complementary pre-mRNA ("pre" stands for preliminary and "m" for messenger) by the protein complex RNA polymerase II (see sections 2.2 and 2.4). In the process, the two-stranded DNA double helix is unwound and information is read from only one strand (the template strand, sometimes called the W-strand).
2. Splicing removes some stretches of the pre-mRNA, called introns; the remaining sections, called exons, are then joined together. Note that the removal of introns is a consequence of the way eukaryote genomes are organised. The genomic DNA that corresponds to the coding part of a gene is not continuous, but consists of exons and introns. Exons are the parts of the gene that code for proteins, and they are interspersed with non-coding introns, which must be removed by splicing. The number and size of introns and exons differ considerably between genes and also between species. Only very few genes in yeast have introns, while in human there are about 4 introns per gene on average, and the average size is 150 bp for exons and just above 3400 bp for introns. Prokaryote genes generally do not have introns, so the splicing step is not present. The result of splicing is mRNA. Many eukaryote genes are known to have alternative splice variants, i.e. the same pre-mRNA can produce different mRNAs; this is known as alternative splicing.

(picture taken from On-Line Biology Book)
3. Translation is the process of making proteins by joining together amino acids in the order encoded in the mRNA. The order of the amino acids is determined by groups of three adjacent nucleotides (triplets) in the DNA. This is known as the triplet or genetic code. Each triplet is called a codon and codes for one amino acid, apart from three stop codons, which signal the end of the protein. As there are 64 codons and only 20 amino acids, the code is redundant; for example, histidine is encoded by both CAT and CAC. In the cytoplasm, the mRNA forms a complex with ribosomes, which are large complexes of proteins and RNA molecules. The precise interactions and functions of all the proteins in ribosomes are not yet fully understood.

(picture taken from On-Line Biology Book)
Different transfer RNA (tRNA) molecules each carry one specific amino acid to the ribosome, and each specifically recognises one codon on the mRNA. The amino acid carried by the tRNA is added to the nascent (growing) protein. Translation is a complex process and not all the details are understood. Luckily, most of these details are not crucial for understanding bioinformatics. What is crucial, however, is to realise that there is nothing magical about protein synthesis.
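
To make the point that translation is essentially a table lookup, here is a minimal Python sketch of the standard genetic code. It writes codons in the DNA alphabet, as the text does (CAT, CAC). The names CODON_TABLE, codons_for and translate are illustrative choices, not taken from any library, and the sketch assumes that translation starts at the first AUG and that the input contains only A, C, G and T/U.

```python
from itertools import product

# Standard genetic code written as DNA codons.
# The first, second and third base each cycle through T, C, A, G;
# "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def codons_for(amino_acid):
    """Return every codon that encodes the given one-letter amino acid."""
    return sorted(c for c, aa in CODON_TABLE.items() if aa == amino_acid)


def translate(mrna):
    """Translate an mRNA (written 5'->3') into one-letter amino acids.

    Assumption: translation starts at the first AUG and ends at the
    first stop codon in that reading frame.
    """
    dna = mrna.upper().replace("U", "T")
    start = dna.find("ATG")
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


print(len(CODON_TABLE))               # number of codons
print(codons_for("H"))                # the codons for histidine
print(translate("GGAUGCAUCACUAAGG"))  # toy mRNA
```

Here codons_for("H") returns CAC and CAT, the two histidine codons mentioned above, and the toy mRNA translates to methionine followed by two histidines.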
 
The end of translation is the final part of gene expression, and the final product is a protein whose sequence corresponds to the sequence encoded by the mRNA. Proteins can be post-translationally modified, e.g. by the addition of sugars or by cleavage (chopping), and this affects their location and function.
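
The first two stages described earlier, transcription and splicing, can be sketched in the same spirit. This is a toy illustration under stated assumptions, not a model of the real machinery: the template strand is assumed to be written 3'->5', the exon coordinates are made up for the example, and the function names transcribe and splice are hypothetical.

```python
# Base pairing from a DNA template base to the RNA base copied from it.
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe(template_strand):
    """Return the pre-mRNA (5'->3') copied from a DNA template strand.

    Assumption: the template is written 3'->5', so pairing it base by
    base already yields the RNA in its usual 5'->3' orientation.
    """
    return "".join(DNA_TO_RNA[base] for base in template_strand.upper())


def splice(pre_mrna, exons):
    """Join the exons, given as 0-based half-open (start, end) pairs."""
    return "".join(pre_mrna[start:end] for start, end in exons)


# Toy gene: exon 1, one intron, exon 2 (coordinates are hypothetical).
template = "TACGTACATTTCGTGATT"
exons = [(0, 6), (12, 18)]

pre_mrna = transcribe(template)
mrna = splice(pre_mrna, exons)

print(pre_mrna)  # AUGCAUGUAAAGCACUAA
print(mrna)      # AUGCAUCACUAA
```

The six bases between the two exons (GUAAAG) play the role of the intron and are absent from the final mRNA.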
Biologists used to believe in the paradigm 'one gene, one protein'. This is now known not to be true: due to alternative splicing and post-translational modifications, one gene can produce a variety of proteins. There are also genes that do not encode proteins but encode RNA (for instance tRNA and ribosomal RNA).
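
As a rough illustration of how one gene can give rise to several products, the sketch below enumerates the mRNAs obtained from a single pre-mRNA by skipping internal exons. It models only this one simple form of alternative splicing, under the assumptions that the first and last exons are always kept and that exon order is preserved. The function name splice_variants and the exon coordinates are hypothetical.

```python
from itertools import combinations


def splice_variants(pre_mrna, exons):
    """Enumerate mRNAs produced by skipping internal exons.

    Exons are 0-based half-open (start, end) pairs; at least two are
    required. The first and last exon are always kept (an assumption
    of this sketch), and exon order is preserved.
    """
    first, *internal, last = exons
    variants = []
    # Start with all internal exons kept, then skip more and more.
    for k in range(len(internal), -1, -1):
        for kept in combinations(internal, k):
            chosen = [first, *kept, last]
            variants.append("".join(pre_mrna[s:e] for s, e in chosen))
    return variants


# Toy pre-mRNA: exon 1, intron, exon 2, intron, exon 3.
pre_mrna = "AUGGCUGUAAAGCAUCACGUCCAGGGAUAA"
exons = [(0, 6), (12, 18), (24, 30)]

for variant in splice_variants(pre_mrna, exons):
    print(variant)
```

With three exons this gives two mRNAs from the same pre-mRNA: one containing all three exons and one in which the middle exon is skipped, so the encoded protein loses the two histidines carried by that exon.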