Wednesday, December 15, 2010


Proteins are the main building blocks and functional molecules of the cell, taking up almost 20% of a eukaryotic cell’s weight, the largest contribution after water (70%). Among others, there are
  • Structural proteins, which can be thought of as the organism's basic building blocks. An example is collagen, which is the major structural protein of connective tissue and bone.
  • Enzymes, which perform (catalyse) a multitude of biochemical reactions, such as altering, joining together or chopping up other molecules. Together, these reactions and the pathways they make up are called metabolism. For example, the first step in the glycolysis pathway, the conversion of glucose to glucose 6-phosphate, is catalysed by the enzyme hexokinase. Enzymes are usually very specific and catalyse only a single type of reaction, but the same enzyme can play a role in more than one pathway.
  • Transmembrane proteins, which are key to maintaining the cellular environment: they regulate cell volume, extract and concentrate small molecules from the extracellular environment, and generate the ionic gradients essential for muscle and nerve cell function. An example is the sodium/potassium pump.

Proteins have a complex three-dimensional (3D) structure (see figure below). Four levels of protein structure are distinguishable:
  1. Proteins are chains built from 20 different types of amino acids, which in principle can be joined together in any linear order; such chains are sometimes called polypeptide chains. This sequence of amino acids is known as the primary structure, and it can be represented as a string of 20 different symbols (i.e., a word over an alphabet of 20 letters). Information about protein sequences and the functional roles of the respective proteins can be found in the UniProtKB/Swiss-Prot database, a joint project between the EBI and the Swiss Institute of Bioinformatics (SIB). The length of a protein molecule can vary from a few to many thousands of amino acids. For example, insulin is a small protein consisting of 51 amino acids, while titin has ~28,000 amino acids.
  2. Although the primary structure of a protein is linear, the molecule is not straight, and the sequence of the amino acids affects the folding. Two substructures are commonly seen within folded chains: alpha-helices and beta-strands. They are typically joined by less regular structures called loops. These three are called secondary structure elements.
  3. As a result of folding, parts of the protein chain come into contact with each other, and various attractive or repulsive forces between such parts (hydrogen bonds, disulfide bridges, attractions between positive and negative charges, and hydrophobic and hydrophilic forces) cause the molecule to adopt a fixed, relatively stable 3D structure. This is called tertiary structure. In many cases the 3D structure is quite compact.
  4. A protein may be formed from more than one chain of amino acids, in which case it is said to have quaternary structure. For example, haemoglobin is made up of four chains, each of which carries a haem group containing an iron atom.
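The idea of primary structure as a word over a 20-letter alphabet can be made concrete with a small sketch. The function names below are hypothetical and are not part of any database API. The two insulin chains are given in the usual one-letter code as commonly quoted for human insulin; treat them as illustrative and check them against the corresponding UniProtKB/Swiss-Prot entry before relying on them.

```python
from collections import Counter

# One-letter codes of the 20 standard amino acids.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def is_valid_primary_structure(sequence):
    """Return True if the string is non-empty and uses only the 20 standard letters."""
    sequence = sequence.upper()
    return len(sequence) > 0 and all(residue in AMINO_ACIDS for residue in sequence)


def composition(sequence):
    """Count how many times each amino acid occurs in the sequence."""
    return Counter(sequence.upper())


# Assumption: A and B chains of mature human insulin, quoted for illustration.
INSULIN_A = "GIVEQCCTSICSLYQLENYCN"
INSULIN_B = "FVNQHLCGSHLVEALYLVCGERGFFYTPKT"

if __name__ == "__main__":
    insulin = INSULIN_A + INSULIN_B
    print("valid sequence:", is_valid_primary_structure(insulin))
    print("length:", len(insulin))
    print("most common residues:", composition(insulin).most_common(3))
```

Treating the sequence as an ordinary string is what makes standard text-processing techniques (searching, counting, alignment) applicable to proteins.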
Proteins are much too small to be seen in an optical microscope: a characteristic protein size varies from about 3 to 10 nanometers (nm), i.e., 3 x 10^-9 to 10 x 10^-9 m. Solving (i.e., discovering) their structure is a difficult and expensive exercise (approximately €50,000 - €200,000 per novel structure), which is done by a variety of methods including X-ray crystallography, nuclear magnetic resonance spectroscopy, and advanced electron microscopy. PDBe is a database of known protein structures, which is housed and developed at the EBI. The images below show the structure of triosephosphate isomerase visualised with the RasMol software package, a 3D viewer for PDBe structures.
In this image the magenta coloured bits are alpha-helices, while yellow bits are beta-strands.
An alternative view in which the two monomer units are highlighted. The size of this protein in a crystallised state is about 13 x 7 x 5 nm. The images above are only models of these molecules, as the molecules are too small to have a ‘real’ image. For instance, they cannot have any conventional colour, they are in constant motion, and when we start zooming in on finer structure, quantum effects such as the Heisenberg uncertainty principle start playing a role.
There are roughly 15,000 protein structures deposited in public databases, though many of them are very similar to each other. Whether to consider two protein structures similar or different depends on the similarity threshold (as with cell types). Structural biologists think that currently there are about 1,500 different representative protein structures known.
All four structural levels are essentially determined by the primary structure (i.e., the amino-acid sequence) plus the physico-chemical environment in which the molecule is placed. Predicting protein structure from the amino-acid sequence is one of the most important problems of computational biology (another name for bioinformatics, though some try to make a distinction between these two terms) and is far from being solved. Characteristic, frequently recurring structural elements are called protein domains. Sometimes it is possible to identify these domains in proteins of unknown structure, if their sequence is similar to that of a known structural domain. Structural domains are often associated with a particular protein function. Protein similarity is also deemed to be the result of an evolutionary relationship.
What are the comparative sizes of proteins and cells? There is a proverb saying that size does not matter. Still, comparative sizes may matter, particularly if we try to imagine the cellular processes described in the next sections. A typical linear dimension (diameter) of a globular protein is about 5 x 10^-9 m, while that of a eukaryotic cell is about 5 x 10^-5 m. This means that a cell is about 10,000 times larger than a protein linearly. Alternatively, if we estimate the average weight of a human cell as about 10^-9 g, and remember that proteins constitute about one fifth of cell mass, then assuming the weight of an average protein to be about 10^-19 g (say, haemoglobin is 64,500 atomic mass units, each of which is 1.66 x 10^-24 g), we see that there are 0.2 x 10^-9 / 10^-19 proteins per cell, which equals two billion (2 x 10^9). These are of course very rough estimates, which would vary from cell to cell. If we remember that there are about 6 x 10^13 cells in a human, we see that there are 30,000 times more cells per human than proteins per cell. This may be an indication of the relative complexity of a human compared to a single-celled organism (a similar estimate regarding the relative complexity of an elephant or dinosaur and a human may not be flattering for a human).
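The back-of-the-envelope arithmetic above can be reproduced in a few lines. This is only a sketch of the same rough estimate: all input figures are the ones quoted in the text, and the variable names are chosen here for illustration.

```python
# Figures quoted in the text; all are order-of-magnitude estimates.
PROTEIN_DIAMETER_M = 5e-9        # typical globular protein
CELL_DIAMETER_M = 5e-5           # typical eukaryotic cell
CELL_MASS_G = 1e-9               # average human cell
PROTEIN_MASS_FRACTION = 0.2      # proteins are about one fifth of cell mass
ATOMIC_MASS_UNIT_G = 1.66e-24    # grams per atomic mass unit
HAEMOGLOBIN_MASS_AMU = 64_500    # used as a stand-in for an "average" protein
CELLS_PER_HUMAN = 6e13

# How many times larger a cell is than a protein, linearly.
linear_ratio = CELL_DIAMETER_M / PROTEIN_DIAMETER_M

# Mass of one haemoglobin molecule in grams; the text rounds this to 1e-19 g.
haemoglobin_mass_g = HAEMOGLOBIN_MASS_AMU * ATOMIC_MASS_UNIT_G
average_protein_mass_g = 1e-19

# Number of protein molecules per cell and cells per human relative to that.
proteins_per_cell = PROTEIN_MASS_FRACTION * CELL_MASS_G / average_protein_mass_g
cells_to_proteins_ratio = CELLS_PER_HUMAN / proteins_per_cell

if __name__ == "__main__":
    print(f"cell / protein linear size: {linear_ratio:.0e}")
    print(f"haemoglobin mass: {haemoglobin_mass_g:.2e} g")
    print(f"proteins per cell: {proteins_per_cell:.0e}")
    print(f"cells per human / proteins per cell: {cells_to_proteins_ratio:.0e}")
```

Changing any single input by a factor of two or three shifts the results by the same factor, which is why these numbers should be read as orders of magnitude only.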
Although forces such as hydrogen bonds are weak individually, when two or more biological macromolecules with complementary shapes come close to each other, the sum of all such weak forces may cause the molecules to interact rather strongly, e.g., to stick together. In fact, such weak inter-molecular forces and interactions play a fundamental role in life and are at the basis of virtually all biological processes. For instance, many proteins can stick together to form large protein complexes such as yeast RNA polymerase II, which reads and transcribes the genetic information (see Section 3.3). It has 10 subunits, and its structure has been solved recently. These weak interactions also underlie how microarrays work, which is discussed in the last section.