Evolution of the first proteins

Sequencing of nearly two hundred genomes has made it possible to extrapolate back to the Last Universal Common Ancestor (LUCA) of all present-day life and to reconstruct its "Hadean Park" proteins. These proteins provide hints that:
 * The origin of life occurred in a hydrothermal vent
 * Amino acids were originally acquired from the prebiotic environment
 * DNA was invented not only after RNA and proteins, but also after discrete cells

Di Giulio has attempted to estimate the preferred temperature of the ancestors of the three domains, Eubacteria, Archaebacteria, and Eukarya, in order to infer that of the LUCA. Because the amino-acid compositions of proteins differ among organisms with different temperature preferences, one can calculate a "thermophily index" from a protein's composition. Calculating the thermophily indices of reconstructed ancestral proteins, he found the following (the remaining kind of temperature preference, for cold, is that of a psychrophile):
 * Eubacteria: (hyper)thermophile: 40-100+ C
 * Archaebacteria: (hyper)thermophile: 40-100+ C
 * Eukarya: mesophile: 10-40 C

This suggests that the LUCA was a (hyper)thermophile, which is consistent with hydrothermal-vent-based scenarios for the origin of life. The common ancestor of present-day Eukarya is likely to be much more recent, which would explain its lower temperature preference.

Parallel to this, Brooks et al. have compared 65 Conserved Orthologous Domains of various proteins whose phylogenies have approximately followed the "average" one, in order to avoid lateral-gene-transfer effects, and used that "average" to reconstruct their ancestors. These ancestors tended to be enriched in amino acids easily produced in prebiotic-chemistry experiments, and depleted in those relatively difficult to produce. Their list of easy-to-produce biological amino acids:
 * Alanine
 * Aspartate
 * Glutamate
 * Glycine
 * Isoleucine
 * Leucine
 * Proline
 * Serine
 * Threonine
 * Valine

This result is consistent with the hypothesis that early organisms first acquired their amino acids from their environment, and only later invented some extra, fancier ones.
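The kind of enrichment described above can be illustrated with a small Python sketch that measures what fraction of a sequence consists of the ten easy-to-produce amino acids listed earlier. The function name and the two toy sequences are hypothetical and chosen only for illustration; they are not real reconstructions, and this is not necessarily the statistic Brooks et al. used.

```python
# One-letter codes for the ten easy-to-produce amino acids from the list above:
# Ala, Asp, Glu, Gly, Ile, Leu, Pro, Ser, Thr, Val
PREBIOTIC = frozenset("ADEGILPSTV")


def prebiotic_fraction(sequence: str) -> float:
    """Return the fraction of residues that belong to the easy-to-produce set."""
    residues = [r for r in sequence.upper() if r.isalpha()]
    if not residues:
        raise ValueError("sequence contains no residues")
    return sum(r in PREBIOTIC for r in residues) / len(residues)


# Hypothetical toy sequences (assumptions, not real data)
toy_ancestral = "AGDEVLSGAT"
toy_modern = "AGKWVLHGCT"

if __name__ == "__main__":
    print(prebiotic_fraction(toy_ancestral))
    print(prebiotic_fraction(toy_modern))
```

A reconstructed ancestor that scores higher than its present-day descendants on a measure like this would be "enriched" in the sense used above.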

This conclusion is approached from a somewhat different direction by Brian Davis, who has worked out the order of emergence of some important proteins using a different criterion: how many metabolic steps are necessary to produce their amino acids from Krebs cycle precursors. He follows Günter Wächtershäuser, an advocate of hydrothermal-vent-based origin-of-life scenarios, in supposing that the reductive Krebs cycle was an ancestral and possibly prebiotic metabolic pathway. Here is the number of steps from the Krebs cycle to each biological amino acid:
 * 2: Aspartate, Glutamate, Asparagine, Glutamine
 * 4: Alanine, Proline, Serine, Valine
 * 5: Cysteine, Glycine
 * 6: Threonine
 * 7: Isoleucine, Leucine, Methionine
 * 9: Arginine
 * 10: Lysine
 * 11: Phenylalanine, Tyrosine
 * 13: Histidine
 * 14: Tryptophan

The closer amino acids tend to be relatively easy to produce with prebiotic chemistry, as can be seen from the earlier list; the major discrepancy is that list's omission of the sulfur-containing amino acids cysteine and methionine. Their biosynthesis likely originated according to the Horowitz hypothesis, in which an organism that runs out of some substance will try making it from a simpler one, repeating the process until something very common is found. The farther amino acids may originally have been produced by metabolic leakage (accidental formation of various substances that proved to be useful), followed by Horowitz-style evolution of their biosynthesis (Lazcano and Miller).

Davis examined ten proteins, working out ancestral sequences and scoring them by this "biosynthesis age" of their amino acids; here they are, listed from oldest to youngest:
 * Low-potential (4Fe-4S) ferredoxin, an electron-transfer protein important in biosynthesis
 * Proteolipid helix 1, part of the ATPase complex (adapted for cell membranes)
 * FtsZ, involved in prokaryotic-cell division
 * Flap 5'-3' exonuclease, which does some RNA/DNA editing
 * Three subunits of DNA-dependent RNA polymerase, which copies from DNA to RNA
 * Ribonucleotide reductase (Fe), which makes DNA nucleotides from RNA ones
 * Reverse transcriptase, which copies from RNA to DNA
 * Topoisomerase I, which changes the topology of a DNA strand for replication
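The scoring idea can be sketched in Python using the step counts given earlier. The source does not state Davis's exact formula, so taking the mean step count over a sequence's residues is an assumption made here, and the protein names and sequences below are hypothetical placeholders.

```python
# Steps from the Krebs cycle to each amino acid, taken from the table above
# and keyed by one-letter code.
STEP_GROUPS = [
    (2, "DENQ"),   # Aspartate, Glutamate, Asparagine, Glutamine
    (4, "APSV"),   # Alanine, Proline, Serine, Valine
    (5, "CG"),     # Cysteine, Glycine
    (6, "T"),      # Threonine
    (7, "ILM"),    # Isoleucine, Leucine, Methionine
    (9, "R"),      # Arginine
    (10, "K"),     # Lysine
    (11, "FY"),    # Phenylalanine, Tyrosine
    (13, "H"),     # Histidine
    (14, "W"),     # Tryptophan
]
STEPS_FROM_KREBS = {aa: steps for steps, letters in STEP_GROUPS for aa in letters}


def mean_biosynthesis_steps(sequence: str) -> float:
    """Mean number of biosynthetic steps per residue (assumed scoring rule)."""
    steps = [STEPS_FROM_KREBS[r] for r in sequence.upper()]
    if not steps:
        raise ValueError("empty sequence")
    return sum(steps) / len(steps)


# Hypothetical toy sequences, not real ancestral reconstructions
toy_proteins = {"early_like": "DEAGDEAV", "late_like": "KWHFRKYD"}

# Lower mean step count is read here as "older"
ranked = sorted(toy_proteins, key=lambda name: mean_biosynthesis_steps(toy_proteins[name]))

if __name__ == "__main__":
    for name in ranked:
        print(name, mean_biosynthesis_steps(toy_proteins[name]))
```

Under this assumed rule, a sequence built mostly from aspartate, glutamate, alanine, and glycine ranks as older than one rich in lysine, histidine, or tryptophan.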

Perhaps the most interesting result was the structure of the oldest protein, a 23-amino-acid ancestral ferredoxin. It gave rise to the ancestor of the present-day ferredoxins by a tandem-repeat duplication, which makes reconstruction possible by internal sequence comparison. Like the present-day ones, it had the sulfur necessary to become an iron-sulfur protein, meaning that it has kept its electron-transfer function over its entire history. Its iron-sulfur electron-transfer chemistry may originally have been prebiotic, being performed by mineral surfaces, as Wächtershäuser has proposed; it was then taken over by that protein for the convenience of some early organism.
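Internal sequence comparison can be illustrated with a minimal Python sketch that splits a sequence into two halves and measures how many aligned positions match. This is a deliberately simplified stand-in with no gaps or substitution scoring, not the procedure Davis used, and the function name and example sequences are hypothetical.

```python
def internal_repeat_identity(sequence: str) -> float:
    """Fraction of matching positions between the two halves of a sequence.

    Assumes the repeat boundary lies at the midpoint; if the length is odd,
    the final residue is ignored.
    """
    half = len(sequence) // 2
    if half == 0:
        raise ValueError("sequence too short to contain two repeats")
    first, second = sequence[:half], sequence[half:2 * half]
    matches = sum(a == b for a, b in zip(first, second))
    return matches / half


if __name__ == "__main__":
    # Hypothetical toy sequences, not real ferredoxin data
    print(internal_repeat_identity("ACDEACDE"))  # perfect duplicate
    print(internal_repeat_identity("ACDEACDF"))  # one position has diverged
```

Where the two halves of a present-day protein still agree, the shared residue is a natural guess for the pre-duplication ancestor at that position.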

This ancestral protein had a very interesting structural feature: an acidic, negatively charged "tail". This can make it stick to a plausible prebiotic environment: mineral surfaces, which tend to become positively charged. The Krebs cycle's members can also do that, because they are also acidic. And mineral surfaces are abundant in hydrothermal vents.

This also suggests that ferredoxin goes back to a pre-cellular phase of life, something like Ernst Haeckel's Urschleim (original slime).

But despite ferredoxin's antiquity, it may not be the first protein; Davis mentions a speculation that the first one was a simple aspartate-glutamate protein that fixed nitrogen from ammonia.

Ferredoxin was soon followed by proteolipid helix 1, which is hydrophobic (water-repellent), making it adapted for residing in cell membranes, with their soap-bubble-like hydrophobic interiors. It is part of the ATPase enzyme complex, which uses gradients of H+ ions to assemble ATP molecules, something which requires that a cell membrane be in place.

Next was FtsZ, a protein involved in the division of prokaryotic cells. It is distantly related to the eukaryotic cytoskeleton protein tubulin. Its presence also requires cell membranes; this and the previous protein suggest that discrete cells had evolved early in the evolution of proteins.

The more difficult amino acids soon followed, including the more alkaline ones like lysine.

Also following were several proteins involved in the synthesis and manipulation of DNA, and in DNA->RNA and RNA->DNA copying. Their late appearance suggests that DNA is a latecomer; this is also consistent with various lines of evidence suggesting the former occurrence of an RNA world. DNA is only present as a master-copy molecule, and not in any other known role.

DNA as a latecomer is supported by the work of Leipe et al., who conclude that much of the DNA-replication system was invented twice, once in Eubacteria and once in Archaebacteria + Eukarya. They conclude from this that the LUCA had a combined DNA-RNA genome, which would be replicated by each strand being copied to the opposite kind of nucleic acid (DNA getting RNA, RNA getting DNA).