Eukaryotic transcription factors

In this article, I briefly explain the eukaryotic transcription factors and their function.

Transcription

The synthesis of RNA from a DNA template is called transcription. It is the first step of gene expression, in which a particular segment of DNA is copied into RNA by the enzyme RNA polymerase. In both prokaryotes and eukaryotes, transcription is the primary level at which gene expression is regulated. The stretch of DNA transcribed into an RNA molecule is called a transcription unit and encodes at least one gene. RNA polymerase is the principal enzyme responsible for RNA synthesis; it catalyzes the polymerization of ribonucleoside 5′-triphosphates as directed by a DNA template. Transcription in eukaryotic cells is more complex than in prokaryotic cells, although both share the same fundamental mechanism. In eukaryotes, the process begins when transcription factors associate with the enzyme RNA polymerase.

Transcription factors

The specific proteins required for RNA polymerase II to initiate transcription are called transcription factors. Transcription factors are found in all living organisms, and the number of transcription factors within an organism increases with genome size. Thus, larger genomes tend to have more transcription factors per gene. Transcription factors contain one or more DNA-binding domains (DBDs), which attach to specific sequences of DNA adjacent to the genes they regulate. It is estimated that about 5% of the genes in the human genome encode transcription factors, which underscores the importance of these proteins. Two types of transcription factors have been defined: general transcription factors and gene-specific transcription factors. General transcription factors are involved in transcription from all polymerase II promoters and thus constitute part of the basic transcription machinery.

Eukaryotic transcription factors

Eukaryotic transcription is more complex than prokaryotic transcription. In eukaryotes, RNA polymerase II requires general transcription factors (GTFs) to initiate transcription. Eukaryotic transcription factors contain a variety of structural motifs that interact with specific DNA sequences. Transcription factors (transcriptional activators) have a modular structure consisting of DNA binding and transcription activating domains. DNA-binding domains mediate association with specific regulatory sequences. Activation domains interact with mediator proteins, general transcription factors, and co-activators, thus stimulating transcription. In addition, many transcription factors occur as homo-dimers or hetero-dimers, held together by dimerization domains.

DNA binding domain

A DNA-binding domain (DBD) is an independently folded protein domain. It contains at least one motif that recognizes double- or single-stranded DNA. A DBD can recognize a specific DNA sequence (a recognition sequence) or have a general affinity for DNA. Many eukaryotic transcription factors possess related DNA-binding domains. The main types of DNA-binding domains are described below.

The helix-turn-helix domain

A helix-turn-helix domain consists of two α-helices joined by a short strand of amino acids. It is found in many proteins that regulate gene expression. The helix-turn-helix motif was first recognized in prokaryotic DNA-binding proteins, including the E. coli catabolite activator protein (CAP). In these proteins, one helix (helix 3) makes most of the contacts with DNA, while the other helices (helices 1 and 2) lie across the complex to stabilize the interaction (figure 1).

**Figure 1: The helix-turn-helix domain**

During embryonic development in eukaryotes, helix-turn-helix proteins play a vital role in regulating gene expression. The genes encoding these proteins were first discovered as developmental mutants in Drosophila. One example is the homeotic mutant of Drosophila called Antennapedia, in which legs rather than antennae grow out of the head of the fly.

Ed Lewis analyzed these mutants in the 1940s. According to his analysis, Drosophila contains nine homeotic genes, each of which specifies the identity of a different body segment. Molecular cloning and analysis of these genes showed that they contain conserved sequences of 180 base pairs, called homeoboxes, that encode the DNA-binding domains (homeodomains) of transcription factors. In the Antennapedia transcription factor of Drosophila, the helix-turn-helix domain consists of four α-helices, in which helices II and III are placed at right angles and are separated by a characteristic β turn.

Vertebrate homeobox genes are strikingly similar to their Drosophila counterparts in structure and function, illustrating the highly conserved roles of these transcription factors in animal development.

The zinc finger domain

Zinc finger domains were initially identified in the polymerase III transcription factor TFIIIA. However, they are also common among transcription factors that regulate polymerase II promoters, including Sp1. Zinc finger domains contain repeats of cysteine and histidine residues that bind zinc ions and fold into looped structures that bind DNA.

The C₂H₂ zinc finger domain

Zinc finger domains occur in several forms. In eukaryotic transcription factors, the C₂H₂ zinc finger (figure 2) is one of the most common DNA-binding motifs. It appears as a series of repeating units in the DNA-binding domain of transcription factor IIIA, which is required for the transcription of 5S rRNA genes by RNA polymerase III. Each repeating unit has the consensus sequence (Tyr/Phe) X Cys X2-4 Cys X3 (Phe/Tyr) X5 Leu X2 His X3-4 His, where X is any amino acid. Each repeating unit binds one zinc ion through the two cysteine (C) and two histidine (H) side chains. Usually, three or more C₂H₂ zinc fingers are required for DNA binding.

**Figure 2: The C₂H₂ zinc finger motif**
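The consensus sequence for a C₂H₂ repeating unit can be written as a regular expression over one-letter amino acid codes. The following is only a minimal sketch: the names `C2H2_PATTERN` and `find_c2h2_fingers` are illustrative, and the test peptide is artificial (alanine fills every X position), not a real protein. A strict pattern like this will also miss genuine fingers that deviate from the consensus.

```python
import re

# One-letter regex for the consensus quoted in the text:
# (Tyr/Phe) X Cys X2-4 Cys X3 (Phe/Tyr) X5 Leu X2 His X3-4 His
C2H2_PATTERN = re.compile(r"[YF].C.{2,4}C.{3}[FY].{5}L.{2}H.{3,4}H")


def find_c2h2_fingers(protein):
    """Return (start, matched_text) for each non-overlapping consensus match."""
    return [(m.start(), m.group()) for m in C2H2_PATTERN.finditer(protein)]


# Artificial peptide (an assumption for illustration): alanine in every X position.
finger = ("Y" + "A" + "C" + "AA" + "C" + "AAA" + "F"
          + "AAAAA" + "L" + "AA" + "H" + "AAA" + "H")
peptide = "MA" + finger + "GG" + finger

for start, text in find_c2h2_fingers(peptide):
    print(start, text)
```

The sequence is expected on a single line, because `.` in the pattern does not match a newline.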

The C₄ zinc finger motif

The steroid hormone receptors, which regulate gene transcription in response to hormones such as estrogen and testosterone, also contain zinc finger domains. These factors consist of homodimers or heterodimers, in which each monomer contains two C₄ zinc finger motifs (figure 3). The two motifs fold together into a more complex conformation stabilized by zinc, which binds to DNA by the insertion of one α-helix from each monomer into successive major grooves. C₄ zinc-finger proteins generally contain only two finger units and bind to DNA as homodimers or heterodimers. In contrast, C₂H₂ zinc-finger proteins contain three or more repeating finger units and bind as monomers.

The C₆ zinc finger motif

The yeast GAL4 protein has a DNA-binding domain that exhibits a third type of zinc-finger motif, known as the C₆ zinc finger. Proteins of this class have the consensus sequence Cys-X2-Cys-X6-Cys-X5-6-Cys-X2-Cys-X6-Cys. The six cysteines bind two Zn²⁺ ions, folding the region into a compact globular domain. The GAL4 protein binds DNA as a homodimer in which the monomers associate through hydrophobic interactions along one face of their α-helical regions.
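Consensus notation of this kind translates mechanically into a search pattern. The sketch below, offered as one possible approach, converts a token list into a regular expression and checks it against artificial peptides. The helper names (`consensus_to_regex`, `synthetic_c6`) and the alanine-filled test sequences are assumptions made for illustration and do not come from any real protein.

```python
import re

# Three-letter to one-letter codes for residues that appear in the consensus notation.
ONE_LETTER = {"Cys": "C", "His": "H", "Leu": "L", "Phe": "F", "Tyr": "Y"}


def consensus_to_regex(tokens):
    """Translate tokens such as "Cys", "X2" or "X5-6" into a compiled regex."""
    parts = []
    for token in tokens:
        if token.startswith("X"):
            low, _, high = token[1:].partition("-")
            low = low or "1"  # a bare "X" means exactly one residue
            parts.append(".{%s,%s}" % (low, high or low))
        else:
            parts.append(ONE_LETTER[token])
    return re.compile("".join(parts))


# Cys-X2-Cys-X6-Cys-X5-6-Cys-X2-Cys-X6-Cys
C6_TOKENS = ["Cys", "X2", "Cys", "X6", "Cys", "X5-6",
             "Cys", "X2", "Cys", "X6", "Cys"]
C6_PATTERN = consensus_to_regex(C6_TOKENS)


def synthetic_c6(middle_spacer):
    """Build an artificial peptide with alanine in every X position."""
    return ("C" + "A" * 2 + "C" + "A" * 6 + "C" + "A" * middle_spacer
            + "C" + "A" * 2 + "C" + "A" * 6 + "C")


print(C6_PATTERN.pattern)
print(bool(C6_PATTERN.search(synthetic_c6(5))))  # spacer within X5-6
print(bool(C6_PATTERN.search(synthetic_c6(7))))  # spacer too long
```

Keeping the consensus as a token list makes the spacing rules easy to compare with the written notation.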

Dimerization domains

Dimerization domains allow two polypeptide chains to associate into a dimer. Among DNA-binding proteins, the leucine zipper and helix-loop-helix proteins contain dimerization domains.

Many DNA-binding eukaryotic proteins possess the basic leucine zipper domain. One part of the domain mediates sequence-specific DNA binding, and the leucine zipper holds together (dimerizes) two DNA-binding regions (figure 4). The DNA-binding region comprises several basic amino acids such as arginine and lysine.

Leucine zipper

Both eukaryotic and prokaryotic regulatory proteins contain leucine zippers. However, they are mainly a feature of eukaryotic proteins. The leucine zipper is the dimerization domain of the B-ZIP (basic-region leucine zipper) class of eukaryotic transcription factors. The B-ZIP family of transcription factors consists of a basic region and a hydrophobic region. The basic region interacts with the major groove of a DNA molecule through hydrogen bonding. The hydrophobic region is responsible for dimerization. The coiled-coil structure adopted by the leucine zipper, a left-handed parallel dimeric coiled coil, was proposed by Pauling and Corey, and by Crick, in 1953.

The leucine zipper contains four or five leucine residues spaced at intervals of seven amino acids, in a region often located at the C-terminal part of the DNA-binding domain. These leucine residues lie in an α-helical region (figure 4). The regular repeat of these residues forms a hydrophobic surface on one side of the α-helix, with a leucine every second turn of the helix. The hydrophobic faces of the α-helices interact with each other to dimerize, and this interaction results in a coiled-coil structure. Leucine zipper regulatory proteins include c-fos and c-jun (the AP1 transcription factor), which are important regulators of normal development, as well as myc family members including myc, max, and mxd1.
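The seven-residue spacing can be checked directly on a sequence. The sketch below scans for runs of leucines spaced exactly seven positions apart and also shows why that spacing corresponds to roughly every second helical turn, assuming the textbook value of 3.6 residues per turn for an ideal α-helix. The function name `heptad_leucine_runs` and the test sequence are illustrative assumptions, not taken from a real protein.

```python
def heptad_leucine_runs(protein, min_leucines=4):
    """Return (start_index, leucine_count) for runs of Leu spaced 7 residues apart."""
    runs = []
    for i, residue in enumerate(protein):
        if residue != "L":
            continue
        if i >= 7 and protein[i - 7] == "L":
            continue  # already counted as part of an earlier run
        count = 1
        while i + 7 * count < len(protein) and protein[i + 7 * count] == "L":
            count += 1
        if count >= min_leucines:
            runs.append((i, count))
    return runs


RESIDUES_PER_TURN = 3.6  # assumed textbook value for an ideal alpha-helix
print(round(7 / RESIDUES_PER_TURN, 2))  # turns spanned by one heptad, close to 2

# Artificial sequence: five leucines at positions 3, 10, 17, 24 and 31.
zipper = "AAA" + ("L" + "AAAAAA") * 4 + "L"
print(heptad_leucine_runs(zipper))
```

A real zipper tolerates occasional substitutions at the leucine positions, so a strict scan like this is best treated as a first-pass filter.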

Helix-loop-helix proteins

Helix-loop-helix proteins are similar to leucine zipper proteins, except that their dimerization domains are each formed by two helical regions separated by a loop. Hydrophobic residues on one side of the C-terminal α-helix allow dimerization. Transcription factors that include this domain are generally dimeric, each monomer having one helix that contains basic amino acid residues that facilitate DNA binding. Usually, one helix is smaller, and the flexibility of the loop allows dimerization by folding and packing against another helix. The larger helix typically contains the DNA-binding regions (figure 5). This structure is found in the MyoD family of proteins.

**Figure 5: The helix-loop-helix domain**

The MyoD gene regulates gene expression during cell determination and commits cells to form muscle. The MyoD protein has been shown to activate muscle-specific gene expression directly. Four genes, myoD, myogenin, myf5, and mrf4, have been shown to convert fibroblasts into muscle. The encoded proteins are all members of the helix-loop-helix transcription factor family.

Both leucine zipper and helix-loop-helix proteins play important roles in regulating tissue-specific and inducible gene expression. The formation of dimers between different members of these families is a critical aspect of their function.

Transcription activation domain

Transcription factors contain regions known as transcriptional activation domains (TADs). In conjunction with a DNA-binding domain, these regions can activate transcription from a promoter by contacting the transcriptional machinery (general transcription factors and RNA polymerase), either directly or through other proteins known as co-activators.

Certain transcription factors are used as model proteins for understanding how transcription activation domains work. These include GAL4, GCN4, and HAP1 in yeast cells; steroid hormone receptors, heat shock transcription factors, and NF-κB in mammalian cells; and viral proteins such as the herpes virus activator VP16 and HIV TAT.

Transcription factors may have acidic amino acid domains, glutamine-rich domains, or proline-rich activation domains.

Acidic activation domains

The transactivation domains of yeast GCN4 and GAL4, the mammalian glucocorticoid receptor, and the herpes virus activator VP16 have a very high proportion of acidic amino acids. Such regions are called acidic activation domains and are found in many transcription factors.

Glutamine-rich domains

Transcription factor Sp1 was the first factor identified as possessing glutamine-rich domains in its two activation regions. These glutamine-rich motifs are essential for the activation of transcription mediated by these domains, as their deletion abolishes the ability to activate transcription.

However, transcriptional activation can be restored by substituting the glutamine-rich regions of Sp1 with a glutamine-rich region from the Drosophila homeobox transcription factor Antennapedia, which has no sequence homology to the Sp1 sequence. The activating ability of a glutamine-rich domain, like that of an acidic activation domain, is therefore defined not by its primary sequence but by its overall glutamine-rich character.

Similar glutamine-rich regions have also been identified in the N-terminal activation domains of the octamer-binding proteins Oct-1 and Oct-2, the Drosophila homeobox proteins Ultrabithorax and zeste, and the yeast HAP1 and HAP2 transcription factors.

Proline-rich domains

Many transcription factors possess proline-rich domains in their activation regions. Like other activation domains, this region can activate transcription when linked to the DNA binding domains of other transcription factors. As with glutamine, a continuous run of proline residues can mediate activation, indicating that the function of this type of domain depends primarily on its richness in proline.

The activator CTF-1 contains a proline-rich domain of 84 amino acids, 19 of which are prolines. CTF-1 is a member of a class of transcription factors that bind to an extended promoter element called a CCAAT box. The N-terminal domain has been shown to regulate transcription of certain genes. The C-terminal end is a transcription regulator and is known to bind to histone proteins via the proline repeats. Proline-rich domains are also found in other transcription factors, such as the oncogene product Jun, AP2, and the C-terminal activation domain of Oct-2. Thus, like glutamine-rich domains, proline-rich domains are not confined to a single factor, and a single factor such as Oct-2 can contain two activation domains of different types.
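Because these activation domains are characterized by their overall composition rather than by a fixed sequence, a simple residue count is a natural first look at a candidate region. The sketch below reports the acidic, glutamine, and proline fractions of a sequence. The function names and the 84-residue test sequence (19 prolines padded with alanine) are illustrative assumptions that only mirror the proportions quoted for CTF-1; they are not the real CTF-1 sequence.

```python
def residue_fraction(protein, residues):
    """Fraction of positions in `protein` occupied by any of `residues`."""
    if not protein:
        raise ValueError("empty sequence")
    return sum(1 for aa in protein if aa in residues) / len(protein)


def activation_domain_profile(protein):
    """Composition measures matching the three domain types named in the text."""
    return {
        "acidic (D+E)": residue_fraction(protein, "DE"),
        "glutamine (Q)": residue_fraction(protein, "Q"),
        "proline (P)": residue_fraction(protein, "P"),
    }


# Artificial 84-residue sequence with 19 prolines (not the real CTF-1 domain).
synthetic_domain = "P" * 19 + "A" * 65

for label, fraction in activation_domain_profile(synthetic_domain).items():
    print(f"{label}: {fraction:.1%}")
```

For the proportions quoted above, 19 of 84 residues works out to a proline content of roughly 23%. No classification thresholds are applied here, since the text does not define any.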

Repressor domain in eukaryotes

Transcriptional activators and repressors regulate gene expression in eukaryotic cells. Repressors bind to specific DNA sequences and inhibit transcription. Some repressors simply interfere with the binding of other transcription factors to DNA and thereby inhibit transcription.

Many eukaryotic repressors have two functional domains: a DNA-binding domain and a repression domain. As with activation domains, several kinds of amino acid sequence can function as repression domains. Many of these are relatively short (≈20 amino acids) and contain high proportions of hydrophobic residues. Other repression domains contain a high proportion of basic residues. Some repression domains are larger, well-structured protein domains. Some repressors contain the same DNA-binding domain as an activator but lack its activation domain. Thus, when they bind to a promoter or enhancer, they block the binding of the activator and thereby inhibit transcription.

Repressors with specific functional domains

Some repressors contain specific functional domains that inhibit transcription via protein-protein interactions. Molecular analysis of the gene called Kruppel, involved in embryonic development in Drosophila, demonstrated that it contains a discrete repression domain, which is linked to a zinc finger DNA-binding domain. The Kruppel repression domain could be interchanged with distinct DNA-binding domains of other transcription factors. Many active repressors serve as critical regulators of cell growth and differentiation. The repression domain of Kruppel is rich in alanine residues, whereas other repression domains are rich in proline or acidic residues.

Repressors have different functional targets

Repressors have diverse functional targets. They can inhibit transcription by interacting with specific activator proteins, with mediator proteins, or with general transcription factors. They can also inhibit transcription through co-repressors that act by modifying chromatin structure. Geneticists have identified mutations in yeast that result in constitutive expression of certain genes, which indicates that these genes are normally regulated by a repressor. Whereas mutation of an activator-binding site leads to decreased expression of the linked reporter gene, mutation of a repressor-binding site leads to increased expression of the reporter gene.

The protein encoded by the Wilms’ tumor (WT1) gene is a repressor that is preferentially expressed in the developing kidney. Children who inherit mutations in both the maternal and paternal WT1 genes cannot produce functional WT1 protein and invariably develop kidney tumors early in life. The WT1 protein, which has a C₂H₂ zinc-finger DNA-binding domain, binds to the control region of the gene encoding a transcription activator called EGR-1. Like many eukaryotic genes, this gene is subject to both repression and activation. The binding of WT1 represses transcription of the EGR-1 gene without inhibiting the binding of the two activators that normally stimulate its expression.

The regulation of transcription by repressors and activators considerably extends the range of mechanisms that control the expression of eukaryotic genes.

Conclusion

Eukaryotic transcription factors contain a variety of structural motifs that interact with specific DNA sequences. Transcription factors (transcriptional activators) have a modular structure consisting of DNA binding and transcription activating domains. DNA-binding domains mediate association with specific regulatory sequences. Activation domains interact with mediator proteins, general transcription factors, and co-activators, thus stimulating transcription.

A DNA-binding domain can recognize a specific DNA sequence (a recognition sequence) or have a general affinity for DNA. The main DNA-binding domains are the helix-turn-helix domain, the zinc finger domain, the leucine zipper, and the helix-loop-helix domain. The helix-turn-helix domain is composed of two α-helices joined by a short strand of amino acids and is found in many proteins that regulate gene expression. Zinc finger domains contain repeats of cysteine and histidine residues that bind zinc ions and fold into looped structures that bind DNA. Among DNA-binding proteins, the leucine zipper and helix-loop-helix proteins contain dimerization domains.

Transcription factors contain regions known as transcriptional activation domains (TADs). In conjunction with a DNA-binding domain, these regions can activate transcription from a promoter. Transcriptional activation domains are of different types, such as acidic activation domains, glutamine-rich domains, and proline-rich domains. Transcriptional activators and repressors regulate gene expression in eukaryotic cells. Repressors bind to specific DNA sequences and inhibit transcription, in some cases by interfering with the binding of other transcription factors to DNA.
