Genes vary in complexity. In humans, they range in size from a few hundred DNA bases to more than 2 million bases. Different living things have different shapes and numbers of chromosomes. Humans have 23 pairs of chromosomes, or a total of 46. A donkey has 31 pairs of chromosomes, a hedgehog has 44, and a fruit fly has just 4. DNA is passed from adult organisms to their offspring during reproduction. The building blocks of DNA are called nucleotides.
Nucleotides have three parts: a phosphate group, a sugar group and one of four types of nitrogen bases. A gene consists of a long combination of four different nucleotide bases, or chemicals. There are many possible combinations. The four bases are written as the letters A, C, G and T, and different combinations of these letters give people different characteristics. Genes carry these codes.
Each person has thousands of genes. They are like a computer program, and they make the individual what they are. A gene is a tiny section of a long DNA double helix molecule, which consists of a linear sequence of base pairs. A gene is any section along the DNA with instructions encoded that allow a cell to produce a specific product — usually a protein, such as an enzyme — that triggers one precise action.
DNA is a chemical that appears in strands, and its sequence is what makes each person unique. DNA is made up of two long, paired strands spiraled into the famous double helix. Each strand contains millions of chemical building blocks called bases.
Genes decide almost everything about a living being. One or more genes can affect a specific trait. Genes affect hundreds of internal and external factors, such as whether a person will get a particular color of eyes or what diseases they may develop. A gene is a basic unit of heredity in a living organism. Genes come from our parents. We may inherit our physical traits and the likelihood of getting certain diseases and conditions from a parent.
Genes contain the data needed to build and maintain cells and pass genetic information to offspring. Each cell contains two sets of chromosomes: one set comes from the mother and the other comes from the father. The male sperm and the female egg carry a single set of 23 chromosomes each: 22 autosomes plus one sex chromosome (an X in the egg, an X or a Y in the sperm).
A female inherits an X chromosome from each parent, but a male inherits an X chromosome from their mother and a Y chromosome from their father.
The Human Genome Project (HGP) is the largest single research activity ever carried out in modern science. It aimed to determine the sequence of the chemical base pairs that make up human DNA and to identify and map the 20,000 to 25,000 or so genes that make up the human genome. The goal was to sequence the 3 billion letters, or base pairs, that make up the complete set of DNA in the human body. By doing this, the scientists hoped to provide researchers with powerful tools, not only to understand the genetic factors in human disease, but also to open the door for new strategies for diagnosis, treatment, and prevention.
The HGP was completed in 2003, and all the data generated is available for free access on the internet. Apart from humans, the HGP also looked at other organisms and animals, such as the fruit fly and E. coli.
Over three billion nucleotide combinations, or combinations of ACGT, have been found in the human genome, or the collection of genetic features that can make up the human body. Mapping the human genome brings scientists closer to developing effective treatments for hundreds of diseases.
The project has fueled the discovery of more than a thousand disease genes. This has made it possible for researchers to find a gene that is suspected of causing an inherited disease in a matter of days.
When the information stored in our DNA is converted into instructions for making proteins or other molecules, it is called gene expression. Gene expression is a tightly regulated process that allows a cell to respond to its changing environment.
There are two key steps involved in making a protein: transcription and translation. Transcription, in which the DNA of a gene is copied into messenger RNA (mRNA), is carried out by an enzyme called RNA polymerase, which uses available bases from the nucleus of the cell to form the mRNA.
However, there are a few problems with the GLS (e.g. Pocock et al.; Tang et al.). In this article, a one-sided test statistic means that the changes of individual gene expressions in a gene class are all in one direction: either up or down. A two-sided test statistic means that the changes of individual gene expressions in the gene class can be up-regulated for some genes and down-regulated for other genes.
The GSEA generated the null distribution by permuting the samples. The Tian et al. The function scoring analysis generated the null distribution by re-sampling (permuting) genes.
The Fisher's exact test in the ORA is an analysis of permuting genes where the genes are dichotomized into differentially and non-differentially expressed groups. Goeman et al. Under the null hypothesis, the test statistic is asymptotically normal. All authors, except Tian et al. The P-values were computed either by a parametric method or by a re-sampling method (permuting genes or permuting samples), and the P-values were used for gene class rankings as well as significance assessment.
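As an illustration of the over-representation analysis (ORA) mentioned above, the following is a minimal sketch of a one-sided Fisher's exact test computed as a hypergeometric upper tail. The function name and its arguments are hypothetical and chosen only for this example; the dichotomization of genes into differentially and non-differentially expressed groups is assumed to have been done beforehand.

```python
from math import comb


def ora_fisher_pvalue(n_genes, n_de, class_size, class_de):
    """One-sided Fisher's exact P-value for over-representation.

    n_genes    : total number of genes on the array
    n_de       : number of genes called differentially expressed
    class_size : number of genes in the gene class
    class_de   : number of differentially expressed genes in the class

    Returns P(X >= class_de) for a hypergeometric X, i.e. the chance of
    seeing at least this many differentially expressed genes in a random
    class of the same size.
    """
    total = comb(n_genes, class_size)
    upper = min(n_de, class_size)
    tail = sum(
        comb(n_de, k) * comb(n_genes - n_de, class_size - k)
        for k in range(class_de, upper + 1)
    )
    return tail / total


# Hypothetical numbers: 10 genes, 5 differentially expressed,
# a class of 5 genes that are all differentially expressed.
p = ora_fisher_pvalue(10, 5, 5, 5)
```

Because the reference set here is "all other genes", this test corresponds to a gene-permutation view of the null hypothesis rather than a sample-permutation view.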
Therefore, they did not deal with the issue of different null distributions and different rankings under Q1 and Q2. On the other hand, Tian et al. Their gene class rankings for the enrichment pathways are very different from the rankings by the GSEA, except for the most significant pathway. However, the Tian et al.
The SDs of the averages are in parentheses. The results are the average over 10 replications.
Eight top-ranked gene classes are selected from the combination of the five smallest permutation P-values (in parentheses) from the T_ols and GSEA ES statistics under each hypothesis, Q1 and Q2. The gene classes are listed according to the observed T_ols statistic, with the P-values computed by the normal approximation (Columns 3).
In this article, we investigate the null distributions of gene classes under Q1 and under Q2 for three one-sided and two two-sided test statistics. For the two-sided tests, we consider a modified GSEA statistic and a sum of the standardized differences statistic. The five statistics are applied to the diabetes dataset and a breast cancer dataset. Both T_ols and T_lws can be used to test for Q1 or Q2, and both are one-sided tests (either up- or down-regulation).
The two statistics asymptotically have a standard normal distribution under Q2. We used the parametric normal approximation and permutation methods to compute the P-values and compared their gene rankings under Q1 and Q2. The ranking order based on permutation P-values may differ from the ranking order based on t-values if the test statistics are not identically distributed (Tsai and Chen). The original ES was proposed for a down-regulated alternative based on the ordered signal-to-noise ratio (SNR).
The ES_U is also considered in the investigation. A GCT analysis assigns an overall statistical significance of phenotype differences in gene expression for a gene class. It does not identify which genes in the gene class actually contribute to the phenotype difference. After identifying gene classes that show a difference, the standard univariate test can be used to identify which genes in the gene class are significant.
However, in testing individual genes in a gene class, the total number of tests in a gene class is m_i, instead of the m genes in the array. That is, for a given significant gene class, a multiple test procedure such as FDR or FWE should be based only on m_i comparisons.
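To make the point about m_i comparisons concrete, here is a minimal sketch of an FDR adjustment applied within a single gene class. The Benjamini-Hochberg procedure is used as one common FDR method; the source does not say which procedure was applied, so this choice and the function name are assumptions for illustration.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted P-values for the genes of one gene class.

    The number of comparisons is m_i = len(pvalues), the size of the class,
    not the number of genes on the whole array.
    """
    m_i = len(pvalues)
    # Indices of the genes ordered from smallest to largest P-value.
    order = sorted(range(m_i), key=lambda j: pvalues[j])
    adjusted = [0.0] * m_i
    running_min = 1.0
    # Walk from the largest P-value down so adjusted values stay monotone.
    for rank in range(m_i, 0, -1):
        j = order[rank - 1]
        running_min = min(running_min, pvalues[j] * m_i / rank)
        adjusted[j] = running_min
    return adjusted


# Hypothetical per-gene P-values for a class of three genes.
adjusted = bh_adjust([0.01, 0.04, 0.03])
```

The same gene would receive a larger adjusted P-value if the correction were based on all m genes in the array, which is why a gene may be significant within one class and not another.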
In the follow-up analysis, we are interested in which genes are significant in the given significant gene class. A gene may be significant in one gene class but insignificant in another. The diabetes dataset consisted of gene classes from 10 genes measured on 17 subjects with normal glucose tolerance and 18 subjects with type 2 diabetes mellitus.
Class sizes range from 1 to We simulated the null distribution of each gene class under Q1 by permuting gene labels and simulated the null distribution of each gene class under Q2 by permuting samples.
The simulation was repeated 10 times to obtain null distributions for each of the gene classes for the six GCT statistics. The means and SDs over the 10 replicates were computed for each of the gene classes.
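The two permutation schemes described above can be sketched as follows. This is only an illustration under stated assumptions: the class statistic used here (the sum of per-gene Welch-type t values divided by the square root of the class size) is a simple stand-in and is not the T_ols, T_lws or ES statistic of the paper, and all function names are hypothetical.

```python
import random
from statistics import mean, variance


def gene_t(values, labels):
    """Welch-type t statistic for one gene (group 1 minus group 0)."""
    g0 = [v for v, lab in zip(values, labels) if lab == 0]
    g1 = [v for v, lab in zip(values, labels) if lab == 1]
    se = (variance(g0) / len(g0) + variance(g1) / len(g1)) ** 0.5
    return (mean(g1) - mean(g0)) / se


def class_stat(t_values, class_idx):
    """Illustrative class statistic: sum of gene t values / sqrt(class size)."""
    return sum(t_values[i] for i in class_idx) / len(class_idx) ** 0.5


def null_q1(expr, labels, class_idx, n_perm, rng):
    """Q1 null: permute gene labels, i.e. draw random classes of equal size."""
    t_values = [gene_t(row, labels) for row in expr]
    genes = range(len(expr))
    return [
        class_stat(t_values, rng.sample(genes, len(class_idx)))
        for _ in range(n_perm)
    ]


def null_q2(expr, labels, class_idx, n_perm, rng):
    """Q2 null: permute sample (phenotype) labels, keep the class fixed."""
    null = []
    for _ in range(n_perm):
        shuffled = labels[:]
        rng.shuffle(shuffled)
        # Only the genes in the class need to be re-tested.
        t_values = {i: gene_t(expr[i], shuffled) for i in class_idx}
        null.append(class_stat(t_values, class_idx))
    return null


# Hypothetical data: 30 genes (rows) by 12 samples (columns), 6 per group.
rng = random.Random(0)
expr = [[rng.gauss(0, 1) for _ in range(12)] for _ in range(30)]
labels = [0] * 6 + [1] * 6
q1_null = null_q1(expr, labels, [0, 1, 2, 3], 100, rng)
q2_null = null_q2(expr, labels, [0, 1, 2, 3], 100, rng)
```

Under Q1 the phenotype labels stay fixed and the class membership is randomized, while under Q2 the class stays fixed and the phenotype labels are randomized, so the two null distributions answer different questions.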
Figure 1 plots the means for the gene classes versus the class sizes for the six statistics under Q1 and under Q2.
Figure 1. Mean (y-axis) of the null distributions among the gene classes versus class size (x-axis) under Q1 and under Q2. The gene classes are ordered from small to large class sizes. The means are the averages of 10 replicates. The upper panel is for Q1 and the lower panel is for Q2.
For hypothesis Q1, except for T_two, the means of the null distributions increase as the class size increases.
For hypothesis Q2, the means of the null distributions for T_ols and T_lws are fairly constant. The means in T_two are somewhat similar, as are the means in ES. The means in ES are rather different. Also, the means for the gene classes in T_ols are similar to the corresponding means in T_lws under either Q1 or Q2.
For the T_av statistic, the mean increases as the class size increases under either Q1 or Q2. The gene class of size one has the smallest means ranging 0.
It is apparent that the null distributions for the classes are very different. Since the null distribution under Q2 should not depend on the class size, the T_av statistic will not be considered in the further evaluation. Table 1 shows the average of the means and the average of the SDs over the gene classes for the five GCT statistics under Q1 and under Q2, and the correlation coefficients between Q1 and Q2 for the mean and the SD. Both statistics are a summation of quantities which depend on the class size.
When the class sizes are different, the summations would result in large differences. That is, the gene classes have almost the same means and the same SDs. Under Q2, the means as well as the standard deviations of the null distributions are very different for ES.
For example, the means range from These results are very similar to the results of Tian et al. The means for ES are more similar; the means range from These values are in good agreement with the theoretical result that the null distributions of the two statistics are asymptotically N(0,1). A deviation from 0, either positive or negative, indicates favoring an alternative hypothesis, either an up- or down-regulated pathway.
Under Q1, the means for T_ols and T_lws are about 0. The null distributions under Q1 do not appear to be asymptotically N(0,1). Similarly, a large ES indicates a difference. In the further analysis, we will report only the results from T_ols, since the null distributions for the two one-sided statistics T_ols and T_lws are very similar. For each of the five GCT statistics, the distribution of each gene class under Q1 appears to be different from the corresponding distribution under Q2 Fig.
In other words, the null distributions of the gene classes generated by permuting genes under Q1 differ from the null distributions generated by permuting phenotypes under Q2. Figure 2 shows the scatter plots of the means of null distributions for Q1 versus Q2 over the gene classes.
However, these positive correlations could be from an indirect effect of class sizes.
Figure 2. Scatter plots of the means of null distributions between Q1 and Q2. Each point represents a gene set.
In testing Q1 and Q2, different null distributions could result in different gene rankings. Under Q2, the P-values of the gene classes are calculated under no difference between two phenotypes. If the null distributions of the gene classes are identical (e.g. T_ols or T_lws under Q2), then these P-values are comparable and can be used for ranking of gene classes as well as identifying differential expression. The P-value computed under Q2 is consistent with the principle of statistical significance testing. The evaluation and analysis of the GCT tests will be primarily based on Q2. Table 2 lists eight gene classes from the one-sided T_ols and ES tests under Q1 and under Q2, and the corresponding P-values from the observed T_ols using the normal approximation.
These are the eight most significant gene classes selected from the five smallest P-values in each test under each hypothesis. For T_ols, the P-values from the parametric and permutation methods are in good agreement.
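The two ways of obtaining a P-value compared above can be sketched side by side. This is a hedged illustration only: it shows an upper-tail (up-regulated) test, the function names are hypothetical, and the "+1" correction in the permutation P-value is a common convention assumed here rather than something stated in the source.

```python
from math import erfc, sqrt


def normal_upper_p(z):
    """Parametric P-value: upper tail of a standard normal, P(Z >= z)."""
    return 0.5 * erfc(z / sqrt(2))


def permutation_upper_p(observed, null_values):
    """Permutation P-value: share of null statistics at least as large
    as the observed one, with a +1 correction so it is never exactly 0."""
    exceed = sum(1 for v in null_values if v >= observed)
    return (exceed + 1) / (len(null_values) + 1)


# Hypothetical observed statistic and a tiny hypothetical null sample.
p_normal = normal_upper_p(1.5)
p_perm = permutation_upper_p(1.5, [-0.8, 0.1, 0.4, 1.7, -1.2, 0.9, 2.1, -0.3, 0.0])
```

When the statistic really is close to N(0,1) under the permuted null, as the text reports for T_ols under Q2, the two P-values should be close for a large enough number of permutations.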
Despite the low correlation between Q1 and Q2 (Table 1), the rankings under Q1 and under Q2 are similar. For ES, under Q1 the four top gene classes all have large class sizes.
It is worth mentioning that the P-values for gene class 73 are 0. This gene class is up-regulated and ranked second or third by T_ols.
The T_ols test is designed to identify either an up- or a down-regulated gene class, while ES here is designed to identify only a down-regulated gene class. The P-values of ES_U for an up-regulated alternative are 0.
Table 3 lists eight gene classes from the two-sided ES and T_two statistics under Q1 and under Q2; the corresponding ranks from T_ols are also shown. Seven of the eight, all except gene class 1, represent the most significant gene classes. For ES, the rankings under Q1 and under Q2 are similar.
For T_two, the rankings under Q1 and under Q2 are different. For example, the two largest classes and , which are ranked first and second in ES, are ranked 47 and 9 in T_two under Q1 and ranked 70 and 31 under Q2.
Eight gene classes are selected from the combination of the five smallest P-values (in parentheses) from ES and T_two under each hypothesis, Q1 and Q2. The gene classes are listed according to the P-values of ES under Q1. The ranks from the one-sided T_ols are shown for comparison.
The top-ranked genes from the two-sided and one-sided tests are different, as expected. Between the two lists (Tables 2 and 3), there are six gene classes in common. Gene class 38, which is ranked second and third by T_ols, is not in Table 3.