| 1. Introduction
2. General Analysis Data Flow
4. View Results in Text Mode
5. View Gene-annotation on 2-D View
6. Introduction to heuristic fuzzy clustering
|Grouping genes based on functional similarity can systematically enhance the biological interpretation of large gene lists derived from high-throughput studies. The Functional Classification Tool generates a gene-to-gene similarity matrix based on shared functional annotation, using over 75,000 terms from 14 functional annotation sources. Our novel clustering algorithm classifies highly related genes into functionally related groups. Tools are provided to further explore each functional gene cluster, including a listing of the “consensus terms” shared by the genes in the cluster, a display of enriched terms, and a heat map visualization of gene-to-term relationships. A global view of cluster-to-cluster relationships is provided by a fuzzy heat map visualization. Summary information provided by the Functional Classification Tool is extensively linked to the DAVID Functional Annotation Tools and to external databases, allowing further detailed exploration of gene and term information. The Functional Classification Tool provides a rapid means to organize large lists of genes into functionally related groups and so helps unravel the biological content captured by high-throughput technologies.|
|2. General Analysis Data Flow|
Clustering Stringency (lowest -> highest): a single high-level control that sets the group of detailed parameters used by the functional classification algorithms. In general, a higher stringency setting generates fewer functional groups with more tightly associated genes in each group, so more genes are treated as “irrelevant” and placed in the unclustered group. The default setting is Medium, which gives balanced results for most cases based on our studies. Customize lets you set the individual parameters yourself through the Advanced options.
Similarity Term Overlap (any value >=0; default = 4): the minimum number of annotation terms that two genes must share to qualify for the kappa calculation. This parameter maintains the statistical power needed to make the kappa value meaningful. The higher the value, the more meaningful the result.
Similarity Threshold (any value between 0 and 1; default = 0.35): the minimum kappa value to be considered biologically significant. The higher the setting, the more genes are put into the unclustered group, which leads to a higher-quality functional classification result with fewer groups and fewer gene members. Based on our genome-wide distribution study, a kappa value of 0.3 is where results start to be biologically meaningful. Anything below 0.3 has a high chance of being noise.
Initial Group Members (any value >=2; default = 4): the minimum number of genes in a seeding group, which affects the minimum size of each functional group in the final result. In general, a lower value includes more genes in functional groups and, in particular, generates many small groups.
Final Group Members (any value >=2; default = 4): the minimum number of genes in a final group after the “cleanup” procedure. In general, a lower value includes more genes in functional groups and, in particular, generates many small groups. It works together with the previous parameters to control the minimum size of functional groups. If you are interested in functional groups containing only 2 or 3 genes, you need to set it to a very low value. Otherwise, such small groups will not be displayed and their genes will be put into the unclustered group.
Multi-linkage Threshold (any value between 0% and 100%; default = 50%): controls how seeding groups merge with each other, i.e. two groups that share more than this percentage of their gene members become one group. In general, a higher percentage gives sharper separation, i.e. it generates more final functional groups with more tightly associated genes in each group. Changing this parameter does not add extra genes to the unclustered group.
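To make the interplay of Similarity Term Overlap and Similarity Threshold concrete, here is a minimal Python sketch. It assumes that the kappa value is Cohen's kappa computed over the two genes' binary term profiles (term present or absent); the function names `kappa` and `is_related` and the argument `total_terms` are hypothetical and are not part of the tool itself.

```python
def kappa(terms_a, terms_b, total_terms):
    """Cohen's kappa between two genes' binary annotation profiles.

    terms_a, terms_b: sets of annotation terms assigned to each gene.
    total_terms: size of the term universe both profiles are drawn from
    (an assumption of this sketch).
    """
    both = len(terms_a & terms_b)       # terms annotated to both genes
    only_a = len(terms_a - terms_b)     # terms annotated to gene A only
    only_b = len(terms_b - terms_a)     # terms annotated to gene B only
    neither = total_terms - both - only_a - only_b

    observed = (both + neither) / total_terms
    p_a = (both + only_a) / total_terms
    p_b = (both + only_b) / total_terms
    expected = p_a * p_b + (1 - p_a) * (1 - p_b)
    if expected == 1.0:                 # degenerate profiles, kappa undefined
        return 0.0
    return (observed - expected) / (1 - expected)


def is_related(terms_a, terms_b, total_terms, min_overlap=4, threshold=0.35):
    """Apply the two similarity parameters in order."""
    # Similarity Term Overlap: too few shared terms, so kappa is not computed.
    if len(terms_a & terms_b) < min_overlap:
        return False
    # Similarity Threshold: minimum kappa to count as a relationship.
    return kappa(terms_a, terms_b, total_terms) >= threshold


gene_a = {"t1", "t2", "t3", "t4", "t5"}
gene_b = {"t1", "t2", "t3", "t4", "t6"}
print(kappa(gene_a, gene_b, total_terms=10))
print(is_related(gene_a, gene_b, total_terms=10))
```

In this toy example the two genes share four of ten terms, so the pair passes the default overlap of 4 before the kappa value is compared against the default threshold of 0.35.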
|4. View Results in Text Mode|
Gene(s) not in the output: genes in the user’s list that are NOT mapped to any of the functional groups, i.e. orphan or irrelevant genes. The possible reasons are: 1. The gene has no relationship above the similarity threshold with any other gene. 2. The gene is related to a few other genes, but together they do not have enough members to form a functional group under the minimum number of final group members. 3. False negative. Our current algorithm can have a false negative rate of up to 2%. If you believe this has happened with your list, please report it to us.
Enriched Term in Group (T): submits the gene members of the group to our functional annotation engine. The resulting DAVID chart report highlights the most likely biology associated with the group.
2-D View: lets the user see gene members and their associated annotation terms in a heat-map-style view, so that the user can further explore gene-gene and term-term relationships within a group. The terms displayed in the map must pass the term frequency setting in the options section, i.e. by default 50% of the genes must be associated with the term.
Group Enrichment Score: ranks the biological significance of gene groups based on the overall EASE scores of all enriched annotation terms. In other words: step 1, run the user's gene list through the DAVID functional annotation chart to get a p-value (EASE score) for each enriched annotation term; step 2, calculate the geometric mean of the EASE scores of the terms involved in this gene group.
Search Related Genes (RG): summarizes the common (consensus) annotation term profile of the functional group based on term frequency and asks the question “which other genes have a similar annotation term profile?”. The function lets the user search within the user’s list or within a defined genome, e.g. Homo sapiens.
2-D View: lets users examine the commonalities and differences in annotation across the gene members of the group. See the 2-D View section for more details.
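The second step of the Group Enrichment Score described above can be sketched in a few lines of Python. The text only states that a geometric mean of EASE scores is taken; reporting the result on a -log10 scale, so that a larger score means a more significant group, is an assumption of this sketch, as are the function names.

```python
import math


def geometric_mean(p_values):
    """Geometric mean of a non-empty list of p-values, each > 0."""
    if not p_values:
        raise ValueError("at least one EASE score is required")
    # Average in log space to avoid underflow with very small p-values.
    return math.exp(sum(math.log(p) for p in p_values) / len(p_values))


def group_enrichment_score(p_values):
    """Assumed convention: -log10 of the geometric mean of the EASE scores."""
    return -math.log10(geometric_mean(p_values))


# Hypothetical EASE scores of the enriched terms in one gene group.
ease_scores = [0.01, 0.0001]
print(geometric_mean(ease_scores))
print(group_enrichment_score(ease_scores))
```

Under this convention a group whose terms have EASE scores of 0.01 and 0.0001 has a geometric mean of about 0.001, which corresponds to a score of about 3.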
Gene-Annotation Association on 2-D View
Multiple Linkage Clustering
|We developed a novel heuristic partitioning procedure that allows an object (gene) to participate in more than one cluster. Using this method to group related genes better reflects the nature of biology, in that a given gene may be associated with more than one functional group of genes. Two additional advancements included in this algorithm are: 1) the automatic determination of the optimal number of clusters (K), and 2) the exclusion of members (genes) that have weak relationships to other members. Users are permitted to change the default parameters to set cluster membership similarity stringencies. Fuzzy Heuristic Partitioning of a gene list yields high quality clusters of highly related genes, with some genes participating in more than one functional cluster.
Algorithm:
o Fuzzy seeding by allowing each gene to serve as a medoid (# neighbors > 4 && cross relevance > 50%)
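The seeding step, together with the multi-linkage merging controlled by the Multi-linkage Threshold, might look roughly like the Python sketch below. Several points are assumptions rather than the tool's documented behavior: "cross relevance" is read here as the fraction of neighbor pairs that are themselves related, group overlap is measured against the smaller of the two groups, and `related` is a hypothetical predicate standing in for the kappa test.

```python
from itertools import combinations


def seed_groups(genes, related, min_neighbors=4, min_cross=0.5):
    """Let every gene act as a medoid and keep the qualifying seed groups."""
    seeds = []
    for medoid in genes:
        neighbors = [g for g in genes if g != medoid and related(medoid, g)]
        if len(neighbors) <= min_neighbors:      # source: "# neighbors > 4"
            continue
        pairs = list(combinations(neighbors, 2))
        linked = sum(1 for a, b in pairs if related(a, b))
        # Assumed reading of "cross relevance > 50%".
        if pairs and linked / len(pairs) > min_cross:
            seeds.append(frozenset([medoid, *neighbors]))
    return seeds


def merge_groups(groups, threshold=0.5):
    """Repeatedly merge any two groups whose shared members exceed threshold."""
    groups = [set(g) for g in groups]
    merged = True
    while merged:
        merged = False
        for i, j in combinations(range(len(groups)), 2):
            shared = len(groups[i] & groups[j])
            smaller = min(len(groups[i]), len(groups[j]))  # assumed denominator
            if shared / smaller > threshold:
                groups[i] |= groups[j]
                del groups[j]
                merged = True
                break                            # restart after every merge
    return groups


# Toy data: six mutually related genes plus one unrelated gene.
clique = {"g1", "g2", "g3", "g4", "g5", "g6"}
genes = sorted(clique | {"g7"})


def related(a, b):
    return a in clique and b in clique


print(merge_groups(seed_groups(genes, related)))
```

Because every gene is tried as a medoid, the six related genes each produce the same seed group, and the merge step collapses them into a single final group. The unrelated gene never seeds a group and would end up unclustered. A gene could equally appear in two surviving groups if they overlap by less than the threshold, which is what makes the partitioning fuzzy.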
Last Edit: Jan. 2007