Linear Search Algorithm in Gene Name Batch Viewer


Gene Name Batch Viewer

For a given gene list, the viewer can quickly display all gene names, which is a straightforward feature. This manual therefore focuses mainly on the Related Genes/Terms Search algorithms provided by the viewer.
The Related Gene Searching Algorithm

1.  Introduction
2.  A Hypothetical Example
3.  Options and Results
4.  Kappa Statistics
The Related Annotation Term Searching Algorithm

1.  Introduction
2.  A Hypothetical Example
3.  Options and Results
4.  Kappa Statistics


The Related Gene Searching Algorithm
    1. Introduction

Any given gene is associated with a set of annotation terms. If genes share a similar set of those terms, they are most likely involved in similar biological mechanisms. The algorithm adopts kappa statistics to quantitatively measure the degree to which genes share similar annotation terms. Kappa ranges from 0 to 1: the higher the kappa value, the stronger the agreement. A kappa above 0.7 typically indicates strong agreement between two genes, and kappa values greater than 0.9 are considered excellent.

    2. A Hypothetical Example


Figure: A hypothetical example of detecting gene-gene functional relationships by kappa statistics. A. The redundant, structured annotation terms are broken into 'independent' terms in a flat linear collection. Each gene is associated with some of the terms in that collection, so a gene-annotation matrix can be built in binary format, where 1 represents a positive match for a particular gene-term pair and 0 represents the unknown. Thus, each gene has a unique profile of annotation terms, represented by a combination of 1s and 0s. B. For the particular example of genes a and b, a contingency table is constructed for the kappa calculation. The high kappa score (0.66) indicates that genes a and b agree considerably more than expected by random chance. By flipping the table 90 degrees (i.e., transposing the matrix), a term-term kappa score can be obtained in the same way, based on the agreement of common genes (not shown).
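To make the calculation in panel B concrete, the sketch below builds two binary annotation profiles and derives the contingency counts and kappa from them. The profiles are hypothetical (the figure's underlying data is not reproduced in this manual); they are chosen so that genes a and b score a kappa of about 0.66, matching the figure. The function and variable names are ours, not the viewer's.

# Hypothetical annotation profiles over 12 'independent' terms.
gene_a = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
gene_b = [1, 1, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0]

def contingency(a, b):
    """2x2 contingency counts over the shared annotation term collection."""
    both    = sum(1 for x, y in zip(a, b) if x == 1 and y == 1)
    a_only  = sum(1 for x, y in zip(a, b) if x == 1 and y == 0)
    b_only  = sum(1 for x, y in zip(a, b) if x == 0 and y == 1)
    neither = sum(1 for x, y in zip(a, b) if x == 0 and y == 0)
    return both, a_only, b_only, neither

def kappa(a, b):
    """Cohen's kappa: chance-corrected agreement of two binary profiles."""
    both, a_only, b_only, neither = contingency(a, b)
    n = both + a_only + b_only + neither
    p_obs = (both + neither) / n                  # observed agreement
    p_exp = ((both + a_only) * (both + b_only) +
             (neither + b_only) * (neither + a_only)) / n ** 2  # chance agreement
    return 1.0 if p_exp == 1 else (p_obs - p_exp) / (1 - p_exp)

print(contingency(gene_a, gene_b))      # (4, 1, 1, 6)
print(round(kappa(gene_a, gene_b), 2))  # 0.66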


    3. Options and Results



Overlap Threshold Option:


The minimum number of annotation terms shared by the query gene and a candidate gene for the candidate to be considered by the search algorithm. In most cases it should be above 3, for statistical reasons.

Kappa Threshold Option:


The minimum kappa value for a candidate to be considered. The higher the threshold, the stricter the search. The default is 0.25, and the setting ranges from 0 to 1.


Related Gene Column:


The genes found to be related to the query gene.

Agreement (Kappa) Column:


The agreement score calculated by kappa statistics. Kappa ranges from 0 to 1: the higher the kappa value, the stronger the agreement. A kappa above 0.7 typically indicates strong agreement between two genes, and kappa values greater than 0.9 are considered excellent.

Evidence Page:

The numbers of terms in agreement and disagreement between the query gene and the hit gene. These numbers are used to calculate the agreement score (kappa or Fisher's exact).
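Taken together, the options above drive the search roughly as follows: each candidate gene must share at least the overlap threshold of terms with the query gene, and the resulting kappa must meet the kappa threshold; surviving hits are sorted by kappa. The sketch below illustrates this flow under those assumptions; the function names and the dictionary-of-profiles layout are illustrative, not the viewer's actual implementation.

def kappa(a, b):
    """Cohen's kappa for two equal-length binary profiles (see Section 4)."""
    n = len(a)
    both = sum(x & y for x, y in zip(a, b))
    neither = sum((1 - x) & (1 - y) for x, y in zip(a, b))
    p_obs = (both + neither) / n
    p_exp = (sum(a) * sum(b) + (n - sum(a)) * (n - sum(b))) / n ** 2
    return 1.0 if p_exp == 1 else (p_obs - p_exp) / (1 - p_exp)

def related_genes(query, profiles, overlap_min=3, kappa_min=0.25):
    """Return (gene, kappa) hits sorted by agreement, best first."""
    q = profiles[query]
    hits = []
    for gene, profile in profiles.items():
        if gene == query:
            continue
        overlap = sum(x & y for x, y in zip(q, profile))  # terms in common
        if overlap < overlap_min:                 # Overlap Threshold option
            continue
        k = kappa(q, profile)
        if k >= kappa_min:                        # Kappa Threshold option
            hits.append((gene, round(k, 2)))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)

The defaults mirror the options described above: an overlap of at least 3 terms and a kappa of at least 0.25.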

    4. Kappa Statistics

The kappa statistic is a chance-corrected measure of agreement between two sets of categorized data. Kappa ranges from 0 to 1: the higher the kappa value, the stronger the agreement. If kappa = 1, there is perfect agreement; if kappa = 0, there is no agreement beyond chance. For further details about kappa statistics, please refer to Cohen, J. (1960), "A Coefficient of Agreement for Nominal Scales," Educational and Psychological Measurement 20: 37-46.
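For the 2x2 binary case used throughout this manual, the kappa statistic can be written as

\[
\kappa = \frac{P_o - P_e}{1 - P_e},
\qquad
P_o = \frac{O_{11} + O_{00}}{n},
\qquad
P_e = \frac{(O_{11}+O_{10})(O_{11}+O_{01}) + (O_{00}+O_{01})(O_{00}+O_{10})}{n^2}
\]

where O_11, O_10, O_01, O_00 are the four cells of the contingency table (O_11 = positive in both profiles, O_00 = negative in both), n is the total number of items compared, P_o is the observed agreement, and P_e is the agreement expected by chance from the marginal totals. Strictly speaking, kappa can fall below 0 when agreement is worse than chance; such pairs are simply not of interest here.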

The Related Term Searching Algorithm
    1. Introduction

Typically, a biological process/term is a cooperation of a set of genes. If two or more biological processes are carried out by a similar set of genes, the processes might somehow be related in the biological network. Identifying related biological processes/terms can help biologists assemble a bigger biological picture and better understand biological themes. This algorithm adopts kappa statistics to quantitatively measure the degree to which terms share similar participating genes. After scanning a given term against all other terms, the terms most closely related to the given one can be listed and sorted. Kappa ranges from 0 to 1: the higher the kappa value, the stronger the agreement. A kappa above 0.7 typically indicates strong agreement between two terms, and kappa values greater than 0.9 are considered excellent.

    2. A Hypothetical Example

After reducing participating-gene information to its most basic level using binary mode (1 represents 'Yes' and 0 represents 'No'), terms A and B share the same participating genes 1, 3, and n, whereas terms A and C share only gene 3. Obviously, the A-B relationship is stronger than the A-C relationship.


Raw Data Table:


          gene 1   gene 2   gene 3   gene n
Term A       1        0        1        1
Term B       1        0        1        1
Term C       0        0        1        0
Term D       1        0        0        1

2x2 contingency tables for both term pairs, based on the raw data above:


                   Term A
                   1          0
Term B    1     3 genes    0 genes
          0     0 genes    1 gene


                   Term A
                   1          0
Term C    1     1 gene     0 genes
          0     2 genes    1 gene

Kappa for term pair A-B = 1; kappa for A-C = 0.2. Therefore, the A-B relationship is much stronger than the A-C relationship.
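As a check, the short sketch below recomputes these scores directly from the raw data table above; the dictionary layout and names are ours. Term D, not discussed above, comes out in between, at kappa = 0.5.

# Binary term profiles from the Raw Data Table (columns: gene 1, 2, 3, n).
terms = {
    "Term A": [1, 0, 1, 1],
    "Term B": [1, 0, 1, 1],
    "Term C": [0, 0, 1, 0],
    "Term D": [1, 0, 0, 1],
}

def kappa(a, b):
    """Cohen's kappa for two equal-length binary profiles."""
    n = len(a)
    p_obs = sum(x == y for x, y in zip(a, b)) / n                     # observed
    p_exp = (sum(a) * sum(b) + (n - sum(a)) * (n - sum(b))) / n ** 2  # chance
    return 1.0 if p_exp == 1 else (p_obs - p_exp) / (1 - p_exp)

query = terms["Term A"]
for name in ("Term B", "Term C", "Term D"):
    print(name, round(kappa(query, terms[name]), 2))
# Term B 1.0   (identical profiles)
# Term C 0.2   (only gene 3 shared)
# Term D 0.5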
    3. Options and Results

Overlap Threshold Option:


The minimum number of genes shared by the query term and a candidate term for the candidate to be considered by the search algorithm. In most cases it should be above 3, for statistical reasons.

Kappa Threshold Option:

The minimum kappa value for a candidate to be considered. The higher the threshold, the stricter the search. The default is 0.25, and the setting ranges from 0 to 1.


Related Terms Column:


The terms found to be related to the query term.

Similarity (Kappa) Column:


The agreement score calculated by kappa statistics or Fisher's exact. Kappa ranges from 0 to 1: the higher the kappa value, the stronger the agreement. A kappa above 0.7 typically indicates strong agreement between two terms, and kappa values greater than 0.9 are considered excellent.

Evidence Page:


The numbers of genes in agreement and disagreement between the query term and the hit term. These numbers are used to calculate the agreement score (kappa or Fisher's exact).

    4. Kappa Statistics

The kappa statistic is a chance-corrected measure of agreement between two sets of categorized data. Kappa ranges from 0 to 1: the higher the kappa value, the stronger the agreement. If kappa = 1, there is perfect agreement; if kappa = 0, there is no agreement beyond chance. For further details about kappa statistics, please refer to Cohen, J. (1960), "A Coefficient of Agreement for Nominal Scales," Educational and Psychological Measurement 20: 37-46.


Last Updated:  Feb. 2005