Field Annotations in Output Tables
All tables are tab-separated text files. Some fields contain comma-separated lists.
NIA Mouse Gene Index: lgsun.grc.nia.nih.gov/geneindex/mm6
FirstEF (first exon finder): rulai.cshl.org/tools/FirstEF
File: TSSdata.txt - Coordinates and attributes of transcription start sites (TSS)
- TSS name (R-name)
- Chromosome
- Strand
- TSS (bp)
- Transcript (NIA Mouse Gene Index)
- Main transcript (1=yes, 0=no)
- Type
(0=NIA Mouse Gene Index, no match with FirstEF;
1=NIA Mouse Gene Index + match with FirstEF;
2=RefSeq + match with FirstEF;
3=NIA Mouse Gene Index + match with DBTSS;
4=DBTSS, no NIA Mouse Gene Index match)
- Quality (1=high, 2=medium, 3=low)
- Quality of TATA box (0=no TATA; 1=high TATAWA; 2=medium KATWWW; 3=low KAWWW)
- Position of TATA box (-9999 = no TATA)
- Position of initiator YYANWYYY (-9999 = no initiator)
- CpG island start
- CpG island end
- CpG island presence (1=yes, 0=no)
- RefSeq sequence name (from DBTSS), if the TSS matches a DBTSS entry within 300 bp
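A minimal Python sketch for reading one line of TSSdata.txt. It assumes the 15 columns appear in exactly the order listed above, that the file has no header row, and that an empty last field means there was no DBTSS match; `TSSRecord`, `parse_tss_line` and `_opt_int` are hypothetical names. The -9999 sentinel is mapped to `None`.

```python
from dataclasses import dataclass
from typing import Optional

NO_SITE = -9999  # sentinel for "no TATA box" / "no initiator"

# Codes of the "Type" column, copied from the list above
TSS_TYPES = {
    0: "NIA Mouse Gene Index, no match with FirstEF",
    1: "NIA Mouse Gene Index + match with FirstEF",
    2: "RefSeq + match with FirstEF",
    3: "NIA Mouse Gene Index + match with DBTSS",
    4: "DBTSS, no NIA Mouse Gene Index match",
}


@dataclass
class TSSRecord:  # hypothetical container, one per TSSdata.txt line
    name: str
    chrom: str
    strand: str
    tss: int
    transcript: str
    is_main: bool
    tss_type: int
    quality: int
    tata_quality: int
    tata_pos: Optional[int]
    inr_pos: Optional[int]
    cpg_start: Optional[int]
    cpg_end: Optional[int]
    has_cpg: bool
    refseq: Optional[str]


def _opt_int(field: str) -> Optional[int]:
    """Empty field or the -9999 sentinel -> None; otherwise an int."""
    if not field.strip():
        return None
    value = int(field)
    return None if value == NO_SITE else value


def parse_tss_line(line: str) -> TSSRecord:
    # Assumes the 15 columns appear in the order listed above, tab-separated
    f = line.rstrip("\n").split("\t")
    f += [""] * (15 - len(f))  # pad a missing trailing RefSeq column
    return TSSRecord(
        name=f[0], chrom=f[1], strand=f[2], tss=int(f[3]), transcript=f[4],
        is_main=f[5] == "1", tss_type=int(f[6]), quality=int(f[7]),
        tata_quality=int(f[8]), tata_pos=_opt_int(f[9]), inr_pos=_opt_int(f[10]),
        cpg_start=_opt_int(f[11]), cpg_end=_opt_int(f[12]), has_cpg=f[13] == "1",
        refseq=f[14].strip() or None,
    )
```

Mapping sentinels to `None` at parse time keeps later filters (for example "TSS with a TATA box") from accidentally treating -9999 as a real position.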
File: CRMcoord.txt - Coordinates and attributes of CRM
- Cis-regulatory module (CRM) name
- Chromosome
- Strand (always +)
- Length (bp)
- Start position (numbering starts from zero)
- Type (0=distal CRM; 1=promoter; 2=3'UTR)
- Quality (1=high, 2=medium, 3=low)
- U-clusters (NIA Mouse Gene Index), comma-separated list
- Distance to TSS of these U-clusters (negative=upstream; 1000000000=no TSS), comma-separated list
- Regulatory Potential Score (RPS)
- Probability that the RPS is greater than in semi-random sequences of the same size generated by a 3rd-order Markov process
(CpG-rich and CpG-poor regions were handled separately)
- False discovery rate (FDR) that the RPS is greater than in semi-random sequences
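The U-cluster and distance columns of CRMcoord.txt are parallel comma-separated lists. A minimal sketch, assuming the two lists have the same length and order, that pairs them and maps the 1000000000 "no TSS" value to `None`; the function names are hypothetical.

```python
from typing import List, Optional, Tuple

NO_TSS = 1000000000  # distance value that means "no TSS"


def cluster_distances(clusters_field: str, distances_field: str) -> List[Tuple[str, Optional[int]]]:
    """Pair each U-cluster with its distance to TSS; None when there is no TSS."""
    clusters = [c.strip() for c in clusters_field.split(",") if c.strip()]
    distances = [int(d) for d in distances_field.split(",") if d.strip()]
    if len(clusters) != len(distances):
        raise ValueError("U-cluster and distance lists have different lengths")
    return [(c, None if d == NO_TSS else d) for c, d in zip(clusters, distances)]


def upstream_clusters(pairs: List[Tuple[str, Optional[int]]]) -> List[str]:
    """U-clusters whose distance is negative (upstream, per the column description)."""
    return [c for c, d in pairs if d is not None and d < 0]
```

Returning a list of pairs rather than a dict keeps duplicate U-cluster names, should a CRM list the same cluster twice.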
File: patmatOutput.txt - Coordinates and attributes of TFBS
- Transcription factor binding site (TFBS) name
- Chromosome
- Strand
- Length (bp)
- Start position (numbering starts from zero)
- Mismatch score (from 0 to 0.2) = 1 - similarity score
- Conservation score (from 0 to 100)
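Because start positions in patmatOutput.txt are zero-based and each site has a length, converting to other coordinate conventions is simple arithmetic. A minimal sketch, assuming the length counts every base of the site; the helper names are hypothetical.

```python
from typing import Tuple


def to_half_open(start: int, length: int) -> Tuple[int, int]:
    """Zero-based start + length -> zero-based half-open [start, end), as in BED."""
    return start, start + length


def to_one_based_closed(start: int, length: int) -> Tuple[int, int]:
    """Same site as 1-based inclusive [first, last], as in GFF."""
    return start + 1, start + length


def similarity(mismatch: float) -> float:
    """Mismatch score is defined above as 1 - similarity score."""
    return 1.0 - mismatch
```

For example, a 10-bp site with start 100 covers [100, 110) in BED terms and bases 101 to 110 in 1-based terms.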
File: freqTFHMCpg.txt, freqTFHMNocpg.txt - Frequency distribution of TFBS in promoters (CpG-rich and CpG-poor, respectively)
Header row = center of each 25-bp interval within a 2000-bp promoter sequence (-1000 to +1000 relative to the TSS).
Header values are counted from the start of this sequence, so position 1000 corresponds to the TSS.
Other rows = TFBS name and abundance of TFBS per 1 Mb.
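A minimal sketch for loading one of these frequency tables and converting the header to positions relative to the TSS (header value minus 1000). It assumes the first header cell is a label rather than an interval center and that each data row starts with the TFBS name; `read_freq_table` and `peak_center` are hypothetical.

```python
from typing import Dict, List, Tuple

TSS_INDEX = 1000  # header value that corresponds to the TSS


def read_freq_table(lines: List[str]) -> Tuple[List[float], Dict[str, List[float]]]:
    """Return TSS-relative interval centers and per-TFBS abundance (per 1 Mb)."""
    header = lines[0].rstrip("\n").split("\t")
    centers = [float(c) - TSS_INDEX for c in header[1:]]  # assumes header[0] is a label
    rows: Dict[str, List[float]] = {}
    for line in lines[1:]:
        if not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        rows[fields[0]] = [float(x) for x in fields[1:]]
    return centers, rows


def peak_center(centers: List[float], values: List[float]) -> float:
    """TSS-relative center of the 25-bp interval with the highest abundance."""
    best = max(range(len(values)), key=values.__getitem__)
    return centers[best]
```

Centers are kept as floats because the middle of a 25-bp interval need not be a whole number.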
File: peaks.txt - Peaks of TFBS frequency distribution in promoters
- Transcription factor binding site (TFBS) name
- Palindromic (1=yes, 0=no)
- Peak start (relative to TSS)
- Peak end (relative to TSS)
- Peak width (bp)
- Type of promoters (CpG-rich or CpG-poor)
- z-value of peak maximum compared to background
- FDR that peak maximum is higher than the background
- Number of positively-oriented TFBS within the peak in CpG-poor promoters
- Number of negatively-oriented TFBS within the peak in CpG-poor promoters
- Number of positively-oriented TFBS within the peak in CpG-rich promoters
- Number of negatively-oriented TFBS within the peak in CpG-rich promoters
- Peak/background ratio of positively-oriented TFBS within the peak in CpG-poor promoters
- Peak/background ratio of negatively-oriented TFBS within the peak in CpG-poor promoters
- Peak/background ratio of positively-oriented TFBS within the peak in CpG-rich promoters
- Peak/background ratio of negatively-oriented TFBS within the peak in CpG-rich promoters
- Forward/backward ratio of TFBS within the peak in CpG-poor promoters (N/A = not applicable for palindromes)
- Forward/backward ratio of TFBS within the peak in CpG-rich promoters (N/A = not applicable for palindromes)
- FDR of preferred TFBS orientation within the peak in CpG-poor promoters (N/S = not significant at FDR < 0.05)
- FDR of preferred TFBS orientation within the peak in CpG-rich promoters (N/S = not significant at FDR < 0.05)
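The orientation columns of peaks.txt mix numbers with the N/A and N/S markers. A minimal sketch of one way to summarize them per promoter class; it assumes the forward/backward ratio is (positively oriented count) / (negatively oriented count), and `preferred_orientation` is a hypothetical helper, not part of the pipeline.

```python
from typing import Optional


def parse_ratio(field: str) -> Optional[float]:
    """'N/A' (palindromic TFBS) -> None; otherwise the forward/backward ratio."""
    field = field.strip()
    return None if field == "N/A" else float(field)


def preferred_orientation(ratio_field: str, fdr_field: str) -> str:
    """Hypothetical summary: '+', '-', or 'none' for one promoter class.

    Assumes ratio = (+ oriented count) / (- oriented count) and that
    'N/S' in the FDR column means no significant orientation preference.
    """
    ratio = parse_ratio(ratio_field)
    if ratio is None or fdr_field.strip() == "N/S":
        return "none"
    if ratio > 1:
        return "+"
    if ratio < 1:
        return "-"
    return "none"
```

Checking the FDR marker before the ratio keeps a large but non-significant ratio from being reported as a preference.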
File: distanceSignif.txt - Pairs of TFBS that are significantly over-represented at a specific distance
- First transcription factor binding site (TFBS1) name
- Second transcription factor binding site (TFBS2) name (TFBS2 >= TFBS1)
- Distance interval number where the peak is located (starts from 5 with 5bp width)
- Mutual orientation (0=+a+b, 1=+b+a, 2=+a-b, 3=-b+a, a=TFBS1, b=TFBS2)
- Distance interval start - end
- Orientation of both TFBS (a=TFBS1, b=TFBS2)
- Peak height
- Background
- Peak/background ratio
- Peak/background ratio in semi-random sequences
- Peak/background ratio in repeats
- z-value of peak height compared to background
- p-value that peak is higher than the background
- FDR that peak is higher than the background
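The mutual orientation code (0 to 3) can be turned back into a readable arrangement by substituting the two TFBS names for a and b. A minimal sketch; `describe_orientation` is a hypothetical helper and the code table is copied from the list above.

```python
MUTUAL_ORIENTATION = {0: "+a+b", 1: "+b+a", 2: "+a-b", 3: "-b+a"}


def describe_orientation(code: int, tfbs1: str, tfbs2: str) -> str:
    """Replace a/b in the orientation code with the TFBS names (a=TFBS1, b=TFBS2)."""
    pattern = MUTUAL_ORIENTATION[code]
    names = {"a": tfbs1, "b": tfbs2}
    # "+a-b" -> [("+", "a"), ("-", "b")]
    tokens = [(pattern[i], pattern[i + 1]) for i in range(0, len(pattern), 2)]
    return " ".join(f"{sign}{names[letter]}" for sign, letter in tokens)
```

For example, code 2 for the pair (SP1, NFY) reads as "+SP1 -NFY".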
File: distanceDistrib.txt - Frequency distribution of distances between TFBS pairs in promoters
Header row = distance between TFBS1 and TFBS2 (center of 5-bp interval)
Other rows = TFBS names, orientation, and the number of TFBS pairs
- First transcription factor binding site (TFBS1) name
- Second transcription factor binding site (TFBS2) name (TFBS2 >= TFBS1)
- Mutual orientation (0=+a+b, 1=+b+a, 2=+a-b, 3=-b+a, a=TFBS1, b=TFBS2)
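A minimal sketch for loading distanceDistrib.txt into a table keyed by (TFBS1, TFBS2, orientation). It assumes the header row starts with three label cells aligned with the three key columns, followed by the distance centers; `read_distance_table` and `most_frequent_distance` are hypothetical.

```python
from typing import Dict, List, Tuple

PairKey = Tuple[str, str, int]  # (TFBS1, TFBS2, mutual orientation code)


def read_distance_table(lines: List[str], n_key_cols: int = 3) -> Tuple[List[float], Dict[PairKey, List[int]]]:
    header = lines[0].rstrip("\n").split("\t")
    distances = [float(c) for c in header[n_key_cols:]]  # assumes 3 leading label cells
    table: Dict[PairKey, List[int]] = {}
    for line in lines[1:]:
        if not line.strip():
            continue
        f = line.rstrip("\n").split("\t")
        table[(f[0], f[1], int(f[2]))] = [int(float(x)) for x in f[n_key_cols:]]
    return distances, table


def most_frequent_distance(distances: List[float], counts: List[int]) -> float:
    """Center of the 5-bp distance interval with the most TFBS pairs."""
    best = max(range(len(counts)), key=counts.__getitem__)
    return distances[best]
```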
File: GOsummary.txt - Summary of association between TFBS and GO-terms of target genes
- Transcription factor binding site (TFBS) name
- Gene ontology (GO) ID
- Gene ontology annotation
- Total no. of genes with this GO term (among those that have a high- or medium-quality TSS)
- Location of TFBS (at_TSS = from -200 to +50; before_TSS = from -1000 to -200; after_TSS = from 0 to +500)
- Number of genes with this GO term that have this TFBS at the given location
- Was this TFBS over-represented alone (yes) or only in combination with other TFBS (no)?
- Other TFBS with which the given TFBS was over-represented in pairs
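The three location windows overlap, so a single TFBS position can count toward more than one location. A minimal sketch that lists every window containing a TSS-relative position; treating both bounds as inclusive is an assumption, and `locations_of` is a hypothetical name.

```python
from typing import List

# Windows relative to the TSS, as listed above; inclusive bounds are an assumption
LOCATIONS = {
    "at_TSS": (-200, 50),
    "before_TSS": (-1000, -200),
    "after_TSS": (0, 500),
}


def locations_of(pos: int) -> List[str]:
    """All location labels whose window contains the TSS-relative position."""
    return [name for name, (lo, hi) in LOCATIONS.items() if lo <= pos <= hi]
```

For instance, a site at +20 falls in both at_TSS and after_TSS, while a site at +800 falls in none of them.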
File: GOpairs.txt - Pairs of TFBS associated with GO-terms of target genes
- First transcription factor binding site (TFBS1) name
- Second transcription factor binding site (TFBS2) name (TFBS2 >= TFBS1)
- Location of TFBS (at_TSS = from -200 to +50; before_TSS = from -1000 to -200; after_TSS = from 0 to +500)
- Number of GO terms associated with this pair
- Main gene ontology (GO) ID
- Main gene ontology annotation
- Number of genes with the main GO term that have this pair of TFBS
- Total number of genes with the main GO term (among those that have a high- or medium-quality TSS)
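The last two columns of GOpairs.txt give the share of genes with the main GO term that carry the TFBS pair. A minimal sketch, assuming the eight columns appear in the order listed above; the parser and field names are hypothetical.

```python
from typing import Dict


def parse_gopairs_line(line: str) -> Dict[str, object]:
    """Hypothetical parser; assumes the 8 columns listed above, tab-separated."""
    f = line.rstrip("\n").split("\t")
    return {
        "tfbs1": f[0], "tfbs2": f[1], "location": f[2], "n_go_terms": int(f[3]),
        "go_id": f[4], "go_name": f[5],
        "genes_with_pair": int(f[6]), "genes_total": int(f[7]),
    }


def pair_coverage(rec: Dict[str, object]) -> float:
    """Fraction of genes with the main GO term that carry the TFBS pair."""
    total = rec["genes_total"]
    return rec["genes_with_pair"] / total if total else 0.0
```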
File: goTSS.txt, goAfter.txt, goBefore.txt, goDCRM.txt - Details on association between TFBS and GO-terms of target genes
- First transcription factor binding site (TFBS1) name
- Second transcription factor binding site (TFBS2) name (TFBS2 >= TFBS1)
- Single occurrence (1) or multiple occurrences (2)
- All TFBS considered (0) or only conserved TFBS with conservation score >= 0.5 (1)
- Gene ontology (GO) ID
- Number of genes with this GO term that have this TFBS (or pair of TFBS)
- Total no. of genes with this GO term (among those that have a high- or medium-quality TSS)
- Number of genes that have this TFBS (or pair of TFBS) (among those that have a high- or medium-quality TSS)
- Total no. of genes (among those that have a high- or medium-quality TSS)
- Over-representation ratio (= #6 * #9 / #7 / #8, where #N is the value in column N)
- z-value of over-representation
- p-value of over-representation
- FDR of over-representation
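The over-representation ratio is the observed number of genes with both the GO term and the TFBS (column 6) divided by the number expected if the two were independent, #7 * #8 / #9. A minimal sketch of that arithmetic; the function names are hypothetical, and the z-value and p-value methods are not reproduced here.

```python
def expected_count(n_go: int, n_tfbs: int, n_total: int) -> float:
    """Genes expected to have both the GO term and the TFBS under independence (#7 * #8 / #9)."""
    return n_go * n_tfbs / n_total


def over_representation(n_both: int, n_go: int, n_tfbs: int, n_total: int) -> float:
    """Observed / expected, i.e. #6 * #9 / #7 / #8."""
    return n_both * n_total / n_go / n_tfbs
```

With 30 genes having both (#6), 200 genes with the GO term (#7), 1500 genes with the TFBS (#8) and 15000 genes in total (#9), 20 genes are expected and the ratio is 1.5.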
File: TFmatrix.txt - TFBS position-weight matrices
Header line for each TFBS:
- TFBS name
- TFBS pattern
- TFBS group
For each position:
- Position, starting from zero
- Frequency of A
- Frequency of C
- Frequency of G
- Frequency of T
- Sum of frequencies
- Part of the core pattern (1/0)
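A minimal sketch for one position line of TFmatrix.txt, assuming the seven fields listed above are whitespace-separated. It normalizes the four frequencies by their sum and computes information content in bits against a uniform background (2 + sum of p * log2 p); that background is an assumption, not necessarily what the pipeline used.

```python
import math
from typing import List, Sequence, Tuple


def parse_position(line: str) -> Tuple[int, List[float], bool]:
    """(position, [A, C, G, T] frequencies, part of core pattern)."""
    f = line.split()
    return int(f[0]), [float(x) for x in f[1:5]], f[6] == "1"


def information_content(freqs: Sequence[float]) -> float:
    """Bits at one matrix position, uniform background assumed."""
    total = sum(freqs)
    ic = 2.0
    for f in freqs:
        if f > 0:
            p = f / total
            ic += p * math.log2(p)
    return ic


def consensus(rows: List[Sequence[float]]) -> str:
    """Most frequent base at each position (ties go to the first of A, C, G, T)."""
    return "".join("ACGT"[max(range(4), key=lambda i: r[i])] for r in rows)
```

A fully conserved position scores 2 bits, and a position with equal frequencies of all four bases scores 0.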
File: TFcore.txt - Core patterns derived from TFBS matrices
- TFBS name
- TFBS pattern
- TFBS core pattern
- Offset of the core pattern
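A minimal consistency check for TFcore.txt, assuming the offset is the zero-based position of the core pattern within the full TFBS pattern; `core_at_offset` is a hypothetical helper and the example pattern is made up.

```python
def core_at_offset(pattern: str, core: str, offset: int) -> bool:
    """True if the core occurs in the full pattern at the given offset.

    Assumes the offset is a zero-based index into the TFBS pattern.
    """
    return pattern[offset:offset + len(core)] == core
```

For the made-up pattern GGGGCGGGGC with core GCGG, the check succeeds at offset 3 and fails at offset 0.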
File: TFpattern.txt - TFBS patterns
- TFBS name
- TFBS pattern
- TFBS group
- Used as a matrix because of high information content (1/0)
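The TATA box and initiator patterns in TSSdata.txt (TATAWA, KATWWW, KAWWW, YYANWYYY) use IUPAC nucleotide codes. A minimal sketch that converts such a pattern to a regular expression and scans the forward strand of a sequence; it assumes TFpattern.txt patterns use the same codes, and `pattern_to_regex` and `find_sites` are hypothetical.

```python
import re

# IUPAC nucleotide ambiguity codes
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def pattern_to_regex(pattern: str) -> "re.Pattern[str]":
    return re.compile("".join(IUPAC[ch] for ch in pattern.upper()))


def find_sites(pattern: str, seq: str) -> list:
    """Zero-based starts of forward-strand matches, overlapping matches included."""
    rx = pattern_to_regex(pattern)
    seq = seq.upper()
    return [i for i in range(len(seq) - len(pattern) + 1) if rx.match(seq, i)]
```

Testing every start with `match(seq, i)` rather than using `finditer` keeps overlapping hits, which matter for short, degenerate motifs.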