CisView: Mouse (mm9, Jul 2007)

Field Annotations in Output Tables

All tables are tab-separated text files. Some fields contain comma-separated lists.
NIA Mouse Gene Index: lgsun.grc.nia.nih.gov/geneindex/mm9
FirstEF (first exon finder): rulai.cshl.org/tools/FirstEF

File: TSSdata.txt - Coordinates and attributes of TSS

  1. TSS name (R-name)
  2. Chromosome
  3. Strand
  4. TSS (bp)
  5. Transcript (NIA Mouse Gene Index)
  6. Main transcript (1=yes, 0=no)
  7. Type (0=NIA Mouse Gene Index, no match with FirstEF; 1=NIA Mouse Gene Index + match with FirstEF; 2=RefSeq + match with FirstEF; 3=NIA Mouse Gene Index + match with DBTSS; 4=DBTSS, no NIA Mouse Gene Index match)
  8. Quality (1=high, 2=medium, 3=low)
  9. Quality of TATA box (0=no TATA; 1=high TATAWA; 2=medium KATWWW; 3=low KAWWW)
  10. Position of TATA box (-9999 = no TATA)
  11. Position of initiator YYANWYYY (-9999 = no initiator)
  12. CpG island start
  13. CpG island end
  14. CpG island presence (1=yes, 0=no)
  15. RefSeq sequence name (from DBTSS), if the TSS matches DBTSS within 300 bp
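
As a rough illustration of the TSSdata.txt layout above, the following Python sketch splits one tab-separated row into named fields and maps the -9999 sentinel to None. The field names (tss_name, tata_pos, and so on) and the sample row are hypothetical and exist only for this example. The sketch also assumes the file has no header row and that column 15 may be empty, neither of which is stated here.

```python
# Hypothetical names for the 15 columns of TSSdata.txt (not defined by CisView).
TSS_FIELDS = [
    "tss_name", "chrom", "strand", "tss_bp", "transcript", "is_main",
    "type", "quality", "tata_quality", "tata_pos", "inr_pos",
    "cpg_start", "cpg_end", "has_cpg", "refseq_name",
]
NO_MOTIF = -9999  # sentinel documented for columns 10 and 11


def parse_tss_row(line):
    """Split one tab-separated row into a dict keyed by the names above."""
    values = line.rstrip("\n").split("\t")
    # Pad short rows; assumes column 15 can be missing when there is no DBTSS match.
    values += [""] * (len(TSS_FIELDS) - len(values))
    row = dict(zip(TSS_FIELDS, values))
    for key in ("tata_pos", "inr_pos"):
        pos = int(row[key])
        row[key] = None if pos == NO_MOTIF else pos
    return row


# Made-up row, for illustration only.
example = "R0001\tchr1\t+\t3000000\tU000001\t1\t1\t1\t0\t-9999\t-2\t2999500\t3000400\t1\t"
parsed = parse_tss_row(example)
print(parsed["tata_pos"], parsed["inr_pos"])
```

Returning None for a missing TATA box or initiator keeps the sentinel from being mistaken for a real position in later arithmetic.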

File: CRMcoord.txt - Coordinates and attributes of CRM

  1. Cis-regulatory module (CRM) name
  2. Chromosome
  3. Strand (always +)
  4. Length (bp)
  5. Start position (numbering starts from zero)
  6. Type (0=distal CRM; 1=promoter; 2=3'UTR)
  7. Quality (1=high, 2=medium, 3=low)
  8. U-clusters (NIA Mouse Gene Index), comma-separated list
  9. Distance to TSS of these U-clusters (negative=upstream; 1000000000=no TSS), comma-separated list
  10. Regulatory Potential Score (RPS)
  11. Probability that RPS is greater than in semi-random sequences of the same size generated with a 3rd-order Markov process (CpG-rich and CpG-poor regions were handled separately)
  12. False discovery rate (FDR) that RPS is greater than in semi-random sequences
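
The two comma-separated columns of CRMcoord.txt (8 and 9) and the zero-based start position can be handled along the lines of the sketch below. It assumes the two lists are parallel, that is, the n-th distance belongs to the n-th U-cluster, and it expresses the CRM as a half-open interval; both are assumptions for illustration. The function names and sample values are hypothetical.

```python
NO_TSS = 1000000000  # sentinel documented for column 9


def crm_interval(start, length):
    """Zero-based start and length -> half-open interval [start, end)."""
    return start, start + length


def parse_tss_distances(clusters_field, distances_field):
    """Pair each U-cluster with its distance to the TSS; None marks 'no TSS'."""
    clusters = clusters_field.split(",") if clusters_field else []
    distances = [int(d) for d in distances_field.split(",")] if distances_field else []
    return [(c, None if d == NO_TSS else d) for c, d in zip(clusters, distances)]


# Made-up values, for illustration only.
pairs = parse_tss_distances("U000001,U000002", "-350,1000000000")
print(pairs)
print(crm_interval(100, 250))
```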

File: patmatOutput.txt - Coordinates and attributes of TFBS

  1. Transcription factor binding site (TFBS) name
  2. Chromosome
  3. Strand
  4. Length (bp)
  5. Start position (numbering starts from zero)
  6. Mismatch score (from 0 to 0.2) = 1 - similarity score
  7. Conservation score (from 0 to 100)
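
Because column 6 of patmatOutput.txt is defined as 1 minus the similarity score, the similarity can be recovered directly. The short sketch below shows the conversion; the function name is hypothetical, and the range check simply mirrors the documented bounds of 0 to 0.2.

```python
def similarity_from_mismatch(mismatch):
    """Column 6 is 1 - similarity, so similarity = 1 - mismatch."""
    if not 0.0 <= mismatch <= 0.2:
        raise ValueError("mismatch score outside the documented range 0 to 0.2")
    return 1.0 - mismatch


print(similarity_from_mismatch(0.0))
```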

File: freqTFHMCpg.txt, freqTFHMNocpg.txt - Frequency distribution of TFBS in promoters

Header row = center of 25-bp interval within a 2000 bp promoter sequence (-1000 to +1000)
Thus, position 1000 bp corresponds to the TSS.
Other rows = TFBS name and abundance of TFBS per 1 Mb.
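
Since the header positions run over the 2000-bp promoter sequence with 1000 marking the TSS, subtracting 1000 gives TSS-relative coordinates. The sketch below does this conversion and splits a data row into a TFBS name and its abundance values. It assumes the header values are numeric and that each data row starts with the TFBS name; the function names, the TFBS name, and the sample numbers are made up.

```python
TSS_OFFSET = 1000  # header position that corresponds to the TSS


def to_tss_relative(header_centers):
    """Shift interval centers so that the TSS sits at 0."""
    return [c - TSS_OFFSET for c in header_centers]


def parse_freq_row(line):
    """Return (TFBS name, list of abundances per 1 Mb) from one data row."""
    name, *values = line.rstrip("\n").split("\t")
    return name, [float(v) for v in values]


# Made-up values, for illustration only.
print(to_tss_relative([0, 1000, 2000]))
name, abundance = parse_freq_row("TF_EXAMPLE\t12.5\t30\t8")
print(name, abundance)
```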

File: peaks.txt - Peaks of TFBS frequency distribution in promoters

  1. Transcription factor binding site (TFBS) name
  2. Palindromic (1=yes, 0=no)
  3. Peak start (relative to TSS)
  4. Peak end (relative to TSS)
  5. Peak width, bp
  6. Type of promoters (CpG-rich or CpG-poor)
  7. z-value of peak maximum compared to background
  8. FDR that peak maximum is higher than the background
  9. Number of positively-oriented TFBS within the peak in CpG-poor promoters
  10. Number of negatively-oriented TFBS within the peak in CpG-poor promoters
  11. Number of positively-oriented TFBS within the peak in CpG-rich promoters
  12. Number of negatively-oriented TFBS within the peak in CpG-rich promoters
  13. Peak/background ratio of positively-oriented TFBS within the peak in CpG-poor promoters
  14. Peak/background ratio of negatively-oriented TFBS within the peak in CpG-poor promoters
  15. Peak/background ratio of positively-oriented TFBS within the peak in CpG-rich promoters
  16. Peak/background ratio of negatively-oriented TFBS within the peak in CpG-rich promoters
  17. Forward/back ratio of TFBS within the peak in CpG-poor promoters (N/A = not applicable for palindromes)
  18. Forward/back ratio of TFBS within the peak in CpG-rich promoters (N/A = not applicable for palindromes)
  19. FDR of TFBS preferred orientation within the peak in CpG-poor promoters (N/S = not significant at FDR<0.05)
  20. FDR of TFBS preferred orientation within the peak in CpG-rich promoters (N/S = not significant at FDR<0.05)

File: distanceSignif.txt - Pairs of TFBS that are significantly over-represented at a specific distance

  1. First transcription factor binding site (TFBS1) name
  2. Second transcription factor binding site (TFBS2) name (TFBS2 >= TFBS1)
  3. Distance interval number where the peak is located (starts from 5 with 5bp width)
  4. Mutual orientation (0=+a+b, 1=+b+a, 2=+a-b, 3=-b+a, a=TFBS1, b=TFBS2)
  5. Distance interval start - end
  6. Orientation of both TFBS (a=TFBS1, b=TFBS2)
  7. Peak height
  8. Background
  9. Peak/background ratio
  10. Peak/background ratio in semi-random sequences
  11. Peak/background ratio in repeats
  12. z-value of peak height compared to background
  13. p-value that peak is higher than the background
  14. FDR that peak is higher than the background
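
The mutual orientation codes in column 4 can be expanded with a small lookup, sketched below. The mapping itself comes from the field description; reading each label as two (site, strand) pairs in the order listed is an interpretation made for this example, and the function name is hypothetical.

```python
# Codes as documented: a = TFBS1, b = TFBS2.
ORIENTATION = {0: "+a+b", 1: "+b+a", 2: "+a-b", 3: "-b+a"}
SITE_NAMES = {"a": "TFBS1", "b": "TFBS2"}


def describe_orientation(code):
    """Expand a code into [(site, strand), (site, strand)] in label order."""
    label = ORIENTATION[int(code)]
    halves = (label[:2], label[2:])  # e.g. "+a" and "-b"
    return [(SITE_NAMES[h[1]], h[0]) for h in halves]


print(describe_orientation(2))
print(describe_orientation(3))
```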

File: distanceDistrib.txt - Frequency distribution of distances between pairs of TFBS

Header row = distance between TFBS1 and TFBS2 (center of 5-bp interval)
Other rows = TFBS names, orientation, and the number of TFBS pairs

  1. First transcription factor binding site (TFBS1) name
  2. Second transcription factor binding site (TFBS2) name (TFBS2 >= TFBS1)
  3. Mutual orientation (0=+a+b, 1=+b+a, 2=+a-b, 3=-b+a, a=TFBS1, b=TFBS2)

File: GOsummary.txt - Summary of association between TFBS and GO-terms of target genes

  1. Transcription factor binding site (TFBS) name
  2. Gene ontology (GO) ID
  3. Gene ontology annotation
  4. Total no. of genes with this GO term (among those that have a high- or medium-quality TSS)
  5. Location of TFBS (at_TSS= from -200 to 50; before_TSS=from -1000 to -200; after_TSS= from 0 to 500)
  6. Number of genes with this GO term that have the TFBS at a given location
  7. Was this TFBS over-represented alone (yes) or only in combination with other TFBS (no)?
  8. Other TFBS with which the given TFBS was over-represented in pairs

File: GOpairs.txt - Pairs of TFBS associated with GO-terms of target genes

  1. First transcription factor binding site (TFBS1) name
  2. Second transcription factor binding site (TFBS2) name (TFBS2 >= TFBS1)
  3. Location of TFBS (at_TSS= from -200 to 50; before_TSS=from -1000 to -200; after_TSS= from 0 to 500)
  4. Number of GO terms associated with this pair
  5. Main gene ontology (GO) ID
  6. Main gene ontology annotation
  7. Number of genes with the main GO term that have this pair of TFBS
  8. Total number of genes with the main GO term (among those that have a high- or medium-quality TSS)

File: goTSS.txt, goAfter.txt, goBefore.txt, goDCRM.txt - Details on association between TFBS and GO-terms of target genes

  1. First transcription factor binding site (TFBS1) name
  2. Second transcription factor binding site (TFBS2) name (TFBS2 >= TFBS1)
  3. Single occurrence (1) or multiple occurrence (2)
  4. All TFBS considered (0) or only conserved TFBS with conservation score >= 0.5 (1)
  5. Gene ontology (GO) ID
  6. Number of genes with this GO term that have this TFBS (or pair of TFBS)
  7. Total no. of genes with this GO term (among those that have a high- or medium-quality TSS)
  8. Number of genes that have this TFBS (or pair of TFBS) (among those that have a high- or medium-quality TSS)
  9. Total no. of genes (among those that have a high- or medium-quality TSS)
  10. Over-representation ratio (=#6*#9/#7/#8)
  11. z-value of over-representation
  12. p-value of over-representation
  13. FDR of over-representation
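
The over-representation ratio in column 10 can be recomputed from columns 6 to 9 as a check. The sketch below applies the formula exactly as given (#6*#9/#7/#8); the function and argument names and the sample counts are made up for illustration.

```python
def overrepresentation_ratio(n_go_and_tfbs, n_go, n_tfbs, n_total):
    """Column 10 = #6 * #9 / #7 / #8.

    n_go_and_tfbs: genes with this GO term that have the TFBS (column 6)
    n_go:          genes with this GO term (column 7)
    n_tfbs:        genes that have the TFBS (column 8)
    n_total:       all genes considered (column 9)
    """
    return n_go_and_tfbs * n_total / n_go / n_tfbs


# Made-up counts: 20 of 100 GO-term genes carry the TFBS,
# while 500 of 10000 genes overall carry it.
ratio = overrepresentation_ratio(20, 100, 500, 10000)
print(ratio)
```

A ratio above 1 means the TFBS occurs among genes with the GO term more often than expected from its overall frequency; in the made-up example, 20% against 5% gives a ratio of 4.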

File: TFmatrix.txt - TFBS position-weight matrices

Header line for each TFBS:
  1. TFBS name
  2. TFBS pattern
  3. TFBS group
For each position:
  1. Position, starting from zero
  2. Frequency of A
  3. Frequency of C
  4. Frequency of G
  5. Frequency of T
  6. Sum of frequencies
  7. Part of the core pattern (1/0)
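
If per-position probabilities are needed, the four frequencies can be divided by their sum (column 6). The sketch below is one possible way to do this and is not part of CisView; the function name and the sample frequencies are hypothetical.

```python
def normalize_position(freq_a, freq_c, freq_g, freq_t):
    """Convert the four frequencies at one position into proportions."""
    total = freq_a + freq_c + freq_g + freq_t  # should equal column 6
    if total == 0:
        raise ValueError("all frequencies are zero at this position")
    return tuple(f / total for f in (freq_a, freq_c, freq_g, freq_t))


# Made-up frequencies, for illustration only.
probs = normalize_position(10, 0, 30, 60)
print(probs)
```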

File: TFcore.txt - Core patterns derived from TFBS matrices

  1. TFBS name
  2. TFBS pattern
  3. TFBS core pattern
  4. Offset of the core pattern

File: TFpattern.txt - TFBS patterns

  1. TFBS name
  2. TFBS pattern
  3. TFBS group
  4. Used as a matrix because of high information content (1/0)