#*******************************************************
# 2005, The National Institute on Aging (NIA/NIH).
#*******************************************************

This software is provided "AS IS". NIA makes no warranties, express or implied, including no representation or warranty with respect to the performance of the software and derivatives or their safety, effectiveness, or commercial viability. NIA does not warrant the merchantability or fitness of the software and derivatives for any particular purpose, or that they may be exploited without infringing the copyrights, patent rights or property rights of others. NIA shall not be liable for any claim, demand or action for any loss, harm, illness or other damage or injury arising from access to or use of the software or associated information, including without limitation any direct, indirect, incidental, exemplary, special or consequential damages.

This software program may not be sold, leased, transferred, exported or otherwise disclaimed to anyone, in whole or in part, without the prior written consent of NIA.

Programmer: Alexei Sharov (sharoval@mail.nih.gov)
National Institute on Aging, Genetics Lab.

The software has not been sufficiently tested, so problems may arise in the case of misconfiguration or missing components. If you are familiar with Perl, you may try to fix the problem yourself or contact Alexei Sharov at sharoval@grc.nia.nih.gov. Please include the error/warning message in your e-mail.

1. GENERAL DESCRIPTION

The CisView software is available for download from http://lgsun.grc.nia.nih.gov/cisview. It is designed to work within the NIA Mouse Gene Index (http://lgsun.grc.nia.nih.gov/geneindex/mm6). Thus, if you want to run CisView on your server, you first need to download the necessary data files from our server or assemble the gene index using the gene index software (available at the above-mentioned web site).

The CisView software has a modular structure.
Each module can be executed independently of other components. All code is written in the Perl language, except the "togif" module, which generates the image files displayed on the web.

MAIN PROGRAM: regulindex.pl
CONFIGURATION FILE: geneindex.cfg (it is parsed by regulindex.pl)

The program works in the same directory that you used to assemble the gene index. This directory has the following subdirectories that are used for the gene index.

Geneindex (location of all Perl programs and togif.exe program)
|-data
|-output
|-archive
|-CpG

To assemble CisView, you need to expand the subdirectory tree as follows:

Geneindex
|-data
|-output
|-CRM
|-promoters
|-refseq
|-regions
|-archive
|-CpG

The output is written to the web server, so you need to create a directory for the web home page. In the geneindex.cfg file it is named "www/geneindex". In this directory you need to create the following tree of subdirectories:

geneindex (home directory)
|-bin (for cgi scripts, togif program)
|-U
|-T
|-truncated
|-exons
|-images
|-download
|-lists
|-regul
|-regData
|-TFBS

2. ADDITIONAL SOFTWARE

togif.exe (included)

The togif.exe program draws images for the web interface. It should be present in 2 places: together with the Perl scripts and in the "bin" directory for the web page. 3 compiled versions are provided, for Windows, UNIX, and LINUX. If you have a UNIX OS, rename the file "togif.UNIX.exe" to "togif.exe" in both locations.

3. INPUT DATA

The following input data files are needed:

List A: output generated by the gene index software:
(1) output/T-members1.txt
(2) output/T-major.txt
(3) output/T-repeat.txt
(4) output/U-annotation.txt
(5) output/U-genes.txt
(6) output/T-psl.txt
(7) output/Torf-param.txt
(8) data/T-ontology.txt
(9) data/T-domain.txt
(10) data/go-hierarchy.txt
(11) data/go-annotation.txt
(12) data/promoters.txt

Files 1-9 are available for download at http://lgsun.grc.nia.nih.gov/geneindex/mm6/download.html

If needed, files 10-11 can be generated with the program parse_go.pl using the syntax:

parse_go.pl gene_ontology.obo data/T-ontology.txt data/go-annotation.txt data/go-hierarchy.txt

where the gene_ontology.obo file can be downloaded from the Gene Ontology web page http://www.geneontology.org/

File (12) contains information generated by the First Exon Finder program (http://rulai.cshl.org/tools/FirstEF/)

List B: databases
mouse genome sequences (fasta) - from UCSC
repeat masker output for genome sequences - from UCSC
conservation score for genome sequences - from UCSC

4. PROGRAM COMPONENTS

abundantPairs.pl            identifies abundant pairs of TFBS
compilePromData.pl          compiles TSS attributes from several files
coordinates.pl              generates promoter coordinates from TSS coordinates
CRMlink.pl                  generates look-up tables for TFBS search in CRMs
CRMselect.pl                selects DCRMS that are within 10 kb of TSS
detectGaps.pl               detects gaps in promoter sequences (to avoid TSS that start after a gap)
extract_conservation.pl     extracts conservation scores from UCSC data files
extract_fasta.pl            extracts sequences from a Fasta file according to a list of genes or other criteria
extract_genome_seq.pl       generates fasta sequences for given coordinates
extractFromTable.pl         extracts lines from a table based on specified conditions
extract_lines.pl            extracts lines from a BLAT output file according to a list of genes or other criteria
extractPromoters.pl         extracts lines with TSS attributes depending on these attributes
extractTable.pl             extracts lines from a table
findPeaks.pl                finds peaks in the frequency distribution of TFBS in promoters
findTSS.pl                  finds potential TSS based on several data sources
generateDistPages.pl        generates web pages with the distance distribution between TFBS
generateGOpages.pl          generates web pages that specify the association of TFBS with GO annotations of target genes
generateTFpages.pl          generates web pages for TFBS and a table with patterns and matrices
graph_multi.pl              plots graphs of TFBS distribution in promoters
listAllChr.pl               creates a list of all promoters from chromosome-specific lists
makeDatabase.pl             generates data files for each TSS, which are used for the browser
maskCombine.pl              combines (overlays) 2 mask files
maskCpg.pl                  masks CpG-rich sequences and generates a file with CpG positions
maskInvert.pl               inverts a mask
maskORF.pl                  masks the open reading frame (ORF=CDS) and generates coordinates of the 3'UTR
maskRepeats.pl              masks repetitive sequences
mediumQualityPromoters.pl   compiles a list of medium-quality TSS from a set of data files
mismatchThresh.pl           generates a file with mismatch thresholds for TFBS based on the frequency of TFBS hits in semi-random sequences
nucleotFreq.pl              estimates transition probabilities for the 3rd order Markov process that generates semi-random nucleotide sequences
nucleotRandom.pl            generates semi-random nucleotide sequences using a 3rd order Markov process
parse_go.pl                 parses the gene ontology (GO) data file
patternLookup.pl            finds TFBS matches in a nucleotide sequence
promoterBounds.pl           identifies boundaries of the CpG island at a TSS
refOntology.pl              identifies TFBS (or TFBS pairs) associated with GO annotations of target genes
regionDatabase.pl           generates intermediate data files for each 60 kb region
regulatoryCRM.pl            generates lists of TFBS for each CRM
regulatoryDistance.pl       finds peaks in the distribution of distances between TFBS
regulatoryDistrib.pl        generates a frequency distribution of TFBS in promoters
regulatoryOntology.pl       identifies TFBS (or TFBS pairs) over-represented in sets of genes that correspond to each gene ontology (GO) term
regulatoryPattern.pl        processes a list of TFBS and combines redundant TFBS (those that belong to the same class and are located too close together)
regulatoryRegions.pl        identifies cis-regulatory modules (CRM)
regulindex.pl               main file that calls other programs
removeDuplic.pl             removes duplicates of TSS that are too close
tata.pl                     checks if a TATA box and initiator are present at a TSS
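
APPENDIX: CHECKING INPUT FILES (EXAMPLE)

Because missing components are named above as a likely source of problems, it may help to confirm that the twelve List A files from section 3 are in place before running regulindex.pl. The following is only a sketch and is not part of the CisView distribution. The function name check_list_a and the GENEINDEX_DIR variable are hypothetical, and the script assumes that the data/ and output/ subdirectories sit directly under the working directory.

```shell
#!/bin/sh
# Sketch: report which List A input files (section 3) are missing.
# ASSUMPTION: check_list_a and GENEINDEX_DIR are illustrative names,
# not part of CisView; data/ and output/ are assumed to be direct
# children of the directory passed as the first argument.

check_list_a() {
    dir="$1"
    missing=0
    for f in \
        output/T-members1.txt \
        output/T-major.txt \
        output/T-repeat.txt \
        output/U-annotation.txt \
        output/U-genes.txt \
        output/T-psl.txt \
        output/Torf-param.txt \
        data/T-ontology.txt \
        data/T-domain.txt \
        data/go-hierarchy.txt \
        data/go-annotation.txt \
        data/promoters.txt
    do
        if [ ! -f "$dir/$f" ]; then
            # Names of missing files go to stderr, the count to stdout.
            echo "missing: $f" >&2
            missing=$((missing + 1))
        fi
    done
    echo "$missing"
}

# Check the current directory unless GENEINDEX_DIR says otherwise.
count=$(check_list_a "${GENEINDEX_DIR:-.}")
echo "$count of 12 List A files are missing"
```

Run it from the Geneindex working directory, or set GENEINDEX_DIR to that directory first. The List B databases are not checked because this document does not give their file names.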