Chris Lewis at the U of S



Email: Christopher Lewis

CSAGE Source Page

This page contains the source for the CSAGE application, along with Perl scripts that help with analysis. I will add examples and updated documentation as time permits, so please check back periodically. Feel free to contact me with questions or comments.

Please note that the CSAGE project was incorporated into the saskSAGE project and is now maintained via this SourceForge project.


CSAGE

CSAGE is now distributed under the LGPL. As I understand it, this means that you can use CSAGE for whatever you want and your work will remain yours. If you modify/improve the CSAGE source, please let me know so that I can make your changes available to others; you will receive credit for your work.


About CSAGE

CSAGE is a C program that extracts SAGE tags and assigns matches to genomic sequences. Tags are stored in a 5-ary tree whose branches represent A, C, G, T and N. This allows efficient detection of duplicate ditags and rapid assignment of matches: a tag can be found in the tree in O(m) time, where m is the length of the tag. The functionality is described in more detail in csage_note. Please note that this description is somewhat out of date, so some of the functionality differs.
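
The tree described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the CSAGE source: the names TagNode, base_index, node_new, tag_insert, tag_count and tag_free are hypothetical, and the real program presumably keeps more information per tag than a single counter.

```c
#include <assert.h>
#include <stdlib.h>

#define ALPHABET 5  /* A, C, G, T, N */

/* Hypothetical node layout: one child slot per base plus a tag counter. */
typedef struct TagNode {
    struct TagNode *child[ALPHABET];
    unsigned count;  /* how many times a tag ending at this node was inserted */
} TagNode;

/* Map a base to a child index, or -1 for any other character. */
static int base_index(char b)
{
    switch (b) {
    case 'A': return 0;
    case 'C': return 1;
    case 'G': return 2;
    case 'T': return 3;
    case 'N': return 4;
    default:  return -1;
    }
}

/* Allocate an empty node; returns NULL if memory is exhausted. */
static TagNode *node_new(void)
{
    TagNode *n = malloc(sizeof *n);
    if (n != NULL) {
        for (int i = 0; i < ALPHABET; ++i)
            n->child[i] = NULL;
        n->count = 0;
    }
    return n;
}

/* Insert a tag, creating nodes as needed.
 * Returns the tag's new count, or 0 on a bad base or allocation failure. */
static unsigned tag_insert(TagNode *root, const char *tag)
{
    assert(root != NULL && tag != NULL);
    TagNode *n = root;
    for (; *tag != '\0'; ++tag) {
        int i = base_index(*tag);
        if (i < 0)
            return 0;
        if (n->child[i] == NULL && (n->child[i] = node_new()) == NULL)
            return 0;
        n = n->child[i];
    }
    return ++n->count;
}

/* Look up a tag: one child step per base, so O(m) for a tag of length m. */
static unsigned tag_count(const TagNode *root, const char *tag)
{
    assert(tag != NULL);
    const TagNode *n = root;
    for (; *tag != '\0' && n != NULL; ++tag) {
        int i = base_index(*tag);
        if (i < 0)
            return 0;
        n = n->child[i];
    }
    return n != NULL ? n->count : 0;
}

/* Release a whole tree, children first. */
static void tag_free(TagNode *n)
{
    if (n == NULL)
        return;
    for (int i = 0; i < ALPHABET; ++i)
        tag_free(n->child[i]);
    free(n);
}
```

In this sketch a return value greater than 1 from tag_insert means the same sequence has been seen before, which is one way a duplicate check could be built on the same tree. The cost of a lookup depends only on the tag length, not on how many tags are stored.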

For instance:

The CSAGE suite has been used for a large-scale cold tolerance experiment in A. thaliana (manuscript in preparation; contact Steve Robinson). As such, the majority of the supporting scripts are geared toward analysis of these data. I will make them available so that others can modify them or develop their own scripts based on them.


Examples

Reading ../temp/temp//4796756.fasta
Reading ../temp/temp//4796757.fasta
Reading ../temp/temp//4796758.fasta
Reading ../temp/temp//4796759.fasta


Tree
Tag (id)        Count   dTags   dDitags
   Matching Genes

<snip>
TTACACAGAC      (     0)        5       0       0       0.000109
TTACACAGCT      (     0)        1       0       0       0.000022
TTACACAGGA      (     0)        1       0       0       0.000022
TTACACCTGT      (     0)        4       0       0       0.000087
TTACACTAAT      (     0)        1       0       0       0.000022
TTACACTCCA      (     0)        1       0       0       0.000022
TTACACTGGA      (     0)        1       0       0       0.000022
TTACACTTAT      (     0)        1       0       0       0.000022
TTACAGACTT      (     0)        1       0       0       0.000022
TTACAGAGCT      (     0)        2       0       0       0.000043
TTACAGCACA      (     0)        1       0       0       0.000022
TTACAGGCTT      (     0)        1       0       0       0.000022
TTACAGTAGA      (     0)        1       0       0       0.000022
TTACAGTCTT      (     0)        2       0       0       0.000043
TTACAGTGAT      (     0)        1       0       0       0.000022
TTACAGTGTT      (     0)        1       0       0       0.000022
TTACATAAGC      (     0)        1       0       0       0.000022
TTACCAAATC      (     0)        1       0       0       0.000022
TTACCATACC      (     0)        1       0       0       0.000022
TTACCATATC      (     0)        53      0       7       0.001151
TTACCATATG      (     0)        1       0       0       0.000022
TTACCATTGC      (     0)        1       0       0       0.000022
TTACCATTGG      (     0)        4       0       0       0.000087
TTACCCACAC      (     0)        1       0       0       0.000022
TTACCCACTC      (     0)        1       0       0       0.000022
TTACCCATAC      (     0)        1       0       0       0.000022
TTACCCCACA      (     0)        1       0       0       0.000022
TTACCGATAA      (     0)        1       0       0       0.000022
TTACCGTACA      (     0)        1       0       0       0.000022
TTACCTACTT      (     0)        1       0       0       0.000022
TTACCTCAAG      (     0)        1       0       0       0.000022
TTACCTCACC      (     0)        1       0       0       0.000022
TTACCTCCTT      (     0)        30      0       4       0.000652
TTACCTCGTT      (     0)        1       0       0       0.000022
TTACCTCTTC      (     0)        2       0       0       0.000043
TTACCTCTTT      (     0)        1       0       0       0.000022
TTACCTGTAA      (     0)        1       0       0       0.000022
TTACCTTACC      (     0)        4       0       0       0.000087
TTACGAATGA      (     0)        1       0       0       0.000022
TTACGAGGAA      (     0)        3       0       0       0.000065
TTACGATGAA      (     0)        2       0       0       0.000043
TTACGGGATC      (     0)        1       0       0       0.000022
TTACTAAATG      (     0)        11      0       0       0.000239
TTACTAAGGG      (     0)        1       0       0       0.000022
<snip>

                  Ditags:  28111
          Invalid Ditags:  1709
        Duplicate Ditags:  3308
      Low Quality Ditags:  0
        Ditags Processed:  23094

 Sequence Tags Processed:  46188
    Tags in Exclude File:  0

  Sequence Tags Excluded:
         In Exclude File:  0
            Too many N's:  142
          Duplicate Tags:  0

     Valid Sequence Tags:  46046
                With N's:  0
                  Unique:  18622

        Gene Tags Parsed:  0
                 Matches:  0

System time used: 0(s) 110000(us)
User time used: 1(s) 90000(us)
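
The summary counts in the run above are consistent with the relationships sketched below. These relationships are inferred from the numbers shown, not taken from the CSAGE source, and the struct and function names are hypothetical: ditags processed is the total minus the invalid, duplicate and low quality ditags, each processed ditag yields two sequence tags, and the last column of the tag listing matches the tag count divided by the number of valid sequence tags.

```c
#include <assert.h>

/* Hypothetical container for the totals printed in the run summary. */
typedef struct {
    long ditags;           /* Ditags */
    long invalid;          /* Invalid Ditags */
    long duplicate;        /* Duplicate Ditags */
    long low_quality;      /* Low Quality Ditags */
    long in_exclude_file;  /* Sequence Tags Excluded: In Exclude File */
    long too_many_n;       /* Sequence Tags Excluded: Too many N's */
    long duplicate_tags;   /* Sequence Tags Excluded: Duplicate Tags */
} RunTotals;

/* Ditags that survive filtering (assumed relationship). */
static long ditags_processed(const RunTotals *t)
{
    return t->ditags - t->invalid - t->duplicate - t->low_quality;
}

/* Each processed ditag contributes two sequence tags. */
static long tags_processed(const RunTotals *t)
{
    return 2 * ditags_processed(t);
}

/* Sequence tags left after all exclusions (assumed relationship). */
static long valid_tags(const RunTotals *t)
{
    return tags_processed(t)
         - t->in_exclude_file - t->too_many_n - t->duplicate_tags;
}

/* Relative frequency of one tag among the valid sequence tags. */
static double tag_frequency(long count, long valid)
{
    assert(valid > 0);
    return (double)count / (double)valid;
}
```

With the totals from the run above (28111 ditags, 1709 invalid, 3308 duplicate, 142 tags with too many N's), these functions give 23094 ditags processed, 46188 sequence tags processed and 46046 valid sequence tags. For TTACCATATC, 53 / 46046 rounds to the 0.001151 shown in its row.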

Description




Getting the Software

The software is not available online. Send me an email and I'll send it to you. This is so I can keep track of who is using the software.

A Work in Progress