Example Files

I have set up a separate page with example applets since they take a little while to load.

Simple Example
Color CDT Example
Scatterplot Example
Karyoscope Example
Human FGF Receptor Example
Dendrogram Coloring By Correlation Threshold Example


Simple Example

This is just a small, simple file that makes a good test case.

[ cdt ] [ gtr ] [ atr ]

Color CDT Example

This garish image demonstrates how you can independently control the background and foreground colors for genes and arrays using BGCOLOR and FGCOLOR rows and columns. In practice, you would want the colors to represent some kind of biological information; here, I just did something random with Excel.

Open the cdt file in Excel to see how it works. [ colorTest.cdt ]

I also have an older set of example files that show how to associate colors with genes; I haven't made a screenshot of these.

[ DLim_color.cdt ] [ DLim_color.gtr ]

Special bonus: the Perl script which I used to do the coloring:
[ color.pl ]
You can use it as a starting point for your own script. A more mature version may one day make it into the helper-scripts package.

Scatterplot Example

scatterplot example [cdt] [gtr]

Open in Java TreeView, click on "Analysis->Make Scatterplot...", then select SPOT for the Y-axis, leave INDEX as the X-axis, and click "Go!". You should be confronted with a plot similar to the above. Selecting genes in the dendrogram will cause the corresponding spots in the scatterplot to be selected. In the image shown, a compelling cluster selected in the dendrogram is revealed to be a spot artifact, since nearly all spots in the cluster come from the same location on the array (adjacent spot numbers are located in the same sector on Stanford arrays).

Karyoscope Example

karyoscope example [pcl]

This example of a karyoscope was made using data from Proc Natl Acad Sci USA 2002 Dec 10;99(25):16144-9, "Characteristic Genome Rearrangements in Experimental Evolution of Saccharomyces cerevisiae". Clearly visible are an amplification of the left arm of chromosome 14 and a deletion of the right arm. What is not obvious from the image is how Java TreeView lets you rapidly zoom in on different parts of the image to find out which genes are present at the breakpoints and what the enrichment ratios are. There are also many visualization options, including color customization and whether or not to draw lines between the tips of the bars.

To recreate and browse this view on your own computer, download the pcl file above, and BE CAREFUL TO SELECT LINKED when you open it in Java TreeView. If you do not, it will open in classic mode and you won't be able to make the karyoscope plot. Next, select Analysis->Make Karyoscope. A karyoscope display should appear.

But how do you do this with non-yeast datasets? As detailed in the manual, Java TreeView needs to know the coordinates of every gene id in order to display the ratio in the correct part of the karyoscope. The easiest way to provide them is to set up a coordinates file for each set of gene ids. I have included coordinates files for human and yeast Stanford arrays, and I am willing to help construct such files for other organisms/arrays.

In the absence of a coordinates file, Java TreeView can guess where to put yeast genes using the YORF column. For purposes of illustration, let's use a real coordinates file to view this yeast data. To set coordinates, click on the "Coordinates..." button in the upper left panel of the karyoscope display. Next, click "Load from file...". This should pop up a file dialog; navigate to "Treeview/coordinates" (wherever you installed TreeView) and select "YeastCoordinates.pcl". The karyoscope should redraw as the genes shift around a little. Moving the mouse over the image should now indicate the position of each locus in bases from the centromere; fractional positions can result if the orf has an odd number of bases, as I just averaged start and stop.
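The fractional positions come from simple arithmetic. Here is a minimal sketch, assuming each locus has integer start and stop coordinates and that its position is their plain average. The function name `locus_midpoint` and the numbers are hypothetical and are not taken from YeastCoordinates.pcl.

```python
def locus_midpoint(start: int, stop: int) -> float:
    """Average of start and stop; lands on a half-base when start + stop is odd."""
    return (start + stop) / 2


# Hypothetical coordinates, for illustration only.
print(locus_midpoint(1200, 1500))  # 1350.0
print(locus_midpoint(1200, 1503))  # 1351.5
```

Whether a half-base result corresponds to an odd or an even number of bases depends on whether the stop coordinate is counted as part of the locus, which is a convention this page does not spell out.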

Human FGF Receptor Example

fgf_receptors example [cdt] [gtr]

Open in Java TreeView, click on Analysis->Alignment.

Overview

The purpose of this example is to demonstrate how to view sequence alignments with Java TreeView. My goal was to make a viewer suitable for looking at large alignments of many genes. I found the existing tools lacking, and realized that it was fairly easy for me to extend Java TreeView to do what I wanted.

By selecting "Settings->PixelSettings..." you can make the global view fit in the available space, seeing the whole alignment at once, a feat I could never manage with the ClustalX GUI. Also, you can hold down Shift to zoom in on a particular section of the global alignment without having to scroll around on the x-axis in the zoomview.

How I did it (and you could too)

I chose as an example the human FGF receptors. I took the following steps:

Download a list of human FGF receptors from the KOGs database at http://www.ncbi.nlm.nih.gov/COG/new/shokog.cgi?KOG3627
Download the complete human proteome from ftp://ftp.ncbi.nih.gov/genomes/H_sapiens/protein/protein.fa.gz
Use the splitFasta.pl script from the helper-scripts package to extract the FGF receptors with the command:
```
~/perl/splitFasta.pl -l fgf_receptors.lst -o fgf_receptors.fasta human_proteome.fa
```
Download ClustalX from ftp://ftp-igbmc.u-strasbg.fr/pub/ClustalX and use it to align fgf_receptors.fasta
Run the "aln2cdt.pl" script as follows to convert the .aln and .dnd files into .cdt and .gtr :
```
~/perl/aln2cdt.pl fgf_receptors.aln
```

At the end of this process, you will have an fgf_receptors.cdt and an fgf_receptors.gtr file, which I have so conveniently provided above.
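For readers without the helper-scripts package, the extraction step can be approximated in a few lines. This is not how splitFasta.pl is implemented. It is a minimal Python sketch that assumes the list holds one identifier per entry and that a record matches when the first whitespace-delimited token of its header line is in the list. The function name, the matching rule, and the toy records are all hypothetical.

```python
from typing import Iterable, Iterator, Set


def filter_fasta(lines: Iterable[str], wanted: Set[str]) -> Iterator[str]:
    """Yield the lines of every FASTA record whose header id is in `wanted`."""
    keep = False
    for line in lines:
        if line.startswith(">"):
            # A new record starts here; decide whether to keep it.
            tokens = line[1:].split()
            keep = bool(tokens) and tokens[0] in wanted
        if keep:
            yield line


# Toy records, not real protein sequences.
records = [
    ">seq1 toy record\n",
    "AAA\n",
    ">seq2 toy record\n",
    "GGG\n",
    ">seq10 toy record\n",
    "TTT\n",
]
print("".join(filter_fasta(records, {"seq1"})), end="")
```

Matching on the whole first token rather than on a substring keeps `seq1` from also pulling in `seq10`.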

Warning

You may note that there are some weird terminal branches in the dendrogram. These have not escaped my notice, and result from negative branch lengths in the fgf_receptors.dnd file produced by clustal. For example, consider the following excerpt, which is responsible for perhaps the most egregious weirdness in the example screenshot:

(
gi|21264361|ref|NP_631947.1|:0.08162,
gi|21264363|ref|NP_006601.2|:-0.05999)
:0.30572)

As a result, the gi|21264363 leaf node is at an earlier time than its parent branch! I'm not sure how to handle these correctly, or if I am. The dendrogram was of secondary importance to me, so I did not pursue it.
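If you want to check your own .dnd file for this problem before converting it, a quick scan for negative branch lengths is enough. The following is a minimal sketch, not part of the helper-scripts package: it uses a regular expression rather than a full Newick parser, so it only reports labeled nodes (in practice, leaves) and will miss unlabeled internal branches. The function name is hypothetical.

```python
import re

# A label (any run of characters without Newick punctuation or whitespace),
# a colon, then a possibly negative decimal branch length.
LABELED_LENGTH = re.compile(r"([^(),:;\s]+):(-?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?)")


def negative_leaves(newick: str):
    """Return (label, length) for every labeled node with a negative branch length."""
    return [
        (label, float(length))
        for label, length in LABELED_LENGTH.findall(newick)
        if float(length) < 0
    ]


excerpt = """(
gi|21264361|ref|NP_631947.1|:0.08162,
gi|21264363|ref|NP_006601.2|:-0.05999)
:0.30572)"""
print(negative_leaves(excerpt))
# [('gi|21264363|ref|NP_006601.2|', -0.05999)]
```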

Dendrogram Coloring By Correlation Threshold

threshold example [cdt] [gtr] [atr]

(Requires Java Treeview 1.7 or later)

Overview

Java Treeview supports explicit coloring of nodes in the dendrogram. This can be used in a variety of ways, depending upon the ingenuity of the programmer. In this particular example, I have colored in nodes in both the gene and array dendrogram such that subclusters that have correlations higher than 0.7 are colored the same.

NOTE: The figure was made without clustering the experiments, whereas the files were made with experiment clustering. You can't see the array tree coloring in the global view when the genes are selected because that dendrogram becomes entirely selected. There are three ways around this:

The dendrogram in the ArrayZoomView should show the array dendrogram colors.
Clicking on a node in the array dendrogram directly will cause the other nodes to become unselected, revealing the coloring.
Holding down the Shift key while selecting genes in the global view will cause a range of arrays to be selected, which deselects the dendrogram.

How I did it (and you could too)

The first step is to obtain some data of interest and cluster it using software that creates a cdt, gtr and atr file, such as Cluster 3.0 by M. J. L. de Hoon. In this case, I downloaded the data from Michael Eisen's 1998 PNAS paper "Cluster analysis and display of genome-wide expression patterns". I then reformatted it into PCL format (see the File Formats section of the User Manual, or the Cluster 3.0 user manual) and clustered both genes and arrays with Gavin Sherlock's command line tool, although Cluster 3.0 would work as well. As a shortcut, you could just download the "threshold" files using the links just under the figure above.

Now, we just need to run colorByThreshold.pl, a script from the helper-scripts package (version 0.0.2 or later). If you're just getting started, just put the .gtr and .atr files in the helper-scripts folder and run the following commands:

```
colorByThreshold.pl -t .7 threshold.gtr
colorByThreshold.pl -t .7 threshold.atr
```

Here's an example of what the output should look like:

```
prompt% ./colorByThreshold.pl -t .7 threshold.gtr
Loading from threshold.gtr
Loaded 0
Loaded 1000
Loaded 2000
2465 Total Nodes
Using header 3, 'CORRELATION'
root has -0.606047, looking for subtrees with CORRELATION > .7
Overwriting column 4, NODECOLOR
Found 114 clusters
prompt% ./colorByThreshold.pl -t .7 threshold.atr
Loading from threshold.atr
78 Total Nodes
Using header 3, 'CORRELATION'
root has -0.141257, looking for subtrees with CORRELATION > .7
Adding column NODECOLOR
Found 11 clusters
prompt%
```

In other words, we're running the script "colorByThreshold" with a threshold of .7 (meaning requiring a correlation of .7 or better), on both the gtr and atr file. You can change the correlation value to be anything you think is reasonable, guided perhaps by preliminary browsing of the dendrogram. If you want the tree to be all black again, run with a threshold of 1 or more.
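To make the idea of "subtrees with CORRELATION > threshold" concrete, here is a minimal Python sketch of one way to pick them. It is not the code of colorByThreshold.pl and may not count clusters exactly as the script does. It assumes each tree row can be read as (node id, left child, right child, correlation), and it treats a node as a cluster root when its correlation exceeds the threshold and none of its ancestors does. The function name and the toy tree are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

# (node id, left child, right child, correlation); this layout is an assumption.
Row = Tuple[str, str, str, float]


def threshold_cluster_roots(rows: Iterable[Row], threshold: float) -> List[str]:
    """Return the topmost nodes whose correlation exceeds the threshold."""
    correlation: Dict[str, float] = {}
    parent: Dict[str, str] = {}
    for node, left, right, corr in rows:
        correlation[node] = corr
        parent[left] = node
        parent[right] = node

    def has_passing_ancestor(node: str) -> bool:
        # Walk up toward the root looking for an ancestor above the threshold.
        current = parent.get(node)
        while current is not None:
            if correlation[current] > threshold:
                return True
            current = parent.get(current)
        return False

    return [
        node
        for node, corr in correlation.items()
        if corr > threshold and not has_passing_ancestor(node)
    ]


# Toy tree with five genes, for illustration only.
toy_tree = [
    ("NODE1X", "GENE1X", "GENE2X", 0.9),
    ("NODE2X", "GENE3X", "GENE4X", 0.8),
    ("NODE3X", "NODE1X", "NODE2X", 0.5),
    ("NODE4X", "NODE3X", "GENE5X", -0.2),
]
print(threshold_cluster_roots(toy_tree, 0.7))  # ['NODE1X', 'NODE2X']
print(threshold_cluster_roots(toy_tree, 1.0))  # []
```

Each returned node would get its own color, shared by everything beneath it. With a threshold of 1 no node passes, which is consistent with the all-black tree described above.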
