We live at a remarkable time when the complete DNA sequences of many living organisms are being decoded. To facilitate the study of genome-wide information, several groups have developed powerful software tools that make it possible to search, visualize, interpret, and download the genetic instruction set of sequenced species. This sticklebrowser site is based on the genome browser developed by the bioinformatics group at UC Santa Cruz. For basic information about how the browser works, see Karolchik et al. 2011, or click on the Help menu available in the blue bar at the top of the browser window. The coordinates presented in the sticklebrowser are based on a high-quality reference genome assembly constructed from a single freshwater female stickleback from Bear Paw Lake, Alaska (Jones et al. 2012b). We have added many custom information tracks at the sticklebrowser mirror site, which are accessible through the buttons that can be toggled on and off beneath the browser window. These custom tracks present a variety of types of information that should be useful to researchers studying stickleback biology and evolution, including over 5.8 million sequence variants found among 21 different stickleback genomes, and a search for regions that underlie repeated adaptation to marine and freshwater environments (Jones et al. 2012b).
You can enter a gene symbol, keyword, marker name, or chromosome coordinate range in the search box at the top of the browser. Examples: ectodysplasin, EDA, Stn433, chrIV:12,800,220-12,810,446
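As a rough illustration of the coordinate syntax in the last example, the following Python sketch splits a position string into chromosome, start, and end. The helper name `parse_position` is hypothetical and is not part of the browser; the sketch assumes the `chrom:start-end` form with optional thousands separators.

```python
import re

# Hypothetical helper for illustration; not part of the sticklebrowser.
# Assumes the "chrom:start-end" form, with optional commas in the numbers.
_POSITION = re.compile(r"^(chr\w+):([\d,]+)-([\d,]+)$")


def parse_position(text):
    """Split a string such as 'chrIV:12,800,220-12,810,446' into parts."""
    match = _POSITION.match(text.strip())
    if match is None:
        raise ValueError(f"not a coordinate range: {text!r}")
    chrom, start, end = match.groups()
    # Remove thousands separators before converting to integers.
    start = int(start.replace(",", ""))
    end = int(end.replace(",", ""))
    if start > end:
        raise ValueError("start must not exceed end")
    return chrom, start, end


print(parse_position("chrIV:12,800,220-12,810,446"))
```

A helper of this kind may be convenient when coordinates copied from the browser need to be reused in scripts.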
You can rapidly search for highly similar sequences in the stickleback genome by clicking on Blat at the top of the browser window, and pasting your favorite sequence into the search box. For queries involving more distantly related sequences, or complex families of genes, you may also want to try initiating a separate search using the BLAST/BLAT option at Ensembl. Click on BLAST/BLAT in the blue bar at the top of the Ensembl screen. Paste your sequence into the Query box. Select species: "Gasterosteus aculeatus", Search tool: "BLAST", and Search sensitivity: "distant homologies". You can then retrieve a list of ranked search results in the stickleback genome, including a list of chromosome regions and coordinates that can be entered back into the sticklebrowser (e.g. chrIV:12,800,220-12,810,446).
Ensembl has a very useful computational pipeline for predicting likely genes and gene products in large-scale genome assemblies (Curwen et al. 2004). The predictions for sticklebacks take into account sequence motifs and information from >300,000 sequenced ESTs from a variety of stickleback cDNA libraries (Kingsley et al. 2004). Each of the resulting gene predictions is assigned a unique ID made up of the following letters and numbers: ENS for Ensembl; GAC for Gasterosteus aculeatus; the single letter G, T, or P to denote a predicted gene, transcript, or protein; and then a string of numbers unique to each prediction (e.g. ENSGACT00000026917). The Ensembl designations work as search terms in the sticklebrowser. All Ensembl genes can also be viewed and intersected with other data using the "Ensembl Genes" track found beneath the sticklebrowser window. We find it useful to turn on the "Human Proteins" track along with the "Ensembl Genes" track. The "Human Proteins" track presents a simple set of gene names based on known human proteins that align to particular regions of the stickleback genome.
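The ID layout described above can be checked mechanically. The Python sketch below is only an illustration built from that description (ENS, then GAC, then G, T, or P, then digits); the function name `describe_stickleback_id` is hypothetical and does not correspond to any Ensembl or browser API.

```python
import re

# Pattern assumed from the description in the text:
# "ENS" + "GAC" + one of G/T/P + a run of digits.
_ENSEMBL_ID = re.compile(r"^ENSGAC([GTP])(\d+)$")
_FEATURE = {"G": "gene", "T": "transcript", "P": "protein"}


def describe_stickleback_id(identifier):
    """Return the feature type and numeric part of a stickleback Ensembl ID."""
    match = _ENSEMBL_ID.match(identifier)
    if match is None:
        raise ValueError(f"not a stickleback Ensembl ID: {identifier!r}")
    kind, number = match.groups()
    return {
        "species": "Gasterosteus aculeatus",
        "feature": _FEATURE[kind],
        "number": number,  # kept as text so leading zeros survive
    }


print(describe_stickleback_id("ENSGACT00000026917"))
```

The numeric part is kept as a string because converting it to an integer would drop the leading zeros that are part of the ID.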
Stanford microsatellite markers ("Stnxxx") and SNP genotyping chip markers ("rsxxxxxxxxx") have been used in many previous QTL mapping and population genetic experiments (for microsatellites see Peichel et al. 2001; for SNP genotyping chips see Jones et al. 2012a). These markers can be searched by name in the search box. They can be visualized in the genome by toggling the "Stn Markers" or "Stanford_GenotypingChips_SNPs" buttons shown under "Marker Tracks" beneath the sticklebrowser window.
If you want to design new microsatellite markers to tag a specific gene or genomic region, you can visualize promising locations where simple repeat sequences have been identified in the reference stickleback genome. Toggle the "Simple Repeats" track on by clicking the appropriate button in the "Variation and Repeats" section beneath the genome browser window. DNA primers made to unique sequences flanking di- and tri-nucleotide repeats have a high likelihood of revealing size polymorphism in other individuals and populations (Peichel et al. 2001).
Finally, you can also use sticklebrowser to view the positions of nearly six million single nucleotide polymorphisms (SNPs) predicted from genome-wide re-sequencing data in 21 natural stickleback populations from the Pacific and Atlantic ocean basins (Jones et al. 2012b). Toggle the "Visual Genotype" button shown under "Jones et al. Tracks" beneath the browser window to full view. The 21 visual genotype tracks that appear show the predicted value of these SNPs in the 21 different marine and freshwater individuals that were sequenced. If you are zoomed in far enough in any genomic region, you can hover your pointer over any colored SNP in the Visual Genotype track to see the exact genome position of the SNP, the base at the corresponding position in the (R)eference Bear Paw genome, and the (A)lternative base detected in reads from other genomes. Clicking on this track gives the same information and a visual genotype plot of the SNPs identified in your genomic region.
The 5.8M polymorphisms are also represented in a more condensed format in the "Marker Tracks" group. Use the "5.8M SNPs from 21 Genomes" button at the bottom of the page to turn on this track and see the SNPs displayed in BED format. Note that SNPs are only listed if they were supported by at least four different sequence reads from the pooled data of all 21 individuals. Experimental tests show a validation rate of over 80% for SNPs predicted using this criterion (Jones et al. 2012b).
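The read-support criterion can be pictured as a simple filter. The Python sketch below is not the pipeline used by Jones et al. 2012b: the read counts are invented for illustration, only a few individuals are shown, and it assumes that supporting reads are simply summed across individuals.

```python
MIN_POOLED_READS = 4  # threshold stated in the text


def passes_read_support(reads_per_individual, minimum=MIN_POOLED_READS):
    """True if the pooled number of supporting reads reaches the minimum."""
    return sum(reads_per_individual) >= minimum


# Invented example data: supporting reads for the alternative base,
# listed per sequenced individual (assumption: counts are pooled by summing).
candidates = {
    ("chrIV", 12800300): [1, 0, 2, 1],  # 4 pooled reads, kept
    ("chrIV", 12800512): [3, 0, 0, 0],  # 3 pooled reads, dropped
}

kept = [site for site, reads in candidates.items() if passes_read_support(reads)]
print(kept)
```

Under this kind of rule, a variant seen in only one individual needs all four supporting reads from that individual, so variants that are unique to a single fish are more likely to be missed than variants shared across populations.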
All of the 5,897,368 single nucleotide polymorphisms (SNPs) identified by Jones et al. 2012b are contained in the track called "Visual Genotype" and in a more condensed track called "5.8M SNPs from 21 Genomes". More information about the Visual Genotype track, populations, and display conventions can be found by clicking the underlined text above the Visual Genotype button in the Jones et al. track controls under the browser window.
The more condensed "5.8M SNPs from 21 Genomes" track presents the genomic position of each SNP and the alternate alleles, but does not provide the SNP genotypes in the 21 sequenced fish. This track can be turned on by selecting "full" view from the "5.8M SNPs from 21 Genomes" button in the "Marker Tracks" section at the bottom of the page.
You can use the table function in the browser to download all SNPs in particular regions, as well as the corresponding genotype calls in different populations. Click on the Table Browser heading found under the "Tools" section in the blue banner at the top of the browser window. In the new Table Browser page that opens up, select group: "Jones et al" and track: "Visual Genotype", and enter the coordinates of your favorite region (or choose the entire stickleback genome). Note that by also clicking on the intersection button in the Table Browser, you can intersect the SNP genotype information with many other interesting tracks available in the sticklebrowser. For example, to get a list of the subset of SNPs occurring within predicted genes, start a table search beginning with the "Visual Genotype" track as described above, then click the intersect button, and then choose group: "Genes and Gene Prediction Tracks", track: "Ensembl Genes". Additional controls in the Table Browser make it possible to restrict your search to particular fields within a track, get summary statistics for your search, and choose different formats for exporting the data.
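To make the idea of an intersection concrete, here is a minimal Python sketch of what "SNPs occurring within predicted genes" means. It is not how the Table Browser is implemented; the function name `snps_in_genes` and all coordinates are invented, and it assumes 0-based, half-open intervals as used in BED files.

```python
def snps_in_genes(snps, genes):
    """Return the SNPs that fall inside at least one gene interval.

    snps:  iterable of (chrom, position) with 0-based positions
    genes: iterable of (chrom, start, end), half-open [start, end)
    """
    # Group gene intervals by chromosome so each SNP is only compared
    # against intervals on its own chromosome.
    by_chrom = {}
    for chrom, start, end in genes:
        by_chrom.setdefault(chrom, []).append((start, end))

    return [
        (chrom, pos)
        for chrom, pos in snps
        if any(start <= pos < end for start, end in by_chrom.get(chrom, []))
    ]


# Invented example data, for illustration only.
genes = [("chrIV", 12800000, 12810000)]
snps = [("chrIV", 12800300), ("chrIV", 12900000), ("chrI", 12800300)]
print(snps_in_genes(snps, genes))
```

The linear scan is adequate for a single region; a genome-wide comparison would normally sort the intervals or use an interval tree instead.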
In 2005, Colosimo et al. showed that repeated evolution of armor plate changes in many different freshwater sticklebacks occurred via an ancient shared haplotype at the Ectodysplasin locus, which encodes a key developmental signaling molecule. The repeated use of ancient haplotypes gives rise to a distinctive pattern of allele sharing: all fish that share the same armor phenotypes also share similar distinctive sequences at the key locus controlling the armor trait, a pattern that is dramatically different from the phylogeographic patterns seen at neutral loci (Colosimo et al. 2005). Jones et al. 2012b looked for this distinctive pattern throughout the genome using two different computational methods. You can visualize the other repeatedly used regions found by SOM/HMM trees or marine-freshwater Cluster Separation Scores (CSS) by turning on the corresponding tracks listed under "Jones et al" controls beneath the browser window. Click on the underlined text above any track button to get a brief summary of the type of information presented in the track, and consult the Jones et al. 2012b paper for detailed information about the SOM/HMM and CSS methods. The default window in the sticklebrowser opens to the region surrounding the prototypical Ectodysplasin gene, and illustrates the characteristic sequence patterns now being used to recover many other loci that also underlie repeated evolution in natural environments.
An Excel table listing key regions recovered jointly by both SOM/HMM and CSS approaches is available as part of the supplementary information in Jones et al. 2012b. These 81 regions can be viewed in BED format by turning on the track "CSS-FDR-0.02_HMM-TreeA_Intersect" in the Jones et al Track Group at the bottom of the page. You can also do your own custom searches and downloads of SOM/HMM and CSS data using the "Tables" function in the sticklebrowser. Click on Tables in the blue banner at the top of the sticklebrowser window. In the new Table Browser window that opens up, select group: "Jones et al" and set track: to any of the particular SOM/HMM tree types or CSS analyses you want to look at in greater detail. You can intersect these tracks with any other tracks in the browser, get summary statistics for your search, and download data in a variety of formats by choosing the appropriate buttons at the bottom of the Table Browser.
Genome sequencing projects are jigsaw puzzles with millions of pieces. The primary sequence data are short sequence reads made up of strings of As, Cs, Gs, and Ts. Many of these sequences overlap with each other, and can be aligned to produce longer continuous strings of sequence, called "contigs". Adjacent "contigs" can be bridged into larger "scaffolds", based on bridging clones whose end sequences map to different contigs. Finally, the larger "scaffolds" can be roughly positioned on chromosomes by following the inheritance pattern of genetic sequences in families. The high-quality stickleback reference assembly has contig and scaffold sizes that are five- to ten-fold larger than those of other published fish genomes (Jones et al. 2012b). However, the process of connecting, lining up, and ordering all pieces is still incomplete. Most of the large sequence scaffolds have been tied to previously mapped chromosome linkage groups. In some of these cases, there was only a single informative genetic marker within a scaffold, or no recombination between internal markers within the scaffold in the small genetic mapping cross used. In these cases, the scaffold has been correctly assigned to a particular linkage group, but the orientation of the scaffold within that linkage group was arbitrary and may be changed by further mapping. In addition, many of the smaller scaffolds in the genome assembly did not contain any mapped genetic markers. These scaffolds have been concatenated together into an artificial "Unmapped" chromosome, so that they can still be searched and analyzed in the browser. As sequence and mapping information continues to grow in the future, more and more Unmapped scaffolds will be linked to sequences on other chromosomes, and the large-scale order and orientation of all scaffolds will continue to improve.
A "genotyping-by-sequencing" approach has recently typed many additional stickleback genomic markers on larger genetic mapping crosses (Glazer et al. 2015). Based on these results, many of the smaller unmapped scaffolds on chromosome "UN" can now be moved to known linkage groups. In addition, many previously mapped scaffolds now have newly resolved orientations within known linkage groups, which sometimes reverse the orientation that had to be assigned arbitrarily in the original stickleback genome assembly. For an updated list of scaffold orders and orientations, and a simple method for translating between genome coordinates in the initial and improved genome assemblies, see the supplemental information from the Glazer et al. study in the Dryad Digital Repository.
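To see why a reversed scaffold changes coordinates, consider the simplest case: a scaffold that keeps the same span on its chromosome but is flipped end to end. The Python sketch below illustrates only that arithmetic. It is not the translation method provided by Glazer et al., which should be used for real conversions; the function name and the coordinates are hypothetical, and positions are assumed to be 1-based and inclusive.

```python
def flip_within_scaffold(position, scaffold_start, scaffold_end):
    """Map a position to its new coordinate after the scaffold is reversed.

    Assumes 1-based inclusive coordinates and that the scaffold occupies
    the same chromosome span before and after the flip.
    """
    if not scaffold_start <= position <= scaffold_end:
        raise ValueError("position lies outside the scaffold")
    # The first base becomes the last, the second becomes second to last, ...
    return scaffold_start + scaffold_end - position


# Hypothetical scaffold spanning positions 101 to 200 on a chromosome.
print(flip_within_scaffold(101, 101, 200))
print(flip_within_scaffold(150, 101, 200))
```

Features on a flipped scaffold also change strand, and scaffolds that move between chromosomes or shift position need a full lookup table rather than this single formula.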
Yes. Some genes are missing from the reference assembly, because sequence reads in highly repetitive regions are difficult to align and assemble. For example, the Pitx1 gene maps near a chromosome end, and the reads in this highly repetitive sub-telomeric region were not incorporated into the rest of the stickleback assembly (Chan et al. 2010). We think that this problem affects a relatively small number of genes, since more than 97% of cloned RNAs from stickleback tissues (ESTs) do have corresponding genes in the reference assembly. If you don't find a stickleback ortholog of your favorite gene in the reference assembly, you can try searching against raw reads in the trace archives as well, setting the database to "Gasterosteus aculeatus-WGS".
Although Jones et al. 2012b predict millions of genome-wide sequence variants, many additional SNPs have undoubtedly been missed because of the requirement that all predicted SNPs be supported by at least four separate reads in the pooled data from 21 individuals. This criterion helps minimize false positives in high-throughput sequencing data, and works well for finding regions of shared divergence between many different marine and freshwater populations. However, the same method will under-recover SNPs that are unique to single individuals or populations.
Finally, any automated annotation of genes and transcripts across an entire genome is likely to have many local prediction errors, including undetected genes, incorrect start or termination sites, missed exons, concatenation of exons from separate genes into single predictions, and poor recovery of small genes or non-coding RNAs. Comparison of automated gene predictions with the actual expressed sequences identified in stickleback tissues or other organisms may help sort out some of these prediction errors (see "Stickleback ESTs" and "Other mRNAs" tracks under mRNA and EST tracks beneath the browser window). However, the gold standard for confirming interesting observations will always be actual experiments. The genome browser can serve as a convenient starting point for exploring massive amounts of genetic information about the rapidly evolving stickleback species complex. We hope you will find it useful as you get started on your own questions and research!
Chan, Y. F., M. E. Marks, F. C. Jones, G. Villarreal, M. D. Shapiro, S. D. Brady, A. M. Southwick, D. M. Absher, J. Grimwood, J. Schmutz, R. M. Myers, D. Petrov, B. Jonsson, D. Schluter, M. A. Bell and D. M. Kingsley (2010). Adaptive evolution of pelvic reduction in sticklebacks by recurrent deletion of a Pitx1 enhancer. Science, 327: 302-305.
Colosimo, P. F., K. E. Hosemann, S. Balabhadra, G. Villarreal, M. Dickson, J. Grimwood, J. Schmutz, R. Myers, D. Schluter and D. M. Kingsley (2005). Widespread parallel evolution in sticklebacks by repeated fixation of ectodysplasin alleles. Science, 307: 1928-1933.
Curwen, V., E. Eyras, T. D. Andrews, L. Clarke, E. Mongin, S. M. Searle and M. Clamp (2004). The Ensembl automatic gene annotation system. Genome Res., 14: 942-950.
Glazer AM, Killingbeck EE, Mitros T, Rokhsar DS, Miller CT (2015). Genome assembly improvement and mapping convergently evolved skeletal traits in sticklebacks with genotyping-by-sequencing. G3: Genes - Genomes - Genetics, online in advance of print.
Glazer AM, Killingbeck EE, Mitros T, Rokhsar DS, Miller CT (2015). Data from: Genome assembly improvement and mapping convergently evolved skeletal traits in sticklebacks with genotyping-by-sequencing. Dryad Digital Repository.
Jones, F. C., Y. F. Chan, J. Schmutz, J. Grimwood, S. D. Brady, A. M. Southwick, D. M. Absher, R. M. Myers, T. E. Reimchen, B. E. Deagle, D. Schluter and D. M. Kingsley (2012a). A genome-wide SNP genotyping array reveals patterns of global and repeated species pair divergence in sticklebacks. Current Biology, 22:83-90.
Jones, F. C., M. G. Grabherr, Y. F. Chan, P. Russell, E. Mauceli, J. Johnson, R. Swofford, M. Pirun, M. C. Zody, S. White, E. Birney, S. Searle, J. Schmutz, J. Grimwood, M. C. Dickson, R. M. Myers, C. T. Miller, B. R. Summers, A. K. Knecht, S. D. Brady, H. Zhang, A. A. Pollen, T. Howes, C. Amemiya, Broad Institute Genome Sequencing Platform and Whole Genome Assembly Team, E. S. Lander, F. Di Palma, K. Lindblad-Toh and D. M. Kingsley (2012b). The genomic basis of adaptive evolution in threespine sticklebacks. Nature, 484:55-61.
Karolchik, D., A.S. Hinrichs, W.J. Kent (2011). The UCSC genome browser. Curr. Protoc. Hum. Genet., Chapter 18: Unit 18.6.
Kingsley, D. M., B. Zhu, K. Osoegawa, P. J. de Jong, J. Schein, M. Marra, C. L. Peichel, C. Amemiya, D. Schluter, S. Balabhadra, B. Friedlander, Y. M. Cha, M. Dickson, J. Grimwood, J. Schmutz, W. S. Talbot and R. M. Myers (2004). New genomic tools for molecular studies of evolutionary change in sticklebacks. Behaviour, 141:1331-1344.
Peichel, C. L., K. S. Nereng, K. A. Ohgi, B. L. Cole, P. F. Colosimo, C. A. Buerkle, D. Schluter and D. M. Kingsley (2001). The genetic architecture of divergence between threespine stickleback species. Nature, 414:901-905.