The Genome Sequencing Facility (GSF) at Greehey Children’s Cancer Research Institute in the University of Texas Health Science Center at San Antonio utilizes state-of-the-art genomic platforms to generate high-quality genomic data and provides support with its analysis.
The GSF provides Illumina next generation sequencing services (on both HiSeq and MiSeq platforms) to researchers inside and outside the Institute and the University, including those at other academic institutions and in the biotechnology and pharmaceutical industries. We welcome opportunities to partner on research projects of any size, both as scientific collaborations and on a fee-for-service basis. Please contact us to discuss how we can help you with your research.
Zhao Lai, PhD
Director of Genome Sequencing Facility
Greehey Children's Cancer Research Institute - 2.120,
UT Health Science Center at San Antonio
8403 Floyd Curl Dr.
San Antonio, Texas 78229
The primary missions of the GSF are to:
- Provide faculty of Greehey Children’s Cancer Research Institute, the Cancer Therapy & Research Center and the University of Texas Health Science Center with access to high-throughput next generation sequencing technology and bioinformatics support, offering high-quality service at highly competitive prices.
- Support the development of genome-enabled research projects, scientific programs and grant proposals.
- Educate and train future generations of scientists in NGS-focused genomics approaches.
The Genome Sequencing Facility (GSF) was created in 2011 by Greehey Children’s Cancer Research Institute (Greehey CCRI) at the University of Texas Health Science Center at San Antonio (UTHSCSA), with the approval of the Medical School Dean’s office and the acquisition of an Illumina HiSeq 2000 system. The GSF was originally established as a service facility supported entirely by the Greehey CCRI, with a mission focused on the genomics of pediatric cancer. Since then, it has rapidly become a heavily utilized and well-functioning unit, used by many members of Greehey CCRI, the Cancer Therapy and Research Center (CTRC, the only NCI-designated Cancer Center in South Texas), UTHSCSA and researchers at other San Antonio institutions, including the University of Texas at San Antonio, Texas Biomedical Research Institute, Brooke Army Medical Center and the U.S. Army Institute of Surgical Research. In addition, thanks to its sequencing performance and fast project turnaround time, the GSF has established a track record with a wide base of outside users, including researchers from Fox Chase Cancer Center, Johns Hopkins Medical Institute, Indiana University School of Medicine, University of North Carolina, The Methodist Hospital Research Institute and others.
Due to increasing demand for NGS applications in cancer research, the GSF was recruited in 2014 as a new CTRC Shared Resource at UTHSCSA, the Next Generation Sequencing Shared Resource (NGSSR). In 2015, the GSF formally acquired an Illumina MiSeq sequencer from the Research Core Laboratories managed by the Office of the Vice President of Research. The MiSeq is an ideal platform for relatively small sequencing projects. Its major applications are 16S metagenomics, targeted re-sequencing and targeted RNA expression.
In March 2016, the GSF was awarded a $600,000 NIH Shared Instrument grant (S10 grant 1S10OD021805-01) to purchase an Illumina HiSeq 3000 and upgrade its sequencing platform. The Illumina HiSeq 3000/4000 (the HiSeq 3000 carries one flow cell, whereas the HiSeq 4000 has two flow cell stations) is the latest and most efficient sequencing platform on the Illumina NGS market. It offers higher throughput, faster turnaround time and lower cost. We are very grateful for the NIH funding support, which allows the GSF to continue its NGS journey in this exciting new direction.
If you are considering next generation sequencing for your research, please contact Dr. Zhao Lai for consultation on experimental design, pricing and scheduling.
Director: Zhao Lai, PhD
Office: GCCRI 4.100.14
Technical Director: Dawn Garcia, MS
Genomics Laboratory: GCCRI 2.120
Genomics Technologist: Korri Weldon, BS
Genomics Laboratory: GCCRI 2.120
Billing for the GSF is managed by the GCCRI administrator's office.
Accounting: Richard McDougle, MA
Bioinformatics: CBBI (Computational Biology and Bioinformatics Initiative), led by Dr. Yidong Chen. Dr. Chen serves as Faculty Advisor of the GSF and is also in charge of downstream bioinformatics analysis.
Yidong Chen, PhD
Office: GCCRI 4.100.06
Scientific Advisory Committee:
- Dr. Peter Houghton, Director of Greehey Children’s Cancer Research Institute, University of Texas Health Science Center at San Antonio
- Dr. Tim Huang, Professor and Chair of Department of Molecular Medicine, Deputy Director of Cancer Therapy and Research Center, University of Texas Health Science Center at San Antonio
- Dr. Gail Tomlinson, Professor in the Department of Pediatrics, Division Chief of Hematology-Oncology, Greehey Children’s Cancer Research Institute, University of Texas Health Science Center at San Antonio
The Genome Sequencing Facility (GSF) provides complete project consultation for optimal experimental design and set up. The GSF is also a research group with experience in customizing experiments and developing new protocols that leverage the latest advances in genomic technology.
The GSF performs all protocols necessary for preparing a biological sample to be sequenced on the Illumina HiSeq 3000 system. For next generation sequencing studies, the GSF provides the library preparation and sequencing for the following applications:
- DNA seq (whole genome de novo sequencing, whole exome sequencing, candidate gene re-sequencing, target sequencing, amplicon sequencing)
- RNA seq (Total RNA sequencing, stranded mRNA sequencing, RIP sequencing, CLIP sequencing)
- Small RNA seq
- ChIP seq
- MBDCap DNA seq
- Single cell DNA & RNA seq
For lower density sequencing applications, the Illumina MiSeq system is an option. The GSF prepares libraries for sequencing on the Illumina MiSeq system for the following applications:
- 16S based metagenomics
- Targeted gene re-sequencing
- Targeted gene expression
- Cancer gene panel
Bioinformatics analysis is available through CBBI (Computational Biology and Bioinformatics Initiative), led by Dr. Yidong Chen.
The Genome Sequencing Facility (GSF) is equipped with an Illumina HiSeq 3000 system and an Illumina MiSeq system for high-throughput next generation sequencing, and with a high-performance computer cluster to meet massive data processing and storage requirements. The genomics laboratory has the specialized equipment needed to create a range of sequencing libraries from genomic DNA, ChIP-DNA, RNA and small RNA samples.
- Illumina HiSeq 3000 Sequencing System
- Illumina MiSeq Sequencer
- Illumina cBot Cluster Generation Station
- Covaris S220 Ultra Sonicator
- Beckman Coulter SPRIworks Fragment Library System I for Illumina
- Agilent 2100 Bioanalyzer
- Advanced Analytical Fragment Analyzer
- Invitrogen Qubit 2.0 Fluorometer
- Eppendorf Plate Centrifuge
- SpeedVac Centrifuge
- Eppendorf Realplex Quantitative PCR
- Thermo Scientific NanoDrop 2000
- Eppendorf Thermo-Cyclers
The Illumina cBot cluster generation system, MiSeq and HiSeq 3000 sequencing system are highly specialized equipment; only trained and qualified personnel have direct access to the sequencing instrumentation. The GSF also manages a Covaris sonicator and an Agilent Bioanalyzer, which are available to NGS users. These instruments are critical to many NGS processes and are frequently used by the GSF itself. To balance these demands, we have established the following user service mechanism: users are trained to operate the instruments and are responsible for any reagents and consumables they use. There is no instrument usage charge for processing samples that will be sequenced at the GSF; a usage charge applies for samples not related to the GSF.
Next Generation Sequencing
The past decade has witnessed a profound revolution in biology, driven largely by advances in high-throughput sequencing, functional genomics and computational technologies. The power of high-throughput sequencing, also known as next-generation sequencing (NGS), is being harnessed by researchers to address an increasingly diverse range of biological questions. The scale and efficiency now achievable with NGS are driving unprecedented progress in areas ranging from the analysis of genomes (changes in gene copy number, sequence, expression, structure, modification and interaction) to how proteins interact with nucleic acids, enabling impressive scientific achievements and novel biological applications.
Since they were first introduced to the market in 2005, NGS technologies and applications have made a tremendous impact on genomic research. In particular, a great deal of NGS effort today centers on biomedical research, which is the major research theme of investigators at Greehey CCRI, CTRC and UTHSCSA.
The Genome Sequencing Facility (GSF) provides high-quality and cost-effective next-generation sequencing services, covering various types of sequencing library preparation and Illumina next generation sequencing. Our users and collaborators include researchers in the Greehey CCRI, CTRC and UTHSCSA community, as well as investigators at other universities and institutions.
Next generation sequencing (NGS) is still a relatively new technology for many Principal Investigators. Designing the experiment and understanding the logic of the downstream bioinformatics analysis can be challenging and require a significant investment, so project planning and experimental design are required. For every sequencing project the GSF works on, we request that a discussion take place beforehand between the GSF and the investigators submitting the samples. Based on the biological questions you are asking, we will work with you to decide the experimental outline, including the number of samples needed (biological replicates and groups), the sample preparation details (DNA-Seq or RNA-Seq, polyA-selected or rRNA-depleted, with or without enrichment procedures), the number of reads needed (which affects the pooling scheme), single reads or paired-end reads, and the sequencing length (sequencing module choice: 50SR, 75PE or 150PE). Please contact us if you are interested in working with us.
We QC every sample we receive, and we assess each sample based on the QC data obtained at the time of testing. Different QC steps apply to different sample types. Generally, an Agilent Bioanalyzer or an agarose gel (for genomic DNA), together with Qubit, is used for QC. We will notify the user if there is any issue with the quality or quantity of the samples and if samples need to be replaced. Specific QC metrics are tracked in our Wiki LIMS.
We have developed extensive tools to monitor sequence quality and accuracy. Every sequencing run performed by the GSF is subjected to quality control evaluation in the form of a report that reviews read output and overall quality metrics, including the Q30 score, percentage of undetermined reads, FastQC result, duplicate rate and mappable rate, among others. These QC mechanisms allow the GSF to maintain a consistently high level of sequence quality, which simplifies subsequent analyses.
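As an illustration of two of the metrics named above, the following sketch computes the percentage of bases at or above Q30 from FASTQ quality strings and the percentage of undetermined reads from per-sample read counts. It is not the GSF's reporting tool. It assumes Phred+33 quality encoding, and the function names, the "Undetermined" label and the example values are hypothetical.

```python
def percent_q30(quality_strings, offset=33, threshold=30):
    """Percentage of bases whose Phred quality score is >= threshold.

    Assumes each string is the quality line of one FASTQ record,
    encoded as Phred+33 (ASCII code minus 33 gives the score).
    """
    total = 0
    passing = 0
    for qual in quality_strings:
        for ch in qual:
            total += 1
            if ord(ch) - offset >= threshold:
                passing += 1
    return 100.0 * passing / total if total else 0.0


def percent_undetermined(read_counts, key="Undetermined"):
    """Percentage of reads that could not be assigned to any sample index."""
    total = sum(read_counts.values())
    return 100.0 * read_counts.get(key, 0) / total if total else 0.0


# Hypothetical example: 'I' encodes Q40 and '#' encodes Q2 in Phred+33.
example_qualities = ["IIII", "II##"]
print(percent_q30(example_qualities))  # 6 of 8 bases pass -> 75.0

# Hypothetical per-sample read counts after demultiplexing.
example_counts = {"sample_1": 45, "sample_2": 45, "Undetermined": 10}
print(percent_undetermined(example_counts))  # 10 of 100 reads -> 10.0
```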
Some facts you may like to know before getting started
What you should know about the new HiSeq 3000 sequencer
- The Illumina HiSeq 3000/4000 (the HiSeq 3000 carries one flow cell, whereas the HiSeq 4000 has two flow cell stations) is the latest and most efficient sequencing platform on the Illumina NGS market. It offers higher throughput, faster turnaround time and lower cost.
- The enhanced sequencing performance of the HiSeq 3000/4000 is enabled by two new technologies: the patterned flow cell and kinetic exclusion amplification. These technologies were previously available only on the X Ten sequencers, for a single application: the re-sequencing of human genomic samples. In contrast to the random clustering employed previously (HiSeq 2500 and earlier models, including the GSF's old HiSeq 2000 sequencer), clusters on the HiSeq 3000 are generated in ordered nano-wells, which allows higher cluster densities and unambiguous cluster identification.
- The Illumina HiSeq 3000 uses an 8-lane flow cell. It typically generates 300 to 350 million single reads per lane, or 600 to 700 million paired-end reads (single read number × 2) per lane.
- The Illumina HiSeq 3000 carries one flow cell with 8 lanes and generates up to 750 Gb of data per run. We must have all samples ready for all 8 lanes before we can start a sequencing run.
- The Illumina HiSeq provides high-quality sequencing data. The sequencing read length on the HiSeq 3000 is 50 bp, 75 bp, 100 bp or 150 bp. The percentage of bases at or above Q30 (a 1 in 1,000 chance of a base-call error) is ≥ 85% for 50 bp reads and ≥ 75% for 150 bp reads.
- The Illumina HiSeq 3000 takes about 1 day to finish a 50 bp single-read sequencing run and about 3.5 days to complete a 150 bp paired-end sequencing run.
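The arithmetic behind the figures above can be sketched as follows. This is a back-of-the-envelope illustration, not an Illumina or GSF tool: the function names are hypothetical, the Phred relation (error probability = 10^(-Q/10)) is the standard definition of a quality score, and the value of 312.5 million clusters per lane is simply one point inside the 300 to 350 million range quoted above.

```python
def phred_error_probability(q):
    """Probability that a base call with Phred quality q is wrong."""
    return 10 ** (-q / 10)


def run_yield_gb(clusters_per_lane, read_length_bp, lanes=8, paired=True):
    """Approximate sequence output of one run, in gigabases.

    clusters_per_lane is the number of single reads per lane; a
    paired-end run reads each cluster twice (once from each end).
    """
    reads_per_cluster = 2 if paired else 1
    bases = clusters_per_lane * reads_per_cluster * read_length_bp * lanes
    return bases / 1e9


# Q30 corresponds to roughly a 1 in 1,000 chance of error.
print(phred_error_probability(30))

# 8 lanes of 2 x 150 bp at 312.5 million clusters per lane gives 750 Gb,
# matching the "up to 750 Gb per run" figure.
print(run_yield_gb(312_500_000, 150))

# A 50 bp single-read run at 300 million reads per lane.
print(run_yield_gb(300_000_000, 50, paired=False))
```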
Sequencing libraries for HiSeq 3000
- Library requirements for the HiSeq 3000 are stricter because of the patterned flow cell: primer dimers must make up < 0.5% of the library, and the library insert size needs to be < 550 bp, or the total library length < 670 bp. Minor “tails” of longer fragments are still acceptable.
- Library fragments can be sequenced from one end (single reads) or from both ends (paired-end reads). Single reads are typically used for re-sequencing, gene expression profiling, ChIP sequencing, MBDCap DNA sequencing and small RNA sequencing. Paired-end reads are most commonly used for de novo assembly, identification of splice variants and structural variation, and other applications.
- A pooling strategy can significantly improve sequencing efficiency. Samples that are individually barcoded during library preparation can be multiplexed on a lane. The number of samples that can be pooled per lane depends on the number of reads per sample needed for the downstream bioinformatics analysis. For human, mouse and rat samples, the general guideline is that 6 to 8 RNA or ChIP samples, or 8 to 24 small RNA samples, can be pooled for sequencing on one lane.
- Illumina uses a green laser to sequence G/T and a red laser to sequence A/C. At each cycle, at least one of the two nucleotides for each color channel needs to be read to ensure proper image registration. It is important to maintain color balance for each base of the index read being sequenced; otherwise, index read sequencing could fail due to registration failure. Follow the low-plex pooling guidelines for the TruSeq® Sample Prep kit you are using.
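The pooling and color-balance rules above can be checked with a few lines of code. The sketch below is an illustration under stated assumptions, not a GSF or Illumina tool: the function names are hypothetical, the index sequences are examples chosen for illustration, and the 40 million reads per sample figure is an assumed requirement rather than a recommendation.

```python
GREEN = set("GT")  # bases read with the green laser
RED = set("AC")    # bases read with the red laser


def max_samples_per_lane(reads_per_lane, reads_per_sample):
    """Largest number of samples that still get reads_per_sample each."""
    return reads_per_lane // reads_per_sample


def unbalanced_cycles(indexes):
    """Return the 1-based index-read cycles that lack a color channel.

    A cycle is balanced when, across all pooled indexes, at least one
    base is read in the green channel (G/T) and at least one in the
    red channel (A/C).
    """
    if len({len(index) for index in indexes}) != 1:
        raise ValueError("all index sequences must have the same length")
    bad = []
    for cycle, bases in enumerate(zip(*indexes), start=1):
        present = {base.upper() for base in bases}
        if not (present & GREEN) or not (present & RED):
            bad.append(cycle)
    return bad


# With 300 million single reads per lane and an assumed need of
# 40 million reads per sample, 7 samples fit on one lane.
print(max_samples_per_lane(300_000_000, 40_000_000))

# A two-index pool that is balanced at every cycle.
print(unbalanced_cycles(["GCCAAT", "CTTGTA"]))  # []

# A two-index pool that loses one color channel at cycles 1, 2, 3 and 6.
print(unbalanced_cycles(["ATCACG", "CGATGT"]))  # [1, 2, 3, 6]
```

A non-empty result means the pool should be redesigned, for example by swapping one index or adding a sample whose index supplies the missing channel.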
Please read the sample submission guideline before submitting samples. It covers many aspects of sample preparation for NGS projects and has become a useful toolkit for sample preparation and submission.
Samples are brought to the GSF (or sent by an appropriate carrier for outside users) and are logged into the Laboratory Information Management System (LIMS). We currently use Wiki LIMS to keep track of sample receipt, QC, preparation and sequencing. Users are requested to submit the Sample Submission Form with the samples, in hard copy and, when available, in electronic form. Samples are then quality checked, and those passing QC move forward to library production for sequencing.
Sample submissions are initiated via entries in the LIMS database: once samples are received, they are entered and recorded in the LIMS. Every step, including sample QC, library preparation, library quantification and sequencing, is recorded and tracked in the LIMS. Users can also submit prepared libraries to the GSF for sequencing. In that case, users are requested to submit the Library Submission Form with the library, in hard copy and, when available, in electronic form. Libraries are then quality checked and move forward to sequencing.
Please read through these important notes regarding sample preparation:
- Genomic DNA for DNA-Seq:
  - Genomic DNA should be provided as high molecular weight DNA. Any sample exposed to phenol or other organic solvents should be run through a Qiagen cleanup column prior to submission to avoid contaminants that may inhibit the enzymes used in the Illumina library preparation protocols.
  - DNA samples should be treated with RNase.
  - Preferred genomic DNA sample preparation methods include the Qiagen DNeasy kit and the CTAB method.
  - Please provide an agarose gel picture of the genomic DNA and readings from Nanodrop, Qubit or PicoGreen.
- Total RNA for RNA-Seq:
  - For RNA-Seq samples, Trizol combined with the Qiagen RNeasy mini kit should work well. Ambion’s mirVana miRNA isolation Kit, Qiagen’s miRNeasy kit, RNeasy kit, Tissue and lipid RNeasy kit, and microarray RNeasy kit are good options too. When you use the mirVana miRNA isolation Kit or Qiagen’s miRNeasy kit, you only need to go as far as the total RNA isolation step.
  - Total RNA samples should be treated with DNase.
  - Please provide a Bioanalyzer trace file of the total RNA and readings from Nanodrop, Qubit or RiboGreen.
- Total RNA for small RNA-Seq:
  - We prefer to start with total RNA for small RNA-Seq applications. We do not recommend enriching small RNA molecules from your total RNA extraction.
  - For small RNA-Seq samples, Ambion’s mirVana miRNA isolation Kit and Qiagen’s miRNeasy kit are recommended for total RNA isolation. When you use either kit, you only need to go as far as the total RNA isolation step.
  - Please provide a Bioanalyzer trace file of the total RNA and readings from Nanodrop, Qubit or RiboGreen.
- ChIP-DNA for ChIP-Seq:
  - Users will need to think about the choice of antibodies, cell numbers and controls when they plan the experiment.
  - Biological replicates are necessary; at least duplicate biological experiments should be done.
  - Before ChIP, chromatin must be fragmented to a manageable size. The optimal size range of chromatin for ChIP-Seq analysis is between 150 and 300 bp. DNA fragments in this size range, which are equivalent to mono- and dinucleosome chromatin fragments, provide high-resolution analysis of binding sites and work well on next-generation sequencing platforms.
  - Please provide an agarose gel picture or Bioanalyzer trace file of the ChIP-DNA and readings from Qubit or PicoGreen.
Following generation of the sequencing libraries, an aliquot is run on an Agilent High Sensitivity DNA Bioanalyzer chip or with a Fragment Analyzer NGS kit to validate the quality and size range of the library. The library is also quantified by Qubit and qPCR to accurately measure the amount of valid DNA template within the library. These steps enable us to load an appropriate quantity of library onto the flow cell and to target an optimal rate of clusters passing filter.
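To show how a mass concentration (as reported by Qubit) and a fragment size (as reported by the Bioanalyzer or Fragment Analyzer) combine into a loading concentration, here is a minimal sketch. It is not the GSF's procedure: the function names and example values are hypothetical, and it relies on the commonly used approximation of 660 g/mol per base pair of double-stranded DNA.

```python
def library_molarity_nm(conc_ng_per_ul, mean_fragment_bp, bp_mass=660.0):
    """Convert a dsDNA library concentration from ng/ul to nM.

    nM = (ng/ul * 1e6) / (660 g/mol per bp * mean fragment length in bp)
    """
    if mean_fragment_bp <= 0:
        raise ValueError("mean fragment length must be positive")
    return conc_ng_per_ul * 1e6 / (bp_mass * mean_fragment_bp)


def dilution_volumes(stock_nm, target_nm, final_volume_ul):
    """Volumes of library and diluent needed to reach target_nm."""
    if target_nm > stock_nm:
        raise ValueError("target concentration exceeds stock concentration")
    library_ul = target_nm * final_volume_ul / stock_nm
    return library_ul, final_volume_ul - library_ul


# Hypothetical library: 10 ng/ul with a mean fragment length of 400 bp.
stock = library_molarity_nm(10.0, 400)
print(round(stock, 1))  # about 37.9 nM

# Volumes needed to prepare 20 ul at an assumed target of 2 nM.
library_ul, diluent_ul = dilution_volumes(stock, 2.0, 20.0)
print(round(library_ul, 2), round(diluent_ul, 2))
```

Because the Qubit measures all double-stranded DNA, including fragments without functional adapters, a qPCR-based measurement of the adapter-ligated template is the more reliable input for the final loading calculation.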
The GSF shipping address is:
Attn: Dawn Garcia
Greehey Children's Cancer Research Institute Room 2.120
University of Texas Health Science Center at San Antonio
8210 Floyd Curl Drive
San Antonio, Texas 78229
The fee schedule was formulated on the basis of the operating expenses of the laboratory and a comparison with other academic NGS facilities. For some multi-faceted, complex investigations, a project charge is agreed upon to cover a specific time period and an estimated number of analyses. Fees are reviewed annually and adjusted as necessary, with guidance from the Scientific Advisory Committee. The GSF offers a significant discount for projects with large sample sizes. The discount rate varies case by case, depending on the sample size. Please contact the Director, Dr. Zhao Lai, for details.
NGS HiSeq 3000 Sequencing (per lane)
|Sequencing Type||Internal||External||Major Applications|
|50bp Single Read Sequencing (1 x 50 cycles)|
|75bp Paired End Sequencing (2 x 75 cycles)||$1850||$1950||RNA-seq, exome seq for DNA variants|
|100bp Paired End Sequencing (2 x 100 cycles)||$2260||$2360||RNA-seq for splicing variants, exome seq for DNA variants|
|150bp Paired End Sequencing (2 x 150 cycles)||$2550||$2650||Whole genome sequencing|
NGS MiSeq Sequencing (per run)
|Sequencing Type||Internal||External||Major Applications|
|75bp Paired End Sequencing (2 x 75 cycles)||$1000||$1100||Targeted gene re-sequencing, Targeted RNA expression|
|300bp Paired End Sequencing (2 x 300 cycles)|
Illumina Library Preparation (includes sample QC and index)
|DNA-Seq (with Nextera XT for small genomes < 5Mb)||$100||$110|
|mRNA-Seq (with PolyA selection)||$160||$170|
|Total RNA-Seq (with rRNA depletion)||$180||$190|
|Small RNA-Seq (with TruSeq small RNA kit)||$200||$210|
|Small RNA-Seq (with NEB or TriLink small RNA kit)||$160||$170|
|Nextera Rapid Exome Capture Enrichment||$200||$220|
|TruSeq Exome Capture Enrichment||$200||$220|
|16S Metagenomics sequencing per reaction||$30||$35|
Sample QC, library QC
|NGS library QC (for up to 11 samples)|
|RNA sample QC (for up to 12 samples for RNA nano chip)|
|RNA sample QC (for up to 11 samples for RNA pico chip)|
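For rough budgeting, the listed prices can be combined as in the sketch below. This is an illustration only and not an official quote: it assumes that library preparation is charged per sample, that the two price columns of the library preparation table are internal and external rates in the same order as the sequencing tables, and it ignores QC charges and volume discounts. The function name is hypothetical, and only a subset of the listed prices is included.

```python
# HiSeq 3000 price per lane in US dollars, taken from the sequencing table.
LANE_PRICE = {
    "75PE": {"internal": 1850, "external": 1950},
    "100PE": {"internal": 2260, "external": 2360},
    "150PE": {"internal": 2550, "external": 2650},
}

# Library preparation prices in US dollars (assumed to be per sample,
# with the first listed price assumed to be the internal rate).
LIBRARY_PRICE = {
    "mRNA-Seq": {"internal": 160, "external": 170},
    "Total RNA-Seq": {"internal": 180, "external": 190},
}


def estimate_cost(n_samples, library_type, read_type, n_lanes, user="internal"):
    """Library preparation plus sequencing cost, before any discount."""
    prep = n_samples * LIBRARY_PRICE[library_type][user]
    sequencing = n_lanes * LANE_PRICE[read_type][user]
    return prep + sequencing


# Six mRNA-Seq samples pooled on one 75 bp paired-end lane.
print(estimate_cost(6, "mRNA-Seq", "75PE", 1))                   # 2810
print(estimate_cost(6, "mRNA-Seq", "75PE", 1, user="external"))  # 2970
```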
GCCRI's Computational Biology and Bioinformatics (CBBI) team is designated to support the Genome Sequencing Facility. Our bioinformatics services are highly customizable, so we will work with you to produce the NGS data analyses and reports you are looking for.
Bioinformatics in NGS data analysis includes two major areas:
- NGS data quality assurance and initial genome alignment
- Customized NGS Analysis
NGS data quality assurance and initial genome alignment:
We have developed extensive tools to monitor sequence quality and accuracy. Every sequencing run performed by the GSF is subjected to quality control evaluation in the form of a report that reviews read output and overall quality metrics, including the Q30 score, percentage of undetermined reads, FastQC result, duplicate rate and mappable rate, among others. These mechanisms allow the GSF to maintain a consistently high level of sequence quality, which simplifies subsequent analyses.
Customized NGS Analysis
CBBI is capable of analyzing almost all types of sequencing data generated by the Illumina HiSeq 2000 platform. These NGS data types include ChIP-Seq, mRNA-Seq, small RNA-Seq, MBDCap-Seq and exome-cap-Seq. The following are examples of our bioinformatics capabilities for common NGS applications:
CBBI’s RNA-Seq services include counts for all known mRNAs, differential expression, heatmaps and other standard RNA-Seq processing. We can also provide intron-exon junction sites, non-coding RNA counts, SNPs within transcripts and other analyses.
The following files will be provided with your whole-transcriptome results:
- Alignment report (total mappable reads, etc)
- Alignment results (.BAM; optional .SAM file)
- Counts file containing the number of reads matching annotated genes
- Differential expression report (optional)
- Functional analysis (optional)
- Non-coding counts report (optional)
- SNP report (optional)
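To give a sense of what a counts file and a differential expression report contain, the sketch below normalizes raw gene counts to counts per million (CPM) and computes a log2 fold change between two samples. It is a toy illustration and not CBBI's pipeline: real differential expression analysis relies on replicates and dedicated statistical methods. The gene names, counts and function names are hypothetical.

```python
import math


def counts_per_million(counts):
    """Scale raw read counts so that each sample sums to one million."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no reads")
    return {gene: count * 1e6 / total for gene, count in counts.items()}


def log2_fold_change(cpm_control, cpm_treated, pseudocount=1.0):
    """log2(treated / control) per gene; the pseudocount avoids log(0)."""
    return {
        gene: math.log2(
            (cpm_treated[gene] + pseudocount) / (cpm_control[gene] + pseudocount)
        )
        for gene in cpm_control
    }


# Hypothetical counts files for two samples.
control = {"GENE_A": 100, "GENE_B": 900}
treated = {"GENE_A": 400, "GENE_B": 600}

fold_changes = log2_fold_change(
    counts_per_million(control), counts_per_million(treated)
)
for gene, value in fold_changes.items():
    print(gene, round(value, 2))
```

Normalizing to CPM before comparing samples corrects for differences in sequencing depth, which is why the raw counts alone are not compared directly.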
For ChIP-Seq data, besides aligning the sequences to the reference (using the Burrows-Wheeler Aligner, or BWA), the CBBI will further analyze your data with tools such as Model-based Analysis of ChIP-Seq (MACS) to unveil binding sites within the genome. Users can load the results into the UCSC browser or IGV to view regions in their genomic context. We will also assist users with motif identification software such as the Motif-based sequence analysis suite (MEME) to discover common binding motifs.
Example files provided with your ChIP-Seq results are:
- Alignment file (.SAM or .BAM file)
- Peaks file (in .BED format)
- Peak annotation file
- Binding peak characteristics (percent in promoter regions, intronic regions, intergenic regions).
The facility should be listed in the "Acknowledgements" section of any publication using data generated in the Genome Sequencing Facility as follows:
- For internal members, "Data was generated in the Genome Sequencing Facility which is supported by UTHSCSA, NIH-NCI P30 CA054174 (CTRC at UTHSCSA) and NIH Shared Instrument grant 1S10OD021805-01 (S10 grant)."
- For all other members, "Data was generated in the Genome Sequencing Facility which is supported by NIH Shared Instrument grant 1S10OD021805-01 (S10 grant)."