BioInfo 4U

Florida State University bioinformatics workshop series:

SeqLab and GCG* version 11 Sequence Analysis

(sponsored by the School of Computational Science [SCS])

My regular GCG workshops will no longer be offered. This decision, which came from SCS, is due to three factors:

  • One, the old FSU GCG server has outlived its usefulness: post-warranty hardware failures are prohibitively expensive to repair and its security was compromised, so SCS has retired it.
  • Two, Accelrys, which owns GCG, has decided that GCG is not making the company enough money. Accelrys will concentrate on its drug-design software and will retire the product in June 2008. It is allowing institutions with valid licenses, such as FSU, to continue using the package indefinitely, but operating system (OS) upgrades will eventually render GCG incompatible with the OS. Here's a copy of the letter Accelrys sent all active license managers.
  • Three, FSU and SCS have acquired a new High Performance Computing (HPC) cluster that went online September 11, 2007. I have installed and configured GCG and many public-domain bioinformatics software packages, databases, and appropriate interfaces on this cluster. This transition to HPC will require bioinformatics users to apply for new accounts (see "How to Connect" for details). All necessary user data from the old Mendel installation has been migrated to the new system.

I will develop and offer a new workshop to introduce the HPC system's bioinformatics resources to the FSU life-sciences community after it has been thoroughly tested. Although the GCG workshops will no longer be offered, you may still access my old workshop tutorials and data below. They should still work once you have your new HPC account, though some minor adjustments may be needed.

Author and Instructor: Steven M. Thompson, SCS

(All workshop tutorials are available as PDF files through the links below, and all required data files are available in my Data_Files directory; you're welcome to download them and give 'em a try. Contact me for help. X-server login access to an institutional GCG server is required to actually perform the tutorials.)

Workshop #1 and its appendices

A Brief Introduction to Multiple Sequence Analysis through GCG's SeqLab Interface.

The SeqLab Graphical User Interface (GUI) is a 'front-end' to the Wisconsin Sequence Analysis Package. It provides an intuitive alternative to the command line by allowing menu-driven access to over 100 different programs and is a great way to develop, refine, and analyze multiple sequence alignments. So what's so great about multiple sequence alignments? They are:

  • very useful in the development of PCR primers and hybridization probes;
  • great for producing annotated, publication quality, graphics and illustrations;
  • invaluable in addressing structure/function questions through inference by homology;
  • essential for building sensitive "Profiles" for remote homology similarity searching; and
  • required for molecular evolutionary phylogenetic inference programs such as those from PAUP* (Phylogenetic Analysis Using Parsimony [and other methods]) and PHYLIP (PHYLogeny Inference Package).
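To make the idea of a profile a little more concrete, here is a minimal Python sketch that tallies residue frequencies in each column of an already-aligned set of sequences and derives a simple majority-rule consensus from them. It is only an illustration under assumed conventions ('-' and '.' as gap symbols, equal-length aligned strings, a hypothetical 0.5 threshold, and hypothetical function names); it is not how SeqLab or GCG's own profile programs compute their scores.

```python
from collections import Counter

GAP_SYMBOLS = "-."  # assumed gap characters in the aligned strings


def column_profile(alignment):
    """Return one {residue: frequency} dict per alignment column, ignoring gaps."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("aligned sequences must all have the same length")
    profile = []
    for col in range(length):
        counts = Counter(
            seq[col].upper() for seq in alignment if seq[col] not in GAP_SYMBOLS
        )
        total = sum(counts.values())
        # An all-gap column yields an empty dict (no division happens).
        profile.append({res: n / total for res, n in counts.items()})
    return profile


def consensus(alignment, threshold=0.5, ambiguous="N"):
    """Majority-rule consensus: keep a column's most frequent residue only if
    its frequency exceeds `threshold`; otherwise emit the `ambiguous` symbol."""
    chars = []
    for column in column_profile(alignment):
        if not column:
            chars.append("-")  # every sequence has a gap in this column
            continue
        residue, freq = max(column.items(), key=lambda item: item[1])
        chars.append(residue if freq > threshold else ambiguous)
    return "".join(chars)


aligned = ["ACGT-A", "ACGTTA", "ACCTTA", "TCGTTA"]  # toy example, not real data
print(column_profile(aligned)[0])  # {'A': 0.75, 'T': 0.25}
print(consensus(aligned))          # ACGTTA
```

Columns where one residue dominates are the conserved regions you would look to for primers, probes, or profile searching; columns falling back to 'N' flag variability.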

This introductory workshop will illustrate many of SeqLab's features, though only the 'tip of the iceberg.' Additionally, it will present a short review of the basics: Internet bioinformatics and its limitations, connecting to and using UNIX, file transfer, text editors, and sequence formatting.

Workshop #2

Computational Methods for Rational Oligonucleotide PCR Primer and Hybridization Probe Design and Analysis: Two SeqLab Scenarios, Neither Your Ordinary Primer Design.

The Polymerase Chain Reaction has revolutionized modern molecular biology. From Jurassic Park scenarios in popular novels and the identification of unculturable organisms from extreme environments, to everyday research in countless laboratories across the world and cutting-edge forensic pathology techniques, PCR is being used to analyze smaller quantities of DNA than ever before imagined possible. But ya gotta have primers. These two scenarios go way beyond ordinary notions of primer design:

  • A complicated case where the target DNA is unknown and the sequences are somewhat difficult to align: the "guessmer." This approach is useful for discovering genes in organisms where they have not yet been identified, when the gene's encoded protein sequence is known in several other, related organisms. Here the example is the prion gene in primates.
  • A simpler case where the DNA sequences are known and easily aligned: the Human Papilloma Virus major capsid protein L1, comparing strain-differentiating primers with 'universal' primers.
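A guessmer starts from a stretch of protein that is conserved across the related organisms and back-translates it into a mixture of degenerate oligonucleotides. The minimal Python sketch below, assuming the standard genetic code and the IUPAC nucleotide ambiguity codes, collapses each amino acid's codons position by position into a degenerate primer string. It reports two degeneracy counts: the true number of codon combinations, and the (possibly larger) number of sequences the IUPAC string permits. The `back_translate` name and the example peptides are hypothetical, and this is not the SeqLab procedure used in the workshop.

```python
from collections import defaultdict
from itertools import product

# Standard genetic code; codons enumerated TTT, TTC, TTA, TTG, TCT, ... GGG.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
SYNONYMS = defaultdict(list)
for codon, aa in zip(("".join(c) for c in product(BASES, repeat=3)), AMINO_ACIDS):
    SYNONYMS[aa].append(codon)

# IUPAC nucleotide ambiguity codes keyed by the set of bases they stand for.
IUPAC = {
    frozenset(bases): code
    for bases, code in {
        "A": "A", "C": "C", "G": "G", "T": "T",
        "AG": "R", "CT": "Y", "CG": "S", "AT": "W", "GT": "K", "AC": "M",
        "CGT": "B", "AGT": "D", "ACT": "H", "ACG": "V", "ACGT": "N",
    }.items()
}


def back_translate(peptide):
    """Return (degenerate_oligo, codon_degeneracy, iupac_degeneracy)."""
    oligo, codon_deg, iupac_deg = [], 1, 1
    for aa in peptide.upper():
        if aa not in SYNONYMS:
            raise ValueError(f"unknown amino acid code: {aa!r}")
        codons = SYNONYMS[aa]
        codon_deg *= len(codons)  # distinct codons actually encoding this residue
        for pos in range(3):
            bases = frozenset(codon[pos] for codon in codons)
            oligo.append(IUPAC[bases])
            iupac_deg *= len(bases)  # sequences this IUPAC letter allows
    return "".join(oligo), codon_deg, iupac_deg


print(back_translate("KF"))  # ('AARTTY', 4, 4)
print(back_translate("L"))   # ('YTN', 6, 8)
```

The six-codon amino acids (Leu, Ser, Arg) are where the two counts diverge, since their codons do not collapse cleanly into one IUPAC triplet; that is one reason regions rich in such residues are usually poor choices for a guessmer, while Met and Trp (one codon each) are ideal.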

Workshop #3

Database Searching and Pairwise Comparisons: What's available, the methods, algorithms, and programs, and ascertaining similarity significance.

The dynamic programming algorithm, symbol comparison tables, dot-matrix analysis, motifs, hashing techniques, and heuristics are all covered in this workshop. But what do database searches tell us, and what can we gain from them? Why even bother? Given the nucleotide or amino acid sequence of a biological molecule, what can we know about that molecule? We can find biologically relevant information in it by searching for particular patterns, such as catalogued motifs, that reflect some function of the molecule. But what about comparisons with other sequences? Can we learn about one molecule by comparing it to another? Yes, naturally we can; inference through homology is fundamental to all the biological sciences.
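Dot-matrix analysis is the easiest of these comparison methods to picture. The minimal sketch below, with hypothetical function names and default parameters, slides a window along both sequences and records a dot wherever at least `stringency` of the `window` residues are identical; a plain-text rendering then shows runs of similarity as diagonals. Real dot-plot programs typically score windows with a symbol comparison table rather than simple identity, so plain identity here is an assumption made to keep the example short.

```python
def dot_matrix(seq_a, seq_b, window=3, stringency=2):
    """Return the set of (i, j) window starts where at least `stringency`
    of the `window` aligned residues in seq_a[i:] and seq_b[j:] are identical."""
    a, b = seq_a.upper(), seq_b.upper()
    dots = set()
    for i in range(len(a) - window + 1):
        for j in range(len(b) - window + 1):
            matches = sum(a[i + k] == b[j + k] for k in range(window))
            if matches >= stringency:
                dots.add((i, j))
    return dots


def render(seq_a, seq_b, dots):
    """Text plot: rows follow seq_a, columns follow seq_b; '*' marks a dot."""
    return "\n".join(
        "".join("*" if (i, j) in dots else "." for j in range(len(seq_b)))
        for i in range(len(seq_a))
    )


query = "GATTACAGATTACA"  # toy sequences, chosen to contain a direct repeat
print(render(query, query, dot_matrix(query, query)))
```

Raising `stringency` relative to `window` filters out background noise at the cost of missing weaker similarities; comparing a sequence against itself, as in the usage line, makes internal repeats show up as off-diagonal lines.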

Comparing the conserved portions of sequence amongst a set magnifies all of the sensitivity and power of these computational techniques. But what sort of a comparison is significant, and what level of significance implies homology? This tough question is a major focus of the workshop.
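One common way to ask whether a similarity score means anything is a Monte Carlo shuffle test: score the real pair, score the query against many randomized (shuffled) copies of the other sequence, and express the real score as a Z-score, the number of standard deviations it sits above the shuffled mean. The Python sketch below assumes a simple Smith-Waterman local alignment with hypothetical match, mismatch, and linear gap scores instead of a real symbol comparison table; the function names are also hypothetical. It illustrates the idea only and does not reproduce how any particular GCG program reports significance.

```python
import random
import statistics


def local_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best Smith-Waterman local alignment score with a linear gap penalty."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: a cell never drops below zero.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def shuffle_z_score(a, b, trials=100, seed=0):
    """Z-score of the real score against scores for `trials` shuffles of b."""
    rng = random.Random(seed)  # fixed seed so results are reproducible
    real = local_score(a, b)
    residues = list(b)
    scores = []
    for _ in range(trials):
        rng.shuffle(residues)  # same composition, randomized order
        scores.append(local_score(a, "".join(residues)))
    sd = statistics.stdev(scores)
    if sd == 0:
        raise ValueError("shuffled scores have no spread; use longer sequences")
    return (real - statistics.mean(scores)) / sd


print(local_score("ACGT", "ACGT"))  # 8
print(shuffle_z_score("ACGTTGCAAGCTTAGGCATC", "ACGTTGCAAGCTTAGGCATC"))
```

Because shuffling preserves composition, a high Z-score suggests that the order of the residues, and not just their makeup, is responsible for the similarity.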

Workshop #4

Molecular Evolution: The rationale, methodology, and interpretation of molecular phylogenetic inference software.

"Nothing in biology makes sense except in the light of evolution" (Dobzhansky, 1973). The proper use and interpretation of computational tools for the inference of molecular phylogenies is an extremely complicated subject. Perhaps more blatant errors are reported as truth in the literature in this field than in any other aspect of molecular biology. This workshop will attempt to familiarize you with the basics of most methods: distance-based, maximum parsimony, and maximum likelihood algorithms. The premise is that you know how to create and refine an appropriate multiple sequence alignment. Without this, all methods are guaranteed to fail! Primary emphasis will be on PAUP* as implemented within GCG's SeqLab, but PHYLIP will be reviewed as well.
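Distance-based methods begin by reducing each pair of aligned sequences to a single evolutionary distance. As one simple example, the Python sketch below computes the observed proportion of differing sites (the p-distance) and applies the Jukes-Cantor correction, d = -(3/4) ln(1 - (4/3) p), which accounts for multiple substitutions at the same site under the assumption of equal base frequencies and equal substitution rates. Skipping gapped columns pairwise, and the function names, are assumptions for illustration; PAUP* and PHYLIP offer this model alongside more elaborate ones, and the resulting matrix would then be handed to a tree-building method such as neighbor-joining.

```python
import math


def p_distance(a, b):
    """Proportion of differing sites, skipping columns with a gap in either sequence."""
    if len(a) != len(b):
        raise ValueError("sequences must come from the same alignment")
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper()) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no ungapped columns to compare")
    return sum(x != y for x, y in pairs) / len(pairs)


def jukes_cantor(a, b):
    """Jukes-Cantor corrected distance d = -(3/4) ln(1 - (4/3) p)."""
    p = p_distance(a, b)
    if p >= 0.75:
        return math.inf  # saturated: the correction is undefined
    return -0.75 * math.log(1 - 4 * p / 3)


def distance_matrix(alignment):
    """Pairwise Jukes-Cantor distances for a {name: aligned_sequence} dict."""
    names = list(alignment)
    return {(m, n): jukes_cantor(alignment[m], alignment[n]) for m in names for n in names}


toy = {"seqA": "ACGTACGTAC", "seqB": "ACGTACGTAA"}  # hypothetical aligned sequences
print(distance_matrix(toy)[("seqA", "seqB")])  # p = 0.1, corrected d is about 0.107
```

The correction grows as p approaches 0.75, the proportion of sites at which two random DNA sequences with equal base frequencies are expected to differ; beyond that the logarithm is undefined, which is why the sketch returns infinity.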


I'm available for individual consultation or instruction through my e-mail address, and I will happily e-correspond:

* GCG is the Genetics Computer Group, producer of the Wisconsin Package for sequence analysis, formerly a division of Accelrys, Inc.

© 2013 Steven M. Thompson, acknowledgements and thanks to the Florida State University Biology Department for generously extending Web hosting and e-mail services beyond my FSU tenure.