An Introduction to Bioinformatics -- Laboratory Links
Introduction to Bioinformatics -- Lab Syllabus
Spring 2008 Laboratory Section: Wednesdays from 3:35 to 5:45
PM in Dirac 152.
Course lectures include some demonstration of biocomputing techniques
with local and remote servers. However, in our experience, learning
occurs much more readily when students work with real data and actual
biocomputing software. In the lab, students apply theory learned in
lecture to experimental settings, yielding an advanced understanding of
evolution, form, and function.
Steve Thompson is available to assist students in using their own
laboratory computers and the Dirac or Conradi Computing Lab computers for
GCG server access, and to help with their term projects throughout the
semester.
The order of the labs roughly follows the order of lectures in the
course. Exceptions are made only to maintain the project-like progression
of the labs, in which each tutorial builds on the previous one.
Lab Reports are to be completed online each week using the provided
form, and are due any time before the following week's lab session.
- Lab 1, Wed. Jan. 9, 2008:
An introduction to the
computing platforms on which the course is taught
(pdf)
(Lab
Report #1).
- This includes background information on computers in general,
all forms of remote computing, text editing, basics of the UNIX operating
system, and the X environment, as well as requesting your new FSU HPC
account.
- Lab 2, Wed. Jan. 16, 2008:
Molecular databases and how they are organized and accessed
(pdf)
(Lab
Report #2).
- Internet sequence and structural databases will be reviewed, along
with a brief introduction to the Wisconsin Package (aka Genetics Computer
Group or GCG), its graphical user interface (GUI) SeqLab, and the
on-site GCG sequence databases. Access methods will be emphasized, both
those available on the WWW, including NCBI's Entrez, and those available
locally through GCG's LookUp. Data entry and format conversion are also
covered.
- Lab 3, Wed. Jan. 23, 2008:
Unknown DNA -- rational probe design and analysis --
the "guessmer"
(pdf)
(Lab
Report #3).
- How to design and analyze oligonucleotide primers for
discovering a gene in an organism where it has not yet been identified,
when the gene's encoded protein sequence is known in other organisms.
Techniques used include basic multiple sequence alignment, consensus
creation, back translation, and primer discovery and evaluation.
- Lab 4, Wed. Jan. 30, 2008:
DNA fragment contig assembly (GCG's SeqMerge)
and restriction enzyme mapping
(pdf)
(Lab
Report #4).
- How to get sequencing fragment data from an automated sequencer into
the computer and assembled into a contiguous sequence (contig) using
GCG's SeqMerge, and then how to perform restriction enzyme mapping and
compositional analysis on that contig for subcloning and other
purposes.
- Open Lab, Wed. Feb. 6, 2008.
- I'm helping to teach a workshop at the CDC this week.
See you next week.
- Lab 5, Wed. Feb. 13, 2008:
Database similarity searching and the dynamic programming algorithm
(pdf)
(Lab
Report #5).
- What's available, the methods and algorithms, their
limitations, and the significance of their findings. When dealing with
coding sequences, you should never search DNA against DNA; use six-frame
'blind' translation instead. Searching methodology: motifs, substitution
matrices, hashing and heuristics, homology versus similarity, dot matrix
analysis, pair-wise comparisons, and significance testing.
- Lab 6, Wed. Feb. 20, 2008:
Gene finding strategies. How are coding sequences
recognized in genomic DNA
(pdf)
(Lab
Report #6)?
- Searching by signal (i.e. transcriptional/translational regulatory
sites and exon/intron splice sites) versus searching by content
('nonrandomness' and codon usage), and homology inference.
Understanding the concepts and limitations of the methods and
differentiating between the approaches.
- Lab 7, Wed. Feb. 27, 2008:
Multiple sequence alignment, expectation
maximization, profiles, and Markov models
(pdf)
(Lab
Report #7).
- Lab covers: 1) using MEME to discover hidden motifs; 2) running
the progressive, pairwise alignment program PileUp with the SeqLab editor
to develop and refine a multiple sequence alignment, and contrasting that
with the SeaView/MAFFT pair;
3) understanding traditional Gribskov profiles and using HMM
profiles for remote similarity searching and further alignment;
4) visualization and annotation techniques for multiple sequence
alignments.
- Lab 8, Wed. Mar. 5, 2008:
Molecular evolutionary phylogenetic inference
(pdf)
(Lab
Report #8).
- How to use PAUP* (Phylogenetic Analysis Using Parsimony [and
Other Methods]), PHYLIP (PHYLogeny Inference Package), and other tools to
infer and draw phylogenetic trees from multiple sequence alignment
datasets. Emphasis is placed on the reliability, congruence, and accuracy
of model-based approaches, especially using Maximum Likelihood
methods, though time limits restrict the lab to quicker methods.
- Spring
Break! Wed. Mar. 12, 2008
- Lab 9, Wed. Mar. 19, 2008:
Estimating protein secondary structure and physical attributes
(pdf)
(Lab
Report #9).
- The various methods, their usefulness, and their limitations
are all covered. This includes proteolytic digestion mapping, molecular
weight and amino acid composition determination, isoelectric point
estimation, hydrophobicity and hydrophobic moment determinations, surface
probability and antigenicity mapping, and secondary structure prediction,
particularly using methods based on homology inference (e.g.
PredictProtein,
http://cubic.bioc.columbia.edu/predictprotein/, in North America).
- Lab 10, Wed. Mar. 26, 2008:
Molecular modelling and visualization
(pdf)
(Lab
Report #10).
- Homology modelling combines sequence analysis and molecular
modelling to predict three-dimensional structure. Students pick a
homologue of their chosen protein whose structure has not yet been
solved and use the SwissModel WWW resource
(http://www.expasy.org/swissmod/SWISS-MODEL.html)
to model the molecule. The theoretical structure is then visualized with
RasMol (http://www.openrasmol.org/)
and Swiss PDB View (http://au.expasy.org/spdbv/)
to gain insight into the way in which its structure relates to its
function. Several techniques assist in the interpretation: color coding
of physical attributes such as residue charge, hydrophobicity, and
secondary structure elements; different representation models, such as
alpha-carbon traces; and superposition of the model on an actual
structure.
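As an illustration of the six-frame 'blind' translation mentioned under Lab 5, the following is a minimal Python sketch, not the method used by GCG or any other course software. It assumes the standard genetic code and an unambiguous A/C/G/T alphabet; the function names (reverse_complement, translate, six_frame_translation) are hypothetical and chosen only for this example. The idea is that a coding region of unknown strand and phase will appear in exactly one of the six reading frames, so comparing at the protein level requires translating all of them.

```python
from itertools import product

# Standard genetic code, written in the conventional T, C, A, G order
# (first codon position varies slowest). '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of a DNA string (A/C/G/T only)."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons from position 0; unknown codons become 'X'."""
    dna = dna.upper()
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X")
        for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translation(dna):
    """Translate all three frames of both strands.

    Keys are '+1', '+2', '+3' for the given strand and
    '-1', '-2', '-3' for its reverse complement.
    """
    frames = {}
    for strand, seq in (("+", dna.upper()), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


if __name__ == "__main__":
    for name, protein in six_frame_translation("ATGGCCTAA").items():
        print(name, protein)
```

For the nine-base example ATGGCCTAA, frame +1 reads methionine, alanine, and then a stop codon. A real search tool would also handle ambiguity codes and alternative genetic codes, which this sketch leaves out.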
After students have had their introduction to basic UNIX concepts,
utility operations, editing procedures, and molecular databases within the
first couple of weeks, they decide on a protein of current interest from a
list of molecules for which complete structural coordinates are known.
They then perform all of the laboratory computer exercises upon that
particular molecule. This way they are able to gain experience in all
aspects of biocomputing in the course in a project-oriented fashion using
the same natural progression as would be used in an actual experimental
setting.
Resultant predictive data derived from sequence analysis will no doubt
conflict with aspects of the known structural data, but elements of truth
will also be found. In this way the strengths and weaknesses of each
approach can be better understood and a greater empathy can be found for
the tremendous problems encountered in the all-too-common case of a newly
sequenced gene product without any structural information available.
With this approach to computerized molecular biology, students will
"come full circle," gaining appreciation for the full
biocomputing spectrum available.
This structured tutorial sequence lasts for the first two thirds of
the semester (ten weeks). Once the laboratory tutorial portion of the
course is complete, students devote the scheduled lab sessions to working
on their individual research projects. Students should begin a dialogue
with their instructors regarding their project topic early in the
semester, and will be required to submit a one-page project proposal
no later than March 5. Students are encouraged to choose term
projects related to their academic research. This helps to ensure
excellence by giving students a vested interest in the work.