Bioinformatics 101
The term Bioinformatics comes up increasingly in the applications I come across, so I began looking into it to understand the issues scientists face when using tools and databases to accomplish their tasks. Bioinformatics is defined as “the task of organizing and analyzing increasingly complex data resulting from modern molecular and biochemical techniques.”
I spoke with Mia Markey at the University of Texas, who gave me some pointers for researching this area. The first area to research is the existence and use of databases. A wealth of information exists in these databases, and most are open to researchers around the world at no cost and can be searched over the web. The key databases are:
NCBI – sequence data
Stanford Microarray Database – gene expression data
Swiss-Prot – protein sequence database
EMBL-EBI – European Nucleotide Sequence DB
PubMed – scientific papers database
In addition to databases, a number of tools are used such as the following:
BLAST – sequence matcher for alignment applications.
SAGE (Serial Analysis of Gene Expression) – provides absolute transcript counts, unlike microarrays, which provide relative expression levels.
FASTA – alignment tool for protein and nucleotide sequences
Perl – useful for text string searches. See BioPerl http://bioperl.org/
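To give a feel for the kind of text string search mentioned for Perl, here is a minimal sketch in Python. The function name `find_motif` and the DNA fragment are my own inventions for illustration. This is plain pattern matching, not the alignment that BLAST or FASTA perform.

```python
import re
from typing import List


def find_motif(sequence: str, motif: str) -> List[int]:
    """Return the 0-based start position of every match, overlaps included."""
    # A lookahead matches without consuming characters, so overlapping
    # occurrences of the motif are all reported.
    pattern = re.compile(f"(?={re.escape(motif.upper())})")
    return [match.start() for match in pattern.finditer(sequence.upper())]


# Made-up DNA fragment, used only to illustrate the search.
dna = "ATGCGATATATGCC"
print(find_motif(dna, "ATA"))  # prints [5, 7]
```

Sequences are upper-cased before matching, so a lower-case sequence is searched the same way.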
According to Markey, the number one problem scientific researchers face is that results from one test do not hold up under repeated testing. The number two problem is the need for better visualization tools for all the voluminous data available.
I wanted to know what a scientist experiences when using the above-mentioned databases, so I found an example guide that walked me through the process. If you are interested in seeing it for yourself, try these steps:
1. Go to GeneCards at nciarray.nci.nih.gov/cards/index.html
2. Type in the name of your disease in the search window.
3. Make note of the gene name and chromosomal location for each gene.
4. Go to the MapViewer at the NCBI Bioinformatics website at www.ncbi.nlm.nih.gov to visually identify the location of each gene.
5. Choose a gene for which a protein product has been identified. You can check the box titled “proteins” for this information.
6. Click on the UniGene Cluster # or RefSeq # under sequence. The RefSeq number starts with “NM”. Make note of the gene name and its number.
7. Find the function of the protein in the Swiss-Prot database, http://us.expasy.org/sprot/
8. Find the amino acid sequence for the protein through LocusLink: go to the LocusLink page on the NCBI web site (see step 4) and type the gene name into the search box.
9. Finally, use PubMed, http://www.pubmedcentral.nih.gov/, to look up any published papers on the topic.
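The last step can also be scripted rather than clicked through. The sketch below is not part of the guide I followed. It assumes NCBI's E-utilities `esearch` endpoint and its `db`, `term` and `retmax` parameters, and the helper name `pubmed_search_url` is hypothetical. It only builds the request address and does not send anything over the network.

```python
from urllib.parse import urlencode

# Address of NCBI's E-utilities search endpoint. Using it here is an
# assumption of this sketch; the steps above use the web pages instead.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def pubmed_search_url(term: str, max_results: int = 20) -> str:
    """Build (but do not send) a PubMed search request for the given term."""
    params = {"db": "pubmed", "term": term, "retmax": max_results}
    return f"{ESEARCH_URL}?{urlencode(params)}"


print(pubmed_search_url("cystic fibrosis"))
```

Pasting the printed address into a browser should return the matching PubMed record IDs, which is roughly what the search box does behind the scenes.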
As you can see, the information is spread among several databases, and even a casual search starts to generate tremendous amounts of information that must be integrated and analyzed to make sense of it.
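To make the integration problem concrete, here is a rough sketch of the bookkeeping involved: records about the same gene arrive from different sources and have to be combined by gene name. Everything in it is a placeholder I made up for illustration (the gene names, the field names and the `merge_by_gene` helper); real database records are far richer.

```python
from collections import defaultdict

# Placeholder records standing in for what each database might return.
# All names and values are invented for illustration only.
location_records = [
    {"gene": "GENE_A", "chromosome": "placeholder location A"},
    {"gene": "GENE_B", "chromosome": "placeholder location B"},
]
sequence_records = [{"gene": "GENE_A", "refseq": "NM_placeholder"}]
function_records = [{"gene": "GENE_A", "function": "placeholder function note"}]


def merge_by_gene(*sources):
    """Combine records from several sources into one dict per gene name."""
    merged = defaultdict(dict)
    for source in sources:
        for record in source:
            # Later sources add fields to (or overwrite fields of) the entry.
            merged[record["gene"]].update(record)
    return dict(merged)


combined = merge_by_gene(location_records, sequence_records, function_records)
for gene, fields in combined.items():
    print(gene, sorted(fields))
```

Even in this toy version, GENE_B ends up with only a location, which hints at how quickly gaps and mismatches between sources appear.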
This blog is not meant to be a complete tutorial on Bioinformatics, but I found it informative to walk through the above steps to get a flavor for the type of data and analysis involved.
If you have experience with Bioinformatics or an interest in this area, please email me at hall.martin@ni.com.
Best regards,
Hall T. Martin
2 Comments:
Some visualization tools used in drug design research and development:
RasMol: Easy to use, freeware, and accepts many data formats that can be downloaded easily from the Protein Data Bank
http://www.umass.edu/microbio/rasmol/
MOE: A trial version is available at
http://www.chemcomp.com/
A very useful tool for computing, visualizing and simulating protein interactions.