Term Project Part 2 - Database Searches

Getting started

Your next task is to perform a search of the Ribosomal Database Project with your sequence(s). This will give you a good idea of what kind of organisms your sequences might come from.

Logging in to the RDP web site

The URL of the RDP web site is: http://rdp.cme.msu.edu/

Click on the web address above to go there. This link will open the RDP web page in a new browser window, so you can go back-and-forth between the RDP site and these directions.

Loading your sequence into the RDP

  1. On the RDP web site, click on the link for "myRDP login" on the upper right-hand corner of the page. This takes you to a myRDP login page. You don't need an account to use this, however; just click on the "Test Drive" button. Now you're on the myRDP overview page. This page lists all of the public user data.
  2. Click on the "Upload" button (in the second row of buttons, near the top on the right). On the upload page, use these settings:
    • Choose gene...: Bacterial 16S rRNA (you aren't likely to have an archaeal sequence)
    • Assign group name : enter "MB452" followed by your initials, e.g. "MB452 JWB"
    • Project : enter your full name
    • Choose file to load : click this button, & navigate to wherever you saved the unknown.txt data file for your sequence.
  3. Now click "upload". If there's a problem with your sequence, it'll let you know & return you to the Upload page. If it looks OK, it'll tell you there's 1 sequence in the file & ask if you want to load it - click "Continue".
  4. If you have more than one good sequence, repeat this process with these sequences as well.
  5. Your sequences should now appear at the top of the list on the myRDP Overview page. While you're doing other things, it will align your sequence(s) to the database; when it's done, the "1" will move from the "pending" column to the "A" (aligned) column.
  6. Click on the grey "+" box in front of your sequence listing(s). They should now be red "-" boxes. This adds your sequences to your working list.
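
If the upload in step 3 keeps bouncing you back to the Upload page, it can help to check the file yourself first. The following is a minimal sketch only, and it assumes that unknown.txt is in FASTA format (a ">" header line followed by sequence lines), which these directions don't actually state. The function names and the set of accepted characters are illustrative choices, not anything the RDP itself uses.

```python
from typing import Dict, List

# Assumption: IUPAC nucleotide codes plus "-" for gaps count as valid characters.
VALID_BASES = set("ACGTURYKMSWBDHVN-")


def read_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA-formatted text into {header: sequence}.

    Records that share a header are merged, so keep headers unique.
    """
    records: Dict[str, str] = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            header = line[1:].strip()
            records.setdefault(header, "")
        elif header is None:
            raise ValueError("sequence data found before any '>' header line")
        else:
            records[header] += line.upper()
    return records


def check_records(records: Dict[str, str]) -> List[str]:
    """Return a list of readable problems; an empty list means none were found."""
    problems = []
    if not records:
        problems.append("no sequences found")
    for header, seq in records.items():
        bad = sorted(set(seq) - VALID_BASES)
        if not seq:
            problems.append(f"{header}: empty sequence")
        elif bad:
            problems.append(f"{header}: unexpected characters {bad}")
    return problems
```

To use it, read your file with open("unknown.txt").read(), pass the text to read_fasta, and then pass the result to check_records. The number of records should match the number of sequences the RDP reports before you click "Continue".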

Sequence Match

Sequence Match is used to identify the most similar sequences in the RDP to yours.

  1. Now, click on the link to "SeqMatch" in the menubar at the top of the page.
  2. Scroll to the bottom of the page, & use the following settings:
    • Strains : Type - to show only defined species in the results
    • Source : Isolates - to exclude sequences from uncultivated organisms.
    • Size : >1200 - so that only nearly full-length sequences are included.
    • Quality : Good - to exclude potentially poor sequence data.
    • Taxonomy : Nomenclatural - this uses a nice consistent taxonomy developed by the RDP
    • KNN matches : 20 - so the best 20 sequence matches will be shown.
  3. Click on the "Do Seqmatch with Selected Sequence" button and wait for the results - usually less than a minute.
  4. Look at the "Hierarchy View" - this gives you the taxonomy (lineage) of the sequence(s) as the RDP sees it; Domain, Phylum, &c, &c, down usually to the genus (depending on how closely related your sequence is to something in the database).
  5. Click "View Selectable Matches" to see the details. There will be a list of the 20 best matches in the database, and the similarity of these sequences (S_ab) is shown in orange (the similarity score in purple might not be calculated). S_ab is a complex similarity score, but 2 identical sequences will have a score of 1.0, and the closer the score is to 1.0, the more similar the sequences are.
    • If none of the matches are very close (less than 0.5), try going back and changing the "Size" setting from the default of ">1200" to "Both" to allow the program to search the shorter sequences in the database. If you can't get any matches above S_ab=0.5, please see me about it ASAP.
  6. Once you have an informative lineage, print out this page.
  7. Look through the resulting sequence list and find the best match (highest S_ab or similarity scores), and click on the number in front of it (it should look something like "S000463918") to pull up its sequence record. Print out this page. If you have ties, print them all out.
  8. If you're going to do the "Classifier" next, skip to the next section. If you're going to go ahead and generate trees right away, you should select all of the 20 best matches shown. If you have more than one with the same genus and species name, select the one that's the closest match to your sequence and leave the rest unselected. Make sure to "Save selections and return to summary".
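
To make the S_ab score from step 5 less of a black box, here is a rough sketch of the idea behind it. It assumes the definition given in the RDP's own help pages: the number of unique 7-base words shared by two sequences, divided by the number of unique words in whichever sequence has fewer. The real SeqMatch program certainly handles details this sketch ignores (ambiguous bases, for one), so treat it as an illustration rather than a reimplementation. The function names are made up for this example.

```python
from typing import Dict, List, Set, Tuple

WORD_SIZE = 7  # word (oligomer) length; assumed from the RDP's description of S_ab


def unique_words(seq: str, k: int = WORD_SIZE) -> Set[str]:
    """Return the set of distinct overlapping k-base words in a sequence."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def s_ab(a: str, b: str, k: int = WORD_SIZE) -> float:
    """Shared unique words divided by the smaller of the two word counts."""
    words_a, words_b = unique_words(a, k), unique_words(b, k)
    smaller = min(len(words_a), len(words_b))
    if smaller == 0:
        return 0.0  # at least one sequence is shorter than one word
    return len(words_a & words_b) / smaller


def best_matches(query: str, database: Dict[str, str]) -> List[Tuple[str, float]]:
    """Return every database entry tied for the highest score against the query."""
    scores = {name: s_ab(query, seq) for name, seq in database.items()}
    if not scores:
        return []
    top = max(scores.values())
    return [(name, score) for name, score in scores.items() if score == top]
```

Two identical sequences share all of their words, so s_ab returns 1.0, which is why a score of 1.0 means a perfect match. best_matches returns all entries tied for the top score, which mirrors the instruction in step 7 to print out every tie.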


Classifier

Classifier is used to estimate the taxonomy of your sequence.

  1. Click the link to "Classifier" in the menubar at the top of the page.
  2. Make sure the "Choose a gene" menu is set to "16S rRNA training set 10".
  3. Click the "Do Classification with Selected Sequences" button and wait for the results - it should just be a few seconds.
  4. Look at the "Hierarchy View" - this gives you the taxonomy (lineage) of the sequence according to this analysis. This should look a lot like the results for the "Sequence Match".
  5. Change the "Confidence Threshold" to its lowest level - 50% - and see if this changes the result (it usually doesn't).
  6. Print out this page.
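
The effect of the "Confidence Threshold" in step 5 can be pictured with a small sketch. It assumes the Classifier attaches a confidence value to each rank of the lineage and stops reporting names at the first rank that falls below the threshold. The function name, the tuple layout, and the confidence values shown in the usage note are all invented for illustration; they are not output from the RDP.

```python
from typing import List, Tuple

# (rank, name, confidence between 0 and 1); a hypothetical layout for this sketch
Assignment = Tuple[str, str, float]


def apply_threshold(lineage: List[Assignment], threshold: float) -> List[Assignment]:
    """Keep ranks from the top down until confidence drops below the threshold."""
    kept = []
    for rank, name, confidence in lineage:
        if confidence < threshold:
            break  # this rank and everything below it is left unassigned
        kept.append((rank, name, confidence))
    return kept
```

For example, with a made-up lineage of [("domain", "Bacteria", 1.0), ("family", "Pseudomonadaceae", 0.95), ("genus", "Pseudomonas", 0.62)], a threshold of 0.8 keeps only the domain and family, while a threshold of 0.5 also keeps the genus. When every rank already has high confidence, lowering the threshold changes nothing, which is the usual outcome.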

Critical reminder!

Remember that what you have identified is the closest relative of your isolate whose 16S rRNA sequence is available in the RDP. You have not identified your isolate unless it is a perfect match - and even then you can't be sure!