I thought I'd put together some simple instructions for anyone who wants to learn how to align sequences and understand what they see.
There are other ways to do this, but I think this is the easiest one to begin with. If you choose, you can download the sequences and the MUSCLE program to your hard drive and work offline. Genbank does have this program installed, and you may find it easier to use theirs or the one at another site.
I work with 2 Internet Explorer windows open; Opera is not compatible and I'm not sure about Firefox.
-----------------------------------------
As an example, we will compare 2 sequences that have the H274Y mutation, which confers Tamiflu resistance, with 1 sequence that doesn't have it. This will help you see where the mutation is if you don't know what position to look for.
We need to look at the NA segment to find this particular mutation.
1. Start with the program found here:
2. This page contains the latest H1N1 sequences at Genbank:
3. At the bottom of the page, there is a small box to click to give access to all the sequences. Open it and scroll down to Nov 20, where we see Genbank noted that Pavia/21 contains that mutation; click on GU216651* and that sequence segment will open.
Edit: For convenience, so you don't have to search through the long list, I made a file for these 3 sequences.
4. Almost at the top of the page, we see "FASTA"; that is the format we want to use. Click it and the segment will reload.
5. We need to cut and paste that information starting with that little ">gi" and continuing all the way to the end of the rows of letters.
6. Now we go to the MUSCLE window and paste that into the box where it says to enter a sequence. Make sure the FASTA box is checked.
7. Go back to the Genbank page and click on A/Omsk/02/2009, which is a couple of sequences down the page from Pavia/21. Omsk does not have the mutation.
8. Do the same process with Omsk. Add this one on the next line in the MUSCLE box, just like we did the first one. The entries will look strange: there will be one long line and one single letter, one long line, one letter, etc. Do not change this.
9. Scroll down to Oct 13, where there is A/Quebec/147365/2009. Open FN434454* and repeat what we did with the other two.
10. After we've entered our 3rd sequence in the MUSCLE box, click on "RUN" (and wait a while if you're on dialup). A window will come up as our job processes, and when it's done we will see "Start Jalview". Click it.
11. A window will open with our segments lined up in different colors and a black bar on the bottom with a scroll bar. Up on the left of the colored bar, mouse over each of the info lines and a small window will pop up with segment info. Hopefully, the 2 with the mutation (Pavia, Quebec) will be the 2 top ones.
12. Slowly move the scroll bar to the right and the single nucleotide mutations will appear. See the white rectangle in the black bar? When we put the mouse over the "C", the position will appear at the bottom left; it should be 218 C. Now, click on the C and a red bar will appear at the top, serving as a visual aid.
13. We know 3 nucleotides make 1 amino acid, so the amino acid change H274Y should be near position 822 (274 x 3). We continue to scroll until we reach that point. At position 831, we see that white rectangle in the black bar and a "C" in a white box in the colored bar. Note that the two letters above it are TT.
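If you would rather check the differences with a script than by scrolling, here is a rough Python sketch of what steps 11 to 13 do by eye. It assumes you have saved the aligned output (all sequences the same length, with gaps already inserted by MUSCLE) as FASTA text. The function names parse_fasta and differing_columns are my own, and the toy sequences are made up, not real NA data.

```python
def parse_fasta(text):
    """Parse FASTA text into a list of (header, sequence) pairs."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line.upper())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def differing_columns(aligned):
    """Return (1-based position, letters) for each column where the sequences disagree.

    Assumes the sequences are already aligned and equal in length;
    zip() would silently stop at the shortest one otherwise.
    """
    seqs = [seq for _, seq in aligned]
    diffs = []
    for pos, column in enumerate(zip(*seqs), start=1):
        if len(set(column)) > 1:
            diffs.append((pos, "".join(column)))
    return diffs


# Made-up toy alignment for illustration only
toy = """>seq_a
ATGAATCCAA
>seq_b
ATGAATCCAA
>seq_c
ATGAACCCAA
"""

for pos, letters in differing_columns(parse_fasta(toy)):
    print(pos, letters)
```

The letters in each reported column are in the same order as the sequences in the file, so a column like TTC means the first two sequences have T and the third has C, which is the same pattern we saw on screen at position 831.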
------------------------------------------------
Now, this is where my knowledge pretty much ends; I'm not sure if #14 is 100% correct or why we see the mutation at position 831 instead of 822.
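For what it's worth, here is a small Python sketch of the position arithmetic, which may explain part of the gap. The function name codon_span and the cds_start parameter are hypothetical; cds_start stands for the position of the first letter of the start codon in the Genbank record, and I have not checked what that is for these 3 records. The one thing I am sure of is that codon n covers nucleotides 3n-2 to 3n counted from the start codon, so 274 x 3 = 822 is the last letter of codon 274, not the first.

```python
def codon_span(aa_position, cds_start=1):
    """Return the 1-based (first, last) nucleotide positions of a codon.

    aa_position: amino acid number, counting the start codon as 1.
    cds_start:   position of the first letter of the start codon in the
                 sequence (hypothetical parameter; 1 means no extra
                 letters in front of the coding region).
    """
    first = cds_start + 3 * (aa_position - 1)
    return first, first + 2


# Histidine and tyrosine codons in the standard genetic code
HIS = {"CAT", "CAC"}
TYR = {"TAT", "TAC"}


def is_h_to_y(ref_codon, alt_codon):
    """True if the codon change turns histidine (H) into tyrosine (Y)."""
    return ref_codon.upper() in HIS and alt_codon.upper() in TYR


print(codon_span(274))
print(is_h_to_y("CAC", "TAC"))
```

An H to Y change needs a C to T switch in the first letter of the codon, which fits the C versus TT we see on screen. So position 831 would be the first letter of the codon, and any letters in the record ahead of the start codon would push it further right than the simple 274 x 3 estimate. As far as I know, H274Y is also written as H275Y when the N1 numbering is used, which shifts things by another 3; calling codon_span(275, cds_start=9) gives (831, 833), but that starting offset is a guess on my part and not something I have verified.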
I hope someone with more knowledge will chime in and further explain how to interpret what we see.