Sequence Analysis Using MUSCLE

  • Sequence Analysis Using MUSCLE

    I thought I'd put together some simple instructions for anyone who wants to learn how to align sequences and understand what they see.

    There are other ways to do this, but I think this is the easiest way to begin. If you prefer, you can download the sequences and the MUSCLE program to your hard drive and work offline. Genbank also has this program installed, and you may find it easier to use theirs or one from another site.

    I work with two Internet Explorer windows open; Opera is not compatible, and I'm not sure about Firefox.

    -----------------------------------------

    As an example, we will compare two sequences carrying the H274Y mutation, which confers Tamiflu resistance, with one that lacks it. This will help you find the mutation even if you don't know which position to look for.

    We need to look at the NA (neuraminidase) segment to find this particular mutation.

    1. Start with the program found here:
    Data resources and analysis tools to support life science research


    2. This page contains the latest H1N1 sequences at Genbank:


    3. At the bottom of the page, there is a small box to click that gives access to all the sequences. Open it and scroll down to Nov 20, where Genbank notes that Pavia/21 carries this mutation; click on GU216651* and that sequence segment will open.

    Edit: For convenience, so you don't have to search through the long list, I made a file for these 3 sequences.


    4. Near the top of the page, we see "FASTA"; that is the format we want to use. Click it and the segment will reload in that format.

    5. We need to copy and paste that information, starting with the little ">gi" and continuing all the way to the end of the rows of letters.

    6. Now we go to the MUSCLE window and paste that into the box where it says to enter a sequence. Make sure the FASTA box is checked.

    7. Go back to the Genbank page and click on A/Omsk/02/2009, which is a couple of sequences down the page from Pavia/21. Omsk does not have the mutation.

    8. Do the same with Omsk, adding it on the next line of the MUSCLE box just as we did the first one. The entries will look strange: one long line and one single letter, one long line, one letter, and so on. Do not change this.

    9. Scroll down to Oct 13, where there is A/Quebec/147365/2009; open FN434454* and repeat what we did with the other two.

    10. After entering the third sequence in the MUSCLE box, click "RUN" (and wait a while if you're on dial-up). A window will come up while the job processes; when it's done, we will see "Start Jalview". Click it.

    11. A window will open with our segments lined up in different colors and a black bar at the bottom with a scroll bar. To the left of the colored bar, mouse over each of the info lines and a small window will pop up with the segment info. Hopefully, the two with the mutation (Pavia, Quebec) will be the top two.

    12. Slowly move the scroll bar to the right and the single nucleotide mutations will appear. See the white rectangle in the black bar? When we put the mouse over the "C", the position will appear at the bottom left; it should be 218 C. Now, click on the C and a red bar will appear at the top, serving as a visual aid.

    13. We know that 3 nucleotides make 1 amino acid, so the amino acid change H274Y should be near position 822 (274 x 3); we continue to scroll until we reach that point. At position 831, we see the white rectangle in the black bar and a "C" in a white box in the colored bar. Note that the two letters above it are TT.
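    Steps 12 and 13 amount to scanning the alignment column by column for places where the sequences disagree. A minimal Python sketch of that scan, assuming the aligned sequences have been saved as FASTA text (one ">" header per sequence, gaps written as "-"); `parse_fasta` and `variable_columns` are hypothetical helper names, and the toy sequences are invented, not real NA data. Note that the columns it reports are alignment columns, which can differ from positions in any single sequence when there are gaps or different start points.

    ```python
    def parse_fasta(text):
        """Parse FASTA text into a dict {header: sequence}, keeping input order."""
        records = {}
        header = None
        for line in text.splitlines():
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                header = line[1:].strip()
                records[header] = []
            elif header is not None:  # sequence lines before any header are ignored
                records[header].append(line.upper())
        return {h: "".join(parts) for h, parts in records.items()}


    def variable_columns(aligned):
        """Return the 1-based alignment columns where the sequences do not all agree."""
        seqs = list(aligned.values())
        length = len(seqs[0])
        if any(len(s) != length for s in seqs):
            raise ValueError("aligned sequences must all have the same length")
        return [i + 1 for i in range(length) if len({s[i] for s in seqs}) > 1]


    toy = ">mutant_1\nATGTACAA\n>mutant_2\nATGTACAA\n>wild\nATGCACAA\n"
    print(variable_columns(parse_fasta(toy)))  # the two mutants show T where wild has C
    ```
    
    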

    ------------------------------------------------

    Now, this is where my knowledge pretty much ends; I'm not sure if #13 is 100% correct, or why we see the mutation at position 831 instead of 822.

    I hope someone with more knowledge will chime in and further explain how to interpret what we see.
    The salvage of human life ought to be placed above barter and exchange ~ Louis Harris, 1918

  • #2
    Re: Sequence Analysis Using MUSCLE

    I just tried MUSCLE, but it ran out of memory and seems a little slow.
    I had been using kalign.exe and MAFFT online.

    For simple flu alignments (no insertions or deletions, and no mixing
    of different HAs, NAs or NSs), I have a simple self-written program
    which is very fast and needs little memory.
    For example, I can align 7000 PB2s in 2 minutes.
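    The post does not show that program. As a rough idea of why gap-free alignment is cheap, here is a hypothetical Python sketch of an *ungapped* alignment: each query is slid along one reference and placed at the offset with the most matching bases. It assumes no insertions or deletions, as the post does, and it is not the poster's code; `best_offset`, `pad_to_reference` and `max_shift` are illustrative names, and the sequences are toy data.

    ```python
    def best_offset(reference, query, max_shift=60):
        """Return the shift of query against reference that gives the most matching bases.

        Ungapped: the query is slid along the reference without inserting gaps.
        A positive shift means query[0] lines up with reference[shift].
        """
        best_shift, best_matches = 0, -1
        for shift in range(-max_shift, max_shift + 1):
            matches = 0
            for i, base in enumerate(query):
                j = i + shift
                if 0 <= j < len(reference) and reference[j] == base:
                    matches += 1
            if matches > best_matches:
                best_shift, best_matches = shift, matches
        return best_shift


    def pad_to_reference(reference, query, max_shift=60):
        """Place query under reference at its best shift, padding with '-' to equal length."""
        shift = best_offset(reference, query, max_shift)
        placed = "-" * shift + query if shift >= 0 else query[-shift:]
        return placed[: len(reference)].ljust(len(reference), "-")


    ref = "AGCAAAAGCAGGATGAATCCAAAC"
    print(pad_to_reference(ref, "AAGCAGGATGAATCC"))
    ```

    Because each sequence is compared only with the reference, memory use stays small no matter how many sequences are processed, which fits the speed and memory claims above; the trade-off is that any real insertion or deletion will throw the placement off.
    
    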
    I'm interested in expert panflu damage estimates
    my current links: http://bit.ly/hFI7H ILI-charts: http://bit.ly/CcRgT



    • #3
      Re: Sequence Analysis Using MUSCLE

      The nucleotide sequences start with ~50 nucleotides
      which are not translated into amino acids.
      Often only part of these 50 is given, or none at all.
      The first occurrence of "ATG" is usually the first translated amino acid
      (Methionine, Met, M).

      Also, H274Y is H275Y in N1 numbering (as Niman writes it),
      and D225G is D239G in H1 numbering.
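      The rule of thumb above can be sketched in a few lines of Python, assuming the first ATG really is the start codon (usually, but not always, true). Positions are 1-based; `coding_start` and `to_coding_position` are hypothetical helper names, and the example sequence is a toy.

      ```python
      def coding_start(seq):
          """Return the 1-based position of the A in the first 'ATG', or None if there is none."""
          index = seq.upper().find("ATG")
          return index + 1 if index >= 0 else None


      def to_coding_position(seq_position, start):
          """Convert a 1-based sequence position to a 1-based coding position.

          start is the 1-based position of the A of the first ATG, which becomes coding position 1.
          """
          return seq_position - start + 1


      start = coding_start("agcaaaagcaggatgaatcc")
      print(start, to_coding_position(start, start))
      ```

      With the first "atg" at position 9, as in the example discussed in this thread, alignment position 831 becomes coding position 831 - 9 + 1 = 823.
      
      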


      • #4
        Re: Sequence Analysis Using MUSCLE

        OK! In the example I'm showing, the first "atg" starts at position 9 in all three sequences.

        "We see the mutation at position 831 instead of 822"; so if I subtract 8, I'm at position 823... am I still 1 off? Niman's 275 makes it worse.

        When I've seen nucleotides and amino acids aligned for comparison, the M is under the T... so when I count, is the M position considered #1 or #2?

        BTW, thank you for all your hours and patience.


        • #5
          Re: Sequence Analysis Using MUSCLE

          C823T(6,n)=H275Y(NA) CAC-->TAC

          This is counting from the start of the coding region (the first amino acid, ATG), which I think is unusual.

          for ******:
          S224P in PA is T670C(3)
          M582L in PA is A1741C(3)
          S91P in HA is T298C(4)
          S206T in HA is T658A(4)
          V323I in HA is G1012A(4)
          V100I in NP is G298A(5)
          T373I in NP is C1118T(5)
          V106I in NA is G316A(6)
          N247D in NA is A742G(6)
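          The arithmetic behind posts #4 and #5 can be checked with a short Python sketch. Counting the A of the first ATG as coding position 1 (so M is amino acid 1), codon n covers coding positions 3n-2, 3n-1 and 3n. Position 822 (274 x 3) is therefore the *last* base of codon 274, while 823 is the *first* base of codon 275, which agrees with C823T = H275Y (CAC to TAC) above. `codon_of` is a hypothetical helper name.

          ```python
          def codon_of(coding_position):
              """Map a 1-based coding nucleotide position to (codon number, position within codon).

              Codon n covers coding positions 3n-2, 3n-1 and 3n; the ATG start codon is codon 1.
              """
              if coding_position < 1:
                  raise ValueError("coding positions start at 1 (the A of ATG)")
              return (coding_position - 1) // 3 + 1, (coding_position - 1) % 3 + 1


              # e.g. 823 -> first base of codon 275, 822 -> third base of codon 274
          for pos in (822, 823, 1118):
              print(pos, codon_of(pos))
          ```

          Some entries in the list above (the NP ones, V106I in NA, S224P in PA) come out exactly with this formula; others, such as the HA entries, do not, presumably because they follow a different numbering convention or starting point.
          
          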




          All three positions in a codon can mutate; see the list below.

          Mutations at position 3 of a codon (3 consecutive nucleotides)
          are usually synonymous
          (they don't change the encoded amino acid).



          Alanine,Ala,A,4,GCT,GCC,GCA,GCG
          Arginine,Arg,R,6,CGT,CGC,CGA,CGG,AGA,AGG
          Asparagine,Asn,N,2,AAT,AAC
          AsparticAcid,Asp,D,2,GAT,GAC
          Cysteine,Cys,C,2,TGT,TGC
          GlutamicAcid,Glu,E,2,GAA,GAG
          Glutamine,Gln,Q,2,CAA,CAG
          Glycine,Gly,G,4,GGT,GGC,GGA,GGG
          Histidine,His,H,2,CAT,CAC
          Isoleucine,Ile,I,3,ATT,ATC,ATA
          Leucine,Leu,L,6,TTA,TTG,CTT,CTC,CTA,CTG
          Lysine,Lys,K,2,AAA,AAG
          Methionine,Met,M,1,ATG
          Phenylalanine,Phe,F,2,TTT,TTC
          Proline,Pro,P,4,CCT,CCC,CCA,CCG
          Serine,Ser,S,6,TCT,TCC,TCA,TCG,AGT,AGC
          Threonine,Thr,T,4,ACT,ACC,ACA,ACG
          Tryptophan,Trp,W,1,TGG
          Tyrosine,Tyr,Y,2,TAT,TAC
          Valine,Val,V,4,GTT,GTC,GTA,GTG
          STOP,Sto,},3,TAG,TGA,TAA
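          The codon list above can be turned into a lookup table for translating codons and checking whether a single-base change is synonymous. A minimal Python sketch: the compact 64-letter string is the standard genetic code written in TCAG order (the same assignments as the list, with stop written "*" instead of "}"); `translate` and `point_mutation` are hypothetical helper names.

          ```python
          from itertools import product

          # Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, TTG, TCT, ...).
          # '*' marks a stop codon.
          AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
                         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
          CODON_TABLE = {"".join(c): aa for c, aa in zip(product("TCAG", repeat=3), AMINO_ACIDS)}


          def translate(cds):
              """Translate a coding sequence codon by codon; a trailing partial codon is ignored.

              Codons containing non-ACGT letters (e.g. ambiguity codes) become 'X'.
              """
              cds = cds.upper()
              return "".join(CODON_TABLE.get(cds[i:i + 3], "X") for i in range(0, len(cds) - 2, 3))


          def point_mutation(codon, position, new_base):
              """Describe a single-base change at 1-based position (1-3) within a codon."""
              codon = codon.upper()
              mutant = codon[:position - 1] + new_base.upper() + codon[position:]
              before, after = CODON_TABLE[codon], CODON_TABLE[mutant]
              kind = "synonymous" if before == after else "non-synonymous"
              return f"{codon}->{mutant}: {before}->{after} ({kind})"


          print(point_mutation("CAC", 1, "T"))  # the H275Y change from post #5
          print(point_mutation("CAC", 3, "T"))  # a third-position change
          ```
          
          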



          Hydrophobic: GAVLIMFWP
          Hydrophilic: STCYNQ, DE, KRH
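          The grouping above can be used to say whether a substitution changes the character of the residue. A small Python sketch using exactly the groups from this post; the labels "polar", "acidic" and "basic" for the three hydrophilic subgroups are added here for readability, and other classifications exist (glycine and cysteine in particular are placed differently by different sources). The function names are hypothetical.

          ```python
          # Groups as given in the post; sub-group labels are added for readability.
          HYDROPHOBIC = set("GAVLIMFWP")
          HYDROPHILIC = {"polar": set("STCYNQ"), "acidic": set("DE"), "basic": set("KRH")}


          def property_class(aa):
              """Return the property group of a one-letter amino acid code."""
              aa = aa.upper()
              if aa in HYDROPHOBIC:
                  return "hydrophobic"
              for name, members in HYDROPHILIC.items():
                  if aa in members:
                      return f"hydrophilic ({name})"
              return "unknown"


          def describe_substitution(old, new):
              """Summarise how a substitution such as H->Y moves between property groups."""
              return f"{old}->{new}: {property_class(old)} -> {property_class(new)}"


          print(describe_substitution("H", "Y"))
          ```
          
          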


          • #6
            Re: Sequence Analysis Using MUSCLE

            Thank you very much, we all want to learn.

            225G Preliminary Worldwide Tracking & Evaluation




            Norway - H1N1 "Mutation" Announced by Health Department




            Wales: Tamiflu-resistant swine flu spreads 'between patients'




            Tamiflu-resistant cluster in N. Carolina





            FluTrackers Swine Flu Genetic Forum

            http://www.flutrackers.com/forum/for...lay.php?f=1527





            • #7
              Re: Sequence Analysis Using MUSCLE

              Thanks, great instructions!



              • #8
                Re: Sequence Analysis Using MUSCLE

                Converting bare sequences to FASTA format

                1. Get the bare sequence.
                2. Paste it into the Readseq biosequence conversion tool: http://www.ebi.ac.uk/cgi-bin/readseq.cgi
                3. Select Pearson/Fasta.
                4. Select "view in browser" (or download to file) and click Submit.
                5. Copy and paste the contents into MUSCLE: http://www.ebi.ac.uk/Tools/muscle/
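                For a quick offline alternative, the FASTA layout itself is simple: a ">" header line followed by the sequence in fixed-width lines. A minimal Python sketch, assuming you pick the header name yourself; this is not Readseq's code, and `to_fasta` and its `width` parameter are hypothetical.

                ```python
                def to_fasta(name, bare_sequence, width=60):
                    """Wrap a bare sequence as a FASTA record: '>' header line, then fixed-width lines."""
                    # Keep letters and the gap character only; drop spaces, digits and line breaks.
                    cleaned = "".join(ch for ch in bare_sequence if ch.isalpha() or ch == "-").upper()
                    lines = [cleaned[i:i + width] for i in range(0, len(cleaned), width)]
                    return "\n".join([f">{name}"] + lines) + "\n"


                print(to_fasta("my_seq", "atg aat\n123 cca aac", width=5), end="")
                ```

                Stripping digits matters because sequences copied from Genbank's flat-file view carry position numbers at the start of each line.
                
                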



                • #9
                  Re: Mutations in A/H1N1 Not Confirmed to Affect Effectiveness of Current Vaccine

                  We both used the same data, why is the numbering different?



                  • #10
                    Re: Mutations in A/H1N1 Not Confirmed to Affect Effectiveness of Current Vaccine

                    Originally posted by Sally View Post
                    Comparing both sequences against the vaccine. A/California/07/2009(H1N1) accession number FJ969540
                    Why would there be an R (2 of them) in the A/California/07/2009(H1N1) sequence at about positions 715 and 718?
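                    In a nucleotide sequence, R is not a base but an IUPAC ambiguity code meaning "A or G" (a purine); such codes are typically used where a single base could not be called, for example in a mixed sample. A minimal Python sketch that lists every ambiguous position in a sequence, using the standard IUPAC nucleotide codes; `ambiguous_sites` is a hypothetical helper name and the test sequence is a toy.

                    ```python
                    # IUPAC nucleotide codes and the bases each one stands for.
                    IUPAC = {
                        "A": "A", "C": "C", "G": "G", "T": "T",
                        "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
                        "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
                    }


                    def ambiguous_sites(seq):
                        """List (1-based position, code, possible bases) for every ambiguous base in seq."""
                        return [(i + 1, base, IUPAC[base])
                                for i, base in enumerate(seq.upper())
                                if base in IUPAC and len(IUPAC[base]) > 1]


                    print(ambiguous_sites("ACRTTRG"))
                    ```
                    
                    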



                    • #11
                      Stop codons: the Good, the Bad, and the Ugly

                      Comes with video
                      http://www.mcb.arizona.edu/courses/m.../XLateTut.html


                      Stop codons: the Good, the Bad, and the Ugly

                      Stop codons are a normal part of protein synthesis--they're the reason that all proteins don't go on 'forever'. Given a translation machinery that simply puts one foot in front of the other endlessly, a mechanism must exist for derailing the machine when its work is done. This machinery is the three Stop (or 'nonsense') codons and the proteins that read them. They're encoded by every gene, and are already there when the mRNA is produced--the whole process of translation is the interpretation of a ticker tape by an elegant machine (the ribosome) charged with 'translating' a nucleotide language into an amino acid language.

                      It is not known, at least by me, why there are 3 stop codons and why they are UAA, UAG and UGA (indeed, in some systems, such as some mitochondria, UGA actually specifies Trp instead of stop). But given that there are 64 possible codons and 3 mean 'stop', ON AVERAGE, with all other things being equal (which they never are...) 1 of 20 randomly selected codons says STOP. Similarly, if you're reading in an unanticipated/incorrect reading frame, you're in essence reading random codons, so will ON AVERAGE get about 20 amino acids before being stopped out. That's not very far!
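                      The "1 in 20" figure comes from 3 of the 64 codons being stops, so a random codon is a stop with probability 3/64, roughly 1 in 21. A minimal Python sketch that shows how quickly each of the three forward reading frames hits a stop; `first_stop_per_frame` is a hypothetical helper name, and the input may be written in DNA (T) or RNA (U) letters.

                      ```python
                      STOP_CODONS = {"TAA", "TAG", "TGA"}  # DNA spelling of UAA, UAG, UGA


                      def first_stop_per_frame(seq):
                          """For forward reading frames 1-3, return the codon number of the first stop, or None."""
                          seq = seq.upper().replace("U", "T")
                          result = {}
                          for frame in range(3):
                              result[frame + 1] = None
                              # Step through complete codons only, starting at offset 0, 1 or 2.
                              for codon_number, i in enumerate(range(frame, len(seq) - 2, 3), start=1):
                                  if seq[i:i + 3] in STOP_CODONS:
                                      result[frame + 1] = codon_number
                                      break
                          return result


                      print(first_stop_per_frame("ATGAAATAGC"))
                      ```

                      Running this on a real coding sequence is a quick way to see the tutorial's point: the correct frame runs hundreds of codons without a stop, while the other two frames usually hit one much sooner.
                      
                      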

                      The existence of stop codons needs to permeate your thinking about what is and is not 'fixable'. Sure, a -1 frameshift has the ability to compensate for a +1 frameshift--IF there is no intervening stop codon! Recall the translation tutorial (or review it if you can't recall it...). In the second movie shown, reading in the +1 frame (the result of a single nucleotide insertion) 'uncovered' a stop codon that derailed translation. In the third movie, our hero, in the form of a -1 frameshift (== nucleotide removal) fixed things 'just in time' such that reading frame was restored before the evil stop codon brought the party crashing down. Any mutations FURTHER DOWN (rightward, = the 3' direction) would have availed us naught.

                      Some simple questions to direct your thinking in fruitful ways about the influence of stop codons, for good and ill:
                      --How can you pick a region such that you can be reasonably confident that a stop codon occurs in a given reading frame?
                      --if you don't wish to worry your pretty little head about the nasty possibility of stop codons, what locations will you choose to examine for your compensating mutations vis-a-vis the location of the mutation they're meant to fix?
                      --in general, what rules determine where a compensating mutation can occur relative to the mutation being 'fixed' or compensated for? (This can be a little tricky, given most of our innate biases about who is the 'problem' and who the 'solution'. Recall that any frameshift is a drag unless corrected in a timely fashion, and that any solution is a good solution so long as we're still reading, and reading in frame, when we hit the 'business end' of the rIIb gene!)



                      • #12
                        Re: Mutations in A/H1N1 Not Confirmed to Affect Effectiveness of Current Vaccine

                        Originally posted by Sally View Post
                        We both used the same data, why is the numbering different?
                        The wonders of the numbering systems are something I have yet to master. I did use some other sequences in my alignment, and a program called CLC Sequence Viewer for my nucleotide alignment. I exported a Clustal .aln alignment file, which I then loaded into BioEdit (because I am more familiar with it).
                        However, if I adjust my aligned sequences so that D225G really is at position 225, then the two synonymous ("non-change") changes are N2N (ANADTL) & D475D (HKCDNTC), or in nucleotide terms 6 & 1425.

                        EDIT:
                        Oops, this must be confusing the hell out of everyone, as it is in the wrong thread and relates to Sally's and my numbering differences on the Lviv sequences.
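                        The "N2N ... nucleotide 6" style of result can be produced by comparing two aligned, in-frame coding sequences codon by codon. A minimal Python sketch, assuming both sequences start at the ATG and have no gaps; it is not how BioEdit or CLC Sequence Viewer work internally, `codon_changes` is a hypothetical helper name, and the sequences are toys. The table is the standard genetic code in TCAG order.

                        ```python
                        from itertools import product

                        # Standard genetic code, codons enumerated in TCAG order; '*' marks a stop codon.
                        AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
                                       "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
                        CODON_TABLE = {"".join(c): aa for c, aa in zip(product("TCAG", repeat=3), AMINO_ACIDS)}


                        def codon_changes(cds_a, cds_b):
                            """Compare two equal-length, in-frame coding sequences codon by codon.

                            Returns (label, positions) pairs: labels look like 'N2N' (synonymous)
                            or 'H2Y' (non-synonymous); positions are the 1-based nucleotides that differ.
                            """
                            cds_a, cds_b = cds_a.upper(), cds_b.upper()
                            if len(cds_a) != len(cds_b):
                                raise ValueError("sequences must be aligned to the same length")
                            changes = []
                            for i in range(0, len(cds_a) - 2, 3):
                                ca, cb = cds_a[i:i + 3], cds_b[i:i + 3]
                                if ca == cb:
                                    continue
                                aa_a, aa_b = CODON_TABLE.get(ca, "X"), CODON_TABLE.get(cb, "X")
                                diffs = [i + k + 1 for k in range(3) if ca[k] != cb[k]]
                                changes.append((f"{aa_a}{i // 3 + 1}{aa_b}", diffs))
                            return changes


                        print(codon_changes("ATGAATGCA", "ATGAACGCA"))
                        ```

                        A synonymous change at the third base of codon n always sits at nucleotide 3n, which is why N2N goes with nucleotide 6 and D475D with nucleotide 1425 in the post above.
                        
                        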



                        • #13
                          Re: Mutations in A/H1N1 Not Confirmed to Affect Effectiveness of Current Vaccine

                          Originally posted by JJackson View Post
                          However, if I adjust my aligned sequences so that D225G really is at position 225, then the two synonymous ("non-change") changes are N2N (ANADTL) & D475D (HKCDNTC), or in nucleotide terms 6 & 1425.
                          How did you get these so easily? N2N (ANADTL) & D475D (HKCDNTC). Do you have a conversion program?



                          • #14
                            Re: Sequence Analysis Using MUSCLE

                            I use a program called BioEdit. You just hold down the Ctrl key and press G to toggle back and forth between the protein and nucleotide sequences.



                            • #15
                              Re: Mutations in A/H1N1 Not Confirmed to Affect Effectiveness of Current Vaccine

                              Originally posted by JJackson View Post
                              EDIT:
                              Oops, this must be confusing the hell out of everyone, as it is in the wrong thread and relates to Sally's and my numbering differences on the Lviv sequences.
                              It's good to have it here, because this is the thread for learning to read sequences.

