Rudimentary Gene Sequencing Question(s)

  • Rudimentary Gene Sequencing Question(s)

    I was looking at a recent post by Niman that included some sequences, and I'm curious: why isn't there a standard structure for these sequences? Is the structure tied to the date? For instance, the more recent entries in this list seem to indicate the type of animal the isolate is from:
    DQ826532 A/mallard/BC/373/2005 2005 H5N2

    However, older ones lack that level of clarity.
    U53162 A/Wisconsin/4754/94 1994 H1N1

    Has anyone written a parser that would place these sequences along a world map? Would that be of any use?

    DQ826532 A/mallard/BC/373/2005 2005 H5N2
    DQ280250 A/swine/Ontario/11112/04 2004 H1N1
    DQ447187 A/swine/Taiwan/CO935/2004 2004 H1N2
    DQ431990 A/swine/Taiwan/CO935/2004 2004 H1N2
    AY129156 A/Swine/Korea/CY02/02 2002 H1N2
    AY233393 A/Duck/NC/91347/01 2001 H1N2
    DQ058215 A/swine/Guangdong/2/01 2001 H1N1
    AF455682 A/Swine/Illinois/100084/01 2001 H1N2
    AF455681 A/Swine/Illinois/100085A/01 2001 H1N2
    AF455679 A/Swine/Iowa/930/01 2001 H1N2
    AY060050 A/Swine/MN/16419/01 2001 H1N2
    AY060048 A/Swine/MN/23124-S/01 2001 H1N2
    AY060047 A/Swine/MN/23124-T/01 2001 H1N2
    AY060051 A/Swine/MN/34893/01 2001 H1N2
    AF455677 A/Swine/North Carolina/93523/01 2001 H1N2
    AF455675 A/Swine/Ohio/891/01 HA (4) 2001 H1N2
    AF455680 A/Swine/Indiana/P12439/00 2000 H1N2
    AF455678 A/Swine/Minnesota/55551/00 2000 H1N2
    AF250124 A/Swine/Indiana/9K035/99 1999 H1N2
    AF222035 A/Swine/Wisconsin/458/98 1998 H1N1
    AF222036 A/Swine/Wisconsin/464/98 1998 H1N1
    AF222026 A/Swine/Wisconsin/125/97 1997 H1N1
    AF222027 A/Swine/Wisconsin/136/97 1997 H1N1
    AF222032 A/Swine/Wisconsin/235/97 1997 H1N1
    AF222033 A/Swine/Wisconsin/238/97 1997 H1N1
    U53162 A/Wisconsin/4754/94 1994 H1N1
    U53163 A/Wisconsin/4755/94 1994 H1N1
    S67220 A/Swine/Nebraska/1/92 1992 H1N1
    L09063 A/Swine/Nebraska/1/92 1992 H1
    L24362 A/Maryland/12/91 1991 H1N1
    U46783 A/Swine/Beijing/47/91 1991 H1N1
    U11857 A/Swine/St-Hyacinthe/106/91 1991 H1
    M81707 A/Swine/Indiana/1726/88 1988 H1N1
    DQ508905 A/Wilson-Smith/1933 1933 H1N1
    J02176 A/WSN/33 1933 H1N1

  • #2
    Re: Rudimentary Gene Sequencing Question(s)

    Flu isolates are named by a convention that gives the type, species, location, sample number, year of isolation, and serotype. If no species is listed, the isolate is human.

    An alphabetical list of sequences at GenBank (grouped by sero-type) is here

    http://www.ncbi.nlm.nih.gov/Taxonomy...hmode=1&unlock

    The list above is ordered by date of isolate (grouped alphabetically within each year).

    There are over 7000 flu isolates at GenBank (and Los Alamos).
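
A minimal sketch of splitting one line of the list above into its columns (accession, strain name, year, serotype). The names `Listing` and `parse_listing` are hypothetical, and the assumption that every line ends in a four-digit year followed by the serotype comes only from the entries quoted in this thread.

```python
import re
from typing import NamedTuple, Optional


class Listing(NamedTuple):
    accession: str
    strain: str
    year: int
    serotype: str


# Assumed layout: accession, strain name (may contain spaces),
# four-digit year, then a serotype such as H1N2 or just H1.
LISTING_RE = re.compile(r"^(\S+)\s+(.+?)\s+(\d{4})\s+(H\d+(?:N\d+)?)$")


def parse_listing(line: str) -> Optional[Listing]:
    """Return the four columns of a listing line, or None if it does not fit."""
    match = LISTING_RE.match(line.strip())
    if match is None:
        return None
    accession, strain, year, serotype = match.groups()
    return Listing(accession, strain, int(year), serotype)


if __name__ == "__main__":
    print(parse_listing("AF455677 A/Swine/North Carolina/93523/01 2001 H1N2"))
```

Extra annotations, such as the "HA (4)" in the Ohio entry, end up inside the strain field with this approach and would need separate handling.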



    • #3
      Re: Rudimentary Gene Sequencing Question(s)

      Thanks, Niman, for the additional info.

      I've taken the H5N1 entries that are listed at:
      http://www.ncbi.nlm.nih.gov/Taxonomy...hmode=1&unlock

      and I've converted them into an XML format.

      I was going to write a parser, which I've done, but the data is difficult to interpret since many entries don't follow the same convention (e.g. some have two locations).
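
For anyone wanting to try the same conversion, here is a rough sketch using Python's standard `xml.etree.ElementTree`. The element and attribute names (`isolates`, `isolate`, `accession`, `strain`, `year`, `serotype`) and the function name are made up for illustration; they are not the layout of the attached file.

```python
import xml.etree.ElementTree as ET
from typing import Iterable, Tuple

Row = Tuple[str, str, int, str]  # accession, strain name, year, serotype


def isolates_to_xml(rows: Iterable[Row]) -> str:
    """Serialize (accession, strain, year, serotype) rows as an XML string."""
    root = ET.Element("isolates")
    for accession, strain, year, serotype in rows:
        isolate = ET.SubElement(root, "isolate", accession=accession)
        ET.SubElement(isolate, "strain").text = strain
        ET.SubElement(isolate, "year").text = str(year)
        ET.SubElement(isolate, "serotype").text = serotype
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Sample row taken from the list earlier in the thread.
    print(isolates_to_xml([("DQ826532", "A/mallard/BC/373/2005", 2005, "H5N2")]))
```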

      -hawkeye


      • #4
        Re: Rudimentary Gene Sequencing Question(s)

        From a data querying point of view it is a mess.

        If the basic structure is

        A/species/location/sample no./year Serotype

        the only bit that reliably works is the A (or B or C) prefix.

        Species can be omitted (in humans), a common name (Bar headed geese) or a Latin name. There are exceptions even from these 'rules', e.g. Wilson-Smith.

        Location is often a country but can be a region or town.

        Sample no. is often just a sequential number but sometimes contains other data, such as the sampling lab (e.g. NAMRU2).

        Even the year varies between yyyy and yy.

        Serotype format is normally consistent, although I have seen it omitted.
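
To make the variations above concrete, here is a tolerant parsing sketch for the strain name itself. Everything in it is an assumption for illustration: the names `Strain`, `normalize_year` and `parse_strain`, the rule that a missing species means human, and especially the pivot of 30 used to expand two-digit years (so "01" becomes 2001 and "94" becomes 1994).

```python
from typing import NamedTuple, Optional


class Strain(NamedTuple):
    flu_type: str             # A, B or C
    species: Optional[str]    # None when omitted (human by convention)
    location: Optional[str]
    sample: Optional[str]
    year: Optional[int]


def normalize_year(text: str, pivot: int = 30) -> Optional[int]:
    """Turn 'yy' or 'yyyy' into a four-digit year; the pivot is a guess."""
    if not text.isdigit():
        return None
    if len(text) == 4:
        return int(text)
    if len(text) == 2:
        yy = int(text)
        return 2000 + yy if yy < pivot else 1900 + yy
    return None


def parse_strain(name: str) -> Strain:
    """Split type/species/location/sample/year, tolerating missing parts."""
    parts = [p.strip() for p in name.split("/")]
    flu_type, rest = parts[0], parts[1:]
    year = normalize_year(rest[-1]) if rest else None
    if year is not None:
        rest = rest[:-1]
    species = location = sample = None
    if len(rest) == 3:        # species/location/sample
        species, location, sample = rest
    elif len(rest) == 2:      # location/sample, species omitted
        location, sample = rest
    elif rest:                # e.g. A/WSN/33: keep the remainder as the sample
        sample = "/".join(rest)
    return Strain(flu_type, species, location, sample, year)


if __name__ == "__main__":
    for example in ("A/mallard/BC/373/2005", "A/Wisconsin/4754/94", "A/WSN/33"):
        print(parse_strain(example))
```

A name with no year at all but a two- or four-digit sample number in last position would be misread by this sketch, which is exactly the kind of ambiguity described above.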

        All this makes life unnecessarily tricky and is downright unscientific. An agreed standard, with refusal to accept incorrectly 'addressed' submissions, would be a good start, perhaps with a little more data: method of amplification, culture date, etc.

        Just a suggestion -

        A/homo sapiens/Latitude-Longitude/testing lab/sample no.(sequential numeric only)/amplification (e.g.PCR)/cultured (e.g.MDCK)/mm-yyyy/HxNx ?
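
As a rough illustration of what the suggested format could look like in practice, here is a sketch that validates the fields and builds the name. The class and method names are hypothetical, the sample values are invented, and the coordinate encoding (two decimals with N/S and E/W suffixes, so the hyphen separator never clashes with a minus sign) is my own assumption rather than part of the suggestion.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ProposedName:
    flu_type: str
    species: str
    latitude: float
    longitude: float
    lab: str
    sample_no: int
    amplification: str
    culture: str
    month: int
    year: int
    serotype: str

    def validate(self) -> List[str]:
        """Return a list of problems; an empty list means the name is acceptable."""
        errors = []
        if self.flu_type not in ("A", "B", "C"):
            errors.append("type must be A, B or C")
        if not -90 <= self.latitude <= 90:
            errors.append("latitude out of range")
        if not -180 <= self.longitude <= 180:
            errors.append("longitude out of range")
        if self.sample_no < 1:
            errors.append("sample no. must be a positive integer")
        if not 1 <= self.month <= 12:
            errors.append("month must be 1-12")
        if not re.fullmatch(r"H\d+N\d+", self.serotype):
            errors.append("serotype must look like HxNx")
        return errors

    def format(self) -> str:
        """Join the fields with '/' in the order of the suggested format."""
        lat = f"{abs(self.latitude):.2f}{'N' if self.latitude >= 0 else 'S'}"
        lon = f"{abs(self.longitude):.2f}{'E' if self.longitude >= 0 else 'W'}"
        return "/".join([
            self.flu_type, self.species, f"{lat}-{lon}", self.lab,
            str(self.sample_no), self.amplification, self.culture,
            f"{self.month:02d}-{self.year:04d}", self.serotype,
        ])


if __name__ == "__main__":
    # Invented example values, for illustration only.
    name = ProposedName("A", "homo sapiens", 45.0, -93.0, "LAB1", 12,
                        "PCR", "MDCK", 3, 2006, "H5N1")
    print(name.validate(), name.format())
```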

        On the internet we post RFCs (Requests for Comments): all ideas are welcome, then we agree on a standard and adopt it.

        Anyone else got suggestions?



        • #5
          Re: Rudimentary Gene Sequencing Question(s)

          I think it's a great idea. Standardization should make for easier database entry and query. Ya gotta like that!

          My vote is you go for it. Maybe one or more of the journalists would find such an initiative, especially from a group like ours, to be an interesting topic for an article. That type of exposure might just tip the scales in favor of such a standardization.
          Upon this gifted age, in its dark hour,
          Rains from the sky a meteoric shower
          Of facts....They lie unquestioned, uncombined.
          Wisdom enough to leech us of our ill
          Is daily spun, but there exists no loom
          To weave it into fabric..
          Edna St. Vincent Millay "Huntsman, What Quarry"
          All my posts to this forum are for fair use and educational purposes only.



          • #6
            Re: Rudimentary Gene Sequencing Question(s)

            P.S. I know rigid structures have weaknesses and sometimes it will not be possible to tick all the boxes.

            For instance, I have seen the species type listed as environment (water samples or poultry litter). The Spanish Grebe would have been difficult to pin down to species level. Samples handed in by 'concerned citizens' may present a problem regarding the grid reference.

            Following H5N1 has shown up some weaknesses in the system. For example, dates only being held to the year can be frustrating if you are trying to follow patterns of mutations along migratory routes (was a sample taken from a migratory bird going to Africa from Asia, or returning months later, etc.).
            In these days of GIS (http://en.wikipedia.org/wiki/GIS) a powerful analysis opportunity is being missed because of the vague area classifications. Sometimes the area is a country covering a sizable chunk of the globe, and on other occasions I never worked out what part of the world the sample came from (A/Swine/MN/16419/01 2001 H1N2: Myanmar?).



            • #7
              Re: Rudimentary Gene Sequencing Question(s)

              JJackson, I think the place is sometimes a US state rather than a country name; MN is Maine, IIRC. I'm not sure if that's the answer here, just speculation.

              Myanmar is the new name for Burma.

              HTH


              • #8
                Re: Rudimentary Gene Sequencing Question(s)

                Yes, MN is common for strains coming from Minnesota.



                • #9
                  Re: Rudimentary Gene Sequencing Question(s)

                  Minnesota or Maine? Anyway, if I find initials like that again I will search for US states. With thanks, a European.



                  • #10
                    Re: Rudimentary Gene Sequencing Question(s)

                    LOL, I got that wrong! OK, Minnesota! That just gave me the impetus to look for a chart of the different state abbreviations, in case it's of any help to our readers.

                    from the US Post Office Site (far more likely to be right than I am!!! LOL)

                    State Abbreviations

                    State/Possession                        Abbreviation
                    ALABAMA                                 AL
                    ALASKA                                  AK
                    AMERICAN SAMOA                          AS
                    ARIZONA                                 AZ
                    ARKANSAS                                AR
                    CALIFORNIA                              CA
                    COLORADO                                CO
                    CONNECTICUT                             CT
                    DELAWARE                                DE
                    DISTRICT OF COLUMBIA                    DC
                    FEDERATED STATES OF MICRONESIA          FM
                    FLORIDA                                 FL
                    GEORGIA                                 GA
                    GUAM                                    GU
                    HAWAII                                  HI
                    IDAHO                                   ID
                    ILLINOIS                                IL
                    INDIANA                                 IN
                    IOWA                                    IA
                    KANSAS                                  KS
                    KENTUCKY                                KY
                    LOUISIANA                               LA
                    MAINE                                   ME
                    MARSHALL ISLANDS                        MH
                    MARYLAND                                MD
                    MASSACHUSETTS                           MA
                    MICHIGAN                                MI
                    MINNESOTA                               MN
                    MISSISSIPPI                             MS
                    MISSOURI                                MO
                    MONTANA                                 MT
                    NEBRASKA                                NE
                    NEVADA                                  NV
                    NEW HAMPSHIRE                           NH
                    NEW JERSEY                              NJ
                    NEW MEXICO                              NM
                    NEW YORK                                NY
                    NORTH CAROLINA                          NC
                    NORTH DAKOTA                            ND
                    NORTHERN MARIANA ISLANDS                MP
                    OHIO                                    OH
                    OKLAHOMA                                OK
                    OREGON                                  OR
                    PALAU                                   PW
                    PENNSYLVANIA                            PA
                    PUERTO RICO                             PR
                    RHODE ISLAND                            RI
                    SOUTH CAROLINA                          SC
                    SOUTH DAKOTA                            SD
                    TENNESSEE                               TN
                    TEXAS                                   TX
                    UTAH                                    UT
                    VERMONT                                 VT
                    VIRGIN ISLANDS                          VI
                    VIRGINIA                                VA
                    WASHINGTON                              WA
                    WEST VIRGINIA                           WV
                    WISCONSIN                               WI
                    WYOMING                                 WY

                    Military "State"                        Abbreviation
                    Armed Forces Africa                     AE
                    Armed Forces Americas (except Canada)   AA
                    Armed Forces Canada                     AE
                    Armed Forces Europe                     AE
                    Armed Forces Middle East                AE
                    Armed Forces Pacific                    AP
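
If anyone wants to use the abbreviations above in a parser, a small lookup might be enough. This is only a sketch: the dictionary holds a handful of the states that appear in the list earlier in the thread, the names `US_STATE_ABBREVIATIONS` and `expand_location` are made up, and it assumes a two-letter location really is a US postal code, which will not always hold (BC in A/mallard/BC/373/2005 is not in this chart at all).

```python
from typing import Dict

# Subset of the chart above; extend with the remaining rows as needed.
US_STATE_ABBREVIATIONS: Dict[str, str] = {
    "IA": "Iowa",
    "IL": "Illinois",
    "IN": "Indiana",
    "ME": "Maine",
    "MN": "Minnesota",
    "NC": "North Carolina",
    "NE": "Nebraska",
    "OH": "Ohio",
    "WI": "Wisconsin",
}


def expand_location(location: str) -> str:
    """Return the full state name for a known code, else the input unchanged."""
    return US_STATE_ABBREVIATIONS.get(location.strip().upper(), location)


if __name__ == "__main__":
    for code in ("MN", "ME", "NC", "Ontario", "BC"):
        print(code, "->", expand_location(code))
```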


                    • #11
                      Re: Rudimentary Gene Sequencing Question(s)

                      I live in Minnesota. MN is for that; ME is the standard post office abbreviation for Maine.
