genbank sequences

gsgs


May 13, 2008, 04:13 AM

I finished my first version (untested) of flu-genbank at:

http://magictour.free.fr/panflu/flu.gz

5MB compressed, 110MB expanded, version from 2008/04/18
Description at

http://magictour.free.fr/panflu/flu.txt

copy below

errors corrected, notation made uniform, computer-readable,
so hopefully future changes will be easy.

some improvements are still possible...

then I have tools to extract/merge headers
extract subsets by keyword
make mutation-tables, draw mutation graphs etc.

to be uploaded later
work in progress, I can send by email if someone is interested

names.exe
xtract.exe
merge.exe
seq1.exe
seqa.exe
align.exe
mn.exe
seq1q.exe

source-code attached to the executables

--------------------------------

file flu.gz
62869 records of 2 lines each: the first line is a header
with 16 entries separated by commas, the 2nd line is
the nucleotide sequence.
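
A minimal sketch of reading such 2-line records and pulling out a subset by keyword, assuming the unpacked file really is plain text with a ">" header line followed by one sequence line. The function names (open_flu, read_records, extract_by_keyword) are made up for this example and are not the names of my tools:

```python
import gzip


def open_flu(path):
    """Open the gzip-compressed file as text (hypothetical helper)."""
    return gzip.open(path, "rt", encoding="ascii", errors="replace")


def read_records(lines):
    """Yield (header, sequence) pairs from an iterable of text lines.

    Assumes each record is exactly 2 lines: a header starting with ">"
    and then the nucleotide sequence. Blank lines are skipped.
    """
    header = None
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            continue
        if line.startswith(">"):
            header = line
        elif header is not None:
            yield header, line
            header = None


def extract_by_keyword(records, keyword):
    """Keep only the records whose header contains the keyword."""
    for header, seq in records:
        if keyword in header:
            yield header, seq
```

Usage would be something like `with open_flu("flu.gz") as fh: h3n2 = list(extract_by_keyword(read_records(fh), "H3N2"))`. A plain substring match also hits the name field, so it is only a rough filter.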

my current headers:

examples:
>AB000605,H,6,,Japan,1971,1136,C,C/Sapporo/71,,,y,199356,,26-MAR-2003,
>CY009388,H,4,H3N2,New Zealand,2000,1721,A,A/Canterbury/94/00(H3N2), 31411,F,y,363048,20-10-2000,15-MAR-2006,36817

1) genbank access code
2) species (H:human,A:avian,S:swine)
3) segment 1..8 (1..7 for C)
4) serotype (empty for B,C,u)
5) country
6) year
7) length
8) type (A,B,C,u)
9) name
10) host-age in days
11) host-sex (m,f)
12) full-length ?
13) taxon
14) collection date (year and month at least, else empty)
15) submission date
16) days since 1900/01/01 (if collection date is given)
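
A small sketch of how one header line could be split into the 16 entries listed above. The field names in FIELDS are my own labels for this example. It assumes no entry contains a comma itself, and that a full collection date is written day-month-year as in the second example (20-10-2000); dates with only year and month are not handled here:

```python
from datetime import date

# Labels for the 16 entries, in order (names chosen for this sketch only).
FIELDS = [
    "accession", "species", "segment", "serotype", "country", "year",
    "length", "type", "name", "host_age_days", "host_sex", "full_length",
    "taxon", "collection_date", "submission_date", "days_since_1900",
]


def parse_header(line):
    """Split a header line into a dict with one string per entry."""
    parts = line.rstrip("\r\n").lstrip(">").split(",")
    if len(parts) != len(FIELDS):
        raise ValueError("expected 16 entries, got %d" % len(parts))
    # Strip stray blanks around entries; empty entries stay as "".
    return dict(zip(FIELDS, (p.strip() for p in parts)))


def days_since_1900(collection_date):
    """Days from 1900/01/01 to a full date given as DD-MM-YYYY (assumed format)."""
    day, month, year = (int(x) for x in collection_date.split("-"))
    return (date(year, month, day) - date(1900, 1, 1)).days
```

For the CY009388 example, days_since_1900("20-10-2000") gives 36817, which matches entry 16 of that header.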

the nucleotide sequences are aligned by inserting "-" for
influenza-A: segments 1,2,3,5,7,8, 4-H1N1, 4-H3N2, 4-H5N1, 6-H1N1, 6-H3N2, 6-H5N1

(simple alignment: "-"s are only attached at the start and end)
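
To illustrate what "only attached at the start and end" means, here is one possible way to do such a padding-only alignment. This is a guess at the general idea, not the code in align.exe: it assumes a reference sequence of full length is available and simply tries every offset, keeping the one with the fewest mismatches. The name pad_align is hypothetical:

```python
def pad_align(seq, ref):
    """Place seq in a frame of len(ref) using "-" at the start and end only.

    No gaps are ever inserted inside seq; the offset with the fewest
    mismatches against ref wins (the first one on ties).
    """
    if len(seq) > len(ref):
        raise ValueError("sequence is longer than the reference frame")
    best_offset, best_mismatches = 0, None
    for offset in range(len(ref) - len(seq) + 1):
        mismatches = sum(1 for a, b in zip(seq, ref[offset:]) if a != b)
        if best_mismatches is None or mismatches < best_mismatches:
            best_offset, best_mismatches = offset, mismatches
    tail = len(ref) - len(seq) - best_offset
    return "-" * best_offset + seq + "-" * tail
```

With a toy reference "AACGTATT", pad_align("CGTA", "AACGTATT") returns "--CGTA--". Real insertions or deletions inside a segment are not handled by this kind of alignment.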

if no neighbor <5% then print to an extra file instead

don't calculate all d(f,g) in full; if d>min then exit the for-loop
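
A sketch of the early-exit idea from the two notes above, under my own assumptions for this example: d(f,g) is taken as the number of differing positions between two aligned sequences, padded "-" positions are not counted, and 5% is measured against the sequence length. The function names are hypothetical:

```python
def distance(f, g, limit):
    """Count differing positions of f and g, stopping once the count exceeds limit."""
    d = 0
    for a, b in zip(f, g):
        # Assumption: positions padded with "-" are not counted as differences.
        if a != b and a != "-" and b != "-":
            d += 1
            if d > limit:
                break  # the "exit-for": g can no longer beat the current minimum
    return d


def nearest_neighbor(f, others):
    """Return (index, distance) of the closest sequence in others."""
    best_index, best_d = None, len(f) + 1
    for i, g in enumerate(others):
        d = distance(f, g, best_d)
        if d < best_d:
            best_index, best_d = i, d
    return best_index, best_d


def has_close_neighbor(f, others, max_fraction=0.05):
    """True if some sequence in others differs from f in less than 5% of positions."""
    _, d = nearest_neighbor(f, others)
    return d < max_fraction * len(f)
```

Sequences for which has_close_neighbor is False would be the ones written to the extra file. The cutoff saves most of the work once a good neighbor has been found early in the list.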

I'm interested in expert panflu damage estimates
my current links: http://bit.ly/hFI7H ILI-charts: http://bit.ly/CcRgT