I have been trying to work out what the Nextstrain tool can do, and was fairly sure it was capable of more than I had found by trial and error. My search led me to a lecture, which I will come back to and link below.
Re rosmarina's post: I don't think there has been anything close to evidence that humans were involved in this virus's evolutionary history, beyond being unwitting hosts. I suspect the short fragment was not deemed worth uploading at the time, but once COVID arrived it, and bits like it, suddenly became a lot more interesting. I expect more partial sequences but few full genomes. I looked at the human sequence data in Nextstrain, viewing the amino-acid (AA) mutation frequency across the full genome and its entropy (first image below) to get a feel for which parts are conserved and which are changing. In the second image I zoom in on just the short section that matches the fragment (note the little black triangles at the bottom): there are a few scattered AA changes, all with very low entropy, indicating they have little impact on the structure of the phylogenetic tree.
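To make the "entropy" idea concrete: a site where every sequence carries the same residue scores zero, and the score rises as more variants appear at that site in meaningful proportions. The sketch below computes per-site Shannon entropy (natural log) over a plain aligned set of sequences. It is only an illustration of the concept, assuming a toy alignment; Nextstrain/Auspice calculates its own entropy values from the tree and may normalise them differently, and the function names here are hypothetical.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (natural log) of the characters seen in one alignment column."""
    counts = Counter(column)
    total = len(column)
    # p * log(1/p) summed over residues; a fully conserved column gives 0.0
    return sum((n / total) * math.log(total / n) for n in counts.values())


def per_site_entropy(aligned):
    """Entropy at each position of a list of equal-length aligned sequences."""
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("sequences must all be the same (aligned) length")
    return [column_entropy([s[i] for s in aligned]) for i in range(length)]


# Hypothetical toy alignment of four short amino-acid sequences
toy = ["MPLK", "MPLK", "MLLK", "MLLR"]
print([round(h, 3) for h in per_site_entropy(toy)])  # [0.0, 0.693, 0.0, 0.562]
```

Sites 1 and 3 are conserved (entropy 0), the 50/50 P/L split at site 2 gives the maximum for two variants (ln 2), and a single K→R change at site 4 gives a smaller value, which is why a few rare changes show up as low entropy.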
The sequence covers roughly positions 15,340 to 15,709, which lie in the RdRp gene, itself part of ORF1b. That is only about 1/80th of the genome, and because it sits in a conserved region I would expect high similarity; yet when I BLAST the full fragment against the NCBI database I get only 89% identity with a range of bat and SARS-1 sequences. This is lower than I expected. It came up in today's TWiV (link below) that when the civet intermediate host for SARS-1 was found, its sequences had 99% identity with the human strain (presumably across the full genome, based on context, although this was not stated explicitly).
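A quick check of how much of the genome the fragment spans. This assumes the Wuhan-Hu-1 reference length of 29,903 nt (NC_045512.2) and treats the approximate start and end coordinates from the post as inclusive.

```python
# Assumption: Wuhan-Hu-1 reference genome length (RefSeq NC_045512.2)
GENOME_LENGTH = 29_903

# Approximate region covered by the fragment, as quoted in the post
start, end = 15_340, 15_709

region_length = end - start + 1  # inclusive coordinates
fraction = region_length / GENOME_LENGTH
print(region_length, f"{fraction:.2%}", f"about 1/{GENOME_LENGTH / region_length:.0f}")
```

This prints a region of 370 nt, about 1.24% of the genome, or roughly 1/81.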
In the top image I highlighted one peak in green: this is C14408T (therefore outside the KP876546 fragment), which results in ORF1b P314L and defines clade A2a, active in Northern Europe; hence its high entropy score.
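To see how a nucleotide change maps to an amino-acid label like P314L, you count codons from the start of the gene. The sketch below assumes ORF1b begins at nucleotide 13,468 of the Wuhan-Hu-1 reference (the ribosomal frameshift site, as I understand Nextstrain's annotation to use); the helper name is hypothetical.

```python
# Assumption: ORF1b starts at nt 13,468 of the Wuhan-Hu-1 reference
ORF1B_START = 13_468
# Approximate fragment coordinates from the post
FRAGMENT = (15_340, 15_709)


def codon_for_position(nt_pos, cds_start):
    """Return (1-based codon number, 1-based position within codon) for a genome coordinate."""
    offset = nt_pos - cds_start
    if offset < 0:
        raise ValueError("position lies upstream of the coding sequence")
    return offset // 3 + 1, offset % 3 + 1


codon, pos_in_codon = codon_for_position(14_408, ORF1B_START)
inside_fragment = FRAGMENT[0] <= 14_408 <= FRAGMENT[1]
print(codon, pos_in_codon, inside_fragment)  # 314 2 False
```

C14408T falls at the second position of ORF1b codon 314. A C→T change there turns a proline codon (CCN) into a leucine codon (CTN), giving P314L, and the site lies upstream of the fragment.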
The promised lecture, by Richard Neher (University of Basel), 6 March 2019: https://www.youtube.com/watch?v=YxTUF10redQ
He was a co-developer of Nextstrain and uses it as a research tool. He starts with an intro on flu and then uses Nextstrain to analyse H3N2 data. Unfortunately, in later parts of the video he is away from the podium, the sound is variable, and he points out features on graphics that we cannot see, which makes it trickier to follow. If you persevere, though, he looks at the predictive ability of the tree structures and how their predictions for H3N2 over the years compared with what actually occurred. The system clearly does better than chance at estimating which branches will acquire mutations and become dominant.
This is a plug for the current TWiV, which the panel and I all thought was awesome! I am not even going to try to list the areas covered, as the list of topics not covered would be shorter. It is 2 hours long, but I doubt you could improve your understanding of this epidemic more by spending 2 hours any other way. TWiV 591 http://www.microbe.tv/twiv/