
Discussion - 2019-nCoV genetics


  • Discussion - 2019-nCoV genetics

    For those of us with an interest in viral genetics, I have started this thread to look at the little data that is now available. What follows is not my analysis but is based on the work posted on the forum.

    1] The consensus is that much of the sequence data is unreliable due to the high number of sequencing errors.
    2] Based on the more trusted sequences the MRCA (most recent common ancestor) is probably in early Dec. 2019 giving a recent single common ancestor for all sequences. If accurate this means it probably has not been circulating below the radar for a while.
    3] Due to the recent MRCA the cladogram is boringly flat, with only one or two AA differences from WH01 and most sequences being identical. Where there are changes they seem random, with no evidence of host adaptation.

    A note on the Virological site. If you followed the H7N9 discussion thread you may remember Andrew Rambaut from Edinburgh Uni., who posted really useful phylogenetic trees and analysis on the University's Epidemic site (which had sections on MERS, Ebola and flu). When this outbreak occurred I checked the site, only to find it had disappeared, but Andrew had reappeared as Administrator of Virological, which at that time only had a couple of threads. In the main discussion thread I posted a speculation that this was a Beta CoV before it was announced. I would not normally have done this based on a single media report, but the article was by Lisa at CIDRAP and she was quoting Marion Koopmans of Erasmus. I trust MK not to have made the comment without good reason, and LS not to use it without reason to think MK had good reason for saying it. Which brings me back to Virological, where MK & AR are contributing and which was used to post the first open source sequence. Currently some sequences are being deposited at GISAID and others at GenBank.

  • #2
    there was speculation about a lab release (biggameindia, zerohedge). Do the mutations and the comparisons with older bat viruses match, or are there gaps in the timing? Has this been checked?

    I'm currently downloading and aligning coronaviruses from GenBank; got 388 so far, and it is slow with MAFFT
    I'm interested in expert panflu damage estimates
    my current links: ILI-charts:


    • #3
      I am learning as I go, as I have not looked at CoVs before. What I have gathered is that a fair bit of research was performed into SL SARS, the group of beta CoVs found in the host reservoir that forms the genetic pool from which SARS emerged. Much of the RNA is conserved with 95%+ homology but, like flu, there are areas where that drops to 85%ish. One area that seems to be a focus of interest is S 153syn, which sits in the S2 binding pocket, making it a potential drug target which, if I understood the TWiV you linked to, is an ACE2 receptor. This pocket is part of the low homology zone and includes the primary antigenic site, making life harder for the immune system. Much of this may be my poor understanding of what I read, as I am getting out of my comfort zone with these papers.


      • #4

        we need this RaTG13 virus, which is by far the closest.
        Allegedly from Rhinolophus affinis from Yunnan

        Data Availability statement. Sequence data that support the findings of this study
        have been deposited in GISAID with the accession no. EPI_ISL_402124 and


        Yen Shu Chen • 3 days ago
        The authors did not share (no GenBank/GISAID accession number are
        provided) the genome sequence of the critical bat-CoV that represents a
        close relative to human 2019-nCoV.
        No way to access/reproduce/further use their result. Do scientific journals accept such practice?
        reuns Yen Shu Chen • 2 days ago • edited
        Exactly, there has been thousands of bats coronavirus sequenced since the Sars epidemic,
        it is not unusual that most of them are just lost into some dusty lab, but it is weird they
        didn't upload this one since it is the most astonishing result of their study, and the exact
        genome might contain some useful information on how bat viruses can mutate and
        contaminate human. Also note that the paper is from Wuhan's institute of virology, probably
        the same lab which discovered the aforementioned bat coronavirus
        Last edited by gsgs; January 27, 2020, 01:53 AM. Reason: link


        • #5
          I finally got it aligned etc.: 395 coronavirus genomes from GenBank, mostly SARS.
          File corona20.c6, 16 MB; does anyone want it?
          The recombinations are not so clear, and the mutation rate could vary along the 30,000-nucleotide genome.
          I made these charts of #mutations per 100 nucleotides:

          the fasta was 1.2GB, now downloading the whole genbank records with the dates etc.
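For readers who want to reproduce a "#mutations per 100 nucleotides" chart, here is a minimal sketch of the counting step. It is not the tool used in the post above: the function name `mutations_per_window` and the toy sequences are made up for illustration, and it assumes the two sequences are rows from the same alignment, with '-' for gaps and 'N' for ambiguous bases.

```python
def mutations_per_window(ref, other, window=100):
    """Count mismatches between two aligned sequences in consecutive windows.

    Columns where either sequence has a gap ('-') or an ambiguous base ('N')
    are skipped, so they are not counted as mutations.
    """
    if len(ref) != len(other):
        raise ValueError("sequences must be rows of the same alignment")
    counts = []
    for start in range(0, len(ref), window):
        mismatches = 0
        pairs = zip(ref[start:start + window], other[start:start + window])
        for a, b in pairs:
            a, b = a.upper(), b.upper()
            if a in "-N" or b in "-N":
                continue  # no information in this column
            if a != b:
                mismatches += 1
        counts.append(mismatches)
    return counts


# Toy example (hypothetical data) with a window of 4 columns instead of 100.
ref = "ACGTACGTAC-TACGT"
other = "ACGAACGTTTGTACGG"
print(mutations_per_window(ref, other, window=4))  # [1, 0, 2, 1]
```

Plotting the returned list against window position gives one chart per pair of genomes. A window that stands out from its neighbours may reflect recombination or a regionally different mutation rate, and this count alone cannot tell the two apart.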
          Last edited by gsgs; January 27, 2020, 10:12 PM.


          • #6
            gs, recombination seems to be a feature of this virus, along with deletions and insertions. There has been an interesting conversation developing over at Virological about the dangers of using whole-genome homology, or even single-gene homology, to judge the true proximity of isolates. Large recombination events can cause poor homology overall while very high homology remains across the unaffected sections of RNA.

            Do you have a sample collection date for RaTG13? The very high homology across the Spike gene is at odds with everything else except nCoV2019, including all other bat SL CoVs, given that this is probably the least conserved region. Unless the sample was very recent, I do not see how it has maintained its sequence so faithfully, nor why this outlier has given its S gene genetics to nCoV. Is there something specific to this unique Spike sequence that makes it well adapted to infect humans? The SL CoVs generally are well known for their ability to spread to other mammalian hosts (civet, raccoon dog etc.), so it seems unlikely that these two Chinese culinary favourites have not presented SL CoVs to humans since SARS, yet this atypical sequence has very successfully made the jump and seems to have little difficulty binding to our ACE2 receptors (assuming that is what they are using for access this time around).

            I found the RaTG13 date and it was 2013 so several years of drift.
            Last edited by JJackson; January 27, 2020, 08:43 PM.


            • #7
              2013 is still 7 years ago, which is surprisingly long for 96.5% similarity when compared with the other viruses.
              That suggests lots of bat-coronavirus diversity ...
              I don't know about the different proteins, but I have read about the recombinations and the different mutation rates.
              That's why I made the pics of mutation rate across the 30,000-nt genome.

              The region ~12000-~20000 looks suitable for mutation-timing-comparison
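A back-of-envelope way to look at "7 years" against "96.5% similarity" is a strict molecular clock. The sketch below is only that. The 96.5% figure and the 2013 and 2020 sampling years come from this thread, but the rate of 1e-3 substitutions per site per year is an assumed round number of the order often quoted for coronaviruses, not something established here. The function name is made up, and the calculation ignores recombination, rate variation along the genome and multiple hits at the same site.

```python
def years_to_common_ancestor(identity, year_a, year_b, rate_per_site_per_year):
    """Strict-clock estimate of when two sampled genomes last shared an ancestor.

    divergence = rate * total branch length, where the total branch length is
    (year_a - t) + (year_b - t) for a common ancestor living in year t.
    """
    divergence = 1.0 - identity
    total_branch_years = divergence / rate_per_site_per_year
    ancestor_year = (year_a + year_b - total_branch_years) / 2.0
    return total_branch_years, ancestor_year


# 0.965, 2013 and 2020 are taken from the thread; 1e-3 is an ASSUMED rate.
total, ancestor = years_to_common_ancestor(0.965, 2013, 2020, 1e-3)
print(round(total, 1), round(ancestor, 1))
```

Under these assumptions the 3.5% difference corresponds to about 35 years of combined evolution along both branches, which puts a common ancestor decades back rather than in 2013. A slower assumed rate pushes that date back further, and a faster one brings it forward.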


              where did you find 2013 ? It's not in the paper. Well, the 13 in RaTG13 may stand for 2013

              > RaTG13 which we previously detected in Rhinolophus affinis from Yunnan Province showed

              I couldn't find a current authors in a related reference

              Yang, L. et al. Novel SARS-like Betacoronaviruses in Bats, China, 2011.
              Emerg Infect Dis 19, 989-991, (2013)

              Hu, B. et al. Discovery of a rich gene pool of bat SARS-related
              coronaviruses provides new insights into the origin of SARS coronavirus.
              PLoS pathogens 13, e1006698, (2017)

              Wang, N. et al. Serological Evidence of Bat SARS-Related Coronavirus
              Infection in Humans, China. Virol Sin 33, 104-107, (2018)
              Last edited by gsgs; January 27, 2020, 11:17 PM.


              • #8
                There is an interesting post at “not snakes v2” that discusses fungi and SARS, MERS, and nCoV-2019. They joke that, in spite of the high CAI values, there is no significance to this.

                I'm wondering now.

                Widespread Bat White-Nose Syndrome Fungus, Northeastern China

                Do Viruses Exchange Genes across Superkingdoms of Life?

                Ask Congress to Investigate COVID Origins and Government Response to Pandemic H.R. 834

                i love myself. the quietest. simplest. most powerful. revolution ever. ---- nayyirah waheed

                (My posts are not intended as advice or professional assessments of any kind.)
                Never forget Excalibur.


                • #9

                  here is another paper :
                  > The BatCoV RaTG13 sequence was downloaded from the GISAID BetaCov 2019-2020 repository

                  IMO it's a shame that this important sequence is not public and at GenBank.
                  When it's at GISAID, it may only be revealed to other GISAID members;
                  that's not really "publicly available", as they claim.

                  no recombination RaTG13 -- 2019-nCoV
                  Last edited by gsgs; January 30, 2020, 10:37 PM.


                  • #10
                    This is interesting. It is in Chinese, but this worked for me

                    This is the first I have seen on lab tests with live nCoV virus.
                    They look at two questions
                    1] Is the ACE2 receptor, used by SARS, also used here? They used HeLa cells modified to display various ACE2 receptors from humans, bats, civets, pigs and mice. The virus grew in all but mice, so mice are probably out as a lab animal unless there is a lineage with modified ACE2s.
                    2] They also looked at serum treatments and both human and horse anti-sera worked well and could be a useful therapeutic at least as a stop gap measure.
                    This graphic nicely illustrates just how far RaTG13 has wandered from the pack (it is rooted against nCoV2019) and helpfully provides a nucleotide and AA scale across the top. ZC45 seems to share a high homology across ORF1a but reverts to the pack for the Spike protein. I assume that ORF just stands for open reading frame, but it seems odd that it is not given a designation based on a protein produced, as they account for 2/3 of the genome.

                    [Image: nCoV tree.JPG]
                    I cannot find the paper in which I read that RaTG13 was from 2013, but it definitely gave that as the date and stated the host as horseshoe bat. For RaTG13 to hold such a high % homology for so long makes me wonder if there is another host species in which this RBD configuration needs to be conserved, with nCoV coming from this source, RaTG13 being a reintroduction to bats and ZC45 being a recombination event. As is often the case in wild animal sampling, there are just too few relevant sequences to be sure we have any real understanding of the viral genetic dynamics. Much of what we do have is due to post-SARS investigation and is fairly dated given the speed of viral evolution.
                    Last edited by JJackson; April 29, 2020, 05:58 PM.


                    • #11
                      This is a useful visualisation tool


                      • #12
                        What are these pictures called? Are they also 3-dimensional? I didn't understand it at first
                        and did the same thing in multiple pics, one per chart.
                        It looks as if the mutation rate is just different in different areas.
                        I figured out that RaTG13 is at GISAID.
                        Can we email or PM? It seems that I can't PM or send a visitor message.
                        You might apply some function to account for the mutation rates per region.
                        Isn't it likely that China has many more bat sequences with
                        multiple strains, but only academics publish because it's good for their career?
                        I have completed my GenBank downloads ... still reorganising the data with my old flu tools


                        • #13

                          [after WIV1 in 2012]
                          The sampling of this bat cave in Yunnan continued for another 5 years. From these samples,
                          the research team isolated 3 live viruses in succession, and obtained the full-length genomic
                          sequences of a total of 15 bat SARS-like coronaviruses. Surprisingly, the 15 strains contained
                          all the genome components of the SARS virus.
                          Hu Ben, an assistant researcher at the Wuhan Institute of Virology, Chinese Academy of Sciences:
                          Shi Zhengli
                          fuchsia line
                          the highest similarity between SARS-like coronaviruses found in bat caves and their respective
                          genes is above 97%,
                          After 13 years of virus tracing, the origin of the SARS virus was finally found.

                          [ one of those Yunnan cave viruses must have been RaTG13 , 96% similar to 2019-nCoV ]


                          • #14

                            So far, the nCoV-2019 has been reported to share 96% sequence identity to the RaTG13
                            genome (EPI_ISL_402131). However, the S1 Receptor Binding Domain (RBD) of the nCoV-2019
                            genome was noticeably divergent between the two at amino acid residues 350 to 550 – Figure 1A.
                            We aimed to identify coronaviruses related to nCoV-2019 in viral metagenomics datasets available
                            in the public domain. In a recently published dataset describing viral diversity in
                            Malayan pangolins (PRJNA573298 10) we used VirMAP 11 to reconstruct a coronavirus genome
                            (approximately 84% complete from samples SRR10168377 2 and SRR10168378 1) that
                            shared 97% amino acid identity across the same RBD segment – Figure 1B. This result
                            indicates a potential recombination event for nCoV-2019.

                            [these are amino-acids, but what about nucleotides ?
                            I remember the amino-acid conservation in inner segments of flu-A in mallards]
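On the amino-acid versus nucleotide question: synonymous changes alter the nucleotides without altering the protein, so amino-acid identity over a coding region can be much higher than nucleotide identity. The sketch below shows the comparison on two short made-up sequences. They are not from any virus, and the helper names `translate` and `identity` are only illustrative. It assumes uppercase, ungapped, in-frame DNA and the standard genetic code.

```python
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, built from the conventional TCAG codon ordering.
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(nt):
    """Translate an in-frame DNA string, ignoring any trailing partial codon."""
    usable = len(nt) - len(nt) % 3
    return "".join(CODON_TABLE[nt[i:i + 3]] for i in range(0, usable, 3))


def identity(a, b):
    """Fraction of identical positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


# Hypothetical toy sequences: three synonymous changes and one that is not.
seq_a = "ATGCTGTCTGGA"  # M L S G
seq_b = "ATGTTATCAGAA"  # M L S E
print("nucleotide identity:", round(identity(seq_a, seq_b), 3))
print("amino-acid identity:", identity(translate(seq_a), translate(seq_b)))
```

Here four of twelve nucleotides differ but only one of four amino acids does. For the pangolin RBD result, the same comparison on the real aligned segment would show how much of the 97% amino-acid identity also holds at the nucleotide level.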

                            pangolin=Schuppentier ,


                            • #15
                              The same source also produced this: