BJW

How to (potentially) make a deadlier coronavirus

Updated: Jun 3, 2021

Fig. 1 Electron micrograph of SARS-CoV-2 virions [NIH]


In high school biology class, you were probably taught (if at all) that when viruses reproduce, their offspring or daughter viruses are genetically identical, barring perhaps a few mutations. What if I told you that things aren’t quite as simple as you were led to believe? I’ll do you one better. What if I told you that some viruses have the ability to take genetic material from other viruses, and integrate said material into their own genome? Not only does this actually happen, but “some viruses” also happens to include those of the taxonomic family Coronaviridae. Does that name look somewhat familiar to you? Yes, SARS-CoV-2, the highly unanticipated sequel to SARS-CoV and the causative agent of COVID-19, belongs to this taxonomic family of viruses. This “ability” is more formally known as the process of recombination, the “recombining” of genomes.


Fig. 2 This mechanism is called “template switching” Adapted from [1]


Not surprisingly, viral recombination can result in a variety of changes. Get new genes, get a modified phenotype (= set of observable traits). Such changes include an expanded host range, evasion of the host immune response, and increased virulence/pathogenicity [1,2]. I was more than mildly alarmed upon learning of coronaviral recombination. Here are a couple of examples to show why. In the US, roughly 1.2 million people are living with HIV. What if SARS-CoV-2 infected a person with HIV and managed to recombine successfully with it? Could you imagine HIV, but with the transmissibility of a coronavirus? And then there’s the Ebola outbreak in the Democratic Republic of the Congo that only recently ended (June 25th, 2020). The first case of COVID-19 was reported there on the 10th of March. Yeah.

Before I get pegged as some sort of radical alarmist, I have to add that after learning more about the mechanics of coronaviral recombination, I’m no longer quite as concerned. Thanks to those mechanics, and to a few requirements of the host cell machinery that does the virus’s bidding, I don’t think the creation of some Corona-Ebola-HIV monstrosity is incredibly likely. Let’s get into some background.


Fig. 3 The RNA genome of SARS-CoV-2. The colored bars represent individual genes. We’ll get into what those shorter strands at the bottom right are later; they’re important. Diagram pulled from [3]


Coronaviral Recombination

How exactly does coronaviral recombination occur? There’s one thing that’s important to know: recombination doesn’t really occur through a specialized process. It’s more like a byproduct of how coronaviruses operate, and more specifically, of how they transcribe their RNA genomes in order to get the necessary proteins translated.


Looking at the diagram, you might think that there’d be no issue translating all of the genome in one go. Surely a ribosome just glides from end to end, linking amino acids together as it goes, right? Not really. Eukaryotic ribosomes (as in those from organisms with nuclei, like us) generally only translate the first gene found on the 5’ (or “front”) end of an RNA strand [2]. This is because a 5’ cap (which you should remember from high school bio) is required for ribosome recognition and translation of the gene next to it. This isn’t really an issue for regular cellular mRNAs, as they’re almost always monocistronic, meaning they encode only one product from one gene. Coronaviruses, however, have far more than one gene in their RNA genome. This is a problem.
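If it helps to see the “only the first gene gets translated” problem in action, here is a minimal Python sketch. It is a toy model of my own, not anything from the cited papers: the function name `first_orf`, the two tiny “genes,” and the spacer bases are all made up for illustration, and real translation initiation is far messier than “find the first AUG.”

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}


def first_orf(mrna: str) -> str:
    """Return the first open reading frame (start codon through stop codon).

    Toy model of a ribosome scanning from the 5' end: it uses the first
    AUG it meets and stops at the first in-frame stop codon. Anything
    further downstream is never read.
    """
    start = mrna.find("AUG")
    if start == -1:
        return ""  # no start codon, so nothing gets translated
    for i in range(start, len(mrna) - 2, 3):
        codon = mrna[i:i + 3]
        if codon in STOP_CODONS:
            return mrna[start:i + 3]
    return mrna[start:]  # ran off the end without hitting a stop codon


# Two made-up "genes" sitting on one strand, separated by a short spacer.
gene_a = "AUGGCUUAA"
gene_b = "AUGCCCUGA"
polycistronic = "GG" + gene_a + "CC" + gene_b

print(first_orf(polycistronic))   # only gene_a is read
print(first_orf("CC" + gene_b))   # gene_b is read once it is "first in line"
```

In this toy, the second gene is only ever read when it sits at the front of its own, shorter strand.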


Subgenomic RNAs to the rescue

This is where the shorter strands you see in the diagram come in. They’re called subgenomic RNAs, or sgRNAs. If you look at them, you’ll realize that for each sgRNA, there’s a different “first gene.” They also have something known as a leader sequence (represented by the black boxes) added at the front, which contains the necessary 5’ cap as well as a sequence that allows for recognition by the viral replication machinery. There is an sgRNA for every single gene seen in the genome above.


[Except for the ones labeled ORF1a and ORF1b. For reasons that are sort of complicated and unnecessary to explain (it involves something hocus-pocus-like called ribosomal frameshifting), those two genes effectively function as one, so they both get translated right off the bat. So no need for a separate sgRNA for them.]

Anyhow, these sgRNAs serve as a solution to this only-first-gene-gets-translated problem. But how do they get made?


Formation of the sgRNAs

Well, the ORF1a and ORF1b genes encode a specialized piece of machinery. It’s known as a replicase-transcriptase complex. As its name implies, its job is to replicate copies of the genome and transcribe sgRNAs. It does its sgRNA-making via a process called “discontinuous transcription” (this process is also important for recombination in the context of genome replication, which we’ll discuss later).


Positioned before each gene in the coronaviral genome are particular sequences called transcription-regulating sequences, or TRS’s. These are important to the process of discontinuous transcription. You can see them marked on the diagram as the green boxes. Within the “body” of the genome, these sequences are called TRS-B’s, B for body. At the head of the genome, there is the TRS-L, L for leader. These sequences all are very similar to each other [4,5].


What about discontinuous transcription?

Fig. 4. The process of discontinuous transcription. Continued below.


Getting there. When the replicase-transcriptase complex gets around to doing its job, it moves along the genome, starting from the 3’ (or “back”) end, making a complementary (also called antigenomic, or negative sense) strand to the genome (A pairs with T or U; G with C). [Part A of figure].


When a TRS-B has been reached and fully transcribed, the replicase-transcriptase complex pauses [Part B]. It can then either keep going or jump to the TRS-L, where the complementary copy of the TRS-B within the forming strand pairs with the TRS-L [Part C]. The replicase-transcriptase complex (RTC) then continues as if nothing happened, finishing after it reaches the start of the genome, having added the complement of the leader sequence to the developing strand [Part D].


[In the case of full genome replication, the RTC would continue past all of the TRS-Bs to the 5’ end of the genome, never jumping once]


Since the result is a strand that is complementary to the genome [Part E], it’ll need to go through one more round of transcription before a translatable sgRNA is produced [Part E + F].

Note: the above diagrams were adapted from [2]


To be clear: the second round of transcription needs to occur, as otherwise you’ve just got a strand with a sequence complementary to the genome, which means the bases within are different (again, A is complementary to U or T, G to C). Different bases, different product (if there even is one, as the start codon would read UAC, base for base, instead of AUG. You kind of need a recognizable start codon for the ribosome to do anything). Complementary does not mean identical. Important distinction.
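A quick Python sketch of the “complementary does not mean identical” point. This is my own toy, and it only swaps bases one for one. It ignores strand direction entirely, and the `complement` helper is just an illustrative name.

```python
# Base-pairing rules for RNA: A pairs with U, G pairs with C.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def complement(rna: str) -> str:
    """Swap every base for its pairing partner (strand direction is ignored)."""
    return "".join(COMPLEMENT[base] for base in rna)


start_codon = "AUG"
first_round = complement(start_codon)    # what the negative-sense strand carries
second_round = complement(first_round)   # copying the copy restores the original

print(first_round)
print(second_round)
```

One round of copying gives you the complement; it takes the second round to get back a strand with the original, readable sequence.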


Anyhow, this end product is what you would get if you cut the viral genome at the jumped-from TRS-B and the TRS-L, took out the middle piece, then fused what’s left.
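That cut-and-fuse description translates almost directly into a few lines of Python. Treat this as a sketch under heavy assumptions: the genome here is a string of labels rather than real bases, `[TRS]` is a placeholder token, the gene order is simplified, and `make_sgrna` is a name I made up. It also only models the final product, skipping the negative-sense intermediate.

```python
TRS = "[TRS]"  # placeholder token standing in for a transcription-regulating sequence


def make_sgrna(genome: str, body_trs_index: int) -> str:
    """Cut at the TRS-L and at the chosen TRS-B, drop the middle, fuse the ends."""
    # Find every TRS in the genome; the first is the TRS-L, the rest are TRS-B's.
    sites = [i for i in range(len(genome)) if genome.startswith(TRS, i)]
    trs_l, trs_bodies = sites[0], sites[1:]
    trs_b = trs_bodies[body_trs_index]
    leader_part = genome[:trs_l]   # everything in front of the TRS-L
    body_part = genome[trs_b:]     # the chosen TRS-B and everything after it
    return leader_part + body_part


# A label-based stand-in for the genome: leader, then genes each preceded by a TRS.
genome = "leader-" + TRS + "-ORF1ab-" + TRS + "-S-" + TRS + "-M-" + TRS + "-N"

print(make_sgrna(genome, 0))  # the S gene is now first in line
print(make_sgrna(genome, 2))  # the N gene is now first in line
```

Each choice of TRS-B gives a different sgRNA, each with the leader up front and a different gene first in line.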

Looking now at the diagram of the SARS-CoV-2 genome, you hopefully now have an appreciation of what goes into the creation of the sgRNAs.


I’m confused, why did the RTC have to jump again?

That is a good question, let me clarify. The RTC jumps so it can add the leader sequence (which contains the essential 5’ cap) to the developing sgRNA. Without the jump and the resulting addition of the leader sequence, the host cell’s ribosomes wouldn’t be able to recognize the sgRNA as something to translate. Also, without a jump, and the omission of all the genes in between, the gene right after the TRS-B wouldn’t be “first in line.” So the jumps are necessary to add the leader sequence, and to make sure that all the genes get a chance to be translated via their own sgRNAs.


So what does this have to do with recombination?

The main accepted theory of how recombination (in non-retroviruses) happens involves jumps of the replicase-transcriptase from one genome to another. This was shown in the opening diagram (Fig. 2). It’s called “template switching,” as in, the template of one genome was traded for another midway. As hopefully you understand now, the routine practice of RNA transcription leading up to translation for coronaviruses also involves jumps, albeit from one place to another on the same genome. Now you might be beginning to understand why I was somewhat alarmed when I first learnt of this.


To make things abundantly clear: what if, instead of jumping to the TRS-L sequence, something else happens? What if the replicase-transcriptase complex and developing (or “nascent”) strand jumped to a site on a new genome? A new, hybrid genome would be produced. Recombination would then have occurred. Does this actually happen? Sure it does; it’s been shown to occur under laboratory conditions. A 1992 experiment showed that when you put naked fragments of a coronavirus genome into a cell already infected by another coronavirus, a recombinant genome was made [6].
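In string terms, template switching is about as simple as it sounds. The following Python toy is only meant to show the bookkeeping: the two “genomes” are runs of placeholder letters, `template_switch` is a made-up name, and the sketch ignores copying direction and the complementary intermediate strand.

```python
def template_switch(template_a: str, template_b: str,
                    jump_from: int, jump_to: int) -> str:
    """Copy template_a up to jump_from, then carry on along template_b from jump_to."""
    return template_a[:jump_from] + template_b[jump_to:]


# Placeholder templates: upper case for one genome, lower case for the other.
genome_a = "AAAAAAAAAA"
genome_b = "bbbbbbbbbb"

hybrid = template_switch(genome_a, genome_b, jump_from=4, jump_to=4)
print(hybrid)
```

The output is part one template and part the other, which is all “recombinant” really means here.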


The coronavirus genome fragments that were introduced belonged to a strain closely related to that of the already-infecting coronavirus. This is hardly as exotic as the scenario imagined at the start, with recombination occurring between viruses such as Ebola or HIV, I admit. I was also unable to find a study attempting something more daring. However, it’s unlikely that someone would try to create a hybrid genome out of Ebola or HIV with coronavirus as part of a study on recombination. Imagine trying to get the funding for that.

Nature, however, is not limited by the scruples of grant committees. There is a widely accepted instance of a possible coronaviral recombination event of the more exotic variety already out there. In 1988, it was noticed that a particular strain of Mouse Hepatitis Virus, a coronavirus, possessed an inactive gene that was oddly similar to one found in influenza C [7]. The gene in question was one encoding the membrane protein hemagglutinin esterase (HE). It was subsequently realized that members of the toroviruses, taxonomic “cousins” of the coronaviruses (they both belong to the order Nidovirales), also possessed similar versions of this gene, some inactive, others active [8].


Interestingly, the locations of the HE gene in the toroviral and coronaviral genomes differ, suggesting that these two groups of viruses acquired their versions in two separate recombination events (as the sites of integration should be more or less random) [9]. If they were acquired in the same recombination event (occurring in the common ancestor of both groups), the position ought to be the same. But the positions differ.

Fig. 5 Pulled from [8]. You can sort of see how the exterior of the virus on the right is more packed than that of the left virus due to the additional HE membrane proteins.


Today we know that a variety of coronaviruses, some of which infect humans (e.g. HCoV-OC43 and HCoV-HKU1) possess versions of this gene [4]. While it’s accepted that the HE gene possibly showed up through recombination in these viruses, the precise sequence of events and recombinational partners that led to this somewhat odd situation remains to be definitively proven.


So recombination between quite unrelated genomes occurs. Why aren’t we seeing new, scary viruses emerging from these interactions?

Here is where I need to qualify my remarks extensively. In order for a mutated and potentially deadlier coronavirus to emerge from a recombination event, a certain set of events (some less likely than others) would have to occur. Based on what we’ve discussed thus far (plus some extra) here are some events that I think would need to occur, plus a few conditions:

Event #1: coinfection with another virus

I think this one’s pretty obvious. In order for recombination to occur, as shown in Fig. 2, two different viruses need to infect the same cell. In order for that to occur, a person needs to be infected with more than one virus at the same time.


Event #2: the coronavirus RTC jumps to the other genome

Again, fairly obvious. In order for new genetic material to get added to the coronavirus genome, the RTC needs a template of that material (i.e., the other genome). However, I need to clarify something about the sites the RTC might jump to and from. Based on what I’ve been saying, it may seem that the RTC only jumps to and from TRS sites, thanks to what you know about the process of discontinuous transcription. The implication would be that recombination is limited to other genomes that contain sequences similar to the TRS’s. However, some recent data suggest that the RTC might not necessarily require this similarity [3].


You can scroll down (a lot) to the next section if you’re not interested in the added detail.


*sidebar start*


Fig. 6. Pulled from [3]. A mapping of crossover events on the SARS-CoV-2 genome.


What you’re looking at is a heatmap showing the frequency of particular jumps made by the RTC. Each coordinate represents a certain jump made from a particular spot on the genome, to a particular spot on the genome.


The “from” (3’ breakpoint) is the y coordinate, and “to” (5’ breakpoint) is the x.


The color of the pixel at that coordinate denotes the frequency at which that particular jump was detected, with blue being the lowest, pink the highest. Finally, the x and y axis units are both in terms of kilobases (one thousand bases) from the start of the genome. Hopefully that makes sense.


An example. Say you’re interested in how frequently the RTC jumps from (y coordinate) the 20-thousandth base, [about two thirds of the way from the 5’ (front) end of the genome] to (x coordinate) the 5-thousandth base. You’d start by looking at the y axis, and trace your way down to the 20, as the units are in kilobases (kb). You would then trace your finger (or cursor) horizontally, until it hovers vertically above the 5 on the x axis. Looking back at the un-annotated map, we can tell that this particular jump occurs at the lowest frequency, as its coordinate is blue.


Fig. 7 The spot on the heatmap as described in the example.

Fig. 8 What the RTC jump for the chosen coordinate on the heatmap would look like on the genome. For the above example.
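For the programmers in the audience, a heatmap like this is just a table of counts, keyed by where each jump started and where it landed. Here is a minimal Python sketch of that idea. The jump coordinates below are numbers I invented for illustration, not data from [3], and `bin_jumps` is a hypothetical helper name.

```python
from collections import Counter


def bin_jumps(jumps, bin_size=1000):
    """Count jumps per (from, to) bin, with bins measured in kilobases by default."""
    counts = Counter()
    for from_pos, to_pos in jumps:
        counts[(from_pos // bin_size, to_pos // bin_size)] += 1
    return counts


# Made-up (from, to) positions in bases, purely for illustration.
jumps = [
    (20_150, 5_020),
    (20_900, 5_400),
    (28_260, 70),
    (28_270, 65),
    (28_900, 72),
]

heat = bin_jumps(jumps)
print(heat[(20, 5)])  # jumps from around the 20 kb mark to around the 5 kb mark
print(heat[(28, 0)])  # jumps from around 28 kb to the very front of the genome
```

Looking up a (y, x) pair in `heat` is the code version of tracing your finger across the map; the color of a pixel is just that count mapped onto a color scale.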

A couple more important details. The locations of the TRS-B’s on the genome are indicated by the red dots running up the y axis. Finally, the red asterisk marks the location of the TRS-L.


The most jarring thing about this map, once I understood what it stood for, was that there’s neon yellow in a lot of places it shouldn’t be, according to the normal TRS-dictated jumps. In other words, a lot of abnormal jumps (“non-canonical fusion events” in the terms of the paper). According to the theory that the RTC only jumps from TRS-B’s to the TRS-L, really only the left-most vertical stripe (which was enlarged for greater detail) should contain neon or pink, and only on the same y-levels as the TRS-B-indicating dots. This is not to say that “normal” jumps aren’t occurring, though. They occur at the highest frequency, as the few places where pink does show up are at those aforementioned intersections of the vertical TRS-L line and the horizontal TRS-B lines.


However, looking at how a faint neon line traces the y-level of pretty much any TRS-B (meaning that the RTC is jumping from that one TRS-B to basically any part of the genome preceding it), it becomes clear that sequence similarity to TRS’s (both at the site being jumped from and to) might not be as necessary for an RTC to make a jump as I previously thought.


Moral of the story here is that it might not be incredibly difficult for an RTC to make a jump to a foreign genome.


*Sidebar ends*




Event #3: the RTC jumps back to the coronavirus genome and transcribes the rest of it

This one’s not so obvious, but still deducible based on what I’ve told you guys thus far. I need to make one thing clear though: recombination does not necessarily create functional viruses. For recombination to produce a new, functioning virus, much of the coronavirus genome must remain intact. As we’ve already seen, the 5’ end of the genome contains essential sequences that make ribosomal translation of sgRNAs possible. There are also sequences on the 5’ end that are needed for RTC recognition. And many of the genes (though we didn’t really discuss their purpose) are essential for proper coronavirus function. This jump back probably wouldn’t be quite as necessary if the other participating virus were another coronavirus, but it would be essential for “coronavirus + something else” recombination.


Condition #1: The new gene lands in a favorable position

If the new gene gets inserted into the middle of an already existing gene, that could potentially disrupt the function of the latter, which could then possibly result in a non-viable phenotype. Had I not read the paper describing the abnormal jumps of the SARS-CoV-2 RTC, I would have added that the gene would have to be the “first in line” after a TRS-B, but now it seems completely possible that the RTC may jump even before a TRS-B, producing an sgRNA with a different first gene than would occur “normally.”


Condition #2: The new gene is one that actually increases infectiousness or virulence

I don’t know enough virology/general biology to speculate too much on what such genes might be, but one study I briefly skimmed described a laboratory strain of influenza with increased pathogenicity after a bit of cellular ribosomal RNA was incorporated into a receptor gene [10]. The point here is that a new gene or gene fragment might not have to be one that is a “virulence gene” in its original setting in order to increase pathogenicity in its new place.


In conclusion...

These are just a few events and conditions that I could think of, so this is by no means an exhaustive list. As you can see, there are a lot of factors at play here; I just wanted to show a few of them that might need to work out before anything of consequence happens. What’s the probability of all of the above working out? I couldn’t give you a number. All I can say is that it possibly has happened once, as seen in the case of the HE gene (although this didn’t necessarily result in a deadlier virus). [Edit 8/23] The fact that other cold-causing coronaviruses have been endemic (that is, present within the human population) for some time without complication gives me some reassurance, though. I am by no means an expert on this matter, so I defer to vetted experts like Dr. Fauci or Dr. Birx. Perhaps tellingly, they have not (to the best of my knowledge) brought up recombination as something significant to consider.


If anything, I hope the contents of this post have given you an understanding of why I think the coronavirus is something to be treated with a healthy [pun intended] amount of respect. I’ll leave it up to you to decide what that means exactly.




Thanks for reading, stay safe!

- BJW

