Justin Zook, PhD: A Conversation with the Co-Leader of the Genome In a Bottle Consortium Developing Standards For Benchmarking Genetic Variant Detection

Oct 31, 2022 | News

In this episode of The Rising Tide Podcast, Dr. Justin Zook, Co-Leader, Biomarker and Genomic Sciences Group at the National Institute of Standards and Technology (NIST), sits down with the Center For Genomic Interpretation to discuss the Genome In a Bottle (GIAB) Consortium and its benchmarks for mutation detection. Laboratories, insurance payers, and anyone who wants to better understand genetic and genomic testing should listen to this detailed discussion.

The telomere-to-telomere (T2T) reference genome and other resources are also reviewed in detail.

Thank you again Dr. Zook!

If you enjoy this podcast, more Rising Tide podcast episodes are available on this website! You may also listen on Apple Podcasts, Spotify or Google Podcasts.

Transcript:

EGGINGTON: Hello and welcome to the Rising Tide Podcast. It’s an honor today to be interviewing Dr. Justin Zook. He is the co-leader of the Biomarker and Genomic Sciences group at the National Institute of Standards and Technology. Today we’ll be talking about the Genome in a Bottle Consortium and their development of standards for benchmarking genetic variant detection. Thank you Dr. Zook for being here.

ZOOK: Thanks so much for the invitation to talk about the work that we’ve been doing in Genome in a Bottle at NIST. We formed the Genome in a Bottle Consortium about 10 years ago now, with my group leader Marc Salit at the time, and the idea of this consortium was to bring together the community. That’s included people from other government agencies like the FDA and NIH, it included people from commercial companies that were developing the sequencing technologies, as well as clinical laboratories that are doing sequencing. Also, a lot of people from academic laboratories that are developing new bioinformatics methods or new sequencing methods that we could use.

We brought together this group of people to help us to characterize a really small number of human genomes as well as we could so that these can be used as benchmarks, or sort of as truth for people when they want to see how accurate their variant calls are. So we worked with this consortium to select a set of seven genomes that we’ve characterized so far and to do the sequencing of these genomes with lots of different technologies, and then worked with them to integrate those technologies together to basically come up with the best answer that we could by combining all the different technologies together for these genomes, and carefully define basically which regions of the genome we could be confident in based on current methods and which regions we weren’t really sure. Then a clinical laboratory or other laboratories can take the same DNA that we’ve sequenced, measure it using whatever method they tend to use, run it through their bioinformatics methods, and then compare their variant calls to ours in the end to get a sense of what their accuracy is and what the strengths and weaknesses are.
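[Editor's note: The comparison Dr. Zook describes can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not the consortium's tooling: the `Variant` class, the `benchmark` function, and the toy data are hypothetical, variants are matched by exact position and alleles, and the confident regions are assumed to be BED-style intervals (0-based start, exclusive end). Real comparison tools also reconcile different representations of the same variant and check genotypes, which this sketch ignores.]

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    chrom: str
    pos: int  # 1-based position, as in a VCF file
    ref: str
    alt: str


def in_regions(variant, regions):
    """True if the variant lies in any BED-style (chrom, start, end) interval."""
    return any(
        chrom == variant.chrom and start < variant.pos <= end
        for chrom, start, end in regions
    )


def benchmark(calls, truth, confident_regions):
    """Compare a lab's calls to benchmark calls, inside confident regions only."""
    calls_in = {v for v in calls if in_regions(v, confident_regions)}
    truth_in = {v for v in truth if in_regions(v, confident_regions)}
    tp = len(calls_in & truth_in)  # called and in the benchmark
    fp = len(calls_in - truth_in)  # called but not in the benchmark
    fn = len(truth_in - calls_in)  # in the benchmark but missed
    recall = tp / (tp + fn) if tp + fn else float("nan")
    precision = tp / (tp + fp) if tp + fp else float("nan")
    return {"tp": tp, "fp": fp, "fn": fn, "recall": recall, "precision": precision}


# Toy data (made up for illustration only).
truth = {
    Variant("chr1", 100, "A", "G"),
    Variant("chr1", 200, "C", "T"),
    Variant("chr1", 5000, "G", "A"),  # outside the confident region
}
calls = {
    Variant("chr1", 100, "A", "G"),   # matches the benchmark
    Variant("chr1", 300, "T", "C"),   # extra call
    Variant("chr1", 5000, "G", "C"),  # outside the confident region, ignored
}
confident = [("chr1", 0, 1000)]

result = benchmark(calls, truth, confident)
print(result)
```

[In this toy case the lab finds one of the two benchmark variants inside the confident region and makes one extra call, so recall and precision are both 0.5. Calls outside the confident region are not counted either way, which mirrors the point that the benchmark only covers regions where the answer is trusted.]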

EGGINGTON: That’s why the work you’re doing is just so very important and it goes along well with the reason why we’re interested in speaking with you today. I’m with the Center for Genomic Interpretation, a co-founder of this non-profit and our mission is to encourage careful stewardship of clinical genetics, genomics, and precision medicine.

Before I begin to ask you about clinical labs, you said something about clinical labs. I just want a little caveat: everything we say here today should not be used as medical advice, it should be used as informational only. And another quick note since we’re talking about caveats is, any products we might mention today in this podcast, please remember that we’re not endorsing those products with the exception of Genome in a Bottle standards.

On to my questions now, thank you for that introduction Dr. Zook. How are the Genome in a Bottle samples typically used by clinical laboratories in the United States based upon your experience?

ZOOK: That’s a good question. They typically will purchase our DNA from our samples, run it through their sequencing process, and then run it through their analysis process up through the point of calling variants in the samples.

EGGINGTON: I’m gonna hit pause for you on that. When you say up to the point of calling variants in the samples, can you for our non-expert listeners explain what that means, ‘calling variants’?

ZOOK: Good question. This is jargon in the Genomics community that essentially means looking for differences between our sample, or any sample, and the reference genome that was basically the first reference genome that was sequenced by the Human Genome Project 20 years ago or so. These variants, or calling variants, means identifying these differences between an individual you’re sequencing and the reference genome.

EGGINGTON: When you call a variant, are you also naming it or is that a different thing or is that the same thing?

ZOOK: So there are steps after the process of calling variants where you’re trying to understand: what do the variants mean or how do you interpret what the variants mean? Are they associated with a disease or causing a disease, or should you give someone a particular medication versus another medication? Generally in Genome in a Bottle, we’ve scoped our work so that we stop at the point of actually just calling the variants, or saying ‘where are these changes in the genome’? And then, do they come from both the mother and the father, or just from the mother? We’re helping labs to basically understand how accurate they are doing up to that point in the process.

EGGINGTON: Thank you for specifying that. I interrupted you, we were in the middle of talking about how clinical laboratories in the USA are using the samples that Genome in a Bottle provides.

ZOOK: I guess one thing that not all labs know is that we actually have seven different samples that we’ve characterized at this point. Our first sample which we call our ‘pilot genome,’ we released back in 2014 I think, as our first benchmark. Then over the next couple years after that, we characterized six additional samples and these are two mother-father-son family trios; one of Ashkenazi Jewish ancestry, and one of Han Chinese ancestry so we at least get a little bit of diversity in the samples that we have. We’ve characterized the variants in all seven of those samples and generally it’s helpful for laboratories to sequence all seven of those so that you get more examples of different variants. It helps to reduce the risk of you overtraining to any particular sample also when you’re doing your benchmarking.

EGGINGTON: That makes a lot of sense. So are laboratories using those seven samples as you guys intended? Are they meeting the vision of what Genome in a Bottle had for these seven samples?

ZOOK: Yeah, I think in general, laboratories are basically using these as we’ve intended, though there have been creative uses of these, creative in a good way I think, and it’s also good to know the caveats when you’re using them in ways they weren’t intended to be used. One of the things to note about Genome in a Bottle is we’ve really been focused on germline variants or basically normal genomes as opposed to cancer genomes. But these Genome in a Bottle samples have been used to help to test variant calling methods for somatic variants or cancer variants as well and that’s where some of the limitations of what we’ve done come in, but they can be used to assess parts of that process as well.

EGGINGTON: For our non-expert listeners, when Justin is talking about cancer variants or somatic variants, there are variants that you’re born with and we talk about those as being germline. You might call it ‘hereditary’ sometimes, although those two meanings are slightly different, but a lot of people conflate them. So if it’s germline, it means that every cell in your body has those genetic variations. Or almost every cell, there’s some cells that don’t have DNA. Anyway, if they’re somatic or if we talk about cancer mutations, another kind of potentially inaccurate way we talk about somatic mutations, that means that they’ve been acquired. That means that they’re not in every cell in your body that has DNA, that they’re only in a portion of cells in your body that have DNA; they were acquired at some point in your lifetime and that’s very, very, very important for precision oncology to consider those somatic mutations. So Justin, you were describing how what you have are seven germline, well-defined genomes and that you’ve seen people using them in the cancer space as well in the somatic space.

ZOOK: Yeah, that’s right. We are working on actually characterizing some tumor genomes that could in the future be used as real tumor samples or for somatic variant detection. But one of the ways that people have used our existing Genome in a Bottle samples is to mix them together in different ratios. One of the reasons you do that is because often these somatic variants or the variants that are acquired in cancers are only in some fraction of the cells of the tumor or of all the cells that are sequenced. So this helps you to see how low you can go in terms of the level of variants that you can detect. It’s not that these are perfect mimics of real somatic mutations, but they do help to give you a sense for how well you’re detecting these lower level mutations.
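[Editor's note: The arithmetic behind mixing two germline samples can be sketched as follows. This is a simplified illustration, not a protocol from the consortium. The function name `expected_vaf` is hypothetical, and the sketch assumes both samples are diploid at the site and that the mixing fraction reflects the share of genome copies in the mixture.]

```python
def expected_vaf(fraction_a, alt_copies_a, alt_copies_b, ploidy=2):
    """Expected variant allele fraction at one site after mixing samples A and B.

    fraction_a   -- share of the mixture that comes from sample A (0.0 to 1.0)
    alt_copies_a -- copies of the variant allele in sample A (0, 1 or 2 if diploid)
    alt_copies_b -- copies of the variant allele in sample B
    """
    if not 0.0 <= fraction_a <= 1.0:
        raise ValueError("fraction_a must be between 0 and 1")
    alt = fraction_a * alt_copies_a + (1.0 - fraction_a) * alt_copies_b
    return alt / ploidy


# A variant that is heterozygous in sample A and absent from sample B,
# with sample A spiked in at decreasing fractions of the mixture.
for fraction in (0.5, 0.1, 0.02):
    print(fraction, expected_vaf(fraction, alt_copies_a=1, alt_copies_b=0))
```

[With sample A at 10% of the mixture, a variant that is heterozygous in A and absent from B is expected in about 5% of reads at that site. Lowering the fraction step by step is what lets a lab see how low it can go before such variants stop being detected.]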

EGGINGTON: Thank you. Now, if all you do is watch crime dramas on TV, you’ll get the sense that DNA sequencing is really, really simple, right? That you show up at the scene of a crime or something and some sample is taken and then on someone’s laptop five minutes later, this double helix is spinning and they’re like ‘oh, we found the perpetrator of this crime,’ right? There’s this idea that DNA sequencing is really, really simple I think. What you and I both know is that is not true. Well, for one the data doesn’t come up as a spinning helix, right? That’s not how we visualize DNA sequence data. Also, let’s probe on this idea that DNA sequencing is easy. I would say that there are parts of DNA sequencing or parts of the genome that are easy, but other parts that are more challenging to analyze correctly. How can Genome in a Bottle benchmarks be used to understand the strengths and weaknesses of a sequencing method if it’s true that there are different parts that are more challenging to sequence than others?

ZOOK: One analogy to this that we sometimes use is the analogy of a puzzle. For example, as a Christmas gift the other year, one of our relatives gave us a puzzle of our family outside and we were all wearing different clothes with different colors and then there was the background, which was an out-of-focus grassy area and dirt area and trees. If you think about the genome as a puzzle, the places where my family was were quite easy to put together, or at least relatively easy to put together because they were all distinct from each other. But then the background regions, the pieces looked very similar to each other and it took a lot longer to figure out how to put those pieces together. In the same way, there are regions of the genome that are quite distinct from each other and it’s much easier to analyze those regions of the genome using pretty much any of the current sequencing technologies that are out there. Then there are other regions of the genome that are very similar to each other, we sometimes call these ‘repetitive regions,’ and these are much harder to put together and sometimes you need specific sequencing technologies to analyze them that maybe can read longer stretches of DNA at a time. It’s sort of like having a bigger piece of the puzzle to put together. So there are these different regions of the genome and one of the reasons that Genome in a Bottle has kept on going over these past 10 years and we didn’t just release our first genome and stop is because we only covered about 77% of the genome in our initial benchmark that we released. Over time as new sequencing technologies have been developed, and as new analysis methods have been developed, we’ve been pushing into these harder and harder regions of the genome and developing benchmarks for these harder types of variants and harder repetitive regions of the genome.

One of the ways that laboratories can use our latest benchmarks that we’ve been working on is to see how well do they perform in these easier regions of the genome versus the harder regions of the genome, because part of what we’ve developed are some resources to accompany our benchmarks where they’re basically lists of genomic regions that are difficult for different reasons, different types of repetitive regions, and that type of thing that helps clinical laboratories to understand which regions they’re doing well in and which regions they might not be doing as well in.
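[Editor's note: Splitting performance by region type can be sketched as below. The region lists, the names ‘easy’ and ‘repetitive,’ and the function `recall_by_stratum` are hypothetical and the data is made up. The sketch assumes variants are (chromosome, position, reference allele, alternate allele) tuples with 1-based positions and that each region list holds BED-style intervals.]

```python
def recall_by_stratum(truth, calls, strata):
    """Fraction of benchmark variants that were called, per named region list.

    truth, calls -- sets of (chrom, pos, ref, alt) tuples, pos is 1-based
    strata       -- dict mapping a name to a list of (chrom, start, end)
                    BED-style intervals (0-based start, exclusive end)
    """
    results = {}
    for name, regions in strata.items():
        truth_here = {
            v for v in truth
            if any(c == v[0] and s < v[1] <= e for c, s, e in regions)
        }
        found = truth_here & calls
        # None signals that the benchmark has no variants in this stratum.
        results[name] = len(found) / len(truth_here) if truth_here else None
    return results


# Toy data (made up for illustration only).
truth = {
    ("chr1", 100, "A", "G"),
    ("chr1", 150, "C", "T"),
    ("chr1", 2100, "G", "A"),
    ("chr1", 2200, "T", "C"),
}
calls = {
    ("chr1", 100, "A", "G"),
    ("chr1", 150, "C", "T"),
    ("chr1", 2100, "G", "A"),
}
strata = {
    "easy": [("chr1", 0, 1000)],
    "repetitive": [("chr1", 2000, 3000)],
}

per_stratum = recall_by_stratum(truth, calls, strata)
print(per_stratum)
```

[In this toy case every benchmark variant in the ‘easy’ regions is found, but only half of those in the ‘repetitive’ regions are, which is the kind of contrast a single genome-wide number would hide.]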

EGGINGTON: A clinical laboratory that elects to figure out what their weaknesses are, for example. I think it’s important to note there’s a misunderstanding in many people in the clinical space that sequencing is the same in every laboratory, that the same strengths and the same weaknesses will be the same in every laboratory and that’s not true. There’s a lot of stuff that laboratories do, perhaps the sequencing instrument is the same, but beyond that there’s a lot of stuff that laboratories do to improve the accuracy of their testing. So what you’re saying Dr. Zook, is that a laboratory who elects to figure out what their strengths and weaknesses are can say, ‘okay give me the puzzle,’ you know, ‘give me the solution to the puzzle but also give me the pieces and I’m going to put it together independent of the solution and then compare the two and see what I was great at and see what I was poor at’. Then what we hope they’ll do is two things: be transparent with clinicians and the public about what they’re poor at and work on getting better and improving that. Without standards such as the Genome in a Bottle standards, you can’t do that. There’s no way to find out what those weaknesses are. Have I summarized that correctly?

ZOOK: Yeah that was a great summary.

EGGINGTON: Okay, great. Checking your group’s website may not be something that some clinical labs are doing on a daily basis, so tell us about the new resources that have become available for use by clinical labs and what are those that are currently in the works? What’s new, what’s out there that can be used, and what’s being worked on?

ZOOK: Those of you who have been following Genome in a Bottle, you know that we’ve released multiple versions of our benchmarks over the past 10 years. The latest versions that we’ve released had some significant improvements in these difficult regions of the genome. One of them, we call this our version 4 benchmark, we used some of the newer long-read sequencing technologies and some other technologies called Linked-Reads to help to access these really repetitive regions of the genome that had previously been excluded because we weren’t sure what the answer was. This included some genes like PMS2, which are commonly tested in clinical laboratories, and it has some quite challenging regions within it.

EGGINGTON: Just an aside, that’s for hereditary colon cancer disorders, that gene. So that’s really important.

ZOOK: Yeah, there are parts of that gene that are really difficult because there are what are called pseudogenes that are related to it, and these pseudogenes may or may not be actually functional. But they are so similar to the PMS2 gene that it’s hard to know whether a change that you detect is in the actual PMS2 gene or in one of these pseudogenes. That really affects the interpretation downstream, which of the genes it’s in. One of the things that we were able to do is include genes like PMS2 in this latest benchmark that we developed that ended up covering it. It depends on how you calculate it, but something like 92% of the genome or so is in the latest benchmark.

Then there’s another resource that we developed in parallel with this, but also subsequent to this, where we noticed that even this version 4 benchmark that we had was not covering some medically important genes that were challenging for one reason or another. We used an even newer method for analyzing genomes that’s called ‘de novo assembly’ where instead of using a reference genome at the beginning that you compare all of your sequencing data to, you try to stitch these sequencing reads that you get together. That enabled us to characterize a few hundred genes that are medically important but were not included well by our previous benchmarks. We developed a specialized benchmark for one of our samples for this set of challenging, medically relevant genes and we released that over the past year as well.

EGGINGTON: So a laboratory can go on your website and find out which one that is and get that in-house to work with?

ZOOK: Yeah, that’s right. We also have a publication that was just published earlier this year in Nature Biotechnology about that.

EGGINGTON: Awesome, thank you. I did not know about that so that’s exciting news. What’s coming up, what’s in the works that you’re allowed to tell us about?

ZOOK: We generally operate very transparently in Genome in a Bottle and I can tell you about what we’re working on and thinking about. Basically there are a few different areas that we’re working on. One of them is to expand in these methods where we’re using de novo assembly. These methods have advanced really rapidly over the past couple years. We’re working with other consortia like the Human Pangenome Reference Consortium, where they are doing de novo assembly of a larger number of individuals, including some of our Genome in a Bottle samples, and we’re trying to use these de novo assemblies to expand our benchmarks to include as much of the genome as we can. That also will include larger genomic changes that can occur in our genomes that we often call structural variants, where it’s a large insertion or deletion of sequence in the genome, or other types of more complicated changes that can occur. That’s for our existing samples, that’s one of the efforts that we have ongoing right now.

EGGINGTON: Let me just interject for one second. So Pangenome, for those that are unfamiliar with this, the existing genome that we call the reference genome that we compare all clinical sequencing to, is not very representative of different ancestry groups, different populations across the globe. We know this to be a great weakness in clinical testing and so the National Institutes of Health – it is the NIH right?

ZOOK: Yes

EGGINGTON: -is running a Pangenome project where the goal is, if I get the number right, to sequence 250 genomes, which is a lot. Deep sequence, get it right genomes, from across the globe to better represent the variability in genetics in different populations. It’s very exciting work. I think it’s going to bring clinical genetics really, really forward a lot in terms of accuracy and efficacy across the globe, as well as here in the United States. It’s very exciting to hear that you guys are involved in that as well. Okay, keep going. What’s new that you guys are working on?

ZOOK: Sort of related to that, one of the things that we’ve started to explore with the Human Pangenome Reference Consortium is whether there are differences in accuracy between people that have different ancestries. So someone has African ancestry versus European ancestry, do you see differences in the accuracy of the variant calls between those individuals? We’re still in the initial stages of doing this exploration to better understand what those differences might look like. That’s one of the paths that we’re pursuing, since up to this point, our benchmarks have been from what I mentioned: that there’s someone of European ancestry, someone of Ashkenazi Jewish ancestry, and a family of Han Chinese ancestry. Having broader representation will help us to understand differences between ancestries.

EGGINGTON: That’s great. I’ll be curious to see what the results of that are. As an FYI, what my specialty is – I’m not sure if I introduced myself – I’m Dr. Julie Eggington. My specialty in genetics is the interpretation of DNA. It’s not actually what we’re talking to Justin about. What we’re talking to Justin about is the detection or the calling of DNA variants. What my job has traditionally been in the industry is then interpreting the clinical meaning of that. Definitely what has been demonstrated on multiple occasions is that clinical scientists are less accurate at interpreting the clinical meaning of true genetic variants in populations that are non-white Western European and so I’ll be interested to see if we see the same thing. There’s lots of consortia that are working on that, the industry is trying to fix that, it’s a big issue in health disparity, trying to get the interpretation correct for groups that have traditionally not accessed genetic testing. I’ll be interested to see what happens in terms of the detection and the variant calling in the Pangenome project.

Okay, anything else you want to tell us about with what’s new and coming?

ZOOK: The other thing future looking I mentioned, that we’re looking into characterizing some tumor and normal cell line pairs where you have normal cell lines and tumor cell lines from the same individual. We characterized them really deeply, similar to what we have with the normal cell lines that we’ve used up to this point. We have one candidate tumor cell line that we’ve just started to do some sequencing of. One of the things that makes this tumor cell line unique as far as we know, is that it’s clearly consented to be able to publicly release all of the genomic information. That’s really important for efforts like Genome in a Bottle where we’re trying to work with people around the world and around the country at different institutions and want to make all our results easily accessible by the community.

EGGINGTON: Is it a newer cell line or is this one that’s been around for many years?

ZOOK: This is a new one that we have been working with someone at the Massachusetts General Hospital to develop. It’s a pancreatic cancer cell line that we’ve been working on. Unfortunately, none of the existing ones were consented appropriately for what we wanted.

EGGINGTON: That’s definitely a challenge, yep. Okay, anything else before I move on to the next question?

ZOOK: No, I think you’re good.

EGGINGTON: That’s it? That covers it.

ZOOK: Yep.

EGGINGTON: Okay, so I want to ask you about the Telomere-to-Telomere Consortium. But before we talk about that, for a lot of the audience we need to actually define what a telomere is. Let’s define that. How would you describe that for a lay audience?

ZOOK: Essentially, the telomere is the sequence that’s at the very end of each chromosome. We have 46 chromosomes in normal cells in our body and half of them come from the mother, half of them come from the father. Each of those chromosomes is one long string of DNA and at each end there’s what’s called a ‘telomere,’ which is just a repetitive sequence at the end and does have some important biology, but that’s not really the purpose of this consortium that was formed. Essentially, the reason it’s called the Telomere-to-Telomere Consortium is because what they were able to do is sequence each of the chromosomes from one end of the chromosome to the other, so from one telomere to the other telomere. They were able to get the complete sequence of each of the chromosomes in the cell.

EGGINGTON: So we could call this the start to finish consortium of genomics? Okay, so this year there’s been news about the genome finally being completely sequenced. People have seen that and it caused confusion for some people because they’re like ‘I thought that happened 20 years ago’. But some of it happened 20 years ago. It was a little bit incomplete, but this year they’ve been saying it’s happened. Tell us about this consortium, tell us about the Telomere-to-Telomere Consortium.

ZOOK: I was fortunate to be part of this consortium of over a hundred people from around the world that worked on this project that formed during the Covid pandemic. We had a big Slack workspace that everyone coordinated on. It included a lot of experts in doing genome assembly, which I mentioned earlier, this is stitching together reads to be able to assemble the complete human genome. In particular, one of the regions that was really hard to sequence and had never been sequenced well before were the centromeres of each of the chromosomes.

EGGINGTON: That sounds like the center is that right? Centromere? Center?

ZOOK: Essentially, yeah. It’s not exactly in the center but that’s why it was called the centromere. It turns out that it’s super repetitive, that it’s the most repetitive, long, biggest repetitive regions in the genome.

EGGINGTON: This is the part of the puzzle that is near impossible to solve because every piece looks the same. So it’s been tough.

ZOOK: Yeah, that’s exactly right. There have been new sequencing technologies that have come out over the past couple years that are often called long-read sequencing technologies because you can read longer stretches of DNA at a time. There are two different ones, one from PacBio that reads maybe about 20,000 bases, so 20,000 letters of DNA and does it quite accurately for each of the individual reads. Then there’s another one called Oxford Nanopore Technologies that can read even longer stretches of DNA, a hundred thousand or even several hundred thousand or a million bases at a time. It’s less accurate right now, but these can be used complementary to each other to sort of resolve these most difficult regions that couldn’t be resolved before.

EGGINGTON: To use your analogy, when you say PacBio has these long reads and Oxford Nanopore has these even bigger reads, we’re talking about the size of the puzzle piece, right?

ZOOK: Yeah.

EGGINGTON: The bigger the puzzle piece, the easier it is to put together your puzzle. So we have these new technologies coming out that have allowed this type of Telomere-to-Telomere Consortium to be able to actually solve the hard bits. When will this Telomere-to-Telomere consortium reference genome become available to clinical labs for them to actually use?

ZOOK: It is already available. It was published in Science earlier this year in a set of different papers [https://doi.org/10.1126/science.abj6987 and https://github.com/marbl/CHM13]. It included a paper showing where you get improvements in variant calling using this new reference. It turns out that it’s not only these new regions that are included in the new T2T reference that are important, but also it fixed some errors in previous versions of the reference. That allows you to make more accurate variant calls in a number of different regions including some medically important genes that people have either had to ignore in the past or they’ve had to have sort of workarounds to be able to analyze.

EGGINGTON: This sequence that’s utilized these long reads, that’s one of your group’s Genome in a Bottle samples?

ZOOK: That’s a good question. Actually, it’s not. The reason that it’s not is because they used a really special cell line for this work to make it an easier process than it would have been otherwise with a normal genome. It’s called a hydatidiform mole cell line which is –

EGGINGTON: Wait, say that again for me? What is it?

ZOOK: I think I’m saying it correctly: hydatidiform mole. It’s sometimes called a ‘molar pregnancy,’ so this is a pregnancy that doesn’t go well, and that’s because at least some of these end up having two copies of the chromosome from the – oh now I’m forgetting – I think it’s from the father. It’s either from the father or the mother, now I’m actually forgetting which one, but it doesn’t really matter. You get two identical or almost identical copies of each chromosome, so it’s almost like instead of having 46 different chromosomes that you have to do this assembly for, you only have 23 different chromosomes and you don’t have to disentangle the mother and the father’s copies.

EGGINGTON: Oh, so that made this easier to do?

ZOOK: It definitely wasn’t easy. There was a lot of work put into it.

EGGINGTON: Okay, I shouldn’t say that. It makes it less difficult.

ZOOK: Relatively. Actually, now these same groups are working on doing something similar for the Genome in a Bottle cell lines, as well as some other individuals where they’ve continued on their methods development to be able to do this for normal cells as well.

EGGINGTON: Okay. With the sequence that you say is available for clinical labs, is there any work that remains to be done in order for the clinical laboratory to use it as a potential reference sequence? Talk about that.

ZOOK: That’s definitely true. Even though it’s available now for anyone to use that wants to use it, and even has been available for more than a year now, there are challenges with using it as a clinical lab right now. One of them is that as a clinical laboratory, anytime you change any part of your process, you have to do re-validation. Even if everything is clearly much better with this new reference, it’s still a lot of work to change your process as a laboratory. That’s one of the barriers, just simply the inertia of having to re-validate everything or at least re-validate the parts of the process that are important. That’s why a lot of clinical laboratories are actually not even in the latest version of the reference previous to this one. Probably most clinical laboratories are still using what’s called GRCh37 or hg19 which is the previous version before this. There are barriers just in the re-validation part. There are some efforts to make this process easier going forward.

I think over the next year or so, probably we’ll see a number of different ways that clinical laboratories can use this new assembly. One of them that I’m actually working on with a collaborator, Fritz Sedlazeck at Baylor, we’re working on how can we use what we’ve learned in the Telomere-to-Telomere reference to improve the current reference genome so that a clinical laboratory might be able to stick with their current reference that they’re using, but make some modifications to it that will improve their analysis at least in some of these regions that are improved in the new reference.

EGGINGTON: Is that the pitch then for laboratories to put the extra effort in if they want to re-validate their pipeline or if they want to develop a way against this reference sequence, is that it will help them be more accurate in their testing in some regions of the genome?

ZOOK: Yeah, that’s right. We do know that there are genes that are medically relevant that are improved if you use this new reference or you use one of the variety of different methods that people are using to use that information. There’s also the other value that’s probably a little more longer term is, there will probably be new discoveries of important genes because we have this new reference – we couldn’t know whether this gene was important because it couldn’t be analyzed well previously.

EGGINGTON: So we didn’t know what we didn’t know. Now we know, so if you’re not on that boat you’re missing stuff. I think I mixed like three analogies on that, but okay. That makes sense. Ideally, what I’m hearing is that what we want clinical labs doing is researching to see, given their clinical genetic panels, what they’re focused on, is there potentially a better reference sample for them to be using than what they’re using now? It may not be this one, right? Maybe for a given gene or a given panel, maybe there’s no benefit to switching. But if there is accuracy benefit, we want to encourage labs to think about switching to these better, more nuanced reference samples to map to, to sequence to, to run in their instrumentation as controls.

ZOOK: I think at a minimum, if a clinical laboratory has their list of genes that they’re testing, one of the things they can do is look at one of the papers that we published that was a companion to this Telomere-to-Telomere paper that looked at where variant detection is improved with this new reference. You can look at those lists of genes that are in the supplementary tables and see, are any of these ones that you’re currently testing? That’s the lowest hanging fruit probably as an initial thing to double check. If you do have any genes on that list, then you should probably go back and at least make sure that you think your current analysis pipeline is detecting those correctly.
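The check Dr. Zook describes amounts to intersecting a lab's panel with the gene lists in the paper's supplementary tables. A minimal sketch, assuming both lists have already been loaded as plain gene symbols; the function names and the symbols are placeholders for illustration, not names from the paper:

```python
def normalize(symbol):
    """Uppercase and trim a gene symbol so comparisons ignore formatting."""
    return symbol.strip().upper()


def genes_needing_review(panel_genes, improved_genes):
    """Return, sorted, the panel genes that also appear on the improved-gene list."""
    improved = {normalize(g) for g in improved_genes}
    return sorted({normalize(g) for g in panel_genes} & improved)


# Placeholder symbols only, not real gene names.
panel = ["GENE_A", "gene_b ", "GENE_C"]
improved_in_new_reference = ["GENE_B", "GENE_D"]

print(genes_needing_review(panel, improved_in_new_reference))
```

Any gene the function returns would be a candidate for re-checking how well the current pipeline detects variants in it.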

EGGINGTON: Okay, that makes sense. Alright, what we’re going to do is we’re going to make sure we give the hyperlink to those papers you mentioned in the written text on our website, genomicinterpretation.org. We’re going to transcribe this and we’ll put hyperlinks to the papers. I think it’s really important that those that need it have rapid access to what you just described.

Let’s step back a bit and let’s imagine non-expert stakeholders. These are people like patients, clinicians, health insurers. What is something unique from your perspective that you would like them to know to help them navigate clinical genetics and genomics?

ZOOK: I really like this question and I think there’s maybe a few different things that come to mind for this. One of them is something you mentioned earlier, is that not all genetic tests are the same and part of that is by their design even that they’re not the same. For example, there are tests that focus on a very narrow set of variants or genes. The sort of extreme for looking at variants is some of the direct to consumer tests where they just look at common variants in the population and they might be able to tell you that you have this mutation that might cause breast cancer down the road or that type of thing. But they’re only testing a small number of variants and they’re going to by their design, they’re going to miss a lot of other variants that might cause you to have breast cancer or other diseases. One of the important things to know is that there’s these different types of tests. There are also single gene tests that are often done with Sanger Sequencing, where usually Sanger Sequencing-based tests are quite accurate because this is a very mature technology, but they’re only testing maybe one gene at a time and so you’d miss any variants that are in other genes that might be important for that condition. That’s where these Next Generation Sequencing tests come in. There are ones that only have a few genes in them, there are ones that have hundreds of genes, there are tests that have thousands of genes, that cover the whole genome, or at least as much of the three billion bases as they can. That’s one of the things to understand is there are trade-offs there in what you’re going to detect and what you’re not going to detect based on how targeted your sequencing or method is.

The other thing that’s probably even more unique to our work in Genome in a Bottle is understanding they may have different degrees of accuracy within these genes. So even if they cover the same genes, one test might cover pretty much any type of variant within that gene, another test might cover most of the gene, but miss a really hard part of the gene because they haven’t done special work to make it work there. Or it’s just due to limitations of their method that they’re using.

Also one thing we haven’t mentioned as much are these different variant types. There are single nucleotide variants where it’s just a single letter that’s changing in the genome and that tends to be easier to detect than small insertions or deletions and those tend to be easier to detect than larger insertions and deletions or larger structural variants. Different tests will have different accuracy for these different things. So it’s important to have this more nuanced understanding depending on what you want to detect.

EGGINGTON: I want to stress that really quickly because it’s something we work a lot on at the Center for Genomic Interpretation, is helping health insurance companies and other stakeholders better understand the limitations of the different genetic tests out there. Let me stress this, as Dr. Zook just said, you can have these clinical labs that have supposedly the same clinical panel. These are CLIA laboratories, these are CAP-accredited laboratories, even New York State Department of Health laboratories, dare I say it, even FDA-approved tests and that they’re marketed for the same thing, but actually one test can be more accurate than another. This is something that is not currently transparent to anyone in the United States. I would say the core of what we do at the Center for Genomic Interpretation is that we help stakeholders figure that out, which tests are more accurate? Because this is a really important hole or pitfall in the advancement of precision medicine, the fact that the marketing doesn’t necessarily match what these tests are capable of doing. That needs to be more transparent.

Is there anything else before I move on to more technical questions? Is there anything else you would like stakeholders to know from your perspective about clinical genetics?

ZOOK: Maybe just drawing one connection back to what I mentioned earlier about using Genome in a Bottle samples to understand strengths and weaknesses. Right now, we do have this set of different difficult regions where you can see how well you’re performing for these difficult regions or for different types of variants. One of the things we’re actively doing some research on is making it easier to parse through all of these different types of complex genomic regions and understand how accurate this particular method is for homopolymers of different lengths, where a single base is repeated over and over again, and for these different complex types of repeats. Hopefully we’ll have more resources that make this easier to do in the future to get a more nuanced understanding of this and maybe in a more standardized way too.
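One way to picture this kind of breakdown is to compute the fraction of benchmark variants detected separately for each variant type or region context, rather than as one overall number. A minimal sketch; the strata names and all counts are invented for illustration and are not Genome in a Bottle results:

```python
def detection_rate(true_positives, false_negatives):
    """Fraction of benchmark variants in a stratum that the test detected."""
    total = true_positives + false_negatives
    if total == 0:
        raise ValueError("stratum contains no benchmark variants")
    return true_positives / total


# Illustrative counts only: (detected, missed) benchmark variants per stratum.
strata = {
    "SNV": (990, 10),
    "small indel": (180, 20),
    "indel in homopolymer": (40, 10),
}

rates = {name: detection_rate(tp, fn) for name, (tp, fn) in strata.items()}
for name, rate in rates.items():
    print(f"{name}: {rate:.2%}")
```

With these made-up counts, a single pooled figure would hide the much lower detection rate in the homopolymer stratum.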

EGGINGTON: We really hope labs adopt this and don’t just ignore the work that you guys are doing. This literally for many patients is life and death, the ability to have a good genetic, genomic test.

Can I jump into more technical questions now? If you’ve thought so far in this podcast that the questions were technical so far, hold on. They’re going to get more complex now. If you’re a laboratory person, definitely keep listening. If you’re a casual listener, you might honestly want to multitask for the rest of this podcast.

Let’s talk about this: as I mentioned, I run an independent non-profit organization that’s trying to help all laboratories improve the accuracy of their testing through a variety of ways we do that. We’ve been talking about Genome in a Bottle samples and how clinical laboratories use them in clinical test validations. We think that Genome in a Bottle samples have been very useful for clinical test development because, of course, they’re an easy source of characterized DNA. Absolutely critical. However, we’re concerned that the samples can be abused during test validation. I’m going to ask you for your response in a minute. Let me walk you through this, okay? We’re concerned that some clinical labs are abusing Genome in a Bottle samples and that ideally for a clinical test validation the lab would validate the test on the types of samples they intend to use when the test is utilized for patient care. You want to know how the test is going to perform on exactly the type of samples that they’re going to get in the clinical setting. However, some labs just repeatedly use Genome in a Bottle samples with very few, if any, patient samples to validate their tests. They’re not using patient samples at all, or very few of them, they’re just using Genome in a Bottle samples to validate their tests. These labs, therefore, have measured their test capabilities on a very narrow set of kind of unrealistic samples and they don’t really know how their tests will work in the clinical care of patients. Given that we think people are abusing what you’re creating, how would you respond to this?

ZOOK: I think that’s a really good point. One of the things that’s really important to know as you use Genome in a Bottle samples is how are they appropriately used and what are their limitations, or what else should you do in addition to using the Genome in a Bottle samples. I definitely completely agree that labs should not only be using Genome in a Bottle samples for their validation. There are a few reasons for that I think, alongside what you’ve sort of already mentioned. One of them is that our Genome in a Bottle samples are extracted DNA if you get them from NIST as our reference materials. So it’s not testing any part of your process before you get DNA, any DNA extraction or anything else that you’re doing. Or if you have FFPE samples, for example, you certainly should also be testing FFPE samples as part of your validation.

Those sometimes called ‘pre-analytical’ steps are really important as well and that’s not really what Genome in a Bottle is aimed at helping you test. I think that’s one of the important points to know and the other point maybe is downstream from Genome in a Bottle, where I mentioned earlier, we stop at the analytical performance or how well are you detecting these variants. We’re not helping you to understand how well you’re interpreting variants, and generally, Genome in a Bottle samples don’t have very many clinically interesting variants in them. We’re not helping you to see how well you’re going to interpret these variants or even detect particular mutations that you might be interested in making sure that you want to detect. Probably having samples that at least have some of the common mutations that you want to detect or difficult mutations that you want to detect is quite important.

This might be a good place to mention some of the resources that have been built on top of Genome in a Bottle too. In addition to us making the DNA available and characterizing that really deeply at NIST, other members of the Genome in a Bottle Consortium and even outside Genome in a Bottle have developed some products that are based on what we’ve done in Genome in a Bottle, but modified these cell lines in different ways. There are some where they’ve taken the Genome in a Bottle DNA and spiked synthetic DNA that has particular mutations into it, so then you can test how well you detect this particular mutation that might be important for you to be able to detect. Now, again, it’s not perfectly mimicking a patient sample, but it’s at least getting one step closer to what a patient sample looks like.

There’s another effort I’m involved in that’s organized by the Medical Device Innovation Consortium that’s called the Cancer Genomic Somatic Reference Samples Project where they’re taking the Genome in a Bottle cell line and engineering somatic variants into the cells. Though in this pilot project, they’ll be engineering 10 different somatic variants of different difficulties into different clones of the Genome in a Bottle cell line and then putting them into FFPE so that labs can see, how well do you detect these variants in FFPE? Again, these are not perfect mimics of real patient samples so you should still have real patient samples in addition to all of these. But these are resources that have been developed and I think validation probably should include most or all of these different types of samples as part of the process, because each of them have strengths and weaknesses in what they’re helping you to validate.

EGGINGTON: That makes sense. Thank you for clarifying that. The take home message, labs, is: don’t just rely or overly rely on Genome in a Bottle samples for validating your tests. You need to validate on not only Genome in a Bottle, but also more realistic samples or very realistic samples preferably. You mentioned that if you’re ordering the Genome in a Bottle samples from NIST, they’re extracted DNA. Is there any other place they can order these Genome in a Bottle samples from in a different form?

ZOOK: Yeah, the cell lines that we used for Genome in a Bottle are all hosted by the Coriell cell line repository that’s sponsored by NIGMS at NIH. There’s an NIGMS repository at Coriell where you can get both DNA and cells from these cell lines. One of the things to recognize if you get the DNA or cells from Coriell is that they come from a different batch from what we’ve mostly characterized in Genome in a Bottle. We worked with Coriell to grow up a really large batch of DNA and aliquoted it into thousands of vials. That’s what we distribute at NIST as our reference material. Most of our sequencing comes from that DNA from that really large batch where we think that each vial should contain essentially the same DNA. Sometimes, cell lines will change over time. We don’t think this is a huge effect or we haven’t seen evidence that this is a huge thing. But it’s possible if you get the DNA or cells from Coriell that there may be some changes that happen, and probably mostly this will show up as your accuracy looking worse than what it actually is.

EGGINGTON: No lab wants that.

ZOOK: Yeah, that’s right.

EGGINGTON: This actually reminds me to say something. I want all the laboratory folk and everyone who’s interested in what Justin’s saying here to know that they can email him directly. Please do, his email is on the title slide. I’m gonna say it here so everyone knows: it’s justin.zook@nist.gov. Please, he is there to help if you have questions. If you, for example, bought one of these Coriell cell lines and you’re repeatedly getting something different than what you expect, go ahead and email Justin. He might not want me to say that, but go ahead and email him any questions: what’s up and coming, how should you be using these samples, what are the limitations of the samples? Please email to ask. You’re not left on your own with these things, he’s there to help you, his team is there to help you.

On the topic of helping, one question that you might get from the laboratories is: can the clinical laboratory treat a Genome in a Bottle sample as absolute truth? Can they do that?

ZOOK: That’s a good question. We intentionally usually don’t call our benchmarks ‘truth sets’ or ‘the truth’ because we know that they are not truth with a capital T, in the sense that they are absolutely perfect across all of the regions that we’ve defined. Now, what we do to evaluate our benchmarks, sometimes what we say is that we’ve evaluated them to be ‘fit for purpose.’ When we say fit for purpose, we’ve come up with a particular definition of that for our benchmarks that we’ve released recently. Essentially, we take some high-performing methods, or results from some high-performing methods from different sequencing technologies and different analysis methods, compare their results to our benchmarks, and then go in and manually look at some of the differences between these other methods and our benchmark. Our goal is that at a minimum, most of the differences should be errors in the other method and not errors in our benchmark. Usually it’s quite a bit more than half. Admittedly, this is a moving target and that’s part of why we release new versions of benchmarks over time, so that as new methods are developed, or as clinical laboratories or other people report potential errors to us, we can improve those over time. Along the lines of what Julie just said, you should feel free to reach out to me if you think you might have identified any potential issues with our benchmark. It could be that it’s a nuanced thing and I’m happy to have conversations about how to interpret the results there. But, sometimes it is an error in our benchmark and that’s really useful information.
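The "more than half" criterion can be written down as a simple tally over manually curated differences. A minimal sketch of that idea only: the label names, the handling of unclear cases, and the function names are assumptions made for illustration, not the consortium's actual curation procedure.

```python
def fraction_method_errors(curations):
    """Among curated differences with a clear verdict, the share that were
    errors in the compared method rather than in the benchmark."""
    decided = [c for c in curations if c in ("method_error", "benchmark_error")]
    if not decided:
        raise ValueError("no curated differences with a clear verdict")
    return decided.count("method_error") / len(decided)


def is_fit_for_purpose(curations, threshold=0.5):
    """True when most decided differences are errors in the other method."""
    return fraction_method_errors(curations) > threshold


# Illustrative curation results for one comparison (invented labels and counts).
curations = ["method_error"] * 8 + ["benchmark_error"] * 2 + ["unclear"]

print(fraction_method_errors(curations))
print(is_fit_for_purpose(curations))
```

Differences labeled as benchmark errors are the ones worth reporting back, since they feed into later benchmark versions.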

EGGINGTON: Here we go Justin, so keep me informed on this. The first laboratory that listens to this podcast and reaches out to you with an error that they have found in your stuff, that you validate and it’s true, they helped you find something, I will send that person who reached out to you a $25 Amazon gift card. That’s my hope to encourage collegial cooperation in this work. So keep that in mind.

That leads us into something called the PrecisionFDA Truth Challenge. I want to talk about that. In the recent PrecisionFDA Truth Challenge that used Genome in a Bottle samples, many of the best performing variant callers used artificial intelligence and machine learning. What might be needed for these methods to be used broadly by clinical labs? What are your thoughts on that?

ZOOK: I think this is really interesting. For those of you who haven’t heard about this PrecisionFDA Truth Challenge, essentially we worked with the FDA and other collaborators to make available several data sets for our Genome in a Bottle samples. For one of the Genome in a Bottle samples, we already had released our latest version of the benchmark for that. But for the other two samples, we had not released the latest version of our benchmark yet. What we did is, we asked the participants in this challenge to analyze these data, which came from multiple different technologies, so they could choose which one they wanted to analyze. They would analyze these data, make variant calls from them, and submit them to the challenge. After the challenge, we would make our latest benchmarks available for all three genomes and we would say what is the accuracy of these different methods afterwards. One of the interesting things from this challenge is that particularly for the newer sequencing technologies, like these long-read sequencing technologies, these artificial intelligence or machine learning-based methods tended to be the best performers for those technologies. These also did quite well for even the traditional short-read technologies. Though, there were some other innovations that also led to advances in short reads, like using some of this Pangenome type of work that Julie mentioned earlier.

But for these machine learning-based methods in particular, I think one of the questions that often comes up is, are they overtraining to the Genome in a Bottle samples? Do they look like they’re performing better than they really would on any other sample that is being used? This is one of the reasons that these challenges have been useful, because we’ve kept the truth for some of the samples blinded during the challenge. So you can at least know they’re not grossly overtraining to the Genome in a Bottle samples. We don’t yet have really great ways of assessing whether there are more nuanced overtraining effects, and probably there are ways of intentionally overtraining too. That might be harder to catch. But I think for these challenges we didn’t see any evidence of intentional overtraining. There maybe is a little bit of nuanced overtraining, but it’s actually hard to tell whether it’s actual overtraining or differences in our benchmarks, or exactly what’s causing the differences. One of the things that I feel like is needed is maybe some standards or some methods to assess these machine learning-based methods in a robust way to make sure that they’re not overtraining. Like you mentioned earlier, you shouldn’t only use Genome in a Bottle samples, you should also use real clinical samples. Probably that’s part of that process also, is using some real clinical samples to make sure that they’re working well on those. But I think they do show a lot of promise, particularly for these newer technologies where they’ve enabled adoption of these long-read sequencing technologies, at least for research purposes, much faster than they would have been able to be adopted if we didn’t have those methods.
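A crude version of the blinded-sample check is to compare a method's score on the sample whose benchmark was public with its scores on the samples that were blinded. A minimal sketch, assuming one accuracy score per sample on a 0 to 1 scale; the function names, the tolerance, and the scores are hypothetical, and as noted above a gap could also come from benchmark differences rather than overtraining:

```python
from statistics import mean


def overtraining_gap(score_on_public_sample, scores_on_blinded_samples):
    """Score on the sample with a public benchmark minus the mean blinded score."""
    return score_on_public_sample - mean(scores_on_blinded_samples)


def looks_grossly_overtrained(score_on_public_sample, scores_on_blinded_samples,
                              tolerance=0.01):
    """Flag only large gaps; the tolerance is an arbitrary illustrative choice."""
    gap = overtraining_gap(score_on_public_sample, scores_on_blinded_samples)
    return gap > tolerance


# Invented scores for two hypothetical variant callers.
print(looks_grossly_overtrained(0.999, [0.998, 0.997]))
print(looks_grossly_overtrained(0.999, [0.950, 0.960]))
```

A check like this can only catch gross effects; it says nothing about the more nuanced overtraining that the discussion describes as hard to assess.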

EGGINGTON: That makes sense. That’s a lot of data for an intern to sit there and process. You probably don’t want an intern doing it anyways. That’s good to know.

Alright, so let’s talk about specificity. First of all, Justin, I’m going to have you define in the context of Next Generation Sequencing, what we mean in this discussion about specificity. Then I’m going to ask you some questions about it. Can you first define specificity for us?

ZOOK: Essentially what specificity is trying to measure is your false positive rate, or how many false positives you have.

EGGINGTON: Sorry, I’m just going to interrupt you. We’re not talking about clinical false positives here, we’re talking about detection calling false positives, is that correct?

ZOOK: Yeah, that’s a good point that in this context, false positive means a variant that you detect that should not have been detected, because it was reference or that variant is not in that individual. There are a lot of nuances around that definition, but at least roughly that’s correct. The definition of specificity is typically true negatives divided by true negatives plus false positives. True negatives plus false positives is in the denominator, true negatives is in the numerator. If you don’t have any false positives, then your specificity would be one, but as you start getting some false positives, it starts dropping below one.
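The definition just given, TN / (TN + FP), can be checked with a few lines of arithmetic. A minimal sketch with invented counts:

```python
def specificity(true_negatives, false_positives):
    """Specificity = TN / (TN + FP)."""
    denominator = true_negatives + false_positives
    if denominator == 0:
        raise ValueError("no negative positions to evaluate")
    return true_negatives / denominator


# No false positives gives exactly 1; any false positives pull it below 1.
print(specificity(1000, 0))
print(specificity(990, 10))
```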

EGGINGTON: Okay. Another way to say ‘one’ is 100%. This has been a challenge for the Next Gen Sequencing community. It’s been a challenge to adopt the concept and how we report specificity due to the inherent high specificity that exists for these types of tests. Can you comment on the pros and cons of quantifying specificity for a Next Gen Sequencing test? Are there alternatives?

ZOOK: This is something that we’ve had a number of discussions with people over the years. I’m not sure that there’s a clear solution to it right now, but there are alternatives. The limitation of specificity is partly that there are so many true negatives in the genome, because only about 0.1% of the genome is variant, only one out of a thousand bases has a variant in it. That means most of the genome is true negative. That means that often your specificity number – it’s hard to even define what true negative exactly means in the context of genomics – but if you count every base that doesn’t have a variant in it as a negative, then your specificity usually ends up being 99.999% or something like that. It looks really good – and sometimes it is really good – but it’s hard to tell at face value naively whether that’s good or not. Some of the alternatives that we use are something that’s called either ‘precision’ or ‘positive predictive value’, where instead of having true negatives in the numerator and denominator, you have true positives in the numerator and denominator. So, it’s true positives divided by true positives plus false positives.

EGGINGTON: Wait, tell me that again. What is it?

ZOOK: It’s true positives divided by true positives plus false positives. True positives plus false positives are in the denominator.
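To see why the two metrics read so differently, here is a rough worked example using the figures mentioned in this conversation: about three billion bases, of which about 0.1% are variant. The false positive count, and the simplification that every true variant is detected and every non-variant base counts as one negative, are assumptions made for illustration.

```python
def specificity(true_negatives, false_positives):
    """Specificity = TN / (TN + FP)."""
    return true_negatives / (true_negatives + false_positives)


def precision(true_positives, false_positives):
    """Precision = TP / (TP + FP)."""
    return true_positives / (true_positives + false_positives)


genome_bases = 3_000_000_000
true_positives = genome_bases // 1000   # about 0.1% of bases are variant
false_positives = 30_000                # invented for illustration
true_negatives = genome_bases - true_positives - false_positives

spec = specificity(true_negatives, false_positives)
prec = precision(true_positives, false_positives)
print(f"specificity: {spec:.6f}")
print(f"precision:   {prec:.6f}")
```

The same 30,000 false positives barely move specificity, because the denominator is dominated by billions of true negatives, while precision shows them as roughly one wrong call in every hundred.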

EGGINGTON: And that term, what do we call that?

ZOOK: It’s called ‘precision’ or ‘positive predictive value.’ One of the problems with these terms is that they have different definitions.

EGGINGTON: I prefer precision because positive predictive value, to me, makes me think of the clinical setting. I’m going to hold this up to make sure I got that right.

[Holds up paper with writing: “TP/(TP+FP) = Precision”]

Is that right?

ZOOK: Yes, that’s right.

EGGINGTON: Okay, so this is a better term than specificity, is that right? In your opinion?

ZOOK: Well, it’s less subject to misinterpretation. I think it’s probably the clearer way to say it. Because there are many fewer true positives than true negatives, about a thousandth as many, you end up removing a lot of the nines from that metric when you’re calculating it. That’s one metric that’s commonly used. One of the downsides of precision is that different individuals have different numbers of variants in them, so someone of African ancestry has more variants than someone of European ancestry in general. That means that if you have the same number of false positives, it might look like you’re doing better in someone of African ancestry, when in fact you’re not. These are nuances that may or may not be important, but it’s worth acknowledging some of these differences. The other definition that I think might even be better, but isn’t as commonly used, is the number of false positives per megabase of the genome. Basically this is saying, how many false positives do you have in every million bases of the genome or million bases that you’re covering in your particular panel?
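The dependence of precision on how many variants a genome carries can be shown by holding the false positives fixed and changing only the number of true variants. A minimal sketch; the variant counts are round numbers invented for illustration, not measured values for any population:

```python
def precision(true_positives, false_positives):
    """Precision = TP / (TP + FP)."""
    return true_positives / (true_positives + false_positives)


false_positives = 1_000  # identical error count in both genomes

# Two hypothetical genomes that differ only in how many true variants they carry.
fewer_variants = precision(4_000_000, false_positives)
more_variants = precision(5_000_000, false_positives)

print(f"{fewer_variants:.6f}")
print(f"{more_variants:.6f}")
```

The method made exactly the same number of mistakes in both cases, yet precision comes out slightly higher for the genome with more variants.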

EGGINGTON: Does that have a name? Are we calling that anything?

[Holds up paper with writing: “# of FP per Mbases”]

ZOOK: I don’t know of any name for this.

EGGINGTON: I’m going to call it the ‘Zook Coefficient.’

ZOOK: I didn’t come up with it, so I’m not sure you should be naming it that.

EGGINGTON: But you can claim it now, come on. Okay, number of false positives per megabase, that seems like a great number to me because it makes so much sense. In context of – how many megabases are in this clinical test – that’s going to tell you a lot.

ZOOK: That tells you per sample, how many false positives will I expect? I think it has a little bit more of an intuitive meaning than some of these other metrics do, even though I think some of the regulations require you have specificity as one of them.
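The per-sample reading of this metric is a single multiplication: the false positive rate per megabase times the number of megabases the test covers. A minimal sketch; the function names, the rate, and the panel sizes are hypothetical values chosen for illustration:

```python
def false_positives_per_megabase(false_positives, bases_covered):
    """False positives per million bases covered by the test."""
    if bases_covered <= 0:
        raise ValueError("bases_covered must be positive")
    return false_positives / (bases_covered / 1_000_000)


def expected_false_positives(rate_per_megabase, bases_covered):
    """Expected false positives in one sample for a test of the given size."""
    return rate_per_megabase * bases_covered / 1_000_000


rate = false_positives_per_megabase(false_positives=6_000, bases_covered=3_000_000_000)
print(rate)                                          # per megabase
print(expected_false_positives(rate, 500_000))       # a 0.5 Mb panel
print(expected_false_positives(rate, 3_000_000_000)) # a whole genome
```

Unlike specificity, the result is directly interpretable as a count of wrong calls to expect in each patient's report.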

EGGINGTON: Yes they do, and because of that, it has allowed, in my opinion, many bad tests to enter the clinical market. Even a rubbish test using a legacy definition of specificity will look fantastic. Their specificity will be 99.99%. It’s like ‘wow, that’s a perfect test’, but in fact, every patient could be wrong. You could be devastating the clinical space because of it.

I want to jump to the next question now because it segues so well. Are we good with me moving on to Sanger confirmation?

ZOOK: Yes.

EGGINGTON: Okay, so this is my last thing for the labs. My last question here for you is: based on the data you’ve seen, keeping in mind that false positives – again we’re talking about detection and variant calling here, we’re not talking about the false positives that happen with interpretation, that’s my passion, but let’s talk about just the detection variant calling here – how do you feel about Sanger confirmation in Next Gen Sequencing testing?

ZOOK: It’s an interesting question that’s definitely been debated a lot in the field. Many people might be familiar with the study that was published out of Les Biesecker’s lab several years ago that essentially showed that a lot of the time that they were doing Sanger confirmation, it ended up saying that a variant they detected was not correct, when in fact it actually was a correctly detected variant. In some senses, they argued that it does more harm than good because of that. There’s probably nuances around that, where if you interpret the Sanger results in a nuanced way where you know where a Sanger test is likely to give problems versus where it’s not, then I think it still can be used effectively to confirm some variants. One of the things that some labs have tried to do is instead of confirming all of their variants, they’ll just confirm ones that they think might be questionable. So just ones that are in difficult regions of the genome, like we mentioned earlier, or a difficult type of variant. As long as you’re pretty sure that Sanger performs well in that particular genome context, then I think it’s a reasonable confirmation to do. But, one thing to know is that some of these regions that are hard for NGS are also hard for Sanger Sequencing. That’s where I think some of these newer technologies like the long-read technologies may help with doing a better job of confirming things in an orthogonal way in some of these regions of the genome where it’s much harder to detect variants. Having another technology where you have long reads potentially could be a better way of doing confirmation if you think a variant needs confirmation.
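The selective approach described here, confirming only questionable calls and only with a method expected to work in that context, can be written as a small decision rule. A minimal sketch: the field names, the set of "difficult" variant types, and the returned labels are all assumptions for illustration, not a recommended clinical policy.

```python
from dataclasses import dataclass


@dataclass
class Call:
    variant_type: str           # e.g. "SNV", "indel", "SV"
    in_difficult_region: bool   # e.g. homopolymer or hard-to-map region
    sanger_reliable_here: bool  # the lab's own judgment for this context


# Assumed here: variant types a lab might treat as harder than SNVs.
DIFFICULT_TYPES = {"indel", "SV"}


def confirmation_plan(call):
    """Decide whether, and how, to confirm a single NGS variant call."""
    questionable = call.in_difficult_region or call.variant_type in DIFFICULT_TYPES
    if not questionable:
        return "no confirmation"
    if call.sanger_reliable_here:
        return "Sanger"
    return "orthogonal method"  # for example, a long-read technology


print(confirmation_plan(Call("SNV", False, True)))
print(confirmation_plan(Call("indel", False, True)))
print(confirmation_plan(Call("SNV", True, False)))
```

The third case reflects the caveat above that regions hard for NGS are often hard for Sanger too, so Sanger is not assumed to be the right confirmation everywhere.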

EGGINGTON: Which is interesting. Today’s date is the 7th of July 2022. As of today, to my knowledge, there is no clinical lab in the United States that has implemented long-read sequencing. I think that there are some labs – I know of one that is definitely planning on it – because there’s so many advantages, just like you’ve described, to do it. But it of course will take clinical validation, and it’s challenging whenever you have a brand new technology to implement: how exactly do you define the validation?

Let’s think about Les Biesecker’s paper. You’ve already said this, but I don’t think they’re being nuanced in that paper in challenging ‘could I trust Sanger at this position?’ If I recall the paper – it’s been a few years since I’ve read it – Sanger was more often wrong when there was a discrepancy between Sanger and Next Gen Sequencing. What do you think, and have you guys done this, where you’ve been nuanced in the approach like you’ve been describing? Know where your Sanger is performing well and know where it’s performing poorly, know where your Next Gen Sequencing is performing well, and know where it’s performing poorly? When you’re more nuanced, which one outperforms the other? Do you have any ideas on that? Maybe you don’t. This is for germline, we’re not gonna talk about somatic.

ZOOK: That’s one thing to note here, is that for somatic there are definitely a lot more caveats. But for germline, and also we may be ignoring mosaic variants.

EGGINGTON: Yeah, ignoring the complex stuff.

ZOOK: I haven’t used Sanger Sequencing a lot because we’ve been fortunate in Genome in a Bottle to have a wide variety of Next Gen Sequencing technologies that we’re able to use. In general, having many Next Generation Sequencing technologies is better than trying to confirm things with Sanger, but that’s not the situation that most clinical labs are in.

EGGINGTON: No, most clinical labs in germline are using Illumina instruments and in the somatic space, they’re using mostly Illumina but sometimes Ion Torrents, is what we’ve seen.

ZOOK: I think that’s right. In that context, you can do targeted Sanger Sequencing really easily, or at least relatively easily. One common thing to be aware of is that if it’s in a homopolymer, it’s likely to be hard for both methods. I’m actually not sure which one performs better, and it probably depends on exactly how…

EGGINGTON: -How you do the chemistry.

ZOOK: Yeah, and in some of the difficult to map regions, one of the ways we have used Sanger Sequencing recently in Genome in a Bottle is to do long-range PCR followed by Sanger Sequencing in some of these difficult to map regions. That’s going to provide you legitimately complementary information to standard, short-read sequencing, which really can’t use that information. Likely there are advantages to using Sanger in some of those regions. I think for some of the indels, like heterozygous insertion and deletion variants, at least from what I’ve seen, they’re easier to see in Next Generation Sequencing than they are in Sanger Sequencing.

EGGINGTON: Yeah, Sangers get so garbled eventually.

ZOOK: Or it’s at least hard to interpret the results from Sanger. Going and manually looking at your Next Generation Sequencing data is probably a better method of validation than trying to get back to Sanger Sequencing.

EGGINGTON: Just to shout out some of the legacy technologies that have pretty much been abandoned, we had Southern Blots, we had MLPA, we had these other technologies that handled some of these more complex structural variants a lot better than any of the sequencing technologies right now. But, they were very costly, they were not high throughput. That’s why we’re excited to see technologies like the long-read stuff really beginning to progress, to hopefully eventually do what we used to be able to do, is one way to say it. I don’t know if you agree, but I’m excited to see where this goes.

ZOOK: Yeah, there are definitely a lot of exciting new technologies coming out.

EGGINGTON: Exactly. Justin, I have come to the end of the questions that were prepared. Is there anything you want to throw in before we say thank you? Are there any concepts or thoughts?

ZOOK: No, I think that’s all. Like Julie said, feel free to reach out to me if you have any questions. I am always happy to hear what would be useful to you as a clinical laboratory also and how we might address any needs that we haven’t met yet.

EGGINGTON: Okay. Thank you Dr. Zook. It’s been a pleasure and hopefully you get a lot of great response from this podcast.

ZOOK: Great, thanks so much.

EGGINGTON: Have a good one.

ZOOK: Thank you.

The post Justin Zook, PhD: A Conversation with the Co-Leader of the Genome In a Bottle Consortium Developing Standards For Benchmarking Genetic Variant Detection appeared first on Center for Genomic Interpretation.
