On The Rising Tide Podcast, scientists from the independent, nonprofit Center for Genomic Interpretation discuss with leading experts the need to further raise the bar on accuracy and quality in clinical genetics, genomics, and precision medicine. Only through improving accuracy and quality can the promise of precision medicine be realized. CGI’s ELEVATEGENETICS services are made available to health insurers and stakeholders, to help them identify the most accurate and clinically useful genetic and genomic tests, and to help them steer clear of low quality tests that have the potential to lead to patient harm and wasted expenditure.
Podcast Introduction 00:00 to 06:11
Interview begins at 06:12
GARLAPOW: Imagine you have cancer and there’s also lots of cancer in your family. Your healthcare provider orders you genetic tests. The results come back negative, in this case meaning the labs your healthcare provider used didn’t find any genetic variants or mutations that explain your family’s cancer or that are treatable markers for your current cancer. You pause and ask your doctor, “Are you sure the test didn’t find anything? Could the labs have missed something?” The short answer is yes, the labs could have missed something, and some labs are missing more than others, but many labs don’t do what is necessary to find out what they’re missing. Today we will be discussing the process labs use to determine how accurate their genetic tests are before they get used on patient samples. This is referred to as proficiency testing. Dr. John Pfeifer, MD, PhD, of Washington University School of Medicine in St. Louis, is a specialist in these kinds of proficiency tests.
PFEIFER: Proficiency testing, at the end of the day, is the external mechanism that various regulatory agencies, and payers as well, insurance companies, use to ensure that laboratories, if they do a test, are getting an accurate test result. Who should care about it is virtually everybody who personally has, or has a relative or someone they know who has, a disease, whether it’s inherited or whether it is cancer, for which a specimen has been submitted for next-generation sequencing analysis.
GARLAPOW: Dr. Pfeifer will explain why this topic matters.
PFEIFER: What you don’t know can hurt you, and that’s the issue here. There’s also that statement out there that ‘ignorance is bliss.’ Well, ignorance is bliss, unless it actually has to do with your healthcare, right? I don’t know of any clinical labs that are running patient specimens with the idea that they have problems in their bioinformatics tools, they have problems with their tests, and they just don’t care. Right? That’s not ethical, and laboratories do the best they can. But ignorance isn’t always bliss, and what you don’t know can hurt you.
GARLAPOW: Dr. Pfeifer explains some of the limitations of using only traditional real samples, known as wet samples, to measure the accuracy or proficiency of genetic tests.
PFEIFER: The problem with a wet sample, say a group of 50 cell lines that has been manufactured, is that once the laboratory uses those, they know the answer. Okay, so, you’re back in high school, right, you go in and you’re in your algebra test or your trigonometry test, and the teacher gives you a test and there’s ten questions, and you don’t do so well, okay. So then the teacher says, well, you’re going to have to study a little more and take it again. Well, you go in the next time, and you know going in that it’s going to be exactly the same ten questions, right? That’s the problem with these cell lines that have been produced. They’re great cell lines, they have high utility, but once they’ve been used once, believe me, the NGS community knows what variants are there. And so they’re no longer blinded specimens. You can’t use them a second time, right? Now, yes, if they’re cell lines, you might be able to mix them, but you know the variants. If you are a clinical laboratory, you know the variants to look for, and you can set your bioinformatics pipeline to actually trigger or focus on those particular variants. That’s a problem with designed wet samples. People spend lots of time and money developing these cell lines, and they are incredibly useful, once. Once. And then they are not available to be used in a blinded fashion anymore.
GARLAPOW: Dr. Pfeifer talks about how computerized methods, called in silico, can help solve these problems.
PFEIFER: So one of the things that in silico testing enables you to do is essentially this iterative process much faster, much more inexpensively, much more comprehensively, so that quality testing can improve, right. Laboratories do not design bad tests that have low clinical utility and low accuracy on purpose. All laboratories, if they’re CLIA licensed, are performing high-quality testing for the samples that they’ve analyzed. The problem is what you don’t know can hurt you.
GARLAPOW: We also discuss why nearly all labs score very highly on current proficiency tests, masking the true problem.
PFEIFER: One way you can get a high level of accuracy is to give easy variants, and if you go and dig in the data, the variants have been pretty plain vanilla. There are occasional variants which labs don’t do very well on, but those have not been well represented in these proficiency challenges, okay. The proficiency testing that CAP to date has performed, the challenges that they offer, have avoided variants which they know laboratories have trouble with. Now, there’s an explanation for that, and that is you have to walk before you run. CAP’s approach has been: let’s test laboratories on the simple variants and then we’ll move on to the more complex variants. I’m not sure I personally agree with that approach. If this is a class of variants that is regularly encountered in routine clinical practice, I think you have to challenge labs on it.
GARLAPOW: Finally, Dr. Pfeifer addresses how we need to raise the tide when it comes to testing the accuracy of genetic testing and the importance of this process as genetic testing evolves over time.
PFEIFER: So when you think about these limitations of wet samples, when you think about all the bioinformatics that are designed around each one of those assay types, we’re going to have to do more rather than less in order to make sure that the accuracy of next-generation sequencing is actually supporting patient care, is actually giving results that can be used to improve patient care.
GARLAPOW: I’m Dr. Megan Garlapow with the Center for Genomic Interpretation and you’re listening to the Rising Tide Podcast, where we learn from experts about improving the accuracy and quality of precision medicine and clinical genetics and genomics. Please note that this podcast does not provide medical advice and is not intended to be a substitute for professional medical advice, diagnosis, or treatment. Always seek the advice of your physician or other qualified healthcare provider with any questions you may have regarding a medical condition. Additionally, comments of the Rising Tide’s guests are their own and do not necessarily reflect the position of the Center for Genomic Interpretation.
Today, I’m joined by Dr. John Pfeifer, MD, PhD, Professor of Pathology and Immunology, Vice Chair for Clinical Affairs in Pathology and Immunology, and Director of the Molecular Genetic Pathology Fellowship in the Department of Pathology and Genetics at Washington University School of Medicine in St. Louis. Please note that Dr. Pfeifer’s views are his own and do not necessarily represent the views of his employer. Additionally, some of what we discuss is speculative in nature and/or describes use of approaches that have not necessarily been validated or approved.
In this episode of the Rising Tide Podcast, we discuss, among other topics, Dr. Pfeifer’s work on in silico proficiency testing, which is a type of proficiency testing. That is a mouthful. In in silico proficiency testing, you send your sample from your tumor, or blood, or whatever tissue is indicated, and a laboratory sequences it. The sequencing machine has created a big chunk of data; this is where John comes in with in silico proficiency testing, which evaluates the bioinformatics of genetics and genomics. John will describe in silico proficiency testing more thoroughly in a moment. Briefly, for now: he hides variants in the data. As a reminder to our listeners, a variant is a change in the DNA, previously, and sometimes still, called a mutation. If a variant is detected and annotated, it can then be classified as pathogenic or likely pathogenic, a variant of uncertain significance, or benign or likely benign. In addition to his work at Washington University in St. Louis, Dr. Pfeifer’s company, P&V Licensing, is a direct resource with which laboratories can engage to assess and evaluate variant calling pipelines. Furthermore, P&V Licensing contracts with the Center for Genomic Interpretation to support thorough assessment of variant classification.
Before getting into the details of proficiency testing and the specifics of in silico proficiency testing, John, why should anyone care about proficiency testing? Who are the stakeholders here?
PFEIFER: That’s a very good question, and that’s a great place to start. Who should care about it is virtually everybody who personally has, or has a relative or someone they know who has, a disease, whether it’s inherited or whether it is cancer, for whom a specimen has been submitted for next-generation sequencing analysis. Proficiency testing, at the end of the day, is the external mechanism that various regulatory agencies, and payers as well, insurance companies, use to ensure that laboratories, if they do a test, are getting an accurate test result. So the stakeholders are not only patients, but their insurance companies and the regulatory agencies, and equally as important, the stakeholders are the physicians taking care of patients. So the patients would like to be certain that they’re getting quality testing that’s getting the correct answer. Physicians, as the care providers, would like to be certain that the results they’re getting on the testing they’ve ordered are accurate, so that they can make the appropriate treatment decisions. And of course payers want to be paying for quality testing, and only quality testing, and then finally the regulators, which are basically society. Society doesn’t want to be spending money on testing that not only isn’t accurate, but has the potential to cause harm.
GARLAPOW: Okay, that’s very interesting. Now, you do a type of proficiency testing, called in silico proficiency testing, but first let’s go over, how is proficiency testing normally done?
PFEIFER: Right, so proficiency testing is something that is mandated by a federal law passed in 1988, the Clinical Laboratory Improvement Amendments of 1988, and basically what that said is, if you are a laboratory, and you are performing testing on a patient specimen, and that testing, for the record, is not limited to next-generation sequencing or genetic testing, it’s any kind of testing, it’s microbiology testing, it’s clinical chemistry testing, it’s any clinical testing that is going to be used for diagnosis or treatment of a patient, then you need to perform proficiency testing, which means that you have to participate in an external program to prove that your testing is accurate. So, there are a number of different mechanisms by which proficiency testing can occur, but the key phrase in all of that is that it’s external and objective. So if I’m a laboratory, except in a very few arcane, specific settings, I have to find some external source that will provide to me clinical specimens that I can run through the routine laboratory processes. Those specimens have to come in a blinded way, and then I do the testing, and then I have to send the results back to the group or the entity that provided the specimens, and then they, that external group, has to evaluate whether my answers are correct or not. Now, part of proficiency testing is that that evaluation is objective, and that if the laboratory is wrong, that carries certain implications for the laboratory. The laboratory has to write a report as to why the result was wrong, and the laboratory has to make changes in order to convince the regulatory agency, or the provider of the proficiency testing, that they can actually get the right answer.
And in a worst-case scenario, the laboratory can be prevented from testing patient specimens until their paperwork shows that they have changed their process in a significant way such that they can now get the correct answer. And from that previous answer, I just want to make sure: when I say “I” for the rest of this discussion, “I” actually means “me” as the clinical laboratory. It’s a lot simpler to just say “I,” where John all of a sudden becomes a clinical laboratory, than for me to always say, “If you’re a clinical laboratory doing CAP- or CLIA-licensed testing, blah blah blah.” It’s just simpler for me to say I. So again, for review, a couple key points. Number one, it’s external, meaning the laboratory doesn’t have any control over that testing or what variants, what abnormalities, are in that specimen. Number two, the laboratory has to run that specimen in their routine test and report the results. And number three, some external entity is grading those results to see if the lab got them right or not. And this is not a toothless mandate. If you’re not getting the right answer, the laboratory can be shut down until the processes improve to the point the laboratory can demonstrate they are getting the correct answers.
GARLAPOW: Okay, that’s really interesting, John. Now in silico proficiency testing is a specific kind of proficiency testing. So what exactly is it, what do you do when you do in silico proficiency testing?
PFEIFER: Right, so, I just gave you a very broad answer about what proficiency testing is in general, and as I said, proficiency testing is true for the microbiology laboratory, it’s true for the clinical chemistry laboratory, it’s true for the blood bank, and it’s also true for genetic testing, and for the purposes of this conversation we’re going to talk about next-generation sequencing, this massively parallel sequencing that is used not only in the setting of cancer but in the setting of inherited diseases. So proficiency testing applies to those laboratories just like any other clinical laboratory. That’s the first point. The second point is, before we go any further, since we’re now talking about how proficiency testing applies to next-generation sequencing, I want to make sure that I define a few abbreviations, a little bit of jargon, that I’m certain I’m going to slip into during the course of this conversation. I just want to get it out there in case I refer to it, and sometimes it’s actually just quicker to use the abbreviations. So: single nucleotide variant. This is a single base-pair change, often abbreviated as SNV, or SNVs plural. An insertion or deletion, which is a change in the number of bases, generally by definition less than a kilobase, but usually just a dozen or so bases; insertion or deletion, I will likely call that an indel. Copy number variants, these are changes in the number of copies at a specific locus or of a specific gene; copy number variants, CNVs. And the last thing is structural variants. These are large-scale structural changes, such as translocations; structural variants, SVs. Alright, so that’s the first thing. I will undoubtedly use those abbreviations because it’s so much faster to use them. The second thing that…
GARLAPOW: Just to confirm when you say a kilobase, just for our…
PFEIFER: Oh, a kilobase is a thousand bases, right. Good point. And just to put everybody, yeah, let’s just establish the landscape here. Your genome is about six gigabases. It’s about six billion base pairs, alright, six billion base pairs. Your genome has two copies of about 20,000 genes, 20 to 23,000 genes, depending on how you count them. Each gene, depending on how big it is, ranges from a couple kilobases to hundreds of kilobases. And so that’s one of the things we’ll get to when we talk about proficiency testing: when you’re considering proficiency testing in the setting of genetic testing, the landscape is huge, the landscape is really, really, really huge.
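[Editor’s note: to make those landscape numbers concrete, here is a back-of-envelope sketch in Python. It uses only the round figures quoted in the conversation, and it counts single-base substitutions only, ignoring indels, CNVs, and SVs entirely.]

```python
# Scale of the "landscape" described above, using the round figures quoted:
genome_bp = 6_000_000_000   # ~6 gigabases, i.e. ~6 billion base pairs
genes = 20_000              # roughly 20,000-23,000 genes

# At every position, an SNV could substitute any of the 3 other bases,
# so even the substitution-only variant space is enormous:
possible_snvs = genome_bp * 3
print(f"{possible_snvs:,} possible single-nucleotide substitutions")
```

And that is before considering combinations of variants, which is what drives the space toward the “basically infinite” figure discussed later.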
Okay, so we’ve talked about abbreviations. And the other thing I want to make sure that we understand is, generally, how a next-generation sequencing test occurs. And I’m not going to go into this in any detail, so anybody listening, don’t roll your eyes, I guarantee I’m going to get through this in about 30 seconds; I just want to make sure that when I say something, people understand the words I’m using. So, a next-generation sequencing test can be divided up into five parts, really. The first part is nucleic acid extraction. A tissue sample comes into the lab and you’ve got to get the DNA or RNA out of the tissue. That’s called extraction. That itself is a pretty technical process, and depending on how good you are at it, you can do a better or not-so-good job of actually extracting nucleic acids. The second step, once you have the nucleic acids, is you have to make what is called the “library,” the series of molecules that you’re actually going to put on the sequencing platform. That library preparation step is very complicated; it has about 80 separate steps to it. Fortunately it’s automated, and it only takes about a day or a day and a half, but at the end of it you’re left with the molecules that you’re going to put on the sequencing platform. So, first step, extract nucleic acids. Second step is you make what’s called the library, the group of molecules that will be sequenced. The third step is the actual step of sequencing the nucleic acid molecules; for the purposes of this conversation we’ll just talk about DNA. There are several platform manufacturers out there, lots of machines out there that do this by different physical and chemical mechanisms. That’s the actual process on the platform by which you’re making sequence reads. The sequence read is the series of A’s, C’s, T’s, and G’s that define the sequence of that piece of DNA that the machine has just sequenced.
The fourth step is you take all that information that comes off one of these platforms, and to give you the scale of that, those raw sequence reads that come off the machines are about two terabytes in size, okay, so one to two terabytes. It’s a huge, huge, huge chunk of data that comes off the machine each time you run it. So, given that there’s that much information, the fourth part of next-generation sequencing is critical, and that’s what’s called the bioinformatics. This is the computerized process by which a group of software tools, and there are lots of them, go into that huge terabyte-level data and extract the information that’s in there. They take the data, they figure out where that sequence matches up with reference sequences for the human genome; the bioinformatics tools figure out if there’s a variant, if there’s a sequence change, if there’s a mutation. If there is a mutation, what is the mutation? That software figures out how many copies there are of the gene, or of the locus where the variant is, and on and on. And so that’s that very, very computationally intensive piece called bioinformatics. And then the fifth part of this is what’s called the reporting piece. That’s after the software, the bioinformatics software, has identified variants, figured out what they are, where they align; then a pathologist or a geneticist has to go in, look at those variants, and issue a report that says we found these variants, and this is the clinical significance of the variants we found. As you mentioned in the introduction, sometimes there are variants of uncertain significance, VUSs. So there is, at the end, an interpretive piece, usually where a pathologist or geneticist has to interpret what was found in the clinical setting, but the fifth part, of course, is issuing a report.
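[Editor’s note: the five steps just described can be sketched as a toy, runnable pipeline. Every function below is a stand-in invented for illustration, not a real sequencing or bioinformatics API; it only shows how each step hands its output to the next, with variant calling reduced to comparing reads against a reference.]

```python
def extract(sample):                          # 1. nucleic acid extraction
    return f"DNA({sample})"

def prepare_library(nucleic_acid):            # 2. library prep (~80 automated steps)
    return f"library({nucleic_acid})"

def sequence(library):                        # 3. sequencing: stand-in for the
    return ["ACGT", "ACGA"]                   #    terabyte-scale raw reads

def call_variants(reads, reference="ACGT"):   # 4. bioinformatics: align reads to a
    return [(i, r, b)                         #    reference, report mismatched bases
            for read in reads
            for i, (r, b) in enumerate(zip(reference, read)) if r != b]

def report(variants):                         # 5. interpretive report, signed out
    return f"{len(variants)} variant(s) found"

print(report(call_variants(sequence(prepare_library(extract("tumor biopsy"))))))
```

Real pipelines are vastly more complex at every step, but the data flow, sample to nucleic acid to library to reads to variant calls to report, is the shape being described.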
Now, I know that was a lot, and actually I apologize it took longer than 30 seconds, but it was important to understand that landscape, because proficiency testing as it is normally performed is what we refer to rather glibly as ‘soup to nuts.’ Proficiency testing for next-generation sequencing, by its very nature, if it’s going to be complete, looks at all five of those steps. So it starts with what we refer to as a wet sample, and then a laboratory will extract nucleic acids, make a library, sequence it, do the bioinformatics, and then report. Okay, so just take that as the model of the way formal proficiency testing is performed; now, in silico proficiency testing is a completely different model that is limited to the bioinformatics piece only. Limited to the bioinformatics piece only. Now, that has advantages and disadvantages. The advantage is, by focusing on the bioinformatics piece, you can stress those bioinformatics tools in ways that are more precise, in ways that are more general, in ways that are more focused than you can through wet lab specimens, and I’ll talk about how that’s possible. The disadvantage is that it doesn’t do anything for the other steps. There’s no part of in silico proficiency testing that allows you to evaluate steps one, two, and three: the extraction, the library preparation, the actual sequencing. And it really doesn’t allow you to look at reporting directly either, so it’s very focused on one of the five steps. But let me explain why it is utilized, because of that advantage. Those bioinformatics tools can only recognize variants that are present in the specimen that has been sequenced. So let’s take a step back and think about that. How many variants are out there that could potentially exist in a patient specimen? Well, that number is basically infinite. That number is basically infinite. There may be “only” six billion base pairs.
At any particular locus, at any particular base, there can be a mutation of a number of different kinds. There can be an SNV, it can be part of an indel, it can be part of a copy number change or a structural change, and those variants can be present in any number of combinations. So there is a very large number of ways, if not an infinite number, that a range of sequence variants can be present in a patient specimen. Okay. Now let’s talk about traditional proficiency testing with the wet specimen. That wet specimen, at most, can only harbor the variants that are present in that wet specimen. So if that’s a cell line, that cell line is only going to have the number of single nucleotide variants, SNVs, or copy number changes, that are present in that wet specimen. Similarly, if it’s a patient specimen, there’s going to be a small number of sequence changes that are present. Even in tumors, even in tumors that have a high level of genetic instability, there are probably only on the order of several thousand sequence changes, alright. Now, when you sequence that particular wet specimen, you can ask the bioinformatic pipeline, can you identify the sequence changes that are present, the SNVs or the indels or the structural changes, and by its very nature proficiency testing asks, can the bioinformatics find the target mutations that we’re questioning. But beyond that, if you want to test another group of variants, or the same variants but in association with different variants, you need to have a different wet specimen. And doing that process of nucleic acid extraction, library preparation, sequencing, that’s an expensive process that has a finite cost to the laboratory, generally at least hundreds of dollars. And it’s difficult to understand, even if a laboratory had unlimited resources, how the laboratory would ever look at enough wet specimens to actually capture all the potential variation.
So you’re probably sitting there thinking, well, John, there are a number of ways to do that. You could mix patient specimens. Well, the problem with mixing patient specimens is that bioinformatic tools have a number of filters, so that if a specimen is contaminated, the bioinformatics filters say, wait a second, that’s contaminated, you need to go back and check on that specimen. So, in proficiency testing, you can’t mix specimens, because the bioinformatics will recognize those specimens are coming from at least two different individuals, and that won’t work. So you say, well, John, okay. You can make cell lines and put a whole bunch of different variants into the cell lines. Well, yes, you could do that, except it costs money and it takes time to make the cell line variants, and then, yes, you can mix those cell lines together, but operationally you could probably only mix 10 or 15 cell lines together before the frequency of the variant in that mixture falls to such a low level that you can’t expect the bioinformatics pipeline to find it. So you’re really in a bind with traditional proficiency testing that utilizes wet specimens, right, because you can’t capture a significant percentage of the types of variants, the classes of variants, and, I’m going to introduce one more concept here, the variant allele fraction. That is the fraction of the reads you find that actually harbor that variant. So wet specimens, while they can check a next-generation sequencing test from soup to nuts, really are not a paradigm that is comprehensive, that can ever be comprehensive. So, here’s what in silico mutagenesis does. What we do is a laboratory sequences a specimen, a cell line, some other specimen, and then we take those files that come off the machine.
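[Editor’s note: the arithmetic behind that 10-to-15-cell-line ceiling can be sketched quickly. The sketch assumes an equal-parts mix, a heterozygous variant sitting at roughly 50% variant allele fraction (VAF) in its native line, and an illustrative 5% detection floor; the exact threshold varies by assay and is an assumption here, not a figure from the conversation.]

```python
def mixed_vaf(native_vaf, n_lines):
    """VAF of a variant from one cell line after an equal-parts mix of n lines."""
    return native_vaf / n_lines

# A heterozygous variant (~50% VAF in its own line) dilutes N-fold in the mix:
for n in (1, 5, 10, 15, 20):
    vaf = mixed_vaf(0.50, n)
    flag = "detectable" if vaf >= 0.05 else "below a 5% calling threshold"
    print(f"{n:2d} lines -> VAF {vaf:.1%} ({flag})")
```

By around 10 to 15 lines the diluted VAF drops to the few-percent range, which is why the mixing strategy runs out of headroom so quickly.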
GARLAPOW: The actual data, like the data readout.
PFEIFER: The actual data from the lab, yes. And I’m going to draw a distinction here in just a second. So, while the data that comes off the machine is generally in the terabyte range, through some very simple computational processes you can get that down into the range of only 20 to 40 gigabytes, and not too much further along you can get it down into the range of 20 to 40 megabytes or so. It’s simple computational processes that simplify things. So that’s the laboratory’s, what we call, primary sequence. You will hear the terms FastQ or BAM files. That’s simply the name of what these sequence files are. Like anything in computers, the computer science people have given them fancy names to confuse the rest of us who are human. And so these are called FastQ or BAM files. If anybody’s interested, send me an email, you can find me on the internet, and I can point you to references that discuss these files. So our process is that we import these files, either these FastQ or these BAM files, into our software tool, and for the record, our software tool has the dumbest name in the history of software: our tool is named ‘MutationMaker.’ What MutationMaker does is it takes those files and then inserts changes into those sequence files, through a computerized process, at sites that we input into the program. So it can insert single nucleotide variants; it can insert one SNV or hundreds of SNVs. It can insert many of them in the same gene or in hundreds of different genes. It can insert indels, little insertions or deletions. It can insert multi-nucleotide variants, little di- or tri- or pentanucleotide changes. And it can insert these in any combination, in any genes. And at the same time we can model copy number changes, and we can model structural variants. And so, if you just think about that, through this MutationMaker software package we have the ability to put any number of variants, of any types, in any association, in a number of different genes.
And the other thing the software tool can do is put them in at different variant allele fractions. That is, it can put a change in at one percent of the reads, or 100 percent of the reads, or 50 percent of the reads. And so, when the MutationMaker program is done, we take that mutated file and we send it back to the laboratory, and then the laboratory takes that file and puts it through that fourth step in NGS: the bioinformatics tool. So essentially what we do is we have a detour. Step one: the lab extracts nucleic acids. Step two: the laboratory makes a library. Step three: the laboratory sequences the library. And then there’s a detour: those sequence files from the machine go over here to MutationMaker, we insert variants, and then we give those files back, and then those files go into step four, which is the bioinformatics analysis by the laboratory’s bioinformatic pipeline, and then there’s a reporting part of that as well. And so MutationMaker is not soup to nuts, but it permits a very focused evaluation of the bioinformatics component of a laboratory’s next-generation sequencing test. And a moment’s reflection indicates how this in silico process makes it possible to perform a comprehensive evaluation of the bioinformatics tools in a bioinformatics pipeline that wet specimens would never, ever be able to support. So that’s what in silico mutagenesis, the process of this little company we have, does. And I want to just draw your attention to a couple of things which really make it so powerful. Number one is we are using the sequence files from the clinical NGS laboratory. And that’s an important point, because it relies on the fact that the laboratory has done a nucleic acid extraction, made a library, and sequenced it. Because we’re using those actual files, we can actually provide insights into the quality of those steps, indirectly.
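[Editor’s note: conceptually, the variant-insertion step can be illustrated with a toy sketch: rewrite one base in a chosen fraction of the reads covering a position, so the file carries an SNV at a target variant allele fraction. This is a minimal illustration of the idea only; it is not MutationMaker’s actual implementation, and real tools operate on FastQ/BAM records with qualities and alignments, not bare strings.]

```python
import random

def inject_snv(reads, pos, alt_base, target_vaf, seed=0):
    """Return a copy of `reads` with `alt_base` written at `pos` in ~target_vaf of them."""
    rng = random.Random(seed)                      # seeded for reproducibility
    k = round(len(reads) * target_vaf)             # how many reads carry the variant
    chosen = set(rng.sample(range(len(reads)), k)) # pick which reads to mutate
    return [r[:pos] + alt_base + r[pos + 1:] if i in chosen else r
            for i, r in enumerate(reads)]

reads = ["ACGTACGT"] * 100                 # stand-in for the lab's own reads
mutated = inject_snv(reads, pos=3, alt_base="A", target_vaf=0.25)
observed_vaf = sum(r[3] == "A" for r in mutated) / len(mutated)
print(f"observed VAF at position 3: {observed_vaf:.0%}")
```

The laboratory’s pipeline is then asked, blinded, whether it calls that variant at that fraction, which is the “Easter egg hunt” described below.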
We can see whether there are areas of the genome, or areas of the test, where the coverage is low, where there aren’t very many reads, where the sequencing hasn’t been very efficient. We can find areas where the quality of the sequence isn’t very good. Again, we can’t say anything about how that happened, but we can highlight areas where the information coming into our tool isn’t very good. But the point is, since we’re inserting variants in the laboratory’s raw sequence files, when they do the bioinformatics, there are no artificial artifacts there; it’s their own sequencing file, it’s their own error model, it’s their own coverage model, depth of sequencing, all these technical things. And so it really does allow a focused, customized evaluation of that particular laboratory’s bioinformatics pipeline. The other thing, and this is as opposed to models out there where you use a computer to generate artificial sequence files: you can program a computer, using known technical details about the way these platforms operate, to generate out of thin air a completely computerized, in silico generated sequence file. The problem with that sequence file is it doesn’t mimic any actual patient-derived clinical processes. It’s out there in a vacuum. And so that’s why we settled on this customized model. And that’s another strength of it. If there are a hundred laboratories out there and they’re all doing their own laboratory-developed tests, their own LDTs, they all have different library preps, they all have different bioinformatic pipelines, and on and on, and by each individual laboratory sending us a file, we can insert variants into their own file, and in that way have a sequence file that is responsive to their particular test in their particular laboratory and their own particular bioinformatics tools.
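[Editor’s note: the indirect quality signal described here, low-coverage regions visible in the lab’s own file, can be sketched as a simple depth check over read alignments. The coordinates, read counts, and the 20x threshold below are illustrative assumptions, not figures from the conversation.]

```python
from collections import Counter

def depth_per_position(read_spans):
    """read_spans: list of (start, end) alignments; returns per-position read depth."""
    depth = Counter()
    for start, end in read_spans:
        for pos in range(start, end):
            depth[pos] += 1
    return depth

# A well-covered region (200 reads) next to a poorly covered one (8 reads):
spans = [(0, 50)] * 200 + [(50, 100)] * 8
depth = depth_per_position(spans)
low = sorted(pos for pos, d in depth.items() if d < 20)
print(f"positions below 20x: {low[0]}-{low[-1]}")
```

A variant inserted in silico into such a low-depth region is unlikely to be called, which is exactly the kind of weakness this approach surfaces without running another wet specimen.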
GARLAPOW: Okay. That’s very interesting. So to make it really basic, it sounds in a way like a really complicated Easter egg hunt, bioinformatically, and a really good way to help labs assess whether they are as good at finding variants as they say they are. When you’ve done this in the past for labs, what have you found?
PFEIFER: Right, and I want to highlight a couple things that you just said. Because it's done in a blinded fashion, laboratories don't know what they're looking for, other than that they're looking for sequence variants. They don't know where they are, they don't know what the variant is, so your analogy of an Easter egg hunt is very good. A couple years ago, we supported a model process the FDA set up, and they actually called it an Easter egg hunt because it's such a good analogy. But I want to emphasize that the reason that's such an apt analogy is the labs know they're looking for variants, but they don't know the kind of variants we put in there, they don't know where they are, they don't know the variant allele fraction, and so it really is blinded to the labs. And it's also important to recognize that this can be done both for proficiency testing, when the laboratory is running a test in routine clinical practice, as well as for validation. We have found that there is a real demand out there in the next-generation sequencing community for these in silico mutagenized files when laboratories are developing a test, when they've designed a test and they're familiarizing themselves with and optimizing that test. It can be hard to get enough wet specimens, enough patient specimens, to actually make sure their bioinformatics tools are performing as well as they think they are. And so it's interesting that we designed this originally for proficiency testing, but it can be used at the point of validation of a laboratory test as well. And so I just wanted to make sure that the listeners understand that it's not just used for proficiency testing; it can actually be used preceding that, for validation of a laboratory test, to set a lab up for better performance. Now, I wanted to make sure that we got those two different scenarios out there because the data that I'm going to discuss actually bear on both of them.
So when we've done this, as model challenges with participating laboratories that are doing routine next-generation sequencing in support of patient care, we find that laboratories actually do pretty well on this, with the caveat that they do pretty well on the simple variants. As far as we've gone along with this so far is testing laboratories on single nucleotide variants, that's single base pairs, or small little indels, like I said, where there's been a change in a dozen or a dozen and a half bases, for example the deletions that are characteristic of the EGF receptor. And what we find in our in silico process is that the labs are correct about 95 percent of the time or better. And that is a pretty reassuring number. 95 percent accuracy is a pretty good number when you look across the full range of clinical testing that's done out there, again in microbiology, clinical chemistry, any number of things, and especially when you look at genetic testing. What is true is that the results that we get through this in silico process, well, nothing's 100 percent, but they are highly, highly correlated with the results that you get with paired wet samples. So the question is, well, are there somehow artifacts in here or biases in here that you haven't recognized? And the answer is, there don't appear to be so far. One may crop up; there are some things that we always worry about and always check. But the question is, if you gave a laboratory a wet sample that had a set of mutations, and then if you just gave a laboratory an in silico mutagenized file with those same mutations, would the bioinformatics pipeline perform the same? And the answer is yes, apparently it does perform the same. So it seems to be a pretty accurate and robust model for focused proficiency testing that has a high degree of clinical utility. So that's the first thing.
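[Editor's note: a hedged sketch of how a challenge like this might be scored. The variant tuples and coordinates below are purely illustrative, and the metrics (sensitivity and positive predictive value) are standard choices, not necessarily the exact scoring used in the studies discussed.]

```python
def score_challenge(truth, reported):
    """Score a lab's reported variants against the blinded truth set.

    Variants are (chrom, pos, ref, alt) tuples. Returns sensitivity
    (fraction of inserted variants found) and positive predictive
    value (fraction of reported calls that were really inserted).
    """
    truth, reported = set(truth), set(reported)
    tp = len(truth & reported)   # inserted and found
    fn = len(truth - reported)   # inserted but missed
    fp = len(reported - truth)   # reported but never inserted
    sensitivity = tp / (tp + fn) if truth else 1.0
    ppv = tp / (tp + fp) if reported else 1.0
    return sensitivity, ppv

# illustrative truth set: three inserted variants, two found by the lab
truth = {("7", 55242465, "GGAATTAAGAGAAGC", "G"),   # EGFR exon-19-style deletion
         ("17", 7577120, "C", "T"),
         ("12", 25398284, "C", "A")}
reported = {("7", 55242465, "GGAATTAAGAGAAGC", "G"),
            ("17", 7577120, "C", "T")}
sens, ppv = score_challenge(truth, reported)
print(sens, ppv)  # sensitivity 2/3, PPV 1.0
```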
The second thing is, what types of variants have we used? Well, as I said, so far it's been pretty much limited to single nucleotide variants and small little indels. This falls in the category of you have to learn to walk before you run. And we at our little company have had to learn how to accept these sequence files from labs and return them; there's a logistical process there. Laboratories had to learn how to send them to us, and to receive them, and to insert them into their pipeline, and we had to learn about differences in laboratories' naming of sequence files so our process didn't get bogged down. And laboratories had to understand how to report their variants, and whatever external entity was supporting this process had to learn how to score these variants and decide where to draw the lines as to whether or not laboratories were of adequate proficiency. The good news is that we can support the two major platform manufacturers out there. This is a model that is scalable across any number of laboratories, and it's not limited to just expert laboratories: academic laboratories, commercial laboratories, research laboratories, everybody can use this, and we've been able to support this process in all of those settings.
GARLAPOW: That’s wonderful. Okay, so you’ve touched on this already, well not just touched on it you’ve gone into some depth, but I’m going to repeat it but I’m going to ask for your top take-homes. So how does in silico proficiency testing improve standard proficiency testing? What can in silico proficiency testing do that standard proficiency testing cannot do? What are your top reasons in silico proficiency testing…
PFEIFER: Right, so, sorry to interrupt. The top issue here is that it makes it possible to stress bioinformatic pipelines with a broader range of variants, a broader range of variant types, a broader range of variant frequencies, and a broader range of associations than is ever physically possible with wet specimens. And in so doing, you can stress bioinformatic pipelines in ways that could occur in patient specimens but that you would never be able to model using routine proficiency testing. And again, the goal of all of this is to stress bioinformatic pipelines in a way that captures more of the possible range of variants that will be encountered in routine clinical testing, to ensure that the bioinformatics pipelines have optimal accuracy, if you will, for a broader range of things that may be encountered in routine patient care.
GARLAPOW: But, let’s say I’m working at a health insurance company, or I’m a patient, why should I care about that?
PFEIFER: Well, why you should care about that is because the old saying that 'what you don't know can't hurt you' is actually false. What you don't know can hurt you. And that's the issue here. There's also that statement out there that 'ignorance is bliss.' Well, ignorance is bliss unless it actually has to do with your healthcare, right. And so that's the important point here. I don't know of any clinical labs that are running patient specimens with the idea that they have problems in their bioinformatics tools, they have problems with their tests, and they just don't care, right. That's not ethical, and laboratories do the best they can. But ignorance isn't always bliss, and what you don't know can hurt you. So the problem is, if there is a laboratory that has validated their test, and the only way they have to check whether their testing is accurate is this very limited range of wet specimens, cell lines and things, there may be whole classes of variants that their bioinformatics tool isn't very good at recognizing, simply because that tool has never seen those variants. There can be variant associations that their tool isn't very good at recognizing because they've never actually had a specimen that has those associations. There can be individual variants or classes of variants that their tool isn't very good at, simply because their tool has never encountered them. And, you know, absence of proof that your tool has a problem isn't the same as proof that the problem is absent. And so what in silico testing makes it possible to do is to feed laboratories sequence files (their own sequence files, their own background, their own error rate, their own coverage model, all of those things) with these variants that we know may be problematic, or that we know they've never seen before, and see how their tools do.
And that's the reason to do it: to enable laboratories to become aware of quality issues that they may not know about. And of course, what do laboratories do when they get a file? They run it through, and if everything is fine, they go, okay, so far so good. If they get a file back and there is a type of variant that they systematically cannot identify, or that they misclassify, then laboratories have a heads up: we need to go fix this part of our bioinformatics pipeline to do a better job. So that's the whole idea behind it. Ignorance is not bliss. What you don't know actually can hurt you. And this tool actually makes it possible to feed laboratories a broader range of variants, variant types, variant frequencies, variant associations, and all of that, to help understand whether or not that bioinformatics pipeline has the breadth to identify these more globally.
GARLAPOW: Okay, that's really interesting, thank you, John. So you have some papers published on this. You have a 2017 paper titled "In silico Proficiency Testing for Clinical Next-Generation Sequencing," and when I say 'you', I mean 'you and your colleagues'. In this paper you say that in silico proficiency testing improves on standard proficiency testing by a) including greater flexibility in tested variants, b) allowing the ability to design laboratory-specific challenges, and c) reducing costs. I don't want to belabor things, but these are really important points, so can we look at them one at a time? Why does it matter to include greater flexibility in tested variants, and how much greater is the flexibility that we're talking about? In what therapeutic areas and types of diseases and disorders does this matter? How and why?
PFEIFER: Right, so, to give you an idea of the differences: there are, generally speaking, sets of 20 or 50 cell lines that have a range of sequence variants. A number of organizations, recognizing the limitation in the number of wet specimens, have specifically designed and manufactured cell lines with variants for distribution. Well, I would argue that a cell line that has one variant, or a couple variants, is a pretty limited sample. And if the whole set is only 20 or 50 cell lines, that's a pretty limited set of variants. Now, admittedly, you can take a cell line and spike in DNA that has a variant in it, and in that way increase the range of variants present within a wet lab sample. But there are of course theoretical as well as real life limitations to squirting naked DNA into a nucleic acid preparation. That's really not very physiologic, and it doesn't really parallel what happens in routine clinical practice. But if you look at most wet lab challenges that are out there, they challenge maybe a dozen, maybe a couple dozen variants in a single challenge, right; that's the most you can do. Now, our challenges: we have designed some challenges that have hundreds of variants in just one sample, okay, and we have been asked whether we could put a thousand variants or even a million variants in a sample. And the answer is sure. It'll maybe cost a little more money to have us put that many in there, but really the cost to put in 10 or 50 or even 100 or so is exactly the same, because it's a computerized process. So that's the scale here: a couple dozen, you know, 10 to 100, two to three logs, versus we can easily go, you know, logs higher, okay.
And it's important to recognize that not only is it scalable to several orders of magnitude more variants, but we can mix those variants in different ways with one another, right, and we can model them at different variant allele frequencies, right, meaning they're present at one percent or they're present at 50 percent. And it's very hard with cell lines to model more than one translocation at a time, for technical reasons, but we can model all these classes of variants. So scale wise, we're talking orders of magnitude difference. The other point I want to make here is the problem with a wet sample, once a group of 50 cell lines has been manufactured, is that once a laboratory uses those, they know the answer. Okay, so, you're back in high school, right. You go in for your algebra test or, you know, your trigonometry test, the teacher gives you a test with ten questions, and you don't do so well, okay. So then the teacher says, well, you're going to have to study a little more and take it again. Well, you go in the next time and you know going in that it's going to be exactly the same ten questions, right. That's the problem with these cell lines that have been produced. They're great cell lines, they have high utility, but once they've been used once, believe me, the NGS community knows what variants are there. And so they're no longer blinded specimens. You can't use them a second time, right. Now, yes, if they're cell lines you might be able to mix them, but you know the variants; if you are a clinical laboratory, you know the variants to look for, and you can set your bioinformatics pipeline to actually trigger or focus on those particular variants. That's a problem with designed wet samples. Of course, with in silico samples, that isn't an issue. We can, with a simple programming change or the flick of a switch, change the range of variants, change the variant allele frequencies, and all of that.
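[Editor's note: a toy sketch of why in silico challenges can stay blinded where reused cell lines cannot. Because positions, substitutions, and VAFs are drawn fresh from a seed each time, a new challenge is never the "same ten questions." The region names, coordinate ranges, and VAF choices below are hypothetical.]

```python
import random

BASES = "ACGT"

def make_challenge(regions, n_variants, vaf_choices, seed):
    """Draw a fresh, blinded truth set: random positions within the
    assay's target regions, random substitutions, random VAFs.

    `regions` maps a region name to a (start, end) coordinate range.
    A new seed yields a new challenge, so answers can't be memorized.
    """
    rng = random.Random(seed)
    truth = []
    for _ in range(n_variants):
        name = rng.choice(list(regions))
        start, end = regions[name]
        pos = rng.randrange(start, end)          # random position in region
        ref = rng.choice(BASES)                  # toy reference base
        alt = rng.choice([b for b in BASES if b != ref])
        vaf = rng.choice(vaf_choices)            # model low and high VAFs
        truth.append((name, pos, ref, alt, vaf))
    return truth

# hypothetical two-region panel; VAFs span 1% to 50%
regions = {"EGFR_ex19": (55242400, 55242520), "TP53_ex7": (7577050, 7577160)}
challenge = make_challenge(regions, n_variants=5,
                           vaf_choices=[0.01, 0.05, 0.15, 0.5], seed=42)
for v in challenge:
    print(v)
```

Changing the seed is the "flick of a switch" mentioned above: the marginal cost of a new, differently composed challenge is essentially zero.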
So that's a separate issue: people spend lots of time and money developing these cell lines, and they're incredibly useful once. Once. And then they're not available to be used in a blinded fashion anymore. Cost-wise? Well, it costs time and money to make these cell lines. Depending on the vendor, generally six months to a year, maybe a little more, maybe a little less. And it's pretty expensive, generally thousands of dollars per cell line, if not more. The cost to do this in silico mutagenesis, with this customized tool that we use, is essentially the cost of the electrons to actually do the analysis. And again, yeah, we are using electricity, there is a carbon footprint out there, I get all of that, but really the cost is the same to do 10 variants or 500 variants. The cost is the same whether the challenge is set up around one lab or 50 labs or 100 labs. The cost to have different variants in different combinations, all of that is exactly the same when you're doing the in silico mutagenesis. And maybe the number of compute cycles and the time change by, you know, 10 milliseconds or something for more complex specimens, but again, we do these mutagenesis runs overnight, and depending on how big the batch is, you know, we can get it done in two to three days. That's the turnaround time, that's the difference: weeks or months versus a couple days. And that gives you an idea that it's so much more scalable, it's so much more flexible, it's so much faster, and it's far less expensive to do it.
And it can support all of these settings. Of course, these are sequence variants; MutationMaker doesn't really know whether you're looking for an inherited change or a variant that's associated with cancer. We just tell it to put in mutations, and then the laboratory uses those files in the setting of the testing that they're performing.
GARLAPOW: Okay. Okay. So, that second point, laboratory-specific challenges. What is a laboratory-specific challenge? What kinds of differences are we seeing across laboratories that necessitate the specificity enabled by in silico versus standard proficiency testing? How does this support higher quality and accuracy?
PFEIFER: Right, so, a laboratory-specific challenge, the way that we do it, is that we mutagenize that particular laboratory's sequence files. And I don't want to get into the details; I know that I'm an aficionado and I don't want anybody to roll their eyes. But you can well imagine, if you have a hundred different laboratories that are all running an LDT, that those laboratories are going to be using different kits to actually do the nucleic acid extraction, different kits to do their library preparation, and, when they do the sequencing on the machine, that their assay may sequence a larger or smaller target region of the genome, and that they may sequence to a higher depth, that means get more reads off the machine, or not. And I'm not going to go on and on about this, but the way we do it enables a laboratory to determine, if that mutation were present, in the way they do the assay, would they be able to detect it. And what we found is there's a lot of variability there among laboratories running LDTs, as most laboratories are: even if two laboratories have the same target genes, there are differences in the way those tests are designed that may impact their ability to find these variants. So that's a real life difference. That's just the way the assay is designed. There may be differences in the bioinformatics tools, where the variant is there and they just can't find it. The reason that we do this in silico mutagenesis is because there are real life changes, or variations, in the way tests are designed. There are real life variations in the depth of sequencing. There are real life differences in the bioinformatics pipelines. And what we know is that some laboratories have a higher degree of accuracy at finding these variants than other laboratories do.
Again, laboratories ideally take this information, go back, and fix the problems that they had, so that their quality is improved, right. This is not a game of gotcha, I want to be very clear about that. This is not designed so that we can say to a laboratory, 'neener neener neener, you can't do that very well.' That's not the goal. The goal is to help laboratories become aware of places where their assay design, the way that they're performing their sequencing, or their bioinformatics can't find the variants that they think they can. And the way that things improve is, if you demonstrate to a laboratory that they're having this problem, laboratories go back and they fix that problem, right. And so one of the things that in silico testing enables is essentially this iterative process, much faster, much more inexpensively, much more comprehensively, so that the quality of testing can improve, right. Laboratories do not design bad tests, tests with low clinical utility or low accuracy, on purpose. All laboratories, if they're CLIA licensed, are performing high quality testing for the samples that they've analyzed. The problem is, what you don't know can hurt you. And the purpose of in silico testing is to help laboratories identify those places that are clinically relevant where their testing may not be accurate. And again, accurate testing benefits everybody. It benefits patient care, because you know what the disease is, you know whether a treatment is going to work. Payers are paying for quality testing. Society is better off. That's the goal behind doing all of this.
GARLAPOW: And I'd really like to re-emphasize that point that it's not a game of gotcha, because when the Center for Genomic Interpretation works with P&V Licensing, and when P&V Licensing does its own separate work, one of the really critical components of the ELEVATEGENETICS services at the Center for Genomic Interpretation, and of the work you do separately, is that there's this opportunity for improvement: for showing where some hiccups have been, and then this opportunity for improvement in quality and accuracy. And I think that's so profound.
PFEIFER: It's interesting. You know, I come from Missouri, right, the whole "Show Me" state business. I understand; I live in the real world and we have a next-generation sequencing laboratory. And the problem is, there are all these things that you worry your assay may not be very good at, but you don't have wet specimens to test it. A lot of these issues are hypothetical. You don't have wet specimens to test with, so you're really not sure whether it's a problem or not, and you're sort of stuck wondering about it, and it's hard to fix a problem if you're not really sure that that problem exists, or if you're not even sure there is a problem. And so that's what this in silico testing is designed to do: to say to laboratories, 'nope, you don't have a problem with this set of variants, these types of variants, things are good,' or to say to a laboratory, 'you know, you're not so good at this; it's not that you're specifically trying to avoid this or to have a poor quality test, you just haven't had the opportunity to look at these variants; you're not so good at this, tune things up.' And laboratories take that information and then go back and improve the quality of their testing. And that's the point. It's not a game of gotcha. It's this iterative cycle where we're all working together to improve the quality of testing.
GARLAPOW: Excellent. Thank you, John. So, reducing costs in clinical genetics and genomics can mean a whole entire suite of things. How specifically does in silico proficiency testing reduce costs? Costs over what, to what magnitude, and, again, to whom does this matter? Who cares?
PFEIFER: Right. Well, I think that one of the things in silico testing does is it decreases the cost to stress bioinformatic pipelines. We've talked about the cost and the time involved in producing cell lines, we've talked about the limitations, and, well, time is money and all that. It enables a much more thorough evaluation of laboratories' bioinformatics pipelines. Again, it is limited, it's not soup to nuts, it's just the bioinformatics piece, but it enables you to do that much more cheaply and much more comprehensively, so it's kind of a double win there, a double plus: it's more comprehensive and cheaper and faster. The other way it reduces cost, and this is indirect, and I'm not a medical economist, but a lot of people work on this, is that I think it's intuitive that if you get people the right diagnosis, the cost of medical care is decreased, right. So, by having higher quality next-generation sequencing, it increases the likelihood that the disease-specific or disease-associated variants will be identified correctly, and that enables more accurate diagnosis, it enables more appropriate therapy, and it improves outcomes. And maybe it shortens the diagnostic process; it certainly has the opportunity to make it more accurate, and all of those things save money. And the other thing it does is, of course, it not only saves time and money by shortening the patient's diagnostic odyssey and all of that, it makes it more accurate. And by making it more accurate, not only do you move quickly to therapies that are appropriate, but you avoid therapies that aren't going to have any clinical benefit, and that's a huge part of this, right. You want next-generation sequencing, at least for targeted therapies in cancer, that can find the variants accurately.
You don't want to give a patient a drug that you don't think they're going to respond to, because in the setting of cancer, these drugs have toxicities, and society, the insurance carriers, all of us end up paying the costs of the side effects for a patient who got a drug that, it turns out, was unlikely to benefit them to begin with. Right, so there are all these different settings, and we can imagine scenarios like this in the setting of inherited disease testing as well. And so again, I'll go back to what I said a minute or two ago: I think it's intuitive that higher accuracy genetic tests are going to lead to more accurate diagnoses and more appropriate care, and that in and of itself is going to save money. I think it's hard to argue that the correct diagnosis is going to add cost, right.
GARLAPOW: Absolutely. 100 percent.
PFEIFER: And so why does this matter? Well, I think it matters to the clinicians; they're, you know, busy taking care of patients, and they may not be sure whether a specific laboratory's next-generation sequencing test is highly accurate or not for the class of variants they're looking for. The laboratories themselves may or may not know, depending on whether there are wet specimens out there. Of course the patients want to know, but even if patients go out on the internet, they may not have the knowledge base to understand or to interpret the information that is on these laboratories' websites. And I can tell you, as somebody who has a bit of expertise in all of this, I can go to some laboratories' websites and I can't find the information about the issues that we have. So it's not just a matter of education; the laboratories may not even be putting it up on the internet, or may not even be aware of the issue. Then of course payers have an interest, and payers are not only private insurance companies but also the government, I mean Medicare/Medicaid; those are huge societal costs, and so we all benefit from making sure that we spend our healthcare dollar in the most efficient way.
GARLAPOW: Absolutely, absolutely. You and your colleagues have a 2016 paper “A Model Study of In Silico Proficiency Testing for Clinical Next-Generation Sequencing.” Can you please describe what happened in this paper, what you did?
PFEIFER: Right. So, this was the first time that we trotted this out to laboratories that were doing clinical next-generation sequencing routinely, in support of patient care. And this paper was the test of the model. Could laboratories upload sequence files, could they download sequence files, were they able to put these files through their bioinformatics pipelines, were they able to report what they found, and could we, the group, score those variants to identify whether laboratories were getting them right or not, and then, at the very end, how were laboratories doing? And again, even in this first model study we became aware of some glitches: some laboratories did not have, out of the gate, the expertise required to move these files around and insert them into their pipeline. We wondered initially whether that was going to persist, and it turns out that it has. But what we found is that with a little education, they could do it like the other 75 percent of labs. What we found is that laboratories could run these through their pipelines, and that there was about 95 percent accuracy, again virtually the same as what you got with wet specimens, and that it indicated a high degree of functionality among the laboratories. It was similar to what was being seen with proficiency testing using wet samples. So it was the initial study that demonstrated that this could be used for formal proficiency testing of laboratories doing NGS in routine clinical care.
GARLAPOW: Okay. Okay. Now, from these results, what most excited you?
PFEIFER: From these– say that again? From the article?
GARLAPOW: From the results from this study.
PFEIFER: Well, what most excited me, right, what most excited us, not me, us, the four of us who set this up, is that we always considered that next-generation sequencing was heavily dependent on the accuracy of these bioinformatics tools, these bioinformatic pipelines. And if a laboratory's bioinformatics pipeline was not presented with the appropriate group of variants, it would be unknown to that laboratory whether they could get them right or not, right. It keeps getting back to this limited supply of wet specimens. So we had this idea, right, that if you did this in silico process, you could test those bioinformatic pipelines in a more thorough, or comprehensive, or focused way. And so what was so exciting to us is that really, you know, 75 percent of labs had no trouble doing this, and for the 25 percent of labs that had trouble with just the logistics of it, we could very easily figure out what the problems were and show them how to do it the right way. It was also interesting that, although laboratories largely got it right, even in that initial study there was this whiff that maybe there were problems out there. Again, it wasn't designed as a game of gotcha, to show that laboratories were failing at something; that's not the idea. But there was just this faint scent on the breeze. What is that? Is that really a problem, or is it just that nothing's perfect, and there's always going to be a variant that a laboratory doesn't find? It spoke to the fact that there were some errors, and it indicated that you could actually use this in silico mutagenesis model to perhaps highlight those errors and dig a little deeper and see what you could find. So that was pretty exciting.
And it was also exciting that, you know, once you have the software tool, and here I want to give a shout out to Eric Duncavage and Haley Abel, the people who actually wrote the code, which was a lot of work for them, it's pretty straightforward to just use it routinely. That was pretty exciting as well; it wasn't going to be a nightmare to try to use this routinely.
GARLAPOW: Okay. Very interesting. And so, I'm just going to give a little bit of simple language as a lead-in for my next question. You have different variants, and you could think of them like a big 'A' and a little 'a'. The little 'a', the minor variant, can occur at different frequencies in a population, in samples, things like that. And the frequency at which a minor variant occurs can impact its detectability in certain ways. So in your 2016 paper, variants with minor allele frequencies of 15 percent or higher were identified well. What happens with frequencies below 15 percent? What does this mean for labs' abilities to accurately detect variants that are pathogenic or likely pathogenic but rarer than 15 percent in, say, a tumor sample? And what does this mean for heritable rare diseases, diseases that have frequencies much, much less than 15 percent?
PFEIFER: Well, what it means, and what I'm about to say is pretty well known among laboratories, is that as the VAF, the variant allele fraction, decreases below about 15 or 10 percent, laboratories' bioinformatics tools can have trouble finding the variant. Now, this is especially important in the setting of cancer, where there's a lot of intra-tumor heterogeneity, and so you want your bioinformatics pipelines to find these low VAF variants because they can still have clinical or therapeutic implications. The same is true in inherited diseases. For any number of genetic reasons which are beyond the scope of this topic, just because a patient has, or may have, an inherited disease doesn't mean that the VAF is going to be at exactly zero percent, meaning they have two wild type alleles; or 50 percent, meaning they're a carrier for a recessive trait, or symptomatic if it's dominant; or 100 percent, right. It's not as clean as that in real life. We're all taught back in high school or college that it's 0, 50, or 100 percent, but it turns out in real life that's not true. And so, as I've mentioned a number of times, that's one of the advantages of this in silico method: we can model those variant allele frequencies at a very precise level to test a laboratory's ability to detect them, and that of course has huge implications for the ability to diagnose these diseases, if the disease-associated allele, for whatever genetic reason, happens to be present at a low VAF.
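[Editor's note: a back-of-envelope illustration of why low-VAF variants get harder to call, not a model from the studies discussed. If sequencing depth is d and the variant is present at allele fraction f, the number of variant-supporting reads is roughly Binomial(d, f), a simplification that ignores sequencing error and mapping artifacts. A caller that requires a minimum number of supporting reads will therefore miss low-VAF variants with increasing probability.]

```python
from math import comb

def prob_detected(depth, vaf, min_alt_reads):
    """P(at least `min_alt_reads` of `depth` reads carry the variant),
    modeling the alt-read count as Binomial(depth, vaf). This ignores
    sequencing error, so it's an optimistic, illustrative bound.
    """
    p_miss = sum(comb(depth, k) * vaf**k * (1 - vaf)**(depth - k)
                 for k in range(min_alt_reads))
    return 1 - p_miss

# at 100x depth, with a caller requiring 8 variant-supporting reads,
# detection is near-certain at 50% VAF and collapses at low VAFs
for vaf in (0.50, 0.15, 0.05, 0.01):
    print(f"VAF {vaf:.0%}: detected {prob_detected(100, vaf, 8):.1%} of the time")
```

The qualitative point matches the interview: above roughly 15 percent VAF detection stays high, and it drops off sharply as VAF falls toward a few percent, which is exactly the regime in silico files can probe precisely.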
GARLAPOW: Okay. Okay, thank you. Now John, you say that we need better proficiency testing, but there are multiple studies out there that say everything in proficiency testing is awesome! I found some studies that show 98 percent accuracy or better across 111 labs, from a 2019 study, that’s Merker and colleagues. There’s one from 2020, just last year, showing 99.2 percent accuracy, that’s Keegan and colleagues. What’s wrong? What am I missing?
PFEIFER: Ah well, if you dig a little deeper… But before I continue, I want to say that the authors on those papers are colleagues and friends, and I know all of them. If you go out there and you search for them and you search for me, you’ll see that we have worked together on a number of papers. There’s a relatively small number of people who do next-generation sequencing, and among that group there’s an even smaller number of people who are interested in how you evaluate the quality of testing, and so we all know one another. I have the utmost respect for all of these people, and what I’m about to say, I think if you asked Dr. Merker or Dr. Keegan, they would agree with it, so I don’t want you to think that I’m throwing shade on those papers, which present the results of proficiency testing that has been offered by the College of American Pathologists, which I’ll call CAP. I don’t want anybody to think that I’m throwing shade on CAP, but nothing in this world is perfect. The question really at the end of the day becomes: how big of a problem do you personally think this really is? So let me explain to you why that number of 98 percent is not reassuring to me at all. There are two basic reasons why that number could be pretty high and still worrisome. First of all, what are the variants in that proficiency testing that you’re asking laboratories to identify?
If they’re pretty plain vanilla, then you would expect laboratories to get them right 98 percent of the time, and you look at that and say, ‘wow, 98 percent of the time, that’s marvelous, we have no quality problems here.’ Except, if you read those papers, you will see that there is an occasional variant in there of a variant type, for example a di- or trinucleotide sequence change, which laboratories miss at a very high rate, a high rate being 15 percent or more. Well, one of the ways you can make sure that your proficiency testing shows a very high level of accuracy is to avoid variants that you know laboratories are having trouble with, alright. So that’s one of the reasons those studies are not that reassuring to me: the proficiency testing challenges that CAP has offered to date have avoided variants which they know laboratories have trouble with. Now, there’s an explanation for that, and that is you have to walk before you run. CAP’s approach has been: let’s test laboratories on the simple variants, and then we’ll move on to the more complex variants. I’m not sure I personally agree with that approach. If this is a class of variants that is regularly encountered in routine clinical practice, I think you have to challenge labs on it. And again, this is just a difference in professional opinion. As I said, Jason and all these people are very smart people, they have well-reasoned, well-thought-out approaches to proficiency testing, and they’re friends and colleagues, and again I’m not criticizing them on fundamental ethical or moral grounds or any of those large things, I’m simply raising an academic issue here. And that is, one way you can get a high level of accuracy is to give easy variants. And if you go and dig into the data, the variants have been pretty plain vanilla.
There are occasional variants which labs don’t do very well on, but those have not been well represented in these proficiency challenges, okay. That’s one issue with these challenges. For the record, CAP is aware of this, and as their proficiency testing continues to grow and advance, they are becoming broader in the types of variants that they’re challenging. So again, CAP is aware that this is a potential criticism, and they’re working very hard to fix it. It’s a process: walk before you run. The second concern is that if you report the results in aggregate, as ‘98 percent correct,’ you don’t know whether that means all 100 of your 100 laboratories were 98 percent accurate, or whether 98 of the 100 laboratories were 100 percent accurate and two of the labs were zero percent accurate. That’s just a little thought experiment: you can get to 98 percent overall accuracy by 100 labs all being 98 percent accurate, or by having 98 labs that are 100 percent accurate and two labs that are zero percent accurate. You get to the same number either way. And so the problem with presenting data in aggregate is the concern that it may be hiding laboratories that are performing more poorly, alright. That’s the other worrisome part of the studies you quote: it’s entirely unclear which is going on. Now, I gave an explanation for why CAP has focused on the more plain vanilla variants; again, it’s the walk-before-you-run model for proficiency testing, and you can ask yourself whether you agree with that or not, but that’s an academic question. So then the question is, why are they presenting the data in aggregate? Well, that has to do with the fact that proficiency testing is by rule anonymized.
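[Editor’s note] The thought experiment above can be made concrete in a few lines. All figures here are illustrative, taken from the hypothetical in the discussion rather than from any real study’s data.

```python
# Toy version of the aggregate-accuracy thought experiment: very different
# lab-level performance can hide behind the same overall number.

def aggregate_accuracy(per_lab_accuracies):
    """Mean accuracy across labs, assuming every lab attempts the same
    number of proficiency challenges."""
    return sum(per_lab_accuracies) / len(per_lab_accuracies)

# Scenario A: 100 labs, every one of them 98 percent accurate.
scenario_a = [0.98] * 100

# Scenario B: 98 labs perfectly accurate, 2 labs getting everything wrong.
scenario_b = [1.0] * 98 + [0.0] * 2

print(round(aggregate_accuracy(scenario_a), 4))  # 0.98
print(round(aggregate_accuracy(scenario_b), 4))  # 0.98
```

Both scenarios report the same 98 percent aggregate accuracy, yet only the second contains labs a patient should worry about, which is exactly why anonymized, aggregate reporting can mask poorly performing laboratories.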
If it’s proficiency testing, you cannot provide those data individually, in a lab-identified way. And so that is the reason that the papers presenting the CAP proficiency testing results present the data in aggregate fashion. Again, I do not believe that CAP is purposefully trying to hide poorly performing laboratories. That’s not the goal here. But if you’re thinking about those data, you can ask yourself: what is the range in accuracy among the laboratories that are participating? So you asked the question: everything seems to be great, so what’s the need for in silico? And my answer is, well, the key word there is that verb “seems.” “Seems” is a lot different than “is,” right. I’ve talked a lot about how ignorance is bliss: if you’re not being tested on specific variants, you may not be aware that you’re getting them wrong. So I worry about the range of variants and variant types in those studies. I worry about the aggregate data. Is there something going on with a subset of laboratories that we’re unaware of? And then the last thing is, if you look at those studies, the VAFs in those studies are all 15 percent and above, 15 percent to as high as 50 percent. And you yourself brought up a few minutes ago how we know that at lower VAFs laboratories have a more difficult time identifying variants. So that’s the third point about those studies that I worry about. And again, the reason that hasn’t been systematically evaluated is that it requires cell lines with these variants that you can mix in the setting of wet samples.
I don’t want to talk too much about CAP, but one of the ways to work with these wet samples, as I said, is to pipette in a little bit of DNA that has the variant you’re interested in, as a way to model different VAFs and to put a broader range of variants in your assay. The problem with that model is that it is absolutely not physiologic, right. When you look at a specimen from a buccal swab or peripheral blood lymphocytes from a patient, you are extracting nucleic acids from cells; you’re not squirting variants into that assay. And so while it may be useful as a way for laboratories to validate their assays, there are concerns about whether that approach, which is used in the paper by Keegan that you reference, is actually physiologic. And again, that’s an academic question, that’s something you can ponder on your own, but it does at least raise the opportunity to test a broader range of variants in a wet sample. The problem with that, if I may get a little bit technical, is that really sophisticated bioinformatic pipelines key in on the single nucleotide variants that are part of the normal genomic variation between individuals; depending on where the DNA you squirt in happens to come from, such a pipeline can recognize that it’s actually a contaminant, that it’s not part of the specimen. So again, like everything, there are strengths and weaknesses.
But you asked the question of why I think there’s a problem, why we, and by “we” I mean not just the four of us associated with P&V but the broader next-generation sequencing community, think there’s a problem. It’s because if you think about these papers, like any published study, there are strengths and weaknesses, and some of the weaknesses in these papers are really worrisome, because answers one way or the other may actually indicate the presence of quality issues in NGS testing that, to date, routine proficiency testing has not addressed. So that was a lot, I’m sorry. That was a lot.
GARLAPOW: Don’t be sorry! That was very interesting and very worthwhile. It was riveting. I loved it. Okay, John, you are on the Rising Tide Podcast. A rising tide lifts all boats, improves quality, improves accuracy. How can we raise the tide?
PFEIFER: Well, from my perspective, and my wife says that I’m a lab rat, so from a lab rat’s perspective, your world is focused on what you do. In the setting of next-generation sequencing, one of the ways you raise the quality of testing is to make specimens available to laboratories, through a number of different approaches, so they can validate their tests and know whether their bioinformatics pipeline can find variants that have clinical significance. Again, not in a game of gotcha, but in a game of: here’s a group of variants, can you find them? If you can find them, good, here’s another set. There’s always another set of variants. The other thing is, if your lab can’t find them, most laboratories that we work with, when they’re confronted with the fact that they can’t find these variants, they are a combination of appalled, aghast, embarrassed, frightened, worried, concerned, all the appropriate emotions. It’s like, holy cow, we need to do a better job at this. And so that’s how you raise all boats. If everybody has access to a broader range of materials to test their bioinformatic pipelines, you can increase the accuracy of genetic diagnosis for inherited diseases and in the setting of oncology, in the setting of cancer, and by increasing the accuracy of genetic diagnoses, you’re going to get more appropriate patient care. You get the right diagnosis, the patient’s going to get the right care, and everybody will be better off. I think that raises the tide. The only other question that I have for you is: which tide are you worried about? Are you worried about the tide that comes in today, or the tide that comes in tomorrow?
Because I’m kind of worried about the tide that comes in tomorrow, and this gets back to the question you raised about why these proficiency testing studies show results that look so good. I mentioned that those are plain vanilla variants, and it’s all done with DNA. Well, we all know that tomorrow’s tide is going to bring testing not limited to SNVs or a small range of indels in DNA. We’re going to worry about copy number variants; we’re going to worry about structural variants in DNA. A lot of labs out there are starting to sequence RNA, and so we’re going to start worrying about the accuracy of testing of RNA. Then there are specimens where people are performing testing on cell-free DNA, right, not on tumor specimens, so you’re going to want to make sure that you have accuracy with that. The response to therapy can depend on tumor mutational burden, so this isn’t just which variants are there; the question is how many of them are there. And then you move beyond that to questions about what specimens you are actually testing: people are clinically doing assays around the microbiome, right, so there’s an example. And then there’s this question about what types of testing are even being done. Most of the assays we’ve supported so far that are being done clinically are targeted panels, or maybe even the exome, but there was a paper in this week’s New England Journal talking about the use of sequencing the entire genome for routine patient care. For the last 10 years people have been talking about the cost of next-generation sequencing coming down, so that we won’t be doing panels, we’ll be doing exomes, and eventually we’ll be doing the entire genome. And it turns out, it looks like that day is actually here.
You can do it in a cost-effective way with a bioinformatics pipeline that only takes a few days to run. And so tomorrow’s tide is a broader range of sequence variants and a broader range of associations. Tomorrow’s tide is RNA sequencing as well as DNA. Tomorrow’s tide is the number of variants, not just which variants. Tomorrow’s tide is the microbiome. Tomorrow’s tide is sequencing the entire genome. And so when you think about these limitations of wet samples, when you think about all the bioinformatics that are designed around each one of those assay types, we’re going to have to do more rather than less in order to make sure that the accuracy of next-generation sequencing is actually supporting patient care, is actually giving results that can be used to improve patient care. And so it’s interesting that for somebody like me, I’m worried about today’s tide, what’s going on now, but I’m pretty focused on tomorrow’s tide, because we view in silico mutagenesis as helpful today, but we spend most of our time writing code so it’s helpful for tomorrow’s tide as well. And I think that anybody who’s in clinical medicine, yeah, they’re worried about targeted panels, but they’re scratching their heads or rolling their eyes thinking about what we are going to be doing a year from now. And now is the time to start positioning ourselves, and by ourselves I mean payers, regulators, physicians, patients, advocacy groups, everybody, to ensure quality testing tomorrow. Next year.
GARLAPOW: Absolutely. Absolutely. Tomorrow’s tide is pretty much already here.
PFEIFER: Yup, yup, yup.
GARLAPOW: So we’ve gotta be prepared for it. John, always a pleasure whenever we speak. I learned just so much from you, and thank you for joining me today. Keep in mind that John’s company P&V Licensing is a direct resource with which laboratories can engage to assess and evaluate variant calling pipelines, and works with the Center for Genomic Interpretation to support our ELEVATEGENETICS suite of services.
PFEIFER: Yes, yes we do. Full disclosure, yes we do. Yes, we do.
GARLAPOW: Thank you so much, John.
PFEIFER: My pleasure, my pleasure. Always, always fun to talk with you.
GARLAPOW: If you loved this episode of the Rising Tide Podcast, please support us on Patreon and select us on Amazon Smile. The Center for Genomic Interpretation can help your organization identify high quality clinical genetics and genomics partners or can help a laboratory identify areas for quality improvement. Through ELEVATEGENETICS, we engage with laboratories, encourage genetic and genomic test validity, and assess test efficacy. Find us at clarifygenetics.org and genomicinterpretation.org.
For our listeners following along on YouTube, remember to hit subscribe and to activate alerts for when we post new episodes. You don’t want to miss any of these important conversations with leading experts. Together we can raise the tide.
Narrated by Dr. Megan Garlapow
Produced and edited by Kathryn Mraz and Brynlee Buhler
The post Dr. John Pfeifer – In silico Proficiency Testing to Improve Quality and Accuracy in Clinical Genetics and Genomics appeared first on Center for Genomic Interpretation.