Gene Machines: Navigating a World Enthralled by DNA
Author: CAS
Added: 11/08/2019
Description
Biologist Colin Meiklejohn gives the third CAS Inquire lecture, connecting genetics to the 2019-2020 theme "Rise of the Machines." More information is available at https://cas.unl.edu/cas-inquire.
Searchable Transcript
- [00:00:05.010]Good evening, I'm Taylor Livingston, and I'm the Director
- [00:00:07.720]of the College of Arts and Sciences Inquire program.
- [00:00:10.750]Thank you all for coming this evening
- [00:00:12.470]to the fall semester's final installment
- [00:00:15.290]of this year's Inquire Series lecture,
- [00:00:17.987]"Rise of the Machines."
- [00:00:19.840]The Inquire program is structured around these lectures,
- [00:00:22.630]allowing students, faculty, staff,
- [00:00:24.950]and the wider public the opportunity to investigate
- [00:00:28.380]how we as a society engage with technology.
- [00:00:31.730]Not just its benefits or negative aspects,
- [00:00:35.010]but the nuances and effects of our cyborg entanglements.
- [00:00:38.770]It creates an opportunity to learn
- [00:00:40.770]about the fascinating research of our faculty members
- [00:00:44.160]in the College of Arts and Sciences,
- [00:00:46.360]and enables students to see various
- [00:00:49.923]disciplinary approaches to the study of a topic,
- [00:00:52.810]as well as the necessity of multi-
- [00:00:55.060]and interdisciplinary insight
- [00:00:57.210]to truly understand social issues.
- [00:01:00.610]For a distinctive set of intellectually curious
- [00:01:03.310]and inquiring students, the Inquire scholars,
- [00:01:06.140]if you will, please stand.
- [00:01:09.530]The program offers a chance to foster a meaningful
- [00:01:12.420]academic engagement with the topic through discussions
- [00:01:15.210]with the lecture faculty, and a year long research project.
- [00:01:19.620]They are aided along the way by the peer leaders,
- [00:01:22.040]if you could please stand.
- [00:01:26.714]If you could please join me in thanking the students
- [00:01:28.840]and their participation in the program.
- [00:01:31.126](audience applauding)
- [00:01:36.020]Tonight I have the distinct pleasure of introducing
- [00:01:39.030]Dr. Colin Meiklejohn, Assistant Professor
- [00:01:41.601]in the School of Biological Sciences.
- [00:01:44.570]Dr. Meiklejohn's work focuses on evolutionary genetics,
- [00:01:48.050]particularly how new species emerge,
- [00:01:50.590]changes in gene expression, and the co-evolution
- [00:01:53.730]of nuclear and mitochondrial genomes using the fruit fly.
- [00:01:57.740]Tonight his talk, "Gene Machines:
- [00:01:59.527]Navigating a World Enthralled by DNA,"
- [00:02:02.470]will help us make sense of what is fact
- [00:02:05.150]and what is science fiction
- [00:02:06.760]when it comes to direct-to-consumer genetic testing
- [00:02:09.830]such as 23andMe and AncestryDNA.
- [00:02:13.240]Please join me in welcoming Dr. Meiklejohn.
- [00:02:16.254](audience applauding)
- [00:02:38.790]All right.
- [00:02:40.270]Thank you very much for the opportunity to talk to you,
- [00:02:43.170]I'm very excited to talk about DNA
- [00:02:46.150]and the machines that we use to study it.
- [00:02:52.340]So, 67 years ago,
- [00:02:55.220]the structure of DNA was elucidated
- [00:02:58.020]by Franklin and Watson and Crick,
- [00:03:02.150]and provided us with this iconic structure,
- [00:03:05.040]now the double helix, a single molecule
- [00:03:08.940]made up of four letters, A, G, C, and T,
- [00:03:12.630]that is the underpinning for all of life on Earth.
- [00:03:17.560]And since then, science has not stood still,
- [00:03:23.150]and we have made really amazing strides
- [00:03:27.110]in our ability to understand DNA, to identify it,
- [00:03:30.650]to sequence it, and to try and figure out what it does.
- [00:03:33.770]And so these are the machines that sequence DNA.
- [00:03:39.790]In the last 15 years or so,
- [00:03:45.510]the decrease in the cost of sequencing of DNA
- [00:03:48.230]has really put Moore's law to shame.
- [00:03:53.070]To give an example, in 2018, I did an experiment
- [00:03:58.900]in my lab where we sequenced DNA,
- [00:04:01.690]and we got 92 billion base pairs of DNA
- [00:04:07.020]for about $9,000.
- [00:04:09.420]And so that is more than 100,000 bases,
- [00:04:12.820]A's, T's, G's, and C's for every penny.
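The "bases per penny" figure follows directly from the two numbers quoted in the talk. A minimal sketch of that arithmetic, using only those two figures (the variable names are illustrative):

```python
# Figures quoted in the talk: about 92 billion base pairs for about $9,000.
bases_sequenced = 92_000_000_000
cost_dollars = 9_000

cost_pennies = cost_dollars * 100
bases_per_penny = bases_sequenced / cost_pennies

print(f"{bases_per_penny:,.0f} bases per penny")
```

The result comes out a little above 100,000, which matches the "more than 100,000 bases for every penny" stated in the lecture.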
- [00:04:16.090]And this low cost of sequencing
- [00:04:19.600]has been driven mainly by technological advancements,
- [00:04:23.270]optics, microfabrication, chemistry.
- [00:04:27.230]So for example, there are now multiple methods
- [00:04:31.070]that are common for sequencing DNA at high throughput.
- [00:04:38.020]One of them involves, for example,
- [00:04:40.020]fabrication of these tiny little chambers
- [00:04:43.550]which are less than 100 nanometers wide,
- [00:04:45.460]which is about 700 times thinner than a human hair.
- [00:04:49.260]At the bottom of this chamber there is a single molecule
- [00:04:52.410]of DNA polymerase, the molecule that copies DNA,
- [00:04:55.320]and as a molecule is fed through,
- [00:04:57.940]it releases fluorescently labeled A's, T's, G's, and C's
- [00:05:02.410]and this is read off by a laser
- [00:05:04.930]that then records a movie of this happening in real time.
- [00:05:09.350]And so based on
- [00:05:13.900]just sort of a ballpark, back-of-the-envelope
- [00:05:17.000]estimate, we probably are generating
- [00:05:22.320]over 9 quadrillion bases of DNA per day on this planet,
- [00:05:27.800]and that's been going on for years now,
- [00:05:29.750]and so we are swimming in DNA sequence information.
- [00:05:34.730]And I think the question is,
- [00:05:36.410]what have we learned from all of this DNA?
- [00:05:39.440]And where are we going to go as a society
- [00:05:43.410]with all of this information and all of the promise
- [00:05:47.060]and all the pitfalls that it holds?
- [00:05:49.470]So, I wanna start by talking about
- [00:05:53.710]how we use DNA information to make inferences,
- [00:05:58.060]and throughout I'm gonna lean a lot
- [00:06:00.210]on some really neat figures and concepts
- [00:06:02.650]that were developed by Graham Coop,
- [00:06:03.910]who is a population geneticist
- [00:06:06.660]at the University of California Davis.
- [00:06:09.260]So we all have, whatever your family situation,
- [00:06:13.450]we all have two biological parents.
- [00:06:15.750]We have a mother who contributed slightly more
- [00:06:18.170]than half of the DNA that you have,
- [00:06:19.850]and a father who contributed slightly less
- [00:06:22.270]than half of the DNA you carry,
- [00:06:24.000]and we can simply represent this,
- [00:06:26.930]here's your genome, half from Mom, half from Dad.
- [00:06:31.010]And then, of course, each of your parents
- [00:06:33.700]had themselves two parents,
- [00:06:35.770]so our genome can be thought of as being inherited from
- [00:06:39.410]four grandparents, a paternal grandfather,
- [00:06:41.553]a paternal grandmother, a maternal grandfather,
- [00:06:43.383]a maternal grandmother, and so on
- [00:06:45.590]back through the generations.
- [00:06:47.450]So in general, we have two to the K ancestors
- [00:06:51.190]K generations ago, and on average,
- [00:06:55.320]we inherit one over two to the K of our genome
- [00:06:59.210]from each ancestor, so eight generations ago
- [00:07:01.420]we have 256 ancestors, and on average,
- [00:07:04.760]we inherited one out of 256 of our genome
- [00:07:07.780]or somewhat over 10 million base pairs
- [00:07:10.720]from each of those ancestors.
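The doubling of ancestors and halving of expected genome share can be written out in a few lines. This is a rough sketch, assuming a haploid human genome of about 3.1 billion base pairs (that size is an assumption added here, not a number from the talk); the function names are illustrative.

```python
# Assumed approximate size of one (haploid) copy of the human genome.
GENOME_BASES = 3_100_000_000

def ancestor_count(k):
    """Number of ancestors k generations back, ignoring shared ancestors."""
    return 2 ** k

def expected_share(k):
    """Average fraction of the genome inherited from one such ancestor."""
    return 1 / 2 ** k

for k in (1, 2, 8):
    share = expected_share(k)
    print(f"k={k}: {ancestor_count(k)} ancestors, "
          f"average share 1/{ancestor_count(k)}, "
          f"about {share * GENOME_BASES:,.0f} bases each")
```

These are averages only. As the lecture goes on to explain, the amount actually inherited from any one ancestor varies a great deal around this value.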
- [00:07:13.900]But this average is somewhat misleading,
- [00:07:16.730]and that's because in the actual process of transmitting DNA
- [00:07:20.940]from parents to offspring,
- [00:07:22.670]there are a number of steps that are highly stochastic.
- [00:07:26.130]And that is what I'm illustrating here.
- [00:07:29.080]One of those steps is that we do not transmit
- [00:07:32.990]our chromosomes intact to our offspring,
- [00:07:36.100]we do not inherit our chromosomes from our parents intact,
- [00:07:39.700]but rather the chromosomes are shuffled every generation.
- [00:07:42.550]And we call this recombination,
- [00:07:44.600]and it involves physical breakages between the chromosomes
- [00:07:47.330]and then stitching them together
- [00:07:49.300]to make a chimera that is a hybrid molecule
- [00:07:51.750]between the two parental copies.
- [00:07:53.570]So what's shown here is one complement
- [00:07:56.870]of the 23 chromosomes that we each have,
- [00:07:59.760]so all of us have 46 chromosomes in our cells.
- [00:08:02.840]23 from Mom, 23 from Dad.
- [00:08:05.390]And what's shown here is a hypothetical example
- [00:08:08.250]of one complement, let's say in a daughter,
- [00:08:11.110]and these are, what's labeled here are the colors,
- [00:08:15.470]are the segments that she inherited through her mother
- [00:08:19.510]from her maternal grandfather and her maternal grandmother.
- [00:08:22.970]And so you can see, for example here,
- [00:08:25.430]these three arrows indicate these recombination events
- [00:08:29.930]where she got the tip of one chromosome
- [00:08:33.380]from her maternal grandmother and then a bit
- [00:08:35.110]from maternal grandfather, grandmother, and so on.
- [00:08:38.290]You can see that some chromosomes are transmitted
- [00:08:41.080]unrecombined, so chromosome 11,
- [00:08:43.900]this hypothetical daughter only got
- [00:08:46.760]the chromosome 11 from her maternal grandfather
- [00:08:49.160]and no DNA on chromosome 11 from her maternal grandmother.
- [00:08:55.360]And this is only half of the situation,
- [00:08:58.330]there's another complement that's coming down
- [00:09:00.150]through the father that would similarly be scrambled.
- [00:09:02.630]So that's one way that this process is more stochastic.
- [00:09:08.500]And we can see this,
- [00:09:10.020]here's another way of representing this.
- [00:09:12.270]So here are 22 pairs of the autosomes,
- [00:09:16.440]so those are the chromosomes that are not
- [00:09:18.280]the X and the Y that determine our sex.
- [00:09:21.260]And you can imagine, looking at,
- [00:09:23.560]this is my chromosome one
- [00:09:25.920]painted on to my dad's chromosomes.
- [00:09:28.440]And so what this kind of figure is showing,
- [00:09:32.460]for example, chromosome four here,
- [00:09:34.440]I inherited, half of it was from my dad's mom,
- [00:09:38.310]and half of it was from my dad's father.
- [00:09:41.510]And so that is evidence that there was a recombination event
- [00:09:43.850]right in the middle here of chromosome four,
- [00:09:45.970]in this example, you can see that for most
- [00:09:47.940]of the smaller chromosomes, there in fact
- [00:09:50.020]in this simulated example, there was no crossover,
- [00:09:53.220]and so only one or the other chromosome was inherited.
- [00:09:56.510]And so, if I were the only progeny,
- [00:09:59.520]the only member of the lineage,
- [00:10:01.490]then the alternate chromosome 14 would be lost.
- [00:10:06.460]So this leads to how we can use this kind of framework
- [00:10:11.540]to think about similarity between relatives
- [00:10:14.320]and in particular similarity between siblings.
- [00:10:16.620]So here is, again, this picture of my chromosomes
- [00:10:19.580]painted on to my dad's genome.
- [00:10:21.630]This might be my brother's chromosomes
- [00:10:23.460]painted on to our dad's genome.
- [00:10:25.370]And then what you can see here is the overlap
- [00:10:27.790]between my genome and my brother's
- [00:10:30.850]with respect to the pieces that we inherited
- [00:10:32.550]from our father.
- [00:10:34.020]So, dark purple here
- [00:10:37.380]indicates that we inherited the same chromosome.
- [00:10:39.630]So for example, chromosomes 18 and 19,
- [00:10:42.180]both in this case, myself and my brother
- [00:10:45.550]inherited the same chromosome
- [00:10:47.250]whether it was from my dad's dad or my dad's mom.
- [00:10:50.680]Whereas for, where is it, chromosome 17,
- [00:10:55.330]we inherited the alternate chromosome,
- [00:10:57.350]so my brother got from Grandma
- [00:10:59.730]and I got from Grandpa or vice versa.
- [00:11:03.010]And so the more distantly related the two people,
- [00:11:07.610]the less overlap there would be,
- [00:11:09.320]so this overlap is gonna be maximal
- [00:11:11.860]in the case of close relatives like siblings.
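The chromosome "painting" and sibling overlap described above can be imitated with a toy simulation. This is only a sketch under simplifying assumptions that are not from the talk: one paternal chromosome cut into 200 sites, a fixed chance of a crossover after each site, and the labels "GF" and "GM" for the father's father and the father's mother. All names and parameter values are illustrative.

```python
import random

def transmit(n_sites, crossover_rate, rng):
    """Label each site of the chromosome a father passes on by the
    grandparent ('GF' or 'GM') that the site originally came from."""
    current = rng.choice(["GF", "GM"])
    inherited = []
    for _ in range(n_sites):
        inherited.append(current)
        # A crossover switches to the father's other copy.
        if rng.random() < crossover_rate:
            current = "GM" if current == "GF" else "GF"
    return inherited

def shared_fraction(sib_a, sib_b):
    """Fraction of sites where two siblings carry the same grandparent's DNA."""
    return sum(a == b for a, b in zip(sib_a, sib_b)) / len(sib_a)

rng = random.Random(2019)
N_SITES, RATE, N_PAIRS = 200, 0.005, 1000

fractions = [
    shared_fraction(transmit(N_SITES, RATE, rng), transmit(N_SITES, RATE, rng))
    for _ in range(N_PAIRS)
]
average_sharing = sum(fractions) / N_PAIRS

print(f"one simulated sibling pair shares {fractions[0]:.2f} of this chromosome")
print(f"average over {N_PAIRS} simulated pairs: {average_sharing:.2f}")
```

Individual pairs range from sharing none of the chromosome to sharing all of it, while the average sits near one half. That spread is the stochasticity the lecture emphasizes.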
- [00:11:16.840]All right, so these chunks of ancestry, then,
- [00:11:20.140]which we refer to as identical by descent,
- [00:11:23.240]these are what we're looking for,
- [00:11:24.750]and this is how we're gonna use these chunks
- [00:11:28.090]to estimate ancestry, and to also
- [00:11:32.660]try and understand how these DNA sequences
- [00:11:34.800]affect things we're interested in, phenotypes,
- [00:11:37.300]traits, health, wealth, height, what have you.
- [00:11:42.140]So the way that we do this is by looking at variation
- [00:11:44.630]in the DNA sequences, all right?
- [00:11:46.410]So this is an example of a kind of information
- [00:11:50.840]that we're looking at that we're storing in these computers.
- [00:11:53.450]So what we have here is imagine a DNA sequence
- [00:11:55.750]from a single gene or a single spot on a chromosome.
- [00:11:58.530]These ellipses indicate large gaps.
- [00:12:01.330]So this stretch might in fact
- [00:12:02.700]be millions of base pairs long,
- [00:12:04.010]and we have here sequences from 12 different individuals.
- [00:12:08.660]And so what may not be obvious but I will highlight
- [00:12:11.250]is that at some of these places,
- [00:12:13.500]individuals have different bases.
- [00:12:16.553]So at this position, some individuals
- [00:12:18.520]have a T and others have a C.
- [00:12:20.950]And this is one kind of variation that exists in the genome,
- [00:12:24.590]this is the most common kind of variation.
- [00:12:27.720]We call these variants
- [00:12:29.480]single nucleotide polymorphisms or SNPs.
- [00:12:33.950]And we can use, then, these SNPs
- [00:12:36.620]to identify regions of identity by descent.
- [00:12:39.630]So for example, we could use polymorphisms here
- [00:12:43.440]to instead of simulating this, we could actually sequence
- [00:12:46.660]my genome and my brother's genome and my dad's genome
- [00:12:49.377]and infer this kind of overlap.
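Finding SNPs in a set of aligned sequences amounts to scanning each position for more than one base. A minimal sketch, assuming the sequences are already aligned and of equal length; the toy sequences and the function name are made up for illustration.

```python
def find_snps(sequences):
    """Return {position: sorted list of bases} for every aligned position
    where the individuals do not all carry the same base."""
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("sequences must be aligned to the same length")
    snps = {}
    for pos in range(length):
        bases = {seq[pos] for seq in sequences}
        if len(bases) > 1:
            snps[pos] = sorted(bases)
    return snps

# Four hypothetical individuals, eight aligned bases each.
toy_sequences = [
    "ACGTTAGC",
    "ACGTCAGC",
    "ACGTTAGC",
    "ACATCAGC",
]

print(find_snps(toy_sequences))  # {2: ['A', 'G'], 4: ['C', 'T']}
```

Real pipelines work from sequencing reads with errors and missing data, so this only captures the core idea of a variable site.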
- [00:12:54.730]All right, so
- [00:12:57.950]we as a species have genetic variation,
- [00:13:02.480]and on average, your maternal
- [00:13:05.470]and your paternal chromosomes differ
- [00:13:08.550]at one in 1000 bases,
- [00:13:11.080]or about 3 million SNPs across the genome.
- [00:13:14.160]So one way to say that is that you are likely to be
- [00:13:16.450]heterozygous, that is carry two different sequences,
- [00:13:20.410]two different bases at something like
- [00:13:22.580]3 million positions in your genome.
- [00:13:26.750]All right, so what does this variation do?
- [00:13:30.380]Most of it, at least SNPs, does nothing.
- [00:13:33.040]So most of these bases do not affect
- [00:13:35.720]any trait in any detectable way,
- [00:13:38.540]but some of them obviously do.
- [00:13:40.180]And so that genetic variation is
- [00:13:44.230]of singular importance to the medical community,
- [00:13:46.680]and we have made tremendous strides over the last
- [00:13:49.530]30, 40 years in understanding the genetic reasons
- [00:13:53.240]why people get sick,
- [00:13:54.870]understanding the bases for genetic diseases,
- [00:13:57.540]and I would say only more lately,
- [00:14:00.540]but in the last decade or so,
- [00:14:01.900]we've started to make significant progress
- [00:14:03.580]in turning that information into actual treatments.
- [00:14:07.610]So, an example of, maybe the most famous example,
- [00:14:11.870]is a disease called cystic fibrosis.
- [00:14:14.780]So cystic fibrosis is a very simple genetic disease,
- [00:14:17.520]it is caused by mutations in one gene,
- [00:14:20.530]the CFTR gene, which encodes a transmembrane receptor,
- [00:14:25.190]it's an ion channel, a chloride ion transporter,
- [00:14:29.190]and defects in this gene prevent cells in the lung
- [00:14:32.330]from maintaining proper osmotic balance,
- [00:14:34.630]and that leads to mucus buildup and infections
- [00:14:36.680]and all of the symptoms of cystic fibrosis.
- [00:14:41.387]It was one of the first genes and disease causing mutations
- [00:14:44.320]discovered, it was sequenced in 1989,
- [00:14:47.310]and then two weeks ago,
- [00:14:49.740]the FDA just approved a triple drug therapy
- [00:14:54.600]that seems like it's got a lot of promise,
- [00:14:58.270]it seems like it's gonna be very effective.
- [00:15:00.510]Obviously you have to wait and see
- [00:15:02.440]10 years out, but it's very promising.
- [00:15:05.370]And this drug therapy
- [00:15:07.780]specifically targets the defects associated
- [00:15:11.230]with the mutations that cause cystic fibrosis.
- [00:15:14.370]So it's very unlikely that this therapy
- [00:15:16.770]would have been developed in the absence
- [00:15:18.470]of the information about this gene.
- [00:15:21.120]But note that it was 30 years from discovery of the gene
- [00:15:25.470]to a treatment, so this stuff doesn't come about overnight.
- [00:15:30.220]Another application of DNA sequence information
- [00:15:34.970]to medical treatment is
- [00:15:38.362]an emerging area called pharmacogenomics,
- [00:15:40.590]and that describes the interactions between
- [00:15:43.670]genetic variation in humans and response to drug treatment.
- [00:15:49.150]And so one example of this involves a gene called CYP2D6,
- [00:15:54.890]it doesn't matter what that is.
- [00:15:56.790]It's a gene that mostly has its action in the liver
- [00:16:00.000]and detoxifies things that we put into our body.
- [00:16:03.550]And so that includes small molecule drugs
- [00:16:06.170]that are taken for therapy.
- [00:16:08.740]And this gene, we have a number of different enzymes,
- [00:16:13.050]genes in our body that do this kind of detoxification.
- [00:16:16.030]This particular one is thought to metabolize
- [00:16:19.040]about 25% of all of the drugs that are prescribed,
- [00:16:22.650]and over the counter as well.
- [00:16:25.060]And it turns out that this gene
- [00:16:26.470]is incredibly variable among people.
- [00:16:29.810]And so there are more than 70 different variants,
- [00:16:32.300]and that those variations are not just
- [00:16:34.480]A's versus T's or C's versus G's,
- [00:16:36.970]it turns out that at this gene,
- [00:16:38.240]one of the ways that humans are variable
- [00:16:39.810]is how many copies of this gene you have.
- [00:16:43.020]So some people have on a given chromosome, one copy,
- [00:16:46.650]some people have copies that are more or less defective,
- [00:16:49.570]some people have zero copies, it's gone,
- [00:16:51.980]those people are more or less fine.
- [00:16:54.760]Some people have two, and then some people
- [00:16:56.410]have up to 14 copies.
- [00:16:58.660]And copy number variation at this gene
- [00:17:01.610]has a strong effect on how rapidly you metabolize
- [00:17:04.960]the drugs that are the target of this gene.
- [00:17:07.800]So having too few copies
- [00:17:10.020]leads to a buildup of the drug and can cause toxicity.
- [00:17:14.460]Having many copies can cause the drug
- [00:17:17.620]to be metabolized too quickly to be effective,
- [00:17:19.600]or can also instead cause side effects.
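The relationship between copy number and drug metabolism described here can be caricatured as a simple lookup. This is an illustration only: the cutoffs below are invented for the example and are not clinical thresholds, and real pharmacogenomic calls also depend on which variants each copy carries.

```python
def metabolizer_label(functional_copies):
    """Map a count of working gene copies to a rough, hypothetical label.
    The cutoffs are illustrative assumptions, not clinical guidance."""
    if functional_copies < 0:
        raise ValueError("copy count cannot be negative")
    if functional_copies == 0:
        return "slow: drug may build up"
    if functional_copies <= 2:
        return "typical"
    return "fast: drug may be cleared before it works"

for copies in (0, 1, 2, 14):
    print(copies, "->", metabolizer_label(copies))
```

The point of the sketch is only that the same dose can land in different categories depending on genotype, which is why a test result might lead to a different prescription.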
- [00:17:23.840]And so there are now, if you, for example,
- [00:17:26.480]go to the Mayo Clinic's website,
- [00:17:28.730]they have a whole bunch of tests that you can do
- [00:17:32.800]to look for interactions between CYP2D6
- [00:17:36.780]or a related gene, CYP2C19,
- [00:17:40.250]and a whole bunch of different drugs.
- [00:17:43.020]And if you have a particular genotype at CYP2D6,
- [00:17:47.830]they recommend that codeine not be used
- [00:17:50.870]because you're gonna metabolize it too quickly
- [00:17:52.780]or too slowly, and they suggest
- [00:17:54.680]a different drug for that application.
- [00:17:58.570]So this has a lot of promise
- [00:18:01.690]for reducing side effects, adverse drug reactions,
- [00:18:06.830]and just sort of basically understanding
- [00:18:09.030]how our bodies deal with these chemicals
- [00:18:11.620]that we put in them, often for very important,
- [00:18:14.700]necessary uses.
- [00:18:20.370]A third medical application of sequencing is for cancer.
- [00:18:24.780]So cancer is a disease
- [00:18:29.160]that results from our cells sort of deciding
- [00:18:33.530]that they're gonna strike out on their own.
- [00:18:35.760]They don't need to listen to our bodies anymore.
- [00:18:38.470]And as a result, they cause problems
- [00:18:40.910]when they invade other tissues.
- [00:18:42.970]And we've discovered that cancer
- [00:18:46.320]is maybe hundreds of different diseases,
- [00:18:48.710]at least at the genetic level.
- [00:18:50.750]And at one level, that's very frustrating
- [00:18:53.450]because that means that there's not probably going to be
- [00:18:56.000]a cure for all cancers.
- [00:18:59.310]But on the other hand,
- [00:19:00.860]once we've started to disentangle the various
- [00:19:03.400]genetic mechanisms that lead to cancer,
- [00:19:05.720]that has led to some very promising therapies
- [00:19:09.220]that are specific, but in the context
- [00:19:12.560]of a particular tumor type can be very effective.
- [00:19:15.550]So one example of this are non-small cell lung cancers.
- [00:19:21.080]There are three mutations
- [00:19:23.280]that are very common among this type of cancer.
- [00:19:25.870]They can be in an EGFR gene, this ALK gene,
- [00:19:29.810]or this ROS1 proto-oncogene,
- [00:19:32.300]and depending on which combination of these mutations
- [00:19:36.030]a patient has, different drugs can be prescribed.
- [00:19:40.120]So there are drugs that are specific for this cancer
- [00:19:42.510]that are EGFR positive, there are drugs
- [00:19:45.340]that are specific for ALK positive cells, and so on.
- [00:19:48.770]And so by doing a biopsy and by sequencing the tumors,
- [00:19:52.800]we can identify the specific mutations
- [00:19:55.180]that led to that cancer in that patient.
- [00:19:57.600]And again, it seems like targeting specific cancers
- [00:20:01.960]with specific drugs seems to be much more effective than
- [00:20:06.010]sort of broad spectrum chemotherapeutic approaches
- [00:20:09.360]which try to kill all cancer cells, and as a consequence,
- [00:20:13.500]take a big toll on non-cancer cells.
- [00:20:18.730]All right.
- [00:20:19.563]So, that's just the tip of the iceberg.
- [00:20:22.800]I mean, there are lots and lots of
- [00:20:26.100]applications of DNA sequencing to medicine,
- [00:20:28.350]I haven't even talked about,
- [00:20:29.490]and I don't have time to talk about gene therapy,
- [00:20:32.450]but there are a couple of diseases that we now
- [00:20:34.970]can go into cells and fix them
- [00:20:36.940]and return those cells to the patient,
- [00:20:38.820]and that seems to be a cure.
- [00:20:41.980]So some early inherited leukemias, for example.
- [00:20:47.270]But I wanna point out, highlight,
- [00:20:49.330]that something like cystic fibrosis is genetically unusual,
- [00:20:53.380]that is diseases that are as simple genetically
- [00:20:57.140]as cystic fibrosis are definitely the exception, right?
- [00:20:59.870]So cystic fibrosis, there is the CFTR gene
- [00:21:02.020]that's on chromosome seven.
- [00:21:03.370]If you have two copies of a mutant allele,
- [00:21:06.210]you will get cystic fibrosis.
- [00:21:07.990]If you have one copy of a non-mutant allele, you will not.
- [00:21:11.090]And to a first approximation, that's the whole story.
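The rule just stated (two mutant copies cause disease, one normal copy prevents it) is enough to work out inheritance odds by listing every combination a child could receive. A small sketch, with "N" for a normal allele and "m" for a mutant allele as labels chosen here for illustration:

```python
from fractions import Fraction
from itertools import product

def is_affected(genotype):
    """Affected only when both inherited copies are the mutant allele."""
    return all(allele == "m" for allele in genotype)

def affected_probability(parent_1, parent_2):
    """Chance a child is affected, given each parent's two alleles.
    Each parent passes on one of their two alleles with equal probability."""
    outcomes = list(product(parent_1, parent_2))
    affected = sum(is_affected(child) for child in outcomes)
    return Fraction(affected, len(outcomes))

carrier = ("N", "m")
non_carrier = ("N", "N")

print(affected_probability(carrier, carrier))      # 1/4
print(affected_probability(carrier, non_carrier))  # 0
```

This tidy calculation is only possible because the trait is controlled by a single gene, which is the exception the lecture goes on to contrast with most traits.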
- [00:21:14.060]But that is very unusual.
- [00:21:15.390]So for most health related traits,
- [00:21:19.020]and indeed for most traits that we're interested in,
- [00:21:22.340]the situation is a lot more complicated, right?
- [00:21:24.840]So for most things, diseases that you might get,
- [00:21:28.630]phenotypes that you might show,
- [00:21:30.410]that is regulated, controlled by lots of genes
- [00:21:33.880]on lots of chromosomes all over the genome.
- [00:21:37.150]And the effect of each of these variants
- [00:21:39.710]on that phenotype is small.
- [00:21:42.460]And sometimes it matters whether you get
- [00:21:44.640]this one and this one or this one and this one,
- [00:21:46.540]so they interact, and they interact in ways
- [00:21:49.190]that are often very unpredictable.
- [00:21:51.280]So it's not as if we can simply sequence everyone's genome
- [00:21:55.590]and know exactly how their life is going to go
- [00:21:59.630]with respect to health, or in fact, anything else.
- [00:22:03.880]So we have methods now, very sophisticated,
- [00:22:07.510]powerful methods for understanding the genetic basis
- [00:22:11.830]of things that are more complicated than cystic fibrosis.
- [00:22:15.630]And these can be lumped under the term
- [00:22:18.770]genetic association studies.
- [00:22:20.680]So the concept behind these is very simple.
- [00:22:24.050]We have variation in sequences, we have these SNPs,
- [00:22:28.650]and we simply look at the sequences
- [00:22:32.540]over the entire genome for
- [00:22:35.010]let's say individuals who are healthy,
- [00:22:37.040]and individuals, if we're thinking about a disease,
- [00:22:38.830]who are affected.
- [00:22:40.720]And what we look for are polymorphisms
- [00:22:43.850]where they are enriched among affected individuals
- [00:22:47.430]relative to non-affected individuals.
- [00:22:50.110]So in this cartoon example, you may notice here,
- [00:22:54.180]the C allele at this site
- [00:22:56.510]is perfectly associated with the phenotype,
- [00:23:00.650]so all of the affected individuals have C,
- [00:23:02.950]and all of the non-affected individuals have T.
- [00:23:06.020]And so in this context, we would say
- [00:23:08.310]that this SNP likely is not causal
- [00:23:12.600]but is likely very close to a causal polymorphism
- [00:23:16.890]that leads individuals to get this condition.
- [00:23:21.650]Now, this is obviously a made up cartoon
- [00:23:24.640]and it's never that clean.
- [00:23:28.540]In fact, we now have studies that are quite large
- [00:23:31.690]so we can detect very subtle effects.
- [00:23:34.630]So it may be that the frequency of the C allele
- [00:23:37.170]among affected individuals is 81%,
- [00:23:40.510]but the frequency of the C allele among
- [00:23:42.670]non-affected individuals is 80%,
- [00:23:45.060]and nonetheless that may be a sufficient difference
- [00:23:48.410]for us to be able to detect that, and it is real.
- [00:23:51.130]But it means that that variant
- [00:23:53.220]contributes very little to the outcome by itself.
- [00:23:59.010]So to highlight this,
- [00:24:02.460]this figure's a little dark, but I really love this figure.
- [00:24:05.730]This is showing a trait
- [00:24:09.300]that is more typical of a lot of traits in humans,
- [00:24:12.310]and that is height.
- [00:24:13.800]So height, like many other traits in humans,
- [00:24:16.070]is not on or off, it's a continuous trait.
- [00:24:19.250]And so this is, I believe, a statistics class
- [00:24:22.920]that has lined up according to their height.
- [00:24:26.170]So the shortest individuals are here,
- [00:24:28.110]the tallest individuals are here, I have to guess
- [00:24:30.640]that there was someone in the class who was 4'10"
- [00:24:33.150]but decided not to show up on the day of the picture,
- [00:24:36.110]perhaps understandably.
- [00:24:38.580]And so this highlights the variation that we see
- [00:24:42.400]in populations, continuous variation.
- [00:24:44.750]And height is actually a trait
- [00:24:46.760]that has been studied intensively, and then probably
- [00:24:49.840]the biggest association study to date
- [00:24:52.110]has been done on human height.
- [00:24:53.950]And so this has been done a lot of times,
- [00:24:55.970]there have been some great meta-analyses,
- [00:24:57.790]and we now think that human height is detectably influenced
- [00:25:02.010]by over 3000 genes across the genome.
- [00:25:05.550]So it is truly polygenic.
- [00:25:07.290]There are tons of sites everywhere that affect height.
- [00:25:10.730]Each one only changes an individual's height
- [00:25:13.870]by a millimeter or a fraction of a millimeter,
- [00:25:16.930]and then the total height, then,
- [00:25:19.700]is the sum of all of those effects.
- [00:25:21.650]But even when we sum all of those up,
- [00:25:23.440]that together accounts for something like a quarter
- [00:25:26.040]of the variation among individuals.
- [00:25:28.340]So in this room, there's variation in height,
- [00:25:30.870]and maybe a quarter of that variation
- [00:25:33.150]can be explained by the genes that those individuals have.
- [00:25:35.960]The rest of that variation we attribute to the environment,
- [00:25:39.820]which means something other than the genes.
- [00:25:42.150]So that would be what you ate growing up,
- [00:25:44.890]that would be any number of other
- [00:25:47.440]non-genetic influences that determine your final height.
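The picture of many small additive effects plus an environmental contribution can be simulated directly. This is a toy model with assumptions that go beyond the talk: 3,000 loci with two allele copies each, every "taller" allele adding the same 0.5 mm, allele frequency one half, and environmental noise scaled so that genes account for about a quarter of the variation. All parameter names and values are illustrative.

```python
import random
import statistics

N_LOCI = 3000        # loci affecting height (figure from the talk)
EFFECT_MM = 0.5      # assumed effect of each "taller" allele copy
N_PEOPLE = 2000      # size of the simulated population

def genetic_contribution(rng):
    """Sum of small effects over 2 * N_LOCI allele copies.
    Each bit of a random integer acts as one coin flip for one copy."""
    tall_alleles = bin(rng.getrandbits(2 * N_LOCI)).count("1")
    return EFFECT_MM * tall_alleles

rng = random.Random(42)

# Binomial variance n * p * (1 - p), scaled by the squared effect size.
genetic_variance = 2 * N_LOCI * 0.25 * EFFECT_MM ** 2
# Environment gets three times the genetic variance, so genes explain ~1/4.
environment_sd = (3 * genetic_variance) ** 0.5

genetic = [genetic_contribution(rng) for _ in range(N_PEOPLE)]
heights = [g + rng.gauss(0, environment_sd) for g in genetic]

explained_fraction = statistics.pvariance(genetic) / statistics.pvariance(heights)
print(f"share of height variation due to genes: {explained_fraction:.2f}")
```

No single locus matters much in this model, yet together they produce smooth, continuous variation of the kind visible in the class photograph.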
- [00:25:52.640]The reason I really love this figure is it illustrates
- [00:25:54.732]the multifarious nature of genetic control of traits,
- [00:25:58.990]because this is a continuous trait,
- [00:26:01.130]there's 3000 different loci
- [00:26:03.870]are in this population that are affecting height,
- [00:26:06.450]but there is also one gene
- [00:26:07.900]that has an enormous effect on height.
- [00:26:11.096]So all of the men here are dressed in blue,
- [00:26:13.570]all of the women are wearing white,
- [00:26:15.200]and you can see quite clearly the huge effect
- [00:26:17.350]of having a Y chromosome.
- [00:26:19.350]And so this is true not just of height,
- [00:26:20.980]but this is true of many other traits.
- [00:26:23.950]Sex has a big effect,
- [00:26:25.560]biological sex has a big effect on health.
- [00:26:27.820]And so this is just to keep in mind,
- [00:26:30.570]there are big things and there are small things,
- [00:26:32.670]and it's not one or the other.
- [00:26:40.282]So this idea has been encapsulated in this review figure.
- [00:26:47.060]There are rare disease alleles that cause Mendelian disease
- [00:26:50.430]that have certain large effects, but those are very rare.
- [00:26:53.660]So very few diseases are like cystic fibrosis.
- [00:26:57.560]There are many common diseases,
- [00:27:01.030]but those tend to have a genetic architecture
- [00:27:05.430]where they are controlled by many, many genes
- [00:27:07.890]and each gene contributes a very, very small effect.
- [00:27:10.640]And so obviously, these kinds of traits,
- [00:27:14.120]these kinds of diseases are much trickier to work with
- [00:27:17.400]in terms of genetic prediction or in terms of thinking about
- [00:27:20.320]should we use genetic information to target therapies.
- [00:27:27.650]All right.
- [00:27:29.320]It gets worse than that
- [00:27:32.300]with respect to predicting human phenotypes,
- [00:27:35.490]human traits based on genetics.
- [00:27:38.100]And that's because so much of our biology
- [00:27:42.030]is affected by our behavior.
- [00:27:44.460]And our behavior, while there is definitely
- [00:27:47.220]a genetic component to lots of our behavior,
- [00:27:49.510]most of our behavior is learned
- [00:27:51.920]and culturally transmitted through families
- [00:27:54.480]and through our society.
- [00:27:56.000]And so this was revealed by a really cool study last year
- [00:28:00.820]that was done by Ancestry.com, which is,
- [00:28:03.870]I'll talk about them a little more later,
- [00:28:05.590]but one of these sites where you can upload,
- [00:28:08.700]where you can get a test that will
- [00:28:11.500]genotype you across the genome to look for
- [00:28:14.470]regions that indicate where your ancestors may have lived,
- [00:28:17.300]and then Calico which is Google's aging research company.
- [00:28:22.680]So they combined the information,
- [00:28:25.970]and so Ancestry both has DNA, but also individuals
- [00:28:28.790]can upload information that they got simply from pedigrees.
- [00:28:32.570]So, taking family histories.
- [00:28:34.080]And so they were able to put together
- [00:28:35.640]all of this information and came up with a single pedigree
- [00:28:39.420]that consisted of 400 million individuals
- [00:28:42.520]that went back, I don't know, 10 generations or so.
- [00:28:46.080]And so they had a little bit of demographic information
- [00:28:50.440]including how long these people lived.
- [00:28:52.720]So they thought, "All right, let's see if we can do a study
- [00:28:55.007]"estimating the heritability of longevity,"
- [00:28:58.280]that is, and for this study they didn't use
- [00:29:00.690]any DNA information, they just used families.
- [00:29:03.460]So the idea here is
- [00:29:06.210]if something runs in a family
- [00:29:08.850]then that could be because it's genetic,
- [00:29:10.730]that could be because it's culturally transmitted,
- [00:29:12.970]but either way, we would call that the heritability,
- [00:29:15.640]that is the correlation between relatives.
- [00:29:17.910]So they had to, it's a complicated data set
- [00:29:20.490]and they had to do a whole bunch of stuff
- [00:29:22.580]to make sure they were doing it right,
- [00:29:24.730]including sort of stratifying by decade,
- [00:29:26.720]which the heritability of longevity
- [00:29:28.910]shows a lot of variation across decades.
- [00:29:31.430]No idea why that is, it's kind of interesting.
- [00:29:34.440]But they found that the sort of 20 to 30% range
- [00:29:39.870]seemed to be the effect of heritability on longevity,
- [00:29:43.786]that is about 20 to 30% of the variation in lifespan
- [00:29:47.100]could be predicted by which family you are in.
- [00:29:51.300]And so they could see that it was pretty consistent
- [00:29:54.030]regardless of the degree of the relatives, right?
- [00:29:56.170]So for siblings, you predict a certain correlation
- [00:29:59.670]based on their genetic similarity.
- [00:30:01.370]For cousins, you predict a lower correlation
- [00:30:05.100]because they share fewer genes.
- [00:30:07.430]But if you correct for that, the heritability should be,
- [00:30:10.400]again, they have estimated roughly 20 to 30%.
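
The correction for degree of relatedness described above can be sketched in a few lines. This is only an illustration under a simple additive model, in which the expected correlation between two relatives is the fraction of genes they share times the heritability. The function name and the example correlations are hypothetical and are not values from the study; as the talk goes on to explain, shared environment and assortative mating break this assumption.

```python
# Expected fraction of genes shared by descent for a few kinds of relatives.
RELATEDNESS = {
    "siblings": 0.5,
    "first_cousins": 0.125,
}


def heritability_from_correlation(correlation, relatedness):
    """Scale an observed correlation between relatives by how related they are.

    Assumes a purely additive model with no shared environment, so that
    correlation = relatedness * heritability.
    """
    if relatedness <= 0:
        raise ValueError("relatives must share some genes for this estimate")
    return correlation / relatedness


# Hypothetical lifespan correlations, chosen only to show the arithmetic.
sibling_estimate = heritability_from_correlation(0.125, RELATEDNESS["siblings"])
cousin_estimate = heritability_from_correlation(0.03125, RELATEDNESS["first_cousins"])
print(sibling_estimate, cousin_estimate)
```

Unrelated pairs such as spouses have a relatedness of zero, so this model predicts no correlation for them at all, which is why the spouse result discussed next is hard to square with a purely genetic explanation.
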
- [00:30:13.510]Okay.
- [00:30:15.020]But they noticed a couple of other patterns
- [00:30:18.130]that are a little hard to square with this.
- [00:30:21.030]One is that spouses tended to have
- [00:30:24.910]more similar lifespans than siblings.
- [00:30:28.790]So you share on average half of your genes
- [00:30:31.100]with your siblings.
- [00:30:33.020]On average, how many genes do you share with your spouse?
- [00:30:37.360]Hopefully zero, right.
- [00:30:41.800]Yet the correlation between
- [00:30:45.070]spouse lifespan is much higher than siblings.
- [00:30:49.660]Now, notice there are four kinds of siblings here,
- [00:30:52.660]or three, rather.
- [00:30:54.040]Brother brother, sister sister, and then brother sister.
- [00:30:59.200]And you notice the effect of sex here,
- [00:31:01.270]where brother sister lifespans are less correlated
- [00:31:04.610]than same sex siblings, so they're comparing this
- [00:31:08.327]to the opposite sex siblings, which makes sense.
- [00:31:11.950]But still, this suggests that there is
- [00:31:15.320]a strong effect of shared environment,
- [00:31:17.690]and this isn't actually that surprising.
- [00:31:21.960]You obviously get behaviors from the people you live with.
- [00:31:26.010]And so lifestyle choices probably account
- [00:31:29.730]for this correlation between spouses.
- [00:31:32.970]However, they also noticed
- [00:31:35.960]that when they were looking at correlations
- [00:31:38.300]between cousins and cousins-in-law,
- [00:31:42.360]so cousins, you share some genes with them,
- [00:31:44.420]cousins-in-law you do not.
- [00:31:46.670]They noticed that the correlations were pretty similar.
- [00:31:50.750]So in-law relatives have lifespan correlations
- [00:31:53.570]that are almost as strong as blood relative correlations.
- [00:31:57.410]So you might think that that, again,
- [00:31:59.570]is evidence for shared culture, for cultural transmission
- [00:32:03.440]of traits, behaviors that lead to differences in lifespan.
- [00:32:08.320]But they point out that for some of these cousins,
- [00:32:12.300]these are relatives who
- [00:32:15.040]traditionally don't share a household.
- [00:32:17.590]So these are distant enough that,
- [00:32:19.540]at least in the last few generations,
- [00:32:21.900]they probably weren't living together.
- [00:32:23.600]And so they raised an alternative explanation
- [00:32:26.530]for this correlation, which was assortative mating.
- [00:32:30.470]That is the idea that individuals who have behaviors,
- [00:32:33.670]traditions, practices that lead to a long life span
- [00:32:37.520]tend to marry unrelated individuals
- [00:32:39.810]who have similar cultural behavioral practices.
- [00:32:45.040]And vice versa for individuals
- [00:32:46.960]who tend to have shorter lifespans.
- [00:32:49.230]And that, if you think about
- [00:32:51.160]sort of how our societies are stratified
- [00:32:53.200]and how we pair up, that also makes good sense.
- [00:32:57.510]But that kind of assortative mating
- [00:32:59.840]is definitely a problem for genetic association studies.
- [00:33:03.840]Sort of the first, one of the first assumptions
- [00:33:05.980]you need to make for genetic associations
- [00:33:07.950]and for looking at heritability across pedigrees
- [00:33:10.330]is that individuals who come together to produce
- [00:33:13.510]the next generation don't share an environment.
- [00:33:15.930]And this suggests, actually, a quite strong
- [00:33:18.340]sharing of environment.
- [00:33:19.780]Now, this may be that lifespan is the worst trait for this
- [00:33:23.080]because lifespan is gonna be the sum of a lot of behaviors
- [00:33:26.530]over an individual's entire life,
- [00:33:29.090]so maybe for other traits it wouldn't be as bad,
- [00:33:31.340]but it still highlights the danger of
- [00:33:35.610]putting too much of this on the DNA itself.
- [00:33:41.040]All right, so, one moment.
- [00:33:57.580]A lot of the DNA sequencing that has been done
- [00:33:59.600]in the last decade or so has been done by scientists,
- [00:34:03.150]by medical practitioners, by clinicians.
- [00:34:07.330]But there has recently been a real explosion
- [00:34:10.090]in the direct to consumer genetic testing industry.
- [00:34:15.020]And so 23andMe and Ancestry are two of the biggest hitters.
- [00:34:19.740]These are companies where, as I'm sure all of you know,
- [00:34:23.080]you send off a DNA sample, you spit in a tube
- [00:34:25.570]and they genotype you at over 700,000 SNPs,
- [00:34:31.760]so, polymorphisms in your genome.
- [00:34:34.470]And then will tell you, first of all,
- [00:34:39.720]in their database of other individuals who have done this,
- [00:34:43.220]do you have any relatives, can you find close matches,
- [00:34:46.270]can you find more distant matches?
- [00:34:48.550]They can tell you a little bit about
- [00:34:51.070]where your ancestors may have lived.
- [00:34:53.410]And then they are starting to get into,
- [00:34:57.130]in fits and starts,
- [00:35:00.430]giving you information about SNPs that, in some contexts,
- [00:35:03.730]have been linked to phenotypes,
- [00:35:06.010]including health-related phenotypes.
- [00:35:09.220]This has been sort of a tussle with the FDA
- [00:35:11.560]because 23andMe does not wanna be
- [00:35:15.030]a genetic counseling service.
- [00:35:18.250]So they weren't doing it and now they're doing it again
- [00:35:20.740]and we'll see how it goes in the future.
- [00:35:23.250]In addition, there are a number of companies,
- [00:35:27.590]websites, databases that don't actually do the sequencing,
- [00:35:32.240]but when you use one of these kits,
- [00:35:35.960]you can download your data.
- [00:35:37.820]And then you can upload it again to a place
- [00:35:41.070]like GEDmatch or FamilyTreeDNA.
- [00:35:46.070]And these, again, the point of these
- [00:35:49.560]are to use your DNA to look for relatives,
- [00:35:52.960]and to look to see if you have
- [00:35:56.050]alleles that indicate something about
- [00:35:58.310]where your recent ancestors might have lived.
- [00:36:05.240]So this works because
- [00:36:09.880]over recent human history, over deeper human history
- [00:36:13.340]we kind of spread out over the globe,
- [00:36:15.240]and then until recently, didn't move too, too much.
- [00:36:19.210]And so that meant that there are alleles
- [00:36:21.820]that got to different places
- [00:36:23.530]or reached different frequencies in different places,
- [00:36:26.790]and those alleles, although they are a minority
- [00:36:29.880]of the polymorphisms in the genome,
- [00:36:31.590]there are plenty of them, and so those can be used
- [00:36:34.610]to find out where your ancestors
- [00:36:37.180]that contributed that DNA may have resided.
- [00:36:39.470]So this is a fairly famous figure from a famous paper now
- [00:36:42.910]in 2008 looking at genetic variation in Europeans.
- [00:36:47.180]And so what's plotted here is a
- [00:36:50.700]statistical summary of the genetic data,
- [00:36:54.240]principal components, so these data have been summarized
- [00:36:57.070]down into two major axes that capture
- [00:36:59.680]most of the variation between individuals
- [00:37:02.710]in their DNA sequences.
- [00:37:04.890]And they are orthogonal to each other,
- [00:37:07.810]and then lo and behold,
- [00:37:11.410]each individual here, so each pair of letters
- [00:37:15.330]is an individual, and their location on this axis,
- [00:37:20.550]on this plot is based on their genetic variation,
- [00:37:23.750]and then their color of the point or the letters
- [00:37:27.010]is where they reside.
- [00:37:30.930]And you can see that there is a striking relationship
- [00:37:34.160]between their genetic variation on this plot
- [00:37:37.930]and the geography of Europe.
- [00:37:40.020]So the colors here match,
- [00:37:42.110]these are individuals from Spain and Portugal.
- [00:37:44.920]These are individuals from Russia,
- [00:37:46.780]Finland, Ireland, Great Britain.
- [00:37:49.600]And so this striking correspondence
- [00:37:52.250]between genetic variation and geography
- [00:37:55.320]exists because for a number of generations,
- [00:37:58.490]individuals didn't move around too much.
- [00:38:01.090]And so those alleles that were at higher frequency
- [00:38:03.920]in this spot stayed there,
- [00:38:05.940]and consequently those individuals,
- [00:38:08.450]then when they emigrated to the United States
- [00:38:11.050]and had children and grandchildren, great grandchildren,
- [00:38:13.750]you can see in the DNA sequences of those individuals
- [00:38:17.930]the signature of where they came from.
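
The principal components summary behind that figure can be sketched on made-up data. This is a toy illustration, not the analysis from the 2008 paper: the two populations, their allele frequencies (0.8 versus 0.2), and all function names are assumptions, and the components are found with plain power iteration rather than a statistics library.

```python
import random


def simulate_genotypes(n_per_population, n_snps, rng):
    """Toy genotypes (0, 1 or 2 copies of an allele) for two populations."""
    rows, labels = [], []
    for population in ("A", "B"):
        for _ in range(n_per_population):
            row = []
            for snp in range(n_snps):
                # Assumed frequencies: each allele is common (0.8) in one
                # population and rarer (0.2) in the other.
                common_in_a = snp % 2 == 0
                freq = 0.8 if (population == "A") == common_in_a else 0.2
                row.append(sum(rng.random() < freq for _ in range(2)))
            rows.append(row)
            labels.append(population)
    return rows, labels


def center_columns(rows):
    """Subtract each SNP's mean so the axes describe variation, not averages."""
    means = [sum(column) / len(rows) for column in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def principal_components(rows, k=2, iterations=50, seed=0):
    """Scores of each individual on the top k axes, via power iteration."""
    rng = random.Random(seed)
    rows = [list(row) for row in rows]
    all_scores = []
    for _ in range(k):
        axis = [rng.gauss(0, 1) for _ in rows[0]]
        for _ in range(iterations):
            scores = [dot(row, axis) for row in rows]
            # Multiply by the transposed data: weight each row by its score.
            axis = [dot(scores, column) for column in zip(*rows)]
            norm = dot(axis, axis) ** 0.5
            axis = [a / norm for a in axis]
        scores = [dot(row, axis) for row in rows]
        all_scores.append(scores)
        # Deflate: remove this axis so the next one is orthogonal to it.
        rows = [[x - s * a for x, a in zip(row, axis)]
                for row, s in zip(rows, scores)]
    return all_scores
```

Calling `simulate_genotypes(20, 200, random.Random(1))`, centering the result, and plotting the two score lists from `principal_components` against each other should place the two toy populations in separate clusters along the first axis, which is the same kind of structure the European map shows at much finer resolution.
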
- [00:38:21.830]So these databases have really been amazing
- [00:38:25.430]for finding relatives.
- [00:38:27.660]They've been used a lot by people who were adopted
- [00:38:30.300]to find their biological parents.
- [00:38:33.050]But they have also proven to be a boon for law enforcement.
- [00:38:37.950]And so this was really dramatically shown
- [00:38:41.280]recently with the identification
- [00:38:45.000]of the Golden State Killer, a serial killer
- [00:38:47.130]who terrorized California for decades.
- [00:38:50.090]And so the way that they found him
- [00:38:53.390]was there was DNA that was left at the crime scenes
- [00:38:56.540]way back then, and law enforcement uploaded
- [00:39:00.750]that DNA to this database,
- [00:39:03.850]which had, at the time, about a million profiles in it.
- [00:39:08.350]And they were able to find a partial match.
- [00:39:10.870]So this was a relative who was a,
- [00:39:13.970]they shared a great great great great grandfather,
- [00:39:17.090]so a third or fourth level cousin.
- [00:39:21.050]And that, they were then able to build from that
- [00:39:23.750]a family tree, so all of the individuals
- [00:39:26.430]who are connected between the DNA found at the crime scene
- [00:39:29.470]and this individual in the database,
- [00:39:31.530]and that was like 1000 people.
- [00:39:33.440]And from those 1000 people, they were then able to use
- [00:39:37.690]age and sex and where they lived
- [00:39:40.670]to narrow it down to very few candidates.
- [00:39:44.070]And ultimately, the alleged
- [00:39:48.910]individual was identified by
- [00:39:53.990]going to his environs
- [00:39:56.790]and taking DNA from his trash or his car door handle.
- [00:40:02.990]And that then matched the DNA
- [00:40:04.700]that was taken from the crime scene.
- [00:40:07.040]So, this has revealed a lot of
- [00:40:11.630]really sticky issues regarding genetic privacy.
- [00:40:16.170]And so this is basically,
- [00:40:20.290]it was said and I agree with this, the wild wild west.
- [00:40:24.150]We don't know what the rules are,
- [00:40:26.420]we don't know what the rules should be,
- [00:40:28.110]and the rules are changing under our feet.
- [00:40:30.980]So shortly after this,
- [00:40:34.950]GEDmatch changed their terms
- [00:40:37.540]so that you now have to voluntarily opt in
- [00:40:41.380]if you want your DNA to be publicly available.
- [00:40:44.200]They also said we don't want law enforcement
- [00:40:46.840]pretending to be someone uploading fake profiles
- [00:40:50.480]in order to find people.
- [00:40:52.080]Both Ancestry and 23andMe take,
- [00:40:55.330]or at least have said they've taken,
- [00:40:56.950]strong precautions to prevent this sort of thing,
- [00:40:58.980]so individuals' DNA profiles for those companies
- [00:41:02.900]should be unavailable to law enforcement or to the public.
- [00:41:08.800]But I think it's really worth questioning
- [00:41:14.520]what level of genetic privacy do we expect is ideal,
- [00:41:19.680]is the right balance between protecting individuals
- [00:41:22.690]and allowing law enforcement
- [00:41:25.080]access to what is undoubtedly a very powerful tool?
- [00:41:29.090]So, one concern that's come up
- [00:41:31.600]are ways in which these databases can be attacked.
- [00:41:35.950]And so just last week, couple weeks ago,
- [00:41:39.840]two papers were published showing ways that you can
- [00:41:43.310]identify individuals, anonymous individuals in a database
- [00:41:47.390]if you upload lots of pretend genomes,
- [00:41:51.800]pretend DNA that has particular
- [00:41:54.880]SNPs that you know that you're looking for.
- [00:41:57.320]And as of the time that these were published,
- [00:42:00.500]they contacted Ancestry and GEDmatch
- [00:42:02.957]and some of these other databases,
- [00:42:04.580]and the ways that they described to do this,
- [00:42:08.000]they could do it.
- [00:42:08.880]So there are safeguards that these companies,
- [00:42:11.520]that these databases could take to prevent this,
- [00:42:13.940]but they have not done so yet.
- [00:42:18.780]It's also, I think, worth considering that last step
- [00:42:22.690]in identifying the Golden State Killer.
- [00:42:25.230]We leave DNA everywhere.
- [00:42:28.280]Like, I leave this room, you could come up to this podium
- [00:42:31.380]and prove that I gave this talk, right?
- [00:42:33.770]We shed cells, those cells contain our DNA,
- [00:42:37.260]and it goes away eventually,
- [00:42:39.410]but it would be trivial for law enforcement or anyone else
- [00:42:42.900]to go to your garbage, to go to your car,
- [00:42:45.720]to go to something that you've touched and collect your DNA.
- [00:42:49.910]So is that part of the privacy,
- [00:42:52.810]am I entitled to people staying away from the cells
- [00:42:55.513]that I'm shedding into the environment?
- [00:42:58.160]And then as of two hours ago,
- [00:43:00.970]the New York Times reported that a judge issued a warrant
- [00:43:06.050]to a police officer for GEDmatch.
- [00:43:10.610]So now under this warrant, it doesn't matter
- [00:43:14.680]whether individuals have opted out or not,
- [00:43:17.510]it doesn't matter whether the owners of GEDmatch
- [00:43:20.020]said you can't do that, they are compelled
- [00:43:22.680]by this warrant to comply with law enforcement.
- [00:43:26.190]And so it may be that in the coming months,
- [00:43:29.890]it seems like if the public and government
- [00:43:32.840]side with law enforcement,
- [00:43:35.040]it may be that none of these databases are secure.
- [00:43:39.720]And it's worth considering also,
- [00:43:42.130]you may not have done one of these tests,
- [00:43:45.110]you may not have uploaded your profile to GEDmatch,
- [00:43:49.710]but has anyone even remotely closely related to you?
- [00:43:53.950]If a third or a fourth cousin, like I have no idea
- [00:43:55.890]who my third or fourth cousins are.
- [00:43:57.650]But if they've put their data up there,
- [00:43:59.560]then you can be found.
- [00:44:01.890]So this is really, we're at a critical moment,
- [00:44:05.230]I think, for these privacy concerns.
- [00:44:09.390]Okay, I wanna end here by talking a little bit about
- [00:44:12.870]the nature of human genetic variation.
- [00:44:17.110]There are differences between populations,
- [00:44:19.710]so we originated in Africa, there we are,
- [00:44:23.040]and spread all over the world.
- [00:44:25.000]We share a common ancestor in Africa
- [00:44:27.040]a few hundred thousand years ago.
- [00:44:29.640]Since then we have diverged
- [00:44:31.600]from each other in various phenotypes,
- [00:44:34.580]and those tend to be the ones that we often focus on.
- [00:44:38.970]But while those differences are real and genetically based,
- [00:44:44.230]they are actually the result of a very small fraction
- [00:44:47.380]of the variation that's in the human genome.
- [00:44:49.930]So what's shown here are pie charts
- [00:44:52.800]indicating within each of these populations
- [00:44:55.790]how the genetic variation
- [00:44:57.120]is apportioned across populations.
- [00:44:59.740]So in these pie charts, the gray colors are variants,
- [00:45:04.120]polymorphisms that are shared either across all continents
- [00:45:07.200]or across multiple continents.
- [00:45:08.500]So these are people in Yoruba have A's and T's,
- [00:45:13.900]people in Finland have A's and T's at that site.
- [00:45:17.650]So those are polymorphisms
- [00:45:19.270]that are shared across all humans.
- [00:45:22.160]And then the colored pies are either private to a continent,
- [00:45:26.830]so Africa or Asia or Europe or the Americas,
- [00:45:30.000]or private to a population within,
- [00:45:32.230]and you can see that in most of these circles,
- [00:45:35.530]it is dominated by the gray colors, right?
- [00:45:37.710]So, particularly in Europe,
- [00:45:39.510]very small fraction of the variants
- [00:45:41.370]are private to those populations.
- [00:45:44.390]Now those, again, are responsible
- [00:45:47.010]for some phenotypic differences that we have
- [00:45:49.450]traditionally focused on and thought of as significant.
- [00:45:52.490]So it's not to say that there aren't phenotypic differences,
- [00:45:55.180]just that most of that variation is not specific
- [00:45:58.640]to a population, it's shared by all people.
- [00:46:03.490]Another couple of points I wanna make about genealogies
- [00:46:08.380]that I think are, at least were
- [00:46:09.530]a little counterintuitive to me,
- [00:46:11.910]thinking about all of our ancestors, on average,
- [00:46:14.170]I said, one over two to the K of our genome
- [00:46:16.860]is inherited from each ancestor
- [00:46:18.590]that lived K generations ago.
- [00:46:21.160]But that average, again, is misleading
- [00:46:23.100]because of the stochastic nature
- [00:46:24.510]of transmission of chromosomes.
- [00:46:27.180]So, right, remember here,
- [00:46:29.860]for chromosomes 20 through 14, or 13 in this example,
- [00:46:33.960]I inherited either from Dad, from his mom
- [00:46:36.540]or from his dad, but not pieces of both.
- [00:46:40.670]And as a result, this is what your genome
- [00:46:43.760]actually looks like in these ancestors.
- [00:46:47.470]And so you can see that right at about generation 10,
- [00:46:51.740]in fact, most, the majority of your genealogical ancestors,
- [00:46:57.530]you carry none of their DNA.
- [00:47:00.100]And the further back you go, the worse that gets,
- [00:47:02.880]or the more extreme that gets.
- [00:47:05.305]So the further back you go, the more ancestors you have,
- [00:47:08.510]but most of them will have contributed
- [00:47:10.490]none of the DNA in your cells.
- [00:47:12.940]Now, that doesn't make them any less your ancestors,
- [00:47:15.720]that doesn't mean that they weren't instrumental
- [00:47:18.860]in your family being who it is,
- [00:47:21.710]like all of them had to survive and have offspring
- [00:47:23.790]for you to be here, but not all of them
- [00:47:26.840]transmitted their DNA to you.
- [00:47:29.340]And in fact, this is, again, a simulation,
- [00:47:31.680]so this is one example, but you can see here
- [00:47:33.770]down on the bottom is the paternal line.
- [00:47:36.850]So this is my father, my father's father,
- [00:47:38.550]father's father, et cetera.
- [00:47:40.290]And you can see in this example,
- [00:47:42.500]after one, two, three, four, five, six,
- [00:47:44.640]in the seventh generation,
- [00:47:47.390]there's no material inherited from that individual.
- [00:47:50.050]So because our society is patriarchal,
- [00:47:52.490]that's probably where my name would come from,
- [00:47:55.000]my Y chromosome maybe,
- [00:47:57.670]these are the autosomes, not the Y chromosome,
- [00:47:59.580]but other than that, I have no genetic
- [00:48:02.340]inheritance from that individual.
- [00:48:04.720]And you can see, actually, the same thing
- [00:48:06.300]along the maternal line happens at this generation.
- [00:48:09.230]So we have a lot of genealogical ancestors,
- [00:48:11.540]we have many fewer genetic ancestors.
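
The gap between genealogical and genetic ancestors can be reproduced with a small simulation. This is a rough sketch under stated assumptions, not the simulation shown on the slide: it tracks 22 autosomes, gives every chromosome the same assumed genetic length of 1.5 Morgans, and places crossovers at random at a rate of one per Morgan. All names and constants are illustrative.

```python
import random

N_CHROMOSOMES = 22   # autosomes only, as in the talk's example
CHROM_LENGTH = 1.5   # assumed genetic length in Morgans, same for every chromosome


def crossover_points(rng, length):
    """Random crossover positions along one chromosome, one per Morgan on average."""
    points, position = [], rng.expovariate(1.0)
    while position < length:
        points.append(position)
        position += rng.expovariate(1.0)
    return points


def split_between_parents(segments, rng):
    """Trace the segments carried from one ancestor back to that ancestor's parents."""
    boundaries = [0.0] + crossover_points(rng, CHROM_LENGTH) + [CHROM_LENGTH]
    source = rng.randrange(2)  # which parent the first stretch comes from
    pieces = ([], [])
    for left, right in zip(boundaries, boundaries[1:]):
        for start, end in segments:
            lo, hi = max(start, left), min(end, right)
            if lo < hi:
                pieces[source].append((lo, hi))
        source = 1 - source  # switch parent at every crossover
    return pieces


def genetic_ancestors(generations, rng):
    """Number of ancestors in each generation who contributed any autosomal DNA."""
    # Generation 1: each parent contributes one full set of chromosomes.
    current = {parent: {c: [(0.0, CHROM_LENGTH)] for c in range(N_CHROMOSOMES)}
               for parent in (0, 1)}
    counts = [len(current)]
    for _ in range(generations - 1):
        previous = {}
        for ancestor, chromosomes in current.items():
            for chrom, segments in chromosomes.items():
                halves = split_between_parents(segments, rng)
                for which, pieces in enumerate(halves):
                    if pieces:
                        previous.setdefault(2 * ancestor + which, {})[chrom] = pieces
        current = previous
        counts.append(len(current))
    return counts


counts = genetic_ancestors(12, random.Random(0))
for generation, count in enumerate(counts, start=1):
    print(generation, 2 ** generation, count)
```

Each printed row gives the generation, the number of genealogical ancestors (two to the K), and the number of those who left any DNA in this one simulated genome. The exact numbers depend on the random seed and on the assumed chromosome lengths, but the second count falls further and further behind the first as you go back.
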
- [00:48:15.600]The flip side of that is that,
- [00:48:19.641]remember we have two to the K ancestors
- [00:48:23.010]K generations ago, that means 30 generations back,
- [00:48:26.300]which is like 700, 800 years ago,
- [00:48:30.430]each of us has 1 billion ancestors.
- [00:48:33.290]But, of course, there weren't
- [00:48:34.590]a billion people on the planet then.
- [00:48:37.050]So what that means is that at some point
- [00:48:38.830]you start to have repeat ancestors, right?
- [00:48:41.160]So if this is a pedigree, here I am, my parents,
- [00:48:43.500]my four grandparents, going back and back,
- [00:48:45.990]at some point, there are individuals
- [00:48:47.750]who are your ancestor multiple times over.
- [00:48:50.950]And again, as we go further back in time
- [00:48:53.410]and the number of genealogical ancestors
- [00:48:56.270]continues to increase exponentially,
- [00:48:58.860]it will be the case that
- [00:49:01.880]most of your ancestors are repeat ancestors.
- [00:49:04.830]And in fact, it's been mathematically determined
- [00:49:07.540]that in a population of about 100,000,
- [00:49:10.830]after 29 generations back,
- [00:49:14.090]everyone who lived then and has any descendants
- [00:49:18.060]is an ancestor for everyone in the population.
- [00:49:21.530]Now, it's not clear what the human population size
- [00:49:24.760]should be if it was historically 100,000
- [00:49:27.140]or something different, but it's not gonna be far off.
- [00:49:30.270]So when we think about Ancestry and you do your DNA test
- [00:49:33.680]and you say, "My ancestors came from
- [00:49:35.837]"Belarus or from Mexico,"
- [00:49:38.900]that's your ancestors a few generations back.
- [00:49:41.900]You only have to go back a few more
- [00:49:44.200]before that doesn't really mean anything.
- [00:49:46.770]You only have to go back 30 or 40 generations
- [00:49:48.930]before it's clear that we all have
- [00:49:50.670]the same set of ancestors.
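
The doubling arithmetic in this part of the talk is easy to check. A minimal sketch follows; the 25 years per generation is an assumption chosen to match the "700, 800 years" figure, and the function names are illustrative.

```python
YEARS_PER_GENERATION = 25  # assumed average; not a figure from the talk


def genealogical_ancestors(generations_back):
    """Slots in the family tree K generations back: two to the K."""
    return 2 ** generations_back


def expected_genome_fraction(generations_back):
    """Average share of the genome inherited from one ancestor: one over two to the K."""
    return 1 / 2 ** generations_back


for k in (1, 2, 10, 30):
    print(k, k * YEARS_PER_GENERATION, genealogical_ancestors(k),
          expected_genome_fraction(k))
```

At 30 generations the tree has 1,073,741,824 slots, more than the number of people alive at the time, so many of those slots must be filled by the same individuals.
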
- [00:49:54.220]So, if we go back even further,
- [00:49:57.720]DNA sequencing has taught us that
- [00:50:01.190]there are some branches on the human tree
- [00:50:03.210]that we thought sort of went extinct,
- [00:50:05.020]but it turns out that they didn't really.
- [00:50:07.340]So we've sequenced ancient DNA sequences
- [00:50:10.200]from Neanderthal and Denisovan fossils.
- [00:50:13.210]We've now gotten really good at getting DNA
- [00:50:14.890]out of old bones that have been sitting in cold caves
- [00:50:17.270]for a long time.
- [00:50:18.460]And these are two branches on the human tree
- [00:50:22.130]that left Africa many hundreds of thousands of years
- [00:50:26.320]before the last exodus from Africa.
- [00:50:29.630]And they colonized, Neanderthals colonized Europe and Asia,
- [00:50:33.230]the Denisovans went through Asia
- [00:50:34.860]and got all the way to New Guinea.
- [00:50:37.440]And from fossils, we've known Neanderthals for a while,
- [00:50:40.490]and we thought culturally they went extinct,
- [00:50:42.580]they had a particular culture that sort of vanishes
- [00:50:44.920]once anatomically modern humans moved in.
- [00:50:49.210]But it turns out that this sequencing has also revealed
- [00:50:51.930]that there were multiple introgression events,
- [00:50:54.410]that is there was an interbreeding between the lineage
- [00:50:57.670]that gave rise to modern humans and the lineage
- [00:51:00.320]that gave rise to Neanderthals and Denisovans.
- [00:51:03.240]And we know that this happened after we left Africa
- [00:51:06.300]in Europe, because for individuals
- [00:51:09.070]that have European and Asian ancestry, we find
- [00:51:11.290]on average one to 4% of their DNA
- [00:51:14.170]can be traced to a Neanderthal ancestor,
- [00:51:16.500]but that is very, very small or absent
- [00:51:19.230]among individuals with African ancestry.
- [00:51:21.540]So this interbreeding happened outside of Africa
- [00:51:24.240]after we had exited Africa.
- [00:51:26.800]Similarly, we see evidence for three to 5% of inheritance
- [00:51:31.030]in individuals who live in Melanesia and Australia
- [00:51:34.280]that is attributable to the Denisovan lineage.
- [00:51:38.130]Okay, so obviously we haven't just been
- [00:51:41.520]sequencing humans this whole time.
- [00:51:43.750]We've been sequencing DNA
- [00:51:44.950]from a whole bunch of other things.
- [00:51:46.470]And I'd like to finish by pointing out
- [00:51:48.200]that one of the things that's revealed is,
- [00:51:50.160]again, sort of how similar we all are.
- [00:51:52.980]So this is the amino acid sequence,
- [00:51:56.350]so our DNA encodes information to make proteins
- [00:52:00.130]and proteins have 20 letters instead of four.
- [00:52:02.940]This is a protein called cytochrome c.
- [00:52:06.790]And it is the cytochrome c, two copies of it from mouse,
- [00:52:11.010]one from horse, one from human,
- [00:52:12.480]and then one from budding yeast,
- [00:52:14.690]so the organism that makes us beer and bread.
- [00:52:17.010]And you can see quite clearly in this alignment
- [00:52:20.060]that many of these positions in the protein are the same.
- [00:52:23.340]And that's because this is the same gene.
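
How similar two aligned protein sequences are can be summarized as a percent identity. The sketch below assumes the sequences are already aligned to the same length, with `-` marking gaps. The example strings are toy sequences invented for illustration, not the real cytochrome c sequences from the slide.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent of aligned, non-gap positions where the two sequences match."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            continue  # skip positions where either sequence has a gap
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no positions to compare")
    return 100.0 * matches / compared


# Toy sequences differing at one of ten positions.
print(percent_identity("ACDEFGHIKL", "ACDEYGHIKL"))
```
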
- [00:52:26.260]So this gene existed in the common ancestor of us
- [00:52:30.030]and horses and mice and yeast about a billion years ago,
- [00:52:34.410]and it did the same thing that it does today.
- [00:52:37.330]It functions in the mitochondria,
- [00:52:38.650]which you recall is this glowing firestorm thing
- [00:52:41.410]in our cells that gives us energy.
- [00:52:43.670]And so over a billion years,
- [00:52:46.200]this gene has been doing the same thing.
- [00:52:48.900]It's been functioning in oxidative respiration,
- [00:52:51.430]powering these cells, and so
- [00:52:54.910]maybe one thing to take away from all of this DNA sequence
- [00:52:58.030]is how similar we are not just to each other,
- [00:53:00.900]but to all other organisms.
- [00:53:02.270]So I'll just leave with this picture of my favorite
- [00:53:05.380]organism, well, my two favorite organisms,
- [00:53:09.650]which are, you know, share about 50% of their genes,
- [00:53:12.700]and every day we're finding out are more similar
- [00:53:15.000]to each other than we previously thought.
- [00:53:17.600]And so with that, I will end.
- [00:53:19.994](audience applauding)
- [00:53:29.500]So it seems like you're afraid
- [00:53:32.460]of the police coming in and using this kind of data.
- [00:53:37.710]But what is your, it seemed like you were shying away
- [00:53:41.290]from maybe what you perceive could happen.
- [00:53:44.700]What is your worst fear with--
- [00:53:48.250]Yeah, I was just sort of thinking about...
- [00:53:54.714]I will say, I don't know.
- [00:53:57.220]I have not yet, I think, imagined
- [00:54:01.420]the worst thing that could be done.
- [00:54:04.540]Certainly,
- [00:54:08.665]I guess one perspective would be, well, so what?
- [00:54:12.320]Maybe it doesn't matter if everyone knows
- [00:54:14.320]what my DNA sequence is.
- [00:54:16.530]I think in a world where,
- [00:54:21.310]that would be great if that didn't matter.
- [00:54:24.860]But I can imagine lots of nefarious purposes.
- [00:54:29.010]So for example, I can imagine
- [00:54:32.810]our current government deciding that
- [00:54:35.100]the DNA that they are taking forcibly from immigrants
- [00:54:39.750]and putting into a database reveals alleles
- [00:54:43.340]that are associated with unlawfulness,
- [00:54:46.500]which is garbage,
- [00:54:48.010]but you could imagine them making that claim.
- [00:54:51.120]You could imagine insurance companies
- [00:54:53.460]harvesting DNA information to try and put you
- [00:54:56.200]into certain risk groups.
- [00:54:58.400]Aha, you've got this APOL1 allele,
- [00:55:01.110]so we know that you're more likely to develop
- [00:55:03.440]hypertension, so your premiums are gonna go up.
- [00:55:06.650]So, I guess you could imagine
- [00:55:13.720]people discovering, yeah, I mean, I guess
- [00:55:17.480]I haven't imagined all of the bad things that could happen,
- [00:55:20.960]and I'm not sure what they are yet.
- [00:55:23.940]But I think it's maybe naive to think that
- [00:55:26.130]someone won't find out a way to misuse this stuff.
- [00:55:29.820]Thank you.
- [00:55:36.421](audience member speaking faintly)
- [00:55:56.090]I mean, there are many positives.
- [00:55:59.190]Certainly, genetic medicine is real.
- [00:56:03.250]And there are definitely,
- [00:56:08.970]people are taking an approach
- [00:56:11.780]that's sort of been used in animal breeding for a while now,
- [00:56:14.770]to generate what are called polygenic risk scores.
- [00:56:16.980]So this is the idea that for a trait that's like height
- [00:56:19.430]that's controlled by lots of genes all over the genome,
- [00:56:22.520]you just sequence someone's genome,
- [00:56:24.910]figure out all of their polymorphisms, add it up,
- [00:56:27.840]and determine their risk score for a particular trait.
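
The adding-up step behind a polygenic risk score can be sketched as a weighted sum. This is a minimal illustration, not a clinical method: the SNP names and effect sizes below are hypothetical, and treating an untyped SNP as zero copies is a simplification that a real analysis would handle more carefully.

```python
def polygenic_score(genotype, effect_sizes):
    """Sum of per-allele effects times the number of copies carried (0, 1 or 2).

    genotype maps SNP name to the count of the effect allele; a SNP that is
    missing from the genotype is treated as zero copies (a simplification).
    """
    return sum(effect * genotype.get(snp, 0)
               for snp, effect in effect_sizes.items())


# Hypothetical per-allele effects on some trait, in arbitrary units.
effects = {"snp_a": 0.5, "snp_b": 0.25, "snp_c": -0.125}
person = {"snp_a": 2, "snp_b": 1, "snp_c": 0}
print(polygenic_score(person, effects))
```

For a trait like height the real sum would run over many thousands of SNPs, each with a very small effect, which is the genetic architecture described earlier in the talk.
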
- [00:56:31.430]And it seems like for some common
- [00:56:35.030]medical conditions like hypertension,
- [00:56:37.370]like cholesterol level,
- [00:56:39.950]polygenic risk scores work about as well
- [00:56:42.690]as taking biomarkers, right?
- [00:56:44.700]So if you go to the doctor
- [00:56:46.460]and they take your cholesterol levels,
- [00:56:48.030]if it's above a certain level, they may prescribe you
- [00:56:51.724]a statin or some other medication.
- [00:56:54.580]And it seems like if instead of taking that blood
- [00:56:58.630]and doing that test, if they instead
- [00:57:00.480]just looked at your genome,
- [00:57:01.630]in some cases, that would be about as good.
- [00:57:04.590]So that may be a real boon for predicting risk
- [00:57:09.130]and for managing different propensities for diseases.
- [00:57:12.740]I mean, it's still the case that for all of these
- [00:57:14.970]conditions, you can look at your risk score
- [00:57:18.400]or you can eat less bacon, and like it still works,
- [00:57:22.120]it doesn't matter what your genes are,
- [00:57:23.980]the behavior will change your risk.
- [00:57:27.020]Certainly, I don't wanna pretend that
- [00:57:31.240]the use of forensics is never appropriate
- [00:57:34.670]in law enforcement, I mean,
- [00:57:37.740]since the Golden State Killer, there are 70 cold cases
- [00:57:40.820]that have been solved with the use
- [00:57:42.680]of genetic genealogy detective work.
- [00:57:46.180]So there are real upsides, and I think that's the challenge.
- [00:57:49.700]If there were no upsides, it would sort of be easy
- [00:57:52.860]to say, all right,
- [00:57:56.875]we want to protect privacy,
- [00:57:58.590]we want to focus on stuff other than DNA.
- [00:58:01.780]But the challenge is that there are,
- [00:58:03.870]it is so powerful in the medical realm,
- [00:58:06.960]it is so powerful in the forensic realm.
- [00:58:09.940]And so I think that leads to
- [00:58:14.010]a danger that we're rushing into stuff
- [00:58:16.190]before we have all the safeguards in place.
- [00:58:22.240](audience member speaking faintly)
- [00:58:36.430]I would say for most simple genetic diseases,
- [00:58:40.230]it's been done.
- [00:58:41.570]And a lot of it was done before
- [00:58:43.490]the advent of these association studies,
- [00:58:45.290]it was done with old fashioned linkage mapping
- [00:58:47.770]where you would use pedigrees
- [00:58:50.100]and you had like one marker per chromosome,
- [00:58:54.390]so most of those, we've already identified those genes.
- [00:58:58.950]I think in my limited understanding,
- [00:59:02.300]treatment is gonna be specific
- [00:59:04.450]for each gene, each disease.
- [00:59:09.380]So for cystic fibrosis, this triple drug cocktail,
- [00:59:15.320]one of the drugs improves the
- [00:59:19.250]movement of the protein to the membrane.
- [00:59:22.870]So, cystic fibrosis, I should say,
- [00:59:27.000]there's one mutation that's super common.
- [00:59:28.920]So, the majority of cases of cystic fibrosis
- [00:59:31.900]are because individuals have at least one copy
- [00:59:33.620]of this one mutation, but there are lots of other mutations,
- [00:59:36.260]and this drug treatment is ideal for the common mutation.
- [00:59:40.180]So if an individual has cystic fibrosis
- [00:59:41.780]because of other mutations, it may or may not work.
- [00:59:43.890]That particular mutation affects the ability of the protein
- [00:59:47.290]to get to the membrane,
- [00:59:48.520]so one of the drugs helps it do that.
- [00:59:50.680]And then another drug helps it
- [00:59:52.910]be more active when it's there.
- [00:59:55.520]So those are really specific to that protein,
- [00:59:58.800]in fact, that mutation for that protein.
- [01:00:02.290]So it's that question, one question would be
- [01:00:05.850]for all of these diseases and all of these alleles,
- [01:00:08.730]do we want to spend all of the money and effort,
- [01:00:12.030]is that the right approach, is that an efficient approach?
- [01:00:16.040]Because it took 30 years to get from discovery of that gene,
- [01:00:20.110]which is super genetically simple, to a treatment.
- [01:00:24.740]I don't off the top of my head
- [01:00:28.570]know a lot of details about other success stories.
- [01:00:33.260]But I think the answer is that it's,
- [01:00:35.230]you know, that's a lot of hard work,
- [01:00:37.180]and it certainly can be worth doing,
- [01:00:39.460]the individuals who have cystic fibrosis,
- [01:00:41.170]their lives may be transformed by this.
- [01:00:43.690]But it often is a lot of hard work
- [01:00:46.090]to go from sequence to treatment.
- [01:00:54.119]What are you finding out about junk DNA?
- [01:01:00.500]It's not junk.
- [01:01:04.070]So junk DNA is a term that I said, what did I say?
- [01:01:07.900]Most polymorphisms don't do anything,
- [01:01:10.040]and that's because most of the bases in our genome
- [01:01:13.260]don't do anything obvious.
- [01:01:15.720]So in our genome, about 1 to 2% of those bases
- [01:01:19.040]encodes genes, and the rest of it doesn't.
- [01:01:23.310]And so that rest of it was referred to as junk DNA,
- [01:01:26.440]so the sequences that are in between genes,
- [01:01:29.340]the sequences at the ends of the chromosomes,
- [01:01:31.312]in the middle of chromosomes.
- [01:01:33.620]And what we're learning is that,
- [01:01:36.320]yeah, they don't encode proteins,
- [01:01:38.770]and maybe they're not expressed, they don't function,
- [01:01:42.940]but they can have other roles, they can do other things,
- [01:01:48.220]and they can be important
- [01:01:52.010]in scaffolding chromosomes in the nucleus,
- [01:01:56.300]they can be important in turning nearby genes on and off,
- [01:02:00.190]so you know, there definitely is,
- [01:02:02.730]it is true that a majority of the nucleotides
- [01:02:06.240]in the human genome, if you changed any one of them
- [01:02:09.310]from what it is to one of the other three bases,
- [01:02:11.710]it would have no effect.
- [01:02:13.920]But some of these things, there's function
- [01:02:16.380]where we didn't think there was before.
- [01:02:18.090]And some of these things may have function
- [01:02:19.860]not as a single base, but in aggregate.
- [01:02:22.780]It can make a difference if you have, you know,
- [01:02:24.720]a lot of our genome is very repetitive,
- [01:02:26.220]it's the same sequence again and again and again.
- [01:02:28.240]And it can make a big difference if you have two copies
- [01:02:30.650]of that or five copies of that or 500 copies of that.
- [01:02:33.750]So I think there's a lot still to be learned about
- [01:02:37.830]the function of all of these bases,
- [01:02:40.550]particularly the ones that aren't involved
- [01:02:42.210]in coding for proteins.
- [01:02:48.001]So it's a continued challenge in science
- [01:02:50.649]to communicate to those who are
- [01:02:53.176]maybe on the outside of science,
- [01:02:54.831]with the current trends in genetics,
- [01:02:56.059]do you see that challenge becoming more complex
- [01:02:58.677]or less complex or somewhere in the middle?
- [01:03:01.640]Oh, I think it's becoming more complex.
- [01:03:03.290]But I have seen, I mean, there's been this
- [01:03:07.210]explosion in genetic counseling programs.
- [01:03:11.310]So this is medical providers whose job it is
- [01:03:14.700]to interpret genetic information for patients.
- [01:03:18.730]And so that suggests to me that
- [01:03:21.830]there is a new or a growing demand for
- [01:03:26.230]that kind of communication
- [01:03:27.760]explicitly in the medical community.
- [01:03:31.260]So I think it's getting harder,
- [01:03:34.050]and I think, I didn't talk much about this,
- [01:03:36.040]I think one of the reasons it's getting harder
- [01:03:37.410]is that there's a lot of people
- [01:03:39.230]trying to make a quick buck on this, right?
- [01:03:41.380]So you can take your ancestry results,
- [01:03:43.480]so your 23andMe results, and for a small fee,
- [01:03:46.970]you can send those to Nutrigenomix
- [01:03:49.500]which will optimize your diet
- [01:03:51.350]to your particular genetic profile.
- [01:03:53.440]Or you could just eat healthy.
- [01:03:55.620]You can send it to Soccer Genomics,
- [01:03:58.470]and they will provide a training regimen
- [01:04:02.230]specified for your genotype,
- [01:04:04.070]or you could just go out there and do the sprints.
- [01:04:06.850]So I think that it's not gonna get simpler,
- [01:04:11.700]both because the biology is complex,
- [01:04:14.320]and because there's a sucker born every minute
- [01:04:17.140]and there's someone at the other end of that
- [01:04:18.730]standing to make a buck.
- [01:04:26.036]Thank you so much for coming, have a good night.
- [01:04:28.175](audience applauding)
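The polygenic risk score idea described in the answer at about 00:56:16 (look at a person's polymorphisms across the genome and add them up into a single score for a trait) can be illustrated with a minimal sketch. The speaker gives no formula, so the weighted sum below is an assumption based on the common additive form of such scores. The variant names, effect sizes, and the function name `polygenic_risk_score` are hypothetical and not taken from the lecture.

```python
from typing import Mapping


def polygenic_risk_score(
    genotype: Mapping[str, int],
    effect_sizes: Mapping[str, float],
) -> float:
    """Add up effect-allele counts, each weighted by a per-variant effect size.

    genotype maps a variant name to the number of copies of the effect
    allele the person carries (0, 1, or 2). effect_sizes maps the same
    names to weights; in practice these would come from an association
    study, but here they are made-up numbers.
    """
    score = 0.0
    for variant, weight in effect_sizes.items():
        # Assumption: a variant missing from the genotype counts as 0 copies.
        count = genotype.get(variant, 0)
        if count not in (0, 1, 2):
            raise ValueError(f"allele count for {variant} must be 0, 1, or 2")
        score += weight * count
    return score


# Hypothetical effect sizes and genotype, for illustration only.
effect_sizes = {"variant_a": 0.5, "variant_b": -0.25, "variant_c": 1.0}
genotype = {"variant_a": 2, "variant_b": 1, "variant_c": 0}

score = polygenic_risk_score(genotype, effect_sizes)
print(score)
```

A score like this only ranks genetic propensity. As the speaker notes, behavior still changes the risk regardless of the score.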