James Bovaird: “Adaptation in the Social, Behavioral and Education Sciences”
Nebraska Center for Research on Children, Youth, Families and Schools
12/19/2018
Description
2018 Methodology Applications Series
James Bovaird
Director, Nebraska Academy for Methodology, Analytics and Psychometrics
This presentation will introduce three broad categories of adaptive methodologies that have been, are becoming, or should become familiar to researchers in the social, behavioral and education sciences.
Searchable Transcript
- [00:00:00.000](upbeat instrumental music)
- [00:00:06.250]Thank you very much.
- [00:00:07.310]I am Jim Bovaird, I'm the Director of the MAP Academy,
- [00:00:10.430]of the Nebraska Academy
- [00:00:11.620]for Methodology Analytics and Psychometrics.
- [00:00:13.990]This is the first installment
- [00:00:15.720]of the 2018-2019 Methodology Applications Series,
- [00:00:20.340]sometimes just referred to as the MAP apps.
- [00:00:25.039]Some familiar faces in the room, some unfamiliar faces.
- [00:00:29.000]I've seen some of you in multiple contexts.
- [00:00:31.770]Maybe I will see some of you in other contexts
- [00:00:34.570]over the next couple semesters, couple years.
- [00:00:38.330]As I said, I'm the Director of the MAP Academy.
- [00:00:41.100]I'm also an Associate Professor
- [00:01:42.690]in the Department of Educational Psychology
- [00:01:44.770]where I'm the Training Director for the Quantitative,
- [00:00:47.010]Qualitative and Psychometric Methods Program
- [00:00:50.240]and also a courtesy Associate Professor
- [00:00:52.430]in the SRAM program as well.
- [00:00:54.470]A lot of my courses are cross listed with the SRAM program
- [00:00:59.030]and have been over the years.
- [00:01:00.530]So today, I'm going to talk to you about adaptation
- [00:01:03.990]in social behavioral and educational sciences,
- [00:01:06.990]implications for measurement, intervention and evaluation.
- [00:01:10.760]And what I'm going to talk about today
- [00:01:12.380]are essentially three areas
- [00:01:16.610]that some may view as distinct.
- [00:01:19.150]But I'm gonna hopefully be successful
- [00:01:21.610]in drawing the parallels between them
- [00:01:23.900]and show you how they are related to each other.
- [00:01:27.530]In fact, in a lot of ways, close cousins,
- [00:01:30.950]maybe even step-siblings of each other.
- [00:01:33.720]And how these methodologies that some view as being new
- [00:01:39.090]and cutting-edge are actually over a century old,
- [00:01:43.100]common in other disciplines
- [00:01:45.560]and now making their way into the social sciences.
- [00:01:49.449]Okay, so the idea of adaptive methodologies,
- [00:01:52.650]it actually has been around for over 100 years,
- [00:01:56.360]traced back to the early 1900s,
- [00:01:58.290]with the advent of intelligence testing.
- [00:02:02.130]The three broad categories
- [00:02:03.140]I'm gonna talk to you about today,
- [00:02:05.940]are adaptive testing, adaptive interventions
- [00:02:09.050]and adaptive designs, okay?
- [00:02:11.470]So we're gonna focus on measurement, intervention
- [00:02:13.480]and evaluation.
- [00:02:17.100]We'll start off talking about adaptive testing
- [00:02:19.940]and judging by the fact that I'm just gonna assume
- [00:02:23.710]a majority of you in the room are graduate students,
- [00:02:26.930]and so you probably have taken a computerized adaptive test
- [00:02:30.200]'cause you probably took the GRE
- [00:02:31.610]to get here in the first place,
- [00:02:33.680]we're gonna start off with adaptive testing
- [00:02:35.600]to kind of set up some of the ideas
- [00:02:38.930]of adaptive methodologies.
- [00:02:42.620]An adaptive test adapts the assessment tool
- [00:02:45.670]to refine the measurement process, in an effort
- [00:02:49.960]to make the measurement process more efficient.
- [00:02:53.810]Then I'm gonna move into adaptive interventions.
- [00:02:55.940]This is the area that's, in some ways,
- [00:02:58.560]viewed as the newest of the three, but what
- [00:03:02.250]I'm going to try to show is
- [00:03:04.090]that it's essentially adaptive testing
- [00:03:05.910]applied to the intervention.
- [00:03:08.840]So we're going to talk about in particular the SMART design,
- [00:03:13.830]that's becoming quite popular over the last 10 to 15 years.
- [00:03:17.420]And then I'm going to go back to the granddaddy,
- [00:03:20.790]the area that started this in the first place,
- [00:03:23.370]the adaptive design, the sequential design of experiments
- [00:03:26.910]and show how those can be used for evaluating
- [00:03:30.010]an adaptive intervention as an example, all right.
- [00:03:35.870]So like I said, these have been around for over 100 years.
- [00:03:39.370]The tradition really started back in the early 1900s
- [00:03:42.480]with Alfred Binet and the development
- [00:03:44.400]of intelligence testing around that time.
- [00:03:50.040]The majority of the real meat of the work
- [00:03:53.640]was done in the first half of the century,
- [00:03:56.100]around the time of the World Wars.
- [00:03:58.710]And you'll see in blue,
- [00:03:59.840]and I highlighted those in particular,
- [00:04:01.890]'cause you've probably seen those names
- [00:04:03.430]attached to a number of statistics
- [00:04:05.210]that you've learned about over the years
- [00:04:06.610]in your statistics courses.
- [00:04:09.190]In particular, the sequential probability ratio test
- [00:04:12.300]developed in 1943 by Abraham Wald and his colleagues
- [00:04:15.950]in the Statistical Research Group at Columbia,
- [00:04:20.470]their sequential probability ratio test
- [00:04:22.470]is the mechanism that makes the adaptive test
- [00:04:25.120]possible, right?
- [00:04:27.560]This also launched the complementary field
- [00:04:29.420]of sequential analysis.
- [00:04:33.180]Then there's a little bit of a gap.
- [00:04:34.550]In the '60s, Armitage publishes his book
- [00:04:37.600]and that book is what put sequentially designed experiments
- [00:04:42.640]really on the map.
- [00:04:44.110]And since that time, the sequential designed experiment
- [00:04:47.510]versus what I'll later show you as a fixed design,
- [00:04:50.950]the sequentially designed experiment is very prevalent
- [00:04:54.170]in the biomedical fields, pharmaceutical fields,
- [00:04:57.710]but hasn't made its way over into the social sciences.
- [00:05:01.910]'80s saw the advent of computerized adaptive testing,
- [00:05:05.190]although Binet's early testing programs
- [00:05:08.620]were actually adaptive, they became much more widespread
- [00:05:12.620]and feasible by the explosion of the computational abilities
- [00:05:16.530]that we saw in the '80s
- [00:05:18.407]and particularly the development of item response,
- [00:05:21.140]the advancements of item response theory.
- [00:05:24.490]And then here in the last 15, 10, 15, 20 years,
- [00:05:29.440]we've seen advances and then taking some
- [00:05:32.720]of this adaptive thinking
- [00:05:34.840]and applying it to interventions themselves.
- [00:05:39.950]All right, so start off with adaptive testing here.
- [00:05:43.620]So like I said, I'm gonna make an assumption
- [00:05:45.670]that most of you are young enough
- [00:05:47.240]that you, yeah, I took the pencil-and-paper version,
- [00:05:50.970]okay, but most of you probably took the adaptive test,
- [00:05:54.910]or at least a computerized version of the GRE,
- [00:05:57.770]or whatever other entrance exams
- [00:05:59.670]you might have taken for schooling.
- [00:06:03.230]The idea of the adaptive test is
- [00:06:04.967]it starts off administering an item that is appropriate
- [00:06:09.350]for what it assumes your ability level is.
- [00:06:12.140]And so most of the time we just assume
- [00:06:14.120]that you're average, no offense okay?
- [00:06:16.297]But we assume that you're average okay?
- [00:06:19.620]Based on that response to that question
- [00:06:23.050]or that set of questions, then the test adapts.
- [00:06:26.820]If you get it right, you get harder questions.
- [00:06:28.880]If you don't get it right, you get easier questions.
- [00:06:32.530]And because of item response theory,
- [00:06:37.510]and here's an equation of a 2PL model here at the bottom.
- [00:06:40.470]I apologize, it might be a little blurry for you,
- [00:06:43.520]some things show up better on the big screen,
- [00:06:45.820]or on the little screen than they do on the large.
- [00:06:49.250]But through the advent of item response theory,
- [00:06:51.880]we were able to tag items based on their difficulty,
- [00:06:54.810]or on some of the characteristics,
- [00:06:56.650]and then those are used to determine
- [00:06:59.350]what is a harder question or an easier question.
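(To make that concrete, here is a minimal sketch, not from the slides, of the 2PL item response function mentioned above; `a` is the item's discrimination and `b` its difficulty, and the sample values are made up:)

```python
import math

def p_correct_2pl(theta, a, b):
    """2PL item response function: probability of a correct response
    given ability theta, discrimination a, and difficulty b."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

# A harder item (b = 1.0) vs. an easier item (b = -1.0)
# for an average examinee (theta = 0):
hard = p_correct_2pl(0.0, a=1.5, b=1.0)
easy = p_correct_2pl(0.0, a=1.5, b=-1.0)
```

Note that when ability matches difficulty (theta = b), the 2PL probability is exactly 0.5, which is why difficulty and ability live on the same scale.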
- [00:07:04.590]The primary benefit of the adaptive test
- [00:07:06.780]is that we can reach the same precision level
- [00:07:09.690]as a fixed-length test, with savings
- [00:07:12.630]that have been reported up to as much as 50%.
- [00:07:16.090]So an old 100-item test may be tailored
- [00:07:20.730]to be a 50-item test.
- [00:07:25.413]And we said this is made possible
- [00:07:26.680]through item response theory.
- [00:07:29.090]Item response theory is what someone might refer to
- [00:07:31.700]as modern measurement, versus classical test theory.
- [00:07:34.630]It's a model-based approach to measurement.
- [00:07:38.010]And in that,
- [00:07:38.960]every item basically has a set of characteristics.
- [00:07:41.610]Three most common are its difficulty,
- [00:07:44.840]its discriminability and then a guessing parameter,
- [00:07:49.410]its likelihood of guessing response.
- [00:07:51.750]And we can represent the relationship
- [00:07:53.690]between the difficulty of the item
- [00:07:56.250]corresponding to the ability of a person
- [00:07:58.230]and the likelihood on the y-axis there of a response,
- [00:08:01.620]by what we refer to as an item response function.
- [00:08:04.960]Okay, sometimes it's also referred
- [00:08:06.480]to as item characteristic curve,
- [00:08:08.210]but as multi-level modeling has become more popular,
- [00:08:10.380]ICC has been more attached to multi-level modeling
- [00:08:13.780]and IRT has resorted
- [00:08:15.100]to the item response function terminology.
- [00:08:18.320]And we can see these lines on the screen.
- [00:08:21.410]They vary in terms of the slope at the inflection point.
- [00:08:27.150]They vary in terms of where that inflection point is located
- [00:08:31.180]left to right and they also vary on this lower asymptote.
- [00:08:35.920]The slope is the discriminability,
- [00:08:37.810]the location is the difficulty and this lower asymptote
- [00:08:41.070]is potentially the guessing parameter.
- [00:08:42.920]So this item would have a higher likelihood
- [00:08:46.250]of a guessing response than some of the others.
- [00:08:52.920]Now, based on that item response function,
- [00:08:56.790]we also get what we could refer to as information
- [00:08:59.380]about the item, the amount of information
- [00:09:01.549]that response contributes to estimation
- [00:09:03.900]of the latent trait okay?
- [00:09:05.950]So there are three of those item response functions
- [00:09:10.260]shown on this graph and three corresponding,
- [00:09:12.160]one, two, three hills,
- [00:09:16.530]kind of look like normal distributions.
- [00:09:18.730]The locations of these hills,
- [00:09:20.920]these are the item information functions
- [00:09:24.480]and they're located at that inflection point.
- [00:09:27.640]The peak shows the amount of information
- [00:09:30.650]that that item provides.
- [00:09:32.660]So this first item on the left, that has the steepest slope,
- [00:09:35.540]provides the most information.
- [00:09:37.130]It has the strongest item total correlation,
- [00:09:39.660]if you want to think classical test theory.
- [00:09:42.520]The item on the right has the lowest slope,
- [00:09:46.180]lowest discrimination, the lowest item total correlation,
- [00:09:49.200]and so it provides the least overall information.
- [00:09:54.370]Tests are generally constructed to maximize information.
- [00:09:59.370]It's someplace, at some point, on the continuum,
- [00:10:03.090]so if there is a cut score, if you're trying to use the test
- [00:10:05.690]to determine whether someone passes or fails,
- [00:10:09.100]has mastered content or not,
- [00:10:11.750]then the information should be located
- [00:10:13.460]around where that cut point is.
- [00:10:15.360]That means most of the items on the test
- [00:10:17.350]should be appropriate for discriminating between individuals
- [00:10:19.870]who are just above or just below the cut point.
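(As a sketch of that idea, not from the talk, a mastery test aimed at a cut point can be assembled by keeping the items whose difficulty sits closest to the cut; the item bank, represented as hypothetical (discrimination, difficulty) pairs, and the cut value are made up:)

```python
def items_near_cut(bank, cut, n):
    """Build a mastery test by keeping the n items whose difficulty (b)
    lies closest to the cut point, so test information piles up there.
    Items are (discrimination, difficulty) pairs."""
    return sorted(bank, key=lambda item: abs(item[1] - cut))[:n]

# Hypothetical item bank; cut point at difficulty b = 0.5
bank = [(1.2, -1.5), (0.8, 0.4), (1.5, 0.6), (1.0, 2.0), (1.1, 0.5)]
chosen = items_near_cut(bank, cut=0.5, n=3)
```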
- [00:10:24.160]On a more general test,
- [00:10:26.490]then the item discrimination function,
- [00:10:29.530]which in this case would be the test information function,
- [00:10:32.440]which is the sum
- [00:10:33.273]of the individual item information functions,
- [00:10:36.000]would look more like a Southwest U.S. Plateau kind of thing,
- [00:10:41.470]with a lot of information, but spread over a broader range.
- [00:10:46.800]So information can be thought of as the inverse
- [00:10:49.400]of the standard error of measurement,
- [00:10:51.170]which is the curve
- [00:10:55.960]that's plotted in dashes there.
- [00:10:58.590]So, where there's less information,
- [00:11:01.610]there's a larger standard error.
- [00:11:03.530]Where information is maximized,
- [00:11:05.530]the standard error of measurement is at its smallest.
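(That inverse relationship can be sketched in a few lines, assuming the standard 2PL information formula I(θ) = a²P(1 − P); the items used below are hypothetical, not from the talk:)

```python
import math

def p_2pl(theta, a, b):
    """2PL probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def item_information(theta, a, b):
    """Fisher information of a 2PL item: I(theta) = a^2 * P * (1 - P).
    It peaks at theta = b; steeper items (larger a) peak higher."""
    p = p_2pl(theta, a, b)
    return a * a * p * (1.0 - p)

def sem(theta, items):
    """Standard error of measurement: inverse square root of the test
    information, i.e. the sum of the item information functions."""
    info = sum(item_information(theta, a, b) for a, b in items)
    return 1.0 / math.sqrt(info)
```

So where the items pile up information, `sem` is small, and out in the tails, where little information is contributed, `sem` grows.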
- [00:11:11.810]The idea of the adaptive test, again,
- [00:11:13.620]is utilizing this information provided by item response theory,
- [00:11:16.580]especially this concept
- [00:11:17.700]of that standard error of measurement, that precision,
- [00:11:23.440]that adapting the test allows us to have.
- [00:11:26.390]All examinees are administered
- [00:11:27.580]that moderately difficult item, that average item;
- [00:11:30.810]missing that item results in a lower ability estimate.
- [00:11:34.950]Getting it right gives you a higher ability estimate.
- [00:11:38.560]And another question or another set of questions follows.
- [00:11:42.680]Using item response theory,
- [00:11:43.940]because those items have characteristics,
- [00:11:46.900]then we can estimate, we can adaptively, as we go,
- [00:11:51.390]estimate the individual's ability level
- [00:11:53.730]after each item is administered
- [00:11:55.100]or after each testlet or block of items,
- [00:11:57.120]depending upon the design of the test.
- [00:12:01.720]Testing continues until an algorithm
- [00:12:04.260]that's based on that sequential probability ratio test
- [00:12:07.200]that Wald came up with
- [00:12:08.250]back in the first half of the 20th century,
- [00:12:11.810]that algorithm identifies the difficulty
- [00:12:14.230]at which the respondent will miss, or get correct,
- [00:12:17.770]depending on if you're a glass-half-full
- [00:12:19.510]or half-empty kind of person, 50% of the items.
- [00:12:24.670]Information then for that individual,
- [00:12:28.400]because those items are tailored around
- [00:12:30.080]where that individual's level is,
- [00:12:33.050]information on the test is maximized
- [00:12:35.260]at that individual's level.
- [00:12:37.410]Therefore, it's tailored.
- [00:12:41.740]That's where the efficiency,
- [00:12:42.710]that's where the 50% savings can come in.
- [00:12:45.428]If it's a high-ability person,
- [00:12:46.530]why ask them the easy questions?
- [00:12:48.590]If they're a low-ability person,
- [00:12:49.830]why ask them the hard questions
- [00:12:51.090]that you know they're gonna get incorrect?
- [00:12:54.590]Efficiency.
- [00:12:56.020]So stopping rules are based
- [00:12:57.280]on either logistical constraints,
- [00:12:59.000]So yeah, there's only so much time you can sit down
- [00:13:01.210]and take the test and, if you're all over the place,
- [00:13:02.970]they'll eventually stop you anyway.
- [00:13:05.760]Or again, that sufficiently small standard error,
- [00:13:08.540]that precision of the standard error of measurement.
- [00:13:14.890]Just a diagram of how the branching works,
- [00:13:18.520]and it's kind of like a decision tree, all right?
- [00:13:21.930]So you start off with the first question.
- [00:13:23.380]If you get it correct, you go to the left;
- [00:13:24.890]if you get it incorrect, to the right,
- [00:13:26.660]then you get a new question.
- [00:13:27.800]Let's say they get question 2A or 2B, and it goes on.
- [00:13:32.090]And two individuals can end up
- [00:13:35.160]eventually getting the same question here, question 3B,
- [00:13:39.540]but there can be different pathways to reach that result.
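(A toy version of that branching loop, not from the talk: the closest-difficulty item selection and the shrinking-step ability update below are deliberately crude stand-ins for a real CAT engine's maximum-information selection and maximum-likelihood scoring, but they show the give-an-item, score, re-estimate, check-precision cycle:)

```python
import math

def p_2pl(theta, a, b):
    """2PL probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def adaptive_test(bank, answer, start_theta=0.0, max_items=20, se_target=0.4):
    """Toy adaptive-test loop: administer the unused item whose difficulty
    is closest to the current ability estimate, nudge the estimate up on a
    correct response and down on an incorrect one, and stop once the
    standard error of measurement is small enough or items run out."""
    theta, administered = start_theta, []
    for step in range(max_items):
        remaining = [item for item in bank if item not in administered]
        if not remaining:
            break
        a, b = min(remaining, key=lambda item: abs(item[1] - theta))
        administered.append((a, b))
        # crude shrinking-step update in place of a real ML/EAP estimate
        delta = 1.0 / (step + 1)
        theta += delta if answer(a, b) else -delta
        # test information so far -> standard error of measurement
        info = sum(ia * ia * p_2pl(theta, ia, ib) * (1 - p_2pl(theta, ia, ib))
                   for ia, ib in administered)
        if 1.0 / math.sqrt(info) < se_target:
            break
    return theta, administered

# A high-ability examinee (answers everything correctly) is pushed
# toward harder and harder items, just like the branching diagram:
bank = [(1.0, b) for b in (-2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0)]
theta, given = adaptive_test(bank, lambda a, b: True)
```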
- [00:13:45.360]Okay.
- [00:13:47.110]Familiar to most of you, if not all of you,
- [00:13:49.600]'cause you lived it, you experienced it, all right.
- [00:13:53.140]So this is the ideas that I just spoke about
- [00:13:58.350]will carry through in the other two topics.
- [00:14:00.060]This is enough measurement for today, okay.
- [00:14:03.270]Sorry, Dr. Albano.
- [00:14:06.910]All right, we're gonna move on
- [00:14:07.800]to an adaptive intervention now,
- [00:14:09.840]plus, I only have an hour so.
- [00:14:13.360]Adaptive interventions.
- [00:14:15.840]Adaptive interventions
- [00:14:16.840]are gonna utilize individual variables
- [00:14:19.460]to adapt the intervention.
- [00:14:22.060]So instead of using item characteristics
- [00:14:24.030]to adapt to the test,
- [00:14:25.313]it's going to use individual characteristics
- [00:14:27.350]now to adapt the intervention
- [00:14:29.820]and then utilize individual outcomes to readapt.
- [00:14:33.500]So just like items are reassigned
- [00:14:35.450]based on the previous response,
- [00:14:37.690]the intervention may change
- [00:14:39.670]based on the participant's responsivity
- [00:14:42.340]to the initial intervention.
- [00:14:46.200]So the type or the dosage of the intervention,
- [00:14:48.240]what level of the intervention they receive,
- [00:14:51.150]or is offered to patients is gonna be individualized
- [00:14:53.960]based on their characteristics,
- [00:14:56.180]or their clinical presentation,
- [00:14:57.520]their responsivity to the intervention,
- [00:14:59.507]and we call these tailoring variables.
- [00:15:01.990]So something's being measured along the way,
- [00:15:04.550]progress monitoring.
- [00:15:07.730]And then they'll be readjusted,
- [00:15:09.960]just like in that adaptive test.
- [00:15:11.490]It's a multi-stage process.
- [00:15:16.720]You're going to operationalize the process
- [00:15:19.230]by a set of decision rules.
- [00:15:20.800]So at specific points in the intervention,
- [00:15:24.560]something's going to be assessed,
- [00:15:25.740]a decision is going to be made
- [00:15:27.830]and something may happen as a result.
- [00:15:30.300]Something may be keep going,
- [00:15:32.780]something may be a change up or down.
- [00:15:36.210]So recommendations can be based on those characteristics,
- [00:15:38.540]plus the data monitoring, plus the data,
- [00:15:43.490]as it accumulates.
- [00:15:44.890]Several different synonymous, fairly synonymous,
- [00:15:47.190]terms that are used in the literature
- [00:15:50.240]and have been used over the years.
- [00:15:52.970]Four key elements to this adaptive intervention idea.
- [00:15:56.800]There has to be a sequence of critical decisions.
- [00:16:01.170]What intervention, what dosage do they get first?
- [00:16:05.825]What intervention or dosage do they change to
- [00:16:08.660]if the initial one is unsuccessful,
- [00:16:10.800]or if some change needs to be made for other reasons?
- [00:16:15.180]Then a set of possible options to change to, right?
- [00:16:20.610]And all this should, of course,
- [00:16:22.450]be determined a priori right, not an on-the-fly situation.
- [00:16:29.350]A set of those tailoring variables
- [00:16:31.860]that are going to be used to pinpoint
- [00:16:33.270]when the intervention should be changed or continued.
- [00:16:39.540]And those might be based on again,
- [00:16:41.250]non-response, lack of adherence, unfortunate side effects,
- [00:16:46.220]other contextual information.
- [00:16:50.960]And then there's that set of decision rules,
- [00:16:52.820]one rule per critical decision.
- [00:16:55.610]So for those decision points,
- [00:16:57.423]there has to be a set of rules that govern
- [00:17:00.720]what happens at that decision point.
- [00:17:04.010]So if-then: if the tailoring variable equals X,
- [00:17:07.830]then Y happens as a result.
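(Such an if-then rule can be written down quite literally; a hypothetical sketch, with a made-up tailoring variable, cut score, and dose levels, none of which come from the talk:)

```python
def decision_rule(tailoring_score, current_dose,
                  cut_score=50, doses=("low", "medium", "high")):
    """Hypothetical stage decision rule: if the tailoring variable
    (here a response/adherence score) falls below the cut score, step
    the intervention up one level; otherwise continue as-is."""
    i = doses.index(current_dose)
    if tailoring_score < cut_score:       # non-responsive: intensify
        return doses[min(i + 1, len(doses) - 1)]
    return current_dose                   # responsive: status quo
```

The a-priori cut score is what keeps the rule from getting "fuzzy": a 49 gets stepped up and a 50 does not, with no on-the-fly exceptions.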
- [00:17:13.500]The statistical analysis for an adaptive intervention
- [00:17:15.660]is essentially the same as for fixed interventions;
- [00:17:18.560]you just have more elements to include as predictors.
- [00:17:23.180]The eventual complete design,
- [00:17:27.240]after adaptation happens, may dictate how complicated
- [00:17:32.270]your factorial ANOVA is going to be.
- [00:17:34.950]But the principles, the modeling doesn't necessarily
- [00:17:38.420]have to be any more complex than in a traditional,
- [00:17:42.690]more fixed design, okay?
- [00:17:45.230]So things like sampling control groups,
- [00:17:47.060]assignment to conditions, etc,
- [00:17:50.670]research questions are essentially the same.
- [00:17:56.080]Random assignment is key if you're going to maintain
- [00:18:00.290]your causal inference.
- [00:18:03.140]And the replicability of this is going to be linked
- [00:18:05.750]to now fidelity of implementation of the intervention,
- [00:18:09.350]especially adherence to the decision rules.
- [00:18:15.000]Not that the rules can't be fuzzy,
- [00:18:17.300]but the fuzzier they are,
- [00:18:19.790]if the tailoring variable's got a cut score of 50,
- [00:18:23.500]ah, but I really wanna give that 49 the next level,
- [00:18:26.680]you know, that's the fuzziness.
- [00:18:28.190]And the more fuzzy it gets,
- [00:18:33.329]the odds of a lack of replicability,
- [00:18:36.240]blah, I can't talk today, the odds go up.
- [00:18:41.710]So if I consider this,
- [00:18:44.350]and participants typically have variable responsiveness
- [00:18:47.820]to treatments and interventions,
- [00:18:49.450]the effectiveness isn't always consistent.
- [00:18:53.780]Now how many times have you taken medicine,
- [00:18:55.300]you feel good the first day and then you slide back in,
- [00:18:58.740]and so maybe you up your dosage,
- [00:19:03.610]or drop your dosage?
- [00:19:07.170]Emerging characteristics, potential for a relapse.
- [00:19:10.810]Sometimes these interventions are expensive.
- [00:19:13.270]And so why continue the high cost
- [00:19:16.210]high resource allocation intervention
- [00:19:18.370]when something less expensive, less intrusive,
- [00:19:22.800]might be working, so ethical considerations,
- [00:19:25.370]as well as financial.
- [00:19:26.850]And just adherence is always, always a challenge.
- [00:19:32.670]So response to intervention,
- [00:19:35.640]especially in the educational settings,
- [00:19:37.110]is a fairly commonplace example
- [00:19:40.070]of a type of adaptive design.
- [00:19:43.560]So use this as a little more concrete example.
- [00:19:46.330]It's an approach to academic interventions
- [00:19:48.780]that provides early, systematic support toward appropriate
- [00:19:53.180]grade or age-level standards.
- [00:19:56.470]We're gonna promote, (sniggers) I lost a U
- [00:19:58.950]and somehow spell check didn't catch it right there.
- [00:20:01.320]Promotes academic success through universal screening,
- [00:20:04.350]early intervention, frequent progress monitoring.
- [00:20:08.690]But then intensive research-based instruction,
- [00:20:11.520]data-driven decision-making
- [00:20:16.370]to guide interventions for those students who struggle.
- [00:20:19.510]We call it,
- [00:20:20.343]this isn't multi-level in terms of the modeling per se,
- [00:20:23.570]but it's also sometimes referred to as multi-tiered.
- [00:20:26.810]So multi-level approach is adjusted and modified,
- [00:20:29.770]adapted as needed.
- [00:20:32.436]Some can consider this a special case
- [00:20:34.810]of the multi-tiered system of support,
- [00:20:36.340]which is a more broad umbrella concept
- [00:20:40.440]that encompasses both RTI and PBIS.
- [00:20:48.120]So you can think of it as a pyramid with tiers.
- [00:20:51.050]Tier one, it's going to be
- [00:20:52.340]the research-based core instruction,
- [00:20:53.950]most commonly used strategies.
- [00:20:55.800]If those don't work for some individuals,
- [00:20:58.490]then they get tier two.
- [00:21:00.340]Tier two will be the middle level here.
- [00:21:02.660]The secondary level interventions become more intensive
- [00:21:05.760]because the students were considered to be at greater risk.
- [00:21:07.950]They weren't as responsive to the business-as-usual.
- [00:21:11.280]Third tier, then, is that intensive top tier.
- [00:21:14.360]Only those who really need it
- [00:21:16.230]are going to receive that more intense and consistent support.
- [00:21:24.620]So again, the primary characteristics
- [00:21:27.050]are screening all right, so testing,
- [00:21:30.190]so that we have some empirical information.
- [00:21:34.790]Cut points then attach to that screening.
- [00:21:38.000]So think of these as tailoring variables and a cut point
- [00:21:43.460]for decision-making.
- [00:21:46.060]Those decisions, when made, are data-based, right,
- [00:21:49.540]not SQL-database, but evidence-based.
- [00:21:55.660]And that determines what they move on to next, if necessary,
- [00:22:00.210]or maybe they just stay.
- [00:22:03.820]And there's continuous monitoring.
- [00:22:07.000]So that's that fidelity piece.
- [00:22:13.757]It's a more common, no, I shouldn't say more common,
- [00:22:18.230]the hot, the sexy, (student chuckling)
- [00:22:23.810]the key keyword, adaptive intervention
- [00:22:29.480]that is coming onto our radar here in the last 10 years,
- [00:22:33.620]is the idea of a SMART,
- [00:22:35.680]a Sequential Multiple Assignment Randomized Trial.
- [00:22:40.240]So basically it's a protocolized version
- [00:22:42.930]of an adaptive intervention.
- [00:22:47.280]And this idea of a SMART or these just in general,
- [00:22:50.540]just more protocolized, there's a protocol, a set of rules
- [00:22:54.780]that you follow just like any experimental design.
- [00:22:59.210]This type of thinking is used to develop
- [00:23:03.870]an adaptive intervention,
- [00:23:05.630]not evaluate whether it works or not,
- [00:23:07.940]but to develop it.
- [00:23:09.070]So maybe a SMART design could be used to develop an RTI,
- [00:23:15.580]a Response To Intervention program, in a certain setting,
- [00:23:19.950]and do so empirically with some degree
- [00:23:22.210]of causal inference behind it.
- [00:23:26.820]It involves multiple intervention stages.
- [00:23:31.160]Each stage corresponds, then,
- [00:23:32.520]with that decision that has to be made
- [00:23:34.880]in that adaptive intervention process.
- [00:23:37.220]Each participant moves through the stages,
- [00:23:39.870]so a decision is made.
- [00:23:42.070]The difference here between,
- [00:23:45.180]say the response to intervention,
- [00:23:47.700]where those who need it get the next level,
- [00:23:51.140]is that, in this case, there's a random assignment process
- [00:23:55.960]that's tied to it and that's where we tie back
- [00:23:57.880]into the inference of causality.
- [00:24:01.380]So at each stage,
- [00:24:02.213]participants are randomly reassigned
- [00:24:04.010]to one of the several intervention options.
- [00:24:06.320]So still have a number of options
- [00:24:07.890]that are determined a priori,
- [00:24:13.290]but now, instead of those who need it moving up,
- [00:24:17.810]it's those for whom the intervention isn't working:
- [00:24:20.640]let's randomly assign them to try potentially something new
- [00:24:24.930]and evaluate whether that works.
- [00:24:27.740]That's where again,
- [00:24:28.573]we preserve that potential for a causal inference.
- [00:24:32.580]Okay, we use it to develop the adaptive intervention
- [00:24:35.510]rather than evaluate whether it's better than control.
- [00:24:38.490]That doesn't say we can't have a control condition within it
- [00:24:41.530]as one of the options,
- [00:24:44.360]but it's not necessarily in itself an RCT.
- [00:24:49.192]And, as I said, it could be used to develop
- [00:24:50.750]those tiered kind of interventions.
- [00:24:53.890]Once a SMART is conducted,
- [00:24:55.970]it can be followed up by that randomized trial
- [00:24:58.170]to evaluate whether, of all the pathways,
- [00:25:00.800]of all the options through the adaptive intervention,
- [00:25:04.950]if you follow this pathway,
- [00:25:07.370]does that work better than business-as-usual?
- [00:25:11.700]So an example here, and this is in general statements:
- [00:25:16.850]multiple stages, starting with stage zero.
- [00:25:19.280]So before you start
- [00:25:20.490]you're going to determine these intervention options.
- [00:25:23.650]You're gonna rank them in order of increasing intensity,
- [00:25:26.916]okay, and/or scope.
- [00:25:28.350]So we'll just call them A, B, C and D.
- [00:25:30.610]Stage One, participants are going to be randomized
- [00:25:32.790]to two or more possible intervention conditions.
- [00:25:36.360]Okay, so somewhere on that continuum,
- [00:25:38.310]that ranking continuum, probably in the middle somewhere.
- [00:25:41.330]So if B is not working, you got somewhere to go down to.
- [00:25:45.110]And if C isn't working as well,
- [00:25:47.970]you got something to ramp up.
- [00:25:51.730]After X period of time, so the amount of time
- [00:25:54.210]that the intervention is allocated,
- [00:25:56.520]then they're classified as being either responsive,
- [00:25:59.320]the intervention worked or didn't.
- [00:26:02.070]So across those two categories, we get to Stage Two.
- [00:26:05.270]The non-responders from Stage One are re-randomized,
- [00:26:08.430]to two or more treatment conditions,
- [00:26:10.010]and those who respond are re-randomized to two conditions.
- [00:26:17.660]Those may be equal, just continue going, status quo,
- [00:26:21.340]or some change, if necessary.
- [00:26:24.170]So maybe they are in the higher condition,
- [00:26:26.350]that's working really well,
- [00:26:27.280]so maybe they actually get re-randomized
- [00:26:28.780]to a less resource-heavy, a less intensive treatment,
- [00:26:34.510]to see if that maintains the responsiveness as well.
- [00:26:39.370]And then, depending upon how many interventions,
- [00:26:41.320]how many branches you're gonna put in here,
- [00:26:42.700]you may continue to repeat.
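(The staged re-randomization just described can be sketched as follows; the participant IDs, option labels A through D, and the responder classification are all placeholders, not from the talk:)

```python
import random

def smart_stage(participants, options, rng):
    """One SMART stage: randomly assign each participant to one of the
    stage's pre-specified intervention options."""
    return {p: rng.choice(options) for p in participants}

rng = random.Random(0)  # seeded for a reproducible illustration

# Stage One: everyone is randomized between the two middle-intensity options.
stage1 = smart_stage(range(8), ["B", "C"], rng)

# After X period of time, classify responders vs. non-responders (a
# placeholder outcome here), then re-randomize each subgroup to its own
# a-priori option set.
responders = [p for p in stage1 if p % 2 == 0]
non_responders = [p for p in stage1 if p % 2 == 1]
stage2 = {**smart_stage(responders, ["continue", "step down"], rng),
          **smart_stage(non_responders, ["step up", "switch"], rng)}
```

Note that, unlike RTI, even responders get re-randomized (e.g. to a step-down option), which is what preserves the causal inference.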
- [00:26:48.410]So as an example, okay, so in the next slide
- [00:26:51.703]I'll just show the branching.
- [00:26:53.100]It'll look similar to the adaptive test.
- [00:26:54.713]It's just oriented left to right instead of top to bottom.
- [00:26:57.740]I do have to thank Dr. Kogel, because this is an example
- [00:27:01.395]that's gonna be in her forthcoming chapter
- [00:27:05.290]that she was kind enough to lend me,
- [00:27:08.766]and so I'm acknowledging her so that if you don't like it,
- [00:27:11.980]it's her fault. (students laughing)
- [00:27:15.770]So it's a potential SMART design
- [00:27:17.130]for evaluating adaptations
- [00:27:20.010]within a hypothetical homeschool consultation intervention.
- [00:27:23.700]So targets parents with low engagement
- [00:27:25.990]in the child's behavioral plan.
- [00:27:28.920]So in this, there are four adaptive interventions
- [00:27:31.010]that could be compared.
- [00:27:33.350]And so that you'll see in the next slide
- [00:27:35.540]some of the pathways here.
- [00:27:36.890]So we're gonna start with in-person consultation
- [00:27:40.262]in the first stage and you'll continue with that,
- [00:27:44.210]with that in-person consultation
- [00:27:46.000]if the parent is responding to the intervention,
- [00:27:49.860]or they may be randomly assigned to intensify
- [00:27:52.270]that in-person consultation, if they're not responding.
- [00:27:59.390]Another branch, they can start
- [00:28:00.400]with that in-person consultation.
- [00:28:02.460]They can change to a distance-based consultation
- [00:28:04.820]if the parent is responding, a less burdensome condition,
- [00:28:12.430]or they could intensify that in-person consultation
- [00:28:14.890]if the parent is not responding.
- [00:28:22.140]You can start with the distance-based consultation, right,
- [00:28:25.670]and drop down to the in-person,
- [00:28:28.080]if the distance-based wasn't enough.
- [00:28:30.420]We can start with a distance-based
- [00:28:32.170]and we can increase to the intensified
- [00:28:38.420]if it's not sufficient.
- [00:28:41.390]So once you identify what the optimal intervention
- [00:28:44.080]is going to be, you can test its efficacy
- [00:28:46.500]by then comparing it to a business-as-usual or control.
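The four embedded adaptive interventions described above can be written down as explicit decision rules. This is a minimal sketch, not code from the talk or the forthcoming chapter; the treatment labels ("in-person", "distance", "intensified in-person") are illustrative stand-ins for the hypothetical home-school consultation conditions.

```python
# Each embedded adaptive intervention is a triple:
# (first-stage treatment, stage-2 treatment if responsive,
#  stage-2 treatment if non-responsive).

def stage_two_treatment(adaptive_intervention, responded):
    """Return the stage-2 treatment dictated by one embedded adaptive
    intervention, given the tailoring variable (parent responsiveness)."""
    first_stage, if_respond, if_not = adaptive_intervention
    return if_respond if responded else if_not

# The four embedded adaptive interventions in this hypothetical SMART:
AIS = [
    ("in-person", "in-person", "intensified in-person"),  # continue vs. intensify
    ("in-person", "distance",  "intensified in-person"),  # step down vs. intensify
    ("distance",  "distance",  "in-person"),              # continue vs. step up
    ("distance",  "distance",  "intensified in-person"),  # continue vs. intensify
]

for ai in AIS:
    print(ai[0], "-> responsive:", stage_two_treatment(ai, True),
          "| non-responsive:", stage_two_treatment(ai, False))
```

The SMART's re-randomization at each stage is what lets these four decision rules be compared within one trial.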
- [00:28:50.340]Yes.
- [00:28:51.173]Hi, I just want a clarification.
- [00:28:52.490]It sounds like on here that you're changing conditions
- [00:28:56.070]based on the performance of the parents
- [00:28:58.290]or the family, what have you,
- [00:28:59.640]but in the previous slide, you said that non-responders
- [00:29:02.090]and responders will be randomly assigned to another group
- [00:29:04.940]so am I missing something?
- [00:29:08.010]Yeah, so sorry, I probably just wasn't clear with this.
- [00:29:12.160]All right, so initial randomization to two conditions.
- [00:29:15.170]One condition receiving
- [00:29:16.240]the in-person homeschool consultation,
- [00:29:18.760]the other group receiving the distance-based
- [00:29:20.140]home school consultation.
- [00:29:22.220]Within those who receive
- [00:29:23.260]the in-person home school consultation,
- [00:29:25.475]let's say the tailoring variable is their responsiveness
- [00:29:28.620]to that intervention.
- [00:29:29.880]If they're responsive,
- [00:29:31.160]then we can re-randomly assign them
- [00:29:34.520]to one of two conditions, either to continue
- [00:29:37.340]with that in-person homeschool consultation,
- [00:29:39.830]or randomly assigned
- [00:29:41.340]to receive the distance-based homeschool consultation.
- [00:29:44.410]If they weren't responsive,
- [00:29:46.150]then they could be placed
- [00:29:50.896]into the intensified condition.
- [00:29:53.290]There could also be a further randomization here as well
- [00:29:57.430]into a new decision.
- [00:29:59.370]The distance-based homeschool consultation,
- [00:30:02.020]measuring the responsiveness.
- [00:30:03.300]If yes, so they have the distance-based,
- [00:30:06.110]they can be randomly assigned to continue with it.
- [00:30:08.890]If the responsiveness is no, they can be randomly assigned
- [00:30:12.890]to go up to the in-person or to the intensified,
- [00:30:16.610]one of the other two remaining conditions.
- [00:30:20.430]So it depends upon the clinical nature of it
- [00:30:23.410]whether a responsive participant
- [00:30:30.170]necessarily needs to be re-randomly assigned, okay?
- [00:30:31.560]These are, again, these are the decision points
- [00:30:33.650]is what happens at each of those stages.
- [00:30:38.090]Does that help?
- [00:30:38.923]Yeah, I probably wasted 30 seconds.
- [00:30:40.640]No that's okay. (everyone laughing)
- [00:30:41.910]And I'm sure I probably said something wrong.
- [00:30:45.940]All right.
- [00:30:47.970]So now is the segue into the last of the three areas
- [00:30:53.090]and I'll acknowledge it's the area
- [00:30:54.650]that I know more about,
- [00:30:55.710]that I'm just personally more interested in,
- [00:30:58.650]is the idea of the SMART
- [00:31:00.850]versus a sequential or adaptive design, okay?
- [00:31:04.040]So adaptive evaluation.
- [00:31:08.260]An adaptive experimental design,
- [00:31:10.000]or evaluation versus a SMART,
- [00:31:12.410]they're both multistage designs,
- [00:31:14.280]they're both gonna be based on accumulating data
- [00:31:17.557]that's going to be used to modify aspects of the study
- [00:31:21.060]while preserving the validity
- [00:31:22.450]and the integrity of the trial.
- [00:31:24.870]Okay, so they have that in common.
- [00:31:26.930]But the differences are, as often is the case,
- [00:31:29.610]in the details.
- [00:31:30.670]So when we're talking about stages.
- [00:31:32.930]In the SMART design, the intervention stages correspond
- [00:31:35.640]with the critical care decision
- [00:31:37.570]and a new intervention level.
- [00:31:39.120]So the intervention is what was changing.
- [00:31:41.940]In a sequentially designed experiment,
- [00:31:45.070]the experimental stage is going to involve new participants.
- [00:31:50.600]And the entry of new participants into the study,
- [00:31:53.890]or new data into the study.
- [00:31:57.770]In terms of the adaptation, in the SMART design,
- [00:32:00.260]the intervention is what's adapted
- [00:32:02.160]based on the response to the prior.
- [00:32:05.030]In a sequential design, where we're going next,
- [00:32:07.370]the randomization probabilities are adapted
- [00:32:10.360]based not on how they performed prior,
- [00:32:14.300]but on the presence, or the need or not,
- [00:32:19.850]does that make sense,
- [00:32:22.250]of data from other participants entering into the study.
- [00:32:26.860]So once the SMART has been developed,
- [00:32:28.840]an adaptive design could be used to evaluate it.
- [00:32:32.000]Now, to make the distinction
- [00:32:33.350]between fixed and sequential designs.
- [00:32:36.250]Fixed is what we've dealt with 99.9% of the time.
- [00:32:41.290]You do an a priori power analysis.
- [00:32:42.810]It tells you you need 100 participants, you go and do it, okay?
- [00:32:47.090]In a fixed design, it's the typical one.
- [00:32:49.630]Sample size, how many, who's in it,
- [00:32:52.080]who's assigned to what condition
- [00:32:54.950]is all determined prior to conducting the experiment.
- [00:32:57.490]So it's fixed.
- [00:33:00.090]Versus a sequential design, a sequential design,
- [00:33:02.650]this is what Armitage made.
- [00:33:04.540]This was based out of Wald's work
- [00:33:07.270]back during World War II.
- [00:33:08.830]This is what Armitage's 1960 book
- [00:33:11.300]put on the map in biostats.
- [00:33:14.300]Sample size is going to be treated as a random variable.
- [00:33:17.320]It allows sequential interim analyses in decision-making.
- [00:33:20.720]Now in terms of the effectiveness or the evaluation,
- [00:33:24.300]or the difference, it's gonna be based on cumulative data
- [00:33:28.710]and previous design decisions, but,
- [00:33:31.040]so skip this for a second,
- [00:33:34.010]be honest, and my hands already up,
- [00:33:36.420]how many of you part-way through your study,
- [00:33:39.090]have snooped, have peeked, have looked at the data, okay?
- [00:33:43.500]Most of you are liars, okay? (students laughing)
- [00:33:46.940]And don't worry, even though there's video evidence here,
- [00:33:49.870]if I ever have you in class later on,
- [00:33:51.760]I won't use it against you because I'm guilty okay?
- [00:33:55.070]But basically this is a protocol for snooping,
- [00:33:58.520]for looking at your data early in making decisions.
- [00:34:02.640]But what you didn't do when you snooped
- [00:34:05.630]is preserve your error rates.
- [00:34:08.950]This is a way to snoop to make early decisions if necessary,
- [00:34:13.410]and if possible,
- [00:34:15.890]but maintain the validity of your evaluation.
- [00:34:21.560]These sequential designs are
- [00:34:23.300]also referred to as adaptive or flexible designs,
- [00:34:26.610]where the current design decisions
- [00:34:28.230]are sequentially selected
- [00:34:29.330]according to previous design points.
- [00:34:32.220]Again, the fixed design sample size and composition
- [00:34:34.750]are determined a priori.
- [00:34:36.450]In the sequential, it's not predetermined,
- [00:34:39.090]so it's considered random.
- [00:34:43.670]A lot of times just, again,
- [00:34:44.910]like an adaptive test that eventually has got to cut them off,
- [00:34:48.870]we'll put a maximum sample size,
- [00:34:50.320]which is usually pretty close
- [00:34:52.210]to the fixed design sample size in the first place.
- [00:34:56.460]So, just like an adaptive test,
- [00:34:58.475]a fixed test might have 100 questions,
- [00:35:01.530]but if you're very consistent in your responding,
- [00:35:03.390]you might be done in 50.
- [00:35:07.560]The overall fixed sample size might be 100.
- [00:35:10.030]You might be able to make the same conclusion
- [00:35:11.670]with 50 participants.
- [00:35:16.490]Just using the same numbers I used before.
- [00:35:18.100]Not saying that you're always gonna save 50%.
- [00:35:20.480]Okay.
- [00:35:21.730]It allows for early termination of experiments.
- [00:35:23.850]If this cumulative evidence
- [00:35:26.190]suggests that there either is a clear effect,
- [00:35:28.980]or clear evidence of no effect.
- [00:35:33.870]So from ethical perspectives, this is really useful right,
- [00:35:36.450]especially those of you who are in any kind of clinical areas,
- [00:35:39.490]'cause we don't want to unnecessarily expose participants
- [00:35:42.570]to something that either has no chance of working,
- [00:35:46.050]or might be detrimental to them.
- [00:35:49.400]And this is why the pharmaceutical industry likes it.
- [00:35:52.870]If the evidence shows hey, this thing works,
- [00:35:55.250]let's get to market, let's get it out there
- [00:35:58.280]to the people who need it.
- [00:35:59.880]Why wait? Especially since there's a common criticism
- [00:36:03.520]of academic work, that we take too long
- [00:36:05.850]to actually be responsive to policy needs.
- [00:36:09.930]Let's get it out there to those who need it faster
- [00:36:13.080]than the academics can.
- [00:36:16.170]Logistical perspectives.
- [00:36:17.370]We can save a lot of money potentially
- [00:36:19.620]with our reduced sample sizes, okay?
- [00:36:22.300]Both by failure (mic mutes)
- [00:36:24.200]by early termination for lack of effectiveness.
- [00:36:28.580]Why do we gotta spend four years
- [00:36:29.870]if the intervention's not going to work,
- [00:36:31.890]if we can determine that early?
- [00:36:33.550]Or when rejecting, similar savings can be observed:
- [00:36:37.040]why do we need four years
- [00:36:39.950]to show this thing works,
- [00:36:41.080]when we've got a pretty good indication
- [00:36:43.190]after year one or year two?
- [00:36:47.600]Savings tend to be reported around the 10% range
- [00:36:54.072]if you fail to reject for lack of evidence, and as large as 50%
- [00:36:58.670]if there really is something to find.
- [00:37:02.810]All right, some characteristics here.
- [00:37:04.700]There needs to be at least one interim analysis stage, okay?
- [00:37:09.000]It's determined a priori.
- [00:37:11.210]Again, it has to be protocolized.
- [00:37:15.950]All the statistical details are determined.
- [00:37:17.930]So how many of these stages are you gonna have?
- [00:37:19.700]You're gonna have one snoop or you're gonna have multiple.
- [00:37:22.210]So if you're doing a four-year cohort study,
- [00:37:24.390]you might decide to do it after every cohort completes.
- [00:37:29.610]You're going to determine the critical values
- [00:37:31.330]the boundary values, so your regions of rejection.
- [00:37:35.060]So if we're going to use, good old z tests,
- [00:37:37.860]plus or minus 1.96 for a two-tailed hypothesis,
- [00:37:40.420]those are boundary values,
- [00:37:42.100]they bound the region of rejection.
- [00:37:45.330]We're going to determine the appropriate test statistic.
- [00:37:47.090]Fisher information
- [00:37:48.020]is just the inverse of the squared standard error.
- [00:37:50.130]Okay, again, this is based on,
- [00:37:52.310]just like at the very beginning,
- [00:37:54.550]when talking about the adaptive test
- [00:37:55.697]and the idea of information,
- [00:37:57.420]information is the inverse of the squared standard error,
- [00:37:59.850]so of the imprecision: more information, more precision,
- [00:38:02.870]smaller standard error.
- [00:38:05.440]Test statistic is then compared
- [00:38:06.820]against those critical values.
- [00:38:09.480]And this is where we're going to control
- [00:38:11.210]our experiment-wise error rate.
- [00:38:14.440]And we'll see in a moment 1.96 doesn't cut it
- [00:38:17.330]if you're gonna make early decisions.
- [00:38:20.720]And then we make a decision.
- [00:38:22.730]That decision can be a combination
- [00:38:25.064]of reject the null hypothesis, it's working,
- [00:38:30.020]fail to reject is clearly not, or (groans) inconclusive,
- [00:38:33.610]keep going.
- [00:38:40.940]So the problem we run into
- [00:38:42.450]is the problem of multiple comparisons
- [00:38:44.070]generally referred to as multiplicity.
- [00:38:46.400]In those traditional fixed designs, we said
- [00:38:48.380]you collect 100 participants
- [00:38:49.620]and you evaluate it with a T or Z or whatever,
- [00:38:52.410]at the end of it, and you use critical values of,
- [00:38:54.550]plus or minus 1.96 for a two-tailed test right?
- [00:38:58.860]Those don't work when you look at the data early,
- [00:39:04.510]because otherwise the error rates will be inflated.
- [00:39:06.960]The Type I error rate, especially, will be inflated here.
- [00:39:09.390]So about the same time Armitage put his book out,
- [00:39:13.550]they showed also that alpha level, that nominal alpha,
- [00:39:19.150]we like to control at 0.05 with a Z test here,
- [00:39:25.330]with just a two-stage sequential design,
- [00:39:27.390]so looking once early, you inflate that alpha to 0.083,
- [00:39:32.330]if you used the standard 1.96.
- [00:39:39.460]If you use that for a five-stage,
- [00:39:41.520]then it's jumped up to about almost 15%.
- [00:39:48.581]I think we all agree that's not a good thing.
- [00:39:52.670]So there are three general boundary method categories
- [00:39:55.380]we're going to talk about next:
- [00:39:56.213]fixed boundary-shape methods,
- [00:39:57.920]Whitehead methods, and error spending methods.
- [00:40:05.780]Boundary values are similar to conventional critical values.
- [00:40:08.580]They're set up for each stage
- [00:40:13.010]and they are again,
- [00:40:14.420]derived to maintain experiment-wise error rate control.
- [00:40:21.040]Up to four boundary values
- [00:40:22.600]are going to be determined a priori.
- [00:40:24.300]If it's a one tailed test, you're just gonna have two,
- [00:40:28.590]fail to reject, reject.
- [00:40:30.830]If it's a two-tailed test, you'll see it's times two,
- [00:40:33.700]it's four.
- [00:40:34.533]I'll show you some plots here in a moment.
- [00:40:37.660]And we're gonna make a decision depending upon
- [00:40:39.240]just like we learned in introductory stats.
- [00:40:42.320]We draw our normal distribution,
- [00:40:44.850]draw in our regions of rejection
- [00:40:46.600]and plot where the result is.
- [00:40:48.650]And we'll make a determination based on that.
- [00:40:52.730]At the last stage then,
- [00:40:53.990]the boundaries of the regions of rejection
- [00:40:57.070]and acceptance, really the fail-to-reject region, meet.
- [00:41:02.120]The experiment stops and an eventual decision has to be made
- [00:41:05.150]and there's no keep going.
- [00:41:06.970]This is usually that logistically capped maximum sample size.
- [00:41:18.650]All right, so the fixed boundary shape methods.
- [00:41:20.870]There are two primary ones,
- [00:41:22.030]the Pocock and the O'Brien Fleming.
- [00:41:28.540]These are going to be used with equally spaced
- [00:41:30.900]information levels, and the Pocock
- [00:41:32.630]uses those equally spaced information levels
- [00:41:34.300]to derive a constant boundary value.
- [00:41:36.400]So we want a flat level, but we have to adjust
- [00:41:39.640]somewhere more extreme than let's say the 1.96.
- [00:41:44.220]The nominal alpha level at each look is usually smaller
- [00:41:47.400]than that desired alpha 0.05 level.
- [00:41:49.600]So there's a penalty, there's a cost to some of this.
- [00:41:53.330]And the overall design tends to be more conservative.
- [00:41:57.590]The O'Brien Fleming method, instead of having a constant,
- [00:42:01.470]will have a sloped boundary.
- [00:42:01.470]So the further on you get,
- [00:42:03.690]the less the burden is to make a decision,
- [00:42:07.770]but it's going to require a whole lot more evidence early
- [00:42:12.730]to reject at those early stages.
- [00:42:14.690]So it's initially really conservative
- [00:42:16.770]but then, as the evidence accumulates, then it lightens up.
- [00:42:22.140]So a five-stage design with O'Brien Fleming,
- [00:42:25.820]initial critical value might be 4.56,
- [00:42:28.785]a little bigger than 1.96.
- [00:42:30.888]3.23, 2.63, 2.28 and then converging pretty close
- [00:42:35.040]to where it would have been in a fixed design
- [00:42:37.270]in the first place.
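The O'Brien-Fleming stage boundaries have a simple shape: the stage-k critical value is the final-stage value scaled up by sqrt(K/k), which is why the early looks demand overwhelming evidence. A quick sketch, using 2.04 as the final-stage constant the quoted five-stage values imply (an assumption on my part, reverse-engineered from the numbers in the talk):

```python
import math

def obf_boundaries(n_stages, final_crit):
    """O'Brien-Fleming-style boundary: stage-k critical value is the
    final critical value inflated by sqrt(n_stages / k)."""
    return [final_crit * math.sqrt(n_stages / k) for k in range(1, n_stages + 1)]

print([round(b, 2) for b in obf_boundaries(5, 2.04)])
# reproduces the values in the talk: 4.56, 3.23, 2.63, 2.28, 2.04
```

Note how the last boundary, 2.04, sits just above the fixed-design 1.96, which is the small price paid for the four interim looks.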
- [00:42:42.860]Versus the Whitehead methods,
- [00:42:44.240]they're appropriate for discrete monitoring.
- [00:42:47.460]So usually after groups of participants complete the study.
- [00:42:51.960]So instead of after say,
- [00:42:53.600]individual persons complete.
- [00:42:58.430]It'll have triangular or straight-line boundaries,
- [00:43:00.600]depending upon some decisions that you make
- [00:43:02.660]and it's just a little easier computationally
- [00:43:05.890]than the fixed-boundary shape methods.
- [00:43:09.980]But this still also controls the error rate
- [00:43:12.930]as we'd like it to.
- [00:43:14.820]Third category, or the error spending methods,
- [00:43:17.310]it's kind of like getting an allowance.
- [00:43:19.620]You can spend it all up front or you can save it
- [00:43:22.100]and use it over time here.
- [00:43:26.210]So those are the hardest to do,
- [00:43:27.710]especially when there are a lot of stages
- [00:43:29.380]a lot of things to spend it on.
- [00:43:35.070]And basically you assign,
- [00:43:37.110]or determine, a spending function.
- [00:43:39.180]So instead you'll see in the fixed shape plots
- [00:43:42.970]here in a moment and with the next slide,
- [00:43:45.780]it's all geometry.
- [00:43:47.860]This is more calculus, okay, instead of a set of clear shapes.
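Two standard spending functions make the "allowance" metaphor concrete. This sketch uses the Lan-DeMets forms from the error-spending literature (the talk names the idea but not these formulas, so treat the exact expressions as an assumed illustration): each function returns the cumulative alpha spent by information fraction t, reaching the full 0.05 at t = 1.

```python
import math

def norm_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def spend_obf_like(t):
    """O'Brien-Fleming-like spending: hoards alpha early, spends it late.
    Hard-codes two-tailed alpha = .05 via z = 1.959964."""
    return 2.0 * (1.0 - norm_cdf(1.959964 / math.sqrt(t)))

def spend_pocock_like(t, alpha=0.05):
    """Pocock-like spending: alpha is spent much more evenly over time."""
    return alpha * math.log(1.0 + (math.e - 1.0) * t)

for t in (0.2, 0.4, 0.6, 0.8, 1.0):
    print(f"t={t:.1f}  OBF-like={spend_obf_like(t):.4f}  "
          f"Pocock-like={spend_pocock_like(t):.4f}")
```

Running this shows the O'Brien-Fleming-like function spending essentially nothing at t = 0.2 while the Pocock-like function has already spent a sizable chunk, which is the "save it versus spend it up front" contrast from the allowance analogy.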
- [00:43:54.320]All right, so here's an example
- [00:43:55.830]of some one-sided boundary plots.
- [00:43:57.590]Here in the upper left
- [00:43:58.440]you see the three overlapped on each other.
- [00:44:00.630]So you can see, I think a little more clearly,
- [00:44:02.810]the distinction between those who have a flat level
- [00:44:09.690]versus the, so this would be the Pocock.
- [00:44:15.560]This is the O'Brien Fleming, and this is the Whitehead.
- [00:44:19.250]Okay, so triangular.
- [00:44:21.700]Constant, but further off.
- [00:44:29.960]Initially strong evidence,
- [00:44:32.730]but then actually, no, that's,
- [00:44:35.930]yeah, the O'Brien Fleming: initially overwhelming evidence required,
- [00:44:38.440]and then coming in.
- [00:44:41.730]One-tailed test, so early stage.
- [00:44:44.550]So here, this is an example of how the test statistic
- [00:44:48.640]is determined.
- [00:44:49.473]If it was in this area, you might fail to reject.
- [00:44:52.670]If it's here, we reject; otherwise we keep going to the next stage.
- [00:45:00.720]A two-tailed set of boundary plots
- [00:45:04.850]and you see the Pocock with a straight level,
- [00:45:07.520]the other two with the triangular shapes.
- [00:45:11.430]And so eventually, you want the point to either move in
- [00:45:16.780]or maintain a constant level
- [00:45:18.800]so that it eventually crosses over into the boundary space
- [00:45:21.260]as the boundary becomes less conservative.
- [00:45:27.840]All right.
- [00:45:29.210]So three general types of sequential designs:
- [00:45:32.880]a fully sequential design, analogous to an adaptive test
- [00:45:35.753]that adapts after every question,
- [00:45:38.780]a group sequential design analogous to an adaptive test
- [00:45:42.290]that adapts after a testlet or block of items,
- [00:45:46.590]and then a flexible sequential design,
- [00:45:48.730]which is kind of a hybrid of the other two.
- [00:45:52.570]They're going to differ based on sample recruitment
- [00:45:54.490]and those decision-making criteria.
- [00:45:57.810]The fully sequential design requires continuous monitoring
- [00:46:00.900]updates after every observation
- [00:46:02.550]or every person enters the study and completes the study.
- [00:46:09.570]Those critical boundary values are going to change
- [00:46:11.700]as the experiment progresses.
- [00:46:15.240]It requires that the result is knowable quickly.
- [00:46:19.370]So the decision can be made before the next observation
- [00:46:22.610]or the next participant enters the study.
- [00:46:25.720]So an intervention that takes a long period of time
- [00:46:29.990]doesn't really lend itself.
- [00:46:32.100]So a lot of our school-based interventions,
- [00:46:34.060]where we go and look for two years,
- [00:46:36.420]watching it like say their transition to kindergarten,
- [00:46:38.550]so we have a year of preschool and a year of kindergarten.
- [00:46:41.530]They don't lend themselves necessarily
- [00:46:43.550]to say, a fully sequential design,
- [00:46:45.180]'cause we're not going to wait two years
- [00:46:46.410]before we enroll the next person.
- [00:46:53.230]This originated, this is specifically
- [00:46:55.880]where Wald's sequential probability ratio test came into play.
- [00:46:59.660]This is why I started off with the adaptive test
- [00:47:02.130]and used that as the example,
- [00:47:03.573]because the adaptive test branched
- [00:47:05.440]exactly off of the fully sequential experimental design.
- [00:47:10.350]So, this all came in during World War II,
- [00:47:13.210]when they were trying to do evaluations
- [00:47:15.450]of a factory production.
- [00:47:18.060]So as bombs (chuckling) are coming off the production line
- [00:47:22.120]and are being tested before being sent overseas,
- [00:47:25.640]they wanted a way to be able to immediately determine
- [00:47:33.346]whether they met their set of outcomes.
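Wald's sequential probability ratio test, in that munitions-inspection spirit, can be sketched in a few lines. This is my illustration, not the wartime procedure itself: inspect items one at a time for defects and stop the moment the accumulated log-likelihood ratio crosses either boundary. The defect rates and error rates here are invented for the example.

```python
import math

def sprt_bernoulli(observations, p0=0.05, p1=0.20, alpha=0.05, beta=0.10):
    """Wald's SPRT for a Bernoulli defect rate: H0 p=p0 (good batch)
    vs. H1 p=p1 (bad batch). Returns (decision, items inspected)."""
    upper = math.log((1 - beta) / alpha)   # cross upward -> accept H1, reject batch
    lower = math.log(beta / (1 - alpha))   # cross downward -> accept H0, pass batch
    llr = 0.0
    for n, defective in enumerate(observations, start=1):
        # accumulate the log-likelihood ratio one observation at a time
        if defective:
            llr += math.log(p1 / p0)
        else:
            llr += math.log((1 - p1) / (1 - p0))
        if llr >= upper:
            return "reject batch", n
        if llr <= lower:
            return "accept batch", n
    return "continue sampling", len(observations)

# A short run of defects is already decisive -- no need to test the whole lot:
print(sprt_bernoulli([1, 1, 1, 1, 1]))   # -> ('reject batch', 3)
```

This one-observation-at-a-time stopping rule is exactly what the fully sequential experimental design (and, by extension, the adaptive test) branched off of.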
- [00:47:37.050]All right.
- [00:47:43.370]Versus the group sequential design,
- [00:47:44.970]which tends to be the more common, just like,
- [00:47:48.770]Dr. Albano could correct me if I'm wrong,
- [00:47:51.080]I think more generally,
- [00:47:52.110]adaptive tests
- [00:47:55.362]tend to be more testlet-based,
- [00:47:57.150]they're block-based rather than adapting after every item.
- [00:48:00.110]Same way in the application of the group sequential designs,
- [00:48:03.160]they tend to be, after certain percentage,
- [00:48:06.610]or number of participants complete the study.
- [00:48:10.760]That's what defines the stages,
- [00:48:12.420]instead of after person one, person two, person three.
- [00:48:17.430]It's analogous to a fully sequential design here
- [00:48:20.270]except that now these boundary values
- [00:48:23.060]are computed for determined numbers of spaced stages
- [00:48:27.360]rather than for each participant.
- [00:48:30.410]Conversely, the fully sequential
- [00:48:32.330]can be viewed as a version of the group,
- [00:48:35.070]just the group is of size one.
- [00:48:38.720]It's gonna be more practical,
- [00:48:44.110]although sometimes it can be controversial.
- [00:48:47.170]Early termination after small sample sizes
- [00:48:49.900]might be viewed as controversial.
- [00:48:54.520]Generally recommend four to five interim stages for this.
- [00:49:00.180]They found that a larger number of those interim analyses
- [00:49:03.820]don't actually lead to extra benefits.
- [00:49:10.290]Versus the flexible, or the fully adaptive,
- [00:49:13.300]okay, this allows the modification during the experiment
- [00:49:15.930]through an alpha spending function.
- [00:49:19.820]So you can maybe not look very often in the initial stages,
- [00:49:24.690]but as you're approaching certain milestones in the study
- [00:49:27.470]look more often,
- [00:49:29.870]but by looking more often,
- [00:49:31.960]there's a more chance of inflation of the error rate,
- [00:49:35.080]the multiplicity problem
- [00:49:36.550]and so you have to spend your error,
- [00:49:39.220]spend your alpha differently.
- [00:49:46.220]'Cause we want to protect
- [00:49:47.720]against potential researcher abuse right?
- [00:49:55.530]It can be viewed generally as a compromise
- [00:49:57.780]between the other two.
- [00:50:01.800]All right, the benefit is it allows us to make use
- [00:50:04.210]of the existing information at those interim stages
- [00:50:09.010]and make these decisions.
- [00:50:14.630]When a researcher has a limited amount of theoretical
- [00:50:16.830]or empirical knowledge that would prevent,
- [00:50:20.340]or otherwise negatively impact, some decision-making.
- [00:50:25.340]So optimal design of experiments,
- [00:50:26.980]we can use this to help us, right?
- [00:50:30.800]It's been proposed that these types of designs
- [00:50:33.460]might be used for arguing for no-cost extensions
- [00:50:38.470]in grant projects.
- [00:50:41.500]You plan to get a certain number
- [00:50:43.650]and either recruitment went bad or went slow
- [00:50:46.810]and you're not quite to that sample size,
- [00:50:49.810]you could reframe it as a sequential experiment
- [00:50:54.490]to see okay, if you were to continue
- [00:50:57.650]into that additional year or ask for the continuation funds,
- [00:51:02.480]what is the likelihood of success, given
- [00:51:04.670]what you've learned thus far?
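That "likelihood of success given what you've learned thus far" is usually formalized as conditional power. Here is a sketch of the standard B-value version under the current-trend assumption (my formula choice; the talk names the idea, not this calculation): t is the fraction of the planned information collected so far and z_t is the interim z statistic.

```python
import math

def norm_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def conditional_power(z_t, t, final_crit=1.96):
    """Probability of crossing final_crit at the end of the trial,
    given the interim z at information fraction t, projecting the
    currently observed trend forward."""
    b = z_t * math.sqrt(t)             # B-value at information fraction t
    drift = b / t                      # drift implied by the current trend
    remaining_mean = drift * (1.0 - t) # expected B-value gain in the remainder
    remaining_sd = math.sqrt(1.0 - t)
    return 1.0 - norm_cdf((final_crit - b - remaining_mean) / remaining_sd)

# Halfway through with a promising z of 2.0, continuing looks worthwhile;
# with a near-zero interim z, the projected chance of final success is slim.
print(conditional_power(2.0, 0.5))
print(conditional_power(0.3, 0.5))
```

A no-cost extension argument would plug the interim z and information fraction into a calculation like this one.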
- [00:51:07.771]Maybe that effect size is incorrect
- [00:51:09.600]in the population: you did a really great pilot study,
- [00:51:12.840]your really great literature review, meta-analysis, whatever,
- [00:51:15.550]to design your study, but the file drawer effect
- [00:51:20.550]hits you hard, or it's just a new context,
- [00:51:23.790]a replication study, and it's just not working out.
- [00:51:27.540]You can use this to either okay, let's reorient, let's stop,
- [00:51:35.530]or let's use it for justification for going on.
- [00:51:47.930]Could also be justified as a benefit here.
- [00:51:50.900]The magnitude of the treatment effect
- [00:51:53.210]when it's not clearly known, right?
- [00:51:58.120]Again, you think that you know what the effect size
- [00:52:00.650]is going to be,
- [00:52:01.483]but you need a little bit more empirical guidance.
- [00:52:07.060]So there's discrepancy
- [00:52:08.540]between a clinically meaningful effect
- [00:52:10.620]and the observable effect, oftentimes two different things.
- [00:52:19.330]All right.
- [00:52:23.570]So sequential designs really lend themselves
- [00:52:25.640]to the avenue of RCTs
- [00:52:29.902]in the medical and pharmaceutical,
- [00:52:32.020]in clinical settings right,
- [00:52:34.730]because one, they are randomized experiments,
- [00:52:36.610]which are clearly emphasized in these areas
- [00:52:41.260]because of the causal inference.
- [00:52:43.660]They're gonna be conducted according to a plan or a protocol
- [00:52:47.370]to preserve the causal inference.
- [00:52:51.100]A lot of clinical, pharmaceutical, biomedical context
- [00:52:56.640]actually utilize sequentially recruited participants.
- [00:53:01.110]A cancer study, a drug study doesn't wait
- [00:53:03.200]until you have 100 people eligible for it.
- [00:53:05.490]It's as the person presents, you treat.
- [00:53:08.140]And then the next person who presents you go ahead and treat
- [00:53:11.680]that's the sequentially recruited.
- [00:53:13.670]It lends itself much more so
- [00:53:15.930]than a lot of our school-based interventions
- [00:53:17.530]where we get a panel of individuals
- [00:53:19.310]and we typically follow them over a period of time,
- [00:53:22.980]not a sequential process,
- [00:53:26.040]especially when trial monitoring is necessary.
- [00:53:30.700]So like in an adaptive intervention,
- [00:53:34.030]when you're looking at responsivity
- [00:53:35.250]to the initial intervention,
- [00:53:36.600]and maybe a change needs to be made
- [00:53:38.360]an adaptation in the intervention itself.
- [00:53:42.220]So pairing that adaptive intervention
- [00:53:44.330]with adaptive evaluation.
- [00:53:47.530]And particularly when the clinical outcome
- [00:53:49.120]is considered irreversible,
- [00:53:52.530]that fixed design is really not ethically acceptable
- [00:53:57.600]in that context.
- [00:54:03.060]Particularly with Phase II clinical trials,
- [00:54:05.210]these flexible designs can be really useful.
- [00:54:11.200]Phase II trials tend to be conducted
- [00:54:13.110]on larger samples of participants,
- [00:54:15.250]and so the smaller initial trial
- [00:54:17.510]may have found something,
- [00:54:18.450]but a more generalized trial has a new set of issues.
- [00:54:25.189]And especially, the effect size
- [00:54:27.040]tends to be smaller.
- [00:54:27.873]Again, these adaptive designs allow you
- [00:54:30.170]to maybe accommodate limitations.
- [00:54:36.130]They are more complex.
- [00:54:38.140]There are increased computational burdens
- [00:54:39.810]because of those boundary values that need to be calculated
- [00:54:43.880]to control the experiment-wise error rate.
- [00:54:46.790]There are threats to validity due to early termination.
- [00:54:51.160]Some people just don't trust
- [00:54:53.110]decisions made on small samples.
- [00:54:55.370]Some of the statistics that you may use
- [00:54:57.290]to determine whether the effect is significant or not,
- [00:55:00.700]are based on asymptotic approaches like maximum likelihood,
- [00:55:04.060]so mixed models, multi-level models,
- [00:55:06.530]structural equation modeling, right,
- [00:55:08.710]and those are generally considered
- [00:55:11.110]to require larger samples.
- [00:55:13.770]So there's this statistical conclusion validity,
- [00:55:16.870]so analytic problems.
- [00:55:20.080]And especially in social sciences,
- [00:55:22.480]are early terminations more complex
- [00:55:24.370]than just a statistical criterion?
- [00:55:27.210]It works a little easier in the pharmaceutical biomedical,
- [00:55:30.120]'cause, what's the dependent variable?
- [00:55:33.780]They're still alive right or it works,
- [00:55:36.670]it lowers blood pressure,
- [00:55:39.400]it has a clearly measurable outcome.
- [00:55:44.760]In social sciences,
- [00:55:45.890]we have a little bit more of a kitchen sink approach.
- [00:55:49.180]We have a number of subscores on a measurement.
- [00:55:52.540]We have multiple constructs.
- [00:55:54.600]So what if one construct shows early termination
- [00:55:58.370]is reasonable but other constructs don't?
- [00:56:03.150]It's a little more convoluted okay?
- [00:56:05.750]So we need consistency across primary
- [00:56:07.460]and secondary outcomes.
- [00:56:08.580]Risk groups, especially now that we're really looking
- [00:56:11.590]at issues of mediation and moderation.
- [00:56:16.610]All right.
- [00:56:17.443]So I'm gonna wrap up here with an example
- [00:56:20.190]of a group sequential clinical trial.
- [00:56:23.240]So I'm gonna talk about the group sequential
- [00:56:25.290]here in particular,
- [00:56:26.430]and walk you through a bit of a data example
- [00:56:29.780]and then I'll wrap up and answer any questions.
- [00:56:33.730]So six general steps to this group sequential trial.
- [00:56:36.800]First one is going to be just specifying
- [00:56:38.390]the statistical details of the design.
- [00:56:41.930]So specifying your hypothesis, choosing a test statistic.
- [00:56:45.640]Again, you can use whether it's a t-test, an ANOVA,
- [00:56:48.210]a mixed model, survival analysis, whatever, okay?
- [00:56:52.570]A test statistic is determined
- [00:56:54.350]as appropriate for evaluating the efficacy.
- [00:56:58.740]Error rates are determined, usually as 0.05 and 0.2:
- [00:57:03.250]power of 0.8, so one minus beta, where beta is 0.2.
- [00:57:07.740]Stopping criterion, the number of stages
- [00:57:10.630]and then the relative information level at each stage.
- [00:57:15.110]That all ties into step two: computing the boundary values
- [00:57:19.380]and the required sample size at each stage,
- [00:57:21.380]based on the design specification,
- [00:57:23.430]so essentially a power analysis,
- [00:57:25.400]but taking this more complex system into consideration.
- [00:57:29.720]At each stage then, additional data
- [00:57:31.410]with required sample size is collected.
- [00:57:33.500]At each stage the available data,
- [00:57:35.690]that means what you just collected plus anything previous,
- [00:57:39.590]are analyzed.
- [00:57:41.730]Test statistic is computed.
- [00:57:43.930]And then five, compare that test statistic
- [00:57:46.220]with those boundary values.
- [00:57:48.270]You make a decision: the trial stops
- [00:57:50.630]because you reject or fail to reject,
- [00:57:52.770]or you continue to the next stage
- [00:57:54.520]until you get to the end point
- [00:57:58.550]where you must either reject or fail to reject.
- [00:58:01.600]After it stops, parameter estimates are computed,
- [00:58:05.910]confidence intervals, p-values,
- [00:58:08.350]whatever statistical inference you're gonna make in the end.
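The decision logic of steps four and five can be sketched in a few lines of Python. This is an illustrative simplification, not the speaker's analysis code: the boundary formula uses the O'Brien-Fleming shape for equally spaced looks, and the final-stage constant (2.024 here, roughly right for four looks at two-sided alpha of .05) would in practice come from exact design software such as SAS's PROC SEQDESIGN.

```python
import math

def obf_boundaries(n_stages, c_final):
    """O'Brien-Fleming-style z boundaries for equally spaced looks.

    The boundary at look k is c_final * sqrt(K / k), so early looks
    demand very extreme evidence and the final look uses c_final.
    c_final should come from exact design software; it is an input
    here, not something this sketch derives.
    """
    return [c_final * math.sqrt(n_stages / k) for k in range(1, n_stages + 1)]

def sequential_decision(z_stats, bounds):
    """Step five: walk the looks in order, stopping to reject at the
    first boundary crossing; otherwise fail to reject at the last look."""
    for stage, (z, b) in enumerate(zip(z_stats, bounds), start=1):
        if abs(z) >= b:
            return ("reject", stage)
    return ("fail to reject", len(bounds))
```

For four stages, `obf_boundaries(4, 2.024)` gives roughly [4.05, 2.86, 2.34, 2.02]: a trial whose z statistics hover around 0.5 at every look continues to the end and fails to reject.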
- [00:58:13.700]So the example is going to be based on the CBC
- [00:58:16.950]in the Early Grades study.
- [00:58:19.130]CBC is Conjoint Behavioral Consultation.
- [00:58:21.720]It's essentially an intervention that gets parents
- [00:58:24.320]and teachers on the same page
- [00:58:26.170]in addressing problematic behaviors of children.
- [00:58:29.670]The idea is that children's behavior usually translates
- [00:58:34.120]across contexts, so if they're getting
- [00:58:36.130]consistent support from both home and school,
- [00:58:38.170]hopefully collectively that will decrease
- [00:58:40.300]the disruptive negative behaviors.
- [00:58:45.550]So this study,
- [00:58:48.528]funded by the Institute of Education Sciences,
- [00:58:51.460]was originally designed
- [00:58:52.590]as a four-cohort fixed design,
- [00:58:58.043]done as an RCT evaluating
- [00:59:00.640]this intervention, CBC,
- [00:59:02.890]for students with challenging classroom behaviors.
- [00:59:04.840]We had 22 schools, 90 classrooms,
- [00:59:07.780]207 kindergarten through third grade students
- [00:59:12.370]and their parents.
- [00:59:14.000]So small groups of participants were randomly assigned.
- [00:59:20.020]So we formed small groups,
- [00:59:21.740]two to three students within a classroom.
- [00:59:24.320]That classroom was then assigned
- [00:59:26.400]to receive the CBC intervention or not.
- [00:59:30.880]It was powered to detect
- [00:59:32.240]a medium standardized effect size of 0.38,
- [00:59:35.620]and through the power analysis, we determined that we needed
- [00:59:39.300]270 participating children
- [00:59:42.540]within 90 classrooms, averaging three kids per classroom.
- [00:59:47.490]So that was the fixed design.
- [00:59:48.880]What we ended up doing for this reanalysis,
- [00:59:53.370]because we could conceptualize it
- [00:59:55.500]as a sequentially unfolding study,
- [01:00:00.610]was to go back and reanalyze it.
- [01:00:02.710]That's what I'm gonna present to you:
- [01:00:04.092]we're going to evaluate the sequential design
- [01:00:06.989]on its ability to reach the same finding we found
- [01:00:10.600]after the study was actually completed as a fixed design.
- [01:00:16.540]All right.
- [01:00:17.373]So this is a post hoc reanalysis.
- [01:00:19.930]It was a four-cohort study,
- [01:00:22.710]so we're gonna define those four cohorts as the groups,
- [01:00:25.870]Year One, Year Two, Year Three, Year Four.
- [01:00:29.900]Then we're going to assume that the known fixed-design
- [01:00:32.340]conclusion is the truth.
- [01:00:33.710]So ask yourselves: what's the degree of savings
- [01:00:36.420]if we had done this as a sequential design instead?
- [01:00:41.230]The nice thing about this
- [01:00:42.370]is that there are PROCs for this, okay?
- [01:00:46.020]And I'm not an R user, but I assume,
- [01:00:49.400]there's probably a bunch of R packages for this as well.
- [01:00:52.760]And SAS, they have three procedures you'd use:
- [01:00:54.670]two sequential design procedures
- [01:00:56.960]that are useful for this, and then
- [01:00:59.720]you can use whatever analysis procedure you want
- [01:01:02.000]in the middle.
- [01:01:03.000]So PROC SEQDESIGN
- [01:01:05.810]will help you determine the boundary values,
- [01:01:07.920]so those plots a few slides ago, the critical values, okay;
- [01:01:12.050]it's essentially an a priori power analysis.
- [01:01:15.800]Then you go in, you do your study
- [01:01:17.920]and you analyze it at each stage.
- [01:01:19.960]In this case, there are four stages,
- [01:01:21.260]so there are gonna be three interim looks, or "snoops,"
- [01:01:23.920]and one final determination, all right,
- [01:01:26.180]four statistical analyses that are gonna be conducted.
- [01:01:29.570]And then PROC SEQTEST, however you pronounce that,
- [01:01:34.440]is going to be used to evaluate those results
- [01:01:36.230]against the boundary values.
- [01:01:37.500]So if you've done any multiple imputation,
- [01:01:39.530]especially with SAS,
- [01:01:40.420]it's kind of like using PROC MIANALYZE
- [01:01:42.820]to reintegrate all of your multiple imputation results.
- [01:01:49.200]So it's going to take what you get
- [01:01:50.900]out of whatever your statistical procedure is,
- [01:01:52.940]in this case, I'm using MIXED
- [01:01:54.610]and plug it back into the information
- [01:01:57.590]from the boundary values
- [01:01:59.270]to determine what your decision-making is going to be.
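That design-analyze-test loop can be mimicked with plain summary statistics. The sketch below uses made-up cohort numbers, not the CBC study's actual estimates, assumes a known unit variance, and hard-codes approximate O'Brien-Fleming rejection boundaries; the real design also had acceptance (futility) boundaries, which are omitted here.

```python
import math

# Hypothetical per-cohort summaries (n, estimated effect, known SD):
# stand-ins for what PROC MIXED would produce on each cumulative file.
cohorts = [(25, 0.10, 1.0), (29, 0.05, 1.0), (18, 0.02, 1.0), (18, 0.04, 1.0)]

# Approximate O'Brien-Fleming z rejection boundaries for four equal looks
# (final constant ~2.024 at two-sided alpha = .05; exact values would
# come from PROC SEQDESIGN).
bounds = [4.048, 2.862, 2.337, 2.024]

def cumulative_z(cohorts, upto):
    """z statistic from all data through stage `upto` (1-indexed),
    pooling the per-cohort means by sample size."""
    used = cohorts[:upto]
    n = sum(c[0] for c in used)
    mean = sum(c[0] * c[1] for c in used) / n
    sd = used[0][2]  # assume a common, known SD for this sketch
    return mean / (sd / math.sqrt(n))

def run_trial(cohorts, bounds):
    """Stop and reject at the first boundary crossing; otherwise
    fail to reject at the final look."""
    for stage in range(1, len(cohorts) + 1):
        z = cumulative_z(cohorts, stage)
        if abs(z) >= bounds[stage - 1]:
            return ("reject", stage)
    return ("fail to reject", len(cohorts))
```

With these illustrative numbers the cumulative z statistic stays near 0.5 at every look, so the trial runs to stage four and fails to reject.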
- [01:02:03.780]All right, so through this process,
- [01:02:07.410]this is just showing one outcome.
- [01:02:10.580]It's the adaptive skills outcome.
- [01:02:12.740]So, four stages: we're going to do an analysis
- [01:02:14.360]after the first cohort is collected;
- [01:02:16.030]after the second cohort, which includes both first and second;
- [01:02:18.990]after the third cohort, which includes all three;
- [01:02:22.360]and after the fourth cohort, when all the data's in.
- [01:02:23.750]So we're gonna get a file out
- [01:02:25.480]that has the parameter estimate and its standard error
- [01:02:29.300]for each of the four stages with a sample size.
- [01:02:32.480]This file is input back into PROC SEQTEST,
- [01:02:37.790]and then it'll output the results here.
- [01:02:42.400]So this is using the O'Brien-Fleming boundaries,
- [01:02:49.200]and this is reporting
- [01:02:52.980]a one-tailed test here.
- [01:02:54.290]So we're going to plot it; you see the dot here.
- [01:02:58.430]So at stage one, after the first 25 small groups
- [01:03:04.680]were completed, that was the test statistic.
- [01:03:08.560]It's in the "keep going" area.
- [01:03:13.390]At stage two, after 54 total participants,
- [01:03:16.980]including the original 25,
- [01:03:19.700]now it moves into this region of fail to reject.
- [01:03:26.750]If it falls in here, it's fail to reject; up there, it's rejection.
- [01:03:31.430]Three, it stays in, four it stays in.
- [01:03:34.670]So, eventually not finding a result.
- [01:03:38.340]We could have come to that conclusion earlier:
- [01:03:40.090]instead of four years later, 90 classrooms, 207 kids,
- [01:03:46.010]we could have made that decision
- [01:03:47.670]after the second cohort was completed,
- [01:03:50.280]after a little more than
- [01:03:54.860]half of the classrooms had participated,
- [01:03:59.630]and saved a little bit of money.
- [01:04:02.670]Now, the complication is
- [01:04:04.710]that there are a number of outcomes in this, all right?
- [01:04:06.260]So beyond what we just saw, here's the second outcome:
- [01:04:09.080]it was clearly, from the beginning,
- [01:04:13.050]up in the rejection region.
- [01:04:18.800]Other outcomes showed the same pattern as the first:
- [01:04:22.250]inconclusive early,
- [01:04:25.690]then we could have called it after the second stage.
- [01:04:28.450]Here, we can't call it until the very end.
- [01:04:33.900]And here's a table summarizing it all.
- [01:04:37.290]So we had a set of both parent observations
- [01:04:39.970]of the child's disruptive behavior, and teacher evaluations.
- [01:04:43.430]We come to the conclusion that, on the parent outcomes,
- [01:04:48.110]we could have made all of the eventual decisions,
- [01:04:50.470]that the parent outcomes didn't show anything,
- [01:04:53.180]after the second cohort.
- [01:04:56.070]On the teacher outcomes, three out of the four
- [01:05:00.460]needed a little bit of extra time, but all four were found;
- [01:05:04.080]teachers were able to observe changes
- [01:05:06.010]in the disruptive behaviors, right?
- [01:05:08.630]We could have called a couple of them early;
- [01:05:10.840]for the externalizing behavior, we needed to go the full trial
- [01:05:15.490]to reach the conclusion.
- [01:05:17.030]This is the unfortunate part of the application
- [01:05:20.920]of sequential designs in education research:
- [01:05:24.410]because of the multiple outcomes
- [01:05:26.043]that we tend to have, when do we actually stop?
- [01:05:29.540]Now, we probably could have stopped bothering the parents
- [01:05:32.910]collecting that information early,
- [01:05:34.830]'cause that was pretty clear
- [01:05:35.990]the parent observations were not picking up anything.
- [01:05:41.360]Now, these two both came from the BASC;
- [01:05:45.130]if they had both stopped early,
- [01:05:47.130]we could have stopped administering that instrument early,
- [01:05:51.780]but instead, of two different subscales,
- [01:05:53.960]one stopped early and
- [01:05:55.050]the other one needed to go through to completion.
- [01:05:58.640]But the social skills score and the parent-teacher relationship,
- [01:06:01.640]they both stopped early.
- [01:06:02.580]We could have maybe stopped data collection
- [01:06:04.700]on those as well; depending upon how expensive
- [01:06:06.720]or intrusive the data collection is,
- [01:06:08.880]you could start cutting out the measures
- [01:06:11.470]that aren't necessary
- [01:06:13.740]to carry all the way through to completion.
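That retire-a-measure logic, dropping an instrument from data collection only once every one of its subscales has reached a sequential decision, can be expressed directly. The instrument and subscale names below are illustrative placeholders, not the study's actual outcome list.

```python
# Hypothetical mapping of each (instrument, subscale) to the stage at which
# its sequential test reached a decision (None = needed the full trial);
# these names and stages are made up for illustration.
decisions = {
    ("BASC", "externalizing"): None,
    ("BASC", "adaptive_skills"): 2,
    ("other", "social_skills"): 2,
    ("other", "parent_teacher_relationship"): 2,
}

def instrument_retirement(decisions):
    """Earliest stage at which a whole instrument can be dropped:
    the latest decision stage across its subscales, or None if any
    subscale must run to the end."""
    by_instrument = {}
    for (instrument, _), stage in decisions.items():
        by_instrument.setdefault(instrument, []).append(stage)
    return {
        inst: (None if any(s is None for s in stages) else max(stages))
        for inst, stages in by_instrument.items()
    }
```

Here the "other" instrument could be retired after stage two, while the BASC, with one undecided subscale, has to be carried through the whole trial.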
- [01:06:17.340]All right, so the fixed design results
- [01:06:20.390]indicate significant treatment effects on some outcomes
- [01:06:22.610]but not all, primarily the teacher effects,
- [01:06:24.940]the teacher observations
- [01:06:26.380]and not the parent observations.
- [01:06:29.020]We were able to come to the same conclusion,
- [01:06:31.610]but the average sample size necessary was 55 classrooms
- [01:06:35.220]or teachers to come to the same conclusion
- [01:06:37.540]across all outcomes.
- [01:06:39.860]So it could be beneficial,
- [01:06:42.130]but it gets a little complicated, again,
- [01:06:44.870]because of the profile of decisions.
- [01:06:50.720]So wrapping up with this,
- [01:06:52.330]again reminding you of some of the limitations.
- [01:06:55.760]Yeah, the sample has to be sequentially recruited.
- [01:06:58.010]In this case, it was a short 12-week intervention,
- [01:07:00.580]completed over the course of a semester,
- [01:07:04.800]and by looking at it as a group design,
- [01:07:06.660]we could look at it in terms of the four cohorts.
- [01:07:10.920]So it lent itself to sequential recruitment.
- [01:07:14.220]It is more complex and computationally burdensome,
- [01:07:16.860]but luckily again SAS as a program has procedures
- [01:07:20.240]to take a lot of the heavy lifting out of this.
- [01:07:24.260]Early termination might be
- [01:07:29.500]a validity consideration,
- [01:07:32.050]only 50 classrooms, but if you're familiar
- [01:07:34.790]with the multilevel modeling literature,
- [01:07:36.670]50 is at the lower bound of the recommended sample size, okay,
- [01:07:40.120]so maybe not so much.
- [01:07:41.810]Maybe if we had stopped after only 25,
- [01:07:45.008]we would have some reviewer concerns.
- [01:07:49.020]Instability.
- [01:07:51.640]Again, because most of the decisions
- [01:07:53.150]could be made after 50 Level Two units were obtained,
- [01:07:57.870]we may avoid some of the asymptotic problems,
- [01:08:00.960]but that's also an empirical question.
- [01:08:03.560]But where we run into trouble with this is the consistency
- [01:08:06.620]across outcomes and groups.
- [01:08:09.040]So some of the future work we're doing
- [01:08:11.050]is going to reanalyze this and some other randomized trials
- [01:08:14.290]that we've done through CYFS, trying to reevaluate
- [01:08:19.760]some as fully sequential designs
- [01:08:23.050]as well as those hybrid, those flexible designs as well.
- [01:08:26.590]So I will stop talking.
- [01:08:30.740]You have my contact information.
- [01:08:32.960]So if you have any questions
- [01:08:35.830]that you'd like to discuss outside of the confines
- [01:08:38.840]of this room, please don't hesitate to contact me.
- [01:08:42.360]Otherwise, questions? (audience applauds)