One-Way ANOVA in R Studio

Jessie Morrill Author

11/07/2024 Added

Description

Dr. Jessie Morrill shares how to perform a One-Way ANOVA with Pairwise Comparisons in R Studio.

Searchable Transcript

[00:00:00.000]All right, so welcome to this video. I am going to be showing you how to run a one-way ANOVA in RStudio.
[00:00:15.080]So to do this, you need to have a couple of packages installed onto your computer, and to get started, you need to call those packages to the library.
[00:00:27.060]So the two packages that you need for this example are going to be ggpubr as well as dplyr.
[00:00:35.040]So to call those to the working environment, I'm going to select those two lines of code and click run.
[00:00:41.520]So you can see in the bottom left-hand screen, you can see that those show up in the console indicating that it's been run.
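
The two lines being run at this point are not shown in the transcript; a minimal sketch of what they most likely look like, assuming both packages are already installed, is:

```r
# Install once if needed (uncomment the next line):
# install.packages(c("ggpubr", "dplyr"))

library(ggpubr)  # provides ggbarplot(), used later for the bar plot
library(dplyr)   # provides group_by() and summarise(), used for summary statistics
```

If either package is missing, `library()` stops with an error, which is the cue to run the `install.packages()` line first.
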
[00:00:49.840]So if you are doing a one-way ANOVA on your own data set,
[00:00:56.460]you would first need to make sure that you set your working directory
[00:01:00.220]and then read in your data file that you have all of your data stored in.
[00:01:06.320]For today's example, I'm not showing you that because I'm going to be using a preloaded R data set.
[00:01:12.840]So you're able to actually follow along with this on your own individual computers without me having to share a data set with you.
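
For your own data, the two steps mentioned above (setting the working directory and reading in the file) might look like the sketch below. The folder and the file name `my_data.csv` are hypothetical; to keep the example runnable, it first writes the built-in PlantGrowth data to a temporary CSV file and then reads it back.

```r
# Hypothetical stand-in for "your own data": save the built-in PlantGrowth
# data as a CSV in a temporary folder so this example is self-contained.
my_dir <- tempdir()
write.csv(PlantGrowth, file.path(my_dir, "my_data.csv"), row.names = FALSE)

old_wd <- setwd(my_dir)   # set the working directory; setwd() returns the old one
my_data <- read.csv("my_data.csv", stringsAsFactors = TRUE)  # read the data file
setwd(old_wd)             # restore the previous working directory

str(my_data)              # check the column names and types
```

With a real project you would replace `my_dir` with the folder that holds your file and skip the `write.csv()` step.
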
[00:01:21.140]So to load this preloaded R data set, you can
[00:01:25.860]use the line of code data("PlantGrowth"): the word "data," then in parentheses and quotes the name "PlantGrowth," written as one word.
[00:01:34.180]It's really important that you have a capital P and a capital G when you type that in and run it.
[00:01:41.740]So you can see in the upper right-hand corner that PlantGrowth is now a part of our environment.
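
As a small sketch, loading the built-in data set and taking a first look at it uses only base R:

```r
data("PlantGrowth")   # note the capital P and capital G

head(PlantGrowth)     # first six rows, showing the weight and group columns
str(PlantGrowth)      # 30 observations of 2 variables; group is a factor
levels(PlantGrowth$group)  # the treatment group labels as stored in the data
```

In the data itself the groups are labeled `ctrl`, `trt1`, and `trt2`, which correspond to the control, treatment 1, and treatment 2 groups described in the video.
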
[00:01:48.060]So to preview that data, you can use the function "head" to give you a preview
[00:01:55.260]of the column headings, and then put in your parentheses PlantGrowth, select
[00:02:01.500]that line of code, run it. What you can see down in the console is this data
[00:02:07.140]frame has two columns in it. The first column is going to have a value for
[00:02:12.260]weight. That's going to be the response variable that we are looking at. It also
[00:02:16.860]has a column called "group." The group column has the names of all of the
[00:02:22.200]different treatment groups in the particular
[00:02:24.660]data set that we are looking at. To get a better preview of this data, I've
[00:02:30.420]added some code in here to be able to create a ggbarplot. The ggbarplot
[00:02:36.720]function is a part of the ggpubr package. To be able to take a look at this data,
[00:02:43.920]what we have on the x-axis is each of the treatment groups. On the y-axis, we
[00:02:49.920]have the plant weight again, which is the response variable.
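
The exact plotting code is not shown in the transcript. One way to get a comparable figure with `ggbarplot()` is sketched below; the `add` options, the axis labels, and the object name `plant_plot` are assumptions rather than the presenter's settings.

```r
library(ggpubr)
data("PlantGrowth")

# Bars show the group means; "mean_se" adds standard-error bars and
# "jitter" overlays the individual observations as points.
plant_plot <- ggbarplot(
  PlantGrowth,
  x = "group",
  y = "weight",
  add = c("mean_se", "jitter"),
  xlab = "Treatment group",
  ylab = "Plant weight"
)

plant_plot  # print the plot
```
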
[00:02:54.060]You can see the black dots in each of these bars is going to be
[00:02:59.960]representative of the experimental units that are associated with each of
[00:03:05.880]those treatment groups. Next, what we want to be able to do is calculate some
[00:03:11.860]summary statistics. What I mean by that is we want to be able to know what
[00:03:16.880]is the mean of the control group and the mean of the treatment one group and the
[00:03:23.960]mean of the treatment two group. We also want to know what the standard error of
[00:03:28.560]the mean is. By standard error of the mean, what I'm referring to is the
[00:03:32.900]standard deviation of the response variable, which is weight, divided by the
[00:03:38.460]square root of n. So to calculate the standard error of the mean, we also need the square root of
[00:03:41.720]n. In this case, that's going to be the square root of the number of
[00:03:46.020]experimental units per treatment group. If you were to count those
[00:03:49.940]dots, we'd be dividing by the square root of the number of those
[00:03:53.860]dots on that bar graph. To be able to calculate those summary
[00:03:58.260]statistics, I've got a couple lines of code here
[00:04:00.920]that I've set equal to plant growth summary. I'm using the function
[00:04:06.560]summarize, and then I'm grouping the data in the original data frame
[00:04:12.040]called PlantGrowth by group. I'm grouping the data by group.
[00:04:18.540]The next thing I'm doing is I'm calculating the mean
[00:04:21.880]weight of
[00:04:23.760]the weight column for each of those treatment groups, and then I'm
[00:04:28.500]calculating the SEM, or standard error of the mean, and the equation that I'm
[00:04:33.060]using to do that is the standard deviation of weight divided by the
[00:04:37.440]square root of n. So if I run this line of code, these few lines of code, what
[00:04:42.660]ends up happening? You can see now in the environment that we have an object
[00:04:48.420]called plant growth summary. It has three observations of three variables.
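
A sketch of the summary step described here, using dplyr. The object name `plant_summary` and the column names `mean_weight` and `sem` are assumptions, since the transcript does not show the exact names used on screen.

```r
library(dplyr)
data("PlantGrowth")

plant_summary <- PlantGrowth %>%
  group_by(group) %>%                 # one row per treatment group
  summarise(
    mean_weight = mean(weight),       # group mean of the response variable
    sem = sd(weight) / sqrt(n())      # standard error of the mean: SD / sqrt(n)
  )

plant_summary  # 3 observations (groups) of 3 variables
```

Here `n()` returns the number of rows in each group, which is the number of experimental units per treatment.
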
[00:04:53.660]So now if I come down and I want to display those summary statistics, I'm
[00:04:59.120]going to select plant growth summary, run that, and it'll show up in my console
[00:05:06.160]down in the bottom left of my screen. And so you can see in the first column we
[00:05:11.960]have the groups, which are control, treatment 1, and treatment 2. We also have
[00:05:16.780]the mean weight that we calculated, which is going to be 5.03, 4.66, and
[00:05:23.560]5.53. So if you come over to your bar graph, you can see
[00:05:28.420]that the control group is approximately five. You can see that treatment one is slightly lower
[00:05:35.630]than that, around 4.66, and then you can see that the treatment two group is higher than the rest of
[00:05:43.630]those in the value for the mean here. The top of the bar is 5.53. You can also see the standard
[00:05:51.310]error of the mean, and so for the control group, that's going to be 0.18, and so you can also
[00:06:00.350]appreciate that in this data set, the largest standard error of the mean is 0.25, and if you
[00:06:09.590]look here on the error bars of each of these three treatment groups, the one with the largest error
[00:06:18.270]bars is going to be this treatment one group.
[00:06:21.010]And you can see the length of these error bars around the mean, and that happens to be the
[00:06:27.850]largest one. So being able to plot the data always helps me to be able to make sense of the numbers
[00:06:34.870]that I see in my summary statistics, and so it's a good habit to always do, and it also just gives
[00:06:41.610]you more confidence in understanding what's happening with your data set when you're working
[00:06:46.650]to interpret the results.
[00:06:48.870]So now the next thing we want to do
[00:06:50.970]is actually run the ANOVA.
[00:06:53.310]So the function to run an analysis of variance is aov, an abbreviation of analysis of variance.
[00:07:01.850]And then in parentheses, what we have first is we have the...
[00:07:06.210]oops, excuse me.
[00:07:07.230]We have the response variable, which happens to be weight.
[00:07:13.350]And then we have, in this case, the identifier for the treatment groups.
[00:07:19.690]And so this is referring
[00:07:20.970]to the column where all the treatment groups are indicated in the data frame.
[00:07:25.970]And that happened to be called group.
[00:07:27.990]So next, in this particular function, we need to tell R what data set to look at.
[00:07:35.430]So in this case, the data that we are looking at is called PlantGrowth.
[00:07:39.350]So we're going to go ahead and run this line of code.
[00:07:43.030]And we're setting it equal to plant growth AOV.
[00:07:47.170]So when we run that, now you can see
[00:07:50.850]plant growth AOV shows up as a list of 13 items in the working environment.
[00:07:59.070]So now if we come down and want to display the results of that ANOVA in table format,
[00:08:07.030]we're going to use the word summary and then in parentheses put plant growth AOV.
[00:08:12.310]Now if we run that, the ANOVA table shows up in our console.
[00:08:19.190]And you can see
[00:08:20.730]the sums of squares, and you can see the degrees of freedom for the group as well as the residuals.
[00:08:26.310]You can see the mean squares.
[00:08:28.370]You can also see the F value as well as the P value.
[00:08:32.590]So in this case, the P value ends up being 0.0159.
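
The model fit and the ANOVA table can be sketched as follows. The object name `plant_aov` is an assumption (the transcript only gives the spoken name "plant growth AOV"), and the last line is an extra step, not shown in the video, for pulling the p-value out of the table.

```r
data("PlantGrowth")

# Response variable on the left of ~, treatment identifier on the right
plant_aov <- aov(weight ~ group, data = PlantGrowth)

summary(plant_aov)  # ANOVA table: Df, Sum Sq, Mean Sq, F value, Pr(>F)

# Optional: extract the p-value for the group effect from the table
p_value <- summary(plant_aov)[[1]][["Pr(>F)"]][1]
round(p_value, 4)
```
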
[00:08:38.290]Now with ANOVA, any time that you have a P value that is considered significant for this effect that we are looking at,
[00:08:50.610]we are able to do something called pairwise comparisons.
[00:08:53.610]Pairwise comparisons are essentially t-tests that allow us to compare each of the treatment groups that we have to each other.
[00:09:03.610]So it allows us to compare the control group only to treatment 1 and the control group only to treatment 2,
[00:09:11.610]and then treatment 1 compared to treatment 2.
[00:09:15.610]And so to be able to do those pairwise comparisons, to be able to determine
[00:09:20.490]if our treatment groups are different from each other, we can use the function TukeyHSD.
[00:09:28.370]And then in parentheses, what we put is the plant growth AOV results that we have.
[00:09:34.370]And if we run that, what you end up getting is now the pairwise comparisons for each of the treatments.
[00:09:42.370]So we have treatment 1 versus the control.
[00:09:45.370]You can see that the p-value comes out to be 0.39.
[00:09:50.370]So that's not considered to be significantly different.
[00:09:53.250]Next, we can look at treatment 2 versus the control.
[00:09:57.250]And you can see that treatment 2 versus the control has a p-value of about 0.20,
[00:10:02.250]which in our case we're going to say is not considered significantly different.
[00:10:06.250]Next, you can see treatment 2 versus treatment 1.
[00:10:10.250]The p-value comes out to be 0.01.
[00:10:14.250]And so in our case, that is going to be considered statistically significant.
[00:10:20.250]So you can appreciate that the control is intermediate between treatment 1 and treatment 2.
[00:10:27.130]And the control is not statistically different from treatment 1 or treatment 2.
[00:10:36.130]But you can appreciate that treatment 1 has the lowest mean value
[00:10:41.130]and is statistically different from treatment 2, which does happen to have the highest mean value.
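
The pairwise comparisons discussed above can be reproduced with base R as sketched below. The object names `plant_aov` and `plant_tukey` are assumptions, and the 0.05 cutoff in the last line is the conventional significance level rather than one stated in the video.

```r
data("PlantGrowth")

plant_aov <- aov(weight ~ group, data = PlantGrowth)
plant_tukey <- TukeyHSD(plant_aov)   # Tukey's honest significant differences

plant_tukey  # difference, confidence interval, and adjusted p-value per pair

# Adjusted p-values only, named by comparison (e.g. "trt2-trt1")
p_adj <- plant_tukey$group[, "p adj"]
round(p_adj, 2)

p_adj < 0.05  # TRUE marks pairs that differ at the 0.05 level
```

Note that `TukeyHSD()` reports p-values adjusted for the three comparisons being made, so they are not identical to p-values from three separate unadjusted t-tests.
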
[00:10:50.130]Thank you for watching this video, and I encourage you to try this out using both the PlantGrowth data and your own data set.
