How to write "for" loops in R using dplyr syntax
Problem description
I have an extensive block of code that I've written using dplyr syntax in R. However, I am trying to put that code in a loop, so that I can ultimately create multiple output files as opposed to just one. Unfortunately, I appear unable to do so.
For illustration purposes regarding my problem, let's refer to the commonly used "iris" dataset in R:
> data("iris")
> str(iris)
'data.frame': 150 obs. of 5 variables:
$ Sepal.Length: num
$ Sepal.Width : num
$ Petal.Length: num
$ Petal.Width : num
$ Species : Factor w/ 3 levels "setosa","versicolor","virginica"
Let's say that I want to save the average Petal.Length of the species "versicolor". The dplyr code could look like the following:
MeanLength2 <- iris %>% filter(Species=="versicolor") %>%
summarize(mean(Petal.Length)) %>% print()
Which would give the following value:
mean(Petal.Length)
1 4.26
Let's attempt to create a loop to get the average petal length for all of the species.
From what little I know of loops, I would want to do something like this:
for (i in unique(iris$Species))
{
iris %>% filter(iris$Species==unique(iris$Species)[i]) %>%
summarize(mean(iris$Petal.Length)) %>% print()
print(i)
}
For some reason, I had to specify the data frame and the column inside the loop, which is generally not the case while using the piping functionality of dplyr. I'm assuming that this is indicative of the problem.
Anyways, the above code gives the following output:
mean(iris$Petal.Length)
1 3.758
[1] "setosa"
mean(iris$Petal.Length)
1 3.758
[1] "versicolor"
mean(iris$Petal.Length)
1 3.758
[1] "virginica"
So the code is outputting 3.758 three times, which is the average petal length across all species in the dataset. This indicates that the "filter" code did not work as expected. From what I can tell, it appears that the loop itself functioned as intended, as all three unique species names were printed in the eventual output.
How can one go about doing something like this with the use of for loops? I understand that this particular exercise does not require the use of fancy loops, as one can easily get the average petal length of all the species by using, for example, the "group_by" function in dplyr. However, I am looking to output close to 100 unique tables and PDF files with the dataset that I am working with, and knowing how to use for loops would really help for that purpose.
It is unfortunate that your code didn't raise any errors. If you run your code line by line you'll understand what I'm saying. For this example I will choose the first iteration of your loop; let's replace i with "setosa":
> iris %>% filter(iris$Species == unique(iris$Species)["setosa"])
[1] Sepal.Length Sepal.Width Petal.Length Petal.Width Species
<0 rows> (or 0-length row.names)
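Your filter yields a data frame with no observations: indexing unique(iris$Species) by the string "setosa" returns NA, because the factor has no names, and filter() drops rows where the condition is NA. There is no point in going ahead, but for this example, let's run the rest of the code: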
> iris %>% filter(iris$Species == unique(iris$Species)["setosa"]) %>%
+ summarize(mean(iris$Petal.Length))
mean(iris$Petal.Length)
1 3.758
What happened is that you're calling the global iris dataset directly from within summarize(), so the filtered data coming through the pipe is ignored. A more obvious example would be:
> filter(iris, iris$Species == unique(iris$Species)["setosa"]) %>%
+ summarize(mean(mtcars$cyl))
mean(mtcars$cyl)
1 6.1875
That's why you don't get the answer you expected: your filter didn't work, and you got a summary statistic computed from a dataset other than the one in the pipe.
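To make the two failure points easier to see in isolation, here is a minimal sketch that splits them apart. The variable names (idx, empty, leaked, scoped) and the column name m are only illustrative and do not come from your code.

```r
library(dplyr)

# Indexing a factor that has no names by a string matches nothing, so the result is NA.
idx <- unique(iris$Species)["setosa"]
is.na(idx)

# Comparing against NA gives NA for every row, and filter() drops those rows.
empty <- iris %>% filter(iris$Species == idx)
nrow(empty)

# iris$Petal.Length reaches past the (empty) piped data to the global iris,
# so the mean is taken over all 150 rows.
leaked <- empty %>% summarize(m = mean(iris$Petal.Length))

# A bare column name is looked up in the piped data, so the filter is respected.
scoped <- iris %>%
  filter(Species == "setosa") %>%
  summarize(m = mean(Petal.Length))
```

Here leaked$m is the overall mean from your output (3.758), while scoped$m is the setosa mean (1.462).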
As TJ Mahr mentioned, your code without specifying the dataset runs fine:
> for (i in unique(iris$Species))
+ {
+ iris %>% filter(Species==i) %>%
+ summarize(mean(Petal.Length)) %>% print()
+ print(i)
+ }
mean(Petal.Length)
1 1.462
[1] "setosa"
mean(Petal.Length)
1 4.26
[1] "versicolor"
mean(Petal.Length)
1 5.552
[1] "virginica"
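Since your real goal is to produce many output files, here is one possible sketch of the same loop that keeps each result and writes one CSV per species. The output folder (a temporary directory), the file name pattern and the column name mean_petal_length are assumptions made up for illustration, so adapt them to your own data.

```r
library(dplyr)

# Hypothetical output folder; a temporary directory keeps the sketch harmless.
out_dir <- tempdir()

# as.character() makes it explicit that i is a plain string on each pass.
species <- as.character(unique(iris$Species))
results <- vector("list", length(species))
names(results) <- species

for (i in species) {
  # Bare column names are resolved inside the piped data frame.
  results[[i]] <- iris %>%
    filter(Species == i) %>%
    summarize(mean_petal_length = mean(Petal.Length))

  # One CSV per species; the file name pattern is illustrative only.
  write.csv(results[[i]],
            file.path(out_dir, paste0("petal_length_", i, ".csv")),
            row.names = FALSE)
}

# Stack the per-species tables into one data frame, labelled by species.
all_means <- bind_rows(results, .id = "Species")
```

The same pattern should carry over to PDF output by opening a device with pdf() before plotting and closing it with dev.off() inside the loop.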
I hope this helps