How to write "for" loops in R using dplyr syntax


Problem description

I have an extensive block of code that I've written using dplyr syntax in R. However, I am trying to put that code in a loop, so that I can ultimately create multiple output files as opposed to just one. Unfortunately, I appear unable to do so.

For illustration purposes regarding my problem, let's refer to the commonly used "iris" dataset in R:

      > data("iris")
      > str(iris)
      'data.frame': 150 obs. of  5 variables:
      $ Sepal.Length: num  
      $ Sepal.Width : num  
      $ Petal.Length: num  
      $ Petal.Width : num  
      $ Species     : Factor w/ 3 levels "setosa","versicolor","virginica"

Let's say that I want to save the average Petal.Length of the species "versicolor". The dplyr code could look like the following:

    MeanLength2 <- iris %>% filter(Species=="versicolor") %>%
                       summarize(mean(Petal.Length)) %>% print()

Which would give the following value:

      mean(Petal.Length)
    1               4.26

Let's attempt to create a loop to get the average petal length for all of the species.

From what little I know of loops, I would want to do something like this:

     for (i in unique(iris$Species))
      {
       iris %>% filter(iris$Species==unique(iris$Species)[i]) %>%
        summarize(mean(iris$Petal.Length)) %>% print()
        print(i) 
       }

For some reason, I had to specify the data frame and the column inside the loop, which is generally not the case while using the piping functionality of dplyr. I'm assuming that this is indicative of the problem.

Anyways, the above code gives the following output:

          mean(iris$Petal.Length)
     1                   3.758
     [1] "setosa"
          mean(iris$Petal.Length)
     1                   3.758
     [1] "versicolor"
          mean(iris$Petal.Length)
     1                   3.758
     [1] "virginica"  

So the code is outputting 3.758 three times, which is the average petal length across all species in the dataset. This indicates that the "filter" code did not work as expected. From what I can tell, it appears that the loop itself functioned as intended, as all three unique species names were printed in the eventual output.

How can one go about doing something like this with for loops? I understand that this particular exercise does not require a loop, as one can easily get the average petal length of all the species by using, for example, the "group_by" function in dplyr. However, I am looking to output close to 100 unique tables and PDF files from the dataset that I am working with, and knowing how to use for loops would really help for that purpose.

Solution

It is unfortunate that your code didn't raise any errors. If you run your code line by line you'll understand what I'm saying. For this example I will take the first iteration of your loop and replace i with "setosa":

> iris  %>% filter(iris$Species == unique(iris$Species)["setosa"])
[1] Sepal.Length Sepal.Width  Petal.Length Petal.Width  Species     
<0 rows> (or 0-length row.names)

Your filter yields a data frame with no observations, so no point in going ahead, but for this example, let's run the rest of the code:

> iris  %>% filter(iris$Species == unique(iris$Species)["setosa"]) %>%  
+ summarize(mean(iris$Petal.Length))
  mean(iris$Petal.Length)
1                   3.758

What happened is that iris$Petal.Length refers to the whole iris data frame, not to the filtered data coming through the pipe. A more obvious example would be:

> filter(iris, iris$Species == unique(iris$Species)["setosa"]) %>% 
+ summarize(mean(mtcars$cyl))
  mean(mtcars$cyl)
1           6.1875

That's why you don't get the answer you expected: your filter didn't work, and you got a summary statistic from another dataset.
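
To see why the filter comes back empty, it may help to look at the pieces one at a time. The following is a minimal base R sketch (no dplyr needed); the names species, lookup, keep, and overall are only illustrative and do not appear in the original code.

```r
species <- unique(iris$Species)   # unnamed factor: setosa, versicolor, virginica

# An unnamed vector has no element called "setosa", so name-based
# indexing finds no match and returns NA.
lookup <- species["setosa"]
print(is.na(lookup))

# Comparing every row against NA yields NA, and filter() keeps only
# rows where the condition is TRUE, so nothing survives.
keep <- iris$Species == lookup
print(sum(keep, na.rm = TRUE))    # number of rows that would pass the filter

# mean(iris$Petal.Length) ignores any filtering and uses the full data frame.
overall <- mean(iris$Petal.Length)
print(overall)
```

Because i is already the species name as a string inside the loop, there is no need to index unique(iris$Species) with it at all.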

As TJ Mahr mentioned, your code runs fine if you don't specify the dataset inside the pipe:

> for (i in unique(iris$Species))
+ {
+     iris %>% filter(Species==i) %>%
+         summarize(mean(Petal.Length)) %>% print()
+     print(i) 
+ }
  mean(Petal.Length)
1              1.462
[1] "setosa"
  mean(Petal.Length)
1               4.26
[1] "versicolor"
  mean(Petal.Length)
1              5.552
[1] "virginica"
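
Since the end goal is one output file per group, the same loop could write each summary to disk instead of printing it. This is only a sketch, assuming dplyr is installed; the output folder, the file naming scheme, and the names out_dir, sp, tbl, path, and written are all hypothetical choices made for illustration.

```r
library(dplyr)

# Hypothetical output folder inside the session's temporary directory.
out_dir <- file.path(tempdir(), "species_tables")
dir.create(out_dir, showWarnings = FALSE)

written <- character(0)
for (sp in unique(as.character(iris$Species))) {
  # sp is a plain string, so Species == sp filters as intended.
  tbl <- iris %>%
    filter(Species == sp) %>%
    summarize(mean_petal_length = mean(Petal.Length))

  # One CSV file per species, named after the species.
  path <- file.path(out_dir, paste0(sp, "_mean_petal_length.csv"))
  write.csv(tbl, path, row.names = FALSE)
  written <- c(written, path)
}

print(basename(written))
```

The same pattern should carry over to PDF output by wrapping a plotting call between pdf(path) and dev.off() inside the loop body. One thing to watch for: the loop variable should not share a name with a column of the data frame, otherwise the column is used inside filter() instead of the loop value.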

I hope this helps
