Rowwise operation within a for loop using dplyr

Problem description

I have some transport data on which I would like to perform a rowwise if-comparison inside a for loop. The data looks something like this.

# Using the iris dataset 
> iris <- as.data.frame(iris)
> head(iris)
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
6          5.4         3.9          1.7         0.4  setosa

The result should record, within each species, the pairs of sepal lengths that share the same petal width (this is only an illustration with no scientific significance). It would look something like this:

Species Petal.Width Sepal.Length1 Sepal.Length2
setosa          0.2         5.1             4.9
setosa          0.2         5.1             4.7
setosa          0.2         4.9             4.7
setosa          0.2         5.1             4.6
...

My initial Python-ish thought was to perform a for loop within a for loop, looking something like this:

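# Nested loops: for each species, compare every pair of rows (i < j) within it.
# Output is grown one row at a time, which is simple but slow on large data.
Output <- data.frame()
for (s in levels(iris$Species)) {
  sub <- iris[iris$Species == s, ]
  for (i in 1:(nrow(sub) - 1)) {
    for (j in (i + 1):nrow(sub)) {
      if (sub$Petal.Width[i] == sub$Petal.Width[j]) {
        Output <- rbind(Output, data.frame(
          Species       = s,
          Petal.Width   = sub$Petal.Width[i],
          Sepal.Length1 = sub$Sepal.Length[i],
          Sepal.Length2 = sub$Sepal.Length[j]))
      }
    }
  }
}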

I had thought about using group_by() on Species first to replace the outer loop (for s in unique(Species)). But I don't know how to compare each observation with every other observation in the dataset, row by row, and store the matches in the form shown in the expected output above. I have seen questions about for loops in dplyr and about rowwise quantities. Apologies if the code above is not clear; this is my first time asking a question here.

Recommended answer

Using dplyr:

library(dplyr)

iris %>%
  group_by(Species, Petal.Width) %>%
  mutate(n = n()) %>%
  filter(n > 1) %>%
  mutate(Sepal.Length1 = Sepal.Length,
         Sepal.Length2 = Sepal.Length1 - Petal.Width) %>%
  arrange(Petal.Width) %>%
  select(Species, Petal.Width, Sepal.Length1, Sepal.Length2)

This groups the data by Species and Petal.Width, counts the rows in each group, keeps only the groups with more than one row, renames Sepal.Length to Sepal.Length1 (by copying it and then dropping the original in select()), and creates a new variable Sepal.Length2 = Sepal.Length1 - Petal.Width.
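Note that the code above does not actually pair rows: Sepal.Length2 is computed from the same row as Sepal.Length1. A minimal sketch that does produce the pairs from the expected output is a self-join on Species and Petal.Width. It assumes dplyr 1.1.0 or later (for the relationship argument of inner_join()); the helper column id and the names iris_id and pairs are illustrative, not from the original answer.

```r
library(dplyr)

# Number the rows so that each unordered pair can be kept exactly once.
iris_id <- iris %>% mutate(id = row_number())

pairs <- iris_id %>%
  # Match every row with every row that shares its Species and Petal.Width.
  # Overlapping columns get the suffixes "1" and "2" (Sepal.Length1, id2, ...).
  inner_join(iris_id, by = c("Species", "Petal.Width"),
             suffix = c("1", "2"), relationship = "many-to-many") %>%
  # id1 < id2 drops self-matches and the mirrored (j, i) duplicates.
  filter(id1 < id2) %>%
  select(Species, Petal.Width, Sepal.Length1, Sepal.Length2)

head(pairs)
```

The first row pairs rows 1 and 2 of iris (setosa, 0.2, 5.1 and 4.9), which matches the first line of the expected output. A self-join grows quadratically with group size; that is fine for iris but worth keeping in mind for larger data.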

To count, for each Species, how often each Sepal.Length value occurs within defined Petal.Width ranges:

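# Petal.Width must be taken from iris; it is not a free-standing variable.
minpw <- min(iris$Petal.Width)
maxpw <- max(iris$Petal.Width)

iris %>%
  group_by(Sepal.Length, Species,
           # include.lowest = TRUE keeps the minimum width out of the NA bin
           petal_width_range = cut(Petal.Width,
                                   breaks = seq(minpw, maxpw, by = 0.2),
                                   include.lowest = TRUE)) %>%
  summarise(count = n())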
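Following the question's original idea of letting group_by() stand in for the outer loop, another sketch enumerates the pairs inside each Species and Petal.Width group with base R's combn(). It assumes dplyr 1.1.0 or later, where reframe() may return several rows per group; the name pairs2 is illustrative.

```r
library(dplyr)

pairs2 <- iris %>%
  # One group per (Species, Petal.Width): this replaces the outer loop.
  group_by(Species, Petal.Width) %>%
  # combn() errors when a group has fewer than two rows, so drop those groups.
  filter(n() > 1) %>%
  # combn(n(), 2) is a matrix with one column per pair of row indices within
  # the group: row 1 holds the first member of each pair, row 2 the second.
  reframe(
    Sepal.Length1 = Sepal.Length[combn(n(), 2)[1, ]],
    Sepal.Length2 = Sepal.Length[combn(n(), 2)[2, ]]
  )

head(pairs2)
```

This builds only the unordered pairs (i < j) directly, so no row id or filter on row numbers is needed.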
