Rowwise operation within for loop using dplyr
Question
I have some transport data on which I would like to perform a rowwise if comparison within a for loop. The data looks something like this.
# Using the iris dataset
> iris <- as.data.frame(iris)
> head(iris)
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
6          5.4         3.9          1.7         0.4  setosa
The result should record, within each species, every pair of sepal lengths whose rows share the same petal width (this is only an illustration with no scientific significance). It would yield something like this:
Species  Petal.Width  Sepal.Length1  Sepal.Length2
setosa           0.2            5.1            4.9
setosa           0.2            5.1            4.7
setosa           0.2            4.9            4.7
setosa           0.2            5.1            4.6
...
My initial Python-ish thought was to perform a for loop within a for loop, looking something like this:
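# Nested-loop version of the idea, written as valid R
output <- data.frame()
for (s in as.character(unique(iris$Species))) {
  sub <- iris[iris$Species == s, ]            # rows of one species
  for (i in seq_len(nrow(sub))) {
    for (j in seq_len(nrow(sub))) {
      # i < j keeps each pair once and skips a row paired with itself
      if (i < j && sub$Petal.Width[i] == sub$Petal.Width[j]) {
        output <- rbind(output, data.frame(
          Species       = s,
          Petal.Width   = sub$Petal.Width[i],
          Sepal.Length1 = sub$Sepal.Length[i],
          Sepal.Length2 = sub$Sepal.Length[j]))
      }
    }
  }
}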
I had thought about using group_by to group by Species first, which would replace the outer loop for s in unique(Species). But I don't know how to compare each observation against the others row by row and store the result as in the second block of code. I have seen questions on for loops in dplyr and on rowwise quantities. My apologies if the code above is not clear; this is my first time asking a question here.
Recommended answer
Using dplyr:
library(dplyr)
iris %>%
  group_by(Species, Petal.Width) %>%
  mutate(n = n()) %>%
  filter(n > 1) %>%
  mutate(Sepal.Length1 = Sepal.Length,
         Sepal.Length2 = Sepal.Length1 - Petal.Width) %>%
  arrange(Petal.Width) %>%
  select(Species, Petal.Width, Sepal.Length1, Sepal.Length2)
This groups by Species and Petal.Width, counts the rows in each group, keeps only the groups with more than one row, then copies Sepal.Length to Sepal.Length1 and creates a new variable Sepal.Length2 = Sepal.Length1 - Petal.Width.
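The pipeline above computes a difference per row rather than listing pairs of rows. If the goal is the pairwise table from the question, one possible approach (a sketch, not part of the original answer) is to join the data to itself on Species and Petal.Width. The names flowers, pairs and row_id are made up for this illustration, and recent dplyr versions may warn about a many-to-many join, which is expected here.

```r
library(dplyr)

# Tag each row so that a pair can be ordered and self-matches dropped
# (row_id is a helper column introduced only for this sketch)
flowers <- iris %>%
  mutate(row_id = row_number()) %>%
  select(row_id, Species, Petal.Width, Sepal.Length)

# Self-join: every row meets every row with the same Species and Petal.Width;
# the suffixes turn the duplicated columns into Sepal.Length1 / Sepal.Length2
pairs <- inner_join(flowers, flowers,
                    by = c("Species", "Petal.Width"),
                    suffix = c("1", "2")) %>%
  filter(row_id1 < row_id2) %>%   # keep each unordered pair once
  select(Species, Petal.Width, Sepal.Length1, Sepal.Length2)

head(pairs)
```

The filter on row_id1 < row_id2 plays the same role as i < j in a nested loop: it removes a row matched with itself and the mirrored copy of each pair.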
To record Sepal.Length for each Species within a defined range of Petal.Width:
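# Petal.Width is a column of iris, so it must be referenced through the data frame
minpw <- min(iris$Petal.Width)
maxpw <- max(iris$Petal.Width)
iris %>%
  # include.lowest = TRUE keeps the minimum value inside the first bin
  group_by(Sepal.Length, Species,
           petal_width_range = cut(Petal.Width,
                                   breaks = seq(minpw, maxpw, by = 0.2),
                                   include.lowest = TRUE)) %>%
  summarise(count = n())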
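As a small illustration of how cut assigns values to ranges (a sketch with made-up values, not taken from the answer): by default the intervals are open on the left, so a value equal to the lowest break falls outside every bin unless include.lowest = TRUE is passed.

```r
# Three hypothetical petal widths and breaks that start at the smallest one
widths <- c(0.1, 0.3, 0.5)
breaks <- c(0.1, 0.3, 0.5)

open_low <- cut(widths, breaks = breaks)                         # first value becomes NA
closed   <- cut(widths, breaks = breaks, include.lowest = TRUE)  # first value lands in bin 1

print(open_low)
print(closed)
```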