Create new variable in R data frame by conditional lookup

Problem description

I want to create a new variable in an R data frame by using an existing column as a lookup value on another column in the same table. For example, in the following data frame:

df = data.frame(
  pet = c("smalldog", "mediumdog", "largedog",
             "smallcat", "mediumcat", "largecat"),
  numPets = c(1, 2, 3, 4, 5, 6)
  )

> df

        pet numPets
1  smalldog       1
2 mediumdog       2
3  largedog       3
4  smallcat       4
5 mediumcat       5
6  largecat       6

I want to create a new column called numEnemies. It should be zero for small animals. For medium and large animals, it should equal the number of animals of the same size but of the other species. I want to end up with this:

        pet numPets numEnemies
1  smalldog       1          0
2 mediumdog       2          5
3  largedog       3          6
4  smallcat       4          0
5 mediumcat       5          2
6  largecat       6          3

I was attempting to do this by using conditional logic to generate a character variable that I could then use to look up the final value from the same data frame. This got me here:

calculateEnemies <- function(df) {
  ifelse(grepl('small', df$pet), 0,
         ifelse(grepl('dog', df$pet), gsub('dog', 'cat', df$pet),
                ifelse(grepl('cat', df$pet),
                       gsub('cat', 'dog', df$pet), NA)))
}

df$numEnemies <- calculateEnemies(df)

> df

        pet numPets numEnemies
1  smalldog       1          0
2 mediumdog       2  mediumcat
3  largedog       3   largecat
4  smallcat       4          0
5 mediumcat       5  mediumdog
6  largecat       6   largedog

I want to modify this function so that it uses the newly generated string to look up the value in df$numPets for the row whose df$pet matches that string. I'm also open to a better approach that generalizes.
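A minimal base-R sketch of the lookup described above: build the "partner" name (same size, other species), then use `match()` to turn each partner name into a row position in `df$pet` and index `df$numPets` with it. It assumes, as in the example data, that the only species are dog and cat; the variable name `partner` is illustrative only.

```r
# Same example data as in the question
df <- data.frame(
  pet = c("smalldog", "mediumdog", "largedog",
          "smallcat", "mediumcat", "largecat"),
  numPets = c(1, 2, 3, 4, 5, 6),
  stringsAsFactors = FALSE
)

# Partner name: same size, other species (assumes only "dog" and "cat")
partner <- ifelse(grepl("dog", df$pet),
                  sub("dog", "cat", df$pet),
                  sub("cat", "dog", df$pet))

# match() gives the row index of each partner name in df$pet,
# so indexing numPets with it performs the lookup
df$numEnemies <- df$numPets[match(partner, df$pet)]

# Small animals have no enemies by definition
df$numEnemies[grepl("^small", df$pet)] <- 0

df
```

Because `match()` works on names rather than row order, the rows can be in any order.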

Recommended answer

Here I would use data.table:

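library(data.table)
# Group by size (medium/large) and swap the two values within each group
setDT(df)[, numEnemies := rev(numPets), by = sub(".*(large|medium).*", "\\1", pet)]
# Small animals have no enemies (0 rather than 0L, since numPets is double)
df[grep("^small", pet), numEnemies := 0]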
#          pet numPets numEnemies
# 1:  smalldog       1          0
# 2: mediumdog       2          5
# 3:  largedog       3          6
# 4:  smallcat       4          0
# 5: mediumcat       5          2
# 6:  largecat       6          3

What I basically did was first group the whole data set into medium and large animals and reverse the values within each group. Small rows do not match the regex, so each keeps its full name as its own one-row group and is left unchanged. Then I assigned 0 to numEnemies for every row where grep("^small", pet) matches.
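To make the two steps concrete, this sketch prints the grouping key that the `by =` expression produces and reproduces the reverse-within-group step in base R with `stats::ave()`. Using `ave()` is a substitution for data.table, chosen only so the sketch runs without extra packages. It is not the answer's code.

```r
pet <- c("smalldog", "mediumdog", "largedog",
         "smallcat", "mediumcat", "largecat")
numPets <- c(1, 2, 3, 4, 5, 6)

# Grouping key: "medium"/"large" for those rows; small rows don't match
# the regex, so sub() returns their full name (one group per small row)
key <- sub(".*(large|medium).*", "\\1", pet)
print(key)

# Step 1: reverse numPets within each group (one-row groups are unchanged)
numEnemies <- ave(numPets, key, FUN = rev)

# Step 2: overwrite the small rows with 0
numEnemies[grep("^small", pet)] <- 0
print(numEnemies)
```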

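This should be efficient, and you don't need to know the species names a priori. Note that it assumes each medium or large group contains exactly two species, because rev() simply swaps the two values in each group, and that the size names are the ones hard-coded in the regex.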
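If a size group could contain more than two species, one possible generalization is "total for this size minus my own count". It equals the other species' count when there are two species and the sum of all other species when there are more. The sketch below assumes one row per pet type and that every name starts with `small`, `medium` or `large`. `count_enemies` is a hypothetical helper, not part of the answer above.

```r
# Hypothetical helper: same-size animals of every *other* species.
# Assumes one row per pet type and names starting with the size.
count_enemies <- function(pet, numPets) {
  size <- sub("^(small|medium|large).*", "\\1", pet)
  # Within each size group, group total minus own count
  others <- ave(numPets, size, FUN = function(x) sum(x) - x)
  # Small animals are defined to have no enemies
  ifelse(size == "small", 0, others)
}

df <- data.frame(
  pet = c("smalldog", "mediumdog", "largedog",
          "smallcat", "mediumcat", "largecat"),
  numPets = c(1, 2, 3, 4, 5, 6),
  stringsAsFactors = FALSE
)
df$numEnemies <- count_enemies(df$pet, df$numPets)
df
```

With a third medium species, e.g. `count_enemies(c("smalldog", "mediumdog", "mediumcat", "mediumbird"), c(1, 2, 5, 7))`, each medium animal gets the sum of the other two medium counts.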
