R ifelse() evaluates a condition and returns match


Question

I have a data frame

countryname <- c("Viet Nam", "Viet Nam", "Viet Nam", "Viet Nam", "Viet Nam")
year <- c(1974, 1975, 1976, 1977,1978)

df <- data.frame(countryname, year)

that is in a long country-by-year format.

I would like to create a function that standardizes country names conditional on the year of the observation. I created a function that pulls from a data frame cnames and standardizes names, but it is only useful for cross-sections, where country names do not vary over time.

country <- c("Vietnam, North", "Vietnam, N.", "Vietnam North", "Viet Nam", "Democratic Republic Of Vietnam")
standardize <- c("Vietnam, Democratic Republic of", "Vietnam, Democratic Republic of", "Vietnam, Democratic Republic of", "Vietnam, Democratic Republic of", "Vietnam, Democratic Republic of")
panel <- c("Vietnam", "Vietnam","Vietnam","Vietnam","Vietnam")
time <- c(1976,1976,1976,1976,1976)

cnames <- data.frame(country, standardize, panel, time)

The function that standardizes the names is

country_name <- function(x) {
   return(cnames[match(x,cnames$country),]$standardize)
}
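For illustration, a call to this lookup might behave as sketched below. The block repeats the relevant columns of cnames so it runs on its own, and "Atlantis" is a made-up input (not in the original data) used to show that match() yields NA for a name with no entry.

```r
# Self-contained copy of the lookup table and function from the question.
cnames <- data.frame(
  country     = c("Vietnam, North", "Vietnam, N.", "Vietnam North", "Viet Nam",
                  "Democratic Republic Of Vietnam"),
  standardize = "Vietnam, Democratic Republic of",  # recycled to all rows
  stringsAsFactors = FALSE
)

country_name <- function(x) {
  # match() gives the row of cnames for each element of x (NA if absent)
  return(cnames[match(x, cnames$country), ]$standardize)
}

# "Atlantis" is a hypothetical name with no entry, so its result is NA
result <- country_name(c("Viet Nam", "Vietnam, N.", "Atlantis"))
print(result)
```

Because match() is vectorized, the function handles a whole column at once, but it has no way to take the observation year into account.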

However, as you can see, this doesn't account for any variation in country names over time. I've tried a number of different things, and the closest I've come is this function.

country_panel <- function(x, y) {

  ifelse(cnames$time < y, 
    return(cnames[match(x, cnames$country),]$panel),
    return(cnames[match(x, cnames$country),]$standardize)
  )
}

I use a dplyr chain to pull in the data frame and then use mutate to create a new variable that ideally captures the difference in names for countries.

library(dplyr)  # provides %>% and mutate()

d1 <- df %>%
    mutate(new_name = country_panel(countryname, year))

The problem I'm finding is that the function evaluates y in country_panel as a single object, not as a vector. If I input an integer that is greater or less than cnames$time, it evaluates correctly but passes the same value to every row.

How can I have this function evaluate each cnames$country and cnames$time relationship to df$year and return the correct cnames$panel or cnames$standardize?

Thanks for your help.

Recommended answer

d1
#   countryname year                        new_name
# 1    Viet Nam 1974 Vietnam, Democratic Republic of
# 2    Viet Nam 1975 Vietnam, Democratic Republic of
# 3    Viet Nam 1976 Vietnam, Democratic Republic of
# 4    Viet Nam 1977                         Vietnam
# 5    Viet Nam 1978                         Vietnam

All you need to do is make sure your data frames are created with stringsAsFactors = FALSE when you define them, e.g. df <- data.frame(countryname, year, stringsAsFactors = FALSE). This is already the default from R 4.0.0 onward. Otherwise the name columns are factors, and ifelse() returns their integer codes rather than the names. Then take out the return calls:

country_panel <- function(x, y) {
  ifelse(cnames$time < y, 
    cnames[match(x, cnames$country),]$panel,
    cnames[match(x, cnames$country),]$standardize
  )
}
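One caveat worth noting: the condition cnames$time < y compares the rows of cnames with y by position, not by the row that match() found. That is harmless in this example because every time is 1976 and both data frames have five rows. A minimal sketch of a variant that reads the time from the matched row is shown below; the name country_panel_matched is hypothetical and is not part of the original answer.

```r
# Self-contained copy of the lookup table from the question.
cnames <- data.frame(
  country     = c("Vietnam, North", "Vietnam, N.", "Vietnam North", "Viet Nam",
                  "Democratic Republic Of Vietnam"),
  standardize = "Vietnam, Democratic Republic of",
  panel       = "Vietnam",
  time        = 1976,
  stringsAsFactors = FALSE
)

# Hypothetical variant: find the matched row once, then compare that
# row's time with the observation year.
country_panel_matched <- function(x, y) {
  idx <- match(x, cnames$country)  # row of cnames for each element of x
  ifelse(cnames$time[idx] < y,
         cnames$panel[idx],
         cnames$standardize[idx])
}

# Input length no longer has to equal nrow(cnames)
matched <- country_panel_matched(c("Viet Nam", "Viet Nam", "Vietnam, N."),
                                 c(1975, 1977, 1980))
print(matched)
```

With this form the result always has one element per input row, and lookup tables whose time differs by country are compared against the correct row.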

The reason is that return() exits country_panel as soon as it is evaluated. ifelse() evaluates its yes argument as soon as any element of the test is TRUE, so the function handed back the whole panel lookup for every row instead of choosing between panel and standardize row by row. That's why the values were all the same.
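The effect of return() can be reproduced with a pair of toy functions. This is only an illustrative sketch: with_return and without_return are hypothetical names, and the "big"/"small" labels stand in for the panel and standardize lookups.

```r
# Toy version of the original function: return() inside ifelse()
with_return <- function(y) {
  ifelse(y > 2, return("big"), return("small"))
}

# Toy version of the corrected function: plain values inside ifelse()
without_return <- function(y) {
  ifelse(y > 2, "big", "small")
}

r1 <- with_return(1:4)     # exits early with a single value
r2 <- without_return(1:4)  # one value per element of y
print(r1)
print(r2)
```

Inside mutate(), a single returned value (or one whole lookup vector) fills the new column with the same branch for every row, which matches the behaviour described in the question.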
