Subsetting in R using an OR condition with strings

Question

I have a data frame with about 40 columns. The second column, data[2], contains the name of the company that the rest of the row describes. However, the company names differ by year (a trailing 09 for 2009 data, nothing for 2010).

I would like to subset the data so that I can pull in both years at once. Here is an example of what I'm trying to do:

subset(data, data[2] == "Company Name 09" | "Company Name", drop = T) 

Essentially, I'm having difficulty using the OR operator within the subset function.

I have also tried an alternative:

subset(data, data[[2]] == grep("Company Name", data[[2]]))

Perhaps there's an easier way to do it using a string function?

Any thoughts would be appreciated.

Recommended answer

First of all (as Jonathan noted in his comment), to reference the second column you should use either data[[2]] or data[,2]; data[2] returns a one-column data frame rather than a vector. If you are using subset, you can also refer to the column by name: subset(data, CompanyName == ...).
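To make the difference between the indexing forms concrete, here is a minimal sketch on a toy data frame. The data and the column names (id, CompanyName) are made up for illustration and are not from the original question.

```r
# Hypothetical toy data; column names are assumptions for illustration only.
data <- data.frame(
  id = 1:4,
  CompanyName = c("Company Name 09", "Company Name", "Other Co 09", "Other Co"),
  stringsAsFactors = FALSE
)

class(data[2])    # a one-column data frame
class(data[[2]])  # the column as a plain character vector
class(data[, 2])  # the same vector, via matrix-style indexing

# Inside subset() the column can be referred to by its name
one_year <- subset(data, CompanyName == "Company Name")
```

Comparisons such as == and %in% are meant to operate on a vector, which is why data[[2]] or the column name is the safer choice inside subset.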

As for your question, I would do one of the following:

subset(data, data[[2]] %in% c("Company Name 09", "Company Name"), drop = TRUE) 
subset(data, grepl("^Company Name", data[[2]]), drop = TRUE)

The second uses grepl (introduced in R 2.9), which returns a logical vector that is TRUE wherever the pattern matches.
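The two approaches can be tried side by side on a small example. This is only a sketch: the data frame and the column name CompanyName are hypothetical. It also shows an explicit OR, which works once each side of | is a complete comparison; the attempt in the question fails because the right-hand side of | is a bare string rather than a logical expression.

```r
# Hypothetical toy data; column names are assumptions for illustration only.
data <- data.frame(
  id = 1:5,
  CompanyName = c("Company Name 09", "Company Name", "Other Co 09",
                  "Other Co", "Company Name"),
  stringsAsFactors = FALSE
)

# Exact match against a set of allowed names
by_set <- subset(data, CompanyName %in% c("Company Name 09", "Company Name"))

# Pattern match: any name that starts with "Company Name"
by_pattern <- subset(data, grepl("^Company Name", CompanyName))

# Explicit OR: each side of | must be a full comparison
by_or <- subset(data,
                CompanyName == "Company Name 09" | CompanyName == "Company Name")
```

On this data all three calls select the same rows. They can differ on real data: the grepl pattern also matches any longer name that begins with "Company Name", while %in% only accepts the exact strings listed.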
