Compare character in column with string in another column

Question

For some days now I have been trying to find a way to subset my data frame by comparing a character in one column with a string in another column.

If the character is not in the string, I want to copy a value to a new column. I have searched high and low and tried many examples, but for some reason I cannot get it to work on my data frame.

    df <- structure(list(POLY = c("K3", "K3", "K3", "K4", "K4", "K4", "K4", 
    "K6", "K6", "K7", "K7", "K7", "L1", "L1", "L1"), FIX = c("O", 
    "K", "M", "M", "K", "O", "L", "K", "M", "K", "O", "M", "M", "L", 
    "O"), SESSTIME = c(310, 190, 181, 188, 151, 260, 268, 200, 259, 
    245, 180, 188, 259, 199, 244), CODE = c("KO", "KO", "KO", "KM", 
    "KM", "KM", "KM", "KM", "KM", "KO", "KO", "KO", "LMO", "LMO", 
    "LMO")), .Names = c("POLY", "FIX", "SESSTIME", "CODE"), row.names = c(42L, 
    44L, 46L, 115L, 116L, 117L, 133L, 225L, 231L, 269L, 270L, 328L, 
    420L, 425L, 431L), class = "data.frame")

Here is part of it:

        row.names  POLY  FIX  SESSTIME  CODE  SESSTIME2
    1   42         K3    O    310       KO    NA
    2   44         K3    K    190       KO    NA
    3   46         K3    M    181       KO    ...
    4   115        K4    M    188       KM
    5   116        K4    K    151       KM
    6   117        K4    O    260       KM    NA
    7   133        K4    L    268       KM    268
    8   225        K6    K    200       KM    NA
    9   231        K6    M    259       KM
    10  269        K7    K    245       KO
    11  270        K7    O    180       KO
    12  328        K7    M    188       KO    188
    13  420        L1    M    259       LMO
    14  425        L1    L    199       LMO
    15  431        L1    O    244       LMO

So when FIX is not in CODE, the value of SESSTIME should be copied to SESSTIME2 (a column already prepopulated with NA).

I tried, for example:

    df$FIX %in% strsplit(as.character(df$CODE), "")

or similar, but the comparison is always TRUE.

All the examples I found only covered (and worked for) the comparison of a single hardcoded character, e.g. "K", against a vector such as c("K","L","M"), but never showed how to apply this across data frame columns and rows.

I'm a bit stuck...

Does anyone have an idea what I'm doing wrong?

Update:

Thanks to the answer below, my code now looks like this and does what I need:

    # TRUE where FIX does not occur in CODE and SESSTIME2 is still NA
    idx <- !mapply(function(i, j) length(grep(i, j)), df$FIX, df$CODE) & is.na(df$SESSTIME2)
    # Copy SESSTIME into SESSTIME2 for those rows only
    df$SESSTIME2[idx] <- df$SESSTIME[idx]

Recommended answer

The reason your code doesn't work is that

    strsplit(as.character(df$CODE), "")

returns a list, so %in% tests each FIX value against the list as a whole rather than against the CODE of the same row. Instead, you need to use mapply to detect, row by row, whether there is a match.
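To make the difference concrete, here is a minimal sketch on two made-up rows. The vectors `fix` and `code` are illustrative stand-ins for `df$FIX` and `df$CODE`, not the question's data.

```r
# Illustrative vectors standing in for df$FIX and df$CODE (assumed values)
fix  <- c("O", "M")
code <- c("KO", "KO")

# strsplit() returns a list with one character vector per row
parts <- strsplit(code, "")
is.list(parts)

# %in% checks each value of fix against the list as a whole,
# not against the list element that belongs to the same row
whole <- fix %in% parts

# mapply() walks both inputs in parallel, one row at a time
rowwise <- unname(mapply(function(f, p) f %in% p, fix, parts))
rowwise   # TRUE for "O" in "KO", FALSE for "M" in "KO"
```

Only the row-wise result reflects whether each FIX character appears in its own CODE string.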

Here we use grep, which allows more flexible character matching:

    # The values of FIX and CODE are passed to i and j
    mapply(function(i, j) length(grep(i, j)), df$FIX, df$CODE)
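One thing to keep in mind with grep is that it treats its first argument as a regular expression. The sketch below is a possible variant, not part of the original answer: it uses grepl, which returns TRUE/FALSE directly, with fixed = TRUE so the character is matched literally. The vectors `fix` and `code` are made up for illustration.

```r
# Illustrative values; "." stands in for a character with a special regex meaning
fix  <- c("K", "L", ".")
code <- c("KM", "KM", "KM")

# grepl() returns TRUE/FALSE directly; fixed = TRUE matches i literally
in_code <- unname(mapply(function(i, j) grepl(i, j, fixed = TRUE), fix, code))
in_code

# Without fixed = TRUE, "." is a regex wildcard and matches any character
as_regex <- unname(mapply(function(i, j) grepl(i, j), fix, code))
as_regex
```

For plain letters such as those in FIX, both forms should agree; the difference only matters if a value could contain regex metacharacters.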

Or using %in%:

    ## Suggested by akrun
    mapply('%in%', df$FIX, strsplit(as.character(df$CODE), ''))
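Putting the pieces together, the following is a rough end-to-end sketch of filling SESSTIME2. The data frame `d` is a small made-up example with the same column names as in the question, and the helper vectors `in_code` and `fill` are names introduced here for illustration.

```r
# Small made-up data frame with the same columns as in the question
d <- data.frame(
  FIX       = c("O", "K", "M", "L"),
  SESSTIME  = c(310, 190, 181, 268),
  CODE      = c("KO", "KO", "KO", "KM"),
  SESSTIME2 = NA_real_,
  stringsAsFactors = FALSE
)

# TRUE where the FIX character occurs among the characters of CODE
in_code <- unname(mapply(`%in%`, d$FIX, strsplit(as.character(d$CODE), "")))

# Copy SESSTIME only where FIX is absent from CODE and SESSTIME2 is still NA
fill <- !in_code & is.na(d$SESSTIME2)
d$SESSTIME2[fill] <- d$SESSTIME[fill]
d
```

Computing the logical index once and reusing it on both sides of the assignment keeps the two subsets aligned and avoids running mapply twice.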
