Compare character in column with string in another column
Question
For some days now I have been trying to find a way to subset my data frame by comparing a character in one column with a string in another column.
If the character is not within the string, I want to copy a value to a new column. I have searched high and low and tried many examples, but for some reason I cannot get it to work on my data frame.
df <- structure(list(POLY = c("K3", "K3", "K3", "K4", "K4", "K4", "K4",
"K6", "K6", "K7", "K7", "K7", "L1", "L1", "L1"), FIX = c("O",
"K", "M", "M", "K", "O", "L", "K", "M", "K", "O", "M", "M", "L",
"O"), SESSTIME = c(310, 190, 181, 188, 151, 260, 268, 200, 259,
245, 180, 188, 259, 199, 244), CODE = c("KO", "KO", "KO", "KM",
"KM", "KM", "KM", "KM", "KM", "KO", "KO", "KO", "LMO", "LMO",
"LMO")), .Names = c("POLY", "FIX", "SESSTIME", "CODE"), row.names = c(42L,
44L, 46L, 115L, 116L, 117L, 133L, 225L, 231L, 269L, 270L, 328L,
420L, 425L, 431L), class = "data.frame")
This is a part of it:
row.names POLY FIX SESSTIME CODE SESSTIME2
1 42 K3 O 310 KO NA
2 44 K3 K 190 KO NA
3 46 K3 M 181 KO ...
4 115 K4 M 188 KM
5 116 K4 K 151 KM
6 117 K4 O 260 KM NA
7 133 K4 L 268 KM 268
8 225 K6 K 200 KM NA
9 231 K6 M 259 KM
10 269 K7 K 245 KO
11 270 K7 O 180 KO
12 328 K7 M 188 KO 188
13 420 L1 M 259 LMO
14 425 L1 L 199 LMO
15 431 L1 O 244 LMO
So when FIX is not in CODE, the value of SESSTIME should be copied to SESSTIME2 (the column is already prepopulated with NA).
I tried, for example,
df$FIX %in% strsplit(as.character(df$CODE,""))
or similar, but the comparison is always TRUE.
All the examples I found only compared a single hardcoded character, e.g. "K", against a vector such as c("K","L","M"). Those worked, but none of them showed how to apply the comparison across data frame columns and rows.
I'm a bit on edge by now...
Does anyone have an idea what I'm doing wrong?
Update:
Thanks to the answer below, my code now looks like this and does what I need:
df$SESSTIME2[!(mapply(function(i, j) length(grep(i, j)), df$FIX, df$CODE)) & is.na(df$SESSTIME2)] <-
  df$SESSTIME[!(mapply(function(i, j) length(grep(i, j)), df$FIX, df$CODE)) & is.na(df$SESSTIME2)]
Recommended answer
The reason your code doesn't work is that
strsplit(as.character(df$CODE,""))
returns a list, and %in% does not compare its two arguments row by row. Instead, you need to use mapply to detect whether there is a match in each row.
Here we use grep, which allows more flexible character matching:
# The values of FIX & CODE are passed to i and j
mapply(function(i, j) length(grep(i, j)), df$FIX, df$CODE)
Or using %in%:
## Suggested by akrun
mapply('%in%', df$FIX,strsplit(as.character(df$CODE), ''))
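To tie the row-wise match back to the original goal of filling SESSTIME2, here is a minimal sketch on a small made-up sample (not the full data from the question). It swaps grep for grepl with fixed = TRUE, which is an assumption on my part: it returns a logical directly and treats FIX as literal text instead of a regular expression. The helper names in_code and fill are purely illustrative.

```r
# Small made-up sample with the same column layout as the question's data
df <- data.frame(
  FIX      = c("O", "K", "M", "L"),
  SESSTIME = c(310, 190, 181, 268),
  CODE     = c("KO", "KO", "KO", "KM"),
  stringsAsFactors = FALSE
)
df$SESSTIME2 <- NA_real_  # prepopulated with NA, as in the question

# TRUE where the FIX character occurs in the CODE string of the same row;
# fixed = TRUE makes grepl match FIX literally rather than as a regex
in_code <- unname(mapply(grepl, df$FIX, df$CODE, MoreArgs = list(fixed = TRUE)))

# Copy SESSTIME only where FIX is not in CODE and SESSTIME2 is still NA
fill <- !in_code & is.na(df$SESSTIME2)
df$SESSTIME2[fill] <- df$SESSTIME[fill]

print(df)
```

Computing the logical index once and reusing it on both sides of the assignment avoids running mapply twice and keeps the two sides from drifting apart.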