Create new column from an existing column with pattern matching in R
Problem description
I'm trying to add a new column based on another using pattern matching. I've read this post, but I'm not getting the desired output.
I want to create a new column (SubOrder) based on the GreatGroup column. I have tried the following:
SubOrder <- rep(NA_character_, length(myData))
SubOrder[grepl("udults", myData, ignore.case = TRUE)] <- "Udults"
SubOrder[grepl("aquults", myData, ignore.case = TRUE)] <- "Aquults"
SubOrder[grepl("aqualfs", myData, ignore.case = TRUE)] <- "aqualfs"
SubOrder[grepl("humods", myData, ignore.case = TRUE)] <- "humods"
SubOrder[grepl("udalfs", myData, ignore.case = TRUE)] <- "udalfs"
SubOrder[grepl("orthods", myData, ignore.case = TRUE)] <- "orthods"
SubOrder[grepl("udalfs", myData, ignore.case = TRUE)] <- "udalfs"
SubOrder[grepl("psamments", myData, ignore.case = TRUE)] <- "psamments"
SubOrder[grepl("udepts", myData, ignore.case = TRUE)] <- "udepts"
SubOrder[grepl("fluvents", myData, ignore.case = TRUE)] <- "fluvents"
SubOrder[grepl("aquods", myData, ignore.case = TRUE)] <- "aquods"
For example, I'm looking for "udults" inside any word, such as Hapludults or Paleudults, and return just "udults".
EDIT: If anyone wants to take a shot at alistaire's comment, these are the search patterns I would use.
subOrderNames <- c("Udults", "Aquults", "Aqualfs", "Humods", "Udalfs", "Orthods", "Psamments", "Udepts", "fluvents")
Example data below.
myData <- dput(head(test))
structure(list(1:6, SID = c(200502L, 200502L, 200502L, 200502L,
200502L, 200502L), Groupdepth = c(11L, 12L, 13L, 14L, 21L, 22L
), AWC0to10 = c(0.12, 0.12, 0.12, 0.12, 0.12, 0.12), AWC10to20 = c(0.12,
0.12, 0.12, 0.12, 0.12, 0.12), AWC20to50 = c(0.12, 0.12, 0.12,
0.12, 0.12, 0.12), AWC50to100 = c(0.15, 0.15, 0.15, 0.15, 0.15,
0.15), Db3rdbar0to10 = c(1.43, 1.43, 1.43, 1.43, 1.43, 1.43),
Db3rdbar10to20 = c(1.43, 1.43, 1.43, 1.43, 1.43, 1.43), Db3rdbar20to50 = c(1.43,
1.43, 1.43, 1.43, 1.43, 1.43), Db3rdbar50to100 = c(1.43,
1.43, 1.43, 1.43, 1.43, 1.43), HydrcRatngPP = c(0L, 0L, 0L,
0L, 0L, 0L), OrgMatter0to10 = c(1.25, 1.25, 1.25, 1.25, 1.25,
1.25), OrgMatter10to20 = c(1.25, 1.25, 1.25, 1.25, 1.25,
1.25), OrgMatter20to50 = c(1.02, 1.02, 1.02, 1.02, 1.02,
1.02), OrgMatter50to100 = c(0.12, 0.12, 0.12, 0.12, 0.12,
0.12), Clay0to10 = c(8, 8, 8, 8, 8, 8), Clay10to20 = c(8,
8, 8, 8, 8, 8), Clay20to50 = c(9.4, 9.4, 9.4, 9.4, 9.4, 9.4
), Clay50to100 = c(40, 40, 40, 40, 40, 40), Sand0to10 = c(85,
85, 85, 85, 85, 85), Sand10to20 = c(85, 85, 85, 85, 85, 85
), Sand20to50 = c(83, 83, 83, 83, 83, 83), Sand50to100 = c(45.8,
45.8, 45.8, 45.8, 45.8, 45.8), pHwater0to20 = c(6.3, 6.3,
6.3, 6.3, 6.3, 6.3), Ksat0to10 = c(23, 23, 23, 23, 23, 23
), Ksat10to20 = c(23, 23, 23, 23, 23, 23), Ksat20to50 = c(19.7333,
19.7333, 19.7333, 19.7333, 19.7333, 19.7333), Ksat50to100 = c(9,
9, 9, 9, 9, 9), TaxClName = c("Fine, mixed, semiactive, mesic Oxyaquic Hapludults",
"Fine, mixed, semiactive, mesic Oxyaquic Hapludults", "Fine, mixed, semiactive, mesic Oxyaquic Hapludults",
"Fine, mixed, semiactive, mesic Oxyaquic Hapludults", "Fine, mixed, semiactive, mesic Oxyaquic Hapludults",
"Fine, mixed, semiactive, mesic Oxyaquic Hapludults"), GreatGroup = c("Hapludults",
"Hapludults", "Hapludults", "Hapludults", "Hapludults", "Hapludults"
)), .Names = c("", "SID", "Groupdepth", "AWC0to10", "AWC10to20",
"AWC20to50", "AWC50to100", "Db3rdbar0to10", "Db3rdbar10to20",
"Db3rdbar20to50", "Db3rdbar50to100", "HydrcRatngPP", "OrgMatter0to10",
"OrgMatter10to20", "OrgMatter20to50", "OrgMatter50to100", "Clay0to10",
"Clay10to20", "Clay20to50", "Clay50to100", "Sand0to10", "Sand10to20",
"Sand20to50", "Sand50to100", "pHwater0to20", "Ksat0to10", "Ksat10to20",
"Ksat20to50", "Ksat50to100", "TaxClName", "GreatGroup"), class = c("tbl_df",
"data.frame"), row.names = c(NA, -6L))
A few options, some of which I posted in the comments above.
Note: All options assume that the replacement for a string matching a pattern is just the pattern itself. If you want something else, they're all easily editable to include separate replacement values.
Option 1: for + grepl
Using the same code as the original, but looping to avoid repetitive code:
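# make a vector of patterns
pat <- c('udults', 'aquults', 'aqualfs', 'humods', 'udalfs', 'orthods', 'psamments', 'udepts', 'fluvents', 'aquods')
# one entry per row; length(myData) would count columns, not rows
SubOrder <- rep(NA_character_, nrow(myData))
for (x in seq_along(pat)) {
  # rows whose GreatGroup contains the pattern get that pattern as their value
  SubOrder[grepl(pat[x], myData$GreatGroup, ignore.case = TRUE)] <- pat[x]
}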
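As a sketch of the "separate replacement values" mentioned in the note above, the loop can walk over a named vector in which each name is a pattern and each value is the label to store. The input vector and the `repl` lookup below are hypothetical stand-ins for `myData$GreatGroup` and whatever labels you actually want.

```r
# Toy input standing in for myData$GreatGroup (hypothetical values)
great_group <- c("Hapludults", "Endoaquults", "Quartzipsamments", "Unknown")

# Hypothetical lookup: names are the patterns, values are the labels to store
repl <- c(udults = "Udults", aquults = "Aquults", psamments = "Psamments")

# Start with NA so rows that match no pattern stay NA
sub_order <- rep(NA_character_, length(great_group))
for (p in names(repl)) {
  sub_order[grepl(p, great_group, ignore.case = TRUE)] <- repl[[p]]
}

sub_order
# "Udults" "Aquults" "Psamments" NA
```

Keeping the patterns and labels in one named vector means a new suborder only needs one new entry in `repl`.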
Option 2: for + gsub
Build the new column in place by copying myData$GreatGroup and then altering it with gsub. The '.*' pasted onto each side of the pattern makes the regex match the rest of the string as well, so the whole value is replaced by the pattern.
myData$SubOrder <- myData$GreatGroup
for (x in pat) {
  myData$SubOrder <- gsub(paste0('.*', x, '.*'), x, myData$SubOrder, ignore.case = TRUE)
}
Note that values not matched by one of the strings in pat will have the value from GreatGroup, not NA. If you want them to be NA, fix them with
myData$SubOrder[!(myData$SubOrder %in% pat)] <- NA
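To make the two steps of this option concrete, here is a small self-contained run on a toy vector (hypothetical values standing in for `myData$GreatGroup`, with a shortened `pat`). It shows that unmatched entries survive the `gsub` loop unchanged and only become `NA` after the cleanup line.

```r
# Shortened pattern vector and toy input (both hypothetical)
pat <- c("udults", "aquults", "aqualfs")
great_group <- c("Hapludults", "PALEUDULTS", "Endoaquults", "Haplorthods")

# Step 1: replace every value containing a pattern with the pattern itself
sub_order <- great_group
for (x in pat) {
  sub_order <- gsub(paste0(".*", x, ".*"), x, sub_order, ignore.case = TRUE)
}
before_cleanup <- sub_order
# "udults" "udults" "aquults" "Haplorthods"

# Step 2: anything that is not exactly one of the patterns was never matched
sub_order[!(sub_order %in% pat)] <- NA
# "udults" "udults" "aquults" NA
```

The cleanup relies on the replacement being the lower-case pattern itself; if you substitute different labels, compare against those labels instead.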
Option 3: named list + stringr::str_replace_all
My favorite because it doesn't loop, although it requires the stringr package (which is pretty awesome, anyway).
Make a named list from pat, where each name is the regex you want to match and each item is the string to replace it with:
l <- as.list(pat)
names(l) <- paste0('.*', pat, '.*')
so it looks like
> l
$`.*udults.*`
[1] "udults"
$`.*aquults.*`
[1] "aquults"
$`.*aqualfs.*`
[1] "aqualfs"
......
Then use str_replace_all to DO IT ALL AT ONCE:
library(stringr)
myData$SubOrder <- str_replace_all(myData$GreatGroup, l)
Boom.
Note 1: str_replace_all doesn't have an ignore.case option, but you can wrap myData$GreatGroup in tolower (easy) or reconfigure the regex (hard).
Note 2: Like Option 2, it leaves unmatched entries as the value from GreatGroup, so use the line at the end of that option to go back to NAs, if you like.
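The two workarounds from Note 1 can be sketched with base R's gsub, which is case-sensitive by default, so the example runs without extra packages. The input is a hypothetical toy vector, and the inline `(?i)` flag is only one possible way of "reconfiguring the regex"; whether you use it with stringr is an assumption to check against its documentation.

```r
# Toy input with mixed case (hypothetical values)
great_group <- c("Hapludults", "PALEUDULTS")

# Workaround 1 ("easy"): lower-case the input before matching
easy <- gsub(".*udults.*", "udults", tolower(great_group))

# Workaround 2 ("hard"): an inline (?i) flag makes the pattern itself
# case-insensitive; perl = TRUE selects the PCRE engine in base R
hard <- gsub("(?i).*udults.*", "udults", great_group, perl = TRUE)

easy
hard
# both give "udults" "udults"
```

One difference to keep in mind: with tolower, values that match no pattern come back lower-cased, whereas the inline flag leaves them as they were.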