Create new column from an existing column with pattern matching in R

Views: 121 Published: 2018/5/28 19:41:17 Tags: regex, r, grep, pattern-matching


Problem description


I'm trying to add a new column based on another column using pattern matching.
I've read this post, but I'm not getting the desired output.

I want to create a new column (SubOrder) based on the GreatGroup column. 
I have tried the following: 
SubOrder <- rep(NA_character_, length(myData))

SubOrder[grepl("udults", myData, ignore.case = TRUE)] <-  "Udults"
SubOrder[grepl("aquults", myData, ignore.case = TRUE)] <-  "Aquults"
SubOrder[grepl("aqualfs", myData, ignore.case = TRUE)] <-  "aqualfs"
SubOrder[grepl("humods", myData, ignore.case = TRUE)] <-  "humods"
SubOrder[grepl("udalfs", myData, ignore.case = TRUE)] <-  "udalfs"
SubOrder[grepl("orthods", myData, ignore.case = TRUE)] <-  "orthods"
SubOrder[grepl("udalfs", myData, ignore.case = TRUE)] <-  "udalfs"
SubOrder[grepl("psamments", myData, ignore.case = TRUE)] <-  "psamments"
SubOrder[grepl("udepts", myData, ignore.case = TRUE)] <-  "udepts"
SubOrder[grepl("fluvents", myData, ignore.case = TRUE)] <-  "fluvents"
SubOrder[grepl("aquods", myData, ignore.case = TRUE)] <-  "aquods"
For example, I'm looking for "udults" inside any word, such as Hapludults or Paleudults, and want to return just "udults".

EDIT: If anyone wants to take a shot at alistaire's comment, these are the search patterns I would use.
subOrderNames <- c("Udults", "Aquults", "Aqualfs", "Humods", "Udalfs", "Orthods", "Psamments", "Udepts", "fluvents")
Example data below. 
myData <- dput(head(test))
structure(list(1:6, SID = c(200502L, 200502L, 200502L, 200502L, 
200502L, 200502L), Groupdepth = c(11L, 12L, 13L, 14L, 21L, 22L
), AWC0to10 = c(0.12, 0.12, 0.12, 0.12, 0.12, 0.12), AWC10to20 = c(0.12, 
0.12, 0.12, 0.12, 0.12, 0.12), AWC20to50 = c(0.12, 0.12, 0.12, 
0.12, 0.12, 0.12), AWC50to100 = c(0.15, 0.15, 0.15, 0.15, 0.15, 
0.15), Db3rdbar0to10 = c(1.43, 1.43, 1.43, 1.43, 1.43, 1.43), 
    Db3rdbar10to20 = c(1.43, 1.43, 1.43, 1.43, 1.43, 1.43), Db3rdbar20to50 = c(1.43, 
    1.43, 1.43, 1.43, 1.43, 1.43), Db3rdbar50to100 = c(1.43, 
    1.43, 1.43, 1.43, 1.43, 1.43), HydrcRatngPP = c(0L, 0L, 0L, 
    0L, 0L, 0L), OrgMatter0to10 = c(1.25, 1.25, 1.25, 1.25, 1.25, 
    1.25), OrgMatter10to20 = c(1.25, 1.25, 1.25, 1.25, 1.25, 
    1.25), OrgMatter20to50 = c(1.02, 1.02, 1.02, 1.02, 1.02, 
    1.02), OrgMatter50to100 = c(0.12, 0.12, 0.12, 0.12, 0.12, 
    0.12), Clay0to10 = c(8, 8, 8, 8, 8, 8), Clay10to20 = c(8, 
    8, 8, 8, 8, 8), Clay20to50 = c(9.4, 9.4, 9.4, 9.4, 9.4, 9.4
    ), Clay50to100 = c(40, 40, 40, 40, 40, 40), Sand0to10 = c(85, 
    85, 85, 85, 85, 85), Sand10to20 = c(85, 85, 85, 85, 85, 85
    ), Sand20to50 = c(83, 83, 83, 83, 83, 83), Sand50to100 = c(45.8, 
    45.8, 45.8, 45.8, 45.8, 45.8), pHwater0to20 = c(6.3, 6.3, 
    6.3, 6.3, 6.3, 6.3), Ksat0to10 = c(23, 23, 23, 23, 23, 23
    ), Ksat10to20 = c(23, 23, 23, 23, 23, 23), Ksat20to50 = c(19.7333, 
    19.7333, 19.7333, 19.7333, 19.7333, 19.7333), Ksat50to100 = c(9, 
    9, 9, 9, 9, 9), TaxClName = c("Fine, mixed, semiactive, mesic Oxyaquic Hapludults", 
    "Fine, mixed, semiactive, mesic Oxyaquic Hapludults", "Fine, mixed, semiactive, mesic Oxyaquic Hapludults", 
    "Fine, mixed, semiactive, mesic Oxyaquic Hapludults", "Fine, mixed, semiactive, mesic Oxyaquic Hapludults", 
    "Fine, mixed, semiactive, mesic Oxyaquic Hapludults"), GreatGroup = c("Hapludults", 
    "Hapludults", "Hapludults", "Hapludults", "Hapludults", "Hapludults"
    )), .Names = c("", "SID", "Groupdepth", "AWC0to10", "AWC10to20", 
"AWC20to50", "AWC50to100", "Db3rdbar0to10", "Db3rdbar10to20", 
"Db3rdbar20to50", "Db3rdbar50to100", "HydrcRatngPP", "OrgMatter0to10", 
"OrgMatter10to20", "OrgMatter20to50", "OrgMatter50to100", "Clay0to10", 
"Clay10to20", "Clay20to50", "Clay50to100", "Sand0to10", "Sand10to20", 
"Sand20to50", "Sand50to100", "pHwater0to20", "Ksat0to10", "Ksat10to20", 
"Ksat20to50", "Ksat50to100", "TaxClName", "GreatGroup"), class = c("tbl_df", 
"data.frame"), row.names = c(NA, -6L))

Solution
A few options, some of which I posted in the comments above.

Note: All options assume that the replacement for a string matching a pattern is just the pattern itself. If you want something else, they're all easily editable to include separate replacement values.
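As a rough illustration of the "separate replacement values" idea in the note, one possibility is a named character vector whose names are the patterns and whose values are the labels to store. The names `lookup`, `great_group` and `sub_order`, the capitalised labels, and the toy input are assumptions made for this sketch; they are not part of the original answer or data.

```r
# Hypothetical mapping: names are the regex patterns, values are the labels to store.
lookup <- c(udults = "Udults", aquults = "Aquults", aqualfs = "Aqualfs")

# Toy stand-in for myData$GreatGroup (not the original data).
great_group <- c("Hapludults", "Endoaquults", "Paleudults", "Haplorthods")

# Start with NA everywhere, then fill in the label for each pattern that matches.
sub_order <- rep(NA_character_, length(great_group))
for (p in names(lookup)) {
  hit <- grepl(p, great_group, ignore.case = TRUE)
  sub_order[hit] <- lookup[[p]]
}

print(sub_order)
```

Entries that match none of the patterns (here "Haplorthods") stay NA.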

Option 1: for + grepl

Using the same code as the original, but looping to avoid repetitive code:
# make a vector of patterns
pat <- c('udults', 'aquults', 'aqualfs', 'humods', 'udalfs', 'orthods', 'psamments', 'udepts', 'fluvents', 'aquods')

# one entry per row; length(myData) would count columns, not rows
SubOrder <- rep(NA_character_, nrow(myData))

# for each pattern, label the rows whose GreatGroup contains it
for(x in seq_along(pat)){
  SubOrder[grepl(pat[x], myData$GreatGroup, ignore.case = TRUE)] <- pat[x]
}
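To see the loop in action without the full soil data set, here is a minimal sketch on a made-up data frame. The frame `toy` and its values are assumptions for illustration only. It also shows why the result should be sized by the number of rows: on a data frame, `length()` counts columns.

```r
# Shortened pattern vector and a toy data frame (illustrative, not the original data).
pat <- c("udults", "aquults", "aqualfs")
toy <- data.frame(
  SID = c(1L, 2L, 3L, 4L),
  GreatGroup = c("Hapludults", "Paleudults", "Epiaqualfs", "Quartzipsamments"),
  stringsAsFactors = FALSE
)

# length() on a data frame is the number of columns; nrow() is the number of rows.
n_cols <- length(toy)
n_rows <- nrow(toy)

# Fill a new column with NA, then overwrite rows that match each pattern.
toy$SubOrder <- NA_character_
for (p in pat) {
  toy$SubOrder[grepl(p, toy$GreatGroup, ignore.case = TRUE)] <- p
}

print(toy$SubOrder)
```

If a value matched more than one pattern, the pattern that comes later in `pat` would win, because each pass overwrites earlier assignments.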




Option 2: for + gsub

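Build the new column in place by copying myData$GreatGroup and then altering it with gsub. The '.*' pasted on either side of each pattern makes the regex match the entire string, so the whole value is replaced by the pattern rather than just the matching part.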
myData$SubOrder <- myData$GreatGroup
for(x in pat){
  myData$SubOrder <- gsub(paste0('.*', x, '.*'), x, myData$SubOrder, ignore.case = TRUE)
}
Note that values not matched by one of the strings in pat will have the value from GreatGroup, not NA. If you want them to be NA, fix them with
myData$SubOrder[!(myData$SubOrder %in% pat)] <- NA
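The two steps of this option (replace, then blank out the leftovers) can be sketched on a small character vector. The vector `great_group` and the intermediate `before_na` are names invented for this example, and the pattern list is shortened.

```r
pat <- c("udults", "aquults", "aqualfs")

# Toy stand-in for myData$GreatGroup (not the original data).
great_group <- c("Hapludults", "Endoaquults", "Haplorthods")

# Step 1: replace the whole string with the pattern wherever the pattern occurs.
sub_order <- great_group
for (p in pat) {
  sub_order <- gsub(paste0(".*", p, ".*"), p, sub_order, ignore.case = TRUE)
}
before_na <- sub_order   # unmatched values still hold the original text here

# Step 2: anything that is not one of the patterns becomes NA.
sub_order[!(sub_order %in% pat)] <- NA

print(before_na)
print(sub_order)
```

The `%in%` check works because gsub writes the lowercase pattern itself as the replacement, regardless of the case of the matched text.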




Option 3: named list + stringr::str_replace_all

My favorite because it doesn't loop, although it requires the stringr package (which is pretty awesome, anyway).

Make a named list from pat, where the name is the regex you want to replace, and the item is the replacement string:
l <- as.list(pat)
names(l) <- paste0('.*', pat, '.*')
so it looks like
> l
$`.*udults.*`
[1] "udults"

$`.*aquults.*`
[1] "aquults"

$`.*aqualfs.*`
[1] "aqualfs"
......
Then use str_replace_all to DO IT ALL AT ONCE:
library(stringr)
myData$SubOrder <- str_replace_all(myData$GreatGroup, l)
Boom.

Note 1: str_replace_all doesn't have an ignore.case option, but you can wrap myData$GreatGroup in tolower (easy) or reconfigure the regex (hard).

Note 2: Like Option 2, it leaves unmatched entries as the value from GreatGroup, so use the line at the end of that option to go back to NAs, if you like.
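If adding a package is not an option, a loop-free result can also be had in base R. This is a different technique from the one in the answer: it joins the patterns into one alternation regex and extracts the matching piece with `regexpr` and `regmatches`. The variable names and the toy input are assumptions for this sketch.

```r
pat <- c("udults", "aquults", "aqualfs", "humods", "udalfs", "orthods",
         "psamments", "udepts", "fluvents", "aquods")

# Toy stand-in for myData$GreatGroup (not the original data).
great_group <- c("Hapludults", "PALEUDULTS", "Haplorthods", "Unclassified")

# One regex that matches any of the patterns: "udults|aquults|..."
rx <- paste(pat, collapse = "|")

# Position of the first match in each string, or -1 when nothing matches.
m <- regexpr(rx, great_group, ignore.case = TRUE)

# regmatches() returns only the matched pieces, so assign them to the matched rows.
sub_order <- rep(NA_character_, length(great_group))
sub_order[m != -1] <- tolower(regmatches(great_group, m))

print(sub_order)
```

Unmatched entries come out as NA directly, so no clean-up step is needed, and `tolower` normalises matches that were found case-insensitively.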
