How to replace a column of strings with look-up codes in R


Question

Imagine that I have a data.frame or data.table with a column of strings, where one row looks like this:

a1; b: b1, b2, b3; c: c1, c2, c3; d: d1, d2, d3, d4

and a look-up table with a code for each of these strings. For example:

string code
a1     10
b1     20
b2     30
b3     40
c1     50
c2     60
...

I would like a mapping function that replaces each string in the row with its code:

10; b: 20, 30, 40; c: 50, 60, 70; d: 80, 90, 100

I have a column of these strings in a data.table/data.frame (more than 100k rows), so any fast solution would be much appreciated. Note that the strings do not always have the same length: for example, one row may contain categories a to d, and another a to f.

EDIT

We have a solution for the case above, but imagine I have a string like this:

a; b: peter, joe smith, john smith; c: luke, james, john smith

How can these be replaced, knowing that john smith can have two different codes depending on whether it belongs to category b or c? Also, a string can consist of several words separated by spaces.

EDIT 2

   string     code
    a          10
    peter      20
    joe smith  30
    john smith 40
    luke       50
    james      60
    john smith 70
...

The desired final result is:

10; b: 20, 30, 40; c: 50, 60, 70

EDIT 3 As suggested, I have opened a new question for the next issue: How to replace repeated strings and space in-between with look-up codes in R

Recommended answer

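We can use gsubfn. When the replacement is a named list, each regex match is looked up by name in the list and replaced with the corresponding value: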

library(gsubfn)
gsubfn("([a-z]\\d+)", setNames(as.list(df1$code), df1$string), str1)
#[1] "10; b: 20, 30, 40; c: 50, 60, 70; d: 80, 90, 100, 110"
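Since the question mentions a column with more than 100k rows, here is a minimal base-R sketch of the same look-up idea applied to a whole character vector at once. It is not part of the original answer: the helper name `map_codes` and the pattern `[a-z][0-9]+` are assumptions chosen to fit the sample data, and tokens missing from the look-up table are left unchanged.

```r
# Sample data, rebuilt here so the sketch is self-contained
str1 <- "a1; b: b1, b2, b3; c: c1, c2, c3; d: d1, d2, d3, d4"
df1 <- data.frame(
  string = c("a1", "b1", "b2", "b3", "c1", "c2", "c3", "d1", "d2", "d3", "d4"),
  code   = c(10L, 20L, 30L, 40L, 50L, 60L, 70L, 80L, 90L, 100L, 110L),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not from the original answer):
# replace every token matching `pattern` with its code from `lookup`.
map_codes <- function(x, lookup, pattern = "[a-z][0-9]+") {
  # named character vector: names are the strings, values are the codes
  codes <- setNames(as.character(lookup$code), lookup$string)
  m <- gregexpr(pattern, x)
  regmatches(x, m) <- lapply(regmatches(x, m), function(tok) {
    out <- unname(codes[tok])
    # tokens without an entry in the look-up table stay as they are
    out[is.na(out)] <- tok[is.na(out)]
    out
  })
  x
}

map_codes(str1, df1)
#[1] "10; b: 20, 30, 40; c: 50, 60, 70; d: 80, 90, 100, 110"

# Works on a whole column, e.g. df$col <- map_codes(df$col, df1)
map_codes(c("a1; b: b2", "a1; z9"), df1)
#[1] "10; b: 30" "10; z9"
```

Because `gregexpr` and `regmatches<-` operate on the full vector, no explicit loop over rows is needed; whether this is faster than gsubfn on a given data set would have to be measured.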






For the edited version:

gsubfn("(\\w+ ?\\w+?)",  setNames(as.list(df2$code), df2$string), str2)
#[1] "a; b: 20, 30, 40; c: 50, 60, 40"
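In the output shown above, `a` is left unmapped and both occurrences of john smith receive 40, so the category is not yet taken into account. The following is a hedged base-R sketch of one way to make the look-up category-aware. It rests on an assumption that is not in the original data: the look-up table has an extra `category` column (empty for the leading item that has no category). The helper name `map_by_category` is likewise hypothetical.

```r
str2 <- "a; b: peter, joe smith, john smith; c: luke, james, john smith"

# ASSUMPTION: a `category` column is added to the look-up table;
# the original df2 has only `string` and `code`.
lookup <- data.frame(
  category = c("", "b", "b", "b", "c", "c", "c"),
  string   = c("a", "peter", "joe smith", "john smith",
               "luke", "james", "john smith"),
  code     = c(10L, 20L, 30L, 40L, 50L, 60L, 70L),
  stringsAsFactors = FALSE
)

# Hypothetical helper: look codes up by the pair (category, string)
map_by_category <- function(x, lookup) {
  codes <- setNames(as.character(lookup$code),
                    paste(lookup$category, lookup$string, sep = "|"))
  map_one <- function(s) {
    # split "a; b: ...; c: ..." into its ";"-separated groups
    groups <- trimws(strsplit(s, ";", fixed = TRUE)[[1]])
    mapped <- vapply(groups, function(g) {
      has_cat  <- grepl(":", g, fixed = TRUE)
      category <- if (has_cat) trimws(sub(":.*$", "", g)) else ""
      items    <- if (has_cat) sub("^[^:]*:", "", g) else g
      items    <- trimws(strsplit(items, ",", fixed = TRUE)[[1]])
      if (length(items) == 0) return(g)  # nothing to map in this group
      out <- unname(codes[paste(category, items, sep = "|")])
      # items without a matching (category, string) pair stay unchanged
      out[is.na(out)] <- items[is.na(out)]
      prefix <- if (has_cat) paste0(category, ": ") else ""
      paste0(prefix, paste(out, collapse = ", "))
    }, character(1), USE.NAMES = FALSE)
    paste(mapped, collapse = "; ")
  }
  vapply(x, map_one, character(1), USE.NAMES = FALSE)
}

map_by_category(str2, lookup)
#[1] "10; b: 20, 30, 40; c: 50, 60, 70"
```

Splitting on `;`, `:` and `,` instead of matching words with a regex also handles names that contain spaces, at the cost of assuming those three characters never occur inside a name.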



Data

str1 <- "a1; b: b1, b2, b3; c: c1, c2, c3; d: d1, d2, d3, d4"
df1 <- structure(list(string = c("a1", "b1", "b2", "b3", "c1", "c2", 
 "c3", "d1", "d2", "d3", "d4"), code = c(10L, 20L, 30L, 40L, 50L, 
 60L, 70L, 80L, 90L, 100L, 110L)), class = "data.frame",
  row.names = c(NA, -11L))

str2 <- "a; b: peter, joe smith, john smith; c: luke, james, john smith"

df2 <- structure(list(string = c("a", "peter", "joe smith", "john smith", 
"luke", "james", "john smith"), code = c(10L, 20L, 30L, 40L, 
50L, 60L, 70L)), class = "data.frame", row.names = c(NA, -7L))
