Split character columns and get the names of fields in a string

Published: 2021/4/28 19:36:24. Tags: r, data.table, reshape

Problem description

I need to split a column that contains information into several columns.
I'd use tstrsplit but the same kind of information is not in the same order among the rows and I need to extract the name of the new column within the variable. Important to know: there can be many pieces of information (fields to become new variables) and I don't know all of them, so I don't want a "field by field" solution.

Below is an example of what I have:

library(data.table)

myDT <- structure(list(chr = c("chr1", "chr2", "chr4"), pos = c(123L,
                  435L, 120L), info = c("type=3;end=4", "end=6", "end=5;pos=TRUE;type=2"
                  )), class = c("data.table", "data.frame"), row.names = c(NA,-3L))

#    chr pos                  info
#1: chr1 123          type=3;end=4
#2: chr2 435                 end=6
#3: chr4 120 end=5;pos=TRUE;type=2

I would like to get:

#    chr pos end  pos type
#1: chr1 123   4 <NA>    3
#2: chr2 435   6 <NA> <NA>
#3: chr4 120   5 TRUE    2

A most straightforward way to get that would be much appreciated! (Note: I'm not willing to go with a dplyr/tidyr way)

Recommended answer

Using regex and the stringi package:

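setDT(myDT) # After creating data.table from structure()

library(stringi)

# Collect every field name that appears before "=" anywhere in info
fields <- unique(unlist(stri_extract_all(regex = "[a-z]+(?==)", myDT$info)))
# One lookbehind pattern per field: the text after "<field>=" up to the next ";"
patterns <- sprintf("(?<=%s=)[^;]+", fields)
# Add one column per field; the extracted "pos" field replaces the existing pos column
myDT[, (fields) := lapply(patterns, function(x) stri_extract(regex = x, info))]
# Show the result without the original info column
myDT[, !"info"]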

    chr  pos type end
1: chr1 <NA>    3   4
2: chr2 <NA> <NA>   6
3: chr4 TRUE    2   5
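
In the output above, the `pos` field extracted from `info` has replaced the original numeric `pos` column, whereas the question asks for both to be kept. One possible way around this, shown here only as a sketch, is to rename the fields that clash with existing columns. The `info_` prefix is a hypothetical naming choice, not part of the original answer. The sketch also swaps the lookbehind for a capture group anchored at the start of the string or after `;`, so that a key such as `end` would not match inside a longer key such as `trend`.

```r
library(data.table)
library(stringi)

myDT <- data.table(
  chr  = c("chr1", "chr2", "chr4"),
  pos  = c(123L, 435L, 120L),
  info = c("type=3;end=4", "end=6", "end=5;pos=TRUE;type=2")
)

# Field names found anywhere in info (same regex as in the answer)
fields <- unique(unlist(stri_extract_all_regex(myDT$info, "[a-z]+(?==)")))

# Assumption: prefix only the fields that clash with an existing column
new_names <- ifelse(fields %in% names(myDT), paste0("info_", fields), fields)

# Capture the value after "<field>=" when the key starts the string or follows ";"
patterns <- sprintf("(?:^|;)%s=([^;]*)", fields)

# Column 2 of the match matrix holds the first capture group (NA when absent)
myDT[, (new_names) := lapply(patterns, function(p) stri_match_first_regex(info, p)[, 2])]
myDT[, info := NULL]
```

With this variant the numeric `pos` column is left untouched and the extracted field lands in `info_pos`; all extracted columns are still character at this point.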

To get the correct column types, it seems (?) type.convert() can be used:

myDT[, (fields) := lapply(patterns, function(x) type.convert(stri_extract(regex = x, info), as.is = TRUE))]
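
For comparison, here is a sketch of a different technique that needs only data.table: reshape the `key=value` pairs to long format, then cast back to wide with `dcast()`. This is not the method from the answer above. The helper column `row` and the `info_` prefix are hypothetical names chosen for illustration, and the sketch assumes every pair contains exactly one `=` and that no key repeats within a row.

```r
library(data.table)

myDT <- data.table(
  chr  = c("chr1", "chr2", "chr4"),
  pos  = c(123L, 435L, 120L),
  info = c("type=3;end=4", "end=6", "end=5;pos=TRUE;type=2")
)

# Remember which input row each pair belongs to
myDT[, row := .I]

# Long format: one row per "key=value" pair
long <- myDT[, .(pair = unlist(strsplit(info, ";", fixed = TRUE))), by = row]
long[, c("field", "value") := tstrsplit(pair, "=", fixed = TRUE)]

# Wide format: one column per field name, NA where a row lacks that field
wide <- dcast(long, row ~ field, value.var = "value")

# Convert the character columns to integer/logical where possible
info_cols <- setdiff(names(wide), "row")
wide[, (info_cols) := lapply(.SD, type.convert, as.is = TRUE), .SDcols = info_cols]

# Prefix the new columns so the extracted pos does not clash with the existing pos
setnames(wide, info_cols, paste0("info_", info_cols))

# Join back to the original columns and drop the helper id
result <- merge(myDT[, !"info"], wide, by = "row", all.x = TRUE)[, !"row"]
```

Because the field names come from the data, no field has to be listed by hand, which matches the "no field by field solution" requirement in the question.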
