Extracting text between parentheses in dataframe columns into new columns


Question

I have a dataframe called reasons in which some rows contain text with numbers in parentheses. The format is like this:

concern                          notaware           scenery
(2) chat community (4) more      
(1) didn't know                  (1) beautiful      (3) stunning
(3) often                                           (1) always

Reproducible version:

structure(list(concern = c("(2) chat community (4) more", "(1) didn't know", 
"(3) often"), notaware = c("", "(1) beautiful", ""), scenery = c("", 
"(3) stunning", "(1) always")), row.names = c(NA, -3L), class = c("tbl_df", 
"tbl", "data.frame"))

I want a new data frame with just the parentheses and the numbers inside them:

concern                          notaware           scenery
(2) (4)
(1)                              (1)                (3)
(3)                                                 (1)

I realise there is a similar question here, but the data there is not in a column:

Extracting data into new columns using R

and this one, but it doesn't seem to apply to a dataframe:

Extracting information inside all parentheses in R

From the questions I've looked up, I've tried to cobble together a workaround. I tried

reasons %>% mutate(concern1 = str_match(concern, pattern = "\\(.*?\\)"))

which resulted in an unchanged dataframe.

And this:

reasons$concern1 <- sub(regmatches(reasons$concern, gregexpr(pat, reasons$concern, perl=TRUE)))

which produced this:

Error in sub(regmatches(UltraCodes$concern, gregexpr(pat, 
UltraCodes$concern,  : 
argument "x" is missing, with no default
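For what it's worth, the error itself comes from how sub() is called: it takes a pattern, a replacement, and the vector x to work on, and the call above passes only one argument. A minimal sketch of a complete call, using a made-up vector x and a pattern of my own (not from the linked questions) that keeps only the first parenthesised group:

```r
x <- c("(2) chat community (4) more", "(1) didn't know", "")

# sub(pattern, replacement, x): all three arguments are needed.
# The capture group grabs the leading "(...)" and "\\1" writes it back,
# discarding the rest of the string. Strings that do not match are left as-is.
first_group <- sub("^(\\([^()]*\\)).*$", "\\1", x)
first_group
```

Note that this only keeps the first group per string, so it would not be enough for rows that have two bracketed numbers.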

I looked at this one, which I know is a duplicate of the second question, but it made more sense to me:

Using R to parse and return text in parentheses

I used

pat <- "(?<=\\()([^()]*)(?=\\))"
concern1 <- regmatches(reasons$concern, gregexpr(pat, reasons$concern, 
perl=TRUE))

This gives me a list with a name, a type and a value. The values are what I want, even though it's '2' rather than (2).
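The parentheses are missing because the lookbehind and lookahead in pat only check for them without including them in the match. A small sketch of the difference, where x is a stand-in for reasons$concern and pat_with_parens is a name I made up for a pattern that matches the brackets themselves:

```r
x <- c("(2) chat community (4) more", "(1) didn't know", "(3) often")

# Pattern from the question: lookarounds leave the parentheses out of the match.
pat <- "(?<=\\()([^()]*)(?=\\))"
inner <- regmatches(x, gregexpr(pat, x, perl = TRUE))

# Hypothetical alternative: match the opening and closing bracket as well.
pat_with_parens <- "\\([^()]*\\)"
with_parens <- regmatches(x, gregexpr(pat_with_parens, x))

inner[[1]]        # "2" "4"
with_parens[[1]]  # "(2)" "(4)"
```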

So I figured I could make multiple lists and try to put them into a dataframe, making a list notaware1 out of column notaware and so on. I have a feeling that the blank values are throwing things off when I try

reasons1 <-data.frame(concern1, notaware1)
reasons1 <-as.data.frame(concern1, notaware1)

which gives me

Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = 
TRUE,  : 
arguments imply differing number of rows: 0, 1, 2

which I don't quite understand, as all my lists are the same length. I feel I'm misunderstanding some fundamentals here.
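A likely explanation, sketched on the reproducible data: the lists do have the same length (one element per row), but the elements inside them have different lengths (zero, one or two matches), and data.frame() tries to line those elements up as columns. lengths() makes this visible:

```r
# The reproducible data from the question, as a plain data.frame.
reasons <- data.frame(
  concern  = c("(2) chat community (4) more", "(1) didn't know", "(3) often"),
  notaware = c("", "(1) beautiful", ""),
  scenery  = c("", "(3) stunning", "(1) always"),
  stringsAsFactors = FALSE
)

pat <- "(?<=\\()([^()]*)(?=\\))"
concern1  <- regmatches(reasons$concern,  gregexpr(pat, reasons$concern,  perl = TRUE))
notaware1 <- regmatches(reasons$notaware, gregexpr(pat, reasons$notaware, perl = TRUE))

length(concern1)    # 3: one list element per row
lengths(concern1)   # 2 1 1: number of matches in each row
lengths(notaware1)  # 0 1 0: blank cells give zero matches
```

So the blank values are indeed part of the problem: each one becomes a zero-length element.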

Next I thought I could work around it by exporting the list to csv, but the answers I've found seem to want me to turn the list into a dataframe first, which is my problem.

Then I found

reasons$concern3 <-paste(concern1)

which does add the list to my dataframe, and I can repeat this for all my lists.

However, it is a bit messy: blanks are now given as character(0), a single bracket gives a single number, and two brackets give c("2", "9"), so my columns now look like this
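The messy strings appear because paste() on a list converts each element to its printed R representation. Collapsing each element separately avoids that. A rough sketch, where concern1 and notaware1 are hand-built lists that mimic the values described above and collapse_matches is a helper name of my own:

```r
# Hand-built stand-ins for the regmatches() results described in the question.
concern1  <- list(c("2", "9"), "1", "3")
notaware1 <- list(character(0), "1", "1")

# paste() on the whole list deparses each element.
pasted <- paste(concern1)
pasted  # 'c("2", "9")' "1" "3"

# Hypothetical helper: join the matches of each row into one string,
# so zero matches become "" and two matches become "2 9".
collapse_matches <- function(m) vapply(m, paste, character(1), collapse = " ")

reasons1 <- data.frame(
  concern  = collapse_matches(concern1),
  notaware = collapse_matches(notaware1),
  stringsAsFactors = FALSE
)
reasons1
```

Because every column is now an ordinary character vector with one string per row, data.frame() no longer complains about differing numbers of rows.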

concern                          adventure          scenery
c("2", "9")                      character(0)       character(0)
1                                1                  3
3                                1                  character(0)

But at least I have something that I can put into a csv file to tidy up.

Is there a simpler way?

Recommended answer

Are you looking for:

 data.frame(gsub("[^()0-9]","",as.matrix(dat)))

  concern notaware scenery
1  (2)(4)                 
2     (1)      (1)     (3)
3     (3)              (1)
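One caveat worth checking before relying on this: the character class keeps every digit and bracket, wherever it appears. On the question's data that is fine, but a cell with a number outside the parentheses would keep that number too. A quick illustration, with a second string that I made up for the purpose:

```r
x <- c("(2) chat community (4) more", "(1) top 10 views")

# Everything that is not a bracket or a digit is removed,
# so the "10" outside the parentheses survives as well.
stripped <- gsub("[^()0-9]", "", x)
stripped  # "(2)(4)" "(1)10"
```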

Edit

 data.frame(gsub("(?<!\\))(?:\\w+|[^()])(?!\\))", "", as.matrix(dat), perl = TRUE))
   concern notaware scenery
1 (2) (4)                  
2     (1)      (1)     (3) 
3     (3)              (1) 
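If the regex above feels hard to read, an alternative sketch (not the answerer's method) is to extract the bracketed groups and paste them back together, column by column. Here dat is rebuilt from the question's reproducible data and keep_parens is a helper name of my own:

```r
dat <- data.frame(
  concern  = c("(2) chat community (4) more", "(1) didn't know", "(3) often"),
  notaware = c("", "(1) beautiful", ""),
  scenery  = c("", "(3) stunning", "(1) always"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep every "(...)" group in each string,
# joined by a single space; strings without a group become "".
keep_parens <- function(x) {
  m <- regmatches(x, gregexpr("\\([^()]*\\)", x))
  vapply(m, paste, character(1), collapse = " ")
}

# Assigning into out[] keeps the data.frame shape and column names.
out <- dat
out[] <- lapply(dat, keep_parens)
out
```

This version leaves no trailing spaces in the cells and ignores digits that sit outside the parentheses.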
