Breaking up (melting) text data in a column in R?


Question

I have a csv file which contains data in the following format:

PrjID, Objective
1001 , (i) To improve efficiency (ii) Decrease cost (iii) Maximize revenue
1002 , a) Have fun b) Learn new things
1003 , (1) Getting tricky (2) Challanging task

The first variable is an ID and the second is a text variable, "Objective". Each project has several objectives stored in a single column, separated by markers such as (i), (ii), etc., or (a), (b), (c), etc., or (1), (2), (3), etc. I want one observation per objective for each project, like this:

PrjID, Objective
1001 , (i) To improve efficiency
1001 , (ii) Decrease cost
1001 , (iii) Maximize revenue
1002 , a) Have fun
1002 , b) Learn new things
1003 , (1) Getting tricky
1003 , (2) Challanging task

A project with just one objective keeps its single row, while a project with multiple objectives is split into one row per objective.

I am quite new to handling text data in R, can some R pro help me get started with this problem? Thanks in advance!

Recommended answer

Here is one idea:

  1. Insert a new delimiter in your Objective column, using a clever regular expression.
  2. Use this delimiter within strsplit to split the sentence into a vector.
  3. Use by to apply the previous steps per ID.
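
To see what steps 1 and 2 do in isolation, here is a minimal sketch on a single string. The variable names (s, marked, parts) are only illustrative, and the sketch assumes that '#' never occurs in the objective text itself.

```r
# One objective string, as it would appear in the Objective column
# (note the leading space left over from the " , " separator).
s <- " (i) To improve efficiency (ii) Decrease cost (iii) Maximize revenue"

# Step 1: replace the space before each marker such as (i), a) or (2) with '#'.
marked <- gsub(" (\\(?[a-x,0-9]*\\))", "#\\1", s)

# Step 2: split on '#'. The first piece is empty because the string
# starts with a marker, so it is dropped.
parts <- unlist(strsplit(marked, "#"))
parts <- parts[-1]
print(parts)
```

Each element of parts is then one objective, ready to be paired with its project ID.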

Following these steps, I get this code:

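## dat is defined at the end of this answer
ll <- by(dat, dat$PrjID, FUN = function(x) {
        ## step 1: replace the space before each marker, e.g. (i), a) or (2), with '#'
        x.delim <- gsub(" (\\(?[a-x,0-9]*\\))", '#\\1', x$Objective)
        ## step 2: split on '#'; the first piece is empty, hence obj[-1] below
        obj <- unlist(strsplit(x.delim, '#'))
        data.frame(PrjID = x$PrjID, objective = obj[-1])
})
## transform your list to a data.frame
do.call(rbind, ll)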

      PrjID                 objective
1001.1  1001 (i) To improve efficiency
1001.2  1001        (ii) Decrease cost
1001.3  1001   (iii) Maximize revenue 
1002.1  1002               a) Have fun
1002.2  1002      b) Learn new things 
1003.1  1003        (1) Getting tricky
1003.2  1003      (2) Challanging task

PS: here dat is:

dat <- read.table(text='PrjID, Objective 
1001 , (i) To improve efficiency (ii) Decrease cost (iii) Maximize revenue 
1002 , a) Have fun b) Learn new things 
1003 , (1) Getting tricky (2) Challanging task',sep=',',header=TRUE)
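
As a variation (not the code from the answer above), the same delimiter idea can be applied to the whole column at once instead of going through by. This is only a sketch under a few assumptions: '#' does not occur in the text, the names marked, pieces and long are made up for illustration, and trimws and lengths are available (R 3.2 or later). It also keeps rows whose objective has no marker at all.

```r
dat <- read.table(text = 'PrjID, Objective
1001 , (i) To improve efficiency (ii) Decrease cost (iii) Maximize revenue
1002 , a) Have fun b) Learn new things
1003 , (1) Getting tricky (2) Challanging task',
                  sep = ',', header = TRUE, stringsAsFactors = FALSE)

# Insert '#' before each marker, then split every row in one call.
marked <- gsub(" (\\(?[a-x,0-9]*\\))", "#\\1", dat$Objective)
pieces <- strsplit(marked, "#", fixed = TRUE)

# Trim surrounding whitespace and drop empty pieces.
pieces <- lapply(pieces, function(p) {
  p <- trimws(p)
  p[nzchar(p)]
})

# Repeat each ID once per objective it owns.
long <- data.frame(PrjID = rep(dat$PrjID, lengths(pieces)),
                   Objective = unlist(pieces),
                   stringsAsFactors = FALSE)
print(long)
```

Unlike the by version, this returns plain row names and strips the trailing spaces from each objective.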
