R substr function on multiple columns

Question

I have 3 columns. The first column has a unique ID; the second and third columns contain string data and some NA values. I need to extract information from column 2 and put it in separate columns, and do the same for column 3. I am building a function, shown below, that uses for loops. I need to split each value after the third letter. For example, in the V1 column below I need to break AAAbbb into AAA and bbb and put them in separate columns. I know I can use substr to do this. I am new to R; please help.
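
For reference, substr(x, start, stop) returns the characters of x from position start through stop, so splitting after the third letter means taking positions 1 to 3 and positions 4 to 6. A minimal sketch, assuming six-character values like the ones in the sample data:

```r
s <- "AAAbbb"

first  <- substr(s, 1, 3)  # characters 1 through 3
second <- substr(s, 4, 6)  # characters 4 through 6
print(c(first, second))    # [1] "AAA" "bbb"

# A missing value comes back as NA rather than raising an error
print(substr(NA, 1, 3))    # [1] NA
```

If some values can be longer than six characters, substring(s, 4) keeps everything from position 4 onward instead of stopping at position 6.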

UID     V1      V2
Z001NL  AAAbbb  IADSFO
Z001NP  IADSFO  NA
Z0024G  SFOHNL  NLSFO0

Here is my code:

test=read.csv("c:/some/path/in/windows/test.csv", header=TRUE)

substring_it = function(test)
{
  for(i in 1:3){
    for(j in 2:3){
      answer = transform(test, code1 = substr(test[[j,i]], 1, 3), code2 = substr(test[[j,i]], 4, 6))
    }
  }
  return(answer)
}

hello = substring_it(test)

test will be my data frame that I will read in.

I need this as my output

UID     V1.1  V1.2  V2.1  V2.2
Z001NL  AAA   bbb   IAD   SFO
Z001NP  IAD   SFO   NA    NA
Z0024G  SFO   HNL   NLS   SFO

Answer

You can use sapply to apply a function to each element of a vector. That is useful here, because you can use sapply on the columns of your original data frame (test) to build the columns of your new data frame.
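
As a side note, substr() is itself vectorized over its first argument, so a single call on a whole column gives the same values as the sapply() version. The one visible difference is that sapply() on a character vector names its result after the input strings (its default is USE.NAMES = TRUE). A small comparison on the V1 values from the question:

```r
v1 <- c("AAAbbb", "IADSFO", "SFOHNL")

via_sapply <- sapply(v1, function(s) substr(s, 1, 3))  # element by element
direct     <- substr(v1, 1, 3)                          # one vectorized call

print(names(via_sapply))                      # the input strings
print(identical(unname(via_sapply), direct))  # [1] TRUE
```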

Here is a solution:

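test = data.frame(UID = c('Z001NL', 'Z001NP', 'Z0024G'), 
  V1 = c('AAAbbb', 'IADSFO', 'SFOHNL'),
  V2 = c('IADSFO', NA, 'NLSFO0'),
  stringsAsFactors = FALSE)

substring_it = function(x){
  # x is a data frame; columns 2 and 3 hold the strings to split.
  # USE.NAMES = FALSE stops sapply() from naming its result after the
  # input strings, which data.frame() would otherwise use as row names.
  c1 = sapply(x[,2], function(s) substr(s, 1, 3), USE.NAMES = FALSE)
  c2 = sapply(x[,2], function(s) substr(s, 4, 6), USE.NAMES = FALSE)
  c3 = sapply(x[,3], function(s) substr(s, 1, 3), USE.NAMES = FALSE)
  c4 = sapply(x[,3], function(s) substr(s, 4, 6), USE.NAMES = FALSE)
  return(data.frame(UID=x[,1], c1, c2, c3, c4))
}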

substring_it(test)
# returns:
#     UID  c1  c2   c3   c4
#1 Z001NL AAA bbb  IAD  SFO
#2 Z001NP IAD SFO <NA> <NA>
#3 Z0024G SFO HNL  NLS  FO0
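
One caveat when running this on R 4.0.0 or later, where data.frame() keeps strings as character instead of converting them to factors by default: sapply() over a character column names its result after the input strings unless USE.NAMES = FALSE is given, and data.frame() takes its row names from the first named vector it receives. The rows would then be labeled AAAbbb, IADSFO, SFOHNL instead of 1, 2, 3. A minimal sketch of the effect:

```r
v1  <- c("AAAbbb", "IADSFO", "SFOHNL")
uid <- c("Z001NL", "Z001NP", "Z0024G")

named   <- sapply(v1, function(s) substr(s, 1, 3))                     # names = v1
unnamed <- sapply(v1, function(s) substr(s, 1, 3), USE.NAMES = FALSE)  # no names

df_named   <- data.frame(UID = uid, c1 = named)
df_unnamed <- data.frame(UID = uid, c1 = unnamed)

print(rownames(df_named))    # [1] "AAAbbb" "IADSFO" "SFOHNL"
print(rownames(df_unnamed))  # [1] "1" "2" "3"
```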

Here's a way to loop over the columns if you have to do this many times. I'm not sure what order your original data frame's columns are in, or what order you want the new data frame's columns to end up in, so you may need to adjust the pos counter. I also assumed the columns to be split are columns 2 through 201 (colindex), so you'll probably have to change that.

newcolumns = list()
pos = 1 # counter for the column index of the new data frame
for(colindex in 2:201){
    # USE.NAMES = FALSE avoids row names being taken from the input strings
    newcolumns[[pos]] = sapply(test[,colindex], function(s) substr(s, 1, 3), USE.NAMES = FALSE)
    newcolumns[[pos+1]] = sapply(test[,colindex], function(s) substr(s, 4, 6), USE.NAMES = FALSE)
    pos = pos+2
}
newdataframe = data.frame(UID = test[,1], newcolumns)
# update "names(newdataframe)" as needed
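
If the goal is the exact layout from the question (V1.1, V1.2, V2.1, V2.2), the loop and the renaming step can be folded together. This is a sketch, not the answerer's code: the function name split_columns and its arguments are hypothetical, the columns to split are passed by name, and every value is assumed to split into characters 1 to 3 and 4 to 6.

```r
# Hypothetical helper: split each named column into two new columns
split_columns <- function(df, cols, id_col = 1) {
  pieces <- lapply(cols, function(col) {
    v <- as.character(df[[col]])          # also works for factor columns
    out <- data.frame(substr(v, 1, 3),    # characters 1 to 3
                      substr(v, 4, 6),    # characters 4 to 6
                      stringsAsFactors = FALSE)
    names(out) <- paste0(col, c(".1", ".2"))
    out
  })
  # Keep the ID column in front, then bind all column pairs side by side
  do.call(cbind, c(list(df[id_col]), pieces))
}

test <- data.frame(UID = c('Z001NL', 'Z001NP', 'Z0024G'),
                   V1  = c('AAAbbb', 'IADSFO', 'SFOHNL'),
                   V2  = c('IADSFO', NA, 'NLSFO0'),
                   stringsAsFactors = FALSE)

result <- split_columns(test, c("V1", "V2"))
print(result)
```

Because substr() works on whole vectors, no sapply() call or position counter is needed, and naming each pair inside lapply() removes the separate renaming step. For the 200-column case, cols could be names(test)[2:201], again assuming that is where the string columns sit.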
