Split characters in columns and names


Problem description

I want to split characters. Although I have a large data frame to work with, the following small example shows what needs to be done.

  mydf <- data.frame (name = c("L1", "L2", "L3"), 
    M1 = c("AC", "AT", NA), M2 = c("CC", "--", "TC"), M3 = c("AT", "TT", "AG"))

I want to split each two-character value in variables M1 to M3 into two single-character columns (in the real dataset I have more than 6000 variables):

  name  M1a  M1b  M2a  M2b  M3a  M3b
  L1    A    C    C    C    A    T
  L2    A    T    -    -    T    T
  L3    NA   NA   T    C    A    G

I tried the following code:

func<- function(x) {sapply( strsplit(x, ""),
                     match, table= c("A","C","T","G", "--", NA))}

odataframe <- data.frame(apply(mydf, 1, func) )
colnames(odataframe) <-  paste(rep(names(mydf), each = 2), c("a", "b"), sep = "")
odataframe

Solution

Here you go:

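splitCol <- function(x){
  # Work on character values (the column may be a factor)
  x <- as.character(x)
  # Replace NA with a two-character placeholder so every value splits into two pieces
  x[is.na(x)] <- "$$"
  # Split each value into single characters and arrange them as a two-column matrix
  z <- matrix(unlist(strsplit(x, split="")), ncol=2, byrow=TRUE)
  # Turn the placeholder characters back into NA
  z[z=="$"] <- NA
  z
}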


newdf <- as.data.frame(do.call(cbind, lapply(mydf[, -1], splitCol)))
names(newdf) <- paste(rep(names(mydf[, -1]), each=2), c("a", "b"), sep="")
newdf <- data.frame(mydf[, 1, drop=FALSE], newdf)

newdf
  name  M1a  M1b M2a M2b M3a M3b
1   L1    A    C   C   C   A   T
2   L2    A    T   -   -   T   T
3   L3 <NA> <NA>   T   C   A   G
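
As a possible variation on the answer above, here is a minimal sketch that takes the first and second character of each value with substr() instead of strsplit(). Because substr() returns NA for NA input, no placeholder value is needed. The helper name splitCol2 and the result name newdf2 are made up for this illustration, and the sketch assumes every non-missing value is exactly two characters long (longer values would be cut off after the second character).

```r
# Sample data from the question.
mydf <- data.frame(name = c("L1", "L2", "L3"),
                   M1 = c("AC", "AT", NA),
                   M2 = c("CC", "--", "TC"),
                   M3 = c("AT", "TT", "AG"),
                   stringsAsFactors = FALSE)

# Hypothetical helper (not part of the original answer): returns a
# two-column character matrix holding the first and second character
# of each value. substr() propagates NA, so missing values stay NA.
splitCol2 <- function(x) {
  x <- as.character(x)
  cbind(substr(x, 1, 1), substr(x, 2, 2))
}

# Split every column except the first and bind the pieces side by side.
parts <- do.call(cbind, lapply(mydf[, -1], splitCol2))
colnames(parts) <- paste(rep(names(mydf)[-1], each = 2), c("a", "b"), sep = "")

# Put the name column back in front.
newdf2 <- data.frame(mydf[, 1, drop = FALSE], parts, stringsAsFactors = FALSE)
newdf2
```

With the sample data this should give the same layout as newdf above. For thousands of columns, either version makes a single pass over the columns with lapply().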
