Replacing rownames of data frame by a sub-string

Question

I have a large dataframe (named test) with different rownames.

> rownames(test)
[1] "U2OS.EV.2.7.9"   "U2OS.PIM.2.7.9"  "U2OS.WDR.2.7.9"  "U2OS.MYC.2.7.9"
[5] "U2OS.OBX.2.7.9"  "U2OS.EV.18.6.9"  "U2O2.PIM.18.6.9" "U2OS.WDR.18.6.9"
[9] "U2OS.MYC.18.6.9" "U2OS.OBX.18.6.9" "X1.U2OS...OBX"   "X2.U2OS...MYC"
[13] "X3.U2OS...WDR82" "X4.U2OS...PIM"   "X5.U2OS...EV"    "exp1.U2OS.EV"
[17] "exp1.U2OS.MYC"   "EXP1.U20S..PIM1" "EXP1.U2OS.WDR82" "EXP1.U20S.OBX"
[21] "EXP2.U2OS.EV"    "EXP2.U2OS.MYC"   "EXP2.U2OS.PIM1"  "EXP2.U2OS.WDR82"
[25] "EXP2.U2OS.OBX"

As you can see, some of the row names share the same partial name. For example, I want every row name that contains MYC to be changed entirely into "MYC". Overall, the row names contain five factors: MYC, EV, PIM, WDR and OBX.

Solution

As @teucer points out, you can't have duplicate row names. Instead, create a new column in your data frame and use a simple regular expression to extract your factors. For example,

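## Some of your row names
x = c("U2OS.EV.2.7.9", "U2OS.PIM.2.7.9", "U2OS.WDR.2.7.9", "U2OS.MYC.2.7.9",
      "U2OS.OBX.2.7.9", "U2OS.EV.18.6.9", "U2O2.PIM.18.6.9","U2OS.WDR.18.6.9",
      "U2OS.MYC.18.6.9","U2OS.OBX.18.6.9", "X1.U2OS...OBX","X2.U2OS...MYC")
gsub(".*(MYC|EV|PIM|WDR|OBX).*", "\\1", x)
## -> "EV" "PIM" "WDR" "MYC" "OBX" "EV" "PIM" "WDR" "MYC" "OBX" "OBX" "MYC"

## Apply the same pattern to all row names of test (x has only 12 of its 25 rows)
test$rnames = gsub(".*(MYC|EV|PIM|WDR|OBX).*", "\\1", rownames(test))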
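The following is a minimal, self-contained sketch of the same idea. It assumes a small hypothetical data frame that stands in for `test`. The `value` column, the choice of six row names and the `groups` variable are made up for illustration. The sketch stores the extracted label as a factor with the five levels from the question. If you really need the labels as row names, it first makes them unique with base R's `make.unique()`, because duplicate row names are not allowed.

```r
## Hypothetical stand-in for the real `test` data frame
test = data.frame(value = 1:6,
                  row.names = c("U2OS.EV.2.7.9", "U2OS.MYC.2.7.9", "X2.U2OS...MYC",
                                "EXP1.U20S..PIM1", "EXP1.U2OS.WDR82", "EXP2.U2OS.OBX"))

## The five labels from the question, used both in the pattern and as factor levels
groups = c("MYC", "EV", "PIM", "WDR", "OBX")
pattern = paste0(".*(", paste(groups, collapse = "|"), ").*")  # ".*(MYC|EV|PIM|WDR|OBX).*"

## Keep the original row names; store the extracted label in a new factor column
test$rnames = factor(gsub(pattern, "\\1", rownames(test)), levels = groups)

## Number of rows per label
print(table(test$rnames))

## Only if the labels must become row names: make them unique first (MYC, MYC.1, ...)
rownames(test) = make.unique(as.character(test$rnames))
print(test)
```

`gsub()` returns a row name that contains none of the five labels unchanged. Converting the result with `factor(..., levels = groups)` turns such a value into `NA`, so unmatched rows are easy to find with `is.na(test$rnames)`.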
