R: How to change the column names in a data frame based on a specification
Question
I have a data frame, the start of it is below:
SM_H1455 SM_V1456 SM_K1457 SM_X1461 SM_K1462
ENSG00000000419.8 290 270 314 364 240
ENSG00000000457.8 252 230 242 220 106
ENSG00000000460.11 154 158 162 136 64
ENSG00000000938.7 20106 18664 19764 15640 19024
ENSG00000000971.11 30 10 4 2 10
Note that there are many more cols and rows.
Here's what I want to do: I want to change the names of the columns. The most important information in a column name, e.g. SM_H1455, is the 4th character of the string; in this case it's an H. I want to change the "SM" part to "Control" if the 4th character is "H" or "K", and to "Case" if the 4th character is "X" or "V". I'd like to keep everything else in the name, so that in the end I have a table like this:
Control_H1455 Case_V1456 Control_K1457 Case_X1461 Control_K1462
ENSG00000000419.8 290 270 314 364 240
ENSG00000000457.8 252 230 242 220 106
ENSG00000000460.11 154 158 162 136 64
ENSG00000000938.7 20106 18664 19764 15640 19024
ENSG00000000971.11 30 10 4 2 10
Please keep in mind that whether the 4th character is "V", "X", "K" or "H" is completely random.
Any help is appreciated! Thanks.
Recommended answer
One way, where x is your data frame:
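# Column indices whose 4th character marks a control (H, K) or a case (X, V)
controls <- which(substring(names(x), 4, 4) %in% c("H", "K"))
cases <- which(substring(names(x), 4, 4) %in% c("X", "V"))
# Replace only the leading "SM" prefix of the selected names
names(x)[controls] <- sub("^SM", "Control", names(x)[controls])
names(x)[cases] <- sub("^SM", "Case", names(x)[cases])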
Alternatively:
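names(x) <- vapply(names(x), function(z) {
  key <- substring(z, 4, 4)  # the 4th character decides the group
  if (key %in% c("H", "K")) {
    sub("^SM", "Control", z)
  } else if (key %in% c("X", "V")) {
    sub("^SM", "Case", z)
  } else {
    z  # leave any other column name unchanged
  }
}, character(1), USE.NAMES = FALSE)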
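A third option is to let the regular expression do the selection itself. The sketch below is only an illustration: it rebuilds the first three rows of the sample table from the question so that it can be run on its own, and it assumes every column to be renamed starts with "SM_" followed by the group letter. The variable new_names is a name introduced here for the example.

```r
# Rebuild the first rows of the example data from the question (illustrative subset).
x <- data.frame(
  SM_H1455 = c(290, 252, 154),
  SM_V1456 = c(270, 230, 158),
  SM_K1457 = c(314, 242, 162),
  SM_X1461 = c(364, 220, 136),
  SM_K1462 = c(240, 106, 64),
  row.names = c("ENSG00000000419.8", "ENSG00000000457.8", "ENSG00000000460.11")
)

# Anchored patterns: only a leading "SM_" followed by the group letter matches,
# and the letter itself is kept through the backreference \\1.
new_names <- sub("^SM_([HK])", "Control_\\1", names(x))
new_names <- sub("^SM_([XV])", "Case_\\1", new_names)
names(x) <- new_names

print(names(x))
```

Because sub() is vectorised over the names, no loop or index bookkeeping is needed, and names that match neither pattern pass through unchanged.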