R: How to change the column names in a data frame based on a specification

Problem description

I have a data frame; the first few rows are below:

                                SM_H1455          SM_V1456          SM_K1457      SM_X1461          SM_K1462
ENSG00000000419.8                290               270               314               364               240
ENSG00000000457.8                252               230               242               220               106
ENSG00000000460.11               154               158               162               136                64
ENSG00000000938.7              20106             18664             19764             15640             19024
ENSG00000000971.11                30                10                 4                 2                10

Note that there are many more cols and rows.
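To try the answers on the excerpt shown, the first five rows can be rebuilt as a data frame with the gene IDs as row names. This is a minimal sketch that assumes the counts are plain numbers and the gene IDs are row names rather than a column; the real data frame is larger.

```r
# Reconstruction of the excerpt above only (5 rows, 5 columns);
# assumes the gene IDs are row names, not a regular column.
x <- data.frame(
  SM_H1455 = c(290, 252, 154, 20106, 30),
  SM_V1456 = c(270, 230, 158, 18664, 10),
  SM_K1457 = c(314, 242, 162, 19764, 4),
  SM_X1461 = c(364, 220, 136, 15640, 2),
  SM_K1462 = c(240, 106, 64, 19024, 10),
  row.names = c("ENSG00000000419.8", "ENSG00000000457.8",
                "ENSG00000000460.11", "ENSG00000000938.7",
                "ENSG00000000971.11")
)
str(x)
```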

Here's what I want to do: I want to change the names of the columns. The most important information in a column's name, e.g. SM_H1455, is the 4th character of the string; in this case it's an "H". I want to change the "SM" part to "Control" if the 4th character is "H" or "K", and to "Case" if the 4th character is "X" or "V". I'd like to keep everything else in the name, so that in the end I get a table like this:

                        Control_H1455          Case_V1456        Control_K1457      Case_X1461        Control_K1462
ENSG00000000419.8                290               270               314               364               240
ENSG00000000457.8                252               230               242               220               106
ENSG00000000460.11               154               158               162               136                64
ENSG00000000938.7              20106             18664             19764             15640             19024
ENSG00000000971.11                30                10                 4                 2                10

Please keep in mind that whether the 4th character is "V", "X", "K" or "H" is completely random.
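Because the letters appear in random order, each name has to be classified from its own 4th character rather than by column position. One way to express the rule is a lookup table keyed on that letter. This is a sketch, not the accepted answer: `prefix_for` and `relabel` are hypothetical names, and it assumes every name starts with the two-character prefix "SM".

```r
# Hypothetical lookup table: 4th character -> new prefix
prefix_for <- c(H = "Control", K = "Control", X = "Case", V = "Case")

# Hypothetical helper; assumes the old prefix is exactly two characters ("SM")
relabel <- function(nms) {
  code <- substring(nms, 4, 4)          # e.g. "H" in "SM_H1455"
  new_prefix <- prefix_for[code]        # NA for letters not in the table
  # Keep everything from the underscore onward; unknown codes stay unchanged
  unname(ifelse(is.na(new_prefix), nms, paste0(new_prefix, substring(nms, 3))))
}

old <- c("SM_H1455", "SM_V1456", "SM_K1457", "SM_X1461", "SM_K1462")
relabel(old)
```

On a data frame this would be applied as `names(x) <- relabel(names(x))`. `unname()` drops the letter names that `ifelse()` would otherwise copy from the lookup result.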

Any help is appreciated! Thanks.

Recommended answer

One way, where x is your data frame:

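# Positions of the columns whose 4th character marks a control (H/K) or a case (X/V)
controls <- which(substring(names(x), 4, 4) %in% c("H", "K"))
cases    <- which(substring(names(x), 4, 4) %in% c("X", "V"))
# Replace only the leading "SM"; the rest of each name is kept
names(x)[controls] <- sub("^SM", "Control", names(x)[controls])
names(x)[cases]    <- sub("^SM", "Case", names(x)[cases])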
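`gsub("SM", ...)` replaces every occurrence of "SM" in a name, not only the prefix. Anchoring the pattern with `^` and using `sub` limits the change to the start of the name. A small check on a hypothetical name that contains "SM" twice:

```r
# Hypothetical column name with a second "SM" after the prefix
nm <- "SM_H1455_SMC"

gsub("SM", "Control", nm)   # "Control_H1455_ControlC": every "SM" replaced
sub("^SM", "Control", nm)   # "Control_H1455_SMC": only the leading "SM" replaced
```

For names like those in the question, both calls give the same result; the anchored form only matters if "SM" can appear elsewhere in a name.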

Alternatively:

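names(x) <- vapply(names(x), function(z) {
    code <- substring(z, 4, 4)  # the 4th character, e.g. "H" in "SM_H1455"
    if (code %in% c("H", "K"))
        sub("^SM", "Control", z)
    else if (code %in% c("X", "V"))
        sub("^SM", "Case", z)
    else
        z  # any other name is returned unchanged instead of NULL
}, character(1), USE.NAMES = FALSE)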
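A regex-only variant, offered here as a sketch rather than part of the original answer, tests the letter directly in the pattern with a capture group. It assumes the prefix is exactly "SM_", so the captured letter is the 4th character.

```r
old <- c("SM_H1455", "SM_V1456", "SM_K1457", "SM_X1461", "SM_K1462")

# H or K after "SM_" -> control; \\1 re-inserts the captured letter
new <- sub("^SM_([HK])", "Control_\\1", old)
# X or V after "SM_" -> case; names matching neither pattern stay unchanged
new <- sub("^SM_([XV])", "Case_\\1", new)
new
```

On the data frame this becomes `names(x) <- sub("^SM_([XV])", "Case_\\1", sub("^SM_([HK])", "Control_\\1", names(x)))`. The second call cannot touch names already rewritten by the first, because they no longer start with "SM_".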
