如何更改数据表中的几个列名称 [英] How to change few column names in a data table

查看:135
本文介绍了如何更改数据表中的几个列名称的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个包含10列的数据表。

  town 
tc
one
two


five
six
seven
total

对于我使用的列一到总生成平均值,

  DTmean < lapply(.SD,mean)),by =。(town,tc),。SDcols = 3:10] 


$ b b

这产生了平均值,但是我想要的列名称后面加上_mean。我们如何做到这一点?希望前两列保持与城镇和tc相同。我尝试下面但是它重命名所有一到总只是_mean

  for :10){
setnames(DTmean,i,paste0(names(i),_ mean))
}


解决方案

如果你想用 data.table 方法, c $ c> setnames 如下所示:

  setnames(DTmean,3:10,paste0 (DT)[3:10],'_mean'))

或:

  cols<  -  names(DT)[3:10] 
setnames(DTmean,cols,paste0(cols,'_mean') )

此外,您不需要 .SDcols 语句,因为您正在聚合所有其他列。因此,使用 DT [,lapply(.SD,mean),by =。(town,tc)] 应该给出与使用 [,(lapply(.SD,mean)),by =。(town,tc),.SDcols = 3:10]






在以下示例数据集中:

  set.seed $ b DT < -  data.table(town = rep(c('A','B'),each = 10),
tc = rep ),
one = rnorm(20,1,1),
two = rnorm(20,2,1),
three = rnorm(20,3,1),
four = rnorm(20,4,1),
five = rnorm(20,5,2),
six = rnorm(20,6,2),
seven = rnorm 20,7,2),
total = rnorm(20,28,3))

使用:

  DTmean<  -  DT [,lapply(.SD,mean),by =。(town,tc) ] 
setnames(DTmean,3:10,paste0(names(DT)[3:10],'_mean'))

提供:

  DTmean 
town tc one_mean two_mean three_mean four_mean five_mean six_mean seven_mean total_mean
1:AC 1.7368898 1.883586 3.358440 4.849896 4.742609 5.089877 6.792513 29.20286
2:AD 0.8906842 1.826135 3.267684 3.760931 6.210145 7.320693 5.571687 26.56142
3 :BC 1.4037955 2.474836 2.587920 3.719658 3.446612 6.510183 8.309784 27.80012
4:BD 0.8103511 1.153000 3.360940 3.945082 5.555999 6.198380 8.652779 28.95180






回答你的意见:如果你想同时计算平均值和 sd (改编自我的回答这里):

  DT [,as.list(unlist(lapply(.SD,function(x)list(mean = mean(x),sd = sd(x))) tc)] 

其中:

  town tc one.mean one.sd two.mean two.sd three.mean three.sd four.mean four.sd five.mean five.sd six.mean six.sd seven.mean seven.sd total.mean total.sd 
1:AC 0.2981842 0.3556520 1.578174 0.7788545 2.232366 0.9047046 4.896201 1.238877 4.625866 0.7436584 7.607439 1.7262628 7.949366 1.772771 28.94287 3.902602
2:AD 1.2099018 1.0205252 1.686068 1.5497989 2.671027 0.8323733 4.811279 1.404794 7.235969 0.7883873 6.765797 2.7719942 6.657298 1.107843 27.42563 3.380785
3:BC 0.9238309 0.6679821 2.525485 0.8054734 3.138298 1.0111270 3.876207 0.573342 3.843140 2.1991052 4.942155 0.7784024 6.783383 2.595116 28.95243 1.078307
4:BD 0.8843948 0.9384975 1.988908 1.0543981 3.673393 1.3505701 3.957534 1.097837 2.788119 1.9089660 6.463784 0.7642144 6.416487 2.041441 27.88205 3.807119

但是,很可能最好以长格式存储。要得到这个,你可以使用 data.table 熔化函数如下:

  cols < -  names(DT)[3:10] 
DT2 < .SD,function(x)list(mn = mean(x),sdev = sd(x)))),by =。(town,tc)],
id.vars = c ,'tc'),
measure.vars = patterns('。mn','。sdev'),
value.name = c('mn','sdev')) = cols [variable]]

或更简单的操作:

  DT2   

在:

 > DT2 
town tc variable mn sdev
1:AC one 0.2981842 0.3556520
2:AD one 1.2099018 1.0205252
3:BC one 0.9238309 0.6679821
4:BD one 0.8843948 0.9384975
5:AC two 1.5781743 0.7788545
6:AD two 1.6860675 1.5497989
7:BC two 2.5254855 0.8054734
8:BD two 1.9889082 1.0543981
9:AC three 2.2323655 0.9047046
10:AD三个2.6710267 0.8323733
11:BC三个3.1382982 1.0111270
12:BD三个3.6733929 1.3505701
.....






为了回应您最后的评论,您可以检测异常值,如下所示:

  DT3 < -  melt(DT,id.vars = c('town','tc'))
DT3 [,`:= `(mn = mean(value),sdev = sd(value))by =。(town,tc,variable)
] [,outlier:= +(value< mn -sdev | value> + sdev)]

其中:

  town tc variable value mn sdev outlier 
1:AC one 0.5681578 0.2981842 0.355652 0
2:AD one 0.5528128 1.2099018 1.020525 0
3:AC one 0.5214274 0.2981842 0.355652 0
4:AD one 1.4171454 1.2099018 1.020525 0
5:AC one 0.5820994 0.2981842 0.355652 0
---
156:BD total 23.4462542 27.8820524 3.807119 1
157:BC总计30.5934956 28.9524305 1.078307 1
158:BD总计30.5618759 27.8820524 3.807119 0
159:BC总计27.5940307 28.9524305 1.078307 1
160:BD总计24.8378437 27.8820524 3.807119 0


I have a data table with 10 columns.

town    
tc  
one  
two  
three   
four    
five    
six  
seven   
total

Need to generate mean for columns "one" to "total" for which I am using,

DTmean <- DT[,(lapply(.SD,mean)),by = .(town,tc),.SDcols=3:10]

This generates the mean, but then I want the column names to be suffixed with "_mean". How can we do this? Want the first two columns to remain the same as "town" and "tc". I tried the below but then it renames all "one" to "total" to just "_mean"

for (i in 3:10) {
  setnames(DTmean,i,paste0(names(i),"_mean"))
}

解决方案

If you want to do it the data.table way, you should use setnames as follows:

setnames(DTmean, 3:10, paste0(names(DT)[3:10], '_mean'))

or:

cols <- names(DT)[3:10]
setnames(DTmean, cols, paste0(cols, '_mean'))

Furthermore, you don't need the .SDcols statement as you are aggregating all the other columns. Using DT[, lapply(.SD,mean), by = .(town,tc)] should thus give you the same result as using DT[, (lapply(.SD,mean)), by = .(town,tc), .SDcols=3:10].


On the following example dataset:

set.seed(71)
DT <- data.table(town = rep(c('A','B'), each=10),
                 tc = rep(c('C','D'), 10),
                 one = rnorm(20,1,1),
                 two = rnorm(20,2,1),
                 three = rnorm(20,3,1),
                 four = rnorm(20,4,1),
                 five = rnorm(20,5,2),
                 six = rnorm(20,6,2),
                 seven = rnorm(20,7,2),
                 total = rnorm(20,28,3))

using:

DTmean <- DT[, lapply(.SD,mean), by = .(town,tc)]
setnames(DTmean, 3:10, paste0(names(DT)[3:10], '_mean'))

gives:

> DTmean
   town tc  one_mean two_mean three_mean four_mean five_mean six_mean seven_mean total_mean
1:    A  C 1.7368898 1.883586   3.358440  4.849896  4.742609 5.089877   6.792513   29.20286
2:    A  D 0.8906842 1.826135   3.267684  3.760931  6.210145 7.320693   5.571687   26.56142
3:    B  C 1.4037955 2.474836   2.587920  3.719658  3.446612 6.510183   8.309784   27.80012
4:    B  D 0.8103511 1.153000   3.360940  3.945082  5.555999 6.198380   8.652779   28.95180


In reply to your comment: If you want to calculate both the mean and the sd simultanuously, you could do (adapted from my answer here):

DT[, as.list(unlist(lapply(.SD, function(x) list(mean = mean(x), sd = sd(x))))), by = .(town,tc)]

which gives:

   town tc  one.mean    one.sd two.mean    two.sd three.mean  three.sd four.mean  four.sd five.mean   five.sd six.mean    six.sd seven.mean seven.sd total.mean total.sd
1:    A  C 0.2981842 0.3556520 1.578174 0.7788545   2.232366 0.9047046  4.896201 1.238877  4.625866 0.7436584 7.607439 1.7262628   7.949366 1.772771   28.94287 3.902602
2:    A  D 1.2099018 1.0205252 1.686068 1.5497989   2.671027 0.8323733  4.811279 1.404794  7.235969 0.7883873 6.765797 2.7719942   6.657298 1.107843   27.42563 3.380785
3:    B  C 0.9238309 0.6679821 2.525485 0.8054734   3.138298 1.0111270  3.876207 0.573342  3.843140 2.1991052 4.942155 0.7784024   6.783383 2.595116   28.95243 1.078307
4:    B  D 0.8843948 0.9384975 1.988908 1.0543981   3.673393 1.3505701  3.957534 1.097837  2.788119 1.9089660 6.463784 0.7642144   6.416487 2.041441   27.88205 3.807119

However, it is highly probable better to store this in long format. To get this you could use data.table's melt function as follows:

cols <- names(DT)[3:10]
DT2 <- melt(DT[, as.list(unlist(lapply(.SD, function(x) list(mn = mean(x), sdev = sd(x))))), by = .(town,tc)], 
            id.vars = c('town','tc'), 
            measure.vars = patterns('.mn','.sdev'),
            value.name = c('mn','sdev'))[, variable := cols[variable]]

or in a much simpler operation:

DT2 <- melt(DT, id.vars = c('town','tc'))[, .(mn = mean(value), sdev = sd(value)), by = .(town,tc,variable)]

which results in:

> DT2
    town tc variable         mn      sdev
 1:    A  C      one  0.2981842 0.3556520
 2:    A  D      one  1.2099018 1.0205252
 3:    B  C      one  0.9238309 0.6679821
 4:    B  D      one  0.8843948 0.9384975
 5:    A  C      two  1.5781743 0.7788545
 6:    A  D      two  1.6860675 1.5497989
 7:    B  C      two  2.5254855 0.8054734
 8:    B  D      two  1.9889082 1.0543981
 9:    A  C    three  2.2323655 0.9047046
10:    A  D    three  2.6710267 0.8323733
11:    B  C    three  3.1382982 1.0111270
12:    B  D    three  3.6733929 1.3505701
.....


In response to your last comments, you can detect outliers as follows:

DT3 <- melt(DT, id.vars = c('town','tc'))
DT3[, `:=` (mn = mean(value), sdev = sd(value)), by = .(town,tc,variable)
    ][, outlier := +(value < mn - sdev | value > mn + sdev)]

which gives:

     town tc variable      value         mn     sdev outlier
  1:    A  C      one  0.5681578  0.2981842 0.355652       0
  2:    A  D      one  0.5528128  1.2099018 1.020525       0
  3:    A  C      one  0.5214274  0.2981842 0.355652       0
  4:    A  D      one  1.4171454  1.2099018 1.020525       0
  5:    A  C      one  0.5820994  0.2981842 0.355652       0
 ---                                                        
156:    B  D    total 23.4462542 27.8820524 3.807119       1
157:    B  C    total 30.5934956 28.9524305 1.078307       1
158:    B  D    total 30.5618759 27.8820524 3.807119       0
159:    B  C    total 27.5940307 28.9524305 1.078307       1
160:    B  D    total 24.8378437 27.8820524 3.807119       0

这篇关于如何更改数据表中的几个列名称的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆