如何更改数据表中的几个列名称 [英] How to change few column names in a data table
问题描述
我有一个包含10列的数据表。
town
tc
one
two
三
四
five
six
seven
total
对于我使用的列一到总生成平均值,
DTmean < lapply(.SD,mean)),by =。(town,tc),。SDcols = 3:10]
$ b b
这产生了平均值,但是我想要的列名称后面加上_mean。我们如何做到这一点?希望前两列保持与城镇和tc相同。我尝试下面但是它重命名所有一到总只是_mean
for :10){
setnames(DTmean,i,paste0(names(i),_ mean))
}
如果你想用 data.table
方法, c $ c> setnames 如下所示:
setnames(DTmean,3:10,paste0 (DT)[3:10],'_mean'))
或:
cols< - names(DT)[3:10]
setnames(DTmean,cols,paste0(cols,'_mean') )
此外,您不需要 .SDcols
语句,因为您正在聚合所有其他列。因此,使用 DT [,lapply(.SD,mean),by =。(town,tc)]
应该给出与使用 [,(lapply(.SD,mean)),by =。(town,tc),.SDcols = 3:10]
。
在以下示例数据集中:
set.seed $ b DT < - data.table(town = rep(c('A','B'),each = 10),
tc = rep ),
one = rnorm(20,1,1),
two = rnorm(20,2,1),
three = rnorm(20,3,1),
four = rnorm(20,4,1),
five = rnorm(20,5,2),
six = rnorm(20,6,2),
seven = rnorm 20,7,2),
total = rnorm(20,28,3))
使用:
DTmean< - DT [,lapply(.SD,mean),by =。(town,tc) ]
setnames(DTmean,3:10,paste0(names(DT)[3:10],'_mean'))
提供:
DTmean
town tc one_mean two_mean three_mean four_mean five_mean six_mean seven_mean total_mean
1:AC 1.7368898 1.883586 3.358440 4.849896 4.742609 5.089877 6.792513 29.20286
2:AD 0.8906842 1.826135 3.267684 3.760931 6.210145 7.320693 5.571687 26.56142
3 :BC 1.4037955 2.474836 2.587920 3.719658 3.446612 6.510183 8.309784 27.80012
4:BD 0.8103511 1.153000 3.360940 3.945082 5.555999 6.198380 8.652779 28.95180
回答你的意见:如果你想同时计算平均值和 sd
(改编自我的回答这里):
DT [,as.list(unlist(lapply(.SD,function(x)list(mean = mean(x),sd = sd(x))) tc)]
其中:
town tc one.mean one.sd two.mean two.sd three.mean three.sd four.mean four.sd five.mean five.sd six.mean six.sd seven.mean seven.sd total.mean total.sd
1:AC 0.2981842 0.3556520 1.578174 0.7788545 2.232366 0.9047046 4.896201 1.238877 4.625866 0.7436584 7.607439 1.7262628 7.949366 1.772771 28.94287 3.902602
2:AD 1.2099018 1.0205252 1.686068 1.5497989 2.671027 0.8323733 4.811279 1.404794 7.235969 0.7883873 6.765797 2.7719942 6.657298 1.107843 27.42563 3.380785
3:BC 0.9238309 0.6679821 2.525485 0.8054734 3.138298 1.0111270 3.876207 0.573342 3.843140 2.1991052 4.942155 0.7784024 6.783383 2.595116 28.95243 1.078307
4:BD 0.8843948 0.9384975 1.988908 1.0543981 3.673393 1.3505701 3.957534 1.097837 2.788119 1.9089660 6.463784 0.7642144 6.416487 2.041441 27.88205 3.807119
但是,很可能最好以长格式存储。要得到这个,你可以使用 data.table
的熔化
函数如下:
cols < - names(DT)[3:10]
DT2 < .SD,function(x)list(mn = mean(x),sdev = sd(x)))),by =。(town,tc)],
id.vars = c ,'tc'),
measure.vars = patterns('。mn','。sdev'),
value.name = c('mn','sdev')) = cols [variable]]
或更简单的操作:
DT2
在:
> DT2
town tc variable mn sdev
1:AC one 0.2981842 0.3556520
2:AD one 1.2099018 1.0205252
3:BC one 0.9238309 0.6679821
4:BD one 0.8843948 0.9384975
5:AC two 1.5781743 0.7788545
6:AD two 1.6860675 1.5497989
7:BC two 2.5254855 0.8054734
8:BD two 1.9889082 1.0543981
9:AC three 2.2323655 0.9047046
10:AD三个2.6710267 0.8323733
11:BC三个3.1382982 1.0111270
12:BD三个3.6733929 1.3505701
.....
为了回应您最后的评论,您可以检测异常值,如下所示:
DT3 < - melt(DT,id.vars = c('town','tc'))
DT3 [,`:= `(mn = mean(value),sdev = sd(value))by =。(town,tc,variable)
] [,outlier:= +(value< mn -sdev | value> + sdev)]
其中:
town tc variable value mn sdev outlier
1:AC one 0.5681578 0.2981842 0.355652 0
2:AD one 0.5528128 1.2099018 1.020525 0
3:AC one 0.5214274 0.2981842 0.355652 0
4:AD one 1.4171454 1.2099018 1.020525 0
5:AC one 0.5820994 0.2981842 0.355652 0
---
156:BD total 23.4462542 27.8820524 3.807119 1
157:BC总计30.5934956 28.9524305 1.078307 1
158:BD总计30.5618759 27.8820524 3.807119 0
159:BC总计27.5940307 28.9524305 1.078307 1
160:BD总计24.8378437 27.8820524 3.807119 0
I have a data table with 10 columns.
town tc one two three four five six seven total
Need to generate mean for columns "one" to "total" for which I am using,
DTmean <- DT[,(lapply(.SD,mean)),by = .(town,tc),.SDcols=3:10]
This generates the mean, but then I want the column names to be suffixed with "_mean". How can we do this? Want the first two columns to remain the same as "town" and "tc". I tried the below but then it renames all "one" to "total" to just "_mean"
for (i in 3:10) { setnames(DTmean,i,paste0(names(i),"_mean")) }
解决方案If you want to do it the
data.table
way, you should usesetnames
as follows:setnames(DTmean, 3:10, paste0(names(DT)[3:10], '_mean'))
or:
cols <- names(DT)[3:10] setnames(DTmean, cols, paste0(cols, '_mean'))
Furthermore, you don't need the
.SDcols
statement as you are aggregating all the other columns. UsingDT[, lapply(.SD,mean), by = .(town,tc)]
should thus give you the same result as usingDT[, (lapply(.SD,mean)), by = .(town,tc), .SDcols=3:10]
.
On the following example dataset:
set.seed(71) DT <- data.table(town = rep(c('A','B'), each=10), tc = rep(c('C','D'), 10), one = rnorm(20,1,1), two = rnorm(20,2,1), three = rnorm(20,3,1), four = rnorm(20,4,1), five = rnorm(20,5,2), six = rnorm(20,6,2), seven = rnorm(20,7,2), total = rnorm(20,28,3))
using:
DTmean <- DT[, lapply(.SD,mean), by = .(town,tc)] setnames(DTmean, 3:10, paste0(names(DT)[3:10], '_mean'))
gives:
> DTmean town tc one_mean two_mean three_mean four_mean five_mean six_mean seven_mean total_mean 1: A C 1.7368898 1.883586 3.358440 4.849896 4.742609 5.089877 6.792513 29.20286 2: A D 0.8906842 1.826135 3.267684 3.760931 6.210145 7.320693 5.571687 26.56142 3: B C 1.4037955 2.474836 2.587920 3.719658 3.446612 6.510183 8.309784 27.80012 4: B D 0.8103511 1.153000 3.360940 3.945082 5.555999 6.198380 8.652779 28.95180
In reply to your comment: If you want to calculate both the mean and the
sd
simultanuously, you could do (adapted from my answer here):DT[, as.list(unlist(lapply(.SD, function(x) list(mean = mean(x), sd = sd(x))))), by = .(town,tc)]
which gives:
town tc one.mean one.sd two.mean two.sd three.mean three.sd four.mean four.sd five.mean five.sd six.mean six.sd seven.mean seven.sd total.mean total.sd 1: A C 0.2981842 0.3556520 1.578174 0.7788545 2.232366 0.9047046 4.896201 1.238877 4.625866 0.7436584 7.607439 1.7262628 7.949366 1.772771 28.94287 3.902602 2: A D 1.2099018 1.0205252 1.686068 1.5497989 2.671027 0.8323733 4.811279 1.404794 7.235969 0.7883873 6.765797 2.7719942 6.657298 1.107843 27.42563 3.380785 3: B C 0.9238309 0.6679821 2.525485 0.8054734 3.138298 1.0111270 3.876207 0.573342 3.843140 2.1991052 4.942155 0.7784024 6.783383 2.595116 28.95243 1.078307 4: B D 0.8843948 0.9384975 1.988908 1.0543981 3.673393 1.3505701 3.957534 1.097837 2.788119 1.9089660 6.463784 0.7642144 6.416487 2.041441 27.88205 3.807119
However, it is highly probable better to store this in long format. To get this you could use
data.table
'smelt
function as follows:cols <- names(DT)[3:10] DT2 <- melt(DT[, as.list(unlist(lapply(.SD, function(x) list(mn = mean(x), sdev = sd(x))))), by = .(town,tc)], id.vars = c('town','tc'), measure.vars = patterns('.mn','.sdev'), value.name = c('mn','sdev'))[, variable := cols[variable]]
or in a much simpler operation:
DT2 <- melt(DT, id.vars = c('town','tc'))[, .(mn = mean(value), sdev = sd(value)), by = .(town,tc,variable)]
which results in:
> DT2 town tc variable mn sdev 1: A C one 0.2981842 0.3556520 2: A D one 1.2099018 1.0205252 3: B C one 0.9238309 0.6679821 4: B D one 0.8843948 0.9384975 5: A C two 1.5781743 0.7788545 6: A D two 1.6860675 1.5497989 7: B C two 2.5254855 0.8054734 8: B D two 1.9889082 1.0543981 9: A C three 2.2323655 0.9047046 10: A D three 2.6710267 0.8323733 11: B C three 3.1382982 1.0111270 12: B D three 3.6733929 1.3505701 .....
In response to your last comments, you can detect outliers as follows:
DT3 <- melt(DT, id.vars = c('town','tc')) DT3[, `:=` (mn = mean(value), sdev = sd(value)), by = .(town,tc,variable) ][, outlier := +(value < mn - sdev | value > mn + sdev)]
which gives:
town tc variable value mn sdev outlier 1: A C one 0.5681578 0.2981842 0.355652 0 2: A D one 0.5528128 1.2099018 1.020525 0 3: A C one 0.5214274 0.2981842 0.355652 0 4: A D one 1.4171454 1.2099018 1.020525 0 5: A C one 0.5820994 0.2981842 0.355652 0 --- 156: B D total 23.4462542 27.8820524 3.807119 1 157: B C total 30.5934956 28.9524305 1.078307 1 158: B D total 30.5618759 27.8820524 3.807119 0 159: B C total 27.5940307 28.9524305 1.078307 1 160: B D total 24.8378437 27.8820524 3.807119 0
这篇关于如何更改数据表中的几个列名称的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!