Renaming a column entry, when it is the maximum value by group, gives inconsistent results

Problem description

I have data as follows:

library(data.table)
DT <- structure(list(State_Ab = c("VA", "VA", "VA", "VA", "VA", "VA", 
"VA", "VA", "VA", "VA", "VA", "VA", "VA", "VA", "VA", "VA", "VA", 
"VA", "VA", "VA", "VA", "VA", "VA", "VA", "VA", "VA", "VA", "VA", 
"VA", "VA", "VA", "VA", "VA", "VA", "VA", "VA", "VA", "VA", "VA", 
"VA", "VA", "VA", "VA", "VA", "VA", "VA"), year = c(1995, 1995, 
1995, 1995, 1999, 1999, 1999, 1999, 2001, 2001, 2001, 2001, 2005, 
2005, 2005, 2005, 2007, 2007, 2007, 2007, 2011, 2011, 2011, 2011, 
2017, 2017, 2017, 2005, 2005, 2005, 2005, 2017, 2017, 2017, 1995, 
1995, 1995, 1995, 2001, 2001, 2001, 2001, 2007, 2007, 2007, 2007
), County = c("Bedford", "Fairfax", "Bedford", "Fairfax", "Bedford", 
"Fairfax", "Bedford", "Fairfax", "Bedford", "Bedford", "Fairfax", 
"Fairfax", "Bedford", "Bedford", "Fairfax", "Fairfax", "Bedford", 
"Bedford", "Fairfax", "Fairfax", "Bedford", "Bedford", "Fairfax", 
"Fairfax", "Bedford", "Fairfax", "Fairfax", "Bedford", "Bedford", 
"Fairfax", "Fairfax", "Bedford", "Fairfax", "Fairfax", "Fairfax", 
"Fairfax", "Bedford", "Bedford", "Bedford", "Fairfax", "Bedford", 
"Fairfax", "Bedford", "Fairfax", "Bedford", "Fairfax"), Type = c("B", 
"B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B", 
"B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B", 
"A", "A", "A", "A", "A", "A", "A", "C", "C", "C", "C", "C", "C", 
"C", "C", "C", "C", "C", "C"), Population = c(15528, 297214, 
2053, 7505, 8963, 199282, 829, 4299, 20040, 2018, 9095, 392987, 
26930, 2319, 10225, 448078, 24499, 1935, 8048, 340397, 24012, 
1926, 7112, 303379, 41681, 479086, 9552, 31404, 2542, 10546, 
461379, 42525, 551183, 12028, 303203, 7600, 2160, 17988, 25284, 
410475, 2379, 9462, 25122, 342998, 1940, 8096)), row.names = c(NA, 
-46L), class = c("data.table", "data.frame"))

Some of these values are for Bedford City and some for Bedford County. Based on the information I have, within each group the minimum value should be Bedford City and the maximum value Bedford County. I tried the following, based on Ronak's answer, but it fails:

DT[County=="Bedford" & order(Population), County := c("Bedford County", "Bedford City"), .(State_Ab, year, County, Type)]

But I am getting the error:

Error in `[.data.table`(DT, County == "Bedford" & order(Population), `:=`(County,  : 
  Supplied 2 items to be assigned to group 1 of size 4 in column 'County'. The RHS length must either be 1 (single values are ok) or match the LHS length exactly. If you wish to 'recycle' the RHS please use rep() explicitly to make this intent clear to readers of your code.

The output then becomes:

   State_Ab year         County Type Population
 1:       VA 2017        Bedford    B      41681
 2:       VA 2005        Bedford    A      31404
 3:       VA 2005        Bedford    A       2542
 4:       VA 2017        Bedford    A      42525
 5:       VA 1995        Bedford    C       2160
 6:       VA 1995        Bedford    C      17988
 7:       VA 2001        Bedford    C      25284
 8:       VA 2001        Bedford    C       2379
 9:       VA 2007        Bedford    C      25122
10:       VA 2007        Bedford    C       1940
11:       VA 1995   Bedford City    B       2053
12:       VA 1999   Bedford City    B        829
13:       VA 2001   Bedford City    B       2018
14:       VA 2005   Bedford City    B       2319
15:       VA 2007   Bedford City    B       1935
16:       VA 2011   Bedford City    B       1926
17:       VA 1995 Bedford County    B      15528
18:       VA 1999 Bedford County    B       8963
19:       VA 2001 Bedford County    B      20040
20:       VA 2005 Bedford County    B      26930
21:       VA 2007 Bedford County    B      24499
22:       VA 2011 Bedford County    B      24012

I do not really understand where this issue is coming from.

When I try the other county in the data set, I do:

DT[County=="Fairfax" & order(Population), County := c("Fairfax County", "Fairfax City"), .(State_Ab, year, County, Type)]

I get no error, but the output is wrong (Fairfax County is much larger than Fairfax City, but in the output it is not always the larger of the two):

23:       VA 1995   Fairfax City    B       7505
24:       VA 1999   Fairfax City    B       4299
25:       VA 2001   Fairfax City    B     392987
26:       VA 2005   Fairfax City    B     448078
27:       VA 2007   Fairfax City    B     340397
28:       VA 2011   Fairfax City    B     303379
29:       VA 2017   Fairfax City    B       9552
30:       VA 2005   Fairfax City    A     461379
31:       VA 2017   Fairfax City    A      12028
32:       VA 1995   Fairfax City    C       7600
33:       VA 2001   Fairfax City    C       9462
34:       VA 2007   Fairfax City    C       8096
35:       VA 1995 Fairfax County    B     297214
36:       VA 1999 Fairfax County    B     199282
37:       VA 2001 Fairfax County    B       9095
38:       VA 2005 Fairfax County    B      10225
39:       VA 2007 Fairfax County    B       8048
40:       VA 2011 Fairfax County    B       7112
41:       VA 2017 Fairfax County    B     479086
42:       VA 2005 Fairfax County    A      10546
43:       VA 2017 Fairfax County    A     551183
44:       VA 1995 Fairfax County    C     303203
45:       VA 2001 Fairfax County    C     410475
46:       VA 2007 Fairfax County    C     342998

This is driving me absolutely nuts. What is going on here?

Desired result:

23:       VA 1995   Fairfax City    B       7505
24:       VA 1999   Fairfax City    B       4299
25:       VA 2001 Fairfax County    B     392987
26:       VA 2005 Fairfax County    B     448078
27:       VA 2007 Fairfax County    B     340397
28:       VA 2011 Fairfax County    B     303379
29:       VA 2017   Fairfax City    B       9552
30:       VA 2005 Fairfax County    A     461379
31:       VA 2017   Fairfax City    A      12028
32:       VA 1995   Fairfax City    C       7600
33:       VA 2001   Fairfax City    C       9462
34:       VA 2007   Fairfax City    C       8096
35:       VA 1995 Fairfax County    B     297214
36:       VA 1999 Fairfax County    B     199282
37:       VA 2001   Fairfax City    B       9095
38:       VA 2005   Fairfax City    B      10225
39:       VA 2007   Fairfax City    B       8048
40:       VA 2011   Fairfax City    B       7112
41:       VA 2017 Fairfax County    B     479086
42:       VA 2005   Fairfax City    A      10546
43:       VA 2017 Fairfax County    A     551183
44:       VA 1995 Fairfax County    C     303203
45:       VA 2001 Fairfax County    C     410475
46:       VA 2007 Fairfax County    C     342998

Solution

I am using a function to order County and Population, and then change the County accordingly.
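As background on why the ordering has to be done explicitly, here is a minimal base R sketch (the small vectors are copied from the first rows of the question's data; `keep` and `grp` are names made up for illustration). It suggests that `order(Population)` inside the row filter does not sort anything: `order()` returns positive integer positions, and `&` coerces every nonzero integer to `TRUE`. It also shows that a two-element label vector is assigned by row position, not by population size.

```r
# First four rows of the question's data (1995, Type B)
Population <- c(15528, 297214, 2053, 7505)
County     <- c("Bedford", "Fairfax", "Bedford", "Fairfax")

# order() returns row positions, not a logical mask
ord <- order(Population)
print(ord)

# In `cond & ord`, every nonzero integer is coerced to TRUE,
# so the filter is the same as `County == "Bedford"` alone
keep <- County == "Bedford" & ord
print(identical(keep, County == "Bedford"))

# Labels are assigned positionally: the first row of the group gets
# "Fairfax County" even when it holds the smaller population
grp <- data.frame(Population = c(9095, 392987))  # Fairfax, 2001, Type B
grp$County <- c("Fairfax County", "Fairfax City")
print(grp)
```

A length-2 right-hand side also only fits groups with exactly two rows, which is what the error message in the question complains about.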

I noticed that VA 2017 Bedford A has only one entry for that year.

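# Within one (State_Ab, year, Type) group, sort Population ascending and label
# the smaller value "City" and the larger value "County".
fn2 <- function(County, Population) {
  if (length(County) == 1) {
    # Single-row group: City vs County cannot be determined, so leave it unchanged
    return(list(County, Population))
  } else {
    list(County = paste(County, c("City", "County")),
         Population = sort(Population))
  }
}

# Group without County so that both "Bedford" rows fall into the same group
DT[County == "Bedford", c("County", "Population") := fn2(County, Population),
   .(State_Ab, year, Type)]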

Part of the resulting DT:
    State_Ab year         County Type Population
 1:       VA 1995   Bedford City    B       2053
 2:       VA 1995        Fairfax    B     297214
 3:       VA 1995 Bedford County    B      15528
 4:       VA 1995        Fairfax    B       7505
 5:       VA 1999   Bedford City    B        829
 6:       VA 1999        Fairfax    B     199282
 7:       VA 1999 Bedford County    B       8963
 8:       VA 1999        Fairfax    B       4299
 9:       VA 2001   Bedford City    B       2018
10:       VA 2001 Bedford County    B      20040
11:       VA 2001        Fairfax    B       9095
12:       VA 2001        Fairfax    B     392987
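The same idea can probably be applied to both counties without moving Population values between rows. The following is only a sketch under some assumptions: `label_by_size` is a hypothetical helper (not part of data.table), the table is a small subset of the question's rows, and each group is assumed to hold at most one City and one County row. Groups that do not have exactly two rows are left untouched.

```r
library(data.table)

# Small illustrative table (a subset of the question's rows)
DT <- data.table(
  State_Ab   = "VA",
  year       = c(1995, 1995, 2001, 2001, 2017, 2001, 2001),
  County     = c("Fairfax", "Fairfax", "Fairfax", "Fairfax",
                 "Bedford", "Bedford", "Bedford"),
  Type       = c("B", "B", "B", "B", "A", "B", "B"),
  Population = c(297214, 7505, 9095, 392987, 42525, 20040, 2018)
)

# Hypothetical helper: within a two-row group, the row holding the maximum
# population is labelled "County" and the other row "City".
# Any other group size is returned unchanged.
label_by_size <- function(County, Population) {
  if (length(County) != 2L) return(County)
  paste(County, ifelse(Population == max(Population), "County", "City"))
}

# Apply per county; County is deliberately not part of `by`
for (cty in c("Bedford", "Fairfax")) {
  DT[County == cty, County := label_by_size(County, Population),
     by = .(State_Ab, year, Type)]
}

print(DT)
```

Because the label is derived from a comparison with `max(Population)` rather than from row position, the result does not depend on the order of the rows. If both rows of a group had the same population, both would be labelled "County", so ties would need separate handling.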
