How to find the number of unique ids corresponding to each date in a data frame


Problem description

I have a data frame that looks like this:

      date         time              id            datetime    
1 2015-01-02 14:27:22.130 999000000007628 2015-01-02 14:27:22 
2 2015-01-02 14:41:27.720 989001002807730 2015-01-02 14:41:27 
3 2015-01-02 14:41:27.940 989001002807730 2015-01-02 14:41:27 
4 2015-01-02 14:41:28.140 989001002807730 2015-01-02 14:41:28 
5 2015-01-02 14:41:28.170 989001002807730 2015-01-02 14:41:28 
6 2015-01-02 14:41:28.350 989001002807730 2015-01-02 14:41:28 

I need to find the number of unique "id"s for each "date" in that data frame.

I tried this:

sums<-data.frame(date=unique(data$date), numIDs=0)

for(i in unique(data$date)){
  sums[sums$date==i,]$numIDs<-length(unique(data[data$date==i,]$id))
}

and I got the following error:

 Error in `$<-.data.frame`(`*tmp*`, "numIDs", value = 0L) : 
   replacement has 1 row, data has 0
 In addition: Warning message:
 In `==.default`(data$date, i) :
   longer object length is not a multiple of shorter object length

Any ideas? Thank you!

Hopefully this helps!

data <- structure(list(date = structure(list(sec = c(0, 0, 0, 0, 0, 0, 
    0, 0, 0, 0), min = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), 
    hour = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), mday = c(2L, 
    2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), mon = c(0L, 0L, 0L, 
    0L, 0L, 0L, 0L, 0L, 0L, 0L), year = c(115L, 115L, 115L, 115L, 
    115L, 115L, 115L, 115L, 115L, 115L), wday = c(5L, 5L, 5L, 
    5L, 5L, 5L, 5L, 5L, 5L, 5L), yday = c(1L, 1L, 1L, 1L, 1L, 
    1L, 1L, 1L, 1L, 1L), isdst = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 
    0L, 0L, 0L), zone = c("PST", "PST", "PST", "PST", "PST", 
    "PST", "PST", "PST", "PST", "PST"), gmtoff = c(NA_integer_, 
    NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, 
    NA_integer_, NA_integer_, NA_integer_, NA_integer_)), .Names = c("sec", 
"min", "hour", "mday", "mon", "year", "wday", "yday", "isdst", 
"zone", "gmtoff"), class = c("POSIXlt", "POSIXt")), time = c("14:27:22.130", 
"14:41:27.720", "14:41:27.940", "14:41:28.140", "14:41:28.170", 
"14:41:28.350", "14:41:28.390", "14:41:28.520", "14:41:28.630", 
"14:41:28.740"), id = c("999000000007628", "989001002807730", 
"989001002807730", "989001002807730", "989001002807730", "989001002807730", 
"989001002807730", "989001002807730", "989001002807730", "989001002807730"
), datetime = structure(list(sec = c(22.13, 27.72, 27.94, 28.14, 
28.17, 28.35, 28.39, 28.52, 28.63, 28.74), min = c(27L, 41L, 
41L, 41L, 41L, 41L, 41L, 41L, 41L, 41L), hour = c(14L, 14L, 14L, 
14L, 14L, 14L, 14L, 14L, 14L, 14L), mday = c(2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L), mon = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 
0L, 0L, 0L), year = c(115L, 115L, 115L, 115L, 115L, 115L, 115L, 
115L, 115L, 115L), wday = c(5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 
     5L), yday = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), isdst = c(0L, 
    0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), zone = c("PST", "PST", "PST", 
    "PST", "PST", "PST", "PST", "PST", "PST", "PST"), gmtoff = c(NA_integer_, 
    NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, 
    NA_integer_, NA_integer_, NA_integer_, NA_integer_)), .Names = c("sec", 
    "min", "hour", "mday", "mon", "year", "wday", "yday", "isdst", 
    "zone", "gmtoff"), class = c("POSIXlt", "POSIXt")), site = c("Chivato", 
    "Chivato", "Chivato", "Chivato", "Chivato", "Chivato", "Chivato", 
    "Chivato", "Chivato", "Chivato")), .Names = c("date", "time", 
    "id", "datetime", "site"), row.names = c(NA, 10L), class = "data.frame")

Solution

You can use the uniqueN function from data.table (df here stands for the data frame that the question calls data):

library(data.table)
setDT(df)[, uniqueN(id), by = date]

or (as per the comment of @Richard Scriven):

aggregate(id ~ date, df, function(x) length(unique(x)))
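
One thing worth checking before running either snippet: in the posted data the date column has class POSIXlt, which is stored internally as a list of components (sec, min, hour, ...). A for loop over such a vector drops the class and iterates over those components instead of over dates, which would be consistent with the error shown in the question. Below is a minimal base R sketch, assuming it is acceptable to convert the column to Date first. The toy data frame (including the id ending in 731 and the second date) is hypothetical and only mirrors the shape of the question's data.

```r
# Hypothetical toy data shaped like the question's: ids as character strings
df <- data.frame(
  id = c("999000000007628", "989001002807730",
         "989001002807730", "989001002807731"),
  stringsAsFactors = FALSE
)

# Store the date as POSIXlt, as in the question's data
df$date <- strptime(c("2015-01-02", "2015-01-02", "2015-01-02", "2015-01-03"),
                    format = "%Y-%m-%d", tz = "UTC")

# Convert the list-based POSIXlt column to an atomic Date column
df$date <- as.Date(df$date)

# Count distinct ids within each date
counts <- aggregate(id ~ date, data = df, FUN = function(x) length(unique(x)))
print(counts)
```

With the conversion in place, the id column of counts holds the number of distinct ids per date. Applied to the ten rows posted in the question, the same steps should give a single row for 2015-01-02 with a count of 2, since only two distinct id values appear there.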
