dplyr does not group data by date


Problem description




I am trying to count how often bikes are taken out on each day of the week, using a dataset provided by Leada.

Here is the code:

library(dplyr)

setAs("character", "POSIXlt", function(from) strptime(from, format = "%m/%d/%y %H:%M"))
d <- read.csv("http://mandrillapp.com/track/click/30315607/s3-us-west-1.amazonaws.com?p=eyJzIjoiemxlVjNUREczQ2l5UFVPeEFCalNUdmlDYTgwIiwidiI6MSwicCI6IntcInVcIjozMDMxNTYwNyxcInZcIjoxLFwidXJsXCI6XCJodHRwczpcXFwvXFxcL3MzLXVzLXdlc3QtMS5hbWF6b25hd3MuY29tXFxcL2RhdGF5ZWFyXFxcL2Jpa2VfdHJpcF9kYXRhLmNzdlwiLFwiaWRcIjpcImEyODNiNjMzOWJkOTQxMGM5ZjlkYzE0MmQ0NDQ5YmU4XCIsXCJ1cmxfaWRzXCI6W1wiMTVlYzMzNWM1NDRlMTM1ZDI0YjAwODE4ZjI5YTdkMmFkZjU2NWQ2MVwiXX0ifQ",
              colClasses = c("numeric", "numeric", "POSIXlt", "factor", "numeric", "POSIXlt", "factor", "numeric", "numeric", "factor", "character"),
              stringsAsFactors = TRUE)
names(d)[9] <- "BikeNo"

d <- tbl_df(d)

d <- d %>% mutate(Weekday = factor(weekdays(Start.Date)))
# The pipe must end a line, not start one, or R stops parsing at the line break.
d %>%
  group_by(Weekday) %>%
  summarise(Total = n()) %>%
  select(Weekday, Total)

Strangely, dplyr refuses to group the data by Weekday and reports:

Error: column 'Start.Date' has unsupported type

Why does it care about the Start.Date column when I am grouping by a factor? You can run the code locally to reproduce the error: it downloads the data automatically.

P.S. I am using dplyr version: dplyr_0.3.0.2

Solution

The lubridate package is useful when dealing with dates. Here is the code to parse Start.Date and End.Date, extract week days, then group by week days:

Read dates as character vectors

library(dplyr)
library(lubridate)
# For some reason your instruction to load the csv directly from a url
# didn't work, so I saved the csv to a temporary directory.
d <- read.csv("/tmp/bike_trip_data.csv",
              colClasses = c("numeric", "numeric", "character", "factor", "numeric", "character", "factor", "numeric", "numeric", "factor", "character"),
              stringsAsFactors = TRUE)

names(d)[9] <- "BikeNo"
d <- tbl_df(d)

Use lubridate to convert start date and end date

d <- d %>% 
  mutate(
    Start.Date = parse_date_time(Start.Date,"%m/%d/%y %H:%M"),
    End.Date = parse_date_time(End.Date,"%m/%d/%y %H:%M"),
    Weekday = wday(Start.Date, label=TRUE, abbr=FALSE))
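
The conversion step matters because of how the two date-time classes are stored. The error in the question most likely arises because strptime() returns POSIXlt, which is a list of components rather than one value per row, whereas parse_date_time() returns POSIXct. The following base R sketch shows the difference; the timestamp is a made-up value in the same format as the bike data, and tz = "UTC" is an assumption made only to keep the example reproducible.

```r
# Made-up timestamp in the same "%m/%d/%y %H:%M" format as the bike data.
stamp <- "8/29/13 14:13"

# strptime() returns POSIXlt: a list of components (sec, min, hour, ...).
lt <- strptime(stamp, format = "%m/%d/%y %H:%M", tz = "UTC")

# as.POSIXct() stores the same instant as a single number of seconds.
ct <- as.POSIXct(lt)

class(lt)                # "POSIXlt" "POSIXt"
is.list(unclass(lt))     # TRUE: the value is a list of fields
class(ct)                # "POSIXct" "POSIXt"
is.numeric(unclass(ct))  # TRUE: one number per timestamp

# weekdays() works on either class, so nothing is lost by converting.
weekdays(ct)
```

Because the column type is checked for the whole table, a POSIXlt column can trigger the error even when it is not one of the grouping variables.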

Number of rows per week day

d %>%
  group_by(Weekday) %>%
  summarise(Total = n())

#     Weekday Total
# 1    Sunday 10587
# 2    Monday 23138
# 3   Tuesday 24678
# 4 Wednesday 23651
# 5  Thursday 25265
# 6    Friday 24283
# 7  Saturday 12413
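
If adding lubridate is not an option, the same result should be reachable with base R alone by converting the strings to POSIXct before grouping. This is a minimal sketch under stated assumptions, not the approach from the answer above: the trips data frame is a hypothetical three-row stand-in for the real file, and tz = "UTC" is chosen arbitrarily.

```r
library(dplyr)

# Hypothetical miniature of the trip data; only Start.Date is modelled.
trips <- data.frame(
  Start.Date = c("8/29/13 14:13", "8/29/13 18:54", "8/30/13 09:08"),
  stringsAsFactors = FALSE
)

weekday_totals <- trips %>%
  mutate(
    # POSIXct keeps one number per row, which dplyr can store in a column.
    Start.Date = as.POSIXct(Start.Date, format = "%m/%d/%y %H:%M", tz = "UTC"),
    # weekdays() returns locale-dependent day names.
    Weekday = factor(weekdays(Start.Date))
  ) %>%
  group_by(Weekday) %>%
  summarise(Total = n())

weekday_totals
```

Note that factor(weekdays(...)) orders the days alphabetically, while wday(..., label = TRUE) from lubridate returns an ordered factor that follows the calendar week, which is why the answer's output is sorted from Sunday to Saturday.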
