R histogram showing time spent in each bin
Question
I'm trying to create a plot similar to the ones here:
Basically I want a histogram where each bin shows how much time was spent in that range of cadence (e.g. 1 hour in 0-20rpm, 3 hours in 21-40rpm, etc.)
library("rjson") # 3rd party library, so: install.packages("rjson")
# Load data from Strava API.
# Ride used for example is http://app.strava.com/rides/13542320
url <- "http://app.strava.com/api/v1/streams/13542320?streams[]=cadence,time"
d <- fromJSON(paste(readLines(url), collapse=""))  # collapse so a multi-line response is parsed as one JSON string
Each value in d$cadence (rpm) is paired with the same index in d$time (the number of seconds from the start).
The samples are not necessarily uniformly spaced in time (as can be seen if you compare plot(x=d$time, y=d$cadence, type='l') with plot(d$cadence, type='l')).
If I do the simplest possible thing:
hist(d$cadence)
...this produces something very close, but the Y axis is "frequency" (a count of samples) instead of time, and it ignores the time between data points (so the 0rpm segment in particular will be underrepresented).
Recommended answer
You need to create a new column to account for the time between samples.
I prefer data.frames to lists for this kind of thing, so:
d <- as.data.frame(fromJSON(paste(readLines(url), collapse="")))
d$sample.time <- 0
d$sample.time[2:nrow(d)] <- d$time[2:nrow(d)]-d$time[1:(nrow(d)-1)]
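For what it's worth, the same per-sample durations can probably be written more compactly with base R's diff(). A minimal sketch on made-up data (the toy data frame and its values are hypothetical, not the Strava ride):

```r
# Hypothetical toy data: seconds from start and cadence in rpm.
toy <- data.frame(time    = c(0, 1, 2, 5, 6, 10),
                  cadence = c(0, 60, 62, 0, 80, 85))

# diff() gives the gap between consecutive timestamps. The first sample
# has no predecessor, so it gets a duration of 0, as in the code above.
toy$sample.time <- c(0, diff(toy$time))

toy$sample.time
```

As in the indexed version, each gap is attributed to the cadence value recorded at the end of that gap.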
Now that you've got your sample times, you can simply "repeat" each cadence measurement once per second of its sample time, and plot a histogram of the result:
hist(rep(x=d$cadence, times=d$sample.time),
main="Histogram of Cadence", xlab="Cadence (RPM)",
ylab="Time (presumably seconds)")
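To see why this works, here is a tiny illustration of what rep() does with a times vector. The numbers are invented for the example and assume whole-second sample times:

```r
# Hypothetical values: cadence in rpm and whole-second durations.
cadence     <- c(0, 60, 62, 0)
sample.time <- c(0, 1, 1, 3)

# Each cadence value is repeated once per second it lasted, so a value
# with duration 0 disappears and a value with duration 3 appears 3 times.
expanded <- rep(x = cadence, times = sample.time)
expanded   # 60 62 0 0 0
```

Since every element of the expanded vector stands for one second, the counts that hist() reports can be read as seconds.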
There's bound to be a more elegant solution that wouldn't fall apart for non-integer sample times, but this works with your sample data.
Re: the more elegant, generalized solution: you can deal with non-integer sample times with something like new.d <- aggregate(sample.time~cadence, data=d, FUN=sum), but then the problem becomes plotting a histogram of something that looks like a frequency table, but with non-integer frequencies. After some poking around, I'm coming to the conclusion that you'd have to roll your own histogram for this case by further aggregating the data into bins and then displaying them with a bar chart.
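A rough sketch of that roll-your-own approach, using cut() to assign bins and tapply() to sum the time in each. The toy data, the 20 rpm bin width, and the 0-120 range are all assumptions chosen for illustration, not values taken from the ride:

```r
# Hypothetical toy data with non-integer gaps between samples.
toy <- data.frame(time    = c(0, 0.5, 2, 5.25, 6, 10),
                  cadence = c(0, 60, 62, 0, 80, 85))
toy$sample.time <- c(0, diff(toy$time))

# Assign each sample to a 20 rpm wide bin: [0,20), [20,40), ...
breaks  <- seq(0, 120, by = 20)
toy$bin <- cut(toy$cadence, breaks = breaks, right = FALSE)

# Sum the time spent in each bin; bins with no samples come back as NA,
# so replace those with 0.
time.per.bin <- tapply(toy$sample.time, toy$bin, sum)
time.per.bin[is.na(time.per.bin)] <- 0

# space = 0 makes the bars touch, so the bar chart reads like a histogram.
barplot(time.per.bin, space = 0,
        main = "Time spent per cadence range",
        xlab = "Cadence (RPM)", ylab = "Time (seconds)")
```

Because the durations are summed rather than used as repeat counts, this should work the same way for integer and non-integer sample times.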