R histogram showing time spent in each bin

Question

I'm trying to create a plot similar to the ones here:

Basically I want a histogram where each bin shows how much time was spent in that cadence range (e.g. 1 hour at 0-20 rpm, 3 hours at 21-40 rpm, etc.)

library("rjson") # 3rd party library, so: install.packages("rjson")

# Load data from Strava API.
# Ride used for example is http://app.strava.com/rides/13542320
url <- "http://app.strava.com/api/v1/streams/13542320?streams[]=cadence,time"
d <- fromJSON(paste(readLines(url), collapse=""))  # join all lines into one JSON string

Each value in d$cadence (rpm) is paired with the same index in d$time (the number of seconds from the start).

The samples are not necessarily evenly spaced in time (as can be seen by comparing plot(x=d$time, y=d$cadence, type='l') with plot(d$cadence, type='l'))

If I do the simplest possible thing:

hist(d$cadence)

This produces something very close, but the Y value is "frequency" instead of time, and it ignores the time between data points (so the 0 rpm segment in particular will be underrepresented).

Recommended answer

You need to create a new column to account for the time between samples.

I prefer data.frames to lists for this kind of thing, so:

d <- as.data.frame(fromJSON(paste(readLines(url), collapse="")))
# Seconds elapsed since the previous sample; the first sample gets 0.
d$sample.time <- c(0, diff(d$time))

Now that you have the sample times, you can simply "repeat" each cadence measurement once per second of its sample time, and plot a histogram of the result:

hist(rep(x=d$cadence, times=d$sample.time),
     main="Histogram of Cadence", xlab="Cadence (RPM)",
     ylab="Time (presumably seconds)")
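
To see the effect without calling the Strava API, here is a minimal sketch on made-up data. The toy data frame and the 20 rpm bin width are assumptions for illustration only, not values from the ride above. Note that each gap is credited to the sample that ends it, which is how the sample.time column is defined.

```r
# Hypothetical toy data: six samples with an 8-second gap before the 0 rpm reading.
toy <- data.frame(time    = c(0, 1, 2, 10, 11, 12),   # seconds from start
                  cadence = c(80, 85, 90, 0, 85, 90)) # rpm

# Seconds elapsed since the previous sample; the first sample gets 0.
toy$sample.time <- c(0, diff(toy$time))

# Repeat each cadence value once per elapsed second (integer gaps only).
expanded <- rep(toy$cadence, times = toy$sample.time)

# Bin width of 20 rpm is an arbitrary choice for this sketch.
h <- hist(expanded, breaks = seq(0, 100, by = 20),
          main = "Time-weighted cadence (toy data)",
          xlab = "Cadence (RPM)", ylab = "Time (seconds)")

# Unweighted counts for comparison: the 0 rpm bin holds just one sample.
unweighted <- hist(toy$cadence, breaks = seq(0, 100, by = 20), plot = FALSE)

h$counts           # 8 0 0 0 4
unweighted$counts  # 1 0 0 1 4
```

In the weighted version the 0 rpm bin accounts for 8 of the 12 seconds, whereas a plain hist(toy$cadence) counts it as a single observation.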

There's bound to be a more elegant solution that wouldn't fall apart for non-integer sample times, but this works with your sample data.

Re: the more elegant, generalized solution: you can handle non-integer sample times with something like new.d <- aggregate(sample.time~cadence, data=d, FUN=sum), but then the problem becomes plotting a histogram of something that looks like a frequency table with non-integer frequencies. After some poking around, I've come to the conclusion that you'd have to roll your own histogram for this case by further aggregating the data into bins and then displaying them with a bar chart.
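
One way such a roll-your-own histogram might look is sketched below, using cut() to bin the cadence values and tapply() to sum the elapsed time per bin. This is only an illustration under assumptions: the toy data, the 20 rpm bin width, and the variable names (toy, breaks, time.per.bin) are made up and are not part of the original answer.

```r
# Hypothetical toy data with non-integer gaps between samples.
toy <- data.frame(time    = c(0, 0.5, 1.5, 9.0, 9.5, 10.5), # seconds from start
                  cadence = c(80, 85, 90, 0, 85, 90))       # rpm

# Seconds elapsed since the previous sample; the first sample gets 0.
toy$sample.time <- c(0, diff(toy$time))

# Assign each sample to a cadence bin (20 rpm width is an arbitrary choice).
breaks <- seq(0, 100, by = 20)
toy$bin <- cut(toy$cadence, breaks = breaks, include.lowest = TRUE)

# Sum elapsed time per bin. Bins with no samples come back as NA, so set them to 0.
time.per.bin <- tapply(toy$sample.time, toy$bin, sum)
time.per.bin[is.na(time.per.bin)] <- 0

# space = 0 makes the bars touch, so the bar chart reads like a histogram.
barplot(time.per.bin, space = 0,
        main = "Time spent per cadence bin (toy data)",
        xlab = "Cadence (RPM)", ylab = "Time (seconds)")
```

Because the time is summed rather than used as a repeat count, fractional sample times need no special handling, and the bar heights add up to the total elapsed time.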
