Resample Pandas Dataframe with "bin size"/"frequency"
Problem description
I have a multi-indexed dataframe which I would like to resample to reduce the frequency of data points by a factor of 3 (meaning that every 3 rows become one).
This:

                   time  value
ID    measurement
ET001 0            0         2
      1            0.15      3
      2            0.3       4
      3            0.45      3
      4            0.6       3
      5            0.75      2
      6            0.9       3
ET002 0            0         2
      1            0.16      5
      2            0.32      4
      3            0.45      3
      4            0.6       3
      5            0.75      2

I want to turn into this:

                   time  value
ID    measurement
ET001 0            0.15    3
      1            0.6     2.7
      2            0.9     3
ET002 0            0.16    3.7
      1            0.6     2.7
I tried to turn my time column into a pandas datetime index like so, and then use resample:
df = df.set_index(pd.DatetimeIndex(timecourse_normed['Time']))
df = df.groupby(level=0).resample(rule='0.1S', how=np.mean)
But the first line gives me actual dates (1970-something), which is quite unhelpful for the second line. Browsing around Stack Overflow I found some similar questions, all of which had solutions NOT based on pandas' resample and which, sadly, were also not viable for my use case.
Could you give me a hand?
Solution
I think the idea for you could be this: divide the records inside each ID into bins of 3 records each (roughly in the spirit of ntile in SQL), group by the bin number, and calculate the mean. To create these bin numbers we can use the fact that you already have sequential numbers for each row: the measurement level of the index. So we can just integer-divide this number by 3 to get the numbers we need:

>>> df
                   time  value  ntile
ID    measurement
ET001 0            0.00      2      0
      1            0.15      3      0
      2            0.30      4      0
      3            0.45      3      1
      4            0.60      3      1
      5            0.75      2      1
      6            0.90      3      2
ET002 0            0.00      2      0
      1            0.16      5      0
      2            0.32      4      0
      3            0.45      3      1
      4            0.60      3      1
      5            0.75      2      1
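The table above can be reproduced with a short sketch like the one below. The way the frame is constructed here is an assumption made for illustration (the question does not show how df was built); the ntile column name follows the table.

```python
import pandas as pd

# Sample data copied from the question; the construction itself is illustrative.
ids = ["ET001"] * 7 + ["ET002"] * 6
measurement = list(range(7)) + list(range(6))
time = [0, 0.15, 0.3, 0.45, 0.6, 0.75, 0.9, 0, 0.16, 0.32, 0.45, 0.6, 0.75]
value = [2, 3, 4, 3, 3, 2, 3, 2, 5, 4, 3, 3, 2]

df = pd.DataFrame(
    {"time": time, "value": value},
    index=pd.MultiIndex.from_arrays([ids, measurement], names=["ID", "measurement"]),
)

# Floor division maps measurements 0, 1, 2 to bin 0; 3, 4, 5 to bin 1; and so on.
df["ntile"] = df.index.get_level_values("measurement") // 3
print(df)
```

Because the bin number is derived from the measurement counter rather than from the time column, each bin holds at most 3 rows regardless of how the time stamps are spaced.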
So we can use a helper function like this and apply it to each group to get the desired result. Note the floor division: on current pandas versions .div(3) performs true division, which would put every row into its own bin.

>>> def helper(x):
...     x = x.reset_index()
...     # measurement 0-2 -> bin 0, 3-5 -> bin 1, and so on
...     x = x.groupby(x['measurement'].floordiv(3))[['time', 'value']].mean()
...     return x
...
>>> df.groupby(level=0).apply(helper)
                   time     value
ID    measurement
ET001 0            0.15  3.000000
      1            0.60  2.666667
      2            0.90  3.000000
ET002 0            0.16  3.666667
      1            0.60  2.666667
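The same binning can probably be done without a helper function by grouping on the ID level and the bin number together. This is only a sketch of an alternative, not the approach from the answer itself; the names id_key, bin_key and result are made up for the example, and the frame is rebuilt so the snippet runs on its own.

```python
import pandas as pd

# Sample data copied from the question; the construction itself is illustrative.
ids = ["ET001"] * 7 + ["ET002"] * 6
measurement = list(range(7)) + list(range(6))
time = [0, 0.15, 0.3, 0.45, 0.6, 0.75, 0.9, 0, 0.16, 0.32, 0.45, 0.6, 0.75]
value = [2, 3, 4, 3, 3, 2, 3, 2, 5, 4, 3, 3, 2]

df = pd.DataFrame(
    {"time": time, "value": value},
    index=pd.MultiIndex.from_arrays([ids, measurement], names=["ID", "measurement"]),
)

# Two grouping keys: the ID level as it is, and the measurement counter
# floor-divided by 3 so that every 3 consecutive rows share a bin number.
id_key = df.index.get_level_values("ID")
bin_key = (df.index.get_level_values("measurement") // 3).rename("measurement")

result = df.groupby([id_key, bin_key]).mean()
print(result)
```

Grouping on both keys at once keeps the (ID, measurement) index layout of the original frame, with measurement now holding the bin number.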
Hope it helps.
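If a resample-based approach is still wanted, one option might be a timedelta index instead of a datetime index. The 1970 dates in the question presumably arise because plain numbers passed to pd.DatetimeIndex are read as offsets from the Unix epoch. The sketch below uses only the ET001 rows, and the 450 ms bin width is an assumption chosen so that three of the roughly 0.15 s spaced samples fall into each bin.

```python
import pandas as pd

# ET001 rows from the question
times_s = [0.0, 0.15, 0.30, 0.45, 0.60, 0.75, 0.90]
values = [2, 3, 4, 3, 3, 2, 3]

# Convert seconds to whole milliseconds first so that bin edges are exact integers
times_ms = [round(t * 1000) for t in times_s]
index = pd.to_timedelta(times_ms, unit="ms")

series = pd.Series(values, index=index, name="value")

# Fixed-width time bins: [0 ms, 450 ms), [450 ms, 900 ms), [900 ms, 1350 ms)
binned = series.resample("450ms").mean()
print(binned)
```

Unlike the count-based bins above, these bins are defined by elapsed time, so unevenly spaced samples can end up with a different number of rows per bin, and the resulting index holds the start of each bin rather than the mean time.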