Resample pandas dataframe only knowing result measurement count
Question
I have a dataframe that looks like this:
Trial Measurement Data
0     0           12
      1           4
      2           12
1     0           12
      1           12
2     0           12
      1           12
      2           NaN
      3           12
I want to resample my data so that every trial has just two measurements, so I want to turn it into something like this:
Trial Measurement Data
0     0           8
      1           8
1     0           12
      1           12
2     0           12
      1           12
This rather uncommon task stems from the fact that my data has an intentional jitter in the stimulus presentation.
I know pandas has a resample function, but I have no idea how to apply it to my second-level index while keeping the data in discrete categories based on the first-level index :(
Also, I wanted to iterate over my first-level indices, but apparently
for sub_df in np.arange(len(df['Trial'].max()))
won't work, because 'Trial' is an index level rather than a column, so pandas can't find it.
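A note on the loop above: index levels are not columns, so `df['Trial']` raises a `KeyError`. The following is a minimal sketch of two ways around that, assuming the frame really has a two-level index named `Trial` and `Measurement` as displayed (the construction of `df` here is only a reconstruction of the sample data). Grouping by the level yields one sub-frame per trial, and `reset_index()` turns both levels into ordinary columns.

```python
import numpy as np
import pandas as pd

# Reconstruction of the sample data with a (Trial, Measurement) MultiIndex.
index = pd.MultiIndex.from_arrays(
    [[0, 0, 0, 1, 1, 2, 2, 2, 2], [0, 1, 2, 0, 1, 0, 1, 2, 3]],
    names=["Trial", "Measurement"],
)
df = pd.DataFrame(
    {"Data": [12, 4, 12, 12, 12, 12, 12, np.nan, 12]}, index=index
)

# The distinct trial labels, read from the index level instead of a column.
trial_ids = df.index.get_level_values("Trial").unique()

# Iterate over one sub-frame per trial by grouping on the index level.
lengths = {}
for trial, sub_df in df.groupby(level="Trial"):
    lengths[trial] = len(sub_df)

# Alternatively, move both index levels into regular columns.
flat = df.reset_index()
```

After `reset_index()`, `flat['Trial']` works as an ordinary column lookup.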
Recommended answer
Well, it's not the prettiest I've ever seen, but from a frame looking like
>>> df
   Trial  Measurement  Data
0      0            0    12
1      0            1     4
2      0            2    12
3      1            0    12
4      1            1    12
5      2            0    12
6      2            1    12
7      2            2   NaN
8      2            3    12
then we can manually build the two "average-like" objects and then use pd.melt to reshape the output:
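import pandas as pd
# Mean of the first and second half of each trial (an odd-length trial shares its middle sample).
avg = df.groupby("Trial")["Data"].agg(
    first=lambda x: x.head((len(x) + 1) // 2).mean(),
    second=lambda x: x.tail((len(x) + 1) // 2).mean(),
).rename(columns={"first": 0, "second": 1})
result = pd.melt(avg.reset_index(), id_vars="Trial", var_name="Measurement", value_name="Data")
result = result.sort_values(["Trial", "Measurement"]).set_index(["Trial", "Measurement"])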
which produces
>>> result
                   Data
Trial Measurement
0     0               8
      1               8
1     0              12
      1              12
2     0              12
      1              12
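If a target count other than two is ever needed, the head/tail idea could be generalised along the following lines. This is only a sketch under stated assumptions: `resample_trial` is a hypothetical helper (not a pandas function), and the windowing rule (windows of width ceil(len / n_out) whose start positions are spread evenly across the trial) is one possible choice that happens to reduce to the head/tail averages for `n_out = 2`. NaN values are skipped by `mean()`, as in the answer.

```python
import numpy as np
import pandas as pd

def resample_trial(values, n_out):
    """Hypothetical helper: average n_out evenly spaced, possibly overlapping windows."""
    values = pd.Series(values).reset_index(drop=True)
    width = -(-len(values) // n_out)  # ceiling division
    # Evenly spaced window starts from the first sample to the last full window.
    starts = np.linspace(0, len(values) - width, n_out).round().astype(int)
    return [values.iloc[s:s + width].mean() for s in starts]

# Flat version of the sample frame used in the answer.
df = pd.DataFrame({
    "Trial": [0, 0, 0, 1, 1, 2, 2, 2, 2],
    "Measurement": [0, 1, 2, 0, 1, 0, 1, 2, 3],
    "Data": [12, 4, 12, 12, 12, 12, 12, np.nan, 12],
})

n_out = 2  # desired number of measurements per trial
rows = [
    (trial, m, value)
    for trial, group in df.groupby("Trial")
    for m, value in enumerate(resample_trial(group["Data"], n_out))
]
result = pd.DataFrame(rows, columns=["Trial", "Measurement", "Data"])
result = result.set_index(["Trial", "Measurement"])
```

With `n_out = 2` this gives the same six values as the table above (8, 8, 12, 12, 12, 12). Whether overlapping windows are appropriate for larger `n_out` depends on the data, so treat the rule as a starting point rather than a recommendation.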