Problem with reindexing a dataframe (dealing with categorical data)
Problem description
I have data that looks like this:
subject_id  hour_measure  urine color  heart_rate
3           1             red          40
3           1.15          red          60
4           2             yellow       50
I want to reindex the data so that every patient has 24 hourly measurements. I am using the following code:
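import numpy as np
import pandas as pd
# one row per (subject, hour) for hours 1..24 (np.arange excludes the stop value)
mux = pd.MultiIndex.from_product([df['subject_id'].unique(), np.arange(1, 25)],
                                 names=['subject_id','hour_measure'])
# older pandas silently drops non-numeric columns here; pandas 2.x raises TypeError instead
df = df.groupby(['subject_id','hour_measure']).mean().reindex(mux).reset_index()
df.to_csv('totalafterreindex.csv')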
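The categorical column disappears because GroupBy.mean only averages numeric data. Older pandas versions silently left non-numeric columns out of the result; from pandas 2.0 the default is numeric_only=False, so the same call raises a TypeError on a string column instead. A minimal sketch with the question's sample data, passing numeric_only=True to request the old dropping behaviour explicitly:

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'subject_id': [3, 3, 4],
    'hour_measure': [1, 1.15, 2],
    'urine color': ['red', 'red', 'yellow'],
    'heart_rate': [40, 60, 50],
})

# numeric_only=True averages only the numeric columns, so 'urine color'
# is left out of the result, which is the behaviour described above
means = df.groupby(['subject_id', 'hour_measure']).mean(numeric_only=True)
print(means.columns.tolist())  # ['heart_rate']
```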
It works well for numeric columns, but the categorical column (urine color) is removed. How can I change this code to use the mean for numeric columns and the most frequent value for categorical columns?
Desired output:
subject_id  hour_measure  urine color  heart_rate
3           1             red          40
3           2             red          60
3           3             yellow       50
3           4             yellow       50
..          ..            ..           ..
Recommended answer
The idea is to use GroupBy.agg with mean for numeric columns and mode for categorical columns. next with iter is also added so that None is returned when mode returns an empty Series (for example, when every value in the group is missing):
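The next(iter(...), None) idiom is easiest to see on single Series. A small sketch (first_mode_or_none is a hypothetical helper name, not part of pandas): Series.mode() drops NaN by default, so an all-missing group gives an empty Series, and on a tie mode() returns all tied values in sorted order, so the first (smallest) one is picked.

```python
import numpy as np
import pandas as pd

def first_mode_or_none(s):
    # mode() returns a Series (possibly empty); take its first element or None
    return next(iter(s.mode()), None)

print(first_mode_or_none(pd.Series(['red', 'yellow', 'red'])))         # red
print(first_mode_or_none(pd.Series([np.nan, np.nan], dtype=object)))   # None
print(first_mode_or_none(pd.Series(['yellow', 'red'])))                # red (tie, sorted)
```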
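import numpy as np
import pandas as pd
# one row per (subject, hour) for hours 1..24 (np.arange excludes the stop value)
mux = pd.MultiIndex.from_product([df['subject_id'].unique(), np.arange(1, 25)],
                                 names=['subject_id','hour_measure'])
# mean for numeric columns, first mode for the rest; None if mode() is empty
f = lambda x: x.mean() if pd.api.types.is_numeric_dtype(x) else next(iter(x.mode()), None)
df1 = df.groupby(['subject_id','hour_measure']).agg(f).reindex(mux).reset_index()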
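An end-to-end sketch on the question's sample rows, assuming hours are numbered 1 to 24 (hence np.arange(1, 25)); mean_or_mode is a hypothetical name for the same lambda written as a function. Note that reindex only keeps index entries that appear in mux, so the row with hour_measure 1.15 matches no integer hour and is dropped.

```python
import numpy as np
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'subject_id': [3, 3, 4],
    'hour_measure': [1, 1.15, 2],
    'urine color': ['red', 'red', 'yellow'],
    'heart_rate': [40, 60, 50],
})

def mean_or_mode(x):
    """Mean for numeric columns, first mode (or None) for everything else."""
    if pd.api.types.is_numeric_dtype(x):
        return x.mean()
    return next(iter(x.mode()), None)

# Assumption: hours 1..24, so the exclusive stop value of np.arange is 25
mux = pd.MultiIndex.from_product([df['subject_id'].unique(), np.arange(1, 25)],
                                 names=['subject_id', 'hour_measure'])

df1 = (df.groupby(['subject_id', 'hour_measure'])
         .agg(mean_or_mode)
         .reindex(mux)
         .reset_index())

# 2 subjects x 24 hours; only (3, 1) and (4, 2) carry values, the rest are NaN
print(len(df1))  # 48
```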
Details:
print(df.groupby(['subject_id','hour_measure']).agg(f))
                        urine color  heart_rate
subject_id hour_measure
3          1.00                 red          40
           1.15                 red          60
4          2.00              yellow          50
Finally, if necessary, forward-fill the missing values within each subject_id using GroupBy.ffill:
# forward-fill the reindexed result df1, within each subject only
cols = df1.columns.difference(['subject_id','hour_measure'])
df1[cols] = df1.groupby('subject_id')[cols].ffill()
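Grouping by subject_id before forward-filling matters: a plain ffill() on the whole frame would carry the last value of one subject into the first missing hours of the next. A small sketch on a made-up frame shaped like the reindexed result (the values are illustrative, not from the question):

```python
import numpy as np
import pandas as pd

# Made-up frame; subject 4 has no value at hour 1
df1 = pd.DataFrame({
    'subject_id':   [3, 3, 3, 4, 4],
    'hour_measure': [1, 2, 3, 1, 2],
    'urine color':  ['red', np.nan, np.nan, np.nan, 'yellow'],
    'heart_rate':   [40, np.nan, np.nan, np.nan, 50],
})
cols = df1.columns.difference(['subject_id', 'hour_measure'])

# Plain ffill ignores subject boundaries: row 3 (subject 4, hour 1) gets subject 3's values
leaked = df1[cols].ffill()

# Per-subject ffill: subject 4, hour 1 stays NaN because that subject has no earlier value
df1[cols] = df1.groupby('subject_id')[cols].ffill()
print(df1)
```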