Problem reindexing a DataFrame (dealing with categorical data)

Published: 2021/7/16 20:25:30  Tags: python python-3.x pandas scikit-learn


Problem description

I have data that looks like this:

subject_id   hour_measure   urine color   heart_rate
3            1              red           40
3            1.15           red           60
4            2              yellow        50

I want to reindex the data so that every patient has 24 hours of measurements. I use the following code:

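import numpy as np
import pandas as pd

# np.arange(1, 24) yields hours 1..23; use np.arange(1, 25) if hour 24 is needed
mux = pd.MultiIndex.from_product([df['subject_id'].unique(), np.arange(1,24)],
                                  names=['subject_id','hour_measure'])
df = df.groupby(['subject_id','hour_measure']).mean().reindex(mux).reset_index()
df.to_csv('totalafterreindex.csv')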

It works well for numeric values, but the categorical columns are dropped. How can I extend this code to use the mean for numeric columns and the most frequent value for categorical ones?

Desired output

subject_id   hour_measure   urine color   heart_rate
3            1              red           40
3            2              red           60
3            3              yellow        50
3            4              yellow        50
..           ..             ..

Recommended answer

The idea is to use GroupBy.agg with mean for numeric columns and mode for categorical ones. next with iter is added so that None is returned if mode returns an empty result:

mux = pd.MultiIndex.from_product([df['subject_id'].unique(), np.arange(1,24)],
                                  names=['subject_id','hour_measure'])
# mean for numeric columns; otherwise the first mode (most frequent value), or None if mode is empty
f = lambda x: x.mean() if np.issubdtype(x.dtype, np.number) else next(iter(x.mode()), None)
df1 = df.groupby(['subject_id','hour_measure']).agg(f).reindex(mux).reset_index()

Details:

print (df.groupby(['subject_id','hour_measure']).agg(f))
                        urine color  heart_rate
subject_id hour_measure                        
3          1.00                 red          40
           1.15                 red          60
4          2.00              yellow          50
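To make the aggregation easier to try out, here is a minimal self-contained sketch that rebuilds the three sample rows from the question. The helper name mean_or_mode is hypothetical, and pd.api.types.is_numeric_dtype is swapped in for np.issubdtype on the assumption that a categorical column might use a pandas extension dtype (such as category), which NumPy's check may not understand.

```python
import numpy as np
import pandas as pd

# Sample rows copied from the question
df = pd.DataFrame({
    'subject_id': [3, 3, 4],
    'hour_measure': [1, 1.15, 2],
    'urine color': ['red', 'red', 'yellow'],
    'heart_rate': [40, 60, 50],
})

def mean_or_mode(s):
    """Hypothetical helper: mean for numeric columns, most frequent value otherwise."""
    if pd.api.types.is_numeric_dtype(s):
        return s.mean()
    # mode() can be empty (for example, when every value is missing): fall back to None
    return next(iter(s.mode()), None)

# Every subject paired with hours 1..23
mux = pd.MultiIndex.from_product(
    [df['subject_id'].unique(), np.arange(1, 24)],
    names=['subject_id', 'hour_measure'])

agg = df.groupby(['subject_id', 'hour_measure']).agg(mean_or_mode)
df1 = agg.reindex(mux).reset_index()
print(df1.head())
```

Note that reindex keeps only index values present in mux, so with this sample the row measured at hour 1.15 does not appear in df1. If fractional hours should count toward an hour, they would need to be mapped to whole hours before grouping.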

Finally, if necessary, forward fill missing values per subject_id with GroupBy.ffill:

# fill the reindexed frame (df1), which is where the missing hours are
cols = df1.columns.difference(['subject_id','hour_measure'])
df1[cols] = df1.groupby('subject_id')[cols].ffill()
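The per-subject forward fill can be illustrated on a small frame. The sketch below assumes a hypothetical already-reindexed result named df1, with made-up gaps and only hours 1 to 4, to show that values are carried forward inside each subject_id and never across subjects.

```python
import numpy as np
import pandas as pd

# Hypothetical reindexed result: hours 1..4 for two subjects, gaps left as missing
df1 = pd.DataFrame({
    'subject_id':   [3, 3, 3, 3, 4, 4, 4, 4],
    'hour_measure': [1, 2, 3, 4, 1, 2, 3, 4],
    'urine color':  ['red', None, 'yellow', None, None, 'yellow', None, None],
    'heart_rate':   [40, np.nan, 50, np.nan, np.nan, 50, np.nan, np.nan],
})

# All columns except the two keys
cols = df1.columns.difference(['subject_id', 'hour_measure'])

# Forward fill within each subject only
df1[cols] = df1.groupby('subject_id')[cols].ffill()
print(df1)
```

Hours that come before a subject's first observation (hour 1 of subject 4 here) stay missing, because a forward fill has no earlier value to copy.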
