Adding values for missing data combinations in Pandas

Problem description

I've got a pandas data frame containing something like the following:

person_id   status    year    count
0           'pass'    1980    4
0           'fail'    1982    1
1           'pass'    1981    2

If I know that all possible values for each field are:

all_person_ids = [0, 1, 2]
all_statuses = ['pass', 'fail']
all_years = [1980, 1981, 1982]
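The complete set of combinations is the Cartesian product of these three lists, so the filled-in frame must have 3 × 2 × 3 = 18 rows. A minimal sketch using the standard library's `itertools.product` (not part of the original question) makes that count and the row order explicit:

```python
from itertools import product

all_person_ids = [0, 1, 2]
all_statuses = ['pass', 'fail']
all_years = [1980, 1981, 1982]

# One tuple per (person_id, status, year) combination; the last list varies fastest
combos = list(product(all_person_ids, all_statuses, all_years))
print(len(combos))   # 18
print(combos[:3])    # [(0, 'pass', 1980), (0, 'pass', 1981), (0, 'pass', 1982)]
```

This ordering (person_id outermost, year innermost) matches the row order of the desired output table.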

I'd like to populate the original data frame with count=0 for missing data combinations (of person_id, status, and year), i.e. I'd like the new data frame to contain:

person_id   status    year    count
0           'pass'    1980    4
0           'pass'    1981    0
0           'pass'    1982    0
0           'fail'    1980    0
0           'fail'    1981    0
0           'fail'    1982    1
1           'pass'    1980    0
1           'pass'    1981    2
1           'pass'    1982    0
1           'fail'    1980    0
1           'fail'    1981    0
1           'fail'    1982    0
2           'pass'    1980    0
2           'pass'    1981    0
2           'pass'    1982    0
2           'fail'    1980    0
2           'fail'    1981    0
2           'fail'    1982    0

Is there an efficient way to achieve this in pandas?

Recommended answer

Create a MultiIndex of every combination with MultiIndex.from_product(), then call set_index(), reindex() and reset_index() on the data frame:

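import io

import pandas as pd

all_person_ids = [0, 1, 2]
all_statuses = ['pass', 'fail']
all_years = [1980, 1981, 1982]

# Sample data; read_csv needs a text buffer (StringIO) for a str literal
df = pd.read_csv(io.StringIO("""person_id   status    year    count
0           pass    1980    4
0           fail    1982    1
1           pass    1981    2"""), sep=r"\s+")
names = ["person_id", "status", "year"]

# Index containing every (person_id, status, year) combination
mind = pd.MultiIndex.from_product(
    [all_person_ids, all_statuses, all_years], names=names)

# Align the data to the full index; combinations absent from df get count 0
result = df.set_index(names).reindex(mind, fill_value=0).reset_index()
print(result)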
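For comparison, the same result can be produced with a left merge instead of reindex. This is an alternative technique, not the one from the answer above: it builds the full key grid as an ordinary DataFrame (via `MultiIndex.to_frame`), left-joins the observed counts, and fills the gaps. The sample data is the question's, entered directly rather than parsed from text.

```python
import pandas as pd

all_person_ids = [0, 1, 2]
all_statuses = ['pass', 'fail']
all_years = [1980, 1981, 1982]
names = ["person_id", "status", "year"]

df = pd.DataFrame({
    "person_id": [0, 0, 1],
    "status": ["pass", "fail", "pass"],
    "year": [1980, 1982, 1981],
    "count": [4, 1, 2],
})

# Full grid of key combinations as a plain DataFrame (18 rows)
full = pd.MultiIndex.from_product(
    [all_person_ids, all_statuses, all_years], names=names).to_frame(index=False)

# Left join keeps every grid row in grid order; unmatched rows get NaN in "count"
result = full.merge(df, on=names, how="left")

# NaN forced the column to float, so fill with 0 and cast back to int
result["count"] = result["count"].fillna(0).astype(int)
print(result)
```

One practical difference: reindex raises an error if the same (person_id, status, year) key appears more than once in `df`, whereas a merge silently keeps every duplicate row. If duplicates are possible, aggregating first (for example `df.groupby(names, as_index=False)["count"].sum()`) before either approach keeps one row per combination.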
