Adding values for missing data combinations in Pandas
Question
I've got a pandas data frame containing something like the following:
person_id status year count
0 'pass' 1980 4
0 'fail' 1982 1
1 'pass' 1981 2
If I know that all possible values for each field are:
all_person_ids = [0, 1, 2]
all_statuses = ['pass', 'fail']
all_years = [1980, 1981, 1982]
I'd like to populate the original data frame with count=0 for the missing combinations of person_id, status, and year, i.e. I'd like the new data frame to contain:
person_id status year count
0 'pass' 1980 4
0 'pass' 1981 0
0 'pass' 1982 0
0 'fail' 1980 0
0 'fail' 1981 0
0 'fail' 1982 1
1 'pass' 1980 0
1 'pass' 1981 2
1 'pass' 1982 0
1 'fail' 1980 0
1 'fail' 1981 0
1 'fail' 1982 0
2 'pass' 1980 0
2 'pass' 1981 0
2 'pass' 1982 0
2 'fail' 1980 0
2 'fail' 1981 0
2 'fail' 1982 0
Is there an efficient way to achieve this in pandas?
Recommended answer
Build a MultiIndex of every combination with MultiIndex.from_product(), then call set_index(), reindex() with fill_value=0, and reset_index() on the data frame.
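import io
import pandas as pd

# All possible values for each key column.
all_person_ids = [0, 1, 2]
all_statuses = ['pass', 'fail']
all_years = [1980, 1981, 1982]

# Sample data; StringIO is used because the literal is a str, not bytes.
df = pd.read_csv(io.StringIO("""person_id status year count
0 pass 1980 4
0 fail 1982 1
1 pass 1981 2"""), sep=r"\s+")

names = ["person_id", "status", "year"]
# Build every (person_id, status, year) combination as a MultiIndex.
mind = pd.MultiIndex.from_product(
    [all_person_ids, all_statuses, all_years], names=names)
# Reindex onto the full set of combinations; missing rows get count=0.
result = df.set_index(names).reindex(mind, fill_value=0).reset_index()
print(result)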
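The same steps can be wrapped in a small helper, sketched here as one possible arrangement. The function name fill_missing_combinations and its parameters are hypothetical, not part of pandas. The sketch assumes each key combination appears at most once in the input, since reindex() raises an error on duplicate index labels, so it checks for that first.

```python
import pandas as pd


def fill_missing_combinations(df, levels, value_col="count", fill_value=0):
    """Return df with one row per combination of the given key levels.

    `levels` maps each key column name to the list of all its possible values.
    This helper and its parameter names are illustrative, not a pandas API.
    """
    names = list(levels)
    full_index = pd.MultiIndex.from_product(list(levels.values()), names=names)
    # reindex() needs unique index labels, so reject duplicated keys early.
    if df.duplicated(subset=names).any():
        raise ValueError("key combinations must be unique before reindexing")
    return (
        df.set_index(names)[[value_col]]
        .reindex(full_index, fill_value=fill_value)
        .reset_index()
    )


# Same sample data as in the question.
df = pd.DataFrame(
    {
        "person_id": [0, 0, 1],
        "status": ["pass", "fail", "pass"],
        "year": [1980, 1982, 1981],
        "count": [4, 1, 2],
    }
)
levels = {
    "person_id": [0, 1, 2],
    "status": ["pass", "fail"],
    "year": [1980, 1981, 1982],
}
result = fill_missing_combinations(df, levels)
print(len(result))  # 3 * 2 * 3 = 18 rows
```

If the input can contain repeated key combinations, aggregating them first (for example with groupby(names).sum()) would be one way to satisfy the uniqueness assumption.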