Pandas DataFrame insert / fill missing rows from previous dates

Published: 2020/10/17 0:53:10. Tags: python, pandas, dataframe

Question

I have a DataFrame consisting of dates, other columns and a numerical value, where some value combinations in "other columns" could be missing, and I want to populate them from previous dates.

For example, say the DataFrame looks like the one below. On 2016-01-01 we have data for (LN, A), (LN, B), (NY, A) and (NY, B) in the (location, band) columns.


        date  location  band  value
0 2016-01-01        LN     A   10.0
1 2016-01-01        LN     B    5.0
2 2016-01-01        NY     A    9.0
3 2016-01-01        NY     B    6.0
4 2016-01-02        LN     A   11.0
5 2016-01-02        NY     B    7.0
6 2016-01-03        NY     A   10.0

On 2016-01-02, however, we only have (LN, A) and (NY, B); (LN, B) and (NY, A) are missing. Likewise, on 2016-01-03 only (NY, A) is available, and the other three combinations are missing.

What I want to do is populate the missing combinations on each date from its predecessor. For 2016-01-02, say, I would like to add two more rows "rolled over" from 2016-01-01: (LN, B, 5.0) and (NY, A, 9.0) for columns (location, band, value). The same goes for 2016-01-03, so that the whole thing looks like this:


        date  location  band  value
 0 2016-01-01        LN     A   10.0
 1 2016-01-01        LN     B    5.0
 2 2016-01-01        NY     A    9.0
 3 2016-01-01        NY     B    6.0
 4 2016-01-02        LN     A   11.0
 5 2016-01-02        NY     B    7.0
 6 2016-01-03        NY     A   10.0
 7 2016-01-02        LN     B    5.0
 8 2016-01-02        NY     A    9.0
 9 2016-01-03        LN     A   11.0
10 2016-01-03        LN     B    5.0
11 2016-01-03        NY     B    7.0

Note that rows 7-11 are populated from rows 1, 2, 4, 7 and 5, respectively. The order is not important, since I can always sort afterwards as long as all the data I need is present.
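The rule described above (each date inherits every (location, band) combination it lacks from the most recent earlier date) can be written as a plain loop. This is a minimal reference sketch, not code from the question or the answer: the helper name `roll_forward` is hypothetical, and it assumes the key columns are exactly `location` and `band`. It is slow on large frames, but handy for checking a vectorized solution against.

```python
import pandas as pd


def roll_forward(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: carry missing (location, band) rows forward by date."""
    last_seen = {}  # (location, band) -> most recent value seen so far
    rows = []
    for date, day in df.sort_values("date").groupby("date", sort=True):
        # Update the running state with the rows actually present on this date.
        for loc, band, value in day[["location", "band", "value"]].itertuples(index=False):
            last_seen[(loc, band)] = value
        # Emit one row per combination seen so far, observed or rolled over.
        for (loc, band), value in sorted(last_seen.items()):
            rows.append({"date": date, "location": loc, "band": band, "value": value})
    return pd.DataFrame(rows, columns=["date", "location", "band", "value"])


# The sample data from the question.
df = pd.DataFrame({
    "date": pd.to_datetime(["2016-01-01"] * 4 + ["2016-01-02"] * 2 + ["2016-01-03"]),
    "location": ["LN", "LN", "NY", "NY", "LN", "NY", "NY"],
    "band": ["A", "B", "A", "B", "A", "B", "A"],
    "value": [10.0, 5.0, 9.0, 6.0, 11.0, 7.0, 10.0],
})

result = roll_forward(df)
print(result)
```

One behavioural difference to keep in mind: a combination that first appears on a later date is only emitted from that date onward, whereas the answer's approaches below produce rows with NaN values for the earlier dates.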

Can anyone help? Thanks a lot!

Recommended answer

You can use an unstack/stack approach to create rows for all the missing combinations, followed by a forward fill:

# Use unstack/stack to add the missing (location, band) combinations as NaN rows.
df = df.set_index(['date', 'location', 'band']) \
       .unstack(level=['location', 'band']) \
       .stack(level=['location', 'band'], dropna=False)

# Forward fill NaN values within ['location', 'band'] groups.
df = df.groupby(level=['location', 'band']).ffill().reset_index()
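The same idea can also be expressed by pivoting each (location, band) pair into its own column, forward-filling down the date axis, and unpivoting again. This is a sketch of an alternative formulation, not the answer's exact code. It avoids passing `dropna` to `stack`, an argument whose handling changed when pandas 2.1 introduced a new `stack` implementation, and it assumes each (date, location, band) triple occurs at most once (`pivot` raises otherwise).

```python
import pandas as pd

# The sample data from the question.
df = pd.DataFrame({
    "date": pd.to_datetime(["2016-01-01"] * 4 + ["2016-01-02"] * 2 + ["2016-01-03"]),
    "location": ["LN", "LN", "NY", "NY", "LN", "NY", "NY"],
    "band": ["A", "B", "A", "B", "A", "B", "A"],
    "value": [10.0, 5.0, 9.0, 6.0, 11.0, 7.0, 10.0],
})

# Wide form: one row per date, one column per (location, band) pair.
# Combinations missing on a date show up as NaN.
wide = df.pivot(index="date", columns=["location", "band"], values="value")

# Forward fill down the sorted date index: each NaN takes the value
# from the most recent earlier date in the same column.
filled = wide.sort_index().ffill()

# Back to long form. With a flat row index, DataFrame.unstack() returns a
# Series indexed by (location, band, date).
out = (
    filled.unstack()
    .rename("value")
    .reset_index()
    .sort_values(["date", "location", "band"], ignore_index=True)
)[["date", "location", "band", "value"]]
print(out)
```

Forward-filling the wide frame column by column is equivalent to the answer's `groupby(level=['location', 'band']).ffill()`, because each column holds exactly one group.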

Or you can directly build a MultiIndex containing all combinations:

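import pandas as pd

# Build the full MultiIndex, set the partial MultiIndex, and reindex.
# Sorting the unique values keeps the dates in chronological order for the ffill below.
levels = ['date', 'location', 'band']
full_idx = pd.MultiIndex.from_product([sorted(df[col].unique()) for col in levels], names=levels)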
df = df.set_index(levels).reindex(full_idx)

# Forward fill NaN values within ['location', 'band'] groups.
df = df.groupby(level=['location', 'band']).ffill().reset_index()
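A self-contained run of the `reindex` approach on the question's data follows. The input rows are deliberately reversed here (an illustrative assumption, not part of the question) to show why the unique values are sorted before building the product index: `ffill` fills from the previous row in index order, so the dates must be chronological for values to move forward in time. Like the answer, this sketch assumes each (date, location, band) triple is unique, since `reindex` refuses an index with duplicate labels.

```python
import pandas as pd

# The sample data from the question, with the rows reversed.
df = pd.DataFrame({
    "date": pd.to_datetime(["2016-01-01"] * 4 + ["2016-01-02"] * 2 + ["2016-01-03"]),
    "location": ["LN", "LN", "NY", "NY", "LN", "NY", "NY"],
    "band": ["A", "B", "A", "B", "A", "B", "A"],
    "value": [10.0, 5.0, 9.0, 6.0, 11.0, 7.0, 10.0],
}).iloc[::-1].reset_index(drop=True)

levels = ["date", "location", "band"]
# Every combination of the observed dates, locations and bands, with the
# dates sorted so the forward fill below runs in chronological order.
full_idx = pd.MultiIndex.from_product(
    [sorted(df[col].unique()) for col in levels], names=levels
)

# Rows absent from the input appear with value NaN after reindexing.
out = df.set_index(levels).reindex(full_idx)

# Within each (location, band) group, fill NaN from the previous date.
out = out.groupby(level=["location", "band"]).ffill().reset_index()
print(out)
```

A combination that does not occur until a later date stays NaN on the earlier dates, because there is nothing to roll forward; `out.dropna(subset=["value"])` removes those rows if they are unwanted.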

Resulting output of either method:

         date location band  value
0  2016-01-01       LN    A   10.0
1  2016-01-01       LN    B    5.0
2  2016-01-01       NY    A    9.0
3  2016-01-01       NY    B    6.0
4  2016-01-02       LN    A   11.0
5  2016-01-02       LN    B    5.0
6  2016-01-02       NY    A    9.0
7  2016-01-02       NY    B    7.0
8  2016-01-03       LN    A   11.0
9  2016-01-03       LN    B    5.0
10 2016-01-03       NY    A   10.0
11 2016-01-03       NY    B    7.0
