Filling dict with NA values to allow conversion to pandas dataframe

Question

I have a dict that holds computed values on different time lags, which means they start on different dates. For instance, the data I have may look like the following:

Date      col1    col2    col3    col4    col5
01-01-15  5       12      1      -15      10
01-02-15  7       0       9       11      7
01-03-15          6       1       2       18
01-04-15          9       8       10
01-05-15         -4               7
01-06-15         -11             -1
01-07-15          6               

Where each header is the key, and each column of values is the value for each key (I'm using a defaultdict(list) for this). When I try to run pd.DataFrame.from_dict(d) I understandably get an error stating that all arrays must be the same length. Is there an easy/trivial way to fill or pad the numbers so that the output would end up being the following dataframe?

Date      col1    col2    col3    col4    col5
01-01-15  5       12      1      -15      10
01-02-15  7       0       9       11      7
01-03-15  NaN     6       1       2       18
01-04-15  NaN     9       8       10      NaN
01-05-15  NaN    -4       NaN     7       NaN
01-06-15  NaN    -11      NaN    -1       NaN
01-07-15  NaN     6       NaN     NaN     NaN

Or will I have to do this manually with each list?

Here is the code to recreate the dictionary:

import pandas as pd
from collections import defaultdict

d = defaultdict(list)
d["Date"].extend([
    "01-01-15", 
    "01-02-15", 
    "01-03-15", 
    "01-04-15", 
    "01-05-15",
    "01-06-15",
    "01-07-15"
])
d["col1"].extend([5, 7])
d["col2"].extend([12, 0, 6, 9, -4, -11, 6])
d["col3"].extend([1, 9, 1, 8])
d["col4"].extend([-15, 11, 2, 10, 7, -1])
d["col5"].extend([10, 7, 18])

Recommended answer

Another option is to use from_dict with orient='index' and then take the transpose:

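import pandas as pd

# Lists of unequal length: 'b' is two items shorter than 'a'
my_dict = {'a': [1, 2, 3, 4, 5], 'b': [1, 2, 3]}
# Build one row per key (short rows are padded), then transpose so keys become columns
df = pd.DataFrame.from_dict(my_dict, orient='index').T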

Note that you could run into problems with dtype if your columns have different types, e.g. floats in one column, strings in another.

Resulting output:

     a    b
0  1.0  1.0
1  2.0  2.0
2  3.0  3.0
3  4.0  NaN
4  5.0  NaN
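
To make the dtype caveat concrete, here is a minimal sketch that applies the same from_dict/transpose approach to the dictionary from the question. Because the Date column holds strings, the transposed frame is expected to come back with object dtype throughout, so the sketch converts the value columns back to numbers with pd.to_numeric. The name value_cols and the choice to leave Date as plain strings are assumptions made for illustration, not part of the original answer.

```python
from collections import defaultdict

import pandas as pd

# The dictionary from the question: lists of unequal length per key
d = defaultdict(list)
d["Date"].extend([
    "01-01-15", "01-02-15", "01-03-15", "01-04-15",
    "01-05-15", "01-06-15", "01-07-15",
])
d["col1"].extend([5, 7])
d["col2"].extend([12, 0, 6, 9, -4, -11, 6])
d["col3"].extend([1, 9, 1, 8])
d["col4"].extend([-15, 11, 2, 10, 7, -1])
d["col5"].extend([10, 7, 18])

# One row per key (short rows are padded with missing values),
# then transpose so the keys become columns again
df = pd.DataFrame.from_dict(d, orient="index").T

# Strings and numbers were mixed in the same frame, so convert the
# value columns explicitly (value_cols is a hypothetical helper name)
value_cols = [c for c in df.columns if c != "Date"]
for col in value_cols:
    df[col] = pd.to_numeric(df[col])

print(df.dtypes)
print(df)
```

After the conversion, columns with padding (such as col1) should be floating point with NaN in the padded rows, while Date stays as strings; parsing it with pd.to_datetime would additionally require deciding whether the values are month-day-year or day-month-year.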
