Construct pandas DataFrame from items in nested dictionary


Problem description

Suppose I have a nested dictionary 'user_dict' with the following structure:

  • Level 1: UserId (long integer)
  • Level 2: Category (string)
  • Level 3: Assorted attributes (floats, ints, etc.)

For example, an entry of this dictionary would be:

user_dict[12] = {
    "Category 1": {"att_1": 1, 
                   "att_2": "whatever"},
    "Category 2": {"att_1": 23, 
                   "att_2": "another"}}

Each item in user_dict has the same structure, and user_dict contains a large number of items that I want to feed to a pandas DataFrame, constructing the series from the attributes. A hierarchical index would be useful here.

Specifically, my question is whether there is a way to help the DataFrame constructor understand that the series should be built from the values at "level 3" of the dictionary.

If I try the following:

df = pandas.DataFrame(user_dict)

The items in "level 1" (the UserIds) are taken as columns, which is the opposite of what I want to achieve (UserIds as the index).
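To make that default behaviour concrete, here is a minimal sketch. The two-user sample data is an assumption for illustration, and the variable is named user_dict to match the structure described above.

```python
import pandas as pd

# Hypothetical two-user sample following the three-level structure above.
user_dict = {
    12: {"Category 1": {"att_1": 1, "att_2": "whatever"},
         "Category 2": {"att_1": 23, "att_2": "another"}},
    15: {"Category 1": {"att_1": 10, "att_2": "foo"},
         "Category 2": {"att_1": 30, "att_2": "bar"}},
}

df = pd.DataFrame(user_dict)

# Level 1 keys (user ids) become the columns and level 2 keys (categories)
# become the index; each cell holds a whole level 3 dict, not separate values.
print(list(df.columns))
print(list(df.index))
print(type(df.loc["Category 1", 12]))
```

The attributes never become columns of their own in this layout, which is why the outer two levels need to be flattened first.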

I know I could construct the series after iterating over the dictionary entries, but a more direct way would be very useful. A similar question would be whether it is possible to construct a pandas DataFrame from JSON objects listed in a file.

Recommended answer

A pandas MultiIndex consists of a list of tuples. So the most natural approach is to reshape your input dict so that its keys are tuples corresponding to the multi-index values you require. Then you can construct your dataframe with pd.DataFrame.from_dict, using the option orient='index':

import pandas as pd

user_dict = {12: {'Category 1': {'att_1': 1, 'att_2': 'whatever'},
                  'Category 2': {'att_1': 23, 'att_2': 'another'}},
             15: {'Category 1': {'att_1': 10, 'att_2': 'foo'},
                  'Category 2': {'att_1': 30, 'att_2': 'bar'}}}

# Flatten the two outer levels into (user_id, category) tuple keys;
# with orient='index' each tuple key becomes one row label.
pd.DataFrame.from_dict({(i, j): user_dict[i][j]
                           for i in user_dict.keys()
                           for j in user_dict[i].keys()},
                       orient='index')


               att_1     att_2
12 Category 1      1  whatever
   Category 2     23   another
15 Category 1     10       foo
   Category 2     30       bar
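An equivalent but more explicit variant builds the MultiIndex directly from the list of tuples, which also makes it easy to name the levels. This is only a sketch: the level names user_id and category are assumptions, not part of the original answer.

```python
import pandas as pd

user_dict = {12: {'Category 1': {'att_1': 1, 'att_2': 'whatever'},
                  'Category 2': {'att_1': 23, 'att_2': 'another'}},
             15: {'Category 1': {'att_1': 10, 'att_2': 'foo'},
                  'Category 2': {'att_1': 30, 'att_2': 'bar'}}}

# Collect one (user_id, category) tuple and one attribute record per row.
keys = []
records = []
for user_id, categories in user_dict.items():
    for category, attrs in categories.items():
        keys.append((user_id, category))
        records.append(attrs)

# Level names are hypothetical, chosen for readability.
index = pd.MultiIndex.from_tuples(keys, names=['user_id', 'category'])
df = pd.DataFrame(records, index=index)

# All categories for one user, then one category across all users.
print(df.loc[12])
print(df.xs('Category 2', level='category'))
```

Naming the levels lets selections such as xs refer to a level by name rather than by position.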

An alternative approach would be to build your dataframe up by concatenating the component dataframes:

user_ids = []
frames = []

for user_id, d in user_dict.items():
    user_ids.append(user_id)
    frames.append(pd.DataFrame.from_dict(d, orient='index'))

pd.concat(frames, keys=user_ids)

               att_1     att_2
12 Category 1      1  whatever
   Category 2     23   another
15 Category 1     10       foo
   Category 2     30       bar
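The concatenation approach can also be written more compactly, because pd.concat accepts a mapping and uses its keys as the outer index level. A sketch under that assumption, with hypothetical level names passed through the names argument:

```python
import pandas as pd

user_dict = {12: {'Category 1': {'att_1': 1, 'att_2': 'whatever'},
                  'Category 2': {'att_1': 23, 'att_2': 'another'}},
             15: {'Category 1': {'att_1': 10, 'att_2': 'foo'},
                  'Category 2': {'att_1': 30, 'att_2': 'bar'}}}

# One small frame per user, indexed by category.
frames = {user_id: pd.DataFrame.from_dict(categories, orient='index')
          for user_id, categories in user_dict.items()}

# The dict keys become the outer index level; the level names are assumptions.
df = pd.concat(frames, names=['user_id', 'category'])
print(df)
```

This builds one intermediate frame per user, so for a very large user_dict the tuple-key approach may be the lighter option.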
