Construct pandas DataFrame from items in nested dictionary

Views: 134 | Posted: 2017/3/25 22:16:39 | python dataframe pandas


Problem description

Suppose I have a nested dictionary 'user_dict' with structure:

Level 1: UserId (Long Integer)

Level 2: Category (String)

Level 3: Assorted Attributes (floats, ints, etc.)

For example, an entry of this dictionary would be:

user_dict[12] = {
    "Category 1": {"att_1": 1, 
                   "att_2": "whatever"},
    "Category 2": {"att_1": 23, 
                   "att_2": "another"}}

Each item in "user_dict" has the same structure, and "user_dict" contains a large number of items that I want to feed to a pandas DataFrame, constructing the series from the attributes. A hierarchical index would be useful here.

Specifically, my question is whether there is a way to help the DataFrame constructor understand that the series should be built from the values at "level 3" of the dictionary.

If I try something like:

df = pandas.DataFrame(user_dict)

The items in "level 1" (the user IDs) are taken as columns, which is the opposite of what I want to achieve (user IDs as the index).
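To make the behaviour described above concrete, here is a minimal sketch that passes the nested dictionary straight to the constructor. It assumes the sample data used in the answer further down and the conventional `import pandas as pd` alias; the variable name `naive` is only illustrative.

```python
import pandas as pd

# Sample data with the three-level structure described in the question.
user_dict = {
    12: {"Category 1": {"att_1": 1, "att_2": "whatever"},
         "Category 2": {"att_1": 23, "att_2": "another"}},
    15: {"Category 1": {"att_1": 10, "att_2": "foo"},
         "Category 2": {"att_1": 30, "att_2": "bar"}},
}

# Outer keys (user IDs) become columns, second-level keys (categories)
# become the index, and each cell holds the level-3 dict unexpanded.
naive = pd.DataFrame(user_dict)

print(list(naive.columns))           # the user IDs
print(list(naive.index))             # the category names
print(naive.loc["Category 1", 12])   # a plain dict of attributes
```

The attributes never become columns this way, which is why the dictionary has to be reshaped first.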

I know I could construct the series after iterating over the dictionary entries, but a more direct way would be very useful. A similar question would be whether it is possible to construct a pandas DataFrame from JSON objects listed in a file.

Solution

A pandas MultiIndex consists of a list of tuples. So the most natural approach would be to reshape your input dict so that its keys are tuples corresponding to the multi-index values you require. Then you can just construct your dataframe using pd.DataFrame.from_dict, using the option orient='index':

import pandas as pd

user_dict = {12: {'Category 1': {'att_1': 1, 'att_2': 'whatever'},
                  'Category 2': {'att_1': 23, 'att_2': 'another'}},
             15: {'Category 1': {'att_1': 10, 'att_2': 'foo'},
                  'Category 2': {'att_1': 30, 'att_2': 'bar'}}}

# Re-key the data by (user_id, category) tuples; with orient='index'
# each tuple becomes one row of a MultiIndex.
pd.DataFrame.from_dict({(i, j): user_dict[i][j]
                           for i in user_dict.keys()
                           for j in user_dict[i].keys()},
                       orient='index')


               att_1     att_2
12 Category 1      1  whatever
   Category 2     23   another
15 Category 1     10       foo
   Category 2     30       bar
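Once the frame has a two-level index, it can help to name the levels and select by them. The following is a sketch under assumptions: the level names `user_id` and `category` and the variable names `flat`, `df`, `user_12`, and `category_1` are illustrative choices, not part of the original answer.

```python
import pandas as pd

user_dict = {
    12: {"Category 1": {"att_1": 1, "att_2": "whatever"},
         "Category 2": {"att_1": 23, "att_2": "another"}},
    15: {"Category 1": {"att_1": 10, "att_2": "foo"},
         "Category 2": {"att_1": 30, "att_2": "bar"}},
}

# Same reshaping as above, written with .items() to avoid repeated lookups.
flat = {(user_id, category): attrs
        for user_id, categories in user_dict.items()
        for category, attrs in categories.items()}

df = pd.DataFrame.from_dict(flat, orient="index")

# Name the two index levels (names are hypothetical, chosen for readability).
df.index.names = ["user_id", "category"]

# All categories for one user: a frame indexed by category.
user_12 = df.loc[12]

# One category across all users: a frame indexed by user_id.
category_1 = df.xs("Category 1", level="category")

# A single cell addressed by (user_id, category) and attribute.
print(df.loc[(15, "Category 2"), "att_1"])
```

Naming the levels is optional, but it lets `xs` and `groupby` refer to a level by name rather than by position.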

An alternative approach would be to build your dataframe up by concatenating the component dataframes:

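user_ids = []
frames = []

# dict.iteritems() exists only in Python 2; use items() in Python 3
for user_id, d in user_dict.items():
    user_ids.append(user_id)
    # one frame per user: categories as rows, attributes as columns
    frames.append(pd.DataFrame.from_dict(d, orient='index'))

# keys= adds the user IDs as the outer index level
pd.concat(frames, keys=user_ids)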

               att_1     att_2
12 Category 1      1  whatever
   Category 2     23   another
15 Category 1     10       foo
   Category 2     30       bar
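The concatenation approach can also be written more compactly, because `pd.concat` accepts a mapping and uses its keys as the outer index level. This is a sketch of that variant, not the answer's own code; the level names passed to `names` are assumptions made for illustration.

```python
import pandas as pd

user_dict = {
    12: {"Category 1": {"att_1": 1, "att_2": "whatever"},
         "Category 2": {"att_1": 23, "att_2": "another"}},
    15: {"Category 1": {"att_1": 10, "att_2": "foo"},
         "Category 2": {"att_1": 30, "att_2": "bar"}},
}

# Build one small frame per user, keyed by user ID.
frames = {user_id: pd.DataFrame.from_dict(categories, orient="index")
          for user_id, categories in user_dict.items()}

# Passing the dict directly makes its keys the outer index level;
# names= labels both levels of the resulting MultiIndex.
df = pd.concat(frames, names=["user_id", "category"])

print(df)
```

This builds many small intermediate frames, so for a very large dictionary the tuple-keyed `from_dict` approach may be the lighter option; measuring on the real data is the only way to be sure.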
