Construct pandas DataFrame from items in nested dictionary
Problem description
Suppose I have a nested dictionary 'user_dict' with structure:
Level 1: UserId (Long Integer)
Level 2: Category (String)
Level 3: Assorted Attributes (floats, ints, etc.)
For example, an entry of this dictionary would be:
user_dict[12] = {
    "Category 1": {"att_1": 1,
                   "att_2": "whatever"},
    "Category 2": {"att_1": 23,
                   "att_2": "another"}}
Each item in "user_dict" has the same structure, and "user_dict" contains a large number of items which I want to feed to a pandas DataFrame, constructing the series from the attributes. In this case a hierarchical index would be useful.
Specifically, my question is whether there is a way to help the DataFrame constructor understand that the series should be built from the values at "level 3" of the dictionary.
If I try something like:
df = pandas.DataFrame(user_dict)
The items in "level 1" (the user IDs) are taken as columns, which is the opposite of what I want to achieve (user IDs as the index).
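The behaviour described above can be reproduced with a small sketch. The two-user sample data is the same as in the answer further down, and the variable name `naive` is only illustrative:

```python
import pandas as pd

user_dict = {
    12: {"Category 1": {"att_1": 1, "att_2": "whatever"},
         "Category 2": {"att_1": 23, "att_2": "another"}},
    15: {"Category 1": {"att_1": 10, "att_2": "foo"},
         "Category 2": {"att_1": 30, "att_2": "bar"}},
}

# Passing the nested dict directly: the outer keys (user IDs) become columns,
# the second-level keys (categories) become the index, and each cell ends up
# holding a whole level-3 dict instead of separate attribute columns.
naive = pd.DataFrame(user_dict)

print(naive.columns.tolist())              # the user IDs
print(naive.index.tolist())                # the category names
print(type(naive.loc["Category 1", 12]))   # each cell is a plain dict
```

So the constructor only unpacks two levels of nesting, which is why the third level has to be flattened some other way.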
I know I could construct the series after iterating over the dictionary entries, but a more direct way would be very useful. A similar question is whether a pandas DataFrame can be constructed from JSON objects listed in a file.
A pandas MultiIndex consists of a list of tuples. So the most natural approach would be to reshape your input dict so that its keys are tuples corresponding to the multi-index values you require. Then you can just construct your dataframe using pd.DataFrame.from_dict with the option orient='index':
import pandas as pd

user_dict = {12: {'Category 1': {'att_1': 1, 'att_2': 'whatever'},
                  'Category 2': {'att_1': 23, 'att_2': 'another'}},
             15: {'Category 1': {'att_1': 10, 'att_2': 'foo'},
                  'Category 2': {'att_1': 30, 'att_2': 'bar'}}}

# Re-key the nested dict by (user_id, category) tuples so the index becomes a MultiIndex.
pd.DataFrame.from_dict({(i, j): user_dict[i][j]
                        for i in user_dict.keys()
                        for j in user_dict[i].keys()},
                       orient='index')
               att_1     att_2
12 Category 1      1  whatever
   Category 2     23   another
15 Category 1     10       foo
   Category 2     30       bar
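Once the frame has a two-level index, it can be queried by user or by category. The following is a minimal sketch of that usage. The level names "user_id" and "category" and the variable names are assumptions chosen for illustration; they do not come from the original data:

```python
import pandas as pd

user_dict = {
    12: {"Category 1": {"att_1": 1, "att_2": "whatever"},
         "Category 2": {"att_1": 23, "att_2": "another"}},
    15: {"Category 1": {"att_1": 10, "att_2": "foo"},
         "Category 2": {"att_1": 30, "att_2": "bar"}},
}

# Flatten the two outer levels into (user_id, category) tuple keys.
flat = {(user_id, category): attrs
        for user_id, categories in user_dict.items()
        for category, attrs in categories.items()}

df = pd.DataFrame.from_dict(flat, orient="index")

# Hypothetical level names, added only to make the selections below readable.
df.index.names = ["user_id", "category"]

# All categories for a single user (selects on the outer level).
print(df.loc[12])

# A single category across all users (cross-section on the inner level).
print(df.xs("Category 1", level="category"))
```

Naming the levels is optional; selecting by position, for example `df.xs("Category 1", level=1)`, works as well.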
An alternative approach would be to build your dataframe up by concatenating the component dataframes:
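user_ids = []
frames = []

# Build one small frame per user, indexed by category.
for user_id, d in user_dict.items():
    user_ids.append(user_id)
    frames.append(pd.DataFrame.from_dict(d, orient='index'))

# Stack the frames, using the user ids as the outer index level.
pd.concat(frames, keys=user_ids)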
               att_1     att_2
12 Category 1      1  whatever
   Category 2     23   another
15 Category 1     10       foo
   Category 2     30       bar
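The concatenation approach can probably be written more compactly, since pd.concat also accepts a mapping and uses its keys as the outer index level. This is a sketch rather than part of the original answer; the level names passed to `names` are illustrative assumptions:

```python
import pandas as pd

user_dict = {
    12: {"Category 1": {"att_1": 1, "att_2": "whatever"},
         "Category 2": {"att_1": 23, "att_2": "another"}},
    15: {"Category 1": {"att_1": 10, "att_2": "foo"},
         "Category 2": {"att_1": 30, "att_2": "bar"}},
}

# One frame per user, indexed by category, keyed by the user ID.
frames = {user_id: pd.DataFrame.from_dict(categories, orient="index")
          for user_id, categories in user_dict.items()}

# The mapping keys become the outer index level; the names are hypothetical.
df = pd.concat(frames, names=["user_id", "category"])

print(df)
```

This avoids maintaining the separate user_ids and frames lists, at the cost of still building one intermediate DataFrame per user.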