Store complex dictionary in pandas dataframe
This question follows up on my previous one, "store dictionary in pandas dataframe". The dictionary here is a parent ("mother") dictionary of the one in that question.
I have a dictionary:
dictionary_example={'New York':{1234:{'choice':0,'city':'New York','choice_set':{0:{'A':100,'B':200,'C':300},1:{'A':200,'B':300,'C':300},2:{'A':500,'B':300,'C':300}}},
234:{'choice':1,'city':'New York','choice_set':{0:{'A':100,'B':400},1:{'A':100,'B':300,'C':1000}}},
1876:{'choice':2,'city':'New York','choice_set':{0:{'A': 100,'B':400,'C':300},1:{'A':100,'B':300,'C':1000},2:{'A':600,'B':200,'C':100}}
}},
'London':{1534:{'choice':0,'city':'London','choice_set':{0:{'A':100,'B':400,'C':300},1:{'A':200,'B':300,'C':300},2:{'A':500,'B':300,'C':300}}},
2134:{'choice':1,'city':'London','choice_set':{0:{'A':100,'B':600},1:{'A':170,'B':300,'C':1000}}},
1776:{'choice':2,'city':'London','choice_set':{0:{'A':100,'B':400,'C':500},1:{'A':100,'B':300},2:{'A':600,'B':200,'C':100}}}},
'Paris':{1534:{'choice':0,'city':'Paris','choice_set':{0:{'A':100,'B':400,'C':300},1:{'A':200,'B':300,'C':300},2:{'A':500,'B':300,'C':300}}},
2134:{'choice':1,'city':'Paris','choice_set':{0:{'A':100,'B':600},1:{'A':170,'B':300,'C':1000}}},
1776:{'choice':1,'city':'Paris','choice_set':{0:{'A': 100,'B':400,'C':500},1:{'A':100,'B':300}}}
}}
I want it to become a pandas DataFrame like this (some of the specific values may not be exactly accurate):
id    choice  A_0  B_0  C_0  A_1  B_1  C_1   A_2  B_2  C_2  New York  London  Paris
1234  0       100  200  300  200  300  300   500  300  300  1         0       0
234   1       100  400  -    100  300  1000  -    -    -    1         0       0
1876  2       100  400  300  100  300  1000  600  200  100  1         0       0
1534  0       100  200  300  200  300  300   500  300  300  0         1       0
2134  1       100  400  -    100  300  1000  -    -    -    0         1       0
2006  2       100  400  300  100  300  1000  600  200  100  0         1       0
1264  0       100  200  300  200  300  300   500  300  300  0         0       1
1454  1       100  400  -    100  300  1000  -    -    -    0         0       1
1776  1       100  400  300  100  300  -     -    -    -    0         0       1
In the old question, a helpful answerer provided a way to handle the sub-dictionary:
import json
import pandas as pd

df = pd.read_json(json.dumps(dictionary_example)).T

def to_s(r):
    return pd.read_json(json.dumps(r)).unstack()

flattened_choice_set = df["choice_set"].apply(to_s)
flattened_choice_set.columns = ['_'.join((str(col[0]), col[1])) for col in flattened_choice_set.columns]
result = pd.merge(df, flattened_choice_set,
                  left_index=True, right_index=True).drop("choice_set", axis=1)
Is there any way to do this for the larger dictionary?
All the best, Kevin
The previously provided solution that you quote is not very neat. This one is more readable and solves your current problem. If possible, you should reconsider your data structure, though...
import pandas as pd

df = pd.DataFrame()
question_ids = [0, 1, 2]

# Create a dataframe with a row for every city-choice combination,
# with a dictionary in the choice_set column
for _, city_value in dictionary_example.items():  # .iteritems() on Python 2
    city_df = pd.DataFrame.from_dict(city_value).T
    city_df = city_df.join(pd.DataFrame(city_df["choice_set"].to_dict()).T)
    df = pd.concat([df, city_df])  # DataFrame.append was removed in pandas 2.0

# Join the weird column names from the choice set to your df
for i in question_ids:
    choice_df = pd.DataFrame(df[i].to_dict()).T
    choice_df.columns = ["{}_{}".format(x, i) for x in choice_df.columns]
    df = df.join(choice_df)

# Fix the city columns
df = pd.get_dummies(df, prefix="", prefix_sep="", columns=['city'])
df.drop(question_ids + ['choice_set'], axis=1, inplace=True)
# Optional to remove NaN from questions:
# df = df.fillna(0)
df
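As an alternative, the nested dictionary can be flattened in plain Python before pandas is involved. The sketch below is not the approach from the answer above; it is a minimal illustration under a few assumptions. The name `flatten` is a hypothetical helper, `sample` is a small subset of `dictionary_example` copied from the question, and the id is kept as an ordinary column rather than as the index. In the sample data some ids (for example 1776) appear under more than one city, so an index-based join may not keep those rows apart; building one flat record per (city, id) pair sidesteps that.

```python
import pandas as pd

# Subset of dictionary_example from the question (same shape, fewer entries).
sample = {
    'New York': {
        234: {'choice': 1, 'city': 'New York',
              'choice_set': {0: {'A': 100, 'B': 400},
                             1: {'A': 100, 'B': 300, 'C': 1000}}},
    },
    'London': {
        1776: {'choice': 2, 'city': 'London',
               'choice_set': {0: {'A': 100, 'B': 400, 'C': 500},
                              1: {'A': 100, 'B': 300},
                              2: {'A': 600, 'B': 200, 'C': 100}}},
    },
    'Paris': {
        1776: {'choice': 1, 'city': 'Paris',
               'choice_set': {0: {'A': 100, 'B': 400, 'C': 500},
                              1: {'A': 100, 'B': 300}}},
    },
}


def flatten(nested):
    """Hypothetical helper: build one flat record per (city, id) entry."""
    rows = []
    for city, people in nested.items():
        for person_id, entry in people.items():
            row = {'id': person_id, 'choice': entry['choice'], 'city': city}
            # Turn choice_set[q][name] into a column called "<name>_<q>".
            for q, options in entry['choice_set'].items():
                for name, value in options.items():
                    row['{}_{}'.format(name, q)] = value
            rows.append(row)
    return rows


flat = pd.DataFrame(flatten(sample))
# One indicator column per city; dtype=int gives 0/1 instead of booleans.
result = pd.get_dummies(flat, prefix='', prefix_sep='', columns=['city'], dtype=int)
print(result)
```

Options missing from a choice set come out as NaN, which can be replaced with `fillna` if a placeholder is preferred. The column order follows the order in which keys are first seen, so reorder the columns explicitly if the exact layout from the question matters.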