pandas dataframe from a nested dictionary (elasticsearch result)
Question
I am having a hard time translating results from Elasticsearch aggregations into pandas. I am trying to write an abstract function that takes a nested dictionary (with an arbitrary number of levels) and flattens it into a pandas dataframe.
A typical result looks like this (I have also added the parent key):
x1 = {u'xColor': {u'buckets': [{u'doc_count': 4,
u'key': u'red',
u'xMake': {u'buckets': [{u'doc_count': 3,
u'key': u'honda',
u'xCity': {u'buckets': [{u'doc_count': 2, u'key': u'ROME'},
{u'doc_count': 1, u'key': u'Paris'}],
u'doc_count_error_upper_bound': 0,
u'sum_other_doc_count': 0}},
{u'doc_count': 1,
u'key': u'bmw',
u'xCity': {u'buckets': [{u'doc_count': 1, u'key': u'Paris'}],
u'doc_count_error_upper_bound': 0,
u'sum_other_doc_count': 0}}],
u'doc_count_error_upper_bound': 0,
u'sum_other_doc_count': 0}},
{u'doc_count': 2,
u'key': u'blue',
u'xMake': {u'buckets': [{u'doc_count': 1,
u'key': u'ford',
u'xCity': {u'buckets': [{u'doc_count': 1, u'key': u'Paris'}],
u'doc_count_error_upper_bound': 0,
u'sum_other_doc_count': 0}},
{u'doc_count': 1,
u'key': u'toyota',
u'xCity': {u'buckets': [{u'doc_count': 1, u'key': u'Berlin'}],
u'doc_count_error_upper_bound': 0,
u'sum_other_doc_count': 0}}],
u'doc_count_error_upper_bound': 0,
u'sum_other_doc_count': 0}},
{u'doc_count': 2,
u'key': u'green',
u'xMake': {u'buckets': [{u'doc_count': 1,
u'key': u'ford',
u'xCity': {u'buckets': [{u'doc_count': 1, u'key': u'Berlin'}],
u'doc_count_error_upper_bound': 0,
u'sum_other_doc_count': 0}},
{u'doc_count': 1,
u'key': u'toyota',
u'xCity': {u'buckets': [{u'doc_count': 1, u'key': u'Berlin'}],
u'doc_count_error_upper_bound': 0,
u'sum_other_doc_count': 0}}],
u'doc_count_error_upper_bound': 0,
u'sum_other_doc_count': 0}}],
u'doc_count_error_upper_bound': 0,
u'sum_other_doc_count': 0}}
What I would like to have is a dataframe with one row per bucket of the lowest level, carrying that bucket's doc_count.
The first records would be:
red-honda-rome-2
red-honda-paris-1
red-bmw-paris-1
I came across json_normalize in pandas, but I do not understand how to supply its arguments. I have also seen different suggestions for flattening a nested dictionary, but I can't really understand how they work. Any help to get me started would be appreciated. (Related question: "Elasticsearch result to table".)
Update
I tried to use dpath, which is a great library, but I do not see how to abstract this into a function that takes just the bucket names as arguments, because dpath cannot handle a structure in which the values are lists (rather than other dictionaries).
import dpath.util  # import the submodule explicitly so dpath.util.get resolves
import pandas as pd

xListData = []
# Walk the three aggregation levels explicitly: color -> make -> city.
for q1 in dpath.util.get(x1, 'xColor/buckets'):
    xColor = q1['key']
    for q2 in dpath.util.get(q1, 'xMake/buckets'):
        xMake = q2['key']
        for q3 in dpath.util.get(q2, 'xCity/buckets'):
            xCity = q3['key']
            doc_count = q3['doc_count']
            # One flat row per innermost (city) bucket.
            xDict = {'color': xColor, 'make': xMake, 'city': xCity, 'doc_count': doc_count}
            xListData.append(xDict)
pd.DataFrame(xListData)
This gives:
city color doc_count make
0 ROME red 2 honda
1 Paris red 1 honda
2 Paris red 1 bmw
3 Paris blue 1 ford
4 Berlin blue 1 toyota
5 Berlin green 1 ford
6 Berlin green 1 toyota
Recommended answer
Try a recursive function:
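import pandas as pd

def elasticToDataframe(elasticResult, aggStructure, record=None, fulllist=None):
    # Use None defaults: mutable defaults ({} / []) would leak rows between calls.
    record = {} if record is None else record
    fulllist = [] if fulllist is None else fulllist
    for agg in aggStructure:
        buckets = elasticResult[agg['key']]['buckets']
        for bucket in buckets:
            record = record.copy()  # copy so each emitted row is its own dict
            record[agg['key']] = bucket['key']
            if 'aggs' in agg:
                # Descend into the nested aggregation; rows are appended to fulllist.
                elasticToDataframe(bucket, agg['aggs'], record, fulllist)
            else:
                # Leaf level: copy the requested values and emit one row.
                for var in agg['variables']:
                    record[var['dfName']] = bucket[var['elasticName']]
                fulllist.append(record)
    return pd.DataFrame(fulllist)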
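Then call the function with your data (x1) and a properly configured aggStructure list. The nesting of the data must be mirrored in this structure: each level names its aggregation under 'key', intermediate levels list their children under 'aggs', and the innermost level lists the values to extract under 'variables'.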
aggStructure = [
    {'key': 'xColor', 'aggs': [
        {'key': 'xMake', 'aggs': [
            {'key': 'xCity', 'variables': [
                {'elasticName': 'doc_count', 'dfName': 'count'}]}]}]}]
elasticToDataframe(x1, aggStructure)
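If every level is a plain bucket aggregation and only one value is needed from the innermost buckets, the configuration can shrink to a list of bucket names, which is what the question asked for. The following is a minimal sketch under those assumptions, not part of the original answer: the function name flatten_buckets, its metric parameter, and the small sample dict (a cut-down copy of x1) are made up for illustration.

```python
import pandas as pd

def flatten_buckets(result, names, metric="doc_count"):
    """Yield one flat dict per innermost bucket (hypothetical helper).

    Assumes each name in `names` maps to a dict holding a 'buckets' list,
    nested in the order given.
    """
    name, rest = names[0], names[1:]
    for bucket in result[name]["buckets"]:
        if rest:
            # Not at the deepest level yet: recurse and prefix this level's key.
            for row in flatten_buckets(bucket, rest, metric):
                yield {name: bucket["key"], **row}
        else:
            # Deepest level: emit the key together with the requested value.
            yield {name: bucket["key"], metric: bucket[metric]}

# Cut-down sample with the same shape as x1 (illustrative data only).
sample = {"xColor": {"buckets": [
    {"key": "red", "doc_count": 4, "xMake": {"buckets": [
        {"key": "honda", "doc_count": 3, "xCity": {"buckets": [
            {"key": "ROME", "doc_count": 2},
            {"key": "Paris", "doc_count": 1}]}},
        {"key": "bmw", "doc_count": 1, "xCity": {"buckets": [
            {"key": "Paris", "doc_count": 1}]}}]}}]}}

rows = list(flatten_buckets(sample, ["xColor", "xMake", "xCity"]))
df = pd.DataFrame(rows)
print(df)
```

With the full data the call would be flatten_buckets(x1, ['xColor', 'xMake', 'xCity']). Because the function is a generator, rows are built lazily and no list is shared between calls.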
Cheers