Creating DataFrame from ElasticSearch Results
Question
I am trying to build a DataFrame in pandas from the results of a very basic Elasticsearch query. I am getting the data I need, but the problem is slicing the results in a way that builds the proper DataFrame. I really only care about the timestamp and path of each result. I have tried a few different es.search patterns.
Code:
from datetime import datetime
from elasticsearch import Elasticsearch
from pandas import DataFrame, Series
import pandas as pd
import matplotlib.pyplot as plt
es = Elasticsearch(host="192.168.121.252")
res = es.search(index="_all", doc_type='logs', body={"query": {"match_all": {}}}, size=2, fields=('path','@timestamp'))
This returns a dictionary with four top-level keys: [u'hits', u'_shards', u'took', u'timed_out']. My results are inside hits.
res['hits']['hits']
Out[47]:
[{u'_id': u'a1XHMhdHQB2uV7oq6dUldg',
u'_index': u'logstash-2014.08.07',
u'_score': 1.0,
u'_type': u'logs',
u'fields': {u'@timestamp': u'2014-08-07T12:36:00.086Z',
u'path': u'app2.log'}},
{u'_id': u'TcBvro_1QMqF4ORC-XlAPQ',
u'_index': u'logstash-2014.08.07',
u'_score': 1.0,
u'_type': u'logs',
u'fields': {u'@timestamp': u'2014-08-07T12:36:00.200Z',
u'path': u'app1.log'}}]
The only things I care about are the timestamp and path for each hit.
res['hits']['hits'][0]['fields']
Out[48]:
{u'@timestamp': u'2014-08-07T12:36:00.086Z',
u'path': u'app2.log'}
I cannot for the life of me figure out how to get that result into a pandas DataFrame. For the 2 results returned, I would expect a DataFrame like this:
                  timestamp      path
0  2014-08-07T12:36:00.086Z  app2.log
1  2014-08-07T12:36:00.200Z  app1.log
Recommended answer
There is a handy tool called pd.DataFrame.from_dict that you can use in situations like this:
In [34]:
Data = [{u'_id': u'a1XHMhdHQB2uV7oq6dUldg',
u'_index': u'logstash-2014.08.07',
u'_score': 1.0,
u'_type': u'logs',
u'fields': {u'@timestamp': u'2014-08-07T12:36:00.086Z',
u'path': u'app2.log'}},
{u'_id': u'TcBvro_1QMqF4ORC-XlAPQ',
u'_index': u'logstash-2014.08.07',
u'_score': 1.0,
u'_type': u'logs',
u'fields': {u'@timestamp': u'2014-08-07T12:36:00.200Z',
u'path': u'app1.log'}}]
In [35]:
df = pd.concat(map(pd.DataFrame.from_dict, Data), axis=1)['fields'].T
In [36]:
print(df.reset_index(drop=True))
@timestamp path
0 2014-08-07T12:36:00.086Z app2.log
1 2014-08-07T12:36:00.200Z app1.log
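As an aside, a shorter route that is not part of the original answer is to pull the 'fields' dictionary out of each hit and hand the resulting list of dictionaries straight to the DataFrame constructor. The sketch below assumes the hits have the same shape as the sample in the question (each 'fields' value is a flat dictionary of scalars); the variable name hits and the renaming of '@timestamp' to 'timestamp' are choices made here for illustration.

```python
import pandas as pd

# Sample hits copied from the question; in practice this would be res['hits']['hits'].
hits = [
    {'_id': 'a1XHMhdHQB2uV7oq6dUldg', '_index': 'logstash-2014.08.07',
     '_score': 1.0, '_type': 'logs',
     'fields': {'@timestamp': '2014-08-07T12:36:00.086Z', 'path': 'app2.log'}},
    {'_id': 'TcBvro_1QMqF4ORC-XlAPQ', '_index': 'logstash-2014.08.07',
     '_score': 1.0, '_type': 'logs',
     'fields': {'@timestamp': '2014-08-07T12:36:00.200Z', 'path': 'app1.log'}},
]

# A list of flat dictionaries becomes one row per dictionary.
df = pd.DataFrame([hit['fields'] for hit in hits])

# Optional: match the column name the question asked for.
df = df.rename(columns={'@timestamp': 'timestamp'})

print(df)
```

This avoids building one intermediate DataFrame per hit, which may matter when the query returns many results.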
The one-liner in In [35] breaks down into four steps:
1, Read each item in the list (each one is a dictionary) into a DataFrame.
2, Put all the items in the list into one big DataFrame by concatenating them side by side (column-wise, since the call passes axis=1). Because step 1 has to run for each item, we can use map to do it.
3, Then access the columns labeled 'fields'.
4, Transpose the DataFrame (rotate it 90 degrees) so that each hit becomes a row, and call reset_index if we want the index to be the default int sequence.
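The four steps can also be written out one statement at a time, which makes the intermediate shapes easier to inspect. This is a minimal sketch using the sample data from the question; the intermediate names (frames, wide, fields) are introduced here for illustration and do not appear in the original answer.

```python
import pandas as pd

# Sample hits copied from the question.
hits = [
    {'_id': 'a1XHMhdHQB2uV7oq6dUldg', '_index': 'logstash-2014.08.07',
     '_score': 1.0, '_type': 'logs',
     'fields': {'@timestamp': '2014-08-07T12:36:00.086Z', 'path': 'app2.log'}},
    {'_id': 'TcBvro_1QMqF4ORC-XlAPQ', '_index': 'logstash-2014.08.07',
     '_score': 1.0, '_type': 'logs',
     'fields': {'@timestamp': '2014-08-07T12:36:00.200Z', 'path': 'app1.log'}},
]

# Step 1: one DataFrame per hit. The nested 'fields' dict supplies the
# row labels ('@timestamp', 'path'); the scalar values are broadcast.
frames = [pd.DataFrame.from_dict(hit) for hit in hits]

# Step 2: place the per-hit frames side by side (axis=1).
wide = pd.concat(frames, axis=1)

# Step 3: keep only the columns labeled 'fields' (one per hit).
fields = wide['fields']

# Step 4: transpose so each hit is a row, then replace the repeated
# 'fields' row labels with the default integer index.
df = fields.T.reset_index(drop=True)

print(df)
```

Printing frames[0] or wide along the way shows why the transpose is needed: after step 3 the hits are columns and the field names are rows.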