Elasticsearch with Google BigQuery

Question

I have event logs loaded into Elasticsearch and I visualize them with Kibana. The event logs are actually stored in a Google BigQuery table. Currently I dump them as JSON files to a Google Cloud Storage bucket and download those files to a local drive. Then I use Logstash to move the JSON files from the local drive into Elasticsearch.

Now I am trying to automate the process by connecting Google BigQuery and Elasticsearch directly. From what I have read, there is an output connector that sends data from Elasticsearch to Google BigQuery, but not the other way around. I am wondering whether I should upload the JSON files to a Kubernetes cluster and then connect that cluster to Elasticsearch.

Any help in this regard would be appreciated.

Recommended answer

Apache Beam has connectors for both BigQuery and Elasticsearch. I would definitely do this with Dataflow, because then you don't need to build a complex ETL process or staging storage. You can read the data from BigQuery using BigQueryIO.Read.from (if performance matters, take a look at BigQueryIO Read vs fromQuery) and load it into Elasticsearch using ElasticsearchIO.write().
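
The read-then-write pipeline described above might look roughly like the following Beam Java sketch. It is a minimal illustration, not a tested production job. It assumes the `beam-sdks-java-io-google-cloud-platform` and `beam-sdks-java-io-elasticsearch` artifacts are on the classpath. The table `my-project:logs.events`, the cluster address, the index `event-logs` and the type `_doc` are hypothetical placeholders. `ElasticsearchIO.write()` consumes a `PCollection<String>` of JSON documents, so each `TableRow` (which is a `Map`) is serialized with Jackson first.

```java
import com.fasterxml.jackson.core.JsonProcessingException;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.google.api.services.bigquery.model.TableRow;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.elasticsearch.ElasticsearchIO;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.MapElements;
import org.apache.beam.sdk.values.TypeDescriptors;

public class BigQueryToElasticsearch {

  // TableRow implements Map, so Jackson serializes it as a plain JSON object.
  private static final ObjectMapper MAPPER = new ObjectMapper();

  /** Converts one BigQuery row into the JSON document string that ElasticsearchIO.write() expects. */
  static String toJson(TableRow row) {
    try {
      return MAPPER.writeValueAsString(row);
    } catch (JsonProcessingException e) {
      throw new IllegalStateException("Could not serialize row " + row, e);
    }
  }

  public static void main(String[] args) {
    PipelineOptions options = PipelineOptionsFactory.fromArgs(args).withValidation().create();
    Pipeline pipeline = Pipeline.create(options);

    pipeline
        // Hypothetical table spec; the default (export-based) read needs a gs:// --tempLocation.
        .apply("ReadEventLogs", BigQueryIO.readTableRows().from("my-project:logs.events"))
        .apply("ToJson",
            MapElements.into(TypeDescriptors.strings()).via(BigQueryToElasticsearch::toJson))
        .apply("WriteToElasticsearch",
            ElasticsearchIO.write()
                .withConnectionConfiguration(
                    // Hypothetical cluster address, index name and document type.
                    ElasticsearchIO.ConnectionConfiguration.create(
                        new String[] {"http://localhost:9200"}, "event-logs", "_doc")));

    pipeline.run().waitUntilFinish();
  }
}
```

You would run it with the Dataflow runner, for example `--runner=DataflowRunner --project=... --tempLocation=gs://...`. Converting rows in a separate `MapElements` step keeps the Elasticsearch sink generic. You could also use that step to reshape or filter fields before indexing.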

See this example of how to read data from BigQuery in Dataflow:

https://github.com/GoogleCloudPlatform/professional-services/blob/master/examples/dataflow-bigquery-transpose/src/main/java/com/google/cloud/pso/pipeline/Pivot.java

And this one for indexing into Elasticsearch:

https://github.com/GoogleCloudPlatform/professional-services/tree/master/examples/dataflow-elasticsearch-indexer

UPDATED 2019-06-24

Earlier this year Google released the BigQuery Storage API. It improves the parallelism of extracting data from BigQuery and is natively supported by Dataflow. See https://beam.apache.org/documentation/io/built-in/google-bigquery/#storage-api for more details.

From the documentation:


BigQuery Storage API允许您直接访问表在BigQuery存储中。结果,您的管道可以比以前更快的速度从BigQuery存储中读取数据。

The BigQuery Storage API allows you to directly access tables in BigQuery storage. As a result, your pipeline can read from BigQuery storage faster than previously possible.
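
Switching the read to the Storage API should mostly be a matter of choosing the read method. The sketch below assumes a Beam release recent enough to provide `Method.DIRECT_READ` and `withSelectedFields` on `BigQueryIO.TypedRead`. The table name and the column names `event_time`, `event_name` and `user_id` are hypothetical. Selecting only the columns you need lets the Storage API skip the rest.

```java
import com.google.api.services.bigquery.model.TableRow;
import java.util.Arrays;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO.TypedRead;
import org.apache.beam.sdk.options.PipelineOptionsFactory;

public class StorageApiRead {

  /** Builds a BigQuery read that uses the Storage API (DIRECT_READ) instead of an export job. */
  static TypedRead<TableRow> eventLogRead(String tableSpec) {
    return BigQueryIO.readTableRows()
        .from(tableSpec)
        // Read table data directly through the BigQuery Storage API.
        .withMethod(TypedRead.Method.DIRECT_READ)
        // Hypothetical column names: only these columns are read from storage.
        .withSelectedFields(Arrays.asList("event_time", "event_name", "user_id"));
  }

  public static void main(String[] args) {
    Pipeline pipeline =
        Pipeline.create(PipelineOptionsFactory.fromArgs(args).withValidation().create());

    // Hypothetical table spec. Downstream, convert the rows to JSON strings
    // and write them with ElasticsearchIO.write().
    pipeline.apply("ReadEventLogs", eventLogRead("my-project:logs.events"));

    pipeline.run().waitUntilFinish();
  }
}
```

Unlike the default export-based read, DIRECT_READ does not stage an export in Cloud Storage first. This is what makes the extraction faster and more parallel.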
