Logging all Presto queries


Question

How can I store all queries submitted to a Presto cluster in a file (an ORC file) or perhaps in some other database? The purpose is to keep a record of all queries executed on the Presto workers.

I know that I need to override the queryCompleted method. I have also tried to follow this and the other links mentioned there, but I am unable to build a correct JAR with Maven. After I deployed the Presto JAR file generated by Maven, Presto stopped working.

I am new to Presto as well as to Maven. It would be great if someone could help me with this.

Recommended answer

This is my approach, and it works on EMR 5.9 (Presto 0.184).

First, as you already know, you can use an event listener. In my case, I use https://github.com/wyukawa/presto-fluentd to collect query logs because fluentd is convenient (easy to retry, easy to send to multiple data stores). If you want to write a new event-listener plugin, you can also use it as a reference because it is very simple. (https://github.com/zz22394/presto-audit can serve the same purpose.)

Next, you have to install the event-listener plugin. If you use EMR, you can use this script to install presto-fluentd in a bootstrap action:

#!/bin/bash
# cf. https://github.com/mozilla/emr-bootstrap-presto/blob/master/files/bootstrap/presto-plugins.sh

set -exo pipefail

# re-exec with sudo into background
if [ "$(whoami)" != root ]; then
  sudo "$0" "$@" &
  exit 0
fi

# set variables
s3uri=$1
fluentd_endpoint=$2

# wait until presto is installed and running
until test -s /var/run/presto/presto-server.pid; do sleep 1; done

# make symbolic link
sudo mkdir -p /usr/lib/presto/etc 2>/dev/null
sudo ln -s /usr/lib/presto/etc /mnt/var/lib/presto/data

# download presto plugins
aws s3 sync "$s3uri/jar/" /usr/lib/presto/plugin/
aws s3 sync "$s3uri/properties" /usr/lib/presto/etc/

# make sure all plugins are owned by presto user
chown -R presto:presto /usr/lib/presto/plugin
chown -R presto:presto /usr/lib/presto/etc

# set event-listener.properties endpoint parameter
echo "event-listener.fluentd-host=$fluentd_endpoint" >> /usr/lib/presto/etc/event-listener.properties

# restart presto
stop  presto-server
start presto-server

event-listener.properties:

event-listener.name=presto-fluentd
event-listener.fluentd-port=24224
event-listener.fluentd-tag=presto.query
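
The bootstrap script appends `event-listener.fluentd-host` to this file, so the deployed file should end up with four keys. A minimal Ruby sketch of a pre-flight check follows. `REQUIRED_KEYS`, `parse_properties`, and `missing_keys` are hypothetical names, and treating all four keys as required is my assumption based on the files shown here, not on the presto-fluentd documentation.

```ruby
# Hypothetical pre-flight check for event-listener.properties.
# Assumption: these four keys (taken from the files in this answer) must all be present.
REQUIRED_KEYS = %w(
  event-listener.name
  event-listener.fluentd-host
  event-listener.fluentd-port
  event-listener.fluentd-tag
)

# Parse simple key=value lines, skipping blank lines and # comments.
def parse_properties(text)
  text.each_line.with_object({}) do |line, props|
    line = line.strip
    next if line.empty? || line.start_with?('#')
    key, value = line.split('=', 2)
    props[key.strip] = value.to_s.strip
  end
end

# Return the required keys that the given properties text does not define.
def missing_keys(text)
  REQUIRED_KEYS - parse_properties(text).keys
end
```

For example, `missing_keys(File.read('/usr/lib/presto/etc/event-listener.properties'))` would return an empty array once the bootstrap script has appended the host line.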

In the S3 directory:

$ aws s3 ls s3://<s3 bucket>/emr/bootstrap_actions/plugins/jar/presto-fluentd/
2017-10-30 19:12:59      90318 fluency-1.3.0.jar
2017-10-30 19:12:59    2521113 guava-21.0.jar
2017-10-30 19:12:59      55783 jackson-annotations-2.8.1.jar
2017-10-30 19:12:59     252303 jackson-core-2.7.1.jar
2017-10-30 19:12:59    1199160 jackson-databind-2.7.1.jar
2017-10-30 19:12:59      30488 jackson-dataformat-msgpack-0.8.12.jar
2017-10-30 19:12:59       3907 log-0.148.jar
2017-10-30 19:12:59     116125 msgpack-core-0.8.12.jar
2017-10-30 19:12:59       5509 phi-accural-failure-detector-0.0.4.jar
2017-10-30 19:12:59       6130 presto-fluentd-0.0.1.jar
2017-10-30 19:12:59      41077 slf4j-api-1.7.22.jar

$ aws s3 ls s3://<s3 bucket>/emr/bootstrap_actions/plugins/properties/
2017-10-30 19:12:59        109 event-listener.properties

Then just receive the query logs with fluentd running on another host, as below:

<match presto.query>
  @type copy
  <store>
    # another data store
  </store>

  <store>
    @type relabel
    @label @presto-query-storage
  </store>
</match>

# In my case, I use bigquery for storing query log
<label @presto-query-storage>
  <match **>
    @label @presto-bigquery-out
    @type record_reformer
    renew_record true
    tag presto.query_storage.big_query
    <record>
      query_id ${record["queryId"]}
      user_name ${record["user"]}
      elapsed_time ${(record["endTime"] - record["createTime"]) / 1000.0}
      start_at ${Time.at(record["executionStartTime"]/1000).utc.strftime("%Y-%m-%d %H:%M:%S.%3N")}
      end_at ${Time.at(record["endTime"]/1000).utc.strftime("%Y-%m-%d %H:%M:%S")}
      query ${record["query"]}
      status ${record["state"]}
    </record>
  </match>
</label>
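
The `<record>` block evaluates plain Ruby expressions for each event, so the mapping can be tried outside fluentd. The sketch below mirrors those expressions. `reformat_query_record` is a hypothetical helper, and it assumes the time fields arrive as integer epoch milliseconds.

```ruby
# Hypothetical helper mirroring the record_reformer expressions above.
# Assumption: createTime, executionStartTime and endTime are integer epoch milliseconds.
def reformat_query_record(record)
  {
    'query_id'     => record['queryId'],
    'user_name'    => record['user'],
    # total elapsed time in seconds, from query creation to completion
    'elapsed_time' => (record['endTime'] - record['createTime']) / 1000.0,
    'start_at'     => Time.at(record['executionStartTime'] / 1000).utc.strftime('%Y-%m-%d %H:%M:%S.%3N'),
    'end_at'       => Time.at(record['endTime'] / 1000).utc.strftime('%Y-%m-%d %H:%M:%S'),
    'query'        => record['query'],
    'status'       => record['state']
  }
end
```

With integer inputs the division by 1000 truncates, so `%3N` in `start_at` always prints `000`. Dividing by `1000.0` instead would keep the milliseconds.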

Tips

I use this script to collect the dependencies of presto-fluentd.

require 'fileutils'
require 'open3'
include FileUtils

TMP_PATH = File.expand_path('../../tmp', __FILE__)
JAR_PATH = File.expand_path('../bootstrap_actions/plugins/jar', __FILE__)
CLONE_URI = 'https://github.com/wyukawa/presto-fluentd'

NEEDED_JAR = %w(
  fluency-1.3.0.jar
  guava-21.0.jar
  jackson-annotations-2.8.1.jar
  jackson-core-2.7.1.jar
  jackson-databind-2.7.1.jar
  jackson-dataformat-msgpack-0.8.12.jar
  log-0.148.jar
  msgpack-core-0.8.12.jar
  phi-accural-failure-detector-0.0.4.jar
  presto-fluentd-0.0.1.jar
  slf4j-api-1.7.22.jar
)

def cleanup_dir
  puts "Clean up #{TMP_PATH}/presto-fluentd ..."
  rm_r(Dir.glob("#{TMP_PATH}/presto-fluentd"))
  mkdir_p("#{JAR_PATH}/presto-fluentd")

  puts "Clean up #{JAR_PATH}/presto-fluentd ..."
  rm(Dir.glob("#{JAR_PATH}/presto-fluentd/*.jar"))
end

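def clone
  mkdir_p(TMP_PATH)
  cd(TMP_PATH)

  puts "Download presto-fluentd repo ..."
  # capture3 returns stdout, stderr and exit status (capture2 returns only two values)
  out, err, status = Open3.capture3("git clone #{CLONE_URI} #{TMP_PATH}/presto-fluentd")
  puts out
  abort(err) unless status.success?
end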

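def mvn
  cd("#{TMP_PATH}/presto-fluentd")

  puts "Build presto-fluentd ..."
  # capture3 returns stdout, stderr and exit status (capture2 returns only two values)
  out, err, status = Open3.capture3("mvn clean package")
  puts out
  abort("mvn clean package failed") unless status.success?

  # copy the runtime dependencies next to the built plugin jar
  out, err, status = Open3.capture3("mvn dependency:copy-dependencies -DoutputDirectory=target -DincludeScope=runtime")
  puts out
  abort("mvn dependency:copy-dependencies failed") unless status.success?
end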

def copy_dependencies
  cd("#{TMP_PATH}/presto-fluentd/target")
  puts "Copy jar files to #{JAR_PATH} ..."

  # FIXME: it's better to fix actual pom.xml for assign scope
  mv(Dir.glob("*.jar").select{|file| NEEDED_JAR.include?(file)}, "#{JAR_PATH}/presto-fluentd")
  puts "done !!"
end


cleanup_dir
clone
mvn
copy_dependencies
