记录所有presto查询 [英] Logging all presto queries
问题描述
如何将提交给presto群集的所有查询存储在一个文件(ORC文件)中,或者存储在其他数据库中.目的是保留对presto worker执行的所有查询的记录.
How can I store all queries submitted to presto cluster in a file (ORC file) or may be some other database. Purpose is the keep the record of all queries executed on presto workers.
我知道我需要覆盖queryCompleted方法,我还尝试遵循
I am aware that I need to overwrite queryCompleted method, I have also tried to follow this and other link mentioned over there but I am unable to create correct jar using maven. After placing the presto jar file generated by maven, my presto stopped working.
我是新手,无论是专家还是专家.如果有人可以帮助我,那就太好了.
I am new to presto as well as in maven. It would be great if someone can help me with this.
推荐答案
这是我的方法,它可以在EMR5.9(0.184之前)上工作.
This is my way, and It works on EMR5.9 (presto 0.184).
首先,您已经知道,可以使用事件侦听器. 就我而言,我使用 https://github.com/wyukawa/presto-fluentd 进行收集查询日志是因为fluentd很方便.(易于重试,易于发送 到多个数据存储) 如果您想创建新的事件监听器插件,也可以引用它,因为它非常简单. (或 https://github.com/zz22394/presto-audit 也可以使用它)
Firstly, as you already know, you can use event-listener. In my case, I use https://github.com/wyukawa/presto-fluentd for collecting query logs because fluentd is convenient.(easy to retry, easy to send to multiple data store) if you want to create new event-listener plugin, also you can reference this because it's very simple. (or https://github.com/zz22394/presto-audit can also use for it)
接下来,您必须安装事件侦听器插件. 如果使用EMR,则可以使用此脚本在
Next, you have to install event-listener plugin.
If you use EMR, you can use this script for installing presto-fluentd
on bootstrap actions
# cf. https://github.com/mozilla/emr-bootstrap-presto/blob/master/files/bootstrap/presto-plugins.sh
#!/bin/bash
set -exo pipefail
# re-exec with sudo into background
if [ $(whoami) != root ]; then
sudo "$0" "$@" &
exit 0
fi
# set variables
s3uri=$1
fluentd_endpoint=$2
# wait until presto is installed and running
until test -s /var/run/presto/presto-server.pid; do sleep 1; done
# make symbolic link
sudo mkdir -p /usr/lib/presto/etc 2>/dev/null
sudo ln -s /usr/lib/presto/etc /mnt/var/lib/presto/data
# download presto plugins
aws s3 sync $s3uri/jar/ /usr/lib/presto/plugin/
aws s3 sync $s3uri/properties /usr/lib/presto/etc/
# make sure all plugins are owned by presto user
chown -R presto:presto /usr/lib/presto/plugin
chown -R presto:presto /usr/lib/presto/etc
# set event-listner.properties endpoint parameter
echo "event-listener.fluentd-host=$fluentd_endpoint" >>
/usr/lib/presto/etc/event-listener.properties
# restart presto
stop presto-server
start presto-server
event-listener.properties:
event-listener.properties:
event-listener.name=presto-fluentd
event-listener.fluentd-port=24224
event-listener.fluentd-tag=presto.query
在s3目录中:
$ aws s3 ls s3://<s3 bucket>/emr/bootstrap_actions/plugins/jar/presto-fluentd/
2017-10-30 19:12:59 90318 fluency-1.3.0.jar
2017-10-30 19:12:59 2521113 guava-21.0.jar
2017-10-30 19:12:59 55783 jackson-annotations-2.8.1.jar
2017-10-30 19:12:59 252303 jackson-core-2.7.1.jar
2017-10-30 19:12:59 1199160 jackson-databind-2.7.1.jar
2017-10-30 19:12:59 30488 jackson-dataformat-msgpack-0.8.12.jar
2017-10-30 19:12:59 3907 log-0.148.jar
2017-10-30 19:12:59 116125 msgpack-core-0.8.12.jar
2017-10-30 19:12:59 5509 phi-accural-failure-detector-0.0.4.jar
2017-10-30 19:12:59 6130 presto-fluentd-0.0.1.jar
2017-10-30 19:12:59 41077 slf4j-api-1.7.22.jar
$ aws s3 ls s3://<s3 bucket>/emr/bootstrap_actions/plugins/properties/
2017-10-30 19:12:59 109 event-listener.properties
并且仅通过熟练地在如下所示的另一台主机上工作即可接收查询日志
and just receive query logs by fluentd working on another host like below
<match presto.query>
@type copy
<store>
# another data store
</store>
<store>
@type relabel
@label @presto-query-storage
</store>
</match>
# In my case, I use bigquery for storing query log
<label @presto-query-storage>
<match **>
@label @presto-bigquery-out
@type record_reformer
renew_record true
tag presto.query_storage.big_query
<record>
query_id ${record["queryId"]}
user_name ${record["user"]}
elapsed_time ${(record["endTime"] - record["createTime"]) / 1000.0}
start_at
${Time.at(record["executionStartTime"]/1000).utc.strftime("%Y-%m-%d %H:%M:%S.%3N")}
end_at ${Time.at(record["endTime"]/1000).utc.strftime("%Y-%m-%d %H:%M:%S")}
query ${record["query"]}
status ${record["state"]}
</record>
</match>
</label>
提示
我使用此脚本来收集presto-fluentd的依赖项.
Tips
I use this script for collecting dependencies of presto-fluentd.
require 'fileutils'
require 'open3'
include FileUtils
TMP_PATH = File.expand_path('../../tmp', __FILE__)
JAR_PATH = File.expand_path('../bootstrap_actions/plugins/jar', __FILE__)
CLONE_URI = 'https://github.com/wyukawa/presto-fluentd'
NEEDED_JAR = %w(
fluency-1.3.0.jar
guava-21.0.jar
jackson-annotations-2.8.1.jar
jackson-core-2.7.1.jar
jackson-databind-2.7.1.jar
jackson-dataformat-msgpack-0.8.12.jar
log-0.148.jar
msgpack-core-0.8.12.jar
phi-accural-failure-detector-0.0.4.jar
presto-fluentd-0.0.1.jar
slf4j-api-1.7.22.jar
)
def cleanup_dir
puts "Clean up #{TMP_PATH}/presto-fluentd ..."
rm_r(Dir.glob("#{TMP_PATH}/presto-fluentd"))
mkdir_p("#{JAR_PATH}/presto-fluentd")
puts "Clean up #{JAR_PATH}/presto-fluentd ..."
rm(Dir.glob("#{JAR_PATH}/presto-fluentd/*.jar"))
end
def clone
cd(TMP_PATH)
puts "Download presto-fluentd repo ..."
out, err, status = Open3.capture2("git clone #{CLONE_URI} #{TMP_PATH}/presto-fluentd")
puts out
end
def mvn
cd("#{TMP_PATH}/presto-fluentd")
puts "Build presto-fluentd ..."
out, err, status = Open3.capture2("mvn clean package")
puts out
out, err, status = Open3.capture2("mvn dependency:copy-dependencies -DoutputDirectory=target -DincludeScope=runtime")
puts out
end
def copy_dependencies
cd("#{TMP_PATH}/presto-fluentd/target")
puts "Copy jar files to #{JAR_PATH} ..."
# FIXME: it's better to fix actual pom.xml for assign scope
mv(Dir.glob("*.jar").select{|file| NEEDED_JAR.include?(file)}, "#{JAR_PATH}/presto-fluentd")
puts "done !!"
end
cleanup_dir
clone
mvn
copy_dependencies
这篇关于记录所有presto查询的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!