Spark-sql CLI use only 1 executor when running query


Problem description

I am going to use the spark-sql CLI to replace the hive CLI shell, and I run spark-sql with the following command (we are on a YARN Hadoop cluster, and hive-site.xml has already been copied to /conf):

> spark-sql

The shell then opens and works fine.

Then I execute a query like:

spark-sql> select devicetype, count(*) from mytable group by devicetype;

The command executes successfully and the result is correct, but I notice the performance is very slow.

From the Spark job UI, http://myhost:4040, I noticed that only 1 executor is marked as used, so that may be the reason.

I tried modifying the spark-sql script and adding --num-executors 500 to the exec command, but it did not help.
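For context, executor settings are normally passed as command-line options at launch rather than by editing the launcher script. A minimal sketch using the standard spark-submit-style flags (the master URL, executor count, and memory value below are illustrative, and whether the spark-sql CLI honors them depends on the Spark version):

```shell
# Illustrative only: these are the standard flags accepted by
# spark-submit and the Spark shells; values here are examples.
./bin/spark-sql \
  --master yarn \
  --num-executors 4 \
  --executor-memory 2g
```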

Could anyone help and explain why?

Thanks.

Recommended answer

Reference: http://spark.apache.org/docs/latest/sql-programming-guide.html

spark-sql is an SQL CLI tool that works only in local mode; that is why you see only one executor.

If you want a cluster version of SQL, you should start the Thrift server and connect to it via JDBC, for example using the beeline tool that ships with Spark. You can find the description in the chapter "Running the Thrift JDBC/ODBC server" of the official documentation: http://spark.apache.org/docs/latest/sql-programming-guide.html

To start it:

export HIVE_SERVER2_THRIFT_PORT=<listening-port>
export HIVE_SERVER2_THRIFT_BIND_HOST=<listening-host>
./sbin/start-thriftserver.sh \
  --master <master-uri> \
  ...

To connect:

./bin/beeline
beeline> !connect jdbc:hive2://<listening-host>:<listening-port>
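Once connected, the original group-by query can be run through beeline and will execute on the cluster via the Thrift server. A sketch of such a session, reusing the host and table from the question and assuming the Thrift server's default port 10000 (adjust to whatever listening port you chose above):

```shell
./bin/beeline
beeline> !connect jdbc:hive2://myhost:10000
beeline> select devicetype, count(*) from mytable group by devicetype;
```

With this setup, the Spark application backing the Thrift server holds the executors, so the job should no longer be limited to a single one.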
