Spark-sql CLI uses only 1 executor when running a query


Problem description

I am going to use the spark-sql CLI to replace the Hive CLI shell. We are running on a YARN Hadoop cluster, and hive-site.xml has already been copied to /conf. I run the spark-sql CLI with the following command:

> spark-sql

The shell opens and works fine.

Then I execute a query like:

spark-sql> select devicetype, count(*) from mytable group by devicetype;

The command executes successfully and the result is correct, but I notice the performance is very slow.

In the Spark job UI at http://myhost:4040, I noticed that only 1 executor is marked as used, so that may be the reason.

I tried modifying the spark-sql script and adding --num-executors 500 to the exec command, but it does not help.

So could anyone help and explain why?

Thanks.

Recommended answer

See the documentation: http://spark.apache.org/docs/latest/sql-programming-guide.html

spark-sql is an SQL CLI tool that works only in local mode; that is why you see only one executor.

If you want a cluster version of SQL, you should start the Thrift server and connect to it via JDBC, for example with the beeline tool that ships with Spark. You can find the description in the chapter "Running the Thrift JDBC/ODBC server" of the official documentation: http://spark.apache.org/docs/latest/sql-programming-guide.html

To start:

export HIVE_SERVER2_THRIFT_PORT=<listening-port>
export HIVE_SERVER2_THRIFT_BIND_HOST=<listening-host>
./sbin/start-thriftserver.sh \
  --master <master-uri> \
  ...
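Since start-thriftserver.sh accepts the same command-line options as spark-submit, the executor settings the question asked about can be supplied at this step. A minimal sketch for a YARN cluster (the executor counts and memory below are illustrative placeholders, not recommendations):

```shell
# Illustrative launch on YARN; tune --num-executors, cores, and memory
# for your cluster. These values are examples only.
./sbin/start-thriftserver.sh \
  --master yarn \
  --num-executors 4 \
  --executor-cores 2 \
  --executor-memory 4g
```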

To connect:

./bin/beeline
beeline> !connect jdbc:hive2://<listening-host>:<listening-port>
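Once the Thrift server is up, the original query can also be run non-interactively with beeline's -u (JDBC URL) and -e (query) flags. The host and port below are placeholders for your server's listening address:

```shell
# Placeholder host/port; substitute your Thrift server's address.
# Runs the poster's query through the cluster-backed Thrift server.
./bin/beeline -u jdbc:hive2://localhost:10000 \
  -e "SELECT devicetype, COUNT(*) FROM mytable GROUP BY devicetype;"
```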
