Accessing Hive tables in Spark
Problem description
I have a Hive 0.13 installation and have created custom databases in it. I have a Spark 1.1.0 single-node cluster built with the -Phive option. I want to access tables in these databases from a Spark application using HiveContext, but HiveContext always reads the local metastore created in the Spark directory, even though I have copied hive-site.xml into the spark/conf directory.
Do I need to do any other configuration?
Recommended answer
Step 1: Build Spark (latest version) with Hive support....
$ cd $SPARK_HOME; ./sbt/sbt -Phive assembly
$ cd $SPARK_HOME; ./sbt/sbt -Phive-thriftserver assembly
Executing this downloads the required jar files and includes them in the assembly by default, so nothing needs to be added manually....
Step 2:
Copy hive-site.xml
from your Hive cluster to your $SPARK_HOME/conf/ directory,
then edit the XML file and add the properties listed below:
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://MYSQL_HOST:3306/hive_{version}</value>
  <description>JDBC connect string for a JDBC metastore</description>
</property>
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
  <description>Driver class name for a JDBC metastore</description>
</property>
<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>XXXXXXXX</value>
  <description>Username to use against metastore database</description>
</property>
<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>XXXXXXXX</value>
  <description>Password to use against metastore database</description>
</property>
Step 3: Download the MySQL JDBC connector and add it to the SPARK CLASSPATH.
Open bin/compute-classpath.sh
and add the line below to that script:
CLASSPATH="$CLASSPATH:$PATH_TO_mysql-connector-java-5.1.10.jar"
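As an alternative to editing compute-classpath.sh, roughly the same effect can be had through conf/spark-defaults.conf; this is a sketch, and the jar path below is an assumption to be replaced with wherever the connector actually lives:

```
spark.driver.extraClassPath    /path/to/mysql-connector-java-5.1.10.jar
spark.executor.extraClassPath  /path/to/mysql-connector-java-5.1.10.jar
```

This keeps the classpath change out of Spark's scripts, so it survives upgrades.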
How to retrieve data from Hive into Spark....
Step 1:
Start all Hadoop daemons with the following command....
start-all.sh
Step 2:
Start HiveServer2 (the Hive Thrift server) with the following command....
hive --service hiveserver2 &
Step 3:
Start the Spark server with the following command....
start-spark.sh
Finally, check whether all of these have started by running the jps command; the output should list....
RunJar
ResourceManager
Master
NameNode
SecondaryNameNode
Worker
Jps
JobHistoryServer
DataNode
NodeManager
Step 4:
Start the master with the following command....
./sbin/start-master.sh
To stop the master, use the command below....
./sbin/stop-master.sh
Step 5:
Open a new terminal....
Start beeline from the following path....
hadoop@localhost:/usr/local/hadoop/hive/bin$ beeline
When it asks for input, pass the connection string listed below....
!connect jdbc:hive2://localhost:10000 hadoop "" org.apache.hive.jdbc.HiveDriver
After that, configure Spark with the following commands....
Note: put these settings in a conf file so you do not need to run them every time....
set spark.master=spark://localhost:7077;
set hive.execution.engine=spark;
set spark.executor.memory=2g;        -- size this to your server's memory
set spark.serializer=org.apache.spark.serializer.KryoSerializer;
set spark.io.compression.codec=org.apache.spark.io.LZFCompressionCodec;
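Following the note about putting these settings in a conf file, they can be made persistent as properties in Hive's hive-site.xml so beeline sessions pick them up automatically. This is a sketch mirroring the set commands, not configuration taken from the original answer:

```
<property>
  <name>spark.master</name>
  <value>spark://localhost:7077</value>
</property>
<property>
  <name>hive.execution.engine</name>
  <value>spark</value>
</property>
<property>
  <name>spark.executor.memory</name>
  <value>2g</value>
</property>
<property>
  <name>spark.serializer</name>
  <value>org.apache.spark.serializer.KryoSerializer</value>
</property>
```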
When it asks for input, pass the query whose data you want to retrieve. Then open a browser at the URL localhost:8080; you can see the Running Jobs and Completed Jobs in that UI....