Accessing Hive tables in Spark

Problem Description

I have a Hive 0.13 installation and have created custom databases. I have a Spark 1.1.0 single-node cluster built using mvn with the -Phive option. I want to access tables in these databases from a Spark application using HiveContext, but HiveContext always reads the local metastore created in the Spark directory. I have already copied hive-site.xml into the spark/conf directory.
Do I need to do any other configuration?

Recommended Answer

Step 1: Set up the latest version of Spark and build it with Hive support:

$ cd $SPARK_HOME; ./sbt/sbt -Phive assembly
$ cd $SPARK_HOME; ./sbt/sbt -Phive -Phive-thriftserver assembly

Executing these commands downloads the required jar files and adds them to the assembly by default, so nothing else needs to be added.

Step 2:
Copy hive-site.xml from your Hive cluster to your $SPARK_HOME/conf/ directory and add the properties listed below to that file:

<property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://MYSQL_HOST:3306/hive_{version}</value>
    <description>JDBC connect string for a JDBC metastore</description>
</property>
<property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
    <description>Driver class name for a JDBC metastore</description>
</property>
<property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>XXXXXXXX</value>
    <description>Username to use against metastore database</description>
</property>
<property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>XXXXXXXX</value>
    <description>Password to use against metastore database</description>
</property>

Step 3: Download the MySQL JDBC connector and add it to the Spark classpath. Open bin/compute-classpath.sh
and add the following line to that script:

CLASSPATH="$CLASSPATH:$PATH_TO_mysql-connector-java-5.1.10.jar"
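
With hive-site.xml pointing at the external metastore and the connector on the classpath, a HiveContext created inside a Spark application should use that metastore instead of creating a local one, which is what the original question asks for. A minimal sketch follows; the names my_database and my_table are placeholders, not names taken from the question.

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

object HiveTableAccess {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("HiveTableAccess"))

    // HiveContext reads $SPARK_HOME/conf/hive-site.xml, so with the metastore
    // properties above it talks to the external metastore, not a local one.
    val hiveContext = new HiveContext(sc)

    // my_database and my_table are placeholders for your own database and table.
    hiveContext.sql("USE my_database")
    hiveContext.sql("SELECT * FROM my_table LIMIT 10").collect().foreach(println)

    sc.stop()
  }
}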


How to retrieve data from Hive into Spark

Step 1:
Start all daemons with the following command:

start-all.sh

Step 2:
Start HiveServer2 (the Hive Thrift server) with the following command:

hive --service hiveserver2 & 

Step 3:
Start the Spark server with the following command:

start-spark.sh 

Finally, check whether these daemons have started by running the jps command; you should see processes like the following:

RunJar 
ResourceManager 
Master 
NameNode 
SecondaryNameNode 
Worker 
Jps 
JobHistoryServer 
DataNode 
NodeManager

Step 4:
Start the master with the following command:

./sbin/start-master.sh 

To stop the master, use the command below:

./sbin/stop-master.sh

Step 5:
Open a new terminal and start Beeline from the following path:

hadoop@localhost:/usr/local/hadoop/hive/bin$ beeline 

When it asks for input, pass the connection string listed below:

!connect jdbc:hive2://localhost:10000 hadoop "" org.apache.hive.jdbc.HiveDriver 

After that, configure Spark with the following commands.
Note: put these settings in a conf file so they don't need to be run every time:

set spark.master=spark://localhost:7077;
set hive.execution.engine=spark;
set spark.executor.memory=2g;    -- adjust the memory to suit your server
set spark.serializer=org.apache.spark.serializer.KryoSerializer;
set spark.io.compression.codec=org.apache.spark.io.LZFCompressionCodec;

When it asks for input again, pass the query you want to run to retrieve the data. Then open a browser and go to localhost:8080; you can see the Running Jobs and Completed Jobs there.
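
The same Thrift endpoint can also be queried programmatically over JDBC rather than through beeline. Below is a small Scala sketch, under the assumption that the Hive JDBC driver is on the application classpath; the table name is a placeholder.

import java.sql.DriverManager

object HiveThriftQuery {
  def main(args: Array[String]): Unit = {
    // Same driver class as in the beeline !connect command above.
    Class.forName("org.apache.hive.jdbc.HiveDriver")

    // Same host, port and user as the beeline step; the password is empty here.
    val conn = DriverManager.getConnection("jdbc:hive2://localhost:10000", "hadoop", "")
    try {
      val rs = conn.createStatement().executeQuery("SELECT * FROM my_table LIMIT 10")
      while (rs.next()) {
        println(rs.getString(1)) // print the first column of each row
      }
    } finally {
      conn.close()
    }
  }
}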
