Accessing Hive tables in Spark
Problem description

I have a Hive 0.13 installation and have created custom databases. I have a Spark 1.1.0 single-node cluster built with the Maven -Phive option. I want to access the tables in those databases from a Spark application using HiveContext, but HiveContext always reads the local metastore created in the Spark directory, even though I have copied hive-site.xml into the spark/conf directory.

Do I need to do any other configuration?

Solution

Step 1: Set up Spark with the latest version:
```
$ cd $SPARK_HOME; ./sbt/sbt -Phive assembly
$ cd $SPARK_HOME; ./sbt/sbt -Phive-thriftserver assembly
```
Executing these commands downloads the required jar files and adds them to the assembly by default, so nothing needs to be added manually.
Step 2:
Copy hive-site.xml from your Hive cluster to your $SPARK_HOME/conf/ directory, then edit the XML file and add the properties listed below:

```xml
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://MYSQL_HOST:3306/hive_{version}</value>
  <description>JDBC connect string for a JDBC metastore</description>
</property>
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
  <description>Driver class name for a JDBC metastore</description>
</property>
<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>XXXXXXXX</value>
  <description>Username to use against metastore database</description>
</property>
<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>XXXXXXXX</value>
  <description>Password to use against metastore database</description>
</property>
```
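A malformed hive-site.xml (for example, an unclosed tag) makes HiveContext silently fall back to a local metastore, so it is worth sanity-checking the edited file. A minimal sketch using only the Python standard library, with a placeholder fragment standing in for your real file:

```python
import xml.etree.ElementTree as ET

# Placeholder hive-site.xml fragment mirroring the properties above;
# in practice you would read $SPARK_HOME/conf/hive-site.xml instead.
HIVE_SITE = """
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://MYSQL_HOST:3306/hive_0_13</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
</configuration>
"""

def metastore_properties(xml_text):
    """Parse a hive-site.xml document and return {property name: value}."""
    root = ET.fromstring(xml_text)  # raises ParseError if the XML is malformed
    return {p.findtext("name"): p.findtext("value") for p in root.iter("property")}

props = metastore_properties(HIVE_SITE)
print(props["javax.jdo.option.ConnectionDriverName"])
```

If parsing raises a `ParseError`, fix the file before restarting Spark; a broken config is the most common reason the local metastore keeps being used.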
Step 3: Download the MySQL JDBC connector and add it to the Spark classpath. Open bin/compute-classpath.sh and add the line below to the script:

```
CLASSPATH="$CLASSPATH:$PATH_TO_mysql-connector-java-5.1.10.jar"
```
How to retrieve data from Hive into Spark:
Step 1:
Start all the daemons with the following command:

```
start-all.sh
```
Step 2:
Start HiveServer2 (the Hive Thrift server) with the following command:

```
hive --service hiveserver2 &
```
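Because the server is backgrounded, it can be useful to confirm it is actually accepting connections before moving on. A small stdlib sketch, assuming the default HiveServer2 host and Thrift port (localhost:10000):

```python
import socket

def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to (host, port) succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# 10000 is HiveServer2's default Thrift port; this prints False until
# the server has finished starting up.
print(port_open("localhost", 10000))
```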
Step 3:
Start the Spark server with the following command:

```
start-spark.sh
```

Finally, check whether everything has started by running jps; you should see entries such as:

```
RunJar
ResourceManager
Master
NameNode
SecondaryNameNode
Worker
Jps
JobHistoryServer
DataNode
NodeManager
```
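Eyeballing that list is error-prone, so a small script can diff the jps output against the daemons you expect (the expected set below simply mirrors the list above):

```python
# Daemons we expect to find in `jps` output, per the checklist above.
EXPECTED = {
    "ResourceManager", "NameNode", "SecondaryNameNode", "DataNode",
    "NodeManager", "JobHistoryServer", "Master", "Worker", "RunJar",
}

def missing_daemons(jps_output, expected=EXPECTED):
    """Given jps output lines like '1234 NameNode', return expected names not running."""
    running = {line.split()[-1] for line in jps_output.splitlines() if line.strip()}
    return sorted(expected - running)

# Sample jps output in which the Spark Worker has not yet started:
sample = ("2101 NameNode\n2245 DataNode\n2398 SecondaryNameNode\n"
          "2551 ResourceManager\n2684 NodeManager\n2833 JobHistoryServer\n"
          "2910 Master\n3001 RunJar\n3105 Jps")
print(missing_daemons(sample))  # → ['Worker']
```

In practice you would feed it the real output, e.g. `missing_daemons(subprocess.check_output(["jps"], text=True))`.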
Step 4:
Start the master with the following command:

```
./sbin/start-master.sh
```

To stop the master, use:

```
./sbin/stop-master.sh
```
Step 5:
Open a new terminal and start Beeline:

```
hadoop@localhost:/usr/local/hadoop/hive/bin$ beeline
```

When it asks for input, pass the connect string listed below (URL, username, empty password, driver class):

```
!connect jdbc:hive2://localhost:10000 hadoop "" org.apache.hive.jdbc.HiveDriver
```
After that, configure Spark with the following commands.

Note: put these settings in a conf file so they do not have to be run every session:

```
set spark.master=spark://localhost:7077;
set hive.execution.engine=spark;
set spark.executor.memory=2g;   -- size this to fit your server
set spark.serializer=org.apache.spark.serializer.KryoSerializer;
set spark.io.compression.codec=org.apache.spark.io.LZFCompressionCodec;
```
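As the note suggests, the Spark-side settings can live in a conf file instead of being re-entered each session; in a stock Spark install the natural place is $SPARK_HOME/conf/spark-defaults.conf (the values below simply repeat the settings above; hive.execution.engine stays in hive-site.xml, since it is a Hive property):

```
spark.master                 spark://localhost:7077
spark.executor.memory        2g
spark.serializer             org.apache.spark.serializer.KryoSerializer
spark.io.compression.codec   org.apache.spark.io.LZFCompressionCodec
```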
After it asks for input, pass the query you want to retrieve data with. Then open a browser at localhost:8080; you can see the running jobs and completed jobs in that UI.
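Coming back to the original question, once the metastore is wired up correctly, the tables can be read from a Spark application through HiveContext. The sketch below cannot run standalone: it assumes the Hive-enabled Spark 1.1 assembly from Step 1 with hive-site.xml in $SPARK_HOME/conf, and `custom_db` / `my_table` are placeholder names for your own database and table:

```python
# Sketch only: requires a Spark 1.1 install built with -Phive and a
# hive-site.xml pointing at the remote (MySQL-backed) metastore.
from pyspark import SparkContext
from pyspark.sql import HiveContext

sc = SparkContext("spark://localhost:7077", "hive-access-demo")
hive = HiveContext(sc)

# If this still shows only the default database, Spark is reading the
# local metastore and the hive-site.xml setup above needs revisiting.
hive.sql("USE custom_db")
for row in hive.sql("SELECT * FROM my_table LIMIT 10").collect():
    print(row)

sc.stop()
```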