Why are there many spark-warehouse folders being created?


Question


I have installed Hadoop 2.8.1 on Ubuntu and then installed spark-2.2.0-bin-hadoop2.7 on it. I used spark-shell and created tables. Then I used beeline and created tables again. I have observed that three different folders named spark-warehouse were created:

1- spark-2.2.0-bin-hadoop2.7/spark-warehouse

2- spark-2.2.0-bin-hadoop2.7/bin/spark-warehouse

3- spark-2.2.0-bin-hadoop2.7/sbin/spark-warehouse

What exactly is spark-warehouse, and why is it created many times? Sometimes my spark-shell and beeline show different databases and tables, and sometimes they show the same ones. I don't understand what is happening.

Further, I did not install Hive, but I am still able to use beeline, and I can also access the databases through a Java program. How did Hive come to be on my machine? Please help me. I am new to Spark and installed it by following online tutorials.

Below is the Java code I was using to connect to Apache Spark through JDBC:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.sql.Statement;

public class HiveJdbcClient {
    private static String driverName = "org.apache.hive.jdbc.HiveDriver";

    public static void main(String[] args) throws SQLException {
        try {
            Class.forName(driverName);
        } catch (ClassNotFoundException e) {
            e.printStackTrace();
            System.exit(1);
        }
        Connection con = DriverManager.getConnection(
                "jdbc:hive2://10.171.0.117:10000/default", "", "");
        Statement stmt = con.createStatement();
        // ... run queries with stmt here ...
    }
}

Solution

What exactly is spark-warehouse, and why is it created many times?

Unless configured otherwise, Spark will create an internal Derby database named metastore_db along with a derby.log file. It looks like you have not changed that.

This is the default behavior, as pointed out in the documentation:

When not configured by the hive-site.xml, the context automatically creates metastore_db in the current directory and creates a directory configured by spark.sql.warehouse.dir, which defaults to the directory spark-warehouse in the current directory that the Spark application is started
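If you want spark-shell, beeline, and your own applications to share one set of databases and tables regardless of the directory they are launched from, you can pin both settings to absolute paths. Here is a minimal sketch in Java, assuming the hypothetical paths /opt/spark/warehouse and /opt/spark/metastore_db:

import org.apache.spark.sql.SparkSession;

public class FixedWarehouse {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("fixed-warehouse")
                // Pin the warehouse to an absolute path (assumed path) so it
                // no longer depends on the current working directory.
                .config("spark.sql.warehouse.dir", "/opt/spark/warehouse")
                // Pin the embedded Derby metastore the same way (assumed path).
                .config("javax.jdo.option.ConnectionURL",
                        "jdbc:derby:;databaseName=/opt/spark/metastore_db;create=true")
                .enableHiveSupport()
                .getOrCreate();

        spark.sql("SHOW DATABASES").show();
        spark.stop();
    }
}

The spark.sql.warehouse.dir property can also be set once in conf/spark-defaults.conf, and the metastore URL in a hive-site.xml, so that spark-shell and beeline pick them up automatically.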

Sometimes my spark shell and beeline show different databases and tables, and sometimes they show the same

You're starting those commands from different folders, so what you see is confined to the current working directory: each directory you launch from gets its own metastore_db and spark-warehouse.

I used beeline and created tables... How did Hive come to be on my machine?

It didn't. You're probably connecting to either the Spark Thrift Server, which is fully compatible with the HiveServer2 protocol and backed by the Derby database mentioned above, or you actually do have a HiveServer2 instance sitting at 10.171.0.117.
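If you want to check which server is actually answering on 10.171.0.117:10000, one option is to ask the JDBC driver for the product name and version through the standard DatabaseMetaData API. This is only a sketch; the exact strings returned depend on the server and driver version:

import java.sql.Connection;
import java.sql.DatabaseMetaData;
import java.sql.DriverManager;
import java.sql.SQLException;

public class WhichServer {
    public static void main(String[] args) throws SQLException {
        Connection con = DriverManager.getConnection(
                "jdbc:hive2://10.171.0.117:10000/default", "", "");
        // Standard JDBC metadata calls; the Hive driver forwards them to the server.
        DatabaseMetaData md = con.getMetaData();
        System.out.println(md.getDatabaseProductName());    // e.g. "Spark SQL" vs. "Apache Hive"
        System.out.println(md.getDatabaseProductVersion());
        con.close();
    }
}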

Anyway, the JDBC connection is not required here. You can use the SparkSession.sql function directly.
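For instance, the whole JDBC round trip above can be replaced by something like the following sketch (the query shown is just an example):

import org.apache.spark.sql.SparkSession;

public class DirectSql {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("direct-sql")
                .enableHiveSupport()  // reuse the same metastore/warehouse as spark-shell
                .getOrCreate();

        // Runs against the catalog directly; no Thrift server or JDBC driver involved.
        spark.sql("SHOW TABLES").show();
        spark.stop();
    }
}

This avoids the extra HiveServer2/Thrift hop entirely when the code runs as a Spark application.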
