Where does Hive store files in HDFS?


Question


I'd like to know how to find the mapping between Hive tables and the actual HDFS files (or rather, directories) that they represent. I need to access the table files directly.

Where does Hive store its files in HDFS?

Solution

Where they are stored in HDFS is fairly easy to figure out once you know where to look. :)

If you go to http://NAMENODE_MACHINE_NAME:50070/ in your browser, it should take you to a page with a Browse the filesystem link. (Port 50070 is the NameNode web UI default through Hadoop 2.x; Hadoop 3.x defaults to 9870.)

In the $HIVE_HOME/conf directory there is a hive-default.xml and/or hive-site.xml file that sets the hive.metastore.warehouse.dir property. That value is the path you will want to navigate to after clicking the Browse the filesystem link.
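If you would rather read the property programmatically than open the file by hand, here is a minimal sketch in Python. It assumes the usual Hadoop-style layout of `<property>` elements with `<name>` and `<value>` children; the `read_hive_property` helper and the sample XML are made up for illustration, not part of Hive.

```python
import xml.etree.ElementTree as ET


def read_hive_property(xml_data, name):
    """Return the value of a Hadoop-style <property> entry, or None if absent.

    xml_data may be str or bytes (hypothetical helper, not a Hive API).
    """
    root = ET.fromstring(xml_data)
    for prop in root.iter("property"):
        if prop.findtext("name", default="").strip() == name:
            value = prop.findtext("value")
            return value.strip() if value is not None else None
    return None


# Sample content standing in for $HIVE_HOME/conf/hive-site.xml (illustrative only).
SAMPLE = """<configuration>
  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>/usr/hive/warehouse</value>
  </property>
</configuration>"""

print(read_hive_property(SAMPLE, "hive.metastore.warehouse.dir"))
```

For a real installation you would pass in the contents of hive-site.xml read in binary mode, for example `open(path, "rb").read()`, where `path` points at your own conf directory.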

In mine, it's /usr/hive/warehouse (the stock default is /user/hive/warehouse). Once I navigate to that location, I see the names of my tables. Clicking on a table name (which is just a folder) exposes the partitions of the table. In my case, I currently only have it partitioned on date. When I click on a folder at this level, I see the files (more partitioning means more levels). These files are where the data is actually stored in HDFS.
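The folder layout described above can be sketched as a small path builder. This is only an approximation for managed tables in their default location: tables in a database other than `default` sit under a `<database>.db` folder, and each partition level is a `key=value` folder. External tables or tables created with an explicit LOCATION can live anywhere, and Hive escapes special characters in partition values, which this sketch does not handle. The `table_path` helper, the `events` table, and the date value are hypothetical.

```python
import posixpath


def table_path(warehouse_dir, table, database="default", partitions=None):
    """Build the expected HDFS directory for a managed Hive table or partition.

    partitions is an ordered list of (column, value) pairs, outermost first.
    """
    parts = [warehouse_dir]
    if database != "default":
        # Non-default databases get their own "<name>.db" folder.
        parts.append(database + ".db")
    parts.append(table)
    for key, value in (partitions or []):
        # Each partition level is a "column=value" folder.
        parts.append(f"{key}={value}")
    return posixpath.join(*parts)


print(table_path("/usr/hive/warehouse", "events",
                 partitions=[("date", "2013-01-15")]))
```

If the computed path does not match what you see in the filesystem browser, the table was probably created with a custom location.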

I have not tried to access these files directly, though I assume it can be done. I would take GREAT care if you are thinking about editing them. :) Personally, I'd find a way to do what I need without direct access to the Hive data on disk. If you need the raw data, you can run a Hive query and write the result to a file. The output will have the exact same structure (divider between columns, etc.) as the files in HDFS. I do queries like this all the time and convert them to CSVs.
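As a rough illustration of the CSV conversion step, the sketch below assumes Hive's default text format: fields separated by the Ctrl-A character (`\x01`) and NULL written as `\N`. Your tables may use different delimiters, so treat both as parameters. The `hive_text_to_csv` helper and the sample rows are made up for illustration.

```python
import csv
import io


def hive_text_to_csv(lines, delimiter="\x01", null_marker="\\N"):
    """Convert delimited Hive text rows to CSV text, mapping NULLs to empty fields."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for line in lines:
        fields = line.rstrip("\n").split(delimiter)
        writer.writerow(["" if f == null_marker else f for f in fields])
    return out.getvalue()


# Two made-up rows in the assumed default format; the second has a NULL name.
sample = ["1\x01alice\x012013-01-15\n", "2\x01\\N\x012013-01-16\n"]
print(hive_text_to_csv(sample), end="")
```

Using the `csv` module rather than a plain string replace means values that themselves contain commas or quotes are quoted correctly in the output.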

The section about how to write data from queries to disk is https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML#LanguageManualDML-Writingdataintothefilesystemfromqueries
