Where does Hive store files in HDFS?
Question
I'd like to know how to find the mapping between Hive tables and the actual HDFS files (or rather, directories) that they represent. I need to access the table files directly.

Where does Hive store its files in HDFS?
Answer
The location they are stored on HDFS is fairly easy to figure out once you know where to look. :)
If you go to http://NAMENODE_MACHINE_NAME:50070/ in your browser, it should take you to a page with a Browse the filesystem link.
In the $HIVE_HOME/conf directory there is hive-default.xml and/or hive-site.xml, which has the hive.metastore.warehouse.dir property. That value is where you will want to navigate to after clicking the Browse the filesystem link.
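If you would rather not click through the web UI, the property can also be pulled out of hive-site.xml from the command line. A minimal sketch, assuming a simple config file; the sample file and /tmp path below are illustrative only, so point the extraction at your real $HIVE_HOME/conf/hive-site.xml:

```shell
# Create a tiny sample hive-site.xml purely for demonstration.
CONF=/tmp/hive-site-demo.xml
cat > "$CONF" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>/user/hive/warehouse</value>
  </property>
</configuration>
EOF

# Grab the <value> line that follows the property name. This is a crude,
# dependency-free extraction; xmllint --xpath is more robust if installed.
WAREHOUSE_DIR=$(grep -A1 'hive.metastore.warehouse.dir' "$CONF" \
  | grep '<value>' | sed -e 's/.*<value>//' -e 's,</value>.*,,')
echo "$WAREHOUSE_DIR"
```

Running `SET hive.metastore.warehouse.dir;` inside the Hive CLI reports the same value, including any default not written to the file.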
In mine, it's /usr/hive/warehouse. Once I navigate to that location, I see the names of my tables. Clicking on a table name (which is just a folder) will then expose the partitions of the table. In my case, I currently only have it partitioned on date. When I click on the folder at this level, I then see files (more partitioning will have more levels). These files are where the data is actually stored on HDFS.
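The same browsing can be done from the command line with the hdfs dfs client, and Hive itself will report a table's exact directory. A sketch of a session, where my_table and the warehouse path are placeholders for your own cluster's values:

```shell
# List the tables (each one a directory) under the warehouse root.
$ hdfs dfs -ls /usr/hive/warehouse

# Drill into one table to see its partitions, then its data files.
$ hdfs dfs -ls /usr/hive/warehouse/my_table

# Or ask Hive directly; the "Location:" line in the output is the
# table's HDFS directory.
$ hive -e "DESCRIBE FORMATTED my_table;"
```

DESCRIBE FORMATTED is the more reliable route, since a table created with an explicit LOCATION clause may live outside the warehouse directory entirely.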
I have not attempted to access these files directly; I assume it can be done. I would take GREAT care if you are thinking about editing them. :)
For me, I'd figure out a way to do what I need without direct access to the Hive data on disk. If you need access to the raw data, you can use a Hive query and output the result to a file. These will have the exact same structure (delimiter between columns, etc.) as the files on HDFS. I do queries like this all the time and convert them to CSVs.
The section about how to write data from queries to disk is https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML#LanguageManualDML-Writingdataintothefilesystemfromqueries
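Following that DML manual section, such an export might look like the sketch below; the table name, output path, and delimiter are placeholders rather than anything from the original answer:

```sql
-- Write query results to a local directory as comma-delimited text;
-- drop LOCAL to write to an HDFS path instead.
INSERT OVERWRITE LOCAL DIRECTORY '/tmp/my_table_export'
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
SELECT * FROM my_table;
```

Note that INSERT OVERWRITE DIRECTORY replaces the target directory's contents, so point it at a scratch path.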
Update
Since Hadoop 3.0.0-alpha1 there is a change in the default port numbers: NAMENODE_MACHINE_NAME:50070 becomes NAMENODE_MACHINE_NAME:9870. Use the latter if you are running on Hadoop 3.x. The full list of port changes is described in HDFS-9427.