Where does Hive store files in HDFS?

Question

I'd like to know how to find the mapping between Hive tables and the actual HDFS files (or rather, directories) that they represent. I need to access the table files directly.

Where does Hive store its files in HDFS?

Answer

The location where they are stored on HDFS is fairly easy to figure out once you know where to look. :)

If you go to http://NAMENODE_MACHINE_NAME:50070/ in your browser, it should take you to a page with a "Browse the filesystem" link.

In the $HIVE_HOME/conf directory there is a hive-default.xml and/or hive-site.xml, which contains the hive.metastore.warehouse.dir property. That value is where you will want to navigate to after clicking the "Browse the filesystem" link.
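
If you would rather not dig through the XML, you can also ask Hive itself. A minimal sketch in HiveQL (run from the Hive CLI or Beeline; my_table is a placeholder for one of your own tables):

    -- Print the effective warehouse directory from the running configuration
    SET hive.metastore.warehouse.dir;

    -- Show a table's metadata, including a "Location:" line with its exact HDFS path
    DESCRIBE FORMATTED my_table;

The Location reported by DESCRIBE FORMATTED is the more reliable thing to check, since external tables can live outside the warehouse directory entirely.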

In mine, it's /usr/hive/warehouse. Once I navigate to that location, I see the names of my tables. Clicking on a table name (which is just a folder) will then expose the partitions of the table. In my case, it is currently only partitioned on date. When I click on the folder at this level, I then see the files themselves (deeper partitioning means more levels of folders). These files are where the data is actually stored on HDFS.
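
The same table/partition layout can be confirmed from HiveQL without clicking through the web UI. A rough sketch, where my_table and the dt partition column are placeholder names:

    -- List the partitions of a table (each one maps to a subdirectory)
    SHOW PARTITIONS my_table;

    -- Show the HDFS directory backing a single partition
    DESCRIBE FORMATTED my_table PARTITION (dt='2014-01-01');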

I have not attempted to access these files directly, but I assume it can be done. I would take GREAT care if you are thinking about editing them. :) For me, I'd figure out a way to do what I need without direct access to the Hive data on disk. If you need access to the raw data, you can use a Hive query and output the result to a file. These will have the exact same structure (delimiter between columns, etc.) as the files on HDFS. I do queries like this all the time and convert them to CSVs.

The section about how to write data from queries to disk is https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML#LanguageManualDML-Writingdataintothefilesystemfromqueries
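
As a rough sketch of what that manual section describes, something along these lines exports a query result as delimited text; the output path and table name are placeholders, and specifying ROW FORMAT on a directory insert needs Hive 0.11 or later:

    -- Write the query result to a local directory as comma-delimited text files
    INSERT OVERWRITE LOCAL DIRECTORY '/tmp/my_table_export'
    ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
    SELECT * FROM my_table;

    -- Drop LOCAL to write to an HDFS directory instead of the local filesystem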

UPDATE

Since Hadoop 3.0.0-alpha1 the default port numbers have changed: NAMENODE_MACHINE_NAME:50070 became NAMENODE_MACHINE_NAME:9870. Use the latter if you are running on Hadoop 3.x. The full list of port changes is described in HDFS-9427.
