How to get the input file name as a column within a Hive query

Question

I have a Hive external table that is mapped to a directory. The directory contains several files.

I want to run a query that finds the name of the file containing the user "abc", something like:

 select file_name , usr from usrs_tables where usr = "abc"

But of course the data itself does not contain the file name.

In MapReduce I can get it with:

// Inside Mapper.map(); needs org.apache.hadoop.mapreduce.lib.input.FileSplit
FileSplit fileSplit = (FileSplit) context.getInputSplit();
String filename = fileSplit.getPath().getName();
System.out.println("File name " + filename);
System.out.println("Directory and file name " + fileSplit.getPath().toString());

How can I do this in Hive?

Recommended answer

Yes, you can retrieve the file in which each record was found by using the virtual column INPUT__FILE__NAME, for example:

select INPUT__FILE__NAME, id, name from users where ...;

This produces output like:

hdfs://localhost.localdomain:8020/user/hive/warehouse/users/users1.txt    2    user2
hdfs://localhost.localdomain:8020/user/hive/warehouse/users/users2.txt    42    john.doe

If necessary, use Hive's built-in string functions to trim the host and directories from the URI.
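As a minimal sketch of that trimming, applied to the query from the question: it assumes the table usrs_tables and column usr named there, and the aliases file_name and full_uri are only illustrative. regexp_extract with the pattern '[^/]+$' and group index 0 keeps just the last path segment of the URI.

```sql
-- Sketch only: usrs_tables and usr come from the question;
-- file_name and full_uri are illustrative aliases, not built-in names.
SELECT
  regexp_extract(INPUT__FILE__NAME, '[^/]+$', 0) AS file_name,  -- last path segment, e.g. users1.txt
  INPUT__FILE__NAME                              AS full_uri,   -- full hdfs:// URI of the source file
  usr
FROM usrs_tables
WHERE usr = 'abc';
```

Keeping the full URI alongside the short name can help when files in different directories share the same name.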

The documentation on virtual columns is here: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+VirtualColumns
