Can Hive recursively descend into subdirectories without partitions or editing hive-site.xml?

Question

I have some web server logs that I'd like to query with Hive. The directory structure, in HDFS, looks like this:

/data/access/web1/2014/09
/data/access/web1/2014/09/access-20140901.log
[... etc ...]
/data/access/web1/2014/10
/data/access/web1/2014/10/access-20141001.log
[... etc ...]
/data/access/web2/2014/09
/data/access/web2/2014/09/access-20140901.log
[... etc ...]
/data/access/web2/2014/10
/data/access/web2/2014/10/access-20141001.log
[... etc ...]

I'm able to create an external table:

CREATE EXTERNAL TABLE access(
  host STRING,
  identity STRING,
  user STRING,
  time STRING,
  request STRING,
  status STRING,
  size STRING,
  referer STRING,
  agent STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
  "input.regex" = "([^ ]*) ([^ ]*) ([^ ]*) (-|\\[[^\\]]*\\]) ([^ \"]*|\"[^\"]*\") (-|[0-9]*) (-|[0-9]*)(?: ([^ \"]*|\"[^\"]*\") ([^ \"]*|\"[^\"]*\"))?",
  "output.format.string" = "%1$s %2$s %3$s %4$s %5$s %6$s %7$s %8$s %9$s")
LOCATION '/data/access/';

... though Hive doesn't descend into the subfolders unless I run the following commands before running the Hive query:

set hive.input.dir.recursive=true;
set hive.mapred.supports.subdirectories=true;
set hive.supports.subdirectories=true;
set mapred.input.dir.recursive=true;

I've seen other posts set these properties at the table level (e.g. "Issue creating Hive External table using tblproperties"):

TBLPROPERTIES ("hive.input.dir.recursive" = "TRUE", 
    "hive.mapred.supports.subdirectories" = "TRUE",
    "hive.supports.subdirectories" = "TRUE", 
    "mapred.input.dir.recursive" = "TRUE");

Unfortunately, this didn't work for me: the table doesn't return any records when I query it. I understand it's possible to set these properties in hive-site.xml, but I'd rather not make any changes that might impact other users if I don't need to.

Q) Is there a way to create a table that descends into the subdirectories without using partitions, making site-wide changes, or running those 4 commands every time?

Solution

Using Hive in HDInsight, I set the following two properties in the Hive session before creating the external table, and it works for me.

SET hive.mapred.supports.subdirectories=TRUE;
SET mapred.input.dir.recursive=TRUE;
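
As a minimal sketch of how these settings fit into a session, the block below sets both properties and then runs a query against the access table from the question. The aggregate query is only an illustration and is not part of the original answer; any query over the table would do.

```sql
-- Session-level settings from the answer above; they apply only to the current
-- Hive session and do not modify hive-site.xml.
SET hive.mapred.supports.subdirectories=TRUE;
SET mapred.input.dir.recursive=TRUE;

-- Illustrative query (an assumption, not from the original post): count requests
-- per HTTP status across all log files under /data/access/.
SELECT status, COUNT(*) AS requests
FROM access
GROUP BY status;
```

To avoid retyping the SET lines, one option with the classic Hive CLI is to put them in a per-user init file: the CLI loads $HOME/.hiverc at startup, or a file passed with hive -i. This assumes the Hive CLI is the client; other clients, such as Beeline or the HDInsight tooling, may not read that file.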
