How to regex Apache log date and time into Hive

Problem description

I want to load my log files into Hive (Amazon Athena).

My regex is fine according to the tester: https://regex101.com/r/hF4fP8/11

My CREATE TABLE statement is this:

CREATE EXTERNAL TABLE IF NOT EXISTS webservicelogs.Test15 (
         `day` int,
         `month` string,
         `year` int,
         `hour` int,
         `minute` int,
         `second` int,
         `offset` string 
) 
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES ('input.regex' = '\[(\d{2})\/([a-zA-Z]{3})\/(\d{4}):(\d{2}):(\d{2}):(\d{2})\s(\+\d{4})]' ) 
LOCATION 's3://getag-athena/Test/' 
TBLPROPERTIES ('has_encrypted_data'='false')

The CREATE TABLE statement works.

But when I select from the table, this error occurs:

SELECT * FROM "webservicelogs"."test15" limit 10;

Your query has the following error(s):

HIVE_CURSOR_ERROR: Number of matching groups doesn't match the number of columns

The log files I want to parse look like this:

85.239.101.101 - - [07/Jan/2016:01:00:00 +0100] "POST /bpwsortsinfo1-3/services/Ortsinfo?wsdl HTTP/1.1" 200 467 "-" "Axis2" 449/1883 23 BP7 0

Solution

I solved it myself with help from a colleague.

Every \s has to be escaped with another backslash. More precisely, every escaped special character has to be double-escaped, because the backslash is itself an escape character inside the quoted string; that is a Java thing. The regex that works:

(.*)\\s(.*)\\s(.*)\\s\\[(\\d{2})\\/([a-zA-Z]{3})\\/(\\d{4}):(\\d{2}):(\\d{2}):(\\d{2})\\s(\\+\\d{4})].*?$
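
To check the pattern outside Athena, here is a minimal Java sketch that compiles the same regex and applies it to the sample log line from the question. The class name ApacheLogRegexDemo and its helper methods are made up for illustration and are not part of Hive or Athena. The sketch assumes the SerDe hands the pattern to java.util.regex and requires the whole line to match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only.
class ApacheLogRegexDemo {
    // Same pattern as in the answer. As in the DDL string, every backslash is
    // doubled in the Java literal so the regex engine receives \s, \d, \[ and \+.
    static final Pattern LOG_LINE = Pattern.compile(
        "(.*)\\s(.*)\\s(.*)\\s\\[(\\d{2})\\/([a-zA-Z]{3})\\/(\\d{4})"
        + ":(\\d{2}):(\\d{2}):(\\d{2})\\s(\\+\\d{4})].*?$");

    // Number of capture groups defined by the pattern.
    static int groupCount() {
        return LOG_LINE.matcher("").groupCount();
    }

    // Captured groups for a full-line match, or an empty list if the line does not match.
    static List<String> parse(String line) {
        List<String> groups = new ArrayList<>();
        Matcher m = LOG_LINE.matcher(line);
        if (m.matches()) {
            for (int i = 1; i <= m.groupCount(); i++) {
                groups.add(m.group(i));
            }
        }
        return groups;
    }

    public static void main(String[] args) {
        String line = "85.239.101.101 - - [07/Jan/2016:01:00:00 +0100] "
            + "\"POST /bpwsortsinfo1-3/services/Ortsinfo?wsdl HTTP/1.1\" "
            + "200 467 \"-\" \"Axis2\" 449/1883 23 BP7 0";
        System.out.println(groupCount()); // prints 10
        System.out.println(parse(line));
        // prints [85.239.101.101, -, -, 07, Jan, 2016, 01, 00, 00, +0100]
    }
}
```

Note that this pattern has ten capture groups, while the table in the question declares seven columns. Since the error message compares the number of groups with the number of columns, the table presumably needs three additional leading columns (for example for the client address and the two "-" fields) before the query succeeds.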
