How to regex Apache log date and time into Hive
I want to load my log files into Hive (Amazon Athena).
My regex is fine according to the tester: https://regex101.com/r/hF4fP8/11
My CREATE TABLE statement is this:
CREATE EXTERNAL TABLE IF NOT EXISTS webservicelogs.Test15 (
`day` int,
`month` string,
`year` int,
`hour` int,
`minute` int,
`second` int,
`offset` string
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES ('input.regex' = '\[(\d{2})\/([a-zA-Z]{3})\/(\d{4}):(\d{2}):(\d{2}):(\d{2})\s(\+\d{4})]' )
LOCATION 's3://getag-athena/Test/'
TBLPROPERTIES ('has_encrypted_data'='false')
The CREATE TABLE statement itself runs without errors.
But when I select from the table, this error occurs:
SELECT * FROM "webservicelogs"."test15" limit 10;
Your query has the following error(s):
HIVE_CURSOR_ERROR: Number of matching groups doesn't match the number of columns
The log files I want to parse look like this:
85.239.101.101 - - [07/Jan/2016:01:00:00 +0100] "POST /bpwsortsinfo1-3/services/Ortsinfo?wsdl HTTP/1.1" 200 467 "-" "Axis2" 449/1883 23 BP7 0
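One thing worth checking here is the difference between a partial match and a whole-line match. An online tester reports a hit when the pattern is found anywhere in the line, whereas RegexSerDe, as far as I know, requires the pattern to match the entire row. The sketch below uses Python's `re` module as a stand-in for Java's regex engine (an assumption; the pattern uses only syntax common to both) to show that the pattern from the table definition is found in the sample line but does not cover all of it.

```python
import re

# Pattern from the table definition, as the regex engine sees it
# (single backslashes).
pattern = re.compile(
    r"\[(\d{2})\/([a-zA-Z]{3})\/(\d{4}):(\d{2}):(\d{2}):(\d{2})\s(\+\d{4})]"
)

# Sample log line from the question.
line = (
    '85.239.101.101 - - [07/Jan/2016:01:00:00 +0100] '
    '"POST /bpwsortsinfo1-3/services/Ortsinfo?wsdl HTTP/1.1" '
    '200 467 "-" "Axis2" 449/1883 23 BP7 0'
)

partial = pattern.search(line)   # what an online tester reports
whole = pattern.fullmatch(line)  # assumed SerDe behaviour: whole-line match

print(pattern.groups)    # number of capture groups in the pattern
print(partial.groups())  # date and time parts found inside the line
print(whole)             # None: the pattern does not cover the whole line
```

If that assumption holds, the pattern needs something in front of the `[` and after the `]` to consume the rest of the line.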
I answered this myself with help from a colleague.
Every \s has to be escaped with another backslash. More generally, every special character that is escaped has to be double-escaped; that's a Java thing.
(.*)\\s(.*)\\s(.*)\\s\\[(\\d{2})\\/([a-zA-Z]{3})\\/(\\d{4}):(\\d{2}):(\\d{2}):(\\d{2})\\s(\\+\\d{4})].*?$
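To sanity-check the double-escaped pattern outside Athena, here is a minimal sketch in Python. It assumes that exactly one level of backslash escaping is consumed before the regex engine sees the pattern, and it uses Python's `re` as a stand-in for Java's regex engine. The variable names are made up for illustration.

```python
import re

# The value as typed inside the DDL string literal (backslashes doubled).
ddl_literal = (
    r"(.*)\\s(.*)\\s(.*)\\s\\[(\\d{2})\\/([a-zA-Z]{3})\\/(\\d{4})"
    r":(\\d{2}):(\\d{2}):(\\d{2})\\s(\\+\\d{4})].*?$"
)

# Assumption: one level of escaping is stripped, so "\\s" becomes "\s".
regex_text = ddl_literal.replace("\\\\", "\\")
pattern = re.compile(regex_text)

# Sample log line from the question.
line = (
    '85.239.101.101 - - [07/Jan/2016:01:00:00 +0100] '
    '"POST /bpwsortsinfo1-3/services/Ortsinfo?wsdl HTTP/1.1" '
    '200 467 "-" "Axis2" 449/1883 23 BP7 0'
)

match = pattern.fullmatch(line)  # the whole line has to match

print(regex_text)      # the pattern after un-escaping
print(pattern.groups)  # number of capture groups
print(match.groups())  # one value per capture group
```

Note that this pattern has ten capture groups (three leading `(.*)` groups plus the seven date and time parts), while the table above declares seven columns. The answer does not show a revised table, so presumably either three columns were added for the leading fields or those groups would need to be made non-capturing, e.g. `(?:.*)`.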