Thorn character delimiter is not recognized in Hive
As mentioned in the post "Using the Icelandic Thorn character as a delimiter in Hive", the thorn character (þ) is not recognized as a field delimiter in Hive.
Sample table
CREATE EXTERNAL TABLE IF NOT EXISTS zzzzz_raw (
spot_id INT,
activity_type_id INT,
activity_type STRING,
activity_id INT,
activity_sub_type STRING,
report_name STRING,
tag_method_id INT
)
PARTITIONED BY ( dt DATE )
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\-2' LINES TERMINATED BY '\n'
STORED AS TEXTFILE
LOCATION '/raw/data/networkmatchtablesactivity/activity_cat';
Output
select * from activity_cat_raw limit 1;
4552126þ805759þeaasv101þ2275868þbfeaac01þBF_EA Access_Info Pageþ2 NULL NULL NULL NULL NULL NULL 2015-03-24
Am I missing something?
I found the answer. Instead of '-2' (the thorn delimiter), I used '-61' as the delimiter and then a substring to remove the additional symbol, something like below:
CREATE EXTERNAL TABLE IF NOT EXISTS SSSSSS (
spot_id STRING,
activity_type_id STRING,
activity_type STRING,
activity_id STRING,
activity_sub_type STRING,
report_name STRING,
tag_method_id STRING
)
PARTITIONED BY ( dt STRING )
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\-61' LINES TERMINATED BY '\n'
STORED AS TEXTFILE
LOCATION 'SSSSSS';
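A likely reason '\-61' matches where '\-2' does not, assuming the data files are UTF-8 encoded (the post does not state the encoding): the delimiter here is a single byte written as a signed decimal. The thorn is the single byte 0xFE (-2) only in Latin-1; in UTF-8 it is two bytes, 0xC3 0xBE. The Python sketch below only checks that arithmetic; `to_signed` is a helper introduced for illustration, not part of Hive.

```python
def to_signed(b):
    """Convert an unsigned byte (0..255) to a signed byte value (-128..127)."""
    return b - 256 if b > 127 else b

thorn = "\u00fe"  # the thorn character

# Signed byte values of the thorn in each encoding.
latin1_bytes = [to_signed(b) for b in thorn.encode("latin-1")]
utf8_bytes = [to_signed(b) for b in thorn.encode("utf-8")]

print(latin1_bytes)  # signed byte(s) of the thorn in Latin-1
print(utf8_bytes)    # signed byte(s) of the thorn in UTF-8
```

Under that assumption, '-61' is the first UTF-8 byte of the thorn, and the second byte (-66) is what remains at the start of the next field.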
Then use substr to strip the leftover symbol from the start of each affected field:
INSERT OVERWRITE TABLE vvvvvv PARTITION (dt)
SELECT spot_id,
substr(activity_type_id,2),
dt
FROM SSSSSS;
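The effect of the two delimiters can be simulated outside Hive. This is a rough sketch, not Hive's actual parser: it assumes UTF-8 input, uses a shortened version of the sample row, and `split_like_hive` is a hypothetical helper that splits on a single signed byte and decodes each piece, replacing invalid bytes with one placeholder character.

```python
def split_like_hive(raw, delim):
    """Split raw bytes on one signed-byte delimiter, then decode each field.

    Hypothetical stand-in for a single-byte field terminator; invalid UTF-8
    bytes are decoded as the replacement character U+FFFD.
    """
    parts = raw.split(bytes([delim & 0xFF]))
    return [p.decode("utf-8", errors="replace") for p in parts]

# Shortened sample row, encoded as UTF-8 (an assumption about the data files).
row = "4552126\u00fe805759\u00feeaasv101".encode("utf-8")

# '-2' (0xFE) never occurs in the UTF-8 bytes, so the row is not split.
with_minus_2 = split_like_hive(row, -2)

# '-61' (0xC3) splits the row, but each later field keeps a stray 0xBE byte.
with_minus_61 = split_like_hive(row, -61)

# Equivalent of substr(col, 2): drop the first character of every field
# except the first, which has no stray byte in front of it.
cleaned = [with_minus_61[0]] + [f[1:] for f in with_minus_61[1:]]

print(with_minus_2)
print(with_minus_61)
print(cleaned)
```

This matches the shape of the query above: spot_id, the first column, is selected unchanged, while later columns such as activity_type_id need substr(..., 2).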
Hope it helps.