Using the Icelandic Thorn character as a delimiter in Hive
Question
I'm currently trying to import some DoubleClick advertising logs into Hadoop.
These logs are stored in a gzipped, delimited file which is encoded using code page 1252 (Windows-ANSI?) and which uses the Icelandic Thorn character as a delimiter.
I can happily import these logs into a single column, but I can't seem to find a way to get Hive to understand the Thorn character - I think maybe because it doesn't understand the 1252 encoding?
I've looked at the Create Table documentation - http://hive.apache.org/docs/r0.9.0/language_manual/data-manipulation-statements.html - but can't seem to find any way to get this encoding/delimiter working.
I've also seen from https://karmasphere.com/karmasphere-analyst-faq a suggestion that the encoding for these files is ISO-8859-1 - but I don't see how to use that info in Hive or HDFS.
I know I can do a map job after import to split these rows into multiple records.
But is there an easier way to use this delimiter directly?
Thanks,
Stuart
Recommended answer
Use '\-2' as the delimiter: the char is a signed byte. In code page 1252 the thorn (þ) is byte 0xFE (254), which is -2 when read as a signed 8-bit value.
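As a rough illustration of where the -2 comes from, the Python sketch below encodes þ in cp1252, reinterprets the byte as signed, and drops the result into a `FIELDS TERMINATED BY` clause. The table name, column names, and helper functions are hypothetical and not part of Hive; only the '\-2' escape itself comes from the answer, and the DDL is an untested assumption about how it would be used.

```python
def signed_byte(char: str, encoding: str = "cp1252") -> int:
    """Return the single-byte encoding of `char` as a signed 8-bit value."""
    raw = char.encode(encoding)
    if len(raw) != 1:
        raise ValueError(f"{char!r} is not a single byte in {encoding}")
    value = raw[0]
    # Bytes above 127 wrap around to negative numbers when read as signed.
    return value - 256 if value > 127 else value


def hive_delimiter_literal(char: str, encoding: str = "cp1252") -> str:
    """Build a backslash escape such as \\-2 for a high (>127) byte."""
    value = signed_byte(char, encoding)
    if value >= 0:
        raise ValueError("this sketch only covers bytes above 127")
    return f"\\{value}"


# Hypothetical table and column names, for illustration only.
DDL_TEMPLATE = """CREATE TABLE doubleclick_logs (
  field_1 STRING,
  field_2 STRING
)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '{delim}'
STORED AS TEXTFILE;"""


def build_ddl(char: str, encoding: str = "cp1252") -> str:
    """Fill the delimiter escape into the hypothetical DDL template."""
    return DDL_TEMPLATE.format(delim=hive_delimiter_literal(char, encoding))


if __name__ == "__main__":
    print(signed_byte("þ"))   # thorn is 0xFE in cp1252
    print(build_ddl("þ"))
```

The same arithmetic applies under ISO-8859-1, where þ is also encoded as the single byte 0xFE.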
Apparently the Hive developers don't consider this a problem: https://issues.apache.org/jira/browse/HIVE-237