Using the Icelandic Thorn character as a delimiter in Hive

Views: 125 Posted: 2017/8/17 1:22:53 Tags: encoding hadoop hive


Problem description

I'm currently trying to import some DoubleClick advertising logs into Hadoop.

These logs are stored in a gzipped, delimited file which is encoded using code page 1252 (Windows-ANSI?) and which uses the Icelandic Thorn character as a delimiter.

I can happily import these logs into a single column, but I can't seem to find a way to get Hive to understand the Thorn character - I think maybe because it doesn't understand the 1252 encoding?
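
To make the suspected encoding problem concrete, here is a small Python sketch of how the Thorn character is represented in the encodings mentioned in this question. It assumes the delimiter is the lowercase thorn (U+00FE); the logs themselves are not shown here, so that is an assumption. It only illustrates the byte-level difference and says nothing definitive about how Hive reads the file.

```python
# Assumption: the delimiter is lowercase thorn (U+00FE).
# Uppercase thorn would be U+00DE instead.
THORN = "\u00fe"

# In code page 1252 and in ISO-8859-1 the character is the single byte 0xFE.
cp1252_bytes = THORN.encode("cp1252")
latin1_bytes = THORN.encode("iso-8859-1")

# In UTF-8 the same character takes two bytes, 0xC3 0xBE.
utf8_bytes = THORN.encode("utf-8")


def decodes_as_utf8(raw: bytes) -> bool:
    """Return True if the raw bytes form valid UTF-8."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


if __name__ == "__main__":
    print(cp1252_bytes, latin1_bytes, utf8_bytes)
    # A lone 0xFE byte between ASCII fields is not valid UTF-8.
    print(decodes_as_utf8(b"field1\xfefield2"))
```

If the reader treats the data as UTF-8, a lone 0xFE byte is not a valid character at all, which may be why a delimiter written as the literal character is never matched.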

I've looked at the Create Table documentation - http://hive.apache.org/docs/r0.9.0/language_manual/data-manipulation-statements.html - but can't seem to find any way to get this encoding/delimiter working.

I've also seen a suggestion at https://karmasphere.com/karmasphere-analyst-faq that the encoding for these files is ISO-8859-1 - but I don't see how to use that info in Hive or HDFS.

I know I can run a map job after import to split these rows into multiple fields.
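
For reference, the kind of splitting step I mean could be sketched in Python roughly as below. This is only an illustration under stated assumptions: the function names `split_record` and `convert_file` are hypothetical, the delimiter is assumed to be lowercase thorn (byte 0xFE), and the output is assumed to be tab-separated UTF-8 text, which only works if no field contains a tab.

```python
import gzip
from typing import List

# Assumption: lowercase thorn (U+00FE), the single byte 0xFE in cp1252.
THORN = "\u00fe"


def split_record(raw_line: bytes, encoding: str = "cp1252") -> List[str]:
    """Decode one raw log line and split it into fields on the thorn character.

    Note: Python's cp1252 codec rejects a few undefined bytes (e.g. 0x81);
    passing encoding="iso-8859-1" accepts every byte value.
    """
    text = raw_line.decode(encoding).rstrip("\r\n")
    return text.split(THORN)


def convert_file(src_path: str, dst_path: str) -> None:
    """Rewrite a gzipped, thorn-delimited file as tab-separated UTF-8 text."""
    with gzip.open(src_path, "rb") as src, \
            open(dst_path, "w", encoding="utf-8", newline="\n") as dst:
        for raw_line in src:
            dst.write("\t".join(split_record(raw_line)) + "\n")
```

A call such as `convert_file("log.gz", "log.tsv")` (hypothetical file names) would produce a file that can be loaded with an ordinary tab delimiter, at the cost of an extra pass over the data.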

But is there an easier way to use this delimiter directly?

Thanks

Stuart
