Loading JSON file with serde in Cloudera

Views: 68 Published: 2021/11/12 4:05:20 Tags: hadoop hive apache-pig hue cloudera-cdh


Problem description

I am trying to work with a JSON file that has this bag structure:

{
   "user_id": "kim95",
   "type": "Book",
   "title": "Modern Database Systems: The Object Model, Interoperability, and Beyond.",
   "year": "1995",
   "publisher": "ACM Press and Addison-Wesley",
   "authors": [
      {
         "name": "null"
      }
   ],
   "source": "DBLP"
}
{
   "user_id": "marshallo79",
   "type": "Book",
   "title": "Inequalities: Theory of Majorization and Its Application.",
   "year": "1979",
   "publisher": "Academic Press",
   "authors": [
      {
         "name": "Albert W. Marshall" 
      },
      {
         "name": "Ingram Olkin"
      }
   ],
   "source": "DBLP"
}

I tried to use a SerDe to load the JSON data into Hive. I tried both approaches described here: http://blog.cloudera.com/blog/2012/12/how-to-use-a-serde-in-apache-hive/

With this code:

CREATE EXTERNAL TABLE IF NOT EXISTS serd (
           user_id:string, 
           type:string, 
           title:string,
           year:string,
           publisher:string,
           authors:array<struct<name:string>>,
           source:string)       
    ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
    LOCATION '/user/hdfs/data/book-seded_workings-reduced.json';

I get this error:

error while compiling statement: failed: parseexception line 2:17 cannot recognize input near ':' 'string' ',' in column type
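Note on this error: the parser stops at the `:` after `user_id`. In Hive DDL, top-level columns are written as `name type` separated by a space; the `name:type` form is only used inside `struct<...>`. A minimal sketch of the same statement with that syntax follows. The directory in `LOCATION` is hypothetical (Hive expects a directory there rather than a single file), and the backticks around `type` and `year` are only a precaution in case the Hive version in use treats them as keywords. As far as I know, this SerDe also expects each JSON record on a single line, so the pretty-printed layout shown above would likely need to be compacted first.

```sql
-- Sketch only: same columns as the original statement, using Hive's
-- "name type" syntax for top-level columns.
CREATE EXTERNAL TABLE IF NOT EXISTS serd (
    user_id   STRING,
    `type`    STRING,
    title     STRING,
    `year`    STRING,
    publisher STRING,
    authors   ARRAY<STRUCT<name:STRING>>,  -- colon is valid inside struct<>
    source    STRING
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
-- Hypothetical directory holding the JSON file(s); not a path to one file.
LOCATION '/user/hdfs/data/books_json/';
```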

I also tried this version: https://github.com/rcongiu/Hive-JSON-Serde

which gave a different error:

Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. Cannot validate serde: org.openx.data.jsonserde.JsonSerde

Any ideas?
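Note on the "Cannot validate serde" error: this message usually means Hive could not load the named class. Two things are worth checking. First, the SerDe jar has to be registered in the session (or on the cluster) before the `CREATE TABLE` runs. Second, the error names `JsonSerde`, whereas the class used in the statement above is `JsonSerDe`; Java class names are case-sensitive, so the spelling in the failing statement may be the cause. A sketch of the first check, with a hypothetical jar path that must be replaced by the real location of the built or downloaded jar:

```sql
-- Hypothetical path: point this at the actual Hive-JSON-Serde jar.
ADD JAR /path/to/json-serde-with-dependencies.jar;

-- Confirm that the jar is registered in the current session.
LIST JARS;
```

When working through Hue rather than the Hive command line, the jar may have to be made available in a different way, so treat this as a starting point only.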

I would also like to know what alternatives there are for working with JSON like this so that I can query the 'name' field inside 'authors', whether with Pig or with Hive.

I have already converted the data into a "tsv" file. But since my authors column is a tuple, I don't know how to query 'name' with Hive if I build a table from that file. Should I change my "tsv" conversion script or keep it? Or are there any alternatives with Hive or Pig?
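For querying 'name' inside 'authors' in Hive, one common approach is to flatten the array with `LATERAL VIEW explode(...)`. The sketch below assumes the `serd` table was created successfully with `authors` declared as `array<struct<name:string>>`; the aliases `s`, `a`, and `author` are illustrative names, not part of the original post.

```sql
-- Produce one output row per (book, author) pair, then filter on the name.
SELECT s.user_id,
       s.title,
       author.name AS author_name
FROM serd s
LATERAL VIEW explode(s.authors) a AS author
WHERE author.name = 'Ingram Olkin';
```

In the sample data a missing author is stored as the string "null" rather than a JSON null, so a condition such as `author.name <> 'null'` may be needed to exclude those rows.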
