Writing columns having NULL as some string using OpenCSVSerde - HIVE

Problem description

I'm using 'org.apache.hadoop.hive.serde2.OpenCSVSerde' to write Hive table data.

CREATE TABLE testtable
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
  "separatorChar" = ",",
  "quoteChar"     = "'"
)
STORED AS TEXTFILE LOCATION '<location>' AS
SELECT * FROM foo;

So, if the 'foo' table has empty strings in it, e.g. '1','2','', the empty strings are written as-is to the text file. The data in the text file reads '1','2',''

But if 'foo' contains null values, e.g. '1','2',null, the null value is not written to the text file. The data in the text file reads '1','2',

How do I make sure that the nulls are properly written to the text file using the CSV serde, either as empty strings or as some other string, say "nullstring"?

I also tried this:

CREATE TABLE testtable ROW FORMAT SERDE
....
....  
STORED AS TEXTFILE LOCATION '<location>'
TBLPROPERTIES ('serialization.null.format'='')
AS SELECT * FROM foo;

This should probably just replace the empty strings with null, but it doesn't even do that.

Please guide me on how to write nulls to CSV files.

Will I have to check for null values in the columns in the select query itself (with NVL or something similar) and replace them with something?

Recommended answer

The OpenCSVSerde ignores the 'serialization.null.format' property. You can handle null values using the steps below:

1. CREATE TABLE testtable
   (
     name string,
     title string,
     birth_year string
   )
   ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
   WITH SERDEPROPERTIES (
     "separatorChar" = ",",
     "quoteChar"     = "'"
   )
   STORED AS TEXTFILE;

2. Load data into testtable.

3. CREATE TABLE testtable1
   (
     name string,
     title string,
     birth_year string
   )
   ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
   TBLPROPERTIES ('serialization.null.format'='');

4. INSERT OVERWRITE TABLE testtable1 SELECT * FROM testtable;
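
The question also asks whether the nulls could be replaced in the select query itself. A minimal sketch of that alternative follows, assuming 'foo' has three columns; the names col1, col2 and col3 are hypothetical, and 'nullstring' is only a placeholder value. Each column is cast to STRING because OpenCSVSerde treats every column as a string, and COALESCE substitutes the placeholder before the row reaches the serde. This is an illustration under those assumptions, not a tested recipe.

```sql
-- Sketch only: col1, col2, col3 are hypothetical column names for foo,
-- and 'nullstring' is an arbitrary placeholder for NULL values.
CREATE TABLE testtable
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
  "separatorChar" = ",",
  "quoteChar"     = "'"
)
STORED AS TEXTFILE
AS
SELECT
  -- Replace NULL with the placeholder so the serde always receives a string.
  COALESCE(CAST(col1 AS STRING), 'nullstring') AS col1,
  COALESCE(CAST(col2 AS STRING), 'nullstring') AS col2,
  COALESCE(CAST(col3 AS STRING), 'nullstring') AS col3
FROM foo;
```

Using '' instead of 'nullstring' should write nulls as quoted empty strings, which makes them indistinguishable from genuine empty strings when the file is read back.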
