使用nzload加载特殊字符 [英] Using nzload to load special characters

查看:131
本文介绍了使用nzload加载特殊字符的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我在Oracle表数据中扩展了ascii字符,我可以使用带\转义字符前缀的sqlplus将其提取到文件中.我想使用nzload将完全相同的数据加载到netezza表中.

I have extended ascii chars in Oracle table data which I am able extract to a file using sqlplus with the \ escape character prefixed. I want to use nzload to load the exact same data into a netezza table.

nzload遇到此字符序列(c2bf)时会添加几个额外的字节 在提取的文件数据中:

nzload adds a couple of extra bytes when it encounters this char seq (c2bf) in the extracted file data:

echo "PROFESSIONAL¿" | od -x
0000000  5052 4f46 4553 5349 4f4e 414c **c2bf** 0a00

nzload之后:

echo "PROFESSIONAL¿" | od -x
0000000  5052 4f46 4553 5349 4f4e 414c **c382 c2bf**

在nzload命令行上,我有以下选项: -escapechar \ -ctrlchars

on the nzload command line, I have these options: -escapechar \ -ctrlchars

有人可以提供任何帮助吗?

Can anyone provide any help with this?

推荐答案

我对Unicode转换问题不是很精明,但是我之前已经对自己做到了,我将演示我的想法.

I'm not very savvy with Unicode conversion issues, but I've done this to myself before, and I'll demonstrate what I think is happening.

我相信您在这里看到的不是用nzload加载特殊字符的问题,而是您的显示/终端软件如何显示数据和/或Netezza如何存储字符数据的问题.我怀疑双向转换为UTF-8(Netezza支持的Unicode编码).让我们看看是否可以确定是哪个.

I believe what you are seeing here is not an issue with loading special characters with nzload, rather it's an issue with how your display/terminal software is display the data and/or Netezza how is storing the character data. I suspect a double conversion to/from UTF-8 (the Unicode encoding that Netezza supports). Let's see if we can suss out which it is.

在这里,我使用的是PuTTY,默认情况下(对我来说)远程字符集为Latin-1.

Here I am using PuTTY with the default (for me) Remote Character Set as Latin-1.

$ od -xa input.txt
0000000    5250    464f    5345    4953    4e4f    4c41    bfc2    000a
          P   R   O   F   E   S   S   I   O   N   A   L   B   ?  nl
0000017

$ cat input.txt
PROFESSIONAL¿

在这里我们从 od 可以看到该文件仅包含我们期望的数据,但是当我们 cat 时,我们看到了多余的字符.如果它不在文件中,则该字符很可能来自显示翻译.

Here we can see from od that the file has only the data we expect, however when we cat the file we see the extra character. If it's not in the file, then the character is likely coming from the display translation.

如果我将PuTTY设置更改为将UTF-8设置为远程字符集,我们将以这种方式看到它:

If I change the PuTTY settings to have UTF-8 be the remote character set, we see it this way:

$ od -xa input.txt
0000000    5250    464f    5345    4953    4e4f    4c41    bfc2    000a
          P   R   O   F   E   S   S   I   O   N   A   L   B   ?  nl
0000017
$ cat input.txt
PROFESSIONAL¿

因此,相同的源数据,但两个不同的屏幕表示形式与两个不同的输出相同,并非巧合.相同的数据至少可以以两种方式显示.

So, the same source data, but two different on-screen representations, which are, not coincidentally, the same as your two different outputs. The same data can be displayed at least two ways.

现在让我们看看如何将其加载到Netezza中,一次加载到VARCHAR列中,再一次加载到NVARCHAR列中.

Now let's see how it loads into Netezza, once into a VARCHAR column, and again into an NVARCHAR column.

create table test_enc_vchar (col1 varchar(50));
create table test_enc_nvchar (col1 nvarchar(50));

$ nzload -db testdb -df input.txt -t test_enc_vchar -escapechar '\' -ctrlchars
Load session of table 'TEST_ENC_VCHAR' completed successfully
$ nzload -db testdb -df input.txt -t test_enc_nvchar -escapechar '\' -ctrlchars
Load session of table 'TEST_ENC_NVCHAR' completed successfully

加载的数据没有错误.请注意,尽管我为 nzload 指定了escapechar选项,但是在此特定输入数据示例中,所有字符都不需要转义,也不会转义.

The data loaded with no errors. Note while I specify the escapechar option for nzload, none of the characters in this specific sample of input data require escaping, nor are they escaped.

我现在将使用SQL Extension Toolkit中的rawtohex函数作为数据库内工具,就像我们在命令行中使用 od 一样.

I will now use the rawtohex function from the SQL Extension Toolkit as an in-database tool like we've used od from the command line.

select rawtohex(col1) from test_enc_vchar;
           RAWTOHEX
------------------------------
 50524F46455353494F4E414CC2BF
(1 row)

select rawtohex(col1) from test_enc_nvchar;
           RAWTOHEX
------------------------------
 50524F46455353494F4E414CC2BF
(1 row)

这时,两列似乎都具有与输入文件完全相同的数据.到目前为止,一切都很好.

At this point both columns seem to have exactly the same data as the input file. So far, so good.

如果我们选择该列怎么办?作为记录,我是在PuTTY会话中使用远程字符集UTF-8进行此操作的.

What if we select the column? For the record, I am doing this in a PuTTY session with remote character set of UTF-8.

select col1 from test_enc_vchar;
      COL1
----------------
 PROFESSIONAL¿
(1 row)

select col1 from test_enc_nvchar;
     COL1
---------------
 PROFESSIONAL¿
(1 row)

相同的二进制数据,但显示不同.然后,如果我将每个选择的输出复制到通过 od

Same binary data, but different display. If I then copy the output of each of those selects into echo piped to od,

$ echo PROFESSIONAL¿ | od -xa
0000000    5250    464f    5345    4953    4e4f    4c41    82c3    bfc2
          P   R   O   F   E   S   S   I   O   N   A   L   C stx   B   ?
0000020    000a
         nl
0000021

$ echo  PROFESSIONAL¿ | od -xa
0000000    5250    464f    5345    4953    4e4f    4c41    bfc2    000a
          P   R   O   F   E   S   S   I   O   N   A   L   B   ?  nl
0000017

基于此输出,我押注您正在将样本数据(也是押注UTF-8)加载到VARCHAR列而不是NVARCHAR列中.这本身不是问题,但是可能会导致显示/转换问题.

Based on this output, I'd wager that you are loading your sample data, which I'd also wager is UTF-8, into a VARCHAR column rather than an NVARCHAR column. This is not, in of itself, a problem, but can have display/conversion issues down the line.

通常来说,您希望将UTF-8数据加载到NVARCHAR列中.

Generally speaking, you'd want to load UTF-8 data into NVARCHAR columns.

这篇关于使用nzload加载特殊字符的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆