Removing replacement character � from column


Question

Based on my research so far, this character indicates bad encoding between the database and the front end. Unfortunately, I don't have control over either of those. I'm using Teradata Studio.

How can I filter this character out? I'm trying to run REGEXP_SUBSTR on a column that occasionally contains �, which throws the error "The string contains an untranslatable character".

Here is my SQL. AIRCFT_POSITN_ID is the column that contains the replacement character.

 SELECT DISTINCT AIRCFT_POSITN_ID, 
 REGEXP_SUBSTR(AIRCFT_POSITN_ID, '[0-9]+') AS AUTOROW
 FROM PROD_MAE_MNTNC_VW.FMR_DISCRPNCY_DFRL 
 WHERE DFRL_CREATE_TMS > CURRENT_DATE -25

Recommended answer

Your diagnosis is correct, so first of all, check the Session Character Set (it is part of the connection definition). If it is ASCII, change it to UTF8 and you will be able to see the original characters instead of the substitute character.
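One way to check the character set of the current session is the HELP SESSION statement. This is a minimal sketch, assuming your client lets you run it; the exact list of attributes returned varies by Teradata version.

```sql
-- Lists the attributes of the current session.
-- Look for the "Character Set" column in the result.
-- If it reports ASCII, change the character set in the connection
-- definition to UTF8 and reconnect.
HELP SESSION;
```

With a JDBC-based client such as Teradata Studio, the character set is usually set through the CHARSET connection property; where that property appears in your connection profile is an assumption to verify against your setup.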

In case the character really is part of the data, and not just a symptom of an encoding translation issue:

The substitute character, AKA SUB (DEC: 26, HEX: 1A), is a special case in Teradata.

You can't use it directly, either as a literal or as a hexadecimal character literal:

select  '�';

-- [6706] The string contains an untranslatable character.

select  '1A'XC;

-- [6706] The string contains an untranslatable character.

If you are using version 14.0 or above, you can generate it with the CHR function:

select  chr(26);

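If you're below version 14.0, you can generate it by translating a Unicode character that has no Latin equivalent (here U+05D0, the Hebrew letter alef); WITH ERROR makes the translation return the substitute character instead of failing: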

select  translate (_unicode '05D0'XC using unicode_to_latin with error);

After you have generated the character, you can remove it with OREPLACE or OTRANSLATE:

create multiset table t (i int,txt varchar(100) character set latin) unique primary index (i);

insert into t (i,txt) values (1,translate ('Hello שלום world עולם' using unicode_to_latin with error));

select * from t;

-- Hello ���� world ����

select otranslate (txt,chr(26),'') from t;

-- Hello  world 

select otranslate (txt,translate (_unicode '05D0'XC using unicode_to_latin with error),'') from t;

-- Hello  world 

BTW, there are 2 versions of OTRANSLATE and OREPLACE:

  • The functions under syslib work with LATIN.
  • The functions under TD_SYSFNLIB work with UNICODE.
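Applied to the query from the question, the approach might look like the sketch below. It assumes version 14.0 or above (so that CHR is available) and that the substitute character is really stored in AIRCFT_POSITN_ID rather than produced by the session character set. The alias AIRCFT_POSITN_ID_CLEAN is a hypothetical name introduced for illustration.

```sql
-- Strip the substitute character (CHR(26)) before the regular expression runs,
-- so REGEXP_SUBSTR never sees the untranslatable character.
SELECT DISTINCT
       OTRANSLATE(AIRCFT_POSITN_ID, CHR(26), '') AS AIRCFT_POSITN_ID_CLEAN,  -- hypothetical alias
       REGEXP_SUBSTR(OTRANSLATE(AIRCFT_POSITN_ID, CHR(26), ''), '[0-9]+') AS AUTOROW
FROM PROD_MAE_MNTNC_VW.FMR_DISCRPNCY_DFRL
WHERE DFRL_CREATE_TMS > CURRENT_DATE - 25;
```

If the unqualified call does not match the column's character set, qualifying the function with syslib or TD_SYSFNLIB, as described in the list above, may be needed.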
