Why does Redshift automatically trim varchar columns when joining?

Question

I encountered a unique problem when using Redshift. Please see the illustrative example below:

drop table if exists joinTrim_temp1;
create table joinTrim_temp1(rowIndex1 int, charToJoin1 varchar(20));
insert into joinTrim_temp1 values(1, 'Sudan' );
insert into joinTrim_temp1 values(2, 'Africa' );
insert into joinTrim_temp1 values(3, 'USA' );

drop table if exists joinTrim_temp2;
create table joinTrim_temp2(rowIndex2 int, charToJoin2 varchar(20));
insert into joinTrim_temp2 values(1, 'Sudan ' );
insert into joinTrim_temp2 values(2, 'Africa ' );
insert into joinTrim_temp2 values(3, 'USA ' );

select * from joinTrim_temp1 a join joinTrim_temp2 b on a.charToJoin1 = b.charToJoin2;

The query returns matching rows for all three countries.

As you can see, the values in the second table have a trailing space, so the inner join should not match any rows. However, it seems that Redshift trims the trailing whitespace when joining.

I encountered this problem while converting existing Redshift SQL code to PySpark.

Regards, Kumar

Answer

Ah! Indeed, a very interesting find!

From Character types - Amazon Redshift:

Trailing spaces in VARCHAR and CHAR values are treated as semantically insignificant when values are compared.
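To make the quoted rule concrete, here is a minimal plain-Python emulation of it, assuming "trailing spaces" means the ASCII space character only. The helpers `redshift_equal` and `inner_join` are hypothetical names for illustration, not part of any Redshift or Spark API. The sketch joins the same three rows from the question twice: once with the emulated Redshift comparison and once with strict equality, which is (as far as I understand) how Python and PySpark compare strings by default.

```python
# Minimal emulation of the documented Redshift rule, assuming only the
# ASCII space ' ' counts as a "trailing space". Hypothetical helpers.

def redshift_equal(left: str, right: str) -> bool:
    """Compare two strings ignoring trailing spaces, as the docs describe."""
    return left.rstrip(" ") == right.rstrip(" ")


def inner_join(rows1, rows2, eq):
    """Naive nested-loop inner join on the second column of each row."""
    return [
        (i1, s1, i2, s2)
        for (i1, s1) in rows1
        for (i2, s2) in rows2
        if eq(s1, s2)
    ]


# Same data as joinTrim_temp1 and joinTrim_temp2 in the question.
temp1 = [(1, "Sudan"), (2, "Africa"), (3, "USA")]
temp2 = [(1, "Sudan "), (2, "Africa "), (3, "USA ")]

# Emulated Redshift join: every row matches despite the trailing spaces.
redshift_rows = inner_join(temp1, temp2, redshift_equal)
# Strict equality: no row matches.
strict_rows = inner_join(temp1, temp2, lambda a, b: a == b)

print(len(redshift_rows))  # 3
print(len(strict_rows))    # 0
```

When porting to PySpark, one plausible way to reproduce the Redshift result (an assumption about porting strategy, not something the answer states) is to apply `pyspark.sql.functions.rtrim` to both join keys before comparing them.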

It appears that, if you want to force a strict comparison, you need to stop the spaces from being trailing, for example by appending the same character to both values:

SELECT * 
FROM joinTrim_temp1 a 
JOIN joinTrim_temp2 b 
ON a.charToJoin1 || '.' = b.charToJoin2 || '.';
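Why the `|| '.'` trick works can be checked with the same kind of emulation: once a non-space character is appended, the original spaces are no longer trailing, so the "ignore trailing spaces" rule has nothing to strip. This is a sketch assuming the rule only strips ASCII spaces; `redshift_equal` and `strict_join_key` are hypothetical helpers, with `strict_join_key` mirroring `value || '.'` from the query above.

```python
def redshift_equal(left: str, right: str) -> bool:
    # Emulated Redshift rule: trailing spaces are ignored when comparing.
    return left.rstrip(" ") == right.rstrip(" ")


def strict_join_key(value: str) -> str:
    # Mirrors `value || '.'`: after appending a non-space character,
    # the original spaces are no longer trailing.
    return value + "."


pairs = [("Sudan", "Sudan "), ("USA", "USA "), ("USA", "USA")]

for a, b in pairs:
    plain = redshift_equal(a, b)
    suffixed = redshift_equal(strict_join_key(a), strict_join_key(b))
    # Columns: left value, right value, plain match, suffixed match.
    print(repr(a), repr(b), plain, suffixed)
```

Values that differ only by trailing spaces match on the plain comparison but not on the suffixed one, while genuinely identical values still match either way.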
