Remove duplicates from a comma-separated string (Amazon Redshift)
Question
I am using Amazon Redshift.
I have a column in which a string is stored as comma-separated values, for example 'Private, Private, Private, Private, Private, Private, United Healthcare'. I want to remove the duplicates from it with a query, so that the result is 'Private, United Healthcare'. I found some solutions on Stack Overflow and learned that this should be possible with regular expressions.
So I tried:
SELECT regexp_replace('Private, Private, Private, Private, Private, Private, United Healthcare', '([^,]+)(,\1)+', '\1') AS insurances;
and also:
SELECT regexp_replace('Private, Private, Private, Private, Private, Private, United Healthcare', '([^,]+)(,\1)+', '\g') AS insurances;
I also tried some other regular expressions, but none of them seem to work. Is there a solution?
Recommended answer
Here is a User-Defined Function (UDF) for Amazon Redshift:
CREATE FUNCTION f_uniquify (s text)
RETURNS text
IMMUTABLE
AS $$
# Split the string on comma-space, drop duplicates, and join back with comma-space
if s is None:
    return None
return ', '.join(set(s.split(', ')))
$$ LANGUAGE plpythonu;
Testing it with:
select f_uniquify('Private, Private, Private, Private, Private, Private, United Healthcare');
Returns:
United Healthcare, Private
Note that the order of the returned values is not guaranteed, because a Python set does not preserve the order in which items were added. If the order of the return values matters, more specific code is needed.
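One way to keep each value at the position where it first appears is sketched below. This is not part of the original answer: the function name uniquify_ordered is hypothetical, and the code is written as plain Python so it can be tried locally. It assumes the same ', ' separator as f_uniquify, and it avoids newer language features so that the same logic should also be usable as the body of a plpythonu UDF.

```python
def uniquify_ordered(s):
    """Remove duplicates from a ', '-separated string, keeping first-seen order.

    Hypothetical helper for illustration; assumes items are separated by
    exactly a comma followed by one space.
    """
    if s is None:
        # Mirror SQL semantics: NULL in, NULL out
        return None
    seen = set()      # items already emitted
    result = []       # unique items in their original order
    for item in s.split(', '):
        if item not in seen:
            seen.add(item)
            result.append(item)
    return ', '.join(result)


print(uniquify_ordered(
    'Private, Private, Private, Private, Private, Private, United Healthcare'
))
# prints: Private, United Healthcare
```

The set is used only for fast membership checks, while the list records the output order, so the first occurrence of each value decides where it ends up.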