Remove duplicates from a comma-separated string (Amazon Redshift)


Question

I am using Amazon Redshift.

I have a column in which values are stored as a comma-separated string, such as Private, Private, Private, Private, Private, Private, United Healthcare. I want to remove the duplicates with a query, so that the result is Private, United Healthcare. I found some solutions on Stack Overflow and learned that this should be possible with regular expressions.

So I tried:

SELECT  regexp_replace('Private, Private, Private, Private, Private, Private, United Healthcare', '([^,]+)(,\1)+', '\1') AS insurances; 

and also:

SELECT  regexp_replace('Private, Private, Private, Private, Private, Private, United Healthcare', '([^,]+)(,\1)+', '\g') AS insurances; 

I also tried some other regular expressions, but none of them seem to work. Is there a solution?
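One likely reason the pattern misses is the space after each comma: `(,\1)+` expects the repeated value to follow the comma immediately. The sketch below illustrates this with Python's `re` module, which supports backreferences. It is only an illustration of the pattern logic; whether Redshift's `regexp_replace` accepts backreferences in the pattern is not established here.

```python
import re

s = 'Private, Private, Private, Private, Private, Private, United Healthcare'

# Pattern from the question: nothing between the comma and the backreference,
# so the first "Private" (which has no leading space) is never merged.
original = re.sub(r'([^,]+)(,\1)+', r'\1', s)

# Same idea, with the space after the comma made part of the pattern.
adjusted = re.sub(r'([^,]+)(, \1)+', r'\1', s)

print(original)
print(adjusted)  # Private, United Healthcare
```

Even the adjusted pattern only collapses duplicates that sit next to each other; a value that reappears later in the list, as in Private, United Healthcare, Private, is left in place.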

Recommended answer

Here is a User-Defined Function (UDF) for Amazon Redshift:

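CREATE FUNCTION f_uniquify (s text)
  RETURNS text
IMMUTABLE
AS $$
  # Split the string on comma-space, drop duplicates with a set,
  # and join the remaining items back into a comma-separated string.
  # Note: a set does not preserve the original order of the items.
  if s is None:
      return None
  return ', '.join(set(s.split(', ')))
$$ LANGUAGE plpythonu;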

Testing it with:

select f_uniquify('Private, Private, Private, Private, Private, Private, United Healthcare');

This returns:

United Healthcare, Private

If the order of the returned values matters, more specific code is needed, because a Python set does not preserve the order in which the items appeared.
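A minimal sketch of what that more specific code might look like, keeping the first occurrence of each item in its original position. The function name `uniquify_ordered` and the `sep` parameter are hypothetical, not part of the original answer. The logic avoids anything newer than Python 2.7, which is the runtime Redshift uses for `plpythonu`, so the body could plausibly be dropped into a UDF like the one above.

```python
def uniquify_ordered(s, sep=', '):
    """Remove duplicate items from a delimited string, keeping first-seen order."""
    if s is None:
        return None
    seen = set()   # items already emitted
    kept = []      # items in their original order, duplicates skipped
    for item in s.split(sep):
        if item not in seen:
            seen.add(item)
            kept.append(item)
    return sep.join(kept)


print(uniquify_ordered(
    'Private, Private, Private, Private, Private, Private, United Healthcare'))
# Private, United Healthcare
```

Unlike a regular expression that collapses adjacent repeats, this also removes duplicates that are separated by other values.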
