根据值列表查询子字符串 [英] Querying substrings against a list of values

查看:74
本文介绍了根据值列表查询子字符串的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

很不幸,我正在从一个数据集读取数据,但我无权进行修改.它具有连接的值字符串,我想选择记录,这些记录的任何子字符串(由给定字符分隔)都与特定列表中的任何值匹配.我将通过Python传递查询,因此不会将其与静态列表进行比较.

I'm reading from a dataset that I unfortunately don't have the access to modify. It has concatenated strings of values, and I want to select records for which any of those substrings (as split by a given character) matches any of the values in a specific list. I'll be passing the queries in via Python, so it won't be compared against a static list.

例如,表格如下:

CrappyColumn
-----------
1;2
4
1
2;1
1;3
2

我可能想返回其中包含2或4的任何内容.所以,我的结果应该是:

And I might want to return anything that has 2 or 4 in it. So, my result should be:

1;2
4
2
2;1

我玩过regexp_substr并得到了一些实际有用的东西;但是,当我在完整的数据集上运行它时,它将无限期地运行(最多要放弃10分钟)(该数据集仅包含大约三千条记录,其值通常长达几百个字符).我需要在合理的时间内可以重复执行的内容.

I have played with regexp_substr and gotten something that actually works; however, it just runs indefinitely (as much as 10 minutes before I give up) when I run it on the full dataset (which only includes about three thousand records with values that are often a couple hundred characters long). I need something that works in a reasonable amount of time for repeated execution.

我意识到,即使有了变量比较列表,我也可以编写Python代码来解析列表并构造多个LIKE语句,但这似乎效率不高,我认为有更好的方法

I realize that--even with a variable comparison list--I could just write my Python code to parse the list and construct multiple LIKE statements, but that seems inefficient, and I assume that there is a better way.

这是我做的耗时太长的事情:

And here's what I've done that takes too long:

SELECT DISTINCT CrappyColumn
   FROM 
     (SELECT DISTINCT CrappyColumn, regexp_substr(CrappyColumn, '[^;]+', 1, LEVEL) as UGH
          FROM CrappyTable
CONNECT BY regexp_substr(CrappyColumn, '[^;]+', 1, LEVEL) IS NOT NULL)
WHERE UGH IN ('2', '4')

是否有更好,更快,更清洁的方式来实现这一目标?

Is there a better, faster, cleaner way to accomplish this?

编辑-解决方法:

感谢vkp的帮助,这是我实现的:

Thanks to vkp's help, here is what I implemented:

regexp_like(SITE_ID, '^(2|4)(:)|(:)(2|4)(:)|(:)(2|4)$|^(2|4)$')

我将其修改为最终产品,以便它可以处理多个字符(通过将[2|4]更改为(2|4)).在搜索非一位数字的情况下,此方法有效.

I modified it for my final product, so that it can handle strings of more than one character--by changing [2|4] to (2|4). This works in cases of searching for numbers that aren't single-digit.

推荐答案

一种简单的方法是使用regexp_like检查列表中是否包含24.

A simpler method would be to use regexp_like to check if the list has 2 or 4 in it.

select *
from tablename
where regexp_like(crappycolumn,'^[2|4][^0-9]|[^0-9][2|4][^0-9]|[^0-9][2|4]$|^[2|4]$')

  • ^[2|4][^0-9]-以2或4开头,后跟数字.
  • [^0-9][2|4][^0-9]-2或4后面没有数字.
  • [^0-9][2|4]$-以2或4结尾,不带数字.
  • ^[2|4]$-2或4是字符串中的唯一字符.
    • ^[2|4][^0-9] - Starts with 2 or 4 not followed by a digit.
    • [^0-9][2|4][^0-9] - 2 or 4 not succeeded or preceded by a digit.
    • [^0-9][2|4]$ - Ends with 2 or 4 not preceded by a digit.
    • ^[2|4]$ - 2 or 4 is the only character in the string.
    • 这篇关于根据值列表查询子字符串的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆