根据值列表查询子字符串 [英] Querying substrings against a list of values

查看：74 发布时间：2020/5/22 3:25:54 sql oracle parsing

本文介绍了根据值列表查询子字符串的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！

问题描述

很不幸，我正在从一个数据集读取数据，但我无权进行修改.它具有连接的值字符串，我想选择记录，这些记录的任何子字符串(由给定字符分隔)都与特定列表中的任何值匹配.我将通过Python传递查询，因此不会将其与静态列表进行比较.

I'm reading from a dataset that I unfortunately don't have the access to modify. It has concatenated strings of values, and I want to select records for which any of those substrings (as split by a given character) matches any of the values in a specific list. I'll be passing the queries in via Python, so it won't be compared against a static list.

例如，表格如下:

CrappyColumn
-----------
1;2
4
1
2;1
1;3
2

我可能想返回其中包含2或4的任何内容.所以，我的结果应该是:

And I might want to return anything that has 2 or 4 in it. So, my result should be:

我玩过regexp_substr并得到了一些实际有用的东西；但是，当我在完整的数据集上运行它时，它将无限期地运行(最多要放弃10分钟)(该数据集仅包含大约三千条记录，其值通常长达几百个字符).我需要在合理的时间内可以重复执行的内容.

I have played with regexp_substr and gotten something that actually works; however, it just runs indefinitely (as much as 10 minutes before I give up) when I run it on the full dataset (which only includes about three thousand records with values that are often a couple hundred characters long). I need something that works in a reasonable amount of time for repeated execution.

我意识到，即使有了变量比较列表，我也可以编写Python代码来解析列表并构造多个LIKE语句，但这似乎效率不高，我认为有更好的方法

I realize that--even with a variable comparison list--I could just write my Python code to parse the list and construct multiple LIKE statements, but that seems inefficient, and I assume that there is a better way.

这是我做的耗时太长的事情:

And here's what I've done that takes too long:

SELECT DISTINCT CrappyColumn
   FROM 
     (SELECT DISTINCT CrappyColumn, regexp_substr(CrappyColumn, '[^;]+', 1, LEVEL) as UGH
          FROM CrappyTable
CONNECT BY regexp_substr(CrappyColumn, '[^;]+', 1, LEVEL) IS NOT NULL)
WHERE UGH IN ('2', '4')

是否有更好，更快，更清洁的方式来实现这一目标?

Is there a better, faster, cleaner way to accomplish this?

编辑-解决方法:

感谢vkp的帮助，这是我实现的:

Thanks to vkp's help, here is what I implemented:

regexp_like(SITE_ID, '^(2|4)(:)|(:)(2|4)(:)|(:)(2|4)$|^(2|4)$')

我将其修改为最终产品，以便它可以处理多个字符(通过将[2|4]更改为(2|4)).在搜索非一位数字的情况下，此方法有效.

I modified it for my final product, so that it can handle strings of more than one character--by changing [2|4] to (2|4). This works in cases of searching for numbers that aren't single-digit.

推荐答案

一种简单的方法是使用regexp_like检查列表中是否包含2或4.

A simpler method would be to use regexp_like to check if the list has 2 or 4 in it.

select *
from tablename
where regexp_like(crappycolumn,'^[2|4][^0-9]|[^0-9][2|4][^0-9]|[^0-9][2|4]$|^[2|4]$')

^[2|4][^0-9]-以2或4开头，后跟数字.
[^0-9][2|4][^0-9]-2或4后面没有数字.
[^0-9][2|4]$-以2或4结尾，不带数字.
^[2|4]$-2或4是字符串中的唯一字符.

^[2|4][^0-9] - Starts with 2 or 4 not followed by a digit.
[^0-9][2|4][^0-9] - 2 or 4 not succeeded or preceded by a digit.
[^0-9][2|4]$ - Ends with 2 or 4 not preceded by a digit.
^[2|4]$ - 2 or 4 is the only character in the string.

这篇关于根据值列表查询子字符串的文章就介绍到这了，希望我们推荐的答案对大家有所帮助，也希望大家多多支持IT屋！

查看全文

根据值列表查询子字符串 [英] Querying substrings against a list of values

问题描述

推荐答案

相关文章

数据库最新文章

热门教程

热门工具

登录关闭

根据值列表查询子字符串 [英] Querying substrings against a list of values

问题描述

推荐答案

相关文章

数据库最新文章

热门教程

热门工具

登录 关闭

登录关闭