SQL Server Regular Expression Workaround in T-SQL?


Question

I have some SQLCLR code for working with regular expressions. Now that the database is being migrated to Azure, which does not allow SQLCLR, that option is out. I need to find a way to do regex in pure T-SQL.

Master Data Services is not available because the dev edition of MSSQL we have is not R2.

All ideas are appreciated, thanks.

Regular expression match samples that need handling (culled from regexlib and other places over the past few years):

email address

^[\w-]+(\.[\w-]+)*@([a-z0-9-]+(\.[a-z0-9-]+)*?\.[a-z]{2,6}|(\d{1,3}\.){3}\d{1,3})(:\d{4})?$

dollars

^(\$)?(([1-9]\d{0,2}(\,\d{3})*)|([1-9]\d*)|(0))(\.\d{2})?$

uri

^(http|https|ftp)\://([a-zA-Z0-9\.\-]+(\:[a-zA-Z0-9\.&%\$\-]+)*@)*((25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[1-9])\.(25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[1-9]|0)\.(25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[1-9]|0)\.(25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[0-9])|localhost|([a-zA-Z0-9\-]+\.)*[a-zA-Z0-9\-]+\.(com|edu|gov|int|mil|net|org|biz|arpa|info|name|pro|aero|coop|museum|[a-zA-Z]{2}))(\:[0-9]+)*(/($|[a-zA-Z0-9\.\,\?\'\\\+&%\$#\=~_\-]+))*$

one numeric digit

^\d$

percentage

^-?[0-9]{0,2}(\.[0-9]{1,2})?$|^-?(100)(\.[0]{1,2})?$

height notation

^\d?\d'(\d|1[01])"$

numbers between 1 and 1000

^([1-9]|[1-9]\d|1000)$

credit card numbers

^((4\d{3})|(5[1-5]\d{2})|(6011))-?\d{4}-?\d{4}-?\d{4}|3[4,7]\d{13}$

list of years

^([1-9]{1}[0-9]{3}[,]?)*([1-9]{1}[0-9]{3})$

days of the week

^(Sun|Mon|(T(ues|hurs))|Fri)(day|\.)?$|Wed(\.|nesday)?$|Sat(\.|urday)?$|T((ue?)|(hu?r?))\.?$

time on 12 hour clock

(?<Time>^(?:0?[1-9]:[0-5]|1(?=[012])\d:[0-5])\d(?:[ap]m)?)

time on 24 hour clock

^(?:(?:(?:0?[13578]|1[02])(\/|-|\.)31)\1|(?:(?:0?[13-9]|1[0-2])(\/|-|\.)(?:29|30)\2))(?:(?:1[6-9]|[2-9]\d)?\d{2})$|^(?:0?2(\/|-|\.)29\3(?:(?:(?:1[6-9]|[2-9]\d)?(?:0[48]|[2468][048]|[13579][26])|(?:(?:16|[2468][048]|[3579][26])00))))$|^(?:(?:0?[1-9])|(?:1[0-2]))(\/|-|\.)(?:0?[1-9]|1\d|2[0-8])\4(?:(?:1[6-9]|[2-9]\d)?\d{2})$

USA phone numbers

^\(?[\d]{3}\)?[\s-]?[\d]{3}[\s-]?[\d]{4}$

Recommended answer

Unfortunately, you will not be able to move your CLR function(s) to SQL Azure. You will need to either use the normal string functions (PATINDEX, CHARINDEX, LIKE, and so on) or perform these operations outside of the database.

EDIT: Adding some information for the examples added to the question.

E-mail address

This one is always controversial because people disagree about which version of the RFC they want to support. The original didn't support apostrophes, for example (or at least people insist that it didn't; admittedly, I haven't dug it up from the archives and read it myself), and the pattern has to be expanded quite often for new TLDs (once for 4-letter TLDs like .info, then again for 6-letter TLDs like .museum). I've often heard quite knowledgeable people state that perfect e-mail validation is impossible, and having previously worked for an e-mail service provider, I can tell you that it was a constantly moving target. For the simplest approaches, see the question "TSQL Email Validation (without regex)".
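As a rough illustration of what a "simplest approach" can look like with LIKE alone, here is a minimal sketch. It is not taken from the linked question, and it is deliberately loose: it only checks the overall shape (something, an @, something, a dot, something) and rejects a few obviously bad characters. The sample address and the column alias are placeholders chosen for the example.

```sql
-- Loose shape check only; this is nowhere near RFC-level validation.
DECLARE @s varchar(320) = 'someone@example.com';  -- sample value (assumption)

SELECT CASE
         WHEN @s LIKE '%_@_%._%'      -- at least one char, @, one char, a dot, one char
          AND @s NOT LIKE '%@%@%'     -- no second @
          AND @s NOT LIKE '%[ ,;]%'   -- no spaces, commas or semicolons
         THEN 1
         ELSE 0
       END AS looks_like_email;
```

Remember that `_` is LIKE's single-character wildcard, so `_@_` means "any character, then @, then any character". Anything stricter than this is probably better handled before the data reaches the database.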

One digit

Probably the easiest one:

WHERE @s LIKE '[0-9]';
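The same character class extends to longer digit strings. The sketch below shows two variations that may help with the other numeric patterns: a fixed number of digits built with REPLICATE, and "digits only, any length" expressed as the absence of any non-digit. The sample value and aliases are assumptions made for the example.

```sql
DECLARE @s varchar(20) = '2024';  -- sample value (assumption)

SELECT
  -- Exactly four digits: the pattern is '[0-9][0-9][0-9][0-9]'.
  CASE WHEN @s LIKE REPLICATE('[0-9]', 4) THEN 1 ELSE 0 END AS is_four_digits,
  -- One or more digits: non-empty and contains no character outside 0-9.
  CASE WHEN @s <> '' AND @s NOT LIKE '%[^0-9]%' THEN 1 ELSE 0 END AS is_all_digits;
```

The `[^0-9]` form is the usual substitute for `^\d+$`, since LIKE has no quantifiers.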

Credit card numbers

This assumes you have stripped out dashes and spaces, which you should do in any case. Note that this is not a check of the credit card number algorithm to ensure that the number itself is valid; it only confirms that the value conforms to the general format (AmEx is 15 digits starting with a 3; the rest are 16 digits, where Visa starts with a 4, MasterCard with a 5 and Discover with a 6, and I think there's one that starts with a 7, though that may just be gift cards of some kind):

WHERE @s + ' ' LIKE '[3-7]'+ REPLICATE('[0-9]', 14) + '[0-9 ]';

If you want to be a little more precise at the cost of being long-winded, you can say:

WHERE (LEN(@s) = 15 AND @s LIKE '3'     + REPLICATE('[0-9]', 14))
   OR (LEN(@s) = 16 AND @s LIKE '[4-7]' + REPLICATE('[0-9]', 15));
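The patterns above check format only. If a checksum is also wanted, the check digit on these cards is commonly validated with the Luhn algorithm; that name is not mentioned in the answer, so treat the following as a sketch under that assumption rather than the author's method. It walks the digits from right to left, doubles every second one, and tests whether the total is divisible by 10. The variable names and the sample number (a widely published test number, not a real card) are placeholders.

```sql
DECLARE @s varchar(19) = '4111111111111111';  -- sample test number (assumption)
DECLARE @i int = LEN(@s);
DECLARE @sum int = 0, @d int, @double int = 0;

IF @s = '' OR @s LIKE '%[^0-9]%'
BEGIN
    -- Not a plain digit string, so the checksum does not apply.
    SELECT 0 AS passes_luhn;
END
ELSE
BEGIN
    WHILE @i >= 1
    BEGIN
        SET @d = CAST(SUBSTRING(@s, @i, 1) AS int);
        IF @double = 1
        BEGIN
            -- Double every second digit from the right; 10..18 collapse to 1..9.
            SET @d = @d * 2;
            IF @d > 9 SET @d = @d - 9;
        END
        SET @sum = @sum + @d;
        SET @double = 1 - @double;  -- alternate between 0 and 1
        SET @i = @i - 1;
    END

    SELECT CASE WHEN @sum % 10 = 0 THEN 1 ELSE 0 END AS passes_luhn;
END
```

A row-by-row loop like this is fine for a single value; for large tables it would likely be better to run the checksum outside the database.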

USA phone numbers

Again, this assumes you are going to strip out parentheses, dashes and spaces first. I'm pretty sure a US area code can't start with a 1; if there are other rules, I am not aware of them.

WHERE @s LIKE '[2-9]' + REPLICATE('[0-9]', 9);
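The stripping step can be done with nested REPLACE calls before the LIKE test. This is a minimal sketch; the variable `@raw`, the sample number and the column aliases are assumptions made for the example.

```sql
DECLARE @raw varchar(30) = '(212) 555-0123';  -- sample input (assumption)

-- Remove parentheses, dashes and spaces, leaving only what should be digits.
DECLARE @s varchar(30) =
    REPLACE(REPLACE(REPLACE(REPLACE(@raw, '(', ''), ')', ''), '-', ''), ' ', '');

SELECT @s AS stripped,
       CASE WHEN @s LIKE '[2-9]' + REPLICATE('[0-9]', 9) THEN 1 ELSE 0 END
           AS looks_like_us_phone;
```

Note that blind stripping is more permissive than the original regex: it accepts the separators in any position, not only where the regex allowed them.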

-----

I'm not going to go further, because a lot of the other expressions you've defined can be extrapolated from the above. Hopefully this gives you a start. You should be able to Google for some of the others to see how other people have replicated the patterns with T-SQL. Some of them (like days of the week) can probably just be checked against a table; it seems overkill to do invasive pattern matching for a set of 7 possible values. Similarly, with a list of 1000 numbers or years, it will be much easier (and probably more efficient) to check whether the numeric value is in a table than to convert it to a string and see whether it matches some pattern.
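A table lookup for the days of the week might look like the sketch below. The table variable, its contents and the sample values are all assumptions; which abbreviations to accept is up to you. For the numeric case the sketch uses a plain range comparison, which is a substitution for the lookup table the answer suggests and only applies when the value is already stored as a number.

```sql
-- Hypothetical lookup table; in practice this would likely be a permanent table.
DECLARE @days TABLE (name varchar(10) NOT NULL PRIMARY KEY);

INSERT INTO @days (name)
VALUES ('Sunday'), ('Monday'), ('Tuesday'), ('Wednesday'),
       ('Thursday'), ('Friday'), ('Saturday'),
       ('Sun'), ('Mon'), ('Tue'), ('Wed'), ('Thu'), ('Fri'), ('Sat');

DECLARE @s varchar(10) = 'Tuesday';  -- sample value (assumption)
SELECT CASE WHEN EXISTS (SELECT 1 FROM @days WHERE name = @s)
            THEN 1 ELSE 0 END AS is_day_of_week;

-- A number between 1 and 1000 needs no pattern at all once it is numeric.
DECLARE @n int = 250;                -- sample value (assumption)
SELECT CASE WHEN @n BETWEEN 1 AND 1000 THEN 1 ELSE 0 END AS in_range;
```

Whether the day-name comparison is case-sensitive depends on the collation in effect.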

I'll state again that a lot of this will be much better if you can cleanse and validate the data before it gets into the database in the first place. You should strive to do this wherever possible, because without CLR, you just can't do powerful RegEx inside SQL Server.
