SQL SERVER使用Regex模式生成数据 [英] SQL SERVER generate data using Regex pattern

查看:87
本文介绍了SQL SERVER使用Regex模式生成数据的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我想通过SQL Server中的给定正则表达式模式生成数据.有没有可能做?说,我的模式如下,我想生成如下数据:

该概念背后的想法是 SQL静态数据屏蔽 (已在

输出应为

  +---------------+--------------+
  |  RandonNumber | RandomString |
  +---------------+--------------+
  |  7894562      | AHJIL        |
  +---------------+--------------+
  |  9632587      | ZLOKP        |
  +---------------+--------------+
  |  4561238      | UJIOK        |
  +---------------+--------------+

除了这种常规模式外,我还有一些定制的模式,例如Test_Product_(\d){1,4},其结果应如下所示:

Test_Product_012 
Test_Product_143
Test_Product_8936

我将用于遮罩的完整图案

Other Patterns                Samples
(\l){30}                      ahukoklijfahukokponmahukoahuko
(\d){7}                       7895623
(\W){5}                       ABCDEF
Test_Product_(\d){1,4}        Test_Product_007
0\.(\d){2}                    0.59
https://www\.(\l){10}\.com    https://www.anything.com

好吧,我可以为您提供一个不基于正则表达式而是基于一组参数的解决方案-但是它包含您所有要求的完整集合.
我已将此解决方案基于我编写的用于生成随机字符串的用户定义函数(
SQL STATIC DATA MASKING (which was removed in current feature). Our client wants to mask the production data in test database. We don't have SQL STATIC DATA MASKING feature with sql now but we have patterns to mask the column, so what I am thinking is, with these pattern we can run the update query.

SELECT "(\d){7}" AS RandonNumber, "(\W){5}" AS RandomString FROM tbl

Output Should be

  +---------------+--------------+
  |  RandonNumber | RandomString |
  +---------------+--------------+
  |  7894562      | AHJIL        |
  +---------------+--------------+
  |  9632587      | ZLOKP        |
  +---------------+--------------+
  |  4561238      | UJIOK        |
  +---------------+--------------+

Apart from this regular pattern, I have some customized pattern like Test_Product_(\d){1,4}, which should give result as below:

Test_Product_012 
Test_Product_143
Test_Product_8936

Complete Patterns which I am going to use for masking

Other Patterns                Samples
(\l){30}                      ahukoklijfahukokponmahukoahuko
(\d){7}                       7895623
(\W){5}                       ABCDEF
Test_Product_(\d){1,4}        Test_Product_007
0\.(\d){2}                    0.59
https://www\.(\l){10}\.com    https://www.anything.com

解决方案

Well, I can give you a solution that is not based on regular expressions, but on a set of parameters - but it contains a complete set of all your requirements.
I've based this solution on a user-defined function I've written to generate random strings (You can read my blog post about it here) - I've just changed it so that it could generate the mask you wanted based on the following conditions:

  • The mask has an optional prefix.
  • The mask has an optional suffix.
  • The mask has a variable-length random string.
  • The random string can contain either lower-case letters, upper-case letters, digits, or any combination of the above.

I've decided these set of rules based on your update to the question, containing your desired masks:

(\d){7}                       7895623
(\W){5}                       ABCDEF
Test_Product_(\d){1,4}        Test_Product_007
0\.(\d){2}                    0.59
https://www\.(\l){10}\.com    https://www.anything.com

And now, for the code:
Since I'm using a user-defined function, I can't use inside it the NewId() built in function - so we first need to create a view to generate the guid for us:

CREATE VIEW GuidGenerator
AS
    SELECT Newid() As NewGuid;

In the function, we're going to use that view to generate a NewID() as the base of all randomness.

The function itself is a lot more cumbersome then the random string generator I've started from:

CREATE FUNCTION dbo.MaskGenerator
(
    -- use null or an empty string for no prefix
    @Prefix nvarchar(4000), 
    -- use null or an empty string for no suffix
    @suffix nvarchar(4000), 
    -- the minimum length of the random part
    @MinLength int, 
    -- the maximum length of the random part
    @MaxLength int, 
    -- the maximum number of rows to return. Note: up to 1,000,000 rows
    @Count int, 
    -- 1, 2 and 4 stands for lower-case, upper-case and digits. 
    -- a bitwise combination of these values can be used to generate all possible combinations:
    -- 3: lower and upper, 5: lower and digis, 6: upper and digits, 7: lower, upper nad digits
    @CharType tinyint 
)
RETURNS TABLE
AS 
RETURN 
-- An inline tally table with 1,000,000 rows
WITH E1(N) AS (SELECT N FROM (VALUES (1), (2), (3), (4), (5), (6), (7), (8), (9), (10)) V(N)),   -- 10
     E2(N) AS (SELECT 1 FROM E1 a, E1 b), --100
     E3(N) AS (SELECT 1 FROM E2 a, E2 b), --10,000
     Tally(N) AS (SELECT ROW_NUMBER() OVER (ORDER BY @@SPID) FROM E3 a, E2 b) --1,000,000

SELECT TOP(@Count) 
        n As Number, 
        CONCAT(@Prefix, (
        SELECT  TOP (Length) 
                -- choose what char combination to use for the random part
                CASE @CharType 
                    WHEN 1 THEN Lower
                    WHEN 2 THEN Upper
                    WHEN 3 THEN IIF(Rnd % 2 = 0, Lower, Upper)
                    WHEN 4 THEN Digit
                    WHEN 5 THEN IIF(Rnd % 2 = 0, Lower, Digit)
                    WHEN 6 THEN IIF(Rnd % 2 = 0, Upper, Digit)
                    WHEN 7 THEN 
                        CASE Rnd % 3
                            WHEN 0 THEN Lower
                            WHEN 1 THEN Upper
                            ELSE Digit
                        END
                END
        FROM Tally As t0  
        -- create a random number from the guid using the GuidGenerator view
        CROSS APPLY (SELECT Abs(Checksum(NewGuid)) As Rnd FROM GuidGenerator) As rand
        CROSS APPLY
        (
            -- generate a random lower-case char, upper-case char and digit
            SELECT  CHAR(97 + Rnd % 26) As Lower, -- Random lower case letter
                    CHAR(65 + Rnd % 26) As Upper,-- Random upper case letter
                    CHAR(48 + Rnd % 10) As Digit -- Random digit
        ) As Chars
        WHERE  t0.n <> -t1.n -- Needed for the subquery to get re-evaluated for each row
        FOR XML PATH('') 
        ), @Suffix) As RandomString
FROM Tally As t1
CROSS APPLY
(
    -- Select a random length between @MinLength and @MaxLength (inclusive)
    SELECT TOP 1 n As Length
    FROM Tally As t2
    CROSS JOIN GuidGenerator 
    WHERE t2.n >= @MinLength
    AND t2.n <= @MaxLength
    AND t2.n <> t1.n
    ORDER BY NewGuid
) As Lengths;

And finally, Test cases:

(\l){30} - ahukoklijfahukokponmahukoahuko

SELECT RandomString FROM dbo.MaskGenerator(null, null, 30, 30, 2, 1); 

Results:

1, eyrutkzdugogyhxutcmcmplvzofser
2, juuyvtzsvmmcdkngnzipvsepviepsp

(\d){7} - 7895623

SELECT RandomString FROM dbo.MaskGenerator(null, null, 7, 7, 2, 4); 

Results:

1, 8744412
2, 2275313

(\W){5} - ABCDE

SELECT RandomString FROM dbo.MaskGenerator(null, null, 5, 5, 2, 2); 

Results:

1, RSYJE
2, MMFAA

Test_Product_(\d){1,4} - Test_Product_007

SELECT RandomString FROM dbo.MaskGenerator('Test_Product_', null, 1, 4, 2, 4); 

Results:

1, Test_Product_933
2, Test_Product_7

0\.(\d){2} - 0.59

SELECT RandomString FROM dbo.MaskGenerator('0.', null, 2, 2, 2, 4); 

Results:

1, 0.68
2, 0.70

https://www\.(\l){10}\.com - https://www.anything.com

SELECT RandomString FROM dbo.MaskGenerator('https://www.', '.com', 10, 10, 2, 1); 

Results:

1, https://www.xayvkmkuci.com
2, https://www.asbfcvomax.com       

Here's how you use it to mask the content of a table:

DECLARE @Count int = 10; 

SELECT  CAST(IntVal.RandomString As Int) As IntColumn, 
        UpVal.RandomString as UpperCaseValue, 
        LowVal.RandomString as LowerCaseValue, 
        MixVal.RandomString as MixedValue,
        WithPrefix.RandomString As PrefixedValue
FROM dbo.MaskGenerator(null, null, 3, 7, @Count, 4) As IntVal
JOIN dbo.MaskGenerator(null, null, 10, 10, @Count, 1) As LowVal
    ON IntVal.Number = LowVal.Number
JOIN dbo.MaskGenerator(null, null, 5, 10, @Count, 2) As UpVal
    ON IntVal.Number = UpVal.Number
JOIN dbo.MaskGenerator(null, null, 10, 20, @Count, 7) As MixVal
    ON IntVal.Number = MixVal.Number
JOIN dbo.MaskGenerator('Test ', null, 1, 4, @Count, 4) As WithPrefix
    ON IntVal.Number = WithPrefix.Number

Results:

IntColumn   UpperCaseValue  LowerCaseValue  MixedValue              PrefixedValue
674         CCNVSDI         esjyyesesv      O2FAC7bfwg2Be5a91Q0     Test 4935
30732       UJKSL           jktisddbnq      7o8B91Sg1qrIZSvG3AcL    Test 0
4669472     HDLJNBWPJ       qgtfkjdyku      xUoLAZ4pAnpn            Test 8
26347       DNAKERR         vlehbnampb      NBv08yJdKb75ybhaFqED    Test 91
6084965     LJPMZMEU        ccigzyfwnf      MPxQ2t8jjmv0IT45yVcR    Test 4
6619851     FEHKGHTUW       wswuefehsp      40n7Ttg7H5YtVPF         Test 848
781         LRWKVDUV        bywoxqizju      UxIp2O4Jb82Ts           Test 6268
52237       XXNPBL          beqxrgstdo      Uf9j7tCB4W2             Test 43
876150      ZDRABW          fvvinypvqa      uo8zfRx07s6d0EP         Test 7

Note that this is a fast process - generating 1000 rows with 5 columns took less than half a second on average in tests I've made.

这篇关于SQL SERVER使用Regex模式生成数据的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆