SQL SERVER使用Regex模式生成数据 [英] SQL SERVER generate data using Regex pattern
问题描述
我想通过SQL Server
中的给定正则表达式模式生成数据.有没有可能做?说,我的模式如下,我想生成如下数据:
该概念背后的想法是 SQL静态数据屏蔽 (已在 输出应为 除了这种常规模式外,我还有一些定制的模式,例如 我将用于遮罩的完整图案
好吧,我可以为您提供一个不基于正则表达式而是基于一组参数的解决方案-但是它包含您所有要求的完整集合. Output Should be Apart from this regular pattern, I have some customized pattern like Complete Patterns which I am going to use for masking
Well, I can give you a solution that is not based on regular expressions, but on a set of parameters - but it contains a complete set of all your requirements. I've decided these set of rules based on your update to the question, containing your desired masks: +---------------+--------------+
| RandonNumber | RandomString |
+---------------+--------------+
| 7894562 | AHJIL |
+---------------+--------------+
| 9632587 | ZLOKP |
+---------------+--------------+
| 4561238 | UJIOK |
+---------------+--------------+
Test_Product_(\d){1,4}
,其结果应如下所示:Test_Product_012
Test_Product_143
Test_Product_8936
Other Patterns Samples
(\l){30} ahukoklijfahukokponmahukoahuko
(\d){7} 7895623
(\W){5} ABCDEF
Test_Product_(\d){1,4} Test_Product_007
0\.(\d){2} 0.59
https://www\.(\l){10}\.com https://www.anything.com
我已将此解决方案基于我编写的用于生成随机字符串的用户定义函数(SQL STATIC DATA MASKING (which was removed in current feature). Our client wants to mask the production data in test database. We don't have SQL STATIC DATA MASKING feature with sql now but we have patterns to mask the column, so what I am thinking is, with these pattern we can run the update query.SELECT "(\d){7}" AS RandonNumber, "(\W){5}" AS RandomString FROM tbl
+---------------+--------------+
| RandonNumber | RandomString |
+---------------+--------------+
| 7894562 | AHJIL |
+---------------+--------------+
| 9632587 | ZLOKP |
+---------------+--------------+
| 4561238 | UJIOK |
+---------------+--------------+
Test_Product_(\d){1,4}
, which should give result as below:Test_Product_012
Test_Product_143
Test_Product_8936
Other Patterns Samples
(\l){30} ahukoklijfahukokponmahukoahuko
(\d){7} 7895623
(\W){5} ABCDEF
Test_Product_(\d){1,4} Test_Product_007
0\.(\d){2} 0.59
https://www\.(\l){10}\.com https://www.anything.com
I've based this solution on a user-defined function I've written to generate random strings (You can read my blog post about it here) - I've just changed it so that it could generate the mask you wanted based on the following conditions:
(\d){7} 7895623
(\W){5} ABCDEF
Test_Product_(\d){1,4} Test_Product_007
0\.(\d){2} 0.59
https://www\.(\l){10}\.com https://www.anything.com
And now, for the code:
Since I'm using a user-defined function, I can't use inside it the NewId()
built in function - so we first need to create a view to generate the guid for us:
CREATE VIEW GuidGenerator
AS
SELECT Newid() As NewGuid;
In the function, we're going to use that view to generate a NewID()
as the base of all randomness.
The function itself is a lot more cumbersome then the random string generator I've started from:
CREATE FUNCTION dbo.MaskGenerator
(
-- use null or an empty string for no prefix
@Prefix nvarchar(4000),
-- use null or an empty string for no suffix
@suffix nvarchar(4000),
-- the minimum length of the random part
@MinLength int,
-- the maximum length of the random part
@MaxLength int,
-- the maximum number of rows to return. Note: up to 1,000,000 rows
@Count int,
-- 1, 2 and 4 stands for lower-case, upper-case and digits.
-- a bitwise combination of these values can be used to generate all possible combinations:
-- 3: lower and upper, 5: lower and digis, 6: upper and digits, 7: lower, upper nad digits
@CharType tinyint
)
RETURNS TABLE
AS
RETURN
-- An inline tally table with 1,000,000 rows
WITH E1(N) AS (SELECT N FROM (VALUES (1), (2), (3), (4), (5), (6), (7), (8), (9), (10)) V(N)), -- 10
E2(N) AS (SELECT 1 FROM E1 a, E1 b), --100
E3(N) AS (SELECT 1 FROM E2 a, E2 b), --10,000
Tally(N) AS (SELECT ROW_NUMBER() OVER (ORDER BY @@SPID) FROM E3 a, E2 b) --1,000,000
SELECT TOP(@Count)
n As Number,
CONCAT(@Prefix, (
SELECT TOP (Length)
-- choose what char combination to use for the random part
CASE @CharType
WHEN 1 THEN Lower
WHEN 2 THEN Upper
WHEN 3 THEN IIF(Rnd % 2 = 0, Lower, Upper)
WHEN 4 THEN Digit
WHEN 5 THEN IIF(Rnd % 2 = 0, Lower, Digit)
WHEN 6 THEN IIF(Rnd % 2 = 0, Upper, Digit)
WHEN 7 THEN
CASE Rnd % 3
WHEN 0 THEN Lower
WHEN 1 THEN Upper
ELSE Digit
END
END
FROM Tally As t0
-- create a random number from the guid using the GuidGenerator view
CROSS APPLY (SELECT Abs(Checksum(NewGuid)) As Rnd FROM GuidGenerator) As rand
CROSS APPLY
(
-- generate a random lower-case char, upper-case char and digit
SELECT CHAR(97 + Rnd % 26) As Lower, -- Random lower case letter
CHAR(65 + Rnd % 26) As Upper,-- Random upper case letter
CHAR(48 + Rnd % 10) As Digit -- Random digit
) As Chars
WHERE t0.n <> -t1.n -- Needed for the subquery to get re-evaluated for each row
FOR XML PATH('')
), @Suffix) As RandomString
FROM Tally As t1
CROSS APPLY
(
-- Select a random length between @MinLength and @MaxLength (inclusive)
SELECT TOP 1 n As Length
FROM Tally As t2
CROSS JOIN GuidGenerator
WHERE t2.n >= @MinLength
AND t2.n <= @MaxLength
AND t2.n <> t1.n
ORDER BY NewGuid
) As Lengths;
And finally, Test cases:
(\l){30} - ahukoklijfahukokponmahukoahuko
SELECT RandomString FROM dbo.MaskGenerator(null, null, 30, 30, 2, 1);
Results:
1, eyrutkzdugogyhxutcmcmplvzofser
2, juuyvtzsvmmcdkngnzipvsepviepsp
(\d){7} - 7895623
SELECT RandomString FROM dbo.MaskGenerator(null, null, 7, 7, 2, 4);
Results:
1, 8744412
2, 2275313
(\W){5} - ABCDE
SELECT RandomString FROM dbo.MaskGenerator(null, null, 5, 5, 2, 2);
Results:
1, RSYJE
2, MMFAA
Test_Product_(\d){1,4} - Test_Product_007
SELECT RandomString FROM dbo.MaskGenerator('Test_Product_', null, 1, 4, 2, 4);
Results:
1, Test_Product_933
2, Test_Product_7
0\.(\d){2} - 0.59
SELECT RandomString FROM dbo.MaskGenerator('0.', null, 2, 2, 2, 4);
Results:
1, 0.68
2, 0.70
https://www\.(\l){10}\.com - https://www.anything.com
SELECT RandomString FROM dbo.MaskGenerator('https://www.', '.com', 10, 10, 2, 1);
Results:
1, https://www.xayvkmkuci.com
2, https://www.asbfcvomax.com
Here's how you use it to mask the content of a table:
DECLARE @Count int = 10;
SELECT CAST(IntVal.RandomString As Int) As IntColumn,
UpVal.RandomString as UpperCaseValue,
LowVal.RandomString as LowerCaseValue,
MixVal.RandomString as MixedValue,
WithPrefix.RandomString As PrefixedValue
FROM dbo.MaskGenerator(null, null, 3, 7, @Count, 4) As IntVal
JOIN dbo.MaskGenerator(null, null, 10, 10, @Count, 1) As LowVal
ON IntVal.Number = LowVal.Number
JOIN dbo.MaskGenerator(null, null, 5, 10, @Count, 2) As UpVal
ON IntVal.Number = UpVal.Number
JOIN dbo.MaskGenerator(null, null, 10, 20, @Count, 7) As MixVal
ON IntVal.Number = MixVal.Number
JOIN dbo.MaskGenerator('Test ', null, 1, 4, @Count, 4) As WithPrefix
ON IntVal.Number = WithPrefix.Number
Results:
IntColumn UpperCaseValue LowerCaseValue MixedValue PrefixedValue
674 CCNVSDI esjyyesesv O2FAC7bfwg2Be5a91Q0 Test 4935
30732 UJKSL jktisddbnq 7o8B91Sg1qrIZSvG3AcL Test 0
4669472 HDLJNBWPJ qgtfkjdyku xUoLAZ4pAnpn Test 8
26347 DNAKERR vlehbnampb NBv08yJdKb75ybhaFqED Test 91
6084965 LJPMZMEU ccigzyfwnf MPxQ2t8jjmv0IT45yVcR Test 4
6619851 FEHKGHTUW wswuefehsp 40n7Ttg7H5YtVPF Test 848
781 LRWKVDUV bywoxqizju UxIp2O4Jb82Ts Test 6268
52237 XXNPBL beqxrgstdo Uf9j7tCB4W2 Test 43
876150 ZDRABW fvvinypvqa uo8zfRx07s6d0EP Test 7
Note that this is a fast process - generating 1000 rows with 5 columns took less than half a second on average in tests I've made.
这篇关于SQL SERVER使用Regex模式生成数据的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!