PowerShell - Find and replace multiple patterns to anonymize file

Question

I need your help. I have a log.txt file containing various data that I have to anonymize. I would like to find all the "strings" that match predefined patterns and replace each of them with another value. What is important is that each new string matching the same pattern (with a value different from the previous ones) should be replaced by the predefined value incremented by 1 (e.g. "orderID = 123ABC" becomes "orderID = order1" and "orderID = 456ABC" becomes "orderID = order2").
There are more than 20 patterns to search for, so it is not practical to put them all on a single line. My idea is:

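  1. Define a "patterns.txt" file
  2. Define a "replace.txt" file (the "pattern" value and its replacement value)
  3. Search for all "patterns" in the log file; the result will be an array
  4. Find the unique entries in that array
  5. Get the "replace" value for each unique entry in the array
  6. Replace all occurrences in log.txt. The tricky part is that any occurrence of the same type (but with a value different from the previous ones) needs an incremented number (+1) so that it differs from the earlier ones.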

Example of what I have:

<requestID>qwerty1-qwerty2-qwerty3</requestID>
<requestID>12345a-12345b-12345c</requestID>
<requestID>qwerty1-qwerty2-qwerty3</requestID>
<requestID>qwerty1-qwerty2-qwerty3</requestID>
<orderID>012345ABCDE</orderID>
<orderID>012345ABCDE</orderID>
<orderID>ABCDE012345</orderID>
<orderID>ABCDE012345</orderID>
<keyId>XYZ123</keyId>
<keyId>ABC987</keyId>
<keyId>XYZ123</keyId>

Desired result:

<requestID>Request-1</requestID>
<requestID>Request-2</requestID>
<requestID>Request-1</requestID>
<requestID>Request-1</requestID>
<orderID>Order-1</orderID>
<orderID>Order-1</orderID>
<orderID>Order-2</orderID>
<orderID>Order-2</orderID>
<keyId>Key-1</keyId>
<keyId>Key-2</keyId>
<keyId>Key-1</keyId>

For the moment I managed only to find the unique values per type:

$N = "C:\FindAndReplace\input.txt"
$Patterns = "C:\FindAndReplace\pattern.txt"
(Select-String $N -Pattern '<requestID>\w{6}-\w{6}-\w{6}</requestID>').Matches.Value | Sort-Object -Descending -Unique
(Select-String $N -Pattern '<orderID>\w{20}</orderID>').Matches.Value | Sort-Object -Descending -Unique
(Select-String $N -Pattern '<keyId>\w{8}</keyId>').Matches.Value | Sort-Object -Descending -Unique

Thanks in advance for any suggestion on how to progress.

Recommended answer

Your patterns don't match your sample data. I've corrected the patterns to accommodate the actual sample data.

A simple hash table per type seems enough to keep track of matches and counts. If we process the log file with a switch statement using the -Regex and -File parameters, we can work on one line at a time. The logic for each type is:

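  • Check whether the current match already exists in the matches array of the specific type.
    • If not, add it with a replacement value (type-count) and increment the count.
    • If it does exist, use the replacement value that was already defined.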
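
The check-then-assign logic in the list above can be sketched in isolation. This is a minimal illustration rather than the answer's exact code: it keeps one hashtable per type instead of an array, and the function name `Get-Placeholder` and its parameters are hypothetical names introduced only for this example.

```powershell
# Hypothetical helper: returns the placeholder for a value,
# creating a new numbered placeholder the first time the value is seen.
function Get-Placeholder {
    param(
        [hashtable]$Seen,   # maps original value -> placeholder, for one type
        [string]$Prefix,    # e.g. 'Order'
        [string]$Value      # the captured original value
    )
    if (-not $Seen.ContainsKey($Value)) {
        # First occurrence: the next number is the current entry count plus one.
        $Seen[$Value] = "$Prefix-$($Seen.Count + 1)"
    }
    # Known value: reuse the placeholder assigned earlier.
    $Seen[$Value]
}

$orders = @{}
$first  = Get-Placeholder -Seen $orders -Prefix 'Order' -Value '012345ABCDE'
$second = Get-Placeholder -Seen $orders -Prefix 'Order' -Value 'ABCDE012345'
$again  = Get-Placeholder -Seen $orders -Prefix 'Order' -Value '012345ABCDE'
"$first $second $again"   # Order-1 Order-2 Order-1
```

Note that a hashtable created with `@{}` compares keys case-insensitively, so values that differ only in case would share one placeholder in this sketch.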

Create a sample log file

    $log = New-TemporaryFile
    
    @'
    <requestID>qwerty1-qwerty2-qwerty3</requestID>
    <requestID>12345a-12345b-12345c</requestID>
    <requestID>qwerty1-qwerty2-qwerty3</requestID>
    <requestID>qwerty1-qwerty2-qwerty3</requestID>
    <orderID>012345ABCDE</orderID>
    <orderID>012345ABCDE</orderID>
    <orderID>ABCDE012345</orderID>
    <orderID>ABCDE012345</orderID>
    <keyId>XYZ123</keyId>
    <keyId>ABC987</keyId>
    <keyId>XYZ123</keyId>
    '@ | Set-Content $log -Encoding UTF8
    

Define a "tracker" variable for each type, containing a count and an array of matches

    $Request = @{
        Count   = 1
        Matches = @()
    }
    $Order = @{
        Count   = 1
        Matches = @()
    }
    $Key = @{
        Count   = 1
        Matches = @()
    }
    

Read and process the log file line by line

    $output = switch -Regex -File $log {
        '<requestID>(\w{6,7}-\w{6,7}-\w{6,7})</requestID>' {
            if(!$Request.matches.($matches.1))
            {
                $Request.matches += @{$matches.1 = "Request-$($Request.count)"}
                $Request.count++
            }
            $_ -replace $matches.1,$Request.matches.($matches.1)
        }
        '<orderID>(\w{11})</orderID>' {
            if(!$Order.matches.($matches.1))
            {
                $Order.matches += @{$matches.1 = "Order-$($Order.count)"}
                $Order.count++
            }
            $_ -replace $matches.1,$Order.matches.($matches.1)
        }
        '<keyId>(\w{6})</keyId>' {
            if(!$Key.matches.($matches.1))
            {
                $Key.matches += @{$matches.1 = "Key-$($Key.count)"}
                $Key.count++
            }
            $_ -replace $matches.1,$Key.matches.($matches.1)
        }
        default {$_}
    }
    
    $output | Set-Content $log -Encoding UTF8
    

The $log file now contains

    <requestID>Request-1</requestID>
    <requestID>Request-2</requestID>
    <requestID>Request-1</requestID>
    <requestID>Request-1</requestID>
    <orderID>Order-1</orderID>
    <orderID>Order-1</orderID>
    <orderID>Order-2</orderID>
    <orderID>Order-2</orderID>
    <keyId>Key-1</keyId>
    <keyId>Key-2</keyId>
    <keyId>Key-1</keyId>
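
Since the question mentions more than 20 patterns, a table-driven variant may be easier to maintain than one switch clause per type. The following is only a sketch under stated assumptions: the `$rules` table, its `Prefix`/`Regex`/`Seen` keys, and the lookaround-based patterns are illustrative choices that are not part of the answer above, and the sample lines are a subset of the log shown earlier. It relies on the .NET `Regex.Replace` overload that accepts a `MatchEvaluator`.

```powershell
# Hypothetical rule table: one entry per type; add an entry for each additional pattern.
# The lookarounds match only the value between the tags, so the tags stay untouched.
$rules = @(
    @{ Prefix = 'Request'; Regex = [regex]'(?<=<requestID>)[\w-]+(?=</requestID>)'; Seen = @{} }
    @{ Prefix = 'Order';   Regex = [regex]'(?<=<orderID>)\w+(?=</orderID>)';       Seen = @{} }
    @{ Prefix = 'Key';     Regex = [regex]'(?<=<keyId>)\w+(?=</keyId>)';           Seen = @{} }
)

# Sample lines taken from the log above.
$lines = @(
    '<requestID>qwerty1-qwerty2-qwerty3</requestID>'
    '<requestID>12345a-12345b-12345c</requestID>'
    '<requestID>qwerty1-qwerty2-qwerty3</requestID>'
    '<orderID>012345ABCDE</orderID>'
    '<keyId>XYZ123</keyId>'
    '<keyId>ABC987</keyId>'
)

$result = foreach ($line in $lines) {
    foreach ($rule in $rules) {
        $seen   = $rule.Seen
        $prefix = $rule.Prefix
        # The script block runs once per match and returns the replacement text.
        $line = $rule.Regex.Replace($line, [System.Text.RegularExpressions.MatchEvaluator]{
            param($m)
            if (-not $seen.ContainsKey($m.Value)) {
                # New value for this type: assign the next numbered placeholder.
                $seen[$m.Value] = "$prefix-$($seen.Count + 1)"
            }
            $seen[$m.Value]
        })
    }
    $line
}
$result
```

To process a real file, `$lines` could be replaced with `Get-Content` on the log and `$result` written back with `Set-Content`, as in the answer. Unlike the `switch -Regex` approach, a `[regex]` object matches case-sensitively by default, so the tag names in the patterns must match the log exactly.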
    
