Parsing line and changing some text in place

Question

How do I parse a log file (not a full XML file, but one that contains some XML data) for ExtData tags? Each tag contains a name-value pair, and I need to mask the value like this. For example:

<ExtData>Name="Jason" Value="Special"</ExtData>
to
<ExtData>Name="Jason" Value="XXXXXXX"</ExtData>

I need to mask the ExtData value as above only when Name is Jason or another name from a given set, not for every Name.

For example, if "DummyName" is not in the set of names, then I do not want to change the line below.

<ExtData>Name="DummyName" Value="Garbage"</ExtData>

For example, if "DummyName" is not in the set of names, then I do not want to change the line below. (Note that the value is "Jason".)

<ExtData>Name="DummyName" Value="Jason"</ExtData>

For example, if "DummyJasonName" is not in the set of names, then I do not want to change the line below. (Note the "Jason" between "Dummy" and "Name".)

<ExtData>Name="DummyJasonName" Value="Garbage"</ExtData>

I need to do all this in bash/shell script.

The bottom line is that I want to read a file with, say, sed, awk, or a match command, and check each line for an ExtData tag. If one is found, read the text between the ExtData tag and the /ExtData tag. In this (possibly multi-line) text, extract Name. If Name is in the set of names, mask its corresponding "Value" data with an equal number of 'X' characters.

Please let me know how to achieve the above task.

Update: the input can actually span multiple lines.

<ExtData>Name="Jason" 
Value="Special"
    </ExtData>

Or like this too:

<ExtData>
     Name="Jason" 
  Value="Special"
    </ExtData>

Thanks !! Puneet

Solution

To make the substitutions only for names Jason and Jim, try:

sed -E '/Jason|Jim/{:a; /Value=/bb; n; ba; :b; s/(Value="X*)[^X"]/\1X/; tb; }' file.xml

This command was tested on GNU sed. For BSD/OSX sed, some minor changes would be needed.
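As a sketch of what a more portable spelling might look like: BSD sed is generally stricter than GNU sed about what may follow a label on the same line, so one common workaround is to pass each command in its own -e option, which puts a newline after every label. The helper name mask_values is made up for illustration, and the claim that this covers the needed changes is an assumption; it has not been verified on every BSD/macOS release.

```shell
# mask_values is a hypothetical helper name, not part of the original answer.
# Each -e chunk is followed by an implicit newline, so labels such as ":a"
# are never followed by ";" or "}" on the same line.
mask_values() {
  sed -E \
    -e '/Jason|Jim/{' \
    -e ':a' \
    -e '/Value=/bb' \
    -e 'n' \
    -e 'ba' \
    -e ':b' \
    -e 's/(Value="X*)[^X"]/\1X/' \
    -e 'tb' \
    -e '}' \
    "$@"
}

# With no file argument, sed reads standard input.
printf '%s\n' '<ExtData>Name="Jason" Value="Special"</ExtData>' | mask_values
# prints: <ExtData>Name="Jason" Value="XXXXXXX"</ExtData>
```

The function accepts file names as arguments too, for example mask_values file.xml.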

Example

Let's consider this test file:

$ cat file.xml
<ExtData>Name="Jason" Value="Special"</ExtData>
<ExtData>Name="DummyName" Value="Garbage"</ExtData>
<ExtData>Name="Jim"
    Value="OK"
        </ExtData>

Now, let's run our command:

$ sed -E '/Jason|Jim/{:a; /Value=/bb; n; ba; :b; s/(Value="X*)[^X"]/\1X/; tb; }' file.xml
<ExtData>Name="Jason" Value="XXXXXXX"</ExtData>
<ExtData>Name="DummyName" Value="Garbage"</ExtData>
<ExtData>Name="Jim"
    Value="XX"
        </ExtData>

How it works

  • -E

    This tells sed to use extended regular expressions.

  • /Jason|Jim/{...}

    This tells sed to run the commands inside the curly braces only for lines that contain Jason or Jim. The commands inside the braces break down into two parts:

    1. :a; /Value=/bb; n; ba;

      The first part reads lines until we find one that contains Value=. In more detail, :a defines a label a. /Value=/bb branches to label b if the current line contains Value=. If it doesn't, we print out the current line and read in the next one using the n command. We then branch (b) back to label a.

    2. :b; s/(Value="X*)[^X"]/\1X/; tb;

      This replaces the value with as many X characters as it needs.

      In more detail, :b defines a label b. s/(Value="X*)[^X"]/\1X/ substitutes in the next X that we need after Value=. If a substitution was made (meaning that another X was needed), then the test command (t) tells sed to jump back to label b and we try again.
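To watch the second part work one character at a time, the loop can be run on its own. This is only an illustration: the helper name show_mask_steps is invented here, and the -n option together with the p flag on the substitution is added so that sed prints the line after every successful substitution instead of once at the end.

```shell
# show_mask_steps is a hypothetical helper used only to visualise the loop.
# -n suppresses the automatic print; the "p" flag prints the pattern space
# after each successful substitution, i.e. once per pass through label b.
show_mask_steps() {
  sed -n -E \
    -e ':b' \
    -e 's/(Value="X*)[^X"]/\1X/p' \
    -e 'tb' \
    "$@"
}

printf '%s\n' 'Value="abc"' | show_mask_steps
# prints:
# Value="Xbc"
# Value="XXc"
# Value="XXX"
```

Each pass keeps the X characters already written (matched by X*) and turns the next character that is neither X nor a closing quote into X. Characters in the value that are already X need no substitution, so the length of the value is preserved.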

Restricting changes to within ExtData tags

Let's consider this more complex test file:

$ cat file2.xml
<Misc>Name="Jason" Value="DontChange"</Misc>
<ExtData>Name="Jason" Value="Special"</ExtData>
<Misc>Name="Jason" Value="DontChange"</Misc>
<ExtData>Name="DummyName" Value="DontChange"</ExtData>
<Misc>Name="Jason" Value="DontChange"</Misc>
<ExtData>Name="Jim"
    Value="OK"
        </ExtData>
<Misc>Name="Jason" Value="DontChange"</Misc>

To make the changes in ExtData tags but not the other tags, try:

$ sed -E '/[<]ExtData[>]/{:a; /Name=/{/Name="(Jason|Jim)"/!b}; /Value=/bb; n; ba; :b; s/(Value="X*)[^X"]/\1X/; tb; }' file2.xml
<Misc>Name="Jason" Value="DontChange"</Misc>
<ExtData>Name="Jason" Value="XXXXXXX"</ExtData>
<Misc>Name="Jason" Value="DontChange"</Misc>
<ExtData>Name="DummyName" Value="DontChange"</ExtData>
<Misc>Name="Jason" Value="DontChange"</Misc>
<ExtData>Name="Jim"
    Value="XX"
        </ExtData>
<Misc>Name="Jason" Value="DontChange"</Misc>
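The edge cases from the question can be checked against this command in the same way. The sketch below wraps it in a function whose name, mask_extdata, is made up for illustration, and writes one sed command per line; the logic is intended to be the same as the one-liner above.

```shell
# mask_extdata is a hypothetical wrapper around the command above.
mask_extdata() {
  sed -E '
    /[<]ExtData[>]/{
      :a
      # On a line with Name=, give up unless the whole name is Jason or Jim.
      /Name=/{
        /Name="(Jason|Jim)"/!b
      }
      # Once a line with Value= is found, go and mask it.
      /Value=/bb
      # Otherwise print this line, read the next one, and look again.
      n
      ba
      :b
      s/(Value="X*)[^X"]/\1X/
      tb
    }' "$@"
}

printf '%s\n' \
  '<ExtData>Name="DummyName" Value="Jason"</ExtData>' \
  '<ExtData>Name="DummyJasonName" Value="Garbage"</ExtData>' \
  '<ExtData>' \
  '     Name="Jason"' \
  '  Value="Special"' \
  '    </ExtData>' | mask_extdata
```

The first two lines should come out unchanged, because the quoted name has to match in full, and only the Value line of the multi-line block should be masked. Note that the loop keeps reading until it reaches a line containing Value=, so it assumes every matching ExtData block has one.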

To do the above using a shell variable for the names:

names='Jason|Jim'
sed -E '/[<]ExtData[>]/{:a; /Name=/{/Name="('"$names"')"/!b}; /Value=/bb; n; ba; :b; s/(Value="X*)[^X"]/\1X/; tb; }' file2.xml

This substitutes the shell variable directly into the sed command, where its contents are interpreted as part of the sed script. Do this only if you trust the source of the shell variable.
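If the names come from somewhere less trusted, one possible precaution is to validate them before they reach sed. The sketch below is an assumption-laden illustration, not part of the original answer: the helper name mask_names is invented, names are passed as separate arguments rather than as one string, and the rule that a name may contain only letters, digits, and underscores is a guess about what names look like.

```shell
# mask_names is a hypothetical helper. Usage: mask_names NAME... < input
# Assumption: a valid name consists only of letters, digits, and underscores.
mask_names() {
  alternation=
  for name in "$@"; do
    case $name in
      ''|*[!A-Za-z0-9_]*)
        echo "mask_names: refusing unsafe name: $name" >&2
        return 1 ;;
    esac
    # Join the accepted names with "|" to form the regex alternation.
    if [ -z "$alternation" ]; then
      alternation=$name
    else
      alternation="$alternation|$name"
    fi
  done
  if [ -z "$alternation" ]; then
    echo "mask_names: no names given" >&2
    return 1
  fi
  sed -E '
    /[<]ExtData[>]/{
      :a
      /Name=/{
        /Name="('"$alternation"')"/!b
      }
      /Value=/bb
      n
      ba
      :b
      s/(Value="X*)[^X"]/\1X/
      tb
    }'
}

printf '%s\n' \
  '<ExtData>Name="Jason" Value="Special"</ExtData>' \
  '<ExtData>Name="JasonX" Value="Keep"</ExtData>' | mask_names Jason Jim
```

Because every name is checked before the script is assembled, a value containing quotes, slashes, or semicolons is rejected instead of being spliced into the sed program.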
