Force parsing optional groups
Problem description
I'm trying to write a regex string that extracts data from report files. The tricky part is that this single regex has to match several report file content formats, so I want it to match even when some optional groups are not found.
Take the following report file contents (note: #2 is missing the "val2" part):
- File #1: "-val1-test-val2-result-val3-done-"
  - Expected Result:
    - Val1 Group: test
    - Val2 Group: result
    - Val3 Group: done
- File #2: "-val1-test-val3-done-"
  - Expected Result:
    - Val1 Group: test
    - Val2 Group: (empty)
    - Val3 Group: done
I tried the following regex strings:
Regex #1(Normal): "-val1-(?<val1>.+?)-val2-(?<val2>.+?)-val3-(?<val3>.+?)-"
Problem: File #1 works fine, but on File #2 the regex does not match, so I don't get any group values.
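The failure described for Regex #1 can be reproduced with a small Java check (Java being the stated port target). This is only a sketch: the class name `RegexOneCheck` and the helper `val2Of` are made up for illustration, and the File #2 string is an assumption derived from the note that it lacks the "val2" part.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexOneCheck {
    // Regex #1 from the question: every "-valN-" marker is mandatory.
    static final Pattern REGEX_1 = Pattern.compile(
            "-val1-(?<val1>.+?)-val2-(?<val2>.+?)-val3-(?<val3>.+?)-");

    // Returns the val2 capture, or null when the pattern does not match at all.
    static String val2Of(String report) {
        Matcher m = REGEX_1.matcher(report);
        return m.find() ? m.group("val2") : null;
    }

    public static void main(String[] args) {
        String file1 = "-val1-test-val2-result-val3-done-";
        String file2 = "-val1-test-val3-done-"; // assumed content of File #2
        System.out.println(val2Of(file1));
        System.out.println(val2Of(file2));
    }
}
```

Because the literal "-val2-" is required, the second input cannot match anywhere, so no group values are available for it.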
Regex #2 (Non greedy): "-val1-(?<val1>.+?)(-val2-(?<val2>.+?))?-val3-(?<val3>.+?)-"
Regex #3 (Boolean OR): "-val1-(?<val1>.+?)(-val2-(?<val2>.+?)|(.*?))-val3-(?<val3>.+?)-"
Regex #4 (Conditional): "-val1-(?<val1>.+?)(?(-val2-(?<val2>.+?))|(.+?))-val3-(?<val3>.+?)-"
Regex #5 (Conditional): "-val1-(?<val1>.+?)(?(-val2-(?<val2>.+?))(-val2-(?<val2>.+?)))-val3-(?<val3>.+?)-"
Regex #6 (Conditional): "-val1-(?<val1>.+?)(?(-val2-(?<val2>.+?))(-val2-(?<val2>.+?))|(.+?))-val3-(?<val3>.+?)-"
Problem: File #2 works as expected, but the val2 group of File #1 is always empty.
Conclusion: The behavior seems to be that even when an optional group is present, the regex prioritizes an empty group value over the value that is there. Is there a way to force the optional groups to capture their value when they are present, and only return (empty) when they are not?
Note: I'm using the latest .NET Framework, and the code will be ported to Java (Android). I'm trying to avoid using multiple operations, for performance and bandwidth reasons.
Can anyone help me?
Recommended answer
It is possible if we make some assumptions:
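- Values may be missing, but they always appear in the same order
- The first value is always present
- The parts we are looking for have a delimiter before and after them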
-val1-([^-]+)(?:-val2-([^-]+)|)(?:-val3-([^-]+)|)-
https://regex101.com/r/yY6vF9/1
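As a rough illustration, the pattern above can be tried in Java against both report strings. The class name `OptionalGroupExtractor` and the method `extract` are hypothetical, and the second input is an assumed File #2 (the first file with its "val2" part removed). In Java, `Matcher.group` returns null for a group that did not take part in the match, so this sketch maps null to an empty string to get the "(empty)" result the question asks for.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OptionalGroupExtractor {
    // The answer's pattern, unchanged: each optional part is "(?:...|)",
    // meaning either the delimited value or nothing.
    static final Pattern REPORT = Pattern.compile(
            "-val1-([^-]+)(?:-val2-([^-]+)|)(?:-val3-([^-]+)|)-");

    // Returns {val1, val2, val3}; a group that did not participate in the
    // match is reported as "". Returns null if the pattern does not match.
    static String[] extract(String report) {
        Matcher m = REPORT.matcher(report);
        if (!m.find()) {
            return null;
        }
        String[] values = new String[3];
        for (int i = 0; i < 3; i++) {
            String g = m.group(i + 1);
            values[i] = (g == null) ? "" : g;
        }
        return values;
    }

    public static void main(String[] args) {
        String file1 = "-val1-test-val2-result-val3-done-";
        String file2 = "-val1-test-val3-done-"; // assumed content of File #2
        System.out.println(String.join(" | ", extract(file1)));
        System.out.println(String.join(" | ", extract(file2)));
    }
}
```

The pattern works because `[^-]+` cannot run past the next delimiter, so the first value cannot swallow the "-val2-" marker, and the alternation tries the non-empty branch before falling back to the empty one. This relies on the answer's assumption that the values themselves never contain "-".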