How to optionally add a comma and whitespace to a capture group?

Views: 83 | Published: 2020/5/21 21:47:38 | Tags: php, regex, optional, substring, capture-group


Problem description

I am trying to match five substrings in each block of text (there are 100 blocks in total).

My pattern matches 99% of the blocks, but a few of them come out wrong in groups 3 and 4.

Here is a demo link: https://regex101.com/r/cW2Is3/4

Group 3 is the "parts of speech", and group 4 is an English translation.

In the first block of text, "det, pro" should all be in group 3, and "the; him, her, it, them" should be in group 4.

The same issue occurs again in the third block of text.
Group 3 should be "adj, det, nm, pro" and group 4 should be "a, an, one".

Here is my pattern:

([0-9]+)\s+(\w+(?:, \w+)?)\s+(\N+?)\s+(\H.+).*?\r?\n•\s+([\s\S]*?)\s+[0-9]+\s\|.*\s*
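The mismatch described above can be reproduced on a single header line. The sketch below is in Python rather than PHP, so the PCRE-only escapes are replaced with equivalents that Python's `re` accepts (`\N` becomes `[^\n]`, `\H` becomes `[^ \t]`), and only the first-line part of the pattern is used. The headword "le" in the sample line is an assumption; the question only quotes the tags and the translation.

```python
import re

# First-line portion of the pattern above, with PCRE-only escapes swapped
# for Python equivalents: \N -> [^\n], \H -> [^ \t].
HEAD_RE = re.compile(r"([0-9]+)\s+(\w+(?:, \w+)?)\s+([^\n]+?)\s+([^ \t].+)")

# Hypothetical header line; the headword "le" is assumed for illustration.
line = "1 le det, pro the; him, her, it, them"

m = HEAD_RE.match(line)
print(repr(m.group(3)))  # the lazy group stops at the first whitespace
print(repr(m.group(4)))  # the rest of the tag list leaks into group 4
```

Because group 3 is lazy and is followed by `\s+`, it ends at the first whitespace that lets the rest of the pattern succeed, which is right after the first tag and its comma.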

Recommended answer

When you have to describe a long string with many parts, the first reflex should be to use free-spacing mode (the x modifier) and named groups. Even if named groups aren't very useful in a replacement context, they make the pattern more readable and easier to debug:

~^
(?<No> [0-9]+ )  \h+
(?<word> \pL+ )  \h+
(?<type> [\pL()]+ (?: , \h* [\pL()]+ )* )  \h+
(?<wd_tr> [^•]* [^•\s] )  \h* \R

• \h*
(?<sent_fr> [^–]* [^\s–] )   \s* – \s*
(?<sent_eng> .* (?:\R .*)*? )  \h* \R

(?<num1> [0-9]+ )  \h* \| \h*
(?<num2> .*\S )
~xum
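As a rough illustration of how this pattern behaves, here is a Python adaptation applied to two made-up blocks. Several things are assumptions rather than part of the answer: Python's `re` has no `\h`, `\R`, or `\pL`, so they are approximated with `[ \t]`, `\r?\n`, and `[^\W\d_]`; the `u` modifier is dropped because Python `str` patterns are already Unicode-aware; and the sample text (headwords, sentences, counts) is invented around the tag lists and translations quoted in the question.

```python
import re

# Python adaptation of the PCRE pattern above (assumed equivalents):
#   \h -> [ \t]     \R -> \r?\n     \pL -> [^\W\d_]
BLOCK_RE = re.compile(r"""
    ^
    (?P<No>       [0-9]+ )                       [ \t]+
    (?P<word>     [^\W\d_]+ )                    [ \t]+
    (?P<type>     (?:[^\W\d_]|[()])+
                  (?: , [ \t]* (?:[^\W\d_]|[()])+ )* )   [ \t]+
    (?P<wd_tr>    [^•]* [^•\s] )                 [ \t]* \r?\n
    • [ \t]*
    (?P<sent_fr>  [^–]* [^\s–] )                 \s* – \s*
    (?P<sent_eng> .* (?: \r?\n .* )*? )          [ \t]* \r?\n
    (?P<num1>     [0-9]+ )                       [ \t]* \| [ \t]*
    (?P<num2>     .*\S )
""", re.VERBOSE | re.MULTILINE)

# Hypothetical input: only the tag lists and translations come from the
# question; headwords, example sentences and numbers are invented.
SAMPLE = (
    "1 le det, pro the; him, her, it, them\n"
    "• vive la politique, vive l'amour – long live politics, long live love\n"
    "1 | 1050561\n"
    "\n"
    "3 un adj, det, nm, pro a, an, one\n"
    "• je me suis cassé un ongle – I broke one of my nails\n"
    "3 | 1015\n"
)

rows = [m.groupdict() for m in BLOCK_RE.finditer(SAMPLE)]
for row in rows:
    print(row["No"], "|", row["type"], "|", row["wd_tr"])
```

The part that addresses the question is the `type` group: one tag, followed by zero or more repetitions of a comma, optional horizontal whitespace, and another tag. The mandatory whitespace after the group is then the only place where the tag list can end.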


There is no magic recipe for building a pattern for a string with a fuzzy format. All you can do is be as restrictive as possible at the start and add flexibility when you encounter cases that don't match.
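One way to picture this advice, sketched in Python with simplified header-line patterns that are assumptions and not taken from the answer: start with a tag group that accepts a single word, then loosen it only once a real line fails to match. Both sample lines are hypothetical.

```python
import re

LETTERS = r"[^\W\d_]+"  # stand-in for PCRE \pL+ (assumed equivalent)

# Step 1: strictest guess, exactly one part-of-speech tag.
strict = re.compile(
    rf"^(\d+)[ \t]+({LETTERS})[ \t]+({LETTERS})[ \t]+(.*\S)$"
)

# Step 2: loosened after meeting a line with several tags:
# allow ", tag" to repeat any number of times inside group 3.
loose = re.compile(
    rf"^(\d+)[ \t]+({LETTERS})[ \t]+({LETTERS}(?:,[ \t]*{LETTERS})*)[ \t]+(.*\S)$"
)

# Hypothetical header lines used only for illustration.
single_tag = "2 de prep of, from"
multi_tag = "1 le det, pro the; him, her, it, them"

print(strict.match(single_tag).groups())  # strict pattern is enough here
print(strict.match(multi_tag))            # no match: time to loosen
print(loose.match(multi_tag).groups())    # group 3 now holds every tag
```

A failed match from the strict version points directly at the line that needs a more flexible rule, which is easier to debug than a permissive pattern that silently puts text in the wrong group.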
