How to get only the first match of a RegEx (UiPath Studio RegEx Based Extractor)
Problem description
I extracted the following text from a PDF using UiPath Studio's OCR. The same block of text appears 3 times because the PDF contains the original, duplicate and triplicate of the same page.
Os bens/serviços foram colocados à disposição do adquirente em 2020-04-16 * Data/Hora início de transporte: 2020-04-16 às 11:52
Total Líquido 500,00
Total de Descontos 500,00
Desconto Documento
Total de IVA 115,00
Total do Documento (EUR) 615,00
IVA Incidência Valor do IVA
Isento
6%
13%
23% 500,00 115,00
b5El-Processado por programa certificado n.º75/AT.
Os bens/serviços foram colocados à disposição do adquirente em 2020-04-16 * Data/Hora início de transporte: 2020-04-16 às 11:52
Total Líquido 500,00
Total de Descontos 500,00
Desconto Documento
Total de IVA 115,00
Total do Documento (EUR) 615,00
IVA Incidência Valor do IVA
Isento
6%
13%
23% 500,00 115,00
b5El-Processado por programa certificado n.º75/AT.
Os bens/serviços foram colocados à disposição do adquirente em 2020-04-16 * Data/Hora início de transporte: 2020-04-16 às 11:52
Total Líquido 500,00
Total de Descontos 500,00
Desconto Documento
Total de IVA 115,00
Total do Documento (EUR) 615,00
IVA Incidência Valor do IVA
Isento
6%
13%
23% 500,00 115,00
b5El-Processado por programa certificado n.º75/AT.
I need to extract the 4-character code directly in front of "-Processado por programa" (here b5El), but I only want one match: the first one.
I already tried [^*]+(?=-Processado\spor\sprograma)
and (.*?)(?=-Processado\spor\sprograma)
but both give me 3 matches.
It worked when I removed the /g flag, but I'm using UiPath Studio's RegEx Based Extractor and I don't know how to remove that flag there.
Recommended answer
You could use a negative lookahead to match every line that does not start with 4 word characters followed by -Processado por programa.
When you reach the line that does, capture its first 4 word characters in group 1. Because the pattern is anchored at the start of the string with \A, it can match only once, even when the search is global.
\A.*(?:\r?\n(?!\w{4}-Processado\spor\sprograma\b).*)*\r?\n(\w{4})
Explanation
\A.* Assert the position at the start of the string, then match any char except a newline 0+ times
(?: Non-capturing group
  \r?\n Match a newline
  (?!\w{4}-Processado\spor\sprograma\b) Negative lookahead, assert that what is directly to the right is not 4 word chars followed by -Processado por programa
  .* Match the rest of the line
)* Close the group and repeat it 0+ times
\r?\n Match a newline
(\w{4}) Capture 4 word chars in group 1
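As a rough check of the idea, here is a minimal sketch that runs the pattern from the answer in Python's re module rather than in UiPath's .NET engine. It assumes the constructs used here (\A, lookaheads, \w, \s, \b) behave the same way in both. The sample text is abbreviated from the question, and GLOBAL_PATTERN is a hypothetical pattern added only to show what a global search returns.

```python
import re

# One OCR block, abbreviated from the sample in the question.
BLOCK = (
    "Total de IVA 115,00\n"
    "Total do Documento (EUR) 615,00\n"
    "23% 500,00 115,00\n"
    "b5El-Processado por programa certificado n.º75/AT.\n"
)
TEXT = BLOCK * 3  # original, duplicate and triplicate of the same page

# Hypothetical global pattern (not from the question): it hits every copy.
GLOBAL_PATTERN = r"\w{4}(?=-Processado\spor\sprograma)"

# Pattern from the answer: anchored with \A, so it can match only once.
FIRST_ONLY = r"\A.*(?:\r?\n(?!\w{4}-Processado\spor\sprograma\b).*)*\r?\n(\w{4})"

# Global search with the unanchored pattern: one hit per repeated block.
all_codes = re.findall(GLOBAL_PATTERN, TEXT)

# Global search with the anchored pattern: a single match, code in group 1.
anchored = list(re.finditer(FIRST_ONLY, TEXT))
first_code = anchored[0].group(1)

print(len(all_codes), len(anchored), first_code)
```

Note that the code is in capture group 1, not in the full match, which spans everything from the start of the text up to the code.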