How to match a sentence in Lua
Question
I am trying to create a regex that matches a sentence.
Here is a snippet:
local utf8 = require 'lua-utf8'

function matchsent(text)
    local text = text
    for sent in utf8.gmatch(text, "[^\n]+\.[\n]") do
        print(sent)
        print('-----')
    end
end
However, it does not work the way it does in Python, for example. I know that Lua uses a different pattern syntax and that its pattern-matching capabilities are limited, but why does the regex above give me a syntax error? And what would a sentence-matching pattern look like in Lua?
Recommended answer
Note that Lua uses Lua patterns, which are not regular expressions: they have no alternation and cannot apply quantifiers to groups, so they cannot describe every regular language. They are poorly suited to splitting text into sentences, since you would need to account for abbreviations, spacing, capitalization, and so on. To split text into sentences reliably, you need an NLP package rather than one or two regexps, because the task is that complex.
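To illustrate why a pattern alone struggles here, the following is a minimal sketch rather than a recommended approach. The helper name naive_sentences and the sample text are made up for illustration. It treats any run of ., ! or ? as a sentence end.

```lua
-- Naive splitter (hypothetical helper, for illustration only):
-- a "sentence" is a run of non-terminators followed by one or more of . ! ?
local function naive_sentences(text)
    local out = {}
    for sent in string.gmatch(text, "[^.!?]+[.!?]+") do
        -- Trim whitespace left over from the end of the previous piece.
        -- The parentheses drop gsub's second return value (the count).
        out[#out + 1] = (sent:gsub("^%s+", ""))
    end
    return out
end

local parts = naive_sentences("Dr. Smith arrived at 5 p.m. today. He left early.")
for i, s in ipairs(parts) do
    print(i, s)
end
```

The two sentences come out as five pieces, because "Dr." and the two halves of "p.m." are each treated as sentence ends. This is the kind of case an NLP package is meant to handle.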
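As for

why does the regex above give me a syntax error?

you need to escape special characters with a % symbol in Lua patterns, not with a backslash. In a Lua string literal, a backslash followed by a character such as . is an invalid escape sequence, and the parser rejects it in Lua 5.2 and later. See this example code: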
function matchsent(text)
    for sent in string.gmatch(text, '[^\n]+%.[\n]') do
        print(sent)
        print("---")
    end
end

matchsent("Some text here.\nShow me")
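Building on the %-escaping shown above, a slightly more general sketch might accept ., ! and ? as terminators and require whitespace (or the end of the text) after them. The function name split_sentences and the sample input are assumptions for illustration, not part of the original answer.

```lua
-- Hypothetical helper (not from the original answer): collect sentences
-- that end in ., ! or ? followed by whitespace or the end of the text.
local function split_sentences(text)
    local sentences = {}
    -- Append a space so the final sentence is also followed by whitespace.
    -- ".-" is a lazy match; inside [...] the characters . ! ? are literal.
    for sent in string.gmatch(text .. " ", "%s*(.-[.!?]+)%s+") do
        sentences[#sentences + 1] = sent
    end
    return sentences
end

local result = split_sentences("Some text here.\nShow me now! Really?")
for _, s in ipairs(result) do
    print(s)
    print("---")
end
```

This version still splits after abbreviations such as "Dr." and silently drops trailing text that has no terminator, so it only suits simple, well-punctuated input.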