How to match a sentence in Lua
Question
I am trying to create a regex which attempts to match a sentence.
Here is a snippet:
local utf8 = require 'lua-utf8'
function matchsent(text)
  local text = text
  for sent in utf8.gmatch(text, "[^\r\n]+\.[\r\n ]") do
    print(sent)
    print('-----')
  end
end
However, it does not work the way it would in, for example, Python. I know that Lua uses a different set of regex patterns and that its regex capabilities are limited, but why does the regex above give me a syntax error? And what would a sentence-matching regex in Lua look like?
Recommended answer
Note that Lua uses Lua patterns, which are not regular expressions: they lack features such as alternation (|), so they cannot describe every regular language. They are also poorly suited to splitting a text into sentences, since you would need to account for abbreviations, spacing, letter case and so on. Given the complexity of that task, you need an NLP package rather than one or two patterns.
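To make the abbreviation problem concrete, here is a deliberately naive splitter. It is a minimal sketch assuming that a sentence is any run of text ending in `.`, `!` or `?` followed by whitespace; the helper name `naive_split` is hypothetical. It uses only `string.gmatch` and the lazy `-` quantifier:

```lua
-- Hypothetical helper: treats every '.', '!' or '?' followed by whitespace
-- as the end of a sentence. This is exactly the assumption that breaks
-- on abbreviations.
local function naive_split(text)
  local sentences = {}
  -- Append a space so the last sentence is also followed by whitespace.
  -- "%s*" skips leading whitespace, "(.-[%.!?])" lazily captures up to the
  -- first punctuation mark, and "%s" requires whitespace right after it.
  for s in (text .. " "):gmatch("%s*(.-[%.!?])%s") do
    sentences[#sentences + 1] = s
  end
  return sentences
end

for i, s in ipairs(naive_split("Dr. Smith arrived. He sat down.")) do
  print(i, s)
end
```

The abbreviation "Dr." comes back as a "sentence" of its own, which is the kind of case a dedicated sentence tokenizer is built to handle.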
As for "why does the regex above give me a syntax error?":
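You need to escape special characters with a % symbol in Lua patterns, not with a backslash. Inside a Lua string literal, \. is an invalid escape sequence (Lua 5.2 and later reject it at compile time), which is where the syntax error comes from. See the example code: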
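-- Prints every run of non-newline characters that ends in a literal "."
-- followed by a newline or a space. "%." escapes the dot (a bare "." would
-- match any character). The trailing newline/space is part of each match.
local function matchsent(text)
  for sent in string.gmatch(text, '[^\r\n]+%.[\r\n ]') do
    print(sent)
    print("---")
  end
end

matchsent("Some text here.\nShow me")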
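The sketch below, which assumes Lua 5.2 or later (it passes a string to `load`), first reproduces the syntax error from the question and then compares the answer's greedy pattern with a lazy variant. `collect` is a hypothetical helper, and the capture group `(...)` is an addition that drops the trailing newline or space from each match.

```lua
-- Assumes Lua 5.2+ (load() accepting a source string).
-- 1) Why the question's code fails: the long-bracket string keeps the
--    backslashes, so the loaded source contains "\." inside a quoted
--    string, which the lexer rejects as an invalid escape sequence.
local chunk, err = load([[return "[^\r\n]+\.[\r\n ]"]])
print(chunk, err)  -- chunk is nil; err mentions "invalid escape sequence"

-- 2) Hypothetical helper: gather every capture of a pattern into a table.
local function collect(text, pattern)
  local out = {}
  for s in string.gmatch(text, pattern) do
    out[#out + 1] = s
  end
  return out
end

local text = "First one. Second one.\nShow me"

-- Greedy '+' (the answer's pattern plus a capture): one match per line.
local greedy = collect(text, "([^\r\n]+%.)[\r\n ]")
-- Lazy '-': stops at the first '.' followed by a space or newline.
local lazy = collect(text, "([^\r\n]-%.)[\r\n ]")

print(#greedy, greedy[1])
print(#lazy, lazy[1], lazy[2])
```

Because `+` is greedy, `[^\r\n]+%.` backtracks from the end of the line to its last period, so two sentences on one line come back as a single match ("First one. Second one."); the lazy `-` yields "First one." and "Second one." separately. Both still treat every period followed by whitespace as a sentence boundary, so abbreviations remain a problem.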