How to match a sentence in Lua

Question

I am trying to write a regex that matches a sentence.

Here is a snippet.

local utf8 = require 'lua-utf8'
function matchsent(text)
  local text = text
  for sent in utf8.gmatch(text, "[^\r\n]+\.[\r\n ]") do
    print(sent)
    print('-----')
  end
end

However, it does not work the way it would in Python, for example. I know that Lua uses a different pattern syntax and that its pattern-matching capabilities are limited, but why does the regex above give me a syntax error? And what would a sentence-matching pattern look like in Lua?

Recommended answer

Note that Lua uses Lua patterns, which are not regular expressions: they lack features such as alternation and quantifiers on groups, so they cannot match every regular language. They are poorly suited to splitting text into sentences, since you would need to account for abbreviations, spacing, capitalization and so on. Because of the complexity of the task, splitting text into sentences calls for an NLP package rather than one or two patterns.
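To make the abbreviation problem concrete, here is a minimal sketch of a naive splitter. The function name `naive_sentences` and the sample text are made up for illustration and are not part of the original answer. It treats any run of `.`, `!` or `?` as the end of a sentence, which is exactly the assumption that breaks on abbreviations.

```lua
-- Hypothetical helper for illustration only: a "sentence" is a run of
-- non-terminator characters followed by one or more of . ! ?
function naive_sentences(text)
  local out = {}
  for sent in text:gmatch("[^%.!?]+[%.!?]+") do
    -- Strip the whitespace left over from the previous sentence.
    -- The extra parentheses discard gsub's second return value.
    out[#out + 1] = (sent:gsub("^%s+", ""))
  end
  return out
end

-- "Dr." is cut off as a sentence of its own.
for i, s in ipairs(naive_sentences("Dr. Smith arrived. He sat down.")) do
  print(i, s)
end
```

The sample contains two sentences, but the loop prints three pieces, because the pattern has no way to know that the period in "Dr." is not a sentence boundary.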

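As for "why does the regex above give me a syntax error?": in Lua patterns, special characters are escaped with a % symbol, not with a backslash. A backslash followed by a character such as a period is not a valid escape sequence in a Lua string literal, and Lua 5.2 and later reject it when the code is parsed. See the example code: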

function matchsent(text)
    -- %. matches a literal period; an unescaped . would match any character
    for sent in string.gmatch(text, '[^\r\n]+%.[\r\n ]') do
        print(sent)
        print("---")
    end
end
matchsent("Some text here.\nShow me")
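Note that `[^\r\n]+` is greedy, so the pattern above returns everything on a line up to the last period that is followed by whitespace, and it skips a final sentence that has nothing after its period. A possible variation, offered only as a sketch, uses a lazy match and collects the results in a table. The name `split_sentences`, the inclusion of `!` and `?` as terminators, and the trick of appending a space are assumptions of this example, not part of the original answer.

```lua
-- Hypothetical helper for illustration only: return the sentences
-- found in text as a table instead of printing them.
function split_sentences(text)
  local sentences = {}
  -- A space is appended so that the last sentence is also followed by
  -- whitespace. %s* skips leading whitespace; .- lazily extends the
  -- capture to the first terminator that is followed by whitespace.
  for sent in (text .. " "):gmatch("%s*(.-[%.!?])%s") do
    sentences[#sentences + 1] = sent
  end
  return sentences
end

for i, s in ipairs(split_sentences("Some text here.\nShow me. Done!")) do
  print(i, s)
end
```

Trailing text without a terminator is dropped, and abbreviations are still split incorrectly, so this remains a rough approximation rather than a real sentence segmenter.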
