How to match a sentence in Lua

Question

I am trying to create a regex that matches a sentence.

Here is a snippet:

local utf8 = require 'lua-utf8'
function matchsent(text)
  local text = text
  for sent in utf8.gmatch(text, "[^\r\n]+\.[\r\n ]") do
    print(sent)
    print('-----')
  end
end

However, it does not work the way it would in Python, for example. I know that Lua uses a different pattern syntax and that its regex capabilities are limited, but why does the regex above give me a syntax error? And what would a sentence-matching regex look like in Lua?

Recommended answer

Note that Lua uses Lua patterns, which are not regular expressions: they have no alternation (|), and quantifiers apply only to single character classes, never to groups, so they cannot describe every regular language. They are poorly suited to splitting text into sentences, because you would need to account for abbreviations, spacing, capitalization and so on. The task is complex enough that you need an NLP package rather than one or two regular expressions.
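
To make the point concrete, here is a minimal Lua sketch (the function name naive_split is hypothetical, not from the original answer) that assumes a sentence is any run of characters ending in ".", "!" or "?". The abbreviation "Dr." is wrongly split off as a sentence of its own, and any trailing text without a terminator is silently dropped, which is exactly the kind of case an NLP tool handles and a single pattern does not.

```lua
-- Hypothetical naive splitter: a "sentence" is a run of characters other
-- than . ! ? followed by one or more of those terminators.
local function naive_split(text)
  local sentences = {}
  for s in text:gmatch("[^%.!?]+[%.!?]+") do
    -- trim surrounding whitespace from each piece
    sentences[#sentences + 1] = s:match("^%s*(.-)%s*$")
  end
  return sentences
end

for i, s in ipairs(naive_split("Dr. Smith arrived. He sat down!")) do
  print(i, s)
end
--> 1  Dr.
--> 2  Smith arrived.
--> 3  He sat down!
```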

Regarding

why does the regex above give me a syntax error?

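The syntax error comes from the string literal itself: \. is not a valid escape sequence in a Lua string, and Lua 5.2 and later reject it when compiling the code. In Lua patterns, special characters are escaped with a % symbol, not a backslash. See the example code: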

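-- Print each fragment that ends with a period followed by a newline or space.
local function matchsent(text)
    -- %. matches a literal period; an unescaped . would match any character
    for sent in string.gmatch(text, '[^\r\n]+%.[\r\n ]') do
        print(sent)
        print("---")
    end
end
matchsent("Some text here.\nShow me")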
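
The Lua sketch below illustrates the pattern details involved here. An unescaped "." matches any character, while "%." matches only a literal period. The "+" in the pattern above is greedy, so several sentences on one line come back as a single match; the lazy "-" quantifier stops at the first period instead. The escape_pattern helper is hypothetical (it is not part of Lua's standard library); it uses the common gsub idiom to put a % in front of every magic character so arbitrary text can be matched literally.

```lua
-- 1. In a pattern, "." matches any character; "%." matches a literal period.
print(("a?b"):find("a.b"))    --> 1  3
print(("a?b"):find("a%.b"))   --> nil

-- 2. "+" is greedy, "-" is the lazy (shortest-match) repetition.
--    With "+", several sentences on one line end up in a single match.
local text = "One. Two. Three"
for s in text:gmatch("[^\r\n]+%.[\r\n ]") do print("[" .. s .. "]") end
--> [One. Two. ]
for s in text:gmatch("[^\r\n]-%.[\r\n ]") do print("[" .. s .. "]") end
--> [One. ]
--> [Two. ]

-- 3. Hypothetical helper: escape every magic character ^ $ ( ) % . [ ] * + - ?
--    so that a literal string can be used as a pattern.
local function escape_pattern(s)
  -- the extra parentheses keep only gsub's first return value (the new string)
  return (s:gsub("[%^%$%(%)%%%.%[%]%*%+%-%?]", "%%%0"))
end
print(escape_pattern("1+1=2."))  --> 1%+1=2%.
```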
