Regex that splits long text into separate sentences with match()

Problem description

This is a textarea where the user writes some text. I've written an example in it.

<textarea id="text">First sentence. Second sentence? Third sentence!
Fourth sentence.

Fifth sentence
</textarea>

Requirements already handled by the regex

  • separator is included in array item
  • last sentence doesn't necessarily require a separator character (it can end with any character)
  • if a sentence has more than one separator char, it is included in the array item. Example: second sentence?!? should be [...,"second sentence?!?",...]

Missing requirement (this is where I need help)

Each new line should be represented by an empty array item. If the regex is applied, this should be the response:

["First sentence.", "Second sentence?", "Third sentence!", "", "Fourth sentence.", "", "", "Fifth sentence"]

Instead, I get this:

["First sentence.", "Second sentence?", "Third sentence!", "Fourth sentence.", "Fifth sentence"]

This is the regex and match call:

var tregex = /[^\r\n.!?]+(:?(:?\r\n|[\r\n]|[.!?])+|$)/gi;
var sentences = $('#text').val().match(tregex).map($.trim);

Any ideas? Thanks!

Recommended answer

I simplified it a lot. It matches either a newline or a sentence followed by its punctuation:

var tregex = /\n|([^\r\n.!?]+([.!?]+|$))/gim;
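
As a rough check of the regex above, here is a minimal sketch without jQuery. The helper name `splitSentences` and the sample string are assumptions made for illustration. The sample mirrors the textarea content from the question but has no trailing newline.

```javascript
// Hypothetical helper; only the regex itself comes from the answer.
function splitSentences(text) {
  const tregex = /\n|([^\r\n.!?]+([.!?]+|$))/gim;
  // match() returns null when nothing matches, so fall back to an empty array.
  const matches = text.match(tregex) || [];
  // Trimming turns a lone "\n" into "" and drops the space before each sentence.
  return matches.map((s) => s.trim());
}

const sample =
  "First sentence. Second sentence? Third sentence!\n" +
  "Fourth sentence.\n" +
  "\n" +
  "Fifth sentence";

const sentences = splitSentences(sample);
console.log(sentences);
// ["First sentence.", "Second sentence?", "Third sentence!", "",
//  "Fourth sentence.", "", "", "Fifth sentence"]
```

If the input ends with a newline, that final newline is matched as well, so the result gains one extra empty string at the end.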

I also believe the m (multiline) flag is important, since it lets $ match at the end of each line rather than only at the end of the whole text.
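
To see what the m flag changes, the sketch below runs the same pattern with and without it. The input string is an assumption chosen so that the first line ends without punctuation.

```javascript
// Illustrative input: the first line has no closing punctuation.
const text = "no punctuation here\nlast line";

// With m, $ matches just before the "\n", so the first line is captured.
const withM = text.match(/\n|([^\r\n.!?]+([.!?]+|$))/gim) || [];

// Without m, $ only matches at the very end of the text, so the first line is lost.
const withoutM = text.match(/\n|([^\r\n.!?]+([.!?]+|$))/gi) || [];

console.log(withM);    // ["no punctuation here", "\n", "last line"]
console.log(withoutM); // ["\n", "last line"]
```

So the flag only matters for lines that end without one of the `.`, `!` or `?` separators. Lines that do end with a separator are matched the same way in both cases.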
