Regex split text document into sentences

Problem description

I have a big text string and I am trying to split it into sentences at ".", "?" and "!". But my regex is not working, and I can't see why. Can somebody help me find the error?

String str = "When my friend said he likes deep dish pizza one day, I immediately set a time to come back to Little Star. Arguably, the best deep dish pizza in SF...though...I don't believe there are many places that do deep dish pizza. That being said...its not the BEST ever, just the best for the area. They use cornmeal in the crust, or on the baking surface, so there's a bit of extra crunch to it. That being said...I'm not sure how much I like the cornmeal texture to my pizza. I kind of want just a GOOD CRUST, you know? No extra stuff to try to make it more crunchy.";
String[] sentences = str.split("/(?<=[.?!])\\S+(?=[a-z])/i");

But it does not split the sentences. Can somebody spot the error?

Recommended answer

Here's a small hint:

Slashes have nothing to do with regex.

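Slashes are a regex-literal delimiter in some languages (JavaScript and Perl, for example), not part of the regex itself. Java is not one of them: a Java regex is an ordinary string, so any slash in it is matched literally.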
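To see the effect in Java, here is a minimal sketch (the class SlashDemo, the helper containsMatch, and the sample strings are hypothetical, chosen only for illustration). It shows that slashes and a trailing "i" inside a Java pattern string are matched as literal characters, which is why the question's pattern leaves the text unsplit.

```java
import java.util.regex.Pattern;

public class SlashDemo {
    // The question's pattern: in Java the surrounding '/' and the trailing 'i'
    // are NOT delimiters or flags, just characters the engine must match.
    static final String QUESTION_PATTERN = "/(?<=[.?!])\\S+(?=[a-z])/i";

    // Hypothetical helper: does the regex match anywhere in the input?
    static boolean containsMatch(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        // "/dish/i" only matches the literal seven-character text "/dish/i".
        System.out.println(containsMatch("/dish/i", "deep dish pizza")); // false
        System.out.println(containsMatch("/dish/i", "a /dish/i b"));     // true

        // The question's pattern can never match: right after the literal '/',
        // the lookbehind (?<=[.?!]) sees '/' rather than punctuation.
        // So split() hands back the whole input as a single element.
        String text = "I came back. The crust is good! Is it the best?";
        System.out.println(text.split(QUESTION_PATTERN).length); // 1
    }
}
```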

Try removing the slashes and replacing the trailing "/i" with an inline "(?i)" flag at the start of the pattern:

// Match \s+ (whitespace), not \S+ (non-whitespace): sentences are separated by spaces.
String[] sentences = str.split("(?i)(?<=[.?!])\\s+(?=[a-z])");
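For illustration, here is a minimal runnable sketch of the split. It assumes that "\\S+" in the pattern was meant to be "\\s+" (the whitespace between sentences), and it also shows Pattern.CASE_INSENSITIVE as an alternative to the inline "(?i)" flag. The class and method names (SentenceSplitter, split, splitCompiled) are hypothetical, and the sample text is a shortened excerpt of the question's string.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class SentenceSplitter {
    // Split at whitespace that follows '.', '?' or '!' and precedes a letter.
    // The inline (?i) flag is Java's counterpart of the /i suffix used elsewhere;
    // it lets [a-z] also match the capital letter that usually starts a sentence.
    static final String INLINE = "(?i)(?<=[.?!])\\s+(?=[a-z])";

    // Same rule, with case-insensitivity passed as a compile flag instead of (?i).
    static final Pattern COMPILED =
            Pattern.compile("(?<=[.?!])\\s+(?=[a-z])", Pattern.CASE_INSENSITIVE);

    static String[] split(String text) {
        return text.split(INLINE);
    }

    static String[] splitCompiled(String text) {
        return COMPILED.split(text);
    }

    public static void main(String[] args) {
        String text = "Arguably, the best deep dish pizza in SF...though...I don't believe "
                + "there are many places that do deep dish pizza. I kind of want just a "
                + "GOOD CRUST, you know? No extra stuff to try to make it more crunchy.";
        for (String sentence : split(text)) {
            System.out.println(sentence);
        }
        // Both variants apply the same rule, so the results are identical.
        System.out.println(Arrays.equals(split(text), splitCompiled(text))); // true
    }
}
```

Running main prints:

Arguably, the best deep dish pizza in SF...though...
I don't believe there are many places that do deep dish pizza.
I kind of want just a GOOD CRUST, you know?
No extra stuff to try to make it more crunchy.
true

Note the first piece: the rule only checks for whitespace after punctuation, so "SF...though" (no space) stays intact, but an ellipsis followed by a space ("...I don't") is also treated as a sentence end.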
