RegEx to exclude academic title


Question

I want to split a paragraph string into an array of sentences. Naturally, I am using a regular expression that splits on the dot character (.). The problem is the academic title abbreviations in the sentences: every abbreviation also ends with a dot (.), so my regex splits the paragraph in the wrong places.

Here is an example paragraph:

Meanwhile Rector of Bogor Agricultural University, Prof. Dr. Herry Suhardiyanto, in his remarks requested that the graduate students should keep on studying and will finalize their studies on time. Present in that general audience were the Deputy Dean of the Graduate School of Bogor Agricultural University, Dr.Dedi Jusadi, Secretary of the Graduate School for Doctoral Program of Bogor Agricultural University, Prof.Dr. Marimin.

Using only the dot (.) as the regex, I get:

Array (
[0] => Meanwhile Rector of Bogor Agricultural University, Prof
[1] => Dr
[2] => Herry Suhardiyanto, in his remarks requested that the graduate students should keep on studying and will finalize their studies on time
[3] => ...
)

This is what I actually want:

Array (
[0] => Meanwhile Rector of Bogor Agricultural University, Prof. Dr. Herry Suhardiyanto, in his remarks requested that the graduate students should keep on studying and will finalize their studies on time
[1] => Present in that general audience were the Deputy Dean of the Graduate School of Bogor Agricultural University, Dr.Dedi Jusadi, Secretary of the Graduate School for Doctoral Program of Bogor Agricultural University, Prof.Dr. Marimin
)
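
For reference, the naive split described above can be reproduced with a minimal sketch. The shortened sample text is an assumption made for brevity, not the original paragraph; it only shows that every literal dot, including the ones after titles, becomes a split point.

```php
<?php
// Shortened, made-up sample text (not the original paragraph).
$text = "Prof. Dr. Herry Suhardiyanto spoke. Dr.Dedi Jusadi attended.";

// Splitting on every literal dot also cuts after "Prof" and "Dr".
$parts = preg_split('/\./', $text);

print_r($parts);
// The pieces are "Prof", " Dr", " Herry Suhardiyanto spoke", " Dr",
// "Dedi Jusadi attended" and a trailing empty string.
```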

Answer

You could use negative lookbehinds:

((?<!Prof)(?<!Dr)(?<!Mr)(?<!Mrs)(?<!Ms))\.

Add more titles if needed.

Explained demo here: http://regex101.com/r/xQ3xF9
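
To see why this works, it may help to test the pattern on two tiny inputs. Each negative lookbehind is zero-width, so chaining several of them checks the same position (just before the dot) against every title in turn; the dot matches only if none of the titles immediately precedes it. The sample strings and variable names below are illustrative assumptions.

```php
<?php
// The chained lookbehinds all assert at the position right before the dot.
$pattern = '/(?<!Prof)(?<!Dr)(?<!Mr)(?<!Mrs)(?<!Ms)\./';

// The dot follows "Dr", so (?<!Dr) fails and nothing matches.
$afterTitle = preg_match($pattern, 'Dr.');

// The dot follows "time", so every lookbehind succeeds and the dot matches.
$afterWord = preg_match($pattern, 'on time.');

var_dump($afterTitle); // int(0)
var_dump($afterWord);  // int(1)
```

One consequence of this approach: a sentence that really ends with one of the listed titles, such as "I met the Dr.", is not split at that dot.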

The code might look like this:

$text="Meanwhile Rector of Bogor Agricultural University, Prof. Dr. Herry Suhardiyanto, in his remarks about Mr. John requested that the graduate students should keep on studying and will finalize their studies on time. Present in that general audience were Mrs. Peterson of the Graduate School of Bogor Agricultural University, Dr.Dedi Jusadi, Secretary of the Graduate School for Doctoral Program of Bogor Agricultural University, Prof.Dr. Marimin.";

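// One negative lookbehind per title; add more entries if needed.
$titles = array('(?<!Prof)', '(?<!Dr)', '(?<!Mr)', '(?<!Mrs)', '(?<!Ms)');
// Split on every dot that is not directly preceded by one of the titles.
// The final dot leaves a trailing empty string, and later sentences keep their leading space.
$sentences = preg_split('/(' . implode('', $titles) . ')\./', $text);
print_r($sentences);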
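
If the trailing empty element and the leading spaces are unwanted, one possible variation is sketched below. The function name `split_sentences` and its default title list are hypothetical, not part of the original answer; the sketch builds the lookbehinds from plain title strings, lets `\s*` consume whitespace after each sentence-ending dot, and uses `PREG_SPLIT_NO_EMPTY` to drop empty pieces.

```php
<?php
// Hypothetical helper (not from the original answer).
function split_sentences(string $text, array $titles = ['Prof', 'Dr', 'Mr', 'Mrs', 'Ms']): array
{
    // Build one fixed-length negative lookbehind per title.
    $lookbehinds = '';
    foreach ($titles as $title) {
        $lookbehinds .= '(?<!' . preg_quote($title, '/') . ')';
    }

    // \s* swallows whitespace after the dot; NO_EMPTY drops the empty tail.
    $parts = preg_split('/' . $lookbehinds . '\.\s*/', $text, -1, PREG_SPLIT_NO_EMPTY);

    // preg_split returns false on a pattern error; fall back to an empty array.
    return $parts === false ? [] : $parts;
}

// Shortened, made-up sample text (not the original paragraph).
$sentences = split_sentences(
    'Prof. Dr. Herry Suhardiyanto spoke on time. Dr.Dedi Jusadi and Prof.Dr. Marimin attended.'
);
print_r($sentences);
```

Passing the titles as plain strings keeps the list easy to extend, for example `split_sentences($text, ['Prof', 'Dr', 'Ir'])`, where `Ir` is only an example of an additional title.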
