How do you parse a paragraph of text into sentences? (preferably in Ruby)

Question

How do you take a paragraph or a large amount of text and break it into sentences (preferably using Ruby), taking into account cases such as Mr., Dr., and U.S.A? (Assume you just put the sentences into an array of arrays.)
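To make the difficulty concrete, here is a minimal rule-based sketch in Ruby. It is only a baseline under stated assumptions, not a complete solution: the `TITLES` list and the `split_sentences` name are hypothetical, and the rule "a period ends a sentence only when the next word starts with a capital letter or digit" will misfire on some text (for example, an acronym followed by a proper noun).

```ruby
# Titles that are almost always followed by a name, so a period after
# them is not treated as a sentence boundary. Illustrative list only.
TITLES = %w[mr mrs ms dr prof sr jr st].freeze

# Splits text into an array of sentence strings using two heuristics:
# titles never close a sentence, and any other terminator closes one only
# if the following token looks like the start of a new sentence.
def split_sentences(text)
  tokens = text.split
  sentences = []
  current = []

  tokens.each_with_index do |token, i|
    current << token
    # Only tokens ending in . ! or ? (plus optional closing quote/bracket)
    # are boundary candidates.
    next unless token.match?(/[.!?]["')\]]*\z/)

    bare = token.sub(/[.!?]["')\]]*\z/, "").downcase
    following = tokens[i + 1]

    # A title such as "Mr." never closes a sentence in this sketch.
    next if TITLES.include?(bare)
    # Otherwise the next token (if any) must start with a capital or digit.
    next if following && !following.match?(/\A["'(\[]?[A-Z0-9]/)

    sentences << current.join(" ")
    current = []
  end

  sentences << current.join(" ") unless current.empty?
  sentences
end
```

Calling `split_sentences("Dr. Smith lives in the U.S.A. with his family. He likes it!")` keeps "Dr." and "U.S.A." inside the first sentence because neither is followed by something this heuristic accepts as a sentence start.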

UPDATE: One possible solution I thought of involves using a part-of-speech tagger (POST) and a classifier to determine the end of a sentence:

Getting data from:

Mr. Jones felt the warm sun on his face as he stepped out onto the balcony of his summer home in Italy. He was happy to be alive.

CLASSIFIER:

Mr./PERSON Jones/PERSON felt/O the/O warm/O sun/O on/O his/O face/O as/O he/O stepped/O out/O onto/O the/O balcony/O of/O his/O summer/O home/O in/O Italy/LOCATION ./O He/O was/O happy/O to/O be/O alive/O ./O

POST:

Mr./NNP Jones/NNP felt/VBD the/DT warm/JJ sun/NN on/IN his/PRP$ face/NN as/IN he/PRP stepped/VBD out/RP onto/IN the/DT balcony/NN of/IN his/PRP$ summer/NN home/NN in/IN Italy./NNP He/PRP was/VBD happy/JJ to/TO be/VB alive./IN

Can we assume that, since Italy is a location, the period after it is a valid end of sentence? And since "Mr." carries no part-of-speech tag other than its own, can we assume its period is not a valid end-of-sentence period? Is this the best answer to my question?

Thoughts?
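The idea in the update can be sketched roughly as follows. This is an illustration under assumptions rather than a tested approach: it assumes input in the `word/LABEL` form of the CLASSIFIER output above, where sentence-final punctuation arrives as its own token (`./O`) while the period in `Mr./PERSON` stays fused to the entity. The helper names `parse_tagged` and `sentences_from_tags` are hypothetical. It would not work on the POST output as shown, where the period is fused to `Italy./NNP`.

```ruby
# Parses "word/LABEL" pairs into [word, label] arrays. rpartition splits on
# the last slash, so a word that itself contains "/" is still handled.
def parse_tagged(tagged)
  tagged.split.map do |pair|
    word, _, label = pair.rpartition("/")
    [word, label]
  end
end

# Groups tagged tokens into sentences. A standalone terminator labelled "O"
# closes a sentence; a period fused to an entity token ("Mr./PERSON") is
# just part of that token and never closes one.
def sentences_from_tags(tagged)
  sentences = []
  current = []

  parse_tagged(tagged).each do |word, label|
    if %w[. ! ?].include?(word) && label == "O"
      sentences << "#{current.join(' ')}#{word}" unless current.empty?
      current = []
    else
      current << word
    end
  end

  # Keep any trailing words that were not closed by a terminator.
  sentences << current.join(" ") unless current.empty?
  sentences
end
```

Whether this holds up depends entirely on how the tokenizer and classifier treat abbreviations, so it shifts the problem to the tagger rather than solving it outright.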

Recommended answer

Try looking at the Ruby wrapper around the Stanford Parser. It has a getSentencesFromString() function.
