How do you parse a paragraph of text into sentences? (preferably in Ruby)


Question

How do you take a paragraph or a large amount of text and break it into sentences (preferably using Ruby), taking into account cases such as Mr. and Dr. and U.S.A? (Assume you just put the sentences into an array of arrays.)

UPDATE: One possible solution I thought of involves using a part-of-speech tagger (POST) and a classifier to determine the end of a sentence:

Getting data from: Mr. Jones felt the warm sun on his face as he stepped out onto the balcony of his summer home in Italy. He was happy to be alive.

CLASSIFIER: Mr./PERSON Jones/PERSON felt/O the/O warm/O sun/O on/O his/O face/O as/O he/O stepped/O out/O onto/O the/O balcony/O of/O his/O summer/O home/O in/O Italy/LOCATION ./O He/O was/O happy/O to/O be/O alive/O ./O

POST: Mr./NNP Jones/NNP felt/VBD the/DT warm/JJ sun/NN on/IN his/PRP$ face/NN as/IN he/PRP stepped/VBD out/RP onto/IN the/DT balcony/NN of/IN his/PRP$ summer/NN home/NN in/IN Italy./NNP He/PRP was/VBD happy/JJ to/TO be/VB alive./IN

Can we assume that, since Italy is a location, the period after it is a valid end of the sentence? Since a sentence ending on "Mr." would contain no other parts of speech, can we assume that its period is not a valid end-of-sentence period? Is this the best answer to my question?
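The idea can be sketched in plain Ruby against tagger output shaped like the CLASSIFIER line. This is only a rough illustration: `sentences_from_tagged` is a hypothetical helper, and it assumes the tagger emits a sentence-final period as its own `./O` token while an abbreviation's period stays attached to the token (as in `Mr./PERSON`).

```ruby
# Hypothetical helper: rebuilds sentences from "token/TAG" classifier output.
# Assumption: a sentence-final period arrives as a standalone "./O" token,
# while abbreviation periods stay attached to their word (e.g. "Mr./PERSON").
def sentences_from_tagged(tagged)
  sentences = []
  current = []

  tagged.split.each do |pair|
    # Split on the LAST slash so tokens that contain "/" keep their text.
    word, sep, tag = pair.rpartition("/")
    word = tag if sep.empty? # untagged token: treat it as a plain word

    if word == "." && tag == "O"
      # Standalone period tagged O: close the current sentence.
      sentences << "#{current.join(' ')}." unless current.empty?
      current = []
    else
      current << word
    end
  end

  # Keep any trailing words that were not closed by a period token.
  sentences << current.join(" ") unless current.empty?
  sentences
end

tagged = "Mr./PERSON Jones/PERSON felt/O the/O warm/O sun/O in/O " \
         "Italy/LOCATION ./O He/O was/O happy/O ./O"
p sentences_from_tagged(tagged)
```

Whether this holds up depends entirely on how the tagger tokenizes periods; the POST line above, for instance, keeps the period attached in `Italy./NNP`, so the same rule would not work there unchanged.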

Thoughts?

Recommended answer

Try looking at the Ruby wrapper around the Stanford Parser. It has a getSentencesFromString() function.
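If pulling in the Stanford Parser is not an option, a rough pure-Ruby fallback is to split on whitespace and skip a short list of known abbreviations. This is a minimal sketch, not the parser's method: `split_sentences` and the `ABBREVIATIONS` list are illustrative names invented here, and the list would need extending for real text.

```ruby
# Illustrative abbreviation list (an assumption, far from complete).
ABBREVIATIONS = %w[Mr. Mrs. Ms. Dr. Prof. U.S.A.].freeze

# Hypothetical helper: splits text into sentences on ., ! or ?
# unless the token is a known abbreviation.
def split_sentences(text)
  sentences = []
  current = []

  text.split.each do |token|
    current << token
    # Candidate boundary: token ends in . ! or ?, optionally followed
    # by closing quotes or brackets.
    next unless token.match?(/[.!?]["')\]]*\z/)
    # Known abbreviations never end a sentence in this sketch.
    next if ABBREVIATIONS.include?(token)

    sentences << current.join(" ")
    current = []
  end

  # Keep trailing text that has no closing punctuation.
  sentences << current.join(" ") unless current.empty?
  sentences
end

text = "Mr. Jones felt the warm sun on his face as he stepped out onto " \
       "the balcony of his summer home in Italy. He was happy to be alive."
p split_sentences(text)
```

The obvious weakness is a sentence that genuinely ends on an abbreviation (for example one ending in "U.S.A."): this sketch would merge it with the next sentence, which is the kind of case a trained tokenizer handles better.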
