Split Text into paragraphs NLTK - usage of nltk.tokenize.texttiling?

Published: 2020/5/18 1:15:38 | Tags: python nltk


Question

I was looking at methods to split documents into paragraphs and I came across texttiling as one possible way to do this.

Here is my attempt to use it. However, I don't understand how to work with the output. I'd appreciate your help.

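import nltk
from unidecode import unidecode

# doclist is the asker's own list of raw byte strings (not shown)
t = unidecode(doclist[0].decode('utf-8', 'ignore'))
nltk.tokenize.texttiling.TextTilingTokenizer(t)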

Output:

<nltk.tokenize.texttiling.TextTilingTokenizer at 0x11e9c6350>
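
TextTiling does not detect paragraph boundaries by itself: it expects the input to already contain blank-line paragraph breaks and groups those paragraphs into multi-paragraph topical segments (NLTK's implementation appears to raise a ValueError when it finds no breaks at all). A quick check in plain Python, with no NLTK needed, shows whether a text such as `t` still has those breaks after `unidecode`. The helper `split_paragraphs`, the sample string, and the blank-line pattern are illustrative assumptions, not part of NLTK:

```python
import re

# Assumed paragraph marker: a newline, optional spaces/tabs, then another newline.
PARA_BREAK = re.compile(r"\n[ \t\r\f\v]*\n")


def split_paragraphs(text):
    """Split text on blank lines and drop pieces that are only whitespace."""
    return [p.strip() for p in PARA_BREAK.split(text) if p.strip()]


sample = (
    "First paragraph, line one.\nStill the first paragraph.\n"
    "\n  \n"
    "Second paragraph.\n\nThird paragraph."
)
paragraphs = split_paragraphs(sample)
print(len(paragraphs))  # 3
```

If blank lines already separate the paragraphs you want, a split like this may be all you need; TextTiling is more useful when you want to merge those paragraphs into larger topic-level chunks.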

Answer

I'm experimenting with this myself right now, for the same reason as you, and had the same question, so don't be too upset if this is wrong. I figured it was best to pass on what little I know. :)

I'm not sure yet, but I found an example of using the TextTilingTokenizer in this bug report:

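import nltk

# Needs the 'gutenberg' and 'stopwords' data: nltk.download('gutenberg'), nltk.download('stopwords')
alice = nltk.corpus.gutenberg.raw('carroll-alice.txt')
ttt = nltk.tokenize.TextTilingTokenizer()
tiles = ttt.tokenize(alice[140309:])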

It appears that you need to create a TextTilingTokenizer first and then feed your text to its tokenize method.
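
To make that concrete and show what the output looks like: `tokenize` returns an ordinary Python list of strings, one per tile. This is a minimal sketch, assuming NLTK and NumPy are installed; the sample text and the short explicit stopword list are hypothetical, and the list is passed only so the example runs without downloading the stopwords corpus:

```python
from nltk.tokenize import TextTilingTokenizer  # assumes NLTK and NumPy are installed

# Hypothetical sample text: two topics, each written as two paragraphs
# separated by blank lines (TextTiling needs those breaks).
cooking = ("The cook chopped onions and garlic, heated olive oil in a heavy pan, "
           "and stirred the sauce while the pasta water came to a boil. ")
sailing = ("The sailors trimmed the mainsail, checked the wind over the harbor, "
           "and steered the small boat past the breakwater toward open sea. ")
text = "\n\n".join([cooking * 4, cooking * 4, sailing * 4, sailing * 4])

# Constructor arguments are tuning parameters (w, k, stopwords, ...), not the text.
# A short explicit stopword list (an assumption) avoids nltk.download('stopwords').
ttt = TextTilingTokenizer(stopwords=["the", "and", "a", "in", "to", "while", "over"])

# tokenize() does the segmentation and returns a list of strings, one per tile.
tiles = ttt.tokenize(text)

for i, tile in enumerate(tiles):
    print(i, len(tile), repr(tile[:40]))
```

In NLTK's implementation the tiles are consecutive slices of the input, with boundaries snapped to paragraph breaks, so `"".join(tiles)` should give back the original text. For the question's code, the equivalent call would be `TextTilingTokenizer().tokenize(t)`, which loads NLTK's English stopword list by default.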
