How to restore punctuation using Python?


Question

I would like to restore commas and full stops in text that has no punctuation. For example, take this sentence:

I am XYZ I want to execute I have a doubt

I would like to detect that the example above needs one comma and two full stops:

I am XYZ, I want to execute. I have a doubt.

Can anyone advise me on how to achieve this using Python and NLP concepts?

Recommended answer

If I understand correctly, you want to improve the quality of a sentence by adding the appropriate punctuation. This task is sometimes called punctuation restoration.

A good first step is to apply the usual NLP pipeline, namely tokenization, POS tagging, and parsing, using a library such as NLTK or spaCy.

Once this preprocessing is done, you will have to apply a rule-based or a machine learning approach to decide where the punctuation should go, based on the features extracted by the NLP pipeline (e.g. sentence boundaries, parse tree, POS tags).
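As a rough illustration of the rule-based option, here is a minimal sketch that works on tokens that have already been POS-tagged. The rule itself (a nominative pronoun that follows a clause already containing a verb starts a new sentence) and the function name `restore_full_stops` are assumptions made for this example, not something taken from NLTK or spaCy, and the tags are written by hand instead of coming from a real tagger.

```python
# Hypothetical rule for this sketch: a nominative pronoun that appears after
# the current clause already has a verb is treated as a new sentence start.
SUBJECT_PRONOUNS = {"i", "he", "she", "we", "they"}
VERB_TAGS = {"VERB", "AUX"}


def restore_full_stops(tagged_tokens):
    """Insert full stops into a list of (word, pos_tag) pairs."""
    words = []
    clause_has_verb = False
    for word, tag in tagged_tokens:
        starts_clause = tag == "PRON" and word.lower() in SUBJECT_PRONOUNS
        if starts_clause and clause_has_verb:
            words[-1] += "."          # close the previous clause
            clause_has_verb = False
        if tag in VERB_TAGS:
            clause_has_verb = True
        words.append(word)
    if words:
        words[-1] += "."              # always close the final clause
    return " ".join(words)


# POS tags assigned by hand here; in practice a tagger would supply them.
tagged = [
    ("I", "PRON"), ("am", "AUX"), ("XYZ", "PROPN"),
    ("I", "PRON"), ("want", "VERB"), ("to", "PART"), ("execute", "VERB"),
    ("I", "PRON"), ("have", "VERB"), ("a", "DET"), ("doubt", "NOUN"),
]
print(restore_full_stops(tagged))
```

This single rule only places full stops, so it cannot choose a comma after "XYZ" as in the question; telling commas from full stops would need further features, such as the parse tree.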

However, this is not a trivial task. It can require strong NLP/AI skills if you want to customise your algorithm.
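If you go the machine learning route, one common framing is sequence labeling: each token gets a label naming the punctuation mark that should follow it. The sketch below only shows how such training pairs could be derived from already punctuated text; the label names and the helper `to_training_pairs` are assumptions for illustration, and the choice of model trained on the pairs is left open.

```python
# Assumed label set for this sketch: the mark that follows a token, or "O".
LABELS = {",": "COMMA", ".": "PERIOD"}


def to_training_pairs(punctuated_text):
    """Turn punctuated text into (token, label) pairs for a sequence tagger."""
    pairs = []
    for chunk in punctuated_text.split():
        mark = chunk[-1] if chunk[-1] in LABELS else ""
        word = chunk[:-1] if mark else chunk
        if not word:                  # skip a stray punctuation-only chunk
            continue
        pairs.append((word, LABELS.get(mark, "O")))
    return pairs


pairs = to_training_pairs("I am XYZ, I want to execute. I have a doubt.")
for token, label in pairs:
    print(token, label)
```

At prediction time the model sees only the tokens and predicts the labels, which are then turned back into punctuation marks.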

Some examples you can reuse:

  • Here is a simple approach using spaCy, mainly based on sentence boundaries.
  • Here is a more complex solution, using the Theano deep learning library.
