Python remove punctuation from a text file


Problem description

I'm trying to remove a list of punctuation characters from my text file, but I have one problem with words joined by a hyphen. For example, if I have the word "post-trauma" I get "posttrauma", whereas I want to get "post" "trauma".

My code is:

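 import re

 # Punctuation characters to strip from the text
 punct = ['!', '#', '"', '%', '$', '&', ')', '(', '+', '*', '-']

 # myFile and REMOVE_LIST (the list of words to remove) are defined elsewhere
 with open(myFile, "r") as f:
     text = f.read()

 remove = '|'.join(REMOVE_LIST)  # alternation of the words to remove
 regex = re.compile(r'(' + remove + r')', flags=re.IGNORECASE)
 out = regex.sub("", text)

 delta = " ".join(out.split())  # collapse runs of whitespace
 txt = "".join(c for c in delta if c not in punct)  # also deletes '-', so "post-trauma" becomes "posttrauma"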

Is there a way to solve this?

Recommended answer

I believe you can just call the built-in str.replace method on delta to turn each hyphen into a space before filtering, so your last line would become the following:

txt = "".join(c for c in delta.replace("-", " ") if c not in punct )

This means all the hyphens in your text will become spaces, so the words will be treated as if they were separate.
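
As a self-contained illustration of the same idea, here is a minimal sketch. The function name strip_punctuation and the sample string are hypothetical, not part of the original question, and str.translate is used here in place of the character-by-character join; the punctuation list is the one from the question.

```python
# Punctuation list taken from the question.
PUNCT = ['!', '#', '"', '%', '$', '&', ')', '(', '+', '*', '-']


def strip_punctuation(text, punct=PUNCT):
    """Hypothetical helper: split hyphenated words, then drop punctuation."""
    # Turn hyphens into spaces first so "post-trauma" becomes two words.
    text = text.replace("-", " ")
    # Build a translation table that deletes every character in punct.
    table = str.maketrans("", "", "".join(punct))
    text = text.translate(table)
    # Collapse any runs of whitespace left behind into single spaces.
    return " ".join(text.split())


if __name__ == "__main__":
    print(strip_punctuation("post-trauma (PTSD) & stress!"))
```

Doing the whitespace cleanup last, rather than before the filtering as in the original snippet, avoids leaving double spaces where a standalone symbol such as "&" was removed.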
