Python Replace Single Quotes Except Apostrophes


Question

I am performing the following operations on lists of words. I read lines from a Project Gutenberg text file, split each line on spaces, perform general punctuation substitution, and then print each word and punctuation tag on its own line for later processing. I am unsure how to replace every single quote with a tag while leaving all apostrophes alone. My current method is to use a compiled regex:

apo = re.compile("[A-Za-z]'[A-Za-z]")

and do the following:

if "'" in word and not apo.search(word):
    word = word.replace("'","\n<singlequote>")

but this ignores cases where a single quote is used around a word that contains an apostrophe. It also does not tell me whether the single quote abuts the start of a word or the end of a word.

Example input:

don't
'George
ma'am
end.'
didn't.'
'Won't

Example output (after processing and printing to file):

don't
<opensingle>
George
ma'am
end
<period>
<closesingle>
didn't
<period>
<closesingle>
<opensingle>
Won't

I do have a further question in relation to this task: since distinguishing <opensingle> from <closesingle> seems rather difficult, would it be wiser to perform substitutions like

word = word.replace('.','\n<period>')
word = word.replace(',','\n<comma>')

after performing the single-quote replacement?

Recommended answer

What you really need in order to replace the starting and ending ' properly is a regex. To match them, use:

  • ^' for a starting ' (opensingle),
  • '$ for an ending ' (closesingle).
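
As a quick illustration of the two anchors, here is a minimal sketch that applies each pattern to single words taken from the example input. The variable names and the <quote> tag in the last line are only for illustration. The point is that a quote is matched only at the very start or very end of the string, so an apostrophe inside a word is left alone.

```python
import re

# ^' matches a single quote only at the start of the string.
opened = re.sub(r"^'", "<opensingle>", "'Won't")
print(opened)     # <opensingle>Won't

# '$ matches a single quote only at the end of the string.
closed = re.sub(r"'$", "<closesingle>", "didn't.'")
print(closed)     # didn't.<closesingle>

# An apostrophe in the middle of a word matches neither pattern.
untouched = re.sub(r"^'|'$", "<quote>", "ma'am")
print(untouched)  # ma'am
```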

Unfortunately, the str.replace method does not support regexes, so you should use re.sub instead.

Below is an example program that prints your desired output (in Python 3):

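import re

# Sample words from the question, separated by single spaces
text = "don't 'George ma'am end.' didn't.' 'Won't"
words = text.split(" ")
for word in words:
    # A quote at the very start of a word is an opening quote
    word = re.sub(r"^'", '<opensingle>\n', word)
    # A quote at the very end of a word is a closing quote
    word = re.sub(r"'$", '\n<closesingle>', word)
    # Plain-text substitutions for the remaining punctuation
    word = word.replace('.', '\n<period>')
    word = word.replace(',', '\n<comma>')
    print(word)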
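
If the tokens are needed for later processing rather than printed straight away, the same substitutions could be wrapped in small helpers. The following is only a sketch: tag_word and tag_line are hypothetical names, not part of the original answer. It assumes that a quote at the edge of a whitespace-separated token is always a quotation mark, so a word that legitimately begins or ends with an apostrophe would be tagged as well.

```python
import re

def tag_word(word):
    """Replace edge quotes and basic punctuation in one word with tags.

    Hypothetical helper: tags are separated from the word by newlines,
    exactly as in the program above.
    """
    word = re.sub(r"^'", "<opensingle>\n", word)   # quote at start of word
    word = re.sub(r"'$", "\n<closesingle>", word)  # quote at end of word
    word = word.replace(".", "\n<period>")
    word = word.replace(",", "\n<comma>")
    return word

def tag_line(line):
    """Split a line on whitespace and return a flat list of words and tags."""
    tokens = []
    for word in line.split():
        tokens.extend(tag_word(word).split("\n"))
    return tokens

tokens = tag_line("don't 'George ma'am end.' didn't.' 'Won't")
print("\n".join(tokens))
```

Returning a list keeps each word and tag as a separate item, so writing them one per line to a file is a single "\n".join(tokens) call.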
