Find and replace both quotation styles in a Python Unicode string


Question

I'm trying to replace substrings marked with either quotation mark style (straight "..." and curly “...”) in a Python string.

I've already written a regex that handles the standard (straight) quotation marks:

print re.sub(r'\"(.+?)\"', r'<em>"\1"</em>', self.title)

When I try the same thing for the curly ("literary"?) ones, nothing gets replaced:

return re.sub(r'\“(.+?)\”', r'<em>“\1”</em>', self.title)

In fact, as the code stands right now, I can't even get a conditional check to work:

quote_list = ['“', '”']

if all(character in self.title for character in quote_list):
    print "It has literary quotes"
    print re.sub(r'\“(.+?)\”', r'<em>“\1”</em>', self.title)
print re.sub(r'\"(.+?)\"', r'<em>"\1"</em>', self.title)

EDIT: Further context: the code is a method on a model object:

import re

from django.db import models


class Entry(models.Model):
    title = models.CharField(max_length=200)

    def render_title(self):
        """
        Wrap <em> tags around the quoted parts of the title.
        """
        quote_list = ['“', '”']

        if all(character in self.title for character in quote_list):
            print "It has literary quotes"
            return re.sub(r'\“(.+?)\”', r'<em>“\1”</em>', self.title)
        return re.sub(r'\"(.+?)\"', r'<em>"\1"</em>', self.title)

I am not well-versed in regex commands. What am I doing wrong?

EDIT2: One step closer to the problem! It comes down to the fact that I'm dealing with Unicode strings. I'm still stumped as to how to solve this. Any help is appreciated!

>>> title = u"sdsfgsdfgsdgfsdgs “ asd” asd"
>>> print re.sub(r'\“(.+?)\”', r'<em>“\1”</em>', title)
sdsfgsdfgsdgfsdgs “ asd” asd
>>> title = "sdsfgsdfgsdgfsdgs “ asd” asd"
>>> print re.sub(r'\“(.+?)\”', r'<em>“\1”</em>', title)
sdsfgsdfgsdgfsdgs <em>“ asd”</em> asd

Recommended answer

I finally found an answer. After printing the variable as suggested by @interjay, I found out that the string was a Unicode string.

Comparing it with a plain (byte) string didn't work, so I removed the conditional and used this answer to build Unicode raw regex strings that handle both the simple and the "literary" quotes:

title = re.sub(ur'\"(.+?)\"', ur'"<em>\1</em>"', self.title)  # notice the ur (Python 2 Unicode raw string)
title = re.sub(ur'\“(.+?)\”', ur'“<em>\1</em>”', title)
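
Why the `ur` prefix makes a difference: in Python 2 a plain `'...'` literal is a byte string, so a curly quote typed into it is stored as several bytes rather than one character. The following is only a sketch in Python 3 syntax (an assumption, since the code above is Python 2), and it further assumes the source file was saved as UTF-8:

```python
# Sketch (Python 3): a curly quote is one code point but three UTF-8 bytes.
curly_open = "\u201c"                  # LEFT DOUBLE QUOTATION MARK
as_bytes = curly_open.encode("utf-8")  # roughly what a Python 2 byte-string literal would hold

print(len(curly_open))  # 1
print(len(as_bytes))    # 3
print(as_bytes)         # b'\xe2\x80\x9c'
```

A pattern built from those three bytes looks for three separate characters, so it never lines up with the single character stored in a Unicode title.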

I saw in a comment here (unfortunately now deleted) how the two statements above could be merged into one, but for now this works.
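
One possible way to merge the two substitutions into a single pass is an alternation with a replacement function. This is a sketch in Python 3 (where every `str` is Unicode, so no `ur` prefix is needed), not the deleted comment's approach; the names `QUOTED`, `_wrap` and `emphasize_quoted` are hypothetical:

```python
import re

# Group 1 captures text inside straight quotes, group 2 text inside curly quotes.
# \u201c and \u201d are the left and right curly double quotation marks.
QUOTED = re.compile('"(.+?)"|\u201c(.+?)\u201d')


def _wrap(match):
    """Re-emit the match with <em> inside the same quote style it was found in."""
    if match.group(1) is not None:
        return '"<em>{}</em>"'.format(match.group(1))
    return '\u201c<em>{}</em>\u201d'.format(match.group(2))


def emphasize_quoted(title):
    """Wrap the content of straight-quoted and curly-quoted spans in <em> tags."""
    return QUOTED.sub(_wrap, title)


print(emphasize_quoted('a "b" c \u201cd\u201d e'))
```

Keeping the two styles as separate alternatives means a straight opening quote is never paired with a curly closing quote.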

Thanks a lot for your help!
