Using decode() vs. regex to unescape this string


Problem description


I have the following string and I'm trying to figure out the best practice for unescaping it.

The solution has to be somewhat flexible in that I'm receiving this input from an API and I can't be absolutely certain that the current character structure (\n as opposed to \r) will always be the same.

'"If it ain\'t broke, don\'t fix it." \nWent in for a detailed car wash.\nThe attendants raved-up my engine when taking the car into the tunnel. NOTE: my car is...'

This regex seems like it should work:

text_excerpt = re.sub(r'[\s"\\]', ' ', raw_text_excerpt).strip()

I've also read that decode() might work (and would be a better solution generally).

raw_text_excerpt.decode('string_unescape')

Tried something along those lines and it didn't work. Any suggestions? Is regex best here?

Solution

The codec you're looking for is string-escape (string_unescape is not a registered codec name):

>>> print "\\'".decode("string-escape")
'
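
The string-escape codec exists only in Python 2. For readers on Python 3, a rough equivalent can be sketched with the unicode_escape codec. The helper name `unescape` and the shortened sample string are illustrative assumptions, not part of the original answer, and the sketch assumes the input is a `str` containing literal backslash sequences.

```python
def unescape(text):
    """Interpret backslash escape sequences found in a str."""
    # Encode to Latin-1 first so the codec receives bytes. Characters
    # outside Latin-1 are written as \uXXXX escapes by backslashreplace,
    # and unicode_escape turns them back into the original characters.
    return text.encode("latin-1", "backslashreplace").decode("unicode_escape")


# Raw string literal: the backslashes below are real characters in the data.
raw_text_excerpt = r'"If it ain\'t broke, don\'t fix it." \nWent in for a detailed car wash.'
text_excerpt = unescape(raw_text_excerpt)
print(text_excerpt)
```

Calling `decode("unicode_escape")` on UTF-8 bytes directly would mangle non-ASCII text, which is why the sketch goes through Latin-1 with `backslashreplace` instead.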

I'm not sure what version they added it in, though... could be an older version you're using that doesn't have it. I'm running:

Python 2.6.6 (r266:84292, Mar 25 2011, 19:36:32) 
[GCC 4.5.2] on linux2
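
The question also asks for flexibility in case the API returns \r instead of \n. One possible approach, once the text contains real control characters, is to collapse every run of whitespace into a single space. This is a minimal sketch that assumes flattening line breaks into spaces is acceptable; the function name `normalize_whitespace` is hypothetical.

```python
import re


def normalize_whitespace(text):
    """Collapse each run of whitespace (spaces, CR, LF, CRLF, tabs) into one space."""
    return re.sub(r"\s+", " ", text).strip()


print(normalize_whitespace("first line\r\nsecond line\rthird line\n"))
```

Unlike the character class in the question's regex, this pattern leaves quotes and backslashes alone, so it only changes whitespace.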
