How to eliminate the ☎ unicode?

Views: 31 Posted: 2021/6/26 19:49:07 Tags: python regex python-2.7 scrapy

Problem description

While scraping a web page, after stripping all HTML tags, I was left with the Unicode black telephone character \u260e (☎). Unlike in this response, I want to get rid of it as well.

I used the following regular expression in Scrapy to eliminate the HTML tags:

pattern = re.compile("<.*?>|&nbsp;|&amp;",re.DOTALL|re.M)

Then I tried to match \u260e, and I think I got caught by the backslash plague. I tried these patterns without success:

pattern = re.compile("<.*?>|&nbsp;|&amp;|\u260e",re.DOTALL|re.M)
pattern = re.compile("<.*?>|&nbsp;|&amp;|\\u260e",re.DOTALL|re.M)
pattern = re.compile("<.*?>|&nbsp;|&amp;|\\\\u260e",re.DOTALL|re.M)

None of these worked; I still get \u260e in the output. How can I make it disappear?

Recommended answer

Using Python 2.7.3, the following works fine for me:

import re

pattern = re.compile(u"<.*?>|&nbsp;|&amp;|\u260e",re.DOTALL|re.M)
s = u"bla ble \u260e blo"
re.sub(pattern, "", s)

Output:

u'bla ble  blo'
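The same substitution can be wrapped in a small helper. This is only a sketch: the name `strip_markup` and the UTF-8 default are assumptions for illustration, not part of the original answer or of Scrapy. It decodes byte input first so that the pattern and the text are both unicode before matching.

```python
import re

# Same alternatives as in the answer, written as a unicode literal so that
# \u260e is the single character U+260E rather than six separate characters.
PATTERN = re.compile(u"<.*?>|&nbsp;|&amp;|\u260e", re.DOTALL | re.M)


def strip_markup(text, encoding="utf-8"):
    """Hypothetical helper: remove tags, the two entities, and the phone sign."""
    # Assumption: byte input is UTF-8 unless the caller says otherwise.
    if isinstance(text, bytes):
        text = text.decode(encoding)
    return PATTERN.sub(u"", text)


cleaned = strip_markup(u"<p>bla ble \u260e blo</p>")
```

Decoding first matters because a byte string and a unicode pattern do not describe the phone character the same way, which is the mismatch the question ran into.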

As @Zack pointed out, this works because the string is now unicode: the literal has already been converted, so the character sequence \u260e has become the single code point U+260E, the little black phone ☎ itself (three bytes when encoded as UTF-8).

Once both the string being searched and the regular expression contain the black phone itself, rather than the character sequence \u260e, the two match.
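The difference between the escape sequence and the character can be checked directly. The sketch below uses a raw literal to stand in for the unconverted text; the variable names are illustrative only.

```python
escaped = r"\u260e"  # raw literal: backslash, u, 2, 6, 0, e
phone = u"\u260e"    # unicode literal: the single character U+260E

print(len(escaped))                # 6 characters
print(len(phone))                  # 1 character
print(len(phone.encode("utf-8")))  # 3 bytes once encoded as UTF-8
print(escaped == phone)            # False: the two are different strings
```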
