How to handle response encoding from urllib.request.urlopen()

Views: 1220 | Posted: 2017/8/16 19:31:03 | Tags: python, regex, encoding, urllib

Question

I'm trying to search a webpage using regular expressions, but I'm getting the following error:

TypeError: can't use a string pattern on a bytes-like object

I understand why: urllib.request.urlopen() returns a byte stream, so (at least I'm guessing) re doesn't know which encoding to use. What am I supposed to do in this situation? Is there a way to specify the encoding in the request, or will I need to decode the bytes myself? If so, what should I be doing? I assume I should read the encoding from the response headers, or from the encoding declared in the HTML if there is one, and then decode using that.
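
For reference, the error can be reproduced without any network access. The following is a minimal sketch: the literal `raw` bytes and the `<title>` pattern are assumptions standing in for whatever `urlopen(...).read()` returns and whatever is being searched for.

```python
import re

# Stand-in for the bytes that urlopen(...).read() would return (assumed content).
raw = b"<title>Example</title>"
pattern = r"<title>(.*?)</title>"

try:
    # A str pattern applied to a bytes subject raises TypeError.
    re.search(pattern, raw)
except TypeError as exc:
    message = str(exc)

# Matching works once the pattern and the subject are the same type:
# either decode the bytes to str, or use a bytes pattern.
as_text = re.search(pattern, raw.decode("utf-8")).group(1)
as_bytes = re.search(pattern.encode("ascii"), raw).group(1)
```

Decoding first is usually the more convenient of the two, since the match results are then ordinary strings.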

Solution

You just need to decode the response using the charset named in the Content-Type header, which is typically the header's last value (for example, charset=utf-8). There is also an example in the tutorial.

# Here, response is assumed to hold the bytes returned by urllib.request.urlopen(...).read()
output = response.decode('utf-8')
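
Hard-coding 'utf-8' only works when the server really sends UTF-8. A slightly fuller sketch of the same idea reads the charset from the Content-Type header and falls back to UTF-8 when none is declared. The helper names `decode_response` and `find_title`, the fallback value, and the `<title>` pattern are illustrative assumptions, not part of the original answer.

```python
import re
import urllib.request


def decode_response(response, fallback="utf-8"):
    """Read a urlopen() response and decode its body to str.

    The charset comes from the Content-Type header when the server
    declares one; `fallback` is an assumed default, not a guarantee.
    """
    charset = response.headers.get_content_charset() or fallback
    # errors="replace" avoids an exception if the declared charset is wrong.
    return response.read().decode(charset, errors="replace")


def find_title(url):
    """Hypothetical helper: fetch `url` and return its <title> text, or None."""
    with urllib.request.urlopen(url) as response:
        html = decode_response(response)
    # The subject is now str, so a str pattern is fine.
    match = re.search(r"<title>(.*?)</title>", html, re.IGNORECASE | re.DOTALL)
    return match.group(1).strip() if match else None
```

Calling `find_title("https://example.com/")` would then return the page title as a string. Pages that declare their encoding only in an HTML meta tag are not handled by this sketch.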
