Python requests response encoded in utf-8 but cannot be decoded

Published: 2022/1/5 15:39:48 · Tags: python, post, request, facebook-messenger


Problem description

I am trying to scrape my messenger.com (Facebook Messenger) chats using Python. I used Google Chrome's developer tools to inspect the POST request for the chat history, and I copied the entire header and body into a format that requests can use.

I get HTTP code 200, implying the request at least got something back, and printing res.encoding to see the encoding it returned gives utf-8. But I cannot decode it!

Here is the function:

def download_thread(self, limit, offset, message_timestamp):
    """Download the specified number of messages from the
    provided thread, with an optional offset
    """
    data = request_data(self.thread, offset=offset,
                        limit=limit, group=self.group,
                        timestamp=message_timestamp)

    res = self.ses.post(url_thread, data=data, headers=headers)

    print(res.content)

    thread_contents = json.loads(res.content)
    print(thread_contents)
    return thread_contents

This yields

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x87 in position 0: invalid start byte

when it tries to json.load (or loads) the data.

But res.encoding does return utf-8.
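Note that res.encoding is the character set requests uses to turn the body bytes into res.text, taken from the Content-Type header. Any compression applied to the body is named separately, in the Content-Encoding header. A minimal sketch for reading both from a set of response headers follows; the helper name describe_encodings is hypothetical, and it only inspects headers rather than diagnosing the response itself.

```python
from email.message import Message


def describe_encodings(headers):
    """Return (charset, content_encoding) from a mapping of response headers.

    `headers` can be any mapping of header names to values, such as
    `res.headers` from requests. Missing values come back as None.
    """
    # Lower-case the keys so plain dicts work as well as requests' mapping.
    lowered = {key.lower(): value for key, value in headers.items()}

    # Use the stdlib header parser to pull the charset parameter
    # out of Content-Type, e.g. "application/json; charset=utf-8".
    msg = Message()
    content_type = lowered.get("content-type")
    if content_type:
        msg["Content-Type"] = content_type
    charset = msg.get_param("charset")

    # Content-Encoding names the compression applied to the body, if any.
    content_encoding = lowered.get("content-encoding")
    return charset, content_encoding
```

Calling describe_encodings(res.headers) right after the POST would show whether the server declared a compression scheme in addition to the utf-8 charset.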

I tried unzipping with gzip, but that reports that the content is not gzipped.
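A gzip stream always starts with the two bytes 0x1f 0x8b, so the gzip result can be cross-checked against the first bytes of res.content. The error above reports byte 0x87 at position 0, which is consistent with the body not being gzip. A small sketch of that check, where the helper names looks_like_gzip and try_gunzip are hypothetical:

```python
import gzip

# Every gzip member begins with these two magic bytes.
GZIP_MAGIC = b"\x1f\x8b"


def looks_like_gzip(payload):
    """Return True if the payload starts with the gzip magic bytes."""
    return payload[:2] == GZIP_MAGIC


def try_gunzip(payload):
    """Decompress the payload if it looks like gzip, otherwise return None."""
    if not looks_like_gzip(payload):
        return None
    return gzip.decompress(payload)
```

If try_gunzip(res.content) returns None, the bytes are either compressed with some other scheme or are not compressed at all, and the Content-Encoding response header is the next place to look.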

If I just try print(res.content), I get:

Traceback (most recent call last):
  File "FBChatScraper.py", line 200, in <module>
    main()
  File "FBChatScraper.py", line 134, in main
    fbms.run()
0fx82x048xbbxb9=x87xebK0.xffx90xddxebxfax16xc6xbbzx8bx82)xe8xaaVx01^xdax8bxbdx15d-xb1x10@x17\xd43xa8x92wxe8xc0xcdUxc4xffxc7xfax90xb2xb3xf5x84x11ux0b	x8fx83rxf3}xe5!y$xe6xf6c0xf0xb4x98xcat_x0cx08xb5xddx8ctxx91xa9x95
B%xe2x93xa52x85_xa6x10xc2xc9xa3xee4SDbxa5x18QJx83Xx19)xaa$xf4xb4xb7x0bx84x15&x88x08Lxc9iPxa2xb9xf2xafx96x96Nxd8xcf=x05xc1x18x8dxa0xf2Yx8e
xcfxc8x0fE4xd6)xa1xd4xb7Dxd6{ixc8Px96Rx11HCxacxbcKyT#~}x93xf7@Kxc7r/x82xb0xe4xefXxf9jx08xa6Hpxfcnx06xfdox9axd0wJxb4fJ(x89+x1cxf6x0eOIx90xacx9eDDxfd,xa5xe9x89x1blhx86Zx98x05xdd9xc7xf4x80xfcYx8exadxeex99!x15x13+x9bx07xe8Fdjxfcx11xfcxfe7x06hx02x00@>]Wx92xc9x02xb1c3x82xcdxa4xefN9x90xe6x81yx9cx84erxd4xc3x06x1cx06x14xcfxc7x07hjxbfHxdcxf5~xf7zx18Cexaf^x8cxab xdfVxcexb8x11xf8x06x03'

Traceback (most recent call last):
  File "FBChatScraper.py", line 200, in <module>
    main()
  File "FBChatScraper.py", line 134, in main
    fbms.run()
  File "FBChatScraper.py", line 43, in run
    thread_contents = self.download_thread(limit, offset, message_timestamp)
  File "FBChatScraper.py", line 74, in download_thread
    thread_contents = json.loads(res.content)
  File "/Users/silman/anaconda/lib/python3.6/json/__init__.py", line 349, in loads
    s = s.decode(detect_encoding(s), 'surrogatepass')
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x87 in position 0: invalid start byte

Oddly, the content is printed in the middle of the traceback, which leads me to think there are some invisible characters pushing it down.

I am unable to load the response as JSON because, no matter how I handle the response content, it isn't properly formatted for the json library to interpret.

Moreover, if I just do print(res.text), I get garbage:

Traceback (most recent call last):
  File "FBChatScraper.py", line 200, in <module>
    main()
  File "FBChatScraper.py", line 134, in main
    fbms.run()
}sP���c���f�u0���� QZed�C��� M$x�Ҹ�H�����eǘ�]���5���^�*�ӄaM�Y��b���/ڶ�JW/���>H6z���l4����t=i��%Ҳu�x��%�x�
       F    <���{1i�#%;�rɲ=Rχm��1B�Z(+�(S-���#��v�{b��
                                                           �    f/V�i̴��_��83�  �_����*��O��
                                                                                            ������Z��i-�TVeaG54�!v�a?ǯ|gu-g��.���"J$�L`&�tΊ#s)�H����s���q���^׷0��[)���j�ॽ�T���U���J�ЁwW���!eg�#j ��r��$y���3�4��4.��M�@Kb�AX�SDb�QJ�X)�,���a�   "Sp�h�����sOA0Vé|�������:%�rKdKC���@ M��.�^
�       �g���SWQHӳ.��BӄG�,����@E��������
                                        nras��L�/��ch@>]W���c3�ͤ�N9��y��er����hj�H��~�zCe�^�� �Vθ�

Traceback (most recent call last):
  File "FBChatScraper.py", line 200, in <module>
    main()
  File "FBChatScraper.py", line 134, in main
    fbms.run()
  File "FBChatScraper.py", line 43, in run
    thread_contents = self.download_thread(limit, offset, message_timestamp)
  File "FBChatScraper.py", line 74, in download_thread
    thread_contents = json.loads(res.content)
  File "/Users/silman/anaconda/lib/python3.6/json/__init__.py", line 349, in loads
    s = s.decode(detect_encoding(s), 'surrogatepass')
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x87 in position 0: invalid start byte

Here is an MWE as best I can manage; I'm not sure what data from my POST request is private, so I left some out.

Using this data:

url_thread = "https://www.messenger.com/api/graphqlbatch/"


request_data = {
  "batch_name": "MessengerGraphQLThreadFetcher",
  "__user": "<user_id>",
  "__a": "1",
  "__dyn": "<dyn>",
  "__req": "9",
  '__be'      : '-1',
  '__pc'      : 'PHASED:messengerdotcom_pkg',
  "fb_dtsg": "AQFni7TU2nes:AQGSC8FSDqyw",
  "ttstamp": "265817254666710077746711957586581715370521181008510710777",
  "__rev": "3791607",
  "jazoest": "<jazoest>",
  "queries": '<queries>'
  }

headers = {
  "authority": "www.messenger.com",
  "method": "POST",
  "path": "/api/graphqlbatch/",
  "scheme": "https",
  "accept": "*/*",
  "accept-encoding": "gzip, deflate, br",
  "accept-language": "en-US,en;q=0.9",
  "cache-control": "no-cache",
  "content-length": "754",
  "content-type" : "application/x-www-form-urlencoded",
  "cookie": "<cookies>",
  "origin": "https://www.messenger.com",
  "pragma": "no-cache",
  "referer": "https://www.messenger.com/t/<chatID>",
  "user-agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3325.181 Safari/537.36"
}

You can get all the <items> by using Chrome developer tools and looking on the Network tab for a POST request to Request URL: https://www.messenger.com/api/graphqlbatch/.

It's easy to find if you scroll up to reload old messages while Chrome dev tools is recording.

Then put together a simple request with Python:

import json  # needed for json.loads below
import time

import requests as rq

ses = rq.Session()
thread = <ID of thread found in URL of messenger.com>

conversation_type = <'thread_fbids' if group chat else 'user_ids'>

data = request_data
data['messages[{}][{}][offset]'.format(conversation_type, thread)] = 0
data['messages[{}][{}][timestamp]'.format(conversation_type, thread)] = int(time.time())
data['messages[{}][{}][limit]'.format(conversation_type, thread)] = 2000

res = ses.post(url_thread, data=data, headers=headers)

print(res.content)
thread_contents = json.loads(res.content)
print(thread_contents)

As for what my dev tools got back, you can see the start of the JSON here
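One thing worth checking, offered as an assumption rather than a confirmed diagnosis: the copied headers advertise br (Brotli) in accept-encoding and also include values the browser computed for its own request (authority, method, path, scheme, content-length). requests decompresses gzip and deflate bodies automatically, but Brotli support depends on the installed urllib3 version and on an optional Brotli package being present, so a server that honours the header could return a body that arrives still compressed. The sketch below drops those entries so that requests sets its own values; the helper name strip_managed_headers and the exact set of names are hypothetical choices for illustration.

```python
# Header names copied from the browser that requests can set by itself.
# This set is an assumption for illustration, not an exhaustive list.
MANAGED_HEADERS = {
    "authority",
    "method",
    "path",
    "scheme",
    "content-length",
    "accept-encoding",
}


def strip_managed_headers(headers):
    """Return a copy of `headers` without the browser-managed entries.

    The comparison ignores case and a leading ':' so HTTP/2 pseudo-headers
    such as ':authority' are matched as well. The input is not modified.
    """
    return {
        name: value
        for name, value in headers.items()
        if name.lower().lstrip(":") not in MANAGED_HEADERS
    }
```

The request would then be sent as ses.post(url_thread, data=data, headers=strip_managed_headers(headers)), after which res.headers.get("Content-Encoding") shows which compression, if any, the server actually used.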
