Python Requests - Is it possible to receive a partial response after an HTTP POST?


Question

I am using the Python Requests module to data-mine a website. As part of the data mining, I have to submit a form via HTTP POST and check whether it succeeded by inspecting the resulting URL. My question is: after the POST, is it possible to ask the server not to send the entire page? I only need to check the URL, yet my program downloads the entire page and consumes unnecessary bandwidth. The code is very simple:

import requests

r = requests.post(URL, data=payload)  # URL and payload are defined elsewhere
if 'keyword' in r.url:
    print('success')
else:
    print('fail')

Recommended answer

An easy solution, if it is implementable for you, is to go low-level and use the socket library. For example, say you need to send a POST with some data in its body. I used this in my crawler for one site.

import socket
from urllib.parse import urlencode  # form fields in the POST body must be URL-encoded

HOST = "www.yourtarget.com"
req_url = "test.php"
req_body = urlencode({"data1": "yourtestdata", "data2": "foo", "data3": "bar="})
req_header = (
    "POST /{0} HTTP/1.1\r\n"
    "Host: {1}\r\n"
    "User-Agent: For the lulz..\r\n"
    "Content-Type: application/x-www-form-urlencoded; charset=UTF-8\r\n"
    "Content-Length: {2}"
)
# plug in req_url as {0}, the host as {1} and the byte length of req_body as Content-Length
header = req_header.format(req_url, HOST, len(req_body.encode("utf-8")))

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)  # create a TCP socket
s.connect((HOST, 80))                                   # connect to port 80 (plain HTTP)
# send header + 2x CRLF + body; Content-Length tells the server where the body ends
s.sendall((header + "\r\n\r\n" + req_body).encode("utf-8"))

page = b""
while b"\r\n\r\n" not in page:  # stop once the whole header (ending with 2x CRLF) has arrived
    buf = s.recv(1024)          # receive up to 1024 bytes; usually enough to get the header in one try
    if not buf:                 # the server closed the connection
        break
    page += buf
s.close()       # close the socket here, which should close the TCP connection even if data is still flowing in
                # this should leave you with a header in which you should find a 302 redirect and your target URL in the "Location:" header
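
The last step, pulling the status code and the "Location:" value out of the received header, can be sketched as follows. This is a minimal illustration and not part of the original answer: the helper name parse_redirect and the sample response are made up, and it assumes the header text ends with a blank line (2x CRLF) as described above.

```python
def parse_redirect(raw_response):
    """Return (status_code, location) from the start of a raw HTTP response.

    location is None when the response carries no Location header.
    """
    if isinstance(raw_response, bytes):
        # HTTP header bytes map one-to-one onto ISO-8859-1 characters
        raw_response = raw_response.decode("iso-8859-1")
    # keep only the header part; anything after the blank line is body
    head = raw_response.split("\r\n\r\n", 1)[0]
    lines = head.split("\r\n")
    # the status line looks like: HTTP/1.1 302 Found
    status_code = int(lines[0].split(" ", 2)[1])
    location = None
    for line in lines[1:]:
        name, sep, value = line.partition(":")
        # header names are case-insensitive
        if sep and name.strip().lower() == "location":
            location = value.strip()
            break
    return status_code, location


# made-up sample response, for illustration only
sample = (b"HTTP/1.1 302 Found\r\n"
          b"Location: /result.php?keyword=1\r\n"
          b"Content-Length: 0\r\n\r\n")
status, location = parse_redirect(sample)
print(status, location)
```

With the data collected by the socket loop, the success check from the question would then become a test on the returned location, for example whether it contains 'keyword'.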
