python urllib3 login + search

Question

import urllib3
import io
from bs4 import BeautifulSoup
import re
import cookielib

http = urllib3.PoolManager()
url = 'http://www.example.com'
headers = urllib3.util.make_headers(keep_alive=True,user_agent='Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.6) Gecko/20070725 Firefox/2.0.0.6')
r = http.urlopen('GET', url, preload_content=False)

# Parameters that are passed along with the POST request
params = {
    'login': '/shop//index.php',
    'user': 'username',
    'pw': 'password'
  }
suche = {
    'id' : 'searchfield',
    'name' : 'suche',
    }

# POST request including the params (login); the response body is in response.data
response = http.request('POST', url, params, headers)
suche = http.request('POST', site-to-search? , suche, headers)
html_suche = suche.data

print html_suche

I am trying to log in to a site with this code and then run a search. With this code, the response I get says that I am not logged in.

How can I combine the two steps, so that I log in first and then search? Thanks.

Recommended answer

Web servers track the state of browser-like clients by setting cookies, which the client must send back on later requests. By default, urllib3 does not pretend to be a browser, so we need to do a little extra work to relay the cookie back to the server. Here's an example of how to do this with httpbin.org:

import urllib3
http = urllib3.PoolManager()

# httpbin does a redirect right after setting a cookie, so we disable redirects
# for this request
r = http.request('GET', 'http://httpbin.org/cookies/set?foo=bar', redirect=False)

# Grab the set-cookie header and build our headers for our next request.
# Note: This is a simplified version of what a browser would do.
headers = {'cookie': r.headers.get('set-cookie')}
print(headers)
# -> {'cookie': 'foo=bar; Path=/'}

r = http.request('GET', 'http://httpbin.org/cookies', headers=headers)
print(r.data.decode('utf-8'))
# -> {
#      "cookies": {
#        "foo": "bar"
#      }
#    }

(Note: This recipe is useful, and urllib3's documentation would benefit from including it. I'd appreciate a pull request that adds something to this effect.)
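Applied to the login-then-search case from the question, the same cookie relay might look like the sketch below. It assumes Python 3 and that the site issues its session cookie in a single Set-Cookie header on the login response. The helper names `cookie_header_from_set_cookie` and `login_then_search` are made up for illustration, and the URLs and form fields are placeholders supplied by the caller. Unlike the recipe above, it strips attributes such as `Path=/` before sending the cookie back.

```python
from http.cookies import SimpleCookie


def cookie_header_from_set_cookie(set_cookie):
    """Reduce a Set-Cookie value to the name=value pairs a client sends back.

    Attributes such as Path or HttpOnly are instructions for the client and
    are not part of the Cookie request header.
    """
    jar = SimpleCookie()
    jar.load(set_cookie or '')
    return '; '.join(
        '%s=%s' % (name, morsel.coded_value) for name, morsel in jar.items()
    )


def login_then_search(http, login_url, login_fields, search_url, search_fields):
    """Log in, then search while relaying the session cookie by hand.

    `http` is expected to be a urllib3.PoolManager. All URLs and form fields
    are hypothetical and depend on the target site.
    """
    # Disable redirects so a redirect after login cannot hide the Set-Cookie
    # header of the login response.
    login = http.request('POST', login_url, fields=login_fields, redirect=False)
    cookie = cookie_header_from_set_cookie(login.headers.get('set-cookie'))
    headers = {'cookie': cookie} if cookie else None
    result = http.request('POST', search_url, fields=search_fields, headers=headers)
    return result.data
```

A call could look like `login_then_search(urllib3.PoolManager(), login_url, params, search_url, suche)`, with `params` and `suche` being the dictionaries from the question. Sites that set several cookies, or cookies with an `Expires` date, need more careful parsing than this sketch does.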

Another option, as Martijn mentioned, is to use a higher-level library that behaves more like a browser. robobrowser looks like a great choice for this kind of work, and requests, which uses urllib3 underneath, also manages cookies for you. :)
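With requests, the cookie handling can be left to a `requests.Session`, which keeps cookies from earlier responses and sends them on later requests. The following is a rough sketch, not a drop-in solution: `search_after_login` is a made-up helper name, the URLs and form fields are placeholders, and it assumes the site accepts a plain form POST for both login and search.

```python
def search_after_login(session, login_url, credentials, search_url, query):
    """Log in and search through one session so cookies carry over.

    `session` is expected to be a requests.Session. The URLs and form fields
    are hypothetical and depend on the target site.
    """
    # The session stores any cookies set by the login response.
    login = session.post(login_url, data=credentials)
    login.raise_for_status()

    # The same session sends those cookies back automatically.
    result = session.post(search_url, data=query)
    result.raise_for_status()
    return result.text


# Possible usage (requires the third-party requests package):
#
#   import requests
#   with requests.Session() as session:
#       html = search_after_login(session, login_url, params, search_url, suche)
```

Here `params` and `suche` stand for the form dictionaries from the question. Whether the login succeeds still depends on the site, for example on hidden form fields or tokens that this sketch does not handle.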
