python urllib3 login + search
Question
import urllib3
import io
from bs4 import BeautifulSoup
import re
import cookielib
http = urllib3.PoolManager()
url = 'http://www.example.com'
headers = urllib3.util.make_headers(keep_alive=True,user_agent='Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.6) Gecko/20070725 Firefox/2.0.0.6')
r = http.urlopen('GET', url, preload_content=False)
# Params that are then passed to the POST request
params = {
'login': '/shop//index.php',
'user': 'username',
'pw': 'password'
}
suche = {
'id' : 'searchfield',
'name' : 'suche',
}
# POST request including params (login); the answer is in response.data
response = http.request('POST', url, params, headers)
suche = http.request('POST', site-to-search? , suche, headers)
html_suche = suche.data
print html_suche
I am trying to log in to a site with this code and then run a search. With this code I get a response saying that I am not logged in.
How can I combine the two, so that I first log in and then search? Thanks.
Recommended answer
Web servers track the state of browser-like clients by setting cookies, which the client must send back on later requests. By default, urllib3 does not pretend to be a browser, so we need to do a little extra work to relay the cookie back to the server. Here's an example of how to do this with httpbin.org:
import urllib3
http = urllib3.PoolManager()
# httpbin does a redirect right after setting a cookie, so we disable redirects
# for this request
r = http.request('GET', 'http://httpbin.org/cookies/set?foo=bar', redirect=False)
# Grab the set-cookie header and build our headers for our next request.
# Note: This is a simplified version of what a browser would do.
headers = {'cookie': r.headers.get('set-cookie')}
print(headers)
# -> {'cookie': 'foo=bar; Path=/'}
r = http.request('GET', 'http://httpbin.org/cookies', headers=headers)
print(r.data.decode('utf-8'))
# -> {
# "cookies": {
# "foo": "bar"
# }
# }
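Applied to the login-then-search flow from the question, the same cookie relay might look like the sketch below. The function names `cookie_header` and `login_then_search` are made up for illustration, and the sketch assumes that the site sets its session cookie on the response to the login POST and accepts it on the search URL. It also strips attributes such as `Path` from the cookie before sending it back, which is closer to what a browser does.

```python
from http.cookies import SimpleCookie


def cookie_header(set_cookie_values):
    """Build a Cookie header value from a list of Set-Cookie header values.

    Only name=value pairs are kept; attributes such as Path or HttpOnly
    are meant for the client and are not sent back to the server.
    """
    jar = SimpleCookie()
    for value in set_cookie_values:
        jar.load(value)
    return "; ".join(
        "{}={}".format(name, morsel.coded_value) for name, morsel in jar.items()
    )


def login_then_search(http, login_url, login_fields, search_url, search_fields):
    """Log in, relay the cookies from the login response, then search.

    `http` is expected to be a urllib3.PoolManager (hypothetical helper,
    not part of urllib3 itself).
    """
    # Do not follow redirects here: the Set-Cookie header is on the
    # immediate response to the login POST.
    login = http.request("POST", login_url, fields=login_fields, redirect=False)

    # HTTPHeaderDict.getlist returns each Set-Cookie header separately.
    headers = {"cookie": cookie_header(login.headers.getlist("set-cookie"))}

    result = http.request("POST", search_url, fields=search_fields, headers=headers)
    return result.data


# Usage (requires urllib3; the search URL is whatever the site actually uses):
#   import urllib3
#   http = urllib3.PoolManager()
#   html = login_then_search(http, login_url, params, search_url, suche)
```

Whether the login succeeds still depends on the site: some forms expect extra hidden fields or a specific `Referer`, which this sketch does not handle.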
(Note: This recipe is useful, and urllib3's documentation would benefit from having it. I'd appreciate a pull request that adds something to this effect.)
Another option, as mentioned by Martijn, is to use a higher-level library that behaves more like a browser. robobrowser looks like a great choice for this kind of work, but requests also has provisions for managing cookies for you, and it uses urllib3 underneath. :)
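For comparison, here is a minimal sketch of the same flow with requests, assuming the site uses ordinary form posts. The function name `search_after_login` is hypothetical; the `session` argument is expected to be a `requests.Session`, which stores cookies set by the server and sends them on later requests made through the same session.

```python
def search_after_login(session, login_url, credentials, search_url, query):
    """Log in and search through one session so cookies carry over.

    `session` is expected to be a requests.Session (hypothetical helper,
    not part of requests itself).
    """
    # The session remembers any cookies set by the login response.
    login = session.post(login_url, data=credentials)
    login.raise_for_status()

    # The stored cookies are sent automatically with this request.
    result = session.post(search_url, data=query)
    result.raise_for_status()
    return result.text


# Usage (requires the third-party requests package):
#   import requests
#   with requests.Session() as session:
#       html = search_after_login(session, login_url, params, search_url, suche)
```

No manual header handling is needed here, which is the main reason to prefer a session-based library for this kind of task.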