Python: adding an int variable to a URL string

Question

pgno = 1
while pgno < 4304:
    result = urllib.urlopen("http://www.example.comtraderesourcespincode.aspx?" +
                            "&GridInfo=Pincode0"+ pgno)
    print pgno
    html = result.read()
    parser = etree.HTMLParser()
    tree   = etree.parse(StringIO.StringIO(html), parser)
    pgno += 1

In http://.......=Pincode0 I need to append the page number, e.g. 'Pincode01', and loop through 01, 02, 03, and so on. For this I am using a while loop, and the counter variable is 'pgno'.

The problem is that the counter is incremented by 1, but 'Pincode01' does not become 'Pincode02', so the 2nd page of the site is never opened.

I even tried + str(pgno)), with no luck.

Please show how to do this. I have attempted it several times without success.

Recommended answer

Maybe this is what you want:

from urllib import urlopen
import re

pgno = 2
# Format the page number into the query string
url = "http://www.eximguru.com/traderesources/pincode.aspx?&GridInfo=Pincode0%s" % str(pgno)
print url +'\n'
sock = urlopen(url)
htmlcode = sock.read()
sock.close()

# Position of the first pager link; the data rows are searched from here on
x = re.search('%;"><a href="javascript:__doPostBack',htmlcode).start()

# One table row: pincode followed by three text cells
pat = ('\t\t\t\t<td style="width:\d+%;">(\d+)</td>'
       '<td style="width:\d+%;">(.+?)</td>'
       '<td style="width:\d+%;">(.+?)</td>'
       '<td style="width:30%;">(.+?)</td>\r\n')
regx = re.compile(pat)

print '\n'.join(map(repr,regx.findall(htmlcode,x)))

Result

http://www.eximguru.com/traderesources/pincode.aspx?&GridInfo=Pincode02

('110001', 'New Delhi', 'Delhi', 'Baroda House')
('110001', 'New Delhi', 'Delhi', 'Bengali Market')
('110001', 'New Delhi', 'Delhi', 'Bhagat Singh Market')
('110001', 'New Delhi', 'Delhi', 'Connaught Place')
('110001', 'New Delhi', 'Delhi', 'Constitution House')
('110001', 'New Delhi', 'Delhi', 'Election Commission')
('110001', 'New Delhi', 'Delhi', 'Janpath')
('110001', 'New Delhi', 'Delhi', 'Krishi Bhawan')
('110001', 'New Delhi', 'Delhi', 'Lady Harding Medical College')
('110001', 'New Delhi', 'Delhi', 'New Delhi Gpo')
('110001', 'New Delhi', 'Delhi', 'New Delhi Ho')
('110001', 'New Delhi', 'Delhi', 'North Avenue')
('110001', 'New Delhi', 'Delhi', 'Parliament House')
('110001', 'New Delhi', 'Delhi', 'Patiala House')
('110001', 'New Delhi', 'Delhi', 'Pragati Maidan')
('110001', 'New Delhi', 'Delhi', 'Rail Bhawan')
('110001', 'New Delhi', 'Delhi', 'Sansad Marg Hpo')
('110001', 'New Delhi', 'Delhi', 'Sansadiya Soudh')
('110001', 'New Delhi', 'Delhi', 'Secretariat North')
('110001', 'New Delhi', 'Delhi', 'Shastri Bhawan')
('110001', 'New Delhi', 'Delhi', 'Supreme Court')
('110002', 'New Delhi', 'Delhi', 'Rajghat Power House')
('110002', 'New Delhi', 'Delhi', 'Minto Road')
('110002', 'New Delhi', 'Delhi', 'Indraprastha Hpo')
('110002', 'New Delhi', 'Delhi', 'Darya Ganj')
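
To connect this back to the loop in the question, here is a minimal sketch of building one URL per page number. It only constructs strings and does not fetch anything. The base URL is the one used in the answer, with the query separator written as a plain `&`; the name `page_url` is a hypothetical helper introduced for illustration.

```python
# Base URL taken from the answer above; the page number is appended to it.
BASE = "http://www.eximguru.com/traderesources/pincode.aspx?&GridInfo=Pincode0"

def page_url(pgno):
    """Return the URL for page `pgno` (hypothetical helper)."""
    # An int cannot be concatenated to a str directly; convert it first.
    return BASE + str(pgno)

# Build the URLs for the first three pages, as the while loop would.
urls = [page_url(pgno) for pgno in range(1, 4)]
for u in urls:
    print(u)
```

The same result can be obtained with `"...Pincode0%s" % pgno`, as in the answer; the key point is that the number is turned into text before it is joined to the URL.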

I wrote this code after studying the structure of the HTML source with the following code (I think you'll understand it without further explanation):

from urllib2 import Request,urlopen
import re 

pgno = 2
url = "http://www.eximguru.com/traderesources/pincode.aspx?&GridInfo=Pincode0%s" % str(pgno)
print url +'\n'
sock = urlopen(url)
htmlcode = sock.read()
sock.close()

li = htmlcode.splitlines(True)

print '\n'.join(str(i) + ' ' + repr(line)+'\n' for i,line in enumerate(li) if 275<i<300)


ch = ''.join(li[0:291])
from collections import defaultdict
didi =defaultdict(int)
for c in ch:
    didi[c] += 1

print '\n\n'+repr(li[289])
print '\n'.join('%r -> %s' % (c,didi[c]) for c in li[289] if didi[c]<35)
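
As a rough illustration of what the inspection code does, the sketch below counts how often each character occurs in the text up to a target line, then lists the characters of that line that are rare. Rare characters are useful landmarks when writing a regex for that line. The function name `rare_chars`, the sample lines, and the threshold are assumptions made for this example; it uses `collections.Counter` in place of the `defaultdict` above.

```python
from collections import Counter

def rare_chars(lines, target_index, threshold):
    """List (char, count) for characters of lines[target_index] that occur
    fewer than `threshold` times in lines[0..target_index] (hypothetical helper)."""
    counts = Counter(''.join(lines[:target_index + 1]))
    seen = []
    for c in lines[target_index]:
        # Keep each rare character once, in order of first appearance.
        if counts[c] < threshold and c not in seen:
            seen.append(c)
    return [(c, counts[c]) for c in seen]

# Tiny stand-in for the downloaded HTML, split into lines.
sample = ['<tr>\n', '<td>a</td>\n', '<td style="width:30%;">x</td>\n']
result = rare_chars(sample, 2, 2)
print('\n'.join('%r -> %s' % pair for pair in result))
```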



Now, the problem is that the same HTML is returned for every value of pgno. The site may be detecting that a program, rather than a browser, is connecting to fetch data. This would have to be handled with the tools in urllib2, but I have no experience with them.
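
For readers who want to experiment with that idea, here is a minimal sketch of how a request with a browser-like User-Agent header can be built. It uses Python 3's `urllib.request`, the successor of `urllib2`. The header value and the helper name `build_request` are assumptions for illustration, and whether sending such a header changes what this site returns has not been verified here. The sketch only constructs the request; it does not contact the server.

```python
from urllib.request import Request

BASE = "http://www.eximguru.com/traderesources/pincode.aspx?&GridInfo=Pincode0"
# Hypothetical header value; any browser-like string could be tried.
USER_AGENT = "Mozilla/5.0 (compatible; example-script)"

def build_request(pgno):
    """Return a Request for page `pgno` with a User-Agent header set."""
    return Request(BASE + str(pgno), headers={"User-Agent": USER_AGENT})

req = build_request(2)
print(req.full_url)
# Request stores header names capitalized, hence "User-agent".
print(req.get_header("User-agent"))
```

Passing `req` to `urllib.request.urlopen` would then send the request with that header.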
