How to loop through a tags and redirect to retrieve more a tags?

Views: 83 · Published: 2018/6/23 16:22:25 · Tags: python, html, web-scraping


Question


For educational purposes, I am trying to write a program that prompts the user for "url", "count" and "position". The "url" is scraped and the "a tags" within it are retrieved, yielding a list of "a tags". The "position" is then used to select a new link from that list, and this link becomes the new "url" to scrape. "Count" is the number of times this process takes place.
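The retrieval step described above (collect the href of every anchor on a page, then pick the one at "position") can be sketched with Python 3's standard-library `html.parser` standing in for BeautifulSoup. This is a minimal sketch: the sample HTML is made up, and `AnchorCollector` and `extract_hrefs` are illustrative names, not part of any library.

```python
from html.parser import HTMLParser


class AnchorCollector(HTMLParser):
    """Collect the href attribute of every <a> tag, in document order."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names; attrs is a list of (name, value) pairs
        if tag == "a":
            # Like tag.get('href', None) in the question: None when the anchor has no href
            self.hrefs.append(dict(attrs).get("href"))


def extract_hrefs(html):
    """Return the href of each <a> tag found in an HTML string."""
    collector = AnchorCollector()
    collector.feed(html)
    collector.close()
    return collector.hrefs


# Hypothetical page with three anchors, the last one without an href
page = '<p><a href="a.html">A</a> <a href="b.html">B</a> <a name="top">Top</a></p>'
hrefs = extract_hrefs(page)
print(hrefs)                 # ['a.html', 'b.html', None]

position = 2                 # 1-based, like the question's "position" prompt
print(hrefs[position - 1])   # b.html
```

The `position - 1` conversion matches the `int(position)-1` indexing in the question's code.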
Here is my code (Python 2):

```python
import urllib
from bs4 import BeautifulSoup as bfs

# Declare global variables
href_list = []
no_iterations = 0

# Prompt user for input
url = raw_input('Enter url - ')
count = raw_input('Enter count - ')
position = raw_input('Enter position - ')

# While loop with condition
while no_iterations != int(count):
    no_iterations += 1

    # Scraping the url
    html = urllib.urlopen(url).read()
    soup = bfs(html)

    # Retrieve all of the anchor tags
    tags = soup('a')
    for tag in tags:
        href_list.append(tag.get('href', None))

    # Assigning new url
    url = href_list[int(position)-1]

    # Printing info for user
    print 'Retrieving:', href_list[int(position)-1]

print 'Last Url:', href_list[int(position)-1]
```
When I run the program, here is what I get:
```
Enter url - http://pr4e.dr-chuck.com/tsugi/mod/python-data/data/known_by_Fikret.html
Enter count - 4
Enter position - 3
Retrieving: http://pr4e.dr-chuck.com/tsugi/mod/python-data/data/known_by_Montgomery.html
Retrieving: http://pr4e.dr-chuck.com/tsugi/mod/python-data/data/known_by_Montgomery.html
Retrieving: http://pr4e.dr-chuck.com/tsugi/mod/python-data/data/known_by_Montgomery.html
Retrieving: http://pr4e.dr-chuck.com/tsugi/mod/python-data/data/known_by_Montgomery.html
Last Url: http://pr4e.dr-chuck.com/tsugi/mod/python-data/data/known_by_Montgomery.html
```
From the output, I can see that the URL is not updated as it should be. Any advice is appreciated.
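The repeated URL can be reproduced offline. Because `href_list` is never cleared, each new page's links are appended after the first page's links, so `href_list[int(position)-1]` keeps pointing at a link from the first page. The simulation below mimics the question's loop on a hypothetical set of pages (the page names and link lists are made up, not the real course data) and compares it with clearing the list on every pass.

```python
# Hypothetical link lists for three pages; these are NOT the real course pages.
PAGES = {
    "start": ["s1", "s2", "s3", "s4"],
    "s3": ["t1", "t2", "t3"],
    "t3": ["u1", "u2", "u3"],
}


def follow(url, count, position, reset_each_time):
    """Mimic the question's loop; optionally clear href_list on every pass."""
    href_list = []
    retrieved = []
    for _ in range(count):
        if reset_each_time:
            href_list = []            # start each page with an empty list
        href_list.extend(PAGES[url])  # same effect as the question's .append() loop
        url = href_list[position - 1]
        retrieved.append(url)
    return retrieved


print(follow("start", 3, 3, reset_each_time=False))  # ['s3', 's3', 's3']
print(follow("start", 3, 3, reset_each_time=True))   # ['s3', 't3', 'u3']
```

Without the reset, the second pass builds `['s1', 's2', 's3', 's4', 't1', 't2', 't3']`, and index 2 is still `'s3'`.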
Solution
I solved it by resetting the list where I stored the retrieved a tags. Code:
```python
import urllib
from bs4 import BeautifulSoup as bfs

# Declare global variables
href_list = []
no_iterations = 0

# Prompt user for input
url = raw_input('Enter url - ')
count = raw_input('Enter count - ')
position = raw_input('Enter position - ')

# While loop with condition
while no_iterations != int(count):
    no_iterations += 1

    # Scraping the url
    html = urllib.urlopen(url).read()
    soup = bfs(html, 'html.parser')

    # Retrieve all of the anchor tags
    tags = soup('a')
    for tag in tags:
        href_list.append(tag.get('href', None))

    # Assigning new url, then clearing the list so the next page's
    # links are not appended after this page's links
    url = href_list[int(position)-1]
    href_list = []

    # Printing info for user (print url, since href_list is now empty)
    print 'Retrieving:', url

print 'Last Url:', url
```
So the new output is:
```
Enter url - http://pr4e.dr-chuck.com/tsugi/mod/python-data/data/known_by_Fikret.html
Enter count - 4
Enter position - 3
Retrieving: http://pr4e.dr-chuck.com/tsugi/mod/python-data/data/known_by_Montgomery.html
Retrieving: http://pr4e.dr-chuck.com/tsugi/mod/python-data/data/known_by_Mhairade.html
Retrieving: http://pr4e.dr-chuck.com/tsugi/mod/python-data/data/known_by_Butchi.html
Retrieving: http://pr4e.dr-chuck.com/tsugi/mod/python-data/data/known_by_Anayah.html
Last Url: http://pr4e.dr-chuck.com/tsugi/mod/python-data/data/known_by_Anayah.html
```
Thanks for your support
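The code above is Python 2 (`raw_input`, the `print` statement and `urllib.urlopen` do not exist in Python 3). A possible Python 3 port, sketched below, avoids the "remember to reset the global list" step altogether by building a fresh link list inside a helper for each page. `page_links`, `follow_links` and the `get_links` parameter are illustrative names of my own; `page_links` assumes BeautifulSoup 4 is installed and the pages are reachable, while `follow_links` accepts any function that returns a page's links, which makes it testable offline.

```python
from urllib.request import urlopen


def page_links(url):
    """Return the href of every <a> tag on the page at `url`.

    Assumes BeautifulSoup 4 is installed and the URL is reachable.
    """
    from bs4 import BeautifulSoup  # imported lazily so follow_links works without bs4
    html = urlopen(url).read()
    soup = BeautifulSoup(html, "html.parser")
    return [tag.get("href") for tag in soup("a")]


def follow_links(url, count, position, get_links=page_links):
    """Follow the link at 1-based `position` `count` times; return every URL retrieved."""
    retrieved = []
    for _ in range(count):
        links = get_links(url)  # a new list per page, so nothing stale carries over
        url = links[position - 1]
        retrieved.append(url)
    return retrieved


# Example (needs network access and bs4):
# urls = follow_links(
#     "http://pr4e.dr-chuck.com/tsugi/mod/python-data/data/known_by_Fikret.html", 4, 3)
# for u in urls:
#     print("Retrieving:", u)
# print("Last Url:", urls[-1])
```

Returning the list of retrieved URLs instead of printing inside the loop keeps the traversal separate from the I/O, so the printing (or the `input()` prompts) can be added around it as needed.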
