Use BeautifulSoup to loop through and retrieve specific URLs


Problem Description

I want to use BeautifulSoup to retrieve a specific URL at a specific position, repeatedly. You may imagine that there are 4 different URL lists, each containing 100 different URL links.

I always need to get and print the 3rd URL on every list; the previous URL (e.g. the 3rd URL on the first list) leads to the 2nd list (where I again need to get and print the 3rd URL, and so on, until the 4th retrieval).

Yet my loop only achieves the first result (the 3rd URL on list 1), and I don't know how to feed the new URL back into the while loop and continue the process.

Here is my code:

import urllib.request
import json
import ssl
from bs4 import BeautifulSoup


num = int(input('enter count times: '))
position = int(input('enter position: '))

url = 'https://pr4e.dr-chuck.com/tsugi/mod/python-data/data/known_by_Fikret.html'
print(url)

count = 0
order = 0
while count < num:
    context = ssl._create_unverified_context()
    htm = urllib.request.urlopen(url, context=context).read()
    soup = BeautifulSoup(htm)
    for i in soup.find_all('a'):
        order += 1
        if order == position:
            x = i.get('href')
            print(x)
    count += 1
    url = x
print('done')

Recommended Answer

Just get the link from find_all() by index. In your code, order is never reset inside the while loop, so after the first page it has already counted past position: the if condition never matches again, x keeps its first value, and every iteration fetches the same page. Indexing the result of find_all() directly avoids the counter altogether:

while count < num:
    context = ssl._create_unverified_context()
    htm = urllib.request.urlopen(url, context=context).read()

    soup = BeautifulSoup(htm, 'html.parser')
    # position is 1-based in the question's counter, so subtract 1 for list indexing
    url = soup.find_all('a')[position - 1].get('href')
    print(url)

    count += 1
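
For reference, here is a minimal self-contained sketch of the whole retrieval loop, built from the question's own inputs and URL; it swaps the manual count/num counter for a plain for loop, which is an assumption of this sketch rather than part of the original answer:

import ssl
import urllib.request
from bs4 import BeautifulSoup

num = int(input('enter count times: '))
position = int(input('enter position: '))
url = 'https://pr4e.dr-chuck.com/tsugi/mod/python-data/data/known_by_Fikret.html'

context = ssl._create_unverified_context()
for _ in range(num):
    htm = urllib.request.urlopen(url, context=context).read()
    soup = BeautifulSoup(htm, 'html.parser')
    # Reassigning url is what feeds the newly found link into the next iteration
    url = soup.find_all('a')[position - 1].get('href')
    print(url)

Overwriting url with the link just extracted is exactly the "loop the new URL back" step the question was missing.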

