不满足条件时无法让我的脚本继续尝试几次 [英] Unable to let my script keep trying for few times when a condition is not met

查看:95
本文介绍了不满足条件时无法让我的脚本继续尝试几次的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我已经在python中创建了一个脚本,以从网页的不同链接中获取某些帖子的标题。问题是我尝试播放的网页有时无法为我提供有效的响应,但是当我尝试两次或三次时,我的确得到了有效的响应。

I've created a script in python to fetch the title of certain posts from different links of a webpage. The thing is the webpage I'm trying to play with sometimes doesn't provide me with valid response, but I do get a valid response when I try it twice or thrice.

我一直试图以一种方式创建循环,以便脚本将检查我定义的标题是否为空。如果标题为空,则脚本将继续循环4次以查看是否可以成功。但是,在对每个链接进行第四次尝试之后,脚本将转到另一个链接以重复相同的操作,直到所有链接都用尽。

I've been trying to create a loop in such a way so that the script will check whether my defined title is nothing. If the title is nothing then the script will keep looping 4 times to see If it can succeed. However, after fourth try of each link the script will go for another link to repeat the same until all the links are exhausted.

此我到目前为止的尝试:

This my my attempt so far:

import time
import requests
from bs4 import BeautifulSoup

links = [
    "https://stackoverflow.com/questions/tagged/web-scraping?sort=newest&page=2",
    "https://stackoverflow.com/questions/tagged/web-scraping?sort=newest&page=3",
    "https://stackoverflow.com/questions/tagged/web-scraping?sort=newest&page=4"
    ]
counter = 0

def fetch_data(link):
    global counter
    res = requests.get(link)
    soup = BeautifulSoup(res.text,"lxml")
    try:
        title = soup.select_one("p.tcode").text
    except AttributeError: title = ""

    if not title:
        while counter<=4:
            time.sleep(1)
            print("trying {} times".format(counter))
            counter += 1
            fetch_data(link)
    else:
        counter = 0

    print("tried with this link:",link)

if __name__ == '__main__':
    for link in links:
        fetch_data(link)

这是我目前在控制台中看到的输出:

This is the output I can see in the console at this moment:

trying 0 times
trying 1 times
trying 2 times
trying 3 times
trying 4 times
tried with this link: https://stackoverflow.com/questions/tagged/web-scraping?sort=newest&page=2
tried with this link: https://stackoverflow.com/questions/tagged/web-scraping?sort=newest&page=2
tried with this link: https://stackoverflow.com/questions/tagged/web-scraping?sort=newest&page=2
tried with this link: https://stackoverflow.com/questions/tagged/web-scraping?sort=newest&page=2
tried with this link: https://stackoverflow.com/questions/tagged/web-scraping?sort=newest&page=2
tried with this link: https://stackoverflow.com/questions/tagged/web-scraping?sort=newest&page=2
tried with this link: https://stackoverflow.com/questions/tagged/web-scraping?sort=newest&page=3
tried with this link: https://stackoverflow.com/questions/tagged/web-scraping?sort=newest&page=4

我的预期输出:

trying 0 times
trying 1 times
trying 2 times
trying 3 times
trying 4 times
tried with this link: https://stackoverflow.com/questions/tagged/web-scraping?sort=newest&page=2
trying 0 times
trying 1 times
trying 2 times
trying 3 times
trying 4 times
tried with this link: https://stackoverflow.com/questions/tagged/web-scraping?sort=newest&page=3
trying 0 times
trying 1 times
trying 2 times
trying 3 times
trying 4 times
tried with this link: https://stackoverflow.com/questions/tagged/web-scraping?sort=newest&page=4

PS我在脚本中使用了错误的选择器我可以让它满足上面定义的条件。

如何让我的脚本继续尝试每个链接几次,如果不满足条件

How can I let my script keep trying with every link for few times when a condition is not met

推荐答案

我想重新安排您的代码如下。

I think re-arrange your code as follows.

import time
import requests
from bs4 import BeautifulSoup
​
links = [
    "https://stackoverflow.com/questions/tagged/web-scraping?sort=newest&page=2",
    "https://stackoverflow.com/questions/tagged/web-scraping?sort=newest&page=3",
    "https://stackoverflow.com/questions/tagged/web-scraping?sort=newest&page=4"
    ]

def fetch_data(link):
    global counter
    res = requests.get(link)
    soup = BeautifulSoup(res.text,"lxml")
    try:
        title = soup.select_one("p.tcode").text
    except AttributeError: title = ""
​
    if not title:
        while counter<=4:
            time.sleep(1)
            print("trying {} times".format(counter))
            counter += 1
            fetch_data(link)   
​
if __name__ == '__main__':
    for link in links:
        counter = 0
        fetch_data(link)
        print("tried with this link:",link)

这篇关于不满足条件时无法让我的脚本继续尝试几次的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆