Can't differentiate the two expressions supposed to work in the same way


Problem description

A few days ago I created this post to ask how I could make my script loop so that, for each of a few links, it checks up to four times whether my defined title (which is supposed to be extracted from each link) is empty. If the title is still empty, the script should break out of the loop and move on to the next link to repeat the same process.

This is how I got it to work: by changing fetch_data(link) to return fetch_data(link), and by setting counter=0 outside the while loop but inside the if statement.

The rectified script:

import time
import requests
from bs4 import BeautifulSoup

links = [
    "https://stackoverflow.com/questions/tagged/web-scraping?sort=newest&page=2",
    "https://stackoverflow.com/questions/tagged/web-scraping?sort=newest&page=3",
    "https://stackoverflow.com/questions/tagged/web-scraping?sort=newest&page=4"
]
counter = 0

def fetch_data(link):
    global counter
    res = requests.get(link)
    soup = BeautifulSoup(res.text,"lxml")
    try:
        title = soup.select_one("p.tcode").text
    except AttributeError: title = ""

    if not title:
        while counter<=3:
            time.sleep(1)
            print("trying {} times".format(counter))
            counter += 1
            return fetch_data(link) #First fix
        counter=0 #Second fix

    print("tried with this link:",link)

if __name__ == '__main__':
    for link in links:
        fetch_data(link)

This is the output the above script produces (as desired):

trying 0 times
trying 1 times
trying 2 times
trying 3 times
tried with this link: https://stackoverflow.com/questions/tagged/web-scraping?sort=newest&page=2
trying 0 times
trying 1 times
trying 2 times
trying 3 times
tried with this link: https://stackoverflow.com/questions/tagged/web-scraping?sort=newest&page=3
trying 0 times
trying 1 times
trying 2 times
trying 3 times
tried with this link: https://stackoverflow.com/questions/tagged/web-scraping?sort=newest&page=4

I used a wrong selector in the script on purpose, so that it meets the condition I defined above.


Why should I use return fetch_data(link) instead of fetch_data(link), given that the two expressions behave identically most of the time?


Recommended answer

The while loop inside your function makes a recursive call whenever it fails to fetch the title. It works when you use return fetch_data(link) because, as long as the counter is less than or equal to 3 (while counter<=3), the function exits as soon as the recursive call returns, so it never reaches the line below the loop that resets the counter (counter=0). Since the counter is a global variable and increases by 1 at each recursion level, there are at most four recursive calls: once the counter exceeds 3, the function no longer enters the while loop that would call fetch_data(link) again.

fetch_data (counter=0)
  --> fetch_data (counter=1)
    --> fetch_data (counter=2)
      --> fetch_data (counter=3)
        --> fetch_data (counter=4) 
        - not go into while loop, reset counter, print url
        - return to above function
      - return to above function
    - return to above function
  - return to above function
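
The trace above can be reproduced without any network access. The following is only a minimal sketch: fetch_stub and events are hypothetical names, and the title is hard-coded to an empty string on the assumption that the selector never matches.

```python
counter = 0  # shared by every recursive call, as in the script above


def fetch_stub(log):
    """Network-free stand-in for fetch_data(link); records each step in log."""
    global counter
    title = ""  # assumption: the selector never matches, so title stays empty
    if not title:
        while counter <= 3:
            log.append("trying {} times".format(counter))
            counter += 1
            # Leave this call as soon as the inner call has finished.
            return fetch_stub(log)
        # Reached only by the innermost call, where counter == 4.
        counter = 0
    log.append("tried with this link")


events = []
fetch_stub(events)
print(events)
print(counter)
```

Because every outer call returns straight after its inner call, the final message is recorded once and the counter is back at 0 for the next link.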

If you use fetch_data(link) without return, the function still makes a recursive call inside the while loop, but it does not exit afterwards. The innermost call, which sees the counter at 4, skips the loop and resets the counter to 0. This is dangerous: when control returns to the while loop of the previous call, the loop does not end, because the counter is now 0, which is <= 3, so it goes on to make further recursive calls. This eventually reaches the maximum recursion depth and crashes the program.

fetch_data (counter=0)
  --> fetch_data (counter=1)
    --> fetch_data (counter=2)
      --> fetch_data (counter=3)
        --> fetch_data (counter=4) 
        - not go into while loop, !!!reset counter!!!, print url
        - return to above function
      - does not return to the function above
      - since counter = 0, the while loop continues
        --> fetch_data (counter=1)
          --> fetch_data (counter=2)
            --> fetch_data (counter=3)
...
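
The runaway behaviour can also be observed safely with a network-free sketch. Everything here is an assumption made for illustration: fetch_stub_no_return, CallBudgetExceeded and max_events are hypothetical names, the title is hard-coded to an empty string, and the event budget exists only so the demo stops long before Python's recursion limit.

```python
class CallBudgetExceeded(Exception):
    """Raised by this demo to stop the runaway recursion early."""


counter = 0  # shared by every recursive call


def fetch_stub_no_return(log, max_events=20):
    """Stand-in for fetch_data(link) that uses a bare recursive call."""
    global counter
    if len(log) >= max_events:
        raise CallBudgetExceeded
    title = ""  # assumption: the selector never matches
    if not title:
        while counter <= 3:
            log.append(counter)
            counter += 1
            # Bare call: when it comes back, this loop tests counter again.
            fetch_stub_no_return(log, max_events)
        counter = 0  # the reset that lets the caller's loop continue
    log.append("done")


events = []
try:
    fetch_stub_no_return(events)
    stopped_early = False
except CallBudgetExceeded:
    stopped_early = True

print(events[:10])
print(stopped_early)
```

The log repeats the 0, 1, 2, 3, "done" cycle until the budget stops it, and each cycle adds more nested calls, which is the unbounded growth the trace describes.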
