Can someone explain in detail how this code works (Using Python to Access Web Data)


Problem description

Use urllib to read the HTML from the data file below, extract the href= values from the anchor tags, scan for a tag that is in a particular position relative to the first name in the list, follow that link, and repeat the process a number of times, then report the last name you find.

This is the HTML link for the data: http://py4e-data.dr-chuck.net/known_by_Caragh.html

So I have to find the link at position 18 (the first name is 1), follow that link, and repeat this process 7 times. The answer is the last name that I retrieve.

  1. Can someone explain to me, line by line and in detail, how these two loops (the while and the for) work?
  2. So when I enter position 18, does it pull the 18th href tag, and then the next 18th, 7 times? I ask because even when I enter a different number, I still get the same answer. Thank you very much in advance.
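For what it's worth, the inner for loop simply counts anchor tags until the count matches the target position. That idea can be sketched without the network or BeautifulSoup, using only the standard library's html.parser; the sample HTML below is invented for illustration:

```python
from html.parser import HTMLParser

# A minimal stand-in for what soup('a') collects: every <a> tag's href.
class AnchorCollector(HTMLParser):
    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # attrs is a list of (name, value) pairs
            self.hrefs.append(dict(attrs).get("href"))

# Made-up sample page with three links
sample = ('<a href="one.html">One</a>'
          '<a href="two.html">Two</a>'
          '<a href="three.html">Three</a>')

parser = AnchorCollector()
parser.feed(sample)

position = 2                       # 1-based, like the assignment
print(parser.hrefs[position - 1])  # prints two.html
```

The original code does the same thing with a running counter: it increments count for every anchor tag and stops (break) when count equals the target position.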

Code:

import urllib.request, urllib.parse, urllib.error
from bs4 import BeautifulSoup
import ssl
n = 0
count = 0
url = input("Enter URL:")
numbers  = input("Enter count:")
position = input("Enter position:")

while n < 7:
    html = urllib.request.urlopen(url).read()
    soup = BeautifulSoup(html, 'html.parser')
    tags = soup('a')
    for tag in tags:
      count = count + 1
      if count == 18:
         url  = tag.get('href', None)
         print("Retrieving:" , url)
         count = 0
         break
n = n + 1

Answer

Because even when I enter a different number, I still get the same answer.

You're getting the same answer because you've hard coded those values in with:

while n < 7

if count == 18

I think you meant to have those as your variables/inputs. With that, you'll also need to convert those inputs to int, because currently they are stored as str. Also note that I didn't want to type in the URL each time, so I hard coded it, but you can uncomment your input there and comment out the url = 'http://py4e-data.dr-chuck.net/known_by_Caragh.html' line.
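To see why the int() conversion matters: input() always returns a string, and in Python a string never compares equal to an integer, so count == position would silently never be true. A quick demonstration (the "18" literal stands in for what input("Enter position:") would return):

```python
position_raw = "18"          # what input("Enter position:") returns: a str
print(position_raw == 18)    # prints False: a str never equals an int

position = int(position_raw) # convert before any numeric comparison
print(position == 18)        # prints True
```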

import urllib.request, urllib.parse, urllib.error
from bs4 import BeautifulSoup
import ssl

n = 0
count = 0

url = 'http://py4e-data.dr-chuck.net/known_by_Caragh.html'
#url = input("Enter URL:")

numbers  = int(input("Enter count:"))
position = int(input("Enter position:"))

while n < numbers:    #<----- there's your variable of how many times to try
    html = urllib.request.urlopen(url).read()
    soup = BeautifulSoup(html, 'html.parser')
    tags = soup('a')
    for tag in tags:
      count = count + 1
      if count == position:  #<------- and the variable to get the position
         url  = tag.get('href', None)
         print("Retrieving:" , url)
         count = 0
         break
    n = n + 1    #<---- I fixed your indentation. The way it was previously would never get yourself out of the while loop because n will never increment.
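To see the two loops' division of labor at a glance, here is a network-free sketch that replaces urlopen with a made-up dict mapping each page to its list of links (all page names here are hypothetical): the outer loop counts hops, and the 1-based position picks one link per hop, just as the inner for loop's counter does.

```python
# Fake "web": each page maps to the hrefs found on it (invented data).
pages = {
    "start":  ["a.html", "b.html", "c.html"],
    "b.html": ["d.html", "e.html", "f.html"],
    "e.html": ["x.html", "y.html", "z.html"],
}

url = "start"
numbers = 2    # how many hops (the outer while loop)
position = 2   # 1-based position (the inner loop's count == position)

for _ in range(numbers):
    links = pages[url]          # stands in for soup('a')
    url = links[position - 1]   # stands in for tag.get('href', None)
    print("Retrieving:", url)

print("Last page:", url)       # prints Last page: e.html
```

With numbers = 2 and position = 2, the walk goes start -> b.html -> e.html; changing either input changes the path, which is exactly what the hard-coded 7 and 18 prevented in the original code.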

