Can someone explain to me in detail how this code works? (Using Python to Access Web Data)
Question
Use urllib to read the HTML from the data files below, extract the href= values from the anchor tags, scan for a tag that is in a particular position relative to the first name in the list, follow that link, repeat the process a number of times, and report the last name you find.
This is the HTML link for the data: http://py4e-data.dr-chuck.net/known_by_Caragh.html
So I have to find the link at position 18 (the first name is 1), follow that link, and repeat this process 7 times. The answer is the last name that you retrieve.
- Can someone explain to me, line by line and in detail, how these two loops ("while" and "for") work?
- So when I enter position 18, does it extract the 18th href tag, and then the next 18th, 7 times over? Because even if I enter a different number, I still get the same answer. Thank you very much in advance.
Code:
import urllib.request, urllib.parse, urllib.error
from bs4 import BeautifulSoup
import ssl

n = 0
count = 0
url = input("Enter URL:")
numbers = input("Enter count:")
position = input("Enter position:")
while n < 7:
    html = urllib.request.urlopen(url).read()
    soup = BeautifulSoup(html, 'html.parser')
    tags = soup('a')
    for tag in tags:
        count = count + 1
        if count == 18:
            url = tag.get('href', None)
            print("Retrieving:", url)
            count = 0
            break
            n = n + 1
Answer
"Because even if I enter a different number, I'm still getting the same answer."
You're getting the same answer because you've hard-coded it with:
while n < 7
and
if count == 18
I think you meant to use those as your variables/inputs. With that, you'll also need to convert those inputs to int, because currently they are stored as str. Also note: I didn't want to type in the URL each time, so I hard-coded it, but you can uncomment your input there and comment out the url = 'http://py4e-data.dr-chuck.net/known_by_Caragh.html' line.
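A quick way to see why the int conversion matters (the value "18" below stands in for what input() would return):

```python
position = "18"             # input() always returns a str, e.g. "18"
print(position == 18)       # False: a str never compares equal to an int in Python
print(int(position) == 18)  # True once the input is converted
```

Without the conversion, `count == position` is never True, so the link is never followed.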
import urllib.request, urllib.parse, urllib.error
from bs4 import BeautifulSoup
import ssl

n = 0
count = 0
url = 'http://py4e-data.dr-chuck.net/known_by_Caragh.html'
#url = input("Enter URL:")
numbers = int(input("Enter count:"))
position = int(input("Enter position:"))
while n < numbers:  # <----- there's your variable for how many times to try
    html = urllib.request.urlopen(url).read()
    soup = BeautifulSoup(html, 'html.parser')
    tags = soup('a')
    for tag in tags:
        count = count + 1
        if count == position:  # <------- and the variable for the position
            url = tag.get('href', None)
            print("Retrieving:", url)
            count = 0
            break
    n = n + 1  # <---- I fixed your indentation. The way it was previously, you would never get out of the while loop because n would never increment.
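The link-following logic itself can be illustrated without any network access. A minimal sketch, where the page contents are made up for illustration (in the real assignment, each list of links would come from parsing the anchor tags of the fetched page with BeautifulSoup):

```python
# Simulated pages: each "name" maps to the list of hrefs found on its page.
pages = {
    "start": ["Anne", "Bert", "Cara"],
    "Cara":  ["Dina", "Edna", "Finn"],
    "Finn":  ["Gina", "Hans", "Iris"],
}

def follow(start, position, repeats):
    """Follow the link at `position` (1-based) `repeats` times; return the last name."""
    current = start
    for _ in range(repeats):
        links = pages[current]          # in the real code: soup('a') on the fetched page
        current = links[position - 1]   # position 1 is the first link
    return current

print(follow("start", 3, 3))  # start -> Cara -> Finn -> Iris
```

Note that because each page's links are a list, indexing `tags[position - 1]` directly is an alternative to the manual count-and-break loop in the answer above; both select the same tag.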