Can someone explain in detail how this code works (Using Python to Access Web Data)


Problem description

Use urllib to read the HTML from the data file below, extract the href= values from the anchor tags, scan for a tag that is in a particular position relative to the first name in the list, follow that link, and repeat the process a number of times, then report the last name you find.

This is the HTML link for the data: http://py4e-data.dr-chuck.net/known_by_Caragh.html

So I have to find the link at position 18 (the first name is 1), follow that link, and repeat this process 7 times. The answer is the last name that I retrieve.

  1. Can someone explain to me, line by line and in detail, how these two loops (the while and the for) work?
  2. So when I enter position 18, does it pull the 18th href tag, and then the next 18th, 7 times? I ask because even when I enter a different number, I still get the same answer. Thank you very much in advance.
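For what it's worth, the inner for loop simply counts anchor tags until the count matches the target position. That idea can be sketched without the network or BeautifulSoup, using only the standard library's html.parser; the sample HTML below is invented for illustration:

```python
from html.parser import HTMLParser

# A minimal stand-in for what soup('a') collects: every <a> tag's href.
class AnchorCollector(HTMLParser):
    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # attrs is a list of (name, value) pairs
            self.hrefs.append(dict(attrs).get("href"))

# Made-up sample page with three links
sample = ('<a href="one.html">One</a>'
          '<a href="two.html">Two</a>'
          '<a href="three.html">Three</a>')

parser = AnchorCollector()
parser.feed(sample)

position = 2                       # 1-based, like the assignment
print(parser.hrefs[position - 1])  # prints two.html
```

The original code does the same thing with a running counter: it increments count for every anchor tag and stops (break) when count equals the target position.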

Code:

import urllib.request, urllib.parse, urllib.error
from bs4 import BeautifulSoup
import ssl
n = 0
count = 0
url = input("Enter URL:")
numbers  = input("Enter count:")
position = input("Enter position:")

while n < 7:
    html = urllib.request.urlopen(url).read()
    soup = BeautifulSoup(html, 'html.parser')
    tags = soup('a')
    for tag in tags:
      count = count + 1
      if count == 18:
         url  = tag.get('href', None)
         print("Retrieving:" , url)
         count = 0
         break
n = n + 1

Answer

Because even when I enter a different number, I still get the same answer.

You're getting the same answer because you've hard coded those values in with:

while n < 7

if count == 18

I think you meant to have those as your variables/inputs. With that, you'll also need to convert those inputs to int, because currently they are stored as str. Also note that I didn't want to type in the URL each time, so I hard coded it, but you can uncomment your input there and comment out the url = 'http://py4e-data.dr-chuck.net/known_by_Caragh.html' line.
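To see why the int() conversion matters: input() always returns a string, and in Python a string never compares equal to an integer, so count == position would silently never be true. A quick demonstration (the "18" literal stands in for what input("Enter position:") would return):

```python
position_raw = "18"          # what input("Enter position:") returns: a str
print(position_raw == 18)    # prints False: a str never equals an int

position = int(position_raw) # convert before any numeric comparison
print(position == 18)        # prints True
```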

import urllib.request, urllib.parse, urllib.error
from bs4 import BeautifulSoup
import ssl

n = 0
count = 0

url = 'http://py4e-data.dr-chuck.net/known_by_Caragh.html'
#url = input("Enter URL:")

numbers  = int(input("Enter count:"))
position = int(input("Enter position:"))

while n < numbers:    #<----- there's your variable of how many times to try
    html = urllib.request.urlopen(url).read()
    soup = BeautifulSoup(html, 'html.parser')
    tags = soup('a')
    for tag in tags:
      count = count + 1
      if count == position:  #<------- and the variable to get the position
         url  = tag.get('href', None)
         print("Retrieving:" , url)
         count = 0
         break
    n = n + 1    #<---- I fixed your indentation. The way it was previously would never get yourself out of the while loop because n will never increment.
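To see the two loops' division of labor at a glance, here is a network-free sketch that replaces urlopen with a made-up dict mapping each page to its list of links (all page names here are hypothetical): the outer loop counts hops, and the 1-based position picks one link per hop, just as the inner for loop's counter does.

```python
# Fake "web": each page maps to the hrefs found on it (invented data).
pages = {
    "start":  ["a.html", "b.html", "c.html"],
    "b.html": ["d.html", "e.html", "f.html"],
    "e.html": ["x.html", "y.html", "z.html"],
}

url = "start"
numbers = 2    # how many hops (the outer while loop)
position = 2   # 1-based position (the inner loop's count == position)

for _ in range(numbers):
    links = pages[url]          # stands in for soup('a')
    url = links[position - 1]   # stands in for tag.get('href', None)
    print("Retrieving:", url)

print("Last page:", url)       # prints Last page: e.html
```

With numbers = 2 and position = 2, the walk goes start -> b.html -> e.html; changing either input changes the path, which is exactly what the hard-coded 7 and 18 prevented in the original code.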

