使用Beautifulsoup跟随python分配中的链接 [英] Following links in python assignment using Beautifulsoup

查看:82
本文介绍了使用Beautifulsoup跟随python分配中的链接的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我为python类分配了此作业,我必须从特定位置的特定链接开始,然后再对该链接进行特定的次数.假设第一个链接的位置为1. 这是链接: http://python-data.dr-chuck.net/known_by_Fikret. html

I have this assignment for a python class where I have to start from a specific link at a specific position, then follow that link for a specific number of times. Supposedly the first link has the position 1. This is the link: http://python-data.dr-chuck.net/known_by_Fikret.html

回溯错误图片 我在定位链接时遇到了麻烦,出现了索引超出范围"错误.谁能帮忙弄清楚如何找到链接/位置?这是我的代码:

traceback error picture I have trouble with locating the link, the error "index out of range" comes out. can anyone help with figuring out how to locate the link/position? This is my code:

import urllib
from BeautifulSoup import *

url = raw_input('Enter - ')
html = urllib.urlopen(url).read()
soup = BeautifulSoup(html)
count = int(raw_input('Enter count: '))+1
position = int(raw_input('Enter position: '))


tags = soup('a')
tags_lst = list()
for tag in tags:
    needed_tag = tag.get('href', None)
    tags_lst.append(needed_tag)
    for i in range(0,count):
        print 'retrieving: ',tags_lst[position]

好的,我编写了这段代码,并且可以正常工作:

OK I wrote this code and it kind of works:

import urllib
from BeautifulSoup import *

url = raw_input('Enter - ')
html = urllib.urlopen(url).read()
soup = BeautifulSoup(html)
count = int(raw_input('Enter count: '))+1
position = int(raw_input('Enter position: '))


tags = soup('a')
tags_lst = list()
for tag in tags:
    needed_tag = tag.get('href', None)
    tags_lst.append(needed_tag)
for i in range(0,count):    
    print 'retrieving: ',tags_lst[position]
    position = position + 1

我仍然得到除示例中的链接以外的其他链接,但是当我打印整个链接列表时,位置匹配,所以我不知道.很奇怪.

I'm still getting other links than the ones in the example however when I print the whole list of links the positions match so I don't know. Very weird.

推荐答案

尝试一下. 您可以保留输入URL的权限.有您以前链接的示例. 祝你好运!

Try this. You can leave entering the URL. There is sample of your former link. Good Luck!

import urllib.request
from bs4 import BeautifulSoup
import ssl

# Ignore SSL certificate errors
ctx = ssl.create_default_context()
ctx.check_hostname = False
ctx.verify_mode = ssl.CERT_NONE

url = input('Enter url ')
cn = input('Enter count: ')
cnint = int(cn)
pos = input('Enter position: ')
posint = int(pos)
html = urllib.request.urlopen('http://py4e-data.dr-chuck.net/known_by_Fikret.html''''url''', context=ctx).read()
soup = BeautifulSoup(html, 'html.parser')

tags_lst = list()
for x in range(0,cnint):
    tags = soup('a')
    my_tags = tags[posint-1]
    needed_tag = my_tags.get('href', None)
    url = str(needed_tag)
    html = urllib.request.urlopen(url,context=ctx).read()
    soup = BeautifulSoup(html, 'html.parser')
    print(my_tags.get('href', None))

这篇关于使用Beautifulsoup跟随python分配中的链接的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆