Following certain links with Beautiful Soup


Problem description

I've been having a lot of trouble with this problem. I think I understand how it should work, but my head now has a dent in it from banging it on the desk.

What I need to do is write a program that scrapes a webpage with Beautiful Soup, picks out a certain link (anywhere from the 3rd to the 20th link down the page), goes to that link, and then finds the link at the same position on the new page, over and over, for an unspecified number of times (I'm keeping it under 20 for explanation purposes). I need to find the last link reached after however many repetitions.

I've got my program, but I can't get past the 2nd iteration! A couple of hours in I did find a way and got my answer, but it was an infinite loop, and that's not going to help me learn.

Let's say this is what I have to do:

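Find the link at position 7 (the 7th link on the first page). Follow that link. Repeat this process 5 times. The answer is the last name that you retrieve.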
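
As a small illustration of what "the link at position 7" means in code: positions are counted from 1, while Python lists are indexed from 0, so position 7 is index 6. This is only a sketch. The function name `link_at_position` and the sample HTML are made up for the example and are not part of the assignment.

```python
from bs4 import BeautifulSoup

def link_at_position(html, position):
    """Return (text, href) of the anchor at a 1-based position in the page."""
    soup = BeautifulSoup(html, 'html.parser')
    tags = soup('a')               # shorthand for soup.find_all('a')
    tag = tags[position - 1]       # position 1 is list index 0
    return tag.get_text(), tag.get('href')

# Made-up page with three links, for illustration only.
sample = (
    '<ul>'
    '<li><a href="known_by_Ann.html">Ann</a></li>'
    '<li><a href="known_by_Bob.html">Bob</a></li>'
    '<li><a href="known_by_Cy.html">Cy</a></li>'
    '</ul>'
)
print(link_at_position(sample, 3))  # ('Cy', 'known_by_Cy.html')
```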

I've got a way to retrieve the name; I'm just having trouble figuring out the loop!

I was also a little overzealous and spent an hour trying to find another post about this. There are many similar ones, but none that I found with this exact problem. Thanks for your time. Here is what I have so far.

from urllib.request import urlopen
from bs4 import BeautifulSoup

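# first page URL (placeholder; urlopen needs a full URL including the scheme)
url = 'insertwebsitehere.com'
html = urlopen(url).read()
soup = BeautifulSoup(html, 'html.parser')  # name the parser explicitly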

# Retrieve all of the anchor tags
tags = soup('a')

taglist= []
count = 0

for tag in tags:
    name = tag.contents[0]
    newtag = tag.get('href',None)
    #print (newtag)
    # add count? count += 1 , then do something when it reaches a certain count?
    #taglist.append(newtag), this method didnt really work.

I am a new coder, so I'm trying to do this without advanced techniques, and I don't necessarily need the answer, just help.

Recommended answer

I'm working on this same assignment in Python for Informatics on Coursera.

For the loop that repeats a certain number of times, I use:

for _ in range(c):
    ...  # fetch the current page and follow the link at the chosen position

Here c is the count read from the user with input(), so the user can choose how many times the loop repeats; in our case it is 4 times. Note that input() returns a string, so it has to be converted with int() before it is passed to range().
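
Putting the pieces together, one possible shape for the whole loop is sketched below. This is not the course's reference solution. The names `follow_links`, `fetch` and `get_page` are made up for this example, and it assumes every page has at least `position` links. The step that gets the loop past the 2nd iteration is reassigning `url` inside the loop, so that each pass downloads the page reached by the previous one.

```python
from urllib.parse import urljoin
from urllib.request import urlopen
from bs4 import BeautifulSoup

def fetch(url):
    """Download a page and return its HTML as bytes."""
    return urlopen(url).read()

def follow_links(start_url, position, repeats, get_page=fetch):
    """Follow the link at a 1-based position, `repeats` times in a row.

    Returns the text and the URL of the last link that was followed.
    """
    url = start_url
    name = None
    for _ in range(repeats):
        soup = BeautifulSoup(get_page(url), 'html.parser')
        tag = soup('a')[position - 1]        # position 1 is list index 0
        name = tag.get_text()
        # Resolve relative hrefs against the current page, then move on:
        # the next pass of the loop starts from this new URL.
        url = urljoin(url, tag.get('href'))
    return name, url
```

With the numbers from the question, the call would look like `follow_links(url, 7, 5)`, with `url` holding the full address of the first page. Because `range(repeats)` runs a fixed number of times, the loop cannot become infinite.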
