Repetitive process to follow links in a website (BeautifulSoup)

Views: 161 | Posted: 2016/8/5 19:11:22 | Tags: python loops beautifulsoup


Problem description

I'm writing Python code that uses Beautiful Soup to get all the 'a' tags on a page, takes the link at position 3, and follows it. I need to repeat this process about 18 times. The code below repeats the process twice, but I can't figure out how to repeat it 18 times in a loop. Any help would be appreciated.

import re
import urllib

from BeautifulSoup import *

# First pass: collect every href on the start page.
htm1 = urllib.urlopen('https://pr4e.dr-chuck.com/tsugi/mod/python-data/data/known_by_Fikret.html').read()
soup = BeautifulSoup(htm1)
tags = soup('a')
list1 = list()
for tag in tags:
    x = tag.get('href', None)
    list1.append(x)

M = list1[2]  # the link at position 3

# Second pass: the same steps on the page that link points to.
htm2 = urllib.urlopen(M).read()
soup = BeautifulSoup(htm2)
tags1 = soup('a')
list2 = list()
for tag1 in tags1:
    x2 = tag1.get('href', None)
    list2.append(x2)

y = list2[2]
print y

OK, I just wrote this code. It runs, but I get the same link 4 times in the results. It looks like something is wrong in the loop (note: I'm running the loop 4 times).

import re
import urllib

from BeautifulSoup import *

list1 = list()
url = 'https://pr4e.dr-chuck.com/tsugi/mod/python-data/data/known_by_Fikret.html'

for i in range(4):  # repeat 4 times
    htm2 = urllib.urlopen(url).read()
    soup1 = BeautifulSoup(htm2)
    tags1 = soup1('a')
    for tag1 in tags1:
        x2 = tag1.get('href', None)
        list1.append(x2)
    y = list1[2]
    if len(x2) < 3:  # no 3rd link
        break  # exit the loop
    else:
        url = y
    print y

Solution

"I can't come about a way to repeat the same process 18 times in a loop."

To repeat something 18 times in Python, you can use a for _ in range(18) loop:

#!/usr/bin/env python2
from urllib2 import urlopen
from urlparse import urljoin
from bs4 import BeautifulSoup # $ pip install beautifulsoup4

url = 'http://example.com'
for _ in range(18):  # repeat 18 times
    soup = BeautifulSoup(urlopen(url))
    a = soup.find_all('a', href=True)  # all <a href> links
    if len(a) < 3:  # no 3rd link
        break  # exit the loop
    url = urljoin(url, a[2]['href'])  # 3rd link, note: ignore <base href>
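
The question's second attempt appears to print the same link each time for two reasons: list1 is created once, outside the loop, so list1[2] always remains the third link of the first page, and len(x2) measures the length of the last href string rather than the number of links. The answer's loop avoids both by building a fresh list of links on every pass. For readers on Python 3, the same loop can be sketched as below. This is only a sketch under stated assumptions: it swaps BeautifulSoup for the standard library's html.parser so that it runs without extra packages (soup.find_all('a', href=True) from the answer would fill the same role), and LinkCollector, fetch, follow_links and the get_html parameter are hypothetical names introduced here, not part of the original answer.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class LinkCollector(HTMLParser):
    """Collect the href of every <a href=...> tag, in document order."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            href = dict(attrs).get('href')
            if href:  # skip anchors without an href value
                self.hrefs.append(href)


def fetch(url):
    """Download a page and return its HTML as text (hypothetical helper)."""
    with urlopen(url) as response:
        return response.read().decode('utf-8', errors='replace')


def follow_links(start_url, repeats=18, position=2, get_html=fetch):
    """Follow the link at index `position` on each page, up to `repeats` times.

    Returns the URLs reached after the start URL, in order.
    """
    url = start_url
    visited = []
    for _ in range(repeats):
        collector = LinkCollector()  # fresh link list for every page
        collector.feed(get_html(url))
        if len(collector.hrefs) <= position:  # no link at that position
            break
        # Resolve relative hrefs against the current page; <base href> is ignored.
        url = urljoin(url, collector.hrefs[position])
        visited.append(url)
    return visited
```

Calling follow_links('https://pr4e.dr-chuck.com/tsugi/mod/python-data/data/known_by_Fikret.html') would follow the third link 18 times, and the last element of the returned list is the final URL. Passing a different get_html function makes it possible to try the loop against canned HTML without network access.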