Use Python to crawl a website

Views: 117 Published: 2018/7/11 17:15:39 Tags: python nested while-loop hyperlink web-crawler


Problem description

I am looking for a dynamic way to crawl a website and grab the links from each page. I decided to experiment with BeautifulSoup. I have two questions. First, how can I do this more dynamically than with nested while statements that search for links? I want to get all the links from this site, but I don't want to keep adding nested while loops.

    # Level 1: links found on the base page
    topLevelLinks = self.getAllUniqueLinks(baseUrl)
    listOfLinks = list(topLevelLinks)

    # length is fixed here, so links appended below are not crawled by the outer loop
    length = len(listOfLinks)
    count = 0

    while count < length:
        # Level 2: links found on each top-level page
        twoLevelLinks = self.getAllUniqueLinks(listOfLinks[count])
        twoListOfLinks = list(twoLevelLinks)
        twoCount = 0
        twoLength = len(twoListOfLinks)

        for twoLinks in twoListOfLinks:
            listOfLinks.append(twoLinks)

        count = count + 1

        while twoCount < twoLength:
            # Level 3: links found on each second-level page
            threeLevelLinks = self.getAllUniqueLinks(twoListOfLinks[twoCount])
            threeListOfLinks = list(threeLevelLinks)

            for threeLinks in threeListOfLinks:
                listOfLinks.append(threeLinks)

            twoCount = twoCount + 1

    print('--------------------------------------------------------------------------------------')
    # remove all duplicates
    finalList = list(set(listOfLinks))
    print(finalList)

My second question: is there any way to tell whether I got all the links from the site? Please forgive me, I am somewhat new to Python (a year or so), and I know some of my processes and logic might be childish, but I have to learn somehow. Mainly, I just want to do this more dynamically than with nested while loops. Thanks in advance for any insight.
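One common way to avoid adding a new loop per level is to keep a single queue of pages still to fetch and a set of links already seen, and to loop until the queue is empty. The following is only a minimal sketch of that idea in Python 3, not a definitive implementation. The names `crawl`, `get_links`, and `max_pages` are hypothetical; `get_links` stands in for the `getAllUniqueLinks` method above, and the `example.com` link table is made up so the sketch runs without network access.

```python
from collections import deque
from urllib.parse import urljoin, urlparse


def crawl(start_url, get_links, max_pages=1000):
    """Breadth-first crawl of the pages reachable from start_url.

    get_links is a hypothetical callable: it takes a URL and returns an
    iterable of the links found on that page (a stand-in for
    getAllUniqueLinks). max_pages is an assumed safety limit on fetches.
    Returns the set of same-site URLs discovered.
    """
    site = urlparse(start_url).netloc
    seen = {start_url}          # every URL ever queued, so none is fetched twice
    queue = deque([start_url])  # pages still waiting to be fetched
    fetched = 0

    while queue and fetched < max_pages:
        page = queue.popleft()
        fetched += 1
        for href in get_links(page):
            link = urljoin(page, href)         # resolve relative links
            if urlparse(link).netloc != site:  # skip links to other sites
                continue
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return seen


# Made-up link table used in place of real HTTP requests.
fake_site = {
    "http://example.com/": ["/a", "/b", "http://other.org/x"],
    "http://example.com/a": ["/b", "/c"],
    "http://example.com/b": ["/"],
    "http://example.com/c": ["/d"],
    "http://example.com/d": [],
}

links = crawl("http://example.com/", lambda url: fake_site.get(url, []))
print(sorted(links))
```

On the second question: if the loop ends because the queue is empty rather than because `max_pages` was reached, every same-site page reachable by following links from the start page has been visited. Pages that nothing links to cannot be found this way, and URLs that differ only by a fragment or a trailing slash would be counted separately unless they are normalized first.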
