Getting all href from a code


Problem description

I'm making a web-crawler. To find the links on a page, I was using XPath in Selenium:

from selenium import webdriver

driver = webdriver.Firefox()
driver.get(side)  # 'side' holds the URL of the page to crawl
Listlinker = driver.find_elements_by_xpath("//a")

This worked fine. Testing the crawler, however, I found that not all links come under the a tag; href is sometimes used in area or div tags as well.

I am now stuck with:

driver = webdriver.Firefox()
driver.get(side)
# One query per tag type that might carry an href
Listlinkera = driver.find_elements_by_xpath("//a")
Listlinkerdiv = driver.find_elements_by_xpath("//div")
Listlinkerarea = driver.find_elements_by_xpath("//area")

which really puts the crawl in web-crawler.

I've tried the xpath "//@href", but that doesn't work. I've also tried several ways to get all href URLs in an efficient manner, using both Beautiful Soup and lxml, but so far to no avail. I'm sorry I don't have any code to show for my efforts with Beautiful Soup and lxml; as these proved useless, I deleted them, which isn't the smartest practice, I know. I have now started saving these unsuccessful attempts for my own sake, in case I ever want to try again and want to know what went wrong the first time.
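For what it's worth, a minimal sketch of how such an attempt might look with Beautiful Soup, assuming the rendered HTML is taken from driver.page_source (the deleted code is not shown in the question, so this is only an illustration):

from bs4 import BeautifulSoup

# Parse the rendered page source from Selenium (assumes the page has
# already been loaded with driver.get(side) as in the snippets above).
soup = BeautifulSoup(driver.page_source, "html.parser")

# Collect the href attribute of every element that has one,
# regardless of tag name (a, area, div, ...).
hrefs = [tag["href"] for tag in soup.find_all(href=True)]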

Any help I could get on this would be greatly appreciated.

Recommended answer

Try this:

ListlinkerHref = driver.find_elements_by_xpath("//*[@href]")
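This matches every element that carries an href attribute, whatever its tag name. To then get the URL strings themselves, a typical follow-up (not part of the original answer, just a usage sketch) is:

# Extract the href value from every matched element
urls = [element.get_attribute("href") for element in ListlinkerHref]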
