How to scrape address from websites using Scrapy?
Question
I am using Scrapy and I need to scrape the address from the "Contact Us" page of a given domain. The domains are provided as a result of the Google Search API, so I do not know what the exact structure of the web page is going to be. Is this kind of scraping possible? Any examples would be nice.
Answer
Providing a few examples would help to make a better answer, but the general idea could be to:
- find the "Contact Us" link
- follow the link and extract the address

assuming you don't have any information about the web-sites you'll be given.

Let's focus on the first problem.
The main problem here is that web-sites are structured differently and, strictly speaking, you cannot build a 100% reliable way to find the "Contact Us" page. But you can "cover" the most common cases:
- follow the `a` tag with the text "Contact Us", "Contact", "About Us", "About" etc
- check `/about`, `/contact_us` and similar endpoints, examples:
  - http://www.sample.com/contact.php
  - http://www.sample.com/contact

From these you can build a set of `Rules` for your `CrawlSpider`.
The second problem is no easier - you don't know where on the page an address is located (and maybe it doesn't exist on the page), and you don't know the address format. You may need to dive into Natural Language Processing and Machine Learning.
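As a very rough illustration of why this is hard, a regex heuristic can catch simple US-style street addresses, but little else - anything more general needs NLP/NER. The pattern and function name below are my own, not part of Scrapy:

```python
import re

# Rough heuristic for US-style street addresses: a house number, one to
# three capitalized words, and a common street-type suffix. This is an
# assumption for illustration; real pages need far more robust extraction.
ADDRESS_RE = re.compile(
    r"\b\d{1,5}\s+(?:[A-Z][a-z]+\s){1,3}"
    r"(?:Street|St\.?|Avenue|Ave\.?|Road|Rd\.?|Boulevard|Blvd\.?"
    r"|Lane|Ln\.?|Drive|Dr\.?)\b"
)


def find_addresses(text):
    """Return candidate address strings found in free-form page text."""
    return ADDRESS_RE.findall(text)


sample = ("Visit us at 221 Baker Street, London, "
          "or write to 1600 Pennsylvania Avenue.")
print(find_addresses(sample))
# → ['221 Baker Street', '1600 Pennsylvania Avenue']
```

A heuristic like this immediately fails on international formats, PO boxes, or addresses split across elements - which is exactly why the answer points toward NLP and Machine Learning.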