Crawl website from list of values using scrapy

Views: 122 Published: 2020/7/6 6:51:17 Tags: python scrapy scrapy-spider scrapy-pipeline

Question

I have a list of NPIs, and I want to scrape the corresponding provider names from npidb.org. The NPI values are stored in a CSV file.

I can do this manually by pasting the URLs into the code. However, I cannot figure out how to do it when I have a list of NPIs and want the provider name for each one.

Here is my current code:

import scrapy


class MySpider(scrapy.Spider):  # scrapy.Spider replaces the deprecated BaseSpider
    name = "npidb"

    def start_requests(self):
        # Lookup URLs are hard-coded for now, one per NPI.
        urls = [
            'https://npidb.org/npi-lookup/?npi=1366425381',
            'https://npidb.org/npi-lookup/?npi=1902873227',
        ]
        for url in urls:
            yield scrapy.Request(url=url, callback=self.parse)

    def parse(self, response):
        # The last URL segment is the query string, e.g. "?npi=1366425381".
        page = response.url.split("/")[-1]
        filename = 'npidb-%s.html' % page
        # Save the raw page body to disk.
        with open(filename, 'wb') as f:
            f.write(response.body)
        self.log('Saved file %s' % filename)

Recommended answer

Well, it depends on the structure of your CSV file, but if it contains the NPIs on separate lines, you could do something like this:

def start_requests(self):
    # Read one NPI per line and request its lookup page.
    with open('npis.csv') as f:
        for line in f:
            yield scrapy.Request(
                url='https://npidb.org/npi-lookup/?npi={}'.format(line.strip()),
                callback=self.parse
            )
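
If the CSV file has a header row, blank lines, or more than one column, reading it line by line would produce some invalid URLs. The sketch below is one possible way to handle that, not part of the original answer: the helper name `npi_urls` and its `column` parameter are hypothetical, and it assumes the NPIs are purely numeric values in a single known column. It only builds the URLs, so it can be tried without Scrapy installed.

```python
import csv
from urllib.parse import urlencode

# Base lookup URL, taken from the URLs in the question.
LOOKUP_URL = "https://npidb.org/npi-lookup/"


def npi_urls(csv_path, column=0):
    """Yield one lookup URL per NPI found in the given CSV column.

    Hypothetical helper: assumes NPIs are all-digit strings.
    """
    with open(csv_path, newline="") as f:
        for row in csv.reader(f):
            if len(row) <= column:
                continue  # skip blank or short rows
            npi = row[column].strip()
            if not npi.isdigit():
                continue  # skip a header row or malformed values
            yield LOOKUP_URL + "?" + urlencode({"npi": npi})
```

Inside the spider, `start_requests` could then loop over `npi_urls('npis.csv')` and yield `scrapy.Request(url=url, callback=self.parse)` for each URL, in the same way as the answer above.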
