I want to print a proper table out of data scraped using Scrapy


Question

I have written all the code to scrape the table from http://www.rarityguide.com/cbgames_view.php?FirstRecord=21, but I am getting output like this:

# the output that I get

{'EXG': (['17.00',
          '10.00',
          '90.00',
          '9.00',
          '13.00',
          '17.00',
          '16.00',
          '43.00',
          '125.00',
          '16.00',
          '11.00',
          '150.00',
          '17.00',
          '24.00',
          '15.00',
          '24.00',
          '21.00',
          '36.00',
          '270.00',
          '280.00'],),
 'G': ['8.00',
       '5.00',
       '38.00',
       '2.00',
       '6.00',
       '7.00',
       '6.00',
       '20.00',
       '40.00',
       '7.00',
       '5.00',
       '70.00',
       '6.00',
       '12.00',
       '7.00',
       '12.00',
       '10.00',
       '15.00',
       '120.00',
       '140.00'],
 'company': (['Milton Bradley',
              'Lowell',
              'Milton Bradley',
              'Transogram',
              'Milton Bradley',
              'Transogram',
              'Standard Toykraft',
              'Ideal',
              'Game Gems',
              'Milton Bradley',
              'Parker Brothers',
              'CPC',
              'Parker Brothers',
              'Whitman',
              'Ideal',
              'Transogram',
              'King Features',
              'Westinghouse',
              'Parker Brothers',
              'Parker Brothers'],),
 'mnm': (['26.00',
          '19.00',
          '195.00',
          '15.00',
          '30.00',
          '29.00',
          '31.00',
          '65.00',
          '204.00',
          '25.00',
          '22.00',
          '250.00',
          '27.00',
          '42.00',
          '23.00',
          '37.00',
          '40.00',
          '57.00',
          '415.00',
          '435.00'],),
 'rarity': ([],),
 'title': (['Beat the Clock',
            'Beat the Clock',
            'Beatles - Flip Your Wig',
            'Ben Casey M.D.',
            'Bermuda Triangle',
            'Betsy Ross and the Flag',
            'Beverly Hillbillies',
            'Beware the Spider',
            'Bewitched',
            'Bewitched - Stymie Card Game',
            'Bionic Woman',
            'Blade Runner',
            'Blondie',
            'Blondie - Playing Card Game',
            'Blondie - Sunday Funnies',
            'Blondie - The Hurry Scurry Game',
            "Blondie and Dagwood's Race for the Office",
            'Blondie Goes to Leisureland',
            'Boom or Bust',
            'Boom or Bust'],),
 'year': (['1969',
           '1954',
           '1964',
           '1961',
           '1976',
           '1961',
           '1963',
           '1980',
           '1965',
           '1964',
           '1976',
           '1982',
           '1969',
           '1941',
           '1972',
           '1966',
           '1950',
           '1935',
           '1951',
           '1959'],)}

Can anyone help me achieve output like this instead?

# the output that I want
{"EXG": ["17.00"],
  "MNM": ["26.00"],
  "year": ["1969"],
  "company": ["Milton Bradley"],
  "Title": ["Beat the Clock"] }

{"EXG": ["10.00"],
  "MNM": ["19.00"],
  "year": ["1954"],
  "company": ["Lowell"],
  "Title": ["Beat the Clock"] }
and so on for all the values.

Basically, I want one dictionary containing all the key-value pairs for each row, instead of one entire dictionary per key. Here is my spider's code:

import scrapy
from ..items import RarityItem


class RarityScraper(scrapy.Spider):
    name = "rarity"
    start_urls = [
        "http://www.rarityguide.com/cbgames_view.php?FirstRecord=21"
    ]

    def parse(self, response):
        table = response.css("form")

        items = RarityItem()

        for contents in table:
            title = contents.css("td:nth-child(2)::text").extract()
            company = contents.css("td:nth-child(3)::text").extract()
            year = contents.css("td:nth-child(4)::text").extract()
            rarity = contents.css("td:nth-child(5)::text").extract()
            mnm = contents.css("td:nth-child(6)::text").extract()
            EXG = contents.css("td:nth-child(7)::text").extract()
            G = contents.css("td:nth-child(8)::text").extract()

            items["title"] = title,
            items["company"] = company,
            items["year"] = year,
            items["rarity"] = rarity,
            items["mnm"] = mnm,
            items["EXG"] = EXG,
            items["G"] = G

            yield items

Answer

If all the lists are the same length, then after this line:

G = contents.css("td:nth-child(8)::text").extract()

add this code snippet:

# build one dict per table row by indexing the parallel column lists
arr = []
for i in range(len(title)):
    arr.append({
        'title': title[i], 'company': company[i], 'year': year[i], 'rarity': rarity[i],
        'MNM': mnm[i], 'EXG': EXG[i], 'G': G[i]})

Then add:

for row in arr:
    print(row)

to see the output array.
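
Going one step further, here is a minimal sketch (my own assumption, not part of the answer above) of how the same per-row grouping could be done directly inside the spider with zip, so that Scrapy yields one plain dict per table row. The selectors are copied from the question's spider, the spider name rarity_rows is hypothetical, and the rarity column is left out because it is empty in the sample output:

import scrapy


class RarityRowScraper(scrapy.Spider):
    name = "rarity_rows"  # hypothetical name, to avoid clashing with the original spider
    start_urls = [
        "http://www.rarityguide.com/cbgames_view.php?FirstRecord=21"
    ]

    def parse(self, response):
        for contents in response.css("form"):
            # extract the whole column lists, exactly as in the question
            titles = contents.css("td:nth-child(2)::text").extract()
            companies = contents.css("td:nth-child(3)::text").extract()
            years = contents.css("td:nth-child(4)::text").extract()
            mnms = contents.css("td:nth-child(6)::text").extract()
            exgs = contents.css("td:nth-child(7)::text").extract()
            gs = contents.css("td:nth-child(8)::text").extract()

            # zip stops at the shortest list, so a short or empty column
            # cannot raise an IndexError, but its missing rows are dropped
            for title, company, year, mnm, exg, g in zip(
                    titles, companies, years, mnms, exgs, gs):
                yield {
                    "title": title,
                    "company": company,
                    "year": year,
                    "MNM": mnm,
                    "EXG": exg,
                    "G": g,
                }

Running something like scrapy crawl rarity_rows -o rows.json should then write one JSON object per game, in the shape shown under "the output that I want".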
