Why does this JSON file get filled with 1747 times the last Hash data?


Question

I'm using the following code to generate a JSON file containing all category information for a particular website.

require 'mechanize'

@hashes = []

@categories_hash = {}
@categories_hash['category'] ||= {}
@categories_hash['category']['id'] ||= {}
@categories_hash['category']['name'] ||= {}
@categories_hash['category']['group'] ||= {}

# Initialize Mechanize object
a = Mechanize.new

# Begin scraping
a.get('http://www.marktplaats.nl/') do |page|
  groups = page.search('//*[(@id = "navigation-categories")]//a')

  groups.each_with_index do |group, index_1|
    a.get(group[:href]) do |page_2|
      categories = page_2.search('//*[(@id = "category-browser")]//a')

      categories.each_with_index do |category, index_2|
        @categories_hash['category']['id'] = "#{index_1}_#{index_2}"
        @categories_hash['category']['name'] = category.text
        @categories_hash['category']['group'] = group.text

        @hashes << @categories_hash['category']

        # Shows what's being written
        puts @categories_hash['category'].to_json
      end
    end
  end
end

File.open("json/magic/#{Time.now.strftime '%Y%m%d%H%M%S'}_magic_categories.json", 'w') do |f|
  puts '# Writing category data to JSON file'
  f.write(@hashes.to_json)
  puts "|-----------> Done. #{@hashes.length} written."
end

puts '# Finished.'

But this code returns a JSON file filled with just the last category data. For the full JSON file take a look here. This is a sample:

[
   {
      "id":"36_17",
      "name":"Overige Diversen",
      "group":"Diversen"
   },
   {
      "id":"36_17",
      "name":"Overige Diversen",
      "group":"Diversen"
   },
   {
      "id":"36_17",
      "name":"Overige Diversen",
      "group":"Diversen"
   }, {...}
]

The question is, what's causing this and how can I solve it?

Answer

The same object, the result of @categories_hash['category'], is being updated each loop.

Thus the array is filled with the same object 1747 times, and the object reflects the mutations done on the last loop when it is viewed later.
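A minimal standalone sketch (no scraping involved, hypothetical data) makes the aliasing visible:

```ruby
require 'json'

hash = { 'id' => nil }
array = []

3.times do |i|
  hash['id'] = i   # mutates the one and only Hash object
  array << hash    # appends a reference to it, not a copy
end

# All three elements are the same object, so all show the last mutation.
puts array.to_json   # => [{"id":2},{"id":2},{"id":2}]
```

Every element of `array` is the same object (`array[0].equal?(array[1])` is true), which is exactly what happens to `@hashes` in the question.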

While a fix might be to use @categories_hash[category_name] or similar (i.e. fetch/ensure a different object each loop), the following avoids the problem described and the unused/misused hash of 'category' keys.

categories.each_with_index do |category, index_2|
    # creates a new Hash object
    item = {
        id: "#{index_1}_#{index_2}",
        name: category.text,
        group: group.text
    }
    # adds the new (per yield) object
    @hashes << item
end

Alternatively, a more "functional" approach might be to use map, but it solves the problem in the same way: by creating new Hash objects. (This could be expanded to also include the outer loop, but it's here just to give a taste.)

h = categories.each_with_index.map do |category, index_2|
    {
        id: "#{index_1}_#{index_2}",
        name: category.text,
        group: group.text
    }
end
@hashes.concat(h)
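Expanding that idea to cover the outer loop as well, here is a sketch using flat_map, with hypothetical plain-Ruby data standing in for the scraped groups and categories (the Mechanize pages aren't needed to show the shape):

```ruby
require 'json'

# Hypothetical stand-in for the scraped group/category structure.
groups = { 'Diversen' => ['Overige Diversen', 'Boeken'] }

hashes = groups.each_with_index.flat_map do |(group, categories), i|
  categories.each_with_index.map do |category, j|
    # A fresh Hash is created on every iteration, so no aliasing.
    { id: "#{i}_#{j}", name: category, group: group }
  end
end

puts hashes.to_json
# => [{"id":"0_0","name":"Overige Diversen","group":"Diversen"},
#     {"id":"0_1","name":"Boeken","group":"Diversen"}]
```

Because flat_map flattens the inner arrays one level, `hashes` ends up as a flat array of distinct per-category objects, ready for a single `to_json` call.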

