How do I combine this Hash to a single JSON object?


Question


I'm using the following code to generate a JSON file containing all category information for a particular website.

The goal is to have a JSON file with the following format:

[
   {
      "id":"36_17",
      "name":"Diversen Particulier",
      "group":"Diversen",
      "search_attributes":{
         "0":"Prijs van/tot",
         "1":"Groep en Rubriek",
         "2":"Conditie",
      }
   },
   {
      "id":"36_18",
      "name":"Diversen Zakelijk",
      "group":"Diversen",
      "search_attributes":{
         "0":"Prijs van/tot",
         "1":"Groep en Rubriek",
         "2":"Conditie",
      }
   },
   {
      "id":"36_19",
      "name":"Overige Diversen",
      "group":"Diversen",
      "search_attributes":{
         "0":"Prijs van/tot",
         "1":"Groep en Rubriek",
         "2":"Conditie",
      }
   }, {...}
]

But I keep getting this format:

[
   {
      "id":"36_17",
      "name":"Diversen Particulier",
      "group":"Diversen",
      "search_attributes":{"0":"Prijs van/tot"}
   },
  {
     "id":"36_17",
     "name":"Diversen Particulier",
     "group":"Diversen",
     "search_attributes":{"1":"Groep en Rubriek"}
  },
  {
     "id":"36_17",
     "name":"Diversen Particulier",
     "group":"Diversen",
     "search_attributes":{"2":"Conditie"}
  }, {...}
]

The search_attributes are not being combined correctly: every attribute produces its own top-level element (with the same id, name, and group repeated) instead of all attributes being grouped under one category.

I'm using the following code:

require 'mechanize'

@hashes = []

# Initialize Mechanize object
a = Mechanize.new

# Begin scraping
a.get('http://www.marktplaats.nl/') do |page|
  groups = page.search('//*[(@id = "navigation-categories")]//a')
  groups.each_with_index do |group, index_1|

    a.get(group[:href]) do |page_2|
      categories = page_2.search('//*[(@id = "category-browser")]//a')
      categories.each_with_index do |category, index_2|

        a.get(category[:href]) do |page_3|
          search_attributes = page_3.search('//*[contains(concat( " ", @class, " " ), concat( " ", "heading", " " ))]')

          search_attributes.each_with_index do |attribute, index_3|
            item = {
              id: "#{index_1}_#{index_2}",
              name: category.text,
              group: group.text,
              :search_attributes => {
                :index_3.to_s => "#{attribute.text unless attribute.text == 'Outlet '}"
              }
            }

            @hashes << item

            puts item

          end
        end
      end
    end
  end
end

# Open file and begin
File.open("json/light/#{Time.now.strftime '%Y%m%d%H%M%S'}_light_categories.json", 'w') do |f|
  puts '# Writing category data to JSON file'
  f.write(@hashes.to_json)
  puts "|-----------> Done. #{@hashes.length} written."
end

puts '# Finished.'

What's causing this, and how do I solve it?

Update

A big thanks to arie-shaw for his answer.

Here's the working code:

require 'mechanize'

@hashes = []

# Initialize Mechanize object
a = Mechanize.new

# Begin scraping
a.get('http://www.marktplaats.nl/') do |page|
  groups = page.search('//*[(@id = "navigation-categories")]//a')
  groups.each_with_index do |group, index_1|

    a.get(group[:href]) do |page_2|
      categories = page_2.search('//*[(@id = "category-browser")]//a')
      categories.each_with_index do |category, index_2|

        a.get(category[:href]) do |page_3|
          search_attributes = page_3.search('//*[contains(concat( " ", @class, " " ), concat( " ", "heading", " " ))]')

          attributes_hash = {}

          search_attributes.each_with_index do |attribute, index_3|
            attributes_hash[index_3.to_s] = "#{attribute.text unless attribute.text == 'Outlet '}"
          end

          item = {
            id: "#{index_1}.#{index_2}",
            name: category.text,
            group: group.text,
            :search_attributes => attributes_hash
          }

          @hashes << item

          puts item
        end
      end
    end
  end
end

# Open file and begin
File.open("json/light/#{Time.now.strftime '%Y%m%d%H%M%S'}_light_categories.json", 'w') do |f|
  puts '# Writing category data to JSON file'
  f.write(@hashes.to_json)
  puts "|-----------> Done. #{@hashes.length} written."
end

puts '# Finished.'

Solution

The innermost each_with_index should only be used to build the search_attributes hash, not to push a separate element onto the top-level array for every attribute.

# Begin scraping
a.get('http://www.marktplaats.nl/') do |page|
  groups = page.search('//*[(@id = "navigation-categories")]//a')
  groups.each_with_index do |group, index_1|

    a.get(group[:href]) do |page_2|
      categories = page_2.search('//*[(@id = "category-browser")]//a')
      categories.each_with_index do |category, index_2|

        a.get(category[:href]) do |page_3|
          search_attributes = page_3.search('//*[contains(concat( " ", @class, " " ), concat( " ", "heading", " " ))]')

          attributes_hash = {}
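          # Collect every attribute for this category into attributes_hash first;
          # a single item per category is pushed onto @hashes only after this loop.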
          search_attributes.each_with_index do |attribute, index_3|
            attributes_hash[index_3.to_s] = "#{attribute.text unless attribute.text == 'Outlet '}"
          end

          @hashes << {
            id: "#{index_1}_#{index_2}",
            name: category.text,
            group: group.text,
            search_attributes: attributes_hash
          }
        end
      end
    end
  end
end
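
To see the pattern in isolation, here is a minimal, self-contained sketch. The category names, attribute labels, and ids are hard-coded sample data taken from the target output above, not scraped; the point is simply to show that building the nested hash first and appending one element per category produces the desired JSON shape:

require 'json'

# Hard-coded sample data standing in for the scraped pages (illustrative only).
sample_categories = {
  'Diversen Particulier' => ['Prijs van/tot', 'Groep en Rubriek', 'Conditie'],
  'Diversen Zakelijk'    => ['Prijs van/tot', 'Groep en Rubriek', 'Conditie']
}

hashes = []

sample_categories.each_with_index do |(name, attributes), index|
  # Build the nested hash completely before appending, so each
  # category contributes exactly one element to the top-level array.
  attributes_hash = {}
  attributes.each_with_index do |attribute, i|
    attributes_hash[i.to_s] = attribute
  end

  hashes << {
    id: "36_#{17 + index}",
    name: name,
    group: 'Diversen',
    search_attributes: attributes_hash
  }
end

puts JSON.pretty_generate(hashes)

Running this prints one array element per category, each with all of its search_attributes grouped together, matching the target format shown in the question.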
