How do I merge multiple Hashes into a single valid JSON file?
I'm using the following code to generate a JSON file containing all category information for a particular website.
require 'mechanize'

@categories_hash = {}
@categories_hash['category'] ||= {}
@categories_hash['category']['id'] ||= {}
@categories_hash['category']['name'] ||= {}
@categories_hash['category']['group'] ||= {}
@categories_hash['category']['search_attributes'] ||= {}

# Initialize Mechanize object
a = Mechanize.new

# Open file and begin
File.open("json/booyah/#{Time.now.strftime '%Y%m%d%H%M%S'}_booyah_categories.json", 'w') do |f|
  puts '# Writing category data to JSON file'

  # Begin scraping
  a.get('http://www.marktplaats.nl/') do |page|
    groups = page.search('//*[(@id = "navigation-categories")]//a')

    groups.each_with_index do |group, index_1|
      a.get(group[:href]) do |page_2|
        categories = page_2.search('//*[(@id = "category-browser")]//a')

        categories.each_with_index do |category, index_2|
          a.get(category[:href]) do |page_3|
            search_attributes = page_3.search('//*[contains(concat( " ", @class, " " ), concat( " ", "heading", " " ))]')

            search_attributes.each_with_index do |attribute, index_3|
              @categories_hash['category']['id'] = "#{index_1}_#{index_2}"
              @categories_hash['category']['name'] = category.text
              @categories_hash['category']['group'] = group.text
              @categories_hash['category']['search_attributes'][index_3] = attribute.text unless attribute.text == 'Outlet '
            end

            # Uncomment if you want to see what's being written
            puts @categories_hash['category'].to_json

            # Write the converted Hash to the JSON file
            f.write(@categories_hash['category'].to_json)
          end
        end
      end
    end
  end

  puts '|-----------> Done.'
end

puts '# Finished.'
This code produces the following, invalid JSON file. Take a look at the full JSON file here. It looks like this:
{
  "id": "0_0",
  "name": "Boeken en Bijbels",
  "group": "Antiek en Kunst",
  "search_attributes": {
    "0": "Prijs van/tot",
    "1": "Groep en Rubriek",
    "2": "Aangeboden sinds"
  }
}{
  "id": "0_1",
  "name": "Emaille",
  "group": "Antiek en Kunst",
  "search_attributes": {
    "0": "Prijs van/tot",
    "1": "Groep en Rubriek",
    "2": "Aangeboden sinds"
  }
}{
  "id": "0_2",
  "name": "Gereedschap en Instrumenten",
  "group": "Antiek en Kunst",
  "search_attributes": {
    "0": "Prijs van/tot",
    "1": "Groep en Rubriek",
    "2": "Aangeboden sinds"
  }
}{...}
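To see why a parser rejects this file, here is a minimal sketch using Ruby's standard `json` library. The two strings are shortened stand-ins for the real output, and `valid_json?` is a hypothetical helper introduced only for this illustration: objects written back to back do not form a single JSON document, while the same objects wrapped in an array do.

```ruby
require 'json'

# Two objects written back to back, as in the file above (shortened).
concatenated = '{"id":"0_0"}{"id":"0_1"}'

# The same objects wrapped in an array and separated by a comma.
as_array = '[{"id":"0_0"},{"id":"0_1"}]'

# Hypothetical helper: true when the text parses as one JSON document.
def valid_json?(text)
  JSON.parse(text)
  true
rescue JSON::ParserError
  false
end

puts valid_json?(concatenated) # prints false
puts valid_json?(as_array)     # prints true
```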
I want the output to be valid JSON and look like this:
[
  {
    "id": "0_0",
    "name": "Boeken en Bijbels",
    "group": "Antiek en Kunst",
    "search_attributes": {
      "0": "Prijs van/tot",
      "1": "Groep en Rubriek",
      "2": "Aangeboden sinds"
    }
  },
  {
    "id": "0_1",
    "name": "Emaille",
    "group": "Antiek en Kunst",
    "search_attributes": {
      "0": "Prijs van/tot",
      "1": "Groep en Rubriek",
      "2": "Aangeboden sinds"
    }
  },
  {
    "id": "0_2",
    "name": "Gereedschap en Instrumenten",
    "group": "Antiek en Kunst",
    "search_attributes": {
      "0": "Prijs van/tot",
      "1": "Groep en Rubriek",
      "2": "Aangeboden sinds"
    }
  },
  {...}
]
The question is, how do I accomplish this?
Update
A big thank you to maerics for his answer.
Here's the slightly updated, but working code:
require 'mechanize'
require 'json'

@categories_hash = {}
@categories_hash['category'] ||= {}
@categories_hash['category']['id'] ||= {}
@categories_hash['category']['name'] ||= {}
@categories_hash['category']['group'] ||= {}
@categories_hash['category']['search_attributes'] ||= {}

# Collects every item so the whole set can be written as one JSON array
@hashes = []

# Initialize Mechanize object
a = Mechanize.new

# Begin scraping
a.get('http://www.marktplaats.nl/') do |page|
  groups = page.search('//*[(@id = "navigation-categories")]//a')

  groups.each_with_index do |group, index_1|
    a.get(group[:href]) do |page_2|
      categories = page_2.search('//*[(@id = "category-browser")]//a')

      categories.each_with_index do |category, index_2|
        a.get(category[:href]) do |page_3|
          search_attributes = page_3.search('//*[contains(concat( " ", @class, " " ), concat( " ", "heading", " " ))]')

          search_attributes.each_with_index do |attribute, index_3|
            item = {
              id: "#{index_1}_#{index_2}",
              name: category.text,
              group: group.text,
              search_attributes: {
                # index_3.to_s uses the loop index as the key
                # (:index_3.to_s would always yield the string "index_3")
                index_3.to_s => "#{attribute.text unless attribute.text == 'Outlet '}"
              }
            }

            @hashes << item
            puts item
          end
        end
      end
    end
  end
end

# Open file and begin
File.open("json/light/#{Time.now.strftime '%Y%m%d%H%M%S'}_light_categories.json", 'w') do |f|
  puts '# Writing category data to JSON file'
  f.write(@hashes.to_json)
  puts '|-----------> Done.'
end

puts '# Finished.'
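Note that the loop above appends one item per search attribute, whereas the desired output has one object per category holding all of its attributes. A minimal sketch of that per-category shape follows, assuming the scraped values are already available as plain data. `SCRAPED` and `build_item` are hypothetical names used only for this illustration, and the ids are simplified to a single group.

```ruby
require 'json'

# Hypothetical stand-in for the scraped pages: [group, category, headings].
SCRAPED = [
  ['Antiek en Kunst', 'Boeken en Bijbels', ['Prijs van/tot', 'Outlet ', 'Groep en Rubriek']],
  ['Antiek en Kunst', 'Emaille', ['Prijs van/tot', 'Aangeboden sinds']]
].freeze

# Builds one Hash per category, with its attributes keyed by position.
def build_item(id, group, category, headings)
  attributes = {}
  headings.each_with_index do |text, index|
    next if text == 'Outlet ' # same filter as the scraping code
    attributes[index.to_s] = text
  end
  { id: id, name: category, group: group, search_attributes: attributes }
end

# Simplification: every sample row belongs to group 0.
items = SCRAPED.each_with_index.map do |(group, category, headings), index|
  build_item("0_#{index}", group, category, headings)
end

puts JSON.pretty_generate(items)
```

Skipping a filtered heading leaves a gap in the keys ("0", "2"); counting only the kept headings would give consecutive keys if that matters to the consumer.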
Using the built-in Ruby JSON library:
require 'json'
hashes = []
all_hashes.each { |h| hashes << h }
print hashes.to_json
Or, in the extreme case that your hashes will not fit into the available memory (pseudocode):
print '['
for each JSON hash H
  print H
  print ',' unless H is the last of the set
print ']'
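The pseudocode could be sketched in Ruby roughly as follows. This is one possible rendering, not the answerer's own code: `write_json_array` is a hypothetical helper name, and it writes the comma before every element except the first, which yields the same output as writing it after every element except the last but does not require knowing which element is last.

```ruby
require 'json'
require 'stringio'

# Writes each Hash from `hashes` to `io` as one element of a JSON array.
# Only one serialized Hash is held in memory at a time.
def write_json_array(io, hashes)
  io.write('[')
  first = true
  hashes.each do |hash|
    io.write(',') unless first
    io.write(JSON.generate(hash))
    first = false
  end
  io.write(']')
end

# Usage with an in-memory buffer; a File opened with 'w' works the same way.
buffer = StringIO.new
write_json_array(buffer, [{ 'id' => '0_0' }, { 'id' => '0_1' }])
puts buffer.string
```

Because `hashes` only needs to respond to `each`, it can be a lazy enumerator that yields each Hash as it is scraped instead of a fully built Array.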