Efficient way to render ton of JSON on Heroku


Problem Description


I built a simple API with one endpoint. It scrapes files and currently has around 30,000 records. I would ideally like to be able to fetch all those records in JSON with one http call.

Here is my Sinatra view code:

require 'sinatra'
require 'json'
require 'mongoid'

Mongoid.identity_map_enabled = false

get '/' do
  content_type :json
  Book.all
end

I've tried the following: using multi_json with

require './require.rb'
require 'sinatra'
require 'multi_json'
MultiJson.engine = :yajl

Mongoid.identity_map_enabled = false

get '/' do
  content_type :json
  MultiJson.encode(Book.all)
end

The problem with this approach is I get Error R14 (Memory quota exceeded). I get the same error when I try to use the 'oj' gem.

I would just concatenate everything into one long Redis string, but Heroku's Redis service is $30 per month for the instance size I would need (> 10 MB).

My current solution is to use a background task that creates objects and stuffs them full of JSON-ified records up to just under the Mongoid document size limit (16 MB). The problems with this approach: it still takes nearly 30 seconds to render, and I have to run post-processing on the receiving app to properly extract the JSON from the objects.
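
Roughly, the background task does something like this (a simplified sketch; JsonChunk and its payload field are illustrative names, not code from my app):

# Rough sketch of the chunking workaround; JsonChunk is a placeholder
# Mongoid model with one string field used as a JSON container.
class JsonChunk
  include Mongoid::Document
  field :payload, type: String
end

LIMIT = 15 * 1024 * 1024 # stay safely under the 16 MB document cap

buffer, size = [], 0
Book.all.each do |book|
  json = book.attributes.to_json
  if size + json.bytesize > LIMIT
    JsonChunk.create!(payload: "[#{buffer.join(',')}]")
    buffer, size = [], 0
  end
  buffer << json
  size += json.bytesize
end
JsonChunk.create!(payload: "[#{buffer.join(',')}]") unless buffer.empty?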

Does anyone have a better idea for how I can render JSON for 30k records in one call without switching away from Heroku?

Solution

Sounds like you want to stream the JSON directly to the client instead of building it all up in memory. That's probably the best way to cut down memory usage. You could, for example, use yajl to encode JSON directly to a stream.
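
In isolation, that looks something like this (a minimal sketch; the output file and the sample hash are just placeholders, the point is that Yajl's encoder can write into any IO):

require 'yajl'

# minimal sketch: the encoder writes into the IO as it goes, so the
# complete JSON never has to exist as one big Ruby string in memory
File.open('books.json', 'w') do |io|
  encoder = Yajl::Encoder.new
  encoder.encode({ 'title' => 'Example', 'pages' => 123 }, io)
end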

Edit: I rewrote the entire code to use yajl, because its API is much more compelling and allows for much cleaner code. I also included an example for reading the response in chunks. Here's the streamed JSON array helper I wrote:

require 'yajl'

module JsonArray
  class StreamWriter
    def initialize(out)
      super()
      @out = out
      @encoder = Yajl::Encoder.new
      @first = true
    end

    # appends one object to the streamed array, emitting the separating
    # comma before every element after the first
    def <<(object)
      @out << ',' unless @first
      @out << @encoder.encode(object)
      @out << "\n"
      @first = false
    end
  end

  # opens the array, hands a StreamWriter to the block and closes the
  # array again, all through Sinatra's #stream helper
  def self.write_stream(app, &block)
    app.stream do |out|
      out << '['
      block.call StreamWriter.new(out)
      out << ']'
    end
  end
end

Usage:

require 'sinatra'
require 'mongoid'

Mongoid.identity_map_enabled = false

# use a server that supports streaming
set :server, :thin

get '/' do
  content_type :json
  JsonArray.write_stream(self) do |json|
    Book.all.each do |book|
      json << book.attributes
    end
  end
end

To decode on the client side you can read and parse the response in chunks, for example with em-http. Note that this solution still requires the client's memory to be large enough to store the entire array of objects. Here's the corresponding streamed parser helper:

require 'yajl'

module JsonArray
  class StreamParser
    def initialize(&callback)
      @parser = Yajl::Parser.new
      # the callback fires once the top-level value (the whole array)
      # has been parsed
      @parser.on_parse_complete = callback
    end

    def <<(str)
      @parser << str
    end
  end

  def self.parse_stream(&callback)
    StreamParser.new(&callback)
  end
end

Usage:

require 'em-http'

parser = JsonArray.parse_stream do |object|
  # block is called when we are done parsing the
  # entire array; now we can handle the data
  p object
end

EventMachine.run do
  http = EventMachine::HttpRequest.new('http://localhost:4567').get
  http.stream do |chunk|
    parser << chunk
  end
  http.callback do
    EventMachine.stop
  end
end
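
If EventMachine is not an option on the client side, the same chunked parsing can be done with plain Net::HTTP (a sketch that reuses the JsonArray helper above and assumes the streaming server is running on localhost:4567):

require 'net/http'
require 'yajl'

parser = JsonArray.parse_stream do |books|
  # called once the entire array has been parsed
  p books.size
end

uri = URI('http://localhost:4567/')
Net::HTTP.start(uri.host, uri.port) do |http|
  http.request(Net::HTTP::Get.new(uri)) do |response|
    # read_body with a block yields the body chunk by chunk as it arrives
    response.read_body { |chunk| parser << chunk }
  end
end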

Alternative solution

You could actually simplify the whole thing a lot if you give up the need to generate a "proper" JSON array. What the above solution generates is JSON in this form:

[{ ... book_1 ... }
,{ ... book_2 ... }
,{ ... book_3 ... }
...
,{ ... book_n ... }
]

We could, however, stream each book as a separate JSON document and thus reduce the format to the following:

{ ... book_1 ... }
{ ... book_2 ... }
{ ... book_3 ... }
...
{ ... book_n ... }

The code on the server would then be much simpler:

require 'sinatra'
require 'mongoid'
require 'yajl'

Mongoid.identity_map_enabled = false
set :server, :thin

get '/' do
  content_type :json
  encoder = Yajl::Encoder.new
  stream do |out|
    Book.all.each do |book|
      out << encoder.encode(book.attributes) << "\n"
    end
  end
end

As well as the client:

require 'em-http'
require 'yajl'

parser = Yajl::Parser.new
parser.on_parse_complete = Proc.new do |book|
  # this will now be called separately for every book
  p book
end

EventMachine.run do
  http = EventMachine::HttpRequest.new('http://localhost:4567').get
  http.stream do |chunk|
    parser << chunk
  end
  http.callback do
    EventMachine.stop
  end
end

The great thing is that now the client does not have to wait for the entire response, but instead parses every book separately. However, this will not work if one of your clients expects one single big JSON array.
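
If you do control such a client, one possible compromise (a sketch based on the em-http client above) is to keep the line-delimited stream on the wire and simply rebuild the array on the client:

require 'em-http'
require 'yajl'

# collect each separately streamed book back into a single array
books = []
parser = Yajl::Parser.new
parser.on_parse_complete = Proc.new { |book| books << book }

EventMachine.run do
  http = EventMachine::HttpRequest.new('http://localhost:4567').get
  http.stream do |chunk|
    parser << chunk
  end
  http.callback do
    EventMachine.stop
    # the complete array is available once the response has finished
    p books.length
  end
end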
