Ruby on Rails - Storing and accessing large data sets


Question

I am having a hard time managing the storage and access of a large dataset within a Ruby on Rails application. Here is my application in a nutshell: I run Dijkstra's algorithm on a road network and then display the nodes it visits using the Google Maps API. I am using an open dataset of the US road network and construct the graph by iterating over the two txt files given in the link, but I am having trouble storing this data in my app.

I am under the impression that a large dataset like this is not a good fit for ActiveRecord objects: I don't need to modify the data, only to access it and cache it locally in a hash so that I can run Ruby methods on it. I have tried a few things, but I am running into trouble.

  1. I figured that it would make the most sense to parse the txt files and store the graph in yml format. I would then be able to load the graph into a DB as seed data and grab it using Node.all, or something along those lines. Unfortunately, the yml file becomes too large for Rails to handle. Running the Rake task pins the system at 100% indefinitely...

  2. Next, I figured that since I don't need to modify the data, I could just create the graph every time the application loads, as part of its "initialization." But I don't know exactly where to put this code: I need to run some methods, or at least a block of code, and then store the result in some sort of global/session variable that I can access from all controllers and methods. I don't want to pass this large dataset around, just have access to it from anywhere.

This is the way I am currently doing it, but it is just not acceptable: I parse the text files and build the graph inside a controller action, and hope that it finishes computing before the server times out.

Ideally, I would store the graph in a database from which I could grab the entire contents to use locally. Or, at the very least, the data would be parsed only once as the application loads, and I could then access it from different page views, etc. I feel like this would be the most efficient approach, but I am running into hurdles at the moment.

Any ideas?

Recommended answer

You're on the right path. There are a couple of ways to do this. One is, in your model class, outside of any method, set up constants like these examples:

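# Build a lookup hash once from a raw SQL query.
MY_MAP = Hash[ActiveRecord::Base.connection.select_all('SELECT thingone, thingtwo FROM table').map { |row| [row['thingone'], row['thingtwo']] }]
# However you read and parse your file; File.read avoids shelling out to cat.
RAW_DATA = File.read('the_file')
# Records that are looked up often can be cached the same way.
CA = State.find_by_name 'California'
NY = State.find_by_name 'New York'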
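
Applied to the road graph, the same pattern might look roughly like the sketch below. The RoadGraph module, the constant name, and the "from to distance" line format are all assumptions made for illustration; the actual layout of the data set's txt files is not given here, so the parsing would need to be adapted.

```ruby
# Hypothetical helper: turns whitespace-separated "from to distance" lines
# into an adjacency hash. The line format is an assumption, not taken from
# the data set mentioned in the question.
module RoadGraph
  def self.parse_edges(lines)
    graph = Hash.new { |hash, key| hash[key] = [] }
    lines.each do |line|
      from, to, distance = line.split
      next if distance.nil? # skip blank or malformed lines
      graph[from.to_i] << [to.to_i, distance.to_f]
      graph[to.to_i] << [from.to_i, distance.to_f] # roads treated as two-way
    end
    graph.default_proc = nil # unknown nodes now return nil instead of creating entries
    graph.each_value(&:freeze).freeze
  end
end

# In a model, the constant would be built from the real file, for example:
#   EDGES = RoadGraph.parse_edges(File.foreach(path_to_edge_file))
# A tiny in-memory sample stands in for the file here.
SAMPLE_EDGES = RoadGraph.parse_edges(["0 1 2.5", "1 2 1.0", ""])
```

Freezing the result makes it harder to mutate the shared graph by accident from a controller, which fits the read-only use described in the question.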

These will get executed once in a production app: when the model's class is loaded. Another option: do this initialization in an initializer or other config file. See the config/initializers directory.
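
For the initializer route, a minimal sketch could look like the following. GraphStore, the db/edges.txt path, and the "from to distance" line format are hypothetical names and assumptions chosen for illustration. The module itself is plain Ruby; only the commented-out call depends on Rails.

```ruby
require "tempfile"

# Hypothetical process-wide holder for the parsed graph.
module GraphStore
  class << self
    attr_reader :graph

    # Parse the edge file once and keep the frozen result in memory.
    # Assumes one "from to distance" triple per line.
    def load!(path)
      edges = {}
      File.foreach(path) do |line|
        from, to, distance = line.split
        next if distance.nil? # skip blank or malformed lines
        (edges[from.to_i] ||= []) << [to.to_i, distance.to_f]
      end
      @graph = edges.freeze
    end
  end
end

# In Rails, the call would sit in a file under config/initializers, e.g.:
#   GraphStore.load!(Rails.root.join("db", "edges.txt"))
# Here a temporary file stands in for the real data set.
Tempfile.create("edges") do |file|
  file.write("0 1 2.5\n1 2 1.0\n")
  file.flush
  GraphStore.load!(file.path)
end
```

Any controller or model can then read GraphStore.graph without the data being passed around. Keep in mind that each server process holds its own copy, so memory use grows with the number of workers.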
