How to write a Rake task to import data into a Rails app?


Question

    Goal: Using a CRON task (or other scheduled event) to update database with nightly export of data from an existing system.

    All data is created, updated, and deleted in an existing system. The website does not integrate directly with this system, so the Rails app simply needs to reflect the updates that appear in the data export.

    I have a .txt file of ~5,000 products that looks like this:

    "1234":"product name":"attr 1":"attr 2":"ABC Manufacturing":"2222"
    "A134":"another product":"attr 1":"attr 2":"Foobar World":"2447"
    ...
    

    All values are strings enclosed in double quotes (") and separated by colons (:).

    Fields are:

    • id: unique id; alphanumeric
    • name: product name; any character
    • attribute columns: strings; any character (e.g., size, weight, color, dimension)
    • vendor_name: string; any character
    • vendor_id: unique vendor id; numeric

    Vendor information is not normalized in the current system.

    What are best practices here? Is it okay to delete the products and vendors tables and rewrite with the new data on every cycle? Or is it better to only add new rows and update existing ones?

    Notes:

    1. This data will be used to generate Orders that will persist through nightly database imports. OrderItems will need to be connected to the product ids that are specified in the data file, so we can't rely on an auto-incrementing primary key to be the same for each import; the unique alphanumeric id will need to be used to join products to order_items.
    2. Ideally, I'd like the importer to normalize the Vendor data
    3. I cannot use vanilla SQL statements, so I imagine I'll need to write a rake task in order to use Product.create(...) and Vendor.create(...) style syntax.
    4. This will be implemented on EngineYard

    Solution

    I wouldn't delete the products and vendors tables on every cycle. Is this a rails app? If so there are some really nice ActiveRecord helpers that would come in handy for you.

    If you have a Product active record model, you can do:

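    # id, name, and size stand for the values parsed from one line of the file.
    # On Rails 4 and later the equivalent call is
    # Product.find_or_initialize_by(identifier: id).
    p = Product.find_or_initialize_by_identifier(id)
    p.name = name
    p.size = size
    # ... assign the remaining attributes the same way
    p.save!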
    

    find_or_initialize_by_identifier looks up the product in the database by the id you specify and, if it can't find one, instantiates a new, unsaved record. The really handy thing about doing it this way is that ActiveRecord only writes to the database if any of the data has changed, and it automatically updates the timestamp fields you have in the table (such as updated_at) when it does. One more thing: since you will be looking up records by the identifier (the id from the file), I would make sure to add an index on that field in the database.
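
    To make the "only save when something changed" behaviour concrete, here is a plain-Ruby simulation of the pattern. It does not use ActiveRecord; ProductStore and its methods are hypothetical stand-ins, and the point is only that running the same import twice results in a single write.

```ruby
# In-memory stand-in for the products table, keyed by the external identifier.
# This is NOT ActiveRecord; it only mimics the find-or-initialize pattern.
class ProductStore
  attr_reader :writes

  def initialize
    @rows = {}
    @writes = 0
  end

  # Returns a copy of the stored attributes, or a fresh hash for a new identifier.
  def find_or_initialize(identifier)
    (@rows[identifier] || { identifier: identifier }).dup
  end

  # Writes only when the attributes differ from what is already stored.
  def save(attrs)
    return false if @rows[attrs[:identifier]] == attrs

    @rows[attrs[:identifier]] = attrs
    @writes += 1
    true
  end
end

store = ProductStore.new

# Import the same record twice, as two consecutive nightly runs would.
2.times do
  product = store.find_or_initialize("1234")
  product[:name] = "product name"
  store.save(product)
end

puts store.writes
```

    Because the second run finds identical data, it skips the write, which is what keeps a nightly re-import of mostly unchanged rows cheap.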

    To make a rake task to accomplish this, I would add a rake file to the lib/tasks directory of your rails app. We'll call it data.rake.

    Inside data.rake, it would look something like this:

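    namespace :data do
      desc "import data from files to database"
      task :import => :environment do
        # File.foreach closes the file once iteration finishes.
        File.foreach(<file to import>) do |line|
          # Drop the trailing newline, split on colons, and strip the
          # double quotes that surround each value.
          attrs = line.chomp.split(":").map { |value| value.gsub(/\A"|"\z/, "") }
          p = Product.find_or_initialize_by_identifier(attrs[0])
          p.name = attrs[1]
          # ... assign the remaining attributes the same way
          p.save!
        end
      end
    end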
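
    A plain split on ":" breaks as soon as a value contains a colon. Since the export is effectively CSV with a colon as the separator, Ruby's standard csv library may be a safer way to read it. This is a minimal sketch, not part of the original answer: parse_products and FIELDS are hypothetical names, the two attribute column names are placeholders, and the second sample line has a colon added to its name to show the difference.

```ruby
require "csv"

# Column order taken from the sample lines in the question; attr_1 and
# attr_2 are placeholder names, not names from the original export.
FIELDS = %i[identifier name attr_1 attr_2 vendor_name vendor_id].freeze

# Parses colon-separated, double-quoted text into an array of hashes.
def parse_products(text)
  CSV.parse(text, col_sep: ":").map do |row|
    FIELDS.zip(row).to_h
  end
end

# Second line modified for illustration: its name contains a colon.
sample = <<~DATA
  "1234":"product name":"attr 1":"attr 2":"ABC Manufacturing":"2222"
  "A134":"another: product":"attr 1":"attr 2":"Foobar World":"2447"
DATA

rows = parse_products(sample)
rows.each { |row| puts row.inspect }
```

    Inside the rake task, CSV.foreach with the same col_sep option would read the file line by line in the same way.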
    

    Then, to call the rake task, run "rake data:import" from the command line.
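
    The question also asks for the vendor data to be normalized, which the task above does not cover. One way to approach it, sketched here in plain Ruby without ActiveRecord, is to split each parsed row into a product that references its vendor by id plus a deduplicated list of vendors. The normalize helper and the hash keys are hypothetical, and the third row is invented to show a repeated vendor; in the Rails app the same idea would presumably map onto a Vendor model looked up by its id.

```ruby
# Rows as they might come out of a parser: one hash per line of the export.
# The third row is made up for illustration (it repeats vendor 2222).
rows = [
  { identifier: "1234", name: "product name",    vendor_name: "ABC Manufacturing", vendor_id: "2222" },
  { identifier: "A134", name: "another product", vendor_name: "Foobar World",      vendor_id: "2447" },
  { identifier: "B200", name: "third product",   vendor_name: "ABC Manufacturing", vendor_id: "2222" }
]

# Splits denormalized rows into unique vendors and products that
# reference a vendor only by its id.
def normalize(rows)
  vendors = {}
  products = rows.map do |row|
    # The first occurrence of a vendor_id wins; later rows reuse it.
    vendors[row[:vendor_id]] ||= { id: row[:vendor_id], name: row[:vendor_name] }
    { identifier: row[:identifier], name: row[:name], vendor_id: row[:vendor_id] }
  end
  [vendors.values, products]
end

vendors, products = normalize(rows)
puts "#{vendors.length} vendors, #{products.length} products"
```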
