How to write a Rake task to import data into a Rails app?
Problem description
Goal: Using a CRON task (or other scheduled event) to update database with nightly export of data from an existing system.
All data is created, updated, and deleted in an existing system. The website does not integrate directly with this system, so the Rails app simply needs to reflect the updates that appear in the data export.
I have a .txt file of ~5,000 products that looks like this:
"1234":"product name":"attr 1":"attr 2":"ABC Manufacturing":"2222"
"A134":"another product":"attr 1":"attr 2":"Foobar World":"2447"
...
All values are strings enclosed in double quotes (") and separated by colons (:).
Fields are:
- id: unique id; alphanumeric
- name: product name; any character
- attribute columns: strings; any character (e.g., size, weight, color, dimension)
- vendor_name: string; any character
- vendor_id: unique vendor id; numeric
Vendor information is not normalized in the current system.
What are best practices here? Is it okay to delete the products and vendors tables and rewrite with the new data on every cycle? Or is it better to only add new rows and update existing ones?
Notes:
- This data will be used to generate Orders that will persist through nightly database imports. OrderItems will need to be connected to the product ids that are specified in the data file, so we can't rely on an auto-incrementing primary key to be the same for each import; the unique alphanumeric id will need to be used to join products to order_items.
- Ideally, I'd like the importer to normalize the Vendor data.
- I cannot use vanilla SQL statements, so I imagine I'll need to write a rake task in order to use Product.create(...) and Vendor.create(...) style syntax.
- This will be implemented on EngineYard.
Solution
I wouldn't delete the products and vendors tables on every cycle. Is this a Rails app? If so, there are some really nice ActiveRecord helpers that would come in handy for you.
If you have a Product active record model, you can do:
p = Product.find_or_initialize_by_identifier(<id you get from file>)
p.name = <name from file>
p.size = <size from file>
etc...
p.save!
find_or_initialize will look up the product in the database by the id you specify, and if it can't find it, it will initialize a new, unsaved one; the record is only written when you call save!. The really handy thing about doing it this way is that ActiveRecord will only save to the database if any of the data has changed, and it will automatically update any timestamp fields you have in the table (updated_at) accordingly. One more thing: since you would be looking up records by the identifier (the id from the file), I would make sure to add an index on that field in the database.
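On Rails 4 and later, the dynamic finder used above is spelled find_or_initialize_by(identifier: ...). The lookup-then-save step could be wrapped in a small method along these lines. This is only a sketch: upsert_record is a hypothetical name, the identifier column is the assumption carried over from the answer, and the model class is passed in as an argument so the method can be tried without a database.

```ruby
# Hypothetical helper: create or update one record keyed by the export's id.
# `model` is expected to be an ActiveRecord class such as Product with an
# `identifier` column (an assumption, not something the question confirms).
def upsert_record(model, identifier, attributes)
  record = model.find_or_initialize_by(identifier: identifier)
  record.assign_attributes(attributes)
  record.save! # raises if the record is invalid
  record
end

# Usage inside the Rails app (not run here):
#   upsert_record(Product, "1234", name: "product name", size: "attr 1")
```

Because the lookup goes through the identifier column rather than the auto-incrementing primary key, the same product keeps the same row across nightly imports, which is what order_items would need to rely on.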
To make a rake task to accomplish this, I would add a rake file to the lib/tasks directory of your rails app. We'll call it data.rake.
Inside data.rake, it would look something like this:
namespace :data do
  desc "import data from files to database"
  task :import => :environment do
    # File.foreach closes the file once every line has been read
    File.foreach(<file to import>) do |line|
      attrs = line.split(":")
      p = Product.find_or_initialize_by_identifier(attrs[0])
      p.name = attrs[1]
      # etc...
      p.save!
    end
  end
end
Then, to call the rake task, run "rake data:import" from the command line.
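One caveat with line.split(":") in the task above: it leaves the surrounding double quotes (and the trailing newline) on each value, and it would split a value that itself contains a colon. A stricter parser might look like the sketch below. It assumes values never contain an embedded double quote and that the columns come in the order listed in the question (id, name, attribute columns, vendor_name, vendor_id). parse_export_line and split_row are hypothetical names.

```ruby
# Hypothetical helper: pull the quoted values out of one export line.
# Assumes every value is wrapped in double quotes and contains no
# embedded double quote characters.
def parse_export_line(line)
  line.scan(/"([^"]*)"/).flatten
end

# Hypothetical helper: separate product and vendor attributes so that each
# vendor can be stored once. The column order is assumed from the question:
# the last two fields are vendor_name and vendor_id.
def split_row(fields)
  *product_fields, vendor_name, vendor_id = fields
  identifier, name, *extras = product_fields
  {
    product: { identifier: identifier, name: name, extras: extras },
    vendor:  { vendor_id: Integer(vendor_id), name: vendor_name }
  }
end

line = '"1234":"product name":"attr 1":"attr 2":"ABC Manufacturing":"2222"'
row = split_row(parse_export_line(line))
```

Inside the rake task, row[:vendor] could feed a find-or-initialize lookup on the vendor model and row[:product] the one on Product, so vendor data is normalized as part of the same pass.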