Speed up Date#parse & Date#strptime in Ruby, more elegant way or best practice?


Question

This question arises from another performance problem: processing a large text file that contains date-formatted strings.

After loading the data from a CSV file into a Ruby array, the most expensive step is parsing the 360,000 date-formatted strings into Date objects. It takes more than 50% of the CPU time.

There are several questions on SO about the most efficient way to parse a string into a date, but most of them are out of date, and none of them considers this situation: among all 360,000 records, only 5 distinct dates actually need to be parsed.

More generally, in an enterprise application all the dates needed may fall within a 5- or 10-year window, which is only about 2,000 to 4,000 distinct dates. If the data I fetch from a file or database has about 100 records per day, each date string is parsed about 100 times, so 99% of the CPU time spent parsing dates and creating Date objects is unnecessary.
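To put the 2,000 to 4,000 figure in perspective, the number of distinct dates in a window is simply the difference between its end points in days. The snippet below is a small worked check of that arithmetic; the helper name `distinct_days` is hypothetical.

```ruby
require 'date'

# Hypothetical helper: number of distinct calendar days in the range [from, to).
def distinct_days(from, to)
  (to - from).to_i # Date - Date yields a Rational number of days
end

puts distinct_days(Date.new(2014, 1, 1), Date.new(2019, 1, 1)) # 5 years  -> 1826
puts distinct_days(Date.new(2014, 1, 1), Date.new(2024, 1, 1)) # 10 years -> 3652
```

So a cache keyed by date holds at most a few thousand entries, no matter how many records refer to those dates.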

Here's my attempt

I defined a StaticDate class that improves performance by caching the Date objects it has already created:

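require 'date'

# Caches Date objects so that each distinct input is parsed or built only once.
# Sharing cached instances is safe because Date objects are immutable.
class StaticDate
  @@all = {}

  # Cached Date.new. Date::ITALY is Date.new's own default calendar reform;
  # a Date::JULIAN default would silently build Julian-calendar dates.
  def self.instance(year, month, day, start = Date::ITALY)
    @@all[[year, month, day, start]] ||= Date.new(year, month, day, start)
  end

  # Cached Date.parse, keyed by the input string.
  def self.parse(date_str)
    @@all[date_str] ||= Date.parse(date_str)
  end

  # Cached Date.strptime, keyed by [string, format] so that different
  # string/format pairs can never produce the same concatenated key.
  def self.strptime(date_str, format_str)
    @@all[[date_str, format_str]] ||= Date.strptime(date_str, format_str)
  end
end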
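One detail worth checking in a cache like this is the calendar argument. `Date.new` defaults to `Date::ITALY`; passing `Date::JULIAN` makes it interpret the same year, month and day in the Julian calendar, which lands on a different day (13 days later for dates in this century). A quick check, using only the standard `date` library:

```ruby
require 'date'

gregorian = Date.new(2014, 6, 1)               # default start: Date::ITALY
julian    = Date.new(2014, 6, 1, Date::JULIAN) # same y/m/d, Julian calendar

puts gregorian.jd        # 2456810
puts julian.jd           # 2456823
puts gregorian == julian # false
```

A cached constructor that silently uses a different calendar from `Date.new` would therefore return dates that do not compare equal to the uncached ones.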

My questions

I know my code smells of duplicating a class that has the same functionality, but in this scenario of 360,000 records it gives roughly a 4x speed-up for Date#strptime (2.29 s vs. 0.55 s) and a 41x speed-up for Date#parse (7.11 s vs. 0.17 s). So I think it is well worth improving and refactoring:

  • Is there any gem or plugin that already implements this in a more elegant way? Any suggestions for improving or refactoring this code are also appreciated.
  • Since all Ruby Date objects are immutable, do you think it is necessary to build this caching into Ruby's Date class itself?
  • Are there any other best practices for getting the best performance out of Date operations in a Rails application? (Feel free to skip this one if you think it's too broad.)
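On the first question, one possible refactoring is to drop the hand-written `||=` methods and give each constructor its own memoizing `Hash` with a default block, which also keeps the different key types apart. This is only a sketch: the module name `DateCache` and its method names are hypothetical, not an existing gem, and it makes no attempt at locking for multi-threaded use.

```ruby
require 'date'

# Hypothetical module: one memoizing Hash per Date constructor.
# Each Hash's default block runs only on a cache miss and stores the result.
module DateCache
  PARSE    = Hash.new { |h, str| h[str] = Date.parse(str) }
  STRPTIME = Hash.new { |h, key| h[key] = Date.strptime(*key) }
  CIVIL    = Hash.new { |h, key| h[key] = Date.new(*key) }

  def self.parse(str)
    PARSE[str]
  end

  # '%F' (i.e. '%Y-%m-%d') mirrors Date.strptime's own default format.
  def self.strptime(str, fmt = '%F')
    STRPTIME[[str, fmt]]
  end

  def self.civil(year, month, day)
    CIVIL[[year, month, day]]
  end

  # Empties every cache, e.g. between batch jobs, to bound memory use.
  def self.clear
    [PARSE, STRPTIME, CIVIL].each(&:clear)
  end
end

d = DateCache.parse('2014-6-1')
puts d                                    # 2014-06-01
puts d.equal?(DateCache.parse('2014-6-1')) # true: the same cached object
```

Because every constructor has its own hash, a `parse` key can never collide with a `strptime` key.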

I'm sure I'm doing something wrong, and English is not my native language, so any help improving this class or this question will be greatly appreciated.

Thanks in advance

Benchmark of my attempt

Instead of loading data from file, I create an input array of 360,000 rows like this:

a= [['a', '2014-6-1', '1'],
    ['a', '2014-6-2', '2'],
    ['a', '2014-6-4', '3'],
    ['a', '2014-6-5', '4'],
    ['b', '2014-6-1', '1'],
    ['b', '2014-6-2', '2'],
    ['b', '2014-6-3', '3'],
    ['b', '2014-6-4', '4'],
    ['b', '2014-6-5', '5']]*40000

Benchmark code:

require 'benchmark'
require 'date'
# StaticDate and the input array `a` are defined above.

# Pre-split each date string into [year, month, day] integers for the Date.new rows.
b = a.map { |row| row + row[1].split('-').map(&:to_i) }

Benchmark.bm do |bm|
  bm.report('0. Date#New 1 date ') { 360_000.times { Date.new(2014, 1, 1) } }
  bm.report('1. Date#New        ') { b.each { |row| Date.new(row[3], row[4], row[5]) } }
  bm.report('2. Date#Strptime   ') { a.each { |row| Date.strptime(row[1], '%Y-%m-%d') } }
  bm.report('3. Date#Parse      ') { a.each { |row| Date.parse(row[1]) } }
  bm.report('4. StaticDate#New  ') { b.each { |row| StaticDate.instance(row[3], row[4], row[5]) } }
  bm.report('5. StaticDate#StrP ') { a.each { |row| StaticDate.strptime(row[1], '%Y-%m-%d') } }
  bm.report('6. StaticDate#Parse') { a.each { |row| StaticDate.parse(row[1]) } }
  bm.report('7. split to date   ') { a.each { |row| Date.new(*row[1].split('-').map(&:to_i)) } }
end

Benchmark result:

                         user     system      total        real
0. Date#New 1 date   0.297000   0.000000   0.297000 (  0.299017)
1. Date#New          0.390000   0.000000   0.390000 (  0.384022)
2. Date#Strptime     2.293000   0.000000   2.293000 (  2.306132)
3. Date#Parse        7.113000   0.000000   7.113000 (  7.101406)
4. StaticDate#New    0.188000   0.000000   0.188000 (  0.188011)
5. StaticDate#StrP   0.546000   0.000000   0.546000 (  0.558032)
6. StaticDate#Parse  0.171000   0.000000   0.171000 (  0.167010)
7. split to date     1.623000   0.000000   1.623000 (  1.641094)
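Dividing each uncached total by its cached counterpart gives the speed-up the cache buys for each constructor. The figures below are copied from the benchmark table above; the snippet only does the arithmetic.

```ruby
# Total CPU seconds from the benchmark table: [uncached, cached].
timings = {
  'Date.new'      => [0.390, 0.188], # rows 1 and 4
  'Date.strptime' => [2.293, 0.546], # rows 2 and 5
  'Date.parse'    => [7.113, 0.171]  # rows 3 and 6
}

timings.each do |name, (plain, cached)|
  puts format('%-14s %5.1fx faster', name, plain / cached)
end
# roughly 2.1x, 4.2x and 41.6x respectively
```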

Solution

According to the Date documentation:

All date objects are immutable; hence cannot modify themselves.
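Immutability is what makes sharing cached instances safe: arithmetic on a Date returns a new object and leaves the original untouched, so one caller cannot alter a cached Date behind another's back. For example:

```ruby
require 'date'

d = Date.new(2014, 6, 1)
later      = d + 1  # Date#+ returns a new Date
next_month = d >> 1 # Date#>> (add months) also returns a new Date

puts d          # 2014-06-01, unchanged
puts later      # 2014-06-02
puts next_month # 2014-07-01
```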

If creating date instances from a string is your bottleneck, you could use a hash to create and store them:

date_store = Hash.new { |h, k| h[k] = Date.strptime(k, '%Y-%m-%d') }

date_store['2014-6-1'] #=> #<Date: 2014-06-01 ((2456810j,0s,0n),+0s,2299161j)>
date_store['2014-6-2'] #=> #<Date: 2014-06-02 ((2456811j,0s,0n),+0s,2299161j)>
date_store['2014-6-3'] #=> #<Date: 2014-06-03 ((2456812j,0s,0n),+0s,2299161j)>

All results are saved in the hash:

date_store
#=> {"2014-6-1"=>#<Date: 2014-06-01 ((2456810j,0s,0n),+0s,2299161j)>,
#    "2014-6-2"=>#<Date: 2014-06-02 ((2456811j,0s,0n),+0s,2299161j)>,
#    "2014-6-3"=>#<Date: 2014-06-03 ((2456812j,0s,0n),+0s,2299161j)>}

Fetching a known key is merely a hash lookup: no parsing is performed and no new Date instance has to be created.
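To see the effect on data shaped like the question's, the sketch below instruments the same default block with a counter and maps every row through the store. The `parses` counter and the sample `rows` are illustrative only.

```ruby
require 'date'

parses = 0
date_store = Hash.new do |h, k|
  parses += 1 # count how often the block actually parses
  h[k] = Date.strptime(k, '%Y-%m-%d')
end

# Illustrative input: 3,000 rows but only two distinct date strings.
rows = [['a', '2014-6-1', '1'], ['a', '2014-6-2', '2'], ['b', '2014-6-1', '1']] * 1000

dates = rows.map { |row| date_store[row[1]] }

puts dates.size # 3000
puts parses     # 2
```

Rows that share a date string also share the very same Date object (`dates[0].equal?(dates[2])` is true). Note that the store is keyed by the raw string, so `'2014-6-1'` and `'2014-06-01'` would be parsed and cached separately.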
