Best way to parse a plain text file with a nested information structure

Problem description

The text file contains hundreds of entries like the one below (the format is an MT940 bank statement):

{1:F01AHHBCH110XXX0000000000}{2:I940X           N2}{3:{108:XBS/091502}}{4:
:20:XBS/091202/0001
:25:5887/507004-50
:28C:140/1
:60F:C0914CHF7789,
:61:0912021202D36,80NTRFNONREF//0887-1202-29-941
04392579-0 LUTHY + xxx, ZUR
:86:6034?60LUTHY + xxxx, ZUR vom 01.12.09 um 16:28 Karten-Nr. 2232
2579-0
:62F:C091202CHF52,2
:64:C091302CHF52,2
-}

Each entry should go into an Array of Hashes, like this:

[{"1"=>"F01AHHBCH110XXX0000000000",
  "2"=>"I940X           N2",
  "3"=>{"108"=>"XBS/091502"},
  # etc.
}]

I tried it with Treetop, but that did not seem to be the right tool: it is aimed more at input you want to evaluate or compute on, and I only want to extract the information.

grammar Mt940

  rule document
    part1:string spaces [:|/] spaces part2:document 
    {
      def eval(env={})
        return part1.eval, part2.eval
      end
    }
    / string
    /  '{' spaces document spaces '}' spaces
    {
      def eval(env={})
        return [document.eval]
      end
    }
  end
end

I also tried a regular expression:

matches = str.scan(/\A[{]?([0-9]+)[:]?([^}]*)[}]?\Z/i)

but handling the nested (recursive) blocks this way is difficult.

How can I solve this problem?

Recommended answer

Several open source MT940 parsers are available for Java and PHP. You can read their source code and port one to Ruby. If you are on JRuby, you can call a Java parser directly from your Ruby code.
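If a full port is more than you need, the nested `{key:value}` envelope can also be read with a small hand-written recursive scanner. The following is a minimal sketch, not taken from any existing MT940 parser: the method names `parse_blocks`, `parse_messages`, and `parse_fields` are hypothetical, and it assumes that field text inside a block never contains `{` or `}` and that every message starts with block `1`.

```ruby
require "strscan"

# Hypothetical helper: reads consecutive "{key:value}" blocks at the current
# scanner position and returns them as [key, value] pairs. A value that itself
# starts with "{" is parsed recursively, e.g. {3:{108:XBS/091502}}.
def parse_blocks(scanner)
  pairs = []
  while scanner.scan(/\{/)
    key = scanner.scan(/[^:{}]+/) or raise ArgumentError, "missing block key"
    scanner.scan(/:/) or raise ArgumentError, "expected ':' after key #{key}"
    # Assumption: plain values contain no braces, so they end at the next "}".
    value = scanner.check(/\{/) ? parse_blocks(scanner).to_h : scanner.scan(/[^{}]*/)
    scanner.scan(/\}/) or raise ArgumentError, "unclosed block #{key}"
    pairs << [key, value]
    scanner.skip(/\s+/)
  end
  pairs
end

# Hypothetical helper: parses a whole file and returns one Hash per message.
# Assumption: a new message begins whenever block "1" appears.
def parse_messages(text)
  scanner = StringScanner.new(text)
  scanner.skip(/\s+/)
  pairs = parse_blocks(scanner)
  raise ArgumentError, "unexpected input at offset #{scanner.pos}" unless scanner.eos?
  pairs.slice_when { |_prev, (key, _value)| key == "1" }.map(&:to_h)
end

# Hypothetical helper: splits the text of block "4" into [tag, value] pairs.
# A field starts on a line of the form ":tag:"; lines without that prefix are
# treated as continuations of the previous field. Pairs are used instead of a
# Hash because tags such as 61 and 86 can repeat within one statement.
def parse_fields(block4)
  fields = []
  block4.each_line(chomp: true) do |line|
    next if line.empty? || line == "-"
    if (m = line.match(/\A:(\d{2}[A-Z]?):(.*)\z/))
      fields << [m[1], m[2]]
    elsif fields.any?
      fields.last[1] << "\n" << line
    end
  end
  fields
end
```

With these definitions, `parse_messages(File.read(path))` would return an Array of Hashes with string keys such as `"1"`, `"2"`, `"3"`, and `"4"`, and `parse_fields(message["4"])` would break the statement body into its tagged fields. Interpreting the individual fields (balances, amounts, dates) is left out of this sketch.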

Another option is the OFX gem, which parses OFX files. Since your file is in MT940 format, you would first have to convert it to OFX with one of the freely available converters. This approach is practical if you are importing in a batch job, for example.

References

MT940 to OFX converter 1

MT940 to OFX converter 2
