如何解析此OFX文件? [英] How to parse this OFX file?

查看:149
本文介绍了如何解析此OFX文件?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

这是来自m bank的原始ofx文件(不用担心,没有敏感内容,我将所有交易切掉了中间部分)

this is an original ofx file as it comes from m bank (no worries, theres nothing sensitive, i cut out the middle part with all the transactions)

开放式金融交易所(OFX)是 数据流格式进行交换 不断发展的财务信息 来自微软的Open Financial 连接性(OFC)和Intuit的开放 交换文件格式.

Open Financial Exchange (OFX) is a data-stream format for exchanging financial information that evolved from Microsoft's Open Financial Connectivity (OFC) and Intuit's Open Exchange file formats.

现在我需要解析这个.我已经看到了问题,但这不是dup,因为我对如何执行此操作很感兴趣.

now i need to parse this. i already saw that question, but this is not a dup because i am interested in how to do this.

我敢肯定,我会找出一些能胜任的正则表达式,但这很丑陋,而且容易出错(如果格式更改,某些字段可能会丢失,格式/空白会有所不同,等等.) )

i am sure i could figure out some clever regexps that would do the job, but that is ugly and error vulnerable (if the format is changed, some fields may be missing, the formatting/white spaces are different etc etc...)

OFXHEADER:100
DATA:OFXSGML
VERSION:102
SECURITY:NONE
ENCODING:USASCII
CHARSET:1252
COMPRESSION:NONE
OLDFILEUID:NONE
NEWFILEUID:NONE
<OFX>
    <SIGNONMSGSRSV1>
        <SONRS>
            <STATUS>
                <CODE>0
                <SEVERITY>INFO
            </STATUS>
            <DTSERVER>20110420000000[+1:CET]
            <LANGUAGE>ENG
        </SONRS>
    </SIGNONMSGSRSV1>
    <BANKMSGSRSV1>
        <STMTTRNRS>
            <TRNUID>1
            <STATUS>
                <CODE>0
                <SEVERITY>INFO
            </STATUS>
            <STMTRS>
                <CURDEF>EUR
                <BANKACCTFROM>
                    <BANKID>20404
                    <ACCTID>02608983629
                    <ACCTTYPE>CHECKING
                </BANKACCTFROM>
                    <BANKTRANLIST>
                    <DTSTART>20110207
                    <DTEND>20110419
                    <STMTTRN>
                        <TRNTYPE>XFER
                        <DTPOSTED>20110205000000[+1:CET]
                        <TRNAMT>-6.12
                        <FITID>C74BD430D5FF2521
                        <NAME>unbekannt
                        <MEMO>BILLA DANKT  1265P K2 05.02.UM 17.49 
                    </STMTTRN>
                    <STMTTRN>
                        <TRNTYPE>XFER
                        <DTPOSTED>20110207000000[+1:CET]
                        <TRNAMT>-10.00
                        <FITID>C74BE0F90A657901
                        <NAME>unbekannt
                        <MEMO>AUTOMAT  13177 KARTE2 07.02.UM 10:22 
                    </STMTTRN>
............................. goes on like this ........................
                    <STMTTRN>
                        <TRNTYPE>XFER
                        <DTPOSTED>20110418000000[+1:CET]
                        <TRNAMT>-9.45
                        <FITID>C7A5071492D14D29
                        <NAME>unbekannt
                        <MEMO>HOFER DANKT  0408P K2 18.04.UM 18.47 
                    </STMTTRN>
                </BANKTRANLIST>
                <LEDGERBAL>
                    <BALAMT>1992.29
                    <DTASOF>20110420000000[+1:CET]
                </LEDGERBAL>
            </STMTRS>
        </STMTTRNRS>
    </BANKMSGSRSV1>
</OFX>

我目前正在使用此代码,该代码可以为我带来预期的效果:

i currently use this code which gives me the desired result:

<?

$files = array();
$files[] = '***_2011001.ofx';
$files[] = '***_2011002.ofx';
$files[] = '***_2011003.ofx';

system('touch file.csv && chmod 777 file.csv');
$fp = fopen('file.csv', 'w');

foreach($files as $file) {
    echo $file."...\n";
    $content = file_get_contents($file);

    $content = str_replace("\n","",$content);
    $content = str_replace(" ","",$content);

    $regex = '|<STMTTRN><TRNTYPE>(.+?)<DTPOSTED>(.+?)<TRNAMT>(.+?)<FITID>(.+?)<NAME>(.+?)<MEMO>(.+?)</STMTTRN>|';


    echo preg_match_all($regex,$content,$matches,PREG_SET_ORDER)." matches... \n";


    foreach($matches as $match) {
        echo ".";
        array_shift($match);
        fputcsv($fp, $match);
    }
    echo "\n";
}
echo "done.\n";
fclose($fp);

这真的很难看,如果这是一个有效的xml文件,我会为此亲自杀死自己,但是如何做得更好呢?

this is really ugly and if this was a valid xml file i would personally kill myself for that, but how to do it better?

推荐答案

考虑到该文件不是XML 甚至不是SGML ,您的代码似乎还不错.您唯一可以做的就是尝试制作一个更通用的类似SAX的解析器.也就是说,您只需一次在输入流中浏览一个块(其中的块可以是任何内容,例如一行或只是一定数量的字符).然后,每次遇到<ELEMENT>时都要调用一个回调函数.您甚至可以像建立一个解析器类一样幻想,在其中可以注册侦听特定元素的回调函数.

Your code seems fine, considering that the file isn't XML or even SGML. The only thing you could do is try to make a more generic SAX-like parser. That is, you simply go through the input stream one block at a time (where block can be anything, e.g. a line or simply a set amount of characters). Then, call a callback function every time you encounter an <ELEMENT>. You can even go as fanciful as building a parser class where you can register callback functions that listen to specific elements.

它将更加通用,并且不那么丑陋"(对于丑陋"的某些定义),但是将需要维护更多的代码.如果您需要大量(或以许多不同的形式)分析此文件格式,那就很高兴,也很高兴.如果仅是您发布的代码,则只需亲吻.

It will be more generic and less "ugly" (for some definition of "ugly") but it will be more code to maintain. Nice to do and nice to have if you need to parse this file format a lot (or in a lot of different variations). If your posted code is the only place you do this then just KISS.

这篇关于如何解析此OFX文件?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆