How to parse an OFX (Version 1.0.2) file in PHP?

Question

I have an OFX file downloaded from Citibank. Its DTD is defined in http://www.ofx.net/DownloadPage/Files/ofx102spec.zip (file OFXBANK.DTD), and the OFX file appears to be valid SGML. I am trying to parse it with DOMDocument in PHP 5.4.13, but I get several warnings and the file is not parsed. My code is:

$file = "source/ACCT_013.OFX";
$dtd = "source/ofx102spec/OFXBANK.DTD";
$doc = new DomDocument();
$doc->loadHTMLFile($file);
$doc->schemaValidate($dtd);
$dom->validateOnParse = true;

The OFX file starts with:

OFXHEADER:100
DATA:OFXSGML
VERSION:102
SECURITY:NONE
ENCODING:USASCII
CHARSET:1252
COMPRESSION:NONE
OLDFILEUID:NONE
NEWFILEUID:NONE

<OFX>
<SIGNONMSGSRSV1>
<SONRS>
<STATUS>
<CODE>0
<SEVERITY>INFO
</STATUS>
<DTSERVER>20130331073401
<LANGUAGE>SPA
</SONRS>
</SIGNONMSGSRSV1>
<BANKMSGSRSV1>
<STMTTRNRS>
<TRNUID>0
<STATUS>
<CODE>0
<SEVERITY>INFO
</STATUS>
<STMTRS>
<CURDEF>COP
<BANKACCTFROM> ...

I am open to installing and using any program on the server (CentOS) that can be called from PHP.

P.S.: This class http://www.phpclasses.org/package/5778-PHP-Parse-and-extract-financial-records-from-OFX-files.html does not work for me.

Recommended answer

First of all, even though XML is a subset of SGML, a valid SGML file is not necessarily a well-formed XML file. XML is stricter and does not use all the features that SGML offers.

DOMDocument is XML-based, not SGML-based, so it is not really compatible with this file.
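
To make the incompatibility concrete, here is a minimal sketch that feeds an OFX 1.x style fragment to DOMDocument's strict XML parser. The three-line fragment is a hypothetical example modeled on the file in the question, where `<CODE>` is never closed; the variable names are illustrative only.

```php
<?php
// Hypothetical fragment in OFX 1.x SGML style: <CODE> has a value but no end tag.
$sgml = "<STATUS>\n<CODE>0\n</STATUS>";

// Collect libxml errors internally instead of emitting PHP warnings.
$previous = libxml_use_internal_errors(true);

$doc = new DOMDocument();
$loaded = $doc->loadXML($sgml); // strict parsing, recover mode is off

$errors = libxml_get_errors();
libxml_clear_errors();
libxml_use_internal_errors($previous);

var_dump($loaded); // false: the fragment is not well-formed XML

// Print whatever the parser complained about.
foreach ($errors as $error) {
    echo trim($error->message), "\n";
}
```

The omitted end tag is legal in the SGML flavor that OFX 1.x uses, but an XML parser treats it as a fatal well-formedness error.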

Besides that problem, see section 2.2, Open Financial Exchange Headers, in Ofexfin1.doc, which explains that

The contents of an Open Financial Exchange file consist of a simple set of headers followed by contents defined by that header

and further:

A blank line follows the last header. Then (for type OFXSGML), the SGML-readable data begins with the <OFX> tag.

So locate the first blank line and strip everything up to it. Then convert the remaining SGML part to XML and load the result into DOMDocument:

$source = fopen('file.ofx', 'r');
if (!$source) {
    throw new Exception('Unable to open OFX file.');
}

// skip headers of OFX file
$headers = array();
$charsets = array(
    1252 => 'WINDOWS-1252', // OFX header CHARSET:1252 means Windows code page 1252
);
while(!feof($source)) {
    $line = trim(fgets($source));
    if ($line === '') {
        break;
    }
    list($header, $value) = explode(':', $line, 2);
    $headers[$header] = $value;
}

$buffer = '';

// dead-cheap SGML to XML conversion
// see as well http://www.hanselman.com/blog/PostprocessingAutoClosedSGMLTagsWithTheSGMLReader.aspx
while(!feof($source)) {

    $line = trim(fgets($source));
    if ($line === '') continue;

    $line = iconv($charsets[$headers['CHARSET']], 'UTF-8', $line);
    if (substr($line, -1, 1) !== '>') {
        list($tag) = explode('>', $line, 2);
        $line .= '</' . substr($tag, 1) . '>';
    }
    $buffer .= $line ."\n";
}

// use DOMDocument with non-standard recover mode
$doc = new DOMDocument();
$doc->recover = true;
$doc->preserveWhiteSpace = false;
$doc->formatOutput = true;
$save = libxml_use_internal_errors(true);
$doc->loadXML($buffer);
libxml_use_internal_errors($save);

echo $doc->saveXML();

This code example outputs the following (reformatted) XML, which shows that DOMDocument loaded the data properly:

<?xml version="1.0"?>
<OFX>
  <SIGNONMSGSRSV1>
    <SONRS>
      <STATUS>
        <CODE>0</CODE>
        <SEVERITY>INFO</SEVERITY>
      </STATUS>
      <DTSERVER>20130331073401</DTSERVER>
      <LANGUAGE>SPA</LANGUAGE>
    </SONRS>
  </SIGNONMSGSRSV1>
  <BANKMSGSRSV1>
    <STMTTRNRS>
      <TRNUID>0</TRNUID>
      <STATUS>
        <CODE>0</CODE>
        <SEVERITY>INFO</SEVERITY>
      </STATUS>
      <STMTRS><CURDEF>COP</CURDEF><BANKACCTFROM> ...</BANKACCTFROM>
</STMTRS>
    </STMTTRNRS>
  </BANKMSGSRSV1>
</OFX>
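
Once the data is in a DOMDocument, individual values can be read with DOMXPath. The following is a minimal sketch, not part of the original answer: `$xml` is a shortened stand-in for the output above, and interpreting `DTSERVER` as UTC is an assumption made only for illustration.

```php
<?php
// Shortened stand-in for the converted XML shown above.
$xml = '<OFX><SIGNONMSGSRSV1><SONRS>'
     . '<STATUS><CODE>0</CODE><SEVERITY>INFO</SEVERITY></STATUS>'
     . '<DTSERVER>20130331073401</DTSERVER><LANGUAGE>SPA</LANGUAGE>'
     . '</SONRS></SIGNONMSGSRSV1></OFX>';

$doc = new DOMDocument();
$doc->loadXML($xml);
$xpath = new DOMXPath($doc);

// string(...) yields the text content of the first matching node.
$code     = $xpath->evaluate('string(/OFX/SIGNONMSGSRSV1/SONRS/STATUS/CODE)');
$dtserver = $xpath->evaluate('string(/OFX/SIGNONMSGSRSV1/SONRS/DTSERVER)');

// The first 14 characters of an OFX datetime are YYYYMMDDHHMMSS.
// Assumption: no time zone suffix is present and the value is treated as UTC.
$serverTime = DateTime::createFromFormat(
    'YmdHis',
    substr($dtserver, 0, 14),
    new DateTimeZone('UTC')
);

echo "Status code: $code\n";
echo 'Server time: ', $serverTime->format('Y-m-d H:i:s'), "\n";
```

The same pattern applies to the statement data further down the tree; only the XPath expressions change.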

I do not know whether this can then be validated against the DTD; it might work. Also, this conversion is fragile: it requires each line to contain a single element, with a leaf element's value on the same line as its tag. If the SGML is written differently, the conversion will break.
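
If the line-per-element assumption does not hold, one possible alternative is to split the SGML into tags and text instead of lines. The sketch below is a swapped-in technique, not the conversion used above: `ofxSgmlToXml()` is a hypothetical helper, it assumes the input has already been converted to UTF-8, and it still does not close an element that has neither a value nor an end tag.

```php
<?php
// Hypothetical helper: close value-carrying elements whose end tag was omitted.
function ofxSgmlToXml($sgml)
{
    // Split into tag tokens and text tokens; the capture group keeps the tags.
    $tokens = preg_split(
        '/(<[^>]+>)/',
        $sgml,
        -1,
        PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY
    );

    $xml = '';
    $lastOpened = null; // opening tag seen just before the current token
    $pending = null;    // element that received a value but no end tag yet

    foreach ($tokens as $token) {
        if ($token[0] !== '<') {
            $text = trim($token);
            if ($text === '') {
                continue; // whitespace between tags
            }
            // Escape bare "&" and ">" without double-encoding existing entities.
            $xml .= htmlspecialchars($text, ENT_NOQUOTES | ENT_SUBSTITUTE, 'UTF-8', false);
            $pending = $lastOpened;
            $lastOpened = null;
            continue;
        }

        $isClosing = ($token[1] === '/');
        $name = $isClosing ? substr($token, 2, -1) : substr($token, 1, -1);

        // Any tag other than the matching end tag means the end tag was omitted.
        if ($pending !== null && !($isClosing && $name === $pending)) {
            $xml .= '</' . $pending . '>';
        }
        $pending = null;
        $lastOpened = $isClosing ? null : $name;
        $xml .= $token;
    }

    if ($pending !== null) {
        $xml .= '</' . $pending . '>';
    }
    return $xml;
}

// Usage: values on their own line, several elements per line, explicit end tags.
$sgml = "<OFX><STATUS>\n<CODE>\n0\n<SEVERITY>INFO</SEVERITY></STATUS><NAME>A & B</OFX>";
$converted = ofxSgmlToXml($sgml);
echo $converted, "\n";

$doc = new DOMDocument();
$ok = $doc->loadXML($converted);
```

Because the result should already be well-formed for inputs like this one, the strict parser can be used here and recover mode becomes a fallback rather than a requirement.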
