XML file to a phpMyAdmin Database

Problem description

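I have an XML file (a snippet of the first entry, plus the header, is below) that I want to load into a newly created database. It looks like the XML will need at least two different tables.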

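I also want the import to ignore some particular elements that I won't be needing; they are marked IGNORE in the snippet.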

Also, should I create the tables and columns first, or can the interface create them for me?

<?xml version="1.0" encoding="UTF-8" ?>
<npidata xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="http://www.npi.gov.au/includes/download/npi-data-download.xsd">
<report>
    <year>2014</year>
    <registered_business_name>W H HECK &amp; SONS PTY LTD</registered_business_name> - IGNORE
    <abn>91009661401</abn> - IGNORE
    <acn>009661401</acn> - IGNORE
    <sub_threshold>N</sub_threshold>
    <data_start_date>2013-07-01</data_start_date>
    <data_end_date>2014-06-30</data_end_date>
    <first_published_date>2015-03-31</first_published_date>
    <last_updated_date>2015-03-31</last_updated_date>
    <number_of_employees>55</number_of_employees> - IGNORE
    <facility_name>Rocky Point Sugarmill</facility_name>
    <jurisdiction_facility_id>Q014HEK001</jurisdiction_facility_id> - PRIMARY KEY
    <jurisdiction>QLD</jurisdiction> - IGNORE
    <site_address_street>MILL RD</site_address_street> - IGNORE
    <site_address_suburb>Woongoolba</site_address_suburb>
    <site_address_state>QLD</site_address_state>
    <site_address_postcode>4207</site_address_postcode>
    <main_activities>Sugarmilling</main_activities>
    <site_latitude>-27.735666</site_latitude>
    <site_longitude>153.327611</site_longitude>
    <anzsic_codes> - IGNORE
        <anzsic_code> - IGNORE
            <type>Primary</type><code>1181</code> - IGNORE
            <name>Sugar Manufacturing</name> - IGNORE
        </anzsic_code> - IGNORE
    </anzsic_codes> - IGNORE
    <public_contact> - IGNORE
        <title>Mr</title>
        <name>David</name> - IGNORE
        <surname>Heck</surname> - IGNORE
        <position>Managing Director</position> - IGNORE
        <phone>(07) 5547 4500</phone> - IGNORE
        <fax>(07) 5546 1233</fax> - IGNORE
        <email>david@heckgroup.com.au</email> - IGNORE
    </public_contact> - IGNORE
    <emissions>
        <emission>
            <substance>Ethanol</substance>
            <destination>Water</destination>
            <quantity_in_kg>19518</quantity_in_kg>
            <mass_balance_estimation>N</mass_balance_estimation>
            <engineering_calculations_estimation>Y</engineering_calculations_estimation>
            <direct_measurement_estimation>N</direct_measurement_estimation>
            <emission_factors_estimation>N</emission_factors_estimation>
            <approved_alternative_estimation>N</approved_alternative_estimation>
        </emission>
        <emission>
            <substance>Ethanol</substance>
            <destination>Air Fugitive</destination>
            <quantity_in_kg>2114.45</quantity_in_kg>
            <mass_balance_estimation>N</mass_balance_estimation>
            <engineering_calculations_estimation>Y</engineering_calculations_estimation>
            <direct_measurement_estimation>N</direct_measurement_estimation>
            <emission_factors_estimation>N</emission_factors_estimation>
            <approved_alternative_estimation>N</approved_alternative_estimation>
        </emission>
        <emission>
            <substance>Ethanol</substance>
            <destination>Air Total</destination>
            <quantity_in_kg>2114.45</quantity_in_kg>
        </emission>
        <emission>
            <substance>Hydrochloric acid</substance>
            <destination>Land</destination>
            <quantity_in_kg>2282.95</quantity_in_kg>
            <mass_balance_estimation>N</mass_balance_estimation>
            <engineering_calculations_estimation>Y</engineering_calculations_estimation>
            <direct_measurement_estimation>N</direct_measurement_estimation>
            <emission_factors_estimation>N</emission_factors_estimation>
            <approved_alternative_estimation>N</approved_alternative_estimation>
        </emission>
        <emission>
            <substance>Particulate Matter 10.0 um</substance>
            <destination>Air Fugitive</destination>
            <quantity_in_kg>398.91285</quantity_in_kg>
            <mass_balance_estimation>N</mass_balance_estimation>
            <engineering_calculations_estimation>Y</engineering_calculations_estimation>
            <direct_measurement_estimation>N</direct_measurement_estimation>
            <emission_factors_estimation>N</emission_factors_estimation>
            <approved_alternative_estimation>N</approved_alternative_estimation>
        </emission>
        <emission>
            <substance>Particulate Matter 10.0 um</substance>
            <destination>Air Total</destination>
            <quantity_in_kg>398.91285</quantity_in_kg>
        </emission>
    </emissions>
    <transfers></transfers> - IGNORE
    <pollution_control_devices></pollution_control_devices> - IGNORE
    <cleaner_production_activities></cleaner_production_activities> - IGNORE
</report>

This XML file has over 11,000 entries, so entering them manually is out of the question.

Solution

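You're going to have to do some manipulation on the file first. It's true that phpMyAdmin can import and export XML data, but the format phpMyAdmin uses isn't the format your data is currently in.

The unfortunate thing about XML is that it makes a great wrapper for any kind of data, but interoperability requires both systems to speak the same dialect, which I hardly ever see work out in practice.

Is there any chance of getting the data by some other means, for instance by exporting it to a different format?

Here's an example of how phpMyAdmin expects the data to be formatted. I only picked out a bit from the middle; you can see more detail by exporting an existing table.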

<database name="stackoverflowtest">
    <table name="tblUsers">
        <column name="id">1</column>
        <column name="name">Bob</column>
    </table>
    <table name="15">
        <column name="id">2</column>
        <column name="name">Jim</column>
    </table>
</database>
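
As a rough illustration of that layout, the sketch below builds the same kind of fragment with Python's standard `xml.etree.ElementTree`. The function name `rows_to_pma_fragment` and the sample rows are made up for this example, and it reproduces only the fragment shown above, not a complete export file, so compare the result against a real export from your own installation before trying to import it.

```python
import xml.etree.ElementTree as ET


def rows_to_pma_fragment(db_name, table_name, rows):
    """Build a <database> element holding one <table> element per row.

    Hypothetical helper: it mirrors only the fragment shown above.
    """
    database = ET.Element("database", {"name": db_name})
    for row in rows:
        # Each row becomes its own <table> element carrying the table name.
        table = ET.SubElement(database, "table", {"name": table_name})
        for column_name, value in row.items():
            column = ET.SubElement(table, "column", {"name": column_name})
            column.text = str(value)
    return database


# Sample rows invented for the illustration.
rows = [{"id": 1, "name": "Bob"}, {"id": 2, "name": "Jim"}]
fragment = rows_to_pma_fragment("stackoverflowtest", "tblUsers", rows)
xml_text = ET.tostring(fragment, encoding="unicode")
print(xml_text)
```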

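If you can't get the data in another format, I suggest using a text-processing tool to manipulate the data you do have. I think CSV would be a better format for your import, and with some work you could write an awk, sed, Perl, or Python script to produce it (almost any language would do, but those are common choices for manipulating text files). You might even find an XML parsing library that lets you read the file as actual XML elements rather than having to parse the text yourself. That's probably the ideal solution, but even without it you could look for each line containing <year> and </year> and pull out the data in between. Add a comma, then repeat for <data_start_date> and so on until you reach the end of a <report>/</report> block, and add a newline (quoting any value that itself contains a comma). Loop that over the whole XML file and you'll have a well-formatted CSV file that phpMyAdmin will gladly load.

MySQL itself uses a slightly different flavor of XML (still different from what you have); it looks more like this: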

<row>
    <field name="id">1</field>
    <field name="name">Bob</field>
</row>
<row>
    <field name="id">2</field>
    <field name="name">Jim</field>
</row>

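I think CSV is going to be easier than working with XML. Unfortunately the solution will take some work, but hopefully I've given you some resources to start with. There are countless tutorials and examples that use the tools I mentioned to manipulate files, though again I suggest looking into an XML-friendly Python, Perl, or even PHP library first.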
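
For the XML-library route, here is a minimal sketch using Python's standard `xml.etree.ElementTree` and `csv` modules. It writes one CSV for facilities and one for emissions, linked by `jurisdiction_facility_id`, and keeps only the elements that are not marked IGNORE in the question. The function name `reports_to_csv`, the two-table split, and the column order are assumptions made for illustration, and the embedded sample is a shortened copy of the snippet with a closing `</npidata>` added.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Elements kept for each table; anything not listed here is skipped.
FACILITY_FIELDS = [
    "jurisdiction_facility_id", "year", "sub_threshold",
    "data_start_date", "data_end_date",
    "first_published_date", "last_updated_date",
    "facility_name", "site_address_suburb", "site_address_state",
    "site_address_postcode", "main_activities",
    "site_latitude", "site_longitude",
]
EMISSION_FIELDS = [
    "substance", "destination", "quantity_in_kg",
    "mass_balance_estimation", "engineering_calculations_estimation",
    "direct_measurement_estimation", "emission_factors_estimation",
    "approved_alternative_estimation",
]


def reports_to_csv(xml_source, facilities_out, emissions_out):
    """Stream <report> elements into two CSV files; return the report count."""
    facilities = csv.writer(facilities_out)
    emissions = csv.writer(emissions_out)
    facilities.writerow(FACILITY_FIELDS)
    emissions.writerow(["jurisdiction_facility_id"] + EMISSION_FIELDS)
    count = 0
    for _, element in ET.iterparse(xml_source, events=("end",)):
        if element.tag != "report":
            continue
        # findtext only looks at direct children, so nested <name> tags
        # under <public_contact> or <anzsic_code> are never picked up.
        facilities.writerow(
            [element.findtext(f, default="") for f in FACILITY_FIELDS]
        )
        facility_id = element.findtext("jurisdiction_facility_id", default="")
        for emission in element.iterfind("emissions/emission"):
            emissions.writerow(
                [facility_id]
                + [emission.findtext(f, default="") for f in EMISSION_FIELDS]
            )
        element.clear()  # free memory; the real file has 11,000+ reports
        count += 1
    return count


# Shortened sample based on the snippet in the question.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8" ?>
<npidata>
<report>
    <year>2014</year>
    <registered_business_name>W H HECK &amp; SONS PTY LTD</registered_business_name>
    <facility_name>Rocky Point Sugarmill</facility_name>
    <jurisdiction_facility_id>Q014HEK001</jurisdiction_facility_id>
    <emissions>
        <emission>
            <substance>Ethanol</substance>
            <destination>Water</destination>
            <quantity_in_kg>19518</quantity_in_kg>
        </emission>
        <emission>
            <substance>Hydrochloric acid</substance>
            <destination>Land</destination>
            <quantity_in_kg>2282.95</quantity_in_kg>
        </emission>
    </emissions>
</report>
</npidata>
"""

facilities_out = io.StringIO()
emissions_out = io.StringIO()
count = reports_to_csv(io.BytesIO(SAMPLE), facilities_out, emissions_out)
print(count, "report(s) converted")
print(facilities_out.getvalue())
print(emissions_out.getvalue())
```

For the real file you would pass its path as `xml_source` and open the two output files with `newline=""` so the `csv` module controls the line endings. The header row of each CSV lists the column names the matching table would need.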
