PostgreSQL to XML with 3 Tables


Question


I'm a trainee in a small dev team, and my project leader wants me to write a function that exports PostgreSQL data to an XML file. Unfortunately, I only know how to write an export to CSV.

There are 3 different tables, and he wants the output to look like this (XML view):

<Table1 Col1=".." Col2="..">
    <Table2 Col1="...">
        <Table3 Col1=".." Col2="" Col3=".." Col4="..." />
        <Table3 Col1=".." Col2="" Col3=".." Col4="..." />
        <Table3 Col1=".." Col2="" Col3=".." Col4="..." />
    </Table2>
    <Table2>
        <Table3>....</Table3>
    </Table2>
</Table1>
<Table1 Col1="xxx" Col2="xxx">
 ...

Each tag corresponds to one of my table names. How do I write the code to export this?

Other questions on StackOverflow are mainly focused on exporting a single table, so I hope that this question will also help others trying to export multiple tables.

Solution

You have three levels of nested tables.

Sample data:

CREATE TABLE a(
  a_id integer primary key,
  name text
);

CREATE TABLE b(
  b_id integer primary key,
  a_id integer references a(a_id),
  val text
);

CREATE TABLE c(
  c_id serial primary key,
  b_id integer references b(b_id),
  blah text
);

INSERT INTO a(a_id, name) VALUES (1, 'fred'),(2, 'bert');

INSERT INTO b(b_id, a_id, val) VALUES 
(11, 1, 'x'), (12, 1, 'y'), (21, 2, 'a'), (22, 2, 'b');

INSERT INTO c(b_id, blah) VALUES
(11, 'whatever'), (11, 'gah'), (12, 'borkbork'), (22, 'fuzz');

Method 1: Do a left join, handle XML in the client

The simplest way to handle this is to do a left join over all three tables, ordered from outermost to innermost. Then you iterate down the result set, closing one element and opening another whenever the subject at that level changes.

select *
from a left join b on (a.a_id = b.a_id)
       left join c on (b.b_id = c.b_id)
order by a.a_id, b.b_id, c.c_id;

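Then loop over the returned rows, closing and opening elements whenever an ID changes. In pseudocode:

prev_row = null
for each cur_row in result set {
   a_changed = (prev_row == null || cur_row[a_id] != prev_row[a_id]);
   b_changed = (a_changed || cur_row[b_id] != prev_row[b_id]);
   // close the previous row's elements, innermost first
   if (prev_row != null && b_changed && prev_row[b_id] != null) {
      emit_close_tableb();
   }
   if (prev_row != null && a_changed) {
      emit_close_tablea();
   }
   // open the current row's elements, outermost first
   if (a_changed) {
      emit_open_tablea(cur_row);
   }
   if (b_changed && cur_row[b_id] != null) {
      emit_open_tableb(cur_row);
   }
   // the left joins return NULLs for parents without children
   if (cur_row[c_id] != null) {
      emit_tablec(cur_row);
   }
   prev_row = cur_row;
}
// close whatever is still open after the last row
if (prev_row != null) {
   if (prev_row[b_id] != null) emit_close_tableb();
   emit_close_tablea();
}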

To write the XML you'd use something like XMLWriter. To read the query data you can use something like PDO or whatever driver you prefer. If the data set is large consider using a cursor to read the data.
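As a rough illustration, the loop can be sketched in Python with the standard library's streaming XMLGenerator standing in for XMLWriter. This is only a sketch under assumptions: rows are taken to arrive as dicts keyed by column name, already sorted by a_id, b_id, c_id, and the names rows_to_xml, _emit_fields and SAMPLE_ROWS are made up for the example. It writes child elements rather than attributes, and it skips the c element entirely when a b row has no children.

```python
import io
from xml.sax.saxutils import XMLGenerator

# Rows as the three-table left join would return them, already sorted by
# a_id, b_id, c_id. None stands for SQL NULL (b 21 has no rows in c).
SAMPLE_ROWS = [
    {"a_id": 1, "name": "fred", "b_id": 11, "val": "x", "c_id": 1, "blah": "whatever"},
    {"a_id": 1, "name": "fred", "b_id": 11, "val": "x", "c_id": 2, "blah": "gah"},
    {"a_id": 1, "name": "fred", "b_id": 12, "val": "y", "c_id": 3, "blah": "borkbork"},
    {"a_id": 2, "name": "bert", "b_id": 21, "val": "a", "c_id": None, "blah": None},
    {"a_id": 2, "name": "bert", "b_id": 22, "val": "b", "c_id": 4, "blah": "fuzz"},
]


def _emit_fields(gen, row, columns):
    """Write each listed column as a child element holding its text value."""
    for col in columns:
        gen.startElement(col, {})
        gen.characters(str(row[col]))
        gen.endElement(col)


def rows_to_xml(rows):
    """Turn sorted join rows into nested <a>/<b>/<c> elements inside <items>."""
    out = io.StringIO()
    gen = XMLGenerator(out, encoding="utf-8")
    gen.startElement("items", {})
    prev = None
    for row in rows:
        a_changed = prev is None or row["a_id"] != prev["a_id"]
        b_changed = a_changed or row["b_id"] != prev["b_id"]
        # Close the previous row's elements, innermost first.
        if prev is not None and b_changed and prev["b_id"] is not None:
            gen.endElement("b")
        if prev is not None and a_changed:
            gen.endElement("a")
        # Open the current row's elements, outermost first.
        if a_changed:
            gen.startElement("a", {})
            _emit_fields(gen, row, ("a_id", "name"))
        if b_changed and row["b_id"] is not None:
            gen.startElement("b", {})
            _emit_fields(gen, row, ("b_id", "val"))
        # The left joins yield None when a parent has no children.
        if row["c_id"] is not None:
            gen.startElement("c", {})
            _emit_fields(gen, row, ("c_id", "blah"))
            gen.endElement("c")
        prev = row
    # Close whatever is still open after the last row.
    if prev is not None:
        if prev["b_id"] is not None:
            gen.endElement("b")
        gen.endElement("a")
    gen.endElement("items")
    return out.getvalue()


if __name__ == "__main__":
    print(rows_to_xml(SAMPLE_ROWS))
```

Because the generator writes as it goes, only the current and previous row need to be held in memory; for a real export you would pass an open file instead of the StringIO buffer.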

This works well, but it transfers a lot of redundant data: each outer-table row is sent again with every inner-table row joined to it, so an outer row with n inner rows crosses the wire n times.


To reduce the excess data exchanged, you can select only the IDs for the outer tables:

select a.a_id, b.b_id, c.*
from a left join b on (a.a_id = b.a_id)
       left join c on (b.b_id = c.b_id)
order by a.a_id, b.b_id, c.c_id;

... then, when you switch to a new tablea / tableb, SELECT the rest of that row's columns. You'll probably want a second connection for this so you don't disrupt the result set and cursor state on the main connection you're reading rows from.
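The fetch-on-change idea might look roughly like this in Python. It is a sketch only: hydrate is a made-up name, and fetch_a and fetch_b are hypothetical callables that would each run a single-row SELECT by primary key on the second connection. Here they are left abstract so the control flow stays visible.

```python
_UNSET = object()  # sentinel that compares unequal to every real ID


def hydrate(slim_rows, fetch_a, fetch_b):
    """Yield (a_row, b_row, c_row), fetching a parent row only when its ID changes.

    slim_rows: dicts from the IDs-only query, sorted by a_id, b_id, c_id.
    fetch_a, fetch_b: hypothetical lookups, e.g. a SELECT by primary key
    issued on a second connection.
    """
    cur_a_id = cur_b_id = _UNSET
    a_row = b_row = None
    for row in slim_rows:
        if row["a_id"] != cur_a_id:
            cur_a_id = row["a_id"]
            a_row = fetch_a(cur_a_id)
            cur_b_id = _UNSET  # a new parent always starts a new b
        if row["b_id"] != cur_b_id:
            cur_b_id = row["b_id"]
            # NULL b_id: this a has no b rows, so there is nothing to fetch.
            b_row = fetch_b(cur_b_id) if cur_b_id is not None else None
        yield a_row, b_row, row
```

Each a and b row is then fetched once per group rather than once per joined row, at the cost of one extra round trip per parent.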

Method 2: Do it all in PostgreSQL

For smaller data sets, or for the inner levels of bigger data sets, you can use PostgreSQL's XML support to construct the XML documents, e.g.:

WITH xmlinput AS (
  SELECT a, b, c
  FROM a
  LEFT JOIN b ON (a.a_id = b.a_id)
  LEFT JOIN c on (b.b_id = c.b_id)
  ORDER BY a.a_id, b.b_id, c.c_id
)
SELECT
  XMLELEMENT(name items,
    xmlagg(
      XMLELEMENT(name a,
        XMLFOREST((a).a_id AS a_id, (a)."name" AS name),
        b_xml
      )
    ORDER BY (a).a_id)
  ) AS output
FROM
(
  SELECT
    a,
    xmlagg(
      XMLELEMENT(name b,
        XMLFOREST((b).b_id AS b_id, (b).val AS val),
        c_xml
      )
    ORDER BY (b).b_id)
    AS b_xml
  FROM
  (
    SELECT
      a, b,
      xmlagg(
        XMLELEMENT(name c,
          XMLFOREST((c).c_id AS c_id, (c).blah AS blah)
        )
      ORDER BY (c).c_id)
      AS c_xml
    FROM xmlinput
    GROUP BY a, b
  ) c_as_xml
  GROUP BY a
) b_as_xml;

... but really, you've got to be some kind of masochist to write code like that. Though it could prove to be fairly fast.

To understand the query you'll need to read the PostgreSQL XML docs. The wacky syntax was dreamed up by the SQL/XML committee; don't blame us.

Also note that row-variables are used heavily in the above code to keep it organized. a, b and c are passed as whole rows to outer layers of the query. This avoids the need to mess with aliases when names collide. The syntax (a).a_id, etc, means "the a_id field of the row-variable a". See the PostgreSQL manual for details.

The above uses a better XML structure (see the note below). If you want to emit attributes instead of elements, you can change the XMLFOREST calls to XMLATTRIBUTES calls.

Output:

<items><a><a_id>1</a_id><name>fred</name><b><b_id>11</b_id><val>x</val><c><c_id>1</c_id><blah>whatever</blah></c><c><c_id>2</c_id><blah>gah</blah></c></b><b><b_id>12</b_id><val>y</val><c><c_id>3</c_id><blah>borkbork</blah></c></b></a><a><a_id>2</a_id><name>bert</name><b><b_id>21</b_id><val>a</val><c/></b><b><b_id>22</b_id><val>b</val><c><c_id>4</c_id><blah>fuzz</blah></c></b></a></items>

or, pretty-printed:

<?xml version="1.0" encoding="utf-16"?>
<items>
    <a>
        <a_id>1</a_id>
        <name>fred</name>
        <b>
            <b_id>11</b_id>
            <val>x</val>
            <c>
                <c_id>1</c_id>
                <blah>whatever</blah>
            </c>
            <c>
                <c_id>2</c_id>
                <blah>gah</blah>
            </c>
        </b>
        <b>
            <b_id>12</b_id>
            <val>y</val>
            <c>
                <c_id>3</c_id>
                <blah>borkbork</blah>
            </c>
        </b>
    </a>
    <a>
        <a_id>2</a_id>
        <name>bert</name>
        <b>
            <b_id>21</b_id>
            <val>a</val>
            <c />
        </b>
        <b>
            <b_id>22</b_id>
            <val>b</val>
            <c>
                <c_id>4</c_id>
                <blah>fuzz</blah>
            </c>
        </b>
    </a>
</items>
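One way to get an indented version from the compact output is Python's standard xml.dom.minidom, sketched below. The helper name pretty_print is made up for the example, and the XML declaration that minidom writes differs from the one shown above, which came from a different tool. The input is assumed to be compact, with no whitespace between elements.

```python
from xml.dom import minidom


def pretty_print(xml_text, indent="    "):
    """Return xml_text re-indented, with each element on its own line."""
    return minidom.parseString(xml_text).toprettyxml(indent=indent)


if __name__ == "__main__":
    compact = "<items><a><a_id>1</a_id><name>fred</name></a></items>"
    print(pretty_print(compact))
```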

Please emit better XML

On a side note, using attributes like that in XML seems tempting, but it quickly gets difficult and ugly to work with. Please just use normal XML elements:

<Table1>
    <Nr>1</Nr>
    <Name>blah</Name>
    <Table2>
        <Nr>1</Nr>
        <Table3>
            <Col1>42</Col1>
            <Col2>...</Col2>
            <Col3>...</Col3>
            <Col4>...</Col4>
            ...
        </Table3>
    </Table2>
</Table1>
