Accessing external XML files as variables in a PSQL script (sourced from a bash script)


Problem description



Following this example, I am having trouble using PostgreSQL variables in a *.sql script:

  • I want to iterate over a number of XML data files, using a BASH script

  • the BASH script assigns each XML file name to a variable that is passed to the SQL script

  • the SQL script, called by that BASH script, loads those data into PostgreSQL

If I source the XML files directly, there is no problem; however, I cannot access that variable in my SQL script:

In my SQL script (hmdb.sql) I can access the PSQL variable :bash_var (passed from the BASH script):

\echo '\nEXTERNAL VARIABLE (= "datafile", HERE):' :bash_var '\n'

and/or directly reference the XML file,

datafile text := 'hmdb/hmdb.xml';

but not as a variable:

datafile text := 'bash_var';


hmdb.sh

#!/bin/bash

DATA_DIR=data

for file in "$DATA_DIR"/*.xml
  do
    # Strip the directory part, leaving only the file name.
    bash_var="${file##*/}"
    echo "$bash_var"
    psql -d hmdb -v bash_var="$bash_var" -f hmdb.sql
done

Solution

OK, here is my solution.

I posted a more detailed answer on my Persagen.com blog.

Basically, I decided to abrogate the DO $$DECLARE ... approach (described in SO 49950384) in favor of the simplified approach, below.

I am then able to access the BASH / PSQL shared variable, :bash_var, thusly:

xpath('//metabolite', XMLPARSE(DOCUMENT convert_from(pg_read_binary_file(:'bash_var'), 'UTF8')))
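
As a rough illustration of why this form works where datafile text := 'bash_var'; did not: psql replaces :'name' with the variable's value, quoted as an SQL literal, but it does not interpolate variables inside quoted literals, and that includes the dollar-quoted body of a DO block. The sketch below is not part of the original answer; demo_sql and run_demo are hypothetical helper names, and it assumes a reachable database named hmdb.

```shell
#!/bin/bash
# Hypothetical helper (illustration only): print a tiny SQL script that
# contrasts the plain literal 'bash_var' with the psql form :'bash_var'.
demo_sql() {
  cat <<'SQL'
-- A plain string literal: psql leaves it alone, so this selects the text bash_var.
SELECT 'bash_var' AS literal_text;
-- psql replaces :'bash_var' with the variable's value, quoted as an SQL literal.
SELECT :'bash_var' AS variable_value;
SQL
}

# Hypothetical wrapper: feed the demo SQL to psql on stdin with the variable set.
# Assumes a reachable database named hmdb, as in the scripts above.
run_demo() {
  demo_sql | psql -d hmdb -v bash_var="$1" -f -
}
```

Calling run_demo hmdb.xml should return bash_var for the first query and hmdb.xml for the second.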

Here is a sample SQL script, illustrating that usage:

hmdb.sql

\c hmdb

CREATE TABLE IF NOT EXISTS hmdb_identifiers (
  id SERIAL,
  accession VARCHAR(15) NOT NULL,
  name VARCHAR(300) NOT NULL,
  cas_number VARCHAR(12),
  pubchem_cid INT,
  PRIMARY KEY (id),
  UNIQUE (accession)
);

\echo '\n[hmdb.sql] bash_var:' :bash_var '\n'

-- UPDATE (2019-05-15): SEE MY COMMENTS BELOW RE: TEMP TABLE!
CREATE TEMP TABLE tmp_table AS 
SELECT 
  (xpath('//accession/text()', x))[1]::text::varchar(15) AS accession
  ,(xpath('//name/text()', x))[1]::text::varchar(300) AS name 
  ,(xpath('//cas_registry_number/text()', x))[1]::text::varchar(12) AS cas_number 
  ,(xpath('//pubchem_compound_id/text()', x))[1]::text::int AS pubchem_cid 
-- FROM unnest(xpath('//metabolite', XMLPARSE(DOCUMENT convert_from(pg_read_binary_file('hmdb/hmdb.xml'), 'UTF8')))) x
FROM unnest(xpath('//metabolite', XMLPARSE(DOCUMENT convert_from(pg_read_binary_file(:'bash_var'), 'UTF8')))) x
;

INSERT INTO hmdb_identifiers (accession, name, cas_number, pubchem_cid)
  SELECT lower(accession), lower(name), lower(cas_number), pubchem_cid FROM tmp_table;

DROP TABLE tmp_table;


SQL script notes:

  • In the xpath statements I recast the ::text result (e.g. ::text::varchar(15)) to match the Postgres table schema.

  • More significantly, if I did not recast the datatypes in the xpath statement and a field entry (e.g. name length) exceeded the SQL varchar(300) length limit, those data threw a PSQL error and the table did not update (i.e. a blank table results).
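
The second note follows from how PostgreSQL treats over-length strings: an explicit cast to varchar(n) truncates the value to n characters without raising an error, while storing an over-length value in a varchar(n) column without a cast raises an error. A minimal sketch, assuming a reachable database named hmdb; the helper name cast_demo and the temporary table cast_demo_t are made up for illustration:

```shell
#!/bin/bash
# Hypothetical helper (illustration only): contrast an explicit varchar(n)
# cast with an uncast insert of an over-length value.
cast_demo() {
  psql -d hmdb -f - <<'SQL'
CREATE TEMP TABLE cast_demo_t (name varchar(3));
-- Explicit cast: 'abcdef' is truncated to 'abc' and the insert succeeds.
INSERT INTO cast_demo_t VALUES ('abcdef'::varchar(3));
-- No cast: PostgreSQL rejects the value as too long for varchar(3).
INSERT INTO cast_demo_t VALUES ('abcdef');
SELECT name FROM cast_demo_t;
SQL
}
```

Only the truncated row abc should be in the table afterwards; the second INSERT fails with a "value too long" error. The same silent truncation applies to the casts in hmdb.sql, so long names are shortened rather than rejected.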

I uploaded the XML data files used in this answer at this Gist

https://gist.github.com/victoriastuart/d1b1959bd31e4de5ed951ff4fe3c3184


UPDATE (2019-05-15)

In follow-on work, detailed in my research blog post Exporting Plain Text to PostgreSQL, I directly load XML data into PostgreSQL, rather than using temp tables.

TL;DR: in that project, I observed the following improvements.

Parameter | Temp Tables  | Direct Import | Reduction
    Time: | 1048 min     | 1.75 min      | 599x
   Space: | 252,000 MB   | 18 MB         | 14,000x
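
The Reduction column is simply the ratio of the two measurements in each row. A quick check of that arithmetic (the function name reduction_factor is made up for this sketch):

```shell
#!/bin/bash
# Hypothetical helper: ratio of the temp-table figure to the direct-import
# figure, rounded to the nearest whole number.
reduction_factor() {
  awk -v before="$1" -v after="$2" 'BEGIN { printf "%.0f\n", before / after }'
}

reduction_factor 1048 1.75    # time, in minutes
reduction_factor 252000 18    # space, in MB
```

This prints 599 and 14000, matching the table.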
