使用PSQL将Greenplum归档 [英] Greenplum to file using PSQL

查看:378
本文介绍了使用PSQL将Greenplum归档的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在尝试使用PSQL和\copy将数据从Green-plum导出到带有管道定界符的文本文件(客户端)。在输出中,我看到单斜杠转换为双斜杠,制表符转换为\t。
示例
N\A转换为N\\A

I'm trying to export data from Green-plum to a text file(client) with pipe delimiter using PSQL and \copy. In the output i see single slash is converted to double slash and tab is converted \t. Example N\A is converted to N\\A

那么如何只得到N\A而不是N\ \A只是空格而不是\t?

So how to get just N\A instead N\\A and just spaces instead of \t ?

注意:我只允许使用\copy。由于我的文件很大,在使用SED或Perl进行查找和替换时会出现空间问题

Note: i`m allowed to use only \copy. Since my file is huge im getting space issue while use SED or Perl for find and replace

推荐答案

假设您没有任何文件 ^字符,可以将其用作转义字符。

Assuming you don't have any "^" characters, you could use that as the escape character.

copy tpcds.call_center to stdout with delimiter '|' escape '^';

可在此处找到更多副本: https://www.postgresql.org/docs/8.2/static/sql-copy.html

More on copy can be found here: https://www.postgresql.org/docs/8.2/static/sql-copy.html

此技术相对较慢,会给师父带来负担。如果改用gpfdist,则可以利用集群中的并行性并绕过主服务器。此解决方案非常适合卸载大量数据。

This technique will be relatively slow and put a burden on the Master. If you used gpfdist instead, you could leverage the parallelism in the cluster and bypass the master. This solution is ideal for unloading large amounts of data.

首先,启动gpfidst进程:

First, start the gpfidst process:

[gpadmin@gpdbsne ~]$ gpfdist -p 8888 > gpfdist_8888.log 2>&1 < gpfdist_8888.log &
[1] 2255

现在,您可以创建外部表了。

Now, you can create the external table.

[gpadmin@gpdbsne ~]$ psql 
SET
Timing is on.
psql (8.2.15)
Type "help" for help.

gpadmin=# create writable external table tpcds.et_call_center 
(like tpcds.call_center) 
location ('gpfdist://gpdbsne:8888/call_center.txt') 
format 'text' (delimiter '|' escape '^');
NOTICE:  Table doesn't have 'distributed by' clause, defaulting to distribution columns from LIKE table
CREATE EXTERNAL TABLE
Time: 18.681 ms

现在,您插入数据:

gpadmin=# insert into tpcds.et_call_center select * from tpcds.call_center;                                                                             
INSERT 0 6
Time: 72.653 ms
gpadmin=# \q

验证:

[gpadmin@gpdbsne ~]$ wc -l call_center.txt 
6 call_center.txt

在我的示例中,我使用了主机名 gpdbsne,该群集中的所有段均可访问。通常,Greenplum使用专用网络在网段之间进行通信,因此该主机名将需要连接到专用网络。

In my example, I used the hostname "gpdbsne" which is accessible to all segments in this cluster. Typically, Greenplum uses a private network for communication between segments so this hostname will need to be connected to the private network.

由于可写外部表是使用SQL写入的,您可以在SQL中使用所需的任何转换逻辑,因此可以根据需要将制表符更改为空格。这消除了对awk或sed进行后期处理文件的需要。复制也可以使用SQL,但是就像我说的那样,它比使用可写外部表要慢。

Since the writable external table is written to with SQL, you can use whatever transformation logic you want in the SQL so you can change tabs to spaces if you want. This eliminates the need for awk or sed for post processing the files. Copy can use SQL too but like I said, it is a slower than using writable external tables.

这篇关于使用PSQL将Greenplum归档的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆