Is there a way to load text data into a database in PostgreSQL?

Question

I want to extract information from a text file (almost 1GB) and store it in a PostgreSQL database. The text file has the following format:

DEBUG, 2017-03-23T10:02:27+00:00, ghtorrent-40 -- ghtorrent.rb:Repo EFForg/https-everywhere exists
DEBUG, 2017-03-24T12:06:23+00:00, ghtorrent-49 -- ghtorrent.rb:Repo Shikanime/print exists
...

I want to extract 'DEBUG', the timestamp, 'ghtorrent-40', 'ghtorrent' and 'Repo EFForg/https-everywhere exists' from each line and store them in the database.

I have done this in other languages such as Python (psycopg2) and C++ (libpqxx), but is it possible to write a function in PostgreSQL itself to import the whole file?

I am currently using the pgAdmin4 tool for PostgreSQL. I am thinking of using something like pg_read_file in a function to read the file, but one line at a time, and insert each line into the table.

Answer

An approach I use with my large XML files (130GB or bigger) is to upload the whole file into a temporary unlogged table and extract the content I want from there. Unlogged tables are not crash-safe, but they are much faster than logged ones, which suits the purpose of a temporary table perfectly ;-)

Consider the following table..

CREATE UNLOGGED TABLE tmp (raw TEXT);

.. you can import this 1GB file using a single psql line from your console (unix)..

$ cat 1gb_file.txt | psql -d db -c "COPY tmp FROM STDIN" 

After that, all you need to do is apply your logic to query and extract the information you want. Depending on the size of your table, you can create a second table from a SELECT, e.g.:

CREATE TABLE t AS
SELECT 
  trim((string_to_array(raw,','))[1]) AS operation,
  trim((string_to_array(raw,','))[2])::timestamp AS tmst,
  trim((string_to_array(raw,','))[3]) AS txt
FROM tmp
WHERE raw LIKE '%DEBUG%' AND
      raw LIKE '%ghtorrent-40%' AND 
      raw LIKE '%Repo EFForg/https-everywhere exists%';

Adjust the string_to_array function and the WHERE clause to your logic! Optionally, you can replace these multiple LIKE operations with a single SIMILAR TO.
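A minimal sketch of what such an adjustment could look like (not part of the original answer). Instead of splitting on commas, which would truncate any message that itself contains a comma, it uses regexp_match (PostgreSQL 10+) to pull out the five fields the question asks for, and it shows the single SIMILAR TO filter. The demo table names tmp_demo and log_demo, the column names and the regular expression are assumptions derived only from the two sample lines in the question; adapt them to the real log format. Unlike the answer's ::timestamp cast, this sketch casts to timestamptz so the +00:00 offset is kept.

```sql
-- Hypothetical demo table standing in for tmp (normally filled by COPY).
CREATE TEMP TABLE tmp_demo (raw text);

INSERT INTO tmp_demo (raw) VALUES
  ('DEBUG, 2017-03-23T10:02:27+00:00, ghtorrent-40 -- ghtorrent.rb:Repo EFForg/https-everywhere exists'),
  ('DEBUG, 2017-03-24T12:06:23+00:00, ghtorrent-49 -- ghtorrent.rb:Repo Shikanime/print exists');

-- Assumed line layout: LEVEL, TIMESTAMP, HOST -- COMPONENT.rb:MESSAGE
-- regexp_match returns the capture groups as text[] (NULL when there is no match).
CREATE TEMP TABLE log_demo AS
SELECT m[1]              AS level,
       m[2]::timestamptz AS tmst,       -- keeps the +00:00 offset
       m[3]              AS host,
       m[4]              AS component,
       m[5]              AS message
FROM (
  SELECT regexp_match(raw, '^(\w+), ([^,]+), (\S+) -- (\w+)\.rb:(.*)$') AS m
  FROM tmp_demo
) AS parsed
WHERE m IS NOT NULL;                    -- skip lines that do not fit the pattern

-- One SIMILAR TO in place of the three ANDed LIKE filters. Unlike the LIKEs,
-- it also requires the three substrings to appear in exactly this order.
SELECT raw
FROM tmp_demo
WHERE raw SIMILAR TO '%DEBUG%ghtorrent-40%Repo EFForg/https-everywhere exists%';
```

On the real data, the same SELECT would run against tmp inside the answer's CREATE TABLE t AS statement.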

.. and your data would be ready to be played with:

SELECT * FROM t;

 operation |        tmst         |                               txt                                
-----------+---------------------+------------------------------------------------------------------
 DEBUG     | 2017-03-23 10:02:27 | ghtorrent-40 -- ghtorrent.rb:Repo EFForg/https-everywhere exists
(1 row)

Once your data is extracted, you can run DROP TABLE tmp; to free some disk space ;)

Further reading: COPY, PostgreSQL array functions, pattern matching