SQL/Python: Transform data from a CSV into a table with a different schema, with conditions
Question
I have a CSV file containing data like this:
id type sum_cost date_time
--------------------------------------------------
a1 pound 500 2019-04-21T10:50:06
b1 euro 100 2019-04-21T10:40:00
c1 pound 650 2019-04-21T11:00:00
d1 usd 410 2019-04-21T00:30:00
I want to insert this data into a database table whose schema differs from the CSV. The table has these columns:
_id , start_time, end_time, pound_cost, euro_cost, count
Each CSV row should be inserted into this table such that _id = id, start_time is date_time - 1 hour, and end_time is date_time - 30 minutes. For pound_cost and euro_cost: if type is pound, insert the value of sum_cost into pound_cost and 0 into euro_cost; the same applies to euro in the other direction. Also insert 1 into the count column.
So the resulting table should look like this:
_id start_time end_time pound_cost euro_cost count
-----------------------------------------------------------------------------
a1 2019-04-21T09:50:06 2019-04-21T10:20:06 500 0 1
b1 2019-04-21T09:40:00 2019-04-21T10:10:00 0 100 1
c1 2019-04-21T10:00:00 2019-04-21T10:30:00 650 0 1
d1 2019-04-20T23:30:00 2019-04-21T00:00:00 0 410 1
How should I insert the data into the table while transforming the CSV values in this way? This is my first time using PostgreSQL and I have not used SQL much, so I wonder whether there is a function that can do this. If not, how can I use Python to transform the data and insert it into the table?
Thanks.
Recommended answer
As discussed in the comments, you can accomplish this easily by using the COPY command and a temporary table to hold the data from the file.
Create a temporary table with the structure of your CSV. Note that all columns are of the text data type; this makes the copy faster because validation is minimised.
CREATE TEMP TABLE temptable
( id        TEXT,
  type      TEXT,
  sum_cost  TEXT,
  date_time TEXT );
Use COPY to load the file into this table. If the file is on the database server, use COPY; if it is on a client machine, use psql's \COPY. Change the delimiter as needed.
\COPY temptable from '/somepath/mydata.csv' with delimiter ',' CSV HEADER;
Now simply run an INSERT INTO .. SELECT, using expressions for the various transformations.
INSERT INTO maintable (
  _id, start_time, end_time, pound_cost, euro_cost, count )
SELECT id,
       date_time::timestamp - INTERVAL '1 HOUR',
       date_time::timestamp - INTERVAL '30 MINUTES',
       CASE type WHEN 'pound' THEN sum_cost::numeric
                 ELSE 0 END,
       -- You have not specified what happens to USD; handle it as required.
       CASE type WHEN 'euro' THEN sum_cost::numeric
                 ELSE 0 END,
       -- Hardcoded based on your description; not sure what it actually means.
       1 AS count
FROM temptable t;
Now the data is in your main table:
select * from maintable;
_id | start_time | end_time | pound_cost | euro_cost | count
-----+---------------------+---------------------+------------+-----------+-------
a1 | 2019-04-21 09:50:06 | 2019-04-21 10:20:06 | 500 | 0 | 1
b1 | 2019-04-21 09:40:00 | 2019-04-21 10:10:00 | 0 | 100 | 1
c1 | 2019-04-21 10:00:00 | 2019-04-21 10:30:00 | 650 | 0 | 1
d1 | 2019-04-20 23:30:00 | 2019-04-21 00:00:00 | 0 | 0 | 1
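Since the question also asks about Python, here is a minimal sketch of the same transformation using only the standard library. It assumes a comma-delimited file with a header row, as in the \COPY command above. The names transform_row and transform_csv and the inline SAMPLE data are illustrative and not part of the original answer. USD rows get 0 in both cost columns, mirroring the CASE expressions in the SQL.

```python
import csv
import io
from datetime import datetime, timedelta
from decimal import Decimal


def transform_row(row):
    """Map one CSV record to
    (_id, start_time, end_time, pound_cost, euro_cost, count)."""
    date_time = datetime.fromisoformat(row["date_time"])
    cost = Decimal(row["sum_cost"])
    return (
        row["id"],
        date_time - timedelta(hours=1),      # start_time
        date_time - timedelta(minutes=30),   # end_time
        cost if row["type"] == "pound" else Decimal(0),
        cost if row["type"] == "euro" else Decimal(0),
        1,                                   # count, hardcoded as in the SQL
    )


def transform_csv(lines):
    """Transform every record of a comma-delimited CSV with a header row."""
    return [transform_row(row) for row in csv.DictReader(lines)]


# Hypothetical inline copy of the question's data, comma-delimited.
SAMPLE = """id,type,sum_cost,date_time
a1,pound,500,2019-04-21T10:50:06
b1,euro,100,2019-04-21T10:40:00
c1,pound,650,2019-04-21T11:00:00
d1,usd,410,2019-04-21T00:30:00
"""

rows = transform_csv(io.StringIO(SAMPLE))
for r in rows:
    print(r)
```

To read a real file, pass an open file object to transform_csv instead of the io.StringIO wrapper. The resulting tuples could then be handed to a database driver's cursor.executemany with a parameterised INSERT into maintable (psycopg2, for example, uses %s placeholders); that step is left out here because it depends on your connection setup.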