SQL/Python: Transform data from a CSV into a table with a different schema, with conditions

Question

So, I have a csv file containing data like this:

id       type      sum_cost         date_time
--------------------------------------------------
a1        pound     500        2019-04-21T10:50:06    
b1        euro      100        2019-04-21T10:40:00    
c1        pound     650        2019-04-21T11:00:00    
d1        usd       410        2019-04-21T00:30:00     

What I want to do is insert this data into a database table whose schema differs from the CSV. The table has these columns:

_id , start_time, end_time, pound_cost, euro_cost, count

Each CSV row maps to a table row as follows: _id = id, start_time is date_time - 1 hour, and end_time is date_time - 30 minutes. For pound_cost and euro_cost, if type is pound, insert sum_cost into pound_cost and 0 into euro_cost; the same applies in reverse for euro. Finally, put 1 in the count column.

So, the result of the table will be like this:

_id   start_time           end_time              pound_cost  euro_cost  count
-----------------------------------------------------------------------------
 a1  2019-04-21T09:50:06  2019-04-21T10:20:06      500           0        1
 b1  2019-04-21T09:40:00  2019-04-21T10:10:00       0           100       1
 c1  2019-04-21T10:00:00  2019-04-21T10:30:00      650           0        1
 d1  2019-04-20T23:30:00  2019-04-21T00:00:00       0           410       1

So, how should I insert the data into the table while transforming the CSV values this way? This is my first time using PostgreSQL and I have not used SQL much, so I wonder whether there is a function that can do this. If not, how can I use Python to transform the data and insert it into the table?

Thank you.

Recommended answer

As discussed in the comments, you can easily accomplish this by using the COPY command and a temporary table to hold the data from the file.

Create a temporary table with the same structure as your CSV. Note that every column is of the TEXT datatype, which makes the copy faster because validation is minimised.

CREATE TEMP TABLE temptable (
    id        TEXT,
    type      TEXT,
    sum_cost  TEXT,
    date_time TEXT
);

Use COPY to load the file into this table. If the file is on the database server, use COPY; if it is on a client machine, use psql's \copy. Change the delimiter if your file uses a different one.

\copy temptable FROM '/somepath/mydata.csv' WITH DELIMITER ',' CSV HEADER;

Now, simply run an INSERT INTO .. SELECT using expressions for various transformations.

INSERT INTO maintable (
    _id, start_time, end_time, pound_cost, euro_cost, count)
SELECT id,
       date_time::timestamp - INTERVAL '1 HOUR',
       date_time::timestamp - INTERVAL '30 MINUTES',
       CASE type WHEN 'pound' THEN sum_cost::numeric ELSE 0 END,
       -- You have not specified what happens to USD; handle it as required.
       CASE type WHEN 'euro' THEN sum_cost::numeric ELSE 0 END,
       -- Hardcoded based on your description; not sure what it actually means.
       1 AS count
FROM temptable t;

Now the data is in your main table:

select * from maintable;

 _id |     start_time      |      end_time       | pound_cost | euro_cost | count
-----+---------------------+---------------------+------------+-----------+-------
 a1  | 2019-04-21 09:50:06 | 2019-04-21 10:20:06 |        500 |         0 |     1
 b1  | 2019-04-21 09:40:00 | 2019-04-21 10:10:00 |          0 |       100 |     1
 c1  | 2019-04-21 10:00:00 | 2019-04-21 10:30:00 |        650 |         0 |     1
 d1  | 2019-04-20 23:30:00 | 2019-04-21 00:00:00 |          0 |         0 |     1
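
Since the question also asks about Python, here is a minimal sketch of the same transformation using only the standard library. It assumes a comma-delimited file with a header row (as in the \copy command above) and treats any type other than pound or euro as 0 in both cost columns, matching the SQL. The names transform_row, transform_csv, and SAMPLE are hypothetical and chosen only for illustration.

```python
import csv
import io
from datetime import datetime, timedelta
from decimal import Decimal


def transform_row(row):
    """Map one CSV record (a dict) to a tuple in maintable column order."""
    when = datetime.fromisoformat(row["date_time"])
    cost = Decimal(row["sum_cost"])
    return (
        row["id"],                                        # _id
        when - timedelta(hours=1),                        # start_time
        when - timedelta(minutes=30),                     # end_time
        cost if row["type"] == "pound" else Decimal(0),   # pound_cost
        cost if row["type"] == "euro" else Decimal(0),    # euro_cost
        1,                                                # count
    )


def transform_csv(lines):
    """Read CSV text with a header row and transform every record."""
    return [transform_row(record) for record in csv.DictReader(lines)]


# Sample data from the question, written as comma-separated text (assumption).
SAMPLE = """id,type,sum_cost,date_time
a1,pound,500,2019-04-21T10:50:06
b1,euro,100,2019-04-21T10:40:00
c1,pound,650,2019-04-21T11:00:00
d1,usd,410,2019-04-21T00:30:00
"""

rows = transform_csv(io.StringIO(SAMPLE))
for row in rows:
    print(row)
```

The resulting tuples could then be handed to a database driver, for example psycopg2's cursor.executemany with a parameterised INSERT INTO maintable statement. For large files, the COPY plus INSERT INTO .. SELECT approach above is likely to be faster, since the work stays inside the database.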
