How can I export to sqlite (or another format) and retain the date datatype?


Problem description

I have a script that loads a CSV into a pandas dataframe, cleanses the resulting table (eg removes invalid values, formats dates as dates, etc) and saves the output to a local sqlite .db file.

I then have other scripts that open that database file and perform other operations on it.

My problem is that Sqlite3 doesn't have an explicit date format: https://www.sqlite.org/datatype3.html This means that operations on dates fail, e.g.:

df_read['Months since mydate 2'] = (  pd.to_datetime('15-03-2019') - df_read['mydate'] )

returns

TypeError: unsupported operand type(s) for -: 'Timestamp' and 'str'

How can I export the dataframe in a way that keeps track of all data types, including dates?

I have thought of the following:

  • Export to another format, but what format? A proper SQL Server would be great, but I don't have access to any in this case. I'd need a format which EXPLICITLY declares the data type of each column, so CSV is not an option.

  • Having a small function that reconverts the columns to dates after reading them from SQLite. But this would mean I'd have to manually keep track of which columns are dates; it would be cumbersome and slow on large datasets.

  • Having another table in the SQLite database that keeps track of which columns are dates and what format they are in (e.g. %Y-%m-%d); this can help with the reconversion into dates, but it still feels cumbersome, clunky and very un-pythonic.

Here is a quick example of what I mean:

import numpy as np
import pandas as pd
import sqlite3
num=int(10e3)
df=pd.DataFrame()
df['month'] = np.random.randint(1,13,num)
df['year'] = np.random.randint(2000,2005,num)
df['mydate'] = pd.to_datetime(df['year'] * 10000 + df['month']* 100 + df['month'], format ='%Y%m%d' )
df.iloc[20:30,2]=np.nan

#this works
df['Months since mydate'] = (  pd.to_datetime('15-03-2019') - df['mydate'] )

conn=sqlite3.connect("test_sqllite_dates.db")
df.to_sql('mydates',conn, if_exists='replace')
conn.close()

conn2=sqlite3.connect("test_sqllite_dates.db")

df_read=pd.read_sql('select * from mydates',conn2 )
# this doesn't work
df_read['Months since mydate 2'] = (  pd.to_datetime('15-03-2019') - df_read['mydate'] )
conn2.close()

print(df.dtypes)
print(df_read.dtypes)

Recommended answer

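This can be solved by creating the column in sqlite with a datetime type, so that when the data is read back, Python converts it to the datetime type automatically.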

Mind that, when you are connecting to the database, you need to give the parameter detect_types=sqlite3.PARSE_DECLTYPES
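The round trip described above can be sketched as follows. This is a minimal illustration, not the original answerer's code: the table name, the temporary file path and the explicit dtype={"mydate": "TIMESTAMP"} are assumptions made for the example. With a plain sqlite3 connection, pandas already appears to declare datetime columns as TIMESTAMP by default, so the explicit dtype mainly documents the intent.

```python
import os
import sqlite3
import tempfile

import pandas as pd

# Small frame with a datetime column, including one missing value.
df = pd.DataFrame({
    "id": [1, 2, 3],
    "mydate": pd.to_datetime(["2001-03-03", "2004-11-11", None]),
})

# Hypothetical database location, used only for this example.
db_path = os.path.join(tempfile.mkdtemp(), "dates_roundtrip.db")

# Write: declare the column as TIMESTAMP so the declared type is stored in the schema.
conn = sqlite3.connect(db_path)
df.to_sql("mydates", conn, if_exists="replace", index=False,
          dtype={"mydate": "TIMESTAMP"})
conn.close()

# Read: PARSE_DECLTYPES makes sqlite3 look at the declared column type
# and convert TIMESTAMP values back into datetime objects.
conn2 = sqlite3.connect(db_path, detect_types=sqlite3.PARSE_DECLTYPES)
df_read = pd.read_sql("select * from mydates", conn2)
conn2.close()

# The subtraction from the question now works on the re-read column.
elapsed = pd.Timestamp("2019-03-15") - df_read["mydate"]
print(df_read.dtypes)
```

If the declared types cannot be relied on, for example because the table was created by another tool, passing parse_dates=["mydate"] to pd.read_sql is an alternative, at the cost of listing the date columns by hand.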
