How can I export to sqlite (or another format) and retain the date datatype?
Problem description
I have a script that loads a CSV into a pandas dataframe, cleanses the resulting table (e.g. removes invalid values, formats dates as dates, etc.) and saves the output to a local sqlite .db file.
I then have other scripts that open that database file and perform other operations on it.
My problem is that sqlite3 doesn't have an explicit date datatype: https://www.sqlite.org/datatype3.html This means that dates come back as strings, so operations on them fail, e.g.:
df_read['Months since mydate 2'] = ( pd.to_datetime('15-03-2019') - df_read['mydate'] )
returns
TypeError: unsupported operand type(s) for -: 'Timestamp' and 'str'
How can I export the dataframe in a way that keeps track of all data types, including dates?
I have thought of the following:
- Exporting to another format, but what format? A proper SQL Server would be great, but I don't have access to one in this case. I'd need a format which EXPLICITLY declares the data type of each column, so CSV is not an option.
- Having a small function which reconverts the columns to dates after reading them from sqlite. But this would mean I'd have to manually keep track of which columns are dates, which would be cumbersome and slow on large datasets.
- Having another table in the sqlite database which keeps track of which columns are dates and what format they are in (e.g. %Y-%m-%d). This can help with the reconversion into dates, but it still feels cumbersome, clunky and very un-pythonic.
Here is a quick example of what I mean:
import numpy as np
import pandas as pd
import sqlite3

num = int(10e3)
df = pd.DataFrame()
df['month'] = np.random.randint(1, 13, num)
df['year'] = np.random.randint(2000, 2005, num)
# build a date from year and month, reusing the month number as the day
df['mydate'] = pd.to_datetime(df['year'] * 10000 + df['month'] * 100 + df['month'], format='%Y%m%d')
# blank out a few dates to simulate missing values
df.iloc[20:30, 2] = np.nan

# this works
df['Months since mydate'] = (pd.to_datetime('15-03-2019') - df['mydate'])

# write the dataframe to sqlite
conn = sqlite3.connect("test_sqllite_dates.db")
df.to_sql('mydates', conn, if_exists='replace')
conn.close()

# read it back through a fresh connection
conn2 = sqlite3.connect("test_sqllite_dates.db")
df_read = pd.read_sql('select * from mydates', conn2)

# this doesn't work: mydate comes back as strings
df_read['Months since mydate 2'] = (pd.to_datetime('15-03-2019') - df_read['mydate'])
conn2.close()

print(df.dtypes)
print(df_read.dtypes)
Recommended answer
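This can be solved by creating the column in sqlite with a datetime type, so that when the data is read back, Python automatically converts it to the datetime type.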
Note that when you connect to the database, you need to pass the parameter detect_types=sqlite3.PARSE_DECLTYPES
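To make the idea concrete, here is a minimal sketch using only the standard-library sqlite3 module. It is an illustration under stated assumptions, not the answerer's exact code: the table and column names are borrowed from the question, the column is declared as TIMESTAMP, and the adapter/converter functions (adapt_datetime, convert_timestamp) are hypothetical helpers registered explicitly so the round trip does not depend on the module's built-in converters.

```python
import sqlite3
from datetime import datetime

# Hypothetical helpers for this sketch: store datetimes as ISO text
# and parse them back when the declared column type is TIMESTAMP.
def adapt_datetime(value):
    return value.isoformat(sep=" ")

def convert_timestamp(raw):
    # sqlite3 hands converters the stored value as bytes
    return datetime.fromisoformat(raw.decode())

sqlite3.register_adapter(datetime, adapt_datetime)
sqlite3.register_converter("timestamp", convert_timestamp)

# PARSE_DECLTYPES makes sqlite3 look up a converter by the declared column type
conn = sqlite3.connect(":memory:", detect_types=sqlite3.PARSE_DECLTYPES)
conn.execute("CREATE TABLE mydates (id INTEGER, mydate TIMESTAMP)")
conn.execute("INSERT INTO mydates VALUES (?, ?)", (1, datetime(2019, 3, 15)))
conn.execute("INSERT INTO mydates VALUES (?, ?)", (2, None))  # missing date
rows = conn.execute("SELECT id, mydate FROM mydates ORDER BY id").fetchall()
conn.close()

first_date = rows[0][1]    # a datetime object, not a str
missing_date = rows[1][1]  # NULL comes back as None; the converter is skipped
delta = datetime(2019, 3, 20) - first_date  # date arithmetic now works
```

A connection opened this way can also be passed to pd.read_sql in place of conn2 in the question's script. Whether the date column then comes back as a datetime depends on the type that to_sql declared for it, so it is worth checking df_read.dtypes after the change.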