How does Python store datetime internally?
Question
I found _datetimemodule.c, which seems to be the right file, but I need a bit of help, as C is not my strength.
>>> import datetime
>>> import sys
>>> d = datetime.datetime.now()
>>> sys.getsizeof(d)
48
>>> d = datetime.datetime(2018, 12, 31, 23, 59, 59, 123)
>>> sys.getsizeof(d)
48
So a timezone-unaware datetime object needs 48 bytes. Looking at PyDateTime_DateTimeType, it seems to be a PyDateTime_DateType plus a PyDateTime_TimeType. Maybe also _PyDateTime_BaseTime?
From looking at the code, I have the impression that one component is stored for each field in YYYY-mm-dd HH:MM:ss, meaning:
- Year: e.g. int (e.g. int16_t would be 16 bit)
- Month: e.g. int8_t
- Day: e.g. int8_t
- Hour: e.g. int8_t
- Minute: e.g. int8_t
- Second: e.g. int8_t
- Microsecond: e.g. uint16_t
But that would be 2*16 + 5*8 = 72 bits = 9 bytes, not the 48 bytes Python reports.
Where do my assumptions about the internal structure of datetime go wrong?
(I guess this might differ between Python implementations; if so, please focus on CPython.)
Answer
You're missing a key part of the picture: the actual datetime struct definitions, which live in Include/datetime.h. There are also important comments in there. Here are some key excerpts:
/* Fields are packed into successive bytes, each viewed as unsigned and
* big-endian, unless otherwise noted:
*
* byte offset
* 0 year 2 bytes, 1-9999
* 2 month 1 byte, 1-12
* 3 day 1 byte, 1-31
* 4 hour 1 byte, 0-23
* 5 minute 1 byte, 0-59
* 6 second 1 byte, 0-59
* 7 usecond 3 bytes, 0-999999
* 10
*/
...
/* # of bytes for year, month, day, hour, minute, second, and usecond. */
#define _PyDateTime_DATETIME_DATASIZE 10
...
/* The datetime and time types have hashcodes, and an optional tzinfo member,
* present if and only if hastzinfo is true.
*/
#define _PyTZINFO_HEAD          \
    PyObject_HEAD               \
    Py_hash_t hashcode;         \
    char hastzinfo;             /* boolean flag */
...
/* All datetime objects are of PyDateTime_DateTimeType, but that can be
* allocated in two ways too, just like for time objects above. In addition,
* the plain date type is a base class for datetime, so it must also have
* a hastzinfo member (although it's unused there).
*/
...
#define _PyDateTime_DATETIMEHEAD        \
    _PyTZINFO_HEAD                      \
    unsigned char data[_PyDateTime_DATETIME_DATASIZE];

typedef struct
{
    _PyDateTime_DATETIMEHEAD
} _PyDateTime_BaseDateTime;     /* hastzinfo false */

typedef struct
{
    _PyDateTime_DATETIMEHEAD
    unsigned char fold;
    PyObject *tzinfo;
} PyDateTime_DateTime;          /* hastzinfo true */
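The byte-offset comment fully specifies the 10-byte data array. It also explains why your uint16_t guess for microseconds cannot work: 999999 is larger than 65535, so the field needs 3 bytes. As a minimal sketch, the packing can be reproduced in pure Python. The helpers `pack_fields` and `unpack_fields` are hypothetical names made up for illustration. The comparison relies on a CPython implementation detail: `datetime.__reduce__()` uses those same ten data bytes as the pickle state.

```python
import datetime


def pack_fields(dt: datetime.datetime) -> bytes:
    """Hypothetical helper: pack dt's fields into 10 big-endian bytes,
    following the byte-offset table in Include/datetime.h."""
    return (
        dt.year.to_bytes(2, "big")             # offset 0: year, 2 bytes
        + bytes([dt.month, dt.day,             # offsets 2-6: one byte each
                 dt.hour, dt.minute, dt.second])
        + dt.microsecond.to_bytes(3, "big")    # offset 7: usecond, 3 bytes
    )


def unpack_fields(data: bytes) -> datetime.datetime:
    """Hypothetical inverse of pack_fields: rebuild a naive datetime."""
    if len(data) != 10:
        raise ValueError("expected exactly 10 bytes")
    year = int.from_bytes(data[0:2], "big")
    month, day, hour, minute, second = data[2:7]  # slicing bytes yields ints
    microsecond = int.from_bytes(data[7:10], "big")
    return datetime.datetime(year, month, day, hour, minute, second, microsecond)


d = datetime.datetime(2018, 12, 31, 23, 59, 59, 123)
packed = pack_fields(d)
print(len(packed), packed)  # 10 b'\x07\xe2\x0c\x1f\x17;;\x00\x00{'

# CPython's pickle state for a naive datetime is the raw data[] array.
state = d.__reduce__()[1][0]
print(state == packed)             # True
print(unpack_fields(packed) == d)  # True
```

The 10 bytes hold only the calendar and clock fields. The tzinfo reference and the fold flag are stored outside this array, so the round trip above covers naive datetimes only.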
The 48-byte count you're seeing breaks down as follows:
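- 8-byte refcount
- 8-byte type pointer
- 8-byte cached hash
- 1-byte "hastzinfo" flag
- 10-byte manually packed char[10] containing the datetime data (no padding is needed before it, because a char array only requires 1-byte alignment)
- 1-byte "fold" flag (introduced by PEP 495 in Python 3.6)
- 4-byte padding, so that the following pointer is 8-byte aligned
- 8-byte tzinfo pointer

That is sizeof(PyDateTime_DateTime) on a 64-bit build. sys.getsizeof() reports the type's basic size, which is this full struct, so you see 48 for naive and aware datetimes alike. A naive object (the hastzinfo-false _PyDateTime_BaseDateTime layout) has no use for the trailing tzinfo pointer.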
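One way to double-check where the padding goes is to let ctypes apply the platform's C alignment rules to a mirror of the structs quoted above. This is a sketch under stated assumptions. `BaseDateTime` and `DateTime` are hypothetical ctypes mirrors, not part of any library. The code assumes a regular CPython build in which PyObject_HEAD is just a refcount plus a type pointer, which does not hold for Py_TRACE_REFS or free-threaded builds. The last part reads raw memory through `id()`, which only works because `id()` returns the object's address in CPython.

```python
import ctypes
import datetime
import sys

# Hypothetical ctypes mirrors of the structs in Include/datetime.h.
# Assumes a regular CPython build: PyObject_HEAD = refcount + type pointer.
class BaseDateTime(ctypes.Structure):
    """Mirror of _PyDateTime_BaseDateTime (hastzinfo false)."""
    _fields_ = [
        ("ob_refcnt", ctypes.c_ssize_t),   # PyObject_HEAD: reference count
        ("ob_type", ctypes.c_void_p),      # PyObject_HEAD: type pointer
        ("hashcode", ctypes.c_ssize_t),    # Py_hash_t cached hash
        ("hastzinfo", ctypes.c_char),      # boolean flag
        ("data", ctypes.c_ubyte * 10),     # packed year..usecond
    ]


class DateTime(ctypes.Structure):
    """Mirror of PyDateTime_DateTime (hastzinfo true)."""
    _fields_ = BaseDateTime._fields_ + [
        ("fold", ctypes.c_ubyte),
        ("tzinfo", ctypes.c_void_p),
    ]


# Padding shows up as a gap between one field's end and the next offset.
for name, _ in DateTime._fields_:
    field = getattr(DateTime, name)
    print(f"{name:<10} offset={field.offset:>2} size={field.size}")

print("sizeof(BaseDateTime):", ctypes.sizeof(BaseDateTime))
print("sizeof(DateTime):    ", ctypes.sizeof(DateTime))
print("tp_basicsize:        ", datetime.datetime.__basicsize__)
print("sys.getsizeof:       ", sys.getsizeof(datetime.datetime.now()))

# CPython-only: view the live object through the BaseDateTime mirror.
# Only fields every datetime has are read (never the tzinfo slot).
d = datetime.datetime(2018, 12, 31, 23, 59, 59, 123)
view = BaseDateTime.from_address(id(d))
print(bytes(view.data) == d.__reduce__()[1][0])  # raw data vs. pickle state
```

On a typical 64-bit build this reports data at offset 25, fold at offset 35 and tzinfo at offset 40, with 40 bytes for the base layout and 48 for the full one. The 48 matches both `__basicsize__` and `sys.getsizeof()`.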
All of this is, of course, an implementation detail. It may be different on another Python implementation, a different CPython version, a 32-bit CPython build, or a CPython debug build (there's extra stuff in PyObject_HEAD when CPython is compiled with Py_TRACE_REFS defined).