Pandas datetime formatting
Question
Is it possible to display the result of pd.to_datetime with its trailing zeros? It seems the zeros are being removed.
import pandas as pd
print(pd.to_datetime("2000-07-26 14:21:00.00000",
                     format="%Y-%m-%d %H:%M:%S.%f"))
The result is
2000-07-26 14:21:00
The desired result would be
2000-07-26 14:21:00.00000
I know the values mean the same thing but it would be nice for consistency.
Recommended answer
Some testing shows that when date-time data is parsed with format="%H:%M:%S.%f", %f is capable of nanosecond resolution, provided the ninth digit after the decimal point is non-zero. When the result is displayed, a variable number of trailing zeros, from none to five, is added depending on the position of the least significant digit after the decimal point, given that it is also the final digit. Here is a table from the test data, where position is the position of the least significant non-zero (and final) digit and zeros is the number of trailing zeros added by formatting:
position  zeros
9         0
8         1
7         2
6         0
5         1
4         2
3         3
2         4
1         5
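One way to read the table, as my own interpretation rather than anything the pandas documentation states in these terms: the displayed fraction is padded to six digits (microseconds) unless digits beyond the sixth place are present, in which case it is padded to nine digits (nanoseconds). The helper name `trailing_zeros` below is hypothetical and only reproduces the table.

```python
def trailing_zeros(position):
    """Number of zeros appended when the last non-zero digit sits at `position`.

    Assumption: the display width is 6 digits for positions 1-6
    and 9 digits for positions 7-9, which matches the table above.
    """
    if not 1 <= position <= 9:
        raise ValueError("position must be between 1 and 9")
    width = 6 if position <= 6 else 9
    return width - position


# Print the same rows as the table, from position 9 down to 1.
for position in range(9, 0, -1):
    print(position, trailing_zeros(position))
```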
When a column as a whole is formatted with "%H:%M:%S.%f", all of its elements have the same number of digits after the decimal point. This may be achieved by adding or removing trailing zeros, even if that increases or decreases the apparent resolution of the raw data. I guess the reasons are consistency and aesthetics without usually introducing excessive error: in numeric calculations trailing zeros don't typically affect immediate results, but they can affect the estimation of error and how results should be presented (Trailing Zeros, Rules for Significant Figures).
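If the goal is simply a consistent textual representation, as in the question, one option is to format the parsed values back to strings explicitly instead of relying on the default display. This is a minimal sketch, assuming a fixed number of fractional digits is acceptable; the sample values are made up for illustration. `%f` in `strftime` always writes six digits, so getting five digits means dropping the last character, which discards the microsecond digit.

```python
import pandas as pd

# Illustrative raw strings with different numbers of fractional digits.
raw = pd.Series(["2000-07-26 14:21:00.00000", "2000-07-26 14:21:00.5"])
parsed = pd.to_datetime(raw, format="%Y-%m-%d %H:%M:%S.%f")

# %f always produces six digits, so every row has the same width.
six_digits = parsed.dt.strftime("%Y-%m-%d %H:%M:%S.%f")

# Drop the final character to show five fractional digits instead.
five_digits = six_digits.str[:-1]

print(six_digits.tolist())
print(five_digits.tolist())
```

The results are plain strings, not datetimes, so this is best done as the last step before output.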
Below are some observations from applying the "%H:%M:%S.%f" format to individual strings and to pandas.Series (DataFrame columns) with pandas.to_datetime, and from applying pandas.DataFrame.convert_objects(convert_dates='coerce') to DataFrames with a column that can be converted to datetime.
On a single string, pandas preserves a non-zero digit up to the ninth decimal place when converting with "%H:%M:%S.%f", and it adds a date if one was not provided:
import pandas as pd
pd.to_datetime ("10:00:00.000000001",format="%H:%M:%S.%f")
Out[15]: Timestamp('1900-01-01 10:00:00.000000001')
pd.to_datetime ("2015-09-17 10:00:00.000000001",format="%Y-%m-%d %H:%M:%S.%f")
Out[15]: Timestamp('2015-09-17 10:00:00.000000001')
Short of the ninth place, and for tests in which the final non-zero digit is also the final digit, it adds up to five trailing zeros after that digit, increasing the apparent resolution of the raw data, except when the final non-zero digit is in position six to the right of the decimal point:
pd.to_datetime ("10:00:00.00000001",format="%H:%M:%S.%f")
Out[15]: Timestamp('1900-01-01 10:00:00.000000010')
pd.to_datetime ("2015-09-17 10:00:00.00000001",format="%Y-%m-%d %H:%M:%S.%f")
Out[16]: Timestamp('2015-09-17 10:00:00.000000010')
pd.to_datetime ("10:00:00.0000001",format="%H:%M:%S.%f")
Out[15]: Timestamp('1900-01-01 10:00:00.000000100')
pd.to_datetime ("2015-09-17 10:00:00.0000001",format="%Y-%m-%d %H:%M:%S.%f")
Out[17]: Timestamp('2015-09-17 10:00:00.000000100')
pd.to_datetime ("10:00:00.000001",format="%H:%M:%S.%f")
Out[33]: Timestamp('1900-01-01 10:00:00.000001')
pd.to_datetime ("2015-09-17 10:00:00.000001",format="%Y-%m-%d %H:%M:%S.%f")
Out[18]: Timestamp('2015-09-17 10:00:00.000001')
pd.to_datetime ("10:00:00.00001",format="%H:%M:%S.%f")
Out[6]: Timestamp('1900-01-01 10:00:00.000010')
pd.to_datetime ("2015-09-17 10:00:00.00001",format="%Y-%m-%d %H:%M:%S.%f")
Out[19]: Timestamp('2015-09-17 10:00:00.000010')
pd.to_datetime ("10:00:00.0001",format="%H:%M:%S.%f")
Out[9]: Timestamp('1900-01-01 10:00:00.000100')
pd.to_datetime ("2015-09-17 10:00:00.0001",format="%Y-%m-%d %H:%M:%S.%f")
Out[21]: Timestamp('2015-09-17 10:00:00.000100')
pd.to_datetime ("10:00:00.001",format="%H:%M:%S.%f")
Out[10]: Timestamp('1900-01-01 10:00:00.001000')
pd.to_datetime ("2015-09-17 10:00:00.001",format="%Y-%m-%d %H:%M:%S.%f")
Out[22]: Timestamp('2015-09-17 10:00:00.001000')
pd.to_datetime ("10:00:00.01",format="%H:%M:%S.%f")
Out[12]: Timestamp('1900-01-01 10:00:00.010000')
pd.to_datetime ("2015-09-17 10:00:00.01",format="%Y-%m-%d %H:%M:%S.%f")
Out[24]: Timestamp('2015-09-17 10:00:00.010000')
pd.to_datetime ("10:00:00.1",format="%H:%M:%S.%f")
Out[13]: Timestamp('1900-01-01 10:00:00.100000')
pd.to_datetime ("2015-09-17 10:00:00.1",format="%Y-%m-%d %H:%M:%S.%f")
Out[26]: Timestamp('2015-09-17 10:00:00.100000')
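The trailing zeros in the outputs above are only a matter of display; the stored value is the same instant either way. A small sketch to check this, assuming a pandas version in which %f accepts up to nine digits (the helper name `fractional_nanoseconds` is hypothetical):

```python
import pandas as pd


def fractional_nanoseconds(text):
    """Parse a time string and return its sub-second part in nanoseconds."""
    ts = pd.to_datetime(text, format="%H:%M:%S.%f")
    # Timestamp stores microseconds and the remaining nanoseconds separately.
    return ts.microsecond * 1000 + ts.nanosecond


# One string per digit position, e.g. "10:00:00.1", "10:00:00.01", ...
for digits in range(1, 10):
    text = "10:00:00." + "0" * (digits - 1) + "1"
    print(text, fractional_nanoseconds(text))
```

A 1 in position n should come out as 10**(9 - n) nanoseconds, regardless of how many zeros the repr shows.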
Let's see how it works with a DataFrame:
!type test.csv # here type is Windows substitute for Linux cat command
date,mesg
10:00:00.000000001,one
10:00:00.00000001,two
10:00:00.0000001,three
10:00:00.000001,four
10:00:00.00001,five
10:00:00.0001,six
10:00:00.001,seven
10:00:00.01,eight
10:00:00.1,nine
10:00:00.000000001,ten
10:00:00.000000002,eleven
10:00:00.000000003,twelve
df = pd.read_csv('test.csv')
df
Out[30]:
date mesg
0 10:00:00.000000001 one
1 10:00:00.00000001 two
2 10:00:00.0000001 three
3 10:00:00.000001 four
4 10:00:00.00001 five
5 10:00:00.0001 six
6 10:00:00.001 seven
7 10:00:00.01 eight
8 10:00:00.1 nine
9 10:00:00.000000001 ten
10 10:00:00.000000002 eleven
11 10:00:00.000000003 twelve
df.dtypes
Out[31]:
date object
mesg object
dtype: object
Datetime conversion of a DataFrame with convert_objects, which has no format option, gives microsecond resolution even when some of the raw data has coarser or finer resolution than that, and it adds today's date:
df2 = df.convert_objects(convert_dates='coerce')
df2
Out[32]:
date mesg
0 2015-09-17 10:00:00.000000 one
1 2015-09-17 10:00:00.000000 two
2 2015-09-17 10:00:00.000000 three
3 2015-09-17 10:00:00.000001 four
4 2015-09-17 10:00:00.000010 five
5 2015-09-17 10:00:00.000100 six
6 2015-09-17 10:00:00.001000 seven
7 2015-09-17 10:00:00.010000 eight
8 2015-09-17 10:00:00.100000 nine
9 2015-09-17 10:00:00.000000 ten
10 2015-09-17 10:00:00.000000 eleven
11 2015-09-17 10:00:00.000000 twelve
df2.dtypes
Out[33]:
date datetime64[ns]
mesg object
dtype: object
The finer-than-microsecond resolution present in some of the raw data cannot be recovered by applying the "%H:%M:%S.%f" format after a datetime conversion done without an explicit format specifier (that is, with DataFrame.convert_objects):
df2['date'] = pd.to_datetime(df2['date'],format="%H:%M:%S.%f")
df2
Out[34]:
date mesg
0 2015-09-17 10:00:00.000000 one
1 2015-09-17 10:00:00.000000 two
2 2015-09-17 10:00:00.000000 three
3 2015-09-17 10:00:00.000001 four
4 2015-09-17 10:00:00.000010 five
5 2015-09-17 10:00:00.000100 six
6 2015-09-17 10:00:00.001000 seven
7 2015-09-17 10:00:00.010000 eight
8 2015-09-17 10:00:00.100000 nine
9 2015-09-17 10:00:00.000000 ten
10 2015-09-17 10:00:00.000000 eleven
11 2015-09-17 10:00:00.000000 twelve
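As far as I know, convert_objects is no longer available in current pandas releases, so the following sketch only simulates the effect by truncating to microseconds with `Series.dt.floor`. That substitution is my assumption, not what convert_objects did internally. It illustrates the same point: once the sub-microsecond digits are gone, re-applying a format cannot bring them back.

```python
import pandas as pd

raw = pd.Series(["10:00:00.000000001", "10:00:00.000000002"])

# Parsing the raw strings directly keeps the nanosecond digits.
precise = pd.to_datetime(raw, format="%H:%M:%S.%f")

# Stand-in for a conversion that only keeps microseconds.
truncated = precise.dt.floor("us")

# Passing already-converted datetimes through to_datetime changes nothing.
reparsed = pd.to_datetime(truncated, format="%H:%M:%S.%f")

print(precise.dt.nanosecond.tolist())
print(reparsed.dt.nanosecond.tolist())
```

If nanosecond precision matters, the safer route is to parse the original strings with an explicit format in the first place.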
Applying "%H:%M:%S.%f" to a DataFrame column of raw strings, before any other datetime conversion, provides nanosecond resolution if at least one element has a non-zero digit in the ninth place (as advertised in the pandas.to_datetime documentation). It also raises the displayed resolution of raw data coarser than a nanosecond to that level, and it adds 1900-01-01 as the date:
df3 = df.copy(deep=True)
df3['date'] = pd.to_datetime(df3['date'],format="%H:%M:%S.%f",coerce=True)
df3
Out[35]:
date mesg
0 1900-01-01 10:00:00.000000001 one
1 1900-01-01 10:00:00.000000010 two
2 1900-01-01 10:00:00.000000100 three
3 1900-01-01 10:00:00.000001000 four
4 1900-01-01 10:00:00.000010000 five
5 1900-01-01 10:00:00.000100000 six
6 1900-01-01 10:00:00.001000000 seven
7 1900-01-01 10:00:00.010000000 eight
8 1900-01-01 10:00:00.100000000 nine
9 1900-01-01 10:00:00.000000001 ten
10 1900-01-01 10:00:00.000000002 eleven
11 1900-01-01 10:00:00.000000003 twelve
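In more recent pandas versions the `coerce=True` keyword is spelled `errors="coerce"`. The sketch below uses that spelling on a few made-up rows and also shows one possible way to replace the default 1900-01-01 with a real date; the target date and the column values are assumptions for illustration only.

```python
import pandas as pd

df = pd.DataFrame({
    "date": ["10:00:00.000000001", "10:00:00.1"],
    "mesg": ["one", "nine"],
})

# errors="coerce" turns unparseable strings into NaT instead of raising.
parsed = pd.to_datetime(df["date"], format="%H:%M:%S.%f", errors="coerce")

# Time of day as a timedelta: subtract midnight of the default date.
time_of_day = parsed - parsed.dt.normalize()

# Attach the time of day to a date of our choosing (illustrative value).
df["date"] = pd.Timestamp("2015-09-17") + time_of_day

print(df)
```

Working with the time of day as a timedelta avoids carrying the placeholder date around at all.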
Formatting a DataFrame column with "%H:%M:%S.%f" adds zeros after the datum with the least significant non-zero digit after the decimal point (taken over the whole column, with zeros added according to the position/zeros table above) and aligns the resolution of all other data with that, even if doing so increases or decreases the resolution of some raw data:
df4 = pd.read_csv('test2.csv')
df4
Out[36]:
date mesg
0 10:00:00.000000000 one
1 10:00:00.00000000 two
2 10:00:00.0000000 three
3 10:00:00.000000 four
4 10:00:00.00000 five
5 10:00:00.0001 six
6 10:00:00.00 seven
7 10:00:00.0 eight
8 10:00:00. nine
9 10:00:00.000000000 ten
10 10:00:00.000000000 eleven
11 10:00:00.00000000 twelve
df4['date'] = pd.to_datetime(df4['date'],format="%H:%M:%S.%f",coerce=True)
df4
Out[37]:
date mesg
0 1900-01-01 10:00:00.000000 one
1 1900-01-01 10:00:00.000000 two
2 1900-01-01 10:00:00.000000 three
3 1900-01-01 10:00:00.000000 four
4 1900-01-01 10:00:00.000000 five
5 1900-01-01 10:00:00.000100 six
6 1900-01-01 10:00:00.000000 seven
7 1900-01-01 10:00:00.000000 eight
8 NaT nine # nothing after decimal point in raw data
9 1900-01-01 10:00:00.000000 ten
10 1900-01-01 10:00:00.000000 eleven
11 1900-01-01 10:00:00.000000 twelve
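The NaT in row 8 comes from a string that ends in a bare decimal point, which %f cannot match because it needs at least one digit. A minimal sketch for spotting such rows and repairing them before parsing; the repair rule (append a single zero) is my assumption about what the data should mean, and `errors="coerce"` is the newer spelling of `coerce=True`.

```python
import pandas as pd

raw = pd.Series(["10:00:00.0001", "10:00:00.0", "10:00:00."])

# First pass: anything that does not match the format becomes NaT.
parsed = pd.to_datetime(raw, format="%H:%M:%S.%f", errors="coerce")
failed = raw[parsed.isna()]
print(failed.tolist())

# Hypothetical fix-up: add a zero after a trailing decimal point.
repaired = raw.str.replace(r"\.$", ".0", regex=True)
reparsed = pd.to_datetime(repaired, format="%H:%M:%S.%f", errors="coerce")
print(reparsed.isna().sum())
```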
When attempting this with the same DataFrame but with dates included in the date column, the same thing happened:
df25
Out[38]:
date mesg
0 2015-09-10 10:00:00.000000000 one
1 2015-09-11 10:00:00.00000000 two
2 2015-09-12 10:00:00.0000000 three
3 2015-09-13 10:00:00.000000 four
4 2015-09-14 10:00:00.00000 five
5 2015-09-15 10:00:00.0001 six
6 2015-09-16 10:00:00.00 seven
7 2015-09-17 10:00:00.0 eight
8 2015-09-18 10:00:00. nine
9 2015-09-19 10:00:00.000000000 ten
10 2015-09-20 10:00:00.000000000 eleven
11 2015-09-21 10:00:00.00000000 twelve
df25['date'] = pd.to_datetime(df25['date'],format="%Y-%m-%d %H:%M:%S.%f",coerce=True)
df25
Out[39]:
date mesg
0 2015-09-10 10:00:00.000000 one
1 2015-09-11 10:00:00.000000 two
2 2015-09-12 10:00:00.000000 three
3 2015-09-13 10:00:00.000000 four
4 2015-09-14 10:00:00.000000 five
5 2015-09-15 10:00:00.000100 six
6 2015-09-16 10:00:00.000000 seven
7 2015-09-17 10:00:00.000000 eight
8 NaT nine # nothing after decimal point in raw data
9 2015-09-19 10:00:00.000000 ten
10 2015-09-20 10:00:00.000000 eleven
11 2015-09-21 10:00:00.000000 twelve
When no raw datum has a non-zero digit after the decimal point, formatting a DataFrame column with "%H:%M:%S.%f" may display every value uniformly with no fractional digits at all (as in the output below), even when that increases or decreases the resolution of some raw data:
df5 = pd.read_csv('test3.csv')
df5
Out[40]:
date mesg
0 10:00:00.000 one
1 10:00:00.0 two
2 10:00:00.000 three
3 10:00:00.000 four
4 10:00:00.00 five
5 10:00:00.000 six
6 10:00:00.00 seven
7 10:00:00.0 eight
8 10:00:00.0 nine
9 10:00:00.000000000 ten
10 10:00:00.000 eleven
11 10:00:00.000 twelve
df5['date'] = pd.to_datetime(df5['date'],format="%H:%M:%S.%f",coerce=True)
df5
Out[41]:
date mesg
0 1900-01-01 10:00:00 one
1 1900-01-01 10:00:00 two
2 1900-01-01 10:00:00 three
3 1900-01-01 10:00:00 four
4 1900-01-01 10:00:00 five
5 1900-01-01 10:00:00 six
6 1900-01-01 10:00:00 seven
7 1900-01-01 10:00:00 eight
8 1900-01-01 10:00:00 nine
9 1900-01-01 10:00:00 ten
10 1900-01-01 10:00:00 eleven
11 1900-01-01 10:00:00 twelve
The same thing happened when doing this test with the same DataFrame but with dates included in the date column:
df45
Out[42]:
date mesg
0 2015-09-10 10:00:00.000 one
1 2015-09-11 10:00:00.0 two
2 2015-09-12 10:00:00.000 three
3 2015-09-13 10:00:00.000 four
4 2015-09-14 10:00:00.00 five
5 2015-09-15 10:00:00.000 six
6 2015-09-16 10:00:00.00 seven
7 2015-09-17 10:00:00.0 eight
8 2015-09-18 10:00:00.0 nine
9 2015-09-19 10:00:00.000000000 ten
10 2015-09-20 10:00:00.000 eleven
11 2015-09-21 10:00:00.000 twelve
df45['date'] = pd.to_datetime(df45['date'],format="%Y-%m-%d %H:%M:%S.%f",coerce=True)
df45
Out[43]:
date mesg
0 2015-09-10 10:00:00 one
1 2015-09-11 10:00:00 two
2 2015-09-12 10:00:00 three
3 2015-09-13 10:00:00 four
4 2015-09-14 10:00:00 five
5 2015-09-15 10:00:00 six
6 2015-09-16 10:00:00 seven
7 2015-09-17 10:00:00 eight
8 2015-09-18 10:00:00 nine
9 2015-09-19 10:00:00 ten
10 2015-09-20 10:00:00 eleven
11 2015-09-21 10:00:00 twelve