Pandas format datetime with many different date types

Question

I am trying to format the 'Data' column so that every date follows the same pattern.

The formats I have are:

1/30/20 16:00
1/31/2020 23:59
2020-02-02T23:43:02

This is the code that builds the DataFrame:

import requests
import pandas as pd
import numpy as np
url = "https://github.com/CSSEGISandData/COVID-19/tree/master/csse_covid_19_data/csse_covid_19_daily_reports"
csv_only  = [i.split("=")[1][1:-1] for i in requests.get(url).text.split(" ") if '.csv' in i and 'title' in i]

combo = [pd.read_csv(url.replace("github","raw.githubusercontent").replace("/tree/","/")+"/"+f) for f in csv_only]

one_df = pd.concat(combo,ignore_index=True)

one_df["País"] = one_df["Country/Region"].fillna(one_df["Country_Region"])
one_df["Data"] = one_df["Last Update"].fillna(one_df["Last_Update"])

I tried adding the code below, but it doesn't produce the result I wanted:

pd.to_datetime(one_df['Data'])
one_df.style.format({"Data": lambda t: t.strftime("%m/%d/%Y")})

Any help?

Update

This is the complete code, but it doesn't work. Many values are printed from the exception handler, in several different date formats.

import requests
import pandas as pd
import numpy as np
from datetime import datetime
url = "https://github.com/CSSEGISandData/COVID-19/tree/master/csse_covid_19_data/csse_covid_19_daily_reports"
csv_only  = [i.split("=")[1][1:-1] for i in requests.get(url).text.split(" ") if '.csv' in i and 'title' in i]

combo = [pd.read_csv(url.replace("github","raw.githubusercontent").replace("/tree/","/")+"/"+f) for f in csv_only]

one_df = pd.concat(combo,ignore_index=True)

df = pd.DataFrame()
DATE_FORMATS = ["%m/%d/%y %H:%M", "%m/%d/%Y %H:%M", "%Y-%m-%dT%H:%M:%S", "%Y-%m-%d %H:%M:%S", "%Y-%m-%d %H:%M:%S", "%Y-%m-%d  %H:%M:%S"]

df["Região"] = one_df["Province/State"].fillna(one_df["Admin2"])
df["País"] = one_df["Country/Region"].fillna(one_df["Country_Region"])
df["Data"] = one_df["Last Update"].fillna(one_df["Last_Update"])
df["Confirmados"] = one_df["Confirmed"]
df["Mortes"] = one_df["Deaths"]
df["Recuperados"] = one_df["Recovered"]

def parse(x_):
    for fmt in DATE_FORMATS :
        try:
            tmp = datetime.strptime(x_, fmt).strftime("%m/%d/%Y")
            return tmp
        except ValueError:
            print(x_)

pd.to_datetime(df['Data'])
df['Data'] = df['Data'].apply(lambda x: parse(x))

#df['Data'].strftime('%m/%d/%Y')
#df['Data'] = df['Data'].map(lambda x: x.strftime('%m/%d/%Y') if x else '')

df.to_excel(r'C:\Users\guilh\Downloads\Covid2\Covid-19.xlsx', index=False,  encoding="utf8")
print(df)

Recommended answer

from datetime import datetime
import pandas as pd

You could save all possible formats in a list:

DATE_FORMATS = ["%Y-%m-%d %H:%M:%S", "%Y-%m-%dT%H:%M:%S", "%m/%d/%y %H:%M", "%m/%d/%Y %H:%M"]

Define a function that loops through the formats and tries to parse the value with each one. (This fixes a bug in the question's code: the failure should be reported after the for loop, once every format has been tried, not inside it.)

issues = set()
def parse(x_):
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(x_, fmt).strftime("%m/%d/%Y")
        except ValueError:
            pass
    issues.add(x_)


sample = ["1/30/20 16:00", "1/31/2020 23:59", "2020-02-02T23:43:02"]

df = pd.DataFrame({'data': sample})
df['data'] = df['data'].apply(lambda x: parse(x))

# The column is named 'data' (lowercase) in this sample DataFrame.
assert df['data'].isna().sum() == len(issues) == 0, "Issues observed, nulls observed in dataframe"

print(df)

Output

         data
0  01/30/2020
1  01/31/2020
2  02/02/2020
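
As an alternative to calling strptime row by row, the same idea can be sketched with pd.to_datetime, trying each format on the whole column and keeping the first successful parse per row. This is only a sketch under assumptions: parse_column is a hypothetical helper name, not part of the original answer, and it assumes every value matches at most one of the listed formats.

```python
import pandas as pd

DATE_FORMATS = ["%Y-%m-%d %H:%M:%S", "%Y-%m-%dT%H:%M:%S", "%m/%d/%y %H:%M", "%m/%d/%Y %H:%M"]

def parse_column(raw):
    """Hypothetical helper: parse a column of strings against several formats."""
    # errors="coerce" turns values that do not match the format into NaT
    parsed = pd.to_datetime(raw, format=DATE_FORMATS[0], errors="coerce")
    for fmt in DATE_FORMATS[1:]:
        attempt = pd.to_datetime(raw, format=fmt, errors="coerce")
        # Fill only the rows that are still NaT with this format's result
        parsed = parsed.fillna(attempt)
    return parsed

sample = pd.Series(["1/30/20 16:00", "1/31/2020 23:59", "2020-02-02T23:43:02"])
parsed = parse_column(sample)

# Rows that matched no format remain NaT and can be inspected afterwards
unparsed = sample[parsed.isna()]

formatted = parsed.dt.strftime("%m/%d/%Y")
print(formatted.tolist())
```

Keeping the intermediate result as real datetime values (rather than strings) means the column can still be sorted or filtered by date before it is formatted for display.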

If apply() comes across a date format that hasn't been defined in the list, parse() falls through the loop without returning anything, so the value for that row becomes None and the original string is recorded in issues.
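
To illustrate that behaviour, here is a small sketch that reuses the parse() function from the answer. The value "30.01.2020 16:00" is a made-up example of a format missing from the list; it does not come from the original dataset.

```python
from datetime import datetime
import pandas as pd

DATE_FORMATS = ["%Y-%m-%d %H:%M:%S", "%Y-%m-%dT%H:%M:%S", "%m/%d/%y %H:%M", "%m/%d/%Y %H:%M"]

issues = set()

def parse(x_):
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(x_, fmt).strftime("%m/%d/%Y")
        except ValueError:
            pass
    # No format matched: remember the raw value and implicitly return None
    issues.add(x_)

# The second value is a hypothetical format that is not in DATE_FORMATS
df = pd.DataFrame({"data": ["1/30/20 16:00", "30.01.2020 16:00"]})
df["data"] = df["data"].apply(parse)

null_count = int(df["data"].isna().sum())
print(null_count)  # number of rows that could not be parsed
print(issues)      # the raw strings that need a new format in the list
```

Inspecting issues after the run shows which formats still need to be added to DATE_FORMATS. Note that a missing value (NaN) is a float, so strptime raises TypeError rather than ValueError for it; parse() as written does not catch that case.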
