How to plot Pandas datetime series in Seaborn distplot?


Problem description

I have a pandas dataframe with a datetime column. I would like to plot the distribution of the rows according to that date column, but I'm currently getting an unhelpful error. I have:

df['Date'] = pd.to_datetime(df['Date'], errors='raise')
s = sns.distplot(df['Date'])

This raises the error:

TypeError: ufunc add cannot use operands with types dtype('<M8[ns]') and dtype('<M8[ns]')

If I change the column I'm plotting to numeric data, it all works fine. How can I get the datetime column to behave nicely? I can't find much in the docs about what I think I need. Any and all help appreciated.

Below is the result of df.head(2); I have removed some columns for security reasons:

               Date                 
2812         2016-03-05
2813         2016-03-05

The column (when taken as a series) has these properties:

Name: Date, dtype: datetime64[ns]

Recommended answer

I came across this question while having the same problem myself. As mentioned in the comments, it seems that seaborn's distplot doesn't support dates. Unfortunately, I could not find anything in the official documentation to support this claim.

I found two ways to deal with this problem. Neither is perfect, but they are the best I found.

Option 1: Convert dates to numbers

Convert the dates to some numeric metric and work with that. distplot works with numbers, so if each date is represented by a number, we will be okay. The mapping between dates and numbers is somewhat like using a MinMax scaler. For example, we can set "2017-01-01" as 0 and "2020-06-06" as 1, and map all dates between them to values in the range [0,1].
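The [0,1] mapping described above could be sketched roughly as follows. This is only an illustration, not code from the original answer: the three sample dates and the variable names (dates, lo, hi, scaled) are assumptions, and the series is assumed to already be of dtype datetime64.

```python
import pandas as pd

# Illustrative dates only; the endpoints match the example in the text.
dates = pd.to_datetime(pd.Series(["2017-01-01", "2018-09-19", "2020-06-06"]))

lo, hi = dates.min(), dates.max()

# Subtracting datetimes yields timedeltas; dividing two timedeltas yields
# plain floats, so the earliest date maps to 0.0 and the latest to 1.0.
scaled = (dates - lo) / (hi - lo)
print(scaled)
```

The resulting float series can be passed to a numeric-only plotting function, at the cost of an x-axis that no longer reads as dates.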

Which range of numbers to use depends on the range of your data; it could be days, months, years, etc.

I'll demonstrate this approach with a toy example.

import pandas as pd
import datetime as dt

original_dates = ["2016-03-05", "2016-03-05", "2016-02-05", "2016-02-05", "2016-02-05", "2014-03-05"]
dates_list = [dt.datetime.strptime(date, '%Y-%m-%d').date() for date in original_dates]

df = pd.DataFrame({"Date":dates_list})

The dataframe now looks like this:

         Date
0  2016-03-05
1  2016-03-05
2  2016-02-05
3  2016-02-05
4  2016-02-05
5  2014-03-05

(Not the best way to enter dates into a dataframe, of course, but it doesn't matter how.)

Now I create a new column that holds the difference in days from the minimum date:

df["NewDate"] = df["Date"] - dt.date(2014,3,5)
df["NewDate"] = df["NewDate"].apply(lambda x: x.days)

Result:

         Date  NewDate
0  2016-03-05      731
1  2016-03-05      731
2  2016-02-05      702
3  2016-02-05      702
4  2016-02-05      702
5  2014-03-05        0

Notice that I hard-coded the minimum date. You can find the minimum programmatically instead of hard-coding it; I just wanted to get through this part as quickly as possible.

Now we can use distplot on our new column:
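One possible way to avoid the hard-coded minimum is to take it from the column itself. The sketch below assumes the Date column holds datetime64 values (for example after pd.to_datetime) rather than the plain datetime.date objects used in the toy example above; the column names are kept from that example.

```python
import pandas as pd

# Same toy dates as above, but stored as datetime64 (an assumption).
df = pd.DataFrame({
    "Date": pd.to_datetime([
        "2016-03-05", "2016-03-05", "2016-02-05",
        "2016-02-05", "2016-02-05", "2014-03-05",
    ])
})

# Offset in whole days from the earliest date found in the data.
df["NewDate"] = (df["Date"] - df["Date"].min()).dt.days
print(df)
```

With datetime64 values the .dt.days accessor replaces the apply/lambda step.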

import seaborn as sns
sns.set()
ax = sns.distplot(df['NewDate'])

Output: (figure not included: distribution plot of NewDate, with day counts on the x-axis)

As you can see, it shows days instead of dates. For my own problem it was fine to show it that way. If you want to show dates instead, an extra step is needed: display xticks that are a function of the x-axis values rather than the data itself. See: Example with dates (pandas, matplotlib).
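As a rough sketch of that extra step, matplotlib's FuncFormatter can relabel the numeric ticks as dates. This is not the linked example: the helper name offset_to_label is hypothetical, the day offsets are copied from the NewDate column above, and a plain ax.hist call stands in for the seaborn plot.

```python
import datetime as dt

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd
from matplotlib.ticker import FuncFormatter

min_date = dt.date(2014, 3, 5)  # the reference date used for NewDate
day_offsets = pd.Series([731, 731, 702, 702, 702, 0], name="NewDate")


def offset_to_label(value, pos=None):
    """Turn a day offset on the x-axis back into an ISO date string."""
    return (min_date + dt.timedelta(days=int(round(value)))).isoformat()


fig, ax = plt.subplots()
ax.hist(day_offsets, bins=10)  # stand-in for the distribution plot
ax.xaxis.set_major_formatter(FuncFormatter(offset_to_label))
fig.autofmt_xdate()  # rotate the date labels so they do not overlap
fig.savefig("dates_hist.png")
```

The plotted data stays numeric; only the tick labels are converted, so the same formatter should work on any axes whose x values are day offsets from min_date.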

As I said earlier, I scaled by the difference in days, but you can do the same with months or years, depending on the data.

Option 2: Use a histogram directly instead of seaborn's distplot

The question "Can Pandas plot a histogram of dates?" has an answer showing how to plot a histogram of dates using pandas's groupby.

It's not the same as distplot, but it can be a close-enough solution (as distplot is ultimately based on matplotlib's hist).
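A groupby-based histogram along those lines might look like the sketch below. It is not copied from the linked answer: grouping by calendar month via dt.to_period("M") is one assumed choice of bin, and the data is the toy example from Option 1 stored as datetime64.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import pandas as pd

df = pd.DataFrame({
    "Date": pd.to_datetime([
        "2016-03-05", "2016-03-05", "2016-02-05",
        "2016-02-05", "2016-02-05", "2014-03-05",
    ])
})

# Count rows per calendar month; each month acts as one histogram bin.
counts = df.groupby(df["Date"].dt.to_period("M")).size()

ax = counts.plot(kind="bar")
ax.set_xlabel("Month")
ax.set_ylabel("Number of rows")
ax.figure.savefig("dates_by_month.png")
```

Note that months with no rows are simply absent from the bar chart, so the x-axis is not a continuous time scale.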
