Can I show the different death penalty methods and predict future years?


Problem description


I would like to be able to predict the rise/fall in death penalties for the dataset below. This is US death penalty data from 1976 onwards, found at: https://www.kaggle.com/usdpic/execution-database. I want the y axis to show the number of death penalties over the years, using different colours for the different methods, with the x axis showing the years from 1999 onwards. This is my code so far:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import datetime as dt

from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

# df holds the Kaggle execution data (the loading step is not shown)
df['Date'] = pd.to_datetime(df['Date'])

# Keep only executions from 1999 onwards
res = df[~(df['Date'] < '1999-01-01')]

print(res)
# Number of executions on each distinct date (ordered by count, not by date)
Count = res['Date'].value_counts()
print(Count)
# Note: this is built from the full df, not from the filtered res
time= df['Date'] = pd.to_datetime(df['Date'])
df['Date']=df['Date'].map(dt.datetime.toordinal)
print (time)
x = np.array(time)
y = np.array(Count)
xtrain, xtest, ytrain, ytest = train_test_split(x,y,test_size=1/3, random_state=0)

But I'm getting this error:

 ValueError: Found input variables with inconsistent numbers of samples: [1442, 834]
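The two numbers in the error are the lengths of `x` and `y`. `time` is built from the full, unfiltered `df`, so `x` has one entry per execution (1442), while `Count` is `value_counts()` on the filtered `res`, so `y` has one entry per distinct date from 1999 onwards (834). `value_counts()` also orders by frequency rather than by date, so even equal lengths would pair the wrong x and y values. A minimal sketch of building aligned inputs, assuming only a `Date` column as in the question and using a small made-up frame in place of the Kaggle CSV, is to count executions per year and use the year as the single feature:

```python
import numpy as np
import pandas as pd

# Made-up stand-in for the Kaggle CSV: one row per execution.
df = pd.DataFrame({
    "Date": ["1998-05-01", "1999-01-10", "1999-03-02", "2000-07-15",
             "2000-08-01", "2000-08-01", "2001-02-20"],
})
df["Date"] = pd.to_datetime(df["Date"])

# Filter first, then derive BOTH x and y from the same filtered frame.
res = df[df["Date"].dt.year >= 1999]

# Executions per calendar year; sort_index() puts the years in order
# (value_counts() on its own sorts by frequency).
per_year = res["Date"].dt.year.value_counts().sort_index()

# scikit-learn expects a 2-D feature array of shape (n_samples, n_features).
x = per_year.index.to_numpy().reshape(-1, 1)
y = per_year.to_numpy()

print(x.ravel(), y)
```

`x` and `y` now have one row per year and can go into `train_test_split(x, y, ...)` as in the question. A year with no executions would be missing from `value_counts()`; `per_year.reindex(range(1999, 2017), fill_value=0)` would add such years back as zeros.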

Solution

It sounds like what you want is to reshape your data so that you have a time series for each "Method", which you can then use in a predictive model. It's worth pointing out that the distribution of "Method" is heavily skewed (the counts below cover 1999 onwards), so it will be very difficult, if not impossible, to forecast anything other than lethal injection:

df['Method'].value_counts()

# Lethal Injection    923
# Electrocution        17
# Gas Chamber           1
# Firing Squad          1
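With counts this lopsided, one option before any modelling is to set aside the methods that have too few executions to fit anything. A minimal sketch using the counts listed above; the cut-off of 50 executions is a hypothetical choice, not something from the answer:

```python
import pandas as pd

# Method counts from 1999 onwards, as listed above.
counts = pd.Series({"Lethal Injection": 923, "Electrocution": 17,
                    "Gas Chamber": 1, "Firing Squad": 1})

MIN_EXECUTIONS = 50  # hypothetical threshold

# Methods above the threshold get a model; the rest are only reported.
modelable = counts[counts >= MIN_EXECUTIONS].index.tolist()
too_sparse = counts[counts < MIN_EXECUTIONS].index.tolist()
print("forecast:", modelable)
print("too sparse:", too_sparse)
```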

Here is a solution that will help you reshape your data to get time series data for each "Method" (I've added a bit more of an explanation at the end):

df['Date'] = pd.to_datetime(df['Date'])

df = df[df['Date'].dt.year >= 1999]

df = df.set_index('Date')

df2 = df.groupby('Method').resample('1M').agg('count')['Name'].to_frame()

df2 = df2.reset_index().pivot(index='Date',columns='Method',values='Name').fillna(0)

df2.plot()

We can check that the new shape of the data gives us the correct number of "Method" counts:

df2.sum()

# Method
# Electrocution        17.0
# Firing Squad          1.0
# Gas Chamber           1.0
# Lethal Injection    923.0
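The answer stops at reshaping, but the question also asks about predicting future years. One deliberately naive option is to sum the monthly `df2` columns into calendar years and extrapolate a straight-line trend. In the sketch below, `df2` is a small made-up frame in the same shape (not the real data), the three-year horizon is arbitrary, and `np.polyfit(..., deg=1)` stands in for the `LinearRegression` the question imports; for a single feature both fit the same ordinary-least-squares line. Given the counts above, only the lethal injection series has enough data for even this:

```python
import numpy as np
import pandas as pd

# Made-up monthly counts shaped like df2 (DatetimeIndex, one column per
# method). These numbers are NOT from the dataset.
df2 = pd.DataFrame(
    {"Electrocution": [0, 1, 0, 0, 0, 0],
     "Lethal Injection": [30, 20, 25, 17, 20, 14]},
    index=pd.to_datetime(["2011-03-31", "2011-09-30", "2012-03-31",
                          "2012-09-30", "2013-03-31", "2013-09-30"]),
)

# Sum the monthly rows into calendar years.
yearly = df2.groupby(df2.index.year).sum()

# Fit y = slope * year + intercept for one method by least squares.
x = yearly.index.to_numpy()
y = yearly["Lethal Injection"].to_numpy()
slope, intercept = np.polyfit(x, y, deg=1)

# Extrapolate the line over the next three years; counts cannot be negative.
future = np.arange(x.max() + 1, x.max() + 4)
forecast = pd.Series(slope * future + intercept, index=future).clip(lower=0)
print(forecast.round(1))
```

A straight line will eventually cross zero, which is why the forecast is clipped at 0.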

Explained

df['Date'] = pd.to_datetime(df['Date'])

# Keep only rows where the year is 1999 or later
df = df[df['Date'].dt.year >= 1999]

# Set the index to be the datetime
df = df.set_index('Date')

# This bit gets interesting - we're grouping by each method and then resampling
# within each group so that we get one row per month, holding the number of
# rows (executions) for that method that fall in that month. 'count' counts the
# non-missing values in every column, so any fully populated column gives the
# number of executions; we arbitrarily take the first one, which is 'Name'
# Note: you can change the resampling frequency to any time period you want;
# I've just chosen a month as it is granular enough to cover the whole period
 
df2 = df.groupby('Method').resample('1M').agg('count')['Name'].to_frame()

#                              Name
# Method           Date            
# Electrocution    1999-06-30     1
#                  1999-07-31     1
#                  1999-08-31     1
#                  1999-09-30     0
#                  1999-10-31     0
# ...                           ...
# Lethal Injection 2016-08-31     0
#                  2016-09-30     0
#                  2016-10-31     2
#                  2016-11-30     1
#                  2016-12-31     2

df2 = df2.reset_index().pivot(index='Date',columns='Method',values='Name').fillna(0)

# Method      Electrocution  Firing Squad  Gas Chamber  Lethal Injection
# Date                                                                  
# 1999-01-31            0.0           0.0          0.0              10.0
# 1999-02-28            0.0           0.0          0.0              12.0
# 1999-03-31            0.0           0.0          1.0               7.0
# 1999-04-30            0.0           0.0          0.0              10.0
# 1999-05-31            0.0           0.0          0.0               6.0
# ...                   ...           ...          ...               ...
# 2016-08-31            0.0           0.0          0.0               0.0
# 2016-09-30            0.0           0.0          0.0               0.0
# 2016-10-31            0.0           0.0          0.0               2.0
# 2016-11-30            0.0           0.0          0.0               1.0
# 2016-12-31            0.0           0.0          0.0               2.0
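To see the groupby/resample/pivot steps above on something small enough to check by hand, here is the same pipeline on a made-up five-row frame. It swaps `'1M'` for `'MS'` (month start): pandas 2.2 deprecates the `'M'` alias in favour of `'ME'`, while `'MS'` works across versions, at the cost of labelling each month by its first day instead of its last:

```python
import pandas as pd

# Made-up rows, one per execution, with the Name/Date/Method columns used above.
df = pd.DataFrame({
    "Name": ["A", "B", "C", "D", "E"],
    "Date": ["1999-01-05", "1999-01-20", "1999-03-02", "1999-03-15", "1999-04-01"],
    "Method": ["Lethal Injection", "Lethal Injection", "Gas Chamber",
               "Lethal Injection", "Lethal Injection"],
})
df["Date"] = pd.to_datetime(df["Date"])
df = df[df["Date"].dt.year >= 1999].set_index("Date")

# Rows per method per month, within each method's own date range.
df2 = df.groupby("Method").resample("MS").agg("count")["Name"].to_frame()

# One column per method; months outside a method's range are NaN, then 0.
df2 = df2.reset_index().pivot(index="Date", columns="Method", values="Name").fillna(0)
print(df2)
```

Gas Chamber only has a row in March, so `resample` yields a single month for it and `pivot` leaves `NaN` in the other months, which `fillna(0)` turns into zeros. Lethal Injection's empty February comes out of `resample` as `0` directly.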
