Plotting in Python with congruent x-values

Question

Goal: Get two different names on the same graph, and make sure that the years line up. Note that the file has some years twice (when a name has been given to both girls and boys); in that case, add the values for all split years per name.

Current status: one name works. With two names, the index changes to the row number instead of the year.

Y: 'prop' == the name's share (regardless of sex) of all names given to newborns that year.

X: 'year' == the year of the birth certificate

The CSV is available at https://raw2.github.com/hadley/data-baby-names/master/baby-names.csv

CODE:

import pandas
import pylab
import matplotlib
from pandas import *
from pylab import *
from matplotlib import *

names = read_csv(r'C:\Users\joe\Documents\Python\baby-names2.csv')


import matplotlib as mpl
import matplotlib.pyplot as plt



resultAry = names[names.name.isin(['Joseph', 'Nancy'])].set_index(['year','name'])['prop']

print (resultAry.head())
print ('***************')
resultAry = resultAry.groupby(level='name')
print (resultAry.head())
resultAry = resultAry.plot()




plt.show()

Thanks, everyone.

The graphs do not line up because there are years with girls named 'Joseph' and boys named 'Nancy'.

============UPDATE============== 2/13/2014

In [12]:

import pandas
import pylab
import matplotlib
from pandas import *
from pylab import *
from matplotlib import *

names = read_csv(r'C:\Users\joe\Documents\Python\baby-names2.csv')
print (names.head())

import matplotlib as mpl
import matplotlib.pyplot as plt

userNames = ['Joseph', 'Nancy']

resultAry = names[names.name.isin(userNames)].set_index(['year','name','sex'])['prop']
resultAry = resultAry.groupby(level='name')
print (resultAry.head())
print ('***************')
resultAry = resultAry.groupby(level='year')
print (resultAry)
#resultAry = resultAry.plot()

   year     name      prop  sex soundex
0  1880     John  0.081541  boy    J500
1  1880  William  0.080511  boy    W450
2  1880    James  0.050057  boy    J520
3  1880  Charles  0.045167  boy    C642
4  1880   George  0.043292  boy    G620
name    year  name    sex
Joseph  1880  Joseph  boy    0.022229
        1881  Joseph  boy    0.022679
        1882  Joseph  boy    0.021879
        1883  Joseph  boy    0.022367
        1884  Joseph  boy    0.022062
Nancy   1889  Nancy   boy    0.000059
        1933  Nancy   boy    0.000044
        1934  Nancy   boy    0.000044
        1935  Nancy   boy    0.000042
        1936  Nancy   boy    0.000059
dtype: float64
***************
name
Joseph    [(1880, [0.022229, 0.000102]), (1881, [0.02267...
Nancy     [(1880, [0.004211]), (1881, [0.004339]), (1882...
dtype: object

Next I got them to add the two values, but I am still having a formatting error.

arr = list(resultAry['Joseph'])

for i, (year, numbers) in enumerate(arr):
    arr[i][1][:] = [ sum(numbers) ]
print (arr)

[(1880, year  name    sex 
1880  Joseph  boy     0.022331
              girl    0.022331
Name: Joseph, dtype: float64), (1881, year...

Any help or advice is greatly appreciated.

Answer

I'm guessing you're using the Census baby names dataset? The one used in Wes McKinney's book? In the future it's a good idea to include a sample from your dataset so that others can reproduce your work.

I've just got 2006 - 2010 read into a DataFrame, like this.

In [75]: df.head()
Out[75]: 
       name sex    num  year
0     Emily   F  21365  2006
1      Emma   F  19092  2006
2   Madison   F  18599  2006
3  Isabella   F  18200  2006
4       Ava   F  16925  2006

Added in prop as defined above:

In [26]: df['prop'] = df.groupby('year')['num'].transform(lambda x: x / x.sum())


In [26]: df
Out[26]: 
         name sex    num  year      prop
0       Emily   F  21365  2006  0.005413
1        Emma   F  19092  2006  0.004837
2     Madison   F  18599  2006  0.004713
3    Isabella   F  18200  2006  0.004611
4         Ava   F  16925  2006  0.004288
5     Abigail   F  15615  2006  0.003956
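
To make the transform step concrete, here is a minimal sketch on a tiny made-up table with the same column names (name, sex, num, year). The numbers are invented for illustration and are not from the real dataset; the point is only that transform returns one value per original row, so each year's proportions sum to 1.

```python
import pandas as pd

# Made-up sample data; only the column names mirror the DataFrame above.
df = pd.DataFrame({
    "name": ["Emily", "Emma", "Jacob", "Emily", "Jacob"],
    "sex":  ["F", "F", "M", "F", "M"],
    "num":  [60, 30, 10, 50, 50],
    "year": [2006, 2006, 2006, 2007, 2007],
})

# transform returns a result aligned with the original rows, so it can be
# assigned straight back as a new column.
df["prop"] = df.groupby("year")["num"].transform(lambda x: x / x.sum())

# Sanity check: within each year the proportions add up to 1.
totals = df.groupby("year")["prop"].sum()

print(df)
print(totals)
```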

I'd suggest a different approach to get the counts by name and year. I think it will make plotting easier. Instead of making two dataframes, one for each name, do it at the same time.

In [48]: df.query('name in ["Joeseph", "Nancy"]')
Out[48]: 
           name sex   num  year      prop
323       Nancy   F  1014  2006  0.000257
23206   Joeseph   M    34  2006  0.000009
34401     Nancy   F   896  2007  0.000225
57551   Joeseph   M    39  2007  0.000010
69300     Nancy   F   853  2008  0.000218
92066   Joeseph   M    45  2008  0.000011
104394    Nancy   F   663  2009  0.000174
127335  Joeseph   M    34  2009  0.000009
139050    Nancy   F   565  2010  0.000154
161863  Joeseph   M    29  2010  0.000008

[10 rows x 5 columns]

Prior to pandas 0.13, which introduced DataFrame.query, you can use df[df.name.isin(['Joeseph', 'Nancy'])] instead.
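
As a quick check that the two spellings select the same rows, the sketch below runs both on a small hand-built frame. The `wanted` variable is a name introduced here for illustration; the rows are a few values copied from the output above plus one extra name to filter out.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Nancy", "Joeseph", "Emily", "Nancy"],
    "year": [2006, 2006, 2006, 2007],
    "prop": [0.000257, 0.000009, 0.005413, 0.000225],
})

wanted = ["Joeseph", "Nancy"]  # hypothetical variable holding the names to keep

# query syntax: @ refers to a local Python variable.
by_query = df.query("name in @wanted")

# Boolean-mask syntax: works on any pandas version.
by_isin = df[df["name"].isin(wanted)]

print(by_query.equals(by_isin))
```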

Since you already have prop calculated, we don't need any further groupbys (this is a bit simpler than what I had before):

In [42]: s = df.query('name in ["Joeseph", "Nancy"]').set_index(['year', 'name'])['prop']

In [46]: ax = s.unstack().plot()
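
To see what unstack does before plotting, the following sketch builds the same kind of Series from four rows taken from the output above. The inner index level (name) becomes the columns and year stays as the row index, which is the shape that plot() draws as one line per column.

```python
import pandas as pd

df = pd.DataFrame({
    "year": [2006, 2006, 2007, 2007],
    "name": ["Joeseph", "Nancy", "Joeseph", "Nancy"],
    "prop": [0.000009, 0.000257, 0.000010, 0.000225],
})

# Series indexed by (year, name).
s = df.set_index(["year", "name"])["prop"]

# Move the 'name' level into the columns: one row per year, one column per name.
wide = s.unstack()

print(wide)
```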

With this method you shouldn't have to worry about aligning the x-values. It's already done for you.
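
One caveat, given the question's data: if a name appears for both sexes in the same year, the (year, name) index contains duplicates, and unstack() will raise an error about duplicate entries. A possible workaround, sketched here as an assumption rather than as part of the original answer, is to sum prop over each (year, name) pair first, which is also what the question asks for. The sample rows reuse values printed in the question; `user_names` is an illustrative variable name.

```python
import pandas as pd

# A few rows in the layout of the question's CSV (year, name, prop, sex).
names = pd.DataFrame({
    "year": [1880, 1880, 1880, 1881, 1881],
    "name": ["Joseph", "Joseph", "Nancy", "Joseph", "Nancy"],
    "prop": [0.022229, 0.000102, 0.004211, 0.022679, 0.004339],
    "sex":  ["boy", "girl", "girl", "boy", "girl"],
})
user_names = ["Joseph", "Nancy"]

subset = names[names["name"].isin(user_names)]

# Collapse the boy/girl rows by summing prop per (year, name), then pivot
# the names into columns so every name shares the same year index.
wide = subset.groupby(["year", "name"])["prop"].sum().unstack()

print(wide)
# wide.plot() would then draw one line per name against year (needs matplotlib).
```

Because the groupby leaves exactly one value per (year, name), the unstacked table has a single row per year and the lines share the same x-axis.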
