Plotting in Python with congruent x-values
Problem description
Goal: Get two different names on the same graph. Make sure that the years line up. Note that the file has some years twice (when a name has been given to both girls and boys); in that case, add the values for all split years per name.
Current status: one name is working. Using two names changes the index to the row number instead of the year.
Y: 'prop' == the proportion of the name (regardless of sex) to all names given that year to newborns.
X: 'year' == the year of the birth certificate
The CSV is at https://raw2.github.com/hadley/data-baby-names/master/baby-names.csv
CODE:
import pandas
import pylab
import matplotlib
from pandas import *
from pylab import *
from matplotlib import *
names = read_csv(r'C:\Users\joe\Documents\Python\baby-names2.csv')
import matplotlib as mpl
import matplotlib.pyplot as plt
resultAry = names[names.name.isin(['Joseph', 'Nancy'])].set_index(['year','name'])['prop']
print (resultAry.head())
print ('***************')
resultAry = resultAry.groupby(level='name')
print (resultAry.head())
resultAry = resultAry.plot()
plt.show()
Thanks, everyone.
The graphs do not line up because there are years with girls named 'Joseph' and boys named 'Nancy'.
============UPDATE============== 2/13/2014
In [12]:
import pandas
import pylab
import matplotlib
from pandas import *
from pylab import *
from matplotlib import *
names = read_csv(r'C:\Users\joe\Documents\Python\baby-names2.csv')
print (names.head())
import matplotlib as mpl
import matplotlib.pyplot as plt
userNames = ['Joseph', 'Nancy']
resultAry = names[names.name.isin(userNames)].set_index(['year','name','sex'])['prop']
resultAry = resultAry.groupby(level='name')
print (resultAry.head())
print ('***************')
resultAry = resultAry.groupby(level='year')
print (resultAry)
#resultAry = resultAry.plot()
year name prop sex soundex
0 1880 John 0.081541 boy J500
1 1880 William 0.080511 boy W450
2 1880 James 0.050057 boy J520
3 1880 Charles 0.045167 boy C642
4 1880 George 0.043292 boy G620
name year name sex
Joseph 1880 Joseph boy 0.022229
1881 Joseph boy 0.022679
1882 Joseph boy 0.021879
1883 Joseph boy 0.022367
1884 Joseph boy 0.022062
Nancy 1889 Nancy boy 0.000059
1933 Nancy boy 0.000044
1934 Nancy boy 0.000044
1935 Nancy boy 0.000042
1936 Nancy boy 0.000059
dtype: float64
***************
name
Joseph [(1880, [0.022229, 0.000102]), (1881, [0.02267...
Nancy [(1880, [0.004211]), (1881, [0.004339]), (1882...
dtype: object
Next I got it to add the two values, but I am still getting a formatting error.
arr = list(resultAry['Joseph'])
for i, (year, numbers) in enumerate(arr):
    arr[i][1][:] = [ sum(numbers) ]
print (arr)
[(1880, year name sex
1880 Joseph boy 0.022331
girl 0.022331
Name: Joseph, dtype: float64), (1881, year...
Any help or advice is greatly appreciated.
I'm guessing you're using the Census baby names dataset? The one used in Wes McKinney's book? In the future it's a good idea to include a sample from your dataset so that others can reproduce your work.
I've just got 2006 - 2010 read into a DataFrame, like this.
In [75]: df.head()
Out[75]:
name sex num year
0 Emily F 21365 2006
1 Emma F 19092 2006
2 Madison F 18599 2006
3 Isabella F 18200 2006
4 Ava F 16925 2006
Added in prop, as defined above:
In [26]: df['prop'] = df.groupby('year')['num'].transform(lambda x: x / x.sum())
In [26]: df
Out[26]:
name sex num year prop
0 Emily F 21365 2006 0.005413
1 Emma F 19092 2006 0.004837
2 Madison F 18599 2006 0.004713
3 Isabella F 18200 2006 0.004611
4 Ava F 16925 2006 0.004288
5 Abigail F 15615 2006 0.003956
I'd suggest a different approach to get the counts by name and year. I think it will make plotting easier. Instead of making two dataframes, one for each name, do it at the same time.
In [48]: df.query('name in ["Joeseph", "Nancy"]')
Out[48]:
name sex num year prop
323 Nancy F 1014 2006 0.000257
23206 Joeseph M 34 2006 0.000009
34401 Nancy F 896 2007 0.000225
57551 Joeseph M 39 2007 0.000010
69300 Nancy F 853 2008 0.000218
92066 Joeseph M 45 2008 0.000011
104394 Nancy F 663 2009 0.000174
127335 Joeseph M 34 2009 0.000009
139050 Nancy F 565 2010 0.000154
161863 Joeseph M 29 2010 0.000008
[10 rows x 5 columns]
Prior to pandas 0.13, where query is not available, you can use df[df.name.isin(['Joeseph', 'Nancy'])] instead.
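As a quick check that the two filters select the same rows, here is a minimal sketch on a toy frame. The data values and the variable names wanted, by_query and by_isin are made up for illustration and are not part of the original dataset or code.

```python
import pandas as pd

# Toy stand-in for the baby-names frame; the values are illustrative only.
df = pd.DataFrame({
    "name": ["Nancy", "Joeseph", "Emily", "Nancy"],
    "year": [2006, 2006, 2006, 2007],
    "prop": [0.000257, 0.000009, 0.005413, 0.000225],
})

wanted = ["Joeseph", "Nancy"]  # hypothetical list of names to keep

# query() form: @wanted refers to the local Python variable
by_query = df.query("name in @wanted")

# Boolean-mask form, which also works on older pandas versions
by_isin = df[df["name"].isin(wanted)]

print(by_query.equals(by_isin))
```

Both forms keep the original row labels, so either result can be passed on to set_index in the same way.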
Since you already have prop calculated, we don't need any further groupbys (this is a bit simpler than what I had before):
In [42]: s = df.query('name in ["Joeseph", "Nancy"]').set_index(['year', 'name'])['prop']
In [46]: ax = s.unstack().plot()
With this method you shouldn't have to worry about aligning the x-values. It's already done for you.
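One caveat, as far as I can tell: the question's file can list the same name twice in a year (once per sex), and unstack() needs each (year, name) pair to be unique. A minimal sketch of one way to handle that is to sum prop over sex first and then unstack. The toy rows below only loosely follow the question's printed output, and the names user_names, subset and wide are assumptions for illustration.

```python
import pandas as pd

# Toy rows shaped like the question's CSV (year, name, sex, prop).
# Joseph appears twice in 1880, once as 'boy' and once as 'girl'.
df = pd.DataFrame({
    "year": [1880, 1880, 1880, 1881, 1881],
    "name": ["Joseph", "Joseph", "Nancy", "Joseph", "Nancy"],
    "sex":  ["boy", "girl", "girl", "boy", "girl"],
    "prop": [0.022229, 0.000102, 0.004211, 0.022679, 0.004339],
})

user_names = ["Joseph", "Nancy"]  # hypothetical selection
subset = df[df["name"].isin(user_names)]

# Sum over sex so every (year, name) pair occurs once,
# then move the names into columns; the index is the year.
wide = subset.groupby(["year", "name"])["prop"].sum().unstack("name")

print(wide)
# wide.plot() would then draw one line per name on a shared year axis
# (this needs matplotlib installed).
```

Because every column of wide shares the same year index, the lines are aligned on the x-axis without any manual work; a year missing for one name simply shows up as NaN in that column.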