SciPy 中的 Tukey-Test 分组和绘图 [英] Tukey-Test Grouping and plotting in SciPy
问题描述
我正在尝试绘制 Tukey 检验的结果,但我正在努力根据 P 值将数据分组.这是我试图复制的 R 中的等效项.我一直在使用 SciPy 单向方差分析测试和 Tukey 测试统计模型,但无法以相同的方式完成这些组.
I'm trying to plot results from a Tukey test, but I am struggling with putting data into groups based on a P-Value. This is the equivalent in R which I am trying to replicate. I have been using the SciPy one-way ANOVA tests and the Tukey test statsmodel but can't get these groups done in the same way.
非常感谢任何帮助
我也刚刚发现了这个 我想在 python 中做的另一个 R 示例
推荐答案
我一直在努力做同样的事情.我找到了一篇论文,告诉你如何编码这些字母.
Hans-Peter Piepho (2004) An Algorithm for a Letter-Based Representation of All-Pairwise比较,Journal of Computational and Graphical Statistics, 13:2, 456-466, DOI: 10.1198/1061860043515
I have been struggling to do the same thing. I found a paper that tells you how to code the letters.
Hans-Peter Piepho (2004) An Algorithm for a Letter-Based Representation of All-Pairwise Comparisons, Journal of Computational and Graphical Statistics, 13:2, 456-466, DOI: 10.1198/1061860043515
编码有点棘手,因为您需要检查和复制列,然后合并列.我试图在colde中添加一些评论.我想出了一种方法,您可以在其中运行 tukeyhsd,然后根据结果计算字母.应该可以把它变成一个函数.或者希望是 tukeyhsd 的一部分.我的数据没有发布,但它是一列数据,然后是一列描述组.对我来说,这些团体是纽约市的五个行政区.您也可以在第一次更改评论并使用随机数据.
Doing the coding was a little tricky as you need to check and replicate columns and then combine columns. I tried to add some comments to the colde. I figured out a method where you can run tukeyhsd and then from the results compute the letters. It should be possible to turn this into a function. Or hopefully part of tukeyhsd. My data is not posted but it is a column of data and then a column describing the groups. The groups for me are the five boroughs of NYC. You can also just change the comments and use random data the first time.
# Read data. Comment out the next ones to use random data.
df=pd.read_excel('anova_test.xlsx')
#n=1000
#df = pd.DataFrame(columns=['Groups','Data'],index=np.arange(n))
#df['Groups']=np.random.randint(1, 4,size=n)
#df['Data']=df['Groups']*np.random.random_sample(size=n)
# define columns for data and then grouping
col_to_group='Groups'
col_for_data='Data'
#Now take teh data and regroup for anova
samples = [cols[1] for cols in df.groupby(col_to_group)[col_for_data]] #I am not sure how this works but it makes an numpy array for each group
f_val, p_val = stats.f_oneway(*samples) # I am not sure what this star does but this passes all the numpy arrays correctly
#print('F value: {:.3f}, p value: {:.3f}\n'.format(f_val, p_val))
# this if statement can be uncommmented if you don't won't to go furhter with out p<0.05
#if p_val<0.05: #If the p value is less than 0.05 it then does the tukey
mod = MultiComparison(df[col_for_data], df[col_to_group])
thsd=mod.tukeyhsd()
#print(mod.tukeyhsd())
#this is a function to do Piepho method. AN Alogrithm for a letter based representation of al-pairwise comparisons.
tot=len(thsd.groupsunique)
#make an empty dataframe that is a square matrix of size of the groups. #set first column to 1
df_ltr=pd.DataFrame(np.nan, index=np.arange(tot),columns=np.arange(tot))
df_ltr.iloc[:,0]=1
count=0
df_nms = pd.DataFrame('', index=np.arange(tot), columns=['names']) # I make a dummy dataframe to put axis labels into. sd stands for signifcant difference
for i in np.arange(tot): #I loop through and make all pairwise comparisons.
for j in np.arange(i+1,tot):
#print('i=',i,'j=',j,thsd.reject[count])
if thsd.reject[count]==True:
for cn in np.arange(tot):
if df_ltr.iloc[i,cn]==1 and df_ltr.iloc[j,cn]==1: #If the column contains both i and j shift and duplicat
df_ltr=pd.concat([df_ltr.iloc[:,:cn+1],df_ltr.iloc[:,cn+1:].T.shift().T],axis=1)
df_ltr.iloc[:,cn+1]=df_ltr.iloc[:,cn]
df_ltr.iloc[i,cn]=0
df_ltr.iloc[j,cn+1]=0
#Now we need to check all columns for abosortpion.
for cleft in np.arange(len(df_ltr.columns)-1):
for cright in np.arange(cleft+1,len(df_ltr.columns)):
if (df_ltr.iloc[:,cleft].isna()).all()==False and (df_ltr.iloc[:,cright].isna()).all()==False:
if (df_ltr.iloc[:,cleft]>=df_ltr.iloc[:,cright]).all()==True:
df_ltr.iloc[:,cright]=0
df_ltr=pd.concat([df_ltr.iloc[:,:cright],df_ltr.iloc[:,cright:].T.shift(-1).T],axis=1)
if (df_ltr.iloc[:,cleft]<=df_ltr.iloc[:,cright]).all()==True:
df_ltr.iloc[:,cleft]=0
df_ltr=pd.concat([df_ltr.iloc[:,:cleft],df_ltr.iloc[:,cleft:].T.shift(-1).T],axis=1)
count+=1
#I sort so that the first column becomes A
df_ltr=df_ltr.sort_values(by=list(df_ltr.columns),axis=1,ascending=False)
# I assign letters to each column
for cn in np.arange(len(df_ltr.columns)):
df_ltr.iloc[:,cn]=df_ltr.iloc[:,cn].replace(1,chr(97+cn))
df_ltr.iloc[:,cn]=df_ltr.iloc[:,cn].replace(0,'')
df_ltr.iloc[:,cn]=df_ltr.iloc[:,cn].replace(np.nan,'')
#I put all the letters into one string
df_ltr=df_ltr.astype(str)
df_ltr.sum(axis=1)
#print(df_ltr)
#print('\n')
#print(df_ltr.sum(axis=1))
#Now to plot like R with a violing plot
fig,ax=plt.subplots()
df.boxplot(column=col_for_data, by=col_to_group,ax=ax,fontsize=16,showmeans=True
,boxprops=dict(linewidth=2.0),whiskerprops=dict(linewidth=2.0)) #This makes the boxplot
ax.set_ylim([-10,20])
grps=pd.unique(df[col_to_group].values) #Finds the group names
grps.sort() # This is critical! Puts the groups in alphabeical order to make it match the plotting
props=dict(facecolor='white',alpha=1)
for i,grp in enumerate(grps): #I loop through the groups to make the scatters and figure out the axis labels.
x = np.random.normal(i+1, 0.15, size=len(df[df[col_to_group]==grp][col_for_data]))
ax.scatter(x,df[df[col_to_group]==grp][col_for_data],alpha=0.5,s=2)
name="{}\navg={:0.2f}\n(n={})".format(grp
,df[df[col_to_group]==grp][col_for_data].mean()
,df[df[col_to_group]==grp][col_for_data].count())
df_nms['names'][i]=name
ax.text(i+1,ax.get_ylim()[1]*1.1,df_ltr.sum(axis=1)[i],fontsize=10,verticalalignment='top',horizontalalignment='center',bbox=props)
ax.set_xticklabels(df_nms['names'],rotation=0,fontsize=10)
ax.set_title('')
fig.suptitle('')
fig.savefig('anovatest.jpg',dpi=600,bbox_inches='tight')
这篇关于SciPy 中的 Tukey-Test 分组和绘图的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!