Using the mca package in Python

Question

I am trying to use the mca package to do multiple correspondence analysis in Python.

I am a bit confused as to how to use it. With PCA I would expect to fit some data (i.e. find principal components for those data) and then later I would be able to use the principal components that I found to transform unseen data.
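For comparison, the fit-then-transform workflow expected from PCA can be sketched in plain NumPy. The helper names pca_fit and pca_transform and the random data are hypothetical; scikit-learn's PCA class offers the same pattern through its fit and transform methods.

```python
import numpy as np

def pca_fit(X, n_components):
    """Learn the mean and the top principal axes from training data X (n_samples x n_features)."""
    mean = X.mean(axis=0)
    # Rows of Vt are the principal axes, sorted by decreasing singular value
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n_components]

def pca_transform(X, mean, axes):
    """Project (possibly unseen) rows onto the axes learned by pca_fit."""
    return (X - mean) @ axes.T

rng = np.random.default_rng(0)
X_train = rng.normal(size=(100, 4))   # hypothetical training data
X_new = rng.normal(size=(10, 4))      # hypothetical unseen data, same features

mean, axes = pca_fit(X_train, n_components=2)
scores_train = pca_transform(X_train, mean, axes)  # shape (100, 2)
scores_new = pca_transform(X_new, mean, axes)      # shape (10, 2), same axes reused
```

The key point is that pca_transform needs only what pca_fit learned (the mean and the axes), which is the piece the question is looking for in the mca package.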

Based on the MCA documentation, I cannot work out how to do this last step. I also don't understand what any of the cryptically named properties and methods do (e.g. .E, .L, .K, .k, etc.).

So far, if I have a DataFrame with a column containing strings (assume this is the only column in the DF), I would do something like:

import mca
import pandas as pd
# one-hot encode the string column, dropping the first level of each category
ca = mca.MCA(pd.get_dummies(df, drop_first=True))
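Textbook MCA works on the complete indicator (disjunctive) table, in which each categorical variable contributes one 0/1 column per level, so every row has exactly one 1 per variable. The pandas-only sketch below, on a hypothetical colour column, shows how drop_first=True changes that table: it is the usual choice for regression dummies, but it leaves all-zero rows for the dropped level, and correspondence analysis divides each row by its total.

```python
import pandas as pd

# Hypothetical single-column data set of category labels
df = pd.DataFrame({"colour": ["red", "green", "blue", "green", "red"]})

full = pd.get_dummies(df)                      # one column per level
reduced = pd.get_dummies(df, drop_first=True)  # first level (alphabetically) dropped

print(list(full.columns))     # ['colour_blue', 'colour_green', 'colour_red']
print(list(reduced.columns))  # ['colour_green', 'colour_red']

# In the full indicator table every row has exactly one 1 for this variable...
print(full.sum(axis=1).tolist())     # [1, 1, 1, 1, 1]
# ...whereas drop_first leaves an all-zero row for each "blue" observation
print(reduced.sum(axis=1).tolist())  # [1, 1, 0, 1, 1]
```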

From what I can gather, ca.fs_r(1) is the projection of the data in df, and ca.L is supposed to be the eigenvalues (although I get a vector of 1s that is one element shorter than my number of features?).
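To make the eigenvalues less mysterious, here is a from-scratch NumPy sketch of textbook correspondence analysis applied to an indicator matrix, returning the eigenvalues (principal inertias). It is not the mca package's code, and the package may scale or correct its eigenvalues differently; the function ca_eigenvalues and the data are hypothetical.

```python
import numpy as np

def ca_eigenvalues(Z):
    """Principal inertias (eigenvalues) from textbook CA of an indicator matrix Z."""
    P = np.asarray(Z, dtype=float)
    P = P / P.sum()                       # correspondence matrix
    r, c = P.sum(axis=1), P.sum(axis=0)   # row and column masses
    # Standardized residuals D_r^{-1/2} (P - r c^T) D_c^{-1/2}
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    s = np.linalg.svd(S, compute_uv=False)
    return s**2

# Hypothetical data: one categorical variable with J = 3 levels, fully dummy-coded
labels = np.array([0, 1, 2, 1, 0, 2, 2])
Z_one = np.eye(3)[labels]
eig_one = ca_eigenvalues(Z_one)   # J - 1 = 2 eigenvalues equal to 1, plus a trivial 0

# Two variables (3 + 2 levels): the eigenvalues now reflect how the variables co-vary
other = np.array([0, 0, 1, 1, 0, 1, 0])
Z_two = np.hstack([np.eye(3)[labels], np.eye(2)[other]])
eig_two = ca_eigenvalues(Z_two)
```

With a single categorical variable there is no association between variables to summarise: the J - 1 nontrivial eigenvalues are all exactly 1, which may explain a vector of 1s one element shorter than the number of category levels. With Q variables and J levels in total, the eigenvalues lie between 0 and 1 and sum to J/Q - 1 (here 5/2 - 1 = 1.5).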

Now, if I had more data with the same features, say df_new, and assuming I've already converted it correctly to dummy variables, how do I find the equivalent of ca.fs_r(1) for the new data?

Answer

The documentation of the mca package is not very clear in that regard. However, there are a few cues suggesting that ca.fs_r_sup(df_new) should be used to project new (unseen) data onto the factors obtained in the analysis:

  1. The package author refers to new data as supplementary data, which is the terminology used in the following paper: Abdi, H., & Valentin, D. (2007). Multiple correspondence analysis. Encyclopedia of Measurement and Statistics, 651-657.
  2. The package has only two functions that accept new data as the parameter DF: fs_r_sup(self, DF, N=None) and fs_c_sup(self, DF, N=None). The latter finds column factor scores, which leaves fs_r_sup as the one that scores new rows.
  3. The usage guide demonstrates this with a new data frame that was not used anywhere in the analysis.
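Supplementary rows are conventionally projected with the correspondence-analysis transition formula: a new row's factor scores are its row profile (the row divided by its total) multiplied by the column standard coordinates from the original fit. The NumPy sketch below implements that textbook formula from scratch; it is not the mca package's code, the names ca_fit and project_supplementary_rows are hypothetical, and the package's scores may differ in scale or sign (for example if it corrects the eigenvalues).

```python
import numpy as np

def ca_fit(Z, tol=1e-10):
    """Textbook CA of an active indicator matrix Z (n_rows x n_levels)."""
    P = np.asarray(Z, dtype=float)
    P = P / P.sum()                                # correspondence matrix
    r, c = P.sum(axis=1), P.sum(axis=0)            # row and column masses
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    U, s, Vt = np.linalg.svd(S, full_matrices=False)
    keep = s > tol                                 # assumption: drop numerically zero dimensions
    U, s, Vt = U[:, keep], s[keep], Vt[keep]
    row_scores = (U * s) / np.sqrt(r)[:, None]     # active row factor scores
    col_standard = Vt.T / np.sqrt(c)[:, None]      # column standard coordinates
    return row_scores, col_standard

def project_supplementary_rows(Z_new, col_standard):
    """Transition formula: new row scores = row profiles @ column standard coordinates."""
    Z_new = np.asarray(Z_new, dtype=float)
    profiles = Z_new / Z_new.sum(axis=1, keepdims=True)
    return profiles @ col_standard

# Hypothetical active data: two categorical variables, dummy-coded (3 + 2 levels)
a = np.array([0, 1, 2, 1, 0, 2, 2, 0])
b = np.array([0, 0, 1, 1, 0, 1, 0, 1])
Z_active = np.hstack([np.eye(3)[a], np.eye(2)[b]])
row_scores, G = ca_fit(Z_active)

# Sanity check: projecting the active rows reproduces their own factor scores
assert np.allclose(project_supplementary_rows(Z_active, G), row_scores)

# Hypothetical unseen rows, dummy-coded with the SAME columns in the SAME order
Z_new = np.array([[1, 0, 0, 0, 1],
                  [0, 0, 1, 1, 0]])
new_scores = project_supplementary_rows(Z_new, G)
```

The new rows must use exactly the columns of the fitted data; with pandas, one way is a reindex such as new_dummies.reindex(columns=active_dummies.columns, fill_value=0), where both frame names are hypothetical. Signs of individual SVD dimensions are arbitrary, so scores from different fits may differ by a sign flip per dimension.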