Keep CSV feature labels for LDA PCA

Question

I'm trying to use the top-20 frequency data for the 2000 topics at https://github.com/wwbp/facebook_topics/tree/master/csv

I would like to perform RandomizedPCA on the data. According to the documentation, X must be array-like with shape (n_samples, n_features).

I have loaded the data with:

LDA_topics = pd.read_csv(r'2000topics.top20freqs.keys.csv', header=None, index_col=0, error_bad_lines=False)

However, this is not the right format for the following lines:

pca2 = sklearn.decomposition.RandomizedPCA(n_components=45)
pca2.fit(LDA_topics)

resulting in a ValueError: could not convert string to float: 'sonic'
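
A minimal sketch of why the fit fails, assuming a hypothetical layout for the CSV (topic id, then alternating word and frequency columns; the real file may be organized differently): read_csv keeps the word columns as strings, and PCA tries to convert every cell to float. The sketch also swaps in the current equivalents of two calls from the question: RandomizedPCA was removed in scikit-learn 0.20 in favor of PCA(svd_solver="randomized"), and error_bad_lines=False was replaced by on_bad_lines="skip" in pandas 1.3.

```python
import io

import pandas as pd
from sklearn.decomposition import PCA

# Hypothetical stand-in for 2000topics.top20freqs.keys.csv. The column layout
# (topic id, then word/frequency pairs) is an assumption for illustration.
CSV_TEXT = """\
0,sonic,0.12,game,0.08,sega,0.05
1,coffee,0.20,morning,0.10,cup,0.07
2,rain,0.15,weather,0.11,cold,0.04
"""

# on_bad_lines="skip" replaces error_bad_lines=False (pandas >= 1.3).
lda_topics = pd.read_csv(io.StringIO(CSV_TEXT), header=None, index_col=0,
                         on_bad_lines="skip")

# The word columns are read as strings, so converting the frame to a float
# array inside PCA.fit raises ValueError ("could not convert string to float").
try:
    PCA(n_components=2, svd_solver="randomized").fit(lda_topics)
except ValueError as err:
    print("PCA failed:", err)

# Splitting the frame by dtype shows which columns PCA could use as-is.
numeric_cols = lda_topics.select_dtypes(include="number")
word_cols = lda_topics.select_dtypes(exclude="number")
print(numeric_cols.columns.tolist())
print(word_cols.columns.tolist())
```

Fitting on numeric_cols alone would run, but it throws away exactly the word labels the question is about, which is why the answer recommends encoding the words instead.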

Is there a way to perform PCA and retain the feature labels and not just frequencies afterwards?

Recommended answer

PCA doesn't discard or retain individual features, but its components don't map to features either: each component is a weighted combination of all the input features. (Given features x, y, z and n_components=2, neither of the two resulting components will correspond exactly to x, y, or z.) If you want to keep feature names as part of dimensionality reduction, you might explore other approaches such as feature selection, which keeps a subset of the original columns (sklearn has a whole section on this).

Chuck Ivan is right that you need an encoder or vectorizer before you can run PCA. I like his OrdinalEncoder suggestion, but you could also consider the sklearn text utilities listed here: https://scikit-learn.org/stable/modules/classes.html#module-sklearn.feature_extraction.text
