Parsing colon separated sparse data with pandas and numpy
Question
I would like to parse a data file in the format col_index:value with pandas/numpy. For example:
0:23 3:41
1:31 2:65
This corresponds to the matrix:
[[23 0 0 41]
[0 31 65 0]]
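To make the mapping between the file and the matrix concrete, here is a minimal sketch in plain numpy. It iterates explicitly, which is what the question hopes to avoid, and it assumes zero-based column indices and at least one entry in the file. The function name parse_colon_rows is made up for illustration.

```python
import numpy as np


def parse_colon_rows(lines):
    """Build a dense matrix from 'col_index:value' lines (zero-based columns)."""
    rows = []
    for line in lines:
        # Each whitespace-separated token has the form "col:value".
        pairs = [tok.split(":") for tok in line.split()]
        rows.append([(int(col), float(val)) for col, val in pairs])

    # The matrix width is inferred from the largest column index seen.
    n_cols = 1 + max(col for row in rows for col, _ in row)
    dense = np.zeros((len(rows), n_cols))
    for r, row in enumerate(rows):
        for col, val in row:
            dense[r, col] = val
    return dense


matrix = parse_colon_rows(["0:23 3:41", "1:31 2:65"])
print(matrix)
```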
This seems like a pretty common way to represent sparse data in a file, but I can't find an easy way to parse it without doing some sort of iteration after calling read_csv.
Recommended answer
I found out recently that this is in fact the svm-light format, and you may be able to read a dataset like this with an svm loader such as:
http://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_svmlight_file.html
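A rough sketch of how that loader might be used on the sample from the question. One caveat: svm-light lines normally begin with a target value (label), which the sample lacks, so this sketch prepends a dummy label of 0 to every line before loading. It also assumes zero-based column indices in ascending order on each line. The helper name load_colon_sparse is made up for illustration.

```python
import io

from sklearn.datasets import load_svmlight_file


def load_colon_sparse(text):
    """Parse 'col_index:value' lines into a scipy CSR matrix.

    Assumes zero-based, ascending column indices and no leading label,
    as in the question's sample. Blank lines are skipped.
    """
    # svm-light lines start with a target value, so prepend a dummy "0".
    labelled = "".join(
        f"0 {line}\n" for line in text.splitlines() if line.strip()
    )
    # A file-like object passed to the loader must be in binary mode.
    X, _dummy_labels = load_svmlight_file(
        io.BytesIO(labelled.encode("ascii")), zero_based=True
    )
    return X


sample = "0:23 3:41\n1:31 2:65\n"
X = load_colon_sparse(sample)
print(X.toarray())
```

The result stays sparse until toarray() is called, so for large files it may be preferable to keep working with X directly, or to wrap it with pandas.DataFrame.sparse.from_spmatrix if a DataFrame is needed.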