How to preserve labels when an SPSS file (.sav) is imported into pandas via rpy?
Problem description
I'm looking to work on SPSS files (.sav) using pandas. In the absence of the SPSS program, here's what a typical file looks like when converted to .csv:
On investigating what the first two rows signify (I don't know SPSS), it seems that the first row contains the Labels, while the second row contains the VarNames.
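To make that two-row layout concrete, here is a minimal sketch using only the standard library. The sample CSV text and the helper name split_label_header are hypothetical, made up for illustration; the only thing taken from the description above is that row 1 holds the labels and row 2 holds the variable names.

```python
import csv
import io

# Hypothetical sample mimicking the described layout:
# row 1 = labels, row 2 = variable names, remaining rows = data.
SAMPLE_CSV = (
    "Respondent ID,Age of respondent,Gender\n"
    "id,age,gender\n"
    "1,34,Male\n"
    "2,51,Female\n"
)


def split_label_header(text):
    """Return (labels_by_name, records) for a CSV with two header rows."""
    rows = list(csv.reader(io.StringIO(text)))
    labels, names, data = rows[0], rows[1], rows[2:]
    # Map each variable name to its label so neither row is lost.
    labels_by_name = dict(zip(names, labels))
    # Key every data row by variable name.
    records = [dict(zip(names, row)) for row in data]
    return labels_by_name, records


labels_by_name, records = split_label_header(SAMPLE_CSV)
print(labels_by_name["age"])
print(records[0])
```

Keeping the labels in a separate mapping, rather than as a data row, leaves the variable names free to act as column keys.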
When I bring the file into pandas thus:
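import pandas.rpy.common as com

def savtocsv(filename):
    # Read the .sav file in R as a data frame via the foreign package
    w = com.robj.r('foreign::read.spss("%s", to.data.frame=TRUE)' % filename)
    # Convert the R object into a pandas DataFrame
    w = com.convert_robj(w)
    return w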
and then do a head(), the first row (Label) is missing:
How can I preserve the labels?
- Reference: Is there a Python module to open SPSS files?
- Python: 2.7.10
- pandas: 0.17.1
Recommended answer
Labels in a sav file are stored in the variable.labels attribute of the object returned by the read.spss function.
You can get the variable labels with the following:
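import pandas.rpy.common as com

def get_labels(filename):
    # Read the file in R and pull out only the variable.labels attribute
    w = com.robj.r('attr(foreign::read.spss("%s"), "variable.labels")' % filename)
    # Convert the R result into a Python object
    w = com.convert_robj(w)
    return w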
If you want to set the labels as the column names of your dataframe:
import pandas.rpy.common as com

def savtocsv(filename):
    w = com.robj.r('foreign::read.spss("%s", to.data.frame=TRUE)' % filename)
    # Grab the labels from the R object before converting it
    cols = list(com.robj.r("attr")(w, "variable.labels"))
    w = com.convert_robj(w)
    # Replace the variable names with the labels
    w.columns = cols
    return w
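One caveat when assigning labels directly as column names: a variable may have no label, or two variables may share one. The following is a minimal sketch of a fallback, assuming unlabeled variables come back as empty strings (an assumption about the data, not something stated above). The helper name columns_from_labels and the sample lists are hypothetical.

```python
def columns_from_labels(var_names, labels):
    """Pick one column name per variable: the label when usable, else the variable name."""
    seen = set()
    columns = []
    for name, label in zip(var_names, labels):
        candidate = label.strip() if label else ""
        # Fall back to the variable name for empty or already-used labels.
        if not candidate or candidate in seen:
            candidate = name
        seen.add(candidate)
        columns.append(candidate)
    return columns


# Hypothetical variable names and labels for illustration.
var_names = ["id", "age", "q1", "q2"]
labels = ["Respondent ID", "", "Satisfaction", "Satisfaction"]
columns = columns_from_labels(var_names, labels)
print(columns)
```

Inside a function like the one above, this could be applied as w.columns = columns_from_labels(list(w.columns), cols), so that unlabeled or duplicate-labeled variables keep their original names.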