Issue with rpy2 handling NA/missing values in a dataframe from R to Python
Problem description
I've encountered a problem when using the rpy2 package to convert a dataframe
saved in R to Python.
import os
os.environ['R_HOME'] = '/Library/Frameworks/R.framework/Resources'
import rpy2.robjects as ro
from rpy2.robjects import pandas2ri
# define a trivial dataframe in R
ro.r('n = c(1,2)')
ro.r("b = c(NA,'def')")
ro.r("temp_df = data.frame(n,b)")
# the dataframe in R shows missing value in one cell as NA
temp_rdf = ro.r('temp_df')
print(temp_rdf)
n b
1 1 <NA>
2 2 def
# yet the converted Python dataframe replaces the missing value with a string
temp_pydf = pandas2ri.ri2py(temp_rdf)
print(temp_pydf)
n b
1 1.0 def
2 2.0 def
I did some searching and found this post: "Rpy2 pandas2ri.ri2py() is converting NA values to integers". It explains why this happens but doesn't provide a solution. I want null values in Python for the NA entries in the R dataframe. How can I do this?
Recommended answer
Update: http://rpy.sourceforge.net/rpy2/doc-2.2/html/rinterface.html
The link above may have useful help on some settings. If you search the page for "NA " (including the trailing space) and go to the second hit, there is an entry that looks like it relates to your NA problem.
Original post: assuming the "def" shown in your output is coming in as a string, you could replace it with a string that you are confident is not a value in your data, and then use that string in lieu of the NA value that is not coming through:
This sample code illustrates the concept.
x = "def"
type(x)
x = x.replace("def", "NA")
x
As for the problem that your output has two rows that both say 'def', one where it came from the data and another where NA was converted to 'def':
- convert the genuine 'def' to something else in R
- bring in your data
- now "def" means NA
- use it as is, or convert it to something you can live with
Is this a problem you encounter often?
If so, create a test function to check your data for 'def'.
If found, replace it with something crazy that you know the data will not contain, like: my_crazy_replacementValue
After bringing the data in, replace "def" with your desired stand-in for NA.
Finally, replace my_crazy_replacementValue with "def".
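The steps above can be sketched with plain Python lists. This is only an illustration of the swap logic: the function names `protect` and `restore` are hypothetical, the conversion step is simulated by appending a stray 'def', and in practice the first step would have to run on the R side before the data is converted.

```python
# Hypothetical placeholder, chosen for illustration only.
PLACEHOLDER = "my_crazy_replacementValue"

def protect(values, marker="def"):
    """Before conversion: hide genuine occurrences of the marker."""
    if PLACEHOLDER in values:
        raise ValueError("placeholder already occurs in the data")
    return [PLACEHOLDER if v == marker else v for v in values]

def restore(values, marker="def", na_stand_in=None):
    """After conversion: every remaining marker stands for NA."""
    swapped = [na_stand_in if v == marker else v for v in values]
    return [marker if v == PLACEHOLDER else v for v in swapped]

source_values = ["def", "abc"]      # genuine data that contains 'def'
protected = protect(source_values)

# Simulate the conversion problem: a missing value arrives as 'def'.
converted = protected + ["def"]

result = restore(converted)
print(result)
```

Because whole values are compared with `==`, None can be used as the stand-in here, and the genuine 'def' survives the round trip.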
In Python, the most common value for NA is, I think, None. Unfortunately, you cannot replace a value with None using:
string.replace()
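A short check of why that fails, assuming the value is an ordinary Python `str`: `str.replace` only accepts strings as the replacement, so passing None raises a `TypeError`. Comparing the whole value and substituting None yourself avoids the issue.

```python
value = "def"

# str.replace only accepts a string as the replacement argument.
try:
    value.replace("def", None)
    replace_accepted_none = True
except TypeError:
    replace_accepted_none = False

# Compare the whole value instead and substitute None directly.
cleaned = None if value == "def" else value

print(replace_accepted_none, cleaned)
```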
It seems reasonable that there should be a better answer: a "Pythonic" way of converting a specified value in a dataframe to None. I need to review pandas dataframes when I get a chance, and then I may come back and edit this paragraph (or maybe someone else will beat me to it). I hope the above helps in the interim.
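One possible pandas approach, offered as a sketch rather than a confirmed fix for the rpy2 conversion itself: `Series.mask` replaces the entries where a condition holds with NaN, which is the marker pandas treats as missing. The dataframe below is built by hand, and treating "def" as the marker for missing data is an assumption made for illustration.

```python
import pandas as pd

# Hand-built stand-in for the converted dataframe (assumed data).
df = pd.DataFrame({"n": [1.0, 2.0], "b": ["def", "xyz"]})

MARKER = "def"  # assumed to stand for a missing value

cleaned = df.copy()
# mask() sets entries where the condition is True to NaN.
cleaned["b"] = cleaned["b"].mask(cleaned["b"] == MARKER)

# isna() reports which entries pandas now considers missing.
print(cleaned["b"].isna().tolist())
```

This yields NaN rather than None, but `isna()`, `dropna()`, and `fillna()` all recognize it as missing.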