Issue with rpy2 handling NA/missing values in a dataframe from R to Python


Problem description

I've encountered a problem when using the rpy2 package to convert a dataframe saved in R to Python.

import os
os.environ['R_HOME'] = '/Library/Frameworks/R.framework/Resources'

import rpy2.robjects as ro
from rpy2.robjects import pandas2ri

# define a trivial dataframe in R
ro.r('n = c(1,2)')
ro.r("b = c(NA,'def')")
ro.r("temp_df = data.frame(n,b)")

# the dataframe in R shows missing value in one cell as NA
temp_rdf = ro.r('temp_df')
print(temp_rdf)

  n    b
1 1 <NA>
2 2  def

# yet the converted Python dataframe replaces the missing value with a string
temp_pydf = pandas2ri.ri2py(temp_rdf)
print(temp_pydf)

     n    b
1  1.0  def
2  2.0  def

I did some searching and found the post "Rpy2 pandas2ri.ri2py() is converting NA values to integers". It explains why this happens but doesn't provide a solution. I want to have null values in Python for those NAs in the R dataframe. How can I do this?

Recommended answer

Update: http://rpy.sourceforge.net/rpy2/doc-2.2/html/rinterface.html

The link above may have useful help on some settings. If you search the page for "NA " (including the space) and go to the second hit, there is an entry that looks like it relates to your NA problem.

Original post: assuming "def" as shown in your output is coming in as a string, you could replace it with a string that you are confident is not a value in your data, and then use that in lieu of the NA value that is not coming through.

This sample code illustrates the concept:

x = "def"
type(x)
x = x.replace("def", "NA")
x

That leaves the problem that your source has two rows that both say 'def': one where it came from the data and another where NA was converted to 'def'. To deal with that:

  1. Convert the genuine 'def' to something else in R.
  2. Bring in your data.
  3. Now "def" means NA.
  4. Use it that way, or convert it to something you can live with.

Is this a problem you encounter often?

  1. If so, create a test function to check your data for 'def'.
  2. If it is found, replace it with something crazy that you know the data will not contain, such as my_crazy_replacementValue.
  3. Replace "def" with your desired stand-in for NA.
  4. Replace my_crazy_replacementValue with "def".
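
The swap described above can be sketched in plain Python. This is only an illustration on a list of strings, not rpy2 code: the function names are hypothetical, and it assumes (as this answer does) that every NA arrives as the string "def" and that the placeholder never occurs in the real data. The protection step would have to run on the R side before conversion; it is shown in Python here only to keep the sketch self-contained.

```python
# Hypothetical placeholder; it must never occur in the real data.
PLACEHOLDER = "my_crazy_replacementValue"


def protect_literal(values, literal="def", placeholder=PLACEHOLDER):
    """Before conversion: hide genuine occurrences of `literal` behind the placeholder."""
    if placeholder in values:
        raise ValueError("placeholder already present in the data")
    return [placeholder if v == literal else v for v in values]


def restore_after_conversion(values, literal="def", placeholder=PLACEHOLDER, na_value=None):
    """After conversion: a remaining `literal` stands for NA; the placeholder becomes `literal` again."""
    out = []
    for v in values:
        if v == literal:
            out.append(na_value)   # this "def" was an NA in R
        elif v == placeholder:
            out.append(literal)    # this was a genuine "def"
        else:
            out.append(v)
    return out


# Simulated column as it might arrive in Python: the first "def" was an NA in R,
# the placeholder marks a genuine "def" that was protected beforehand.
converted = ["def", PLACEHOLDER, "abc"]
cleaned = restore_after_conversion(converted)
print(cleaned)
```

The order of the two replacements matters: "def" has to be mapped to the NA stand-in before the placeholder is turned back into "def", otherwise the genuine values would be wiped out as well.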

In Python, the most common value for NA is, I think, None. Unfortunately, you cannot replace a value with None using:

string.replace()
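
A short sketch of why that fails and one way around it for a single value: str.replace only accepts a string as the replacement, so passing None raises a TypeError. The variable names are just for illustration.

```python
value = "def"

try:
    value.replace("def", None)  # the replacement argument must be a str
except TypeError as exc:
    print("replace failed:", exc)

# A conditional expression can map the marker string to None instead.
result = None if value == "def" else value
print(result)
```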

It seems reasonable that there should be a better answer: a "Pythonic" way of converting a specified value in a data frame to None. I need to review pandas data frames when I get a chance, and then I may log back in and edit this paragraph (or maybe someone else will beat me to it). I hope the above helps you in the interim.
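
One possible pandas-side approach, offered as a sketch rather than a definitive answer: once the frame is in Python, Series.mask can turn a known marker string into a missing value. The frame below is built by hand to stand in for a converted R dataframe, and the marker "NA_MARKER" is a hypothetical stand-in chosen for this example. Note that pandas represents the missing entry as NaN rather than None.

```python
import pandas as pd

NA_MARKER = "NA_MARKER"  # hypothetical stand-in string for R's NA

# Hand-built frame standing in for the dataframe converted from R.
df = pd.DataFrame({"n": [1.0, 2.0], "b": [NA_MARKER, "def"]})

# mask() replaces entries where the condition is True with a missing value.
cleaned = df.assign(b=df["b"].mask(df["b"] == NA_MARKER))

print(cleaned)
print(cleaned["b"].isna().tolist())
```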
