How to use spark_apply to change NaN values?


Problem description

After using sdf_pivot I was left with a huge number of NaN values, so in order to proceed with my analysis I need to replace the NaN values with 0. I have tried using this:

data <- data %>% 
  spark_apply(function(e) ifelse(is.nan(e),0,e))

This throws the following error:

Error in file(con, "r") : cannot open the connection
In addition: Warning message:
In file(con, "r") :
  cannot open file 
'C:\.........\file18dc5a1c212e_spark.log':Permission denied

I'm using Spark 2.2.0 and the latest version of sparklyr.

Does anyone have an idea how to fix this issue? Thanks

Recommended answer

You seem to have two different problems here.

  • Permission issues. Make sure that you have the required permissions and, if necessary, configure winutils correctly.
  • NULL replacement.

The latter can be solved using built-in functions; there is no need for the inefficient spark_apply:

df <- copy_to(sc, 
  data.frame(id=c(1, 1, 2, 3), key=c("a", "b", "a", "d"), value=1:4))

pivoted <- sdf_pivot(df, id ~ key)
pivoted

# Source:   table<sparklyr_tmp_f0550e429aa> [?? x 4]
# Database: spark_connection
     id     a     b     d
  <dbl> <dbl> <dbl> <dbl>
1     1     1     1   NaN
2     3   NaN   NaN     1
3     2     1   NaN   NaN

pivoted %>% na.replace(0)

# Source:   table<sparklyr_tmp_f0577e16bf1> [?? x 4]
# Database: spark_connection
     id     a     b     d
  <dbl> <dbl> <dbl> <dbl>
1     1     1     1     0
2     3     0     0     1
3     2     1     0     0
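
If spark_apply is needed anyway (for example as part of a larger R transformation), the original attempt can be repaired by operating on the whole partition rather than element-wise. This is a sketch, relying on the fact that spark_apply passes each partition to the function as an R data.frame and that is.na() in R is TRUE for both NA and NaN:

```r
# Sketch: zero out NA/NaN values inside each partition.
pivoted %>%
  spark_apply(function(df) {
    df[is.na(df)] <- 0  # one pass covers both NA and NaN
    df
  })
```

Note that this is still slower than na.replace, because every partition has to round-trip through an R process instead of being handled natively by Spark.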

Tested with sparklyr 0.7.0-9105.
