sparklyr spark_read_parquet Reading String Fields as Lists

Views: 666 Published: 2018/6/12 14:14:24 Tags: r hive spark-dataframe parquet sparklyr


Problem description

I have a number of Hive files in Parquet format that contain both string and double columns. I can read most of them into a Spark DataFrame with sparklyr using the syntax below:

spark_read_parquet(sc, name = "name", path = "path", memory = FALSE)

However, when I read one particular file, all of the string values are converted to unrecognizable lists that look like this when collected into an R data frame and printed:

s_df <- spark_read_parquet(sc, name = "s_df", path = "hdfs://nameservice1/user/hive/warehouse/s_df", memory = FALSE)
df <- collect(s_df)
head(df)

# A tibble: 11,081 x 13
  provid   hospital_name servcode  servcode_desc codegroup claimid  amountpaid
  <list>   <list>        <list>    <list>        <list>    <list>        <dbl>
1 <raw [8… <raw [32]>    <raw [5]> <raw [25]>    <raw [29… <raw [1…       7.41
2 <raw [8… <raw [32]>    <raw [5]> <raw [15]>    <raw [22… <raw [1…       4.93
3 <raw [8… <raw [32]>    <raw [5]> <raw [28]>    <raw [22… <raw [1…       5.36
4 <raw [8… <raw [32]>    <raw [5]> <raw [28]>    <raw [30… <raw [1…       5.46
5 <raw [8… <raw [32]>    <raw [5]> <raw [16]>    <raw [30… <raw [1…       2.80

The hospital_name for the top 5 rows of df should read METHODIST HOSPITAL OF SOUTHERN CALIFORNIA, but the values come out like this instead:

head(df$hospital_name)

[[1]]
 [1] 48 45 4e 52 59 20 4d 41 59 4f 20 4e 45 57 48 41 4c 4c 20 4d 45 4d 4f 52 49
[26] 41 4c 20 48 4f 53 50

[[2]]
 [1] 48 45 4e 52 59 20 4d 41 59 4f 20 4e 45 57 48 41 4c 4c 20 4d 45 4d 4f 52 49
[26] 41 4c 20 48 4f 53 50

[[3]]
 [1] 48 45 4e 52 59 20 4d 41 59 4f 20 4e 45 57 48 41 4c 4c 20 4d 45 4d 4f 52 49
[26] 41 4c 20 48 4f 53 50

[[4]]
 [1] 48 45 4e 52 59 20 4d 41 59 4f 20 4e 45 57 48 41 4c 4c 20 4d 45 4d 4f 52 49
[26] 41 4c 20 48 4f 53 50

[[5]]
 [1] 48 45 4e 52 59 20 4d 41 59 4f 20 4e 45 57 48 41 4c 4c 20 4d 45 4d 4f 52 49
[26] 41 4c 20 48 4f 53 50

I tried the solution below, but it didn't work:

head(df %>% mutate(hospital_name = as.character(hospital_name)))

[1] "as.raw(c(0x48, 0x45, 0x4e, 0x52, 0x59, 0x20, 0x4d, 0x41, 0x59, 0x4f, 0x20, 0x4e, 0x45, 0x57, 0x48, 0x41, 0x4c, 0x4c, 0x20, 0x4d, 0x45, 0x4d, 0x4f, 0x52, 0x49, 0x41, 0x4c, 0x20, 0x48, 0x4f, 0x53, 0x50))"
[2] "as.raw(c(0x48, 0x45, 0x4e, 0x52, 0x59, 0x20, 0x4d, 0x41, 0x59, 0x4f, 0x20, 0x4e, 0x45, 0x57, 0x48, 0x41, 0x4c, 0x4c, 0x20, 0x4d, 0x45, 0x4d, 0x4f, 0x52, 0x49, 0x41, 0x4c, 0x20, 0x48, 0x4f, 0x53, 0x50))"
[3] "as.raw(c(0x48, 0x45, 0x4e, 0x52, 0x59, 0x20, 0x4d, 0x41, 0x59, 0x4f, 0x20, 0x4e, 0x45, 0x57, 0x48, 0x41, 0x4c, 0x4c, 0x20, 0x4d, 0x45, 0x4d, 0x4f, 0x52, 0x49, 0x41, 0x4c, 0x20, 0x48, 0x4f, 0x53, 0x50))"
[4] "as.raw(c(0x48, 0x45, 0x4e, 0x52, 0x59, 0x20, 0x4d, 0x41, 0x59, 0x4f, 0x20, 0x4e, 0x45, 0x57, 0x48, 0x41, 0x4c, 0x4c, 0x20, 0x4d, 0x45, 0x4d, 0x4f, 0x52, 0x49, 0x41, 0x4c, 0x20, 0x48, 0x4f, 0x53, 0x50))"
[5] "as.raw(c(0x48, 0x45, 0x4e, 0x52, 0x59, 0x20, 0x4d, 0x41, 0x59, 0x4f, 0x20, 0x4e, 0x45, 0x57, 0x48, 0x41, 0x4c, 0x4c, 0x20, 0x4d, 0x45, 0x4d, 0x4f, 0x52, 0x49, 0x41, 0x4c, 0x20, 0x48, 0x4f, 0x53, 0x50))"

I would appreciate any help in resolving the issue, or any suggestions for making my request clearer. Thanks.

Recommended answer

A reprex would have been nice (just for df), e.g. using dput(head(df)) and pasting the result here. Try the following:

df %>% mutate(hospital_name = unlist(lapply(hospital_name, function(e) rawToChar(e))))
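To see why as.character() fails while rawToChar() works, here is a minimal local sketch that needs no Spark connection. The data frame below is a hypothetical stand-in for the collected df, built from the bytes printed in the question (which decode to "HENRY MAYO NEWHALL MEMORIAL HOSP"). It uses vapply() rather than the unlist(lapply(...)) form from the answer; this is only an alternative that fails loudly if an element does not decode to a single string, not a claim about what the answer's author intended.

```r
# Hypothetical stand-in for the collected data: the bytes shown in the question.
bytes <- as.raw(c(0x48, 0x45, 0x4e, 0x52, 0x59, 0x20, 0x4d, 0x41, 0x59, 0x4f,
                  0x20, 0x4e, 0x45, 0x57, 0x48, 0x41, 0x4c, 0x4c, 0x20, 0x4d,
                  0x45, 0x4d, 0x4f, 0x52, 0x49, 0x41, 0x4c, 0x20, 0x48, 0x4f,
                  0x53, 0x50))

# A two-row data frame with a list column of raw vectors, as collect() returned.
df <- data.frame(amountpaid = c(7.41, 4.93))
df$hospital_name <- list(bytes, bytes)

# as.character() on a list turns each element into its R source text,
# so the bytes are described rather than decoded.
deparsed <- as.character(df$hospital_name)

# rawToChar() decodes one raw vector into one string; vapply() applies it to
# every element and guarantees a plain character vector as the result.
decoded <- vapply(df$hospital_name, rawToChar, character(1))
df$hospital_name <- decoded

print(df$hospital_name)
```

The same pattern can be repeated for each affected column (provid, servcode, and so on). Note that it runs after collect(), so all rows are decoded in local R memory rather than in Spark.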


                    