Spark extracting values from a Row


Problem description


I have the following dataframe

val transactions_with_counts = sqlContext.sql(
  """SELECT user_id AS user_id, category_id AS category_id,
  COUNT(category_id) FROM transactions GROUP BY user_id, category_id""")

I'm trying to convert the rows to Rating objects, but this fails because x(0) returns Any:

val ratings = transactions_with_counts
  .map(x => Rating(x(0).toInt, x(1).toInt, x(2).toInt))

error: value toInt is not a member of Any
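The same compile error can be reproduced outside Spark: Row.apply (the `x(0)` call), like indexing into any heterogeneous collection, is statically typed as returning `Any`. A minimal plain-Scala sketch, no Spark required:

```scala
// A heterogeneous sequence stands in for a Row: every element is statically Any.
val x: Seq[Any] = Seq(1, 2, 5L)

// x(0).toInt  // does not compile: value toInt is not a member of Any

// An explicit cast recovers the type, but it is unchecked and throws
// ClassCastException at runtime if the actual element type differs.
val userId: Int = x(0).asInstanceOf[Int]
```

The solutions below avoid such blind casts by checking or declaring the expected types.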

Solution

Let's start with some dummy data:

val transactions = sc.parallelize(Seq(
  (1, 2), (1, 4), (2, 3))).toDF("user_id", "category_id")

val transactions_with_counts = transactions
  .groupBy($"user_id", $"category_id")
  .count

transactions_with_counts.printSchema

// root
// |-- user_id: integer (nullable = false)
// |-- category_id: integer (nullable = false)
// |-- count: long (nullable = false)
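For intuition, the aggregation above can be sketched with plain Scala collections (no Spark; the data matches the dummy rows used above):

```scala
// Count occurrences of each (user_id, category_id) pair, mimicking
// groupBy($"user_id", $"category_id").count on a local collection.
val transactions = Seq((1, 2), (1, 4), (2, 3))

val transactionsWithCounts: Seq[(Int, Int, Long)] = transactions
  .groupBy(identity)                                    // group equal pairs
  .map { case ((user, cat), xs) => (user, cat, xs.size.toLong) }
  .toSeq
```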

There are a few ways to access Row values and keep expected types:

  1. Pattern matching

    import org.apache.spark.sql.Row
    
    transactions_with_counts.map{
      case Row(user_id: Int, category_id: Int, rating: Long) =>
        Rating(user_id, category_id, rating)
    } 
    

  2. Typed get* methods like getInt, getLong:

    transactions_with_counts.map(
      r => Rating(r.getInt(0), r.getInt(1), r.getLong(2))
    )
    

  3. getAs method which can use both names and indices:

    transactions_with_counts.map(r => Rating(
      r.getAs[Int]("user_id"), r.getAs[Int]("category_id"), r.getAs[Long](2)
    ))
    

    It can be used to properly extract user-defined types, including mllib.linalg.Vector. Obviously, accessing by name requires a schema.
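The type-recovery idea behind option 1 can be demonstrated without Spark by pattern matching on a `Seq[Any]` stand-in for a Row (the Rating case class below is illustrative, modeled on mllib's `Rating(user, product, rating)`):

```scala
// Illustrative stand-ins: a Rating case class and a Row-like Seq[Any].
case class Rating(user: Int, product: Int, rating: Double)

val row: Seq[Any] = Seq(1, 2, 5L) // user_id: Int, category_id: Int, count: Long

// Typed patterns check the runtime class of each element and bind it with
// the expected static type, mirroring case Row(user_id: Int, ...).
val rating = row match {
  case Seq(user: Int, category: Int, count: Long) =>
    Rating(user, category, count.toDouble)
}
```

If an element has an unexpected runtime type, the match simply fails (here with a MatchError), rather than producing a silently wrong cast.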
