Spark GraphX: add multiple edge weights

Question

I am new to GraphX and have a Spark dataframe with four columns like below:

src_ip    dst_ip    flow_count   sum_bytes
8.8.8.8   1.2.3.4          435        1137
  ...       ...           ...         ...

Basically, I want to map both src_ip and dst_ip to vertices and assign flow_count and sum_bytes as edge attributes. As far as I know, we cannot add edge attributes in GraphX, as only vertex attributes are permitted. Hence, I am thinking about adding flow_count as the edge weight:

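import org.apache.spark.graphx.Edge
import scala.util.hashing.MurmurHash3
// create edges: hash each IP string to a vertex ID, with flow_count as the edge attribute
val trafficEdges = trafficsFromTo.map(x => Edge(MurmurHash3.stringHash(x(0).toString), MurmurHash3.stringHash(x(1).toString), x(2)))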

However, can I add sum_bytes as edge weight as well?

Recommended answer

It is possible to add both variables to the edge. The simplest solution would be to use a tuple, for example:

val data = Array(Edge(3L, 7L, (123, 456)), Edge(5L, 3L, (41, 34)))
val edges: RDD[Edge[(Int, Int)]] = spark.sparkContext.parallelize(data)

Alternatively, you can make use of a case class:

case class EdgeWeight(flow_count: Int, sum_bytes: Int)

val data2 = Array(Edge(3L, 7L, EdgeWeight(123, 456)), Edge(5L, 3L, EdgeWeight(41, 34)))
val edges: RDD[Edge[EdgeWeight]] = spark.sparkContext.parallelize(data2)

A case class is easier to use and maintain, especially if more attributes are added later.
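To make the difference concrete, here is a minimal sketch comparing how the two attribute styles are read back. It only needs the GraphX `Edge` class (no running Spark context); the value names are hypothetical and chosen for illustration.

```scala
import org.apache.spark.graphx.Edge

// Same definition as in the answer above, repeated so the sketch is self-contained.
case class EdgeWeight(flow_count: Int, sum_bytes: Int)

val tupleEdge = Edge(3L, 7L, (123, 456))
val classEdge = Edge(3L, 7L, EdgeWeight(123, 456))

// Tuple fields are positional, so the reader has to remember which slot is which.
val flowsFromTuple = tupleEdge.attr._1

// Case class fields are named, so adding a third attribute later does not shift positions.
val flowsFromClass = classEdge.attr.flow_count
```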

I believe that in this specific case, it is most elegantly solved by:

val trafficEdges = trafficsFromTo.map { x =>
  // x(2) and x(3) must be Int to match EdgeWeight(Int, Int)
  Edge(MurmurHash3.stringHash(x(0).toString),
       MurmurHash3.stringHash(x(1).toString),
       EdgeWeight(x(2), x(3)))
}

trafficEdges.sortBy(edge => edge.attr.flow_count) // sort by flow_count
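
For completeness, here is a rough end-to-end sketch in spark-shell style that starts from a DataFrame shaped like the one in the question. Several things here are assumptions rather than part of the original post: the local `SparkSession`, the two sample rows, the `Long` column types (Spark SQL `count` and integer `sum` aggregations produce `bigint`), and the hypothetical names `TrafficWeight`, `trafficDf` and `vertexId`. `Graph.fromEdges` builds the vertices from the edge endpoints and gives each one the supplied default attribute.

```scala
import org.apache.spark.graphx.{Edge, Graph}
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SparkSession
import scala.util.hashing.MurmurHash3

// Assumed local session, for illustration only.
val spark = SparkSession.builder().master("local[*]").appName("edge-attrs").getOrCreate()
import spark.implicits._

// Hypothetical attribute class with Long fields (assumed column types).
case class TrafficWeight(flow_count: Long, sum_bytes: Long)

// Made-up sample rows with the same four columns as in the question.
val trafficDf = Seq(
  ("8.8.8.8", "1.2.3.4", 435L, 1137L),
  ("1.2.3.4", "8.8.4.4", 12L, 900L)
).toDF("src_ip", "dst_ip", "flow_count", "sum_bytes")

// Hash an IP string to a vertex ID; distinct IPs can collide on the same hash.
def vertexId(ip: String): Long = MurmurHash3.stringHash(ip).toLong

val trafficEdges: RDD[Edge[TrafficWeight]] = trafficDf.rdd.map { row =>
  Edge(
    vertexId(row.getAs[String]("src_ip")),
    vertexId(row.getAs[String]("dst_ip")),
    TrafficWeight(row.getAs[Long]("flow_count"), row.getAs[Long]("sum_bytes"))
  )
}

// Every vertex receives the placeholder attribute "unknown".
val graph: Graph[String, TrafficWeight] = Graph.fromEdges(trafficEdges, "unknown")

// Both attributes are available on every edge.
val heavyEdges = graph.edges.filter(_.attr.sum_bytes > 1000L).collect()
val totalFlows = graph.edges.map(_.attr.flow_count).sum()
```

Because vertex IDs are 32-bit hashes widened to `Long`, a real pipeline may prefer to assign IDs from a distinct list of IPs if collisions matter.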
