Transforming matrix format, Scalding


Question

In Scalding we can easily work with matrices using the Matrix API, and that works fine when the input is already in (row, column, value) form, like this:

val matrix = Tsv(path, ('row, 'col, 'val))
  .read
  .toMatrix[Long,Long,Double]('row, 'col, 'val)

But how can I convert a matrix written in the usual dense layout into that (row, column, value) format? Is there an elegant way to do it? For example, the dense matrix in the first block below should become the triples in the second block:

1 2 3
3 4 5
5 6 7

1 1 1
1 2 2
1 3 3
2 1 3
2 2 4
2 3 5
3 1 5
3 2 6
3 3 7

I need this to run operations on very large matrices, and I don't know the number of rows and columns in advance (would it be possible to give the sizes in the file, N x M for example?).

I tried to do something with TextLine( args("input") ), but I don't know how to get the line number. I want to convert the matrix on Hadoop; maybe there are other ways to deal with this format? Is it possible with Scalding?

Recommended answer

The answer below is not mine; it is the OP's own answer, which was originally posted inside the question.

Here's what I've done, which outputs what I wanted:

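// Meant to sit inside a com.twitter.scalding.Job subclass (import com.twitter.scalding._).
// Mutable state for numbering columns: relies on the tokens of one line arriving consecutively.
var prev: Long = 0
var pos: Long = 1

val zeroInt = 0
val zeroDouble = 0.0

TextLine( args("a") )
    // one tuple per whitespace-separated token; 'offset and 'line are carried along
    .flatMap('line -> 'number)  { line : String => line.split("\\s+") }
    .mapTo(('offset, 'line, 'number) -> ('row, 'col, 'v)) {
      (offset: Long, line: String, number: String) =>
        // same line as the previous token: next column; otherwise restart at column 1
        pos = if(prev == (offset + 1)) pos + 1 else 1
        prev = offset + 1
        // assumes 'offset is a zero-based line index; on Hadoop it may be a byte offset instead
        (offset + 1, pos, number) }
    // drop zero entries so that only non-zero cells are written
    .filter(('row, 'col, 'v)) {
      (row: Long, col: Long, v: String) =>
        (v != zeroInt.toString) && (v != zeroDouble.toString) }
    .write(Tsv(args("c")))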
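
To sanity-check the intended output on a small input without running a job, the same dense-to-triples mapping can be sketched with plain Scala collections. This is only an illustration and not Scalding code: `DenseToTriples` and `toTriples` are hypothetical names, the whole input is assumed to fit in memory, and rows are numbered by their position in the sequence rather than by a file offset.

```scala
// Plain-Scala sketch (no Scalding): dense text rows -> (row, col, value) triples.
// `DenseToTriples` and `toTriples` are hypothetical names, not part of any library.
object DenseToTriples {
  def toTriples(lines: Seq[String]): Seq[(Long, Long, Double)] =
    for {
      (line, r)  <- lines.zipWithIndex                           // r: zero-based row index
      (token, c) <- line.trim.split("\\s+").toSeq.zipWithIndex   // c: zero-based column index
      if token.nonEmpty                                          // skip blank lines
      value = token.toDouble
      if value != 0.0                                            // keep only non-zero cells
    } yield (r + 1L, c + 1L, value)                              // switch to one-based indices

  def main(args: Array[String]): Unit = {
    val dense = Seq("1 2 3", "3 4 5", "5 6 7")
    // Print tab-separated triples, one per line
    toTriples(dense).foreach { case (r, c, v) => println(s"$r\t$c\t$v") }
  }
}
```

Unlike the pipeline above, this version takes the column index from the token's position within its line, so it does not depend on mutable counters or on the order in which tuples are processed.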
