How to output the first row as column qualifier names


Problem description


I am able to process two nodes from an XML file, and I get the following output:

bin/hadoop fs -text /user/root/t-output1/part-r-00000
    name:ST17925 currentgrade 1.02
    name:ST17926 currentgrade 3.0
    name:ST17927 currentgrade 3.0

But I need the output to look like this:

studentid currentgrade
ST17925 1.02
ST17926 3.00
ST17927 3.00

How can I achieve this?

My complete source code: https://github.com/studhadoop/xml/blob/master/XmlParser11.java

EDIT: Solution

// Emits the header row once per task, before any records are processed.
@Override
protected void setup(Context context) throws IOException, InterruptedException {
    context.write(new Text("studentid"), new Text("currentgrade"));
}

Solution

I think it is difficult to do this within your MapReduce code, for two reasons:

  1. The headers may not be of the same data type as the values they label.
  2. If the types are the same, you can write the headers from the setup() method of the Reducer, but there is no guarantee that the headers will appear as the first row in the output.

The best you can do is create a separate HDFS or local file containing the headers from your map code, the first time the column qualifiers are encountered. You need to use the appropriate file operations API to create this file. Later, when the job is complete, you can use these headers in other programs or merge them with the job output into a single file.
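
The final merge step could be sketched in plain Java roughly as follows. This is only an illustration under several assumptions: the part files have already been copied to the local filesystem, every line has the three-token layout shown in the question ("name:<id> currentgrade <value>"), and the class and method names (HeaderMerge, toRow, merge) are hypothetical, not taken from the linked source.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper: prepends a header row to reformatted job output.
final class HeaderMerge {

    // Turns "name:ST17925 currentgrade 1.02" into "ST17925 1.02".
    // The three-token layout is an assumption based on the sample output.
    static String toRow(String line) {
        String[] tokens = line.trim().split("\\s+");
        if (tokens.length != 3 || !tokens[0].startsWith("name:")) {
            throw new IllegalArgumentException("Unexpected line: " + line);
        }
        String id = tokens[0].substring("name:".length());
        double grade = Double.parseDouble(tokens[2]);
        // Two decimal places, matching the desired output (3.0 becomes 3.00).
        return id + " " + String.format(Locale.ROOT, "%.2f", grade);
    }

    // Writes the header once, then the reformatted rows of each part file in order.
    static void merge(String header, List<Path> parts, Path out) throws IOException {
        try (BufferedWriter writer = Files.newBufferedWriter(out, StandardCharsets.UTF_8)) {
            writer.write(header);
            writer.newLine();
            for (Path part : parts) {
                for (String line : Files.readAllLines(part, StandardCharsets.UTF_8)) {
                    if (line.trim().isEmpty()) {
                        continue; // skip blank lines
                    }
                    writer.write(toRow(line));
                    writer.newLine();
                }
            }
        }
    }

    // Usage: java HeaderMerge <output file> <part file> [<part file> ...]
    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.err.println("Usage: HeaderMerge <output file> <part file>...");
            return;
        }
        List<Path> parts = new ArrayList<>();
        for (int i = 1; i < args.length; i++) {
            parts.add(Paths.get(args[i]));
        }
        merge("studentid currentgrade", parts, Paths.get(args[0]));
    }
}
```

Doing this after the job finishes sidesteps the ordering concern, because the header is written exactly once by a single process rather than by each task.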
