Reading an XML file in Apache Beam using XmlIO


Problem description

Problem statement: I am trying to read and print the contents of an XML file in Beam using the direct runner. Here is the code snippet:

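import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.xml.XmlIO;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.values.PCollection;

public class BookStore {

    public static void main(String[] args) {
        // BookOptions is the asker's own PipelineOptions interface (not shown).
        BookOptions options = PipelineOptionsFactory.fromArgs(args).withValidation().as(BookOptions.class);

        Pipeline pipeline = Pipeline.create(options);

        // Read sample.xml, mapping each record element to a Book.
        PCollection<Book> output = pipeline.apply(XmlIO.<Book>read().from("sample.xml")
                .withRootElement("book")
                .withRecordElement("name")
                .withRecordClass(Book.class));

        // Print the name of every Book that was read.
        output.apply(ParDo.of(new DoFn<Book, String>() {
            @ProcessElement
            public void processElement(ProcessContext c) {
                System.out.println("xml data " + c.element().getName());
            }
        }));

        pipeline.run();
    }
}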

My POJO class:


@XmlRootElement(name = "book")
@XmlType(propOrder = {"name"})
public class Book {

    private String name;

    @XmlElement(name = "name")
    public String getName() {
        return name;
    }

    public void setName(String name) {
        this.name = name;
    }

    @Override
    public String toString() {
        return "ClassPojo [name= " + name + "]";
    }
}

My sample.xml file:

<?xml version="1.0" encoding="UTF-8"?> 
<book>
   <name>Harrypotter</name>
</book>

When I execute the above code using the direct runner, the output for "name" is null.

Can somebody guide me on this?

Is there an example I can refer to?

Solution

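Your XML file doesn't match the XmlIO options defined in your pipeline. XmlIO expects a root element that wraps your records (books), and each record element is what gets mapped to the record class. With withRecordElement("name"), each <name> element is most likely being unmarshalled as a Book on its own, and since it has no nested <name> child, getName() returns null. One possible fix is something like this: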

PCollection<Book> output = pipeline.apply(
        XmlIO.<Book>read().from("sample.xml")
            .withRootElement("books")
            .withRecordElement("book")
            .withRecordClass(Book.class));

The XML file should then look like this:

<?xml version="1.0" encoding="UTF-8"?>
<books>
    <book>
        <name>Harrypotter</name>
    </book>
</books>
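
To check the layout of a file before running the pipeline, something like the following sketch may help. It uses only the JDK's StAX parser, not Beam, and the class XmlLayoutCheck and the method countRecords are hypothetical names made up for this illustration. It assumes, as in the example above, that record elements sit directly under the root element, and it returns -1 when the document's root does not match the expected root name.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper (not part of Beam): checks that an XML document has the
// root/record layout that the XmlIO options above describe.
class XmlLayoutCheck {

    // Counts elements named recordElement that are direct children of rootElement.
    // Returns -1 if the document's root element is not named rootElement.
    static int countRecords(String xml, String rootElement, String recordElement)
            throws XMLStreamException {
        XMLStreamReader reader =
                XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
        int depth = 0;
        int records = 0;
        try {
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    depth++;
                    String name = reader.getLocalName();
                    if (depth == 1 && !name.equals(rootElement)) {
                        return -1; // root element does not match
                    }
                    if (depth == 2 && name.equals(recordElement)) {
                        records++; // a record directly under the root
                    }
                } else if (event == XMLStreamConstants.END_ELEMENT) {
                    depth--;
                }
            }
        } finally {
            reader.close();
        }
        return records;
    }

    public static void main(String[] args) throws XMLStreamException {
        String original = "<book><name>Harrypotter</name></book>";
        String fixed = "<books><book><name>Harrypotter</name></book></books>";
        // The original file has no <books> root; the fixed file has one <book> record.
        System.out.println(countRecords(original, "books", "book"));
        System.out.println(countRecords(fixed, "books", "book"));
    }
}
```

With the original sample.xml the check reports a root mismatch, while the restructured file yields one <book> record, which is the shape the corrected withRootElement("books") and withRecordElement("book") options describe.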
