Visualization of XML data from Hadoop
Problem description
In my HDFS setup I will be doing XML processing, i.e. processing an XML file and extracting two nodes. These will be my x and y values for plotting a graph.
How can I do this, that is, generate a graph from the HDFS output? I want to use RapidMiner. Any ideas on how to do this?
Or else:
Is there a way to visualize my Hadoop data?
Recommended answer
The way HDFS works is by splitting a file into blocks of a predefined size. It is just like doing a
split -b 64M file.xml
It then takes each block and saves it to a slave DataNode. Now, if your HDFS has a block size of 64 MB and the file size is 1 GB, your file will be split into 16 blocks and saved in different locations. A MapReduce job that reads those blocks as-is will not be able to make sense of a single XML file block, because XML is a nested structure, unlike simple line-oriented CSV or TSV files. So, as far as I can see, you cannot process an XML file over HDFS if it is greater than the HDFS block size, unless the job uses an input format that is aware of XML record boundaries.
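To make the block-boundary point concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the sample document, the element names `records`, `record`, `x` and `y`, and the scaled-down 64-byte "block size" are all hypothetical and not taken from the question. It cuts an XML document at fixed byte offsets, the way `split -b` would, checks whether any piece parses on its own, and then extracts the (x, y) pairs from the complete document.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

# Hypothetical sample document; the element names are made up for illustration.
XML_DOC = (
    "<records>"
    + "".join(f"<record><x>{i}</x><y>{i * i}</y></record>" for i in range(10))
    + "</records>"
).encode("utf-8")

BLOCK_SIZE = 64  # stand-in for the 64 MB HDFS block size, scaled down to bytes

# The arithmetic from the answer: a 1 GB file with 64 MB blocks.
BLOCKS_FOR_1GB = (1024 * 1024 * 1024) // (64 * 1024 * 1024)


def split_into_blocks(data: bytes, block_size: int) -> List[bytes]:
    """Cut data at fixed byte offsets, like `split -b`, ignoring XML structure."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def is_well_formed(chunk: bytes) -> bool:
    """Return True if the chunk parses as a complete XML document on its own."""
    try:
        ET.fromstring(chunk)
        return True
    except ET.ParseError:
        return False


def extract_xy(data: bytes) -> List[Tuple[int, int]]:
    """Pull the (x, y) pairs out of the complete, unsplit document."""
    root = ET.fromstring(data)
    return [(int(r.findtext("x")), int(r.findtext("y"))) for r in root.iter("record")]


blocks = split_into_blocks(XML_DOC, BLOCK_SIZE)
parseable = sum(is_well_formed(b) for b in blocks)
print(f"1 GB / 64 MB = {BLOCKS_FOR_1GB} blocks")
print(f"{len(blocks)} sample blocks, {parseable} parse on their own")
print("pairs from the whole document:", extract_xy(XML_DOC)[:3])
```

The pieces fail to parse because each one starts or ends in the middle of an element, which is the situation a task would face if it only saw one raw block. The pairs returned by `extract_xy` could then be written to a CSV file for whichever plotting tool is used.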