Proc Groovy to parse larger XML into SAS
Problem description
We tried reading 3-4 GB XML files using the SAS XML Mapper, but when we PROC COPY the data from the XML engine to a SAS dataset it takes almost 5 to 6 minutes, which is too much time for us since we have to process 3000 files a day. We run almost 10 files in parallel. One table has almost 230 columns.
Is there any other, faster way to process the XML? Can we use PROC GROOVY? Would it be efficient? If yes, can anyone provide me sample code? I tried searching online but was not able to find any.
The XML contains PII data and is huge, around 3 GB.
The code being run is very simple and straightforward:
filename NHL "/path/ODM.xml";
filename map "/path/odm_map.map";
libname NHL xmlv2 xmlmap=map;
proc copy in=nhl out=work;
run;
Total tables created: 54, of which more than 14 tables have ~18,000 records and the remaining tables have ~1,000 records.
Log window:
NOTE: PROCEDURE COPY used (Total process time):
real time 4:03.72
user cpu time 4:00.68
system cpu time 1.17 seconds
memory 32842.37k
OS Memory 52888.00k
Timestamp 19/05/2020 03:14:43 PM
Step Count 4 Switch Count 802
Page Faults 3
Page Reclaims 17172
Page Swaps 0
Voluntary Context Switches 3662
Involuntary Context Switches 27536
Block Input Operations 504
Block Output Operations 56512
SAS Version : 9.4_M2
The total memory size on our server is MEMSIZE=3221225472 (3 GB).
Of the 3000 files in total, 1000 will be 3 to 4 GB, some will be around 1 GB, and 1000 will be in the KB range. The smaller files are processed quickly; the problem is only with the big files. Processing them uses almost the entire CPU.
The copy time from the XML engine varies when we reduce the number of tables, but for that to happen we would have to change the map file or the input XML.
We have already raised SAS tracks and asked the same question in the SAS communities, still no luck. It looks like a limitation of the parser itself.
Any idea about the shredder in Teradata? Would it be efficient?
Recommended answer
I would do this in two pieces: first convert the XML to ASCII, then read that into SAS. SAS is not going to be very fast at converting XML into SAS datasets; it's just not something SAS is optimized for. Your time is nearly all CPU time, so you're not disk limited - you're limited by SAS's ability to parse the XML file.
Write a program in a more optimized language that can parse the XML much faster, and then read its output into SAS. Python might be one option - it's not super optimized either, but I suspect it's better optimized for this sort of thing than SAS - or an even lower-level language (like C/C++) might be your best bet.
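As a rough illustration of that two-step approach, here is a minimal Python sketch that streams a large XML file with `xml.etree.ElementTree.iterparse` and writes flat CSV rows that SAS can then read with a simple DATA step. The tag and attribute names (`Subject`, `id`, `value`) are placeholders, not the actual ODM structure - you would map them to the elements your `.map` file extracts.

```python
# Sketch: stream-convert a large XML file to CSV without loading
# the whole document into memory. Tag names are hypothetical.
import csv
import io
import xml.etree.ElementTree as ET

SAMPLE_XML = """<odm>
  <Subject id="S1"><value>10</value></Subject>
  <Subject id="S2"><value>20</value></Subject>
</odm>"""


def xml_to_csv(xml_stream, csv_stream):
    writer = csv.writer(csv_stream)
    writer.writerow(["id", "value"])
    # iterparse yields each element as its closing tag is reached,
    # so memory use stays bounded even for multi-GB inputs.
    for event, elem in ET.iterparse(xml_stream, events=("end",)):
        if elem.tag == "Subject":
            writer.writerow([elem.get("id"), elem.findtext("value")])
            elem.clear()  # release the subtree we just wrote


out = io.StringIO()
xml_to_csv(io.StringIO(SAMPLE_XML), out)
print(out.getvalue())
```

For real 3-4 GB files you would pass file paths instead of `StringIO`, and the resulting CSV imports into SAS far faster than the XMLV2 engine parses the original XML.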