SAS: read in a PDF file

Question

I am looking for ways to read in a PDF file with SAS. Apparently this is not basic functionality, and there is very little to be found on the internet. (It does not help that searching Google for "PDF" also returns links to PDF documents about unrelated topics.)

The only things I can find are people looking for ways to import data from a PDF into datasets. For me, that is not even necessary. I would like to read the contents of the PDF file into one big character variable. If possible, it would be even better to read in the file's binary data.

Is this possible with SAS, and if so, how? (I got it to work in Access VBA, but I can't find a similar approach in SAS.)

(In the end, the purpose is to convert this to base64 and put that base64 string into an XML document.)

Recommended answer

You probably will not be able to read the entire file into one character variable, since the maximum length of a character variable is 32,767 bytes (about 32 KB). A simple way to read in one line at a time, though, is something like the following:

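%let pdfFileName = Test.pdf;
%let lineSize = 2000;

data base;
   /* Declare the variable wide enough to hold one full record. */
   length text_line $&lineSize;
   /* TRUNCOVER keeps short records from pulling in the next line. */
   infile "&pdfFileName" lrecl=&lineSize truncover;
   /* $CHARw. reads the whole record, including blanks; plain list
      input (text_line $) would stop at the first blank. */
   input text_line $char&lineSize..;
run;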

This requires a general idea of the maximum record length ahead of time, but you could write additional code to determine the maximum record size before reading in the file. In this example, each line of text is read into one character variable named text_line. From there, you could use a RETAIN statement or a double trailing @ (@@) on the INPUT statement to process multiple lines at a time. The SAS website has plenty of documentation on how to read and process text from various types of input files.
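
As a rough illustration of the "additional code" mentioned above, the following sketch scans the file once and stores the longest record length in the lineSize macro variable before the real read. It assumes the same pdfFileName macro variable as the example; the names recLen and maxLen are made up for this sketch. It also assumes no record is longer than 32,767 bytes, which may not hold for a PDF with large binary streams, so treat the result as an estimate rather than a guarantee.

```sas
%let pdfFileName = Test.pdf;   /* same placeholder file name as above */

data _null_;
   /* LENGTH= names a variable that receives the length of each record;
      recLen and maxLen are hypothetical names used only in this sketch. */
   infile "&pdfFileName" lrecl=32767 length=recLen end=eof;
   input;                        /* load the record without reading variables */
   retain maxLen 0;
   maxLen = max(maxLen, recLen); /* keep the longest length seen so far */
   /* After the last record, publish the result as a macro variable. */
   if eof then call symputx('lineSize', maxLen);
run;

%put NOTE: Longest record found is &lineSize bytes.;
```

Running this step first and then the DATA step from the answer lets the LENGTH, LRECL= and informat width follow the file instead of a hard-coded guess.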
