Extracting text from a PDF

Question

I want to select the path of a PDF file:

private static final int DIALOG_LOAD_FILE = 1000;

I have two buttons, one to get the path of the PDF file and the other to extract the text:

Button b1 = (Button) x.findViewById(R.id.buttonStripText);
        Button button = (Button) x.findViewById(R.id.pick);
        button.setOnClickListener(new View.OnClickListener()
        {
            @Override
            public void onClick(View v)
            {
                Intent intent = new Intent(Intent.ACTION_GET_CONTENT);
                intent.setType("file/*");
                startActivityForResult(intent,DIALOG_LOAD_FILE);
            }
        });
        b1.setOnClickListener(new View.OnClickListener()
        {
            @Override
            public void onClick(View v)
            {
                stripText(v);
            }
        });

The other functions are:

@Override
    public void onActivityResult(int requestCode, int resultCode, Intent data) {
        // TODO Auto-generated method stub
        switch(requestCode){
            case DIALOG_LOAD_FILE:
                if(resultCode==RESULT_OK){
                   fileName = data.getData().getPath();
                   System.out.println("Your File Name is:::"+fileName);
                }
                break;

        }
    }
    private void setup() {
        PDFBoxResourceLoader.init(getActivity().getApplicationContext());
        root = android.os.Environment.getExternalStorageDirectory();
        assetManager = getActivity().getAssets();
    }
    public void stripText(View v) {
        String parsedText = null;
        try {

            PDDocument document  = PDDocument.load(assetManager.open("cover_letter.pdf"));
            PDFTextStripper pdfStripper = new PDFTextStripper();
            pdfStripper.setStartPage(0);
            pdfStripper.setEndPage(1);
            parsedText = "Parsed text: " + pdfStripper.getText(document);
            if (document != null) document.close();
        } catch (Exception e) {
            e.printStackTrace();
        }

        tv.setText(parsedText);
    }

It doesn't throw any error, but it also doesn't return the extracted text. This DIALOG_LOAD_FILE picker opens Google Drive; if possible, please show me how to open internal storage instead. Any help would be appreciated!

Recommended answer

The call in your code,

PDDocument document = PDDocument.load(assetManager.open("cover_letter.pdf"));

is one case of the general form

PDDocument document = PDDocument.load(/* any InputStream */);

So if you can open an InputStream from assets, from a raw resource, from a file, or from a URI, you are done.
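To make that point concrete outside Android, here is a minimal plain-Java sketch. The class `StreamSourceSketch` and its methods are hypothetical names, and `countBytes` is only a stand-in for `PDDocument.load(InputStream)`: like the real loader, it sees nothing but the stream, so the same call works whether the bytes come from memory or from a file.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

final class StreamSourceSketch {

    // Hypothetical stand-in for PDDocument.load(InputStream): the consumer
    // only reads the stream and never learns where the bytes came from.
    static int countBytes(InputStream in) {
        try (InputStream stream = in) {
            byte[] buffer = new byte[4096];
            int total = 0;
            int read;
            while ((read = stream.read(buffer)) != -1) {
                total += read;
            }
            return total;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Source 1: bytes already held in memory.
    static InputStream fromMemory(byte[] bytes) {
        return new ByteArrayInputStream(bytes);
    }

    // Source 2: a file on disk.
    static InputStream fromFile(Path path) {
        try {
            return Files.newInputStream(path);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "%PDF-1.4 placeholder bytes".getBytes(StandardCharsets.US_ASCII);
        Path temp = Files.createTempFile("sketch", ".pdf");
        try {
            Files.write(temp, data);
            // Both lines print the same count: the consumer is source-agnostic.
            System.out.println(countBytes(fromMemory(data)));
            System.out.println(countBytes(fromFile(temp)));
        } finally {
            Files.deleteIfExists(temp);
        }
    }
}
```

On Android the sources would be `assetManager.open(...)` or a stream opened from the picked URI, but the consuming call stays the same.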

For instance, if you get a URI in onActivityResult:

// In a Fragment, as in the question, call getActivity().getContentResolver() instead.
InputStream is = getContentResolver().openInputStream(data.getData());
PDDocument document = PDDocument.load(is);
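A note on why the stream is opened from the URI itself rather than from `data.getData().getPath()` as in the question: pickers launched with `ACTION_GET_CONTENT` commonly return `content://` URIs, and the path part of such a URI is an identifier for the provider, not a location on disk. The sketch below illustrates this with `java.net.URI` as a stand-in for `android.net.Uri`. The URI string and the class `ContentUriSketch` are hypothetical.

```java
import java.net.URI;

final class ContentUriSketch {

    // Only file:// URIs carry a path that names a file-system location;
    // for any other scheme the path is opaque to the caller.
    static boolean pathIsFileSystemPath(URI uri) {
        return "file".equals(uri.getScheme());
    }

    public static void main(String[] args) {
        // Hypothetical URI of the kind a document picker might return.
        URI picked = URI.create("content://com.example.provider/document/42");

        System.out.println(picked.getScheme());           // prints "content"
        System.out.println(picked.getPath());             // prints "/document/42"
        System.out.println(pathIsFileSystemPath(picked)); // prints "false"
    }
}
```

Here "/document/42" looks like a path but cannot be opened as a file, which is why the URI is handed to the content resolver to open a stream.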
