计算PDF中的单词数 [英] Count Number of Words in PDF

查看:253
本文介绍了计算PDF中的单词数的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

大家好,
我以这篇文章开始我的载体,我的任务是上传pdf文件时计算PDF文件中存在的单词数.任何人都可以帮助我该怎么做.希望有人能帮助我完成我的第一个任务生活.


谢谢,
Neelam.

Hi to all,
I am starting my carrier with thi post ,I have task to count number of words existed in PDF file when uploading pdf file.Can any one help me how can I do it.Hope some one will help me to complete my first task in my life.


Thanks,
Neelam.

推荐答案

尊敬的Neelam,

尝试对此进行一些Google搜索:-> http://stackoverflow.com/questions/6734374/get-only-word- count-from-pdf-document [ ^ ]

希望这对您有所帮助.

谢谢
Dear Neelam,

Try to do some google search on this:- google.com[^]

the solution to you problem is in this link:-

http://stackoverflow.com/questions/6734374/get-only-word-count-from-pdf-document[^]

Hope this will help you out.

Thanks


检查这些链接..

从pdf文档中仅获取字数 [使用ItextPdf [^ ]
check these links..

Googled[^]

get only word count from pdf document[^]

use ItextPdf[^]


neelamrathod,
欢迎来到codeproject.

Hi neelamrathod,
welcome to codeproject.

using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Drawing;
using System.Linq;
using System.Text;
using System.Windows.Forms;
using System.IO;
using iTextSharp.text.pdf.parser;

namespace WindowsFormsApplication1
{
    public partial class Form1 : Form
    {
        public Form1()
        {
            InitializeComponent();
        }

        private void Form1_Load(object sender, EventArgs e)
        {
            string InputFile = System.IO.Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.Desktop), "Input.pdf");

            //Get all the text
            string T = ExtractAllTextFromPdf(InputFile);
            //Count the words
            int I = GetWordCountFromString(T);

        }

        public static string ExtractAllTextFromPdf(string inputFile)
        {
            //Sanity checks
            if (string.IsNullOrEmpty(inputFile))
                throw new ArgumentNullException("inputFile");
            if (!System.IO.File.Exists(inputFile))
                throw new System.IO.FileNotFoundException("Cannot find inputFile", inputFile);

            //Create a stream reader (not necessary but I like to control locks and permissions)
            using (FileStream SR = new FileStream(inputFile, FileMode.Open, FileAccess.Read, FileShare.Read))
            {
                //Create a reader to read the PDF
                iTextSharp.text.pdf.PdfReader reader = new iTextSharp.text.pdf.PdfReader(SR);

                //Create a buffer to store text
                StringBuilder Buf = new StringBuilder();

                //Use the PdfTextExtractor to get all of the text on a page-by-page basis
                for (int i = 1; i <= reader.NumberOfPages; i++)
                {
                    Buf.AppendLine(PdfTextExtractor.GetTextFromPage(reader, i));
                }

                return Buf.ToString();
            }
        }
        public static int GetWordCountFromString(string text)
        {
            //Sanity check
            if (string.IsNullOrEmpty(text))
                return 0;

            //Count the words
            return System.Text.RegularExpressions.Regex.Matches(text, "\\S+").Count;
        }
    }
}


这篇关于计算PDF中的单词数的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆