How to get font properties (font size, font style, font colour) from extracted PDF text?


Question




Hi all,

I am using the code below to extract text from a PDF file:

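// Requires: using System.Text; using iTextSharp.text.pdf; using iTextSharp.text.pdf.parser;
public string ReadPdfFile()
{
    StringBuilder strText = new StringBuilder();
    PdfReader reader = new PdfReader(@"\\FilePath");
    try
    {
        for (int page = 1; page <= reader.NumberOfPages; page++)
        {
            // Use a fresh strategy per page so text from earlier pages is not repeated.
            ITextExtractionStrategy its = new iTextSharp.text.pdf.parser.SimpleTextExtractionStrategy();
            String s = PdfTextExtractor.GetTextFromPage(reader, page, its);

            // The returned value is already a .NET string; round-tripping it through
            // Encoding.Default only replaces unsupported characters with '?', so that step is dropped.
            strText.Append(s);
        }
    }
    finally
    {
        // Release the file even if extraction throws; errors now reach the caller
        // instead of being swallowed by an empty catch block.
        reader.Close();
    }
    return strText.ToString();
}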










I need to get the font properties (font size, font style, font colour) of the extracted text so that I can compare them.
I need that logic to be applied right after these lines of code:

ITextExtractionStrategy its = new iTextSharp.text.pdf.parser.SimpleTextExtractionStrategy();
String s = PdfTextExtractor.GetTextFromPage(reader, page, its);



Does anyone know how to get font properties such as font size, font style, and font colour from the extracted PDF text?

Thanks in advance,

Kane




Recommended answer

You can achieve this with iTextSharp.

Check these links:

http://stackoverflow.com/questions/6882098/how-can-i-get-text-formatting-with-itextsharp

and

http://stackoverflow.com/questions/3750150/i-want-to-export-pdf-to-xml-with-font-information-as-attribute-values

Or try this open source project:

http://sourceforge.net/projects/pdfsharp/
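
As a rough illustration of the iTextSharp route, the sketch below wraps an existing extraction strategy and records font details for each text chunk as it is rendered. It assumes iTextSharp 5.x, and a release recent enough that TextRenderInfo exposes GetFillColor(). The names TextChunkInfo, FontInfoExtractionStrategy and FontInfoDemo are hypothetical and exist only for this example. Two caveats: iTextSharp 5.x does not report the font size directly, so the size here is approximated from the distance between the ascent and descent lines (unreliable for rotated text), and the bold/italic flags are guessed from the PostScript font name, which is a heuristic rather than a real style attribute.

```csharp
using System;
using System.Collections.Generic;
using iTextSharp.text;
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;

// Hypothetical holder for one rendered text chunk; not part of iTextSharp.
public class TextChunkInfo
{
    public string Text;
    public string FontName;
    public float ApproxFontSize;   // ascent minus descent, an approximation only
    public bool LooksBold;         // guessed from the font name
    public bool LooksItalic;       // guessed from the font name
    public BaseColor FillColor;    // may be null if no fill colour is available
}

// Hypothetical wrapper: forwards every callback to an inner strategy so the
// extracted text is unchanged, and collects font details on the side.
public class FontInfoExtractionStrategy : ITextExtractionStrategy
{
    private readonly ITextExtractionStrategy inner;
    public readonly List<TextChunkInfo> Chunks = new List<TextChunkInfo>();

    public FontInfoExtractionStrategy(ITextExtractionStrategy inner)
    {
        this.inner = inner;
    }

    public void BeginTextBlock() { inner.BeginTextBlock(); }
    public void EndTextBlock() { inner.EndTextBlock(); }
    public void RenderImage(ImageRenderInfo renderInfo) { inner.RenderImage(renderInfo); }
    public string GetResultantText() { return inner.GetResultantText(); }

    public void RenderText(TextRenderInfo renderInfo)
    {
        inner.RenderText(renderInfo);

        // Subset fonts usually carry a prefix such as "ABCDEF+" in this name.
        string fontName = renderInfo.GetFont().PostscriptFontName ?? string.Empty;

        // Vertical extent of the chunk in user space (Vector.I2 is the y component).
        float top = renderInfo.GetAscentLine().GetStartPoint()[Vector.I2];
        float bottom = renderInfo.GetDescentLine().GetStartPoint()[Vector.I2];

        Chunks.Add(new TextChunkInfo
        {
            Text = renderInfo.GetText(),
            FontName = fontName,
            ApproxFontSize = Math.Abs(top - bottom),
            LooksBold = fontName.IndexOf("Bold", StringComparison.OrdinalIgnoreCase) >= 0,
            LooksItalic = fontName.IndexOf("Italic", StringComparison.OrdinalIgnoreCase) >= 0
                       || fontName.IndexOf("Oblique", StringComparison.OrdinalIgnoreCase) >= 0,
            FillColor = renderInfo.GetFillColor() // assumed available in the installed 5.x release
        });
    }
}

public static class FontInfoDemo
{
    // Prints one line per text chunk on the given page.
    public static void DumpPage(PdfReader reader, int page)
    {
        FontInfoExtractionStrategy its =
            new FontInfoExtractionStrategy(new SimpleTextExtractionStrategy());
        string s = PdfTextExtractor.GetTextFromPage(reader, page, its);

        foreach (TextChunkInfo c in its.Chunks)
        {
            string colour = c.FillColor == null
                ? "unknown"
                : c.FillColor.R + "," + c.FillColor.G + "," + c.FillColor.B;
            Console.WriteLine("'{0}' font={1} size~{2:F1} bold={3} italic={4} rgb={5}",
                c.Text, c.FontName, c.ApproxFontSize, c.LooksBold, c.LooksItalic, colour);
        }
    }
}
```

Because the wrapper delegates to SimpleTextExtractionStrategy, the string returned by GetTextFromPage stays the same as in the question's code, and the per-chunk font data is available in Chunks for comparison afterwards.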

