Is it possible to create a website with a search engine that only searches the PDFs I upload?


Question

I am trying to create the user interface for an educational video library. The videos are hosted elsewhere, and I want to build a user-friendly site with an extensive search engine that covers only the content of the videos. At the moment I manually tag each video link with 20-30 keywords. However, I am hoping that if I can work out how to use the PDF transcript of each video as searchable text, the tagging will become automatic and the search results will improve. I know there are many OCR websites out there, but I haven't found any personal sites with custom OCR search engines. Is this possible?

Recommended answer

OCR? It sounds like you need iTextSharp. Check out its SourceForge page and read up on how to use it. Note that it reads the text already stored in a PDF; it does not recognize text in scanned images. Here's a simple snippet to get you started with extracting text from a PDF file:

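using System.IO;
using System.Text;
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;

// Place this method inside a class of your own.
// Extracts the text of every page in a PDF and returns it as one string.
public string ParsePdf(string fileName)
{
  if (!File.Exists(fileName))
    throw new FileNotFoundException("PDF file not found.", fileName);

  using (PdfReader reader = new PdfReader(fileName))
  {
    StringBuilder sb = new StringBuilder();

    // iTextSharp page numbers are 1-based.
    for (int page = 1; page <= reader.NumberOfPages; page++)
    {
      // Use a fresh strategy per page; a reused instance keeps the text of earlier pages.
      ITextExtractionStrategy strategy = new SimpleTextExtractionStrategy();
      string text = PdfTextExtractor.GetTextFromPage(reader, page, strategy);
      if (!string.IsNullOrWhiteSpace(text))
      {
        // .NET strings are already Unicode, so no encoding round-trip is needed.
        sb.AppendLine(text);
      }
    }

    return sb.ToString();
  }
}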
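
Once the transcript text is extracted, one way to make it searchable is a small inverted index that maps each word to the videos whose transcripts contain it. The following is only a minimal in-memory sketch, not part of iTextSharp: the class name `TranscriptIndex`, its `Add` and `Search` methods, and the `videoId` key are all hypothetical names chosen for illustration. It assumes every query word must match (an AND search) and does no stemming or ranking.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: a tiny in-memory inverted index over transcript text.
public class TranscriptIndex
{
    // Maps a lowercase word to the ids of the videos whose transcript contains it.
    private readonly Dictionary<string, HashSet<string>> _postings =
        new Dictionary<string, HashSet<string>>();

    // Splits text into lowercase tokens made of letters and digits.
    private static IEnumerable<string> Tokenize(string text)
    {
        return Regex.Matches(text ?? string.Empty, @"[\p{L}\p{Nd}]+")
                    .Cast<Match>()
                    .Select(m => m.Value.ToLowerInvariant());
    }

    // Registers one transcript; videoId is whatever key the site uses for the video link.
    public void Add(string videoId, string transcriptText)
    {
        foreach (string word in Tokenize(transcriptText))
        {
            HashSet<string> ids;
            if (!_postings.TryGetValue(word, out ids))
            {
                ids = new HashSet<string>();
                _postings[word] = ids;
            }
            ids.Add(videoId);
        }
    }

    // Returns the sorted ids of videos whose transcripts contain every query word.
    public List<string> Search(string query)
    {
        HashSet<string> result = null;
        foreach (string word in Tokenize(query))
        {
            HashSet<string> ids;
            if (!_postings.TryGetValue(word, out ids))
                return new List<string>();      // a word with no match fails the AND query
            if (result == null)
                result = new HashSet<string>(ids);
            else
                result.IntersectWith(ids);
        }
        return result == null
            ? new List<string>()
            : result.OrderBy(id => id, StringComparer.Ordinal).ToList();
    }
}
```

In practice each transcript would be added with something like `index.Add(videoId, ParsePdf(path))`, and the ids returned by `Search` would be mapped back to the video links. For a larger library, a dedicated full-text search library or a database full-text index is likely a better fit than this sketch.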

