Using Ruby And Ubuntu With Optical Character Recognition

Question

I am a university student and it's time to buy textbooks again. This quarter I need over 20 books for my classes. Normally this wouldn't be a big deal, as I would just copy and paste the ISBNs into Amazon. On my school's book site, however, the ISBNs are rendered as images. All I want is to get each ISBN into a string so I don't have to type every one by hand. I have used GOCR to convert the images into text, but I want to drive it from a Ruby script so I can automate the process and do the same for my classmates.

I can navigate to the site. How can I save each image to a file on my computer (running Ubuntu), convert it with GOCR, and save the result to a file that my Ruby script can read later?

Recommended answer

GOCR seems like a good choice at first, but from what I can tell from my own "research", its quality isn't quite sufficient for daily use. Depending on the input image, that could be a problem. If it doesn't work out for you, try the "new" Google Docs feature that lets you upload images for OCR. You can then retrieve the results through one of the Google APIs (there are tons out there; I'm using gdata-ruby-util, which requires some hacking, though).

You could also use tesseract-ocr for the OCR part; it is also open source and in active development.
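As a rough illustration of the OCR step, here is a minimal sketch of calling either tool from Ruby and cleaning up the result. It assumes that `gocr` or `tesseract` is installed and on the PATH, that the image is in a format the chosen tool can read, and that each image contains nothing but the ISBN. The names `OCR_COMMANDS`, `ocr_image`, and `extract_isbn` are made up for this example.

```ruby
require "open3"

# Command lines for the two OCR tools mentioned above.
# Assumption: the binaries are installed and on the PATH.
OCR_COMMANDS = {
  gocr: ->(path) { ["gocr", "-i", path] },
  tesseract: ->(path) { ["tesseract", path, "stdout"] }
}.freeze

# Run the chosen OCR tool on an image file and return its raw text output.
def ocr_image(path, tool: :gocr)
  command = OCR_COMMANDS.fetch(tool).call(path)
  output, status = Open3.capture2(*command)
  raise "OCR command failed: #{command.join(' ')}" unless status.success?
  output
end

# Reduce noisy OCR output to a bare ISBN-10 or ISBN-13 string.
# Returns nil when the remaining characters do not look like an ISBN.
# Assumption: the text holds only the ISBN (plus separators or whitespace).
def extract_isbn(text)
  candidate = text.upcase.gsub(/[^0-9X]/, "")
  candidate[/\A(?:\d{13}|\d{9}[\dX])\z/]
end
```

With this in place, something like `extract_isbn(ocr_image("isbn1.png", tool: :tesseract))` would give either a clean ISBN string or nil, and a nil result is a signal to check that image by hand. The sketch only checks the length and character set; it does not verify the ISBN check digit.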

For the retrieval part, I would also stick with hpricot; it is super-powerful and flexible.
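To make the retrieval step concrete, here is a minimal sketch that finds the image URLs on a page and saves an image to disk. To stay self-contained it uses only Ruby's standard library and a simple regular expression instead of hpricot, so treat the tag matching as a stand-in for a real HTML parser. The names `image_urls` and `download` are hypothetical, and the sketch assumes the images can be fetched with a plain GET request without a login or redirects.

```ruby
require "net/http"
require "uri"

# Collect the absolute URLs of all <img> tags found in an HTML string.
# Assumption: src attributes are quoted; an HTML parser would be more robust.
def image_urls(html, base_url)
  html.scan(/<img\b[^>]*?\bsrc\s*=\s*["']([^"']+)["']/i).map do |(src)|
    URI.join(base_url, src).to_s
  end
end

# Download one URL and write the response body to dest as binary data.
# Returns dest so the path can be handed straight to an OCR step.
def download(url, dest)
  response = Net::HTTP.get_response(URI(url))
  unless response.is_a?(Net::HTTPSuccess)
    raise "Download failed (#{response.code}): #{url}"
  end
  File.binwrite(dest, response.body)
  dest
end
```

A script could fetch the book list page, pass its HTML to `image_urls`, call `download` for each result, and then run the OCR tool on each saved file. Filtering the URLs down to just the ISBN images depends on how the school's site names them, which is not known here.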
