PNG image to text conversion


Problem description

Hi respected programmers,

I have a small project where I have to convert an image (PNG format) into text.

Can anyone please help me?

Thanks in advance.

I have used this code:

// Requires System, System.Drawing, System.Drawing.Imaging, System.IO and System.Windows.Forms.

// Encodes the bitmap's PNG bytes as a base64 string (this is not OCR).
private string ConvertImage(Bitmap sBit)
{
    using (MemoryStream imageStream = new MemoryStream())
    {
        sBit.Save(imageStream, ImageFormat.Png);
        return Convert.ToBase64String(imageStream.ToArray());
    }
}

private void button1_Click(object sender, EventArgs e)
{
    using (Bitmap sBit = new Bitmap(@"C:\abc.png"))
    using (StreamWriter sw = new StreamWriter(@"C:\wal.doc", false))
    {
        // Writes the base64 text into a file that merely has a .doc extension.
        sw.Write(ConvertImage(sBit));
    }
    MessageBox.Show("success");
}



It ran successfully and produced a .doc file that contains this: "iVBORw0KGgoAAAANSUhEUgAAAMgAAAAPAQMAAACbexLRAAAAAXNSR0IArs4c6QAAAARnQU1BAACxjwv8YQUAAAAgY0hSTQAAeiYAAICEAAD6AAAAgOgAAHUwAADqYAAAOpgAABdwnLpRPAAAAAZQTFRF////AgAAVkYHEAAAAG1JREFUKM9jYKAqYD7eZi9RIXH4+PwGNBm2nHPMFmcsj+XcYMCQEWOL4Kk6xnMfQya/jU2i4sbxAzfRTWN+c4xHgnfGseRGdBmGHJDMj2NpjJguOCYhcUbiWI4khmnH58+oq5D+fP48dUODaAAAFpUkO0wZp50AAAAASUVORK5CYII=" but my picture contains the number 96171341725.

Recommended answer

http://sourceforge.net/projects/netocr/
How To: Use Office 2007 OCR Using C#


You will find many OCR SDKs on Google. A few of them are listed below.

1) Atalasoft OCR SDK
http://www.atalasoft.com/products/dotimage/white-papers/the-atalasoft-ocr-engine

2) Accusoft SmartZone OCR SDK
http://www.accusoft.com/smartzone.htm

3) ABBYY OCR SDK
http://www.abbyy.com/ocr_sdk_windows/key_features/ocr/?adw=google&gclid=CKS7n9veq6wCFUp66wodkSLH3A

4) Nuance OmniPage OCR SDK
http://www.nuance.com/for-business/by-product/omnipage/csdk/index.htm

5) Tesseract OCR
http://code.google.com/p/tesseract-ocr/
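
As a rough illustration of what using one of these engines looks like, here is a minimal C# sketch built on Tesseract. It rests on several assumptions that are not part of the original thread: the "Tesseract" NuGet package (a .NET wrapper around the engine) is installed, and a folder with the English trained data (eng.traineddata) is available. The class name OcrSketch, the method ReadDigits, and both parameters are hypothetical names chosen for this example.

```csharp
using Tesseract; // assumption: the "Tesseract" NuGet package (a .NET wrapper)

public static class OcrSketch
{
    // imagePath:   the PNG to read, e.g. the asker's C:\abc.png.
    // tessDataDir: hypothetical folder that contains eng.traineddata.
    public static string ReadDigits(string imagePath, string tessDataDir)
    {
        using (var engine = new TesseractEngine(tessDataDir, "eng", EngineMode.Default))
        {
            // Hint that only digits are expected, since the picture holds a number.
            // Some engine versions may ignore this whitelist.
            engine.SetVariable("tessedit_char_whitelist", "0123456789");

            using (var image = Pix.LoadFromFile(imagePath))
            using (var page = engine.Process(image))
            {
                // The recognized text is a guess and can contain mistakes.
                return page.GetText().Trim();
            }
        }
    }
}
```

The recognized string could then be written to a plain .txt file with File.WriteAllText. Accuracy depends heavily on image quality, so a small 1-bit image may need to be scaled up before recognition.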



First of all, you don't seem to know that converting an image to a printable string (base64 encoding, a form of serialization) is very different from OCR (mentioned above), which is the right way to go. I bet your image "shows" the number you mentioned, but that number is only drawn as pixels. Base64 encoding takes the bytes of the PNG file (a binary format) and re-encodes them as what you call a "string": text made up only of printable characters, with no whitespace or control characters. That is the nonsense you put into the .doc file, and it cannot work.

OCR (optical character recognition) is the way to extract, or rather guess (not convert), characters or strings from a pixel-based image such as a PNG. This is not a beginner's task, but you may try. A few things worth mentioning: the Office OCR needs an older version of Office, since newer versions no longer include it; knowing this may save you some time. There are a few articles on CodeProject that no longer work or have to be partially rewritten, because APIs or namespaces have changed over time or no longer exist. So take the hints above and keep trying. It seems you are using XP (because you can write to C:\), so these older examples may work on your desktop, but be aware that they may not work on Windows 7 or Vista.

Finally, I have working examples of OCR, but it would take me hours to assemble one for you, sorry. So keep trying. Apart from that, a .doc file must have a certain structure: you cannot write a plain string into a .doc file and expect Word to open it correctly.
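
To make the base64 point concrete, here is a minimal sketch, using only the standard library, that decodes base64 text back into bytes and reads the PNG header. It illustrates that the string from the question is simply the image file itself in another encoding. The class name PngHeaderDemo and its methods are hypothetical names chosen for this example, not part of the original code.

```csharp
using System;

// Illustrative helper: decoding the base64 text yields PNG file bytes,
// not the number that is drawn in the picture.
public static class PngHeaderDemo
{
    // The 8-byte signature that starts every PNG file.
    private static readonly byte[] PngSignature =
        { 0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A };

    // Decodes base64 text and reads the width and height stored in the IHDR chunk.
    public static (int Width, int Height) ReadSize(string base64)
    {
        byte[] bytes = Convert.FromBase64String(base64);
        if (bytes.Length < 24)
            throw new ArgumentException("Too short to hold a PNG header.");

        for (int i = 0; i < PngSignature.Length; i++)
        {
            if (bytes[i] != PngSignature[i])
                throw new ArgumentException("Not a PNG file.");
        }

        // After the signature come the chunk length (4 bytes) and the type "IHDR"
        // (4 bytes), so the IHDR data starts at offset 16:
        // a 4-byte big-endian width followed by a 4-byte big-endian height.
        return (ReadBigEndianInt32(bytes, 16), ReadBigEndianInt32(bytes, 20));
    }

    private static int ReadBigEndianInt32(byte[] data, int offset)
    {
        return (data[offset] << 24) | (data[offset + 1] << 16)
             | (data[offset + 2] << 8) | data[offset + 3];
    }
}
```

Calling PngHeaderDemo.ReadSize with the first 44 characters of the string from the question ("iVBORw0KGgoAAAANSUhEUgAAAMgAAAAPAQMAAACbexLR") should report an image of 200 by 15 pixels. The text contains the picture's bytes, and nothing in it spells out 96171341725.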

