Plot digitization - scraping sample values from an image of a graph


Question

This isn't really "OCR", since it's not recognizing characters, but it's the same idea applied to curves. Does anyone know of an image-processing library or established algorithm for retrieving the values from a (raster) plot image? For instance, in this graph, it's hard for me to read exact values by eye because there are such large gaps between gridlines:

I can use a straight edge or whatever, but it's still going to be error-prone. It would be great if there were software that could just take a screenshot of any old graph and automatically convert it into a table of values or a function that could be queried.

This seems to be called "curve recognition"? It could also be used for extracting data from the curves in scientific papers for which the underlying data is not published.

And it's ok to have some human guidance. There's no reason an OCR couldn't read the "100" and match it up with the line, for instance, but it's ok to have a human give the lines numerical values after the machine has extracted the curve's path relative to the gridlines. I'm mostly interested in the function of tracing the curve relative to the grid, even if the grid is tilted, rotated, or warped in a non-affine way.
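To make the "human gives the lines numerical values" idea concrete, here is a minimal Python sketch. It is only an illustration under strong assumptions: the image is already a grid of pixel values, the axes are linear and aligned with the pixel rows and columns (so no tilt or warp), and the curve can be separated from the background by a simple per-pixel test. The names `make_axis_map`, `trace_curve`, and `digitize` are made up for this example and do not come from any library.

```python
def make_axis_map(pix_a, val_a, pix_b, val_b):
    """Build a pixel -> data-value mapping for one linear axis.

    The two (pixel, value) pairs are the human-supplied calibration,
    e.g. "the gridline at row 40 is 100, the one at row 240 is 0".
    """
    if pix_a == pix_b:
        raise ValueError("calibration pixels must differ")
    scale = (val_b - val_a) / (pix_b - pix_a)
    return lambda pix: val_a + (pix - pix_a) * scale


def trace_curve(image, is_curve):
    """For each pixel column, average the rows whose pixel is on the curve.

    `image` is a list of rows; `is_curve` is a caller-supplied predicate
    (an assumption: real plots usually need a colour or threshold test).
    Columns with no curve pixels are skipped.
    """
    points = []
    for col in range(len(image[0])):
        rows = [r for r, line in enumerate(image) if is_curve(line[col])]
        if rows:
            points.append((col, sum(rows) / len(rows)))
    return points


def digitize(image, is_curve, x_map, y_map):
    """Convert traced (column, row) pixels into (x, y) data values."""
    return [(x_map(c), y_map(r)) for c, r in trace_curve(image, is_curve)]


if __name__ == "__main__":
    # Tiny 5x4 "image": 1 marks a curve pixel, 0 is background.
    image = [
        [0, 0, 0, 1],
        [0, 0, 1, 0],
        [0, 1, 0, 0],
        [0, 0, 0, 0],
        [1, 0, 0, 0],
    ]
    x_map = make_axis_map(0, 0.0, 3, 30.0)   # column 0 -> 0, column 3 -> 30
    y_map = make_axis_map(4, 0.0, 0, 100.0)  # row 4 -> 0, row 0 -> 100
    for x, y in digitize(image, lambda p: p == 1, x_map, y_map):
        print(x, y)
```

Averaging the rows per column gives a sub-pixel estimate for thick lines, but it breaks down where the curve is nearly vertical or where several curves overlap, and it does nothing for a tilted or warped grid.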

Update:

There is now a Wikipedia article called "Converting scanned graphs to data", with a bunch of software in its links. There is also some software listed on alternativeto.net. I guess the theory belongs on http://dsp.stackexchange.com now, while the software solutions belong on http://superuser.com?

Recommended answer

This is extremely hard and error-prone. (We do this sort of thing a lot in chemistry where we try to analyze chemistry.) It depends critically on various parameters and conditions.


  1. Is the image a bitmap (pixels only) or vectors (EMF, WMF, SVG, PS, PDF...)? Vectors are vastly better than pixels. We tackle vectors (including PDF) but don't touch pixels. Some of our collaborators will try to use pixels, but only on fairly recent documents.
  2. If you are stuck with pixels, are your images all from the same source? If so, you have a small chance of extracting font information. I am afraid your image is so poor that it would require a great deal of work. However, if you can work out the font, you have a chance of extracting text and numbers, provided all documents are from the same source. You could use heuristics (rules such as where the numbers might be) or machine learning (a list of features on which the methods can be trained).
  3. Your image appears to have been scanned (as the axes are pixelated). That makes it even worse. What appears to be a straight line to the eye is horrible for a machine. Is your image skewed on the page? You may have to deskew it.
  4. If you have a model for the lines and curves, then you may have a chance of fitting the model's expected parameters to the image. But it's not trivial.
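
As a rough illustration of the deskewing mentioned in point 3, here is a minimal Python sketch. It assumes you have already picked out the pixel coordinates of a nominally horizontal axis or gridline (finding those pixels is the hard part and is not shown), estimates the tilt with an ordinary least-squares line fit, and rotates the points back. The function names `estimate_skew` and `deskew` are hypothetical, and this is not the method we use on vector documents.

```python
import math


def estimate_skew(points):
    """Angle (radians) of the least-squares line through `points`.

    `points` are (x, y) pixel coordinates sampled along a line that
    should be horizontal, such as the x-axis of the plot.
    """
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    if sxx == 0:
        raise ValueError("points have no horizontal spread")
    return math.atan2(sxy, sxx)  # atan of the fitted slope sxy / sxx


def deskew(points, angle, origin=(0.0, 0.0)):
    """Rotate `points` by -angle about `origin` to undo the tilt."""
    ox, oy = origin
    c, s = math.cos(-angle), math.sin(-angle)
    return [
        (ox + (x - ox) * c - (y - oy) * s,
         oy + (x - ox) * s + (y - oy) * c)
        for x, y in points
    ]


if __name__ == "__main__":
    # An axis that rises 1 pixel for every 10 pixels across.
    axis = [(0.0, 0.0), (10.0, 1.0), (20.0, 2.0), (30.0, 3.0)]
    angle = estimate_skew(axis)
    print(math.degrees(angle))
    print(deskew(axis, angle))
```

A single rotation only corrects a rigid tilt. A page that is warped in a non-affine way would need a richer model, and a scanned, pixelated axis will give a noisier fit than this clean example suggests.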

I'm sorry to be pessimistic. If you really want the info, then it can be done with a lot of investment or through collaboration with groups that do this sort of thing.
