Imago OCR is a toolkit for 2D chemical structure image recognition. It contains a GUI program and a command-line utility, as well as a documented API for developers. Imago is free and open-source, and it is also available under a commercial license.
The core of Imago is written from scratch in modern C++ and uses well-established algorithms for optical recognition, with portability and performance as primary design goals.
Recognizable Molecule Features¶
- Single, double, triple bonds, bridged bonds
- Atom labels, subscripts, isotopes, charges
- Superatoms and abbreviations expansion
- Aromatic rings
- Stereochemistry (up- and down-bonds)
You can find more examples on this page.
Comparison with other systems¶
We created a detailed report with sets of different images that compares Imago OCR with other publicly available solutions. The report is available on a separate web page. The scripts and the image sets are available in the download section.
If you can suggest other test sets or other publicly available solutions, we would be happy to include them in the report.
Imago was presented at a symposium of the 244th ACS National Meeting & Exposition.
The Imago library is written in portable C++ and supports Linux, Windows, and Mac OS X, in both 32-bit and 64-bit versions of each system.
Imago exposes a C interface to applications, and a Java wrapper is available for all supported platforms. Two front ends are provided: a Java GUI application called Imago OCR Visual Tool and a command-line utility called imago_console.
List of Dependencies¶
The dependencies are included in the distribution packages, so you do not need to download any of them separately to run the programs or to compile the source code.
Imago dependencies (C++ core and Java components):
- OpenCV library
- PicoPNG (optional module to load PNG images with changes for fail-safe PNG image loading)
- JNA (for Java wrapper)
- PDFRenderer (only for Imago OCR Visual Tool)
- Java Advanced Imaging (JAI) (only for Imago OCR Visual Tool, part of Java SDK)
You can find more details on the dependencies, including their licenses, on a separate page.
Supported Data Formats¶
Both the Imago OCR project and the imago_console tool support the most popular raster image formats, including RAS and others (depending on platform).
Imago OCR Visual Tool users can also open additional document formats (including TIFF) and select a fragment that should be recognized.
Developers who use the C API can pass images in a supported format, or raw image data, to the library. The recognition result can be saved as an MDL (Symyx, Accelrys) Molfile. Imago OCR Visual Tool can also copy the recognized molecule to the system clipboard.
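As a rough illustration of what a downstream program might do with a saved Molfile, the following Python sketch reads the atom and bond blocks of a small Molfile. It does not call the Imago API. It assumes the file uses the fixed-column V2000 layout, and the names `Atom`, `Bond`, `parse_molfile`, and `SAMPLE` are hypothetical helpers introduced here. The sample molecule is built by hand rather than produced by Imago.

```python
from dataclasses import dataclass


@dataclass
class Atom:
    symbol: str
    x: float
    y: float


@dataclass
class Bond:
    begin: int   # 1-based index of the first atom
    end: int     # 1-based index of the second atom
    order: int   # 1 = single, 2 = double, 3 = triple, 4 = aromatic
    stereo: int  # for single bonds: 0 = none, 1 = up, 6 = down


def parse_molfile(text):
    """Read atoms and bonds from a V2000 Molfile (fixed-width columns)."""
    lines = text.splitlines()
    counts = lines[3]  # three header lines come first, then the counts line
    if "V2000" not in counts:
        raise ValueError("this sketch only handles V2000 Molfiles")
    n_atoms, n_bonds = int(counts[0:3]), int(counts[3:6])

    atoms = []
    for line in lines[4:4 + n_atoms]:
        # columns 0-9 hold x, 10-19 hold y, 31-33 hold the atom symbol
        atoms.append(Atom(line[31:34].strip(), float(line[0:10]), float(line[10:20])))

    bonds = []
    for line in lines[4 + n_atoms:4 + n_atoms + n_bonds]:
        stereo = int(line[9:12].strip() or 0)
        bonds.append(Bond(int(line[0:3]), int(line[3:6]), int(line[6:9]), stereo))
    return atoms, bonds


def _atom_line(symbol, x, y):
    # x, y, z coordinates, the symbol, then twelve zeroed property fields
    return f"{x:10.4f}{y:10.4f}{0.0:10.4f} {symbol:<3} 0" + "  0" * 11


def _bond_line(begin, end, order, stereo=0):
    return f"{begin:3d}{end:3d}{order:3d}{stereo:3d}" + "  0" * 3


# Hand-built sample: the heavy atoms of acetaldehyde (C-C=O).
SAMPLE = "\n".join([
    "acetaldehyde",                               # header line 1: name
    "  hand-written sample",                      # header line 2: program info
    "",                                           # header line 3: comment
    f"{3:3d}{2:3d}" + "  0" * 8 + "999 V2000",    # counts line
    _atom_line("C", 0.0, 0.0),
    _atom_line("C", 1.5, 0.0),
    _atom_line("O", 2.25, 1.299),
    _bond_line(1, 2, 1),
    _bond_line(2, 3, 2),
    "M  END",
])

atoms, bonds = parse_molfile(SAMPLE)
print(len(atoms), "atoms,", len(bonds), "bonds")
```

In the V2000 bond block, the stereo field is where up- and down-bonds appear (values 1 and 6 on single bonds). Production code would normally use a cheminformatics toolkit instead of a hand-written parser like this one.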
Download and Install¶
Look at the Downloads page for the installation package suitable for your system. There is an installer for Windows, and zip files for Linux and Mac OS X, which you can simply unpack into the /opt directory or into your home directory.
You can also run Imago OCR Visual Tool without installing any files by using Java Web Start technology. Open the following JNLP file to launch Imago OCR Visual Tool.
Copyright © 2009-2014 LifeSciences unit of EPAM Systems, Inc.
This program is free software: You can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; version 3 of the License.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with this program. If you did not, please see http://www.gnu.org/licenses/.
Do you need assistance using our tools? Do you need a feature? Do you want to send a patch to us? Did you find a bug? Please write to one of the following newsgroups and let us know:
- http://groups.google.com/group/indigo-bugs : for bug reports on all Open-Source projects.
- http://groups.google.com/group/indigo-dev : for development topics.
- http://groups.google.com/group/indigo-general : for any other discussions.
No registration is required: you can write from your ordinary e-mail account to the e-mail address of the relevant group to get your message posted.
If the GPL-licensed Imago does not fit your needs, please contact us to discuss the purchase of a commercial license. You may need the commercial license if you want to:
- Receive ongoing support and maintenance
- Include Imago as a component in your proprietary software product