Comparison Report

Our evaluation framework compares various versions of the Imago OCR application with other applications for optical recognition of molecule images. For each image, the testing framework measures the execution time and the similarity score against a reference molecule file in Molfile format. The Indigo toolkit is used to measure molecule similarity. Because different applications produce output in different ways, the testing framework applies the following rules to standardize molecules:

  • Hydrogens are folded.
  • If the output contains multiple molecules in SDF format, all of them are merged into a single molecule with several disconnected fragments.
  • Both aromatized and dearomatized structures are compared, and the best score is selected.
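
The per-image measurement and the "best score" rule above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the framework's actual code: the names `best_similarity`, `evaluate`, `recognize`, `load_reference`, `similarity` and `forms` are all hypothetical, and the sketch does not call Indigo itself. The caller is assumed to pass in functions that wrap the chemistry toolkit, and hydrogen folding and fragment merging are assumed to happen inside `recognize` and `load_reference`.

```python
import time
from dataclasses import dataclass
from typing import Any, Callable, Iterable, List, Sequence


@dataclass
class Result:
    image: str       # path or name of the input image
    seconds: float   # wall-clock time spent in recognition
    score: float     # best similarity against the reference


def best_similarity(
    candidate: Any,
    reference: Any,
    similarity: Callable[[Any, Any], float],
    forms: Sequence[Callable[[Any], Any]],
) -> float:
    """Score the pair under every form (e.g. aromatized and
    dearomatized) and keep the highest score."""
    return max(similarity(form(candidate), form(reference)) for form in forms)


def evaluate(
    images: Iterable[str],
    recognize: Callable[[str], Any],       # hypothetical: image -> molecule
    load_reference: Callable[[str], Any],  # hypothetical: image -> reference molecule
    similarity: Callable[[Any, Any], float],
    forms: Sequence[Callable[[Any], Any]],
) -> List[Result]:
    """Time the recognition step for each image and score its output."""
    results = []
    for image in images:
        start = time.perf_counter()
        candidate = recognize(image)
        elapsed = time.perf_counter() - start  # only recognition is timed
        score = best_similarity(candidate, load_reference(image), similarity, forms)
        results.append(Result(image, elapsed, score))
    return results
```

Passing the toolkit operations in as plain functions keeps the sketch independent of any particular library; in a real run, the two entries in `forms` would be wrappers that return an aromatized and a dearomatized copy of a molecule.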

Diverse Dataset Report

The report file is available on a separate page. You can also download all the report files (together with the script files) from the Downloads page.

Report

This report covers 5 datasets of 500 molecule images each, drawn from different sources:

If you can suggest other test sets or other publicly available solutions, we would be happy to include them in the report.