ISRI Software
The ISRI OCR Performance Toolkit
The toolkit combines the UNLV/ISRI Analytic Tools for OCR Evaluation with a large and diverse collection of scanned document images and their associated ground-truth text. This combination of tools and test data allows anyone to conduct meaningful tests of OCR technology. This collection has a dedicated page: http://www.isri.unlv.edu/OCRtk/.
OCRspell
OCRspell is an interactive spelling correction system for correcting OCR errors in text. It selects candidate words through the use of information gathered from multiple knowledge sources. The system is based on static and dynamic device mappings, approximate string matching, and n-gram analysis. The statistically based, Bayesian system incorporates a learning feature that collects confusion information at the collection and document levels. The source code for version 1.0 is available, and a journal paper about the system is on our Publications page.
RubyHTML
RubyHTML is a library written for the Ruby scripting language to facilitate the creation and maintenance of scripts for dynamic web pages. The object model follows the HTML 4.01 Strict DTD as closely as possible: every element in that DTD exists as a class in RubyHTML. The library also contains "widgets" constructed from these base classes, the idea being that certain organizations of elements are repeated, such as tables used to display query results. These recurring organizations are (or will be...) formed into widget classes contained in the RubyHTML library. Documentation isn't very extensive right now, and the library still needs some testing, but all suggestions and comments are welcome. The library is available for download here.
Acronym Finding Program (AFP)
This is a C source distribution of version 2.0 of the AFP from 1995, as discussed in our paper "Recognizing Acronyms and their Definitions" (Taghva99b). Please note that this source is now quite old; although it is known to work, there may be some deficiencies, and you may see some warnings while building the program. Download afp-2.0.tar.gz.
Experimental Open Source OCR
Released under the Apache License, 2.0, with the following disclaimers:
- The source files have not been updated to reflect the new license.
- The Aspirin/MIGRAINES system included in the package is not licensed under the Apache License. It has the following license:
**************** NO WARRANTY *****************
Since the Aspirin/MIGRAINES system is licensed free of charge, Russell Leighton and the MITRE Corporation provide absolutely no warranty. Should the Aspirin/MIGRAINES system prove defective, you must assume the cost of all necessary servicing, repair or correction. In no way will Russell Leighton or the MITRE Corporation be liable to you for damages, including any lost profits, lost monies, or other special, incidental or consequential damages arising out of the use or inability to use the Aspirin/MIGRAINES system.

***************** COPYRIGHT *******************
This software is the copyright of Russell Leighton and the MITRE Corporation. It may be freely used and modified for research and development purposes. We require a brief acknowledgement in any research paper or other publication where this software has made a significant contribution. If you wish to use it for commercial gain you must contact The MITRE Corporation for conditions of use. Russell Leighton and the MITRE Corporation provide absolutely NO WARRANTY for this software.

August, 1992
Russell Leighton
The MITRE Corporation
7525 Colshire Dr.
McLean, Va. 22102-3481
- There may be other components with Open Source licenses which we are not allowed to change. A comprehensive license review has not been done by ISRI, and redistribution is not recommended until one has been done.