OCRopus

OCRopus is a free document analysis and optical character recognition (OCR) system released under the Apache License 2.0. It has a highly modular design built around command-line tools.

OCRopus was developed under the lead of Thomas Breuel at the German Research Centre for Artificial Intelligence (DFKI) in Kaiserslautern, Germany, and was sponsored by Google.

OCRopus was designed especially for high-volume book digitization projects, such as Google Books, the Internet Archive, or library collections, and is intended to support a large number of languages and fonts. It can also be used in desktop and office applications or in applications for visually impaired people.

OCRopus is divided into main components, and one or more scripts are available for each of them. This modular approach allows custom workflows to be assembled and individual steps to be exchanged.
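
The idea of exchangeable steps can be sketched in Python as a list of functions applied in order. This is a minimal illustration and not OCRopus code: the step names (`binarize`, `segment_lines`, `recognize`, `recognize_as_digits`) and the toy page data are hypothetical.

```python
from typing import Any, Callable, List

Step = Callable[[Any], Any]


def run_pipeline(steps: List[Step], data: Any) -> Any:
    """Feed the output of each step into the next one."""
    for step in steps:
        data = step(data)
    return data


# Hypothetical stand-ins for real processing steps. The toy "page" is a list
# of rows of grey values (0 = black, 255 = white).
def binarize(page: List[List[int]]) -> List[List[int]]:
    """Map grey values to 1 (ink) or 0 (background) with a fixed threshold."""
    return [[1 if px < 128 else 0 for px in row] for row in page]


def segment_lines(page: List[List[int]]) -> List[List[int]]:
    """Keep only rows that contain ink, a crude stand-in for line finding."""
    return [row for row in page if any(row)]


def recognize(lines: List[List[int]]) -> List[str]:
    """Placeholder recognizer: render each row with '#' and '.' characters."""
    return ["".join("#" if px else "." for px in row) for row in lines]


def recognize_as_digits(lines: List[List[int]]) -> List[str]:
    """Alternative placeholder recognizer: render each row as 0/1 digits."""
    return ["".join(str(px) for px in row) for row in lines]


page = [[255, 255, 255], [0, 200, 10], [255, 255, 255]]

default_result = run_pipeline([binarize, segment_lines, recognize], page)
# Exchanging the last step leaves the rest of the workflow untouched.
swapped_result = run_pipeline([binarize, segment_lines, recognize_as_digits], page)
```

Because every step only depends on the output of the previous one, a single stage can be replaced without changing the others, which is the property the modular design aims for.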

By default, OCRopus comes with a model for English text and a model for text in Fraktur. These models are tied to the script and are largely independent of the actual language. New characters or language variants can be trained from scratch or added later.

Text recognition in recent versions is based on recurrent neural networks (LSTM) and does not require a language model. This makes it possible to train language-independent models, and good recognition results for English, German, and French with a single model have been shown. Beyond the Latin script, there are results for other scripts and languages such as Sanskrit, Urdu, Devanagari, and Greek.
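
How text can be read off a network's output without a language model can be sketched as follows. The sketch assumes the network emits one score vector per horizontal position of the text line and that the output is collapsed in the CTC style: take the best class per position, merge repeats, and drop a blank class. The text above does not describe the decoding step, so this procedure, the `BLANK` index, and the toy alphabet are assumptions for illustration only.

```python
from typing import List, Sequence

BLANK = 0  # assumed index of the "no character" class


def greedy_decode(frames: Sequence[Sequence[float]], alphabet: str) -> str:
    """Pick the best class per frame, merge repeats, and drop blanks.

    alphabet[i - 1] is the character for class index i (index 0 is blank).
    No dictionary or language model is consulted at any point.
    """
    chars: List[str] = []
    previous = BLANK
    for scores in frames:
        best = max(range(len(scores)), key=scores.__getitem__)
        if best != BLANK and best != previous:
            chars.append(alphabet[best - 1])
        previous = best
    return "".join(chars)


# Toy scores for five positions over the classes (blank, 'a', 'b').
toy_frames = [
    [0.1, 0.8, 0.1],    # 'a'
    [0.2, 0.7, 0.1],    # 'a' again, merged with the previous frame
    [0.9, 0.05, 0.05],  # blank separates two identical characters
    [0.1, 0.8, 0.1],    # 'a'
    [0.1, 0.1, 0.8],    # 'b'
]
decoded = greedy_decode(toy_frames, "ab")
```

Since the characters come directly from the per-position scores, the same decoding works for any language written in the trained script.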

Very good recognition rates can be achieved with appropriate training. The extra effort is particularly worthwhile for difficult documents or for scripts that are no longer in common use and that other OCR software does not focus on.
