Page segmentation

Detecting text and images in heritage documents

Keywords: page segmentation, document layout analysis, text line detection

Approaches: convolutional neural networks, synthetic data

Tools: docExtractor

docExtractor is a generic approach for extracting visual elements such as text lines or illustrations from historical documents. It can be used as an off-the-shelf system or fine-tuned on a specific dataset. It relies on a fast generator of rich synthetic documents for training and on a fully convolutional network for extraction.
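A fully convolutional network typically predicts one label per pixel, so a post-processing step is needed to turn that label map into discrete elements. The sketch below illustrates one common way to do this, grouping same-label pixels into 4-connected components and reporting their bounding boxes. It is not taken from docExtractor: the label ids, the `extract_regions` helper, and the toy mask are all assumptions made for illustration, and the tool's actual post-processing may differ.

```python
from collections import deque

# Hypothetical label ids for illustration; not docExtractor's actual label scheme.
BACKGROUND, TEXT, ILLUSTRATION = 0, 1, 2


def extract_regions(mask, label):
    """Return inclusive bounding boxes (top, left, bottom, right) of the
    4-connected components of `label` in a 2D list of per-pixel labels."""
    height = len(mask)
    width = len(mask[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for y in range(height):
        for x in range(width):
            if mask[y][x] != label or seen[y][x]:
                continue
            # Breadth-first flood fill starting from this unvisited pixel.
            top, left, bottom, right = y, x, y, x
            queue = deque([(y, x)])
            seen[y][x] = True
            while queue:
                cy, cx = queue.popleft()
                top, bottom = min(top, cy), max(bottom, cy)
                left, right = min(left, cx), max(right, cx)
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    inside = 0 <= ny < height and 0 <= nx < width
                    if inside and not seen[ny][nx] and mask[ny][nx] == label:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


# Toy 6x8 mask standing in for a network's per-pixel prediction.
toy_mask = [
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0, 2, 2, 0],
    [0, 0, 0, 0, 0, 2, 2, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
]

text_boxes = extract_regions(toy_mask, TEXT)
illustration_boxes = extract_regions(toy_mask, ILLUSTRATION)
print(text_boxes)
print(illustration_boxes)
```

On a real page the mask would come from the network's output rather than a hand-written list, and small components would usually be filtered out as noise before the boxes are used.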

Example

Goals

Extraction of illustrations and text from heritage material for information retrieval, etc.

Educational resources

See this GitHub repository.

