Page segmentation
CENL-AI-WG edited this page Dec 9, 2020 · 16 revisions
Detecting text and images on heritage documents
Keywords: page segmentation, document layout analysis, text line detection
Approaches: convolutional neural networks, synthetic data
Tools: docExtractor
docExtractor is a generic approach for extracting visual elements, such as text lines or illustrations, from historical documents. It can be used as an off-the-shelf system or fine-tuned on a specific dataset. It relies on a fast generator of rich synthetic documents for training and on a fully convolutional network for extraction.
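
To make the idea of training on synthetic documents more concrete, here is a minimal toy sketch in plain Python. It is not docExtractor's generator: the function name `make_synthetic_page`, the label ids, and the page layout are all hypothetical choices made for illustration. It only shows the general principle that a generated page comes with a pixel-wise label mask for free, which is the kind of supervision a fully convolutional segmentation network is trained on.

```python
import random

# Hypothetical label ids, chosen for this sketch only.
BACKGROUND, ILLUSTRATION, TEXT_LINE = 0, 1, 2


def make_synthetic_page(width=64, height=96, n_lines=5, seed=0):
    """Return (image, mask) as nested lists.

    image holds grey levels in 0..255, mask holds one class label per pixel.
    """
    rng = random.Random(seed)
    image = [[255] * width for _ in range(height)]  # blank white page
    mask = [[BACKGROUND] * width for _ in range(height)]

    def fill(top, left, bottom, right, label, grey_range):
        # Paint a rectangle with random grey levels and record its label.
        for y in range(top, bottom):
            for x in range(left, right):
                image[y][x] = rng.randint(*grey_range)
                mask[y][x] = label

    # One illustration block in the top third of the page.
    illustration_bottom = height // 3
    fill(4, 8, illustration_bottom, width - 8, ILLUSTRATION, (60, 200))

    # Evenly spaced dark text lines below it, with ragged right margins.
    line_height = 4
    gap = (height - illustration_bottom - 8) // n_lines
    if gap <= line_height:
        raise ValueError("page too small for the requested number of lines")
    for i in range(n_lines):
        top = illustration_bottom + 6 + i * gap
        right = width - rng.randint(6, 20)
        fill(top, 6, top + line_height, right, TEXT_LINE, (0, 80))

    return image, mask


if __name__ == "__main__":
    image, mask = make_synthetic_page(seed=1)
    n_text = sum(row.count(TEXT_LINE) for row in mask)
    print(f"page {len(image[0])}x{len(image)}, {n_text} text-line pixels")
```

Because the mask is produced together with the image, no manual annotation is needed; a real generator would vary fonts, backgrounds, and degradations far more than this sketch does.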

Extraction of illustrations and text from heritage material, for information retrieval and similar uses.
See this GitHub repository.