🧾 Anamnesis Form – Automated PDF Analysis

This project focuses on the automated analysis and processing of anamnesis forms commonly used in medical documentation.
The goal is to extract printed content from PDF forms, evaluate checkboxes, and match results against a predefined catalog of statements.

⚠️ Due to the sensitive nature of the data and documents, the source code of this repository is not publicly available.

🧩 Project Overview

The pipeline consists of several stages:

PDF Conversion: Convert PDF files into high-resolution images.
Image Preprocessing: Binarize and deskew images to enhance OCR performance.
Text Extraction (OCR): Use Tesseract to extract text and detect checkboxes.
Checkbox Analysis: Recognize and classify "Yes"/"No" checkbox markings.
Catalog Matching: Match extracted sentences to a reference statement catalog.
Export: Output the structured results into CSV files.

⚙️ Setup & Installation

🐍 Python Dependencies

pip install pdf2image
pip install pytesseract
pip install opencv-python
pip install numpy
pip install Levenshtein

Poppler

Poppler is required for PDF-to-image conversion.

Windows:

Download Poppler and add its bin folder to your system PATH.

macOS:

brew install poppler

Linux:

sudo apt install poppler-utils
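
Both pdf2image and pytesseract are thin wrappers around external programs: pdf2image relies on Poppler tools such as pdftoppm, and pytesseract calls the tesseract binary, which has to be installed separately. The following is a minimal sketch for checking that both are reachable on PATH before running the pipeline. The function name `find_missing_tools` is hypothetical and not part of the project.

```python
import shutil

# External command-line tools the Python wrappers rely on:
# pdf2image uses Poppler's pdftoppm, pytesseract calls the tesseract binary.
REQUIRED_TOOLS = ("pdftoppm", "tesseract")


def find_missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the names of required executables that are not on PATH."""
    return [name for name in tools if which(name) is None]


if __name__ == "__main__":
    missing = find_missing_tools()
    if missing:
        print("Missing from PATH:", ", ".join(missing))
    else:
        print("All external tools found.")
```

The `which` parameter exists only so the lookup can be replaced in tests; in normal use the default `shutil.which` is sufficient.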

🛠️ Key Functions

Convert a PDF into image files

pdf_to_images(pdf_path)
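
Since the source code is not public, the following is only a sketch of how such a conversion step could look with pdf2image's `convert_from_path`. The `dpi` and `out_dir` parameters, the file naming scheme, and the helper `page_image_path` are assumptions made for illustration.

```python
from pathlib import Path


def page_image_path(pdf_path, page_number, out_dir="pages"):
    """Build an output path such as pages/form_page_001.png (hypothetical naming scheme)."""
    stem = Path(pdf_path).stem
    return Path(out_dir) / f"{stem}_page_{page_number:03d}.png"


def pdf_to_images(pdf_path, dpi=300, out_dir="pages"):
    """Render every page of a PDF to a PNG file and return the written paths."""
    # Imported lazily so the helper above works without pdf2image installed.
    from pdf2image import convert_from_path

    Path(out_dir).mkdir(parents=True, exist_ok=True)
    pages = convert_from_path(str(pdf_path), dpi=dpi)  # list of PIL images
    written = []
    for number, page in enumerate(pages, start=1):
        target = page_image_path(pdf_path, number, out_dir)
        page.save(target, "PNG")
        written.append(target)
    return written
```

A resolution of 300 dpi is a common starting point for OCR on scanned forms; the value the project actually uses is not stated.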

Preprocessing steps

binarize_image(image)
deskew_image(img_path)
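
The binarization method used by the project is not documented. One common choice is Otsu's method, which picks the threshold that best separates dark ink from light paper. The sketch below implements it with NumPy only and assumes the input is a 2-D uint8 grayscale array; it does not cover deskewing.

```python
import numpy as np


def otsu_threshold(gray):
    """Return the Otsu threshold for a 2-D uint8 grayscale array."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
    levels = np.arange(256)
    weight_bg = np.cumsum(hist)            # number of pixels with value <= t
    weight_fg = hist.sum() - weight_bg     # number of pixels with value > t
    sum_bg = np.cumsum(hist * levels)
    sum_all = sum_bg[-1]
    # Only thresholds that leave pixels on both sides are meaningful.
    valid = (weight_bg > 0) & (weight_fg > 0)
    mean_bg = np.where(valid, sum_bg / np.maximum(weight_bg, 1), 0.0)
    mean_fg = np.where(valid, (sum_all - sum_bg) / np.maximum(weight_fg, 1), 0.0)
    # Between-class variance (up to a constant factor); Otsu maximizes it.
    between = np.where(valid, weight_bg * weight_fg * (mean_bg - mean_fg) ** 2, 0.0)
    return int(np.argmax(between))


def binarize_image(gray):
    """Map pixels above the Otsu threshold to 255 (paper) and the rest to 0 (ink)."""
    threshold = otsu_threshold(gray)
    return np.where(gray > threshold, 255, 0).astype(np.uint8)
```

With OpenCV installed, `cv2.threshold` with the `cv2.THRESH_OTSU` flag performs the same kind of thresholding and is likely the more practical choice on full-page scans.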

Extract and identify keywords via OCR

get_sentences(img)
get_keywords(words, df)
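
OCR output rarely matches the catalog text character for character, so matching is typically done with an edit distance. The sketch below shows the idea in plain Python; the project lists the Levenshtein package as a dependency, which offers a much faster distance function. The function `best_catalog_match` and the 0.3 cutoff are assumptions for illustration, not the project's actual logic.

```python
def levenshtein(a, b):
    """Edit distance between two strings (insertions, deletions, substitutions)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def best_catalog_match(sentence, catalog, max_relative_distance=0.3):
    """Return the closest catalog statement, or None if nothing is close enough."""
    text = sentence.strip().lower()
    best, best_score = None, None
    for statement in catalog:
        candidate = statement.strip().lower()
        longest = max(len(text), len(candidate), 1)
        # Normalize by length so long and short statements are comparable.
        score = levenshtein(text, candidate) / longest
        if best_score is None or score < best_score:
            best, best_score = statement, score
    if best_score is not None and best_score <= max_relative_distance:
        return best
    return None
```

Returning None for poor matches keeps unrelated OCR text, such as headers or page numbers, from being forced onto a catalog entry.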

Detect and filter checkbox data

get_checkboxes(...)
filter_checkboxes_for_outliers(...)
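
The arguments of these functions are not shown above. As an illustration of what outlier filtering could mean here, the sketch below assumes each detected checkbox is an `(x, y, width, height)` tuple in pixels and discards candidates whose size deviates too far from the median, since the checkboxes on a printed form usually share one size. The signature and the `tolerance` parameter are hypothetical.

```python
from statistics import median


def filter_checkboxes_for_outliers(boxes, tolerance=0.25):
    """Keep boxes whose width and height are within `tolerance` of the median size.

    `boxes` is a list of (x, y, width, height) tuples in pixels.
    """
    if not boxes:
        return []
    median_w = median(w for _, _, w, _ in boxes)
    median_h = median(h for _, _, _, h in boxes)

    def is_typical(value, reference):
        return abs(value - reference) <= tolerance * reference

    return [
        box for box in boxes
        if is_typical(box[2], median_w) and is_typical(box[3], median_h)
    ]
```

The median is used instead of the mean so that a single large false detection, such as a table border, does not shift the reference size.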

Save results in structured format

export_to_csv(data, file)
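
A possible shape for the export step, using only the standard library `csv` module. The column names and the sample rows are invented for illustration, and `data` is assumed to be a list of dictionaries; the real output schema is not documented.

```python
import csv
import io

FIELDNAMES = ["statement", "answer"]  # hypothetical column names


def write_rows(rows, handle):
    """Write dict rows to an open text handle as CSV with a header line."""
    writer = csv.DictWriter(handle, fieldnames=FIELDNAMES)
    writer.writeheader()
    writer.writerows(rows)


def export_to_csv(data, file):
    """Write the rows in `data` to the CSV file at path `file`."""
    # newline="" lets the csv module control line endings itself.
    with open(file, "w", newline="", encoding="utf-8") as handle:
        write_rows(data, handle)


# Small in-memory demonstration with made-up rows; no file is touched.
sample = [
    {"statement": "Do you smoke?", "answer": "No"},
    {"statement": "Do you have allergies?", "answer": "Yes"},
]
buffer = io.StringIO()
write_rows(sample, buffer)
```

Separating `write_rows` from the file handling makes the formatting logic easy to check without writing to disk.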

⚠️ Known Limitations

OCR Accuracy depends on scan quality and font.
Checkbox Detection may fail with very small or unclear boxes.
Input Formats are currently limited to PDF and JPG.

📄 License

This project was developed as part of a university project at FHNW. The project presentation is attached as a PDF file and provides an overview of the key processes, features, and outcomes.

📄 📥 View the project presentation (Anamnese.pdf)
