Detect diagrams draft #49

lhaibach · 2025-09-18T11:34:46Z

This draft builds on #46 and experiments with extending the diagram detection logic in src/identifiers/diagram.py.

The identify_diagram function uses a voting system based on keywords, units, and axis progression checks. The axis detection includes a new axis_checks function, which checks clusters of entries for both monotonicity and numeric progression.
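
A minimal sketch of how such a vote could be combined, for illustration only. The function name `vote_is_diagram`, the four boolean inputs, and the `min_votes` threshold are assumptions and are not taken from src/identifiers/diagram.py.

```python
def vote_is_diagram(
    has_keyword: bool,
    has_unit: bool,
    axis_monotonic: bool,
    axis_progression: bool,
    min_votes: int = 2,  # hypothetical threshold, not the value used in the PR
) -> bool:
    """Return True when at least `min_votes` of the individual checks agree."""
    # Each check contributes one vote; bools sum as 0 or 1.
    votes = sum([has_keyword, has_unit, axis_monotonic, axis_progression])
    return votes >= min_votes
```

With this threshold, a page with a diagram keyword and a monotonic axis would pass, while a keyword alone would not.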

However, it introduces more code without improving the F1 scores compared to the latest version in the other branch. We probably do not want to merge this? But the arithmetic checks might come in handy as additional features for training the tree-based model?

Introduces an axis_checks helper that evaluates clusters for both monotonicity and numeric progression. It updates identify_diagram to use a voting system across:

- diagram keywords
- units
- y-axis and x-axis monotonicity
- y-axis and x-axis numeric progression (new)
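
The two axis checks listed above could look roughly like the sketch below. It assumes a cluster is a plain sequence of numbers already parsed from the page, and that "numeric progression" means nearly constant steps. The helper names and the tolerance are hypothetical, and the actual axis_checks in the PR may differ.

```python
from typing import Dict, Sequence


def is_monotonic(values: Sequence[float]) -> bool:
    """True if values are strictly increasing or strictly decreasing."""
    if len(values) < 2:
        return False
    pairs = list(zip(values, values[1:]))
    return all(a < b for a, b in pairs) or all(a > b for a, b in pairs)


def is_arithmetic(values: Sequence[float], rel_tol: float = 0.05) -> bool:
    """True if consecutive differences are nearly constant (needs 3+ values)."""
    if len(values) < 3:
        return False
    diffs = [b - a for a, b in zip(values, values[1:])]
    step = diffs[0]
    if step == 0:
        return False
    # Every step must stay within rel_tol of the first step.
    return all(abs(d - step) <= rel_tol * abs(step) for d in diffs)


def axis_checks_sketch(cluster: Sequence[float]) -> Dict[str, bool]:
    """Hypothetical stand-in for axis_checks: run both checks on one cluster."""
    return {
        "monotonic": is_monotonic(cluster),
        "progression": is_arithmetic(cluster),
    }
```

Tick labels such as 0, 10, 20, 30 pass both checks, whereas 1, 2, 4, 8 are monotonic but not an arithmetic progression.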

However, F1 scores are not improved compared to the latest implementation in the other branch, and the code increases complexity. We likely do not want to merge this, but the progression logic might still be valuable as features for tree-based models.
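
If the checks were reused as model features, one option would be to flatten their boolean outcomes into a fixed-order numeric row. This is only a sketch: the feature names and their order are assumptions, and nothing here reflects the existing training code.

```python
from typing import Dict, List

# Hypothetical feature names, one per vote listed above.
FEATURE_ORDER = [
    "has_keyword",
    "has_unit",
    "x_monotonic",
    "y_monotonic",
    "x_progression",
    "y_progression",
]


def to_feature_row(checks: Dict[str, bool]) -> List[int]:
    """Map check results to 0/1 values in a stable column order."""
    # Missing checks default to 0 so every row has the same length.
    return [int(checks.get(name, False)) for name in FEATURE_ORDER]
```

A tree-based model can split on such 0/1 columns directly, so no scaling would be needed.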

Results for the diagram class:

| Branch | F1 Score | Precision | Recall |
| --- | --- | --- | --- |
| detect-diagram | 62.22 | 60.87 | 63.64 |
| detect-diagram-draft | 60.47 | 61.90 | 59.09 |
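
As a sanity check, the F1 values in this table can be recomputed from the reported precision and recall using the harmonic mean. The inputs are already rounded to two decimals, so the result can differ from the table in the last digit.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both on the same scale)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the table (percent).
f1_current = f1_score(60.87, 63.64)
f1_draft = f1_score(61.90, 59.09)

print(f"detect-diagram:       {f1_current:.2f}")
print(f"detect-diagram-draft: {f1_draft:.2f}")
```

The draft trades recall for a slightly higher precision, which lowers its F1 overall.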

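| Metric | text | boreprofile | map | geo_profile | title_page | diagram | table | unknown | Macro Avg |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| F1 Score | 65.00 | 77.77 | 73.07 | 0.00 | 48.78 | 62.22 | 0.00 | 26.47 | 44.16 |
| F1 Score draft | 65.00 | 77.77 | 73.07 | 0.00 | 48.78 | 60.47 | 0.00 | 25.71 | 43.85 |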
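
The Macro Avg column is consistent with an unweighted mean over the eight per-class F1 scores. The sketch below recomputes it from the table values, assuming that is how the column was derived.

```python
from statistics import mean

CLASSES = ["text", "boreprofile", "map", "geo_profile",
           "title_page", "diagram", "table", "unknown"]

# Per-class F1 scores copied from the table (percent).
f1_by_class = dict(zip(CLASSES, [65.00, 77.77, 73.07, 0.00, 48.78, 62.22, 0.00, 26.47]))
f1_by_class_draft = dict(zip(CLASSES, [65.00, 77.77, 73.07, 0.00, 48.78, 60.47, 0.00, 25.71]))

macro_current = mean(f1_by_class.values())
macro_draft = mean(f1_by_class_draft.values())

# Classes whose score differs between the two rows.
changed = [c for c in CLASSES if f1_by_class[c] != f1_by_class_draft[c]]

print(f"{macro_current:.2f} vs {macro_draft:.2f}, changed: {changed}")
```

Only the diagram and unknown classes differ between the two rows, so they account for the whole macro-average gap.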

# Conflicts: # src/identifiers/diagram.py

Copilot

Pull Request Overview

This PR experiments with extending diagram detection logic by implementing a voting system that incorporates new axis progression checks alongside existing keyword and unit detection. The changes aim to improve diagram identification by analyzing both monotonicity and numeric progression patterns in clustered data points.

Key changes:

- Introduces a voting system combining keyword detection, unit detection, and axis analysis
- Adds new axis_checks function to evaluate monotonicity and numeric progression in data clusters
- Implements arithmetic and logarithmic progression detection for axis validation
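
For the logarithmic case, one way to test a cluster is to check whether the base-10 logarithms of its values are evenly spaced, as the tick labels of a log axis would be. This is a sketch under that assumption; the function name and tolerance are hypothetical and do not come from the PR.

```python
import math
from typing import Sequence


def is_log_progression(values: Sequence[float], rel_tol: float = 0.05) -> bool:
    """True if log10 of the values is evenly spaced (constant ratio between ticks)."""
    # Logarithms are only defined for positive numbers.
    if len(values) < 3 or any(v <= 0 for v in values):
        return False
    logs = [math.log10(v) for v in values]
    steps = [b - a for a, b in zip(logs, logs[1:])]
    first = steps[0]
    if first == 0:
        return False
    return all(abs(s - first) <= rel_tol * abs(first) for s in steps)
```

Ticks such as 1, 10, 100, 1000 satisfy this check, while a linear axis such as 1, 2, 3, 4 does not.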

src/identifiers/diagram.py

lhaibach added 4 commits September 18, 2025 11:51

detect arithmetic or log progression

d42b5ec

Merge branch 'develop' into detect-diagrams-draft

92c313b

Merge branch 'refs/heads/detect-diagrams' into detect-diagrams-draft

9112e91

# Conflicts: # src/identifiers/diagram.py

add voter

c4efe86

lhaibach marked this pull request as draft September 18, 2025 11:35

lhaibach requested a review from Copilot September 18, 2025 11:35

Copilot AI reviewed Sep 18, 2025

lhaibach added 2 commits September 23, 2025 14:07

Merge branch 'detect-diagrams' into detect-diagrams-draft

0b57c5a

copilot comments

d10c943

lhaibach requested review from TicaGit and letao September 23, 2025 12:32

Base automatically changed from detect-tables to develop September 24, 2025 11:41
