Detect diagrams draft #49
base: develop
Conflicts: src/identifiers/diagram.py
Pull Request Overview
This PR experiments with extending diagram detection logic by implementing a voting system that incorporates new axis progression checks alongside existing keyword and unit detection. The changes aim to improve diagram identification by analyzing both monotonicity and numeric progression patterns in clustered data points.
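The monotonicity half of the check can be pictured as follows. This is a minimal sketch, not the PR's axis_checks code (which is not shown here). It assumes each cluster entry is a hypothetical (position, value) pair, where position is the coordinate along the axis and value is the number parsed from the label, and it assumes at least three labels are needed before a cluster counts as an axis.

```python
from typing import List, Tuple

# Hypothetical entry layout: (position along the axis, parsed numeric value).
Entry = Tuple[float, float]


def is_monotonic(cluster: List[Entry], min_len: int = 3) -> bool:
    """Return True if the values strictly increase or strictly decrease
    when the entries are ordered by their position along the axis.

    min_len is an assumed threshold, not taken from the PR.
    """
    if len(cluster) < min_len:
        return False  # too few labels to judge an axis
    # Sorting the tuples orders entries by position first.
    values = [value for _, value in sorted(cluster)]
    diffs = [b - a for a, b in zip(values, values[1:])]
    return all(d > 0 for d in diffs) or all(d < 0 for d in diffs)
```

A descending order is accepted as well, since a vertical axis read top to bottom in page coordinates typically runs from large to small values.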
Key changes:
- Introduces a voting system combining keyword detection, unit detection, and axis analysis
- Adds new axis_checks function to evaluate monotonicity and numeric progression in data clusters
- Implements arithmetic and logarithmic progression detection for axis validation
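Arithmetic and logarithmic progression detection can be sketched roughly as below. The function names and the 5% relative tolerance are assumptions for illustration (parsed label values are rarely exact), not the PR's implementation. A logarithmic axis is treated as an arithmetic progression of the log-transformed values, so the log base does not matter.

```python
import math
from typing import Sequence


def is_arithmetic(values: Sequence[float], rel_tol: float = 0.05) -> bool:
    """True if consecutive differences are approximately constant (e.g. 0, 5, 10)."""
    if len(values) < 3:
        return False
    diffs = [b - a for a, b in zip(values, values[1:])]
    step = diffs[0]
    if step == 0:
        return False  # repeated labels are not a progression
    return all(math.isclose(d, step, rel_tol=rel_tol) for d in diffs)


def is_logarithmic(values: Sequence[float], rel_tol: float = 0.05) -> bool:
    """True if consecutive ratios are approximately constant (e.g. 1, 10, 100)."""
    if len(values) < 3 or any(v <= 0 for v in values):
        return False  # a log scale only carries positive values
    return is_arithmetic([math.log10(v) for v in values], rel_tol=rel_tol)


def has_progression(values: Sequence[float]) -> bool:
    """Hypothetical combined check: either progression type is accepted."""
    return is_arithmetic(values) or is_logarithmic(values)
```

For example, has_progression([1, 2, 4, 8]) is accepted through the logarithmic branch, while [1, 3, 4, 10] fails both checks.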
This draft builds on #46 and experiments with extending the diagram detection logic in src/identifiers/diagram.py.
The identify_diagram function uses a voting system based on keywords, units, and axis progression checks. Axis detection relies on a new axis_checks function, which checks clusters of entries for both monotonicity and numeric progression. However, this adds more code without improving the F1 scores compared to the latest version in the other branch. We probably do not want to merge this. Could the arithmetic checks still come in handy as additional features for training the tree-based model?
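The voting idea can be illustrated with a small sketch. It assumes each detector has already been reduced to a boolean and that a simple majority (two of three votes) decides; both the Signals container and the min_votes threshold are hypothetical, since the actual weighting in identify_diagram is not shown here.

```python
from dataclasses import dataclass


@dataclass
class Signals:
    """Hypothetical per-page detector outputs."""
    has_keywords: bool  # a diagram-related keyword was found
    has_units: bool     # a unit string was found near the numbers
    has_axis: bool      # a cluster passed the monotonicity and progression checks


def vote(signals: Signals, min_votes: int = 2) -> bool:
    """Classify as a diagram when at least min_votes detectors agree."""
    votes = sum([signals.has_keywords, signals.has_units, signals.has_axis])
    return votes >= min_votes
```

A majority vote keeps a single noisy detector (for instance a stray unit string in running text) from triggering a positive on its own; lowering min_votes trades precision for recall.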
Introduces an axis_checks helper that evaluates clusters for both monotonicity and numeric progression. It updates identify_diagram to use a voting system across keyword detection, unit detection, and axis progression checks.
However, the F1 scores do not improve over the latest implementation in the other branch, and the code adds complexity. We likely do not want to merge this, although the progression logic might still be valuable as features for tree-based models.
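If the progression logic is reused as model input, one option is to expose continuous scores instead of hard pass/fail decisions, so the tree can learn its own thresholds. The sketch below is hypothetical: the feature names, the relative-spread measure (standard deviation of consecutive steps divided by their mean magnitude, 0.0 for a perfect progression) and the -1.0 sentinel for "not computable" are all assumptions, chosen because some tree implementations reject infinite or missing values.

```python
import math
import statistics
from typing import Dict, Sequence

MISSING = -1.0  # sentinel; the spread values below are never negative


def _relative_spread(xs: Sequence[float]) -> float:
    """Std of consecutive steps over their mean magnitude (0.0 = perfect progression)."""
    steps = [b - a for a, b in zip(xs, xs[1:])]
    mean = statistics.fmean(steps)
    if mean == 0:
        return MISSING
    return statistics.pstdev(steps) / abs(mean)


def progression_features(values: Sequence[float]) -> Dict[str, float]:
    """Hypothetical feature vector for one cluster of axis-label values."""
    n = len(values)
    features = {"n_values": float(n), "monotonic": 0.0,
                "arith_spread": MISSING, "log_spread": MISSING}
    if n < 3:
        return features
    steps = [b - a for a, b in zip(values, values[1:])]
    features["monotonic"] = float(all(s > 0 for s in steps) or all(s < 0 for s in steps))
    features["arith_spread"] = _relative_spread(values)
    if all(v > 0 for v in values):  # log spread only defined for positive values
        features["log_spread"] = _relative_spread([math.log10(v) for v in values])
    return features
```

For a linear axis such as [0, 5, 10, 15], arith_spread is 0.0 and log_spread stays at the sentinel because of the zero label; for [1, 10, 100, 1000], log_spread is close to 0.0 while arith_spread is large.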
For diagram