fix dimensions of image generated by parser.py #242

montvid · 2025-10-28T08:52:54Z

After running image OCR with --no_fitz_preprocess, the generated JSONL file reports image dimensions that differ from those of the original image, because smart_resize is used in the parser.py script. According to https://arxiv.org/abs/2307.06304, NaViT can ingest images of any dimensions, so smart_resize should not be needed.

delete smart_resize

ygfrancois · 2025-10-31T10:26:22Z

The smart resize here is meant to show the real input size seen by the model: the model server performs the smart resize online to keep the input size divisible by the vision patch size.
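To make the divisibility point concrete, here is a minimal sketch. It is not the repository's smart_resize: the function names, the factor of 28, and the (x1, y1, x2, y2) box layout are all assumptions chosen for illustration. It rounds each side to a multiple of the factor, then maps a box from the resized image back to the original dimensions.

```python
from typing import Tuple

Size = Tuple[int, int]                    # (height, width) in pixels
Box = Tuple[float, float, float, float]   # (x1, y1, x2, y2), assumed layout


def round_to_multiple(height: int, width: int, factor: int = 28) -> Size:
    """Round each side to the nearest multiple of `factor`.

    `factor` stands in for the vision patch size (or a multiple of it);
    28 is an illustrative value, not taken from the project.
    """
    if height <= 0 or width <= 0 or factor <= 0:
        raise ValueError("height, width and factor must be positive")
    # Never go below one full patch on either side.
    new_h = max(factor, round(height / factor) * factor)
    new_w = max(factor, round(width / factor) * factor)
    return new_h, new_w


def scale_box_to_original(box: Box, resized: Size, original: Size) -> Box:
    """Map a box given in resized-image pixels back to original-image pixels."""
    res_h, res_w = resized
    orig_h, orig_w = original
    x1, y1, x2, y2 = box
    return (
        x1 * orig_w / res_w,
        y1 * orig_h / res_h,
        x2 * orig_w / res_w,
        y2 * orig_h / res_h,
    )


original = (1000, 750)
resized = round_to_multiple(*original)
# A box covering the whole resized image maps back to the whole original.
full_box = scale_box_to_original((0, 0, resized[1], resized[0]), resized, original)
```

Under these assumptions, the size written to the output is the size the model actually saw, and any coordinates predicted at that size would need a rescale like the one above before being compared with the original image.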

montvid · 2025-10-31T12:22:06Z

Thanks for the clarification! Now I understand: NaViT-style encoders accept arbitrary aspect ratios and resolutions, but the deployed implementation still enforces patch-size divisibility so that it can form valid patch tokens efficiently.

montvid added 2 commits October 28, 2025 10:52

Update parser.py

f3385fc

delete smart_resize

ygfrancois closed this Oct 31, 2025
