Skip to content

reepc/Handwriting-text-recognition-with-ChatGPT-Correction

Repository files navigation

Introduction

This repo use TrOCR model, which is from meta. You can see more details there.

The online version in NTUST NLPLab's website.

The version here is which you can run the model locally (in your PC). If you want to run in your PC, you need a GPU which CUDA supports (Nvidia's GPU). Or you can run in kaggle or colab, they both have some free compute resource.

You can train an adapter according to your own font to increase model's accuracy.

The code here are ALL MODIFIED, deleted the code which are useless.

So if you want to use other model, you need to modify the code.

The online version has ChatGPT to correct the thing that you write wrong and give some feedback but this repo won't offer any API Key. If you want to use ChatGPT automatically when running, you need to use your own API key.

Or you can give ChatGPT result generated by model manually with prompts given here.

Prerequisites

Anaconda (Not necessary)

You can find the installing steps here.

Create anaconda environment (Same as Install Python)

Open your anaconda prompt (Windows) or shell (Linux) and type:

$ conda create -n the_name_you_want python=3.9.13

Install Python (without anaconda)

If you have python version below, you can skip this step.

  • 3.8
  • 3.9
  • 3.10
  • 3.11

Please note that the version you installed must can install pytorch. If you still have errors because of python version, please install python 3.9.13, which is the author's python version when doing this repo.

To install python, please go to official website to install according to your system.

Installation

$ git clone https://github.com/reepc/Handwriting-text-recognition-with-ChatGPT-Correction.git
$ cd Handwriting-text-recognition-with-ChatGPT-Correction
$ pip install -r requirements.txt

Download and process Model

$ sh process_model.sh

RUN

$ python generate.py \
      --image-path /path/to/your/image \
      --prompt /path/to/prompt/or/enter/here \
      --output-path /path/to/store/your/output

--image-path is the only required argument in above arguments.

Adapter training and evalution

TODO

Adapter training

$ python3 main.py --adapter-path 

Reference

@misc{li2021trocr,
      title={TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models}, 
      author={Minghao Li and Tengchao Lv and Lei Cui and Yijuan Lu and Dinei Florencio and Cha Zhang and Zhoujun Li and Furu Wei},
      year={2021},
      eprint={2109.10282},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

TrOCR paper Official repo on github

One of the type of adapter: E2TIMT: Efficient and Effective Modal Adapter for Text Image Machine Translation

Contact

If you have any problem, question or want to share your using experience, feel free to contact with guwanjun0530@outlook.com

TODOs

Change the decoder to a speech decoder and train it to make it generate speech instead of text.

Complete adapter.

Improve line segmentation and ruling line removal.

Make a Chinese version.

About

No description or website provided.

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published