This repo use TrOCR model, which is from meta. You can see more details there.
The online version in NTUST NLPLab's website.
The version here is which you can run the model locally (in your PC). If you want to run in your PC, you need a GPU which CUDA supports (Nvidia's GPU). Or you can run in kaggle or colab, they both have some free compute resource.
You can train an adapter according to your own font to increase model's accuracy.
The code here are ALL MODIFIED, deleted the code which are useless.
So if you want to use other model, you need to modify the code.
The online version has ChatGPT to correct the thing that you write wrong and give some feedback but this repo won't offer any API Key. If you want to use ChatGPT automatically when running, you need to use your own API key.
Or you can give ChatGPT result generated by model manually with prompts given here.
You can find the installing steps here.
Open your anaconda prompt (Windows) or shell (Linux) and type:
$ conda create -n the_name_you_want python=3.9.13
If you have python version below, you can skip this step.
- 3.8
- 3.9
- 3.10
- 3.11
Please note that the version you installed must can install pytorch. If you still have errors because of python version, please install python 3.9.13, which is the author's python version when doing this repo.
To install python, please go to official website to install according to your system.
$ git clone https://github.com/reepc/Handwriting-text-recognition-with-ChatGPT-Correction.git
$ cd Handwriting-text-recognition-with-ChatGPT-Correction
$ pip install -r requirements.txt
$ sh process_model.sh
$ python generate.py \
--image-path /path/to/your/image \
--prompt /path/to/prompt/or/enter/here \
--output-path /path/to/store/your/output
--image-path is the only required argument in above arguments.
TODO
$ python3 main.py --adapter-path
@misc{li2021trocr,
title={TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models},
author={Minghao Li and Tengchao Lv and Lei Cui and Yijuan Lu and Dinei Florencio and Cha Zhang and Zhoujun Li and Furu Wei},
year={2021},
eprint={2109.10282},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
TrOCR paper Official repo on github
One of the type of adapter: E2TIMT: Efficient and Effective Modal Adapter for Text Image Machine Translation
If you have any problem, question or want to share your using experience, feel free to contact with guwanjun0530@outlook.com
Change the decoder to a speech decoder and train it to make it generate speech instead of text.
Complete adapter.
Improve line segmentation and ruling line removal.
Make a Chinese version.