Method: Recognition fine-tuned from TrOCR (2024-05-04)

Authors: Pengyu Chen; Xuezi Bi; Quanzhi Xiang; Junxian Li

Affiliation: University of South Carolina; Sun Yat-sen University; University of Science and Technology of China; Beihang University

Description: Our results are produced by a TrOCR model that we fine-tuned on the Rumsey dataset. First, we use our Task 1 results as input. To better capture the positional relationship of map text for the recognition task, we add a masking preprocessing step to the dataset. We then fine-tune TrOCR with the AdamW optimizer at a learning rate of 5e-5, using trocr-small-stage1 as the base model. Our future plans for the Detection and Recognition task are as follows:
1. Due to time and GPU constraints, we did not train TrOCR for more epochs. Even so, fine-tuning brought a noticeable improvement, which suggests that TrOCR is a relatively flexible model that is easy to adapt. We believe it would perform better with more training iterations.
2. Our text detection results still have room for improvement, and they directly affect our recognition results. We expect that a more robust detection model would improve recognition performance.
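The description does not say exactly what the masking preprocessing step does. One plausible reading, offered here only as an assumption, is that for each text region detected in Task 1, pixels outside the detected polygon are blanked out and the image is cropped to the polygon's bounding box, so the recognizer sees only the target word and not neighbouring map text. The sketch below implements that idea with NumPy and an even-odd (ray-casting) point-in-polygon test; the function names `polygon_mask` and `masked_crop` and the white fill value are hypothetical, not part of the authors' code.

```python
import numpy as np


def polygon_mask(height, width, polygon):
    """Boolean (height x width) mask that is True for pixel centers inside
    `polygon`, a list of (x, y) vertices, using the even-odd ray-casting rule.
    Hypothetical helper: the original masking step is not specified."""
    ys, xs = np.mgrid[0:height, 0:width]
    px = xs + 0.5  # sample each pixel at its center
    py = ys + 0.5
    inside = np.zeros((height, width), dtype=bool)
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through each pixel center?
        crosses = (y1 > py) != (y2 > py)
        # x coordinate where the edge meets that line; horizontal edges give
        # inf/nan here, but `crosses` is False for them, so they are ignored.
        with np.errstate(divide="ignore", invalid="ignore"):
            x_at = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
        # Each edge crossed by a ray cast to the right flips inside/outside.
        inside ^= crosses & (px < x_at)
    return inside


def masked_crop(image, polygon, fill=255):
    """Blank out pixels outside `polygon` (assumed white background = 255),
    then crop to the polygon's axis-aligned bounding box."""
    h, w = image.shape[:2]
    mask = polygon_mask(h, w, polygon)
    out = image.copy()
    out[~mask] = fill  # works for grayscale (H, W) and color (H, W, C) images
    xs = [p[0] for p in polygon]
    ys = [p[1] for p in polygon]
    x0, x1 = max(int(np.floor(min(xs))), 0), min(int(np.ceil(max(xs))), w)
    y0, y1 = max(int(np.floor(min(ys))), 0), min(int(np.ceil(max(ys))), h)
    return out[y0:y1, x0:x1]


if __name__ == "__main__":
    img = np.zeros((10, 10), dtype=np.uint8)
    # A right triangle: the far corner lies outside and is blanked to white.
    crop = masked_crop(img, [(0, 0), (10, 0), (0, 10)])
    print(crop.shape, crop[0, 0], crop[9, 9])
```

Under this assumption, the masked crops would then serve as the training images for the TrOCR fine-tuning run described above (AdamW, learning rate 5e-5, trocr-small-stage1); that training loop is not sketched here. Masking rather than simply cropping keeps the bounding-box shape the model expects while hiding overlapping or adjacent labels, which are common on historical maps.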