Method: Dao Xianghu light of TianQuan - Task 2 - End-to-end Seal Title Recognition - ICDAR 2023 Competition on Reading the Seal Title

Method: Dao Xianghu light of TianQuan

Date: 2023-03-19

Authors: Kai Yang, Ye Wang, Bin Wang, Wentao Liu, Xiaolu Ding, Jun Zhu, Ming Chen, Peng Yao, Zhixin Qiu

Affiliation: CCB Financial Technology Co. Ltd, China

Description: The task is to recognize seal titles. 5,000 training images are provided. After exploring and analyzing the data, we find that the main difficulties are multi-directional text, overlapping interference from handwritten or printed characters, blurred images, and multiple reading orders.
Based on this analysis, we build the following solution. First, we segment the seal title and mask out the non-title area, which removes interference from irrelevant regions. Then, we train a TrOCR model on over 6 million samples drawn from the training set, open datasets, and synthetic data. Finally, in post-processing, we correct place names.
For seal title segmentation, we adopt an ensemble of five segmentation models that vote on the title mask, which lays a good foundation for recognition.
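The voting step can be illustrated with a minimal sketch. It assumes each model outputs a binary mask of the same size and that the votes are combined by a per-pixel majority; the description does not state the exact voting rule, so `majority_vote` and `apply_mask` are hypothetical helpers, written in plain Python lists only to stay self-contained.

```python
from typing import List

Mask = List[List[int]]  # binary mask: 1 = title pixel, 0 = background


def majority_vote(masks: List[Mask]) -> Mask:
    """Combine equally sized binary masks by per-pixel majority vote (assumed rule)."""
    if not masks:
        raise ValueError("at least one mask is required")
    height, width = len(masks[0]), len(masks[0][0])
    for m in masks:
        if len(m) != height or any(len(row) != width for row in m):
            raise ValueError("all masks must share the same shape")
    threshold = len(masks) / 2  # strictly more than half of the models must agree
    return [
        [1 if sum(m[y][x] for m in masks) > threshold else 0 for x in range(width)]
        for y in range(height)
    ]


def apply_mask(image: List[List[int]], mask: Mask, fill: int = 255) -> List[List[int]]:
    """Keep pixels inside the mask and paint every other pixel with `fill`."""
    return [
        [pixel if keep else fill for pixel, keep in zip(img_row, mask_row)]
        for img_row, mask_row in zip(image, mask)
    ]
```

With five models, a pixel is kept when at least three masks mark it as title. The masked image would then be passed on to the recognizer.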
The training set has only 5,000 images, which is far from enough for the recognition task. We therefore use the official chars.txt dictionary, collect a corpus of company and organization names from the Internet, and programmatically generate a large number of seal images. To simulate real conditions, we synthesize the images with various fonts, colors, backgrounds, and textures, and we apply a variety of data augmentation strategies to improve generalization, including rotation, Gaussian blur, stretching, perspective transformation, and contour expansion or contraction. In addition, we use 10k seals from a public Baidu dataset.
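One building block of such a generator is placing the title characters along the rim of a circular seal. The sketch below only computes character positions and tilt angles; it is an illustration under assumed conventions, not the generator used in this submission. The function name `arc_layout` and the default arc from 210 to -30 degrees are hypothetical.

```python
import math
from typing import List, Tuple

Placement = Tuple[str, float, float, float]  # (character, x, y, rotation in degrees)


def arc_layout(text: str, center: Tuple[float, float], radius: float,
               start_deg: float = 210.0, end_deg: float = -30.0) -> List[Placement]:
    """Place each character of `text` evenly along a circular arc.

    Angles follow the mathematical convention (counterclockwise from the +x axis),
    while the y axis points down, as in image coordinates.
    """
    if not text:
        return []
    cx, cy = center
    n = len(text)
    placements = []
    for i, ch in enumerate(text):
        # A single character sits at the middle of the arc.
        t = 0.5 if n == 1 else i / (n - 1)
        angle = start_deg + t * (end_deg - start_deg)
        x = cx + radius * math.cos(math.radians(angle))
        y = cy - radius * math.sin(math.radians(angle))  # flip y for image coordinates
        rotation = 90.0 - angle  # clockwise tilt; 0 means upright at the top of the seal
        placements.append((ch, x, y, rotation))
    return placements
```

A rendering library would then draw each glyph at `(x, y)` with the given tilt, after which fonts, colors, textures, and the augmentations listed above could be applied.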
In the early stage of the competition, we train on the public and synthesized datasets and use the original competition training set as the test set. We keep synthesizing new kinds of data to improve accuracy on this test set.
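This held-out evaluation can be sketched as a simple scoring function. The sketch assumes a full-string exact-match criterion, which may differ from the official competition metric; `exact_match_accuracy` is a hypothetical helper.

```python
from typing import Sequence


def exact_match_accuracy(predictions: Sequence[str], references: Sequence[str]) -> float:
    """Fraction of predicted titles that equal the reference title exactly."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    correct = sum(p == r for p, r in zip(predictions, references))
    return correct / len(references)
```

Re-running such a check after each new batch of synthetic data shows whether the added data actually helps on real seals.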
To further improve accuracy, we design a classifier to separate circular seals (circle/ellipse shapes) from non-circular seals (rectangle/triangle shapes), and we generate nearly 400k non-circular seals. We then compare the single-recognition-model solution with the classify-then-recognize solution that uses multiple models, and verify that the former is better.
When analyzing bad cases, we find that smudging and character overlap often lead to recognition errors, so we design a post-processing strategy based on place names to correct some of these errors.
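A minimal sketch of one possible place-name correction follows. The actual rules are not described here, so the sketch assumes that the title begins with a place name, that a gazetteer of valid place names is available, and that a prefix differing from exactly one gazetteer entry by a single character should be replaced by that entry. The function `correct_place_prefix` and the sample names are hypothetical.

```python
from typing import Iterable


def correct_place_prefix(title: str, place_names: Iterable[str], max_mismatch: int = 1) -> str:
    """Replace a slightly misrecognized place-name prefix with the gazetteer entry."""
    names = set(place_names)
    # A title that already starts with a known place name is left untouched.
    if any(title.startswith(name) for name in names):
        return title
    candidates = []
    for name in names:
        prefix = title[:len(name)]
        if len(prefix) != len(name):
            continue  # title is shorter than this place name
        mismatches = sum(a != b for a, b in zip(prefix, name))
        # Require most of the name to match, so very short names are not over-corrected.
        if mismatches <= max_mismatch and len(name) > 2 * mismatches:
            candidates.append(name)
    if len(candidates) != 1:
        return title  # no candidate or an ambiguous one: keep the model output
    name = candidates[0]
    return name + title[len(name):]
```

For example, with a gazetteer containing "杭州市", the prediction "抗州市某某科技有限公司" would be corrected to "杭州市某某科技有限公司", while a title whose prefix is not close to any entry is returned unchanged.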

Minghao Li, Tengchao Lv, Lei Cui, Yijuan Lu, Dinei Florencio, Cha Zhang, Zhoujun Li, and Furu Wei. TrOCR: Transformer-based optical character recognition with pre-trained models. arXiv preprint arXiv:2109.10282, 2021.

Ze Liu, Yutong Lin, Yue Cao, Han Hu, Yixuan Wei, Zheng Zhang, Stephen Lin, and Baining Guo. Swin Transformer: Hierarchical vision transformer using shifted windows. arXiv preprint arXiv:2103.14030, 2021.

Source code

Source code 2