Results - ICDAR 2023 Competition on Reading the Seal Title

method: SPDB LAB (2023-03-20)

Authors: Jie Li, Wei Wang, Yuqi Zhang, Ruixue Zhang, Yiru Zhao, Danya Zhou, Di Wang, Dong Xiang, Hui Wang, Min Xu, Pengyu Chen, Bin Zhang, Chao Li, Shiyu Hu, Songtao Li, Yunxin Yang

Affiliation: Shanghai Pudong Development Bank

Email: zhangyq26@outlook.com, wangdee0805@139.com, lij131@spdb.com.cn

Description: Circle and ellipse seals: Starting from the title detection results for circle and ellipse seals in Task 1, PCA was used to correct the rotation of the seal, image processing was used to separate the seal title, and the curved text was then passed to the recognition model. TrOCR was chosen as the recognition model, and the training data comprises the provided training data and synthetic data.
Rectangle and triangle seals: These seals did not use the Task 1 detection model; a separate text line detection model was trained instead. Image processing was again used to separate the seal title. TrOCR was chosen as the recognition model, and the training data includes the provided data and synthetic data.
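
The description does not spell out how PCA was applied to correct rotation. As a minimal sketch, assuming the foreground pixels of the seal (or of its title region) are available as (x, y) coordinates, the dominant orientation can be read from the 2x2 covariance matrix and undone by rotating about the centroid. The names `principal_angle` and `rotate_points` are hypothetical and are not taken from the SPDB LAB code.

```python
import math

def principal_angle(points):
    """Angle in degrees, within (-90, 90], of the first principal axis of 2-D points."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / n
    syy = sum((y - my) ** 2 for _, y in points) / n
    sxy = sum((x - mx) * (y - my) for x, y in points) / n
    # Closed-form direction of the leading eigenvector of [[sxx, sxy], [sxy, syy]].
    return math.degrees(0.5 * math.atan2(2.0 * sxy, sxx - syy))

def rotate_points(points, degrees):
    """Rotate points about their centroid by the given angle (counter-clockwise in x/y axes)."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    c, s = math.cos(math.radians(degrees)), math.sin(math.radians(degrees))
    return [(mx + c * (x - mx) - s * (y - my),
             my + s * (x - mx) + c * (y - my)) for x, y in points]

# Illustrative input: points spread along a line tilted by 30 degrees.
tilt = math.radians(30)
pts = [(t * math.cos(tilt), t * math.sin(tilt)) for t in range(-10, 11)]
angle = principal_angle(pts)
upright = rotate_points(pts, -angle)  # principal axis is now horizontal
```

In image coordinates the y axis points downward, so the sign of the angle flips relative to the usual mathematical convention. A real pipeline would also need a rule to resolve the 180-degree ambiguity that PCA leaves open.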

Efficient and Accurate Arbitrary-Shaped Text Detection with Pixel Aggregation Network

TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models

Source code

Source code 2

method: Dao Xianghu light of TianQuan (2023-03-19)

Authors: Kai Yang, Ye Wang, Bin Wang, Wentao Liu, Xiaolu Ding, Jun Zhu, Ming Chen, Peng Yao, Zhixin Qiu

Affiliation: CCB Financial Technology Co. Ltd, China

Description: The task is to recognize seal titles, with 5,000 training images provided. After exploring and analyzing the data, we found that the main difficulties are multi-directional text, overlapping interference from handwritten or printed characters, blurred images, and multiple reading orders.
Based on this analysis, we built the following solution. First, a seal title segmentation step masks out the non-title area and removes interference from irrelevant regions. Then, we train a TrOCR model on over 6 million samples drawn from the training set, open datasets, and synthetic data. Finally, a post-processing step corrects place names.
For seal title segmentation, we adopt an ensemble strategy in which five segmentation models vote on the title segmentation, laying a good foundation for recognition.
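
The voting rule used by the five segmentation models is not stated. A minimal sketch, assuming each model outputs a binary mask of the same size and that a pixel is kept when a strict majority of models mark it as title, might look like this. The function name `majority_vote` and the toy masks are illustrative only.

```python
def majority_vote(masks):
    """Pixel-wise strict-majority vote over equally sized binary masks.

    Each mask is a list of rows, and each row is a list of 0/1 values.
    """
    n = len(masks)
    height, width = len(masks[0]), len(masks[0][0])
    return [
        [1 if 2 * sum(m[r][c] for m in masks) > n else 0 for c in range(width)]
        for r in range(height)
    ]

# Five toy 1x3 masks standing in for the outputs of five segmentation models.
masks = [
    [[1, 1, 0]],
    [[1, 0, 0]],
    [[1, 1, 0]],
    [[0, 1, 1]],
    [[1, 0, 0]],
]
fused = majority_vote(masks)
```

With an odd number of models a strict majority never ties, which is one practical reason to ensemble five models rather than four.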
Since the training set has only 5,000 images, it is far from sufficient for the recognition task. We use the official chars.txt dictionary, collect a corpus of company and organization names from the Internet, and generate a large number of seals programmatically. To simulate real conditions, we synthesize the images with various fonts, colors, backgrounds, and textures, and we apply several data augmentation strategies to improve generalization, including rotation, Gaussian blur, stretching, perspective transformation, and contour expansion or contraction. In addition, we use 10k seals from a Baidu public dataset.
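
The description does not say how the synthetic seals were rendered. One ingredient a generator for circular seals plausibly needs is the placement of title characters along the rim. The sketch below computes a position and rotation for each character on an arc. The function name `arc_layout`, the default angular span of -120 to 120 degrees, and the sample text are all assumptions made for illustration.

```python
import math

def arc_layout(text, center, radius, start_deg=-120.0, end_deg=120.0):
    """Return (char, x, y, rotation_deg) for each character placed on an arc.

    Angles are measured clockwise from the 12 o'clock direction, in image
    coordinates where y grows downward.
    """
    cx, cy = center
    n = len(text)
    layout = []
    for i, ch in enumerate(text):
        # Spread characters evenly; a single character sits at the arc midpoint.
        frac = 0.5 if n == 1 else i / (n - 1)
        theta = start_deg + frac * (end_deg - start_deg)
        rad = math.radians(theta)
        x = cx + radius * math.sin(rad)
        y = cy - radius * math.cos(rad)
        layout.append((ch, x, y, theta))
    return layout

# Place five placeholder characters on a seal of radius 80 centred at (100, 100).
layout = arc_layout("ABCDE", (100.0, 100.0), 80.0)
```

Each tuple can then be handed to a drawing library that renders the glyph rotated by `rotation_deg` at (x, y), before the augmentations listed above are applied.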
In the early stage of the competition, we used the public dataset and the synthesized dataset as the training set, and the original competition training set as the test set. We kept synthesizing new kinds of data to improve accuracy on this test set.
To further improve accuracy, we designed a classifier to separate circular seals (Circle/Ellipse shapes) from non-circular seals (Rectangle/Triangle shapes) and generated nearly 400k non-circular seals. We then compared the single recognition model solution with the classify-then-recognize solution that uses multiple models, and verified that the former (the single recognition model) is better.
When analyzing bad cases, we found that smudging and character overlap often lead to recognition errors, so we designed a post-processing strategy based on place names to correct some of these errors.
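
The place-name correction is described only at a high level. As a rough sketch, assuming a gazetteer of valid place names is available and that the place name appears at the start of the seal title, a misread prefix could be snapped to its closest gazetteer entry using the standard-library `difflib`. The function name `correct_place_prefix`, the 0.6 similarity cutoff, and the romanized sample strings (ASCII stand-ins for Chinese titles) are assumptions, not details from the submission.

```python
import difflib

def correct_place_prefix(title, place_names, cutoff=0.6):
    """Replace a misread leading place name with its closest gazetteer entry."""
    # Nothing to do when the title already starts with a known place name.
    if any(title.startswith(name) for name in place_names):
        return title
    best_name, best_score = None, cutoff
    for name in place_names:
        prefix = title[: len(name)]
        score = difflib.SequenceMatcher(None, prefix, name).ratio()
        if score >= best_score:
            best_name, best_score = name, score
    if best_name is None:
        return title  # no candidate is similar enough, so leave the title alone
    return best_name + title[len(best_name):]

# Hypothetical gazetteer and a recognition result with one wrong character.
gazetteer = ["Shanghai", "Beijing", "Hangzhou"]
fixed = correct_place_prefix("Shanqhai Example Trading Co.", gazetteer)
```

The cutoff trades recall for safety: a lower value corrects more smudged prefixes but risks rewriting titles that never contained a place name.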

Minghao Li, Tengchao Lv, Lei Cui, Yijuan Lu, Dinei Florencio, Cha Zhang, Zhoujun Li, and Furu Wei. TrOCR: Transformer-based optical character recognition with pre-trained models. arXiv preprint arXiv:2109.10282, 2021.

Ze Liu, Yutong Lin, Yue Cao, Han Hu, Yixuan Wei, Zheng Zhang, Stephen Lin, and Baining Guo. Swin transformer: Hierarchical vision transformer using shifted windows. arXiv preprint arXiv:2103.14030, 2021.

Source code

Source code 2

method: task2 result (2023-03-10)

Authors: DH

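Description: An end-to-end seal recognition method; the recognition results are in the JSON file. The method takes a seal image as input and outputs the seal title. The training set combines the training data provided by the competition, generated data, and several public datasets. Reference paper: Scene Text Recognition with Permuted Autoregressive Sequence Models. Reference code: https://github.com/baudm/parseq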

https://arxiv.org/abs/2207.06966

Source code

Ranking Table


Date	Method	Accuracy
2023-03-20	SPDB LAB	91.88%
2023-03-19	Dao Xianghu light of TianQuan	91.22%
2023-03-10	task2 result	84.22%
2023-03-07	recog_test	78.58%
2023-03-21	ParSeq with SwinV2	76.20%
2023-03-21	ParSeq with SwinV2	76.00%
2023-03-21	ParSeq with SwinV2	75.86%
2023-03-16	Transformer Seal Text Recognition Networks with Synthetic data and Various Data Enhancements	75.70%
2023-03-21	ParSeq with SwinV2	75.68%
2023-03-13	Transformer Seal Text Recognition Networks with Synthetic data and Various Data Enhancements	75.28%
2023-03-21	ParSeq with SwinV2	72.97%
2023-03-18	imcc	67.07%
2023-03-17	Donut (fine-tuned)	61.87%
