- Task 1 - Detection
- Task 2 - Detection-Linking
- Task 3 - Detection-Recognition
- Task 4 - Detection-Recognition-Linking
method: MapTest (2024-05-05)
Authors: Hongen Liu
Affiliation: Tianjin University
method: DS-LP (2024-03-26)
Authors: hsy
Affiliation: BUPT
Description: Unified submission for all four tasks.
DeepSolo, Multi-Polygon NMS (word detection and recognition) -> LayoutPointer (word linking)
method: MapText Detection-Recognition-Linking Strong Pipeline (2024-05-06)
Authors: Yu Xie, Jielei Zhang, Ziyue Wang, Yuchen He, Yihan Meng, Weihang Wang, Peiyi Li, Longwen Gao
Affiliation: Bilibili Inc.
Description: For the MapText Detection-Recognition-Linking task, we used ViTAE-v2 to extract global features within an encoder-decoder network architecture (DeepSolo). Data augmentation included cropping, scaling, and saturation and contrast adjustment. The model was pre-trained on available real datasets (TextOCR, TotalText, IC15, MLT2017), then fine-tuned on the MapText dataset, and post-processing methods were applied.
Zhang, Q., Xu, Y., Zhang, J., & Tao, D. (2023). ViTAEv2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 131(5), 1141-1162.
Ye, M., Zhang, J., Zhao, S., Liu, J., Liu, T., Du, B., & Tao, D. (2023). DeepSolo: Let transformer decoder with explicit points solo for text spotting. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 19348-19357).
Date | Method | Quality | Char Accuracy | F-score | Tightness | Recall | Precision
---|---|---|---|---|---|---|---
2024-05-05 | MapTest | 50.97% | 85.28% | 87.12% | 68.59% | 90.87% | 83.67%
2024-03-26 | DS-LP | 37.08% | 85.64% | 66.73% | 64.89% | 70.80% | 63.10%
2024-05-06 | MapText Detection-Recognition-Linking Strong Pipeline | 17.08% | 55.73% | 44.57% | 68.76% | 44.99% | 44.17%
2024-03-26 | Baseline TESTR Checkpoint | 6.05% | 41.46% | 21.22% | 68.73% | 12.95% | 58.71%
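
The page does not state how the table columns relate to one another. The reported figures are, however, consistent with F-score being the harmonic mean of Precision and Recall, and with Quality being the product of F-score, Tightness, and Char Accuracy. The sketch below checks that reading against the four rows; both relationships are assumptions inferred from the numbers, and the function names (`f_score`, `quality`, `check_rows`) are illustrative rather than part of any evaluation toolkit.

```python
import math

# Values copied from the results table above, converted from percent to fractions.
# Tuple order: (quality, char_accuracy, f_score, tightness, recall, precision)
ROWS = {
    "MapTest": (0.5097, 0.8528, 0.8712, 0.6859, 0.9087, 0.8367),
    "DS-LP": (0.3708, 0.8564, 0.6673, 0.6489, 0.7080, 0.6310),
    "Strong Pipeline": (0.1708, 0.5573, 0.4457, 0.6876, 0.4499, 0.4417),
    "Baseline TESTR": (0.0605, 0.4146, 0.2122, 0.6873, 0.1295, 0.5871),
}


def f_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (assumed definition)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def quality(f: float, tightness: float, char_accuracy: float) -> float:
    """Product of the three component scores (assumed definition)."""
    return f * tightness * char_accuracy


def check_rows(rows=ROWS, tol=5e-4):
    """Return {method: (f_score_matches, quality_matches)}.

    The tolerance absorbs rounding, since the table reports two decimals.
    """
    result = {}
    for name, (q, ca, f, t, r, p) in rows.items():
        f_ok = math.isclose(f_score(p, r), f, rel_tol=0, abs_tol=tol)
        q_ok = math.isclose(quality(f, t, ca), q, rel_tol=0, abs_tol=tol)
        result[name] = (f_ok, q_ok)
    return result


if __name__ == "__main__":
    for method, (f_ok, q_ok) in check_rows().items():
        print(f"{method}: F-score consistent={f_ok}, Quality consistent={q_ok}")
```

Under this reading, a method can detect most words (high F-score) and still rank low when its Char Accuracy is poor, since all three factors multiply into Quality.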