Authors: Yu Xie, Ziyue Wang, Jielei Zhang

Affiliation: Bilibili Inc.

Description: For the Detection-Recognition-Linking task of MapText, we used ViTAE-v2 to extract global features within an encoder-decoder network architecture (DeepSolo). Data augmentation included cropping, scaling, and saturation and contrast adjustment. The model was pre-trained on available real datasets (TextOCR, TotalText, IC15, MLT2017) and then fine-tuned on the MapText dataset. Post-processing was applied to the model output.

Zhang, Q., Xu, Y., Zhang, J., & Tao, D. (2023). ViTAEv2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 131(5), 1141-1162.

Ye, M., Zhang, J., Zhao, S., Liu, J., Liu, T., Du, B., & Tao, D. (2023). DeepSolo: Let transformer decoder with explicit points solo for text spotting. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 19348-19357).

method: MapTest (2024-05-05)

Authors: Hongen Liu

Affiliation: Tianjin University

method: DS-LP (2024-03-26)

Authors: hsy

Affiliation: BUPT

Description: Unified submission for all four tasks.
DeepSolo, Multi-Polygon NMS (word detection and recognition) -> LayoutPointer (word linking)

Ranking Table

Date       | Method                                                | Quality | Char Accuracy | F-score | Tightness | Recall | Precision
2024-05-06 | MapText Detection-Recognition-Linking Strong Pipeline | 33.11%  | 79.69%        | 55.08%  | 75.43%    | 75.83% | 43.24%
2024-05-05 | MapTest                                               | 32.01%  | 76.32%        | 56.35%  | 74.42%    | 77.84% | 44.16%
2024-03-26 | DS-LP                                                 | 28.58%  | 81.56%        | 50.25%  | 69.73%    | 69.37% | 39.39%
2024-03-26 | Baseline TESTR Checkpoint                             | 26.25%  | 74.02%        | 47.29%  | 74.99%    | 63.22% | 37.77%
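
The page does not state how the columns of the ranking table relate to each other. The sketch below checks two relationships that the published numbers appear to satisfy: F-score as the harmonic mean of Recall and Precision, and Quality as the product of Char Accuracy, F-score, and Tightness. The second relationship is an assumption inferred from the values, not a definition taken from this page. The function names and the 0.02-point tolerance are illustrative choices.

```python
# Rows copied from the ranking table, all values in percent:
# (method, quality, char_accuracy, f_score, tightness, recall, precision)
ROWS = [
    ("MapText Detection-Recognition-Linking Strong Pipeline",
     33.11, 79.69, 55.08, 75.43, 75.83, 43.24),
    ("MapTest", 32.01, 76.32, 56.35, 74.42, 77.84, 44.16),
    ("DS-LP", 28.58, 81.56, 50.25, 69.73, 69.37, 39.39),
    ("Baseline TESTR Checkpoint", 26.25, 74.02, 47.29, 74.99, 63.22, 37.77),
]


def f_score(recall, precision):
    """Harmonic mean of recall and precision (inputs and output in percent)."""
    return 2 * recall * precision / (recall + precision)


def quality(char_accuracy, f, tightness):
    """Assumed composition: product of the three scores, rescaled to percent."""
    return char_accuracy * f * tightness / 100 ** 2


def max_deviation(rows):
    """Largest absolute gap, in percentage points, between the published
    F-score / Quality values and the values recomputed from the other columns."""
    worst = 0.0
    for _method, q, c, f, t, r, p in rows:
        worst = max(worst, abs(f_score(r, p) - f), abs(quality(c, f, t) - q))
    return worst


if __name__ == "__main__":
    # The gap should stay within rounding error of the two-decimal table values.
    print(f"max deviation: {max_deviation(ROWS):.4f} percentage points")
```

If the assumed composition holds, a method can rank first on Quality without leading on any single component except Tightness, as the top row does here.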

Ranking Graphic
