- Task 1 - Detection - Method: MapText Detection Strong Pipeline
- Method info
- Samples list
- Per sample details
method: MapText Detection Strong Pipeline 2024-05-06
Authors: Yu Xie, Ziyue Wang, Jielei Zhang
Affiliation: Bilibili Inc.
Description: In the detection task of MapText, we employed ViTAE-v2 to extract global features, utilizing an encoder-decoder network architecture (DeepSolo). Data augmentation techniques such as cropping, scaling, saturation, and contrast adjustment were applied. Pre-training was conducted using available real datasets (TextOCR, TotalText, IC15, MLT2017). Post-processing methods were also adopted.
Zhang, Q., Xu, Y., Zhang, J., & Tao, D. (2023). Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 131(5), 1141-1162.
Ye, M., Zhang, J., Zhao, S., Liu, J., Liu, T., Du, B., & Tao, D. (2023). Deepsolo: Let transformer decoder with explicit points solo for text spotting. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 19348-19357).