Method: MapTextSpotter - Task 1 - Detection - ICDAR 2024 Competition on Historical Map Text Detection, Recognition, and Linking

method: MapTextSpotter (2024-04-29)

Authors: Jialiang Li, Canhui Xu, Cao Shi, Yucai Qu

Affiliation: Qingdao University of Science and Technology

Description: Unlike natural scene text, text in digitized historical maps is densely distributed, often rotated or curved, and frequently set with widely spaced characters. Text instances appear at multiple granularities, which hierarchically represent structured geolocation context. To address these challenges in map text spotting, we propose a unified network, MapTextSpotter, which jointly exploits the distinct characteristics of text detection and recognition. MapTextSpotter uses a single Transformer-based decoder with shared queries. The queries are designed spatially and semantically according to the text distribution in historical maps. Point queries and character queries are both incorporated and interact during training, so that the model predicts the Bezier curve points of each text instance and its character classifications in parallel. Because densely distributed text instances often use smaller fonts, we extract multi-scale visual features that include high-resolution, detailed convolutional features, which helps capture text instances at multiple granularities. Furthermore, a Large Language Model (LLM) with prior knowledge is employed to exploit contextual information in place of the lexicon matching process, which significantly boosts recognition precision. For widely spaced words surrounded by complicated text-like distractors, and for phrases split across multiple lines, we infer that the LLM could alleviate the widely spaced text problem and improve recognition performance by linking instances using prior knowledge.
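
As an illustration of the curve representation mentioned in the description, the sketch below samples points along a Bezier curve from its control points using the standard Bernstein polynomial form. It is not the MapTextSpotter implementation: the description does not state the curve order or the number of sampled points, so the cubic curve (four control points), the function names `bezier_point` and `sample_bezier`, and the default `num_samples` are all assumptions made for this example.

```python
from math import comb
from typing import List, Tuple

Point = Tuple[float, float]


def bezier_point(control_points: List[Point], t: float) -> Point:
    """Evaluate a Bezier curve at parameter t in [0, 1] (Bernstein form)."""
    n = len(control_points) - 1  # curve degree; 3 for a cubic curve
    x = 0.0
    y = 0.0
    for i, (px, py) in enumerate(control_points):
        # Bernstein basis weight of the i-th control point
        w = comb(n, i) * (1 - t) ** (n - i) * t ** i
        x += w * px
        y += w * py
    return (x, y)


def sample_bezier(control_points: List[Point], num_samples: int = 8) -> List[Point]:
    """Return num_samples points evenly spaced in t along the curve."""
    if num_samples < 2:
        raise ValueError("num_samples must be at least 2")
    return [
        bezier_point(control_points, k / (num_samples - 1))
        for k in range(num_samples)
    ]


# Hypothetical control points for one side of a curved text instance
# (assumed values, for illustration only).
top_edge = [(0.0, 0.0), (10.0, 6.0), (20.0, 6.0), (30.0, 0.0)]
top_points = sample_bezier(top_edge, num_samples=5)
```

The first and last sampled points coincide with the first and last control points, so a curved text boundary could be assembled by sampling one such curve per side of the text instance.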