A Cost-Efficient Framework for Scene Text Detection in the Wild.

Published in PRICAI 2021, 2021

Training deep text detection models needs large amounts of annotations such as bounding boxes and quadrangles, which is laborious and expensive. Although synthetic data is easier to acquire, the model trained on this data has large performance gap with that trained on real data because of domain shift. To address this problem, we propose a novel two-stage framework for cost-efficient scene text detection. Specifically, in order to unleash the power of synthetic data, we design an unsupervised domain adaptation scheme consisting of Entropy-aware Global Transfer (EGT) and Text Region Transfer (TRT) to pre-train the model. Furthermore, we utilize minimal actively annotated and enhanced pseudo labeled real samples to fine-tune the model, aiming at saving the annotation cost. In this framework, both the diversity of the synthetic data and the reality of the unlabeled real data are fully exploited. Extensive experiments on various benchmarks show that the proposed framework significantly outperforms the baseline, and achieves desirable performance with even a few labeled real datasets.

Download paper here