Sitemap

A list of all the posts and pages found on the site. For you robots out there, an XML version is available for digesting as well.

Pages

About me

Posts

Future Blog Post

less than 1 minute read

Published:

This post will show up by default. To disable scheduling of future posts, edit _config.yml and set future: false.
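For reference, the relevant setting in a standard Jekyll _config.yml looks like the following (a minimal fragment; the rest of the configuration file is omitted):

```yaml
# Jekyll site configuration (_config.yml) — minimal fragment.
# When future is false, posts dated in the future are not published.
future: false
```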

Blog Post number 4

less than 1 minute read

Published:

This is a sample blog post. Lorem ipsum I can’t remember the rest of lorem ipsum and don’t have an internet connection right now. Testing testing testing this blog post. Blog posts are cool.

Blog Post number 3

less than 1 minute read

Published:

This is a sample blog post. Lorem ipsum I can’t remember the rest of lorem ipsum and don’t have an internet connection right now. Testing testing testing this blog post. Blog posts are cool.

Blog Post number 2

less than 1 minute read

Published:

This is a sample blog post. Lorem ipsum I can’t remember the rest of lorem ipsum and don’t have an internet connection right now. Testing testing testing this blog post. Blog posts are cool.

Blog Post number 1

less than 1 minute read

Published:

This is a sample blog post. Lorem ipsum I can’t remember the rest of lorem ipsum and don’t have an internet connection right now. Testing testing testing this blog post. Blog posts are cool.

gallery

publications

Educational Data Mining: Discovering Principal Factors for Better Academic Performance

Published in The 3rd International Conference on Big Data Engineering and Technology (BDET), 2021

The objective of this study is to use Educational Data Mining (EDM) techniques to discover the principal factors that affect students’ academic performance. We collected a dataset of 10,279 samples from the China Education Panel Survey (CEPS) and clustered student- and parent-related variables into three categories: demographic and family background information (Demographic), self-perceived willingness for education (Willingness), and perceived family interaction (Interaction). We then applied several EDM methodologies, including linear regression, regression trees, and random forests, to the dataset. As the first attempt to conduct a comprehensive and quantitative investigation into the principal factors that influence Chinese junior high school students’ academic performance on a nationally representative survey, this study not only summarizes, explains, and compares the principal factors discovered by different EDM techniques, but also provides some insight for mitigating China’s educational inequality.

Download here
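The regression step described in the abstract can be illustrated with a short, self-contained sketch. This is a toy example on synthetic data: the factor names and weights below are assumptions for illustration, not values from the CEPS dataset.

```python
import numpy as np

# Toy stand-ins for the three factor categories named in the abstract:
# Demographic, Willingness, Interaction. The data is synthetic, not CEPS.
rng = np.random.default_rng(0)
n = 500
X = rng.normal(size=(n, 3))

# Simulated academic-performance score; the weights are illustrative only.
true_w = np.array([0.2, 0.6, 0.3])
y = X @ true_w + rng.normal(scale=0.1, size=n)

# Ordinary least squares — one of the EDM techniques the study mentions.
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
factors = dict(zip(["Demographic", "Willingness", "Interaction"], coef))
```

With enough samples the recovered coefficients track the simulated weights, so the largest coefficient flags the dominant factor in this toy setup.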

A Cost-Efficient Framework for Scene Text Detection in the Wild.

Published in PRICAI 2021, 2021

Training deep text detection models needs large amounts of annotations such as bounding boxes and quadrangles, which is laborious and expensive. Although synthetic data is easier to acquire, a model trained on it has a large performance gap with one trained on real data because of domain shift. To address this problem, we propose a novel two-stage framework for cost-efficient scene text detection. Specifically, in order to unleash the power of synthetic data, we design an unsupervised domain adaptation scheme consisting of Entropy-aware Global Transfer (EGT) and Text Region Transfer (TRT) to pre-train the model. Furthermore, we utilize minimal actively annotated and enhanced pseudo-labeled real samples to fine-tune the model, aiming at saving the annotation cost. In this framework, both the diversity of the synthetic data and the reality of the unlabeled real data are fully exploited. Extensive experiments on various benchmarks show that the proposed framework significantly outperforms the baseline, and achieves desirable performance even with only a few labeled real samples.

Download here

Beyond OCR+VQA: Involving OCR into the Flow for Robust and Accurate TextVQA

Published in Proceedings of the 29th ACM International Conference on Multimedia, 2021

In this work, we address the problem that the performance of multimodal reasoning and question answering highly depends on the accuracy of OCR. First, we take advantage of multimodal cues to complete the semantic information of texts. A visually enhanced text embedding is proposed to enable understanding of texts without accurately recognizing them. Second, we further leverage rich contextual information to modify the answer texts even if the OCR module does not correctly recognize them. In addition, the visual objects are endowed with semantic representations so that they lie in the same semantic space as OCR tokens. Equipped with these techniques, the cumulative error propagation caused by poor OCR performance is effectively suppressed. Extensive experiments on the TextVQA and ST-VQA datasets demonstrate that our approach achieves state-of-the-art performance in terms of accuracy and robustness.

Download here

Beyond OCR+VQA: Towards End-to-End Reading and Reasoning for Robust and Accurate TextVQA.

Published in Pattern Recognition, Volume 138, 2023

Text-based visual question answering (TextVQA), which answers a visual question by considering both visual contents and scene texts, has attracted increasing attention recently. Most existing methods employ an optical character recognition (OCR) module as a pre-processor to read texts, then combine it with a visual question answering (VQA) framework. However, inaccurate OCR results may lead to cumulative error propagation, and the correlation between text reading and text-based reasoning is not fully exploited. In this work, we integrate OCR into the flow of TextVQA, targeting the mutual reinforcement of OCR and VQA tasks. Specifically, a visually enhanced text embedding module is proposed to predict semantic features from the visual information of texts, by which texts can be reasonably understood even without accurate recognition. Further, two elaborate schemes are developed to leverage contextual information in VQA to modify OCR results. The first is a reading modification module that adaptively selects the answer results according to the contexts. The second is an efficient end-to-end text reading and reasoning network, where the downstream VQA signal contributes to the optimization of text reading. Extensive experiments show that our method outperforms existing alternatives in terms of accuracy and robustness, whether ground truth OCR annotations are used or not.

Download here

Accurate and Robust Scene Text Recognition via Adversarial Training

Under review, 2023

This paper pioneers an investigation into the implications of adversarial training (AT) on scene text recognition (STR) models and proposes a novel regularization-based AT method to develop an accurate and robust STR model, dynamically generating adversarial examples in the training procedure. Through extensive experiments across seven public real-world datasets, we find that AT not only bolsters the robustness of STR models but also improves overall recognition accuracy. This improvement is particularly significant in low-resolution images, a common challenge in STR. Furthermore, given the diverse nature of real-world text images, developing a robust STR model requires a large dataset. We propose viewing AT as a model-based data augmentation technique for STR, compatible with traditional augmentation methods. We hope these encouraging findings catalyze further research into the application of AT for scene text recognition.

Download here
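The core idea of adversarial training — generating perturbed inputs during training that increase the loss as much as possible — can be sketched for a linear classifier with an FGSM-style step. This is a generic, minimal illustration of the technique, not the regularization-based method the paper proposes.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bce_loss(p, y):
    # Binary cross-entropy for a single example.
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

def fgsm_perturb(x, y, w, eps):
    # One FGSM step: move the input along the sign of the loss gradient.
    # For a linear logistic model, d(loss)/dx = (p - y) * w.
    p = sigmoid(w @ x)
    grad = (p - y) * w
    return x + eps * np.sign(grad)

# Toy example: the adversarial copy of x has loss >= the clean copy,
# so training on it exposes the model to its own worst-case inputs.
w = np.array([1.0, -2.0, 0.5])
x = np.array([0.3, 0.1, -0.4])
y = 1.0
x_adv = fgsm_perturb(x, y, w, eps=0.1)
clean_loss = bce_loss(sigmoid(w @ x), y)
adv_loss = bce_loss(sigmoid(w @ x_adv), y)
```

In a full training loop the model would be updated on both clean and adversarial copies; the paper's method additionally regularizes between the two, which this sketch omits.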

Masked and Permuted Implicit Context Learning for Scene Text Recognition

Under review, 2023

Scene Text Recognition (STR) grapples with the complexities introduced by variations in text styles, shapes, and backgrounds. Though the integration of linguistic information enhances the performance of STR models, existing methods are based on either permuted language modeling (PLM) or masked language modeling (MLM). Each, however, has its pitfalls: PLM’s autoregressive decoding lacks foresight into subsequent characters, while MLM, although providing a comprehensive view of the text, sometimes overlooks inter-character dependencies. Addressing these challenges, we propose a masked and permuted implicit context learning network for STR, which unifies PLM and MLM within a single decoding architecture, inheriting the advantages of both approaches. We utilize the training procedure of PLM, and to integrate MLM, we incorporate word length information into the decoding process and replace the undetermined characters with mask tokens. Besides, perturbation training is employed to train a more robust model against potential length prediction errors. Our empirical evaluations demonstrate the performance of our model: it not only achieves superior performance on the common benchmarks, but also a substantial improvement of 9.1% on the more challenging Union14M-Benchmark.

Download here
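The mask-and-fill decoding described in the abstract can be illustrated in miniature: given a predicted word length, start from a sequence of mask tokens and fill positions in some permutation order, re-reading the partially revealed sequence at each step. The predictor below is a hypothetical stand-in for the model's character-prediction head, not the paper's network.

```python
MASK = "?"  # placeholder mask token

def decode_with_masks(length, order, predict):
    """Fill a sequence of `length` mask tokens in the given permutation
    order; `predict(seq, pos)` returns the character for position `pos`
    given the partially revealed sequence `seq`."""
    seq = [MASK] * length
    for pos in order:
        seq[pos] = predict(seq, pos)
    return "".join(seq)

# Hypothetical predictor that reads off a fixed word, standing in for the
# real character head; a real model would condition on the revealed `seq`.
target = "text"
result = decode_with_masks(len(target), order=[2, 0, 3, 1],
                           predict=lambda seq, pos: target[pos])
```

Because every undetermined position is a mask token, the predictor sees a full-length view of the word at each step (the MLM side) while still filling characters in a permuted order (the PLM side).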