International Journal of Applied Information Systems
Foundation of Computer Science (FCS), NY, USA
Volume 13 - Number 1
Year of Publication: 2025
Authors: Emmanuel Ludivin Tchuindjang Tchokote, Elie Fute Tagne
Emmanuel Ludivin Tchuindjang Tchokote, Elie Fute Tagne. FusionGuard: A Multimodal Adversarially Aware Classifier for Robust Image-Text Classification. International Journal of Applied Information Systems. 13, 1 (Aug 2025), 45-52. DOI=10.5120/ijais2025452031
From social media content moderation to medical image diagnosis and multimedia retrieval, multimodal classification models that integrate textual and visual information are increasingly the center of interest. Despite their improved performance over unimodal systems, these models are often vulnerable to adversarial attacks that exploit modality-specific weaknesses, leading to misclassification and unreliable outcomes. Moreover, real-world datasets commonly suffer from class imbalance, which further degrades model generalization. To address these challenges, the researchers developed FusionGuard, a novel multimodal classification framework that combines the complementary strengths of TinyBERT for text encoding and EfficientNet for image feature extraction within a hybrid fusion architecture. They further incorporated adversarial training to enhance robustness against adversarial attacks, applied the Synthetic Minority Oversampling Technique (SMOTE) to mitigate class imbalance, and used focal loss optimization to focus learning on difficult examples and reduce bias. The model achieved 80% accuracy, 79.71% macro precision, and a 79.21% macro F1 score under FGSM attacks. These results establish FusionGuard as a reliable and fair solution for robust multimodal learning in security-critical applications.
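The two core mechanisms the abstract names, focal loss and FGSM adversarial perturbation, can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: it uses a toy logistic model in place of the TinyBERT/EfficientNet fusion network, the standard binary focal loss of Lin et al. with common default parameters (gamma=2, alpha=0.25), and an analytically computed input gradient for the FGSM step; the weight vector and input are arbitrary illustrative values.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Binary focal loss: scales cross-entropy by (1 - p_t)^gamma so that
    easy, well-classified examples contribute little to the loss."""
    pt = np.where(y == 1, p, 1.0 - p)          # prob. assigned to true class
    alpha_t = np.where(y == 1, alpha, 1.0 - alpha)
    return -alpha_t * (1.0 - pt) ** gamma * np.log(pt)

def fgsm_perturb(x, y, w, eps=0.1):
    """FGSM on a logistic model p = sigmoid(w.x): one step of size eps
    along the sign of the input-gradient of the cross-entropy loss."""
    p = sigmoid(w @ x)
    grad_x = (p - y) * w                       # d(CE)/dx for this model
    return x + eps * np.sign(grad_x)

# Toy model and input (illustrative values, not from the paper).
w = np.array([1.0, -2.0, 0.5])
x = np.array([0.3, -0.4, 1.2])
y = 1

x_adv = fgsm_perturb(x, y, w)
p_clean = sigmoid(w @ x)                       # confidence on clean input
p_adv = sigmoid(w @ x_adv)                     # lower confidence after attack
```

Adversarial training, as described in the abstract, amounts to generating such perturbed inputs during training and including them in the loss, so the model learns to classify them correctly; focal loss then naturally weights these harder adversarial examples more heavily than easy clean ones.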