Research Article

FusionGuard: A Multimodal Adversarially Aware Classifier for Robust Image-Text Classification

by Emmanuel Ludivin Tchuindjang Tchokote, Elie Fute Tagne
International Journal of Applied Information Systems
Foundation of Computer Science (FCS), NY, USA
Volume 13 - Number 1
Year of Publication: 2025
DOI: 10.5120/ijais2025452031

Emmanuel Ludivin Tchuindjang Tchokote, Elie Fute Tagne. FusionGuard: A Multimodal Adversarially Aware Classifier for Robust Image-Text Classification. International Journal of Applied Information Systems 13, 1 (Aug 2025), 45-52. DOI=10.5120/ijais2025452031

@article{10.5120/ijais2025452031,
  author     = {Emmanuel Ludivin Tchuindjang Tchokote and Elie Fute Tagne},
  title      = {FusionGuard: A Multimodal Adversarially Aware Classifier for Robust Image-Text Classification},
  journal    = {International Journal of Applied Information Systems},
  issue_date = {Aug 2025},
  volume     = {13},
  number     = {1},
  month      = {Aug},
  year       = {2025},
  issn       = {2249-0868},
  pages      = {45-52},
  numpages   = {9},
  url        = {https://www.ijais.org/archives/volume13/number1/fusionguard-a-multimodal-adversarially-aware-classifier-for-robust-image-text-classification/},
  doi        = {10.5120/ijais2025452031},
  publisher  = {Foundation of Computer Science (FCS), NY, USA},
  address    = {New York, USA}
}
Abstract

From social media content moderation to medical image diagnosis and multimedia retrieval, multimodal classification models that integrate textual and visual information are increasingly central. Despite outperforming unimodal systems, these models are often vulnerable to adversarial attacks that exploit modality-specific weaknesses, leading to misclassification and unreliable outcomes. Moreover, real-world datasets commonly suffer from class imbalance, which further degrades model generalization. To address these challenges, the researchers developed FusionGuard, a novel multimodal classification framework that combines the complementary strengths of TinyBERT for text encoding and EfficientNet for image feature extraction within a hybrid fusion architecture. They incorporated adversarial training to enhance robustness against adversarial attacks, applied the Synthetic Minority Oversampling Technique (SMOTE) to mitigate class imbalance, and used focal loss optimization to focus learning on difficult examples and reduce bias. Under FGSM attacks, the model achieved 80% accuracy, 79.71% macro precision, and 79.21% macro F1 score. These results establish the model as a reliable and fair solution for robust multimodal learning in security-critical applications.
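The focal loss mentioned in the abstract down-weights well-classified examples so that training concentrates on hard or minority-class ones. A minimal sketch of the standard binary form in plain Python follows; the `alpha` and `gamma` values are common illustrative defaults, not the settings reported by the authors, and this is not their implementation.

```python
import math

def focal_loss(p: float, y: int, alpha: float = 0.25, gamma: float = 2.0) -> float:
    """Binary focal loss for a single prediction.

    p:     predicted probability of the positive class
    y:     true label (0 or 1)
    alpha: class-balancing weight for the positive class
    gamma: focusing parameter; gamma = 0 recovers weighted cross-entropy

    FL(p_t) = -alpha_t * (1 - p_t)**gamma * log(p_t)
    """
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)

# A confidently correct prediction contributes far less loss than a hard,
# misclassified one, which is what steers learning toward difficult examples:
easy = focal_loss(0.95, 1)  # well-classified positive
hard = focal_loss(0.30, 1)  # misclassified positive
```

Because the modulating factor `(1 - p_t)**gamma` shrinks toward zero as confidence grows, abundant easy negatives no longer dominate the gradient, which is why focal loss pairs naturally with oversampling techniques like SMOTE on imbalanced data.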

Index Terms

Computer Science
Information Sciences

Keywords

Multimodal Classification, Balanced-FGSM, SMOTE, Focal Loss, Hybrid Fusion, Hate Speech Detection