Albert Mundu

I’m a PhD scholar at CVBL, IIIT-Allahabad, researching multimodal AI: how vision and language models can be combined to understand images and videos. My recent work spans multi-level feature fusion for image captioning, distilling foundation VLMs into task-specific models through staged pretraining and finetuning, and optimizing captioning models with state-of-the-art RL methods.

I also teach graduate courses in Machine Learning and Algorithms at Galgotias University. Previously, I interned at Spyne AI, where I worked on e-commerce shadow generation with conditional VAEs, GANs, and diffusion models.

Always open to research collaborations or a good conversation about CV, NLP, or generative models — feel free to reach out.

news

Dec 14, 2025 Attended and presented ThreatNet at IEEE UPCON 2025.
Oct 10, 2025 ThreatNet: Multimodal Firearm Threat Assessment Network accepted at IEEE UPCON 2025.
Aug 27, 2024 ETransCap: Efficient Transformer for Image Captioning is in press at Applied Intelligence (Springer).
Aug 24, 2023 Joined Galgotias University, Greater Noida, as Assistant Professor.

selected publications

  1. UPCON
    ThreatNet: Multimodal Firearm Threat Assessment Network
    Albert Mundu, Satish Kumar Singh, and Shiv Ram Dubey
    In IEEE Uttar Pradesh Section International Conference on Electrical, Electronics and Computer Engineering, 2025
  2. AI, Springer
    ETransCap: Efficient Transformer for Image Captioning
    Albert Mundu, Satish Kumar Singh, and Shiv Ram Dubey
    Applied Intelligence, Aug 2024