Albert Mundu
I’m a PhD scholar at CVBL, IIIT-Allahabad, researching multimodal AI: how vision and language models can be combined to understand images and videos. My recent work spans multi-level feature fusion for image captioning, distilling foundation vision-language models (VLMs) into task-specific models through staged pretraining and finetuning, and optimizing captioning models with recent reinforcement learning (RL) methods.
I also teach graduate courses in Machine Learning and Algorithms at Galgotias University. Previously, I interned at Spyne AI, where I worked on e-commerce shadow generation with conditional VAEs, GANs, and diffusion models.
I'm always open to research collaborations or a good conversation about CV, NLP, or generative models, so feel free to reach out.
news
| Date | News |
|---|---|
| Dec 14, 2025 | Attended and presented ThreatNet at IEEE UPCON 2025. |
| Oct 10, 2025 | ThreatNet: Multimodal Firearm Threat Assessment Network accepted at IEEE UPCON 2025. |
| Aug 27, 2024 | ETransCap: Efficient Transformer for Image Captioning is in press at Applied Intelligence (Springer). |
| Aug 24, 2023 | Joined Galgotias University, Greater Noida, as an Assistant Professor. |