Data Science & AI · IIT Madras

Siddharth Umathe

AI & Deep Learning Enthusiast | Research-Oriented

I work at the intersection of AI systems, deep learning, and applied research. My work spans GenAI systems, retrieval-augmented pipelines, speech recognition, NLP fine-tuning, computer vision, generative modeling, and data-driven decision systems.

Who Am I

A Data Science and AI graduate from IIT Madras with a Minor in Generative AI and a Minor in Multimodal AI. My work is implementation-driven, spanning GenAI systems, RAG pipelines, speech AI, NLP fine-tuning, computer vision, reinforcement learning foundations, and applied data science.

I am especially interested in building reliable AI systems that can operate in real-world domains: healthcare, enterprise workflows, defense-oriented intelligence systems, and multimodal decision support. I approach each problem with an experiment-oriented mindset and am drawn to emerging AI applications that demand both technical depth and engineering rigor.

I learn quickly by building: designing systems, running experiments, studying failure modes, and iterating toward reliability. My curiosity is particularly directed toward the convergence of language, vision, and speech in AI systems.

Implementation-Driven · Experiment-Oriented · GenAI Systems · Speech AI · Multimodal AI · Deep Learning

Academic Background

Minor in Generative AI
IIT Madras
Specialisation alongside B.S.
Large Language Models · Deep Learning Practice · Mathematical Foundations of Generative AI

Minor in Multimodal AI
IIT Madras
Specialisation alongside B.S.
Large Language Models · Speech Technology · Deep Learning for Computer Vision

Advanced Coursework
IIT Madras · Selected Highlights
Reinforcement Learning · MLOps · Data Visualization Design

Earlier Academic History
B.Sc. Computer Science · RTMNU · First Class
Class XII · Maharashtra State Board · 84%
Class X · Maharashtra State Board · 91%

Capabilities

AI / Deep Learning
PyTorch · Deep Learning · Neural Networks · Representation Learning · Sequence Modeling · RL Foundations · Computer Vision · Generative Modeling
Generative AI / LLM Systems
Large Language Models · RAG · Prompt Engineering · AI Agents · Multi-module Workflows · Structured Outputs · LLM Evaluation · Gemini API · HuggingFace Transformers · PEFT / LoRA
Speech / NLP / Multimodal
wav2vec2.0 · HuBERT · Whisper · ASR Pipelines · CTC-based Decoding · NLP Fine-tuning · Vision-Language Models · Multimodal AI
Engineering / Tools
Python · Flask · SQLAlchemy · REST APIs · ChromaDB · LangChain · Git · Linux · Docker · NumPy · Pandas · Matplotlib

Projects

A selection of implementation-driven AI projects spanning generative systems, speech, NLP, and vision.

🏆 Best Software Engineering Project Award
GenAI System

AI-Powered Software Engineering System

AI Engineer · Team of 7

Developed and integrated a multi-module GenAI-powered academic and software engineering system designed to improve student learning, productivity, debugging, and academic workflow automation.

Python · Flask · SQLAlchemy · Gemini API · LangChain · ChromaDB · RAG · Prompt Engineering

Problem

Students often spend significant time navigating lecture content, generating notes, revising weekly material, debugging code, and preparing for assessments. The goal was to build a unified AI-powered system that could assist across these workflows in a structured and usable way.

Approach

Worked as AI Engineer in a 7-member team, contributing to the design and integration of multiple GenAI modules. Built workflows around LLM APIs, prompt engineering, retrieval-based context grounding, backend orchestration, and modular AI pipelines.

Modules Built

  • AI chatbot with contextual assistance
  • Video / lecture summarisation
  • Week-level summarisation
  • Topic notes generation
  • Practice question generation
  • Mock quiz generation
  • Coding assistant for error explanation
  • Topic recommendation support

Technical Depth

The system involved transcript processing, content chunking, LLM prompting, vector database retrieval using ChromaDB, structured outputs, backend API integration, and response reliability handling. The focus was on building usable AI workflows that fit into an academic software platform, not just generating responses.
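The chunk-then-retrieve flow described above can be illustrated with a minimal, standard-library sketch. It is not the project's code: the bag-of-words cosine ranking is a toy stand-in for an embedding model and a ChromaDB query, and the function names, chunk sizes, and prompt template are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def chunk_words(text, size=40, overlap=10):
    """Split text into overlapping word windows (sizes are illustrative)."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last window already reached the end of the text
    return chunks


def bag_of_words(text):
    """Toy stand-in for an embedding model: lowercase token counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, chunks, k=2):
    """Return the k chunks most similar to the query."""
    q = bag_of_words(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, bag_of_words(c)), reverse=True)
    return ranked[:k]


def build_prompt(query, context_chunks):
    """Ground the prompt in retrieved context (hypothetical template)."""
    context = "\n---\n".join(context_chunks)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )
```

In a real pipeline the similarity search would be delegated to the vector database, and only the chunking and prompt assembly would stay in application code.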

Key Learning

Learned how to move from isolated GenAI prompts to integrated AI systems involving data flow, context management, reliability, modularity, and user-facing workflows.

Speech AI

Automatic Speech Recognition Systems

ASR Pipelines · Self-supervised Models

Built and optimised end-to-end ASR pipelines using the self-supervised speech representation models wav2vec2.0 and HuBERT alongside Whisper, with a focus on audio preprocessing, CTC decoding, and GPU-efficient training.

PyTorch · wav2vec2.0 · HuBERT · Whisper · HuggingFace · CTC Decoding

Problem

Speech recognition systems must handle variable-length audio, noisy samples, inconsistent sampling rates, and alignment between audio signals and text tokens. The goal was to build stable ASR pipelines capable of preprocessing speech data, training or applying models, decoding outputs, and analysing transcription quality.

Approach

Worked on audio preprocessing, waveform normalisation, tokenisation, batching, padding, alignment strategies, and GPU-efficient training workflows. Implemented CTC-based decoding and encoder-based ASR architectures while experimenting with representation learning strategies for transcription robustness.
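For readers unfamiliar with CTC-based decoding, here is a minimal sketch of the greedy rule: take the best token per frame, collapse consecutive repeats, then drop the blank symbol. The vocabulary mapping and the choice of `blank_id=0` are assumptions for illustration, and no beam search or language model is involved.

```python
def argmax_per_frame(logits):
    """Pick the highest-scoring vocabulary id for each time step."""
    return [max(range(len(frame)), key=frame.__getitem__) for frame in logits]


def ctc_greedy_decode(frame_ids, id_to_char, blank_id=0):
    """Collapse consecutive repeats, then drop blanks (CTC greedy rule)."""
    chars = []
    previous = None
    for token_id in frame_ids:
        # Emit a character only when the id changes and is not the blank.
        if token_id != previous and token_id != blank_id:
            chars.append(id_to_char[token_id])
        previous = token_id
    return "".join(chars)
```

A blank between two identical ids is what allows a genuinely doubled letter to survive the collapse step, which is why the frames `[1, 0, 1]` decode to two characters rather than one.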

Technical Depth

Designed preprocessing pipelines for variable-length audio, sampling-rate normalisation, memory-conscious batching, tokenisation, dynamic padding, model loading, and decoding. Used Hugging Face Transformers and PyTorch-based pipelines for model experimentation.
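Dynamic padding and memory-conscious batching can be sketched in a few lines of plain Python. This is an illustration under assumptions, not the pipeline itself: waveforms are plain lists of floats, and the helper names `pad_batch` and `length_bucketed_batches` are hypothetical.

```python
def pad_batch(waveforms, pad_value=0.0):
    """Pad each waveform to the longest in the batch; return data and mask."""
    max_len = max(len(w) for w in waveforms)
    padded = [list(w) + [pad_value] * (max_len - len(w)) for w in waveforms]
    # The mask marks real samples with 1 and padding with 0.
    mask = [[1] * len(w) + [0] * (max_len - len(w)) for w in waveforms]
    return padded, mask


def length_bucketed_batches(waveforms, batch_size):
    """Group indices of similar-length clips so batches need little padding."""
    order = sorted(range(len(waveforms)), key=lambda i: len(waveforms[i]))
    return [order[i:i + batch_size] for i in range(0, len(order), batch_size)]
```

Padding to the longest clip in each batch, rather than to a global maximum, keeps memory use proportional to the audio actually present in that batch.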

Key Learning

Gained practical exposure to speech representation learning, sequence modeling, CTC alignment, encoder-based architectures, inference stability, and ASR system design.

NLP Fine-Tuning

Google Gemma Fine-Tuning with LoRA / PEFT

LLM Adaptation · Parameter-Efficient Methods

Fine-tuned Google Gemma large language models for domain-specific NLP tasks using LoRA, a parameter-efficient fine-tuning technique, through PEFT workflows within the Hugging Face ecosystem.

PyTorch · HuggingFace · PEFT · LoRA · Gemma · Instruction Tuning

Problem

Large language models often need adaptation for specific tasks, domains, or response styles. Full fine-tuning can be computationally expensive, so parameter-efficient techniques are useful for improving model behavior while reducing resource requirements.

Approach

Designed training pipelines involving dataset preprocessing, prompt formatting, tokenisation, batching, LoRA configuration, training configuration tuning, and inference testing. Focused on instruction-following behavior, domain adaptation, response consistency, and efficient deployment.
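To show why a LoRA configuration reduces resource requirements, the sketch below counts trainable parameters for one adapted weight matrix and applies the low-rank update `y = W x + (alpha / rank) * B (A x)` with plain lists. The dimensions are illustrative and are not taken from Gemma or from the actual training runs; the function names are assumptions.

```python
def full_params(d_in, d_out):
    """Parameters in the dense weight matrix that LoRA keeps frozen."""
    return d_in * d_out


def lora_trainable_params(d_in, d_out, rank):
    """Parameters in the low-rank pair A (rank x d_in) and B (d_out x rank)."""
    return rank * (d_in + d_out)


def matvec(matrix, vector):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def lora_forward(x, W, A, B, alpha, rank):
    """Frozen path W x plus the scaled low-rank update B (A x)."""
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    scale = alpha / rank
    return [b + scale * u for b, u in zip(base, update)]
```

For a hypothetical 2048 × 2048 projection at rank 8, the adapter trains 32,768 parameters against 4,194,304 in the frozen matrix, under one percent. With `B` initialised to zeros the adapted layer starts out identical to the base layer.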

Technical Depth

Worked with Hugging Face Transformers, PEFT, PyTorch, tokenizer pipelines, prompt engineering, instruction-tuning workflows, and GPU-accelerated training environments.

Key Learning

Developed practical understanding of LLM fine-tuning, parameter-efficient adaptation, transformer behavior, prompt formatting, inference optimisation, and model robustness evaluation.

Computer Vision

4× Image Super Resolution

Computer Vision Competition · CNN Architectures

Developed deep learning based image super-resolution systems focused on reconstructing high-quality images from low-resolution visual inputs using convolutional neural network architectures and residual learning.

PyTorch · CNN · Residual Learning · PSNR/SSIM · Perceptual Loss

Problem

Image super-resolution requires recovering fine-grained spatial detail and improving perceptual quality from low-resolution inputs. The challenge is to improve sharpness and texture consistency without introducing artifacts.

Approach

Built 4× image super-resolution pipelines using CNN-based architectures, residual learning concepts, feature extraction, perceptual loss optimisation, and adversarial training concepts.

Technical Depth

Worked on image resizing, normalisation, patch-based training, augmentation, GPU-efficient workflows, training stability, and inference optimisation. Evaluated results using PSNR, SSIM, and perceptual quality analysis.
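As a reference for the PSNR metric mentioned above, here is a minimal sketch using the standard definition `10 * log10(MAX^2 / MSE)`. It assumes images are flattened into equal-length lists of 8-bit pixel values; the function names are illustrative, and SSIM is not covered.

```python
import math


def mse(reference, estimate):
    """Mean squared error over two equally sized flat pixel lists."""
    if len(reference) != len(estimate) or not reference:
        raise ValueError("inputs must be non-empty and the same length")
    return sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)


def psnr(reference, estimate, max_value=255.0):
    """PSNR in dB; infinite when the two images are identical."""
    error = mse(reference, estimate)
    if error == 0:
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / error)
```

Higher PSNR means lower pixel-wise error, but it does not always track perceived sharpness, which is one reason to pair it with SSIM and perceptual analysis.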

Key Learning

Gained exposure to image restoration, CNN architectures, perceptual learning, adversarial optimisation concepts, and computer vision experimentation.

Generative AI · Vision

GAN-Style Architecture for Generative Image Modeling

Generative AI Competition · Adversarial Training

Developed and trained GAN-style architectures for generative image modeling tasks, focused on producing realistic image outputs from encoded datasets, with careful attention to training stability and mode diversity.

PyTorch · GANs · Adversarial Training · Latent Space · FID Evaluation

Problem

Generative image modeling requires stable adversarial training and maintaining diversity while improving realism. GANs are difficult to train due to mode collapse, generator-discriminator imbalance, and unstable loss behavior.

Approach

Built end-to-end generative pipelines involving dataset preprocessing, image decoding workflows, training data organisation, augmentation, generator-discriminator optimisation, and GPU-accelerated training.

Technical Depth

Worked on encoded image shard datasets, efficient loading, normalisation, batching, latent-space behavior, generator capacity tuning, discriminator balancing, regularisation strategies, and FID-style evaluation concepts.
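Discriminator balancing is easier to reason about with the losses written out. The sketch below assumes the standard binary cross-entropy GAN objective with the non-saturating generator loss; the source does not state which loss was used, so treat this as one common choice rather than the project's formulation. Inputs are discriminator output probabilities in (0, 1).

```python
import math


def _mean_neg_log(probs, eps=1e-12):
    """Average of -log(p), clamped away from zero for numerical safety."""
    return sum(-math.log(max(p, eps)) for p in probs) / len(probs)


def discriminator_loss(d_real, d_fake):
    """-E[log D(x)] - E[log(1 - D(G(z)))]."""
    return _mean_neg_log(d_real) + _mean_neg_log([1.0 - p for p in d_fake])


def generator_loss(d_fake):
    """Non-saturating generator loss: -E[log D(G(z))]."""
    return _mean_neg_log(d_fake)
```

When the discriminator outputs 0.5 everywhere, its loss equals 2 ln 2. A discriminator loss collapsing toward zero while the generator loss grows is a typical sign of the imbalance described above.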

Key Learning

Gained practical exposure to adversarial learning, latent representation learning, training stabilisation, GPU-intensive experimentation, and generative modeling workflows.

Data Science

Business Data Management · Native Chefs

Applied Data Analysis · B2C Business Insights

Analysed real-world business data from Native Chefs, a B2C home-cooked food delivery business, to identify operational and revenue-related insights around unpaid orders, dish performance, and customer behavior.

Python · Pandas · Matplotlib · EDA · Business Analytics

Problem

The business needed better visibility into unpaid orders, dish-level performance, customer ordering behavior, and revenue leakage.

Approach

Performed data cleaning, preprocessing, descriptive statistics, exploratory analysis, pivot-based summaries, dashboarding, and business interpretation.

Focus Areas

  • Revenue leakage through unpaid orders
  • Dish-level performance analysis
  • Customer ordering behavior patterns
  • Revenue trend identification
  • Operational improvement recommendations
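The first two focus areas can be sketched as a pivot-style summary. This version uses only the standard library, so it runs anywhere; in practice a Pandas `groupby` would be the natural tool. The records and field names (`dish`, `amount`, `paid`) are invented for illustration and do not reflect the actual Native Chefs data.

```python
from collections import defaultdict

# Made-up order records; field names and values are illustrative only.
orders = [
    {"dish": "dish_a", "amount": 100.0, "paid": True},
    {"dish": "dish_a", "amount": 50.0, "paid": False},
    {"dish": "dish_b", "amount": 50.0, "paid": True},
]


def summarise_by_dish(rows):
    """Per-dish order count, total revenue, and unpaid amount."""
    summary = defaultdict(lambda: {"orders": 0, "revenue": 0.0, "unpaid": 0.0})
    for row in rows:
        entry = summary[row["dish"]]
        entry["orders"] += 1
        entry["revenue"] += row["amount"]
        if not row["paid"]:
            entry["unpaid"] += row["amount"]
    return dict(summary)


def unpaid_share(rows):
    """Fraction of billed revenue that was never collected."""
    total = sum(r["amount"] for r in rows)
    unpaid = sum(r["amount"] for r in rows if not r["paid"])
    return unpaid / total if total else 0.0
```

Ranking dishes by the `unpaid` field would show where leakage concentrates, and `unpaid_share` gives a single headline figure to track over time.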

Key Learning

Gained practical experience in turning raw business data into decision-support insights, bridging the gap between data exploration and actionable business recommendation.

Academic Training

Research Focus

I am especially drawn to AI systems that combine research depth with real-world application potential: healthcare diagnosis support, enterprise intelligence workflows, multimodal reasoning systems, speech interfaces, and high-stakes decision-support systems.

AI Systems
End-to-end intelligent pipelines and modular AI architecture
Healthcare AI
Diagnosis support, clinical decision systems, medical NLP
Multimodal AI
Vision-language integration and cross-modal reasoning
Speech AI
ASR, speech representation learning, spoken interfaces
Generative AI & RAG
LLM systems, retrieval-augmented generation, structured pipelines
Reinforcement Learning
Sequential decision-making and policy optimization
Computer Vision
Image understanding, restoration, and generative vision
Defense-Oriented AI
AI for high-stakes decision systems and strategic intelligence

Highlights

Award
Best Software Engineering Project
Recognised for the AI-Powered Software Engineering System developed as part of the IIT Madras curriculum, a multi-module GenAI platform serving academic workflows.
Project Role
AI Engineer
Led AI module design and integration across a 7-member engineering team, responsible for GenAI pipelines, RAG architecture, prompt engineering, and backend orchestration.
Academic Credential
Minor in Generative AI · IIT Madras
Completed a focused specialisation in Generative AI covering LLMs, deep learning practice, and the mathematical foundations of generative systems.
Academic Credential
Minor in Multimodal AI · IIT Madras
Completed coursework spanning Large Language Models, Speech Technology, and Deep Learning for Computer Vision, covering the language, speech, and vision pillars of multimodal AI.
Additional Learning
Digital Marketing & AI in Business
Explored performance-driven digital marketing and AI applications in business growth, building awareness of how AI intersects with commercial decision-making and user acquisition.

Get In Touch

Open to research collaborations, AI engineering discussions, and opportunities in applied AI, deep learning systems, and intelligent product development. Feel free to reach out.

Location
Nagpur, Maharashtra, India