Umang Kaushik

AI Engineer · ML Researcher

Experience
March 2025 - Present
AI Engineer
Prospire Technology Services, Jodhpur

Pioneered SimplifAI, an agentic AI system for automated incident resolution using Google ADK, reducing MTTR and improving SLA adherence. Built the full stack with GraphQL, FastAPI, React, and ShadCN. Designed high-throughput alert pipelines with RabbitMQ, Redis, and Elasticsearch. Deployed on AWS EC2 with Docker, Nginx, and GitHub Actions CI/CD. Presented SimplifAI at India Mobile Congress 2025.

Google ADK · FastAPI · GraphQL · RabbitMQ · Redis · Docker · AWS
August 2025 - November 2025
DevOps Engineer (Part-Time)
Vrya, Trilogy Group, Austin TX

Engineered Nagios monitoring scripts that reduced MTTD by 20%. Automated SSL certificate lifecycles with Certbot and Route53, eliminating certificate-related downtime. Tuned CloudWatch alarms to maintain 99.9% availability, achieving a 15% reduction in P99 latency and a 10% decrease in cloud costs.

AWS · Nagios · CloudWatch · Nginx · Shell
Projects
2025
mini-vLLM

Implemented PagedAttention from the vLLM paper using custom Triton GPU kernels for memory-efficient LLM inference. Built paged KV cache with block allocation, reference counting, and prefix caching. Developed continuous batching scheduler with chunked prefill.

CUDA · Triton · vLLM · Python
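
The block allocation and reference counting behind a paged KV cache can be illustrated with a minimal sketch. This is not the mini-vLLM code: `BlockAllocator`, `share`, `release`, and `blocks_needed` are hypothetical names chosen for illustration, and the blocks here are plain integer IDs rather than GPU memory.

```python
from dataclasses import dataclass, field


@dataclass
class BlockAllocator:
    """Toy allocator for fixed-size KV cache blocks (hypothetical, CPU-only)."""
    num_blocks: int
    free: list = field(default_factory=list)
    refcount: dict = field(default_factory=dict)

    def __post_init__(self):
        # Every block starts out free.
        self.free = list(range(self.num_blocks))

    def allocate(self) -> int:
        """Hand out a free block with a reference count of 1."""
        if not self.free:
            raise MemoryError("no free KV cache blocks")
        block = self.free.pop()
        self.refcount[block] = 1
        return block

    def share(self, block: int) -> int:
        """Let another sequence reuse a block, e.g. for a shared prompt prefix."""
        self.refcount[block] += 1
        return block

    def release(self, block: int) -> None:
        """Drop one reference; the block is freed when no sequence uses it."""
        self.refcount[block] -= 1
        if self.refcount[block] == 0:
            del self.refcount[block]
            self.free.append(block)


def blocks_needed(num_tokens: int, block_size: int) -> int:
    """Number of blocks required to hold num_tokens (ceiling division)."""
    return -(-num_tokens // block_size)
```

A sequence of 17 tokens with a block size of 16 needs two blocks, and a block shared by two sequences returns to the free list only after both have released it.
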
2024
Flash Attention 2 in CUDA

Implemented Flash Attention 2 in CUDA using shared memory tiling and online softmax, reducing attention memory complexity from O(n^2) to O(n) in the sequence length n.

CUDA C++ · Attention
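
The online softmax idea can be sketched on the CPU in plain Python. This is an illustrative sketch only, not the CUDA kernel: it handles a single query row, uses scalar values instead of value vectors, and the function name and `tile` parameter are assumptions made for the example. The point is that a running maximum and a running normaliser let the scores be processed tile by tile without storing the full row of attention weights.

```python
import math


def online_softmax_weighted_sum(scores, values, tile=2):
    """Compute sum_i softmax(scores)_i * values_i in one pass over tiles."""
    m = -math.inf  # running maximum of the scores seen so far
    l = 0.0        # running sum of exp(score - m)
    acc = 0.0      # running sum of exp(score - m) * value
    for start in range(0, len(scores), tile):
        s_tile = scores[start:start + tile]
        v_tile = values[start:start + tile]
        m_new = max(m, max(s_tile))
        # Rescale the old accumulators to the new maximum.
        # On the first tile m is -inf, so scale is exp(-inf) = 0.0.
        scale = math.exp(m - m_new)
        weights = [math.exp(s - m_new) for s in s_tile]
        l = l * scale + sum(weights)
        acc = acc * scale + sum(w * v for w, v in zip(weights, v_tile))
        m = m_new
    return acc / l


def naive_softmax_weighted_sum(scores, values):
    """Reference version that materialises every softmax weight."""
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    return sum(w / total * v for w, v in zip(weights, values))
```

Both functions should agree up to floating-point error; the tiled version only ever holds one tile of weights at a time.
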
2025
Qwen2.5-3B-Open-R1-Math

Fine-tuned Qwen2.5 3B on math reasoning using GRPO loss. Implemented reward hacking and RL post-training with HuggingFace TRL. Used Unsloth and vLLM for quantized multi-GPU training. Explored the lower bound of reasoning capability in small models.

PyTorch · TRL · Unsloth · vLLM · GRPO
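
The group-relative part of GRPO can be illustrated with a short sketch: each completion's reward is normalised against the other completions sampled for the same prompt. This is not the project's training code or the TRL API. The function name, the `eps` value, and the use of the population standard deviation are assumptions made for illustration.

```python
import statistics


def group_relative_advantages(rewards, eps=1e-4):
    """Normalise rewards within one group of completions for a single prompt."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std; an assumption here
    # eps keeps the division finite when every reward in the group is equal.
    return [(r - mean) / (std + eps) for r in rewards]
```

Completions that score above the group mean receive a positive advantage and those below receive a negative one. A group with identical rewards yields all-zero advantages and therefore no learning signal.
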
2026
anki-cli

Hybrid Anki CLI for humans and AI agents. Supports both AnkiConnect and direct SQLite backends. Features a full search query language compiled to SQL, interactive TUI review mode, and ships a SKILL.md for autonomous agent integration. Published on PyPI.

Python · SQLite · CLI · AI Agents
2026
tangent

Mobile agent that understands natural language and executes actions on Android. Uses an LLM as the reasoning engine with phone capabilities exposed as callable tools. Built with Expo SDK 54, React Native, Tamagui, and Zustand.

TypeScript · React Native · Expo · LLM
2026
engram

PyTorch implementation of DeepSeek's Engram paper, augmenting transformer attention with n-gram memory retrieval via hash-based lookup and learned gating. Trained on WikiText-103 with Modal cloud deployment.

Python · PyTorch · Modal
2026
mhc

PyTorch implementation of DeepSeek's Manifold-constrained Hyper-Connections, integrating multi-stream transformer routing into NanoGPT. Includes a Modal cloud training pipeline on A100 GPUs.

Python · PyTorch · Modal
2024
dream.cu

GPU-accelerated ray tracer ported from a custom CPU framework to CUDA. Implements materials, spheres, camera systems, and hittable lists entirely on the GPU with a clean header-only architecture.

CUDA · C++ · Ray Tracing
2024
pycc

A tiny C compiler written in Python. Handles lexing, parsing, and x86 assembly code generation. Supports functions and basic C constructs. Based on Nora Sandler's incremental compiler approach.

Python · Compilers · x86 Assembly
2024
chotagrad

Tiny autograd engine built from scratch. Implements reverse-mode automatic differentiation, a neural network module system, and trains on scikit-learn's make_moons dataset as a demonstration.

Python · Autograd · Neural Networks
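
Reverse-mode automatic differentiation can be shown in a few lines. The sketch below is not chotagrad's code: the `Value` class and its attribute names are hypothetical, and only addition and multiplication are supported. Each operation records how to pass gradients back to its inputs, and `backward` replays those steps in reverse topological order.

```python
class Value:
    """Scalar that records the operations applied to it (illustrative only)."""

    def __init__(self, data, parents=()):
        self.data = data
        self.grad = 0.0
        self._parents = parents
        self._backward = lambda: None  # leaf nodes have nothing to propagate

    def __add__(self, other):
        out = Value(self.data + other.data, (self, other))

        def _backward():
            # d(a + b)/da = 1 and d(a + b)/db = 1
            self.grad += out.grad
            other.grad += out.grad

        out._backward = _backward
        return out

    def __mul__(self, other):
        out = Value(self.data * other.data, (self, other))

        def _backward():
            # d(a * b)/da = b and d(a * b)/db = a
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad

        out._backward = _backward
        return out

    def backward(self):
        # Build a topological order so each node runs after all its consumers.
        order, seen = [], set()

        def visit(v):
            if v not in seen:
                seen.add(v)
                for p in v._parents:
                    visit(p)
                order.append(v)

        visit(self)
        self.grad = 1.0
        for v in reversed(order):
            v._backward()
```

For `z = x * y + x` with `x = 2` and `y = 3`, calling `z.backward()` leaves `x.grad` equal to `y + 1` and `y.grad` equal to `x`.
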
Skills
ML & Agents
PyTorch (DDP, FSDP), JAX, HuggingFace, Unsloth, vLLM, LangChain/LangGraph, Google ADK
Programming
Python, CUDA, OpenAI Triton, TypeScript, Shell, C/C++
Infrastructure
FastAPI, GraphQL, RabbitMQ, Redis, Kafka, Docker, Nginx, GitHub Actions
Cloud & Data
AWS (EC2, ELB, Route53, CloudWatch, IAM), Postgres, Elasticsearch, MongoDB, Faiss, Pinecone

Compiler Durden · ubermenchh · last updated Feb 2026