Umang Kaushik
AI Engineer · ML Researcher
Pioneered SimplifAI, an agentic AI system for automated incident resolution using Google ADK, reducing MTTR and improving SLA adherence. Built the full stack with GraphQL, FastAPI, React, and ShadCN. Designed high-throughput alert pipelines with RabbitMQ, Redis, and Elasticsearch. Deployed on AWS EC2 with Docker, Nginx, and GitHub Actions CI/CD. Presented SimplifAI at India Mobile Congress 2025.
Engineered Nagios monitoring scripts reducing MTTD by 20%. Automated SSL certificate lifecycles with Certbot and Route53, eliminating certificate-related downtime. Tuned CloudWatch alarms to maintain 99.9% availability, achieving 15% P99 latency reduction and 10% cloud cost decrease.
Implemented PagedAttention from the vLLM paper using custom Triton GPU kernels for memory-efficient LLM inference. Built paged KV cache with block allocation, reference counting, and prefix caching. Developed continuous batching scheduler with chunked prefill.
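The block-allocation and reference-counting scheme can be sketched as a small allocator; this is a minimal illustration in the spirit of vLLM's paged KV cache, with illustrative names (`BlockAllocator`, `fork`) that are not the vLLM API:

```python
class BlockAllocator:
    """Toy paged KV-cache block allocator with reference counting."""

    def __init__(self, num_blocks: int):
        self.free = list(range(num_blocks))  # pool of free physical block ids
        self.refcount = {}                   # block id -> number of owners

    def allocate(self) -> int:
        # Hand out a free block to a sequence and start its refcount at 1.
        block = self.free.pop()
        self.refcount[block] = 1
        return block

    def fork(self, block: int) -> int:
        # Prefix caching: a forked sequence shares the same physical block,
        # so we only bump the reference count instead of copying.
        self.refcount[block] += 1
        return block

    def free_block(self, block: int) -> None:
        # Release one owner; the block returns to the pool only when the
        # last owner frees it.
        self.refcount[block] -= 1
        if self.refcount[block] == 0:
            del self.refcount[block]
            self.free.append(block)
```

Copy-on-write then only triggers when a shared block's refcount is above one and a sequence needs to append to it.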
Implemented Flash Attention 2 in CUDA using shared memory tiling and online softmax, reducing memory complexity from O(n^2) to O(n).
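The online-softmax trick behind that memory reduction can be shown in a few lines of NumPy: keep a running max and running denominator over chunks so only O(chunk) values are live at once, yet the result matches the full softmax exactly (a sketch of the idea, not the CUDA kernel itself):

```python
import numpy as np

def online_softmax(scores: np.ndarray, chunk: int = 4) -> np.ndarray:
    # Stream over the scores in chunks, maintaining a running max `m`
    # and running denominator `d`. Rescaling `d` by exp(m - m_new)
    # keeps everything numerically stable without materializing the
    # full exp(scores) array.
    m, d = -np.inf, 0.0
    for i in range(0, len(scores), chunk):
        x = scores[i:i + chunk]
        m_new = max(m, x.max())
        d = d * np.exp(m - m_new) + np.exp(x - m_new).sum()
        m = m_new
    return np.exp(scores - m) / d
```

Flash Attention fuses this running rescale into the attention-times-value accumulation so the n-by-n score matrix never hits global memory.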
Fine-tuned Qwen2.5 3B on math reasoning using the GRPO loss. Investigated reward hacking during RL post-training with HuggingFace TRL. Used Unsloth and vLLM for quantized multi-GPU training. Explored the lower bound of reasoning capability in small models.
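The core of the GRPO loss is the group-relative advantage: each sampled completion's reward is normalized against the mean and standard deviation of its own sampling group, removing the need for a learned value function. A minimal sketch (the epsilon is an illustrative stability constant):

```python
import numpy as np

def grpo_advantages(rewards) -> np.ndarray:
    # Group-relative advantage: standardize rewards within one group of
    # completions sampled for the same prompt. Completions better than
    # the group average get positive advantage, worse ones negative.
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / (r.std() + 1e-8)
```

These advantages then weight the token log-probabilities in a clipped PPO-style objective.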
Hybrid Anki CLI for humans and AI agents. Supports both AnkiConnect and direct SQLite backends. Features a full search query language compiled to SQL, interactive TUI review mode, and ships a SKILL.md for autonomous agent integration. Published on PyPI.
Mobile agent that understands natural language and executes actions on Android. Uses an LLM as the reasoning engine with phone capabilities exposed as callable tools. Built with Expo SDK 54, React Native, Tamagui, and Zustand.
PyTorch implementation of DeepSeek's Engram paper, augmenting transformer attention with n-gram memory retrieval via hash-based lookup and learned gating. Trained on WikiText-103 with Modal cloud deployment.
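The hash-based lookup with gating can be sketched in NumPy; this is an illustration of the mechanism, not DeepSeek's exact architecture, and the class and parameter names are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)

class NGramMemory:
    """Toy n-gram memory: hash each n-gram to a row of a fixed table,
    then gate the retrieved vector into the hidden state."""

    def __init__(self, n: int = 2, buckets: int = 1024, dim: int = 16):
        self.n, self.buckets = n, buckets
        self.table = rng.standard_normal((buckets, dim)) * 0.02
        self.gate = np.zeros(dim)  # per-channel gate; learned in training

    def lookup(self, token_ids) -> np.ndarray:
        rows = []
        for i in range(len(token_ids)):
            gram = tuple(token_ids[max(0, i - self.n + 1):i + 1])
            idx = hash(gram) % self.buckets   # hash-based bucket lookup
            rows.append(self.table[idx])
        return np.stack(rows)

    def forward(self, hidden: np.ndarray, token_ids) -> np.ndarray:
        g = 1.0 / (1.0 + np.exp(-self.gate))  # sigmoid gate in [0, 1]
        return hidden + g * self.lookup(token_ids)
```

The gate lets the model learn per-channel how much to trust the retrieved n-gram memory versus the attention pathway.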
PyTorch implementation of DeepSeek's Manifold-constrained Hyper-Connections, integrating multi-stream transformer routing into NanoGPT. Includes a Modal cloud training pipeline on A100 GPUs.
GPU-accelerated ray tracer ported from a custom CPU framework to CUDA. Implements materials, spheres, camera systems, and hittable lists entirely on the GPU with clean header-only architecture.
A tiny C compiler written in Python. Handles lexing, parsing, and x86 assembly code generation. Supports functions and basic C constructs. Based on Nora Sandler's incremental compiler approach.
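The lexing stage of such a compiler fits in a regex-driven loop; a minimal sketch in that style (token names and the exact token set here are illustrative, not the project's actual grammar):

```python
import re

# Keyword patterns must precede IDENT so "int" lexes as a keyword,
# not an identifier.
TOKEN_SPEC = [
    ("INT",    r"\bint\b"),
    ("RETURN", r"\breturn\b"),
    ("IDENT",  r"[A-Za-z_]\w*"),
    ("NUMBER", r"\d+"),
    ("LBRACE", r"\{"), ("RBRACE", r"\}"),
    ("LPAREN", r"\("), ("RPAREN", r"\)"),
    ("SEMI",   r";"),
    ("SKIP",   r"\s+"),
]

def lex(source: str):
    # Combine the specs into one alternation of named groups; the
    # matched group name tells us which token kind fired.
    pattern = "|".join(f"(?P<{name}>{rx})" for name, rx in TOKEN_SPEC)
    tokens = []
    for m in re.finditer(pattern, source):
        if m.lastgroup != "SKIP":
            tokens.append((m.lastgroup, m.group()))
    return tokens
```

The parser then consumes this token stream recursively, and code generation walks the resulting AST emitting x86 assembly.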
Tiny autograd engine built from scratch. Implements reverse-mode automatic differentiation, a neural network module system, and trains on scikit-learn's make_moons dataset as a demonstration.
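Reverse-mode automatic differentiation of this kind reduces to a scalar `Value` node that records its parents and a local backward rule; a minimal sketch with only add and mul (illustrative of the technique, not the project's exact code):

```python
class Value:
    """Scalar node in a dynamically built computation graph."""

    def __init__(self, data: float, parents=()):
        self.data, self.grad = data, 0.0
        self._parents = parents
        self._backward = lambda: None

    def __add__(self, other):
        out = Value(self.data + other.data, (self, other))
        def backward():
            # d(a + b)/da = d(a + b)/db = 1, so the gradient flows through.
            self.grad += out.grad
            other.grad += out.grad
        out._backward = backward
        return out

    def __mul__(self, other):
        out = Value(self.data * other.data, (self, other))
        def backward():
            # Product rule: d(ab)/da = b, d(ab)/db = a.
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad
        out._backward = backward
        return out

    def backward(self):
        # Topologically sort the graph, then apply each node's local
        # rule in reverse order (the chain rule, mechanized).
        topo, seen = [], set()
        def build(v):
            if v not in seen:
                seen.add(v)
                for p in v._parents:
                    build(p)
                topo.append(v)
        build(self)
        self.grad = 1.0
        for v in reversed(topo):
            v._backward()
```

For z = x*y + x with x = 2 and y = 3, calling `z.backward()` yields dz/dx = y + 1 = 4 and dz/dy = x = 2.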
Compiler Durden · ubermenchh · last updated Feb 2026