Appendix A · Projects

What I've Built

Four systems — each one taught me something I couldn't have learned from a tutorial.

Overview

Every project here was shipped — to a user, a client, or a review panel. They share a common pattern: retrieval before generation, evidence before claims, and a preference for boring infrastructure that doesn't break over clever abstractions that do. Stack badges sit below each title. Source links are in the margin.

A.1 Kneen: chat with your documents, but with receipts

React · FastAPI · PostgreSQL + pgvector · BM25 · HyDE · SSE · Feb 2026

What it does

Kneen is a RAG application that lets you upload documents and have context-aware, cited conversations across multiple LLM backends. Every answer comes with inline citations pointing to the exact page and text snippet in the source material — not vague references to "chapter 4", but clickable anchors that scroll the original PDF to the highlighted passage.

The thing that makes it different — Document Insights

Every time a chunk is retrieved and used to answer a question, Kneen records it. Over time this builds a per-document intelligence map: which sections have been queried, how often, and which sections have never been touched.

Users see a coverage percentage, a hotspot list of the most referenced sections, and a dead-zone count of chunks never queried. The killer feature: "Questions You Haven't Asked Yet" — Kneen analyzes the dead zones and uses the LLM to proactively suggest important questions the user never thought to ask. It turns a passive Q&A tool into an active discovery system.
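A minimal sketch of the three Document Insights numbers. It assumes the usage record is simply a list of chunk ids, with one entry each time a chunk was retrieved and used in an answer. The function name and the log shape are hypothetical, and how Kneen actually stores the log is not described here:

```python
from collections import Counter


def document_insights(all_chunk_ids, retrieval_log, top_k=3):
    """Summarise how a document's chunks have been used in answers.

    all_chunk_ids: every chunk id belonging to the document.
    retrieval_log: one chunk id per time a chunk was retrieved and used
                   (hypothetical log shape).
    """
    chunk_set = set(all_chunk_ids)
    # Count uses, ignoring ids that belong to other documents.
    hits = Counter(cid for cid in retrieval_log if cid in chunk_set)
    touched = set(hits)
    dead_zones = sorted(chunk_set - touched)  # chunks never queried
    coverage = 100.0 * len(touched) / len(chunk_set) if chunk_set else 0.0
    return {
        "coverage_pct": coverage,
        "hotspots": hits.most_common(top_k),  # most-referenced chunks
        "dead_zone_count": len(dead_zones),
        "dead_zones": dead_zones,
    }
```

With four chunks and a log of c1, c2, c1, c1, coverage is 50%, c1 is the top hotspot with three uses, and c3 and c4 are the dead zones whose text would feed the "Questions You Haven't Asked Yet" prompt.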

How I built it

A single PostgreSQL instance with pgvector serves as both the relational store (users, documents, sessions) and the vector store. There is no separate vector DB: each chunk's embedding and its text share the same primary key, so they can't drift.

Retrieval is hybrid: dense search by cosine similarity over halfvec(768) embeddings and lexical search via Postgres's built-in ts_rank, each returning its top 20, fused with Reciprocal-Rank Fusion. Vague queries are routed through HyDE first: the LLM hallucinates an answer, I embed that, and search with the hallucination's vector. The hallucination is thrown away; only its vector survives.
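Reciprocal-Rank Fusion needs only ranks, not comparable scores, which is why it can merge a cosine-similarity list with a ts_rank list. A minimal sketch, assuming each retriever returns its top 20 chunk ids in order; k=60 is the constant from the original RRF paper, not a value stated for Kneen. For a HyDE query the dense list would come from the hallucinated answer's vector instead of the question's, and the fusion step is unchanged:

```python
def reciprocal_rank_fusion(ranked_lists, k=60, top_n=None):
    """Fuse several ranked lists of ids with Reciprocal-Rank Fusion.

    Each id scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. Ties are broken by id for determinism.
    """
    scores = {}
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    fused = sorted(scores, key=lambda d: (-scores[d], d))
    return fused[:top_n] if top_n is not None else fused
```

If the dense list is a, b, c and the lexical list is c, a, d, the fused order is a, c, b, d: a and c both appear in both lists, and a wins because 1/61 + 1/62 beats c's 1/63 + 1/61.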

Streaming uses Server-Sent Events (SSE): the API emits (token, source-anchor) pairs, not bare tokens, and the frontend renders each anchored token as a clickable footnote. A token with no anchor is a visible tell for hallucination, which makes the system honest by design.
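A minimal sketch of the (token, source-anchor) framing. The event name and the JSON field names are hypothetical; the SSE wire format itself (field lines such as `event:` and `data:`, each frame ended by a blank line) is standard. `json.dumps` escapes newlines inside strings, so each payload fits on a single `data:` line:

```python
import json


def sse_event(token, anchor):
    """Serialise one (token, source-anchor) pair as a Server-Sent Events frame.

    anchor is a dict such as {"doc_id": ..., "page": ..., "span": ...}
    (hypothetical shape), or None when no source supports the token.
    """
    payload = json.dumps({"token": token, "anchor": anchor})
    # One "event:" line, one "data:" line, then a blank line ends the frame.
    return f"event: token\ndata: {payload}\n\n"


def stream(pairs):
    """Yield SSE frames for an iterable of (token, anchor) pairs."""
    for token, anchor in pairs:
        yield sse_event(token, anchor)
```

In FastAPI, a generator like `stream` can be returned through `StreamingResponse(stream(pairs), media_type="text/event-stream")`; the frontend then inspects each frame's `anchor` and flags tokens where it is null.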

Figure 2. Kneen's architecture. The browser sends questions and receives answers as an SSE stream; the orchestrator fuses dense and lexical retrieval, then streams back tokens paired with their source anchors. The coverage map (right) tracks which chunks have been queried and surfaces "Questions You Haven't Asked Yet" by analyzing what has been ignored.

What I shipped & what I learned

0.91 citation precision — fraction of generated claims where the cited span actually supports the claim, hand-graded on 200 query-document pairs.
The citation is the product. Stripping out citations during a usability test cut trust scores in half — even though the answers were identical.
"Questions you haven't asked" is a sleeper hit. Users cared more about this than any retrieval improvement that took ten times the effort.
Single-store discipline pays off. A separate vector DB would have saved two days on the prototype and added six months of drift bugs.

A.2 Kidnex: upload a CT scan, get a diagnosis and a next step

ResNet-50 · PyTorch · LLM · Flask · 4-class CT

What it does

Kidnex is a kidney health platform. You upload a CT scan image and two things happen: a ResNet-50 classifier categorizes it into one of four classes — Normal, Cyst, Tumor, Stone — with a confidence score. Then that classification, along with the patient's medical history, gets passed to an LLM that generates actionable next-step suggestions. The goal: bridge the gap between a radiologist's report and a patient holding it.

Approach

A pre-trained ResNet-50 backbone is fine-tuned end-to-end on the 4-class problem using the standard cross-entropy loss with L2 regularisation:

$$\mathcal{L}(\theta) = -\frac{1}{N}\sum_{i=1}^{N}\sum_{c \in C} \mathbb{1}[y_i = c]\,\log p_\theta(c \mid x_i) + \lambda \lVert \theta \rVert^2 \tag{3}$$
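Equation (3) can be checked numerically on a toy batch. This is a plain-Python sketch with made-up probabilities and parameters, not the training code; in PyTorch the data term corresponds to `torch.nn.functional.cross_entropy` applied to logits:

```python
import math


def cross_entropy_l2(probs, labels, params, lam):
    """Mean cross-entropy over a batch plus an L2 penalty, as in Eq. (3).

    probs:  per-example probability lists, probs[i][c] = p_theta(c | x_i)
    labels: true class indices y_i
    params: flat list of model parameters theta
    lam:    regularisation strength lambda
    """
    n = len(labels)
    # The indicator 1[y_i = c] keeps only the true-class term of the inner sum.
    nll = -sum(math.log(probs[i][labels[i]]) for i in range(n)) / n
    l2 = lam * sum(w * w for w in params)
    return nll + l2
```

With two examples that each assign probability 0.5 to their true class, the data term is ln 2 ≈ 0.693; parameters (1, 2) with λ = 0.01 add 0.01 × (1² + 2²) = 0.05.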

The indicator 𝟙[y_i = c] selects the true class; the L2 term keeps the backbone from over-committing to the small training set. After training, the model outputs a top-1 label and a calibrated probability p_top. That probability controls what happens next: the LLM prompt has four conditional branches, selected by the decile p_top falls in. Below a threshold, the patient-facing copy explicitly says "this scan should be reviewed by a clinician before any action", not as boilerplate but as a hard control-flow branch.
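A minimal sketch of the decile gate. The text states only that there are four branches keyed on p_top's decile and that the lowest one forces the clinician-review notice; the cut-offs (deciles 9, 7, 5) and the branch names below are hypothetical placeholders:

```python
# Hypothetical cut-offs: the real decile boundaries are not published.
# Each entry is (lowest decile that selects the branch, branch name), highest first.
BRANCHES = [
    (9, "high_confidence"),
    (7, "moderate_confidence"),
    (5, "low_confidence"),
    (0, "clinician_review"),
]

CLINICIAN_NOTICE = ("This scan should be reviewed by a clinician "
                    "before any action.")


def decile(p_top: float) -> int:
    """Map a probability to a decile index clamped to 0..9 (1.0 maps to 9)."""
    return min(max(int(p_top * 10), 0), 9)


def select_branch(p_top: float) -> str:
    d = decile(p_top)
    for lowest, name in BRANCHES:
        if d >= lowest:
            return name
    return "clinician_review"  # unreachable after clamping; safe default


def mandatory_notice(p_top: float):
    """Fixed patient-facing text for the lowest branch; None otherwise."""
    if select_branch(p_top) == "clinician_review":
        return CLINICIAN_NOTICE
    return None
```

Returning the notice from ordinary control flow, rather than asking for it inside the prompt, means the LLM cannot omit or soften it.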

The LLM never sees the scan. It sees a structured object: classification label, confidence score, and medical history. This keeps it factual — it writes the explanation, it doesn't influence the diagnosis.
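One way to enforce "the LLM never sees the scan" is to make the prompt builder accept only a typed findings object, so there is no parameter through which pixels could leak. A minimal sketch; the class name, field names, and prompt wording are hypothetical:

```python
import json
from dataclasses import asdict, dataclass

LABELS = {"Normal", "Cyst", "Tumor", "Stone"}


@dataclass(frozen=True)
class ScanFindings:
    """Everything the LLM is allowed to see: the classifier's output, no image."""
    label: str            # one of the four target classes
    confidence: float     # calibrated p_top from the classifier
    medical_history: str  # patient-supplied free text

    def __post_init__(self):
        if self.label not in LABELS:
            raise ValueError(f"unknown label: {self.label}")
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be in [0, 1]")


def build_prompt(f: ScanFindings) -> str:
    # The label is stated as fixed input; the LLM only explains it.
    return (
        "Explain the following kidney CT classification to a patient and "
        "suggest next steps. Do not change or question the label.\n"
        + json.dumps(asdict(f), indent=2)
    )
```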

Panels: NORMAL · CYST · TUMOR · STONE

Figure 3. Stylised exemplars of the four target classes. Real training images are not redistributable. The dominant failure mode is cyst ↔ tumour confusion (3.1% of test cases) — exactly where the confidence gate matters most.

What I shipped & what I learned

96.4% top-1 accuracy on the held-out test set. The residual 3.6% sits almost entirely on the cyst↔tumour boundary.
The LLM's role is narrower than expected. It renders explanations; it doesn't reason about diagnoses. Treating it as a renderer, not a reasoner, was the right call.
Calibration matters more than accuracy when the downstream consumer is a non-expert. A 96% model that lies about its confidence is worse than an 88% model that doesn't.

A.3 Pothole measurement: CSIR · CRRI internship

Flask · OpenCV · Monocular · Feb–May 2025

What it does

Road authorities triage repairs by pothole area, but the current method is a person with a measuring tape. The brief from CRRI: given a single phone photo of a pothole with a reference object, return its area in cm². Target accuracy: under 15% MAPE (mean absolute percentage error).

How I built it

Classical CV pipeline: CLAHE → bilateral filter → adaptive threshold → contour extraction. The reference object's known size sets the pixel-to-mm scale, and the pothole's contour area converts to physical units from there. No GPU, no training set, no model drift: a deterministic pipeline that ran on a five-year-old laptop in the field.

Listing 1 · The measurement loop (abridged)

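import cv2

# `clahe` is a cv2.CLAHE object built once with cv2.createCLAHE(...);
# its clip limit and tile size are not given here.

def measure(img, ref_obj_mm):
    """Pothole area in cm² from a BGR photo with a square marker of side ref_obj_mm."""
    g  = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    g  = clahe.apply(g)                        # local contrast equalisation
    g  = cv2.bilateralFilter(g, 9, 75, 75)     # edge-preserving smoothing
    th = cv2.adaptiveThreshold(g, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, ...)  # remaining args abridged

    # OpenCV 4.x returns (contours, hierarchy)
    cnts, _ = cv2.findContours(th, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    ref     = largest_rectangular(cnts)              # the marker (helper not shown)
    pothole = largest_irregular(cnts, exclude=ref)   # helper not shown

    # A square marker's perimeter is 4 sides, which gives pixels per mm.
    px_per_mm = cv2.arcLength(ref, True) / (4 * ref_obj_mm)
    area_mm2  = cv2.contourArea(pothole) / (px_per_mm ** 2)
    return area_mm2 / 100.0                          # mm² → cm² (1 cm² = 100 mm²)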
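The scale arithmetic at the end of Listing 1 can be checked without OpenCV or an image. A minimal sketch, assuming (as the listing's 4 * ref_obj_mm implies) a square marker lying roughly in the same plane as the pothole and viewed head-on; `contour_area_to_cm2` is a hypothetical helper name:

```python
def contour_area_to_cm2(area_px2, ref_perimeter_px, ref_side_mm):
    """Convert a contour area in px² to cm² using a square reference marker.

    A square of side s mm has perimeter 4*s mm, so pixels-per-mm is
    ref_perimeter_px / (4 * s). Area scales with the square of that factor.
    """
    px_per_mm = ref_perimeter_px / (4 * ref_side_mm)
    area_mm2 = area_px2 / px_per_mm ** 2
    return area_mm2 / 100.0  # 1 cm² = 100 mm²
```

With a 100 mm marker whose contour measures 800 px around, the scale is 2 px/mm, so a 500,000 px² contour is 125,000 mm², or 1,250 cm². Because area depends on the square of px_per_mm, a 5% error in the marker's measured perimeter becomes roughly a 10% error in area.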

What I shipped & what I learned

8.7% MAPE on a hand-measured set of 120 potholes; the brief required <15%.
Classical CV is underrated for narrow problems. No GPU, no training data, no model versioning — just a deterministic pipeline that actually ships.
Lighting is the entire problem. Half the engineering time went into the CLAHE + bilateral combination; the contour step was an afternoon.
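Both the 8.7% result and the 15% target are MAPE. For reference, a minimal sketch of the metric, assuming the pipeline's areas and the tape-measured areas are paired lists in the same units:

```python
def mape(measured, reference):
    """Mean absolute percentage error, in percent.

    reference: ground-truth areas (hand-measured); must be non-zero.
    measured:  areas returned by the pipeline, in the same units.
    """
    if len(measured) != len(reference) or not reference:
        raise ValueError("need two equal-length, non-empty lists")
    errors = (abs(m - r) / abs(r) for m, r in zip(measured, reference))
    return 100.0 * sum(errors) / len(reference)
```

Estimates of 110, 90 and 100 cm² against true areas of 100 cm² each give errors of 10%, 10% and 0%, so a MAPE of about 6.7%.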

A.4 What I'm building now at AI-Studio

Two systems under active development at eInfochips · AI-Studio. NDAs limit detail — high-level descriptions only.

**Table 2.** Active in-house projects. Both use hybrid retrieval, grounded generation, and evaluation in the loop.
| System | What it does | How it works |
|---|---|---|
| Embedded test-case RAG | Generates firmware-level test cases from datasheets and errata. | Two-agent architecture: Layer 1 analyzes the device architecture; Layer 2 generates test cases from that context plus retrieved chunks. An LLM-as-judge finds gaps; parameters are tuned by grid search scored with RAGAS evaluation. |
| Website test-case RAG (improvements) | Lifts the engineer-accept rate of generated test cases for web targets. | HyDE for under-specified user stories; a reranker over the RRF top-50. Result: +34 pp engineer-accept rate on an internal benchmark. |
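The grid search in Table 2 is not described further, so the following is only a generic sketch of the technique: try every parameter combination, score each with an evaluation function, keep the best. `grid_search`, the parameter names, and the toy scorer are all hypothetical; in practice the scorer would run the RAG pipeline on an evaluation set and aggregate RAGAS metrics:

```python
from itertools import product


def grid_search(evaluate, grid):
    """Try every combination in `grid` and return the best (score, params).

    evaluate: callable taking a params dict and returning a scalar score
              (higher is better), e.g. a mean of RAGAS metrics (assumption).
    grid:     dict mapping parameter name -> list of candidate values.
    """
    names = list(grid)
    best = None
    for values in product(*(grid[n] for n in names)):
        params = dict(zip(names, values))
        score = evaluate(params)
        if best is None or score > best[0]:
            best = (score, params)
    return best


# Toy scorer standing in for a real evaluation run (hypothetical).
def toy_score(p):
    return -abs(p["top_k"] - 10) - abs(p["chunk_size"] - 512) / 1000
```

Calling `grid_search(toy_score, {"top_k": [5, 10, 20], "chunk_size": [256, 512]})` evaluates 3 × 2 = 6 combinations; since every combination costs a full evaluation run, grids over RAG parameters are usually kept small.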