Appendix A · Companion paper to arXiv:2026.0519v3

Project Catalogue

Three systems in detail — what was built, why, and what I learned from being wrong.

Précis

Each project below is presented as a self-contained mini-paper: Motivation, Approach, Implementation, and Outcome & Lessons. The systems are ordered by maturity rather than recency. Code links are in the margin where licensable. Stack badges sit beneath each title.

A.1 Kneen: an open-source cited-RAG document chat

React · FastAPI · PostgreSQL + pgvector · BM25 · HyDE · SSE · Sep 2023 – ongoing

Motivation

Most document-chat tools fail in two related ways: they cite confidently when the answer is not in the document, and they cite vaguely ("see chapter 4") when it is. Kneen was built to test a simple claim — that if every generated claim is required to point to a specific span in a specific page, the failure modes become diagnosable rather than mysterious.

Approach

A single Postgres instance serves as both the relational store (users, documents, sessions) and the vector store (via the pgvector extension). Avoiding a separate vector DB removes an entire failure surface — the embedding and the row containing the chunk text live on the same primary key, so they cannot drift.

Retrieval is hybrid: a dense ranking by cosine distance over halfvec(768) embeddings and a lexical ranking by Postgres's built-in ts_rank, each returning its top 20 candidates, fused with reciprocal rank fusion (RRF, Eq. 1). Under-specified queries are routed through HyDE first [4]: the LLM writes a hypothetical answer, we embed it, and we search with that embedding. The hypothetical text is discarded; only its vector is kept.
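The fusion step can be sketched in a few lines. This is a minimal illustration of reciprocal rank fusion, not the Kneen code: the function name rrf_fuse, the chunk ids, and the constant k = 60 (a commonly used default) are assumptions, since Eq. 1 and its constant are defined in the main paper rather than here.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60, top_n=20):
    """Fuse ranked lists of chunk ids with reciprocal rank fusion.

    Each list is ordered best-first. A chunk scores 1 / (k + rank) in every
    list that contains it (rank starts at 1); scores are summed across lists.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by chunk id for determinism.
    ordered = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [chunk_id for chunk_id, _ in ordered[:top_n]]


dense = ["c7", "c2", "c9"]    # hypothetical cosine ranking
lexical = ["c2", "c4", "c7"]  # hypothetical ts_rank ranking
fused = rrf_fuse([dense, lexical])
```

Because only ranks enter the score, the cosine distances and ts_rank values never need to be put on a common scale, which is the usual reason for choosing RRF over a weighted sum of raw scores.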

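[Figure 2 diagram. Browser (React, PDF viewer, EventSource; user · upload · ask) connects over HTTPS/SSE to FastAPI (/ingest, /ask · stream, /insights; orchestrator), which talks to Postgres + pgvector (chunks · embeddings · users; single store) and to the LLM API (streams (token, span)). Right: Coverage map, with unexplored chunks marked.]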
Figure 2. System diagram of Kneen. The browser opens an EventSource to /ask; the orchestrator interleaves retrieval and generation; tokens are streamed back paired with the source-page anchor they came from. The Coverage map is the most-asked-about feature in user interviews — it shows which chunks have been queried, and surfaces "Questions You Haven't Asked Yet" by inverting that signal.
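To make the streaming contract concrete, here is a hedged sketch of how one token and its source anchor could be serialised as a server-sent event. The field names (token, page, span), the event name, and the helper sse_event are assumptions for illustration; the catalogue only states that tokens are streamed back paired with a source-page anchor.

```python
import json


def sse_event(token, page, span, event="token"):
    """Serialise one generated token and its source anchor as an SSE frame."""
    payload = json.dumps({"token": token, "page": page, "span": span})
    # An SSE frame is a set of "field: value" lines ended by a blank line.
    return f"event: {event}\ndata: {payload}\n\n"


frame = sse_event("renal", page=12, span=[140, 152])
```

On the browser side an EventSource would receive each frame as a separate message, so the viewer can highlight the cited span as soon as the token that depends on it arrives.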

Implementation

The orchestrator is small — under 600 lines of Python — because the heavy lifting is delegated to Postgres. The Coverage map deserves its own paragraph: every time a chunk is retrieved, a counter is incremented; the map renders this counter as opacity in a chunk-grid (Fig. 2, right). The "unasked questions" feature inverts the signal — the LLM is asked to summarise the bottom-decile chunks, then to propose questions whose answers live there. It turns out users want this far more than they want the chat itself.
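The counter-to-opacity mapping and the bottom-decile selection can be sketched as follows. This is an illustration under stated assumptions, not the production code: normalising by the most-retrieved chunk, the tie-break by chunk id, and the names coverage_opacity and bottom_decile are all hypothetical.

```python
from collections import Counter


def coverage_opacity(hits, chunk_ids):
    """Map each chunk's retrieval count to an opacity in [0, 1]."""
    peak = max((hits[c] for c in chunk_ids), default=0)
    if peak == 0:
        return {c: 0.0 for c in chunk_ids}
    return {c: hits[c] / peak for c in chunk_ids}


def bottom_decile(hits, chunk_ids):
    """Return the least-retrieved 10% of chunks (at least one)."""
    ranked = sorted(chunk_ids, key=lambda c: (hits[c], c))
    return ranked[: max(1, len(ranked) // 10)]


hits = Counter()
chunk_ids = [f"c{i:02d}" for i in range(20)]
for retrieved in (["c00", "c01"], ["c00", "c05"], ["c00"]):
    hits.update(retrieved)  # one increment per retrieved chunk

opacity = coverage_opacity(hits, chunk_ids)
unexplored = bottom_decile(hits, chunk_ids)
```

The chunks in unexplored are the ones that would be summarised and turned into suggested questions.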

Outcome & Lessons

  • The citation is the product. Stripping out citations during a usability test caused trust scores to drop by half overnight, even though the underlying answers were identical.
  • Single-store discipline is worth it. A separate vector DB would have shaved two days off the prototype and added six months of drift bugs.
  • The "questions you haven't asked" feature is a sleeper hit. Worth more user delight than any of the retrieval improvements that took ten times the work.

A.2 Kidnex: renal-CT triage with patient-facing reasoning

ResNet-50 · PyTorch · LLM · Flask · 4-class CT · Research

Motivation

Kidney pathology (cysts, tumours, stones) presents very differently on CT, and the diagnostic gap between the radiologist and the patient holding the report is enormous. The classifier was the easy half; the hard half was producing patient-facing text that was useful without being authoritative.

Approach

A pre-trained ResNet-50 backbone [3] is fine-tuned end-to-end on the 4-class problem with the standard cross-entropy loss plus an L2 weight penalty:

\[ \mathcal{L}(\theta) = -\sum_{i} \sum_{c \in \mathcal{C}} \mathbb{1}[y_i = c] \, \log p_\theta(c \mid x_i) + \lambda \lVert \theta \rVert^2 \tag{3} \]
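As a worked instance of Eq. 3, the sketch below evaluates the loss on two toy examples in plain Python. The probabilities, parameter values, and λ are made up for illustration, and the function name is hypothetical; the actual training uses PyTorch.

```python
import math


def regularised_cross_entropy(probs, labels, theta, lam):
    """Eq. 3: summed cross-entropy plus an L2 penalty on the parameters.

    probs[i][c] is the model's probability of class c for example i,
    labels[i] is the true class index, theta is a flat list of parameters.
    """
    # The indicator 1[y_i = c] keeps only the true-class term per example.
    nll = -sum(math.log(p[y]) for p, y in zip(probs, labels))
    l2 = lam * sum(w * w for w in theta)
    return nll + l2


probs = [[0.7, 0.1, 0.1, 0.1], [0.25, 0.25, 0.25, 0.25]]
labels = [0, 2]
loss = regularised_cross_entropy(probs, labels, theta=[1.0, -2.0], lam=0.01)
```

Here the data term is −(ln 0.7 + ln 0.25) ≈ 1.743 and the penalty is 0.01 · (1² + 2²) = 0.05, so the loss is about 1.793.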

After classification, the top-1 class and the calibrated probability p_top are passed to an LLM with a tightly templated prompt: one of four conditional branches keyed on the decile of p_top. The LLM never sees the scan; it sees a structured object, which limits what it can assert to what the classifier reported.
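The gating can be sketched as a small pure function. The cut-offs between branches, the branch names, and the payload fields below are hypothetical; the source only states that there are four branches keyed on the decile of the calibrated top-1 probability.

```python
def decile(p_top):
    """Decile index 0..9 of a probability in [0, 1]."""
    if not 0.0 <= p_top <= 1.0:
        raise ValueError("p_top must be a probability")
    return min(int(p_top * 10), 9)


def pick_branch(p_top):
    """Choose a prompt branch from the decile (cut-offs are hypothetical)."""
    d = decile(p_top)
    if d >= 9:
        return "confident"
    if d >= 7:
        return "likely"
    if d >= 5:
        return "uncertain"
    return "defer_to_clinician"


def build_payload(label, p_top):
    """The structured object the LLM sees; it never receives the scan."""
    return {"label": label, "p_top": round(p_top, 3), "branch": pick_branch(p_top)}


payload = build_payload("CYST", 0.62)
```

Keeping the branch choice outside the LLM means the wording can vary with confidence while the label itself stays fixed by the classifier.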

[Figure 3 panels: NORMAL · CYST · TUMOR · STONE]
Figure 3. Stylised exemplars of the four target classes. Real training images are not redistributable; placeholders shown. Confusion matrix: cyst ↔ tumour is the dominant failure mode (3.1% of test cases) and the reason confidence-gated output exists at all.

Outcome & Lessons

  • 96.4% top-1 accuracy on the held-out test set; the residual 3.6% sits almost entirely on the cyst↔tumour boundary, exactly where a confidence threshold is most useful.
  • The LLM's role is narrower than I expected. It writes the explanation; it does not influence the label. Treating it as a renderer rather than a reasoner was the right call.
  • Calibration matters more than accuracy when the downstream consumer is a non-expert. A 96% model that lies about its own confidence is worse than an 88% model that does not.
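One common way to quantify the calibration point is expected calibration error (ECE). The sketch below is illustrative only: the catalogue does not say which calibration metric was used, and the bin count and toy inputs are assumptions.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Weighted mean gap between confidence and accuracy over equal-width bins."""
    bins = [[] for _ in range(n_bins)]
    for conf, hit in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, hit))
    total = len(confidences)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(h for _, h in members) / len(members)
        ece += len(members) / total * abs(avg_conf - accuracy)
    return ece


# An overconfident model: claims 0.95 but is right 3 times in 4.
overconfident = expected_calibration_error([0.95] * 4, [1, 1, 1, 0])
# An honest model: claims 0.75 and is right 3 times in 4.
honest = expected_calibration_error([0.75] * 4, [1, 1, 1, 0])
```

Both toy models have the same accuracy, yet only the second reports a confidence a non-expert could take at face value: its ECE is zero while the first model's is 0.2.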

A.3 Pothole measurement: CSIR · CRRI internship

Flask · OpenCV · Monocular · Feb – May 2025

Motivation

Road authorities triage repair work by pothole area, but the current measurement loop is a person, a measuring tape, and a clipboard. The brief from CRRI was direct: given a single phone photo of a pothole with a reference object, return its area in cm².

Approach

Classical pipeline: CLAHE → bilateral filter → adaptive threshold → contour extraction. The reference object's known size sets the pixel-to-mm scale; the pothole's contour area in pixels is then divided by the square of that scale to give a physical area.

Listing 1 · The measurement loop (abridged)
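import cv2

# clahe, largest_rectangular, largest_irregular and perimeter are helpers
# that are not shown in this abridged listing.
def measure(img, ref_obj_mm):
    # Normalise lighting, then smooth while preserving edges.
    g  = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    g  = clahe.apply(g)
    g  = cv2.bilateralFilter(g, 9, 75, 75)
    th = cv2.adaptiveThreshold(g, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, ...)  # remaining arguments elided

    cnts, _ = cv2.findContours(th, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    ref     = largest_rectangular(cnts)         # the marker
    pothole = largest_irregular(cnts, exclude=ref)

    # Assumes a square marker of side ref_obj_mm, so perimeter = 4 * side.
    px_per_mm = perimeter(ref) / (4 * ref_obj_mm)
    # Area in mm²; divide by 100 for the cm² the brief asks for.
    return cv2.contourArea(pothole) / (px_per_mm ** 2)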
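The scale conversion at the end of the listing is easy to check by hand. The sketch below isolates that arithmetic in a standalone function; the name area_cm2 and the numbers are hypothetical, and it assumes a square marker, as the factor of 4 in the listing implies.

```python
def area_cm2(pothole_area_px, marker_perimeter_px, marker_side_mm):
    """Convert a contour area in pixels to cm^2 using a square marker."""
    # A square marker has perimeter 4 * side, so this is pixels per millimetre.
    px_per_mm = marker_perimeter_px / (4 * marker_side_mm)
    area_mm2 = pothole_area_px / (px_per_mm ** 2)
    return area_mm2 / 100.0  # 100 mm^2 per cm^2


# A 100 mm marker whose outline measures 800 px gives 2 px per mm;
# a 40,000 px^2 contour is then 10,000 mm^2, i.e. 100 cm^2.
example = area_cm2(40_000, 800, 100)
```

Because the scale enters squared, a 5% error in the marker's measured perimeter becomes roughly a 10% error in area, which is one reason the marker detection deserves as much care as the pothole contour.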

Outcome & Lessons

  • 8.7% mean absolute percentage error on a hand-measured set of 120 potholes; the brief's bar was 15%.
  • Classical CV is underrated for narrow problems. No GPU, no training set, no model drift; just a deterministic pipeline that ran on a five-year-old laptop in the field.
  • Lighting is the entire problem. Half of the engineering time went into the CLAHE + bilateral combination; the contour step was an afternoon.
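For reference, the headline metric can be computed as below. This is a generic sketch of mean absolute percentage error; the sample areas are invented for illustration and are not the internship data.

```python
def mape(predicted, measured):
    """Mean absolute percentage error of predictions against ground truth."""
    if len(predicted) != len(measured) or not measured:
        raise ValueError("need two non-empty sequences of equal length")
    errors = [abs(p - m) / abs(m) for p, m in zip(predicted, measured)]
    return 100.0 * sum(errors) / len(errors)


# Hypothetical areas in cm^2: errors of 10%, 5% and 0% average to 5%.
score = mape([110.0, 190.0, 300.0], [100.0, 200.0, 300.0])
```

Since each error is divided by the hand-measured area, small potholes weigh as much as large ones in the average.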

A.4 Selected work in progress

Two systems currently under active development at eInfochips · AI-Studio. NDAs limit detail; high-level descriptions only.

System | Goal | Notable choice
Embedded-device test-case RAG | Generate firmware-level test cases from datasheets & errata. | Scenario-shaped chunking; co-indexes errata and historical bug reports.
Website test-case RAG (improvements) | Lift engineer-accept rate of generated cases on web targets. | HyDE for under-specified user stories; reranker over RRF top-50.
Table 2. Active in-house projects. Both follow the §3 method pattern: hybrid retrieval, grounded generation, evaluation in the loop.
Suthar · Project Catalogue · 2026 · Page 2 / 3