RAG Pipelines, Computer Vision, and the Systems in Between

A working portfolio — shipped code, real metrics, and the thinking behind them.

Samarth Suthar · AI/ML Engineer

eInfochips (An ARROW Company), AI-Studio Division
Previously · CSIR — Central Road Research Institute

Jan 2026 – present · sutharsamarth16@gmail.com · github.com/Samarth0016

Abstract

This portfolio documents the work of Samarth Suthar, an AI/ML engineer currently at eInfochips AI-Studio building retrieval-augmented systems for embedded device test generation. The work spans two industrial roles and two shipped projects, each sharing a common thread: LLMs are most useful when they are forced to retrieve before they speak.

Documented here: a hybrid RAG pipeline that raised the engineer accept-rate by 34 pp in production; a renal CT classifier at 96.4% top-1 accuracy; a cited document-chat with 0.91 citation precision; and an OpenCV-based pothole measurement tool at 8.7% MAPE (the brief required <15%). Appendix A (Project Catalogue) has the full list of projects; Appendix B (About) has the contact card.

Keywords: RAG, LLM, pgvector, ResNet-50, OpenCV, FastAPI, Docker, Redis · Categories: cs.AI, cs.IR, cs.CV

Contents

1 Who I Am & What I Build
2 Current Work at AI-Studio
3 Methods I Reach For
4 Results at a Glance
5 What I'm Looking For
A Project Catalogue
B About · Contact

1 Who I Am & What I Build

I am an AI/ML engineer who builds systems where the interesting problem is not which model to use, but what to show it. Every project in this portfolio has the same skeleton: a retrieval layer that shapes what the model sees, and a guarantee that every output is traceable back to something real.

Right now I am at eInfochips AI-Studio, where I own the RAG system for embedded device test-case generation: it ingests datasheets, errata, and bug reports and emits candidate test cases in the project's fixture vocabulary. Before that I did a research stint at CSIR–CRRI, building an OpenCV pipeline to measure pothole area from monocular phone imagery[1].

Outside of work: 300+ algorithm problems solved across LeetCode, Codeforces, and GeeksForGeeks, with five achievement badges on LeetCode. NPTEL Python for Data Science: top 2% of the national cohort.

2 Current Work at AI-Studio

2.1 Embedded device test-case generation

Embedded test suites encode tacit knowledge about hardware quirks, timing tolerances, and revision-specific errata that rarely makes it into formal requirements. I built a RAG system that ingests that scattered knowledge (datasheets, errata, historical bug reports) and emits candidate test cases in the project's existing fixture vocabulary. Accuracy is measured with RAGAS; hyperparameters are tuned with grid-search cross-validation.

Figure 1. The hybrid-retrieval pipeline used across the production AI-Studio test-case generator and the open-source Kneen document-chat. pgvector handles dense recall; BM25 handles lexical recall; Reciprocal-Rank Fusion merges the ranked lists before the LLM is invoked.

2.2 Website test-case RAG

Prior to the embedded work I improved an existing RAG pipeline for web-application test generation. The core finding: chunking is a UX decision. Test engineers want scenario-shaped chunks, not function-shaped ones. Switching the chunking strategy, not the model, raised the engineer accept-rate by 34 pp on the internal benchmark.
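
To make "scenario-shaped" concrete, here is a minimal sketch in Python. It assumes, purely for illustration, that each scenario in the source material starts with a line beginning with `Scenario:`. The marker, the function name `chunk_by_scenario`, and the sample text are all hypothetical and are not taken from the production pipeline.

```python
from typing import List

# Hypothetical heading convention, assumed only for this sketch.
SCENARIO_MARKER = "Scenario:"


def chunk_by_scenario(text: str, marker: str = SCENARIO_MARKER) -> List[str]:
    """Split text so that each chunk holds one whole scenario.

    A new chunk starts at every line that begins with `marker`. Any text
    before the first marker becomes its own chunk if it is non-empty.
    """
    groups: List[List[str]] = []
    current: List[str] = []
    for line in text.splitlines():
        if line.strip().startswith(marker) and current:
            groups.append(current)
            current = []
        current.append(line)
    if current:
        groups.append(current)
    # Join each group and drop chunks that contain only whitespace.
    joined = ["\n".join(group).strip() for group in groups]
    return [chunk for chunk in joined if chunk]


sample = """Scenario: login with valid credentials
Given a registered user
When they submit the login form
Then the dashboard loads

Scenario: login with expired password
Given a user whose password has expired
When they submit the login form
Then they are redirected to the reset page
"""

scenario_chunks = chunk_by_scenario(sample)
```

A fixed-size or per-function splitter could cut a scenario in half; splitting on the scenario boundary keeps the setup, action, and expected result in one retrievable unit.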

3 Methods I Reach For

Three primitives appear in almost every system I ship. They are worth stating plainly.

3.1 Reciprocal-Rank Fusion

Given m ranked lists from heterogeneous retrievers (dense, lexical, HyDE[4]), each document d receives a fused score:

$$\mathrm{RRF}(d) = \sum_{i=1}^{m} \frac{1}{k + \mathrm{rank}_i(d)} \tag{1}$$

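I default to k = 60, the value used in the original RRF paper[2]. The constant matters less than the property: a document that sits at a modest rank in several lists survives, while one that ranks highly in a single list and is absent from the rest does not get blindly promoted. With k = 60, rank 10 in two lists scores 2/70 ≈ 0.029, which beats rank 1 in one list at 1/61 ≈ 0.016.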
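
A minimal sketch of the fused score in Eq. (1), written in plain Python. The function name `rrf` mirrors the call in Algorithm 1, but the body is an illustrative assumption rather than the production code; it assumes each list is ordered best-first and holds each document ID at most once.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence


def rrf(ranked_lists: Sequence[Sequence[Hashable]], k: int = 60) -> List[Hashable]:
    """Fuse ranked lists with Reciprocal-Rank Fusion.

    Ranks are 1-based. A document missing from a list simply contributes
    nothing for that list.
    """
    scores: Dict[Hashable, float] = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; sorted() is stable, so ties keep first-seen order.
    return sorted(scores, key=scores.get, reverse=True)


dense = ["a", "b", "c"]
lexical = ["b", "d", "a"]
fused = rrf([dense, lexical])  # "b" and "a" appear in both lists and lead the result
```

Because only ranks enter the score, the dense and lexical retrievers never need their raw similarity scores put on a common scale.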

3.2 Confidence-gated classification

The kidney-pathology classifier in Kidnex (ResNet-50, four classes[3]) never returns a label without a calibrated probability. Below a threshold, the patient-facing copy says "this scan should be reviewed by a clinician before any action" — not as boilerplate, but as a hard control-flow branch.
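
The control-flow branch can be sketched as follows. This is a hedged illustration, not the Kidnex code: the class names, the 0.85 threshold, and the function names are hypothetical, and the sketch assumes the logits have already been calibrated (for example by temperature scaling) before the softmax.

```python
import math
from typing import List, Optional, Sequence, Tuple

# Hypothetical labels and cut-off; the real label set and threshold are not stated here.
CLASSES = ["class_0", "class_1", "class_2", "class_3"]
REVIEW_THRESHOLD = 0.85
REVIEW_MESSAGE = "this scan should be reviewed by a clinician before any action"


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over raw class scores."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def gated_prediction(
    logits: Sequence[float], threshold: float = REVIEW_THRESHOLD
) -> Tuple[Optional[str], float]:
    """Return (label, probability), or (None, probability) below the threshold."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    if probs[best] < threshold:
        return None, probs[best]
    return CLASSES[best], probs[best]


def patient_copy(logits: Sequence[float]) -> str:
    """Render the patient-facing text; low confidence routes to clinician review."""
    label, prob = gated_prediction(logits)
    if label is None:
        return REVIEW_MESSAGE
    return f"{label} (p={prob:.2f})"
```

Returning `None` instead of a low-confidence label means the caller cannot display a guess by accident; it has to handle the review branch explicitly.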

3.3 Citations as a first-class output

In Kneen, the streaming API emits (token, source-anchor) tuples, not bare tokens. The frontend renders each token as a clickable footnote that scrolls the original PDF to the highlighted passage. A missing anchor is a visible tell for hallucination — which makes the system honest by design, not by hope.

Algorithm 1 · Cited streaming response

def stream_answer(query, store):
    # §3.1 — fuse dense, lexical, HyDE retrievers
    chunks = rrf([
        store.dense_search(query, k=20),
        store.bm25_search(query, k=20),
        store.dense_search(hyde(query), k=10),
    ])[:8]

    # §3.3 — every token carries its evidence anchor
    for tok, src in llm.stream(prompt(query, chunks)):
        yield SSE(tok, citation=src.page_anchor)
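
Algorithm 1 leaves the wire format open. One plausible framing, sketched below, serialises each (token, source-anchor) pair as a server-sent event with a JSON payload. The function name `to_sse`, the payload keys, and the anchor format `doc.pdf#page=3` are assumptions for illustration; only the `data:` line followed by a blank line comes from the SSE event-stream format itself.

```python
import json
from typing import Iterable, Iterator, Optional, Tuple


def to_sse(pairs: Iterable[Tuple[str, Optional[str]]]) -> Iterator[str]:
    """Frame (token, anchor) pairs as server-sent events.

    Each event is a single `data:` line holding a JSON object, terminated by
    a blank line. JSON encoding escapes any newline inside a token, so the
    payload always stays on one line.
    """
    for token, anchor in pairs:
        payload = json.dumps({"token": token, "citation": anchor})
        yield f"data: {payload}\n\n"


# A token with no evidence carries citation=None, which the client can flag.
events = list(to_sse([("Errata", "doc.pdf#page=3"), (" apply", None)]))
```

Keeping the anchor in every event, even when it is `null`, is what lets a frontend show a missing citation rather than silently omit it.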

4 Results at a Glance

**Table 1.** Headline metrics across four shipped systems. *Citation precision = fraction of generated claims for which the cited span actually entails the claim, hand-graded on n=200 query-document pairs.

| System | Stack | Metric | Value | Status |
|---|---|---|---|---|
| Kidnex: renal CT triage | ResNet-50, LLM, Flask | top-1 acc., 4-class | 96.4% | research |
| Kneen: cited doc-chat | FastAPI, pgvector, React | citation precision* | 0.91 | open source |
| Pothole measurement | Flask, OpenCV | area MAPE, n=120 | 8.7% | CSIR intern |
| AI-Studio test-case RAG | internal, Redis, Docker | engineer-accept rate | +34 pp | production |
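
For readers less familiar with two of the metrics in Table 1, the sketch below spells out MAPE and citation precision as defined in the caption. The function names and the sample numbers are made up for illustration and are not the portfolio's data.

```python
from typing import Sequence


def mape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute percentage error, in percent. Actual values must be non-zero."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(p - a) / abs(a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


def citation_precision(entailed: Sequence[bool]) -> float:
    """Fraction of graded claims whose cited span entails the claim."""
    if not entailed:
        raise ValueError("need at least one graded claim")
    return sum(entailed) / len(entailed)


# Illustrative values only.
areas_true = [100.0, 200.0, 400.0]
areas_pred = [110.0, 190.0, 400.0]
example_mape = mape(areas_true, areas_pred)  # errors of 10%, 5%, and 0% average to 5%
example_precision = citation_precision([True, True, False, True])  # 3 of 4 entailed
```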

5 What I'm Looking For

Every time I've tried to make a model cleverer, the gains were marginal. Every time I changed what it looked at, the gains were large. I want to keep working on that second problem — retrieval substrate, chunking strategy, evidence surfacing — in production, at scale, in domains where wrong answers have real consequences.

I'm looking for roles in Applied AI / ML Engineering, with a preference for teams shipping production-grade RAG, retrieval, or computer-vision systems. Available for full-time from June 2026. Contact card is in About.

References

[1] Suthar, S. A monocular pothole measurement pipeline using contour-based segmentation. Internship report, CSIR–Central Road Research Institute, May 2025.
[2] Cormack, G. V., Clarke, C. L. A., & Büttcher, S. Reciprocal rank fusion outperforms Condorcet and individual rank learning methods. SIGIR '09, pp. 758–759.
[3] He, K., Zhang, X., Ren, S., & Sun, J. Deep residual learning for image recognition. CVPR 2016.
[4] Gao, L., Ma, X., Lin, J., & Callan, J. Precise zero-shot dense retrieval without relevance labels. ACL 2023.