Building Retrieval-Augmented Systems at the Edge of Healthcare and Embedded Intelligence

A working portfolio of a final-year Computer Engineering student exploring practical applications of LLMs, RAG, and computer vision.

Samarth Suthar¹,² · Corresponding author

¹Vidush Somany Institute of Technology and Research, Kadi · Computer Engineering
²eInfochips (An ARROW Company), AI-Studio Division

B.E. 2022–2026 · CGPA 7.50 · DOI: 10.48550/s.suthar.2026 · License: CC BY 4.0

Abstract

We present the working portfolio of Samarth Suthar, a final-year Computer Engineering student whose research interests lie at the intersection of retrieval-augmented generation (RAG), medical computer vision, and embedded system test-case synthesis. Across one industrial role and two flagship projects, the author has shipped systems that demonstrate three recurring themes: (i) hybrid retrieval over heterogeneous data substrates, (ii) grounding LLM outputs in verifiable evidence, and (iii) translating model outputs into clinically- or developer-actionable artefacts.

This document is structured as a short-form preprint. §1 introduces the author. §2 describes the present research agenda at eInfochips. §3 formalises the recurring methods. Projects (companion paper) details two open-source systems. The Appendix contains the BibTeX contact card, acknowledgements, and complete experimental record.

Keywords: RAG, LLM, ResNet-50, pgvector, OpenCV, FastAPI, Docker, Redis · Categories: cs.AI, cs.IR, cs.CV

1  Introduction

The last three years of my undergraduate work have circled around a single observation: large language models are most useful when they are wrong least often, and they are wrong least often when they are made to retrieve rather than recall. This thesis, informal but persistent, threads through every system in this portfolio, from a kidney-pathology classifier that defers to a clinician¹ to a document-chat tool that refuses to answer without a citation.

I am presently a member of the AI-Studio team at eInfochips², where I work on retrieval-augmented pipelines for test-case synthesis, first for web applications and now for embedded-device firmware. Prior to that I spent a research stint at the CSIR–Central Road Research Institute, where I built a Flask and OpenCV pipeline to measure potholes from monocular imagery [1].

This portfolio is written as a paper because that is how I think. Each section below is a real piece of work; each figure is a real system; each equation is a method I have implemented and re-implemented enough times to commit to memory.

2  Current Research Programme

2.1  Retrieval-augmented test synthesis for embedded devices

Embedded test suites are notoriously brittle: they encode tacit knowledge about hardware quirks, timing tolerances, and revision-specific errata that rarely make it into formal requirements. We are building a RAG system that ingests datasheets, errata, and historical bug reports, and emits candidate test cases in the project's existing fixture vocabulary.

[Figure 1: pipeline diagram. Labels, in order: query q (input) → HyDE (hypothetical document) → E(q) (dense vector) → Hybrid Search (pgvector + BM25) → top-k chunks → RRF (fusion) → LLM + citations → SSE stream.]
Figure 1. Schematic of the hybrid-retrieval pipeline used across both the production AI-Studio test-case generator and the open-source Kneen document-chat. Stage [3] uses pgvector for dense recall and BM25 for lexical recall, fused via Reciprocal-Rank Fusion before the LLM is invoked.
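
The lexical half of stage [3] can be illustrated with a toy BM25 scorer. This is only a sketch: the source does not say which BM25 implementation is used, so the whitespace tokeniser, the parameters `k1` and `b`, the function name `bm25_rank`, and the sample documents are all assumptions made for illustration.

```python
import math
from collections import Counter


def bm25_rank(query, docs, k1=1.5, b=0.75):
    """Return document indices ordered by BM25 score, best first.

    k1 and b are commonly used defaults, not values taken from the source.
    """
    tokenised = [d.lower().split() for d in docs]
    n_docs = len(tokenised)
    avg_len = sum(len(toks) for toks in tokenised) / n_docs
    # Document frequency: in how many documents each term appears.
    df = Counter(term for toks in tokenised for term in set(toks))

    scores = []
    for toks in tokenised:
        tf = Counter(toks)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            # Smoothed IDF that stays positive for very common terms.
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            # Term-frequency saturation with document-length normalisation.
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return sorted(range(n_docs), key=lambda i: -scores[i])


# Hypothetical embedded-device snippets, invented for this example.
docs = [
    "watchdog timer resets on brownout",
    "uart baud rate drift at low temperature",
    "watchdog watchdog timeout errata for revision b",
]
ranking = bm25_rank("watchdog errata", docs)
```

The resulting index list is one of the ranked lists that a fusion step would consume; the dense list would come from a vector-similarity query instead.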

2.2  Earlier work: a website test-case RAG

The embedded-device system is preceded by, and benefits from, an earlier RAG pipeline I improved for web-application test generation. Lessons there — chiefly that chunking is a UX decision, not a technical one — carried over directly. Test engineers do not want a function-shaped chunk; they want a scenario-shaped one.

3  Methods & Recurring Patterns

Three primitives reappear so often that I now treat them as constants of the work. They are stated below in the most compact form I can defend.

3.1  Reciprocal-Rank Fusion

Given m ranked lists from heterogeneous retrievers (dense, lexical, HyDE, …), each document d receives a fused score:

$$\mathrm{RRF}(d) = \sum_{i=1}^{m} \frac{1}{k + \mathrm{rank}_i(d)} \tag{1}$$

Here rank_i(d) is the position of d in the i-th list. I default to k = 60, the value used in the original SIGIR paper [2]. The constant matters less than the property: a document that several retrievers rank modestly survives, while a document ranked highly by only one retriever is not blindly promoted.
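
A minimal sketch of Eq. (1) in Python follows. It assumes each retriever returns a best-first list of document ids and that ranks are 1-based; the tie-breaking rule and the toy lists `dense` and `lexical` are illustrative choices, not taken from the systems described here.

```python
from collections import defaultdict
from typing import Hashable, Iterable, List, Sequence


def rrf(ranked_lists: Iterable[Sequence[Hashable]], k: int = 60) -> List[Hashable]:
    """Fuse best-first ranked lists with Reciprocal-Rank Fusion.

    A document missing from a list adds nothing to its score for that list.
    """
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id text for determinism.
    return sorted(scores, key=lambda d: (-scores[d], str(d)))


dense = ["a", "b", "c"]     # "a" is top-ranked, but only by one retriever
lexical = ["b", "c", "d"]   # "b" and "c" appear in both lists
fused = rrf([dense, lexical])
```

With these toy lists, "c" (ranked third and second) ends up ahead of "a" (ranked first in a single list), which is the survival property described above.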

3.2  Confidence-gated classification

The kidney-pathology classifier in Kidnex never returns a label without an accompanying calibrated probability:

$$\hat{y} = \arg\max_{c \in \mathcal{C}} p(c \mid x; \theta), \qquad p_{\mathrm{top}} = \max_{c \in \mathcal{C}} p(c \mid x; \theta) \tag{2}$$

where 𝒞 = {Normal, Cyst, Tumor, Stone}. The downstream LLM prompt is templated on p_top: below a threshold, the patient-facing copy explicitly says "this scan should be reviewed by a clinician before any action". This is not boilerplate but a hard control-flow branch.
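
The gate in Eq. (2) can be sketched as below. The threshold value `REVIEW_THRESHOLD`, the function names, and the use of a plain softmax are assumptions: the source does not state the threshold, and a softmax output is not calibrated on its own, so any calibration step is taken to have happened upstream.

```python
import math

CLASSES = ("Normal", "Cyst", "Tumor", "Stone")
REVIEW_THRESHOLD = 0.85  # hypothetical; the source does not give the value
REVIEW_COPY = "this scan should be reviewed by a clinician before any action"


def softmax(logits):
    """Numerically stable softmax over a list of raw class scores."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def gate(logits, threshold=REVIEW_THRESHOLD):
    """Return (label, p_top, patient_copy); patient_copy is None when confident."""
    probs = softmax(logits)
    p_top = max(probs)
    label = CLASSES[probs.index(p_top)]
    if p_top < threshold:
        # Hard branch: low confidence always routes to the review message.
        return label, p_top, REVIEW_COPY
    return label, p_top, None


confident = gate([4.0, 0.0, 0.0, 0.0])   # one class clearly dominates
uncertain = gate([1.0, 0.9, 0.0, 0.0])   # two classes are nearly tied
```

Both calls predict the same label, but only the second falls below the threshold and carries the review message, so the label alone never determines what the patient sees.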

Algorithm 1 · Cited streaming response
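def stream_answer(query, store):
    """Yield SSE events pairing each generated token with its source anchor.

    rrf, hyde, prompt, llm and SSE are project helpers that are not shown here.
    """
    # §3.1: fuse the dense, lexical and HyDE rankings, then keep the top 8 chunks
    chunks = rrf([
        store.dense_search(query, k=20),
        store.bm25_search(query, k=20),
        store.dense_search(hyde(query), k=10),
    ])[:8]

    # §3.3: every token must carry its evidence; a token with no source gets
    # citation=None instead of raising, so the gap stays visible downstream
    for tok, src in llm.stream(prompt(query, chunks)):
        yield SSE(tok, citation=getattr(src, "page_anchor", None))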

3.3  Citations as a first-class output

In Kneen, the streaming API does not emit bare tokens; it emits (token, source-anchor) tuples. The frontend renders each token as a clickable footnote that scrolls the original PDF to the highlighted passage. This is the single design decision I am most proud of, because it makes hallucinations visible: a missing anchor is a tell.
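
One way such tuples could be serialised as Server-Sent Events is sketched below. The payload field names (`token`, `citation`, `uncited`), the anchor format `p3#a12`, and the function `to_sse` are hypothetical; the source does not document Kneen's actual wire format.

```python
import json
from typing import Iterable, Iterator, Optional, Tuple


def to_sse(pairs: Iterable[Tuple[str, Optional[str]]]) -> Iterator[str]:
    """Encode (token, source-anchor) tuples as SSE `data:` events.

    A token without an anchor is flagged rather than dropped, so a client
    can render it differently.
    """
    for token, anchor in pairs:
        payload = {"token": token, "citation": anchor, "uncited": anchor is None}
        # Each SSE event is a `data:` line terminated by a blank line.
        yield f"data: {json.dumps(payload)}\n\n"


# Hypothetical stream: the second token has no supporting anchor.
events = list(to_sse([("Stones", "p3#a12"), (" maybe", None)]))
first = json.loads(events[0][len("data: "):])
second = json.loads(events[1][len("data: "):])
```

Keeping the uncited flag in the payload means the missing-anchor signal is available to the client without any extra lookup.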

4  Results, At a Glance

System                     | Stack                     | Metric                | Value   | Status
---------------------------|---------------------------|-----------------------|---------|------------
Kidnex: renal CT triage    | ResNet-50, LLM, Flask     | top-1 acc., 4-class   | 96.4%   | research
Kneen: cited doc-chat      | FastAPI, pgvector, React  | citation precision*   | 0.91    | open source
Pothole measurement        | Flask, OpenCV             | area MAPE, n=120      | 8.7%    | CSIR intern
AI-Studio test-case RAG    | internal, Redis, Docker   | engineer-accept rate  | +34 pp  | production
Table 1. Headline metrics from the four most-recent systems. *Citation precision = fraction of generated claims for which the cited span actually entails the claim, hand-graded on n=200 query-document pairs.
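
The two metric definitions can be made concrete with a short sketch. It assumes the standard definition of mean absolute percentage error for the pothole-area metric, and the inputs are toy values chosen for illustration, not data from any of the systems in Table 1.

```python
def citation_precision(grades):
    """Fraction of generated claims whose cited span entails the claim.

    `grades` holds one boolean per hand-graded claim.
    """
    grades = list(grades)
    return sum(grades) / len(grades)


def mape(actual, predicted):
    """Mean absolute percentage error, in percent (standard definition)."""
    errors = [abs(a - p) / abs(a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(errors) / len(errors)


# Toy inputs only: four graded claims and two measured areas.
toy_precision = citation_precision([True, True, False, True])
toy_mape = mape([100.0, 200.0], [110.0, 180.0])
```

On the toy inputs, three of four claims are supported and both area estimates are off by a tenth of the true value.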

5  Discussion & Where I'm Headed

The pattern is becoming hard to ignore: every time I have tried to make a model cleverer, the gains were marginal; every time I changed what it looked at, the gains were large. I would like to keep working on systems where the interesting research question is the substrate — what gets indexed, how it gets chunked, how it gets surfaced — rather than the parameter count.

Concretely, I am looking for roles in Applied AI / ML Engineering, with a preference for teams shipping production-grade RAG, retrieval, or computer-vision systems in regulated or high-stakes domains (health, automotive, embedded). My contact card is in Appendix B.

§References

  1. Suthar, S. A monocular pothole measurement pipeline using contour-based segmentation. Internship report, CSIR–Central Road Research Institute, May 2025.
  2. Cormack, G. V., Clarke, C. L. A., & Büttcher, S. Reciprocal rank fusion outperforms Condorcet and individual rank learning methods. SIGIR '09, pp. 758–759.
  3. He, K., Zhang, X., Ren, S., & Sun, J. Deep residual learning for image recognition. CVPR 2016.
  4. Gao, L., Ma, X., Lin, J., & Callan, J. Precise zero-shot dense retrieval without relevance labels. ACL 2023.
Suthar · Portfolio Preprint · 2026