Building Retrieval-Augmented Systems at the Edge of Healthcare and Embedded Intelligence
A working portfolio of a final-year Computer Engineering student exploring practical applications of LLMs, RAG, and computer vision.
We present the working portfolio of Samarth Suthar, a final-year Computer Engineering student whose research interests lie at the intersection of retrieval-augmented generation (RAG), medical computer vision, and embedded system test-case synthesis. Across one industrial role and two flagship projects, the author has shipped systems that demonstrate three recurring themes: (i) hybrid retrieval over heterogeneous data substrates, (ii) grounding LLM outputs in verifiable evidence, and (iii) translating model outputs into clinically- or developer-actionable artefacts.
This document is structured as a short-form preprint. §1 introduces the author. §2 describes the present research agenda at eInfochips. §3 formalises the recurring methods. Projects (companion paper) details two open-source systems. The Appendix contains the BibTeX contact card, acknowledgements, and complete experimental record.
RAG LLM ResNet-50 pgvector OpenCV FastAPI Docker Redis · Categories: cs.AI, cs.IR, cs.CV
1 Introduction
The last three years of my undergraduate work have circled around a single observation: large language models are most useful when they are wrong least often, and they are wrong least often when they are made to retrieve rather than recall. This thesis, informal but persistent, threads through every system in this portfolio, from a kidney-pathology classifier that defers to a clinician to a document-chat tool that refuses to answer without a citation.
I am presently a member of the AI-Studio team at eInfochips, where I work on retrieval-augmented pipelines for test-case synthesis, first for web applications and now for embedded device firmware. Before that I completed a research internship at the CSIR–Central Road Research Institute, where I built a Flask and OpenCV pipeline to measure potholes from monocular imagery [1].
This portfolio is written as a paper because that is how I think. Each section below is a real piece of work; each figure is a real system; each equation is a method I have implemented and re-implemented enough times to commit to memory.
2 Current Research Programme
2.1 Retrieval-augmented test synthesis for embedded devices
Embedded test suites are notoriously brittle: they encode tacit knowledge about hardware quirks, timing tolerances, and revision-specific errata that rarely make it into formal requirements. We are building a RAG system that ingests datasheets, errata, and historical bug reports, and emits candidate test cases in the project's existing fixture vocabulary.
[3] uses pgvector for dense recall and BM25 for lexical recall, fused via Reciprocal-Rank Fusion before the LLM is invoked.
2.2 Earlier work: a website test-case RAG
The embedded-device system is preceded by, and benefits from, an earlier RAG pipeline I improved for web-application test generation. Lessons there — chiefly that chunking is a UX decision, not a technical one — carried over directly. Test engineers do not want a function-shaped chunk; they want a scenario-shaped one.
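A minimal sketch of what a scenario-shaped chunk could look like, assuming the source documents mark scenarios with a Gherkin-style `Scenario:` line. That marker, the function name `scenario_chunks`, and the sample text are illustrative assumptions; the portfolio does not say how either pipeline actually delimits its chunks.

```python
import re


def scenario_chunks(text: str) -> list[str]:
    """Split a test document into one chunk per 'Scenario:' block.

    Assumption: every scenario starts on a line beginning with 'Scenario:'.
    Anything before the first scenario (for example a feature header) is dropped.
    """
    # Zero-width split just before each line that starts a scenario.
    parts = re.split(r"(?m)^(?=Scenario:)", text)
    return [part.strip() for part in parts if part.strip().startswith("Scenario:")]


# Hypothetical input, made up for illustration.
sample = (
    "Feature: login\n"
    "Scenario: valid password\n"
    "  Given a registered user\n"
    "  Then login succeeds\n"
    "Scenario: wrong password\n"
    "  Given a registered user\n"
    "  Then login is rejected\n"
)
chunks = scenario_chunks(sample)
```

Each chunk keeps a whole scenario together, steps included, so a retrieved chunk is something an engineer can read as one unit rather than a fragment cut at a function or token boundary.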
3 Methods & Recurring Patterns
Three primitives reappear so often that I now treat them as constants of the work. They are stated below in the most compact form I can defend.
3.1 Reciprocal-Rank Fusion
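Given m ranked lists from heterogeneous retrievers (dense, lexical, HyDE, …), each document d receives a fused score:
$$\mathrm{RRF}(d) = \sum_{i=1}^{m} \frac{1}{k + r_i(d)}$$
where r_i(d) is the rank of d in the i-th list (a list that does not contain d contributes nothing) and k is a smoothing constant.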
I default to k = 60, the value used in the original SIGIR paper [2]. The constant matters less than the property: a document ranked modestly by several retrievers survives, while one ranked highly by a single retriever and missing from the rest is not blindly promoted.
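The fusion step is small enough to sketch in full. The version below is one plausible implementation, not necessarily the one used in these systems; the document ids are hypothetical and the tie-break by id is an added assumption for determinism.

```python
from collections import defaultdict
from typing import Hashable, Iterable, Sequence


def rrf(rankings: Iterable[Sequence[Hashable]], k: int = 60) -> list:
    """Fuse ranked lists with Reciprocal-Rank Fusion.

    Each ranking lists document ids best-first. A document absent from a
    list contributes nothing for that list.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id (an assumption, for determinism).
    return sorted(scores, key=lambda d: (-scores[d], str(d)))


# Hypothetical ids: "a" tops one list only, "b" is mid-ranked in both.
dense = ["a", "x1", "x2", "b"]
lexical = ["y1", "y2", "b", "y3"]
fused = rrf([dense, lexical])
```

With k = 60, "a" scores 1/61 ≈ 0.0164 from its single first place, while "b" scores 1/64 + 1/63 ≈ 0.0315 from a fourth and a third place, so "b" is ranked above "a" in the fused list.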
3.2 Confidence-gated classification
The kidney-pathology classifier in Kidnex never returns a label without an accompanying calibrated probability:
$$\hat{y} = \arg\max_{c \in \mathcal{C}} p(c \mid x), \qquad p_{\mathrm{top}} = \max_{c \in \mathcal{C}} p(c \mid x)$$
where 𝒞 = {Normal, Cyst, Tumor, Stone}. The downstream LLM prompt is templated on p_top: below a threshold, the patient-facing copy explicitly says "this scan should be reviewed by a clinician before any action", not as boilerplate but as a hard control-flow branch.
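The gate described above can be sketched as follows. This is an illustration under stated assumptions rather than the Kidnex code: the threshold value of 0.80, the function names, and the use of temperature scaling as the calibration step are all hypothetical, since the text does not specify them.

```python
import math
from typing import Sequence

CLASSES = ("Normal", "Cyst", "Tumor", "Stone")
REVIEW_THRESHOLD = 0.80  # hypothetical value, not taken from Kidnex


def softmax(logits: Sequence[float], temperature: float = 1.0) -> list[float]:
    """Numerically stable softmax; temperature > 1 softens the distribution."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def gated_prediction(logits, threshold=REVIEW_THRESHOLD, temperature=1.0):
    """Return (label, p_top, needs_review) for one scan's logits."""
    probs = softmax(logits, temperature)
    p_top = max(probs)
    label = CLASSES[probs.index(p_top)]
    return label, p_top, p_top < threshold


def patient_copy(label: str, p_top: float, needs_review: bool) -> str:
    # Hard control-flow branch: the low-confidence path never states a finding.
    if needs_review:
        return "This scan should be reviewed by a clinician before any action."
    return f"The scan is most consistent with: {label} (confidence {p_top:.0%})."


confident = gated_prediction([4.0, 0.0, 0.0, 0.0])
uncertain = gated_prediction([1.0, 0.9, 0.8, 0.7])
```

Keeping the branch in code rather than in the prompt means the low-confidence wording cannot be paraphrased away by the LLM.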
```python
def stream_answer(query, store):
    # §3.1: fuse the dense, lexical, and HyDE retrievers, keep the top 8 chunks
    chunks = rrf([
        store.dense_search(query, k=20),
        store.bm25_search(query, k=20),
        store.dense_search(hyde(query), k=10),
    ])[:8]
    # §3.3: every token must carry its evidence
    for tok, src in llm.stream(prompt(query, chunks)):
        yield SSE(tok, citation=src.page_anchor)
```
3.3 Citations as a first-class output
In Kneen, the streaming API does not emit tokens — it emits (token, source-anchor) tuples. The frontend renders each token as a clickable footnote that scrolls the original PDF to the highlighted passage. This is the single design decision I am most proud of, because it makes hallucinations visible: a missing anchor is a tell.
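A small sketch of how a consumer might use the "missing anchor is a tell" property. The names `CitedToken` and `flag_unanchored`, the anchor format, and the sample stream are hypothetical; Kneen's actual wire format is not given in the text.

```python
from typing import Iterable, Iterator, NamedTuple, Optional


class CitedToken(NamedTuple):
    token: str
    anchor: Optional[str]  # e.g. a page anchor; None means no evidence attached


def flag_unanchored(stream: Iterable[tuple]) -> Iterator[tuple]:
    """Yield (CitedToken, is_suspect) so a renderer can mark missing anchors."""
    for token, anchor in stream:
        item = CitedToken(token, anchor or None)  # treat "" the same as None
        yield item, item.anchor is None


# Hypothetical stream: the last token arrives without a source anchor.
demo = [("Renal", "p3-a12"), ("cysts", "p3-a12"), ("always", None)]
suspects = [item.token for item, is_suspect in flag_unanchored(demo) if is_suspect]
```

A frontend could render the suspect tokens differently (or a test could assert that there are none), which turns an unsupported claim into something a reader can see.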
4 Results, At a Glance
| System | Stack | Metric | Value | Status |
|---|---|---|---|---|
| Kidnex — renal CT triage | ResNet-50, LLM, Flask | top-1 acc., 4-class | 96.4% | research |
| Kneen — cited doc-chat | FastAPI, pgvector, React | citation precision* | 0.91 | open source |
| Pothole measurement | Flask, OpenCV | area MAPE, n=120 | 8.7% | CSIR intern |
| AI-Studio test-case RAG | internal, Redis, Docker | engineer-accept rate | +34 pp | production |
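For readers unfamiliar with the pothole metric, mean absolute percentage error (MAPE) can be computed as in the sketch below. The function name and the two sample areas are made up for illustration and are not drawn from the n=120 evaluation set.

```python
from typing import Sequence


def mape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute percentage error, in percent."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when a ground-truth value is zero")
    total = sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


# Hypothetical areas: each estimate is off by 10% of the true value.
example = mape([100.0, 200.0], [90.0, 220.0])
```

Because each error is divided by the true area, small potholes weigh as much as large ones in the average.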
5 Discussion & Where I'm Headed
The pattern is becoming hard to ignore: every time I have tried to make a model cleverer, the gains were marginal; every time I changed what it looked at, the gains were large. I would like to keep working on systems where the interesting research question is the substrate — what gets indexed, how it gets chunked, how it gets surfaced — rather than the parameter count.
Concretely, I am looking for roles in Applied AI / ML Engineering, with a preference for teams shipping production-grade RAG, retrieval, or computer-vision systems in regulated or high-stakes domains (health, automotive, embedded). My contact card is in Appendix B.
§References
[1] Suthar, S. A monocular pothole measurement pipeline using contour-based segmentation. Internship report, CSIR–Central Road Research Institute, May 2025.
[2] Cormack, G. V., Clarke, C. L. A., & Büttcher, S. Reciprocal rank fusion outperforms Condorcet and individual rank learning methods. SIGIR '09, pp. 758–759.
[3] He, K., Zhang, X., Ren, S., & Sun, J. Deep residual learning for image recognition. CVPR 2016.
[4] Gao, L., Ma, X., Lin, J., & Callan, J. Precise zero-shot dense retrieval without relevance labels. ACL 2023.