π Abstract
Large Language Models (LLMs) are increasingly embedded into security tools β for example, browser-based vulnerability scanners, email malware triagers, and general cybersecurity chatbots β to assist analysts in daily tasks. However, these AI-based assistants can be subverted through prompt injection attacks, in which malicious inputs (either direct user prompts or hidden in retrieved content) cause the model to ignore its intended instructions and execute an attackerβs commands. In this paper, we present a theoretical framework and evaluation design for prompt injection detection in LLM-based security assistants. We survey existing datasets (e.g. LLMail-Inject for email, GenTel-Bench, BrowseSafe-Bench) and injection taxonomies, and propose a model-agnostic, layered detection architecture that operates across use cases (web browsing tools, email agents, conversational bots). Our framework combines static filters, semantic classifiers, and consistency checks (inspired by multi-agent and output-validation schemes) to flag malicious instructions. We describe how this approach addresses direct, indirect, multi-turn, and jailbreak attacks, and outline an evaluation plan using precision/recall, attack success rate, and usability metrics. While no implementation is provided, our conceptual design lays the groundwork for future empirical studies. By unifying diverse security contexts under a single detection paradigm, this work aims to advance safety practices for AI-powered security applications.
π How to Cite
Adharsh C S, Sanjith Rana H S, Dr.Kavitha V, Uthra V, "Prompt Injection Detection in LLM-Based Security Assistants: A Dataset and Framework" International Journal of Advanced Multidisciplinary Research and Educational Development, V2(2): Page(279-285) Mar-Apr 2026. ISSN: 3107-6513. www.ijamred.com. Published by Scientific and Academic Research Publishing.