Prompting Large Language Models with Raw HTML and JavaScript for Lightweight Phishing Website Detection
Author : LunPing Hung, Shih Yang Yang, Kuan Jung Chen, Syu Bo Jhang
Abstract : Phishing attacks have become increasingly sophisticated, exploiting dynamic web technologies and social engineering tactics to evade signature-based detectors. Traditional rule-driven systems struggle to generalize across diverse obfuscation techniques, which leaves gaps in coverage. To address this challenge, we turn to large language models for their semantic reasoning capabilities. By using raw HTML and JavaScript code directly as prompt content, our framework can infer intent and detect subtle malicious behaviors, yielding an adaptable, dependency-light solution that can be rapidly deployed in constrained environments. Our system retrieves web content with the requests library, capturing the HTML and JavaScript returned for each URL. It then constructs structured prompts that encapsulate key indicators (such as form behaviors, script execution flows, and embedded redirects) and passes them to a locally hosted large language model. When evaluated on a balanced collection of phishing and legitimate URLs, the framework achieved 79% detection accuracy. This level of performance indicates that the approach is suitable for real-time threat triage, and the model provides clear, explainable reasoning for each classification. Additionally, the modular architecture supports continuous improvement, such as fine-tuning on new phishing techniques, and easy integration into large-scale scanning pipelines, bridging the gap between research innovation and operational deployment.
Proceedings of International Conference 2025
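The retrieval and prompt-construction steps described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not give the prompt template, the exact indicators, or the model interface, so `IndicatorParser`, `build_prompt`, `classify`, the prompt wording, and the `max_chars` truncation limit are all hypothetical. The locally hosted model is represented by a plain callable that maps a prompt string to a response string.

```python
from html.parser import HTMLParser


class IndicatorParser(HTMLParser):
    """Collects a few indicators: form targets, scripts, meta-refresh redirects."""

    def __init__(self):
        super().__init__()
        self.form_actions = []
        self.password_inputs = 0
        self.script_srcs = []
        self.inline_scripts = []
        self.meta_redirects = []
        self._in_script = False

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "form":
            self.form_actions.append(a.get("action") or "")
        elif tag == "input" and (a.get("type") or "").lower() == "password":
            self.password_inputs += 1
        elif tag == "script":
            if a.get("src"):
                self.script_srcs.append(a["src"])
            else:
                self._in_script = True  # body arrives via handle_data
        elif tag == "meta" and (a.get("http-equiv") or "").lower() == "refresh":
            self.meta_redirects.append(a.get("content") or "")

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        if self._in_script and data.strip():
            self.inline_scripts.append(data.strip())


def fetch_page(url, timeout=10):
    """Download the raw page source with requests (third-party; no JS is executed)."""
    import requests  # imported here so the rest of the sketch runs without it

    resp = requests.get(url, timeout=timeout)
    resp.raise_for_status()
    return resp.text


def build_prompt(url, html, max_chars=4000):
    """Assemble a structured prompt; wording and limit are illustrative assumptions."""
    parser = IndicatorParser()
    parser.feed(html)
    parser.close()
    lines = [
        "You are a security analyst. Decide whether the page below is phishing.",
        "Answer with PHISHING or LEGITIMATE, followed by a one-sentence reason.",
        f"URL: {url}",
        f"Form actions: {parser.form_actions}",
        f"Password fields: {parser.password_inputs}",
        f"External scripts: {parser.script_srcs}",
        f"Inline script count: {len(parser.inline_scripts)}",
        f"Meta-refresh redirects: {parser.meta_redirects}",
        "Raw HTML (truncated):",
        html[:max_chars],
    ]
    return "\n".join(lines)


def classify(url, html, llm):
    """Send the prompt to `llm`, a hypothetical callable wrapping the local model."""
    return llm(build_prompt(url, html))
```

In use, `classify(url, fetch_page(url), llm)` would return the model's verdict and its stated reason together, which is one way to obtain the per-classification explanation the abstract mentions. Truncating the raw HTML keeps the prompt within a bounded size; a real deployment would need to choose that limit to fit the model's context window.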
Keywords : Phishing detection, large language models, HTML/JS analysis, real-time threat triage
Conference Name : International Symposium on Mathematics and Computer Science (ISMCS-25)
Conference Place : Sydney, Australia
Conference Date : 28th Jul 2025