Abstract
This study investigates Vision-Language Models (VLMs) for fire detection, using contextual prompts to assess performance across several models. Notably, the Bunny model achieved a 76% F1-score, highlighting its effectiveness. These findings emphasize the impact of prompt engineering on detection performance and raise key questions about automating prompt optimization and selecting the most suitable VLM given task complexity, resource constraints, and real-world applicability.
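As a rough illustration of the contextual prompting the abstract describes, the sketch below contrasts a bare prompt with a context-enriched one and scores binary fire-detection predictions with an F1 metric. This is a minimal sketch under stated assumptions: `query_vlm` is a hypothetical stand-in for whatever VLM inference backend is used (e.g., a Bunny model wrapper), not an API from the study.

```python
from sklearn.metrics import f1_score


def query_vlm(image_path: str, prompt: str) -> str:
    """Hypothetical wrapper around a VLM inference call (e.g., Bunny).

    Assumed to return a free-text answer such as 'yes' or 'no' to the
    fire-detection question; replace with your actual model backend.
    """
    raise NotImplementedError("Plug in a concrete VLM backend here.")


# A bare prompt versus a contextual prompt that primes the model with
# scene cues, in the spirit of the prompt engineering the study evaluates.
BARE_PROMPT = "Is there a fire in this image? Answer yes or no."
CONTEXT_PROMPT = (
    "You are a wildfire-monitoring assistant. Smoke plumes, orange glow, "
    "and charred vegetation are typical fire indicators. "
    "Is there a fire in this image? Answer yes or no."
)


def evaluate(image_paths: list[str], labels: list[int], prompt: str) -> float:
    """Run the VLM over a labeled image set and return the F1-score.

    Labels are 1 for fire and 0 for no fire; a prediction counts as
    positive when the model's answer starts with 'yes'.
    """
    preds = [
        1 if query_vlm(path, prompt).strip().lower().startswith("yes") else 0
        for path in image_paths
    ]
    return f1_score(labels, preds)
```

Comparing `evaluate(images, labels, BARE_PROMPT)` against `evaluate(images, labels, CONTEXT_PROMPT)` would reproduce, in miniature, the kind of prompt-sensitivity comparison the abstract refers to.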