Cognition and Computation in Decision-Making

Applying the Critical Decision Method to Artificial Intelligence for Aviation Event Analysis

Authors

  • Giovane de Morais Instituto Tecnológico de Aeronáutica
  • Ingrid Kawani Leandro Strohm Instituto Tecnológico de Aeronáutica
  • Dr. Moacyr Machado Cardoso Júnior Instituto Tecnológico de Aeronáutica https://orcid.org/0000-0002-2801-0329
  • Guilherme Vieira Da Rocha Departamento de Ciência e Tecnologia Aeroespacial
  • Nickolas Batista Mendonça Machado Departamento de Ciência e Tecnologia Aeroespacial
  • Dr. Guilherme Micheli Bedini Moreira Departamento de Ciência e Tecnologia Aeroespacial
  • Dra. Emilia Villani Instituto Tecnológico de Aeronáutica https://orcid.org/0000-0002-6804-1453

DOI:

https://doi.org/10.3384/wcc215.1189

Keywords:

Local Large Language Models, Critical Decision Method, Aviation Safety, Ethics and AI, Automated Qualitative Analysis

Abstract

This paper examines how local Large Language Models (LLMs) can partially automate the Critical Decision Method (CDM) in aviation safety investigations. The CDM, while widely respected for its ability to elucidate human factors and decision-making processes in rare or complex scenarios, often requires labour-intensive qualitative coding. To address this challenge, we developed a pipeline employing two specialised models: Phi-3-Mini-Instruct for generating structured responses and Zephyr-7B-Beta as a “judge” that evaluates confidence, completeness, and groundedness. A single anonymised incident served as our pilot case. Seventy-two participants (36 professional pilots and 36 novices) responded to a 53-item CDM-inspired questionnaire, creating a human reference dataset. The pipeline’s performance was benchmarked against both this human data and a classical NLP baseline (TF-IDF + SVM). Results revealed that the LLM agreed with the human majority answer on 78% of multiple-choice items and achieved a mean absolute error (MAE) of 0.38 on Likert-scale questions. Its open-ended responses, although moderately accurate, occasionally exhibited factual hallucinations (e.g., referencing non-existent systems) and role misattributions. Further stratification showed that the LLM outperformed novices but did not match the pilots’ domain expertise, underscoring the importance of operational familiarity for nuanced decision analyses. Although the single-incident scope limits statistical generalisation, these findings suggest that LLM-based tools can substantially expedite repetitive data processing and support consistent categorisation tasks that often consume investigators’ time. Future work will expand to multiple incidents, integrate flight data recorder (FDR) and cockpit voice recorder (CVR) information to reduce speculation, and refine both the self-evaluation mechanisms and the ethical safeguards.
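For readers unfamiliar with the two-model setup, the sketch below shows one way such a generator/judge pipeline could be wired together with the Hugging Face transformers library. The checkpoint names, prompt wording, and 1-to-5 rubric are illustrative assumptions; the abstract does not specify the authors' actual prompts, decoding settings, or scoring scheme.

```python
# Minimal sketch of a generator/judge pipeline like the one summarised above.
# Prompts, the 1-5 scale, and the checkpoint names are assumptions for
# illustration, not the authors' implementation.
from transformers import pipeline

# Assumed public checkpoints for the two models named in the abstract.
generator = pipeline("text-generation", model="microsoft/Phi-3-mini-4k-instruct")
judge = pipeline("text-generation", model="HuggingFaceH4/zephyr-7b-beta")

def answer_and_score(question: str, incident_summary: str) -> dict:
    """Generate a CDM answer, then have the judge rate it on the three criteria."""
    gen_prompt = (
        f"Incident summary:\n{incident_summary}\n\n"
        f"CDM question: {question}\n"
        "Answer concisely, using only facts from the summary."
    )
    answer = generator(gen_prompt, max_new_tokens=256,
                       return_full_text=False)[0]["generated_text"]

    judge_prompt = (
        f"Question: {question}\nAnswer: {answer}\n\n"
        "Rate this answer from 1 (poor) to 5 (excellent) on confidence, "
        "completeness, and groundedness in the summary. "
        "Reply with three integers only."
    )
    scores = judge(judge_prompt, max_new_tokens=32,
                   return_full_text=False)[0]["generated_text"]
    return {"answer": answer, "judge_scores": scores}
```

For the Likert-scale benchmark, the reported MAE of 0.38 is presumably the mean of |model score − human score| over all Likert items, though the abstract does not state the exact aggregation.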

Published

2025-10-28

Section

6. Systems and sub-system engineering