Cognition and Computation in Decision-Making

Applying the Critical Decision Method to Artificial Intelligence for Aviation Event Analysis

Authors

  • Giovane de Morais Instituto Tecnológico de Aeronáutica
  • Ingrid Kawani Leandro Strohm Instituto Tecnológico de Aeronáutica
  • Dr. Moacyr Machado Cardoso Júnior Instituto Tecnológico de Aeronáutica https://orcid.org/0000-0002-2801-0329
  • Guilherme Vieira Da Rocha Departamento de Ciência e Tecnologia Aeroespacial
  • Nickolas Batista Mendonça Machado Departamento de Ciência e Tecnologia Aeroespacial
  • Dr. Guilherme Micheli Bedini Moreira Departamento de Ciência e Tecnologia Aeroespacial
  • Dra. Emilia Villani Instituto Tecnológico de Aeronáutica https://orcid.org/0000-0002-6804-1453

DOI:

https://doi.org/10.3384/wcc215.1189

Keywords:

Local Large Language Models, Critical Decision Method, Aviation Safety, Ethics and AI, Automated Qualitative Analysis

Abstract

This paper examines how local Large Language Models (LLMs) can partially automate the Critical Decision Method (CDM) in aviation safety investigations. The CDM, while widely respected for its ability to elucidate human factors and decision-making processes in rare or complex scenarios, often requires labour-intensive qualitative coding. To address this challenge, we developed a pipeline employing two specialised models: Phi-3-Mini-Instruct for generating structured responses and Zephyr-7B-Beta as a “judge” that evaluates confidence, completeness, and groundedness. A single anonymised incident served as our pilot case. Seventy-two participants (36 professional pilots and 36 novices) responded to a 53-item CDM-inspired questionnaire, creating a human reference dataset. The pipeline’s performance was benchmarked against both this human data and a classical NLP baseline (TF-IDF + SVM). Results revealed that the LLM agreed with the human majority answer on 78% of multiple-choice items and achieved a mean absolute error (MAE) of 0.38 on Likert-scale questions. Its open-ended responses, although moderately accurate, occasionally exhibited factual hallucinations (e.g., referencing non-existent systems) and role misattributions. Further stratification showed that the LLM outperformed novices but did not match the pilots’ domain expertise, underscoring the importance of operational familiarity for nuanced decision analyses. Although the single-incident scope limits statistical generalisation, these findings suggest that LLM-based tools can substantially expedite repetitive data processing and support consistent categorisation tasks that often consume investigators’ time. Future work will expand to multiple incidents, integrate flight data recorder (FDR) and cockpit voice recorder (CVR) information to reduce speculation, and refine both the self-evaluation mechanisms and the ethical safeguards.
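For readers unfamiliar with the two-model setup, the sketch below shows one way such a generator/judge pipeline could be wired together with the Hugging Face transformers library. The checkpoint names, prompt wording, and 1-to-5 rubric are illustrative assumptions; the abstract does not specify the authors' actual prompts, decoding settings, or scoring scheme.

```python
# Minimal sketch of a generator/judge pipeline like the one summarised above.
# Prompts, the 1-5 scale, and the checkpoint names are assumptions for
# illustration, not the authors' implementation.
from transformers import pipeline

# Assumed public checkpoints for the two models named in the abstract.
generator = pipeline("text-generation", model="microsoft/Phi-3-mini-4k-instruct")
judge = pipeline("text-generation", model="HuggingFaceH4/zephyr-7b-beta")

def answer_and_score(question: str, incident_summary: str) -> dict:
    """Generate a CDM answer, then have the judge rate it on the three criteria."""
    gen_prompt = (
        f"Incident summary:\n{incident_summary}\n\n"
        f"CDM question: {question}\n"
        "Answer concisely, using only facts from the summary."
    )
    answer = generator(gen_prompt, max_new_tokens=256,
                       return_full_text=False)[0]["generated_text"]

    judge_prompt = (
        f"Question: {question}\nAnswer: {answer}\n\n"
        "Rate this answer from 1 (poor) to 5 (excellent) on confidence, "
        "completeness, and groundedness in the summary. "
        "Reply with three integers only."
    )
    scores = judge(judge_prompt, max_new_tokens=32,
                   return_full_text=False)[0]["generated_text"]
    return {"answer": answer, "judge_scores": scores}
```

For the Likert-scale benchmark, the reported MAE of 0.38 is presumably the mean of |model score − human score| over all Likert items, though the abstract does not state the exact aggregation.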

Published

2025-10-28

Section

6. Systems and sub-system engineering