Predictive Risk Modeling via Natural Language Processing of Industrial Safety Reports

Authors

  • Kimberly Long Holt, Health and Safety Concepts-Environmental Health & Safety

DOI:

https://doi.org/10.55927/ijis.v5i2.8

Keywords:

Natural Language Processing, Machine Learning, Random Forest, Safety Management, Injury Prediction

Abstract

Industrial safety management produces lengthy incident investigation reports, yet the textual content of these reports is rarely used to its full extent. This study examines how Natural Language Processing (NLP) and Machine Learning (ML) can transform qualitative safety narratives into quantitative predictive information. A total of 16,878 construction accident records were used to compare several ML algorithms in terms of usefulness and elasticity. The model selected for application was the Random Forest, which achieved an accuracy of 79.3%, a precision of 77.1%, a recall of 78.0%, and an Area Under the Receiver Operating Characteristic Curve (AUROC) of 0.98. Feature importance analysis showed that accident mechanism and accident nature were the main predictors, whereas temporal and economic factors contributed the least. These results support the effectiveness of NLP for exploiting unstructured safety data, giving practitioners a resource-based, proactive risk intervention tool rather than a reactive mechanism.
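To make the reported accuracy, precision, and recall easier to interpret, the following is a minimal sketch of how such figures can be computed for a multi-class injury classifier. The abstract does not state which averaging scheme was used, so macro-averaging is an assumption here, and the function name `classification_metrics`, the class labels, and the toy predictions are all hypothetical. AUROC is left out because it requires predicted class probabilities rather than hard labels.

```python
from collections import Counter


def classification_metrics(y_true, y_pred):
    """Return (accuracy, macro precision, macro recall) for hard labels.

    Macro-averaging is an assumption; the abstract does not say which
    averaging scheme produced its reported figures.
    """
    labels = sorted(set(y_true) | set(y_pred))
    true_count = Counter(y_true)   # number of records per actual class
    pred_count = Counter(y_pred)   # number of records per predicted class
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)

    accuracy = sum(correct.values()) / len(y_true)
    # Per-class precision = correct / predicted; a class that was never
    # predicted contributes 0.0 to the average.
    precision = sum(
        correct[c] / pred_count[c] if pred_count[c] else 0.0 for c in labels
    ) / len(labels)
    # Per-class recall = correct / actual; a class that never occurs
    # contributes 0.0 to the average.
    recall = sum(
        correct[c] / true_count[c] if true_count[c] else 0.0 for c in labels
    ) / len(labels)
    return accuracy, precision, recall


# Hypothetical toy data: six accident records with three injury classes.
y_true = ["fracture", "fracture", "laceration", "laceration", "burn", "burn"]
y_pred = ["fracture", "laceration", "laceration", "laceration", "burn", "fracture"]

acc, prec, rec = classification_metrics(y_true, y_pred)
print(f"accuracy={acc:.3f} precision={prec:.3f} recall={rec:.3f}")
```

With macro-averaging every injury class carries equal weight regardless of how often it occurs, which is one reason precision and recall can differ from overall accuracy on imbalanced accident data.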

Published

2026-02-26

How to Cite

Holt, K. L. (2026). Predictive Risk Modeling via Natural Language Processing of Industrial Safety Reports. International Journal of Integrative Sciences, 5(2), 371–384. https://doi.org/10.55927/ijis.v5i2.8