Anomaly Detection, Prognostics, and Diagnostics: Machine Learning for the Hadron Calorimeter at the CMS Experiment
Doctoral thesis
Published version

Date
2024
Collections
- Doctoral Dissertations [426]
- Publikasjoner fra CRIStin [4568]
Original version
Asres, M. W. (2024). Anomaly Detection, Prognostics, and Diagnostics: Machine Learning for the Hadron Calorimeter at the CMS Experiment [Doctoral dissertation]. University of Agder.
Abstract
Machine Learning (ML) tools have gained immense popularity due to the proliferation of sensor data for monitoring, prognostic, and diagnostic applications in various industrial domains. The growing system complexity and monitoring data volumes of the Large Hadron Collider (LHC) at CERN accentuate the need for automation through advanced ML tools. Detecting, identifying, and resolving anomalies is essential to generating more physics collision data of the highest quality. Developing ML tools for complex systems often involves expensive data curation and modeling efforts: it requires adequate, cleaned, and annotated data sets, and it must address the heterogeneity and curse of dimensionality of large data sets. The Compact Muon Solenoid (CMS) experiment, one of the large general-purpose detectors at the LHC, has dedicated substantial monitoring effort to detector systems and particle data quality; the control and safety systems (DCS/DSS) actively monitor safety-critical problems, and the data quality monitoring (DQM) system mitigates data loss by identifying and diagnosing physics data problems. The existing monitoring systems need to incorporate a wide range of monitoring variables and adapt to the evolving conditions of the detectors. This dissertation focuses on the development of unsupervised anomaly detection (AD), anomaly prediction (AP), and root-cause analysis (RCA) for multivariate time series data sets. We have developed deep learning models for the frontend electronics of the Hadron Calorimeter (HCAL) of the CMS detector using diagnostic sensors and high-dimensional particle acquisition channel-monitoring data sets. We have employed subsystem-granularity modeling using a divide-and-conquer approach to monitor the complex HCAL systems with thousands of sensors. Our monitoring tools have detected and identified previously unknown and hard-to-monitor anomalies, and have extended the monitoring, diagnostics, and prognostics automation of the HCAL.
The developed tools are deployed at CERN and currently provide essential real-time and offline anomaly monitoring and diagnostics for the frontend electronics of the HCAL and the online DQM system. Our scientific contributions in tackling the challenges of complex system monitoring include: 1) enhancing multivariate sensor AD, 2) a promising AP approach, 3) context-aware high-dimensional spatio-temporal AD, 4) transfer learning on multi-network deep learning models, 5) lightweight interconnection and divergence discovery for multi-systems with multivariate sensors, and 6) enhancing the computational efficiency of anomaly causality discovery on binary anomaly data.