Identification of Nonlinear Causality in Multivariate Systems by Designing Interpretable Machine Learning Models
Doctoral thesis
Published version
Permanent link
https://hdl.handle.net/11250/3159544
Date of issue
2024
Original version
Roy, K. (2024). Identification of Nonlinear Causality in Multivariate Systems by Designing Interpretable Machine Learning Models [Doctoral Dissertation]. University of Agder.
Abstract
The study of inference and data analysis over networks has become increasingly important, owing to the growing number of interconnected systems and the vast amounts of data they produce. Many of these systems generate data in the form of multivariate time series, which comprise simultaneous observations across multiple sensor variables. Such systems arise in financial engineering, sensor networks for monitoring different types of environments, brain signal processing, and interconnected systems in water networks and the oil and gas sector.
By examining these time series, one can reveal critical insights into system behavior, identify patterns, make predictions, or impute missing data, among other tasks. Consequently, designing effective methods for analyzing data and drawing inferences from networks of multivariate time series is a pivotal research area with applications across many fields. In this Ph.D. thesis, our primary focus is identifying the causal relationships between time series through the design of interpretable algorithms for data prediction and imputation of missing data.
A novel approach for joint topology identification and missing data imputation is developed based on the assumption that the time series are generated in two steps: a linear vector autoregressive (VAR) process in a latent space, followed by an invertible nonlinear mapping per sensor. The approach combines a nonlinear VAR model with sparsity-promoting techniques; it is compared with DNN-based models and shown to be superior to linear methods. Experiments on synthetic and real datasets show that the algorithm robustly identifies nonlinear topology and handles missing and noisy data.
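As a rough illustration of the two-step generative assumption described above, the following Python sketch simulates a latent VAR process, passes each latent series through an invertible per-sensor nonlinearity, and then recovers the graph support. Everything specific here is an assumption made for illustration and is not taken from the thesis: the VAR order (one), the sinh-based mapping and its `scales` parameters, the function names `simulate` and `estimate_topology`, and the thresholded least-squares estimator. In particular, the sketch treats the nonlinearities as known, whereas the work summarized here learns them jointly with the topology.

```python
import numpy as np


def simulate(A, scales, T, noise_std=0.3, seed=0):
    """Latent VAR(1) process followed by one invertible map per sensor.

    The map z = sinh(c * y) / c is an illustrative choice: it is strictly
    increasing, so it has the closed-form inverse y = arcsinh(c * z) / c.
    """
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    Y = np.zeros((T, n))  # latent series, one column per sensor
    for t in range(1, T):
        Y[t] = A @ Y[t - 1] + noise_std * rng.standard_normal(n)
    Z = np.sinh(scales * Y) / scales  # measured series
    return Y, Z


def estimate_topology(Z, scales, threshold=0.1):
    """Invert the (assumed known) maps, then fit the VAR by least squares."""
    Y_hat = np.arcsinh(scales * Z) / scales  # back to the latent space
    X, Y_next = Y_hat[:-1], Y_hat[1:]
    # Solve Y_next ≈ X @ A.T in the least-squares sense.
    B, *_ = np.linalg.lstsq(X, Y_next, rcond=None)
    A_hat = B.T
    # Crude sparsification: keep only coefficients above the threshold.
    support = np.abs(A_hat) > threshold
    return A_hat, support


# Sparse, stable latent dynamics (all eigenvalues equal 0.5).
A_true = np.array([[0.5, 0.0, 0.0],
                   [0.4, 0.5, 0.0],
                   [0.0, -0.4, 0.5]])
scales = np.array([1.0, 2.0, 0.5])

Y, Z = simulate(A_true, scales, T=5000)
A_hat, support = estimate_topology(Z, scales)
print(support.astype(int))
```

An entry `support[i, j]` set to true indicates that series `j` is estimated to influence series `i` at lag one. Replacing the hard threshold with an l1-penalized fit would be closer in spirit to the sparsity-promoting techniques mentioned in the abstract, but it is omitted here to keep the sketch short.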
This Ph.D. thesis is structured as a compilation of papers, comprising six chapters and four appendices. The initial section sets the stage by establishing the rationale for the research and examining the relevant scholarly work. Following that, the necessary background is introduced, including graph topology identification, nonlinear function approximation, regularization, iterative optimization algorithms, interpretability, and invertible neural networks. Next, we introduce a novel algorithm for identifying nonlinear topologies, assuming that the data originate from a sparse vector autoregressive (VAR) model in a latent space and are mapped into the measurement space through a set of invertible nonlinearities, and we outline our data-driven approaches for estimating the nonlinear graph. Subsequently, we tackle the issue of incomplete data, investigating how the identified topology can be leveraged to impute missing entries. Finally, we present a less complex algorithm that focuses on reducing the mean squared error in the latent space, and we demonstrate its efficiency in comparison with the more intricate methods.
Consists of
Paper I: Roy, K., Lopez-Ramos, L. M. & Beferull-Lozano, B. (2022). Joint signal estimation and nonlinear topology identification from noisy data with missing entries. 56th IEEE Asilomar Conference on Signals, Systems, and Computers, 436–440. https://doi.org/10.1109/IEEECONF56349.2022.10051968. Accepted version. Full text is not available in AURA as a separate file.
Paper II: Roy, K., Lopez-Ramos, L. M. & Beferull-Lozano, B. (2022). Joint learning of topology and invertible nonlinearities from multiple time series. 2022 IEEE International Seminar on Machine Learning, Optimization, and Data Science (ISMODE), 483–488. https://doi.org/10.1109/ISMODE56940.2022.10180965. Accepted version. Full text is not available in AURA as a separate file.
Paper III: Roy, K., Lopez-Ramos, L. M. & Beferull-Lozano, B. (Forthcoming). Efficient interpretable nonlinear modeling for multiple time series. Submitted version to IEEE Transactions on Signal Processing. Full text is not available in AURA as a separate file.
Paper IV: Lopez-Ramos, L. M., Roy, K. & Beferull-Lozano, B. (2021). Explainable nonlinear modelling of multiple time series with invertible neural networks. International Conference on Intelligent Technologies and Applications, 17–30. https://doi.org/10.1007/978-3-031-10525-8_2. Accepted version. Full text is not available in AURA as a separate file.