Breaking domain barriers: mixture of experts for cross-domain fake news detection
Liguori, Angelica; Pisani, Francesco Sergio; Comito, Carmela; Guarascio, Massimo; Manco, Giuseppe
doi: 10.1007/s10994-025-06827-9
pmid: N/A
Social media have become a key tool for rapidly spreading information worldwide, amplifying the risks of misinformation and fake news. This is also intensified by the fact that fake news covers a wide range of topics across multiple domains. Machine learning, particularly language models, offers a promising solution for detecting fake news. However, a major limitation of existing methods is their inability to classify instances from new or unseen domains. To tackle this issue, we introduce MERMAID, a mixture of experts approach that leverages the knowledge from different specialized models to classify examples from unknown domains. Each expert is initially trained on a specific known domain and then fine-tuned using data from other known domains. A model merging procedure is then applied to combine related experts, reducing the number of models required for predicting instances from unknown domains. In addition, our approach can effectively be used in few-shot learning scenarios, where a small amount of data from the target/unknown domain is available during training. Experiments on five benchmark datasets demonstrate the effectiveness of our method in both zero-shot and few-shot learning settings.
TransFed: cross-domain feature alignment for semi-supervised federated transfer learning
Zeng, Linghui; Liu, Ruixuan; Xiong, Li; Ho, Joyce C.
doi: 10.1007/s10994-025-06805-1
pmid: N/A
Healthcare institutions often need to collaborate on developing predictive models while adhering to privacy regulations and handling heterogeneous data collection practices. Traditional federated learning approaches assume shared feature spaces or patient populations across institutions, limiting their applicability in real-world healthcare settings where different institutions collect distinct sets of patient data. We propose TransFed, a novel semi-supervised federated transfer learning framework that enables effective collaboration across healthcare institutions with heterogeneous feature spaces. Our framework combines cross-domain feature alignment with semi-supervised learning to leverage both labeled and unlabeled data, while maintaining privacy through federated learning principles. Using two large real-world clinical datasets, we demonstrate that TransFed effectively enables knowledge transfer without requiring direct data sharing or common feature spaces to improve prediction performance across domains and generalizes well to unseen healthcare systems.
Computing the distance between unbalanced distributions: the flat metric
Schmidt, Henri; Düll, Christian
doi: 10.1007/s10994-025-06828-8
pmid: 40726635
We provide an implementation to compute the flat metric in any dimension. The flat metric, also called dual bounded Lipschitz distance, generalizes the well-known Wasserstein distance $W_1$ to the case that the distributions are of unequal total mass. Thus, our implementation adapts very well to mass differences and uses them to distinguish between different distributions. This is of particular interest for unbalanced optimal transport tasks and for the analysis of data distributions where the sample size is important or normalization is not possible. The core of the method is based on a neural network to determine an optimal test function realizing the distance between two given measures. Special focus was put on achieving comparability of pairwise computed distances from independently trained networks. We tested the quality of the output in several experiments where ground truth was available as well as with simulated data.
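For reference, one common way to write the flat (dual bounded Lipschitz) distance between two finite nonnegative measures is given below, followed by a two-point example showing how a mass difference enters the distance. The notation (the measures \mu and \nu, the metric d, the masses a and b) is assumed here for illustration and is not taken from the paper, and conventions for the bounded Lipschitz constraint vary between authors.

```latex
\[
  d_{\mathrm{flat}}(\mu,\nu)
  = \sup\Bigl\{ \int_X f \,\mathrm{d}(\mu-\nu) \;:\;
      \|f\|_\infty \le 1,\ \operatorname{Lip}(f) \le 1 \Bigr\}
\]
% Two point masses: \mu = a\,\delta_x and \nu = b\,\delta_y with a, b \ge 0
\[
  d_{\mathrm{flat}}(a\,\delta_x,\, b\,\delta_y)
  = |a-b| + \min(a,b)\,\min\bigl(d(x,y),\,2\bigr)
\]
```

In the second formula, the term |a-b| is the cost of the unmatched mass. When a = b and d(x,y) <= 2, the expression reduces to a d(x,y), the transport cost of moving mass a from x to y.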
JANET: Joint Adaptive predictioN-region Estimation for Time-series
English, Eshant; Wong-Toi, Eliot; Fontana, Matteo; Mandt, Stephan; Smyth, Padhraic; Lippert, Christoph
doi: 10.1007/s10994-025-06812-2
pmid: N/A
Conformal prediction provides machine learning models with prediction sets that offer theoretical guarantees, but the underlying assumption of exchangeability limits its applicability to time series data. Furthermore, existing approaches struggle to handle multi-step ahead prediction tasks, where uncertainty estimates across multiple future time points are crucial. We propose JANET (Joint Adaptive predictioN-region Estimation for Time-series), a novel framework for constructing conformal prediction regions that are valid for both univariate and multivariate time series. JANET generalises the inductive conformal framework and efficiently produces joint prediction regions with controlled K-familywise error rates, enabling flexible adaptation to specific application needs. Our empirical evaluation demonstrates JANET’s superior performance in multi-step prediction tasks across diverse time series datasets, highlighting its potential for reliable and interpretable uncertainty quantification in sequential data.
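As background for the inductive conformal framework mentioned above, the following is a minimal sketch of split conformal prediction for a multi-step forecast. It uses the maximum absolute error over the horizon as the nonconformity score, so a single calibration quantile gives one band covering all steps jointly. This is the textbook construction under exchangeability, not JANET itself: it does not address time-series dependence or K-familywise error control, and the function names `conformal_radius` and `joint_band` are hypothetical.

```python
import math


def conformal_radius(cal_true, cal_pred, alpha):
    """Radius of a joint band over all horizon steps (split conformal sketch).

    cal_true, cal_pred: lists of equal-length sequences, one pair per
    calibration series. alpha: target miscoverage level in (0, 1).
    """
    # Nonconformity score: largest absolute error across the horizon.
    scores = sorted(
        max(abs(y - p) for y, p in zip(ys, ps))
        for ys, ps in zip(cal_true, cal_pred)
    )
    n = len(scores)
    # Finite-sample rank ceil((n + 1) * (1 - alpha)) of the calibration scores.
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration series for this alpha: the band is unbounded.
        return math.inf
    return scores[k - 1]


def joint_band(pred, radius):
    """Symmetric interval of the given radius around each forecast step."""
    return [(p - radius, p + radius) for p in pred]
```

A band for a new forecast is then `joint_band(new_pred, conformal_radius(cal_true, cal_pred, alpha))`. The max-over-horizon score is one simple design choice; it yields bands of constant width across steps, which may be looser than necessary at short horizons.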
Improving graph neural networks through feature importance learning
Alkhoury, Fouad; Horváth, Tamás; Bauckhage, Christian; Wrobel, Stefan
doi: 10.1007/s10994-025-06815-z
pmid: N/A
Graph neural networks (GNNs) are among the most widely used methods for node classification in graphs. A common strategy to improve their predictive performance is to enrich nodes with additional features. A weakness of this method is that the set of appropriate features can vary from graph to graph. We address this shortcoming by proposing a novel method. In a preprocessing step, a first GNN is trained on a set of graphs with varying structural properties, using a candidate set of node features fixed in advance. The resulting GNN model is then used to predict the most relevant features from the candidate set for unseen target graphs, which are later processed for node classification. For each target graph, a second GNN is trained on the graph, which is enriched with the node feature vectors calculated for the features selected by the first GNN. A key advantage of the proposed method is that the features are selected without computing the candidate features for the target graph. Our experimental results on synthetic and real-world graphs show that even a few features selected in this way are sufficient to significantly improve the predictive performance of GNNs that use either none or all of the candidate features. Moreover, the time needed to learn the second GNN for the target graph can be reduced by up to two orders of magnitude.
Single image inpainting and super-resolution with simultaneous uncertainty guarantees by universal reproducing kernels
Horváth, Bálint; Csáji, Balázs Csanád
doi: 10.1007/s10994-025-06814-0
pmid: N/A
The paper proposes a statistical learning approach to the problem of estimating missing pixels of images, crucial for image inpainting and super-resolution problems. One of the main novelties of the method is that it also provides uncertainty quantifications together with the estimated values. Our core assumption is that the underlying data-generating function comes from a reproducing kernel Hilbert space (RKHS). A special emphasis is put on band-limited functions, central to signal processing, which form Paley–Wiener type RKHSs. The proposed method, which we call simultaneously guaranteed kernel interpolation (SGKI), is an extension and refinement of a recently developed kernel method. An advantage of SGKI is that it not only estimates the missing pixels, but also builds non-asymptotic confidence bands for the unobserved values, which are simultaneously guaranteed for all missing pixels. We also show how to compute these bands efficiently using Schur complements, discuss a generalization to vector-valued functions, and present a series of numerical experiments on various datasets containing both synthetically generated and benchmark images.
Construction of the Kolmogorov-Arnold networks using the Newton-Kaczmarz method
Poluektov, Michael; Polar, Andrew
doi: 10.1007/s10994-025-06800-6
pmid: N/A
It is known that any continuous multivariate function can be represented exactly by a composition of functions of a single variable, the so-called Kolmogorov-Arnold representation. It can be a convenient tool for tasks where it is required to obtain a predictive model that maps some vector input of a black box system into a scalar output. In this case, the representation may not be exact, and it is more correct to refer to such a structure as the Kolmogorov-Arnold model (or, as more recently popularised, ‘network’). Construction of such a model based on the recorded input–output data is a challenging task. In the present paper, it is suggested to decompose the underlying functions of the representation into continuous basis functions and parameters. It is then proposed to find the parameters using the Newton-Kaczmarz method for solving systems of non-linear equations. The algorithm is then modified to support parallelisation. The paper demonstrates that such an approach is also an excellent tool for data-driven solution of partial differential equations. Numerical examples show that for the considered model, the Newton-Kaczmarz method for parameter estimation is efficient and more robust with respect to the selection of the initial guess than the straightforward application of the Gauss-Newton method. Finally, the Kolmogorov-Arnold model is compared to MATLAB’s built-in neural networks on a relatively large-scale problem (25 inputs, datasets of 10 million records), significantly outperforming the multilayer perceptrons in this particular problem (4–10 min vs. 4–8 h of training time, as well as higher accuracy, lower CPU usage, and smaller memory footprint).
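To give a feel for the row-by-row update style behind the method's name, here is a minimal sketch of the classical cyclic Kaczmarz iteration for a consistent linear system A x = b. It covers only the linear case and is not the paper's Newton-Kaczmarz algorithm for non-linear equations; the function name `kaczmarz` and the `sweeps` parameter are hypothetical.

```python
def kaczmarz(A, b, sweeps=100):
    """Cyclic Kaczmarz iteration for a consistent linear system A x = b.

    A: list of rows (lists of floats), b: list of floats.
    Starts from x = 0 and processes one equation at a time.
    """
    n = len(A[0])
    x = [0.0] * n
    for _ in range(sweeps):
        for row, b_i in zip(A, b):
            # Residual of the current equation at the current iterate.
            residual = b_i - sum(a * xi for a, xi in zip(row, x))
            norm_sq = sum(a * a for a in row)
            if norm_sq == 0.0:
                continue  # skip an all-zero row
            # Orthogonal projection of x onto the hyperplane row . x = b_i.
            step = residual / norm_sq
            x = [xi + step * a for xi, a in zip(x, row)]
    return x
```

Each update touches a single equation, so no matrix factorisation or full Jacobian is needed. The price is that many sweeps may be required when the rows are nearly parallel.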
Analyzing the effect of residual connections to oversmoothing in graph neural networks
Kelesis, Dimitrios; Fotakis, Dimitris; Paliouras, Georgios
doi: 10.1007/s10994-025-06822-0
pmid: N/A
The performance of Graph Neural Networks (GNNs) diminishes as their depth increases. This is mainly attributed to oversmoothing, which leads to similar node representations through repeated graph convolutions. To enable deep GNNs, several approaches have been proposed, including the use of residual connections. Residual connections have proven effective in benchmark datasets, but the way in which they improve the performance of deep GNNs has not been fully studied. We show that residual connections force the model to focus on the local neighborhood of graph nodes, making the GNN equivalent to the sum of shallow GCNs. We explain theoretically why this is the case and verify the theoretical results experimentally. However, our findings raise the question of whether residual connections are helpful in cases where deep networks are necessary. We assess this experimentally, in two situations: (a) in the presence of the “cold start” problem, i.e. when there is no feature information about unlabeled nodes; and (b) in a new synthetic dataset of controllable long-range interactions. These experiments highlight the drawbacks of GNNs using residual connections, while showing that simpler methods can be more effective.
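The "sum of shallow GCNs" statement can be illustrated in a heavily simplified linear setting, assumed here purely for illustration: with no weight matrices and no nonlinearity, a stack of L residual propagation layers H <- H + A_hat H equals (I + A_hat)^L H, which expands into a binomially weighted sum of k-hop propagations A_hat^k H for k = 0..L. The sketch below checks this identity numerically. It is not the paper's analysis, and all function names are hypothetical.

```python
from math import comb


def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def add(A, B):
    """Elementwise sum of two equally shaped matrices."""
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def scale(c, A):
    """Multiply every entry of A by the scalar c."""
    return [[c * a for a in row] for row in A]


def residual_stack(A_hat, H, depth):
    """Apply `depth` linear propagation layers with skip connections."""
    for _ in range(depth):
        H = add(H, matmul(A_hat, H))  # H <- H + A_hat H
    return H


def sum_of_shallow(A_hat, H, depth):
    """Binomially weighted sum of k-hop propagations A_hat^k H, k = 0..depth."""
    total = [[0.0] * len(H[0]) for _ in H]
    hop = H  # A_hat^0 H
    for k in range(depth + 1):
        total = add(total, scale(comb(depth, k), hop))
        hop = matmul(A_hat, hop)
    return total
```

For any square `A_hat` and feature matrix `H`, `residual_stack(A_hat, H, L)` and `sum_of_shallow(A_hat, H, L)` agree up to floating-point error. Real GCN layers include learned weights and nonlinearities, so this identity is only a toy analogue of the decomposition discussed in the abstract.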
Physics encoded blocks in residual neural network architectures for digital twin models
Zia, Muhammad Saad; Houpert, Corentin; Anjum, Ashiq; Liu, Lu; Conway, Anthony; Peña-Rios, Anasol
doi: 10.1007/s10994-025-06808-y
pmid: N/A
Physics Informed Machine Learning has emerged as a popular approach for modeling and simulation in digital twins, enabling the generation of accurate models of processes and behaviors in real-world systems. However, existing methods either rely on simple loss regularizations that offer limited physics integration or employ highly specialized architectures that are difficult to generalize across diverse physical systems. This paper presents a generic approach based on a novel physics-encoded residual neural network (PERNN) architecture that seamlessly combines data-driven and physics-based analytical models to overcome these limitations. Our method integrates differentiable physics blocks, which implement mathematical operators from physics-based models, with feed-forward learning blocks, while intermediate residual blocks ensure stable gradient flow during training. Consequently, the model naturally adheres to the underlying physical principles even when prior physics knowledge is incomplete, thereby improving generalizability with low data requirements and reduced model complexity. We investigate our approach in two application domains. The first is a steering model for autonomous vehicles in a simulation environment, and the second is a digital twin for climate modeling using an ordinary differential equation (ODE)-based model of Net Ecosystem Exchange (NEE) to enable gap-filling in flux tower data. In both cases, our method outperforms conventional neural network approaches as well as state-of-the-art Physics Informed Machine Learning methods.
Perfect counterfactuals in imperfect worlds: modelling noisy implementation of actions in sequential algorithmic recourse
Xuan, Yueqing; Sokol, Kacper; Sanderson, Mark; Chan, Jeffrey
doi: 10.1007/s10994-025-06821-1
pmid: N/A
Algorithmic recourse suggests actions to individuals who have been adversely affected by automated decision-making, helping them to achieve the desired outcome. Knowing the recourse, however, does not guarantee that users can implement it perfectly, either due to environmental variability or personal choices. Recourse generation should thus anticipate its sub-optimal or noisy implementation. While several approaches construct recourse that is robust to small perturbations – e.g., arising due to its noisy implementation – they assume that the entire recourse is implemented in a single step, and thus model the noise as one-off and uniform. But these assumptions are unrealistic since recourse often entails multiple sequential steps, which makes it harder to implement and subject to increasing noise. In this work, we consider recourse under plausible noise that adheres to the local data geometry and accumulates at every step of the way. We frame this problem as a Markov Decision Process and demonstrate that such a distribution of plausible noise satisfies the Markov property. We then propose the RObust SEquential (ROSE) recourse generator for tabular data; our method produces a series of steps leading to the desired outcome even when they are implemented imperfectly. Given plausible modelling of sub-optimal human actions and greater recourse robustness to accumulated uncertainty, ROSE provides users with a high chance of success while maintaining low recourse cost. Empirical evaluation shows that our algorithm effectively navigates the inherent trade-off between recourse robustness and cost while ensuring its sparsity and computational efficiency.