In-depth analysis of music structure as a text network
Tsai, Ping-Rui; Chou, Yen-Ting; Wang, Nathan-Christopher; Chen, Hui-Ling; Huang, Hong-Yue; Luo, Zih-Jia; Hong, Tzay-Ming
doi: 10.48550/arxiv.2303.13631
pmid: N/A
Abstract:Music, enchanting and poetic, permeates every corner of human civilization. Although music is not unfamiliar to people, our understanding of its essence remains limited, and there is still no universally accepted scientific description. This is primarily due to music being regarded as a product of both reason and emotion, making it difficult to define. In this article, we focus on the fundamental elements of music and construct an evolutionary network from the perspective of music as a natural language, aligning with the statistical characteristics of texts. Through this approach, we aim to comprehend the structural differences in music across different periods, enabling a more scientific exploration of music. Relying on the advantages of structuralism, we can concentrate on the relationships and order between the physical elements of music, rather than getting entangled in the blurred boundaries of science and philosophy. The scientific framework we present not only conforms to past conclusions in music, but also serves as a bridge that connects music to natural language processing and knowledge graphs.
Urgency-aware Routing in Single Origin-destination Itineraries through Artificial Currencies
Pedroso, Leonardo; Heemels, W. P. M. H.; Salazar, Mauro
doi: 10.1109/cdc49753.2023.10383739
pmid: N/A
Abstract:Within mobility systems, the presence of self-interested users can lead to aggregate routing patterns that are far from the societal optimum which could be achieved by centrally controlling the users' choices. In this paper, we design a fair incentive mechanism to steer the selfish behavior of the users to align with the societally optimal aggregate routing. The proposed mechanism is based on an artificial currency that cannot be traded or bought, but only spent or received when traveling. Specifically, we consider a parallel-arc network with a single origin and destination node within a repeated game setting whereby each user chooses from one of the available arcs to reach their destination on a daily basis. In this framework, taking faster routes comes at a cost, whereas taking slower routes is incentivized by a reward. The users are thus playing against their future selves when choosing their present actions. To capture this complex behavior, we assume the users to be rational and to minimize an urgency-weighted combination of their immediate and future discomfort. To design the optimal pricing, we first derive a closed-form expression for the best individual response strategy. Second, we formulate the pricing design problem for each arc to achieve the societally optimal aggregate flows, and reformulate it so that it can be solved with gradient-free optimization methods. Our numerical simulations show that it is possible to achieve a near-optimal routing whilst significantly reducing the users' perceived discomfort when compared to a centralized optimal but urgency-unaware policy.
Vortex Feature Positioning: Bridging Tabular IIoT Data and Image-Based Deep Learning
Park, Jong-Ik; Seong, Sihoon; Lee, JunKyu; Hong, Cheol-Ho
doi: 10.48550/arxiv.2303.09068
pmid: N/A
Abstract:Tabular data from IIoT devices are typically analyzed using decision tree-based machine learning techniques, which struggle with high-dimensional and numeric data. To overcome these limitations, techniques converting tabular data into images have been developed, leveraging the strengths of image-based deep learning approaches such as Convolutional Neural Networks. These methods cluster similar features into distinct image areas with fixed sizes, regardless of the number of features, resembling actual photographs. However, this increases the possibility of overfitting, as similar features, when selected carefully in a tabular format, are often discarded to prevent this issue. Additionally, fixed image sizes can lead to wasted pixels with fewer features, resulting in computational inefficiency. We introduce Vortex Feature Positioning (VFP) to address these issues. VFP arranges features based on their correlation, spacing similar ones in a vortex pattern from the image center, with the image size determined by the attribute count. VFP outperforms traditional machine learning methods and existing conversion techniques in tests across seven datasets with varying real-valued attributes.
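The center-out placement described above can be illustrated with a minimal sketch. It assumes the features are already ordered (for example by correlation) and simply lays them out on a square spiral starting at the image center, with the canvas size following from the number of features. The function names `spiral_coords` and `place_features`, the spiral direction, and the zero padding are illustrative assumptions, not the authors' VFP implementation.

```python
def spiral_coords(n):
    """Return the first n (row, col) offsets of a square spiral that starts at (0, 0)."""
    coords = [(0, 0)]
    r = c = 0
    dr, dc = 0, 1  # first move is to the right
    step = 1
    while len(coords) < n:
        for _ in range(2):  # each step length is used for two legs
            for _ in range(step):
                r, c = r + dr, c + dc
                coords.append((r, c))
                if len(coords) == n:
                    return coords
            dr, dc = dc, -dr  # turn: right -> down -> left -> up
        step += 1
    return coords


def place_features(values):
    """Place pre-ordered feature values on a grid, spiraling out from the center.

    Assumption: `values` is already sorted so that related features are adjacent.
    Cells not reached by the spiral are left at 0.0.
    """
    coords = spiral_coords(len(values))
    min_r = min(r for r, _ in coords)
    min_c = min(c for _, c in coords)
    height = max(r for r, _ in coords) - min_r + 1
    width = max(c for _, c in coords) - min_c + 1
    grid = [[0.0] * width for _ in range(height)]
    for value, (r, c) in zip(values, coords):
        grid[r - min_r][c - min_c] = value
    return grid


if __name__ == "__main__":
    for row in place_features([1, 2, 3, 4, 5, 6, 7, 8, 9]):
        print(row)
```

With nine features the canvas is 3 x 3 and the first feature sits in the middle cell; fewer features give a smaller canvas, so no pixels are reserved for attributes that do not exist.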
Uncertainty Driven Bottleneck Attention U-net for Organ at Risk Segmentation
Nazib, Abdullah; Hassan, Riad; Islam, Zahidul; Fookes, Clinton
doi: 10.48550/arxiv.2303.10796
pmid: N/A
Abstract:Organ at risk (OAR) segmentation in computed tomography (CT) imagery is a difficult task for automated segmentation methods and can be crucial for downstream radiation treatment planning. U-net has become a de facto standard for medical image segmentation and is frequently used as a baseline in medical image segmentation tasks. In this paper, we propose a multiple-decoder U-net architecture and use the segmentation disagreement between the decoders as attention to the bottleneck of the network for segmentation refinement. While feature correlation is used as attention in most cases, in our case it is the uncertainty from the network that is used as attention. For accurate segmentation, we also propose a CT intensity integrated regularization loss. The proposed regularization helps the model understand the intensity distribution of low-contrast tissues. We tested our model on two publicly available OAR challenge datasets. We also conducted an ablation study on each dataset with the proposed attention module and regularization loss. Experimental results demonstrate a clear accuracy improvement on both datasets.
A Joint Model and Data Driven Method for Distributed Estimation
He, Meng; Li, Ran; Huang, Chuan; Zhang, Shulong
doi: 10.1109/jiot.2023.3322940
pmid: N/A
Abstract:This paper considers the problem of distributed estimation in wireless sensor networks (WSNs), which are anticipated to support a wide range of applications such as environmental monitoring, weather forecasting, and location estimation. To this end, we propose a joint model and data driven distributed estimation method by designing the optimal quantizers and fusion center (FC) based on the Bayesian and minimum mean square error (MMSE) criteria. First, a universal mean square error (MSE) lower bound for quantization-based distributed estimation is derived and adopted as the design metric for the quantizers. Then, the optimality of the mean-fusion operation for the FC under the MMSE criterion is proved. Next, by exploiting different levels of statistical information about the desired parameter and observation noise, a joint model and data driven method is proposed to train parts of the quantizer and FC modules as deep neural networks (DNNs), and two loss functions derived from the MMSE criterion are adopted for the sequential training scheme. Furthermore, we extend the above results to the case of multi-bit quantizers, considering both the parallel and one-hot quantization schemes. Finally, simulation results reveal that the proposed method outperforms the state-of-the-art schemes in typical scenarios.
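The quantize-then-fuse pipeline can be sketched in a few lines. This is a minimal illustration assuming a fixed-threshold one-bit quantizer at each sensor and a fusion center that averages the received bits; the names `one_bit_quantize` and `mean_fusion`, the threshold, and the sample observations are hypothetical. The paper's quantizers and the post-processing at the FC are designed or learned and are not reproduced here.

```python
def one_bit_quantize(observation, threshold=0.0):
    """Map a real-valued sensor observation to a single bit (assumed fixed threshold)."""
    return 1 if observation > threshold else 0


def mean_fusion(bits):
    """Fusion-center step: average the bits received from all sensors."""
    return sum(bits) / len(bits)


if __name__ == "__main__":
    # Hypothetical noisy observations from four sensors.
    observations = [0.3, -0.1, 0.8, 0.2]
    bits = [one_bit_quantize(x) for x in observations]
    print(bits, mean_fusion(bits))
```

The averaged value is a summary of the received bits that does not depend on sensor ordering; an estimator would then map it to the parameter of interest.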
Asynchronous Decentralized Federated Lifelong Learning for Landmark Localization in Medical Imaging
Zheng, Guangyao; Jacobs, Michael A.; Braverman, Vladimir; Parekh, Vishwa S.
doi: 10.48550/arxiv.2303.06783
pmid: N/A
Abstract:Federated learning is a recent development in the machine learning area that allows a system of devices to train on one or more tasks without sharing their data to a single location or device. However, this framework still requires a centralized global model to consolidate individual models into one, and the devices train synchronously, both of which can be potential bottlenecks for using federated learning. In this paper, we propose a novel asynchronous decentralized federated lifelong learning (ADFLL) method that inherits the merits of federated learning and can train on multiple tasks simultaneously without the need for a central node or synchronous training, thus overcoming the potential drawbacks of conventional federated learning. We demonstrate excellent performance on the brain tumor segmentation (BRATS) dataset for localizing the left ventricle on multiple image sequences and image orientations. Our framework allows agents to achieve the best performance with a mean distance error of 7.81, better than the conventional all-knowing agent's mean distance error of 11.78, and significantly (p=0.01) better than a conventional lifelong learning agent with a distance error of 15.17 after eight rounds of training. In addition, all ADFLL agents have comparable or better performance than a conventional LL agent. In conclusion, we developed an ADFLL framework with excellent performance and speed-up compared to conventional RL agents.
Joint Beamforming for RIS-Assisted Integrated Sensing and Communication Systems
Xu, Yongqing; Li, Yong; Zhang, J. Andrew; Di Renzo, Marco; Quek, Tony Q. S.
doi: 10.1109/tcomm.2023.3344143
pmid: N/A
Abstract:Integrated sensing and communications (ISAC) is an emerging critical technique for the next generation of communication systems. However, because multiple performance metrics are used for communication and sensing, the limited degrees-of-freedom (DoF) in optimizing ISAC systems pose a challenge. Reconfigurable intelligent surfaces (RIS) can introduce new DoF for beamforming in ISAC systems, thereby enhancing the performance of communication and sensing simultaneously. In this paper, we propose two optimization techniques for beamforming in RIS-assisted ISAC systems. The first technique is an alternating optimization (AO) algorithm based on the semidefinite relaxation (SDR) method and a one-dimension iterative (ODI) algorithm, which can maximize the radar mutual information (MI) while imposing constraints on the communication rates. The second technique is an AO algorithm based on the Riemannian gradient (RG) method, which can maximize the weighted ISAC performance metrics. Simulation results verify the effectiveness of the proposed schemes. The AO-SDR-ODI method is shown to achieve better communication and sensing performance than the AO-RG method, at a higher complexity. It is also shown that the mean-squared-error (MSE) of the estimates of the sensing parameters decreases as the radar MI increases.
End-to-End Integration of Speech Separation and Voice Activity Detection for Low-Latency Diarization of Telephone Conversations
Morrone, Giovanni; Cornell, Samuele; Serafini, Luca; Zovato, Enrico; Brutti, Alessio; Squartini, Stefano
doi: 10.1016/j.specom.2024.103081
pmid: N/A
Abstract:Recent works show that speech separation guided diarization (SSGD) is an increasingly promising direction, mainly thanks to the recent progress in speech separation. It performs diarization by first separating the speakers and then applying voice activity detection (VAD) on each separated stream. In this work we conduct an in-depth study of SSGD in the conversational telephone speech (CTS) domain, focusing mainly on low-latency streaming diarization applications. We consider three state-of-the-art speech separation (SSep) algorithms and study their performance in both online and offline scenarios, considering non-causal and causal implementations as well as continuous SSep (CSS) windowed inference. We compare different SSGD algorithms on two widely used CTS datasets, CALLHOME and Fisher Corpus (Part 1 and 2), and evaluate both separation and diarization performance. To improve performance, a novel, causal and computationally efficient leakage removal algorithm is proposed, which significantly decreases false alarms. We also explore, for the first time, fully end-to-end SSGD integration between SSep and VAD modules. Crucially, this enables fine-tuning on real-world data for which oracle speaker sources are not available. In particular, our best model achieves 8.8% DER on CALLHOME, which outperforms the current state-of-the-art end-to-end neural diarization model, despite being trained on an order of magnitude less data and having significantly lower latency, i.e., 0.1 vs. 1 s. Finally, we also show that the separated signals can be readily used for automatic speech recognition, reaching performance close to that obtained with oracle sources in some configurations.
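The "separate first, then run VAD per stream" idea can be sketched as follows. The example assumes the separation step has already produced one waveform per speaker and substitutes a simple frame-energy threshold for the VAD; the names `energy_vad` and `diarize`, the frame length, and the threshold are illustrative assumptions rather than the models studied in the paper.

```python
def energy_vad(stream, frame_len, threshold):
    """Return one flag per frame: True when mean squared amplitude exceeds threshold.

    Assumption: an energy threshold stands in for a trained VAD model.
    Trailing samples that do not fill a whole frame are ignored.
    """
    flags = []
    for start in range(0, len(stream) - frame_len + 1, frame_len):
        frame = stream[start:start + frame_len]
        energy = sum(x * x for x in frame) / frame_len
        flags.append(energy > threshold)
    return flags


def diarize(separated_streams, frame_len=4, threshold=0.01):
    """Apply VAD independently to each separated stream (one stream per speaker)."""
    return [energy_vad(s, frame_len, threshold) for s in separated_streams]


if __name__ == "__main__":
    # Hypothetical separator output: speaker 1 talks first, speaker 2 second.
    speaker_1 = [0.5] * 4 + [0.0] * 4
    speaker_2 = [0.0] * 4 + [0.5] * 4
    print(diarize([speaker_1, speaker_2]))
```

Because each stream is processed independently, overlapping speech simply shows up as both streams being active in the same frame; residual leakage between streams would appear as false alarms in this scheme.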
TOLD: A Novel Two-Stage Overlap-Aware Framework for Speaker Diarization
Wang, Jiaming; Du, Zhihao; Zhang, Shiliang
doi: 10.48550/arxiv.2303.05397
pmid: N/A
Abstract:Recently, end-to-end neural diarization (EEND) has been introduced and has achieved promising results in speaker-overlapped scenarios. In EEND, speaker diarization is formulated as a multi-label prediction problem, where speaker activities are estimated independently and their dependencies are not well considered. To overcome these disadvantages, we employ power set encoding to reformulate speaker diarization as a single-label classification problem and propose the overlap-aware EEND (EEND-OLA) model, in which speaker overlaps and dependencies can be modeled explicitly. Inspired by the success of two-stage hybrid systems, we further propose a novel Two-stage OverLap-aware Diarization framework (TOLD) by involving a speaker overlap-aware post-processing (SOAP) model to iteratively refine the diarization results of EEND-OLA. Experimental results show that, compared with the original EEND, the proposed EEND-OLA achieves a 14.39% relative improvement in terms of diarization error rate (DER), and utilizing SOAP provides another 19.33% relative improvement. As a result, our method TOLD achieves a DER of 10.14% on the CALLHOME dataset, which is a new state-of-the-art result on this benchmark to the best of our knowledge.
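Power set encoding can be illustrated with a small sketch: each frame's multi-label activity vector (one 0/1 entry per speaker) is mapped to a single class index, so every combination of active speakers, including overlaps, becomes its own class. The bit ordering and the function names `powerset_encode` and `powerset_decode` are assumptions made for illustration; the paper's label layout may differ.

```python
def powerset_encode(activities):
    """Map a 0/1 speaker-activity vector to one class index.

    Assumption: speaker i contributes bit i, so [1, 0, 1] becomes 0b101 = 5.
    """
    return sum(bit << i for i, bit in enumerate(activities))


def powerset_decode(label, num_speakers):
    """Recover the 0/1 speaker-activity vector from a class index."""
    return [(label >> i) & 1 for i in range(num_speakers)]


if __name__ == "__main__":
    frame = [1, 0, 1]  # speakers 0 and 2 overlap in this frame
    label = powerset_encode(frame)
    print(label, powerset_decode(label, 3))
```

With N speakers this gives 2^N classes, and a single softmax over those classes can represent overlap explicitly instead of predicting each speaker independently.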
CuNeRF: Cube-Based Neural Radiance Field for Zero-Shot Medical Image Arbitrary-Scale Super Resolution
Chen, Zixuan; Lai, Jian-Huang; Yang, Lingxiao; Xie, Xiaohua
doi: 10.48550/arxiv.2303.16242
pmid: N/A
Abstract:Medical image arbitrary-scale super-resolution (MIASSR) has recently gained widespread attention, aiming to supersample medical volumes at arbitrary scales via a single model. However, existing MIASSR methods face two major limitations: (i) reliance on high-resolution (HR) volumes and (ii) limited generalization ability, which restrict their application in various scenarios. To overcome these limitations, we propose Cube-based Neural Radiance Field (CuNeRF), a zero-shot MIASSR framework that can yield medical images at arbitrary scales and viewpoints in a continuous domain. Unlike existing MIASSR methods that fit the mapping between low-resolution (LR) and HR volumes, CuNeRF focuses on building a coordinate-intensity continuous representation from LR volumes without the need for HR references. This is achieved by the proposed differentiable modules: cube-based sampling, isotropic volume rendering, and cube-based hierarchical rendering. Through extensive experiments on magnetic resonance imaging (MRI) and computed tomography (CT) modalities, we demonstrate that CuNeRF outperforms state-of-the-art MIASSR methods. CuNeRF yields better visual verisimilitude and reduces aliasing artifacts at various upsampling factors. Moreover, CuNeRF does not need any LR-HR training pairs, which makes it more flexible and easier to use than other methods. Our code is publicly released.