Human detection of machine manipulated media
Groh, Matthew; Epstein, Ziv; Obradovich, Nick; Cebrian, Manuel; Rahwan, Iyad
doi: N/A; pmid: N/A
Abstract: Recent advances in neural networks for content generation enable artificial intelligence (AI) models to generate high-quality media manipulations. Here we report on a randomized experiment designed to study the effect of exposure to media manipulations on over 15,000 individuals' ability to discern machine-manipulated media. We engineer a neural network to plausibly and automatically remove objects from images, and we deploy this neural network online with a randomized experiment where participants can guess which image out of a pair of images has been manipulated. The system provides participants feedback on the accuracy of each guess. In the experiment, we randomize the order in which images are presented, allowing causal identification of the learning curve surrounding participants' ability to detect fake content. We find sizable and robust evidence that individuals learn to detect fake content through exposure to manipulated media when provided iterative feedback on their detection attempts. Over a succession of only ten images, participants increase their rating accuracy by over ten percentage points. Our study provides initial evidence that human ability to detect fake, machine-generated content may increase alongside the prevalence of such media online.
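Because image order is randomized, accuracy can be compared across trial positions to trace a learning curve. The sketch below is a minimal illustration of that tabulation, assuming hypothetical records of the form (participant_id, position, correct). It is not the authors' analysis code, and the toy records are made up rather than taken from the study.

```python
from collections import defaultdict


def accuracy_by_position(trials):
    """Return {position: fraction correct} from (participant_id, position, correct) records.

    With randomized image order, each position sees a random mix of images, so a
    difference between positions reflects exposure/feedback rather than image difficulty.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for _participant, position, correct in trials:
        totals[position] += 1
        hits[position] += int(correct)
    return {pos: hits[pos] / totals[pos] for pos in sorted(totals)}


if __name__ == "__main__":
    # Toy, made-up records for illustration only (not data from the study).
    toy = [("a", 1, False), ("b", 1, True), ("a", 2, True), ("b", 2, True)]
    print(accuracy_by_position(toy))
```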
Generative Mask Pyramid Network for CT/CBCT Metal Artifact Reduction with Joint Projection-Sinogram Correction
Liao, Haofu; Lin, Wei-An; Huo, Zhimin; Vogelsang, Levon; Sehnert, William J.; Zhou, S. Kevin; Luo, Jiebo
doi: 10.48550/arXiv.1907.00294; pmid: N/A
Abstract: A conventional approach to computed tomography (CT) or cone beam CT (CBCT) metal artifact reduction is to replace the X-ray projection data within the metal trace with synthesized data. However, existing projection or sinogram completion methods cannot always produce anatomically consistent information to fill the metal trace, and thus, when the metallic implant is large, significant secondary artifacts are often introduced. In this work, we propose to replace metal artifact affected regions with anatomically consistent content through joint projection-sinogram correction as well as adversarial learning. To handle metallic implants of diverse shapes and large sizes, we also propose a novel mask pyramid network that enforces the mask information across the network's encoding layers and a mask fusion loss that reduces early saturation of adversarial training. Our experimental results show that the proposed projection-sinogram correction designs are effective and our method recovers information from the metal traces better than the state-of-the-art methods.
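The "conventional approach" mentioned above fills the metal trace with synthesized projection data. One common baseline for this is per-projection linear interpolation across the trace; the sketch below shows that baseline only, not the authors' adversarial mask pyramid network. The function name and the (n_angles, n_detectors) array layout are assumptions for illustration.

```python
import numpy as np


def interpolate_metal_trace(sinogram, metal_mask):
    """Fill the metal trace of each projection row by 1-D linear interpolation.

    sinogram:   (n_angles, n_detectors) array of projection values (assumed layout)
    metal_mask: boolean array of the same shape, True inside the metal trace
    """
    filled = sinogram.astype(float).copy()
    bins = np.arange(sinogram.shape[1])
    for i in range(sinogram.shape[0]):
        inside = metal_mask[i]
        # Interpolate only when the row has both corrupted and clean detector bins.
        if inside.any() and (~inside).any():
            filled[i, inside] = np.interp(bins[inside], bins[~inside], filled[i, ~inside])
    return filled
```

Interpolation of this kind ignores anatomy, which is why large implants tend to leave the secondary artifacts the abstract describes.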
COded Taking And Giving (COTAG): Enhancing Transport Layer Performance over Indoor Millimeter Wave Access Networks
Wu, Zongshen; Huang, Chin-Ya; Ramanathan, Parameswaran
doi: N/A; pmid: N/A
Abstract: Millimeter wave (mmWave) access networks have the potential to meet the high-throughput and low-latency needs of immersive applications. However, because mmWave beams are highly directional and susceptible to misalignment and blockage caused by user movements and rotations, mmWave links are vulnerable to large channel fluctuations. These fluctuations have disproportionately adverse effects on the performance of transport layer protocols such as the Transmission Control Protocol (TCP). To overcome this challenge, we propose a network layer solution, the COded Taking And Giving (COTAG) scheme, to sustain low-latency and high-throughput end-to-end TCP performance in dually connected networks. In particular, COTAG creates network-encoded packets at the network gateway and at each access point (AP) to adaptively take the spare bandwidth on each link for transmission. Further, if the bandwidth of one link drops due to user movements, COTAG actively gives up the transmission opportunity by conditionally dropping packets. Consequently, COTAG adapts to link quality changes in mmWave access networks and enhances TCP performance without jeopardizing the latency of immersive content delivery. To evaluate the effectiveness of COTAG, we conduct experiments using off-the-shelf APs and network simulations. The evaluation results show that COTAG significantly improves end-to-end TCP performance in both throughput and latency.
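To illustrate why network-encoded packets let one link use spare bandwidth on behalf of another, the sketch below uses plain XOR coding of two packets. This is a generic network coding example chosen for clarity. The abstract does not specify COTAG's actual encoding, and the helper name `xor_bytes` is hypothetical.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length packets (generic network coding, not COTAG's scheme)."""
    if len(a) != len(b):
        raise ValueError("packets must have equal length")
    return bytes(x ^ y for x, y in zip(a, b))


# Gateway: packet p1 goes over link A; link B, which has spare bandwidth,
# carries the coded packet p1 XOR p2 instead of a plain copy.
p1, p2 = b"HELLO", b"WORLD"
coded = xor_bytes(p1, p2)

# Receiver: holding p1 and the coded packet, it recovers p2 without a retransmission,
# so a packet that is dropped or delayed on a degraded link need not stall TCP.
recovered = xor_bytes(coded, p1)
assert recovered == p2
```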
BERTphone: Phonetically-Aware Encoder Representations for Utterance-Level Speaker and Language Recognition
Ling, Shaoshi; Salazar, Julian; Liu, Yuzong; Kirchhoff, Katrin
doi: 10.21437/Odyssey.2020-2; pmid: N/A
Abstract: We introduce BERTphone, a Transformer encoder trained on large speech corpora that outputs phonetically-aware contextual representation vectors that can be used for both speaker and language recognition. This is accomplished by training on two objectives: the first, inspired by adapting BERT to the continuous domain, involves masking spans of input frames and reconstructing the whole sequence for acoustic representation learning; the second, inspired by the success of bottleneck features from ASR, is a sequence-level CTC loss applied to phoneme labels for phonetic representation learning. We pretrain two BERTphone models (one on Fisher and one on TED-LIUM) and use them as feature extractors into x-vector-style DNNs for both tasks. We attain a state-of-the-art $C_{\text{avg}}$ of 6.16 on the challenging LRE07 3sec closed-set language recognition task. On Fisher and VoxCeleb speaker recognition tasks, we see an 18% relative reduction in speaker EER when training on BERTphone vectors instead of MFCCs. In general, BERTphone outperforms previous phonetic pretraining approaches on the same data. We release our code and models at this https URL.
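The first BERTphone objective masks spans of input frames and reconstructs the whole sequence. The sketch below shows only the span-masking step on a (frames, features) array. The masking probability, span length, and zero-filling are illustrative assumptions, not the paper's settings; a model would then be trained to reconstruct the original frames from the masked input.

```python
import numpy as np


def mask_spans(frames, mask_prob=0.15, span=3, rng=None):
    """Zero out random contiguous spans of frames; return (masked_frames, frame_mask).

    frames: (n_frames, n_features) array. mask_prob and span are assumed values.
    """
    rng = np.random.default_rng() if rng is None else rng
    n = frames.shape[0]
    mask = np.zeros(n, dtype=bool)
    # Choose enough span starts to cover roughly mask_prob of the frames.
    n_starts = max(1, int(round(mask_prob * n / span)))
    starts = rng.choice(n, size=min(n_starts, n), replace=False)
    for s in starts:
        mask[s:s + span] = True
    masked = frames.copy()
    masked[mask] = 0.0
    return masked, mask
```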
Exploring Conditioning for Generative Music Systems with Human-Interpretable Controls
Meade, Nicholas; Barreyre, Nicholas; Lowe, Scott C.; Oore, Sageev
doi: N/A; pmid: N/A
Abstract: Performance RNN is a machine-learning system designed primarily for the generation of solo piano performances using an event-based (rather than audio) representation. More specifically, Performance RNN is a long short-term memory (LSTM) based recurrent neural network that models polyphonic music with expressive timing and dynamics (Oore et al., 2018). The neural network uses a simple language model based on the Musical Instrument Digital Interface (MIDI) file format. Performance RNN is trained on the e-Piano Junior Competition Dataset (International Piano e-Competition, 2018), a collection of solo piano performances by expert pianists. As an artistic tool, one of the limitations of the original model has been the lack of usable controls. The standard form of Performance RNN can generate interesting pieces, but it provides little control over what specifically is generated. This paper explores a set of conditioning-based controls used to influence the generation process.
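To make the event-based representation concrete, the sketch below maps MIDI-like events to integer tokens. It assumes the vocabulary as we recall it from Oore et al. (2018): 128 note-on, 128 note-off, 100 time-shift (10 ms steps up to 1 s), and 32 velocity events. The offsets, rounding, and helper names are illustrative assumptions, not Magenta's code.

```python
NUM_PITCHES = 128
NUM_TIME_SHIFTS = 100     # assumed: 10 ms steps, from 10 ms up to 1 s
NUM_VELOCITY_BINS = 32    # assumed: MIDI velocity 0..127 quantized to 32 bins

# Token ranges laid out one after another (hypothetical ordering).
NOTE_ON_OFFSET = 0
NOTE_OFF_OFFSET = NOTE_ON_OFFSET + NUM_PITCHES
TIME_SHIFT_OFFSET = NOTE_OFF_OFFSET + NUM_PITCHES
VELOCITY_OFFSET = TIME_SHIFT_OFFSET + NUM_TIME_SHIFTS
VOCAB_SIZE = VELOCITY_OFFSET + NUM_VELOCITY_BINS


def note_on(pitch):
    return NOTE_ON_OFFSET + pitch


def note_off(pitch):
    return NOTE_OFF_OFFSET + pitch


def time_shift(ms):
    """Encode a shift of 10..1000 ms, rounded to the nearest 10 ms step and clamped."""
    steps = min(max(round(ms / 10), 1), NUM_TIME_SHIFTS)
    return TIME_SHIFT_OFFSET + steps - 1


def velocity(midi_velocity):
    """Quantize a MIDI velocity (0..127) into one of NUM_VELOCITY_BINS bins."""
    return VELOCITY_OFFSET + midi_velocity * NUM_VELOCITY_BINS // 128
```

For example, middle C (pitch 60) played at velocity 80 and held for 500 ms becomes `[velocity(80), note_on(60), time_shift(500), note_off(60)]`. Pauses longer than 1 s would be written as several consecutive time-shift tokens.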
Searching for Apparel Products from Images in the Wild
Tran, Son; Du, Ming; Chanda, Sampath; Manmatha, R.; Taylor, Cj
doi: 10.48550/arXiv.1907.02244; pmid: N/A
Abstract: In this age of social media, people often look at what others are wearing. In particular, Instagram and Twitter influencers often post images of themselves wearing different outfits, and their followers are often inspired to buy similar clothes. We propose a system that automatically finds the most visually similar clothes in an online catalog (street-to-shop search). The problem is challenging because the street images are taken under varied poses and lighting conditions. The system first localizes high-level descriptive regions (top, bottom, wristwear, ...) using multiple CNN detectors, such as YOLO and SSD, trained specifically for the apparel domain. It then classifies these regions into more specific categories such as t-shirts, tunics, or dresses. Finally, a feature embedding learned with a multi-task objective is computed for every item, compared with the corresponding items in the online catalog database, and ranked by distance. We validate our approach component-wise using benchmark datasets and end-to-end using human evaluation.
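The final retrieval step compares an item's embedding with catalog embeddings and ranks them by distance. Below is a minimal sketch of that ranking, assuming Euclidean distance and a catalog array already filtered to the item's predicted category (for example, only dresses). The embedding model itself is out of scope, and the function name is hypothetical.

```python
import numpy as np


def rank_catalog(query, catalog, top_k=5):
    """Return (indices, distances) of the top_k catalog embeddings closest to query.

    query:   (d,) embedding of the detected street item
    catalog: (n, d) embeddings of catalog items in the same category (assumed pre-filtered)
    """
    dists = np.linalg.norm(catalog - query, axis=1)
    order = np.argsort(dists, kind="stable")[:top_k]
    return order, dists[order]
```

A brute-force scan like this is fine for small catalogs; large catalogs typically use an approximate nearest-neighbor index instead.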
Artificial Intelligence Enhances the Performance of Chaos-based Wireless Communication
Ren, Hai-Peng; Zhao, Hong-Er; Bai, Chao; Yin, Hui-Ping; Grebogi, Celso
doi: 10.1049/cmu2.12162; pmid: N/A
Abstract: Several new findings about chaos-based wireless communication systems have been reported recently. First, chaos has been proven to be the optimal communication waveform because chaotic signals can achieve the maximum signal-to-noise ratio at the receiver with the simplest matched filter. Second, the information transmitted in chaotic signals is not modified by the multipath wireless channel. Third, chaos properties can be used to relieve the inter-symbol interference (ISI) caused by multipath propagation. Although recent work has reported a method for obtaining the optimal threshold to eliminate ISI in chaos-based wireless communication, its practical implementation remains a challenge: it requires knowing the channel parameters and all symbols in advance, especially the future symbol to be transmitted, which is almost impossible in practical communication systems. Owing to recent developments in artificial intelligence (AI), we propose a convolutional neural network (CNN) with a deep learning structure to predict future symbols from the received signal, so as to further reduce ISI and obtain better bit error rate (BER) performance than the existing sub-optimal threshold. The key feature of the method is that it predicts the future symbol and thereby obtains a better threshold suited to time-variant channels. Numerical simulations and experimental results validate our theory and the superiority of the proposed method.