January 25, 2020

Paper Group ANR 1752

Approximating the Permanent by Sampling from Adaptive Partitions


Title	Approximating the Permanent by Sampling from Adaptive Partitions
Authors	Jonathan Kuck, Tri Dao, Hamid Rezatofighi, Ashish Sabharwal, Stefano Ermon
Abstract	Computing the permanent of a non-negative matrix is a core problem with practical applications ranging from target tracking to statistical thermodynamics. However, this problem is also #P-complete, which leaves little hope for finding an exact solution that can be computed efficiently. While the problem admits a fully polynomial randomized approximation scheme, this method has seen little use because it is both inefficient in practice and difficult to implement. We present AdaPart, a simple and efficient method for drawing exact samples from an unnormalized distribution. Using AdaPart, we show how to construct tight bounds on the permanent which hold with high probability, with guaranteed polynomial runtime for dense matrices. We find that AdaPart can provide empirical speedups exceeding 25x over prior sampling methods on matrices that are challenging for variational based approaches. Finally, in the context of multi-target tracking, exact sampling from the distribution defined by the matrix permanent allows us to use the optimal proposal distribution during particle filtering. Using AdaPart, we show that this leads to improved tracking performance using an order of magnitude fewer samples.
Tasks
Published	2019-11-26
URL	https://arxiv.org/abs/1911.11856v1
PDF	https://arxiv.org/pdf/1911.11856v1.pdf
PWC	https://paperswithcode.com/paper/approximating-the-permanent-by-sampling-from-1
Repo
Framework
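
To make the quantity in the abstract above concrete, here is a minimal sketch of the matrix permanent computed directly from its definition (a sum over all permutations of products of entries). This is not AdaPart and not the authors' code; the function name `permanent` is an illustrative assumption, and the factorial cost of this brute-force version is exactly why approximation methods are of interest.

```python
from itertools import permutations
from math import prod


def permanent(matrix):
    """Exact permanent of a square matrix by brute force.

    Sums, over every permutation sigma of the column indices, the product
    matrix[0][sigma[0]] * ... * matrix[n-1][sigma[n-1]].
    Cost grows as n! * n, so this is only usable for very small n.
    """
    n = len(matrix)
    if any(len(row) != n for row in matrix):
        raise ValueError("matrix must be square")
    return sum(
        prod(matrix[i][sigma[i]] for i in range(n))
        for sigma in permutations(range(n))
    )


print(permanent([[1, 2], [3, 4]]))  # 1*4 + 2*3 = 10
```

Unlike the determinant, every term enters with a positive sign, so cancellation tricks such as Gaussian elimination do not apply.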

Deep Bayesian Recurrent Neural Networks for Somatic Variant Calling in Cancer


Title	Deep Bayesian Recurrent Neural Networks for Somatic Variant Calling in Cancer
Authors	Geoffroy Dubourg-Felonneau, Omar Darwish, Christopher Parsons, Dami Rebergen, John W Cassidy, Nirmesh Patel, Harry W Clifford
Abstract	The emerging field of precision oncology relies on the accurate pinpointing of alterations in the molecular profile of a tumor to provide personalized targeted treatments. Current methodologies in the field commonly include the application of next generation sequencing technologies to a tumor sample, followed by the identification of mutations in the DNA known as somatic variants. The differentiation of these variants from sequencing error poses a classic classification problem, which has traditionally been approached with Bayesian statistics, and more recently with supervised machine learning methods such as neural networks. Although these methods provide greater accuracy, classic neural networks lack the ability to indicate the confidence of a variant call. In this paper, we explore the performance of deep Bayesian neural networks on next generation sequencing data, and their ability to give probability estimates for somatic variant calls. In addition to demonstrating similar performance in comparison to standard neural networks, we show that the resultant output probabilities make these better suited to the disparate and highly-variable sequencing data-sets these models are likely to encounter in the real world. We aim to deliver algorithms to oncologists for which model certainty better reflects accuracy, for improved clinical application. By moving away from point estimates to reliable confidence intervals, we expect the resultant clinical and treatment decisions to be more robust and more informed by the underlying reality of the tumor molecular profile.
Tasks
Published	2019-12-06
URL	https://arxiv.org/abs/1912.04174v1
PDF	https://arxiv.org/pdf/1912.04174v1.pdf
PWC	https://paperswithcode.com/paper/deep-bayesian-recurrent-neural-networks-for
Repo
Framework

PoseConvGRU: A Monocular Approach for Visual Ego-motion Estimation by Learning


Title	PoseConvGRU: A Monocular Approach for Visual Ego-motion Estimation by Learning
Authors	Guangyao Zhai, Liang Liu, Linjian Zhang, Yong Liu
Abstract	While many visual ego-motion algorithm variants have been proposed in the past decade, learning-based ego-motion estimation methods have attracted increasing attention because of their desirable properties of robustness to image noise and camera calibration independence. In this work, we propose a data-driven approach of fully trainable visual ego-motion estimation for a monocular camera. We use an end-to-end learning approach that allows the model to map directly from input image pairs to an estimate of ego-motion (parameterized as 6-DoF transformation matrices). To achieve this, we introduce a novel two-module long-term recurrent convolutional neural network called PoseConvGRU, with an explicit sequence pose estimation loss. The feature-encoding module encodes the short-term motion feature in an image pair, while the memory-propagating module captures the long-term motion feature in the consecutive image pairs. The visual memory is implemented with convolutional gated recurrent units, which allow information to propagate over time. At each time step, two consecutive RGB images are stacked together to form a 6-channel tensor for module-1 to learn how to extract motion information and estimate poses. The sequence of output maps is then passed through a stacked ConvGRU module to generate the relative transformation pose of each image pair. We also augment the training data by randomly skipping frames to simulate velocity variation, which results in better performance in turning and high-velocity situations. We evaluate the performance of our proposed approach on the KITTI Visual Odometry benchmark. The experiments show that the proposed method is competitive with the geometric method and encourage further exploration of learning-based methods for estimating camera ego-motion, even though geometrical methods demonstrate promising results.
Tasks	Calibration, Motion Estimation, Pose Estimation, Visual Odometry
Published	2019-06-19
URL	https://arxiv.org/abs/1906.08095v1
PDF	https://arxiv.org/pdf/1906.08095v1.pdf
PWC	https://paperswithcode.com/paper/poseconvgru-a-monocular-approach-for-visual
Repo
Framework
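
The input preparation described in the abstract above (stacking two RGB frames into one 6-channel input, and skipping frames at random to vary apparent velocity) can be sketched as follows. This is a plain-Python illustration under stated assumptions, not the authors' pipeline: frames are nested lists of shape H x W x 3, and the names `stack_pair`, `sample_pair_indices`, and the uniform stride policy with `max_skip` are all hypothetical.

```python
import random


def stack_pair(frame_a, frame_b):
    """Concatenate two H x W x 3 frames along the channel axis (H x W x 6)."""
    return [
        [pix_a + pix_b for pix_a, pix_b in zip(row_a, row_b)]
        for row_a, row_b in zip(frame_a, frame_b)
    ]


def sample_pair_indices(num_frames, max_skip, rng):
    """Pair frame t with frame t + stride, stride drawn uniformly from 1..max_skip.

    A stride of 1 gives ordinary consecutive pairs; larger strides mimic
    faster camera motion. The stride is clipped at the end of the sequence.
    """
    pairs, t = [], 0
    while t < num_frames - 1:
        stride = min(rng.randint(1, max_skip), num_frames - 1 - t)
        pairs.append((t, t + stride))
        t += stride
    return pairs


# Usage: a 1 x 1 "image" pair becomes a single 6-channel pixel.
print(stack_pair([[[1, 2, 3]]], [[[4, 5, 6]]]))  # [[[1, 2, 3, 4, 5, 6]]]
print(sample_pair_indices(10, 3, random.Random(0)))
```

In a real training setup the stacking would normally be done on array or tensor types; the list version only shows the channel layout.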

Fine-grained Knowledge Fusion for Sequence Labeling Domain Adaptation


Title	Fine-grained Knowledge Fusion for Sequence Labeling Domain Adaptation
Authors	Huiyun Yang, Shujian Huang, Xinyu Dai, Jiajun Chen
Abstract	In sequence labeling, previous domain adaptation methods focus on the adaptation from the source domain to the entire target domain without considering the diversity of individual target domain samples, which may lead to negative transfer results for certain samples. Besides, an important characteristic of sequence labeling tasks is that different elements within a given sample may also have diverse domain relevance, which requires further consideration. To take the multi-level domain relevance discrepancy into account, in this paper, we propose a fine-grained knowledge fusion model with the domain relevance modeling scheme to control the balance between learning from the target domain data and learning from the source domain model. Experiments on three sequence labeling tasks show that our fine-grained knowledge fusion model outperforms strong baselines and other state-of-the-art sequence labeling domain adaptation methods.
Tasks	Domain Adaptation
Published	2019-09-10
URL	https://arxiv.org/abs/1909.04315v1
PDF	https://arxiv.org/pdf/1909.04315v1.pdf
PWC	https://paperswithcode.com/paper/fine-grained-knowledge-fusion-for-sequence
Repo
Framework

GANs-NQM: A Generative Adversarial Networks based No Reference Quality Assessment Metric for RGB-D Synthesized Views


Title	GANs-NQM: A Generative Adversarial Networks based No Reference Quality Assessment Metric for RGB-D Synthesized Views
Authors	Suiyi Ling, Jing Li, Junle Wang, Patrick Le Callet
Abstract	In this paper, we propose a no-reference (NR) quality metric for RGB plus image-depth (RGB-D) synthesis images based on Generative Adversarial Networks (GANs), namely GANs-NQM. Because inpainting fails on dis-occluded regions in the RGB-D synthesis process, capturing the non-uniformly distributed local distortions and learning their impact on perceptual quality are challenging tasks for objective quality metrics. In our study, based on the characteristics of GANs, we propose i) a novel training strategy of GANs for RGB-D synthesis images using existing large-scale computer vision datasets rather than an RGB-D dataset; ii) a referenceless quality metric based on the trained discriminator by learning a 'Bag of Distortion Word' (BDW) codebook and a local distortion regions selector; iii) a hole filling inpainter, i.e., the generator of the trained GANs, for RGB-D dis-occluded regions as a side outcome. According to the experimental results on the IRCCyN/IVC DIBR database, the proposed model outperforms the state-of-the-art quality metrics and is more applicable in real scenarios. The corresponding context inpainter also shows appealing results over other inpainting algorithms.
Tasks
Published	2019-03-28
URL	http://arxiv.org/abs/1903.12088v1
PDF	http://arxiv.org/pdf/1903.12088v1.pdf
PWC	https://paperswithcode.com/paper/gans-nqm-a-generative-adversarial-networks
Repo
Framework

Incremental Class Discovery for Semantic Segmentation with RGBD Sensing


Title	Incremental Class Discovery for Semantic Segmentation with RGBD Sensing
Authors	Yoshikatsu Nakajima, Byeongkeun Kang, Hideo Saito, Kris Kitani
Abstract	This work addresses the task of open world semantic segmentation using RGBD sensing to discover new semantic classes over time. Although there are many types of objects in the real world, current semantic segmentation methods make a closed world assumption and are trained only to segment a limited number of object classes. Towards a more open world approach, we propose a novel method that incrementally learns new classes for image segmentation. The proposed system first segments each RGBD frame using both color and geometric information, and then aggregates that information to build a single segmented dense 3D map of the environment. The segmented 3D map representation is a key component of our approach as it is used to discover new object classes by identifying coherent regions in the 3D map that have no semantic label. The use of coherent regions in the 3D map as a primitive element, rather than traditional elements such as surfels or voxels, also significantly reduces the computational complexity and memory use of our method. It thus leads to semi-real-time performance at 10.7 Hz when incrementally updating the dense 3D map at every frame. Through experiments on the NYUDv2 dataset, we demonstrate that the proposed method is able to correctly cluster objects of both known and unseen classes. We also show the quantitative comparison with the state-of-the-art supervised methods, the processing time of each step, and the influences of each component.
Tasks	Semantic Segmentation
Published	2019-07-23
URL	https://arxiv.org/abs/1907.10008v1
PDF	https://arxiv.org/pdf/1907.10008v1.pdf
PWC	https://paperswithcode.com/paper/incremental-class-discovery-for-semantic
Repo
Framework

Fusion of Detected Objects in Text for Visual Question Answering


Title	Fusion of Detected Objects in Text for Visual Question Answering
Authors	Chris Alberti, Jeffrey Ling, Michael Collins, David Reitter
Abstract	To advance models of multimodal context, we introduce a simple yet powerful neural architecture for data that combines vision and natural language. The “Bounding Boxes in Text Transformer” (B2T2) also leverages referential information binding words to portions of the image in a single unified architecture. B2T2 is highly effective on the Visual Commonsense Reasoning benchmark (https://visualcommonsense.com), achieving a new state-of-the-art with a 25% relative reduction in error rate compared to published baselines and obtaining the best performance to date on the public leaderboard (as of May 22, 2019). A detailed ablation analysis shows that the early integration of the visual features into the text analysis is key to the effectiveness of the new architecture. A reference implementation of our models is provided (https://github.com/google-research/language/tree/master/language/question_answering/b2t2).
Tasks	Question Answering, Visual Commonsense Reasoning, Visual Question Answering
Published	2019-08-14
URL	https://arxiv.org/abs/1908.05054v2
PDF	https://arxiv.org/pdf/1908.05054v2.pdf
PWC	https://paperswithcode.com/paper/fusion-of-detected-objects-in-text-for-visual
Repo
Framework

Exploring the Back Alleys: Analysing The Robustness of Alternative Neural Network Architectures against Adversarial Attacks


Title	Exploring the Back Alleys: Analysing The Robustness of Alternative Neural Network Architectures against Adversarial Attacks
Authors	Yi Xiang Marcus Tan, Yuval Elovici, Alexander Binder
Abstract	We investigate to what extent alternative variants of Artificial Neural Networks (ANNs) are susceptible to adversarial attacks. We analyse the adversarial robustness of conventional, stochastic ANNs and Spiking Neural Networks (SNNs) in the raw image space, across three different datasets. Our experiments reveal that stochastic ANN variants are almost as susceptible as conventional ANNs when faced with simple iterative gradient-based attacks in the white-box setting. However, we observe that in black-box settings, stochastic ANNs are more robust than conventional ANNs when faced with boundary attacks, transferability and surrogate attacks. Consequently, we propose improved attacks and defence mechanisms for stochastic ANNs in black-box settings. When performing surrogate-based black-box attacks, one can employ stochastic models as surrogates to observe higher attack success on both stochastic and deterministic targets. Against stochastic targets, this success can be further improved with our proposed Variance Mimicking (VM) surrogate training method. Finally, adopting a defender’s perspective, we investigate the plausibility of employing stochastic switching of model mixtures as a viable hardening mechanism. We observe that such a scheme does provide a partial hardening.
Tasks
Published	2019-12-08
URL	https://arxiv.org/abs/1912.03609v3
PDF	https://arxiv.org/pdf/1912.03609v3.pdf
PWC	https://paperswithcode.com/paper/exploring-the-back-alleys-analysing-the
Repo
Framework

Statistical Learning for Analysis of Networked Control Systems over Unknown Channels


Title	Statistical Learning for Analysis of Networked Control Systems over Unknown Channels
Authors	Konstantinos Gatsis, George J. Pappas
Abstract	Recent control trends are increasingly relying on communication networks and wireless channels to close the loop for Internet-of-Things applications. Traditionally, these approaches are model-based, i.e., assuming a network or channel model, they focus on stability analysis and appropriate controller designs. However, obtaining such wireless channel models is fundamentally challenging in practice, as channels are typically unknown a priori and only available through data samples. In this work, we aim to develop algorithms that rely on channel sample data to determine the stability and performance of networked control tasks. In this regard, our work is the first to characterize the amount of channel modeling that is required to answer such a question. Specifically, we examine how many channel data samples are required in order to answer with high confidence whether a given networked control system is stable or not. This analysis is based on the notion of sample complexity from the learning literature and is facilitated by concentration inequalities. Moreover, we establish a direct relation between the sample complexity and the networked system stability margin, i.e., the underlying packet success rate of the channel and the spectral radius of the dynamics of the control system. This illustrates that it becomes impractical to verify stability under a large range of plant and channel configurations. We validate our theoretical results in numerical simulations.
Tasks
Published	2019-11-08
URL	https://arxiv.org/abs/1911.03422v1
PDF	https://arxiv.org/pdf/1911.03422v1.pdf
PWC	https://paperswithcode.com/paper/statistical-learning-for-analysis-of
Repo
Framework
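
As a rough illustration of how a concentration inequality turns into a sample count, the sketch below applies the standard two-sided Hoeffding bound to the estimation of a packet success rate from independent 0/1 samples. This is a generic textbook bound, not the sample-complexity result derived in the paper; the function name `hoeffding_samples` and the reading of `epsilon` as a stand-in for the stability margin are assumptions made for illustration.

```python
import math


def hoeffding_samples(epsilon, delta):
    """Samples needed so the empirical mean of i.i.d. values in [0, 1]
    is within epsilon of the true mean with probability >= 1 - delta.

    From Hoeffding: P(|mean_hat - mean| >= eps) <= 2 * exp(-2 * n * eps**2),
    so n >= ln(2 / delta) / (2 * eps**2) suffices.
    """
    if epsilon <= 0 or not 0 < delta < 1:
        raise ValueError("need epsilon > 0 and 0 < delta < 1")
    return math.ceil(math.log(2 / delta) / (2 * epsilon ** 2))


print(hoeffding_samples(0.1, 0.05))   # 185
print(hoeffding_samples(0.05, 0.05))  # 738
```

Halving the tolerance roughly quadruples the required samples, which matches the abstract's qualitative point that verification becomes impractical as the margin shrinks.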

Cross-lingual Pre-training Based Transfer for Zero-shot Neural Machine Translation


Title	Cross-lingual Pre-training Based Transfer for Zero-shot Neural Machine Translation
Authors	Baijun Ji, Zhirui Zhang, Xiangyu Duan, Min Zhang, Boxing Chen, Weihua Luo
Abstract	Transfer learning between different language pairs has shown its effectiveness for Neural Machine Translation (NMT) in low-resource scenarios. However, existing transfer methods involving a common target language are far from successful in the extreme scenario of zero-shot translation, due to the language space mismatch problem between transferor (the parent model) and transferee (the child model) on the source side. To address this challenge, we propose an effective transfer learning approach based on cross-lingual pre-training. Our key idea is to make all source languages share the same feature space and thus enable a smooth transition for zero-shot translation. To this end, we introduce one monolingual pre-training method and two bilingual pre-training methods to obtain a universal encoder for different languages. Once the universal encoder is constructed, the parent model built on this encoder is trained with large-scale annotated data and then directly applied in the zero-shot translation scenario. Experiments on two public datasets show that our approach significantly outperforms a strong pivot-based baseline and various multilingual NMT approaches.
Tasks	Machine Translation, Transfer Learning
Published	2019-12-03
URL	https://arxiv.org/abs/1912.01214v1
PDF	https://arxiv.org/pdf/1912.01214v1.pdf
PWC	https://paperswithcode.com/paper/cross-lingual-pre-training-based-transfer-for
Repo
Framework

Moment-Based Variational Inference for Markov Jump Processes


Title	Moment-Based Variational Inference for Markov Jump Processes
Authors	Christian Wildner, Heinz Koeppl
Abstract	We propose moment-based variational inference as a flexible framework for approximate smoothing of latent Markov jump processes. The main ingredient of our approach is to partition the set of all transitions of the latent process into classes. This allows us to express the Kullback-Leibler divergence between the approximate and the exact posterior process in terms of a set of moment functions that arise naturally from the chosen partition. To illustrate possible choices of the partition, we consider special classes of jump processes that frequently occur in applications. We then extend the results to parameter inference and demonstrate the method on several examples.
Tasks
Published	2019-05-14
URL	https://arxiv.org/abs/1905.05451v1
PDF	https://arxiv.org/pdf/1905.05451v1.pdf
PWC	https://paperswithcode.com/paper/moment-based-variational-inference-for-markov
Repo
Framework

A critical analysis of self-supervision, or what we can learn from a single image


Title	A critical analysis of self-supervision, or what we can learn from a single image
Authors	Yuki M. Asano, Christian Rupprecht, Andrea Vedaldi
Abstract	We look critically at popular self-supervision techniques for learning deep convolutional neural networks without manual labels. We show that three different and representative methods, BiGAN, RotNet and DeepCluster, can learn the first few layers of a convolutional network from a single image as well as using millions of images and manual labels, provided that strong data augmentation is used. However, for deeper layers the gap with manual supervision cannot be closed even if millions of unlabelled images are used for training. We conclude that: (1) the weights of the early layers of deep networks contain limited information about the statistics of natural images, that (2) such low-level statistics can be learned through self-supervision just as well as through strong supervision, and that (3) the low-level statistics can be captured via synthetic transformations instead of using a large image dataset.
Tasks	Data Augmentation, Representation Learning, Unsupervised Representation Learning
Published	2019-04-30
URL	https://arxiv.org/abs/1904.13132v3
PDF	https://arxiv.org/pdf/1904.13132v3.pdf
PWC	https://paperswithcode.com/paper/surprising-effectiveness-of-few-image
Repo
Framework

DOA Estimation by DNN-based Denoising and Dereverberation from Sound Intensity Vector


Title	DOA Estimation by DNN-based Denoising and Dereverberation from Sound Intensity Vector
Authors	Masahiro Yasuda, Yuma Koizumi, Luca Mazzon, Shoichiro Saito, Hisashi Uematsu
Abstract	We propose a direction of arrival (DOA) estimation method that combines sound-intensity vector (IV)-based DOA estimation and DNN-based denoising and dereverberation. Since the accuracy of IV-based DOA estimation degrades due to environmental noise and reverberation, two DNNs are used to remove such effects from the observed IVs. DOA is then estimated from the refined IVs based on the physics of wave propagation. Experiments on an open dataset showed that the average DOA error of the proposed method was 0.528 degrees, and it outperformed a conventional IV-based and DNN-based DOA estimation method.
Tasks	Denoising
Published	2019-10-10
URL	https://arxiv.org/abs/1910.04415v1
PDF	https://arxiv.org/pdf/1910.04415v1.pdf
PWC	https://paperswithcode.com/paper/doa-estimation-by-dnn-based-denoising-and
Repo
Framework
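
The final step mentioned in the abstract above, reading a direction off a refined intensity vector, reduces to converting a 3D vector into azimuth and elevation angles. The sketch below shows only that geometric conversion. It assumes the input vector has already been oriented to point toward the source (sign conventions for acoustic intensity differ between formulations), and the function name `doa_from_intensity` is hypothetical; the denoising and dereverberation networks are not represented.

```python
import math


def doa_from_intensity(ix, iy, iz):
    """Return (azimuth, elevation) in degrees for a 3D direction vector.

    Azimuth is measured in the x-y plane from the +x axis;
    elevation is measured upward from the x-y plane.
    """
    azimuth = math.degrees(math.atan2(iy, ix))
    elevation = math.degrees(math.atan2(iz, math.hypot(ix, iy)))
    return azimuth, elevation


# Usage: a vector along +y lies at 90 degrees azimuth, 0 degrees elevation.
print(doa_from_intensity(0.0, 1.0, 0.0))
```

Because only the direction matters, the vector does not need to be normalised before the conversion.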

Merging External Bilingual Pairs into Neural Machine Translation


Title	Merging External Bilingual Pairs into Neural Machine Translation
Authors	Tao Wang, Shaohui Kuang, Deyi Xiong, António Branco
Abstract	As neural machine translation (NMT) is not easily amenable to explicit correction of errors, incorporating pre-specified translations into NMT is widely regarded as a non-trivial challenge. In this paper, we propose and explore three methods to endow NMT with pre-specified bilingual pairs. Instead of, for instance, modifying the beam search algorithm during decoding or making complex modifications to the attention mechanism (mainstream approaches to tackling this challenge), we experiment with appropriately pre-processing the training data to add information about pre-specified translations. Extra embeddings are also used to distinguish pre-specified tokens from the other tokens. Extensive experimentation and analysis indicate that over 99% of the pre-specified phrases are successfully translated (given an 85% baseline) and that there is also a substantive improvement in translation quality with the methods explored here.
Tasks	Machine Translation
Published	2019-12-02
URL	https://arxiv.org/abs/1912.00567v1
PDF	https://arxiv.org/pdf/1912.00567v1.pdf
PWC	https://paperswithcode.com/paper/merging-external-bilingual-pairs-into-neural
Repo
Framework

Who Needs Words? Lexicon-Free Speech Recognition


Title	Who Needs Words? Lexicon-Free Speech Recognition
Authors	Tatiana Likhomanenko, Gabriel Synnaeve, Ronan Collobert
Abstract	Lexicon-free speech recognition naturally deals with the problem of out-of-vocabulary (OOV) words. In this paper, we show that character-based language models (LMs) can perform as well as word-based LMs for speech recognition, in terms of word error rate (WER), even without restricting the decoding to a lexicon. We study character-based LMs and show that convolutional LMs can effectively leverage large (character) contexts, which is key for good speech recognition performance downstream. We specifically show that the lexicon-free decoding performance (WER) on utterances with OOV words using character-based LMs is better than lexicon-based decoding with either character-based or word-based LMs.
Tasks	Speech Recognition
Published	2019-04-09
URL	https://arxiv.org/abs/1904.04479v4
PDF	https://arxiv.org/pdf/1904.04479v4.pdf
PWC	https://paperswithcode.com/paper/who-needs-words-lexicon-free-speech
Repo
Framework
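
Since the abstract above reports its results as word error rate (WER), a minimal sketch of the usual definition may help: the word-level edit distance between hypothesis and reference, divided by the number of reference words. This is the standard metric, not the authors' evaluation script; the function name `word_error_rate` and whitespace tokenisation are assumptions.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference length.

    Uses a row-by-row Levenshtein distance over whitespace-separated words.
    """
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    prev = list(range(len(hyp) + 1))  # distances from an empty reference
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting the first i reference words
        for j, hyp_word in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,                       # deletion
                curr[j - 1] + 1,                   # insertion
                prev[j - 1] + (ref_word != hyp_word),  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


print(word_error_rate("the cat sat", "the cat sat"))  # 0.0
```

Note that WER can exceed 1.0 when the hypothesis contains many inserted words.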