Paper Group ANR 1752
Approximating the Permanent by Sampling from Adaptive Partitions. Deep Bayesian Recurrent Neural Networks for Somatic Variant Calling in Cancer. PoseConvGRU: A Monocular Approach for Visual Ego-motion Estimation by Learning. Fine-grained Knowledge Fusion for Sequence Labeling Domain Adaptation. GANs-NQM: A Generative Adversarial Networks based No Reference Quality Assessment Metric for RGB-D Synthesized Views. Incremental Class Discovery for Semantic Segmentation with RGBD Sensing. Fusion of Detected Objects in Text for Visual Question Answering. Exploring the Back Alleys: Analysing The Robustness of Alternative Neural Network Architectures against Adversarial Attacks. Statistical Learning for Analysis of Networked Control Systems over Unknown Channels. Cross-lingual Pre-training Based Transfer for Zero-shot Neural Machine Translation. Moment-Based Variational Inference for Markov Jump Processes. A critical analysis of self-supervision, or what we can learn from a single image. DOA Estimation by DNN-based Denoising and Dereverberation from Sound Intensity Vector. Merging External Bilingual Pairs into Neural Machine Translation. Who Needs Words? Lexicon-Free Speech Recognition.
Approximating the Permanent by Sampling from Adaptive Partitions
Title | Approximating the Permanent by Sampling from Adaptive Partitions |
Authors | Jonathan Kuck, Tri Dao, Hamid Rezatofighi, Ashish Sabharwal, Stefano Ermon |
Abstract | Computing the permanent of a non-negative matrix is a core problem with practical applications ranging from target tracking to statistical thermodynamics. However, this problem is also #P-complete, which leaves little hope for finding an exact solution that can be computed efficiently. While the problem admits a fully polynomial randomized approximation scheme, this method has seen little use because it is both inefficient in practice and difficult to implement. We present AdaPart, a simple and efficient method for drawing exact samples from an unnormalized distribution. Using AdaPart, we show how to construct tight bounds on the permanent which hold with high probability, with guaranteed polynomial runtime for dense matrices. We find that AdaPart can provide empirical speedups exceeding 25x over prior sampling methods on matrices that are challenging for variational-based approaches. Finally, in the context of multi-target tracking, exact sampling from the distribution defined by the matrix permanent allows us to use the optimal proposal distribution during particle filtering. Using AdaPart, we show that this leads to improved tracking performance using an order of magnitude fewer samples. |
Tasks | |
Published | 2019-11-26 |
URL | https://arxiv.org/abs/1911.11856v1 |
PDF | https://arxiv.org/pdf/1911.11856v1.pdf |
PWC | https://paperswithcode.com/paper/approximating-the-permanent-by-sampling-from-1 |
Repo | |
Framework | |
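To make the sampling-based estimation concrete, here is a minimal sequential importance sampler for the permanent of a non-negative matrix. It is a classic baseline, not the paper's AdaPart method: AdaPart's contribution is drawing exact samples via adaptive partitioning, whereas this sketch only produces an unbiased estimate.

```python
import numpy as np

def permanent_sis_estimate(A, num_samples=10000, rng=None):
    """Unbiased sequential-importance-sampling estimate of per(A) for a
    non-negative square matrix A. Baseline sketch only, not AdaPart."""
    rng = np.random.default_rng() if rng is None else rng
    n = A.shape[0]
    total = 0.0
    for _ in range(num_samples):
        free = list(range(n))        # columns not yet assigned
        weight = 1.0
        for i in range(n):
            row = A[i, free]
            z = row.sum()
            if z == 0.0:             # dead end: this sample contributes 0
                weight = 0.0
                break
            weight *= z              # importance weight accumulates the Z_i
            j = int(rng.choice(len(free), p=row / z))
            free.pop(j)
        total += weight
    return total / num_samples

A = np.array([[1.0, 2.0, 1.0],
              [0.5, 1.0, 1.0],
              [1.0, 1.0, 2.0]])
print(permanent_sis_estimate(A))     # exact permanent is 8.5
```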
Deep Bayesian Recurrent Neural Networks for Somatic Variant Calling in Cancer
Title | Deep Bayesian Recurrent Neural Networks for Somatic Variant Calling in Cancer |
Authors | Geoffroy Dubourg-Felonneau, Omar Darwish, Christopher Parsons, Dami Rebergen, John W Cassidy, Nirmesh Patel, Harry W Clifford |
Abstract | The emerging field of precision oncology relies on the accurate pinpointing of alterations in the molecular profile of a tumor to provide personalized targeted treatments. Current methodologies in the field commonly include the application of next-generation sequencing technologies to a tumor sample, followed by the identification of mutations in the DNA known as somatic variants. The differentiation of these variants from sequencing error poses a classic classification problem, which has traditionally been approached with Bayesian statistics and, more recently, with supervised machine learning methods such as neural networks. Although these methods provide greater accuracy, classic neural networks lack the ability to indicate the confidence of a variant call. In this paper, we explore the performance of deep Bayesian neural networks on next-generation sequencing data, and their ability to give probability estimates for somatic variant calls. In addition to demonstrating similar performance in comparison to standard neural networks, we show that the resultant output probabilities make them better suited to the disparate and highly variable sequencing datasets these models are likely to encounter in the real world. We aim to deliver algorithms to oncologists for which model certainty better reflects accuracy, for improved clinical application. By moving away from point estimates to reliable confidence intervals, we expect the resultant clinical and treatment decisions to be more robust and more informed by the underlying reality of the tumor molecular profile. |
Tasks | |
Published | 2019-12-06 |
URL | https://arxiv.org/abs/1912.04174v1 |
PDF | https://arxiv.org/pdf/1912.04174v1.pdf |
PWC | https://paperswithcode.com/paper/deep-bayesian-recurrent-neural-networks-for |
Repo | |
Framework | |
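The paper's central point is predictive uncertainty rather than point estimates. A common, lightweight way to get such estimates from a deep network is Monte Carlo dropout; the sketch below uses a plain feed-forward stand-in (the paper's model is a Bayesian recurrent network, and its exact architecture and input features are not reproduced here).

```python
import torch
import torch.nn as nn

# Hypothetical classifier over per-site pileup features; only a stand-in
# for the paper's deep Bayesian recurrent model.
class VariantCaller(nn.Module):
    def __init__(self, num_features=64, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(num_features, hidden), nn.ReLU(), nn.Dropout(0.5),
            nn.Linear(hidden, hidden), nn.ReLU(), nn.Dropout(0.5),
            nn.Linear(hidden, 2),  # {sequencing error, somatic variant}
        )

    def forward(self, x):
        return self.net(x)

@torch.no_grad()
def mc_dropout_predict(model, x, passes=50):
    """Keep dropout active at test time and average the softmax outputs.
    The spread across passes is a (crude) measure of model uncertainty."""
    model.train()  # leaves dropout on; safe here since we never backprop
    probs = torch.stack([model(x).softmax(dim=-1) for _ in range(passes)])
    return probs.mean(dim=0), probs.std(dim=0)

model = VariantCaller()
x = torch.randn(8, 64)            # a batch of 8 candidate variant sites
mean, std = mc_dropout_predict(model, x)
print(mean[:, 1], std[:, 1])      # P(somatic) and its uncertainty
```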
PoseConvGRU: A Monocular Approach for Visual Ego-motion Estimation by Learning
Title | PoseConvGRU: A Monocular Approach for Visual Ego-motion Estimation by Learning |
Authors | Guangyao Zhai, Liang Liu, Linjian Zhang, Yong Liu |
Abstract | While many visual ego-motion algorithm variants have been proposed in the past decade, learning-based ego-motion estimation methods have received increasing attention because of their robustness to image noise and independence from camera calibration. In this work, we propose a data-driven, fully trainable approach to visual ego-motion estimation for a monocular camera. We use an end-to-end learning approach that allows the model to map directly from input image pairs to an estimate of ego-motion (parameterized as 6-DoF transformation matrices). To achieve this, we introduce PoseConvGRU, a novel two-module Long-term Recurrent Convolutional Neural Network with an explicit sequence pose estimation loss. The feature-encoding module encodes the short-term motion features in an image pair, while the memory-propagating module captures the long-term motion features across consecutive image pairs. The visual memory is implemented with convolutional gated recurrent units, which allow information to propagate over time. At each time step, two consecutive RGB images are stacked together to form a six-channel tensor from which the feature-encoding module learns to extract motion information and estimate poses. The sequence of output maps is then passed through a stacked ConvGRU module to generate the relative transformation pose of each image pair. We also augment the training data by randomly skipping frames to simulate velocity variation, which yields better performance in turning and high-velocity situations. We evaluate our approach on the KITTI Visual Odometry benchmark. The experiments show that the proposed method performs competitively with geometric methods and encourage further exploration of learning-based methods for estimating camera ego-motion, even though geometric methods also demonstrate promising results. |
Tasks | Calibration, Motion Estimation, Pose Estimation, Visual Odometry |
Published | 2019-06-19 |
URL | https://arxiv.org/abs/1906.08095v1 |
PDF | https://arxiv.org/pdf/1906.08095v1.pdf |
PWC | https://paperswithcode.com/paper/poseconvgru-a-monocular-approach-for-visual |
Repo | |
Framework | |
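The abstract fully specifies the data flow (stack two RGB frames into a six-channel tensor, encode, propagate through a recurrent memory, regress 6-DoF poses). Below is a compact PyTorch sketch of that flow, substituting a plain GRU over pooled features for the paper's convolutional GRU and using made-up layer sizes.

```python
import torch
import torch.nn as nn

class PoseRNN(nn.Module):
    """Sketch of the two-module idea: a CNN encodes each stacked image
    pair (2 x RGB = 6 channels), a recurrent unit propagates motion
    memory over time, and a head regresses a 6-DoF pose per pair.
    The paper uses convolutional GRUs; a plain GRU over pooled features
    keeps this sketch short."""
    def __init__(self, hidden=256):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(6, 32, 7, stride=2, padding=3), nn.ReLU(),
            nn.Conv2d(32, 64, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.gru = nn.GRU(128, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 6)   # 3 translation + 3 rotation params

    def forward(self, frames):             # frames: (B, T+1, 3, H, W)
        pairs = torch.cat([frames[:, :-1], frames[:, 1:]], dim=2)  # (B,T,6,H,W)
        B, T = pairs.shape[:2]
        feats = self.encoder(pairs.flatten(0, 1)).flatten(1)       # (B*T, 128)
        out, _ = self.gru(feats.view(B, T, -1))
        return self.head(out)               # (B, T, 6): relative pose per pair

model = PoseRNN()
poses = model(torch.randn(2, 5, 3, 64, 64))  # 2 clips of 5 frames -> 4 pairs
print(poses.shape)                           # torch.Size([2, 4, 6])
```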
Fine-grained Knowledge Fusion for Sequence Labeling Domain Adaptation
Title | Fine-grained Knowledge Fusion for Sequence Labeling Domain Adaptation |
Authors | Huiyun Yang, Shujian Huang, Xinyu Dai, Jiajun Chen |
Abstract | In sequence labeling, previous domain adaptation methods focus on adapting from the source domain to the entire target domain without considering the diversity of individual target domain samples, which may lead to negative transfer results for certain samples. Moreover, an important characteristic of sequence labeling tasks is that different elements within a given sample may also have diverse domain relevance, which requires further consideration. To take this multi-level domain relevance discrepancy into account, we propose a fine-grained knowledge fusion model with a domain relevance modeling scheme to control the balance between learning from the target domain data and learning from the source domain model. Experiments on three sequence labeling tasks show that our fine-grained knowledge fusion model outperforms strong baselines and other state-of-the-art sequence labeling domain adaptation methods. |
Tasks | Domain Adaptation |
Published | 2019-09-10 |
URL | https://arxiv.org/abs/1909.04315v1 |
PDF | https://arxiv.org/pdf/1909.04315v1.pdf |
PWC | https://paperswithcode.com/paper/fine-grained-knowledge-fusion-for-sequence |
Repo | |
Framework | |
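One way to read "control the balance between learning from the target domain data and learning from the source domain model" is a per-token interpolation between a cross-entropy term and a distillation term. The sketch below implements that reading; the relevance scores and the loss composition are assumptions, not the paper's exact model.

```python
import torch
import torch.nn.functional as F

def fused_loss(student_logits, teacher_logits, gold_labels, relevance):
    """Per-token knowledge fusion sketch: tokens judged highly
    target-domain-relevant learn mostly from gold target labels, while
    source-like tokens lean on the source (teacher) model. `relevance`
    is in [0, 1] per token; modeling it is the paper's contribution and
    is not reproduced here."""
    ce = F.cross_entropy(student_logits.transpose(1, 2), gold_labels,
                         reduction="none")                    # (B, T)
    kd = F.kl_div(F.log_softmax(student_logits, dim=-1),
                  F.softmax(teacher_logits, dim=-1),
                  reduction="none").sum(-1)                   # (B, T)
    return (relevance * ce + (1.0 - relevance) * kd).mean()

B, T, C = 4, 12, 9                      # batch, sequence length, label set
student = torch.randn(B, T, C, requires_grad=True)
teacher = torch.randn(B, T, C)
labels = torch.randint(0, C, (B, T))
relevance = torch.rand(B, T)
print(fused_loss(student, teacher, labels, relevance))
```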
GANs-NQM: A Generative Adversarial Networks based No Reference Quality Assessment Metric for RGB-D Synthesized Views
Title | GANs-NQM: A Generative Adversarial Networks based No Reference Quality Assessment Metric for RGB-D Synthesized Views |
Authors | Suiyi Ling, Jing Li, Junle Wang, Patrick Le Callet |
Abstract | In this paper, we propose a no-reference (NR) quality metric for RGB plus depth (RGB-D) synthesized views based on Generative Adversarial Networks (GANs), named GANs-NQM. Because inpainting fails on dis-occluded regions during RGB-D synthesis, capturing the resulting non-uniformly distributed local distortions and learning their impact on perceptual quality are challenging tasks for objective quality metrics. In our study, building on the characteristics of GANs, we propose i) a novel GAN training strategy for RGB-D synthesized views that uses existing large-scale computer vision datasets rather than an RGB-D dataset; ii) a reference-free quality metric based on the trained discriminator, obtained by learning a 'Bag of Distortion Words' (BDW) codebook and a local distortion region selector; and iii) a hole-filling inpainter, i.e., the generator of the trained GANs, for RGB-D dis-occluded regions as a side outcome. According to experimental results on the IRCCyN/IVC DIBR database, the proposed model outperforms state-of-the-art quality metrics and, in addition, is more applicable in real scenarios. The corresponding context inpainter also shows appealing results compared with other inpainting algorithms. |
Tasks | |
Published | 2019-03-28 |
URL | http://arxiv.org/abs/1903.12088v1 |
PDF | http://arxiv.org/pdf/1903.12088v1.pdf |
PWC | https://paperswithcode.com/paper/gans-nqm-a-generative-adversarial-networks |
Repo | |
Framework | |
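The "Bag of Distortion Words" component is a codebook over local features. Here is a hedged sketch of that step with k-means, using random arrays in place of the discriminator features the paper actually clusters.

```python
import numpy as np
from sklearn.cluster import KMeans

# Sketch of the BDW idea: cluster local patch features (in the paper,
# taken from the trained GAN discriminator) into a codebook, then
# describe an image by its codeword histogram. The features below are
# random stand-ins, not discriminator outputs.
rng = np.random.default_rng(0)
train_patch_feats = rng.normal(size=(5000, 128))   # features from many patches

codebook = KMeans(n_clusters=32, n_init=10, random_state=0)
codebook.fit(train_patch_feats)

def bdw_histogram(patch_feats, codebook):
    """Quantize each patch to its nearest codeword and return the
    normalized histogram, a fixed-length quality descriptor."""
    words = codebook.predict(patch_feats)
    hist = np.bincount(words, minlength=codebook.n_clusters).astype(float)
    return hist / hist.sum()

image_patches = rng.normal(size=(300, 128))        # patches of one test image
print(bdw_histogram(image_patches, codebook))
```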
Incremental Class Discovery for Semantic Segmentation with RGBD Sensing
Title | Incremental Class Discovery for Semantic Segmentation with RGBD Sensing |
Authors | Yoshikatsu Nakajima, Byeongkeun Kang, Hideo Saito, Kris Kitani |
Abstract | This work addresses the task of open-world semantic segmentation using RGBD sensing to discover new semantic classes over time. Although there are many types of objects in the real world, current semantic segmentation methods make a closed-world assumption and are trained only to segment a limited number of object classes. Towards a more open-world approach, we propose a novel method that incrementally learns new classes for image segmentation. The proposed system first segments each RGBD frame using both color and geometric information, and then aggregates that information to build a single segmented dense 3D map of the environment. The segmented 3D map representation is a key component of our approach, as it is used to discover new object classes by identifying coherent regions in the 3D map that have no semantic label. The use of coherent regions in the 3D map as primitive elements, rather than traditional elements such as surfels or voxels, also significantly reduces the computational complexity and memory use of our method. It thus leads to semi-real-time performance at 10.7 Hz when incrementally updating the dense 3D map at every frame. Through experiments on the NYUDv2 dataset, we demonstrate that the proposed method is able to correctly cluster objects of both known and unseen classes. We also provide a quantitative comparison with state-of-the-art supervised methods, the processing time of each step, and an analysis of the influence of each component. |
Tasks | Semantic Segmentation |
Published | 2019-07-23 |
URL | https://arxiv.org/abs/1907.10008v1 |
PDF | https://arxiv.org/pdf/1907.10008v1.pdf |
PWC | https://paperswithcode.com/paper/incremental-class-discovery-for-semantic |
Repo | |
Framework | |
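The discovery step, finding coherent regions with no semantic label, can be illustrated in 2D with connected components. The paper operates on coherent regions of a segmented dense 3D map; this flat stand-in only shows the principle.

```python
import numpy as np
from scipy import ndimage

# Find spatially coherent regions the semantic model left unlabeled and
# propose each as a candidate new class. 0 marks "no semantic label".
labels = np.ones((8, 8), dtype=int)   # everything initially "floor"
labels[1:3, 1:4] = 0                  # unlabeled blob A -> candidate class
labels[5:8, 5:8] = 0                  # unlabeled blob B -> candidate class

unlabeled = labels == 0
components, count = ndimage.label(unlabeled)    # 4-connected components
min_size = 4                                    # drop tiny, noisy regions
candidates = [np.argwhere(components == c)
              for c in range(1, count + 1)
              if (components == c).sum() >= min_size]
print(f"{len(candidates)} coherent unlabeled region(s) discovered")
```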
Fusion of Detected Objects in Text for Visual Question Answering
Title | Fusion of Detected Objects in Text for Visual Question Answering |
Authors | Chris Alberti, Jeffrey Ling, Michael Collins, David Reitter |
Abstract | To advance models of multimodal context, we introduce a simple yet powerful neural architecture for data that combines vision and natural language. The “Bounding Boxes in Text Transformer” (B2T2) also leverages referential information binding words to portions of the image in a single unified architecture. B2T2 is highly effective on the Visual Commonsense Reasoning benchmark (https://visualcommonsense.com), achieving a new state-of-the-art with a 25% relative reduction in error rate compared to published baselines and obtaining the best performance to date on the public leaderboard (as of May 22, 2019). A detailed ablation analysis shows that the early integration of the visual features into the text analysis is key to the effectiveness of the new architecture. A reference implementation of our models is provided (https://github.com/google-research/language/tree/master/language/question_answering/b2t2). |
Tasks | Question Answering, Visual Commonsense Reasoning, Visual Question Answering |
Published | 2019-08-14 |
URL | https://arxiv.org/abs/1908.05054v2 |
PDF | https://arxiv.org/pdf/1908.05054v2.pdf |
PWC | https://paperswithcode.com/paper/fusion-of-detected-objects-in-text-for-visual |
Repo | |
Framework | |
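The ablation's conclusion, that early integration of visual features into the text stream is key, suggests fusion at the embedding layer. Below is a hedged sketch of such early fusion; dimensions, the token-to-box alignment format, and module names are illustrative placeholders, and the released reference implementation should be consulted for B2T2 itself.

```python
import torch
import torch.nn as nn

class EarlyFusionEmbeddings(nn.Module):
    """Tokens that refer to a detected object get the object's projected
    visual feature added to their input embedding before the transformer
    runs. Shapes here are hypothetical."""
    def __init__(self, vocab=30522, d_model=256, d_visual=2048):
        super().__init__()
        self.tok = nn.Embedding(vocab, d_model)
        self.proj = nn.Linear(d_visual, d_model)  # visual feature -> text space

    def forward(self, token_ids, box_feats, token2box):
        # token2box[b, t] = index into box_feats, or -1 for ungrounded tokens
        emb = self.tok(token_ids)                          # (B, T, d_model)
        grounded = (token2box >= 0).unsqueeze(-1).float()  # (B, T, 1)
        vis = self.proj(box_feats)                         # (B, boxes, d_model)
        idx = token2box.clamp(min=0)
        gathered = torch.gather(
            vis, 1, idx.unsqueeze(-1).expand(-1, -1, emb.size(-1)))
        return emb + grounded * gathered

fuse = EarlyFusionEmbeddings()
ids = torch.randint(0, 30522, (2, 7))
boxes = torch.randn(2, 3, 2048)                    # 3 detected objects/image
t2b = torch.tensor([[-1, 0, -1, -1, 2, -1, -1]] * 2)
print(fuse(ids, boxes, t2b).shape)                 # torch.Size([2, 7, 256])
```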
Exploring the Back Alleys: Analysing The Robustness of Alternative Neural Network Architectures against Adversarial Attacks
Title | Exploring the Back Alleys: Analysing The Robustness of Alternative Neural Network Architectures against Adversarial Attacks |
Authors | Yi Xiang Marcus Tan, Yuval Elovici, Alexander Binder |
Abstract | We investigate to what extent alternative variants of Artificial Neural Networks (ANNs) are susceptible to adversarial attacks. We analyse the adversarial robustness of conventional ANNs, stochastic ANNs and Spiking Neural Networks (SNNs) in the raw image space, across three different datasets. Our experiments reveal that stochastic ANN variants are almost as susceptible as conventional ANNs when faced with simple iterative gradient-based attacks in the white-box setting. However, we observe that in black-box settings, stochastic ANNs are more robust than conventional ANNs when faced with boundary, transferability and surrogate attacks. Consequently, we propose improved attacks and defence mechanisms for stochastic ANNs in black-box settings. When performing surrogate-based black-box attacks, one can employ stochastic models as surrogates to achieve higher attack success on both stochastic and deterministic targets. This success can be further improved against stochastic targets with our proposed Variance Mimicking (VM) surrogate training method. Finally, adopting a defender's perspective, we investigate the plausibility of employing stochastic switching of model mixtures as a viable hardening mechanism. We observe that such a scheme does provide partial hardening. |
Tasks | |
Published | 2019-12-08 |
URL | https://arxiv.org/abs/1912.03609v3 |
PDF | https://arxiv.org/pdf/1912.03609v3.pdf |
PWC | https://paperswithcode.com/paper/exploring-the-back-alleys-analysing-the |
Repo | |
Framework | |
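For reference, the "simple iterative gradient-based attack in the white-box setting" that the stochastic variants barely resist is typically an iterative FGSM/PGD-style loop like the following. This is the standard attack, not the paper's Variance Mimicking method.

```python
import torch
import torch.nn.functional as F

def iterative_fgsm(model, x, y, eps=8 / 255, step=2 / 255, iters=10):
    """Standard white-box iterative gradient attack: repeatedly step in
    the sign of the loss gradient, projected back into an eps-ball."""
    x_adv = x.clone().detach()
    for _ in range(iters):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad, = torch.autograd.grad(loss, x_adv)
        with torch.no_grad():
            x_adv = x_adv + step * grad.sign()          # ascend the loss
            x_adv = x + (x_adv - x).clamp(-eps, eps)    # stay in the eps-ball
            x_adv = x_adv.clamp(0.0, 1.0)               # stay a valid image
    return x_adv.detach()

# Toy usage with a hypothetical linear classifier on 32x32 RGB inputs.
model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, 10))
x = torch.rand(4, 3, 32, 32)
y = torch.randint(0, 10, (4,))
x_adv = iterative_fgsm(model, x, y)
print((x_adv - x).abs().max())   # bounded by eps
```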
Statistical Learning for Analysis of Networked Control Systems over Unknown Channels
Title | Statistical Learning for Analysis of Networked Control Systems over Unknown Channels |
Authors | Konstantinos Gatsis, George J. Pappas |
Abstract | Recent control trends increasingly rely on communication networks and wireless channels to close the loop for Internet-of-Things applications. Traditionally these approaches are model-based, i.e., assuming a network or channel model, they focus on stability analysis and appropriate controller designs. However, the availability of such wireless channel models is fundamentally challenging in practice, as channels are typically unknown a priori and only available through data samples. In this work we aim to develop algorithms that rely on channel sample data to determine the stability and performance of networked control tasks. In this regard, our work is the first to characterize the amount of channel modeling that is required to answer such a question. Specifically, we examine how many channel data samples are required in order to answer with high confidence whether a given networked control system is stable or not. This analysis is based on the notion of sample complexity from the learning literature and is facilitated by concentration inequalities. Moreover, we establish a direct relation between the sample complexity and the networked system's stability margin, i.e., the underlying packet success rate of the channel and the spectral radius of the dynamics of the control system. This illustrates that it becomes impractical to verify stability under a large range of plant and channel configurations. We validate our theoretical results in numerical simulations. |
Tasks | |
Published | 2019-11-08 |
URL | https://arxiv.org/abs/1911.03422v1 |
PDF | https://arxiv.org/pdf/1911.03422v1.pdf |
PWC | https://paperswithcode.com/paper/statistical-learning-for-analysis-of |
Repo | |
Framework | |
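The paper's question, how many channel samples suffice to certify stability with high confidence, can be made concrete with a Hoeffding bound. The sketch below assumes a toy scalar plant with i.i.d. packet drops and the standard mean-square condition p*a_closed^2 + (1-p)*a_open^2 < 1; the paper's system class and exact bounds are more general.

```python
import math
import random

def stability_verdict(channel_samples, a_open=1.2, a_closed=0.5, delta=0.05):
    """From Bernoulli channel samples (1 = packet delivered), decide with
    confidence 1 - delta whether the toy networked system is mean-square
    stable. Assumed model: x_{k+1} = a_closed*x_k on success, a_open*x_k
    on a drop. Illustrative only; not the paper's exact analysis."""
    n = len(channel_samples)
    p_hat = sum(channel_samples) / n
    eps = math.sqrt(math.log(2 / delta) / (2 * n))     # Hoeffding radius

    def msq(p):  # mean-square growth factor as a function of success rate
        return p * a_closed**2 + (1 - p) * a_open**2

    if msq(max(p_hat - eps, 0.0)) < 1.0:   # worst case in the CI still stable
        return "stable (w.h.p.)"
    if msq(min(p_hat + eps, 1.0)) >= 1.0:  # best case in the CI still unstable
        return "unstable (w.h.p.)"
    return "inconclusive: need more channel samples"

random.seed(0)
samples = [1 if random.random() < 0.8 else 0 for _ in range(2000)]
print(stability_verdict(samples))
```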
Cross-lingual Pre-training Based Transfer for Zero-shot Neural Machine Translation
Title | Cross-lingual Pre-training Based Transfer for Zero-shot Neural Machine Translation |
Authors | Baijun Ji, Zhirui Zhang, Xiangyu Duan, Min Zhang, Boxing Chen, Weihua Luo |
Abstract | Transfer learning between different language pairs has shown its effectiveness for Neural Machine Translation (NMT) in low-resource scenarios. However, existing transfer methods involving a common target language are far from successful in the extreme scenario of zero-shot translation, due to the language space mismatch problem between the transferor (the parent model) and the transferee (the child model) on the source side. To address this challenge, we propose an effective transfer learning approach based on cross-lingual pre-training. Our key idea is to make all source languages share the same feature space and thus enable a smooth transition for zero-shot translation. To this end, we introduce one monolingual pre-training method and two bilingual pre-training methods to obtain a universal encoder for different languages. Once the universal encoder is constructed, the parent model built on it is trained with large-scale annotated data and then directly applied in the zero-shot translation scenario. Experiments on two public datasets show that our approach significantly outperforms a strong pivot-based baseline and various multilingual NMT approaches. |
Tasks | Machine Translation, Transfer Learning |
Published | 2019-12-03 |
URL | https://arxiv.org/abs/1912.01214v1 |
PDF | https://arxiv.org/pdf/1912.01214v1.pdf |
PWC | https://paperswithcode.com/paper/cross-lingual-pre-training-based-transfer-for |
Repo | |
Framework | |
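The transfer mechanism hinges on parent and child sharing one source-side feature space. A minimal sketch of that wiring shares a single encoder object between two seq2seq models; sizes are placeholders, and the paper's universal encoder comes from cross-lingual pre-training, which is not reproduced here.

```python
import torch
import torch.nn as nn

d_model, vocab = 512, 32000

# The "universal" encoder: in the paper this is obtained by monolingual
# or bilingual cross-lingual pre-training.
universal_encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=d_model, nhead=8), num_layers=6)

class NMTModel(nn.Module):
    def __init__(self, encoder):
        super().__init__()
        self.encoder = encoder                     # shared, not copied
        self.decoder = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(d_model=d_model, nhead=8), num_layers=6)
        self.out = nn.Linear(d_model, vocab)

    def forward(self, src_emb, tgt_emb):
        memory = self.encoder(src_emb)
        return self.out(self.decoder(tgt_emb, memory))

parent = NMTModel(universal_encoder)   # trained on the high-resource pair
child = NMTModel(universal_encoder)    # same encoder object, so a zero-shot
                                       # source language lands in the same
                                       # feature space as the parent's
assert parent.encoder is child.encoder

out = parent(torch.randn(10, 2, d_model), torch.randn(7, 2, d_model))
print(out.shape)   # torch.Size([7, 2, 32000]) in the default (S, B, E) layout
```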
Moment-Based Variational Inference for Markov Jump Processes
Title | Moment-Based Variational Inference for Markov Jump Processes |
Authors | Christian Wildner, Heinz Koeppl |
Abstract | We propose moment-based variational inference as a flexible framework for approximate smoothing of latent Markov jump processes. The main ingredient of our approach is to partition the set of all transitions of the latent process into classes. This allows us to express the Kullback-Leibler divergence between the approximate and the exact posterior process in terms of a set of moment functions that arise naturally from the chosen partition. To illustrate possible choices of the partition, we consider special classes of jump processes that frequently occur in applications. We then extend the results to parameter inference and demonstrate the method on several examples. |
Tasks | |
Published | 2019-05-14 |
URL | https://arxiv.org/abs/1905.05451v1 |
PDF | https://arxiv.org/pdf/1905.05451v1.pdf |
PWC | https://paperswithcode.com/paper/moment-based-variational-inference-for-markov |
Repo | |
Framework | |
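For readers unfamiliar with the latent object being smoothed: a Markov jump process can be simulated forward with the Gillespie algorithm. The sketch below simulates a birth-death process, one of the special classes the abstract alludes to; the moment-based variational smoother itself is not reproduced.

```python
import random

def gillespie_birth_death(birth=1.0, death=0.1, x0=0, t_end=50.0, seed=0):
    """Simulate a birth-death Markov jump process: constant birth rate,
    linear death rate. This is only the forward model that a smoother
    like the paper's would condition on observations."""
    rng = random.Random(seed)
    t, x = 0.0, x0
    path = [(t, x)]
    while True:
        rates = [birth, death * x]            # the two transition classes
        total = sum(rates)
        if total == 0.0:                      # absorbing state, stop
            break
        t += rng.expovariate(total)           # exponential waiting time
        if t > t_end:
            break
        x += 1 if rng.random() < rates[0] / total else -1
        path.append((t, x))
    return path

path = gillespie_birth_death()
print(f"{len(path)} jumps; final state x = {path[-1][1]}")
```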
A critical analysis of self-supervision, or what we can learn from a single image
Title | A critical analysis of self-supervision, or what we can learn from a single image |
Authors | Yuki M. Asano, Christian Rupprecht, Andrea Vedaldi |
Abstract | We look critically at popular self-supervision techniques for learning deep convolutional neural networks without manual labels. We show that three different and representative methods, BiGAN, RotNet and DeepCluster, can learn the first few layers of a convolutional network from a single image as well as using millions of images and manual labels, provided that strong data augmentation is used. However, for deeper layers the gap with manual supervision cannot be closed even if millions of unlabelled images are used for training. We conclude that: (1) the weights of the early layers of deep networks contain limited information about the statistics of natural images, that (2) such low-level statistics can be learned through self-supervision just as well as through strong supervision, and that (3) the low-level statistics can be captured via synthetic transformations instead of using a large image dataset. |
Tasks | Data Augmentation, Representation Learning, Unsupervised Representation Learning |
Published | 2019-04-30 |
URL | https://arxiv.org/abs/1904.13132v3 |
PDF | https://arxiv.org/pdf/1904.13132v3.pdf |
PWC | https://paperswithcode.com/paper/surprising-effectiveness-of-few-image |
Repo | |
Framework | |
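The result hinges on the "strong data augmentation" proviso: one image must be expanded into a diverse training stream. A typical strong pipeline looks like the following; the paper's exact augmentation recipe may differ.

```python
import torch
from PIL import Image
from torchvision import transforms

# With aggressive augmentation, a single image yields an effectively
# unlimited stream of training crops for self-supervision.
augment = transforms.Compose([
    transforms.RandomResizedCrop(224, scale=(0.08, 1.0)),
    transforms.RandomHorizontalFlip(),
    transforms.ColorJitter(0.4, 0.4, 0.4, 0.1),
    transforms.RandomGrayscale(p=0.2),
    transforms.ToTensor(),
])

# A synthetic stand-in for the single training image.
source = Image.new("RGB", (1024, 768), color=(120, 160, 90))

batch = torch.stack([augment(source) for _ in range(64)])  # 1 image -> 64 views
print(batch.shape)  # torch.Size([64, 3, 224, 224])
```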
DOA Estimation by DNN-based Denoising and Dereverberation from Sound Intensity Vector
Title | DOA Estimation by DNN-based Denoising and Dereverberation from Sound Intensity Vector |
Authors | Masahiro Yasuda, Yuma Koizumi, Luca Mazzon, Shoichiro Saito, Hisashi Uematsu |
Abstract | We propose a direction of arrival (DOA) estimation method that combines sound-intensity vector (IV)-based DOA estimation and DNN-based denoising and dereverberation. Since the accuracy of IV-based DOA estimation degrades due to environmental noise and reverberation, two DNNs are used to remove such effects from the observed IVs. DOA is then estimated from the refined IVs based on the physics of wave propagation. Experiments on an open dataset showed that the average DOA error of the proposed method was 0.528 degrees, and it outperformed a conventional IV-based and DNN-based DOA estimation method. |
Tasks | Denoising |
Published | 2019-10-10 |
URL | https://arxiv.org/abs/1910.04415v1 |
PDF | https://arxiv.org/pdf/1910.04415v1.pdf |
PWC | https://paperswithcode.com/paper/doa-estimation-by-dnn-based-denoising-and |
Repo | |
Framework | |
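The physics step after denoising, estimating DOA from a sound intensity vector, is standard for first-order Ambisonics. Here is a sketch; sign and channel conventions vary across Ambisonics formats, and the synthetic test below is merely self-consistent.

```python
import numpy as np

def doa_from_intensity(W, X, Y, Z):
    """Estimate DOA from a first-order Ambisonics STFT frame, the physics
    step the paper applies after DNN denoising/dereverberation of the IVs.
    W, X, Y, Z: complex STFT coefficients over frequency bins."""
    # Sound intensity vector, averaged over frequency.
    Ix = np.mean(np.real(np.conj(W) * X))
    Iy = np.mean(np.real(np.conj(W) * Y))
    Iz = np.mean(np.real(np.conj(W) * Z))
    azimuth = np.degrees(np.arctan2(Iy, Ix))
    elevation = np.degrees(np.arctan2(Iz, np.hypot(Ix, Iy)))
    return azimuth, elevation

# A synthetic plane wave from azimuth 30 deg, elevation 0 deg, no noise.
az = np.radians(30.0)
W = np.ones(256, dtype=complex)
X, Y, Z = W * np.cos(az), W * np.sin(az), W * 0.0
print(doa_from_intensity(W, X, Y, Z))   # approx (30.0, 0.0)
```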
Merging External Bilingual Pairs into Neural Machine Translation
Title | Merging External Bilingual Pairs into Neural Machine Translation |
Authors | Tao Wang, Shaohui Kuang, Deyi Xiong, António Branco |
Abstract | As neural machine translation (NMT) is not easily amenable to explicit correction of errors, incorporating pre-specified translations into NMT is widely regarded as a non-trivial challenge. In this paper, we propose and explore three methods to endow NMT with pre-specified bilingual pairs. Instead of, for instance, modifying the beam search algorithm during decoding or making complex modifications to the attention mechanism (the mainstream approaches to tackling this challenge), we experiment with appropriately pre-processing the training data to add information about pre-specified translations. Extra embeddings are also used to distinguish pre-specified tokens from the other tokens. Extensive experimentation and analysis indicate that over 99% of the pre-specified phrases are successfully translated (against an 85% baseline) and that the methods explored here also yield a substantive improvement in translation quality. |
Tasks | Machine Translation |
Published | 2019-12-02 |
URL | https://arxiv.org/abs/1912.00567v1 |
PDF | https://arxiv.org/pdf/1912.00567v1.pdf |
PWC | https://paperswithcode.com/paper/merging-external-bilingual-pairs-into-neural |
Repo | |
Framework | |
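A plausible instance of "pre-processing the training data to add information about pre-specified translations" is to splice each pre-specified target phrase into the source sentence behind marker tokens, with a parallel tag stream to drive the extra embeddings the abstract mentions. The paper's exact scheme may differ; this sketch is one hedged reading of it.

```python
def inline_prespecified(src_tokens, constraints):
    """Splice each pre-specified target phrase into the source sentence
    right after the source phrase it translates, wrapped in markers, and
    emit a tag per token (0 = normal source, 1 = source side of a
    constraint, 2 = injected target tokens) for extra embeddings."""
    out, tags = [], []
    i = 0
    while i < len(src_tokens):
        matched = False
        for src_phrase, tgt_phrase in constraints:
            if src_tokens[i:i + len(src_phrase)] == src_phrase:
                out += src_phrase
                tags += [1] * len(src_phrase)
                out += ["<c>"] + tgt_phrase + ["</c>"]
                tags += [2] * (len(tgt_phrase) + 2)
                i += len(src_phrase)
                matched = True
                break
        if not matched:
            out.append(src_tokens[i])
            tags.append(0)
            i += 1
    return out, tags

src = "das neue Modell übersetzt schnell".split()
constraints = [(["neue", "Modell"], ["new", "model"])]
print(inline_prespecified(src, constraints))
```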
Who Needs Words? Lexicon-Free Speech Recognition
Title | Who Needs Words? Lexicon-Free Speech Recognition |
Authors | Tatiana Likhomanenko, Gabriel Synnaeve, Ronan Collobert |
Abstract | Lexicon-free speech recognition naturally deals with the problem of out-of-vocabulary (OOV) words. In this paper, we show that character-based language models (LMs) can perform as well as word-based LMs for speech recognition, in terms of word error rate (WER), even without restricting the decoding to a lexicon. We study character-based LMs and show that convolutional LMs can effectively leverage large (character) contexts, which is key for good downstream speech recognition performance. We specifically show that lexicon-free decoding of utterances with OOV words using character-based LMs achieves better WER than lexicon-based decoding, with either character-based or word-based LMs. |
Tasks | Speech Recognition |
Published | 2019-04-09 |
URL | https://arxiv.org/abs/1904.04479v4 |
PDF | https://arxiv.org/pdf/1904.04479v4.pdf |
PWC | https://paperswithcode.com/paper/who-needs-words-lexicon-free-speech |
Repo | |
Framework | |
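The "lexicon-free" property is easiest to see in the decoder: nothing constrains output character sequences to dictionary words. A greedy CTC decode makes this concrete; the paper uses beam search with a convolutional character LM, so this only illustrates the lexicon-free part.

```python
def ctc_greedy_decode(log_probs, alphabet, blank=0):
    """Lexicon-free greedy CTC decoding: pick the best character per
    frame, collapse repeats, drop blanks. No word lexicon constrains the
    output, so OOV words come out naturally, character by character."""
    best = [max(range(len(frame)), key=frame.__getitem__) for frame in log_probs]
    decoded, prev = [], blank
    for idx in best:
        if idx != blank and idx != prev:
            decoded.append(alphabet[idx])
        prev = idx
    return "".join(decoded)

alphabet = ["<blank>", "a", "c", "t", " "]
# Fake per-frame log-probabilities spelling out "cat" with repeats/blanks.
frames = [[-0.1, -3, -3, -3, -3],   # blank
          [-3, -3, -0.1, -3, -3],   # c
          [-3, -3, -0.1, -3, -3],   # c (repeat, collapsed)
          [-3, -0.1, -3, -3, -3],   # a
          [-0.1, -3, -3, -3, -3],   # blank
          [-3, -3, -3, -0.1, -3]]   # t
print(ctc_greedy_decode(frames, alphabet))   # "cat"
```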