January 29, 2020

Paper Group ANR 756

Towards AutoML in the presence of Drift: first results. The Deep Learning Revolution and Its Implications for Computer Architecture and Chip Design. SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization. Learning Efficient Multi-agent Communication: An Information Bottleneck Approach. EDUCE: Explaining model Decisions through …

Towards AutoML in the presence of Drift: first results

Title Towards AutoML in the presence of Drift: first results
Authors Jorge G. Madrid, Hugo Jair Escalante, Eduardo F. Morales, Wei-Wei Tu, Yang Yu, Lisheng Sun-Hosoya, Isabelle Guyon, Michele Sebag
Abstract Research progress in AutoML has led to state-of-the-art solutions that can cope quite well with supervised learning tasks, e.g., classification with Auto-Sklearn. However, so far these systems do not take into account the changing nature of evolving data over time (i.e., they still assume i.i.d. data), even though such domains are increasingly common in real applications (e.g., spam filtering, user preferences, etc.). We describe a first attempt to develop an AutoML solution for scenarios in which the data distribution changes relatively slowly over time and in which the problem is approached in a lifelong learning setting. We extend Auto-Sklearn with sound and intuitive mechanisms that allow it to cope with this sort of problem. The extended Auto-Sklearn is combined with concept drift detection techniques that allow it to automatically determine when the initial models have to be adapted. We report experimental results on benchmark data from AutoML competitions that adhere to this scenario. Results demonstrate the effectiveness of the proposed methodology.
Tasks AutoML
Published 2019-07-24
URL https://arxiv.org/abs/1907.10772v1
PDF https://arxiv.org/pdf/1907.10772v1.pdf
PWC https://paperswithcode.com/paper/towards-automl-in-the-presence-of-drift-first
Repo
Framework

The Deep Learning Revolution and Its Implications for Computer Architecture and Chip Design

Title The Deep Learning Revolution and Its Implications for Computer Architecture and Chip Design
Authors Jeffrey Dean
Abstract The past decade has seen a remarkable series of advances in machine learning, and in particular deep learning approaches based on artificial neural networks, to improve our abilities to build more accurate systems across a broad range of areas, including computer vision, speech recognition, language translation, and natural language understanding tasks. This paper is a companion paper to a keynote talk at the 2020 International Solid-State Circuits Conference (ISSCC) discussing some of the advances in machine learning, and their implications on the kinds of computational devices we need to build, especially in the post-Moore’s Law-era. It also discusses some of the ways that machine learning may also be able to help with some aspects of the circuit design process. Finally, it provides a sketch of at least one interesting direction towards much larger-scale multi-task models that are sparsely activated and employ much more dynamic, example- and task-based routing than the machine learning models of today.
Tasks Speech Recognition
Published 2019-11-13
URL https://arxiv.org/abs/1911.05289v1
PDF https://arxiv.org/pdf/1911.05289v1.pdf
PWC https://paperswithcode.com/paper/the-deep-learning-revolution-and-its
Repo
Framework

SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization

Title SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization
Authors Xianzhi Du, Tsung-Yi Lin, Pengchong Jin, Golnaz Ghiasi, Mingxing Tan, Yin Cui, Quoc V. Le, Xiaodan Song
Abstract Convolutional neural networks typically encode an input image into a series of intermediate features with decreasing resolutions. While this structure is suited to classification tasks, it does not perform well for tasks requiring simultaneous recognition and localization (e.g., object detection). The encoder-decoder architectures are proposed to resolve this by applying a decoder network onto a backbone model designed for classification tasks. In this paper, we argue that encoder-decoder architecture is ineffective in generating strong multi-scale features because of the scale-decreased backbone. We propose SpineNet, a backbone with scale-permuted intermediate features and cross-scale connections that is learned on an object detection task by Neural Architecture Search. SpineNet achieves state-of-the-art performance of one-stage object detector on COCO with 60% less computation, and outperforms ResNet-FPN counterparts by 6% AP. SpineNet architecture can transfer to classification tasks, achieving 6% top-1 accuracy improvement on a challenging iNaturalist fine-grained dataset.
Tasks Neural Architecture Search, Object Detection
Published 2019-12-10
URL https://arxiv.org/abs/1912.05027v1
PDF https://arxiv.org/pdf/1912.05027v1.pdf
PWC https://paperswithcode.com/paper/spinenet-learning-scale-permuted-backbone-for
Repo
Framework

Learning Efficient Multi-agent Communication: An Information Bottleneck Approach

Title Learning Efficient Multi-agent Communication: An Information Bottleneck Approach
Authors Rundong Wang, Xu He, Runsheng Yu, Wei Qiu, Bo An, Zinovi Rabinovich
Abstract Many real-world multi-agent reinforcement learning applications require agents to communicate, assisted by a communication protocol. These applications face a common and critical issue: limited communication bandwidth constrains agents’ ability to cooperate successfully. In this paper, rather than proposing a fixed communication protocol, we develop an Informative Multi-Agent Communication (IMAC) method to learn efficient communication protocols. Our contributions are threefold. First, we observe that a limited bandwidth translates into a constraint on the communicated message entropy, thus paving the way for controlling the bandwidth. Second, we introduce a customized batch-norm layer, which controls the messages’ entropy to simulate the limited bandwidth constraint. Third, we apply the information bottleneck method to discover the optimal communication protocol, which can satisfy a bandwidth constraint via training with the prior distribution in the method. To demonstrate the efficacy of our method, we conduct extensive experiments in various cooperative and competitive multi-agent tasks across two dimensions: the number of agents and different bandwidths. We show that IMAC converges quickly and leads to efficient communication among agents under the limited-bandwidth constraint, compared to many baseline methods.
Tasks Multi-agent Reinforcement Learning
Published 2019-11-16
URL https://arxiv.org/abs/1911.06992v1
PDF https://arxiv.org/pdf/1911.06992v1.pdf
PWC https://paperswithcode.com/paper/learning-efficient-multi-agent-communication
Repo
Framework

EDUCE: Explaining model Decisions through Unsupervised Concepts Extraction

Title EDUCE: Explaining model Decisions through Unsupervised Concepts Extraction
Authors Diane Bouchacourt, Ludovic Denoyer
Abstract Providing explanations along with predictions is crucial in some text processing tasks. Therefore, we propose a new self-interpretable model that performs output prediction and simultaneously provides an explanation in terms of the presence of particular concepts in the input. To do so, our model’s prediction relies solely on a low-dimensional binary representation of the input, where each feature denotes the presence or absence of a concept. The presence of a concept is decided from an excerpt, i.e., a small sequence of consecutive words in the text. Relevant concepts for the prediction task at hand are automatically defined by our model, avoiding the need for concept-level annotations. To ease interpretability, we enforce that for each concept, the corresponding excerpts share similar semantics and are distinguishable from one another. We experimentally demonstrate the relevance of our approach on text classification and multi-sentiment analysis tasks.
Tasks Sentiment Analysis, Text Classification
Published 2019-05-28
URL https://arxiv.org/abs/1905.11852v2
PDF https://arxiv.org/pdf/1905.11852v2.pdf
PWC https://paperswithcode.com/paper/educe-explaining-model-decisions-through
Repo
Framework

Predicting the Politics of an Image Using Webly Supervised Data

Title Predicting the Politics of an Image Using Webly Supervised Data
Authors Christopher Thomas, Adriana Kovashka
Abstract The news media shape public opinion, and often, the visual bias they contain is evident for human observers. This bias can be inferred from how different media sources portray different subjects or topics. In this paper, we model visual political bias in contemporary media sources at scale, using webly supervised data. We collect a dataset of over one million unique images and associated news articles from left- and right-leaning news sources, and develop a method to predict the image’s political leaning. This problem is particularly challenging because of the enormous intra-class visual and semantic diversity of our data. We propose a two-stage method to tackle this problem. In the first stage, the model is forced to learn relevant visual concepts that, when joined with document embeddings computed from articles paired with the images, enable the model to predict bias. In the second stage, we remove the requirement of the text domain and train a visual classifier from the features of the former model. We show this two-stage approach facilitates learning and outperforms several strong baselines. We also present extensive qualitative results demonstrating the nuances of the data.
Tasks
Published 2019-10-31
URL https://arxiv.org/abs/1911.00147v1
PDF https://arxiv.org/pdf/1911.00147v1.pdf
PWC https://paperswithcode.com/paper/predicting-the-politics-of-an-image-using
Repo
Framework

LIDIA: Lightweight Learned Image Denoising with Instance Adaptation

Title LIDIA: Lightweight Learned Image Denoising with Instance Adaptation
Authors Gregory Vaksman, Michael Elad, Peyman Milanfar
Abstract Image denoising is a well studied problem with an extensive activity that has spread over several decades. Despite the many available denoising algorithms, the quest for simple, powerful and fast denoisers is still an active and vibrant topic of research. Leading classical denoising methods are typically designed to exploit the inner structure in images by modeling local overlapping patches, while operating in an unsupervised fashion. In contrast, recent newcomers to this arena are supervised and universal neural-network-based methods that bypass this modeling altogether, targeting the inference goal directly and globally, while tending to be very deep and parameter heavy. This work proposes a novel lightweight learnable architecture for image denoising, and presents a combination of supervised and unsupervised training of it, the first aiming for a universal denoiser and the second for adapting it to the incoming image. Our architecture embeds in it several of the main concepts taken from classical methods, relying on patch processing, leveraging non-local self-similarity, exploiting representation sparsity and providing a multiscale treatment. Our proposed universal denoiser achieves near state-of-the-art results, while using a small fraction of the typical number of parameters. In addition, we introduce and demonstrate two highly effective ways for further boosting the denoising performance, by adapting this universal network to the input image.
Tasks Denoising, Image Denoising
Published 2019-11-17
URL https://arxiv.org/abs/1911.07167v2
PDF https://arxiv.org/pdf/1911.07167v2.pdf
PWC https://paperswithcode.com/paper/low-weight-and-learnable-image-denoising
Repo
Framework

Challenges with Extreme Class-Imbalance and Temporal Coherence: A Study on Solar Flare Data

Title Challenges with Extreme Class-Imbalance and Temporal Coherence: A Study on Solar Flare Data
Authors Azim Ahmadzadeh, Maxwell Hostetter, Berkay Aydin, Manolis K. Georgoulis, Dustin J. Kempton, Sushant S. Mahajan, Rafal A. Angryk
Abstract In analyses of rare events, regardless of the domain of application, the class-imbalance issue is intrinsic. Although the challenges are known to data experts, their explicit impact on the analysis and on the decisions made based on the findings is often overlooked. This is particularly prevalent in interdisciplinary research, where the theoretical aspects are sometimes overshadowed by the challenges of the application. To showcase these undesirable impacts, we conduct a series of experiments on a recently created benchmark dataset named Space Weather ANalytics for Solar Flares (SWAN-SF). This is a multivariate time series dataset of magnetic parameters of active regions. As a remedy for the imbalance issue, we study the impact of data manipulation (undersampling and oversampling) and model manipulation (using class weights). Furthermore, we bring into focus the auto-correlation of time series that is inherited from the use of a sliding window for monitoring flares’ history. Temporal coherence, as we call this phenomenon, invalidates the randomness assumption, thus impacting all sampling practices, including different cross-validation techniques. We illustrate how failing to notice this concept could give an artificial boost in forecast performance and result in misleading findings. Throughout this study we utilize a Support Vector Machine as the classifier and the True Skill Statistic as the verification metric for comparison of experiments. We conclude our work by specifying the correct practice in each case, and we hope that this study will benefit researchers in other domains where time series of rare events are of interest.
Tasks Time Series
Published 2019-11-20
URL https://arxiv.org/abs/1911.09061v1
PDF https://arxiv.org/pdf/1911.09061v1.pdf
PWC https://paperswithcode.com/paper/challenges-with-extreme-class-imbalance-and
Repo
Framework
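
As a rough illustration of why the abstract above pairs extreme class imbalance with the True Skill Statistic, the sketch below computes TSS from a confusion matrix using its standard definition, TP/(TP+FN) - FP/(FP+TN), and compares it with plain accuracy. The function names and the confusion-matrix counts are made up for illustration and are not taken from the paper or from SWAN-SF.

```python
def true_skill_statistic(tp, fn, fp, tn):
    """TSS = recall - false alarm rate = TP/(TP+FN) - FP/(FP+TN)."""
    if tp + fn == 0 or fp + tn == 0:
        raise ValueError("both classes must be present to compute TSS")
    return tp / (tp + fn) - fp / (fp + tn)


def accuracy(tp, fn, fp, tn):
    """Fraction of all examples that were classified correctly."""
    return (tp + tn) / (tp + fn + fp + tn)


# Hypothetical counts: 10 rare positive events among 1000 examples,
# scored by a classifier that always predicts the negative class.
always_negative = dict(tp=0, fn=10, fp=0, tn=990)

print("accuracy:", accuracy(**always_negative))
print("TSS:", true_skill_statistic(**always_negative))
```

With these invented counts, accuracy looks excellent even though no rare event is ever detected, while TSS stays at zero. That is the kind of misleading boost an imbalance-insensitive metric can hide.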

A Brief Survey of Multilingual Neural Machine Translation

Title A Brief Survey of Multilingual Neural Machine Translation
Authors Raj Dabre, Chenhui Chu, Anoop Kunchukuttan
Abstract We present a survey on multilingual neural machine translation (MNMT), which has gained a lot of traction in recent years. MNMT has been useful in improving translation quality as a result of knowledge transfer. MNMT is more promising and interesting than its statistical machine translation counterpart because end-to-end modeling and distributed representations open new avenues. Many approaches have been proposed in order to exploit multilingual parallel corpora for improving translation quality. However, the lack of a comprehensive survey makes it difficult to determine which approaches are promising and hence deserve further exploration. In this paper, we present an in-depth survey of existing literature on MNMT. We categorize various approaches based on the resource scenarios as well as underlying modeling principles. We hope this paper will serve as a starting point for researchers and engineers interested in MNMT.
Tasks Machine Translation, Transfer Learning
Published 2019-05-14
URL https://arxiv.org/abs/1905.05395v3
PDF https://arxiv.org/pdf/1905.05395v3.pdf
PWC https://paperswithcode.com/paper/a-survey-of-multilingual-neural-machine
Repo
Framework

Shape-Aware Organ Segmentation by Predicting Signed Distance Maps

Title Shape-Aware Organ Segmentation by Predicting Signed Distance Maps
Authors Yuan Xue, Hui Tang, Zhi Qiao, Guanzhong Gong, Yong Yin, Zhen Qian, Chao Huang, Wei Fan, Xiaolei Huang
Abstract In this work, we propose to resolve an issue in current deep learning based organ segmentation systems: they often produce results that do not capture the overall shape of the target organ and often lack smoothness. Since there is a rigorous mapping between the Signed Distance Map (SDM) calculated from object boundary contours and the binary segmentation map, we exploit the feasibility of learning the SDM directly from medical scans. By converting the segmentation task into predicting an SDM, we show that our proposed method retains superior segmentation performance and has better smoothness and continuity in shape. To leverage the complementary information in traditional segmentation training, we introduce an approximated Heaviside function to train the model by predicting SDMs and segmentation maps simultaneously. We validate our proposed models by conducting extensive experiments on a hippocampus segmentation dataset and the public MICCAI 2015 Head and Neck Auto Segmentation Challenge dataset with multiple organs. While our carefully designed backbone 3D segmentation network improves the Dice coefficient by more than 5% compared to the current state of the art, the proposed model with SDM learning produces smoother segmentation results with smaller Hausdorff distance and average surface distance, thus proving the effectiveness of our method.
Tasks
Published 2019-12-09
URL https://arxiv.org/abs/1912.03849v1
PDF https://arxiv.org/pdf/1912.03849v1.pdf
PWC https://paperswithcode.com/paper/shape-aware-organ-segmentation-by-predicting
Repo
Framework
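
To make the mapping between a binary segmentation map and a signed distance map more concrete, here is a minimal brute-force sketch on a tiny 2D grid. It is not the authors' implementation. The sign convention (negative inside the object, positive outside), the pixel-to-nearest-opposite-pixel distance, and the function names are all assumptions chosen for illustration.

```python
import math


def signed_distance_map(mask):
    """Brute-force SDM for a small 2D binary mask (lists of 0/1).

    Assumed convention: inside pixels get minus the distance to the nearest
    background pixel, outside pixels get plus the distance to the nearest
    foreground pixel.
    """
    rows, cols = len(mask), len(mask[0])
    cells = [(r, c) for r in range(rows) for c in range(cols)]
    foreground = [(r, c) for r, c in cells if mask[r][c]]
    background = [(r, c) for r, c in cells if not mask[r][c]]
    if not foreground or not background:
        raise ValueError("mask must contain both foreground and background")

    sdm = [[0.0] * cols for _ in range(rows)]
    for r, c in cells:
        inside = bool(mask[r][c])
        targets = background if inside else foreground
        dist = min(math.hypot(r - tr, c - tc) for tr, tc in targets)
        sdm[r][c] = -dist if inside else dist
    return sdm


def mask_from_sdm(sdm):
    """Recover the binary mask by thresholding the SDM at zero."""
    return [[1 if value < 0 else 0 for value in row] for row in sdm]


# A 5x5 grid with a 3x3 object in the middle (illustrative data only).
mask = [[1 if 1 <= r <= 3 and 1 <= c <= 3 else 0 for c in range(5)]
        for r in range(5)]
sdm = signed_distance_map(mask)
print(mask_from_sdm(sdm) == mask)
```

The brute-force search is quadratic in the number of pixels, so it only suits toy inputs. For real volumes, a Euclidean distance transform routine would be the usual choice.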

Warping Resilient Time Series Embeddings

Title Warping Resilient Time Series Embeddings
Authors Anish Mathew, Deepak P, Sahely Bhadra
Abstract Time series are ubiquitous in real-world problems, and computing the distance between two time series is often required in several learning tasks. Computing similarity between time series while ignoring variations in speed or warping is often required, and dynamic time warping (DTW) is the state of the art. However, DTW is not applicable in algorithms which require kernels or vectors. In this paper, we propose a mechanism named WaRTEm to generate vector embeddings of time series such that distance measures in the embedding space exhibit resilience to warping. Therefore, WaRTEm is more widely applicable than DTW. WaRTEm is based on a twin auto-encoder architecture and a training strategy involving warping operators for generating warping-resilient embeddings for time series datasets. We evaluate the performance of WaRTEm and observe more than 20% improvement over DTW on multiple real-world datasets.
Tasks Time Series
Published 2019-06-12
URL https://arxiv.org/abs/1906.05205v1
PDF https://arxiv.org/pdf/1906.05205v1.pdf
PWC https://paperswithcode.com/paper/warping-resilient-time-series-embeddings
Repo
Framework
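
For readers unfamiliar with the DTW baseline that the abstract above compares against, the following is a minimal sketch of the classic dynamic-programming formulation. It is not WaRTEm itself. The absolute-difference local cost, the absence of a warping window, and the function name are assumptions made for illustration.

```python
import math


def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) dynamic time warping distance.

    Uses |x - y| as the local cost and allows the three standard moves
    (match, insertion, deletion) with no warping-window constraint.
    """
    n, m = len(a), len(b)
    if n == 0 or m == 0:
        raise ValueError("both series must be non-empty")

    # cost[i][j] = best alignment cost of a[:i] against b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(cost[i - 1][j],      # a advances alone
                                     cost[i][j - 1],      # b advances alone
                                     cost[i - 1][j - 1])  # both advance
    return cost[n][m]


# The second series is a time-stretched copy of the first (toy data).
print(dtw_distance([1, 2, 3], [1, 1, 2, 3, 3]))
```

Because DTW returns only a pairwise distance and no fixed-length vector, it cannot be used directly by methods that expect vector inputs, which is the gap the abstract says WaRTEm's embeddings aim to fill.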

ResNet Can Be Pruned 60x: Introducing Network Purification and Unused Path Removal (P-RM) after Weight Pruning

Title ResNet Can Be Pruned 60x: Introducing Network Purification and Unused Path Removal (P-RM) after Weight Pruning
Authors Xiaolong Ma, Geng Yuan, Sheng Lin, Zhengang Li, Hao Sun, Yanzhi Wang
Abstract State-of-the-art DNN structures involve heavy computation and a great demand for memory storage, which pose an intensive challenge to DNN framework resources. To mitigate these challenges, weight pruning techniques have been studied. However, a high-accuracy solution for extreme structured pruning that combines different types of structured sparsity is still awaiting discovery, owing to the extremely reduced number of weights in DNN networks. In this paper, we propose a DNN framework which combines two different types of structured weight pruning (filter and column pruning) by incorporating the alternating direction method of multipliers (ADMM) algorithm for better pruning performance. We are the first to find the non-optimality of the ADMM process and unused weights in a structured pruned model, and we further design an optimization framework containing the first proposed Network Purification and Unused Path Removal algorithms, which are dedicated to post-processing a structured pruned model after the ADMM steps. Highlights show that we achieve 232x compression on LeNet-5, 60x compression on ResNet-18 CIFAR-10 and over 5x compression on AlexNet. We share our models at the anonymous link http://bit.ly/2VJ5ktv.
Tasks
Published 2019-04-30
URL http://arxiv.org/abs/1905.00136v1
PDF http://arxiv.org/pdf/1905.00136v1.pdf
PWC https://paperswithcode.com/paper/resnet-can-be-pruned-60x-introducing-network
Repo
Framework

Profiling based Out-of-core Hybrid Method for Large Neural Networks

Title Profiling based Out-of-core Hybrid Method for Large Neural Networks
Authors Yuki Ito, Haruki Imai, Tung Le Duc, Yasushi Negishi, Kiyokuni Kawachiya, Ryo Matsumiya, Toshio Endo
Abstract GPUs are widely used to accelerate deep learning with neural networks (NNs). On the other hand, since GPU memory capacity is limited, it is difficult to implement efficient programs that compute large NNs on a GPU. To compute NNs exceeding GPU memory capacity, data-swapping and recomputing methods have been proposed in existing work. However, in these methods, performance overhead occurs due to data movement or increased computation. In order to reduce the overhead, it is important to consider the characteristics of each layer, such as its size and the cost of recomputation. Based on this direction, we propose the Profiling based out-of-core Hybrid method (PoocH). PoocH determines target layers for swapping or recomputing based on runtime profiling. We implemented PoocH by extending a deep learning framework, Chainer, and we evaluated its performance. With PoocH, we successfully computed an NN requiring 50 GB of memory on a single GPU with 16 GB of memory. Compared with in-core cases, performance degradation was 38% on an x86 machine and 28% on a POWER9 machine.
Tasks
Published 2019-07-11
URL https://arxiv.org/abs/1907.05013v1
PDF https://arxiv.org/pdf/1907.05013v1.pdf
PWC https://paperswithcode.com/paper/profiling-based-out-of-core-hybrid-method-for
Repo
Framework

Structured3D: A Large Photo-realistic Dataset for Structured 3D Modeling

Title Structured3D: A Large Photo-realistic Dataset for Structured 3D Modeling
Authors Jia Zheng, Junfei Zhang, Jing Li, Rui Tang, Shenghua Gao, Zihan Zhou
Abstract Recently, there has been growing interest in developing learning-based methods to detect and utilize salient semi-global or global structures, such as junctions, lines, planes, cuboids, smooth surfaces, and all types of symmetries, for 3D scene modeling and understanding. However, the ground truth annotations are often obtained via human labor, which is particularly challenging and inefficient for such tasks due to the large number of 3D structure instances (e.g., line segments) and other factors such as viewpoints and occlusions. In this paper, we present a new synthetic dataset, Structured3D, with the aim of providing large-scale photo-realistic images with rich 3D structure annotations for a wide spectrum of structured 3D modeling tasks. We take advantage of the availability of millions of professional interior designs and automatically extract 3D structures from them. We generate high-quality images with an industry-leading rendering engine. We use our synthetic dataset in combination with real images to train deep networks for room layout estimation and demonstrate improved performance on benchmark datasets.
Tasks Room Layout Estimation
Published 2019-08-01
URL https://arxiv.org/abs/1908.00222v2
PDF https://arxiv.org/pdf/1908.00222v2.pdf
PWC https://paperswithcode.com/paper/structured3d-a-large-photo-realistic-dataset
Repo
Framework

Deep Relevance Regularization: Interpretable and Robust Tumor Typing of Imaging Mass Spectrometry Data

Title Deep Relevance Regularization: Interpretable and Robust Tumor Typing of Imaging Mass Spectrometry Data
Authors Christian Etmann, Maximilian Schmidt, Jens Behrmann, Tobias Boskamp, Lena Hauberg-Lotte, Annette Peter, Rita Casadonte, Jörg Kriegsmann, Peter Maass
Abstract Neural networks have recently been established as a viable classification method for imaging mass spectrometry data for tumor typing. For multi-laboratory scenarios, however, certain confounding factors may strongly impede their performance. In this work, we introduce Deep Relevance Regularization, a method of restricting what the neural network can focus on during classification, in order to improve the classification performance. We demonstrate how Deep Relevance Regularization robustifies neural networks against confounding factors on a challenging inter-lab dataset consisting of breast and ovarian carcinoma. We further show that this makes the relevance map – a way of visualizing the discriminative parts of the mass spectrum – sparser, thereby making the classifier easier to interpret.
Tasks
Published 2019-12-10
URL https://arxiv.org/abs/1912.05459v1
PDF https://arxiv.org/pdf/1912.05459v1.pdf
PWC https://paperswithcode.com/paper/deep-relevance-regularization-interpretable
Repo
Framework