Paper Group ANR 756
Towards AutoML in the presence of Drift: first results
Title | Towards AutoML in the presence of Drift: first results |
Authors | Jorge G. Madrid, Hugo Jair Escalante, Eduardo F. Morales, Wei-Wei Tu, Yang Yu, Lisheng Sun-Hosoya, Isabelle Guyon, Michele Sebag |
Abstract | Research progress in AutoML has led to state-of-the-art solutions that can cope quite well with supervised learning tasks, e.g., classification with AutoSklearn. However, so far these systems do not take into account the changing nature of data that evolves over time (i.e., they still assume i.i.d. data), even though such domains are increasingly common in real applications (e.g., spam filtering, user preferences, etc.). We describe a first attempt to develop an AutoML solution for scenarios in which the data distribution changes relatively slowly over time and the problem is approached in a lifelong learning setting. We extend Auto-Sklearn with sound and intuitive mechanisms that allow it to cope with this sort of problem. The extended Auto-Sklearn is combined with concept drift detection techniques that allow it to automatically determine when the initial models have to be adapted. We report experimental results on benchmark data from AutoML competitions that adhere to this scenario. The results demonstrate the effectiveness of the proposed methodology. |
Tasks | AutoML |
Published | 2019-07-24 |
URL | https://arxiv.org/abs/1907.10772v1 |
https://arxiv.org/pdf/1907.10772v1.pdf | |
PWC | https://paperswithcode.com/paper/towards-automl-in-the-presence-of-drift-first |
Repo | |
Framework | |
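The mechanism the abstract above describes (detect drift, then adapt the previously found model) can be illustrated with a toy sketch. Everything below is assumed for illustration, not taken from the paper: the accuracy-drop detector, its thresholds, and the random forest standing in for the Auto-Sklearn pipeline.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def drift_detected(acc_history, window=5, tol=0.10):
    """Toy detector: fire when mean accuracy over the latest window drops
    well below the mean accuracy seen before it."""
    if len(acc_history) < 2 * window:
        return False
    return np.mean(acc_history[:-window]) - np.mean(acc_history[-window:]) > tol

def lifelong_loop(batches):
    """batches: iterator of (X, y) batches arriving in time order."""
    model, history = RandomForestClassifier(), []
    X0, y0 = next(batches)
    model.fit(X0, y0)                      # initial model (found by AutoML)
    for X, y in batches:
        history.append(model.score(X, y))  # test-then-train evaluation
        if drift_detected(history):
            model.fit(X, y)                # stand-in for re-running the AutoML search
            history.clear()
```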
The Deep Learning Revolution and Its Implications for Computer Architecture and Chip Design
Title | The Deep Learning Revolution and Its Implications for Computer Architecture and Chip Design |
Authors | Jeffrey Dean |
Abstract | The past decade has seen a remarkable series of advances in machine learning, and in particular deep learning approaches based on artificial neural networks, to improve our abilities to build more accurate systems across a broad range of areas, including computer vision, speech recognition, language translation, and natural language understanding tasks. This paper is a companion paper to a keynote talk at the 2020 International Solid-State Circuits Conference (ISSCC) discussing some of the advances in machine learning, and their implications for the kinds of computational devices we need to build, especially in the post-Moore’s Law era. It also discusses some of the ways that machine learning may be able to help with some aspects of the circuit design process. Finally, it provides a sketch of at least one interesting direction towards much larger-scale multi-task models that are sparsely activated and employ much more dynamic, example- and task-based routing than the machine learning models of today. |
Tasks | Speech Recognition |
Published | 2019-11-13 |
URL | https://arxiv.org/abs/1911.05289v1 |
https://arxiv.org/pdf/1911.05289v1.pdf | |
PWC | https://paperswithcode.com/paper/the-deep-learning-revolution-and-its |
Repo | |
Framework | |
SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization
Title | SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization |
Authors | Xianzhi Du, Tsung-Yi Lin, Pengchong Jin, Golnaz Ghiasi, Mingxing Tan, Yin Cui, Quoc V. Le, Xiaodan Song |
Abstract | Convolutional neural networks typically encode an input image into a series of intermediate features with decreasing resolutions. While this structure is suited to classification tasks, it does not perform well for tasks requiring simultaneous recognition and localization (e.g., object detection). Encoder-decoder architectures have been proposed to resolve this by applying a decoder network onto a backbone model designed for classification tasks. In this paper, we argue that the encoder-decoder architecture is ineffective at generating strong multi-scale features because of the scale-decreased backbone. We propose SpineNet, a backbone with scale-permuted intermediate features and cross-scale connections that is learned on an object detection task by Neural Architecture Search. SpineNet achieves the state-of-the-art performance of a one-stage object detector on COCO with 60% less computation, and outperforms ResNet-FPN counterparts by 6% AP. The SpineNet architecture can transfer to classification tasks, achieving a 6% top-1 accuracy improvement on the challenging iNaturalist fine-grained dataset. |
Tasks | Neural Architecture Search, Object Detection |
Published | 2019-12-10 |
URL | https://arxiv.org/abs/1912.05027v1 |
https://arxiv.org/pdf/1912.05027v1.pdf | |
PWC | https://paperswithcode.com/paper/spinenet-learning-scale-permuted-backbone-for |
Repo | |
Framework | |
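The building block that makes scale permutation possible is the cross-scale connection: parent feature maps at arbitrary resolutions and widths are resampled to a target scale and merged. Below is a hypothetical PyTorch sketch of that resampling-and-fusion step, not the released SpineNet code, whose resampling and block types differ in detail.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossScaleFusion(nn.Module):
    """Resample two parent feature maps to a target scale/width and merge them."""
    def __init__(self, c1, c2, c_out):
        super().__init__()
        self.proj1 = nn.Conv2d(c1, c_out, kernel_size=1)   # match channel widths
        self.proj2 = nn.Conv2d(c2, c_out, kernel_size=1)

    def forward(self, f1, f2, out_hw):
        # bilinear resize matches spatial scales before fusion
        f1 = F.interpolate(self.proj1(f1), size=out_hw, mode="bilinear",
                           align_corners=False)
        f2 = F.interpolate(self.proj2(f2), size=out_hw, mode="bilinear",
                           align_corners=False)
        return F.relu(f1 + f2)

fuse = CrossScaleFusion(256, 512, 256)
hi = torch.randn(1, 256, 64, 64)      # higher-resolution parent
lo = torch.randn(1, 512, 16, 16)      # lower-resolution parent
out = fuse(hi, lo, out_hw=(32, 32))   # target scale chosen by the searched permutation
```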
Learning Efficient Multi-agent Communication: An Information Bottleneck Approach
Title | Learning Efficient Multi-agent Communication: An Information Bottleneck Approach |
Authors | Rundong Wang, Xu He, Runsheng Yu, Wei Qiu, Bo An, Zinovi Rabinovich |
Abstract | Many real-world multi-agent reinforcement learning applications require agents to communicate, assisted by a communication protocol. These applications face a common and critical issue: the limited bandwidth of communication constrains agents’ ability to cooperate successfully. In this paper, rather than proposing a fixed communication protocol, we develop an Informative Multi-Agent Communication (IMAC) method to learn efficient communication protocols. Our contributions are threefold. First, we observe that limited bandwidth translates into a constraint on the entropy of the communicated messages, thus paving the way for controlling the bandwidth. Second, we introduce a customized batch-norm layer, which controls the messages’ entropy to simulate the limited-bandwidth constraint. Third, we apply the information bottleneck method to discover the optimal communication protocol, which can satisfy the bandwidth constraint via training with the prior distribution in the method. To demonstrate the efficacy of our method, we conduct extensive experiments on various cooperative and competitive multi-agent tasks across two dimensions: the number of agents and different bandwidths. We show that IMAC converges fast and leads to efficient communication among agents under the limited-bandwidth constraint, as compared to many baseline methods. |
Tasks | Multi-agent Reinforcement Learning |
Published | 2019-11-16 |
URL | https://arxiv.org/abs/1911.06992v1 |
https://arxiv.org/pdf/1911.06992v1.pdf | |
PWC | https://paperswithcode.com/paper/learning-efficient-multi-agent-communication |
Repo | |
Framework | |
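The second contribution above can be made concrete: for an approximately Gaussian message, differential entropy grows with variance, so normalizing each message dimension to a fixed standard deviation caps the entropy and thereby emulates a bandwidth limit. A rough PyTorch sketch follows, with all names and constants assumed rather than taken from the IMAC code.

```python
import math
import torch
import torch.nn as nn

class EntropyLimitedMessage(nn.Module):
    """Batch-norm-style layer: normalize messages, then rescale to a fixed sigma.

    For a roughly Gaussian d-dimensional message, differential entropy is
    0.5 * d * log(2*pi*e*sigma^2), so fixing sigma caps the entropy."""
    def __init__(self, sigma=0.5, eps=1e-5):
        super().__init__()
        self.sigma, self.eps = sigma, eps

    def forward(self, msg):                       # msg: (batch, dim)
        mu = msg.mean(dim=0, keepdim=True)
        std = msg.std(dim=0, keepdim=True)
        return (msg - mu) / (std + self.eps) * self.sigma

    def entropy_cap(self, dim):
        return 0.5 * dim * math.log(2 * math.pi * math.e * self.sigma ** 2)

layer = EntropyLimitedMessage(sigma=0.5)
messages = torch.randn(32, 8) * 3.0 + 1.0
bounded = layer(messages)                        # std ~ 0.5 per coordinate
```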
EDUCE: Explaining model Decisions through Unsupervised Concepts Extraction
Title | EDUCE: Explaining model Decisions through Unsupervised Concepts Extraction |
Authors | Diane Bouchacourt, Ludovic Denoyer |
Abstract | Providing explanations along with predictions is crucial in some text processing tasks. Therefore, we propose a new self-interpretable model that performs output prediction and simultaneously provides an explanation in terms of the presence of particular concepts in the input. To do so, our model’s prediction relies solely on a low-dimensional binary representation of the input, where each feature denotes the presence or absence of a concept. The presence of a concept is decided from an excerpt, i.e., a small sequence of consecutive words in the text. Relevant concepts for the prediction task at hand are automatically defined by our model, avoiding the need for concept-level annotations. To ease interpretability, we enforce that for each concept the corresponding excerpts share similar semantics and are distinguishable from each other. We experimentally demonstrate the relevance of our approach on text classification and multi-sentiment analysis tasks. |
Tasks | Sentiment Analysis, Text Classification |
Published | 2019-05-28 |
URL | https://arxiv.org/abs/1905.11852v2 |
https://arxiv.org/pdf/1905.11852v2.pdf | |
PWC | https://paperswithcode.com/paper/educe-explaining-model-decisions-through |
Repo | |
Framework | |
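One way to picture the model described above: each concept scores every excerpt (a short window of words), the maximum score over positions decides presence, and the classifier sees only the resulting binary vector. A hypothetical PyTorch sketch under those assumptions; the paper's concept extractor and training objective differ in detail.

```python
import torch
import torch.nn as nn

class ConceptClassifier(nn.Module):
    """Predict from a binary 'which concepts are present' vector.

    Each concept scores every width-5 window of word embeddings with a 1-D
    conv; the max over positions says whether the concept appears anywhere."""
    def __init__(self, emb_dim, n_concepts, n_classes, width=5):
        super().__init__()
        self.scorer = nn.Conv1d(emb_dim, n_concepts, kernel_size=width)
        self.classify = nn.Linear(n_concepts, n_classes)

    def forward(self, embeddings):               # (batch, emb_dim, seq_len)
        scores = self.scorer(embeddings)         # (batch, n_concepts, positions)
        presence = torch.sigmoid(scores.max(dim=2).values)
        binary = (presence > 0.5).float()
        # straight-through estimator keeps the binarization differentiable
        binary = binary + presence - presence.detach()
        return self.classify(binary)

model = ConceptClassifier(emb_dim=300, n_concepts=16, n_classes=2)
logits = model(torch.randn(4, 300, 120))         # 4 texts of 120 tokens
```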
Predicting the Politics of an Image Using Webly Supervised Data
Title | Predicting the Politics of an Image Using Webly Supervised Data |
Authors | Christopher Thomas, Adriana Kovashka |
Abstract | The news media shape public opinion, and often, the visual bias they contain is evident to human observers. This bias can be inferred from how different media sources portray different subjects or topics. In this paper, we model visual political bias in contemporary media sources at scale, using webly supervised data. We collect a dataset of over one million unique images and associated news articles from left- and right-leaning news sources, and develop a method to predict an image’s political leaning. This problem is particularly challenging because of the enormous intra-class visual and semantic diversity of our data. We propose a two-stage method to tackle this problem. In the first stage, the model is forced to learn relevant visual concepts that, when joined with document embeddings computed from articles paired with the images, enable the model to predict bias. In the second stage, we remove the requirement of the text domain and train a visual classifier from the features of the former model. We show this two-stage approach facilitates learning and outperforms several strong baselines. We also present extensive qualitative results demonstrating the nuances of the data. |
Tasks | |
Published | 2019-10-31 |
URL | https://arxiv.org/abs/1911.00147v1 |
https://arxiv.org/pdf/1911.00147v1.pdf | |
PWC | https://paperswithcode.com/paper/predicting-the-politics-of-an-image-using |
Repo | |
Framework | |
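The two-stage method can be sketched at the architecture level: stage 1 predicts leaning from image features joined with document embeddings, and stage 2 reuses the visual pathway alone. All dimensions and module names below are hypothetical.

```python
import torch
import torch.nn as nn

class Stage1Joint(nn.Module):
    """Stage 1: predict left/right leaning from image features concatenated
    with a document embedding of the paired article."""
    def __init__(self, img_dim=2048, doc_dim=300, hid=512):
        super().__init__()
        self.img_proj = nn.Linear(img_dim, hid)   # learns bias-relevant visual concepts
        self.head = nn.Linear(hid + doc_dim, 2)

    def forward(self, img_feat, doc_emb):
        v = torch.relu(self.img_proj(img_feat))
        return self.head(torch.cat([v, doc_emb], dim=1))

class Stage2Visual(nn.Module):
    """Stage 2: drop the text requirement; classify from the visual features
    learned in stage 1 alone."""
    def __init__(self, stage1, hid=512):
        super().__init__()
        self.img_proj = stage1.img_proj           # reuse stage-1 visual pathway
        self.head = nn.Linear(hid, 2)

    def forward(self, img_feat):
        return self.head(torch.relu(self.img_proj(img_feat)))

stage1 = Stage1Joint()
stage2 = Stage2Visual(stage1)   # trained on images only, no articles needed
```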
LIDIA: Lightweight Learned Image Denoising with Instance Adaptation
Title | LIDIA: Lightweight Learned Image Denoising with Instance Adaptation |
Authors | Gregory Vaksman, Michael Elad, Peyman Milanfar |
Abstract | Image denoising is a well-studied problem, with extensive research activity spread over several decades. Despite the many available denoising algorithms, the quest for simple, powerful and fast denoisers is still an active and vibrant topic of research. Leading classical denoising methods are typically designed to exploit the inner structure of images by modeling local overlapping patches, while operating in an unsupervised fashion. In contrast, recent newcomers to this arena are supervised, universal, neural-network-based methods that bypass this modeling altogether, targeting the inference goal directly and globally, while tending to be very deep and parameter heavy. This work proposes a novel lightweight learnable architecture for image denoising, and presents a combination of supervised and unsupervised training of it, the first aiming for a universal denoiser and the second for adapting it to the incoming image. Our architecture embeds several of the main concepts taken from classical methods, relying on patch processing, leveraging non-local self-similarity, exploiting representation sparsity and providing a multiscale treatment. Our proposed universal denoiser achieves near state-of-the-art results while using a small fraction of the typical number of parameters. In addition, we introduce and demonstrate two highly effective ways to further boost denoising performance by adapting this universal network to the input image. |
Tasks | Denoising, Image Denoising |
Published | 2019-11-17 |
URL | https://arxiv.org/abs/1911.07167v2 |
https://arxiv.org/pdf/1911.07167v2.pdf | |
PWC | https://paperswithcode.com/paper/low-weight-and-learnable-image-denoising |
Repo | |
Framework | |
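Instance adaptation, the second of the two boosting mechanisms mentioned above, can be illustrated with one plausible recipe: fine-tune the pretrained denoiser on the incoming image itself by adding extra synthetic noise and training the network to undo it, which needs no clean ground truth. This is a hedged sketch, not necessarily the paper's exact adaptation scheme.

```python
import torch
import torch.nn.functional as F

def adapt_to_image(denoiser, noisy, sigma_extra=15 / 255., steps=50, lr=1e-5):
    """Fine-tune a pretrained denoiser on the incoming noisy image.

    Corrupt the (already noisy) input with extra synthetic noise and train
    the network to map the noisier version back to the input."""
    opt = torch.optim.Adam(denoiser.parameters(), lr=lr)
    for _ in range(steps):
        noisier = noisy + sigma_extra * torch.randn_like(noisy)
        loss = F.mse_loss(denoiser(noisier), noisy)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return denoiser(noisy).detach()   # adapted prediction for this image
```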
Challenges with Extreme Class-Imbalance and Temporal Coherence: A Study on Solar Flare Data
Title | Challenges with Extreme Class-Imbalance and Temporal Coherence: A Study on Solar Flare Data |
Authors | Azim Ahmadzadeh, Maxwell Hostetter, Berkay Aydin, Manolis K. Georgoulis, Dustin J. Kempton, Sushant S. Mahajan, Rafal A. Angryk |
Abstract | In analyses of rare events, regardless of the domain of application, the class-imbalance issue is intrinsic. Although the challenges are known to data experts, their explicit impact on the analysis and on the decisions made based on the findings is often overlooked. This is particularly prevalent in interdisciplinary research, where the theoretical aspects are sometimes overshadowed by the challenges of the application. To showcase these undesirable impacts, we conduct a series of experiments on a recently created benchmark dataset named Space Weather ANalytics for Solar Flares (SWAN-SF), a multivariate time series dataset of magnetic parameters of active regions. As remedies for the imbalance issue, we study the impact of data manipulation (undersampling and oversampling) and model manipulation (using class weights). Furthermore, we bring into focus the auto-correlation of time series that is inherited from the use of sliding windows for monitoring flares’ history. Temporal coherence, as we call this phenomenon, invalidates the randomness assumption, thus impacting all sampling practices, including different cross-validation techniques. We illustrate how failing to notice this concept can give an artificial boost in forecast performance and result in misleading findings. Throughout this study we utilized a Support Vector Machine as the classifier and the True Skill Statistic as the verification metric for comparing experiments. We conclude our work by specifying the correct practice in each case, and we hope this study can benefit researchers in other domains where time series of rare events are of interest. |
Tasks | Time Series |
Published | 2019-11-20 |
URL | https://arxiv.org/abs/1911.09061v1 |
https://arxiv.org/pdf/1911.09061v1.pdf | |
PWC | https://paperswithcode.com/paper/challenges-with-extreme-class-imbalance-and |
Repo | |
Framework | |
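Two of the concrete practices the abstract refers to are easy to make precise: the class-weighting remedy and the True Skill Statistic (TSS). A short scikit-learn sketch with synthetic stand-in data and a chronological split that respects temporal coherence:

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.metrics import confusion_matrix

def true_skill_statistic(y_true, y_pred):
    """TSS = recall on flares + recall on quiet regions - 1.

    Unlike accuracy, TSS is insensitive to the class-imbalance ratio."""
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    return tp / (tp + fn) + tn / (tn + fp) - 1.0

# synthetic stand-in for time-ordered SWAN-SF-like data (~8% positives)
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 8))
y = (X[:, 0] + 0.5 * rng.normal(size=1000) > 1.6).astype(int)

# model manipulation: class_weight='balanced' reweights errors inversely
# to class frequency, one of the remedies compared in the paper
clf = SVC(kernel="rbf", class_weight="balanced")

# temporal coherence: split chronologically instead of shuffling, so that
# near-duplicate sliding windows never straddle the train/test boundary
split = int(0.7 * len(X))
clf.fit(X[:split], y[:split])
print(true_skill_statistic(y[split:], clf.predict(X[split:])))
```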
A Brief Survey of Multilingual Neural Machine Translation
Title | A Brief Survey of Multilingual Neural Machine Translation |
Authors | Raj Dabre, Chenhui Chu, Anoop Kunchukuttan |
Abstract | We present a survey on multilingual neural machine translation (MNMT), which has gained a lot of traction in recent years. MNMT has been useful in improving translation quality as a result of knowledge transfer. MNMT is more promising and interesting than its statistical machine translation counterpart because end-to-end modeling and distributed representations open new avenues. Many approaches have been proposed in order to exploit multilingual parallel corpora for improving translation quality. However, the lack of a comprehensive survey makes it difficult to determine which approaches are promising and hence deserve further exploration. In this paper, we present an in-depth survey of existing literature on MNMT. We categorize various approaches based on the resource scenarios as well as underlying modeling principles. We hope this paper will serve as a starting point for researchers and engineers interested in MNMT. |
Tasks | Machine Translation, Transfer Learning |
Published | 2019-05-14 |
URL | https://arxiv.org/abs/1905.05395v3 |
https://arxiv.org/pdf/1905.05395v3.pdf | |
PWC | https://paperswithcode.com/paper/a-survey-of-multilingual-neural-machine |
Repo | |
Framework | |
Shape-Aware Organ Segmentation by Predicting Signed Distance Maps
Title | Shape-Aware Organ Segmentation by Predicting Signed Distance Maps |
Authors | Yuan Xue, Hui Tang, Zhi Qiao, Guanzhong Gong, Yong Yin, Zhen Qian, Chao Huang, Wei Fan, Xiaolei Huang |
Abstract | In this work, we address an issue with current deep-learning-based organ segmentation systems: they often produce results that do not capture the overall shape of the target organ and often lack smoothness. Since there is a rigorous mapping between the Signed Distance Map (SDM) calculated from object boundary contours and the binary segmentation map, we explore the feasibility of learning the SDM directly from medical scans. By converting the segmentation task into predicting an SDM, we show that our proposed method retains superior segmentation performance and produces shapes with better smoothness and continuity. To leverage the complementary information in traditional segmentation training, we introduce an approximated Heaviside function to train the model by predicting SDMs and segmentation maps simultaneously. We validate our proposed models through extensive experiments on a hippocampus segmentation dataset and the public MICCAI 2015 Head and Neck Auto Segmentation Challenge dataset with multiple organs. While our carefully designed backbone 3D segmentation network improves the Dice coefficient by more than 5% compared to the current state of the art, the proposed model with SDM learning produces smoother segmentation results with smaller Hausdorff distance and average surface distance, thus proving the effectiveness of our method. |
Tasks | |
Published | 2019-12-09 |
URL | https://arxiv.org/abs/1912.03849v1 |
https://arxiv.org/pdf/1912.03849v1.pdf | |
PWC | https://paperswithcode.com/paper/shape-aware-organ-segmentation-by-predicting |
Repo | |
Framework | |
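The two ingredients named in the abstract, the SDM itself and the approximated Heaviside function that converts a predicted SDM back into a soft segmentation map, can be sketched in a few lines. The sign convention and the exact smooth-step form are assumptions here; the paper's choices may differ.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt
from scipy.special import expit

def signed_distance_map(mask):
    """SDM of a binary mask: negative inside the organ, positive outside,
    zero on the boundary (one common sign convention)."""
    inside = distance_transform_edt(mask)        # distance to boundary, inside
    outside = distance_transform_edt(1 - mask)   # distance to boundary, outside
    return outside - inside

def approx_heaviside(sdm, k=50.0):
    """Smooth step mapping a normalized SDM to a soft segmentation map, so a
    Dice-style loss can be applied alongside the SDM regression loss."""
    return expit(-k * sdm)                       # ~1 where sdm < 0 (inside)

mask = np.zeros((64, 64), dtype=np.uint8)
mask[20:40, 20:40] = 1                           # toy "organ"
sdm = signed_distance_map(mask)
soft = approx_heaviside(sdm / np.abs(sdm).max())  # normalize before the step
```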
Warping Resilient Time Series Embeddings
Title | Warping Resilient Time Series Embeddings |
Authors | Anish Mathew, Deepak P, Sahely Bhadra |
Abstract | Time series are ubiquitous in real-world problems, and computing the distance between two time series is often required in several learning tasks. Computing similarity between time series while ignoring variations in speed, or warping, is often encountered, and dynamic time warping (DTW) is the state of the art. However, DTW is not applicable in algorithms that require kernels or vector representations. In this paper, we propose a mechanism named WaRTEm to generate vector embeddings of time series such that distance measures in the embedding space exhibit resilience to warping. Therefore, WaRTEm is more widely applicable than DTW. WaRTEm is based on a twin auto-encoder architecture and a training strategy involving warping operators for generating warping-resilient embeddings for time series datasets. We evaluated the performance of WaRTEm and observed more than 20% improvement over DTW on multiple real-world datasets. |
Tasks | Time Series |
Published | 2019-06-12 |
URL | https://arxiv.org/abs/1906.05205v1 |
https://arxiv.org/pdf/1906.05205v1.pdf | |
PWC | https://paperswithcode.com/paper/warping-resilient-time-series-embeddings |
Repo | |
Framework | |
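The warping operators at the heart of the training strategy can be pictured as small local stretches and compressions of a series. A toy NumPy version follows, illustrative only; the paper defines its own operators and twin auto-encoder objective.

```python
import numpy as np

def random_warp(series, n_ops=2, rng=None):
    """Toy warping operator: locally stretch (repeat a point) or compress
    (drop a point), keeping the length fixed by trimming/padding at the end."""
    rng = rng or np.random.default_rng()
    s = list(series)
    for _ in range(n_ops):
        i = int(rng.integers(1, len(s) - 1))
        if rng.random() < 0.5:
            s.insert(i, s[i])        # stretch: dwell on one sample
        else:
            del s[i]                 # compress: skip one sample
    s = s[:len(series)]
    s += [s[-1]] * (len(series) - len(s))
    return np.asarray(s)

x = np.sin(np.linspace(0, 6.28, 128))
x_warped = random_warp(x)
# a twin auto-encoder would be trained so that embed(x) ~ embed(x_warped)
# while still reconstructing each series from its own embedding
```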
ResNet Can Be Pruned 60x: Introducing Network Purification and Unused Path Removal (P-RM) after Weight Pruning
Title | ResNet Can Be Pruned 60x: Introducing Network Purification and Unused Path Removal (P-RM) after Weight Pruning |
Authors | Xiaolong Ma, Geng Yuan, Sheng Lin, Zhengang Li, Hao Sun, Yanzhi Wang |
Abstract | State-of-the-art DNN structures involve high computation and a great demand for memory storage, which pose an intensive challenge to DNN framework resources. To mitigate these challenges, weight-pruning techniques have been studied. However, a high-accuracy solution for extreme structured pruning that combines different types of structured sparsity remains elusive, due to the extremely reduced number of weights in the network. In this paper, we propose a DNN framework that combines two different types of structured weight pruning (filter and column pruning) by incorporating the alternating direction method of multipliers (ADMM) algorithm for better pruning performance. We are the first to identify the non-optimality of the ADMM process and the presence of unused weights in a structured-pruned model, and we further design an optimization framework containing the newly proposed Network Purification and Unused Path Removal algorithms, which are dedicated to post-processing a structured-pruned model after the ADMM steps. Highlights: we achieve 232x compression on LeNet-5, 60x compression on ResNet-18 (CIFAR-10), and over 5x compression on AlexNet. We share our models at the anonymous link http://bit.ly/2VJ5ktv. |
Tasks | |
Published | 2019-04-30 |
URL | http://arxiv.org/abs/1905.00136v1 |
http://arxiv.org/pdf/1905.00136v1.pdf | |
PWC | https://paperswithcode.com/paper/resnet-can-be-pruned-60x-introducing-network |
Repo | |
Framework | |
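The post-processing idea is easy to state in isolation: after filter pruning, any weight in the next layer that reads from a pruned filter is dead and can be removed too. An illustrative PyTorch sketch of filter pruning by L1 norm followed by this purification step; the paper's ADMM-based pipeline is considerably more involved.

```python
import torch
import torch.nn as nn

def prune_filters(conv: nn.Conv2d, keep_ratio=0.5):
    """Filter pruning by L1 norm: zero out entire output filters."""
    norms = conv.weight.detach().abs().sum(dim=(1, 2, 3))
    k = int(keep_ratio * norms.numel())
    keep = torch.topk(norms, k).indices
    mask = torch.zeros_like(norms, dtype=torch.bool)
    mask[keep] = True
    conv.weight.data[~mask] = 0.0
    return mask

def purify(next_conv: nn.Conv2d, prev_mask):
    """'Unused path removal': weights in the next layer that read from a
    pruned filter can never contribute, so zero them out as well."""
    next_conv.weight.data[:, ~prev_mask] = 0.0

conv1 = nn.Conv2d(3, 16, 3)
conv2 = nn.Conv2d(16, 32, 3)
mask = prune_filters(conv1, keep_ratio=0.5)
purify(conv2, mask)
```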
Profiling based Out-of-core Hybrid Method for Large Neural Networks
Title | Profiling based Out-of-core Hybrid Method for Large Neural Networks |
Authors | Yuki Ito, Haruki Imai, Tung Le Duc, Yasushi Negishi, Kiyokuni Kawachiya, Ryo Matsumiya, Toshio Endo |
Abstract | GPUs are widely used to accelerate deep learning with neural networks (NNs). On the other hand, since GPU memory capacity is limited, it is difficult to implement efficient programs that compute large NNs on a GPU. To compute NNs exceeding GPU memory capacity, data-swapping and recomputing methods have been proposed in existing work. However, these methods incur performance overhead due to data movement or increased computation. To reduce this overhead, it is important to consider the characteristics of each layer, such as its size and the cost of recomputation. Following this direction, we propose the Profiling-based out-of-core Hybrid method (PoocH). PoocH determines the target layers for swapping or recomputing based on runtime profiling. We implemented PoocH by extending the deep learning framework Chainer, and we evaluated its performance. With PoocH, we successfully computed an NN requiring 50 GB of memory on a single GPU with 16 GB of memory. Compared with in-core cases, the performance degradation was 38% on an x86 machine and 28% on a POWER9 machine. |
Tasks | |
Published | 2019-07-11 |
URL | https://arxiv.org/abs/1907.05013v1 |
https://arxiv.org/pdf/1907.05013v1.pdf | |
PWC | https://paperswithcode.com/paper/profiling-based-out-of-core-hybrid-method-for |
Repo | |
Framework | |
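The per-layer decision PoocH makes can be reduced to a simple cost comparison between transferring an activation to host memory and recomputing it on the GPU. A toy model with assumed profiling numbers; the real system measures these at runtime and plans across the whole network.

```python
def plan_layer(act_bytes, recompute_flops,
               pcie_bw=12e9, gpu_flops=10e12):
    """Toy per-layer decision from profiling data: swap the activation out to
    host memory if the transfer is cheaper than recomputing it.

    pcie_bw (bytes/s) and gpu_flops are assumed profiling measurements."""
    t_swap = act_bytes / pcie_bw             # rough transfer time
    t_recompute = recompute_flops / gpu_flops
    return "swap" if t_swap < t_recompute else "recompute"

# e.g. a big convolution: cheap to store, expensive to recompute -> swap
print(plan_layer(act_bytes=256e6, recompute_flops=5e12))
```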
Structured3D: A Large Photo-realistic Dataset for Structured 3D Modeling
Title | Structured3D: A Large Photo-realistic Dataset for Structured 3D Modeling |
Authors | Jia Zheng, Junfei Zhang, Jing Li, Rui Tang, Shenghua Gao, Zihan Zhou |
Abstract | Recently, there has been growing interest in developing learning-based methods to detect and utilize salient semi-global or global structures, such as junctions, lines, planes, cuboids, smooth surfaces, and all types of symmetries, for 3D scene modeling and understanding. However, the ground truth annotations are often obtained via human labor, which is particularly challenging and inefficient for such tasks due to the large number of 3D structure instances (e.g., line segments) and other factors such as viewpoints and occlusions. In this paper, we present a new synthetic dataset, Structured3D, with the aim of providing large-scale photo-realistic images with rich 3D structure annotations for a wide spectrum of structured 3D modeling tasks. We take advantage of the availability of millions of professional interior designs and automatically extract 3D structures from them. We generate high-quality images with an industry-leading rendering engine. We use our synthetic dataset in combination with real images to train deep networks for room layout estimation and demonstrate improved performance on benchmark datasets. |
Tasks | Room Layout Estimation |
Published | 2019-08-01 |
URL | https://arxiv.org/abs/1908.00222v2 |
https://arxiv.org/pdf/1908.00222v2.pdf | |
PWC | https://paperswithcode.com/paper/structured3d-a-large-photo-realistic-dataset |
Repo | |
Framework | |
Deep Relevance Regularization: Interpretable and Robust Tumor Typing of Imaging Mass Spectrometry Data
Title | Deep Relevance Regularization: Interpretable and Robust Tumor Typing of Imaging Mass Spectrometry Data |
Authors | Christian Etmann, Maximilian Schmidt, Jens Behrmann, Tobias Boskamp, Lena Hauberg-Lotte, Annette Peter, Rita Casadonte, Jörg Kriegsmann, Peter Maass |
Abstract | Neural networks have recently been established as a viable classification method for imaging mass spectrometry data for tumor typing. For multi-laboratory scenarios, however, certain confounding factors may strongly impede their performance. In this work, we introduce Deep Relevance Regularization, a method of restricting what the neural network can focus on during classification, in order to improve the classification performance. We demonstrate how Deep Relevance Regularization robustifies neural networks against confounding factors on a challenging inter-lab dataset consisting of breast and ovarian carcinoma. We further show that this makes the relevance map – a way of visualizing the discriminative parts of the mass spectrum – sparser, thereby making the classifier easier to interpret. |
Tasks | |
Published | 2019-12-10 |
URL | https://arxiv.org/abs/1912.05459v1 |
https://arxiv.org/pdf/1912.05459v1.pdf | |
PWC | https://paperswithcode.com/paper/deep-relevance-regularization-interpretable |
Repo | |
Framework | |
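The general shape of such a relevance-regularization loss can be sketched as a standard classification loss plus a sparsity penalty on a relevance map. The gradient-times-input map and the L1 penalty below are stand-ins; the paper's exact relevance definition and regularizer may differ.

```python
import torch
import torch.nn.functional as F

def relevance_regularized_loss(model, x, y, lam=1e-3):
    """Cross-entropy plus an L1 penalty on a gradient-based relevance map,
    pushing the classifier to rely on few spectral channels (sparser maps)."""
    x = x.clone().requires_grad_(True)
    ce = F.cross_entropy(model(x), y)
    # gradient-times-input as a simple stand-in relevance map;
    # create_graph=True lets the penalty itself be backpropagated
    grad = torch.autograd.grad(ce, x, create_graph=True)[0]
    return ce + lam * (grad * x).abs().mean()
```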