Paper Group ANR 756
Towards AutoML in the presence of Drift: first results
Title | Towards AutoML in the presence of Drift: first results |
Authors | Jorge G. Madrid, Hugo Jair Escalante, Eduardo F. Morales, Wei-Wei Tu, Yang Yu, Lisheng Sun-Hosoya, Isabelle Guyon, Michele Sebag |
Abstract | Research progress in AutoML has led to state-of-the-art solutions that can cope quite well with supervised learning tasks, e.g., classification with AutoSklearn. However, so far these systems do not take into account the changing nature of data that evolves over time (i.e., they still assume i.i.d. data), even though such domains are increasingly common in real applications (e.g., spam filtering, user preferences, etc.). We describe a first attempt to develop an AutoML solution for scenarios in which the data distribution changes relatively slowly over time and the problem is approached in a lifelong learning setting. We extend Auto-Sklearn with sound and intuitive mechanisms that allow it to cope with this sort of problem. The extended Auto-Sklearn is combined with concept drift detection techniques that allow it to automatically determine when the initial models have to be adapted. We report experimental results on benchmark data from AutoML competitions that adhere to this scenario. The results demonstrate the effectiveness of the proposed methodology. |
Tasks | AutoML |
Published | 2019-07-24 |
URL | https://arxiv.org/abs/1907.10772v1 |
https://arxiv.org/pdf/1907.10772v1.pdf | |
PWC | https://paperswithcode.com/paper/towards-automl-in-the-presence-of-drift-first |
Repo | |
Framework | |
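The mechanism the abstract above describes (detect drift, then adapt the previously found model) can be illustrated with a toy sketch. Everything below is assumed for illustration, not taken from the paper: the accuracy-drop detector, its thresholds, and the random forest standing in for the Auto-Sklearn pipeline.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def drift_detected(acc_history, window=5, tol=0.10):
    """Toy detector: fire when mean accuracy over the latest window drops
    well below the mean accuracy seen before it."""
    if len(acc_history) < 2 * window:
        return False
    return np.mean(acc_history[:-window]) - np.mean(acc_history[-window:]) > tol

def lifelong_loop(batches):
    """batches: iterator of (X, y) batches arriving in time order."""
    model, history = RandomForestClassifier(), []
    X0, y0 = next(batches)
    model.fit(X0, y0)                      # initial model (found by AutoML)
    for X, y in batches:
        history.append(model.score(X, y))  # test-then-train evaluation
        if drift_detected(history):
            model.fit(X, y)                # stand-in for re-running the AutoML search
            history.clear()
```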
The Deep Learning Revolution and Its Implications for Computer Architecture and Chip Design
Title | The Deep Learning Revolution and Its Implications for Computer Architecture and Chip Design |
Authors | Jeffrey Dean |
Abstract | The past decade has seen a remarkable series of advances in machine learning, and in particular deep learning approaches based on artificial neural networks, to improve our abilities to build more accurate systems across a broad range of areas, including computer vision, speech recognition, language translation, and natural language understanding tasks. This paper is a companion paper to a keynote talk at the 2020 International Solid-State Circuits Conference (ISSCC) discussing some of the advances in machine learning, and their implications for the kinds of computational devices we need to build, especially in the post-Moore’s Law era. It also discusses some of the ways that machine learning may be able to help with some aspects of the circuit design process. Finally, it provides a sketch of at least one interesting direction towards much larger-scale multi-task models that are sparsely activated and employ much more dynamic, example- and task-based routing than the machine learning models of today. |
Tasks | Speech Recognition |
Published | 2019-11-13 |
URL | https://arxiv.org/abs/1911.05289v1 |
https://arxiv.org/pdf/1911.05289v1.pdf | |
PWC | https://paperswithcode.com/paper/the-deep-learning-revolution-and-its |
Repo | |
Framework | |
SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization
Title | SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization |
Authors | Xianzhi Du, Tsung-Yi Lin, Pengchong Jin, Golnaz Ghiasi, Mingxing Tan, Yin Cui, Quoc V. Le, Xiaodan Song |
Abstract | Convolutional neural networks typically encode an input image into a series of intermediate features with decreasing resolutions. While this structure is suited to classification tasks, it does not perform well for tasks requiring simultaneous recognition and localization (e.g., object detection). Encoder-decoder architectures have been proposed to resolve this by applying a decoder network onto a backbone model designed for classification tasks. In this paper, we argue that the encoder-decoder architecture is ineffective at generating strong multi-scale features because of the scale-decreased backbone. We propose SpineNet, a backbone with scale-permuted intermediate features and cross-scale connections that is learned on an object detection task by Neural Architecture Search. SpineNet achieves the state-of-the-art performance of a one-stage object detector on COCO with 60% less computation, and outperforms ResNet-FPN counterparts by 6% AP. The SpineNet architecture can transfer to classification tasks, achieving a 6% top-1 accuracy improvement on the challenging iNaturalist fine-grained dataset. |
Tasks | Neural Architecture Search, Object Detection |
Published | 2019-12-10 |
URL | https://arxiv.org/abs/1912.05027v1 |
https://arxiv.org/pdf/1912.05027v1.pdf | |
PWC | https://paperswithcode.com/paper/spinenet-learning-scale-permuted-backbone-for |
Repo | |
Framework | |
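The building block that makes scale permutation possible is the cross-scale connection: parent feature maps at arbitrary resolutions and widths are resampled to a target scale and merged. Below is a hypothetical PyTorch sketch of that resampling-and-fusion step, not the released SpineNet code, whose resampling and block types differ in detail.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossScaleFusion(nn.Module):
    """Resample two parent feature maps to a target scale/width and merge them."""
    def __init__(self, c1, c2, c_out):
        super().__init__()
        self.proj1 = nn.Conv2d(c1, c_out, kernel_size=1)   # match channel widths
        self.proj2 = nn.Conv2d(c2, c_out, kernel_size=1)

    def forward(self, f1, f2, out_hw):
        # bilinear resize matches spatial scales before fusion
        f1 = F.interpolate(self.proj1(f1), size=out_hw, mode="bilinear",
                           align_corners=False)
        f2 = F.interpolate(self.proj2(f2), size=out_hw, mode="bilinear",
                           align_corners=False)
        return F.relu(f1 + f2)

fuse = CrossScaleFusion(256, 512, 256)
hi = torch.randn(1, 256, 64, 64)      # higher-resolution parent
lo = torch.randn(1, 512, 16, 16)      # lower-resolution parent
out = fuse(hi, lo, out_hw=(32, 32))   # target scale chosen by the searched permutation
```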
Learning Efficient Multi-agent Communication: An Information Bottleneck Approach
Title | Learning Efficient Multi-agent Communication: An Information Bottleneck Approach |
Authors | Rundong Wang, Xu He, Runsheng Yu, Wei Qiu, Bo An, Zinovi Rabinovich |
Abstract | Many real-world multi-agent reinforcement learning applications require agents to communicate, assisted by a communication protocol. These applications face a common and critical issue: the limited bandwidth of communication constrains agents’ ability to cooperate successfully. In this paper, rather than proposing a fixed communication protocol, we develop an Informative Multi-Agent Communication (IMAC) method to learn efficient communication protocols. Our contributions are threefold. First, we observe that limited bandwidth translates into a constraint on the entropy of the communicated messages, thus paving the way for controlling the bandwidth. Second, we introduce a customized batch-norm layer, which controls the messages’ entropy to simulate the limited-bandwidth constraint. Third, we apply the information bottleneck method to discover the optimal communication protocol, which can satisfy the bandwidth constraint via training with the prior distribution in the method. To demonstrate the efficacy of our method, we conduct extensive experiments on various cooperative and competitive multi-agent tasks across two dimensions: the number of agents and different bandwidths. We show that IMAC converges fast and leads to efficient communication among agents under the limited-bandwidth constraint, as compared to many baseline methods. |
Tasks | Multi-agent Reinforcement Learning |
Published | 2019-11-16 |
URL | https://arxiv.org/abs/1911.06992v1 |
https://arxiv.org/pdf/1911.06992v1.pdf | |
PWC | https://paperswithcode.com/paper/learning-efficient-multi-agent-communication |
Repo | |
Framework | |
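The second contribution above can be made concrete: for an approximately Gaussian message, differential entropy grows with variance, so normalizing each message dimension to a fixed standard deviation caps the entropy and thereby emulates a bandwidth limit. A rough PyTorch sketch follows, with all names and constants assumed rather than taken from the IMAC code.

```python
import math
import torch
import torch.nn as nn

class EntropyLimitedMessage(nn.Module):
    """Batch-norm-style layer: normalize messages, then rescale to a fixed sigma.

    For a roughly Gaussian d-dimensional message, differential entropy is
    0.5 * d * log(2*pi*e*sigma^2), so fixing sigma caps the entropy."""
    def __init__(self, sigma=0.5, eps=1e-5):
        super().__init__()
        self.sigma, self.eps = sigma, eps

    def forward(self, msg):                       # msg: (batch, dim)
        mu = msg.mean(dim=0, keepdim=True)
        std = msg.std(dim=0, keepdim=True)
        return (msg - mu) / (std + self.eps) * self.sigma

    def entropy_cap(self, dim):
        return 0.5 * dim * math.log(2 * math.pi * math.e * self.sigma ** 2)

layer = EntropyLimitedMessage(sigma=0.5)
messages = torch.randn(32, 8) * 3.0 + 1.0
bounded = layer(messages)                        # std ~ 0.5 per coordinate
```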
EDUCE: Explaining model Decisions through Unsupervised Concepts Extraction
Title | EDUCE: Explaining model Decisions through Unsupervised Concepts Extraction |
Authors | Diane Bouchacourt, Ludovic Denoyer |
Abstract | Providing explanations along with predictions is crucial in some text processing tasks. Therefore, we propose a new self-interpretable model that performs output prediction and simultaneously provides an explanation in terms of the presence of particular concepts in the input. To do so, our model’s prediction relies solely on a low-dimensional binary representation of the input, where each feature denotes the presence or absence of a concept. The presence of a concept is decided from an excerpt, i.e., a small sequence of consecutive words in the text. Relevant concepts for the prediction task at hand are automatically defined by our model, avoiding the need for concept-level annotations. To ease interpretability, we enforce that for each concept the corresponding excerpts share similar semantics and are distinguishable from each other. We experimentally demonstrate the relevance of our approach on text classification and multi-sentiment analysis tasks. |
Tasks | Sentiment Analysis, Text Classification |
Published | 2019-05-28 |
URL | https://arxiv.org/abs/1905.11852v2 |
https://arxiv.org/pdf/1905.11852v2.pdf | |
PWC | https://paperswithcode.com/paper/educe-explaining-model-decisions-through |
Repo | |
Framework | |
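One way to picture the model described above: each concept scores every excerpt (a short window of words), the maximum score over positions decides presence, and the classifier sees only the resulting binary vector. A hypothetical PyTorch sketch under those assumptions; the paper's concept extractor and training objective differ in detail.

```python
import torch
import torch.nn as nn

class ConceptClassifier(nn.Module):
    """Predict from a binary 'which concepts are present' vector.

    Each concept scores every width-5 window of word embeddings with a 1-D
    conv; the max over positions says whether the concept appears anywhere."""
    def __init__(self, emb_dim, n_concepts, n_classes, width=5):
        super().__init__()
        self.scorer = nn.Conv1d(emb_dim, n_concepts, kernel_size=width)
        self.classify = nn.Linear(n_concepts, n_classes)

    def forward(self, embeddings):               # (batch, emb_dim, seq_len)
        scores = self.scorer(embeddings)         # (batch, n_concepts, positions)
        presence = torch.sigmoid(scores.max(dim=2).values)
        binary = (presence > 0.5).float()
        # straight-through estimator keeps the binarization differentiable
        binary = binary + presence - presence.detach()
        return self.classify(binary)

model = ConceptClassifier(emb_dim=300, n_concepts=16, n_classes=2)
logits = model(torch.randn(4, 300, 120))         # 4 texts of 120 tokens
```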
Predicting the Politics of an Image Using Webly Supervised Data
Title | Predicting the Politics of an Image Using Webly Supervised Data |
Authors | Christopher Thomas, Adriana Kovashka |
Abstract | The news media shape public opinion, and often, the visual bias they contain is evident to human observers. This bias can be inferred from how different media sources portray different subjects or topics. In this paper, we model visual political bias in contemporary media sources at scale, using webly supervised data. We collect a dataset of over one million unique images and associated news articles from left- and right-leaning news sources, and develop a method to predict an image’s political leaning. This problem is particularly challenging because of the enormous intra-class visual and semantic diversity of our data. We propose a two-stage method to tackle this problem. In the first stage, the model is forced to learn relevant visual concepts that, when joined with document embeddings computed from articles paired with the images, enable the model to predict bias. In the second stage, we remove the requirement of the text domain and train a visual classifier from the features of the former model. We show this two-stage approach facilitates learning and outperforms several strong baselines. We also present extensive qualitative results demonstrating the nuances of the data. |
Tasks | |
Published | 2019-10-31 |
URL | https://arxiv.org/abs/1911.00147v1 |
https://arxiv.org/pdf/1911.00147v1.pdf | |
PWC | https://paperswithcode.com/paper/predicting-the-politics-of-an-image-using |
Repo | |
Framework | |
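The two-stage method can be sketched at the architecture level: stage 1 predicts leaning from image features joined with document embeddings, and stage 2 reuses the visual pathway alone. All dimensions and module names below are hypothetical.

```python
import torch
import torch.nn as nn

class Stage1Joint(nn.Module):
    """Stage 1: predict left/right leaning from image features concatenated
    with a document embedding of the paired article."""
    def __init__(self, img_dim=2048, doc_dim=300, hid=512):
        super().__init__()
        self.img_proj = nn.Linear(img_dim, hid)   # learns bias-relevant visual concepts
        self.head = nn.Linear(hid + doc_dim, 2)

    def forward(self, img_feat, doc_emb):
        v = torch.relu(self.img_proj(img_feat))
        return self.head(torch.cat([v, doc_emb], dim=1))

class Stage2Visual(nn.Module):
    """Stage 2: drop the text requirement; classify from the visual features
    learned in stage 1 alone."""
    def __init__(self, stage1, hid=512):
        super().__init__()
        self.img_proj = stage1.img_proj           # reuse stage-1 visual pathway
        self.head = nn.Linear(hid, 2)

    def forward(self, img_feat):
        return self.head(torch.relu(self.img_proj(img_feat)))

stage1 = Stage1Joint()
stage2 = Stage2Visual(stage1)   # trained on images only, no articles needed
```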
LIDIA: Lightweight Learned Image Denoising with Instance Adaptation
Title | LIDIA: Lightweight Learned Image Denoising with Instance Adaptation |
Authors | Gregory Vaksman, Michael Elad, Peyman Milanfar |
Abstract | Image denoising is a well-studied problem, with extensive research activity spread over several decades. Despite the many available denoising algorithms, the quest for simple, powerful and fast denoisers is still an active and vibrant topic of research. Leading classical denoising methods are typically designed to exploit the inner structure of images by modeling local overlapping patches, while operating in an unsupervised fashion. In contrast, recent newcomers to this arena are supervised, universal, neural-network-based methods that bypass this modeling altogether, targeting the inference goal directly and globally, while tending to be very deep and parameter heavy. This work proposes a novel lightweight learnable architecture for image denoising, and presents a combination of supervised and unsupervised training of it, the first aiming for a universal denoiser and the second for adapting it to the incoming image. Our architecture embeds several of the main concepts taken from classical methods, relying on patch processing, leveraging non-local self-similarity, exploiting representation sparsity and providing a multiscale treatment. Our proposed universal denoiser achieves near state-of-the-art results while using a small fraction of the typical number of parameters. In addition, we introduce and demonstrate two highly effective ways to further boost denoising performance by adapting this universal network to the input image. |
Tasks | Denoising, Image Denoising |
Published | 2019-11-17 |
URL | https://arxiv.org/abs/1911.07167v2 |
https://arxiv.org/pdf/1911.07167v2.pdf | |
PWC | https://paperswithcode.com/paper/low-weight-and-learnable-image-denoising |
Repo | |
Framework | |
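Instance adaptation, the second of the two boosting mechanisms mentioned above, can be illustrated with one plausible recipe: fine-tune the pretrained denoiser on the incoming image itself by adding extra synthetic noise and training the network to undo it, which needs no clean ground truth. This is a hedged sketch, not necessarily the paper's exact adaptation scheme.

```python
import torch
import torch.nn.functional as F

def adapt_to_image(denoiser, noisy, sigma_extra=15 / 255., steps=50, lr=1e-5):
    """Fine-tune a pretrained denoiser on the incoming noisy image.

    Corrupt the (already noisy) input with extra synthetic noise and train
    the network to map the noisier version back to the input."""
    opt = torch.optim.Adam(denoiser.parameters(), lr=lr)
    for _ in range(steps):
        noisier = noisy + sigma_extra * torch.randn_like(noisy)
        loss = F.mse_loss(denoiser(noisier), noisy)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return denoiser(noisy).detach()   # adapted prediction for this image
```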
Challenges with Extreme Class-Imbalance and Temporal Coherence: A Study on Solar Flare Data
Title | Challenges with Extreme Class-Imbalance and Temporal Coherence: A Study on Solar Flare Data |
Authors | Azim Ahmadzadeh, Maxwell Hostetter, Berkay Aydin, Manolis K. Georgoulis, Dustin J. Kempton, Sushant S. Mahajan, Rafal A. Angryk |
Abstract | In analyses of rare events, regardless of the domain of application, the class-imbalance issue is intrinsic. Although the challenges are known to data experts, their explicit impact on the analysis and on the decisions made based on the findings is often overlooked. This is particularly prevalent in interdisciplinary research, where the theoretical aspects are sometimes overshadowed by the challenges of the application. To showcase these undesirable impacts, we conduct a series of experiments on a recently created benchmark dataset named Space Weather ANalytics for Solar Flares (SWAN-SF), a multivariate time series dataset of magnetic parameters of active regions. As remedies for the imbalance issue, we study the impact of data manipulation (undersampling and oversampling) and model manipulation (using class weights). Furthermore, we bring into focus the auto-correlation of time series that is inherited from the use of sliding windows for monitoring flares’ history. Temporal coherence, as we call this phenomenon, invalidates the randomness assumption, thus impacting all sampling practices, including different cross-validation techniques. We illustrate how failing to notice this concept can give an artificial boost in forecast performance and result in misleading findings. Throughout this study we utilized a Support Vector Machine as the classifier and the True Skill Statistic as the verification metric for comparing experiments. We conclude our work by specifying the correct practice in each case, and we hope this study can benefit researchers in other domains where time series of rare events are of interest. |
Tasks | Time Series |
Published | 2019-11-20 |
URL | https://arxiv.org/abs/1911.09061v1 |
https://arxiv.org/pdf/1911.09061v1.pdf | |
PWC | https://paperswithcode.com/paper/challenges-with-extreme-class-imbalance-and |
Repo | |
Framework | |
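Two of the concrete practices the abstract refers to are easy to make precise: the class-weighting remedy and the True Skill Statistic (TSS). A short scikit-learn sketch with synthetic stand-in data and a chronological split that respects temporal coherence:

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.metrics import confusion_matrix

def true_skill_statistic(y_true, y_pred):
    """TSS = recall on flares + recall on quiet regions - 1.

    Unlike accuracy, TSS is insensitive to the class-imbalance ratio."""
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    return tp / (tp + fn) + tn / (tn + fp) - 1.0

# synthetic stand-in for time-ordered SWAN-SF-like data (~8% positives)
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 8))
y = (X[:, 0] + 0.5 * rng.normal(size=1000) > 1.6).astype(int)

# model manipulation: class_weight='balanced' reweights errors inversely
# to class frequency, one of the remedies compared in the paper
clf = SVC(kernel="rbf", class_weight="balanced")

# temporal coherence: split chronologically instead of shuffling, so that
# near-duplicate sliding windows never straddle the train/test boundary
split = int(0.7 * len(X))
clf.fit(X[:split], y[:split])
print(true_skill_statistic(y[split:], clf.predict(X[split:])))
```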
A Brief Survey of Multilingual Neural Machine Translation
Title | A Brief Survey of Multilingual Neural Machine Translation |
Authors | Raj Dabre, Chenhui Chu, Anoop Kunchukuttan |
Abstract | We present a survey on multilingual neural machine translation (MNMT), which has gained a lot of traction in recent years. MNMT has been useful in improving translation quality as a result of knowledge transfer. MNMT is more promising and interesting than its statistical machine translation counterpart because end-to-end modeling and distributed representations open new avenues. Many approaches have been proposed in order to exploit multilingual parallel corpora for improving translation quality. However, the lack of a comprehensive survey makes it difficult to determine which approaches are promising and hence deserve further exploration. In this paper, we present an in-depth survey of existing literature on MNMT. We categorize various approaches based on the resource scenarios as well as underlying modeling principles. We hope this paper will serve as a starting point for researchers and engineers interested in MNMT. |
Tasks | Machine Translation, Transfer Learning |
Published | 2019-05-14 |
URL | https://arxiv.org/abs/1905.05395v3 |
https://arxiv.org/pdf/1905.05395v3.pdf | |
PWC | https://paperswithcode.com/paper/a-survey-of-multilingual-neural-machine |
Repo | |
Framework | |
Shape-Aware Organ Segmentation by Predicting Signed Distance Maps
Title | Shape-Aware Organ Segmentation by Predicting Signed Distance Maps |
Authors | Yuan Xue, Hui Tang, Zhi Qiao, Guanzhong Gong, Yong Yin, Zhen Qian, Chao Huang, Wei Fan, Xiaolei Huang |
Abstract | In this work, we address an issue with current deep-learning-based organ segmentation systems: they often produce results that do not capture the overall shape of the target organ and often lack smoothness. Since there is a rigorous mapping between the Signed Distance Map (SDM) calculated from object boundary contours and the binary segmentation map, we explore the feasibility of learning the SDM directly from medical scans. By converting the segmentation task into predicting an SDM, we show that our proposed method retains superior segmentation performance and produces shapes with better smoothness and continuity. To leverage the complementary information in traditional segmentation training, we introduce an approximated Heaviside function to train the model by predicting SDMs and segmentation maps simultaneously. We validate our proposed models through extensive experiments on a hippocampus segmentation dataset and the public MICCAI 2015 Head and Neck Auto Segmentation Challenge dataset with multiple organs. While our carefully designed backbone 3D segmentation network improves the Dice coefficient by more than 5% compared to the current state of the art, the proposed model with SDM learning produces smoother segmentation results with smaller Hausdorff distance and average surface distance, thus proving the effectiveness of our method. |
Tasks | |
Published | 2019-12-09 |
URL | https://arxiv.org/abs/1912.03849v1 |
https://arxiv.org/pdf/1912.03849v1.pdf | |
PWC | https://paperswithcode.com/paper/shape-aware-organ-segmentation-by-predicting |
Repo | |
Framework | |
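The two ingredients named in the abstract, the SDM itself and the approximated Heaviside function that converts a predicted SDM back into a soft segmentation map, can be sketched in a few lines. The sign convention and the exact smooth-step form are assumptions here; the paper's choices may differ.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt
from scipy.special import expit

def signed_distance_map(mask):
    """SDM of a binary mask: negative inside the organ, positive outside,
    zero on the boundary (one common sign convention)."""
    inside = distance_transform_edt(mask)        # distance to boundary, inside
    outside = distance_transform_edt(1 - mask)   # distance to boundary, outside
    return outside - inside

def approx_heaviside(sdm, k=50.0):
    """Smooth step mapping a normalized SDM to a soft segmentation map, so a
    Dice-style loss can be applied alongside the SDM regression loss."""
    return expit(-k * sdm)                       # ~1 where sdm < 0 (inside)

mask = np.zeros((64, 64), dtype=np.uint8)
mask[20:40, 20:40] = 1                           # toy "organ"
sdm = signed_distance_map(mask)
soft = approx_heaviside(sdm / np.abs(sdm).max())  # normalize before the step
```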
Warping Resilient Time Series Embeddings
Title | Warping Resilient Time Series Embeddings |
Authors | Anish Mathew, Deepak P, Sahely Bhadra |
Abstract | Time series are ubiquitous in real-world problems, and computing the distance between two time series is often required in several learning tasks. Computing similarity between time series while ignoring variations in speed, or warping, is often encountered, and dynamic time warping (DTW) is the state of the art. However, DTW is not applicable in algorithms that require kernels or vector representations. In this paper, we propose a mechanism named WaRTEm to generate vector embeddings of time series such that distance measures in the embedding space exhibit resilience to warping. Therefore, WaRTEm is more widely applicable than DTW. WaRTEm is based on a twin auto-encoder architecture and a training strategy involving warping operators for generating warping-resilient embeddings for time series datasets. We evaluated the performance of WaRTEm and observed more than 20% improvement over DTW on multiple real-world datasets. |
Tasks | Time Series |
Published | 2019-06-12 |
URL | https://arxiv.org/abs/1906.05205v1 |
https://arxiv.org/pdf/1906.05205v1.pdf | |
PWC | https://paperswithcode.com/paper/warping-resilient-time-series-embeddings |
Repo | |
Framework | |
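The warping operators at the heart of the training strategy can be pictured as small local stretches and compressions of a series. A toy NumPy version follows, illustrative only; the paper defines its own operators and twin auto-encoder objective.

```python
import numpy as np

def random_warp(series, n_ops=2, rng=None):
    """Toy warping operator: locally stretch (repeat a point) or compress
    (drop a point), keeping the length fixed by trimming/padding at the end."""
    rng = rng or np.random.default_rng()
    s = list(series)
    for _ in range(n_ops):
        i = int(rng.integers(1, len(s) - 1))
        if rng.random() < 0.5:
            s.insert(i, s[i])        # stretch: dwell on one sample
        else:
            del s[i]                 # compress: skip one sample
    s = s[:len(series)]
    s += [s[-1]] * (len(series) - len(s))
    return np.asarray(s)

x = np.sin(np.linspace(0, 6.28, 128))
x_warped = random_warp(x)
# a twin auto-encoder would be trained so that embed(x) ~ embed(x_warped)
# while still reconstructing each series from its own embedding
```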
ResNet Can Be Pruned 60x: Introducing Network Purification and Unused Path Removal (P-RM) after Weight Pruning
Title | ResNet Can Be Pruned 60x: Introducing Network Purification and Unused Path Removal (P-RM) after Weight Pruning |
Authors | Xiaolong Ma, Geng Yuan, Sheng Lin, Zhengang Li, Hao Sun, Yanzhi Wang |
Abstract | State-of-the-art DNN structures involve high computation and a great demand for memory storage, which pose an intensive challenge to DNN framework resources. To mitigate these challenges, weight-pruning techniques have been studied. However, a high-accuracy solution for extreme structured pruning that combines different types of structured sparsity remains elusive, due to the extremely reduced number of weights in the network. In this paper, we propose a DNN framework that combines two different types of structured weight pruning (filter and column pruning) by incorporating the alternating direction method of multipliers (ADMM) algorithm for better pruning performance. We are the first to identify the non-optimality of the ADMM process and the presence of unused weights in a structured-pruned model, and we further design an optimization framework containing the newly proposed Network Purification and Unused Path Removal algorithms, which are dedicated to post-processing a structured-pruned model after the ADMM steps. Highlights: we achieve 232x compression on LeNet-5, 60x compression on ResNet-18 (CIFAR-10), and over 5x compression on AlexNet. We share our models at the anonymous link http://bit.ly/2VJ5ktv. |
Tasks | |
Published | 2019-04-30 |
URL | http://arxiv.org/abs/1905.00136v1 |
http://arxiv.org/pdf/1905.00136v1.pdf | |
PWC | https://paperswithcode.com/paper/resnet-can-be-pruned-60x-introducing-network |
Repo | |
Framework | |
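The post-processing idea is easy to state in isolation: after filter pruning, any weight in the next layer that reads from a pruned filter is dead and can be removed too. An illustrative PyTorch sketch of filter pruning by L1 norm followed by this purification step; the paper's ADMM-based pipeline is considerably more involved.

```python
import torch
import torch.nn as nn

def prune_filters(conv: nn.Conv2d, keep_ratio=0.5):
    """Filter pruning by L1 norm: zero out entire output filters."""
    norms = conv.weight.detach().abs().sum(dim=(1, 2, 3))
    k = int(keep_ratio * norms.numel())
    keep = torch.topk(norms, k).indices
    mask = torch.zeros_like(norms, dtype=torch.bool)
    mask[keep] = True
    conv.weight.data[~mask] = 0.0
    return mask

def purify(next_conv: nn.Conv2d, prev_mask):
    """'Unused path removal': weights in the next layer that read from a
    pruned filter can never contribute, so zero them out as well."""
    next_conv.weight.data[:, ~prev_mask] = 0.0

conv1 = nn.Conv2d(3, 16, 3)
conv2 = nn.Conv2d(16, 32, 3)
mask = prune_filters(conv1, keep_ratio=0.5)
purify(conv2, mask)
```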
Profiling based Out-of-core Hybrid Method for Large Neural Networks
Title | Profiling based Out-of-core Hybrid Method for Large Neural Networks |
Authors | Yuki Ito, Haruki Imai, Tung Le Duc, Yasushi Negishi, Kiyokuni Kawachiya, Ryo Matsumiya, Toshio Endo |
Abstract | GPUs are widely used to accelerate deep learning with neural networks (NNs). On the other hand, since GPU memory capacity is limited, it is difficult to implement efficient programs that compute large NNs on a GPU. To compute NNs exceeding GPU memory capacity, data-swapping and recomputing methods have been proposed in existing work. However, these methods incur performance overhead due to data movement or increased computation. To reduce this overhead, it is important to consider the characteristics of each layer, such as its size and the cost of recomputation. Following this direction, we propose the Profiling-based out-of-core Hybrid method (PoocH). PoocH determines the target layers for swapping or recomputing based on runtime profiling. We implemented PoocH by extending the deep learning framework Chainer, and we evaluated its performance. With PoocH, we successfully computed an NN requiring 50 GB of memory on a single GPU with 16 GB of memory. Compared with in-core cases, the performance degradation was 38% on an x86 machine and 28% on a POWER9 machine. |
Tasks | |
Published | 2019-07-11 |
URL | https://arxiv.org/abs/1907.05013v1 |
https://arxiv.org/pdf/1907.05013v1.pdf | |
PWC | https://paperswithcode.com/paper/profiling-based-out-of-core-hybrid-method-for |
Repo | |
Framework | |
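The per-layer decision PoocH makes can be reduced to a simple cost comparison between transferring an activation to host memory and recomputing it on the GPU. A toy model with assumed profiling numbers; the real system measures these at runtime and plans across the whole network.

```python
def plan_layer(act_bytes, recompute_flops,
               pcie_bw=12e9, gpu_flops=10e12):
    """Toy per-layer decision from profiling data: swap the activation out to
    host memory if the transfer is cheaper than recomputing it.

    pcie_bw (bytes/s) and gpu_flops are assumed profiling measurements."""
    t_swap = act_bytes / pcie_bw             # rough transfer time
    t_recompute = recompute_flops / gpu_flops
    return "swap" if t_swap < t_recompute else "recompute"

# e.g. a big convolution: cheap to store, expensive to recompute -> swap
print(plan_layer(act_bytes=256e6, recompute_flops=5e12))
```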
Structured3D: A Large Photo-realistic Dataset for Structured 3D Modeling
Title | Structured3D: A Large Photo-realistic Dataset for Structured 3D Modeling |
Authors | Jia Zheng, Junfei Zhang, Jing Li, Rui Tang, Shenghua Gao, Zihan Zhou |
Abstract | Recently, there has been growing interest in developing learning-based methods to detect and utilize salient semi-global or global structures, such as junctions, lines, planes, cuboids, smooth surfaces, and all types of symmetries, for 3D scene modeling and understanding. However, the ground truth annotations are often obtained via human labor, which is particularly challenging and inefficient for such tasks due to the large number of 3D structure instances (e.g., line segments) and other factors such as viewpoints and occlusions. In this paper, we present a new synthetic dataset, Structured3D, with the aim of providing large-scale photo-realistic images with rich 3D structure annotations for a wide spectrum of structured 3D modeling tasks. We take advantage of the availability of millions of professional interior designs and automatically extract 3D structures from them. We generate high-quality images with an industry-leading rendering engine. We use our synthetic dataset in combination with real images to train deep networks for room layout estimation and demonstrate improved performance on benchmark datasets. |
Tasks | Room Layout Estimation |
Published | 2019-08-01 |
URL | https://arxiv.org/abs/1908.00222v2 |
https://arxiv.org/pdf/1908.00222v2.pdf | |
PWC | https://paperswithcode.com/paper/structured3d-a-large-photo-realistic-dataset |
Repo | |
Framework | |
Deep Relevance Regularization: Interpretable and Robust Tumor Typing of Imaging Mass Spectrometry Data
Title | Deep Relevance Regularization: Interpretable and Robust Tumor Typing of Imaging Mass Spectrometry Data |
Authors | Christian Etmann, Maximilian Schmidt, Jens Behrmann, Tobias Boskamp, Lena Hauberg-Lotte, Annette Peter, Rita Casadonte, Jörg Kriegsmann, Peter Maass |
Abstract | Neural networks have recently been established as a viable classification method for imaging mass spectrometry data for tumor typing. For multi-laboratory scenarios, however, certain confounding factors may strongly impede their performance. In this work, we introduce Deep Relevance Regularization, a method of restricting what the neural network can focus on during classification, in order to improve the classification performance. We demonstrate how Deep Relevance Regularization robustifies neural networks against confounding factors on a challenging inter-lab dataset consisting of breast and ovarian carcinoma. We further show that this makes the relevance map – a way of visualizing the discriminative parts of the mass spectrum – sparser, thereby making the classifier easier to interpret. |
Tasks | |
Published | 2019-12-10 |
URL | https://arxiv.org/abs/1912.05459v1 |
https://arxiv.org/pdf/1912.05459v1.pdf | |
PWC | https://paperswithcode.com/paper/deep-relevance-regularization-interpretable |
Repo | |
Framework | |
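The general shape of such a relevance-regularization loss can be sketched as a standard classification loss plus a sparsity penalty on a relevance map. The gradient-times-input map and the L1 penalty below are stand-ins; the paper's exact relevance definition and regularizer may differ.

```python
import torch
import torch.nn.functional as F

def relevance_regularized_loss(model, x, y, lam=1e-3):
    """Cross-entropy plus an L1 penalty on a gradient-based relevance map,
    pushing the classifier to rely on few spectral channels (sparser maps)."""
    x = x.clone().requires_grad_(True)
    ce = F.cross_entropy(model(x), y)
    # gradient-times-input as a simple stand-in relevance map;
    # create_graph=True lets the penalty itself be backpropagated
    grad = torch.autograd.grad(ce, x, create_graph=True)[0]
    return ce + lam * (grad * x).abs().mean()
```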