October 20, 2019

3562 words 17 mins read

Paper Group AWR 177

CNN-Cert: An Efficient Framework for Certifying Robustness of Convolutional Neural Networks. Language Modeling for Code-Switching: Evaluation, Integration of Monolingual Data, and Discriminative Training. Low Frequency Adversarial Perturbation. Deep learning to represent sub-grid processes in climate models. Robust and Scalable Differentiable Neura …

CNN-Cert: An Efficient Framework for Certifying Robustness of Convolutional Neural Networks

Title CNN-Cert: An Efficient Framework for Certifying Robustness of Convolutional Neural Networks
Authors Akhilan Boopathy, Tsui-Wei Weng, Pin-Yu Chen, Sijia Liu, Luca Daniel
Abstract Verifying robustness of neural network classifiers has attracted great interest and attention due to the success of deep neural networks and their unexpected vulnerability to adversarial perturbations. Although finding minimum adversarial distortion of neural networks (with ReLU activations) has been shown to be an NP-complete problem, obtaining a non-trivial lower bound of minimum distortion as a provable robustness guarantee is possible. However, most previous works only focused on simple fully-connected layers (multilayer perceptrons) and were limited to ReLU activations. This motivates us to propose a general and efficient framework, CNN-Cert, that is capable of certifying robustness on general convolutional neural networks. Our framework is general – we can handle various architectures including convolutional layers, max-pooling layers, batch normalization layers, residual blocks, as well as general activation functions; our approach is efficient – by exploiting the special structure of convolutional layers, we achieve up to 17 and 11 times speed-up compared to the state-of-the-art certification algorithms (e.g. Fast-Lin, CROWN) and 366 times speed-up compared to the dual-LP approach, while our algorithm obtains similar or even better verification bounds. In addition, CNN-Cert generalizes state-of-the-art algorithms, e.g. Fast-Lin and CROWN. We demonstrate by extensive experiments that our method outperforms state-of-the-art lower-bound-based certification algorithms in terms of both bound quality and speed.
Tasks
Published 2018-11-29
URL http://arxiv.org/abs/1811.12395v1
PDF http://arxiv.org/pdf/1811.12395v1.pdf
PWC https://paperswithcode.com/paper/cnn-cert-an-efficient-framework-for
Repo https://github.com/ZhaoyangLyu/FROWN
Framework pytorch
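CNN-Cert itself derives linear relaxations layer by layer; as a much simpler illustration of how certified guarantees arise from propagating input bounds, here is a minimal interval-bound-propagation sketch for a single affine layer. The function names and the toy data are illustrative, not part of the paper's code.

```python
# Not CNN-Cert's linear relaxation; a simpler interval-bound-propagation (IBP)
# sketch showing how an L_inf input perturbation translates into provable
# bounds on a layer's pre-activations.
import numpy as np

def interval_bounds_affine(W, b, x, eps):
    """Bounds on W @ x' + b over all x' with ||x' - x||_inf <= eps."""
    center = W @ x + b                          # output at the nominal input
    radius = np.abs(W) @ np.full_like(x, eps)   # worst-case deviation per output
    return center - radius, center + radius

def certify_margin(W, b, x, eps, true_class):
    """True if the lower bound of the true logit exceeds every other upper bound."""
    lb, ub = interval_bounds_affine(W, b, x, eps)
    others = np.delete(ub, true_class)
    return lb[true_class] > others.max()

W = np.random.randn(3, 5); b = np.zeros(3); x = np.random.randn(5)
print(certify_margin(W, b, x, eps=0.01, true_class=int(np.argmax(W @ x + b))))
```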

Language Modeling for Code-Switching: Evaluation, Integration of Monolingual Data, and Discriminative Training

Title Language Modeling for Code-Switching: Evaluation, Integration of Monolingual Data, and Discriminative Training
Authors Hila Gonen, Yoav Goldberg
Abstract We focus on the problem of language modeling for code-switched language, in the context of automatic speech recognition (ASR). Language modeling for code-switched language is challenging for (at least) three reasons: (1) lack of available large-scale code-switched data for training; (2) lack of a replicable evaluation setup that is ASR-directed yet isolates language modeling performance from the other intricacies of the ASR system; and (3) the reliance on generative modeling. We tackle these three issues: we propose an ASR-motivated evaluation setup which is decoupled from an ASR system and the choice of vocabulary, and provide an evaluation dataset for English-Spanish code-switching. This setup lends itself to a discriminative training approach, which we demonstrate to work better than generative language modeling. Finally, we explore a variety of training protocols and verify the effectiveness of training with large amounts of monolingual data followed by fine-tuning with small amounts of code-switched data, for both the generative and discriminative cases.
Tasks Language Modelling, Speech Recognition
Published 2018-10-28
URL https://arxiv.org/abs/1810.11895v3
PDF https://arxiv.org/pdf/1810.11895v3.pdf
PWC https://paperswithcode.com/paper/language-modeling-for-code-switching
Repo https://github.com/gonenhila/codeswitching-lm
Framework none
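The evaluation the abstract describes is ranking-based rather than perplexity-based. A minimal sketch of that kind of setup is below; `lm_log_prob` is a placeholder stand-in for any sentence-level language model scorer, not the paper's model.

```python
# A minimal sketch of an ASR-motivated, ranking-based LM evaluation: the model
# is judged by how often it ranks the reference transcription above
# acoustically confusable alternatives.
import math

def lm_log_prob(sentence, unigram_logp):
    # placeholder scorer: sum of (assumed) unigram log-probabilities
    return sum(unigram_logp.get(w, math.log(1e-6)) for w in sentence.split())

def ranking_accuracy(examples, unigram_logp):
    """examples: list of (reference, [alternative transcriptions])."""
    correct = 0
    for reference, alternatives in examples:
        scores = {s: lm_log_prob(s, unigram_logp) for s in [reference] + alternatives}
        correct += max(scores, key=scores.get) == reference
    return correct / len(examples)
```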

Low Frequency Adversarial Perturbation

Title Low Frequency Adversarial Perturbation
Authors Chuan Guo, Jared S. Frank, Kilian Q. Weinberger
Abstract Adversarial images aim to change a target model’s decision by minimally perturbing a target image. In the black-box setting, the absence of gradient information often renders this search problem costly in terms of query complexity. In this paper we propose to restrict the search for adversarial images to a low frequency domain. This approach is readily compatible with many existing black-box attack frameworks and consistently reduces their query cost by 2 to 4 times. Further, we can circumvent image transformation defenses even when both the model and the defense strategy are unknown. Finally, we demonstrate the efficacy of this technique by fooling the Google Cloud Vision platform with an unprecedented low number of model queries.
Tasks Denoising, Speech Recognition
Published 2018-09-24
URL https://arxiv.org/abs/1809.08758v2
PDF https://arxiv.org/pdf/1809.08758v2.pdf
PWC https://paperswithcode.com/paper/low-frequency-adversarial-perturbation
Repo https://github.com/cg563/low-frequency-adversarial
Framework pytorch
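A minimal sketch of the core idea follows: restrict the perturbation to the low-frequency block of the 2D DCT basis and map it back to pixel space. This shows only the frequency restriction, not a full black-box attack loop; sizes and the scaling are illustrative.

```python
# Sample an L_inf-bounded perturbation supported on the top-left
# freq_dim x freq_dim block of DCT coefficients (low frequencies only).
import numpy as np
from scipy.fft import idct

def low_freq_perturbation(height, width, freq_dim, epsilon, rng=None):
    rng = np.random.default_rng(rng)
    coeffs = np.zeros((height, width))
    coeffs[:freq_dim, :freq_dim] = rng.standard_normal((freq_dim, freq_dim))
    # 2D inverse DCT: apply the 1D transform along each axis
    spatial = idct(idct(coeffs, axis=0, norm='ortho'), axis=1, norm='ortho')
    # rescale into the L_inf ball while keeping the perturbation low-frequency
    return epsilon * spatial / np.abs(spatial).max()

delta = low_freq_perturbation(224, 224, freq_dim=28, epsilon=8 / 255)
print(delta.shape, np.abs(delta).max())
```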

Deep learning to represent sub-grid processes in climate models

Title Deep learning to represent sub-grid processes in climate models
Authors Stephan Rasp, Michael S. Pritchard, Pierre Gentine
Abstract The representation of nonlinear sub-grid processes, especially clouds, has been a major source of uncertainty in climate models for decades. Cloud-resolving models better represent many of these processes and can now be run globally but only for short-term simulations of at most a few years because of computational limitations. Here we demonstrate that deep learning can be used to capture many advantages of cloud-resolving modeling at a fraction of the computational cost. We train a deep neural network to represent all atmospheric sub-grid processes in a climate model by learning from a multi-scale model in which convection is treated explicitly. The trained neural network then replaces the traditional sub-grid parameterizations in a global general circulation model in which it freely interacts with the resolved dynamics and the surface-flux scheme. The prognostic multi-year simulations are stable and closely reproduce not only the mean climate of the cloud-resolving simulation but also key aspects of variability, including precipitation extremes and the equatorial wave spectrum. Furthermore, the neural network approximately conserves energy despite not being explicitly instructed to. Finally, we show that the neural network parameterization generalizes to new surface forcing patterns but struggles to cope with temperatures far outside its training manifold. Our results show the feasibility of using deep learning for climate model parameterization. In a broader context, we anticipate that data-driven Earth System Model development could play a key role in reducing climate prediction uncertainty in the coming decade.
Tasks
Published 2018-06-12
URL http://arxiv.org/abs/1806.04731v3
PDF http://arxiv.org/pdf/1806.04731v3.pdf
PWC https://paperswithcode.com/paper/deep-learning-to-represent-sub-grid-processes
Repo https://github.com/raspstephan/CBRAIN-CAM
Framework none
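As a rough sketch of the emulation setup described above (not the paper's actual configuration), a feed-forward network can be trained to map the coarse-grid column state to sub-grid tendencies produced by a higher-resolution model. The dimensions and the synthetic training pairs below are placeholders.

```python
import torch
import torch.nn as nn

state_dim, tendency_dim = 94, 65          # assumed column input/output sizes
emulator = nn.Sequential(
    nn.Linear(state_dim, 256), nn.ReLU(),
    nn.Linear(256, 256), nn.ReLU(),
    nn.Linear(256, tendency_dim),
)
optimizer = torch.optim.Adam(emulator.parameters(), lr=1e-3)

# stand-in for (coarse state, high-resolution sub-grid tendency) training pairs
x = torch.randn(1024, state_dim)
y = torch.randn(1024, tendency_dim)

for step in range(200):
    loss = nn.functional.mse_loss(emulator(x), y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```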

Robust and Scalable Differentiable Neural Computer for Question Answering

Title Robust and Scalable Differentiable Neural Computer for Question Answering
Authors Jörg Franke, Jan Niehues, Alex Waibel
Abstract Deep learning models are often not easily adaptable to new tasks and require task-specific adjustments. The differentiable neural computer (DNC), a memory-augmented neural network, is designed as a general problem solver which can be used in a wide range of tasks. But in reality, it is hard to apply this model to new tasks. We analyze the DNC and identify possible improvements within the application of question answering. This motivates a more robust and scalable DNC (rsDNC). The objective precondition is to keep the general character of this model intact while making its application more reliable and speeding up its required training time. The rsDNC is distinguished by more robust training, a slim memory unit, and a bidirectional architecture. We not only achieve new state-of-the-art performance on the bAbI task, but also minimize the performance variance between different initializations. Furthermore, we demonstrate the simplified applicability of the rsDNC to new tasks with passable results on the CNN RC task without adaptations.
Tasks Question Answering
Published 2018-07-07
URL http://arxiv.org/abs/1807.02658v1
PDF http://arxiv.org/pdf/1807.02658v1.pdf
PWC https://paperswithcode.com/paper/robust-and-scalable-differentiable-neural
Repo https://github.com/joergfranke/ADNC
Framework tf

Reducing Network Agnostophobia

Title Reducing Network Agnostophobia
Authors Akshay Raj Dhamija, Manuel Günther, Terrance E. Boult
Abstract Agnostophobia, the fear of the unknown, can be experienced by deep learning engineers while applying their networks to real-world applications. Unfortunately, network behavior is not well defined for inputs far from a network’s training set. In an uncontrolled environment, networks face many instances that are not of interest to them and have to be rejected in order to avoid a false positive. Researchers have previously tackled this problem by either a) thresholding softmax, which by construction cannot return “none of the known classes”, or b) using an additional background or garbage class. In this paper, we show that both of these approaches help, but are generally insufficient when previously unseen classes are encountered. We also introduce a new evaluation metric that focuses on comparing the performance of multiple approaches in scenarios where such unseen classes or unknowns are encountered. Our major contributions are simple yet effective Entropic Open-Set and Objectosphere losses that train networks using negative samples from some classes. These novel losses are designed to maximize entropy for unknown inputs while increasing separation in deep feature space by modifying magnitudes of known and unknown samples. Experiments on networks trained to classify classes from MNIST and CIFAR-10 show that our novel loss functions are significantly better at dealing with unknown inputs from datasets such as Devanagari, NotMNIST, CIFAR-100, and SVHN.
Tasks
Published 2018-11-09
URL http://arxiv.org/abs/1811.04110v2
PDF http://arxiv.org/pdf/1811.04110v2.pdf
PWC https://paperswithcode.com/paper/reducing-network-agnostophobia
Repo https://github.com/Vastlab/Reducing-Network-Agnostophobia
Framework none
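A sketch of the Entropic Open-Set idea, under stated assumptions: known-class samples get the usual cross-entropy, while samples marked as unknown (target -1 here, an assumed convention) are pushed toward a uniform softmax, i.e. maximum entropy. This omits the Objectosphere feature-magnitude term and is not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def entropic_openset_loss(logits, targets):
    known = targets >= 0
    loss = torch.tensor(0.0, device=logits.device)
    if known.any():
        loss = loss + F.cross_entropy(logits[known], targets[known])
    if (~known).any():
        log_probs = F.log_softmax(logits[~known], dim=1)
        # cross-entropy against the uniform distribution over the known classes
        loss = loss + (-log_probs.mean(dim=1)).mean()
    return loss

logits = torch.randn(8, 10, requires_grad=True)
targets = torch.tensor([0, 3, 9, -1, -1, 2, -1, 7])   # -1 marks unknown samples
entropic_openset_loss(logits, targets).backward()
```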

Real-Time RGB-D Camera Pose Estimation in Novel Scenes using a Relocalisation Cascade

Title Real-Time RGB-D Camera Pose Estimation in Novel Scenes using a Relocalisation Cascade
Authors Tommaso Cavallari, Stuart Golodetz, Nicholas A. Lord, Julien Valentin, Victor A. Prisacariu, Luigi Di Stefano, Philip H. S. Torr
Abstract Camera pose estimation is an important problem in computer vision. Common techniques either match the current image against keyframes with known poses, directly regress the pose, or establish correspondences between keypoints in the image and points in the scene to estimate the pose. In recent years, regression forests have become a popular alternative to establish such correspondences. They achieve accurate results, but have traditionally needed to be trained offline on the target scene, preventing relocalisation in new environments. Recently, we showed how to circumvent this limitation by adapting a pre-trained forest to a new scene on the fly. The adapted forests achieved relocalisation performance that was on par with that of offline forests, and our approach was able to estimate the camera pose in close to real time. In this paper, we present an extension of this work that achieves significantly better relocalisation performance whilst running fully in real time. To achieve this, we make several changes to the original approach: (i) instead of accepting the camera pose hypothesis without question, we make it possible to score the final few hypotheses using a geometric approach and select the most promising; (ii) we chain several instantiations of our relocaliser together in a cascade, allowing us to try faster but less accurate relocalisation first, only falling back to slower, more accurate relocalisation as necessary; and (iii) we tune the parameters of our cascade to achieve effective overall performance. These changes allow us to significantly improve upon the performance our original state-of-the-art method was able to achieve on the well-known 7-Scenes and Stanford 4 Scenes benchmarks. As additional contributions, we present a way of visualising the internal behaviour of our forests and show how to entirely circumvent the need to pre-train a forest on a generic scene.
Tasks Pose Estimation
Published 2018-10-29
URL https://arxiv.org/abs/1810.12163v2
PDF https://arxiv.org/pdf/1810.12163v2.pdf
PWC https://paperswithcode.com/paper/real-time-rgb-d-camera-pose-estimation-in
Repo https://github.com/torrvision/spaint
Framework none
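The cascade described in the abstract is essentially a control-flow pattern; a minimal sketch is below. The callables `estimate_pose` and `score_hypothesis` are illustrative placeholders, not the actual relocaliser.

```python
# Try cheap relocalisation first and fall back to slower, more accurate stages
# only if the geometric quality score of the best hypothesis is poor.
# `relocalisers` is a list of (estimate_pose, quality_threshold) pairs,
# ordered fastest-first.

def relocalise_cascade(frame, relocalisers, score_hypothesis):
    for estimate_pose, quality_threshold in relocalisers:
        hypotheses = estimate_pose(frame)          # candidate camera poses
        if not hypotheses:
            continue
        best = max(hypotheses, key=lambda pose: score_hypothesis(frame, pose))
        if score_hypothesis(frame, best) >= quality_threshold:
            return best                            # good enough: stop early
    return None                                    # every stage failed
```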

Symbolic Music Genre Transfer with CycleGAN

Title Symbolic Music Genre Transfer with CycleGAN
Authors Gino Brunner, Yuyi Wang, Roger Wattenhofer, Sumu Zhao
Abstract Deep generative models such as Variational Autoencoders (VAEs) and Generative Adversarial Networks (GANs) have recently been applied to style and domain transfer for images, and in the case of VAEs, music. GAN-based models employing several generators and some form of cycle consistency loss have been among the most successful for image domain transfer. In this paper we apply such a model to symbolic music and show the feasibility of our approach for music genre transfer. Evaluations using separate genre classifiers show that the style transfer works well. In order to improve the fidelity of the transformed music, we add additional discriminators that cause the generators to keep the structure of the original music mostly intact, while still achieving strong genre transfer. Visual and audible results further show the potential of our approach. To the best of our knowledge, this paper represents the first application of GANs to symbolic music domain transfer.
Tasks Music Genre Transfer, Style Transfer
Published 2018-09-20
URL http://arxiv.org/abs/1809.07575v1
PDF http://arxiv.org/pdf/1809.07575v1.pdf
PWC https://paperswithcode.com/paper/symbolic-music-genre-transfer-with-cyclegan
Repo https://github.com/sumuzhao/CycleGAN-Music-Style-Transfer
Framework tf
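A minimal sketch of the cycle-consistency term used in CycleGAN-style genre transfer: translating A→B→A (and B→A→B) should reconstruct the input piano-roll. The generators and the flattened piano-roll size below are placeholder assumptions.

```python
import torch
import torch.nn as nn

def cycle_consistency_loss(g_ab, g_ba, real_a, real_b, weight=10.0):
    rec_a = g_ba(g_ab(real_a))     # A -> B -> A
    rec_b = g_ab(g_ba(real_b))     # B -> A -> B
    return weight * (torch.mean(torch.abs(rec_a - real_a)) +
                     torch.mean(torch.abs(rec_b - real_b)))

dim = 84 * 8                                 # placeholder flattened piano-roll segment
g_ab = nn.Sequential(nn.Linear(dim, dim))    # stand-in generators
g_ba = nn.Sequential(nn.Linear(dim, dim))
a, b = torch.rand(4, dim), torch.rand(4, dim)
print(cycle_consistency_loss(g_ab, g_ba, a, b).item())
```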

Deep Bidirectional and Unidirectional LSTM Recurrent Neural Network for Network-wide Traffic Speed Prediction

Title Deep Bidirectional and Unidirectional LSTM Recurrent Neural Network for Network-wide Traffic Speed Prediction
Authors Zhiyong Cui, Ruimin Ke, Ziyuan Pu, Yinhai Wang
Abstract Short-term traffic forecasting based on deep learning methods, especially long short-term memory (LSTM) neural networks, has received much attention in recent years. However, the potential of deep learning methods in traffic forecasting has not yet fully been exploited in terms of the depth of the model architecture, the spatial scale of the prediction area, and the predictive power of spatial-temporal data. In this paper, a deep stacked bidirectional and unidirectional LSTM (SBU-LSTM) neural network architecture is proposed, which considers both forward and backward dependencies in time series data, to predict network-wide traffic speed. A bidirectional LSTM (BDLSTM) layer is exploited to capture spatial features and bidirectional temporal dependencies from historical data. To the best of our knowledge, this is the first time that BDLSTMs have been applied as building blocks for a deep architecture model to measure the backward dependency of traffic data for prediction. The proposed model can handle missing values in input data by using a masking mechanism. Further, this scalable model can predict traffic speed for both freeway and complex urban traffic networks. Comparisons with other classical and state-of-the-art models indicate that the proposed SBU-LSTM neural network achieves superior prediction performance for the whole traffic network in both accuracy and robustness.
Tasks Time Series
Published 2018-01-07
URL https://arxiv.org/abs/1801.02143v2
PDF https://arxiv.org/pdf/1801.02143v2.pdf
PWC https://paperswithcode.com/paper/deep-bidirectional-and-unidirectional-lstm
Repo https://github.com/zhiyongc/Stacked_Bidirectional_Unidirectional_LSTM
Framework pytorch
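A minimal SBU-LSTM-style sketch under stated assumptions: a bidirectional LSTM over the input sequence, followed by a unidirectional LSTM and a linear readout that predicts the next-step speed for every sensor. Hidden sizes, the number of sensors, and the handling of missing values (assumed imputed/flagged upstream) are placeholders, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class SBULSTM(nn.Module):
    def __init__(self, num_sensors, hidden_size=64):
        super().__init__()
        self.bd_lstm = nn.LSTM(num_sensors, hidden_size,
                               batch_first=True, bidirectional=True)
        self.uni_lstm = nn.LSTM(2 * hidden_size, hidden_size, batch_first=True)
        self.readout = nn.Linear(hidden_size, num_sensors)

    def forward(self, x):                 # x: (batch, time, num_sensors)
        h, _ = self.bd_lstm(x)            # (batch, time, 2 * hidden)
        h, _ = self.uni_lstm(h)           # (batch, time, hidden)
        return self.readout(h[:, -1])     # next-step speed for every sensor

model = SBULSTM(num_sensors=323)
print(model(torch.randn(8, 12, 323)).shape)   # torch.Size([8, 323])
```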

Dynamic Multimodal Instance Segmentation guided by natural language queries

Title Dynamic Multimodal Instance Segmentation guided by natural language queries
Authors Edgar Margffoy-Tuay, Juan C. Pérez, Emilio Botero, Pablo Arbeláez
Abstract We address the problem of segmenting an object given a natural language expression that describes it. Current techniques tackle this task by either (i) directly or recursively merging linguistic and visual information in the channel dimension and then performing convolutions; or by (ii) mapping the expression to a space in which it can be thought of as a filter, whose response is directly related to the presence of the object at a given spatial coordinate in the image, so that a convolution can be applied to look for the object. We propose a novel method that integrates these two insights in order to fully exploit the recursive nature of language. Additionally, during the upsampling process, we take advantage of the intermediate information generated when downsampling the image, so that detailed segmentations can be obtained. We compare our method against the state-of-the-art approaches in four standard datasets, in which it surpasses all previous methods in six of the eight splits for this task.
Tasks Instance Segmentation, Semantic Segmentation
Published 2018-07-06
URL http://arxiv.org/abs/1807.02257v2
PDF http://arxiv.org/pdf/1807.02257v2.pdf
PWC https://paperswithcode.com/paper/dynamic-multimodal-instance-segmentation
Repo https://github.com/BCV-Uniandes/query-objseg
Framework pytorch
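A sketch of the "language as a filter" insight mentioned in the abstract: the query embedding is mapped to a 1x1 convolutional filter whose response map scores each spatial location for the referred object. Shapes, the 300-d embedding, and the projection layer are illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

channels = 256
to_filter = nn.Linear(300, channels)          # maps query embedding to a filter

visual = torch.randn(1, channels, 40, 40)     # backbone feature map
query = torch.randn(300)                      # pooled language representation

kernel = to_filter(query).view(1, channels, 1, 1)
response = F.conv2d(visual, kernel)           # (1, 1, 40, 40) relevance map
mask_logits = F.interpolate(response, scale_factor=8, mode='bilinear',
                            align_corners=False)
print(mask_logits.shape)                      # upsampled toward image resolution
```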

wav2letter++: The Fastest Open-source Speech Recognition System

Title wav2letter++: The Fastest Open-source Speech Recognition System
Authors Vineel Pratap, Awni Hannun, Qiantong Xu, Jeff Cai, Jacob Kahn, Gabriel Synnaeve, Vitaliy Liptchinsky, Ronan Collobert
Abstract This paper introduces wav2letter++, the fastest open-source deep learning speech recognition framework. wav2letter++ is written entirely in C++, and uses the ArrayFire tensor library for maximum efficiency. Here we explain the architecture and design of the wav2letter++ system and compare it to other major open-source speech recognition systems. In some cases wav2letter++ is more than 2x faster than other optimized frameworks for training end-to-end neural networks for speech recognition. We also show that wav2letter++'s training times scale linearly to 64 GPUs, the highest we tested, for models with 100 million parameters. High-performance frameworks enable fast iteration, which is often a crucial factor in successful research and model tuning on new datasets and tasks.
Tasks Speech Recognition
Published 2018-12-18
URL http://arxiv.org/abs/1812.07625v1
PDF http://arxiv.org/pdf/1812.07625v1.pdf
PWC https://paperswithcode.com/paper/wav2letter-the-fastest-open-source-speech
Repo https://github.com/mailong25/wav2letter
Framework none

Step Size Matters in Deep Learning

Title Step Size Matters in Deep Learning
Authors Kamil Nar, S. Shankar Sastry
Abstract Training a neural network with the gradient descent algorithm gives rise to a discrete-time nonlinear dynamical system. Consequently, behaviors that are typically observed in these systems emerge during training, such as convergence to an orbit but not to a fixed point or dependence of convergence on the initialization. Step size of the algorithm plays a critical role in these behaviors: it determines the subset of the local optima that the algorithm can converge to, and it specifies the magnitude of the oscillations if the algorithm converges to an orbit. To elucidate the effects of the step size on training of neural networks, we study the gradient descent algorithm as a discrete-time dynamical system, and by analyzing the Lyapunov stability of different solutions, we show the relationship between the step size of the algorithm and the solutions that can be obtained with this algorithm. The results provide an explanation for several phenomena observed in practice, including the deterioration in the training error with increased depth, the hardness of estimating linear mappings with large singular values, and the distinct performance of deep residual networks.
Tasks
Published 2018-05-22
URL http://arxiv.org/abs/1805.08890v2
PDF http://arxiv.org/pdf/1805.08890v2.pdf
PWC https://paperswithcode.com/paper/step-size-matters-in-deep-learning
Repo https://github.com/nar-k/NeurIPS-2018
Framework none
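A toy worked example of the dynamical-systems framing (not the paper's analysis): gradient descent on a simple two-parameter loss is a discrete-time map, and the step size decides whether the iterates settle at a fixed point or keep cycling around an orbit. The loss (w1*w2 - 1)^2, the symmetric initialization, and the step sizes are chosen purely for illustration.

```python
def gd_trajectory(step_size, iters=60, w0=1.1):
    # with w1 = w2 = w, gradient descent on (w1*w2 - 1)^2 reduces to the 1-D map
    # w <- w - step_size * 2 * (w**2 - 1) * w
    w, history = w0, []
    for _ in range(iters):
        w = w - step_size * 2.0 * (w * w - 1.0) * w
        history.append(w)
    return history[-4:]              # the tail reveals fixed point vs orbit

for lr in (0.05, 0.30, 0.55):
    tail = ", ".join(f"{v:.3f}" for v in gd_trajectory(lr))
    print(f"step size {lr:.2f}: last iterates {tail}")
# small step sizes converge to the minimum w = 1; at 0.55 the iterates settle
# into a period-2 oscillation around it instead of a fixed point
```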

Adversarial Attack and Defense on Graph Data: A Survey

Title Adversarial Attack and Defense on Graph Data: A Survey
Authors Lichao Sun, Yingtong Dou, Carl Yang, Ji Wang, Philip S. Yu, Bo Li
Abstract Deep neural networks (DNNs) have been widely applied to various applications including image classification, text generation, audio recognition, and graph data analysis. However, recent studies have shown that DNNs are vulnerable to adversarial attacks. Though there are several works studying adversarial attack and defense strategies on domains such as images and natural language processing, it is still difficult to directly transfer the learned knowledge to graph-structured data due to its representation challenges. Given the importance of graph analysis, an increasing number of works start to analyze the robustness of machine learning models on graph data. Nevertheless, current studies considering adversarial behaviors on graph data usually focus on specific types of attacks with certain assumptions. In addition, each work proposes its own mathematical formulation, which makes the comparison among different methods difficult. Therefore, in this paper, we aim to survey existing adversarial learning strategies on graph data and first provide a unified formulation for adversarial learning on graph data which covers most adversarial learning studies on graphs. Moreover, we also compare different attacks and defenses on graph data and discuss their corresponding contributions and limitations. In this work, we systematically organize the considered works based on the features of each topic. This survey not only serves as a reference for the research community, but also brings a clear picture to researchers outside this research domain. Besides, we also create an online resource and keep it updated with the relevant papers from the last two years. More details of the comparisons of various studies based on this survey are open-sourced at https://github.com/YingtongDou/graph-adversarial-learning-literature.
Tasks Adversarial Attack, Image Classification, Text Generation
Published 2018-12-26
URL https://arxiv.org/abs/1812.10528v2
PDF https://arxiv.org/pdf/1812.10528v2.pdf
PWC https://paperswithcode.com/paper/adversarial-attack-and-defense-on-graph-data
Repo https://github.com/YingtongDou/graph-adversarial-learning-literature
Framework none

DGC-Net: Dense Geometric Correspondence Network

Title DGC-Net: Dense Geometric Correspondence Network
Authors Iaroslav Melekhov, Aleksei Tiulpin, Torsten Sattler, Marc Pollefeys, Esa Rahtu, Juho Kannala
Abstract This paper addresses the challenge of dense pixel correspondence estimation between two images. This problem is closely related to the optical flow estimation task, where convolutional networks (CNNs) have recently achieved significant progress. While optical flow methods produce very accurate results for small pixel translations and limited appearance variation, they hardly deal with the strong geometric transformations that we consider in this work. In this paper, we propose a coarse-to-fine CNN-based framework that can leverage the advantages of optical flow approaches and extend them to the case of large transformations, providing dense and subpixel-accurate estimates. It is trained on synthetic transformations and demonstrates very good performance on unseen, realistic data. Further, we apply our method to the problem of relative camera pose estimation and demonstrate that the model outperforms existing dense approaches.
Tasks Dense Pixel Correspondence Estimation, Optical Flow Estimation
Published 2018-10-19
URL http://arxiv.org/abs/1810.08393v2
PDF http://arxiv.org/pdf/1810.08393v2.pdf
PWC https://paperswithcode.com/paper/dgc-net-dense-geometric-correspondence
Repo https://github.com/AaltoVision/DGC-Net
Framework pytorch
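At the coarsest level, frameworks like this one typically compare every position in one feature map against every position in the other; a minimal sketch of such a global correlation (cost volume) is below. Feature sizes are illustrative, and this is only one component of the full pipeline.

```python
import torch
import torch.nn.functional as F

def global_correlation(feat_src, feat_tgt):
    # feat_*: (batch, channels, H, W); L2-normalise along channels first
    b, c, h, w = feat_src.shape
    src = F.normalize(feat_src, dim=1).view(b, c, h * w)
    tgt = F.normalize(feat_tgt, dim=1).view(b, c, h * w)
    corr = torch.bmm(src.transpose(1, 2), tgt)   # (b, H*W, H*W) cosine similarities
    return corr.view(b, h * w, h, w)             # a matching score map per source cell

corr = global_correlation(torch.randn(2, 128, 15, 15), torch.randn(2, 128, 15, 15))
print(corr.shape)    # torch.Size([2, 225, 15, 15])
```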

ChamNet: Towards Efficient Network Design through Platform-Aware Model Adaptation

Title ChamNet: Towards Efficient Network Design through Platform-Aware Model Adaptation
Authors Xiaoliang Dai, Peizhao Zhang, Bichen Wu, Hongxu Yin, Fei Sun, Yanghan Wang, Marat Dukhan, Yunqing Hu, Yiming Wu, Yangqing Jia, Peter Vajda, Matt Uyttendaele, Niraj K. Jha
Abstract This paper proposes an efficient neural network (NN) architecture design methodology called Chameleon that honors given resource constraints. Instead of developing new building blocks or using computationally intensive reinforcement learning algorithms, our approach leverages existing efficient network building blocks and focuses on exploiting hardware traits and adapting computation resources to fit target latency and/or energy constraints. We formulate platform-aware NN architecture search in an optimization framework and propose a novel algorithm to search for optimal architectures aided by efficient accuracy and resource (latency and/or energy) predictors. At the core of our algorithm lies an accuracy predictor built atop Gaussian Process with Bayesian optimization for iterative sampling. With a one-time building cost for the predictors, our algorithm produces state-of-the-art model architectures on different platforms under given constraints in just minutes. Our results show that adapting computation resources to building blocks is critical to model performance. Without the addition of any bells and whistles, our models achieve significant accuracy improvements against state-of-the-art hand-crafted and automatically designed architectures. We achieve 73.8% and 75.3% top-1 accuracy on ImageNet at 20ms latency on a mobile CPU and DSP. At reduced latency, our models achieve up to 8.5% (4.8%) and 6.6% (9.3%) absolute top-1 accuracy improvements compared to MobileNetV2 and MnasNet, respectively, on a mobile CPU (DSP), and 2.7% (4.6%) and 5.6% (2.6%) accuracy gains over ResNet-101 and ResNet-152, respectively, on an Nvidia GPU (Intel CPU).
Tasks Neural Architecture Search
Published 2018-12-21
URL http://arxiv.org/abs/1812.08934v1
PDF http://arxiv.org/pdf/1812.08934v1.pdf
PWC https://paperswithcode.com/paper/chamnet-towards-efficient-network-design
Repo https://github.com/facebookresearch/mobile-vision
Framework caffe2
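A rough sketch of the predictor-driven search loop, under stated assumptions: a Gaussian Process maps an architecture's configuration vector (e.g. per-stage widths or expansion factors) to predicted accuracy, candidates are filtered by a latency estimate, and the best predicted candidate is selected. The synthetic data, the latency stand-in, and the simple exploration bonus are all placeholders, not ChamNet's actual predictors.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

rng = np.random.default_rng(0)
configs = rng.uniform(0.5, 2.0, size=(40, 6))      # already-measured architectures (toy)
accuracy = 60 + 8 * configs.mean(axis=1) + rng.normal(0, 0.3, 40)

gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True).fit(configs, accuracy)

def latency_estimate(cfg):                          # stand-in for a hardware latency LUT
    return 10.0 * cfg.sum()

candidates = rng.uniform(0.5, 2.0, size=(500, 6))
feasible = candidates[[latency_estimate(c) <= 75.0 for c in candidates]]
pred_acc, pred_std = gp.predict(feasible, return_std=True)
best = feasible[np.argmax(pred_acc + 0.5 * pred_std)]   # simple exploration bonus
print(best, pred_acc.max())
```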