Paper Group ANR 546
Adversarial examples from computational constraints. On Accurate Evaluation of GANs for Language Generation. Grounding Visual Explanations. Dirichlet-based Gaussian Processes for Large-scale Calibrated Classification. GANs for generating EFT models. Structurally Sparsified Backward Propagation for Faster Long Short-Term Memory Training. Dynamic Video Segmentation Network. Visual Depth Mapping from Monocular Images using Recurrent Convolutional Neural Networks. CNN in MRF: Video Object Segmentation via Inference in A CNN-Based Higher-Order Spatio-Temporal MRF. Adversarial Framework for Unsupervised Learning of Motion Dynamics in Videos. Video Object Segmentation with Language Referring Expressions. Data Driven Governing Equations Approximation Using Deep Neural Networks. A Quantitative Analysis of the Effect of Batch Normalization on Gradient Descent. Multi-task dialog act and sentiment recognition on Mastodon. Video Object Segmentation with Joint Re-identification and Attention-Aware Mask Propagation.
Adversarial examples from computational constraints
Title | Adversarial examples from computational constraints |
Authors | Sébastien Bubeck, Eric Price, Ilya Razenshteyn |
Abstract | Why are classifiers in high dimension vulnerable to “adversarial” perturbations? We show that it is likely not due to information theoretic limitations, but rather it could be due to computational constraints. First we prove that, for a broad set of classification tasks, the mere existence of a robust classifier implies that it can be found by a possibly exponential-time algorithm with relatively few training examples. Then we give a particular classification task where learning a robust classifier is computationally intractable. More precisely we construct a binary classification task in high dimensional space which is (i) information theoretically easy to learn robustly for large perturbations, (ii) efficiently learnable (non-robustly) by a simple linear separator, (iii) yet is not efficiently robustly learnable, even for small perturbations, by any algorithm in the statistical query (SQ) model. This example gives an exponential separation between classical learning and robust learning in the statistical query model. It suggests that adversarial examples may be an unavoidable byproduct of computational limitations of learning algorithms. |
Tasks | |
Published | 2018-05-25 |
URL | http://arxiv.org/abs/1805.10204v1 |
http://arxiv.org/pdf/1805.10204v1.pdf | |
PWC | https://paperswithcode.com/paper/adversarial-examples-from-computational |
Repo | |
Framework | |
On Accurate Evaluation of GANs for Language Generation
Title | On Accurate Evaluation of GANs for Language Generation |
Authors | Stanislau Semeniuta, Aliaksei Severyn, Sylvain Gelly |
Abstract | Generative Adversarial Networks (GANs) are a promising approach to language generation. The latest works introducing novel GAN models for language generation use n-gram based metrics for evaluation and only report single scores of the best run. In this paper, we argue that this often misrepresents the true picture and does not tell the full story, as GAN models can be extremely sensitive to the random initialization and small deviations from the best hyperparameter choice. In particular, we demonstrate that the previously used BLEU score is not sensitive to semantic deterioration of generated texts and propose alternative metrics that better capture the quality and diversity of the generated samples. We also conduct a set of experiments comparing a number of GAN models for text with a conventional Language Model (LM) and find that neither of the considered models performs convincingly better than the LM. |
Tasks | Language Modelling, Text Generation |
Published | 2018-06-13 |
URL | https://arxiv.org/abs/1806.04936v3 |
https://arxiv.org/pdf/1806.04936v3.pdf | |
PWC | https://paperswithcode.com/paper/on-accurate-evaluation-of-gans-for-language |
Repo | |
Framework | |
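The abstract's central claim, that n-gram metrics such as BLEU are largely insensitive to semantic deterioration, is easy to illustrate. Below is a minimal sketch (not the paper's evaluation pipeline) using NLTK's sentence-level BLEU; the toy sentences are invented purely for illustration.

```python
# Minimal sketch, assuming NLTK is installed: BLEU only compares surface
# n-grams, so a fluent but semantically wrong sample can still score highly.
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

references = [["the", "cat", "sat", "on", "the", "mat"]]
faithful = ["the", "cat", "sat", "on", "the", "mat"]
corrupted = ["the", "dog", "sat", "on", "the", "mat"]  # one content word flipped

smooth = SmoothingFunction().method1
print("faithful :", sentence_bleu(references, faithful, smoothing_function=smooth))
print("corrupted:", sentence_bleu(references, corrupted, smoothing_function=smooth))
```

The corrupted hypothesis keeps most of its n-gram overlap and therefore most of its score, which is the kind of blind spot the paper's alternative metrics are meant to expose.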
Grounding Visual Explanations
Title | Grounding Visual Explanations |
Authors | Lisa Anne Hendricks, Ronghang Hu, Trevor Darrell, Zeynep Akata |
Abstract | Existing visual explanation generating agents learn to fluently justify a class prediction. However, they may mention visual attributes which reflect a strong class prior, although the evidence may not actually be in the image. This is particularly concerning as ultimately such agents fail in building trust with human users. To overcome this limitation, we propose a phrase-critic model to refine generated candidate explanations augmented with flipped phrases which we use as negative examples while training. At inference time, our phrase-critic model takes an image and a candidate explanation as input and outputs a score indicating how well the candidate explanation is grounded in the image. Our explainable AI agent is capable of providing counter arguments for an alternative prediction, i.e. counterfactuals, along with explanations that justify the correct classification decisions. Our model improves the textual explanation quality of fine-grained classification decisions on the CUB dataset by mentioning phrases that are grounded in the image. Moreover, on the FOIL tasks, our agent detects when there is a mistake in the sentence, grounds the incorrect phrase and corrects it significantly better than other models. |
Tasks | |
Published | 2018-07-25 |
URL | http://arxiv.org/abs/1807.09685v2 |
http://arxiv.org/pdf/1807.09685v2.pdf | |
PWC | https://paperswithcode.com/paper/grounding-visual-explanations |
Repo | |
Framework | |
Dirichlet-based Gaussian Processes for Large-scale Calibrated Classification
Title | Dirichlet-based Gaussian Processes for Large-scale Calibrated Classification |
Authors | Dimitrios Milios, Raffaello Camoriano, Pietro Michiardi, Lorenzo Rosasco, Maurizio Filippone |
Abstract | In this paper, we study the problem of deriving fast and accurate classification algorithms with uncertainty quantification. Gaussian process classification provides a principled approach, but the corresponding computational burden is hardly sustainable in large-scale problems and devising efficient alternatives is a challenge. In this work, we investigate if and how Gaussian process regression directly applied to the classification labels can be used to tackle this question. While in this case training time is remarkably faster, predictions need to be calibrated for classification and uncertainty estimation. To this aim, we propose a novel approach based on interpreting the labels as the output of a Dirichlet distribution. Extensive experimental results show that the proposed approach provides essentially the same accuracy and uncertainty quantification as Gaussian process classification while requiring only a fraction of the computational resources. |
Tasks | Gaussian Processes |
Published | 2018-05-28 |
URL | http://arxiv.org/abs/1805.10915v1 |
http://arxiv.org/pdf/1805.10915v1.pdf | |
PWC | https://paperswithcode.com/paper/dirichlet-based-gaussian-processes-for-large |
Repo | |
Framework | |
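The key idea in the abstract is to reinterpret classification labels so that plain Gaussian process regression yields calibrated class probabilities. A minimal sketch of one such label transformation follows, assuming one-hot labels are treated as Dirichlet pseudo-counts and each count is moment-matched with a log-normal; the smoothing constant alpha_eps and the exact formulas are illustrative assumptions and may differ from the paper's.

```python
# Illustrative sketch only: turn integer class labels into per-class
# regression targets (mean and variance in log space) that a GP regressor
# could be fit to. alpha_eps is an assumed smoothing constant.
import numpy as np

def dirichlet_regression_targets(y, n_classes, alpha_eps=0.01):
    """y: integer class labels of shape (n,)."""
    alpha = np.full((len(y), n_classes), alpha_eps)
    alpha[np.arange(len(y)), y] += 1.0       # the observed class gets one count
    sigma2 = np.log(1.0 / alpha + 1.0)       # log-normal variance per class
    mu = np.log(alpha) - sigma2 / 2.0        # log-normal mean per class
    return mu, sigma2

mu, sigma2 = dirichlet_regression_targets(np.array([0, 2, 1]), n_classes=3)
print(mu.shape, sigma2.shape)  # (3, 3) (3, 3): one heteroskedastic target per class
```

One GP per class is then regressed on these targets and predictions are mapped back to the simplex, which is what keeps the approach as cheap as GP regression.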
GANs for generating EFT models
Title | GANs for generating EFT models |
Authors | Harold Erbin, Sven Krippendorf |
Abstract | We initiate a way of generating models by computer, satisfying both experimental and theoretical constraints. In particular, we present a framework which allows the generation of effective field theories. We use Generative Adversarial Networks to generate these models and we generate examples which go beyond the examples known to the machine. As a starting point, we apply this idea to the generation of supersymmetric field theories. In this case, the machine knows consistent examples of supersymmetric field theories with a single field and generates new examples of such theories. In the generated potentials we find distinct properties (here, the number of minima in the scalar potential) with values not found in the training data. We comment on potential further applications of this framework. |
Tasks | |
Published | 2018-09-06 |
URL | http://arxiv.org/abs/1809.02612v1 |
http://arxiv.org/pdf/1809.02612v1.pdf | |
PWC | https://paperswithcode.com/paper/gans-for-generating-eft-models |
Repo | |
Framework | |
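As a rough illustration of the setup, the toy sketch below trains a GAN whose samples are fixed-length vectors of scalar-potential coefficients. The coefficient representation, the network sizes, and the random stand-in for "known consistent theories" are assumptions made purely for illustration; the paper's actual encoding of supersymmetric field theories is not reproduced here.

```python
# Toy GAN sketch over potential coefficients; all sizes are illustrative.
import torch
import torch.nn as nn

n_coeffs, z_dim = 8, 16
G = nn.Sequential(nn.Linear(z_dim, 64), nn.ReLU(), nn.Linear(64, n_coeffs))
D = nn.Sequential(nn.Linear(n_coeffs, 64), nn.ReLU(), nn.Linear(64, 1))
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()

real = torch.randn(256, n_coeffs)  # stand-in for coefficients of known consistent theories

for step in range(200):
    # discriminator update: real vs. generated coefficient vectors
    fake = G(torch.randn(64, z_dim)).detach()
    d_loss = bce(D(real[:64]), torch.ones(64, 1)) + bce(D(fake), torch.zeros(64, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()
    # generator update: fool the discriminator
    fake = G(torch.randn(64, z_dim))
    g_loss = bce(D(fake), torch.ones(64, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()

print("sampled potential coefficients:", G(torch.randn(1, z_dim)).detach())
```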
Structurally Sparsified Backward Propagation for Faster Long Short-Term Memory Training
Title | Structurally Sparsified Backward Propagation for Faster Long Short-Term Memory Training |
Authors | Maohua Zhu, Jason Clemons, Jeff Pool, Minsoo Rhu, Stephen W. Keckler, Yuan Xie |
Abstract | Exploiting sparsity enables hardware systems to run neural networks faster and more energy-efficiently. However, most prior sparsity-centric optimization techniques only accelerate the forward pass of neural networks and usually require an even longer training process with iterative pruning and retraining. We observe that artificially inducing sparsity in the gradients of the gates in an LSTM cell has little impact on the training quality. Further, we can enforce structured sparsity in the gate gradients to make the LSTM backward pass up to 45% faster than the state-of-the-art dense approach and 168% faster than the state-of-the-art sparsifying method on modern GPUs. Though the structured sparsifying method can impact the accuracy of a model, this performance gap can be eliminated by mixing our sparse training method and the standard dense training method. Experimental results show that the mixed method can achieve comparable results in a shorter time span than using purely dense training. |
Tasks | |
Published | 2018-06-01 |
URL | http://arxiv.org/abs/1806.00512v1 |
http://arxiv.org/pdf/1806.00512v1.pdf | |
PWC | https://paperswithcode.com/paper/structurally-sparsified-backward-propagation |
Repo | |
Framework | |
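The speed-up described above comes from zeroing gate gradients in a structured, block-wise pattern during the backward pass. The NumPy sketch below shows one plausible block-pruning rule; the block size, keep ratio, and L1 scoring are assumptions for illustration, not the paper's exact scheme or its GPU implementation.

```python
# Hedged sketch: enforce block-structured sparsity on a gate-gradient matrix
# by zeroing the blocks with the smallest L1 norm.
import numpy as np

def sparsify_gate_grads(grad, block=16, keep_ratio=0.5):
    rows, cols = grad.shape
    assert cols % block == 0, "columns must be divisible by the block size"
    blocks = grad.reshape(rows, cols // block, block)
    scores = np.abs(blocks).sum(axis=-1)            # one score per block
    k = int(keep_ratio * scores.size)
    thresh = np.partition(scores.ravel(), -k)[-k]   # k-th largest block score
    mask = (scores >= thresh)[..., None]
    return (blocks * mask).reshape(rows, cols)

g = np.random.randn(8, 64)
print("kept fraction:", (sparsify_gate_grads(g) != 0).mean())
```

Because whole blocks are zeroed rather than scattered elements, the resulting matrix products are much easier to accelerate on GPUs than unstructured sparsity, which is the point the abstract makes about the backward pass.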
Dynamic Video Segmentation Network
Title | Dynamic Video Segmentation Network |
Authors | Yu-Syuan Xu, Tsu-Jui Fu, Hsuan-Kung Yang, Chun-Yi Lee |
Abstract | In this paper, we present a detailed design of dynamic video segmentation network (DVSNet) for fast and efficient semantic video segmentation. DVSNet consists of two convolutional neural networks: a segmentation network and a flow network. The former generates highly accurate semantic segmentations, but is deeper and slower. The latter is much faster than the former, but its output requires further processing to generate less accurate semantic segmentations. We explore the use of a decision network to adaptively assign different frame regions to different networks based on a metric called expected confidence score. Frame regions with a higher expected confidence score traverse the flow network. Frame regions with a lower expected confidence score have to pass through the segmentation network. We have performed extensive experiments on various configurations of DVSNet, and investigated a number of variants for the proposed decision network. The experimental results show that our DVSNet is able to achieve up to 70.4% mIoU at 19.8 fps on the Cityscapes dataset. A high speed version of DVSNet is able to deliver 30.4 fps with 63.2% mIoU on the same dataset. DVSNet is also able to reduce the computational workload by up to 95%. |
Tasks | Video Semantic Segmentation |
Published | 2018-04-03 |
URL | http://arxiv.org/abs/1804.00931v2 |
http://arxiv.org/pdf/1804.00931v2.pdf | |
PWC | https://paperswithcode.com/paper/dynamic-video-segmentation-network |
Repo | |
Framework | |
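The decision logic described in the abstract, routing confident frame regions through the cheap flow network and uncertain ones through the full segmentation network, can be sketched as follows. The three networks are stand-in callables and the confidence threshold is an assumed hyperparameter, not a value from the paper.

```python
# Routing sketch only; decision_net, flow_net and seg_net are placeholders.
def segment_frame(regions, key_features, decision_net, flow_net, seg_net,
                  confidence_threshold=0.85):
    outputs = []
    for region in regions:
        score = decision_net(region)  # expected confidence score for this region
        if score >= confidence_threshold:
            # cheap path: warp cached key-frame features with the flow network
            outputs.append(flow_net(region, key_features))
        else:
            # expensive path: run the full segmentation network
            outputs.append(seg_net(region))
    return outputs

# toy demo with lambdas standing in for the three networks
print(segment_frame(
    regions=[1, 2, 3],
    key_features=None,
    decision_net=lambda r: 0.9 if r != 2 else 0.3,
    flow_net=lambda r, kf: f"flow({r})",
    seg_net=lambda r: f"seg({r})",
))  # ['flow(1)', 'seg(2)', 'flow(3)']
```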
Visual Depth Mapping from Monocular Images using Recurrent Convolutional Neural Networks
Title | Visual Depth Mapping from Monocular Images using Recurrent Convolutional Neural Networks |
Authors | John Mern, Kyle Julian, Rachael E. Tompa, Mykel J. Kochenderfer |
Abstract | A reliable sense-and-avoid system is critical to enabling safe autonomous operation of unmanned aircraft. Existing sense-and-avoid methods often require specialized sensors that are too large or power intensive for use on small unmanned vehicles. This paper presents a method to estimate object distances based on visual image sequences, allowing for the use of low-cost, on-board monocular cameras as simple collision avoidance sensors. We present a deep recurrent convolutional neural network and training method to generate depth maps from video sequences. Our network is trained using simulated camera and depth data generated with Microsoft’s AirSim simulator. Empirically, we show that our model achieves superior performance compared to models generated using prior methods. We further demonstrate that the method can be used for sense-and-avoid of obstacles in simulation. |
Tasks | Depth And Camera Motion |
Published | 2018-12-10 |
URL | http://arxiv.org/abs/1812.04082v1 |
http://arxiv.org/pdf/1812.04082v1.pdf | |
PWC | https://paperswithcode.com/paper/visual-depth-mapping-from-monocular-images |
Repo | |
Framework | |
CNN in MRF: Video Object Segmentation via Inference in A CNN-Based Higher-Order Spatio-Temporal MRF
Title | CNN in MRF: Video Object Segmentation via Inference in A CNN-Based Higher-Order Spatio-Temporal MRF |
Authors | Linchao Bao, Baoyuan Wu, Wei Liu |
Abstract | This paper addresses the problem of video object segmentation, where the initial object mask is given in the first frame of an input video. We propose a novel spatio-temporal Markov Random Field (MRF) model defined over pixels to handle this problem. Unlike conventional MRF models, the spatial dependencies among pixels in our model are encoded by a Convolutional Neural Network (CNN). Specifically, for a given object, the probability of a labeling to a set of spatially neighboring pixels can be predicted by a CNN trained for this specific object. As a result, higher-order, richer dependencies among pixels in the set can be implicitly modeled by the CNN. With temporal dependencies established by optical flow, the resulting MRF model combines both spatial and temporal cues for tackling video object segmentation. However, performing inference in the MRF model is very difficult due to the very high-order dependencies. To this end, we propose a novel CNN-embedded algorithm to perform approximate inference in the MRF. This algorithm proceeds by alternating between a temporal fusion step and a feed-forward CNN step. When initialized with an appearance-based one-shot segmentation CNN, our model outperforms the winning entries of the DAVIS 2017 Challenge, without resorting to model ensembling or any dedicated detectors. |
Tasks | One-Shot Segmentation, Optical Flow Estimation, Semantic Segmentation, Video Object Segmentation, Video Semantic Segmentation |
Published | 2018-03-26 |
URL | http://arxiv.org/abs/1803.09453v1 |
http://arxiv.org/pdf/1803.09453v1.pdf | |
PWC | https://paperswithcode.com/paper/cnn-in-mrf-video-object-segmentation-via |
Repo | |
Framework | |
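The approximate inference procedure described above alternates a temporal fusion step with a feed-forward CNN step. The schematic sketch below captures only this control flow; the fusion and refinement callables are placeholders, not the paper's actual modules.

```python
# Control-flow sketch of the alternating inference; all callables are stand-ins.
def approximate_inference(frames, init_mask, temporal_fusion, cnn_refine, n_iters=3):
    masks = [init_mask] + [None] * (len(frames) - 1)
    for _ in range(n_iters):
        # temporal fusion: propagate current mask estimates along optical flow
        masks = temporal_fusion(frames, masks)
        # feed-forward step: refine each propagated mask with the per-object CNN
        masks = [cnn_refine(f, m) for f, m in zip(frames, masks)]
    return masks

# toy demo with placeholder callables
print(approximate_inference(
    frames=["f0", "f1", "f2"],
    init_mask="m0",
    temporal_fusion=lambda fs, ms: [ms[0]] * len(fs),
    cnn_refine=lambda f, m: f"refine({f},{m})",
))
```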
Adversarial Framework for Unsupervised Learning of Motion Dynamics in Videos
Title | Adversarial Framework for Unsupervised Learning of Motion Dynamics in Videos |
Authors | C. Spampinato, S. Palazzo, P. D’Oro, D. Giordano, M. Shah |
Abstract | Human behavior understanding in videos is a complex, still unsolved problem that requires accurately modeling motion at both the local (pixel-wise dense prediction) and global (aggregation of motion cues) levels. Current approaches based on supervised learning require large amounts of annotated data, whose scarce availability is one of the main limiting factors to the development of general solutions. Unsupervised learning can instead leverage the vast number of videos available on the web and is a promising solution for overcoming the existing limitations. In this paper, we propose an adversarial GAN-based framework that learns video representations and dynamics through a self-supervision mechanism in order to perform dense and global prediction in videos. Our approach synthesizes videos by 1) factorizing the process into the generation of static visual content and motion, 2) learning a suitable representation of a motion latent space in order to enforce spatio-temporal coherency of object trajectories, and 3) incorporating motion estimation and pixel-wise dense prediction into the training procedure. Self-supervision is enforced by using motion masks produced by the generator, as a by-product of its generation process, to supervise the discriminator network in performing dense prediction. Performance evaluation, carried out on standard benchmarks, shows that our approach is able to learn, in an unsupervised way, both local and global video dynamics. The learned representations, then, support the training of video object segmentation methods with considerably fewer (about 50%) annotations, giving performance comparable to the state of the art. Furthermore, the proposed method achieves promising performance in generating realistic videos, outperforming state-of-the-art approaches especially on motion-related metrics. |
Tasks | Motion Estimation, Semantic Segmentation, Temporal Action Localization, Unsupervised Video Object Segmentation, Video Generation, Video Object Segmentation, Video Semantic Segmentation |
Published | 2018-03-24 |
URL | https://arxiv.org/abs/1803.09092v2 |
https://arxiv.org/pdf/1803.09092v2.pdf | |
PWC | https://paperswithcode.com/paper/vos-gan-adversarial-learning-of-visual |
Repo | |
Framework | |
Video Object Segmentation with Language Referring Expressions
Title | Video Object Segmentation with Language Referring Expressions |
Authors | Anna Khoreva, Anna Rohrbach, Bernt Schiele |
Abstract | Most state-of-the-art semi-supervised video object segmentation methods rely on a pixel-accurate mask of a target object provided for the first frame of a video. However, obtaining a detailed segmentation mask is expensive and time-consuming. In this work we explore an alternative way of identifying a target object, namely by employing language referring expressions. Besides being a more practical and natural way of pointing out a target object, using language specifications can help to avoid drift as well as make the system more robust to complex dynamics and appearance variations. Leveraging recent advances of language grounding models designed for images, we propose an approach to extend them to video data, ensuring temporally coherent predictions. To evaluate our method we augment the popular video object segmentation benchmarks, DAVIS’16 and DAVIS’17 with language descriptions of target objects. We show that our language-supervised approach performs on par with the methods which have access to a pixel-level mask of the target object on DAVIS’16 and is competitive to methods using scribbles on the challenging DAVIS’17 dataset. |
Tasks | Semantic Segmentation, Semi-supervised Video Object Segmentation, Video Object Segmentation, Video Semantic Segmentation |
Published | 2018-03-21 |
URL | http://arxiv.org/abs/1803.08006v3 |
http://arxiv.org/pdf/1803.08006v3.pdf | |
PWC | https://paperswithcode.com/paper/video-object-segmentation-with-language |
Repo | |
Framework | |
Data Driven Governing Equations Approximation Using Deep Neural Networks
Title | Data Driven Governing Equations Approximation Using Deep Neural Networks |
Authors | Tong Qin, Kailiang Wu, Dongbin Xiu |
Abstract | We present a numerical framework for approximating unknown governing equations using observation data and deep neural networks (DNN). In particular, we propose to use residual network (ResNet) as the basic building block for equation approximation. We demonstrate that the ResNet block can be considered as a one-step method that is exact in temporal integration. We then present two multi-step methods, the recurrent ResNet (RT-ResNet) method and the recursive ResNet (RS-ResNet) method. The RT-ResNet is a multi-step method on uniform time steps, whereas the RS-ResNet is an adaptive multi-step method using variable time steps. All three methods presented here are based on the integral form of the underlying dynamical system. As a result, they do not require time derivative data for equation recovery and can cope with relatively coarsely distributed trajectory data. Several numerical examples are presented to demonstrate the performance of the methods. |
Tasks | |
Published | 2018-11-13 |
URL | http://arxiv.org/abs/1811.05537v1 |
http://arxiv.org/pdf/1811.05537v1.pdf | |
PWC | https://paperswithcode.com/paper/data-driven-governing-equations-approximation |
Repo | |
Framework | |
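The building block here is a ResNet step read as a one-step integrator, x_{n+1} = x_n + N(x_n), which is then applied recurrently over time. A minimal PyTorch sketch of this view follows; the network width and the unrolling depth are illustrative choices, not the paper's settings, and no training loop is shown.

```python
# Minimal sketch of the one-step view x_{n+1} = x_n + N(x_n) and its
# recurrent (RT-ResNet-style) unrolling over uniform time steps.
import torch
import torch.nn as nn

class ResNetStep(nn.Module):
    def __init__(self, dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, hidden), nn.Tanh(),
                                 nn.Linear(hidden, dim))

    def forward(self, x):
        return x + self.net(x)  # one time step of the learned dynamics

def rollout(step, x0, n_steps):
    xs = [x0]
    for _ in range(n_steps):
        xs.append(step(xs[-1]))
    return torch.stack(xs)

step = ResNetStep(dim=2)
trajectory = rollout(step, torch.randn(5, 2), n_steps=10)
print(trajectory.shape)  # (11, 5, 2): 5 trajectories of a 2-dimensional system
```

Because the block learns the increment between consecutive states, it can be fit directly to trajectory snapshots without estimating time derivatives, which is the property the abstract emphasizes.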
A Quantitative Analysis of the Effect of Batch Normalization on Gradient Descent
Title | A Quantitative Analysis of the Effect of Batch Normalization on Gradient Descent |
Authors | Yongqiang Cai, Qianxiao Li, Zuowei Shen |
Abstract | Despite its empirical success and recent theoretical progress, a quantitative analysis of the effect of batch normalization (BN) on the convergence and stability of gradient descent is generally lacking. In this paper, we provide such an analysis on the simple problem of ordinary least squares (OLS). Since the precise dynamical properties of gradient descent (GD) are completely known for the OLS problem, it allows us to isolate and compare the additional effects of BN. More precisely, we show that unlike GD, gradient descent with BN (BNGD) converges for arbitrary learning rates for the weights, and the convergence remains linear under mild conditions. Moreover, we quantify two different sources of acceleration of BNGD over GD: one due to over-parameterization, which improves the effective condition number, and another due to having a large range of learning rates giving rise to fast descent. These phenomena set BNGD apart from GD and could account for much of its robustness properties. These findings are confirmed quantitatively by numerical experiments, which further show that many of the uncovered properties of BNGD in OLS are also observed qualitatively in more complex supervised learning problems. |
Tasks | |
Published | 2018-09-29 |
URL | https://arxiv.org/abs/1810.00122v2 |
https://arxiv.org/pdf/1810.00122v2.pdf | |
PWC | https://paperswithcode.com/paper/on-the-convergence-and-robustness-of-batch |
Repo | |
Framework | |
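The analyzed setting, gradient descent on ordinary least squares with and without batch normalization, can be reproduced in a few lines. The sketch below applies BN to the scalar output w^T x with a learnable scale, which is a simplified reading of the setup; the data, learning rates, and step counts are arbitrary assumptions rather than values from the paper.

```python
# Simplified sketch: plain GD on OLS versus GD on a batch-normalized
# reparameterization of the same linear model. Hyperparameters are arbitrary.
import torch

torch.manual_seed(0)
n, d = 2000, 10
X = torch.randn(n, d) * torch.linspace(0.2, 3.0, d)  # ill-conditioned design
y = X @ torch.randn(d)

def run(bn, lr, steps=300):
    w = (0.1 * torch.randn(d)).requires_grad_()
    a = torch.ones(1, requires_grad=True)            # learnable scale after BN
    loss = None
    for _ in range(steps):
        u = X @ w
        pred = a * (u - u.mean()) / (u.std() + 1e-8) if bn else u
        loss = ((pred - y) ** 2).mean()
        loss.backward()
        with torch.no_grad():
            for p in (w, a):
                if p.grad is not None:
                    p -= lr * p.grad
                    p.grad = None
    return loss.item()

print("GD   final loss:", run(bn=False, lr=0.05))
print("BNGD final loss:", run(bn=True, lr=0.5))
```

The normalization makes the loss insensitive to the scale of w, which is the mechanism behind the paper's claim that BNGD tolerates a much wider range of weight learning rates.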
Multi-task dialog act and sentiment recognition on Mastodon
Title | Multi-task dialog act and sentiment recognition on Mastodon |
Authors | Christophe Cerisara, Somayeh Jafaritazehjani, Adedayo Oluokun, Hoa Le |
Abstract | Because of license restrictions, it often becomes impossible to strictly reproduce most research results on Twitter data only a few months after the creation of the corpus. This situation worsens gradually as time passes and tweets become inaccessible. This is a critical issue for reproducible and accountable research on social media. We partly solve this challenge by annotating a new Twitter-like corpus from an alternative large social medium with licenses that are compatible with reproducible experiments: Mastodon. We manually annotate both dialogues and sentiments on this corpus, and train a multi-task hierarchical recurrent network on joint sentiment and dialog act recognition. We experimentally demonstrate that transfer learning may be efficiently achieved between both tasks, and further analyze some specific correlations between sentiments and dialogues on social media. Both the annotated corpus and deep network are released with an open-source license. |
Tasks | Transfer Learning |
Published | 2018-07-13 |
URL | http://arxiv.org/abs/1807.05013v1 |
http://arxiv.org/pdf/1807.05013v1.pdf | |
PWC | https://paperswithcode.com/paper/multi-task-dialog-act-and-sentiment |
Repo | |
Framework | |
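The joint model can be captured by a shared encoder feeding two classification heads trained on a summed loss. The hedged PyTorch sketch below collapses the paper's hierarchical recurrent network to a single utterance-level GRU for brevity; the layer sizes, label counts, and the equal loss weighting are assumptions, not the released model's configuration.

```python
# Minimal multi-task sketch: one shared utterance encoder, two task heads.
import torch
import torch.nn as nn
import torch.nn.functional as F

class JointDialogSentimentModel(nn.Module):
    def __init__(self, vocab, emb=64, hidden=128, n_acts=10, n_sents=3):
        super().__init__()
        self.embed = nn.Embedding(vocab, emb)
        self.encoder = nn.GRU(emb, hidden, batch_first=True)  # shared encoder
        self.act_head = nn.Linear(hidden, n_acts)              # dialog act head
        self.sent_head = nn.Linear(hidden, n_sents)            # sentiment head

    def forward(self, tokens):
        _, h = self.encoder(self.embed(tokens))
        h = h.squeeze(0)
        return self.act_head(h), self.sent_head(h)

model = JointDialogSentimentModel(vocab=5000)
tokens = torch.randint(0, 5000, (4, 12))  # batch of 4 utterances, 12 tokens each
act_logits, sent_logits = model(tokens)
loss = (F.cross_entropy(act_logits, torch.randint(0, 10, (4,)))
        + F.cross_entropy(sent_logits, torch.randint(0, 3, (4,))))
print("joint loss:", loss.item())
```

Sharing the encoder across the two heads is what enables the cross-task transfer the abstract reports.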
Video Object Segmentation with Joint Re-identification and Attention-Aware Mask Propagation
Title | Video Object Segmentation with Joint Re-identification and Attention-Aware Mask Propagation |
Authors | Xiaoxiao Li, Chen Change Loy |
Abstract | The problem of video object segmentation can become extremely challenging when multiple instances co-exist. While each instance may exhibit large scale and pose variations, the problem is compounded when instances occlude each other causing failures in tracking. In this study, we formulate a deep recurrent network that is capable of segmenting and tracking objects in video simultaneously by their temporal continuity, yet able to re-identify them when they re-appear after a prolonged occlusion. We combine both temporal propagation and re-identification functionalities into a single framework that can be trained end-to-end. In particular, we present a re-identification module with template expansion to retrieve missing objects despite their large appearance changes. In addition, we contribute a new attention-based recurrent mask propagation approach that is robust to distractors not belonging to the target segment. Our approach achieves a new state-of-the-art global mean (Region Jaccard and Boundary F measure) of 68.2 on the challenging DAVIS 2017 benchmark (test-dev set), outperforming the winning solution which achieves a global mean of 66.1 on the same partition. |
Tasks | Semantic Segmentation, Video Object Segmentation, Video Semantic Segmentation |
Published | 2018-03-12 |
URL | http://arxiv.org/abs/1803.04242v2 |
http://arxiv.org/pdf/1803.04242v2.pdf | |
PWC | https://paperswithcode.com/paper/video-object-segmentation-with-joint-re |
Repo | |
Framework | |