July 30, 2019


Paper Group AWR 48

Proximal Backpropagation. SSH: Single Stage Headless Face Detector. Video Enhancement with Task-Oriented Flow. Fine-grained human evaluation of neural versus phrase-based machine translation. Language and Noise Transfer in Speech Enhancement Generative Adversarial Network. Modeling Label Ambiguity for Neural List-Wise Learning to Rank. Heterogeneou …

Proximal Backpropagation


Title	Proximal Backpropagation
Authors	Thomas Frerix, Thomas Möllenhoff, Michael Moeller, Daniel Cremers
Abstract	We propose proximal backpropagation (ProxProp) as a novel algorithm that takes implicit instead of explicit gradient steps to update the network parameters during neural network training. Our algorithm is motivated by the step size limitation of explicit gradient descent, which poses an impediment for optimization. ProxProp is developed from a general point of view on the backpropagation algorithm, currently the most common technique to train neural networks via stochastic gradient descent and variants thereof. Specifically, we show that backpropagation of a prediction error is equivalent to sequential gradient descent steps on a quadratic penalty energy, which comprises the network activations as variables of the optimization. We further analyze theoretical properties of ProxProp and in particular prove that the algorithm yields a descent direction in parameter space and can therefore be combined with a wide variety of convergent algorithms. Finally, we devise an efficient numerical implementation that integrates well with popular deep learning frameworks. We conclude by demonstrating promising numerical results and show that ProxProp can be effectively combined with common first order optimizers such as Adam.
Tasks
Published	2017-06-14
URL	http://arxiv.org/abs/1706.04638v3
PDF	http://arxiv.org/pdf/1706.04638v3.pdf
PWC	https://paperswithcode.com/paper/proximal-backpropagation
Repo	https://github.com/tfrerix/proxprop
Framework	pytorch
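
The step size limitation mentioned in the abstract can be seen on a toy problem. The sketch below is not the ProxProp algorithm itself. It only contrasts an explicit gradient step with an implicit (proximal) step on the one-dimensional quadratic f(x) = 0.5 * a * x^2, which is an assumption chosen for illustration. The function names are hypothetical.

```python
def explicit_step(x, a, tau):
    """Explicit gradient step on f(x) = 0.5 * a * x**2: x - tau * f'(x)."""
    return x - tau * a * x


def implicit_step(x, a, tau):
    """Implicit (proximal) step: argmin_y f(y) + (y - x)**2 / (2 * tau).

    For this quadratic the minimizer has the closed form x / (1 + tau * a).
    """
    return x / (1.0 + tau * a)


def run(step, x0, a, tau, n_steps):
    """Apply the same update rule n_steps times, starting from x0."""
    x = x0
    for _ in range(n_steps):
        x = step(x, a, tau)
    return x


if __name__ == "__main__":
    a, tau = 10.0, 0.5  # tau * a = 5, beyond the explicit stability limit of 2
    print("explicit:", run(explicit_step, 1.0, a, tau, 20))
    print("implicit:", run(implicit_step, 1.0, a, tau, 20))
```

With these values the explicit iterate is multiplied by -4 at every step and grows without bound, while the implicit iterate is multiplied by 1/6 and shrinks toward the minimizer for any positive step size.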

SSH: Single Stage Headless Face Detector


Title	SSH: Single Stage Headless Face Detector
Authors	Mahyar Najibi, Pouya Samangouei, Rama Chellappa, Larry Davis
Abstract	We introduce the Single Stage Headless (SSH) face detector. Unlike two-stage proposal-classification detectors, SSH detects faces in a single stage directly from the early convolutional layers in a classification network. SSH is headless. That is, it is able to achieve state-of-the-art results while removing the “head” of its underlying classification network – i.e. all fully connected layers in the VGG-16, which contain a large number of parameters. Additionally, instead of relying on an image pyramid to detect faces with various scales, SSH is scale-invariant by design. We simultaneously detect faces with different scales in a single forward pass of the network, but from different layers. These properties make SSH fast and light-weight. Surprisingly, with a headless VGG-16, SSH beats the ResNet-101-based state-of-the-art on the WIDER dataset, even though, unlike the current state-of-the-art, SSH does not use an image pyramid and is 5X faster. Moreover, if an image pyramid is deployed, our light-weight network achieves state-of-the-art on all subsets of the WIDER dataset, improving the AP by 2.5%. SSH also reaches state-of-the-art results on the FDDB and Pascal-Faces datasets while using a small input size, leading to a runtime of 50 ms/image on a GPU. The code is available at https://github.com/mahyarnajibi/SSH.
Tasks
Published	2017-08-14
URL	http://arxiv.org/abs/1708.03979v3
PDF	http://arxiv.org/pdf/1708.03979v3.pdf
PWC	https://paperswithcode.com/paper/ssh-single-stage-headless-face-detector
Repo	https://github.com/mahyarnajibi/SSH
Framework	none

Video Enhancement with Task-Oriented Flow


Title	Video Enhancement with Task-Oriented Flow
Authors	Tianfan Xue, Baian Chen, Jiajun Wu, Donglai Wei, William T. Freeman
Abstract	Many video enhancement algorithms rely on optical flow to register frames in a video sequence. Precise flow estimation is, however, intractable, and optical flow itself is often a sub-optimal representation for particular video processing tasks. In this paper, we propose task-oriented flow (TOFlow), a motion representation learned in a self-supervised, task-specific manner. We design a neural network with a trainable motion estimation component and a video processing component, and train them jointly to learn the task-oriented flow. For evaluation, we build Vimeo-90K, a large-scale, high-quality video dataset for low-level video processing. TOFlow outperforms traditional optical flow on standard benchmarks as well as our Vimeo-90K dataset in three video processing tasks: frame interpolation, video denoising/deblocking, and video super-resolution.
Tasks	Denoising, Motion Estimation, Optical Flow Estimation, Super-Resolution, Video Denoising, Video Frame Interpolation, Video Super-Resolution
Published	2017-11-24
URL	https://arxiv.org/abs/1711.09078v3
PDF	https://arxiv.org/pdf/1711.09078v3.pdf
PWC	https://paperswithcode.com/paper/video-enhancement-with-task-oriented-flow
Repo	https://github.com/Coldog2333/pytoflow
Framework	pytorch

Fine-grained human evaluation of neural versus phrase-based machine translation


Title	Fine-grained human evaluation of neural versus phrase-based machine translation
Authors	Filip Klubička, Antonio Toral, Víctor M. Sánchez-Cartagena
Abstract	We compare three approaches to statistical machine translation (pure phrase-based, factored phrase-based and neural) by performing a fine-grained manual evaluation via error annotation of the systems’ outputs. The error types in our annotation are compliant with the multidimensional quality metrics (MQM), and the annotation is performed by two annotators. Inter-annotator agreement is high for such a task, and results show that the best performing system (neural) reduces the errors produced by the worst system (phrase-based) by 54%.
Tasks	Machine Translation
Published	2017-06-14
URL	http://arxiv.org/abs/1706.04389v1
PDF	http://arxiv.org/pdf/1706.04389v1.pdf
PWC	https://paperswithcode.com/paper/fine-grained-human-evaluation-of-neural
Repo	https://github.com/GreenParachute/mqm-eng-cro
Framework	none

Language and Noise Transfer in Speech Enhancement Generative Adversarial Network


Title	Language and Noise Transfer in Speech Enhancement Generative Adversarial Network
Authors	Santiago Pascual, Maruchan Park, Joan Serrà, Antonio Bonafonte, Kang-Hun Ahn
Abstract	Speech enhancement deep learning systems usually require large amounts of training data to operate in broad conditions or real applications. This makes the adaptability of those systems into new, low resource environments an important topic. In this work, we present the results of adapting a speech enhancement generative adversarial network by finetuning the generator with small amounts of data. We investigate the minimum requirements to obtain a stable behavior in terms of several objective metrics in two very different languages: Catalan and Korean. We also study the variability of test performance to unseen noise as a function of the amount of different types of noise available for training. Results show that adapting a pre-trained English model with 10 min of data already achieves a comparable performance to having two orders of magnitude more data. They also demonstrate the relative stability in test performance with respect to the number of training noise types.
Tasks	Speech Enhancement
Published	2017-12-18
URL	http://arxiv.org/abs/1712.06340v1
PDF	http://arxiv.org/pdf/1712.06340v1.pdf
PWC	https://paperswithcode.com/paper/language-and-noise-transfer-in-speech
Repo	https://github.com/rickyHong/segan-pytorch-repl
Framework	pytorch

Modeling Label Ambiguity for Neural List-Wise Learning to Rank


Title	Modeling Label Ambiguity for Neural List-Wise Learning to Rank
Authors	Rolf Jagerman, Julia Kiseleva, Maarten de Rijke
Abstract	List-wise learning to rank methods are considered to be the state-of-the-art. One of the major problems with these methods is that the ambiguous nature of relevance labels in learning to rank data is ignored. Ambiguity of relevance labels refers to the phenomenon that multiple documents may be assigned the same relevance label for a given query, so that no preference order should be learned for those documents. In this paper we propose a novel sampling technique for computing a list-wise loss that can take into account this ambiguity. We show the effectiveness of the proposed method by training a 3-layer deep neural network. We compare our new loss function to two strong baselines: ListNet and ListMLE. We show that our method generalizes better and significantly outperforms other methods on the validation and test sets.
Tasks	Learning-To-Rank
Published	2017-07-24
URL	http://arxiv.org/abs/1707.07493v1
PDF	http://arxiv.org/pdf/1707.07493v1.pdf
PWC	https://paperswithcode.com/paper/modeling-label-ambiguity-for-neural-list-wise
Repo	https://github.com/rjagerman/shoelace
Framework	none
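
The abstract does not spell out the sampling technique, so the following is only one plausible reading, not the authors' confirmed method. It assumes rankings are sampled from a Plackett-Luce distribution over the relevance labels, so that documents with equal labels appear in either order with equal probability, and it averages a ListMLE-style loss over the samples. All function names are hypothetical.

```python
import math
import random


def sample_permutation(labels, rng):
    """Sample a ranking from a Plackett-Luce distribution with weights exp(label).

    Documents that share a label are equally likely to come first, so no fixed
    preference order is imposed on tied documents.
    """
    remaining = list(range(len(labels)))
    order = []
    while remaining:
        weights = [math.exp(labels[i]) for i in remaining]
        pick = rng.choices(remaining, weights=weights, k=1)[0]
        order.append(pick)
        remaining.remove(pick)
    return order


def listmle_loss(scores, order):
    """Negative Plackett-Luce log-likelihood of `order` under model `scores`."""
    loss = 0.0
    for k in range(len(order)):
        tail = [scores[i] for i in order[k:]]
        m = max(tail)  # subtract the max for a numerically stable log-sum-exp
        log_z = m + math.log(sum(math.exp(s - m) for s in tail))
        loss += log_z - scores[order[k]]
    return loss


def sampled_listwise_loss(scores, labels, n_samples=8, seed=0):
    """Average the list-wise loss over several sampled rankings."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        total += listmle_loss(scores, sample_permutation(labels, rng))
    return total / n_samples
```

Averaging over sampled rankings means the loss never commits to a single arbitrary ordering of tied documents, which is the ambiguity the abstract describes.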

Heterogeneous Supervision for Relation Extraction: A Representation Learning Approach


Title	Heterogeneous Supervision for Relation Extraction: A Representation Learning Approach
Authors	Liyuan Liu, Xiang Ren, Qi Zhu, Shi Zhi, Huan Gui, Heng Ji, Jiawei Han
Abstract	Relation extraction is a fundamental task in information extraction. Most existing methods rely heavily on annotations labeled by human experts, which are costly and time-consuming. To overcome this drawback, we propose a novel framework, REHession, to conduct relation extractor learning using annotations from heterogeneous information sources, e.g., knowledge bases and domain heuristics. These annotations, referred to as heterogeneous supervision, often conflict with each other, which brings a new challenge to the original relation extraction task: how to infer the true label from noisy labels for a given instance. Identifying context information as the backbone of both relation extraction and true label discovery, we adopt embedding techniques to learn the distributed representations of context, which bridges all components with mutual enhancement in an iterative fashion. Extensive experimental results demonstrate the superiority of REHession over the state-of-the-art.
Tasks	Relation Extraction, Representation Learning
Published	2017-07-01
URL	http://arxiv.org/abs/1707.00166v2
PDF	http://arxiv.org/pdf/1707.00166v2.pdf
PWC	https://paperswithcode.com/paper/heterogeneous-supervision-for-relation
Repo	https://github.com/LiyuanLucasLiu/ReHession
Framework	none

Scalable Variational Inference for Dynamical Systems


Title	Scalable Variational Inference for Dynamical Systems
Authors	Nico S. Gorbach, Stefan Bauer, Joachim M. Buhmann
Abstract	Gradient matching is a promising tool for learning parameters and state dynamics of ordinary differential equations. It is a grid-free inference approach which, for fully observable systems, is at times competitive with numerical integration. However, for many real-world applications, only sparse observations are available or even unobserved variables are included in the model description. In these cases most gradient matching methods are difficult to apply or simply do not provide satisfactory results. That is why, despite the high computational cost, numerical integration is still the gold standard in many applications. Using an existing gradient matching approach, we propose a scalable variational inference framework which can infer states and parameters simultaneously, offers computational speedups and improved accuracy, and works well even under model misspecification in a partially observable system.
Tasks
Published	2017-05-19
URL	http://arxiv.org/abs/1705.07079v2
PDF	http://arxiv.org/pdf/1705.07079v2.pdf
PWC	https://paperswithcode.com/paper/scalable-variational-inference-for-dynamical
Repo	https://github.com/ngorbach/Variational_Gradient_Matching_for_Dynamical_Systems
Framework	none

Towards dense object tracking in a 2D honeybee hive


Title	Towards dense object tracking in a 2D honeybee hive
Authors	Katarzyna Bozek, Laetitia Hebert, Alexander S Mikheyev, Greg J Stephens
Abstract	From human crowds to cells in tissue, the detection and efficient tracking of multiple objects in dense configurations is an important and unsolved problem. In the past, limitations of image analysis have restricted studies of dense groups to tracking a single or subset of marked individuals, or to coarse-grained group-level dynamics, all of which yield incomplete information. Here, we combine convolutional neural networks (CNNs) with the model environment of a honeybee hive to automatically recognize all individuals in a dense group from raw image data. We create new, adapted individual labeling and use the segmentation architecture U-Net with a loss function dependent on both object identity and orientation. We additionally exploit temporal regularities of the video recording in a recurrent manner and achieve near human-level performance while reducing the network size by 94% compared to the original U-Net architecture. Given our novel application of CNNs, we generate extensive problem-specific image data in which labeled examples are produced through a custom interface with Amazon Mechanical Turk. This dataset contains over 375,000 labeled bee instances across 720 video frames at 2 FPS, representing an extensive resource for the development and testing of tracking methods. We correctly detect 96% of individuals with a location error of ~7% of a typical body dimension, and orientation error of 12 degrees, approximating the variability of human raters. Our results provide an important step towards efficient image-based dense object tracking by allowing for the accurate determination of object location and orientation across time-series image data efficiently within one network architecture.
Tasks	Object Detection, Object Tracking, Semantic Segmentation, Time Series
Published	2017-12-22
URL	http://arxiv.org/abs/1712.08324v1
PDF	http://arxiv.org/pdf/1712.08324v1.pdf
PWC	https://paperswithcode.com/paper/towards-dense-object-tracking-in-a-2d
Repo	https://github.com/oist/DenseObjectDetection
Framework	tf

Tailoring Artificial Neural Networks for Optimal Learning


Title	Tailoring Artificial Neural Networks for Optimal Learning
Authors	Pau Vilimelis Aceituno, Yan Gang, Yang-Yu Liu
Abstract	As one of the most important paradigms of recurrent neural networks, the echo state network (ESN) has been applied to a wide range of fields, from robotics to medicine, finance, and language processing. A key feature of the ESN paradigm is its reservoir, a directed and weighted network of neurons that projects the input time series into a high dimensional space where linear regression or classification can be applied. Despite extensive studies, the impact of the reservoir network on the ESN performance remains unclear. Combining tools from physics, dynamical systems and network science, we attempt to open the black box of ESN and offer insights to understand the behavior of general artificial neural networks. Through spectral analysis of the reservoir network we reveal a key factor that largely determines the ESN memory capacity and hence affects its performance. Moreover, we find that adding short loops to the reservoir network can tailor ESN for specific tasks and optimize learning. We validate our findings by applying ESN to forecast both synthetic and real benchmark time series. Our results provide a new way to design task-specific ESNs. More importantly, they demonstrate the power of combining tools from physics, dynamical systems and network science to offer new insights in understanding the mechanisms of general artificial neural networks.
Tasks	Time Series
Published	2017-07-08
URL	https://arxiv.org/abs/1707.02469v4
PDF	https://arxiv.org/pdf/1707.02469v4.pdf
PWC	https://paperswithcode.com/paper/tailoring-artificial-neural-networks-for
Repo	https://github.com/pvili/EchoStateNetworks_NetworkAdaptation
Framework	none
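
To make the reservoir idea concrete, here is a minimal sketch of how a fixed reservoir projects an input series into a higher-dimensional state sequence. The ring topology, the tanh nonlinearity, and the function names are assumptions for illustration and are not taken from the paper. A linear readout (not shown) would then be fit on the returned states.

```python
import math


def make_ring_reservoir(n, weight):
    """Directed ring: neuron i feeds neuron (i + 1) % n with a fixed weight."""
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        W[(i + 1) % n][i] = weight
    return W


def run_reservoir(W, w_in, inputs):
    """Drive the reservoir with a scalar input series and collect its states.

    State update: x(t) = tanh(W x(t - 1) + w_in * u(t)), starting from x = 0.
    The reservoir weights stay fixed; only a readout on the states is trained.
    """
    n = len(W)
    x = [0.0] * n
    states = []
    for u in inputs:
        x = [
            math.tanh(sum(W[i][j] * x[j] for j in range(n)) + w_in[i] * u)
            for i in range(n)
        ]
        states.append(x)
    return states


if __name__ == "__main__":
    W = make_ring_reservoir(4, 0.9)
    # A single input pulse enters neuron 0 and then travels around the ring.
    for state in run_reservoir(W, [1.0, 0.0, 0.0, 0.0], [1.0, 0.0, 0.0]):
        print([round(v, 3) for v in state])
```

In this toy ring, a pulse fed into one neuron is passed to the next neuron at each step with decaying amplitude, which gives an intuition for how the reservoir topology shapes what the network remembers about past inputs.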

Generalized End-to-End Loss for Speaker Verification


Title	Generalized End-to-End Loss for Speaker Verification
Authors	Li Wan, Quan Wang, Alan Papir, Ignacio Lopez Moreno
Abstract	In this paper, we propose a new loss function called generalized end-to-end (GE2E) loss, which makes the training of speaker verification models more efficient than our previous tuple-based end-to-end (TE2E) loss function. Unlike TE2E, the GE2E loss function updates the network in a way that emphasizes examples that are difficult to verify at each step of the training process. Additionally, the GE2E loss does not require an initial stage of example selection. With these properties, our model with the new loss function decreases speaker verification EER by more than 10%, while reducing the training time by 60% at the same time. We also introduce the MultiReader technique, which allows us to do domain adaptation - training a more accurate model that supports multiple keywords (i.e. “OK Google” and “Hey Google”) as well as multiple dialects.
Tasks	Domain Adaptation, Speaker Verification
Published	2017-10-28
URL	http://arxiv.org/abs/1710.10467v4
PDF	http://arxiv.org/pdf/1710.10467v4.pdf
PWC	https://paperswithcode.com/paper/generalized-end-to-end-loss-for-speaker
Repo	https://github.com/aijianiula0601/ge2eloss-svf
Framework	tf
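
The abstract does not give the loss formula. The sketch below assumes the softmax form of a centroid-based loss: each utterance embedding is scored against every speaker centroid by scaled cosine similarity, and the loss is the negative log-softmax of the score for the true speaker. The scale `w`, the offset `b`, the exclusion of an utterance from its own centroid, and the function names are illustrative assumptions, not a definitive implementation of the paper.

```python
import math


def _cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def _mean(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def ge2e_softmax_loss(embeddings, w=10.0, b=-5.0):
    """embeddings[j][i] is the i-th utterance embedding of speaker j.

    Every speaker needs at least two utterances, because an utterance is left
    out of its own speaker's centroid (an assumption of this sketch).
    """
    total = 0.0
    for j, utterances in enumerate(embeddings):
        for i, e in enumerate(utterances):
            sims = []
            for k, other in enumerate(embeddings):
                pool = utterances[:i] + utterances[i + 1:] if k == j else other
                sims.append(w * _cosine(e, _mean(pool)) + b)
            m = max(sims)  # stable log-sum-exp over all speaker centroids
            log_z = m + math.log(sum(math.exp(s - m) for s in sims))
            total += log_z - sims[j]
    return total
```

Because the per-utterance term grows when an embedding is as close to a wrong centroid as to its own, utterances that are hard to verify contribute most to the total.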

A-NICE-MC: Adversarial Training for MCMC


Title	A-NICE-MC: Adversarial Training for MCMC
Authors	Jiaming Song, Shengjia Zhao, Stefano Ermon
Abstract	Existing Markov Chain Monte Carlo (MCMC) methods are either based on general-purpose and domain-agnostic schemes which can lead to slow convergence, or hand-crafting of problem-specific proposals by an expert. We propose A-NICE-MC, a novel method to train flexible parametric Markov chain kernels to produce samples with desired properties. First, we propose an efficient likelihood-free adversarial training method to train a Markov chain and mimic a given data distribution. Then, we leverage flexible volume preserving flows to obtain parametric kernels for MCMC. Using a bootstrap approach, we show how to train efficient Markov chains to sample from a prescribed posterior distribution by iteratively improving the quality of both the model and the samples. A-NICE-MC provides the first framework to automatically design efficient domain-specific MCMC proposals. Empirical results demonstrate that A-NICE-MC combines the strong guarantees of MCMC with the expressiveness of deep neural networks, and is able to significantly outperform competing methods such as Hamiltonian Monte Carlo.
Tasks
Published	2017-06-23
URL	http://arxiv.org/abs/1706.07561v3
PDF	http://arxiv.org/pdf/1706.07561v3.pdf
PWC	https://paperswithcode.com/paper/a-nice-mc-adversarial-training-for-mcmc
Repo	https://github.com/ermongroup/a-nice-mc
Framework	tf

Memory Aware Synapses: Learning what (not) to forget


Title	Memory Aware Synapses: Learning what (not) to forget
Authors	Rahaf Aljundi, Francesca Babiloni, Mohamed Elhoseiny, Marcus Rohrbach, Tinne Tuytelaars
Abstract	Humans can learn in a continuous manner. Old, rarely utilized knowledge can be overwritten by new incoming information while important, frequently used knowledge is prevented from being erased. In artificial learning systems, lifelong learning so far has focused mainly on accumulating knowledge over tasks and overcoming catastrophic forgetting. In this paper, we argue that, given the limited model capacity and the unlimited new information to be learned, knowledge has to be preserved or erased selectively. Inspired by neuroplasticity, we propose a novel approach for lifelong learning, coined Memory Aware Synapses (MAS). It computes the importance of the parameters of a neural network in an unsupervised and online manner. Given a new sample which is fed to the network, MAS accumulates an importance measure for each parameter of the network, based on how sensitive the predicted output function is to a change in this parameter. When learning a new task, changes to important parameters can then be penalized, effectively preventing important knowledge related to previous tasks from being overwritten. Further, we show an interesting connection between a local version of our method and Hebb’s rule, which is a model for the learning process in the brain. We test our method on a sequence of object recognition tasks and on the challenging problem of learning an embedding for predicting <subject, predicate, object> triplets. We show state-of-the-art performance and, for the first time, the ability to adapt the importance of the parameters based on unlabeled data towards what the network needs (not) to forget, which may vary depending on test conditions.
Tasks	Object Recognition
Published	2017-11-27
URL	http://arxiv.org/abs/1711.09601v4
PDF	http://arxiv.org/pdf/1711.09601v4.pdf
PWC	https://paperswithcode.com/paper/memory-aware-synapses-learning-what-not-to
Repo	https://github.com/rahafaljundi/MAS-Memory-Aware-Synapses
Framework	pytorch
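
The importance measure and penalty described in the abstract can be sketched for a tiny linear model F(x) = W x. The sketch assumes that sensitivity is measured as the absolute gradient of the squared L2 norm of the output with respect to each weight, averaged over unlabeled samples. That choice, the linear model, and the function names are assumptions for illustration.

```python
def mas_importance(W, samples):
    """Per-weight importance for F(x) = W x, using no labels.

    For this model, d ||W x||^2 / d W[i][j] = 2 * (W x)[i] * x[j]; the
    importance is the mean absolute value of that gradient over the samples.
    """
    rows, cols = len(W), len(W[0])
    omega = [[0.0] * cols for _ in range(rows)]
    for x in samples:
        y = [sum(W[i][j] * x[j] for j in range(cols)) for i in range(rows)]
        for i in range(rows):
            for j in range(cols):
                omega[i][j] += abs(2.0 * y[i] * x[j])
    n = len(samples)
    return [[value / n for value in row] for row in omega]


def mas_penalty(W, W_old, omega, lam=1.0):
    """Regularizer added to the new task's loss: changes to important weights cost more."""
    return lam * sum(
        omega[i][j] * (W[i][j] - W_old[i][j]) ** 2
        for i in range(len(W))
        for j in range(len(W[0]))
    )


if __name__ == "__main__":
    W_old = [[1.0, 0.0]]
    omega = mas_importance(W_old, [[1.0, 0.0]])
    # Only the first weight influenced the output, so only its change is penalized.
    print(omega, mas_penalty([[2.0, 5.0]], W_old, omega))
```

A weight that never affected the output on the observed samples receives zero importance and can be overwritten freely by the new task, which matches the selective forgetting the abstract argues for.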

When Unsupervised Domain Adaptation Meets Tensor Representations


Title	When Unsupervised Domain Adaptation Meets Tensor Representations
Authors	Hao Lu, Lei Zhang, Zhiguo Cao, Wei Wei, Ke Xian, Chunhua Shen, Anton van den Hengel
Abstract	Domain adaptation (DA) allows machine learning methods trained on data sampled from one distribution to be applied to data sampled from another. It is thus of great practical importance to the application of such methods. Despite the fact that tensor representations are widely used in Computer Vision to capture multi-linear relationships that affect the data, most existing DA methods are applicable to vectors only. This renders them incapable of reflecting and preserving important structure in many problems. We thus propose here a learning-based method to adapt the source and target tensor representations directly, without vectorization. In particular, a set of alignment matrices is introduced to align the tensor representations from both domains into the invariant tensor subspace. These alignment matrices and the tensor subspace are modeled as a joint optimization problem and can be learned adaptively from the data using the proposed alternative minimization scheme. Extensive experiments show that our approach is capable of preserving the discriminative power of the source domain, of resisting the effects of label noise, and works effectively for small sample sizes, and even one-shot DA. We show that our method outperforms the state-of-the-art on the task of cross-domain visual recognition in both efficacy and efficiency, and particularly that it outperforms all comparators when applied to DA of the convolutional activations of deep convolutional networks.
Tasks	Domain Adaptation, Unsupervised Domain Adaptation
Published	2017-07-19
URL	http://arxiv.org/abs/1707.05956v1
PDF	http://arxiv.org/pdf/1707.05956v1.pdf
PWC	https://paperswithcode.com/paper/when-unsupervised-domain-adaptation-meets
Repo	https://github.com/poppinace/TAISL
Framework	none

Ligand Pose Optimization with Atomic Grid-Based Convolutional Neural Networks


Title	Ligand Pose Optimization with Atomic Grid-Based Convolutional Neural Networks
Authors	Matthew Ragoza, Lillian Turner, David Ryan Koes
Abstract	Docking is an important tool in computational drug discovery that aims to predict the binding pose of a ligand to a target protein through a combination of pose scoring and optimization. A scoring function that is differentiable with respect to atom positions can be used for both scoring and gradient-based optimization of poses for docking. Using a differentiable grid-based atomic representation as input, we demonstrate that a scoring function learned by training a convolutional neural network (CNN) to identify binding poses can also be applied to pose optimization. We also show that an iteratively-trained CNN that includes poses optimized by the first CNN in its training set performs even better at optimizing randomly initialized poses than either the first CNN scoring function or AutoDock Vina.
Tasks	Drug Discovery
Published	2017-10-20
URL	http://arxiv.org/abs/1710.07400v1
PDF	http://arxiv.org/pdf/1710.07400v1.pdf
PWC	https://paperswithcode.com/paper/ligand-pose-optimization-with-atomic-grid
Repo	https://github.com/gnina/gnina
Framework	none