Paper Group AWR 48
Proximal Backpropagation
Title | Proximal Backpropagation |
Authors | Thomas Frerix, Thomas Möllenhoff, Michael Moeller, Daniel Cremers |
Abstract | We propose proximal backpropagation (ProxProp) as a novel algorithm that takes implicit instead of explicit gradient steps to update the network parameters during neural network training. Our algorithm is motivated by the step size limitation of explicit gradient descent, which poses an impediment for optimization. ProxProp is developed from a general point of view on the backpropagation algorithm, currently the most common technique to train neural networks via stochastic gradient descent and variants thereof. Specifically, we show that backpropagation of a prediction error is equivalent to sequential gradient descent steps on a quadratic penalty energy, which comprises the network activations as variables of the optimization. We further analyze theoretical properties of ProxProp and in particular prove that the algorithm yields a descent direction in parameter space and can therefore be combined with a wide variety of convergent algorithms. Finally, we devise an efficient numerical implementation that integrates well with popular deep learning frameworks. We conclude by demonstrating promising numerical results and show that ProxProp can be effectively combined with common first order optimizers such as Adam. |
Tasks | |
Published | 2017-06-14 |
URL | http://arxiv.org/abs/1706.04638v3 |
PDF | http://arxiv.org/pdf/1706.04638v3.pdf |
PWC | https://paperswithcode.com/paper/proximal-backpropagation |
Repo | https://github.com/tfrerix/proxprop |
Framework | pytorch |
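
The abstract's motivation, the step size limitation of explicit gradient descent, can be illustrated on a toy problem. The sketch below is not ProxProp itself: it only compares an explicit gradient step with an implicit (proximal) step on the one-dimensional quadratic f(x) = 0.5 * a * x^2. The function names and the values of `a` and `tau` are assumptions chosen for illustration.

```python
def explicit_step(x, a, tau):
    """One explicit gradient step on f(x) = 0.5 * a * x**2."""
    return x - tau * a * x


def implicit_step(x, a, tau):
    """One implicit (proximal) step.

    Solves z = argmin_z f(z) + (z - x)**2 / (2 * tau), whose optimality
    condition a * z + (z - x) / tau = 0 gives z = x / (1 + tau * a).
    """
    return x / (1.0 + tau * a)


def run(step, x0, a, tau, n_steps):
    """Apply the given step function n_steps times starting from x0."""
    x = x0
    for _ in range(n_steps):
        x = step(x, a, tau)
    return x


if __name__ == "__main__":
    # tau = 0.5 exceeds the explicit stability limit 2 / a = 0.2.
    a, tau = 10.0, 0.5
    print("explicit:", run(explicit_step, 1.0, a, tau, 20))
    print("implicit:", run(implicit_step, 1.0, a, tau, 20))
```

With this step size the explicit iterates grow in magnitude at every step, while the implicit iterates shrink towards the minimizer at 0 for any positive `tau`.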
SSH: Single Stage Headless Face Detector
Title | SSH: Single Stage Headless Face Detector |
Authors | Mahyar Najibi, Pouya Samangouei, Rama Chellappa, Larry Davis |
Abstract | We introduce the Single Stage Headless (SSH) face detector. Unlike two-stage proposal-classification detectors, SSH detects faces in a single stage directly from the early convolutional layers in a classification network. SSH is headless. That is, it is able to achieve state-of-the-art results while removing the “head” of its underlying classification network, i.e. all fully connected layers in VGG-16, which contain a large number of parameters. Additionally, instead of relying on an image pyramid to detect faces with various scales, SSH is scale-invariant by design. We simultaneously detect faces with different scales in a single forward pass of the network, but from different layers. These properties make SSH fast and light-weight. Surprisingly, with a headless VGG-16, SSH beats the ResNet-101-based state-of-the-art on the WIDER dataset, even though, unlike the current state-of-the-art, SSH does not use an image pyramid and is 5X faster. Moreover, if an image pyramid is deployed, our light-weight network achieves state-of-the-art on all subsets of the WIDER dataset, improving the AP by 2.5%. SSH also reaches state-of-the-art results on the FDDB and Pascal-Faces datasets while using a small input size, leading to a runtime of 50 ms/image on a GPU. The code is available at https://github.com/mahyarnajibi/SSH. |
Tasks | |
Published | 2017-08-14 |
URL | http://arxiv.org/abs/1708.03979v3 |
PDF | http://arxiv.org/pdf/1708.03979v3.pdf |
PWC | https://paperswithcode.com/paper/ssh-single-stage-headless-face-detector |
Repo | https://github.com/mahyarnajibi/SSH |
Framework | none |
Video Enhancement with Task-Oriented Flow
Title | Video Enhancement with Task-Oriented Flow |
Authors | Tianfan Xue, Baian Chen, Jiajun Wu, Donglai Wei, William T. Freeman |
Abstract | Many video enhancement algorithms rely on optical flow to register frames in a video sequence. Precise flow estimation is however intractable; and optical flow itself is often a sub-optimal representation for particular video processing tasks. In this paper, we propose task-oriented flow (TOFlow), a motion representation learned in a self-supervised, task-specific manner. We design a neural network with a trainable motion estimation component and a video processing component, and train them jointly to learn the task-oriented flow. For evaluation, we build Vimeo-90K, a large-scale, high-quality video dataset for low-level video processing. TOFlow outperforms traditional optical flow on standard benchmarks as well as our Vimeo-90K dataset in three video processing tasks: frame interpolation, video denoising/deblocking, and video super-resolution. |
Tasks | Denoising, Motion Estimation, Optical Flow Estimation, Super-Resolution, Video Denoising, Video Frame Interpolation, Video Super-Resolution |
Published | 2017-11-24 |
URL | https://arxiv.org/abs/1711.09078v3 |
PDF | https://arxiv.org/pdf/1711.09078v3.pdf |
PWC | https://paperswithcode.com/paper/video-enhancement-with-task-oriented-flow |
Repo | https://github.com/Coldog2333/pytoflow |
Framework | pytorch |
Fine-grained human evaluation of neural versus phrase-based machine translation
Title | Fine-grained human evaluation of neural versus phrase-based machine translation |
Authors | Filip Klubička, Antonio Toral, Víctor M. Sánchez-Cartagena |
Abstract | We compare three approaches to statistical machine translation (pure phrase-based, factored phrase-based and neural) by performing a fine-grained manual evaluation via error annotation of the systems’ outputs. The error types in our annotation are compliant with the multidimensional quality metrics (MQM), and the annotation is performed by two annotators. Inter-annotator agreement is high for such a task, and results show that the best performing system (neural) reduces the errors produced by the worst system (phrase-based) by 54%. |
Tasks | Machine Translation |
Published | 2017-06-14 |
URL | http://arxiv.org/abs/1706.04389v1 |
PDF | http://arxiv.org/pdf/1706.04389v1.pdf |
PWC | https://paperswithcode.com/paper/fine-grained-human-evaluation-of-neural |
Repo | https://github.com/GreenParachute/mqm-eng-cro |
Framework | none |
Language and Noise Transfer in Speech Enhancement Generative Adversarial Network
Title | Language and Noise Transfer in Speech Enhancement Generative Adversarial Network |
Authors | Santiago Pascual, Maruchan Park, Joan Serrà, Antonio Bonafonte, Kang-Hun Ahn |
Abstract | Speech enhancement deep learning systems usually require large amounts of training data to operate in broad conditions or real applications. This makes the adaptability of those systems into new, low resource environments an important topic. In this work, we present the results of adapting a speech enhancement generative adversarial network by finetuning the generator with small amounts of data. We investigate the minimum requirements to obtain a stable behavior in terms of several objective metrics in two very different languages: Catalan and Korean. We also study the variability of test performance to unseen noise as a function of the amount of different types of noise available for training. Results show that adapting a pre-trained English model with 10 min of data already achieves a comparable performance to having two orders of magnitude more data. They also demonstrate the relative stability in test performance with respect to the number of training noise types. |
Tasks | Speech Enhancement |
Published | 2017-12-18 |
URL | http://arxiv.org/abs/1712.06340v1 |
PDF | http://arxiv.org/pdf/1712.06340v1.pdf |
PWC | https://paperswithcode.com/paper/language-and-noise-transfer-in-speech |
Repo | https://github.com/rickyHong/segan-pytorch-repl |
Framework | pytorch |
Modeling Label Ambiguity for Neural List-Wise Learning to Rank
Title | Modeling Label Ambiguity for Neural List-Wise Learning to Rank |
Authors | Rolf Jagerman, Julia Kiseleva, Maarten de Rijke |
Abstract | List-wise learning to rank methods are considered to be the state-of-the-art. One of the major problems with these methods is that the ambiguous nature of relevance labels in learning to rank data is ignored. Ambiguity of relevance labels refers to the phenomenon that multiple documents may be assigned the same relevance label for a given query, so that no preference order should be learned for those documents. In this paper we propose a novel sampling technique for computing a list-wise loss that can take into account this ambiguity. We show the effectiveness of the proposed method by training a 3-layer deep neural network. We compare our new loss function to two strong baselines: ListNet and ListMLE. We show that our method generalizes better and significantly outperforms other methods on the validation and test sets. |
Tasks | Learning-To-Rank |
Published | 2017-07-24 |
URL | http://arxiv.org/abs/1707.07493v1 |
PDF | http://arxiv.org/pdf/1707.07493v1.pdf |
PWC | https://paperswithcode.com/paper/modeling-label-ambiguity-for-neural-list-wise |
Repo | https://github.com/rjagerman/shoelace |
Framework | none |
Heterogeneous Supervision for Relation Extraction: A Representation Learning Approach
Title | Heterogeneous Supervision for Relation Extraction: A Representation Learning Approach |
Authors | Liyuan Liu, Xiang Ren, Qi Zhu, Shi Zhi, Huan Gui, Heng Ji, Jiawei Han |
Abstract | Relation extraction is a fundamental task in information extraction. Most existing methods rely heavily on annotations labeled by human experts, which are costly and time-consuming. To overcome this drawback, we propose a novel framework, REHession, to conduct relation extractor learning using annotations from heterogeneous information sources, e.g., knowledge bases and domain heuristics. These annotations, referred to as heterogeneous supervision, often conflict with each other, which brings a new challenge to the original relation extraction task: how to infer the true label from noisy labels for a given instance. Identifying context information as the backbone of both relation extraction and true label discovery, we adopt embedding techniques to learn the distributed representations of context, which bridges all components with mutual enhancement in an iterative fashion. Extensive experimental results demonstrate the superiority of REHession over the state-of-the-art. |
Tasks | Relation Extraction, Representation Learning |
Published | 2017-07-01 |
URL | http://arxiv.org/abs/1707.00166v2 |
PDF | http://arxiv.org/pdf/1707.00166v2.pdf |
PWC | https://paperswithcode.com/paper/heterogeneous-supervision-for-relation |
Repo | https://github.com/LiyuanLucasLiu/ReHession |
Framework | none |
Scalable Variational Inference for Dynamical Systems
Title | Scalable Variational Inference for Dynamical Systems |
Authors | Nico S. Gorbach, Stefan Bauer, Joachim M. Buhmann |
Abstract | Gradient matching is a promising tool for learning parameters and state dynamics of ordinary differential equations. It is a grid free inference approach, which, for fully observable systems is at times competitive with numerical integration. However, for many real-world applications, only sparse observations are available or even unobserved variables are included in the model description. In these cases most gradient matching methods are difficult to apply or simply do not provide satisfactory results. That is why, despite the high computational cost, numerical integration is still the gold standard in many applications. Using an existing gradient matching approach, we propose a scalable variational inference framework which can infer states and parameters simultaneously, offers computational speedups, improved accuracy and works well even under model misspecifications in a partially observable system. |
Tasks | |
Published | 2017-05-19 |
URL | http://arxiv.org/abs/1705.07079v2 |
PDF | http://arxiv.org/pdf/1705.07079v2.pdf |
PWC | https://paperswithcode.com/paper/scalable-variational-inference-for-dynamical |
Repo | https://github.com/ngorbach/Variational_Gradient_Matching_for_Dynamical_Systems |
Framework | none |
Towards dense object tracking in a 2D honeybee hive
Title | Towards dense object tracking in a 2D honeybee hive |
Authors | Katarzyna Bozek, Laetitia Hebert, Alexander S Mikheyev, Greg J Stephens |
Abstract | From human crowds to cells in tissue, the detection and efficient tracking of multiple objects in dense configurations is an important and unsolved problem. In the past, limitations of image analysis have restricted studies of dense groups to tracking a single or subset of marked individuals, or to coarse-grained group-level dynamics, all of which yield incomplete information. Here, we combine convolutional neural networks (CNNs) with the model environment of a honeybee hive to automatically recognize all individuals in a dense group from raw image data. We create new, adapted individual labeling and use the segmentation architecture U-Net with a loss function dependent on both object identity and orientation. We additionally exploit temporal regularities of the video recording in a recurrent manner and achieve near human-level performance while reducing the network size by 94% compared to the original U-Net architecture. Given our novel application of CNNs, we generate extensive problem-specific image data in which labeled examples are produced through a custom interface with Amazon Mechanical Turk. This dataset contains over 375,000 labeled bee instances across 720 video frames at 2 FPS, representing an extensive resource for the development and testing of tracking methods. We correctly detect 96% of individuals with a location error of ~7% of a typical body dimension, and orientation error of 12 degrees, approximating the variability of human raters. Our results provide an important step towards efficient image-based dense object tracking by allowing for the accurate determination of object location and orientation across time-series image data efficiently within one network architecture. |
Tasks | Object Detection, Object Tracking, Semantic Segmentation, Time Series |
Published | 2017-12-22 |
URL | http://arxiv.org/abs/1712.08324v1 |
PDF | http://arxiv.org/pdf/1712.08324v1.pdf |
PWC | https://paperswithcode.com/paper/towards-dense-object-tracking-in-a-2d |
Repo | https://github.com/oist/DenseObjectDetection |
Framework | tf |
Tailoring Artificial Neural Networks for Optimal Learning
Title | Tailoring Artificial Neural Networks for Optimal Learning |
Authors | Pau Vilimelis Aceituno, Yan Gang, Yang-Yu Liu |
Abstract | As one of the most important paradigms of recurrent neural networks, the echo state network (ESN) has been applied to a wide range of fields, from robotics to medicine, finance, and language processing. A key feature of the ESN paradigm is its reservoir — a directed and weighted network of neurons that projects the input time series into a high dimensional space where linear regression or classification can be applied. Despite extensive studies, the impact of the reservoir network on the ESN performance remains unclear. Combining tools from physics, dynamical systems and network science, we attempt to open the black box of ESN and offer insights to understand the behavior of general artificial neural networks. Through spectral analysis of the reservoir network we reveal a key factor that largely determines the ESN memory capacity and hence affects its performance. Moreover, we find that adding short loops to the reservoir network can tailor ESN for specific tasks and optimize learning. We validate our findings by applying ESN to forecast both synthetic and real benchmark time series. Our results provide a new way to design task-specific ESN. More importantly, it demonstrates the power of combining tools from physics, dynamical systems and network science to offer new insights in understanding the mechanisms of general artificial neural networks. |
Tasks | Time Series |
Published | 2017-07-08 |
URL | https://arxiv.org/abs/1707.02469v4 |
PDF | https://arxiv.org/pdf/1707.02469v4.pdf |
PWC | https://paperswithcode.com/paper/tailoring-artificial-neural-networks-for |
Repo | https://github.com/pvili/EchoStateNetworks_NetworkAdaptation |
Framework | none |
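
The reservoir described in the abstract, a directed and weighted network that projects an input time series into a high dimensional space, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' code: the names `make_reservoir` and `run_reservoir`, the tanh nonlinearity, the connection density, and the row-sum rescaling (a simple upper bound on the spectral radius) are all choices made here. The linear readout that would be fitted on the collected states is omitted.

```python
import math
import random


def make_reservoir(n, density=0.3, target_norm=0.9, seed=0):
    """Build a random sparse, directed, weighted n x n reservoir matrix."""
    rng = random.Random(seed)
    W = [[rng.uniform(-1.0, 1.0) if rng.random() < density else 0.0
          for _ in range(n)] for _ in range(n)]
    # Rescale so the largest absolute row sum equals target_norm. The
    # spectral radius is bounded above by this norm, so it stays below 1.
    norm = max(sum(abs(w) for w in row) for row in W)
    if norm > 0.0:
        W = [[w * target_norm / norm for w in row] for row in W]
    return W


def run_reservoir(W, w_in, inputs):
    """Drive the reservoir with a scalar series and collect its states."""
    n = len(W)
    x = [0.0] * n
    states = []
    for u in inputs:
        # x_{t+1} = tanh(W x_t + w_in * u_t)
        x = [math.tanh(sum(W[i][j] * x[j] for j in range(n)) + w_in[i] * u)
             for i in range(n)]
        states.append(x)
    return states


if __name__ == "__main__":
    n = 20
    W = make_reservoir(n)
    rng = random.Random(1)
    w_in = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    series = [math.sin(0.2 * t) for t in range(100)]
    states = run_reservoir(W, w_in, series)
    print(len(states), len(states[0]))
```

Each scalar input is mapped to an `n`-dimensional state vector; a linear regression or classifier would then be trained on these vectors.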
Generalized End-to-End Loss for Speaker Verification
Title | Generalized End-to-End Loss for Speaker Verification |
Authors | Li Wan, Quan Wang, Alan Papir, Ignacio Lopez Moreno |
Abstract | In this paper, we propose a new loss function called generalized end-to-end (GE2E) loss, which makes the training of speaker verification models more efficient than our previous tuple-based end-to-end (TE2E) loss function. Unlike TE2E, the GE2E loss function updates the network in a way that emphasizes examples that are difficult to verify at each step of the training process. Additionally, the GE2E loss does not require an initial stage of example selection. With these properties, our model with the new loss function decreases speaker verification EER by more than 10%, while reducing the training time by 60% at the same time. We also introduce the MultiReader technique, which allows us to do domain adaptation - training a more accurate model that supports multiple keywords (i.e. “OK Google” and “Hey Google”) as well as multiple dialects. |
Tasks | Domain Adaptation, Speaker Verification |
Published | 2017-10-28 |
URL | http://arxiv.org/abs/1710.10467v4 |
PDF | http://arxiv.org/pdf/1710.10467v4.pdf |
PWC | https://paperswithcode.com/paper/generalized-end-to-end-loss-for-speaker |
Repo | https://github.com/aijianiula0601/ge2eloss-svf |
Framework | tf |
A-NICE-MC: Adversarial Training for MCMC
Title | A-NICE-MC: Adversarial Training for MCMC |
Authors | Jiaming Song, Shengjia Zhao, Stefano Ermon |
Abstract | Existing Markov Chain Monte Carlo (MCMC) methods are either based on general-purpose and domain-agnostic schemes which can lead to slow convergence, or hand-crafting of problem-specific proposals by an expert. We propose A-NICE-MC, a novel method to train flexible parametric Markov chain kernels to produce samples with desired properties. First, we propose an efficient likelihood-free adversarial training method to train a Markov chain and mimic a given data distribution. Then, we leverage flexible volume preserving flows to obtain parametric kernels for MCMC. Using a bootstrap approach, we show how to train efficient Markov chains to sample from a prescribed posterior distribution by iteratively improving the quality of both the model and the samples. A-NICE-MC provides the first framework to automatically design efficient domain-specific MCMC proposals. Empirical results demonstrate that A-NICE-MC combines the strong guarantees of MCMC with the expressiveness of deep neural networks, and is able to significantly outperform competing methods such as Hamiltonian Monte Carlo. |
Tasks | |
Published | 2017-06-23 |
URL | http://arxiv.org/abs/1706.07561v3 |
PDF | http://arxiv.org/pdf/1706.07561v3.pdf |
PWC | https://paperswithcode.com/paper/a-nice-mc-adversarial-training-for-mcmc |
Repo | https://github.com/ermongroup/a-nice-mc |
Framework | tf |
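
The "volume preserving flows" mentioned in the abstract can be illustrated with an additive coupling layer of the NICE type. The sketch below shows only that building block, not the A-NICE-MC kernel or its adversarial training. The `shift` function is a hypothetical stand-in for a learned network, and all names are assumptions made for illustration.

```python
import math


def shift(x1):
    """Hypothetical shift function m(.); a learned network in practice."""
    return [math.tanh(2.0 * v) + 0.5 * v * v for v in x1]


def coupling_forward(x1, x2):
    """Additive coupling: y1 = x1, y2 = x2 + m(x1)."""
    m = shift(x1)
    return list(x1), [b + s for b, s in zip(x2, m)]


def coupling_inverse(y1, y2):
    """Exact inverse: x1 = y1, x2 = y2 - m(y1)."""
    m = shift(y1)
    return list(y1), [b - s for b, s in zip(y2, m)]


if __name__ == "__main__":
    x1, x2 = [0.3, -1.2], [2.0, 0.1]
    y1, y2 = coupling_forward(x1, x2)
    print(coupling_inverse(y1, y2))
```

Because `y1` copies `x1` and `y2` depends on `x2` only through an added shift, the Jacobian of the map is triangular with ones on the diagonal. Its determinant is 1, so the transformation preserves volume and is invertible for any choice of `shift`.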
Memory Aware Synapses: Learning what (not) to forget
Title | Memory Aware Synapses: Learning what (not) to forget |
Authors | Rahaf Aljundi, Francesca Babiloni, Mohamed Elhoseiny, Marcus Rohrbach, Tinne Tuytelaars |
Abstract | Humans can learn in a continuous manner. Old, rarely utilized knowledge can be overwritten by new incoming information while important, frequently used knowledge is prevented from being erased. In artificial learning systems, lifelong learning so far has focused mainly on accumulating knowledge over tasks and overcoming catastrophic forgetting. In this paper, we argue that, given the limited model capacity and the unlimited new information to be learned, knowledge has to be preserved or erased selectively. Inspired by neuroplasticity, we propose a novel approach for lifelong learning, coined Memory Aware Synapses (MAS). It computes the importance of the parameters of a neural network in an unsupervised and online manner. Given a new sample which is fed to the network, MAS accumulates an importance measure for each parameter of the network, based on how sensitive the predicted output function is to a change in this parameter. When learning a new task, changes to important parameters can then be penalized, effectively preventing important knowledge related to previous tasks from being overwritten. Further, we show an interesting connection between a local version of our method and Hebb’s rule, which is a model for the learning process in the brain. We test our method on a sequence of object recognition tasks and on the challenging problem of learning an embedding for predicting <subject, predicate, object> triplets. We show state-of-the-art performance and, for the first time, the ability to adapt the importance of the parameters based on unlabeled data towards what the network needs (not) to forget, which may vary depending on test conditions. |
Tasks | Object Recognition |
Published | 2017-11-27 |
URL | http://arxiv.org/abs/1711.09601v4 |
PDF | http://arxiv.org/pdf/1711.09601v4.pdf |
PWC | https://paperswithcode.com/paper/memory-aware-synapses-learning-what-not-to |
Repo | https://github.com/rahafaljundi/MAS-Memory-Aware-Synapses |
Framework | pytorch |
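
The two steps described in the abstract, accumulating a per-parameter importance from unlabeled samples and then penalizing changes to important parameters, can be sketched for a tiny linear model f(x) = W x. This is an illustration under stated assumptions rather than the authors' implementation: the sensitivity is measured here as the gradient magnitude of the squared L2 norm of the output, and the function names and the weight `lam` are hypothetical.

```python
def forward(W, x):
    """Linear model f(x) = W x, with W given as a list of rows."""
    return [sum(w * xj for w, xj in zip(row, x)) for row in W]


def importance(W, samples):
    """Average |d ||f(x)||^2 / d W_ij| over unlabeled samples.

    For f(x) = W x the derivative is 2 * f(x)_i * x_j (an assumed
    sensitivity measure; no labels are needed to compute it).
    """
    rows, cols = len(W), len(W[0])
    omega = [[0.0] * cols for _ in range(rows)]
    for x in samples:
        y = forward(W, x)
        for i in range(rows):
            for j in range(cols):
                omega[i][j] += abs(2.0 * y[i] * x[j])
    return [[v / len(samples) for v in row] for row in omega]


def mas_penalty(W, W_old, omega, lam=1.0):
    """Quadratic penalty on moving parameters, weighted by importance."""
    return lam * sum(
        omega[i][j] * (W[i][j] - W_old[i][j]) ** 2
        for i in range(len(W)) for j in range(len(W[0]))
    )


if __name__ == "__main__":
    W_old = [[1.0, 0.0], [0.0, 1.0]]
    omega = importance(W_old, [[1.0, 0.0]])
    W_new = [[2.0, 5.0], [5.0, 5.0]]
    print(omega, mas_penalty(W_new, W_old, omega))
```

In this toy run only `W[0][0]` influences the output on the given sample, so only its change contributes to the penalty. The other parameters can move freely, which is the selective behavior the abstract describes.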
When Unsupervised Domain Adaptation Meets Tensor Representations
Title | When Unsupervised Domain Adaptation Meets Tensor Representations |
Authors | Hao Lu, Lei Zhang, Zhiguo Cao, Wei Wei, Ke Xian, Chunhua Shen, Anton van den Hengel |
Abstract | Domain adaptation (DA) allows machine learning methods trained on data sampled from one distribution to be applied to data sampled from another. It is thus of great practical importance to the application of such methods. Despite the fact that tensor representations are widely used in Computer Vision to capture multi-linear relationships that affect the data, most existing DA methods are applicable to vectors only. This renders them incapable of reflecting and preserving important structure in many problems. We thus propose here a learning-based method to adapt the source and target tensor representations directly, without vectorization. In particular, a set of alignment matrices is introduced to align the tensor representations from both domains into the invariant tensor subspace. These alignment matrices and the tensor subspace are modeled as a joint optimization problem and can be learned adaptively from the data using the proposed alternative minimization scheme. Extensive experiments show that our approach is capable of preserving the discriminative power of the source domain and of resisting the effects of label noise, and that it works effectively for small sample sizes and even one-shot DA. We show that our method outperforms the state-of-the-art on the task of cross-domain visual recognition in both efficacy and efficiency, and particularly that it outperforms all comparators when applied to DA of the convolutional activations of deep convolutional networks. |
Tasks | Domain Adaptation, Unsupervised Domain Adaptation |
Published | 2017-07-19 |
URL | http://arxiv.org/abs/1707.05956v1 |
PDF | http://arxiv.org/pdf/1707.05956v1.pdf |
PWC | https://paperswithcode.com/paper/when-unsupervised-domain-adaptation-meets |
Repo | https://github.com/poppinace/TAISL |
Framework | none |
Ligand Pose Optimization with Atomic Grid-Based Convolutional Neural Networks
Title | Ligand Pose Optimization with Atomic Grid-Based Convolutional Neural Networks |
Authors | Matthew Ragoza, Lillian Turner, David Ryan Koes |
Abstract | Docking is an important tool in computational drug discovery that aims to predict the binding pose of a ligand to a target protein through a combination of pose scoring and optimization. A scoring function that is differentiable with respect to atom positions can be used for both scoring and gradient-based optimization of poses for docking. Using a differentiable grid-based atomic representation as input, we demonstrate that a scoring function learned by training a convolutional neural network (CNN) to identify binding poses can also be applied to pose optimization. We also show that an iteratively-trained CNN that includes poses optimized by the first CNN in its training set performs even better at optimizing randomly initialized poses than either the first CNN scoring function or AutoDock Vina. |
Tasks | Drug Discovery |
Published | 2017-10-20 |
URL | http://arxiv.org/abs/1710.07400v1 |
PDF | http://arxiv.org/pdf/1710.07400v1.pdf |
PWC | https://paperswithcode.com/paper/ligand-pose-optimization-with-atomic-grid |
Repo | https://github.com/gnina/gnina |
Framework | none |