Paper Group AWR 45
Graph-Sparse Logistic Regression. Smooth and Sparse Optimal Transport. Orthogonal Weight Normalization: Solution to Optimization over Multiple Dependent Stiefel Manifolds in Deep Neural Networks. Truncating Wide Networks using Binary Tree Architectures. Structured Bayesian Pruning via Log-Normal Multiplicative Noise. LAP: a Linearize and Project Method for Solving Inverse Problems with Coupled Variables. Discrete Modeling of Multi-Transmitter Neural Networks with Neuron Competition. Deep-Learnt Classification of Light Curves. Fluency-Guided Cross-Lingual Image Captioning. Meta-Learning by Adjusting Priors Based on Extended PAC-Bayes Theory. Deep Reinforcement Learning for Visual Object Tracking in Videos. Ethical Challenges in Data-Driven Dialogue Systems. Generating Steganographic Images via Adversarial Training. Disentangled Person Image Generation. Convolutional Sequence to Sequence Learning.
Graph-Sparse Logistic Regression
Title | Graph-Sparse Logistic Regression |
Authors | Alexander LeNail, Ludwig Schmidt, Johnathan Li, Tobias Ehrenberger, Karen Sachs, Stefanie Jegelka, Ernest Fraenkel |
Abstract | We introduce Graph-Sparse Logistic Regression, a new algorithm for classification in the case where the support should be sparse but connected on a graph. We validate this algorithm on synthetic data and benchmark it against L1-regularized Logistic Regression. We then explore our technique in the bioinformatics context of proteomics data on the interactome graph. We make all our experimental code public and provide GSLR as an open source package. |
Tasks | |
Published | 2017-12-15 |
URL | http://arxiv.org/abs/1712.05510v1 |
http://arxiv.org/pdf/1712.05510v1.pdf | |
PWC | https://paperswithcode.com/paper/graph-sparse-logistic-regression |
Repo | https://github.com/fraenkel-lab/GSLR |
Framework | none |
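As a rough illustration of the benchmark setup described in the abstract, here is a minimal sketch of the L1-regularized logistic regression baseline on synthetic sparse-support data; the graph-connected GSLR penalty itself lives in the linked repo, and the data below is a placeholder.

```python
# Minimal L1 baseline sketch; X, y, and all sizes are synthetic placeholders.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))            # e.g. protein abundance features
w_true = np.zeros(50); w_true[:5] = 1.0   # sparse ground-truth support
y = (X @ w_true + rng.normal(size=200) > 0).astype(int)

clf = LogisticRegression(penalty="l1", solver="liblinear", C=0.1)
clf.fit(X, y)
print(np.flatnonzero(clf.coef_))          # recovered sparse support
```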
Smooth and Sparse Optimal Transport
Title | Smooth and Sparse Optimal Transport |
Authors | Mathieu Blondel, Vivien Seguy, Antoine Rolet |
Abstract | Entropic regularization is quickly emerging as a new standard in optimal transport (OT). It makes it possible to cast the OT computation as a differentiable and unconstrained convex optimization problem, which can be efficiently solved using the Sinkhorn algorithm. However, entropy keeps the transportation plan strictly positive and therefore completely dense, unlike unregularized OT. This lack of sparsity can be problematic in applications where the transportation plan itself is of interest. In this paper, we explore regularizing the primal and dual OT formulations with a strongly convex term, which corresponds to relaxing the dual and primal constraints with smooth approximations. We show how to incorporate squared $2$-norm and group lasso regularizations within that framework, leading to sparse and group-sparse transportation plans. On the theoretical side, we bound the approximation error introduced by regularizing the primal and dual formulations. Our results suggest that, for the regularized primal, the approximation error can often be smaller with squared $2$-norm than with entropic regularization. We showcase our proposed framework on the task of color transfer. |
Tasks | |
Published | 2017-10-17 |
URL | http://arxiv.org/abs/1710.06276v2 |
http://arxiv.org/pdf/1710.06276v2.pdf | |
PWC | https://paperswithcode.com/paper/smooth-and-sparse-optimal-transport |
Repo | https://github.com/mblondel/smooth-ot |
Framework | none |
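A hedged sketch of the squared-2-norm-regularized OT idea: maximize the smoothed dual with L-BFGS, then recover the plan by thresholding, which is what makes it sparse. The value of gamma and the toy marginals are illustrative choices, not the paper's experiments.

```python
# Smoothed-dual sketch: maximize alpha @ a + beta @ b - ||[alpha_i+beta_j-C_ij]_+||^2 / (2*gamma).
import numpy as np
from scipy.optimize import minimize

def sparse_ot(a, b, C, gamma=1.0):
    m, n = C.shape
    def neg_dual(v):
        alpha, beta = v[:m], v[m:]
        residual = np.maximum(alpha[:, None] + beta[None, :] - C, 0.0)
        dual = alpha @ a + beta @ b - (residual ** 2).sum() / (2 * gamma)
        grad_a = a - residual.sum(axis=1) / gamma
        grad_b = b - residual.sum(axis=0) / gamma
        return -dual, -np.concatenate([grad_a, grad_b])
    res = minimize(neg_dual, np.zeros(m + n), jac=True, method="L-BFGS-B")
    alpha, beta = res.x[:m], res.x[m:]
    return np.maximum(alpha[:, None] + beta[None, :] - C, 0.0) / gamma  # sparse plan

a = np.full(4, 0.25); b = np.full(4, 0.25)
C = np.abs(np.subtract.outer(np.arange(4.0), np.arange(4.0)))
print(sparse_ot(a, b, C).round(3))   # near-diagonal plan with exact zeros
```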
Orthogonal Weight Normalization: Solution to Optimization over Multiple Dependent Stiefel Manifolds in Deep Neural Networks
Title | Orthogonal Weight Normalization: Solution to Optimization over Multiple Dependent Stiefel Manifolds in Deep Neural Networks |
Authors | Lei Huang, Xianglong Liu, Bo Lang, Adams Wei Yu, Yongliang Wang, Bo Li |
Abstract | Orthogonal matrices have shown advantages in training Recurrent Neural Networks (RNNs), but such matrices are limited to being square for the hidden-to-hidden transformation in RNNs. In this paper, we generalize the square orthogonal matrix to an orthogonal rectangular matrix and formulate this problem in feed-forward Neural Networks (FNNs) as Optimization over Multiple Dependent Stiefel Manifolds (OMDSM). We show that the rectangular orthogonal matrix can stabilize the distribution of network activations and regularize FNNs. We also propose a novel orthogonal weight normalization method to solve OMDSM. In particular, it constructs an orthogonal transformation over proxy parameters to ensure the weight matrix is orthogonal and back-propagates gradient information through the transformation during training. To guarantee stability, we minimize the distortions between proxy parameters and canonical weights over all tractable orthogonal transformations. In addition, we design an orthogonal linear module (OLM) to learn orthogonal filter banks in practice, which can be used as an alternative to the standard linear module. Extensive experiments demonstrate that by simply substituting OLM for the standard linear module, without revising any experimental protocols, our method largely improves the performance of state-of-the-art networks, including Inception and residual networks, on the CIFAR and ImageNet datasets. In particular, we reduced the test error of a wide residual network on CIFAR-100 from 20.04% to 18.61% with this simple substitution. Our code is available online for result reproduction. |
Tasks | Image Classification |
Published | 2017-09-16 |
URL | http://arxiv.org/abs/1709.06079v2 |
http://arxiv.org/pdf/1709.06079v2.pdf | |
PWC | https://paperswithcode.com/paper/orthogonal-weight-normalization-solution-to |
Repo | https://github.com/huangleiBuaa/OthogonalWN |
Framework | pytorch |
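A minimal sketch of the core orthogonalization step: map a proxy matrix V to a row-orthogonal W = (V V^T)^{-1/2} V and backpropagate through the map. The paper's OLM adds centering and a carefully derived eigen-based backward pass; this differentiable torch version is an illustrative simplification.

```python
import torch

def orthogonalize(V, eps=1e-5):
    # V: (out_features, in_features), out_features <= in_features (proxy params)
    S = V @ V.t() + eps * torch.eye(V.shape[0])
    eigvals, eigvecs = torch.linalg.eigh(S)
    S_inv_sqrt = eigvecs @ torch.diag(eigvals.rsqrt()) @ eigvecs.t()
    return S_inv_sqrt @ V                 # rows are orthonormal by construction

V = torch.randn(3, 8, requires_grad=True)  # proxy parameters
W = orthogonalize(V)
print(W @ W.t())                           # ~ identity
W.sum().backward()                         # gradients flow back to the proxy V
```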
Truncating Wide Networks using Binary Tree Architectures
Title | Truncating Wide Networks using Binary Tree Architectures |
Authors | Yan Zhang, Mete Ozay, Shuohao Li, Takayuki Okatani |
Abstract | A recent study shows that a wide deep network can obtain accuracy comparable to a deeper but narrower network. Compared to narrower and deeper networks, wide networks employ fewer layers and have various important benefits, such as shorter running time on parallel computing devices and less severe gradient vanishing problems. However, the parameter size of a wide network can be very large due to the large width of each layer in the network. In order to keep the benefits of wide networks while improving their trade-off between parameter size and accuracy, we propose a binary tree architecture that truncates the architecture of wide networks by reducing their width. More precisely, in the proposed architecture, the width is continuously reduced from lower layers to higher layers in order to increase the expressive capacity of the network with a smaller increase in parameter size. Also, to ease the gradient vanishing problem, features obtained at different layers are concatenated to form the output of our architecture. By employing the proposed architecture on a baseline wide network, we can construct and train a new network with the same depth but considerably fewer parameters. In our experimental analyses, we observe that the proposed architecture enables us to obtain a better trade-off between parameter size and accuracy compared to baseline networks on various benchmark image classification datasets. The results show that our model can decrease the classification error of the baseline from 20.43% to 19.22% on CIFAR-100 using only 28% of the parameters of the baseline. Code is available at https://github.com/ZhangVision/bitnet. |
Tasks | Image Classification |
Published | 2017-04-03 |
URL | http://arxiv.org/abs/1704.00509v1 |
http://arxiv.org/pdf/1704.00509v1.pdf | |
PWC | https://paperswithcode.com/paper/truncating-wide-networks-using-binary-tree |
Repo | https://github.com/ZhangVision/bitnet |
Framework | torch |
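An illustrative sketch, not the authors' exact bitnet, of the two ideas the abstract names: channel width halves from block to block, and every block's output is concatenated to form the module output. All layer sizes are placeholders.

```python
import torch
import torch.nn as nn

class BinaryTreeBlock(nn.Module):
    def __init__(self, in_ch, width, depth=3):
        super().__init__()
        self.levels = nn.ModuleList()
        ch_in = in_ch
        for d in range(depth):
            ch_out = width // (2 ** d)          # width halves at each level
            self.levels.append(nn.Sequential(
                nn.Conv2d(ch_in, ch_out, 3, padding=1),
                nn.BatchNorm2d(ch_out), nn.ReLU()))
            ch_in = ch_out

    def forward(self, x):
        feats = []
        for level in self.levels:
            x = level(x)
            feats.append(x)                      # keep every level's features
        return torch.cat(feats, dim=1)           # concatenated output eases gradients

block = BinaryTreeBlock(in_ch=16, width=64)
print(block(torch.randn(1, 16, 32, 32)).shape)   # (1, 64+32+16, 32, 32)
```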
Structured Bayesian Pruning via Log-Normal Multiplicative Noise
Title | Structured Bayesian Pruning via Log-Normal Multiplicative Noise |
Authors | Kirill Neklyudov, Dmitry Molchanov, Arsenii Ashukha, Dmitry Vetrov |
Abstract | Dropout-based regularization methods can be regarded as injecting random noise with pre-defined magnitude into different parts of the neural network during training. It was recently shown that the Bayesian dropout procedure not only improves generalization but also leads to extremely sparse neural architectures by automatically setting the individual noise magnitude per weight. However, this sparsity can hardly be used for acceleration since it is unstructured. In this paper, we propose a new Bayesian model that takes into account the computational structure of neural networks and provides structured sparsity, e.g. removes neurons and/or convolutional channels in CNNs. To do this, we inject noise into the neurons' outputs while keeping the weights unregularized. We establish the probabilistic model with a proper truncated log-uniform prior over the noise and a truncated log-normal variational approximation that ensures that the KL-term in the evidence lower bound is computed in closed form. The model leads to structured sparsity by removing elements with a low SNR from the computation graph and provides significant acceleration on a number of deep neural architectures. The model is easy to implement as it can be formulated as a separate dropout-like layer. |
Tasks | |
Published | 2017-05-20 |
URL | http://arxiv.org/abs/1705.07283v2 |
http://arxiv.org/pdf/1705.07283v2.pdf | |
PWC | https://paperswithcode.com/paper/structured-bayesian-pruning-via-log-normal |
Repo | https://github.com/maxblumental/variational-drouput |
Framework | pytorch |
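A hedged sketch of the dropout-like layer idea: multiply each neuron's output by learnable log-normal noise and prune units whose signal-to-noise ratio is low. The paper's truncated prior/posterior and closed-form KL term are omitted here for brevity; the SNR threshold is an illustrative choice.

```python
import torch
import torch.nn as nn

class LogNormalNoise(nn.Module):
    def __init__(self, n_units):
        super().__init__()
        self.mu = nn.Parameter(torch.zeros(n_units))
        self.log_sigma = nn.Parameter(torch.full((n_units,), -3.0))

    def snr(self):
        # For log-normal noise, SNR = mean/std = 1 / sqrt(exp(sigma^2) - 1)
        var_term = torch.expm1(self.log_sigma.exp() ** 2)
        return 1.0 / var_term.clamp_min(1e-8).sqrt()

    def forward(self, x):
        if self.training:
            eps = torch.randn_like(self.mu)
            noise = torch.exp(self.mu + self.log_sigma.exp() * eps)
        else:
            noise = torch.exp(self.mu) * (self.snr() > 1.0)  # drop low-SNR units
        return x * noise

layer = LogNormalNoise(8)
print(layer(torch.ones(2, 8)).shape)   # acts like a separate dropout-like layer
```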
LAP: a Linearize and Project Method for Solving Inverse Problems with Coupled Variables
Title | LAP: a Linearize and Project Method for Solving Inverse Problems with Coupled Variables |
Authors | James Herring, James Nagy, Lars Ruthotto |
Abstract | Many inverse problems involve two or more sets of variables that represent different physical quantities but are tightly coupled with each other. For example, image super-resolution requires joint estimation of the image and motion parameters from noisy measurements. Exploiting this structure is key for efficiently solving these large-scale optimization problems, which are often ill-conditioned. In this paper, we present a new method called Linearize And Project (LAP) that offers a flexible framework for solving inverse problems with coupled variables. LAP is most promising for cases when the subproblem corresponding to one of the variables is considerably easier to solve than the other. LAP is based on a Gauss-Newton method, and thus after linearizing the residual, it eliminates one block of variables through projection. Due to the linearization, this block can be chosen freely. Further, LAP supports direct, iterative, and hybrid regularization as well as constraints. Therefore LAP is attractive, e.g., for ill-posed imaging problems. These traits differentiate LAP from common alternatives for this type of problem such as variable projection (VarPro) and block coordinate descent (BCD). Our numerical experiments compare the performance of LAP to BCD and VarPro using three coupled problems whose forward operators are linear with respect to one block and nonlinear for the other set of variables. |
Tasks | Image Super-Resolution, Super-Resolution |
Published | 2017-05-28 |
URL | http://arxiv.org/abs/1705.09992v3 |
http://arxiv.org/pdf/1705.09992v3.pdf | |
PWC | https://paperswithcode.com/paper/lap-a-linearize-and-project-method-for |
Repo | https://github.com/herrinj/LAP |
Framework | none |
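A toy sketch of one LAP-style Gauss-Newton step on a coupled problem r(x, y) = x·exp(-yt) - d: linearize in both blocks, eliminate the easy linear block x through projection, then solve the reduced problem for y. The test problem and iteration count are illustrative, not the paper's imaging experiments.

```python
import numpy as np

t = np.linspace(0, 3, 40)
d = 2.0 * np.exp(-1.5 * t)                 # data generated with x*=2, y*=1.5
x, y = 1.0, 1.0                            # initial guess

for _ in range(10):
    e = np.exp(-y * t)
    r = x * e - d                          # coupled residual
    Jx = e[:, None]                        # d r / d x  (linear block)
    Jy = (-x * t * e)[:, None]             # d r / d y  (nonlinear block)
    P = np.eye(len(t)) - Jx @ np.linalg.pinv(Jx)    # project out the x-block
    dy = -np.linalg.pinv(P @ Jy) @ (P @ r)           # reduced step in y
    dx = -np.linalg.pinv(Jx) @ (r + Jy @ dy)         # back-substitute for x
    x, y = x + dx.item(), y + dy.item()

print(round(x, 3), round(y, 3))            # converges to ~ (2.0, 1.5)
```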
Discrete Modeling of Multi-Transmitter Neural Networks with Neuron Competition
Title | Discrete Modeling of Multi-Transmitter Neural Networks with Neuron Competition |
Authors | Nikolay Bazenkov, Varvara Dyakonova, Oleg Kuznetsov, Dmitri Sakharov, Dmitry Vorontsov, Liudmila Zhilyakova |
Abstract | We propose a novel discrete model of central pattern generators (CPG), neuronal ensembles generating rhythmic activity. The model emphasizes the role of nonsynaptic interactions and the diversity of electrical properties in nervous systems. Neurons in the model release different neurotransmitters into the shared extracellular space (ECS) so each neuron with the appropriate set of receptors can receive signals from other neurons. We consider neurons, differing in their electrical activity, represented as finite-state machines functioning in discrete time steps. Discrete modeling is aimed to provide a computationally tractable and compact explanation of rhythmic pattern generation in nervous systems. The important feature of the model is the introduced mechanism of neuronal competition which is shown to be responsible for the generation of proper rhythms. The model is illustrated with two examples: a half-center oscillator considered to be a basic mechanism of emerging rhythmic activity and the well-studied feeding network of a pond snail. Future research will focus on the neuromodulatory effects ubiquitous in CPG networks and the whole nervous systems. |
Tasks | |
Published | 2017-05-05 |
URL | http://arxiv.org/abs/1705.02176v2 |
http://arxiv.org/pdf/1705.02176v2.pdf | |
PWC | https://paperswithcode.com/paper/discrete-modeling-of-multi-transmitter-neural |
Repo | https://github.com/bazenkov/MultiCPG |
Framework | none |
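A deliberately toy sketch of the half-center oscillator example: two neurons compete in discrete time, and the firing neuron eventually tires and cedes to its rival, producing an alternating rhythm. This illustrates only the competition mechanism; the paper's shared extracellular space, transmitters, and receptor sets are not modeled, and the fatigue threshold is an invented placeholder.

```python
def half_center(steps=12, burst_len=3):
    fatigue = [0, 0]
    active = 0                       # neuron 0 wins the initial competition
    pattern = []
    for _ in range(steps):
        fatigue[active] += 1
        if fatigue[active] >= burst_len:   # the firing neuron tires out...
            fatigue[active] = 0
            active = 1 - active            # ...and the rival escapes inhibition
        pattern.append(active)
    return pattern

print(half_center())   # alternating bursts, e.g. [0, 0, 1, 1, 1, 0, 0, 0, ...]
```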
Deep-Learnt Classification of Light Curves
Title | Deep-Learnt Classification of Light Curves |
Authors | Ashish Mahabal, Kshiteej Sheth, Fabian Gieseke, Akshay Pai, S. George Djorgovski, Andrew Drake, Matthew Graham, the CSS/CRTS/PTF Collaboration |
Abstract | Astronomy light curves are sparse, gappy, and heteroscedastic. As a result, standard time series methods regularly used for financial and similar datasets are of little help, and astronomers are usually left to their own instruments and techniques to classify light curves. A common approach is to derive statistical features from the time series and to use machine learning methods, generally supervised, to separate objects into a few of the standard classes. In this work, we transform the time series into two-dimensional light curve representations in order to classify them using modern deep learning techniques. In particular, we show that classifiers based on convolutional neural networks work well for broad characterization and classification. We use labeled datasets of periodic variables from the CRTS survey and show how this opens doors for quick classification of diverse classes with several possible exciting extensions. |
Tasks | Time Series |
Published | 2017-09-19 |
URL | http://arxiv.org/abs/1709.06257v1 |
http://arxiv.org/pdf/1709.06257v1.pdf | |
PWC | https://paperswithcode.com/paper/deep-learnt-classification-of-light-curves |
Repo | https://github.com/sakshambassi/DeepStarClassification |
Framework | none |
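A hedged sketch of the central trick: turn an irregularly sampled light curve (t_i, m_i) into a fixed-size 2D image by histogramming all pairwise (dt, dm) differences, which a standard CNN can then classify. The bin counts below are illustrative, not the paper's exact binning.

```python
import numpy as np

def dmdt_image(t, m, dt_bins=8, dm_bins=8):
    i, j = np.triu_indices(len(t), k=1)      # all pairs of observations
    dt, dm = t[j] - t[i], m[j] - m[i]
    img, _, _ = np.histogram2d(dt, dm, bins=(dt_bins, dm_bins))
    return img / img.max()                   # normalized 2D representation

rng = np.random.default_rng(1)
t = np.sort(rng.uniform(0, 100, 60))         # gappy, irregular sampling
m = np.sin(0.5 * t) + 0.1 * rng.normal(size=60)
print(dmdt_image(t, m).shape)                # (8, 8) image, ready for a CNN
```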
Fluency-Guided Cross-Lingual Image Captioning
Title | Fluency-Guided Cross-Lingual Image Captioning |
Authors | Weiyu Lan, Xirong Li, Jianfeng Dong |
Abstract | Image captioning has so far been explored mostly in English, as most available datasets are in this language. However, the application of image captioning should not be restricted by language. Only a few studies have been conducted on image captioning in a cross-lingual setting. Different from these works, which manually build a dataset for a target language, we aim to learn a cross-lingual captioning model fully from machine-translated sentences. To overcome the lack of fluency in the translated sentences, we propose in this paper a fluency-guided learning framework. The framework comprises a module to automatically estimate the fluency of the sentences and another module to utilize the estimated fluency scores to effectively train an image captioning model for the target language. As experiments on two bilingual (English-Chinese) datasets show, our approach improves both fluency and relevance of the generated captions in Chinese, without using any manually written sentences in the target language. |
Tasks | Image Captioning |
Published | 2017-08-15 |
URL | http://arxiv.org/abs/1708.04390v1 |
http://arxiv.org/pdf/1708.04390v1.pdf | |
PWC | https://paperswithcode.com/paper/fluency-guided-cross-lingual-image-captioning |
Repo | https://github.com/weiyuk/fluent-cap |
Framework | tf |
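A minimal sketch of one way the estimated fluency scores could guide training, as the second module describes: weight each machine-translated caption's loss by its fluency so disfluent sentences contribute less. The captioning logits and fluency scores below are placeholders; the paper's actual weighting strategies may differ.

```python
import torch
import torch.nn.functional as F

def fluency_weighted_loss(logits, targets, fluency):
    # logits: (batch, seq, vocab); targets: (batch, seq); fluency: (batch,)
    per_token = F.cross_entropy(
        logits.flatten(0, 1), targets.flatten(), reduction="none"
    ).view(targets.shape)
    per_sentence = per_token.mean(dim=1)
    return (fluency * per_sentence).mean()    # down-weight disfluent captions

logits = torch.randn(4, 7, 100)
targets = torch.randint(0, 100, (4, 7))
fluency = torch.tensor([0.9, 0.2, 0.7, 1.0])  # estimated fluency scores
print(fluency_weighted_loss(logits, targets, fluency))
```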
Meta-Learning by Adjusting Priors Based on Extended PAC-Bayes Theory
Title | Meta-Learning by Adjusting Priors Based on Extended PAC-Bayes Theory |
Authors | Ron Amit, Ron Meir |
Abstract | In meta-learning an agent extracts knowledge from observed tasks, aiming to facilitate learning of novel future tasks. Under the assumption that future tasks are ‘related’ to previous tasks, the accumulated knowledge should be learned in a way which captures the common structure across learned tasks, while allowing the learner sufficient flexibility to adapt to novel aspects of new tasks. We present a framework for meta-learning that is based on generalization error bounds, allowing us to extend various PAC-Bayes bounds to meta-learning. Learning takes place through the construction of a distribution over hypotheses based on the observed tasks, and its utilization for learning a new task. Thus, prior knowledge is incorporated through setting an experience-dependent prior for novel tasks. We develop a gradient-based algorithm which minimizes an objective function derived from the bounds and demonstrate its effectiveness numerically with deep neural networks. In addition to establishing the improved performance available through meta-learning, we demonstrate the intuitive way by which prior information is manifested at different levels of the network. |
Tasks | Meta-Learning |
Published | 2017-11-03 |
URL | https://arxiv.org/abs/1711.01244v8 |
https://arxiv.org/pdf/1711.01244v8.pdf | |
PWC | https://paperswithcode.com/paper/meta-learning-by-adjusting-priors-based-on |
Repo | https://github.com/ron-amit/meta-learning-adjusting-priors |
Framework | pytorch |
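A schematic sketch of the objective shape: each task pays its empirical loss plus a PAC-Bayes complexity term measuring the KL divergence from its posterior to a shared, learnable prior, and the prior is trained through those terms. Diagonal Gaussians keep the KL in closed form; the McAllester-style bound and all shapes are illustrative assumptions, one of several bounds the paper extends.

```python
import torch

def kl_diag_gauss(mu_q, logvar_q, mu_p, logvar_p):
    # KL( N(mu_q, var_q) || N(mu_p, var_p) ) for diagonal Gaussians
    return 0.5 * torch.sum(
        logvar_p - logvar_q
        + (logvar_q.exp() + (mu_q - mu_p) ** 2) / logvar_p.exp()
        - 1.0)

def meta_objective(task_losses, posteriors, prior, n_samples, delta=0.1):
    total = 0.0
    for loss, (mu_q, lv_q) in zip(task_losses, posteriors):
        kl = kl_diag_gauss(mu_q, lv_q, *prior)
        complexity = torch.sqrt(
            (kl + torch.log(torch.tensor(2 * n_samples / delta)))
            / (2 * (n_samples - 1)))
        total = total + loss + complexity      # bound-derived per-task objective
    return total / len(task_losses)

prior = (torch.zeros(5), torch.zeros(5))       # shared, experience-dependent prior
posteriors = [(torch.randn(5), torch.zeros(5)) for _ in range(3)]
losses = [torch.tensor(0.3), torch.tensor(0.5), torch.tensor(0.2)]
print(meta_objective(losses, posteriors, prior, n_samples=100))
```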
Deep Reinforcement Learning for Visual Object Tracking in Videos
Title | Deep Reinforcement Learning for Visual Object Tracking in Videos |
Authors | Da Zhang, Hamid Maei, Xin Wang, Yuan-Fang Wang |
Abstract | In this paper we introduce a fully end-to-end approach for visual tracking in videos that learns to predict the bounding box locations of a target object at every frame. An important insight is that the tracking problem can be considered as a sequential decision-making process in which historical semantics encode highly relevant information for future decisions. Based on this intuition, we formulate our model as a recurrent convolutional neural network agent that interacts with a video over time, and our model can be trained with reinforcement learning (RL) algorithms to learn good tracking policies that pay attention to continuous, inter-frame correlation and maximize tracking performance in the long run. The proposed tracking algorithm achieves state-of-the-art performance on an existing tracking benchmark and operates at frame rates faster than real time. To the best of our knowledge, our tracker is the first neural-network tracker that combines convolutional and recurrent networks with RL algorithms. |
Tasks | Decision Making, Object Tracking, Visual Object Tracking, Visual Tracking |
Published | 2017-01-31 |
URL | http://arxiv.org/abs/1701.08936v2 |
http://arxiv.org/pdf/1701.08936v2.pdf | |
PWC | https://paperswithcode.com/paper/deep-reinforcement-learning-for-visual-object |
Repo | https://github.com/dazhang-cv/Project |
Framework | none |
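An illustrative skeleton of the recurrent convolutional agent the abstract describes: a CNN encodes each frame, an LSTM carries the historical state across frames, and a head predicts the bounding box; REINFORCE-style training would reward overlap with the ground truth. Layer sizes and shapes are placeholders, not the authors' architecture.

```python
import torch
import torch.nn as nn

class RecurrentTracker(nn.Module):
    def __init__(self, hidden=256):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 16, 5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4), nn.Flatten())
        self.rnn = nn.LSTMCell(32 * 16, hidden)
        self.bbox_head = nn.Linear(hidden, 4)    # (x, y, w, h)

    def forward(self, frames):                   # frames: (T, B, 3, H, W)
        B = frames.shape[1]
        h = c = torch.zeros(B, self.rnn.hidden_size)
        boxes = []
        for frame in frames:                     # historical semantics live in (h, c)
            h, c = self.rnn(self.encoder(frame), (h, c))
            boxes.append(self.bbox_head(h))
        return torch.stack(boxes)                # (T, B, 4) predicted boxes

tracker = RecurrentTracker()
print(tracker(torch.randn(5, 2, 3, 64, 64)).shape)   # (5, 2, 4)
```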
Ethical Challenges in Data-Driven Dialogue Systems
Title | Ethical Challenges in Data-Driven Dialogue Systems |
Authors | Peter Henderson, Koustuv Sinha, Nicolas Angelard-Gontier, Nan Rosemary Ke, Genevieve Fried, Ryan Lowe, Joelle Pineau |
Abstract | The use of dialogue systems as a medium for human-machine interaction is an increasingly prevalent paradigm. A growing number of dialogue systems use conversation strategies that are learned from large datasets. There are well-documented instances where interactions with these systems have resulted in biased or even offensive conversations due to the data-driven training process. Here, we highlight potential ethical issues that arise in dialogue systems research, including: implicit biases in data-driven systems, the rise of adversarial examples, potential sources of privacy violations, safety concerns, special considerations for reinforcement learning systems, and reproducibility concerns. We also suggest areas stemming from these issues that deserve further investigation. Through this initial survey, we hope to spur research leading to robust, safe, and ethically sound dialogue systems. |
Tasks | |
Published | 2017-11-24 |
URL | http://arxiv.org/abs/1711.09050v1 |
http://arxiv.org/pdf/1711.09050v1.pdf | |
PWC | https://paperswithcode.com/paper/ethical-challenges-in-data-driven-dialogue |
Repo | https://github.com/Breakend/EthicsInDialogue |
Framework | none |
Generating Steganographic Images via Adversarial Training
Title | Generating Steganographic Images via Adversarial Training |
Authors | Jamie Hayes, George Danezis |
Abstract | Adversarial training was recently shown to be competitive against supervised learning methods on computer vision tasks; however, studies have mainly been confined to generative tasks such as image synthesis. In this paper, we apply adversarial training techniques to the discriminative task of learning a steganographic algorithm. Steganography is a collection of techniques for concealing information by embedding it within a non-secret medium, such as cover texts or images. We show that adversarial training can produce robust steganographic techniques: our unsupervised training scheme produces a steganographic algorithm that competes with state-of-the-art steganographic techniques, and produces a robust steganalyzer, which performs the discriminative task of deciding if an image contains secret information. We define a game between three parties, Alice, Bob and Eve, in order to simultaneously train both a steganographic algorithm and a steganalyzer. Alice and Bob attempt to communicate a secret message contained within an image, while Eve eavesdrops on their conversation and attempts to determine if secret information is embedded within the image. We represent Alice, Bob and Eve by neural networks, and validate our scheme on two independent image datasets, showing our novel method of studying steganographic problems is surprisingly competitive against established steganographic techniques. |
Tasks | Image Generation |
Published | 2017-03-01 |
URL | http://arxiv.org/abs/1703.00371v3 |
http://arxiv.org/pdf/1703.00371v3.pdf | |
PWC | https://paperswithcode.com/paper/generating-steganographic-images-via |
Repo | https://github.com/zhangdi0220/nips |
Framework | tf |
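A schematic sketch of the three-party game from the abstract: Alice embeds a message into a cover, Bob decodes it, and Eve classifies cover vs. stego; Alice and Bob minimize decoding error while fooling Eve. The tiny linear networks and flattened toy "images" are placeholders, not the paper's architectures.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

alice = nn.Sequential(nn.Linear(64 + 16, 64), nn.Tanh())  # cover + msg -> stego
bob = nn.Sequential(nn.Linear(64, 16))                    # stego -> message bits
eve = nn.Sequential(nn.Linear(64, 1))                     # image -> stego logit

cover = torch.rand(8, 64)                                 # flattened toy images
msg = torch.randint(0, 2, (8, 16)).float()                # secret bits

stego = alice(torch.cat([cover, msg], dim=1))
msg_loss = F.binary_cross_entropy_with_logits(bob(stego), msg)          # Bob decodes
fool_loss = F.binary_cross_entropy_with_logits(eve(stego), torch.zeros(8, 1))
eve_loss = (F.binary_cross_entropy_with_logits(eve(cover), torch.zeros(8, 1))
            + F.binary_cross_entropy_with_logits(eve(stego.detach()), torch.ones(8, 1)))

alice_bob_loss = msg_loss + fool_loss   # optimize Alice+Bob; eve_loss trains Eve
print(alice_bob_loss.item(), eve_loss.item())
```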
Disentangled Person Image Generation
Title | Disentangled Person Image Generation |
Authors | Liqian Ma, Qianru Sun, Stamatios Georgoulis, Luc Van Gool, Bernt Schiele, Mario Fritz |
Abstract | Generating novel, yet realistic, images of persons is a challenging task due to the complex interplay between the different image factors, such as the foreground, background and pose information. In this work, we aim at generating such images based on a novel, two-stage reconstruction pipeline that learns a disentangled representation of the aforementioned image factors and generates novel person images at the same time. First, a multi-branched reconstruction network is proposed to disentangle and encode the three factors into embedding features, which are then combined to re-compose the input image itself. Second, three corresponding mapping functions are learned in an adversarial manner in order to map Gaussian noise to the learned embedding feature space, for each factor respectively. Using the proposed framework, we can manipulate the foreground, background and pose of the input image, and can also sample new embedding features to generate targeted manipulations that provide more control over the generation process. Experiments on the Market-1501 and DeepFashion datasets show that our model not only generates realistic person images with new foregrounds, backgrounds and poses, but also manipulates the generated factors and interpolates the in-between states. Another set of experiments on Market-1501 shows that our model can also be beneficial for the person re-identification task. |
Tasks | Gesture-to-Gesture Translation, Image Generation, Person Re-Identification, Pose Transfer |
Published | 2017-12-07 |
URL | http://arxiv.org/abs/1712.02621v4 |
http://arxiv.org/pdf/1712.02621v4.pdf | |
PWC | https://paperswithcode.com/paper/disentangled-person-image-generation |
Repo | https://github.com/charliememory/Disentangled-Person-Image-Generation |
Framework | tf |
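A high-level schematic of the two-stage idea, with all module sizes as placeholders: stage one encodes foreground, background, and pose into separate embeddings and decodes them back into the image; stage two learns per-factor mapping functions from Gaussian noise into each embedding space (adversarial losses omitted).

```python
import torch
import torch.nn as nn

enc_fg, enc_bg, enc_pose = (nn.Linear(128, 32) for _ in range(3))  # factor branches
decoder = nn.Linear(96, 128)                       # re-composes the image
mappers = [nn.Linear(16, 32) for _ in range(3)]    # noise -> embedding, per factor

x = torch.randn(4, 128)                            # a flattened toy "image"
emb = torch.cat([enc_fg(x), enc_bg(x), enc_pose(x)], dim=1)
recon = decoder(emb)                               # stage-one reconstruction
z = torch.randn(4, 16)
sampled = torch.cat([m(z) for m in mappers], dim=1)
fake = decoder(sampled)                            # stage-two sampled person image
print(recon.shape, fake.shape)
```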
Convolutional Sequence to Sequence Learning
Title | Convolutional Sequence to Sequence Learning |
Authors | Jonas Gehring, Michael Auli, David Grangier, Denis Yarats, Yann N. Dauphin |
Abstract | The prevalent approach to sequence to sequence learning maps an input sequence to a variable length output sequence via recurrent neural networks. We introduce an architecture based entirely on convolutional neural networks. Compared to recurrent models, computations over all elements can be fully parallelized during training and optimization is easier since the number of non-linearities is fixed and independent of the input length. Our use of gated linear units eases gradient propagation and we equip each decoder layer with a separate attention module. We outperform the accuracy of the deep LSTM setup of Wu et al. (2016) on both WMT’14 English-German and WMT’14 English-French translation at an order of magnitude faster speed, both on GPU and CPU. |
Tasks | Machine Translation |
Published | 2017-05-08 |
URL | http://arxiv.org/abs/1705.03122v3 |
http://arxiv.org/pdf/1705.03122v3.pdf | |
PWC | https://paperswithcode.com/paper/convolutional-sequence-to-sequence-learning |
Repo | https://github.com/shashiongithub/XSum |
Framework | pytorch |
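A minimal sketch of the building block the abstract highlights: a 1D convolution whose doubled output is gated (a gated linear unit) plus a residual connection. Decoder attention, causal padding, and positional embeddings are omitted; sizes are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GLUConvBlock(nn.Module):
    def __init__(self, channels, kernel=3):
        super().__init__()
        # produce 2*channels so GLU can split into values and gates
        self.conv = nn.Conv1d(channels, 2 * channels, kernel, padding=kernel // 2)

    def forward(self, x):                           # x: (batch, channels, seq_len)
        return x + F.glu(self.conv(x), dim=1)       # gated output + residual

block = GLUConvBlock(64)
print(block(torch.randn(2, 64, 10)).shape)          # (2, 64, 10)
```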