Paper Group ANR 473
Efficient Online Convex Optimization with Adaptively Minimax Optimal Dynamic Regret. Deep Music Analogy Via Latent Representation Disentanglement. Depth from a polarisation + RGB stereo pair. XPipe: Efficient Pipeline Model Parallelism for Multi-GPU DNN Training. Deep learning methods in speaker recognition: a review. How noise affects the Hessian …
Efficient Online Convex Optimization with Adaptively Minimax Optimal Dynamic Regret
Title | Efficient Online Convex Optimization with Adaptively Minimax Optimal Dynamic Regret |
Authors | Hakan Gokcesu, S. Serdar Kozat |
Abstract | We introduce an online convex optimization algorithm using projected sub-gradient descent with ideal adaptive learning rates, where each computation is efficiently done in a sequential manner. For the first time in the literature, this algorithm provides an adaptively minimax optimal dynamic regret guarantee for a sequence of convex functions without any restrictions – such as strong convexity, smoothness or even Lipschitz continuity – against a comparator decision sequence with bounded total successive changes. We show optimality by generating the worst-case dynamic regret adaptive lower bound, which consists of actual sub-gradient norms and matches our guarantees. We discuss the advantages of our algorithm over adaptive projection with sub-gradient self outer products and also derive the extension for independent learning in each decision coordinate separately. Additionally, we demonstrate how to best preserve our guarantees, in a truly online manner, when the bound on total successive changes in the dynamic comparator sequence grows over time. |
Tasks | |
Published | 2019-06-30 |
URL | https://arxiv.org/abs/1907.00497v1 |
https://arxiv.org/pdf/1907.00497v1.pdf | |
PWC | https://paperswithcode.com/paper/efficient-online-convex-optimization-with |
Repo | |
Framework | |
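A minimal Python sketch of projected sub-gradient descent with a data-adaptive learning rate. It assumes a Euclidean ball of radius `radius` as the feasible set and uses η_t = sqrt(D² + 2·D·P) / sqrt(Σ_{s≤t} ‖g_s‖²), where D is the ball's diameter and P an assumed bound on the comparator's total successive changes. This is the tuning suggested by the textbook fixed-step dynamic-regret bound for projected gradient descent, not necessarily the authors' exact learning-rate rule; all function names are hypothetical.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]


def project_to_ball(x: Vector, radius: float) -> Vector:
    """Euclidean projection onto the ball {x : ||x|| <= radius}."""
    norm = math.sqrt(sum(v * v for v in x))
    if norm <= radius:
        return list(x)
    return [v * radius / norm for v in x]


def adaptive_projected_subgradient(
    subgrads: Sequence[Callable[[Vector], Vector]],
    dim: int,
    radius: float,
    path_length: float = 0.0,
) -> List[Vector]:
    """Play x_1, ..., x_T against a stream of convex losses, given only a
    sub-gradient oracle for each loss, and return the decisions played."""
    diameter = 2.0 * radius
    # Assumption: numerator taken from the fixed-step bound
    # (D^2 + 2*D*P) / (2*eta) + eta * sum ||g||^2 / 2, not from the paper.
    numerator = math.sqrt(diameter ** 2 + 2.0 * diameter * path_length)
    x = [0.0] * dim
    grad_sq_sum = 0.0
    played: List[Vector] = []
    for subgrad in subgrads:
        played.append(list(x))
        g = subgrad(x)
        grad_sq_sum += sum(v * v for v in g)
        if grad_sq_sum == 0.0:
            continue  # no non-zero sub-gradient observed yet, so stay put
        eta = numerator / math.sqrt(grad_sq_sum)
        x = project_to_ball([xi - eta * gi for xi, gi in zip(x, g)], radius)
    return played
```

For example, feeding 2000 copies of the sub-gradient of f(x) = |x − 1| on the interval [−2, 2] drives the played decisions towards 1 as the learning rate shrinks like 1/sqrt(t).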
Deep Music Analogy Via Latent Representation Disentanglement
Title | Deep Music Analogy Via Latent Representation Disentanglement |
Authors | Ruihan Yang, Dingsu Wang, Ziyu Wang, Tianyao Chen, Junyan Jiang, Gus Xia |
Abstract | Analogy-making is a key method for computer algorithms to generate both natural and creative music pieces. In general, an analogy is made by partially transferring the music abstractions, i.e., high-level representations and their relationships, from one piece to another; however, this procedure requires disentangling music representations, which usually takes little effort for musicians but is non-trivial for computers. Three sub-problems arise: extracting latent representations from the observation, disentangling the representations so that each part has a unique semantic interpretation, and mapping the latent representations back to actual music. In this paper, we contribute an explicitly-constrained variational autoencoder (EC$^2$-VAE) as a unified solution to all three sub-problems. We focus on disentangling the pitch and rhythm representations of 8-beat music clips conditioned on chords. In producing music analogies, this model helps us to realize the imaginary situation of “what if” a piece is composed using a different pitch contour, rhythm pattern, or chord progression by borrowing the representations from other pieces. Finally, we validate the proposed disentanglement method using objective measurements and evaluate the analogy examples by a subjective study. |
Tasks | |
Published | 2019-06-09 |
URL | https://arxiv.org/abs/1906.03626v4 |
https://arxiv.org/pdf/1906.03626v4.pdf | |
PWC | https://paperswithcode.com/paper/deep-music-analogy-via-latent-representation |
Repo | |
Framework | |
Depth from a polarisation + RGB stereo pair
Title | Depth from a polarisation + RGB stereo pair |
Authors | Dizhong Zhu, William A. P. Smith |
Abstract | In this paper, we propose a hybrid depth imaging system in which a polarisation camera is augmented by a second image from a standard digital camera. For this modest increase in equipment complexity over conventional shape-from-polarisation, we obtain a number of benefits that enable us to overcome longstanding problems with the polarisation shape cue. The stereo cue provides a depth map which, although coarse, is metrically accurate. This is used as a guide surface for disambiguation of the polarisation surface normal estimates using a higher order graphical model. In turn, these are used to estimate diffuse albedo. By extending a previous shape-from-polarisation method to the perspective case, we show how to compute dense, detailed maps of absolute depth, while retaining a linear formulation. We show that our hybrid method is able to recover dense 3D geometry that is superior to state-of-the-art shape-from-polarisation or two view stereo alone. |
Tasks | |
Published | 2019-03-28 |
URL | http://arxiv.org/abs/1903.12061v2 |
http://arxiv.org/pdf/1903.12061v2.pdf | |
PWC | https://paperswithcode.com/paper/depth-from-a-polarisation-rgb-stereo-pair |
Repo | |
Framework | |
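The polarisation cue itself can be illustrated with a short Python sketch. It assumes an ideal linear polariser imaged at 0°, 45°, 90° and 135°, recovers the linear Stokes parameters, the degree of polarisation and the phase angle for one pixel, and then resolves the π ambiguity of the (assumed diffuse) azimuth by keeping the candidate closest to the azimuth of a coarse guide normal, such as one derived from the stereo depth map. This per-pixel nearest-candidate rule is a simplification for illustration; the paper instead uses a higher order graphical model. Function names are hypothetical.

```python
import math
from typing import Tuple


def polarisation_from_four_angles(
    i0: float, i45: float, i90: float, i135: float
) -> Tuple[float, float, float]:
    """Return (S0, degree of linear polarisation, phase angle in [0, pi))
    from intensities measured behind a linear polariser at 0/45/90/135 degrees."""
    s0 = 0.5 * (i0 + i45 + i90 + i135)  # total intensity
    s1 = i0 - i90
    s2 = i45 - i135
    dop = math.hypot(s1, s2) / s0 if s0 > 0 else 0.0
    phase = (0.5 * math.atan2(s2, s1)) % math.pi
    return s0, dop, phase


def angular_distance(a: float, b: float) -> float:
    """Smallest absolute difference between two angles, in [0, pi]."""
    d = (a - b) % (2.0 * math.pi)
    return min(d, 2.0 * math.pi - d)


def disambiguate_azimuth(phase: float, guide_azimuth: float) -> float:
    """Assuming diffuse reflection, the normal's azimuth is phase or phase + pi;
    keep the candidate closest to the azimuth of a coarse guide normal."""
    candidates = (phase, phase + math.pi)
    return min(candidates, key=lambda a: angular_distance(a, guide_azimuth))
```

The measured intensity follows I(θ) = (S0 + S1·cos 2θ + S2·sin 2θ) / 2, which is why four polariser angles suffice to recover S0, S1 and S2.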
XPipe: Efficient Pipeline Model Parallelism for Multi-GPU DNN Training
Title | XPipe: Efficient Pipeline Model Parallelism for Multi-GPU DNN Training |
Authors | Lei Guan, Wotao Yin, Dongsheng Li, Xicheng Lu |
Abstract | We propose XPipe, an efficient asynchronous pipeline model parallelism approach for multi-GPU DNN training. XPipe is designed to make use of multiple GPUs to concurrently and continuously train different parts of a DNN model. To improve GPU utilization and achieve high throughput, it splits a mini-batch into a set of micro-batches and allows the overlapping of the pipelines of multiple micro-batches, including those belonging to different mini-batches. Most importantly, the novel weight prediction strategy adopted by XPipe enables it to effectively address the weight inconsistency and staleness issues incurred by asynchronous pipeline parallelism. As a result, XPipe combines the advantages of both synchronous and asynchronous pipeline model parallelism approaches. Concretely, it achieves model accuracy comparable to (and even slightly better than) that of its synchronous counterpart, while obtaining higher throughput. Experimental results show that XPipe outperforms other state-of-the-art synchronous and asynchronous model parallelism approaches. |
Tasks | |
Published | 2019-10-24 |
URL | https://arxiv.org/abs/1911.04610v2 |
https://arxiv.org/pdf/1911.04610v2.pdf | |
PWC | https://paperswithcode.com/paper/xpipe-efficient-pipeline-model-parallelism |
Repo | |
Framework | |
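Two ingredients of the XPipe abstract can be sketched in a few lines of Python: splitting a mini-batch into micro-batches, and predicting the weights a micro-batch will eventually be updated with. The predictor below extrapolates `staleness` updates of the form w ← w − lr·v under the assumption that the update direction v (for example a momentum buffer) stays fixed; it is an illustrative stand-in, and XPipe's actual prediction formula may differ. Function names are hypothetical.

```python
from typing import List, Sequence


def split_into_micro_batches(batch: Sequence, num_micro: int) -> List[list]:
    """Split a mini-batch into `num_micro` contiguous micro-batches whose
    sizes differ by at most one."""
    base, extra = divmod(len(batch), num_micro)
    micro_batches, start = [], 0
    for i in range(num_micro):
        size = base + (1 if i < extra else 0)
        micro_batches.append(list(batch[start:start + size]))
        start += size
    return micro_batches


def predict_weights(
    weights: Sequence[float],
    update_direction: Sequence[float],
    lr: float,
    staleness: int,
) -> List[float]:
    """Extrapolate `staleness` updates w <- w - lr * v ahead, assuming the
    update direction v stays constant over that horizon (an assumption)."""
    return [w - staleness * lr * v for w, v in zip(weights, update_direction)]
```

In an asynchronous pipeline, a stage could run a micro-batch's forward pass with such predicted weights so that its forward and backward passes see approximately the same weight version, which is the inconsistency the abstract refers to.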
Deep learning methods in speaker recognition: a review
Title | Deep learning methods in speaker recognition: a review |
Authors | Dávid Sztahó, György Szaszák, András Beke |
Abstract | This paper summarizes the deep learning practices applied in the field of speaker recognition, covering both verification and identification. Speaker recognition has long been a widely studied topic in speech technology. Many research works have been carried out, yet little progress has been achieved in the past 5-6 years. However, as deep learning techniques advance in most machine learning fields, they are replacing the former state-of-the-art methods in speaker recognition too. DL appears to have become the state-of-the-art solution for both speaker verification and identification. Standard x-vectors, in addition to i-vectors, are used as baselines in most recent works. The increasing amount of gathered data opens up the territory to DL, where it is most effective. |
Tasks | Speaker Recognition, Speaker Verification |
Published | 2019-11-14 |
URL | https://arxiv.org/abs/1911.06615v1 |
https://arxiv.org/pdf/1911.06615v1.pdf | |
PWC | https://paperswithcode.com/paper/deep-learning-methods-in-speaker-recognition |
Repo | |
Framework | |
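Embedding-based pipelines such as x-vector and i-vector systems reduce an utterance to a fixed-length vector, after which verification and identification become scoring problems. The Python sketch below assumes cosine scoring as the back-end (PLDA is another common choice) and uses made-up toy embeddings and a hypothetical threshold; it is not taken from the review.

```python
import math
from typing import Dict, Sequence


def cosine_score(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two speaker embeddings."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def verify(enrolled: Sequence[float], test: Sequence[float], threshold: float) -> bool:
    """Speaker verification: accept the claimed identity if the score clears the threshold."""
    return cosine_score(enrolled, test) >= threshold


def identify(test: Sequence[float], enrolled: Dict[str, Sequence[float]]) -> str:
    """Closed-set speaker identification: return the best-scoring enrolled speaker."""
    return max(enrolled, key=lambda spk: cosine_score(test, enrolled[spk]))
```

The same scoring function serves both tasks: verification compares one score against a threshold, while identification takes the arg-max over all enrolled speakers.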
How noise affects the Hessian spectrum in overparameterized neural networks
Title | How noise affects the Hessian spectrum in overparameterized neural networks |
Authors | Mingwei Wei, David J Schwab |
Abstract | Stochastic gradient descent (SGD) forms the core optimization method for deep neural networks. While some theoretical progress has been made, it remains unclear why SGD leads the learning dynamics in overparameterized networks to solutions that generalize well. Here we show that for overparameterized networks with a degenerate valley in their loss landscape, SGD on average decreases the trace of the Hessian of the loss. We also generalize this result to other noise structures and show that isotropic noise in the non-degenerate subspace of the Hessian decreases its determinant. In addition to explaining SGD's role in sculpting the Hessian spectrum, this opens the door to new optimization approaches that may confer better generalization performance. We test our results with experiments on toy models and deep neural networks. |
Tasks | |
Published | 2019-10-01 |
URL | https://arxiv.org/abs/1910.00195v2 |
https://arxiv.org/pdf/1910.00195v2.pdf | |
PWC | https://paperswithcode.com/paper/how-noise-affects-the-hessian-spectrum-in |
Repo | |
Framework | |
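A toy experiment in the spirit of this abstract, written in Python. The two-parameter loss (w1·w2 − y)² has a degenerate valley of minima w1·w2 = 1, and the trace of its Hessian, 2(w1² + w2²), varies along that valley. Here the noise comes from a freshly sampled label y = 1 + σ·ξ at every step, an assumed noise model chosen for simplicity rather than the authors' experimental setup. Noiseless gradient descent started on the valley does not move, while the noisy run drifts towards the flatter, balanced minima.

```python
import random
from typing import Tuple


def hessian_trace(w1: float, w2: float) -> float:
    """Trace of the Hessian of (w1*w2 - y)^2 w.r.t. (w1, w2); it does not depend on y."""
    return 2.0 * (w1 * w1 + w2 * w2)


def run_gd(
    w1: float, w2: float, lr: float, label_noise_std: float, steps: int, seed: int = 0
) -> Tuple[float, float]:
    """Gradient descent on (w1*w2 - y)^2 with a noisy target
    y = 1 + label_noise_std * N(0, 1) resampled at every step."""
    rng = random.Random(seed)
    for _ in range(steps):
        y = 1.0 + label_noise_std * rng.gauss(0.0, 1.0)
        r = w1 * w2 - y  # residual
        g1, g2 = 2.0 * r * w2, 2.0 * r * w1  # gradient of r^2
        w1, w2 = w1 - lr * g1, w2 - lr * g2
    return w1, w2
```

One way to see the drift: each update multiplies w1² − w2² by exactly (1 − 4·lr²·r²), so any noise in the residual r steadily shrinks the imbalance between the two weights, and with it the Hessian trace, while the product w1·w2 stays near 1.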
Towards Unsupervised Learning of Generative Models for 3D Controllable Image Synthesis
Title | Towards Unsupervised Learning of Generative Models for 3D Controllable Image Synthesis |
Authors | Yiyi Liao, Katja Schwarz, Lars Mescheder, Andreas Geiger |
Abstract | In recent years, Generative Adversarial Networks have achieved impressive results in photorealistic image synthesis. This progress nurtures hopes that one day the classical rendering pipeline can be replaced by efficient models that are learned directly from images. However, current image synthesis models operate in the 2D domain where disentangling 3D properties such as camera viewpoint or object pose is challenging. Furthermore, they lack an interpretable and controllable representation. Our key hypothesis is that the image generation process should be modeled in 3D space as the physical world surrounding us is intrinsically three-dimensional. We define the new task of 3D controllable image synthesis and propose an approach for solving it by reasoning both in 3D space and in the 2D image domain. We demonstrate that our model is able to disentangle latent 3D factors of simple multi-object scenes in an unsupervised fashion from raw images. Compared to pure 2D baselines, it allows for synthesizing scenes that are consistent with respect to changes in viewpoint or object pose. We further evaluate various 3D representations in terms of their usefulness for this challenging task. |
Tasks | Image Generation |
Published | 2019-12-11 |
URL | https://arxiv.org/abs/1912.05237v2 |
https://arxiv.org/pdf/1912.05237v2.pdf | |
PWC | https://paperswithcode.com/paper/towards-unsupervised-learning-of-generative |
Repo | |
Framework | |
Toward automatic comparison of visualization techniques: Application to graph visualization
Title | Toward automatic comparison of visualization techniques: Application to graph visualization |
Authors | R. Bourqui, R. Giot, D. Auber |
Abstract | Many end-user evaluations of data visualization techniques have been run over the last decades. Their results are cornerstones for building efficient visualization systems. However, designing an evaluation is always complex and time-consuming and may end in a lack of statistical evidence. The rise of modern, efficient computer vision techniques may help visualization researchers adjust their evaluation hypotheses and thus reduce the risk of failure. In this paper, we present a methodology that uses such computer vision techniques to automatically compare the efficiency of several visualization techniques. The basis of our methodology is to generate a set of images for each compared visualization technique from a common dataset and to train machine learning models (one for each set and visualization technique) to solve a given task. Our assumption is that the performance of each model allows us to compare the efficiencies of the corresponding visualization techniques; however, as current machine learning models are not capable enough to reflect human capabilities, including their imperfections, such results should be interpreted with caution. Nevertheless, we argue that using machine learning-based evaluation as a pre-process of standard user evaluations should help researchers perform a more exhaustive study of the design space and thus improve the final user evaluation by providing better test cases. To show that our methodology can reproduce, up to a certain level, the results of user evaluations, we applied it to compare two mainstream graph visualization techniques: node-link (NL) and adjacency-matrix (MD) diagrams. We partially reproduced a user evaluation from Ghoniem et al. using two well-known deep convolutional neural networks as machine learning-based systems. Our results show that the results of Ghoniem et al. can be reproduced automatically at a larger scale with our system. |
Tasks | |
Published | 2019-10-21 |
URL | https://arxiv.org/abs/1910.09477v1 |
https://arxiv.org/pdf/1910.09477v1.pdf | |
PWC | https://paperswithcode.com/paper/toward-automatic-comparison-of-visualization |
Repo | |
Framework | |
Coevolution of Generative Adversarial Networks
Title | Coevolution of Generative Adversarial Networks |
Authors | Victor Costa, Nuno Lourenço, Penousal Machado |
Abstract | Generative adversarial networks (GANs) have become a hot topic, presenting impressive results in the field of computer vision. However, there are still open problems with the GAN model, such as training stability and the manual design of architectures. Neuroevolution is a technique that can be used to automatically design network architectures, even in search spaces as large as those of deep neural networks. Therefore, this project proposes COEGAN, a model that combines neuroevolution and coevolution in the coordination of the GAN training algorithm. The proposal uses the adversarial characteristic between the generator and discriminator components to design an algorithm using coevolution techniques. Our proposal was evaluated on the MNIST dataset. The results suggest an improvement in training stability and the automatic discovery of efficient network architectures for GANs. Our model also partially solves the mode collapse problem. |
Tasks | |
Published | 2019-12-12 |
URL | https://arxiv.org/abs/1912.06172v1 |
https://arxiv.org/pdf/1912.06172v1.pdf | |
PWC | https://paperswithcode.com/paper/coevolution-of-generative-adversarial |
Repo | |
Framework | |
Counterfactual Distribution Regression for Structured Inference
Title | Counterfactual Distribution Regression for Structured Inference |
Authors | Nicolo Colombo, Ricardo Silva, Soong M Kang, Arthur Gretton |
Abstract | We consider problems in which a system receives external perturbations from time to time. For instance, the system can be a train network in which particular lines are repeatedly disrupted without warning, having an effect on passenger behavior. The goal is to predict changes in the behavior of the system at particular points of interest, such as passenger traffic around stations at the affected rails. We assume that the data available provides records of the system functioning at its “natural regime” (e.g., the train network without disruptions) and data on cases where perturbations took place. The inference problem is how information concerning perturbations, with particular covariates such as location and time, can be generalized to predict the effect of novel perturbations. We approach this problem from the point of view of a mapping from the counterfactual distribution of the system behavior without disruptions to the distribution of the disrupted system. A variant on distribution regression is developed for this setup. |
Tasks | |
Published | 2019-08-20 |
URL | https://arxiv.org/abs/1908.07193v1 |
https://arxiv.org/pdf/1908.07193v1.pdf | |
PWC | https://paperswithcode.com/paper/counterfactual-distribution-regression-for |
Repo | |
Framework | |
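Distribution regression methods commonly represent each sample set by its kernel mean embedding, and the maximum mean discrepancy (MMD) is the distance between two such embeddings. The Python sketch below computes a biased MMD² estimate with a Gaussian kernel between two one-dimensional samples, for instance passenger counts under the natural regime and under a disruption. The data, the kernel and its bandwidth are assumptions for illustration; this shows only a common building block, not the paper's counterfactual regression estimator.

```python
import math
from typing import Sequence


def gaussian_kernel(x: float, y: float, bandwidth: float) -> float:
    """Gaussian (RBF) kernel on scalars."""
    return math.exp(-((x - y) ** 2) / (2.0 * bandwidth ** 2))


def mean_kernel(xs: Sequence[float], ys: Sequence[float], bandwidth: float) -> float:
    """Average kernel value over all pairs, i.e. the RKHS inner product
    of the two empirical mean embeddings."""
    total = sum(gaussian_kernel(a, b, bandwidth) for a in xs for b in ys)
    return total / (len(xs) * len(ys))


def mmd_squared(xs: Sequence[float], ys: Sequence[float], bandwidth: float = 1.0) -> float:
    """Biased (V-statistic) estimate of MMD^2 = ||mu_X - mu_Y||^2."""
    return (
        mean_kernel(xs, xs, bandwidth)
        + mean_kernel(ys, ys, bandwidth)
        - 2.0 * mean_kernel(xs, ys, bandwidth)
    )
```

A regression model over such embeddings can then map the natural-regime distribution, together with covariates such as location and time, to a predicted disrupted distribution.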
STaDA: Style Transfer as Data Augmentation
Title | STaDA: Style Transfer as Data Augmentation |
Authors | Xu Zheng, Tejo Chalasani, Koustav Ghosal, Sebastian Lutz, Aljosa Smolic |
Abstract | The success of training deep Convolutional Neural Networks (CNNs) heavily depends on a significant amount of labelled data. Recent research has found that neural style transfer algorithms can apply the artistic style of one image to another image without changing the latter’s high-level semantic content, which makes it feasible to employ neural style transfer as a data augmentation method to add more variation to the training dataset. The contribution of this paper is a thorough evaluation of the effectiveness of neural style transfer as a data augmentation method for image classification tasks. We explore state-of-the-art neural style transfer algorithms and apply them as a data augmentation method on the Caltech 101 and Caltech 256 datasets, where we found an improvement of around 2 percentage points (from 83% to 85%) in image classification accuracy with VGG16, compared with traditional data augmentation strategies. We also combine this new method with conventional data augmentation approaches to further improve the performance of image classification. This work shows the potential of neural style transfer in the computer vision field, such as helping us reduce the difficulty of collecting sufficient labelled data and improve the performance of generic image-based deep learning algorithms. |
Tasks | Data Augmentation, Image Classification, Style Transfer |
Published | 2019-09-03 |
URL | https://arxiv.org/abs/1909.01056v1 |
https://arxiv.org/pdf/1909.01056v1.pdf | |
PWC | https://paperswithcode.com/paper/stada-style-transfer-as-data-augmentation |
Repo | |
Framework | |
Separate and Attend in Personal Email Search
Title | Separate and Attend in Personal Email Search |
Authors | Yu Meng, Maryam Karimzadehgan, Honglei Zhuang, Donald Metzler |
Abstract | In personal email search, user queries often impose different requirements on different aspects of the retrieved emails. For example, the query “my recent flight to the US” requires emails to be ranked based on both textual contents and recency of the email documents, while other queries such as “medical history” do not impose any constraints on the recency of the email. Recent deep learning-to-rank models for personal email search often directly concatenate dense numerical features (e.g., document age) with embedded sparse features (e.g., n-gram embeddings). In this paper, we first show with a set of experiments on synthetic datasets that direct concatenation of dense and sparse features does not lead to the optimal search performance of deep neural ranking models. To effectively incorporate both sparse and dense email features into personal email search ranking, we propose a novel neural model, SepAttn. SepAttn first builds two separate neural models to learn from sparse and dense features respectively, and then applies an attention mechanism at the prediction level to derive the final prediction from these two models. We conduct a comprehensive set of experiments on a large-scale email search dataset, and demonstrate that our SepAttn model consistently improves the search quality over the baseline models. |
Tasks | Learning-To-Rank |
Published | 2019-11-21 |
URL | https://arxiv.org/abs/1911.09732v1 |
https://arxiv.org/pdf/1911.09732v1.pdf | |
PWC | https://paperswithcode.com/paper/separate-and-attend-in-personal-email-search |
Repo | |
Framework | |
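Prediction-level attention, as described for SepAttn, can be sketched in Python: each separately trained sub-model produces a relevance score plus a hidden vector, and a softmax over attention logits decides how much each score contributes. The parameterization here (logits as dot products with a learned vector `attn_vector`) is an assumption for illustration, not the paper's exact architecture; function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def attend_over_predictions(
    scores: Sequence[float],
    hidden: Sequence[Sequence[float]],
    attn_vector: Sequence[float],
) -> Tuple[float, List[float]]:
    """Fuse the scores of separate sub-models (e.g. a sparse-feature model and
    a dense-feature model) using attention weights computed from each
    sub-model's hidden vector. `attn_vector` stands in for a learned parameter."""
    logits = [sum(h * a for h, a in zip(vec, attn_vector)) for vec in hidden]
    weights = softmax(logits)
    fused = sum(w * s for w, s in zip(weights, scores))
    return fused, weights
```

Because the weights are computed per email-query pair, a recency-sensitive query can lean on the dense-feature model while a purely topical query can lean on the sparse one.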
S-Flow GAN
Title | S-Flow GAN |
Authors | Yakov Miron, Yona Coscas |
Abstract | Our work offers a new method for domain translation from semantic label maps and Computer Graphic (CG) simulation edge map images to photo-realistic images. We train a Generative Adversarial Network (GAN) in a conditional way to generate a photo-realistic version of a given CG scene. Existing GAN architectures still lack the photo-realism needed to train DNNs for computer vision tasks; we address this issue by embedding edge maps and training the network in an adversarial mode. We also offer an extension to our model that uses our GAN architecture to create visually appealing and temporally coherent videos. |
Tasks | |
Published | 2019-05-21 |
URL | https://arxiv.org/abs/1905.08474v2 |
https://arxiv.org/pdf/1905.08474v2.pdf | |
PWC | https://paperswithcode.com/paper/s-flow-gan |
Repo | |
Framework | |
Self-supervised Learning of Image Embedding for Continuous Control
Title | Self-supervised Learning of Image Embedding for Continuous Control |
Authors | Carlos Florensa, Jonas Degrave, Nicolas Heess, Jost Tobias Springenberg, Martin Riedmiller |
Abstract | Operating directly from raw high-dimensional sensory inputs like images is still a challenge for robotic control. Recently, Reinforcement Learning methods have been proposed to solve specific tasks end-to-end, from pixels to torques. However, these approaches assume access to a specified reward, which may require specialized instrumentation of the environment. Furthermore, the obtained policy and representations tend to be task specific and may not transfer well. In this work we investigate completely self-supervised learning of a general image embedding and control primitives, based on finding the shortest time to reach any state. We also introduce a new structure for the state-action value function that builds a connection between model-free and model-based methods, and improves the performance of the learning algorithm. We experimentally demonstrate these findings in three simulated robotic tasks. |
Tasks | Continuous Control |
Published | 2019-01-03 |
URL | http://arxiv.org/abs/1901.00943v1 |
http://arxiv.org/pdf/1901.00943v1.pdf | |
PWC | https://paperswithcode.com/paper/self-supervised-learning-of-image-embedding |
Repo | |
Framework | |
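The "shortest time to reach any state" objective can be illustrated with a goal-conditioned value table: with a reward of −1 per step and termination at the goal, −Q(s, a, g) counts the steps needed to reach g. The Python sketch below assumes a small deterministic 1-D chain and exact synchronous Bellman backups; the paper learns this quantity from images with function approximation, which is not reproduced here. Names are hypothetical.

```python
from typing import Dict, Tuple

ACTIONS = (-1, +1)  # move left or right along a 1-D chain of states
QTable = Dict[Tuple[int, int, int], float]


def step(state: int, action: int, n_states: int) -> int:
    """Deterministic chain dynamics; moving off either end leaves the state unchanged."""
    return min(max(state + action, 0), n_states - 1)


def shortest_time_q(n_states: int, sweeps: int) -> QTable:
    """Goal-conditioned Q-values with reward -1 per step and termination at the
    goal, computed by synchronous Bellman backups over the whole table.
    On a chain, n_states - 1 sweeps are enough for exact values."""
    q: QTable = {(s, a, g): 0.0 for s in range(n_states) for a in ACTIONS for g in range(n_states)}
    for _ in range(sweeps):
        new_q: QTable = {}
        for (s, a, g) in q:
            nxt = step(s, a, n_states)
            v_next = 0.0 if nxt == g else max(q[(nxt, b, g)] for b in ACTIONS)
            new_q[(s, a, g)] = -1.0 + v_next
        q = new_q
    return q


def time_to_reach(q: QTable, state: int, goal: int) -> float:
    """Minimum number of steps from state to goal implied by the Q-table."""
    return 0.0 if state == goal else -max(q[(state, a, goal)] for a in ACTIONS)
```

The greedy action max_a Q(s, a, g) then acts as a goal-reaching control primitive, without any task-specific reward.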
Fair Learning-to-Rank from Implicit Feedback
Title | Fair Learning-to-Rank from Implicit Feedback |
Authors | Himank Yadav, Zhengxiao Du, Thorsten Joachims |
Abstract | Addressing unfairness in rankings has become an increasingly important problem due to the growing influence of rankings in critical decision making, yet existing learning-to-rank algorithms suffer from multiple drawbacks when learning fair ranking policies from implicit feedback. Some algorithms suffer from extrinsic reasons of unfairness due to inherent selection biases in implicit feedback, leading to rich-get-richer dynamics. Meanwhile, those that address the biased nature of implicit feedback suffer from intrinsic reasons of unfairness due to the lack of explicit control over the allocation of exposure based on merit (i.e., relevance). In both cases, the learned ranking policy can be unfair and lead to suboptimal results. To this end, we propose a novel learning-to-rank framework, FULTR, that is the first to address both intrinsic and extrinsic reasons of unfairness when learning ranking policies from logged implicit feedback. Considering the needs of various applications, we define a class of amortized fairness of exposure constraints with respect to items based on their merit, and propose corresponding counterfactual estimators of disparity (aka unfairness) and utility that are also robust to click noise. Furthermore, we provide an efficient algorithm that optimizes both utility and fairness via a policy-gradient approach. To show that our proposed algorithm learns accurate and fair ranking policies from biased and noisy feedback, we provide empirical results beyond the theoretical justification of the framework. |
Tasks | Decision Making, Learning-To-Rank |
Published | 2019-11-19 |
URL | https://arxiv.org/abs/1911.08054v1 |
https://arxiv.org/pdf/1911.08054v1.pdf | |
PWC | https://paperswithcode.com/paper/fair-learning-to-rank-from-implicit-feedback |
Repo | |
Framework | |
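The notions of merit-based exposure and counterfactual merit estimates can be sketched in Python. This sketch assumes a DCG-style position-bias model, P(examined at rank r) = 1 / log2(1 + r), a plain inverse-propensity-scored (IPS) merit estimate from logged clicks, and a two-group disparity |Exposure(G0)/Merit(G0) − Exposure(G1)/Merit(G1)|. These are simplified stand-ins; FULTR's noise-robust estimators and its policy-gradient optimization are not reproduced, and the names are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def examination_prob(rank: int) -> float:
    """Assumed position-bias model: P(examined at 1-based rank) = 1 / log2(1 + rank)."""
    return 1.0 / math.log2(1 + rank)


def ips_merit(clicks: Sequence[int], ranks: Sequence[int]) -> float:
    """IPS relevance estimate from logged impressions: each click at rank r
    is up-weighted by 1 / examination_prob(r)."""
    if not clicks:
        return 0.0
    return sum(c / examination_prob(r) for c, r in zip(clicks, ranks)) / len(clicks)


def exposure_disparity(
    ranking: List[str], group_of: Dict[str, str], merit: Dict[str, float]
) -> float:
    """|Exposure(G0)/Merit(G0) - Exposure(G1)/Merit(G1)| for a ranking that
    contains items from exactly two groups."""
    exposure: Dict[str, float] = {}
    total_merit: Dict[str, float] = {}
    for rank, item in enumerate(ranking, start=1):
        g = group_of[item]
        exposure[g] = exposure.get(g, 0.0) + examination_prob(rank)
        total_merit[g] = total_merit.get(g, 0.0) + merit[item]
    g0, g1 = sorted(exposure)
    return abs(exposure[g0] / total_merit[g0] - exposure[g1] / total_merit[g1])
```

With equal merits, placing one group entirely above the other yields a larger disparity than interleaving the two groups, which is the kind of allocation a merit-based exposure constraint penalizes.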