May 7, 2019

Paper Group AWR 80

Deep Networks with Stochastic Depth

Title Deep Networks with Stochastic Depth
Authors Gao Huang, Yu Sun, Zhuang Liu, Daniel Sedra, Kilian Weinberger
Abstract Very deep convolutional networks with hundreds of layers have led to significant reductions in error on competitive benchmarks. Although the unmatched expressiveness of the many layers can be highly desirable at test time, training very deep networks comes with its own set of challenges. The gradients can vanish, the forward flow often diminishes, and the training time can be painfully slow. To address these problems, we propose stochastic depth, a training procedure that enables the seemingly contradictory setup to train short networks and use deep networks at test time. We start with very deep networks but during training, for each mini-batch, randomly drop a subset of layers and bypass them with the identity function. This simple approach complements the recent success of residual networks. It reduces training time substantially and improves the test error significantly on almost all data sets that we used for evaluation. With stochastic depth we can increase the depth of residual networks even beyond 1200 layers and still yield meaningful improvements in test error (4.91% on CIFAR-10).
Tasks Image Classification
Published 2016-03-30
URL http://arxiv.org/abs/1603.09382v3
PDF http://arxiv.org/pdf/1603.09382v3.pdf
PWC https://paperswithcode.com/paper/deep-networks-with-stochastic-depth
Repo https://github.com/dblN/stochastic_depth_keras
Framework none
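
The training rule is simple enough to capture in a few lines. Below is a minimal PyTorch sketch of a residual block with stochastic depth: during training, the residual branch is dropped (pure identity) with probability $1-p$, and at test time its output is scaled by the survival probability $p$. The conv-BN-ReLU branch and the fixed $p$ are illustrative; the paper sets per-block survival probabilities that decay linearly with depth.

```python
import torch
import torch.nn as nn

class StochasticResBlock(nn.Module):
    """Residual block trained with stochastic depth: the residual branch is
    skipped with probability 1 - p per mini-batch; at test time it is scaled
    by p so the output matches the training-time expectation."""
    def __init__(self, channels, survival_prob=0.8):
        super().__init__()
        self.p = survival_prob
        self.branch = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
        )

    def forward(self, x):
        if self.training:
            if torch.rand(()) < self.p:   # block survives this mini-batch
                return torch.relu(x + self.branch(x))
            return x                      # dropped: pure identity
        return torch.relu(x + self.p * self.branch(x))  # expectation at test time
```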

Label Noise Reduction in Entity Typing by Heterogeneous Partial-Label Embedding

Title Label Noise Reduction in Entity Typing by Heterogeneous Partial-Label Embedding
Authors Xiang Ren, Wenqi He, Meng Qu, Clare R. Voss, Heng Ji, Jiawei Han
Abstract Current systems of fine-grained entity typing use distant supervision in conjunction with existing knowledge bases to assign categories (type labels) to entity mentions. However, the type labels so obtained from knowledge bases are often noisy (i.e., incorrect for the entity mention’s local context). We define a new task, Label Noise Reduction in Entity Typing (LNR), to be the automatic identification of correct type labels (type-paths) for training examples, given the set of candidate type labels obtained by distant supervision with a given type hierarchy. The unknown type labels for individual entity mentions and the semantic similarity between entity types pose unique challenges for solving the LNR task. We propose a general framework, called PLE, to jointly embed entity mentions, text features and entity types into the same low-dimensional space, where objects whose types are semantically close have similar representations. Then we estimate the type-path for each training example in a top-down manner using the learned embeddings. We formulate a global objective for learning the embeddings from text corpora and knowledge bases, which adopts a novel margin-based loss that is robust to noisy labels and faithfully models type correlation derived from knowledge bases. Our experiments on three public typing datasets demonstrate the effectiveness and robustness of PLE, with an average of 25% improvement in accuracy compared to the next-best method.
Tasks Entity Typing, Semantic Similarity, Semantic Textual Similarity
Published 2016-02-17
URL http://arxiv.org/abs/1602.05307v1
PDF http://arxiv.org/pdf/1602.05307v1.pdf
PWC https://paperswithcode.com/paper/label-noise-reduction-in-entity-typing-by
Repo https://github.com/shanzhenren/AFET
Framework none
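
To make the margin-based idea concrete, one plausible form of a partial-label loss (schematic, not necessarily the paper's exact objective) asks that the best-scoring candidate type beat every non-candidate type by a margin:

$$\ell(m) \;=\; \max\Bigl(0,\; 1 - \max_{y \in \mathcal{Y}_m} \phi(m, y) + \max_{y' \in \bar{\mathcal{Y}}_m} \phi(m, y')\Bigr),$$

where $\mathcal{Y}_m$ are the candidate types of mention $m$ obtained by distant supervision, $\bar{\mathcal{Y}}_m$ the remaining types, and $\phi$ a similarity score between mention and type embeddings. A loss of this shape is robust to noisy candidates because only the best candidate, not every candidate, needs to score high.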

CSVideoNet: A Real-time End-to-end Learning Framework for High-frame-rate Video Compressive Sensing

Title CSVideoNet: A Real-time End-to-end Learning Framework for High-frame-rate Video Compressive Sensing
Authors Kai Xu, Fengbo Ren
Abstract This paper addresses the real-time encoding-decoding problem for high-frame-rate video compressive sensing (CS). Unlike prior works that perform reconstruction using iterative optimization-based approaches, we propose a non-iterative model, named “CSVideoNet”. CSVideoNet directly learns the inverse mapping of CS and reconstructs the original input in a single forward propagation. To overcome the limitations of existing CS cameras, we propose a multi-rate CNN and a synthesizing RNN to improve the trade-off between compression ratio (CR) and the spatial-temporal resolution of the reconstructed videos. The experimental results demonstrate that CSVideoNet significantly outperforms the state-of-the-art approaches. With no pre/post-processing, we achieve 25 dB PSNR recovery quality at 100x CR, with a frame rate of 125 fps on a Titan X GPU. Due to the feedforward, high-data-concurrency nature of CSVideoNet, it can take advantage of GPU acceleration to achieve a speed-up of three orders of magnitude over conventional iterative approaches. We share the source code at https://github.com/PSCLab-ASU/CSVideoNet.
Tasks Compressive Sensing, Video Compressive Sensing
Published 2016-12-15
URL http://arxiv.org/abs/1612.05203v5
PDF http://arxiv.org/pdf/1612.05203v5.pdf
PWC https://paperswithcode.com/paper/csvideonet-a-real-time-end-to-end-learning
Repo https://github.com/calmevtime/CSImageNet
Framework none
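
For context, the contrast can be written schematically (notation mine, not the paper's): a conventional CS decoder recovers each frame by iterative optimization, while CSVideoNet amortizes the inverse mapping into one forward pass of a learned network $f_W$:

$$\hat{x}_{\text{iter}} = \arg\min_{x}\ \tfrac{1}{2}\lVert y - \Phi x \rVert_2^2 + \lambda R(x), \qquad \hat{x}_{\text{CSVideoNet}} = f_W(y), \quad y = \Phi x,$$

with $\Phi$ the measurement matrix, $R$ a hand-crafted prior, and the compression ratio given by $\dim(x)/\dim(y)$.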

Convolutional Radio Modulation Recognition Networks

Title Convolutional Radio Modulation Recognition Networks
Authors Timothy J O’Shea, Johnathan Corgan, T. Charles Clancy
Abstract We study the adaptation of convolutional neural networks to the complex temporal radio signal domain. We compare the efficacy of radio modulation classification using naively learned features against expert features that are widely used in the field today, and we show significant performance improvements. We show that blind temporal learning on large and densely encoded time series using deep convolutional neural networks is viable and a strong candidate approach for this task, especially at low signal-to-noise ratios.
Tasks Time Series
Published 2016-02-12
URL http://arxiv.org/abs/1602.04105v3
PDF http://arxiv.org/pdf/1602.04105v3.pdf
PWC https://paperswithcode.com/paper/convolutional-radio-modulation-recognition
Repo https://github.com/randaller/cnn-rtlsdr
Framework tf
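
A hedged sketch of the kind of network involved: a small CNN over raw I/Q samples, treated as a 2-row "image" (in-phase and quadrature). The layer sizes, the 128-sample window, and the 11 output classes are illustrative choices in the spirit of the paper, not its exact architecture.

```python
import torch.nn as nn

# Input shape: (batch, 1, 2, 128) -- one "channel", two rows (I and Q),
# 128 complex samples per example.
model = nn.Sequential(
    nn.Conv2d(1, 64, kernel_size=(1, 3), padding=(0, 1)), nn.ReLU(),
    nn.Conv2d(64, 16, kernel_size=(2, 3), padding=(0, 1)), nn.ReLU(),
    nn.Flatten(),                    # 16 feature maps of size 1 x 128
    nn.Linear(16 * 128, 128), nn.ReLU(),
    nn.Linear(128, 11),              # one logit per modulation class
)
```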

Confidence driven TGV fusion

Title Confidence driven TGV fusion
Authors Valsamis Ntouskos, Fiora Pirri
Abstract We introduce a novel model for spatially varying variational data fusion, driven by point-wise confidence values. The proposed model allows for the joint estimation of the data and the confidence values based on the spatial coherence of the data. We discuss the main properties of the introduced model as well as suitable algorithms for estimating the solution of the corresponding biconvex minimization problem and their convergence. The performance of the proposed model is evaluated considering the problem of depth image fusion by using both synthetic and real data from publicly available datasets.
Tasks
Published 2016-03-30
URL http://arxiv.org/abs/1603.09302v2
PDF http://arxiv.org/pdf/1603.09302v2.pdf
PWC https://paperswithcode.com/paper/confidence-driven-tgv-fusion
Repo https://github.com/alcor-vision/confidence-fusion
Framework none
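
Schematically (my notation, not the paper's exact energy), confidence-driven fusion of several depth maps $f_i$ can be written as a joint minimization over the fused image $u$ and point-wise confidences $c_i$:

$$\min_{u,\,c}\ \sum_i \int_\Omega c_i(x)\,\lvert u(x) - f_i(x) \rvert\,dx \;+\; \mathrm{TGV}_\alpha^2(u) \;+\; \mathcal{R}(c),$$

where $\mathrm{TGV}_\alpha^2$ is the second-order total generalized variation regularizer and $\mathcal{R}$ ties the confidences to the spatial coherence of the data. A problem of this shape is biconvex, convex in $u$ for fixed $c$ and vice versa, which is why alternating minimization is the natural algorithmic choice.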

Deep Learning without Poor Local Minima

Title Deep Learning without Poor Local Minima
Authors Kenji Kawaguchi
Abstract In this paper, we prove a conjecture published in 1989 and also partially address an open problem announced at the Conference on Learning Theory (COLT) 2015. With no unrealistic assumption, we first prove the following statements for the squared loss function of deep linear neural networks with any depth and any widths: 1) the function is non-convex and non-concave, 2) every local minimum is a global minimum, 3) every critical point that is not a global minimum is a saddle point, and 4) there exist “bad” saddle points (where the Hessian has no negative eigenvalue) for the deeper networks (with more than three layers), whereas there is no bad saddle point for the shallow networks (with three layers). Moreover, for deep nonlinear neural networks, we prove the same four statements via a reduction to a deep linear model under the independence assumption adopted from recent work. As a result, we present an instance, for which we can answer the following question: how difficult is it to directly train a deep model in theory? It is more difficult than the classical machine learning models (because of the non-convexity), but not too difficult (because of the nonexistence of poor local minima). Furthermore, the mathematically proven existence of bad saddle points for deeper models would suggest a possible open problem. We note that even though we have advanced the theoretical foundations of deep learning and non-convex optimization, there is still a gap between theory and practice.
Tasks
Published 2016-05-23
URL http://arxiv.org/abs/1605.07110v3
PDF http://arxiv.org/pdf/1605.07110v3.pdf
PWC https://paperswithcode.com/paper/deep-learning-without-poor-local-minima
Repo https://github.com/yijiazh/DFER_Summer2019
Framework tf
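
For reference, the object of study in the linear case is the squared loss of a depth-$H$ linear network (notation mine):

$$\mathcal{L}(W_1, \dots, W_H) \;=\; \tfrac{1}{2}\,\bigl\lVert W_H W_{H-1} \cdots W_1 X - Y \bigr\rVert_F^2,$$

which is non-convex in the weights $W_1, \dots, W_H$ jointly even though the end-to-end map is linear; the paper's results say this landscape nevertheless contains no poor local minima.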

A Powerful Generative Model Using Random Weights for the Deep Image Representation

Title A Powerful Generative Model Using Random Weights for the Deep Image Representation
Authors Kun He, Yan Wang, John Hopcroft
Abstract To what extent is the success of deep visualization due to the training? Could we do deep visualization using untrained, random weight networks? To address this issue, we explore new and powerful generative models for three popular deep visualization tasks using untrained, random weight convolutional neural networks. First we invert representations in feature spaces and reconstruct images from white noise inputs. The reconstruction quality is statistically higher than that of the same method applied on well trained networks with the same architecture. Next we synthesize textures using scaled correlations of representations in multiple layers, and our results are almost indistinguishable from the original natural textures and from the textures synthesized using the trained network. Third, by recasting the content of an image in the style of various artworks, we create artistic images with high perceptual quality, highly competitive with the prior work of Gatys et al. on pretrained networks. To our knowledge this is the first demonstration of image representations using untrained deep neural networks. Our work provides a new and fascinating tool for studying the representations of deep network architectures and sheds new light on deep visualization.
Tasks
Published 2016-06-15
URL http://arxiv.org/abs/1606.04801v2
PDF http://arxiv.org/pdf/1606.04801v2.pdf
PWC https://paperswithcode.com/paper/a-powerful-generative-model-using-random
Repo https://github.com/inzouzouwetrust/IMA_RWCNN_project
Framework none
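
The inversion step can be sketched in a few lines: freeze a randomly initialized CNN, then optimize a white-noise input so that its features match those of a target image. The architecture, layer choice, and optimizer settings below are illustrative, not the paper's.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
net = nn.Sequential(                      # untrained: weights stay random
    nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
    nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
).eval()
for p in net.parameters():
    p.requires_grad_(False)

target = torch.rand(1, 3, 64, 64)         # stand-in for a real image
feat_target = net(target)                 # representation to invert

x = torch.randn(1, 3, 64, 64, requires_grad=True)  # white-noise start
opt = torch.optim.Adam([x], lr=0.05)
for _ in range(200):
    opt.zero_grad()
    loss = (net(x) - feat_target).pow(2).mean()  # match features
    loss.backward()                              # gradients flow to x only
    opt.step()
```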

Partial Membership Latent Dirichlet Allocation

Title Partial Membership Latent Dirichlet Allocation
Authors Chao Chen, Alina Zare, Huy Trinh, Gbeng Omotara, J. Tory Cobb, Timotius Lagaunne
Abstract Topic models (e.g., pLSA, LDA, sLDA) have been widely used for segmenting imagery. However, these models are confined to crisp segmentation, forcing a visual word (i.e., an image patch) to belong to one and only one topic. Yet, there are many images in which some regions cannot be assigned a crisp categorical label (e.g., transition regions between a foggy sky and the ground or between sand and water at a beach). In these cases, a visual word is best represented with partial memberships across multiple topics. To address this, we present a partial membership latent Dirichlet allocation (PM-LDA) model and an associated parameter estimation algorithm. This model can be useful for imagery where a visual word may be a mixture of multiple topics. Experimental results on visual and sonar imagery show that PM-LDA can produce both crisp and soft semantic image segmentations; a capability previous topic modeling methods do not have.
Tasks Topic Models
Published 2016-12-28
URL http://arxiv.org/abs/1612.08936v1
PDF http://arxiv.org/pdf/1612.08936v1.pdf
PWC https://paperswithcode.com/paper/partial-membership-latent-dirichlet-1
Repo https://github.com/TigerSense/PMLDA
Framework none
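
The contrast with standard LDA is easiest to see in the likelihood. LDA assigns each visual word $x_n$ a single crisp topic $z_n$; a partial-membership model instead gives each word a membership vector $\pi_n$ on the simplex, blending topics geometrically (a schematic form in the style of partial-membership models; the paper's exact generative story may differ):

$$p(x_n \mid \pi_n, \beta) \;\propto\; \prod_{k=1}^{K} p(x_n \mid \beta_k)^{\pi_{nk}}, \qquad \pi_{nk} \ge 0,\ \ \sum_{k} \pi_{nk} = 1,$$

so a patch in a fog-to-ground transition region can carry, say, 60% "sky" and 40% "ground" membership instead of being forced into a single topic.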

TristouNet: Triplet Loss for Speaker Turn Embedding

Title TristouNet: Triplet Loss for Speaker Turn Embedding
Authors Hervé Bredin
Abstract TristouNet is a neural network architecture based on Long Short-Term Memory recurrent networks, meant to project speech sequences into a fixed-dimensional Euclidean space. Thanks to the triplet loss paradigm used for training, the resulting sequence embeddings can be compared directly with the Euclidean distance, for speaker comparison purposes. Experiments on short (between 500 ms and 5 s) speech turn comparison and speaker change detection show that TristouNet brings significant improvements over the current state-of-the-art techniques for both tasks.
Tasks
Published 2016-09-14
URL http://arxiv.org/abs/1609.04301v3
PDF http://arxiv.org/pdf/1609.04301v3.pdf
PWC https://paperswithcode.com/paper/tristounet-triplet-loss-for-speaker-turn
Repo https://github.com/jsalt-coml/babytrain_multilabel
Framework tf
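
The loss itself is compact. A minimal PyTorch sketch (the margin value is an illustrative choice, not the paper's):

```python
import torch.nn.functional as F

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Euclidean triplet loss: pull same-speaker embeddings together and
    push different-speaker embeddings at least `margin` further apart."""
    d_pos = F.pairwise_distance(anchor, positive)   # same speaker
    d_neg = F.pairwise_distance(anchor, negative)   # different speaker
    return F.relu(d_pos - d_neg + margin).mean()
```

Because training optimizes Euclidean distances directly, the learned embeddings can be compared with plain `F.pairwise_distance` at test time, with no extra scoring model.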

Comparative evaluation of state-of-the-art algorithms for SSVEP-based BCIs

Title Comparative evaluation of state-of-the-art algorithms for SSVEP-based BCIs
Authors Vangelis P. Oikonomou, Georgios Liaros, Kostantinos Georgiadis, Elisavet Chatzilari, Katerina Adam, Spiros Nikolopoulos, Ioannis Kompatsiaris
Abstract Brain-computer interfaces (BCIs) have been gaining momentum in making human-computer interaction more natural, especially for people with neuro-muscular disabilities. Among the existing solutions the systems relying on electroencephalograms (EEG) occupy the most prominent place due to their non-invasiveness. However, the process of translating EEG signals into computer commands is far from trivial, since it requires the optimization of many different parameters that need to be tuned jointly. In this report, we focus on the category of EEG-based BCIs that rely on Steady-State-Visual-Evoked Potentials (SSVEPs) and perform a comparative evaluation of the most promising algorithms existing in the literature. More specifically, we define a set of algorithms for each of the different parameters composing a BCI system (i.e. filtering, artifact removal, feature extraction, feature selection and classification) and study each parameter independently by keeping all other parameters fixed. The results obtained from this evaluation process are provided together with a dataset consisting of the 256-channel EEG signals of 11 subjects, as well as a processing toolbox for reproducing the results and supporting further experimentation. In this way, we manage to make available for the community a state-of-the-art baseline for SSVEP-based BCIs that can be used as a basis for introducing novel methods and approaches.
Tasks EEG, Feature Selection
Published 2016-02-02
URL http://arxiv.org/abs/1602.00904v2
PDF http://arxiv.org/pdf/1602.00904v2.pdf
PWC https://paperswithcode.com/paper/comparative-evaluation-of-state-of-the-art
Repo https://github.com/akhilmurali013/project
Framework none
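
Among the classic SSVEP detectors such comparisons cover, canonical correlation analysis (CCA) against sinusoidal reference signals is the standard baseline. A hedged NumPy/scikit-learn sketch (not the toolbox's exact pipeline):

```python
import numpy as np
from sklearn.cross_decomposition import CCA

def cca_ssvep_score(eeg, freq, fs, n_harmonics=2):
    """Canonical correlation between a multi-channel EEG segment
    (samples x channels) and sin/cos references at `freq` and its
    harmonics; a higher correlation means a stronger SSVEP response."""
    t = np.arange(eeg.shape[0]) / fs
    refs = np.column_stack(
        [f(2 * np.pi * (h + 1) * freq * t)
         for h in range(n_harmonics) for f in (np.sin, np.cos)]
    )
    u, v = CCA(n_components=1).fit(eeg, refs).transform(eeg, refs)
    return np.corrcoef(u[:, 0], v[:, 0])[0, 1]

# Classify a trial by the stimulus frequency with the highest correlation:
# pred = max([8.0, 10.0, 12.0], key=lambda f: cca_ssvep_score(x, f, fs=256))
```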

A Comprehensive Performance Evaluation of Deformable Face Tracking “In-the-Wild”

Title A Comprehensive Performance Evaluation of Deformable Face Tracking “In-the-Wild”
Authors Grigorios G. Chrysos, Epameinondas Antonakos, Patrick Snape, Akshay Asthana, Stefanos Zafeiriou
Abstract Recently, technologies such as face detection, facial landmark localisation and face recognition and verification have matured enough to provide effective and efficient solutions for imagery captured under arbitrary conditions (referred to as “in-the-wild”). This is partially attributed to the fact that comprehensive “in-the-wild” benchmarks have been developed for face detection, landmark localisation and recognition/verification. A very important technology that has not been thoroughly evaluated yet is deformable face tracking “in-the-wild”. Until now, the performance has mainly been assessed qualitatively, by visual inspection of the result of a deformable face tracking technology on short videos. In this paper, we perform the first, to the best of our knowledge, thorough evaluation of state-of-the-art deformable face tracking pipelines using the recently introduced 300VW benchmark. We evaluate many different architectures focusing mainly on the task of on-line deformable face tracking. In particular, we compare the following general strategies: (a) generic face detection plus generic facial landmark localisation, (b) generic model free tracking plus generic facial landmark localisation, as well as (c) hybrid approaches using state-of-the-art face detection, model free tracking and facial landmark localisation technologies. Our evaluation reveals future avenues for further research on the topic.
Tasks Face Alignment, Face Detection, Face Recognition
Published 2016-03-18
URL http://arxiv.org/abs/1603.06015v2
PDF http://arxiv.org/pdf/1603.06015v2.pdf
PWC https://paperswithcode.com/paper/a-comprehensive-performance-evaluation-of
Repo https://github.com/zhusz/CVPR15-CFSS
Framework none
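
Hybrid pipeline (c) is easiest to convey as pseudocode: track frame-to-frame, fall back to detection when the tracker's confidence drops, and run landmark localisation on every frame. All names below are hypothetical stand-ins, not an API from the paper:

```python
def track_video(frames, detector, tracker, landmarker, conf_thresh=0.5):
    """Hypothetical hybrid face-tracking loop: model-free tracking with
    detection-based re-initialization, plus per-frame landmark fitting."""
    box = detector(frames[0])                 # bootstrap with a detection
    tracker.init(frames[0], box)
    shapes = [landmarker(frames[0], box)]
    for frame in frames[1:]:
        box, conf = tracker.update(frame)
        if conf < conf_thresh:                # tracker drifted: re-detect
            box = detector(frame)
            tracker.init(frame, box)
        shapes.append(landmarker(frame, box))
    return shapes
```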

A Survey of Deep Network Solutions for Learning Control in Robotics: From Reinforcement to Imitation

Title A Survey of Deep Network Solutions for Learning Control in Robotics: From Reinforcement to Imitation
Authors Lei Tai, Jingwei Zhang, Ming Liu, Joschka Boedecker, Wolfram Burgard
Abstract Deep learning techniques have been widely applied, achieving state-of-the-art results in various fields of study. This survey focuses on deep learning solutions that target learning control policies for robotics applications. We carry out our discussions on the two main paradigms for learning control with deep networks: deep reinforcement learning and imitation learning. For deep reinforcement learning (DRL), we begin from traditional reinforcement learning algorithms, showing how they are extended to the deep context and which mechanisms can be added on top of the DRL algorithms. We then introduce representative works that utilize DRL to solve navigation and manipulation tasks in robotics. We continue our discussion on methods addressing the challenge of the reality gap for transferring DRL policies trained in simulation to real-world scenarios, and summarize robotics simulation platforms for conducting DRL research. For imitation learning, we go through its three main categories, behavior cloning, inverse reinforcement learning and generative adversarial imitation learning, by introducing their formulations and their corresponding robotics applications. Finally, we discuss the open challenges and research frontiers.
Tasks Imitation Learning
Published 2016-12-21
URL http://arxiv.org/abs/1612.07139v4
PDF http://arxiv.org/pdf/1612.07139v4.pdf
PWC https://paperswithcode.com/paper/a-survey-of-deep-network-solutions-for
Repo https://github.com/tccnchsu/study
Framework none

Pixel-Level Domain Transfer

Title Pixel-Level Domain Transfer
Authors Donggeun Yoo, Namil Kim, Sunggyun Park, Anthony S. Paek, In So Kweon
Abstract We present an image-conditional image generation model. The model transfers an input domain to a target domain at the semantic level, and generates the target image at the pixel level. To generate realistic target images, we employ the real/fake discriminator as in Generative Adversarial Nets, but also introduce a novel domain discriminator to make the generated image relevant to the input image. We verify our model through a challenging task of generating a piece of clothing from an input image of a dressed person. We present a high-quality clothing dataset containing the two domains, and succeed in demonstrating decent results.
Tasks Conditional Image Generation, Image Generation
Published 2016-03-24
URL http://arxiv.org/abs/1603.07442v3
PDF http://arxiv.org/pdf/1603.07442v3.pdf
PWC https://paperswithcode.com/paper/pixel-level-domain-transfer
Repo https://github.com/eliceio/FashionRetrieval
Framework tf
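
Written out schematically (my notation), the two discriminators give a minimax objective of the familiar GAN shape, with an extra pair-association term:

$$\min_G \max_{D_R,\,D_A}\ \mathbb{E}_t[\log D_R(t)] + \mathbb{E}_s[\log(1 - D_R(G(s)))] + \mathbb{E}_{(s,t)}[\log D_A(s,t)] + \mathbb{E}_s[\log(1 - D_A(s, G(s)))],$$

where $s$ is a source image (a dressed person), $t$ an associated real target (the clothing item), $D_R$ judges the realism of a target image alone, and $D_A$ judges whether a (source, target) pair is plausibly associated. This is a schematic form; the paper's exact losses may differ in detail.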

On the (im)possibility of fairness

Title On the (im)possibility of fairness
Authors Sorelle A. Friedler, Carlos Scheidegger, Suresh Venkatasubramanian
Abstract What does it mean for an algorithm to be fair? Different papers use different notions of algorithmic fairness, and although these appear internally consistent, they also seem mutually incompatible. We present a mathematical setting in which the distinctions in previous papers can be made formal. In addition to characterizing the spaces of inputs (the “observed” space) and outputs (the “decision” space), we introduce the notion of a construct space: a space that captures unobservable, but meaningful variables for the prediction. We show that in order to prove desirable properties of the entire decision-making process, different mechanisms for fairness require different assumptions about the nature of the mapping from construct space to decision space. The results in this paper imply that future treatments of algorithmic fairness should more explicitly state assumptions about the relationship between constructs and observations.
Tasks Decision Making
Published 2016-09-23
URL http://arxiv.org/abs/1609.07236v1
PDF http://arxiv.org/pdf/1609.07236v1.pdf
PWC https://paperswithcode.com/paper/on-the-impossibility-of-fairness
Repo https://github.com/cteicher-m/loanBiases
Framework none
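
The setting can be summarized as a pair of maps between three spaces (notation mine):

$$\mathcal{C}\ \xrightarrow{\ g\ }\ \mathcal{O}\ \xrightarrow{\ f\ }\ \mathcal{D},$$

where $\mathcal{C}$ is the construct space of unobservable but meaningful variables, $\mathcal{O}$ the observed space of measured features, and $\mathcal{D}$ the decision space. A fairness guarantee about the learned map $f$ alone says nothing about the composite $f \circ g$ unless one also assumes something about the observation process $g$, which is where the incompatibilities between fairness notions arise.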

Reparameterization Gradients through Acceptance-Rejection Sampling Algorithms

Title Reparameterization Gradients through Acceptance-Rejection Sampling Algorithms
Authors Christian A. Naesseth, Francisco J. R. Ruiz, Scott W. Linderman, David M. Blei
Abstract Variational inference using the reparameterization trick has enabled large-scale approximate Bayesian inference in complex probabilistic models, leveraging stochastic optimization to sidestep intractable expectations. The reparameterization trick is applicable when we can simulate a random variable by applying a differentiable deterministic function on an auxiliary random variable whose distribution is fixed. For many distributions of interest (such as the gamma or Dirichlet), simulation of random variables relies on acceptance-rejection sampling. The discontinuity introduced by the accept-reject step means that standard reparameterization tricks are not applicable. We propose a new method that lets us leverage reparameterization gradients even when variables are outputs of an acceptance-rejection sampling algorithm. Our approach enables reparameterization on a larger class of variational distributions. In several studies of real and synthetic data, we show that the variance of the estimator of the gradient is significantly lower than that of other state-of-the-art methods. This leads to faster convergence of stochastic gradient variational inference.
Tasks Bayesian Inference, Stochastic Optimization
Published 2016-10-18
URL https://arxiv.org/abs/1610.05683v3
PDF https://arxiv.org/pdf/1610.05683v3.pdf
PWC https://paperswithcode.com/paper/reparameterization-gradients-through
Repo https://github.com/blei-lab/ars-reparameterization
Framework none
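
For the gamma distribution, the idea can be sketched concretely with the Marsaglia-Tsang sampler, whose transformation $h(\varepsilon, \alpha) = (\alpha - \tfrac{1}{3})(1 + \varepsilon/\sqrt{9\alpha - 3})^3$ is differentiable in the shape $\alpha$, so keeping the accepted $\varepsilon$ makes the sample itself differentiable. A minimal PyTorch sketch (valid for $\alpha \ge 1$; the paper's full estimator adds a score-function correction term that this sketch omits):

```python
import torch

def gamma_rsvi_sample(alpha):
    """One Gamma(alpha, 1) sample via Marsaglia-Tsang rejection, returned
    as a function of `alpha` so autograd can differentiate through it."""
    d = alpha - 1.0 / 3.0
    c = 1.0 / torch.sqrt(9.0 * d)
    while True:
        eps = torch.randn(())
        v = (1.0 + c * eps) ** 3
        if v <= 0:                       # proposal invalid, try again
            continue
        u = torch.rand(())
        # standard Marsaglia-Tsang acceptance test
        if torch.log(u) < 0.5 * eps**2 + d - d * v + d * torch.log(v):
            return d * v                 # differentiable w.r.t. alpha

alpha = torch.tensor(2.0, requires_grad=True)
z = gamma_rsvi_sample(alpha)
z.backward()                             # alpha.grad holds dz/dalpha
```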