July 30, 2019

3161 words 15 mins read

Paper Group AWR 18


The Trimmed Lasso: Sparsity and Robustness. Relation Networks for Object Detection. Learning to Acquire Information. A Novel Embedding Model for Knowledge Base Completion Based on Convolutional Neural Network. Linear Disentangled Representation Learning for Facial Actions. Deep Recurrent NMF for Speech Separation by Unfolding Iterative Thresholding …

The Trimmed Lasso: Sparsity and Robustness

Title The Trimmed Lasso: Sparsity and Robustness
Authors Dimitris Bertsimas, Martin S. Copenhaver, Rahul Mazumder
Abstract Nonconvex penalty methods for sparse modeling in linear regression have been a topic of fervent interest in recent years. Herein, we study a family of nonconvex penalty functions that we call the trimmed Lasso and that offers exact control over the desired level of sparsity of estimators. We analyze its structural properties and in doing so show the following: 1) Drawing parallels between robust statistics and robust optimization, we show that the trimmed-Lasso-regularized least squares problem can be viewed as a generalized form of total least squares under a specific model of uncertainty. In contrast, this same model of uncertainty, viewed instead through a robust optimization lens, leads to the convex SLOPE (or OWL) penalty. 2) Further, in relating the trimmed Lasso to commonly used sparsity-inducing penalty functions, we provide a succinct characterization of the connection between trimmed-Lasso-like approaches and penalty functions that are coordinate-wise separable, showing that the trimmed penalties subsume existing coordinate-wise separable penalties, with strict containment in general. 3) Finally, we describe a variety of exact and heuristic algorithms, both existing and new, for trimmed Lasso regularized estimation problems. We include a comparison between the different approaches and an accompanying implementation of the algorithms.
Tasks
Published 2017-08-15
URL http://arxiv.org/abs/1708.04527v1
PDF http://arxiv.org/pdf/1708.04527v1.pdf
PWC https://paperswithcode.com/paper/the-trimmed-lasso-sparsity-and-robustness
Repo https://github.com/copenhaver/trimmedlasso
Framework none
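For a concrete feel for the penalty: the trimmed Lasso of a coefficient vector is the sum of its p - k smallest absolute entries, so the k largest coefficients escape shrinkage entirely. Below is a minimal NumPy sketch of the penalty and a simple proximal-gradient heuristic; it is not the authors' implementation from the linked repo, and the step size and iteration count are illustrative assumptions.

```python
import numpy as np

def trimmed_lasso_penalty(beta, k):
    """Sum of the (p - k) smallest absolute entries of beta."""
    a = np.sort(np.abs(beta))
    return a[:-k].sum() if k > 0 else a.sum()

def prox_trimmed_l1(v, t, k):
    """Soft-threshold all but the k largest-magnitude entries of v."""
    out = v.copy()
    order = np.argsort(-np.abs(v))      # indices from largest to smallest magnitude
    small = order[k:]                   # only these entries are penalized
    out[small] = np.sign(v[small]) * np.maximum(np.abs(v[small]) - t, 0.0)
    return out

def trimmed_lasso_pg(X, y, k, lam, steps=500):
    """Proximal-gradient heuristic for 0.5*||y - X b||^2 + lam * T_k(b)."""
    n, p = X.shape
    L = np.linalg.norm(X, 2) ** 2       # Lipschitz constant of the smooth part
    b = np.zeros(p)
    for _ in range(steps):
        grad = X.T @ (X @ b - y)
        b = prox_trimmed_l1(b - grad / L, lam / L, k)
    return b
```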

Relation Networks for Object Detection

Title Relation Networks for Object Detection
Authors Han Hu, Jiayuan Gu, Zheng Zhang, Jifeng Dai, Yichen Wei
Abstract Although it has long been believed that modeling relations between objects would help object recognition, there has been no evidence that the idea works in the deep learning era. All state-of-the-art object detection systems still rely on recognizing object instances individually, without exploiting their relations during learning. This work proposes an object relation module. It processes a set of objects simultaneously through interaction between their appearance features and geometry, thus allowing modeling of their relations. It is lightweight and in-place. It does not require additional supervision and is easy to embed in existing networks. It is shown to be effective at improving the object recognition and duplicate removal steps in the modern object detection pipeline. It verifies the efficacy of modeling object relations in CNN-based detection. It gives rise to the first fully end-to-end object detector.
Tasks Object Detection, Object Recognition
Published 2017-11-30
URL http://arxiv.org/abs/1711.11575v2
PDF http://arxiv.org/pdf/1711.11575v2.pdf
PWC https://paperswithcode.com/paper/relation-networks-for-object-detection
Repo https://github.com/msracver/Relation-Networks-for-Object-Detection
Framework tf
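A rough NumPy sketch of one attention head of an object relation module, assuming precomputed appearance features and nonnegative geometry weights between boxes; the learned geometric embedding, multi-head concatenation, and the duplicate-removal network from the paper are omitted, and all shapes are assumptions.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def relation_module(feats, geom_weight, Wq, Wk, Wv):
    """One simplified relation head.

    feats       : (N, d)  appearance features of N detected objects
    geom_weight : (N, N)  nonnegative geometry weights (from relative boxes)
    Wq, Wk      : (d, d_k) query/key projections; Wv: (d, d) value projection
    """
    d_k = Wk.shape[1]
    q, k, v = feats @ Wq, feats @ Wk, feats @ Wv
    appearance = (q @ k.T) / np.sqrt(d_k)          # (N, N) appearance affinities
    # combined weight is proportional to geom_ij * exp(appearance_ij)
    attn = softmax(appearance + np.log(geom_weight + 1e-6), axis=1)
    return feats + attn @ v                        # relation features added in-place
```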

Learning to Acquire Information

Title Learning to Acquire Information
Authors Yewen Pu, Leslie P Kaelbling, Armando Solar-Lezama
Abstract We consider the problem of diagnosis, where a set of simple observations is used to infer a potentially complex hidden hypothesis. Finding the optimal subset of observations is intractable in general, so we focus on the problem of active diagnosis, where the agent selects the next most informative observation based on the results of previous observations. We show that under the assumption of uniform observation entropy, one can build an implication model which directly predicts the outcome of the potential next observation conditioned on the results of past observations, and selects the observation with the maximum entropy. This approach enjoys reduced computational complexity by bypassing the complicated hypothesis space, and can be trained on observation data alone, learning how to query without knowledge of the hidden hypothesis.
Tasks
Published 2017-04-20
URL http://arxiv.org/abs/1704.06131v2
PDF http://arxiv.org/pdf/1704.06131v2.pdf
PWC https://paperswithcode.com/paper/learning-to-acquire-information
Repo https://github.com/evanthebouncy/uai2017_learning_to_acquire_information
Framework none
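The selection rule itself is simple once an implication model exists. The sketch below assumes binary observations and a hypothetical callable `implication_model(past_obs, candidate)` that returns the predicted probability of a positive outcome; the agent then queries the candidate whose predicted outcome is most uncertain.

```python
import numpy as np

def bernoulli_entropy(p):
    p = np.clip(p, 1e-9, 1 - 1e-9)
    return -(p * np.log(p) + (1 - p) * np.log(1 - p))

def select_next_observation(implication_model, past_obs, candidates):
    """Pick the unobserved query whose predicted outcome has maximum entropy.

    implication_model(past_obs, c) -> predicted probability that observation c
    comes out positive, conditioned on the outcomes observed so far.
    """
    entropies = [bernoulli_entropy(implication_model(past_obs, c)) for c in candidates]
    return candidates[int(np.argmax(entropies))]
```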

A Novel Embedding Model for Knowledge Base Completion Based on Convolutional Neural Network

Title A Novel Embedding Model for Knowledge Base Completion Based on Convolutional Neural Network
Authors Dai Quoc Nguyen, Tu Dinh Nguyen, Dat Quoc Nguyen, Dinh Phung
Abstract In this paper, we propose a novel embedding model, named ConvKB, for knowledge base completion. Our model ConvKB advances state-of-the-art models by employing a convolutional neural network, so that it can capture global relationships and transitional characteristics between entities and relations in knowledge bases. In ConvKB, each triple (head entity, relation, tail entity) is represented as a 3-column matrix where each column vector represents a triple element. This 3-column matrix is then fed to a convolution layer where multiple filters operate on the matrix to generate different feature maps. These feature maps are then concatenated into a single feature vector representing the input triple. The feature vector is multiplied with a weight vector via a dot product to return a score. This score is then used to predict whether the triple is valid or not. Experiments show that ConvKB achieves better link prediction performance than previous state-of-the-art embedding models on the two benchmark datasets WN18RR and FB15k-237.
Tasks Knowledge Base Completion, Link Prediction
Published 2017-12-06
URL http://arxiv.org/abs/1712.02121v2
PDF http://arxiv.org/pdf/1712.02121v2.pdf
PWC https://paperswithcode.com/paper/a-novel-embedding-model-for-knowledge-base
Repo https://github.com/daiquocnguyen/ConvKB
Framework tf
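A minimal NumPy sketch of the scoring function described in the abstract: the triple embeddings form a k x 3 matrix, each 1 x 3 filter produces a k-dimensional feature map, and the concatenated maps are scored by a dot product with a weight vector. Embedding sizes and the placement of the nonlinearity are simplifications, and the sign convention of the score during training is omitted; this is not the reference implementation.

```python
import numpy as np

def convkb_score(h, r, t, filters, w):
    """ConvKB-style score for one triple (sketch).

    h, r, t : (k,) head / relation / tail embeddings
    filters : (m, 3) one 1x3 convolution filter per row
    w       : (m * k,) weight vector for the final dot product
    """
    A = np.stack([h, r, t], axis=1)                 # (k, 3) input matrix
    # each 1x3 filter slides over the k rows -> one k-dimensional feature map
    feature_maps = np.maximum(A @ filters.T, 0.0)   # (k, m), ReLU nonlinearity
    v = feature_maps.T.reshape(-1)                  # concatenate maps into one vector
    return float(v @ w)                             # scalar plausibility score
```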

Linear Disentangled Representation Learning for Facial Actions

Title Linear Disentangled Representation Learning for Facial Actions
Authors Xiang Xiang, Trac D. Tran
Abstract The limited annotated data available for recognizing facial expressions and action units hampers the training of deep networks, which could otherwise learn disentangled invariant features. A linear model with only a few parameters, by contrast, is far less demanding in terms of training data. In this paper, we propose an elegant linear model to untangle confounding factors in challenging realistic multichannel signals such as 2D face videos. The simple yet powerful model does not rely on huge training data and is natural for recognizing facial actions without explicitly disentangling the identity. Previous attempts based on well-understood intuitive linear models such as Sparse Representation-based Classification (SRC) require a preprocessing step of explicit decoupling, which is inexact in practice. Instead, we exploit the low-rank property across frames to subtract the underlying neutral faces, which are modeled jointly with a sparse representation of the action components with group sparsity enforced. On the extended Cohn-Kanade dataset (CK+), our one-shot automatic method on raw face videos performs as competitively as SRC applied on manually prepared action components, and performs even better than SRC in terms of true positive rate. We apply the model to the even more challenging task of facial action unit recognition, verified on the MPI Face Video Database (MPI-VDB), achieving decent performance. All the programs and data have been made publicly available.
Tasks Facial Action Unit Detection, Representation Learning, Sparse Representation-based Classification
Published 2017-01-11
URL http://arxiv.org/abs/1701.03102v1
PDF http://arxiv.org/pdf/1701.03102v1.pdf
PWC https://paperswithcode.com/paper/linear-disentangled-representation-learning
Repo https://github.com/eglxiang/icassp15_emotion
Framework none

Deep Recurrent NMF for Speech Separation by Unfolding Iterative Thresholding

Title Deep Recurrent NMF for Speech Separation by Unfolding Iterative Thresholding
Authors Scott Wisdom, Thomas Powers, James Pitton, Les Atlas
Abstract In this paper, we propose a novel recurrent neural network architecture for speech separation. This architecture is constructed by unfolding the iterations of a sequential iterative soft-thresholding algorithm (ISTA) that solves the optimization problem for sparse nonnegative matrix factorization (NMF) of spectrograms. We name this network architecture deep recurrent NMF (DR-NMF). The proposed DR-NMF network has three distinct advantages. First, DR-NMF provides better interpretability than other deep architectures, since the weights correspond to NMF model parameters, even after training. This interpretability also provides principled initializations that enable faster training and convergence to better solutions compared to conventional random initialization. Second, like many deep networks, DR-NMF is an order of magnitude faster at test time than NMF, since computation of the network output only requires evaluating a few layers at each time step. Third, when a limited amount of training data is available, DR-NMF exhibits stronger generalization and separation performance compared to sparse NMF and state-of-the-art long short-term memory (LSTM) networks. When a large amount of training data is available, DR-NMF achieves lower yet competitive separation performance compared to LSTM networks.
Tasks Speech Separation
Published 2017-09-21
URL http://arxiv.org/abs/1709.07124v1
PDF http://arxiv.org/pdf/1709.07124v1.pdf
PWC https://paperswithcode.com/paper/deep-recurrent-nmf-for-speech-separation-by
Repo https://github.com/stwisdom/dr-nmf
Framework none
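To illustrate what gets unfolded, here is a small NumPy sketch of the underlying ISTA iteration for nonnegative sparse coding of spectrogram frames, with the previous frame's activations used as a warm start (the recurrent part). In DR-NMF the dictionary, sparsity weight, and step size of each unfolded layer become trainable parameters; here they are fixed, and the values are assumptions.

```python
import numpy as np

def ista_nmf_layer(h, v, W, lam, eta):
    """One unfolded ISTA iteration for nonnegative sparse coding of a frame."""
    grad = W.T @ (W @ h - v)
    return np.maximum(h - eta * (grad + lam), 0.0)

def dr_nmf_forward(V, W, lam=0.1, n_layers=5):
    """Run a few unfolded ISTA layers per frame, warm-starting each frame
    from the previous one (the recurrence). V: (freq, time) spectrogram."""
    eta = 1.0 / (np.linalg.norm(W, 2) ** 2 + 1e-8)
    H = np.zeros((W.shape[1], V.shape[1]))
    h = np.zeros(W.shape[1])
    for t in range(V.shape[1]):
        for _ in range(n_layers):
            h = ista_nmf_layer(h, V[:, t], W, lam, eta)
        H[:, t] = h
    return H
```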

Video Fill In the Blank using LR/RL LSTMs with Spatial-Temporal Attentions

Title Video Fill In the Blank using LR/RL LSTMs with Spatial-Temporal Attentions
Authors Amir Mazaheri, Dong Zhang, Mubarak Shah
Abstract Given a video and a description sentence with one missing word (we call it the “source sentence”), the Video-Fill-In-the-Blank (VFIB) problem is to find the missing word automatically. The contextual information of the sentence, as well as visual cues from the video, are important for inferring the missing word accurately. Since the source sentence is broken into two fragments, the sentence’s left fragment (before the blank) and the sentence’s right fragment (after the blank), traditional Recurrent Neural Networks cannot encode this structure accurately because of the many possible variations of the missing word in terms of its location and type in the source sentence. For example, a missing word can be the first word or be in the middle of the sentence, and it can be a verb or an adjective. In this paper, we propose a framework to tackle the textual encoding: two separate LSTMs (the LR and RL LSTMs) are employed to encode the left and right sentence fragments, and a novel structure is introduced to combine each fragment with an “external memory” corresponding to the opposite fragment. For the visual encoding, end-to-end spatial and temporal attention models are employed to select discriminative visual representations to find the missing word. In the experiments, we demonstrate the superior performance of the proposed method on the challenging VFIB problem. Furthermore, we introduce an extended and more generalized version of VFIB, which is not limited to a single blank. Our experiments indicate the generalization capability of our method in dealing with such more realistic scenarios.
Tasks
Published 2017-04-15
URL http://arxiv.org/abs/1704.04689v1
PDF http://arxiv.org/pdf/1704.04689v1.pdf
PWC https://paperswithcode.com/paper/video-fill-in-the-blank-using-lrrl-lstms-with
Repo https://github.com/amirmazaheri1990/VFIB-LRRLLSTMs
Framework none
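A compact PyTorch sketch of the textual branch only: one LSTM reads the left fragment left-to-right, another reads the right fragment right-to-left, and the two final states are combined to score candidate blank words. The external-memory combination and the spatial-temporal visual attention described in the abstract are omitted, and all layer sizes are assumptions.

```python
import torch
import torch.nn as nn

class LRRLTextEncoder(nn.Module):
    """Encode the two sentence fragments around the blank (text branch only)."""
    def __init__(self, vocab_size, emb_dim=300, hid_dim=512):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.lr_lstm = nn.LSTM(emb_dim, hid_dim, batch_first=True)
        self.rl_lstm = nn.LSTM(emb_dim, hid_dim, batch_first=True)
        self.out = nn.Linear(2 * hid_dim, vocab_size)

    def forward(self, left_ids, right_ids):
        # left_ids, right_ids: (batch, seq_len) token indices
        _, (h_lr, _) = self.lr_lstm(self.emb(left_ids))
        _, (h_rl, _) = self.rl_lstm(self.emb(torch.flip(right_ids, dims=[1])))
        joint = torch.cat([h_lr[-1], h_rl[-1]], dim=-1)
        return self.out(joint)          # logits over candidate blank words
```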

Adversarial-Playground: A Visualization Suite for Adversarial Sample Generation

Title Adversarial-Playground: A Visualization Suite for Adversarial Sample Generation
Authors Andrew Norton, Yanjun Qi
Abstract With growing interest in adversarial machine learning, it is important for machine learning practitioners and users to understand how their models may be attacked. We propose a web-based visualization tool, Adversarial-Playground, to demonstrate the efficacy of common adversarial methods against a deep neural network (DNN) model, built on top of the TensorFlow library. Adversarial-Playground provides users with an efficient and effective way to explore techniques for generating adversarial examples, which are inputs crafted by an adversary to fool a machine learning system. To enable Adversarial-Playground to generate quick and accurate responses for users, we use two primary tactics: (1) We propose a faster variant of the state-of-the-art Jacobian saliency map approach that maintains a comparable evasion rate. (2) Our visualization does not transmit the generated adversarial images to the client, but rather only the matrix describing the sample and the vector representing classification likelihoods. The source code, along with the data from all of our experiments, is available at \url{https://github.com/QData/AdversarialDNN-Playground}.
Tasks
Published 2017-06-06
URL http://arxiv.org/abs/1706.01763v2
PDF http://arxiv.org/pdf/1706.01763v2.pdf
PWC https://paperswithcode.com/paper/adversarial-playground-a-visualization-suite-1
Repo https://github.com/QData/AdversarialDNN-Playground
Framework tf
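For intuition about the saliency-map family of attacks the tool visualizes, here is a schematic, heavily simplified one-pixel-at-a-time attack in NumPy. It is not the paper's faster JSMA variant and not the repo's code; `grad_fn` and `predict_fn` are hypothetical callables standing in for the TensorFlow model.

```python
import numpy as np

def saliency_attack(x, target, grad_fn, predict_fn, theta=0.1, max_pixels=50):
    """Schematic saliency-map attack (simplified).

    x          : flat numpy array of pixel intensities in [0, 1]
    grad_fn    : grad_fn(x, target) -> d(target-class score)/dx, same shape as x
    predict_fn : predict_fn(x) -> predicted class label
    """
    x_adv = x.copy()
    for _ in range(max_pixels):
        if predict_fn(x_adv) == target:
            break                               # attack succeeded
        saliency = grad_fn(x_adv, target)
        saliency[x_adv >= 1.0] = -np.inf        # cannot increase saturated pixels
        i = int(np.argmax(saliency))            # most influential pixel
        x_adv[i] = min(x_adv[i] + theta, 1.0)   # nudge it toward the target class
    return x_adv
```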

Deep reinforcement learning from human preferences

Title Deep reinforcement learning from human preferences
Authors Paul Christiano, Jan Leike, Tom B. Brown, Miljan Martic, Shane Legg, Dario Amodei
Abstract For sophisticated reinforcement learning (RL) systems to interact usefully with real-world environments, we need to communicate complex goals to these systems. In this work, we explore goals defined in terms of (non-expert) human preferences between pairs of trajectory segments. We show that this approach can effectively solve complex RL tasks without access to the reward function, including Atari games and simulated robot locomotion, while providing feedback on less than one percent of our agent’s interactions with the environment. This reduces the cost of human oversight far enough that it can be practically applied to state-of-the-art RL systems. To demonstrate the flexibility of our approach, we show that we can successfully train complex novel behaviors with about an hour of human time. These behaviors and environments are considerably more complex than any that have been previously learned from human feedback.
Tasks Atari Games
Published 2017-06-12
URL http://arxiv.org/abs/1706.03741v3
PDF http://arxiv.org/pdf/1706.03741v3.pdf
PWC https://paperswithcode.com/paper/deep-reinforcement-learning-from-human
Repo https://github.com/vcharvet/project-rl
Framework tf
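The core of the method is fitting a reward model to human comparisons of trajectory segments. Below is a minimal NumPy sketch of the per-comparison loss under the Bradley-Terry style preference model used in the paper (preference probability proportional to the exponentiated sum of predicted rewards along a segment); the label encoding is an assumption for illustration.

```python
import numpy as np

def preference_loss(r_hat_1, r_hat_2, mu):
    """Cross-entropy loss for a learned reward model on one human comparison.

    r_hat_1, r_hat_2 : (T,) predicted rewards along two trajectory segments
    mu               : human label, 1.0 if segment 1 is preferred, 0.0 if
                       segment 2 is preferred, 0.5 if judged equal
    """
    s1, s2 = r_hat_1.sum(), r_hat_2.sum()
    # P[segment 1 preferred] = exp(s1) / (exp(s1) + exp(s2))
    p1 = 1.0 / (1.0 + np.exp(s2 - s1))
    p1 = np.clip(p1, 1e-9, 1 - 1e-9)
    return -(mu * np.log(p1) + (1 - mu) * np.log(1 - p1))
```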

Symmetric Variational Autoencoder and Connections to Adversarial Learning

Title Symmetric Variational Autoencoder and Connections to Adversarial Learning
Authors Liqun Chen, Shuyang Dai, Yunchen Pu, Chunyuan Li, Qinliang Su, Lawrence Carin
Abstract A new form of the variational autoencoder (VAE) is proposed, based on the symmetric Kullback-Leibler divergence. It is demonstrated that learning of the resulting symmetric VAE (sVAE) has close connections to previously developed adversarial-learning methods. This relationship helps unify the previously distinct techniques of VAEs and adversarial learning, and provides insights that allow us to ameliorate shortcomings of some previously developed adversarial methods. In addition to an analysis that motivates and explains the sVAE, an extensive set of experiments validates the utility of the approach.
Tasks
Published 2017-09-06
URL http://arxiv.org/abs/1709.01846v2
PDF http://arxiv.org/pdf/1709.01846v2.pdf
PWC https://paperswithcode.com/paper/symmetric-variational-autoencoder-and
Repo https://github.com/LiqunChen0606/Symmetric-VAE
Framework tf

SegAN: Adversarial Network with Multi-scale $L_1$ Loss for Medical Image Segmentation

Title SegAN: Adversarial Network with Multi-scale $L_1$ Loss for Medical Image Segmentation
Authors Yuan Xue, Tao Xu, Han Zhang, Rodney Long, Xiaolei Huang
Abstract Inspired by classic generative adversarial networks (GAN), we propose a novel end-to-end adversarial neural network, called SegAN, for the task of medical image segmentation. Since image segmentation requires dense, pixel-level labeling, the single scalar real/fake output of a classic GAN’s discriminator may be ineffective in producing stable and sufficient gradient feedback to the networks. Instead, we use a fully convolutional neural network as the segmentor to generate segmentation label maps, and propose a novel adversarial critic network with a multi-scale $L_1$ loss function to force the critic and segmentor to learn both global and local features that capture long- and short-range spatial relationships between pixels. In our SegAN framework, the segmentor and critic networks are trained in an alternating fashion in a min-max game: the critic takes as input a pair of images, (original_image $\times$ predicted_label_map, original_image $\times$ ground_truth_label_map), and is trained by maximizing a multi-scale loss function; the segmentor is trained with only gradients passed along by the critic, with the aim of minimizing the multi-scale loss function. We show that such a SegAN framework is more effective and stable for the segmentation task, and it leads to better performance than the state-of-the-art U-net segmentation method. We tested our SegAN method using datasets from the MICCAI BRATS brain tumor segmentation challenge. Extensive experimental results demonstrate the effectiveness of the proposed SegAN with multi-scale loss: on BRATS 2013, SegAN gives performance comparable to the state of the art for whole tumor and tumor core segmentation while achieving better precision and sensitivity for Gd-enhanced tumor core segmentation; on BRATS 2015, SegAN achieves better performance than the state of the art in both dice score and precision.
Tasks Brain Tumor Segmentation, Medical Image Segmentation, Semantic Segmentation
Published 2017-06-06
URL http://arxiv.org/abs/1706.01805v2
PDF http://arxiv.org/pdf/1706.01805v2.pdf
PWC https://paperswithcode.com/paper/segan-adversarial-network-with-multi-scale
Repo https://github.com/iNLyze/DeepLearning-SeGAN-Segmentation
Framework tf
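A sketch of the multi-scale $L_1$ objective: the critic extracts features at several layers from the image masked by the predicted map and from the image masked by the ground-truth map, and the loss is the mean absolute difference between corresponding feature maps. `critic_features` is a hypothetical callable returning a list of NumPy feature maps; the alternating min-max training loop is omitted.

```python
import numpy as np

def multiscale_l1_loss(image, pred_mask, gt_mask, critic_features):
    """SegAN-style multi-scale L1 objective (sketch).

    critic_features(x) -> list of feature maps from several critic layers.
    The critic maximizes this value while the segmentor minimizes it.
    """
    feats_pred = critic_features(image * pred_mask)  # image masked by predicted map
    feats_gt = critic_features(image * gt_mask)      # image masked by ground truth
    return float(np.mean([np.mean(np.abs(fp - fg))
                          for fp, fg in zip(feats_pred, feats_gt)]))
```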

Grounding Referring Expressions in Images by Variational Context

Title Grounding Referring Expressions in Images by Variational Context
Authors Hanwang Zhang, Yulei Niu, Shih-Fu Chang
Abstract We focus on grounding (i.e., localizing or linking) referring expressions in images, e.g., “largest elephant standing behind baby elephant”. This is a general yet challenging vision-language task since it requires not only the localization of objects, but also the multimodal comprehension of context: visual attributes (e.g., “largest”, “baby”) and relationships (e.g., “behind”) that help to distinguish the referent from other objects, especially those of the same category. Due to the exponential complexity involved in modeling the context associated with multiple image regions, existing work oversimplifies this task to pairwise region modeling by multiple instance learning. In this paper, we propose a variational Bayesian method, called Variational Context, to solve the problem of complex context modeling in referring expression grounding. Our model exploits the reciprocal relation between the referent and context, i.e., either of them influences the estimation of the posterior distribution of the other, and thereby the search space of context can be greatly reduced, resulting in better localization of the referent. We develop a novel cue-specific language-vision embedding network that learns this reciprocity model end-to-end. We also extend the model to the unsupervised setting, where no annotation for the referent is available. Extensive experiments on various benchmarks show consistent improvement over state-of-the-art methods in both supervised and unsupervised settings.
Tasks Multiple Instance Learning
Published 2017-12-05
URL http://arxiv.org/abs/1712.01892v2
PDF http://arxiv.org/pdf/1712.01892v2.pdf
PWC https://paperswithcode.com/paper/grounding-referring-expressions-in-images-by
Repo https://github.com/yuleiniu/vc
Framework tf

Neural Wikipedian: Generating Textual Summaries from Knowledge Base Triples

Title Neural Wikipedian: Generating Textual Summaries from Knowledge Base Triples
Authors Pavlos Vougiouklis, Hady Elsahar, Lucie-Aimée Kaffee, Christoph Gravier, Frederique Laforest, Jonathon Hare, Elena Simperl
Abstract Most people do not interact with Semantic Web data directly. Unless they have the expertise to understand the underlying technology, they need textual or visual interfaces to help them make sense of it. We explore the problem of generating natural language summaries for Semantic Web data. This is non-trivial, especially in an open-domain context. To address this problem, we explore the use of neural networks. Our system encodes the information from a set of triples into a vector of fixed dimensionality and generates a textual summary by conditioning the output on the encoded vector. We train and evaluate our models on two corpora of loosely aligned Wikipedia snippets and DBpedia and Wikidata triples with promising results.
Tasks
Published 2017-11-01
URL http://arxiv.org/abs/1711.00155v1
PDF http://arxiv.org/pdf/1711.00155v1.pdf
PWC https://paperswithcode.com/paper/neural-wikipedian-generating-textual
Repo https://github.com/pvougiou/Neural-Wikipedian
Framework torch

Fully Convolutional Measurement Network for Compressive Sensing Image Reconstruction

Title Fully Convolutional Measurement Network for Compressive Sensing Image Reconstruction
Authors Jiang Du, Xuemei Xie, Chenye Wang, Guangming Shi, Xun Xu, Yuxiang Wang
Abstract Recently, deep learning methods have brought significant improvements to the task of compressive sensing image reconstruction. In existing methods, the scene is measured block by block due to the high computational complexity, which results in blocking artifacts in the recovered images. In this paper, we propose a fully convolutional measurement network in which the scene is measured as a whole. The proposed method effectively removes the blocking artifacts since the structural information of the scene image is preserved. To make the measurement more flexible, the measurement and recovery parts are jointly trained. Experiments show that the results of the proposed method outperform those of existing methods in PSNR, SSIM, and visual quality.
Tasks Compressive Sensing, Image Reconstruction
Published 2017-11-21
URL http://arxiv.org/abs/1712.01641v2
PDF http://arxiv.org/pdf/1712.01641v2.pdf
PWC https://paperswithcode.com/paper/fully-convolutional-measurement-network-for
Repo https://github.com/jiang-du/Perceptual-CS
Framework none

Deep Echo State Network (DeepESN): A Brief Survey

Title Deep Echo State Network (DeepESN): A Brief Survey
Authors Claudio Gallicchio, Alessio Micheli
Abstract The study of deep recurrent neural networks (RNNs) and, in particular, of deep Reservoir Computing (RC) is gaining increasing research attention in the neural networks community. The recently introduced Deep Echo State Network (DeepESN) model opened the way to an extremely efficient approach for designing deep neural networks for temporal data. At the same time, the study of DeepESNs has shed light on the intrinsic properties of state dynamics developed by hierarchical compositions of recurrent layers, i.e., on the bias of depth in RNN architectural design. In this paper, we summarize the advancements in the development, analysis, and applications of DeepESNs.
Tasks
Published 2017-12-12
URL http://arxiv.org/abs/1712.04323v3
PDF http://arxiv.org/pdf/1712.04323v3.pdf
PWC https://paperswithcode.com/paper/deep-echo-state-network-deepesn-a-brief
Repo https://github.com/lucasburger/pyRC
Framework none
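A minimal NumPy sketch of the layered state update that DeepESN analysis revolves around: each reservoir layer is driven by the layer below (the first by the external input) with a leaky-integrator update, and only a linear readout on the collected states would be trained. Matrix scaling, spectral-radius control, and the readout itself are omitted; the leak rate is an assumption.

```python
import numpy as np

def deepesn_states(u_seq, W_in, W_rec, leak=0.3):
    """Compute layered reservoir states of a fixed (untrained) DeepESN.

    u_seq : (T, d_in) input sequence
    W_in  : list of input matrices, one per layer (layer 0 maps the external
            input, deeper layers map the previous layer's state)
    W_rec : list of recurrent reservoir matrices, one per layer
    """
    n_layers = len(W_rec)
    states = [np.zeros(W.shape[0]) for W in W_rec]
    history = []
    for u in u_seq:
        layer_input = u
        for l in range(n_layers):
            pre = W_in[l] @ layer_input + W_rec[l] @ states[l]
            states[l] = (1 - leak) * states[l] + leak * np.tanh(pre)
            layer_input = states[l]              # feed this layer's state upward
        history.append(np.concatenate(states))   # readout typically sees all layers
    return np.array(history)
```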