Paper Group AWR 18
The Trimmed Lasso: Sparsity and Robustness. Relation Networks for Object Detection. Learning to Acquire Information. A Novel Embedding Model for Knowledge Base Completion Based on Convolutional Neural Network. Linear Disentangled Representation Learning for Facial Actions. Deep Recurrent NMF for Speech Separation by Unfolding Iterative Thresholding …
The Trimmed Lasso: Sparsity and Robustness
Title | The Trimmed Lasso: Sparsity and Robustness |
Authors | Dimitris Bertsimas, Martin S. Copenhaver, Rahul Mazumder |
Abstract | Nonconvex penalty methods for sparse modeling in linear regression have been a topic of fervent interest in recent years. Herein, we study a family of nonconvex penalty functions that we call the trimmed Lasso and that offers exact control over the desired level of sparsity of estimators. We analyze its structural properties and in doing so show the following: 1) Drawing parallels between robust statistics and robust optimization, we show that the trimmed-Lasso-regularized least squares problem can be viewed as a generalized form of total least squares under a specific model of uncertainty. In contrast, this same model of uncertainty, viewed instead through a robust optimization lens, leads to the convex SLOPE (or OWL) penalty. 2) Further, in relating the trimmed Lasso to commonly used sparsity-inducing penalty functions, we provide a succinct characterization of the connection between trimmed-Lasso-like approaches and penalty functions that are coordinate-wise separable, showing that the trimmed penalties subsume existing coordinate-wise separable penalties, with strict containment in general. 3) Finally, we describe a variety of exact and heuristic algorithms, both existing and new, for trimmed-Lasso-regularized estimation problems. We include a comparison between the different approaches and an accompanying implementation of the algorithms. |
Tasks | |
Published | 2017-08-15 |
URL | http://arxiv.org/abs/1708.04527v1 |
http://arxiv.org/pdf/1708.04527v1.pdf | |
PWC | https://paperswithcode.com/paper/the-trimmed-lasso-sparsity-and-robustness |
Repo | https://github.com/copenhaver/trimmedlasso |
Framework | none |
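A minimal NumPy sketch of the trimmed Lasso penalty described in the abstract above: it sums the p − k smallest absolute coefficients, so it vanishes exactly when the vector has at most k nonzeros, which is how it gives exact control over sparsity. The function name and toy usage are illustrative, not the paper's implementation.

```python
import numpy as np

def trimmed_lasso_penalty(beta, k, lam=1.0):
    """Trimmed Lasso T_k(beta): lam * sum of the (p - k) smallest |beta_i|.

    The penalty is zero iff beta has at most k nonzero entries.
    """
    abs_beta = np.sort(np.abs(beta))            # ascending order
    p = beta.size
    return lam * abs_beta[: max(p - k, 0)].sum()

# Toy usage: a 2-sparse vector incurs no penalty for k = 2.
beta = np.array([0.0, 3.1, 0.0, -1.2, 0.0])
print(trimmed_lasso_penalty(beta, k=2))   # 0.0
print(trimmed_lasso_penalty(beta, k=1))   # 1.2
```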
Relation Networks for Object Detection
Title | Relation Networks for Object Detection |
Authors | Han Hu, Jiayuan Gu, Zheng Zhang, Jifeng Dai, Yichen Wei |
Abstract | Although it has long been believed that modeling relations between objects would help object recognition, there has been no evidence that the idea works in the deep learning era. All state-of-the-art object detection systems still rely on recognizing object instances individually, without exploiting their relations during learning. This work proposes an object relation module. It processes a set of objects simultaneously through interaction between their appearance features and geometry, thus allowing modeling of their relations. It is lightweight and in-place. It does not require additional supervision and is easy to embed in existing networks. It is shown to be effective in improving the object recognition and duplicate removal steps in the modern object detection pipeline. It verifies the efficacy of modeling object relations in CNN-based detection and gives rise to the first fully end-to-end object detector. |
Tasks | Object Detection, Object Recognition |
Published | 2017-11-30 |
URL | http://arxiv.org/abs/1711.11575v2 |
http://arxiv.org/pdf/1711.11575v2.pdf | |
PWC | https://paperswithcode.com/paper/relation-networks-for-object-detection |
Repo | https://github.com/msracver/Relation-Networks-for-Object-Detection |
Framework | tf |
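A stripped-down, single-head PyTorch sketch of the relation idea from the abstract: attention between all proposal pairs combines an appearance term (scaled dot product of projected features) with a geometry term derived from relative box positions and sizes. The class name, feature sizes, and geometry embedding are simplified assumptions, not the paper's multi-head module.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleRelationModule(nn.Module):
    """Single-head sketch of an object relation module (illustrative sizes)."""

    def __init__(self, d_model=128, d_key=64):
        super().__init__()
        self.q = nn.Linear(d_model, d_key)
        self.k = nn.Linear(d_model, d_key)
        self.v = nn.Linear(d_model, d_model)
        self.geom = nn.Linear(4, 1)        # toy geometry embedding
        self.scale = d_key ** 0.5

    def forward(self, feats, boxes):
        # feats: (N, d_model) appearance features; boxes: (N, 4) as (x, y, w, h).
        xy, wh = boxes[:, :2], boxes[:, 2:].clamp(min=1e-3)
        d_xy = (xy[:, None, :] - xy[None, :, :]).abs() / wh[:, None, :]
        d_wh = wh[None, :, :] / wh[:, None, :]
        rel = torch.cat([d_xy, d_wh], dim=-1).clamp(min=1e-3).log()   # (N, N, 4)

        geom_w = F.relu(self.geom(rel)).squeeze(-1)                   # (N, N)
        app_w = self.q(feats) @ self.k(feats).t() / self.scale        # (N, N)
        attn = F.softmax(app_w + geom_w.clamp(min=1e-6).log(), dim=-1)
        return feats + attn @ self.v(feats)   # residual relation feature

# Toy usage with 5 proposals.
module = SimpleRelationModule()
out = module(torch.randn(5, 128), torch.rand(5, 4) * 100 + 1)
print(out.shape)   # torch.Size([5, 128])
```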
Learning to Acquire Information
Title | Learning to Acquire Information |
Authors | Yewen Pu, Leslie P Kaelbling, Armando Solar-Lezama |
Abstract | We consider the problem of diagnosis, where a set of simple observations is used to infer a potentially complex hidden hypothesis. Finding the optimal subset of observations is intractable in general, so we focus on the problem of active diagnosis, where the agent selects the next most-informative observation based on the results of previous observations. We show that under the assumption of uniform observation entropy, one can build an implication model that directly predicts the outcome of the potential next observation conditioned on the results of past observations, and selects the observation with the maximum entropy. This approach enjoys reduced computational complexity by bypassing the complicated hypothesis space, and can be trained on observation data alone, learning how to query without knowledge of the hidden hypothesis. |
Tasks | |
Published | 2017-04-20 |
URL | http://arxiv.org/abs/1704.06131v2 |
http://arxiv.org/pdf/1704.06131v2.pdf | |
PWC | https://paperswithcode.com/paper/learning-to-acquire-information |
Repo | https://github.com/evanthebouncy/uai2017_learning_to_acquire_information |
Framework | none |
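A minimal NumPy sketch of the selection rule described above: given an implication model's predicted probabilities for each candidate observation, query the unobserved one with maximum predictive entropy. The function name and the way past observations are masked are illustrative assumptions; how the implication model itself is trained is outside this sketch.

```python
import numpy as np

def next_observation(predicted_probs, observed):
    """Pick the next observation with maximum predictive (Bernoulli) entropy.

    predicted_probs: P(observation_i = 1 | past observations), from an
    implication model.  observed: boolean mask of observations already made.
    """
    p = np.clip(predicted_probs, 1e-12, 1 - 1e-12)
    entropy = -(p * np.log(p) + (1 - p) * np.log(1 - p))
    entropy[observed] = -np.inf            # never re-query a known outcome
    return int(np.argmax(entropy))

# Toy usage: the most uncertain (closest to 0.5) unobserved outcome is chosen.
probs = np.array([0.95, 0.60, 0.10, 0.48])
mask = np.array([False, False, True, False])
print(next_observation(probs, mask))       # 3
```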
A Novel Embedding Model for Knowledge Base Completion Based on Convolutional Neural Network
Title | A Novel Embedding Model for Knowledge Base Completion Based on Convolutional Neural Network |
Authors | Dai Quoc Nguyen, Tu Dinh Nguyen, Dat Quoc Nguyen, Dinh Phung |
Abstract | In this paper, we propose a novel embedding model, named ConvKB, for knowledge base completion. Our model ConvKB advances state-of-the-art models by employing a convolutional neural network, so that it can capture global relationships and transitional characteristics between entities and relations in knowledge bases. In ConvKB, each triple (head entity, relation, tail entity) is represented as a 3-column matrix where each column vector represents a triple element. This 3-column matrix is then fed to a convolution layer where multiple filters operate on the matrix to generate different feature maps. These feature maps are then concatenated into a single feature vector representing the input triple. The feature vector is multiplied by a weight vector via a dot product to return a score. This score is then used to predict whether the triple is valid or not. Experiments show that ConvKB achieves better link prediction performance than previous state-of-the-art embedding models on the two benchmark datasets WN18RR and FB15k-237. |
Tasks | Knowledge Base Completion, Link Prediction |
Published | 2017-12-06 |
URL | http://arxiv.org/abs/1712.02121v2 |
http://arxiv.org/pdf/1712.02121v2.pdf | |
PWC | https://paperswithcode.com/paper/a-novel-embedding-model-for-knowledge-base |
Repo | https://github.com/daiquocnguyen/ConvKB |
Framework | tf |
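A PyTorch sketch of the scoring pipeline the abstract describes: a triple is stacked as a k × 3 matrix of embeddings, convolved with 1 × 3 filters, and the concatenated feature maps are scored by a dot product with a weight vector. Embedding dimension, filter count, and the class name are illustrative, not the released ConvKB configuration.

```python
import torch
import torch.nn as nn

class ConvKBScore(nn.Module):
    """Sketch of a ConvKB-style triple scoring function (illustrative sizes)."""

    def __init__(self, num_entities, num_relations, dim=50, num_filters=64):
        super().__init__()
        self.ent = nn.Embedding(num_entities, dim)
        self.rel = nn.Embedding(num_relations, dim)
        # Each filter spans one embedding row across the 3 columns (h, r, t).
        self.conv = nn.Conv2d(1, num_filters, kernel_size=(1, 3))
        self.w = nn.Linear(dim * num_filters, 1, bias=False)

    def forward(self, h, r, t):
        # Shape (batch, 1, dim, 3): one "image" per triple.
        x = torch.stack([self.ent(h), self.rel(r), self.ent(t)], dim=-1).unsqueeze(1)
        feats = torch.relu(self.conv(x))             # (batch, filters, dim, 1)
        return self.w(feats.flatten(start_dim=1))    # (batch, 1) triple score

# Toy usage.
model = ConvKBScore(num_entities=100, num_relations=10)
score = model(torch.tensor([3]), torch.tensor([1]), torch.tensor([7]))
print(score.shape)   # torch.Size([1, 1])
```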
Linear Disentangled Representation Learning for Facial Actions
Title | Linear Disentangled Representation Learning for Facial Actions |
Authors | Xiang Xiang, Trac D. Tran |
Abstract | The limited annotated data available for the recognition of facial expressions and action units hampers the training of deep networks, which could otherwise learn disentangled invariant features. However, a linear model with just a few parameters is normally not demanding in terms of training data. In this paper, we propose an elegant linear model to untangle confounding factors in challenging realistic multichannel signals such as 2D face videos. The simple yet powerful model does not rely on huge training data and is natural for recognizing facial actions without explicitly disentangling the identity. Previous attempts based on well-understood intuitive linear models, such as Sparse Representation based Classification (SRC), require a preprocessing step of explicit decoupling that is practically inexact. Instead, we exploit the low-rank property across frames to subtract the underlying neutral faces, which are modeled jointly with a sparse representation of the action components under a group-sparsity constraint. On the extended Cohn-Kanade dataset (CK+), our one-shot automatic method on raw face videos performs as competitively as SRC applied to manually prepared action components and performs even better than SRC in terms of true positive rate. We apply the model to the even more challenging task of facial action unit recognition, verified on the MPI Face Video Database (MPI-VDB), achieving decent performance. All the programs and data have been made publicly available. |
Tasks | Facial Action Unit Detection, Representation Learning, Sparse Representation-based Classification |
Published | 2017-01-11 |
URL | http://arxiv.org/abs/1701.03102v1 |
http://arxiv.org/pdf/1701.03102v1.pdf | |
PWC | https://paperswithcode.com/paper/linear-disentangled-representation-learning |
Repo | https://github.com/eglxiang/icassp15_emotion |
Framework | none |
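A rough NumPy sketch of the modeling idea above, assuming frames are stacked as columns of a matrix D that splits into a low-rank neutral-face part L plus a group-sparse action part S. The naive alternating proximal scheme, thresholds, and function names below are illustrative assumptions; they are not the paper's exact formulation or solver.

```python
import numpy as np

def svt(M, tau):
    """Singular value thresholding: proximal operator of the nuclear norm."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0)) @ Vt

def group_soft_threshold(M, tau):
    """Row-wise group soft-thresholding: proximal operator of the l2,1 norm."""
    norms = np.linalg.norm(M, axis=1, keepdims=True)
    scale = np.maximum(1 - tau / np.maximum(norms, 1e-12), 0)
    return M * scale

def decompose(D, tau_l=1.0, tau_s=0.1, iters=50):
    """Alternating low-rank (neutral face) + group-sparse (action) split D ~ L + S."""
    L = np.zeros_like(D)
    S = np.zeros_like(D)
    for _ in range(iters):
        L = svt(D - S, tau_l)
        S = group_soft_threshold(D - L, tau_s)
    return L, S

# Toy usage: 100-dim frames over 20 time steps with a shared "neutral" component.
rng = np.random.default_rng(0)
D = np.outer(rng.standard_normal(100), np.ones(20)) + 0.1 * rng.standard_normal((100, 20))
L, S = decompose(D)
print(np.linalg.matrix_rank(L, tol=1e-6))
```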
Deep Recurrent NMF for Speech Separation by Unfolding Iterative Thresholding
Title | Deep Recurrent NMF for Speech Separation by Unfolding Iterative Thresholding |
Authors | Scott Wisdom, Thomas Powers, James Pitton, Les Atlas |
Abstract | In this paper, we propose a novel recurrent neural network architecture for speech separation. This architecture is constructed by unfolding the iterations of a sequential iterative soft-thresholding algorithm (ISTA) that solves the optimization problem for sparse nonnegative matrix factorization (NMF) of spectrograms. We name this network architecture deep recurrent NMF (DR-NMF). The proposed DR-NMF network has three distinct advantages. First, DR-NMF provides better interpretability than other deep architectures, since the weights correspond to NMF model parameters, even after training. This interpretability also provides principled initializations that enable faster training and convergence to better solutions compared to conventional random initialization. Second, like many deep networks, DR-NMF is an order of magnitude faster at test time than NMF, since computation of the network output only requires evaluating a few layers at each time step. Third, when a limited amount of training data is available, DR-NMF exhibits stronger generalization and separation performance compared to sparse NMF and state-of-the-art long short-term memory (LSTM) networks. When a large amount of training data is available, DR-NMF achieves lower yet competitive separation performance compared to LSTM networks. |
Tasks | Speech Separation |
Published | 2017-09-21 |
URL | http://arxiv.org/abs/1709.07124v1 |
http://arxiv.org/pdf/1709.07124v1.pdf | |
PWC | https://paperswithcode.com/paper/deep-recurrent-nmf-for-speech-separation-by |
Repo | https://github.com/stwisdom/dr-nmf |
Framework | none |
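A minimal NumPy sketch of the ISTA iteration that DR-NMF unfolds: sparse nonnegative coding of a spectrogram frame against a fixed dictionary. Each loop iteration corresponds to one layer of the unfolded network; in DR-NMF the dictionary, step size, and sparsity weight become trainable per-layer parameters, which this sketch keeps fixed. Names and sizes are illustrative.

```python
import numpy as np

def ista_sparse_nmf(x, W, lam=0.1, n_layers=10):
    """Unrolled ISTA for min_h 0.5*||x - W h||^2 + lam*||h||_1 with h >= 0.

    Each iteration is one "layer" of the unfolded network.
    """
    eta = 1.0 / np.linalg.norm(W, 2) ** 2       # step size from the Lipschitz constant
    h = np.zeros(W.shape[1])
    for _ in range(n_layers):
        grad = W.T @ (W @ h - x)
        h = np.maximum(h - eta * (grad + lam), 0.0)   # nonnegative soft-threshold
    return h

# Toy usage: 5 nonnegative basis spectra of dimension 30.
rng = np.random.default_rng(0)
W = np.abs(rng.standard_normal((30, 5)))
x = W @ np.array([1.0, 0.0, 0.5, 0.0, 0.0])
print(np.round(ista_sparse_nmf(x, W, lam=0.01, n_layers=200), 2))
```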
Video Fill In the Blank using LR/RL LSTMs with Spatial-Temporal Attentions
Title | Video Fill In the Blank using LR/RL LSTMs with Spatial-Temporal Attentions |
Authors | Amir Mazaheri, Dong Zhang, Mubarak Shah |
Abstract | Given a video and a description sentence with one missing word (we call it the “source sentence”), the Video-Fill-In-the-Blank (VFIB) problem is to find the missing word automatically. The contextual information of the sentence, as well as visual cues from the video, are important to infer the missing word accurately. Since the source sentence is broken into two fragments, the sentence’s left fragment (before the blank) and the sentence’s right fragment (after the blank), traditional Recurrent Neural Networks cannot encode this structure accurately because of the many possible variations of the missing word in terms of its location and type in the source sentence. For example, a missing word can be the first word or be in the middle of the sentence, and it can be a verb or an adjective. In this paper, we propose a framework to tackle the textual encoding: two separate LSTMs (the LR and RL LSTMs) are employed to encode the left and right sentence fragments, and a novel structure is introduced to combine each fragment with an “external memory” corresponding to the opposite fragment. For the visual encoding, end-to-end spatial and temporal attention models are employed to select discriminative visual representations to find the missing word. In the experiments, we demonstrate the superior performance of the proposed method on the challenging VFIB problem. Furthermore, we introduce an extended and more generalized version of VFIB, which is not limited to a single blank. Our experiments indicate the generalization capability of our method in dealing with these more realistic scenarios. |
Tasks | |
Published | 2017-04-15 |
URL | http://arxiv.org/abs/1704.04689v1 |
http://arxiv.org/pdf/1704.04689v1.pdf | |
PWC | https://paperswithcode.com/paper/video-fill-in-the-blank-using-lrrl-lstms-with |
Repo | https://github.com/amirmazaheri1990/VFIB-LRRLLSTMs |
Framework | none |
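A PyTorch sketch of the two-LSTM textual encoding described above: one LSTM reads the left fragment left-to-right, another reads the right fragment right-to-left, and their final states are fused to predict the missing word. The fusion, vocabulary size, and class name are illustrative assumptions; the paper's external memory and visual attention components are omitted.

```python
import torch
import torch.nn as nn

class LRRLTextEncoder(nn.Module):
    """Sketch of LR/RL LSTM encoding for fill-in-the-blank (illustrative sizes)."""

    def __init__(self, vocab_size=1000, emb=64, hidden=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb)
        self.lstm_lr = nn.LSTM(emb, hidden, batch_first=True)
        self.lstm_rl = nn.LSTM(emb, hidden, batch_first=True)
        self.out = nn.Linear(2 * hidden, vocab_size)

    def forward(self, left_ids, right_ids):
        # left_ids: (B, T_left) tokens before the blank, in reading order.
        # right_ids: (B, T_right) tokens after the blank; reversed so the RL
        # LSTM reads from the end of the sentence toward the blank.
        _, (h_lr, _) = self.lstm_lr(self.embed(left_ids))
        _, (h_rl, _) = self.lstm_rl(self.embed(right_ids.flip(dims=[1])))
        fused = torch.cat([h_lr[-1], h_rl[-1]], dim=-1)
        return self.out(fused)             # (B, vocab) logits for the blank

# Toy usage.
enc = LRRLTextEncoder()
logits = enc(torch.randint(0, 1000, (2, 5)), torch.randint(0, 1000, (2, 7)))
print(logits.shape)   # torch.Size([2, 1000])
```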
Adversarial-Playground: A Visualization Suite for Adversarial Sample Generation
Title | Adversarial-Playground: A Visualization Suite for Adversarial Sample Generation |
Authors | Andrew Norton, Yanjun Qi |
Abstract | With growing interest in adversarial machine learning, it is important for machine learning practitioners and users to understand how their models may be attacked. We propose a web-based visualization tool, Adversarial-Playground, to demonstrate the efficacy of common adversarial methods against a deep neural network (DNN) model, built on top of the TensorFlow library. Adversarial-Playground gives users an efficient and effective way to explore techniques for generating adversarial examples, which are inputs crafted by an adversary to fool a machine learning system. To enable Adversarial-Playground to generate quick and accurate responses for users, we use two primary tactics: (1) We propose a faster variant of the state-of-the-art Jacobian saliency map approach that maintains a comparable evasion rate. (2) Our visualization does not transmit the generated adversarial images to the client, but rather only the matrix describing the sample and the vector representing classification likelihoods. The source code, along with the data from all of our experiments, is available at \url{https://github.com/QData/AdversarialDNN-Playground}. |
Tasks | |
Published | 2017-06-06 |
URL | http://arxiv.org/abs/1706.01763v2 |
http://arxiv.org/pdf/1706.01763v2.pdf | |
PWC | https://paperswithcode.com/paper/adversarial-playground-a-visualization-suite-1 |
Repo | https://github.com/QData/AdversarialDNN-Playground |
Framework | tf |
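A didactic PyTorch sketch of the Jacobian-saliency idea that the tool builds on: compute the class-probability Jacobian with respect to the input, score each feature by how strongly it pushes toward the target class and away from the others, and perturb the single most salient feature. The function name and the perturbation rule are illustrative assumptions, not the paper's faster variant or the repository's code.

```python
import torch
import torch.nn as nn
from torch.autograd.functional import jacobian

def saliency_step(model, x, target, eps=0.2):
    """One simplified Jacobian-saliency perturbation toward a target class."""
    x = x.clone().detach()
    J = jacobian(lambda inp: torch.softmax(model(inp.unsqueeze(0)), dim=1).squeeze(0), x)
    grad_t = J[target]                      # d p_target / d x
    grad_other = J.sum(dim=0) - grad_t      # summed gradient of the other classes
    # Salient features increase the target probability and decrease the rest.
    saliency = torch.where((grad_t > 0) & (grad_other < 0),
                           grad_t * grad_other.abs(),
                           torch.zeros_like(grad_t))
    idx = torch.argmax(saliency)
    x.view(-1)[idx] += eps                  # perturb the most salient feature
    return x.clamp(0, 1)

# Toy usage on a tiny classifier over 8-dimensional inputs.
model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 3))
x_adv = saliency_step(model, torch.rand(8), target=2)
print(x_adv.shape)   # torch.Size([8])
```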
Deep reinforcement learning from human preferences
Title | Deep reinforcement learning from human preferences |
Authors | Paul Christiano, Jan Leike, Tom B. Brown, Miljan Martic, Shane Legg, Dario Amodei |
Abstract | For sophisticated reinforcement learning (RL) systems to interact usefully with real-world environments, we need to communicate complex goals to these systems. In this work, we explore goals defined in terms of (non-expert) human preferences between pairs of trajectory segments. We show that this approach can effectively solve complex RL tasks without access to the reward function, including Atari games and simulated robot locomotion, while providing feedback on less than one percent of our agent’s interactions with the environment. This reduces the cost of human oversight far enough that it can be practically applied to state-of-the-art RL systems. To demonstrate the flexibility of our approach, we show that we can successfully train complex novel behaviors with about an hour of human time. These behaviors and environments are considerably more complex than any that have been previously learned from human feedback. |
Tasks | Atari Games |
Published | 2017-06-12 |
URL | http://arxiv.org/abs/1706.03741v3 |
http://arxiv.org/pdf/1706.03741v3.pdf | |
PWC | https://paperswithcode.com/paper/deep-reinforcement-learning-from-human |
Repo | https://github.com/vcharvet/project-rl |
Framework | tf |
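A PyTorch sketch of the core learning signal described above: a reward network scores each step of a trajectory segment, the probability that one segment is preferred over another is a softmax (Bradley-Terry) over the summed predicted rewards, and the network is trained by cross-entropy against the human label. Observation size, network width, and optimizer settings are illustrative assumptions.

```python
import torch
import torch.nn as nn

class RewardModel(nn.Module):
    """Reward model fit from pairwise human preferences (illustrative sizes)."""

    def __init__(self, obs_dim=4, hidden=32):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 1))

    def segment_return(self, seg):           # seg: (T, obs_dim)
        return self.net(seg).sum()           # summed predicted reward

    def preference_loss(self, seg_a, seg_b, label):
        # label = 1.0 if the human preferred segment A, else 0.0.
        logits = torch.stack([self.segment_return(seg_a),
                              self.segment_return(seg_b)])
        log_probs = torch.log_softmax(logits, dim=0)
        return -(label * log_probs[0] + (1 - label) * log_probs[1])

# Toy usage: one preference comparison, one gradient step.
model = RewardModel()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss = model.preference_loss(torch.randn(20, 4), torch.randn(20, 4), label=1.0)
loss.backward()
opt.step()
print(float(loss))
```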
Symmetric Variational Autoencoder and Connections to Adversarial Learning
Title | Symmetric Variational Autoencoder and Connections to Adversarial Learning |
Authors | Liqun Chen, Shuyang Dai, Yunchen Pu, Chunyuan Li, Qinliang Su, Lawrence Carin |
Abstract | A new form of the variational autoencoder (VAE) is proposed, based on the symmetric Kullback-Leibler divergence. It is demonstrated that learning of the resulting symmetric VAE (sVAE) has close connections to previously developed adversarial-learning methods. This relationship helps unify the previously distinct techniques of VAEs and adversarial learning, and provides insights that allow us to ameliorate shortcomings of some previously developed adversarial methods. In addition to an analysis that motivates and explains the sVAE, an extensive set of experiments validates the utility of the approach. |
Tasks | |
Published | 2017-09-06 |
URL | http://arxiv.org/abs/1709.01846v2 |
http://arxiv.org/pdf/1709.01846v2.pdf | |
PWC | https://paperswithcode.com/paper/symmetric-variational-autoencoder-and |
Repo | https://github.com/LiqunChen0606/Symmetric-VAE |
Framework | tf |
SegAN: Adversarial Network with Multi-scale $L_1$ Loss for Medical Image Segmentation
Title | SegAN: Adversarial Network with Multi-scale $L_1$ Loss for Medical Image Segmentation |
Authors | Yuan Xue, Tao Xu, Han Zhang, Rodney Long, Xiaolei Huang |
Abstract | Inspired by classic generative adversarial networks (GAN), we propose a novel end-to-end adversarial neural network, called SegAN, for the task of medical image segmentation. Since image segmentation requires dense, pixel-level labeling, the single scalar real/fake output of a classic GAN’s discriminator may be ineffective in producing stable and sufficient gradient feedback to the networks. Instead, we use a fully convolutional neural network as the segmentor to generate segmentation label maps, and propose a novel adversarial critic network with a multi-scale $L_1$ loss function to force the critic and segmentor to learn both global and local features that capture long- and short-range spatial relationships between pixels. In our SegAN framework, the segmentor and critic networks are trained in an alternating fashion in a min-max game: The critic takes as input a pair of images, (original_image $\circ$ predicted_label_map, original_image $\circ$ ground_truth_label_map), and is trained by maximizing a multi-scale loss function; the segmentor is trained with only the gradients passed along by the critic, with the aim of minimizing the multi-scale loss function. We show that such a SegAN framework is more effective and stable for the segmentation task, and it leads to better performance than the state-of-the-art U-net segmentation method. We tested our SegAN method using datasets from the MICCAI BRATS brain tumor segmentation challenge. Extensive experimental results demonstrate the effectiveness of the proposed SegAN with multi-scale loss: on BRATS 2013, SegAN gives performance comparable to the state-of-the-art for whole tumor and tumor core segmentation while achieving better precision and sensitivity for Gd-enhanced tumor core segmentation; on BRATS 2015, SegAN achieves better performance than the state-of-the-art in both dice score and precision. |
Tasks | Brain Tumor Segmentation, Medical Image Segmentation, Semantic Segmentation |
Published | 2017-06-06 |
URL | http://arxiv.org/abs/1706.01805v2 |
http://arxiv.org/pdf/1706.01805v2.pdf | |
PWC | https://paperswithcode.com/paper/segan-adversarial-network-with-multi-scale |
Repo | https://github.com/iNLyze/DeepLearning-SeGAN-Segmentation |
Framework | tf |
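A PyTorch sketch of the multi-scale $L_1$ critic loss the abstract describes: the critic extracts features at several scales from an image masked by a label map, and the loss is the mean absolute difference between the features of (image × predicted mask) and (image × ground-truth mask), averaged over scales. The tiny backbone, channel counts, and class name are illustrative assumptions, not the paper's critic architecture.

```python
import torch
import torch.nn as nn

class MultiScaleL1Critic(nn.Module):
    """Toy critic computing a SegAN-style multi-scale L1 loss."""

    def __init__(self, in_ch=1):
        super().__init__()
        self.blocks = nn.ModuleList([
            nn.Sequential(nn.Conv2d(in_ch, 16, 3, stride=2, padding=1), nn.LeakyReLU(0.2)),
            nn.Sequential(nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.LeakyReLU(0.2)),
            nn.Sequential(nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.LeakyReLU(0.2)),
        ])

    def features(self, x):
        feats = []
        for block in self.blocks:
            x = block(x)
            feats.append(x)        # one feature map per scale
        return feats

    def forward(self, image, pred_mask, gt_mask):
        f_pred = self.features(image * pred_mask)
        f_gt = self.features(image * gt_mask)
        return sum(torch.mean(torch.abs(a - b)) for a, b in zip(f_pred, f_gt)) / len(f_pred)

# Toy usage: the segmentor minimizes this loss while the critic maximizes it.
critic = MultiScaleL1Critic()
img = torch.rand(2, 1, 64, 64)
loss = critic(img, torch.rand(2, 1, 64, 64), torch.randint(0, 2, (2, 1, 64, 64)).float())
print(float(loss))
```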
Grounding Referring Expressions in Images by Variational Context
Title | Grounding Referring Expressions in Images by Variational Context |
Authors | Hanwang Zhang, Yulei Niu, Shih-Fu Chang |
Abstract | We focus on grounding (i.e., localizing or linking) referring expressions in images, e.g., “largest elephant standing behind baby elephant”. This is a general yet challenging vision-language task, since it requires not only the localization of objects but also the multimodal comprehension of context — visual attributes (e.g., “largest”, “baby”) and relationships (e.g., “behind”) that help to distinguish the referent from other objects, especially those of the same category. Due to the exponential complexity involved in modeling the context associated with multiple image regions, existing work oversimplifies this task to pairwise region modeling by multiple instance learning. In this paper, we propose a variational Bayesian method, called Variational Context, to solve the problem of complex context modeling in referring expression grounding. Our model exploits the reciprocal relation between the referent and context, i.e., either of them influences the estimation of the posterior distribution of the other, so the search space of context can be greatly reduced, resulting in better localization of the referent. We develop a novel cue-specific language-vision embedding network that learns this reciprocity model end-to-end. We also extend the model to the unsupervised setting where no annotation for the referent is available. Extensive experiments on various benchmarks show consistent improvement over state-of-the-art methods in both supervised and unsupervised settings. |
Tasks | Multiple Instance Learning |
Published | 2017-12-05 |
URL | http://arxiv.org/abs/1712.01892v2 |
http://arxiv.org/pdf/1712.01892v2.pdf | |
PWC | https://paperswithcode.com/paper/grounding-referring-expressions-in-images-by |
Repo | https://github.com/yuleiniu/vc |
Framework | tf |
Neural Wikipedian: Generating Textual Summaries from Knowledge Base Triples
Title | Neural Wikipedian: Generating Textual Summaries from Knowledge Base Triples |
Authors | Pavlos Vougiouklis, Hady Elsahar, Lucie-Aimée Kaffee, Christoph Gravier, Frederique Laforest, Jonathon Hare, Elena Simperl |
Abstract | Most people do not interact with Semantic Web data directly. Unless they have the expertise to understand the underlying technology, they need textual or visual interfaces to help them make sense of it. We explore the problem of generating natural language summaries for Semantic Web data. This is non-trivial, especially in an open-domain context. To address this problem, we explore the use of neural networks. Our system encodes the information from a set of triples into a vector of fixed dimensionality and generates a textual summary by conditioning the output on the encoded vector. We train and evaluate our models on two corpora of loosely aligned Wikipedia snippets and DBpedia and Wikidata triples with promising results. |
Tasks | |
Published | 2017-11-01 |
URL | http://arxiv.org/abs/1711.00155v1 |
http://arxiv.org/pdf/1711.00155v1.pdf | |
PWC | https://paperswithcode.com/paper/neural-wikipedian-generating-textual |
Repo | https://github.com/pvougiou/Neural-Wikipedian |
Framework | torch |
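A PyTorch sketch of the encode-then-decode idea described above: a set of (subject, predicate, object) triples is embedded, pooled into one fixed-size vector, and a recurrent decoder generates the summary conditioned on it. The pooling choice, GRU decoder, vocabulary sizes, and class name are all illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class TriplesToText(nn.Module):
    """Toy triples-to-summary model: pooled triple encoder + GRU decoder."""

    def __init__(self, n_items=500, n_words=2000, dim=64):
        super().__init__()
        self.item_emb = nn.Embedding(n_items, dim)      # entities and relations
        self.word_emb = nn.Embedding(n_words, dim)
        self.triple_proj = nn.Linear(3 * dim, dim)
        self.decoder = nn.GRU(dim, dim, batch_first=True)
        self.out = nn.Linear(dim, n_words)

    def encode(self, triples):                          # triples: (B, n_triples, 3)
        e = self.item_emb(triples)                      # (B, n, 3, dim)
        t = torch.tanh(self.triple_proj(e.flatten(start_dim=2)))
        return t.mean(dim=1)                            # (B, dim) fixed-size code

    def forward(self, triples, words):                  # words: (B, T) summary tokens
        h0 = self.encode(triples).unsqueeze(0)          # initial decoder state
        out, _ = self.decoder(self.word_emb(words), h0)
        return self.out(out)                            # (B, T, n_words) logits

# Toy usage.
model = TriplesToText()
logits = model(torch.randint(0, 500, (2, 4, 3)), torch.randint(0, 2000, (2, 10)))
print(logits.shape)   # torch.Size([2, 10, 2000])
```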
Fully Convolutional Measurement Network for Compressive Sensing Image Reconstruction
Title | Fully Convolutional Measurement Network for Compressive Sensing Image Reconstruction |
Authors | Jiang Du, Xuemei Xie, Chenye Wang, Guangming Shi, Xun Xu, Yuxiang Wang |
Abstract | Recently, deep learning methods have brought significant improvements to the compressive sensing image reconstruction task. In existing methods, the scene is measured block by block due to the high computational complexity, which introduces block effects in the recovered images. In this paper, we propose a fully convolutional measurement network, where the scene is measured as a whole. The proposed method effectively removes the block effect, since the structural information of the scene image is preserved. To make the measurement more flexible, the measurement and recovery parts are jointly trained. Experiments show that the results of the proposed method outperform those of existing methods in PSNR, SSIM, and visual quality. |
Tasks | Compressive Sensing, Image Reconstruction |
Published | 2017-11-21 |
URL | http://arxiv.org/abs/1712.01641v2 |
http://arxiv.org/pdf/1712.01641v2.pdf | |
PWC | https://paperswithcode.com/paper/fully-convolutional-measurement-network-for |
Repo | https://github.com/jiang-du/Perceptual-CS |
Framework | none |
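A PyTorch sketch of the jointly trained measurement-plus-recovery idea: a strided convolution measures the whole image at once (no block splitting), and a small transposed-convolution decoder reconstructs it, with both parts optimized together under an MSE loss. The layer sizes, sampling ratio, and class name are illustrative assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn

class FullyConvCS(nn.Module):
    """Toy fully convolutional measurement + recovery pipeline."""

    def __init__(self):
        super().__init__()
        # Measurement: a stride-8 convolution compresses the whole 1-channel image.
        self.measure = nn.Conv2d(1, 8, kernel_size=8, stride=8, bias=False)
        # Recovery: mirror the measurement with a transposed conv, then refine.
        self.recover = nn.Sequential(
            nn.ConvTranspose2d(8, 16, kernel_size=8, stride=8),
            nn.ReLU(),
            nn.Conv2d(16, 1, kernel_size=3, padding=1),
        )

    def forward(self, x):
        return self.recover(self.measure(x))

# Toy joint-training step: measurement and recovery share one reconstruction loss.
model = FullyConvCS()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
img = torch.rand(4, 1, 64, 64)
loss = nn.functional.mse_loss(model(img), img)
loss.backward()
opt.step()
print(model.measure(img).shape, float(loss))   # compressed measurements, loss
```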
Deep Echo State Network (DeepESN): A Brief Survey
Title | Deep Echo State Network (DeepESN): A Brief Survey |
Authors | Claudio Gallicchio, Alessio Micheli |
Abstract | The study of deep recurrent neural networks (RNNs) and, in particular, of deep Reservoir Computing (RC) is gaining increasing research attention in the neural networks community. The recently introduced Deep Echo State Network (DeepESN) model opened the way to an extremely efficient approach for designing deep neural networks for temporal data. At the same time, the study of DeepESNs has shed light on the intrinsic properties of the state dynamics developed by hierarchical compositions of recurrent layers, i.e., on the bias of depth in RNN architectural design. In this paper, we summarize advancements in the development, analysis, and applications of DeepESNs. |
Tasks | |
Published | 2017-12-12 |
URL | http://arxiv.org/abs/1712.04323v3 |
http://arxiv.org/pdf/1712.04323v3.pdf | |
PWC | https://paperswithcode.com/paper/deep-echo-state-network-deepesn-a-brief |
Repo | https://github.com/lucasburger/pyRC |
Framework | none |
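A minimal NumPy sketch of the DeepESN idea surveyed above: a stack of fixed random reservoirs, each driven by the states of the layer below, with a linear readout trained by ridge regression on the concatenated states of all layers. Input scaling, spectral radius, leaky integration, and class/function names are illustrative assumptions, not a tuned configuration.

```python
import numpy as np

class DeepESN:
    """Toy Deep Echo State Network: stacked fixed reservoirs + ridge readout."""

    def __init__(self, n_in, n_res=100, n_layers=3, rho=0.9, seed=0):
        rng = np.random.default_rng(seed)
        self.W_in, self.W = [], []
        size_in = n_in
        for _ in range(n_layers):
            self.W_in.append(0.1 * rng.standard_normal((n_res, size_in)))
            W = rng.standard_normal((n_res, n_res))
            W *= rho / np.max(np.abs(np.linalg.eigvals(W)))   # set spectral radius
            self.W.append(W)
            size_in = n_res            # each layer is driven by the one below

    def states(self, inputs):                                  # inputs: (T, n_in)
        T = inputs.shape[0]
        xs = [np.zeros(W.shape[0]) for W in self.W]
        all_states = np.zeros((T, sum(W.shape[0] for W in self.W)))
        for t in range(T):
            drive, collected = inputs[t], []
            for i, (W_in, W) in enumerate(zip(self.W_in, self.W)):
                xs[i] = np.tanh(W_in @ drive + W @ xs[i])
                drive = xs[i]
                collected.append(xs[i])
            all_states[t] = np.concatenate(collected)          # states of all layers
        return all_states

    def fit_readout(self, inputs, targets, ridge=1e-6):        # targets: (T, n_out)
        S = self.states(inputs)
        self.W_out = np.linalg.solve(S.T @ S + ridge * np.eye(S.shape[1]), S.T @ targets)
        return S @ self.W_out

# Toy usage: one-step-ahead prediction of a sine wave.
t = np.linspace(0, 20, 400)
u, y = np.sin(t)[:-1, None], np.sin(t)[1:, None]
esn = DeepESN(n_in=1)
pred = esn.fit_readout(u, y)
print(np.mean((pred - y) ** 2))
```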