Paper Group AWR 18
The Trimmed Lasso: Sparsity and Robustness. Relation Networks for Object Detection. Learning to Acquire Information. A Novel Embedding Model for Knowledge Base Completion Based on Convolutional Neural Network. Linear Disentangled Representation Learning for Facial Actions. Deep Recurrent NMF for Speech Separation by Unfolding Iterative Thresholding …
The Trimmed Lasso: Sparsity and Robustness
Title | The Trimmed Lasso: Sparsity and Robustness |
Authors | Dimitris Bertsimas, Martin S. Copenhaver, Rahul Mazumder |
Abstract | Nonconvex penalty methods for sparse modeling in linear regression have been a topic of fervent interest in recent years. Herein, we study a family of nonconvex penalty functions that we call the trimmed Lasso and that offers exact control over the desired level of sparsity of estimators. We analyze its structural properties and in doing so show the following: 1) Drawing parallels between robust statistics and robust optimization, we show that the trimmed-Lasso-regularized least squares problem can be viewed as a generalized form of total least squares under a specific model of uncertainty. In contrast, this same model of uncertainty, viewed instead through a robust optimization lens, leads to the convex SLOPE (or OWL) penalty. 2) Further, in relating the trimmed Lasso to commonly used sparsity-inducing penalty functions, we provide a succinct characterization of the connection between trimmed-Lasso-like approaches and penalty functions that are coordinate-wise separable, showing that the trimmed penalties subsume existing coordinate-wise separable penalties, with strict containment in general. 3) Finally, we describe a variety of exact and heuristic algorithms, both existing and new, for trimmed-Lasso-regularized estimation problems. We include a comparison between the different approaches and an accompanying implementation of the algorithms. |
Tasks | |
Published | 2017-08-15 |
URL | http://arxiv.org/abs/1708.04527v1 |
http://arxiv.org/pdf/1708.04527v1.pdf | |
PWC | https://paperswithcode.com/paper/the-trimmed-lasso-sparsity-and-robustness |
Repo | https://github.com/copenhaver/trimmedlasso |
Framework | none |
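A minimal NumPy sketch of the trimmed Lasso penalty described in the abstract above: it sums the p − k smallest absolute coefficients, so it vanishes exactly when the vector has at most k nonzeros, which is how it gives exact control over sparsity. The function name and toy usage are illustrative, not the paper's implementation.

```python
import numpy as np

def trimmed_lasso_penalty(beta, k, lam=1.0):
    """Trimmed Lasso T_k(beta): lam * sum of the (p - k) smallest |beta_i|.

    The penalty is zero iff beta has at most k nonzero entries.
    """
    abs_beta = np.sort(np.abs(beta))            # ascending order
    p = beta.size
    return lam * abs_beta[: max(p - k, 0)].sum()

# Toy usage: a 2-sparse vector incurs no penalty for k = 2.
beta = np.array([0.0, 3.1, 0.0, -1.2, 0.0])
print(trimmed_lasso_penalty(beta, k=2))   # 0.0
print(trimmed_lasso_penalty(beta, k=1))   # 1.2
```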
Relation Networks for Object Detection
Title | Relation Networks for Object Detection |
Authors | Han Hu, Jiayuan Gu, Zheng Zhang, Jifeng Dai, Yichen Wei |
Abstract | Although it has long been believed that modeling relations between objects would help object recognition, there has been no evidence that the idea works in the deep learning era. All state-of-the-art object detection systems still rely on recognizing object instances individually, without exploiting their relations during learning. This work proposes an object relation module. It processes a set of objects simultaneously through interaction between their appearance features and geometry, thus allowing modeling of their relations. It is lightweight and in-place. It does not require additional supervision and is easy to embed in existing networks. It is shown to be effective in improving the object recognition and duplicate removal steps in the modern object detection pipeline. It verifies the efficacy of modeling object relations in CNN-based detection and gives rise to the first fully end-to-end object detector. |
Tasks | Object Detection, Object Recognition |
Published | 2017-11-30 |
URL | http://arxiv.org/abs/1711.11575v2 |
http://arxiv.org/pdf/1711.11575v2.pdf | |
PWC | https://paperswithcode.com/paper/relation-networks-for-object-detection |
Repo | https://github.com/msracver/Relation-Networks-for-Object-Detection |
Framework | tf |
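A stripped-down, single-head PyTorch sketch of the relation idea from the abstract: attention between all proposal pairs combines an appearance term (scaled dot product of projected features) with a geometry term derived from relative box positions and sizes. The class name, feature sizes, and geometry embedding are simplified assumptions, not the paper's multi-head module.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleRelationModule(nn.Module):
    """Single-head sketch of an object relation module (illustrative sizes)."""

    def __init__(self, d_model=128, d_key=64):
        super().__init__()
        self.q = nn.Linear(d_model, d_key)
        self.k = nn.Linear(d_model, d_key)
        self.v = nn.Linear(d_model, d_model)
        self.geom = nn.Linear(4, 1)        # toy geometry embedding
        self.scale = d_key ** 0.5

    def forward(self, feats, boxes):
        # feats: (N, d_model) appearance features; boxes: (N, 4) as (x, y, w, h).
        xy, wh = boxes[:, :2], boxes[:, 2:].clamp(min=1e-3)
        d_xy = (xy[:, None, :] - xy[None, :, :]).abs() / wh[:, None, :]
        d_wh = wh[None, :, :] / wh[:, None, :]
        rel = torch.cat([d_xy, d_wh], dim=-1).clamp(min=1e-3).log()   # (N, N, 4)

        geom_w = F.relu(self.geom(rel)).squeeze(-1)                   # (N, N)
        app_w = self.q(feats) @ self.k(feats).t() / self.scale        # (N, N)
        attn = F.softmax(app_w + geom_w.clamp(min=1e-6).log(), dim=-1)
        return feats + attn @ self.v(feats)   # residual relation feature

# Toy usage with 5 proposals.
module = SimpleRelationModule()
out = module(torch.randn(5, 128), torch.rand(5, 4) * 100 + 1)
print(out.shape)   # torch.Size([5, 128])
```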
Learning to Acquire Information
Title | Learning to Acquire Information |
Authors | Yewen Pu, Leslie P Kaelbling, Armando Solar-Lezama |
Abstract | We consider the problem of diagnosis, where a set of simple observations is used to infer a potentially complex hidden hypothesis. Finding the optimal subset of observations is intractable in general, so we focus on the problem of active diagnosis, where the agent selects the next most-informative observation based on the results of previous observations. We show that under the assumption of uniform observation entropy, one can build an implication model that directly predicts the outcome of the potential next observation conditioned on the results of past observations, and selects the observation with the maximum entropy. This approach enjoys reduced computational complexity by bypassing the complicated hypothesis space, and can be trained on observation data alone, learning how to query without knowledge of the hidden hypothesis. |
Tasks | |
Published | 2017-04-20 |
URL | http://arxiv.org/abs/1704.06131v2 |
http://arxiv.org/pdf/1704.06131v2.pdf | |
PWC | https://paperswithcode.com/paper/learning-to-acquire-information |
Repo | https://github.com/evanthebouncy/uai2017_learning_to_acquire_information |
Framework | none |
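A minimal NumPy sketch of the selection rule described above: given an implication model's predicted probabilities for each candidate observation, query the unobserved one with maximum predictive entropy. The function name and the way past observations are masked are illustrative assumptions; how the implication model itself is trained is outside this sketch.

```python
import numpy as np

def next_observation(predicted_probs, observed):
    """Pick the next observation with maximum predictive (Bernoulli) entropy.

    predicted_probs: P(observation_i = 1 | past observations), from an
    implication model.  observed: boolean mask of observations already made.
    """
    p = np.clip(predicted_probs, 1e-12, 1 - 1e-12)
    entropy = -(p * np.log(p) + (1 - p) * np.log(1 - p))
    entropy[observed] = -np.inf            # never re-query a known outcome
    return int(np.argmax(entropy))

# Toy usage: the most uncertain (closest to 0.5) unobserved outcome is chosen.
probs = np.array([0.95, 0.60, 0.10, 0.48])
mask = np.array([False, False, True, False])
print(next_observation(probs, mask))       # 3
```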
A Novel Embedding Model for Knowledge Base Completion Based on Convolutional Neural Network
Title | A Novel Embedding Model for Knowledge Base Completion Based on Convolutional Neural Network |
Authors | Dai Quoc Nguyen, Tu Dinh Nguyen, Dat Quoc Nguyen, Dinh Phung |
Abstract | In this paper, we propose a novel embedding model, named ConvKB, for knowledge base completion. Our model ConvKB advances state-of-the-art models by employing a convolutional neural network, so that it can capture global relationships and transitional characteristics between entities and relations in knowledge bases. In ConvKB, each triple (head entity, relation, tail entity) is represented as a 3-column matrix where each column vector represents a triple element. This 3-column matrix is then fed to a convolution layer where multiple filters operate on the matrix to generate different feature maps. These feature maps are then concatenated into a single feature vector representing the input triple. The feature vector is multiplied by a weight vector via a dot product to return a score. This score is then used to predict whether the triple is valid or not. Experiments show that ConvKB achieves better link prediction performance than previous state-of-the-art embedding models on the two benchmark datasets WN18RR and FB15k-237. |
Tasks | Knowledge Base Completion, Link Prediction |
Published | 2017-12-06 |
URL | http://arxiv.org/abs/1712.02121v2 |
http://arxiv.org/pdf/1712.02121v2.pdf | |
PWC | https://paperswithcode.com/paper/a-novel-embedding-model-for-knowledge-base |
Repo | https://github.com/daiquocnguyen/ConvKB |
Framework | tf |
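A PyTorch sketch of the scoring pipeline the abstract describes: a triple is stacked as a k × 3 matrix of embeddings, convolved with 1 × 3 filters, and the concatenated feature maps are scored by a dot product with a weight vector. Embedding dimension, filter count, and the class name are illustrative, not the released ConvKB configuration.

```python
import torch
import torch.nn as nn

class ConvKBScore(nn.Module):
    """Sketch of a ConvKB-style triple scoring function (illustrative sizes)."""

    def __init__(self, num_entities, num_relations, dim=50, num_filters=64):
        super().__init__()
        self.ent = nn.Embedding(num_entities, dim)
        self.rel = nn.Embedding(num_relations, dim)
        # Each filter spans one embedding row across the 3 columns (h, r, t).
        self.conv = nn.Conv2d(1, num_filters, kernel_size=(1, 3))
        self.w = nn.Linear(dim * num_filters, 1, bias=False)

    def forward(self, h, r, t):
        # Shape (batch, 1, dim, 3): one "image" per triple.
        x = torch.stack([self.ent(h), self.rel(r), self.ent(t)], dim=-1).unsqueeze(1)
        feats = torch.relu(self.conv(x))             # (batch, filters, dim, 1)
        return self.w(feats.flatten(start_dim=1))    # (batch, 1) triple score

# Toy usage.
model = ConvKBScore(num_entities=100, num_relations=10)
score = model(torch.tensor([3]), torch.tensor([1]), torch.tensor([7]))
print(score.shape)   # torch.Size([1, 1])
```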
Linear Disentangled Representation Learning for Facial Actions
Title | Linear Disentangled Representation Learning for Facial Actions |
Authors | Xiang Xiang, Trac D. Tran |
Abstract | The limited annotated data available for the recognition of facial expressions and action units hampers the training of deep networks, which could otherwise learn disentangled invariant features. However, a linear model with just a few parameters is normally not demanding in terms of training data. In this paper, we propose an elegant linear model to untangle confounding factors in challenging realistic multichannel signals such as 2D face videos. The simple yet powerful model does not rely on huge training data and is natural for recognizing facial actions without explicitly disentangling the identity. Previous attempts based on well-understood intuitive linear models, such as Sparse Representation based Classification (SRC), require a preprocessing step of explicit decoupling that is practically inexact. Instead, we exploit the low-rank property across frames to subtract the underlying neutral faces, which are modeled jointly with a sparse representation of the action components under a group-sparsity constraint. On the extended Cohn-Kanade dataset (CK+), our one-shot automatic method on raw face videos performs as competitively as SRC applied to manually prepared action components and performs even better than SRC in terms of true positive rate. We apply the model to the even more challenging task of facial action unit recognition, verified on the MPI Face Video Database (MPI-VDB), achieving decent performance. All the programs and data have been made publicly available. |
Tasks | Facial Action Unit Detection, Representation Learning, Sparse Representation-based Classification |
Published | 2017-01-11 |
URL | http://arxiv.org/abs/1701.03102v1 |
http://arxiv.org/pdf/1701.03102v1.pdf | |
PWC | https://paperswithcode.com/paper/linear-disentangled-representation-learning |
Repo | https://github.com/eglxiang/icassp15_emotion |
Framework | none |
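A rough NumPy sketch of the modeling idea above, assuming frames are stacked as columns of a matrix D that splits into a low-rank neutral-face part L plus a group-sparse action part S. The naive alternating proximal scheme, thresholds, and function names below are illustrative assumptions; they are not the paper's exact formulation or solver.

```python
import numpy as np

def svt(M, tau):
    """Singular value thresholding: proximal operator of the nuclear norm."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0)) @ Vt

def group_soft_threshold(M, tau):
    """Row-wise group soft-thresholding: proximal operator of the l2,1 norm."""
    norms = np.linalg.norm(M, axis=1, keepdims=True)
    scale = np.maximum(1 - tau / np.maximum(norms, 1e-12), 0)
    return M * scale

def decompose(D, tau_l=1.0, tau_s=0.1, iters=50):
    """Alternating low-rank (neutral face) + group-sparse (action) split D ~ L + S."""
    L = np.zeros_like(D)
    S = np.zeros_like(D)
    for _ in range(iters):
        L = svt(D - S, tau_l)
        S = group_soft_threshold(D - L, tau_s)
    return L, S

# Toy usage: 100-dim frames over 20 time steps with a shared "neutral" component.
rng = np.random.default_rng(0)
D = np.outer(rng.standard_normal(100), np.ones(20)) + 0.1 * rng.standard_normal((100, 20))
L, S = decompose(D)
print(np.linalg.matrix_rank(L, tol=1e-6))
```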
Deep Recurrent NMF for Speech Separation by Unfolding Iterative Thresholding
Title | Deep Recurrent NMF for Speech Separation by Unfolding Iterative Thresholding |
Authors | Scott Wisdom, Thomas Powers, James Pitton, Les Atlas |
Abstract | In this paper, we propose a novel recurrent neural network architecture for speech separation. This architecture is constructed by unfolding the iterations of a sequential iterative soft-thresholding algorithm (ISTA) that solves the optimization problem for sparse nonnegative matrix factorization (NMF) of spectrograms. We name this network architecture deep recurrent NMF (DR-NMF). The proposed DR-NMF network has three distinct advantages. First, DR-NMF provides better interpretability than other deep architectures, since the weights correspond to NMF model parameters, even after training. This interpretability also provides principled initializations that enable faster training and convergence to better solutions compared to conventional random initialization. Second, like many deep networks, DR-NMF is an order of magnitude faster at test time than NMF, since computation of the network output only requires evaluating a few layers at each time step. Third, when a limited amount of training data is available, DR-NMF exhibits stronger generalization and separation performance compared to sparse NMF and state-of-the-art long short-term memory (LSTM) networks. When a large amount of training data is available, DR-NMF achieves lower yet competitive separation performance compared to LSTM networks. |
Tasks | Speech Separation |
Published | 2017-09-21 |
URL | http://arxiv.org/abs/1709.07124v1 |
http://arxiv.org/pdf/1709.07124v1.pdf | |
PWC | https://paperswithcode.com/paper/deep-recurrent-nmf-for-speech-separation-by |
Repo | https://github.com/stwisdom/dr-nmf |
Framework | none |
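A minimal NumPy sketch of the ISTA iteration that DR-NMF unfolds: sparse nonnegative coding of a spectrogram frame against a fixed dictionary. Each loop iteration corresponds to one layer of the unfolded network; in DR-NMF the dictionary, step size, and sparsity weight become trainable per-layer parameters, which this sketch keeps fixed. Names and sizes are illustrative.

```python
import numpy as np

def ista_sparse_nmf(x, W, lam=0.1, n_layers=10):
    """Unrolled ISTA for min_h 0.5*||x - W h||^2 + lam*||h||_1 with h >= 0.

    Each iteration is one "layer" of the unfolded network.
    """
    eta = 1.0 / np.linalg.norm(W, 2) ** 2       # step size from the Lipschitz constant
    h = np.zeros(W.shape[1])
    for _ in range(n_layers):
        grad = W.T @ (W @ h - x)
        h = np.maximum(h - eta * (grad + lam), 0.0)   # nonnegative soft-threshold
    return h

# Toy usage: 5 nonnegative basis spectra of dimension 30.
rng = np.random.default_rng(0)
W = np.abs(rng.standard_normal((30, 5)))
x = W @ np.array([1.0, 0.0, 0.5, 0.0, 0.0])
print(np.round(ista_sparse_nmf(x, W, lam=0.01, n_layers=200), 2))
```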
Video Fill In the Blank using LR/RL LSTMs with Spatial-Temporal Attentions
Title | Video Fill In the Blank using LR/RL LSTMs with Spatial-Temporal Attentions |
Authors | Amir Mazaheri, Dong Zhang, Mubarak Shah |
Abstract | Given a video and a description sentence with one missing word (we call it the “source sentence”), the Video-Fill-In-the-Blank (VFIB) problem is to find the missing word automatically. The contextual information of the sentence, as well as visual cues from the video, are important to infer the missing word accurately. Since the source sentence is broken into two fragments, the sentence’s left fragment (before the blank) and the sentence’s right fragment (after the blank), traditional Recurrent Neural Networks cannot encode this structure accurately because of the many possible variations of the missing word in terms of its location and type in the source sentence. For example, a missing word can be the first word or be in the middle of the sentence, and it can be a verb or an adjective. In this paper, we propose a framework to tackle the textual encoding: two separate LSTMs (the LR and RL LSTMs) are employed to encode the left and right sentence fragments, and a novel structure is introduced to combine each fragment with an “external memory” corresponding to the opposite fragment. For the visual encoding, end-to-end spatial and temporal attention models are employed to select discriminative visual representations to find the missing word. In the experiments, we demonstrate the superior performance of the proposed method on the challenging VFIB problem. Furthermore, we introduce an extended and more generalized version of VFIB, which is not limited to a single blank. Our experiments indicate the generalization capability of our method in dealing with these more realistic scenarios. |
Tasks | |
Published | 2017-04-15 |
URL | http://arxiv.org/abs/1704.04689v1 |
http://arxiv.org/pdf/1704.04689v1.pdf | |
PWC | https://paperswithcode.com/paper/video-fill-in-the-blank-using-lrrl-lstms-with |
Repo | https://github.com/amirmazaheri1990/VFIB-LRRLLSTMs |
Framework | none |
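A PyTorch sketch of the two-LSTM textual encoding described above: one LSTM reads the left fragment left-to-right, another reads the right fragment right-to-left, and their final states are fused to predict the missing word. The fusion, vocabulary size, and class name are illustrative assumptions; the paper's external memory and visual attention components are omitted.

```python
import torch
import torch.nn as nn

class LRRLTextEncoder(nn.Module):
    """Sketch of LR/RL LSTM encoding for fill-in-the-blank (illustrative sizes)."""

    def __init__(self, vocab_size=1000, emb=64, hidden=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb)
        self.lstm_lr = nn.LSTM(emb, hidden, batch_first=True)
        self.lstm_rl = nn.LSTM(emb, hidden, batch_first=True)
        self.out = nn.Linear(2 * hidden, vocab_size)

    def forward(self, left_ids, right_ids):
        # left_ids: (B, T_left) tokens before the blank, in reading order.
        # right_ids: (B, T_right) tokens after the blank; reversed so the RL
        # LSTM reads from the end of the sentence toward the blank.
        _, (h_lr, _) = self.lstm_lr(self.embed(left_ids))
        _, (h_rl, _) = self.lstm_rl(self.embed(right_ids.flip(dims=[1])))
        fused = torch.cat([h_lr[-1], h_rl[-1]], dim=-1)
        return self.out(fused)             # (B, vocab) logits for the blank

# Toy usage.
enc = LRRLTextEncoder()
logits = enc(torch.randint(0, 1000, (2, 5)), torch.randint(0, 1000, (2, 7)))
print(logits.shape)   # torch.Size([2, 1000])
```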
Adversarial-Playground: A Visualization Suite for Adversarial Sample Generation
Title | Adversarial-Playground: A Visualization Suite for Adversarial Sample Generation |
Authors | Andrew Norton, Yanjun Qi |
Abstract | With growing interest in adversarial machine learning, it is important for machine learning practitioners and users to understand how their models may be attacked. We propose a web-based visualization tool, Adversarial-Playground, to demonstrate the efficacy of common adversarial methods against a deep neural network (DNN) model, built on top of the TensorFlow library. Adversarial-Playground gives users an efficient and effective way to explore techniques for generating adversarial examples, which are inputs crafted by an adversary to fool a machine learning system. To enable Adversarial-Playground to generate quick and accurate responses for users, we use two primary tactics: (1) We propose a faster variant of the state-of-the-art Jacobian saliency map approach that maintains a comparable evasion rate. (2) Our visualization does not transmit the generated adversarial images to the client, but rather only the matrix describing the sample and the vector representing classification likelihoods. The source code, along with the data from all of our experiments, is available at \url{https://github.com/QData/AdversarialDNN-Playground}. |
Tasks | |
Published | 2017-06-06 |
URL | http://arxiv.org/abs/1706.01763v2 |
http://arxiv.org/pdf/1706.01763v2.pdf | |
PWC | https://paperswithcode.com/paper/adversarial-playground-a-visualization-suite-1 |
Repo | https://github.com/QData/AdversarialDNN-Playground |
Framework | tf |
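A didactic PyTorch sketch of the Jacobian-saliency idea that the tool builds on: compute the class-probability Jacobian with respect to the input, score each feature by how strongly it pushes toward the target class and away from the others, and perturb the single most salient feature. The function name and the perturbation rule are illustrative assumptions, not the paper's faster variant or the repository's code.

```python
import torch
import torch.nn as nn
from torch.autograd.functional import jacobian

def saliency_step(model, x, target, eps=0.2):
    """One simplified Jacobian-saliency perturbation toward a target class."""
    x = x.clone().detach()
    J = jacobian(lambda inp: torch.softmax(model(inp.unsqueeze(0)), dim=1).squeeze(0), x)
    grad_t = J[target]                      # d p_target / d x
    grad_other = J.sum(dim=0) - grad_t      # summed gradient of the other classes
    # Salient features increase the target probability and decrease the rest.
    saliency = torch.where((grad_t > 0) & (grad_other < 0),
                           grad_t * grad_other.abs(),
                           torch.zeros_like(grad_t))
    idx = torch.argmax(saliency)
    x.view(-1)[idx] += eps                  # perturb the most salient feature
    return x.clamp(0, 1)

# Toy usage on a tiny classifier over 8-dimensional inputs.
model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 3))
x_adv = saliency_step(model, torch.rand(8), target=2)
print(x_adv.shape)   # torch.Size([8])
```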
Deep reinforcement learning from human preferences
Title | Deep reinforcement learning from human preferences |
Authors | Paul Christiano, Jan Leike, Tom B. Brown, Miljan Martic, Shane Legg, Dario Amodei |
Abstract | For sophisticated reinforcement learning (RL) systems to interact usefully with real-world environments, we need to communicate complex goals to these systems. In this work, we explore goals defined in terms of (non-expert) human preferences between pairs of trajectory segments. We show that this approach can effectively solve complex RL tasks without access to the reward function, including Atari games and simulated robot locomotion, while providing feedback on less than one percent of our agent’s interactions with the environment. This reduces the cost of human oversight far enough that it can be practically applied to state-of-the-art RL systems. To demonstrate the flexibility of our approach, we show that we can successfully train complex novel behaviors with about an hour of human time. These behaviors and environments are considerably more complex than any that have been previously learned from human feedback. |
Tasks | Atari Games |
Published | 2017-06-12 |
URL | http://arxiv.org/abs/1706.03741v3 |
http://arxiv.org/pdf/1706.03741v3.pdf | |
PWC | https://paperswithcode.com/paper/deep-reinforcement-learning-from-human |
Repo | https://github.com/vcharvet/project-rl |
Framework | tf |
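A PyTorch sketch of the core learning signal described above: a reward network scores each step of a trajectory segment, the probability that one segment is preferred over another is a softmax (Bradley-Terry) over the summed predicted rewards, and the network is trained by cross-entropy against the human label. Observation size, network width, and optimizer settings are illustrative assumptions.

```python
import torch
import torch.nn as nn

class RewardModel(nn.Module):
    """Reward model fit from pairwise human preferences (illustrative sizes)."""

    def __init__(self, obs_dim=4, hidden=32):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 1))

    def segment_return(self, seg):           # seg: (T, obs_dim)
        return self.net(seg).sum()           # summed predicted reward

    def preference_loss(self, seg_a, seg_b, label):
        # label = 1.0 if the human preferred segment A, else 0.0.
        logits = torch.stack([self.segment_return(seg_a),
                              self.segment_return(seg_b)])
        log_probs = torch.log_softmax(logits, dim=0)
        return -(label * log_probs[0] + (1 - label) * log_probs[1])

# Toy usage: one preference comparison, one gradient step.
model = RewardModel()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss = model.preference_loss(torch.randn(20, 4), torch.randn(20, 4), label=1.0)
loss.backward()
opt.step()
print(float(loss))
```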
Symmetric Variational Autoencoder and Connections to Adversarial Learning
Title | Symmetric Variational Autoencoder and Connections to Adversarial Learning |
Authors | Liqun Chen, Shuyang Dai, Yunchen Pu, Chunyuan Li, Qinliang Su, Lawrence Carin |
Abstract | A new form of the variational autoencoder (VAE) is proposed, based on the symmetric Kullback-Leibler divergence. It is demonstrated that learning of the resulting symmetric VAE (sVAE) has close connections to previously developed adversarial-learning methods. This relationship helps unify the previously distinct techniques of VAEs and adversarial learning, and provides insights that allow us to ameliorate shortcomings of some previously developed adversarial methods. In addition to an analysis that motivates and explains the sVAE, an extensive set of experiments validates the utility of the approach. |
Tasks | |
Published | 2017-09-06 |
URL | http://arxiv.org/abs/1709.01846v2 |
http://arxiv.org/pdf/1709.01846v2.pdf | |
PWC | https://paperswithcode.com/paper/symmetric-variational-autoencoder-and |
Repo | https://github.com/LiqunChen0606/Symmetric-VAE |
Framework | tf |
SegAN: Adversarial Network with Multi-scale $L_1$ Loss for Medical Image Segmentation
Title | SegAN: Adversarial Network with Multi-scale $L_1$ Loss for Medical Image Segmentation |
Authors | Yuan Xue, Tao Xu, Han Zhang, Rodney Long, Xiaolei Huang |
Abstract | Inspired by classic generative adversarial networks (GAN), we propose a novel end-to-end adversarial neural network, called SegAN, for the task of medical image segmentation. Since image segmentation requires dense, pixel-level labeling, the single scalar real/fake output of a classic GAN’s discriminator may be ineffective in producing stable and sufficient gradient feedback to the networks. Instead, we use a fully convolutional neural network as the segmentor to generate segmentation label maps, and propose a novel adversarial critic network with a multi-scale $L_1$ loss function to force the critic and segmentor to learn both global and local features that capture long- and short-range spatial relationships between pixels. In our SegAN framework, the segmentor and critic networks are trained in an alternating fashion in a min-max game: The critic takes as input a pair of images, (original_image $\circ$ predicted_label_map, original_image $\circ$ ground_truth_label_map), and is trained by maximizing a multi-scale loss function; the segmentor is trained with only the gradients passed along by the critic, with the aim of minimizing the multi-scale loss function. We show that such a SegAN framework is more effective and stable for the segmentation task, and it leads to better performance than the state-of-the-art U-net segmentation method. We tested our SegAN method using datasets from the MICCAI BRATS brain tumor segmentation challenge. Extensive experimental results demonstrate the effectiveness of the proposed SegAN with multi-scale loss: on BRATS 2013, SegAN gives performance comparable to the state-of-the-art for whole tumor and tumor core segmentation while achieving better precision and sensitivity for Gd-enhanced tumor core segmentation; on BRATS 2015, SegAN achieves better performance than the state-of-the-art in both dice score and precision. |
Tasks | Brain Tumor Segmentation, Medical Image Segmentation, Semantic Segmentation |
Published | 2017-06-06 |
URL | http://arxiv.org/abs/1706.01805v2 |
http://arxiv.org/pdf/1706.01805v2.pdf | |
PWC | https://paperswithcode.com/paper/segan-adversarial-network-with-multi-scale |
Repo | https://github.com/iNLyze/DeepLearning-SeGAN-Segmentation |
Framework | tf |
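A PyTorch sketch of the multi-scale $L_1$ critic loss the abstract describes: the critic extracts features at several scales from an image masked by a label map, and the loss is the mean absolute difference between the features of (image × predicted mask) and (image × ground-truth mask), averaged over scales. The tiny backbone, channel counts, and class name are illustrative assumptions, not the paper's critic architecture.

```python
import torch
import torch.nn as nn

class MultiScaleL1Critic(nn.Module):
    """Toy critic computing a SegAN-style multi-scale L1 loss."""

    def __init__(self, in_ch=1):
        super().__init__()
        self.blocks = nn.ModuleList([
            nn.Sequential(nn.Conv2d(in_ch, 16, 3, stride=2, padding=1), nn.LeakyReLU(0.2)),
            nn.Sequential(nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.LeakyReLU(0.2)),
            nn.Sequential(nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.LeakyReLU(0.2)),
        ])

    def features(self, x):
        feats = []
        for block in self.blocks:
            x = block(x)
            feats.append(x)        # one feature map per scale
        return feats

    def forward(self, image, pred_mask, gt_mask):
        f_pred = self.features(image * pred_mask)
        f_gt = self.features(image * gt_mask)
        return sum(torch.mean(torch.abs(a - b)) for a, b in zip(f_pred, f_gt)) / len(f_pred)

# Toy usage: the segmentor minimizes this loss while the critic maximizes it.
critic = MultiScaleL1Critic()
img = torch.rand(2, 1, 64, 64)
loss = critic(img, torch.rand(2, 1, 64, 64), torch.randint(0, 2, (2, 1, 64, 64)).float())
print(float(loss))
```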
Grounding Referring Expressions in Images by Variational Context
Title | Grounding Referring Expressions in Images by Variational Context |
Authors | Hanwang Zhang, Yulei Niu, Shih-Fu Chang |
Abstract | We focus on grounding (i.e., localizing or linking) referring expressions in images, e.g., “largest elephant standing behind baby elephant”. This is a general yet challenging vision-language task, since it requires not only the localization of objects but also the multimodal comprehension of context — visual attributes (e.g., “largest”, “baby”) and relationships (e.g., “behind”) that help to distinguish the referent from other objects, especially those of the same category. Due to the exponential complexity involved in modeling the context associated with multiple image regions, existing work oversimplifies this task to pairwise region modeling by multiple instance learning. In this paper, we propose a variational Bayesian method, called Variational Context, to solve the problem of complex context modeling in referring expression grounding. Our model exploits the reciprocal relation between the referent and context, i.e., either of them influences the estimation of the posterior distribution of the other, so the search space of context can be greatly reduced, resulting in better localization of the referent. We develop a novel cue-specific language-vision embedding network that learns this reciprocity model end-to-end. We also extend the model to the unsupervised setting where no annotation for the referent is available. Extensive experiments on various benchmarks show consistent improvement over state-of-the-art methods in both supervised and unsupervised settings. |
Tasks | Multiple Instance Learning |
Published | 2017-12-05 |
URL | http://arxiv.org/abs/1712.01892v2 |
http://arxiv.org/pdf/1712.01892v2.pdf | |
PWC | https://paperswithcode.com/paper/grounding-referring-expressions-in-images-by |
Repo | https://github.com/yuleiniu/vc |
Framework | tf |
Neural Wikipedian: Generating Textual Summaries from Knowledge Base Triples
Title | Neural Wikipedian: Generating Textual Summaries from Knowledge Base Triples |
Authors | Pavlos Vougiouklis, Hady Elsahar, Lucie-Aimée Kaffee, Christoph Gravier, Frederique Laforest, Jonathon Hare, Elena Simperl |
Abstract | Most people do not interact with Semantic Web data directly. Unless they have the expertise to understand the underlying technology, they need textual or visual interfaces to help them make sense of it. We explore the problem of generating natural language summaries for Semantic Web data. This is non-trivial, especially in an open-domain context. To address this problem, we explore the use of neural networks. Our system encodes the information from a set of triples into a vector of fixed dimensionality and generates a textual summary by conditioning the output on the encoded vector. We train and evaluate our models on two corpora of loosely aligned Wikipedia snippets and DBpedia and Wikidata triples with promising results. |
Tasks | |
Published | 2017-11-01 |
URL | http://arxiv.org/abs/1711.00155v1 |
http://arxiv.org/pdf/1711.00155v1.pdf | |
PWC | https://paperswithcode.com/paper/neural-wikipedian-generating-textual |
Repo | https://github.com/pvougiou/Neural-Wikipedian |
Framework | torch |
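A PyTorch sketch of the encode-then-decode idea described above: a set of (subject, predicate, object) triples is embedded, pooled into one fixed-size vector, and a recurrent decoder generates the summary conditioned on it. The pooling choice, GRU decoder, vocabulary sizes, and class name are all illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class TriplesToText(nn.Module):
    """Toy triples-to-summary model: pooled triple encoder + GRU decoder."""

    def __init__(self, n_items=500, n_words=2000, dim=64):
        super().__init__()
        self.item_emb = nn.Embedding(n_items, dim)      # entities and relations
        self.word_emb = nn.Embedding(n_words, dim)
        self.triple_proj = nn.Linear(3 * dim, dim)
        self.decoder = nn.GRU(dim, dim, batch_first=True)
        self.out = nn.Linear(dim, n_words)

    def encode(self, triples):                          # triples: (B, n_triples, 3)
        e = self.item_emb(triples)                      # (B, n, 3, dim)
        t = torch.tanh(self.triple_proj(e.flatten(start_dim=2)))
        return t.mean(dim=1)                            # (B, dim) fixed-size code

    def forward(self, triples, words):                  # words: (B, T) summary tokens
        h0 = self.encode(triples).unsqueeze(0)          # initial decoder state
        out, _ = self.decoder(self.word_emb(words), h0)
        return self.out(out)                            # (B, T, n_words) logits

# Toy usage.
model = TriplesToText()
logits = model(torch.randint(0, 500, (2, 4, 3)), torch.randint(0, 2000, (2, 10)))
print(logits.shape)   # torch.Size([2, 10, 2000])
```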
Fully Convolutional Measurement Network for Compressive Sensing Image Reconstruction
Title | Fully Convolutional Measurement Network for Compressive Sensing Image Reconstruction |
Authors | Jiang Du, Xuemei Xie, Chenye Wang, Guangming Shi, Xun Xu, Yuxiang Wang |
Abstract | Recently, deep learning methods have brought significant improvements to the compressive sensing image reconstruction task. In existing methods, the scene is measured block by block due to the high computational complexity, which introduces block effects in the recovered images. In this paper, we propose a fully convolutional measurement network, where the scene is measured as a whole. The proposed method effectively removes the block effect, since the structural information of the scene image is preserved. To make the measurement more flexible, the measurement and recovery parts are jointly trained. Experiments show that the results of the proposed method outperform those of existing methods in PSNR, SSIM, and visual quality. |
Tasks | Compressive Sensing, Image Reconstruction |
Published | 2017-11-21 |
URL | http://arxiv.org/abs/1712.01641v2 |
http://arxiv.org/pdf/1712.01641v2.pdf | |
PWC | https://paperswithcode.com/paper/fully-convolutional-measurement-network-for |
Repo | https://github.com/jiang-du/Perceptual-CS |
Framework | none |
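A PyTorch sketch of the jointly trained measurement-plus-recovery idea: a strided convolution measures the whole image at once (no block splitting), and a small transposed-convolution decoder reconstructs it, with both parts optimized together under an MSE loss. The layer sizes, sampling ratio, and class name are illustrative assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn

class FullyConvCS(nn.Module):
    """Toy fully convolutional measurement + recovery pipeline."""

    def __init__(self):
        super().__init__()
        # Measurement: a stride-8 convolution compresses the whole 1-channel image.
        self.measure = nn.Conv2d(1, 8, kernel_size=8, stride=8, bias=False)
        # Recovery: mirror the measurement with a transposed conv, then refine.
        self.recover = nn.Sequential(
            nn.ConvTranspose2d(8, 16, kernel_size=8, stride=8),
            nn.ReLU(),
            nn.Conv2d(16, 1, kernel_size=3, padding=1),
        )

    def forward(self, x):
        return self.recover(self.measure(x))

# Toy joint-training step: measurement and recovery share one reconstruction loss.
model = FullyConvCS()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
img = torch.rand(4, 1, 64, 64)
loss = nn.functional.mse_loss(model(img), img)
loss.backward()
opt.step()
print(model.measure(img).shape, float(loss))   # compressed measurements, loss
```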
Deep Echo State Network (DeepESN): A Brief Survey
Title | Deep Echo State Network (DeepESN): A Brief Survey |
Authors | Claudio Gallicchio, Alessio Micheli |
Abstract | The study of deep recurrent neural networks (RNNs) and, in particular, of deep Reservoir Computing (RC) is gaining increasing research attention in the neural networks community. The recently introduced Deep Echo State Network (DeepESN) model opened the way to an extremely efficient approach for designing deep neural networks for temporal data. At the same time, the study of DeepESNs has shed light on the intrinsic properties of the state dynamics developed by hierarchical compositions of recurrent layers, i.e., on the bias of depth in RNN architectural design. In this paper, we summarize advancements in the development, analysis, and applications of DeepESNs. |
Tasks | |
Published | 2017-12-12 |
URL | http://arxiv.org/abs/1712.04323v3 |
http://arxiv.org/pdf/1712.04323v3.pdf | |
PWC | https://paperswithcode.com/paper/deep-echo-state-network-deepesn-a-brief |
Repo | https://github.com/lucasburger/pyRC |
Framework | none |
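A minimal NumPy sketch of the DeepESN idea surveyed above: a stack of fixed random reservoirs, each driven by the states of the layer below, with a linear readout trained by ridge regression on the concatenated states of all layers. Input scaling, spectral radius, leaky integration, and class/function names are illustrative assumptions, not a tuned configuration.

```python
import numpy as np

class DeepESN:
    """Toy Deep Echo State Network: stacked fixed reservoirs + ridge readout."""

    def __init__(self, n_in, n_res=100, n_layers=3, rho=0.9, seed=0):
        rng = np.random.default_rng(seed)
        self.W_in, self.W = [], []
        size_in = n_in
        for _ in range(n_layers):
            self.W_in.append(0.1 * rng.standard_normal((n_res, size_in)))
            W = rng.standard_normal((n_res, n_res))
            W *= rho / np.max(np.abs(np.linalg.eigvals(W)))   # set spectral radius
            self.W.append(W)
            size_in = n_res            # each layer is driven by the one below

    def states(self, inputs):                                  # inputs: (T, n_in)
        T = inputs.shape[0]
        xs = [np.zeros(W.shape[0]) for W in self.W]
        all_states = np.zeros((T, sum(W.shape[0] for W in self.W)))
        for t in range(T):
            drive, collected = inputs[t], []
            for i, (W_in, W) in enumerate(zip(self.W_in, self.W)):
                xs[i] = np.tanh(W_in @ drive + W @ xs[i])
                drive = xs[i]
                collected.append(xs[i])
            all_states[t] = np.concatenate(collected)          # states of all layers
        return all_states

    def fit_readout(self, inputs, targets, ridge=1e-6):        # targets: (T, n_out)
        S = self.states(inputs)
        self.W_out = np.linalg.solve(S.T @ S + ridge * np.eye(S.shape[1]), S.T @ targets)
        return S @ self.W_out

# Toy usage: one-step-ahead prediction of a sine wave.
t = np.linspace(0, 20, 400)
u, y = np.sin(t)[:-1, None], np.sin(t)[1:, None]
esn = DeepESN(n_in=1)
pred = esn.fit_readout(u, y)
print(np.mean((pred - y) ** 2))
```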