May 7, 2019

3229 words 16 mins read

Paper Group AWR 44

Neural Architecture Search with Reinforcement Learning. Global Neural CCG Parsing with Optimality Guarantees. Learning to Match Using Local and Distributed Representations of Text for Web Search. Phased LSTM: Accelerating Recurrent Network Training for Long or Event-based Sequences. Data-driven HR - Résumé Analysis Based on Natural Language Process …

Neural Architecture Search with Reinforcement Learning

Title Neural Architecture Search with Reinforcement Learning
Authors Barret Zoph, Quoc V. Le
Abstract Neural networks are powerful and flexible models that work well for many difficult learning tasks in image, speech and natural language understanding. Despite their success, neural networks are still hard to design. In this paper, we use a recurrent network to generate the model descriptions of neural networks and train this RNN with reinforcement learning to maximize the expected accuracy of the generated architectures on a validation set. On the CIFAR-10 dataset, our method, starting from scratch, can design a novel network architecture that rivals the best human-invented architecture in terms of test set accuracy. Our CIFAR-10 model achieves a test error rate of 3.65, which is 0.09 percent better and 1.05x faster than the previous state-of-the-art model that used a similar architectural scheme. On the Penn Treebank dataset, our model can compose a novel recurrent cell that outperforms the widely-used LSTM cell, and other state-of-the-art baselines. Our cell achieves a test set perplexity of 62.4 on the Penn Treebank, which is 3.6 perplexity better than the previous state-of-the-art model. The cell can also be transferred to the character language modeling task on PTB and achieves a state-of-the-art perplexity of 1.214.
Tasks Image Classification, Language Modelling, Neural Architecture Search
Published 2016-11-05
URL http://arxiv.org/abs/1611.01578v2
PDF http://arxiv.org/pdf/1611.01578v2.pdf
PWC https://paperswithcode.com/paper/neural-architecture-search-with-reinforcement
Repo https://github.com/GiuliaLanzillotta/INAS
Framework pytorch
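
The controller-plus-REINFORCE loop is compact enough to sketch. Below is a minimal, illustrative PyTorch version: an LSTM controller samples a sequence of architecture decisions, a stubbed `train_and_eval` stands in for training the child network and returning its validation accuracy, and a moving-average baseline reduces gradient variance. All sizes, the optimizer settings, and the reward stub are assumptions for illustration, not the paper's configuration.

```python
import torch
import torch.nn as nn

class Controller(nn.Module):
    """Toy controller RNN: samples a sequence of architecture decisions."""
    def __init__(self, num_choices=4, hidden=64, steps=6):
        super().__init__()
        self.steps = steps
        self.embed = nn.Embedding(num_choices, hidden)
        self.cell = nn.LSTMCell(hidden, hidden)
        self.head = nn.Linear(hidden, num_choices)
        self.start = nn.Parameter(torch.zeros(1, hidden))

    def sample(self):
        h, c = torch.zeros_like(self.start), torch.zeros_like(self.start)
        x, actions, logps = self.start, [], []
        for _ in range(self.steps):
            h, c = self.cell(x, (h, c))
            dist = torch.distributions.Categorical(logits=self.head(h))
            a = dist.sample()
            actions.append(int(a))
            logps.append(dist.log_prob(a))
            x = self.embed(a)
        return actions, torch.stack(logps).sum()

def train_and_eval(arch):
    """Stub reward: replace with 'train the child network, return val accuracy'."""
    return float(torch.rand(()))

controller, baseline = Controller(), 0.0
opt = torch.optim.Adam(controller.parameters(), lr=3e-4)
for _ in range(5):
    arch, logp = controller.sample()
    reward = train_and_eval(arch)
    baseline = 0.9 * baseline + 0.1 * reward   # variance-reducing baseline
    loss = -(reward - baseline) * logp         # REINFORCE objective
    opt.zero_grad(); loss.backward(); opt.step()
```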

Global Neural CCG Parsing with Optimality Guarantees

Title Global Neural CCG Parsing with Optimality Guarantees
Authors Kenton Lee, Mike Lewis, Luke Zettlemoyer
Abstract We introduce the first global recursive neural parsing model with optimality guarantees during decoding. To support global features, we give up dynamic programs and instead search directly in the space of all possible subtrees. Although this space is exponentially large in the sentence length, we show it is possible to learn an efficient A* parser. We augment existing parsing models, which have informative bounds on the outside score, with a global model that has loose bounds but only needs to model non-local phenomena. The global model is trained with a new objective that encourages the parser to explore a tiny fraction of the search space. The approach is applied to CCG parsing, improving state-of-the-art accuracy by 0.4 F1. The parser finds the optimal parse for 99.9% of held-out sentences, exploring on average only 190 subtrees.
Tasks
Published 2016-07-05
URL http://arxiv.org/abs/1607.01432v2
PDF http://arxiv.org/pdf/1607.01432v2.pdf
PWC https://paperswithcode.com/paper/global-neural-ccg-parsing-with-optimality
Repo https://github.com/kentonl/neuralccg
Framework none
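
The A* idea is generic enough to sketch: keep an agenda ordered by inside score plus an admissible (never-underestimating) upper bound on the outside score; the first complete parse popped is then provably optimal, which is the guarantee the paper builds on. A minimal, paper-agnostic agenda loop, where `expand`, `heuristic`, and `is_goal` are placeholder callbacks you would supply:

```python
import heapq
import itertools

def astar_search(start_items, expand, heuristic, is_goal):
    """Generic A* agenda. Items carry an inside score; heuristic(item) is an
    admissible upper bound on the outside score, so the first goal popped
    is optimal. Returns (item, score, items_explored)."""
    tie = itertools.count()   # tiebreaker so the heap never compares items
    agenda = [(-(s + heuristic(it)), next(tie), s, it) for it, s in start_items]
    heapq.heapify(agenda)
    explored = 0
    while agenda:
        _, _, score, item = heapq.heappop(agenda)
        explored += 1
        if is_goal(item):
            return item, score, explored
        for nxt, nxt_score in expand(item, score):
            heapq.heappush(agenda,
                           (-(nxt_score + heuristic(nxt)), next(tie), nxt_score, nxt))
    return None, float("-inf"), explored

# Toy demo: build the string "abc", one letter (cost 1, i.e. score -1) at a time.
start = [("", 0.0)]
expand = lambda it, s: [(it + ch, s - 1.0) for ch in "abc"] if len(it) < 3 else []
heuristic = lambda it: -(3 - len(it))   # exact remaining score, hence admissible
print(astar_search(start, expand, heuristic, lambda it: it == "abc"))
```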

Learning to Match Using Local and Distributed Representations of Text for Web Search

Title Learning to Match Using Local and Distributed Representations of Text for Web Search
Authors Bhaskar Mitra, Fernando Diaz, Nick Craswell
Abstract Models such as latent semantic analysis and those based on neural embeddings learn distributed representations of text, and match the query against the document in the latent semantic space. In traditional information retrieval models, on the other hand, terms have discrete or local representations, and the relevance of a document is determined by the exact matches of query terms in the body text. We hypothesize that matching with distributed representations complements matching with traditional local representations, and that a combination of the two is favorable. We propose a novel document ranking model composed of two separate deep neural networks, one that matches the query and the document using a local representation, and another that matches the query and the document using learned distributed representations. The two networks are jointly trained as part of a single neural network. We show that this combination or 'duet' performs significantly better than either neural network individually on a Web page ranking task, and also significantly outperforms traditional baselines and other recently proposed models based on neural networks.
Tasks Document Ranking, Information Retrieval
Published 2016-10-26
URL http://arxiv.org/abs/1610.08136v1
PDF http://arxiv.org/pdf/1610.08136v1.pdf
PWC https://paperswithcode.com/paper/learning-to-match-using-local-and-distributed-1
Repo https://github.com/yongbowin/nlp-papernotes
Framework tf
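
The two-network structure is easy to sketch schematically: a local model scores a binary exact-match interaction matrix between query and document terms, a distributed model compares learned embeddings, and the two scores are summed and trained jointly. Layer sizes, pooling choices, and the way the scores are combined below are illustrative assumptions, not the paper's exact architecture:

```python
import torch
import torch.nn as nn

class Duet(nn.Module):
    """Schematic 'duet' ranker: local exact-match sub-network plus
    distributed embedding sub-network, scores summed, trained jointly."""
    def __init__(self, vocab=10000, dim=128, q_len=10, d_len=100):
        super().__init__()
        # local model: convolve the binary exact-match matrix
        self.local = nn.Sequential(
            nn.Conv1d(q_len, 32, kernel_size=3), nn.ReLU(),
            nn.AdaptiveMaxPool1d(1), nn.Flatten(), nn.Linear(32, 1))
        # distributed model: embed both texts, pool, compare
        self.embed = nn.Embedding(vocab, dim)
        self.dist = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU(),
                                  nn.Linear(dim, 1))

    def forward(self, q_ids, d_ids):
        # interaction[b, i, j] = 1 iff query term i equals document term j
        interaction = (q_ids.unsqueeze(2) == d_ids.unsqueeze(1)).float()
        local_score = self.local(interaction)
        q_vec = self.embed(q_ids).mean(1)
        d_vec = self.embed(d_ids).mean(1)
        dist_score = self.dist(torch.cat([q_vec, d_vec], dim=-1))
        return local_score + dist_score

q = torch.randint(0, 10000, (2, 10))
d = torch.randint(0, 10000, (2, 100))
print(Duet()(q, d).shape)   # (2, 1) relevance scores
```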

Phased LSTM: Accelerating Recurrent Network Training for Long or Event-based Sequences

Title Phased LSTM: Accelerating Recurrent Network Training for Long or Event-based Sequences
Authors Daniel Neil, Michael Pfeiffer, Shih-Chii Liu
Abstract Recurrent Neural Networks (RNNs) have become the state-of-the-art choice for extracting patterns from temporal sequences. However, current RNN models are ill-suited to process irregularly sampled data triggered by events generated in continuous time by sensors or other neurons. Such data can occur, for example, when the input comes from novel event-driven artificial sensors that generate sparse, asynchronous streams of events or from multiple conventional sensors with different update intervals. In this work, we introduce the Phased LSTM model, which extends the LSTM unit by adding a new time gate. This gate is controlled by a parametrized oscillation with a frequency range that produces updates of the memory cell only during a small percentage of the cycle. Even with the sparse updates imposed by the oscillation, the Phased LSTM network achieves faster convergence than regular LSTMs on tasks which require learning of long sequences. The model naturally integrates inputs from sensors of arbitrary sampling rates, thereby opening new areas of investigation for processing asynchronous sensory events that carry timing information. It also greatly improves the performance of LSTMs in standard RNN applications, and does so with an order-of-magnitude fewer computes at runtime.
Tasks
Published 2016-10-29
URL http://arxiv.org/abs/1610.09513v1
PDF http://arxiv.org/pdf/1610.09513v1.pdf
PWC https://paperswithcode.com/paper/phased-lstm-accelerating-recurrent-network
Repo https://github.com/philipperemy/tensorflow-phased-lstm
Framework tf
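
The time gate itself is a small piecewise function and worth seeing concretely. The sketch below follows the paper's definition: the phase within each oscillation cycle (period tau, shift s) determines whether the gate is rising, falling, or leaking with a small slope alpha; the cell is then updated only in proportion to the gate's openness. The plotting-style usage at the end is illustrative:

```python
import numpy as np

def time_gate(t, tau, s, r_on, alpha=1e-3):
    """Phased LSTM openness k(t): phi is the phase within the cycle of
    period tau and shift s; the gate opens for roughly the fraction r_on
    of each cycle and otherwise leaks with slope alpha."""
    phi = np.mod(t - s, tau) / tau
    return np.where(phi < 0.5 * r_on, 2.0 * phi / r_on,
           np.where(phi < r_on, 2.0 - 2.0 * phi / r_on, alpha * phi))

# Cell updates are then gated: c_t = k * c_proposed + (1 - k) * c_prev
t = np.linspace(0, 10, 1000)
k = time_gate(t, tau=2.0, s=0.0, r_on=0.1)
print(k.min(), k.max())   # the gate is fully open only briefly each cycle
```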

Data-driven HR - Résumé Analysis Based on Natural Language Processing and Machine Learning

Title Data-driven HR - Résumé Analysis Based on Natural Language Processing and Machine Learning
Authors Tim Zimmermann, Leo Kotschenreuther, Karsten Schmidt
Abstract Recruiters usually spend less than a minute looking at each résumé when deciding whether it's worth continuing the recruitment process with the candidate. Recruiters focus on keywords, and it's almost impossible to guarantee a fair process of candidate selection. The main scope of this paper is to tackle this issue by introducing a data-driven approach that shows how to process résumés automatically and give recruiters more time to examine only promising candidates. Furthermore, we show how to leverage Machine Learning and Natural Language Processing in order to extract all required information from the résumés. Once the information is extracted, a ranking score is calculated. The score describes how well the candidates fit based on their education, work experience and skills. Later this paper illustrates a prototype application that shows how this novel approach can increase the productivity of recruiters. The application enables them to filter and rank candidates based on predefined job descriptions. Guided by the ranking, recruiters can get deeper insights from candidate profiles and validate why and how the application ranked them. This application shows how to improve the hiring process by providing unbiased hiring decision support.
Tasks
Published 2016-06-17
URL http://arxiv.org/abs/1606.05611v2
PDF http://arxiv.org/pdf/1606.05611v2.pdf
PWC https://paperswithcode.com/paper/data-driven-hr-resume-analysis-based-on
Repo https://github.com/paszin/paszin.github.io
Framework none
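
A toy version of the fit score makes the idea concrete: compare extracted education, experience, and skills against a job description and combine them with fixed weights. All field names, the weights, and the scoring rules below are illustrative assumptions; the paper's actual features and model differ in detail:

```python
def fit_score(candidate, job, weights=(0.3, 0.4, 0.3)):
    """Toy candidate-job fit score: weighted mix of education match,
    capped experience ratio, and skill overlap. Illustrative only."""
    w_edu, w_exp, w_skill = weights
    edu = 1.0 if candidate["degree_level"] >= job["min_degree_level"] else 0.0
    exp = min(candidate["years_experience"] / job["years_required"], 1.0)
    skill = (len(set(candidate["skills"]) & set(job["skills"]))
             / max(len(job["skills"]), 1))
    return w_edu * edu + w_exp * exp + w_skill * skill

candidate = {"degree_level": 2, "years_experience": 4,
             "skills": {"python", "nlp", "sql"}}
job = {"min_degree_level": 2, "years_required": 3,
       "skills": {"python", "nlp", "spark"}}
print(fit_score(candidate, job))   # 0.3 + 0.4 + 0.3 * (2/3) = 0.9
```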

Rotation equivariant vector field networks

Title Rotation equivariant vector field networks
Authors Diego Marcos, Michele Volpi, Nikos Komodakis, Devis Tuia
Abstract In many computer vision tasks, we expect a particular behavior of the output with respect to rotations of the input image. If this relationship is explicitly encoded, instead of being treated as any other variation, the complexity of the problem is decreased, leading to a reduction in the size of the required model. In this paper, we propose the Rotation Equivariant Vector Field Networks (RotEqNet), a Convolutional Neural Network (CNN) architecture encoding rotation equivariance, invariance and covariance. Each convolutional filter is applied at multiple orientations and returns a vector field representing magnitude and angle of the highest scoring orientation at every spatial location. We develop a modified convolution operator relying on this representation to obtain deep architectures. We test RotEqNet on several problems requiring different responses with respect to the inputs’ rotation: image classification, biomedical image segmentation, orientation estimation and patch matching. In all cases, we show that RotEqNet offers extremely compact models in terms of number of parameters and provides results in line with those of networks orders of magnitude larger.
Tasks Image Classification, Semantic Segmentation
Published 2016-12-29
URL http://arxiv.org/abs/1612.09346v3
PDF http://arxiv.org/pdf/1612.09346v3.pdf
PWC https://paperswithcode.com/paper/rotation-equivariant-vector-field-networks
Repo https://github.com/COGMAR/RotEqNet
Framework pytorch
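
The core operator is simple to sketch: apply one filter at several orientations and keep, per pixel, the magnitude and angle of the best-scoring orientation, which together define the output vector field. The sketch below rotates the filter with scipy and is an illustration of the idea, not the paper's implementation:

```python
import numpy as np
from scipy.ndimage import correlate, rotate

def roteq_response(image, filt, n_orient=8):
    """Apply `filt` at n_orient rotations; return per-pixel magnitude and
    angle of the highest-scoring orientation (a 2-channel vector field)."""
    angles = np.arange(n_orient) * (360.0 / n_orient)
    responses = np.stack([
        correlate(image, rotate(filt, a, reshape=False), mode="nearest")
        for a in angles])                      # (n_orient, H, W)
    best = responses.argmax(axis=0)
    magnitude = responses.max(axis=0)
    angle = np.deg2rad(angles)[best]
    # equivalently as Cartesian components:
    # (magnitude * np.cos(angle), magnitude * np.sin(angle))
    return magnitude, angle

mag, ang = roteq_response(np.random.rand(32, 32), np.random.rand(5, 5))
print(mag.shape, ang.shape)   # (32, 32) each
```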

Low-Rank Inducing Norms with Optimality Interpretations

Title Low-Rank Inducing Norms with Optimality Interpretations
Authors Christian Grussler, Pontus Giselsson
Abstract Optimization problems with rank constraints appear in many diverse fields such as control, machine learning and image analysis. Since the rank constraint is non-convex, these problems are often approximately solved via convex relaxations. Nuclear norm regularization is the prevailing convexifying technique for dealing with these types of problem. This paper introduces a family of low-rank inducing norms and regularizers which includes the nuclear norm as a special case. A posteriori guarantees on solving an underlying rank constrained optimization problem with these convex relaxations are provided. We evaluate the performance of the low-rank inducing norms on three matrix completion problems. In all examples, the nuclear norm heuristic is outperformed by convex relaxations based on other low-rank inducing norms. For two of the problems there exist low-rank inducing norms that succeed in recovering the partially unknown matrix, while the nuclear norm fails. These low-rank inducing norms are shown to be representable as semi-definite programs. Moreover, these norms have cheaply computable proximal mappings, which makes it possible to also solve problems of large size using first-order methods.
Tasks Matrix Completion
Published 2016-12-09
URL http://arxiv.org/abs/1612.03186v2
PDF http://arxiv.org/pdf/1612.03186v2.pdf
PWC https://paperswithcode.com/paper/low-rank-inducing-norms-with-optimality
Repo https://github.com/LowRankOpt/LRINorm
Framework none
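
Since the nuclear norm is a special case of the family, its proximal mapping (singular-value soft-thresholding) is the natural baseline to show; the paper's general low-rank inducing norms have more involved but, per the abstract, still cheaply computable proximal mappings. A sketch of the special case:

```python
import numpy as np

def prox_nuclear(X, gamma):
    """Proximal mapping of gamma * ||X||_* : soft-threshold the singular
    values. This is the nuclear-norm special case of the paper's family;
    the general low-rank inducing norms need a different prox."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U @ np.diag(np.maximum(s - gamma, 0.0)) @ Vt

# One proximal-gradient step for matrix completion with observation mask M:
#   X <- prox_nuclear(X - step * M * (X - X_observed), step * gamma)
X = np.random.rand(10, 8)
print(np.linalg.matrix_rank(prox_nuclear(X, 1.0)))   # rank drops as gamma grows
```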

A Minimax Approach to Supervised Learning

Title A Minimax Approach to Supervised Learning
Authors Farzan Farnia, David Tse
Abstract Given a task of predicting $Y$ from $X$, a loss function $L$, and a set of probability distributions $\Gamma$ on $(X,Y)$, what is the optimal decision rule minimizing the worst-case expected loss over $\Gamma$? In this paper, we address this question by introducing a generalization of the principle of maximum entropy. Applying this principle to sets of distributions with marginal on $X$ constrained to be the empirical marginal from the data, we develop a general minimax approach for supervised learning problems. While for some loss functions such as squared-error and log loss, the minimax approach rederives well-known regression models, for the 0-1 loss it results in a new linear classifier which we call the maximum entropy machine. The maximum entropy machine minimizes the worst-case 0-1 loss over the structured set of distributions, and in our numerical experiments can outperform other well-known linear classifiers such as SVM. We also prove a bound on the generalization worst-case error in the minimax approach.
Tasks
Published 2016-06-07
URL http://arxiv.org/abs/1606.02206v5
PDF http://arxiv.org/pdf/1606.02206v5.pdf
PWC https://paperswithcode.com/paper/a-minimax-approach-to-supervised-learning
Repo https://github.com/KaloshinPE/MEM_detector
Framework none
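
For concreteness, the minimax rule the abstract describes can be written compactly (notation is mine; $\hat{P}_X$ denotes the empirical marginal of $X$ from the data):

$$\delta^\star \;=\; \arg\min_{\delta}\; \max_{\substack{P \in \Gamma \\ P_X = \hat{P}_X}} \; \mathbb{E}_P\big[\, L(Y, \delta(X)) \,\big]$$

With $L$ the squared-error or log loss this recovers familiar regression models; with the 0-1 loss it yields the paper's maximum entropy machine.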

Progressive Attention Networks for Visual Attribute Prediction

Title Progressive Attention Networks for Visual Attribute Prediction
Authors Paul Hongsuck Seo, Zhe Lin, Scott Cohen, Xiaohui Shen, Bohyung Han
Abstract We propose a novel attention model that can accurately attend to target objects of various scales and shapes in images. The model is trained to gradually suppress irrelevant regions in an input image via a progressive attentive process over multiple layers of a convolutional neural network. The attentive process in each layer determines whether to pass or block features at certain spatial locations for use in the subsequent layers. The proposed progressive attention mechanism works well especially when combined with hard attention. We further employ local contexts to incorporate neighborhood features of each location and estimate a better attention probability map. The experiments on synthetic and real datasets show that the proposed attention networks outperform traditional attention methods in visual attribute prediction tasks.
Tasks
Published 2016-06-08
URL http://arxiv.org/abs/1606.02393v5
PDF http://arxiv.org/pdf/1606.02393v5.pdf
PWC https://paperswithcode.com/paper/progressive-attention-networks-for-visual
Repo https://github.com/hworang77/PAN
Framework none
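
The progressive pass-or-block mechanism can be sketched in a few lines: each stage computes a soft map in [0, 1] from the current feature map (here via a small conv over a local context window plus a sigmoid) and multiplies it into the features, so irrelevant regions are suppressed a little more at every stage. Layer choices below are illustrative assumptions, not the paper's exact architecture:

```python
import torch
import torch.nn as nn

class ProgressiveAttention(nn.Module):
    """Sketch: alternate feature extraction with per-location soft gating
    so irrelevant regions are progressively suppressed across stages."""
    def __init__(self, channels=(3, 16, 32), context=3):
        super().__init__()
        self.convs = nn.ModuleList(
            [nn.Conv2d(ci, co, 3, padding=1)
             for ci, co in zip(channels[:-1], channels[1:])])
        # attention estimated from a local context window around each location
        self.atts = nn.ModuleList(
            [nn.Conv2d(co, 1, context, padding=context // 2)
             for co in channels[1:]])

    def forward(self, x):
        for conv, att in zip(self.convs, self.atts):
            x = torch.relu(conv(x))
            x = x * torch.sigmoid(att(x))   # pass or suppress per location
        return x

print(ProgressiveAttention()(torch.rand(1, 3, 32, 32)).shape)  # (1, 32, 32, 32)
```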

Deep convolutional neural networks for predominant instrument recognition in polyphonic music

Title Deep convolutional neural networks for predominant instrument recognition in polyphonic music
Authors Yoonchang Han, Jaehun Kim, Kyogu Lee
Abstract Identifying musical instruments in polyphonic music recordings is a challenging but important problem in the field of music information retrieval. It enables music search by instrument, helps recognize musical genres, or can make music transcription easier and more accurate. In this paper, we present a convolutional neural network framework for predominant instrument recognition in real-world polyphonic music. We train our network from fixed-length music excerpts with a single-labeled predominant instrument and estimate an arbitrary number of predominant instruments from an audio signal with a variable length. To obtain the audio-excerpt-wise result, we aggregate multiple outputs from sliding windows over the test audio. In doing so, we investigated two different aggregation methods: one takes the average for each instrument and the other takes the instrument-wise sum followed by normalization. In addition, we conducted extensive experiments on several important factors that affect the performance, including analysis window size, identification threshold, and activation functions for neural networks to find the optimal set of parameters. Using a dataset of 10k audio excerpts from 11 instruments for evaluation, we found that convolutional neural networks are more robust than conventional methods that exploit spectral features and source separation with support vector machines. Experimental results showed that the proposed convolutional network architecture obtained micro and macro F1 measures of 0.602 and 0.503, respectively, improving on other state-of-the-art algorithms by 19.6% and 16.4%.
Tasks Information Retrieval, Music Information Retrieval
Published 2016-05-31
URL http://arxiv.org/abs/1605.09507v3
PDF http://arxiv.org/pdf/1605.09507v3.pdf
PWC https://paperswithcode.com/paper/deep-convolutional-neural-networks-for-6
Repo https://github.com/iooops/CS221-Audio-Tagging
Framework none
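
The two aggregation strategies are simple to write down. In the sketch below, window outputs are a (windows x instruments) matrix of sigmoid activations; dividing the summed activations by their maximum is one plausible reading of the normalization step, and the 0.5 threshold is illustrative:

```python
import numpy as np

def aggregate_mean(window_probs):
    """Method 1: average the sliding-window outputs per instrument."""
    return window_probs.mean(axis=0)

def aggregate_sum_norm(window_probs):
    """Method 2: instrument-wise sum followed by normalization
    (here: divide by the maximum, an assumption on the exact scheme)."""
    s = window_probs.sum(axis=0)
    return s / s.max()

# window_probs: (n_windows, n_instruments) activations over one excerpt
probs = np.random.rand(20, 11)
pred_mean = aggregate_mean(probs) > 0.5        # identification threshold
pred_sum = aggregate_sum_norm(probs) > 0.5
print(pred_mean.astype(int), pred_sum.astype(int))
```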

Entity Embeddings of Categorical Variables

Title Entity Embeddings of Categorical Variables
Authors Cheng Guo, Felix Berkhahn
Abstract We map categorical variables in a function approximation problem into Euclidean spaces, which are the entity embeddings of the categorical variables. The mapping is learned by a neural network during the standard supervised training process. Entity embedding not only reduces memory usage and speeds up neural networks compared with one-hot encoding, but more importantly by mapping similar values close to each other in the embedding space it reveals the intrinsic properties of the categorical variables. We applied it successfully in a recent Kaggle competition and were able to reach the third position with relatively simple features. We further demonstrate in this paper that entity embedding helps the neural network to generalize better when the data is sparse and statistics are unknown. Thus it is especially useful for datasets with many high-cardinality features, where other methods tend to overfit. We also demonstrate that the embeddings obtained from the trained neural network boost the performance of all tested machine learning methods considerably when used as the input features instead. As entity embedding defines a distance measure for categorical variables, it can be used for visualizing categorical data and for data clustering.
Tasks Entity Embeddings
Published 2016-04-22
URL http://arxiv.org/abs/1604.06737v1
PDF http://arxiv.org/pdf/1604.06737v1.pdf
PWC https://paperswithcode.com/paper/entity-embeddings-of-categorical-variables
Repo https://github.com/ajinkyaT/Deep_learning_Entity_Embeddings
Framework none
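
The technique itself is a one-liner per column: give each categorical variable its own embedding table instead of a one-hot encoding, concatenate the learned vectors with any numeric features, and train the whole network end to end. Cardinalities, dimensions, and the sizing rule of thumb below are illustrative assumptions:

```python
import torch
import torch.nn as nn

class TabularNet(nn.Module):
    """Entity embeddings for tabular data: one embedding table per
    categorical column, concatenated with numeric features into an MLP."""
    def __init__(self, cardinalities=(7, 1000, 12), num_numeric=5):
        super().__init__()
        # rule of thumb (an assumption): dim grows sublinearly, capped at 50
        self.embeds = nn.ModuleList(
            [nn.Embedding(c, min(50, (c + 1) // 2)) for c in cardinalities])
        emb_dim = sum(e.embedding_dim for e in self.embeds)
        self.mlp = nn.Sequential(
            nn.Linear(emb_dim + num_numeric, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, cats, nums):
        # cats: (batch, n_cat) integer codes; nums: (batch, num_numeric)
        x = torch.cat([e(cats[:, i]) for i, e in enumerate(self.embeds)], dim=1)
        return self.mlp(torch.cat([x, nums], dim=1))

cats = torch.stack([torch.randint(0, c, (8,)) for c in (7, 1000, 12)], dim=1)
print(TabularNet()(cats, torch.rand(8, 5)).shape)   # (8, 1)
```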

Subcategory-aware Convolutional Neural Networks for Object Proposals and Detection

Title Subcategory-aware Convolutional Neural Networks for Object Proposals and Detection
Authors Yu Xiang, Wongun Choi, Yuanqing Lin, Silvio Savarese
Abstract In CNN-based object detection methods, region proposal becomes a bottleneck when objects exhibit significant scale variation, occlusion or truncation. In addition, these methods mainly focus on 2D object detection and cannot estimate detailed properties of objects. In this paper, we propose subcategory-aware CNNs for object detection. We introduce a novel region proposal network that uses subcategory information to guide the proposal generating process, and a new detection network for joint detection and subcategory classification. By using subcategories related to object pose, we achieve state-of-the-art performance on both detection and pose estimation on commonly used benchmarks.
Tasks Object Detection, Pose Estimation
Published 2016-04-16
URL http://arxiv.org/abs/1604.04693v3
PDF http://arxiv.org/pdf/1604.04693v3.pdf
PWC https://paperswithcode.com/paper/subcategory-aware-convolutional-neural
Repo https://github.com/xiaohaoChen/rrc_detection
Framework none
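
A minimal sketch of the guiding idea: instead of a binary objectness map, the proposal head scores every location for each subcategory (e.g. pose-specific object clusters), and each location's objectness is its best foreground subcategory response. The head below is an illustration of that mechanism, not the paper's network:

```python
import torch
import torch.nn as nn

class SubcategoryProposalHead(nn.Module):
    """Score each spatial location for K subcategories plus background;
    rank proposals by the best foreground subcategory response."""
    def __init__(self, in_ch=256, num_subcategories=24):
        super().__init__()
        self.score = nn.Conv2d(in_ch, num_subcategories + 1, 1)  # +1 background

    def forward(self, feats):
        heat = self.score(feats)                  # (B, K+1, H, W)
        objectness, subcat = heat[:, 1:].max(dim=1)
        return objectness, subcat                 # per-location score and label

obj, sub = SubcategoryProposalHead()(torch.rand(1, 256, 38, 50))
print(obj.shape, sub.shape)   # (1, 38, 50) each
```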

Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding

Title Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding
Authors Akira Fukui, Dong Huk Park, Daylen Yang, Anna Rohrbach, Trevor Darrell, Marcus Rohrbach
Abstract Modeling textual or visual information with vector representations trained from large language or visual datasets has been successfully explored in recent years. However, tasks such as visual question answering require combining these vector representations with each other. Approaches to multimodal pooling include element-wise product or sum, as well as concatenation of the visual and textual representations. We hypothesize that these methods are not as expressive as an outer product of the visual and textual vectors. As the outer product is typically infeasible due to its high dimensionality, we instead propose utilizing Multimodal Compact Bilinear pooling (MCB) to efficiently and expressively combine multimodal features. We extensively evaluate MCB on the visual question answering and grounding tasks. We consistently show the benefit of MCB over ablations without MCB. For visual question answering, we present an architecture which uses MCB twice, once for predicting attention over spatial features and again to combine the attended representation with the question representation. This model outperforms the state-of-the-art on the Visual7W dataset and the VQA challenge.
Tasks Visual Question Answering
Published 2016-06-06
URL http://arxiv.org/abs/1606.01847v3
PDF http://arxiv.org/pdf/1606.01847v3.pdf
PWC https://paperswithcode.com/paper/multimodal-compact-bilinear-pooling-for
Repo https://github.com/MarcBS/keras
Framework none
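
The compact bilinear trick that makes the outer product tractable is worth sketching: project each modality with a Count Sketch (random hash plus random signs), then approximate the outer product by circularly convolving the two sketches, computed as an element-wise product in the FFT domain. Dimensions and seeds below are illustrative:

```python
import numpy as np

def count_sketch(x, h, s, d):
    """Project x to d dims: out[h[i]] += s[i] * x[i] (Count Sketch)."""
    out = np.zeros(d)
    np.add.at(out, h, s * x)
    return out

def mcb(x, y, d=512, seed=0):
    """Compact bilinear pooling of two feature vectors: the sketch of the
    outer product x (x) y equals the circular convolution of the two
    Count Sketches, done cheaply via FFT."""
    rng = np.random.default_rng(seed)
    hx = rng.integers(0, d, x.size); sx = rng.choice([-1.0, 1.0], x.size)
    hy = rng.integers(0, d, y.size); sy = rng.choice([-1.0, 1.0], y.size)
    fx = np.fft.rfft(count_sketch(x, hx, sx, d))
    fy = np.fft.rfft(count_sketch(y, hy, sy, d))
    return np.fft.irfft(fx * fy, n=d)

z = mcb(np.random.rand(2048), np.random.rand(300))   # visual x textual
print(z.shape)   # (512,) fused representation
```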

Variational Latent Gaussian Process for Recovering Single-Trial Dynamics from Population Spike Trains

Title Variational Latent Gaussian Process for Recovering Single-Trial Dynamics from Population Spike Trains
Authors Yuan Zhao, Il Memming Park
Abstract When governed by underlying low-dimensional dynamics, the interdependence of a simultaneously recorded population of neurons can be explained by a small number of shared factors, or a low-dimensional trajectory. Recovering these latent trajectories, particularly from single-trial population recordings, may help us understand the dynamics that drive neural computation. However, due to the biophysical constraints and noise in the spike trains, inferring trajectories from data is a challenging statistical problem in general. Here, we propose a practical and efficient inference method, called the variational latent Gaussian process (vLGP). The vLGP combines a generative model with a history-dependent point process observation together with a smoothness prior on the latent trajectories. The vLGP improves upon earlier methods for recovering latent trajectories, which assume either observation models inappropriate for point processes or linear dynamics. We compare and validate vLGP on both simulated datasets and population recordings from the primary visual cortex. In the V1 dataset, we find that vLGP achieves substantially higher performance than previous methods for predicting omitted spike trains, as well as capturing both the toroidal topology of the visual stimulus space and the noise correlations. These results show that vLGP is a robust method with a potential to reveal hidden neural dynamics from large-scale neural recordings.
Tasks Point Processes
Published 2016-04-11
URL http://arxiv.org/abs/1604.03053v5
PDF http://arxiv.org/pdf/1604.03053v5.pdf
PWC https://paperswithcode.com/paper/variational-latent-gaussian-process-for
Repo https://github.com/catniplab/vLGP
Framework none
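
A toy forward model clarifies what is being inferred: smooth latent trajectories drive spike counts through a loading matrix, with a spike-history term that makes the observation a point process rather than plain Poisson regression. Everything below is an illustrative simulation (a random walk stands in for GP draws); the paper's contribution, the variational inference, is not shown:

```python
import numpy as np

def simulate_vlgp_style(T=200, n_latent=2, n_neurons=20, seed=0):
    """Generate spikes from smooth latents via log-linear rates with a
    one-bin self-history term. Parameters are illustrative."""
    rng = np.random.default_rng(seed)
    z = np.cumsum(rng.normal(0, 0.05, (T, n_latent)), axis=0)  # smooth latents
    C = rng.normal(0, 1, (n_latent, n_neurons))                # loading matrix
    b = np.full(n_neurons, -2.0)                               # baseline log-rate
    w_hist = -1.0                          # self-inhibition from the last bin
    spikes = np.zeros((T, n_neurons))
    for t in range(T):
        hist = w_hist * (spikes[t - 1] if t > 0 else 0.0)
        rate = np.exp(z[t] @ C + b + hist)
        spikes[t] = rng.poisson(rate)
    return z, spikes

z, spikes = simulate_vlgp_style()
print(z.shape, spikes.shape)   # (200, 2) latents, (200, 20) spike counts
```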

Image Restoration Using Convolutional Auto-encoders with Symmetric Skip Connections

Title Image Restoration Using Convolutional Auto-encoders with Symmetric Skip Connections
Authors Xiao-Jiao Mao, Chunhua Shen, Yu-Bin Yang
Abstract Image restoration, including image denoising, super resolution, inpainting, and so on, is a well-studied problem in computer vision and image processing, as well as a test bed for low-level image modeling algorithms. In this work, we propose a very deep fully convolutional auto-encoder network for image restoration, which is an encoding-decoding framework with symmetric convolutional-deconvolutional layers. In other words, the network is composed of multiple layers of convolution and de-convolution operators, learning end-to-end mappings from corrupted images to the original ones. The convolutional layers capture the abstraction of image contents while eliminating corruptions. Deconvolutional layers have the capability to upsample the feature maps and recover the image details. To deal with the problem that deeper networks tend to be more difficult to train, we propose to symmetrically link convolutional and deconvolutional layers with skip-layer connections, with which the training converges much faster and attains better results.
Tasks Denoising, Image Denoising, Image Restoration, Super-Resolution
Published 2016-06-29
URL http://arxiv.org/abs/1606.08921v3
PDF http://arxiv.org/pdf/1606.08921v3.pdf
PWC https://paperswithcode.com/paper/image-restoration-using-convolutional-auto
Repo https://github.com/titu1994/Image-Super-Resolution
Framework tf
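
The symmetric skip pattern fits in a few lines: each deconvolutional layer also receives the feature map of its mirrored convolutional layer, added before the nonlinearity, which eases training of deep stacks and carries image detail to the decoder. Depth and width below are illustrative, far shallower than the paper's networks:

```python
import torch
import torch.nn as nn

class ConvDeconvSkip(nn.Module):
    """Minimal symmetric conv/deconv auto-encoder with skip connections
    linking each conv layer to its mirrored deconv layer."""
    def __init__(self, ch=64, depth=3):
        super().__init__()
        self.encs = nn.ModuleList(
            [nn.Conv2d(3 if i == 0 else ch, ch, 3, padding=1)
             for i in range(depth)])
        self.decs = nn.ModuleList(
            [nn.ConvTranspose2d(ch, ch if i < depth - 1 else 3, 3, padding=1)
             for i in range(depth)])

    def forward(self, x):
        skips = []
        for enc in self.encs:
            x = torch.relu(enc(x))
            skips.append(x)
        skips.pop()                     # innermost features are x itself
        for dec in self.decs[:-1]:
            x = torch.relu(dec(x) + skips.pop())   # symmetric skip connection
        return self.decs[-1](x)

print(ConvDeconvSkip()(torch.rand(1, 3, 64, 64)).shape)   # (1, 3, 64, 64)
```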