May 6, 2019


Paper Group ANR 416

Active Long Term Memory Networks. Training Recurrent Answering Units with Joint Loss Minimization for VQA. Semi-supervised structured output prediction by local linear regression and sub-gradient descent. Semantic Object Parsing with Graph LSTM. PARAPH: Presentation Attack Rejection by Analyzing Polarization Hypotheses. Debugging Machine Learning T …

Active Long Term Memory Networks


Title	Active Long Term Memory Networks
Authors	Tommaso Furlanello, Jiaping Zhao, Andrew M. Saxe, Laurent Itti, Bosco S. Tjan
Abstract	Continual Learning in artificial neural networks suffers from interference and forgetting when different tasks are learned sequentially. This paper introduces the Active Long Term Memory Networks (A-LTM), a model of sequential multi-task deep learning that is able to maintain previously learned associations between sensory input and behavioral output while acquiring new knowledge. A-LTM exploits the non-convex nature of deep neural networks and actively maintains knowledge of previously learned, inactive tasks using a distillation loss. Distortions of the learned input-output map are penalized but hidden layers are free to traverse towards new local optima that are more favorable for the multi-task objective. We re-frame McClelland’s seminal Hippocampal theory with respect to the Catastrophic Interference (CI) behavior exhibited by modern deep architectures trained with back-propagation and inhomogeneous sampling of latent factors across epochs. We present empirical results of non-trivial CI during continual learning in Deep Linear Networks trained on the same task, in Convolutional Neural Networks when the task shifts from predicting semantic to graphical factors, and during domain adaptation from simple to complex environments. We present results of the A-LTM model’s ability to maintain viewpoint recognition learned in the highly controlled iLab-20M dataset with 10 object categories and 88 camera viewpoints, while adapting to the unstructured domain of ImageNet with 1,000 object categories.
Tasks	Continual Learning, Domain Adaptation
Published	2016-06-07
URL	http://arxiv.org/abs/1606.02355v1
PDF	http://arxiv.org/pdf/1606.02355v1.pdf
PWC	https://paperswithcode.com/paper/active-long-term-memory-networks
Repo
Framework
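
The distillation loss mentioned in the abstract can be illustrated with a minimal sketch. It assumes the common soft-target cross-entropy with a temperature, which the abstract does not specify; the function names and the weight `lam` are hypothetical and not taken from the paper.

```python
import math

def softmax(logits, temperature=1.0):
    # Numerically stable softmax with temperature scaling.
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(old_logits, new_logits, temperature=2.0):
    # Cross-entropy between the frozen old network's soft targets and the
    # current network's predictions on the old task (assumed form).
    targets = softmax(old_logits, temperature)
    preds = softmax(new_logits, temperature)
    return -sum(t * math.log(p) for t, p in zip(targets, preds))

def joint_loss(new_task_loss, old_logits, new_logits, lam=1.0):
    # Hypothetical combined objective: new-task loss plus a penalty on
    # drift of the previously learned input-output map.
    return new_task_loss + lam * distillation_loss(old_logits, new_logits)
```

Only the output distribution is penalized here, so in a sketch like this the hidden layers remain free to change as long as the old task's outputs are preserved.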

Training Recurrent Answering Units with Joint Loss Minimization for VQA


Title	Training Recurrent Answering Units with Joint Loss Minimization for VQA
Authors	Hyeonwoo Noh, Bohyung Han
Abstract	We propose a novel algorithm for visual question answering based on a recurrent deep neural network, where every module in the network corresponds to a complete answering unit with its own attention mechanism. The network is optimized by minimizing the loss aggregated from all the units, which share model parameters while receiving different information to compute attention probability. For training, our model attends to a region within the image feature map, updates its memory based on the question and attended image feature, and answers the question based on its memory state. This procedure is performed to compute the loss in each step. The motivation of this approach is our observation that multi-step inferences are often required to answer questions while each problem may have a unique desirable number of steps, which is difficult to identify in practice. Hence, we always make the first unit in the network solve problems, but allow it to learn the knowledge from the rest of the units by backpropagation unless it degrades the model. To implement this idea, we early-stop training each unit as soon as it starts to overfit. Note that, since more complex models tend to overfit on easier questions quickly, the last answering unit in the unfolded recurrent neural network is typically killed first while the first one remains last. We make a single-step prediction for a new question using the shared model. This strategy works better than the other options within our framework since the selected model is trained effectively from all units without overfitting. The proposed algorithm outperforms other multi-step attention based approaches using a single-step prediction on the VQA dataset.
Tasks	Question Answering, Visual Question Answering
Published	2016-06-12
URL	http://arxiv.org/abs/1606.03647v2
PDF	http://arxiv.org/pdf/1606.03647v2.pdf
PWC	https://paperswithcode.com/paper/training-recurrent-answering-units-with-joint
Repo
Framework

Semi-supervised structured output prediction by local linear regression and sub-gradient descent


Title	Semi-supervised structured output prediction by local linear regression and sub-gradient descent
Authors	Ru-Ze Liang, Wei Xie, Weizhi Li, Xin Du, Jim Jing-Yan Wang, Jingbin Wang
Abstract	We propose a novel semi-supervised structured output prediction method based on local linear regression. Existing semi-supervised structured output prediction methods learn a global predictor for all the data points in a data set, which ignores the differences between local distributions of the data set and their effects on the structured output prediction. To solve this problem, we propose to learn the missing structured outputs and local predictors for neighborhoods of different data points jointly. Using the local linear regression strategy, in the neighborhood of each data point, we propose to learn a local linear predictor by minimizing both the complexity of the predictor and the upper bound of the structured prediction loss. The minimization problem is solved by sub-gradient descent algorithms. We conduct experiments over two benchmark data sets, and the results show the advantages of the proposed method.
Tasks	Structured Prediction
Published	2016-06-07
URL	http://arxiv.org/abs/1606.02279v3
PDF	http://arxiv.org/pdf/1606.02279v3.pdf
PWC	https://paperswithcode.com/paper/semi-supervised-structured-output-prediction
Repo
Framework

Semantic Object Parsing with Graph LSTM


Title	Semantic Object Parsing with Graph LSTM
Authors	Xiaodan Liang, Xiaohui Shen, Jiashi Feng, Liang Lin, Shuicheng Yan
Abstract	By taking the semantic object parsing task as an exemplar application scenario, we propose the Graph Long Short-Term Memory (Graph LSTM) network, which is the generalization of LSTM from sequential data or multi-dimensional data to general graph-structured data. Particularly, instead of evenly and fixedly dividing an image into pixels or patches as in existing multi-dimensional LSTM structures (e.g., Row, Grid and Diagonal LSTMs), we take each arbitrary-shaped superpixel as a semantically consistent node, and adaptively construct an undirected graph for each image, where the spatial relations of the superpixels are naturally used as edges. Constructed on such an adaptive graph topology, the Graph LSTM is more naturally aligned with the visual patterns in the image (e.g., object boundaries or appearance similarities) and provides a more economical information propagation route. Furthermore, for each optimization step over Graph LSTM, we propose to use a confidence-driven scheme to update the hidden and memory states of nodes progressively till all nodes are updated. In addition, for each node, the forget gates are adaptively learned to capture different degrees of semantic correlation with neighboring nodes. Comprehensive evaluations on four diverse semantic object parsing datasets demonstrate the significant superiority of our Graph LSTM over other state-of-the-art solutions.
Tasks
Published	2016-03-23
URL	http://arxiv.org/abs/1603.07063v1
PDF	http://arxiv.org/pdf/1603.07063v1.pdf
PWC	https://paperswithcode.com/paper/semantic-object-parsing-with-graph-lstm
Repo
Framework

PARAPH: Presentation Attack Rejection by Analyzing Polarization Hypotheses


Title	PARAPH: Presentation Attack Rejection by Analyzing Polarization Hypotheses
Authors	Ethan M. Rudd, Manuel Gunther, Terrance E. Boult
Abstract	For applications such as airport border control, biometric technologies that can process many capture subjects quickly, efficiently, with weak supervision, and with minimal discomfort are desirable. Facial recognition is particularly appealing because it is minimally invasive yet offers relatively good recognition performance. Unfortunately, the combination of weak supervision and minimal invasiveness makes even highly accurate facial recognition systems susceptible to spoofing via presentation attacks. Thus, there is great demand for an effective and low cost system capable of rejecting such attacks. To this end we introduce PARAPH, a novel hardware extension that exploits different measurements of light polarization to yield an image space in which presentation media are readily discernible from bona fide facial characteristics. The PARAPH system is inexpensive with an added cost of less than 10 US dollars. The system makes two polarization measurements in rapid succession, allowing them to be approximately pixel-aligned, with a frame rate limited by the camera, not the system. There are no moving parts above the molecular level, due to the efficient use of twisted nematic liquid crystals. We present evaluation images using three presentation attack media next to an actual face: high quality photos on glossy and matte paper and a video of the face on an LCD. In each case, the actual face in the image generated by PARAPH is structurally discernible from the presentations, which appear either as noise (print attacks) or saturated images (replay attacks).
Tasks
Published	2016-05-10
URL	http://arxiv.org/abs/1605.03124v1
PDF	http://arxiv.org/pdf/1605.03124v1.pdf
PWC	https://paperswithcode.com/paper/paraph-presentation-attack-rejection-by
Repo
Framework

Debugging Machine Learning Tasks


Title	Debugging Machine Learning Tasks
Authors	Aleksandar Chakarov, Aditya Nori, Sriram Rajamani, Shayak Sen, Deepak Vijaykeerthy
Abstract	Unlike traditional programs (such as operating systems or word processors) which have large amounts of code, machine learning tasks use programs with relatively small amounts of code (written in machine learning libraries), but voluminous amounts of data. Just like developers of traditional programs debug errors in their code, developers of machine learning tasks debug and fix errors in their data. However, algorithms and tools for debugging and fixing errors in data are less common, when compared to their counterparts for detecting and fixing errors in code. In this paper, we consider classification tasks where errors in training data lead to misclassifications in test points, and propose an automated method to find the root causes of such misclassifications. Our root cause analysis is based on Pearl’s theory of causation, and uses Pearl’s PS (Probability of Sufficiency) as a scoring metric. Our implementation, Psi, encodes the computation of PS as a probabilistic program, and uses recent work on probabilistic programs and transformations on probabilistic programs (along with gray-box models of machine learning algorithms) to efficiently compute PS. Psi is able to identify root causes of data errors in interesting data sets.
Tasks
Published	2016-03-23
URL	http://arxiv.org/abs/1603.07292v1
PDF	http://arxiv.org/pdf/1603.07292v1.pdf
PWC	https://paperswithcode.com/paper/debugging-machine-learning-tasks
Repo
Framework

Unsupervised Cross-Media Hashing with Structure Preservation


Title	Unsupervised Cross-Media Hashing with Structure Preservation
Authors	Xiangyu Wang, Alex Yong-Sang Chia
Abstract	Recent years have seen the exponential growth of heterogeneous multimedia data. The need for effective and accurate data retrieval from heterogeneous data sources has attracted much research interest in cross-media retrieval. Here, given a query of any media type, cross-media retrieval seeks to find relevant results of different media types from heterogeneous data sources. To facilitate large-scale cross-media retrieval, we propose a novel unsupervised cross-media hashing method. Our method incorporates local affinity and distance repulsion constraints into a matrix factorization framework. Correspondingly, the proposed method learns hash functions that generate unified hash codes from different media types, while ensuring that the intrinsic geometric structure of the data distribution is preserved. These hash codes allow the similarity between data of different media types to be evaluated directly. Experimental results on two large-scale multimedia datasets demonstrate the effectiveness of the proposed method, where we outperform the state-of-the-art methods.
Tasks
Published	2016-03-18
URL	http://arxiv.org/abs/1603.05782v1
PDF	http://arxiv.org/pdf/1603.05782v1.pdf
PWC	https://paperswithcode.com/paper/unsupervised-cross-media-hashing-with
Repo
Framework
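
To illustrate how unified hash codes let items of different media types be compared directly, here is a minimal sketch using Hamming distance. It assumes the codes are binary and stored as Python integers, and that some learned hash functions (not shown, and not the paper's) have already produced them; the item identifiers are hypothetical.

```python
def hamming_distance(code_a: int, code_b: int) -> int:
    # Number of bit positions in which two binary hash codes differ.
    return bin(code_a ^ code_b).count("1")

def rank_by_hamming(query_code, database):
    # database: list of (item_id, code) pairs; items may come from any
    # media type because all codes live in the same Hamming space.
    return sorted(database, key=lambda item: hamming_distance(query_code, item[1]))

# Hypothetical usage: a text query code ranked against image codes.
image_codes = [("img_a", 0b1100), ("img_b", 0b1011), ("img_c", 0b0011)]
ranked = rank_by_hamming(0b1010, image_codes)
```

Because the comparison is a bitwise XOR followed by a bit count, a ranking like this stays cheap even for large collections.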

Towards Music Captioning: Generating Music Playlist Descriptions


Title	Towards Music Captioning: Generating Music Playlist Descriptions
Authors	Keunwoo Choi, George Fazekas, Brian McFee, Kyunghyun Cho, Mark Sandler
Abstract	Descriptions are often provided along with recommendations to help users’ discovery. Recommending automatically generated music playlists (e.g. personalised playlists) introduces the problem of generating descriptions. In this paper, we propose a method for generating music playlist descriptions, which we call music captioning. In the proposed method, audio content analysis and natural language processing are adopted to utilise the information of each track.
Tasks
Published	2016-08-17
URL	http://arxiv.org/abs/1608.04868v2
PDF	http://arxiv.org/pdf/1608.04868v2.pdf
PWC	https://paperswithcode.com/paper/towards-music-captioning-generating-music
Repo
Framework

Unit Dependency Graph and its Application to Arithmetic Word Problem Solving


Title	Unit Dependency Graph and its Application to Arithmetic Word Problem Solving
Authors	Subhro Roy, Dan Roth
Abstract	Math word problems provide a natural abstraction to a range of natural language understanding problems that involve reasoning about quantities, such as interpreting election results, news about casualties, and the financial section of a newspaper. Units associated with the quantities often provide information that is essential to support this reasoning. This paper proposes a principled way to capture and reason about units and shows how it can benefit an arithmetic word problem solver. This paper presents the concept of Unit Dependency Graphs (UDGs), which provide a compact representation of the dependencies between units of numbers mentioned in a given problem. Inducing the UDG alleviates the brittleness of the unit extraction system and allows for a natural way to leverage domain knowledge about unit compatibility for word problem solving. We introduce a decomposed model for inducing UDGs with minimal additional annotations, and use it to augment the expressions used in the arithmetic word problem solver of (Roy and Roth 2015) via a constrained inference framework. We show that the introduction of UDGs reduces the error of the solver by over 10%, surpassing all existing systems for solving arithmetic word problems. In addition, it also makes the system more robust to adaptation to new vocabulary and equation forms.
Tasks
Published	2016-12-03
URL	http://arxiv.org/abs/1612.00969v1
PDF	http://arxiv.org/pdf/1612.00969v1.pdf
PWC	https://paperswithcode.com/paper/unit-dependency-graph-and-its-application-to
Repo
Framework

A Factorized Recurrent Neural Network based architecture for medium to large vocabulary Language Modelling


Title	A Factorized Recurrent Neural Network based architecture for medium to large vocabulary Language Modelling
Authors	Anantharaman Palacode Narayana Iyer
Abstract	Statistical language models are central to many applications that use semantics. Recurrent Neural Networks (RNN) are known to produce state of the art results for language modelling, outperforming their traditional n-gram counterparts in many cases. To generate a probability distribution across a vocabulary, these models require a softmax output layer that linearly increases in size with the size of the vocabulary. Large vocabularies need a commensurately large softmax layer and training them on typical laptops/PCs requires significant time and machine resources. In this paper we present a new technique for implementing RNN based large vocabulary language models that substantially speeds up computation while optimally using the limited memory resources. Our technique, while building on the notion of factorizing the output layer by having multiple output layers, improves on the earlier work by substantially optimizing on the individual output layer size and also eliminating the need for a multistep prediction process.
Tasks	Language Modelling
Published	2016-02-04
URL	http://arxiv.org/abs/1602.01576v1
PDF	http://arxiv.org/pdf/1602.01576v1.pdf
PWC	https://paperswithcode.com/paper/a-factorized-recurrent-neural-network-based
Repo
Framework
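
For context, the idea of factorizing the output layer can be sketched with the classic two-step, class-based softmax, where P(word | h) = P(class | h) * P(word | class, h). This is the general style of earlier work the abstract builds on, not the paper's own architecture (which the abstract says removes the multistep prediction); all names below are hypothetical.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of scores.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def word_probability(class_logits, within_class_logits, class_id, word_index):
    # P(word | h) = P(class | h) * P(word | class, h).
    p_class = softmax(class_logits)[class_id]
    p_word = softmax(within_class_logits[class_id])[word_index]
    return p_class * p_word

def outputs_per_step(vocab_size, num_classes):
    # Output units evaluated per prediction: one softmax over classes
    # plus one softmax over the words of a single class.
    words_per_class = math.ceil(vocab_size / num_classes)
    return num_classes + words_per_class
```

With a vocabulary of 10,000 words split into 100 equal classes, a prediction in this scheme evaluates 200 output units instead of 10,000, which is why factorization helps when the softmax grows linearly with the vocabulary.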

Latent common manifold learning with alternating diffusion: analysis and applications


Title	Latent common manifold learning with alternating diffusion: analysis and applications
Authors	Ronen Talmon, Hau-tieng Wu
Abstract	The analysis of data sets arising from multiple sensors has drawn significant research attention over the years. Traditional methods, including kernel-based methods, are typically incapable of capturing nonlinear geometric structures. We introduce a latent common manifold model underlying multiple sensor observations for the purpose of multimodal data fusion. A method based on alternating diffusion is presented and analyzed; we provide theoretical analysis of the method under the latent common manifold model. To exemplify the power of the proposed framework, experimental results in several applications are reported.
Tasks
Published	2016-01-30
URL	http://arxiv.org/abs/1602.00078v2
PDF	http://arxiv.org/pdf/1602.00078v2.pdf
PWC	https://paperswithcode.com/paper/latent-common-manifold-learning-with
Repo
Framework

Distributed Estimation of Dynamic Parameters : Regret Analysis


Title	Distributed Estimation of Dynamic Parameters : Regret Analysis
Authors	Shahin Shahrampour, Alexander Rakhlin, Ali Jadbabaie
Abstract	This paper addresses the estimation of a time-varying parameter in a network. A group of agents sequentially receive noisy signals about the parameter (or moving target), which does not follow any particular dynamics. The parameter is not observable to an individual agent, but it is globally identifiable for the whole network. Viewing the problem with an online optimization lens, we aim to provide the finite-time or non-asymptotic analysis of the problem. To this end, we use a notion of dynamic regret which suits the online, non-stationary nature of the problem. In our setting, dynamic regret can be recognized as a finite-time counterpart of stability in the mean-square sense. We develop a distributed, online algorithm for tracking the moving target. Defining the path-length as the consecutive differences between target locations, we express an upper bound on regret in terms of the path-length of the target and network errors. We further show the consistency of the result with static setting and noiseless observations.
Tasks
Published	2016-03-02
URL	http://arxiv.org/abs/1603.00576v1
PDF	http://arxiv.org/pdf/1603.00576v1.pdf
PWC	https://paperswithcode.com/paper/distributed-estimation-of-dynamic-parameters
Repo
Framework
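
The path-length quantity described in the abstract can be sketched as the sum of distances between consecutive target locations. The Euclidean norm and the squared-error tracking cost below are assumptions made for illustration; the abstract does not fix either choice, and the function names are hypothetical.

```python
import math

def path_length(targets):
    # Sum of Euclidean distances between consecutive target locations.
    return sum(math.dist(a, b) for a, b in zip(targets, targets[1:]))

def cumulative_squared_error(estimates, targets):
    # Total squared tracking error of the estimates against the moving
    # target (an assumed cost, used here only as an example).
    return sum(math.dist(e, x) ** 2 for e, x in zip(estimates, targets))

# Hypothetical usage: a target that moves once and then stays put.
moving_target = [(0.0, 0.0), (3.0, 4.0), (3.0, 4.0)]
total_movement = path_length(moving_target)
```

A static target has zero path-length under this definition, which matches the abstract's remark that the bound is consistent with the static setting.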

Properties and Bayesian fitting of restricted Boltzmann machines


Title	Properties and Bayesian fitting of restricted Boltzmann machines
Authors	Andee Kaplan, Daniel Nordman, Stephen Vardeman
Abstract	A restricted Boltzmann machine (RBM) is an undirected graphical model constructed for discrete or continuous random variables, with two layers, one hidden and one visible, and no conditional dependency within a layer. In recent years, RBMs have risen to prominence due to their connection to deep learning. By treating a hidden layer of one RBM as the visible layer in a second RBM, a deep architecture can be created. RBMs are thought to thereby have the ability to encode very complex and rich structures in data, making them attractive for supervised learning. However, the generative behavior of RBMs is largely unexplored and typical fitting methodology does not easily allow for uncertainty quantification in addition to point estimates. In this paper, we discuss the relationship between RBM parameter specification in the binary case and model properties such as degeneracy, instability and uninterpretability. We also describe the associated difficulties that can arise with likelihood-based inference and further discuss the potential Bayes fitting of such (highly flexible) models, especially as Gibbs sampling (quasi-Bayes) methods are often advocated for the RBM model structure.
Tasks
Published	2016-12-04
URL	http://arxiv.org/abs/1612.01158v3
PDF	http://arxiv.org/pdf/1612.01158v3.pdf
PWC	https://paperswithcode.com/paper/properties-and-bayesian-fitting-of-restricted
Repo
Framework
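
As a small illustration of the binary case, the sketch below uses the standard RBM energy E(v, h) = -a·v - b·h - vᵀWh, which the abstract does not spell out and is assumed here. Because there is no dependency within a layer, the hidden units are conditionally independent given the visible units, which is what makes Gibbs sampling convenient. Parameter names are hypothetical.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def energy(v, h, W, a, b):
    # E(v, h) = -sum_i a_i v_i - sum_j b_j h_j - sum_ij v_i W_ij h_j
    visible_term = sum(a_i * v_i for a_i, v_i in zip(a, v))
    hidden_term = sum(b_j * h_j for b_j, h_j in zip(b, h))
    interaction = sum(
        v[i] * W[i][j] * h[j] for i in range(len(v)) for j in range(len(h))
    )
    return -visible_term - hidden_term - interaction

def p_hidden_given_visible(v, W, b):
    # P(h_j = 1 | v) = sigmoid(b_j + sum_i v_i W_ij), independently per j.
    return [
        sigmoid(b[j] + sum(v[i] * W[i][j] for i in range(len(v))))
        for j in range(len(b))
    ]
```

Large-magnitude entries in `W` push these conditionals toward 0 or 1, which gives some intuition for how parameter choices relate to the degeneracy and instability issues the abstract raises.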

Tag Prediction at Flickr: a View from the Darkroom


Title	Tag Prediction at Flickr: a View from the Darkroom
Authors	Kofi Boakye, Sachin Farfade, Hamid Izadinia, Yannis Kalantidis, Pierre Garrigues
Abstract	Automated photo tagging has established itself as one of the most compelling applications of deep learning. While deep convolutional neural networks have repeatedly demonstrated top performance on standard datasets for classification, there are a number of often overlooked but important considerations when deploying this technology in a real-world scenario. In this paper, we present our efforts in developing a large-scale photo tagging system for Flickr photo search. We discuss topics including how to 1) select the tags that matter most to our users; 2) develop lightweight, high-performance models for tag prediction; and 3) leverage the power of large amounts of noisy data for training. Our results demonstrate that, for real-world datasets, training exclusively with this noisy data yields performance on par with the standard paradigm of first pre-training on clean data and then fine-tuning. In addition, we observe that the models trained with user-generated data can yield better fine-tuning results when a small amount of clean data is available. As such, we advocate for the approach of harnessing user-generated data in large-scale systems.
Tasks
Published	2016-12-06
URL	http://arxiv.org/abs/1612.01922v3
PDF	http://arxiv.org/pdf/1612.01922v3.pdf
PWC	https://paperswithcode.com/paper/tag-prediction-at-flickr-a-view-from-the
Repo
Framework

Feature Based Task Recommendation in Crowdsourcing with Implicit Observations


Title	Feature Based Task Recommendation in Crowdsourcing with Implicit Observations
Authors	Habibur Rahman, Lucas Joppa, Senjuti Basu Roy
Abstract	Existing research in crowdsourcing has investigated how to recommend tasks to workers based on which tasks the workers have already completed, referred to as implicit feedback. We, on the other hand, investigate the task recommendation problem, where we leverage both implicit feedback and explicit features of the task. We assume that we are given a set of workers, a set of tasks, interactions (such as the number of times a worker has completed a particular task), and the presence of explicit features of each task (such as task location). We intend to recommend tasks to the workers by exploiting the implicit interactions, and the presence or absence of explicit features in the tasks. We formalize the problem as an optimization problem, propose two alternative problem formulations and respective solutions that exploit implicit feedback, explicit features, as well as similarity between the tasks. We compare the efficacy of our proposed solutions against multiple state-of-the-art techniques using two large scale real world datasets.
Tasks
Published	2016-02-10
URL	http://arxiv.org/abs/1602.03291v2
PDF	http://arxiv.org/pdf/1602.03291v2.pdf
PWC	https://paperswithcode.com/paper/feature-based-task-recommendation-in
Repo
Framework