May 5, 2019


Paper Group ANR 492


Localizing by Describing: Attribute-Guided Attention Localization for Fine-Grained Recognition


Title	Localizing by Describing: Attribute-Guided Attention Localization for Fine-Grained Recognition
Authors	Xiao Liu, Jiang Wang, Shilei Wen, Errui Ding, Yuanqing Lin
Abstract	A key challenge in fine-grained recognition is how to find and represent discriminative local regions. Recent attention models are capable of learning discriminative region localizers only from category labels with reinforcement learning. However, because they do not use any explicit part information, they are not able to accurately find multiple distinctive regions. In this work, we introduce an attribute-guided attention localization scheme where the local region localizers are learned under the guidance of part attribute descriptions. By designing a novel reward strategy, we are able to learn to locate regions that are spatially and semantically distinctive with a reinforcement learning algorithm. The attribute labeling requirement of the scheme is easier to satisfy than the accurate part location annotation required by traditional part-based fine-grained recognition methods. Experimental results on the CUB-200-2011 dataset demonstrate the superiority of the proposed scheme on both fine-grained recognition and attribute recognition.
Tasks
Published	2016-05-20
URL	http://arxiv.org/abs/1605.06217v2
PDF	http://arxiv.org/pdf/1605.06217v2.pdf
PWC	https://paperswithcode.com/paper/localizing-by-describing-attribute-guided
Repo
Framework

Target-Side Context for Discriminative Models in Statistical Machine Translation


Title	Target-Side Context for Discriminative Models in Statistical Machine Translation
Authors	Aleš Tamchyna, Alexander Fraser, Ondřej Bojar, Marcin Junczys-Dowmunt
Abstract	Discriminative translation models utilizing source context have been shown to help statistical machine translation performance. We propose a novel extension of this work using target context information. Surprisingly, we show that this model can be efficiently integrated directly in the decoding process. Our approach scales to large training data sizes and results in consistent improvements in translation quality on four language pairs. We also provide an analysis comparing the strengths of the baseline source-context model with our extended source-context and target-context model and we show that our extension allows us to better capture morphological coherence. Our work is freely available as part of Moses.
Tasks	Machine Translation
Published	2016-07-05
URL	http://arxiv.org/abs/1607.01149v1
PDF	http://arxiv.org/pdf/1607.01149v1.pdf
PWC	https://paperswithcode.com/paper/target-side-context-for-discriminative-models
Repo
Framework

Solve-Select-Scale: A Three Step Process For Sparse Signal Estimation


Title	Solve-Select-Scale: A Three Step Process For Sparse Signal Estimation
Authors	Mithun Das Gupta
Abstract	In the theory of compressed sensing (CS), the sparsity $\|\mathbf{x}\|_0$ of the unknown signal $\mathbf{x} \in \mathcal{R}^n$ is of prime importance, and the focus of reconstruction algorithms has mainly been either $\|\mathbf{x}\|_0$ or its convex relaxation (via $\|\mathbf{x}\|_1$). However, it is typically unknown in practice and has remained a challenge when nothing about the size of the support is known. As pointed out recently, $\|\mathbf{x}\|_0$ might not be the best metric to minimize directly, both due to its inherent complexity as well as its noise performance. Recently a novel stable measure of sparsity $s(\mathbf{x}) := \|\mathbf{x}\|_1^2/\|\mathbf{x}\|_2^2$ has been investigated by Lopes \cite{Lopes2012}, which is a sharp lower bound on $\|\mathbf{x}\|_0$. The estimation procedure for this measure uses only a small number of linear measurements, does not rely on any sparsity assumptions, and requires very little computation. The usage of the quantity $s(\mathbf{x})$ in sparse signal estimation problems has not received much attention yet. We develop the idea of incorporating $s(\mathbf{x})$ into the signal estimation framework. We also provide a three step algorithm to solve problems of the form $\mathbf{Ax=b}$ with no additional assumptions on the original signal $\mathbf{x}$.
Tasks
Published	2016-05-16
URL	http://arxiv.org/abs/1605.04657v1
PDF	http://arxiv.org/pdf/1605.04657v1.pdf
PWC	https://paperswithcode.com/paper/solve-select-scale-a-three-step-process-for
Repo
Framework
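
The sparsity measure in the abstract above can be illustrated with a short sketch. It computes $s(\mathbf{x}) = \|\mathbf{x}\|_1^2/\|\mathbf{x}\|_2^2$ directly from a known vector and compares it with $\|\mathbf{x}\|_0$. This is only an illustration of the definition: the measurement-based estimation procedure and the three step algorithm are not reproduced, and the function names and test vectors are assumptions made for the example.

```python
def l0_norm(x):
    """Number of nonzero entries of x."""
    return sum(1 for v in x if v != 0.0)


def numerical_sparsity(x):
    """s(x) = ||x||_1^2 / ||x||_2^2, defined for nonzero x."""
    l1 = sum(abs(v) for v in x)
    l2_squared = sum(v * v for v in x)
    if l2_squared == 0.0:
        raise ValueError("s(x) is undefined for the zero vector")
    return (l1 * l1) / l2_squared


# Hypothetical test vectors, both with three nonzero entries.
flat = [1.0, 1.0, 1.0, 0.0, 0.0]    # equal magnitudes: the bound is attained
peaky = [10.0, 0.1, 0.1, 0.0, 0.0]  # one dominant entry: s(x) is close to 1

for name, vec in (("flat", flat), ("peaky", peaky)):
    print(name, "l0 =", l0_norm(vec), "s(x) =", numerical_sparsity(vec))
```

By the Cauchy-Schwarz inequality, $s(\mathbf{x}) \le \|\mathbf{x}\|_0$ for every nonzero vector, with equality when all nonzero entries have the same magnitude. Unlike $\|\mathbf{x}\|_0$, the value changes only slightly when tiny entries are added.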

Learning to Learn Neural Networks


Title	Learning to Learn Neural Networks
Authors	Tom Bosc
Abstract	Meta-learning consists of learning learning algorithms. We use a Long Short Term Memory (LSTM) based network to learn to compute on-line updates of the parameters of another neural network. These parameters are stored in the cell state of the LSTM. Our framework allows us to compare learned algorithms to hand-made algorithms within the traditional train and test methodology. In an experiment, we learn a learning algorithm for a one-hidden layer Multi-Layer Perceptron (MLP) on non-linearly separable datasets. The learned algorithm is able to update parameters of both layers and generalise well on similar datasets.
Tasks	Meta-Learning
Published	2016-10-19
URL	http://arxiv.org/abs/1610.06072v1
PDF	http://arxiv.org/pdf/1610.06072v1.pdf
PWC	https://paperswithcode.com/paper/learning-to-learn-neural-networks
Repo
Framework

Predicting Drug Interactions and Mutagenicity with Ensemble Classifiers on Subgraphs of Molecules


Title	Predicting Drug Interactions and Mutagenicity with Ensemble Classifiers on Subgraphs of Molecules
Authors	Andrew Schaumberg, Angela Yu, Tatsuhiro Koshi, Xiaochan Zong, Santoshkalyan Rayadhurgam
Abstract	In this study, we intend to solve a mutual information problem in interacting molecules of any type, such as proteins, nucleic acids, and small molecules. Using machine learning techniques, we accurately predict pairwise interactions, which can be of medical and biological importance. Graphs are useful in this problem for their generality to all types of molecules, due to the inherent association of atoms through atomic bonds. Subgraphs can represent different molecular domains. These domains can be biologically significant, as most molecules only have portions that are of functional significance and can interact with other domains. Thus, we use subgraphs as features in different machine learning algorithms to predict whether two drugs interact and to predict potential single-molecule effects.
Tasks
Published	2016-01-27
URL	http://arxiv.org/abs/1601.07233v1
PDF	http://arxiv.org/pdf/1601.07233v1.pdf
PWC	https://paperswithcode.com/paper/predicting-drug-interactions-and-mutagenicity
Repo
Framework

Object-Centric Representation Learning from Unlabeled Videos


Title	Object-Centric Representation Learning from Unlabeled Videos
Authors	Ruohan Gao, Dinesh Jayaraman, Kristen Grauman
Abstract	Supervised (pre-)training currently yields state-of-the-art performance for representation learning for visual recognition, yet it comes at the cost of (1) intensive manual annotations and (2) an inherent restriction in the scope of data relevant for learning. In this work, we explore unsupervised feature learning from unlabeled video. We introduce a novel object-centric approach to temporal coherence that encourages similar representations to be learned for object-like regions segmented from nearby frames. Our framework relies on a Siamese-triplet network to train a deep convolutional neural network (CNN) representation. Compared to existing temporal coherence methods, our idea has the advantage of lightweight preprocessing of the unlabeled video (no tracking required) while still being able to extract object-level regions from which to learn invariances. Furthermore, as we show in results on several standard datasets, our method typically achieves substantial accuracy gains over competing unsupervised methods for image classification and retrieval tasks.
Tasks	Image Classification, Representation Learning
Published	2016-12-01
URL	http://arxiv.org/abs/1612.00500v1
PDF	http://arxiv.org/pdf/1612.00500v1.pdf
PWC	https://paperswithcode.com/paper/object-centric-representation-learning-from
Repo
Framework
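
The Siamese-triplet idea mentioned in the abstract above is usually trained with a triplet loss. The abstract does not state the exact loss, distance, or margin, so the following is a minimal sketch of one common hinge form, assuming squared Euclidean distance and a hypothetical margin of 1.0. The embeddings are made-up two-dimensional vectors standing in for CNN features.

```python
def squared_distance(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def triplet_hinge_loss(anchor, positive, negative, margin=1.0):
    """Zero once the negative is at least `margin` farther than the positive."""
    gap = squared_distance(anchor, positive) - squared_distance(anchor, negative)
    return max(0.0, gap + margin)


# Hypothetical embeddings: the anchor and positive would come from object-like
# regions in nearby frames, the negative from an unrelated region.
anchor = [0.0, 0.0]
positive = [0.1, 0.0]
far_negative = [2.0, 0.0]    # already well separated, contributes no loss
near_negative = [0.5, 0.0]   # too close to the anchor, contributes a loss

loss_far = triplet_hinge_loss(anchor, positive, far_negative)
loss_near = triplet_hinge_loss(anchor, positive, near_negative)
print(loss_far, loss_near)
```

Only triplets that violate the margin produce a gradient, which is why the loss pulls temporally close regions together while pushing unrelated ones apart.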

Wavelet Scattering on the Pitch Spiral


Title	Wavelet Scattering on the Pitch Spiral
Authors	Vincent Lostanlen, Stéphane Mallat
Abstract	We present a new representation of harmonic sounds that linearizes the dynamics of pitch and spectral envelope, while remaining stable to deformations in the time-frequency plane. It is an instance of the scattering transform, a generic operator which cascades wavelet convolutions and modulus nonlinearities. It is derived from the pitch spiral, in that convolutions are successively performed in time, log-frequency, and octave index. We give a closed-form approximation of spiral scattering coefficients for a nonstationary generalization of the harmonic source-filter model.
Tasks
Published	2016-01-03
URL	http://arxiv.org/abs/1601.00287v1
PDF	http://arxiv.org/pdf/1601.00287v1.pdf
PWC	https://paperswithcode.com/paper/wavelet-scattering-on-the-pitch-spiral
Repo
Framework

A Variational Perspective on Accelerated Methods in Optimization


Title	A Variational Perspective on Accelerated Methods in Optimization
Authors	Andre Wibisono, Ashia C. Wilson, Michael I. Jordan
Abstract	Accelerated gradient methods play a central role in optimization, achieving optimal rates in many settings. While many generalizations and extensions of Nesterov’s original acceleration method have been proposed, it is not yet clear what the natural scope of the acceleration concept is. In this paper, we study accelerated methods from a continuous-time perspective. We show that there is a Lagrangian functional that we call the Bregman Lagrangian which generates a large class of accelerated methods in continuous time, including (but not limited to) accelerated gradient descent, its non-Euclidean extension, and accelerated higher-order gradient methods. We show that the continuous-time limits of all of these methods correspond to traveling the same curve in spacetime at different speeds. From this perspective, Nesterov’s technique and many of its generalizations can be viewed as a systematic way to go from the continuous-time curves generated by the Bregman Lagrangian to a family of discrete-time accelerated algorithms.
Tasks
Published	2016-03-14
URL	http://arxiv.org/abs/1603.04245v1
PDF	http://arxiv.org/pdf/1603.04245v1.pdf
PWC	https://paperswithcode.com/paper/a-variational-perspective-on-accelerated
Repo
Framework
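
For readers unfamiliar with the discrete-time method that the abstract above takes as its starting point, here is a small sketch comparing plain gradient descent with a standard form of Nesterov's accelerated gradient method on an ill-conditioned quadratic. It does not implement the Bregman Lagrangian or any continuous-time construction from the paper. The test function, step size, and iteration count are assumptions chosen for illustration.

```python
def objective(x, curvatures):
    """f(x) = 0.5 * sum_i a_i * x_i^2, minimized at x = 0."""
    return 0.5 * sum(a * xi * xi for a, xi in zip(curvatures, x))


def grad(x, curvatures):
    return [a * xi for a, xi in zip(curvatures, x)]


def gradient_descent(x0, curvatures, step, iters):
    x = list(x0)
    for _ in range(iters):
        g = grad(x, curvatures)
        x = [xi - step * gi for xi, gi in zip(x, g)]
    return x


def nesterov(x0, curvatures, step, iters):
    """Gradient step taken at an extrapolated point with momentum (k-1)/(k+2)."""
    x_prev, x = list(x0), list(x0)
    for k in range(1, iters + 1):
        beta = (k - 1) / (k + 2)
        y = [xi + beta * (xi - xpi) for xi, xpi in zip(x, x_prev)]
        g = grad(y, curvatures)
        x_prev, x = x, [yi - step * gi for yi, gi in zip(y, g)]
    return x


curvatures = [0.01, 1.0]   # condition number 100, smoothness constant L = 1
x0 = [1.0, 1.0]
step = 1.0                 # step size 1/L

f_gd = objective(gradient_descent(x0, curvatures, step, 100), curvatures)
f_nag = objective(nesterov(x0, curvatures, step, 100), curvatures)
print("gradient descent:", f_gd, "accelerated:", f_nag)
```

With the same step size and the same number of gradient evaluations, the accelerated iterate reaches a lower objective value here, consistent with the known O(1/k^2) versus O(1/k) rates for smooth convex problems.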

Streaming Algorithms for News and Scientific Literature Recommendation: Submodular Maximization with a d-Knapsack Constraint


Title	Streaming Algorithms for News and Scientific Literature Recommendation: Submodular Maximization with a d-Knapsack Constraint
Authors	Qilian Yu, Easton Li Xu, Shuguang Cui
Abstract	Submodular maximization problems belong to the family of combinatorial optimization problems and enjoy wide applications. In this paper, we focus on the problem of maximizing a monotone submodular function subject to a $d$-knapsack constraint, for which we propose a streaming algorithm that achieves a $\left(\frac{1}{1+2d}-\epsilon\right)$-approximation of the optimal value, while needing only a single pass through the dataset and without storing all the data in memory. In our experiments, we extensively evaluate the effectiveness of our proposed algorithm via two applications: news recommendation and scientific literature recommendation. We observe that the proposed streaming algorithm achieves both execution speedup and memory savings of several orders of magnitude compared with existing approaches.
Tasks	Combinatorial Optimization
Published	2016-03-17
URL	http://arxiv.org/abs/1603.05614v3
PDF	http://arxiv.org/pdf/1603.05614v3.pdf
PWC	https://paperswithcode.com/paper/streaming-algorithms-for-news-and-scientific
Repo
Framework
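
To make the setting in the abstract above concrete, the sketch below runs a generic single-pass density-threshold rule on a toy coverage objective with a 2-knapsack constraint. It is not the paper's algorithm and carries no approximation guarantee: the threshold is fixed by hand, and the items, topics, costs, and budgets are invented for the example. It only shows the shape of the problem, namely one pass over the stream, d budgets, and a monotone submodular gain.

```python
def stream_select(stream, topics_of, costs_of, budgets, threshold):
    """Keep an item if it fits every budget and its marginal coverage gain per
    unit of normalized cost is at least `threshold`. Memory use is bounded by
    the selected set, not by the stream length."""
    selected = []
    covered = set()
    used = [0.0] * len(budgets)
    for item in stream:
        gain = len(topics_of[item] - covered)   # marginal gain of a coverage function
        cost = costs_of[item]
        fits = all(u + c <= b for u, c, b in zip(used, cost, budgets))
        load = max(c / b for c, b in zip(cost, budgets))  # largest budget fraction consumed
        if fits and load > 0 and gain / load >= threshold:
            selected.append(item)
            covered |= topics_of[item]
            used = [u + c for u, c in zip(used, cost)]
    return selected, used


# Hypothetical articles: topics covered and (reading time, screen space) costs.
topics_of = {"a": {1, 2, 3}, "b": {3, 4}, "c": {1, 2}, "d": {5, 6, 7, 8}}
costs_of = {"a": (2.0, 1.0), "b": (1.0, 1.0), "c": (1.0, 2.0), "d": (3.0, 2.0)}
budgets = (5.0, 4.0)

selected, used = stream_select(["a", "b", "c", "d"], topics_of, costs_of, budgets, threshold=5.0)
print(selected, used)
```

Coverage is monotone and submodular because an item's marginal gain can only shrink as more topics are already covered, which is what lets a single pass make irrevocable accept or reject decisions.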

Two-stream convolutional neural network for accurate RGB-D fingertip detection using depth and edge information


Title	Two-stream convolutional neural network for accurate RGB-D fingertip detection using depth and edge information
Authors	Hengkai Guo, Guijin Wang, Xinghao Chen
Abstract	Accurate detection of fingertips in depth images is critical for human-computer interaction. In this paper, we present a novel two-stream convolutional neural network (CNN) for RGB-D fingertip detection. First, an edge image is extracted from the raw depth image using a random forest. Then the edge information is combined with depth information in our CNN structure. We study several fusion approaches and suggest a slow fusion strategy as a promising way of fingertip detection. As shown in our experiments, our real-time algorithm outperforms state-of-the-art fingertip detection methods on the public dataset HandNet with an average 3D error of 9.9mm, and shows comparable accuracy of fingertip estimation on the NYU hand dataset.
Tasks
Published	2016-12-23
URL	http://arxiv.org/abs/1612.07978v1
PDF	http://arxiv.org/pdf/1612.07978v1.pdf
PWC	https://paperswithcode.com/paper/two-stream-convolutional-neural-network-for
Repo
Framework

Gamblets for opening the complexity-bottleneck of implicit schemes for hyperbolic and parabolic ODEs/PDEs with rough coefficients


Title	Gamblets for opening the complexity-bottleneck of implicit schemes for hyperbolic and parabolic ODEs/PDEs with rough coefficients
Authors	Houman Owhadi, Lei Zhang
Abstract	Implicit schemes are popular methods for the integration of time-dependent PDEs such as hyperbolic and parabolic PDEs. However, the necessity to solve corresponding linear systems at each time step constitutes a complexity bottleneck in their application to PDEs with rough coefficients. We present a generalization of gamblets introduced in \cite{OwhadiMultigrid:2015} enabling the resolution of these implicit systems in near-linear complexity and provide rigorous a priori error bounds on the resulting numerical approximations of hyperbolic and parabolic PDEs. These generalized gamblets induce a multiresolution decomposition of the solution space that is adapted to both the underlying (hyperbolic and parabolic) PDE (and the system of ODEs resulting from space discretization) and to the time-steps of the numerical scheme.
Tasks
Published	2016-06-24
URL	http://arxiv.org/abs/1606.07686v2
PDF	http://arxiv.org/pdf/1606.07686v2.pdf
PWC	https://paperswithcode.com/paper/gamblets-for-opening-the-complexity
Repo
Framework

Inertial-Based Scale Estimation for Structure from Motion on Mobile Devices


Title	Inertial-Based Scale Estimation for Structure from Motion on Mobile Devices
Authors	Janne Mustaniemi, Juho Kannala, Simo Särkkä, Jiri Matas, Janne Heikkilä
Abstract	Structure from motion algorithms have an inherent limitation that the reconstruction can only be determined up to an unknown scale factor. Modern mobile devices are equipped with an inertial measurement unit (IMU), which can be used for estimating the scale of the reconstruction. We propose a method that recovers the metric scale given inertial measurements and camera poses. In the process, we also perform a temporal and spatial alignment of the camera and the IMU. Therefore, our solution can be easily combined with any existing visual reconstruction software. The method can cope with noisy camera pose estimates, typically caused by motion blur or rolling shutter artifacts, by utilizing a Rauch-Tung-Striebel (RTS) smoother. Furthermore, the scale estimation is performed in the frequency domain, which provides more robustness to inaccurate sensor time stamps and noisy IMU samples than the previously used time domain representation. In contrast to previous methods, our approach has no parameters that need to be tuned to achieve good performance. In the experiments, we show that the algorithm outperforms the state-of-the-art in both accuracy and convergence speed of the scale estimate. The scale estimate is within around 1% of the ground truth, depending on the recording. We also demonstrate that our method can improve the scale accuracy of Project Tango’s built-in motion tracking.
Tasks
Published	2016-11-29
URL	http://arxiv.org/abs/1611.09498v2
PDF	http://arxiv.org/pdf/1611.09498v2.pdf
PWC	https://paperswithcode.com/paper/inertial-based-scale-estimation-for-structure
Repo
Framework

Deep Tensor Convolution on Multicores


Title	Deep Tensor Convolution on Multicores
Authors	David Budden, Alexander Matveev, Shibani Santurkar, Shraman Ray Chaudhuri, Nir Shavit
Abstract	Deep convolutional neural networks (ConvNets) of 3-dimensional kernels allow joint modeling of spatiotemporal features. These networks have improved performance of video and volumetric image analysis, but have been limited in size due to the low memory ceiling of GPU hardware. Existing CPU implementations overcome this constraint but are impractically slow. Here we extend and optimize the faster Winograd-class of convolutional algorithms to the $N$-dimensional case and specifically for CPU hardware. First, we remove the need to manually hand-craft algorithms by exploiting the relaxed constraints and cheap sparse access of CPU memory. Second, we maximize CPU utilization and multicore scalability by transforming data matrices to be cache-aware, integer multiples of AVX vector widths. Treating 2-dimensional ConvNets as a special (and the least beneficial) case of our approach, we demonstrate a 5 to 25-fold improvement in throughput compared to previous state-of-the-art.
Tasks
Published	2016-11-20
URL	http://arxiv.org/abs/1611.06565v3
PDF	http://arxiv.org/pdf/1611.06565v3.pdf
PWC	https://paperswithcode.com/paper/deep-tensor-convolution-on-multicores
Repo
Framework
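
The Winograd class of algorithms mentioned in the abstract above trades multiplications for additions. As a small illustration, the sketch below implements the textbook one-dimensional case F(2,3), which produces two outputs of a 3-tap filter with four multiplications instead of six. It is not the paper's N-dimensional, cache-aware CPU implementation, and the function names and sample values are assumptions for the example.

```python
def direct_f23(d, g):
    """Two outputs of a 3-tap filter over a 4-sample tile: 6 multiplications."""
    return [
        d[0] * g[0] + d[1] * g[1] + d[2] * g[2],
        d[1] * g[0] + d[2] * g[1] + d[3] * g[2],
    ]


def winograd_f23(d, g):
    """Same two outputs using 4 multiplications between transformed operands."""
    # Filter transform: depends only on g, so it can be computed once per filter.
    u0 = g[0]
    u1 = (g[0] + g[1] + g[2]) / 2
    u2 = (g[0] - g[1] + g[2]) / 2
    u3 = g[2]
    # Data transform (additions only) followed by the 4 multiplications.
    m1 = (d[0] - d[2]) * u0
    m2 = (d[1] + d[2]) * u1
    m3 = (d[2] - d[1]) * u2
    m4 = (d[1] - d[3]) * u3
    # Output transform (additions only).
    return [m1 + m2 + m3, m2 - m3 - m4]


tile = [0.5, -1.0, 2.0, 3.0]      # hypothetical input samples
taps = [0.25, 0.5, -1.0]          # hypothetical filter taps

print(direct_f23(tile, taps))
print(winograd_f23(tile, taps))
```

The saving grows when the scheme is nested across dimensions, which is one reason the multi-dimensional case is attractive for 3-D kernels.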

Distributed Coding of Multiview Sparse Sources with Joint Recovery


Title	Distributed Coding of Multiview Sparse Sources with Joint Recovery
Authors	Huynh Van Luong, Nikos Deligiannis, Søren Forchhammer, André Kaup
Abstract	In support of applications involving multiview sources in distributed object recognition using lightweight cameras, we propose a new method for the distributed coding of sparse sources as visual descriptor histograms extracted from multiview images. The problem is challenging due to the computational and energy constraints at each camera as well as the limitations regarding inter-camera communication. Our approach addresses these challenges by exploiting the sparsity of the visual descriptor histograms as well as their intra- and inter-camera correlations. Our method couples distributed source coding of the sparse sources with a new joint recovery algorithm that incorporates multiple side information signals, where prior knowledge (low quality) of all the sparse sources is initially sent to exploit their correlations. Experimental evaluation using the histograms of scale-invariant feature transform (SIFT) descriptors extracted from multiview images shows that our method leads to bit-rate savings of up to 43% compared to the state-of-the-art distributed compressed sensing method with independent encoding of the sources.
Tasks	Object Recognition
Published	2016-07-18
URL	http://arxiv.org/abs/1607.04965v1
PDF	http://arxiv.org/pdf/1607.04965v1.pdf
PWC	https://paperswithcode.com/paper/distributed-coding-of-multiview-sparse
Repo
Framework

A Solution to Time-Varying Markov Decision Processes


Title	A Solution to Time-Varying Markov Decision Processes
Authors	Lantao Liu, Gaurav S. Sukhatme
Abstract	We consider a decision-making problem where the environment varies both in space and time. Such problems arise naturally when considering e.g., the navigation of an underwater robot amidst ocean currents or the navigation of an aerial vehicle in wind. To model such spatiotemporal variation, we extend the standard Markov Decision Process (MDP) to a new framework called the Time-Varying Markov Decision Process (TVMDP). The TVMDP has a time-varying state transition model and transforms the standard MDP that considers only immediate and static uncertainty descriptions of state transitions, to a framework that is able to adapt to future time-varying transition dynamics over some horizon. We show how to solve a TVMDP via a redesign of the MDP value propagation mechanisms by incorporating the introduced dynamics along the temporal dimension. We validate our framework in a marine robotics navigation setting using spatiotemporal ocean data and show that it outperforms prior efforts.
Tasks	Decision Making
Published	2016-05-03
URL	http://arxiv.org/abs/1605.01018v3
PDF	http://arxiv.org/pdf/1605.01018v3.pdf
PWC	https://paperswithcode.com/paper/a-solution-to-time-varying-markov-decision
Repo
Framework
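
The effect of a time-varying transition model, as described in the abstract above, can be seen in a toy example. The sketch below uses standard finite-horizon backward induction with a transition function indexed by time. This is a textbook technique, not the TVMDP value propagation proposed in the paper, and the two-state world, rewards, and probabilities are invented for illustration.

```python
GOAL = 1
STATES = [0, GOAL]
ACTIONS = ["wait", "go"]


def transition(t, s, a):
    """Return (probability, next_state) pairs; success of 'go' depends on time t."""
    if s == GOAL or a == "wait":
        return [(1.0, s)]
    p = 0.1 if t == 0 else 0.9   # hypothetical: strong opposing current at t = 0
    return [(p, GOAL), (1.0 - p, s)]


def reward(s, a, s_next):
    effort = -1.0 if a == "go" else 0.0
    arrival = 10.0 if s != GOAL and s_next == GOAL else 0.0
    return effort + arrival


def backward_induction(horizon):
    """Compute optimal values and a time-indexed policy from the last step backwards."""
    value = {s: 0.0 for s in STATES}   # value after the final step
    policy = []
    for t in reversed(range(horizon)):
        new_value, decision = {}, {}
        for s in STATES:
            best_a, best_q = None, float("-inf")
            for a in ACTIONS:
                q = sum(p * (reward(s, a, s2) + value[s2]) for p, s2 in transition(t, s, a))
                if q > best_q:
                    best_a, best_q = a, q
            new_value[s], decision[s] = best_q, best_a
        value = new_value
        policy.insert(0, decision)
    return value, policy


value, policy = backward_induction(horizon=2)
print(policy[0][0], policy[1][0], value[0])
```

Because the planner sees that the transition model improves at the next step, it waits at t = 0 and moves at t = 1. A planner using only the transition model at t = 0 would have no reason to expect that improvement.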