July 27, 2019

3302 words 16 mins read

Paper Group ANR 679

Making 360$^{\circ}$ Video Watchable in 2D: Learning Videography for Click Free Viewing. On the Interplay between Strong Regularity and Graph Densification. A Nonlinear Orthogonal Non-Negative Matrix Factorization Approach to Subspace Clustering. Compressed Sensing MRI Reconstruction using a Generative Adversarial Network with a Cyclic Loss. Unders …

Making 360$^{\circ}$ Video Watchable in 2D: Learning Videography for Click Free Viewing

Title Making 360$^{\circ}$ Video Watchable in 2D: Learning Videography for Click Free Viewing
Authors Yu-Chuan Su, Kristen Grauman
Abstract 360$^{\circ}$ video requires human viewers to actively control “where” to look while watching the video. Although it provides a more immersive experience of the visual content, it also introduces an additional burden for viewers; awkward interfaces to navigate the video lead to suboptimal viewing experiences. Virtual cinematography is an appealing direction to remedy these problems, but conventional methods are limited to virtual environments or rely on hand-crafted heuristics. We propose a new algorithm for virtual cinematography that automatically controls a virtual camera within a 360$^{\circ}$ video. Compared to the state of the art, our algorithm allows more general camera control, avoids redundant outputs, and extracts its output videos substantially more efficiently. Experimental results on over 7 hours of real “in the wild” video show that our generalized camera control is crucial for viewing 360$^{\circ}$ video, while the proposed efficient algorithm is essential for making the generalized control computationally tractable.
Tasks
Published 2017-03-01
URL http://arxiv.org/abs/1703.00495v2
PDF http://arxiv.org/pdf/1703.00495v2.pdf
PWC https://paperswithcode.com/paper/making-360circ-video-watchable-in-2d-learning
Repo
Framework
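
The core rendering primitive the paper’s virtual camera relies on, extracting a normal-field-of-view view from an equirectangular 360$^{\circ}$ frame given a viewing direction, can be sketched as follows. This is a minimal numpy illustration of that projection only, not the authors’ learned control algorithm; the field of view, the output resolution and the nearest-neighbour sampling are arbitrary choices for illustration.

```python
import numpy as np

def nfov_from_equirect(pano, yaw, pitch, fov_deg=65.0, out_w=640, out_h=360):
    """Sample a normal-field-of-view view from an equirectangular panorama.

    pano: H x W x 3 array; yaw/pitch in radians give the virtual camera direction.
    A simplified pinhole model with nearest-neighbour sampling, for brevity.
    """
    H, W = pano.shape[:2]
    f = 0.5 * out_w / np.tan(0.5 * np.deg2rad(fov_deg))   # focal length in pixels

    # Pixel grid of the output view, centred at the principal point.
    xs, ys = np.meshgrid(np.arange(out_w) - out_w / 2.0,
                         np.arange(out_h) - out_h / 2.0)
    # Viewing rays in camera coordinates (z forward, x right, y down).
    dirs = np.stack([xs, ys, np.full_like(xs, f)], axis=-1)
    dirs /= np.linalg.norm(dirs, axis=-1, keepdims=True)

    # Rotate the rays by pitch (around x) and then yaw (around y).
    cp, sp, cy, sy = np.cos(pitch), np.sin(pitch), np.cos(yaw), np.sin(yaw)
    Rx = np.array([[1, 0, 0], [0, cp, -sp], [0, sp, cp]])
    Ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
    dirs = dirs @ (Ry @ Rx).T

    # Convert rays to spherical coordinates, then to panorama pixel indices.
    lon = np.arctan2(dirs[..., 0], dirs[..., 2])          # [-pi, pi]
    lat = np.arcsin(np.clip(dirs[..., 1], -1.0, 1.0))     # [-pi/2, pi/2]
    u = ((lon / np.pi + 1.0) * 0.5 * (W - 1)).astype(int)
    v = ((lat / (np.pi / 2) + 1.0) * 0.5 * (H - 1)).astype(int)
    return pano[v, u]

pano = np.zeros((960, 1920, 3), dtype=np.uint8)       # stand-in equirectangular frame
view = nfov_from_equirect(pano, yaw=0.5, pitch=-0.1)  # one virtual-camera glimpse
```

Each call corresponds to one choice of where the virtual camera points at a given time step; the paper’s contribution is deciding those yaw/pitch trajectories automatically.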

On the Interplay between Strong Regularity and Graph Densification

Title On the Interplay between Strong Regularity and Graph Densification
Authors Marco Fiorucci, Alessandro Torcinovich, Manuel Curado, Francisco Escolano, Marcello Pelillo
Abstract In this paper we analyze the practical implications of Szemerédi’s regularity lemma in the preservation of metric information contained in large graphs. To this end, we present a heuristic algorithm to find regular partitions. Our experiments show that this method is quite robust to the natural sparsification of proximity graphs. In addition, this robustness can be enforced by graph densification.
Tasks
Published 2017-03-21
URL http://arxiv.org/abs/1703.07107v1
PDF http://arxiv.org/pdf/1703.07107v1.pdf
PWC https://paperswithcode.com/paper/on-the-interplay-between-strong-regularity
Repo
Framework
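
For readers unfamiliar with the regularity machinery, the quantity at the heart of the lemma is the edge density between two vertex sets, and ε-regularity asks that this density be stable when passing to sufficiently large subsets. The sketch below checks that condition by Monte-Carlo sampling on an adjacency matrix; it is a didactic stand-in, not the heuristic partitioning algorithm proposed in the paper, and the function names and sampling scheme are ad hoc.

```python
import numpy as np

def pair_density(A, S, T):
    """Edge density d(S, T) between two vertex sets in adjacency matrix A."""
    return A[np.ix_(S, T)].mean()

def is_eps_regular(A, S, T, eps=0.1, trials=200, seed=None):
    """Crude Monte-Carlo check of eps-regularity for the pair (S, T):
    sample subsets S' ⊆ S, T' ⊆ T with |S'| ≥ eps|S| and |T'| ≥ eps|T| and
    test whether |d(S', T') - d(S, T)| ≤ eps holds for every sample."""
    rng = np.random.default_rng(seed)
    d = pair_density(A, S, T)
    m = max(1, int(np.ceil(eps * len(S))))
    n = max(1, int(np.ceil(eps * len(T))))
    for _ in range(trials):
        Sp = rng.choice(S, size=rng.integers(m, len(S) + 1), replace=False)
        Tp = rng.choice(T, size=rng.integers(n, len(T) + 1), replace=False)
        if abs(pair_density(A, Sp, Tp) - d) > eps:
            return False
    return True

A = (np.random.rand(200, 200) < 0.3).astype(float)
A = np.triu(A, 1); A = A + A.T                      # symmetric 0/1 adjacency
print(is_eps_regular(A, list(range(100)), list(range(100, 200)), eps=0.2))
```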

A Nonlinear Orthogonal Non-Negative Matrix Factorization Approach to Subspace Clustering

Title A Nonlinear Orthogonal Non-Negative Matrix Factorization Approach to Subspace Clustering
Authors Dijana Tolic, Nino Antulov-Fantulin, Ivica Kopriva
Abstract A recent theoretical analysis shows the equivalence between non-negative matrix factorization (NMF) and the spectral clustering based approach to subspace clustering. As NMF and many of its variants are essentially linear, we introduce a nonlinear NMF with explicit orthogonality and derive general kernel-based orthogonal multiplicative update rules to solve the subspace clustering problem. Within the nonlinear orthogonal NMF framework, we propose two subspace clustering algorithms, kernel-based non-negative subspace clustering KNSC-Ncut and KNSC-Rcut, and establish their connection with spectral normalized cut and ratio cut clustering. We further extend the nonlinear orthogonal NMF framework and introduce a graph regularization to obtain a factorization that respects the local geometric structure of the data after the nonlinear mapping. The proposed NMF-based approach to subspace clustering takes into account the nonlinear nature of the manifold, as well as its intrinsic local geometry, which considerably improves the clustering performance compared to several recently proposed state-of-the-art methods.
Tasks
Published 2017-09-29
URL http://arxiv.org/abs/1709.10323v1
PDF http://arxiv.org/pdf/1709.10323v1.pdf
PWC https://paperswithcode.com/paper/a-nonlinear-orthogonal-non-negative-matrix
Repo
Framework
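
To make the kernel-NMF route to clustering concrete, the sketch below clusters data by factorizing an RBF kernel matrix with the damped symmetric-NMF multiplicative update of Ding et al. and reading cluster labels off the factor. This is a simplified stand-in: the paper’s KNSC-Ncut/KNSC-Rcut updates additionally enforce orthogonality and the normalized/ratio-cut weighting, which are omitted here, and the kernel width, iteration count and initialization are arbitrary.

```python
import numpy as np

def rbf_kernel(X, gamma=1.0):
    """Gaussian (RBF) kernel matrix for row-wise samples in X."""
    sq = np.sum(X**2, axis=1)
    return np.exp(-gamma * (sq[:, None] + sq[None, :] - 2.0 * X @ X.T))

def kernel_symnmf_cluster(K, k, iters=300, seed=0):
    """Cluster with symmetric NMF on a kernel matrix: K ≈ H H^T, H ≥ 0.

    Cluster labels are the arg-max over the columns of H. A simplified
    stand-in for the paper's kernel-based orthogonal updates.
    """
    rng = np.random.default_rng(seed)
    n = K.shape[0]
    H = np.abs(rng.standard_normal((n, k))) + 1e-3
    for _ in range(iters):
        numer = K @ H
        denom = H @ (H.T @ H) + 1e-10
        H *= 0.5 + 0.5 * numer / denom      # damped multiplicative update
        H = np.maximum(H, 1e-10)
    return H.argmax(axis=1), H

X = np.vstack([np.random.randn(50, 2), np.random.randn(50, 2) + 5.0])
labels, _ = kernel_symnmf_cluster(rbf_kernel(X, gamma=0.5), k=2)
```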

Compressed Sensing MRI Reconstruction using a Generative Adversarial Network with a Cyclic Loss

Title Compressed Sensing MRI Reconstruction using a Generative Adversarial Network with a Cyclic Loss
Authors Tran Minh Quan, Thanh Nguyen-Duc, Won-Ki Jeong
Abstract Compressed Sensing MRI (CS-MRI) has provided theoretical foundations upon which the time-consuming MRI acquisition process can be accelerated. However, it primarily relies on iterative numerical solvers, which still hinders its adoption in time-critical applications. In addition, recent advances in deep neural networks have shown their potential in computer vision and image processing, but their adaptation to MRI reconstruction is still in an early stage. In this paper, we propose a novel deep learning-based generative adversarial model, RefineGAN, for fast and accurate CS-MRI reconstruction. The proposed model is a variant of fully-residual convolutional autoencoders and generative adversarial networks (GANs), specifically designed for the CS-MRI formulation; it employs deeper generator and discriminator networks with a cyclic data consistency loss for faithful interpolation of the given under-sampled k-space data. In addition, our solution leverages a chained network to further enhance the reconstruction quality. RefineGAN is fast and accurate: reconstruction is extremely rapid, as low as tens of milliseconds for a 256x256 image, because it is a single pass through a feed-forward network, and the image quality is superior even at extremely low sampling rates (as low as 10%) due to the data-driven nature of the method. We demonstrate that RefineGAN outperforms state-of-the-art CS-MRI methods by a large margin in terms of both running time and image quality via evaluation on several open-source MRI databases.
Tasks
Published 2017-09-03
URL http://arxiv.org/abs/1709.00753v2
PDF http://arxiv.org/pdf/1709.00753v2.pdf
PWC https://paperswithcode.com/paper/compressed-sensing-mri-reconstruction-using-a
Repo
Framework
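
The data-consistency idea behind the cyclic loss is easy to state: the reconstruction, pushed through the Fourier transform and the sampling mask, should match the acquired k-space samples. A schematic numpy version of such terms is below; the exact loss composition, weighting, and the chained-generator details in RefineGAN differ, and `x_cycle` here simply denotes the output of a second reconstruction pass, which is an assumption of this sketch.

```python
import numpy as np

def kspace_consistency_loss(x_recon, y_under, mask):
    """Data-consistency term: compare the reconstruction's k-space with the
    acquired under-sampled k-space on the sampled locations only.

    x_recon : H x W image produced by the generator
    y_under : H x W under-sampled k-space measurements (zeros where unsampled)
    mask    : H x W binary sampling mask
    """
    k_recon = np.fft.fft2(x_recon)
    return np.mean(np.abs(mask * (k_recon - y_under)) ** 2)

def cyclic_loss(x_recon, x_cycle, y_under, mask):
    """Sketch of a cyclic consistency: the reconstruction should agree with the
    measurements, and re-reconstructing from the re-undersampled reconstruction
    (x_cycle) should return to the same image."""
    return (kspace_consistency_loss(x_recon, y_under, mask)
            + np.mean(np.abs(x_recon - x_cycle) ** 2))

img = np.random.rand(256, 256)
mask = (np.random.rand(256, 256) < 0.1).astype(float)   # ~10% sampling rate
y = mask * np.fft.fft2(img)
print(kspace_consistency_loss(img, y, mask))             # 0 for a perfect reconstruction
```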

Understanding the Logical and Semantic Structure of Large Documents

Title Understanding the Logical and Semantic Structure of Large Documents
Authors Muhammad Mahbubur Rahman, Tim Finin
Abstract Current language understanding approaches focus on small documents, such as newswire articles, blog posts, product reviews and discussion forum entries. Understanding and extracting information from large documents like legal briefs, proposals, technical manuals and research articles is still a challenging task. We describe a framework that can analyze a large document and help people locate the particular information they need within it. We aim to automatically identify and classify semantic sections of documents and assign consistent and human-understandable labels to similar sections across documents. A key contribution of our research is modeling the logical and semantic structure of an electronic document. We apply machine learning techniques, including deep learning, in our prototype system. We also make available a dataset of information about a collection of scholarly articles from the arXiv eprints collection that includes a wide range of metadata for each article, including a table of contents, section labels, section summarizations and more. We hope that this dataset will be a useful resource for the machine learning and NLP communities in information retrieval, content-based question answering and language modeling.
Tasks Information Retrieval, Language Modelling, Question Answering
Published 2017-09-03
URL http://arxiv.org/abs/1709.00770v1
PDF http://arxiv.org/pdf/1709.00770v1.pdf
PWC https://paperswithcode.com/paper/understanding-the-logical-and-semantic
Repo
Framework
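
As a toy illustration of assigning consistent semantic labels to sections, the snippet below trains a TF-IDF plus logistic-regression classifier on a handful of invented (section text, label) pairs. The labels, example sentences and model choice are all hypothetical; the paper’s system uses richer structural features and deep models, and its labels come from document metadata such as arXiv tables of contents.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy training data: (section text, canonical semantic label). These examples
# are made up purely for illustration.
sections = [
    ("we review prior work on neural parsing", "related work"),
    ("the corpus contains 10k annotated briefs", "dataset"),
    ("we train a bi-LSTM with dropout", "methodology"),
    ("accuracy improves by 4 points over the baseline", "results"),
    ("in summary, structure helps retrieval", "conclusion"),
]
texts, labels = zip(*sections)

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)),
                    LogisticRegression(max_iter=1000))
clf.fit(texts, labels)
print(clf.predict(["our experiments show a large gain on the test set"]))
```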

Proximal Gradient Method with Extrapolation and Line Search for a Class of Nonconvex and Nonsmooth Problems

Title Proximal Gradient Method with Extrapolation and Line Search for a Class of Nonconvex and Nonsmooth Problems
Authors Lei Yang
Abstract In this paper, we consider a class of possibly nonconvex, nonsmooth and non-Lipschitz optimization problems arising in many contemporary applications such as machine learning, variable selection and image processing. To solve this class of problems, we propose a proximal gradient method with extrapolation and line search (PGels). This method is developed based on a special potential function and successfully incorporates both extrapolation and non-monotone line search, which are two simple and efficient accelerating techniques for the proximal gradient method. Thanks to the line search, this method allows more flexibility in choosing the extrapolation parameters and updates them adaptively at each iteration if a certain line search criterion is not satisfied. Moreover, with proper choices of parameters, our PGels reduces to many existing algorithms. We also show that, under some mild conditions, our line search criterion is well defined and any cluster point of the sequence generated by PGels is a stationary point of our problem. In addition, under an assumption on the Kurdyka-Łojasiewicz exponent of the objective, we further analyze the local convergence rate of two special cases of PGels, including the widely used non-monotone proximal gradient method as one case. Finally, we conduct numerical experiments on the $\ell_1$ regularized logistic regression problem and the $\ell_{1\text{-}2}$ regularized least squares problem. Our numerical results illustrate the efficiency of PGels and show the potential advantage of combining two accelerating techniques.
Tasks
Published 2017-11-18
URL http://arxiv.org/abs/1711.06831v3
PDF http://arxiv.org/pdf/1711.06831v3.pdf
PWC https://paperswithcode.com/paper/proximal-gradient-method-with-extrapolation
Repo
Framework
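
A concrete feel for “extrapolation plus non-monotone line search” can be had on a problem of the same flavour as the paper’s experiments, $\ell_1$ regularized least squares. The sketch below takes a proximal gradient step from an extrapolated point and accepts it only if the objective falls sufficiently below the maximum of the last few objective values, shrinking the extrapolation weight otherwise. The acceptance test, the step size and the parameter-update rules are generic placeholders, not the exact PGels criteria or its potential function.

```python
import numpy as np

def soft_threshold(z, t):
    """Proximal operator of t * ||.||_1."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def pgels_l1_ls(A, b, lam, iters=500, eta=0.8, c=1e-4, memory=5):
    """Sketch of a proximal gradient method with extrapolation and a
    non-monotone line search for  min 0.5*||Ax - b||^2 + lam*||x||_1."""
    L = np.linalg.norm(A, 2) ** 2            # Lipschitz constant of the smooth gradient
    x = x_prev = np.zeros(A.shape[1])
    beta = 1.0                               # extrapolation weight (adapted below)
    F = lambda z: 0.5 * np.sum((A @ z - b) ** 2) + lam * np.sum(np.abs(z))
    hist = [F(x)]
    for _ in range(iters):
        while True:
            y = x + beta * (x - x_prev)                  # extrapolated point
            g = A.T @ (A @ y - b)
            x_new = soft_threshold(y - g / L, lam / L)   # proximal gradient step
            # Non-monotone sufficient-decrease test against recent objectives.
            if F(x_new) <= max(hist[-memory:]) - c * np.sum((x_new - y) ** 2):
                break
            if beta < 1e-12:                             # fall back to a plain step
                break
            beta *= eta                                  # shrink the extrapolation
        x_prev, x = x, x_new
        beta = min(1.0, beta / eta)                      # cautiously re-enlarge
        hist.append(F(x))
    return x, hist

A, b = np.random.randn(200, 100), np.random.randn(200)
x_est, obj_hist = pgels_l1_ls(A, b, lam=0.5)
```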

Accelerated Gradient Descent Escapes Saddle Points Faster than Gradient Descent

Title Accelerated Gradient Descent Escapes Saddle Points Faster than Gradient Descent
Authors Chi Jin, Praneeth Netrapalli, Michael I. Jordan
Abstract Nesterov’s accelerated gradient descent (AGD), an instance of the general family of “momentum methods”, provably achieves a faster convergence rate than gradient descent (GD) in the convex setting. However, whether these methods are superior to GD in the nonconvex setting remains open. This paper studies a simple variant of AGD, and shows that it escapes saddle points and finds a second-order stationary point in $\tilde{O}(1/\epsilon^{7/4})$ iterations, faster than the $\tilde{O}(1/\epsilon^{2})$ iterations required by GD. To the best of our knowledge, this is the first Hessian-free algorithm to find a second-order stationary point faster than GD, and also the first single-loop algorithm with a faster rate than GD even in the setting of finding a first-order stationary point. Our analysis is based on two key ideas: (1) the use of a simple Hamiltonian function, inspired by a continuous-time perspective, which AGD monotonically decreases per step even for nonconvex functions, and (2) a novel framework called improve or localize, which is useful for tracking the long-term behavior of gradient-based optimization algorithms. We believe that these techniques may deepen our understanding of both acceleration algorithms and nonconvex optimization.
Tasks
Published 2017-11-28
URL http://arxiv.org/abs/1711.10456v1
PDF http://arxiv.org/pdf/1711.10456v1.pdf
PWC https://paperswithcode.com/paper/accelerated-gradient-descent-escapes-saddle
Repo
Framework
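
The Hamiltonian viewpoint can be illustrated directly: track $E(x, v) = f(x) + \|v\|^2 / (2\eta)$ along the AGD iterates and discard the momentum whenever it stops decreasing. The sketch below does exactly that as a crude stand-in for the paper’s negative-curvature-exploitation step; the step sizes, the momentum parameter and the reset rule are illustrative choices, not the analyzed algorithm.

```python
import numpy as np

def agd_with_hamiltonian(f, grad, x0, eta=1e-3, theta=0.1, iters=10000):
    """Simplified AGD that monitors the Hamiltonian E(x, v) = f(x) + ||v||^2/(2*eta)
    and resets the velocity when E fails to decrease (a crude stand-in for the
    paper's negative-curvature exploitation)."""
    x = np.asarray(x0, dtype=float)
    v = np.zeros_like(x)
    E = f(x)                                    # Hamiltonian with zero velocity
    for _ in range(iters):
        y = x + (1.0 - theta) * v               # momentum (look-ahead) point
        x_next = y - eta * grad(y)
        v_next = x_next - x
        E_next = f(x_next) + v_next @ v_next / (2.0 * eta)
        if E_next > E:                          # momentum is no longer helping:
            v_next = np.zeros_like(v_next)      # reset it and recompute E
            E_next = f(x_next)
        x, v, E = x_next, v_next, E_next
    return x

# Example: a separable quartic whose origin is an unstable critical point.
x_final = agd_with_hamiltonian(lambda z: np.sum(z**4 - z**2),
                               lambda z: 4 * z**3 - 2 * z,
                               x0=1e-3 * np.ones(10))
```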

Generalized Grounding Graphs: A Probabilistic Framework for Understanding Grounded Commands

Title Generalized Grounding Graphs: A Probabilistic Framework for Understanding Grounded Commands
Authors Thomas Kollar, Stefanie Tellex, Matthew Walter, Albert Huang, Abraham Bachrach, Sachi Hemachandra, Emma Brunskill, Ashis Banerjee, Deb Roy, Seth Teller, Nicholas Roy
Abstract Many task domains require robots to interpret and act upon natural language commands which are given by people and which refer to the robot’s physical surroundings. Such interpretation is known variously as the symbol grounding problem, grounded semantics and grounded language acquisition. This problem is challenging because people employ diverse vocabulary and grammar, and because robots have substantial uncertainty about the nature and contents of their surroundings, making it difficult to associate the constitutive language elements (principally noun phrases and spatial relations) of the command text to elements of those surroundings. Symbolic models capture linguistic structure but have not scaled successfully to handle the diverse language produced by untrained users. Existing statistical approaches can better handle diversity, but have not to date modeled complex linguistic structure, limiting achievable accuracy. Recent hybrid approaches have addressed limitations in scaling and complexity, but have not effectively associated linguistic and perceptual features. Our framework, called Generalized Grounding Graphs (G^3), addresses these issues by defining a probabilistic graphical model dynamically according to the linguistic parse structure of a natural language command. This approach scales effectively, handles linguistic diversity, and enables the system to associate parts of a command with the specific objects, places, and events in the external world to which they refer. We show that robots can learn word meanings and use those learned meanings to robustly follow natural language commands produced by untrained users. We demonstrate our approach for both mobility commands and mobile manipulation commands involving a variety of semi-autonomous robotic platforms, including a wheelchair, a micro-air vehicle, a forklift, and the Willow Garage PR2.
Tasks Language Acquisition
Published 2017-11-29
URL http://arxiv.org/abs/1712.01097v1
PDF http://arxiv.org/pdf/1712.01097v1.pdf
PWC https://paperswithcode.com/paper/generalized-grounding-graphs-a-probabilistic
Repo
Framework
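
The structural idea, one factor per linguistic constituent linking a phrase to a candidate grounding in the world, can be shown with a toy log-linear model. Everything in the snippet (the objects, the phrases, the features and the weights) is invented for illustration, and exhaustive enumeration stands in for the paper’s learned factors and inference over the parse-induced graph.

```python
import itertools

# Toy world model and command constituents (all hypothetical).
objects = {"pallet1": {"type": "pallet", "x": 2.0},
           "truck1":  {"type": "truck",  "x": 8.0}}
phrases = ["the pallet", "near the truck"]

def factor_score(phrase, grounding, world):
    """Log-linear factor relating one linguistic constituent to one grounding."""
    feats = {
        "word_type_match": float(world[grounding]["type"] in phrase),
        "is_spatial_relation": float(phrase.startswith("near")),
    }
    weights = {"word_type_match": 2.0, "is_spatial_relation": 0.5}
    return sum(weights[k] * v for k, v in feats.items())

# Enumerate joint groundings and keep the highest-scoring assignment.
best = max(itertools.product(objects, repeat=len(phrases)),
           key=lambda assign: sum(factor_score(p, g, objects)
                                  for p, g in zip(phrases, assign)))
print(dict(zip(phrases, best)))   # e.g. {'the pallet': 'pallet1', ...}
```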

Meta-Learning via Feature-Label Memory Network

Title Meta-Learning via Feature-Label Memory Network
Authors Dawit Mureja, Hyunsin Park, Chang D. Yoo
Abstract Deep learning typically requires training a very capable architecture using large datasets. However, many important learning problems demand an ability to draw valid inferences from small datasets, and such problems pose a particular challenge for deep learning. In this regard, research on “meta-learning” is being actively conducted. Recent work has suggested a Memory Augmented Neural Network (MANN) for meta-learning. MANN is an implementation of a Neural Turing Machine (NTM) with the ability to rapidly assimilate new data in its memory and use this data to make accurate predictions. In models such as MANN, the input data samples and their appropriate labels from the previous step are bound together in the same memory locations. This often leads to memory interference when performing a task, as these models have to retrieve a feature of an input from a certain memory location and read only the label information bound to that location. In this paper, we address this issue by presenting a more robust MANN. We revisit the idea of meta-learning and propose a new memory augmented neural network that explicitly splits the external memory into feature and label memories. The feature memory stores the features of input data samples and the label memory stores their labels. Hence, when predicting the label of a given input, our model uses its feature memory unit as a reference to extract the stored feature of the input, and based on that feature, it retrieves the label information of the input from the label memory unit. For the network to function in this framework, we design a new memory-writing module that encodes label information into the label memory in accordance with the meta-learning task structure. We demonstrate that our model outperforms MANN by a large margin in supervised one-shot classification tasks on the Omniglot and MNIST datasets.
Tasks Meta-Learning, Omniglot
Published 2017-10-19
URL http://arxiv.org/abs/1710.07110v1
PDF http://arxiv.org/pdf/1710.07110v1.pdf
PWC https://paperswithcode.com/paper/meta-learning-via-feature-label-memory
Repo
Framework
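
The central design choice, keeping features and labels in separate memories that share slot indices, is easy to prototype. The sketch below uses cosine-similarity addressing over the feature memory and returns the co-indexed label; the real model has learned read/write controllers and is trained end to end, none of which is modeled here.

```python
import numpy as np

class FeatureLabelMemory:
    """Minimal sketch of an external memory split into a feature memory and a
    label memory, addressed by cosine similarity over the feature part (this
    toy version simply appends entries instead of learning where to write)."""

    def __init__(self, feat_dim):
        self.features = np.empty((0, feat_dim))
        self.labels = []

    def write(self, feature, label):
        # Features and labels go to separate stores but share the same slot index,
        # so retrieval by feature similarity never mixes the two kinds of content.
        self.features = np.vstack([self.features, feature[None, :]])
        self.labels.append(label)

    def read(self, query):
        # Cosine-similarity addressing over the feature memory only.
        sims = self.features @ query / (
            np.linalg.norm(self.features, axis=1) * np.linalg.norm(query) + 1e-8)
        return self.labels[int(np.argmax(sims))]

mem = FeatureLabelMemory(feat_dim=4)
mem.write(np.array([1.0, 0.0, 0.0, 0.0]), "class_a")
mem.write(np.array([0.0, 1.0, 0.0, 0.0]), "class_b")
print(mem.read(np.array([0.9, 0.1, 0.0, 0.0])))   # -> class_a
```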

Multi-label Image Recognition by Recurrently Discovering Attentional Regions

Title Multi-label Image Recognition by Recurrently Discovering Attentional Regions
Authors Zhouxia Wang, Tianshui Chen, Guanbin Li, Ruijia Xu, Liang Lin
Abstract This paper proposes a novel deep architecture to address multi-label image recognition, a fundamental and practical task towards general visual understanding. Current solutions for this task usually rely on an extra step of extracting hypothesis regions (i.e., region proposals), resulting in redundant computation and sub-optimal performance. In this work, we achieve interpretable and contextualized multi-label image classification by developing a recurrent memorized-attention module. This module consists of two alternately performed components: i) a spatial transformer layer to locate attentional regions from the convolutional feature maps in a region-proposal-free way and ii) an LSTM (Long Short-Term Memory) sub-network to sequentially predict semantic labeling scores on the located regions while capturing the global dependencies of these regions. The LSTM also outputs the parameters for computing the spatial transformer. On large-scale benchmarks of multi-label image classification (e.g., MS-COCO and PASCAL VOC 07), our approach demonstrates superior performance over existing state-of-the-art methods in both accuracy and efficiency.
Tasks Image Classification
Published 2017-11-08
URL http://arxiv.org/abs/1711.02816v1
PDF http://arxiv.org/pdf/1711.02816v1.pdf
PWC https://paperswithcode.com/paper/multi-label-image-recognition-by-recurrently
Repo
Framework
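
A compact PyTorch sketch of the recurrent memorized-attention loop is given below: at each step the LSTM state produces affine parameters for a spatial transformer, a region is sampled from the feature map, and label scores are predicted from that region. The 7x7 crop size, the mean pooling, the max aggregation over steps and the hidden size are assumptions of this sketch, not the paper’s configuration.

```python
import torch
import torch.nn.functional as F
from torch import nn

class RecurrentAttention(nn.Module):
    """Toy recurrent memorized-attention loop: LSTM -> affine transform ->
    spatial-transformer crop -> label scores, repeated for a few steps."""

    def __init__(self, channels, num_labels, steps=3, hidden=128):
        super().__init__()
        self.steps = steps
        self.lstm = nn.LSTMCell(channels, hidden)
        self.to_theta = nn.Linear(hidden, 6)       # affine params for the transformer
        self.to_scores = nn.Linear(hidden, num_labels)

    def forward(self, feat):                       # feat: (N, C, H, W)
        n, c = feat.shape[:2]
        h = torch.zeros(n, self.lstm.hidden_size, device=feat.device)
        cell = torch.zeros_like(h)
        scores = []
        for _ in range(self.steps):
            theta = self.to_theta(h).view(n, 2, 3)
            grid = F.affine_grid(theta, size=(n, c, 7, 7), align_corners=False)
            region = F.grid_sample(feat, grid, align_corners=False)  # attended crop
            pooled = region.mean(dim=(2, 3))                          # (N, C)
            h, cell = self.lstm(pooled, (h, cell))
            scores.append(self.to_scores(h))
        # Max over steps: a label is present if any attended region supports it.
        return torch.stack(scores).max(dim=0).values

model = RecurrentAttention(channels=512, num_labels=20)
logits = model(torch.randn(2, 512, 14, 14))        # e.g. conv features of two images
```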

Neural method for Explicit Mapping of Quasi-curvature Locally Linear Embedding in image retrieval

Title Neural method for Explicit Mapping of Quasi-curvature Locally Linear Embedding in image retrieval
Authors Shenglan Liu, Jun Wu, Lin Feng, Feilong Wang
Abstract This paper proposes a new explicit nonlinear dimensionality reduction method using neural networks for image retrieval tasks. We first propose a Quasi-curvature Locally Linear Embedding (QLLE) for the training set. QLLE guarantees the linear criterion in the neighborhood of each sample. Then, a neural method (NM) is proposed for the out-of-sample problem. Combining QLLE and NM, we provide an explicit nonlinear dimensionality reduction approach for efficient image retrieval. Experimental results on three benchmark datasets illustrate that our method achieves better performance than other state-of-the-art out-of-sample methods.
Tasks Dimensionality Reduction, Image Retrieval
Published 2017-03-11
URL http://arxiv.org/abs/1703.03957v1
PDF http://arxiv.org/pdf/1703.03957v1.pdf
PWC https://paperswithcode.com/paper/neural-method-for-explicit-mapping-of-quasi
Repo
Framework
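
The two-stage recipe, embed the gallery with a locally linear method and then train a network to reproduce that embedding so queries can be mapped explicitly, can be sketched with off-the-shelf components. Standard LLE and an MLP regressor stand in for the paper’s quasi-curvature embedding (QLLE) and neural method (NM); the dimensions, neighbour counts and random data are placeholders.

```python
import numpy as np
from sklearn.manifold import LocallyLinearEmbedding
from sklearn.neural_network import MLPRegressor
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
gallery = rng.standard_normal((500, 128))           # e.g. image descriptors

# Stage 1: embed the training gallery with a locally linear method.
lle = LocallyLinearEmbedding(n_neighbors=12, n_components=16)
gallery_emb = lle.fit_transform(gallery)

# Stage 2: learn an explicit mapping so new queries need no re-embedding.
nm = MLPRegressor(hidden_layer_sizes=(256,), max_iter=2000)
nm.fit(gallery, gallery_emb)

query = rng.standard_normal((1, 128))
query_emb = nm.predict(query)                        # out-of-sample embedding

index = NearestNeighbors(n_neighbors=5).fit(gallery_emb)
_, retrieved = index.kneighbors(query_emb)           # indices of retrieved images
```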

Deep Prior

Title Deep Prior
Authors Alexandre Lacoste, Thomas Boquet, Negar Rostamzadeh, Boris Oreshkin, Wonchang Chung, David Krueger
Abstract The recent literature on deep learning offers new tools to learn a rich probability distribution over high dimensional data such as images or sounds. In this work we investigate the possibility of learning the prior distribution over neural network parameters using such tools. Our resulting variational Bayes algorithm generalizes well to new tasks, even when very few training examples are provided. Furthermore, this learned prior allows the model to extrapolate correctly far from a given task’s training data on a meta-dataset of periodic signals.
Tasks
Published 2017-12-13
URL http://arxiv.org/abs/1712.05016v2
PDF http://arxiv.org/pdf/1712.05016v2.pdf
PWC https://paperswithcode.com/paper/deep-prior
Repo
Framework
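
As a generic illustration of putting a learnable distribution over network parameters, the sketch below keeps a shared Gaussian prior and a per-task Gaussian posterior over a weight vector, with reparameterized sampling and a closed-form KL term for a variational objective. This is a Bayes-by-backprop-style stand-in for the general idea, not the hierarchical model or training procedure of the paper.

```python
import torch
from torch import nn

class GaussianWeightPrior(nn.Module):
    """Learnable factorized Gaussian prior (shared across tasks) and a per-task
    factorized Gaussian posterior over a flat weight vector."""

    def __init__(self, num_weights):
        super().__init__()
        self.prior_mu = nn.Parameter(torch.zeros(num_weights))
        self.prior_log_sigma = nn.Parameter(torch.zeros(num_weights))
        self.post_mu = nn.Parameter(torch.zeros(num_weights))
        self.post_log_sigma = nn.Parameter(torch.full((num_weights,), -3.0))

    def sample_weights(self):
        # Reparameterization trick: w = mu + sigma * eps.
        eps = torch.randn_like(self.post_mu)
        return self.post_mu + self.post_log_sigma.exp() * eps

    def kl(self):
        # Closed-form KL(q_posterior || p_prior) between factorized Gaussians.
        p_var = (2 * self.prior_log_sigma).exp()
        q_var = (2 * self.post_log_sigma).exp()
        return 0.5 * torch.sum(
            (q_var + (self.post_mu - self.prior_mu) ** 2) / p_var
            - 1.0 + 2 * (self.prior_log_sigma - self.post_log_sigma))

prior = GaussianWeightPrior(num_weights=100)
w = prior.sample_weights()        # weights for one stochastic forward pass
elbo_penalty = prior.kl()         # add to the task's negative log-likelihood
```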

Fully Convolutional Neural Networks for Dynamic Object Detection in Grid Maps

Title Fully Convolutional Neural Networks for Dynamic Object Detection in Grid Maps
Authors Florian Piewak, Timo Rehfeld, Michael Weber, J. Marius Zöllner
Abstract Grid maps are widely used in robotics to represent obstacles in the environment, and differentiating dynamic objects from static infrastructure is essential for many practical applications. In this work, we present a method that uses a deep convolutional neural network (CNN) to infer whether grid cells are covering a moving object or not. Compared to tracking approaches that use, e.g., a particle filter to estimate grid cell velocities and then make a decision for individual grid cells based on this estimate, our approach uses the entire grid map as the input image for a CNN that inspects a larger area around each cell and thus takes the structural appearance in the grid map into account to make a decision. Compared to our reference method, our concept yields a performance increase from 83.9% to 97.2%. A runtime-optimized version of our approach yields similar improvements with an execution time of just 10 milliseconds.
Tasks Object Detection
Published 2017-09-10
URL http://arxiv.org/abs/1709.03139v1
PDF http://arxiv.org/pdf/1709.03139v1.pdf
PWC https://paperswithcode.com/paper/fully-convolutional-neural-networks-for
Repo
Framework
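
The “whole grid map in, per-cell decision out” formulation maps naturally onto a small fully convolutional network. The sketch below is such a network in PyTorch with made-up channel counts and input layers (e.g., occupancy plus an observation-count channel); the paper’s actual architecture, inputs and training labels differ.

```python
import torch
from torch import nn

class GridMapFCN(nn.Module):
    """Fully convolutional network producing one dynamic/static logit per cell,
    so each decision can use the surrounding structure in the grid map."""

    def __init__(self, in_channels=2):          # e.g. occupancy + observation count
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, 32, 5, padding=2), nn.ReLU(),
            nn.Conv2d(32, 64, 5, padding=2), nn.ReLU(),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 1, 1),                 # per-cell "dynamic" logit
        )

    def forward(self, grid):                     # grid: (N, C, H, W)
        return self.net(grid)

model = GridMapFCN()
logits = model(torch.randn(1, 2, 600, 600))      # dynamic-object logits per cell
loss = nn.functional.binary_cross_entropy_with_logits(
    logits, torch.zeros_like(logits))            # placeholder labels for illustration
```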

Speaker Recognition with Cough, Laugh and “Wei”

Title Speaker Recognition with Cough, Laugh and “Wei”
Authors Miao Zhang, Yixiang Chen, Lantian Li, Dong Wang
Abstract This paper proposes a speaker recognition (SRE) task with trivial speech events, such as coughs and laughs. These trivial events are ubiquitous in conversations and less subject to intentional change, therefore offering valuable particularities to discover the genuine speaker from disguised speech. However, trivial events are often short and idiosyncratic in their spectral patterns, making SRE extremely difficult. Fortunately, we found a very powerful deep feature learning structure that can extract highly speaker-sensitive features. Employing this tool, we studied SRE performance on three types of trivial events: cough, laugh and “Wei” (a short Chinese “Hello”). The results show that there is rich speaker information within these trivial events, even for cough, which is intuitively less speaker-distinguishable. With the deep feature approach, the EER can reach 10%-14% with the three trivial events, despite their extremely short durations (0.2-1.0 seconds).
Tasks Speaker Recognition
Published 2017-06-22
URL http://arxiv.org/abs/1706.07860v1
PDF http://arxiv.org/pdf/1706.07860v1.pdf
PWC https://paperswithcode.com/paper/speaker-recognition-with-cough-laugh-and-wei
Repo
Framework
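
Verification with deep features typically comes down to cosine scoring of speaker embeddings plus an equal error rate (EER) computation, which is the metric quoted above. The sketch below implements both on random stand-in embeddings; the embeddings, trial design and scoring backend of the paper are not reproduced.

```python
import numpy as np

def cosine_scores(enroll, test):
    """Cosine similarity between paired enrollment and test speaker embeddings."""
    e = enroll / np.linalg.norm(enroll, axis=1, keepdims=True)
    t = test / np.linalg.norm(test, axis=1, keepdims=True)
    return np.sum(e * t, axis=1)

def equal_error_rate(scores, labels):
    """EER of a verification system: sweep the threshold over the scores and find
    where the false-acceptance and false-rejection rates cross.
    labels: 1 for target (same-speaker) trials, 0 for non-target trials."""
    n_target = labels.sum()
    n_nontarget = len(labels) - n_target
    eer, gap = 0.5, np.inf
    for thr in np.sort(scores):
        fa = np.sum((scores >= thr) & (labels == 0)) / n_nontarget  # false accepts
        fr = np.sum((scores < thr) & (labels == 1)) / n_target      # false rejects
        if abs(fa - fr) < gap:
            gap, eer = abs(fa - fr), (fa + fr) / 2.0
    return eer

# Hypothetical embeddings: target trials are noisy copies, non-target are random.
rng = np.random.default_rng(0)
emb = rng.standard_normal((100, 256))
scores = np.concatenate([
    cosine_scores(emb, emb + 0.5 * rng.standard_normal(emb.shape)),
    cosine_scores(emb, rng.standard_normal(emb.shape))])
labels = np.concatenate([np.ones(100), np.zeros(100)])
print(f"EER ≈ {equal_error_rate(scores, labels):.3f}")
```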

A Novel Model for Arbitration between Planning and Habitual Control Systems

Title A Novel Model for Arbitration between Planning and Habitual Control Systems
Authors Farzaneh S. Fard, Thomas P. Trappenberg
Abstract It is well established that human decision making and instrumental control use multiple systems, some of which use habitual action selection and some of which require deliberate planning. Deliberate planning systems predict action outcomes using an internal model of the agent’s environment, while habitual action selection systems learn to automate behavior by repeating previously rewarded actions. Habitual control is computationally efficient but may be inflexible in changing environments. Conversely, deliberate planning may be computationally expensive but flexible in dynamic environments. This paper proposes a general architecture comprising both control paradigms by introducing an arbitrator that controls which subsystem is used at any time. This system is implemented for a target-reaching task with a simulated two-joint robotic arm that combines a supervised internal model and deep reinforcement learning. By permuting target-reaching conditions, we demonstrate that the proposed system is capable of rapidly learning the kinematics of the system without a priori knowledge and is robust to (A) changing environmental reward and kinematics and (B) occluded vision. The arbitrator model is compared to instances of the model that use exclusively deliberate planning with the internal model or exclusively habitual control. The results show how such a model can harness the benefits of both systems, using fast decisions in reliable circumstances while optimizing performance in changing environments. In addition, the proposed model learns very fast. Finally, the system that includes internal models is able to reach the target under visual occlusion, while the purely habitual system is unable to operate sufficiently under such conditions.
Tasks Decision Making
Published 2017-12-06
URL http://arxiv.org/abs/1712.02441v1
PDF http://arxiv.org/pdf/1712.02441v1.pdf
PWC https://paperswithcode.com/paper/a-novel-model-for-arbitration-between
Repo
Framework
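
The arbitration idea can be caricatured in a few lines: keep a running reliability estimate for the habitual and the planning subsystems and give control to whichever has recently been more reliable, with a bias toward the cheaper habitual controller. The reliability signal, the decay factor and the selection rule below are generic choices, not the paper’s arbitrator.

```python
import numpy as np

class Arbitrator:
    """Toy arbitration between a deliberate (model-based) controller and a
    habitual (model-free) controller based on running prediction errors."""

    def __init__(self, decay=0.9):
        self.decay = decay
        self.error = {"planner": 1.0, "habit": 1.0}   # running prediction errors

    def choose(self):
        # Habitual control is cheaper, so prefer it unless it is clearly less
        # reliable than the planner.
        return "habit" if self.error["habit"] <= self.error["planner"] else "planner"

    def update(self, system, prediction_error):
        # Exponentially weighted running average of each subsystem's error.
        self.error[system] = (self.decay * self.error[system]
                              + (1.0 - self.decay) * abs(prediction_error))

arb = Arbitrator()
for step in range(100):
    controller = arb.choose()                   # pick a subsystem for this step
    fake_error = np.random.rand() * (0.2 if controller == "habit" else 0.5)
    arb.update(controller, fake_error)
print(arb.choose())
```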