July 27, 2019


Paper Group ANR 700


End-to-End Learning of Semantic Grasping. Hand2Face: Automatic Synthesis and Recognition of Hand Over Face Occlusions. Statistics on the (compact) Stiefel manifold: Theory and Applications. A Logic for Global and Local Announcements. Detecting animals in African Savanna with UAVs and the crowds. The FastMap Algorithm for Shortest Path Computations. …

End-to-End Learning of Semantic Grasping

Title End-to-End Learning of Semantic Grasping
Authors Eric Jang, Sudheendra Vijayanarasimhan, Peter Pastor, Julian Ibarz, Sergey Levine
Abstract We consider the task of semantic robotic grasping, in which a robot picks up an object of a user-specified class using only monocular images. Inspired by the two-stream hypothesis of visual reasoning, we present a semantic grasping framework that learns object detection, classification, and grasp planning in an end-to-end fashion. A “ventral stream” recognizes object class while a “dorsal stream” simultaneously interprets the geometric relationships necessary to execute successful grasps. We leverage the autonomous data collection capabilities of robots to obtain a large self-supervised dataset for training the dorsal stream, and use semi-supervised label propagation to train the ventral stream with only a modest amount of human supervision. We experimentally show that our approach improves upon grasping systems whose components are not learned end-to-end, including a baseline method that uses bounding box detection. Furthermore, we show that jointly training our model with auxiliary data consisting of non-semantic grasping data, as well as semantically labeled images without grasp actions, has the potential to substantially improve semantic grasping performance.
Tasks Object Detection, Robotic Grasping, Visual Reasoning
Published 2017-07-06
URL http://arxiv.org/abs/1707.01932v3
PDF http://arxiv.org/pdf/1707.01932v3.pdf
PWC https://paperswithcode.com/paper/end-to-end-learning-of-semantic-grasping
Repo
Framework

Hand2Face: Automatic Synthesis and Recognition of Hand Over Face Occlusions

Title Hand2Face: Automatic Synthesis and Recognition of Hand Over Face Occlusions
Authors Behnaz Nojavanasghari, Charles E. Hughes, Tadas Baltrusaitis, Louis-Philippe Morency
Abstract A person’s face discloses important information about their affective state. Although there has been extensive research on recognition of facial expressions, the performance of existing approaches is challenged by facial occlusions. Facial occlusions are often treated as noise and discarded in recognition of affective states. However, hand over face occlusions can provide additional information for recognition of some affective states such as curiosity, frustration and boredom. One reason this problem has not gained attention is the lack of naturalistic occluded faces that contain hand over face occlusions as well as other types of occlusions. Traditional approaches for obtaining affective data are time-consuming and expensive, which limits researchers in affective computing to working with small datasets. This limitation affects the generalizability of models and deprives researchers of the benefits of recent advances in deep learning, which have shown great success in many fields but require large volumes of data. In this paper, we first introduce a novel framework for synthesizing naturalistic facial occlusions from an initial dataset of non-occluded faces and separate images of hands, reducing the costly process of data collection and annotation. We then propose a model for facial occlusion type recognition to differentiate between hand over face occlusions and other types of occlusions such as scarves, hair, glasses and objects. Finally, we present a model to localize hand over face occlusions and identify the occluded regions of the face.
Tasks
Published 2017-08-01
URL http://arxiv.org/abs/1708.00370v2
PDF http://arxiv.org/pdf/1708.00370v2.pdf
PWC https://paperswithcode.com/paper/hand2face-automatic-synthesis-and-recognition
Repo
Framework

Statistics on the (compact) Stiefel manifold: Theory and Applications

Title Statistics on the (compact) Stiefel manifold: Theory and Applications
Authors Rudrasis Chakraborty, Baba Vemuri
Abstract A Stiefel manifold of the compact type is often encountered in many fields of engineering, including signal and image processing, machine learning, numerical optimization and others. The Stiefel manifold is a Riemannian homogeneous space but not a symmetric space. In previous work, researchers have defined probability distributions on symmetric spaces and performed statistical analysis of data residing in these spaces. In this paper, we present original work involving the definition of Gaussian distributions on a homogeneous space and show that the maximum-likelihood estimate of the location parameter of a Gaussian distribution on the homogeneous space yields the Fréchet mean (FM) of the samples drawn from this distribution. Further, we present an algorithm to sample from the Gaussian distribution on the Stiefel manifold and recursively compute the FM of these samples. We also prove the weak consistency of this recursive FM estimator. Several synthetic and real data experiments are then presented, demonstrating the superior computational performance of this estimator over its gradient-descent-based non-recursive counterpart as well as over the stochastic gradient descent based method prevalent in the literature.
Tasks
Published 2017-07-31
URL http://arxiv.org/abs/1708.00045v1
PDF http://arxiv.org/pdf/1708.00045v1.pdf
PWC https://paperswithcode.com/paper/statistics-on-the-compact-stiefel-manifold
Repo
Framework
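
For p = 1 the compact Stiefel manifold of orthonormal p-frames in R^n reduces to the unit sphere S^(n-1), where geodesics are great circles. Under that simplifying assumption, the recursive Fréchet mean estimator described above can be sketched as follows: each new sample moves the running estimate a fraction 1/k of the way along the geodesic toward it. This is an illustration of the recursive idea on the sphere only, not the authors' general Stiefel-manifold algorithm; the function names are hypothetical.

```python
import math


def slerp(p, q, t):
    """Point a fraction t of the way along the great-circle geodesic from p to q.

    p and q must be unit vectors that are not antipodal (antipodal points
    have no unique geodesic between them).
    """
    dot = max(-1.0, min(1.0, sum(a * b for a, b in zip(p, q))))
    theta = math.acos(dot)  # geodesic distance on the unit sphere
    if theta < 1e-12:
        return list(p)
    s = math.sin(theta)
    return [(math.sin((1.0 - t) * theta) * a + math.sin(t * theta) * b) / s
            for a, b in zip(p, q)]


def recursive_frechet_mean(samples):
    """Inductive FM estimate: M_1 = X_1, then M_k = geodesic(M_{k-1}, X_k) at weight 1/k."""
    mean = list(samples[0])
    for k, x in enumerate(samples[1:], start=2):
        mean = slerp(mean, x, 1.0 / k)
    return mean
```

Each update costs one geodesic evaluation, so the estimate can be refreshed as samples arrive, which is the computational advantage over re-running an iterative optimizer on the full sample set.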

A Logic for Global and Local Announcements

Title A Logic for Global and Local Announcements
Authors Francesco Belardinelli, Hans van Ditmarsch, Wiebe van der Hoek
Abstract In this paper we introduce global and local announcement logic (GLAL), a dynamic epistemic logic with two distinct announcement operators, $[\phi]^+_A$ and $[\phi]^-_A$, indexed to a subset $A$ of the set $Ag$ of all agents, for global and local announcements respectively. The boundary case $[\phi]^+_{Ag}$ corresponds to the public announcement of $\phi$, as known from the literature. Unlike standard public announcements, which are model transformers, the global and local announcements are pointed model transformers. In particular, the update induced by the announcement may be different in different states of the model. Therefore, the resulting computations are trees of models, rather than the typical sequences. A consequence of our semantics is that modally bisimilar states may be distinguished in our logic. We then provide a stronger notion of bisimilarity and show that it preserves modal equivalence in GLAL. Additionally, we show that GLAL is strictly more expressive than public announcement logic with common knowledge. We prove a wide range of validities for GLAL involving the interaction between dynamics and knowledge, and show that the satisfiability problem for GLAL is decidable. We illustrate the formal machinery by means of detailed epistemic scenarios.
Tasks
Published 2017-07-27
URL http://arxiv.org/abs/1707.08735v1
PDF http://arxiv.org/pdf/1707.08735v1.pdf
PWC https://paperswithcode.com/paper/a-logic-for-global-and-local-announcements
Repo
Framework
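
The abstract notes that the boundary case $[\phi]^+_{Ag}$ is the ordinary public announcement. As a minimal sketch of that boundary case only (not of the GLAL global or local operators, which are pointed and can act differently in different states), a public announcement of an atomic proposition restricts a Kripke model to the states where it holds, and knowledge is then evaluated in the restricted model. The model encoding and function names below are hypothetical.

```python
def public_announcement(states, relations, prop):
    """Restrict a Kripke model to the states where `prop` holds.

    states: set of state names.
    relations: {agent: {state: set of accessible states}}.
    prop: callable state -> bool (an atomic proposition).
    """
    kept = {s for s in states if prop(s)}
    new_relations = {
        agent: {s: acc[s] & kept for s in kept}
        for agent, acc in relations.items()
    }
    return kept, new_relations


def knows(relations, agent, state, prop):
    """K_agent prop holds at `state` iff prop is true in every accessible state."""
    return all(prop(t) for t in relations[agent][state])
```

For example, if agent a cannot distinguish a state where p is true from one where it is false, a does not know p; after p is publicly announced, the p-false state disappears and a knows p.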

Detecting animals in African Savanna with UAVs and the crowds

Title Detecting animals in African Savanna with UAVs and the crowds
Authors Nicolas Rey, Michele Volpi, Stéphane Joost, Devis Tuia
Abstract Unmanned aerial vehicles (UAVs) offer new opportunities for wildlife monitoring, with several advantages over traditional field-based methods. They have already been used to count birds, marine mammals and large herbivores in different environments, tasks that are routinely performed by manually counting animals in large collections of images. In this paper, we propose a semi-automatic system able to detect large mammals in semi-arid Savanna. It relies on an animal-detection system based on machine learning, trained with crowd-sourced annotations provided by volunteers who manually interpreted sub-decimeter resolution color images. The system achieves a high recall rate, and a human operator can then eliminate false detections with limited effort. Our system offers good prospects for the development of data-driven management practices in wildlife conservation. It shows that the detection of large mammals in semi-arid Savanna can be approached by processing data provided by standard RGB cameras mounted on affordable fixed-wing UAVs.
Tasks
Published 2017-09-06
URL http://arxiv.org/abs/1709.01722v1
PDF http://arxiv.org/pdf/1709.01722v1.pdf
PWC https://paperswithcode.com/paper/detecting-animals-in-african-savanna-with
Repo
Framework
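
The abstract describes a workflow in which the detector is run at high recall and a human operator then removes false detections. One hypothetical way to choose that operating point, assuming the detector outputs a confidence score per candidate and a labelled validation set is available, is to take the highest score threshold that still reaches a target recall and report how many false detections the operator would have to review. This is an illustration, not the authors' procedure.

```python
import math


def threshold_for_recall(scores, labels, target_recall):
    """Highest threshold t such that predicting `score >= t` reaches target_recall."""
    positives = sorted((s for s, y in zip(scores, labels) if y), reverse=True)
    if not positives:
        raise ValueError("need at least one positive example")
    # Number of positives that must be kept; the small epsilon guards against
    # floating-point round-up (e.g. 0.66 * 3 landing just above 2).
    k = max(1, math.ceil(target_recall * len(positives) - 1e-9))
    return positives[min(k, len(positives)) - 1]


def review_load(scores, labels, threshold):
    """False detections a human operator would have to discard at this threshold."""
    return sum(1 for s, y in zip(scores, labels) if s >= threshold and not y)
```

Raising the target recall lowers the threshold, which can only keep or increase the number of false detections passed on for manual review.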

The FastMap Algorithm for Shortest Path Computations

Title The FastMap Algorithm for Shortest Path Computations
Authors Liron Cohen, Tansel Uras, Shiva Jahangiri, Aliyah Arunasalam, Sven Koenig, T. K. Satish Kumar
Abstract We present a new preprocessing algorithm for embedding the nodes of a given edge-weighted undirected graph into a Euclidean space. The Euclidean distance between any two nodes in this space approximates the length of the shortest path between them in the given graph. Later, at runtime, a shortest path between any two nodes can be computed with A* search using the Euclidean distances as a heuristic. Our preprocessing algorithm, called FastMap, is inspired by the data mining algorithm of the same name and runs in near-linear time. Hence, FastMap is orders of magnitude faster than competing approaches that produce a Euclidean embedding using Semidefinite Programming. FastMap also produces admissible and consistent heuristics and therefore guarantees the generation of shortest paths. Moreover, FastMap applies to general undirected graphs for which many traditional heuristics, such as the Manhattan Distance heuristic, are not well defined. Empirically, we demonstrate that A* search using the FastMap heuristic is competitive with A* search using other state-of-the-art heuristics, such as the Differential heuristic.
Tasks
Published 2017-06-08
URL http://arxiv.org/abs/1706.02792v3
PDF http://arxiv.org/pdf/1706.02792v3.pdf
PWC https://paperswithcode.com/paper/the-fastmap-algorithm-for-shortest-path
Repo
Framework
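
The preprocessing-plus-A* pipeline in the FastMap abstract can be sketched as follows. Each iteration picks two far-apart pivot nodes a and b, gives every node u the coordinate (d(a,u) + d(a,b) − d(b,u)) / 2, and subtracts the distance that axis already explains from every edge weight before the next iteration. One caveat: although the abstract speaks of Euclidean distances, this sketch uses the L1 (Manhattan) distance between embeddings as the A* heuristic, because together with the edge-reweighting step it can be shown to be admissible and consistent; treat it as an illustrative variant rather than the authors' exact formulation. The graph is assumed connected and undirected, and all function names are hypothetical.

```python
import heapq
import random


def dijkstra(adj, src):
    """Shortest-path distances from src; adj maps node -> {neighbor: weight}."""
    dist = {src: 0.0}
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in adj[u].items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def fastmap_embed(adj, dims=3, seed=0, eps=1e-9):
    """Assign each node a coordinate vector (graph assumed connected, undirected)."""
    rng = random.Random(seed)
    w = {u: dict(nbrs) for u, nbrs in adj.items()}  # residual weights (a copy)
    nodes = sorted(adj)
    coords = {u: [] for u in nodes}
    for _ in range(dims):
        # Pivot heuristic: farthest node from a random node, then farthest from that.
        d0 = dijkstra(w, rng.choice(nodes))
        a = max(d0, key=d0.get)
        da = dijkstra(w, a)
        b = max(da, key=da.get)
        db = dijkstra(w, b)
        if da[b] < eps:
            break  # residual distances have collapsed; no more useful axes
        for u in nodes:
            coords[u].append((da[u] + da[b] - db[u]) / 2.0)
        # Subtract what this axis already explains; stays >= 0 by the triangle inequality.
        for u in nodes:
            for v in w[u]:
                w[u][v] = max(0.0, w[u][v] - abs(coords[u][-1] - coords[v][-1]))
    return coords


def l1(p, q):
    return sum(abs(x - y) for x, y in zip(p, q))


def astar(adj, coords, s, t):
    """A* on the original weights, using the L1 distance between embeddings as heuristic."""
    g = {s: 0.0}
    heap = [(l1(coords[s], coords[t]), s)]
    closed = set()
    while heap:
        _, u = heapq.heappop(heap)
        if u == t:
            return g[u]
        if u in closed:
            continue
        closed.add(u)
        for v, w in adj[u].items():
            ng = g[u] + w
            if ng < g.get(v, float("inf")):
                g[v] = ng
                heapq.heappush(heap, (ng + l1(coords[v], coords[t]), v))
    return float("inf")
```

The embedding is computed once (a few Dijkstra runs per dimension), after which each query only evaluates cheap coordinate differences as its heuristic.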

Estimate exponential memory decay in Hidden Markov Model and its applications

Title Estimate exponential memory decay in Hidden Markov Model and its applications
Authors Felix X. -F. Ye, Yi-an Ma, Hong Qian
Abstract Inference in hidden Markov models has been challenging in terms of scalability due to dependencies in the observation data. In this paper, we utilize the inherent memory decay in hidden Markov models, such that the forward and backward probabilities can be computed on subsequences, enabling efficient inference over long sequences of observations. We formulate this forward filtering process in the setting of random dynamical systems, in which Lyapunov exponents exist for products of i.i.d. random matrices. The rate of memory decay is then, almost surely, $\lambda_2-\lambda_1$, the gap between the top two Lyapunov exponents. An efficient and accurate algorithm is proposed to numerically estimate this gap after a soft-max parametrization. The subsequence length $B$ required for a controlled error $\epsilon$ is $B=\log(\epsilon)/(\lambda_2-\lambda_1)$. We theoretically prove the validity of the algorithm and demonstrate its effectiveness with numerical examples. The method developed here can be applied to widely used algorithms, such as the mini-batch stochastic gradient method. Moreover, the continuity of the Lyapunov spectrum ensures that the estimated $B$ can be reused for nearby parameters during inference.
Tasks
Published 2017-10-17
URL http://arxiv.org/abs/1710.06078v1
PDF http://arxiv.org/pdf/1710.06078v1.pdf
PWC https://paperswithcode.com/paper/estimate-exponential-memory-decay-in-hidden
Repo
Framework
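
The abstract's recipe (estimate the gap between the top two Lyapunov exponents of the forward-filtering matrix product, then set B = log(ε)/(λ2 − λ1)) can be sketched with the standard QR (Gram-Schmidt) method for Lyapunov exponents. This is not the paper's soft-max-parametrized estimator. The two-state HMM, its parameters, and the choice to draw observations i.i.d. (so that the forward matrices are i.i.d.) are hypothetical simplifications.

```python
import math
import random


def gram_schmidt(cols):
    """Orthonormalize column vectors; return (orthonormal columns, diagonal of R)."""
    q, r_diag = [], []
    for v in cols:
        w = list(v)
        for u in q:
            dot = sum(a * b for a, b in zip(w, u))
            w = [a - dot * b for a, b in zip(w, u)]
        norm = math.sqrt(sum(a * a for a in w))
        q.append([a / norm for a in w])
        r_diag.append(norm)
    return q, r_diag


def matvec(m, v):
    return [sum(m[i][j] * v[j] for j in range(len(v))) for i in range(len(m))]


def top_two_lyapunov(sample_matrix, dim, n_steps=5000, seed=0):
    """Estimate lambda_1 >= lambda_2 for a product of i.i.d. random dim x dim matrices
    by repeatedly re-orthonormalizing two tracked directions."""
    rng = random.Random(seed)
    q = [[1.0 if i == j else 0.0 for i in range(dim)] for j in range(2)]
    sums = [0.0, 0.0]
    for _ in range(n_steps):
        m = sample_matrix(rng)
        q, r = gram_schmidt([matvec(m, col) for col in q])
        sums[0] += math.log(r[0])
        sums[1] += math.log(r[1])
    return sums[0] / n_steps, sums[1] / n_steps


def block_length(eps, lam1, lam2):
    """B = log(eps) / (lambda_2 - lambda_1), rounded up to whole time steps."""
    return math.ceil(math.log(eps) / (lam2 - lam1))


# Hypothetical 2-state HMM with binary observations.
P = [[0.9, 0.1], [0.2, 0.8]]           # P[i][j] = p(next state j | state i)
EMIT = {0: [0.7, 0.2], 1: [0.3, 0.8]}  # EMIT[y][i] = p(y | state i)


def forward_matrix(rng):
    """Matrix M with alpha_t = M alpha_{t-1}, i.e. diag(p(y_t | .)) P^T."""
    e = EMIT[rng.randint(0, 1)]
    return [[e[i] * P[j][i] for j in range(2)] for i in range(2)]
```

Usage: `lam1, lam2 = top_two_lyapunov(forward_matrix, dim=2)` followed by `block_length(1e-6, lam1, lam2)` gives the subsequence length for that error level. Because λ2 − λ1 < 0 and log ε < 0, B comes out positive.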

Gradient Descent using Duality Structures

Title Gradient Descent using Duality Structures
Authors Thomas Flynn
Abstract Gradient descent is commonly used to solve optimization problems arising in machine learning, such as training neural networks. Although it seems to be effective for many different neural network training problems, it is unclear if the effectiveness of gradient descent can be explained using existing performance guarantees for the algorithm. We argue that existing analyses of gradient descent rely on assumptions that are too strong to be applicable in the case of multi-layer neural networks. To address this, we propose an algorithm, duality structure gradient descent (DSGD), that is amenable to a non-asymptotic performance analysis, under mild assumptions on the training set and network architecture. The algorithm can be viewed as a form of layer-wise coordinate descent, where at each iteration the algorithm chooses one layer of the network to update. The choice of which layer to update is made greedily, based on a rigorous lower bound on the function decrease for each possible choice of layer. In the analysis, we bound the time required to reach approximate stationary points, in both the deterministic and stochastic settings. The convergence is measured in terms of a Finsler geometry that is derived from the network architecture and designed to ensure a Lipschitz-like property on the gradient of the training objective function. Numerical experiments in both the full batch and mini-batch settings suggest that the algorithm is a promising step towards methods for training neural networks that are both rigorous and efficient.
Tasks
Published 2017-08-01
URL http://arxiv.org/abs/1708.00523v5
PDF http://arxiv.org/pdf/1708.00523v5.pdf
PWC https://paperswithcode.com/paper/gradient-descent-using-duality-structures
Repo
Framework
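
The abstract describes DSGD as layer-wise coordinate descent that greedily picks the layer with the best guaranteed decrease. The sketch below illustrates only that greedy selection rule, on a toy quadratic, and swaps in a standard block-coordinate-descent bound instead of the paper's Finsler-geometry analysis: if the gradient restricted to block k is L_k-Lipschitz, the step x_k -= g_k / L_k lowers the objective by at least ||g_k||^2 / (2 L_k), and the block with the largest such bound is updated. The quadratic, the block partition standing in for layers, and all function names are hypothetical.

```python
def quad_value(A, b, x):
    """f(x) = 0.5 x^T A x - b^T x."""
    n = len(x)
    quad = sum(x[i] * A[i][j] * x[j] for i in range(n) for j in range(n))
    return 0.5 * quad - sum(b[i] * x[i] for i in range(n))


def quad_grad(A, b, x):
    """Gradient A x - b (A symmetric)."""
    n = len(x)
    return [sum(A[i][j] * x[j] for j in range(n)) - b[i] for i in range(n)]


def block_lipschitz(A, block):
    """Gershgorin upper bound on the largest eigenvalue of the block A[block, block]."""
    return max(sum(abs(A[i][j]) for j in block) for i in block)


def greedy_block_descent(A, b, blocks, x, n_iters=100):
    """Each iteration updates only the block with the largest guaranteed decrease."""
    x = list(x)
    lips = [block_lipschitz(A, blk) for blk in blocks]
    history = [quad_value(A, b, x)]
    for _ in range(n_iters):
        g = quad_grad(A, b, x)
        # Lower bound on the decrease achieved by the step x_k -= g_k / L_k.
        bounds = [sum(g[i] ** 2 for i in blk) / (2.0 * lip)
                  for blk, lip in zip(blocks, lips)]
        k = max(range(len(blocks)), key=bounds.__getitem__)
        for i in blocks[k]:
            x[i] -= g[i] / lips[k]
        history.append(quad_value(A, b, x))
    return x, history
```

Because every step is backed by a guaranteed decrease, the recorded objective values never increase, which is the kind of property that makes a non-asymptotic analysis possible.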

Learning to Paraphrase for Question Answering

Title Learning to Paraphrase for Question Answering
Authors Li Dong, Jonathan Mallinson, Siva Reddy, Mirella Lapata
Abstract Question answering (QA) systems are sensitive to the many different ways natural language expresses the same information need. In this paper we turn to paraphrases as a means of capturing this knowledge and present a general framework which learns felicitous paraphrases for various QA tasks. Our method is trained end-to-end using question-answer pairs as a supervision signal. A question and its paraphrases serve as input to a neural scoring model which assigns higher weights to linguistic expressions most likely to yield correct answers. We evaluate our approach on QA over Freebase and answer sentence selection. Experimental results on three datasets show that our framework consistently improves performance, achieving competitive results despite the use of simple QA models.
Tasks Question Answering
Published 2017-08-20
URL http://arxiv.org/abs/1708.06022v1
PDF http://arxiv.org/pdf/1708.06022v1.pdf
PWC https://paperswithcode.com/paper/learning-to-paraphrase-for-question-answering
Repo
Framework
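
The scoring idea in the abstract can be illustrated by marginalizing over paraphrases: normalize the paraphrase scores with a softmax, then average each paraphrase's answer distribution under those weights. This is a minimal sketch assuming a QA model that returns an answer distribution per input and a scorer that returns one real number per paraphrase; both are hypothetical stand-ins for the neural models in the paper.

```python
import math


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def answer_distribution(paraphrase_scores, qa_distributions):
    """p(a | q) = sum_i softmax(scores)_i * p(a | paraphrase_i)."""
    weights = softmax(paraphrase_scores)
    combined = {}
    for w, dist in zip(weights, qa_distributions):
        for answer, p in dist.items():
            combined[answer] = combined.get(answer, 0.0) + w * p
    return combined
```

Paraphrases that the scorer rates highly dominate the final answer distribution, so a simple QA model benefits from whichever rewording it handles best.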

Optimal statistical decision for Gaussian graphical model selection

Title Optimal statistical decision for Gaussian graphical model selection
Authors Valery A. Kalyagin, Alexander P. Koldanov, Petr A. Koldanov, Panos M. Pardalos
Abstract A Gaussian graphical model is a graphical representation of the dependence structure of a Gaussian random vector. It is recognized as a powerful tool in different applied fields such as bioinformatics, error-control codes, speech language, information retrieval and others. Gaussian graphical model selection is the statistical problem of identifying the Gaussian graphical model from a sample of a given size. Different approaches to Gaussian graphical model selection have been suggested in the literature. One of them is based on considering the family of individual conditional independence tests. Applying this approach leads to the construction of a variety of multiple testing statistical procedures for Gaussian graphical model selection. An important characteristic of these procedures is their error rate for a given sample size. In the existing literature, great attention is paid to the control of error rates for incorrect edge inclusion (Type I error). However, in graphical model selection it is also important to take into account error rates for incorrect edge exclusion (Type II error). To deal with this issue, we consider the graphical model selection problem in the framework of multiple decision theory. The quality of statistical procedures is measured by a risk function with additive losses, which allow both types of errors to be taken into account. We construct tests of Neyman structure for individual hypotheses and combine them to obtain a multiple decision statistical procedure. We show that the obtained procedure is optimal in the sense that it minimizes the linear combination of expected numbers of Type I and Type II errors in the class of unbiased multiple decision procedures.
Tasks Information Retrieval, Model Selection
Published 2017-01-09
URL http://arxiv.org/abs/1701.02071v1
PDF http://arxiv.org/pdf/1701.02071v1.pdf
PWC https://paperswithcode.com/paper/optimal-statistical-decision-for-gaussian
Repo
Framework
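
The risk with additive losses in the abstract weighs the two error types linearly. For a single selected graph, the corresponding loss can be sketched as below: count falsely included edges (Type I) and falsely excluded edges (Type II) and combine them with weights a and b. The function name and weights are illustrative; the paper's risk is the expectation of such a loss over samples, and its optimal procedure, built from tests of Neyman structure, is not shown here.

```python
def additive_loss(true_edges, selected_edges, a=1.0, b=1.0):
    """Return (type_1, type_2, loss) for undirected edge sets given as pairs."""
    def normalize(edges):
        return {tuple(sorted(e)) for e in edges}

    truth, chosen = normalize(true_edges), normalize(selected_edges)
    type_1 = len(chosen - truth)  # edges included but absent from the true graph
    type_2 = len(truth - chosen)  # true edges that were excluded
    return type_1, type_2, a * type_1 + b * type_2
```

Setting b larger than a penalizes missed edges more heavily than spurious ones, which is exactly the trade-off that procedures controlling only Type I error ignore.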

Customizing First Person Image Through Desired Actions

Title Customizing First Person Image Through Desired Actions
Authors Shan Su, Jianbo Shi, Hyun Soo Park
Abstract This paper studies a problem of inverse visual path planning: creating a visual scene from a first person action. Our conjecture is that the spatial arrangement of a first person visual scene is deployed to afford an action, and therefore, the action can be inversely used to synthesize a new scene such that the action is feasible. As a proof-of-concept, we focus on linking visual experiences induced by walking. A key innovation of this paper is the concept of ActionTunnel, a 3D virtual tunnel along the future trajectory encoding what the wearer will visually experience as they move into the scene. This connects two distinct first person images through similar walking paths. Our method takes a first person image with a user-defined future trajectory and outputs a new image that can afford the future motion. The image is created by combining present and future ActionTunnels in 3D, where the missing pixels in the adjoining area are computed by a generative adversarial network. Our work makes it possible to travel across different first person experiences in diverse real world scenes.
Tasks
Published 2017-04-01
URL http://arxiv.org/abs/1704.00098v1
PDF http://arxiv.org/pdf/1704.00098v1.pdf
PWC https://paperswithcode.com/paper/customizing-first-person-image-through
Repo
Framework

Viewpoint Invariant Action Recognition using RGB-D Videos

Title Viewpoint Invariant Action Recognition using RGB-D Videos
Authors Jian Liu, Naveed Akhtar, Ajmal Mian
Abstract In video-based action recognition, viewpoint variations often pose major challenges because the same actions can appear different from different views. We use the complementary RGB and depth information from RGB-D cameras to address this problem. The proposed technique capitalizes on the spatio-temporal information available in the two data streams to extract action features that are largely insensitive to viewpoint variations. We use the RGB data to compute dense trajectories that are translated to viewpoint-insensitive deep features under a non-linear knowledge transfer model. Similarly, the depth stream is used to extract CNN-based view-invariant features on which a Fourier Temporal Pyramid is computed to incorporate the temporal information. The heterogeneous features from the two streams are combined and used as a dictionary to predict the labels of the test samples. To that end, we propose a sparse-dense collaborative representation classification scheme that strikes a balance between the discriminative abilities of the dense and the sparse representations of the samples over the extracted heterogeneous dictionary.
Tasks Temporal Action Localization, Transfer Learning
Published 2017-09-15
URL http://arxiv.org/abs/1709.05087v2
PDF http://arxiv.org/pdf/1709.05087v2.pdf
PWC https://paperswithcode.com/paper/viewpoint-invariant-action-recognition-using
Repo
Framework
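
As background for the classification step in the abstract, the sketch below implements plain collaborative representation classification (CRC): code a test feature over the whole dictionary with ridge (l2) regularization, then assign the class whose atoms reconstruct it with the smallest coefficient-normalized residual. It covers only the dense half of the paper's sparse-dense scheme and uses toy vectors in place of the heterogeneous RGB-D features; the regularization weight `lam` and the helper names are assumptions.

```python
import math


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting (a is not modified)."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def dot(u, v):
    return sum(p * q for p, q in zip(u, v))


def crc_classify(atoms, labels, y, lam=0.1):
    """Ridge-code y over all atoms, then pick the class with the smallest
    residual ||y - D_c x_c|| divided by ||x_c||."""
    k = len(atoms)
    gram = [[dot(atoms[i], atoms[j]) + (lam if i == j else 0.0) for j in range(k)]
            for i in range(k)]
    x = solve(gram, [dot(atom, y) for atom in atoms])
    best_label, best_score = None, float("inf")
    for c in sorted(set(labels)):
        idx = [i for i in range(k) if labels[i] == c]
        recon = [sum(x[i] * atoms[i][d] for i in idx) for d in range(len(y))]
        resid = math.sqrt(sum((yd - rd) ** 2 for yd, rd in zip(y, recon)))
        coef_norm = max(math.sqrt(sum(x[i] ** 2 for i in idx)), 1e-12)
        if resid / coef_norm < best_score:
            best_label, best_score = c, resid / coef_norm
    return best_label
```

The dense (ridge) code has a closed-form solution, which is why CRC is cheap compared with sparse coding; the paper's contribution is balancing this against a sparse representation.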

Multiple-Instance, Cascaded Classification for Keyword Spotting in Narrow-Band Audio

Title Multiple-Instance, Cascaded Classification for Keyword Spotting in Narrow-Band Audio
Authors Ahmad AbdulKader, Kareem Nassar, Mohamed Mahmoud, Daniel Galvez, Chetan Patil
Abstract We propose using cascaded classifiers for a keyword spotting (KWS) task on narrow-band (NB), 8 kHz audio acquired in non-IID environments, a more challenging task than most state-of-the-art KWS systems face. We present a model that incorporates Deep Neural Networks (DNNs), cascading, multiple-feature representations, and multiple-instance learning. The cascaded classifiers handle the task’s class imbalance and reduce power consumption on computationally constrained devices via early termination. The KWS system achieves a false negative rate of 6% at an hourly false positive rate of 0.75.
Tasks Keyword Spotting, Multiple Instance Learning
Published 2017-11-21
URL http://arxiv.org/abs/1711.08058v1
PDF http://arxiv.org/pdf/1711.08058v1.pdf
PWC https://paperswithcode.com/paper/multiple-instance-cascaded-classification-for
Repo
Framework
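
The two ideas named in the abstract, multiple-instance learning and cascading with early termination, can be combined as in the sketch below: a clip is treated as a bag of audio windows scored by its best window, and stages run in order until one rejects the clip. The stage functions, thresholds, and window representation are hypothetical placeholders, not the paper's DNNs or features.

```python
def bag_score(stage, windows):
    """Multiple-instance rule: a clip (bag) scores as high as its best window."""
    return max(stage(w) for w in windows)


def cascade_detect(windows, stages, thresholds):
    """Run increasingly expensive stages in order and stop at the first rejection.

    Returns (keyword_detected, number_of_stages_evaluated).
    """
    for n_run, (stage, thr) in enumerate(zip(stages, thresholds), start=1):
        if bag_score(stage, windows) < thr:
            return False, n_run  # early termination skips the remaining stages
    return True, len(stages)
```

Since most audio contains no keyword, a cheap first stage that rejects the bulk of clips means the expensive later stages rarely run, which is where the power savings come from.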

Computer Aided Detection of Anemia-like Pallor

Title Computer Aided Detection of Anemia-like Pallor
Authors Sohini Roychowdhury, Donny Sun, Matthew Bihis, Johnny Ren, Paul Hage, Humairat H. Rahman
Abstract Paleness or pallor is a manifestation of blood loss or low hemoglobin concentrations in the human blood that can be caused by pathologies such as anemia. This work presents the first automated screening system that uses pallor site images, segments them, and extracts color- and intensity-based features for multi-class classification of patients with high pallor due to anemia-like pathologies, normal patients, and patients with other abnormalities. This work analyzes the pallor sites of conjunctiva and tongue for anemia screening purposes. First, for the eye pallor site images, the sclera and conjunctiva regions are automatically segmented for regions of interest. Similarly, for the tongue pallor site images, the inner and outer tongue regions are segmented. Then, color-plane based feature extraction is performed, followed by machine learning algorithms for feature reduction and image-level classification for anemia. In this work, a suite of classification algorithms is used for image-level classification into normal (class 0), pallor (class 1) and other abnormalities (class 2). The proposed method achieves 86% accuracy, 85% precision and 67% recall on eye pallor site images and 98.2% accuracy and precision with 100% recall on tongue pallor site images for the classification of images with pallor. The proposed pallor screening system can be further fine-tuned to detect the severity of anemia-like pathologies using a controlled set of local images that can then be used for future benchmarking purposes.
Tasks
Published 2017-03-17
URL http://arxiv.org/abs/1703.05913v1
PDF http://arxiv.org/pdf/1703.05913v1.pdf
PWC https://paperswithcode.com/paper/computer-aided-detection-of-anemia-like
Repo
Framework
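
The color-plane feature step in the abstract can be illustrated with simple per-plane statistics over a segmented region. This sketch assumes an RGB image given as nested lists of (r, g, b) tuples and a binary mask produced by the segmentation step; the specific features (mean and standard deviation per plane) are hypothetical and not necessarily those used in the paper.

```python
import math


def color_plane_features(image, mask):
    """Mean and standard deviation of each RGB plane over pixels where mask is true."""
    pixels = [px for row, mrow in zip(image, mask) for px, m in zip(row, mrow) if m]
    if not pixels:
        raise ValueError("mask selects no pixels")
    features = []
    for c in range(3):
        vals = [px[c] for px in pixels]
        mean = sum(vals) / len(vals)
        std = math.sqrt(sum((v - mean) ** 2 for v in vals) / len(vals))
        features.extend([mean, std])
    return features  # [R mean, R std, G mean, G std, B mean, B std]
```

A feature vector like this, computed separately for each segmented region (for example conjunctiva or inner tongue), would then feed the feature-reduction and classification stages.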

Incorporating Copying Mechanism in Image Captioning for Learning Novel Objects

Title Incorporating Copying Mechanism in Image Captioning for Learning Novel Objects
Authors Ting Yao, Yingwei Pan, Yehao Li, Tao Mei
Abstract Image captioning often requires a large set of training image-sentence pairs. In practice, however, acquiring sufficient training pairs is always expensive, limiting recent captioning models in their ability to describe objects outside of training corpora (i.e., novel objects). In this paper, we present Long Short-Term Memory with Copying Mechanism (LSTM-C), a new architecture that incorporates copying into the Convolutional Neural Networks (CNN) plus Recurrent Neural Networks (RNN) image captioning framework, for describing novel objects in captions. Specifically, freely available object recognition datasets are leveraged to develop classifiers for novel objects. Our LSTM-C then integrates the standard word-by-word sentence generation by a decoder RNN with a copying mechanism that may instead select words from novel objects at proper places in the output sentence. Extensive experiments are conducted on both the MSCOCO image captioning and ImageNet datasets, demonstrating the ability of our proposed LSTM-C architecture to describe novel objects. Furthermore, superior results are reported when compared to state-of-the-art deep models.
Tasks Image Captioning, Object Recognition
Published 2017-08-17
URL http://arxiv.org/abs/1708.05271v1
PDF http://arxiv.org/pdf/1708.05271v1.pdf
PWC https://paperswithcode.com/paper/incorporating-copying-mechanism-in-image
Repo
Framework
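
One common way to combine generation with copying, used here purely as an illustration, is a single softmax over the decoder's vocabulary scores together with copy scores for candidate novel-object words, summing the probability mass of any word that appears in both lists. Whether LSTM-C combines the two scores exactly this way is an assumption; the scores and words are hypothetical.

```python
import math


def copy_generate_distribution(gen_scores, copy_scores):
    """Joint softmax over generate and copy scores.

    gen_scores: {word: score} from the decoder vocabulary.
    copy_scores: {word: score} for novel-object words proposed by the classifiers.
    Returns {word: probability}.
    """
    all_scores = list(gen_scores.values()) + list(copy_scores.values())
    m = max(all_scores)  # subtract the max for numerical stability
    z = sum(math.exp(s - m) for s in all_scores)
    probs = {}
    for source in (gen_scores, copy_scores):
        for word, s in source.items():
            probs[word] = probs.get(word, 0.0) + math.exp(s - m) / z
    return probs
```

A novel object such as "zebra" that never appears in the captioning vocabulary can still be emitted, because it receives probability through the copy scores alone.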