October 19, 2019

Paper Group ANR 326

From Same Photo: Cheating on Visual Kinship Challenges

Title From Same Photo: Cheating on Visual Kinship Challenges
Authors Mitchell Dawson, Andrew Zisserman, Christoffer Nellåker
Abstract With the propensity for deep learning models to learn unintended signals from data sets there is always the possibility that the network can 'cheat' in order to solve a task. In the instance of data sets for visual kinship verification, one such unintended signal could be that the faces are cropped from the same photograph, since faces from the same photograph are more likely to be from the same family. In this paper we investigate the influence of this artefactual data inference in published data sets for kinship verification. To this end, we obtain a large dataset, and train a CNN classifier to determine if two faces are from the same photograph or not. Using this classifier alone as a naive classifier of kinship, we demonstrate near state of the art results on five public benchmark data sets for kinship verification - achieving over 90% accuracy on one of them. Thus, we conclude that faces derived from the same photograph are a strong inadvertent signal in all the data sets we examined, and it is likely that the fraction of kinship explained by existing kinship models is small.
Tasks
Published 2018-09-17
URL http://arxiv.org/abs/1809.06200v2
PDF http://arxiv.org/pdf/1809.06200v2.pdf
PWC https://paperswithcode.com/paper/from-same-photo-cheating-on-visual-kinship
Repo
Framework

Approximate Nearest Neighbor Search in High Dimensions

Title Approximate Nearest Neighbor Search in High Dimensions
Authors Alexandr Andoni, Piotr Indyk, Ilya Razenshteyn
Abstract The nearest neighbor problem is defined as follows: Given a set $P$ of $n$ points in some metric space $(X,D)$, build a data structure that, given any point $q$, returns a point in $P$ that is closest to $q$ (its "nearest neighbor" in $P$). The data structure stores additional information about the set $P$, which is then used to find the nearest neighbor without computing all distances between $q$ and $P$. The problem has a wide range of applications in machine learning, computer vision, databases and other fields. To reduce the time needed to find nearest neighbors and the amount of memory used by the data structure, one can formulate the approximate nearest neighbor problem, where the goal is to return any point $p' \in P$ such that the distance from $q$ to $p'$ is at most $c \cdot \min_{p \in P} D(q,p)$, for some $c \geq 1$. Over the last two decades, many efficient solutions to this problem were developed. In this article we survey these developments, as well as their connections to questions in geometric functional analysis and combinatorial geometry.
Tasks
Published 2018-06-26
URL http://arxiv.org/abs/1806.09823v1
PDF http://arxiv.org/pdf/1806.09823v1.pdf
PWC https://paperswithcode.com/paper/approximate-nearest-neighbor-search-in-high
Repo
Framework
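
The definition in the abstract above can be made concrete with a small sketch. It shows only the exact linear-scan baseline and a check of the $c$-approximation condition, not any of the data structures the survey covers. The function names and the choice of Euclidean distance for $D$ are assumptions made for illustration.

```python
import math


def euclidean(a, b):
    """Euclidean distance, used here as one possible choice of the metric D."""
    return math.dist(a, b)


def nearest_neighbor(points, q, dist=euclidean):
    """Exact nearest neighbor by linear scan over all of P.

    This computes every distance between q and P, which is the cost
    that nearest neighbor data structures aim to avoid.
    """
    return min(points, key=lambda p: dist(q, p))


def is_c_approximate(points, q, candidate, c, dist=euclidean):
    """Check D(q, candidate) <= c * min_{p in P} D(q, p) for some c >= 1."""
    best = min(dist(q, p) for p in points)
    return dist(q, candidate) <= c * best


if __name__ == "__main__":
    P = [(0.0, 0.0), (3.0, 0.0), (10.0, 10.0)]
    q = (1.0, 0.0)
    print(nearest_neighbor(P, q))
    print(is_c_approximate(P, q, (3.0, 0.0), c=2.0))
```

With $c = 1$ the check accepts only exact nearest neighbors; larger $c$ accepts more candidates, which is the slack that approximate methods trade for lower query time and memory.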

A Logarithmic Barrier Method For Proximal Policy Optimization

Title A Logarithmic Barrier Method For Proximal Policy Optimization
Authors Cheng Zeng, Hongming Zhang
Abstract Proximal policy optimization (PPO) has been proposed as a first-order optimization method for reinforcement learning. Notably, it uses an exterior penalty method. Often, the minimizers of exterior penalty functions approach feasibility only in the limit as the penalty parameter grows increasingly large, which may result in low sampling efficiency. Our method, which we call proximal policy optimization with barrier method (PPO-B), keeps almost all the advantages of PPO, such as easy implementation and good generalization. Specifically, a new surrogate objective with an interior penalty method is proposed to avoid the defect arising from the exterior penalty method. We conclude that PPO-B is able to outperform PPO in terms of sampling efficiency, since PPO-B achieved clearly better performance than PPO on Atari and MuJoCo environments.
Tasks
Published 2018-12-16
URL http://arxiv.org/abs/1812.06502v1
PDF http://arxiv.org/pdf/1812.06502v1.pdf
PWC https://paperswithcode.com/paper/a-logarithmic-barrier-method-for-proximal
Repo
Framework
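
The contrast the abstract above draws between exterior and interior penalties can be seen on a toy one-dimensional problem. The sketch below is not the PPO-B surrogate objective; it minimizes $f(x) = x$ subject to $x \geq 1$, a problem chosen purely for illustration, and all function names and parameters (`mu`, `t`, the grid search) are assumptions.

```python
import math


def exterior_penalty(x, mu):
    """f(x) = x plus a quadratic penalty that is active only when x < 1."""
    violation = max(0.0, 1.0 - x)
    return x + mu * violation ** 2


def log_barrier(x, t):
    """f(x) = x plus a logarithmic barrier that is infinite outside x > 1."""
    if x <= 1.0:
        return math.inf
    return x - math.log(x - 1.0) / t


def argmin_on_grid(fn, lo=0.0, hi=3.0, steps=30001):
    """Crude minimizer: evaluate fn on a uniform grid and keep the best point."""
    xs = [lo + (hi - lo) * i / (steps - 1) for i in range(steps)]
    return min(xs, key=fn)


if __name__ == "__main__":
    for strength in (1.0, 10.0, 100.0):
        x_pen = argmin_on_grid(lambda x: exterior_penalty(x, strength))
        x_bar = argmin_on_grid(lambda x: log_barrier(x, strength))
        print(strength, round(x_pen, 3), round(x_bar, 3))
```

Analytically, the penalized minimizer is $1 - 1/(2\mu)$, which violates the constraint for every finite $\mu$, while the barrier minimizer is $1 + 1/t$, which is always feasible. Both approach the constrained optimum $x = 1$ as the parameter grows, but from opposite sides.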

A Deep Learning Approach for Multimodal Deception Detection

Title A Deep Learning Approach for Multimodal Deception Detection
Authors Gangeshwar Krishnamurthy, Navonil Majumder, Soujanya Poria, Erik Cambria
Abstract Automatic deception detection is an important task that has gained momentum in computational linguistics due to its potential applications. In this paper, we propose a simple yet tough to beat multi-modal neural model for deception detection. By combining features from different modalities such as video, audio, and text along with Micro-Expression features, we show that detecting deception in real life videos can be more accurate. Experimental results on a dataset of real-life deception videos show that our model outperforms existing techniques for deception detection with an accuracy of 96.14% and ROC-AUC of 0.9799.
Tasks Deception Detection
Published 2018-03-01
URL http://arxiv.org/abs/1803.00344v1
PDF http://arxiv.org/pdf/1803.00344v1.pdf
PWC https://paperswithcode.com/paper/a-deep-learning-approach-for-multimodal
Repo
Framework

Incremental Natural Language Processing: Challenges, Strategies, and Evaluation

Title Incremental Natural Language Processing: Challenges, Strategies, and Evaluation
Authors Arne Köhn
Abstract Incrementality is ubiquitous in human-human interaction and beneficial for human-computer interaction. It has been a topic of research in different parts of the NLP community, mostly with focus on the specific topic at hand even though incremental systems have to deal with similar challenges regardless of domain. In this survey, I consolidate and categorize the approaches, identifying similarities and differences in the computation and data, and show trade-offs that have to be considered. A focus lies on evaluating incremental systems because the standard metrics often fail to capture the incremental properties of a system and coming up with a suitable evaluation scheme is non-trivial.
Tasks
Published 2018-05-31
URL http://arxiv.org/abs/1805.12518v2
PDF http://arxiv.org/pdf/1805.12518v2.pdf
PWC https://paperswithcode.com/paper/incremental-natural-language-processing-1
Repo
Framework

ActionXPose: A Novel 2D Multi-view Pose-based Algorithm for Real-time Human Action Recognition

Title ActionXPose: A Novel 2D Multi-view Pose-based Algorithm for Real-time Human Action Recognition
Authors Federico Angelini, Zeyu Fu, Yang Long, Ling Shao, Syed Mohsen Naqvi
Abstract We present ActionXPose, a novel 2D pose-based algorithm for posture-level Human Action Recognition (HAR). The proposed approach exploits 2D human poses provided by the OpenPose detector from RGB videos. ActionXPose processes pose data to be provided to a Long Short-Term Memory Neural Network and to a 1D Convolutional Neural Network, which solve the classification problem. ActionXPose is one of the first algorithms that exploits 2D human poses for HAR. The algorithm has real-time performance, is robust to camera movement, subject proximity changes, viewpoint changes, and subject appearance changes, and provides a high degree of generalization. In fact, extensive simulations show that ActionXPose can be successfully trained using different datasets at once. State-of-the-art performance on popular datasets for posture-related HAR problems (i3DPost, KTH) is reported, and results are compared with those obtained by other methods, including the selected ActionXPose baseline. Moreover, we also propose two novel datasets called MPOSE and ISLD, recorded in our Intelligent Sensing Lab, to show ActionXPose generalization performance.
Tasks Temporal Action Localization
Published 2018-10-29
URL http://arxiv.org/abs/1810.12126v1
PDF http://arxiv.org/pdf/1810.12126v1.pdf
PWC https://paperswithcode.com/paper/actionxpose-a-novel-2d-multi-view-pose-based
Repo
Framework

Knowledge Representation Learning: A Quantitative Review

Title Knowledge Representation Learning: A Quantitative Review
Authors Yankai Lin, Xu Han, Ruobing Xie, Zhiyuan Liu, Maosong Sun
Abstract Knowledge representation learning (KRL) aims to represent entities and relations in knowledge graphs in a low-dimensional semantic space, and has been widely used in many knowledge-driven tasks. In this article, we introduce the reader to the motivations for KRL, and overview existing approaches for KRL. Afterwards, we conduct an extensive quantitative comparison and analysis of several typical KRL methods on three evaluation tasks of knowledge acquisition including knowledge graph completion, triple classification, and relation extraction. We also review the real-world applications of KRL, such as language modeling, question answering, information retrieval, and recommender systems. Finally, we discuss the remaining challenges and outline future directions for KRL. The codes and datasets used in the experiments can be found at https://github.com/thunlp/OpenKE.
Tasks Information Retrieval, Knowledge Graph Completion, Language Modelling, Question Answering, Recommendation Systems, Relation Extraction, Representation Learning
Published 2018-12-28
URL http://arxiv.org/abs/1812.10901v1
PDF http://arxiv.org/pdf/1812.10901v1.pdf
PWC https://paperswithcode.com/paper/knowledge-representation-learning-a
Repo
Framework

Towards an Embodied Semantic Fovea: Semantic 3D scene reconstruction from ego-centric eye-tracker videos

Title Towards an Embodied Semantic Fovea: Semantic 3D scene reconstruction from ego-centric eye-tracker videos
Authors Mickey Li, Noyan Songur, Pavel Orlov, Stefan Leutenegger, A Aldo Faisal
Abstract Incorporating the physical environment is essential for a complete understanding of human behavior in unconstrained every-day tasks. This is especially important in ego-centric tasks where obtaining 3 dimensional information is both limiting and challenging with the current 2D video analysis methods proving insufficient. Here we demonstrate a proof-of-concept system which provides real-time 3D mapping and semantic labeling of the local environment from an ego-centric RGB-D video-stream with 3D gaze point estimation from head mounted eye tracking glasses. We augment existing work in Semantic Simultaneous Localization And Mapping (Semantic SLAM) with collected gaze vectors. Our system can then find and track objects both inside and outside the user field-of-view in 3D from multiple perspectives with reasonable accuracy. We validate our concept by producing a semantic map from images of the NYUv2 dataset while simultaneously estimating gaze position and gaze classes from recorded gaze data of the dataset images.
Tasks 3D Scene Reconstruction, Eye Tracking, Simultaneous Localization and Mapping
Published 2018-07-27
URL http://arxiv.org/abs/1807.10561v1
PDF http://arxiv.org/pdf/1807.10561v1.pdf
PWC https://paperswithcode.com/paper/towards-an-embodied-semantic-fovea-semantic
Repo
Framework

Real-time Action Recognition with Dissimilarity-based Training of Specialized Module Networks

Title Real-time Action Recognition with Dissimilarity-based Training of Specialized Module Networks
Authors Marian K. Y. Boktor, Ahmad Al-Kabbany, Radwa Khalil, Said El-Khamy
Abstract This paper addresses the problem of real-time action recognition in trimmed videos, for which deep neural networks have defined the state-of-the-art performance in the recent literature. For attaining higher recognition accuracies with efficient computations, researchers have addressed the various aspects of limitations in the recognition pipeline. This includes network architecture, the number of input streams (where additional streams augment the color information), the cost function to be optimized, in addition to others. The literature has always aimed, though, at assigning the adopted network (or networks, in case of multiple streams) the task of recognizing the whole number of action classes in the dataset at hand. We propose to train multiple specialized module networks instead. Each module is trained to recognize a subset of the action classes. Towards this goal, we present a dissimilarity-based optimized procedure for distributing the action classes over the modules, which can be trained simultaneously offline. On two standard datasets–UCF-101 and HMDB-51–the proposed method demonstrates a comparable performance, that is superior in some aspects, to the state-of-the-art, and that satisfies the real-time constraint. We achieved 72.5% accuracy on the challenging HMDB-51 dataset. By assigning fewer and unalike classes to each module network, this research paves the way to benefit from light-weight architectures without compromising recognition accuracy.
Tasks Temporal Action Localization
Published 2018-10-27
URL http://arxiv.org/abs/1810.11731v1
PDF http://arxiv.org/pdf/1810.11731v1.pdf
PWC https://paperswithcode.com/paper/real-time-action-recognition-with
Repo
Framework

$A^2$-Nets: Double Attention Networks

Title $A^2$-Nets: Double Attention Networks
Authors Yunpeng Chen, Yannis Kalantidis, Jianshu Li, Shuicheng Yan, Jiashi Feng
Abstract Learning to capture long-range relations is fundamental to image/video recognition. Existing CNN models generally rely on increasing depth to model such relations, which is highly inefficient. In this work, we propose the "double attention block", a novel component that aggregates and propagates informative global features from the entire spatio-temporal space of input images/videos, enabling subsequent convolution layers to access features from the entire space efficiently. The component is designed with a double attention mechanism in two steps, where the first step gathers features from the entire space into a compact set through second-order attention pooling and the second step adaptively selects and distributes features to each location via another attention. The proposed double attention block is easy to adopt and can be plugged into existing deep neural networks conveniently. We conduct extensive ablation studies and experiments on both image and video recognition tasks for evaluating its performance. On the image recognition task, a ResNet-50 equipped with our double attention blocks outperforms a much larger ResNet-152 architecture on the ImageNet-1k dataset with over 40% fewer parameters and fewer FLOPs. On the action recognition task, our proposed model achieves state-of-the-art results on the Kinetics and UCF-101 datasets with significantly higher efficiency than recent works.
Tasks Temporal Action Localization, Video Recognition
Published 2018-10-27
URL http://arxiv.org/abs/1810.11579v1
PDF http://arxiv.org/pdf/1810.11579v1.pdf
PWC https://paperswithcode.com/paper/a2-nets-double-attention-networks
Repo
Framework
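
The two-step gather-and-distribute mechanism described in the abstract above can be sketched in plain Python on a flattened feature map. This is a minimal reading of the abstract, not the authors' implementation: the three linear projections (`w_phi`, `w_theta`, `w_rho`), the axes over which softmax is applied, and the omission of any output projection or residual connection are all assumptions.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(r) for r in zip(*m)]


def softmax(v):
    """Numerically stable softmax of a single vector."""
    mx = max(v)
    e = [math.exp(x - mx) for x in v]
    s = sum(e)
    return [x / s for x in e]


def double_attention(x, w_phi, w_theta, w_rho):
    """x is c x L (channels by flattened locations); returns an m x L matrix.

    Step 1 (gather): pool features over all locations into n global
    descriptors, weighting locations by attention maps.
    Step 2 (distribute): each location mixes the n descriptors using
    its own attention vector.
    """
    a = matmul(w_phi, x)                          # m x L feature maps
    b = [softmax(r) for r in matmul(w_theta, x)]  # n x L, each row sums to 1 over locations
    # n x L, each column sums to 1 over the n descriptors
    v = transpose([softmax(col) for col in transpose(matmul(w_rho, x))])
    g = matmul(a, transpose(b))                   # m x n global descriptors
    return matmul(g, v)                           # m x L redistributed features


if __name__ == "__main__":
    x = [[0.5, -1.0, 2.0, 0.0], [1.0, 0.3, -0.7, 1.5]]  # c = 2, L = 4
    w_phi = [[1.0, 0.5], [0.0, -1.0], [2.0, 1.0]]       # m = 3
    w_theta = [[0.3, -0.2], [0.1, 0.4]]                 # n = 2
    w_rho = [[-0.5, 0.2], [0.7, 0.1]]
    z = double_attention(x, w_phi, w_theta, w_rho)
    print(len(z), len(z[0]))
```

Because every location reads from the same `m x n` set of descriptors, a single block gives each position access to information from the whole input, which is the property the abstract contrasts with stacking more convolution layers.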

Cross-Modal and Hierarchical Modeling of Video and Text

Title Cross-Modal and Hierarchical Modeling of Video and Text
Authors Bowen Zhang, Hexiang Hu, Fei Sha
Abstract Visual data and text data are composed of information at multiple granularities. A video can describe a complex scene that is composed of multiple clips or shots, where each depicts a semantically coherent event or action. Similarly, a paragraph may contain sentences with different topics, which collectively convey a coherent message or story. In this paper, we investigate the modeling techniques for such hierarchical sequential data where there are correspondences across multiple modalities. Specifically, we introduce hierarchical sequence embedding (HSE), a generic model for embedding sequential data of different modalities into hierarchically semantic spaces, with either explicit or implicit correspondence information. We perform empirical studies on large-scale video and paragraph retrieval datasets and demonstrate superior performance by the proposed methods. Furthermore, we examine the effectiveness of our learned embeddings when applied to downstream tasks. We show its utility in zero-shot action recognition and video captioning.
Tasks Temporal Action Localization, Video Captioning
Published 2018-10-16
URL http://arxiv.org/abs/1810.07212v1
PDF http://arxiv.org/pdf/1810.07212v1.pdf
PWC https://paperswithcode.com/paper/cross-modal-and-hierarchical-modeling-of
Repo
Framework

Lie Transform–based Neural Networks for Dynamics Simulation and Learning

Title Lie Transform–based Neural Networks for Dynamics Simulation and Learning
Authors Andrei Ivanov, Alena Sholokhova, Sergei Andrianov, Roman Konoplev-Esgenburg
Abstract In the article, we discuss the architecture of the polynomial neural network that corresponds to the matrix representation of Lie transform. The matrix form of Lie transform is an approximation of the general solution of the nonlinear system of ordinary differential equations. The proposed architecture can be trained with small data sets, extrapolate predictions outside the training data, and provide a possibility for interpretation. We provide a theoretical explanation of the proposed architecture, as well as demonstrate it in several applications. We present the results of modeling and identification for both simple and well-known dynamical systems, and more complicated examples from price dynamics, chemistry, and accelerator physics. From a practical point of view, we describe the training of a Lie transform–based neural network with a small data set containing only 10 data points. We also demonstrate an interpretation of the fitted neural network by converting it to a system of differential equations.
Tasks Time Series
Published 2018-02-05
URL https://arxiv.org/abs/1802.01353v2
PDF https://arxiv.org/pdf/1802.01353v2.pdf
PWC https://paperswithcode.com/paper/lie-transform-based-polynomial-neural
Repo
Framework

Como funciona o Deep Learning

Title Como funciona o Deep Learning
Authors Moacir Antonelli Ponti, Gabriel B. Paranhos da Costa
Abstract Deep Learning methods are currently the state-of-the-art in many problems which can be tackled via machine learning, in particular classification problems. However, there is still a lack of understanding of how those methods work, why they work, and what limitations are involved in using them. In this chapter we describe in detail the transition from shallow to deep networks, include examples of code on how to implement them, and discuss the main issues one faces when training a deep network. Afterwards, we introduce some theoretical background behind the use of deep models, and discuss their limitations.
Tasks
Published 2018-06-20
URL http://arxiv.org/abs/1806.07908v1
PDF http://arxiv.org/pdf/1806.07908v1.pdf
PWC https://paperswithcode.com/paper/como-funciona-o-deep-learning
Repo
Framework

Knowledge-based end-to-end memory networks

Title Knowledge-based end-to-end memory networks
Authors Jatin Ganhotra, Lazaros Polymenakos
Abstract End-to-end dialog systems have become very popular because they hold the promise of learning directly from human-to-human dialog interaction. Retrieval and generative methods have been explored in this area with mixed results. A key element that is missing so far is the incorporation of a-priori knowledge about the task at hand. This knowledge may exist in the form of structured or unstructured information. As a first step in this direction, we present a novel approach, Knowledge based end-to-end memory networks (KB-memN2N), which allows special handling of named entities for goal-oriented dialog tasks. We present results on two datasets, the DSTC6 challenge dataset and the dialog bAbI tasks.
Tasks Goal-Oriented Dialog
Published 2018-04-23
URL http://arxiv.org/abs/1804.08204v1
PDF http://arxiv.org/pdf/1804.08204v1.pdf
PWC https://paperswithcode.com/paper/knowledge-based-end-to-end-memory-networks
Repo
Framework

An Overview on Application of Machine Learning Techniques in Optical Networks

Title An Overview on Application of Machine Learning Techniques in Optical Networks
Authors Francesco Musumeci, Cristina Rottondi, Avishek Nag, Irene Macaluso, Darko Zibar, Marco Ruffini, Massimo Tornatore
Abstract Today’s telecommunication networks have become sources of enormous amounts of widely heterogeneous data. This information can be retrieved from network traffic traces, network alarms, signal quality indicators, users’ behavioral data, etc. Advanced mathematical tools are required to extract meaningful information from these data and take decisions pertaining to the proper functioning of the networks from the network-generated data. Among these mathematical tools, Machine Learning (ML) is regarded as one of the most promising methodological approaches to perform network-data analysis and enable automated network self-configuration and fault management. The adoption of ML techniques in the field of optical communication networks is motivated by the unprecedented growth of network complexity faced by optical networks in the last few years. Such complexity increase is due to the introduction of a huge number of adjustable and interdependent system parameters (e.g., routing configurations, modulation format, symbol rate, coding schemes, etc.) that are enabled by the usage of coherent transmission/reception technologies, advanced digital signal processing and compensation of nonlinear effects in optical fiber propagation. In this paper we provide an overview of the application of ML to optical communications and networking. We classify and survey relevant literature dealing with the topic, and we also provide an introductory tutorial on ML for researchers and practitioners interested in this field. Although a good number of research papers have recently appeared, the application of ML to optical networks is still in its infancy: to stimulate further work in this area, we conclude the paper proposing new possible research directions.
Tasks
Published 2018-03-21
URL http://arxiv.org/abs/1803.07976v4
PDF http://arxiv.org/pdf/1803.07976v4.pdf
PWC https://paperswithcode.com/paper/an-overview-on-application-of-machine
Repo
Framework