October 19, 2019

Paper Group ANR 326

From Same Photo: Cheating on Visual Kinship Challenges. Approximate Nearest Neighbor Search in High Dimensions. A Logarithmic Barrier Method For Proximal Policy Optimization. A Deep Learning Approach for Multimodal Deception Detection. Incremental Natural Language Processing: Challenges, Strategies, and Evaluation. ActionXPose: A Novel 2D Multi-vie …

From Same Photo: Cheating on Visual Kinship Challenges


Title	From Same Photo: Cheating on Visual Kinship Challenges
Authors	Mitchell Dawson, Andrew Zisserman, Christoffer Nellåker
Abstract	With the propensity for deep learning models to learn unintended signals from data sets there is always the possibility that the network can 'cheat' in order to solve a task. In the instance of data sets for visual kinship verification, one such unintended signal could be that the faces are cropped from the same photograph, since faces from the same photograph are more likely to be from the same family. In this paper we investigate the influence of this artefactual data inference in published data sets for kinship verification. To this end, we obtain a large dataset, and train a CNN classifier to determine if two faces are from the same photograph or not. Using this classifier alone as a naive classifier of kinship, we demonstrate near state of the art results on five public benchmark data sets for kinship verification - achieving over 90% accuracy on one of them. Thus, we conclude that faces derived from the same photograph are a strong inadvertent signal in all the data sets we examined, and it is likely that the fraction of kinship explained by existing kinship models is small.
Tasks
Published	2018-09-17
URL	http://arxiv.org/abs/1809.06200v2
PDF	http://arxiv.org/pdf/1809.06200v2.pdf
PWC	https://paperswithcode.com/paper/from-same-photo-cheating-on-visual-kinship
Repo
Framework

Approximate Nearest Neighbor Search in High Dimensions


Title	Approximate Nearest Neighbor Search in High Dimensions
Authors	Alexandr Andoni, Piotr Indyk, Ilya Razenshteyn
Abstract	The nearest neighbor problem is defined as follows: Given a set $P$ of $n$ points in some metric space $(X,D)$, build a data structure that, given any point $q$, returns a point in $P$ that is closest to $q$ (its "nearest neighbor" in $P$). The data structure stores additional information about the set $P$, which is then used to find the nearest neighbor without computing all distances between $q$ and $P$. The problem has a wide range of applications in machine learning, computer vision, databases and other fields. To reduce the time needed to find nearest neighbors and the amount of memory used by the data structure, one can formulate the approximate nearest neighbor problem, where the goal is to return any point $p' \in P$ such that the distance from $q$ to $p'$ is at most $c \cdot \min_{p \in P} D(q,p)$, for some $c \geq 1$. Over the last two decades, many efficient solutions to this problem were developed. In this article we survey these developments, as well as their connections to questions in geometric functional analysis and combinatorial geometry.
Tasks
Published	2018-06-26
URL	http://arxiv.org/abs/1806.09823v1
PDF	http://arxiv.org/pdf/1806.09823v1.pdf
PWC	https://paperswithcode.com/paper/approximate-nearest-neighbor-search-in-high
Repo
Framework
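
The definition in the abstract above can be made concrete with a small sketch. The helper names (`nearest_neighbor`, `is_c_approximate`) are illustrative and do not come from the survey, and Euclidean distance is assumed as a stand-in for the generic metric $D$. The linear scan is the baseline that the surveyed data structures aim to beat, not one of them.

```python
import math


def dist(a, b):
    """Euclidean distance, assumed here as a stand-in for the metric D."""
    return math.dist(a, b)


def nearest_neighbor(points, q):
    """Exact nearest neighbor by linear scan (computes all n distances)."""
    return min(points, key=lambda p: dist(q, p))


def is_c_approximate(points, q, candidate, c):
    """Check D(q, candidate) <= c * min_p D(q, p) for some c >= 1."""
    best = min(dist(q, p) for p in points)
    return dist(q, candidate) <= c * best


# Hypothetical toy data, only to exercise the definition.
P = [(0.0, 0.0), (3.0, 0.0), (10.0, 10.0)]
q = (1.0, 0.0)
print(nearest_neighbor(P, q))
print(is_c_approximate(P, q, (3.0, 0.0), c=2.0))
```

With `c = 1` the check reduces to the exact problem; larger values of `c` accept more candidate points, which is what gives approximate data structures room to trade accuracy for query time and memory.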

A Logarithmic Barrier Method For Proximal Policy Optimization


Title	A Logarithmic Barrier Method For Proximal Policy Optimization
Authors	Cheng Zeng, Hongming Zhang
Abstract	Proximal policy optimization (PPO) has been proposed as a first-order optimization method for reinforcement learning. Note that PPO uses an exterior penalty method. The minimizers of exterior penalty functions often approach feasibility only in the limit as the penalty parameter grows increasingly large, which may result in low sampling efficiency. We propose a method, which we call proximal policy optimization with barrier method (PPO-B), that keeps almost all the advantages of PPO, such as easy implementation and good generalization. Specifically, a new surrogate objective with an interior penalty method is proposed to avoid the defect arising from the exterior penalty method. We conclude that PPO-B outperforms PPO in terms of sampling efficiency, since PPO-B achieved clearly better performance than PPO on Atari and MuJoCo environments.
Tasks
Published	2018-12-16
URL	http://arxiv.org/abs/1812.06502v1
PDF	http://arxiv.org/pdf/1812.06502v1.pdf
PWC	https://paperswithcode.com/paper/a-logarithmic-barrier-method-for-proximal
Repo
Framework
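
The contrast between exterior and interior penalties can be seen on a one-dimensional toy problem. This is only a sketch of the general optimization idea, not the surrogate objective from the paper: the problem (minimize $x$ subject to $x \geq 1$), the quadratic penalty, and all function names are assumptions chosen for illustration.

```python
import math


def exterior_objective(x, mu):
    """x plus a quadratic exterior penalty for violating x >= 1."""
    return x + mu * max(0.0, 1.0 - x) ** 2


def exterior_minimizer(mu):
    """Closed-form minimizer of exterior_objective (requires mu > 0)."""
    return 1.0 - 1.0 / (2.0 * mu)


def barrier_objective(x, t):
    """x plus a logarithmic barrier that is finite only for x > 1."""
    if x <= 1.0:
        return math.inf
    return x - t * math.log(x - 1.0)


def barrier_minimizer(t):
    """Closed-form minimizer of barrier_objective (requires t > 0)."""
    return 1.0 + t


# The exterior minimizer is infeasible (x < 1) for every finite mu,
# while the barrier minimizer is strictly feasible (x > 1) for every t > 0.
for mu, t in [(1.0, 1.0), (10.0, 0.1), (100.0, 0.01)]:
    print(mu, exterior_minimizer(mu), t, barrier_minimizer(t))
```

Both sequences of minimizers approach the constrained optimum $x = 1$, but only the barrier iterates satisfy the constraint along the way, which is the property the abstract attributes to the interior penalty method.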

A Deep Learning Approach for Multimodal Deception Detection


Title	A Deep Learning Approach for Multimodal Deception Detection
Authors	Gangeshwar Krishnamurthy, Navonil Majumder, Soujanya Poria, Erik Cambria
Abstract	Automatic deception detection is an important task that has gained momentum in computational linguistics due to its potential applications. In this paper, we propose a simple yet tough to beat multi-modal neural model for deception detection. By combining features from different modalities such as video, audio, and text along with Micro-Expression features, we show that detecting deception in real life videos can be more accurate. Experimental results on a dataset of real-life deception videos show that our model outperforms existing techniques for deception detection with an accuracy of 96.14% and ROC-AUC of 0.9799.
Tasks	Deception Detection
Published	2018-03-01
URL	http://arxiv.org/abs/1803.00344v1
PDF	http://arxiv.org/pdf/1803.00344v1.pdf
PWC	https://paperswithcode.com/paper/a-deep-learning-approach-for-multimodal
Repo
Framework

Incremental Natural Language Processing: Challenges, Strategies, and Evaluation


Title	Incremental Natural Language Processing: Challenges, Strategies, and Evaluation
Authors	Arne Köhn
Abstract	Incrementality is ubiquitous in human-human interaction and beneficial for human-computer interaction. It has been a topic of research in different parts of the NLP community, mostly with focus on the specific topic at hand even though incremental systems have to deal with similar challenges regardless of domain. In this survey, I consolidate and categorize the approaches, identifying similarities and differences in the computation and data, and show trade-offs that have to be considered. A focus lies on evaluating incremental systems because the standard metrics often fail to capture the incremental properties of a system and coming up with a suitable evaluation scheme is non-trivial.
Tasks
Published	2018-05-31
URL	http://arxiv.org/abs/1805.12518v2
PDF	http://arxiv.org/pdf/1805.12518v2.pdf
PWC	https://paperswithcode.com/paper/incremental-natural-language-processing-1
Repo
Framework

ActionXPose: A Novel 2D Multi-view Pose-based Algorithm for Real-time Human Action Recognition


Title	ActionXPose: A Novel 2D Multi-view Pose-based Algorithm for Real-time Human Action Recognition
Authors	Federico Angelini, Zeyu Fu, Yang Long, Ling Shao, Syed Mohsen Naqvi
Abstract	We present ActionXPose, a novel 2D pose-based algorithm for posture-level Human Action Recognition (HAR). The proposed approach exploits 2D human poses provided by the OpenPose detector from RGB videos. ActionXPose processes pose data and feeds it to a Long Short-Term Memory neural network and to a 1D Convolutional Neural Network, which solve the classification problem. ActionXPose is one of the first algorithms that exploits 2D human poses for HAR. The algorithm runs in real time, is robust to camera movement, subject proximity changes, viewpoint changes, and subject appearance changes, and provides a high degree of generalization. In fact, extensive simulations show that ActionXPose can be successfully trained using different datasets at once. State-of-the-art performance on popular datasets for posture-related HAR problems (i3DPost, KTH) is reported, and results are compared with those obtained by other methods, including the selected ActionXPose baseline. Moreover, we also propose two novel datasets, MPOSE and ISLD, recorded in our Intelligent Sensing Lab, to show ActionXPose generalization performance.
Tasks	Temporal Action Localization
Published	2018-10-29
URL	http://arxiv.org/abs/1810.12126v1
PDF	http://arxiv.org/pdf/1810.12126v1.pdf
PWC	https://paperswithcode.com/paper/actionxpose-a-novel-2d-multi-view-pose-based
Repo
Framework

Knowledge Representation Learning: A Quantitative Review


Title	Knowledge Representation Learning: A Quantitative Review
Authors	Yankai Lin, Xu Han, Ruobing Xie, Zhiyuan Liu, Maosong Sun
Abstract	Knowledge representation learning (KRL) aims to represent entities and relations in a knowledge graph in a low-dimensional semantic space, and has been widely used in many knowledge-driven tasks. In this article, we introduce the reader to the motivations for KRL and give an overview of existing approaches. Afterwards, we conduct an extensive quantitative comparison and analysis of several typical KRL methods on three evaluation tasks of knowledge acquisition: knowledge graph completion, triple classification, and relation extraction. We also review the real-world applications of KRL, such as language modeling, question answering, information retrieval, and recommender systems. Finally, we discuss the remaining challenges and outline future directions for KRL. The code and datasets used in the experiments can be found at https://github.com/thunlp/OpenKE.
Tasks	Information Retrieval, Knowledge Graph Completion, Language Modelling, Question Answering, Recommendation Systems, Relation Extraction, Representation Learning
Published	2018-12-28
URL	http://arxiv.org/abs/1812.10901v1
PDF	http://arxiv.org/pdf/1812.10901v1.pdf
PWC	https://paperswithcode.com/paper/knowledge-representation-learning-a
Repo
Framework

Towards an Embodied Semantic Fovea: Semantic 3D scene reconstruction from ego-centric eye-tracker videos


Title	Towards an Embodied Semantic Fovea: Semantic 3D scene reconstruction from ego-centric eye-tracker videos
Authors	Mickey Li, Noyan Songur, Pavel Orlov, Stefan Leutenegger, A Aldo Faisal
Abstract	Incorporating the physical environment is essential for a complete understanding of human behavior in unconstrained every-day tasks. This is especially important in ego-centric tasks, where obtaining 3-dimensional information is both limiting and challenging, and current 2D video analysis methods prove insufficient. Here we demonstrate a proof-of-concept system which provides real-time 3D mapping and semantic labeling of the local environment from an ego-centric RGB-D video-stream with 3D gaze point estimation from head mounted eye tracking glasses. We augment existing work in Semantic Simultaneous Localization And Mapping (Semantic SLAM) with collected gaze vectors. Our system can then find and track objects both inside and outside the user field-of-view in 3D from multiple perspectives with reasonable accuracy. We validate our concept by producing a semantic map from images of the NYUv2 dataset while simultaneously estimating gaze position and gaze classes from recorded gaze data of the dataset images.
Tasks	3D Scene Reconstruction, Eye Tracking, Simultaneous Localization and Mapping
Published	2018-07-27
URL	http://arxiv.org/abs/1807.10561v1
PDF	http://arxiv.org/pdf/1807.10561v1.pdf
PWC	https://paperswithcode.com/paper/towards-an-embodied-semantic-fovea-semantic
Repo
Framework

Real-time Action Recognition with Dissimilarity-based Training of Specialized Module Networks


Title	Real-time Action Recognition with Dissimilarity-based Training of Specialized Module Networks
Authors	Marian K. Y. Boktor, Ahmad Al-Kabbany, Radwa Khalil, Said El-Khamy
Abstract	This paper addresses the problem of real-time action recognition in trimmed videos, for which deep neural networks have defined the state-of-the-art performance in the recent literature. To attain higher recognition accuracy with efficient computation, researchers have addressed various limitations in the recognition pipeline. These include the network architecture, the number of input streams (where additional streams augment the color information), and the cost function to be optimized, among others. The literature has always aimed, though, at assigning the adopted network (or networks, in case of multiple streams) the task of recognizing all action classes in the dataset at hand. We propose to train multiple specialized module networks instead. Each module is trained to recognize a subset of the action classes. Towards this goal, we present a dissimilarity-based optimized procedure for distributing the action classes over the modules, which can be trained simultaneously offline. On two standard datasets–UCF-101 and HMDB-51–the proposed method demonstrates performance that is comparable, and in some aspects superior, to the state of the art, while satisfying the real-time constraint. We achieved 72.5% accuracy on the challenging HMDB-51 dataset. By assigning fewer and unalike classes to each module network, this research paves the way to benefit from light-weight architectures without compromising recognition accuracy.
Tasks	Temporal Action Localization
Published	2018-10-27
URL	http://arxiv.org/abs/1810.11731v1
PDF	http://arxiv.org/pdf/1810.11731v1.pdf
PWC	https://paperswithcode.com/paper/real-time-action-recognition-with
Repo
Framework

$A^2$-Nets: Double Attention Networks


Title	$A^2$-Nets: Double Attention Networks
Authors	Yunpeng Chen, Yannis Kalantidis, Jianshu Li, Shuicheng Yan, Jiashi Feng
Abstract	Learning to capture long-range relations is fundamental to image/video recognition. Existing CNN models generally rely on increasing depth to model such relations, which is highly inefficient. In this work, we propose the "double attention block", a novel component that aggregates and propagates informative global features from the entire spatio-temporal space of input images/videos, enabling subsequent convolution layers to access features from the entire space efficiently. The component is designed with a double attention mechanism in two steps, where the first step gathers features from the entire space into a compact set through second-order attention pooling and the second step adaptively selects and distributes features to each location via another attention. The proposed double attention block is easy to adopt and can be plugged into existing deep neural networks conveniently. We conduct extensive ablation studies and experiments on both image and video recognition tasks for evaluating its performance. On the image recognition task, a ResNet-50 equipped with our double attention blocks outperforms a much larger ResNet-152 architecture on the ImageNet-1k dataset with over 40% fewer parameters and fewer FLOPs. On the action recognition task, our proposed model achieves state-of-the-art results on the Kinetics and UCF-101 datasets with significantly higher efficiency than recent works.
Tasks	Temporal Action Localization, Video Recognition
Published	2018-10-27
URL	http://arxiv.org/abs/1810.11579v1
PDF	http://arxiv.org/pdf/1810.11579v1.pdf
PWC	https://paperswithcode.com/paper/a2-nets-double-attention-networks
Repo
Framework
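
The two-step gather-and-distribute mechanism described in the abstract above can be sketched in plain Python. This is a loose illustration under stated assumptions, not the authors' implementation: the projection matrices `w_feat`, `w_gather`, and `w_dist` are hypothetical stand-ins for learned layers, the input is flattened to a channels-by-locations matrix, and any residual connection or output projection is omitted.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def transpose(a):
    """Swap rows and columns."""
    return [list(col) for col in zip(*a)]


def softmax(v):
    """Numerically stable softmax over a list of floats."""
    m = max(v)
    e = [math.exp(x - m) for x in v]
    s = sum(e)
    return [x / s for x in e]


def double_attention(x, w_feat, w_gather, w_dist):
    """x is c x n (channels x locations); returns an m x n matrix."""
    feats = matmul(w_feat, x)                                # m x n
    # Step 1 (gather): each attention map is normalized over locations
    # and pools the features into one global descriptor.
    gather = [softmax(row) for row in matmul(w_gather, x)]   # k x n
    descriptors = matmul(feats, transpose(gather))           # m x k
    # Step 2 (distribute): each location gets weights over the k
    # descriptors (normalized per location) and mixes them.
    logits_t = transpose(matmul(w_dist, x))                  # n x k
    dist = transpose([softmax(row) for row in logits_t])     # k x n
    return matmul(descriptors, dist)                         # m x n
```

As a sanity check, with a single descriptor and all-zero attention weights the block reduces to global average pooling broadcast to every location, so each output location can see information from the whole input.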

Cross-Modal and Hierarchical Modeling of Video and Text


Title	Cross-Modal and Hierarchical Modeling of Video and Text
Authors	Bowen Zhang, Hexiang Hu, Fei Sha
Abstract	Visual data and text data are composed of information at multiple granularities. A video can describe a complex scene that is composed of multiple clips or shots, where each depicts a semantically coherent event or action. Similarly, a paragraph may contain sentences with different topics, which collectively convey a coherent message or story. In this paper, we investigate the modeling techniques for such hierarchical sequential data where there are correspondences across multiple modalities. Specifically, we introduce hierarchical sequence embedding (HSE), a generic model for embedding sequential data of different modalities into hierarchically semantic spaces, with either explicit or implicit correspondence information. We perform empirical studies on large-scale video and paragraph retrieval datasets and demonstrate superior performance by the proposed methods. Furthermore, we examine the effectiveness of our learned embeddings when applied to downstream tasks. We show its utility in zero-shot action recognition and video captioning.
Tasks	Temporal Action Localization, Video Captioning
Published	2018-10-16
URL	http://arxiv.org/abs/1810.07212v1
PDF	http://arxiv.org/pdf/1810.07212v1.pdf
PWC	https://paperswithcode.com/paper/cross-modal-and-hierarchical-modeling-of
Repo
Framework

Lie Transform–based Neural Networks for Dynamics Simulation and Learning


Title	Lie Transform–based Neural Networks for Dynamics Simulation and Learning
Authors	Andrei Ivanov, Alena Sholokhova, Sergei Andrianov, Roman Konoplev-Esgenburg
Abstract	In the article, we discuss the architecture of the polynomial neural network that corresponds to the matrix representation of Lie transform. The matrix form of Lie transform is an approximation of the general solution of the nonlinear system of ordinary differential equations. The proposed architecture can be trained with small data sets, extrapolate predictions outside the training data, and provide a possibility for interpretation. We provide a theoretical explanation of the proposed architecture, as well as demonstrate it in several applications. We present the results of modeling and identification for both simple and well-known dynamical systems, and more complicated examples from price dynamics, chemistry, and accelerator physics. From a practical point of view, we describe the training of a Lie transform–based neural network with a small data set containing only 10 data points. We also demonstrate an interpretation of the fitted neural network by converting it to a system of differential equations.
Tasks	Time Series
Published	2018-02-05
URL	https://arxiv.org/abs/1802.01353v2
PDF	https://arxiv.org/pdf/1802.01353v2.pdf
PWC	https://paperswithcode.com/paper/lie-transform-based-polynomial-neural
Repo
Framework
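
A polynomial layer of the kind the abstract above describes can be sketched as a truncated polynomial map applied repeatedly to a state vector. The form used here, $x_{t+1} = W_0 + W_1 x_t + W_2 (x_t \otimes x_t)$, truncated at second order with a full (unreduced) Kronecker square, is an assumption made for illustration, as are the function names and the toy weights.

```python
def kron_square(x):
    """All pairwise products x_i * x_j, in row-major order."""
    return [a * b for a in x for b in x]


def poly_map(x, w0, w1, w2):
    """One step of the second-order map: w0 + w1 @ x + w2 @ (x kron x)."""
    x2 = kron_square(x)
    return [
        w0[i]
        + sum(w * v for w, v in zip(w1[i], x))
        + sum(w * v for w, v in zip(w2[i], x2))
        for i in range(len(w0))
    ]


def rollout(x0, w0, w1, w2, steps):
    """Apply poly_map repeatedly; returns the list of visited states."""
    states = [list(x0)]
    for _ in range(steps):
        states.append(poly_map(states[-1], w0, w1, w2))
    return states


# Hypothetical weights: identity linear part plus one quadratic term.
w0 = [0.0, 0.0]
w1 = [[1.0, 0.0], [0.0, 1.0]]
w2 = [[0.0, 0.0, 0.0, 0.0], [0.5, 0.0, 0.0, 0.0]]
print(rollout([1.0, 2.0], w0, w1, w2, steps=2))
```

Because the weights are polynomial coefficients, a fitted map of this form can be read back as coefficients of a dynamical system, which is the kind of interpretability the abstract refers to.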

Como funciona o Deep Learning


Title	Como funciona o Deep Learning
Authors	Moacir Antonelli Ponti, Gabriel B. Paranhos da Costa
Abstract	Deep Learning methods are currently the state-of-the-art in many problems which can be tackled via machine learning, in particular classification problems. However, there is still a lack of understanding of how those methods work, why they work, and what limitations are involved in using them. In this chapter we describe in detail the transition from shallow to deep networks, include examples of code on how to implement them, and cover the main issues one faces when training a deep network. Afterwards, we introduce some theoretical background behind the use of deep models, and discuss their limitations.
Tasks
Published	2018-06-20
URL	http://arxiv.org/abs/1806.07908v1
PDF	http://arxiv.org/pdf/1806.07908v1.pdf
PWC	https://paperswithcode.com/paper/como-funciona-o-deep-learning
Repo
Framework

Knowledge-based end-to-end memory networks


Title	Knowledge-based end-to-end memory networks
Authors	Jatin Ganhotra, Lazaros Polymenakos
Abstract	End-to-end dialog systems have become very popular because they hold the promise of learning directly from human-to-human dialog interaction. Retrieval and generative methods have been explored in this area with mixed results. A key element that is missing so far is the incorporation of a priori knowledge about the task at hand. This knowledge may exist in the form of structured or unstructured information. As a first step in this direction, we present a novel approach, Knowledge based end-to-end memory networks (KB-memN2N), which allows special handling of named entities for goal-oriented dialog tasks. We present results on two datasets, the DSTC6 challenge dataset and the dialog bAbI tasks.
Tasks	Goal-Oriented Dialog
Published	2018-04-23
URL	http://arxiv.org/abs/1804.08204v1
PDF	http://arxiv.org/pdf/1804.08204v1.pdf
PWC	https://paperswithcode.com/paper/knowledge-based-end-to-end-memory-networks
Repo
Framework

An Overview on Application of Machine Learning Techniques in Optical Networks


Title	An Overview on Application of Machine Learning Techniques in Optical Networks
Authors	Francesco Musumeci, Cristina Rottondi, Avishek Nag, Irene Macaluso, Darko Zibar, Marco Ruffini, Massimo Tornatore
Abstract	Today’s telecommunication networks have become sources of enormous amounts of widely heterogeneous data. This information can be retrieved from network traffic traces, network alarms, signal quality indicators, users’ behavioral data, etc. Advanced mathematical tools are required to extract meaningful information from these data and take decisions pertaining to the proper functioning of the networks from the network-generated data. Among these mathematical tools, Machine Learning (ML) is regarded as one of the most promising methodological approaches to perform network-data analysis and enable automated network self-configuration and fault management. The adoption of ML techniques in the field of optical communication networks is motivated by the unprecedented growth of network complexity faced by optical networks in the last few years. Such complexity increase is due to the introduction of a huge number of adjustable and interdependent system parameters (e.g., routing configurations, modulation format, symbol rate, coding schemes, etc.) that are enabled by the usage of coherent transmission/reception technologies, advanced digital signal processing and compensation of nonlinear effects in optical fiber propagation. In this paper we provide an overview of the application of ML to optical communications and networking. We classify and survey relevant literature dealing with the topic, and we also provide an introductory tutorial on ML for researchers and practitioners interested in this field. Although a good number of research papers have recently appeared, the application of ML to optical networks is still in its infancy: to stimulate further work in this area, we conclude the paper proposing new possible research directions.
Tasks
Published	2018-03-21
URL	http://arxiv.org/abs/1803.07976v4
PDF	http://arxiv.org/pdf/1803.07976v4.pdf
PWC	https://paperswithcode.com/paper/an-overview-on-application-of-machine
Repo
Framework