Paper Group ANR 326
From Same Photo: Cheating on Visual Kinship Challenges
Title | From Same Photo: Cheating on Visual Kinship Challenges |
Authors | Mitchell Dawson, Andrew Zisserman, Christoffer Nellåker |
Abstract | With the propensity for deep learning models to learn unintended signals from data sets, there is always the possibility that the network can 'cheat' in order to solve a task. In the instance of data sets for visual kinship verification, one such unintended signal could be that the faces are cropped from the same photograph, since faces from the same photograph are more likely to be from the same family. In this paper we investigate the influence of this artefactual data inference in published data sets for kinship verification. To this end, we obtain a large dataset, and train a CNN classifier to determine if two faces are from the same photograph or not. Using this classifier alone as a naive classifier of kinship, we demonstrate near state-of-the-art results on five public benchmark data sets for kinship verification, achieving over 90% accuracy on one of them. Thus, we conclude that faces derived from the same photograph are a strong inadvertent signal in all the data sets we examined, and it is likely that the fraction of kinship explained by existing kinship models is small. |
Tasks | |
Published | 2018-09-17 |
URL | http://arxiv.org/abs/1809.06200v2 |
PDF | http://arxiv.org/pdf/1809.06200v2.pdf |
PWC | https://paperswithcode.com/paper/from-same-photo-cheating-on-visual-kinship |
Repo | |
Framework | |
Approximate Nearest Neighbor Search in High Dimensions
Title | Approximate Nearest Neighbor Search in High Dimensions |
Authors | Alexandr Andoni, Piotr Indyk, Ilya Razenshteyn |
Abstract | The nearest neighbor problem is defined as follows: Given a set $P$ of $n$ points in some metric space $(X,D)$, build a data structure that, given any point $q$, returns a point in $P$ that is closest to $q$ (its "nearest neighbor" in $P$). The data structure stores additional information about the set $P$, which is then used to find the nearest neighbor without computing all distances between $q$ and $P$. The problem has a wide range of applications in machine learning, computer vision, databases and other fields. To reduce the time needed to find nearest neighbors and the amount of memory used by the data structure, one can formulate the approximate nearest neighbor problem, where the goal is to return any point $p' \in P$ such that the distance from $q$ to $p'$ is at most $c \cdot \min_{p \in P} D(q,p)$, for some $c \geq 1$. Over the last two decades, many efficient solutions to this problem were developed. In this article we survey these developments, as well as their connections to questions in geometric functional analysis and combinatorial geometry. |
Tasks | |
Published | 2018-06-26 |
URL | http://arxiv.org/abs/1806.09823v1 |
PDF | http://arxiv.org/pdf/1806.09823v1.pdf |
PWC | https://paperswithcode.com/paper/approximate-nearest-neighbor-search-in-high |
Repo | |
Framework | |
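The $c$-approximate condition stated in the abstract above can be checked directly by brute force. The following is a minimal sketch, assuming Euclidean distance as the metric $D$; the function names and the sample points are hypothetical and chosen only for illustration. It verifies the definition and is not one of the sublinear data structures the survey covers.

```python
import math

def nearest_distance(points, q):
    """Exact nearest-neighbor distance, found by scanning every point in P."""
    return min(math.dist(p, q) for p in points)

def is_c_approximate(points, q, candidate, c):
    """Return True if D(q, candidate) <= c * min_p D(q, p)."""
    if c < 1:
        raise ValueError("c must be at least 1")
    return math.dist(q, candidate) <= c * nearest_distance(points, q)

# Hypothetical data: the exact nearest neighbor of q is (0, 0), at distance 1.
P = [(0.0, 0.0), (3.0, 0.0), (0.0, 5.0)]
q = (1.0, 0.0)

# (3, 0) lies at distance 2, so it is acceptable for c = 2 but not for c = 1.5.
print(is_c_approximate(P, q, (3.0, 0.0), 2.0))
print(is_c_approximate(P, q, (3.0, 0.0), 1.5))
```

With $c = 1$ the check reduces to the exact nearest neighbor problem; larger values of $c$ admit more candidate answers, which is what lets approximate methods trade accuracy for query time and memory.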
A Logarithmic Barrier Method For Proximal Policy Optimization
Title | A Logarithmic Barrier Method For Proximal Policy Optimization |
Authors | Cheng Zeng, Hongming Zhang |
Abstract | Proximal policy optimization (PPO) has been proposed as a first-order optimization method for reinforcement learning. We note that it uses an exterior penalty method. Often, the minimizers of exterior penalty functions approach feasibility only in the limit as the penalty parameter grows increasingly large, which may result in low sampling efficiency. Our method, which we call proximal policy optimization with barrier method (PPO-B), keeps almost all the advantages of PPO, such as easy implementation and good generalization. Specifically, a new surrogate objective with an interior penalty method is proposed to avoid the defect arising from the exterior penalty method. We conclude that PPO-B outperforms PPO in terms of sampling efficiency, since PPO-B achieved clearly better performance than PPO on Atari and MuJoCo environments. |
Tasks | |
Published | 2018-12-16 |
URL | http://arxiv.org/abs/1812.06502v1 |
PDF | http://arxiv.org/pdf/1812.06502v1.pdf |
PWC | https://paperswithcode.com/paper/a-logarithmic-barrier-method-for-proximal |
Repo | |
Framework | |
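The abstract's remark that exterior penalty minimizers reach feasibility only in the limit can be seen on a one-dimensional toy problem: minimize $x$ subject to $x \geq 1$. The sketch below is a generic textbook illustration, not the PPO-B surrogate objective (which the abstract does not spell out); the toy problem, the quadratic penalty, and all function names are assumptions made for illustration.

```python
import math

def penalty_objective(x, mu):
    """Exterior quadratic penalty for: minimize x subject to x >= 1."""
    violation = max(0.0, 1.0 - x)
    return x + mu * violation ** 2

def barrier_objective(x, t):
    """Logarithmic barrier (interior) objective; only defined for x > 1."""
    return x - math.log(x - 1.0) / t

def penalty_minimizer(mu):
    """Stationary point of the penalty objective: 1 - 2*mu*(1 - x) = 0."""
    return 1.0 - 1.0 / (2.0 * mu)

def barrier_minimizer(t):
    """Stationary point of the barrier objective: 1 - 1/(t*(x - 1)) = 0."""
    return 1.0 + 1.0 / t

for k in (1.0, 10.0, 100.0):
    x_pen = penalty_minimizer(k)
    x_bar = barrier_minimizer(k)
    # The penalty minimizer stays below 1 (infeasible) for every finite mu,
    # while the barrier minimizer stays above 1 (feasible) for every t.
    print(f"parameter={k:6.1f}  penalty x*={x_pen:.4f}  barrier x*={x_bar:.4f}")
```

Both sequences converge to the constrained optimum $x = 1$, but from opposite sides: the exterior penalty iterate violates the constraint at every finite parameter value, whereas the interior (barrier) iterate never does.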
A Deep Learning Approach for Multimodal Deception Detection
Title | A Deep Learning Approach for Multimodal Deception Detection |
Authors | Gangeshwar Krishnamurthy, Navonil Majumder, Soujanya Poria, Erik Cambria |
Abstract | Automatic deception detection is an important task that has gained momentum in computational linguistics due to its potential applications. In this paper, we propose a simple yet tough-to-beat multimodal neural model for deception detection. By combining features from different modalities such as video, audio, and text along with micro-expression features, we show that detecting deception in real-life videos can be more accurate. Experimental results on a dataset of real-life deception videos show that our model outperforms existing techniques for deception detection with an accuracy of 96.14% and ROC-AUC of 0.9799. |
Tasks | Deception Detection |
Published | 2018-03-01 |
URL | http://arxiv.org/abs/1803.00344v1 |
PDF | http://arxiv.org/pdf/1803.00344v1.pdf |
PWC | https://paperswithcode.com/paper/a-deep-learning-approach-for-multimodal |
Repo | |
Framework | |
Incremental Natural Language Processing: Challenges, Strategies, and Evaluation
Title | Incremental Natural Language Processing: Challenges, Strategies, and Evaluation |
Authors | Arne Köhn |
Abstract | Incrementality is ubiquitous in human-human interaction and beneficial for human-computer interaction. It has been a topic of research in different parts of the NLP community, mostly with focus on the specific topic at hand even though incremental systems have to deal with similar challenges regardless of domain. In this survey, I consolidate and categorize the approaches, identifying similarities and differences in the computation and data, and show trade-offs that have to be considered. A focus lies on evaluating incremental systems because the standard metrics often fail to capture the incremental properties of a system and coming up with a suitable evaluation scheme is non-trivial. |
Tasks | |
Published | 2018-05-31 |
URL | http://arxiv.org/abs/1805.12518v2 |
PDF | http://arxiv.org/pdf/1805.12518v2.pdf |
PWC | https://paperswithcode.com/paper/incremental-natural-language-processing-1 |
Repo | |
Framework | |
ActionXPose: A Novel 2D Multi-view Pose-based Algorithm for Real-time Human Action Recognition
Title | ActionXPose: A Novel 2D Multi-view Pose-based Algorithm for Real-time Human Action Recognition |
Authors | Federico Angelini, Zeyu Fu, Yang Long, Ling Shao, Syed Mohsen Naqvi |
Abstract | We present ActionXPose, a novel 2D pose-based algorithm for posture-level Human Action Recognition (HAR). The proposed approach exploits 2D human poses provided by the OpenPose detector from RGB videos. ActionXPose processes pose data and provides it to a Long Short-Term Memory Neural Network and to a 1D Convolutional Neural Network, which solve the classification problem. ActionXPose is one of the first algorithms that exploit 2D human poses for HAR. The algorithm has real-time performance, is robust to camera movement, subject proximity changes, viewpoint changes, and subject appearance changes, and provides a high degree of generalization. In fact, extensive simulations show that ActionXPose can be successfully trained using different datasets at once. State-of-the-art performance on popular datasets for posture-related HAR problems (i3DPost, KTH) is reported, and results are compared with those obtained by other methods, including the selected ActionXPose baseline. Moreover, we also propose two novel datasets called MPOSE and ISLD, recorded in our Intelligent Sensing Lab, to show ActionXPose generalization performance. |
Tasks | Temporal Action Localization |
Published | 2018-10-29 |
URL | http://arxiv.org/abs/1810.12126v1 |
PDF | http://arxiv.org/pdf/1810.12126v1.pdf |
PWC | https://paperswithcode.com/paper/actionxpose-a-novel-2d-multi-view-pose-based |
Repo | |
Framework | |
Knowledge Representation Learning: A Quantitative Review
Title | Knowledge Representation Learning: A Quantitative Review |
Authors | Yankai Lin, Xu Han, Ruobing Xie, Zhiyuan Liu, Maosong Sun |
Abstract | Knowledge representation learning (KRL) aims to represent entities and relations in a knowledge graph in a low-dimensional semantic space, and has been widely used in massive knowledge-driven tasks. In this article, we introduce the reader to the motivations for KRL, and overview existing approaches for KRL. Afterwards, we conduct an extensive quantitative comparison and analysis of several typical KRL methods on three evaluation tasks of knowledge acquisition including knowledge graph completion, triple classification, and relation extraction. We also review the real-world applications of KRL, such as language modeling, question answering, information retrieval, and recommender systems. Finally, we discuss the remaining challenges and outline future directions for KRL. The code and datasets used in the experiments can be found at https://github.com/thunlp/OpenKE. |
Tasks | Information Retrieval, Knowledge Graph Completion, Language Modelling, Question Answering, Recommendation Systems, Relation Extraction, Representation Learning |
Published | 2018-12-28 |
URL | http://arxiv.org/abs/1812.10901v1 |
PDF | http://arxiv.org/pdf/1812.10901v1.pdf |
PWC | https://paperswithcode.com/paper/knowledge-representation-learning-a |
Repo | |
Framework | |
Towards an Embodied Semantic Fovea: Semantic 3D scene reconstruction from ego-centric eye-tracker videos
Title | Towards an Embodied Semantic Fovea: Semantic 3D scene reconstruction from ego-centric eye-tracker videos |
Authors | Mickey Li, Noyan Songur, Pavel Orlov, Stefan Leutenegger, A Aldo Faisal |
Abstract | Incorporating the physical environment is essential for a complete understanding of human behavior in unconstrained everyday tasks. This is especially important in ego-centric tasks, where obtaining 3-dimensional information is both limiting and challenging, with the current 2D video analysis methods proving insufficient. Here we demonstrate a proof-of-concept system which provides real-time 3D mapping and semantic labeling of the local environment from an ego-centric RGB-D video stream with 3D gaze point estimation from head-mounted eye tracking glasses. We augment existing work in Semantic Simultaneous Localization And Mapping (Semantic SLAM) with collected gaze vectors. Our system can then find and track objects both inside and outside the user's field of view in 3D from multiple perspectives with reasonable accuracy. We validate our concept by producing a semantic map from images of the NYUv2 dataset while simultaneously estimating gaze position and gaze classes from recorded gaze data of the dataset images. |
Tasks | 3D Scene Reconstruction, Eye Tracking, Simultaneous Localization and Mapping |
Published | 2018-07-27 |
URL | http://arxiv.org/abs/1807.10561v1 |
PDF | http://arxiv.org/pdf/1807.10561v1.pdf |
PWC | https://paperswithcode.com/paper/towards-an-embodied-semantic-fovea-semantic |
Repo | |
Framework | |
Real-time Action Recognition with Dissimilarity-based Training of Specialized Module Networks
Title | Real-time Action Recognition with Dissimilarity-based Training of Specialized Module Networks |
Authors | Marian K. Y. Boktor, Ahmad Al-Kabbany, Radwa Khalil, Said El-Khamy |
Abstract | This paper addresses the problem of real-time action recognition in trimmed videos, for which deep neural networks have defined the state-of-the-art performance in the recent literature. To attain higher recognition accuracy with efficient computation, researchers have addressed various limitations in the recognition pipeline. These include the network architecture, the number of input streams (where additional streams augment the color information), and the cost function to be optimized, among others. The literature has always aimed, though, at assigning the adopted network (or networks, in the case of multiple streams) the task of recognizing all the action classes in the dataset at hand. We propose to train multiple specialized module networks instead. Each module is trained to recognize a subset of the action classes. Towards this goal, we present a dissimilarity-based optimized procedure for distributing the action classes over the modules, which can be trained simultaneously offline. On two standard datasets (UCF-101 and HMDB-51), the proposed method demonstrates performance that is comparable, and in some aspects superior, to the state of the art, while satisfying the real-time constraint. We achieved 72.5% accuracy on the challenging HMDB-51 dataset. By assigning fewer and dissimilar classes to each module network, this research paves the way to benefit from light-weight architectures without compromising recognition accuracy. |
Tasks | Temporal Action Localization |
Published | 2018-10-27 |
URL | http://arxiv.org/abs/1810.11731v1 |
PDF | http://arxiv.org/pdf/1810.11731v1.pdf |
PWC | https://paperswithcode.com/paper/real-time-action-recognition-with |
Repo | |
Framework | |
$A^2$-Nets: Double Attention Networks
Title | $A^2$-Nets: Double Attention Networks |
Authors | Yunpeng Chen, Yannis Kalantidis, Jianshu Li, Shuicheng Yan, Jiashi Feng |
Abstract | Learning to capture long-range relations is fundamental to image/video recognition. Existing CNN models generally rely on increasing depth to model such relations, which is highly inefficient. In this work, we propose the "double attention block", a novel component that aggregates and propagates informative global features from the entire spatio-temporal space of input images/videos, enabling subsequent convolution layers to access features from the entire space efficiently. The component is designed with a double attention mechanism in two steps, where the first step gathers features from the entire space into a compact set through second-order attention pooling and the second step adaptively selects and distributes features to each location via another attention. The proposed double attention block is easy to adopt and can be plugged into existing deep neural networks conveniently. We conduct extensive ablation studies and experiments on both image and video recognition tasks to evaluate its performance. On the image recognition task, a ResNet-50 equipped with our double attention blocks outperforms a much larger ResNet-152 architecture on the ImageNet-1k dataset with over 40% fewer parameters and fewer FLOPs. On the action recognition task, our proposed model achieves state-of-the-art results on the Kinetics and UCF-101 datasets with significantly higher efficiency than recent works. |
Tasks | Temporal Action Localization, Video Recognition |
Published | 2018-10-27 |
URL | http://arxiv.org/abs/1810.11579v1 |
PDF | http://arxiv.org/pdf/1810.11579v1.pdf |
PWC | https://paperswithcode.com/paper/a2-nets-double-attention-networks |
Repo | |
Framework | |
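The two steps described in the abstract above (gather global features through attention pooling, then distribute them to each location through a second attention) can be sketched in NumPy as follows. This is one reading of the abstract, not the authors' implementation: the projection matrices `W_phi`, `W_theta`, `W_rho`, the flattened `(channels, locations)` layout, and all sizes are assumptions, and any output projection or residual connection a full block may use is omitted.

```python
import numpy as np

def softmax(x, axis):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def double_attention(X, W_phi, W_theta, W_rho):
    """X has shape (c, N): c channels at N flattened locations.

    W_phi is (m, c); W_theta and W_rho are (n, c). Returns an (m, N) array.
    """
    A = W_phi @ X                     # (m, N) features to be gathered
    B = softmax(W_theta @ X, axis=1)  # (n, N) each map sums to 1 over locations
    V = softmax(W_rho @ X, axis=0)    # (n, N) each location sums to 1 over the n descriptors
    G = A @ B.T                       # step 1: (m, n) compact set of global descriptors
    return G @ V                      # step 2: (m, N) descriptors distributed to each location

# Hypothetical sizes, for illustration only.
rng = np.random.default_rng(0)
c, N, m, n = 8, 16, 4, 3
X = rng.normal(size=(c, N))
Z = double_attention(
    X,
    rng.normal(size=(m, c)),
    rng.normal(size=(n, c)),
    rng.normal(size=(n, c)),
)
print(Z.shape)
```

Every output location is a weighted mix of the same `n` global descriptors, so a single block gives each position access to information from the whole input without stacking many convolution layers.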
Cross-Modal and Hierarchical Modeling of Video and Text
Title | Cross-Modal and Hierarchical Modeling of Video and Text |
Authors | Bowen Zhang, Hexiang Hu, Fei Sha |
Abstract | Visual data and text data are composed of information at multiple granularities. A video can describe a complex scene that is composed of multiple clips or shots, where each depicts a semantically coherent event or action. Similarly, a paragraph may contain sentences with different topics, which collectively convey a coherent message or story. In this paper, we investigate the modeling techniques for such hierarchical sequential data where there are correspondences across multiple modalities. Specifically, we introduce hierarchical sequence embedding (HSE), a generic model for embedding sequential data of different modalities into hierarchically semantic spaces, with either explicit or implicit correspondence information. We perform empirical studies on large-scale video and paragraph retrieval datasets and demonstrate superior performance by the proposed methods. Furthermore, we examine the effectiveness of our learned embeddings when applied to downstream tasks. We show their utility in zero-shot action recognition and video captioning. |
Tasks | Temporal Action Localization, Video Captioning |
Published | 2018-10-16 |
URL | http://arxiv.org/abs/1810.07212v1 |
PDF | http://arxiv.org/pdf/1810.07212v1.pdf |
PWC | https://paperswithcode.com/paper/cross-modal-and-hierarchical-modeling-of |
Repo | |
Framework | |
Lie Transform–based Neural Networks for Dynamics Simulation and Learning
Title | Lie Transform–based Neural Networks for Dynamics Simulation and Learning |
Authors | Andrei Ivanov, Alena Sholokhova, Sergei Andrianov, Roman Konoplev-Esgenburg |
Abstract | In this article, we discuss the architecture of the polynomial neural network that corresponds to the matrix representation of the Lie transform. The matrix form of the Lie transform is an approximation of the general solution of a nonlinear system of ordinary differential equations. The proposed architecture can be trained with small data sets, extrapolate predictions outside the training data, and provide a possibility for interpretation. We provide a theoretical explanation of the proposed architecture, as well as demonstrate it in several applications. We present the results of modeling and identification for both simple and well-known dynamical systems, and more complicated examples from price dynamics, chemistry, and accelerator physics. From a practical point of view, we describe the training of a Lie transform–based neural network with a small data set containing only 10 data points. We also demonstrate an interpretation of the fitted neural network by converting it to a system of differential equations. |
Tasks | Time Series |
Published | 2018-02-05 |
URL | https://arxiv.org/abs/1802.01353v2 |
PDF | https://arxiv.org/pdf/1802.01353v2.pdf |
PWC | https://paperswithcode.com/paper/lie-transform-based-polynomial-neural |
Repo | |
Framework | |
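A polynomial network of the kind the abstract above describes can be pictured as a map from a state to the next state that is polynomial in the state components. The sketch below assumes one common way to write such a map, $x_1 = W_0 + W_1 x_0 + W_2 (x_0 \otimes x_0) + \dots$, with Kronecker powers of the state; this form, the function name, and the example weights are assumptions for illustration and are not taken from the abstract.

```python
import numpy as np

def polynomial_map(x, weights):
    """Apply x -> W0 + W1 @ x + W2 @ kron(x, x) + ... (assumed form).

    x has shape (d,); weights[k] has shape (d, d**k), with weights[0] of shape (d,).
    """
    out = np.array(weights[0], dtype=float)
    power = np.ones(1)
    for W in weights[1:]:
        power = np.kron(power, x)  # k-th Kronecker power of the state
        out = out + W @ power
    return out

# Hypothetical example: one explicit Euler step of dx/dt = -x**2 with step 0.1
# is itself a degree-2 polynomial map, x -> x - 0.1 * x**2.
weights = [np.zeros(1), np.eye(1), np.array([[-0.1]])]
print(polynomial_map(np.array([2.0]), weights))
```

Because the weights multiply explicit monomials of the state, a fitted map of this form can be read term by term, which is in the spirit of the interpretability the abstract mentions.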
Como funciona o Deep Learning
Title | Como funciona o Deep Learning |
Authors | Moacir Antonelli Ponti, Gabriel B. Paranhos da Costa |
Abstract | Deep Learning methods are currently the state of the art in many problems which can be tackled via machine learning, in particular classification problems. However, there is still a lack of understanding of how those methods work, why they work, and what limitations are involved in using them. In this chapter we will describe in detail the transition from shallow to deep networks, include examples of code on how to implement them, and discuss the main issues one faces when training a deep network. Afterwards, we introduce some theoretical background behind the use of deep models, and discuss their limitations. |
Tasks | |
Published | 2018-06-20 |
URL | http://arxiv.org/abs/1806.07908v1 |
PDF | http://arxiv.org/pdf/1806.07908v1.pdf |
PWC | https://paperswithcode.com/paper/como-funciona-o-deep-learning |
Repo | |
Framework | |
Knowledge-based end-to-end memory networks
Title | Knowledge-based end-to-end memory networks |
Authors | Jatin Ganhotra, Lazaros Polymenakos |
Abstract | End-to-end dialog systems have become very popular because they hold the promise of learning directly from human-to-human dialog interaction. Retrieval and generative methods have been explored in this area with mixed results. A key element that is missing so far is the incorporation of a priori knowledge about the task at hand. This knowledge may exist in the form of structured or unstructured information. As a first step in this direction, we present a novel approach, Knowledge-based end-to-end memory networks (KB-memN2N), which allows special handling of named entities for goal-oriented dialog tasks. We present results on two datasets, the DSTC6 challenge dataset and the dialog bAbI tasks. |
Tasks | Goal-Oriented Dialog |
Published | 2018-04-23 |
URL | http://arxiv.org/abs/1804.08204v1 |
PDF | http://arxiv.org/pdf/1804.08204v1.pdf |
PWC | https://paperswithcode.com/paper/knowledge-based-end-to-end-memory-networks |
Repo | |
Framework | |
An Overview on Application of Machine Learning Techniques in Optical Networks
Title | An Overview on Application of Machine Learning Techniques in Optical Networks |
Authors | Francesco Musumeci, Cristina Rottondi, Avishek Nag, Irene Macaluso, Darko Zibar, Marco Ruffini, Massimo Tornatore |
Abstract | Today's telecommunication networks have become sources of enormous amounts of widely heterogeneous data. This information can be retrieved from network traffic traces, network alarms, signal quality indicators, users' behavioral data, etc. Advanced mathematical tools are required to extract meaningful information from these network-generated data and to make decisions pertaining to the proper functioning of the networks. Among these mathematical tools, Machine Learning (ML) is regarded as one of the most promising methodological approaches to perform network-data analysis and enable automated network self-configuration and fault management. The adoption of ML techniques in the field of optical communication networks is motivated by the unprecedented growth of network complexity faced by optical networks in the last few years. This increase in complexity is due to the introduction of a huge number of adjustable and interdependent system parameters (e.g., routing configurations, modulation format, symbol rate, coding schemes, etc.) that are enabled by the usage of coherent transmission/reception technologies, advanced digital signal processing and compensation of nonlinear effects in optical fiber propagation. In this paper we provide an overview of the application of ML to optical communications and networking. We classify and survey relevant literature dealing with the topic, and we also provide an introductory tutorial on ML for researchers and practitioners interested in this field. Although a good number of research papers have recently appeared, the application of ML to optical networks is still in its infancy: to stimulate further work in this area, we conclude the paper by proposing new possible research directions. |
Tasks | |
Published | 2018-03-21 |
URL | http://arxiv.org/abs/1803.07976v4 |
PDF | http://arxiv.org/pdf/1803.07976v4.pdf |
PWC | https://paperswithcode.com/paper/an-overview-on-application-of-machine |
Repo | |
Framework | |