Paper Group ANR 360
On Learning Disentangled Representations for Gait Recognition
Title | On Learning Disentangled Representations for Gait Recognition |
Authors | Ziyuan Zhang, Luan Tran, Feng Liu, Xiaoming Liu |
Abstract | Gait, the walking pattern of individuals, is one of the important biometric modalities. Most existing gait recognition methods take silhouettes or articulated body models as gait features. These methods suffer from degraded recognition performance when handling confounding variables such as clothing, carrying, and viewing angle. To remedy this issue, we propose a novel AutoEncoder framework, GaitNet, to explicitly disentangle appearance, canonical, and pose features from RGB imagery. An LSTM integrates pose features over time into a dynamic gait feature, while canonical features are averaged into a static gait feature. Both are utilized as classification features. In addition, we collect a Frontal-View Gait (FVG) dataset to focus on gait recognition from frontal-view walking, which is a challenging problem since it contains minimal gait cues compared to other views. FVG also includes other important variations, e.g., walking speed, carrying, and clothing. With extensive experiments on the CASIA-B, USF, and FVG datasets, our method demonstrates superior performance to the state of the art quantitatively, the ability of feature disentanglement qualitatively, and promising computational efficiency. We further compare our GaitNet with state-of-the-art face recognition to demonstrate the advantages of gait biometrics identification under certain scenarios, e.g., long distance/low resolutions and cross viewing angles. |
Tasks | Face Recognition, Gait Recognition |
Published | 2019-09-05 |
URL | https://arxiv.org/abs/1909.03051v1 |
PDF | https://arxiv.org/pdf/1909.03051v1.pdf |
PWC | https://paperswithcode.com/paper/on-learning-disentangled-representations-for |
Repo | |
Framework | |
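A minimal sketch (not the authors' code) of the aggregation step the GaitNet abstract describes: per-frame pose features are integrated by an LSTM into a dynamic gait feature, per-frame canonical features are averaged into a static one, and the two are concatenated for classification. All dimensions and module names are illustrative assumptions.

```python
import torch
import torch.nn as nn

class GaitAggregator(nn.Module):
    def __init__(self, pose_dim=64, canon_dim=128, hidden=128, n_ids=100):
        super().__init__()
        self.lstm = nn.LSTM(pose_dim, hidden, batch_first=True)
        self.classifier = nn.Linear(hidden + canon_dim, n_ids)

    def forward(self, pose_seq, canon_seq):
        # pose_seq:  (B, T, pose_dim)  per-frame pose features
        # canon_seq: (B, T, canon_dim) per-frame canonical features
        _, (h, _) = self.lstm(pose_seq)       # dynamic gait feature
        dynamic = h[-1]                       # (B, hidden)
        static = canon_seq.mean(dim=1)        # (B, canon_dim) averaged
        return self.classifier(torch.cat([dynamic, static], dim=1))
```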
Overcoming Long-term Catastrophic Forgetting through Adversarial Neural Pruning and Synaptic Consolidation
Title | Overcoming Long-term Catastrophic Forgetting through Adversarial Neural Pruning and Synaptic Consolidation |
Authors | Jian Peng, Bo Tang, Hao Jiang, Zhuo Li, Yinjie Lei, Tao Lin, Haifeng Li |
Abstract | Enabling a neural network to sequentially learn multiple tasks is of great significance for expanding the applicability of neural networks in realistic human application scenarios. However, as the task sequence grows, the model quickly forgets previously learned skills; we refer to this loss of memory over long sequences as long-term catastrophic forgetting. There are two main causes of long-term forgetting: first, as tasks accumulate, the intersection of the low-error parameter subspaces satisfying these tasks becomes smaller and smaller, or even non-existent; second, errors accumulate in the process of protecting the knowledge of previous tasks. In this paper, we propose an adversarial mechanism in which neural pruning and synaptic consolidation are used to overcome long-term catastrophic forgetting. The mechanism distills task-related knowledge into a small number of parameters and retains old knowledge by consolidating those parameters, while sparing most parameters to learn follow-up tasks; this not only avoids forgetting but also allows a large number of tasks to be learned. Specifically, neural pruning iteratively relaxes the parameter constraints of the current task to expand the common parameter subspace across tasks, and the modified synaptic consolidation strategy comprises two components: a novel measurement that takes network structure information into account to calculate parameter importance, and an element-wise parameter-updating strategy designed to prevent significant parameters from being overwritten in subsequent learning. We verify the method on image classification, and the results show that our proposed ANPSC approach outperforms state-of-the-art methods. A hyperparameter sensitivity test further demonstrates the robustness of our proposed approach. |
Tasks | Image Classification |
Published | 2019-12-19 |
URL | https://arxiv.org/abs/1912.09091v1 |
PDF | https://arxiv.org/pdf/1912.09091v1.pdf |
PWC | https://paperswithcode.com/paper/overcoming-long-term-catastrophic-forgetting |
Repo | |
Framework | |
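A rough sketch of the synaptic-consolidation side of the ANPSC idea above: a quadratic penalty, weighted by per-parameter importance, discourages overwriting parameters that mattered for earlier tasks. The importance measure itself (which the paper derives in part from network structure) is left abstract; `importance` and `old_params` are assumed inputs, and the penalty form is the generic EWC-style one, not necessarily the paper's exact formulation.

```python
import torch

def consolidation_penalty(model, old_params, importance, lam=1.0):
    """Sum of lam * Omega_i * (theta_i - theta_i_old)^2 over all parameters.

    old_params: dict of detached parameter snapshots after the previous task.
    importance: dict of same-shaped tensors Omega (per-parameter importance).
    """
    penalty = torch.tensor(0.0)
    for name, p in model.named_parameters():
        penalty = penalty + (importance[name] * (p - old_params[name]) ** 2).sum()
    return lam * penalty

# Usage during task t: total_loss = task_loss + consolidation_penalty(...)
```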
A Graph-Based Machine Learning Approach for Bot Detection
Title | A Graph-Based Machine Learning Approach for Bot Detection |
Authors | Abbas Abou Daya, Mohammad A. Salahuddin, Noura Limam, Raouf Boutaba |
Abstract | Bot detection using machine learning (ML), with network flow-level features, has been extensively studied in the literature. However, existing flow-based approaches typically incur a high computational overhead and do not completely capture the network communication patterns, which can expose additional aspects of malicious hosts. Recently, bot detection systems which leverage communication graph analysis using ML have gained attention to overcome these limitations. A graph-based approach is rather intuitive, as graphs are true representations of network communications. In this paper, we propose a two-phased, graph-based bot detection system which leverages both unsupervised and supervised ML. The first phase prunes presumably benign hosts, while the second phase achieves bot detection with high precision. Our system detects multiple types of bots and is robust to zero-day attacks. It also accommodates different network topologies and is suitable for large-scale data. |
Tasks | |
Published | 2019-02-22 |
URL | http://arxiv.org/abs/1902.08538v1 |
PDF | http://arxiv.org/pdf/1902.08538v1.pdf |
PWC | https://paperswithcode.com/paper/a-graph-based-machine-learning-approach-for |
Repo | |
Framework | |
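An illustrative two-phase pipeline in the spirit of the abstract above (not the authors' implementation): phase 1 clusters hosts on graph-derived features and discards the large clusters, on the assumption that they are benign-dominated; phase 2 trains a supervised classifier on the remaining, suspicious hosts. The clusterer, classifier, and `keep_fraction` heuristic are all stand-in assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestClassifier

def two_phase_bot_detection(X, y, n_clusters=8, keep_fraction=0.5):
    # X: graph features per host (e.g., in/out-degree, centrality); y: labels.
    # Phase 1: prune hosts that fall in the largest (presumably benign) clusters.
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(X)
    sizes = np.bincount(labels, minlength=n_clusters)
    suspicious = np.argsort(sizes)[: int(n_clusters * keep_fraction)]  # smallest
    mask = np.isin(labels, suspicious)
    # Phase 2: high-precision supervised detection on the pruned host set.
    clf = RandomForestClassifier(n_estimators=200).fit(X[mask], y[mask])
    return clf, mask
```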
When Collaborative Filtering Meets Reinforcement Learning
Title | When Collaborative Filtering Meets Reinforcement Learning |
Authors | Yu Lei, Wenjie Li |
Abstract | In this paper, we study a multi-step interactive recommendation problem, where the item recommended at the current step may affect the quality of future recommendations. To address the problem, we develop a novel and effective approach, named CFRL, which seamlessly integrates the ideas of both collaborative filtering (CF) and reinforcement learning (RL). More specifically, we first model the recommender-user interactive recommendation problem as an agent-environment RL task, which is mathematically described by a Markov decision process (MDP). Further, to achieve collaborative recommendations for the entire user community, we propose a novel CF-based MDP by encoding the states of all users into a shared latent vector space. Finally, we propose an effective Q-network learning method to learn the agent’s optimal policy based on the CF-based MDP. The capability of CFRL is demonstrated by comparing its performance against a variety of existing methods on real-world datasets. |
Tasks | |
Published | 2019-02-02 |
URL | http://arxiv.org/abs/1902.00715v2 |
PDF | http://arxiv.org/pdf/1902.00715v2.pdf |
PWC | https://paperswithcode.com/paper/when-collaborative-filtering-meets |
Repo | |
Framework | |
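A minimal sketch of the Q-learning setup the CFRL abstract describes: user states live in a shared latent space, and a Q-network scores all candidate items (actions) given the current state. The shapes, architecture, and TD-target form are standard Q-learning assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    def __init__(self, state_dim=32, n_items=1000, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(), nn.Linear(hidden, n_items)
        )

    def forward(self, state):      # state: (B, state_dim) shared latent states
        return self.net(state)     # Q-values for every candidate item

def td_target(reward, next_state, qnet, gamma=0.9):
    # One-step bootstrapped target for fitting Q(s, a) with a squared loss.
    with torch.no_grad():
        return reward + gamma * qnet(next_state).max(dim=1).values
```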
Inferring Dynamic Representations of Facial Actions from a Still Image
Title | Inferring Dynamic Representations of Facial Actions from a Still Image |
Authors | Siyang Song, Enrique Sánchez-Lozano, Linlin Shen, Alan Johnston, Michel Valstar |
Abstract | Facial actions are spatio-temporal signals by nature, and therefore their modeling is crucially dependent on the availability of temporal information. In this paper, we focus on inferring such temporal dynamics of facial actions when no explicit temporal information is available, i.e., from still images. We present a novel approach to capture multiple scales of such temporal dynamics, with an application to facial Action Unit (AU) intensity estimation and dimensional affect estimation. In particular, 1) we propose a framework that infers a dynamic representation (DR) from a still image, which captures the bi-directional flow of time within a short time window centered at the input image; 2) we show that we can train our method without the need to explicitly generate target representations, allowing the network to represent dynamics more broadly; and 3) we propose to apply a multiple-temporal-scale approach that infers DRs for different window lengths (MDR) from a still image. We empirically validate the value of our approach on the task of frame ranking, and show how our proposed MDR attains state-of-the-art results on BP4D for AU intensity estimation and on SEMAINE for dimensional affect estimation, using only still images at test time. |
Tasks | |
Published | 2019-04-04 |
URL | http://arxiv.org/abs/1904.02382v1 |
PDF | http://arxiv.org/pdf/1904.02382v1.pdf |
PWC | https://paperswithcode.com/paper/inferring-dynamic-representations-of-facial |
Repo | |
Framework | |
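A schematic sketch of the multiple-temporal-scale idea above: one shared image encoder feeds a separate head per window length, each regressing a dynamic representation for its scale. The encoder, dimensions, and window lengths are illustrative assumptions; the paper's training signal (frame ranking without explicit targets) is not reproduced here.

```python
import torch.nn as nn

class MDRHeads(nn.Module):
    def __init__(self, feat_dim=512, dr_dim=128, window_lengths=(3, 5, 9)):
        super().__init__()
        # One regression head per temporal window length.
        self.heads = nn.ModuleDict(
            {str(w): nn.Linear(feat_dim, dr_dim) for w in window_lengths}
        )

    def forward(self, feats):
        # feats: (B, feat_dim) features extracted from a single still image.
        return {w: head(feats) for w, head in self.heads.items()}
```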
Indexing Graph Search Trees and Applications
Title | Indexing Graph Search Trees and Applications |
Authors | Sankardeep Chakraborty, Kunihiko Sadakane |
Abstract | We consider the problem of compactly representing the Depth First Search (DFS) tree of a given undirected or directed graph having $n$ vertices and $m$ edges, while supporting various DFS-related queries efficiently in the RAM model with logarithmic word size. We study this problem in two well-known models: the indexing and encoding models. While most of these queries can be supported easily in constant time using $O(n \lg n)$ bits of extra space (we use $\lg$ to denote the logarithm to base $2$), our goal here is, more specifically, to beat this trivial $O(n \lg n)$-bit space bound, yet not compromise too much on the running time of these queries. In the indexing model, the space bound of our solution involves the quantity $m$; hence, we obtain different bounds for sparse and dense graphs respectively. In the encoding model, we first give a space lower bound, followed by an almost optimal data structure with extremely fast query time. Central to our algorithm is a partitioning of the DFS tree into connected subtrees, and a compact way to store these connections. Finally, we also apply these techniques to compactly index the shortest path structure and biconnectivity structures, among others. |
Tasks | |
Published | 2019-06-19 |
URL | https://arxiv.org/abs/1906.07871v1 |
PDF | https://arxiv.org/pdf/1906.07871v1.pdf |
PWC | https://paperswithcode.com/paper/indexing-graph-search-trees-and-applications |
Repo | |
Framework | |
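For orientation, a plain-Python sketch of the trivial baseline the abstract mentions: storing each vertex's DFS-tree parent answers parent queries in $O(1)$ time but takes $O(n \lg n)$ bits, which is exactly the space bound the paper's data structures beat. The graph is an adjacency-list dict; the iterative DFS avoids recursion limits.

```python
def dfs_parents(adj, root=0):
    """Return DFS-tree parent pointers for the component containing root."""
    parent = {}
    stack = [(root, None)]
    while stack:
        u, p = stack.pop()
        if u in parent:          # already visited
            continue
        parent[u] = p            # u is popped: p is its DFS-tree parent
        for v in adj[u]:
            if v not in parent:
                stack.append((v, u))
    return parent

# Example: dfs_parents({0: [1, 2], 1: [0, 2], 2: [0, 1]})
```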
Mapping road safety features from streetview imagery: A deep learning approach
Title | Mapping road safety features from streetview imagery: A deep learning approach |
Authors | Arpan Sainju, Zhe Jiang |
Abstract | Each year, around 6 million car accidents occur in the U.S. on average. Road safety features (e.g., concrete barriers, metal crash barriers, rumble strips) play an important role in preventing or mitigating vehicle crashes. Accurate maps of road safety features are an important component of safety management systems for federal and state transportation agencies, helping traffic engineers identify locations to invest in safety infrastructure. In current practice, mapping road safety features is largely done manually (e.g., observations on the road or visual interpretation of streetview imagery), which is both expensive and time-consuming. In this paper, we propose a deep learning approach to automatically map road safety features from streetview imagery. Unlike existing Convolutional Neural Networks (CNNs) that classify each image individually, we propose to add a Recurrent Neural Network (Long Short-Term Memory) to capture the geographic context of images (the spatial autocorrelation effect along linear road network paths). Evaluations on real-world streetview imagery show that our proposed model outperforms several baseline methods. |
Tasks | |
Published | 2019-07-15 |
URL | https://arxiv.org/abs/1907.12647v1 |
PDF | https://arxiv.org/pdf/1907.12647v1.pdf |
PWC | https://paperswithcode.com/paper/mapping-road-safety-features-from-streetview |
Repo | |
Framework | |
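A compact sketch of the CNN-plus-LSTM architecture the abstract describes: a CNN encodes each streetview image, and an LSTM runs along the road-network sequence of images to exploit spatial autocorrelation. The ResNet-18 backbone, bidirectionality, and sizes are illustrative assumptions, not necessarily the authors' choices.

```python
import torch.nn as nn
from torchvision import models

class RoadFeatureTagger(nn.Module):
    def __init__(self, hidden=256, n_classes=4):
        super().__init__()
        cnn = models.resnet18(weights=None)
        cnn.fc = nn.Identity()          # expose the 512-d pooled features
        self.cnn = cnn
        self.lstm = nn.LSTM(512, hidden, batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, n_classes)

    def forward(self, images):
        # images: (B, T, 3, H, W), T consecutive images along a road path.
        b, t = images.shape[:2]
        feats = self.cnn(images.flatten(0, 1)).view(b, t, -1)
        out, _ = self.lstm(feats)       # geographic context across the path
        return self.head(out)           # per-image safety-feature scores
```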
Recovery of Future Data via Convolution Nuclear Norm Minimization
Title | Recovery of Future Data via Convolution Nuclear Norm Minimization |
Authors | Guangcan Liu, Wayne Zhang |
Abstract | This paper is about recovering unseen future data from a given sequence of historical samples, referred to as future data recovery—a significant problem closely related to time series forecasting. For the first time, we study the problem from the perspective of tensor completion. Namely, we convert future data recovery into a more inclusive problem called sequential tensor completion (STC), which is to recover a tensor of sequential structure from entries sampled arbitrarily from the tensor. Unlike the ordinary tensor completion (OTC) problem studied in the majority of the literature, STC has the distinctive setup that the target tensor is sequential and not permutable, which means that the target owns rich spatio-temporal structure. This enables the possibility of restoring arbitrarily selected missing entries, which is not possible under the framework of OTC. We then propose two methods to address STC: Discrete Fourier Transform based $\ell_1$ minimization ($\mathrm{DFT}_{\ell_1}$) and Convolution Nuclear Norm Minimization (CNNM), where $\mathrm{DFT}_{\ell_1}$ is in fact a special case of CNNM. Whenever the target is low-rank in some convolution domain, CNNM provably succeeds in solving STC. This immediately implies that $\mathrm{DFT}_{\ell_1}$ is also successful when the Fourier transform of the target is sparse, as convolution low-rankness is a generalization of Fourier sparsity. Experiments on univariate time series, images, and videos show encouraging results. |
Tasks | Time Series, Time Series Forecasting |
Published | 2019-09-06 |
URL | https://arxiv.org/abs/1909.03889v3 |
PDF | https://arxiv.org/pdf/1909.03889v3.pdf |
PWC | https://paperswithcode.com/paper/recovery-of-future-data-via-convolution |
Repo | |
Framework | |
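A toy sketch of the $\mathrm{DFT}_{\ell_1}$ idea from the abstract: alternate between enforcing the observed entries and soft-thresholding the signal's Fourier coefficients, which promotes Fourier sparsity. This is a generic proximal-style iteration for illustration, not the paper's exact algorithm or its guarantees; for future data recovery, the mask would be False on the trailing (future) samples.

```python
import numpy as np

def dft_l1_complete(x_obs, mask, tau=0.1, n_iters=200):
    """Estimate missing entries of a 1-D sequence assumed Fourier-sparse.

    x_obs: observed values (arbitrary where mask is False).
    mask:  boolean array, True at observed positions.
    """
    x = np.where(mask, x_obs, 0.0).astype(float)
    for _ in range(n_iters):
        c = np.fft.fft(x)
        mag = np.abs(c)
        # Complex soft-thresholding: shrink magnitudes by tau, keep phases.
        scale = np.where(mag > tau, (mag - tau) / np.maximum(mag, 1e-12), 0.0)
        x = np.fft.ifft(c * scale).real
        x[mask] = x_obs[mask]          # re-impose the observed samples
    return x                           # missing entries now estimated
```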
Reviewing Data Access Patterns and Computational Redundancy for Machine Learning Algorithms
Title | Reviewing Data Access Patterns and Computational Redundancy for Machine Learning Algorithms |
Authors | Imen Chakroun, Tom Vander Aa, Tom Ashby |
Abstract | Machine learning (ML) is probably the foremost technique used to deal with the size and complexity of the new generation of data. In this paper, we analyze one means of increasing the performance of ML algorithms: exploiting data locality. Data locality and access patterns are often at the heart of performance issues in computing systems, due to the use of certain hardware techniques to improve performance. Altering access patterns to increase locality can dramatically increase the performance of a given algorithm. Moreover, repeated data accesses can be seen as redundancy in data movement. Similarly, there can also be redundancy in the repetition of calculations. This work also identifies some of the opportunities for avoiding these redundancies by directly reusing computation results. We document the possibilities of such reuse in selected machine learning algorithms and give initial indicative results from our first experiments on data access improvement and algorithm redesign. |
Tasks | |
Published | 2019-04-25 |
URL | https://arxiv.org/abs/1904.11203v3 |
PDF | https://arxiv.org/pdf/1904.11203v3.pdf |
PWC | https://paperswithcode.com/paper/reviewing-data-access-patterns-and |
Repo | |
Framework | |
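A small illustration of the locality idea discussed above: processing a large matrix in cache-sized tiles keeps each operand resident while it is reused, cutting memory traffic. The tile size is an assumption to tune per machine, and NumPy's underlying BLAS already does this internally for `X @ X.T`, so the sketch is purely didactic.

```python
import numpy as np

def tiled_gram(X, tile=256):
    """Compute the Gram matrix X @ X.T one cache-friendly tile at a time."""
    n = X.shape[0]
    G = np.zeros((n, n))
    for i in range(0, n, tile):
        Xi = X[i:i + tile]                 # row-tile loaded once, reused below
        for j in range(0, n, tile):
            G[i:i + tile, j:j + tile] = Xi @ X[j:j + tile].T
    return G
```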
Smile, Be Happy :) Emoji Embedding for Visual Sentiment Analysis
Title | Smile, Be Happy :) Emoji Embedding for Visual Sentiment Analysis |
Authors | Ziad Al-Halah, Andrew Aitken, Wenzhe Shi, Jose Caballero |
Abstract | Due to the lack of large-scale datasets, the prevailing approach in visual sentiment analysis is to leverage models trained for object classification on large datasets like ImageNet. However, objects are sentiment-neutral, which hinders the expected gain of transfer learning for such tasks. In this work, we propose to overcome this problem by learning a novel sentiment-aligned image embedding that is better suited for subsequent visual sentiment analysis. Our embedding leverages the intricate relation between emojis and images in large-scale and readily available data from social media. Emojis are language-agnostic, consistent, and carry a clear sentiment signal, which makes them an excellent proxy for learning a sentiment-aligned embedding. Hence, we construct a novel dataset of 4 million images collected from Twitter with their associated emojis. We train a deep neural model for image embedding using the emoji prediction task as a proxy. Our evaluation demonstrates that the proposed embedding outperforms the popular object-based counterpart consistently across several sentiment analysis benchmarks. Furthermore, without bells and whistles, our compact, effective and simple embedding outperforms the more elaborate and customized state-of-the-art deep models on these public benchmarks. Additionally, we introduce a novel emoji representation based on their visual emotional response, which supports a deeper understanding of the emoji modality and its usage on social media. |
Tasks | Object Classification, Sentiment Analysis, Transfer Learning |
Published | 2019-07-14 |
URL | https://arxiv.org/abs/1907.06160v2 |
PDF | https://arxiv.org/pdf/1907.06160v2.pdf |
PWC | https://paperswithcode.com/paper/smile-be-happy-emoji-embedding-for-visual |
Repo | |
Framework | |
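A bare-bones sketch of the proxy task described above: train an image encoder to predict which emoji accompanied the image, then discard the head and reuse the penultimate features as a sentiment-aligned embedding. The ResNet-18 backbone and the number of emoji classes are illustrative assumptions.

```python
import torch.nn as nn
from torchvision import models

def build_emoji_proxy_model(n_emojis=64):
    backbone = models.resnet18(weights=None)
    # ResNet-18's pooled features are 512-d; replace the classifier head
    # with an emoji-prediction head trained via cross-entropy on
    # (image, emoji) pairs.
    backbone.fc = nn.Linear(512, n_emojis)
    return backbone

# After training, set model.fc = nn.Identity() and use the 512-d pooled
# features as the sentiment-aligned image embedding.
```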
Sketch-Fill-A-R: A Persona-Grounded Chit-Chat Generation Framework
Title | Sketch-Fill-A-R: A Persona-Grounded Chit-Chat Generation Framework |
Authors | Michael Shum, Stephan Zheng, Wojciech Kryściński, Caiming Xiong, Richard Socher |
Abstract | Human-like chit-chat conversation requires agents to generate responses that are fluent, engaging and consistent. We propose Sketch-Fill-A-R, a framework that uses a persona-memory to generate chit-chat responses in three phases. First, it generates dynamic sketch responses with open slots. Second, it generates candidate responses by filling slots with parts of its stored persona traits. Lastly, it ranks and selects the final response via a language model score. Sketch-Fill-A-R outperforms a state-of-the-art baseline both quantitatively (10-point lower perplexity) and qualitatively (preferred by 55% heads-up in single-turn and 20% higher in consistency in multi-turn user studies) on the Persona-Chat dataset. Finally, we extensively analyze Sketch-Fill-A-R’s responses and human feedback, and show it is more consistent and engaging by using more relevant responses and questions. |
Tasks | Language Modelling |
Published | 2019-10-28 |
URL | https://arxiv.org/abs/1910.13008v1 |
PDF | https://arxiv.org/pdf/1910.13008v1.pdf |
PWC | https://paperswithcode.com/paper/sketch-fill-a-r-a-persona-grounded-chit-chat |
Repo | |
Framework | |
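A schematic of the three phases described above, with hypothetical helper callables standing in for the paper's learned components (sketch decoder, language model): generate a sketch response with open slots, fill the slots from stored persona traits, and rank the candidates by LM score. The `@persona` slot token and the single-slot simplification are assumptions for illustration.

```python
def sketch_fill_rank(context, persona_traits, sketch_decoder, lm_score):
    # Phase 1: dynamic sketch with an open slot, e.g. "i love @persona !"
    sketch = sketch_decoder(context)
    # Phase 2: candidate responses from filling the slot with persona traits.
    candidates = [sketch.replace("@persona", trait) for trait in persona_traits]
    # Phase 3: select the final response by language-model score.
    return max(candidates, key=lm_score)
```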
A Network-centric Framework for Auditing Recommendation Systems
Title | A Network-centric Framework for Auditing Recommendation Systems |
Authors | Abhisek Dash, Animesh Mukherjee, Saptarshi Ghosh |
Abstract | To improve the experience of consumers, all social media, commerce and entertainment sites deploy Recommendation Systems (RSs) that aim to help users locate interesting content. These RSs are black boxes: the way a chunk of information is filtered out and served to a user from a large information base is mostly opaque. No one except the parent company generally has access to the full information required for auditing these systems; neither the details of the algorithm nor the user-item interactions are ever made publicly available to third-party auditors. Hence, auditing RSs remains an important challenge, especially with recent concerns about how RSs are affecting the views of society at large, with new technical jargon like “echo chambers”, “confirmation biases”, “filter bubbles”, etc. in place. Many prior works have evaluated different properties of RSs, such as diversity, novelty, etc. However, most of these have focused on evaluating static snapshots of RSs. Today, auditors are interested not only in these static evaluations of a snapshot of the system, but also in how these systems affect society over time. In this work, we propose a novel network-centric framework which is able to quantify not only various static properties of RSs, but also dynamic properties such as how likely RSs are to lead to polarization or segregation of information among their users. We apply the framework to several popular movie RSs to demonstrate its utility. |
Tasks | Recommendation Systems |
Published | 2019-02-07 |
URL | http://arxiv.org/abs/1902.02710v2 |
PDF | http://arxiv.org/pdf/1902.02710v2.pdf |
PWC | https://paperswithcode.com/paper/a-network-centric-framework-for-auditing |
Repo | |
Framework | |
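An illustrative fragment of the network-centric view above: treat items as nodes and "recommended from" links as edges, then measure how often recommendation edges stay within a node attribute group (e.g., genre) as a crude segregation proxy. The metric choice here is an assumption for illustration, not the paper's exact formulation.

```python
import networkx as nx

def within_group_fraction(G, attr="genre"):
    """Fraction of recommendation edges that stay inside one attribute group."""
    edges = list(G.edges())
    same = sum(1 for u, v in edges if G.nodes[u][attr] == G.nodes[v][attr])
    return same / len(edges) if edges else 0.0

# Example (hypothetical movie RS graph):
# G = nx.DiGraph()
# G.add_node("A", genre="action"); G.add_node("B", genre="action")
# G.add_edge("A", "B")   # item A's page recommends item B
# within_group_fraction(G)  # -> 1.0, fully segregated by genre
```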
Towards Explainable Neural-Symbolic Visual Reasoning
Title | Towards Explainable Neural-Symbolic Visual Reasoning |
Authors | Adrien Bennetot, Jean-Luc Laurent, Raja Chatila, Natalia Díaz-Rodríguez |
Abstract | Many high-performance models suffer from a lack of interpretability. There has been an increasing influx of work on explainable artificial intelligence (XAI) in order to disentangle what is meant and expected by XAI. Nevertheless, there is no general consensus on how to produce and judge explanations. In this paper, we discuss why techniques integrating connectionist and symbolic paradigms are the most efficient solutions to produce explanations for non-technical users and we propose a reasoning model, based on definitions by Doran et al. [2017] (arXiv:1710.00794) to explain a neural network’s decision. We use this explanation in order to correct bias in the network’s decision rationale. We accompany this model with an example of its potential use, based on the image captioning method in Burns et al. [2018] (arXiv:1803.09797). |
Tasks | Image Captioning, Visual Reasoning |
Published | 2019-09-19 |
URL | https://arxiv.org/abs/1909.09065v2 |
PDF | https://arxiv.org/pdf/1909.09065v2.pdf |
PWC | https://paperswithcode.com/paper/highlighting-bias-with-explainable-neural |
Repo | |
Framework | |
Distance Map Loss Penalty Term for Semantic Segmentation
Title | Distance Map Loss Penalty Term for Semantic Segmentation |
Authors | Francesco Caliva, Claudia Iriondo, Alejandro Morales Martinez, Sharmila Majumdar, Valentina Pedoia |
Abstract | Convolutional neural networks for semantic segmentation suffer from low performance at object boundaries. In medical imaging, accurate representation of tissue surfaces and volumes is important for tracking of disease biomarkers such as tissue morphology and shape features. In this work, we propose a novel distance map derived loss penalty term for semantic segmentation. We propose to use distance maps, derived from ground truth masks, to create a penalty term, guiding the network’s focus towards hard-to-segment boundary regions. We investigate the effects of this penalizing factor against cross-entropy, Dice, and focal loss, among others, evaluating performance on a 3D MRI bone segmentation task from the publicly available Osteoarthritis Initiative dataset. We observe a significant improvement in the quality of segmentation, with better shape preservation at bone boundaries and areas affected by partial volume. We ultimately aim to use our loss penalty term to improve the extraction of shape biomarkers and derive metrics to quantitatively evaluate the preservation of shape. |
Tasks | Semantic Segmentation |
Published | 2019-08-10 |
URL | https://arxiv.org/abs/1908.03679v1 |
PDF | https://arxiv.org/pdf/1908.03679v1.pdf |
PWC | https://paperswithcode.com/paper/distance-map-loss-penalty-term-for-semantic |
Repo | |
Framework | |
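A minimal sketch of the penalty term described above: derive a distance map from the ground-truth mask with a Euclidean distance transform and use it to up-weight the per-pixel cross-entropy near object boundaries. The exact weighting function is an illustrative assumption; the paper investigates the penalty against several base losses.

```python
import numpy as np
import torch
import torch.nn.functional as F
from scipy.ndimage import distance_transform_edt

def boundary_weights(mask):
    """Per-pixel weights that are highest near the boundary of a binary mask."""
    d_in = distance_transform_edt(mask)       # distance to background
    d_out = distance_transform_edt(1 - mask)  # distance to foreground
    dist = d_in + d_out                       # distance to the boundary
    return 1.0 + 1.0 / (1.0 + dist)           # decays away from the boundary

def penalized_ce(logits, target, weights):
    # logits: (B, C, H, W); target: (B, H, W) int64; weights: (B, H, W) array.
    ce = F.cross_entropy(logits, target, reduction="none")
    w = torch.as_tensor(weights, dtype=ce.dtype)
    return (ce * w).mean()
```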
Large-scale representation learning from visually grounded untranscribed speech
Title | Large-scale representation learning from visually grounded untranscribed speech |
Authors | Gabriel Ilharco, Yuan Zhang, Jason Baldridge |
Abstract | Systems that can associate images with their spoken audio captions are an important step towards visually grounded language learning. We describe a scalable method to automatically generate diverse audio for image captioning datasets. This supports pretraining deep networks for encoding both audio and images, which we do via a dual encoder that learns to align latent representations from both modalities. We show that a masked margin softmax loss for such models is superior to the standard triplet loss. We fine-tune these models on the Flickr8k Audio Captions Corpus and obtain state-of-the-art results—improving recall in the top 10 from 29.6% to 49.5%. We also obtain human ratings on retrieval outputs to better assess the impact of incidentally matching image-caption pairs that were not associated in the data, finding that automatic evaluation substantially underestimates the quality of the retrieved results. |
Tasks | Image Captioning, Representation Learning |
Published | 2019-09-19 |
URL | https://arxiv.org/abs/1909.08782v1 |
PDF | https://arxiv.org/pdf/1909.08782v1.pdf |
PWC | https://paperswithcode.com/paper/large-scale-representation-learning-from |
Repo | |
Framework | |
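A simplified sketch of the dual-encoder training objective described above: score all image/audio pairs in a batch, subtract a margin from the matching pairs' logits, and apply softmax cross-entropy in both retrieval directions. This deliberately omits the masking of incidentally matching pairs that the full masked margin softmax handles, and the margin value is an assumption.

```python
import torch
import torch.nn.functional as F

def margin_softmax_loss(img_emb, aud_emb, margin=0.2):
    # img_emb, aud_emb: (B, D), assumed L2-normalized; row i of each matches.
    logits = img_emb @ aud_emb.t()                         # (B, B) similarities
    eye = torch.eye(logits.size(0), device=logits.device)
    logits = logits - margin * eye                         # margin on positives
    labels = torch.arange(logits.size(0), device=logits.device)
    # Symmetric loss: image-to-audio and audio-to-image retrieval.
    return F.cross_entropy(logits, labels) + F.cross_entropy(logits.t(), labels)
```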