Paper Group ANR 360
On Learning Disentangled Representations for Gait Recognition
Title | On Learning Disentangled Representations for Gait Recognition |
Authors | Ziyuan Zhang, Luan Tran, Feng Liu, Xiaoming Liu |
Abstract | Gait, the walking pattern of individuals, is one of the important biometric modalities. Most existing gait recognition methods take silhouettes or articulated body models as gait features. These methods suffer from degraded recognition performance when handling confounding variables such as clothing, carrying, and viewing angle. To remedy this issue, we propose a novel AutoEncoder framework, GaitNet, to explicitly disentangle appearance, canonical, and pose features from RGB imagery. An LSTM integrates pose features over time into a dynamic gait feature, while canonical features are averaged into a static gait feature. Both are utilized as classification features. In addition, we collect a Frontal-View Gait (FVG) dataset to focus on gait recognition from frontal-view walking, which is a challenging problem since it contains minimal gait cues compared to other views. FVG also includes other important variations, e.g., walking speed, carrying, and clothing. With extensive experiments on the CASIA-B, USF, and FVG datasets, our method demonstrates superior performance to the state of the art quantitatively, the ability of feature disentanglement qualitatively, and promising computational efficiency. We further compare our GaitNet with state-of-the-art face recognition to demonstrate the advantages of gait biometrics identification under certain scenarios, e.g., long distance/low resolutions and cross viewing angles. |
Tasks | Face Recognition, Gait Recognition |
Published | 2019-09-05 |
URL | https://arxiv.org/abs/1909.03051v1 |
PDF | https://arxiv.org/pdf/1909.03051v1.pdf |
PWC | https://paperswithcode.com/paper/on-learning-disentangled-representations-for |
Repo | |
Framework | |
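A minimal sketch (not the authors' code) of the aggregation step the GaitNet abstract describes: per-frame pose features are integrated by an LSTM into a dynamic gait feature, per-frame canonical features are averaged into a static one, and the two are concatenated for classification. All dimensions and module names are illustrative assumptions.

```python
import torch
import torch.nn as nn

class GaitAggregator(nn.Module):
    def __init__(self, pose_dim=64, canon_dim=128, hidden=128, n_ids=100):
        super().__init__()
        self.lstm = nn.LSTM(pose_dim, hidden, batch_first=True)
        self.classifier = nn.Linear(hidden + canon_dim, n_ids)

    def forward(self, pose_seq, canon_seq):
        # pose_seq:  (B, T, pose_dim)  per-frame pose features
        # canon_seq: (B, T, canon_dim) per-frame canonical features
        _, (h, _) = self.lstm(pose_seq)       # dynamic gait feature
        dynamic = h[-1]                       # (B, hidden)
        static = canon_seq.mean(dim=1)        # (B, canon_dim) averaged
        return self.classifier(torch.cat([dynamic, static], dim=1))
```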
Overcoming Long-term Catastrophic Forgetting through Adversarial Neural Pruning and Synaptic Consolidation
Title | Overcoming Long-term Catastrophic Forgetting through Adversarial Neural Pruning and Synaptic Consolidation |
Authors | Jian Peng, Bo Tang, Hao Jiang, Zhuo Li, Yinjie Lei, Tao Lin, Haifeng Li |
Abstract | Enabling a neural network to sequentially learn multiple tasks is of great significance for expanding the applicability of neural networks in realistic human application scenarios. However, as the task sequence grows, the model quickly forgets previously learned skills; we refer to this loss of memory over long sequences as long-term catastrophic forgetting. There are two main causes of long-term forgetting: first, as tasks accumulate, the intersection of the low-error parameter subspaces satisfying these tasks becomes smaller and smaller, or even non-existent; second, errors accumulate in the process of protecting the knowledge of previous tasks. In this paper, we propose an adversarial mechanism in which neural pruning and synaptic consolidation are used to overcome long-term catastrophic forgetting. The mechanism distills task-related knowledge into a small number of parameters and retains old knowledge by consolidating those parameters, while sparing most parameters to learn follow-up tasks; this not only avoids forgetting but also allows a large number of tasks to be learned. Specifically, neural pruning iteratively relaxes the parameter constraints of the current task to expand the common parameter subspace across tasks, and the modified synaptic consolidation strategy comprises two components: a novel measurement that takes network structure information into account to calculate parameter importance, and an element-wise parameter-updating strategy designed to prevent significant parameters from being overwritten in subsequent learning. We verify the method on image classification, and the results show that our proposed ANPSC approach outperforms state-of-the-art methods. A hyperparameter sensitivity test further demonstrates the robustness of our proposed approach. |
Tasks | Image Classification |
Published | 2019-12-19 |
URL | https://arxiv.org/abs/1912.09091v1 |
PDF | https://arxiv.org/pdf/1912.09091v1.pdf |
PWC | https://paperswithcode.com/paper/overcoming-long-term-catastrophic-forgetting |
Repo | |
Framework | |
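A rough sketch of the synaptic-consolidation side of the ANPSC idea above: a quadratic penalty, weighted by per-parameter importance, discourages overwriting parameters that mattered for earlier tasks. The importance measure itself (which the paper derives in part from network structure) is left abstract; `importance` and `old_params` are assumed inputs, and the penalty form is the generic EWC-style one, not necessarily the paper's exact formulation.

```python
import torch

def consolidation_penalty(model, old_params, importance, lam=1.0):
    """Sum of lam * Omega_i * (theta_i - theta_i_old)^2 over all parameters.

    old_params: dict of detached parameter snapshots after the previous task.
    importance: dict of same-shaped tensors Omega (per-parameter importance).
    """
    penalty = torch.tensor(0.0)
    for name, p in model.named_parameters():
        penalty = penalty + (importance[name] * (p - old_params[name]) ** 2).sum()
    return lam * penalty

# Usage during task t: total_loss = task_loss + consolidation_penalty(...)
```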
A Graph-Based Machine Learning Approach for Bot Detection
Title | A Graph-Based Machine Learning Approach for Bot Detection |
Authors | Abbas Abou Daya, Mohammad A. Salahuddin, Noura Limam, Raouf Boutaba |
Abstract | Bot detection using machine learning (ML), with network flow-level features, has been extensively studied in the literature. However, existing flow-based approaches typically incur a high computational overhead and do not completely capture the network communication patterns, which can expose additional aspects of malicious hosts. Recently, bot detection systems which leverage communication graph analysis using ML have gained attention to overcome these limitations. A graph-based approach is rather intuitive, as graphs are true representations of network communications. In this paper, we propose a two-phased, graph-based bot detection system which leverages both unsupervised and supervised ML. The first phase prunes presumably benign hosts, while the second phase achieves bot detection with high precision. Our system detects multiple types of bots and is robust to zero-day attacks. It also accommodates different network topologies and is suitable for large-scale data. |
Tasks | |
Published | 2019-02-22 |
URL | http://arxiv.org/abs/1902.08538v1 |
PDF | http://arxiv.org/pdf/1902.08538v1.pdf |
PWC | https://paperswithcode.com/paper/a-graph-based-machine-learning-approach-for |
Repo | |
Framework | |
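An illustrative two-phase pipeline in the spirit of the abstract above (not the authors' implementation): phase 1 clusters hosts on graph-derived features and discards the large clusters, on the assumption that they are benign-dominated; phase 2 trains a supervised classifier on the remaining, suspicious hosts. The clusterer, classifier, and `keep_fraction` heuristic are all stand-in assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestClassifier

def two_phase_bot_detection(X, y, n_clusters=8, keep_fraction=0.5):
    # X: graph features per host (e.g., in/out-degree, centrality); y: labels.
    # Phase 1: prune hosts that fall in the largest (presumably benign) clusters.
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(X)
    sizes = np.bincount(labels, minlength=n_clusters)
    suspicious = np.argsort(sizes)[: int(n_clusters * keep_fraction)]  # smallest
    mask = np.isin(labels, suspicious)
    # Phase 2: high-precision supervised detection on the pruned host set.
    clf = RandomForestClassifier(n_estimators=200).fit(X[mask], y[mask])
    return clf, mask
```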
When Collaborative Filtering Meets Reinforcement Learning
Title | When Collaborative Filtering Meets Reinforcement Learning |
Authors | Yu Lei, Wenjie Li |
Abstract | In this paper, we study a multi-step interactive recommendation problem, where the item recommended at the current step may affect the quality of future recommendations. To address the problem, we develop a novel and effective approach, named CFRL, which seamlessly integrates the ideas of both collaborative filtering (CF) and reinforcement learning (RL). More specifically, we first model the recommender-user interactive recommendation problem as an agent-environment RL task, which is mathematically described by a Markov decision process (MDP). Further, to achieve collaborative recommendations for the entire user community, we propose a novel CF-based MDP by encoding the states of all users into a shared latent vector space. Finally, we propose an effective Q-network learning method to learn the agent’s optimal policy based on the CF-based MDP. The capability of CFRL is demonstrated by comparing its performance against a variety of existing methods on real-world datasets. |
Tasks | |
Published | 2019-02-02 |
URL | http://arxiv.org/abs/1902.00715v2 |
PDF | http://arxiv.org/pdf/1902.00715v2.pdf |
PWC | https://paperswithcode.com/paper/when-collaborative-filtering-meets |
Repo | |
Framework | |
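A minimal sketch of the Q-learning setup the CFRL abstract describes: user states live in a shared latent space, and a Q-network scores all candidate items (actions) given the current state. The shapes, architecture, and TD-target form are standard Q-learning assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    def __init__(self, state_dim=32, n_items=1000, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(), nn.Linear(hidden, n_items)
        )

    def forward(self, state):      # state: (B, state_dim) shared latent states
        return self.net(state)     # Q-values for every candidate item

def td_target(reward, next_state, qnet, gamma=0.9):
    # One-step bootstrapped target for fitting Q(s, a) with a squared loss.
    with torch.no_grad():
        return reward + gamma * qnet(next_state).max(dim=1).values
```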
Inferring Dynamic Representations of Facial Actions from a Still Image
Title | Inferring Dynamic Representations of Facial Actions from a Still Image |
Authors | Siyang Song, Enrique Sánchez-Lozano, Linlin Shen, Alan Johnston, Michel Valstar |
Abstract | Facial actions are spatio-temporal signals by nature, and therefore their modeling is crucially dependent on the availability of temporal information. In this paper, we focus on inferring such temporal dynamics of facial actions when no explicit temporal information is available, i.e., from still images. We present a novel approach to capture multiple scales of such temporal dynamics, with an application to facial Action Unit (AU) intensity estimation and dimensional affect estimation. In particular, 1) we propose a framework that infers a dynamic representation (DR) from a still image, which captures the bi-directional flow of time within a short time window centered at the input image; 2) we show that we can train our method without the need to explicitly generate target representations, allowing the network to represent dynamics more broadly; and 3) we propose to apply a multiple-temporal-scale approach that infers DRs for different window lengths (MDR) from a still image. We empirically validate the value of our approach on the task of frame ranking, and show how our proposed MDR attains state-of-the-art results on BP4D for AU intensity estimation and on SEMAINE for dimensional affect estimation, using only still images at test time. |
Tasks | |
Published | 2019-04-04 |
URL | http://arxiv.org/abs/1904.02382v1 |
PDF | http://arxiv.org/pdf/1904.02382v1.pdf |
PWC | https://paperswithcode.com/paper/inferring-dynamic-representations-of-facial |
Repo | |
Framework | |
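A schematic sketch of the multiple-temporal-scale idea above: one shared image encoder feeds a separate head per window length, each regressing a dynamic representation for its scale. The encoder, dimensions, and window lengths are illustrative assumptions; the paper's training signal (frame ranking without explicit targets) is not reproduced here.

```python
import torch.nn as nn

class MDRHeads(nn.Module):
    def __init__(self, feat_dim=512, dr_dim=128, window_lengths=(3, 5, 9)):
        super().__init__()
        # One regression head per temporal window length.
        self.heads = nn.ModuleDict(
            {str(w): nn.Linear(feat_dim, dr_dim) for w in window_lengths}
        )

    def forward(self, feats):
        # feats: (B, feat_dim) features extracted from a single still image.
        return {w: head(feats) for w, head in self.heads.items()}
```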
Indexing Graph Search Trees and Applications
Title | Indexing Graph Search Trees and Applications |
Authors | Sankardeep Chakraborty, Kunihiko Sadakane |
Abstract | We consider the problem of compactly representing the Depth First Search (DFS) tree of a given undirected or directed graph having $n$ vertices and $m$ edges, while supporting various DFS-related queries efficiently in the RAM model with logarithmic word size. We study this problem in two well-known models: the indexing and encoding models. While most of these queries can be supported easily in constant time using $O(n \lg n)$ bits of extra space (we use $\lg$ to denote the logarithm to base $2$), our goal here is, more specifically, to beat this trivial $O(n \lg n)$-bit space bound, yet not compromise too much on the running time of these queries. In the indexing model, the space bound of our solution involves the quantity $m$; hence, we obtain different bounds for sparse and dense graphs respectively. In the encoding model, we first give a space lower bound, followed by an almost optimal data structure with extremely fast query time. Central to our algorithm is a partitioning of the DFS tree into connected subtrees, and a compact way to store these connections. Finally, we also apply these techniques to compactly index the shortest path structure and biconnectivity structures, among others. |
Tasks | |
Published | 2019-06-19 |
URL | https://arxiv.org/abs/1906.07871v1 |
PDF | https://arxiv.org/pdf/1906.07871v1.pdf |
PWC | https://paperswithcode.com/paper/indexing-graph-search-trees-and-applications |
Repo | |
Framework | |
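For orientation, a plain-Python sketch of the trivial baseline the abstract mentions: storing each vertex's DFS-tree parent answers parent queries in $O(1)$ time but takes $O(n \lg n)$ bits, which is exactly the space bound the paper's data structures beat. The graph is an adjacency-list dict; the iterative DFS avoids recursion limits.

```python
def dfs_parents(adj, root=0):
    """Return DFS-tree parent pointers for the component containing root."""
    parent = {}
    stack = [(root, None)]
    while stack:
        u, p = stack.pop()
        if u in parent:          # already visited
            continue
        parent[u] = p            # u is popped: p is its DFS-tree parent
        for v in adj[u]:
            if v not in parent:
                stack.append((v, u))
    return parent

# Example: dfs_parents({0: [1, 2], 1: [0, 2], 2: [0, 1]})
```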
Mapping road safety features from streetview imagery: A deep learning approach
Title | Mapping road safety features from streetview imagery: A deep learning approach |
Authors | Arpan Sainju, Zhe Jiang |
Abstract | Each year, around 6 million car accidents occur in the U.S. on average. Road safety features (e.g., concrete barriers, metal crash barriers, rumble strips) play an important role in preventing or mitigating vehicle crashes. Accurate maps of road safety features are an important component of safety management systems for federal and state transportation agencies, helping traffic engineers identify locations to invest in safety infrastructure. In current practice, mapping road safety features is largely done manually (e.g., observations on the road or visual interpretation of streetview imagery), which is both expensive and time-consuming. In this paper, we propose a deep learning approach to automatically map road safety features from streetview imagery. Unlike existing Convolutional Neural Networks (CNNs) that classify each image individually, we propose to add a Recurrent Neural Network (Long Short-Term Memory) to capture the geographic context of images (the spatial autocorrelation effect along linear road network paths). Evaluations on real-world streetview imagery show that our proposed model outperforms several baseline methods. |
Tasks | |
Published | 2019-07-15 |
URL | https://arxiv.org/abs/1907.12647v1 |
PDF | https://arxiv.org/pdf/1907.12647v1.pdf |
PWC | https://paperswithcode.com/paper/mapping-road-safety-features-from-streetview |
Repo | |
Framework | |
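A compact sketch of the CNN-plus-LSTM architecture the abstract describes: a CNN encodes each streetview image, and an LSTM runs along the road-network sequence of images to exploit spatial autocorrelation. The ResNet-18 backbone, bidirectionality, and sizes are illustrative assumptions, not necessarily the authors' choices.

```python
import torch.nn as nn
from torchvision import models

class RoadFeatureTagger(nn.Module):
    def __init__(self, hidden=256, n_classes=4):
        super().__init__()
        cnn = models.resnet18(weights=None)
        cnn.fc = nn.Identity()          # expose the 512-d pooled features
        self.cnn = cnn
        self.lstm = nn.LSTM(512, hidden, batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, n_classes)

    def forward(self, images):
        # images: (B, T, 3, H, W), T consecutive images along a road path.
        b, t = images.shape[:2]
        feats = self.cnn(images.flatten(0, 1)).view(b, t, -1)
        out, _ = self.lstm(feats)       # geographic context across the path
        return self.head(out)           # per-image safety-feature scores
```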
Recovery of Future Data via Convolution Nuclear Norm Minimization
Title | Recovery of Future Data via Convolution Nuclear Norm Minimization |
Authors | Guangcan Liu, Wayne Zhang |
Abstract | This paper is about recovering unseen future data from a given sequence of historical samples, referred to as future data recovery—a significant problem closely related to time series forecasting. For the first time, we study the problem from the perspective of tensor completion. Namely, we convert future data recovery into a more inclusive problem called sequential tensor completion (STC), which is to recover a tensor of sequential structure from entries sampled arbitrarily from the tensor. Unlike the ordinary tensor completion (OTC) problem studied in the majority of the literature, STC has the distinctive setup that the target tensor is sequential and not permutable, which means that the target owns rich spatio-temporal structure. This enables the possibility of restoring arbitrarily selected missing entries, which is not possible under the framework of OTC. We then propose two methods to address STC: Discrete Fourier Transform based $\ell_1$ minimization ($\mathrm{DFT}_{\ell_1}$) and Convolution Nuclear Norm Minimization (CNNM), where $\mathrm{DFT}_{\ell_1}$ is in fact a special case of CNNM. Whenever the target is low-rank in some convolution domain, CNNM provably succeeds in solving STC. This immediately implies that $\mathrm{DFT}_{\ell_1}$ is also successful when the Fourier transform of the target is sparse, as convolution low-rankness is a generalization of Fourier sparsity. Experiments on univariate time series, images, and videos show encouraging results. |
Tasks | Time Series, Time Series Forecasting |
Published | 2019-09-06 |
URL | https://arxiv.org/abs/1909.03889v3 |
PDF | https://arxiv.org/pdf/1909.03889v3.pdf |
PWC | https://paperswithcode.com/paper/recovery-of-future-data-via-convolution |
Repo | |
Framework | |
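A toy sketch of the $\mathrm{DFT}_{\ell_1}$ idea from the abstract: alternate between enforcing the observed entries and soft-thresholding the signal's Fourier coefficients, which promotes Fourier sparsity. This is a generic proximal-style iteration for illustration, not the paper's exact algorithm or its guarantees; for future data recovery, the mask would be False on the trailing (future) samples.

```python
import numpy as np

def dft_l1_complete(x_obs, mask, tau=0.1, n_iters=200):
    """Estimate missing entries of a 1-D sequence assumed Fourier-sparse.

    x_obs: observed values (arbitrary where mask is False).
    mask:  boolean array, True at observed positions.
    """
    x = np.where(mask, x_obs, 0.0).astype(float)
    for _ in range(n_iters):
        c = np.fft.fft(x)
        mag = np.abs(c)
        # Complex soft-thresholding: shrink magnitudes by tau, keep phases.
        scale = np.where(mag > tau, (mag - tau) / np.maximum(mag, 1e-12), 0.0)
        x = np.fft.ifft(c * scale).real
        x[mask] = x_obs[mask]          # re-impose the observed samples
    return x                           # missing entries now estimated
```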
Reviewing Data Access Patterns and Computational Redundancy for Machine Learning Algorithms
Title | Reviewing Data Access Patterns and Computational Redundancy for Machine Learning Algorithms |
Authors | Imen Chakroun, Tom Vander Aa, Tom Ashby |
Abstract | Machine learning (ML) is probably the foremost technique used to deal with the size and complexity of the new generation of data. In this paper, we analyze one means of increasing the performance of ML algorithms: exploiting data locality. Data locality and access patterns are often at the heart of performance issues in computing systems, due to the use of certain hardware techniques to improve performance. Altering access patterns to increase locality can dramatically increase the performance of a given algorithm. Moreover, repeated data accesses can be seen as redundancy in data movement. Similarly, there can also be redundancy in the repetition of calculations. This work also identifies some of the opportunities for avoiding these redundancies by directly reusing computation results. We document the possibilities of such reuse in selected machine learning algorithms and give initial indicative results from our first experiments on data access improvement and algorithm redesign. |
Tasks | |
Published | 2019-04-25 |
URL | https://arxiv.org/abs/1904.11203v3 |
PDF | https://arxiv.org/pdf/1904.11203v3.pdf |
PWC | https://paperswithcode.com/paper/reviewing-data-access-patterns-and |
Repo | |
Framework | |
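A small illustration of the locality idea discussed above: processing a large matrix in cache-sized tiles keeps each operand resident while it is reused, cutting memory traffic. The tile size is an assumption to tune per machine, and NumPy's underlying BLAS already does this internally for `X @ X.T`, so the sketch is purely didactic.

```python
import numpy as np

def tiled_gram(X, tile=256):
    """Compute the Gram matrix X @ X.T one cache-friendly tile at a time."""
    n = X.shape[0]
    G = np.zeros((n, n))
    for i in range(0, n, tile):
        Xi = X[i:i + tile]                 # row-tile loaded once, reused below
        for j in range(0, n, tile):
            G[i:i + tile, j:j + tile] = Xi @ X[j:j + tile].T
    return G
```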
Smile, Be Happy :) Emoji Embedding for Visual Sentiment Analysis
Title | Smile, Be Happy :) Emoji Embedding for Visual Sentiment Analysis |
Authors | Ziad Al-Halah, Andrew Aitken, Wenzhe Shi, Jose Caballero |
Abstract | Due to the lack of large-scale datasets, the prevailing approach in visual sentiment analysis is to leverage models trained for object classification on large datasets like ImageNet. However, objects are sentiment-neutral, which hinders the expected gain of transfer learning for such tasks. In this work, we propose to overcome this problem by learning a novel sentiment-aligned image embedding that is better suited for subsequent visual sentiment analysis. Our embedding leverages the intricate relation between emojis and images in large-scale and readily available data from social media. Emojis are language-agnostic, consistent, and carry a clear sentiment signal, which makes them an excellent proxy for learning a sentiment-aligned embedding. Hence, we construct a novel dataset of 4 million images collected from Twitter with their associated emojis. We train a deep neural model for image embedding using the emoji prediction task as a proxy. Our evaluation demonstrates that the proposed embedding outperforms the popular object-based counterpart consistently across several sentiment analysis benchmarks. Furthermore, without bells and whistles, our compact, effective and simple embedding outperforms the more elaborate and customized state-of-the-art deep models on these public benchmarks. Additionally, we introduce a novel emoji representation based on their visual emotional response, which supports a deeper understanding of the emoji modality and its usage on social media. |
Tasks | Object Classification, Sentiment Analysis, Transfer Learning |
Published | 2019-07-14 |
URL | https://arxiv.org/abs/1907.06160v2 |
PDF | https://arxiv.org/pdf/1907.06160v2.pdf |
PWC | https://paperswithcode.com/paper/smile-be-happy-emoji-embedding-for-visual |
Repo | |
Framework | |
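A bare-bones sketch of the proxy task described above: train an image encoder to predict which emoji accompanied the image, then discard the head and reuse the penultimate features as a sentiment-aligned embedding. The ResNet-18 backbone and the number of emoji classes are illustrative assumptions.

```python
import torch.nn as nn
from torchvision import models

def build_emoji_proxy_model(n_emojis=64):
    backbone = models.resnet18(weights=None)
    # ResNet-18's pooled features are 512-d; replace the classifier head
    # with an emoji-prediction head trained via cross-entropy on
    # (image, emoji) pairs.
    backbone.fc = nn.Linear(512, n_emojis)
    return backbone

# After training, set model.fc = nn.Identity() and use the 512-d pooled
# features as the sentiment-aligned image embedding.
```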
Sketch-Fill-A-R: A Persona-Grounded Chit-Chat Generation Framework
Title | Sketch-Fill-A-R: A Persona-Grounded Chit-Chat Generation Framework |
Authors | Michael Shum, Stephan Zheng, Wojciech Kryściński, Caiming Xiong, Richard Socher |
Abstract | Human-like chit-chat conversation requires agents to generate responses that are fluent, engaging and consistent. We propose Sketch-Fill-A-R, a framework that uses a persona-memory to generate chit-chat responses in three phases. First, it generates dynamic sketch responses with open slots. Second, it generates candidate responses by filling slots with parts of its stored persona traits. Lastly, it ranks and selects the final response via a language model score. Sketch-Fill-A-R outperforms a state-of-the-art baseline both quantitatively (10-point lower perplexity) and qualitatively (preferred by 55% heads-up in single-turn and 20% higher in consistency in multi-turn user studies) on the Persona-Chat dataset. Finally, we extensively analyze Sketch-Fill-A-R’s responses and human feedback, and show it is more consistent and engaging by using more relevant responses and questions. |
Tasks | Language Modelling |
Published | 2019-10-28 |
URL | https://arxiv.org/abs/1910.13008v1 |
PDF | https://arxiv.org/pdf/1910.13008v1.pdf |
PWC | https://paperswithcode.com/paper/sketch-fill-a-r-a-persona-grounded-chit-chat |
Repo | |
Framework | |
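A schematic of the three phases described above, with hypothetical helper callables standing in for the paper's learned components (sketch decoder, language model): generate a sketch response with open slots, fill the slots from stored persona traits, and rank the candidates by LM score. The `@persona` slot token and the single-slot simplification are assumptions for illustration.

```python
def sketch_fill_rank(context, persona_traits, sketch_decoder, lm_score):
    # Phase 1: dynamic sketch with an open slot, e.g. "i love @persona !"
    sketch = sketch_decoder(context)
    # Phase 2: candidate responses from filling the slot with persona traits.
    candidates = [sketch.replace("@persona", trait) for trait in persona_traits]
    # Phase 3: select the final response by language-model score.
    return max(candidates, key=lm_score)
```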
A Network-centric Framework for Auditing Recommendation Systems
Title | A Network-centric Framework for Auditing Recommendation Systems |
Authors | Abhisek Dash, Animesh Mukherjee, Saptarshi Ghosh |
Abstract | To improve the experience of consumers, all social media, commerce and entertainment sites deploy Recommendation Systems (RSs) that aim to help users locate interesting content. These RSs are black boxes: the way a chunk of information is filtered out and served to a user from a large information base is mostly opaque. No one except the parent company generally has access to the full information required for auditing these systems; neither the details of the algorithm nor the user-item interactions are ever made publicly available to third-party auditors. Hence, auditing RSs remains an important challenge, especially with recent concerns about how RSs are affecting the views of society at large, with new technical jargon like “echo chambers”, “confirmation biases”, “filter bubbles”, etc. in place. Many prior works have evaluated different properties of RSs, such as diversity, novelty, etc. However, most of these have focused on evaluating static snapshots of RSs. Today, auditors are interested not only in these static evaluations of a snapshot of the system, but also in how these systems affect society over time. In this work, we propose a novel network-centric framework which is able to quantify not only various static properties of RSs, but also dynamic properties such as how likely RSs are to lead to polarization or segregation of information among their users. We apply the framework to several popular movie RSs to demonstrate its utility. |
Tasks | Recommendation Systems |
Published | 2019-02-07 |
URL | http://arxiv.org/abs/1902.02710v2 |
PDF | http://arxiv.org/pdf/1902.02710v2.pdf |
PWC | https://paperswithcode.com/paper/a-network-centric-framework-for-auditing |
Repo | |
Framework | |
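An illustrative fragment of the network-centric view above: treat items as nodes and "recommended from" links as edges, then measure how often recommendation edges stay within a node attribute group (e.g., genre) as a crude segregation proxy. The metric choice here is an assumption for illustration, not the paper's exact formulation.

```python
import networkx as nx

def within_group_fraction(G, attr="genre"):
    """Fraction of recommendation edges that stay inside one attribute group."""
    edges = list(G.edges())
    same = sum(1 for u, v in edges if G.nodes[u][attr] == G.nodes[v][attr])
    return same / len(edges) if edges else 0.0

# Example (hypothetical movie RS graph):
# G = nx.DiGraph()
# G.add_node("A", genre="action"); G.add_node("B", genre="action")
# G.add_edge("A", "B")   # item A's page recommends item B
# within_group_fraction(G)  # -> 1.0, fully segregated by genre
```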
Towards Explainable Neural-Symbolic Visual Reasoning
Title | Towards Explainable Neural-Symbolic Visual Reasoning |
Authors | Adrien Bennetot, Jean-Luc Laurent, Raja Chatila, Natalia Díaz-Rodríguez |
Abstract | Many high-performance models suffer from a lack of interpretability. There has been an increasing influx of work on explainable artificial intelligence (XAI) in order to disentangle what is meant and expected by XAI. Nevertheless, there is no general consensus on how to produce and judge explanations. In this paper, we discuss why techniques integrating connectionist and symbolic paradigms are the most efficient solutions to produce explanations for non-technical users and we propose a reasoning model, based on definitions by Doran et al. [2017] (arXiv:1710.00794) to explain a neural network’s decision. We use this explanation in order to correct bias in the network’s decision rationale. We accompany this model with an example of its potential use, based on the image captioning method in Burns et al. [2018] (arXiv:1803.09797). |
Tasks | Image Captioning, Visual Reasoning |
Published | 2019-09-19 |
URL | https://arxiv.org/abs/1909.09065v2 |
PDF | https://arxiv.org/pdf/1909.09065v2.pdf |
PWC | https://paperswithcode.com/paper/highlighting-bias-with-explainable-neural |
Repo | |
Framework | |
Distance Map Loss Penalty Term for Semantic Segmentation
Title | Distance Map Loss Penalty Term for Semantic Segmentation |
Authors | Francesco Caliva, Claudia Iriondo, Alejandro Morales Martinez, Sharmila Majumdar, Valentina Pedoia |
Abstract | Convolutional neural networks for semantic segmentation suffer from low performance at object boundaries. In medical imaging, accurate representation of tissue surfaces and volumes is important for tracking of disease biomarkers such as tissue morphology and shape features. In this work, we propose a novel distance map derived loss penalty term for semantic segmentation. We propose to use distance maps, derived from ground truth masks, to create a penalty term, guiding the network’s focus towards hard-to-segment boundary regions. We investigate the effects of this penalizing factor against cross-entropy, Dice, and focal loss, among others, evaluating performance on a 3D MRI bone segmentation task from the publicly available Osteoarthritis Initiative dataset. We observe a significant improvement in the quality of segmentation, with better shape preservation at bone boundaries and areas affected by partial volume. We ultimately aim to use our loss penalty term to improve the extraction of shape biomarkers and derive metrics to quantitatively evaluate the preservation of shape. |
Tasks | Semantic Segmentation |
Published | 2019-08-10 |
URL | https://arxiv.org/abs/1908.03679v1 |
PDF | https://arxiv.org/pdf/1908.03679v1.pdf |
PWC | https://paperswithcode.com/paper/distance-map-loss-penalty-term-for-semantic |
Repo | |
Framework | |
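A minimal sketch of the penalty term described above: derive a distance map from the ground-truth mask with a Euclidean distance transform and use it to up-weight the per-pixel cross-entropy near object boundaries. The exact weighting function is an illustrative assumption; the paper investigates the penalty against several base losses.

```python
import numpy as np
import torch
import torch.nn.functional as F
from scipy.ndimage import distance_transform_edt

def boundary_weights(mask):
    """Per-pixel weights that are highest near the boundary of a binary mask."""
    d_in = distance_transform_edt(mask)       # distance to background
    d_out = distance_transform_edt(1 - mask)  # distance to foreground
    dist = d_in + d_out                       # distance to the boundary
    return 1.0 + 1.0 / (1.0 + dist)           # decays away from the boundary

def penalized_ce(logits, target, weights):
    # logits: (B, C, H, W); target: (B, H, W) int64; weights: (B, H, W) array.
    ce = F.cross_entropy(logits, target, reduction="none")
    w = torch.as_tensor(weights, dtype=ce.dtype)
    return (ce * w).mean()
```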
Large-scale representation learning from visually grounded untranscribed speech
Title | Large-scale representation learning from visually grounded untranscribed speech |
Authors | Gabriel Ilharco, Yuan Zhang, Jason Baldridge |
Abstract | Systems that can associate images with their spoken audio captions are an important step towards visually grounded language learning. We describe a scalable method to automatically generate diverse audio for image captioning datasets. This supports pretraining deep networks for encoding both audio and images, which we do via a dual encoder that learns to align latent representations from both modalities. We show that a masked margin softmax loss for such models is superior to the standard triplet loss. We fine-tune these models on the Flickr8k Audio Captions Corpus and obtain state-of-the-art results—improving recall in the top 10 from 29.6% to 49.5%. We also obtain human ratings on retrieval outputs to better assess the impact of incidentally matching image-caption pairs that were not associated in the data, finding that automatic evaluation substantially underestimates the quality of the retrieved results. |
Tasks | Image Captioning, Representation Learning |
Published | 2019-09-19 |
URL | https://arxiv.org/abs/1909.08782v1 |
PDF | https://arxiv.org/pdf/1909.08782v1.pdf |
PWC | https://paperswithcode.com/paper/large-scale-representation-learning-from |
Repo | |
Framework | |
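A simplified sketch of the dual-encoder training objective described above: score all image/audio pairs in a batch, subtract a margin from the matching pairs' logits, and apply softmax cross-entropy in both retrieval directions. This deliberately omits the masking of incidentally matching pairs that the full masked margin softmax handles, and the margin value is an assumption.

```python
import torch
import torch.nn.functional as F

def margin_softmax_loss(img_emb, aud_emb, margin=0.2):
    # img_emb, aud_emb: (B, D), assumed L2-normalized; row i of each matches.
    logits = img_emb @ aud_emb.t()                         # (B, B) similarities
    eye = torch.eye(logits.size(0), device=logits.device)
    logits = logits - margin * eye                         # margin on positives
    labels = torch.arange(logits.size(0), device=logits.device)
    # Symmetric loss: image-to-audio and audio-to-image retrieval.
    return F.cross_entropy(logits, labels) + F.cross_entropy(logits.t(), labels)
```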