Paper Group AWR 437
Riemannian adaptive stochastic gradient algorithms on matrix manifolds
Title | Riemannian adaptive stochastic gradient algorithms on matrix manifolds |
Authors | Hiroyuki Kasai, Pratik Jawanpuria, Bamdev Mishra |
Abstract | Adaptive stochastic gradient algorithms in the Euclidean space have attracted much attention lately. Such explorations on Riemannian manifolds, on the other hand, are relatively new, limited, and challenging. This is because of the intrinsic non-linear structure of the underlying manifold and the absence of a canonical coordinate system. In machine learning applications, however, most manifolds of interest are represented as matrices with notions of row and column subspaces. In addition, the implicit manifold-related constraints may also lie on such subspaces. For example, the Grassmann manifold is the set of column subspaces. To this end, such a rich structure should not be lost by transforming matrices to just a stack of vectors while developing optimization algorithms on manifolds. We propose novel stochastic gradient algorithms for problems on Riemannian matrix manifolds by adapting the row and column subspaces of gradients. Our algorithms are provably convergent and they achieve the convergence rate of order $\mathcal{O}(\log (T)/\sqrt{T})$, where $T$ is the number of iterations. Our experiments illustrate the efficacy of the proposed algorithms on several applications. |
Tasks | |
Published | 2019-02-04 |
URL | https://arxiv.org/abs/1902.01144v5 |
https://arxiv.org/pdf/1902.01144v5.pdf | |
PWC | https://paperswithcode.com/paper/adaptive-stochastic-gradient-algorithms-on |
Repo | https://github.com/hiroyuki-kasai/RSOpt |
Framework | none |
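To make the manifold setting in the abstract concrete, here is a minimal sketch of plain Riemannian gradient descent on the unit sphere. It is not the authors' adaptive algorithm and does not use the RSOpt code: it is deterministic and non-adaptive. The cost function, the matrix `A`, the step size, and the function name are assumptions chosen purely for illustration. Each step projects the Euclidean gradient onto the tangent space and then retracts back to the sphere by normalisation.

```python
import math

def riemannian_gd_sphere(A, x, steps=200, lr=0.1):
    """Minimise f(x) = -x^T A x over the unit sphere (illustrative sketch only)."""
    n = len(x)
    for _ in range(steps):
        # Euclidean gradient of f at x: -2 A x (A is assumed symmetric).
        Ax = [sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
        egrad = [-2.0 * v for v in Ax]
        # Riemannian gradient: remove the component of egrad along x.
        inner = sum(xi * gi for xi, gi in zip(x, egrad))
        rgrad = [gi - inner * xi for xi, gi in zip(x, egrad)]
        # Take a step in the tangent direction, then retract by normalising.
        y = [xi - lr * ri for xi, ri in zip(x, rgrad)]
        norm = math.sqrt(sum(v * v for v in y))
        x = [v / norm for v in y]
    return x

# Hypothetical 2x2 example: the minimiser is the leading eigenvector of A.
A = [[2.0, 0.0], [0.0, 1.0]]
x_star = riemannian_gd_sphere(A, [0.6, 0.8])
```

The iterate stays on the sphere at every step, which is the constraint that a flat Euclidean update would violate.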
DoorGym: A Scalable Door Opening Environment And Baseline Agent
Title | DoorGym: A Scalable Door Opening Environment And Baseline Agent |
Authors | Yusuke Urakami, Alec Hodgkinson, Casey Carlin, Randall Leu, Luca Rigazio, Pieter Abbeel |
Abstract | Reinforcement Learning (RL) has brought forth ideas of autonomous robots that can navigate real-world environments with ease, aiding humans in a variety of tasks. RL agents have just begun to make their way out of simulation into the real world. Once in the real world, benchmark tasks often fail to transfer into useful skills. We introduce DoorGym, a simulation environment intended to be the first step to move RL from toy environments towards useful atomic skills that can be composed and extended towards a broader goal. DoorGym is an open-source door simulation framework designed to be highly configurable. We also provide a baseline PPO (Proximal Policy Optimization) and SAC (Soft Actor-Critic) implementation, which achieves a success rate of up to 70% for common tasks in this environment. Environment kit available here: https://github.com/PSVL/DoorGym/ |
Tasks | |
Published | 2019-08-05 |
URL | https://arxiv.org/abs/1908.01887v2 |
https://arxiv.org/pdf/1908.01887v2.pdf | |
PWC | https://paperswithcode.com/paper/doorgym-a-scalable-door-opening-environment |
Repo | https://github.com/PSVL/DoorGym |
Framework | pytorch |
Efficient Derivative Computation for Cumulative B-Splines on Lie Groups
Title | Efficient Derivative Computation for Cumulative B-Splines on Lie Groups |
Authors | Christiane Sommer, Vladyslav Usenko, David Schubert, Nikolaus Demmel, Daniel Cremers |
Abstract | Continuous-time trajectory representation has recently gained popularity for tasks where the fusion of high-frame-rate sensors and multiple unsynchronized devices is required. Lie group cumulative B-splines are a popular way of representing continuous trajectories without singularities. They have been used in near real-time SLAM and odometry systems with IMU, LiDAR, regular, RGB-D and event cameras, as well as for offline calibration. These applications require efficient computation of time derivatives (velocity, acceleration), but all prior works rely on a computationally suboptimal formulation. In this work we present an alternative derivation of time derivatives based on recurrence relations that needs $\mathcal{O}(k)$ instead of $\mathcal{O}(k^2)$ matrix operations (for a spline of order $k$) and results in simple and elegant expressions. While producing the same result, the proposed approach significantly speeds up the trajectory optimization and allows for computing simple analytic derivatives with respect to spline knots. The results presented in this paper pave the way for incorporating continuous-time trajectory representations into more applications where real-time performance is required. |
Tasks | Calibration |
Published | 2019-11-20 |
URL | https://arxiv.org/abs/1911.08860v1 |
https://arxiv.org/pdf/1911.08860v1.pdf | |
PWC | https://paperswithcode.com/paper/efficient-derivative-computation-for |
Repo | https://github.com/VladyslavUsenko/basalt-mirror |
Framework | none |
Robustness for Non-Parametric Classification: A Generic Attack and Defense
Title | Robustness for Non-Parametric Classification: A Generic Attack and Defense |
Authors | Yao-Yuan Yang, Cyrus Rashtchian, Yizhen Wang, Kamalika Chaudhuri |
Abstract | Adversarially robust machine learning has received much recent attention. However, prior attacks and defenses for non-parametric classifiers have been developed on an ad hoc or classifier-specific basis. In this work, we take a holistic look at adversarial examples for non-parametric classifiers, including nearest neighbors, decision trees, and random forests. We provide a general defense method, adversarial pruning, that works by preprocessing the dataset to become well-separated. To test our defense, we provide a novel attack that applies to a wide range of non-parametric classifiers. Theoretically, we derive an optimally robust classifier, which is analogous to the Bayes Optimal. We show that adversarial pruning can be viewed as a finite sample approximation to this optimal classifier. We empirically show that our defense and attack are either better than or competitive with prior work on non-parametric classifiers. Overall, our results provide a strong and broadly-applicable baseline for future work on robust non-parametrics. Code available at https://github.com/yangarbiter/adversarial-nonparametrics/ . |
Tasks | Adversarial Attack, Adversarial Defense |
Published | 2019-06-07 |
URL | https://arxiv.org/abs/1906.03310v2 |
https://arxiv.org/pdf/1906.03310v2.pdf | |
PWC | https://paperswithcode.com/paper/adversarial-examples-for-non-parametric |
Repo | https://github.com/yangarbiter/adversarial-nonparametrics |
Framework | none |
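One way to read "preprocessing the dataset to become well-separated" is that no two differently labelled points may lie closer than some chosen distance. The sketch below is a simple greedy heuristic under that assumption. It is not the authors' adversarial pruning procedure (see the linked repository for that), and the threshold `min_dist`, the toy data, and both function names are hypothetical.

```python
import math

def conflicts(points, labels, min_dist):
    """Return index pairs with different labels that are closer than min_dist."""
    pairs = []
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if labels[i] != labels[j] and math.dist(points[i], points[j]) < min_dist:
                pairs.append((i, j))
    return pairs

def greedy_prune(points, labels, min_dist):
    """Greedily drop the point involved in the most conflicts until none remain."""
    keep = list(range(len(points)))
    while True:
        pts = [points[i] for i in keep]
        lbs = [labels[i] for i in keep]
        pairs = conflicts(pts, lbs, min_dist)
        if not pairs:
            return keep  # indices into the original dataset
        counts = {}
        for i, j in pairs:
            counts[i] = counts.get(i, 0) + 1
            counts[j] = counts.get(j, 0) + 1
        worst = max(counts, key=counts.get)  # position within `keep`
        del keep[worst]

# Hypothetical toy data: two clusters with a few mislabelled neighbours.
points = [(0.0, 0.0), (0.1, 0.0), (1.0, 0.0), (1.05, 0.0), (0.95, 0.0)]
labels = [0, 1, 0, 0, 1]
kept = greedy_prune(points, labels, min_dist=0.3)
pruned_points = [points[i] for i in kept]
pruned_labels = [labels[i] for i in kept]
```

A greedy rule like this does not guarantee that the fewest possible points are removed; it only guarantees that the remaining set has no conflicting pair.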
#MeTooMaastricht: Building a chatbot to assist survivors of sexual harassment
Title | #MeTooMaastricht: Building a chatbot to assist survivors of sexual harassment |
Authors | Tobias Bauer, Emre Devrim, Misha Glazunov, William Lopez Jaramillo, Balaganesh Mohan, Gerasimos Spanakis |
Abstract | Inspired by the recent social movement of #MeToo, we are building a chatbot to assist survivors of sexual harassment cases (designed for the city of Maastricht but can easily be extended). The motivation behind this work is twofold: properly assist survivors of such events by directing them to appropriate institutions that can offer them help and increase the incident documentation so as to gather more data about harassment cases which are currently underreported. We break down the problem into three data science/machine learning components: harassment type identification (treated as a classification problem), spatio-temporal information extraction (treated as a Named Entity Recognition problem) and dialogue with the users (treated as a slot-filling based chatbot). We are able to achieve a success rate of more than 98% for the identification of a harassment-or-not case and around 80% for the identification of the specific harassment type. Locations and dates are identified with more than 90% accuracy and time occurrences prove more challenging with almost 80%. Finally, initial validation of the chatbot shows great potential for the further development and deployment of a tool that benefits the whole of society. |
Tasks | Chatbot, Named Entity Recognition, Slot Filling, Temporal Information Extraction |
Published | 2019-09-06 |
URL | https://arxiv.org/abs/1909.02809v1 |
https://arxiv.org/pdf/1909.02809v1.pdf | |
PWC | https://paperswithcode.com/paper/metoomaastricht-building-a-chatbot-to-assist |
Repo | https://github.com/edevrim/metoomaas |
Framework | none |
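The slot-filling dialogue component mentioned in the abstract can be pictured with a small sketch: the bot keeps a set of required slots and asks about the first one that is still empty. The slot names, prompts, and function names below are hypothetical and are not taken from the authors' repository; in a real system the `extracted` values would come from the classification and Named Entity Recognition components.

```python
# Hypothetical slot names, loosely following the information types in the abstract.
REQUIRED_SLOTS = ("harassment_type", "location", "date", "time")

PROMPTS = {
    "harassment_type": "Can you describe what happened?",
    "location": "Where did it happen?",
    "date": "On which date did it happen?",
    "time": "Around what time did it happen?",
}

def fill(slots, extracted):
    """Return a copy of `slots` updated with non-empty extracted values."""
    updated = dict(slots)
    for name, value in extracted.items():
        if name in REQUIRED_SLOTS and value:
            updated[name] = value
    return updated

def next_prompt(slots):
    """Return the question for the first unfilled slot, or None when all are filled."""
    for name in REQUIRED_SLOTS:
        if slots.get(name) is None:
            return PROMPTS[name]
    return None

# Usage: one user message may fill several slots at once.
state = fill({}, {"location": "Maastricht", "date": "2019-09-06"})
question = next_prompt(state)
```

Keeping the state as a plain dictionary makes it easy to persist the collected report once `next_prompt` returns `None`.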
metric-learn: Metric Learning Algorithms in Python
Title | metric-learn: Metric Learning Algorithms in Python |
Authors | William de Vazelhes, CJ Carey, Yuan Tang, Nathalie Vauquier, Aurélien Bellet |
Abstract | metric-learn is an open source Python package implementing supervised and weakly-supervised distance metric learning algorithms. As part of scikit-learn-contrib, it provides a unified interface compatible with scikit-learn, which makes it easy to perform cross-validation, model selection, and pipelining with other machine learning estimators. metric-learn is thoroughly tested and available on PyPI under the MIT licence. |
Tasks | Metric Learning, Model Selection |
Published | 2019-08-13 |
URL | https://arxiv.org/abs/1908.04710v1 |
https://arxiv.org/pdf/1908.04710v1.pdf | |
PWC | https://paperswithcode.com/paper/metric-learn-metric-learning-algorithms-in |
Repo | https://github.com/all-umass/metric_learn |
Framework | none |
Deeper and Wider Siamese Networks for Real-Time Visual Tracking
Title | Deeper and Wider Siamese Networks for Real-Time Visual Tracking |
Authors | Zhipeng Zhang, Houwen Peng |
Abstract | Siamese networks have drawn great attention in visual tracking because of their balanced accuracy and speed. However, the backbone networks used in Siamese trackers are relatively shallow, such as AlexNet [18], which does not fully take advantage of the capability of modern deep neural networks. In this paper, we investigate how to leverage deeper and wider convolutional neural networks to enhance tracking robustness and accuracy. We observe that direct replacement of backbones with existing powerful architectures, such as ResNet [14] and Inception [33], does not bring improvements. The main reasons are that 1) large increases in the receptive field of neurons lead to reduced feature discriminability and localization precision; and 2) the network padding for convolutions induces a positional bias in learning. To address these issues, we propose new residual modules to eliminate the negative impact of padding, and further design new architectures using these modules with controlled receptive field size and network stride. The designed architectures are lightweight and guarantee real-time tracking speed when applied to SiamFC [2] and SiamRPN [20]. Experiments show that solely due to the proposed network architectures, our SiamFC+ and SiamRPN+ obtain up to 9.8%/5.7% (AUC), 23.3%/8.8% (EAO) and 24.4%/25.0% (EAO) relative improvements over the original versions [2, 20] on the OTB-15, VOT-16 and VOT-17 datasets, respectively. |
Tasks | Real-Time Visual Tracking, Visual Object Tracking, Visual Tracking |
Published | 2019-01-07 |
URL | http://arxiv.org/abs/1901.01660v3 |
http://arxiv.org/pdf/1901.01660v3.pdf | |
PWC | https://paperswithcode.com/paper/deeper-and-wider-siamese-networks-for-real |
Repo | https://github.com/researchmm/SiamDW |
Framework | pytorch |
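The abstract's point about controlling receptive field size and network stride can be illustrated with the standard recurrence for a stack of convolution or pooling layers. This is a generic sketch, not code from the SiamDW repository: the function name and the example layer lists are hypothetical, and padding and dilation are deliberately not modelled.

```python
def receptive_field(layers):
    """Compute (receptive field, total stride) for a stack of layers.

    `layers` is an iterable of (kernel_size, stride) pairs ordered from
    input to output. Padding and dilation are ignored in this sketch.
    """
    rf, jump = 1, 1  # `jump` is the input-pixel distance between adjacent outputs
    for kernel, stride in layers:
        rf += (kernel - 1) * jump
        jump *= stride
    return rf, jump

# Hypothetical stacks: adding a strided layer early enlarges the receptive field quickly.
shallow = receptive_field([(3, 2), (3, 1)])
deeper = receptive_field([(7, 2), (3, 2), (3, 1)])
```

Because each layer's contribution is scaled by the accumulated stride, strided layers near the input have the largest effect on the final receptive field.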
Progression Modelling for Online and Early Gesture Detection
Title | Progression Modelling for Online and Early Gesture Detection |
Authors | Vikram Gupta, Sai Kumar Dwivedi, Rishabh Dabral, Arjun Jain |
Abstract | Online and early detection of gestures is crucial for building touchless gesture-based interfaces. These interfaces should operate on a stream of video frames instead of the complete video and detect the presence of gestures at an earlier stage than post-completion for providing a real-time user experience. To achieve this, it is important to recognize the progression of the gesture across different stages so that appropriate responses can be triggered on reaching the desired execution stage. To address this, we propose a simple yet effective multi-task learning framework which models the progression of the gesture along with frame level recognition. The proposed framework recognizes the gestures at an early stage with high precision and also achieves state-of-the-art recognition accuracy of 87.8% which is closer to human accuracy of 88.4% on the NVIDIA gesture dataset in the offline configuration and advances the state-of-the-art by more than 4%. We also introduce tightly segmented annotations for the NVIDIA gesture dataset and set up a strong baseline for gesture localization for this dataset. We also evaluate our framework on the Montalbano dataset and report competitive results. |
Tasks | Multi-Task Learning |
Published | 2019-09-14 |
URL | https://arxiv.org/abs/1909.06672v1 |
https://arxiv.org/pdf/1909.06672v1.pdf | |
PWC | https://paperswithcode.com/paper/progression-modelling-for-online-and-early |
Repo | https://github.com/vguptai/Neo-Nvidia-Annotations |
Framework | none |
AdvCodec: Towards A Unified Framework for Adversarial Text Generation
Title | AdvCodec: Towards A Unified Framework for Adversarial Text Generation |
Authors | Boxin Wang, Hengzhi Pei, Han Liu, Bo Li |
Abstract | While there has been great interest in generating imperceptible adversarial examples in continuous data domain (e.g. image and audio) to explore the model vulnerabilities, generating adversarial text in the discrete domain is still challenging. The main contribution of this paper is to propose a general targeted attack framework AdvCodec for adversarial text generation which addresses the challenge of discrete input space and is easily adapted to general natural language processing (NLP) tasks. In particular, we propose a tree-based autoencoder to encode discrete text data into continuous vector space, upon which we optimize the adversarial perturbation. A tree-based decoder is then applied to ensure the grammar correctness of the generated text. It also enables the flexibility of making manipulations on different levels of text, such as sentence (AdvCodec(sent)) and word (AdvCodec(word)) levels. We consider multiple attacking scenarios, including appending an adversarial sentence or adding unnoticeable words to a given paragraph, to achieve the arbitrary targeted attack. To demonstrate the effectiveness of the proposed method, we consider two most representative NLP tasks: sentiment analysis and question answering (QA). Extensive experimental results and human studies show that AdvCodec generated adversarial text can successfully attack the neural models without misleading the human. In particular, our attack causes a BERT-based sentiment classifier accuracy to drop from 0.703 to 0.006, and a BERT-based QA model’s F1 score to drop from 88.62 to 33.21 (with best targeted attack F1 score as 46.54). Furthermore, we show that the white-box generated adversarial texts can transfer across other black-box models, shedding light on an effective way to examine the robustness of existing NLP models. |
Tasks | Adversarial Text, Question Answering, Sentiment Analysis, Text Generation |
Published | 2019-12-22 |
URL | https://arxiv.org/abs/1912.10375v1 |
https://arxiv.org/pdf/1912.10375v1.pdf | |
PWC | https://paperswithcode.com/paper/advcodec-towards-a-unified-framework-for-1 |
Repo | https://github.com/aisecure/AdvCodec |
Framework | pytorch |
Video Relationship Reasoning using Gated Spatio-Temporal Energy Graph
Title | Video Relationship Reasoning using Gated Spatio-Temporal Energy Graph |
Authors | Yao-Hung Hubert Tsai, Santosh Divvala, Louis-Philippe Morency, Ruslan Salakhutdinov, Ali Farhadi |
Abstract | Visual relationship reasoning is a crucial yet challenging task for understanding rich interactions across visual concepts. For example, a relationship ‘man, open, door’ involves a complex relation ‘open’ between concrete entities ‘man, door’. While much of the existing work has studied this problem in the context of still images, understanding visual relationships in videos has received limited attention. Due to their temporal nature, videos enable us to model and reason about a more comprehensive set of visual relationships, such as those requiring multiple (temporal) observations (e.g., ‘man, lift up, box’ vs. ‘man, put down, box’), as well as relationships that are often correlated through time (e.g., ‘woman, pay, money’ followed by ‘woman, buy, coffee’). In this paper, we construct a Conditional Random Field on a fully-connected spatio-temporal graph that exploits the statistical dependency between relational entities spatially and temporally. We introduce a novel gated energy function parametrization that learns adaptive relations conditioned on visual observations. Our model optimization is computationally efficient, and its space computation complexity is significantly amortized through our proposed parameterization. Experimental results on benchmark video datasets (ImageNet Video and Charades) demonstrate state-of-the-art performance across three standard relationship reasoning tasks: Detection, Tagging, and Recognition. |
Tasks | |
Published | 2019-03-25 |
URL | http://arxiv.org/abs/1903.10547v2 |
http://arxiv.org/pdf/1903.10547v2.pdf | |
PWC | https://paperswithcode.com/paper/video-relationship-reasoning-using-gated |
Repo | https://github.com/yaohungt/GSTEG_CVPR_2019 |
Framework | pytorch |
Progressive Domain Adaptation for Object Detection
Title | Progressive Domain Adaptation for Object Detection |
Authors | Han-Kai Hsu, Chun-Han Yao, Yi-Hsuan Tsai, Wei-Chih Hung, Hung-Yu Tseng, Maneesh Singh, Ming-Hsuan Yang |
Abstract | Recent deep learning methods for object detection rely on a large amount of bounding box annotations. Collecting these annotations is laborious and costly, yet supervised models do not generalize well when testing on images from a different distribution. Domain adaptation provides a solution by adapting existing labels to the target testing data. However, a large gap between domains could make adaptation a challenging task, which leads to unstable training processes and sub-optimal results. In this paper, we propose to bridge the domain gap with an intermediate domain and progressively solve easier adaptation subtasks. This intermediate domain is constructed by translating the source images to mimic the ones in the target domain. To tackle the domain-shift problem, we adopt adversarial learning to align distributions at the feature level. In addition, a weighted task loss is applied to deal with unbalanced image quality in the intermediate domain. Experimental results show that our method performs favorably against the state-of-the-art method in terms of the performance on the target domain. |
Tasks | Domain Adaptation, Object Detection |
Published | 2019-10-24 |
URL | https://arxiv.org/abs/1910.11319v1 |
https://arxiv.org/pdf/1910.11319v1.pdf | |
PWC | https://paperswithcode.com/paper/progressive-domain-adaptation-for-object |
Repo | https://github.com/kevinhkhsu/DA_detection |
Framework | pytorch |
A Deep Journey into Super-resolution: A survey
Title | A Deep Journey into Super-resolution: A survey |
Authors | Saeed Anwar, Salman Khan, Nick Barnes |
Abstract | Deep convolutional networks based super-resolution is a fast-growing field with numerous practical applications. In this exposition, we extensively compare 30+ state-of-the-art super-resolution Convolutional Neural Networks (CNNs) over three classical and three recently introduced challenging datasets to benchmark single image super-resolution. We introduce a taxonomy for deep-learning based super-resolution networks that groups existing methods into nine categories including linear, residual, multi-branch, recursive, progressive, attention-based and adversarial designs. We also provide comparisons between the models in terms of network complexity, memory footprint, model input and output, learning details, the type of network losses and important architectural differences (e.g., depth, skip-connections, filters). The extensive evaluation performed shows the consistent and rapid growth in the accuracy in the past few years along with a corresponding boost in model complexity and the availability of large-scale datasets. It is also observed that the pioneering methods identified as the benchmark have been significantly outperformed by the current contenders. Despite the progress in recent years, we identify several shortcomings of existing techniques and provide future research directions towards the solution of these open problems. |
Tasks | Image Super-Resolution, Super-Resolution |
Published | 2019-04-16 |
URL | https://arxiv.org/abs/1904.07523v3 |
https://arxiv.org/pdf/1904.07523v3.pdf | |
PWC | https://paperswithcode.com/paper/a-deep-journey-into-super-resolution-a-survey |
Repo | https://github.com/saeed-anwar/SRsurvey |
Framework | none |
Deep Neural Network Compression for Image Classification and Object Detection
Title | Deep Neural Network Compression for Image Classification and Object Detection |
Authors | Georgios Tzelepis, Ahraz Asif, Saimir Baci, Selcuk Cavdar, Eren Erdal Aksoy |
Abstract | Neural networks have been notorious for being computationally expensive. This is mainly because neural networks are often over-parametrized and most likely have redundant nodes or layers as they are getting deeper and wider. Their demand for hardware resources prohibits their extensive use in embedded devices and puts restrictions on tasks like real-time image classification or object detection. In this work, we propose a network-agnostic model compression method infused with a novel dynamical clustering approach to reduce the computational cost and memory footprint of deep neural networks. We evaluated our new compression method on five different state-of-the-art image classification and object detection networks. In classification networks, we pruned about 95% of network parameters. In advanced detection networks such as YOLOv3, our proposed compression method managed to reduce the model parameters up to 59.70% which yielded 110X less memory without sacrificing much in accuracy. |
Tasks | Image Classification, Model Compression, Neural Network Compression, Object Detection |
Published | 2019-10-07 |
URL | https://arxiv.org/abs/1910.02747v1 |
https://arxiv.org/pdf/1910.02747v1.pdf | |
PWC | https://paperswithcode.com/paper/deep-neural-network-compression-for-image |
Repo | https://github.com/AhrazA/modelcompression-2019 |
Framework | pytorch |
Calibrated Domain-Invariant Learning for Highly Generalizable Large Scale Re-Identification
Title | Calibrated Domain-Invariant Learning for Highly Generalizable Large Scale Re-Identification |
Authors | Ye Yuan, Wuyang Chen, Tianlong Chen, Yang Yang, Zhou Ren, Zhangyang Wang, Gang Hua |
Abstract | Many real-world applications, such as city-scale traffic monitoring and control, require large-scale re-identification. However, previous ReID methods often failed to address two limitations in existing ReID benchmarks, i.e., low spatiotemporal coverage and sample imbalance. Notwithstanding their demonstrated success in every single benchmark, they have difficulties in generalizing to unseen environments. As a result, these methods are less applicable in a large-scale setting due to poor generalization. In search of a highly generalizable large-scale ReID method, we present an adversarial domain invariant feature learning framework (ADIN) that explicitly learns to separate identity-related features from challenging variations, where for the first time “free” annotations in ReID data such as video timestamp and camera index are utilized. Furthermore, we find that the imbalance of nuisance classes jeopardizes the adversarial training, and for mitigation we propose a calibrated adversarial loss that is attentive to nuisance distribution. Experiments on existing large-scale person vehicle ReID datasets demonstrate that ADIN learns more robust and generalizable representations, as evidenced by its outstanding direct transfer performance across datasets, which is a criterion that can better measure the generalizability of large-scale ReID methods. |
Tasks | |
Published | 2019-11-26 |
URL | https://arxiv.org/abs/1911.11314v2 |
https://arxiv.org/pdf/1911.11314v2.pdf | |
PWC | https://paperswithcode.com/paper/calibrated-domain-invariant-learning-for |
Repo | https://github.com/TAMU-VITA/ADIN |
Framework | none |
An Efficient Graph Convolutional Network Technique for the Travelling Salesman Problem
Title | An Efficient Graph Convolutional Network Technique for the Travelling Salesman Problem |
Authors | Chaitanya K. Joshi, Thomas Laurent, Xavier Bresson |
Abstract | This paper introduces a new learning-based approach for approximately solving the Travelling Salesman Problem on 2D Euclidean graphs. We use deep Graph Convolutional Networks to build efficient TSP graph representations and output tours in a non-autoregressive manner via highly parallelized beam search. Our approach outperforms all recently proposed autoregressive deep learning techniques in terms of solution quality, inference speed and sample efficiency for problem instances of fixed graph sizes. In particular, we reduce the average optimality gap from 0.52% to 0.01% for 50 nodes, and from 2.26% to 1.39% for 100 nodes. Finally, despite improving upon other learning-based approaches for TSP, our approach falls short of standard Operations Research solvers. |
Tasks | |
Published | 2019-06-04 |
URL | https://arxiv.org/abs/1906.01227v2 |
https://arxiv.org/pdf/1906.01227v2.pdf | |
PWC | https://paperswithcode.com/paper/an-efficient-graph-convolutional-network |
Repo | https://github.com/chaitjo/graph-convnet-tsp |
Framework | pytorch |
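The optimality gap figures quoted in the abstract are percentages by which a predicted tour exceeds the optimal tour length. A minimal sketch of that calculation follows, assuming the usual definition `100 * (length / optimal_length - 1)`. The function names and the four-city example are hypothetical and are not taken from the authors' repository.

```python
import math

def tour_length(coords, tour):
    """Length of the closed tour visiting `coords` in the order given by `tour`."""
    return sum(
        math.dist(coords[tour[i]], coords[tour[(i + 1) % len(tour)]])
        for i in range(len(tour))
    )

def optimality_gap(length, optimal_length):
    """Percentage by which `length` exceeds `optimal_length`."""
    return 100.0 * (length / optimal_length - 1.0)

# Hypothetical example: four cities on the corners of a unit square.
coords = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
optimal = tour_length(coords, [0, 1, 2, 3])   # walks the perimeter
crossing = tour_length(coords, [0, 2, 1, 3])  # uses both diagonals
gap = optimality_gap(crossing, optimal)
```

Averaging this quantity over a set of test instances gives a single figure of the kind reported for the 50-node and 100-node problems.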