Paper Group AWR 306
Hierarchical Reinforcement Learning for Zero-shot Generalization with Subtask Dependencies. Optimal Completion Distillation for Sequence Learning. TED-LIUM 3: twice as much data and corpus repartition for experiments on speaker adaptation. A Modulation Module for Multi-task Learning with Applications in Image Retrieval. Learning to Reweight Example …
Hierarchical Reinforcement Learning for Zero-shot Generalization with Subtask Dependencies
Title | Hierarchical Reinforcement Learning for Zero-shot Generalization with Subtask Dependencies |
Authors | Sungryull Sohn, Junhyuk Oh, Honglak Lee |
Abstract | We introduce a new RL problem where the agent is required to generalize to a previously-unseen environment characterized by a subtask graph which describes a set of subtasks and their dependencies. Unlike existing hierarchical multitask RL approaches that explicitly describe what the agent should do at a high level, our problem only describes properties of subtasks and relationships among them, which requires the agent to perform complex reasoning to find the optimal subtask to execute. To solve this problem, we propose a neural subtask graph solver (NSGS) which encodes the subtask graph using a recursive neural network embedding. To overcome the difficulty of training, we propose a novel non-parametric gradient-based policy, graph reward propagation, to pre-train our NSGS agent and further finetune it through actor-critic method. The experimental results on two 2D visual domains show that our agent can perform complex reasoning to find a near-optimal way of executing the subtask graph and generalize well to the unseen subtask graphs. In addition, we compare our agent with a Monte-Carlo tree search (MCTS) method showing that our method is much more efficient than MCTS, and the performance of NSGS can be further improved by combining it with MCTS. |
Tasks | Hierarchical Reinforcement Learning, Network Embedding |
Published | 2018-07-19 |
URL | https://arxiv.org/abs/1807.07665v4 |
https://arxiv.org/pdf/1807.07665v4.pdf | |
PWC | https://paperswithcode.com/paper/hierarchical-reinforcement-learning-for-zero |
Repo | https://github.com/srsohn/subtask-graph-execution |
Framework | none |
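As a rough illustration of the problem setting in the abstract above, the sketch below represents a subtask graph as a dictionary of rewards and prerequisites and lists which subtasks are currently eligible. The graph contents, the subtask names, the AND-only form of the preconditions, and the greedy baseline are all assumptions made for illustration; this is not the authors' NSGS agent.

```python
# Hypothetical subtask graph: each subtask has a reward and a set of
# prerequisite subtasks (AND-only preconditions, assumed for simplicity).
SUBTASK_GRAPH = {
    "get_wood": {"reward": 0.1, "requires": set()},
    "get_stone": {"reward": 0.1, "requires": set()},
    "make_axe": {"reward": 0.5, "requires": {"get_wood", "get_stone"}},
    "chop_tree": {"reward": 1.0, "requires": {"make_axe"}},
}


def eligible_subtasks(graph, completed):
    """Return unfinished subtasks whose prerequisites are all completed."""
    return sorted(
        name
        for name, spec in graph.items()
        if name not in completed and spec["requires"] <= completed
    )


def greedy_episode(graph, budget):
    """Naive baseline: always execute the eligible subtask with highest reward."""
    completed, total = set(), 0.0
    for _ in range(budget):
        options = eligible_subtasks(graph, completed)
        if not options:
            break
        choice = max(options, key=lambda name: graph[name]["reward"])
        completed.add(choice)
        total += graph[choice]["reward"]
    return completed, total
```

A greedy rule like this ignores that a low-reward subtask may unlock a high-reward one within a limited budget, which is the kind of reasoning the abstract says the agent must perform.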
Optimal Completion Distillation for Sequence Learning
Title | Optimal Completion Distillation for Sequence Learning |
Authors | Sara Sabour, William Chan, Mohammad Norouzi |
Abstract | We present Optimal Completion Distillation (OCD), a training procedure for optimizing sequence to sequence models based on edit distance. OCD is efficient, has no hyper-parameters of its own, and does not require pretraining or joint optimization with conditional log-likelihood. Given a partial sequence generated by the model, we first identify the set of optimal suffixes that minimize the total edit distance, using an efficient dynamic programming algorithm. Then, for each position of the generated sequence, we use a target distribution that puts equal probability on the first token of all the optimal suffixes. OCD achieves the state-of-the-art performance on end-to-end speech recognition, on both Wall Street Journal and Librispeech datasets, achieving 9.3% WER and 4.5% WER respectively. |
Tasks | End-To-End Speech Recognition, Speech Recognition |
Published | 2018-10-02 |
URL | http://arxiv.org/abs/1810.01398v2 |
http://arxiv.org/pdf/1810.01398v2.pdf | |
PWC | https://paperswithcode.com/paper/optimal-completion-distillation-for-sequence |
Repo | https://github.com/SaeedNajafi/pytorch-ocd |
Framework | pytorch |
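The target construction described in the abstract above can be sketched with a standard edit-distance dynamic program. The sketch assumes token lists as input and a hypothetical end-of-sequence marker `</s>`; the function names are illustrative and this is not the linked repository's code. It keeps one row of the edit-distance table, where entry j is the distance between the generated prefix and the first j target tokens. Every position with minimal distance yields an optimal next token.

```python
def optimal_next_tokens(prefix, target, eos="</s>"):
    """Tokens that start a suffix minimizing total edit distance to `target`."""
    # row[j] = edit distance between `prefix` and target[:j]
    row = list(range(len(target) + 1))
    for tok in prefix:
        new_row = [row[0] + 1]
        for j in range(1, len(target) + 1):
            cost = 0 if tok == target[j - 1] else 1
            new_row.append(
                min(
                    row[j] + 1,          # delete the generated token
                    new_row[j - 1] + 1,  # insert target[j - 1]
                    row[j - 1] + cost,   # match or substitute
                )
            )
        row = new_row
    best = min(row)
    # Appending target[j:] adds no further edits, so the token that follows
    # any best-matching target prefix is an optimal continuation.
    return {
        target[j] if j < len(target) else eos
        for j, dist in enumerate(row)
        if dist == best
    }


def target_distribution(prefix, target, eos="</s>"):
    """Uniform distribution over the optimal next tokens."""
    tokens = optimal_next_tokens(prefix, target, eos)
    return {tok: 1.0 / len(tokens) for tok in tokens}
```

For example, with target `list("SUNDAY")` and generated prefix `list("SA")`, both `"U"` and `"N"` are optimal, so each receives probability 0.5.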
TED-LIUM 3: twice as much data and corpus repartition for experiments on speaker adaptation
Title | TED-LIUM 3: twice as much data and corpus repartition for experiments on speaker adaptation |
Authors | François Hernandez, Vincent Nguyen, Sahar Ghannay, Natalia Tomashenko, Yannick Estève |
Abstract | In this paper, we present the TED-LIUM release 3 corpus, dedicated to speech recognition in English, which more than doubles the data available to train acoustic models in comparison with TED-LIUM 2. We present the recent development on Automatic Speech Recognition (ASR) systems in comparison with the two previous releases of the TED-LIUM Corpus from 2012 and 2014. We demonstrate that going from 207 to 452 hours of transcribed speech training data is considerably more useful for end-to-end ASR systems than for HMM-based state-of-the-art ones, even though the HMM-based ASR system still outperforms the end-to-end ASR system when the size of audio training data is 452 hours, with a Word Error Rate (WER) of 6.6% and 13.7%, respectively. Finally, we propose two repartitions of the TED-LIUM release 3 corpus: the legacy one, which is the same as the one existing in release 2, and a new one, calibrated and designed for experiments on speaker adaptation. Like the first two releases, the TED-LIUM 3 corpus will be freely available for the research community. |
Tasks | End-To-End Speech Recognition, Speech Recognition |
Published | 2018-05-12 |
URL | https://arxiv.org/abs/1805.04699v4 |
https://arxiv.org/pdf/1805.04699v4.pdf | |
PWC | https://paperswithcode.com/paper/ted-lium-3-twice-as-much-data-and-corpus |
Repo | https://github.com/mdangschat/speech-corpus-dl |
Framework | none |
A Modulation Module for Multi-task Learning with Applications in Image Retrieval
Title | A Modulation Module for Multi-task Learning with Applications in Image Retrieval |
Authors | Xiangyun Zhao, Haoxiang Li, Xiaohui Shen, Xiaodan Liang, Ying Wu |
Abstract | Multi-task learning has been widely adopted in many computer vision tasks to improve overall computation efficiency or boost the performance of individual tasks, under the assumption that those tasks are correlated and complementary to each other. However, the relationships between the tasks are complicated in practice, especially when the number of involved tasks scales up. When two tasks are of weak relevance, they may compete or even distract each other during joint training of shared parameters, and as a consequence undermine the learning of all the tasks. This raises destructive interference, which decreases the learning efficiency of shared parameters and leads to low-quality local optima of the loss w.r.t. the shared parameters. To address this problem, we propose a general modulation module, which can be inserted into any convolutional neural network architecture, to encourage the coupling and feature sharing of relevant tasks while disentangling the learning of irrelevant tasks with a minor addition of parameters. Equipped with this module, gradient directions from different tasks can be enforced to be consistent for those shared parameters, which benefits multi-task joint training. The module is end-to-end learnable without ad-hoc design for specific tasks, and can naturally handle many tasks at the same time. We apply our approach on two retrieval tasks, face retrieval on the CelebA dataset [1] and product retrieval on the UT-Zappos50K dataset [2, 3], and demonstrate its advantage over other multi-task learning methods in both accuracy and storage efficiency. |
Tasks | Image Retrieval, Multi-Task Learning |
Published | 2018-07-17 |
URL | http://arxiv.org/abs/1807.06708v2 |
http://arxiv.org/pdf/1807.06708v2.pdf | |
PWC | https://paperswithcode.com/paper/a-modulation-module-for-multi-task-learning |
Repo | https://github.com/Zhaoxiangyun/Multi-Task-Modulation-Module |
Framework | tf |
Learning to Reweight Examples for Robust Deep Learning
Title | Learning to Reweight Examples for Robust Deep Learning |
Authors | Mengye Ren, Wenyuan Zeng, Bin Yang, Raquel Urtasun |
Abstract | Deep neural networks have been shown to be very powerful modeling tools for many supervised learning tasks involving complex input patterns. However, they can also easily overfit to training set biases and label noises. In addition to various regularizers, example reweighting algorithms are popular solutions to these problems, but they require careful tuning of additional hyperparameters, such as example mining schedules and regularization hyperparameters. In contrast to past reweighting methods, which typically consist of functions of the cost value of each example, in this work we propose a novel meta-learning algorithm that learns to assign weights to training examples based on their gradient directions. To determine the example weights, our method performs a meta gradient descent step on the current mini-batch example weights (which are initialized from zero) to minimize the loss on a clean unbiased validation set. Our proposed method can be easily implemented on any type of deep network, does not require any additional hyperparameter tuning, and achieves impressive performance on class imbalance and corrupted label problems where only a small amount of clean validation data is available. |
Tasks | Meta-Learning |
Published | 2018-03-24 |
URL | https://arxiv.org/abs/1803.09050v3 |
https://arxiv.org/pdf/1803.09050v3.pdf | |
PWC | https://paperswithcode.com/paper/learning-to-reweight-examples-for-robust-deep |
Repo | https://github.com/pfnet-research/robust_estimation |
Framework | none |
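The reweighting idea in the abstract above can be sketched for a small linear regression model. With example weights initialized at zero and a single SGD look-ahead step, the meta-gradient for each weight reduces, up to a positive factor, to the inner product between that example's training gradient and the gradient on the clean validation set. The linear model, squared loss, function names, and learning rate below are assumptions chosen to keep the example self-contained; this is a simplified sketch, not the authors' implementation.

```python
def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def example_grad(theta, x, y):
    """Gradient of 0.5 * (theta . x - y)^2 with respect to theta."""
    err = dot(theta, x) - y
    return [err * xi for xi in x]


def reweight_step(theta, train, clean_val, lr=0.1):
    """One update where example weights come from gradient agreement."""
    dim = len(theta)
    train_grads = [example_grad(theta, x, y) for x, y in train]
    val_grads = [example_grad(theta, x, y) for x, y in clean_val]
    val_grad = [sum(g[k] for g in val_grads) / len(val_grads) for k in range(dim)]
    # Keep only examples whose gradient points the same way as the clean
    # validation gradient; negative agreement is clipped to zero.
    raw = [max(0.0, dot(g, val_grad)) for g in train_grads]
    total = sum(raw)
    weights = [r / total for r in raw] if total > 0 else raw
    step = [sum(w * g[k] for w, g in zip(weights, train_grads)) for k in range(dim)]
    new_theta = [t - lr * s for t, s in zip(theta, step)]
    return new_theta, weights
```

In a mini-batch with one correctly labeled and one mislabeled example, the mislabeled example's gradient opposes the validation gradient and its weight is clipped to zero.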
Detection of REM Sleep Behaviour Disorder by Automated Polysomnography Analysis
Title | Detection of REM Sleep Behaviour Disorder by Automated Polysomnography Analysis |
Authors | Navin Cooray, Fernando Andreotti, Christine Lo, Mkael Symmonds, Michele T. M. Hu, Maarten De Vos |
Abstract | Evidence suggests Rapid-Eye-Movement (REM) Sleep Behaviour Disorder (RBD) is an early predictor of Parkinson’s disease. This study proposes a fully-automated framework for RBD detection consisting of automated sleep staging followed by RBD identification. Analysis was assessed using a limited polysomnography montage from 53 participants with RBD and 53 age-matched healthy controls. Sleep stage classification was achieved using a Random Forest (RF) classifier and 156 features extracted from electroencephalogram (EEG), electrooculogram (EOG) and electromyogram (EMG) channels. For RBD detection, a RF classifier was trained combining established techniques to quantify muscle atonia with additional features that incorporate sleep architecture and the EMG fractal exponent. Automated multi-state sleep staging achieved a 0.62 Cohen’s Kappa score. RBD detection accuracy improved by 10% to 96% (compared to individual established metrics) when using manually annotated sleep staging. Accuracy remained high (92%) when using automated sleep staging. This study outperforms established metrics and demonstrates that incorporating sleep architecture and sleep stage transitions can benefit RBD detection. This study also achieved automated sleep staging with a level of accuracy comparable to manual annotation. This study validates a tractable, fully-automated, and sensitive pipeline for RBD identification that could be translated to wearable take-home technology. |
Tasks | EEG |
Published | 2018-11-12 |
URL | http://arxiv.org/abs/1811.04662v1 |
http://arxiv.org/pdf/1811.04662v1.pdf | |
PWC | https://paperswithcode.com/paper/detection-of-rem-sleep-behaviour-disorder-by |
Repo | https://github.com/navsnav/RBD-Sleep-Detection |
Framework | none |
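The sleep-staging result above is reported as a Cohen's Kappa score, which corrects raw agreement for the agreement expected by chance: kappa = (p_o - p_e) / (1 - p_e). A minimal sketch of that statistic follows; the function name and the stage labels in the usage note are hypothetical.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two equal-length label sequences."""
    n = len(labels_a)
    # Observed agreement: fraction of positions where both raters agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement if the raters labeled independently with their
    # own marginal label frequencies.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    # Undefined (division by zero) when expected agreement is exactly 1.
    return (observed - expected) / (1 - expected)
```

For manual labels `["W", "W", "R", "R"]` and automated labels `["W", "W", "R", "W"]`, the observed agreement is 0.75, the chance agreement is 0.5, and kappa is 0.5.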
Deep Learning for Detecting Cyberbullying Across Multiple Social Media Platforms
Title | Deep Learning for Detecting Cyberbullying Across Multiple Social Media Platforms |
Authors | Sweta Agrawal, Amit Awekar |
Abstract | Harassment by cyberbullies is a significant phenomenon on social media. Existing works for cyberbullying detection have at least one of the following three bottlenecks. First, they target only one particular social media platform (SMP). Second, they address just one topic of cyberbullying. Third, they rely on carefully handcrafted features of the data. We show that deep learning based models can overcome all three bottlenecks. Knowledge learned by these models on one dataset can be transferred to other datasets. We performed extensive experiments using three real-world datasets: Formspring (12k posts), Twitter (16k posts), and Wikipedia (100k posts). Our experiments provide several useful insights about cyberbullying detection. To the best of our knowledge, this is the first work that systematically analyzes cyberbullying detection on various topics across multiple SMPs using deep learning based models and transfer learning. |
Tasks | Transfer Learning |
Published | 2018-01-19 |
URL | http://arxiv.org/abs/1801.06482v1 |
http://arxiv.org/pdf/1801.06482v1.pdf | |
PWC | https://paperswithcode.com/paper/deep-learning-for-detecting-cyberbullying |
Repo | https://github.com/sweta20/Detecting-Cyberbullying-Across-SMPs |
Framework | tf |
LSTM-based Network for Human Gait Stability Prediction in an Intelligent Robotic Rollator
Title | LSTM-based Network for Human Gait Stability Prediction in an Intelligent Robotic Rollator |
Authors | Georgia Chalvatzaki, Petros Koutras, Jack Hadfield, Xanthi S. Papageorgiou, Costas S. Tzafestas, Petros Maragos |
Abstract | In this work, we present a novel framework for on-line human gait stability prediction of the elderly users of an intelligent robotic rollator using Long Short Term Memory (LSTM) networks, fusing multimodal RGB-D and Laser Range Finder (LRF) data from non-wearable sensors. A Deep Learning (DL) based approach is used for the upper body pose estimation. The detected pose is used for estimating the body Center of Mass (CoM) using Unscented Kalman Filter (UKF). An Augmented Gait State Estimation framework exploits the LRF data to estimate the legs’ positions and the respective gait phase. These estimates are the inputs of an encoder-decoder sequence to sequence model which predicts the gait stability state as Safe or Fall Risk walking. It is validated with data from real patients, by exploring different network architectures, hyperparameter settings and by comparing the proposed method with other baselines. The presented LSTM-based human gait stability predictor is shown to provide robust predictions of the human stability state, and thus has the potential to be integrated into a general user-adaptive control architecture as a fall-risk alarm. |
Tasks | Pose Estimation |
Published | 2018-12-01 |
URL | http://arxiv.org/abs/1812.00252v2 |
http://arxiv.org/pdf/1812.00252v2.pdf | |
PWC | https://paperswithcode.com/paper/lstm-based-network-for-human-gait-stability |
Repo | https://github.com/gchal/gaitStability |
Framework | none |
Training a Neural Network in a Low-Resource Setting on Automatically Annotated Noisy Data
Title | Training a Neural Network in a Low-Resource Setting on Automatically Annotated Noisy Data |
Authors | Michael A. Hedderich, Dietrich Klakow |
Abstract | Manually labeled corpora are expensive to create and often not available for low-resource languages or domains. Automatic labeling approaches are an alternative way to obtain labeled data in a quicker and cheaper way. However, these labels often contain more errors which can deteriorate a classifier’s performance when trained on this data. We propose a noise layer that is added to a neural network architecture. This allows modeling the noise and training on a combination of clean and noisy data. We show that in a low-resource NER task we can improve performance by up to 35% by using additional, noisy data and handling the noise. |
Tasks | |
Published | 2018-07-02 |
URL | http://arxiv.org/abs/1807.00745v2 |
http://arxiv.org/pdf/1807.00745v2.pdf | |
PWC | https://paperswithcode.com/paper/training-a-neural-network-in-a-low-resource |
Repo | https://github.com/uds-lsv/Training-a-Neural-Network-in-a-Low-Resource-Setting-on-Automatically-Annotated-Noisy-Data |
Framework | none |
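One common way to realize a noise layer like the one mentioned in the abstract above is to multiply the classifier's clean-label distribution by a row-stochastic noise matrix, so that the loss on automatically annotated data is computed against the predicted noisy-label distribution. The abstract does not spell out the layer's form, so the confusion-matrix formulation, the function names, and the numbers below are assumptions for illustration only.

```python
import math


def noisy_distribution(clean_probs, noise_matrix):
    """Map P(clean label) to P(noisy label).

    noise_matrix[i][j] is the assumed probability that true class i is
    annotated as class j; each row sums to 1.
    """
    k = len(clean_probs)
    return [
        sum(clean_probs[i] * noise_matrix[i][j] for i in range(k))
        for j in range(k)
    ]


def negative_log_likelihood(probs, label):
    """Cross-entropy loss for a single example with an integer label."""
    return -math.log(probs[label])


def example_loss(clean_probs, label, noise_matrix=None):
    """Clean examples bypass the noise layer; noisy examples go through it."""
    probs = clean_probs if noise_matrix is None else noisy_distribution(
        clean_probs, noise_matrix
    )
    return negative_log_likelihood(probs, label)
```

With this arrangement, clean examples train the base network directly, while noisy examples are penalized less when their label is a likely corruption of the predicted class.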
Semantic Segmentation of Pathological Lung Tissue with Dilated Fully Convolutional Networks
Title | Semantic Segmentation of Pathological Lung Tissue with Dilated Fully Convolutional Networks |
Authors | Marios Anthimopoulos, Stergios Christodoulidis, Lukas Ebner, Thomas Geiser, Andreas Christe, Stavroula Mougiakakou |
Abstract | Early and accurate diagnosis of interstitial lung diseases (ILDs) is crucial for making treatment decisions, but can be challenging even for experienced radiologists. The diagnostic procedure is based on the detection and recognition of the different ILD pathologies in thoracic CT scans, yet their manifestation often appears similar. In this study, we propose the use of a deep purely convolutional neural network for the semantic segmentation of ILD patterns, as the basic component of a computer aided diagnosis (CAD) system for ILDs. The proposed CNN, which consists of convolutional layers with dilated filters, takes as input a lung CT image of arbitrary size and outputs the corresponding label map. We trained and tested the network on a dataset of 172 sparsely annotated CT scans, within a cross-validation scheme. The training was performed in an end-to-end and semi-supervised fashion, utilizing both labeled and non-labeled image regions. The experimental results show significant performance improvement with respect to the state of the art. |
Tasks | Semantic Segmentation |
Published | 2018-03-16 |
URL | http://arxiv.org/abs/1803.06167v1 |
http://arxiv.org/pdf/1803.06167v1.pdf | |
PWC | https://paperswithcode.com/paper/semantic-segmentation-of-pathological-lung |
Repo | https://github.com/intact-project/LungNet |
Framework | none |
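To make the role of dilated filters in the abstract above more concrete, the sketch below applies a one-dimensional dilated filter and computes the receptive field of a stack of such layers. The paper works on 2D CT images; the 1D form, the "valid" (no padding) boundary handling, and the function names are simplifying assumptions.

```python
def dilated_conv1d(signal, kernel, dilation):
    """Valid cross-correlation where kernel taps are `dilation` samples apart."""
    span = (len(kernel) - 1) * dilation
    return [
        sum(w * signal[i + j * dilation] for j, w in enumerate(kernel))
        for i in range(len(signal) - span)
    ]


def receptive_field(kernel_size, dilations):
    """Input samples seen by one output of stacked stride-1 dilated layers."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)
```

For instance, three stacked layers with kernel size 3 and dilations 1, 2, and 4 cover 15 input samples per output, compared with 7 for the same stack without dilation, and they do so without pooling or extra parameters.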
Uncertainty Estimates and Multi-Hypotheses Networks for Optical Flow
Title | Uncertainty Estimates and Multi-Hypotheses Networks for Optical Flow |
Authors | Eddy Ilg, Özgün Çiçek, Silvio Galesso, Aaron Klein, Osama Makansi, Frank Hutter, Thomas Brox |
Abstract | Optical flow estimation can be formulated as an end-to-end supervised learning problem, which yields estimates with a superior accuracy-runtime tradeoff compared to alternative methodology. In this paper, we make such networks estimate their local uncertainty about the correctness of their prediction, which is vital information when building decisions on top of the estimations. For the first time we compare several strategies and techniques to estimate uncertainty in a large-scale computer vision task like optical flow estimation. Moreover, we introduce a new network architecture utilizing the Winner-Takes-All loss and show that this can provide complementary hypotheses and uncertainty estimates efficiently with a single forward pass and without the need for sampling or ensembles. Finally, we demonstrate the quality of the different uncertainty estimates, which is clearly above previous confidence measures on optical flow and allows for interactive frame rates. |
Tasks | Optical Flow Estimation |
Published | 2018-02-20 |
URL | http://arxiv.org/abs/1802.07095v4 |
http://arxiv.org/pdf/1802.07095v4.pdf | |
PWC | https://paperswithcode.com/paper/uncertainty-estimates-and-multi-hypotheses |
Repo | https://github.com/lmb-freiburg/netdef-docker |
Framework | tf |
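The Winner-Takes-All idea mentioned in the abstract above can be sketched as follows: a network emits several flow hypotheses, and only the hypothesis closest to the ground truth contributes to the loss. The single-pixel setting, the endpoint-error metric, and the function names are assumptions for illustration; the paper's actual loss variants are not reproduced here.

```python
def endpoint_error(pred, truth):
    """Euclidean distance between two 2D flow vectors."""
    return ((pred[0] - truth[0]) ** 2 + (pred[1] - truth[1]) ** 2) ** 0.5


def winner_takes_all_loss(hypotheses, truth):
    """Return the error of the best hypothesis and its index."""
    errors = [endpoint_error(h, truth) for h in hypotheses]
    winner = min(range(len(errors)), key=errors.__getitem__)
    # Only the winning hypothesis would receive a gradient, which lets the
    # other hypotheses specialize on different plausible solutions.
    return errors[winner], winner
```

Because the remaining hypotheses are not pulled toward the same target, their spread after training can serve as an uncertainty signal obtained from a single forward pass.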
Deep Bi-Dense Networks for Image Super-Resolution
Title | Deep Bi-Dense Networks for Image Super-Resolution |
Authors | Yucheng Wang, Jialiang Shen, Jian Zhang |
Abstract | This paper proposes Deep Bi-Dense Networks (DBDN) for single image super-resolution. Our approach extends previous intra-block dense connection approaches by including novel inter-block dense connections. In this way, feature information propagates from a single dense block to all subsequent blocks, instead of to a single successor. To build a DBDN, we first construct intra-dense blocks, which extract and compress abundant local features via densely connected convolutional layers and compression layers for further feature learning. Then, we use an inter-block dense net to connect intra-dense blocks, which allows each intra-dense block to propagate its own local features to all successors. Additionally, our bi-dense construction connects each block to the output, alleviating the vanishing gradient problems in training. The evaluation of our proposed method on five benchmark datasets shows that our DBDN outperforms the state of the art in SISR with a moderate number of network parameters. |
Tasks | Image Super-Resolution, Super-Resolution |
Published | 2018-10-11 |
URL | http://arxiv.org/abs/1810.04873v1 |
http://arxiv.org/pdf/1810.04873v1.pdf | |
PWC | https://paperswithcode.com/paper/deep-bi-dense-networks-for-image-super |
Repo | https://github.com/JannaShen/DBDN |
Framework | pytorch |
Efficient Single Image Super Resolution using Enhanced Learned Group Convolutions
Title | Efficient Single Image Super Resolution using Enhanced Learned Group Convolutions |
Authors | Vandit Jain, Prakhar Bansal, Abhinav Kumar Singh, Rajeev Srivastava |
Abstract | Convolutional Neural Networks (CNNs) have demonstrated great results for the single-image super-resolution (SISR) problem. Currently, most CNN algorithms promote deep and computationally expensive models to solve SISR. However, we propose a novel SISR method that uses relatively fewer computations. On training, we get group convolutions that have unused connections removed. We have refined this system specifically for the task at hand by removing unnecessary modules from the original CondenseNet. Further, a reconstruction network consisting of deconvolutional layers has been used in order to upscale to high resolution. All these steps significantly reduce the number of computations required at testing time. Along with this, bicubic upsampled input is added to the network output for easier learning. Our model is named SRCondenseNet. We evaluate the method using various benchmark datasets and show that it performs favourably against the state-of-the-art methods in terms of both accuracy and number of computations required. |
Tasks | Image Super-Resolution, Super-Resolution |
Published | 2018-08-26 |
URL | http://arxiv.org/abs/1808.08509v1 |
http://arxiv.org/pdf/1808.08509v1.pdf | |
PWC | https://paperswithcode.com/paper/efficient-single-image-super-resolution-using |
Repo | https://github.com/vandit15/SRCondenseNet |
Framework | pytorch |
POTs: Protective Optimization Technologies
Title | POTs: Protective Optimization Technologies |
Authors | Bogdan Kulynych, Rebekah Overdorf, Carmela Troncoso, Seda Gürses |
Abstract | Algorithmic fairness aims to address the economic, moral, social, and political impact that digital systems have on populations through solutions that can be applied by service providers. Fairness frameworks do so, in part, by mapping these problems to a narrow definition and assuming the service providers can be trusted to deploy countermeasures. Not surprisingly, these decisions limit fairness frameworks’ ability to capture a variety of harms caused by systems. We characterize fairness limitations using concepts from requirements engineering and from social sciences. We show that the focus on algorithms’ inputs and outputs misses harms that arise from systems interacting with the world; that the focus on bias and discrimination omits broader harms on populations and their environments; and that relying on service providers excludes scenarios where they are not cooperative or intentionally adversarial. We propose Protective Optimization Technologies (POTs). POTs provide means for affected parties to address the negative impacts of systems in the environment, expanding avenues for political contestation. POTs intervene from outside the system, do not require service providers to cooperate, and can serve to correct, shift, or expose harms that systems impose on populations and their environments. We illustrate the potential and limitations of POTs in two case studies: countering road congestion caused by traffic-beating applications, and recalibrating credit scoring for loan applicants. |
Tasks | Decision Making |
Published | 2018-06-07 |
URL | https://arxiv.org/abs/1806.02711v6 |
https://arxiv.org/pdf/1806.02711v6.pdf | |
PWC | https://paperswithcode.com/paper/pots-protective-optimization-technologies |
Repo | https://github.com/spring-epfl/pots |
Framework | none |
SketchMate: Deep Hashing for Million-Scale Human Sketch Retrieval
Title | SketchMate: Deep Hashing for Million-Scale Human Sketch Retrieval |
Authors | Peng Xu, Yongye Huang, Tongtong Yuan, Kaiyue Pang, Yi-Zhe Song, Tao Xiang, Timothy M. Hospedales, Zhanyu Ma, Jun Guo |
Abstract | We propose a deep hashing framework for sketch retrieval that, for the first time, works on a multi-million scale human sketch dataset. Leveraging on this large dataset, we explore a few sketch-specific traits that were otherwise under-studied in prior literature. Instead of following the conventional sketch recognition task, we introduce the novel problem of sketch hashing retrieval which is not only more challenging, but also offers a better testbed for large-scale sketch analysis, since: (i) more fine-grained sketch feature learning is required to accommodate the large variations in style and abstraction, and (ii) a compact binary code needs to be learned at the same time to enable efficient retrieval. Key to our network design is the embedding of unique characteristics of human sketch, where (i) a two-branch CNN-RNN architecture is adapted to explore the temporal ordering of strokes, and (ii) a novel hashing loss is specifically designed to accommodate both the temporal and abstract traits of sketches. By working with a 3.8M sketch dataset, we show that state-of-the-art hashing models specifically engineered for static images fail to perform well on temporal sketch data. Our network on the other hand not only offers the best retrieval performance on various code sizes, but also yields the best generalization performance under a zero-shot setting and when re-purposed for sketch recognition. Such superior performances effectively demonstrate the benefit of our sketch-specific design. |
Tasks | Sketch Recognition |
Published | 2018-04-04 |
URL | http://arxiv.org/abs/1804.01401v1 |
http://arxiv.org/pdf/1804.01401v1.pdf | |
PWC | https://paperswithcode.com/paper/sketchmate-deep-hashing-for-million-scale |
Repo | https://github.com/tosmaster/imagevision |
Framework | pytorch |
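The retrieval side of the hashing framework described in the abstract above can be sketched as follows: real-valued features are binarized into compact codes, and database items are ranked by Hamming distance to the query code. The sign-threshold binarization, the function names, and the toy codes are assumptions for illustration; the learned CNN-RNN features and the hashing loss from the paper are not modeled.

```python
def binarize(features):
    """Turn real-valued features into a binary code by thresholding at zero."""
    return [1 if f > 0 else 0 for f in features]


def hamming(code_a, code_b):
    """Number of bit positions where two equal-length codes differ."""
    return sum(x != y for x, y in zip(code_a, code_b))


def retrieve(query_code, database, top_k=1):
    """Return the names of the top_k database items closest in Hamming distance."""
    ranked = sorted(database.items(), key=lambda item: hamming(query_code, item[1]))
    return [name for name, _ in ranked[:top_k]]
```

Because the comparison only counts differing bits, ranking a query against millions of stored codes is far cheaper than comparing floating-point feature vectors, which is the efficiency argument made in the abstract.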