October 20, 2019

Paper Group AWR 306

Hierarchical Reinforcement Learning for Zero-shot Generalization with Subtask Dependencies


Title	Hierarchical Reinforcement Learning for Zero-shot Generalization with Subtask Dependencies
Authors	Sungryull Sohn, Junhyuk Oh, Honglak Lee
Abstract	We introduce a new RL problem where the agent is required to generalize to a previously-unseen environment characterized by a subtask graph which describes a set of subtasks and their dependencies. Unlike existing hierarchical multitask RL approaches that explicitly describe what the agent should do at a high level, our problem only describes properties of subtasks and relationships among them, which requires the agent to perform complex reasoning to find the optimal subtask to execute. To solve this problem, we propose a neural subtask graph solver (NSGS) which encodes the subtask graph using a recursive neural network embedding. To overcome the difficulty of training, we propose a novel non-parametric gradient-based policy, graph reward propagation, to pre-train our NSGS agent and further finetune it through an actor-critic method. The experimental results on two 2D visual domains show that our agent can perform complex reasoning to find a near-optimal way of executing the subtask graph and generalize well to unseen subtask graphs. In addition, we compare our agent with a Monte-Carlo tree search (MCTS) method, showing that our method is much more efficient than MCTS and that the performance of NSGS can be further improved by combining it with MCTS.
Tasks	Hierarchical Reinforcement Learning, Network Embedding
Published	2018-07-19
URL	https://arxiv.org/abs/1807.07665v4
PDF	https://arxiv.org/pdf/1807.07665v4.pdf
PWC	https://paperswithcode.com/paper/hierarchical-reinforcement-learning-for-zero
Repo	https://github.com/srsohn/subtask-graph-execution
Framework	none

Optimal Completion Distillation for Sequence Learning


Title	Optimal Completion Distillation for Sequence Learning
Authors	Sara Sabour, William Chan, Mohammad Norouzi
Abstract	We present Optimal Completion Distillation (OCD), a training procedure for optimizing sequence to sequence models based on edit distance. OCD is efficient, has no hyper-parameters of its own, and does not require pretraining or joint optimization with conditional log-likelihood. Given a partial sequence generated by the model, we first identify the set of optimal suffixes that minimize the total edit distance, using an efficient dynamic programming algorithm. Then, for each position of the generated sequence, we use a target distribution that puts equal probability on the first token of all the optimal suffixes. OCD achieves state-of-the-art performance on end-to-end speech recognition on both the Wall Street Journal and Librispeech datasets, achieving 9.3% WER and 4.5% WER respectively.
Tasks	End-To-End Speech Recognition, Speech Recognition
Published	2018-10-02
URL	http://arxiv.org/abs/1810.01398v2
PDF	http://arxiv.org/pdf/1810.01398v2.pdf
PWC	https://paperswithcode.com/paper/optimal-completion-distillation-for-sequence
Repo	https://github.com/SaeedNajafi/pytorch-ocd
Framework	pytorch
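
The abstract describes two steps: find the optimal suffixes for a generated prefix with dynamic programming, then spread probability equally over their first tokens. The following is a minimal Python sketch of that idea, not the authors' implementation. The function names, the `eos` marker, and the token-list representation are assumptions made for illustration. It relies on the observation that a prefix is best continued from any position `j` of the target whose prefix `target[:j]` has minimum edit distance to what was generated.

```python
def optimal_next_tokens(prefix, target, eos="</s>"):
    """Tokens that start a suffix minimizing the total edit distance to target."""
    # dist[j] = edit distance between the generated prefix and target[:j]
    dist = list(range(len(target) + 1))
    for tok in prefix:
        new = [dist[0] + 1]
        for j in range(1, len(target) + 1):
            substitute = dist[j - 1] + (tok != target[j - 1])
            extra_generated = dist[j] + 1      # tok has no counterpart in target
            missing_target = new[j - 1] + 1    # target[j-1] was skipped
            new.append(min(substitute, extra_generated, missing_target))
        dist = new
    best = min(dist)
    tokens = set()
    for j, d in enumerate(dist):
        if d == best:
            # Continuing with target[j:] keeps the total distance at `best`.
            tokens.add(target[j] if j < len(target) else eos)
    return tokens


def ocd_target_distribution(prefix, target, eos="</s>"):
    """Equal probability on the first token of every optimal suffix."""
    tokens = optimal_next_tokens(prefix, target, eos)
    return {tok: 1.0 / len(tokens) for tok in tokens}


print(sorted(optimal_next_tokens(list("x"), list("abc"))))  # ['a', 'b']
```

After the wrong token `x`, both `a` (treat `x` as an insertion) and `b` (treat `x` as a substitution for `a`) lead to a total edit distance of 1, so each receives probability 0.5.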

TED-LIUM 3: twice as much data and corpus repartition for experiments on speaker adaptation


Title	TED-LIUM 3: twice as much data and corpus repartition for experiments on speaker adaptation
Authors	François Hernandez, Vincent Nguyen, Sahar Ghannay, Natalia Tomashenko, Yannick Estève
Abstract	In this paper, we present the TED-LIUM release 3 corpus, dedicated to speech recognition in English, which more than doubles the data available to train acoustic models in comparison with TED-LIUM 2. We present recent developments in Automatic Speech Recognition (ASR) systems in comparison with the two previous releases of the TED-LIUM Corpus from 2012 and 2014. We demonstrate that passing from 207 to 452 hours of transcribed speech training data is more useful for end-to-end ASR systems than for HMM-based state-of-the-art ones, even though the HMM-based ASR system still outperforms the end-to-end ASR system when the size of audio training data is 452 hours, with Word Error Rates (WER) of 6.6% and 13.7%, respectively. Last, we propose two repartitions of the TED-LIUM release 3 corpus: the legacy one, which is the same as the one in release 2, and a new one, calibrated and designed for experiments on speaker adaptation. Like the first two releases, the TED-LIUM 3 corpus will be freely available to the research community.
Tasks	End-To-End Speech Recognition, Speech Recognition
Published	2018-05-12
URL	https://arxiv.org/abs/1805.04699v4
PDF	https://arxiv.org/pdf/1805.04699v4.pdf
PWC	https://paperswithcode.com/paper/ted-lium-3-twice-as-much-data-and-corpus
Repo	https://github.com/mdangschat/speech-corpus-dl
Framework	none

A Modulation Module for Multi-task Learning with Applications in Image Retrieval


Title	A Modulation Module for Multi-task Learning with Applications in Image Retrieval
Authors	Xiangyun Zhao, Haoxiang Li, Xiaohui Shen, Xiaodan Liang, Ying Wu
Abstract	Multi-task learning has been widely adopted in many computer vision tasks to improve overall computation efficiency or boost the performance of individual tasks, under the assumption that those tasks are correlated and complementary to each other. However, the relationships between the tasks are complicated in practice, especially when the number of involved tasks scales up. When two tasks are of weak relevance, they may compete or even distract each other during joint training of shared parameters, and as a consequence undermine the learning of all the tasks. This raises destructive interference, which decreases the learning efficiency of shared parameters and leads to a low-quality local optimum of the loss w.r.t. shared parameters. To address this problem, we propose a general modulation module, which can be inserted into any convolutional neural network architecture, to encourage the coupling and feature sharing of relevant tasks while disentangling the learning of irrelevant tasks with only a minor addition of parameters. Equipped with this module, gradient directions from different tasks can be enforced to be consistent for those shared parameters, which benefits multi-task joint training. The module is end-to-end learnable without ad-hoc design for specific tasks, and can naturally handle many tasks at the same time. We apply our approach on two retrieval tasks, face retrieval on the CelebA dataset [1] and product retrieval on the UT-Zappos50K dataset [2, 3], and demonstrate its advantage over other multi-task learning methods in both accuracy and storage efficiency.
Tasks	Image Retrieval, Multi-Task Learning
Published	2018-07-17
URL	http://arxiv.org/abs/1807.06708v2
PDF	http://arxiv.org/pdf/1807.06708v2.pdf
PWC	https://paperswithcode.com/paper/a-modulation-module-for-multi-task-learning
Repo	https://github.com/Zhaoxiangyun/Multi-Task-Modulation-Module
Framework	tf

Learning to Reweight Examples for Robust Deep Learning


Title	Learning to Reweight Examples for Robust Deep Learning
Authors	Mengye Ren, Wenyuan Zeng, Bin Yang, Raquel Urtasun
Abstract	Deep neural networks have been shown to be very powerful modeling tools for many supervised learning tasks involving complex input patterns. However, they can also easily overfit to training set biases and label noises. In addition to various regularizers, example reweighting algorithms are popular solutions to these problems, but they require careful tuning of additional hyperparameters, such as example mining schedules and regularization hyperparameters. In contrast to past reweighting methods, which typically consist of functions of the cost value of each example, in this work we propose a novel meta-learning algorithm that learns to assign weights to training examples based on their gradient directions. To determine the example weights, our method performs a meta gradient descent step on the current mini-batch example weights (which are initialized from zero) to minimize the loss on a clean unbiased validation set. Our proposed method can be easily implemented on any type of deep network, does not require any additional hyperparameter tuning, and achieves impressive performance on class imbalance and corrupted label problems where only a small amount of clean validation data is available.
Tasks	Meta-Learning
Published	2018-03-24
URL	https://arxiv.org/abs/1803.09050v3
PDF	https://arxiv.org/pdf/1803.09050v3.pdf
PWC	https://paperswithcode.com/paper/learning-to-reweight-examples-for-robust-deep
Repo	https://github.com/pfnet-research/robust_estimation
Framework	none
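
To make the meta gradient step in the abstract concrete, here is a small Python sketch on a one-parameter model `y = theta * x` with squared loss. It is an illustration under stated assumptions, not the authors' code: the function name, the scalar model, and the data are hypothetical, and clipping negative values to zero followed by normalization is an assumption of this sketch. With example weights initialized at zero, the derivative of the validation loss with respect to each weight is proportional to minus the dot product between the validation gradient and that example's training gradient, so examples whose gradients agree with the clean validation set receive positive weight.

```python
def example_weights(theta, train, val):
    """Weights for a mini-batch from one meta gradient step on a scalar model.

    theta: parameter of the model y = theta * x
    train, val: lists of (x, y) pairs; val is the small clean set
    """
    def grad(x, y):
        # d/dtheta of 0.5 * (theta * x - y) ** 2
        return (theta * x - y) * x

    g_val = sum(grad(x, y) for x, y in val) / len(val)
    # Negative derivative of the validation loss w.r.t. each example weight,
    # up to the (positive) learning rate, clipped at zero (assumption).
    raw = [max(0.0, g_val * grad(x, y)) for x, y in train]
    total = sum(raw)
    # Normalize so the weights sum to one when any are positive (assumption).
    return [r / total if total > 0 else 0.0 for r in raw]


clean_val = [(1.0, 1.0)]
noisy_batch = [(1.0, 1.0), (1.0, -1.0)]  # second label is flipped
print(example_weights(0.0, noisy_batch, clean_val))  # [1.0, 0.0]
```

The flipped-label example pulls the parameter in the opposite direction from the validation gradient, so its weight is clipped to zero.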

Detection of REM Sleep Behaviour Disorder by Automated Polysomnography Analysis


Title	Detection of REM Sleep Behaviour Disorder by Automated Polysomnography Analysis
Authors	Navin Cooray, Fernando Andreotti, Christine Lo, Mkael Symmonds, Michele T. M. Hu, Maarten De Vos
Abstract	Evidence suggests Rapid-Eye-Movement (REM) Sleep Behaviour Disorder (RBD) is an early predictor of Parkinson’s disease. This study proposes a fully-automated framework for RBD detection consisting of automated sleep staging followed by RBD identification. Analysis was assessed using a limited polysomnography montage from 53 participants with RBD and 53 age-matched healthy controls. Sleep stage classification was achieved using a Random Forest (RF) classifier and 156 features extracted from electroencephalogram (EEG), electrooculogram (EOG) and electromyogram (EMG) channels. For RBD detection, an RF classifier was trained combining established techniques to quantify muscle atonia with additional features that incorporate sleep architecture and the EMG fractal exponent. Automated multi-state sleep staging achieved a 0.62 Cohen’s Kappa score. RBD detection accuracy improved by 10% to 96% (compared to individual established metrics) when using manually annotated sleep staging. Accuracy remained high (92%) when using automated sleep staging. This study outperforms established metrics and demonstrates that incorporating sleep architecture and sleep stage transitions can benefit RBD detection. This study also achieved automated sleep staging with a level of accuracy comparable to manual annotation. This study validates a tractable, fully-automated, and sensitive pipeline for RBD identification that could be translated to wearable take-home technology.
Tasks	EEG
Published	2018-11-12
URL	http://arxiv.org/abs/1811.04662v1
PDF	http://arxiv.org/pdf/1811.04662v1.pdf
PWC	https://paperswithcode.com/paper/detection-of-rem-sleep-behaviour-disorder-by
Repo	https://github.com/navsnav/RBD-Sleep-Detection
Framework	none

Deep Learning for Detecting Cyberbullying Across Multiple Social Media Platforms


Title	Deep Learning for Detecting Cyberbullying Across Multiple Social Media Platforms
Authors	Sweta Agrawal, Amit Awekar
Abstract	Harassment by cyberbullies is a significant phenomenon on social media. Existing works for cyberbullying detection have at least one of the following three bottlenecks. First, they target only one particular social media platform (SMP). Second, they address just one topic of cyberbullying. Third, they rely on carefully handcrafted features of the data. We show that deep learning based models can overcome all three bottlenecks. Knowledge learned by these models on one dataset can be transferred to other datasets. We performed extensive experiments using three real-world datasets: Formspring (12k posts), Twitter (16k posts), and Wikipedia (100k posts). Our experiments provide several useful insights about cyberbullying detection. To the best of our knowledge, this is the first work that systematically analyzes cyberbullying detection on various topics across multiple SMPs using deep learning based models and transfer learning.
Tasks	Transfer Learning
Published	2018-01-19
URL	http://arxiv.org/abs/1801.06482v1
PDF	http://arxiv.org/pdf/1801.06482v1.pdf
PWC	https://paperswithcode.com/paper/deep-learning-for-detecting-cyberbullying
Repo	https://github.com/sweta20/Detecting-Cyberbullying-Across-SMPs
Framework	tf

LSTM-based Network for Human Gait Stability Prediction in an Intelligent Robotic Rollator


Title	LSTM-based Network for Human Gait Stability Prediction in an Intelligent Robotic Rollator
Authors	Georgia Chalvatzaki, Petros Koutras, Jack Hadfield, Xanthi S. Papageorgiou, Costas S. Tzafestas, Petros Maragos
Abstract	In this work, we present a novel framework for on-line human gait stability prediction of the elderly users of an intelligent robotic rollator using Long Short Term Memory (LSTM) networks, fusing multimodal RGB-D and Laser Range Finder (LRF) data from non-wearable sensors. A Deep Learning (DL) based approach is used for the upper body pose estimation. The detected pose is used for estimating the body Center of Mass (CoM) using Unscented Kalman Filter (UKF). An Augmented Gait State Estimation framework exploits the LRF data to estimate the legs’ positions and the respective gait phase. These estimates are the inputs of an encoder-decoder sequence to sequence model which predicts the gait stability state as Safe or Fall Risk walking. It is validated with data from real patients, by exploring different network architectures, hyperparameter settings and by comparing the proposed method with other baselines. The presented LSTM-based human gait stability predictor is shown to provide robust predictions of the human stability state, and thus has the potential to be integrated into a general user-adaptive control architecture as a fall-risk alarm.
Tasks	Pose Estimation
Published	2018-12-01
URL	http://arxiv.org/abs/1812.00252v2
PDF	http://arxiv.org/pdf/1812.00252v2.pdf
PWC	https://paperswithcode.com/paper/lstm-based-network-for-human-gait-stability
Repo	https://github.com/gchal/gaitStability
Framework	none

Training a Neural Network in a Low-Resource Setting on Automatically Annotated Noisy Data


Title	Training a Neural Network in a Low-Resource Setting on Automatically Annotated Noisy Data
Authors	Michael A. Hedderich, Dietrich Klakow
Abstract	Manually labeled corpora are expensive to create and often not available for low-resource languages or domains. Automatic labeling approaches are an alternative way to obtain labeled data in a quicker and cheaper way. However, these labels often contain more errors, which can deteriorate a classifier’s performance when it is trained on this data. We propose a noise layer that is added to a neural network architecture. This allows modeling the noise and training on a combination of clean and noisy data. We show that in a low-resource NER task we can improve performance by up to 35% by using additional, noisy data and handling the noise.
Tasks
Published	2018-07-02
URL	http://arxiv.org/abs/1807.00745v2
PDF	http://arxiv.org/pdf/1807.00745v2.pdf
PWC	https://paperswithcode.com/paper/training-a-neural-network-in-a-low-resource
Repo	https://github.com/uds-lsv/Training-a-Neural-Network-in-a-Low-Resource-Setting-on-Automatically-Annotated-Noisy-Data
Framework	none
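
One common way to realize a noise layer, assumed here for illustration and not confirmed by the abstract, is a row-stochastic matrix whose entry `(i, j)` is the probability that clean label `i` shows up as noisy label `j`. The network's clean prediction is multiplied by this matrix when training on automatically annotated data and used directly on clean data. The Python sketch below uses hypothetical function names and made-up labels; estimating the matrix from a small set of examples that carry both a clean and a noisy label is likewise an assumption.

```python
def estimate_noise_matrix(pairs, num_classes):
    """Row-normalized counts of (clean_label, noisy_label) pairs."""
    counts = [[0] * num_classes for _ in range(num_classes)]
    for clean, noisy in pairs:
        counts[clean][noisy] += 1
    matrix = []
    for i, row in enumerate(counts):
        total = sum(row)
        if total == 0:
            # No evidence for this class: assume its labels are not corrupted.
            matrix.append([1.0 if j == i else 0.0 for j in range(num_classes)])
        else:
            matrix.append([c / total for c in row])
    return matrix


def apply_noise_layer(clean_probs, noise_matrix):
    """p(noisy = j) = sum_i p(clean = i) * noise_matrix[i][j]."""
    num_classes = len(noise_matrix[0])
    return [
        sum(p * noise_matrix[i][j] for i, p in enumerate(clean_probs))
        for j in range(num_classes)
    ]


# Hypothetical two-class example: 0 = "O", 1 = "PER".
pairs = [(0, 0), (0, 0), (0, 0), (0, 1), (1, 1), (1, 1)]
noise = estimate_noise_matrix(pairs, num_classes=2)
print(noise)                                 # [[0.75, 0.25], [0.0, 1.0]]
print(apply_noise_layer([1.0, 0.0], noise))  # [0.75, 0.25]
```

During training on noisy data, the loss would be computed against the output of `apply_noise_layer`; on clean data, against `clean_probs` itself.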

Semantic Segmentation of Pathological Lung Tissue with Dilated Fully Convolutional Networks


Title	Semantic Segmentation of Pathological Lung Tissue with Dilated Fully Convolutional Networks
Authors	Marios Anthimopoulos, Stergios Christodoulidis, Lukas Ebner, Thomas Geiser, Andreas Christe, Stavroula Mougiakakou
Abstract	Early and accurate diagnosis of interstitial lung diseases (ILDs) is crucial for making treatment decisions, but can be challenging even for experienced radiologists. The diagnostic procedure is based on the detection and recognition of the different ILD pathologies in thoracic CT scans, yet their manifestation often appears similar. In this study, we propose the use of a deep purely convolutional neural network for the semantic segmentation of ILD patterns, as the basic component of a computer aided diagnosis (CAD) system for ILDs. The proposed CNN, which consists of convolutional layers with dilated filters, takes as input a lung CT image of arbitrary size and outputs the corresponding label map. We trained and tested the network on a dataset of 172 sparsely annotated CT scans, within a cross-validation scheme. The training was performed in an end-to-end and semi-supervised fashion, utilizing both labeled and non-labeled image regions. The experimental results show significant performance improvement with respect to the state of the art.
Tasks	Semantic Segmentation
Published	2018-03-16
URL	http://arxiv.org/abs/1803.06167v1
PDF	http://arxiv.org/pdf/1803.06167v1.pdf
PWC	https://paperswithcode.com/paper/semantic-segmentation-of-pathological-lung
Repo	https://github.com/intact-project/LungNet
Framework	none

Uncertainty Estimates and Multi-Hypotheses Networks for Optical Flow


Title	Uncertainty Estimates and Multi-Hypotheses Networks for Optical Flow
Authors	Eddy Ilg, Özgün Çiçek, Silvio Galesso, Aaron Klein, Osama Makansi, Frank Hutter, Thomas Brox
Abstract	Optical flow estimation can be formulated as an end-to-end supervised learning problem, which yields estimates with a superior accuracy-runtime tradeoff compared to alternative methodology. In this paper, we make such networks estimate their local uncertainty about the correctness of their prediction, which is vital information when building decisions on top of the estimations. For the first time we compare several strategies and techniques to estimate uncertainty in a large-scale computer vision task like optical flow estimation. Moreover, we introduce a new network architecture utilizing the Winner-Takes-All loss and show that this can provide complementary hypotheses and uncertainty estimates efficiently with a single forward pass and without the need for sampling or ensembles. Finally, we demonstrate the quality of the different uncertainty estimates, which is clearly above previous confidence measures on optical flow and allows for interactive frame rates.
Tasks	Optical Flow Estimation
Published	2018-02-20
URL	http://arxiv.org/abs/1802.07095v4
PDF	http://arxiv.org/pdf/1802.07095v4.pdf
PWC	https://paperswithcode.com/paper/uncertainty-estimates-and-multi-hypotheses
Repo	https://github.com/lmb-freiburg/netdef-docker
Framework	tf
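
The Winner-Takes-All loss mentioned in the abstract can be illustrated with a short Python sketch. This shows the plain form of the loss on a single flow vector, under the assumption that each hypothesis is a `(u, v)` displacement scored by endpoint error; the function name and the numbers are hypothetical, and the paper's network may use a different variant. Only the hypothesis closest to the ground truth contributes to the loss, which lets the remaining hypotheses specialize on other plausible solutions.

```python
import math


def wta_endpoint_loss(hypotheses, target):
    """Return (loss, winner_index) for a set of flow hypotheses.

    hypotheses: list of (u, v) flow vectors predicted for one pixel
    target: ground-truth (u, v) flow vector
    """
    errors = [math.hypot(u - target[0], v - target[1]) for u, v in hypotheses]
    winner = min(range(len(errors)), key=errors.__getitem__)
    # Only the winning hypothesis is penalized; the others get no gradient.
    return errors[winner], winner


loss, winner = wta_endpoint_loss([(0.0, 0.0), (3.0, 4.0), (1.0, 1.0)], (3.0, 4.0))
print(loss, winner)  # 0.0 1
```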

Deep Bi-Dense Networks for Image Super-Resolution


Title	Deep Bi-Dense Networks for Image Super-Resolution
Authors	Yucheng Wang, Jialiang Shen, Jian Zhang
Abstract	This paper proposes Deep Bi-Dense Networks (DBDN) for single image super-resolution. Our approach extends previous intra-block dense connection approaches by including novel inter-block dense connections. In this way, feature information propagates from a single dense block to all subsequent blocks, instead of to a single successor. To build a DBDN, we first construct intra-dense blocks, which extract and compress abundant local features via densely connected convolutional layers and compression layers for further feature learning. Then, we use an inter-block dense net to connect intra-dense blocks, which allows each intra-dense block to propagate its own local features to all successors. Additionally, our bi-dense construction connects each block to the output, alleviating the vanishing gradient problem in training. The evaluation of our proposed method on five benchmark datasets shows that our DBDN outperforms the state of the art in SISR with a moderate number of network parameters.
Tasks	Image Super-Resolution, Super-Resolution
Published	2018-10-11
URL	http://arxiv.org/abs/1810.04873v1
PDF	http://arxiv.org/pdf/1810.04873v1.pdf
PWC	https://paperswithcode.com/paper/deep-bi-dense-networks-for-image-super
Repo	https://github.com/JannaShen/DBDN
Framework	pytorch

Efficient Single Image Super Resolution using Enhanced Learned Group Convolutions


Title	Efficient Single Image Super Resolution using Enhanced Learned Group Convolutions
Authors	Vandit Jain, Prakhar Bansal, Abhinav Kumar Singh, Rajeev Srivastava
Abstract	Convolutional Neural Networks (CNNs) have demonstrated great results for the single-image super-resolution (SISR) problem. Currently, most CNN algorithms promote deep and computationally expensive models to solve SISR. In contrast, we propose a novel SISR method that uses a relatively small number of computations. On training, we get group convolutions that have unused connections removed. We have refined this system specifically for the task at hand by removing unnecessary modules from the original CondenseNet. Further, a reconstruction network consisting of deconvolutional layers has been used in order to upscale to high resolution. All these steps significantly reduce the number of computations required at testing time. Along with this, bicubic upsampled input is added to the network output for easier learning. Our model is named SRCondenseNet. We evaluate the method using various benchmark datasets and show that it performs favourably against the state-of-the-art methods in terms of both accuracy and number of computations required.
Tasks	Image Super-Resolution, Super-Resolution
Published	2018-08-26
URL	http://arxiv.org/abs/1808.08509v1
PDF	http://arxiv.org/pdf/1808.08509v1.pdf
PWC	https://paperswithcode.com/paper/efficient-single-image-super-resolution-using
Repo	https://github.com/vandit15/SRCondenseNet
Framework	pytorch

POTs: Protective Optimization Technologies


Title	POTs: Protective Optimization Technologies
Authors	Bogdan Kulynych, Rebekah Overdorf, Carmela Troncoso, Seda Gürses
Abstract	Algorithmic fairness aims to address the economic, moral, social, and political impact that digital systems have on populations through solutions that can be applied by service providers. Fairness frameworks do so, in part, by mapping these problems to a narrow definition and assuming the service providers can be trusted to deploy countermeasures. Not surprisingly, these decisions limit fairness frameworks’ ability to capture a variety of harms caused by systems. We characterize fairness limitations using concepts from requirements engineering and from social sciences. We show that the focus on algorithms’ inputs and outputs misses harms that arise from systems interacting with the world; that the focus on bias and discrimination omits broader harms on populations and their environments; and that relying on service providers excludes scenarios where they are not cooperative or intentionally adversarial. We propose Protective Optimization Technologies (POTs). POTs provide means for affected parties to address the negative impacts of systems in the environment, expanding avenues for political contestation. POTs intervene from outside the system, do not require service providers to cooperate, and can serve to correct, shift, or expose harms that systems impose on populations and their environments. We illustrate the potential and limitations of POTs in two case studies: countering road congestion caused by traffic-beating applications, and recalibrating credit scoring for loan applicants.
Tasks	Decision Making
Published	2018-06-07
URL	https://arxiv.org/abs/1806.02711v6
PDF	https://arxiv.org/pdf/1806.02711v6.pdf
PWC	https://paperswithcode.com/paper/pots-protective-optimization-technologies
Repo	https://github.com/spring-epfl/pots
Framework	none

SketchMate: Deep Hashing for Million-Scale Human Sketch Retrieval


Title	SketchMate: Deep Hashing for Million-Scale Human Sketch Retrieval
Authors	Peng Xu, Yongye Huang, Tongtong Yuan, Kaiyue Pang, Yi-Zhe Song, Tao Xiang, Timothy M. Hospedales, Zhanyu Ma, Jun Guo
Abstract	We propose a deep hashing framework for sketch retrieval that, for the first time, works on a multi-million scale human sketch dataset. Leveraging on this large dataset, we explore a few sketch-specific traits that were otherwise under-studied in prior literature. Instead of following the conventional sketch recognition task, we introduce the novel problem of sketch hashing retrieval which is not only more challenging, but also offers a better testbed for large-scale sketch analysis, since: (i) more fine-grained sketch feature learning is required to accommodate the large variations in style and abstraction, and (ii) a compact binary code needs to be learned at the same time to enable efficient retrieval. Key to our network design is the embedding of unique characteristics of human sketch, where (i) a two-branch CNN-RNN architecture is adapted to explore the temporal ordering of strokes, and (ii) a novel hashing loss is specifically designed to accommodate both the temporal and abstract traits of sketches. By working with a 3.8M sketch dataset, we show that state-of-the-art hashing models specifically engineered for static images fail to perform well on temporal sketch data. Our network on the other hand not only offers the best retrieval performance on various code sizes, but also yields the best generalization performance under a zero-shot setting and when re-purposed for sketch recognition. Such superior performances effectively demonstrate the benefit of our sketch-specific design.
Tasks	Sketch Recognition
Published	2018-04-04
URL	http://arxiv.org/abs/1804.01401v1
PDF	http://arxiv.org/pdf/1804.01401v1.pdf
PWC	https://paperswithcode.com/paper/sketchmate-deep-hashing-for-million-scale
Repo	https://github.com/tosmaster/imagevision
Framework	pytorch