October 18, 2019

3158 words 15 mins read

Paper Group ANR 588

Learning Single-Image Depth from Videos using Quality Assessment Networks. A refined convergence analysis of pDCA$_e$ with applications to simultaneous sparse recovery and outlier detection. Machine Theory of Mind. RoarNet: A Robust 3D Object Detection based on RegiOn Approximation Refinement. Learning from a tiny dataset of manual annotations: a t …

Learning Single-Image Depth from Videos using Quality Assessment Networks

Title Learning Single-Image Depth from Videos using Quality Assessment Networks
Authors Weifeng Chen, Shengyi Qian, Jia Deng
Abstract Depth estimation from a single image in the wild remains a challenging problem. One main obstacle is the lack of high-quality training data for images in the wild. In this paper we propose a method to automatically generate such data through Structure-from-Motion (SfM) on Internet videos. The core of this method is a Quality Assessment Network that identifies high-quality reconstructions obtained from SfM. Using this method, we collect single-view depth training data from a large number of YouTube videos and construct a new dataset called YouTube3D. Experiments show that YouTube3D is useful in training depth estimation networks and advances the state of the art of single-view depth estimation in the wild.
Tasks Depth Estimation
Published 2018-06-25
URL http://arxiv.org/abs/1806.09573v3
PDF http://arxiv.org/pdf/1806.09573v3.pdf
PWC https://paperswithcode.com/paper/learning-single-image-depth-from-videos-using
Repo
Framework
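
The pipeline above stands or falls on the filtering step: score each SfM reconstruction and keep only the best for training data. A minimal PyTorch sketch of that step, assuming hand-crafted per-reconstruction features (reprojection-error statistics, track lengths); the scorer below is a hypothetical stand-in, not the authors' network:

```python
import torch
import torch.nn as nn

class QualityAssessmentNet(nn.Module):
    """Hypothetical stand-in scorer: maps per-reconstruction features
    (e.g. reprojection-error statistics) to a quality score in [0, 1]."""
    def __init__(self, feat_dim=32):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim, 64), nn.ReLU(),
            nn.Linear(64, 1), nn.Sigmoid())

    def forward(self, feats):               # feats: (N, feat_dim)
        return self.mlp(feats).squeeze(-1)  # (N,) quality scores

def keep_high_quality(net, feats, threshold=0.9):
    """Keep only the SfM reconstructions the network rates above threshold.
    In practice a trained scorer would be loaded here."""
    with torch.no_grad():
        scores = net(feats)
    return torch.nonzero(scores > threshold).flatten()

kept = keep_high_quality(QualityAssessmentNet(), torch.randn(100, 32))
```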

A refined convergence analysis of pDCA$_e$ with applications to simultaneous sparse recovery and outlier detection

Title A refined convergence analysis of pDCA$_e$ with applications to simultaneous sparse recovery and outlier detection
Authors Tianxiang Liu, Ting Kei Pong, Akiko Takeda
Abstract We consider the problem of minimizing a difference-of-convex (DC) function, which can be written as the sum of a smooth convex function with Lipschitz gradient, a proper closed convex function and a continuous possibly nonsmooth concave function. We refine the convergence analysis in [38] for the proximal DC algorithm with extrapolation (pDCA$_e$) and show that the whole sequence generated by the algorithm is convergent when the objective is level-bounded, without imposing differentiability assumptions on the concave part. Our analysis is based on a new potential function, which we assume to be a Kurdyka-Łojasiewicz (KL) function. We also establish a relationship between our KL assumption and the one used in [38]. Finally, we demonstrate how the pDCA$_e$ can be applied to a class of simultaneous sparse recovery and outlier detection problems arising from robust compressed sensing in signal processing and least trimmed squares regression in statistics. Specifically, we show that the objectives of these problems can be written as level-bounded DC functions whose concave parts are typically nonsmooth. Moreover, for a large class of loss functions and regularizers, the KL exponent of the corresponding potential function is shown to be 1/2, which implies that the pDCA$_e$ is locally linearly convergent when applied to these problems. Our numerical experiments show that the pDCA$_e$ usually outperforms the proximal DC algorithm with nonmonotone linesearch [24, Appendix A] in both CPU time and solution quality for this particular application.
Tasks Outlier Detection
Published 2018-04-19
URL http://arxiv.org/abs/1804.07213v1
PDF http://arxiv.org/pdf/1804.07213v1.pdf
PWC https://paperswithcode.com/paper/a-refined-convergence-analysis-of-pdca_e-with
Repo
Framework
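
The pDCA$_e$ update itself is compact: for an objective $f + g - h$ ($f$ smooth convex, $g$ proper closed convex, $h$ convex continuous), extrapolate the iterate, take a subgradient $\xi^k \in \partial h(x^k)$, then apply a proximal gradient step. A NumPy sketch for the $\ell_{1\text{-}2}$-regularized least-squares instance ($g = \lambda\|x\|_1$, $h = \lambda\|x\|_2$); the Nesterov-style extrapolation weights are one common choice, assumed here rather than taken from the paper:

```python
import numpy as np

def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def pdca_e(A, b, lam, iters=500):
    """Minimize 0.5||Ax-b||^2 + lam*||x||_1 - lam*||x||_2 (an L1-L2 DC model)."""
    L = np.linalg.norm(A, 2) ** 2          # Lipschitz constant of the smooth part
    x, x_prev, t = np.zeros(A.shape[1]), np.zeros(A.shape[1]), 1.0
    for _ in range(iters):
        # Subgradient of the concave part's negative, h(x) = lam*||x||_2, at x.
        nx = np.linalg.norm(x)
        xi = lam * x / nx if nx > 0 else np.zeros_like(x)
        # Nesterov-style extrapolation (one common weight choice; an assumption).
        t_next = (1 + np.sqrt(1 + 4 * t * t)) / 2
        y = x + ((t - 1) / t_next) * (x - x_prev)
        # Proximal gradient step on the convex part, shifted by xi.
        grad = A.T @ (A @ y - b) - xi
        x_prev, x, t = x, soft_threshold(y - grad / L, lam / L), t_next
    return x

x_hat = pdca_e(np.random.randn(50, 100), np.random.randn(50), lam=0.1)
```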

Machine Theory of Mind

Title Machine Theory of Mind
Authors Neil C. Rabinowitz, Frank Perbet, H. Francis Song, Chiyuan Zhang, S. M. Ali Eslami, Matthew Botvinick
Abstract Theory of mind (ToM; Premack & Woodruff, 1978) broadly refers to humans’ ability to represent the mental states of others, including their desires, beliefs, and intentions. We propose to train a machine to build such models too. We design a Theory of Mind neural network – a ToMnet – which uses meta-learning to build models of the agents it encounters, from observations of their behaviour alone. Through this process, it acquires a strong prior model for agents’ behaviour, as well as the ability to bootstrap to richer predictions about agents’ characteristics and mental states using only a small number of behavioural observations. We apply the ToMnet to agents behaving in simple gridworld environments, showing that it learns to model random, algorithmic, and deep reinforcement learning agents from varied populations, and that it passes classic ToM tasks such as the “Sally-Anne” test (Wimmer & Perner, 1983; Baron-Cohen et al., 1985) of recognising that others can hold false beliefs about the world. We argue that this system – which autonomously learns how to model other agents in its world – is an important step forward for developing multi-agent AI systems, for building intermediating technology for machine-human interaction, and for advancing the progress on interpretable AI.
Tasks Meta-Learning
Published 2018-02-21
URL http://arxiv.org/abs/1802.07740v2
PDF http://arxiv.org/pdf/1802.07740v2.pdf
PWC https://paperswithcode.com/paper/machine-theory-of-mind
Repo
Framework
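
The architecture described above factors into a character net, which embeds an agent's past trajectories, and a prediction net, which combines that embedding with the current state. A schematic PyTorch sketch of this split; dimensions and layer choices are assumptions, not the paper's configuration:

```python
import torch
import torch.nn as nn

class ToMnetSketch(nn.Module):
    """Schematic ToMnet-style predictor: a character net embeds an agent's
    past behaviour; a prediction net uses that embedding plus the current
    state to predict the agent's next action. A sketch of the idea only."""
    def __init__(self, obs_dim, n_actions, embed_dim=16):
        super().__init__()
        self.character_net = nn.GRU(obs_dim + n_actions, embed_dim, batch_first=True)
        self.prediction_net = nn.Sequential(
            nn.Linear(obs_dim + embed_dim, 64), nn.ReLU(),
            nn.Linear(64, n_actions))

    def forward(self, past_traj, current_state):
        # past_traj: (B, T, obs_dim + n_actions); current_state: (B, obs_dim)
        _, e_char = self.character_net(past_traj)          # (1, B, embed_dim)
        x = torch.cat([current_state, e_char.squeeze(0)], dim=-1)
        return self.prediction_net(x)                      # next-action logits

net = ToMnetSketch(obs_dim=25, n_actions=5)
logits = net(torch.randn(4, 10, 30), torch.randn(4, 25))
```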

RoarNet: A Robust 3D Object Detection based on RegiOn Approximation Refinement

Title RoarNet: A Robust 3D Object Detection based on RegiOn Approximation Refinement
Authors Kiwoo Shin, Youngwook Paul Kwon, Masayoshi Tomizuka
Abstract We present RoarNet, a new approach for 3D object detection from a 2D image and 3D Lidar point clouds. Based on a two-stage object detection framework with PointNet as our backbone network, we suggest several novel ideas to improve 3D object detection performance. The first part of our method, RoarNet_2D, estimates the 3D poses of objects from a monocular image, which approximates where to examine further, and derives multiple candidates that are geometrically feasible. This step significantly narrows down the feasible 3D regions, which would otherwise require demanding processing of 3D point clouds in a huge search space. Then the second part, RoarNet_3D, takes the candidate regions and conducts in-depth inferences to conclude final poses in a recursive manner. Inspired by PointNet, RoarNet_3D processes 3D point clouds directly without any loss of data, leading to precise detection. We evaluate our method on KITTI, a 3D object detection benchmark. Our results show that RoarNet has superior performance to state-of-the-art methods that are publicly available. Remarkably, RoarNet also outperforms state-of-the-art methods even in settings where Lidar and camera are not time synchronized, which is practically important for actual driving environments. RoarNet is implemented in Tensorflow and publicly available with pre-trained models.
Tasks 3D Object Detection, Object Detection
Published 2018-11-09
URL http://arxiv.org/abs/1811.03818v1
PDF http://arxiv.org/pdf/1811.03818v1.pdf
PWC https://paperswithcode.com/paper/roarnet-a-robust-3d-object-detection-based-on
Repo
Framework
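
The key geometric observation in RoarNet_2D is that a 2D detection plus a size prior pins the object to a narrow band along the camera ray (depth roughly $f \cdot H_{\text{real}} / h_{\text{px}}$ for a pinhole camera). A toy stand-in for that candidate-generation idea, not the paper's actual scheme:

```python
import numpy as np

def candidate_translations(bbox_h_px, real_h_m, focal_px, ray_dir, spacing=0.5, n=5):
    """Generate 3D object-center candidates along the camera ray through a
    2D detection, around the pinhole depth estimate f * H_real / h_px.
    A toy stand-in for RoarNet_2D's region approximation, not its scheme."""
    depth = focal_px * real_h_m / bbox_h_px
    ray = np.asarray(ray_dir, dtype=float)
    ray /= np.linalg.norm(ray)
    offsets = (np.arange(n) - n // 2) * spacing
    return [(depth + d) * ray for d in offsets]

# E.g. a 60 px-tall detection of a ~1.5 m object with a 700 px focal length.
cands = candidate_translations(60, 1.5, 700, ray_dir=[0.1, 0.0, 1.0])
```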

Learning from a tiny dataset of manual annotations: a teacher/student approach for surgical phase recognition

Title Learning from a tiny dataset of manual annotations: a teacher/student approach for surgical phase recognition
Authors Tong Yu, Didier Mutter, Jacques Marescaux, Nicolas Padoy
Abstract Vision algorithms capable of interpreting scenes from a real-time video stream are necessary for computer-assisted surgery systems to achieve context-aware behavior. In laparoscopic procedures one particular algorithm needed for such systems is the identification of surgical phases, for which the current state of the art is a model based on a CNN-LSTM. A number of previous works using models of this kind have trained them in a fully supervised manner, requiring a fully annotated dataset. Instead, our work confronts the problem of learning surgical phase recognition in scenarios presenting scarce amounts of annotated data (under 25% of all available video recordings). We propose a teacher/student type of approach, where a strong predictor called the teacher, trained beforehand on a small dataset of ground truth-annotated videos, generates synthetic annotations for a larger dataset, which another model - the student - learns from. In our case, the teacher features a novel CNN-biLSTM-CRF architecture, designed for offline inference only. The student, on the other hand, is a CNN-LSTM capable of making real-time predictions. Results for various amounts of manually annotated videos demonstrate the superiority of the new CNN-biLSTM-CRF predictor as well as improved performance from the CNN-LSTM trained using synthetic labels generated for unannotated videos. For both offline and online surgical phase recognition with very few annotated recordings available, this new teacher/student strategy provides a valuable performance improvement by efficiently leveraging the unannotated data.
Tasks
Published 2018-11-30
URL http://arxiv.org/abs/1812.00033v2
PDF http://arxiv.org/pdf/1812.00033v2.pdf
PWC https://paperswithcode.com/paper/learning-from-a-tiny-dataset-of-manual
Repo
Framework
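
The training strategy itself is standard pseudo-labeling: a frozen teacher annotates the unlabelled videos offline, and the student fits those synthetic labels. A schematic PyTorch loop with hypothetical `teacher` and `student` models; the CNN-biLSTM-CRF and CNN-LSTM architectures are not reproduced here:

```python
import torch
import torch.nn as nn

def train_student(teacher, student, unlabeled_loader, epochs=10, lr=1e-4):
    """Fit the student on synthetic phase labels produced by a frozen teacher."""
    teacher.eval()
    opt = torch.optim.Adam(student.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for frames in unlabeled_loader:                  # (batch, time, C, H, W)
            with torch.no_grad():                        # teacher runs offline
                pseudo = teacher(frames).argmax(dim=-1)  # (batch, time)
            logits = student(frames)                     # (batch, time, n_phases)
            loss = loss_fn(logits.flatten(0, 1), pseudo.flatten())
            opt.zero_grad()
            loss.backward()
            opt.step()
```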

A Variational Inequality Perspective on Generative Adversarial Networks

Title A Variational Inequality Perspective on Generative Adversarial Networks
Authors Gauthier Gidel, Hugo Berard, Gaëtan Vignoud, Pascal Vincent, Simon Lacoste-Julien
Abstract Generative adversarial networks (GANs) form a generative modeling approach known for producing appealing samples, but they are notably difficult to train. One common way to tackle this issue has been to propose new formulations of the GAN objective. Yet, surprisingly few studies have looked at optimization methods designed for this adversarial training. In this work, we cast GAN optimization problems in the general variational inequality framework. Tapping into the mathematical programming literature, we counter some common misconceptions about the difficulties of saddle point optimization and propose to extend techniques designed for variational inequalities to the training of GANs. We apply averaging, extrapolation and a computationally cheaper variant that we call extrapolation from the past to the stochastic gradient method (SGD) and Adam.
Tasks
Published 2018-02-28
URL http://arxiv.org/abs/1802.10551v4
PDF http://arxiv.org/pdf/1802.10551v4.pdf
PWC https://paperswithcode.com/paper/a-variational-inequality-perspective-on
Repo
Framework
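
Extrapolation (the extragradient method) is the centerpiece technique: compute the gradient at a lookahead point, then apply it from the original point. A minimal PyTorch sketch of one such step for a single objective; in a GAN, each player would apply it to its own loss (the discriminator to the negated loss), which this sketch leaves out:

```python
import torch

def extragradient_step(params, loss_fn, lr=1e-3):
    """One extragradient update: take a gradient step to a lookahead point,
    then re-evaluate the gradient there and apply it from the saved point."""
    saved = [p.detach().clone() for p in params]
    grads = torch.autograd.grad(loss_fn(), params)       # gradient at x_t
    with torch.no_grad():                                # lookahead x_{t+1/2}
        for p, g in zip(params, grads):
            p -= lr * g
    grads = torch.autograd.grad(loss_fn(), params)       # gradient at lookahead
    with torch.no_grad():                                # update from x_t
        for p, s, g in zip(params, saved, grads):
            p.copy_(s - lr * g)

# Toy usage: minimize a quadratic.
x = torch.randn(3, requires_grad=True)
for _ in range(100):
    extragradient_step([x], lambda: (x ** 2).sum())
```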

Japanese Predicate Conjugation for Neural Machine Translation

Title Japanese Predicate Conjugation for Neural Machine Translation
Authors Michiki Kurosawa, Yukio Matsumura, Hayahide Yamagishi, Mamoru Komachi
Abstract Neural machine translation (NMT) has a drawback in that it can generate only high-frequency words owing to the computational costs of the softmax function in the output layer. In Japanese-English NMT, Japanese predicate conjugation causes an increase in vocabulary size. For example, one verb can have as many as 19 surface varieties. In this research, we focus on predicate conjugation for compressing the vocabulary size in Japanese, since the vocabulary list is filled with the various forms of verbs. We propose methods that use predicate conjugation information without discarding linguistic information. The proposed methods can generate low-frequency words and deal with unknown words. Two methods were considered to introduce conjugation information: the first treats it as a token (conjugation token) and the second as an embedded vector (conjugation feature). The results demonstrate that the vocabulary size can be compressed by approximately 86.1% (Tanaka corpus) and that the NMT models can output words not in the training data set. Furthermore, BLEU scores improved by 0.91 points in Japanese-to-English translation and 0.32 points in English-to-Japanese translation on ASPEC.
Tasks Machine Translation
Published 2018-05-25
URL http://arxiv.org/abs/1805.10047v1
PDF http://arxiv.org/pdf/1805.10047v1.pdf
PWC https://paperswithcode.com/paper/japanese-predicate-conjugation-for-neural
Repo
Framework
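
The conjugation-token method factors each inflected predicate into its base form plus a token naming the conjugation, so the vocabulary only needs the base forms. A toy illustration with a hard-coded lemma table; a real system would obtain this mapping from a morphological analyzer rather than a dictionary:

```python
# Hypothetical lemma/conjugation table for a single verb; a real system
# would derive this from a morphological analyzer, not a hard-coded dict.
CONJUGATIONS = {
    "書いた": ("書く", "<past>"),
    "書かない": ("書く", "<negative>"),
    "書けば": ("書く", "<conditional>"),
}

def to_conjugation_tokens(word):
    """Replace an inflected predicate with its lemma plus a conjugation token."""
    lemma, tag = CONJUGATIONS.get(word, (word, ""))
    return [lemma, tag] if tag else [lemma]

print(to_conjugation_tokens("書いた"))   # ['書く', '<past>']
```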

Investigating context features hidden in End-to-End TTS

Title Investigating context features hidden in End-to-End TTS
Authors Kohki Mametani, Tsuneo Kato, Seiichi Yamamoto
Abstract Recent studies have introduced end-to-end TTS, which integrates the production of context and acoustic features in statistical parametric speech synthesis. As a result, a single neural network replaced laborious feature engineering with automated feature learning. However, little is known about what types of context information end-to-end TTS extracts from text input before synthesizing speech, and the previous knowledge about context features is barely utilized. In this work, we first point out the model similarity between end-to-end TTS and parametric TTS. Based on the similarity, we evaluate the quality of encoder outputs from an end-to-end TTS system against eight criteria that are derived from a standard set of context information used in parametric TTS. We conduct experiments using an evaluation procedure that has been newly developed in the machine learning literature for quantitative analysis of neural representations, while adapting it to the TTS domain. Experimental results show that the encoder outputs reflect both linguistic and phonetic contexts, such as vowel reduction at phoneme level, lexical stress at syllable level, and part-of-speech at word level, possibly due to the joint optimization of context and acoustic features.
Tasks Feature Engineering, Speech Synthesis
Published 2018-11-04
URL http://arxiv.org/abs/1811.01376v2
PDF http://arxiv.org/pdf/1811.01376v2.pdf
PWC https://paperswithcode.com/paper/investigating-context-features-hidden-in-end
Repo
Framework
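
Operationally, each criterion becomes a small diagnostic classifier ("probe") trained to predict a context feature from the encoder outputs; high held-out accuracy indicates the feature is linearly recoverable. A generic scikit-learn sketch of that procedure, with synthetic stand-in data:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def probe_accuracy(encoder_outputs, context_labels):
    """Train a linear probe to predict a context feature (e.g. lexical
    stress) from encoder outputs; accuracy measures how recoverable it is."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        encoder_outputs, context_labels, test_size=0.2, random_state=0)
    probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    return probe.score(X_te, y_te)

# Synthetic stand-in: 1000 encoder frames of dim 256, binary stress labels.
acc = probe_accuracy(np.random.randn(1000, 256), np.random.randint(0, 2, 1000))
```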

Deep Bayesian Active Learning for Natural Language Processing: Results of a Large-Scale Empirical Study

Title Deep Bayesian Active Learning for Natural Language Processing: Results of a Large-Scale Empirical Study
Authors Aditya Siddhant, Zachary C. Lipton
Abstract Several recent papers investigate Active Learning (AL) for mitigating the data dependence of deep learning for natural language processing. However, the applicability of AL to real-world problems remains an open question. While in supervised learning, practitioners can try many different methods, evaluating each against a validation set before selecting a model, AL affords no such luxury. Over the course of one AL run, an agent annotates its dataset, exhausting its labeling budget. Thus, given a new task, an active learner has no opportunity to compare models and acquisition functions. This paper provides a large-scale empirical study of deep active learning, addressing multiple tasks and, for each, multiple datasets, multiple models, and a full suite of acquisition functions. We find that across all settings, Bayesian active learning by disagreement, using uncertainty estimates provided either by Dropout or Bayes-by-Backprop, significantly improves over i.i.d. baselines and usually outperforms classic uncertainty sampling.
Tasks Active Learning
Published 2018-08-16
URL http://arxiv.org/abs/1808.05697v3
PDF http://arxiv.org/pdf/1808.05697v3.pdf
PWC https://paperswithcode.com/paper/deep-bayesian-active-learning-for-natural
Repo
Framework
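
Bayesian active learning by disagreement (BALD) scores an example by the mutual information between its label and the model parameters, estimated from stochastic forward passes (e.g. with dropout left active at test time). A minimal NumPy sketch of the acquisition computation:

```python
import numpy as np

def bald_scores(mc_probs, eps=1e-12):
    """mc_probs: (T, N, C) class probabilities from T stochastic forward
    passes (e.g. with dropout active). Returns a BALD score per example."""
    mean = mc_probs.mean(axis=0)                               # (N, C)
    predictive_entropy = -(mean * np.log(mean + eps)).sum(-1)  # H[y|x, D]
    expected_entropy = -(mc_probs * np.log(mc_probs + eps)).sum(-1).mean(0)
    return predictive_entropy - expected_entropy   # mutual information

# Acquire the K most informative unlabeled examples.
scores = bald_scores(np.random.dirichlet(np.ones(5), size=(20, 100)))
top_k = np.argsort(-scores)[:10]
```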

Spatio-Temporal Structured Sparse Regression with Hierarchical Gaussian Process Priors

Title Spatio-Temporal Structured Sparse Regression with Hierarchical Gaussian Process Priors
Authors Danil Kuzin, Olga Isupova, Lyudmila Mihaylova
Abstract This paper introduces a new sparse spatio-temporal structured Gaussian process regression framework for online and offline Bayesian inference. This is the first framework that gives a time-evolving representation of the interdependencies between the components of the sparse signal of interest. A hierarchical Gaussian process describes such structure, and the interdependencies are represented via the covariance matrices of the prior distributions. The inference is based on the expectation propagation method, and the theoretical derivation of the posterior distribution is provided in the paper. The inference framework is thoroughly evaluated over synthetic, real video and electroencephalography (EEG) data where the spatio-temporal evolving patterns need to be reconstructed with high accuracy. It achieves a 15% improvement in F-measure compared with the alternating direction method of multipliers, the spatio-temporal sparse Bayesian learning method and a one-level Gaussian process model. Additionally, the proposed algorithm requires less memory than the one-level Gaussian process model. This structured sparse regression framework is of broad applicability to source localisation and object detection problems with sparse signals.
Tasks Bayesian Inference, EEG, Object Detection
Published 2018-07-15
URL http://arxiv.org/abs/1807.05561v1
PDF http://arxiv.org/pdf/1807.05561v1.pdf
PWC https://paperswithcode.com/paper/spatio-temporal-structured-sparse-regression
Repo
Framework
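
A much-simplified way to see the prior structure is a separable spatio-temporal covariance built as a Kronecker product of a temporal and a spatial kernel; the paper's hierarchical construction is richer, so treat this NumPy sketch as illustration only:

```python
import numpy as np

def rbf_kernel(x, lengthscale):
    """Squared-exponential kernel matrix for 1-D inputs."""
    d = x[:, None] - x[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)

# Separable spatio-temporal prior: K = K_time (Kronecker product) K_space.
t = np.linspace(0, 1, 10)   # 10 time steps
s = np.linspace(0, 1, 20)   # 20 spatial locations
K = np.kron(rbf_kernel(t, 0.2), rbf_kernel(s, 0.1))
# Draw one signal whose spatial pattern evolves smoothly over time.
sample = np.random.multivariate_normal(np.zeros(len(K)), K + 1e-8 * np.eye(len(K)))
frames = sample.reshape(10, 20)   # (time, space)
```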

Dense Recurrent Neural Networks for Scene Labeling

Title Dense Recurrent Neural Networks for Scene Labeling
Authors Heng Fan, Haibin Ling
Abstract Recently, recurrent neural networks (RNNs) have demonstrated the ability to improve scene labeling by capturing long-range dependencies among image units. In this paper, we propose dense RNNs for scene labeling that explore various long-range semantic dependencies among image units. In comparison with existing RNN-based approaches, our dense RNNs capture richer contextual dependencies for each image unit via dense connections between each pair of image units, which significantly enhances their discriminative power. In addition, to select relevant dependencies and restrain irrelevant ones for each unit, we introduce an attention model into dense RNNs. The attention model automatically assigns more importance to helpful dependencies and less to unconcerned ones. Integrated with convolutional neural networks (CNNs), our method achieves state-of-the-art performance on the PASCAL Context, MIT ADE20K and SiftFlow benchmarks.
Tasks Scene Labeling
Published 2018-01-21
URL http://arxiv.org/abs/1801.06831v1
PDF http://arxiv.org/pdf/1801.06831v1.pdf
PWC https://paperswithcode.com/paper/dense-recurrent-neural-networks-for-scene
Repo
Framework
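
The attention model's job is to reweight the dense unit-to-unit dependencies before aggregation. A schematic PyTorch stand-in using scaled dot-product attention over all image units; not the paper's exact formulation:

```python
import torch
import torch.nn.functional as F

def dense_attention_aggregate(h, W_q, W_k):
    """h: (N, D) hidden states of N image units. Every unit attends over all
    units (the dense connections) and aggregates a weighted context vector.
    A schematic stand-in for the paper's attention model."""
    q, k = h @ W_q, h @ W_k                                  # (N, D')
    attn = F.softmax(q @ k.t() / k.shape[1] ** 0.5, dim=-1)  # (N, N) weights
    return attn @ h                                          # (N, D) contexts

h = torch.randn(64, 128)                     # 64 image units, 128-dim states
ctx = dense_attention_aggregate(h, torch.randn(128, 32), torch.randn(128, 32))
```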

Addressing Two Problems in Deep Knowledge Tracing via Prediction-Consistent Regularization

Title Addressing Two Problems in Deep Knowledge Tracing via Prediction-Consistent Regularization
Authors Chun-Kit Yeung, Dit-Yan Yeung
Abstract Knowledge tracing is one of the key research areas for empowering personalized education. It is the task of modeling students' mastery level of a knowledge component (KC) based on their historical learning trajectories. In recent years, a recurrent neural network model called deep knowledge tracing (DKT) has been proposed to handle the knowledge tracing task, and the literature has shown that DKT generally outperforms traditional methods. However, through extensive experimentation, we have noticed two major problems in the DKT model. The first problem is that the model fails to reconstruct the observed input. As a result, even when a student performs well on a KC, the prediction of that KC's mastery level decreases instead, and vice versa. Second, the predicted performance for KCs across time-steps is not consistent. This is undesirable and unreasonable because a student's performance is expected to transition gradually over time. To address these problems, we introduce regularization terms that correspond to reconstruction and waviness to the loss function of the original DKT model to enhance the consistency in prediction. Experiments show that the regularized loss function effectively alleviates the two problems without degrading the original task of DKT.
Tasks Knowledge Tracing
Published 2018-06-06
URL http://arxiv.org/abs/1806.02180v1
PDF http://arxiv.org/pdf/1806.02180v1.pdf
PWC https://paperswithcode.com/paper/addressing-two-problems-in-deep-knowledge
Repo
Framework
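
Concretely, the regularized objective adds a reconstruction term (the current prediction should also explain the current answer) and two waviness penalties (L1 and L2 norms of consecutive prediction differences) to the original DKT loss. A schematic PyTorch sketch; the weighting coefficients are placeholders and the indexing convention is simplified:

```python
import torch
import torch.nn.functional as F

def dkt_regularized_loss(pred, target, kc, lam_r=0.1, lam_w1=0.03, lam_w2=3.0):
    """pred: (T, n_kc) predicted mastery probabilities per time-step;
    target: (T,) float 0/1 correctness; kc: (T,) long index of answered KC.
    Placeholder coefficients; a sketch, not the paper's exact code."""
    t = torch.arange(pred.shape[0] - 1)
    # Original DKT objective: prediction at step t for the KC answered at t+1.
    main = F.binary_cross_entropy(pred[t, kc[t + 1]], target[t + 1])
    # Reconstruction: prediction at step t should explain the answer at t.
    recon = F.binary_cross_entropy(pred[t, kc[t]], target[t])
    # Waviness: penalize abrupt changes in predictions between time-steps.
    diff = pred[1:] - pred[:-1]
    return main + lam_r * recon + lam_w1 * diff.abs().mean() + lam_w2 * (diff ** 2).mean()
```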

Scientific Relation Extraction with Selectively Incorporated Concept Embeddings

Title Scientific Relation Extraction with Selectively Incorporated Concept Embeddings
Authors Yi Luan, Mari Ostendorf, Hannaneh Hajishirzi
Abstract This paper describes our submission for the SemEval 2018 Task 7 shared task on semantic relation extraction and classification in scientific papers. We extend the end-to-end relation extraction model of (Miwa and Bansal) with enhancements such as a character-level encoding attention mechanism for selecting pretrained concept candidate embeddings. Our official submission ranked second in the relation classification task (Subtask 1.1 and Subtask 2 Scenario 2) and first in the relation extraction task (Subtask 2 Scenario 1).
Tasks Relation Classification, Relation Extraction
Published 2018-08-26
URL http://arxiv.org/abs/1808.08643v1
PDF http://arxiv.org/pdf/1808.08643v1.pdf
PWC https://paperswithcode.com/paper/scientific-relation-extraction-with
Repo
Framework

Multi-Frame Quality Enhancement for Compressed Video

Title Multi-Frame Quality Enhancement for Compressed Video
Authors Ren Yang, Mai Xu, Zulin Wang, Tianyi Li
Abstract The past few years have witnessed great success in applying deep learning to enhance the quality of compressed image/video. Existing approaches mainly focus on enhancing the quality of a single frame, ignoring the similarity between consecutive frames. In this paper, we observe that heavy quality fluctuation exists across compressed video frames, and thus low-quality frames can be enhanced using neighboring high-quality frames, a task we call Multi-Frame Quality Enhancement (MFQE). Accordingly, this paper proposes an MFQE approach for compressed video, as a first attempt in this direction. In our approach, we first develop a Support Vector Machine (SVM) based detector to locate Peak Quality Frames (PQFs) in compressed video. Then, a novel Multi-Frame Convolutional Neural Network (MF-CNN) is designed to enhance the quality of compressed video, in which each non-PQF and its two nearest PQFs are used as the input. The MF-CNN compensates motion between the non-PQF and PQFs through the Motion Compensation subnet (MC-subnet). Subsequently, the Quality Enhancement subnet (QE-subnet) reduces compression artifacts of the non-PQF with the help of its nearest PQFs. Finally, experiments validate the effectiveness and generality of our MFQE approach in advancing the state of the art in quality enhancement of compressed video. The code of our MFQE approach is available at https://github.com/ryangBUAA/MFQE.git
Tasks Motion Compensation
Published 2018-03-13
URL http://arxiv.org/abs/1803.04680v4
PDF http://arxiv.org/pdf/1803.04680v4.pdf
PWC https://paperswithcode.com/paper/multi-frame-quality-enhancement-for
Repo
Framework
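
The first stage only needs to locate PQFs. Where the paper trains an SVM on compressed-domain features, a simpler heuristic stand-in is to mark local maxima of a per-frame quality signal; a sketch assuming per-frame PSNR values are available:

```python
import numpy as np

def find_pqfs(frame_quality):
    """Indices of Peak Quality Frames: local maxima of a per-frame quality
    signal (e.g. PSNR). A heuristic stand-in for the paper's SVM detector."""
    q = np.asarray(frame_quality)
    return [i for i in range(1, len(q) - 1) if q[i] >= q[i - 1] and q[i] >= q[i + 1]]

def nearest_pqfs(non_pqf, pqfs):
    """The two PQFs bracketing a non-PQF form the MF-CNN input with it."""
    before = max((p for p in pqfs if p < non_pqf), default=None)
    after = min((p for p in pqfs if p > non_pqf), default=None)
    return before, after

pqfs = find_pqfs([30.1, 32.5, 31.0, 30.2, 33.0, 31.5])   # -> [1, 4]
print(nearest_pqfs(2, pqfs))                             # -> (1, 4)
```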

IEA: Inner Ensemble Average within a convolutional neural network

Title IEA: Inner Ensemble Average within a convolutional neural network
Authors Abduallah Mohamed, Xinrui Hua, Xianda Zhou, Christian Claudel
Abstract Ensemble learning combines multiple trained models to improve model accuracy. We propose using such methods, specifically ensemble averaging, inside Convolutional Neural Network (CNN) architectures by replacing single convolutional layers with Inner Average Ensembles (IEA) of multiple convolutional layers. Empirical results on different benchmark datasets show that CNN models using IEA outperform those with regular convolutional layers. A visual analysis and a similarity-score analysis of the features generated by IEA explain why it boosts model performance.
Tasks
Published 2018-08-30
URL https://arxiv.org/abs/1808.10350v5
PDF https://arxiv.org/pdf/1808.10350v5.pdf
PWC https://paperswithcode.com/paper/iea-inner-ensemble-average-within-a
Repo
Framework
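
The IEA idea maps directly onto a drop-in module: replace a single convolution with the element-wise average of m parallel convolutions. A minimal PyTorch sketch of that substitution, not the authors' reference code:

```python
import torch
import torch.nn as nn

class InnerEnsembleAverage(nn.Module):
    """Element-wise average of m parallel convolutions, used in place of a
    single conv layer; a sketch of the IEA idea, not the reference code."""
    def __init__(self, in_ch, out_ch, kernel_size=3, m=3, padding=1):
        super().__init__()
        self.convs = nn.ModuleList(
            [nn.Conv2d(in_ch, out_ch, kernel_size, padding=padding) for _ in range(m)])

    def forward(self, x):
        return torch.stack([conv(x) for conv in self.convs]).mean(dim=0)

layer = InnerEnsembleAverage(3, 16)
out = layer(torch.randn(1, 3, 32, 32))   # (1, 16, 32, 32)
```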