Paper Group ANR 588
Learning Single-Image Depth from Videos using Quality Assessment Networks
Title | Learning Single-Image Depth from Videos using Quality Assessment Networks |
Authors | Weifeng Chen, Shengyi Qian, Jia Deng |
Abstract | Depth estimation from a single image in the wild remains a challenging problem. One main obstacle is the lack of high-quality training data for images in the wild. In this paper we propose a method to automatically generate such data through Structure-from-Motion (SfM) on Internet videos. The core of this method is a Quality Assessment Network that identifies high-quality reconstructions obtained from SfM. Using this method, we collect single-view depth training data from a large number of YouTube videos and construct a new dataset called YouTube3D. Experiments show that YouTube3D is useful in training depth estimation networks and advances the state of the art of single-view depth estimation in the wild. |
Tasks | Depth Estimation |
Published | 2018-06-25 |
URL | http://arxiv.org/abs/1806.09573v3 |
PDF | http://arxiv.org/pdf/1806.09573v3.pdf |
PWC | https://paperswithcode.com/paper/learning-single-image-depth-from-videos-using |
Repo | |
Framework | |
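Training on SfM reconstructions like those in YouTube3D typically means supervising with relative depth between point pairs rather than metric depth. Below is a minimal numpy sketch of a ranking-style loss over predicted log-depths; the exact formulation is an assumption based on the authors' earlier depth-in-the-wild line of work, and all names and values are illustrative.

```python
import numpy as np

def relative_depth_ranking_loss(z_a, z_b, ordinal):
    """Ranking loss over predicted log-depths at two points (a sketch).

    ordinal = +1 if point A is annotated as farther than B, -1 if closer,
    0 if the pair is annotated as roughly equal depth.
    """
    diff = z_a - z_b
    if ordinal == 0:
        return diff ** 2                              # pull equal pairs together
    return np.log(1.0 + np.exp(-ordinal * diff))      # push ordered pairs apart

# Toy usage: predicted log-depths at two sampled pixels, A farther than B.
print(relative_depth_ranking_loss(1.3, 0.7, ordinal=+1))
```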
A refined convergence analysis of pDCA$_e$ with applications to simultaneous sparse recovery and outlier detection
Title | A refined convergence analysis of pDCA$_e$ with applications to simultaneous sparse recovery and outlier detection |
Authors | Tianxiang Liu, Ting Kei Pong, Akiko Takeda |
Abstract | We consider the problem of minimizing a difference-of-convex (DC) function, which can be written as the sum of a smooth convex function with Lipschitz gradient, a proper closed convex function and a continuous possibly nonsmooth concave function. We refine the convergence analysis in [38] for the proximal DC algorithm with extrapolation (pDCA$_e$) and show that the whole sequence generated by the algorithm is convergent when the objective is level-bounded, {\em without} imposing differentiability assumptions in the concave part. Our analysis is based on a new potential function and we assume such a function is a Kurdyka-{\L}ojasiewicz (KL) function. We also establish a relationship between our KL assumption and the one used in [38]. Finally, we demonstrate how the pDCA$_e$ can be applied to a class of simultaneous sparse recovery and outlier detection problems arising from robust compressed sensing in signal processing and least trimmed squares regression in statistics. Specifically, we show that the objectives of these problems can be written as level-bounded DC functions whose concave parts are {\em typically nonsmooth}. Moreover, for a large class of loss functions and regularizers, the KL exponent of the corresponding potential function is shown to be 1/2, which implies that the pDCA$_e$ is locally linearly convergent when applied to these problems. Our numerical experiments show that the pDCA$_e$ usually outperforms the proximal DC algorithm with nonmonotone linesearch [24, Appendix A] in both CPU time and solution quality for this particular application. |
Tasks | Outlier Detection |
Published | 2018-04-19 |
URL | http://arxiv.org/abs/1804.07213v1 |
PDF | http://arxiv.org/pdf/1804.07213v1.pdf |
PWC | https://paperswithcode.com/paper/a-refined-convergence-analysis-of-pdca_e-with |
Repo | |
Framework | |
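For concreteness, here is a minimal sketch of the pDCA$_e$ iteration for minimizing $f + P_1 - P_2$: extrapolate, pick a subgradient of the concave part, then take a proximal gradient step. The FISTA-style extrapolation schedule and the $\ell_1$/zero choices of $P_1$ and $P_2$ are illustrative assumptions (the paper's scheme keeps the extrapolation weights bounded away from 1, e.g. via restarts).

```python
import numpy as np

def pdca_e(grad_f, L, prox_P1, subgrad_P2, x0, iters=500):
    """pDCA_e sketch: minimize f(x) + P1(x) - P2(x), with f smooth convex
    (L-Lipschitz gradient), P1 proper closed convex, P2 continuous convex."""
    x_prev = x = x0.copy()
    t_prev = t = 1.0
    for _ in range(iters):
        beta = (t_prev - 1.0) / t              # extrapolation weight (FISTA-style)
        y = x + beta * (x - x_prev)            # extrapolated point
        xi = subgrad_P2(x)                     # any subgradient of P2 at x^k
        x_prev, x = x, prox_P1(y - (grad_f(y) - xi) / L, 1.0 / L)
        t_prev, t = t, 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t * t))
    return x

# Illustrative instance: f = 0.5*||Ax - b||^2, P1 = lam*||x||_1, P2 = 0.
rng = np.random.default_rng(0)
A, b, lam = rng.standard_normal((20, 50)), rng.standard_normal(20), 0.1
soft = lambda v, s: np.sign(v) * np.maximum(np.abs(v) - s, 0.0)
x_hat = pdca_e(grad_f=lambda x: A.T @ (A @ x - b),
               L=np.linalg.norm(A, 2) ** 2,    # Lipschitz constant of grad f
               prox_P1=lambda v, s: soft(v, lam * s),
               subgrad_P2=lambda x: np.zeros_like(x),
               x0=np.zeros(50))
```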
Machine Theory of Mind
Title | Machine Theory of Mind |
Authors | Neil C. Rabinowitz, Frank Perbet, H. Francis Song, Chiyuan Zhang, S. M. Ali Eslami, Matthew Botvinick |
Abstract | Theory of mind (ToM; Premack & Woodruff, 1978) broadly refers to humans’ ability to represent the mental states of others, including their desires, beliefs, and intentions. We propose to train a machine to build such models too. We design a Theory of Mind neural network – a ToMnet – which uses meta-learning to build models of the agents it encounters, from observations of their behaviour alone. Through this process, it acquires a strong prior model for agents’ behaviour, as well as the ability to bootstrap to richer predictions about agents’ characteristics and mental states using only a small number of behavioural observations. We apply the ToMnet to agents behaving in simple gridworld environments, showing that it learns to model random, algorithmic, and deep reinforcement learning agents from varied populations, and that it passes classic ToM tasks such as the “Sally-Anne” test (Wimmer & Perner, 1983; Baron-Cohen et al., 1985) of recognising that others can hold false beliefs about the world. We argue that this system – which autonomously learns how to model other agents in its world – is an important step forward for developing multi-agent AI systems, for building intermediating technology for machine-human interaction, and for advancing the progress on interpretable AI. |
Tasks | Meta-Learning |
Published | 2018-02-21 |
URL | http://arxiv.org/abs/1802.07740v2 |
PDF | http://arxiv.org/pdf/1802.07740v2.pdf |
PWC | https://paperswithcode.com/paper/machine-theory-of-mind |
Repo | |
Framework | |
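A schematic PyTorch sketch of the three-component structure the abstract describes: a character net summarizing an agent's past episodes, a mental-state net summarizing the current episode, and a prediction net mapping both embeddings plus the current observation to next-action logits. Layer types, sizes, and names are illustrative assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn

class ToMnetSketch(nn.Module):
    """Schematic ToMnet; all architectural choices here are illustrative."""
    def __init__(self, obs_dim, n_actions, e_char=8, e_mental=8, hidden=64):
        super().__init__()
        self.char_net = nn.GRU(obs_dim, e_char, batch_first=True)
        self.mental_net = nn.GRU(obs_dim + e_char, e_mental, batch_first=True)
        self.pred_net = nn.Sequential(
            nn.Linear(e_char + e_mental + obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions))

    def forward(self, past_traj, current_traj, query_obs):
        _, h_char = self.char_net(past_traj)          # summarize past behaviour
        e_char = h_char[-1]                           # character embedding
        ctx = e_char.unsqueeze(1).expand(-1, current_traj.size(1), -1)
        _, h_mental = self.mental_net(torch.cat([current_traj, ctx], dim=-1))
        feats = torch.cat([e_char, h_mental[-1], query_obs], dim=-1)
        return self.pred_net(feats)                   # next-action logits

net = ToMnetSketch(obs_dim=10, n_actions=5)
logits = net(torch.randn(2, 6, 10), torch.randn(2, 3, 10), torch.randn(2, 10))
```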
RoarNet: A Robust 3D Object Detection based on RegiOn Approximation Refinement
Title | RoarNet: A Robust 3D Object Detection based on RegiOn Approximation Refinement |
Authors | Kiwoo Shin, Youngwook Paul Kwon, Masayoshi Tomizuka |
Abstract | We present RoarNet, a new approach for 3D object detection from a 2D image and 3D Lidar point clouds. Based on a two-stage object detection framework with PointNet as our backbone network, we suggest several novel ideas to improve 3D object detection performance. The first part of our method, RoarNet_2D, estimates the 3D poses of objects from a monocular image, which approximates where to examine further, and derives multiple candidates that are geometrically feasible. This step significantly narrows down feasible 3D regions, which otherwise would require demanding processing of 3D point clouds in a huge search space. Then the second part, RoarNet_3D, takes the candidate regions and conducts in-depth inferences to conclude final poses in a recursive manner. Inspired by PointNet, RoarNet_3D processes 3D point clouds directly without any loss of data, leading to precise detection. We evaluate our method on KITTI, a 3D object detection benchmark. Our results show that RoarNet has superior performance to state-of-the-art methods that are publicly available. Remarkably, RoarNet also outperforms state-of-the-art methods even in settings where Lidar and camera are not time synchronized, which is practically important for actual driving environments. RoarNet is implemented in TensorFlow and publicly available with pre-trained models. |
Tasks | 3D Object Detection, Object Detection |
Published | 2018-11-09 |
URL | http://arxiv.org/abs/1811.03818v1 |
PDF | http://arxiv.org/pdf/1811.03818v1.pdf |
PWC | https://paperswithcode.com/paper/roarnet-a-robust-3d-object-detection-based-on |
Repo | |
Framework | |
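A hedged sketch of the "region approximation" intuition: a monocular pose estimate pins the object to a viewing ray, and sampling depths along that ray yields a small set of geometrically feasible candidate regions for the point-cloud stage to refine. The intrinsics and depth range below are illustrative, not the paper's settings.

```python
import numpy as np

def candidates_along_ray(K, box_center_px, depth_range=(5.0, 60.0), n=16):
    """Geometrically feasible 3D center candidates for one 2D detection.

    Back-projects the detection's pixel center through the camera
    intrinsics K and samples candidate object centers at n depths.
    """
    u, v = box_center_px
    ray = np.linalg.inv(K) @ np.array([u, v, 1.0])   # back-project pixel to a ray
    ray /= ray[2]                                    # normalize so z = 1
    depths = np.linspace(*depth_range, n)
    return depths[:, None] * ray[None, :]            # (n, 3) candidate centers

K = np.array([[721.5, 0.0, 609.6],                   # KITTI-like intrinsics
              [0.0, 721.5, 172.9],
              [0.0, 0.0, 1.0]])
print(candidates_along_ray(K, (650.0, 180.0))[:3])
```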
Learning from a tiny dataset of manual annotations: a teacher/student approach for surgical phase recognition
Title | Learning from a tiny dataset of manual annotations: a teacher/student approach for surgical phase recognition |
Authors | Tong Yu, Didier Mutter, Jacques Marescaux, Nicolas Padoy |
Abstract | Vision algorithms capable of interpreting scenes from a real-time video stream are necessary for computer-assisted surgery systems to achieve context-aware behavior. In laparoscopic procedures one particular algorithm needed for such systems is the identification of surgical phases, for which the current state of the art is a model based on a CNN-LSTM. A number of previous works using models of this kind have trained them in a fully supervised manner, requiring a fully annotated dataset. Instead, our work confronts the problem of learning surgical phase recognition in scenarios presenting scarce amounts of annotated data (under 25% of all available video recordings). We propose a teacher/student type of approach, where a strong predictor called the teacher, trained beforehand on a small dataset of ground truth-annotated videos, generates synthetic annotations for a larger dataset, which another model - the student - learns from. In our case, the teacher features a novel CNN-biLSTM-CRF architecture, designed for offline inference only. The student, on the other hand, is a CNN-LSTM capable of making real-time predictions. Results for various amounts of manually annotated videos demonstrate the superiority of the new CNN-biLSTM-CRF predictor as well as improved performance from the CNN-LSTM trained using synthetic labels generated for unannotated videos. For both offline and online surgical phase recognition with very few annotated recordings available, this new teacher/student strategy provides a valuable performance improvement by efficiently leveraging the unannotated data. |
Tasks | |
Published | 2018-11-30 |
URL | http://arxiv.org/abs/1812.00033v2 |
PDF | http://arxiv.org/pdf/1812.00033v2.pdf |
PWC | https://paperswithcode.com/paper/learning-from-a-tiny-dataset-of-manual |
Repo | |
Framework | |
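The teacher/student round itself is simple to sketch: the pre-trained offline teacher annotates the large unlabeled pool, and the student trains on real plus synthetic labels. The stand-in class and array shapes below are illustrative placeholders, not the paper's CNN-biLSTM-CRF or CNN-LSTM.

```python
import numpy as np

class OfflineTeacher:
    """Stand-in for the pre-trained offline predictor (a CNN-biLSTM-CRF
    in the paper); predict_phases returns a per-frame phase sequence."""
    def predict_phases(self, video_feats):
        return np.zeros(len(video_feats), dtype=int)   # dummy phase labels

teacher = OfflineTeacher()
labeled = [(np.random.rand(100, 8), np.random.randint(0, 7, size=100))]
unannotated = [np.random.rand(100, 8) for _ in range(3)]

# Teacher annotates the unlabeled videos; the student (an online CNN-LSTM
# in the paper) then trains on the union of real and synthetic labels.
synthetic = [(v, teacher.predict_phases(v)) for v in unannotated]
training_set = labeled + synthetic
```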
A Variational Inequality Perspective on Generative Adversarial Networks
Title | A Variational Inequality Perspective on Generative Adversarial Networks |
Authors | Gauthier Gidel, Hugo Berard, Gaëtan Vignoud, Pascal Vincent, Simon Lacoste-Julien |
Abstract | Generative adversarial networks (GANs) form a generative modeling approach known for producing appealing samples, but they are notably difficult to train. One common way to tackle this issue has been to propose new formulations of the GAN objective. Yet, surprisingly few studies have looked at optimization methods designed for this adversarial training. In this work, we cast GAN optimization problems in the general variational inequality framework. Tapping into the mathematical programming literature, we counter some common misconceptions about the difficulties of saddle point optimization and propose to extend techniques designed for variational inequalities to the training of GANs. We apply averaging, extrapolation and a computationally cheaper variant that we call extrapolation from the past to the stochastic gradient method (SGD) and Adam. |
Tasks | |
Published | 2018-02-28 |
URL | http://arxiv.org/abs/1802.10551v4 |
PDF | http://arxiv.org/pdf/1802.10551v4.pdf |
PWC | https://paperswithcode.com/paper/a-variational-inequality-perspective-on |
Repo | |
Framework | |
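A minimal numpy sketch of the "extrapolation from the past" variant applied to the vector field of a toy bilinear game; the step size and iteration count are illustrative. Unlike the full extragradient method, the lookahead reuses the previous extrapolated gradient, so each iteration costs a single evaluation of the field.

```python
import numpy as np

def extrapolation_from_the_past(F, x0, eta=0.1, steps=200):
    """Sketch of extrapolation from the past for a variational inequality.

    F is the game's vector field (the players' stacked gradients).
    """
    x, g_prev = x0.copy(), F(x0)
    for _ in range(steps):
        y = x - eta * g_prev        # extrapolate using the stored gradient
        g_prev = F(y)               # one fresh evaluation per iteration
        x = x - eta * g_prev        # update from the lookahead point
    return x

# Toy bilinear game min_u max_v u*v: plain gradient steps cycle forever,
# while this scheme converges to the equilibrium at the origin.
F = lambda z: np.array([z[1], -z[0]])   # (d/du of u*v, -d/dv of u*v)
print(extrapolation_from_the_past(F, np.array([1.0, 1.0])))
```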
Japanese Predicate Conjugation for Neural Machine Translation
Title | Japanese Predicate Conjugation for Neural Machine Translation |
Authors | Michiki Kurosawa, Yukio Matsumura, Hayahide Yamagishi, Mamoru Komachi |
Abstract | Neural machine translation (NMT) has a drawback in that it can generate only high-frequency words owing to the computational cost of the softmax function in the output layer. In Japanese-English NMT, Japanese predicate conjugation causes an increase in vocabulary size. For example, one verb can have as many as 19 surface varieties. In this research, we focus on predicate conjugation for compressing the vocabulary size in Japanese. The vocabulary list is filled with the various forms of verbs. We propose methods using predicate conjugation information without discarding linguistic information. The proposed methods can generate low-frequency words and deal with unknown words. Two methods were considered to introduce conjugation information: the first treats it as a token (conjugation token) and the second as an embedded vector (conjugation feature). The results using these methods demonstrate that the vocabulary size can be compressed by approximately 86.1% (Tanaka corpus) and the NMT models can output words not in the training data set. Furthermore, BLEU scores improved by 0.91 points in Japanese-to-English translation, and 0.32 points in English-to-Japanese translation with ASPEC. |
Tasks | Machine Translation |
Published | 2018-05-25 |
URL | http://arxiv.org/abs/1805.10047v1 |
PDF | http://arxiv.org/pdf/1805.10047v1.pdf |
PWC | https://paperswithcode.com/paper/japanese-predicate-conjugation-for-neural |
Repo | |
Framework | |
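A toy sketch of the conjugation-token idea: each surface form is replaced by its base form plus a token naming the conjugation type, so one vocabulary entry covers all inflections of a verb. The hard-coded segmentation below is purely illustrative; the paper derives conjugation information from morphological analysis.

```python
# Replace a conjugated surface form with (base form, conjugation token).
# Inputs are supplied by hand here; a morphological analyzer would
# normally produce the base form and conjugation type.
def to_conjugation_tokens(surface, base, conj_type):
    return [base, f"<{conj_type}>"]

print(to_conjugation_tokens("食べた", "食べる", "past"))       # ['食べる', '<past>']
print(to_conjugation_tokens("食べない", "食べる", "negative"))  # ['食べる', '<negative>']
```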
Investigating context features hidden in End-to-End TTS
Title | Investigating context features hidden in End-to-End TTS |
Authors | Kohki Mametani, Tsuneo Kato, Seiichi Yamamoto |
Abstract | Recent studies have introduced end-to-end TTS, which integrates the production of context and acoustic features in statistical parametric speech synthesis. As a result, a single neural network replaced laborious feature engineering with automated feature learning. However, little is known about what types of context information end-to-end TTS extracts from text input before synthesizing speech, and the previous knowledge about context features is barely utilized. In this work, we first point out the model similarity between end-to-end TTS and parametric TTS. Based on the similarity, we evaluate the quality of encoder outputs from an end-to-end TTS system against eight criteria that are derived from a standard set of context information used in parametric TTS. We conduct experiments using an evaluation procedure that has been newly developed in the machine learning literature for quantitative analysis of neural representations, while adapting it to the TTS domain. Experimental results show that the encoder outputs reflect both linguistic and phonetic contexts, such as vowel reduction at phoneme level, lexical stress at syllable level, and part-of-speech at word level, possibly due to the joint optimization of context and acoustic features. |
Tasks | Feature Engineering, Speech Synthesis |
Published | 2018-11-04 |
URL | http://arxiv.org/abs/1811.01376v2 |
PDF | http://arxiv.org/pdf/1811.01376v2.pdf |
PWC | https://paperswithcode.com/paper/investigating-context-features-hidden-in-end |
Repo | |
Framework | |
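One common procedure for quantitative analysis of neural representations is a probing classifier: if a simple classifier predicts a context feature (say, lexical stress) from frozen encoder outputs well above chance, the encoder is taken to have captured it. Whether this matches the paper's exact procedure is an assumption; the sklearn sketch below uses random stand-in data where real usage would substitute encoder outputs and context-feature labels.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
encoder_outputs = rng.standard_normal((1000, 128))   # per-symbol encoder states
stress_labels = rng.integers(0, 2, size=1000)        # e.g. stressed vs unstressed

X_tr, X_te, y_tr, y_te = train_test_split(
    encoder_outputs, stress_labels, test_size=0.2, random_state=0)
probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print("probe accuracy:", probe.score(X_te, y_te))    # ~0.5 on random data
```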
Deep Bayesian Active Learning for Natural Language Processing: Results of a Large-Scale Empirical Study
Title | Deep Bayesian Active Learning for Natural Language Processing: Results of a Large-Scale Empirical Study |
Authors | Aditya Siddhant, Zachary C. Lipton |
Abstract | Several recent papers investigate Active Learning (AL) for mitigating the data dependence of deep learning for natural language processing. However, the applicability of AL to real-world problems remains an open question. While in supervised learning, practitioners can try many different methods, evaluating each against a validation set before selecting a model, AL affords no such luxury. Over the course of one AL run, an agent annotates its dataset, exhausting its labeling budget. Thus, given a new task, an active learner has no opportunity to compare models and acquisition functions. This paper provides a large-scale empirical study of deep active learning, addressing multiple tasks and, for each, multiple datasets, multiple models, and a full suite of acquisition functions. We find that across all settings, Bayesian active learning by disagreement, using uncertainty estimates provided either by Dropout or Bayes-by-Backprop, significantly improves over i.i.d. baselines and usually outperforms classic uncertainty sampling. |
Tasks | Active Learning |
Published | 2018-08-16 |
URL | http://arxiv.org/abs/1808.05697v3 |
PDF | http://arxiv.org/pdf/1808.05697v3.pdf |
PWC | https://paperswithcode.com/paper/deep-bayesian-active-learning-for-natural |
Repo | |
Framework | |
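Bayesian active learning by disagreement (BALD) scores can be computed directly from Monte Carlo dropout samples; a numpy sketch follows, with array shapes chosen for illustration.

```python
import numpy as np

def bald_scores(mc_probs):
    """BALD acquisition from MC dropout samples (a sketch).

    mc_probs: array (T, N, C) of class probabilities from T stochastic
    forward passes. BALD is predictive entropy minus expected entropy,
    i.e. the mutual information between predictions and model weights.
    """
    eps = 1e-12
    mean_p = mc_probs.mean(axis=0)                                # (N, C)
    H_pred = -(mean_p * np.log(mean_p + eps)).sum(axis=-1)        # total uncertainty
    H_exp = -(mc_probs * np.log(mc_probs + eps)).sum(-1).mean(0)  # aleatoric part
    return H_pred - H_exp                                         # disagreement

T, N, C = 20, 5, 3
p = np.random.dirichlet(np.ones(C), size=(T, N))    # stand-in dropout samples
query_order = np.argsort(-bald_scores(p))           # annotate most-disagreed-on first
```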
Spatio-Temporal Structured Sparse Regression with Hierarchical Gaussian Process Priors
Title | Spatio-Temporal Structured Sparse Regression with Hierarchical Gaussian Process Priors |
Authors | Danil Kuzin, Olga Isupova, Lyudmila Mihaylova |
Abstract | This paper introduces a new sparse spatio-temporal structured Gaussian process regression framework for online and offline Bayesian inference. This is the first framework that gives a time-evolving representation of the interdependencies between the components of the sparse signal of interest. A hierarchical Gaussian process describes such structure and the interdependencies are represented via the covariance matrices of the prior distributions. The inference is based on the expectation propagation method and the theoretical derivation of the posterior distribution is provided in the paper. The inference framework is thoroughly evaluated over synthetic, real video and electroencephalography (EEG) data where the spatio-temporal evolving patterns need to be reconstructed with high accuracy. It is shown to achieve a 15% improvement in F-measure compared with the alternating direction method of multipliers, a spatio-temporal sparse Bayesian learning method and a one-level Gaussian process model. Additionally, the required memory for the proposed algorithm is less than in the one-level Gaussian process model. This structured sparse regression framework is of broad applicability to source localisation and object detection problems with sparse signals. |
Tasks | Bayesian Inference, EEG, Object Detection |
Published | 2018-07-15 |
URL | http://arxiv.org/abs/1807.05561v1 |
PDF | http://arxiv.org/pdf/1807.05561v1.pdf |
PWC | https://paperswithcode.com/paper/spatio-temporal-structured-sparse-regression |
Repo | |
Framework | |
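As a generic illustration of how structured GP priors encode spatio-temporal interdependencies through covariance matrices, consider a Kronecker-factorized space-time kernel. This is a hedged stand-in for the idea, not the paper's hierarchical construction or its expectation propagation inference.

```python
import numpy as np

def rbf(x, y, ell):
    """Squared-exponential kernel on 1D inputs."""
    d = x[:, None] - y[None, :]
    return np.exp(-0.5 * (d / ell) ** 2)

space = np.linspace(0, 1, 10)
time = np.linspace(0, 1, 20)
K_s = rbf(space, space, ell=0.2)      # spatial smoothness
K_t = rbf(time, time, ell=0.1)        # temporal smoothness
K = np.kron(K_t, K_s)                 # joint (200 x 200) space-time prior covariance

# One draw from the prior, reshaped to a (time, space) field.
sample = np.random.multivariate_normal(np.zeros(K.shape[0]), K + 1e-8 * np.eye(200))
field = sample.reshape(len(time), len(space))
```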
Dense Recurrent Neural Networks for Scene Labeling
Title | Dense Recurrent Neural Networks for Scene Labeling |
Authors | Heng Fan, Haibin Ling |
Abstract | Recently recurrent neural networks (RNNs) have demonstrated the ability to improve scene labeling through capturing long-range dependencies among image units. In this paper, we propose dense RNNs for scene labeling by exploring various long-range semantic dependencies among image units. In comparison with existing RNN based approaches, our dense RNNs are able to capture richer contextual dependencies for each image unit via dense connections between each pair of image units, which significantly enhances their discriminative power. Besides, to select relevant dependencies and restrain irrelevant ones for each unit among the dense connections, we introduce an attention model into dense RNNs. The attention model automatically assigns more importance to helpful dependencies and less weight to irrelevant ones. Integrating with convolutional neural networks (CNNs), our method achieves state-of-the-art performance on the PASCAL Context, MIT ADE20K and SiftFlow benchmarks. |
Tasks | Scene Labeling |
Published | 2018-01-21 |
URL | http://arxiv.org/abs/1801.06831v1 |
PDF | http://arxiv.org/pdf/1801.06831v1.pdf |
PWC | https://paperswithcode.com/paper/dense-recurrent-neural-networks-for-scene |
Repo | |
Framework | |
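A hedged PyTorch sketch of attention-weighted aggregation over dense inter-unit connections: every unit's hidden state is scored against the unit being updated, so helpful long-range dependencies dominate the aggregated context. The bilinear relevance score is an illustrative choice, not necessarily the paper's exact attention form.

```python
import torch
import torch.nn.functional as F

def attended_context(h_units, h_query, w_att):
    """Attention over dense connections (a sketch).

    h_units: (N, D) hidden states of all image units;
    h_query: (D,) the unit being updated;
    w_att:   (D, D) bilinear attention weights.
    """
    scores = h_units @ (w_att @ h_query)    # relevance of every unit, (N,)
    alpha = F.softmax(scores, dim=0)        # normalized attention weights
    return alpha @ h_units                  # (D,) context for the update

h = torch.randn(64, 32)                     # 64 image units, 32-dim states
ctx = attended_context(h, h[0], torch.randn(32, 32))
```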
Addressing Two Problems in Deep Knowledge Tracing via Prediction-Consistent Regularization
Title | Addressing Two Problems in Deep Knowledge Tracing via Prediction-Consistent Regularization |
Authors | Chun-Kit Yeung, Dit-Yan Yeung |
Abstract | Knowledge tracing is one of the key research areas for empowering personalized education. It is a task to model students’ mastery level of a knowledge component (KC) based on their historical learning trajectories. In recent years, a recurrent neural network model called deep knowledge tracing (DKT) has been proposed to handle the knowledge tracing task, and the literature has shown that DKT generally outperforms traditional methods. However, through our extensive experimentation, we have noticed two major problems in the DKT model. The first problem is that the model fails to reconstruct the observed input. As a result, even when a student performs well on a KC, the prediction of that KC’s mastery level decreases instead, and vice versa. Second, the predicted performance for KCs across time-steps is not consistent. This is undesirable and unreasonable because a student’s performance is expected to transition gradually over time. To address these problems, we introduce regularization terms that correspond to reconstruction and waviness to the loss function of the original DKT model to enhance the consistency in prediction. Experiments show that the regularized loss function effectively alleviates the two problems without degrading the original task of DKT. |
Tasks | Knowledge Tracing |
Published | 2018-06-06 |
URL | http://arxiv.org/abs/1806.02180v1 |
PDF | http://arxiv.org/pdf/1806.02180v1.pdf |
PWC | https://paperswithcode.com/paper/addressing-two-problems-in-deep-knowledge |
Repo | |
Framework | |
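A sketch of the regularized loss the abstract describes: the original next-step DKT objective plus a reconstruction term for the current interaction and L1/L2 "waviness" penalties on consecutive predictions. The regularization weights and tensor shapes below are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def dkt_regularized_loss(p_all, p_next, y_next, p_curr, y_curr,
                         lam_r=0.10, lam_w1=0.003, lam_w2=3.0):
    """Prediction-consistent DKT loss (a sketch; weights illustrative).

    p_all:  (B, T, K) predicted mastery for every knowledge component;
    p_next/y_next: probability and correctness for the *next* interaction
    (the original DKT target); p_curr/y_curr: the same for the *current*
    interaction (the added reconstruction target).
    """
    L = F.binary_cross_entropy(p_next, y_next)    # original DKT objective
    r = F.binary_cross_entropy(p_curr, y_curr)    # reconstruction term
    diff = p_all[:, 1:, :] - p_all[:, :-1, :]     # step-to-step change
    w1 = diff.abs().mean()                        # L1 waviness
    w2 = diff.pow(2).mean()                       # L2 waviness
    return L + lam_r * r + lam_w1 * w1 + lam_w2 * w2

B, T, K = 4, 10, 6
loss = dkt_regularized_loss(torch.rand(B, T, K),
                            torch.rand(B, T), (torch.rand(B, T) > 0.5).float(),
                            torch.rand(B, T), (torch.rand(B, T) > 0.5).float())
```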
Scientific Relation Extraction with Selectively Incorporated Concept Embeddings
Title | Scientific Relation Extraction with Selectively Incorporated Concept Embeddings |
Authors | Yi Luan, Mari Ostendorf, Hannaneh Hajishirzi |
Abstract | This paper describes our submission for the SemEval 2018 Task 7 shared task on semantic relation extraction and classification in scientific papers. We extend the end-to-end relation extraction model of (Miwa and Bansal) with enhancements such as a character-level encoding attention mechanism for selecting pretrained concept candidate embeddings. Our official submission ranked second in the relation classification task (Subtask 1.1 and Subtask 2 Scenario 2), and first in the relation extraction task (Subtask 2 Scenario 1). |
Tasks | Relation Classification, Relation Extraction |
Published | 2018-08-26 |
URL | http://arxiv.org/abs/1808.08643v1 |
PDF | http://arxiv.org/pdf/1808.08643v1.pdf |
PWC | https://paperswithcode.com/paper/scientific-relation-extraction-with |
Repo | |
Framework | |
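A heavily hedged sketch of attention-based selection over pretrained concept candidate embeddings, driven by a character-level mention encoding: the encoding scores each candidate and the result is their attention-weighted mix. The bilinear scorer is an illustrative assumption, not the paper's exact mechanism.

```python
import torch
import torch.nn.functional as F

def select_concept_embedding(char_repr, candidates, w):
    """Soft selection over concept candidates (a sketch).

    char_repr:  (D_c,) character-level encoding of a mention;
    candidates: (n, D_e) pretrained concept candidate embeddings;
    w:          (D_e, D_c) bilinear scoring weights.
    """
    scores = candidates @ (w @ char_repr)     # relevance of each candidate, (n,)
    alpha = F.softmax(scores, dim=0)          # attention over candidates
    return alpha @ candidates                 # (D_e,) blended concept embedding

char_repr = torch.randn(32)                   # mention encoding
candidates = torch.randn(5, 100)              # 5 candidate embeddings
emb = select_concept_embedding(char_repr, candidates, torch.randn(100, 32))
```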
Multi-Frame Quality Enhancement for Compressed Video
Title | Multi-Frame Quality Enhancement for Compressed Video |
Authors | Ren Yang, Mai Xu, Zulin Wang, Tianyi Li |
Abstract | The past few years have witnessed great success in applying deep learning to enhance the quality of compressed image/video. The existing approaches mainly focus on enhancing the quality of a single frame, ignoring the similarity between consecutive frames. In this paper, we observe that heavy quality fluctuation exists across compressed video frames, and thus low-quality frames can be enhanced using neighboring high-quality frames, an idea we refer to as Multi-Frame Quality Enhancement (MFQE). Accordingly, this paper proposes an MFQE approach for compressed video, as a first attempt in this direction. In our approach, we first develop a Support Vector Machine (SVM) based detector to locate Peak Quality Frames (PQFs) in compressed video. Then, a novel Multi-Frame Convolutional Neural Network (MF-CNN) is designed to enhance the quality of compressed video, taking a non-PQF and its two nearest PQFs as input. The MF-CNN compensates motion between the non-PQF and PQFs through the Motion Compensation subnet (MC-subnet). Subsequently, the Quality Enhancement subnet (QE-subnet) reduces compression artifacts of the non-PQF with the help of its nearest PQFs. Finally, the experiments validate the effectiveness and generality of our MFQE approach in advancing the state-of-the-art quality enhancement of compressed video. The code of our MFQE approach is available at https://github.com/ryangBUAA/MFQE.git |
Tasks | Motion Compensation |
Published | 2018-03-13 |
URL | http://arxiv.org/abs/1803.04680v4 |
PDF | http://arxiv.org/pdf/1803.04680v4.pdf |
PWC | https://paperswithcode.com/paper/multi-frame-quality-enhancement-for |
Repo | |
Framework | |
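A sketch of the PQF pairing step: for each non-peak frame, find the nearest preceding and following Peak Quality Frames that the MF-CNN would take as auxiliary inputs. A simple quality local-maximum test stands in for the paper's SVM-based detector, and the PSNR values are illustrative.

```python
import numpy as np

def nearest_pqfs(psnr):
    """Locate PQFs as local quality maxima and pair every non-PQF with
    its nearest PQFs on each side (None at sequence boundaries)."""
    pqf = [i for i in range(1, len(psnr) - 1)
           if psnr[i] >= psnr[i - 1] and psnr[i] >= psnr[i + 1]]
    pairs = {}
    for i in range(len(psnr)):
        if i in pqf:
            continue
        prev = max((p for p in pqf if p < i), default=None)
        nxt = min((p for p in pqf if p > i), default=None)
        pairs[i] = (prev, nxt)        # the two auxiliary inputs for frame i
    return pqf, pairs

psnr = np.array([32.1, 30.5, 31.0, 33.2, 30.9, 31.1, 33.0, 30.2])
print(nearest_pqfs(psnr))
```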
IEA: Inner Ensemble Average within a convolutional neural network
Title | IEA: Inner Ensemble Average within a convolutional neural network |
Authors | Abduallah Mohamed, Xinrui Hua, Xianda Zhou, Christian Claudel |
Abstract | Ensemble learning is a method of combining multiple trained models to improve model accuracy. We propose using such methods, specifically ensemble averaging, inside Convolutional Neural Network (CNN) architectures by replacing single convolutional layers with Inner Ensemble Averages (IEA) of multiple convolutional layers. Empirical results on different benchmarking datasets show that CNN models using IEA outperform those with regular convolutional layers. A visual and a similarity score analysis of the features generated from IEA explains why it boosts the model performance. |
Tasks | |
Published | 2018-08-30 |
URL | https://arxiv.org/abs/1808.10350v5 |
PDF | https://arxiv.org/pdf/1808.10350v5.pdf |
PWC | https://paperswithcode.com/paper/iea-inner-ensemble-average-within-a |
Repo | |
Framework | |
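The IEA idea maps directly to a drop-in module: average the outputs of m parallel convolutional layers in place of one. A minimal PyTorch sketch follows; m and the kernel size are illustrative choices.

```python
import torch
import torch.nn as nn

class IEA2d(nn.Module):
    """Inner Ensemble Average: one conv layer replaced by the average of
    m parallel conv layers, as the abstract above describes (a sketch)."""
    def __init__(self, in_ch, out_ch, m=3, kernel_size=3, padding=1):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(in_ch, out_ch, kernel_size, padding=padding)
            for _ in range(m))

    def forward(self, x):
        # Average the parallel convolutions' outputs (the "inner" ensemble).
        return torch.stack([conv(x) for conv in self.branches]).mean(dim=0)

layer = IEA2d(3, 16)
out = layer(torch.randn(1, 3, 32, 32))   # same shape a single conv would give
```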