Paper Group ANR 751
Three Mechanisms of Weight Decay Regularization
Title | Three Mechanisms of Weight Decay Regularization |
Authors | Guodong Zhang, Chaoqi Wang, Bowen Xu, Roger Grosse |
Abstract | Weight decay is one of the standard tricks in the neural network toolbox, but the reasons for its regularization effect are poorly understood, and recent results have cast doubt on the traditional interpretation in terms of $L_2$ regularization. Literal weight decay has been shown to outperform $L_2$ regularization for optimizers for which they differ. We empirically investigate weight decay for three optimization algorithms (SGD, Adam, and K-FAC) and a variety of network architectures. We identify three distinct mechanisms by which weight decay exerts a regularization effect, depending on the particular optimization algorithm and architecture: (1) increasing the effective learning rate, (2) approximately regularizing the input-output Jacobian norm, and (3) reducing the effective damping coefficient for second-order optimization. Our results provide insight into how to improve the regularization of neural networks. |
Tasks | |
Published | 2018-10-29 |
URL | http://arxiv.org/abs/1810.12281v1 |
http://arxiv.org/pdf/1810.12281v1.pdf | |
PWC | https://paperswithcode.com/paper/three-mechanisms-of-weight-decay |
Repo | |
Framework | |
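The abstract's remark that literal weight decay and $L_2$ regularization differ only for some optimizers can be illustrated with a minimal NumPy sketch. This is not the authors' code: the function names are hypothetical, and `precond` is an assumed stand-in for the per-coordinate rescaling that an adaptive optimizer such as Adam or K-FAC applies to the gradient.

```python
import numpy as np

def sgd_l2_step(w, grad, lr, lam):
    """SGD on loss + (lam/2)*||w||^2: the penalty gradient lam*w joins grad."""
    return w - lr * (grad + lam * w)

def sgd_decay_step(w, grad, lr, lam):
    """SGD with literal weight decay: shrink w, then take the gradient step."""
    return (1.0 - lr * lam) * w - lr * grad

def scaled_l2_step(w, grad, lr, lam, precond):
    """Preconditioned step with L2: the penalty gradient is rescaled as well."""
    return w - lr * precond * (grad + lam * w)

def scaled_decay_step(w, grad, lr, lam, precond):
    """Preconditioned step with literal weight decay: only grad is rescaled."""
    return (1.0 - lr * lam) * w - lr * precond * grad

# Illustrative values only (assumptions, not taken from the paper).
w = np.array([1.0, -2.0, 0.5])
grad = np.array([0.3, 0.1, -0.4])
precond = np.array([0.1, 1.0, 10.0])  # hypothetical per-coordinate scaling
lr, lam = 0.1, 0.01

same_for_sgd = np.allclose(sgd_l2_step(w, grad, lr, lam),
                           sgd_decay_step(w, grad, lr, lam))
differs_when_scaled = not np.allclose(
    scaled_l2_step(w, grad, lr, lam, precond),
    scaled_decay_step(w, grad, lr, lam, precond))
```

For plain SGD the two updates coincide algebraically; once the gradient is rescaled, the $L_2$ penalty is rescaled with it while literal decay is not, which is the case the abstract refers to.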
Hull Form Optimization with Principal Component Analysis and Deep Neural Network
Title | Hull Form Optimization with Principal Component Analysis and Deep Neural Network |
Authors | Dongchi Yu, Lu Wang |
Abstract | Designing and modifying complex hull forms for optimal vessel performances have been a major challenge for naval architects. In the present study, Principal Component Analysis (PCA) is introduced to compress the geometric representation of a group of existing vessels, and the resulting principal scores are manipulated to generate a large number of derived hull forms, which are evaluated computationally for their calm-water performances. The results are subsequently used to train a Deep Neural Network (DNN) to accurately establish the relation between different hull forms and their associated performances. Then, based on the fast, parallel DNN-based hull-form evaluation, the large-scale search for optimal hull forms is performed. |
Tasks | |
Published | 2018-10-27 |
URL | http://arxiv.org/abs/1810.11701v1 |
http://arxiv.org/pdf/1810.11701v1.pdf | |
PWC | https://paperswithcode.com/paper/hull-form-optimization-with-principal |
Repo | |
Framework | |
SchiNet: Automatic Estimation of Symptoms of Schizophrenia from Facial Behaviour Analysis
Title | SchiNet: Automatic Estimation of Symptoms of Schizophrenia from Facial Behaviour Analysis |
Authors | Mina Bishay, Petar Palasek, Stefan Priebe, Ioannis Patras |
Abstract | Patients with schizophrenia often display impairments in the expression of emotion and speech, and those are observed in their facial behaviour. Automatic analysis of patients’ facial expressions that is aimed at estimating symptoms of schizophrenia has received attention recently. However, the datasets that are typically used for training and evaluating the developed methods contain only a small number of patients (4-34) and are recorded while the subjects perform controlled tasks such as listening to life vignettes or answering emotional questions. In this paper, we use videos of professional-patient interviews, in which symptoms were assessed in a standardised way as they should/may be assessed in practice, and which were recorded in realistic conditions (i.e. varying illumination levels and camera viewpoints) at the patients’ homes or at mental health services. We automatically analyse the facial behaviour of 91 out-patients - this is almost 3 times the number of patients in other studies - and propose SchiNet, a novel neural network architecture that estimates expression-related symptoms in two different assessment interviews. We evaluate the proposed SchiNet for patient-independent prediction of symptoms of schizophrenia. Experimental results show that some automatically detected facial expressions are significantly correlated to symptoms of schizophrenia, and that the proposed network for estimating symptom severity delivers promising results. |
Tasks | |
Published | 2018-08-07 |
URL | http://arxiv.org/abs/1808.02531v1 |
http://arxiv.org/pdf/1808.02531v1.pdf | |
PWC | https://paperswithcode.com/paper/schinet-automatic-estimation-of-symptoms-of |
Repo | |
Framework | |
Joint Estimation of Room Geometry and Modes with Compressed Sensing
Title | Joint Estimation of Room Geometry and Modes with Compressed Sensing |
Authors | Helena Peić Tukuljac, Thach Pham Vu, Hervé Lissek, Pierre Vandergheynst |
Abstract | The acoustical behavior of a room for a given position of microphone and sound source is usually described using the room impulse response. If we rely on standard uniform sampling, the estimation of the room impulse response for arbitrary positions in the room requires a large number of measurements. In order to lower the required sampling rate, some solutions have emerged that exploit the sparse representation of the room wavefield in terms of plane waves in the low-frequency domain. The plane wave representation has a simple form in rectangular rooms. In our solution, we observe the basic axial modes of the wave vector grid to extract the room geometry, and then we propagate this knowledge to higher-order modes from the low-pass version of the measurements. Estimating the approximate structure of the $k$-space should reduce the number of required measurements and increase the speed of the reconstruction without great loss of quality. |
Tasks | |
Published | 2018-02-16 |
URL | http://arxiv.org/abs/1802.05879v1 |
http://arxiv.org/pdf/1802.05879v1.pdf | |
PWC | https://paperswithcode.com/paper/joint-estimation-of-room-geometry-and-modes |
Repo | |
Framework | |
End-to-End Monaural Multi-speaker ASR System without Pretraining
Title | End-to-End Monaural Multi-speaker ASR System without Pretraining |
Authors | Xuankai Chang, Yanmin Qian, Kai Yu, Shinji Watanabe |
Abstract | Recently, end-to-end models have become a popular alternative to traditional hybrid models in automatic speech recognition (ASR). Multi-speaker speech separation and recognition is a central task in the cocktail party problem. In this paper, we present a state-of-the-art monaural multi-speaker end-to-end automatic speech recognition model. In contrast to previous studies on monaural multi-speaker speech recognition, this end-to-end framework is trained to recognize multiple label sequences completely from scratch. The system only requires the speech mixture and corresponding label sequences, without needing any indeterminate supervisions obtained from non-mixture speech or corresponding labels/alignments. Moreover, we use an individual attention module for each separated speaker and scheduled sampling to further improve the performance. Finally, we evaluate the proposed model on the 2-speaker mixed speech generated from the WSJ corpus and the wsj0-2mix dataset, which is a speech separation and recognition benchmark. The experiments demonstrate that the proposed methods can improve the performance of the end-to-end model in separating the overlapping speech and recognizing the separated streams. From the results, the proposed model leads to ~10.0% relative performance gains in terms of CER and WER respectively. |
Tasks | Speech Recognition, Speech Separation |
Published | 2018-11-05 |
URL | http://arxiv.org/abs/1811.02062v1 |
http://arxiv.org/pdf/1811.02062v1.pdf | |
PWC | https://paperswithcode.com/paper/end-to-end-monaural-multi-speaker-asr-system |
Repo | |
Framework | |
Are you eligible? Predicting adulthood from face images via class specific mean autoencoder
Title | Are you eligible? Predicting adulthood from face images via class specific mean autoencoder |
Authors | Maneet Singh, Shruti Nagpal, Mayank Vatsa, Richa Singh |
Abstract | Predicting if a person is an adult or a minor has several applications such as inspecting underage driving, preventing purchase of alcohol and tobacco by minors, and granting restricted access. The challenging nature of this problem arises due to the complex and unique physiological changes that are observed with age progression. This paper presents a novel deep learning based formulation, termed as Class Specific Mean Autoencoder, to learn the intra-class similarity and extract class-specific features. We propose that the feature of a particular class if brought similar/closer to the mean feature of that class can help in learning class-specific representations. The proposed formulation is applied for the task of adulthood classification which predicts whether the given face image is of an adult or not. Experiments are performed on two large databases and the results show that the proposed algorithm yields higher classification accuracy compared to existing algorithms and a Commercial-Off-The-Shelf system. |
Tasks | |
Published | 2018-03-20 |
URL | http://arxiv.org/abs/1803.07385v1 |
http://arxiv.org/pdf/1803.07385v1.pdf | |
PWC | https://paperswithcode.com/paper/are-you-eligible-predicting-adulthood-from |
Repo | |
Framework | |
Planning in Dynamic Environments with Conditional Autoregressive Models
Title | Planning in Dynamic Environments with Conditional Autoregressive Models |
Authors | Johanna Hansen, Kyle Kastner, Aaron Courville, Gregory Dudek |
Abstract | We demonstrate the use of conditional autoregressive generative models (van den Oord et al., 2016a) over a discrete latent space (van den Oord et al., 2017b) for forward planning with MCTS. In order to test this method, we introduce a new environment featuring varying difficulty levels, along with moving goals and obstacles. The combination of high-quality frame generation and classical planning approaches nearly matches true environment performance for our task, demonstrating the usefulness of this method for model-based planning in dynamic environments. |
Tasks | |
Published | 2018-11-25 |
URL | http://arxiv.org/abs/1811.10097v1 |
http://arxiv.org/pdf/1811.10097v1.pdf | |
PWC | https://paperswithcode.com/paper/planning-in-dynamic-environments-with |
Repo | |
Framework | |
A Pipeline for Lenslet Light Field Quality Enhancement
Title | A Pipeline for Lenslet Light Field Quality Enhancement |
Authors | Pierre Matysiak, Mairéad Grogan, Mikaël Le Pendu, Martin Alain, Aljosa Smolic |
Abstract | In recent years, light fields have become a major research topic and their applications span across the entire spectrum of classical image processing. Among the different methods used to capture a light field are the lenslet cameras, such as those developed by Lytro. While these cameras give a lot of freedom to the user, they also create light field views that suffer from a number of artefacts. As a result, it is common to ignore a significant subset of these views when doing high-level light field processing. We propose a pipeline to process light field views, first with an enhanced processing of RAW images to extract subaperture images, then a colour correction process using a recent colour transfer algorithm, and finally a denoising process using a state of the art light field denoising approach. We show that our method improves the light field quality on many levels, by reducing ghosting artefacts and noise, as well as retrieving more accurate and homogeneous colours across the sub-aperture images. |
Tasks | Denoising |
Published | 2018-08-16 |
URL | http://arxiv.org/abs/1808.05387v1 |
http://arxiv.org/pdf/1808.05387v1.pdf | |
PWC | https://paperswithcode.com/paper/a-pipeline-for-lenslet-light-field-quality |
Repo | |
Framework | |
Log-concave sampling: Metropolis-Hastings algorithms are fast
Title | Log-concave sampling: Metropolis-Hastings algorithms are fast |
Authors | Raaz Dwivedi, Yuansi Chen, Martin J. Wainwright, Bin Yu |
Abstract | We consider the problem of sampling from a strongly log-concave density in $\mathbb{R}^d$, and prove a non-asymptotic upper bound on the mixing time of the Metropolis-adjusted Langevin algorithm (MALA). The method draws samples by simulating a Markov chain obtained from the discretization of an appropriate Langevin diffusion, combined with an accept-reject step. Relative to known guarantees for the unadjusted Langevin algorithm (ULA), our bounds show that the use of an accept-reject step in MALA leads to an exponentially improved dependence on the error-tolerance. Concretely, in order to obtain samples with TV error at most $\delta$ for a density with condition number $\kappa$, we show that MALA requires $\mathcal{O} \big(\kappa d \log(1/\delta) \big)$ steps, as compared to the $\mathcal{O} \big(\kappa^2 d/\delta^2 \big)$ steps established in past work on ULA. We also demonstrate the gains of MALA over ULA for weakly log-concave densities. Furthermore, we derive mixing time bounds for the Metropolized random walk (MRW) and obtain $\mathcal{O}(\kappa)$ mixing time slower than MALA. We provide numerical examples that support our theoretical findings, and demonstrate the benefits of Metropolis-Hastings adjustment for Langevin-type sampling algorithms. |
Tasks | |
Published | 2018-01-08 |
URL | https://arxiv.org/abs/1801.02309v4 |
https://arxiv.org/pdf/1801.02309v4.pdf | |
PWC | https://paperswithcode.com/paper/log-concave-sampling-metropolis-hastings |
Repo | |
Framework | |
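The MALA procedure summarized in the abstract (a discretized Langevin proposal followed by an accept-reject step) can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `mala`, the step size, and the standard Gaussian target are all chosen here for demonstration.

```python
import numpy as np

def mala(log_density, grad_log_density, x0, step, n_steps, rng):
    """Metropolis-adjusted Langevin algorithm; returns samples and acceptance rate."""
    x = np.asarray(x0, dtype=float)
    samples = np.empty((n_steps, x.size))
    accepted = 0
    for i in range(n_steps):
        # Langevin proposal: gradient drift plus Gaussian noise of variance 2*step.
        mean_fwd = x + step * grad_log_density(x)
        proposal = mean_fwd + np.sqrt(2.0 * step) * rng.standard_normal(x.size)
        # Log proposal densities (up to a shared constant) for the MH ratio.
        mean_bwd = proposal + step * grad_log_density(proposal)
        log_q_fwd = -np.sum((proposal - mean_fwd) ** 2) / (4.0 * step)
        log_q_bwd = -np.sum((x - mean_bwd) ** 2) / (4.0 * step)
        log_alpha = (log_density(proposal) - log_density(x)
                     + log_q_bwd - log_q_fwd)
        # Accept-reject step; dropping it would give the unadjusted variant (ULA).
        if np.log(rng.uniform()) < log_alpha:
            x = proposal
            accepted += 1
        samples[i] = x
    return samples, accepted / n_steps

# Assumed example target: standard Gaussian in 2 dimensions.
rng = np.random.default_rng(0)
samples, accept_rate = mala(
    log_density=lambda x: -0.5 * np.sum(x ** 2),
    grad_log_density=lambda x: -x,
    x0=np.zeros(2), step=0.1, n_steps=5000, rng=rng)
```

The step size here is picked by hand; the bounds in the abstract concern how many such steps are needed as a function of $\kappa$, $d$, and $\delta$, which this sketch does not attempt to reproduce.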
Validating Hyperspectral Image Segmentation
Title | Validating Hyperspectral Image Segmentation |
Authors | Jakub Nalepa, Michal Myller, Michal Kawulok |
Abstract | Hyperspectral satellite imaging attracts enormous research attention in the remote sensing community, hence automated approaches for precise segmentation of such imagery are being rapidly developed. In this letter, we share our observations on the strategy for validating hyperspectral image segmentation algorithms currently followed in the literature, and show that it can lead to over-optimistic experimental insights. We introduce a new routine for generating segmentation benchmarks, and use it to elaborate ready-to-use hyperspectral training-test data partitions. They can be utilized for fair validation of new and existing algorithms without any training-test data leakage. |
Tasks | Hyperspectral Image Segmentation, Semantic Segmentation |
Published | 2018-11-08 |
URL | http://arxiv.org/abs/1811.03707v1 |
http://arxiv.org/pdf/1811.03707v1.pdf | |
PWC | https://paperswithcode.com/paper/validating-hyperspectral-image-segmentation |
Repo | |
Framework | |
Combining Deep Learning and Qualitative Spatial Reasoning to Learn Complex Structures from Sparse Examples with Noise
Title | Combining Deep Learning and Qualitative Spatial Reasoning to Learn Complex Structures from Sparse Examples with Noise |
Authors | Nikhil Krishnaswamy, Scott Friedman, James Pustejovsky |
Abstract | Many modern machine learning approaches require vast amounts of training data to learn new concepts; conversely, human learning often requires few examples–sometimes only one–from which the learner can abstract structural concepts. We present a novel approach to introducing new spatial structures to an AI agent, combining deep learning over qualitative spatial relations with various heuristic search algorithms. The agent extracts spatial relations from a sparse set of noisy examples of block-based structures, and trains convolutional and sequential models of those relation sets. To create novel examples of similar structures, the agent begins placing blocks on a virtual table, uses a CNN to predict the most similar complete example structure after each placement, an LSTM to predict the most likely set of remaining moves needed to complete it, and recommends one using heuristic search. We verify that the agent learned the concept by observing its virtual block-building activities, wherein it ranks each potential subsequent action toward building its learned concept. We empirically assess this approach with human participants’ ratings of the block structures. Initial results and qualitative evaluations of structures generated by the trained agent show where it has generalized concepts from the training data, which heuristics perform best within the search space, and how we might improve learning and execution. |
Tasks | |
Published | 2018-11-27 |
URL | http://arxiv.org/abs/1811.11064v1 |
http://arxiv.org/pdf/1811.11064v1.pdf | |
PWC | https://paperswithcode.com/paper/combining-deep-learning-and-qualitative |
Repo | |
Framework | |
Symmetry Aware Evaluation of 3D Object Detection and Pose Estimation in Scenes of Many Parts in Bulk
Title | Symmetry Aware Evaluation of 3D Object Detection and Pose Estimation in Scenes of Many Parts in Bulk |
Authors | Romain Brégier, Frédéric Devernay, Laetitia Leyrit, James Crowley |
Abstract | While 3D object detection and pose estimation have been studied for a long time, their evaluation is not yet completely satisfactory. Indeed, existing datasets typically consist of numerous acquisitions of only a few scenes because of the tediousness of pose annotation, and existing evaluation protocols cannot properly handle objects with symmetries. This work aims at addressing those two points. We first present automatic techniques to produce fully annotated RGBD data of many object instances in arbitrary poses, with which we produce a dataset of thousands of independent scenes of bulk parts composed of both real and synthetic images. We then propose a consistent evaluation methodology suitable for any rigid object, regardless of its symmetries. We illustrate it with two reference object detection and pose estimation methods on different objects, and show that incorporating symmetry considerations into pose estimation methods themselves can lead to significant performance gains. The proposed dataset is available at http://rbregier.github.io/dataset2017. |
Tasks | 3D Object Detection, Object Detection, Pose Estimation |
Published | 2018-06-21 |
URL | http://arxiv.org/abs/1806.08129v1 |
http://arxiv.org/pdf/1806.08129v1.pdf | |
PWC | https://paperswithcode.com/paper/symmetry-aware-evaluation-of-3d-object |
Repo | |
Framework | |
A Pulmonary Nodule Detection Model Based on Progressive Resolution and Hierarchical Saliency
Title | A Pulmonary Nodule Detection Model Based on Progressive Resolution and Hierarchical Saliency |
Authors | Junjie Zhang, Yong Xia, Yanning Zhang |
Abstract | Detection of pulmonary nodules on chest CT is an essential step in the early diagnosis of lung cancer, which is critical for best patient care. Although a number of computer-aided nodule detection methods have been published in the literature, these methods still have two major drawbacks: missing true nodules during the detection of nodule candidates and less accurate discrimination of nodules from non-nodules. In this paper, we propose an automated pulmonary nodule detection algorithm that combines progressive resolution and hierarchical saliency. Specifically, we design a 3D progressive resolution-based densely dilated FCN, namely the progressive resolution network (PRN), to detect nodule candidates inside the lung, and construct a densely dilated 3D CNN with hierarchical saliency, namely the hierarchical saliency network (HSN), to simultaneously identify genuine nodules from those candidates and estimate the diameters of nodules. We evaluated our algorithm on the benchmark LUng Nodule Analysis 2016 (LUNA16) dataset and achieved a state-of-the-art detection score. Our results suggest that the proposed algorithm can effectively detect pulmonary nodules on chest CT and accurately estimate their diameters. |
Tasks | |
Published | 2018-07-02 |
URL | http://arxiv.org/abs/1807.00598v1 |
http://arxiv.org/pdf/1807.00598v1.pdf | |
PWC | https://paperswithcode.com/paper/a-pulmonary-nodule-detection-model-based-on |
Repo | |
Framework | |
SeerNet at SemEval-2018 Task 1: Domain Adaptation for Affect in Tweets
Title | SeerNet at SemEval-2018 Task 1: Domain Adaptation for Affect in Tweets |
Authors | Venkatesh Duppada, Royal Jain, Sushant Hiray |
Abstract | The paper describes the best performing system for the SemEval-2018 Affect in Tweets (English) sub-tasks. The system focuses on the ordinal classification and regression sub-tasks for valence and emotion. For ordinal classification, valence is classified into 7 different classes ranging from -3 to 3, whereas emotion is classified into 4 different classes, 0 to 3, separately for each emotion, namely anger, fear, joy and sadness. The regression sub-tasks estimate the intensity of valence and each emotion. The system performs domain adaptation of 4 different models and creates an ensemble to give the final prediction. The proposed system achieved 1st position out of 75 teams which participated in the aforementioned sub-tasks. We outperform the baseline model by margins ranging from 49.2% to 76.4%, thus pushing the state-of-the-art significantly. |
Tasks | Domain Adaptation |
Published | 2018-04-17 |
URL | http://arxiv.org/abs/1804.06137v1 |
http://arxiv.org/pdf/1804.06137v1.pdf | |
PWC | https://paperswithcode.com/paper/seernet-at-semeval-2018-task-1-domain |
Repo | |
Framework | |
Word Familiarity and Frequency
Title | Word Familiarity and Frequency |
Authors | Kumiko Tanaka-Ishii, Hiroshi Terada |
Abstract | Word frequency is assumed to correlate with word familiarity, but the strength of this correlation has not been thoroughly investigated. In this paper, we report on our analysis of the correlation between a word familiarity rating list obtained through a psycholinguistic experiment and the log-frequency obtained from various corpora of different kinds and sizes (up to the terabyte scale) for English and Japanese. Major findings are threefold: First, for a given corpus, familiarity is necessary for a word to achieve high frequency, but familiar words are not necessarily frequent. Second, correlation increases with the corpus data size. Third, a corpus of spoken language correlates better than one of written language. These findings suggest that cognitive familiarity ratings are correlated to frequency, but more highly to that of spoken rather than written language. |
Tasks | |
Published | 2018-06-09 |
URL | http://arxiv.org/abs/1806.03431v1 |
http://arxiv.org/pdf/1806.03431v1.pdf | |
PWC | https://paperswithcode.com/paper/word-familiarity-and-frequency |
Repo | |
Framework | |