Paper Group AWR 20
Burst Denoising with Kernel Prediction Networks
Title | Burst Denoising with Kernel Prediction Networks |
Authors | Ben Mildenhall, Jonathan T. Barron, Jiawen Chen, Dillon Sharlet, Ren Ng, Robert Carroll |
Abstract | We present a technique for jointly denoising bursts of images taken from a handheld camera. In particular, we propose a convolutional neural network architecture for predicting spatially varying kernels that can both align and denoise frames, a synthetic data generation approach based on a realistic noise formation model, and an optimization guided by an annealed loss function to avoid undesirable local minima. Our model matches or outperforms the state-of-the-art across a wide range of noise levels on both real and synthetic data. |
Tasks | Denoising, Synthetic Data Generation |
Published | 2017-12-06 |
URL | http://arxiv.org/abs/1712.02327v2, http://arxiv.org/pdf/1712.02327v2.pdf |
PWC | https://paperswithcode.com/paper/burst-denoising-with-kernel-prediction |
Repo | https://github.com/Pavelrst/DIP_Project |
Framework | tf |
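As a rough illustration of how spatially varying kernels can be applied to a burst, the sketch below filters each frame with its own per-pixel kernel and averages the filtered frames. The function name `apply_per_pixel_kernels`, the array shapes, the edge padding, and the final averaging are assumptions made for illustration. In the paper the kernels are predicted by a CNN, which is not reproduced here.

```python
import numpy as np

def apply_per_pixel_kernels(burst, kernels):
    """Filter a burst with per-pixel kernels and average over frames.

    burst:   (N, H, W) array of N noisy frames (hypothetical layout).
    kernels: (N, H, W, K, K) array, one K x K kernel per pixel per frame, K odd.
    Returns an (H, W) array.
    """
    n, h, w = burst.shape
    k = kernels.shape[-1]
    r = k // 2
    # Replicate border pixels so every kernel has a full neighbourhood.
    padded = np.pad(burst, ((0, 0), (r, r), (r, r)), mode="edge")
    out = np.zeros((h, w), dtype=np.float64)
    for dy in range(k):
        for dx in range(k):
            # View of each frame shifted so that tap (dy, dx) lines up with the output pixel.
            shifted = padded[:, dy:dy + h, dx:dx + w]
            out += (kernels[:, :, :, dy, dx] * shifted).sum(axis=0)
    return out / n
```

Because each pixel has its own kernel, an off-centre kernel can pull in a shifted neighbour, which is how a single set of weights can both align and denoise.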
Deep Functional Maps: Structured Prediction for Dense Shape Correspondence
Title | Deep Functional Maps: Structured Prediction for Dense Shape Correspondence |
Authors | Or Litany, Tal Remez, Emanuele Rodolà, Alex M. Bronstein, Michael M. Bronstein |
Abstract | We introduce a new framework for learning dense correspondence between deformable 3D shapes. Existing learning based approaches model shape correspondence as a labelling problem, where each point of a query shape receives a label identifying a point on some reference domain; the correspondence is then constructed a posteriori by composing the label predictions of two input shapes. We propose a paradigm shift and design a structured prediction model in the space of functional maps, linear operators that provide a compact representation of the correspondence. We model the learning process via a deep residual network which takes dense descriptor fields defined on two shapes as input, and outputs a soft map between the two given objects. The resulting correspondence is shown to be accurate on several challenging benchmarks comprising multiple categories, synthetic models, real scans with acquisition artifacts, topological noise, and partiality. |
Tasks | Structured Prediction |
Published | 2017-04-27 |
URL | http://arxiv.org/abs/1704.08686v2, http://arxiv.org/pdf/1704.08686v2.pdf |
PWC | https://paperswithcode.com/paper/deep-functional-maps-structured-prediction |
Repo | https://github.com/JM-data/Unsupervised_DeepFunctionalMaps |
Framework | tf |
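To make the idea of a functional map as a linear operator concrete, here is a minimal sketch of the generic least-squares estimate: given descriptor coefficients `A` and `B` expressed in truncated bases on the two shapes, find the matrix `C` with `C @ A ≈ B`. The names `estimate_functional_map`, `A`, `B` and the matrix shapes are assumptions for illustration. This is not the paper's network, which learns the descriptors themselves.

```python
import numpy as np

def estimate_functional_map(A, B):
    """Least-squares functional map C minimising ||C A - B||_F.

    A: (k, q) descriptor coefficients on the source shape (hypothetical layout).
    B: (k, q) descriptor coefficients on the target shape.
    Returns C with shape (k, k).
    """
    # C A = B is equivalent to A^T C^T = B^T, which lstsq solves column by column.
    Ct, *_ = np.linalg.lstsq(A.T, B.T, rcond=None)
    return Ct.T
```

With more descriptors than basis functions (`q > k`) the system is overdetermined, so `C` is a compact `k x k` summary of the correspondence.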
Encoding of phonology in a recurrent neural model of grounded speech
Title | Encoding of phonology in a recurrent neural model of grounded speech |
Authors | Afra Alishahi, Marie Barking, Grzegorz Chrupała |
Abstract | We study the representation and encoding of phonemes in a recurrent neural network model of grounded speech. We use a model which processes images and their spoken descriptions, and projects the visual and auditory representations into the same semantic space. We perform a number of analyses on how information about individual phonemes is encoded in the MFCC features extracted from the speech signal, and the activations of the layers of the model. Via experiments with phoneme decoding and phoneme discrimination we show that phoneme representations are most salient in the lower layers of the model, where low-level signals are processed at a fine-grained level, although a large amount of phonological information is retained at the top recurrent layer. We further find that the attention mechanism following the top recurrent layer significantly attenuates encoding of phonology and makes the utterance embeddings much more invariant to synonymy. Moreover, a hierarchical clustering of phoneme representations learned by the network shows an organizational structure of phonemes similar to those proposed in linguistics. |
Tasks | |
Published | 2017-06-12 |
URL | http://arxiv.org/abs/1706.03815v2, http://arxiv.org/pdf/1706.03815v2.pdf |
PWC | https://paperswithcode.com/paper/encoding-of-phonology-in-a-recurrent-neural |
Repo | https://github.com/gchrupala/encoding-of-phonology |
Framework | none |
How far are we from solving the 2D & 3D Face Alignment problem? (and a dataset of 230,000 3D facial landmarks)
Title | How far are we from solving the 2D & 3D Face Alignment problem? (and a dataset of 230,000 3D facial landmarks) |
Authors | Adrian Bulat, Georgios Tzimiropoulos |
Abstract | This paper investigates how far a very deep neural network is from attaining close to saturating performance on existing 2D and 3D face alignment datasets. To this end, we make the following 5 contributions: (a) we construct, for the first time, a very strong baseline by combining a state-of-the-art architecture for landmark localization with a state-of-the-art residual block, train it on a very large yet synthetically expanded 2D facial landmark dataset and finally evaluate it on all other 2D facial landmark datasets. (b) We create a network, guided by 2D landmarks, which converts 2D landmark annotations to 3D and unifies all existing datasets, leading to the creation of LS3D-W, the largest and most challenging 3D facial landmark dataset to date (~230,000 images). (c) Following that, we train a neural network for 3D face alignment and evaluate it on the newly introduced LS3D-W. (d) We further look into the effect of all “traditional” factors affecting face alignment performance like large pose, initialization and resolution, and introduce a “new” one, namely the size of the network. (e) We show that both 2D and 3D face alignment networks achieve performance of remarkable accuracy which is probably close to saturating the datasets used. Training and testing code as well as the dataset can be downloaded from https://www.adrianbulat.com/face-alignment/ |
Tasks | Face Alignment |
Published | 2017-03-21 |
URL | http://arxiv.org/abs/1703.07332v3, http://arxiv.org/pdf/1703.07332v3.pdf |
PWC | https://paperswithcode.com/paper/how-far-are-we-from-solving-the-2d-3d-face |
Repo | https://github.com/1adrianb/2D-and-3D-face-alignment |
Framework | torch |
An Introduction to Deep Learning for the Physical Layer
Title | An Introduction to Deep Learning for the Physical Layer |
Authors | Timothy J. O’Shea, Jakob Hoydis |
Abstract | We present and discuss several novel applications of deep learning for the physical layer. By interpreting a communications system as an autoencoder, we develop a fundamental new way to think about communications system design as an end-to-end reconstruction task that seeks to jointly optimize transmitter and receiver components in a single process. We show how this idea can be extended to networks of multiple transmitters and receivers and present the concept of radio transformer networks as a means to incorporate expert domain knowledge in the machine learning model. Lastly, we demonstrate the application of convolutional neural networks on raw IQ samples for modulation classification which achieves competitive accuracy with respect to traditional schemes relying on expert features. The paper is concluded with a discussion of open challenges and areas for future investigation. |
Tasks | |
Published | 2017-02-02 |
URL | http://arxiv.org/abs/1702.00832v2, http://arxiv.org/pdf/1702.00832v2.pdf |
PWC | https://paperswithcode.com/paper/an-introduction-to-deep-learning-for-the |
Repo | https://github.com/vidits-kth/py-radio-autoencoder |
Framework | tf |
Deep Convolutional Denoising of Low-Light Images
Title | Deep Convolutional Denoising of Low-Light Images |
Authors | Tal Remez, Or Litany, Raja Giryes, Alex M. Bronstein |
Abstract | Poisson distribution is used for modeling noise in photon-limited imaging. While canonical examples include relatively exotic types of sensing like spectral imaging or astronomy, the problem is relevant to regular photography now more than ever due to the booming market for mobile cameras. Restricted form factor limits the amount of absorbed light, thus computational post-processing is called for. In this paper, we make use of the powerful framework of deep convolutional neural networks for Poisson denoising. We demonstrate how by training the same network with images having a specific peak value, our denoiser outperforms previous state-of-the-art by a large margin both visually and quantitatively. Being flexible and data-driven, our solution resolves the heavy ad hoc engineering used in previous methods and is an order of magnitude faster. We further show that by adding a reasonable prior on the class of the image being processed, another significant boost in performance is achieved. |
Tasks | Denoising |
Published | 2017-01-06 |
URL | http://arxiv.org/abs/1701.01687v1, http://arxiv.org/pdf/1701.01687v1.pdf |
PWC | https://paperswithcode.com/paper/deep-convolutional-denoising-of-low-light |
Repo | https://github.com/TalRemez/deep_class_aware_denoising |
Framework | tf |
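The abstract refers to Poisson noise and to images "having a specific peak value". One common way to simulate this, sketched below under stated assumptions, is to scale a clean image so that its brightest possible pixel corresponds to `peak` expected photons, draw Poisson counts, and rescale. The helper name `add_poisson_noise` and the convention that the clean image lies in [0, 1] are illustrative assumptions, not taken from the paper's code.

```python
import numpy as np

def add_poisson_noise(image, peak, rng):
    """Simulate photon-limited imaging.

    image: clean image with values in [0, 1] (assumed convention).
    peak:  expected photon count at intensity 1.0; lower peak means noisier output.
    rng:   a numpy.random.Generator.
    """
    expected_counts = np.asarray(image, dtype=np.float64) * peak
    counts = rng.poisson(expected_counts)
    # Rescale back to the original intensity range.
    return counts.astype(np.float64) / peak
```

Since a Poisson variable has variance equal to its mean, the signal-to-noise ratio grows with the square root of `peak`, which is why low peak values are the hard case.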
Provably Fair Representations
Title | Provably Fair Representations |
Authors | Daniel McNamara, Cheng Soon Ong, Robert C. Williamson |
Abstract | Machine learning systems are increasingly used to make decisions about people’s lives, such as whether to give someone a loan or whether to interview someone for a job. This has led to considerable interest in making such machine learning systems fair. One approach is to transform the input data used by the algorithm. This can be achieved by passing each input data point through a representation function prior to its use in training or testing. Techniques for learning such representation functions from data have been successful empirically, but typically lack theoretical fairness guarantees. We show that it is possible to prove that a representation function is fair according to common measures of both group and individual fairness, as well as useful with respect to a target task. These provable properties can be used in a governance model involving a data producer, a data user and a data regulator, where there is a separation of concerns between fairness and target task utility to ensure transparency and prevent perverse incentives. We formally define the ‘cost of mistrust’ of using this model compared to the setting where there is a single trusted party, and provide bounds on this cost in particular cases. We present a practical approach to learning fair representation functions and apply it to financial and criminal justice datasets. We evaluate the fairness and utility of these representation functions using measures motivated by our theoretical results. |
Tasks | |
Published | 2017-10-12 |
URL | http://arxiv.org/abs/1710.04394v1, http://arxiv.org/pdf/1710.04394v1.pdf |
PWC | https://paperswithcode.com/paper/provably-fair-representations |
Repo | https://github.com/eth-sri/lcifr |
Framework | pytorch |
Towards Realistic Face Photo-Sketch Synthesis via Composition-Aided GANs
Title | Towards Realistic Face Photo-Sketch Synthesis via Composition-Aided GANs |
Authors | Jun Yu, Xingxin Xu, Fei Gao, Shengjie Shi, Meng Wang, Dacheng Tao, Qingming Huang |
Abstract | Face photo-sketch synthesis aims at generating a facial sketch/photo conditioned on a given photo/sketch. It has wide applications, including digital entertainment and law enforcement. Precisely depicting face photos/sketches remains challenging due to the restrictions on structural realism and textural consistency. While existing methods achieve compelling results, they mostly yield blurred effects and great deformation over various facial components, which makes the synthesized images look unrealistic. To tackle this challenge, in this work, we propose to use the facial composition information to help the synthesis of face sketch/photo. Specifically, we propose a novel composition-aided generative adversarial network (CA-GAN) for face photo-sketch synthesis. In CA-GAN, we utilize paired inputs including a face photo/sketch and the corresponding pixel-wise face labels for generating a sketch/photo. In addition, to focus training on hard-generated components and delicate facial structures, we propose a compositional reconstruction loss. Finally, we use stacked CA-GANs (SCA-GAN) to further rectify defects and add compelling details. Experimental results show that our method is capable of generating both visually comfortable and identity-preserving face sketches/photos over a wide range of challenging data. Our method achieves state-of-the-art quality, reducing the best previous Fréchet Inception distance (FID) by a large margin. Besides, we demonstrate that the proposed method has considerable generalization ability. We have made our code and results publicly available: https://fei-hdu.github.io/ca-gan/. |
Tasks | |
Published | 2017-12-04 |
URL | https://arxiv.org/abs/1712.00899v4, https://arxiv.org/pdf/1712.00899v4.pdf |
PWC | https://paperswithcode.com/paper/towards-realistic-face-photo-sketch-synthesis |
Repo | https://github.com/fei-hdu/ca-gan |
Framework | pytorch |
Variational Encoding of Complex Dynamics
Title | Variational Encoding of Complex Dynamics |
Authors | Carlos X. Hernández, Hannah K. Wayment-Steele, Mohammad M. Sultan, Brooke E. Husic, Vijay S. Pande |
Abstract | Often the analysis of time-dependent chemical and biophysical systems produces high-dimensional time-series data for which it can be difficult to interpret which individual features are most salient. While recent work from our group and others has demonstrated the utility of time-lagged co-variate models to study such systems, linearity assumptions can limit the compression of inherently nonlinear dynamics into just a few characteristic components. Recent work in the field of deep learning has led to the development of variational autoencoders (VAE), which are able to compress complex datasets into simpler manifolds. We present the use of a time-lagged VAE, or variational dynamics encoder (VDE), to reduce complex, nonlinear processes to a single embedding with high fidelity to the underlying dynamics. We demonstrate how the VDE is able to capture nontrivial dynamics in a variety of examples, including Brownian dynamics and atomistic protein folding. Additionally, we demonstrate a method for analyzing the VDE model, inspired by saliency mapping, to determine what features are selected by the VDE model to describe dynamics. The VDE presents an important step in applying techniques from deep learning to more accurately model and interpret complex biophysics. |
Tasks | Time Series |
Published | 2017-11-23 |
URL | http://arxiv.org/abs/1711.08576v2, http://arxiv.org/pdf/1711.08576v2.pdf |
PWC | https://paperswithcode.com/paper/variational-encoding-of-complex-dynamics |
Repo | https://github.com/msultan/vde_metadynamics |
Framework | none |
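A time-lagged model is trained on pairs of frames separated by a fixed lag rather than on a frame and itself. The sketch below only shows how such pairs could be built from a trajectory; the function name `time_lagged_pairs` and the `(T, D)` array layout are assumptions for illustration, and the VAE itself is not reproduced.

```python
import numpy as np

def time_lagged_pairs(trajectory, lag):
    """Return (x_t, x_{t+lag}) pairs from a trajectory of shape (T, D)."""
    traj = np.asarray(trajectory)
    if lag <= 0 or lag >= len(traj):
        raise ValueError("lag must satisfy 0 < lag < number of frames")
    # Inputs are all frames that still have a partner `lag` steps later.
    return traj[:-lag], traj[lag:]
```

An encoder-decoder fed `x_t` and asked to reconstruct `x_{t+lag}` is pushed towards features that persist over the lag, that is, towards slow dynamics.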
Equivalence of Equilibrium Propagation and Recurrent Backpropagation
Title | Equivalence of Equilibrium Propagation and Recurrent Backpropagation |
Authors | Benjamin Scellier, Yoshua Bengio |
Abstract | Recurrent Backpropagation and Equilibrium Propagation are supervised learning algorithms for fixed point recurrent neural networks which differ in their second phase. In the first phase, both algorithms converge to a fixed point which corresponds to the configuration where the prediction is made. In the second phase, Equilibrium Propagation relaxes to another nearby fixed point corresponding to smaller prediction error, whereas Recurrent Backpropagation uses a side network to compute error derivatives iteratively. In this work we establish a close connection between these two algorithms. We show that, at every moment in the second phase, the temporal derivatives of the neural activities in Equilibrium Propagation are equal to the error derivatives computed iteratively by Recurrent Backpropagation in the side network. This work shows that it is not required to have a side network for the computation of error derivatives, and supports the hypothesis that, in biological neural networks, temporal derivatives of neural activities may code for error signals. |
Tasks | |
Published | 2017-11-22 |
URL | http://arxiv.org/abs/1711.08416v2, http://arxiv.org/pdf/1711.08416v2.pdf |
PWC | https://paperswithcode.com/paper/equivalence-of-equilibrium-propagation-and |
Repo | https://github.com/bscellier/Towards-a-Biologically-Plausible-Backprop |
Framework | pytorch |
Deterministic Approximate Methods for Maximum Consensus Robust Fitting
Title | Deterministic Approximate Methods for Maximum Consensus Robust Fitting |
Authors | Huu Le, Tat-Jun Chin, Anders Eriksson, Thanh-Toan Do, David Suter |
Abstract | Maximum consensus estimation plays a critically important role in robust fitting problems in computer vision. Currently, the most prevalent algorithms for consensus maximization draw from the class of randomized hypothesize-and-verify algorithms, which are cheap but can usually deliver only rough approximate solutions. At the other extreme, there are exact algorithms which are exhaustive search in nature and can be costly for practical-sized inputs. This paper fills the gap between the two extremes by proposing deterministic algorithms to approximately optimize the maximum consensus criterion. Our work begins by reformulating consensus maximization with linear complementarity constraints. Then, we develop two novel algorithms: one based on a non-smooth penalty method with a Frank-Wolfe style optimization scheme, the other based on the Alternating Direction Method of Multipliers (ADMM). Both algorithms solve convex subproblems to efficiently perform the optimization. We demonstrate the capability of our algorithms to greatly improve a rough initial estimate, such as those obtained using least squares or a randomized algorithm. Compared to the exact algorithms, our approach is much more practical on realistic input sizes. Further, our approach is naturally applicable to estimation problems with geometric residuals. |
Tasks | |
Published | 2017-10-27 |
URL | http://arxiv.org/abs/1710.10003v2, http://arxiv.org/pdf/1710.10003v2.pdf |
PWC | https://paperswithcode.com/paper/deterministic-approximate-methods-for-maximum |
Repo | https://github.com/ZhipengCai/Demo---Deterministic-consensus-maximization-with-biconvex-programming |
Framework | none |
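For readers unfamiliar with the maximum consensus criterion, the sketch below evaluates it for a linear model: the consensus of a parameter vector is the number of data points whose absolute residual is within a threshold. The function name `consensus_size` and the linear residual `|X @ theta - y|` are assumptions for illustration. It only scores a candidate and does not implement the paper's penalty or ADMM algorithms.

```python
import numpy as np

def consensus_size(theta, X, y, eps):
    """Count inliers of a linear model: points with |x_i^T theta - y_i| <= eps."""
    residuals = np.abs(X @ theta - y)
    return int(np.sum(residuals <= eps))
```

Maximum consensus fitting searches for the `theta` that maximises this count, which is what makes it robust to outliers and also what makes it combinatorially hard.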
Automatic Cardiac Disease Assessment on cine-MRI via Time-Series Segmentation and Domain Specific Features
Title | Automatic Cardiac Disease Assessment on cine-MRI via Time-Series Segmentation and Domain Specific Features |
Authors | Fabian Isensee, Paul Jaeger, Peter M. Full, Ivo Wolf, Sandy Engelhardt, Klaus H. Maier-Hein |
Abstract | Cardiac magnetic resonance imaging improves on diagnosis of cardiovascular diseases by providing images at high spatiotemporal resolution. Manual evaluation of these time-series, however, is expensive and prone to biased and non-reproducible outcomes. In this paper, we present a method that addresses these limitations by integrating segmentation and disease classification into a fully automatic processing pipeline. We use an ensemble of U-Net-inspired architectures for segmentation of cardiac structures such as the left and right ventricular cavity (LVC, RVC) and the left ventricular myocardium (LVM) on each time instance of the cardiac cycle. For the classification task, information is extracted from the segmented time-series in the form of comprehensive features handcrafted to reflect diagnostic clinical procedures. Based on these features we train an ensemble of heavily regularized multilayer perceptrons (MLP) and a random forest classifier to predict the pathologic target class. We evaluated our method on the ACDC dataset (4 pathology groups, 1 healthy group) and achieved Dice scores of 0.945 (LVC), 0.908 (RVC) and 0.905 (LVM) in a cross-validation over the training set (100 cases) and 0.950 (LVC), 0.923 (RVC) and 0.911 (LVM) on the test set (50 cases). We report a classification accuracy of 94% on a training set cross-validation and 92% on the test set. Our results underpin the potential of machine learning methods for accurate, fast and reproducible segmentation and computer-assisted diagnosis (CAD). |
Tasks | Time Series |
Published | 2017-07-03 |
URL | http://arxiv.org/abs/1707.00587v2, http://arxiv.org/pdf/1707.00587v2.pdf |
PWC | https://paperswithcode.com/paper/automatic-cardiac-disease-assessment-on-cine |
Repo | https://github.com/MIC-DKFZ/ACDC2017 |
Framework | none |
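The segmentation results above are reported as Dice scores. As a reminder of what that metric measures, here is a minimal sketch for binary masks. The function name `dice_score` and the choice to return 1.0 when both masks are empty are assumptions for illustration, not the challenge's official evaluation code.

```python
import numpy as np

def dice_score(pred, target):
    """Dice coefficient 2|P ∩ T| / (|P| + |T|) for two binary masks."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    intersection = np.logical_and(pred, target).sum()
    denom = pred.sum() + target.sum()
    if denom == 0:
        # Both masks empty: treated here as perfect agreement (an assumed convention).
        return 1.0
    return 2.0 * intersection / denom
```

A multi-class segmentation is usually scored by computing this once per structure (for example LVC, RVC, LVM) on the corresponding binary mask.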
Temporal 3D ConvNets: New Architecture and Transfer Learning for Video Classification
Title | Temporal 3D ConvNets: New Architecture and Transfer Learning for Video Classification |
Authors | Ali Diba, Mohsen Fayyaz, Vivek Sharma, Amir Hossein Karami, Mohammad Mahdi Arzani, Rahman Yousefzadeh, Luc Van Gool |
Abstract | The work in this paper is driven by the question of how to exploit the temporal cues available in videos for their accurate classification, and for human action recognition in particular. Thus far, the vision community has focused on spatio-temporal approaches with fixed temporal convolution kernel depths. We introduce a new temporal layer that models variable temporal convolution kernel depths. We embed this new temporal layer in our proposed 3D CNN. We extend the DenseNet architecture - which normally is 2D - with 3D filters and pooling kernels. We name our proposed video convolutional network "Temporal 3D ConvNet" (T3D) and its new temporal layer "Temporal Transition Layer" (TTL). Our experiments show that T3D outperforms the current state-of-the-art methods on the HMDB51, UCF101 and Kinetics datasets. Another issue with 3D ConvNets is that they are trained from scratch on a huge labeled dataset to reach reasonable performance, so the knowledge learned in 2D ConvNets is completely ignored. Another contribution in this work is a simple and effective technique to transfer knowledge from a pre-trained 2D CNN to a randomly initialized 3D CNN for a stable weight initialization. This allows us to significantly reduce the number of training samples for 3D CNNs. Thus, by finetuning this network, we beat the performance of generic and recent methods in 3D CNNs, which were trained on large video datasets, e.g. Sports-1M, and finetuned on the target datasets, e.g. HMDB51/UCF101. The T3D codes will be released. |
Tasks | Temporal Action Localization, Transfer Learning, Video Classification |
Published | 2017-11-22 |
URL | http://arxiv.org/abs/1711.08200v1, http://arxiv.org/pdf/1711.08200v1.pdf |
PWC | https://paperswithcode.com/paper/temporal-3d-convnets-new-architecture-and |
Repo | https://github.com/MohsenFayyaz89/T3D |
Framework | pytorch |
Variable selection for Gaussian processes via sensitivity analysis of the posterior predictive distribution
Title | Variable selection for Gaussian processes via sensitivity analysis of the posterior predictive distribution |
Authors | Topi Paananen, Juho Piironen, Michael Riis Andersen, Aki Vehtari |
Abstract | Variable selection for Gaussian process models is often done using automatic relevance determination, which uses the inverse length-scale parameter of each input variable as a proxy for variable relevance. This implicitly determined relevance has several drawbacks that prevent the selection of optimal input variables in terms of predictive performance. To improve on this, we propose two novel variable selection methods for Gaussian process models that utilize the predictions of a full model in the vicinity of the training points and thereby rank the variables based on their predictive relevance. Our empirical results on synthetic and real world data sets demonstrate improved variable selection compared to automatic relevance determination in terms of variability and predictive performance. |
Tasks | Gaussian Processes |
Published | 2017-12-21 |
URL | http://arxiv.org/abs/1712.08048v3, http://arxiv.org/pdf/1712.08048v3.pdf |
PWC | https://paperswithcode.com/paper/variable-selection-for-gaussian-processes-via |
Repo | https://github.com/topipa/GP_varsel_KL_VAR |
Framework | none |
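The baseline the abstract argues against, automatic relevance determination, can be stated in a few lines: each input's relevance is taken to be the inverse of its fitted length-scale. The sketch below shows only that baseline ranking. The function name `ard_relevance_ranking` and the example length-scales are hypothetical, and the paper's two proposed methods are not implemented here.

```python
import numpy as np

def ard_relevance_ranking(length_scales):
    """Rank input variables by inverse length-scale (larger means more relevant)."""
    relevance = 1.0 / np.asarray(length_scales, dtype=np.float64)
    # argsort on the negated values gives indices from most to least relevant.
    order = np.argsort(-relevance)
    return relevance, order
```

A short length-scale means the function varies quickly along that input, which is a statement about nonlinearity rather than predictive importance, and is one reason this proxy can mislead.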
Towards Personalized Modeling of the Female Hormonal Cycle: Experiments with Mechanistic Models and Gaussian Processes
Title | Towards Personalized Modeling of the Female Hormonal Cycle: Experiments with Mechanistic Models and Gaussian Processes |
Authors | Iñigo Urteaga, David J. Albers, Marija Vlajic Wheeler, Anna Druet, Hans Raffauf, Noémie Elhadad |
Abstract | In this paper, we introduce a novel task for machine learning in healthcare, namely personalized modeling of the female hormonal cycle. The motivation for this work is to model the hormonal cycle and predict its phases in time, both for healthy individuals and for those with disorders of the reproductive system. Because there are individual differences in the menstrual cycle, we are particularly interested in personalized models that can account for individual idiosyncrasies, towards identifying phenotypes of menstrual cycles. As a first step, we consider the hormonal cycle as a set of observations through time. We use a previously validated mechanistic model to generate realistic hormonal patterns, and experiment with Gaussian process regression to estimate their values over time. Specifically, we are interested in the feasibility of predicting menstrual cycle phases under varying learning conditions: number of cycles used for training, hormonal measurement noise and sampling rates, and informed vs. agnostic sampling of hormonal measurements. Our results indicate that Gaussian processes can help model the female menstrual cycle. We discuss the implications of our experiments in the context of modeling the female menstrual cycle. |
Tasks | Gaussian Processes |
Published | 2017-11-30 |
URL | http://arxiv.org/abs/1712.00117v1, http://arxiv.org/pdf/1712.00117v1.pdf |
PWC | https://paperswithcode.com/paper/towards-personalized-modeling-of-the-female |
Repo | https://github.com/iurteaga/hmc |
Framework | pytorch |
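Gaussian process regression over time, as mentioned in the abstract, can be sketched as follows: given noisy observations at a few time points, the posterior mean at new time points is a kernel-weighted combination of the observations. The squared-exponential kernel, the hyperparameter defaults, and the names `rbf_kernel` and `gp_posterior_mean` are assumptions for illustration. The abstract does not state which kernel the authors use, and the mechanistic hormone model is not reproduced.

```python
import numpy as np

def rbf_kernel(a, b, length_scale, variance):
    """Squared-exponential kernel between two 1-D arrays of time points."""
    sq_dist = (a[:, None] - b[None, :]) ** 2
    return variance * np.exp(-0.5 * sq_dist / length_scale ** 2)

def gp_posterior_mean(t_train, y_train, t_test,
                      length_scale=1.0, variance=1.0, noise=1e-2):
    """Posterior mean of a zero-mean GP at t_test given observations (t_train, y_train)."""
    K = rbf_kernel(t_train, t_train, length_scale, variance)
    K = K + noise * np.eye(len(t_train))   # observation noise variance on the diagonal
    K_star = rbf_kernel(t_test, t_train, length_scale, variance)
    alpha = np.linalg.solve(K, y_train)    # (K + noise * I)^{-1} y
    return K_star @ alpha
```

The `noise` and sampling-density settings correspond loosely to the "measurement noise and sampling rates" conditions varied in the experiments: more noise or sparser `t_train` pulls the prediction back towards the prior mean.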