October 19, 2019

3052 words 15 mins read

Paper Group ANR 238


Learning to Estimate 3D Human Pose and Shape from a Single Color Image. Nonlocality-Reinforced Convolutional Neural Networks for Image Denoising. NICE: Noise Injection and Clamping Estimation for Neural Network Quantization. Organizing Experience: A Deeper Look at Replay Mechanisms for Sample-based Planning in Continuous State Domains. A Comparison …

Learning to Estimate 3D Human Pose and Shape from a Single Color Image

Title Learning to Estimate 3D Human Pose and Shape from a Single Color Image
Authors Georgios Pavlakos, Luyang Zhu, Xiaowei Zhou, Kostas Daniilidis
Abstract This work addresses the problem of estimating the full body 3D human pose and shape from a single color image. This is a task where iterative optimization-based solutions have typically prevailed, while Convolutional Networks (ConvNets) have suffered because of the lack of training data and their low resolution 3D predictions. Our work aims to bridge this gap and proposes an efficient and effective direct prediction method based on ConvNets. Central to our approach is the incorporation of a parametric statistical body shape model (SMPL) within our end-to-end framework. This allows us to get very detailed 3D mesh results, while requiring estimation only of a small number of parameters, making it friendly for direct network prediction. Interestingly, we demonstrate that these parameters can be predicted reliably from 2D keypoints and masks alone. These are typical outputs of generic 2D human analysis ConvNets, allowing us to relax the massive requirement that images with 3D shape ground truth are available for training. Simultaneously, by maintaining differentiability, at training time we generate the 3D mesh from the estimated parameters and optimize explicitly for the surface using a 3D per-vertex loss. Finally, a differentiable renderer is employed to project the 3D mesh to the image, which enables further refinement of the network by optimizing for the consistency of the projection with 2D annotations (i.e., 2D keypoints or masks). The proposed approach outperforms previous baselines on this task and offers an attractive solution for direct prediction of 3D shape from a single color image.
Tasks
Published 2018-05-10
URL http://arxiv.org/abs/1805.04092v1
PDF http://arxiv.org/pdf/1805.04092v1.pdf
PWC https://paperswithcode.com/paper/learning-to-estimate-3d-human-pose-and-shape
Repo
Framework
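
The pipeline above lends itself to a compact sketch: small networks regress SMPL pose and shape parameters from 2D keypoints and a silhouette, and a differentiable SMPL layer turns those parameters into a mesh supervised with a per-vertex loss. The PyTorch sketch below is a minimal illustration under that reading; `smpl_layer`, the layer sizes, and the keypoint count are assumptions, not the authors' implementation.

```python
# Hedged sketch (not the authors' code): predict SMPL pose/shape parameters from
# 2D keypoints and a silhouette mask, then supervise with a per-vertex loss on the
# mesh produced by a differentiable SMPL layer. `smpl_layer` is a placeholder for
# any differentiable SMPL implementation returning (B, 6890, 3) vertices.
import torch
import torch.nn as nn

class PosePrior(nn.Module):
    """Maps 2D keypoints (B, K, 3: x, y, confidence) to SMPL pose parameters."""
    def __init__(self, num_kpts=17, pose_dim=72):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(num_kpts * 3, 1024), nn.ReLU(),
            nn.Linear(1024, 1024), nn.ReLU(),
            nn.Linear(1024, pose_dim),
        )
    def forward(self, kpts):
        return self.net(kpts.flatten(1))

class ShapePrior(nn.Module):
    """Maps a low-resolution silhouette mask to SMPL shape coefficients."""
    def __init__(self, beta_dim=10):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, beta_dim),
        )
    def forward(self, mask):
        return self.net(mask)

def per_vertex_loss(pose, betas, gt_vertices, smpl_layer):
    verts = smpl_layer(pose, betas)            # (B, 6890, 3), differentiable
    return (verts - gt_vertices).abs().mean()  # L1 loss on the mesh surface
```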

Nonlocality-Reinforced Convolutional Neural Networks for Image Denoising

Title Nonlocality-Reinforced Convolutional Neural Networks for Image Denoising
Authors Cristóvão Cruz, Alessandro Foi, Vladimir Katkovnik, Karen Egiazarian
Abstract We introduce a paradigm for nonlocal sparsity reinforced deep convolutional neural network denoising. It is a combination of a local multiscale denoising by a convolutional neural network (CNN) based denoiser and a nonlocal denoising based on a nonlocal filter (NLF) exploiting the mutual similarities between groups of patches. CNN models are leveraged with noise levels that progressively decrease at every iteration of our framework, while their output is regularized by a nonlocal prior implicit within the NLF. Unlike complicated neural networks that embed the nonlocality prior within the layers of the network, our framework is modular: it uses standard pre-trained CNNs together with standard nonlocal filters. An instance of the proposed framework, called NN3D, is evaluated over large grayscale image datasets showing state-of-the-art performance.
Tasks Denoising, Image Denoising
Published 2018-03-06
URL http://arxiv.org/abs/1803.02112v2
PDF http://arxiv.org/pdf/1803.02112v2.pdf
PWC https://paperswithcode.com/paper/nonlocality-reinforced-convolutional-neural
Repo
Framework
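
Because the framework is modular, the iteration it describes can be written down directly: alternate a pre-trained CNN denoiser with a standard nonlocal filter while lowering the assumed noise level. The sketch below is schematic, not NN3D itself; `cnn_denoise`, `nonlocal_filter`, and the halving schedule are placeholders.

```python
# Hedged sketch of an NN3D-style loop: alternate a pre-trained CNN denoiser with a
# nonlocal filter, decreasing the assumed noise level each iteration.
# `cnn_denoise` and `nonlocal_filter` stand in for any standard CNN denoiser
# (e.g. a DnCNN-like model) and any nonlocal patch-group filter.
import numpy as np

def nn3d_denoise(noisy, sigma, cnn_denoise, nonlocal_filter, iters=3):
    x = noisy.astype(np.float64)
    for i in range(iters):
        sigma_i = sigma * (0.5 ** i)       # progressively smaller assumed noise level
        x = cnn_denoise(x, sigma_i)        # local multiscale CNN denoising
        x = nonlocal_filter(x, sigma_i)    # nonlocal prior via patch-group filtering
    return x
```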

NICE: Noise Injection and Clamping Estimation for Neural Network Quantization

Title NICE: Noise Injection and Clamping Estimation for Neural Network Quantization
Authors Chaim Baskin, Natan Liss, Yoav Chai, Evgenii Zheltonozhskii, Eli Schwartz, Raja Giryes, Avi Mendelson, Alexander M. Bronstein
Abstract Convolutional Neural Networks (CNN) are very popular in many fields including computer vision, speech recognition, and natural language processing, to name a few. Though deep learning leads to groundbreaking performance in these domains, the networks used are very demanding computationally and are far from real-time even on a GPU, which is not power efficient and therefore does not suit low power systems such as mobile devices. To overcome this challenge, some solutions have been proposed for quantizing the weights and activations of these networks, which accelerate the runtime significantly. Yet, this acceleration comes at the cost of a larger error. The NICE method proposed in this work trains quantized neural networks by noise injection and a learned clamping, which improve the accuracy. This leads to state-of-the-art results on various regression and classification tasks, e.g., ImageNet classification with architectures such as ResNet-18/34/50 with as low as 3-bit weights and activations. We implement the proposed solution on an FPGA to demonstrate its applicability for low power real-time applications. The implementation of the paper is available at https://github.com/Lancer555/NICE
Tasks Quantization, Speech Recognition
Published 2018-09-29
URL http://arxiv.org/abs/1810.00162v2
PDF http://arxiv.org/pdf/1810.00162v2.pdf
PWC https://paperswithcode.com/paper/nice-noise-injection-and-clamping-estimation
Repo
Framework
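
A minimal sketch of the two ingredients named in the title, under the usual simulated-quantization reading: uniform noise of one quantization step is injected into the weights during training, and activations pass through a learned clamp followed by uniform quantization with a straight-through estimator. Bit widths, the noise model, and class names are assumptions, not the released NICE code.

```python
# Hedged sketch: (1) simulate weight quantization by injecting uniform noise of one
# quantization step during training; (2) clamp activations with a learned clamp
# value and quantize them with a straight-through estimator.
import torch
import torch.nn as nn

class NoisyQuantWeight(nn.Module):
    def __init__(self, weight, bits=3):
        super().__init__()
        self.weight = nn.Parameter(weight.clone())
        self.bits = bits
    def forward(self):
        step = self.weight.abs().max() / (2 ** (self.bits - 1) - 1)
        if self.training:
            noise = (torch.rand_like(self.weight) - 0.5) * step  # quantization noise
            return self.weight + noise
        return torch.round(self.weight / step) * step            # true quantization

class LearnedClampAct(nn.Module):
    def __init__(self, init_clamp=6.0, bits=3):
        super().__init__()
        self.clamp = nn.Parameter(torch.tensor(init_clamp))
        self.bits = bits
    def forward(self, x):
        x = torch.minimum(torch.relu(x), self.clamp)     # learned clamping
        step = self.clamp / (2 ** self.bits - 1)
        x_q = torch.round(x / step) * step               # uniform quantization
        return x + (x_q - x).detach()                    # straight-through estimator
```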

Organizing Experience: A Deeper Look at Replay Mechanisms for Sample-based Planning in Continuous State Domains

Title Organizing Experience: A Deeper Look at Replay Mechanisms for Sample-based Planning in Continuous State Domains
Authors Yangchen Pan, Muhammad Zaheer, Adam White, Andrew Patterson, Martha White
Abstract Model-based strategies for control are critical to obtain sample efficient learning. Dyna is a planning paradigm that naturally interleaves learning and planning, by simulating one-step experience to update the action-value function. This elegant planning strategy has been mostly explored in the tabular setting. The aim of this paper is to revisit sample-based planning, in stochastic and continuous domains with learned models. We first highlight the flexibility afforded by a model over Experience Replay (ER). Replay-based methods can be seen as stochastic planning methods that repeatedly sample from a buffer of recent agent-environment interactions and perform updates to improve data efficiency. We show that a model, as opposed to a replay buffer, is particularly useful for specifying which states to sample from during planning, such as predecessor states that propagate information in reverse from a state more quickly. We introduce a semi-parametric model learning approach, called Reweighted Experience Models (REMs), that makes it simple to sample next states or predecessors. We demonstrate that REM-Dyna exhibits similar advantages over replay-based methods in learning in continuous state problems, and that the performance gap grows when moving to stochastic domains of increasing size.
Tasks
Published 2018-06-12
URL http://arxiv.org/abs/1806.04624v1
PDF http://arxiv.org/pdf/1806.04624v1.pdf
PWC https://paperswithcode.com/paper/organizing-experience-a-deeper-look-at-replay
Repo
Framework
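
The contrast drawn above between a replay buffer and a learned model shows up most clearly in the planning step of Dyna. Below is a tabular, toy-scale sketch of that loop; `model.sample_predecessor` stands in for the reweighted experience model (REM), which in the paper handles continuous states rather than a Q-table.

```python
# Hedged sketch of a Dyna-style planning step: after each real transition, extra
# value-function updates are made from simulated experience proposed by a learned
# model. Sampling predecessors of recently visited states propagates values backwards.
import numpy as np

def dyna_q_step(Q, s, a, r, s_next, model, alpha=0.1, gamma=0.99, n_plan=5):
    # 1) Direct RL update from the real transition.
    Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
    # 2) Update the (learned) model with the observed transition.
    model.update(s, a, r, s_next)
    # 3) Planning: sample simulated experience from the model.
    for _ in range(n_plan):
        sp, ap, rp, sp_next = model.sample_predecessor(s_next)
        Q[sp, ap] += alpha * (rp + gamma * Q[sp_next].max() - Q[sp, ap])
    return Q
```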

A Comparison of Lattice-free Discriminative Training Criteria for Purely Sequence-Trained Neural Network Acoustic Models

Title A Comparison of Lattice-free Discriminative Training Criteria for Purely Sequence-Trained Neural Network Acoustic Models
Authors Chao Weng, Dong Yu
Abstract In this work, three lattice-free (LF) discriminative training criteria for purely sequence-trained neural network acoustic models are compared on LVCSR tasks, namely maximum mutual information (MMI), boosted maximum mutual information (bMMI) and state-level minimum Bayes risk (sMBR). We demonstrate that, analogous to LF-MMI, a neural network acoustic model can also be trained from scratch using the LF-bMMI or LF-sMBR criteria without the need for cross-entropy pre-training. Furthermore, experimental results on Switchboard-300hrs and Switchboard+Fisher-2100hrs datasets show that models trained with LF-bMMI consistently outperform those trained with plain LF-MMI and achieve a relative word error rate (WER) reduction of 5% over competitive temporal convolution projected LSTM (TDNN-LSTMP) LF-MMI baselines.
Tasks Large Vocabulary Continuous Speech Recognition
Published 2018-11-08
URL http://arxiv.org/abs/1811.03700v2
PDF http://arxiv.org/pdf/1811.03700v2.pdf
PWC https://paperswithcode.com/paper/a-comparison-of-lattice-free-discriminative
Repo
Framework
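
At toy scale, the difference between MMI and boosted MMI is an accuracy-dependent boosting term in the denominator. The snippet below illustrates the two objectives over an explicit list of competing hypotheses; real lattice-free training sums over lattices/FSTs at the state level, so this is only a schematic.

```python
# Hedged, toy-scale illustration of the MMI vs. boosted-MMI objectives. The
# "denominator" here is just a list of competing hypotheses with acoustic
# log-likelihoods, language-model log-probabilities, and per-hypothesis accuracies
# measured against the reference.
import numpy as np

def mmi_objective(num_loglik, den_logliks, den_lm_logps):
    den = np.logaddexp.reduce(np.array(den_logliks) + np.array(den_lm_logps))
    return num_loglik - den

def bmmi_objective(num_loglik, den_logliks, den_lm_logps, accuracies, b=0.5):
    # Boosting: hypotheses with low accuracy get their denominator score raised,
    # pushing the model harder away from likely-but-wrong competitors.
    boosted = (np.array(den_logliks) + np.array(den_lm_logps)
               - b * np.array(accuracies))
    return num_loglik - np.logaddexp.reduce(boosted)
```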

Machine Learning Algorithms for Classification of Microcirculation Images from Septic and Non-Septic Patients

Title Machine Learning Algorithms for Classification of Microcirculation Images from Septic and Non-Septic Patients
Authors Perikumar Javia, Aman Rana, Nathan Shapiro, Pratik Shah
Abstract Sepsis is a life-threatening disease and one of the major causes of death in hospitals. Imaging of microcirculatory dysfunction is a promising approach for automated diagnosis of sepsis. We report a machine learning classifier capable of distinguishing non-septic and septic images from dark field microcirculation videos of patients. The classifier achieves an accuracy of 89.45%. The area under the receiver operating characteristic curve of the classifier was 0.92, the precision was 0.92 and the recall was 0.84. Codes representing the learned feature space of the trained classifier were visualized using t-SNE embedding and were separable and distinguished between images from critically ill and non-septic patients. Using an unsupervised convolutional autoencoder, independent of the clinical diagnosis, we also report clustering of learned features from a compressed representation associated with healthy images and those with microcirculatory dysfunction. The feature space used by our trained classifier to distinguish between images from septic and non-septic patients has potential diagnostic application.
Tasks
Published 2018-10-24
URL http://arxiv.org/abs/1811.02659v2
PDF http://arxiv.org/pdf/1811.02659v2.pdf
PWC https://paperswithcode.com/paper/machine-learning-algorithms-for
Repo
Framework
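
For readers who want to reproduce the style of evaluation quoted above (accuracy, AUC, precision, recall, plus a t-SNE view of the learned codes), a generic scikit-learn sketch follows. The feature matrix, scores, and threshold are assumptions about any trained classifier, not the authors' pipeline.

```python
# Hedged evaluation sketch: score a binary septic/non-septic classifier and embed
# its learned feature codes with t-SNE for visual inspection of separability.
import numpy as np
from sklearn.manifold import TSNE
from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, roc_auc_score)

def evaluate_and_embed(y_true, y_score, features, threshold=0.5):
    y_pred = (np.asarray(y_score) >= threshold).astype(int)
    metrics = {
        "accuracy": accuracy_score(y_true, y_pred),
        "auc": roc_auc_score(y_true, y_score),
        "precision": precision_score(y_true, y_pred),
        "recall": recall_score(y_true, y_pred),
    }
    embedding = TSNE(n_components=2, perplexity=30).fit_transform(features)
    return metrics, embedding   # plot `embedding` coloured by y_true
```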

Dealing with Categorical and Integer-valued Variables in Bayesian Optimization with Gaussian Processes

Title Dealing with Categorical and Integer-valued Variables in Bayesian Optimization with Gaussian Processes
Authors Eduardo C. Garrido-Merchán, Daniel Hernández-Lobato
Abstract Bayesian Optimization (BO) methods are useful for optimizing functions that are expensive to evaluate, lack an analytical expression and whose evaluations can be contaminated by noise. These methods rely on a probabilistic model of the objective function, typically a Gaussian process (GP), upon which an acquisition function is built. The acquisition function guides the optimization process and measures the expected utility of performing an evaluation of the objective at a new point. GPs assume continuous input variables. When this is not the case, for example when some of the input variables take categorical or integer values, one has to introduce extra approximations. Consider a suggested input location taking values in the real line. Before doing the evaluation of the objective, a common approach is to use a one hot encoding approximation for categorical variables, or to round to the closest integer, in the case of integer-valued variables. We show that this can lead to problems in the optimization process and describe a more principled approach to account for input variables that are categorical or integer-valued. We illustrate in both synthetic and real experiments the utility of our approach, which significantly improves the results of standard BO methods using Gaussian processes on problems with categorical or integer-valued variables.
Tasks Gaussian Processes
Published 2018-05-09
URL http://arxiv.org/abs/1805.03463v2
PDF http://arxiv.org/pdf/1805.03463v2.pdf
PWC https://paperswithcode.com/paper/dealing-with-categorical-and-integer-valued
Repo
Framework
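
The more principled approach alluded to above can be summarized as: apply the rounding / one-hot decoding inside the covariance function, so the GP surrogate is constant over all real-valued points that map to the same integer or category and the acquisition function cannot exploit spurious variation between them. A minimal NumPy sketch of such a wrapped RBF kernel follows; the variable layout and kernel choice are illustrative.

```python
# Hedged sketch: transform inputs *inside* the GP kernel (round integers, decode
# one-hot blocks) instead of only when the objective is finally evaluated.
import numpy as np

def transform(x, integer_idx, cat_slices):
    z = np.array(x, dtype=float)
    for i in integer_idx:                      # round integer-valued dimensions
        z[i] = np.round(z[i])
    for sl in cat_slices:                      # one-hot block: winner takes all
        block = np.zeros(sl.stop - sl.start)
        block[np.argmax(z[sl])] = 1.0
        z[sl] = block
    return z

def rbf_kernel(x1, x2, integer_idx=(), cat_slices=(), lengthscale=1.0):
    z1 = transform(x1, integer_idx, cat_slices)
    z2 = transform(x2, integer_idx, cat_slices)
    return np.exp(-0.5 * np.sum((z1 - z2) ** 2) / lengthscale ** 2)

# Example: dimension 0 is an integer, dimensions 1-3 one-hot encode a category.
k = rbf_kernel([2.3, 0.1, 0.7, 0.2], [1.8, 0.2, 0.9, 0.1],
               integer_idx=[0], cat_slices=[slice(1, 4)])  # == 1.0: same int & category
```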

Importance Weighted Adversarial Nets for Partial Domain Adaptation

Title Importance Weighted Adversarial Nets for Partial Domain Adaptation
Authors Jing Zhang, Zewei Ding, Wanqing Li, Philip Ogunbona
Abstract This paper proposes an importance weighted adversarial nets-based method for unsupervised domain adaptation, specific for partial domain adaptation where the target domain has fewer classes than the source domain. Previous domain adaptation methods generally assume identical label spaces, such that reducing the distribution divergence leads to feasible knowledge transfer. However, such an assumption is no longer valid in a more realistic scenario that requires adaptation from a larger and more diverse source domain to a smaller target domain with fewer classes. This paper extends the adversarial nets-based domain adaptation and proposes a novel adversarial nets-based partial domain adaptation method to identify the source samples that are potentially from the outlier classes and, at the same time, reduce the shift of shared classes between domains.
Tasks Domain Adaptation, Partial Domain Adaptation, Transfer Learning, Unsupervised Domain Adaptation
Published 2018-03-25
URL http://arxiv.org/abs/1803.09210v2
PDF http://arxiv.org/pdf/1803.09210v2.pdf
PWC https://paperswithcode.com/paper/importance-weighted-adversarial-nets-for
Repo
Framework
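
A minimal sketch of the weighting mechanism, as one plausible reading of the abstract: a first domain discriminator scores how target-like each source sample is, and those detached, normalized scores reweight the source term of a second adversarial loss so that outlier-class source samples barely influence the alignment of the shared classes. Discriminator shapes and the normalization are assumptions.

```python
# Hedged sketch (not the authors' code) of importance-weighted adversarial training
# for partial domain adaptation.
import torch
import torch.nn.functional as F

def importance_weights(d0_src_logits):
    # d0 outputs P(domain = source | feature); target-like source samples get a
    # low source-probability and hence a higher weight.
    p_src = torch.sigmoid(d0_src_logits).detach()
    w = 1.0 - p_src
    return w / (w.mean() + 1e-8)              # normalize to mean 1

def weighted_adversarial_loss(d1_src_logits, d1_tgt_logits, w_src):
    src_loss = F.binary_cross_entropy_with_logits(
        d1_src_logits, torch.ones_like(d1_src_logits), weight=w_src)
    tgt_loss = F.binary_cross_entropy_with_logits(
        d1_tgt_logits, torch.zeros_like(d1_tgt_logits))
    return src_loss + tgt_loss
```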

On the Importance of Attention in Meta-Learning for Few-Shot Text Classification

Title On the Importance of Attention in Meta-Learning for Few-Shot Text Classification
Authors Xiang Jiang, Mohammad Havaei, Gabriel Chartrand, Hassan Chouaib, Thomas Vincent, Andrew Jesson, Nicolas Chapados, Stan Matwin
Abstract Current deep learning based text classification methods are limited by their ability to achieve fast learning and generalization when the data is scarce. We address this problem by integrating a meta-learning procedure that uses the knowledge learned across many tasks as an inductive bias towards better natural language understanding. Based on the Model-Agnostic Meta-Learning framework (MAML), we introduce the Attentive Task-Agnostic Meta-Learning (ATAML) algorithm for text classification. The essential difference between MAML and ATAML is in the separation of task-agnostic representation learning and task-specific attentive adaptation. The proposed ATAML is designed to encourage task-agnostic representation learning by way of task-agnostic parameterization and facilitate task-specific adaptation via attention mechanisms. We provide evidence to show that the attention mechanism in ATAML has a synergistic effect on learning performance. In comparisons with models trained from random initialization, pretrained models and meta trained MAML, our proposed ATAML method generalizes better on single-label and multi-label classification tasks in miniRCV1 and miniReuters-21578 datasets.
Tasks Meta-Learning, Multi-Label Classification, Representation Learning, Text Classification
Published 2018-06-03
URL http://arxiv.org/abs/1806.00852v1
PDF http://arxiv.org/pdf/1806.00852v1.pdf
PWC https://paperswithcode.com/paper/on-the-importance-of-attention-in-meta
Repo
Framework
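
The separation between task-agnostic and task-specific parameters can be sketched as a first-order meta-learning episode in which only the attention and classifier parameters are adapted on the support set, while the encoder is updated only by the outer meta-step. The module names, single inner step, and first-order simplification below are assumptions, not the ATAML implementation.

```python
# Hedged first-order sketch: encoder = task-agnostic (outer update only);
# attention + classifier = task-specific (adapted in the inner loop).
import torch

def ataml_episode(encoder, attention, classifier, task, loss_fn, meta_opt,
                  inner_lr=0.1):
    support_x, support_y, query_x, query_y = task
    task_params = list(attention.parameters()) + list(classifier.parameters())

    # Inner loop: one gradient step on the task-specific parameters only.
    loss = loss_fn(classifier(attention(encoder(support_x))), support_y)
    grads = torch.autograd.grad(loss, task_params)
    backups = [p.detach().clone() for p in task_params]
    with torch.no_grad():
        for p, g in zip(task_params, grads):
            p -= inner_lr * g                    # task-specific attentive adaptation

    # Outer step: query loss at the adapted parameters (first-order: gradients are
    # taken at the adapted values but applied to the pre-adaptation values).
    meta_loss = loss_fn(classifier(attention(encoder(query_x))), query_y)
    meta_opt.zero_grad()
    meta_loss.backward()
    with torch.no_grad():
        for p, b in zip(task_params, backups):   # reset task-specific params
            p.copy_(b)
    meta_opt.step()                              # meta-update all parameters
    return meta_loss.item()
```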

Context-Aware Deep Spatio-Temporal Network for Hand Pose Estimation from Depth Images

Title Context-Aware Deep Spatio-Temporal Network for Hand Pose Estimation from Depth Images
Authors Yiming Wu, Wei Ji, Xi Li, Gang Wang, Jianwei Yin, Fei Wu
Abstract As a fundamental and challenging problem in computer vision, hand pose estimation aims to estimate the hand joint locations from depth images. Typically, the problem is modeled as learning a mapping function from images to hand joint coordinates in a data-driven manner. In this paper, we propose Context-Aware Deep Spatio-Temporal Network (CADSTN), a novel method to jointly model the spatio-temporal properties for hand pose estimation. Our proposed network is able to learn the representations of the spatial information and the temporal structure from the image sequences. Moreover, by adopting an adaptive fusion method, the model is capable of dynamically weighting different predictions to lay emphasis on sufficient context. Our method is evaluated on two common benchmarks; the experimental results demonstrate that our proposed approach achieves the best or second-best performance compared with state-of-the-art methods and runs at 60 fps.
Tasks Hand Pose Estimation, Pose Estimation
Published 2018-10-06
URL http://arxiv.org/abs/1810.02994v1
PDF http://arxiv.org/pdf/1810.02994v1.pdf
PWC https://paperswithcode.com/paper/context-aware-deep-spatio-temporal-network
Repo
Framework
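
A minimal sketch of the adaptive fusion idea mentioned above: per-sample gating weights are predicted from shared features and used to blend the joint predictions of the individual branches. The branch count, feature size, and output shape are placeholders rather than the CADSTN architecture.

```python
# Hedged sketch of adaptive fusion: learn input-conditioned weights that blend the
# joint predictions of a spatial branch and a temporal branch.
import torch
import torch.nn as nn

class AdaptiveFusion(nn.Module):
    def __init__(self, feat_dim=256, num_branches=2):
        super().__init__()
        self.gate = nn.Linear(feat_dim, num_branches)

    def forward(self, shared_feat, branch_preds):
        # branch_preds: list of (B, J, 3) joint predictions from each branch.
        w = torch.softmax(self.gate(shared_feat), dim=-1)        # (B, num_branches)
        stacked = torch.stack(branch_preds, dim=1)               # (B, num_branches, J, 3)
        return (w[:, :, None, None] * stacked).sum(dim=1)        # fused joint estimate
```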

A Study of Student Learning Skills Using Fuzzy Relation Equations

Title A Study of Student Learning Skills Using Fuzzy Relation Equations
Authors Michael Gr. Voskoglou
Abstract Fuzzy relation equations (FRE) are associated with the composition of binary fuzzy relations. In the present work, FRE are used as a tool for studying the process of learning a new subject matter by a student class. A classroom application and other suitable examples connected to student learning of the derivative are also presented to illustrate our results, and useful conclusions are drawn.
Tasks
Published 2018-04-02
URL http://arxiv.org/abs/1804.00421v1
PDF http://arxiv.org/pdf/1804.00421v1.pdf
PWC https://paperswithcode.com/paper/a-study-of-student-learning-skills-using
Repo
Framework
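
Fuzzy relation equations are built on the max-min composition of binary fuzzy relations, which is easy to state concretely. The membership values in the example below are invented purely for illustration.

```python
# Hedged illustration of the max-min composition underlying fuzzy relation equations.
import numpy as np

def max_min_composition(R, S):
    # (R ∘ S)(x, z) = max_y min(R(x, y), S(y, z))
    return np.max(np.minimum(R[:, :, None], S[None, :, :]), axis=1)

# Example: R relates students to skills, S relates skills to mastery levels.
R = np.array([[0.8, 0.3],
              [0.4, 0.9]])
S = np.array([[0.6, 0.7],
              [0.5, 0.2]])
print(max_min_composition(R, S))   # [[0.6, 0.7], [0.5, 0.4]]
```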

Correction of AI systems by linear discriminants: Probabilistic foundations

Title Correction of AI systems by linear discriminants: Probabilistic foundations
Authors A. N. Gorban, A. Golubkov, B. Grechuk, E. M. Mirkes, I. Y. Tyukin
Abstract Artificial Intelligence (AI) systems sometimes make errors and will continue to make errors in the future. These errors are usually unexpected, and can lead to dramatic consequences. Intensive development of AI and its practical applications makes the problem of errors more important. Total re-engineering of the systems can create new errors and is not always possible due to the resources involved. The important challenge is to develop fast methods to correct errors without damaging existing skills. We formulate the technical requirements for ‘ideal’ correctors. Such correctors include binary classifiers, which separate the situations with a high risk of error from the situations where the AI systems work properly. Surprisingly, for essentially high-dimensional data such methods are possible: a simple linear Fisher discriminant can separate the situations with errors from correctly solved tasks even for exponentially large samples. The paper presents the probabilistic basis for fast non-destructive correction of AI systems. A series of new stochastic separation theorems is proven. These theorems provide new instruments for fast non-iterative correction of errors of legacy AI systems. The new approaches become efficient in high dimensions, for correction of high-dimensional systems in a high-dimensional world (i.e., for processing of essentially high-dimensional data by large systems).
Tasks
Published 2018-11-11
URL http://arxiv.org/abs/1811.05321v1
PDF http://arxiv.org/pdf/1811.05321v1.pdf
PWC https://paperswithcode.com/paper/correction-of-ai-systems-by-linear
Repo
Framework
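
A minimal sketch of the corrector construction described above: fit a Fisher linear discriminant separating inputs on which the legacy system erred from inputs it handled correctly, and use it as a cheap gate in front of the unchanged system. The regularization, threshold rule, and fallback behaviour are assumptions for illustration.

```python
# Hedged sketch of a Fisher-discriminant corrector for a legacy AI system.
import numpy as np

def fit_fisher_corrector(X_err, X_ok, reg=1e-3):
    mu_e, mu_o = X_err.mean(axis=0), X_ok.mean(axis=0)
    Sw = np.cov(X_err, rowvar=False) + np.cov(X_ok, rowvar=False)
    Sw += reg * np.eye(Sw.shape[0])                  # regularize within-class scatter
    w = np.linalg.solve(Sw, mu_e - mu_o)             # Fisher direction
    threshold = 0.5 * (w @ mu_e + w @ mu_o)          # midpoint decision threshold
    return w, threshold

def corrected_predict(x_feat, legacy_predict, fallback, w, threshold):
    if w @ x_feat > threshold:     # flagged as a likely-error situation
        return fallback(x_feat)    # e.g. defer to a human or a safer rule
    return legacy_predict(x_feat)
```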

Analysis on the Nonlinear Dynamics of Deep Neural Networks: Topological Entropy and Chaos

Title Analysis on the Nonlinear Dynamics of Deep Neural Networks: Topological Entropy and Chaos
Authors Husheng Li
Abstract The theoretical explanation for deep neural network (DNN) is still an open problem. In this paper DNN is considered as a discrete-time dynamical system due to its layered structure. The complexity provided by the nonlinearity in the dynamics is analyzed in terms of topological entropy and chaos characterized by Lyapunov exponents. The properties revealed for the dynamics of DNN are applied to analyze the corresponding capabilities of classification and generalization.
Tasks
Published 2018-04-03
URL http://arxiv.org/abs/1804.03987v3
PDF http://arxiv.org/pdf/1804.03987v3.pdf
PWC https://paperswithcode.com/paper/analysis-on-the-nonlinear-dynamics-of-deep
Repo
Framework
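
The layered-dynamical-system view can be made concrete with a small numerical experiment: compose the layer Jacobians of a network and read finite-time Lyapunov exponents off the singular values of the product. The random tanh network below is a stand-in for a trained DNN, so the numbers only illustrate the procedure.

```python
# Hedged numerical illustration: treat each layer as one step of a discrete-time
# dynamical system and estimate finite-time Lyapunov exponents from the singular
# values of the composed layer-to-layer Jacobians.
import numpy as np

def layer_jacobian(W, h):
    pre = W @ h
    return np.diag(1.0 - np.tanh(pre) ** 2) @ W      # d tanh(Wh) / dh

def finite_time_lyapunov(weights, h0):
    h, J = h0, np.eye(len(h0))
    for W in weights:                                 # compose Jacobians layer by layer
        J = layer_jacobian(W, h) @ J
        h = np.tanh(W @ h)
    svals = np.linalg.svd(J, compute_uv=False)
    return np.log(svals) / len(weights)               # per-layer exponents

rng = np.random.default_rng(0)
weights = [rng.normal(scale=1.5 / np.sqrt(64), size=(64, 64)) for _ in range(20)]
print(finite_time_lyapunov(weights, rng.normal(size=64))[:3])  # leading exponents
```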

Rearranging the Familiar: Testing Compositional Generalization in Recurrent Networks

Title Rearranging the Familiar: Testing Compositional Generalization in Recurrent Networks
Authors João Loula, Marco Baroni, Brenden M. Lake
Abstract Systematic compositionality is the ability to recombine meaningful units with regular and predictable outcomes, and it’s seen as key to humans’ capacity for generalization in language. Recent work has studied systematic compositionality in modern seq2seq models using generalization to novel navigation instructions in a grounded environment as a probing tool, requiring models to quickly bootstrap the meaning of new words. We extend this framework here to settings where the model needs only to recombine well-trained functional words (such as “around” and “right”) in novel contexts. Our findings confirm and strengthen the earlier ones: seq2seq models can be impressively good at generalizing to novel combinations of previously-seen input, but only when they receive extensive training on the specific pattern to be generalized (e.g., generalizing from many examples of “X around right” to “jump around right”), while failing when generalization requires novel application of compositional rules (e.g., inferring the meaning of “around right” from those of “right” and “around”).
Tasks
Published 2018-07-19
URL http://arxiv.org/abs/1807.07545v1
PDF http://arxiv.org/pdf/1807.07545v1.pdf
PWC https://paperswithcode.com/paper/rearranging-the-familiar-testing
Repo
Framework

MPTV: Matching Pursuit Based Total Variation Minimization for Image Deconvolution

Title MPTV: Matching Pursuit Based Total Variation Minimization for Image Deconvolution
Authors Dong Gong, Mingkui Tan, Qinfeng Shi, Anton van den Hengel, Yanning Zhang
Abstract Total variation (TV) regularization has proven effective for a range of computer vision tasks through its preferential weighting of sharp image edges. Existing TV-based methods, however, often suffer from the over-smoothing issue and solution bias caused by the homogeneous penalization. In this paper, we consider addressing these issues by applying inhomogeneous regularization on different image components. We formulate the inhomogeneous TV minimization problem as a convex quadratic constrained linear programming problem. Relying on this new model, we propose a matching pursuit based total variation minimization method (MPTV), specifically for image deconvolution. The proposed MPTV method is essentially a cutting-plane method, which iteratively activates a subset of nonzero image gradients, and then solves a subproblem focusing on those activated gradients only. Compared to existing methods, MPTV is less sensitive to the choice of the trade-off parameter between data fitting and regularization. Moreover, the inhomogeneity of MPTV alleviates the over-smoothing and ringing artifacts, and improves the robustness to errors in blur kernel. Extensive experiments on different tasks demonstrate the superiority of the proposed method over the current state-of-the-art.
Tasks Image Deconvolution
Published 2018-10-12
URL http://arxiv.org/abs/1810.05438v1
PDF http://arxiv.org/pdf/1810.05438v1.pdf
PWC https://paperswithcode.com/paper/mptv-matching-pursuit-based-total-variation
Repo
Framework
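
A heavily simplified 1-D sketch of the matching-pursuit flavour described above: difference (gradient) locations are activated a few at a time, and the TV penalty is applied only on the active set while the data term is reduced by gradient descent. The real MPTV solves a constrained quadratic program on images with a blur operator and a proper cutting-plane subproblem; everything below, including the step size and the symmetric-blur assumption, is a toy.

```python
# Hedged, toy 1-D illustration of a matching-pursuit-style TV deconvolution loop.
import numpy as np

def mptv_1d(y, blur, lam=0.1, outer_iters=5, k_per_iter=5, inner_iters=200, lr=0.2):
    x = y.copy()
    active = np.zeros(len(y) - 1, dtype=bool)         # active first-difference locations
    D = lambda v: np.diff(v)
    for _ in range(outer_iters):
        # Activate the k largest currently-inactive gradient magnitudes.
        score = np.abs(D(x)) * (~active)
        active[np.argsort(score)[-k_per_iter:]] = True
        for _ in range(inner_iters):                  # subproblem on the active set
            r = blur(x) - y
            grad_data = blur(r)                       # assumes a symmetric blur operator
            tv_sub = np.sign(D(x)) * active           # subgradient of |D x| on active set
            grad_tv = np.concatenate([[0.0], tv_sub]) - np.concatenate([tv_sub, [0.0]])
            x -= lr * (grad_data + lam * grad_tv)
    return x
```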