Paper Group ANR 78
Deep Supervision with Shape Concepts for Occlusion-Aware 3D Object Parsing. Efficient Nonparametric Smoothness Estimation. Autoencoder-based holographic image restoration. Unsupervised Nonlinear Spectral Unmixing based on a Multilinear Mixing Model. Going Deeper into Action Recognition: A Survey. Efficient Segmental Cascades for Speech Recognition. …
Deep Supervision with Shape Concepts for Occlusion-Aware 3D Object Parsing
Title | Deep Supervision with Shape Concepts for Occlusion-Aware 3D Object Parsing |
Authors | Chi Li, M. Zeeshan Zia, Quoc-Huy Tran, Xiang Yu, Gregory D. Hager, Manmohan Chandraker |
Abstract | Monocular 3D object parsing is highly desirable in various scenarios including occlusion reasoning and holistic scene interpretation. We present a deep convolutional neural network (CNN) architecture to localize semantic parts in 2D image and 3D space while inferring their visibility states, given a single RGB image. Our key insight is to exploit domain knowledge to regularize the network by deeply supervising its hidden layers, in order to sequentially infer intermediate concepts associated with the final task. To acquire training data in desired quantities with ground truth 3D shape and relevant concepts, we render 3D object CAD models to generate large-scale synthetic data and simulate challenging occlusion configurations between objects. We train the network only on synthetic data and demonstrate state-of-the-art performances on real image benchmarks including an extended version of KITTI, PASCAL VOC, PASCAL3D+ and IKEA for 2D and 3D keypoint localization and instance segmentation. The empirical results substantiate the utility of our deep supervision scheme by demonstrating effective transfer of knowledge from synthetic data to real images, resulting in less overfitting compared to standard end-to-end training. |
Tasks | Instance Segmentation, Semantic Segmentation |
Published | 2016-12-08 |
URL | http://arxiv.org/abs/1612.02699v3 |
PDF | http://arxiv.org/pdf/1612.02699v3.pdf |
PWC | https://paperswithcode.com/paper/deep-supervision-with-shape-concepts-for |
Repo | |
Framework | |
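As a rough illustration of the deep-supervision idea in the abstract, the sketch below attaches auxiliary heads to hidden layers of a small CNN and supervises them with intermediate concepts (part visibility, then 2D keypoints) before the final 3D keypoint head, summing the per-concept losses. The architecture, concept ordering, and loss weights are illustrative assumptions, not the paper's network.

```python
import torch
import torch.nn as nn

class DeeplySupervisedNet(nn.Module):
    """CNN with auxiliary heads on hidden layers for intermediate concepts (illustrative)."""
    def __init__(self, num_parts: int = 10):
        super().__init__()
        self.block1 = nn.Sequential(nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
                                    nn.AdaptiveAvgPool2d(8))
        self.block2 = nn.Sequential(nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
                                    nn.AdaptiveAvgPool2d(4))
        self.block3 = nn.Sequential(nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(),
                                    nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.head_visibility = nn.Linear(32 * 8 * 8, num_parts)   # concept 1: part visibility
        self.head_kp2d = nn.Linear(64 * 4 * 4, num_parts * 2)     # concept 2: 2D keypoints
        self.head_kp3d = nn.Linear(128, num_parts * 3)            # final task: 3D keypoints

    def forward(self, x):
        h1 = self.block1(x)
        h2 = self.block2(h1)
        h3 = self.block3(h2)
        return (self.head_visibility(h1.flatten(1)),
                self.head_kp2d(h2.flatten(1)),
                self.head_kp3d(h3))

def deep_supervision_loss(outputs, targets, weights=(0.3, 0.3, 1.0)):
    """Sum of per-concept losses; the intermediate terms regularize the hidden layers."""
    vis_logits, kp2d, kp3d = outputs
    vis_gt, kp2d_gt, kp3d_gt = targets
    return (weights[0] * nn.functional.binary_cross_entropy_with_logits(vis_logits, vis_gt)
            + weights[1] * nn.functional.mse_loss(kp2d, kp2d_gt)
            + weights[2] * nn.functional.mse_loss(kp3d, kp3d_gt))
```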
Efficient Nonparametric Smoothness Estimation
Title | Efficient Nonparametric Smoothness Estimation |
Authors | Shashank Singh, Simon S. Du, Barnabás Póczos |
Abstract | Sobolev quantities (norms, inner products, and distances) of probability density functions are important in the theory of nonparametric statistics, but have rarely been used in practice, partly due to a lack of practical estimators. They also include, as special cases, $L^2$ quantities which are used in many applications. We propose and analyze a family of estimators for Sobolev quantities of unknown probability density functions. We bound the bias and variance of our estimators over finite samples, finding that they are generally minimax rate-optimal. Our estimators are significantly more computationally tractable than previous estimators, and exhibit a statistical/computational trade-off allowing them to adapt to computational constraints. We also draw theoretical connections to recent work on fast two-sample testing. Finally, we empirically validate our estimators on synthetic data. |
Tasks | |
Published | 2016-05-19 |
URL | http://arxiv.org/abs/1605.05785v2 |
PDF | http://arxiv.org/pdf/1605.05785v2.pdf |
PWC | https://paperswithcode.com/paper/efficient-nonparametric-smoothness-estimation |
Repo | |
Framework | |
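The sketch below is a simple plug-in estimator of a Sobolev-type quantity for a density on [0, 1], in the spirit of (but not identical to) the estimators analyzed in the paper: estimate Fourier coefficients from samples, unbias the squared magnitudes with a U-statistic correction, and truncate at a cutoff K that trades statistics against computation. The weighting convention and the cutoff are assumptions made for illustration.

```python
import numpy as np

def sobolev_norm_sq_estimate(x: np.ndarray, s: float = 0.0, K: int = 20) -> float:
    """Estimate sum_{0 < |k| <= K} |2*pi*k|^(2s) |p_hat(k)|^2 from samples x in [0, 1].
    s = 0 gives (the non-constant part of) the squared L^2 norm of the density."""
    n = len(x)
    total = 0.0
    for k in range(1, K + 1):
        phi = np.exp(-2j * np.pi * k * x).mean()              # empirical Fourier coefficient
        # U-statistic correction removes the O(1/n) self-pair bias of |phi|^2.
        coef_sq = (n * n * np.abs(phi) ** 2 - n) / (n * (n - 1))
        total += 2.0 * (2 * np.pi * k) ** (2 * s) * coef_sq   # factor 2 accounts for -k terms
    return total

# Larger K lowers bias but raises variance and cost: the statistical/computational trade-off.
rng = np.random.default_rng(0)
print(sobolev_norm_sq_estimate(rng.beta(2, 5, size=5000), s=0.0, K=30))
```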
Autoencoder-based holographic image restoration
Title | Autoencoder-based holographic image restoration |
Authors | Tomoyoshi Shimobaba, Yutaka Endo, Ryuji Hirayama, Yuki Nagahama, Takayuki Takahashi, Takashi Nishitsuji, Takashi Kakue, Atsushi Shiraki, Naoki Takada, Nobuyuki Masuda, Tomoyoshi Ito |
Abstract | We propose a holographic image restoration method using an autoencoder, which is an artificial neural network. Because holographic reconstructed images are often contaminated by direct light, conjugate light, and speckle noise, the discrimination of reconstructed images may be difficult. In this paper, we demonstrate the restoration of reconstructed images from holograms that record page data in holographic memory and QR codes by using the proposed method. |
Tasks | Image Restoration |
Published | 2016-12-12 |
URL | http://arxiv.org/abs/1612.03959v1 |
PDF | http://arxiv.org/pdf/1612.03959v1.pdf |
PWC | https://paperswithcode.com/paper/autoencoder-based-holographic-image |
Repo | |
Framework | |
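A minimal sketch of the underlying idea, assuming a small convolutional denoising autoencoder trained to map contaminated reconstructions back to clean binary page data; the architecture and the synthetic noise model below stand in for real holographic reconstructions.

```python
import torch
import torch.nn as nn

class DenoisingAutoencoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU())
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(16, 1, 4, stride=2, padding=1), nn.Sigmoid())

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = DenoisingAutoencoder()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for step in range(100):
    clean = (torch.rand(8, 1, 64, 64) > 0.5).float()                # stand-in for page data
    noisy = (clean + 0.5 * torch.randn_like(clean)).clamp(0, 1)     # stand-in for speckle / DC terms
    loss = nn.functional.binary_cross_entropy(model(noisy), clean)  # learn to restore clean pages
    opt.zero_grad(); loss.backward(); opt.step()
```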
Unsupervised Nonlinear Spectral Unmixing based on a Multilinear Mixing Model
Title | Unsupervised Nonlinear Spectral Unmixing based on a Multilinear Mixing Model |
Authors | Qi Wei, Marcus Chen, Jean-Yves Tourneret, Simon Godsill |
Abstract | In the community of remote sensing, nonlinear mixing models have recently received particular attention in hyperspectral image processing. In this paper, we present a novel nonlinear spectral unmixing method following the recent multilinear mixing model of [1], which includes an infinite number of terms related to interactions between different endmembers. The proposed unmixing method is unsupervised in the sense that the endmembers are estimated jointly with the abundances and other parameters of interest, i.e., the transition probability of undergoing further interactions. Non-negativity and sum-to-one constraints are imposed on abundances while only non-negativity is considered for endmembers. The resulting unmixing problem is formulated as a constrained nonlinear optimization problem, which is solved by a block coordinate descent strategy, consisting of updating the endmembers, abundances and transition probability iteratively. The proposed method is evaluated and compared with linear unmixing methods for synthetic and real hyperspectral datasets acquired by the AVIRIS sensor. The advantage of using non-linear unmixing as opposed to linear unmixing is clearly shown in these examples. |
Tasks | |
Published | 2016-04-14 |
URL | http://arxiv.org/abs/1604.04293v1 |
PDF | http://arxiv.org/pdf/1604.04293v1.pdf |
PWC | https://paperswithcode.com/paper/unsupervised-nonlinear-spectral-unmixing |
Repo | |
Framework | |
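The following sketch shows the block coordinate descent structure described in the abstract: projected updates of nonnegative endmembers, simplex-constrained abundances, and a per-pixel interaction probability. The forward model is a plain linear mixture used as a placeholder; the paper's multilinear model of [1] would replace `reconstruct`, and the `P` update is omitted for the same reason.

```python
import numpy as np

def project_simplex(a):
    """Euclidean projection of each column of `a` onto the probability simplex."""
    u = np.sort(a, axis=0)[::-1]
    css = np.cumsum(u, axis=0) - 1
    idx = np.arange(1, a.shape[0] + 1)[:, None]
    rho = np.max(np.where(u - css / idx > 0, idx, 0), axis=0)
    theta = css[rho - 1, np.arange(a.shape[1])] / rho
    return np.maximum(a - theta, 0)

def reconstruct(E, A, P):
    return E @ A   # placeholder: plain linear mixing; the multilinear model of [1] would go here

def unmix(Y, R, iters=200, lr=1e-3, seed=0):
    L, N = Y.shape                                        # L bands, N pixels, R endmembers
    rng = np.random.default_rng(seed)
    E = np.abs(rng.normal(size=(L, R)))                   # nonnegative endmembers
    A = project_simplex(np.abs(rng.normal(size=(R, N))))  # abundances on the simplex
    P = np.full(N, 0.1)                                   # per-pixel interaction probability
    for _ in range(iters):
        Res = reconstruct(E, A, P) - Y
        E = np.maximum(E - lr * Res @ A.T, 0)             # endmember block (projected gradient)
        A = project_simplex(A - lr * E.T @ Res)           # abundance block (projected gradient)
        # The P block would use the gradient of the multilinear residual; omitted here.
    return E, A, P
```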
Going Deeper into Action Recognition: A Survey
Title | Going Deeper into Action Recognition: A Survey |
Authors | Samitha Herath, Mehrtash Harandi, Fatih Porikli |
Abstract | Understanding human actions in visual data is tied to advances in complementary research areas including object recognition, human dynamics, domain adaptation and semantic segmentation. Over the last decade, human action analysis evolved from earlier schemes that are often limited to controlled environments to nowadays advanced solutions that can learn from millions of videos and apply to almost all daily activities. Given the broad range of applications from video surveillance to human-computer interaction, scientific milestones in action recognition are achieved more rapidly, eventually leading to the demise of what used to be good in a short time. This motivated us to provide a comprehensive review of the notable steps taken towards recognizing human actions. To this end, we start our discussion with the pioneering methods that use handcrafted representations, and then, navigate into the realm of deep learning based approaches. We aim to remain objective throughout this survey, touching upon encouraging improvements as well as inevitable fallbacks, in the hope of raising fresh questions and motivating new research directions for the reader. |
Tasks | Domain Adaptation, Human Dynamics, Object Recognition, Semantic Segmentation, Temporal Action Localization |
Published | 2016-05-16 |
URL | http://arxiv.org/abs/1605.04988v2 |
PDF | http://arxiv.org/pdf/1605.04988v2.pdf |
PWC | https://paperswithcode.com/paper/going-deeper-into-action-recognition-a-survey |
Repo | |
Framework | |
Efficient Segmental Cascades for Speech Recognition
Title | Efficient Segmental Cascades for Speech Recognition |
Authors | Hao Tang, Weiran Wang, Kevin Gimpel, Karen Livescu |
Abstract | Discriminative segmental models offer a way to incorporate flexible feature functions into speech recognition. However, their appeal has been limited by their computational requirements, due to the large number of possible segments to consider. Multi-pass cascades of segmental models introduce features of increasing complexity in different passes, where in each pass a segmental model rescores lattices produced by a previous (simpler) segmental model. In this paper, we explore several ways of making segmental cascades efficient and practical: reducing the feature set in the first pass, frame subsampling, and various pruning approaches. In experiments on phonetic recognition, we find that with a combination of such techniques, it is possible to maintain competitive performance while greatly reducing decoding, pruning, and training time. |
Tasks | Speech Recognition |
Published | 2016-08-02 |
URL | http://arxiv.org/abs/1608.00929v1 |
PDF | http://arxiv.org/pdf/1608.00929v1.pdf |
PWC | https://paperswithcode.com/paper/efficient-segmental-cascades-for-speech |
Repo | |
Framework | |
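A toy illustration of the multi-pass idea, assuming n-best rescoring rather than lattice rescoring for brevity: a cheap first-pass scorer prunes candidates to a small beam, and an expensive second-pass scorer is applied only to the survivors. The candidate strings and both scorers are placeholders, not the paper's segmental models.

```python
from typing import Callable, List, Tuple

def cascade_decode(candidates: List[str],
                   cheap_score: Callable[[str], float],
                   expensive_score: Callable[[str], float],
                   beam: int = 5) -> Tuple[str, float]:
    # Pass 1: score everything with the cheap model, prune to the beam.
    survivors = sorted(candidates, key=cheap_score, reverse=True)[:beam]
    # Pass 2: add the expensive features only for the pruned set.
    rescored = [(c, cheap_score(c) + expensive_score(c)) for c in survivors]
    return max(rescored, key=lambda t: t[1])

# Placeholder candidates (phone sequences) and placeholder scorers.
cands = ["sil k ae t sil", "sil k ah t sil", "sil g ae t sil"]
print(cascade_decode(cands,
                     cheap_score=lambda s: -len(s),
                     expensive_score=lambda s: s.count("ae") * 2.0))
```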
Active Detection and Localization of Textureless Objects in Cluttered Environments
Title | Active Detection and Localization of Textureless Objects in Cluttered Environments |
Authors | Marco Imperoli, Alberto Pretto |
Abstract | This paper introduces an active object detection and localization framework that combines a robust untextured object detection and 3D pose estimation algorithm with a novel next-best-view selection strategy. We address the detection and localization problems by proposing an edge-based registration algorithm that refines the object position by minimizing a cost directly extracted from a 3D image tensor that encodes the minimum distance to an edge point in a joint direction/location space. We face the next-best-view problem by exploiting a sequential decision process that, for each step, selects the next camera position which maximizes the mutual information between the state and the next observations. We solve the intrinsic intractability of this solution by generating observations that represent scene realizations, i.e. combination samples of object hypotheses provided by the object detector, while modeling the state by means of a set of constantly resampled particles. Experiments performed on different real-world, challenging datasets confirm the effectiveness of the proposed methods. |
Tasks | 3D Pose Estimation, Object Detection, Pose Estimation |
Published | 2016-03-22 |
URL | http://arxiv.org/abs/1603.07022v1 |
PDF | http://arxiv.org/pdf/1603.07022v1.pdf |
PWC | https://paperswithcode.com/paper/active-detection-and-localization-of |
Repo | |
Framework | |
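The sketch below shows one way to realize the mutual-information-based next-best-view selection the abstract describes, assuming a weighted-particle state and a discrete observation model: the chosen view maximizes H(obs) - H(obs | state). The sensor model, particle set, and candidate views are made-up placeholders.

```python
import numpy as np

def entropy(p):
    p = p[p > 0]
    return -np.sum(p * np.log(p))

def mutual_information(weights, obs_given_state):
    """weights: (N,) particle weights; obs_given_state: (N, M), rows are p(o | s_i, view)."""
    p_obs = weights @ obs_given_state                     # marginal observation distribution
    cond = np.sum(weights * np.array([entropy(row) for row in obs_given_state]))
    return entropy(p_obs) - cond                          # I(state; obs) = H(obs) - H(obs | state)

def next_best_view(weights, sensor_model, views):
    """sensor_model(view) -> (N, M) observation distribution for every particle."""
    scores = [mutual_information(weights, sensor_model(v)) for v in views]
    return views[int(np.argmax(scores))]

# Placeholder: 3 pose particles, 4 observation bins, 2 candidate camera positions.
w = np.array([0.5, 0.3, 0.2])
def sensor_model(view):
    rng = np.random.default_rng(view)
    m = rng.random((3, 4))
    return m / m.sum(axis=1, keepdims=True)
print(next_best_view(w, sensor_model, views=[0, 1]))
```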
Mollifying Networks
Title | Mollifying Networks |
Authors | Caglar Gulcehre, Marcin Moczulski, Francesco Visin, Yoshua Bengio |
Abstract | The optimization of deep neural networks can be more challenging than traditional convex optimization problems due to the highly non-convex nature of the loss function, e.g. it can involve pathological landscapes such as saddle-surfaces that can be difficult to escape for algorithms based on simple gradient descent. In this paper, we attack the problem of optimization of highly non-convex neural networks by starting with a smoothed – or *mollified* – objective function that gradually has a more non-convex energy landscape during the training. Our proposition is inspired by the recent studies in continuation methods: similar to curriculum methods, we begin learning an easier (possibly convex) objective function and let it evolve during the training, until it eventually goes back to being the original, difficult to optimize, objective function. The complexity of the mollified networks is controlled by a single hyperparameter which is annealed during the training. We show improvements on various difficult optimization tasks and establish a relationship with recent works on continuation methods for neural networks and mollifiers. |
Tasks | |
Published | 2016-08-17 |
URL | http://arxiv.org/abs/1608.04980v1 |
PDF | http://arxiv.org/pdf/1608.04980v1.pdf |
PWC | https://paperswithcode.com/paper/mollifying-networks |
Repo | |
Framework | |
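One simple way to realize an annealed, mollified objective (not the paper's exact construction) is to average the loss over Gaussian perturbations of the weights and anneal the perturbation scale toward zero, so training ends on the original objective. The sketch below does exactly that, with `sigma` playing the role of the single annealed hyperparameter.

```python
import torch
import torch.nn as nn

def mollified_step(model, loss_fn, x, y, sigma, opt, n_samples=4):
    """One optimizer step on a Monte-Carlo estimate of E_eps[ L(theta + sigma * eps) ]."""
    opt.zero_grad()
    params = list(model.parameters())
    for _ in range(n_samples):
        noises = [sigma * torch.randn_like(p) for p in params]
        with torch.no_grad():
            for p, n in zip(params, noises):
                p.add_(n)                                  # perturb the weights
        loss = loss_fn(model(x), y) / n_samples
        loss.backward()                                    # gradient taken at the perturbed point
        with torch.no_grad():
            for p, n in zip(params, noises):
                p.sub_(n)                                  # restore the weights
    opt.step()

model = nn.Sequential(nn.Linear(10, 32), nn.Tanh(), nn.Linear(32, 1))
opt = torch.optim.SGD(model.parameters(), lr=0.1)
for epoch in range(50):
    sigma = 0.5 * (1.0 - epoch / 50)                       # anneal the smoothing toward zero
    x, y = torch.randn(64, 10), torch.randn(64, 1)
    mollified_step(model, nn.functional.mse_loss, x, y, sigma, opt)
```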
Sequential geophysical and flow inversion to characterize fracture networks in subsurface systems
Title | Sequential geophysical and flow inversion to characterize fracture networks in subsurface systems |
Authors | M. K. Mudunuru, S. Karra, N. Makedonska, T. Chen |
Abstract | Subsurface applications including geothermal, geological carbon sequestration, oil and gas, etc., typically involve maximizing either the extraction of energy or the storage of fluids. Characterizing the subsurface is extremely complex due to heterogeneity and anisotropy. Due to this complexity, there are uncertainties in the subsurface parameters, which need to be estimated from multiple diverse as well as fragmented data streams. In this paper, we present a non-intrusive sequential inversion framework for integrating data from geophysical and flow sources to constrain subsurface Discrete Fracture Networks (DFN). In this approach, we first estimate bounds on the statistics for the DFN fracture orientations using microseismic data. These bounds are estimated through a combination of a focal mechanism (physics-based approach) and clustering analysis (statistical approach) of seismic data. Then, the fracture lengths are constrained based on the flow data. The efficacy of this multi-physics based sequential inversion is demonstrated through a representative synthetic example. |
Tasks | |
Published | 2016-06-14 |
URL | http://arxiv.org/abs/1606.04464v3 |
PDF | http://arxiv.org/pdf/1606.04464v3.pdf |
PWC | https://paperswithcode.com/paper/sequential-geophysical-and-flow-inversion-to |
Repo | |
Framework | |
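As a rough sketch of the statistical half of the first inversion stage, the snippet below clusters hypothetical fracture strike angles (as might be attributed to microseismic events) and reports per-cluster statistics as orientation bounds for a DFN generator; the focal-mechanism analysis and the flow-based length constraint from the paper are not modeled here.

```python
import numpy as np

def kmeans_1d(x, k, iters=50, seed=0):
    """Tiny k-means for 1D data: assign points to nearest center, then recompute centers."""
    rng = np.random.default_rng(seed)
    centers = rng.choice(x, size=k, replace=False)
    for _ in range(iters):
        labels = np.argmin(np.abs(x[:, None] - centers[None, :]), axis=1)
        centers = np.array([x[labels == j].mean() if np.any(labels == j) else centers[j]
                            for j in range(k)])
    return centers, labels

# Hypothetical fracture strike angles (degrees) inferred from microseismic events.
rng = np.random.default_rng(1)
angles = np.concatenate([rng.normal(30, 5, 200), rng.normal(120, 8, 150)])
centers, labels = kmeans_1d(angles, k=2)
for j in range(len(centers)):
    cluster = angles[labels == j]
    lo, hi = cluster.mean() - 2 * cluster.std(), cluster.mean() + 2 * cluster.std()
    print(f"fracture set {j}: mean strike {cluster.mean():.1f} deg, bounds ~ [{lo:.1f}, {hi:.1f}]")
```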
SMASH: Physics-guided Reconstruction of Collisions from Videos
Title | SMASH: Physics-guided Reconstruction of Collisions from Videos |
Authors | Aron Monszpart, Nils Thuerey, Niloy J. Mitra |
Abstract | Collision sequences are commonly used in games and entertainment to add drama and excitement. Authoring even two-body collisions in the real world can be difficult, as one has to get timing and the object trajectories to be correctly synchronized. After tedious trial-and-error iterations, when objects can actually be made to collide, then they are difficult to capture in 3D. In contrast, synthetically generating plausible collisions is difficult as it requires adjusting different collision parameters (e.g., object mass ratio, coefficient of restitution, etc.) and appropriate initial parameters. We present SMASH to read off appropriate collision parameters directly from raw input video recordings. Technically we enable this by utilizing laws of rigid body collision to regularize the problem of lifting 2D trajectories to a physically valid 3D reconstruction of the collision. The reconstructed sequences can then be modified and combined to easily author novel and plausible collisions. We evaluate our system on a range of synthetic scenes and demonstrate the effectiveness of our method by accurately reconstructing several complex real world collision events. |
Tasks | 3D Reconstruction |
Published | 2016-03-29 |
URL | http://arxiv.org/abs/1603.08984v2 |
PDF | http://arxiv.org/pdf/1603.08984v2.pdf |
PWC | https://paperswithcode.com/paper/smash-physics-guided-reconstruction-of |
Repo | |
Framework | |
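A tiny worked example of how rigid-body collision laws constrain parameters recovered from tracked trajectories: with pre- and post-collision velocities fitted from positions, conservation of linear momentum gives the mass ratio, and the relative velocities give the coefficient of restitution. The 1D tracks below are fabricated for illustration; the actual system lifts 2D video tracks to physically valid 3D.

```python
import numpy as np

def fit_velocity(times, positions):
    """Least-squares constant velocity over a short track segment."""
    slope, _ = np.polyfit(times, positions, deg=1)
    return slope

# Fabricated 1D tracks: object 1 hits a resting object 2 at x = 2.0 around t = 1.0 s.
t_pre = np.array([0.0, 0.2, 0.4, 0.6, 0.8])
t_post = np.array([1.2, 1.4, 1.6, 1.8, 2.0])
x1_pre, x1_post = 2.0 * t_pre, 2.0 + 0.5 * (t_post - 1.0)
x2_pre, x2_post = np.full_like(t_pre, 2.0), 2.0 + 1.0 * (t_post - 1.0)

v1, v1p = fit_velocity(t_pre, x1_pre), fit_velocity(t_post, x1_post)
v2, v2p = fit_velocity(t_pre, x2_pre), fit_velocity(t_post, x2_post)

mass_ratio = abs(v1p - v1) / abs(v2p - v2)   # m2 / m1 from conservation of linear momentum
restitution = -(v1p - v2p) / (v1 - v2)       # coefficient of restitution along the contact line
print(f"m2/m1 ~ {mass_ratio:.2f}, e ~ {restitution:.2f}")
```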
Fairness in Learning: Classic and Contextual Bandits
Title | Fairness in Learning: Classic and Contextual Bandits |
Authors | Matthew Joseph, Michael Kearns, Jamie Morgenstern, Aaron Roth |
Abstract | We introduce the study of fairness in multi-armed bandit problems. Our fairness definition can be interpreted as demanding that given a pool of applicants (say, for college admission or mortgages), a worse applicant is never favored over a better one, despite a learning algorithm’s uncertainty over the true payoffs. We prove results of two types. First, in the important special case of the classic stochastic bandits problem (i.e., in which there are no contexts), we provide a provably fair algorithm based on “chained” confidence intervals, and provide a cumulative regret bound with a cubic dependence on the number of arms. We further show that any fair algorithm must have such a dependence. When combined with regret bounds for standard non-fair algorithms such as UCB, this proves a strong separation between fair and unfair learning, which extends to the general contextual case. In the general contextual case, we prove a tight connection between fairness and the KWIK (Knows What It Knows) learning model: a KWIK algorithm for a class of functions can be transformed into a provably fair contextual bandit algorithm, and conversely any fair contextual bandit algorithm can be transformed into a KWIK learning algorithm. This tight connection allows us to provide a provably fair algorithm for the linear contextual bandit problem with a polynomial dependence on the dimension, and to show (for a different class of functions) a worst-case exponential gap in regret between fair and non-fair learning algorithms. |
Tasks | Multi-Armed Bandits |
Published | 2016-05-23 |
URL | http://arxiv.org/abs/1605.07139v2 |
PDF | http://arxiv.org/pdf/1605.07139v2.pdf |
PWC | https://paperswithcode.com/paper/fairness-in-learning-classic-and-contextual |
Repo | |
Framework | |
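A sketch of a fairness-preserving selection rule in the spirit of the abstract, assuming the "chained" confidence-interval idea: arms whose intervals overlap the best arm's interval, directly or transitively, are treated as indistinguishable and played uniformly at random, so no plausibly worse arm is ever favored over a plausibly better one. The confidence-width constant is a placeholder, not the paper's bound.

```python
import numpy as np

def chained_set(lower, upper):
    """Arms whose confidence intervals are chained (transitively overlap) to the top arm."""
    chain = {int(np.argmax(upper))}
    changed = True
    while changed:
        changed = False
        lo = min(lower[i] for i in chain)
        for a in range(len(upper)):
            if a not in chain and upper[a] >= lo:
                chain.add(a)
                changed = True
    return sorted(chain)

rng = np.random.default_rng(0)
true_means, K, T = np.array([0.2, 0.5, 0.55]), 3, 2000
counts, sums = np.ones(K), np.zeros(K)               # counts start at 1 to avoid division by zero
for t in range(1, T + 1):
    means = sums / counts
    width = np.sqrt(2 * np.log(1 + t) / counts)      # placeholder confidence width
    arm = rng.choice(chained_set(means - width, means + width))   # uniform over the chained set
    reward = float(rng.random() < true_means[arm])
    counts[arm] += 1
    sums[arm] += reward
print("pull counts:", counts)
```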
Clinical Text Prediction with Numerically Grounded Conditional Language Models
Title | Clinical Text Prediction with Numerically Grounded Conditional Language Models |
Authors | Georgios P. Spithourakis, Steffen E. Petersen, Sebastian Riedel |
Abstract | Assisted text input techniques can save time and effort and improve text quality. In this paper, we investigate how grounded and conditional extensions to standard neural language models can bring improvements in the tasks of word prediction and completion. These extensions incorporate a structured knowledge base and numerical values from the text into the context used to predict the next word. Our automated evaluation on a clinical dataset shows extended models significantly outperform standard models. Our best system uses both conditioning and grounding, because of their orthogonal benefits. For word prediction with a list of 5 suggestions, it improves recall from 25.03% to 71.28% and for word completion it improves keystroke savings from 34.35% to 44.81%, where the theoretical bound for this dataset is 58.78%. We also perform a qualitative investigation of how models with lower perplexity occasionally fare better at the tasks. We found that at test time numbers have more influence on the document level than on individual word probabilities. |
Tasks | |
Published | 2016-10-20 |
URL | http://arxiv.org/abs/1610.06370v1 |
PDF | http://arxiv.org/pdf/1610.06370v1.pdf |
PWC | https://paperswithcode.com/paper/clinical-text-prediction-with-numerically |
Repo | |
Framework | |
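A minimal sketch of conditioning and grounding a neural language model: the per-step input concatenates the token embedding with a numeric feature parsed from the text (grounding) and an embedding of structured context such as knowledge-base fields (conditioning). Dimensions, features, and the toy batch are assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn

class GroundedLM(nn.Module):
    def __init__(self, vocab_size=1000, kb_dim=16, emb_dim=64, hidden=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.rnn = nn.LSTM(emb_dim + 1 + kb_dim, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab_size)

    def forward(self, tokens, numeric_values, kb_vector):
        """tokens: (B, T) ids; numeric_values: (B, T) floats parsed from the text
        (0 for non-numeric tokens); kb_vector: (B, kb_dim) structured context."""
        B, T = tokens.shape
        emb = self.embed(tokens)                                 # (B, T, emb_dim)
        num = numeric_values.unsqueeze(-1)                       # (B, T, 1): grounding signal
        kb = kb_vector.unsqueeze(1).expand(B, T, -1)             # (B, T, kb_dim): conditioning
        h, _ = self.rnn(torch.cat([emb, num, kb], dim=-1))
        return self.out(h)                                       # per-step next-word logits

# Toy forward pass on a placeholder batch.
model = GroundedLM()
logits = model(torch.randint(0, 1000, (2, 12)), torch.zeros(2, 12), torch.randn(2, 16))
print(logits.shape)   # torch.Size([2, 12, 1000])
```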
Supervised topic models for clinical interpretability
Title | Supervised topic models for clinical interpretability |
Authors | Michael C. Hughes, Huseyin Melih Elibol, Thomas McCoy, Roy Perlis, Finale Doshi-Velez |
Abstract | Supervised topic models can help clinical researchers find interpretable co-occurrence patterns in count data that are relevant for diagnostics. However, standard formulations of supervised Latent Dirichlet Allocation have two problems. First, when documents have many more words than labels, the influence of the labels will be negligible. Second, due to conditional independence assumptions in the graphical model, the impact of supervised labels on the learned topic-word probabilities is often minimal, leading to poor predictions on heldout data. We investigate penalized optimization methods for training sLDA that produce interpretable topic-word parameters and useful heldout predictions, using recognition networks to speed up inference. We report preliminary results on synthetic data and on predicting successful anti-depressant medication given a patient’s diagnostic history. |
Tasks | Topic Models |
Published | 2016-12-06 |
URL | http://arxiv.org/abs/1612.01678v1 |
PDF | http://arxiv.org/pdf/1612.01678v1.pdf |
PWC | https://paperswithcode.com/paper/supervised-topic-models-for-clinical |
Repo | |
Framework | |
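A hedged sketch of the kind of penalized objective the abstract alludes to: a per-document loss that combines word reconstruction under a topic model with a label-prediction term whose weight keeps a single label from being swamped by many words. This is a simplified surrogate (softmax-parameterized proportions, logistic label head), not the paper's sLDA formulation or its recognition-network inference.

```python
import torch
import torch.nn.functional as F

def penalized_doc_loss(word_counts, label, topic_word, theta_logits, label_weights, lam=50.0):
    """word_counts: (V,); label: 0/1; topic_word: (K, V) rows on the simplex;
    theta_logits: (K,) free parameters; label_weights: (K,) logistic weights."""
    theta = F.softmax(theta_logits, dim=0)                  # document-topic proportions
    word_probs = theta @ topic_word                         # (V,) mixture over topics
    nll_words = -(word_counts * torch.log(word_probs + 1e-10)).sum()
    logit = (label_weights * theta).sum()
    nll_label = F.binary_cross_entropy_with_logits(logit, torch.tensor(float(label)))
    return nll_words + lam * nll_label                      # lam keeps the label from being swamped

# Toy usage: V = 6 words, K = 2 topics; optimize the proportions of a single labeled document.
V, K = 6, 2
topic_word = F.softmax(torch.randn(K, V), dim=1)
counts = torch.tensor([3.0, 0.0, 1.0, 0.0, 2.0, 0.0])
theta_logits = torch.zeros(K, requires_grad=True)
label_weights = torch.tensor([2.0, -2.0])
opt = torch.optim.Adam([theta_logits], lr=0.1)
for _ in range(100):
    loss = penalized_doc_loss(counts, 1, topic_word, theta_logits, label_weights)
    opt.zero_grad(); loss.backward(); opt.step()
```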
On Structured Sparsity of Phonological Posteriors for Linguistic Parsing
Title | On Structured Sparsity of Phonological Posteriors for Linguistic Parsing |
Authors | Milos Cernak, Afsaneh Asaei, Hervé Bourlard |
Abstract | The speech signal conveys information on different time scales, from short time scale or segmental, associated with phonological and phonetic information, to long time scale or supra-segmental, associated with syllabic and prosodic information. Linguistic and neurocognitive studies recognize the phonological classes at segmental level as the essential and invariant representations used in speech temporal organization. In the context of speech processing, a deep neural network (DNN) is an effective computational method to infer the probability of individual phonological classes from a short segment of speech signal. A vector of all phonological class probabilities is referred to as phonological posterior. There are only very few classes comprising a short-term speech signal; hence, the phonological posterior is a sparse vector. Although the phonological posteriors are estimated at segmental level, we claim that they convey supra-segmental information. Specifically, we demonstrate that phonological posteriors are indicative of syllabic and prosodic events. Building on findings from converging linguistic evidence on the gestural model of Articulatory Phonology as well as the neural basis of speech perception, we hypothesize that phonological posteriors convey properties of linguistic classes at multiple time scales, and this information is embedded in their support (index) of active coefficients. To verify this hypothesis, we obtain a binary representation of phonological posteriors at the segmental level which is referred to as first-order sparsity structure; the high-order structures are obtained by the concatenation of first-order binary vectors. It is then confirmed that the classification of supra-segmental linguistic events, the problem known as linguistic parsing, can be achieved with high accuracy using a simple binary pattern matching of first-order or high-order structures. |
Tasks | |
Published | 2016-01-21 |
URL | http://arxiv.org/abs/1601.05647v3 |
PDF | http://arxiv.org/pdf/1601.05647v3.pdf |
PWC | https://paperswithcode.com/paper/on-structured-sparsity-of-phonological |
Repo | |
Framework | |
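The sketch below illustrates the binary-structure idea: threshold phonological posteriors into first-order binary support vectors, concatenate consecutive vectors into high-order structures, and classify a supra-segmental event by the nearest stored pattern in Hamming distance. The posterior values, threshold, and templates are illustrative placeholders.

```python
import numpy as np

def first_order(posteriors, thresh=0.1):
    """posteriors: (T, C) per-segment class probabilities -> (T, C) binary support."""
    return (posteriors >= thresh).astype(np.uint8)

def high_order(binary, order=3):
    """Concatenate `order` consecutive first-order vectors into one long binary pattern."""
    T, C = binary.shape
    return np.stack([binary[t:t + order].reshape(-1) for t in range(T - order + 1)])

def classify(pattern, templates, labels):
    """Nearest-template classification by Hamming distance."""
    dists = [(pattern != tpl).sum() for tpl in templates]
    return labels[int(np.argmin(dists))]

# Hypothetical usage with C = 4 phonological classes and two stored prosodic-event templates.
rng = np.random.default_rng(0)
post = rng.dirichlet(np.ones(4) * 0.3, size=10)      # (T=10, C=4) sparse-ish posteriors
patterns = high_order(first_order(post), order=3)
templates = [np.zeros(12, np.uint8), np.ones(12, np.uint8)]
print(classify(patterns[0], templates, labels=["unaccented", "accented"]))
```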
Theory-guided Data Science: A New Paradigm for Scientific Discovery from Data
Title | Theory-guided Data Science: A New Paradigm for Scientific Discovery from Data |
Authors | Anuj Karpatne, Gowtham Atluri, James Faghmous, Michael Steinbach, Arindam Banerjee, Auroop Ganguly, Shashi Shekhar, Nagiza Samatova, Vipin Kumar |
Abstract | Data science models, although successful in a number of commercial domains, have had limited applicability in scientific problems involving complex physical phenomena. Theory-guided data science (TGDS) is an emerging paradigm that aims to leverage the wealth of scientific knowledge for improving the effectiveness of data science models in enabling scientific discovery. The overarching vision of TGDS is to introduce scientific consistency as an essential component for learning generalizable models. Further, by producing scientifically interpretable models, TGDS aims to advance our scientific understanding by discovering novel domain insights. Indeed, the paradigm of TGDS has started to gain prominence in a number of scientific disciplines such as turbulence modeling, material discovery, quantum chemistry, bio-medical science, bio-marker discovery, climate science, and hydrology. In this paper, we formally conceptualize the paradigm of TGDS and present a taxonomy of research themes in TGDS. We describe several approaches for integrating domain knowledge in different research themes using illustrative examples from different disciplines. We also highlight some of the promising avenues of novel research for realizing the full potential of theory-guided data science. |
Tasks | |
Published | 2016-12-27 |
URL | http://arxiv.org/abs/1612.08544v2 |
PDF | http://arxiv.org/pdf/1612.08544v2.pdf |
PWC | https://paperswithcode.com/paper/theory-guided-data-science-a-new-paradigm-for |
Repo | |
Framework | |
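One deliberately simple instance of the theory-guided idea, assuming a toy setting: add a domain-consistency penalty to the usual data-fit loss, here forcing a learned temperature-depth profile to be monotonically non-increasing with depth. The constraint, data, and penalty weight stand in for real domain knowledge.

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(1, 32), nn.Tanh(), nn.Linear(32, 1))
opt = torch.optim.Adam(model.parameters(), lr=1e-2)

depth = torch.linspace(0, 1, 50).unsqueeze(1)
temp = 10.0 - 6.0 * depth + 0.5 * torch.randn_like(depth)    # noisy synthetic observations

for step in range(500):
    pred = model(depth)
    data_loss = nn.functional.mse_loss(pred, temp)
    # Theory-guided penalty: in this toy setting temperature must not increase with depth,
    # so positive finite differences of the prediction are penalized.
    violations = torch.relu(pred[1:] - pred[:-1])
    consistency_loss = (violations ** 2).mean()
    loss = data_loss + 10.0 * consistency_loss
    opt.zero_grad(); loss.backward(); opt.step()
```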