Paper Group ANR 78
Deep Supervision with Shape Concepts for Occlusion-Aware 3D Object Parsing. Efficient Nonparametric Smoothness Estimation. Autoencoder-based holographic image restoration. Unsupervised Nonlinear Spectral Unmixing based on a Multilinear Mixing Model. Going Deeper into Action Recognition: A Survey. Efficient Segmental Cascades for Speech Recognition. …
Deep Supervision with Shape Concepts for Occlusion-Aware 3D Object Parsing
Title | Deep Supervision with Shape Concepts for Occlusion-Aware 3D Object Parsing |
Authors | Chi Li, M. Zeeshan Zia, Quoc-Huy Tran, Xiang Yu, Gregory D. Hager, Manmohan Chandraker |
Abstract | Monocular 3D object parsing is highly desirable in various scenarios including occlusion reasoning and holistic scene interpretation. We present a deep convolutional neural network (CNN) architecture to localize semantic parts in 2D image and 3D space while inferring their visibility states, given a single RGB image. Our key insight is to exploit domain knowledge to regularize the network by deeply supervising its hidden layers, in order to sequentially infer intermediate concepts associated with the final task. To acquire training data in desired quantities with ground truth 3D shape and relevant concepts, we render 3D object CAD models to generate large-scale synthetic data and simulate challenging occlusion configurations between objects. We train the network only on synthetic data and demonstrate state-of-the-art performances on real image benchmarks including an extended version of KITTI, PASCAL VOC, PASCAL3D+ and IKEA for 2D and 3D keypoint localization and instance segmentation. The empirical results substantiate the utility of our deep supervision scheme by demonstrating effective transfer of knowledge from synthetic data to real images, resulting in less overfitting compared to standard end-to-end training. |
Tasks | Instance Segmentation, Semantic Segmentation |
Published | 2016-12-08 |
URL | http://arxiv.org/abs/1612.02699v3 |
PDF | http://arxiv.org/pdf/1612.02699v3.pdf |
PWC | https://paperswithcode.com/paper/deep-supervision-with-shape-concepts-for |
Repo | |
Framework | |
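As a rough illustration of the deep-supervision idea in the abstract, the sketch below attaches auxiliary heads to hidden layers of a small CNN and supervises them with intermediate concepts (part visibility, then 2D keypoints) before the final 3D keypoint head, summing the per-concept losses. The architecture, concept ordering, and loss weights are illustrative assumptions, not the paper's network.

```python
import torch
import torch.nn as nn

class DeeplySupervisedNet(nn.Module):
    """CNN with auxiliary heads on hidden layers for intermediate concepts (illustrative)."""
    def __init__(self, num_parts: int = 10):
        super().__init__()
        self.block1 = nn.Sequential(nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
                                    nn.AdaptiveAvgPool2d(8))
        self.block2 = nn.Sequential(nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
                                    nn.AdaptiveAvgPool2d(4))
        self.block3 = nn.Sequential(nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(),
                                    nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.head_visibility = nn.Linear(32 * 8 * 8, num_parts)   # concept 1: part visibility
        self.head_kp2d = nn.Linear(64 * 4 * 4, num_parts * 2)     # concept 2: 2D keypoints
        self.head_kp3d = nn.Linear(128, num_parts * 3)            # final task: 3D keypoints

    def forward(self, x):
        h1 = self.block1(x)
        h2 = self.block2(h1)
        h3 = self.block3(h2)
        return (self.head_visibility(h1.flatten(1)),
                self.head_kp2d(h2.flatten(1)),
                self.head_kp3d(h3))

def deep_supervision_loss(outputs, targets, weights=(0.3, 0.3, 1.0)):
    """Sum of per-concept losses; the intermediate terms regularize the hidden layers."""
    vis_logits, kp2d, kp3d = outputs
    vis_gt, kp2d_gt, kp3d_gt = targets
    return (weights[0] * nn.functional.binary_cross_entropy_with_logits(vis_logits, vis_gt)
            + weights[1] * nn.functional.mse_loss(kp2d, kp2d_gt)
            + weights[2] * nn.functional.mse_loss(kp3d, kp3d_gt))
```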
Efficient Nonparametric Smoothness Estimation
Title | Efficient Nonparametric Smoothness Estimation |
Authors | Shashank Singh, Simon S. Du, Barnabás Póczos |
Abstract | Sobolev quantities (norms, inner products, and distances) of probability density functions are important in the theory of nonparametric statistics, but have rarely been used in practice, partly due to a lack of practical estimators. They also include, as special cases, $L^2$ quantities which are used in many applications. We propose and analyze a family of estimators for Sobolev quantities of unknown probability density functions. We bound the bias and variance of our estimators over finite samples, finding that they are generally minimax rate-optimal. Our estimators are significantly more computationally tractable than previous estimators, and exhibit a statistical/computational trade-off allowing them to adapt to computational constraints. We also draw theoretical connections to recent work on fast two-sample testing. Finally, we empirically validate our estimators on synthetic data. |
Tasks | |
Published | 2016-05-19 |
URL | http://arxiv.org/abs/1605.05785v2 |
PDF | http://arxiv.org/pdf/1605.05785v2.pdf |
PWC | https://paperswithcode.com/paper/efficient-nonparametric-smoothness-estimation |
Repo | |
Framework | |
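The sketch below is a simple plug-in estimator of a Sobolev-type quantity for a density on [0, 1], in the spirit of (but not identical to) the estimators analyzed in the paper: estimate Fourier coefficients from samples, unbias the squared magnitudes with a U-statistic correction, and truncate at a cutoff K that trades statistics against computation. The weighting convention and the cutoff are assumptions made for illustration.

```python
import numpy as np

def sobolev_norm_sq_estimate(x: np.ndarray, s: float = 0.0, K: int = 20) -> float:
    """Estimate sum_{0 < |k| <= K} |2*pi*k|^(2s) |p_hat(k)|^2 from samples x in [0, 1].
    s = 0 gives (the non-constant part of) the squared L^2 norm of the density."""
    n = len(x)
    total = 0.0
    for k in range(1, K + 1):
        phi = np.exp(-2j * np.pi * k * x).mean()              # empirical Fourier coefficient
        # U-statistic correction removes the O(1/n) self-pair bias of |phi|^2.
        coef_sq = (n * n * np.abs(phi) ** 2 - n) / (n * (n - 1))
        total += 2.0 * (2 * np.pi * k) ** (2 * s) * coef_sq   # factor 2 accounts for -k terms
    return total

# Larger K lowers bias but raises variance and cost: the statistical/computational trade-off.
rng = np.random.default_rng(0)
print(sobolev_norm_sq_estimate(rng.beta(2, 5, size=5000), s=0.0, K=30))
```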
Autoencoder-based holographic image restoration
Title | Autoencoder-based holographic image restoration |
Authors | Tomoyoshi Shimobaba, Yutaka Endo, Ryuji Hirayama, Yuki Nagahama, Takayuki Takahashi, Takashi Nishitsuji, Takashi Kakue, Atsushi Shiraki, Naoki Takada, Nobuyuki Masuda, Tomoyoshi Ito |
Abstract | We propose a holographic image restoration method using an autoencoder, which is an artificial neural network. Because holographic reconstructed images are often contaminated by direct light, conjugate light, and speckle noise, the discrimination of reconstructed images may be difficult. In this paper, we demonstrate the restoration of reconstructed images from holograms that record page data in holographic memory and QR codes by using the proposed method. |
Tasks | Image Restoration |
Published | 2016-12-12 |
URL | http://arxiv.org/abs/1612.03959v1 |
PDF | http://arxiv.org/pdf/1612.03959v1.pdf |
PWC | https://paperswithcode.com/paper/autoencoder-based-holographic-image |
Repo | |
Framework | |
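A minimal sketch of the underlying idea, assuming a small convolutional denoising autoencoder trained to map contaminated reconstructions back to clean binary page data; the architecture and the synthetic noise model below stand in for real holographic reconstructions.

```python
import torch
import torch.nn as nn

class DenoisingAutoencoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU())
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(16, 1, 4, stride=2, padding=1), nn.Sigmoid())

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = DenoisingAutoencoder()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for step in range(100):
    clean = (torch.rand(8, 1, 64, 64) > 0.5).float()                # stand-in for page data
    noisy = (clean + 0.5 * torch.randn_like(clean)).clamp(0, 1)     # stand-in for speckle / DC terms
    loss = nn.functional.binary_cross_entropy(model(noisy), clean)  # learn to restore clean pages
    opt.zero_grad(); loss.backward(); opt.step()
```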
Unsupervised Nonlinear Spectral Unmixing based on a Multilinear Mixing Model
Title | Unsupervised Nonlinear Spectral Unmixing based on a Multilinear Mixing Model |
Authors | Qi Wei, Marcus Chen, Jean-Yves Tourneret, Simon Godsill |
Abstract | In the community of remote sensing, nonlinear mixing models have recently received particular attention in hyperspectral image processing. In this paper, we present a novel nonlinear spectral unmixing method following the recent multilinear mixing model of [1], which includes an infinite number of terms related to interactions between different endmembers. The proposed unmixing method is unsupervised in the sense that the endmembers are estimated jointly with the abundances and other parameters of interest, i.e., the transition probability of undergoing further interactions. Non-negativity and sum-to-one constraints are imposed on abundances while only non-negativity is considered for endmembers. The resulting unmixing problem is formulated as a constrained nonlinear optimization problem, which is solved by a block coordinate descent strategy, consisting of updating the endmembers, abundances and transition probability iteratively. The proposed method is evaluated and compared with linear unmixing methods for synthetic and real hyperspectral datasets acquired by the AVIRIS sensor. The advantage of using non-linear unmixing as opposed to linear unmixing is clearly shown in these examples. |
Tasks | |
Published | 2016-04-14 |
URL | http://arxiv.org/abs/1604.04293v1 |
PDF | http://arxiv.org/pdf/1604.04293v1.pdf |
PWC | https://paperswithcode.com/paper/unsupervised-nonlinear-spectral-unmixing |
Repo | |
Framework | |
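The following sketch shows the block coordinate descent structure described in the abstract: projected updates of nonnegative endmembers, simplex-constrained abundances, and a per-pixel interaction probability. The forward model is a plain linear mixture used as a placeholder; the paper's multilinear model of [1] would replace `reconstruct`, and the `P` update is omitted for the same reason.

```python
import numpy as np

def project_simplex(a):
    """Euclidean projection of each column of `a` onto the probability simplex."""
    u = np.sort(a, axis=0)[::-1]
    css = np.cumsum(u, axis=0) - 1
    idx = np.arange(1, a.shape[0] + 1)[:, None]
    rho = np.max(np.where(u - css / idx > 0, idx, 0), axis=0)
    theta = css[rho - 1, np.arange(a.shape[1])] / rho
    return np.maximum(a - theta, 0)

def reconstruct(E, A, P):
    return E @ A   # placeholder: plain linear mixing; the multilinear model of [1] would go here

def unmix(Y, R, iters=200, lr=1e-3, seed=0):
    L, N = Y.shape                                        # L bands, N pixels, R endmembers
    rng = np.random.default_rng(seed)
    E = np.abs(rng.normal(size=(L, R)))                   # nonnegative endmembers
    A = project_simplex(np.abs(rng.normal(size=(R, N))))  # abundances on the simplex
    P = np.full(N, 0.1)                                   # per-pixel interaction probability
    for _ in range(iters):
        Res = reconstruct(E, A, P) - Y
        E = np.maximum(E - lr * Res @ A.T, 0)             # endmember block (projected gradient)
        A = project_simplex(A - lr * E.T @ Res)           # abundance block (projected gradient)
        # The P block would use the gradient of the multilinear residual; omitted here.
    return E, A, P
```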
Going Deeper into Action Recognition: A Survey
Title | Going Deeper into Action Recognition: A Survey |
Authors | Samitha Herath, Mehrtash Harandi, Fatih Porikli |
Abstract | Understanding human actions in visual data is tied to advances in complementary research areas including object recognition, human dynamics, domain adaptation and semantic segmentation. Over the last decade, human action analysis evolved from earlier schemes that are often limited to controlled environments to nowadays advanced solutions that can learn from millions of videos and apply to almost all daily activities. Given the broad range of applications from video surveillance to human-computer interaction, scientific milestones in action recognition are achieved more rapidly, eventually leading to the demise of what used to be good in a short time. This motivated us to provide a comprehensive review of the notable steps taken towards recognizing human actions. To this end, we start our discussion with the pioneering methods that use handcrafted representations, and then, navigate into the realm of deep learning based approaches. We aim to remain objective throughout this survey, touching upon encouraging improvements as well as inevitable fallbacks, in the hope of raising fresh questions and motivating new research directions for the reader. |
Tasks | Domain Adaptation, Human Dynamics, Object Recognition, Semantic Segmentation, Temporal Action Localization |
Published | 2016-05-16 |
URL | http://arxiv.org/abs/1605.04988v2 |
PDF | http://arxiv.org/pdf/1605.04988v2.pdf |
PWC | https://paperswithcode.com/paper/going-deeper-into-action-recognition-a-survey |
Repo | |
Framework | |
Efficient Segmental Cascades for Speech Recognition
Title | Efficient Segmental Cascades for Speech Recognition |
Authors | Hao Tang, Weiran Wang, Kevin Gimpel, Karen Livescu |
Abstract | Discriminative segmental models offer a way to incorporate flexible feature functions into speech recognition. However, their appeal has been limited by their computational requirements, due to the large number of possible segments to consider. Multi-pass cascades of segmental models introduce features of increasing complexity in different passes, where in each pass a segmental model rescores lattices produced by a previous (simpler) segmental model. In this paper, we explore several ways of making segmental cascades efficient and practical: reducing the feature set in the first pass, frame subsampling, and various pruning approaches. In experiments on phonetic recognition, we find that with a combination of such techniques, it is possible to maintain competitive performance while greatly reducing decoding, pruning, and training time. |
Tasks | Speech Recognition |
Published | 2016-08-02 |
URL | http://arxiv.org/abs/1608.00929v1 |
PDF | http://arxiv.org/pdf/1608.00929v1.pdf |
PWC | https://paperswithcode.com/paper/efficient-segmental-cascades-for-speech |
Repo | |
Framework | |
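A toy illustration of the multi-pass idea, assuming n-best rescoring rather than lattice rescoring for brevity: a cheap first-pass scorer prunes candidates to a small beam, and an expensive second-pass scorer is applied only to the survivors. The candidate strings and both scorers are placeholders, not the paper's segmental models.

```python
from typing import Callable, List, Tuple

def cascade_decode(candidates: List[str],
                   cheap_score: Callable[[str], float],
                   expensive_score: Callable[[str], float],
                   beam: int = 5) -> Tuple[str, float]:
    # Pass 1: score everything with the cheap model, prune to the beam.
    survivors = sorted(candidates, key=cheap_score, reverse=True)[:beam]
    # Pass 2: add the expensive features only for the pruned set.
    rescored = [(c, cheap_score(c) + expensive_score(c)) for c in survivors]
    return max(rescored, key=lambda t: t[1])

# Placeholder candidates (phone sequences) and placeholder scorers.
cands = ["sil k ae t sil", "sil k ah t sil", "sil g ae t sil"]
print(cascade_decode(cands,
                     cheap_score=lambda s: -len(s),
                     expensive_score=lambda s: s.count("ae") * 2.0))
```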
Active Detection and Localization of Textureless Objects in Cluttered Environments
Title | Active Detection and Localization of Textureless Objects in Cluttered Environments |
Authors | Marco Imperoli, Alberto Pretto |
Abstract | This paper introduces an active object detection and localization framework that combines a robust untextured object detection and 3D pose estimation algorithm with a novel next-best-view selection strategy. We address the detection and localization problems by proposing an edge-based registration algorithm that refines the object position by minimizing a cost directly extracted from a 3D image tensor that encodes the minimum distance to an edge point in a joint direction/location space. We face the next-best-view problem by exploiting a sequential decision process that, for each step, selects the next camera position which maximizes the mutual information between the state and the next observations. We solve the intrinsic intractability of this solution by generating observations that represent scene realizations, i.e. combination samples of object hypotheses provided by the object detector, while modeling the state by means of a set of constantly resampled particles. Experiments performed on different real-world, challenging datasets confirm the effectiveness of the proposed methods. |
Tasks | 3D Pose Estimation, Object Detection, Pose Estimation |
Published | 2016-03-22 |
URL | http://arxiv.org/abs/1603.07022v1 |
PDF | http://arxiv.org/pdf/1603.07022v1.pdf |
PWC | https://paperswithcode.com/paper/active-detection-and-localization-of |
Repo | |
Framework | |
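The sketch below shows one way to realize the mutual-information-based next-best-view selection the abstract describes, assuming a weighted-particle state and a discrete observation model: the chosen view maximizes H(obs) - H(obs | state). The sensor model, particle set, and candidate views are made-up placeholders.

```python
import numpy as np

def entropy(p):
    p = p[p > 0]
    return -np.sum(p * np.log(p))

def mutual_information(weights, obs_given_state):
    """weights: (N,) particle weights; obs_given_state: (N, M), rows are p(o | s_i, view)."""
    p_obs = weights @ obs_given_state                     # marginal observation distribution
    cond = np.sum(weights * np.array([entropy(row) for row in obs_given_state]))
    return entropy(p_obs) - cond                          # I(state; obs) = H(obs) - H(obs | state)

def next_best_view(weights, sensor_model, views):
    """sensor_model(view) -> (N, M) observation distribution for every particle."""
    scores = [mutual_information(weights, sensor_model(v)) for v in views]
    return views[int(np.argmax(scores))]

# Placeholder: 3 pose particles, 4 observation bins, 2 candidate camera positions.
w = np.array([0.5, 0.3, 0.2])
def sensor_model(view):
    rng = np.random.default_rng(view)
    m = rng.random((3, 4))
    return m / m.sum(axis=1, keepdims=True)
print(next_best_view(w, sensor_model, views=[0, 1]))
```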
Mollifying Networks
Title | Mollifying Networks |
Authors | Caglar Gulcehre, Marcin Moczulski, Francesco Visin, Yoshua Bengio |
Abstract | The optimization of deep neural networks can be more challenging than traditional convex optimization problems due to the highly non-convex nature of the loss function, e.g. it can involve pathological landscapes such as saddle-surfaces that can be difficult to escape for algorithms based on simple gradient descent. In this paper, we attack the problem of optimization of highly non-convex neural networks by starting with a smoothed – or *mollified* – objective function that gradually has a more non-convex energy landscape during the training. Our proposition is inspired by the recent studies in continuation methods: similar to curriculum methods, we begin learning an easier (possibly convex) objective function and let it evolve during the training, until it eventually goes back to being the original, difficult to optimize, objective function. The complexity of the mollified networks is controlled by a single hyperparameter which is annealed during the training. We show improvements on various difficult optimization tasks and establish a relationship with recent works on continuation methods for neural networks and mollifiers. |
Tasks | |
Published | 2016-08-17 |
URL | http://arxiv.org/abs/1608.04980v1 |
PDF | http://arxiv.org/pdf/1608.04980v1.pdf |
PWC | https://paperswithcode.com/paper/mollifying-networks |
Repo | |
Framework | |
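One simple way to realize an annealed, mollified objective (not the paper's exact construction) is to average the loss over Gaussian perturbations of the weights and anneal the perturbation scale toward zero, so training ends on the original objective. The sketch below does exactly that, with `sigma` playing the role of the single annealed hyperparameter.

```python
import torch
import torch.nn as nn

def mollified_step(model, loss_fn, x, y, sigma, opt, n_samples=4):
    """One optimizer step on a Monte-Carlo estimate of E_eps[ L(theta + sigma * eps) ]."""
    opt.zero_grad()
    params = list(model.parameters())
    for _ in range(n_samples):
        noises = [sigma * torch.randn_like(p) for p in params]
        with torch.no_grad():
            for p, n in zip(params, noises):
                p.add_(n)                                  # perturb the weights
        loss = loss_fn(model(x), y) / n_samples
        loss.backward()                                    # gradient taken at the perturbed point
        with torch.no_grad():
            for p, n in zip(params, noises):
                p.sub_(n)                                  # restore the weights
    opt.step()

model = nn.Sequential(nn.Linear(10, 32), nn.Tanh(), nn.Linear(32, 1))
opt = torch.optim.SGD(model.parameters(), lr=0.1)
for epoch in range(50):
    sigma = 0.5 * (1.0 - epoch / 50)                       # anneal the smoothing toward zero
    x, y = torch.randn(64, 10), torch.randn(64, 1)
    mollified_step(model, nn.functional.mse_loss, x, y, sigma, opt)
```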
Sequential geophysical and flow inversion to characterize fracture networks in subsurface systems
Title | Sequential geophysical and flow inversion to characterize fracture networks in subsurface systems |
Authors | M. K. Mudunuru, S. Karra, N. Makedonska, T. Chen |
Abstract | Subsurface applications including geothermal, geological carbon sequestration, oil and gas, etc., typically involve maximizing either the extraction of energy or the storage of fluids. Characterizing the subsurface is extremely complex due to heterogeneity and anisotropy. Due to this complexity, there are uncertainties in the subsurface parameters, which need to be estimated from multiple diverse as well as fragmented data streams. In this paper, we present a non-intrusive sequential inversion framework for integrating data from geophysical and flow sources to constrain subsurface Discrete Fracture Networks (DFN). In this approach, we first estimate bounds on the statistics for the DFN fracture orientations using microseismic data. These bounds are estimated through a combination of a focal mechanism (physics-based approach) and clustering analysis (statistical approach) of seismic data. Then, the fracture lengths are constrained based on the flow data. The efficacy of this multi-physics based sequential inversion is demonstrated through a representative synthetic example. |
Tasks | |
Published | 2016-06-14 |
URL | http://arxiv.org/abs/1606.04464v3 |
PDF | http://arxiv.org/pdf/1606.04464v3.pdf |
PWC | https://paperswithcode.com/paper/sequential-geophysical-and-flow-inversion-to |
Repo | |
Framework | |
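As a rough sketch of the statistical half of the first inversion stage, the snippet below clusters hypothetical fracture strike angles (as might be attributed to microseismic events) and reports per-cluster statistics as orientation bounds for a DFN generator; the focal-mechanism analysis and the flow-based length constraint from the paper are not modeled here.

```python
import numpy as np

def kmeans_1d(x, k, iters=50, seed=0):
    """Tiny k-means for 1D data: assign points to nearest center, then recompute centers."""
    rng = np.random.default_rng(seed)
    centers = rng.choice(x, size=k, replace=False)
    for _ in range(iters):
        labels = np.argmin(np.abs(x[:, None] - centers[None, :]), axis=1)
        centers = np.array([x[labels == j].mean() if np.any(labels == j) else centers[j]
                            for j in range(k)])
    return centers, labels

# Hypothetical fracture strike angles (degrees) inferred from microseismic events.
rng = np.random.default_rng(1)
angles = np.concatenate([rng.normal(30, 5, 200), rng.normal(120, 8, 150)])
centers, labels = kmeans_1d(angles, k=2)
for j in range(len(centers)):
    cluster = angles[labels == j]
    lo, hi = cluster.mean() - 2 * cluster.std(), cluster.mean() + 2 * cluster.std()
    print(f"fracture set {j}: mean strike {cluster.mean():.1f} deg, bounds ~ [{lo:.1f}, {hi:.1f}]")
```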
SMASH: Physics-guided Reconstruction of Collisions from Videos
Title | SMASH: Physics-guided Reconstruction of Collisions from Videos |
Authors | Aron Monszpart, Nils Thuerey, Niloy J. Mitra |
Abstract | Collision sequences are commonly used in games and entertainment to add drama and excitement. Authoring even two-body collisions in the real world can be difficult, as one has to get timing and the object trajectories to be correctly synchronized. After tedious trial-and-error iterations, when objects can actually be made to collide, then they are difficult to capture in 3D. In contrast, synthetically generating plausible collisions is difficult as it requires adjusting different collision parameters (e.g., object mass ratio, coefficient of restitution, etc.) and appropriate initial parameters. We present SMASH to read off appropriate collision parameters directly from raw input video recordings. Technically we enable this by utilizing laws of rigid body collision to regularize the problem of lifting 2D trajectories to a physically valid 3D reconstruction of the collision. The reconstructed sequences can then be modified and combined to easily author novel and plausible collisions. We evaluate our system on a range of synthetic scenes and demonstrate the effectiveness of our method by accurately reconstructing several complex real world collision events. |
Tasks | 3D Reconstruction |
Published | 2016-03-29 |
URL | http://arxiv.org/abs/1603.08984v2 |
PDF | http://arxiv.org/pdf/1603.08984v2.pdf |
PWC | https://paperswithcode.com/paper/smash-physics-guided-reconstruction-of |
Repo | |
Framework | |
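A tiny worked example of how rigid-body collision laws constrain parameters recovered from tracked trajectories: with pre- and post-collision velocities fitted from positions, conservation of linear momentum gives the mass ratio, and the relative velocities give the coefficient of restitution. The 1D tracks below are fabricated for illustration; the actual system lifts 2D video tracks to physically valid 3D.

```python
import numpy as np

def fit_velocity(times, positions):
    """Least-squares constant velocity over a short track segment."""
    slope, _ = np.polyfit(times, positions, deg=1)
    return slope

# Fabricated 1D tracks: object 1 hits a resting object 2 at x = 2.0 around t = 1.0 s.
t_pre = np.array([0.0, 0.2, 0.4, 0.6, 0.8])
t_post = np.array([1.2, 1.4, 1.6, 1.8, 2.0])
x1_pre, x1_post = 2.0 * t_pre, 2.0 + 0.5 * (t_post - 1.0)
x2_pre, x2_post = np.full_like(t_pre, 2.0), 2.0 + 1.0 * (t_post - 1.0)

v1, v1p = fit_velocity(t_pre, x1_pre), fit_velocity(t_post, x1_post)
v2, v2p = fit_velocity(t_pre, x2_pre), fit_velocity(t_post, x2_post)

mass_ratio = abs(v1p - v1) / abs(v2p - v2)   # m2 / m1 from conservation of linear momentum
restitution = -(v1p - v2p) / (v1 - v2)       # coefficient of restitution along the contact line
print(f"m2/m1 ~ {mass_ratio:.2f}, e ~ {restitution:.2f}")
```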
Fairness in Learning: Classic and Contextual Bandits
Title | Fairness in Learning: Classic and Contextual Bandits |
Authors | Matthew Joseph, Michael Kearns, Jamie Morgenstern, Aaron Roth |
Abstract | We introduce the study of fairness in multi-armed bandit problems. Our fairness definition can be interpreted as demanding that given a pool of applicants (say, for college admission or mortgages), a worse applicant is never favored over a better one, despite a learning algorithm’s uncertainty over the true payoffs. We prove results of two types. First, in the important special case of the classic stochastic bandits problem (i.e., in which there are no contexts), we provide a provably fair algorithm based on “chained” confidence intervals, and provide a cumulative regret bound with a cubic dependence on the number of arms. We further show that any fair algorithm must have such a dependence. When combined with regret bounds for standard non-fair algorithms such as UCB, this proves a strong separation between fair and unfair learning, which extends to the general contextual case. In the general contextual case, we prove a tight connection between fairness and the KWIK (Knows What It Knows) learning model: a KWIK algorithm for a class of functions can be transformed into a provably fair contextual bandit algorithm, and conversely any fair contextual bandit algorithm can be transformed into a KWIK learning algorithm. This tight connection allows us to provide a provably fair algorithm for the linear contextual bandit problem with a polynomial dependence on the dimension, and to show (for a different class of functions) a worst-case exponential gap in regret between fair and non-fair learning algorithms. |
Tasks | Multi-Armed Bandits |
Published | 2016-05-23 |
URL | http://arxiv.org/abs/1605.07139v2 |
PDF | http://arxiv.org/pdf/1605.07139v2.pdf |
PWC | https://paperswithcode.com/paper/fairness-in-learning-classic-and-contextual |
Repo | |
Framework | |
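A sketch of a fairness-preserving selection rule in the spirit of the abstract, assuming the "chained" confidence-interval idea: arms whose intervals overlap the best arm's interval, directly or transitively, are treated as indistinguishable and played uniformly at random, so no plausibly worse arm is ever favored over a plausibly better one. The confidence-width constant is a placeholder, not the paper's bound.

```python
import numpy as np

def chained_set(lower, upper):
    """Arms whose confidence intervals are chained (transitively overlap) to the top arm."""
    chain = {int(np.argmax(upper))}
    changed = True
    while changed:
        changed = False
        lo = min(lower[i] for i in chain)
        for a in range(len(upper)):
            if a not in chain and upper[a] >= lo:
                chain.add(a)
                changed = True
    return sorted(chain)

rng = np.random.default_rng(0)
true_means, K, T = np.array([0.2, 0.5, 0.55]), 3, 2000
counts, sums = np.ones(K), np.zeros(K)               # counts start at 1 to avoid division by zero
for t in range(1, T + 1):
    means = sums / counts
    width = np.sqrt(2 * np.log(1 + t) / counts)      # placeholder confidence width
    arm = rng.choice(chained_set(means - width, means + width))   # uniform over the chained set
    reward = float(rng.random() < true_means[arm])
    counts[arm] += 1
    sums[arm] += reward
print("pull counts:", counts)
```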
Clinical Text Prediction with Numerically Grounded Conditional Language Models
Title | Clinical Text Prediction with Numerically Grounded Conditional Language Models |
Authors | Georgios P. Spithourakis, Steffen E. Petersen, Sebastian Riedel |
Abstract | Assisted text input techniques can save time and effort and improve text quality. In this paper, we investigate how grounded and conditional extensions to standard neural language models can bring improvements in the tasks of word prediction and completion. These extensions incorporate a structured knowledge base and numerical values from the text into the context used to predict the next word. Our automated evaluation on a clinical dataset shows extended models significantly outperform standard models. Our best system uses both conditioning and grounding, because of their orthogonal benefits. For word prediction with a list of 5 suggestions, it improves recall from 25.03% to 71.28% and for word completion it improves keystroke savings from 34.35% to 44.81%, where the theoretical bound for this dataset is 58.78%. We also perform a qualitative investigation of how models with lower perplexity occasionally fare better at the tasks. We found that at test time numbers have more influence on the document level than on individual word probabilities. |
Tasks | |
Published | 2016-10-20 |
URL | http://arxiv.org/abs/1610.06370v1 |
PDF | http://arxiv.org/pdf/1610.06370v1.pdf |
PWC | https://paperswithcode.com/paper/clinical-text-prediction-with-numerically |
Repo | |
Framework | |
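A minimal sketch of conditioning and grounding a neural language model: the per-step input concatenates the token embedding with a numeric feature parsed from the text (grounding) and an embedding of structured context such as knowledge-base fields (conditioning). Dimensions, features, and the toy batch are assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn

class GroundedLM(nn.Module):
    def __init__(self, vocab_size=1000, kb_dim=16, emb_dim=64, hidden=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.rnn = nn.LSTM(emb_dim + 1 + kb_dim, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab_size)

    def forward(self, tokens, numeric_values, kb_vector):
        """tokens: (B, T) ids; numeric_values: (B, T) floats parsed from the text
        (0 for non-numeric tokens); kb_vector: (B, kb_dim) structured context."""
        B, T = tokens.shape
        emb = self.embed(tokens)                                 # (B, T, emb_dim)
        num = numeric_values.unsqueeze(-1)                       # (B, T, 1): grounding signal
        kb = kb_vector.unsqueeze(1).expand(B, T, -1)             # (B, T, kb_dim): conditioning
        h, _ = self.rnn(torch.cat([emb, num, kb], dim=-1))
        return self.out(h)                                       # per-step next-word logits

# Toy forward pass on a placeholder batch.
model = GroundedLM()
logits = model(torch.randint(0, 1000, (2, 12)), torch.zeros(2, 12), torch.randn(2, 16))
print(logits.shape)   # torch.Size([2, 12, 1000])
```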
Supervised topic models for clinical interpretability
Title | Supervised topic models for clinical interpretability |
Authors | Michael C. Hughes, Huseyin Melih Elibol, Thomas McCoy, Roy Perlis, Finale Doshi-Velez |
Abstract | Supervised topic models can help clinical researchers find interpretable co-occurrence patterns in count data that are relevant for diagnostics. However, standard formulations of supervised Latent Dirichlet Allocation have two problems. First, when documents have many more words than labels, the influence of the labels will be negligible. Second, due to conditional independence assumptions in the graphical model, the impact of supervised labels on the learned topic-word probabilities is often minimal, leading to poor predictions on heldout data. We investigate penalized optimization methods for training sLDA that produce interpretable topic-word parameters and useful heldout predictions, using recognition networks to speed up inference. We report preliminary results on synthetic data and on predicting successful anti-depressant medication given a patient’s diagnostic history. |
Tasks | Topic Models |
Published | 2016-12-06 |
URL | http://arxiv.org/abs/1612.01678v1 |
PDF | http://arxiv.org/pdf/1612.01678v1.pdf |
PWC | https://paperswithcode.com/paper/supervised-topic-models-for-clinical |
Repo | |
Framework | |
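A hedged sketch of the kind of penalized objective the abstract alludes to: a per-document loss that combines word reconstruction under a topic model with a label-prediction term whose weight keeps a single label from being swamped by many words. This is a simplified surrogate (softmax-parameterized proportions, logistic label head), not the paper's sLDA formulation or its recognition-network inference.

```python
import torch
import torch.nn.functional as F

def penalized_doc_loss(word_counts, label, topic_word, theta_logits, label_weights, lam=50.0):
    """word_counts: (V,); label: 0/1; topic_word: (K, V) rows on the simplex;
    theta_logits: (K,) free parameters; label_weights: (K,) logistic weights."""
    theta = F.softmax(theta_logits, dim=0)                  # document-topic proportions
    word_probs = theta @ topic_word                         # (V,) mixture over topics
    nll_words = -(word_counts * torch.log(word_probs + 1e-10)).sum()
    logit = (label_weights * theta).sum()
    nll_label = F.binary_cross_entropy_with_logits(logit, torch.tensor(float(label)))
    return nll_words + lam * nll_label                      # lam keeps the label from being swamped

# Toy usage: V = 6 words, K = 2 topics; optimize the proportions of a single labeled document.
V, K = 6, 2
topic_word = F.softmax(torch.randn(K, V), dim=1)
counts = torch.tensor([3.0, 0.0, 1.0, 0.0, 2.0, 0.0])
theta_logits = torch.zeros(K, requires_grad=True)
label_weights = torch.tensor([2.0, -2.0])
opt = torch.optim.Adam([theta_logits], lr=0.1)
for _ in range(100):
    loss = penalized_doc_loss(counts, 1, topic_word, theta_logits, label_weights)
    opt.zero_grad(); loss.backward(); opt.step()
```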
On Structured Sparsity of Phonological Posteriors for Linguistic Parsing
Title | On Structured Sparsity of Phonological Posteriors for Linguistic Parsing |
Authors | Milos Cernak, Afsaneh Asaei, Hervé Bourlard |
Abstract | The speech signal conveys information on different time scales, from short time scale or segmental, associated with phonological and phonetic information, to long time scale or supra-segmental, associated with syllabic and prosodic information. Linguistic and neurocognitive studies recognize the phonological classes at segmental level as the essential and invariant representations used in speech temporal organization. In the context of speech processing, a deep neural network (DNN) is an effective computational method to infer the probability of individual phonological classes from a short segment of speech signal. A vector of all phonological class probabilities is referred to as phonological posterior. There are only very few classes comprising a short-term speech signal; hence, the phonological posterior is a sparse vector. Although the phonological posteriors are estimated at segmental level, we claim that they convey supra-segmental information. Specifically, we demonstrate that phonological posteriors are indicative of syllabic and prosodic events. Building on findings from converging linguistic evidence on the gestural model of Articulatory Phonology as well as the neural basis of speech perception, we hypothesize that phonological posteriors convey properties of linguistic classes at multiple time scales, and this information is embedded in their support (index) of active coefficients. To verify this hypothesis, we obtain a binary representation of phonological posteriors at the segmental level which is referred to as first-order sparsity structure; the high-order structures are obtained by the concatenation of first-order binary vectors. It is then confirmed that the classification of supra-segmental linguistic events, the problem known as linguistic parsing, can be achieved with high accuracy using a simple binary pattern matching of first-order or high-order structures. |
Tasks | |
Published | 2016-01-21 |
URL | http://arxiv.org/abs/1601.05647v3 |
PDF | http://arxiv.org/pdf/1601.05647v3.pdf |
PWC | https://paperswithcode.com/paper/on-structured-sparsity-of-phonological |
Repo | |
Framework | |
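The sketch below illustrates the binary-structure idea: threshold phonological posteriors into first-order binary support vectors, concatenate consecutive vectors into high-order structures, and classify a supra-segmental event by the nearest stored pattern in Hamming distance. The posterior values, threshold, and templates are illustrative placeholders.

```python
import numpy as np

def first_order(posteriors, thresh=0.1):
    """posteriors: (T, C) per-segment class probabilities -> (T, C) binary support."""
    return (posteriors >= thresh).astype(np.uint8)

def high_order(binary, order=3):
    """Concatenate `order` consecutive first-order vectors into one long binary pattern."""
    T, C = binary.shape
    return np.stack([binary[t:t + order].reshape(-1) for t in range(T - order + 1)])

def classify(pattern, templates, labels):
    """Nearest-template classification by Hamming distance."""
    dists = [(pattern != tpl).sum() for tpl in templates]
    return labels[int(np.argmin(dists))]

# Hypothetical usage with C = 4 phonological classes and two stored prosodic-event templates.
rng = np.random.default_rng(0)
post = rng.dirichlet(np.ones(4) * 0.3, size=10)      # (T=10, C=4) sparse-ish posteriors
patterns = high_order(first_order(post), order=3)
templates = [np.zeros(12, np.uint8), np.ones(12, np.uint8)]
print(classify(patterns[0], templates, labels=["unaccented", "accented"]))
```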
Theory-guided Data Science: A New Paradigm for Scientific Discovery from Data
Title | Theory-guided Data Science: A New Paradigm for Scientific Discovery from Data |
Authors | Anuj Karpatne, Gowtham Atluri, James Faghmous, Michael Steinbach, Arindam Banerjee, Auroop Ganguly, Shashi Shekhar, Nagiza Samatova, Vipin Kumar |
Abstract | Data science models, although successful in a number of commercial domains, have had limited applicability in scientific problems involving complex physical phenomena. Theory-guided data science (TGDS) is an emerging paradigm that aims to leverage the wealth of scientific knowledge for improving the effectiveness of data science models in enabling scientific discovery. The overarching vision of TGDS is to introduce scientific consistency as an essential component for learning generalizable models. Further, by producing scientifically interpretable models, TGDS aims to advance our scientific understanding by discovering novel domain insights. Indeed, the paradigm of TGDS has started to gain prominence in a number of scientific disciplines such as turbulence modeling, material discovery, quantum chemistry, bio-medical science, bio-marker discovery, climate science, and hydrology. In this paper, we formally conceptualize the paradigm of TGDS and present a taxonomy of research themes in TGDS. We describe several approaches for integrating domain knowledge in different research themes using illustrative examples from different disciplines. We also highlight some of the promising avenues of novel research for realizing the full potential of theory-guided data science. |
Tasks | |
Published | 2016-12-27 |
URL | http://arxiv.org/abs/1612.08544v2 |
PDF | http://arxiv.org/pdf/1612.08544v2.pdf |
PWC | https://paperswithcode.com/paper/theory-guided-data-science-a-new-paradigm-for |
Repo | |
Framework | |
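One deliberately simple instance of the theory-guided idea, assuming a toy setting: add a domain-consistency penalty to the usual data-fit loss, here forcing a learned temperature-depth profile to be monotonically non-increasing with depth. The constraint, data, and penalty weight stand in for real domain knowledge.

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(1, 32), nn.Tanh(), nn.Linear(32, 1))
opt = torch.optim.Adam(model.parameters(), lr=1e-2)

depth = torch.linspace(0, 1, 50).unsqueeze(1)
temp = 10.0 - 6.0 * depth + 0.5 * torch.randn_like(depth)    # noisy synthetic observations

for step in range(500):
    pred = model(depth)
    data_loss = nn.functional.mse_loss(pred, temp)
    # Theory-guided penalty: in this toy setting temperature must not increase with depth,
    # so positive finite differences of the prediction are penalized.
    violations = torch.relu(pred[1:] - pred[:-1])
    consistency_loss = (violations ** 2).mean()
    loss = data_loss + 10.0 * consistency_loss
    opt.zero_grad(); loss.backward(); opt.step()
```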