April 2, 2020

2974 words 14 mins read

Paper Group ANR 139



Deep Learning for Biomedical Image Reconstruction: A Survey

Title Deep Learning for Biomedical Image Reconstruction: A Survey
Authors Hanene Ben Yedder, Ben Cardoen, Ghassan Hamarneh
Abstract Medical imaging is an invaluable resource in medicine as it enables us to peer inside the human body and provides scientists and physicians with a wealth of information indispensable for understanding, modelling, diagnosis, and treatment of diseases. Reconstruction algorithms entail transforming signals collected by acquisition hardware into interpretable images. Reconstruction is a challenging task given the ill-posed nature of the problem and the absence of exact analytic inverse transforms in practical cases. While recent decades witnessed impressive advancements in terms of new modalities, improved temporal and spatial resolution, reduced cost, and wider applicability, several improvements can still be envisioned, such as reducing acquisition and reconstruction time to lessen the patient’s exposure to radiation and discomfort while increasing clinic throughput and reconstruction accuracy. Furthermore, the deployment of biomedical imaging in handheld devices with limited power requires a fine balance between accuracy and latency.
Tasks Image Reconstruction
Published 2020-02-26
URL https://arxiv.org/abs/2002.12351v1
PDF https://arxiv.org/pdf/2002.12351v1.pdf
PWC https://paperswithcode.com/paper/deep-learning-for-biomedical-image
Repo
Framework

Sequence-to-Sequence Imputation of Missing Sensor Data

Title Sequence-to-Sequence Imputation of Missing Sensor Data
Authors Joel Janek Dabrowski, Ashfaqur Rahman
Abstract Although the sequence-to-sequence (encoder-decoder) model is considered the state of the art in deep learning sequence models, there is little research into using this model for recovering missing sensor data. The key challenge is that the missing sensor data problem typically comprises three sequences (a sequence of observed samples, followed by a sequence of missing samples, followed by another sequence of observed samples), whereas the sequence-to-sequence model only considers two sequences (an input sequence and an output sequence). We address this problem by formulating the sequence-to-sequence model in a novel way. A forward RNN encodes the data observed before the missing sequence and a backward RNN encodes the data observed after the missing sequence. A decoder combines the outputs of the two encoders in a novel way to predict the missing data. We demonstrate that this model produces the lowest errors in 12% more cases than the current state of the art.
Tasks Imputation
Published 2020-02-25
URL https://arxiv.org/abs/2002.10767v1
PDF https://arxiv.org/pdf/2002.10767v1.pdf
PWC https://paperswithcode.com/paper/sequence-to-sequence-imputation-of-missing
Repo
Framework
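The forward-encoder/backward-encoder idea in this abstract can be sketched with plain NumPy. This is a minimal illustration of the data flow only, with randomly initialised weights and a simple linear decoder; the paper's actual decoder combines the two encodings in a more elaborate way, and all sizes below are made up.

```python
import numpy as np

rng = np.random.default_rng(0)

def rnn_encode(x, Wx, Wh):
    """Run a plain tanh RNN over x of shape (T, d); return the final hidden state."""
    h = np.zeros(Wh.shape[0])
    for t in range(x.shape[0]):
        h = np.tanh(Wx @ x[t] + Wh @ h)
    return h

d, hid, gap = 1, 8, 5                 # feature dim, hidden size, missing-gap length
Wx_f, Wh_f = rng.normal(size=(hid, d)), rng.normal(size=(hid, hid))
Wx_b, Wh_b = rng.normal(size=(hid, d)), rng.normal(size=(hid, hid))
W_dec = rng.normal(size=(gap * d, 2 * hid))   # maps the joined states to the gap

before = rng.normal(size=(10, d))     # sequence observed before the gap
after = rng.normal(size=(10, d))      # sequence observed after the gap

h_fwd = rnn_encode(before, Wx_f, Wh_f)        # forward encoder
h_bwd = rnn_encode(after[::-1], Wx_b, Wh_b)   # backward encoder (reversed input)
imputed = (W_dec @ np.concatenate([h_fwd, h_bwd])).reshape(gap, d)
print(imputed.shape)                  # one prediction per missing sample
```

In a trained model the weights would of course be learned end-to-end against the known gaps of the training data.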

Human Action Performance using Deep Neuro-Fuzzy Recurrent Attention Model

Title Human Action Performance using Deep Neuro-Fuzzy Recurrent Attention Model
Authors Nihar Bendre, Nima Ebadi, John J Prevost, Paul Rad
Abstract A great number of computer vision publications have focused on recognizing and classifying human actions rather than on the intensity with which the actions are performed. Indexing the intensity, which characterizes the performance of human actions, is a challenging task due to the uncertainty and information deficiency in the video inputs. To remedy this uncertainty, in this paper we couple fuzzy-logic rules with a neural action-recognition model to rate the intensity of a human action as intense or mild. In our approach, we use a spatio-temporal LSTM to generate the weights of the fuzzy-logic model, and then demonstrate through experiments that indexing of the action intensity is possible. We analyzed the integrated model by applying it to videos of human actions with different action intensities and were able to achieve an accuracy of 89.16% on our generated intensity-indexing dataset. The integrated model demonstrates the ability of a neuro-fuzzy inference module to effectively estimate the intensity index of human actions.
Tasks Temporal Action Localization
Published 2020-01-29
URL https://arxiv.org/abs/2001.10953v3
PDF https://arxiv.org/pdf/2001.10953v3.pdf
PWC https://paperswithcode.com/paper/human-action-performance-using-deep-neuro
Repo
Framework

Understanding the Decision Boundary of Deep Neural Networks: An Empirical Study

Title Understanding the Decision Boundary of Deep Neural Networks: An Empirical Study
Authors David Mickisch, Felix Assion, Florens Greßner, Wiebke Günther, Mariele Motta
Abstract Despite achieving remarkable performance on many image classification tasks, state-of-the-art machine learning (ML) classifiers remain vulnerable to small input perturbations. In particular, the existence of adversarial examples raises concerns about the deployment of ML models in safety- and security-critical environments, such as autonomous driving and disease detection. Over the last few years, numerous defense methods have been published with the goal of improving adversarial as well as corruption robustness. However, the proposed measures have succeeded only to a very limited extent. This limited progress is partly due to the lack of understanding of the decision boundary and decision regions of deep neural networks. Therefore, we study the minimum distance of data points to the decision boundary and how this margin evolves over the training of a deep neural network. By conducting experiments on MNIST, FASHION-MNIST, and CIFAR-10, we observe that the decision boundary moves closer to natural images over training. This phenomenon even remains intact in the late epochs of training, where the classifier already obtains low training and test error rates. On the other hand, adversarial training appears to have the potential to prevent this undesired convergence of the decision boundary.
Tasks Autonomous Driving, Image Classification
Published 2020-02-05
URL https://arxiv.org/abs/2002.01810v1
PDF https://arxiv.org/pdf/2002.01810v1.pdf
PWC https://paperswithcode.com/paper/understanding-the-decision-boundary-of-deep
Repo
Framework
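The quantity studied here, the distance of a point to the decision boundary, can be estimated for any black-box classifier by bisecting along a direction until the predicted label flips. The sketch below assumes a toy linear "network" (not the paper's models) precisely because its true margin |w·x + b| / ‖w‖ is known, so the search can be checked against it.

```python
import numpy as np

# Toy binary classifier: linear, so the exact distance of a point to the
# decision boundary is |w.x + b| / ||w|| and we can verify the search.
w, b = np.array([3.0, 4.0]), 1.0
predict = lambda x: np.sign(w @ x + b)

def boundary_distance(x, direction, hi=100.0, iters=60):
    """Bisect along `direction` for the smallest step where the label flips."""
    direction = direction / np.linalg.norm(direction)
    lo, label = 0.0, predict(x)
    for _ in range(iters):
        mid = (lo + hi) / 2
        if predict(x + mid * direction) == label:
            lo = mid          # still on the original side
        else:
            hi = mid          # crossed the boundary
    return hi

x = np.array([2.0, 1.0])
# Searching along the (known) normal direction recovers the exact margin:
d = boundary_distance(x, -np.sign(w @ x + b) * w)
print(d, abs(w @ x + b) / np.linalg.norm(w))
```

For a deep network the normal direction is unknown, so one would search along, e.g., an adversarial-attack direction instead, which yields an upper bound on the true margin.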

A deep network for sinogram and CT image reconstruction

Title A deep network for sinogram and CT image reconstruction
Authors Wei Wang, Xiang-Gen Xia, Chuanjiang He, Zemin Ren, Jian Lu, Tianfu Wang, Baiying Lei
Abstract A CT image can be well reconstructed when the sampling rate of the sinogram satisfies the Nyquist criterion and the sampled signal is noise-free. However, in practice, the sinogram is usually contaminated by noise, which degrades the quality of a reconstructed CT image. In this paper, we design a deep network for sinogram and CT image reconstruction. The network consists of two cascaded blocks linked by a filtered backprojection (FBP) layer, where the former block denoises and completes the sinograms while the latter removes the noise and artifacts from the CT images. Experimental results show that the CT images reconstructed by our method have the highest PSNR and SSIM on average compared to state-of-the-art methods.
Tasks Denoising, Image Reconstruction
Published 2020-01-20
URL https://arxiv.org/abs/2001.07150v1
PDF https://arxiv.org/pdf/2001.07150v1.pdf
PWC https://paperswithcode.com/paper/a-deep-network-for-sinogram-and-ct-image
Repo
Framework

Visual Question Answering for Cultural Heritage

Title Visual Question Answering for Cultural Heritage
Authors Pietro Bongini, Federico Becattini, Andrew D. Bagdanov, Alberto Del Bimbo
Abstract Technology and the enjoyment of cultural heritage are becoming increasingly more entwined, especially with the advent of smart audio guides, virtual and augmented reality, and interactive installations. Machine learning and computer vision are important components of this ongoing integration, enabling new interaction modalities between user and museum. Nonetheless, the most frequent way of interacting with paintings and statues still remains taking pictures. Yet images alone can only convey the aesthetics of the artwork, lacking the information that is often required to fully understand and appreciate it. Usually this additional knowledge comes both from the artwork itself (and therefore the image depicting it) and from an external source of knowledge, such as an information sheet. While the former can be inferred by computer vision algorithms, the latter needs more structured data to pair visual content with relevant information. Regardless of its source, this information still must be effectively transmitted to the user. A popular emerging trend in computer vision is Visual Question Answering (VQA), in which users can interact with a neural network by posing questions in natural language and receiving answers about the visual content. We believe that this will be the evolution of smart audio guides for museum visits and simple image browsing on personal smartphones. This will turn the classic audio guide into a smart personal instructor with which the visitor can interact by asking for explanations focused on specific interests. The advantages are twofold: on the one hand, the cognitive burden on the visitor will decrease, limiting the flow of information to what the user actually wants to hear; on the other hand, it offers the most natural way of interacting with a guide, favoring engagement.
Tasks Question Answering, Visual Question Answering
Published 2020-03-22
URL https://arxiv.org/abs/2003.09853v1
PDF https://arxiv.org/pdf/2003.09853v1.pdf
PWC https://paperswithcode.com/paper/visual-question-answering-for-cultural
Repo
Framework

Joint Deep Cross-Domain Transfer Learning for Emotion Recognition

Title Joint Deep Cross-Domain Transfer Learning for Emotion Recognition
Authors Dung Nguyen, Sridha Sridharan, Duc Thanh Nguyen, Simon Denman, Son N. Tran, Rui Zeng, Clinton Fookes
Abstract Deep learning has been applied to achieve significant progress in emotion recognition. Despite such substantial progress, existing approaches are still hindered by insufficient training data, and the resulting models do not generalize well under mismatched conditions. To address this challenge, we propose a learning strategy which jointly transfers the knowledge learned from data-rich datasets to data-poor ones. Our method is also able to learn cross-domain features which lead to improved recognition performance. To demonstrate the robustness of our proposed framework, we conducted experiments on three benchmark emotion datasets: eNTERFACE, SAVEE, and EMODB. Experimental results show that the proposed method surpassed state-of-the-art transfer learning schemes by a significant margin.
Tasks Emotion Recognition, Transfer Learning
Published 2020-03-24
URL https://arxiv.org/abs/2003.11136v1
PDF https://arxiv.org/pdf/2003.11136v1.pdf
PWC https://paperswithcode.com/paper/joint-deep-cross-domain-transfer-learning-for
Repo
Framework

EmotiCon: Context-Aware Multimodal Emotion Recognition using Frege’s Principle

Title EmotiCon: Context-Aware Multimodal Emotion Recognition using Frege’s Principle
Authors Trisha Mittal, Pooja Guhan, Uttaran Bhattacharya, Rohan Chandra, Aniket Bera, Dinesh Manocha
Abstract We present EmotiCon, a learning-based algorithm for context-aware perceived human emotion recognition from videos and images. Motivated by Frege’s Context Principle from psychology, our approach combines three interpretations of context for emotion recognition. Our first interpretation is based on using multiple modalities (e.g., faces and gaits) for emotion recognition. For the second interpretation, we gather semantic context from the input image and use a self-attention-based CNN to encode this information. Finally, we use depth maps to model the third interpretation related to socio-dynamic interactions and proximity among agents. We demonstrate the efficiency of our network through experiments on EMOTIC, a benchmark dataset. We report an Average Precision (AP) score of 35.48 across 26 classes, which is an improvement of 7-8 over prior methods. We also introduce a new dataset, GroupWalk, which is a collection of videos captured in multiple real-world settings of people walking. We report an AP of 65.83 across 4 categories on GroupWalk, which is also an improvement over prior methods.
Tasks Emotion Recognition, Multimodal Emotion Recognition
Published 2020-03-14
URL https://arxiv.org/abs/2003.06692v1
PDF https://arxiv.org/pdf/2003.06692v1.pdf
PWC https://paperswithcode.com/paper/emoticon-context-aware-multimodal-emotion
Repo
Framework

Sim2Real Transfer for Reinforcement Learning without Dynamics Randomization

Title Sim2Real Transfer for Reinforcement Learning without Dynamics Randomization
Authors Manuel Kaspar, Juan David Munoz Osorio, Jürgen Bock
Abstract In this work we show how to use the Operational Space Control (OSC) framework under joint and Cartesian constraints for reinforcement learning in Cartesian space. Our method is therefore able to learn quickly and with adjustable degrees of freedom, while we are able to transfer policies without additional dynamics randomization on a KUKA LBR iiwa peg-in-hole task. Before learning in simulation starts, we perform a system identification to align the simulation environment as closely as possible with the dynamics of a real robot. Adding constraints to the OSC controller allows us to learn safely on the real robot, or to learn a flexible, goal-conditioned policy that can be easily transferred from simulation to the real robot.
Tasks
Published 2020-02-19
URL https://arxiv.org/abs/2002.11635v1
PDF https://arxiv.org/pdf/2002.11635v1.pdf
PWC https://paperswithcode.com/paper/sim2real-transfer-for-reinforcement-learning
Repo
Framework

Counterfactual Samples Synthesizing for Robust Visual Question Answering

Title Counterfactual Samples Synthesizing for Robust Visual Question Answering
Authors Long Chen, Xin Yan, Jun Xiao, Hanwang Zhang, Shiliang Pu, Yueting Zhuang
Abstract Although Visual Question Answering (VQA) has achieved impressive progress over the last few years, today’s VQA models tend to capture superficial linguistic correlations in the training set and fail to generalize to test sets with different QA distributions. To reduce these language biases, several recent works introduce an auxiliary question-only model to regularize the training of the targeted VQA model, and achieve dominating performance on VQA-CP. However, due to the complexity of their design, current methods are unable to equip ensemble-based models with two indispensable characteristics of an ideal VQA model: 1) visual-explainable: the model should rely on the right visual regions when making decisions; 2) question-sensitive: the model should be sensitive to linguistic variations in questions. To this end, we propose a model-agnostic Counterfactual Samples Synthesizing (CSS) training scheme. CSS generates numerous counterfactual training samples by masking critical objects in images or words in questions and assigning different ground-truth answers. After training with the complementary samples (i.e., the original and generated samples), the VQA models are forced to focus on all critical objects and words, which significantly improves both the visual-explainable and question-sensitive abilities. In return, the performance of these models is further boosted. Extensive ablations have shown the effectiveness of CSS. In particular, by building on top of the model LMH, we achieve a record-breaking performance of 58.95% on VQA-CP v2, with 6.5% gains.
Tasks Question Answering, Visual Question Answering
Published 2020-03-14
URL https://arxiv.org/abs/2003.06576v1
PDF https://arxiv.org/pdf/2003.06576v1.pdf
PWC https://paperswithcode.com/paper/counterfactual-samples-synthesizing-for
Repo
Framework
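The question-side half of the synthesis can be illustrated with a tiny, hypothetical sketch: mask the most "critical" word and change the ground-truth answer so the model cannot ignore that word. The importance table below is a made-up stand-in; the paper scores criticality with model-based signals, and its reassigned answers are derived from the VQA label space rather than the string used here.

```python
def synthesize_counterfactual(question, answer, importance):
    """Return (masked_question, new_answer) with the most important word masked."""
    words = question.split()
    # Pick the word with the highest (here: hand-specified) importance score.
    idx = max(range(len(words)), key=lambda i: importance.get(words[i], 0.0))
    masked = words.copy()
    masked[idx] = "[MASK]"
    # Counterfactual ground truth: the original answer no longer applies.
    return " ".join(masked), f"not {answer}"

importance = {"color": 0.9, "what": 0.1, "is": 0.0, "the": 0.0, "banana": 0.5}
q_cf, a_cf = synthesize_counterfactual("what color is the banana", "yellow", importance)
print(q_cf)   # the critical word "color" is masked
print(a_cf)   # and the answer is reassigned
```

The image-side counterpart works analogously, masking the critical object regions instead of words.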

Self-Supervised Learning for Domain Adaptation on Point-Clouds

Title Self-Supervised Learning for Domain Adaptation on Point-Clouds
Authors Idan Achituve, Haggai Maron, Gal Chechik
Abstract Self-supervised learning (SSL) makes it possible to learn useful representations from unlabeled data and has been applied effectively for domain adaptation (DA) on images. It is still unknown if and how it can be leveraged for domain adaptation in 3D perception. Here we describe the first study of SSL for DA on point-clouds. We introduce a new pretext task, Region Reconstruction, motivated by the deformations encountered in sim-to-real transformation. We also demonstrate how it can be combined with a training procedure motivated by the MixUp method. Evaluations on six domain adaptations across synthetic and real furniture data demonstrate large improvements over previous work.
Tasks Domain Adaptation
Published 2020-03-29
URL https://arxiv.org/abs/2003.12641v1
PDF https://arxiv.org/pdf/2003.12641v1.pdf
PWC https://paperswithcode.com/paper/self-supervised-learning-for-domain
Repo
Framework
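A region-deformation pretext task of this flavor can be sketched as follows: destroy a local region of the cloud and ask the network to rebuild it. The deformation below (collapsing a random anchor's nearest neighbours onto the anchor) is an illustrative guess at the idea, not necessarily the paper's exact corruption.

```python
import numpy as np

rng = np.random.default_rng(1)

def deform_region(points, k=64):
    """Collapse the k nearest neighbours of a random anchor onto the anchor.

    Returns the deformed cloud, the affected indices, and the original
    region, which serves as the reconstruction target for the pretext task.
    """
    anchor = points[rng.integers(len(points))]
    idx = np.argsort(np.linalg.norm(points - anchor, axis=1))[:k]
    target = points[idx].copy()
    deformed = points.copy()
    deformed[idx] = anchor          # the local structure is destroyed...
    return deformed, idx, target    # ...and the SSL task is to rebuild it

cloud = rng.normal(size=(1024, 3)).astype(np.float32)
deformed, idx, target = deform_region(cloud)
print(deformed.shape, len(idx))
```

A per-point reconstruction loss (e.g. Chamfer distance between the predicted and original region) would then supervise the feature extractor without any labels.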

Recent Advances and Challenges in Task-oriented Dialog System

Title Recent Advances and Challenges in Task-oriented Dialog System
Authors Zheng Zhang, Ryuichi Takanobu, Minlie Huang, Xiaoyan Zhu
Abstract Due to their significance and value in human-computer interaction and natural language processing, task-oriented dialog systems are attracting more and more attention in both the academic and industrial communities. In this paper, we survey recent advances and challenges in an issue-specific manner. We discuss three critical topics for task-oriented dialog systems: (1) improving data efficiency to facilitate dialog system modeling in low-resource settings, (2) modeling multi-turn dynamics for dialog policy learning to achieve better task-completion performance, and (3) integrating domain ontology knowledge into the dialog model in both pipeline and end-to-end models. We also review the recent progress in dialog evaluation and some widely used corpora. We believe that this survey can shed light on future research in task-oriented dialog systems.
Tasks
Published 2020-03-17
URL https://arxiv.org/abs/2003.07490v2
PDF https://arxiv.org/pdf/2003.07490v2.pdf
PWC https://paperswithcode.com/paper/recent-advances-and-challenges-in-task
Repo
Framework

Rethinking Parameter Counting in Deep Models: Effective Dimensionality Revisited

Title Rethinking Parameter Counting in Deep Models: Effective Dimensionality Revisited
Authors Wesley J. Maddox, Gregory Benton, Andrew Gordon Wilson
Abstract Neural networks appear to have mysterious generalization properties when using parameter counting as a proxy for complexity. Indeed, neural networks often have many more parameters than there are data points, yet still provide good generalization performance. Moreover, when we measure generalization as a function of parameters, we see double descent behaviour, where the test error decreases, increases, and then again decreases. We show that many of these properties become understandable when viewed through the lens of effective dimensionality, which measures the dimensionality of the parameter space determined by the data. We relate effective dimensionality to posterior contraction in Bayesian deep learning, model selection, double descent, and functional diversity in loss surfaces, leading to a richer understanding of the interplay between parameters and functions in deep models.
Tasks Model Selection
Published 2020-03-04
URL https://arxiv.org/abs/2003.02139v1
PDF https://arxiv.org/pdf/2003.02139v1.pdf
PWC https://paperswithcode.com/paper/rethinking-parameter-counting-in-deep-models
Repo
Framework
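In this line of work, effective dimensionality is typically computed from the eigenvalues λ_i of the Hessian of the loss as N_eff = Σ_i λ_i / (λ_i + α) for a regularisation constant α, so that directions with curvature well above α each count as roughly one "determined" parameter. The sketch below assumes that definition and uses two made-up spectra.

```python
import numpy as np

def effective_dimensionality(eigenvalues, alpha=1.0):
    """N_eff = sum_i lam_i / (lam_i + alpha): eigenvalues well above the
    regularisation scale alpha contribute ~1 each, tiny ones contribute ~0."""
    lam = np.asarray(eigenvalues, dtype=float)
    return float(np.sum(lam / (lam + alpha)))

# Two Hessian spectra with the same parameter count (100) but very
# different complexity as measured by effective dimensionality:
sharp = [100.0] * 5 + [1e-6] * 95   # function pinned down by only 5 directions
flat = [100.0] * 50 + [1e-6] * 50   # pinned down by 50 directions

ed_sharp = effective_dimensionality(sharp)
ed_flat = effective_dimensionality(flat)
print(ed_sharp, ed_flat)
```

Both models have 100 raw parameters, yet their effective dimensionalities (about 5 and 50) differ by an order of magnitude, which is the sense in which parameter counting is a poor proxy for complexity.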

Benchmarking Symbolic Execution Using Constraint Problems – Initial Results

Title Benchmarking Symbolic Execution Using Constraint Problems – Initial Results
Authors Sahil Verma, Roland H. C. Yap
Abstract Symbolic execution is a powerful technique for bug finding and program testing. It has been successful in finding bugs in real-world code. The core reasoning techniques use constraint solving, path exploration, and search, which are also the techniques used in solving combinatorial problems, e.g., finite-domain constraint satisfaction problems (CSPs). We propose CSP instances as more challenging benchmarks to evaluate the effectiveness of the core techniques in symbolic execution. We transform CSP benchmarks into C programs suitable for testing the reasoning capabilities of symbolic execution tools. From a single CSP P, we derive different C programs depending on the transformation choice. Preliminary testing with the KLEE, Tracer-X, and LLBMC tools shows substantial runtime differences from transformation and solver choice. Our C benchmarks are effective in showing the limitations of existing symbolic execution tools. The motivation for this work is our belief that benchmarks of this form can spur the development and engineering of improved core reasoning in symbolic execution engines.
Tasks
Published 2020-01-22
URL https://arxiv.org/abs/2001.07914v1
PDF https://arxiv.org/pdf/2001.07914v1.pdf
PWC https://paperswithcode.com/paper/benchmarking-symbolic-execution-using
Repo
Framework
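One plausible shape for such a CSP-to-C transformation is to make each variable symbolic, constrain it to its domain, and assert the negation of the constraints, so that reaching the failing assertion requires the engine to solve the CSP. The generator below is purely illustrative and is not the paper's tool; `klee_make_symbolic`, `klee_assume`, and `klee_assert` are real KLEE intrinsics from `klee/klee.h`.

```python
def csp_to_c(variables, domain, constraints):
    """Emit a C program whose assertion fails exactly on CSP solutions,
    so a symbolic executor must solve the CSP to report the failure."""
    lines = ["#include <klee/klee.h>", "", "int main(void) {"]
    for v in variables:
        lines.append(f'  int {v}; klee_make_symbolic(&{v}, sizeof({v}), "{v}");')
        lines.append(f"  klee_assume({v} >= {domain[0]} && {v} <= {domain[1]});")
    conj = " && ".join(constraints)
    lines.append(f"  klee_assert(!({conj}));  /* fails iff a solution exists */")
    lines.append("  return 0;")
    lines.append("}")
    return "\n".join(lines)

# A 2-variable toy CSP: x, y in [0, 3] with x + y == 3 and x != y.
c_src = csp_to_c(["x", "y"], (0, 3), ["x + y == 3", "x != y"])
print(c_src)
```

Different encodings of the same CSP (e.g. branching on constraints one by one instead of one conjunction) yield the different C programs, and hence the different runtimes, that the paper measures.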

Doubly Sparse Variational Gaussian Processes

Title Doubly Sparse Variational Gaussian Processes
Authors Vincent Adam, Stefanos Eleftheriadis, Nicolas Durrande, Artem Artemev, James Hensman
Abstract The use of Gaussian process models is typically limited to datasets with a few tens of thousands of observations due to their complexity and memory footprint. The two most commonly used methods to overcome this limitation are 1) the variational sparse approximation which relies on inducing points and 2) the state-space equivalent formulation of Gaussian processes which can be seen as exploiting some sparsity in the precision matrix. We propose to take the best of both worlds: we show that the inducing point framework is still valid for state space models and that it can bring further computational and memory savings. Furthermore, we provide the natural gradient formulation for the proposed variational parameterisation. Finally, this work makes it possible to use the state-space formulation inside deep Gaussian process models as illustrated in one of the experiments.
Tasks Gaussian Processes
Published 2020-01-15
URL https://arxiv.org/abs/2001.05363v1
PDF https://arxiv.org/pdf/2001.05363v1.pdf
PWC https://paperswithcode.com/paper/doubly-sparse-variational-gaussian-processes
Repo
Framework