Paper Group ANR 197
Papers in this group:
Understanding Community Structure in Layered Neural Networks
Unpaired Brain MR-to-CT Synthesis using a Structure-Constrained CycleGAN
Pushing the boundaries of parallel Deep Learning – A practical approach
Probabilistic approach to limited-data computed tomography reconstruction
Thermal Infrared Colorization via Conditional Generative Adversarial Network
Customized Image Narrative Generation via Interactive Visual Question Generation and Answering
A deep learning pipeline for product recognition on store shelves
Two can play this Game: Visual Dialog with Discriminative Question Generation and Answering
Focus: Querying Large Video Datasets with Low Latency and Low Cost
Autonomous Extraction of a Hierarchical Structure of Tasks in Reinforcement Learning, A Sequential Associate Rule Mining Approach
LCANet: End-to-End Lipreading with Cascaded Attention-CTC
Neural Networks with Structural Resistance to Adversarial Attacks
Active Object Perceiver: Recognition-guided Policy Learning for Object Searching on Mobile Robots
A Deep Learning Approach for Multi-View Engagement Estimation of Children in a Child-Robot Joint Attention task
Generalization in anti-causal learning
Understanding Community Structure in Layered Neural Networks
Title | Understanding Community Structure in Layered Neural Networks |
Authors | Chihiro Watanabe, Kaoru Hiramatsu, Kunio Kashino |
Abstract | A layered neural network is now one of the most common choices for the prediction of high-dimensional practical data sets, where the relationship between input and output data is complex and cannot be represented well by simple conventional models. Its effectiveness has been shown in various tasks; however, the lack of interpretability of the trained result has limited its application area. In our previous studies, we proposed methods for extracting a simplified global structure of a trained layered neural network by classifying the units into communities according to their connection patterns with adjacent layers. These methods provided us with knowledge about the strength of the relationship between communities from the existence of bundled connections, which are determined by threshold processing of the connection ratio between pairs of communities. However, it has been difficult to understand the role of each community quantitatively by observing the modular structure. We could only know to which sets of the input and output dimensions each community was mainly connected, by tracing the bundled connections from the community to the input and output layers. Another problem is that the final modular structure changes greatly depending on the setting of the threshold hyperparameter used for determining bundled connections. In this paper, we propose a new method for quantitatively interpreting the role of each community in inference, by defining the effect of each input dimension on a community, and the effect of a community on each output dimension. We show experimentally that our proposed method can reveal the role of each part of a layered neural network by applying neural networks to three types of data sets, extracting communities from the trained network, and applying the proposed method to the community structure. |
Tasks | |
Published | 2018-04-13 |
URL | http://arxiv.org/abs/1804.04778v1 |
PDF | http://arxiv.org/pdf/1804.04778v1.pdf |
PWC | https://paperswithcode.com/paper/understanding-community-structure-in-layered |
Repo | |
Framework | |
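The abstract defines the input and output "effects" of a community only abstractly. A minimal NumPy sketch of one plausible reading — a hypothetical reconstruction, not the paper's exact definition — measures the effect of input dimension i on community c by the aggregate magnitude of weights from i into c's units, and symmetrically for outputs:

```python
import numpy as np

# Toy 3-layer network: 4 inputs -> 6 hidden units -> 3 outputs.
rng = np.random.default_rng(0)
W_in = rng.normal(size=(4, 6))   # input -> hidden weights
W_out = rng.normal(size=(6, 3))  # hidden -> output weights

# Hypothetical community assignment of the 6 hidden units.
communities = {0: [0, 1, 2], 1: [3, 4, 5]}

def input_effect(W_in, units, i):
    """Aggregate |weight| from input dimension i into a community's units."""
    return np.abs(W_in[i, units]).sum()

def output_effect(W_out, units, k):
    """Aggregate |weight| from a community's units to output dimension k."""
    return np.abs(W_out[units, k]).sum()

for c, units in communities.items():
    in_eff = [input_effect(W_in, units, i) for i in range(W_in.shape[0])]
    out_eff = [output_effect(W_out, units, k) for k in range(W_out.shape[1])]
    print(f"community {c}: input effects {np.round(in_eff, 2)}, "
          f"output effects {np.round(out_eff, 2)}")
```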
Unpaired Brain MR-to-CT Synthesis using a Structure-Constrained CycleGAN
Title | Unpaired Brain MR-to-CT Synthesis using a Structure-Constrained CycleGAN |
Authors | Heran Yang, Jian Sun, Aaron Carass, Can Zhao, Junghoon Lee, Zongben Xu, Jerry Prince |
Abstract | The cycleGAN is becoming an influential method in medical image synthesis. However, due to a lack of direct constraints between input and synthetic images, the cycleGAN cannot guarantee structural consistency between these two images, and such consistency is of extreme importance in medical imaging. To overcome this, we propose a structure-constrained cycleGAN for brain MR-to-CT synthesis using unpaired data that defines an extra structure-consistency loss based on the modality-independent neighborhood descriptor (MIND) to constrain structural consistency. Additionally, we use a position-based selection strategy for selecting training images instead of a completely random selection scheme. Experimental results on synthesizing CT images from brain MR images demonstrate that our method is better than the conventional cycleGAN and approximates the cycleGAN trained with paired data. |
Tasks | Image Generation |
Published | 2018-09-12 |
URL | http://arxiv.org/abs/1809.04536v1 |
PDF | http://arxiv.org/pdf/1809.04536v1.pdf |
PWC | https://paperswithcode.com/paper/unpaired-brain-mr-to-ct-synthesis-using-a |
Repo | |
Framework | |
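A sketch of the structure-consistency idea from the abstract above, in PyTorch. The real MIND descriptor uses Gaussian-weighted patch distances; this simplified stand-in compares each pixel to shifted copies of the image, and the offsets, sigma, and loss weight are illustrative assumptions:

```python
import torch
import torch.nn.functional as F

def mind_descriptor(img, offsets=((0, 1), (1, 0), (0, -1), (-1, 0)), sigma=0.5):
    """Simplified MIND: per-pixel similarity to shifted copies of the image.

    img: (B, 1, H, W) tensor. Returns (B, len(offsets), H, W).
    """
    feats = []
    for dy, dx in offsets:
        shifted = torch.roll(img, shifts=(dy, dx), dims=(2, 3))
        feats.append(torch.exp(-((img - shifted) ** 2) / (2 * sigma ** 2)))
    return torch.cat(feats, dim=1)

def structure_consistency_loss(real_mr, fake_ct):
    """L1 distance between MIND descriptors of the input and synthetic image."""
    return F.l1_loss(mind_descriptor(real_mr), mind_descriptor(fake_ct))

# Usage: add this term, with some weight, to the usual cycleGAN objective.
mr = torch.rand(2, 1, 64, 64)
ct = torch.rand(2, 1, 64, 64)   # stand-in for the generator output G(mr)
print(structure_consistency_loss(mr, ct).item())
```

The point of comparing descriptors rather than intensities is that MR and CT have entirely different intensity profiles, while local structure (edges, boundaries) should be preserved across the mapping.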
Pushing the boundaries of parallel Deep Learning – A practical approach
Title | Pushing the boundaries of parallel Deep Learning – A practical approach |
Authors | Paolo Viviani, Maurizio Drocco, Marco Aldinucci |
Abstract | This work aims to assess the state of the art of data-parallel deep neural network training, trying to identify potential research tracks to be exploited for performance improvement. Besides, it presents a design for a practical C++ library dedicated to implementing and unifying the current state-of-the-art methodologies for parallel training in a performance-conscious framework, allowing users to explore novel strategies without departing significantly from their usual workflow. |
Tasks | |
Published | 2018-06-25 |
URL | http://arxiv.org/abs/1806.09528v1 |
PDF | http://arxiv.org/pdf/1806.09528v1.pdf |
PWC | https://paperswithcode.com/paper/pushing-the-boundaries-of-parallel-deep |
Repo | |
Framework | |
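The core pattern this line of work builds on is synchronous data-parallel training: each worker computes a gradient on its own data shard, and an allreduce averages the gradients so every worker applies the same update. A self-contained NumPy simulation of that loop on a toy linear regression (the sharding and learning rate are arbitrary choices, not the paper's setup):

```python
import numpy as np

# Synchronous data-parallel SGD on a toy linear regression, simulating
# N workers that each compute a gradient on their own shard and then
# average it (the "allreduce" step real libraries run over MPI/NCCL).
rng = np.random.default_rng(0)
X = rng.normal(size=(1024, 8))
true_w = rng.normal(size=8)
y = X @ true_w + 0.01 * rng.normal(size=1024)

n_workers, lr = 4, 0.1
shards = np.array_split(np.arange(len(X)), n_workers)
w = np.zeros(8)

for step in range(200):
    # Each worker computes the gradient of 0.5*||Xw - y||^2 on its shard.
    grads = []
    for idx in shards:
        err = X[idx] @ w - y[idx]
        grads.append(X[idx].T @ err / len(idx))
    # Allreduce: average the gradients; every worker applies the same update.
    w -= lr * np.mean(grads, axis=0)

print("parameter error:", np.linalg.norm(w - true_w))
```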
Probabilistic approach to limited-data computed tomography reconstruction
Title | Probabilistic approach to limited-data computed tomography reconstruction |
Authors | Zenith Purisha, Carl Jidling, Niklas Wahlström, Simo Särkkä, Thomas B. Schön |
Abstract | In this work, we consider the inverse problem of reconstructing the internal structure of an object from limited x-ray projections. We use a Gaussian process prior to model the target function and estimate its (hyper)parameters from measured data. In contrast to other established methods, this comes with the advantage of not requiring any manual parameter tuning, which usually arises in classical regularization strategies. Our method uses a basis function expansion technique for the Gaussian process which significantly reduces the computational complexity and avoids the need for numerical integration. The approach also allows for reformulating some classical regularization methods, such as Laplacian and Tikhonov regularization, as Gaussian process regression, and hence provides an efficient algorithm and principled means for their parameter tuning. Results from simulated and real data indicate that this approach is less sensitive to streak artifacts as compared to the commonly used method of filtered backprojection. |
Tasks | |
Published | 2018-09-11 |
URL | https://arxiv.org/abs/1809.03779v3 |
PDF | https://arxiv.org/pdf/1809.03779v3.pdf |
PWC | https://paperswithcode.com/paper/probabilistic-approach-to-limited-data |
Repo | |
Framework | |
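A 1D NumPy sketch of the basis function expansion the abstract refers to: approximate a squared-exponential GP on [-L, L] with Laplacian eigenfunctions so the posterior becomes a small m×m linear solve instead of an O(n³) kernel inversion. This shows the reduced-rank technique only; the paper composes it with the x-ray projection operator, which is omitted here, and all hyperparameter values are placeholders:

```python
import numpy as np

L, m = 2.0, 32                          # domain half-width, number of basis funcs
sigma_f, ell, sigma_n = 1.0, 0.3, 0.05  # GP and noise hyperparameters

j = np.arange(1, m + 1)
lam = (np.pi * j / (2 * L)) ** 2        # Laplacian eigenvalues

def phi(x):
    """Eigenfunctions of the Laplacian on [-L, L] with Dirichlet conditions."""
    return np.sqrt(1 / L) * np.sin(np.outer(x + L, np.pi * j / (2 * L)))

def spectral_density(w):
    """Spectral density of the squared-exponential kernel."""
    return sigma_f**2 * np.sqrt(2 * np.pi) * ell * np.exp(-0.5 * (w * ell) ** 2)

rng = np.random.default_rng(1)
x = rng.uniform(-L, L, 40)
y = np.sin(3 * x) + sigma_n * rng.normal(size=40)

Phi, S = phi(x), spectral_density(np.sqrt(lam))
A = Phi.T @ Phi + sigma_n**2 * np.diag(1 / S)    # m x m, not n x n
mean_w = np.linalg.solve(A, Phi.T @ y)           # posterior mean of weights

x_test = np.linspace(-L, L, 5)
print(phi(x_test) @ mean_w)                      # posterior mean at test points
```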
Thermal Infrared Colorization via Conditional Generative Adversarial Network
Title | Thermal Infrared Colorization via Conditional Generative Adversarial Network |
Authors | Xiaodong Kuang, Xiubao Sui, Chengwei Liu, Yuan Liu, Qian Chen, Guohua Gu |
Abstract | Transforming a thermal infrared image into a realistic RGB image is a challenging task. In this paper we propose a deep learning method to bridge this gap. We propose learning the transformation mapping using a coarse-to-fine generator that preserves the details. Since the standard mean squared loss cannot penalize the distance between colorized and ground truth images well, we propose a composite loss function that combines content, adversarial, perceptual and total variation losses. The content loss is used to recover global image information while the latter three losses are used to synthesize local realistic textures. Quantitative and qualitative experiments demonstrate that our approach significantly outperforms existing approaches. |
Tasks | Colorization |
Published | 2018-10-12 |
URL | http://arxiv.org/abs/1810.05399v2 |
PDF | http://arxiv.org/pdf/1810.05399v2.pdf |
PWC | https://paperswithcode.com/paper/thermal-infrared-colorization-via-conditional |
Repo | |
Framework | |
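A PyTorch sketch of the composite loss described in the abstract above. The loss weights, the choice of L1 for the content term, and the toy stand-in for the perceptual feature extractor are assumptions, not the paper's values:

```python
import torch
import torch.nn.functional as F

def tv_loss(img):
    """Total variation: penalizes high-frequency noise in the output."""
    dh = (img[:, :, 1:, :] - img[:, :, :-1, :]).abs().mean()
    dw = (img[:, :, :, 1:] - img[:, :, :, :-1]).abs().mean()
    return dh + dw

def composite_loss(fake_rgb, real_rgb, d_fake_logits, perceptual_fn,
                   w_content=1.0, w_adv=1e-3, w_perc=1e-1, w_tv=1e-4):
    """Weighted sum of content, adversarial, perceptual and TV terms.

    d_fake_logits: discriminator output on the colorized image;
    perceptual_fn: any fixed feature extractor (e.g. VGG activations).
    """
    content = F.l1_loss(fake_rgb, real_rgb)
    adv = F.binary_cross_entropy_with_logits(
        d_fake_logits, torch.ones_like(d_fake_logits))
    perc = F.mse_loss(perceptual_fn(fake_rgb), perceptual_fn(real_rgb))
    return (w_content * content + w_adv * adv +
            w_perc * perc + w_tv * tv_loss(fake_rgb))

# Usage with stand-ins for the generator/discriminator outputs:
fake = torch.rand(2, 3, 32, 32)
real = torch.rand(2, 3, 32, 32)
logits = torch.randn(2, 1)
feat = lambda x: F.avg_pool2d(x, 4)   # toy stand-in for VGG features
print(composite_loss(fake, real, logits, feat).item())
```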
Customized Image Narrative Generation via Interactive Visual Question Generation and Answering
Title | Customized Image Narrative Generation via Interactive Visual Question Generation and Answering |
Authors | Andrew Shin, Yoshitaka Ushiku, Tatsuya Harada |
Abstract | The image description task has invariably been examined in a static manner, with qualitative presumptions held to be universally applicable, regardless of the scope or target of the description. In practice, however, different viewers may pay attention to different aspects of the image, and yield different descriptions or interpretations under various contexts. Such diversity in perspectives is difficult to derive with conventional image description techniques. In this paper, we propose a customized image narrative generation task, in which the users are interactively engaged in the generation process by providing answers to the questions. We further attempt to learn the user’s interest via repeating such interactive stages, and to automatically reflect the interest in descriptions for new images. Experimental results demonstrate that our model can generate a variety of descriptions from a single image that cover a wider range of topics than conventional models, while being customizable to the target user of interaction. |
Tasks | Question Generation |
Published | 2018-04-27 |
URL | http://arxiv.org/abs/1805.00460v1 |
PDF | http://arxiv.org/pdf/1805.00460v1.pdf |
PWC | https://paperswithcode.com/paper/customized-image-narrative-generation-via |
Repo | |
Framework | |
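A toy sketch of the interactive loop described above: a question-generation model proposes questions about an image, the user's answers update a running interest profile, and that profile biases descriptions of new images. All models and topic names here are hypothetical stand-ins, not the paper's networks:

```python
from collections import Counter

def generate_question(image, topic):          # stand-in for the VQG model
    return f"Do you care about the {topic} in {image}?"

def describe(image, interests, k=2):          # stand-in for the captioner
    top = [t for t, _ in interests.most_common(k)]
    return f"A description of {image} focusing on: {', '.join(top)}."

interests = Counter()
simulated_answers = {"people": "yes", "animals": "no",
                     "background": "no", "colors": "yes"}

for topic, answer in simulated_answers.items():   # interactive Q&A rounds
    print(generate_question("image_001.jpg", topic), "->", answer)
    if answer == "yes":
        interests[topic] += 1

# Interests learned on one image carry over to descriptions of new images.
print(describe("image_002.jpg", interests))
```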
A deep learning pipeline for product recognition on store shelves
Title | A deep learning pipeline for product recognition on store shelves |
Authors | Alessio Tonioni, Eugenio Serra, Luigi Di Stefano |
Abstract | Recognition of grocery products in store shelves poses peculiar challenges. Firstly, the task mandates the recognition of an extremely high number of different items, in the order of several thousands for medium-small shops, with many of them featuring small inter- and intra-class variability. Then, available product databases usually include just one or a few studio-quality images per product (referred to herein as reference images), whilst at test time recognition is performed on pictures displaying a portion of a shelf containing several products and taken in the store by cheap cameras (referred to as query images). Moreover, as the items on sale in a store as well as their appearance change frequently over time, a practical recognition system should seamlessly handle new products/packages. Inspired by recent advances in object detection and image retrieval, we propose to leverage state-of-the-art object detectors based on deep learning to obtain an initial product-agnostic item detection. Then, we pursue product recognition through a similarity search between global descriptors computed on reference and cropped query images. To maximize performance, we learn an ad-hoc global descriptor by a CNN trained on reference images based on an image embedding loss. Our system is computationally expensive at training time but can perform recognition rapidly and accurately at test time. |
Tasks | Image Retrieval, Object Detection |
Published | 2018-10-03 |
URL | http://arxiv.org/abs/1810.01733v3 |
PDF | http://arxiv.org/pdf/1810.01733v3.pdf |
PWC | https://paperswithcode.com/paper/a-deep-learning-pipeline-for-product |
Repo | |
Framework | |
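The recognition-by-retrieval step the abstract describes reduces to a nearest-neighbor search in embedding space: embed each detected crop and each reference image with the same CNN, then rank references by cosine similarity. A NumPy sketch with random vectors standing in for the learned descriptors:

```python
import numpy as np

rng = np.random.default_rng(0)
n_products, dim = 1000, 128
reference = rng.normal(size=(n_products, dim))   # one descriptor per catalog item
reference /= np.linalg.norm(reference, axis=1, keepdims=True)

def recognize(crop_embedding, reference, k=5):
    """Return indices and scores of the k most similar reference products."""
    q = crop_embedding / np.linalg.norm(crop_embedding)
    sims = reference @ q                          # cosine similarity
    order = np.argsort(-sims)[:k]
    return order, sims[order]

crop = rng.normal(size=dim)   # stand-in for the detector's cropped query
ids, scores = recognize(crop, reference)
print(list(zip(ids, np.round(scores, 3))))
```

Because new products only require adding a reference embedding to the index, this design handles catalog churn without retraining the detector.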
Two can play this Game: Visual Dialog with Discriminative Question Generation and Answering
Title | Two can play this Game: Visual Dialog with Discriminative Question Generation and Answering |
Authors | Unnat Jain, Svetlana Lazebnik, Alexander Schwing |
Abstract | Human conversation is a complex mechanism with subtle nuances. It is hence an ambitious goal to develop artificial intelligence agents that can participate fluently in a conversation. While we are still far from achieving this goal, recent progress in visual question answering, image captioning, and visual question generation shows that dialog systems may be realizable in the not too distant future. To this end, a novel dataset was introduced recently and encouraging results were demonstrated, particularly for question answering. In this paper, we demonstrate a simple symmetric discriminative baseline that can be applied to predicting both an answer and a question. We show that this method performs on par with the state of the art, including memory-net-based methods. In addition, for the first time on the visual dialog dataset, we assess the performance of a system asking questions, and demonstrate how visual dialog can be generated from discriminative question generation and question answering. |
Tasks | Image Captioning, Question Answering, Question Generation, Visual Dialog, Visual Question Answering |
Published | 2018-03-29 |
URL | http://arxiv.org/abs/1803.11186v1 |
PDF | http://arxiv.org/pdf/1803.11186v1.pdf |
PWC | https://paperswithcode.com/paper/two-can-play-this-game-visual-dialog-with |
Repo | |
Framework | |
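The "discriminative" framing above means scoring a fixed candidate set rather than generating free-form text: fuse the image, caption, and dialog history into one embedding, score each candidate answer (or, symmetrically, each candidate question) against it, and pick the top-ranked one. A NumPy sketch with random stand-ins for the encoders:

```python
import numpy as np

rng = np.random.default_rng(0)
dim, n_candidates = 64, 100

context = rng.normal(size=dim)                     # fused image+history embedding
candidates = rng.normal(size=(n_candidates, dim))  # candidate answer embeddings

def rank_candidates(context, candidates):
    """Dot-product scores; training pushes the ground truth toward rank 1."""
    scores = candidates @ context
    return np.argsort(-scores)

order = rank_candidates(context, candidates)
print("top-5 candidates:", order[:5])
```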
Focus: Querying Large Video Datasets with Low Latency and Low Cost
Title | Focus: Querying Large Video Datasets with Low Latency and Low Cost |
Authors | Kevin Hsieh, Ganesh Ananthanarayanan, Peter Bodik, Paramvir Bahl, Matthai Philipose, Phillip B. Gibbons, Onur Mutlu |
Abstract | Large volumes of videos are continuously recorded from cameras deployed for traffic control and surveillance with the goal of answering “after the fact” queries: identify video frames with objects of certain classes (cars, bags) from many days of recorded video. While advancements in convolutional neural networks (CNNs) have enabled answering such queries with high accuracy, they are too expensive and slow. We build Focus, a system for low-latency and low-cost querying on large video datasets. Focus uses cheap ingestion techniques to index the videos by the objects occurring in them. At ingest-time, it uses compression and video-specific specialization of CNNs. Focus handles the lower accuracy of the cheap CNNs by judiciously leveraging expensive CNNs at query-time. To reduce query time latency, we cluster similar objects and hence avoid redundant processing. Using experiments on video streams from traffic, surveillance and news channels, we see that Focus uses 58X fewer GPU cycles than running expensive ingest processors and is 37X faster than processing all the video at query time. |
Tasks | |
Published | 2018-01-10 |
URL | http://arxiv.org/abs/1801.03493v1 |
PDF | http://arxiv.org/pdf/1801.03493v1.pdf |
PWC | https://paperswithcode.com/paper/focus-querying-large-video-datasets-with-low |
Repo | |
Framework | |
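A stripped-down Python sketch of the ingest-time/query-time split described above: a cheap model indexes every detected object by its top-k likely classes, and at query time only the indexed hits are verified with the expensive model. (Focus additionally clusters similar objects so the expensive CNN runs once per cluster; that step is omitted here, and both models are random stand-ins.)

```python
import numpy as np

rng = np.random.default_rng(0)
n_objects, n_classes, k = 10_000, 20, 4

# Ingest time: cheap CNN class probabilities -> top-k index per object.
cheap_probs = rng.dirichlet(np.ones(n_classes), size=n_objects)
topk_index = np.argsort(-cheap_probs, axis=1)[:, :k]

def query(target_class, expensive_model):
    """Objects whose cheap top-k contains the class, confirmed at query time."""
    hits = np.where((topk_index == target_class).any(axis=1))[0]
    return [i for i in hits if expensive_model(i) == target_class]

# Stand-in expensive model: agrees with the cheap model's argmax.
expensive = lambda i: int(np.argmax(cheap_probs[i]))
print(len(query(3, expensive)), "objects matched class 3")
```

Using top-k rather than top-1 at ingest is what lets the cheap CNN be inaccurate: the correct class only has to appear somewhere in the shortlist for the expensive model to recover it.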
Autonomous Extraction of a Hierarchical Structure of Tasks in Reinforcement Learning, A Sequential Associate Rule Mining Approach
Title | Autonomous Extraction of a Hierarchical Structure of Tasks in Reinforcement Learning, A Sequential Associate Rule Mining Approach |
Authors | Behzad Ghazanfari, Fatemeh Afghah, Matthew E. Taylor |
Abstract | Reinforcement learning (RL) techniques, while often powerful, can suffer from slow learning speeds, particularly in high dimensional spaces. Decomposition of tasks into a hierarchical structure holds the potential to significantly speed up learning, generalization, and transfer learning. However, the current task decomposition techniques often rely on high-level knowledge provided by an expert (e.g. using dynamic Bayesian networks) to extract a hierarchical task structure; which is not necessarily available in autonomous systems. In this paper, we propose a novel method based on Sequential Association Rule Mining that can extract Hierarchical Structure of Tasks in Reinforcement Learning (SARM-HSTRL) in an autonomous manner for both Markov decision processes (MDPs) and factored MDPs. The proposed method leverages association rule mining to discover the causal and temporal relationships among states in different trajectories, and extracts a task hierarchy that captures these relationships among sub-goals as termination conditions of different sub-tasks. We prove that the extracted hierarchical policy offers a hierarchically optimal policy in MDPs and factored MDPs. It should be noted that SARM-HSTRL extracts this hierarchically optimal policy without requiring dynamic Bayesian networks, in scenarios with a single task trajectory as well as with multiple tasks’ trajectories. Furthermore, it has been theoretically and empirically shown that the extracted hierarchical task structure is consistent with trajectories and provides the most efficient, reliable, and compact structure under appropriate assumptions. The numerical results compare the performance of the proposed SARM-HSTRL method with conventional HRL algorithms in terms of the accuracy in detecting the sub-goals, the validity of the extracted hierarchies, and the speed of learning in several testbeds. |
Tasks | Transfer Learning |
Published | 2018-11-17 |
URL | http://arxiv.org/abs/1811.08275v1 |
PDF | http://arxiv.org/pdf/1811.08275v1.pdf |
PWC | https://paperswithcode.com/paper/autonomous-extraction-of-a-hierarchical |
Repo | |
Framework | |
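A toy version of the core idea above: mine state subsequences that recur across successful trajectories; frequent subsequences are sub-goal candidates, and their orderings suggest the task hierarchy. Real SARM-HSTRL is considerably more elaborate; this sketch only counts ordered state pairs on hypothetical trajectories:

```python
from collections import Counter
from itertools import combinations

trajectories = [            # hypothetical successful state sequences
    ["start", "key", "door", "goal"],
    ["start", "detour", "key", "door", "goal"],
    ["start", "key", "detour", "door", "goal"],
]

pair_counts = Counter()
for traj in trajectories:
    # combinations() over the ordered list yields only (earlier, later)
    # pairs, preserving the temporal relationship between states.
    for a, b in combinations(traj, 2):
        pair_counts[(a, b)] += 1

min_support = len(trajectories)   # keep orderings true in every trajectory
rules = [(a, b) for (a, b), c in pair_counts.items() if c >= min_support]
print("candidate sub-goal orderings:", rules)
```

Here "detour" fails the support threshold while "key before door before goal" survives, which is exactly the kind of termination-condition ordering a task hierarchy would be built from.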
LCANet: End-to-End Lipreading with Cascaded Attention-CTC
Title | LCANet: End-to-End Lipreading with Cascaded Attention-CTC |
Authors | Kai Xu, Dawei Li, Nick Cassimatis, Xiaolong Wang |
Abstract | Machine lipreading is a special type of automatic speech recognition (ASR) which transcribes human speech by visually interpreting the movement of related face regions including lips, face, and tongue. Recently, deep neural network based lipreading methods show great potential and have exceeded the accuracy of experienced human lipreaders on some benchmark datasets. However, lipreading is still far from being solved, and existing methods tend to have high error rates on in-the-wild data. In this paper, we propose LCANet, an end-to-end deep neural network based lipreading system. LCANet encodes input video frames using a stacked 3D convolutional neural network (CNN), highway network and bidirectional GRU network. The encoder effectively captures both short-term and long-term spatio-temporal information. More importantly, LCANet incorporates a cascaded attention-CTC decoder to generate output texts. By cascading CTC with attention, it partially eliminates the defect of the conditional independence assumption of CTC within the hidden neural layers, and this yields a notable performance improvement as well as faster convergence. The experimental results show the proposed system achieves a 1.3% CER and 3.0% WER on the GRID corpus database, leading to a 12.3% improvement compared to the state-of-the-art methods. |
Tasks | Lipreading, Speech Recognition |
Published | 2018-03-13 |
URL | http://arxiv.org/abs/1803.04988v1 |
PDF | http://arxiv.org/pdf/1803.04988v1.pdf |
PWC | https://paperswithcode.com/paper/lcanet-end-to-end-lipreading-with-cascaded |
Repo | |
Framework | |
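LCANet cascades an attention decoder on top of CTC inside the network; a closely related and widely used formulation — shown here as an approximation, not the paper's exact decoder — trains the encoder with a weighted joint CTC/attention objective. A PyTorch sketch with random stand-ins for the encoder and decoder outputs:

```python
import torch
import torch.nn.functional as F

T, B, C, U = 50, 2, 28, 10            # time steps, batch, classes, target len
log_probs = F.log_softmax(torch.randn(T, B, C), dim=-1)  # encoder output
targets = torch.randint(1, C, (B, U))                    # class 0 = CTC blank
input_lens = torch.full((B,), T, dtype=torch.long)
target_lens = torch.full((B,), U, dtype=torch.long)

# CTC term: alignment-free, but assumes conditional independence over time.
ctc = F.ctc_loss(log_probs, targets, input_lens, target_lens, blank=0)

# Attention term: per-step cross-entropy from a (stand-in) attention decoder,
# which models dependencies between output characters that CTC ignores.
att_logits = torch.randn(B, U, C)
att = F.cross_entropy(att_logits.reshape(-1, C), targets.reshape(-1))

lam = 0.8                             # interpolation weight (hyperparameter)
loss = lam * ctc + (1 - lam) * att
print(loss.item())
```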
Neural Networks with Structural Resistance to Adversarial Attacks
Title | Neural Networks with Structural Resistance to Adversarial Attacks |
Authors | Luca de Alfaro |
Abstract | In adversarial attacks to machine-learning classifiers, small perturbations are added to input that is correctly classified. The perturbations yield adversarial examples, which are virtually indistinguishable from the unperturbed input, and yet are misclassified. In standard neural networks used for deep learning, attackers can craft adversarial examples from most inputs to cause a misclassification of their choice. We introduce a new type of network units, called RBFI units, whose non-linear structure makes them inherently resistant to adversarial attacks. On permutation-invariant MNIST, in the absence of adversarial attacks, networks using RBFI units match the performance of networks using sigmoid units, and are slightly below the accuracy of networks with ReLU units. When subjected to adversarial attacks, networks with RBFI units retain accuracies above 90% for attacks that degrade the accuracy of networks with ReLU or sigmoid units to below 2%. RBFI networks trained with regular input are superior in their resistance to adversarial attacks even to ReLU and sigmoid networks trained with the help of adversarial examples. The non-linear structure of RBFI units makes them difficult to train using standard gradient descent. We show that networks of RBFI units can be efficiently trained to high accuracies using pseudogradients, computed using functions especially crafted to facilitate learning instead of their true derivatives. We show that the use of pseudogradients makes training deep RBFI networks practical, and we compare several structural alternatives of RBFI networks for their accuracy. |
Tasks | |
Published | 2018-09-25 |
URL | http://arxiv.org/abs/1809.09262v1 |
PDF | http://arxiv.org/pdf/1809.09262v1.pdf |
PWC | https://paperswithcode.com/paper/neural-networks-with-structural-resistance-to |
Repo | |
Framework | |
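A NumPy sketch of what an RBFI-style unit and its pseudogradient could look like — an illustrative reconstruction under the assumption that the unit is an infinity-norm radial basis function, y = exp(-max_i (u_i (x_i - w_i))^2); the paper's exact forms may differ. The max routes the true gradient to a single coordinate, which stalls learning, so the pseudogradient backpropagates through the mean instead, giving every weight a signal:

```python
import numpy as np

def rbfi_forward(x, w, u):
    """Infinity-norm RBF unit: y = exp(-max_i (u_i * (x_i - w_i))^2)."""
    z = (u * (x - w)) ** 2
    return np.exp(-np.max(z))

def rbfi_pseudograd_w(x, w, u, y):
    """Pseudogradient of y w.r.t. w: treat max as mean in the backward pass.

    d/dw_i of mean_j z_j = -2 u_i^2 (x_i - w_i) / n, chained with
    d exp(-s)/ds = -y, giving y * 2 u_i^2 (x_i - w_i) / n.
    """
    return y * 2 * u**2 * (x - w) / len(x)

rng = np.random.default_rng(0)
x, w, u = rng.normal(size=5), rng.normal(size=5), np.ones(5)
y = rbfi_forward(x, w, u)
print("output:", y)
print("pseudogradient:", rbfi_pseudograd_w(x, w, u, y))
```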
Active Object Perceiver: Recognition-guided Policy Learning for Object Searching on Mobile Robots
Title | Active Object Perceiver: Recognition-guided Policy Learning for Object Searching on Mobile Robots |
Authors | Xin Ye, Zhe Lin, Haoxiang Li, Shibin Zheng, Yezhou Yang |
Abstract | We study the problem of learning a navigation policy for a robot to actively search for an object of interest in an indoor environment solely from its visual inputs. While scene-driven visual navigation has been widely studied, prior efforts on learning navigation policies for robots to find objects are limited. The problem is often more challenging than target scene finding as the target objects can be very small in the view and can be in an arbitrary pose. We approach the problem from an active perceiver perspective, and propose a novel framework that integrates a deep neural network based object recognition module and a deep reinforcement learning based action prediction mechanism. To validate our method, we conduct experiments on both a simulation dataset (AI2-THOR) and a real-world environment with a physical robot. We further propose a new decaying reward function to learn the control policy specific to the object searching task. Experimental results validate the efficacy of our method, which outperforms competing methods in both average trajectory length and success rate. |
Tasks | Object Recognition, Visual Navigation |
Published | 2018-07-30 |
URL | http://arxiv.org/abs/1807.11174v1 |
PDF | http://arxiv.org/pdf/1807.11174v1.pdf |
PWC | https://paperswithcode.com/paper/active-object-perceiver-recognition-guided |
Repo | |
Framework | |
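A tiny sketch of the decaying-reward idea mentioned above: a success found earlier in the episode earns more, pressuring the policy toward short search trajectories. The functional form and constants are illustrative assumptions, not the paper's values:

```python
def decaying_reward(found, step, r_max=10.0, gamma=0.95, step_penalty=-0.01):
    """Reward shrinks geometrically with the step at which the object is found;
    every other step pays a small penalty to discourage wandering."""
    return r_max * (gamma ** step) if found else step_penalty

# The same success is worth less the longer the agent takes to reach it:
print([round(decaying_reward(True, t), 3) for t in (0, 10, 50)])
```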
A Deep Learning Approach for Multi-View Engagement Estimation of Children in a Child-Robot Joint Attention task
Title | A Deep Learning Approach for Multi-View Engagement Estimation of Children in a Child-Robot Joint Attention task |
Authors | Jack Hadfield, Georgia Chalvatzaki, Petros Koutras, Mehdi Khamassi, Costas S. Tzafestas, Petros Maragos |
Abstract | In this work we tackle the problem of child engagement estimation while children freely interact with a robot in their room. We propose a deep-based multi-view solution that takes advantage of recent developments in human pose detection. We extract the child’s pose from different RGB-D cameras placed elegantly in the room, fuse the results and feed them to a deep neural network trained for classifying engagement levels. The deep network contains a recurrent layer, in order to exploit the rich temporal information contained in the pose data. The resulting method outperforms a number of baseline classifiers, and provides a promising tool for better automatic understanding of a child’s attitude, interest and attention while cooperating with a robot. The goal is to integrate this model in next generation social robots as an attention monitoring tool during various CRI tasks both for Typically Developed (TD) children and children affected by autism (ASD). |
Tasks | |
Published | 2018-12-01 |
URL | http://arxiv.org/abs/1812.00253v1 |
PDF | http://arxiv.org/pdf/1812.00253v1.pdf |
PWC | https://paperswithcode.com/paper/a-deep-learning-approach-for-multi-view |
Repo | |
Framework | |
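A minimal PyTorch stand-in for the pipeline described above: fused multi-view pose sequences pass through a recurrent layer, and the final hidden state is classified into engagement levels. The layer sizes, number of views, and the fusion step (simple feature concatenation) are assumptions, not the paper's architecture:

```python
import torch
import torch.nn as nn

class EngagementNet(nn.Module):
    def __init__(self, pose_dim=3 * 15, n_views=2, hidden=64, n_levels=3):
        super().__init__()
        # pose_dim = 15 joints x 3D coords; views are concatenated per frame.
        self.gru = nn.GRU(pose_dim * n_views, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_levels)

    def forward(self, pose_seq):          # (batch, time, n_views * pose_dim)
        _, h = self.gru(pose_seq)         # final hidden state summarizes clip
        return self.head(h[-1])           # engagement-level logits

net = EngagementNet()
clip = torch.randn(4, 30, 2 * 45)         # 4 clips, 30 frames, 2 fused views
print(net(clip).shape)                    # -> torch.Size([4, 3])
```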
Generalization in anti-causal learning
Title | Generalization in anti-causal learning |
Authors | Niki Kilbertus, Giambattista Parascandolo, Bernhard Schölkopf |
Abstract | The ability to learn and act in novel situations is still a prerogative of animate intelligence, as current machine learning methods mostly fail when moving beyond the standard i.i.d. setting. What is the reason for this discrepancy? Most machine learning tasks are anti-causal, i.e., we infer causes (labels) from effects (observations). Typically, in supervised learning we build systems that try to directly invert causal mechanisms. Instead, in this paper we argue that strong generalization capabilities crucially hinge on searching and validating meaningful hypotheses, requiring access to a causal model. In such a framework, we want to find a cause that leads to the observed effect. Anti-causal models are used to drive this search, but a causal model is required for validation. We investigate the fundamental differences between causal and anti-causal tasks, discuss implications for topics ranging from adversarial attacks to disentangling factors of variation, and provide extensive evidence from the literature to substantiate our view. We advocate for incorporating causal models in supervised learning to shift the paradigm from inference only, to search and validation. |
Tasks | |
Published | 2018-12-03 |
URL | http://arxiv.org/abs/1812.00524v1 |
PDF | http://arxiv.org/pdf/1812.00524v1.pdf |
PWC | https://paperswithcode.com/paper/generalization-in-anti-causal-learning |
Repo | |
Framework | |