July 30, 2019

Paper Group AWR 75

A Particle Swarm Optimization-based Flexible Convolutional Auto-Encoder for Image Classification


Title	A Particle Swarm Optimization-based Flexible Convolutional Auto-Encoder for Image Classification
Authors	Yanan Sun, Bing Xue, Mengjie Zhang, Gary G. Yen
Abstract	Convolutional auto-encoders have shown remarkable performance when stacked into deep convolutional neural networks for classifying image data during the past several years. However, they are unable to construct the state-of-the-art convolutional neural networks due to their intrinsic architectures. In this regard, we propose a flexible convolutional auto-encoder by eliminating the constraints on the numbers of convolutional layers and pooling layers from the traditional convolutional auto-encoder. We also design an architecture discovery method by using particle swarm optimization, which is capable of automatically searching for the optimal architectures of the proposed flexible convolutional auto-encoder with much less computational resource and without any manual intervention. We use the designed architecture optimization algorithm to test the proposed flexible convolutional auto-encoder through utilizing one graphic processing unit card on four extensively used image classification datasets. Experimental results show that our work in this paper significantly outperforms the peer competitors including the state-of-the-art algorithm.
Tasks	Image Classification
Published	2017-12-13
URL	http://arxiv.org/abs/1712.05042v2
PDF	http://arxiv.org/pdf/1712.05042v2.pdf
PWC	https://paperswithcode.com/paper/a-particle-swarm-optimization-based-flexible
Repo	https://github.com/yn-sun/evocae
Framework	tf
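
The abstract above relies on particle swarm optimization (PSO) as its search procedure. The sketch below is a generic continuous PSO minimizing a toy function, not the paper's architecture search: the function names, the inertia and acceleration coefficients, and the sphere objective are all illustrative assumptions.

```python
import random


def pso_minimize(objective, dim, n_particles=20, n_iters=200, seed=0,
                 inertia=0.7, c_personal=1.5, c_global=1.5, bound=5.0):
    """Generic PSO over a continuous space; all defaults are illustrative."""
    rng = random.Random(seed)
    pos = [[rng.uniform(-bound, bound) for _ in range(dim)]
           for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]              # personal best positions
    best_val = [objective(p) for p in pos]      # personal best values
    g = min(range(n_particles), key=lambda i: best_val[i])
    g_pos, g_val = best_pos[g][:], best_val[g]  # global best so far

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity blends momentum, pull to personal best, pull to global best.
                vel[i][d] = (inertia * vel[i][d]
                             + c_personal * r1 * (best_pos[i][d] - pos[i][d])
                             + c_global * r2 * (g_pos[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            val = objective(pos[i])
            if val < best_val[i]:
                best_val[i], best_pos[i] = val, pos[i][:]
                if val < g_val:
                    g_val, g_pos = val, pos[i][:]
    return g_pos, g_val


def sphere(x):
    """Toy objective with its minimum at the origin."""
    return sum(v * v for v in x)


best_x, best_f = pso_minimize(sphere, dim=2)
```

In an architecture search, each particle would instead encode a candidate network and the objective would be a validation score; that encoding is specific to the paper and is not reproduced here.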

The Space of Transferable Adversarial Examples


Title	The Space of Transferable Adversarial Examples
Authors	Florian Tramèr, Nicolas Papernot, Ian Goodfellow, Dan Boneh, Patrick McDaniel
Abstract	Adversarial examples are maliciously perturbed inputs designed to mislead machine learning (ML) models at test-time. They often transfer: the same adversarial example fools more than one model. In this work, we propose novel methods for estimating the previously unknown dimensionality of the space of adversarial inputs. We find that adversarial examples span a contiguous subspace of large (~25) dimensionality. Adversarial subspaces with higher dimensionality are more likely to intersect. We find that for two different models, a significant fraction of their subspaces is shared, thus enabling transferability. In the first quantitative analysis of the similarity of different models’ decision boundaries, we show that these boundaries are actually close in arbitrary directions, whether adversarial or benign. We conclude by formally studying the limits of transferability. We derive (1) sufficient conditions on the data distribution that imply transferability for simple model classes and (2) examples of scenarios in which transfer does not occur. These findings indicate that it may be possible to design defenses against transfer-based attacks, even for models that are vulnerable to direct attacks.
Tasks
Published	2017-04-11
URL	http://arxiv.org/abs/1704.03453v2
PDF	http://arxiv.org/pdf/1704.03453v2.pdf
PWC	https://paperswithcode.com/paper/the-space-of-transferable-adversarial
Repo	https://github.com/panda1230/Adversarial_NoiseLearning_NoL
Framework	pytorch

Relaxed Oracles for Semi-Supervised Clustering


Title	Relaxed Oracles for Semi-Supervised Clustering
Authors	Taewan Kim, Joydeep Ghosh
Abstract	Pairwise “same-cluster” queries are one of the most widely used forms of supervision in semi-supervised clustering. However, it is impractical to ask human oracles to answer every query correctly. In this paper, we study the influence of allowing “not-sure” answers from a weak oracle and propose an effective algorithm to handle such uncertainties in query responses. Two realistic weak oracle models are considered where ambiguity in answering depends on the distance between two points. We show that a small query complexity is adequate for effective clustering with high probability by providing better pairs to the weak oracle. Experimental results on synthetic and real data show the effectiveness of our approach in overcoming supervision uncertainties and yielding high quality clusters.
Tasks
Published	2017-11-20
URL	http://arxiv.org/abs/1711.07433v1
PDF	http://arxiv.org/pdf/1711.07433v1.pdf
PWC	https://paperswithcode.com/paper/relaxed-oracles-for-semi-supervised
Repo	https://github.com/twankim/weaksemi
Framework	none
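
The idea of a weak oracle whose ambiguity depends on the distance between two points can be sketched as follows. The distance band `[low, high]` and the three answer strings are hypothetical stand-ins chosen for illustration; the abstract does not specify the two oracle models, and this is not either of them.

```python
import math


def weak_oracle(x, y, label_x, label_y, low=1.0, high=2.0):
    """Answer a pairwise same-cluster query, or 'not-sure' when the pair
    falls in an ambiguous distance band (a hypothetical model)."""
    dist = math.dist(x, y)
    if low <= dist <= high:
        return "not-sure"
    return "same" if label_x == label_y else "different"


# Three queries: a close pair, an ambiguous pair, and a distant pair.
answers = [
    weak_oracle((0.0, 0.0), (0.5, 0.0), 0, 0),
    weak_oracle((0.0, 0.0), (1.5, 0.0), 0, 1),
    weak_oracle((0.0, 0.0), (3.0, 0.0), 0, 1),
]
```

A clustering algorithm built on such an oracle has to decide which pairs to ask about, since queries that land in the ambiguous band return no usable supervision.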

Tensor Regression Networks with various Low-Rank Tensor Approximations


Title	Tensor Regression Networks with various Low-Rank Tensor Approximations
Authors	Xingwei Cao, Guillaume Rabusseau
Abstract	Tensor regression networks achieve a high compression rate for neural networks with only a slight impact on performance. They do so by imposing low tensor rank structure on the weight matrices of fully connected layers. In recent years, tensor regression networks have been investigated from the perspective of their compressive power; however, the regularization effect of enforcing low-rank tensor structure has not been investigated enough. We study tensor regression networks using various low-rank tensor approximations, aiming to compare the compressive and regularization power of different low-rank constraints. We evaluate the compressive and regularization performances of the proposed model with both deep and shallow convolutional neural networks. The outcome of our experiment suggests the superiority of the Global Average Pooling Layer over the Tensor Regression Layer when applied to a deep convolutional neural network on the CIFAR-10 dataset. On the contrary, shallow convolutional neural networks with a tensor regression layer and dropout achieved lower test error than both Global Average Pooling and a fully-connected layer with dropout when trained with a small number of samples.
Tasks
Published	2017-12-27
URL	http://arxiv.org/abs/1712.09520v2
PDF	http://arxiv.org/pdf/1712.09520v2.pdf
PWC	https://paperswithcode.com/paper/tensor-regression-networks-with-various-low
Repo	https://github.com/Vixaer/LowRankTRN
Framework	tf
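
To make the compression claim concrete, the arithmetic below compares the parameter count of a dense weight tensor with that of a rank-R CP factorization, which stores one factor matrix of size (dimension x R) per mode. The activation shape, class count, and rank are made-up example values, and CP is only one of the low-rank formats the abstract alludes to.

```python
from math import prod


def full_params(shape):
    """Parameters in a dense weight tensor of the given shape."""
    return prod(shape)


def cp_params(shape, rank):
    """Parameters in a rank-`rank` CP factorization: one (dim x rank) factor per mode."""
    return rank * sum(shape)


# Hypothetical example: an 8 x 8 x 64 activation mapped to 10 outputs.
shape = (8, 8, 64, 10)
dense = full_params(shape)
low_rank = cp_params(shape, rank=5)
compression = dense / low_rank
```

With these example numbers the dense layer has 40960 weights and the rank-5 factorization has 450, which is the kind of gap behind the compression rates mentioned in the abstract.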

Detect to Track and Track to Detect


Title	Detect to Track and Track to Detect
Authors	Christoph Feichtenhofer, Axel Pinz, Andrew Zisserman
Abstract	Recent approaches for high accuracy detection and tracking of object categories in video consist of complex multistage solutions that become more cumbersome each year. In this paper we propose a ConvNet architecture that jointly performs detection and tracking, solving the task in a simple and effective way. Our contributions are threefold: (i) we set up a ConvNet architecture for simultaneous detection and tracking, using a multi-task objective for frame-based object detection and across-frame track regression; (ii) we introduce correlation features that represent object co-occurrences across time to aid the ConvNet during tracking; and (iii) we link the frame level detections based on our across-frame tracklets to produce high accuracy detections at the video level. Our ConvNet architecture for spatiotemporal object detection is evaluated on the large-scale ImageNet VID dataset where it achieves state-of-the-art results. Our approach provides better single model performance than the winning method of the last ImageNet challenge while being conceptually much simpler. Finally, we show that by increasing the temporal stride we can dramatically increase the tracker speed.
Tasks	Object Detection
Published	2017-10-11
URL	http://arxiv.org/abs/1710.03958v2
PDF	http://arxiv.org/pdf/1710.03958v2.pdf
PWC	https://paperswithcode.com/paper/detect-to-track-and-track-to-detect
Repo	https://github.com/feichtenhofer/detect-track
Framework	none

Offline bilingual word vectors, orthogonal transformations and the inverted softmax


Title	Offline bilingual word vectors, orthogonal transformations and the inverted softmax
Authors	Samuel L. Smith, David H. P. Turban, Steven Hamblin, Nils Y. Hammerla
Abstract	Usually bilingual word vectors are trained “online”. Mikolov et al. showed they can also be found “offline”, whereby two pre-trained embeddings are aligned with a linear transformation, using dictionaries compiled from expert knowledge. In this work, we prove that the linear transformation between two spaces should be orthogonal. This transformation can be obtained using the singular value decomposition. We introduce a novel “inverted softmax” for identifying translation pairs, with which we improve the precision @1 of Mikolov’s original mapping from 34% to 43%, when translating a test set composed of both common and rare English words into Italian. Orthogonal transformations are more robust to noise, enabling us to learn the transformation without expert bilingual signal by constructing a “pseudo-dictionary” from the identical character strings which appear in both languages, achieving 40% precision on the same test set. Finally, we extend our method to retrieve the true translations of English sentences from a corpus of 200k Italian sentences with a precision @1 of 68%.
Tasks
Published	2017-02-13
URL	http://arxiv.org/abs/1702.03859v1
PDF	http://arxiv.org/pdf/1702.03859v1.pdf
PWC	https://paperswithcode.com/paper/offline-bilingual-word-vectors-orthogonal
Repo	https://github.com/jiajunhua/facebookresearch-MUSE
Framework	pytorch
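
A minimal sketch of the retrieval step, assuming the inverted softmax normalizes each similarity score over the source words rather than over the target words, so that "hub" target words that are similar to everything are penalized. The similarity matrix and the temperature `beta` are illustrative values; in practice the similarities would come from embeddings aligned by the orthogonal transformation.

```python
import math


def nearest_neighbour(sim):
    """For each source word j, pick the target i with the largest sim[i][j]."""
    n_tgt, n_src = len(sim), len(sim[0])
    return [max(range(n_tgt), key=lambda i: sim[i][j]) for j in range(n_src)]


def inverted_softmax(sim, beta=10.0):
    """Normalize each target row over all source words, then pick the best target.
    sim[i][j] is the similarity between target word i and source word j."""
    n_tgt, n_src = len(sim), len(sim[0])
    row_norm = [sum(math.exp(beta * s) for s in row) for row in sim]
    scores = [[math.exp(beta * sim[i][j]) / row_norm[i] for j in range(n_src)]
              for i in range(n_tgt)]
    return [max(range(n_tgt), key=lambda i: scores[i][j]) for j in range(n_src)]


# Target 0 acts as a hub: it is the nearest neighbour of both source words.
sim = [[0.9, 0.8],
       [0.5, 0.7]]
nn_choice = nearest_neighbour(sim)
inv_choice = inverted_softmax(sim)
```

In this toy case plain nearest-neighbour retrieval maps both source words to target 0, while the inverted softmax assigns the second source word to target 1.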

End-to-end Driving via Conditional Imitation Learning


Title	End-to-end Driving via Conditional Imitation Learning
Authors	Felipe Codevilla, Matthias Müller, Antonio López, Vladlen Koltun, Alexey Dosovitskiy
Abstract	Deep networks trained on demonstrations of human driving have learned to follow roads and avoid obstacles. However, driving policies trained via imitation learning cannot be controlled at test time. A vehicle trained end-to-end to imitate an expert cannot be guided to take a specific turn at an upcoming intersection. This limits the utility of such systems. We propose to condition imitation learning on high-level command input. At test time, the learned driving policy functions as a chauffeur that handles sensorimotor coordination but continues to respond to navigational commands. We evaluate different architectures for conditional imitation learning in vision-based driving. We conduct experiments in realistic three-dimensional simulations of urban driving and on a 1/5 scale robotic truck that is trained to drive in a residential area. Both systems drive based on visual input yet remain responsive to high-level navigational commands. The supplementary video can be viewed at https://youtu.be/cFtnflNe5fM
Tasks	Imitation Learning
Published	2017-10-06
URL	http://arxiv.org/abs/1710.02410v2
PDF	http://arxiv.org/pdf/1710.02410v2.pdf
PWC	https://paperswithcode.com/paper/end-to-end-driving-via-conditional-imitation
Repo	https://github.com/bitsauce/Carla-ppo
Framework	tf

Deep CNN ensembles and suggestive annotations for infant brain MRI segmentation


Title	Deep CNN ensembles and suggestive annotations for infant brain MRI segmentation
Authors	Jose Dolz, Christian Desrosiers, Li Wang, Jing Yuan, Dinggang Shen, Ismail Ben Ayed
Abstract	Precise 3D segmentation of infant brain tissues is an essential step towards comprehensive volumetric studies and quantitative analysis of early brain development. However, computing such segmentations is very challenging, especially for 6-month infant brain, due to the poor image quality, among other difficulties inherent to infant brain MRI, e.g., the isointense contrast between white and gray matter and the severe partial volume effect due to small brain sizes. This study investigates the problem with an ensemble of semi-dense fully convolutional neural networks (CNNs), which employs T1-weighted and T2-weighted MR images as input. We demonstrate that the ensemble agreement is highly correlated with the segmentation errors. Therefore, our method provides measures that can guide local user corrections. To the best of our knowledge, this work is the first ensemble of 3D CNNs for suggesting annotations within images. Furthermore, inspired by the very recent success of dense networks, we propose a novel architecture, SemiDenseNet, which connects all convolutional layers directly to the end of the network. Our architecture allows the efficient propagation of gradients during training, while limiting the number of parameters, requiring one order of magnitude fewer parameters than popular medical image segmentation networks such as 3D U-Net. Another contribution of our work is the study of the impact that early or late fusions of multiple image modalities might have on the performances of deep architectures. We report evaluations of our method on the public data of the MICCAI iSEG-2017 Challenge on 6-month infant brain MRI segmentation, and show very competitive results among 21 teams, ranking first or second in most metrics.
Tasks	Infant Brain MRI Segmentation, Medical Image Segmentation, Semantic Segmentation
Published	2017-12-14
URL	http://arxiv.org/abs/1712.05319v2
PDF	http://arxiv.org/pdf/1712.05319v2.pdf
PWC	https://paperswithcode.com/paper/deep-cnn-ensembles-and-suggestive-annotations
Repo	https://github.com/josedolz/SemiDenseNet
Framework	none

Collaborative Deep Reinforcement Learning


Title	Collaborative Deep Reinforcement Learning
Authors	Kaixiang Lin, Shu Wang, Jiayu Zhou
Abstract	Besides independent learning, the human learning process is highly improved by summarizing what has been learned, communicating it with peers, and subsequently fusing knowledge from different sources to assist the current learning goal. This collaborative learning procedure ensures that the knowledge is shared, continuously refined, and concluded from different perspectives to construct a more profound understanding. The idea of knowledge transfer has led to many advances in machine learning and data mining, but significant challenges remain, especially when it comes to reinforcement learning, heterogeneous model structures, and different learning tasks. Motivated by human collaborative learning, in this paper we propose a collaborative deep reinforcement learning (CDRL) framework that performs adaptive knowledge transfer among heterogeneous learning agents. Specifically, the proposed CDRL conducts a novel deep knowledge distillation method to address the heterogeneity among different learning tasks with a deep alignment network. Furthermore, we present an efficient collaborative Asynchronous Advantage Actor-Critic (cA3C) algorithm to incorporate deep knowledge distillation into the online training of agents, and demonstrate the effectiveness of the CDRL framework using extensive empirical evaluation on OpenAI gym.
Tasks	Transfer Learning
Published	2017-02-19
URL	http://arxiv.org/abs/1702.05796v1
PDF	http://arxiv.org/pdf/1702.05796v1.pdf
PWC	https://paperswithcode.com/paper/collaborative-deep-reinforcement-learning
Repo	https://github.com/illidanlab/cdrl
Framework	tf
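
For readers unfamiliar with knowledge distillation, the sketch below shows the generic form: a student is penalized by the KL divergence between temperature-softened teacher and student output distributions. This is the standard formulation, not the paper's deep alignment network, and the logits and temperature are made-up values.

```python
import math


def softened(logits, temperature):
    """Softmax over logits divided by a temperature (higher = smoother)."""
    m = max(logits)
    e = [math.exp((z - m) / temperature) for z in logits]
    total = sum(e)
    return [v / total for v in e]


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) between the softened distributions."""
    p = softened(teacher_logits, temperature)
    q = softened(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


teacher = [2.0, 0.5, -1.0]
loss_same = distillation_loss(teacher, teacher)
loss_diff = distillation_loss(teacher, [0.0, 0.0, 0.0])
```

The loss is zero when the student matches the teacher and positive otherwise. When agents have different action spaces, as in the heterogeneous setting the abstract describes, the two distributions cannot be compared directly, which is the gap the alignment network is meant to address.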

Dynamic Routing Between Capsules


Title	Dynamic Routing Between Capsules
Authors	Sara Sabour, Nicholas Frosst, Geoffrey E Hinton
Abstract	A capsule is a group of neurons whose activity vector represents the instantiation parameters of a specific type of entity such as an object or an object part. We use the length of the activity vector to represent the probability that the entity exists and its orientation to represent the instantiation parameters. Active capsules at one level make predictions, via transformation matrices, for the instantiation parameters of higher-level capsules. When multiple predictions agree, a higher level capsule becomes active. We show that a discriminatively trained, multi-layer capsule system achieves state-of-the-art performance on MNIST and is considerably better than a convolutional net at recognizing highly overlapping digits. To achieve these results we use an iterative routing-by-agreement mechanism: A lower-level capsule prefers to send its output to higher level capsules whose activity vectors have a big scalar product with the prediction coming from the lower-level capsule.
Tasks	Image Classification
Published	2017-10-26
URL	http://arxiv.org/abs/1710.09829v2
PDF	http://arxiv.org/pdf/1710.09829v2.pdf
PWC	https://paperswithcode.com/paper/dynamic-routing-between-capsules
Repo	https://github.com/Suraj-Panwar/Capsule_Network_based_Deep_Q_learning
Framework	tf
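
The routing-by-agreement mechanism described above can be sketched in plain Python. The `squash` nonlinearity keeps each output vector's length below 1 so it can be read as a probability, and routing logits grow by the scalar product between a prediction and the resulting higher-level output. The prediction vectors and the three-iteration default are illustrative assumptions; the learned transformation matrices that would produce the predictions are omitted.

```python
import math


def squash(s):
    """Shrink vector s so its length lies in [0, 1) while keeping its direction."""
    sq = sum(v * v for v in s)
    if sq == 0.0:
        return [0.0] * len(s)
    scale = sq / (1.0 + sq) / math.sqrt(sq)
    return [scale * v for v in s]


def softmax(xs):
    m = max(xs)
    e = [math.exp(x - m) for x in xs]
    total = sum(e)
    return [v / total for v in e]


def route(u_hat, n_iters=3):
    """u_hat[i][j] is lower capsule i's prediction vector for higher capsule j."""
    n_low, n_high, dim = len(u_hat), len(u_hat[0]), len(u_hat[0][0])
    logits = [[0.0] * n_high for _ in range(n_low)]
    for _ in range(n_iters):
        # Each lower capsule distributes its output across higher capsules.
        coupling = [softmax(row) for row in logits]
        v = []
        for j in range(n_high):
            s = [sum(coupling[i][j] * u_hat[i][j][d] for i in range(n_low))
                 for d in range(dim)]
            v.append(squash(s))
        # Agreement (scalar product) raises the routing logit.
        for i in range(n_low):
            for j in range(n_high):
                logits[i][j] += sum(a * b for a, b in zip(u_hat[i][j], v[j]))
    return v, coupling


# Three lower capsules agree about higher capsule 0 and disagree about capsule 1.
u_hat = [
    [[1.0, 0.0], [1.0, 0.0]],
    [[1.0, 0.0], [-1.0, 0.0]],
    [[1.0, 0.0], [0.0, 1.0]],
]
v, coupling = route(u_hat)
lengths = [math.sqrt(sum(x * x for x in vec)) for vec in v]
```

Because the predictions for capsule 0 agree, its output vector ends up longer and every lower capsule routes more than half of its output to it.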

Leaf Counting with Deep Convolutional and Deconvolutional Networks


Title	Leaf Counting with Deep Convolutional and Deconvolutional Networks
Authors	Shubhra Aich, Ian Stavness
Abstract	In this paper, we investigate the problem of counting rosette leaves from an RGB image, an important task in plant phenotyping. We propose a data-driven approach for this task generalized over different plant species and imaging setups. To accomplish this task, we use state-of-the-art deep learning architectures: a deconvolutional network for initial segmentation and a convolutional network for leaf counting. Evaluation is performed on the leaf counting challenge dataset at CVPPP-2017. Despite the small number of training samples in this dataset, as compared to typical deep learning image sets, we obtain satisfactory performance on segmenting leaves from the background as a whole and counting the number of leaves using simple data augmentation strategies. Comparative analysis is provided against methods evaluated on the previous competition datasets. Our framework achieves mean and standard deviation of absolute count difference of 1.62 and 2.30 averaged over all five test datasets.
Tasks	Data Augmentation
Published	2017-08-24
URL	http://arxiv.org/abs/1708.07570v2
PDF	http://arxiv.org/pdf/1708.07570v2.pdf
PWC	https://paperswithcode.com/paper/leaf-counting-with-deep-convolutional-and
Repo	https://github.com/p2irc/leaf_count_ICCVW-2017
Framework	none
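
The reported metric, the mean and standard deviation of the absolute count difference, can be computed as in the sketch below. The leaf counts are made-up numbers, and the use of the population standard deviation (`pstdev`) rather than the sample version is an assumption, since the abstract does not say which one the challenge uses.

```python
from statistics import mean, pstdev


def abs_count_difference_stats(predicted, actual):
    """Mean and (population) standard deviation of |predicted - actual|."""
    diffs = [abs(p - a) for p, a in zip(predicted, actual)]
    return mean(diffs), pstdev(diffs)


# Hypothetical leaf counts for four plants.
predicted = [5, 7, 6, 9]
actual = [5, 6, 8, 9]
mean_diff, std_diff = abs_count_difference_stats(predicted, actual)
```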

Deep Learning Methods for Improved Decoding of Linear Codes


Title	Deep Learning Methods for Improved Decoding of Linear Codes
Authors	Eliya Nachmani, Elad Marciano, Loren Lugosch, Warren J. Gross, David Burshtein, Yair Beery
Abstract	The problem of low complexity, close to optimal, channel decoding of linear codes with short to moderate block length is considered. It is shown that deep learning methods can be used to improve a standard belief propagation decoder, despite the large example space. Similar improvements are obtained for the min-sum algorithm. It is also shown that tying the parameters of the decoders across iterations, so as to form a recurrent neural network architecture, can be implemented with comparable results. The advantage is that significantly fewer parameters are required. We also introduce a recurrent neural decoder architecture based on the method of successive relaxation. Improvements over standard belief propagation are also observed on sparser Tanner graph representations of the codes. Furthermore, we demonstrate that the neural belief propagation decoder can be used to improve the performance, or alternatively reduce the computational complexity, of a close to optimal decoder of short BCH codes.
Tasks
Published	2017-06-21
URL	http://arxiv.org/abs/1706.07043v2
PDF	http://arxiv.org/pdf/1706.07043v2.pdf
PWC	https://paperswithcode.com/paper/deep-learning-methods-for-improved-decoding
Repo	https://github.com/lorenlugosch/neural-min-sum-decoding
Framework	tf
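
As a rough illustration of where a learned parameter could enter the min-sum algorithm, the sketch below implements a single check-node update with a multiplicative `weight`. Here the weight is a fixed placeholder; how the paper actually parameterizes and trains its decoders is not reproduced, and the log-likelihood ratios are made-up values.

```python
def check_node_update(incoming, weight=1.0):
    """Min-sum check-node update. For each edge, combine all *other* incoming
    log-likelihood ratios: product of their signs times their smallest magnitude,
    scaled by `weight` (where a trainable parameter could go)."""
    outgoing = []
    for k in range(len(incoming)):
        others = incoming[:k] + incoming[k + 1:]
        sign = 1.0
        for v in others:
            if v < 0:
                sign = -sign
        outgoing.append(weight * sign * min(abs(v) for v in others))
    return outgoing


messages = check_node_update([2.0, -0.5, 1.5], weight=0.8)
```

With `weight=1.0` this is the plain min-sum rule; a weight below 1 damps the messages, which compensates for min-sum's tendency to overestimate reliability relative to belief propagation.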

Learning Visual Reasoning Without Strong Priors


Title	Learning Visual Reasoning Without Strong Priors
Authors	Ethan Perez, Harm de Vries, Florian Strub, Vincent Dumoulin, Aaron Courville
Abstract	Achieving artificial visual reasoning - the ability to answer image-related questions which require a multi-step, high-level process - is an important step towards artificial general intelligence. This multi-modal task requires learning a question-dependent, structured reasoning process over images from language. Standard deep learning approaches tend to exploit biases in the data rather than learn this underlying structure, while leading methods learn to visually reason successfully but are hand-crafted for reasoning. We show that a general-purpose, Conditional Batch Normalization approach achieves state-of-the-art results on the CLEVR Visual Reasoning benchmark with a 2.4% error rate. We outperform the next best end-to-end method (4.5%) and even methods that use extra supervision (3.1%). We probe our model to shed light on how it reasons, showing it has learned a question-dependent, multi-step process. Previous work has operated under the assumption that visual reasoning calls for a specialized architecture, but we show that a general architecture with proper conditioning can learn to visually reason effectively.
Tasks	Visual Reasoning
Published	2017-07-10
URL	http://arxiv.org/abs/1707.03017v5
PDF	http://arxiv.org/pdf/1707.03017v5.pdf
PWC	https://paperswithcode.com/paper/learning-visual-reasoning-without-strong
Repo	https://github.com/GuessWhatGame/clevr
Framework	tf
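
Conditional batch normalization can be sketched as ordinary batch normalization whose scale and shift are predicted from a conditioning input, such as a question embedding. The linear parameterization of `gamma` and `beta` below, and all numeric values, are illustrative assumptions rather than the paper's architecture; the example normalizes a single scalar feature across a batch.

```python
import math


def conditional_batch_norm(features, cond, w_gamma, w_beta, eps=1e-5):
    """features: one scalar activation per example in the batch.
    cond: one conditioning vector per example (e.g. a question embedding).
    gamma and beta are linear functions of cond (hypothetical parameterization)."""
    n = len(features)
    mu = sum(features) / n
    var = sum((f - mu) ** 2 for f in features) / n
    out = []
    for f, c in zip(features, cond):
        gamma = 1.0 + sum(w * x for w, x in zip(w_gamma, c))
        beta = sum(w * x for w, x in zip(w_beta, c))
        out.append(gamma * (f - mu) / math.sqrt(var + eps) + beta)
    return out


features = [1.0, 2.0, 3.0, 4.0]
cond = [[0.0], [0.0], [1.0], [1.0]]  # two different "questions"
out = conditional_batch_norm(features, cond, w_gamma=[0.0], w_beta=[1.0])
out_mean = sum(out) / len(out)
```

With these weights the last two examples receive a shift of 1 while the first two are left at their normalized values, so the same feature map is modulated differently depending on the conditioning input.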

Few-Shot Learning with Graph Neural Networks


Title	Few-Shot Learning with Graph Neural Networks
Authors	Victor Garcia, Joan Bruna
Abstract	We propose to study the problem of few-shot learning through the prism of inference on a partially observed graphical model, constructed from a collection of input images whose label can be either observed or not. By assimilating generic message-passing inference algorithms with their neural-network counterparts, we define a graph neural network architecture that generalizes several of the recently proposed few-shot learning models. Besides providing improved numerical performance, our framework is easily extended to variants of few-shot learning, such as semi-supervised or active learning, demonstrating the ability of graph-based models to operate well on ‘relational’ tasks.
Tasks	Active Learning, Few-Shot Learning
Published	2017-11-10
URL	http://arxiv.org/abs/1711.04043v3
PDF	http://arxiv.org/pdf/1711.04043v3.pdf
PWC	https://paperswithcode.com/paper/few-shot-learning-with-graph-neural-networks
Repo	https://github.com/HoganZhang/few-shot-gnn
Framework	pytorch

Understanding Infographics through Textual and Visual Tag Prediction


Title	Understanding Infographics through Textual and Visual Tag Prediction
Authors	Zoya Bylinskii, Sami Alsheikh, Spandan Madan, Adria Recasens, Kimberli Zhong, Hanspeter Pfister, Fredo Durand, Aude Oliva
Abstract	We introduce the problem of visual hashtag discovery for infographics: extracting visual elements from an infographic that are diagnostic of its topic. Given an infographic as input, our computational approach automatically outputs textual and visual elements predicted to be representative of the infographic content. Concretely, from a curated dataset of 29K large infographic images sampled across 26 categories and 391 tags, we present an automated two-step approach. First, we extract the text from an infographic and use it to predict text tags indicative of the infographic content. Second, we use these predicted text tags as a supervisory signal to localize the most diagnostic visual elements from within the infographic, i.e., visual hashtags. We report performances on a categorization and multi-label tag prediction problem and compare our proposed visual hashtags to human annotations.
Tasks
Published	2017-09-26
URL	http://arxiv.org/abs/1709.09215v1
PDF	http://arxiv.org/pdf/1709.09215v1.pdf
PWC	https://paperswithcode.com/paper/understanding-infographics-through-textual
Repo	https://github.com/cvzoya/visuallydata
Framework	none