July 29, 2019


Paper Group AWR 171

Modulating early visual processing by language. Neurogenesis-Inspired Dictionary Learning: Online Model Adaption in a Changing World. Attend to You: Personalized Image Captioning with Context Sequence Memory Networks. Klout Topics for Modeling Interests and Expertise of Users Across Social Networks. Beyond Sparsity: Tree Regularization of Deep Mode …

Modulating early visual processing by language


Title	Modulating early visual processing by language
Authors	Harm de Vries, Florian Strub, Jérémie Mary, Hugo Larochelle, Olivier Pietquin, Aaron Courville
Abstract	It is commonly assumed that language refers to high-level visual concepts while leaving low-level visual processing unaffected. This view dominates the current literature in computational models for language-vision tasks, where visual and linguistic input are mostly processed independently before being fused into a single representation. In this paper, we deviate from this classic pipeline and propose to modulate the entire visual processing by linguistic input. Specifically, we condition the batch normalization parameters of a pretrained residual network (ResNet) on a language embedding. This approach, which we call MOdulated RESnet (MODERN), significantly improves strong baselines on two visual question answering tasks. Our ablation study shows that modulating from the early stages of the visual processing is beneficial.
Tasks	Question Answering, Visual Question Answering
Published	2017-07-02
URL	http://arxiv.org/abs/1707.00683v3
PDF	http://arxiv.org/pdf/1707.00683v3.pdf
PWC	https://paperswithcode.com/paper/modulating-early-visual-processing-by
Repo	https://github.com/KushajveerSingh/SPADE-PyTorch
Framework	pytorch
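
Sketch: the abstract describes conditioning batch-normalization parameters on a language embedding. The minimal Python illustration below assumes an additive form, where offsets predicted from the embedding are added to frozen pretrained scale and shift values. The function names, the linear predictor, and the additive form are assumptions made for illustration; this is not taken from the authors' code.

```python
import math

def predict_deltas(embedding, w_gamma, w_beta):
    """Hypothetical linear predictor: language embedding -> (delta_gamma, delta_beta)."""
    delta_gamma = sum(e * w for e, w in zip(embedding, w_gamma))
    delta_beta = sum(e * w for e, w in zip(embedding, w_beta))
    return delta_gamma, delta_beta

def conditional_batch_norm(channel_values, gamma, beta,
                           delta_gamma, delta_beta, eps=1e-5):
    """Normalize one channel over a batch, then scale and shift.

    gamma, beta: frozen pretrained batch-norm parameters (assumed given).
    delta_gamma, delta_beta: language-dependent offsets (assumed additive).
    """
    n = len(channel_values)
    mean = sum(channel_values) / n
    var = sum((v - mean) ** 2 for v in channel_values) / n
    scale = gamma + delta_gamma
    shift = beta + delta_beta
    return [scale * (v - mean) / math.sqrt(var + eps) + shift
            for v in channel_values]
```

With zero offsets this reduces to ordinary batch normalization, so a pretrained network's behavior is unchanged until the predictor learns non-zero offsets.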

Neurogenesis-Inspired Dictionary Learning: Online Model Adaption in a Changing World


Title	Neurogenesis-Inspired Dictionary Learning: Online Model Adaption in a Changing World
Authors	Sahil Garg, Irina Rish, Guillermo Cecchi, Aurelie Lozano
Abstract	In this paper, we focus on online representation learning in non-stationary environments which may require continuous adaptation of model architecture. We propose a novel online dictionary-learning (sparse-coding) framework which incorporates the addition and deletion of hidden units (dictionary elements), and is inspired by the adult neurogenesis phenomenon in the dentate gyrus of the hippocampus, known to be associated with improved cognitive function and adaptation to new environments. In the online learning setting, where new input instances arrive sequentially in batches, the neuronal-birth is implemented by adding new units with random initial weights (random dictionary elements); the number of new units is determined by the current performance (representation error) of the dictionary, higher error causing an increase in the birth rate. Neuronal-death is implemented by imposing l1/l2-regularization (group sparsity) on the dictionary within the block-coordinate descent optimization at each iteration of our online alternating minimization scheme, which iterates between the code and dictionary updates. Finally, hidden unit connectivity adaptation is facilitated by introducing sparsity in dictionary elements. Our empirical evaluation on several real-life datasets (images and language) as well as on synthetic data demonstrates that the proposed approach can considerably outperform the state-of-the-art fixed-size (nonadaptive) online sparse coding of Mairal et al. (2009) in the presence of nonstationary data. Moreover, we identify certain properties of the data (e.g., sparse inputs with nearly non-overlapping supports) and of the model (e.g., dictionary sparsity) associated with such improvements.
Tasks	Dictionary Learning, L2 Regularization, Representation Learning
Published	2017-01-22
URL	http://arxiv.org/abs/1701.06106v2
PDF	http://arxiv.org/pdf/1701.06106v2.pdf
PWC	https://paperswithcode.com/paper/neurogenesis-inspired-dictionary-learning
Repo	https://github.com/sgarg87/neurogenesis_inspired_dictionary_learning
Framework	none
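
Sketch: the abstract says neuronal death comes from l1/l2 (group-sparsity) regularization on the dictionary. A common way to apply such a penalty is column-wise soft-thresholding, the standard proximal operator of the group-lasso norm, which zeroes out whole dictionary elements whose norm falls below a threshold. The Python below illustrates that generic operator only; the function names and the pruning helper are hypothetical, and the authors' block-coordinate update may differ in detail.

```python
import math

def group_soft_threshold(dictionary, lam):
    """Column-wise (group) soft-thresholding.

    dictionary: list of columns, one dictionary element per column.
    lam: threshold; a column with l2 norm <= lam is shrunk to all zeros.
    """
    result = []
    for column in dictionary:
        norm = math.sqrt(sum(x * x for x in column))
        shrink = max(0.0, 1.0 - lam / norm) if norm > 0 else 0.0
        result.append([shrink * x for x in column])
    return result

def prune_dead_elements(dictionary):
    """Drop columns that are entirely zero (the 'dead' elements)."""
    return [col for col in dictionary if any(x != 0.0 for x in col)]
```

Because the shrinkage factor is shared by all entries of a column, an element either survives with reduced norm or disappears entirely, which is what makes the penalty act on units rather than on individual weights.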

Attend to You: Personalized Image Captioning with Context Sequence Memory Networks


Title	Attend to You: Personalized Image Captioning with Context Sequence Memory Networks
Authors	Cesc Chunseong Park, Byeongchang Kim, Gunhee Kim
Abstract	We address personalization issues of image captioning, which have not been discussed yet in previous research. For a query image, we aim to generate a descriptive sentence, accounting for prior knowledge such as the user’s active vocabularies in previous documents. As applications of personalized image captioning, we tackle two post automation tasks: hashtag prediction and post generation, on our newly collected Instagram dataset, consisting of 1.1M posts from 6.3K users. We propose a novel captioning model named Context Sequence Memory Network (CSMN). Its unique updates over previous memory network models include (i) exploiting memory as a repository for multiple types of context information, (ii) appending previously generated words into memory to capture long-term information without suffering from the vanishing gradient problem, and (iii) adopting CNN memory structure to jointly represent nearby ordered memory slots for better context understanding. With quantitative evaluation and user studies via Amazon Mechanical Turk, we show the effectiveness of the three novel features of CSMN and its performance enhancement for personalized image captioning over state-of-the-art captioning models.
Tasks	Image Captioning
Published	2017-04-21
URL	http://arxiv.org/abs/1704.06485v2
PDF	http://arxiv.org/pdf/1704.06485v2.pdf
PWC	https://paperswithcode.com/paper/attend-to-you-personalized-image-captioning
Repo	https://github.com/cesc-park/attend2u
Framework	tf


Klout Topics for Modeling Interests and Expertise of Users Across Social Networks


Title	Klout Topics for Modeling Interests and Expertise of Users Across Social Networks
Authors	Sarah Ellinger, Prantik Bhattacharyya, Preeti Bhargava, Nemanja Spasojevic
Abstract	This paper presents Klout Topics, a lightweight ontology to describe social media users’ topics of interest and expertise. Klout Topics is designed to: be human-readable and consumer-friendly; cover multiple domains of knowledge in depth; and promote data extensibility via knowledge base entities. We discuss why this ontology is well-suited for text labeling and interest modeling applications, and how it compares to available alternatives. We show its coverage against common social media interest sets, and examples of how it is used to model the interests of over 780M social media users on Klout.com. Finally, we open the ontology for external use.
Tasks
Published	2017-10-26
URL	http://arxiv.org/abs/1710.09824v1
PDF	http://arxiv.org/pdf/1710.09824v1.pdf
PWC	https://paperswithcode.com/paper/klout-topics-for-modeling-interests-and
Repo	https://github.com/klout/opendata
Framework	none

Beyond Sparsity: Tree Regularization of Deep Models for Interpretability


Title	Beyond Sparsity: Tree Regularization of Deep Models for Interpretability
Authors	Mike Wu, Michael C. Hughes, Sonali Parbhoo, Maurizio Zazzi, Volker Roth, Finale Doshi-Velez
Abstract	The lack of interpretability remains a key barrier to the adoption of deep models in many applications. In this work, we explicitly regularize deep models so human users might step through the process behind their predictions in little time. Specifically, we train deep time-series models so their class-probability predictions have high accuracy while being closely modeled by decision trees with few nodes. Using intuitive toy examples as well as medical tasks for treating sepsis and HIV, we demonstrate that this new tree regularization yields models that are easier for humans to simulate than simpler L1 or L2 penalties without sacrificing predictive power.
Tasks	Time Series
Published	2017-11-16
URL	http://arxiv.org/abs/1711.06178v1
PDF	http://arxiv.org/pdf/1711.06178v1.pdf
PWC	https://paperswithcode.com/paper/beyond-sparsity-tree-regularization-of-deep
Repo	https://github.com/wangyue2334/anomelies
Framework	none

Do latent tree learning models identify meaningful structure in sentences?


Title	Do latent tree learning models identify meaningful structure in sentences?
Authors	Adina Williams, Andrew Drozdov, Samuel R. Bowman
Abstract	Recent work on the problem of latent tree learning has made it possible to train neural networks that learn to both parse a sentence and use the resulting parse to interpret the sentence, all without exposure to ground-truth parse trees at training time. Surprisingly, these models often perform better at sentence understanding tasks than models that use parse trees from conventional parsers. This paper aims to investigate what these latent tree learning models learn. We replicate two such models in a shared codebase and find that (i) only one of these models outperforms conventional tree-structured models on sentence classification, (ii) its parsing strategies are not especially consistent across random restarts, (iii) the parses it produces tend to be shallower than standard Penn Treebank (PTB) parses, and (iv) they do not resemble those of PTB or any other semantic or syntactic formalism that the authors are aware of.
Tasks	Sentence Classification
Published	2017-09-04
URL	http://arxiv.org/abs/1709.01121v2
PDF	http://arxiv.org/pdf/1709.01121v2.pdf
PWC	https://paperswithcode.com/paper/do-latent-tree-learning-models-identify
Repo	https://github.com/NYU-MLL/spinn
Framework	pytorch

Asynchronous Decentralized Parallel Stochastic Gradient Descent


Title	Asynchronous Decentralized Parallel Stochastic Gradient Descent
Authors	Xiangru Lian, Wei Zhang, Ce Zhang, Ji Liu
Abstract	Most commonly used distributed machine learning systems are either synchronous or centralized asynchronous. Synchronous algorithms like AllReduce-SGD perform poorly in a heterogeneous environment, while asynchronous algorithms using a parameter server suffer from 1) a communication bottleneck at parameter servers when workers are many, and 2) significantly worse convergence when the traffic to the parameter server is congested. Can we design an algorithm that is robust in a heterogeneous environment, while being communication efficient and maintaining the best-possible convergence rate? In this paper, we propose an asynchronous decentralized stochastic gradient descent algorithm (AD-PSGD) satisfying all of the above expectations. Our theoretical analysis shows AD-PSGD converges at the optimal $O(1/\sqrt{K})$ rate as SGD and has linear speedup w.r.t. the number of workers. Empirically, AD-PSGD outperforms the best of decentralized parallel SGD (D-PSGD), asynchronous parallel SGD (A-PSGD), and standard data parallel SGD (AllReduce-SGD), often by orders of magnitude in a heterogeneous environment. When training ResNet-50 on ImageNet with up to 128 GPUs, AD-PSGD converges (w.r.t. epochs) similarly to AllReduce-SGD, but each epoch can be up to 4-8X faster than its synchronous counterparts in a network-sharing HPC environment. To the best of our knowledge, AD-PSGD is the first asynchronous algorithm that achieves a similar epoch-wise convergence rate as AllReduce-SGD, at an over 100-GPU scale.
Tasks
Published	2017-10-18
URL	http://arxiv.org/abs/1710.06952v3
PDF	http://arxiv.org/pdf/1710.06952v3.pdf
PWC	https://paperswithcode.com/paper/asynchronous-decentralized-parallel
Repo	https://github.com/facebookresearch/stochastic_gradient_push
Framework	pytorch
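
Sketch: the abstract describes workers that update asynchronously without a parameter server. One way to picture this is a single-process simulation in which a randomly chosen worker computes a gradient on its local copy, averages its model with one neighbor, and then applies the gradient. The Python below is a toy illustration under those assumptions (scalar models, a ring of four workers, a quadratic objective); all names are hypothetical and it is not the authors' implementation.

```python
import random

def ad_psgd_step(models, worker, neighbors, grad_fn, lr, rng):
    """One simulated update for a single worker (scalar model for simplicity)."""
    grad = grad_fn(models[worker])            # gradient at the worker's local copy
    partner = rng.choice(neighbors[worker])   # pick one neighbor; no central server
    avg = 0.5 * (models[worker] + models[partner])
    models[partner] = avg                     # partner keeps the averaged model
    models[worker] = avg - lr * grad          # worker also applies its gradient
    return partner

def simulate(num_steps=3000, lr=0.1, seed=0):
    """Run workers in random order on f(x) = 0.5 * (x - 3)^2 over a ring."""
    rng = random.Random(seed)
    models = [0.0, 1.0, 5.0, 10.0]
    neighbors = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
    grad_fn = lambda x: x - 3.0
    for _ in range(num_steps):
        worker = rng.randrange(len(models))   # random order stands in for asynchrony
        ad_psgd_step(models, worker, neighbors, grad_fn, lr, rng)
    return models
```

Each step touches only two workers, so a slow worker never blocks the others; that is the property the abstract contrasts with AllReduce-SGD.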

Contrastive-center loss for deep neural networks


Title	Contrastive-center loss for deep neural networks
Authors	Ce Qi, Fei Su
Abstract	The deep convolutional neural network (CNN) has significantly raised the performance of image classification and face recognition. Softmax is usually used as supervision, but it only penalizes the classification loss. In this paper, we propose a novel auxiliary supervision signal called contrastive-center loss, which can further enhance the discriminative power of the features, for it learns a class center for each class. The proposed contrastive-center loss simultaneously considers intra-class compactness and inter-class separability, by penalizing the contrastive values between: (1) the distances of training samples to their corresponding class centers, and (2) the sum of the distances of training samples to their non-corresponding class centers. Experiments on different datasets demonstrate the effectiveness of contrastive-center loss.
Tasks	Face Recognition, Image Classification
Published	2017-07-24
URL	http://arxiv.org/abs/1707.07391v2
PDF	http://arxiv.org/pdf/1707.07391v2.pdf
PWC	https://paperswithcode.com/paper/contrastive-center-loss-for-deep-neural
Repo	https://github.com/FLHonker/Losses-in-image-classification-task
Framework	pytorch
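
Sketch: the abstract contrasts (1) the distance of a sample to its own class center with (2) the summed distances to the other class centers. A natural reading is a ratio of the two, which the Python below computes. The use of squared Euclidean distance, the 1/2 factor, and the constant delta that guards against division by zero are assumptions made for this illustration, and the function names are hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def contrastive_center_loss(features, labels, centers, delta=1.0):
    """Sum over samples of own-center distance / (other-center distances + delta), halved."""
    total = 0.0
    for x, y in zip(features, labels):
        intra = squared_distance(x, centers[y])
        inter = sum(squared_distance(x, c)
                    for k, c in enumerate(centers) if k != y)
        total += intra / (inter + delta)
    return 0.5 * total
```

The loss shrinks both when a sample moves toward its own center and when it moves away from the others, which is how one term can cover intra-class compactness and inter-class separability together.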

How morphological development can guide evolution


Title	How morphological development can guide evolution
Authors	Sam Kriegman, Nick Cheney, Josh Bongard
Abstract	Organisms result from adaptive processes interacting across different time scales. One such interaction is that between development and evolution. Models have shown that development sweeps over several traits in a single agent, sometimes exposing promising static traits. Subsequent evolution can then canalize these rare traits. Thus, development can, under the right conditions, increase evolvability. Here, we report on a previously unknown phenomenon when embodied agents are allowed to develop and evolve: Evolution discovers body plans robust to control changes, these body plans become genetically assimilated, yet controllers for these agents are not assimilated. This allows evolution to continue climbing fitness gradients by tinkering with the developmental programs for controllers within these permissive body plans. This exposes a previously unknown detail about the Baldwin effect: instead of all useful traits becoming genetically assimilated, only traits that render the agent robust to changes in other traits become assimilated. We refer to this as differential canalization. This finding also has implications for the evolutionary design of artificial and embodied agents such as robots: robots robust to internal changes in their controllers may also be robust to external changes in their environment, such as transferal from simulation to reality or deployment in novel environments.
Tasks
Published	2017-11-20
URL	http://arxiv.org/abs/1711.07387v5
PDF	http://arxiv.org/pdf/1711.07387v5.pdf
PWC	https://paperswithcode.com/paper/how-morphological-development-can-guide
Repo	https://github.com/skriegman/how-devo-can-guide-evo
Framework	none

Real-Time Seamless Single Shot 6D Object Pose Prediction


Title	Real-Time Seamless Single Shot 6D Object Pose Prediction
Authors	Bugra Tekin, Sudipta N. Sinha, Pascal Fua
Abstract	We propose a single-shot approach for simultaneously detecting an object in an RGB image and predicting its 6D pose without requiring multiple stages or having to examine multiple hypotheses. Unlike a recently proposed single-shot technique for this task (Kehl et al., ICCV’17) that only predicts an approximate 6D pose that must then be refined, ours is accurate enough not to require additional post-processing. As a result, it is much faster - 50 fps on a Titan X (Pascal) GPU - and more suitable for real-time processing. The key component of our method is a new CNN architecture inspired by the YOLO network design that directly predicts the 2D image locations of the projected vertices of the object’s 3D bounding box. The object’s 6D pose is then estimated using a PnP algorithm. For single object and multiple object pose estimation on the LINEMOD and OCCLUSION datasets, our approach substantially outperforms other recent CNN-based approaches when they are all used without post-processing. During post-processing, a pose refinement step can be used to boost the accuracy of the existing methods, but at 10 fps or less, they are much slower than our method.
Tasks	6D Pose Estimation using RGB, Pose Estimation, Pose Prediction
Published	2017-11-24
URL	http://arxiv.org/abs/1711.08848v5
PDF	http://arxiv.org/pdf/1711.08848v5.pdf
PWC	https://paperswithcode.com/paper/real-time-seamless-single-shot-6d-object-pose
Repo	https://github.com/LungTakumi/SSPAndroid
Framework	pytorch
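
Sketch: the abstract says the network predicts the 2D image locations of the projected vertices of the object's 3D bounding box, and a PnP algorithm then recovers the 6D pose. The Python below illustrates only the forward direction, projecting box corners into the image with a pinhole camera, which is the relation PnP inverts. The function names and the axis-aligned box are assumptions for illustration; nothing here is taken from the paper's code.

```python
def box_corners(min_xyz, max_xyz):
    """Return the 8 corners of an axis-aligned 3D box in object coordinates."""
    (x0, y0, z0), (x1, y1, z1) = min_xyz, max_xyz
    return [(x, y, z) for x in (x0, x1) for y in (y0, y1) for z in (z0, z1)]

def project_points(points, rotation, translation, fx, fy, cx, cy):
    """Pinhole projection: rotate and translate into the camera frame, then divide by depth."""
    pixels = []
    for p in points:
        cam = [sum(rotation[r][c] * p[c] for c in range(3)) + translation[r]
               for r in range(3)]
        pixels.append((fx * cam[0] / cam[2] + cx, fy * cam[1] / cam[2] + cy))
    return pixels
```

Given a known 3D box and the predicted 2D corner locations, a PnP solver searches for the rotation and translation that make this projection match the predictions.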

Spectral Ergodicity in Deep Learning Architectures via Surrogate Random Matrices


Title	Spectral Ergodicity in Deep Learning Architectures via Surrogate Random Matrices
Authors	Mehmet Süzen, Cornelius Weber, Joan J. Cerdà
Abstract	In this work, a novel method to quantify spectral ergodicity for random matrices is presented. The new methodology combines approaches rooted in the metrics of Thirumalai-Mountain (TM) and Kullback-Leibler (KL) divergence. The method is applied to a general study of deep and recurrent neural networks via the analysis of random matrix ensembles mimicking typical weight matrices of those systems. In particular, we examine circular random matrix ensembles: circular unitary ensemble (CUE), circular orthogonal ensemble (COE), and circular symplectic ensemble (CSE). Eigenvalue spectra and spectral ergodicity are computed for those ensembles as a function of network size. It is observed that as the matrix size increases the level of spectral ergodicity of the ensemble rises, i.e., the eigenvalue spectra obtained for a single realisation at random from the ensemble are closer to the spectra obtained averaging over the whole ensemble. Based on previous results we conjecture that success of deep learning architectures is strongly bound to the concept of spectral ergodicity. The method to compute spectral ergodicity proposed in this work could be used to optimise the size and architecture of deep as well as recurrent neural networks.
Tasks
Published	2017-04-25
URL	http://arxiv.org/abs/1704.08303v3
PDF	http://arxiv.org/pdf/1704.08303v3.pdf
PWC	https://paperswithcode.com/paper/spectral-ergodicity-in-deep-learning
Repo	https://github.com/msuzen/bristol
Framework	pytorch

Graph Based Relational Features for Collective Classification


Title	Graph Based Relational Features for Collective Classification
Authors	Immanuel Bayer, Uwe Nagel, Steffen Rendle
Abstract	Statistical Relational Learning (SRL) methods have shown that classification accuracy can be improved by integrating relations between samples. Techniques such as iterative classification or relaxation labeling achieve this by propagating information between related samples during the inference process. When only a few samples are labeled and connections between samples are sparse, collective inference methods have shown large improvements over standard feature-based ML methods. However, in contrast to feature-based ML, collective inference methods require complex inference procedures and often depend on the strong assumption of label consistency among related samples. In this paper, we introduce new relational features for standard ML methods by extracting information from direct and indirect relations. We show empirically on three standard benchmark datasets that our relational features yield results comparable to collective inference methods. Finally, we show that our proposal outperforms these methods when additional information is available.
Tasks	Relational Reasoning
Published	2017-02-09
URL	http://arxiv.org/abs/1702.02817v1
PDF	http://arxiv.org/pdf/1702.02817v1.pdf
PWC	https://paperswithcode.com/paper/graph-based-relational-features-for
Repo	https://github.com/ibayer/PAKDD2015
Framework	none

Online Learning of a Memory for Learning Rates


Title	Online Learning of a Memory for Learning Rates
Authors	Franziska Meier, Daniel Kappler, Stefan Schaal
Abstract	The promise of learning to learn for robotics rests on the hope that by extracting some information about the learning process itself we can speed up subsequent similar learning tasks. Here, we introduce a computationally efficient online meta-learning algorithm that builds and optimizes a memory model of the optimal learning rate landscape from previously observed gradient behaviors. While performing task-specific optimization, this memory of learning rates predicts how to scale currently observed gradients. After applying the gradient scaling, our meta-learner updates its internal memory based on the observed effect its prediction had. Our meta-learner can be combined with any gradient-based optimizer, learns on the fly and can be transferred to new optimization tasks. In our evaluations we show that our meta-learning algorithm speeds up learning of MNIST classification and a variety of learning control tasks, either in batch or online learning settings.
Tasks	Meta-Learning
Published	2017-09-20
URL	http://arxiv.org/abs/1709.06709v2
PDF	http://arxiv.org/pdf/1709.06709v2.pdf
PWC	https://paperswithcode.com/paper/online-learning-of-a-memory-for-learning
Repo	https://github.com/fmeier/online-meta-learning
Framework	tf

Neural Networks Regularization Through Class-wise Invariant Representation Learning


Title	Neural Networks Regularization Through Class-wise Invariant Representation Learning
Authors	Soufiane Belharbi, Clément Chatelain, Romain Hérault, Sébastien Adam
Abstract	Training deep neural networks is known to require a large number of training samples. However, in many applications only a few training samples are available. In this work, we tackle the issue of training neural networks for classification tasks when few training samples are available. We attempt to solve this issue by proposing a new regularization term that constrains the hidden layers of a network to learn class-wise invariant representations. In our regularization framework, learning invariant representations is generalized to the class membership where samples with the same class should have the same representation. Numerical experiments over MNIST and its variants showed that our proposal helps improve the generalization of neural networks, particularly when trained with few samples. We provide the source code of our framework at https://github.com/sbelharbi/learning-class-invariant-features .
Tasks	Representation Learning
Published	2017-09-06
URL	http://arxiv.org/abs/1709.01867v4
PDF	http://arxiv.org/pdf/1709.01867v4.pdf
PWC	https://paperswithcode.com/paper/neural-networks-regularization-through-class
Repo	https://github.com/sbelharbi/learning-class-invariant-features
Framework	none
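
Sketch: the abstract proposes a regularization term that pushes samples of the same class toward the same hidden representation. One simple way to express such a term is the mean squared distance over same-class pairs in a batch, as in the Python below. The pairwise squared-distance form and the function name are assumptions for illustration; the paper's actual regularizer may be defined differently.

```python
def invariance_penalty(representations, labels):
    """Mean squared distance between hidden representations of same-class pairs."""
    total, pairs = 0.0, 0
    for i in range(len(labels)):
        for j in range(i + 1, len(labels)):
            if labels[i] == labels[j]:
                total += sum((a - b) ** 2
                             for a, b in zip(representations[i], representations[j]))
                pairs += 1
    # With no same-class pair in the batch there is nothing to penalize.
    return total / pairs if pairs else 0.0
```

In training, a term like this would be added to the classification loss with a weighting coefficient, so that it is zero exactly when all same-class samples share one representation.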

ReLayNet: Retinal Layer and Fluid Segmentation of Macular Optical Coherence Tomography using Fully Convolutional Network


Title	ReLayNet: Retinal Layer and Fluid Segmentation of Macular Optical Coherence Tomography using Fully Convolutional Network
Authors	Abhijit Guha Roy, Sailesh Conjeti, Sri Phani Krishna Karri, Debdoot Sheet, Amin Katouzian, Christian Wachinger, Nassir Navab
Abstract	Optical coherence tomography (OCT) is used for non-invasive diagnosis of diabetic macular edema by assessing the retinal layers. In this paper, we propose a new fully convolutional deep architecture, termed ReLayNet, for end-to-end segmentation of retinal layers and fluid masses in eye OCT scans. ReLayNet uses a contracting path of convolutional blocks (encoders) to learn a hierarchy of contextual features, followed by an expansive path of convolutional blocks (decoders) for semantic segmentation. ReLayNet is trained to optimize a joint loss function comprising weighted logistic regression and Dice overlap loss. The framework is validated on a publicly available benchmark dataset with comparisons against five state-of-the-art segmentation methods including two deep learning based approaches to substantiate its effectiveness.
Tasks	Semantic Segmentation
Published	2017-04-07
URL	http://arxiv.org/abs/1704.02161v2
PDF	http://arxiv.org/pdf/1704.02161v2.pdf
PWC	https://paperswithcode.com/paper/relaynet-retinal-layer-and-fluid-segmentation
Repo	https://github.com/abhi4ssj/relaynet_pytorch
Framework	pytorch
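
Sketch: the abstract states that ReLayNet optimizes a joint loss made of a weighted logistic (cross-entropy) term and a Dice overlap term. The Python below shows one plausible way to combine the two over a flat list of pixels. The per-pixel weights, the soft Dice form, and the mixing coefficients lambda_ce and lambda_dice are hypothetical values chosen for illustration, not the settings used by the authors.

```python
import math

def weighted_cross_entropy(probs, labels, weights, eps=1e-12):
    """Mean over pixels of -w * log(probability of the true class)."""
    total = 0.0
    for p, y, w in zip(probs, labels, weights):
        total += -w * math.log(p[y] + eps)
    return total / len(labels)

def dice_loss(probs, labels, num_classes, eps=1e-12):
    """1 minus the mean soft Dice overlap across classes."""
    dice_sum = 0.0
    for k in range(num_classes):
        intersection = sum(p[k] for p, y in zip(probs, labels) if y == k)
        denom = sum(p[k] for p in probs) + sum(1 for y in labels if y == k)
        dice_sum += (2.0 * intersection + eps) / (denom + eps)
    return 1.0 - dice_sum / num_classes

def joint_loss(probs, labels, weights, num_classes,
               lambda_ce=1.0, lambda_dice=0.5):
    """Weighted sum of the two terms (coefficients are illustrative)."""
    return (lambda_ce * weighted_cross_entropy(probs, labels, weights)
            + lambda_dice * dice_loss(probs, labels, num_classes))
```

The cross-entropy term scores each pixel on its own, while the Dice term scores region overlap per class, which helps when thin layers or small fluid pockets occupy few pixels.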