Paper Group AWR 171
Modulating early visual processing by language
Title | Modulating early visual processing by language |
Authors | Harm de Vries, Florian Strub, Jérémie Mary, Hugo Larochelle, Olivier Pietquin, Aaron Courville |
Abstract | It is commonly assumed that language refers to high-level visual concepts while leaving low-level visual processing unaffected. This view dominates the current literature in computational models for language-vision tasks, where visual and linguistic input are mostly processed independently before being fused into a single representation. In this paper, we deviate from this classic pipeline and propose to modulate the entire visual processing by linguistic input. Specifically, we condition the batch normalization parameters of a pretrained residual network (ResNet) on a language embedding. This approach, which we call MOdulated RESnet (MODERN), significantly improves strong baselines on two visual question answering tasks. Our ablation study shows that modulating from the early stages of the visual processing is beneficial. |
Tasks | Question Answering, Visual Question Answering |
Published | 2017-07-02 |
URL | http://arxiv.org/abs/1707.00683v3 |
PDF | http://arxiv.org/pdf/1707.00683v3.pdf |
PWC | https://paperswithcode.com/paper/modulating-early-visual-processing-by |
Repo | https://github.com/KushajveerSingh/SPADE-PyTorch |
Framework | pytorch |
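The abstract above describes conditioning batch normalization parameters on a language embedding. A minimal sketch of that idea for a single feature channel follows. It assumes the conditioning is done by adding predicted offsets to the pretrained scale and shift; the function names, the linear predictor `predict_deltas`, and the offset parameterization are illustrative assumptions, not taken from the listing.

```python
import math

def predict_deltas(embedding, w_gamma, w_beta):
    """Hypothetical linear predictor: map a language embedding to
    offsets for the batch-norm scale (gamma) and shift (beta)."""
    d_gamma = sum(w * e for w, e in zip(w_gamma, embedding))
    d_beta = sum(w * e for w, e in zip(w_beta, embedding))
    return d_gamma, d_beta

def conditional_batch_norm(x, gamma, beta, d_gamma, d_beta, eps=1e-5):
    """Normalize one channel over a batch, then scale and shift with
    language-conditioned parameters (assumed form: pretrained + offset)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    g = gamma + d_gamma
    b = beta + d_beta
    return [g * (v - mean) / math.sqrt(var + eps) + b for v in x]

# Usage: activations of one channel over a batch of three images.
activations = [1.0, 2.0, 3.0]
d_gamma, d_beta = predict_deltas([1.0, 2.0], [0.1, 0.2], [0.0, -0.5])
modulated = conditional_batch_norm(activations, 1.0, 0.0, d_gamma, d_beta)
```

With zero offsets the function reduces to ordinary batch normalization, so the pretrained network's behaviour is the starting point and the language input only perturbs it.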
Neurogenesis-Inspired Dictionary Learning: Online Model Adaption in a Changing World
Title | Neurogenesis-Inspired Dictionary Learning: Online Model Adaption in a Changing World |
Authors | Sahil Garg, Irina Rish, Guillermo Cecchi, Aurelie Lozano |
Abstract | In this paper, we focus on online representation learning in non-stationary environments which may require continuous adaptation of model architecture. We propose a novel online dictionary-learning (sparse-coding) framework which incorporates the addition and deletion of hidden units (dictionary elements), and is inspired by the adult neurogenesis phenomenon in the dentate gyrus of the hippocampus, known to be associated with improved cognitive function and adaptation to new environments. In the online learning setting, where new input instances arrive sequentially in batches, neuronal birth is implemented by adding new units with random initial weights (random dictionary elements); the number of new units is determined by the current performance (representation error) of the dictionary, with higher error causing an increase in the birth rate. Neuronal death is implemented by imposing l1/l2-regularization (group sparsity) on the dictionary within the block-coordinate descent optimization at each iteration of our online alternating minimization scheme, which iterates between the code and dictionary updates. Finally, hidden unit connectivity adaptation is facilitated by introducing sparsity in dictionary elements. Our empirical evaluation on several real-life datasets (images and language) as well as on synthetic data demonstrates that the proposed approach can considerably outperform the state-of-the-art fixed-size (nonadaptive) online sparse coding of Mairal et al. (2009) in the presence of nonstationary data. Moreover, we identify certain properties of the data (e.g., sparse inputs with nearly non-overlapping supports) and of the model (e.g., dictionary sparsity) associated with such improvements. |
Tasks | Dictionary Learning, L2 Regularization, Representation Learning |
Published | 2017-01-22 |
URL | http://arxiv.org/abs/1701.06106v2 |
PDF | http://arxiv.org/pdf/1701.06106v2.pdf |
PWC | https://paperswithcode.com/paper/neurogenesis-inspired-dictionary-learning |
Repo | https://github.com/sgarg87/neurogenesis_inspired_dictionary_learning |
Framework | none |
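The birth and death mechanism in the abstract above can be illustrated with a small sketch. It uses the standard proximal operator of an l1/l2 (group-lasso) penalty, which shrinks each dictionary element as a group and zeroes the weak ones, followed by pruning and random additions. This is not the paper's block-coordinate descent; the helper names and the caller-supplied number of new elements are assumptions for illustration.

```python
import math
import random

def group_soft_threshold(atoms, lam):
    """Proximal step for an l1/l2 penalty: shrink each atom's l2 norm
    by lam; atoms whose norm is at most lam become all-zero."""
    shrunk = []
    for atom in atoms:
        norm = math.sqrt(sum(v * v for v in atom))
        scale = max(0.0, 1.0 - lam / norm) if norm > 0 else 0.0
        shrunk.append([scale * v for v in atom])
    return shrunk

def prune_dead(atoms):
    """'Death': drop atoms that were shrunk to exactly zero."""
    return [a for a in atoms if any(v != 0.0 for v in a)]

def add_random_atoms(atoms, n_new, dim, rng):
    """'Birth': append n_new atoms with random initial weights.
    How n_new depends on representation error is left to the caller."""
    return atoms + [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_new)]

# Usage: one strong atom survives, one weak atom dies, two are born.
dictionary = [[3.0, 4.0], [0.1, 0.0]]
dictionary = prune_dead(group_soft_threshold(dictionary, lam=1.0))
dictionary = add_random_atoms(dictionary, n_new=2, dim=2, rng=random.Random(0))
```

Shrinking whole elements rather than individual weights is what makes the penalty remove units outright instead of merely sparsifying them.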
Attend to You: Personalized Image Captioning with Context Sequence Memory Networks
Title | Attend to You: Personalized Image Captioning with Context Sequence Memory Networks |
Authors | Cesc Chunseong Park, Byeongchang Kim, Gunhee Kim |
Abstract | We address personalization issues of image captioning, which have not been discussed yet in previous research. For a query image, we aim to generate a descriptive sentence, accounting for prior knowledge such as the user’s active vocabularies in previous documents. As applications of personalized image captioning, we tackle two post automation tasks: hashtag prediction and post generation, on our newly collected Instagram dataset, consisting of 1.1M posts from 6.3K users. We propose a novel captioning model named Context Sequence Memory Network (CSMN). Its unique updates over previous memory network models include (i) exploiting memory as a repository for multiple types of context information, (ii) appending previously generated words into memory to capture long-term information without suffering from the vanishing gradient problem, and (iii) adopting CNN memory structure to jointly represent nearby ordered memory slots for better context understanding. With quantitative evaluation and user studies via Amazon Mechanical Turk, we show the effectiveness of the three novel features of CSMN and its performance enhancement for personalized image captioning over state-of-the-art captioning models. |
Tasks | Image Captioning |
Published | 2017-04-21 |
URL | http://arxiv.org/abs/1704.06485v2 |
PDF | http://arxiv.org/pdf/1704.06485v2.pdf |
PWC | https://paperswithcode.com/paper/attend-to-you-personalized-image-captioning |
Repo | https://github.com/cesc-park/attend2u |
Framework | tf |
Klout Topics for Modeling Interests and Expertise of Users Across Social Networks
Title | Klout Topics for Modeling Interests and Expertise of Users Across Social Networks |
Authors | Sarah Ellinger, Prantik Bhattacharyya, Preeti Bhargava, Nemanja Spasojevic |
Abstract | This paper presents Klout Topics, a lightweight ontology to describe social media users’ topics of interest and expertise. Klout Topics is designed to: be human-readable and consumer-friendly; cover multiple domains of knowledge in depth; and promote data extensibility via knowledge base entities. We discuss why this ontology is well-suited for text labeling and interest modeling applications, and how it compares to available alternatives. We show its coverage against common social media interest sets, and examples of how it is used to model the interests of over 780M social media users on Klout.com. Finally, we open the ontology for external use. |
Tasks | |
Published | 2017-10-26 |
URL | http://arxiv.org/abs/1710.09824v1 |
PDF | http://arxiv.org/pdf/1710.09824v1.pdf |
PWC | https://paperswithcode.com/paper/klout-topics-for-modeling-interests-and |
Repo | https://github.com/klout/opendata |
Framework | none |
Beyond Sparsity: Tree Regularization of Deep Models for Interpretability
Title | Beyond Sparsity: Tree Regularization of Deep Models for Interpretability |
Authors | Mike Wu, Michael C. Hughes, Sonali Parbhoo, Maurizio Zazzi, Volker Roth, Finale Doshi-Velez |
Abstract | The lack of interpretability remains a key barrier to the adoption of deep models in many applications. In this work, we explicitly regularize deep models so human users might step through the process behind their predictions in little time. Specifically, we train deep time-series models so their class-probability predictions have high accuracy while being closely modeled by decision trees with few nodes. Using intuitive toy examples as well as medical tasks for treating sepsis and HIV, we demonstrate that this new tree regularization yields models that are easier for humans to simulate than simpler L1 or L2 penalties without sacrificing predictive power. |
Tasks | Time Series |
Published | 2017-11-16 |
URL | http://arxiv.org/abs/1711.06178v1 |
PDF | http://arxiv.org/pdf/1711.06178v1.pdf |
PWC | https://paperswithcode.com/paper/beyond-sparsity-tree-regularization-of-deep |
Repo | https://github.com/wangyue2334/anomelies |
Framework | none |
Do latent tree learning models identify meaningful structure in sentences?
Title | Do latent tree learning models identify meaningful structure in sentences? |
Authors | Adina Williams, Andrew Drozdov, Samuel R. Bowman |
Abstract | Recent work on the problem of latent tree learning has made it possible to train neural networks that learn to both parse a sentence and use the resulting parse to interpret the sentence, all without exposure to ground-truth parse trees at training time. Surprisingly, these models often perform better at sentence understanding tasks than models that use parse trees from conventional parsers. This paper aims to investigate what these latent tree learning models learn. We replicate two such models in a shared codebase and find that (i) only one of these models outperforms conventional tree-structured models on sentence classification, (ii) its parsing strategies are not especially consistent across random restarts, (iii) the parses it produces tend to be shallower than standard Penn Treebank (PTB) parses, and (iv) they do not resemble those of PTB or any other semantic or syntactic formalism that the authors are aware of. |
Tasks | Sentence Classification |
Published | 2017-09-04 |
URL | http://arxiv.org/abs/1709.01121v2 |
PDF | http://arxiv.org/pdf/1709.01121v2.pdf |
PWC | https://paperswithcode.com/paper/do-latent-tree-learning-models-identify |
Repo | https://github.com/NYU-MLL/spinn |
Framework | pytorch |
Asynchronous Decentralized Parallel Stochastic Gradient Descent
Title | Asynchronous Decentralized Parallel Stochastic Gradient Descent |
Authors | Xiangru Lian, Wei Zhang, Ce Zhang, Ji Liu |
Abstract | Most commonly used distributed machine learning systems are either synchronous or centralized asynchronous. Synchronous algorithms like AllReduce-SGD perform poorly in a heterogeneous environment, while asynchronous algorithms using a parameter server suffer from 1) a communication bottleneck at parameter servers when workers are many, and 2) significantly worse convergence when the traffic to the parameter server is congested. Can we design an algorithm that is robust in a heterogeneous environment, while being communication efficient and maintaining the best-possible convergence rate? In this paper, we propose an asynchronous decentralized stochastic gradient descent algorithm (AD-PSGD) satisfying all of the above expectations. Our theoretical analysis shows AD-PSGD converges at the same optimal $O(1/\sqrt{K})$ rate as SGD and has linear speedup w.r.t. the number of workers. Empirically, AD-PSGD outperforms the best of decentralized parallel SGD (D-PSGD), asynchronous parallel SGD (A-PSGD), and standard data parallel SGD (AllReduce-SGD), often by orders of magnitude in a heterogeneous environment. When training ResNet-50 on ImageNet with up to 128 GPUs, AD-PSGD converges (w.r.t. epochs) similarly to AllReduce-SGD, but each epoch can be up to 4-8X faster than its synchronous counterparts in a network-sharing HPC environment. To the best of our knowledge, AD-PSGD is the first asynchronous algorithm that achieves a similar epoch-wise convergence rate to AllReduce-SGD at an over 100-GPU scale. |
Tasks | |
Published | 2017-10-18 |
URL | http://arxiv.org/abs/1710.06952v3 |
PDF | http://arxiv.org/pdf/1710.06952v3.pdf |
PWC | https://paperswithcode.com/paper/asynchronous-decentralized-parallel |
Repo | https://github.com/facebookresearch/stochastic_gradient_push |
Framework | pytorch |
Contrastive-center loss for deep neural networks
Title | Contrastive-center loss for deep neural networks |
Authors | Ce Qi, Fei Su |
Abstract | The deep convolutional neural network (CNN) has significantly raised the performance of image classification and face recognition. Softmax is usually used as supervision, but it only penalizes the classification loss. In this paper, we propose a novel auxiliary supervision signal called contrastive-center loss, which can further enhance the discriminative power of the features, for it learns a class center for each class. The proposed contrastive-center loss simultaneously considers intra-class compactness and inter-class separability, by penalizing the contrastive values between: (1) the distances of training samples to their corresponding class centers, and (2) the sum of the distances of training samples to their non-corresponding class centers. Experiments on different datasets demonstrate the effectiveness of contrastive-center loss. |
Tasks | Face Recognition, Image Classification |
Published | 2017-07-24 |
URL | http://arxiv.org/abs/1707.07391v2 |
PDF | http://arxiv.org/pdf/1707.07391v2.pdf |
PWC | https://paperswithcode.com/paper/contrastive-center-loss-for-deep-neural |
Repo | https://github.com/FLHonker/Losses-in-image-classification-task |
Framework | pytorch |
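The two quantities named in the abstract above, the distance to the own class center and the summed distance to the other centers, can be combined in a short sketch. It assumes the contrast is taken as a ratio of squared Euclidean distances with a small constant `delta` in the denominator to avoid division by zero; the abstract does not state the exact form, so treat the ratio, the squaring, and `delta` as assumptions.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def contrastive_center_loss(features, labels, centers, delta=1.0):
    """Sketch: for each sample, divide the distance to its own class
    center by the summed distance to all other centers (plus delta)."""
    total = 0.0
    for x, y in zip(features, labels):
        intra = sq_dist(x, centers[y])
        inter = sum(sq_dist(x, c) for k, c in enumerate(centers) if k != y)
        total += intra / (inter + delta)
    return 0.5 * total

# Usage: two class centers on a line, one sample halfway between them.
centers = [[0.0, 0.0], [2.0, 0.0]]
loss = contrastive_center_loss([[1.0, 0.0]], [0], centers)
```

The loss falls when a sample moves toward its own center (smaller numerator) or away from the other centers (larger denominator), which matches the compactness and separability goals in the abstract.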
How morphological development can guide evolution
Title | How morphological development can guide evolution |
Authors | Sam Kriegman, Nick Cheney, Josh Bongard |
Abstract | Organisms result from adaptive processes interacting across different time scales. One such interaction is that between development and evolution. Models have shown that development sweeps over several traits in a single agent, sometimes exposing promising static traits. Subsequent evolution can then canalize these rare traits. Thus, development can, under the right conditions, increase evolvability. Here, we report on a previously unknown phenomenon when embodied agents are allowed to develop and evolve: Evolution discovers body plans robust to control changes, these body plans become genetically assimilated, yet controllers for these agents are not assimilated. This allows evolution to continue climbing fitness gradients by tinkering with the developmental programs for controllers within these permissive body plans. This exposes a previously unknown detail about the Baldwin effect: instead of all useful traits becoming genetically assimilated, only traits that render the agent robust to changes in other traits become assimilated. We refer to this as differential canalization. This finding also has implications for the evolutionary design of artificial and embodied agents such as robots: robots robust to internal changes in their controllers may also be robust to external changes in their environment, such as transferal from simulation to reality or deployment in novel environments. |
Tasks | |
Published | 2017-11-20 |
URL | http://arxiv.org/abs/1711.07387v5 |
PDF | http://arxiv.org/pdf/1711.07387v5.pdf |
PWC | https://paperswithcode.com/paper/how-morphological-development-can-guide |
Repo | https://github.com/skriegman/how-devo-can-guide-evo |
Framework | none |
Real-Time Seamless Single Shot 6D Object Pose Prediction
Title | Real-Time Seamless Single Shot 6D Object Pose Prediction |
Authors | Bugra Tekin, Sudipta N. Sinha, Pascal Fua |
Abstract | We propose a single-shot approach for simultaneously detecting an object in an RGB image and predicting its 6D pose without requiring multiple stages or having to examine multiple hypotheses. Unlike a recently proposed single-shot technique for this task (Kehl et al., ICCV’17) that only predicts an approximate 6D pose that must then be refined, ours is accurate enough not to require additional post-processing. As a result, it is much faster - 50 fps on a Titan X (Pascal) GPU - and more suitable for real-time processing. The key component of our method is a new CNN architecture inspired by the YOLO network design that directly predicts the 2D image locations of the projected vertices of the object’s 3D bounding box. The object’s 6D pose is then estimated using a PnP algorithm. For single object and multiple object pose estimation on the LINEMOD and OCCLUSION datasets, our approach substantially outperforms other recent CNN-based approaches when they are all used without post-processing. During post-processing, a pose refinement step can be used to boost the accuracy of the existing methods, but at 10 fps or less, they are much slower than our method. |
Tasks | 6D Pose Estimation using RGB, Pose Estimation, Pose Prediction |
Published | 2017-11-24 |
URL | http://arxiv.org/abs/1711.08848v5 |
PDF | http://arxiv.org/pdf/1711.08848v5.pdf |
PWC | https://paperswithcode.com/paper/real-time-seamless-single-shot-6d-object-pose |
Repo | https://github.com/LungTakumi/SSPAndroid |
Framework | pytorch |
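The abstract above says the network predicts the 2D image locations of the projected vertices of the object's 3D bounding box, and that a PnP algorithm then recovers the 6D pose. The sketch below shows only the forward model that PnP inverts: building the eight box corners and projecting them with a pinhole camera. The axis-aligned box, the intrinsics `fx, fy, cx, cy`, and the function names are illustrative assumptions; no PnP solver is implemented here.

```python
from itertools import product

def box_corners(half_extents):
    """Eight corners of an axis-aligned 3D box centred at the object origin."""
    hx, hy, hz = half_extents
    return [(sx * hx, sy * hy, sz * hz)
            for sx, sy, sz in product((-1.0, 1.0), repeat=3)]

def project(points, R, t, fx, fy, cx, cy):
    """Pinhole projection: rotate and translate each point into the
    camera frame, then divide by depth and apply the intrinsics."""
    pixels = []
    for p in points:
        xc = [sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3)]
        pixels.append((fx * xc[0] / xc[2] + cx, fy * xc[1] / xc[2] + cy))
    return pixels

# Usage: a unit cube four units in front of the camera, no rotation.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
corners_2d = project(box_corners((1.0, 1.0, 1.0)), identity,
                     (0.0, 0.0, 4.0), 100.0, 100.0, 50.0, 50.0)
```

Given the eight known 3D corners and their predicted 2D locations, a PnP solver searches for the rotation and translation that make this projection match the predictions.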
Spectral Ergodicity in Deep Learning Architectures via Surrogate Random Matrices
Title | Spectral Ergodicity in Deep Learning Architectures via Surrogate Random Matrices |
Authors | Mehmet Süzen, Cornelius Weber, Joan J. Cerdà |
Abstract | In this work, a novel method to quantify spectral ergodicity for random matrices is presented. The new methodology combines approaches rooted in the metrics of Thirumalai-Mountain (TM) and Kullback-Leibler (KL) divergence. The method is applied to a general study of deep and recurrent neural networks via the analysis of random matrix ensembles mimicking typical weight matrices of those systems. In particular, we examine circular random matrix ensembles: circular unitary ensemble (CUE), circular orthogonal ensemble (COE), and circular symplectic ensemble (CSE). Eigenvalue spectra and spectral ergodicity are computed for those ensembles as a function of network size. It is observed that as the matrix size increases the level of spectral ergodicity of the ensemble rises, i.e., the eigenvalue spectrum obtained for a single realisation drawn at random from the ensemble is closer to the spectrum obtained by averaging over the whole ensemble. Based on previous results we conjecture that the success of deep learning architectures is strongly bound to the concept of spectral ergodicity. The method to compute spectral ergodicity proposed in this work could be used to optimise the size and architecture of deep as well as recurrent neural networks. |
Tasks | |
Published | 2017-04-25 |
URL | http://arxiv.org/abs/1704.08303v3 |
PDF | http://arxiv.org/pdf/1704.08303v3.pdf |
PWC | https://paperswithcode.com/paper/spectral-ergodicity-in-deep-learning |
Repo | https://github.com/msuzen/bristol |
Framework | pytorch |
Graph Based Relational Features for Collective Classification
Title | Graph Based Relational Features for Collective Classification |
Authors | Immanuel Bayer, Uwe Nagel, Steffen Rendle |
Abstract | Statistical Relational Learning (SRL) methods have shown that classification accuracy can be improved by integrating relations between samples. Techniques such as iterative classification or relaxation labeling achieve this by propagating information between related samples during the inference process. When only a few samples are labeled and connections between samples are sparse, collective inference methods have shown large improvements over standard feature-based ML methods. However, in contrast to feature based ML, collective inference methods require complex inference procedures and often depend on the strong assumption of label consistency among related samples. In this paper, we introduce new relational features for standard ML methods by extracting information from direct and indirect relations. We show empirically on three standard benchmark datasets that our relational features yield results comparable to collective inference methods. Finally we show that our proposal outperforms these methods when additional information is available. |
Tasks | Relational Reasoning |
Published | 2017-02-09 |
URL | http://arxiv.org/abs/1702.02817v1 |
PDF | http://arxiv.org/pdf/1702.02817v1.pdf |
PWC | https://paperswithcode.com/paper/graph-based-relational-features-for |
Repo | https://github.com/ibayer/PAKDD2015 |
Framework | none |
Online Learning of a Memory for Learning Rates
Title | Online Learning of a Memory for Learning Rates |
Authors | Franziska Meier, Daniel Kappler, Stefan Schaal |
Abstract | The promise of learning to learn for robotics rests on the hope that by extracting some information about the learning process itself we can speed up subsequent similar learning tasks. Here, we introduce a computationally efficient online meta-learning algorithm that builds and optimizes a memory model of the optimal learning rate landscape from previously observed gradient behaviors. While performing task specific optimization, this memory of learning rates predicts how to scale currently observed gradients. After applying the gradient scaling our meta-learner updates its internal memory based on the observed effect its prediction had. Our meta-learner can be combined with any gradient-based optimizer, learns on the fly and can be transferred to new optimization tasks. In our evaluations we show that our meta-learning algorithm speeds up learning of MNIST classification and a variety of learning control tasks, either in batch or online learning settings. |
Tasks | Meta-Learning |
Published | 2017-09-20 |
URL | http://arxiv.org/abs/1709.06709v2 |
PDF | http://arxiv.org/pdf/1709.06709v2.pdf |
PWC | https://paperswithcode.com/paper/online-learning-of-a-memory-for-learning |
Repo | https://github.com/fmeier/online-meta-learning |
Framework | tf |
Neural Networks Regularization Through Class-wise Invariant Representation Learning
Title | Neural Networks Regularization Through Class-wise Invariant Representation Learning |
Authors | Soufiane Belharbi, Clément Chatelain, Romain Hérault, Sébastien Adam |
Abstract | Training deep neural networks is known to require a large number of training samples. However, in many applications only a few training samples are available. In this work, we tackle the issue of training neural networks for classification tasks when few training samples are available. We attempt to solve this issue by proposing a new regularization term that constrains the hidden layers of a network to learn class-wise invariant representations. In our regularization framework, learning invariant representations is generalized to the class membership where samples with the same class should have the same representation. Numerical experiments over MNIST and its variants showed that our proposal helps improve the generalization of neural networks, particularly when trained with few samples. We provide the source code of our framework at https://github.com/sbelharbi/learning-class-invariant-features . |
Tasks | Representation Learning |
Published | 2017-09-06 |
URL | http://arxiv.org/abs/1709.01867v4 |
PDF | http://arxiv.org/pdf/1709.01867v4.pdf |
PWC | https://paperswithcode.com/paper/neural-networks-regularization-through-class |
Repo | https://github.com/sbelharbi/learning-class-invariant-features |
Framework | none |
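The regularizer described in the abstract above asks samples of the same class to share a hidden representation. One simple way to express that, shown here as a sketch, is to penalize the mean squared distance between hidden vectors of same-class pairs in a batch. The choice of squared Euclidean distance, the averaging over pairs, and the function name are assumptions; the linked repository is the reference for the authors' actual formulation.

```python
def classwise_invariance_penalty(hidden, labels):
    """Mean squared distance between hidden representations of all
    same-class pairs in a batch; 0.0 if no such pair exists."""
    total, pairs = 0.0, 0
    for i in range(len(hidden)):
        for j in range(i + 1, len(hidden)):
            if labels[i] == labels[j]:
                total += sum((a - b) ** 2 for a, b in zip(hidden[i], hidden[j]))
                pairs += 1
    return total / pairs if pairs else 0.0

# Usage: two samples of class 0 and one of class 1.
penalty = classwise_invariance_penalty(
    [[0.0, 0.0], [1.0, 0.0], [5.0, 5.0]], [0, 0, 1])
```

In training, a term like this would be added to the classification loss with a weighting coefficient, so the network trades off accuracy against within-class agreement of its hidden layers.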
ReLayNet: Retinal Layer and Fluid Segmentation of Macular Optical Coherence Tomography using Fully Convolutional Network
Title | ReLayNet: Retinal Layer and Fluid Segmentation of Macular Optical Coherence Tomography using Fully Convolutional Network |
Authors | Abhijit Guha Roy, Sailesh Conjeti, Sri Phani Krishna Karri, Debdoot Sheet, Amin Katouzian, Christian Wachinger, Nassir Navab |
Abstract | Optical coherence tomography (OCT) is used for non-invasive diagnosis of diabetic macular edema by assessing the retinal layers. In this paper, we propose a new fully convolutional deep architecture, termed ReLayNet, for end-to-end segmentation of retinal layers and fluid masses in eye OCT scans. ReLayNet uses a contracting path of convolutional blocks (encoders) to learn a hierarchy of contextual features, followed by an expansive path of convolutional blocks (decoders) for semantic segmentation. ReLayNet is trained to optimize a joint loss function comprising weighted logistic regression and Dice overlap loss. The framework is validated on a publicly available benchmark dataset with comparisons against five state-of-the-art segmentation methods, including two deep learning based approaches, to substantiate its effectiveness. |
Tasks | Semantic Segmentation |
Published | 2017-04-07 |
URL | http://arxiv.org/abs/1704.02161v2 |
PDF | http://arxiv.org/pdf/1704.02161v2.pdf |
PWC | https://paperswithcode.com/paper/relaynet-retinal-layer-and-fluid-segmentation |
Repo | https://github.com/abhi4ssj/relaynet_pytorch |
Framework | pytorch |
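The joint loss mentioned in the abstract above, a weighted logistic term plus a Dice overlap term, can be sketched on a flat list of pixels. The per-pixel weights, the soft Dice formulation, the mixing coefficients `lam_ce` and `lam_dice`, and all function names are assumptions for illustration; the abstract gives no values or exact definitions.

```python
import math

def weighted_cross_entropy(probs, targets, weights):
    """Per-pixel weighted multi-class log loss, averaged over pixels."""
    return -sum(w * math.log(max(p[t], 1e-12))
                for p, t, w in zip(probs, targets, weights)) / len(targets)

def dice_loss(probs, targets, n_classes, eps=1e-7):
    """Soft Dice loss: 1 minus the Dice overlap, averaged over classes."""
    loss = 0.0
    for c in range(n_classes):
        inter = sum(p[c] for p, t in zip(probs, targets) if t == c)
        denom = sum(p[c] for p in probs) + sum(1 for t in targets if t == c)
        loss += 1.0 - (2.0 * inter + eps) / (denom + eps)
    return loss / n_classes

def joint_loss(probs, targets, weights, n_classes, lam_ce=1.0, lam_dice=0.5):
    """Weighted sum of the two terms; the coefficients are hypothetical."""
    return (lam_ce * weighted_cross_entropy(probs, targets, weights)
            + lam_dice * dice_loss(probs, targets, n_classes))

# Usage: two pixels, two classes, an uninformative prediction.
value = joint_loss([[0.5, 0.5], [0.5, 0.5]], [0, 1], [1.0, 1.0], n_classes=2)
```

The log-loss term scores each pixel independently, while the Dice term scores region overlap per class, which helps when some layers or fluid regions cover few pixels.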