Paper Group ANR 921
Interpolatron: Interpolation or Extrapolation Schemes to Accelerate Optimization for Deep Neural Networks
Title | Interpolatron: Interpolation or Extrapolation Schemes to Accelerate Optimization for Deep Neural Networks |
Authors | Guangzeng Xie, Yitan Wang, Shuchang Zhou, Zhihua Zhang |
Abstract | In this paper we explore acceleration techniques for large scale nonconvex optimization problems with a special focus on deep neural networks. The extrapolation scheme is a classical approach for accelerating stochastic gradient descent for convex optimization, but it typically does not work well for nonconvex optimization. Alternatively, we propose an interpolation scheme to accelerate nonconvex optimization and call the method Interpolatron. We explain the motivation behind Interpolatron and conduct a thorough empirical analysis. Empirical results on DNNs of great depths (e.g., 98-layer ResNet and 200-layer ResNet) on CIFAR-10 and ImageNet show that Interpolatron can converge much faster than state-of-the-art methods such as SGD with momentum and Adam. Furthermore, Anderson’s acceleration, in which mixing coefficients are computed by least-squares estimation, can also be used to improve the performance. Both Interpolatron and Anderson’s acceleration are easy to implement and tune. We also show that Interpolatron has a linear convergence rate under certain regularity assumptions. |
Tasks | |
Published | 2018-05-17 |
URL | http://arxiv.org/abs/1805.06753v1 |
http://arxiv.org/pdf/1805.06753v1.pdf | |
PWC | https://paperswithcode.com/paper/interpolatron-interpolation-or-extrapolation |
Repo | |
Framework | |
Invariant and Equivariant Graph Networks
Title | Invariant and Equivariant Graph Networks |
Authors | Haggai Maron, Heli Ben-Hamu, Nadav Shamir, Yaron Lipman |
Abstract | Invariant and equivariant networks have been successfully used for learning images, sets, point clouds, and graphs. A basic challenge in developing such networks is finding the maximal collection of invariant and equivariant linear layers. Although this question is answered for the first three examples (for popular transformations, at least), a full characterization of invariant and equivariant linear layers for graphs is not known. In this paper we provide a characterization of all permutation invariant and equivariant linear layers for (hyper-)graph data, and show that their dimension, in case of edge-value graph data, is 2 and 15, respectively. More generally, for graph data defined on k-tuples of nodes, the dimension is the k-th and 2k-th Bell numbers. Orthogonal bases for the layers are computed, including generalization to multi-graph data. The constant number of basis elements and their characteristics allow successfully applying the networks to different size graphs. From the theoretical point of view, our results generalize and unify recent advances in equivariant deep learning. In particular, we show that our model is capable of approximating any message passing neural network. Applying these new linear layers in a simple deep neural network framework is shown to achieve comparable results to state-of-the-art and to have better expressivity than previous invariant and equivariant bases. |
Tasks | |
Published | 2018-12-24 |
URL | http://arxiv.org/abs/1812.09902v2 |
http://arxiv.org/pdf/1812.09902v2.pdf | |
PWC | https://paperswithcode.com/paper/invariant-and-equivariant-graph-networks |
Repo | |
Framework | |
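The dimension counts quoted in the abstract above can be checked numerically. The following is a minimal sketch, not code from the paper: it computes Bell numbers with the Bell triangle and evaluates the k-th and 2k-th Bell numbers for k = 2 (edge-valued graph data). The function names `bell_number` and `layer_dimensions` are hypothetical helpers introduced only for this illustration.

```python
def bell_number(n: int) -> int:
    """Return the n-th Bell number (the number of partitions of an n-element set)."""
    # Bell triangle: each new row starts with the last entry of the previous row,
    # and every further entry adds the entry to its left and the one above-left.
    row = [1]
    for _ in range(n):
        new_row = [row[-1]]
        for value in row:
            new_row.append(new_row[-1] + value)
        row = new_row
    return row[0]


def layer_dimensions(k: int) -> tuple[int, int]:
    """Dimensions (invariant, equivariant) stated in the abstract for data on k-tuples of nodes."""
    return bell_number(k), bell_number(2 * k)


print(layer_dimensions(2))  # → (2, 15)
```

For k = 2 this reproduces the 2 and 15 stated in the abstract, since the 2nd Bell number is 2 and the 4th is 15.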
Deep Cocktail Network: Multi-source Unsupervised Domain Adaptation with Category Shift
Title | Deep Cocktail Network: Multi-source Unsupervised Domain Adaptation with Category Shift |
Authors | Ruijia Xu, Ziliang Chen, Wangmeng Zuo, Junjie Yan, Liang Lin |
Abstract | Unsupervised domain adaptation (UDA) conventionally assumes labeled source samples coming from a single underlying source distribution. In practical scenarios, however, labeled data are typically collected from diverse sources. The multiple sources differ not only from the target but also from each other, so the domain adapters should not all be modeled in the same way. Moreover, those sources may not completely share their categories, which further brings a new transfer challenge called category shift. In this paper, we propose a deep cocktail network (DCTN) to battle the domain and category shifts among multiple sources. Motivated by the theoretical results in \cite{mansour2009domain}, the target distribution can be represented as the weighted combination of source distributions, and the multi-source unsupervised domain adaptation via DCTN is then performed as two alternating steps: i) It deploys multi-way adversarial learning to minimize the discrepancy between the target and each of the multiple source domains, which also obtains the source-specific perplexity scores to denote the possibilities that a target sample belongs to different source domains. ii) The multi-source category classifiers are integrated with the perplexity scores to classify target samples, and the pseudo-labeled target samples together with source samples are utilized to update the multi-source category classifier and the feature extractor. We evaluate DCTN in three domain adaptation benchmarks, which clearly demonstrate the superiority of our framework. |
Tasks | Domain Adaptation, Unsupervised Domain Adaptation |
Published | 2018-03-02 |
URL | http://arxiv.org/abs/1803.00830v1 |
http://arxiv.org/pdf/1803.00830v1.pdf | |
PWC | https://paperswithcode.com/paper/deep-cocktail-network-multi-source |
Repo | |
Framework | |
Towards Providing Explanations for AI Planner Decisions
Title | Towards Providing Explanations for AI Planner Decisions |
Authors | Rita Borgo, Michael Cashmore, Daniele Magazzeni |
Abstract | In order to engender trust in AI, humans must understand what an AI system is trying to achieve, and why. To overcome this problem, the underlying AI process must produce justifications and explanations that are both transparent and comprehensible to the user. AI Planning is well placed to be able to address this challenge. In this paper we present a methodology to provide initial explanations for the decisions made by the planner. Explanations are created by allowing the user to suggest alternative actions in plans and then compare the resulting plans with the one found by the planner. The methodology is implemented in the new XAI-Plan framework. |
Tasks | |
Published | 2018-10-15 |
URL | http://arxiv.org/abs/1810.06338v1 |
http://arxiv.org/pdf/1810.06338v1.pdf | |
PWC | https://paperswithcode.com/paper/towards-providing-explanations-for-ai-planner |
Repo | |
Framework | |
JTAV: Jointly Learning Social Media Content Representation by Fusing Textual, Acoustic, and Visual Features
Title | JTAV: Jointly Learning Social Media Content Representation by Fusing Textual, Acoustic, and Visual Features |
Authors | Hongru Liang, Haozheng Wang, Jun Wang, Shaodi You, Zhe Sun, Jin-Mao Wei, Zhenglu Yang |
Abstract | Learning social media content is the basis of many real-world applications, including information retrieval and recommendation systems, among others. In contrast with previous works that focus mainly on single modal or bi-modal learning, we propose to learn social media content by fusing jointly textual, acoustic, and visual information (JTAV). Effective strategies are proposed to extract fine-grained features of each modality, that is, attBiGRU and DCRNN. We also introduce cross-modal fusion and attentive pooling techniques to integrate multi-modal information comprehensively. Extensive experimental evaluation conducted on real-world datasets demonstrates our proposed model outperforms the state-of-the-art approaches by a large margin. |
Tasks | Information Retrieval, Recommendation Systems |
Published | 2018-06-05 |
URL | http://arxiv.org/abs/1806.01483v1 |
http://arxiv.org/pdf/1806.01483v1.pdf | |
PWC | https://paperswithcode.com/paper/jtav-jointly-learning-social-media-content |
Repo | |
Framework | |
JANUS: Fast and Flexible Deep Learning via Symbolic Graph Execution of Imperative Programs
Title | JANUS: Fast and Flexible Deep Learning via Symbolic Graph Execution of Imperative Programs |
Authors | Eunji Jeong, Sungwoo Cho, Gyeong-In Yu, Joo Seong Jeong, Dong-Jin Shin, Byung-Gon Chun |
Abstract | The rapid evolution of deep neural networks is demanding deep learning (DL) frameworks not only to satisfy the requirement of quickly executing large computations, but also to support straightforward programming models for quickly implementing and experimenting with complex network structures. However, existing frameworks fail to excel in both departments simultaneously, leading to diverged efforts for optimizing performance and improving usability. This paper presents JANUS, a system that combines the advantages from both sides by transparently converting an imperative DL program written in Python, the de-facto scripting language for DL, into an efficiently executable symbolic dataflow graph. JANUS can convert various dynamic features of Python, including dynamic control flow, dynamic types, and impure functions, into elements of a symbolic dataflow graph. Experiments demonstrate that JANUS can achieve fast DL training by exploiting the techniques imposed by symbolic graph-based DL frameworks, while maintaining the simple and flexible programmability of imperative DL frameworks at the same time. |
Tasks | |
Published | 2018-12-04 |
URL | http://arxiv.org/abs/1812.01329v2 |
http://arxiv.org/pdf/1812.01329v2.pdf | |
PWC | https://paperswithcode.com/paper/janus-fast-and-flexible-deep-learning-via |
Repo | |
Framework | |
Reciprocal Attention Fusion for Visual Question Answering
Title | Reciprocal Attention Fusion for Visual Question Answering |
Authors | Moshiur R Farazi, Salman H Khan |
Abstract | Existing attention mechanisms either attend to local image grid or object level features for Visual Question Answering (VQA). Motivated by the observation that questions can relate to both object instances and their parts, we propose a novel attention mechanism that jointly considers reciprocal relationships between the two levels of visual details. The bottom-up attention thus generated is further coalesced with the top-down information to only focus on the scene elements that are most relevant to a given question. Our design hierarchically fuses multi-modal information, i.e., language, object- and grid-level features, through an efficient tensor decomposition scheme. The proposed model improves the state-of-the-art single model performances from 67.9% to 68.2% on VQAv1 and from 65.7% to 67.4% on VQAv2, demonstrating a significant boost. |
Tasks | Question Answering, Visual Question Answering |
Published | 2018-05-11 |
URL | http://arxiv.org/abs/1805.04247v2 |
http://arxiv.org/pdf/1805.04247v2.pdf | |
PWC | https://paperswithcode.com/paper/reciprocal-attention-fusion-for-visual |
Repo | |
Framework | |
Optimized Participation of Multiple Fusion Functions in Consensus Creation: An Evolutionary Approach
Title | Optimized Participation of Multiple Fusion Functions in Consensus Creation: An Evolutionary Approach |
Authors | Elaheh Rashedi, Abdolreza Mirzaei |
Abstract | Recent studies show that ensemble methods enhance the stability and robustness of unsupervised learning. These approaches are successfully utilized to construct multiple clusterings and combine them into one representative consensus clustering of improved quality. The quality of the consensus clustering depends directly on the fusion functions used in the combination. In this article, the hierarchical clustering ensemble techniques are extended by introducing a new evolutionary fusion function. In the proposed method, multiple hierarchical clustering methods are generated via bagging. Thereafter, the consensus clustering is obtained using the search capability of a genetic algorithm among different aggregated clustering methods made by different fusion functions. Putting some popular data sets to empirical study, the quality of the proposed method is compared with regular clustering ensembles. Experimental results demonstrate the accuracy improvement of the aggregated clustering results. |
Tasks | |
Published | 2018-05-31 |
URL | http://arxiv.org/abs/1805.12270v1 |
http://arxiv.org/pdf/1805.12270v1.pdf | |
PWC | https://paperswithcode.com/paper/optimized-participation-of-multiple-fusion |
Repo | |
Framework | |
Online Non-Additive Path Learning under Full and Partial Information
Title | Online Non-Additive Path Learning under Full and Partial Information |
Authors | Corinna Cortes, Vitaly Kuznetsov, Mehryar Mohri, Holakou Rahmanian, Manfred K. Warmuth |
Abstract | We study the problem of online path learning with non-additive gains, which is a central problem appearing in several applications, including ensemble structured prediction. We present new online algorithms for path learning with non-additive count-based gains for the three settings of full information, semi-bandit and full bandit with very favorable regret guarantees. A key component of our algorithms is the definition and computation of an intermediate context-dependent automaton that enables us to use existing algorithms designed for additive gains. We further apply our methods to the important application of ensemble structured prediction. Finally, beyond count-based gains, we give an efficient implementation of the EXP3 algorithm for the full bandit setting with an arbitrary (non-additive) gain. |
Tasks | Structured Prediction |
Published | 2018-04-18 |
URL | http://arxiv.org/abs/1804.06518v4 |
http://arxiv.org/pdf/1804.06518v4.pdf | |
PWC | https://paperswithcode.com/paper/online-non-additive-path-learning-under-full |
Repo | |
Framework | |
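For readers unfamiliar with EXP3, which the abstract above mentions for the full bandit setting, here is a minimal sketch of the textbook algorithm with gains in [0, 1]. It treats every candidate (for example, every path) as a separate arm, so it is a naive baseline and not the efficient implementation the paper describes. The names `exp3_probabilities`, `exp3_step`, and `gain_fn` are hypothetical and introduced only for this illustration.

```python
import math
import random


def exp3_probabilities(weights, gamma):
    """Mix the normalized weights with uniform exploration at rate gamma."""
    total = sum(weights)
    k = len(weights)
    return [(1 - gamma) * w / total + gamma / k for w in weights]


def exp3_step(weights, gamma, gain_fn, rng):
    """Play one round of EXP3 and update the weights in place.

    gain_fn(arm) is assumed to return a gain in [0, 1]; only the gain of the
    chosen arm is observed (full bandit feedback).
    """
    probs = exp3_probabilities(weights, gamma)
    arm = rng.choices(range(len(weights)), weights=probs)[0]
    gain = gain_fn(arm)
    # Importance-weighted estimate keeps the gain estimate unbiased.
    estimated_gain = gain / probs[arm]
    weights[arm] *= math.exp(gamma * estimated_gain / len(weights))
    return arm, gain


rng = random.Random(0)
weights = [1.0, 1.0, 1.0]
for _ in range(100):
    # Hypothetical environment: arm 2 always pays 1, the others pay 0.
    exp3_step(weights, 0.1, lambda arm: 1.0 if arm == 2 else 0.0, rng)
```

Because the number of paths in a graph can grow exponentially, this per-arm bookkeeping is exactly what an efficient path-based implementation has to avoid.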
Inference of the three-dimensional chromatin structure and its temporal behavior
Title | Inference of the three-dimensional chromatin structure and its temporal behavior |
Authors | Bianca-Cristina Cristescu, Zalán Borsos, John Lygeros, María Rodríguez Martínez, Maria Anna Rapsomaniki |
Abstract | Understanding the three-dimensional (3D) structure of the genome is essential for elucidating vital biological processes and their links to human disease. To determine how the genome folds within the nucleus, chromosome conformation capture methods such as HiC have recently been employed. However, computational methods that exploit the resulting high-throughput, high-resolution data are still suffering from important limitations. In this work, we explore the idea of manifold learning for the 3D chromatin structure inference and present a novel method, REcurrent Autoencoders for CHromatin 3D structure prediction (REACH-3D). Our framework employs autoencoders with recurrent neural units to reconstruct the chromatin structure. In comparison to existing methods, REACH-3D makes no transfer function assumption and permits dynamic analysis. Evaluating REACH-3D on synthetic data indicated high agreement with the ground truth. When tested on real experimental HiC data, REACH-3D recovered most faithfully the expected biological properties and obtained the highest correlation coefficient with microscopy measurements. Last, REACH-3D was applied to dynamic HiC data, where it successfully modeled chromatin conformation during the cell cycle. |
Tasks | |
Published | 2018-11-22 |
URL | http://arxiv.org/abs/1811.09619v1 |
http://arxiv.org/pdf/1811.09619v1.pdf | |
PWC | https://paperswithcode.com/paper/inference-of-the-three-dimensional-chromatin |
Repo | |
Framework | |
Rehabilitating the ColorChecker Dataset for Illuminant Estimation
Title | Rehabilitating the ColorChecker Dataset for Illuminant Estimation |
Authors | Ghalia Hemrit, Graham D. Finlayson, Arjan Gijsenij, Peter Gehler, Simone Bianco, Brian Funt, Mark Drew, Lilong Shi |
Abstract | In a previous work, it was shown that there is a curious problem with the benchmark ColorChecker dataset for illuminant estimation. To wit, this dataset has at least 3 different sets of ground-truths. Typically, for a single algorithm a single ground-truth is used. But then different algorithms, whose performance is measured with respect to different ground-truths, are compared against each other and then ranked. This makes no sense. We show in this paper that there are also errors in how each ground-truth set was calculated. As a result, all performance rankings based on the ColorChecker dataset - and there are scores of these - are inaccurate. In this paper, we re-generate a new ‘recommended’ set of ground-truth based on the calculation methodology described by Shi and Funt. We then review the performance evaluation of a range of illuminant estimation algorithms. Compared with the legacy ground-truths, we find that the difference in how algorithms perform can be large, with many local rankings of algorithms being reversed. Finally, we draw the reader's attention to our new ‘open’ data repository which, we hope, will allow the ColorChecker set to be rehabilitated and once again to become a useful benchmark for illuminant estimation algorithms. |
Tasks | |
Published | 2018-05-30 |
URL | http://arxiv.org/abs/1805.12262v3 |
http://arxiv.org/pdf/1805.12262v3.pdf | |
PWC | https://paperswithcode.com/paper/rehabilitating-the-colorchecker-dataset-for |
Repo | |
Framework | |
Improving Long-Horizon Forecasts with Expectation-Biased LSTM Networks
Title | Improving Long-Horizon Forecasts with Expectation-Biased LSTM Networks |
Authors | Aya Abdelsalam Ismail, Timothy Wood, Héctor Corrada Bravo |
Abstract | State-of-the-art forecasting methods using Recurrent Neural Networks (RNN) based on Long-Short Term Memory (LSTM) cells have shown exceptional performance targeting short-horizon forecasts, e.g., given a set of predictor features, forecast a target value for the next few time steps in the future. However, in many applications, the performance of these methods decays as the forecasting horizon extends beyond these few time steps. This paper aims to explore the challenges of long-horizon forecasting using LSTM networks. Here, we illustrate the long-horizon forecasting problem in datasets from neuroscience and energy supply management. We then propose expectation-biasing, an approach motivated by the literature of Dynamic Belief Networks, as a solution to improve long-horizon forecasting using LSTMs. We propose two LSTM architectures along with two methods for expectation biasing that significantly outperform standard practice. |
Tasks | |
Published | 2018-04-18 |
URL | http://arxiv.org/abs/1804.06776v1 |
http://arxiv.org/pdf/1804.06776v1.pdf | |
PWC | https://paperswithcode.com/paper/improving-long-horizon-forecasts-with |
Repo | |
Framework | |
Layer Trajectory LSTM
Title | Layer Trajectory LSTM |
Authors | Jinyu Li, Changliang Liu, Yifan Gong |
Abstract | It is popular to stack LSTM layers to get better modeling power, especially when a large amount of training data is available. However, an LSTM-RNN with too many vanilla LSTM layers is very hard to train, and the gradient vanishing issue still exists if the network goes too deep. This issue can be partially solved by adding skip connections between layers, such as residual LSTM. In this paper, we propose a layer trajectory LSTM (ltLSTM) which builds a layer-LSTM using all the layer outputs from a standard multi-layer time-LSTM. This layer-LSTM scans the outputs from time-LSTMs, and uses the summarized layer trajectory information for final senone classification. The forward-propagation of time-LSTM and layer-LSTM can be handled in two separate threads in parallel so that the network computation time is the same as the standard time-LSTM. With a layer-LSTM running through layers, a gated path is provided from the output layer to the bottom layer, alleviating the gradient vanishing issue. Trained with 30 thousand hours of EN-US Microsoft internal data, the proposed ltLSTM performed significantly better than the standard multi-layer LSTM and residual LSTM, with up to 9.0% relative word error rate reduction across different tasks. |
Tasks | |
Published | 2018-08-28 |
URL | http://arxiv.org/abs/1808.09522v1 |
http://arxiv.org/pdf/1808.09522v1.pdf | |
PWC | https://paperswithcode.com/paper/layer-trajectory-lstm |
Repo | |
Framework | |
Training Neural Speech Recognition Systems with Synthetic Speech Augmentation
Title | Training Neural Speech Recognition Systems with Synthetic Speech Augmentation |
Authors | Jason Li, Ravi Gadde, Boris Ginsburg, Vitaly Lavrukhin |
Abstract | Building an accurate automatic speech recognition (ASR) system requires a large dataset that contains many hours of labeled speech samples produced by a diverse set of speakers. The lack of such open free datasets is one of the main issues preventing advancements in ASR research. To address this problem, we propose to augment a natural speech dataset with synthetic speech. We train very large end-to-end neural speech recognition models using the LibriSpeech dataset augmented with synthetic speech. These new models achieve state of the art Word Error Rate (WER) for character-level based models without an external language model. |
Tasks | Language Modelling, Speech Recognition |
Published | 2018-11-02 |
URL | http://arxiv.org/abs/1811.00707v1 |
http://arxiv.org/pdf/1811.00707v1.pdf | |
PWC | https://paperswithcode.com/paper/training-neural-speech-recognition-systems |
Repo | |
Framework | |
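The Word Error Rate (WER) reported in the abstract above is conventionally defined as the word-level edit distance between a reference transcript and a hypothesis, divided by the number of reference words. The sketch below shows that standard definition; it is not the authors' evaluation code, it does no text normalization, and the function name `word_error_rate` is a hypothetical helper introduced here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# One substitution ("sat" -> "sit") and one deletion ("the") over 6 reference words.
print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Note that WER can exceed 1.0 when the hypothesis contains many inserted words, because the denominator is the reference length only.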
Planning and Learning with Stochastic Action Sets
Title | Planning and Learning with Stochastic Action Sets |
Authors | Craig Boutilier, Alon Cohen, Amit Daniely, Avinatan Hassidim, Yishay Mansour, Ofer Meshi, Martin Mladenov, Dale Schuurmans |
Abstract | In many practical uses of reinforcement learning (RL) the set of actions available at a given state is a random variable, with realizations governed by an exogenous stochastic process. Somewhat surprisingly, the foundations for such sequential decision processes have been unaddressed. In this work, we formalize and investigate MDPs with stochastic action sets (SAS-MDPs) to provide these foundations. We show that optimal policies and value functions in this model have a structure that admits a compact representation. From an RL perspective, we show that Q-learning with sampled action sets is sound. In model-based settings, we consider two important special cases: when individual actions are available with independent probabilities; and a sampling-based model for unknown distributions. We develop poly-time value and policy iteration methods for both cases; and in the first, we offer a poly-time linear programming solution. |
Tasks | Q-Learning |
Published | 2018-05-07 |
URL | http://arxiv.org/abs/1805.02363v1 |
http://arxiv.org/pdf/1805.02363v1.pdf | |
PWC | https://paperswithcode.com/paper/planning-and-learning-with-stochastic-action |
Repo | |
Framework | |
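The abstract above states that Q-learning with sampled action sets is sound. One way to picture this is a tabular Q-learning update whose bootstrap maximum ranges only over the actions that were realized as available in the next state. This is a minimal sketch under that assumption, not the paper's algorithm verbatim; the function name `sas_q_update`, its parameters, and the toy states and actions are hypothetical.

```python
from collections import defaultdict


def sas_q_update(q, state, action, reward, next_state, next_available,
                 alpha=0.1, gamma=0.9):
    """One tabular Q-learning step with a sampled (stochastic) action set.

    next_available is the set of actions realized as available in next_state;
    the max is taken over that set only (assumption made for this sketch).
    An empty set contributes a bootstrap value of 0.
    """
    best_next = max((q[(next_state, a)] for a in next_available), default=0.0)
    target = reward + gamma * best_next
    q[(state, action)] += alpha * (target - q[(state, action)])
    return q[(state, action)]


q = defaultdict(float)
q[("s1", "a")] = 1.0
q[("s1", "b")] = 5.0
# Only action "a" happens to be available in s1 this time, so "b" is ignored.
print(sas_q_update(q, "s0", "a", 1.0, "s1", ["a"], alpha=0.5, gamma=0.9))
```

With both actions available the same transition would bootstrap from the larger value of action "b", which is how the realized action set changes the learning target from one step to the next.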