Paper Group ANR 154
Papers in this group:

- REVE: Regularizing Deep Learning with Variational Entropy Bound
- Rule based Approach for Word Normalization by resolving Transcription Ambiguity in Transliterated Search Queries
- Memory-Efficient Adaptive Optimization
- Improving Semantic Parsing with Neural Generator-Reranker Architecture
- How to Evaluate the Next System: Automatic Dialogue Evaluation from the Perspective of Continual Learning
- Marpa, A practical general parser: the recognizer
- ExpertoCoder: Capturing Divergent Brain Regions Using Mixture of Regression Experts
- Back-Projection based Fidelity Term for Ill-Posed Linear Inverse Problems
- Label Dependent Deep Variational Paraphrase Generation
- Decentralized Multi-Agent Actor-Critic with Generative Inference
- Winning Isn’t Everything: Enhancing Game Development with Intelligent Agents
- Understanding Important Features of Deep Learning Models for Transmission Electron Microscopy Image Segmentation
- Distributionally Robust Optimization: A Review
- Learning Local Forward Models on Unforgiving Games
- Slanted Stixels: A way to represent steep streets
REVE: Regularizing Deep Learning with Variational Entropy Bound
| Title | REVE: Regularizing Deep Learning with Variational Entropy Bound |
|---|---|
| Authors | Antoine Saporta, Yifu Chen, Michael Blot, Matthieu Cord |
| Abstract | Studies on generalization performance of machine learning algorithms under the scope of information theory suggest that compressed representations can guarantee good generalization, inspiring many compression-based regularization methods. In this paper, we introduce REVE, a new regularization scheme. Noting that compressing the representation can be sub-optimal, our first contribution is to identify a variable that is directly responsible for the final prediction. Our method aims at compressing the class-conditioned entropy of this latter variable. Second, we introduce a variational upper bound on this conditional entropy term. Finally, we propose a scheme to instantiate a tractable loss that is integrated within the training procedure of the neural network and demonstrate its efficiency on different neural networks and datasets. |
| Tasks | |
| Published | 2019-10-15 |
| URL | https://arxiv.org/abs/1910.06816v1 |
| PDF | https://arxiv.org/pdf/1910.06816v1.pdf |
| PWC | https://paperswithcode.com/paper/reve-regularizing-deep-learning-with |
| Repo | |
| Framework | |
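As a rough illustration of the idea (not the paper's exact bound): the class-conditional entropy H(Z|Y) of a representation Z is upper-bounded by the cross-entropy against any variational density q(z|y). A minimal NumPy sketch, assuming a diagonal-Gaussian q fit per class on the batch, turns that bound into a penalty added to the task loss:

```python
import numpy as np

def variational_entropy_penalty(z, y, eps=1e-6):
    """Upper bound on the class-conditional entropy H(Z | Y) of
    representations `z` (batch, dim) with integer labels `y` (batch,).

    For any variational density q, H(Z|Y) <= E[-log q(z|y)]; here q is
    a diagonal Gaussian fit per class on the current batch.
    """
    penalty, n = 0.0, len(y)
    for c in np.unique(y):
        zc = z[y == c]                        # representations of class c
        mu, var = zc.mean(0), zc.var(0) + eps
        # per-sample negative log-density of the diagonal Gaussian
        nll = 0.5 * (np.log(2 * np.pi * var) + (zc - mu) ** 2 / var).sum(1)
        penalty += nll.sum()
    return penalty / n

# usage sketch: total_loss = task_loss + lam * variational_entropy_penalty(z, y)
```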
Rule based Approach for Word Normalization by resolving Transcription Ambiguity in Transliterated Search Queries
| Title | Rule based Approach for Word Normalization by resolving Transcription Ambiguity in Transliterated Search Queries |
|---|---|
| Authors | Varsha Pathak, Manish Joshi |
| Abstract | Matching query terms against document terms is the basic function of any best-effort Information Retrieval model, such as the Vector Space Model. In our problem of SMS-based information systems, we expect ordinary people to participate in information search. Our system allows mobile users to formulate queries in their own words, transliteration style, and spelling. To achieve this flexibility, we resolve the term-level ambiguity caused by inherent transcription noise in user query terms. We develop a rule-based approach that selects the closest relevant standard term for each noisy term in the user query, and use four versions of the algorithm that vary in their rule set. The rule set builds on the basic Levenshtein minimum edit distance algorithm for term matching. This paper presents experiments and results for a Marathi and Hindi literature information system, covering songs, gazals, powadas, bharud, and other forms in a standard transliteration scheme such as ITRANS. |
| Tasks | Information Retrieval, Transliteration |
| Published | 2019-10-16 |
| URL | https://arxiv.org/abs/1910.07233v1 |
| PDF | https://arxiv.org/pdf/1910.07233v1.pdf |
| PWC | https://paperswithcode.com/paper/rule-based-approach-for-word-normalization-by |
| Repo | |
| Framework | |
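The abstract names the Levenshtein minimum edit distance as the core of the rule set. A minimal sketch of that matching step — the lexicon, distance threshold, and tie-breaking here are illustrative assumptions, not the paper's four rule-set variants:

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming minimum edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def normalize(term: str, lexicon: list[str], max_dist: int = 2) -> str | None:
    """Map a noisy query term to the closest standard term, if close enough."""
    best = min(lexicon, key=lambda std: levenshtein(term, std))
    return best if levenshtein(term, best) <= max_dist else None

print(normalize("gazal", ["ghazal", "powada", "bharud"]))  # -> 'ghazal'
```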
Memory-Efficient Adaptive Optimization
| Title | Memory-Efficient Adaptive Optimization |
|---|---|
| Authors | Rohan Anil, Vineet Gupta, Tomer Koren, Yoram Singer |
| Abstract | Adaptive gradient-based optimizers such as Adagrad and Adam are crucial for achieving state-of-the-art performance in machine translation and language modeling. However, these methods maintain second-order statistics for each parameter, thus introducing significant memory overheads that restrict the size of the model being used as well as the number of examples in a mini-batch. We describe an effective and flexible adaptive optimization method with greatly reduced memory overhead. Our method retains the benefits of per-parameter adaptivity while allowing significantly larger models and batch sizes. We give convergence guarantees for our method, and demonstrate its effectiveness in training very large translation and language models with up to 2-fold speedups compared to the state-of-the-art. |
| Tasks | Language Modelling, Machine Translation |
| Published | 2019-01-30 |
| URL | https://arxiv.org/abs/1901.11150v2 |
| PDF | https://arxiv.org/pdf/1901.11150v2.pdf |
| PWC | https://paperswithcode.com/paper/memory-efficient-adaptive-optimization-for |
| Repo | |
| Framework | |
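This is the paper that introduced SM3. A rough NumPy sketch of the core idea for a 2-D parameter — keeping one accumulator per row and one per column (the cover sets) instead of one per entry — assuming the SM3-I-style update; names and the toy setup are illustrative:

```python
import numpy as np

def sm3_step(w, g, row_acc, col_acc, lr=0.1, eps=1e-8):
    """One SM3-style update for a 2-D parameter `w` with gradient `g`.

    Memory: O(rows + cols) second-moment accumulators instead of
    O(rows * cols), while staying per-parameter adaptive.
    """
    # per-entry second-moment estimate from the covering accumulators
    nu = np.minimum(row_acc[:, None], col_acc[None, :]) + g ** 2
    w -= lr * g / (np.sqrt(nu) + eps)
    # tighten the covers: each accumulator dominates its covered entries
    row_acc[:] = np.maximum(row_acc, nu.max(axis=1))
    col_acc[:] = np.maximum(col_acc, nu.max(axis=0))
    return w

rows, cols = 4, 6
w = np.zeros((rows, cols))
row_acc, col_acc = np.zeros(rows), np.zeros(cols)
g = np.random.randn(rows, cols)
w = sm3_step(w, g, row_acc, col_acc)
```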
Improving Semantic Parsing with Neural Generator-Reranker Architecture
| Title | Improving Semantic Parsing with Neural Generator-Reranker Architecture |
|---|---|
| Authors | Huseyin A. Inan, Gaurav Singh Tomar, Huapu Pan |
| Abstract | Semantic parsing is the problem of deriving machine-interpretable meaning representations from natural language utterances. Neural models with encoder-decoder architectures have recently achieved substantial improvements over traditional methods. Although neural semantic parsers appear to have relatively high recall using large beam sizes, there is room for improvement with respect to one-best precision. In this work, we propose a generator-reranker architecture for semantic parsing. The generator produces a list of potential candidates and the reranker, which consists of a pre-processing step for the candidates followed by a novel critic network, reranks these candidates based on the similarity between each candidate and the input sentence. Through extensive analysis, we show the advantages of this approach and how it improves parsing performance. We evaluate our model on three semantic parsing datasets (GEO, ATIS, and OVERNIGHT). The overall architecture achieves state-of-the-art results on all three datasets. |
| Tasks | Semantic Parsing |
| Published | 2019-09-27 |
| URL | https://arxiv.org/abs/1909.12764v1 |
| PDF | https://arxiv.org/pdf/1909.12764v1.pdf |
| PWC | https://paperswithcode.com/paper/improving-semantic-parsing-with-neural |
| Repo | |
| Framework | |
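A toy sketch of the generator-reranker loop: a beam of candidate parses is rescored by similarity to the input sentence and the top-scoring candidate is returned. The bag-of-words cosine below is a stand-in for the paper's pre-processing step plus neural critic, and the naturalized candidate strings are made up:

```python
from collections import Counter
import math

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def rerank(sentence: str, candidates: list[str]) -> str:
    """Pick the candidate most similar to the input sentence."""
    q = Counter(sentence.lower().split())
    return max(candidates, key=lambda c: cosine(q, Counter(c.lower().split())))

# beam of (naturalized) candidate parses from a hypothetical generator
beam = ["state borders texas", "city loc texas"]
print(rerank("which state borders texas", beam))  # -> 'state borders texas'
```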
How to Evaluate the Next System: Automatic Dialogue Evaluation from the Perspective of Continual Learning
| Title | How to Evaluate the Next System: Automatic Dialogue Evaluation from the Perspective of Continual Learning |
|---|---|
| Authors | Lu Li, Zhongheng He, Xiangyang Zhou, Dianhai Yu |
| Abstract | Automatic dialogue evaluation plays a crucial role in open-domain dialogue research. Previous works train neural networks on limited annotations for automatic dialogue evaluation, which naturally affects evaluation fairness: dialogue systems close to the scope of the training corpus are favored over the others. In this paper, we study alleviating this problem from the perspective of continual learning: given an existing neural dialogue evaluator and the next system to be evaluated, we fine-tune the learned neural evaluator by selectively forgetting/updating its parameters, to jointly fit the dialogue systems that have been and will be evaluated. Our motivation is a lifelong, low-cost automatic evaluation for dialogue systems, rather than reconstructing the evaluator over and over again. Experimental results show that our continual evaluator achieves performance comparable to reconstructing new evaluators, while requiring significantly fewer resources. |
| Tasks | Continual Learning |
| Published | 2019-12-10 |
| URL | https://arxiv.org/abs/1912.04664v1 |
| PDF | https://arxiv.org/pdf/1912.04664v1.pdf |
| PWC | https://paperswithcode.com/paper/how-to-evaluate-the-next-system-automatic |
| Repo | |
| Framework | |
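The abstract describes selectively forgetting/updating the evaluator's parameters. One common way to realize such a trade-off — used below purely as a stand-in; the paper's actual scheme may differ — is an elastic-weight-consolidation-style penalty that resists moving parameters important to previously evaluated systems:

```python
import numpy as np

def continual_finetune_step(theta, grad_new, theta_old, fisher, lr=0.01, lam=1.0):
    """One step of fine-tuning the evaluator on the next system's data.

    Adds the gradient of an EWC-style anchor lam * fisher_i * (theta_i - theta_old_i)^2:
    parameters with high Fisher importance for past systems barely move
    (selective "remembering"), low-importance ones update freely
    (selective "forgetting").  All arrays share the same shape.
    """
    grad = grad_new + 2.0 * lam * fisher * (theta - theta_old)
    return theta - lr * grad
```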
Marpa, A practical general parser: the recognizer
| Title | Marpa, A practical general parser: the recognizer |
|---|---|
| Authors | Jeffrey Kegler |
| Abstract | The Marpa recognizer is described. Marpa is a practical and fully implemented algorithm for the recognition, parsing and evaluation of context-free grammars. The Marpa recognizer is the first to unite the improvements to Earley’s algorithm found in Joop Leo’s 1991 paper with those in Aycock and Horspool’s 2002 paper. Marpa tracks the full state of the parse, as it proceeds, in a form convenient for the application. This greatly improves error detection and enables event-driven parsing. One such technique is “Ruby Slippers” parsing, in which the input is altered in response to the parser’s expectations. |
| Tasks | |
| Published | 2019-10-17 |
| URL | https://arxiv.org/abs/1910.08129v1 |
| PDF | https://arxiv.org/pdf/1910.08129v1.pdf |
| PWC | https://paperswithcode.com/paper/marpa-a-practical-general-parser-the |
| Repo | |
| Framework | |
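Marpa's recognizer is not reproduced here; as background, a bare-bones Earley recognizer — without the Leo 1991 right-recursion optimization or the Aycock–Horspool 2002 improvements that Marpa unites, and without nullable-rule handling — can be sketched as:

```python
def earley_recognize(grammar, start, tokens):
    """Minimal Earley recognizer. `grammar` maps a nonterminal to a list
    of right-hand sides (tuples of symbols); other symbols are terminals.
    An item is (lhs, rhs, dot, origin)."""
    chart = [set() for _ in range(len(tokens) + 1)]
    chart[0] = {(start, rhs, 0, 0) for rhs in grammar[start]}
    for i in range(len(tokens) + 1):
        added = True
        while added:                          # predictor/completer to fixpoint
            added = False
            for lhs, rhs, dot, origin in list(chart[i]):
                if dot < len(rhs) and rhs[dot] in grammar:        # predict
                    for prod in grammar[rhs[dot]]:
                        item = (rhs[dot], prod, 0, i)
                        if item not in chart[i]:
                            chart[i].add(item); added = True
                elif dot == len(rhs):                             # complete
                    for l2, r2, d2, o2 in list(chart[origin]):
                        if d2 < len(r2) and r2[d2] == lhs:
                            item = (l2, r2, d2 + 1, o2)
                            if item not in chart[i]:
                                chart[i].add(item); added = True
        if i < len(tokens):                                       # scan
            for lhs, rhs, dot, origin in chart[i]:
                if dot < len(rhs) and rhs[dot] == tokens[i]:
                    chart[i + 1].add((lhs, rhs, dot + 1, origin))
    return any(lhs == start and dot == len(rhs) and origin == 0
               for lhs, rhs, dot, origin in chart[-1])

g = {"S": [("S", "+", "n"), ("n",)]}
print(earley_recognize(g, "S", ["n", "+", "n"]))  # -> True
```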
ExpertoCoder: Capturing Divergent Brain Regions Using Mixture of Regression Experts
| Title | ExpertoCoder: Capturing Divergent Brain Regions Using Mixture of Regression Experts |
|---|---|
| Authors | Subba Reddy Oota, Naresh Manwani, Raju S. Bapi |
| Abstract | fMRI semantic category understanding using linguistic encoding models attempts to learn a forward mapping that relates stimuli to the corresponding brain activation. Classical encoding models use linear multivariate methods to predict brain activation (all the voxels) given the stimulus. However, these methods mainly assume multiple regions as one vast uniform region or several independent regions, ignoring connections among them. In this paper, we present a mixture of experts model for predicting brain activity patterns. Given a new stimulus, the model predicts the entire brain activation as a weighted linear combination of activation of multiple experts. We argue that each expert captures activity patterns related to a particular region of interest (ROI) in the human brain. Thus, the utility of the proposed model is twofold. It not only accurately predicts the brain activation for a given stimulus, but it also reveals the level of activation of individual brain regions. Results of our experiments highlight the importance of the proposed model for predicting brain activation. This study also helps in understanding which of the brain regions get activated together, given a certain kind of stimulus. Importantly, we suggest that the mixture of regression experts (MoRE) framework successfully combines the two principles of organization of function in the brain, namely that of specialization and integration. |
| Tasks | |
| Published | 2019-09-26 |
| URL | https://arxiv.org/abs/1909.12299v1 |
| PDF | https://arxiv.org/pdf/1909.12299v1.pdf |
| PWC | https://paperswithcode.com/paper/expertocoder-capturing-divergent-brain |
| Repo | |
| Framework | |
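The prediction rule of a mixture of regression experts is compact enough to sketch: a softmax gate weights linear experts, and the gate values double as a readout of which (ROI-aligned) expert is active. The shapes and the softmax gating form are assumptions, not the paper's exact parameterization:

```python
import numpy as np

def more_predict(x, experts, gate_W, gate_b):
    """Mixture-of-regression-experts prediction of voxel activations.

    x       : (d,) stimulus features
    experts : list of (W_k, b_k) pairs, each mapping features -> all voxels
    gate_W  : (K, d), gate_b : (K,) softmax gating parameters
    Returns the weighted combination and the per-expert gate weights,
    which indicate how strongly each expert (read: ROI) responds.
    """
    logits = gate_W @ x + gate_b
    g = np.exp(logits - logits.max())
    g /= g.sum()                                       # softmax gate
    y_hat = sum(g_k * (W @ x + b) for g_k, (W, b) in zip(g, experts))
    return y_hat, g
```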
Back-Projection based Fidelity Term for Ill-Posed Linear Inverse Problems
| Title | Back-Projection based Fidelity Term for Ill-Posed Linear Inverse Problems |
|---|---|
| Authors | Tom Tirer, Raja Giryes |
| Abstract | Ill-posed linear inverse problems appear in many image processing applications, such as deblurring, super-resolution and compressed sensing. Many restoration strategies involve minimizing a cost function, which is composed of fidelity and prior terms, balanced by a regularization parameter. While a vast amount of research has been focused on different prior models, the fidelity term is almost always chosen to be the least squares (LS) objective, which encourages fitting the linearly transformed optimization variable to the observations. In this paper, we examine a different fidelity term, which has been implicitly used by the recently proposed iterative denoising and backward projections (IDBP) framework. This term encourages agreement between the projection of the optimization variable onto the row space of the linear operator and the pseudo-inverse of the linear operator (“back-projection”) applied on the observations. We analytically examine the difference between the two fidelity terms for Tikhonov regularization and identify cases (such as a badly conditioned linear operator) where the new term has an advantage over the standard LS one. Moreover, we demonstrate empirically that the behavior of the two induced cost functions for sophisticated convex and non-convex priors, such as total-variation, BM3D, and deep generative models, correlates with the obtained theoretical analysis. |
| Tasks | Deblurring, Denoising, Super-Resolution |
| Published | 2019-06-16 |
| URL | https://arxiv.org/abs/1906.06794v2 |
| PDF | https://arxiv.org/pdf/1906.06794v2.pdf |
| PWC | https://paperswithcode.com/paper/back-projection-based-fidelity-term-for-ill |
| Repo | |
| Framework | |
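The two fidelity terms compare directly: least squares penalizes ||y - Ax||^2, while the back-projection term penalizes ||A^+ y - A^+ A x||^2, where A^+ is the pseudo-inverse and A^+ A projects onto the row space of A. A NumPy sketch for a small, badly conditioned Tikhonov-regularized problem — the dimensions, conditioning, and noise level are arbitrary choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, lam = 30, 50, 1e-2
A = rng.standard_normal((m, n)) * np.logspace(0, -3, m)[:, None]  # badly conditioned
x_true = rng.standard_normal(n)
y = A @ x_true + 1e-3 * rng.standard_normal(m)

# least-squares fidelity:    min ||y - A x||^2 + lam ||x||^2
x_ls = np.linalg.solve(A.T @ A + lam * np.eye(n), A.T @ y)

# back-projection fidelity:  min ||A_pinv (y - A x)||^2 + lam ||x||^2
A_pinv = np.linalg.pinv(A)
P = A_pinv @ A                      # projection onto the row space of A
x_bp = np.linalg.solve(P.T @ P + lam * np.eye(n), P.T @ (A_pinv @ y))

for name, x in [("LS", x_ls), ("BP", x_bp)]:
    print(name, "recovery error:", np.linalg.norm(x - x_true))
```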
Label Dependent Deep Variational Paraphrase Generation
| Title | Label Dependent Deep Variational Paraphrase Generation |
|---|---|
| Authors | Siamak Shakeri, Abhinav Sethy |
| Abstract | Generating paraphrases that are lexically similar but semantically different is a challenging task. Paraphrases of this form can be used to augment data sets for various NLP tasks such as machine reading comprehension and question answering with non-trivial negative examples. In this article, we propose a deep variational model to generate paraphrases conditioned on a label that specifies whether the paraphrases are semantically related or not. We also present new training recipes and KL regularization techniques that improve the performance of variational paraphrasing models. Our proposed model demonstrates promising results in enhancing the generative power of the model by employing label-dependent generation on paraphrasing datasets. |
| Tasks | Machine Reading Comprehension, Paraphrase Generation, Question Answering, Reading Comprehension |
| Published | 2019-11-27 |
| URL | https://arxiv.org/abs/1911.11952v1 |
| PDF | https://arxiv.org/pdf/1911.11952v1.pdf |
| PWC | https://paperswithcode.com/paper/label-dependent-deep-variational-paraphrase |
| Repo | |
| Framework | |
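A minimal sketch of the kind of objective involved: a conditional-VAE loss with a linearly annealed KL weight, one common KL regularization technique. The diagonal-Gaussian posterior and the annealing schedule are assumptions; the paper's specific recipes are not reproduced here:

```python
import numpy as np

def kl_diag_gauss(mu, logvar):
    """KL( N(mu, diag(exp(logvar))) || N(0, I) ), per example."""
    return 0.5 * np.sum(np.exp(logvar) + mu ** 2 - 1.0 - logvar, axis=-1)

def cvae_loss(recon_nll, mu, logvar, step, anneal_steps=10_000):
    """Label-dependent CVAE objective: reconstruction + annealed KL.

    The label enters through the encoder/decoder that produced
    `recon_nll`, `mu`, `logvar`; annealing the KL weight from 0 to 1
    is one standard trick to keep the posterior from collapsing early.
    """
    beta = min(1.0, step / anneal_steps)      # linear KL annealing
    return np.mean(recon_nll + beta * kl_diag_gauss(mu, logvar))
```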
Decentralized Multi-Agent Actor-Critic with Generative Inference
| Title | Decentralized Multi-Agent Actor-Critic with Generative Inference |
|---|---|
| Authors | Kevin Corder, Manuel M. Vindiola, Keith Decker |
| Abstract | Recent multi-agent actor-critic methods have utilized centralized training with decentralized execution to address the non-stationarity of co-adapting agents. This training paradigm constrains learning to the centralized phase, so that only pre-learned policies may be used during the decentralized phase, which performs poorly when agent communications are delayed, noisy, or disrupted. In this work, we propose a new system that can gracefully handle partially-observable information due to communication disruptions during decentralized execution. Our approach augments the multi-agent actor-critic method’s centralized training phase with generative modeling, so that agents may infer other agents’ observations when provided with locally available context. We evaluate our method on three tasks that require agents to combine local and remote observations communicated by other agents, introducing partial observability during decentralized execution, and show that decentralized training on inferred observations performs as well as or better than existing actor-critic methods. |
| Tasks | |
| Published | 2019-10-07 |
| URL | https://arxiv.org/abs/1910.03058v1 |
| PDF | https://arxiv.org/pdf/1910.03058v1.pdf |
| PWC | https://paperswithcode.com/paper/decentralized-multi-agent-actor-critic-with |
| Repo | |
| Framework | |
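Schematically, the execution-time logic amounts to: use a teammate's observation when it arrives, otherwise substitute a sample from a generative model conditioned on locally available context. The `infer_model.sample` interface below is hypothetical, standing in for the paper's generative model:

```python
def build_joint_observation(agent_id, local_obs, received, infer_model, n_agents):
    """Assemble the observation vector an agent's policy expects.

    received    : dict agent_id -> observation; entries are missing
                  when communication is delayed, noisy, or disrupted
    infer_model : object with .sample(context) -> observation
                  (hypothetical interface for the generative model)
    """
    joint = [local_obs]
    for j in range(n_agents):
        if j == agent_id:
            continue                                   # own obs already included
        if j in received:
            joint.append(received[j])                  # real, communicated obs
        else:
            joint.append(infer_model.sample(context=local_obs))  # inferred obs
    return joint
```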
Winning Isn’t Everything: Enhancing Game Development with Intelligent Agents
| Title | Winning Isn’t Everything: Enhancing Game Development with Intelligent Agents |
|---|---|
| Authors | Yunqi Zhao, Igor Borovikov, Fernando de Mesentier Silva, Ahmad Beirami, Jason Rupert, Caedmon Somers, Jesse Harder, John Kolen, Jervis Pinto, Reza Pourabolghasem, James Pestrak, Harold Chaput, Mohsen Sardari, Long Lin, Sundeep Narravula, Navid Aghdaie, Kazi Zaman |
| Abstract | Recently, there have been several high-profile achievements of agents learning to play games against humans and beat them. In this paper, we study the problem of training intelligent agents in service of game development. Unlike the agents built to “beat the game”, our agents aim to produce human-like behavior to help with game evaluation and balancing. We discuss two fundamental metrics based on which we measure the human-likeness of agents, namely skill and style, which are multi-faceted concepts with practical implications outlined in this paper. We report four case studies in which the style and skill requirements inform the choice of algorithms and metrics used to train agents, ranging from A* search to state-of-the-art deep reinforcement learning. We further show that the learning potential of state-of-the-art deep RL models does not seamlessly transfer from the benchmark environments to target ones without heavily tuning their hyperparameters, leading to linear scaling of the engineering efforts and computational cost with the number of target domains. |
| Tasks | |
| Published | 2019-03-25 |
| URL | https://arxiv.org/abs/1903.10545v3 |
| PDF | https://arxiv.org/pdf/1903.10545v3.pdf |
| PWC | https://paperswithcode.com/paper/winning-isnt-everything-training-human-like |
| Repo | |
| Framework | |
Understanding Important Features of Deep Learning Models for Transmission Electron Microscopy Image Segmentation
| Title | Understanding Important Features of Deep Learning Models for Transmission Electron Microscopy Image Segmentation |
|---|---|
| Authors | James P. Horwath, Dmitri N. Zakharov, Remi Megret, Eric A. Stach |
| Abstract | Cutting-edge deep learning techniques allow for image segmentation with great speed and accuracy. However, application to problems in materials science is often difficult since these complex models may have difficulty learning physical parameters. In situ electron microscopy provides a clear platform for utilizing automated image analysis. In this work we consider the case of studying coarsening dynamics in supported nanoparticles, which is important for understanding, e.g., the degradation of industrial catalysts. By systematically studying dataset preparation, neural network architecture, and accuracy evaluation, we describe important considerations in applying deep learning to physical applications, where generalizable and convincing models are required. |
| Tasks | Electron Microscopy Image Segmentation, Semantic Segmentation |
| Published | 2019-12-12 |
| URL | https://arxiv.org/abs/1912.06077v1 |
| PDF | https://arxiv.org/pdf/1912.06077v1.pdf |
| PWC | https://paperswithcode.com/paper/understanding-important-features-of-deep |
| Repo | |
| Framework | |
Distributionally Robust Optimization: A Review
| Title | Distributionally Robust Optimization: A Review |
|---|---|
| Authors | Hamed Rahimian, Sanjay Mehrotra |
| Abstract | The concepts of risk-aversion, chance-constrained optimization, and robust optimization have developed significantly over the last decade. The statistical learning community has likewise seen rapid theoretical and applied growth built on these concepts. A modeling framework, called distributionally robust optimization (DRO), has recently received significant attention in both the operations research and statistical learning communities. This paper surveys the main concepts of and contributions to DRO, and its relationships with robust optimization, risk-aversion, chance-constrained optimization, and function regularization. |
| Tasks | |
| Published | 2019-08-13 |
| URL | https://arxiv.org/abs/1908.05659v1 |
| PDF | https://arxiv.org/pdf/1908.05659v1.pdf |
| PWC | https://paperswithcode.com/paper/distributionally-robust-optimization-a-review |
| Repo | |
| Framework | |
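Generically, DRO solves min over theta of sup over Q in an ambiguity set U of E_Q[loss]. For one concrete choice of U — reweightings of the empirical sample with weights capped at 1/(alpha·n), which recovers CVaR at level alpha — the inner supremum has a closed form: the mean of the worst alpha-fraction of per-sample losses. A small sketch of that worst case (this is one ambiguity set among the many the review covers):

```python
import numpy as np

def dro_cvar_loss(losses, alpha=0.1):
    """Worst-case expected loss over the ambiguity set
    U = { q : 0 <= q_i <= 1/(alpha*n), sum_i q_i = 1 }  (CVaR_alpha),
    i.e. the mean of the worst alpha-fraction of per-sample losses."""
    k = max(1, int(np.ceil(alpha * len(losses))))
    worst = np.sort(losses)[-k:]          # the k largest losses
    return worst.mean()

losses = np.array([0.1, 0.2, 0.3, 5.0, 0.4])
print(dro_cvar_loss(losses, alpha=0.4))   # mean of the 2 worst -> 2.7
```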
Learning Local Forward Models on Unforgiving Games
| Title | Learning Local Forward Models on Unforgiving Games |
|---|---|
| Authors | Alexander Dockhorn, Simon M. Lucas, Vanessa Volz, Ivan Bravi, Raluca D. Gaina, Diego Perez-Liebana |
| Abstract | This paper examines learning approaches for forward models based on local cell transition functions. We provide a formal definition of local forward models, for which we propose two basic learning approaches. Our analysis is based on the game Sokoban, where a wrong action can lead to an unsolvable game state; an accurate prediction of an action’s resulting state is therefore necessary to avoid this scenario. In contrast to learning the complete state transition function, local forward models allow extracting multiple training examples from a single state transition. In this way, both the Hash Set model and the Decision Tree model quickly learn to predict upcoming state transitions on both the training and the test set. Applying the models with a statistical forward planner showed that the best models can be used to a satisfying degree even on test levels that have not been seen before. Our evaluation includes an analysis of various local neighbourhood patterns and sizes, to test the learners’ capabilities when too few or too many attributes are extracted; the latter has been shown to degrade the performance of the model learner. |
| Tasks | |
| Published | 2019-09-01 |
| URL | https://arxiv.org/abs/1909.00442v1 |
| PDF | https://arxiv.org/pdf/1909.00442v1.pdf |
| PWC | https://paperswithcode.com/paper/learning-local-forward-models-on-unforgiving |
| Repo | |
| Framework | |
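The Hash Set model from the abstract is essentially a lookup table from (local neighbourhood pattern, action) to the next state of the centre cell — which is why a single state transition yields one training example per cell. A sketch assuming a 3x3 neighbourhood and an "unchanged cell" fallback for unseen patterns (both assumptions, not the paper's exact configuration):

```python
def neighbourhood(grid, r, c, pad="#"):
    """3x3 local pattern around (r, c), padded at the borders."""
    rows, cols = len(grid), len(grid[0])
    return tuple(grid[i][j] if 0 <= i < rows and 0 <= j < cols else pad
                 for i in range(r - 1, r + 2) for j in range(c - 1, c + 2))

class HashSetForwardModel:
    """Local forward model: memorize observed (pattern, action) -> next cell."""
    def __init__(self):
        self.table = {}

    def train(self, grid, action, next_grid):
        # one training example per cell, all from a single transition
        for r in range(len(grid)):
            for c in range(len(grid[0])):
                self.table[(neighbourhood(grid, r, c), action)] = next_grid[r][c]

    def predict(self, grid, action):
        # unseen patterns fall back to "cell unchanged"
        return [[self.table.get((neighbourhood(grid, r, c), action), grid[r][c])
                 for c in range(len(grid[0]))] for r in range(len(grid))]
```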
Slanted Stixels: A way to represent steep streets
| Title | Slanted Stixels: A way to represent steep streets |
|---|---|
| Authors | Daniel Hernandez-Juarez, Lukas Schneider, Pau Cebrian, Antonio Espinosa, David Vazquez, Antonio M. Lopez, Uwe Franke, Marc Pollefeys, Juan C. Moure |
| Abstract | This work presents and evaluates a novel compact scene representation based on Stixels that infers geometric and semantic information. Our approach overcomes the previous rather restrictive geometric assumptions for Stixels by introducing a novel depth model to account for non-flat roads and slanted objects. Both semantic and depth cues are used jointly to infer the scene representation in a sound global energy minimization formulation. Furthermore, a novel approximation scheme is introduced in order to significantly reduce the computational complexity of the Stixel algorithm, and then achieve real-time computation capabilities. The idea is to first perform an over-segmentation of the image, discarding the unlikely Stixel cuts, and apply the algorithm only on the remaining Stixel cuts. This work presents a novel over-segmentation strategy based on a Fully Convolutional Network (FCN), which outperforms an approach based on using local extrema of the disparity map. We evaluate the proposed methods in terms of semantic and geometric accuracy as well as run-time on four publicly available benchmark datasets. Our approach maintains accuracy on flat road scene datasets while improving substantially on a novel non-flat road dataset. |
| Tasks | |
| Published | 2019-10-02 |
| URL | https://arxiv.org/abs/1910.01466v1 |
| PDF | https://arxiv.org/pdf/1910.01466v1.pdf |
| PWC | https://paperswithcode.com/paper/slanted-stixels-a-way-to-represent-steep |
| Repo | |
| Framework | |
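The "slanted" depth model amounts to letting disparity vary linearly with the image row inside a stixel instead of being constant (the constant-depth case is the slope-zero special case). A least-squares sketch of fitting that model to one column segment — the data and interface are illustrative, not the paper's energy-minimization formulation:

```python
import numpy as np

def fit_slanted_stixel(disparities, v_top, v_bottom):
    """Fit disparity d(v) = a*v + b over image rows [v_top, v_bottom).

    A classic constant-depth stixel is the special case a = 0; a
    nonzero slope models slanted surfaces such as non-flat roads."""
    v = np.arange(v_top, v_bottom, dtype=float)
    d = np.asarray(disparities[v_top:v_bottom], dtype=float)
    A = np.stack([v, np.ones_like(v)], axis=1)       # design matrix [v, 1]
    (a, b), *_ = np.linalg.lstsq(A, d, rcond=None)
    return a, b

disp_column = [10.0, 10.5, 11.0, 11.5, 12.0]     # a slanted road-like segment
print(fit_slanted_stixel(disp_column, 0, 5))      # ~ (0.5, 10.0)
```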