February 2, 2020

Paper Group AWR 40

Probing Natural Language Inference Models through Semantic Fragments

Title Probing Natural Language Inference Models through Semantic Fragments
Authors Kyle Richardson, Hai Hu, Lawrence S. Moss, Ashish Sabharwal
Abstract Do state-of-the-art models for language understanding already have, or can they easily learn, abilities such as boolean coordination, quantification, conditionals, comparatives, and monotonicity reasoning (i.e., reasoning about word substitutions in sentential contexts)? While such phenomena are involved in natural language inference (NLI) and go beyond basic linguistic understanding, it is unclear the extent to which they are captured in existing NLI benchmarks and effectively learned by models. To investigate this, we propose the use of semantic fragments—systematically generated datasets that each target a different semantic phenomenon—for probing, and efficiently improving, such capabilities of linguistic models. This approach to creating challenge datasets allows direct control over the semantic diversity and complexity of the targeted linguistic phenomena, and results in a more precise characterization of a model’s linguistic behavior. Our experiments, using a library of 8 such semantic fragments, reveal two remarkable findings: (a) State-of-the-art models, including BERT, that are pre-trained on existing NLI benchmark datasets perform poorly on these new fragments, even though the phenomena probed here are central to the NLI task. (b) On the other hand, with only a few minutes of additional fine-tuning—with a carefully selected learning rate and a novel variation of “inoculation”—a BERT-based model can master all of these logic and monotonicity fragments while retaining its performance on established NLI benchmarks.
Tasks Natural Language Inference
Published 2019-09-16
URL https://arxiv.org/abs/1909.07521v2
PDF https://arxiv.org/pdf/1909.07521v2.pdf
PWC https://paperswithcode.com/paper/probing-natural-language-inference-models
Repo https://github.com/yakazimir/semantic_fragments
Framework pytorch

Facet-Aware Evaluation for Extractive Text Summarization

Title Facet-Aware Evaluation for Extractive Text Summarization
Authors Yuning Mao, Liyuan Liu, Qi Zhu, Xiang Ren, Jiawei Han
Abstract Commonly adopted metrics for extractive text summarization like ROUGE focus on the lexical similarity and are facet-agnostic. In this paper, we present a facet-aware evaluation procedure for better assessment of the information coverage in extracted summaries while still supporting automatic evaluation once annotated. Specifically, we treat facet instead of token as the basic unit for evaluation, manually annotate the support sentences for each facet, and directly evaluate extractive methods by comparing the indices of extracted sentences with support sentences. We demonstrate the benefits of the proposed setup by performing a thorough quantitative investigation on the CNN/Daily Mail dataset, which in the meantime reveals useful insights of state-of-the-art summarization methods. Data can be found at https://github.com/morningmoni/FAR.
Tasks Text Summarization
Published 2019-08-27
URL https://arxiv.org/abs/1908.10383v1
PDF https://arxiv.org/pdf/1908.10383v1.pdf
PWC https://paperswithcode.com/paper/facet-aware-evaluation-for-extractive-text
Repo https://github.com/morningmoni/FAR
Framework none
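
The index comparison described in the abstract above can be illustrated with a minimal Python sketch. It assumes each facet is annotated with a set of support sentence indices and counts as covered when at least one of them was extracted; the function name `facet_recall` and the data layout are hypothetical and are not taken from the FAR repository.

```python
from typing import Dict, Iterable, Set


def facet_recall(facet_support: Dict[str, Set[int]], extracted: Iterable[int]) -> float:
    """Fraction of facets with at least one support sentence among the extracted indices."""
    extracted_set = set(extracted)
    if not facet_support:
        return 0.0
    # Assumption: a single matching support sentence is enough to cover a facet.
    covered = sum(1 for support in facet_support.values() if support & extracted_set)
    return covered / len(facet_support)


# Hypothetical annotation: three facets of one reference summary.
facets = {"who": {0, 4}, "what": {2}, "when": {7}}
score = facet_recall(facets, extracted=[0, 2, 5])
print(score)
```

Because only sentence indices are compared, a score of this kind can be recomputed for any extractive system without further annotation.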

SQIL: Imitation Learning via Reinforcement Learning with Sparse Rewards

Title SQIL: Imitation Learning via Reinforcement Learning with Sparse Rewards
Authors Siddharth Reddy, Anca D. Dragan, Sergey Levine
Abstract Learning to imitate expert behavior from demonstrations can be challenging, especially in environments with high-dimensional, continuous observations and unknown dynamics. Supervised learning methods based on behavioral cloning (BC) suffer from distribution shift: because the agent greedily imitates demonstrated actions, it can drift away from demonstrated states due to error accumulation. Recent methods based on reinforcement learning (RL), such as inverse RL and generative adversarial imitation learning (GAIL), overcome this issue by training an RL agent to match the demonstrations over a long horizon. Since the true reward function for the task is unknown, these methods learn a reward function from the demonstrations, often using complex and brittle approximation techniques that involve adversarial training. We propose a simple alternative that still uses RL, but does not require learning a reward function. The key idea is to provide the agent with an incentive to match the demonstrations over a long horizon, by encouraging it to return to demonstrated states upon encountering new, out-of-distribution states. We accomplish this by giving the agent a constant reward of r=+1 for matching the demonstrated action in a demonstrated state, and a constant reward of r=0 for all other behavior. Our method, which we call soft Q imitation learning (SQIL), can be implemented with a handful of minor modifications to any standard Q-learning or off-policy actor-critic algorithm. Theoretically, we show that SQIL can be interpreted as a regularized variant of BC that uses a sparsity prior to encourage long-horizon imitation. Empirically, we show that SQIL outperforms BC and achieves competitive results compared to GAIL, on a variety of image-based and low-dimensional tasks in Box2D, Atari, and MuJoCo.
Tasks Imitation Learning, Q-Learning
Published 2019-05-27
URL https://arxiv.org/abs/1905.11108v3
PDF https://arxiv.org/pdf/1905.11108v3.pdf
PWC https://paperswithcode.com/paper/sqil-imitation-learning-via-regularized
Repo https://github.com/dnishio/DSAC
Framework none
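
The constant-reward scheme in the abstract above amounts to relabeling transitions before they reach an off-policy learner. The following is a rough sketch under stated assumptions: the `Transition` tuple, the helper names, and the half-and-half batch split are illustrative choices, not code from the authors or from the linked repository.

```python
import random
from typing import List, NamedTuple, Sequence, Tuple


class Transition(NamedTuple):
    state: Tuple[float, ...]
    action: int
    next_state: Tuple[float, ...]
    reward: float


def relabel(transitions: Sequence[Transition], reward: float) -> List[Transition]:
    """Overwrite whatever reward was recorded with a constant."""
    return [t._replace(reward=reward) for t in transitions]


def sample_batch(demo: Sequence[Transition], agent: Sequence[Transition],
                 batch_size: int, rng: random.Random) -> List[Transition]:
    """Mix demonstration transitions (r=1) with agent transitions (r=0).

    Assumption: half of each batch comes from each source.
    """
    half = batch_size // 2
    demo_part = rng.choices(relabel(demo, 1.0), k=half)
    agent_part = rng.choices(relabel(agent, 0.0), k=batch_size - half)
    return demo_part + agent_part


demo = [Transition((0.0,), 1, (1.0,), 0.0)]
agent = [Transition((0.0,), 0, (0.5,), 5.0)]  # the environment reward 5.0 is discarded
batch = sample_batch(demo, agent, 4, random.Random(0))
```

The resulting batch can be passed to any standard Q-learning update; the learner never sees the environment's own reward.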

Conditional Single-view Shape Generation for Multi-view Stereo Reconstruction

Title Conditional Single-view Shape Generation for Multi-view Stereo Reconstruction
Authors Yi Wei, Shaohui Liu, Wang Zhao, Jiwen Lu, Jie Zhou
Abstract In this paper, we present a new perspective towards image-based shape generation. Most existing deep learning based shape reconstruction methods employ a single-view deterministic model which is sometimes insufficient to determine a single groundtruth shape because the back part is occluded. In this work, we first introduce a conditional generative network to model the uncertainty for single-view reconstruction. Then, we formulate the task of multi-view reconstruction as taking the intersection of the predicted shape spaces on each single image. We design new differentiable guidance including the front constraint, the diversity constraint, and the consistency loss to enable effective single-view conditional generation and multi-view synthesis. Experimental results and ablation studies show that our proposed approach outperforms state-of-the-art methods on 3D reconstruction test error and demonstrate its generalization ability on real world data.
Tasks 3D Reconstruction
Published 2019-04-14
URL http://arxiv.org/abs/1904.06699v2
PDF http://arxiv.org/pdf/1904.06699v2.pdf
PWC https://paperswithcode.com/paper/conditional-single-view-shape-generation-for
Repo https://github.com/weiyithu/OptimizeMVS
Framework tf

Practical Lossless Compression with Latent Variables using Bits Back Coding

Title Practical Lossless Compression with Latent Variables using Bits Back Coding
Authors James Townsend, Tom Bird, David Barber
Abstract Deep latent variable models have seen recent success in many data domains. Lossless compression is an application of these models which, despite having the potential to be highly useful, has yet to be implemented in a practical manner. We present ‘Bits Back with ANS’ (BB-ANS), a scheme to perform lossless compression with latent variable models at a near optimal rate. We demonstrate this scheme by using it to compress the MNIST dataset with a variational auto-encoder model (VAE), achieving compression rates superior to standard methods with only a simple VAE. Given that the scheme is highly amenable to parallelization, we conclude that with a sufficiently high quality generative model this scheme could be used to achieve substantial improvements in compression rate with acceptable running time. We make our implementation available open source at https://github.com/bits-back/bits-back.
Tasks Latent Variable Models
Published 2019-01-15
URL http://arxiv.org/abs/1901.04866v1
PDF http://arxiv.org/pdf/1901.04866v1.pdf
PWC https://paperswithcode.com/paper/practical-lossless-compression-with-latent
Repo https://github.com/fhkingma/bitswap
Framework pytorch

A Real-time Global Inference Network for One-stage Referring Expression Comprehension

Title A Real-time Global Inference Network for One-stage Referring Expression Comprehension
Authors Yiyi Zhou, Rongrong Ji, Gen Luo, Xiaoshuai Sun, Jinsong Su, Xinghao Ding, Chia-wen Lin, Qi Tian
Abstract Referring Expression Comprehension (REC) is an emerging research spot in computer vision, which refers to detecting the target region in an image given a text description. Most existing REC methods follow a multi-stage pipeline, which is computationally expensive and greatly limits the application of REC. In this paper, we propose a one-stage model towards real-time REC, termed Real-time Global Inference Network (RealGIN). RealGIN addresses the diversity and complexity issues in REC with two innovative designs: the Adaptive Feature Selection (AFS) and the Global Attentive ReAsoNing unit (GARAN). AFS adaptively fuses features at different semantic levels to handle the varying content of expressions. GARAN uses the textual feature as a pivot to collect expression-related visual information from all regions, and then selectively diffuses such information back to all regions, which provides sufficient context for modeling the complex linguistic conditions in expressions. On five benchmark datasets, i.e., RefCOCO, RefCOCO+, RefCOCOg, ReferIt and Flickr30k, the proposed RealGIN outperforms most prior works and achieves very competitive performance against the most advanced method, i.e., MAttNet. Most importantly, under the same hardware, RealGIN can boost the processing speed by about 10 times over the existing methods.
Tasks Feature Selection
Published 2019-12-07
URL https://arxiv.org/abs/1912.03478v1
PDF https://arxiv.org/pdf/1912.03478v1.pdf
PWC https://paperswithcode.com/paper/a-real-time-global-inference-network-for-one
Repo https://github.com/luogen1996/Real-time-Global-Inference-Network
Framework pytorch

Improving Missing Data Imputation with Deep Generative Models

Title Improving Missing Data Imputation with Deep Generative Models
Authors Ramiro D. Camino, Christian A. Hammerschmidt, Radu State
Abstract Datasets with missing values are very common in industry applications, and they can have a negative impact on machine learning models. Recent studies introduced solutions to the problem of imputing missing values based on deep generative models. Previous experiments with Generative Adversarial Networks and Variational Autoencoders showed interesting results in this domain, but it is not clear which method is preferable for different use cases. The goal of this work is twofold: we present a comparison between missing data imputation solutions based on deep generative models, and we propose improvements over those methodologies. We run our experiments using known real life datasets with different characteristics, removing values at random and reconstructing them with several imputation techniques. Our results show that the presence or absence of categorical variables can alter the selection of the best model, and that some models are more stable than others after similar runs with different random number generator seeds.
Tasks Imputation
Published 2019-02-27
URL http://arxiv.org/abs/1902.10666v1
PDF http://arxiv.org/pdf/1902.10666v1.pdf
PWC https://paperswithcode.com/paper/improving-missing-data-imputation-with-deep
Repo https://github.com/rcamino/multi-categorical-gans
Framework pytorch

Aggregation Cross-Entropy for Sequence Recognition

Title Aggregation Cross-Entropy for Sequence Recognition
Authors Zecheng Xie, Yaoxiong Huang, Yuanzhi Zhu, Lianwen Jin, Yuliang Liu, Lele Xie
Abstract In this paper, we propose a novel method, aggregation cross-entropy (ACE), for sequence recognition from a brand new perspective. The ACE loss function exhibits competitive performance to CTC and the attention mechanism, with much quicker implementation (as it involves only four fundamental formulas), faster inference/back-propagation (approximately O(1) in parallel), less storage requirement (no parameter and negligible runtime memory), and convenient employment (by replacing CTC with ACE). Furthermore, the proposed ACE loss function exhibits two noteworthy properties: (1) it can be directly applied for 2D prediction by flattening the 2D prediction into 1D prediction as the input and (2) it requires only characters and their numbers in the sequence annotation for supervision, which allows it to advance beyond sequence recognition, e.g., counting problem. The code is publicly available at https://github.com/summerlvsong/Aggregation-Cross-Entropy.
Tasks
Published 2019-04-17
URL http://arxiv.org/abs/1904.08364v2
PDF http://arxiv.org/pdf/1904.08364v2.pdf
PWC https://paperswithcode.com/paper/aggregation-cross-entropy-for-sequence
Repo https://github.com/summerlvsong/Aggregation-Cross-Entropy
Framework pytorch
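
To make the "characters and their numbers" supervision in the abstract above concrete, here is a small pure-Python sketch of a count-based cross-entropy. It assumes per-step class probabilities are summed over time, normalized by the sequence length, and compared with normalized character counts, with unused steps attributed to a blank class. The function name, the blank index, and the epsilon are assumptions for illustration; the linked repository holds the authors' actual PyTorch code.

```python
import math
from collections import Counter
from typing import Sequence


def ace_loss(probs: Sequence[Sequence[float]], label: Sequence[int], blank: int = 0) -> float:
    """Cross-entropy between normalized class counts and time-aggregated probabilities."""
    steps = len(probs)
    num_classes = len(probs[0])
    counts = Counter(label)
    counts[blank] = steps - len(label)  # assumption: leftover steps belong to the blank class
    loss = 0.0
    for k in range(num_classes):
        aggregated = sum(step[k] for step in probs) / steps  # normalized predicted count
        target = counts.get(k, 0) / steps                    # normalized annotated count
        if target > 0:
            loss -= target * math.log(aggregated + 1e-10)    # epsilon guards against log(0)
    return loss


# Three time steps, classes {0: blank, 1, 2}; the annotation only says "one 1 and one 2".
prediction = [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [1.0, 0.0, 0.0]]
loss_in_order = ace_loss(prediction, label=[1, 2])
loss_shuffled = ace_loss(prediction[::-1], label=[1, 2])
```

Because only aggregated counts enter the loss, reordering the time steps leaves it unchanged, which is why no alignment between prediction and annotation is needed.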

A Degeneracy Framework for Scalable Graph Autoencoders

Title A Degeneracy Framework for Scalable Graph Autoencoders
Authors Guillaume Salha, Romain Hennequin, Viet Anh Tran, Michalis Vazirgiannis
Abstract In this paper, we present a general framework to scale graph autoencoders (AE) and graph variational autoencoders (VAE). This framework leverages graph degeneracy concepts to train models only from a dense subset of nodes instead of using the entire graph. Together with a simple yet effective propagation mechanism, our approach significantly improves scalability and training speed while preserving performance. We evaluate and discuss our method on several variants of existing graph AE and VAE, providing the first application of these models to large graphs with up to millions of nodes and edges. We achieve empirically competitive results w.r.t. several popular scalable node embedding methods, which emphasizes the relevance of pursuing further research towards more scalable graph AE and VAE.
Tasks
Published 2019-02-23
URL https://arxiv.org/abs/1902.08813v2
PDF https://arxiv.org/pdf/1902.08813v2.pdf
PWC https://paperswithcode.com/paper/a-degeneracy-framework-for-scalable-graph
Repo https://github.com/deezer/linear_graph_autoencoders
Framework tf
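
One standard way to obtain a dense subset of nodes from graph degeneracy is the k-core, computed by repeatedly deleting nodes whose degree falls below k. The sketch below assumes that this is the kind of subset the abstract above refers to; the function name and the toy graph are hypothetical and the code is not taken from the linked repository.

```python
from typing import Dict, Set


def k_core(adjacency: Dict[str, Set[str]], k: int) -> Set[str]:
    """Return the nodes of the k-core of an undirected graph given as symmetric adjacency sets."""
    remaining = {node: set(neighbors) for node, neighbors in adjacency.items()}
    queue = [node for node, neighbors in remaining.items() if len(neighbors) < k]
    while queue:
        node = queue.pop()
        if node not in remaining:
            continue  # already removed through an earlier queue entry
        for neighbor in remaining.pop(node):
            if neighbor in remaining:
                remaining[neighbor].discard(node)
                if len(remaining[neighbor]) < k:
                    queue.append(neighbor)
    return set(remaining)


# Triangle a-b-c with a pendant node d attached to c.
graph = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}
dense_nodes = k_core(graph, 2)
```

A model would then be trained on the subgraph induced by `dense_nodes`, with embeddings for the removed nodes filled in by a separate propagation step.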

A nonparametric framework for inferring orders of categorical data from category-real ordered pairs

Title A nonparametric framework for inferring orders of categorical data from category-real ordered pairs
Authors Chainarong Amornbunchornvej, Navaporn Surasvadi, Anon Plangprasopchok, Suttipong Thajchayapong
Abstract Given a dataset of careers and incomes, how large is the difference in income between any pair of careers? Given a dataset of travel time records, how much longer does a trip take when choosing public transportation mode $A$ instead of $B$? In this paper, we propose a framework that infers orders of categories, as well as the magnitudes of difference of real numbers between each pair of categories, using the estimation statistics framework. Beyond reporting whether an order of categories exists, our framework also reports the magnitude of difference for each consecutive pair of categories in the order. On large datasets, our framework scales well compared with the existing framework. The proposed framework has been applied to two real-world case studies: 1) ordering careers by incomes based on information of 350,000 households living in Khon Kaen province, Thailand, and 2) ordering sectors by closing prices based on 1060 companies’ closing prices of NASDAQ stock markets between years 2000 and 2016. The results of careers ordering show income inequality among different careers. The stock market results illustrate dynamics of sector domination that can change over time. Our approach can be applied in any research area that has category-real ordered pairs. Our proposed “Dominant-Distribution Network” provides a novel approach to gain new insight of analyzing category orders. The software of this framework is available for researchers or practitioners within R package: EDOIF.
Tasks
Published 2019-11-15
URL https://arxiv.org/abs/1911.06723v1
PDF https://arxiv.org/pdf/1911.06723v1.pdf
PWC https://paperswithcode.com/paper/a-nonparametric-framework-for-inferring
Repo https://github.com/DarkEyes/EDOIF
Framework none

XGBoostLSS – An extension of XGBoost to probabilistic forecasting

Title XGBoostLSS – An extension of XGBoost to probabilistic forecasting
Authors Alexander März
Abstract We propose a new framework of XGBoost that predicts the entire conditional distribution of a univariate response variable. In particular, XGBoostLSS models all moments of a parametric distribution (i.e., mean, location, scale and shape [LSS]) instead of the conditional mean only. Choosing from a wide range of continuous, discrete and mixed discrete-continuous distributions, modelling and predicting the entire conditional distribution greatly enhances the flexibility of XGBoost, as it allows one to gain additional insight into the data generating process, as well as to create probabilistic forecasts from which prediction intervals and quantiles of interest can be derived. We present both a simulation study and real world examples that demonstrate the virtues of our approach.
Tasks
Published 2019-07-06
URL https://arxiv.org/abs/1907.03178v4
PDF https://arxiv.org/pdf/1907.03178v4.pdf
PWC https://paperswithcode.com/paper/xgboostlss-an-extension-of-xgboost-to
Repo https://github.com/StatMixedML/XGBoostLSS
Framework none
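
As a small illustration of how prediction intervals follow once distribution parameters are predicted, the sketch below turns a predicted location and scale into a central interval. A Gaussian is assumed purely for simplicity, and `prediction_interval` is a hypothetical helper; it is not part of the XGBoostLSS package.

```python
from statistics import NormalDist
from typing import Tuple


def prediction_interval(location: float, scale: float, coverage: float = 0.9) -> Tuple[float, float]:
    """Central prediction interval of a Gaussian with the given predicted parameters."""
    tail = (1.0 - coverage) / 2.0
    dist = NormalDist(mu=location, sigma=scale)
    return dist.inv_cdf(tail), dist.inv_cdf(1.0 - tail)


# Two hypothetical observations with the same predicted mean but different predicted scales.
narrow = prediction_interval(10.0, 0.5, coverage=0.95)
wide = prediction_interval(10.0, 2.0, coverage=0.95)
```

A model that predicts only the conditional mean would report the same point forecast for both observations; the predicted scale is what separates the narrow interval from the wide one.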

Fast, Provably convergent IRLS Algorithm for p-norm Linear Regression

Title Fast, Provably convergent IRLS Algorithm for p-norm Linear Regression
Authors Deeksha Adil, Richard Peng, Sushant Sachdeva
Abstract Linear regression in $\ell_p$-norm is a canonical optimization problem that arises in several applications, including sparse recovery, semi-supervised learning, and signal processing. Generic convex optimization algorithms for solving $\ell_p$-regression are slow in practice. Iteratively Reweighted Least Squares (IRLS) is an easy to implement family of algorithms for solving these problems that has been studied for over 50 years. However, these algorithms often diverge for p > 3, and since the work of Osborne (1985), it has been an open problem whether there is an IRLS algorithm that is guaranteed to converge rapidly for p > 3. We propose p-IRLS, the first IRLS algorithm that provably converges geometrically for any $p \in [2,\infty).$ Our algorithm is simple to implement and is guaranteed to find a $(1+\varepsilon)$-approximate solution in $O(p^{3.5} m^{\frac{p-2}{2(p-1)}} \log \frac{m}{\varepsilon}) \le O_p(\sqrt{m} \log \frac{m}{\varepsilon} )$ iterations. Our experiments demonstrate that it performs even better than our theoretical bounds, beats the standard Matlab/CVX implementation for solving these problems by 10–50x, and is the fastest among available implementations in the high-accuracy regime.
Tasks
Published 2019-07-16
URL https://arxiv.org/abs/1907.07167v2
PDF https://arxiv.org/pdf/1907.07167v2.pdf
PWC https://paperswithcode.com/paper/fast-provably-convergent-irls-algorithm-for-p
Repo https://github.com/utoronto-theory/pIRLS
Framework none
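
For readers unfamiliar with IRLS, the following sketch shows the basic reweighting idea on the simplest possible case: choosing a single scalar x that minimizes the sum of |x - b_i|^p. This is a textbook damped IRLS iteration, not the authors' p-IRLS algorithm; the function name, the damping factor 1/(p-1), and the fixed iteration count are assumptions made for illustration.

```python
from typing import Sequence


def lp_location(values: Sequence[float], p: float, iterations: int = 100) -> float:
    """Minimize sum_i |x - values[i]|**p over a scalar x with damped IRLS (p >= 2)."""
    x = sum(values) / len(values)  # least-squares solution as the starting point
    for _ in range(iterations):
        # Reweighting step: each residual gets weight |r|**(p - 2).
        weights = [abs(x - v) ** (p - 2) for v in values]
        total = sum(weights)
        if total == 0.0:  # every residual is zero, so x is already optimal
            break
        # Weighted least-squares solution for the current weights.
        x_wls = sum(w * v for w, v in zip(weights, values)) / total
        # Damped update; a full step (x = x_wls) can oscillate for larger p.
        x += (x_wls - x) / (p - 1)
    return x


estimate_p2 = lp_location([0.0, 0.0, 0.0, 10.0], p=2)
estimate_p3 = lp_location([0.0, 0.0, 0.0, 10.0], p=3)
```

With p = 2 the result is the ordinary mean; as p grows, the estimate moves toward the midpoint of the extreme values, since large residuals are penalized more heavily.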

Neural-Symbolic Descriptive Action Model from Images: The Search for STRIPS

Title Neural-Symbolic Descriptive Action Model from Images: The Search for STRIPS
Authors Masataro Asai
Abstract Recent work on Neural-Symbolic systems that learn the discrete planning model from images has opened a promising direction for expanding the scope of Automated Planning and Scheduling to raw, noisy data. However, previous work only partially addressed this problem, utilizing the black-box neural model as the successor generator. In this work, we propose Double-Stage Action Model Acquisition (DSAMA), a system that obtains a descriptive PDDL action model with explicit preconditions and effects over propositional variables learned from images without supervision. DSAMA trains a set of Random Forest rule-based classifiers and compiles them into logical formulae in PDDL. While we obtained a competitively accurate PDDL model compared to a black-box model, we observed that the resulting PDDL is too large and complex for state-of-the-art standard planners such as Fast Downward, primarily due to the PDDL-SAS+ translator bottleneck. From this negative result, we argue that this translator bottleneck cannot be addressed just by using a different, existing rule-based learning method, and we point to potential future directions.
Tasks
Published 2019-12-11
URL https://arxiv.org/abs/1912.05492v1
PDF https://arxiv.org/pdf/1912.05492v1.pdf
PWC https://paperswithcode.com/paper/neural-symbolic-descriptive-action-model-from
Repo https://github.com/guicho271828/dsama
Framework none

Use What You Have: Video Retrieval Using Representations From Collaborative Experts

Title Use What You Have: Video Retrieval Using Representations From Collaborative Experts
Authors Yang Liu, Samuel Albanie, Arsha Nagrani, Andrew Zisserman
Abstract The rapid growth of video on the internet has made searching for video content using natural language queries a significant challenge. Human-generated queries for video datasets ‘in the wild’ vary a lot in terms of degree of specificity, with some queries describing specific details such as the names of famous identities, content from speech, or text available on the screen. Our goal is to condense the multi-modal, extremely high dimensional information from videos into a single, compact video representation for the task of video retrieval using free-form text queries, where the degree of specificity is open-ended. For this we exploit existing knowledge in the form of pre-trained semantic embeddings which include ‘general’ features such as motion, appearance, and scene features from visual content. We also explore the use of more ‘specific’ cues from ASR and OCR which are intermittently available for videos and find that these signals remain challenging to use effectively for retrieval. We propose a collaborative experts model to aggregate information from these different pre-trained experts and assess our approach empirically on five retrieval benchmarks: MSR-VTT, LSMDC, MSVD, DiDeMo, and ActivityNet. Code and data can be found at www.robots.ox.ac.uk/~vgg/research/collaborative-experts/. This paper contains a correction to results reported in the previous version.
Tasks Video Retrieval
Published 2019-07-31
URL https://arxiv.org/abs/1907.13487v2
PDF https://arxiv.org/pdf/1907.13487v2.pdf
PWC https://paperswithcode.com/paper/use-what-you-have-video-retrieval-using
Repo https://github.com/albanie/collaborative-experts
Framework pytorch

Not All Features Are Equal: Feature Leveling Deep Neural Networks for Better Interpretation

Title Not All Features Are Equal: Feature Leveling Deep Neural Networks for Better Interpretation
Authors Yingjing Lu, Runde Yang
Abstract Self-explaining models are models that reveal decision making parameters in an interpretable manner so that the model reasoning process can be directly understood by human beings. General Linear Models (GLMs) are self-explaining because the model weights directly show how each feature contributes to the output value. However, deep neural networks (DNNs) are in general not self-explaining due to the non-linearity of the activation functions, complex architectures, and obscure feature extraction and transformation processes. In this work, we illustrate the fact that existing deep architectures are hard to interpret because each hidden layer carries a mix of low level features and high level features. As a solution, we propose a novel feature leveling architecture that isolates low level features from high level features on a per-layer basis to better utilize the GLM layer in the proposed architecture for interpretation. Experimental results show that our modified models are able to achieve competitive results compared to mainstream architectures on standard datasets while being more self-explainable. Our implementations and configurations are publicly available for reproduction.
Tasks Decision Making
Published 2019-05-24
URL https://arxiv.org/abs/1905.10009v2
PDF https://arxiv.org/pdf/1905.10009v2.pdf
PWC https://paperswithcode.com/paper/not-all-features-are-equal-feature-leveling
Repo https://github.com/YingjingLu/FLNN
Framework tf