Paper Group AWR 40
Probing Natural Language Inference Models through Semantic Fragments
Title | Probing Natural Language Inference Models through Semantic Fragments |
Authors | Kyle Richardson, Hai Hu, Lawrence S. Moss, Ashish Sabharwal |
Abstract | Do state-of-the-art models for language understanding already have, or can they easily learn, abilities such as boolean coordination, quantification, conditionals, comparatives, and monotonicity reasoning (i.e., reasoning about word substitutions in sentential contexts)? While such phenomena are involved in natural language inference (NLI) and go beyond basic linguistic understanding, it is unclear to what extent they are captured in existing NLI benchmarks and effectively learned by models. To investigate this, we propose the use of semantic fragments—systematically generated datasets that each target a different semantic phenomenon—for probing, and efficiently improving, such capabilities of linguistic models. This approach to creating challenge datasets allows direct control over the semantic diversity and complexity of the targeted linguistic phenomena, and results in a more precise characterization of a model’s linguistic behavior. Our experiments, using a library of 8 such semantic fragments, reveal two remarkable findings: (a) State-of-the-art models, including BERT, that are pre-trained on existing NLI benchmark datasets perform poorly on these new fragments, even though the phenomena probed here are central to the NLI task. (b) On the other hand, with only a few minutes of additional fine-tuning—with a carefully selected learning rate and a novel variation of “inoculation”—a BERT-based model can master all of these logic and monotonicity fragments while retaining its performance on established NLI benchmarks. |
Tasks | Natural Language Inference |
Published | 2019-09-16 |
URL | https://arxiv.org/abs/1909.07521v2 |
PDF | https://arxiv.org/pdf/1909.07521v2.pdf |
PWC | https://paperswithcode.com/paper/probing-natural-language-inference-models |
Repo | https://github.com/yakazimir/semantic_fragments |
Framework | pytorch |
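As a flavor of what "systematically generated" means here, the sketch below builds a toy boolean-coordination fragment from templates; the entities, verbs, and labeling scheme are illustrative stand-ins, not the paper's actual generation code (which lives in the repo above).

```python
import random

PEOPLE = ["John", "Mary", "Sue", "Bill"]
VERBS = {"sang": "sing", "danced": "dance", "laughed": "laugh"}  # past -> base form

def boolean_fragment(n=5, seed=0):
    """Generate (premise, hypothesis, label) NLI triples probing boolean coordination."""
    rng = random.Random(seed)
    triples = []
    for _ in range(n):
        a, b, c = rng.sample(PEOPLE, 3)
        past = rng.choice(list(VERBS))
        premise = f"{a} and {b} {past}."
        triples.append((premise, f"{a} {past}.", "ENTAILMENT"))            # conjunct elimination
        triples.append((premise, f"{a} did not {VERBS[past]}.", "CONTRADICTION"))
        triples.append((premise, f"{c} {past}.", "NEUTRAL"))               # unmentioned entity
    return triples

for ex in boolean_fragment(2):
    print(ex)
```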
Facet-Aware Evaluation for Extractive Text Summarization
Title | Facet-Aware Evaluation for Extractive Text Summarization |
Authors | Yuning Mao, Liyuan Liu, Qi Zhu, Xiang Ren, Jiawei Han |
Abstract | Commonly adopted metrics for extractive text summarization like ROUGE focus on lexical similarity and are facet-agnostic. In this paper, we present a facet-aware evaluation procedure for better assessment of the information coverage in extracted summaries, while still supporting automatic evaluation once annotated. Specifically, we treat the facet instead of the token as the basic unit for evaluation, manually annotate the support sentences for each facet, and directly evaluate extractive methods by comparing the indices of extracted sentences with those of the support sentences. We demonstrate the benefits of the proposed setup through a thorough quantitative investigation on the CNN/Daily Mail dataset, which at the same time reveals useful insights into state-of-the-art summarization methods. (Data can be found at https://github.com/morningmoni/FAR.) |
Tasks | Text Summarization |
Published | 2019-08-27 |
URL | https://arxiv.org/abs/1908.10383v1 |
PDF | https://arxiv.org/pdf/1908.10383v1.pdf |
PWC | https://paperswithcode.com/paper/facet-aware-evaluation-for-extractive-text |
Repo | https://github.com/morningmoni/FAR |
Framework | none |
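Since evaluation reduces to comparing extracted sentence indices with annotated support sentences, a minimal scorer is easy to sketch. The coverage criterion below (a facet counts as covered once any of its support sentences is extracted) is an assumption; the paper's exact protocol may differ.

```python
def facet_recall(extracted, facets):
    """Fraction of facets covered by the extracted sentence indices.

    `extracted`: iterable of extracted sentence indices.
    `facets`: list of sets, each holding the support-sentence indices
              annotated for one facet of the reference summary.
    """
    extracted = set(extracted)
    covered = sum(1 for support in facets if extracted & support)
    return covered / len(facets) if facets else 0.0

# e.g. extracted sentences 0, 3, 7 against three annotated facets
print(facet_recall({0, 3, 7}, [{0, 1}, {4}, {6, 7}]))  # -> 0.666...
```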
SQIL: Imitation Learning via Reinforcement Learning with Sparse Rewards
Title | SQIL: Imitation Learning via Reinforcement Learning with Sparse Rewards |
Authors | Siddharth Reddy, Anca D. Dragan, Sergey Levine |
Abstract | Learning to imitate expert behavior from demonstrations can be challenging, especially in environments with high-dimensional, continuous observations and unknown dynamics. Supervised learning methods based on behavioral cloning (BC) suffer from distribution shift: because the agent greedily imitates demonstrated actions, it can drift away from demonstrated states due to error accumulation. Recent methods based on reinforcement learning (RL), such as inverse RL and generative adversarial imitation learning (GAIL), overcome this issue by training an RL agent to match the demonstrations over a long horizon. Since the true reward function for the task is unknown, these methods learn a reward function from the demonstrations, often using complex and brittle approximation techniques that involve adversarial training. We propose a simple alternative that still uses RL, but does not require learning a reward function. The key idea is to provide the agent with an incentive to match the demonstrations over a long horizon, by encouraging it to return to demonstrated states upon encountering new, out-of-distribution states. We accomplish this by giving the agent a constant reward of r=+1 for matching the demonstrated action in a demonstrated state, and a constant reward of r=0 for all other behavior. Our method, which we call soft Q imitation learning (SQIL), can be implemented with a handful of minor modifications to any standard Q-learning or off-policy actor-critic algorithm. Theoretically, we show that SQIL can be interpreted as a regularized variant of BC that uses a sparsity prior to encourage long-horizon imitation. Empirically, we show that SQIL outperforms BC and achieves competitive results compared to GAIL, on a variety of image-based and low-dimensional tasks in Box2D, Atari, and MuJoCo. |
Tasks | Imitation Learning, Q-Learning |
Published | 2019-05-27 |
URL | https://arxiv.org/abs/1905.11108v3 |
PDF | https://arxiv.org/pdf/1905.11108v3.pdf |
PWC | https://paperswithcode.com/paper/sqil-imitation-learning-via-regularized |
Repo | https://github.com/dnishio/DSAC |
Framework | none |
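SQIL's reward relabeling fits around any off-policy learner in a few lines. The sketch below is a minimal reading of the abstract: demonstration transitions enter the buffer with r=+1, the agent's own transitions with r=0, and training batches mix the two; the balanced 50/50 sampling ratio is an assumption of this sketch.

```python
import random
from collections import deque

class SQILBuffer:
    """Replay buffer implementing SQIL's reward relabeling: demonstration
    transitions carry r=+1, the agent's own transitions carry r=0."""

    def __init__(self, demos, capacity=100_000):
        # demos: (state, action, next_state, done) tuples from the expert
        self.demo = [(s, a, 1.0, s2, d) for (s, a, s2, d) in demos]
        self.agent = deque(maxlen=capacity)

    def add_agent_transition(self, s, a, s2, done):
        self.agent.append((s, a, 0.0, s2, done))  # environment reward discarded

    def sample(self, batch_size):
        # mix demonstrations and agent experience in equal proportion
        # (assumes both pools already hold at least batch_size // 2 items)
        half = batch_size // 2
        return random.sample(self.demo, half) + random.sample(list(self.agent), half)
```

Any standard Q-learning or off-policy actor-critic update can then consume these relabeled batches unchanged, which is the "handful of minor modifications" the abstract refers to.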
Conditional Single-view Shape Generation for Multi-view Stereo Reconstruction
Title | Conditional Single-view Shape Generation for Multi-view Stereo Reconstruction |
Authors | Yi Wei, Shaohui Liu, Wang Zhao, Jiwen Lu, Jie Zhou |
Abstract | In this paper, we present a new perspective on image-based shape generation. Most existing deep-learning-based shape reconstruction methods employ a single-view deterministic model, which is sometimes insufficient to determine a single ground-truth shape because the back of the object is occluded. In this work, we first introduce a conditional generative network to model the uncertainty in single-view reconstruction. Then, we formulate the task of multi-view reconstruction as taking the intersection of the predicted shape spaces for each single image. We design new differentiable guidance, including the front constraint, the diversity constraint, and the consistency loss, to enable effective single-view conditional generation and multi-view synthesis. Experimental results and ablation studies show that our proposed approach outperforms state-of-the-art methods on 3D reconstruction test error and demonstrate its generalization ability on real-world data. |
Tasks | 3D Reconstruction |
Published | 2019-04-14 |
URL | http://arxiv.org/abs/1904.06699v2 |
PDF | http://arxiv.org/pdf/1904.06699v2.pdf |
PWC | https://paperswithcode.com/paper/conditional-single-view-shape-generation-for |
Repo | https://github.com/weiyithu/OptimizeMVS |
Framework | tf |
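A plausible building block for the consistency loss described above is a pairwise Chamfer distance between the shapes predicted from different views. The sketch below assumes point-cloud outputs and is only one reading of that loss, not the authors' exact formulation (see the linked repo for theirs).

```python
import torch

def chamfer(p, q):
    """Symmetric Chamfer distance between point clouds p (N, 3) and q (M, 3)."""
    d = torch.cdist(p, q)                      # (N, M) pairwise distances
    return d.min(dim=1).values.mean() + d.min(dim=0).values.mean()

def consistency_loss(view_predictions):
    """Average pairwise Chamfer distance between the shapes predicted
    from different views of the same object."""
    loss, count = 0.0, 0
    for i in range(len(view_predictions)):
        for j in range(i + 1, len(view_predictions)):
            loss = loss + chamfer(view_predictions[i], view_predictions[j])
            count += 1
    return loss / max(count, 1)

preds = [torch.rand(1024, 3) for _ in range(3)]  # dummy per-view point clouds
print(consistency_loss(preds))
```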
Practical Lossless Compression with Latent Variables using Bits Back Coding
Title | Practical Lossless Compression with Latent Variables using Bits Back Coding |
Authors | James Townsend, Tom Bird, David Barber |
Abstract | Deep latent variable models have seen recent success in many data domains. Lossless compression is an application of these models which, despite having the potential to be highly useful, has yet to be implemented in a practical manner. We present ‘Bits Back with ANS’ (BB-ANS), a scheme to perform lossless compression with latent variable models at a near optimal rate. We demonstrate this scheme by using it to compress the MNIST dataset with a variational auto-encoder model (VAE), achieving compression rates superior to standard methods with only a simple VAE. Given that the scheme is highly amenable to parallelization, we conclude that with a sufficiently high quality generative model this scheme could be used to achieve substantial improvements in compression rate with acceptable running time. We make our implementation available open source at https://github.com/bits-back/bits-back. |
Tasks | Latent Variable Models |
Published | 2019-01-15 |
URL | http://arxiv.org/abs/1901.04866v1 |
PDF | http://arxiv.org/pdf/1901.04866v1.pdf |
PWC | https://paperswithcode.com/paper/practical-lossless-compression-with-latent |
Repo | https://github.com/fhkingma/bitswap |
Framework | pytorch |
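The bits-back protocol itself is short once an ANS-style stack coder is abstracted away. The comments below schematize the three encode steps and their LIFO inversion; a real ANS coder (see the linked repos) is needed to run them. The accounting function, however, is runnable and shows the headline fact: the expected net cost per datum equals the negative ELBO.

```python
import numpy as np

# One BB-ANS step with a stack (LIFO) entropy coder:
#   encode(x): 1. pop z from the stack under q(z|x)   <- "get bits back"
#              2. push x under p(x|z)
#              3. push z under p(z)
#   decode():  pop z under p(z); pop x under p(x|z); push z under q(z|x).

def bbans_net_bits(log_q_z_given_x, log_p_z, log_p_x_given_z):
    """Net bits paid for one datum: bits spent pushing z and x,
    minus the log q(z|x) bits recovered from the stack in step 1."""
    nats = -(log_p_z + log_p_x_given_z) + log_q_z_given_x
    return nats / np.log(2)

# e.g. p(z) = 0.25, p(x|z) = 0.5, q(z|x) = 0.5 costs 2 + 1 - 1 = 2 bits
print(bbans_net_bits(np.log(0.5), np.log(0.25), np.log(0.5)))
```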
A Real-time Global Inference Network for One-stage Referring Expression Comprehension
Title | A Real-time Global Inference Network for One-stage Referring Expression Comprehension |
Authors | Yiyi Zhou, Rongrong Ji, Gen Luo, Xiaoshuai Sun, Jinsong Su, Xinghao Ding, Chia-wen Lin, Qi Tian |
Abstract | Referring Expression Comprehension (REC) is an emerging research topic in computer vision, which refers to detecting the target region in an image given a text description. Most existing REC methods follow a multi-stage pipeline, which is computationally expensive and greatly limits the application of REC. In this paper, we propose a one-stage model towards real-time REC, termed the Real-time Global Inference Network (RealGIN). RealGIN addresses the diversity and complexity issues in REC with two innovative designs: Adaptive Feature Selection (AFS) and the Global Attentive ReAsoNing unit (GARAN). AFS adaptively fuses features at different semantic levels to handle the varying content of expressions. GARAN uses the textual feature as a pivot to collect expression-related visual information from all regions, and then selectively diffuses such information back to all regions, which provides sufficient context for modeling the complex linguistic conditions in expressions. On five benchmark datasets, i.e., RefCOCO, RefCOCO+, RefCOCOg, ReferIt and Flickr30k, the proposed RealGIN outperforms most prior works and achieves very competitive performance against the most advanced method, i.e., MAttNet. Most importantly, on the same hardware, RealGIN can boost processing speed by about 10 times over existing methods. |
Tasks | Feature Selection |
Published | 2019-12-07 |
URL | https://arxiv.org/abs/1912.03478v1 |
PDF | https://arxiv.org/pdf/1912.03478v1.pdf |
PWC | https://paperswithcode.com/paper/a-real-time-global-inference-network-for-one |
Repo | https://github.com/luogen1996/Real-time-Global-Inference-Network |
Framework | pytorch |
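The AFS module is described only at a high level, but one natural reading is a text-conditioned soft selection over feature levels. The module below is a toy guess at that design (the dimensions, gating, and fusion are all assumptions); the authors' implementation is in the linked repo.

```python
import torch
import torch.nn as nn

class AdaptiveFeatureSelection(nn.Module):
    """Toy reading of AFS: fuse visual features from several semantic levels
    with gates predicted from the expression embedding."""

    def __init__(self, text_dim, num_levels):
        super().__init__()
        self.gate = nn.Linear(text_dim, num_levels)

    def forward(self, level_feats, text_emb):
        # level_feats: list of (B, C, H, W) maps already projected to a common shape
        w = torch.softmax(self.gate(text_emb), dim=-1)            # (B, L) gates
        stacked = torch.stack(level_feats, dim=1)                 # (B, L, C, H, W)
        return (w[:, :, None, None, None] * stacked).sum(dim=1)   # (B, C, H, W)

afs = AdaptiveFeatureSelection(text_dim=256, num_levels=3)
feats = [torch.rand(2, 64, 16, 16) for _ in range(3)]
print(afs(feats, torch.rand(2, 256)).shape)  # torch.Size([2, 64, 16, 16])
```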
Improving Missing Data Imputation with Deep Generative Models
Title | Improving Missing Data Imputation with Deep Generative Models |
Authors | Ramiro D. Camino, Christian A. Hammerschmidt, Radu State |
Abstract | Datasets with missing values are very common in industry applications, and they can have a negative impact on machine learning models. Recent studies introduced solutions to the problem of imputing missing values based on deep generative models. Previous experiments with Generative Adversarial Networks and Variational Autoencoders showed interesting results in this domain, but it is not clear which method is preferable for different use cases. The goal of this work is twofold: we present a comparison between missing data imputation solutions based on deep generative models, and we propose improvements over those methodologies. We run our experiments using well-known real-life datasets with different characteristics, removing values at random and reconstructing them with several imputation techniques. Our results show that the presence or absence of categorical variables can alter the selection of the best model, and that some models are more stable than others across otherwise-identical runs with different random number generator seeds. |
Tasks | Imputation |
Published | 2019-02-27 |
URL | http://arxiv.org/abs/1902.10666v1 |
PDF | http://arxiv.org/pdf/1902.10666v1.pdf |
PWC | https://paperswithcode.com/paper/improving-missing-data-imputation-with-deep |
Repo | https://github.com/rcamino/multi-categorical-gans |
Framework | pytorch |
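The evaluation loop the abstract describes, removing values at random and scoring the reconstructions, is straightforward to sketch. The column-mean imputer below is just a placeholder where a GAN or VAE imputer would go.

```python
import numpy as np

def mcar_mask(X, frac=0.2, seed=0):
    """Missing-completely-at-random mask over a numeric matrix."""
    rng = np.random.default_rng(seed)
    return rng.random(X.shape) < frac

def imputation_rmse(X_true, X_imputed, mask):
    """RMSE restricted to the entries that were removed."""
    diff = (X_true - X_imputed)[mask]
    return float(np.sqrt(np.mean(diff ** 2)))

X = np.random.default_rng(1).normal(size=(100, 5))
mask = mcar_mask(X)
X_obs = np.where(mask, np.nan, X)
# Baseline imputer: column means (stand-in for a GAN/VAE imputer)
col_means = np.nanmean(X_obs, axis=0)
X_hat = np.where(mask, col_means, X_obs)
print(imputation_rmse(X, X_hat, mask))
```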
Aggregation Cross-Entropy for Sequence Recognition
Title | Aggregation Cross-Entropy for Sequence Recognition |
Authors | Zecheng Xie, Yaoxiong Huang, Yuanzhi Zhu, Lianwen Jin, Yuliang Liu, Lele Xie |
Abstract | In this paper, we propose a novel method, aggregation cross-entropy (ACE), for sequence recognition from a brand new perspective. The ACE loss function exhibits performance competitive with CTC and the attention mechanism, with a much simpler implementation (it involves only four fundamental formulas), faster inference/back-propagation (approximately O(1) in parallel), lower storage requirements (no parameters and negligible runtime memory), and convenient adoption (simply replace CTC with ACE). Furthermore, the proposed ACE loss function exhibits two noteworthy properties: (1) it can be directly applied to 2D prediction by flattening the 2D prediction into a 1D prediction as the input, and (2) it requires only the characters and their counts in the sequence annotation for supervision, which allows it to advance beyond sequence recognition, e.g., to counting problems. The code is publicly available at https://github.com/summerlvsong/Aggregation-Cross-Entropy. |
Tasks | |
Published | 2019-04-17 |
URL | http://arxiv.org/abs/1904.08364v2 |
PDF | http://arxiv.org/pdf/1904.08364v2.pdf |
PWC | https://paperswithcode.com/paper/aggregation-cross-entropy-for-sequence |
Repo | https://github.com/summerlvsong/Aggregation-Cross-Entropy |
Framework | pytorch |
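The four formulas behind ACE amount to: aggregate per-class probability mass over time, normalize, and take a cross-entropy against normalized character counts, with the blank class absorbing the remainder. The PyTorch sketch below is one reading of that recipe; the reference code is in the linked repo.

```python
import torch

def ace_loss(log_probs, char_counts):
    """Aggregation cross-entropy (a sketch of the paper's formulation).

    log_probs:   (B, T, C) per-timestep log-probabilities, class 0 = blank.
    char_counts: (B, C) occurrence count of each character in the annotation;
                 column 0 is overwritten with the implicit blank count.
    """
    B, T, C = log_probs.shape
    probs = log_probs.exp()
    counts = char_counts.clone().float()
    counts[:, 0] = T - counts[:, 1:].sum(dim=1)     # blank absorbs the remainder
    agg = probs.sum(dim=1) / T                      # aggregate per-class mass
    target = counts / T                             # normalized count distribution
    return -(target * (agg + 1e-10).log()).sum(dim=1).mean()

lp = torch.log_softmax(torch.randn(4, 20, 37), dim=-1)
counts = torch.zeros(4, 37)
counts[:, 5], counts[:, 9] = 3, 2                   # e.g. "aabba"-style annotations
print(ace_loss(lp, counts))
```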
A Degeneracy Framework for Scalable Graph Autoencoders
Title | A Degeneracy Framework for Scalable Graph Autoencoders |
Authors | Guillaume Salha, Romain Hennequin, Viet Anh Tran, Michalis Vazirgiannis |
Abstract | In this paper, we present a general framework to scale graph autoencoders (AE) and graph variational autoencoders (VAE). This framework leverages graph degeneracy concepts to train models only from a dense subset of nodes instead of using the entire graph. Together with a simple yet effective propagation mechanism, our approach significantly improves scalability and training speed while preserving performance. We evaluate and discuss our method on several variants of existing graph AE and VAE, providing the first application of these models to large graphs with up to millions of nodes and edges. We achieve empirically competitive results w.r.t. several popular scalable node embedding methods, which emphasizes the relevance of pursuing further research towards more scalable graph AE and VAE. |
Tasks | |
Published | 2019-02-23 |
URL | https://arxiv.org/abs/1902.08813v2 |
PDF | https://arxiv.org/pdf/1902.08813v2.pdf |
PWC | https://paperswithcode.com/paper/a-degeneracy-framework-for-scalable-graph |
Repo | https://github.com/deezer/linear_graph_autoencoders |
Framework | tf |
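The two ingredients, train only on a dense core and propagate embeddings outward, can be sketched with networkx's k-core. The neighbor-averaging propagation below is a simple stand-in for the paper's propagation mechanism, and embed_core stands for any graph AE/VAE trained on the subgraph.

```python
import networkx as nx
import numpy as np

def core_then_propagate(G, embed_core, k=2):
    """Degeneracy-style sketch: embed only the k-core, then propagate
    embeddings outward by averaging already-embedded neighbors."""
    core = nx.k_core(G, k)
    emb = embed_core(core)                        # {node: np.ndarray}
    remaining = [n for n in G if n not in emb]
    while remaining:
        progressed = False
        for n in list(remaining):
            nbrs = [emb[m] for m in G[n] if m in emb]
            if nbrs:
                emb[n] = np.mean(nbrs, axis=0)
                remaining.remove(n)
                progressed = True
        if not progressed:                        # disconnected leftovers: zeros
            for n in remaining:
                emb[n] = np.zeros_like(next(iter(emb.values())))
            break
    return emb

G = nx.karate_club_graph()
# dummy "trained AE": random 8-d embeddings for the core nodes
emb = core_then_propagate(G, lambda sg: {n: np.random.rand(8) for n in sg})
print(len(emb), emb[0].shape)
```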
A nonparametric framework for inferring orders of categorical data from category-real ordered pairs
Title | A nonparametric framework for inferring orders of categorical data from category-real ordered pairs |
Authors | Chainarong Amornbunchornvej, Navaporn Surasvadi, Anon Plangprasopchok, Suttipong Thajchayapong |
Abstract | Given a dataset of careers and incomes, how large is the difference in income between any pair of careers? Given a dataset of travel time records, how much longer do we need to spend when choosing public transportation mode A instead of B? In this paper, we propose a framework that is able to infer orders of categories, as well as the magnitudes of the real-valued differences between each pair of categories, using the estimation statistics framework. Our framework not only reports whether an order of categories exists, but also the magnitude of the difference between each consecutive pair of categories in that order. On large datasets, our framework scales well compared with an existing framework. The proposed framework has been applied to two real-world case studies: 1) ordering careers by income based on information from 350,000 households living in Khon Kaen province, Thailand, and 2) ordering sectors by closing prices based on the closing prices of 1060 NASDAQ-listed companies between 2000 and 2016. The career-ordering results show income inequality among different careers. The stock market results illustrate dynamics of sector domination that can change over time. Our approach can be applied in any research area that has category-real ordered pairs. Our proposed “Dominant-Distribution Network” provides a novel approach for gaining new insight into analyzing category orders. The software for this framework is available to researchers and practitioners as the R package EDOIF. |
Tasks | |
Published | 2019-11-15 |
URL | https://arxiv.org/abs/1911.06723v1 |
PDF | https://arxiv.org/pdf/1911.06723v1.pdf |
PWC | https://paperswithcode.com/paper/a-nonparametric-framework-for-inferring |
Repo | https://github.com/DarkEyes/EDOIF |
Framework | none |
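The estimation-statistics primitive underneath such an ordering is a bootstrap confidence interval on the mean difference between two categories: if the interval excludes zero, one category dominates the other, and the interval itself is the reported magnitude. A minimal NumPy version of that primitive (not the EDOIF package itself):

```python
import numpy as np

def bootstrap_mean_diff(a, b, n_boot=5000, alpha=0.05, seed=0):
    """Point estimate and bootstrap CI for mean(a) - mean(b).
    "a dominates b" when the lower CI bound exceeds zero."""
    rng = np.random.default_rng(seed)
    diffs = [rng.choice(a, len(a)).mean() - rng.choice(b, len(b)).mean()
             for _ in range(n_boot)]
    lo, hi = np.quantile(diffs, [alpha / 2, 1 - alpha / 2])
    return np.mean(a) - np.mean(b), lo, hi

# toy incomes for two careers
incomes_a = np.random.default_rng(1).normal(30_000, 5_000, 200)
incomes_b = np.random.default_rng(2).normal(25_000, 5_000, 200)
print(bootstrap_mean_diff(incomes_a, incomes_b))
```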
XGBoostLSS – An extension of XGBoost to probabilistic forecasting
Title | XGBoostLSS – An extension of XGBoost to probabilistic forecasting |
Authors | Alexander März |
Abstract | We propose a new framework for XGBoost that predicts the entire conditional distribution of a univariate response variable. In particular, XGBoostLSS models all moments of a parametric distribution (i.e., mean, location, scale and shape [LSS]) instead of the conditional mean only. By choosing from a wide range of continuous, discrete and mixed discrete-continuous distributions, modelling and predicting the entire conditional distribution greatly enhances the flexibility of XGBoost, as it allows one to gain additional insight into the data-generating process, as well as to create probabilistic forecasts from which prediction intervals and quantiles of interest can be derived. We present both a simulation study and real-world examples that demonstrate the virtues of our approach. |
Tasks | |
Published | 2019-07-06 |
URL | https://arxiv.org/abs/1907.03178v4 |
PDF | https://arxiv.org/pdf/1907.03178v4.pdf |
PWC | https://paperswithcode.com/paper/xgboostlss-an-extension-of-xgboost-to |
Repo | https://github.com/StatMixedML/XGBoostLSS |
Framework | none |
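XGBoost's custom-objective hook is enough to boost a distribution parameter other than the mean. The sketch below fits a Gaussian in two stages, the mean first and then eta = log(sigma) on the residuals via a hand-derived gradient/Hessian; this staging is a simplification, as XGBoostLSS itself models all distribution parameters within one framework.

```python
import numpy as np
import xgboost as xgb

def gaussian_scale_objective(residuals):
    """Custom objective for boosting eta = log(sigma), given residuals
    r = y - mu. NLL term: eta + r^2 / (2 * exp(2*eta)), so
    grad = 1 - r^2 * exp(-2*eta) and hess = 2 * r^2 * exp(-2*eta)."""
    def obj(preds, dtrain):
        r2 = residuals ** 2
        inv_var = np.exp(-2.0 * preds)
        return 1.0 - r2 * inv_var, 2.0 * r2 * inv_var
    return obj

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
y = X[:, 0] + rng.normal(scale=np.exp(0.5 * X[:, 1]))      # heteroscedastic noise
dtrain = xgb.DMatrix(X, label=y)
mean_model = xgb.train({"objective": "reg:squarederror"}, dtrain, 50)
resid = y - mean_model.predict(dtrain)
scale_model = xgb.train({"max_depth": 2}, dtrain, 50,
                        obj=gaussian_scale_objective(resid))
sigma_hat = np.exp(scale_model.predict(dtrain))            # predicted scale
print(sigma_hat[:5])
```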
Fast, Provably convergent IRLS Algorithm for p-norm Linear Regression
Title | Fast, Provably convergent IRLS Algorithm for p-norm Linear Regression |
Authors | Deeksha Adil, Richard Peng, Sushant Sachdeva |
Abstract | Linear regression in $\ell_p$-norm is a canonical optimization problem that arises in several applications, including sparse recovery, semi-supervised learning, and signal processing. Generic convex optimization algorithms for solving $\ell_p$-regression are slow in practice. Iteratively Reweighted Least Squares (IRLS) is an easy-to-implement family of algorithms for solving these problems that has been studied for over 50 years. However, these algorithms often diverge for p > 3, and since the work of Osborne (1985), it has been an open problem whether there is an IRLS algorithm that is guaranteed to converge rapidly for p > 3. We propose p-IRLS, the first IRLS algorithm that provably converges geometrically for any $p \in [2,\infty)$. Our algorithm is simple to implement and is guaranteed to find a $(1+\varepsilon)$-approximate solution in $O(p^{3.5} m^{\frac{p-2}{2(p-1)}} \log \frac{m}{\varepsilon}) \le O_p(\sqrt{m} \log \frac{m}{\varepsilon})$ iterations. Our experiments demonstrate that it performs even better than our theoretical bounds, beats the standard Matlab/CVX implementation for solving these problems by 10–50x, and is the fastest among available implementations in the high-accuracy regime. |
Tasks | |
Published | 2019-07-16 |
URL | https://arxiv.org/abs/1907.07167v2 |
PDF | https://arxiv.org/pdf/1907.07167v2.pdf |
PWC | https://paperswithcode.com/paper/fast-provably-convergent-irls-algorithm-for-p |
Repo | https://github.com/utoronto-theory/pIRLS |
Framework | none |
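For contrast with the paper's provably convergent variant, here is the textbook IRLS loop it builds on: solve a weighted least-squares problem with weights |r_i|^(p-2) and repeat. The safeguards that make p-IRLS geometrically convergent are not included, so this plain loop can diverge for large p.

```python
import numpy as np

def irls_pnorm(A, b, p=4, iters=50, eps=1e-8):
    """Textbook IRLS for min_x ||Ax - b||_p (not the paper's p-IRLS)."""
    x = np.linalg.lstsq(A, b, rcond=None)[0]          # p = 2 warm start
    for _ in range(iters):
        r = A @ x - b
        w = np.maximum(np.abs(r), eps) ** (p - 2)     # per-row weights
        Aw = A * w[:, None]                           # = W A
        x = np.linalg.solve(A.T @ Aw, Aw.T @ b)       # weighted normal equations
    return x

A = np.random.default_rng(0).normal(size=(200, 10))
b = np.random.default_rng(1).normal(size=200)
x = irls_pnorm(A, b, p=4)
print(np.linalg.norm(A @ x - b, 4))
```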
Neural-Symbolic Descriptive Action Model from Images: The Search for STRIPS
Title | Neural-Symbolic Descriptive Action Model from Images: The Search for STRIPS |
Authors | Masataro Asai |
Abstract | Recent work on neural-symbolic systems that learn a discrete planning model from images has opened a promising direction for expanding the scope of Automated Planning and Scheduling to raw, noisy data. However, previous work only partially addressed this problem, using a black-box neural model as the successor generator. In this work, we propose Double-Stage Action Model Acquisition (DSAMA), a system that obtains a descriptive PDDL action model with explicit preconditions and effects over propositional variables learned from images without supervision. DSAMA trains a set of Random Forest rule-based classifiers and compiles them into logical formulae in PDDL. While we obtained a PDDL model competitive in accuracy with a black-box model, we observed that the resulting PDDL is too large and complex for state-of-the-art standard planners such as Fast Downward, primarily due to the PDDL-SAS+ translator bottleneck. From this negative result, we argue that the translator bottleneck cannot be addressed merely by using a different, existing rule-based learning method, and we point to potential future directions. |
Tasks | |
Published | 2019-12-11 |
URL | https://arxiv.org/abs/1912.05492v1 |
PDF | https://arxiv.org/pdf/1912.05492v1.pdf |
PWC | https://paperswithcode.com/paper/neural-symbolic-descriptive-action-model-from |
Repo | https://github.com/guicho271828/dsama |
Framework | none |
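The second stage, learning rule-based classifiers over propositional variables and reading the rules back off, can be miniaturized with scikit-learn. The toy below uses a single decision tree where DSAMA uses Random Forests, and stops at printing rules rather than compiling them into PDDL.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

# Toy stand-in for DSAMA's rule-learning stage: for one effect proposition,
# learn a rule-based classifier over the propositional pre-state, then read
# the tree's branches off as candidate precondition formulae.
rng = np.random.default_rng(0)
pre_states = rng.integers(0, 2, size=(500, 6))          # binary propositional states
effect = pre_states[:, 0] & (1 - pre_states[:, 3])      # hidden ground-truth rule

clf = DecisionTreeClassifier(max_depth=3).fit(pre_states, effect)
print(export_text(clf, feature_names=[f"p{i}" for i in range(6)]))
```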
Use What You Have: Video Retrieval Using Representations From Collaborative Experts
Title | Use What You Have: Video Retrieval Using Representations From Collaborative Experts |
Authors | Yang Liu, Samuel Albanie, Arsha Nagrani, Andrew Zisserman |
Abstract | The rapid growth of video on the internet has made searching for video content using natural language queries a significant challenge. Human-generated queries for video datasets ‘in the wild’ vary a lot in terms of degree of specificity, with some queries describing specific details such as the names of famous identities, content from speech, or text available on the screen. Our goal is to condense the multi-modal, extremely high dimensional information from videos into a single, compact video representation for the task of video retrieval using free-form text queries, where the degree of specificity is open-ended. For this we exploit existing knowledge in the form of pre-trained semantic embeddings which include ‘general’ features such as motion, appearance, and scene features from visual content. We also explore the use of more ‘specific’ cues from ASR and OCR which are intermittently available for videos and find that these signals remain challenging to use effectively for retrieval. We propose a collaborative experts model to aggregate information from these different pre-trained experts and assess our approach empirically on five retrieval benchmarks: MSR-VTT, LSMDC, MSVD, DiDeMo, and ActivityNet. Code and data can be found at www.robots.ox.ac.uk/~vgg/research/collaborative-experts/. This paper contains a correction to results reported in the previous version. |
Tasks | Video Retrieval |
Published | 2019-07-31 |
URL | https://arxiv.org/abs/1907.13487v2 |
PDF | https://arxiv.org/pdf/1907.13487v2.pdf |
PWC | https://paperswithcode.com/paper/use-what-you-have-video-retrieval-using |
Repo | https://github.com/albanie/collaborative-experts |
Framework | pytorch |
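Stripped of its learned components, the collaborative-experts idea is a weighted fusion of per-expert video embeddings followed by similarity search. The sketch below fixes the fusion weights by hand, whereas the paper learns the fusion (and the embeddings) end to end.

```python
import numpy as np

def rank_videos(text_emb, expert_embs, weights):
    """Rank videos by cosine similarity between a query embedding and a
    weighted combination of per-expert video embeddings (motion,
    appearance, ASR, ...). Fixed weights stand in for the learned fusion."""
    fused = sum(w * e for w, e in zip(weights, expert_embs))   # (N, D)
    fused /= np.linalg.norm(fused, axis=1, keepdims=True)
    q = text_emb / np.linalg.norm(text_emb)
    return np.argsort(-(fused @ q))                            # best match first

rng = np.random.default_rng(0)
experts = [rng.normal(size=(100, 64)) for _ in range(3)]       # 3 experts, 100 videos
print(rank_videos(rng.normal(size=64), experts, [0.5, 0.3, 0.2])[:5])
```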
Not All Features Are Equal: Feature Leveling Deep Neural Networks for Better Interpretation
Title | Not All Features Are Equal: Feature Leveling Deep Neural Networks for Better Interpretation |
Authors | Yingjing Lu, Runde Yang |
Abstract | Self-explaining models are models that reveal decision-making parameters in an interpretable manner, so that the model's reasoning process can be directly understood by human beings. General Linear Models (GLMs) are self-explaining because the model weights directly show how each feature contributes to the output value. However, deep neural networks (DNNs) are in general not self-explaining, due to the non-linearity of the activation functions, complex architectures, and obscure feature extraction and transformation processes. In this work, we illustrate that existing deep architectures are hard to interpret because each hidden layer carries a mix of low-level and high-level features. As a solution, we propose a novel feature-leveling architecture that isolates low-level features from high-level features on a per-layer basis, to better utilize the GLM layer in the proposed architecture for interpretation. Experimental results show that our modified models are able to achieve competitive results compared to mainstream architectures on standard datasets, while being more self-explainable. Our implementations and configurations are publicly available for reproduction. |
Tasks | Decision Making |
Published | 2019-05-24 |
URL | https://arxiv.org/abs/1905.10009v2 |
PDF | https://arxiv.org/pdf/1905.10009v2.pdf |
PWC | https://paperswithcode.com/paper/not-all-features-are-equal-feature-leveling |
Repo | https://github.com/YingjingLu/FLNN |
Framework | tf |
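One concrete way to realize per-layer feature leveling is to let every hidden layer route a slice of its output directly to the final linear (GLM) head, so that features of each level reach the interpretable layer unmixed. The module below is a guess at that architecture; all sizes and the ReLU choice are assumptions, not the authors' configuration (see the linked repo).

```python
import torch
import torch.nn as nn

class FeatureLevelingNet(nn.Module):
    """Sketch of feature leveling: each hidden layer splits its output into
    features forwarded to the next layer and features 'leveled' straight
    to the final linear (GLM) head."""

    def __init__(self, in_dim, hidden=64, leveled=16, depth=3, out_dim=10):
        super().__init__()
        dims = [in_dim] + [hidden] * depth
        self.layers = nn.ModuleList(
            nn.Linear(dims[i], hidden + leveled) for i in range(depth))
        self.hidden = hidden
        self.glm = nn.Linear(depth * leveled + hidden, out_dim)  # the GLM head

    def forward(self, x):
        shortcuts = []
        for layer in self.layers:
            out = torch.relu(layer(x))
            x, level = out[:, :self.hidden], out[:, self.hidden:]
            shortcuts.append(level)                  # leveled features skip ahead
        return self.glm(torch.cat(shortcuts + [x], dim=1))

net = FeatureLevelingNet(in_dim=32)
print(net(torch.rand(4, 32)).shape)  # torch.Size([4, 10])
```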