Paper Group AWR 40
Probing Natural Language Inference Models through Semantic Fragments
Title | Probing Natural Language Inference Models through Semantic Fragments |
Authors | Kyle Richardson, Hai Hu, Lawrence S. Moss, Ashish Sabharwal |
Abstract | Do state-of-the-art models for language understanding already have, or can they easily learn, abilities such as boolean coordination, quantification, conditionals, comparatives, and monotonicity reasoning (i.e., reasoning about word substitutions in sentential contexts)? While such phenomena are involved in natural language inference (NLI) and go beyond basic linguistic understanding, it is unclear to what extent they are captured in existing NLI benchmarks and effectively learned by models. To investigate this, we propose the use of semantic fragments—systematically generated datasets that each target a different semantic phenomenon—for probing, and efficiently improving, such capabilities of linguistic models. This approach to creating challenge datasets allows direct control over the semantic diversity and complexity of the targeted linguistic phenomena, and results in a more precise characterization of a model’s linguistic behavior. Our experiments, using a library of 8 such semantic fragments, reveal two remarkable findings: (a) State-of-the-art models, including BERT, that are pre-trained on existing NLI benchmark datasets perform poorly on these new fragments, even though the phenomena probed here are central to the NLI task. (b) On the other hand, with only a few minutes of additional fine-tuning—with a carefully selected learning rate and a novel variation of “inoculation”—a BERT-based model can master all of these logic and monotonicity fragments while retaining its performance on established NLI benchmarks. |
Tasks | Natural Language Inference |
Published | 2019-09-16 |
URL | https://arxiv.org/abs/1909.07521v2 |
PDF | https://arxiv.org/pdf/1909.07521v2.pdf |
PWC | https://paperswithcode.com/paper/probing-natural-language-inference-models |
Repo | https://github.com/yakazimir/semantic_fragments |
Framework | pytorch |
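As a flavor of what "systematically generated" means here, the sketch below builds a toy boolean-coordination fragment from templates; the entities, verbs, and labeling scheme are illustrative stand-ins, not the paper's actual generation code (which lives in the repo above).

```python
import random

PEOPLE = ["John", "Mary", "Sue", "Bill"]
VERBS = {"sang": "sing", "danced": "dance", "laughed": "laugh"}  # past -> base form

def boolean_fragment(n=5, seed=0):
    """Generate (premise, hypothesis, label) NLI triples probing boolean coordination."""
    rng = random.Random(seed)
    triples = []
    for _ in range(n):
        a, b, c = rng.sample(PEOPLE, 3)
        past = rng.choice(list(VERBS))
        premise = f"{a} and {b} {past}."
        triples.append((premise, f"{a} {past}.", "ENTAILMENT"))            # conjunct elimination
        triples.append((premise, f"{a} did not {VERBS[past]}.", "CONTRADICTION"))
        triples.append((premise, f"{c} {past}.", "NEUTRAL"))               # unmentioned entity
    return triples

for ex in boolean_fragment(2):
    print(ex)
```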
Facet-Aware Evaluation for Extractive Text Summarization
Title | Facet-Aware Evaluation for Extractive Text Summarization |
Authors | Yuning Mao, Liyuan Liu, Qi Zhu, Xiang Ren, Jiawei Han |
Abstract | Commonly adopted metrics for extractive text summarization like ROUGE focus on lexical similarity and are facet-agnostic. In this paper, we present a facet-aware evaluation procedure for better assessment of the information coverage in extracted summaries, while still supporting automatic evaluation once annotated. Specifically, we treat the facet instead of the token as the basic unit for evaluation, manually annotate the support sentences for each facet, and directly evaluate extractive methods by comparing the indices of extracted sentences with those of the support sentences. We demonstrate the benefits of the proposed setup through a thorough quantitative investigation on the CNN/Daily Mail dataset, which at the same time reveals useful insights into state-of-the-art summarization methods. (Data can be found at https://github.com/morningmoni/FAR.) |
Tasks | Text Summarization |
Published | 2019-08-27 |
URL | https://arxiv.org/abs/1908.10383v1 |
PDF | https://arxiv.org/pdf/1908.10383v1.pdf |
PWC | https://paperswithcode.com/paper/facet-aware-evaluation-for-extractive-text |
Repo | https://github.com/morningmoni/FAR |
Framework | none |
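Since evaluation reduces to comparing extracted sentence indices with annotated support sentences, a minimal scorer is easy to sketch. The coverage criterion below (a facet counts as covered once any of its support sentences is extracted) is an assumption; the paper's exact protocol may differ.

```python
def facet_recall(extracted, facets):
    """Fraction of facets covered by the extracted sentence indices.

    `extracted`: iterable of extracted sentence indices.
    `facets`: list of sets, each holding the support-sentence indices
              annotated for one facet of the reference summary.
    """
    extracted = set(extracted)
    covered = sum(1 for support in facets if extracted & support)
    return covered / len(facets) if facets else 0.0

# e.g. extracted sentences 0, 3, 7 against three annotated facets
print(facet_recall({0, 3, 7}, [{0, 1}, {4}, {6, 7}]))  # -> 0.666...
```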
SQIL: Imitation Learning via Reinforcement Learning with Sparse Rewards
Title | SQIL: Imitation Learning via Reinforcement Learning with Sparse Rewards |
Authors | Siddharth Reddy, Anca D. Dragan, Sergey Levine |
Abstract | Learning to imitate expert behavior from demonstrations can be challenging, especially in environments with high-dimensional, continuous observations and unknown dynamics. Supervised learning methods based on behavioral cloning (BC) suffer from distribution shift: because the agent greedily imitates demonstrated actions, it can drift away from demonstrated states due to error accumulation. Recent methods based on reinforcement learning (RL), such as inverse RL and generative adversarial imitation learning (GAIL), overcome this issue by training an RL agent to match the demonstrations over a long horizon. Since the true reward function for the task is unknown, these methods learn a reward function from the demonstrations, often using complex and brittle approximation techniques that involve adversarial training. We propose a simple alternative that still uses RL, but does not require learning a reward function. The key idea is to provide the agent with an incentive to match the demonstrations over a long horizon, by encouraging it to return to demonstrated states upon encountering new, out-of-distribution states. We accomplish this by giving the agent a constant reward of r=+1 for matching the demonstrated action in a demonstrated state, and a constant reward of r=0 for all other behavior. Our method, which we call soft Q imitation learning (SQIL), can be implemented with a handful of minor modifications to any standard Q-learning or off-policy actor-critic algorithm. Theoretically, we show that SQIL can be interpreted as a regularized variant of BC that uses a sparsity prior to encourage long-horizon imitation. Empirically, we show that SQIL outperforms BC and achieves competitive results compared to GAIL, on a variety of image-based and low-dimensional tasks in Box2D, Atari, and MuJoCo. |
Tasks | Imitation Learning, Q-Learning |
Published | 2019-05-27 |
URL | https://arxiv.org/abs/1905.11108v3 |
PDF | https://arxiv.org/pdf/1905.11108v3.pdf |
PWC | https://paperswithcode.com/paper/sqil-imitation-learning-via-regularized |
Repo | https://github.com/dnishio/DSAC |
Framework | none |
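SQIL's reward relabeling fits around any off-policy learner in a few lines. The sketch below is a minimal reading of the abstract: demonstration transitions enter the buffer with r=+1, the agent's own transitions with r=0, and training batches mix the two; the balanced 50/50 sampling ratio is an assumption of this sketch.

```python
import random
from collections import deque

class SQILBuffer:
    """Replay buffer implementing SQIL's reward relabeling: demonstration
    transitions carry r=+1, the agent's own transitions carry r=0."""

    def __init__(self, demos, capacity=100_000):
        # demos: (state, action, next_state, done) tuples from the expert
        self.demo = [(s, a, 1.0, s2, d) for (s, a, s2, d) in demos]
        self.agent = deque(maxlen=capacity)

    def add_agent_transition(self, s, a, s2, done):
        self.agent.append((s, a, 0.0, s2, done))  # environment reward discarded

    def sample(self, batch_size):
        # mix demonstrations and agent experience in equal proportion
        # (assumes both pools already hold at least batch_size // 2 items)
        half = batch_size // 2
        return random.sample(self.demo, half) + random.sample(list(self.agent), half)
```

Any standard Q-learning or off-policy actor-critic update can then consume these relabeled batches unchanged, which is the "handful of minor modifications" the abstract refers to.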
Conditional Single-view Shape Generation for Multi-view Stereo Reconstruction
Title | Conditional Single-view Shape Generation for Multi-view Stereo Reconstruction |
Authors | Yi Wei, Shaohui Liu, Wang Zhao, Jiwen Lu, Jie Zhou |
Abstract | In this paper, we present a new perspective on image-based shape generation. Most existing deep-learning-based shape reconstruction methods employ a single-view deterministic model, which is sometimes insufficient to determine a single ground-truth shape because the back of the object is occluded. In this work, we first introduce a conditional generative network to model the uncertainty in single-view reconstruction. Then, we formulate the task of multi-view reconstruction as taking the intersection of the predicted shape spaces for each single image. We design new differentiable guidance, including the front constraint, the diversity constraint, and the consistency loss, to enable effective single-view conditional generation and multi-view synthesis. Experimental results and ablation studies show that our proposed approach outperforms state-of-the-art methods on 3D reconstruction test error and demonstrate its generalization ability on real-world data. |
Tasks | 3D Reconstruction |
Published | 2019-04-14 |
URL | http://arxiv.org/abs/1904.06699v2 |
PDF | http://arxiv.org/pdf/1904.06699v2.pdf |
PWC | https://paperswithcode.com/paper/conditional-single-view-shape-generation-for |
Repo | https://github.com/weiyithu/OptimizeMVS |
Framework | tf |
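A plausible building block for the consistency loss described above is a pairwise Chamfer distance between the shapes predicted from different views. The sketch below assumes point-cloud outputs and is only one reading of that loss, not the authors' exact formulation (see the linked repo for theirs).

```python
import torch

def chamfer(p, q):
    """Symmetric Chamfer distance between point clouds p (N, 3) and q (M, 3)."""
    d = torch.cdist(p, q)                      # (N, M) pairwise distances
    return d.min(dim=1).values.mean() + d.min(dim=0).values.mean()

def consistency_loss(view_predictions):
    """Average pairwise Chamfer distance between the shapes predicted
    from different views of the same object."""
    loss, count = 0.0, 0
    for i in range(len(view_predictions)):
        for j in range(i + 1, len(view_predictions)):
            loss = loss + chamfer(view_predictions[i], view_predictions[j])
            count += 1
    return loss / max(count, 1)

preds = [torch.rand(1024, 3) for _ in range(3)]  # dummy per-view point clouds
print(consistency_loss(preds))
```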
Practical Lossless Compression with Latent Variables using Bits Back Coding
Title | Practical Lossless Compression with Latent Variables using Bits Back Coding |
Authors | James Townsend, Tom Bird, David Barber |
Abstract | Deep latent variable models have seen recent success in many data domains. Lossless compression is an application of these models which, despite having the potential to be highly useful, has yet to be implemented in a practical manner. We present ‘Bits Back with ANS’ (BB-ANS), a scheme to perform lossless compression with latent variable models at a near optimal rate. We demonstrate this scheme by using it to compress the MNIST dataset with a variational auto-encoder model (VAE), achieving compression rates superior to standard methods with only a simple VAE. Given that the scheme is highly amenable to parallelization, we conclude that with a sufficiently high quality generative model this scheme could be used to achieve substantial improvements in compression rate with acceptable running time. We make our implementation available open source at https://github.com/bits-back/bits-back. |
Tasks | Latent Variable Models |
Published | 2019-01-15 |
URL | http://arxiv.org/abs/1901.04866v1 |
PDF | http://arxiv.org/pdf/1901.04866v1.pdf |
PWC | https://paperswithcode.com/paper/practical-lossless-compression-with-latent |
Repo | https://github.com/fhkingma/bitswap |
Framework | pytorch |
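The bits-back protocol itself is short once an ANS-style stack coder is abstracted away. The comments below schematize the three encode steps and their LIFO inversion; a real ANS coder (see the linked repos) is needed to run them. The accounting function, however, is runnable and shows the headline fact: the expected net cost per datum equals the negative ELBO.

```python
import numpy as np

# One BB-ANS step with a stack (LIFO) entropy coder:
#   encode(x): 1. pop z from the stack under q(z|x)   <- "get bits back"
#              2. push x under p(x|z)
#              3. push z under p(z)
#   decode():  pop z under p(z); pop x under p(x|z); push z under q(z|x).

def bbans_net_bits(log_q_z_given_x, log_p_z, log_p_x_given_z):
    """Net bits paid for one datum: bits spent pushing z and x,
    minus the log q(z|x) bits recovered from the stack in step 1."""
    nats = -(log_p_z + log_p_x_given_z) + log_q_z_given_x
    return nats / np.log(2)

# e.g. p(z) = 0.25, p(x|z) = 0.5, q(z|x) = 0.5 costs 2 + 1 - 1 = 2 bits
print(bbans_net_bits(np.log(0.5), np.log(0.25), np.log(0.5)))
```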
A Real-time Global Inference Network for One-stage Referring Expression Comprehension
Title | A Real-time Global Inference Network for One-stage Referring Expression Comprehension |
Authors | Yiyi Zhou, Rongrong Ji, Gen Luo, Xiaoshuai Sun, Jinsong Su, Xinghao Ding, Chia-wen Lin, Qi Tian |
Abstract | Referring Expression Comprehension (REC) is an emerging research topic in computer vision, which refers to detecting the target region in an image given a text description. Most existing REC methods follow a multi-stage pipeline, which is computationally expensive and greatly limits the application of REC. In this paper, we propose a one-stage model towards real-time REC, termed the Real-time Global Inference Network (RealGIN). RealGIN addresses the diversity and complexity issues in REC with two innovative designs: Adaptive Feature Selection (AFS) and the Global Attentive ReAsoNing unit (GARAN). AFS adaptively fuses features at different semantic levels to handle the varying content of expressions. GARAN uses the textual feature as a pivot to collect expression-related visual information from all regions, and then selectively diffuses such information back to all regions, which provides sufficient context for modeling the complex linguistic conditions in expressions. On five benchmark datasets, i.e., RefCOCO, RefCOCO+, RefCOCOg, ReferIt and Flickr30k, the proposed RealGIN outperforms most prior works and achieves very competitive performance against the most advanced method, i.e., MAttNet. Most importantly, on the same hardware, RealGIN can boost processing speed by about 10 times over existing methods. |
Tasks | Feature Selection |
Published | 2019-12-07 |
URL | https://arxiv.org/abs/1912.03478v1 |
PDF | https://arxiv.org/pdf/1912.03478v1.pdf |
PWC | https://paperswithcode.com/paper/a-real-time-global-inference-network-for-one |
Repo | https://github.com/luogen1996/Real-time-Global-Inference-Network |
Framework | pytorch |
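The AFS module is described only at a high level, but one natural reading is a text-conditioned soft selection over feature levels. The module below is a toy guess at that design (the dimensions, gating, and fusion are all assumptions); the authors' implementation is in the linked repo.

```python
import torch
import torch.nn as nn

class AdaptiveFeatureSelection(nn.Module):
    """Toy reading of AFS: fuse visual features from several semantic levels
    with gates predicted from the expression embedding."""

    def __init__(self, text_dim, num_levels):
        super().__init__()
        self.gate = nn.Linear(text_dim, num_levels)

    def forward(self, level_feats, text_emb):
        # level_feats: list of (B, C, H, W) maps already projected to a common shape
        w = torch.softmax(self.gate(text_emb), dim=-1)            # (B, L) gates
        stacked = torch.stack(level_feats, dim=1)                 # (B, L, C, H, W)
        return (w[:, :, None, None, None] * stacked).sum(dim=1)   # (B, C, H, W)

afs = AdaptiveFeatureSelection(text_dim=256, num_levels=3)
feats = [torch.rand(2, 64, 16, 16) for _ in range(3)]
print(afs(feats, torch.rand(2, 256)).shape)  # torch.Size([2, 64, 16, 16])
```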
Improving Missing Data Imputation with Deep Generative Models
Title | Improving Missing Data Imputation with Deep Generative Models |
Authors | Ramiro D. Camino, Christian A. Hammerschmidt, Radu State |
Abstract | Datasets with missing values are very common in industry applications, and they can have a negative impact on machine learning models. Recent studies introduced solutions to the problem of imputing missing values based on deep generative models. Previous experiments with Generative Adversarial Networks and Variational Autoencoders showed interesting results in this domain, but it is not clear which method is preferable for different use cases. The goal of this work is twofold: we present a comparison between missing data imputation solutions based on deep generative models, and we propose improvements over those methodologies. We run our experiments using well-known real-life datasets with different characteristics, removing values at random and reconstructing them with several imputation techniques. Our results show that the presence or absence of categorical variables can alter the selection of the best model, and that some models are more stable than others across otherwise-identical runs with different random number generator seeds. |
Tasks | Imputation |
Published | 2019-02-27 |
URL | http://arxiv.org/abs/1902.10666v1 |
PDF | http://arxiv.org/pdf/1902.10666v1.pdf |
PWC | https://paperswithcode.com/paper/improving-missing-data-imputation-with-deep |
Repo | https://github.com/rcamino/multi-categorical-gans |
Framework | pytorch |
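The evaluation loop the abstract describes, removing values at random and scoring the reconstructions, is straightforward to sketch. The column-mean imputer below is just a placeholder where a GAN or VAE imputer would go.

```python
import numpy as np

def mcar_mask(X, frac=0.2, seed=0):
    """Missing-completely-at-random mask over a numeric matrix."""
    rng = np.random.default_rng(seed)
    return rng.random(X.shape) < frac

def imputation_rmse(X_true, X_imputed, mask):
    """RMSE restricted to the entries that were removed."""
    diff = (X_true - X_imputed)[mask]
    return float(np.sqrt(np.mean(diff ** 2)))

X = np.random.default_rng(1).normal(size=(100, 5))
mask = mcar_mask(X)
X_obs = np.where(mask, np.nan, X)
# Baseline imputer: column means (stand-in for a GAN/VAE imputer)
col_means = np.nanmean(X_obs, axis=0)
X_hat = np.where(mask, col_means, X_obs)
print(imputation_rmse(X, X_hat, mask))
```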
Aggregation Cross-Entropy for Sequence Recognition
Title | Aggregation Cross-Entropy for Sequence Recognition |
Authors | Zecheng Xie, Yaoxiong Huang, Yuanzhi Zhu, Lianwen Jin, Yuliang Liu, Lele Xie |
Abstract | In this paper, we propose a novel method, aggregation cross-entropy (ACE), for sequence recognition from a brand new perspective. The ACE loss function exhibits performance competitive with CTC and the attention mechanism, with a much simpler implementation (it involves only four fundamental formulas), faster inference/back-propagation (approximately O(1) in parallel), lower storage requirements (no parameters and negligible runtime memory), and convenient adoption (simply replace CTC with ACE). Furthermore, the proposed ACE loss function exhibits two noteworthy properties: (1) it can be directly applied to 2D prediction by flattening the 2D prediction into a 1D prediction as the input, and (2) it requires only the characters and their counts in the sequence annotation for supervision, which allows it to advance beyond sequence recognition, e.g., to counting problems. The code is publicly available at https://github.com/summerlvsong/Aggregation-Cross-Entropy. |
Tasks | |
Published | 2019-04-17 |
URL | http://arxiv.org/abs/1904.08364v2 |
PDF | http://arxiv.org/pdf/1904.08364v2.pdf |
PWC | https://paperswithcode.com/paper/aggregation-cross-entropy-for-sequence |
Repo | https://github.com/summerlvsong/Aggregation-Cross-Entropy |
Framework | pytorch |
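The four formulas behind ACE amount to: aggregate per-class probability mass over time, normalize, and take a cross-entropy against normalized character counts, with the blank class absorbing the remainder. The PyTorch sketch below is one reading of that recipe; the reference code is in the linked repo.

```python
import torch

def ace_loss(log_probs, char_counts):
    """Aggregation cross-entropy (a sketch of the paper's formulation).

    log_probs:   (B, T, C) per-timestep log-probabilities, class 0 = blank.
    char_counts: (B, C) occurrence count of each character in the annotation;
                 column 0 is overwritten with the implicit blank count.
    """
    B, T, C = log_probs.shape
    probs = log_probs.exp()
    counts = char_counts.clone().float()
    counts[:, 0] = T - counts[:, 1:].sum(dim=1)     # blank absorbs the remainder
    agg = probs.sum(dim=1) / T                      # aggregate per-class mass
    target = counts / T                             # normalized count distribution
    return -(target * (agg + 1e-10).log()).sum(dim=1).mean()

lp = torch.log_softmax(torch.randn(4, 20, 37), dim=-1)
counts = torch.zeros(4, 37)
counts[:, 5], counts[:, 9] = 3, 2                   # e.g. "aabba"-style annotations
print(ace_loss(lp, counts))
```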
A Degeneracy Framework for Scalable Graph Autoencoders
Title | A Degeneracy Framework for Scalable Graph Autoencoders |
Authors | Guillaume Salha, Romain Hennequin, Viet Anh Tran, Michalis Vazirgiannis |
Abstract | In this paper, we present a general framework to scale graph autoencoders (AE) and graph variational autoencoders (VAE). This framework leverages graph degeneracy concepts to train models only from a dense subset of nodes instead of using the entire graph. Together with a simple yet effective propagation mechanism, our approach significantly improves scalability and training speed while preserving performance. We evaluate and discuss our method on several variants of existing graph AE and VAE, providing the first application of these models to large graphs with up to millions of nodes and edges. We achieve empirically competitive results w.r.t. several popular scalable node embedding methods, which emphasizes the relevance of pursuing further research towards more scalable graph AE and VAE. |
Tasks | |
Published | 2019-02-23 |
URL | https://arxiv.org/abs/1902.08813v2 |
PDF | https://arxiv.org/pdf/1902.08813v2.pdf |
PWC | https://paperswithcode.com/paper/a-degeneracy-framework-for-scalable-graph |
Repo | https://github.com/deezer/linear_graph_autoencoders |
Framework | tf |
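The two ingredients, train only on a dense core and propagate embeddings outward, can be sketched with networkx's k-core. The neighbor-averaging propagation below is a simple stand-in for the paper's propagation mechanism, and embed_core stands for any graph AE/VAE trained on the subgraph.

```python
import networkx as nx
import numpy as np

def core_then_propagate(G, embed_core, k=2):
    """Degeneracy-style sketch: embed only the k-core, then propagate
    embeddings outward by averaging already-embedded neighbors."""
    core = nx.k_core(G, k)
    emb = embed_core(core)                        # {node: np.ndarray}
    remaining = [n for n in G if n not in emb]
    while remaining:
        progressed = False
        for n in list(remaining):
            nbrs = [emb[m] for m in G[n] if m in emb]
            if nbrs:
                emb[n] = np.mean(nbrs, axis=0)
                remaining.remove(n)
                progressed = True
        if not progressed:                        # disconnected leftovers: zeros
            for n in remaining:
                emb[n] = np.zeros_like(next(iter(emb.values())))
            break
    return emb

G = nx.karate_club_graph()
# dummy "trained AE": random 8-d embeddings for the core nodes
emb = core_then_propagate(G, lambda sg: {n: np.random.rand(8) for n in sg})
print(len(emb), emb[0].shape)
```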
A nonparametric framework for inferring orders of categorical data from category-real ordered pairs
Title | A nonparametric framework for inferring orders of categorical data from category-real ordered pairs |
Authors | Chainarong Amornbunchornvej, Navaporn Surasvadi, Anon Plangprasopchok, Suttipong Thajchayapong |
Abstract | Given a dataset of careers and incomes, how large is the difference in income between any pair of careers? Given a dataset of travel time records, how much longer do we need to spend when choosing public transportation mode A instead of B? In this paper, we propose a framework that is able to infer orders of categories, as well as the magnitudes of the real-valued differences between each pair of categories, using the estimation statistics framework. Our framework not only reports whether an order of categories exists, but also the magnitude of the difference between each consecutive pair of categories in that order. On large datasets, our framework scales well compared with an existing framework. The proposed framework has been applied to two real-world case studies: 1) ordering careers by income based on information from 350,000 households living in Khon Kaen province, Thailand, and 2) ordering sectors by closing prices based on the closing prices of 1060 NASDAQ-listed companies between 2000 and 2016. The career-ordering results show income inequality among different careers. The stock market results illustrate dynamics of sector domination that can change over time. Our approach can be applied in any research area that has category-real ordered pairs. Our proposed “Dominant-Distribution Network” provides a novel approach for gaining new insight into analyzing category orders. The software for this framework is available to researchers and practitioners as the R package EDOIF. |
Tasks | |
Published | 2019-11-15 |
URL | https://arxiv.org/abs/1911.06723v1 |
PDF | https://arxiv.org/pdf/1911.06723v1.pdf |
PWC | https://paperswithcode.com/paper/a-nonparametric-framework-for-inferring |
Repo | https://github.com/DarkEyes/EDOIF |
Framework | none |
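The estimation-statistics primitive underneath such an ordering is a bootstrap confidence interval on the mean difference between two categories: if the interval excludes zero, one category dominates the other, and the interval itself is the reported magnitude. A minimal NumPy version of that primitive (not the EDOIF package itself):

```python
import numpy as np

def bootstrap_mean_diff(a, b, n_boot=5000, alpha=0.05, seed=0):
    """Point estimate and bootstrap CI for mean(a) - mean(b).
    "a dominates b" when the lower CI bound exceeds zero."""
    rng = np.random.default_rng(seed)
    diffs = [rng.choice(a, len(a)).mean() - rng.choice(b, len(b)).mean()
             for _ in range(n_boot)]
    lo, hi = np.quantile(diffs, [alpha / 2, 1 - alpha / 2])
    return np.mean(a) - np.mean(b), lo, hi

# toy incomes for two careers
incomes_a = np.random.default_rng(1).normal(30_000, 5_000, 200)
incomes_b = np.random.default_rng(2).normal(25_000, 5_000, 200)
print(bootstrap_mean_diff(incomes_a, incomes_b))
```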
XGBoostLSS – An extension of XGBoost to probabilistic forecasting
Title | XGBoostLSS – An extension of XGBoost to probabilistic forecasting |
Authors | Alexander März |
Abstract | We propose a new framework for XGBoost that predicts the entire conditional distribution of a univariate response variable. In particular, XGBoostLSS models all moments of a parametric distribution (i.e., mean, location, scale and shape [LSS]) instead of the conditional mean only. By choosing from a wide range of continuous, discrete and mixed discrete-continuous distributions, modelling and predicting the entire conditional distribution greatly enhances the flexibility of XGBoost, as it allows one to gain additional insight into the data-generating process, as well as to create probabilistic forecasts from which prediction intervals and quantiles of interest can be derived. We present both a simulation study and real-world examples that demonstrate the virtues of our approach. |
Tasks | |
Published | 2019-07-06 |
URL | https://arxiv.org/abs/1907.03178v4 |
PDF | https://arxiv.org/pdf/1907.03178v4.pdf |
PWC | https://paperswithcode.com/paper/xgboostlss-an-extension-of-xgboost-to |
Repo | https://github.com/StatMixedML/XGBoostLSS |
Framework | none |
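XGBoost's custom-objective hook is enough to boost a distribution parameter other than the mean. The sketch below fits a Gaussian in two stages, the mean first and then eta = log(sigma) on the residuals via a hand-derived gradient/Hessian; this staging is a simplification, as XGBoostLSS itself models all distribution parameters within one framework.

```python
import numpy as np
import xgboost as xgb

def gaussian_scale_objective(residuals):
    """Custom objective for boosting eta = log(sigma), given residuals
    r = y - mu. NLL term: eta + r^2 / (2 * exp(2*eta)), so
    grad = 1 - r^2 * exp(-2*eta) and hess = 2 * r^2 * exp(-2*eta)."""
    def obj(preds, dtrain):
        r2 = residuals ** 2
        inv_var = np.exp(-2.0 * preds)
        return 1.0 - r2 * inv_var, 2.0 * r2 * inv_var
    return obj

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
y = X[:, 0] + rng.normal(scale=np.exp(0.5 * X[:, 1]))      # heteroscedastic noise
dtrain = xgb.DMatrix(X, label=y)
mean_model = xgb.train({"objective": "reg:squarederror"}, dtrain, 50)
resid = y - mean_model.predict(dtrain)
scale_model = xgb.train({"max_depth": 2}, dtrain, 50,
                        obj=gaussian_scale_objective(resid))
sigma_hat = np.exp(scale_model.predict(dtrain))            # predicted scale
print(sigma_hat[:5])
```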
Fast, Provably convergent IRLS Algorithm for p-norm Linear Regression
Title | Fast, Provably convergent IRLS Algorithm for p-norm Linear Regression |
Authors | Deeksha Adil, Richard Peng, Sushant Sachdeva |
Abstract | Linear regression in $\ell_p$-norm is a canonical optimization problem that arises in several applications, including sparse recovery, semi-supervised learning, and signal processing. Generic convex optimization algorithms for solving $\ell_p$-regression are slow in practice. Iteratively Reweighted Least Squares (IRLS) is an easy-to-implement family of algorithms for solving these problems that has been studied for over 50 years. However, these algorithms often diverge for p > 3, and since the work of Osborne (1985), it has been an open problem whether there is an IRLS algorithm that is guaranteed to converge rapidly for p > 3. We propose p-IRLS, the first IRLS algorithm that provably converges geometrically for any $p \in [2,\infty)$. Our algorithm is simple to implement and is guaranteed to find a $(1+\varepsilon)$-approximate solution in $O(p^{3.5} m^{\frac{p-2}{2(p-1)}} \log \frac{m}{\varepsilon}) \le O_p(\sqrt{m} \log \frac{m}{\varepsilon})$ iterations. Our experiments demonstrate that it performs even better than our theoretical bounds, beats the standard Matlab/CVX implementation for solving these problems by 10–50x, and is the fastest among available implementations in the high-accuracy regime. |
Tasks | |
Published | 2019-07-16 |
URL | https://arxiv.org/abs/1907.07167v2 |
PDF | https://arxiv.org/pdf/1907.07167v2.pdf |
PWC | https://paperswithcode.com/paper/fast-provably-convergent-irls-algorithm-for-p |
Repo | https://github.com/utoronto-theory/pIRLS |
Framework | none |
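For contrast with the paper's provably convergent variant, here is the textbook IRLS loop it builds on: solve a weighted least-squares problem with weights |r_i|^(p-2) and repeat. The safeguards that make p-IRLS geometrically convergent are not included, so this plain loop can diverge for large p.

```python
import numpy as np

def irls_pnorm(A, b, p=4, iters=50, eps=1e-8):
    """Textbook IRLS for min_x ||Ax - b||_p (not the paper's p-IRLS)."""
    x = np.linalg.lstsq(A, b, rcond=None)[0]          # p = 2 warm start
    for _ in range(iters):
        r = A @ x - b
        w = np.maximum(np.abs(r), eps) ** (p - 2)     # per-row weights
        Aw = A * w[:, None]                           # = W A
        x = np.linalg.solve(A.T @ Aw, Aw.T @ b)       # weighted normal equations
    return x

A = np.random.default_rng(0).normal(size=(200, 10))
b = np.random.default_rng(1).normal(size=200)
x = irls_pnorm(A, b, p=4)
print(np.linalg.norm(A @ x - b, 4))
```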
Neural-Symbolic Descriptive Action Model from Images: The Search for STRIPS
Title | Neural-Symbolic Descriptive Action Model from Images: The Search for STRIPS |
Authors | Masataro Asai |
Abstract | Recent work on neural-symbolic systems that learn a discrete planning model from images has opened a promising direction for expanding the scope of Automated Planning and Scheduling to raw, noisy data. However, previous work only partially addressed this problem, using a black-box neural model as the successor generator. In this work, we propose Double-Stage Action Model Acquisition (DSAMA), a system that obtains a descriptive PDDL action model with explicit preconditions and effects over propositional variables learned from images without supervision. DSAMA trains a set of Random Forest rule-based classifiers and compiles them into logical formulae in PDDL. While we obtained a PDDL model competitive in accuracy with a black-box model, we observed that the resulting PDDL is too large and complex for state-of-the-art standard planners such as Fast Downward, primarily due to the PDDL-SAS+ translator bottleneck. From this negative result, we argue that the translator bottleneck cannot be addressed merely by using a different, existing rule-based learning method, and we point to potential future directions. |
Tasks | |
Published | 2019-12-11 |
URL | https://arxiv.org/abs/1912.05492v1 |
PDF | https://arxiv.org/pdf/1912.05492v1.pdf |
PWC | https://paperswithcode.com/paper/neural-symbolic-descriptive-action-model-from |
Repo | https://github.com/guicho271828/dsama |
Framework | none |
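The second stage, learning rule-based classifiers over propositional variables and reading the rules back off, can be miniaturized with scikit-learn. The toy below uses a single decision tree where DSAMA uses Random Forests, and stops at printing rules rather than compiling them into PDDL.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

# Toy stand-in for DSAMA's rule-learning stage: for one effect proposition,
# learn a rule-based classifier over the propositional pre-state, then read
# the tree's branches off as candidate precondition formulae.
rng = np.random.default_rng(0)
pre_states = rng.integers(0, 2, size=(500, 6))          # binary propositional states
effect = pre_states[:, 0] & (1 - pre_states[:, 3])      # hidden ground-truth rule

clf = DecisionTreeClassifier(max_depth=3).fit(pre_states, effect)
print(export_text(clf, feature_names=[f"p{i}" for i in range(6)]))
```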
Use What You Have: Video Retrieval Using Representations From Collaborative Experts
Title | Use What You Have: Video Retrieval Using Representations From Collaborative Experts |
Authors | Yang Liu, Samuel Albanie, Arsha Nagrani, Andrew Zisserman |
Abstract | The rapid growth of video on the internet has made searching for video content using natural language queries a significant challenge. Human-generated queries for video datasets ‘in the wild’ vary a lot in terms of degree of specificity, with some queries describing specific details such as the names of famous identities, content from speech, or text available on the screen. Our goal is to condense the multi-modal, extremely high dimensional information from videos into a single, compact video representation for the task of video retrieval using free-form text queries, where the degree of specificity is open-ended. For this we exploit existing knowledge in the form of pre-trained semantic embeddings which include ‘general’ features such as motion, appearance, and scene features from visual content. We also explore the use of more ‘specific’ cues from ASR and OCR which are intermittently available for videos and find that these signals remain challenging to use effectively for retrieval. We propose a collaborative experts model to aggregate information from these different pre-trained experts and assess our approach empirically on five retrieval benchmarks: MSR-VTT, LSMDC, MSVD, DiDeMo, and ActivityNet. Code and data can be found at www.robots.ox.ac.uk/~vgg/research/collaborative-experts/. This paper contains a correction to results reported in the previous version. |
Tasks | Video Retrieval |
Published | 2019-07-31 |
URL | https://arxiv.org/abs/1907.13487v2 |
PDF | https://arxiv.org/pdf/1907.13487v2.pdf |
PWC | https://paperswithcode.com/paper/use-what-you-have-video-retrieval-using |
Repo | https://github.com/albanie/collaborative-experts |
Framework | pytorch |
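Stripped of its learned components, the collaborative-experts idea is a weighted fusion of per-expert video embeddings followed by similarity search. The sketch below fixes the fusion weights by hand, whereas the paper learns the fusion (and the embeddings) end to end.

```python
import numpy as np

def rank_videos(text_emb, expert_embs, weights):
    """Rank videos by cosine similarity between a query embedding and a
    weighted combination of per-expert video embeddings (motion,
    appearance, ASR, ...). Fixed weights stand in for the learned fusion."""
    fused = sum(w * e for w, e in zip(weights, expert_embs))   # (N, D)
    fused /= np.linalg.norm(fused, axis=1, keepdims=True)
    q = text_emb / np.linalg.norm(text_emb)
    return np.argsort(-(fused @ q))                            # best match first

rng = np.random.default_rng(0)
experts = [rng.normal(size=(100, 64)) for _ in range(3)]       # 3 experts, 100 videos
print(rank_videos(rng.normal(size=64), experts, [0.5, 0.3, 0.2])[:5])
```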
Not All Features Are Equal: Feature Leveling Deep Neural Networks for Better Interpretation
Title | Not All Features Are Equal: Feature Leveling Deep Neural Networks for Better Interpretation |
Authors | Yingjing Lu, Runde Yang |
Abstract | Self-explaining models are models that reveal decision-making parameters in an interpretable manner, so that the model's reasoning process can be directly understood by human beings. General Linear Models (GLMs) are self-explaining because the model weights directly show how each feature contributes to the output value. However, deep neural networks (DNNs) are in general not self-explaining, due to the non-linearity of the activation functions, complex architectures, and obscure feature extraction and transformation processes. In this work, we illustrate that existing deep architectures are hard to interpret because each hidden layer carries a mix of low-level and high-level features. As a solution, we propose a novel feature-leveling architecture that isolates low-level features from high-level features on a per-layer basis, to better utilize the GLM layer in the proposed architecture for interpretation. Experimental results show that our modified models are able to achieve competitive results compared to mainstream architectures on standard datasets, while being more self-explainable. Our implementations and configurations are publicly available for reproduction. |
Tasks | Decision Making |
Published | 2019-05-24 |
URL | https://arxiv.org/abs/1905.10009v2 |
PDF | https://arxiv.org/pdf/1905.10009v2.pdf |
PWC | https://paperswithcode.com/paper/not-all-features-are-equal-feature-leveling |
Repo | https://github.com/YingjingLu/FLNN |
Framework | tf |
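One concrete way to realize per-layer feature leveling is to let every hidden layer route a slice of its output directly to the final linear (GLM) head, so that features of each level reach the interpretable layer unmixed. The module below is a guess at that architecture; all sizes and the ReLU choice are assumptions, not the authors' configuration (see the linked repo).

```python
import torch
import torch.nn as nn

class FeatureLevelingNet(nn.Module):
    """Sketch of feature leveling: each hidden layer splits its output into
    features forwarded to the next layer and features 'leveled' straight
    to the final linear (GLM) head."""

    def __init__(self, in_dim, hidden=64, leveled=16, depth=3, out_dim=10):
        super().__init__()
        dims = [in_dim] + [hidden] * depth
        self.layers = nn.ModuleList(
            nn.Linear(dims[i], hidden + leveled) for i in range(depth))
        self.hidden = hidden
        self.glm = nn.Linear(depth * leveled + hidden, out_dim)  # the GLM head

    def forward(self, x):
        shortcuts = []
        for layer in self.layers:
            out = torch.relu(layer(x))
            x, level = out[:, :self.hidden], out[:, self.hidden:]
            shortcuts.append(level)                  # leveled features skip ahead
        return self.glm(torch.cat(shortcuts + [x], dim=1))

net = FeatureLevelingNet(in_dim=32)
print(net(torch.rand(4, 32)).shape)  # torch.Size([4, 10])
```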