Paper Group ANR 716
Don’t Take the Premise for Granted: Mitigating Artifacts in Natural Language Inference
Title | Don’t Take the Premise for Granted: Mitigating Artifacts in Natural Language Inference |
Authors | Yonatan Belinkov, Adam Poliak, Stuart M. Shieber, Benjamin Van Durme, Alexander M. Rush |
Abstract | Natural Language Inference (NLI) datasets often contain hypothesis-only biases—artifacts that allow models to achieve non-trivial performance without learning whether a premise entails a hypothesis. We propose two probabilistic methods to build models that are more robust to such biases and better transfer across datasets. In contrast to standard approaches to NLI, our methods predict the probability of a premise given a hypothesis and NLI label, discouraging models from ignoring the premise. We evaluate our methods on synthetic and existing NLI datasets by training on datasets containing biases and testing on datasets containing no (or different) hypothesis-only biases. Our results indicate that these methods can make NLI models more robust to dataset-specific artifacts, transferring better than a baseline architecture in 9 out of 12 NLI datasets. Additionally, we provide an extensive analysis of the interplay of our methods with known biases in NLI datasets, as well as the effects of encouraging models to ignore biases and fine-tuning on target datasets. |
Tasks | Natural Language Inference |
Published | 2019-07-09 |
URL | https://arxiv.org/abs/1907.04380v1 |
PDF | https://arxiv.org/pdf/1907.04380v1.pdf |
PWC | https://paperswithcode.com/paper/dont-take-the-premise-for-granted-mitigating |
Repo | |
Framework | |
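The artifact in question is easy to reproduce. Below is a minimal, hypothetical sketch (not the authors' method) of a hypothesis-only probe: a classifier trained without ever seeing the premise. Non-trivial accuracy from such a probe is exactly the bias the paper targets.

```python
# Minimal sketch: a hypothesis-only probe for NLI dataset artifacts.
# The toy examples below are illustrative, not from any real dataset.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# toy stand-ins for the hypothesis side of (premise, hypothesis, label) pairs
train_hypotheses = ["A man is sleeping.", "Nobody is outside.", "A dog runs."]
train_labels = ["contradiction", "contradiction", "entailment"]

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
clf.fit(train_hypotheses, train_labels)  # premise deliberately withheld

print(clf.predict(["No one is sleeping."]))
```

The paper's remedy works in the opposite direction: by modeling the probability of the premise given the hypothesis and label, a model that ignores the premise cannot score well on the training objective.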
Minimal penalties and the slope heuristics: a survey
Title | Minimal penalties and the slope heuristics: a survey |
Authors | Sylvain Arlot |
Abstract | Birgé and Massart proposed in 2001 the slope heuristics as a way to choose optimally from data an unknown multiplicative constant in front of a penalty. It is built upon the notion of minimal penalty, and it has since been generalized to some “minimal-penalty algorithms”. This paper reviews the theoretical results obtained for such algorithms, with a self-contained proof in the simplest framework, precise proof ideas for further generalizations, and a few new results. Explicit connections are made with residual-variance estimators (with an original contribution on this topic, showing that for this task the slope heuristics performs almost as well as a residual-based estimator with the best model choice) and with some classical algorithms such as the L-curve or elbow heuristics, Mallows’ $C_p$, and Akaike’s FPE. Practical issues are also addressed, including two new practical definitions of minimal-penalty algorithms that are compared on synthetic data to previously proposed definitions. Finally, several conjectures and open problems are suggested as future research directions. |
Tasks | |
Published | 2019-01-22 |
URL | https://arxiv.org/abs/1901.07277v2 |
PDF | https://arxiv.org/pdf/1901.07277v2.pdf |
PWC | https://paperswithcode.com/paper/minimal-penalties-and-the-slope-heuristics-a |
Repo | |
Framework | |
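The core recipe surveyed here, the dimension-jump variant of the slope heuristics, is short enough to sketch. The toy risks and the jump-detection details below are illustrative assumptions, not taken from the paper:

```python
import numpy as np

def slope_heuristics(risks, dims, kappas):
    """Dimension-jump sketch: for each candidate kappa, select the model
    minimizing risk + kappa * dim; kappa_min is where the selected dimension
    drops sharply, and the final penalty is 2 * kappa_min (factor-two rule)."""
    selected = [int(np.argmin(risks + k * dims)) for k in kappas]
    sel_dims = np.array([dims[m] for m in selected])
    jump = int(np.argmax(-np.diff(sel_dims)))   # largest drop in dimension
    kappa_min = float(kappas[jump + 1])
    best = int(np.argmin(risks + 2.0 * kappa_min * dims))
    return best, kappa_min

# toy risks: approximation error 1/D minus a linear overfitting term,
# so the minimal penalty sits near 0.01
dims = np.arange(1, 51, dtype=float)
risks = 1.0 / dims - 0.01 * dims
kappas = np.linspace(0.001, 0.05, 200)
print(slope_heuristics(risks, dims, kappas))
```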
A Study of the Effect of Resolving Negation and Sentiment Analysis in Recognizing Text Entailment for Arabic
Title | A Study of the Effect of Resolving Negation and Sentiment Analysis in Recognizing Text Entailment for Arabic |
Authors | Fatima T. AL-Khawaldeh |
Abstract | Recognizing the entailment relation has been shown to help extract semantic inferences in a wide range of natural language processing domains (text summarization, question answering, etc.) and to improve their output. For Arabic, only a few attempts have addressed the entailment problem. This paper aims to increase entailment accuracy for Arabic texts by resolving negation in the text-hypothesis pair and by determining whether the polarity of the pair is positive, negative, or neutral. The absence of a negation detection feature gives inaccurate results when detecting the entailment relation, since negation reverses the truth value; negation words are often treated as stop words and removed from the text-hypothesis pair, which can lead to wrong entailment decisions. Another previously unsolved case is that a positive text cannot entail a negative one, and vice versa. In this paper, a sentiment analysis tool is used to classify the polarity of the text-hypothesis pair, and we show that analyzing this polarity increases entailment accuracy. To evaluate our approach, we used a dataset for Arabic textual entailment (ArbTEDS) consisting of 618 text-hypothesis pairs, and showed that Arabic entailment accuracy is increased by resolving negation for the entailment relation and analyzing the polarity of the text-hypothesis pair. |
Tasks | Natural Language Inference, Negation Detection, Question Answering, Sentiment Analysis, Text Summarization |
Published | 2019-07-05 |
URL | https://arxiv.org/abs/1907.03871v1 |
PDF | https://arxiv.org/pdf/1907.03871v1.pdf |
PWC | https://paperswithcode.com/paper/a-study-of-the-effect-of-resolving-negation |
Repo | |
Framework | |
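A rough sketch of the two checks described above, with stand-in components (the negation list is a small set of common Arabic particles; the polarity function is a toy stand-in for the sentiment analysis tool used in the paper):

```python
# Minimal sketch (not the paper's full system): veto an entailment decision
# when the text-hypothesis pair disagrees on negation or on polarity.

NEGATION_WORDS = {"لا", "لم", "لن", "ليس", "ما"}  # common Arabic negation particles

def has_negation(tokens):
    return any(tok in NEGATION_WORDS for tok in tokens)

def polarity(tokens):
    # toy stand-in for the sentiment analysis tool used in the paper
    positive, negative = {"جميل", "رائع"}, {"سيء", "قبيح"}
    if set(tokens) & positive:
        return "positive"
    if set(tokens) & negative:
        return "negative"
    return "neutral"

def refine_entailment(base_decision, text_tokens, hyp_tokens):
    # negation must be resolved, not stripped as a stop word:
    # a negation mismatch reverses the truth value, so block entailment
    if has_negation(text_tokens) != has_negation(hyp_tokens):
        return "no-entailment"
    # a positive text cannot entail a negative hypothesis, and vice versa
    p_t, p_h = polarity(text_tokens), polarity(hyp_tokens)
    if "neutral" not in (p_t, p_h) and p_t != p_h:
        return "no-entailment"
    return base_decision

# "the book is beautiful" vs "the book is not beautiful" -> no entailment
print(refine_entailment("entailment", ["الكتاب", "جميل"], ["الكتاب", "ليس", "جميل"]))
```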
Exploiting Oxide Based Resistive RAM Variability for Bayesian Neural Network Hardware Design
Title | Exploiting Oxide Based Resistive RAM Variability for Bayesian Neural Network Hardware Design |
Authors | Akul Malhotra, Sen Lu, Kezhou Yang, Abhronil Sengupta |
Abstract | Uncertainty plays a key role in real-time machine learning. As a significant shift from standard deep networks, which do not consider any uncertainty formulation during training or inference, Bayesian deep networks are currently being investigated; there, the network is envisaged as an ensemble of plausible models learnt via Bayes’ formulation in response to uncertainties in sensory data. Bayesian deep networks consider each synaptic weight as a sample drawn from a probability distribution with learnt mean and variance. This paper elaborates on a hardware design that exploits the cycle-to-cycle variability of oxide-based Resistive Random Access Memories (RRAMs) as a means to realize such a probabilistic sampling function, instead of viewing it as a disadvantage. |
Tasks | |
Published | 2019-11-16 |
URL | https://arxiv.org/abs/1911.08555v5 |
PDF | https://arxiv.org/pdf/1911.08555v5.pdf |
PWC | https://paperswithcode.com/paper/exploiting-oxide-based-resistive-ram |
Repo | |
Framework | |
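As a software illustration of the idea, the sketch below samples an ensemble of weight matrices $w = \mu + \sigma \epsilon$, with a Gaussian standing in for the measured cycle-to-cycle conductance distribution of an RRAM cell (an assumption for simulation; in hardware, the noise comes from the device itself):

```python
import numpy as np

rng = np.random.default_rng(0)

def rram_sample_weights(mu, sigma, reads=1):
    """Minimal sketch: treat cycle-to-cycle RRAM read variability as the
    noise source for sampling Bayesian weights w = mu + sigma * eps.
    A software Gaussian stands in for the device distribution here."""
    eps = rng.standard_normal(size=(reads,) + mu.shape)
    return mu + sigma * eps  # one plausible network per read cycle

# toy layer: 4 inputs -> 3 outputs, ensemble of 8 sampled models
mu = rng.standard_normal((4, 3)) * 0.1
sigma = np.full((4, 3), 0.05)
x = rng.standard_normal(4)
outputs = np.einsum("i,rij->rj", x, rram_sample_weights(mu, sigma, reads=8))
print(outputs.mean(axis=0), outputs.std(axis=0))  # predictive mean, uncertainty
```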
Real-time tree search with pessimistic scenarios
Title | Real-time tree search with pessimistic scenarios |
Authors | Takayuki Osogami, Toshihiro Takahashi |
Abstract | Autonomous agents need to make decisions in a sequential manner, in a partially observable environment, and in consideration of how other agents behave. In critical situations, such decisions need to be made in real time, for example to avoid collisions and recover to safe conditions. We propose a technique of tree search where a deterministic and pessimistic scenario is used after a specified depth. Because there is no branching with the deterministic scenario, the proposed technique allows us to take into account events that can occur far ahead in the future. The effectiveness of the proposed technique is demonstrated in Pommerman, a multi-agent environment used in a NeurIPS 2018 competition, where the agents that implemented the proposed technique won first and third place. |
Tasks | |
Published | 2019-02-28 |
URL | https://arxiv.org/abs/1902.10870v2 |
PDF | https://arxiv.org/pdf/1902.10870v2.pdf |
PWC | https://paperswithcode.com/paper/real-time-tree-search-with-pessimistic |
Repo | |
Framework | |
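The mechanism is simple enough to sketch. Below, branching happens only up to `depth`; beyond that, a single deterministic worst-case scenario is rolled out, so there is no exponential blow-up. The `ToyGame` and its interface are hypothetical, not from the paper:

```python
class ToyGame:
    """Toy 1-D chase: our agent at x must keep ahead of a pursuer at p."""

    def actions(self, s):
        return (-1, 0, 1)                    # our move choices

    def step(self, s, a):
        x, p, t = s
        return (x + a, p + 1, t - 1)         # pursuer always advances

    def pessimistic_step(self, s):
        x, p, t = s
        return (x, p + 1, t - 1)             # worst case: we gain no ground

    def is_terminal(self, s):
        return s[2] == 0 or s[1] >= s[0]     # time out, or caught

    def utility(self, s):
        return s[0] - s[1]                   # lead over the pursuer

def search(state, game, depth, horizon):
    if game.is_terminal(state):
        return game.utility(state)
    if depth == 0:                           # switch to the pessimistic scenario
        return pessimistic_rollout(state, game, horizon)
    return max(search(game.step(state, a), game, depth - 1, horizon)
               for a in game.actions(state))

def pessimistic_rollout(state, game, steps):
    for _ in range(steps):                   # no branching past this point
        if game.is_terminal(state):
            break
        state = game.pessimistic_step(state)
    return game.utility(state)

print(search((3, 0, 12), ToyGame(), depth=3, horizon=20))
```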
Distilling Translations with Visual Awareness
Title | Distilling Translations with Visual Awareness |
Authors | Julia Ive, Pranava Madhyastha, Lucia Specia |
Abstract | Previous work on multimodal machine translation has shown that visual information is only needed in very specific cases, for example in the presence of ambiguous words where the textual context is not sufficient. As a consequence, models tend to learn to ignore this information. We propose a translate-and-refine approach to this problem where images are only used by a second stage decoder. This approach is trained jointly to generate a good first draft translation and to improve over this draft by (i) making better use of the target language textual context (both left and right-side contexts) and (ii) making use of visual context. This approach leads to state-of-the-art results. Additionally, we show that it has the ability to recover from erroneous or missing words in the source language. |
Tasks | Machine Translation, Multimodal Machine Translation |
Published | 2019-06-18 |
URL | https://arxiv.org/abs/1906.07701v1 |
PDF | https://arxiv.org/pdf/1906.07701v1.pdf |
PWC | https://paperswithcode.com/paper/distilling-translations-with-visual-awareness |
Repo | |
Framework | |
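Schematically, the decoding pipeline looks like the sketch below; all components are stand-ins to show the data flow, not the paper's actual models:

```python
# Minimal sketch of translate-and-refine: a first pass drafts a translation
# from text alone; a second pass revises the draft with access to the source,
# the whole draft (left and right target context), and image features.

def translate_and_refine(src, image, draft_decoder, refiner, image_encoder):
    draft = draft_decoder(src)          # text-only first-stage decoding
    visual = image_encoder(image)       # visual context features
    # conditioning on the complete draft lets every position use both
    # left- and right-side target context, in addition to the image
    return refiner(src, draft, visual)

# degenerate stand-ins just to show the interfaces
out = translate_and_refine(
    src="ein Mann spielt Gitarre",
    image="image.jpg",
    draft_decoder=lambda s: "a man plays guitar",
    image_encoder=lambda im: [0.0] * 2048,   # e.g. pooled CNN features
    refiner=lambda s, d, v: d,               # a real model would edit the draft
)
print(out)
```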
A multi-task U-net for segmentation with lazy labels
Title | A multi-task U-net for segmentation with lazy labels |
Authors | Rihuan Ke, Aurélie Bugeau, Nicolas Papadakis, Peter Schuetz, Carola-Bibiane Schönlieb |
Abstract | The need for labour-intensive pixel-wise annotation is a major limitation of many fully supervised learning methods for image segmentation. In this paper, we propose a deep convolutional neural network for multi-class segmentation that circumvents this problem by being trainable on coarse data labels combined with only a very small number of images with pixel-wise annotations. We call this new labelling strategy ‘lazy’ labels. Image segmentation is then stratified into three connected tasks: rough detection of class instances, separation of wrongly connected objects without a clear boundary, and pixel-wise segmentation to find the accurate boundaries of each object. These problems are integrated into a multi-task learning framework and the model is trained end-to-end in a semi-supervised fashion. The method is applied to a dataset of food microscopy images. We show that the model gives accurate segmentation results even if exact boundary labels are missing for a majority of the annotated data. This allows more flexibility and efficiency for training deep neural networks that are data-hungry in a practical setting where manual annotation is expensive, by collecting more lazy (rough) annotations than precisely segmented images. |
Tasks | Semantic Segmentation |
Published | 2019-06-20 |
URL | https://arxiv.org/abs/1906.12177v1 |
PDF | https://arxiv.org/pdf/1906.12177v1.pdf |
PWC | https://paperswithcode.com/paper/a-multi-task-u-net-for-segmentation-with-lazy |
Repo | |
Framework | |
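One plausible way to train on such mixed annotations is to mask each task loss by label availability; the sketch below assumes this structure (the shared `_bce` placeholder stands in for the paper's task-specific losses):

```python
import numpy as np

def _bce(p, y):
    # placeholder pixel-wise binary cross-entropy, standing in for
    # the task-specific losses used in the paper
    p = np.clip(p, 1e-7, 1 - 1e-7)
    return float(-(y * np.log(p) + (1 - y) * np.log(1 - p)).mean())

def multitask_loss(out, labels, w_detect=1.0, w_split=1.0, w_seg=1.0):
    """Sum the three task losses, keeping each term only when that kind of
    annotation exists for the sample (most samples carry only lazy labels)."""
    total = 0.0
    if labels.get("rough") is not None:      # rough instance detection
        total += w_detect * _bce(out["detect"], labels["rough"])
    if labels.get("touching") is not None:   # separating touching objects
        total += w_split * _bce(out["split"], labels["touching"])
    if labels.get("pixel") is not None:      # scarce exact pixel-wise masks
        total += w_seg * _bce(out["seg"], labels["pixel"])
    return total

# toy sample that carries only a lazy (rough) label
out = {k: np.random.default_rng(0).random((32, 32)) for k in ("detect", "split", "seg")}
print(multitask_loss(out, {"rough": np.ones((32, 32))}))
```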
Latent Unexpected and Useful Recommendation
Title | Latent Unexpected and Useful Recommendation |
Authors | Pan Li, Alexander Tuzhilin |
Abstract | Providing unexpected recommendations is an important task for recommender systems. To do this, we need to start from the expectations of users and deviate from these expectations when recommending items. Previously proposed approaches model user expectations in the feature space, limiting them to the items that the user has visited or that can be expected by deduction of association rules, without including the items that the user could also expect from the latent, complex and heterogeneous interactions between users, items and entities. In this paper, we define unexpectedness in the latent space rather than in the feature space and develop a novel Latent Convex Hull (LCH) method to provide unexpected recommendations. Extensive experiments on two real-world datasets demonstrate the effectiveness of the proposed model, which significantly outperforms alternative state-of-the-art unexpected recommendation methods in terms of unexpectedness measures while achieving the same level of accuracy. |
Tasks | Recommendation Systems |
Published | 2019-05-04 |
URL | https://arxiv.org/abs/1905.01546v1 |
PDF | https://arxiv.org/pdf/1905.01546v1.pdf |
PWC | https://paperswithcode.com/paper/latent-unexpected-and-useful-recommendation |
Repo | |
Framework | |
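One plausible reading of the latent-convex-hull idea: a candidate item is unexpected to the degree that its latent embedding lies outside the convex hull of the user's past items. The sketch below computes that distance with a small quadratic program; the details are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np
from scipy.optimize import minimize

def distance_to_hull(point, points):
    """Euclidean distance from `point` to the convex hull of `points`,
    via the QP  min ||points.T @ a - point||  s.t. a >= 0, sum(a) = 1."""
    n = len(points)
    objective = lambda a: np.sum((points.T @ a - point) ** 2)
    res = minimize(objective, np.full(n, 1.0 / n),
                   bounds=[(0.0, 1.0)] * n,
                   constraints=[{"type": "eq", "fun": lambda a: a.sum() - 1.0}])
    return float(np.sqrt(res.fun))

# unexpectedness of a candidate item = how far its latent embedding lies
# outside the hull of the user's consumed items (0 if it is inside)
history = np.random.default_rng(0).normal(size=(20, 8))  # user's item embeddings
candidate = np.ones(8)
print(distance_to_hull(candidate, history))
```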
Frequency-Aware Reconstruction of Fluid Simulations with Generative Networks
Title | Frequency-Aware Reconstruction of Fluid Simulations with Generative Networks |
Authors | Simon Biland, Vinicius C. Azevedo, Byungsoo Kim, Barbara Solenthaler |
Abstract | Convolutional neural networks were recently employed to fully reconstruct fluid simulation data from a set of reduced parameters. However, since (de-)convolutions traditionally trained with supervised L1-loss functions do not discriminate between low and high frequencies in the data, the error is not minimized efficiently for higher bands. This directly correlates with the quality of the perceived results, since missing high frequency details are easily noticeable. In this paper, we analyze the reconstruction quality of generative networks and present a frequency-aware loss function that is able to focus on specific bands of the dataset during training time. We show that our approach improves reconstruction quality of fluid simulation data in mid-frequency bands, yielding perceptually better results while requiring comparable training time. |
Tasks | |
Published | 2019-12-18 |
URL | https://arxiv.org/abs/1912.08776v1 |
PDF | https://arxiv.org/pdf/1912.08776v1.pdf |
PWC | https://paperswithcode.com/paper/frequency-aware-reconstruction-of-fluid |
Repo | |
Framework | |
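A loss of this kind can be sketched by comparing spectra and re-weighting radial frequency bands; the exact form below is an assumption for illustration, not the paper's loss:

```python
import numpy as np

def band_weighted_l1(pred, target, band_edges, band_weights):
    """Minimal sketch of a frequency-aware L1 loss: compare 2-D spectra and
    re-weight radial frequency bands so training can focus on the bands
    where the error is not minimized efficiently."""
    P, T = np.fft.fft2(pred), np.fft.fft2(target)
    h, w = pred.shape
    fy = np.fft.fftfreq(h)[:, None]
    fx = np.fft.fftfreq(w)[None, :]
    radius = np.sqrt(fx ** 2 + fy ** 2)     # radial frequency of each bin
    loss = 0.0
    for lo, hi, wgt in zip(band_edges[:-1], band_edges[1:], band_weights):
        mask = (radius >= lo) & (radius < hi)
        loss += wgt * np.abs(P[mask] - T[mask]).mean()
    return loss

pred = np.random.default_rng(0).random((64, 64))
target = np.random.default_rng(1).random((64, 64))
print(band_weighted_l1(pred, target,
                       band_edges=[0.0, 0.1, 0.3, 0.71],   # low / mid / high
                       band_weights=[1.0, 2.0, 4.0]))
```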
General Dynamic Neural Networks for explainable PID parameter tuning in control engineering: An extensive comparison
Title | General Dynamic Neural Networks for explainable PID parameter tuning in control engineering: An extensive comparison |
Authors | Johannes Günther, Elias Reichensdörfer, Patrick M. Pilarski, Klaus Diepold |
Abstract | Automation, the ability to run processes without human supervision, is one of the most important drivers of increased scalability and productivity. Modern automation largely relies on forms of closed loop control, wherein a controller interacts with a controlled process via actions, based on observations. Despite an increase in the use of machine learning for process control, most deployed controllers still are linear Proportional-Integral-Derivative (PID) controllers. PID controllers perform well on linear and near-linear systems but are not robust enough for more complex processes. As a main contribution of this paper, we examine the utility of extending standard PID controllers with General Dynamic Neural Networks (GDNN); we show that GDNN (neural) PID controllers perform well on a range of control systems and highlight what is needed to make them a stable, scalable, and interpretable option for control. To do so, we provide a comprehensive study using four different benchmark processes. All control environments are evaluated with and without noise as well as with and without disturbances. The neural PID controller performs better than standard PID control in 15 of 16 tasks and better than model-based control in 13 of 16 tasks. As a second contribution of this work, we address the Achilles’ heel that has so far prevented neural networks from being used in real-world control processes: lack of interpretability. We use bounded-input bounded-output stability analysis to evaluate the parameters suggested by the neural network, thus making them understandable for human engineers. This combination of rigorous evaluation paired with better explainability is an important step towards the acceptance of neural-network-based control approaches for real-world systems. It is furthermore an important step towards explainable and safe applied artificial intelligence. |
Tasks | |
Published | 2019-05-30 |
URL | https://arxiv.org/abs/1905.13268v1 |
PDF | https://arxiv.org/pdf/1905.13268v1.pdf |
PWC | https://paperswithcode.com/paper/general-dynamic-neural-networks-for |
Repo | |
Framework | |
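For concreteness, here is a standard discrete PID loop; in the paper's setting, the gains would be proposed by a GDNN and vetted with bounded-input bounded-output stability analysis before use. The plant below is a toy first-order system, not one of the paper's benchmarks:

```python
class PID:
    """Textbook discrete PID controller: u = kp*e + ki*integral(e) + kd*de/dt."""

    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = 0.0

    def step(self, setpoint, measurement):
        error = setpoint - measurement
        self.integral += error * self.dt
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

# toy first-order plant x' = -x + u, simulated with forward Euler
pid, x, dt = PID(kp=2.0, ki=0.5, kd=0.1, dt=0.01), 0.0, 0.01
for _ in range(1000):
    u = pid.step(setpoint=1.0, measurement=x)
    x += (-x + u) * dt
print(round(x, 3))  # should settle near the setpoint 1.0
```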
The Theory Behind Overfitting, Cross Validation, Regularization, Bagging, and Boosting: Tutorial
Title | The Theory Behind Overfitting, Cross Validation, Regularization, Bagging, and Boosting: Tutorial |
Authors | Benyamin Ghojogh, Mark Crowley |
Abstract | In this tutorial paper, we first define mean squared error, variance, covariance, and bias of both random variables and classification/predictor models. Then, we formulate the true and generalization errors of the model for both training and validation/test instances where we make use of Stein’s Unbiased Risk Estimator (SURE). We define overfitting, underfitting, and generalization using the obtained true and generalization errors. We introduce cross validation and two well-known examples, $K$-fold and leave-one-out cross validation. We briefly introduce generalized cross validation and then move on to regularization where we use SURE again. We work on both $\ell_2$ and $\ell_1$ norm regularizations. Then, we show that bootstrap aggregating (bagging) reduces the variance of estimation. Boosting, specifically AdaBoost, is introduced and explained as both an additive model and a maximum margin model, i.e., a Support Vector Machine (SVM). The upper bound on the generalization error of boosting is also provided to show why boosting prevents overfitting. As examples of regularization, the theory of ridge and lasso regressions, weight decay, noise injection to input/weights, and early stopping are explained. Random forest, dropout, histogram of oriented gradients, and single shot multi-box detector are explained as examples of bagging in machine learning and computer vision. Finally, boosting tree and SVM models are mentioned as examples of boosting. |
Tasks | |
Published | 2019-05-28 |
URL | https://arxiv.org/abs/1905.12787v1 |
PDF | https://arxiv.org/pdf/1905.12787v1.pdf |
PWC | https://paperswithcode.com/paper/the-theory-behind-overfitting-cross |
Repo | |
Framework | |
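As a concrete instance of two of the tutorial's topics, the sketch below selects the ridge ($\ell_2$) penalty by $K$-fold cross validation, using the closed-form ridge solution; the toy data are for illustration only:

```python
import numpy as np

def kfold_cv_ridge(X, y, lambdas, k=5, seed=0):
    """Pick the ridge penalty by K-fold cross validation: for each lambda,
    average the validation MSE across folds and keep the minimizer."""
    idx = np.random.default_rng(seed).permutation(len(y))
    folds = np.array_split(idx, k)
    errors = []
    for lam in lambdas:
        fold_err = []
        for i in range(k):
            val = folds[i]
            tr = np.concatenate([folds[j] for j in range(k) if j != i])
            # closed-form ridge: w = (X^T X + lam I)^{-1} X^T y
            A = X[tr].T @ X[tr] + lam * np.eye(X.shape[1])
            w = np.linalg.solve(A, X[tr].T @ y[tr])
            fold_err.append(np.mean((X[val] @ w - y[val]) ** 2))
        errors.append(np.mean(fold_err))
    return lambdas[int(np.argmin(errors))]

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
y = X @ rng.normal(size=10) + 0.5 * rng.normal(size=100)
print(kfold_cv_ridge(X, y, lambdas=[0.01, 0.1, 1.0, 10.0]))
```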
Understanding Memory Modules on Learning Simple Algorithms
Title | Understanding Memory Modules on Learning Simple Algorithms |
Authors | Kexin Wang, Yu Zhou, Shaonan Wang, Jiajun Zhang, Chengqing Zong |
Abstract | Recent work has shown that memory modules are crucial for the generalization ability of neural networks on learning simple algorithms. However, we still have little understanding of the working mechanism of memory modules. To alleviate this problem, we apply a two-step analysis pipeline: first inferring a hypothesis about what strategy the model has learned from visualization, and then verifying it with a newly proposed qualitative analysis method based on dimension reduction. Using this method, we analyze two popular memory-augmented neural networks, the neural Turing machine and the stack-augmented neural network, on two simple algorithmic tasks: reversing a random sequence and evaluating arithmetic expressions. Results show that on the former task both models learn to generalize, while on the latter only the stack-augmented model does so. We show that the models learn different strategies, in which specific categories of input are monitored and different policies for changing the memory are chosen accordingly. |
Tasks | Dimensionality Reduction |
Published | 2019-07-01 |
URL | https://arxiv.org/abs/1907.00820v1 |
PDF | https://arxiv.org/pdf/1907.00820v1.pdf |
PWC | https://paperswithcode.com/paper/understanding-memory-modules-on-learning |
Repo | |
Framework | |
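The dimension-reduction step of such an analysis can be as simple as projecting the collected memory states onto their top principal components; a minimal sketch (PCA via SVD, with details assumed rather than taken from the paper):

```python
import numpy as np

def project_memory_states(memory_states, k=2):
    """Project memory vectors collected over timesteps onto their top-k
    principal components, so read/write strategies can be inspected visually."""
    M = np.asarray(memory_states, dtype=float)
    M = M - M.mean(axis=0)                 # center before PCA
    _, _, vt = np.linalg.svd(M, full_matrices=False)
    return M @ vt[:k].T                    # (timesteps, k) coordinates

# e.g. 50 timesteps of a 128-dim memory read vector from an NTM-style model
states = np.random.default_rng(0).normal(size=(50, 128))
coords = project_memory_states(states)
print(coords.shape)  # (50, 2): ready to plot and compare across inputs
```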
Predicting Human Activities from User-Generated Content
Title | Predicting Human Activities from User-Generated Content |
Authors | Steven R. Wilson, Rada Mihalcea |
Abstract | The activities we do are linked to our interests, personality, political preferences, and decisions we make about the future. In this paper, we explore the task of predicting human activities from user-generated content. We collect a dataset containing instances of social media users writing about a range of everyday activities. We then use a state-of-the-art sentence embedding framework tailored to recognize the semantics of human activities and perform an automatic clustering of these activities. We train a neural network model to make predictions about which clusters contain activities that were performed by a given user based on the text of their previous posts and self-description. Additionally, we explore the degree to which incorporating inferred user traits into our model helps with this prediction task. |
Tasks | Sentence Embedding |
Published | 2019-07-19 |
URL | https://arxiv.org/abs/1907.08540v1 |
PDF | https://arxiv.org/pdf/1907.08540v1.pdf |
PWC | https://paperswithcode.com/paper/predicting-human-activities-from-user |
Repo | |
Framework | |
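The pipeline can be sketched as: embed activity phrases, cluster them, then train a per-cluster classifier on user profile text. Everything below is a stand-in (notably the random "embedder", which replaces the tailored sentence-embedding framework), just to show the shape of the approach:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
embed = lambda texts: rng.normal(size=(len(texts), 64))  # placeholder embeddings

# step 1: cluster activity phrases in embedding space
activities = ["went for a run", "baked bread", "played chess", "hiked a trail"]
clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit(embed(activities))

# step 2: per-cluster binary classifier, predicting from a user's posts and
# self-description whether they performed activities in that cluster
profiles = ["I love the outdoors and trail running", "board games every weekend"]
y = np.array([1, 0])  # did this user report an activity from cluster 0?
clf = LogisticRegression().fit(embed(profiles), y)
print(clf.predict(embed(["weekend hikes and morning jogs"])))
```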
VC Classes are Adversarially Robustly Learnable, but Only Improperly
Title | VC Classes are Adversarially Robustly Learnable, but Only Improperly |
Authors | Omar Montasser, Steve Hanneke, Nathan Srebro |
Abstract | We study the question of learning an adversarially robust predictor. We show that any hypothesis class $\mathcal{H}$ with finite VC dimension is robustly PAC learnable with an improper learning rule. The requirement of being improper is necessary as we exhibit examples of hypothesis classes $\mathcal{H}$ with finite VC dimension that are not robustly PAC learnable with any proper learning rule. |
Tasks | |
Published | 2019-02-12 |
URL | https://arxiv.org/abs/1902.04217v2 |
PDF | https://arxiv.org/pdf/1902.04217v2.pdf |
PWC | https://paperswithcode.com/paper/vc-classes-are-adversarially-robustly |
Repo | |
Framework | |
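For reference, the robust (adversarial) risk underlying the robust-PAC-learnability statement can be written as follows; the notation is the standard one for this line of work, not copied from the paper:

```latex
% Robust risk of h with respect to a perturbation set \mathcal{U}(x),
% e.g. an \ell_\infty ball around x:
R_{\mathcal{U}}(h; \mathcal{D})
  = \mathbb{E}_{(x,y) \sim \mathcal{D}}
    \Big[ \sup_{z \in \mathcal{U}(x)} \mathbb{1}\{ h(z) \neq y \} \Big]
```

Robust PAC learning then asks for a learning rule, not necessarily outputting a member of $\mathcal{H}$ (hence "improper"), whose output $f$ satisfies $R_{\mathcal{U}}(f;\mathcal{D}) \le \min_{h \in \mathcal{H}} R_{\mathcal{U}}(h;\mathcal{D}) + \epsilon$ with high probability.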
Video Interpolation and Prediction with Unsupervised Landmarks
Title | Video Interpolation and Prediction with Unsupervised Landmarks |
Authors | Kevin J. Shih, Aysegul Dundar, Animesh Garg, Robert Pottorf, Andrew Tao, Bryan Catanzaro |
Abstract | Prediction and interpolation for long-range video data involves the complex task of modeling motion trajectories for each visible object, occlusions and dis-occlusions, as well as appearance changes due to viewpoint and lighting. Optical flow based techniques generalize but are suitable only for short temporal ranges. Many methods opt to project the video frames to a low dimensional latent space, achieving long-range predictions. However, these latent representations are often non-interpretable, and therefore difficult to manipulate. This work poses video prediction and interpolation as unsupervised latent structure inference followed by a temporal prediction in this latent space. The latent representations capture foreground semantics without explicit supervision such as keypoints or poses. Further, as each landmark can be mapped to a coordinate indicating where a semantic part is positioned, we can reliably interpolate within the coordinate domain to achieve predictable motion interpolation. Given an image decoder capable of mapping these landmarks back to the image domain, we are able to achieve high-quality long-range video interpolation and extrapolation by operating on the landmark representation space. |
Tasks | Optical Flow Estimation, Video Prediction |
Published | 2019-09-06 |
URL | https://arxiv.org/abs/1909.02749v1 |
PDF | https://arxiv.org/pdf/1909.02749v1.pdf |
PWC | https://paperswithcode.com/paper/video-interpolation-and-prediction-with |
Repo | |
Framework | |
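Because each landmark is a 2D coordinate indicating where a semantic part is positioned, in-between frames can be produced by interpolating coordinates and decoding back to images; a minimal sketch with a stand-in decoder:

```python
import numpy as np

def interpolate_landmarks(lm_a, lm_b, num_frames, decoder):
    """Minimal sketch: interpolate landmark coordinates between two frames
    and decode each intermediate set back to an image. `decoder` stands in
    for the learned image decoder described in the abstract."""
    frames = []
    for t in np.linspace(0.0, 1.0, num_frames):
        lm_t = (1.0 - t) * lm_a + t * lm_b   # linear motion in landmark space
        frames.append(decoder(lm_t))
    return frames

# toy usage: 10 landmarks, identity "decoder" that just returns coordinates
lm_a = np.zeros((10, 2))
lm_b = np.ones((10, 2))
print(len(interpolate_landmarks(lm_a, lm_b, 5, decoder=lambda lm: lm)))
```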