October 20, 2019

2941 words 14 mins read

Paper Group AWR 323

Understanding Learned Models by Identifying Important Features at the Right Resolution. Plug-in Regularized Estimation of High-Dimensional Parameters in Nonlinear Semiparametric Models. Evaluation of Unsupervised Compositional Representations. Underwater Fish Detection using Deep Learning for Water Power Applications. Adaptive Semi-supervised Learn …

Understanding Learned Models by Identifying Important Features at the Right Resolution

Title Understanding Learned Models by Identifying Important Features at the Right Resolution
Authors Kyubin Lee, Akshay Sood, Mark Craven
Abstract In many application domains, it is important to characterize how complex learned models make their decisions across the distribution of instances. One way to do this is to identify the features and interactions among them that contribute to a model’s predictive accuracy. We present a model-agnostic approach to this task that makes the following specific contributions. Our approach (i) tests feature groups, in addition to base features, and tries to determine the level of resolution at which important features can be determined, (ii) uses hypothesis testing to rigorously assess the effect of each feature on the model’s loss, (iii) employs a hierarchical approach to control the false discovery rate when testing feature groups and individual base features for importance, and (iv) uses hypothesis testing to identify important interactions among features and feature groups. We evaluate our approach by analyzing random forest and LSTM neural network models learned in two challenging biomedical applications.
Tasks
Published 2018-11-18
URL http://arxiv.org/abs/1811.07279v2
PDF http://arxiv.org/pdf/1811.07279v2.pdf
PWC https://paperswithcode.com/paper/understanding-learned-models-by-identifying
Repo https://github.com/Craven-Biostat-Lab/mihifepe
Framework none
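
To make the testing procedure above concrete, here is a minimal, model-agnostic sketch of permutation-based group-importance testing with FDR control. It is an illustrative stand-in for the paper's mihifepe implementation: it tests feature groups at a single level of resolution and uses flat Benjamini-Hochberg in place of the hierarchical FDR procedure, and all function names are ours.

```python
# Illustrative sketch only: single-level group testing with flat BH,
# not the paper's hierarchical procedure or interaction tests.
import numpy as np
from scipy import stats

def group_importance(model_predict, X, y, groups, alpha=0.05, seed=0):
    """groups: dict mapping group name -> list of column indices."""
    rng = np.random.default_rng(seed)
    base_loss = (model_predict(X) - y) ** 2            # per-instance squared loss
    pvals = {}
    for name, cols in groups.items():
        Xp = X.copy()
        Xp[:, cols] = rng.permutation(Xp[:, cols], axis=0)  # break feature-label link
        perm_loss = (model_predict(Xp) - y) ** 2
        # one-sided paired test: does permuting the group increase the loss?
        _, p = stats.ttest_rel(perm_loss, base_loss, alternative='greater')
        pvals[name] = p
    # Benjamini-Hochberg step-up over the tested groups
    names = sorted(pvals, key=pvals.get)
    m = len(names)
    important = set()
    for i, name in enumerate(names, start=1):
        if pvals[name] <= alpha * i / m:
            important.update(names[:i])
    return important, pvals
```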

Plug-in Regularized Estimation of High-Dimensional Parameters in Nonlinear Semiparametric Models

Title Plug-in Regularized Estimation of High-Dimensional Parameters in Nonlinear Semiparametric Models
Authors Victor Chernozhukov, Denis Nekipelov, Vira Semenova, Vasilis Syrgkanis
Abstract We propose an $\ell_1$-regularized M-estimator for a high-dimensional sparse parameter that is identified by a class of semiparametric conditional moment restrictions (CMR). We estimate the nonparametric nuisance parameter by modern machine learning methods. Plugging the first-stage estimate into the CMR, we construct the M-estimator loss function for the target parameter so that its gradient is insensitive (formally, Neyman-orthogonal) to the first-stage regularization bias. As a result, the estimator achieves the oracle convergence rate $\sqrt{k \log p / n}$, where the oracle knows the true first stage and solves only a parametric problem. We apply our results to conditional moment models with missing data, games of incomplete information, and treatment effects in regression models with nonlinear link functions.
Tasks
Published 2018-06-13
URL https://arxiv.org/abs/1806.04823v4
PDF https://arxiv.org/pdf/1806.04823v4.pdf
PWC https://paperswithcode.com/paper/plug-in-regularized-estimation-of-high
Repo https://github.com/Microsoft/EconML
Framework none
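
As a concrete instance of the plug-in idea, here is a minimal sketch for the simplest semiparametric example, the partially linear model $y = x'\theta + g(w) + \varepsilon$: residualizing $y$ and each coordinate of $x$ on $w$ with a first-stage ML method makes the second-stage lasso loss insensitive to first-stage bias. This is an illustrative sketch in the spirit of the paper, not the EconML implementation.

```python
# Illustrative two-fold cross-fitted sketch; not the EconML API.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Lasso

def orthogonal_plugin_lasso(y, X, W, alpha=0.01):
    n = len(y)
    half = n // 2                                   # simple two-fold cross-fitting
    y_res = np.empty_like(y)
    X_res = np.empty_like(X)
    for train, test in [(slice(0, half), slice(half, n)),
                        (slice(half, n), slice(0, half))]:
        # first stage: ML estimates of the nuisances E[y|w] and E[x_j|w]
        y_res[test] = y[test] - RandomForestRegressor().fit(
            W[train], y[train]).predict(W[test])
        for j in range(X.shape[1]):
            X_res[test, j] = X[test, j] - RandomForestRegressor().fit(
                W[train], X[train, j]).predict(W[test])
    # second stage: l1-regularized M-estimation on the residualized data
    return Lasso(alpha=alpha).fit(X_res, y_res).coef_
```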

Evaluation of Unsupervised Compositional Representations

Title Evaluation of Unsupervised Compositional Representations
Authors Hanan Aldarmaki, Mona Diab
Abstract We evaluated various compositional models, from bag-of-words representations to compositional RNN-based models, on several extrinsic supervised and unsupervised evaluation benchmarks. Our results confirm that weighted vector averaging can outperform context-sensitive models in most benchmarks, but structural features encoded in RNN models can also be useful in certain classification tasks. We analyzed some of the evaluation datasets to identify the aspects of meaning they measure and the characteristics of the various models that explain their performance variance.
Tasks
Published 2018-06-12
URL http://arxiv.org/abs/1806.04713v2
PDF http://arxiv.org/pdf/1806.04713v2.pdf
PWC https://paperswithcode.com/paper/evaluation-of-unsupervised-compositional
Repo https://github.com/h-aldarmaki/sentence_eval
Framework none
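
For reference, a minimal sketch of the weighted vector averaging baseline that the abstract finds competitive. The frequency-based weighting $a / (a + p(w))$ below follows the common SIF scheme and is an assumption on our part, not necessarily the exact weighting the paper evaluates.

```python
# Illustrative weighted averaging of word vectors; SIF-style weights assumed.
import numpy as np

def sentence_vector(tokens, embeddings, word_freq, a=1e-3):
    """embeddings: dict word -> np.ndarray; word_freq: dict word -> relative frequency."""
    vecs, weights = [], []
    for w in tokens:
        if w in embeddings:
            vecs.append(embeddings[w])
            weights.append(a / (a + word_freq.get(w, 0.0)))  # down-weight frequent words
    if not vecs:
        return None
    V = np.stack(vecs)
    wts = np.asarray(weights)[:, None]
    return (wts * V).sum(axis=0) / wts.sum()         # weighted average of word vectors
```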

Underwater Fish Detection using Deep Learning for Water Power Applications

Title Underwater Fish Detection using Deep Learning for Water Power Applications
Authors Wenwei Xu, Shari Matzner
Abstract Clean energy from oceans and rivers is becoming a reality with the development of new technologies like tidal and instream turbines that generate electricity from naturally flowing water. These new technologies are being monitored for effects on fish and other wildlife using underwater video. Methods for automated analysis of underwater video are needed to lower the costs of analysis and improve accuracy. A deep learning model, YOLO, was trained to recognize fish in underwater video using three very different datasets recorded at real-world water power sites. Training and testing with examples from all three datasets resulted in a mean average precision (mAP) score of 0.5392. To test how well a model could generalize to new datasets, the model was trained using examples from only two of the datasets and then tested on examples from all three. The resulting model could not recognize fish in the dataset that was excluded from training. The mAP scores on the other two datasets, which were included in the training set, were higher than the scores achieved by the model trained on all three datasets. These results indicate that different methods are needed to produce a trained model that can generalize to new datasets such as those encountered in real-world applications.
Tasks Fish Detection
Published 2018-11-05
URL http://arxiv.org/abs/1811.01494v1
PDF http://arxiv.org/pdf/1811.01494v1.pdf
PWC https://paperswithcode.com/paper/underwater-fish-detection-using-deep-learning
Repo https://github.com/wenweixu/keras-yolo3
Framework none

Adaptive Semi-supervised Learning for Cross-domain Sentiment Classification

Title Adaptive Semi-supervised Learning for Cross-domain Sentiment Classification
Authors Ruidan He, Wee Sun Lee, Hwee Tou Ng, Daniel Dahlmeier
Abstract We consider the cross-domain sentiment classification problem, where a sentiment classifier is to be learned from a source domain and generalized to a target domain. Our approach explicitly minimizes the distance between the source and the target instances in an embedded feature space. With the difference between source and target minimized, we then exploit additional information from the target domain via semi-supervised learning, for which we jointly employ two regularizations, entropy minimization and self-ensemble bootstrapping, to incorporate the unlabeled target data for classifier refinement. Our experimental results demonstrate that the proposed approach can better leverage unlabeled data from the target domain and achieve substantial improvements over baseline methods in various experimental settings.
Tasks Sentiment Analysis
Published 2018-09-03
URL http://arxiv.org/abs/1809.00530v1
PDF http://arxiv.org/pdf/1809.00530v1.pdf
PWC https://paperswithcode.com/paper/adaptive-semi-supervised-learning-for-cross
Repo https://github.com/ruidan/DAS
Framework tf
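
A minimal numpy sketch of the two target-domain regularizers the abstract names: entropy minimization on unlabeled target predictions, and a self-ensemble bootstrapping (consistency) term against an exponential moving average of past predictions. All names are illustrative and not taken from the authors' DAS code.

```python
# Illustrative loss terms only; training loop and networks omitted.
import numpy as np

def target_regularizers(probs, ensemble_probs, eps=1e-8):
    """probs: (n, k) current softmax outputs on unlabeled target data.
    ensemble_probs: (n, k) exponential moving average of past predictions."""
    # entropy minimization: push target predictions toward confident one-hots
    entropy = -np.sum(probs * np.log(probs + eps), axis=1).mean()
    # self-ensemble bootstrapping: match the slowly varying ensemble targets
    consistency = np.sum((probs - ensemble_probs) ** 2, axis=1).mean()
    return entropy, consistency

def update_ensemble(ensemble_probs, probs, momentum=0.6):
    # EMA update of the bootstrapped targets between training rounds
    return momentum * ensemble_probs + (1.0 - momentum) * probs
```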

Highway State Gating for Recurrent Highway Networks: improving information flow through time

Title Highway State Gating for Recurrent Highway Networks: improving information flow through time
Authors Ron Shoham, Haim Permuter
Abstract Recurrent Neural Networks (RNNs) play a major role in the field of sequential learning, and have outperformed traditional algorithms on many benchmarks. Training deep RNNs still remains a challenge, and most of the state-of-the-art models are structured with a transition depth of 2-4 layers. Recurrent Highway Networks (RHNs) were introduced to tackle this issue, and have achieved state-of-the-art performance on a few benchmarks using a depth of 10 layers. However, the performance of this architecture suffers from a bottleneck and ceases to improve when more layers are added. In this work, we analyze the causes of this and postulate that the main source is the way information flows through time. We introduce a novel and simple variation of the RHN cell, called Highway State Gating (HSG), which allows adding more layers while continuing to improve performance. By using a gating mechanism for the state, we allow the net to “choose” whether to pass information directly through time or to gate it. This mechanism also allows the gradient to back-propagate directly through time and therefore results in slightly faster convergence. We use the Penn Treebank (PTB) dataset as a platform for an empirical proof of concept. Empirical results show that the improvement due to Highway State Gating holds at all depths, and grows as the depth increases.
Tasks
Published 2018-05-23
URL http://arxiv.org/abs/1805.09238v1
PDF http://arxiv.org/pdf/1805.09238v1.pdf
PWC https://paperswithcode.com/paper/highway-state-gating-for-recurrent-highway
Repo https://github.com/KurochkinAlexey/Hierarchical-Attention-Based-Recurrent-Highway-Networks-for-Time-Series-Prediction
Framework pytorch
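
A minimal sketch of the Highway State Gating idea: a learned gate decides, per unit, whether to pass the previous state straight through time or to take the new RHN cell output. Shapes and parameter names here are illustrative assumptions, not the authors' exact cell.

```python
# Illustrative single HSG timestep; the RHN transition is left abstract.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def hsg_step(s_prev, rhn_cell, x_t, Wg, Ug, bg):
    """s_prev: (d,) previous state; rhn_cell: callable (x_t, s_prev) -> (d,) candidate."""
    s_cand = rhn_cell(x_t, s_prev)                 # deep RHN transition
    g = sigmoid(Wg @ x_t + Ug @ s_prev + bg)       # state gate in [0, 1]^d
    # gated mix: g carries the old state directly through time,
    # giving the gradient a direct path across timesteps
    return g * s_prev + (1.0 - g) * s_cand
```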

Towards Neural Machine Translation for African Languages

Title Towards Neural Machine Translation for African Languages
Authors Jade Z. Abbott, Laura Martinus
Abstract Given that South African education is in crisis, strategies for improvement and sustainability of high-quality, up-to-date education must be explored. In the migration of education online, inclusion of machine translation for low-resourced local languages becomes necessary. This paper aims to spur the use of current neural machine translation (NMT) techniques for low-resourced local languages. The paper demonstrates state-of-the-art performance on English-to-Setswana translation using the Autshumato dataset. The use of the Transformer architecture beat previous techniques by 5.33 BLEU points. This demonstrates the promise of using current NMT techniques for African languages.
Tasks Machine Translation
Published 2018-11-13
URL http://arxiv.org/abs/1811.05467v1
PDF http://arxiv.org/pdf/1811.05467v1.pdf
PWC https://paperswithcode.com/paper/towards-neural-machine-translation-for
Repo https://github.com/LauraMartinus/ukuxhumana
Framework tf

An Online Plug-and-Play Algorithm for Regularized Image Reconstruction

Title An Online Plug-and-Play Algorithm for Regularized Image Reconstruction
Authors Yu Sun, Brendt Wohlberg, Ulugbek S. Kamilov
Abstract Plug-and-play priors (PnP) is a powerful framework for regularizing imaging inverse problems by using advanced denoisers within an iterative algorithm. Recent experimental evidence suggests that PnP algorithms achieve state-of-the-art performance in a range of imaging applications. In this paper, we introduce a new online PnP algorithm based on the iterative shrinkage/thresholding algorithm (ISTA). The proposed algorithm uses only a subset of measurements at every iteration, which makes it scalable to very large datasets. We present a new theoretical convergence analysis, for both batch and online variants of PnP-ISTA, for denoisers that do not necessarily correspond to proximal operators. We also present simulations illustrating the applicability of the algorithm to image reconstruction in diffraction tomography. The results in this paper have the potential to expand the applicability of the PnP framework to very large and redundant datasets.
Tasks Image Reconstruction
Published 2018-09-12
URL http://arxiv.org/abs/1809.04693v1
PDF http://arxiv.org/pdf/1809.04693v1.pdf
PWC https://paperswithcode.com/paper/an-online-plug-and-play-algorithm-for
Repo https://github.com/sunyumark/2019-TCI-OnlinePnP
Framework none
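
A minimal sketch of the online PnP-ISTA iteration described in the abstract: an ISTA-style update in which the proximal step is replaced by an arbitrary denoiser, and each iteration uses the data-fit gradient over only a random subset of measurements. The operator and denoiser interfaces are illustrative assumptions.

```python
# Illustrative online PnP-ISTA for a linear forward model; not the authors' code.
import numpy as np

def online_pnp_ista(y, A, denoise, step=1e-3, batch=64, iters=500, seed=0):
    """y: (m,) measurements; A: (m, n) forward operator; denoise: callable x -> x."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    x = np.zeros(n)
    for _ in range(iters):
        idx = rng.choice(m, size=batch, replace=False)  # random measurement subset
        Ab, yb = A[idx], y[idx]
        grad = (m / batch) * Ab.T @ (Ab @ x - yb)       # unbiased gradient estimate
        x = denoise(x - step * grad)                    # denoiser replaces the prox
    return x
```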

Active Learning with Partial Feedback

Title Active Learning with Partial Feedback
Authors Peiyun Hu, Zachary C. Lipton, Anima Anandkumar, Deva Ramanan
Abstract While many active learning papers assume that the learner can simply ask for a label and receive it, real annotation often presents a mismatch between the form of a label (say, one among many classes) and the form of an annotation (typically yes/no binary feedback). To annotate example corpora for multiclass classification, we might need to ask multiple yes/no questions, exploiting a label hierarchy if one is available. To address this more realistic setting, we propose active learning with partial feedback (ALPF), where the learner must actively choose both which example to label and which binary question to ask. At each step, the learner selects an example, asking if it belongs to a chosen (possibly composite) class. Each answer eliminates some classes, leaving the learner with a partial label. The learner may then either ask more questions about the same example (until an exact label is uncovered) or move on immediately, leaving the first example partially labeled. Active learning with partial labels requires (i) a sampling strategy to choose (example, class) pairs, and (ii) learning from partial labels between rounds. Experiments on Tiny ImageNet demonstrate that our most effective method improves top-1 classification accuracy by 26% (relative) over i.i.d. baselines and standard active learners, given 30% of the annotation budget that would (naively) be required to annotate the dataset. Moreover, ALPF learners fully annotate Tiny ImageNet at 42% lower cost. Surprisingly, we observe that accounting for per-example annotation costs can alter the conventional wisdom that active learners should solicit labels for hard examples.
Tasks Active Learning
Published 2018-02-21
URL https://arxiv.org/abs/1802.07427v4
PDF https://arxiv.org/pdf/1802.07427v4.pdf
PWC https://paperswithcode.com/paper/active-learning-with-partial-feedback
Repo https://github.com/peiyunh/alpf
Framework tf
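
A minimal sketch of one round of the ALPF interaction loop: given the model's posterior for an example, pick the (possibly composite) class whose yes/no answer is most informative, then shrink the candidate label set accordingly. Selection by binary entropy is one reasonable strategy; the paper studies several, so treat this as illustrative.

```python
# Illustrative ALPF round; the sampling strategy is an assumption, not the paper's.
import numpy as np

def alpf_round(probs, candidates, composites, oracle, i):
    """probs: (k,) model posterior for example i; candidates: set of still-possible
    labels; composites: list of frozensets of class indices;
    oracle(i, c) -> True iff the true label of example i lies in c."""
    best_q, best_gain = None, -1.0
    p_cand = sum(probs[j] for j in candidates)
    for c in composites:
        q = c & candidates
        if not q or q == candidates:
            continue                                   # question gives no information
        p_yes = sum(probs[j] for j in q) / max(p_cand, 1e-12)
        gain = -(p_yes * np.log(p_yes + 1e-12)
                 + (1 - p_yes) * np.log(1 - p_yes + 1e-12))  # binary entropy
        if gain > best_gain:
            best_q, best_gain = q, gain
    if best_q is None:
        return candidates                              # no informative question left
    answer = oracle(i, best_q)                         # yes/no partial feedback
    return candidates & best_q if answer else candidates - best_q
```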

A Simple Unified Framework for Detecting Out-of-Distribution Samples and Adversarial Attacks

Title A Simple Unified Framework for Detecting Out-of-Distribution Samples and Adversarial Attacks
Authors Kimin Lee, Kibok Lee, Honglak Lee, Jinwoo Shin
Abstract Detecting test samples drawn sufficiently far away from the training distribution, statistically or adversarially, is a fundamental requirement for deploying a good classifier in many real-world machine learning applications. However, deep neural networks with the softmax classifier are known to produce highly overconfident posterior distributions even for such abnormal samples. In this paper, we propose a simple yet effective method for detecting any abnormal samples that is applicable to any pre-trained softmax neural classifier. We obtain class-conditional Gaussian distributions with respect to (low- and upper-level) features of the deep models under Gaussian discriminant analysis, which yield a confidence score based on the Mahalanobis distance. While most prior methods have been evaluated for detecting either out-of-distribution or adversarial samples but not both, the proposed method achieves state-of-the-art performance in both cases in our experiments. Moreover, we found that the proposed method is more robust in harsh cases, e.g., when the training dataset has noisy labels or only a small number of samples. Finally, we show that the proposed method enjoys broader usage by applying it to class-incremental learning: whenever out-of-distribution samples are detected, our classification rule can incorporate new classes well without further training of the deep models.
Tasks
Published 2018-07-10
URL http://arxiv.org/abs/1807.03888v2
PDF http://arxiv.org/pdf/1807.03888v2.pdf
PWC https://paperswithcode.com/paper/a-simple-unified-framework-for-detecting-out
Repo https://github.com/pokaxpoka/deep_Mahalanobis_detector
Framework pytorch
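
A minimal sketch of the Mahalanobis confidence score: fit class-conditional Gaussians with a shared (tied) covariance to deep features, then score a test feature by its distance to the closest class mean. Feature extraction, input preprocessing, and the paper's ensembling over multiple layers are omitted.

```python
# Illustrative confidence score on pre-extracted features; not the authors' pipeline.
import numpy as np

def fit_gaussians(feats, labels, n_classes):
    """feats: (n, d) training features; returns per-class means and shared precision."""
    means = np.stack([feats[labels == c].mean(axis=0) for c in range(n_classes)])
    centered = feats - means[labels]                   # pool within-class scatter
    cov = centered.T @ centered / len(feats)
    precision = np.linalg.inv(cov + 1e-6 * np.eye(cov.shape[0]))
    return means, precision

def mahalanobis_score(f, means, precision):
    diffs = means - f                                  # (k, d) offsets to each class mean
    d2 = np.einsum('kd,de,ke->k', diffs, precision, diffs)
    return -d2.min()                                   # higher score = more in-distribution
```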

Stochastic Gradient Push for Distributed Deep Learning

Title Stochastic Gradient Push for Distributed Deep Learning
Authors Mahmoud Assran, Nicolas Loizou, Nicolas Ballas, Michael Rabbat
Abstract Distributed data-parallel algorithms aim to accelerate the training of deep neural networks by parallelizing the computation of large mini-batch gradient updates across multiple nodes. Approaches that synchronize nodes using exact distributed averaging (e.g., via AllReduce) are sensitive to stragglers and communication delays. The PushSum gossip algorithm is robust to these issues, but only performs approximate distributed averaging. This paper studies Stochastic Gradient Push (SGP), which combines PushSum with stochastic gradient updates. We prove that SGP converges to a stationary point of smooth, non-convex objectives at the same sub-linear rate as SGD, and that all nodes achieve consensus. We empirically validate the performance of SGP on image classification (ResNet-50, ImageNet) and machine translation (Transformer, WMT’16 En-De) workloads. Our code will be made publicly available.
Tasks Image Classification, Machine Translation
Published 2018-11-27
URL https://arxiv.org/abs/1811.10792v3
PDF https://arxiv.org/pdf/1811.10792v3.pdf
PWC https://paperswithcode.com/paper/stochastic-gradient-push-for-distributed-deep
Repo https://github.com/facebookresearch/stochastic_gradient_push
Framework pytorch
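
A minimal single-process simulation of an SGP step: each node takes a local stochastic gradient step, then mixes its push-sum numerator and weight with its neighbors' via a column-stochastic mixing matrix; the de-biased parameter each node actually uses is numerator divided by weight. The toy setup below stands in for the distributed implementation.

```python
# Illustrative simulation of one SGP round; topology and gradients are toy stand-ins.
import numpy as np

def sgp_step(X, w, grads, P, lr):
    """X: (nodes, dim) push-sum numerators; w: (nodes,) push-sum weights;
    grads: (nodes, dim) local stochastic gradients; P: (nodes, nodes)
    column-stochastic mixing matrix (P[i, j]: share node j sends to node i)."""
    X = X - lr * grads           # local stochastic gradient step on the numerator
    X = P @ X                    # push-sum gossip: mix numerators along out-edges
    w = P @ w                    # ...and the push-sum weights
    return X, w, X / w[:, None]  # de-biased parameters each node actually uses
```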

Generalizing Word Embeddings using Bag of Subwords

Title Generalizing Word Embeddings using Bag of Subwords
Authors Jinman Zhao, Sidharth Mudgal, Yingyu Liang
Abstract We approach the problem of generalizing pre-trained word embeddings beyond fixed-size vocabularies without using additional contextual information. We propose a subword-level word vector generation model that views words as bags of character $n$-grams. The model is simple, fast to train, and provides good vectors for rare or unseen words. Experiments show that our model achieves state-of-the-art performance on an English word similarity task and on joint prediction of part-of-speech tags and morphosyntactic attributes in 23 languages, suggesting our model’s ability to capture the relationship between words’ textual representations and their embeddings.
Tasks Word Embeddings
Published 2018-09-12
URL http://arxiv.org/abs/1809.04259v1
PDF http://arxiv.org/pdf/1809.04259v1.pdf
PWC https://paperswithcode.com/paper/generalizing-word-embeddings-using-bag-of
Repo https://github.com/jmzhao/bag-of-substring-embedder
Framework none
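
A minimal sketch of the bag-of-character-$n$-grams view: a word's vector is the average of learned embeddings of its character $n$-grams (with boundary markers). Training those $n$-gram embeddings to regress pre-trained word vectors is the part this sketch omits; the boundary markers and $n$-gram range mirror common practice rather than the paper's exact settings.

```python
# Illustrative out-of-vocabulary vector from character n-gram embeddings.
import numpy as np

def char_ngrams(word, n_min=3, n_max=6):
    padded = '<' + word + '>'                      # mark word boundaries
    return [padded[i:i + n] for n in range(n_min, n_max + 1)
            for i in range(len(padded) - n + 1)]

def oov_vector(word, ngram_emb, dim):
    """ngram_emb: dict n-gram -> (dim,) learned embedding."""
    grams = [g for g in char_ngrams(word) if g in ngram_emb]
    if not grams:
        return np.zeros(dim)
    return np.mean([ngram_emb[g] for g in grams], axis=0)  # bag-of-subwords average
```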

Learning Conditioned Graph Structures for Interpretable Visual Question Answering

Title Learning Conditioned Graph Structures for Interpretable Visual Question Answering
Authors Will Norcliffe-Brown, Efstathios Vafeias, Sarah Parisot
Abstract Visual Question Answering is a challenging problem requiring a combination of concepts from Computer Vision and Natural Language Processing. Most existing approaches use a two-stream strategy, computing image and question features that are subsequently merged using a variety of techniques. Nonetheless, very few rely on higher-level image representations, which can capture semantic and spatial relationships. In this paper, we propose a novel graph-based approach for Visual Question Answering. Our method combines a graph learner module, which learns a question-specific graph representation of the input image, with the recent concept of graph convolutions, aiming to learn image representations that capture question-specific interactions. We test our approach on the VQA v2 dataset using a simple baseline architecture enhanced by the proposed graph learner module. We obtain promising results with 66.18% accuracy and demonstrate the interpretability of the proposed method. Code can be found at github.com/aimbrain/vqa-project.
Tasks Question Answering, Visual Question Answering
Published 2018-06-19
URL http://arxiv.org/abs/1806.07243v6
PDF http://arxiv.org/pdf/1806.07243v6.pdf
PWC https://paperswithcode.com/paper/learning-conditioned-graph-structures-for
Repo https://github.com/aimbrain/vqa-project
Framework pytorch
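
A minimal sketch of a question-conditioned graph learner in the spirit of the abstract: object features are conditioned on the question embedding, pairwise similarities define a dense adjacency, and each node keeps only its top-$k$ neighbors for the subsequent graph convolution. All shapes and the top-$k$ sparsification are illustrative assumptions, not the authors' architecture.

```python
# Illustrative question-conditioned adjacency; not the paper's exact graph learner.
import numpy as np

def learn_adjacency(obj_feats, q_emb, k=8):
    """obj_feats: (n, d) object detection features; q_emb: (d,) question embedding."""
    cond = obj_feats * q_emb                       # condition nodes on the question
    sim = cond @ cond.T                            # pairwise relevance scores
    adj = np.zeros_like(sim)
    for i in range(len(sim)):
        nbrs = np.argsort(sim[i])[-k:]             # keep the k strongest neighbors
        adj[i, nbrs] = np.exp(sim[i, nbrs] - sim[i, nbrs].max())
        adj[i, nbrs] /= adj[i, nbrs].sum()         # row-normalized edge weights
    return adj
```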

Adversarial Attacks on Neural Networks for Graph Data

Title Adversarial Attacks on Neural Networks for Graph Data
Authors Daniel Zügner, Amir Akbarnejad, Stephan Günnemann
Abstract Deep learning models for graphs have achieved strong performance on the task of node classification. Despite their proliferation, there is currently no study of their robustness to adversarial attacks. Yet, in domains where they are likely to be used, e.g. the web, adversaries are common. Can deep learning models for graphs be easily fooled? In this work, we introduce the first study of adversarial attacks on attributed graphs, specifically focusing on models exploiting ideas of graph convolutions. In addition to attacks at test time, we tackle the more challenging class of poisoning/causative attacks, which target the training phase of a machine learning model. We generate adversarial perturbations targeting the node’s features and the graph structure, thus taking the dependencies between instances into account. Moreover, we ensure that the perturbations remain unnoticeable by preserving important data characteristics. To cope with the underlying discrete domain, we propose Nettack, an efficient algorithm exploiting incremental computations. Our experimental study shows that the accuracy of node classification significantly drops even when performing only a few perturbations. Even more, our attacks are transferable: the learned attacks generalize to other state-of-the-art node classification models and unsupervised approaches, and likewise succeed even when only limited knowledge about the graph is available.
Tasks Node Classification
Published 2018-05-21
URL http://arxiv.org/abs/1805.07984v3
PDF http://arxiv.org/pdf/1805.07984v3.pdf
PWC https://paperswithcode.com/paper/adversarial-attacks-on-neural-networks-for
Repo https://github.com/danielzuegner/nettack
Framework tf
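
A minimal sketch of the greedy attack pattern the abstract describes: at each step, score every candidate edge flip with a fixed surrogate classifier and commit the single flip that most hurts the target node's classification margin. The real Nettack uses a linearized GCN surrogate with incremental score updates and unnoticeability constraints, all omitted here.

```python
# Illustrative greedy structure attack; the surrogate is left abstract.
import numpy as np

def greedy_structure_attack(adj, target, surrogate_margin, budget, candidates):
    """adj: (n, n) 0/1 adjacency; surrogate_margin(adj, target) -> classification
    margin of the target node; candidates: iterable of (u, v) edges to consider."""
    adj = adj.copy()
    for _ in range(budget):
        best, best_margin = None, surrogate_margin(adj, target)
        for u, v in candidates:
            adj[u, v] = adj[v, u] = 1 - adj[u, v]      # tentatively flip the edge
            m = surrogate_margin(adj, target)
            if m < best_margin:
                best, best_margin = (u, v), m
            adj[u, v] = adj[v, u] = 1 - adj[u, v]      # undo the flip
        if best is None:
            break                                      # no flip lowers the margin
        u, v = best
        adj[u, v] = adj[v, u] = 1 - adj[u, v]          # commit the best perturbation
    return adj
```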

No One is Perfect: Analysing the Performance of Question Answering Components over the DBpedia Knowledge Graph

Title No One is Perfect: Analysing the Performance of Question Answering Components over the DBpedia Knowledge Graph
Authors Kuldeep Singh, Ioanna Lytra, Arun Sethupat Radhakrishna, Saeedeh Shekarpour, Maria-Esther Vidal, Jens Lehmann
Abstract Question answering (QA) over knowledge graphs has gained significant momentum over the past five years due to the increasing availability of large knowledge graphs and the rising importance of question answering for user interaction. DBpedia has been the most prominently used knowledge graph in this setting, and most approaches currently use a pipeline of processing steps connecting a sequence of components. In this article, we analyse and micro-evaluate the behaviour of 29 available QA components for the DBpedia knowledge graph that have been released by the research community since 2010. As a result, we provide a perspective on collective failure cases, suggest characteristics of QA components that prevent them from performing better, and outline future challenges and research directions for the field.
Tasks Knowledge Graphs, Question Answering
Published 2018-09-26
URL http://arxiv.org/abs/1809.10044v1
PDF http://arxiv.org/pdf/1809.10044v1.pdf
PWC https://paperswithcode.com/paper/no-one-is-perfect-analysing-the-performance
Repo https://github.com/dice-group/NLIWOD
Framework none