Paper Group AWR 323
Understanding Learned Models by Identifying Important Features at the Right Resolution
Title | Understanding Learned Models by Identifying Important Features at the Right Resolution |
Authors | Kyubin Lee, Akshay Sood, Mark Craven |
Abstract | In many application domains, it is important to characterize how complex learned models make their decisions across the distribution of instances. One way to do this is to identify the features and interactions among them that contribute to a model’s predictive accuracy. We present a model-agnostic approach to this task that makes the following specific contributions. Our approach (i) tests feature groups, in addition to base features, and tries to determine the level of resolution at which important features can be determined, (ii) uses hypothesis testing to rigorously assess the effect of each feature on the model’s loss, (iii) employs a hierarchical approach to control the false discovery rate when testing feature groups and individual base features for importance, and (iv) uses hypothesis testing to identify important interactions among features and feature groups. We evaluate our approach by analyzing random forest and LSTM neural network models learned in two challenging biomedical applications. |
Tasks | |
Published | 2018-11-18 |
URL | http://arxiv.org/abs/1811.07279v2 |
PDF | http://arxiv.org/pdf/1811.07279v2.pdf |
PWC | https://paperswithcode.com/paper/understanding-learned-models-by-identifying |
Repo | https://github.com/Craven-Biostat-Lab/mihifepe |
Framework | none |
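
The hierarchical testing idea above lends itself to a compact illustration. Below is a minimal sketch, not the authors' mihifepe implementation: it perturbs a feature group, compares per-instance model loss with a paired hypothesis test, and refines a coarse group into base features only when the group tests significant. The hierarchical FDR control from the paper is omitted, and the toy data, random forest model, and 0.05 threshold are illustrative assumptions.

```python
import numpy as np
from scipy.stats import wilcoxon
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 6))
y = X[:, 0] + 2.0 * X[:, 1] + rng.normal(scale=0.1, size=500)  # only features 0, 1 matter
model = RandomForestRegressor(random_state=0).fit(X, y)

def group_pvalue(model, X, y, group):
    """Paired one-sided test: does permuting `group` columns increase per-instance loss?"""
    base_loss = (model.predict(X) - y) ** 2
    Xp = X.copy()
    for j in group:
        Xp[:, j] = rng.permutation(Xp[:, j])
    pert_loss = (model.predict(Xp) - y) ** 2
    return wilcoxon(pert_loss, base_loss, alternative="greater").pvalue

# Test a coarse group first; refine into base features only if it is significant.
# (The paper controls the false discovery rate across this hierarchy; omitted here.)
if group_pvalue(model, X, y, [0, 1, 2]) < 0.05:
    for j in [0, 1, 2]:
        print(j, group_pvalue(model, X, y, [j]))
```

Testing groups before base features is what lets the method report importance "at the right resolution": correlated or jointly important features can be flagged as a group even when no single member is individually significant.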
Plug-in Regularized Estimation of High-Dimensional Parameters in Nonlinear Semiparametric Models
Title | Plug-in Regularized Estimation of High-Dimensional Parameters in Nonlinear Semiparametric Models |
Authors | Victor Chernozhukov, Denis Nekipelov, Vira Semenova, Vasilis Syrgkanis |
Abstract | We propose an l1-regularized M-estimator for a high-dimensional sparse parameter that is identified by a class of semiparametric conditional moment restrictions (CMR). We estimate the nonparametric nuisance parameter by modern machine learning methods. Plugging the first-stage estimate into the CMR, we construct the M-estimator loss function for the target parameter so that its gradient is insensitive (formally, Neyman-orthogonal) with respect to the first-stage regularization bias. As a result, the estimator achieves the oracle convergence rate $\sqrt{k \log p / n}$, where the oracle knows the true first stage and solves only a parametric problem. We apply our results to conditional moment models with missing data, games of incomplete information, and treatment effects in regression models with non-linear link functions. |
Tasks | |
Published | 2018-06-13 |
URL | https://arxiv.org/abs/1806.04823v4 |
PDF | https://arxiv.org/pdf/1806.04823v4.pdf |
PWC | https://paperswithcode.com/paper/plug-in-regularized-estimation-of-high |
Repo | https://github.com/Microsoft/EconML |
Framework | none |
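
As a hedged illustration of the plug-in construction, the sketch below works the special case of a partially linear model, where residualizing on cross-fitted first-stage estimates yields a Neyman-orthogonal second-stage loss. It is not the paper's general CMR estimator (see the linked EconML repo for that); the data-generating process, random-forest first stage, and Lasso penalty are assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Lasso
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(1)
n, p = 1000, 10
W = rng.normal(size=(n, 3))                       # low-dim controls for the nuisance
X = np.sin(W[:, :1]) + rng.normal(size=(n, p))    # confounded high-dim regressors
theta = np.zeros(p)
theta[:3] = [1.0, -0.5, 0.25]                     # sparse target parameter
y = X @ theta + np.cos(W[:, 0]) + rng.normal(size=n)

def rf():
    return RandomForestRegressor(n_estimators=50, random_state=0)

# First stage: cross-fitted ML estimates of the nuisances E[y|W] and E[X|W].
y_hat = cross_val_predict(rf(), W, y, cv=2)
X_hat = np.column_stack([cross_val_predict(rf(), W, X[:, j], cv=2) for j in range(p)])

# Second stage: l1-regularized M-estimation on the residualized (orthogonal) problem,
# whose gradient is first-order insensitive to errors in the nuisance estimates.
est = Lasso(alpha=0.05).fit(X - X_hat, y - y_hat)
print(np.round(est.coef_[:5], 2))                 # approx [1.0, -0.5, 0.25, 0, 0]
```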
Evaluation of Unsupervised Compositional Representations
Title | Evaluation of Unsupervised Compositional Representations |
Authors | Hanan Aldarmaki, Mona Diab |
Abstract | We evaluated various compositional models, from bag-of-words representations to compositional RNN-based models, on several extrinsic supervised and unsupervised evaluation benchmarks. Our results confirm that weighted vector averaging can outperform context-sensitive models in most benchmarks, but structural features encoded in RNN models can also be useful in certain classification tasks. We analyzed some of the evaluation datasets to identify the aspects of meaning they measure and the characteristics of the various models that explain their performance variance. |
Tasks | |
Published | 2018-06-12 |
URL | http://arxiv.org/abs/1806.04713v2 |
PDF | http://arxiv.org/pdf/1806.04713v2.pdf |
PWC | https://paperswithcode.com/paper/evaluation-of-unsupervised-compositional |
Repo | https://github.com/h-aldarmaki/sentence_eval |
Framework | none |
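
Since weighted vector averaging is the strongest baseline in this study, here is a minimal sketch of one common weighting scheme, SIF-style weights a/(a + p(w)) (Arora et al., 2017). The toy corpus and random word vectors are stand-ins; the paper evaluates several averaging variants rather than exactly this one.

```python
import numpy as np
from collections import Counter

corpus = [["the", "cat", "sat"], ["the", "dog", "ran"], ["a", "cat", "ran"]]
dim, a = 8, 1e-3
rng = np.random.default_rng(0)
emb = {w: rng.normal(size=dim) for s in corpus for w in s}  # toy word vectors

counts = Counter(w for s in corpus for w in s)
total = sum(counts.values())
p = {w: c / total for w, c in counts.items()}               # unigram probabilities

def sentence_vector(tokens):
    """Frequency-weighted average: rare words get weight close to 1."""
    weights = np.array([a / (a + p.get(w, 0.0)) for w in tokens])
    vecs = np.stack([emb[w] for w in tokens])
    return weights @ vecs / len(tokens)

print(sentence_vector(["the", "cat", "sat"]).shape)         # (8,)
```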
Underwater Fish Detection using Deep Learning for Water Power Applications
Title | Underwater Fish Detection using Deep Learning for Water Power Applications |
Authors | Wenwei Xu, Shari Matzner |
Abstract | Clean energy from oceans and rivers is becoming a reality with the development of new technologies like tidal and instream turbines that generate electricity from naturally flowing water. These new technologies are being monitored for effects on fish and other wildlife using underwater video. Methods for automated analysis of underwater video are needed to lower the costs of analysis and improve accuracy. A deep learning model, YOLO, was trained to recognize fish in underwater video using three very different datasets recorded at real-world water power sites. Training and testing with examples from all three datasets resulted in a mean average precision (mAP) score of 0.5392. To test how well a model could generalize to new datasets, the model was trained using examples from only two of the datasets and then tested on examples from all three datasets. The resulting model could not recognize fish in the dataset that was not part of the training set. The mAP scores on the other two datasets that were included in the training set were higher than the scores achieved by the model trained on all three datasets. These results indicate that different methods are needed in order to produce a trained model that can generalize to new datasets such as those encountered in real-world applications. |
Tasks | Fish Detection |
Published | 2018-11-05 |
URL | http://arxiv.org/abs/1811.01494v1 |
PDF | http://arxiv.org/pdf/1811.01494v1.pdf |
PWC | https://paperswithcode.com/paper/underwater-fish-detection-using-deep-learning |
Repo | https://github.com/wenweixu/keras-yolo3 |
Framework | none |
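
The key experimental design here is the leave-one-dataset-out protocol. The sketch below shows only that protocol's control flow; `train_detector` and `mean_average_precision` are hypothetical stubs standing in for a YOLO training and evaluation pipeline such as the linked keras-yolo3 fork.

```python
datasets = {"site_A": ..., "site_B": ..., "site_C": ...}  # three water power sites

def train_detector(train_sets):
    """Hypothetical stub: would train a YOLO model on the named datasets."""
    return {"trained_on": sorted(train_sets)}

def mean_average_precision(model, dataset_name):
    """Hypothetical stub: would run detection and compute mAP on one dataset."""
    return 0.0

for held_out in datasets:
    model = train_detector([d for d in datasets if d != held_out])
    for name in datasets:
        tag = "unseen" if name == held_out else "seen"
        print(f"train w/o {held_out}; test on {name} ({tag}):",
              mean_average_precision(model, name))
```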
Adaptive Semi-supervised Learning for Cross-domain Sentiment Classification
Title | Adaptive Semi-supervised Learning for Cross-domain Sentiment Classification |
Authors | Ruidan He, Wee Sun Lee, Hwee Tou Ng, Daniel Dahlmeier |
Abstract | We consider the cross-domain sentiment classification problem, where a sentiment classifier is to be learned from a source domain and to be generalized to a target domain. Our approach explicitly minimizes the distance between the source and the target instances in an embedded feature space. With the difference between source and target minimized, we then exploit additional information from the target domain through semi-supervised learning, jointly employing two regularizations – entropy minimization and self-ensemble bootstrapping – to incorporate the unlabeled target data for classifier refinement. Our experimental results demonstrate that the proposed approach can better leverage unlabeled data from the target domain and achieve substantial improvements over baseline methods in various experimental settings. |
Tasks | Sentiment Analysis |
Published | 2018-09-03 |
URL | http://arxiv.org/abs/1809.00530v1 |
PDF | http://arxiv.org/pdf/1809.00530v1.pdf |
PWC | https://paperswithcode.com/paper/adaptive-semi-supervised-learning-for-cross |
Repo | https://github.com/ruidan/DAS |
Framework | tf |
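
The two target-domain regularizations can be written down compactly. Below is a hedged numpy sketch, not the authors' DAS code: an entropy-minimization term on unlabeled predictions, plus a consistency term against an exponential-moving-average target, which is a common reading of self-ensemble bootstrapping. The batch, the 0.9 EMA decay, and the squared-error consistency are assumptions.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
logits = rng.normal(size=(4, 3))                 # unlabeled target-domain batch
p = softmax(logits)

# (1) Entropy minimization: push unlabeled predictions toward confidence.
entropy_loss = -np.mean(np.sum(p * np.log(p + 1e-12), axis=1))

# (2) Self-ensemble bootstrapping: match an EMA of earlier predictions.
ema_targets = softmax(rng.normal(size=(4, 3)))   # stand-in for past predictions
ema_targets = 0.9 * ema_targets + 0.1 * p        # EMA update at each step
consistency_loss = np.mean(np.sum((p - ema_targets) ** 2, axis=1))

total_aux = entropy_loss + consistency_loss      # added to the supervised loss
print(entropy_loss, consistency_loss)
```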
Highway State Gating for Recurrent Highway Networks: improving information flow through time
Title | Highway State Gating for Recurrent Highway Networks: improving information flow through time |
Authors | Ron Shoham, Haim Permuter |
Abstract | Recurrent Neural Networks (RNNs) play a major role in the field of sequential learning, and have outperformed traditional algorithms on many benchmarks. Training deep RNNs still remains a challenge, and most of the state-of-the-art models are structured with a transition depth of 2-4 layers. Recurrent Highway Networks (RHNs) were introduced in order to tackle this issue. These have achieved state-of-the-art performance on a few benchmarks using a depth of 10 layers. However, the performance of this architecture suffers from a bottleneck, and ceases to improve when an attempt is made to add more layers. In this work, we analyze the causes of this, and postulate that the main source is the way that the information flows through time. We introduce a novel and simple variation of the RHN cell, called Highway State Gating (HSG), which allows adding more layers while continuing to improve performance. By using a gating mechanism for the state, we allow the net to “choose” whether to pass information directly through time, or to gate it. This mechanism also allows the gradient to back-propagate directly through time and, therefore, results in slightly faster convergence. We use the Penn Treebank (PTB) dataset as a platform for an empirical proof of concept. Empirical results show that the improvement due to Highway State Gating holds at all depths, and that the improvement grows as the depth increases. |
Tasks | |
Published | 2018-05-23 |
URL | http://arxiv.org/abs/1805.09238v1 |
PDF | http://arxiv.org/pdf/1805.09238v1.pdf |
PWC | https://paperswithcode.com/paper/highway-state-gating-for-recurrent-highway |
Repo | https://github.com/KurochkinAlexey/Hierarchical-Attention-Based-Recurrent-Highway-Networks-for-Time-Series-Prediction |
Framework | pytorch |
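
Reading the abstract literally, the HSG cell gates the previous state against the cell output: s_t = g * s_prev + (1 - g) * h. The numpy sketch below implements that reading; the gate parameterization and the initial open-gate bias are assumptions, not the authors' exact formulation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class HSGCell:
    def __init__(self, dim, rng):
        self.Wg = rng.normal(scale=0.1, size=(2 * dim, dim))  # gate weights
        self.bg = np.full(dim, 2.0)    # bias the gate open initially

    def __call__(self, s_prev, h):
        """Mix previous state s_prev with new candidate h via a learned gate."""
        g = sigmoid(np.concatenate([s_prev, h]) @ self.Wg + self.bg)
        return g * s_prev + (1.0 - g) * h

rng = np.random.default_rng(0)
cell = HSGCell(dim=5, rng=rng)
s = np.zeros(5)
for t in range(3):                     # h would come from a deep RHN stack
    s = cell(s, rng.normal(size=5))
print(s)
```

The gate gives both the state and its gradient a direct path through time, which is what the abstract credits for the slightly faster convergence.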
Towards Neural Machine Translation for African Languages
Title | Towards Neural Machine Translation for African Languages |
Authors | Jade Z. Abbott, Laura Martinus |
Abstract | Given that South African education is in crisis, strategies for improvement and sustainability of high-quality, up-to-date education must be explored. In the migration of education online, inclusion of machine translation for low-resourced local languages becomes necessary. This paper aims to spur the use of current neural machine translation (NMT) techniques for low-resourced local languages. The paper demonstrates state-of-the-art performance on English-to-Setswana translation using the Autshumato dataset. The use of the Transformer architecture beat previous techniques by 5.33 BLEU points. This demonstrates the promise of using current NMT techniques for African languages. |
Tasks | Machine Translation |
Published | 2018-11-13 |
URL | http://arxiv.org/abs/1811.05467v1 |
PDF | http://arxiv.org/pdf/1811.05467v1.pdf |
PWC | https://paperswithcode.com/paper/towards-neural-machine-translation-for |
Repo | https://github.com/LauraMartinus/ukuxhumana |
Framework | tf |
An Online Plug-and-Play Algorithm for Regularized Image Reconstruction
Title | An Online Plug-and-Play Algorithm for Regularized Image Reconstruction |
Authors | Yu Sun, Brendt Wohlberg, Ulugbek S. Kamilov |
Abstract | Plug-and-play priors (PnP) is a powerful framework for regularizing imaging inverse problems by using advanced denoisers within an iterative algorithm. Recent experimental evidence suggests that PnP algorithms achieve state-of-the-art performance in a range of imaging applications. In this paper, we introduce a new online PnP algorithm based on the iterative shrinkage/thresholding algorithm (ISTA). The proposed algorithm uses only a subset of measurements at every iteration, which makes it scalable to very large datasets. We present a new theoretical convergence analysis, for both batch and online variants of PnP-ISTA, for denoisers that do not necessarily correspond to proximal operators. We also present simulations illustrating the applicability of the algorithm to image reconstruction in diffraction tomography. The results in this paper have the potential to expand the applicability of the PnP framework to very large and redundant datasets. |
Tasks | Image Reconstruction |
Published | 2018-09-12 |
URL | http://arxiv.org/abs/1809.04693v1 |
PDF | http://arxiv.org/pdf/1809.04693v1.pdf |
PWC | https://paperswithcode.com/paper/an-online-plug-and-play-algorithm-for |
Repo | https://github.com/sunyumark/2019-TCI-OnlinePnP |
Framework | none |
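
Here is a minimal sketch of the online PnP-ISTA loop on a toy 1-D reconstruction problem: a gradient step on a random minibatch of measurements, followed by a plug-in denoiser. The Gaussian smoother stands in for the advanced denoisers the paper assumes, and the step size and batch size are illustrative choices, not the paper's tuned values.

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d

rng = np.random.default_rng(0)
n, m = 128, 512
x_true = np.zeros(n)
x_true[30:60] = 1.0                               # piecewise-constant signal
A = rng.normal(size=(m, n)) / np.sqrt(m)          # random measurement operator
y = A @ x_true + 0.01 * rng.normal(size=m)

def denoise(v):
    """Plug-in prior: a simple smoother standing in for an advanced denoiser."""
    return gaussian_filter1d(v, sigma=1.5)

x, step, batch = np.zeros(n), 0.2, 128
for it in range(300):
    idx = rng.choice(m, size=batch, replace=False)          # online: a data subset
    grad = (m / batch) * A[idx].T @ (A[idx] @ x - y[idx])   # minibatch gradient
    x = denoise(x - step * grad)                            # gradient step + denoiser
print(round(float(np.linalg.norm(x - x_true) / np.linalg.norm(x_true)), 3))
```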
Active Learning with Partial Feedback
Title | Active Learning with Partial Feedback |
Authors | Peiyun Hu, Zachary C. Lipton, Anima Anandkumar, Deva Ramanan |
Abstract | While many active learning papers assume that the learner can simply ask for a label and receive it, real annotation often presents a mismatch between the form of a label (say, one among many classes), and the form of an annotation (typically yes/no binary feedback). To annotate examples in corpora for multiclass classification, we might need to ask multiple yes/no questions, exploiting a label hierarchy if one is available. To address this more realistic setting, we propose active learning with partial feedback (ALPF), where the learner must actively choose both which example to label and which binary question to ask. At each step, the learner selects an example, asking if it belongs to a chosen (possibly composite) class. Each answer eliminates some classes, leaving the learner with a partial label. The learner may then either ask more questions about the same example (until an exact label is uncovered) or move on immediately, leaving the first example partially labeled. Active learning with partial labels requires (i) a sampling strategy to choose (example, class) pairs, and (ii) learning from partial labels between rounds. Experiments on Tiny ImageNet demonstrate that our most effective method improves top-1 classification accuracy by 26% (relative) compared to i.i.d. baselines and standard active learners, given 30% of the annotation budget that would be required (naively) to annotate the dataset. Moreover, ALPF learners fully annotate Tiny ImageNet at 42% lower cost. Surprisingly, we observe that accounting for per-example annotation costs can alter the conventional wisdom that active learners should solicit labels for hard examples. |
Tasks | Active Learning |
Published | 2018-02-21 |
URL | https://arxiv.org/abs/1802.07427v4 |
PDF | https://arxiv.org/pdf/1802.07427v4.pdf |
PWC | https://paperswithcode.com/paper/active-learning-with-partial-feedback |
Repo | https://github.com/peiyunh/alpf |
Framework | tf |
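
The annotation loop itself is easy to sketch. Below, each binary question about a composite class shrinks the candidate set, leaving a partial label. The question selector here greedily balances the split, whereas ALPF drives this choice with the model's class posterior, and the learning-from-partial-labels step between rounds is omitted; the toy hierarchy is an assumption.

```python
classes = list(range(8))
# Toy hierarchy of composite classes (halves, pairs) plus the base classes.
composites = ([set(range(0, 4)), set(range(4, 8))]
              + [{i, i + 1} for i in range(0, 8, 2)]
              + [{c} for c in classes])

true_label = 6
candidates = set(classes)                 # the example's current partial label
questions = 0
while len(candidates) > 1:
    # Greedily pick the question that most evenly splits the candidate set;
    # ALPF instead uses the model's class posterior to pick (example, class).
    q = max(composites, key=lambda C: min(len(candidates & C), len(candidates - C)))
    answer = true_label in q              # annotator's binary yes/no feedback
    candidates = candidates & q if answer else candidates - q
    questions += 1
print(questions, candidates)              # 3 questions narrow 8 classes to {6}
```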
A Simple Unified Framework for Detecting Out-of-Distribution Samples and Adversarial Attacks
Title | A Simple Unified Framework for Detecting Out-of-Distribution Samples and Adversarial Attacks |
Authors | Kimin Lee, Kibok Lee, Honglak Lee, Jinwoo Shin |
Abstract | Detecting test samples drawn sufficiently far away from the training distribution statistically or adversarially is a fundamental requirement for deploying a good classifier in many real-world machine learning applications. However, deep neural networks with the softmax classifier are known to produce highly overconfident posterior distributions even for such abnormal samples. In this paper, we propose a simple yet effective method for detecting any abnormal samples, which is applicable to any pre-trained softmax neural classifier. We obtain the class-conditional Gaussian distributions with respect to (low- and upper-level) features of the deep models under Gaussian discriminant analysis, which result in a confidence score based on the Mahalanobis distance. While most prior methods have been evaluated for detecting either out-of-distribution or adversarial samples, but not both, the proposed method achieves state-of-the-art performance for both cases in our experiments. Moreover, we found that our proposed method is more robust in harsh cases, e.g., when the training dataset has noisy labels or a small number of samples. Finally, we show that the proposed method enjoys broader usage by applying it to class-incremental learning: whenever out-of-distribution samples are detected, our classification rule can incorporate new classes well without further training of deep models. |
Tasks | |
Published | 2018-07-10 |
URL | http://arxiv.org/abs/1807.03888v2 |
PDF | http://arxiv.org/pdf/1807.03888v2.pdf |
PWC | https://paperswithcode.com/paper/a-simple-unified-framework-for-detecting-out |
Repo | https://github.com/pokaxpoka/deep_Mahalanobis_detector |
Framework | pytorch |
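
The confidence score has a short closed form: fit class-conditional Gaussians with a tied covariance (Gaussian discriminant analysis) on features, then score a sample by its negative Mahalanobis distance to the closest class mean. The sketch below uses toy features in place of a real network's layer activations, and omits the paper's input pre-processing and multi-layer ensembling.

```python
import numpy as np

rng = np.random.default_rng(0)
feats = {0: rng.normal(loc=0.0, size=(200, 4)),     # per-class "layer features"
         1: rng.normal(loc=3.0, size=(200, 4))}

means = {c: f.mean(axis=0) for c, f in feats.items()}
centered = np.vstack([f - means[c] for c, f in feats.items()])
prec = np.linalg.inv(centered.T @ centered / len(centered))  # tied precision

def confidence(x):
    """Negative Mahalanobis distance to the closest class-conditional mean."""
    return max(-(x - mu) @ prec @ (x - mu) for mu in means.values())

print(confidence(rng.normal(loc=0.0, size=4)))   # in-distribution: near 0
print(confidence(rng.normal(loc=10.0, size=4)))  # abnormal: much more negative
```

Thresholding this score flags both out-of-distribution and adversarial inputs, since both tend to fall far from every class-conditional Gaussian.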
Stochastic Gradient Push for Distributed Deep Learning
Title | Stochastic Gradient Push for Distributed Deep Learning |
Authors | Mahmoud Assran, Nicolas Loizou, Nicolas Ballas, Michael Rabbat |
Abstract | Distributed data-parallel algorithms aim to accelerate the training of deep neural networks by parallelizing the computation of large mini-batch gradient updates across multiple nodes. Approaches that synchronize nodes using exact distributed averaging (e.g., via AllReduce) are sensitive to stragglers and communication delays. The PushSum gossip algorithm is robust to these issues, but only performs approximate distributed averaging. This paper studies Stochastic Gradient Push (SGP), which combines PushSum with stochastic gradient updates. We prove that SGP converges to a stationary point of smooth, non-convex objectives at the same sub-linear rate as SGD, and that all nodes achieve consensus. We empirically validate the performance of SGP on image classification (ResNet-50, ImageNet) and machine translation (Transformer, WMT’16 En-De) workloads. Our code will be made publicly available. |
Tasks | Image Classification, Machine Translation |
Published | 2018-11-27 |
URL | https://arxiv.org/abs/1811.10792v3 |
PDF | https://arxiv.org/pdf/1811.10792v3.pdf |
PWC | https://paperswithcode.com/paper/stochastic-gradient-push-for-distributed-deep |
Repo | https://github.com/facebookresearch/stochastic_gradient_push |
Framework | pytorch |
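
A hedged sketch of the SGP update on a toy consensus problem: each node takes a local gradient step at its de-biased estimate z = x/w, then gossips both x and the PushSum weight w with a column-stochastic mixing matrix. The directed-ring topology, uneven self-weights, and diminishing step size are illustrative assumptions, and toy quadratics stand in for neural-network losses.

```python
import numpy as np

rng = np.random.default_rng(0)
n_nodes, dim = 4, 3
targets = rng.normal(size=(n_nodes, dim))   # node i minimizes ||z - t_i||^2 / 2

# Column-stochastic mixing for a directed ring with uneven self-weights, so P
# is NOT doubly stochastic and the PushSum weights w are needed for de-biasing.
a = np.array([0.5, 0.6, 0.7, 0.8])
P = np.diag(a)
for j in range(n_nodes):
    P[(j + 1) % n_nodes, j] = 1.0 - a[j]

x = np.zeros((n_nodes, dim))
w = np.ones(n_nodes)
for t in range(500):
    z = x / w[:, None]                       # de-biased parameter estimates
    lr = 1.0 / (t + 10)                      # diminishing step size
    x = P @ (x - lr * (z - targets))         # local gradient step, then gossip x
    w = P @ w                                # ... and gossip the PushSum weights

print(np.round(x / w[:, None], 2))           # every row approaches the average
print(np.round(targets.mean(axis=0), 2))
```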
Generalizing Word Embeddings using Bag of Subwords
Title | Generalizing Word Embeddings using Bag of Subwords |
Authors | Jinman Zhao, Sidharth Mudgal, Yingyu Liang |
Abstract | We approach the problem of generalizing pre-trained word embeddings beyond fixed-size vocabularies without using additional contextual information. We propose a subword-level word vector generation model that views words as bags of character $n$-grams. The model is simple, fast to train and provides good vectors for rare or unseen words. Experiments show that our model achieves state-of-the-art performance on the English word similarity task and on joint prediction of part-of-speech tags and morphosyntactic attributes in 23 languages, suggesting our model’s ability to capture the relationship between words’ textual representations and their embeddings. |
Tasks | Word Embeddings |
Published | 2018-09-12 |
URL | http://arxiv.org/abs/1809.04259v1 |
PDF | http://arxiv.org/pdf/1809.04259v1.pdf |
PWC | https://paperswithcode.com/paper/generalizing-word-embeddings-using-bag-of |
Repo | https://github.com/jmzhao/bag-of-substring-embedder |
Framework | none |
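
The generation model is simple enough to sketch: a word's vector is the average of its character n-gram vectors, so rare or unseen words still receive embeddings. Random n-gram vectors below stand in for the trained ones (the paper trains them to reconstruct pre-trained word embeddings), and the 3-to-5-gram range with fastText-style boundary markers is an assumption.

```python
import numpy as np

def char_ngrams(word, n_min=3, n_max=5):
    padded = f"<{word}>"                       # boundary markers, fastText-style
    return [padded[i:i + n] for n in range(n_min, n_max + 1)
            for i in range(len(padded) - n + 1)]

dim = 16
rng = np.random.default_rng(0)
ngram_vecs = {}                                # learned embeddings in the real model

def word_vector(word):
    grams = char_ngrams(word)
    for g in grams:                            # toy lookup: random vector per n-gram
        ngram_vecs.setdefault(g, rng.normal(size=dim))
    return np.mean([ngram_vecs[g] for g in grams], axis=0)

v1, v2 = word_vector("running"), word_vector("runner")   # unseen words still work
cos = v1 @ v2 / (np.linalg.norm(v1) * np.linalg.norm(v2))
print(round(float(cos), 3))                    # shared n-grams give similarity > 0
```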
Learning Conditioned Graph Structures for Interpretable Visual Question Answering
Title | Learning Conditioned Graph Structures for Interpretable Visual Question Answering |
Authors | Will Norcliffe-Brown, Efstathios Vafeias, Sarah Parisot |
Abstract | Visual Question Answering is a challenging problem requiring a combination of concepts from Computer Vision and Natural Language Processing. Most existing approaches use a two-stream strategy, computing image and question features that are subsequently merged using a variety of techniques. Nonetheless, very few rely on higher-level image representations, which can capture semantic and spatial relationships. In this paper, we propose a novel graph-based approach for Visual Question Answering. Our method combines a graph learner module, which learns a question-specific graph representation of the input image, with the recent concept of graph convolutions, aiming to learn image representations that capture question-specific interactions. We test our approach on the VQA v2 dataset using a simple baseline architecture enhanced by the proposed graph learner module. We obtain promising results with 66.18% accuracy and demonstrate the interpretability of the proposed method. Code can be found at github.com/aimbrain/vqa-project. |
Tasks | Question Answering, Visual Question Answering |
Published | 2018-06-19 |
URL | http://arxiv.org/abs/1806.07243v6 |
PDF | http://arxiv.org/pdf/1806.07243v6.pdf |
PWC | https://paperswithcode.com/paper/learning-conditioned-graph-structures-for |
Repo | https://github.com/aimbrain/vqa-project |
Framework | pytorch |
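
As a hedged reading of the graph learner module: project question-conditioned object features, form a normalized adjacency from pairwise inner products, and apply a graph convolution over it. The toy tensors, projection sizes, and row-softmax normalization below are assumptions rather than the paper's exact architecture; see the linked repo for the real implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
n_obj, d_obj, d_q = 6, 32, 32
objs = rng.normal(size=(n_obj, d_obj))          # image-region features
q = rng.normal(size=d_q)                        # question embedding

joint = np.concatenate([objs, np.tile(q, (n_obj, 1))], axis=1)
W_e = rng.normal(scale=0.1, size=(d_obj + d_q, 16))   # learned in practice
e = joint @ W_e                                 # question-conditioned features

scores = e @ e.T                                # pairwise affinities
A = np.exp(scores - scores.max(axis=1, keepdims=True))
A /= A.sum(axis=1, keepdims=True)               # row-normalized, question-specific graph

W_g = rng.normal(scale=0.1, size=(d_obj, d_obj))
h = np.maximum(A @ objs @ W_g, 0.0)             # one graph-conv layer + ReLU
print(h.shape)                                  # (6, 32)
```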
Adversarial Attacks on Neural Networks for Graph Data
Title | Adversarial Attacks on Neural Networks for Graph Data |
Authors | Daniel Zügner, Amir Akbarnejad, Stephan Günnemann |
Abstract | Deep learning models for graphs have achieved strong performance for the task of node classification. Despite their proliferation, currently there is no study of their robustness to adversarial attacks. Yet, in domains where they are likely to be used, e.g. the web, adversaries are common. Can deep learning models for graphs be easily fooled? In this work, we introduce the first study of adversarial attacks on attributed graphs, specifically focusing on models exploiting ideas of graph convolutions. In addition to attacks at test time, we tackle the more challenging class of poisoning/causative attacks, which focus on the training phase of a machine learning model. We generate adversarial perturbations targeting the node’s features and the graph structure, thus taking the dependencies between instances into account. Moreover, we ensure that the perturbations remain unnoticeable by preserving important data characteristics. To cope with the underlying discrete domain, we propose an efficient algorithm, Nettack, exploiting incremental computations. Our experimental study shows that the accuracy of node classification significantly drops even when performing only a few perturbations. Even more, our attacks are transferable: the learned attacks generalize to other state-of-the-art node classification models and unsupervised approaches, and likewise are successful even when only limited knowledge about the graph is given. |
Tasks | Node Classification |
Published | 2018-05-21 |
URL | http://arxiv.org/abs/1805.07984v3 |
PDF | http://arxiv.org/pdf/1805.07984v3.pdf |
PWC | https://paperswithcode.com/paper/adversarial-attacks-on-neural-networks-for |
Repo | https://github.com/danielzuegner/nettack |
Framework | tf |
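
In the spirit of Nettack's structure attacks, the sketch below greedily flips the single edge at the target node that most lowers the classification margin of a linearized two-layer GCN surrogate. The toy graph and the "trained" surrogate weights are stand-ins, and the paper's unnoticeability constraints and incremental score updates are omitted.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 8, 5, 2
A = (rng.random((n, n)) < 0.3).astype(float)
A = np.triu(A, 1)
A = A + A.T                                     # undirected graph, no self-loops
X = rng.normal(size=(n, d))                     # node features
W = rng.normal(size=(d, k))                     # stand-in for trained surrogate weights
target, true_class = 0, 1

def margin(A):
    """Classification margin of the target node under a linearized 2-layer GCN."""
    A_hat = A + np.eye(n)
    deg = A_hat.sum(1)
    A_hat = A_hat / np.sqrt(np.outer(deg, deg)) # symmetric normalization
    logits = (A_hat @ A_hat @ X @ W)[target]
    return logits[true_class] - np.max(np.delete(logits, true_class))

best, base = None, margin(A)
for j in range(1, n):                           # candidate edge flips at the target
    A2 = A.copy()
    A2[target, j] = A2[j, target] = 1 - A2[target, j]
    m = margin(A2)
    if best is None or m < best[0]:
        best = (m, j)
print(f"flip edge (0, {best[1]}): margin {base:.3f} -> {best[0]:.3f}")
```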
No One is Perfect: Analysing the Performance of Question Answering Components over the DBpedia Knowledge Graph
Title | No One is Perfect: Analysing the Performance of Question Answering Components over the DBpedia Knowledge Graph |
Authors | Kuldeep Singh, Ioanna Lytra, Arun Sethupat Radhakrishna, Saeedeh Shekarpour, Maria-Esther Vidal, Jens Lehmann |
Abstract | Question answering (QA) over knowledge graphs has gained significant momentum over the past five years due to the increasing availability of large knowledge graphs and the rising importance of question answering for user interaction. DBpedia has been the most prominently used knowledge graph in this setting, and most approaches currently use a pipeline of processing steps connecting a sequence of components. In this article, we analyse and micro-evaluate the behaviour of 29 available QA components for the DBpedia knowledge graph that have been released by the research community since 2010. As a result, we provide a perspective on collective failure cases, suggest characteristics of QA components that prevent them from performing better, and provide future challenges and research directions for the field. |
Tasks | Knowledge Graphs, Question Answering |
Published | 2018-09-26 |
URL | http://arxiv.org/abs/1809.10044v1 |
PDF | http://arxiv.org/pdf/1809.10044v1.pdf |
PWC | https://paperswithcode.com/paper/no-one-is-perfect-analysing-the-performance |
Repo | https://github.com/dice-group/NLIWOD |
Framework | none |