Paper Group ANR 137
BottleSum: Unsupervised and Self-supervised Sentence Summarization using the Information Bottleneck Principle. Fully Convolutional Deep Network Architectures for Automatic Short Glass Fiber Semantic Segmentation from CT scans. Understanding Sparse JL for Feature Hashing. Image Detection and Digit Recognition to solve Sudoku as a Constraint Satisfaction Problem …
BottleSum: Unsupervised and Self-supervised Sentence Summarization using the Information Bottleneck Principle
Title | BottleSum: Unsupervised and Self-supervised Sentence Summarization using the Information Bottleneck Principle |
Authors | Peter West, Ari Holtzman, Jan Buys, Yejin Choi |
Abstract | The principle of the Information Bottleneck (Tishby et al. 1999) is to produce a summary of information X optimized to predict some other relevant information Y. In this paper, we propose a novel approach to unsupervised sentence summarization by mapping the Information Bottleneck principle to a conditional language modelling objective: given a sentence, our approach seeks a compressed sentence that can best predict the next sentence. Our iterative algorithm under the Information Bottleneck objective searches gradually shorter subsequences of the given sentence while maximizing the probability of the next sentence conditioned on the summary. Using only pretrained language models with no direct supervision, our approach can efficiently perform extractive sentence summarization over a large corpus. Building on our unsupervised extractive summarization (BottleSumEx), we then present a new approach to self-supervised abstractive summarization (BottleSumSelf), where a transformer-based language model is trained on the output summaries of our unsupervised method. Empirical results demonstrate that our extractive method outperforms other unsupervised models on multiple automatic metrics. In addition, we find that our self-supervised abstractive model outperforms unsupervised baselines (including our own) by human evaluation along multiple attributes. |
Tasks | Abstractive Text Summarization, Language Modelling |
Published | 2019-09-16 |
URL | https://arxiv.org/abs/1909.07405v2 |
PDF | https://arxiv.org/pdf/1909.07405v2.pdf |
PWC | https://paperswithcode.com/paper/bottlesum-unsupervised-and-self-supervised |
Repo | |
Framework | |
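A minimal Python sketch of the extractive search described above may help make the idea concrete: it repeatedly deletes short word spans from the source sentence and keeps candidates that remain at least as predictive of the next sentence under a language-model scorer. The `lm_logprob` callable, span limit, beam size, and the toy overlap-based scorer in the demo are assumptions for illustration; the paper's BottleSumEx procedure imposes additional candidate constraints not reproduced here.

```python
from typing import Callable, List, Tuple

def bottlesum_ex_sketch(sentence: str,
                        next_sentence: str,
                        lm_logprob: Callable[[str, str], float],
                        max_span: int = 2,
                        beam_size: int = 5) -> str:
    """Greedily delete word spans, keeping candidates that stay at least as
    predictive of the next sentence (the Information Bottleneck intuition)."""
    score = lambda words: lm_logprob(" ".join(words), next_sentence)
    best_words = sentence.split()
    best_score = score(best_words)
    beam: List[List[str]] = [best_words]
    while beam:
        candidates: List[Tuple[float, List[str]]] = []
        for words in beam:
            for span in range(1, max_span + 1):
                for i in range(len(words) - span + 1):
                    cand = words[:i] + words[i + span:]
                    if cand:
                        candidates.append((score(cand), cand))
        if not candidates:
            break
        candidates.sort(key=lambda sc: sc[0], reverse=True)
        beam = [c for _, c in candidates[:beam_size]]
        if candidates[0][0] >= best_score:   # shorter, and no less predictive
            best_score, best_words = candidates[0]
        else:
            break
    return " ".join(best_words)

if __name__ == "__main__":
    # Toy stand-in scorer: counts how many next-sentence words survive in the
    # summary. A real run would use a pretrained LM's log p(next | summary).
    toy_lm = lambda summary, nxt: float(sum(w in summary.split() for w in nxt.split()))
    src = "the local team cheered on by thousands of fans won the championship"
    nxt = "the team won the championship after a dramatic final"
    print(bottlesum_ex_sketch(src, nxt, toy_lm))   # e.g. "team won the championship"
```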
Fully Convolutional Deep Network Architectures for Automatic Short Glass Fiber Semantic Segmentation from CT scans
Title | Fully Convolutional Deep Network Architectures for Automatic Short Glass Fiber Semantic Segmentation from CT scans |
Authors | Tomasz Konopczyński, Danish Rathore, Jitendra Rathore, Thorben Kröger, Lei Zheng, Christoph S. Garbe, Simone Carmignato, Jürgen Hesser |
Abstract | We present the first attempt to perform short glass fiber semantic segmentation from X-ray computed tomography volumetric datasets at medium (3.9 μm isotropic) and low (8.3 μm isotropic) resolution using deep learning architectures. We performed experiments on both synthetic and real CT scans and evaluated deep fully convolutional architectures with both 2D and 3D kernels. Our artificial neural networks outperform existing methods at both medium and low resolution scans. |
Tasks | Semantic Segmentation |
Published | 2019-01-04 |
URL | http://arxiv.org/abs/1901.01211v1 |
PDF | http://arxiv.org/pdf/1901.01211v1.pdf |
PWC | https://paperswithcode.com/paper/fully-convolutional-deep-network |
Repo | |
Framework | |
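For orientation, a toy PyTorch fully convolutional network with 3D kernels that emits per-voxel class logits for a CT sub-volume is sketched below. It is an illustrative stand-in with arbitrary widths and depth, not one of the architectures evaluated in the paper.

```python
import torch
import torch.nn as nn

class TinyFCN3D(nn.Module):
    """Minimal fully convolutional net with 3D kernels for voxel-wise
    (fiber vs. matrix) segmentation; widths/depth are arbitrary."""
    def __init__(self, in_ch: int = 1, n_classes: int = 2, width: int = 16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv3d(in_ch, width, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.Conv3d(width, width, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.Conv3d(width, n_classes, kernel_size=1),   # per-voxel class scores
        )

    def forward(self, x):              # x: (batch, 1, D, H, W) CT sub-volume
        return self.net(x)             # logits: (batch, n_classes, D, H, W)

logits = TinyFCN3D()(torch.randn(1, 1, 32, 32, 32))
print(logits.shape)                    # torch.Size([1, 2, 32, 32, 32])
```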
Understanding Sparse JL for Feature Hashing
Title | Understanding Sparse JL for Feature Hashing |
Authors | Meena Jagadeesan |
Abstract | Feature hashing and other random projection schemes are commonly used to reduce the dimensionality of feature vectors. The goal is to efficiently project a high-dimensional feature vector living in $\mathbb{R}^n$ into a much lower-dimensional space $\mathbb{R}^m$, while approximately preserving the Euclidean norm. These schemes can be constructed using sparse random projections, for example using a sparse Johnson-Lindenstrauss (JL) transform. A line of work introduced by Weinberger et al. (ICML '09) analyzes the accuracy of sparse JL with sparsity 1 on feature vectors with small $\ell_\infty$-to-$\ell_2$ norm ratio. Recently, Freksen, Kamma, and Larsen (NeurIPS '18) closed this line of work by proving a tight tradeoff between $\ell_\infty$-to-$\ell_2$ norm ratio and accuracy for sparse JL with sparsity $1$. In this paper, we demonstrate the benefits of using sparsity $s$ greater than $1$ in sparse JL on feature vectors. Our main result is a tight tradeoff between $\ell_\infty$-to-$\ell_2$ norm ratio and accuracy for a general sparsity $s$, which significantly generalizes the result of Freksen et al. Our result theoretically demonstrates that sparse JL with $s > 1$ can have significantly better norm-preservation properties on feature vectors than sparse JL with $s = 1$; we also empirically demonstrate this finding. |
Tasks | |
Published | 2019-03-08 |
URL | https://arxiv.org/abs/1903.03605v2 |
PDF | https://arxiv.org/pdf/1903.03605v2.pdf |
PWC | https://paperswithcode.com/paper/understanding-sparse-jl-for-feature-hashing |
Repo | |
Framework | |
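To make the construction concrete, the NumPy sketch below builds a sparse JL projection with sparsity $s$ (each input coordinate hashes to $s$ output rows with random signs, scaled by $1/\sqrt{s}$) and empirically checks norm preservation; the dimensions and the test vector are arbitrary illustrative choices.

```python
import numpy as np

def sparse_jl_matrix(n: int, m: int, s: int, rng: np.random.Generator) -> np.ndarray:
    """Each of the n input coordinates maps to s distinct output rows,
    each with a random sign, scaled by 1/sqrt(s)."""
    A = np.zeros((m, n))
    for j in range(n):
        rows = rng.choice(m, size=s, replace=False)
        signs = rng.choice([-1.0, 1.0], size=s)
        A[rows, j] = signs / np.sqrt(s)
    return A

rng = np.random.default_rng(0)
n, m = 10_000, 256
x = rng.standard_normal(n)          # a "nice" vector: small l_inf-to-l_2 ratio
for s in (1, 4, 16):
    A = sparse_jl_matrix(n, m, s, rng)
    ratio = np.linalg.norm(A @ x) / np.linalg.norm(x)
    print(f"s={s:2d}  ||Ax|| / ||x|| = {ratio:.3f}")   # should stay close to 1
```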
Image Detection and Digit Recognition to solve Sudoku as a Constraint Satisfaction Problem
Title | Image Detection and Digit Recognition to solve Sudoku as a Constraint Satisfaction Problem |
Authors | Aditya Narayanaswamy, Yichuan Philip Ma, Piyush Shrivastava |
Abstract | Sudoku is a puzzle well-known to the scientific community with simple rules of completion, which may require a complex line of reasoning. This paper addresses the problem of partitioning the Sudoku image into a 1-D array, recognizing digits from the array and representing it as a Constraint Satisfaction Problem (CSP). In this paper, we introduce new feature extraction techniques for recognizing digits, which are used with our benchmark classifiers in conjunction with the CSP algorithms to provide performance assessment. Experimental results show that application of CSP techniques can decrease the solution’s search time by eliminating inconsistent values from the search space. |
Tasks | |
Published | 2019-05-25 |
URL | https://arxiv.org/abs/1905.10701v1 |
PDF | https://arxiv.org/pdf/1905.10701v1.pdf |
PWC | https://paperswithcode.com/paper/image-detection-and-digit-recognition-to |
Repo | |
Framework | |
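To illustrate the CSP stage, the sketch below takes the recognized digits as an 81-character string, prunes inconsistent values by constraint propagation, and falls back to backtracking with a minimum-remaining-values heuristic. It is a generic solver showing how propagation shrinks the search space, not the paper's implementation; the digit-recognition step is assumed to have already produced `grid`.

```python
from typing import Dict, List, Optional

def peers(i: int) -> List[int]:
    """Indices sharing a row, column or 3x3 box with cell i (0..80)."""
    r, c = divmod(i, 9)
    br, bc = 3 * (r // 3), 3 * (c // 3)
    units = ([r * 9 + k for k in range(9)] + [k * 9 + c for k in range(9)] +
             [(br + dr) * 9 + (bc + dc) for dr in range(3) for dc in range(3)])
    return [j for j in set(units) if j != i]

def solve(grid: str) -> Optional[Dict[int, str]]:
    """grid: 81 chars, '1'-'9' for recognized clues, anything else for blanks."""
    def eliminate(dom, j, v):
        if v not in dom[j]:
            return dom
        dom[j] = dom[j].replace(v, "")
        if not dom[j]:
            return None                      # contradiction: empty domain
        if len(dom[j]) == 1:
            forced = dom[j]                  # newly forced value propagates further
            for k in peers(j):
                if eliminate(dom, k, forced) is None:
                    return None
        return dom

    def assign(dom, i, v):
        dom = dict(dom)                      # copy so backtracking is cheap
        dom[i] = v
        for j in peers(i):                   # remove v from all peers
            if eliminate(dom, j, v) is None:
                return None
        return dom

    dom = {i: "123456789" for i in range(81)}
    for i, ch in enumerate(grid):            # seed with the recognized clues
        if ch in "123456789":
            dom = assign(dom, i, ch)
            if dom is None:
                return None

    def search(dom):
        if dom is None:
            return None
        if all(len(v) == 1 for v in dom.values()):
            return dom
        i = min((j for j in dom if len(dom[j]) > 1), key=lambda j: len(dom[j]))
        for v in dom[i]:                     # try each remaining candidate
            result = search(assign(dom, i, v))
            if result is not None:
                return result
        return None

    return search(dom)

puzzle = ("53..7...." "6..195..." ".98....6." "8...6...3" "4..8.3..1"
          "7...2...6" ".6....28." "...419..5" "....8..79")
solution = solve(puzzle)
print("".join(solution[i] for i in range(81)) if solution else "unsolvable")
```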
Neural Integration of Continuous Dynamics
Title | Neural Integration of Continuous Dynamics |
Authors | Margaret Trautner, Sai Ravela |
Abstract | Neural dynamical systems are dynamical systems that are described at least in part by neural networks. The class of continuous-time neural dynamical systems must, however, be numerically integrated for simulation and learning. Here, we present a compact neural circuit for two common numerical integrators: the explicit fixed-step Runge-Kutta method of any order and the semi-implicit/predictor-corrector Adams-Bashforth-Moulton method. Modeled as constant-sized recurrent networks embedding a continuous neural differential equation, they achieve fully neural temporal output. Using the polynomial class of dynamical systems, we demonstrate the equivalence of neural and numerical integration. |
Tasks | |
Published | 2019-11-23 |
URL | https://arxiv.org/abs/1911.10309v1 |
PDF | https://arxiv.org/pdf/1911.10309v1.pdf |
PWC | https://paperswithcode.com/paper/neural-integration-of-continuous-dynamics |
Repo | |
Framework | |
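As a worked example of the two integrators named above, the sketch below applies an explicit fixed-step RK4 step and an Adams-Bashforth-Moulton predictor-corrector step to an arbitrary vector field `f`, which could equally be a trained network; the harmonic-oscillator field and step size are illustrative, and the paper's neural-circuit embedding of these update rules is not reproduced.

```python
import numpy as np

def rk4_step(f, x, h):
    """One explicit fourth-order Runge-Kutta step."""
    k1 = f(x)
    k2 = f(x + 0.5 * h * k1)
    k3 = f(x + 0.5 * h * k2)
    k4 = f(x + h * k3)
    return x + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)

def abm2_step(f, x_prev, x, h):
    """Two-step Adams-Bashforth predictor with Adams-Moulton (trapezoidal) corrector."""
    pred = x + h * (1.5 * f(x) - 0.5 * f(x_prev))   # AB2 predictor
    return x + 0.5 * h * (f(pred) + f(x))           # AM2 corrector

# Toy "neural" dynamics: a linear vector field standing in for a trained network.
A = np.array([[0.0, 1.0], [-1.0, 0.0]])            # harmonic oscillator
f = lambda x: A @ x

x0, h = np.array([1.0, 0.0]), 0.01
xs = [x0]
for _ in range(1000):
    xs.append(rk4_step(f, xs[-1], h))
print(np.linalg.norm(xs[-1]))                       # stays ~1.0: energy approximately conserved

# One predictor-corrector step continuing from the RK4 trajectory:
x_next = abm2_step(f, xs[-2], xs[-1], h)
```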
Regularizing Black-box Models for Improved Interpretability (HILL 2019 Version)
Title | Regularizing Black-box Models for Improved Interpretability (HILL 2019 Version) |
Authors | Gregory Plumb, Maruan Al-Shedivat, Eric Xing, Ameet Talwalkar |
Abstract | Most of the work on interpretable machine learning has focused on designing either inherently interpretable models, which typically trade-off accuracy for interpretability, or post-hoc explanation systems, which lack guarantees about their explanation quality. We propose an alternative to these approaches by directly regularizing a black-box model for interpretability at training time. Our approach explicitly connects three key aspects of interpretable machine learning: (i) the model’s innate explainability, (ii) the explanation system used at test time, and (iii) the metrics that measure explanation quality. Our regularization results in substantial improvement in terms of the explanation fidelity and stability metrics across a range of datasets and black-box explanation systems while slightly improving accuracy. Further, if the resulting model is still not sufficiently interpretable, the weight of the regularization term can be adjusted to achieve the desired trade-off between accuracy and interpretability. Finally, we justify theoretically that the benefits of explanation-based regularization generalize to unseen points. |
Tasks | Interpretable Machine Learning |
Published | 2019-05-31 |
URL | https://arxiv.org/abs/1906.01431v1 |
PDF | https://arxiv.org/pdf/1906.01431v1.pdf |
PWC | https://paperswithcode.com/paper/regularizing-black-box-models-for-improved-1 |
Repo | |
Framework | |
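For intuition only, here is a hedged PyTorch sketch of an explanation-based regularizer in the spirit of the paper: it penalizes how poorly a first-order (locally linear) explanation fits the model in a small neighbourhood of each input. The paper's actual regularizer fits a separate local surrogate explanation system; the Taylor-based variant, the noise scale, the sample count, and the assumed (batch, 1) output shape are simplifications.

```python
import torch

def local_linearity_penalty(model, x, sigma: float = 0.1, n_samples: int = 8):
    """Mean squared gap between the model at perturbed points and its
    first-order Taylor expansion around x (lower = easier to explain locally)."""
    x = x.clone().requires_grad_(True)
    y = model(x)                                    # assumed shape (batch, 1)
    grad = torch.autograd.grad(y.sum(), x, create_graph=True)[0]
    penalty = x.new_zeros(())
    for _ in range(n_samples):
        eps = sigma * torch.randn_like(x)
        y_lin = y + (grad * eps).sum(dim=1, keepdim=True)   # local linear prediction
        penalty = penalty + ((model(x + eps) - y_lin) ** 2).mean()
    return penalty / n_samples

# Inside a training step, lambda_reg trades accuracy for interpretability:
#   loss = task_loss(model(x), target) + lambda_reg * local_linearity_penalty(model, x)
```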
Applications of a Novel Knowledge Discovery and Data Mining Process Model for Metabolomics
Title | Applications of a Novel Knowledge Discovery and Data Mining Process Model for Metabolomics |
Authors | Ahmed BaniMustafa, Nigel Hardy |
Abstract | This work demonstrates the execution of a novel process model for knowledge discovery and data mining for metabolomics (MeKDDaM). It aims to illustrate the applicability of the MeKDDaM process model using four different real-world applications and to highlight its strengths and unique features. The demonstrated applications provide coverage for metabolite profiling, target analysis, and metabolic fingerprinting. The data analysed in these applications were captured by liquid chromatography-mass spectrometry (LC-MS), Fourier transform infrared spectroscopy (FT-IR), and nuclear magnetic resonance spectroscopy (NMR), and involve the analysis of plant, animal, and human samples. The process was executed using both data-driven and hypothesis-driven data mining approaches in order to address various data mining goals and tasks by applying a number of data mining techniques. The applications were selected to cover a range of analytical goals and research questions and to span metabolite profiling, target analysis, and metabolic fingerprinting using datasets captured by NMR, LC-MS, and FT-IR from samples of plant, animal, and human origin. The process was applied using an implementation environment created to provide a computer-aided realisation of the process model execution. |
Tasks | |
Published | 2019-07-09 |
URL | https://arxiv.org/abs/1907.03755v2 |
PDF | https://arxiv.org/pdf/1907.03755v2.pdf |
PWC | https://paperswithcode.com/paper/applications-of-a-novel-knowledge-discovery |
Repo | |
Framework | |
Interactive Lungs Auscultation with Reinforcement Learning Agent
Title | Interactive Lungs Auscultation with Reinforcement Learning Agent |
Authors | Tomasz Grzywalski, Riccardo Belluzzo, Szymon Drgas, Agnieszka Cwalinska, Honorata Hafke-Dys |
Abstract | Performing a precise auscultation to examine the respiratory system normally requires the presence of an experienced doctor. With the most recent advances in machine learning and artificial intelligence, automatic detection of pathological breath phenomena in sounds recorded with a stethoscope is becoming a reality. But having a layman perform a full auscultation in a home environment is another matter, especially if the patient is a child. In this paper we propose a unique application of Reinforcement Learning for training an agent that interactively guides the end user throughout the auscultation procedure. We show that *intelligent* selection of auscultation points by the agent reduces the examination time fourfold without a significant decrease in diagnosis accuracy compared to exhaustive auscultation. |
Tasks | |
Published | 2019-07-25 |
URL | https://arxiv.org/abs/1907.11238v1 |
PDF | https://arxiv.org/pdf/1907.11238v1.pdf |
PWC | https://paperswithcode.com/paper/interactive-lungs-auscultation-with |
Repo | |
Framework | |
Counterfactual Visual Explanations
Title | Counterfactual Visual Explanations |
Authors | Yash Goyal, Ziyan Wu, Jan Ernst, Dhruv Batra, Devi Parikh, Stefan Lee |
Abstract | In this work, we develop a technique to produce counterfactual visual explanations. Given a ‘query’ image $I$ for which a vision system predicts class $c$, a counterfactual visual explanation identifies how $I$ could change such that the system would output a different specified class $c'$. To do this, we select a ‘distractor’ image $I'$ that the system predicts as class $c'$ and identify spatial regions in $I$ and $I'$ such that replacing the identified region in $I$ with the identified region in $I'$ would push the system towards classifying $I$ as $c'$. We apply our approach to multiple image classification datasets generating qualitative results showcasing the interpretability and discriminativeness of our counterfactual explanations. To explore the effectiveness of our explanations in teaching humans, we present machine teaching experiments for the task of fine-grained bird classification. We find that users trained to distinguish bird species fare better when given access to counterfactual explanations in addition to training examples. |
Tasks | Image Classification |
Published | 2019-04-16 |
URL | https://arxiv.org/abs/1904.07451v2 |
PDF | https://arxiv.org/pdf/1904.07451v2.pdf |
PWC | https://paperswithcode.com/paper/counterfactual-visual-explanations |
Repo | |
Framework | |
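The shape of the search can be seen in the sketch below: exhaustively try replacing one spatial cell of the query's feature map with one cell of the distractor's, and keep the swap that most increases the probability of the distractor class $c'$. The feature maps and the global-average-pool head are random stand-ins for a CNN backbone and classifier, not the paper's models.

```python
import numpy as np

def best_single_swap(feat_q, feat_d, head, target_class):
    """Exhaustive single-edit search: which query cell, replaced by which
    distractor cell, most increases p(target_class)? O((H*W)^2) evaluations."""
    H, W, _ = feat_q.shape
    best = (None, None, -np.inf)
    for i in range(H):
        for j in range(W):
            for k in range(H):
                for l in range(W):
                    edited = feat_q.copy()
                    edited[i, j] = feat_d[k, l]          # replace one query cell
                    p = head(edited)[target_class]
                    if p > best[2]:
                        best = ((i, j), (k, l), p)
    return best                                          # query cell, distractor cell, p(c')

# Random stand-ins so the sketch runs end-to-end (not a real CNN or dataset).
rng = np.random.default_rng(0)
D, n_classes = 8, 5
W_head = rng.standard_normal((n_classes, D))

def head(feat):                                          # (H, W, D) -> class probabilities
    logits = W_head @ feat.mean(axis=(0, 1))             # global average pool + linear
    e = np.exp(logits - logits.max())
    return e / e.sum()

feat_q = rng.standard_normal((4, 4, D))                  # "query" feature map
feat_d = rng.standard_normal((4, 4, D))                  # "distractor" feature map
print(best_single_swap(feat_q, feat_d, head, target_class=2))
```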
Learning Word Embeddings with Domain Awareness
Title | Learning Word Embeddings with Domain Awareness |
Authors | Guoyin Wang, Yan Song, Yue Zhang, Dong Yu |
Abstract | Word embeddings are traditionally trained on a large corpus in an unsupervised setting, with no specific design for incorporating domain knowledge. This can lead to unsatisfactory performance when training data originate from heterogeneous domains. In this paper, we propose two novel mechanisms for domain-aware word embedding training, namely domain indicator and domain attention, which integrate domain-specific knowledge into the widely used SG and CBOW models, respectively. The two methods are based on a joint learning paradigm and ensure that words in a target domain are intensively focused on when trained on a source domain corpus. Qualitative and quantitative evaluations confirm the validity and effectiveness of our models. Compared to baseline methods, our method is particularly effective in near-cold-start scenarios. |
Tasks | Learning Word Embeddings, Word Embeddings |
Published | 2019-06-07 |
URL | https://arxiv.org/abs/1906.03249v3 |
PDF | https://arxiv.org/pdf/1906.03249v3.pdf |
PWC | https://paperswithcode.com/paper/learning-word-embeddings-with-domain |
Repo | |
Framework | |
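One plausible, heavily simplified reading of the domain-indicator idea is sketched below: a CBOW-style scorer in which a learned domain vector is mixed into the averaged context representation before scoring the centre word. The paper's actual formulation, its training objective, and the domain-attention variant for SG are not reproduced; all shapes and values are illustrative.

```python
import numpy as np

def cbow_domain_score(context_ids, center_id, domain_id, W_in, W_out, D):
    """Dot-product score of the centre word against a domain-aware context:
    average of context word vectors plus a learned domain vector."""
    h = W_in[context_ids].mean(axis=0) + D[domain_id]
    return float(W_out[center_id] @ h)

rng = np.random.default_rng(0)
V, dims, n_domains = 1000, 50, 3
W_in = rng.normal(size=(V, dims))       # input (context) embeddings
W_out = rng.normal(size=(V, dims))      # output (centre-word) embeddings
D = rng.normal(size=(n_domains, dims))  # one vector per domain
print(cbow_domain_score([4, 7, 9, 12], center_id=5, domain_id=1,
                        W_in=W_in, W_out=W_out, D=D))
```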
Hybrid Neural Tagging Model for Open Relation Extraction
Title | Hybrid Neural Tagging Model for Open Relation Extraction |
Authors | Shengbin Jia, Yang Xiang |
Abstract | Open relation extraction (ORE), which aims to obtain a semantic representation by discovering arbitrary relation tuples in unstructured text, remains challenging. Conventional methods depend heavily on feature engineering or syntactic parsing, which makes them inefficient or prone to cascading errors. Recently, leveraging supervised deep learning architectures to address the ORE task has become an extraordinarily promising direction. However, there are two main challenges: (1) the lack of a sufficiently large labeled corpus to support supervised training; and (2) the design of a neural architecture suited to the characteristics of open relation extraction. In this paper, to overcome these difficulties, we build a large-scale, high-quality training corpus in a fully automated way, and design a tagging scheme that transforms the ORE task into a sequence tagging problem. Furthermore, we propose a hybrid neural network model (HNN4ORT) for open relation tagging. The model employs an Ordered Neurons LSTM to encode latent syntactic information and capture the associations among arguments and relations. It also introduces a novel Dual Aware Mechanism consisting of Local-aware Attention and Global-aware Convolution. The two forms of awareness complement each other, allowing the model to treat sentence-level semantics as a global perspective while exploiting salient local features to achieve sparse annotation. Experimental results on various test sets show that our model achieves state-of-the-art performance compared to conventional methods and other neural models. |
Tasks | Feature Engineering, Relation Extraction |
Published | 2019-07-26 |
URL | https://arxiv.org/abs/1908.01761v3 |
PDF | https://arxiv.org/pdf/1908.01761v3.pdf |
PWC | https://paperswithcode.com/paper/neural-open-relation-extraction-via-an |
Repo | |
Framework | |
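For readers unfamiliar with casting ORE as sequence tagging, the toy example below uses generic BIO-style labels to encode the two arguments and the relation span of a tuple and then decodes them back. The labels are illustrative only; the paper defines its own tagging scheme and trains HNN4ORT to predict the tags.

```python
# Toy example: token-level tags encode an open relation tuple (arg1, relation, arg2).
sentence = ["Tim", "Cook", "is", "the", "CEO", "of", "Apple", "."]
tags     = ["B-A1", "I-A1", "B-REL", "I-REL", "I-REL", "I-REL", "B-A2", "O"]

def decode(tokens, tags):
    """Collect tokens by tag type to recover the relation tuple."""
    spans = {"A1": [], "REL": [], "A2": []}
    for tok, tag in zip(tokens, tags):
        if tag != "O":
            spans[tag.split("-")[1]].append(tok)
    return tuple(" ".join(spans[key]) for key in ("A1", "REL", "A2"))

print(decode(sentence, tags))   # ('Tim Cook', 'is the CEO of', 'Apple')
```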
Training Data Distribution Search with Ensemble Active Learning
Title | Training Data Distribution Search with Ensemble Active Learning |
Authors | Kashyap Chitta, Jose M. Alvarez, Elmar Haussmann, Clement Farabet |
Abstract | Deep Neural Networks (DNNs) often rely on very large datasets for training. Given the large size of such datasets, it is conceivable that they contain certain samples that either do not contribute or negatively impact the DNN’s optimization. Modifying the training distribution in a way that excludes such samples could provide an effective solution to both improve performance and reduce training time. In this paper, we propose to scale up ensemble Active Learning methods to perform acquisition at a large scale (10k to 500k samples at a time). We do this with ensembles of hundreds of models, obtained at a minimal computational cost by reusing intermediate training checkpoints. This allows us to automatically and efficiently perform a training data distribution search for large labeled datasets. We observe that our approach obtains favorable subsets of training data, which can be used to train more accurate DNNs than training with the entire dataset. We perform an extensive experimental study of this phenomenon on three image classification benchmarks (CIFAR-10, CIFAR-100 and ImageNet), analyzing the impact of initialization schemes, acquisition functions and ensemble configurations. We demonstrate that data subsets identified with a lightweight ResNet-18 ensemble remain effective when used to train deep models like ResNet-101 and DenseNet-121. Our results provide strong empirical evidence that optimizing the training data distribution can provide significant benefits on large scale vision tasks. |
Tasks | Active Learning, Image Classification |
Published | 2019-05-29 |
URL | https://arxiv.org/abs/1905.12737v2 |
PDF | https://arxiv.org/pdf/1905.12737v2.pdf |
PWC | https://paperswithcode.com/paper/less-is-more-an-exploration-of-data |
Repo | |
Framework | |
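As one concrete acquisition function of the kind the paper compares, the sketch below ranks an unlabeled pool by the mutual information between predictions and ensemble members (i.e., by checkpoint disagreement) and takes the top $k$ samples in a single large round. The `probs` array stands in for precomputed softmax outputs from the reused checkpoints, and the synthetic data is for demonstration only.

```python
import numpy as np

def acquire(probs: np.ndarray, k: int) -> np.ndarray:
    """probs: (n_models, n_pool, n_classes) softmax outputs from the ensemble.
    Returns indices of the k samples with the highest ensemble disagreement."""
    mean_p = probs.mean(axis=0)                                   # ensemble average
    entropy = -(mean_p * np.log(mean_p + 1e-12)).sum(axis=1)      # predictive entropy
    avg_entropy = -(probs * np.log(probs + 1e-12)).sum(axis=2).mean(axis=0)
    mutual_info = entropy - avg_entropy       # disagreement between ensemble members
    return np.argsort(-mutual_info)[:k]       # indices of the k most informative samples

# Synthetic example: 5 checkpoints, 1,000 pool samples, 10 classes.
rng = np.random.default_rng(0)
logits = rng.standard_normal((5, 1000, 10))
probs = np.exp(logits) / np.exp(logits).sum(axis=2, keepdims=True)
print(acquire(probs, k=10))
```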
Interactive Video Retrieval with Dialog
Title | Interactive Video Retrieval with Dialog |
Authors | Sho Maeoki, Kohei Uehara, Tatsuya Harada |
Abstract | Now that everyone can easily record videos, and their quantity is continuously increasing, research on improved video retrieval methods is important in the contemporary world. When target videos must be identified within a large collection gathered by individuals, the appropriate information has to be obtained to retrieve the correct video from among many similar items in the target database. The purpose of this research is to retrieve target videos in such cases by introducing an interaction, or a dialog, between the system and the user. We propose a system that retrieves videos by asking questions about their content and leveraging the user’s responses. Additionally, we confirmed the usefulness of the proposed system through experiments on the AVSD dataset, which includes videos and dialogs about the videos. |
Tasks | Video Retrieval |
Published | 2019-05-07 |
URL | https://arxiv.org/abs/1905.02442v1 |
PDF | https://arxiv.org/pdf/1905.02442v1.pdf |
PWC | https://paperswithcode.com/paper/interactive-video-retrieval-with-dialog |
Repo | |
Framework | |
Experimental quantum homodyne tomography via machine learning
Title | Experimental quantum homodyne tomography via machine learning |
Authors | E. S. Tiunov, V. V. Tiunova, A. E. Ulanov, A. I. Lvovsky, A. K. Fedorov |
Abstract | Complete characterization of states and processes that occur within quantum devices is crucial for understanding and testing their potential to outperform classical technologies for communications and computing. However, solving this task with current state-of-the-art techniques becomes unwieldy for large and complex quantum systems. Here we realize and experimentally demonstrate a method for complete characterization of a quantum harmonic oscillator based on an artificial neural network known as the restricted Boltzmann machine. We apply the method to optical homodyne tomography and show it to allow full estimation of quantum states based on a smaller amount of experimental data compared to state-of-the-art methods. We link this advantage to reduced overfitting. Although our experiment is in the optical domain, our method provides a way of exploring quantum resources in a broad class of large-scale physical systems, such as superconducting circuits, atomic and molecular ensembles, and optomechanical systems. |
Tasks | |
Published | 2019-07-15 |
URL | https://arxiv.org/abs/1907.06589v2 |
PDF | https://arxiv.org/pdf/1907.06589v2.pdf |
PWC | https://paperswithcode.com/paper/experimental-machine-learning-quantum |
Repo | |
Framework | |
Extracting and Learning a Dependency-Enhanced Type Lexicon for Dutch
Title | Extracting and Learning a Dependency-Enhanced Type Lexicon for Dutch |
Authors | Konstantinos Kogkalidis |
Abstract | This thesis is concerned with type-logical grammars and their practical applicability as tools of reasoning about sentence syntax and semantics. The focal point is narrowed to Dutch, a language exhibiting a large degree of word order variability. In order to overcome difficulties arising as a result of that variability, the thesis explores and expands upon a type grammar based on Multiplicative Intuitionistic Linear Logic, agnostic to word order but enriched with decorations that aim to reduce its proof-theoretic complexity. An algorithm for the conversion of dependency-annotated sentences into type sequences is then implemented, populating the type logic with concrete, data-driven lexical types. Two experiments are run on the resulting grammar instantiation. The first pertains to the learnability of the type-assignment process by a neural architecture. A novel application of a self-attentive sequence transduction model is proposed; contrary to established practices, it constructs types inductively by internalizing the type-formation syntax, thus exhibiting generalizability beyond a pre-specified type vocabulary. The second revolves around a deductive parsing system that can resolve structural ambiguities by consulting both word and type information; preliminary results suggest both excellent computational efficiency and performance. |
Tasks | |
Published | 2019-09-06 |
URL | https://arxiv.org/abs/1909.02955v2 |
PDF | https://arxiv.org/pdf/1909.02955v2.pdf |
PWC | https://paperswithcode.com/paper/extracting-and-learning-a-dependency-enhanced |
Repo | |
Framework | |