January 25, 2020

3381 words 16 mins read

Paper Group NAWR 41

Deep imitation learning for molecular inverse problems. Eliciting Knowledge from Experts: Automatic Transcript Parsing for Cognitive Task Analysis. Categorized Bandits. Kalman Filter, Sensor Fusion, and Constrained Regression: Equivalences and Insights. Multiattentive Recurrent Neural Network Architecture for Multilingual Readability Assessment. Co …

Deep imitation learning for molecular inverse problems

Title Deep imitation learning for molecular inverse problems
Authors Eric Jonas
Abstract Many measurement modalities arise from well-understood physical processes and result in information-rich but difficult-to-interpret data. Much of this data still requires laborious human interpretation. This is the case in nuclear magnetic resonance (NMR) spectroscopy, where the observed spectrum of a molecule provides a distinguishing fingerprint of its bond structure. Here we solve the resulting inverse problem: given a molecular formula and a spectrum, can we infer the chemical structure? We show that for a wide variety of molecules we can quickly compute the correct molecular structure, and can detect with reasonable certainty when our method cannot. We treat this as a problem of graph-structured prediction, where, armed with per-vertex information on a subset of the vertices, we infer the edges and edge types. We frame the problem as a Markov decision process (MDP) and incrementally construct molecules one bond at a time, training a deep neural network via imitation learning, where we learn to imitate a subisomorphic oracle which knows which remaining bonds are correct. Our method is fast, accurate, and is the first among recent chemical-graph generation approaches to exploit per-vertex information and generate graphs with vertex constraints. Our method points the way towards automation of molecular structure identification and potentially active learning for spectroscopy.
Tasks Active Learning, Graph Generation, Imitation Learning, Structured Prediction
Published 2019-12-01
URL http://papers.nips.cc/paper/8744-deep-imitation-learning-for-molecular-inverse-problems
PDF http://papers.nips.cc/paper/8744-deep-imitation-learning-for-molecular-inverse-problems.pdf
PWC https://paperswithcode.com/paper/deep-imitation-learning-for-molecular-inverse
Repo https://github.com/thejonaslab/2019-NeurIPS-molecular-inverse-problems
Framework none
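
To make the incremental bond-construction MDP concrete, here is a minimal sketch (our own simplification, not the authors' code): a stand-in `score_edges` policy scores candidate bonds, and the imitation label is simply whether the chosen bond exists in the ground-truth graph, whereas the paper uses typed bonds and a subisomorphism oracle.

```python
import numpy as np

rng = np.random.default_rng(0)
n_atoms = 6
# Ground-truth adjacency (symmetric 0/1 for simplicity; real bonds are typed).
true_adj = np.triu(rng.integers(0, 2, (n_atoms, n_atoms)), k=1)

def score_edges(partial_adj):
    """Stand-in for the policy network: random scores over candidate bonds."""
    return rng.random(partial_adj.shape)

partial = np.zeros_like(true_adj)
dataset = []  # (state, action, oracle_label) tuples for imitation learning
for step in range(int(true_adj.sum())):
    scores = score_edges(partial)
    valid = (np.triu(np.ones_like(scores), k=1) == 1) & (partial == 0)
    scores = np.where(valid, scores, -np.inf)
    i, j = np.unravel_index(np.argmax(scores), scores.shape)
    oracle_ok = bool(true_adj[i, j])      # would the oracle accept this bond?
    dataset.append((partial.copy(), (i, j), oracle_ok))
    if oracle_ok:
        partial[i, j] = 1                 # only commit correct bonds here
print(f"collected {len(dataset)} imitation examples")
```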

Eliciting Knowledge from Experts: Automatic Transcript Parsing for Cognitive Task Analysis

Title Eliciting Knowledge from Experts: Automatic Transcript Parsing for Cognitive Task Analysis
Authors Junyi Du, He Jiang, Jiaming Shen, Xiang Ren
Abstract Cognitive task analysis (CTA) is a type of analysis in applied psychology aimed at eliciting and representing the knowledge and thought processes of domain experts. In CTA, heavy human labor is often involved in parsing interview transcripts into structured knowledge (e.g., flowcharts for different actions). To reduce human effort and scale the process, automated CTA transcript parsing is desirable. However, this task has unique challenges as (1) it requires the understanding of long-range context information in conversational text; and (2) the amount of labeled data is limited and indirect, i.e., context-aware, noisy, and low-resource. In this paper, we propose a weakly-supervised information extraction framework for automated CTA transcript parsing. We partition the parsing process into a sequence labeling task and a text span-pair relation extraction task, with distant supervision from human-curated protocol files. To model long-range context information for extracting sentence relations, neighboring sentences are included as part of the input. Different types of models for capturing context dependency are then applied. We manually annotate real-world CTA transcripts to facilitate the evaluation of the parsing tasks.
Tasks Relation Extraction
Published 2019-07-01
URL https://www.aclweb.org/anthology/P19-1420/
PDF https://www.aclweb.org/anthology/P19-1420
PWC https://paperswithcode.com/paper/eliciting-knowledge-from-experts-automatic
Repo https://github.com/cnrpman/procedural-extraction
Framework none
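
A minimal sketch of the two-stage pipeline the abstract describes (sequence labeling, then span-pair relation extraction, with neighboring sentences as context). The labeler and relation classifier below are heuristic stand-ins, not the paper's models.

```python
transcript = [
    "First, check the patient's airway.",
    "If it is blocked, clear it.",
    "Then check breathing.",
]

def label_sentence(sent, context):
    """Stand-in sequence labeler: tags action sentences with a heuristic."""
    return "ACTION" if sent.lower().startswith(("first", "then", "if")) else "O"

def relation(span_a, span_b, context):
    """Stand-in span-pair classifier: adjacent actions become 'next-step'."""
    return "next-step"

# Stage 1: label each sentence, feeding its neighbors as context.
labeled = []
for i, sent in enumerate(transcript):
    context = transcript[max(0, i - 1):i + 2]  # previous/next sentence window
    labeled.append((sent, label_sentence(sent, context)))

# Stage 2: extract relations between consecutive action spans.
actions = [s for s, tag in labeled if tag == "ACTION"]
edges = [(a, b, relation(a, b, transcript)) for a, b in zip(actions, actions[1:])]
print(edges)
```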

Categorized Bandits

Title Categorized Bandits
Authors Matthieu Jedor, Vianney Perchet, Jonathan Louedec
Abstract We introduce a new stochastic multi-armed bandit setting where arms are grouped inside "ordered" categories. The motivating example comes from e-commerce, where a customer typically has a greater appetence for items of a specific well-identified but unknown category than any other one. We introduce three concepts of ordering between categories, inspired by stochastic dominance between random variables, which are gradually weaker so that more and more bandit scenarios satisfy at least one of them. We first prove instance-dependent lower bounds on the cumulative regret for each of these models, indicating how the complexity of the bandit problems increases with the generality of the ordering concept considered. We also provide algorithms that fully leverage the structure of the model with their associated theoretical guarantees. Finally, we have conducted an analysis on real data to highlight that those ordered categories actually exist in practice.
Tasks
Published 2019-12-01
URL http://papers.nips.cc/paper/9586-categorized-bandits
PDF http://papers.nips.cc/paper/9586-categorized-bandits.pdf
PWC https://paperswithcode.com/paper/categorized-bandits
Repo https://github.com/mjedor/categorized-bandits
Framework none
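
As a toy illustration of ordered categories, the sketch below runs UCB1 arm selection after an optimistic category choice; the paper's algorithms exploit the dominance structure more carefully, and the category means here are invented.

```python
import numpy as np

rng = np.random.default_rng(1)
means = [np.array([0.2, 0.3]), np.array([0.6, 0.8])]  # category 1 dominates 0
counts = [np.ones_like(m) for m in means]             # one pseudo-pull per arm
sums = [rng.binomial(1, m).astype(float) for m in means]

T = 2000
for t in range(1, T + 1):
    # UCB index per arm, then pick the category with the best optimistic arm.
    ucbs = [s / c + np.sqrt(2 * np.log(t + 1) / c) for s, c in zip(sums, counts)]
    cat = max(range(len(means)), key=lambda k: ucbs[k].max())
    arm = int(np.argmax(ucbs[cat]))
    reward = rng.binomial(1, means[cat][arm])
    sums[cat][arm] += reward
    counts[cat][arm] += 1
print("pulls per category:", [int(c.sum()) for c in counts])
```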

Kalman Filter, Sensor Fusion, and Constrained Regression: Equivalences and Insights

Title Kalman Filter, Sensor Fusion, and Constrained Regression: Equivalences and Insights
Authors Maria Jahja, David Farrow, Roni Rosenfeld, Ryan J. Tibshirani
Abstract The Kalman filter (KF) is one of the most widely used tools for data assimilation and sequential estimation. In this work, we show that the state estimates from the KF in a standard linear dynamical system setting are equivalent to those given by the KF in a transformed system, with infinite process noise (i.e., a "flat prior") and an augmented measurement space. This reformulation—which we refer to as augmented measurement sensor fusion (SF)—is conceptually interesting, because the transformed system here is seemingly static (as there is effectively no process model), but we can still capture the state dynamics inherent to the KF by folding the process model into the measurement space. Further, this reformulation of the KF turns out to be useful in settings in which past states are observed eventually (at some lag). Here, when the measurement noise covariance is estimated by the empirical covariance, we show that the state predictions from SF are equivalent to those from a regression of past states on past measurements, subject to particular linear constraints (reflecting the relationships encoded in the measurement map). This allows us to port standard ideas (say, regularization methods) in regression over to dynamical systems. For example, we can posit multiple candidate process models, fold all of them into the measurement model, transform to the regression perspective, and apply $\ell_1$ penalization to perform process model selection. We give various empirical demonstrations, and focus on an application to nowcasting the weekly incidence of influenza in the US.
Tasks Model Selection, Sensor Fusion
Published 2019-12-01
URL http://papers.nips.cc/paper/9475-kalman-filter-sensor-fusion-and-constrained-regression-equivalences-and-insights
PDF http://papers.nips.cc/paper/9475-kalman-filter-sensor-fusion-and-constrained-regression-equivalences-and-insights.pdf
PWC https://paperswithcode.com/paper/kalman-filter-sensor-fusion-and-constrained
Repo https://github.com/mariajahja/kf-sf-flu-nowcasting
Framework none
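
The regression view lends itself to a short worked example. The sketch below (our assumptions: synthetic measurements and a hand-rolled coordinate-descent lasso) shows how an $\ell_1$ penalty on a regression of past states on past measurements zeroes out irrelevant candidate process models, as the abstract describes.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 200, 6
Z = rng.normal(size=(n, p))                             # past (augmented) measurements
beta_true = np.array([1.5, 0.0, 0.0, -2.0, 0.0, 0.0])   # only a few models matter
x = Z @ beta_true + 0.1 * rng.normal(size=n)            # past states

def lasso_cd(Z, x, lam, iters=200):
    """Plain coordinate-descent lasso (soft-thresholding), for illustration."""
    beta = np.zeros(Z.shape[1])
    col_sq = (Z ** 2).sum(axis=0)
    for _ in range(iters):
        for j in range(Z.shape[1]):
            r = x - Z @ beta + Z[:, j] * beta[j]   # residual excluding column j
            rho = Z[:, j] @ r
            beta[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
    return beta

print(np.round(lasso_cd(Z, x, lam=20.0), 2))  # near-zero weights drop models
```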

Multiattentive Recurrent Neural Network Architecture for Multilingual Readability Assessment

Title Multiattentive Recurrent Neural Network Architecture for Multilingual Readability Assessment
Authors Ion Madrazo Azpiazu, Maria Soledad Pera
Abstract We present a multiattentive recurrent neural network architecture for automatic multilingual readability assessment. This architecture considers raw words as its main input, but internally captures text structure and informs its word attention process using other syntax- and morphology-related datapoints, known to be of great importance to readability. This is achieved by a multiattentive strategy that allows the neural network to focus on specific parts of a text for predicting its reading level. We conducted an exhaustive evaluation using data sets targeting multiple languages and prediction task types, to compare the proposed model with traditional, state-of-the-art, and other neural network strategies.
Tasks
Published 2019-03-01
URL https://www.aclweb.org/anthology/Q19-1028/
PDF https://www.aclweb.org/anthology/Q19-1028
PWC https://paperswithcode.com/paper/multiattentive-recurrent-neural-network
Repo https://github.com/ionmadrazo/Vec2Read
Framework pytorch
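
A hedged sketch of what a "multiattentive" readout could look like in PyTorch: several attention heads pool the same recurrent word states, one head per auxiliary view (e.g., a syntax- or morphology-related signal). The module name, sizes, and single-query-per-view design are our assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class MultiAttentiveReadout(nn.Module):
    def __init__(self, hidden, n_views):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(n_views, hidden))

    def forward(self, states):                 # states: (batch, seq, hidden)
        # One attention distribution per view over the word states.
        logits = states @ self.queries.t()     # (batch, seq, n_views)
        weights = torch.softmax(logits, dim=1)
        pooled = torch.einsum("bsv,bsh->bvh", weights, states)
        return pooled.flatten(1)               # concat view-specific summaries

rnn = nn.GRU(input_size=50, hidden_size=64, batch_first=True)
readout = MultiAttentiveReadout(hidden=64, n_views=3)
words = torch.randn(8, 20, 50)                 # a batch of embedded sentences
states, _ = rnn(words)
print(readout(states).shape)                   # torch.Size([8, 192])
```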

Contrast Prior and Fluid Pyramid Integration for RGBD Salient Object Detection

Title Contrast Prior and Fluid Pyramid Integration for RGBD Salient Object Detection
Authors Jia-Xing Zhao, Yang Cao, Deng-Ping Fan, Ming-Ming Cheng, Xuan-Yi Li, Le Zhang
Abstract The wide availability of depth sensors provides valuable complementary information for salient object detection (SOD) in RGBD images. However, due to the inherent difference between RGB and depth information, extracting features from the depth channel using ImageNet pre-trained backbone models and fusing them with RGB features directly are sub-optimal. In this paper, we incorporate the contrast prior, which used to be a dominant cue in non-deep-learning SOD approaches, into a CNN-based architecture to enhance the depth information. The enhanced depth cues are further integrated with RGB features for SOD, using a novel fluid pyramid integration, which can make better use of multi-scale cross-modal features. Comprehensive experiments on 5 challenging benchmark datasets demonstrate the superiority of the proposed architecture, CPFP, over 9 state-of-the-art alternative methods.
Tasks Object Detection, Salient Object Detection
Published 2019-06-01
URL http://openaccess.thecvf.com/content_CVPR_2019/html/Zhao_Contrast_Prior_and_Fluid_Pyramid_Integration_for_RGBD_Salient_Object_CVPR_2019_paper.html
PDF http://openaccess.thecvf.com/content_CVPR_2019/papers/Zhao_Contrast_Prior_and_Fluid_Pyramid_Integration_for_RGBD_Salient_Object_CVPR_2019_paper.pdf
PWC https://paperswithcode.com/paper/contrast-prior-and-fluid-pyramid-integration
Repo https://github.com/JXingZhao/ContrastPrior
Framework none
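
A minimal sketch of top-down multi-scale fusion in the spirit of the fluid pyramid (our simplification, not the paper's FPI module): RGB and enhanced-depth features are summed per scale, and every coarser level is upsampled and folded into each finer one, so cross-modal information flows across all scales.

```python
import torch
import torch.nn.functional as F

def pyramid_fuse(rgb_feats, depth_feats):
    """rgb_feats/depth_feats: lists of (B, C, H, W) tensors, fine to coarse."""
    fused = [r + d for r, d in zip(rgb_feats, depth_feats)]
    out = []
    for i, f in enumerate(fused):
        acc = f
        for coarser in fused[i + 1:]:  # pull in all coarser levels
            acc = acc + F.interpolate(coarser, size=f.shape[-2:],
                                      mode="bilinear", align_corners=False)
        out.append(acc)
    return out  # multi-scale features for a saliency head

feats = [torch.randn(2, 32, s, s) for s in (64, 32, 16)]
dfeats = [torch.randn(2, 32, s, s) for s in (64, 32, 16)]
print([f.shape for f in pyramid_fuse(feats, dfeats)])
```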

Stacked Cross Refinement Network for Edge-Aware Salient Object Detection

Title Stacked Cross Refinement Network for Edge-Aware Salient Object Detection
Authors Zhe Wu, Li Su, Qingming Huang
Abstract Salient object detection is a fundamental computer vision task. The majority of existing algorithms focus on aggregating multi-level features of pre-trained convolutional neural networks. Moreover, some researchers attempt to utilize edge information for auxiliary training. However, existing edge-aware models design unidirectional frameworks which only use edge features to improve the segmentation features. Motivated by the logical interrelations between binary segmentation and edge maps, we propose a novel Stacked Cross Refinement Network (SCRN) for salient object detection in this paper. Our framework aims to simultaneously refine multi-level features of salient object detection and edge detection by stacking Cross Refinement Units (CRUs). According to the logical interrelations, the CRU designs two direction-specific integration operations and bidirectionally passes messages between the two tasks. Incorporating the refined edge-preserving features with the typical U-Net, our model detects salient objects accurately. Extensive experiments conducted on six benchmark datasets demonstrate that our method outperforms existing state-of-the-art algorithms in both accuracy and efficiency. Besides, the attribute-based performance on the SOC dataset shows that the proposed model ranks first in the majority of challenging scenes. Code can be found at https://github.com/wuzhe71/SCAN.
Tasks Edge Detection, Object Detection, Salient Object Detection
Published 2019-10-01
URL http://openaccess.thecvf.com/content_ICCV_2019/html/Wu_Stacked_Cross_Refinement_Network_for_Edge-Aware_Salient_Object_Detection_ICCV_2019_paper.html
PDF http://openaccess.thecvf.com/content_ICCV_2019/papers/Wu_Stacked_Cross_Refinement_Network_for_Edge-Aware_Salient_Object_Detection_ICCV_2019_paper.pdf
PWC https://paperswithcode.com/paper/stacked-cross-refinement-network-for-edge
Repo https://github.com/wuzhe71/SCAN
Framework pytorch
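
One plausible instantiation of a bidirectional refinement unit, hedged: the paper defines its own direction-specific integration operations, and this sketch only captures the idea of segmentation and edge features gating each other, with the unit applied repeatedly as in the "stacking" described above.

```python
import torch
import torch.nn as nn

class CrossRefinementUnit(nn.Module):
    def __init__(self, ch):
        super().__init__()
        self.to_edge = nn.Conv2d(ch, ch, 3, padding=1)
        self.to_seg = nn.Conv2d(ch, ch, 3, padding=1)

    def forward(self, f_seg, f_edge):
        # Each task's features are modulated by a gate computed from the other.
        f_seg_new = f_seg + f_seg * torch.sigmoid(self.to_seg(f_edge))
        f_edge_new = f_edge + f_edge * torch.sigmoid(self.to_edge(f_seg))
        return f_seg_new, f_edge_new

cru = CrossRefinementUnit(32)
f_s, f_e = torch.randn(2, 32, 56, 56), torch.randn(2, 32, 56, 56)
for _ in range(3):                       # stack the unit several times
    f_s, f_e = cru(f_s, f_e)
print(f_s.shape, f_e.shape)
```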

A Hierarchically-Labeled Portuguese Hate Speech Dataset

Title A Hierarchically-Labeled Portuguese Hate Speech Dataset
Authors Paula Fortuna, João Rocha da Silva, Juan Soler-Company, Leo Wanner, Sérgio Nunes
Abstract Over the past years, the amount of online offensive speech has been growing steadily. To successfully cope with it, machine learning techniques are applied. However, ML-based techniques require sufficiently large annotated datasets. In recent years, different datasets were published, mainly for English. In this paper, we present a new dataset for Portuguese, which has not been in focus so far. The dataset is composed of 5,668 tweets. For its annotation, we defined two different schemes used by annotators with different levels of expertise. Firstly, non-experts annotated the tweets with binary labels ('hate' vs. 'no-hate'). Secondly, expert annotators classified the tweets following a fine-grained hierarchical multiple-label scheme with 81 hate speech categories in total. The inter-annotator agreement varied from category to category, which reflects the insight that some types of hate speech are more subtle than others and that their detection depends on personal perception. This hierarchical annotation scheme is the main contribution of the presented work, as it facilitates the identification of different types of hate speech and their intersections. To demonstrate the usefulness of our dataset, we carried out a baseline classification experiment with pre-trained word embeddings and an LSTM on the binary-classified data, with a state-of-the-art outcome.
Tasks Word Embeddings
Published 2019-08-01
URL https://www.aclweb.org/anthology/W19-3510/
PDF https://www.aclweb.org/anthology/W19-3510
PWC https://paperswithcode.com/paper/a-hierarchically-labeled-portuguese-hate
Repo https://github.com/paulafortuna/Portuguese-Hate-Speech-Dataset
Framework none
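
A minimal sketch of the binary baseline named in the abstract (pre-trained word embeddings plus an LSTM); the vocabulary, embedding matrix, and sizes below are dummies.

```python
import torch
import torch.nn as nn

vocab_size, emb_dim, hidden = 5000, 300, 128
pretrained = torch.randn(vocab_size, emb_dim)    # stand-in for real word vectors

class HateSpeechLSTM(nn.Module):
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding.from_pretrained(pretrained, freeze=False)
        self.lstm = nn.LSTM(emb_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, token_ids):
        x = self.emb(token_ids)
        _, (h_n, _) = self.lstm(x)
        return self.head(h_n[-1]).squeeze(-1)    # one logit: hate vs. no-hate

model = HateSpeechLSTM()
tweets = torch.randint(0, vocab_size, (16, 40))  # dummy batch of token ids
labels = torch.randint(0, 2, (16,)).float()
print(float(nn.BCEWithLogitsLoss()(model(tweets), labels)))
```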

Towards a Zero-One Law for Column Subset Selection

Title Towards a Zero-One Law for Column Subset Selection
Authors Zhao Song, David Woodruff, Peilin Zhong
Abstract There are a number of approximation algorithms for NP-hard versions of low rank approximation, such as finding a rank-$k$ matrix $B$ minimizing the sum of absolute values of differences to a given $n$-by-$n$ matrix $A$, $\min_{\textrm{rank-}k\,B}\|A-B\|_1$, or more generally finding a rank-$k$ matrix $B$ which minimizes the sum of $p$-th powers of absolute values of differences, $\min_{\textrm{rank-}k\,B}\|A-B\|_p^p$. Many of these algorithms are linear-time column subset selection algorithms, returning a subset of $\mathrm{poly}(k \log n)$ columns whose cost is no more than a $\mathrm{poly}(k)$ factor larger than the cost of the best rank-$k$ matrix. The above error measures are special cases of the following general entrywise low rank approximation problem: given an arbitrary function $g:\mathbb{R} \rightarrow \mathbb{R}_{\geq 0}$, find a rank-$k$ matrix $B$ which minimizes $\|A-B\|_g = \sum_{i,j}g(A_{i,j}-B_{i,j})$. A natural question is which functions $g$ admit efficient approximation algorithms? Indeed, this is a central question of recent work studying generalized low rank models. In this work we give approximation algorithms for every function $g$ which is approximately monotone and satisfies an approximate triangle inequality, and we show both of these conditions are necessary. Further, our algorithm is efficient if the function $g$ admits an efficient approximate regression algorithm. Our approximation algorithms handle functions which are not even scale-invariant, such as the Huber loss function, which we show have very different structural properties than $\ell_p$-norms; e.g., one can show that the lack of scale-invariance causes any column subset selection algorithm to provably require a factor $\sqrt{\log n}$ more columns than for $\ell_p$-norms. Nevertheless, we design the first efficient column subset selection algorithms for such error measures.
Tasks
Published 2019-12-01
URL http://papers.nips.cc/paper/8844-towards-a-zero-one-law-for-column-subset-selection
PDF http://papers.nips.cc/paper/8844-towards-a-zero-one-law-for-column-subset-selection.pdf
PWC https://paperswithcode.com/paper/towards-a-zero-one-law-for-column-subset
Repo https://github.com/zpl7840/general_loss_column_subset_selection
Framework none
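
As a small illustration of the entrywise objective, the sketch below evaluates $\|A-B\|_g$ for $g$ = Huber when $B$ is obtained from a fixed column subset. The least-squares fit is a stand-in for the approximate $g$-regression the paper requires, and the column choice itself (the paper's actual contribution) is arbitrary here.

```python
import numpy as np

def huber(x, delta=1.0):
    a = np.abs(x)
    return np.where(a <= delta, 0.5 * a ** 2, delta * (a - 0.5 * delta))

rng = np.random.default_rng(0)
# Noise plus a rank-1 signal, so a good column subset can explain a lot of A.
A = rng.normal(size=(50, 50)) + np.outer(rng.normal(size=50), rng.normal(size=50))

def subset_cost(A, cols):
    C = A[:, cols]                               # chosen columns
    X, *_ = np.linalg.lstsq(C, A, rcond=None)    # l2 fit as a stand-in for
    return huber(A - C @ X).sum()                # per-column g-regression

print(subset_cost(A, cols=[0, 1, 2]))
```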

L_DMI: A Novel Information-theoretic Loss Function for Training Deep Nets Robust to Label Noise

Title L_DMI: A Novel Information-theoretic Loss Function for Training Deep Nets Robust to Label Noise
Authors Yilun Xu, Peng Cao, Yuqing Kong, Yizhou Wang
Abstract Accurately annotating large-scale datasets is notoriously expensive in both time and money. Although acquiring lower-quality annotations can be much cheaper, using such data without special treatment often badly damages the performance of trained models. Various methods have been proposed for learning with noisy labels. However, most methods only handle limited kinds of noise patterns, require auxiliary information or steps (e.g., knowing or estimating the noise transition matrix), or lack theoretical justification. In this paper, we propose a novel information-theoretic loss function, L_DMI, for training deep neural networks robust to label noise. The core of L_DMI is a generalized version of mutual information, termed Determinant based Mutual Information (DMI), which is not only information-monotone but also relatively invariant. To the best of our knowledge, L_DMI is the first loss function that is provably robust to instance-independent label noise, regardless of noise pattern, and it can be applied to any existing classification neural network straightforwardly without any auxiliary information. In addition to the theoretical justification, we also empirically show that using L_DMI outperforms all other counterparts in classification tasks on both image and natural language datasets, including Fashion-MNIST, CIFAR-10, Dogs vs. Cats, and MR, with a variety of synthesized noise patterns and noise amounts, as well as on the real-world dataset Clothing1M.
Tasks Image Classification
Published 2019-12-01
URL http://papers.nips.cc/paper/8853-l_dmi-a-novel-information-theoretic-loss-function-for-training-deep-nets-robust-to-label-noise
PDF http://papers.nips.cc/paper/8853-l_dmi-a-novel-information-theoretic-loss-function-for-training-deep-nets-robust-to-label-noise.pdf
PWC https://paperswithcode.com/paper/l_dmi-a-novel-information-theoretic-loss
Repo https://github.com/Newbeeer/L_DMI
Framework pytorch
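
A sketch of a determinant-based mutual information loss written from the abstract's description: build the empirical C-by-C joint matrix between predicted class probabilities and one-hot labels, then penalize the negative log of the absolute determinant. The authors' exact formulation may differ in details (e.g., the epsilon); see their repository.

```python
import torch
import torch.nn.functional as F

def dmi_loss(probs, labels, n_classes, eps=1e-6):
    """probs: (N, C) softmax outputs; labels: (N,) integer class labels."""
    onehot = F.one_hot(labels, n_classes).float()
    joint = onehot.t() @ probs / probs.shape[0]   # empirical C x C joint matrix
    return -torch.log(torch.abs(torch.det(joint)) + eps)

probs = torch.softmax(torch.randn(32, 10), dim=1)
labels = torch.randint(0, 10, (32,))
print(float(dmi_loss(probs, labels, n_classes=10)))
```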

Cross-modal Learning by Hallucinating Missing Modalities in RGB-D Vision

Title Cross-modal Learning by Hallucinating Missing Modalities in RGB-D Vision
Authors Nuno C. Garcia, Pietro Morerio, Vittorio Murino
Abstract Diverse input data modalities can provide complementary cues for several tasks, usually leading to more robust algorithms and better performance. However, while a (training) dataset could be accurately designed to include a variety of sensory inputs, it is often the case that not all modalities are available in real life (testing) scenarios, when the model is to be deployed. This raises the challenge of how to learn robust representations leveraging multimodal data in the training stage, while considering limitations at test time, such as noisy or missing modalities. This chapter presents a new approach for multimodal video action recognition, developed within the unified frameworks of distillation and privileged information, named generalized distillation. We consider the particular case of learning representations from depth and RGB videos, while relying on RGB data only at test time. Our approach consists of training a hallucination network that learns to distill depth features through multiplicative connections of spatiotemporal representations, leveraging soft labels and hard labels, and the Euclidean distance between feature maps. We report state-of-the-art or comparable results on video action recognition on the largest multimodal dataset available for this task, the NTU RGB+D, as well as on the UWA3DII and Northwestern-UCLA.
Tasks Action Recognition In Videos, Multimodal Activity Recognition, Skeleton Based Action Recognition, Temporal Action Localization
Published 2019-01-01
URL https://doi.org/10.1016/B978-0-12-817358-9.00018-4
PDF https://www.researchgate.net/publication/334901581_Cross-modal_Learning_by_Hallucinating_Missing_Modalities_in_RGB-D_Vision
PWC https://paperswithcode.com/paper/cross-modal-learning-by-hallucinating-missing
Repo https://github.com/ncgarcia/modality-distillation
Framework tf
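
The three training signals the abstract names can be combined as below (a hedged sketch: the temperature, loss weights, and plain MSE between feature maps are our placeholders, not the paper's exact objective).

```python
import torch
import torch.nn.functional as F

def hallucination_loss(h_logits, t_logits, h_feat, t_feat, target,
                       T=2.0, w_soft=1.0, w_hard=1.0, w_feat=0.1):
    # Soft labels from the depth (teacher) stream, temperature-scaled.
    soft = F.kl_div(F.log_softmax(h_logits / T, dim=1),
                    F.softmax(t_logits / T, dim=1),
                    reduction="batchmean") * T * T
    hard = F.cross_entropy(h_logits, target)   # hard ground-truth labels
    feat = F.mse_loss(h_feat, t_feat)          # Euclidean distance of maps
    return w_soft * soft + w_hard * hard + w_feat * feat

h_logits, t_logits = torch.randn(8, 60), torch.randn(8, 60)
h_feat, t_feat = torch.randn(8, 256, 7, 7), torch.randn(8, 256, 7, 7)
print(float(hallucination_loss(h_logits, t_logits, h_feat, t_feat,
                               torch.randint(0, 60, (8,)))))
```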

Content Differences in Syntactic and Semantic Representation

Title Content Differences in Syntactic and Semantic Representation
Authors Daniel Hershcovich, Omri Abend, Ari Rappoport
Abstract Syntactic analysis plays an important role in semantic parsing, but the nature of this role remains a topic of ongoing debate. The debate has been constrained by the scarcity of empirical comparative studies between syntactic and semantic schemes, which hinders the development of parsing methods informed by the details of target schemes and constructions. We target this gap, and take Universal Dependencies (UD) and UCCA as a test case. After abstracting away from differences of convention or formalism, we find that most content divergences can be ascribed to: (1) UCCA's distinction between a Scene and a non-Scene; (2) UCCA's distinction between primary relations, secondary ones and participants; (3) different treatment of multi-word expressions; and (4) different treatment of inter-clause linkage. We further discuss the long tail of cases where the two schemes take markedly different approaches. Finally, we show that the proposed comparison methodology can be used for fine-grained evaluation of UCCA parsing, highlighting both challenges and potential sources for improvement. The substantial differences between the schemes suggest that semantic parsers are likely to benefit downstream text understanding applications beyond their syntactic counterparts.
Tasks Semantic Parsing
Published 2019-06-01
URL https://www.aclweb.org/anthology/N19-1047/
PDF https://www.aclweb.org/anthology/N19-1047
PWC https://paperswithcode.com/paper/content-differences-in-syntactic-and-semantic-1
Repo https://github.com/danielhers/synsem
Framework none

QBSO-FS: A Reinforcement Learning Based Bee Swarm Optimization Metaheuristic for Feature Selection

Title QBSO-FS: A Reinforcement Learning Based Bee Swarm Optimization Metaheuristic for Feature Selection
Authors Souhila Sadeg, Leila Hamdad, Amine Riad Remache, Mehdi Nedjmeddine Karech, Karima Benatchba, Zineb Habbas
Abstract Feature selection is often used before a data mining or a machine learning task in order to build more accurate models. It is considered a hard optimization problem, and metaheuristics give very satisfactory results for such problems. In this work, we propose a hybrid metaheuristic that integrates a reinforcement learning algorithm with the Bee Swarm Optimization metaheuristic (BSO) for solving the feature selection problem. QBSO-FS follows the wrapper approach. It uses a hybrid version of BSO with Q-learning for generating feature subsets and a classifier to evaluate them. The goal of using Q-learning is to benefit from the advantage of reinforcement learning to make the search process more adaptive and more efficient. The performance of QBSO-FS is evaluated on 20 well-known datasets and the results are compared with those of the original BSO and other recently published methods. The results show that QBSO-FS outperforms BSO-FS for large instances and gives very satisfactory results compared to recently published algorithms.
Tasks Feature Selection, Multi-agent Reinforcement Learning, Q-Learning
Published 2019-05-16
URL https://link.springer.com/chapter/10.1007%2F978-3-030-20518-8_65
PDF https://sci-hub.tw/https://doi.org/10.1007/978-3-030-20518-8_65
PWC https://paperswithcode.com/paper/qbso-fs-a-reinforcement-learning-based-bee
Repo https://github.com/amineremache/qbso-fs
Framework none
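
A hedged sketch of the hybrid idea: tabular Q-learning steering a local search over feature-subset bitmasks. The state, action, and reward definitions and the `evaluate` function are stand-ins; QBSO-FS defines these within the BSO metaheuristic and scores subsets with a real wrapper classifier.

```python
import random

n_features, alpha, gamma, eps = 8, 0.1, 0.9, 0.2
Q = {}  # Q[(state, action)] -> value; a state is a feature bitmask

def evaluate(mask):
    """Stand-in for the wrapper classifier's accuracy on a feature subset."""
    return sum(mask) / n_features - 0.15 * abs(sum(mask) - 4)

state = tuple(random.getrandbits(1) for _ in range(n_features))
for step in range(500):
    # Epsilon-greedy choice of which feature bit to flip.
    if random.random() < eps:
        action = random.randrange(n_features)
    else:
        action = max(range(n_features), key=lambda a: Q.get((state, a), 0.0))
    nxt = list(state)
    nxt[action] ^= 1
    nxt = tuple(nxt)
    reward = evaluate(nxt) - evaluate(state)      # improvement as reward
    best_next = max(Q.get((nxt, a), 0.0) for a in range(n_features))
    Q[(state, action)] = Q.get((state, action), 0.0) + alpha * (
        reward + gamma * best_next - Q.get((state, action), 0.0))
    state = nxt
print("final subset:", state, "score:", round(evaluate(state), 3))
```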

Hubless Nearest Neighbor Search for Bilingual Lexicon Induction

Title Hubless Nearest Neighbor Search for Bilingual Lexicon Induction
Authors Jiaji Huang, Qiang Qiu, Kenneth Church
Abstract Bilingual Lexicon Induction (BLI) is the task of translating words from corpora in two languages. Recent advances in BLI work by aligning the two word embedding spaces. Following that, a key step is to retrieve the nearest neighbor (NN) in the target space given the source word. However, a phenomenon called hubness often degrades the accuracy of NN. Hubness appears as some data points, called hubs, being extraordinarily close to many of the other data points. Reducing hubness is necessary for retrieval tasks. One successful example is Inverted SoFtmax (ISF), recently proposed to improve NN. This work proposes a new method, Hubless Nearest Neighbor (HNN), to mitigate hubness. HNN differs from NN by imposing an additional equal preference assumption. Moreover, the HNN formulation explains why ISF works as well as it does. Empirical results demonstrate that HNN outperforms NN, ISF and other state-of-the-art methods. For reproducibility and follow-ups, we have published all code.
Tasks
Published 2019-07-01
URL https://www.aclweb.org/anthology/P19-1399/
PDF https://www.aclweb.org/anthology/P19-1399
PWC https://paperswithcode.com/paper/hubless-nearest-neighbor-search-for-bilingual
Repo https://github.com/baidu-research/HNN
Framework none
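
The abstract cites Inverted SoFtmax (ISF) as a prior hubness fix; the sketch below implements that baseline on random embeddings (HNN's equal-preference formulation is not implemented here). Normalizing over the source axis deflates targets that are close to many sources, i.e., hubs.

```python
import numpy as np

rng = np.random.default_rng(0)
src = rng.normal(size=(5, 32))
tgt = rng.normal(size=(100, 32))
src /= np.linalg.norm(src, axis=1, keepdims=True)
tgt /= np.linalg.norm(tgt, axis=1, keepdims=True)

sims = src @ tgt.T            # cosine similarities, shape (5, 100)
beta = 10.0
# Inverted softmax: normalize over *source* words, so a target close to many
# sources (a hub) gets its score divided down.
inv = np.exp(beta * sims)
inv /= inv.sum(axis=0, keepdims=True)

nn_plain = sims.argmax(axis=1)   # plain nearest neighbor
nn_isf = inv.argmax(axis=1)      # hubness-corrected retrieval
print(nn_plain, nn_isf)
```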

On Making Stochastic Classifiers Deterministic

Title On Making Stochastic Classifiers Deterministic
Authors Andrew Cotter, Maya Gupta, Harikrishna Narasimhan
Abstract Stochastic classifiers arise in a number of machine learning problems, and have become especially prominent of late, as they often result from constrained optimization problems, e.g. for fairness, churn, or custom losses. Despite their utility, the inherent randomness of stochastic classifiers may make them problematic to use in practice for a variety of reasons. In this paper, we attempt to answer the theoretical question of how well a stochastic classifier can be approximated by a deterministic one, and compare several different approaches, proving lower and upper bounds. We also experimentally investigate the pros and cons of these methods, not only in regard to how successfully each deterministic classifier approximates the original stochastic classifier, but also in terms of how well each addresses the other issues that can make stochastic classifiers undesirable.
Tasks
Published 2019-12-01
URL http://papers.nips.cc/paper/9273-on-making-stochastic-classifiers-deterministic
PDF http://papers.nips.cc/paper/9273-on-making-stochastic-classifiers-deterministic.pdf
PWC https://paperswithcode.com/paper/on-making-stochastic-classifiers
Repo https://github.com/google-research/google-research
Framework tf
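
Two obvious derandomizations frame the paper's question; the sketch below shows them on a toy mixture of three linear classifiers (a hedged illustration, not the paper's methods): (a) threshold the mixture's expected prediction, and (b) fix the random component choice per example via a hash, so the rule is deterministic but matches the stochastic classifier's distribution over inputs.

```python
import numpy as np

rng = np.random.default_rng(0)
K, n = 3, 1000
X = rng.normal(size=(n, 2))
# A stochastic classifier: sample one of K linear rules with weights `mix`.
classifiers = [lambda X, w=w: (X @ w > 0).astype(int)
               for w in rng.normal(size=(K, 2))]
mix = np.array([0.5, 0.3, 0.2])

# (a) Thresholding: predict 1 when the expected prediction exceeds 1/2.
expected = sum(p * c(X) for p, c in zip(mix, classifiers))
det_threshold = (expected > 0.5).astype(int)

# (b) Per-example fixed seed: pick a component by hashing the input, so each
# input always gets the same component, in the mixture's proportions.
idx = np.array([hash(tuple(row)) % 100 for row in X])
cut = (np.cumsum(mix) * 100).astype(int)
choice = np.searchsorted(cut, idx, side="right")
det_hash = np.array([classifiers[c](x[None])[0] for c, x in zip(choice, X)])
print(det_threshold[:10], det_hash[:10])
```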