January 25, 2020

3129 words 15 mins read

Paper Group ANR 1688


PDC – a probabilistic distributional clustering algorithm: a case study on suicide articles in PubMed

Title PDC – a probabilistic distributional clustering algorithm: a case study on suicide articles in PubMed
Authors Rezarta Islamaj, Lana Yeganova, Won Kim, Natalie Xie, W. John Wilbur
Abstract The need to organize a large collection in a manner that facilitates human comprehension is crucial given the ever-increasing volumes of information. In this work, we present PDC (probabilistic distributional clustering), a novel algorithm that, given a document collection, computes disjoint term sets representing topics in the collection. The algorithm relies on probabilities of word co-occurrences to partition the set of terms appearing in the collection of documents into disjoint groups of related terms. In this work, we also present an environment to visualize the computed topics in the term space and retrieve the most related PubMed articles for each group of terms. We illustrate the algorithm by applying it to PubMed documents on the topic of suicide. Suicide is a major public health problem identified as the tenth leading cause of death in the US. In this application, our goal is to provide a global view of the mental health literature pertaining to the subject of suicide, and through this, to help create a rich environment of multifaceted data to guide health care researchers in their endeavor to better understand the breadth, depth and scope of the problem. We demonstrate the usefulness of the proposed algorithm by providing a web portal that allows mental health researchers to peruse the suicide-related literature in PubMed.
Tasks
Published 2019-12-04
URL https://arxiv.org/abs/1912.02077v1
PDF https://arxiv.org/pdf/1912.02077v1.pdf
PWC https://paperswithcode.com/paper/pdc-a-probabilistic-distributional-clustering
Repo
Framework
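
To make the core mechanism concrete, here is a minimal sketch of co-occurrence-based term grouping. The greedy strategy and the threshold value are illustrative assumptions, not the authors' actual procedure, which derives the partition probabilistically:

```python
# Minimal sketch (not the authors' code) of partitioning terms into disjoint
# groups by estimated document co-occurrence probability, the idea behind PDC.
from collections import defaultdict
from itertools import combinations

docs = [
    {"suicide", "risk", "ideation"},
    {"suicide", "prevention", "hotline"},
    {"risk", "ideation", "assessment"},
]

count = defaultdict(int)  # term -> number of documents containing it
co = defaultdict(int)     # sorted (a, b) pair -> documents containing both
for terms in docs:
    for t in terms:
        count[t] += 1
    for a, b in combinations(sorted(terms), 2):
        co[(a, b)] += 1

def p_given(b, a):
    """Estimate P(b in doc | a in doc) from document counts."""
    return co[tuple(sorted((a, b)))] / count[a]

THRESHOLD = 0.5  # assumed cutoff; the paper's criterion is more principled
groups = []      # disjoint term sets, most frequent terms placed first
for t in sorted(count, key=count.get, reverse=True):
    best, best_p = None, THRESHOLD
    for g in groups:
        p = max(p_given(t, a) for a in g)
        if p > best_p:
            best, best_p = g, p
    if best is None:
        best = set()
        groups.append(best)
    best.add(t)
```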

Context-Aware Learning for Neural Machine Translation

Title Context-Aware Learning for Neural Machine Translation
Authors Sébastien Jean, Kyunghyun Cho
Abstract Interest in larger-context neural machine translation, including document-level and multi-modal translation, has been growing. Multiple works have proposed new network architectures or evaluation schemes, but potentially helpful context is still sometimes ignored by larger-context translation models. In this paper, we propose a novel learning algorithm that explicitly encourages a neural translation model to take into account additional context using a multilevel pair-wise ranking loss. We evaluate the proposed learning algorithm with a transformer-based larger-context translation system on document-level translation. By comparing performance using actual and random contexts, we show that a model trained with the proposed algorithm is more sensitive to the additional context.
Tasks Machine Translation
Published 2019-03-12
URL http://arxiv.org/abs/1903.04715v1
PDF http://arxiv.org/pdf/1903.04715v1.pdf
PWC https://paperswithcode.com/paper/context-aware-learning-for-neural-machine
Repo
Framework
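
The training signal can be sketched with a margin ranking loss: the model's log-likelihood of the reference translation under the true context should exceed its log-likelihood under a random context. A minimal PyTorch sketch, assuming per-sentence log-likelihoods are already computed (the paper's loss is multi-level; this shows a single level):

```python
import torch
import torch.nn.functional as F

def context_ranking_loss(logp_true_ctx, logp_rand_ctx, margin=1.0):
    """Encourage higher likelihood under the true context than a random one.

    logp_true_ctx, logp_rand_ctx: shape (batch,) sentence log-likelihoods.
    """
    target = torch.ones_like(logp_true_ctx)  # first argument should rank higher
    return F.margin_ranking_loss(logp_true_ctx, logp_rand_ctx, target, margin=margin)
```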

From Brain Imaging to Graph Analysis: a study on ADNI’s patient cohort

Title From Brain Imaging to Graph Analysis: a study on ADNI’s patient cohort
Authors Rui Zhang, Luca Giancardo, Danilo A. Pena, Yejin Kim, Hanghang Tong, Xiaoqian Jiang
Abstract In this paper, we studied the association between changes in structural brain volumes and the potential development of Alzheimer’s disease (AD). Using a simple abstraction technique, we converted regional cortical and subcortical volume differences over two time points for each study subject into a graph. We then obtained substructures of interest using a graph decomposition algorithm in order to extract pivotal nodes via multi-view feature selection. Intensive experiments using robust classification frameworks were conducted to evaluate the performance of using the brain substructures obtained under different thresholds. The results indicated that compact substructures acquired by examining the differences between patient groups were sufficient to discriminate between AD and healthy controls with an area under the receiver operating characteristic curve of 0.72.
Tasks Feature Selection
Published 2019-05-14
URL https://arxiv.org/abs/1905.05861v1
PDF https://arxiv.org/pdf/1905.05861v1.pdf
PWC https://paperswithcode.com/paper/from-brain-imaging-to-graph-analysis-a-study
Repo
Framework
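
As a rough illustration of the abstraction step (an assumption about the construction, which the paper specifies in more detail), the regional volume changes between the two time points can be turned into a per-subject graph whose nodes are regions and whose edges join regions that atrophied together:

```python
import networkx as nx
import numpy as np

regions = ["hippocampus", "entorhinal", "amygdala", "precuneus"]
vol_t0 = np.array([4.10, 2.00, 1.60, 9.80])  # baseline volumes (illustrative)
vol_t1 = np.array([3.80, 1.80, 1.55, 9.70])  # follow-up volumes

change = (vol_t1 - vol_t0) / vol_t0          # relative change per region
ATROPHY = -0.03                              # assumed threshold on relative loss

G = nx.Graph()
G.add_nodes_from(regions)
for i in range(len(regions)):
    for j in range(i + 1, len(regions)):
        if change[i] < ATROPHY and change[j] < ATROPHY:
            G.add_edge(regions[i], regions[j], weight=float(change[i] + change[j]))
```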

Adaptive Collaborative Similarity Learning for Unsupervised Multi-view Feature Selection

Title Adaptive Collaborative Similarity Learning for Unsupervised Multi-view Feature Selection
Authors Xiao Dong, Lei Zhu, Xuemeng Song, Jingjing Li, Zhiyong Cheng
Abstract In this paper, we investigate the research problem of unsupervised multi-view feature selection. Conventional solutions first simply combine multiple pre-constructed view-specific similarity structures into a collaborative similarity structure, and then perform the subsequent feature selection. These two processes are separate and independent. The collaborative similarity structure remains fixed during feature selection. Further, the simple undirected view combination may adversely reduce the reliability of the ultimate similarity structure for feature selection, as the view-specific similarity structures generally involve noise and outlying entries. To alleviate these problems, we propose an adaptive collaborative similarity learning (ACSL) method for multi-view feature selection. We propose to dynamically learn the collaborative similarity structure, and further integrate it with the ultimate feature selection into a unified framework. Moreover, a reasonable rank constraint is devised to adaptively learn an ideal collaborative similarity structure with proper similarity combination weights and desirable neighbor assignment, both of which positively facilitate the feature selection. An effective iterative solution with proven convergence is derived to solve the formulated optimization problem. Experiments demonstrate the superiority of the proposed approach.
Tasks Feature Selection
Published 2019-04-25
URL http://arxiv.org/abs/1904.11228v1
PDF http://arxiv.org/pdf/1904.11228v1.pdf
PWC https://paperswithcode.com/paper/adaptive-collaborative-similarity-learning
Repo
Framework
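
A generic objective consistent with this description — a hedged reconstruction, not necessarily the paper's exact formulation — couples the learned collaborative similarity $S$ with the view-specific structures $S^{(v)}$, feature selection via an $\ell_{2,1}$-regularized projection $W$, and a rank constraint on the Laplacian $L_S$ that forces $S$ to have exactly $c$ connected components:

```latex
\min_{S,\,\alpha,\,W}\;
  \sum_{v=1}^{V} \alpha_v \bigl\lVert S - S^{(v)} \bigr\rVert_F^2
  + \lambda\,\mathrm{tr}\!\bigl(W^{\top} X L_S X^{\top} W\bigr)
  + \gamma \lVert W \rVert_{2,1}
\quad\text{s.t.}\quad
  \operatorname{rank}(L_S) = n - c,\;\;
  \alpha \ge 0,\;\; \textstyle\sum_{v} \alpha_v = 1
```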

Is Deeper Better only when Shallow is Good?

Title Is Deeper Better only when Shallow is Good?
Authors Eran Malach, Shai Shalev-Shwartz
Abstract Understanding the power of depth in feed-forward neural networks is an ongoing challenge in the field of deep learning theory. While current works account for the importance of depth for the expressive power of neural networks, it remains an open question whether these benefits are exploited during a gradient-based optimization process. In this work, we explore the relation between expressivity properties of deep networks and the ability to train them efficiently using gradient-based algorithms. We give a depth separation argument for distributions with fractal structure, showing that they can be expressed efficiently by deep networks, but not by shallow ones. These distributions have a natural coarse-to-fine structure, and we show that the balance between the coarse and fine details has a crucial effect on whether the optimization process is likely to succeed. We prove that when the distribution is concentrated on the fine details, gradient-based algorithms are likely to fail. Using this result we prove that, at least in some distributions, the success of learning deep networks depends on whether the distribution can be well approximated by shallower networks, and we conjecture that this property holds in general.
Tasks
Published 2019-03-08
URL http://arxiv.org/abs/1903.03488v1
PDF http://arxiv.org/pdf/1903.03488v1.pdf
PWC https://paperswithcode.com/paper/is-deeper-better-only-when-shallow-is-good
Repo
Framework

A Compare-Aggregate Model with Latent Clustering for Answer Selection

Title A Compare-Aggregate Model with Latent Clustering for Answer Selection
Authors Seunghyun Yoon, Franck Dernoncourt, Doo Soon Kim, Trung Bui, Kyomin Jung
Abstract In this paper, we propose a novel method for a sentence-level answer-selection task that is a fundamental problem in natural language processing. First, we explore the effect of additional information by adopting a pretrained language model to compute the vector representation of the input text and by applying transfer learning from a large-scale corpus. Second, we enhance the compare-aggregate model by proposing a novel latent clustering method to compute additional information within the target corpus and by changing the objective function from listwise to pointwise. To evaluate the performance of the proposed approaches, experiments are performed with the WikiQA and TREC-QA datasets. The empirical results demonstrate the superiority of our proposed approach, which achieves state-of-the-art performance on both datasets.
Tasks Answer Selection, Language Modelling, Question Answering, Transfer Learning
Published 2019-05-30
URL https://arxiv.org/abs/1905.12897v2
PDF https://arxiv.org/pdf/1905.12897v2.pdf
PWC https://paperswithcode.com/paper/a-compare-aggregate-model-with-latent
Repo
Framework
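
One way to picture the latent clustering component — a sketch under assumed dimensions and pooling, not the authors' implementation — is a small set of learned cluster memories; the sentence vector's soft assignment over those memories is appended as side information:

```python
import torch
import torch.nn as nn

class LatentCluster(nn.Module):
    def __init__(self, dim=300, n_clusters=8):
        super().__init__()
        self.memory = nn.Parameter(torch.randn(n_clusters, dim))

    def forward(self, sent_vec):                 # sent_vec: (batch, dim)
        scores = sent_vec @ self.memory.t()      # (batch, n_clusters)
        assign = scores.softmax(dim=-1)          # soft cluster assignment
        cluster_info = assign @ self.memory      # convex mix of cluster memories
        return torch.cat([sent_vec, cluster_info], dim=-1)
```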

Analysis of Regression Tree Fitting Algorithms in Learning to Rank

Title Analysis of Regression Tree Fitting Algorithms in Learning to Rank
Authors Tian Xia, Shaodan Zhai, Shaojun Wang
Abstract In the learning-to-rank area, industry-level applications have been dominated by the gradient boosting framework, which fits a tree using the least square error principle. In the classification area, another tree fitting principle, weighted least square error, has been widely used, for example in LogitBoost and its variants. However, there is a lack of analysis of the relationship between the two principles in the scenario of learning to rank. We propose a new principle, named least objective loss based error, that enables us to analyze this issue as well as several important learning-to-rank models. We also implement two typical and strong systems and conduct our experiments on two real-world datasets. Experimental results show that our proposed method brings moderate improvements over the least square error principle.
Tasks Learning-To-Rank
Published 2019-09-12
URL https://arxiv.org/abs/1909.05965v1
PDF https://arxiv.org/pdf/1909.05965v1.pdf
PWC https://paperswithcode.com/paper/analysis-of-regression-tree-fitting
Repo
Framework
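
The two fitting principles contrasted above differ in how a leaf value is computed from the loss derivatives. A small worked example with illustrative numbers (not from the paper):

```python
import numpy as np

g = np.array([0.4, -0.2, 0.1])    # first derivatives of the loss in one leaf
h = np.array([0.25, 0.20, 0.25])  # second derivatives (curvatures) in the leaf

# Least-square-error principle (standard gradient boosting):
# fit the negative gradients directly, so the leaf takes their mean.
leaf_ls = np.mean(-g)

# Weighted-least-square-error principle (LogitBoost-style Newton step):
# weight residuals by curvature, i.e. -sum(g) / sum(h).
leaf_wls = -g.sum() / h.sum()
```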

Uncertain Natural Language Inference

Title Uncertain Natural Language Inference
Authors Tongfei Chen, Zhengping Jiang, Keisuke Sakaguchi, Benjamin Van Durme
Abstract We propose a refinement of Natural Language Inference (NLI), called Uncertain Natural Language Inference (UNLI), that shifts away from categorical labels, targeting instead the direct prediction of subjective probability assessments. Chiefly, we demonstrate the feasibility of collecting annotations for UNLI by relabeling a portion of the SNLI dataset under a psychologically motivated probabilistic scale, where even items with the same categorical label, e.g., “contradictions”, differ in how likely people judge them to be strictly impossible given a premise. We describe two modeling approaches, direct scalar regression and learning-to-rank, finding that existing categorically labeled NLI data can be used in pre-training. Our best models correlate well with humans, demonstrating that models are capable of more subtle inferences than the ternary bin assignment employed in current NLI tasks.
Tasks Learning-To-Rank, Natural Language Inference
Published 2019-09-06
URL https://arxiv.org/abs/1909.03042v1
PDF https://arxiv.org/pdf/1909.03042v1.pdf
PWC https://paperswithcode.com/paper/uncertain-natural-language-inference
Repo
Framework
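
The direct scalar-regression variant can be sketched as a sigmoid head on any sentence-pair encoder, trained against human probability judgments. The encoder and hidden size here are placeholders, not the paper's exact model:

```python
import torch
import torch.nn as nn

class UnliRegressor(nn.Module):
    def __init__(self, encoder, hidden=768):
        super().__init__()
        self.encoder = encoder            # placeholder: maps a pair to (batch, hidden)
        self.head = nn.Linear(hidden, 1)

    def forward(self, premise, hypothesis):
        rep = self.encoder(premise, hypothesis)
        return torch.sigmoid(self.head(rep)).squeeze(-1)  # predicted probability

# Training target: human subjective probability in [0, 1], e.g.
# loss = nn.functional.mse_loss(model(premise, hypothesis), human_prob)
```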

Sitatapatra: Blocking the Transfer of Adversarial Samples

Title Sitatapatra: Blocking the Transfer of Adversarial Samples
Authors Ilia Shumailov, Xitong Gao, Yiren Zhao, Robert Mullins, Ross Anderson, Cheng-Zhong Xu
Abstract Convolutional Neural Networks (CNNs) are widely used to solve classification tasks in computer vision. However, they can be tricked into misclassifying specially crafted ‘adversarial’ samples – and samples built to trick one model often work alarmingly well against other models trained on the same task. In this paper we introduce Sitatapatra, a system designed to block the transfer of adversarial samples. It diversifies neural networks using a key, as in cryptography, and provides a mechanism for detecting attacks. What’s more, when adversarial samples are detected, they can typically be traced back to the individual device that was used to develop them. The run-time overheads are minimal, permitting the use of Sitatapatra on constrained systems.
Tasks
Published 2019-01-23
URL https://arxiv.org/abs/1901.08121v2
PDF https://arxiv.org/pdf/1901.08121v2.pdf
PWC https://paperswithcode.com/paper/sitatapatra-blocking-the-transfer-of
Repo
Framework
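
To illustrate the keyed-diversification concept only — this is not Sitatapatra's actual construction — one can derive deterministic per-channel parameters from a secret key, so that each device instantiates a slightly different network and adversarial samples crafted against one instance transfer poorly to another:

```python
import hashlib
import numpy as np

def keyed_channel_scales(key: bytes, n_channels: int) -> np.ndarray:
    """Derive deterministic per-channel scales in [0.9, 1.1] from a key."""
    digest = hashlib.sha256(key).digest()
    rng = np.random.default_rng(int.from_bytes(digest[:8], "big"))
    return 0.9 + 0.2 * rng.random(n_channels)

scales = keyed_channel_scales(b"device-42", n_channels=64)
# Applied inside the network, e.g. after a convolution:
# conv_out = conv_out * scales[None, :, None, None]
```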

An empirical study of the relation between network architecture and complexity

Title An empirical study of the relation between network architecture and complexity
Authors Emir Konuk, Kevin Smith
Abstract In this preregistration submission, we propose an empirical study of how networks handle changes in complexity of the data. We investigate the effect of network capacity on generalization performance in the face of increasing data complexity. For this, we measure the generalization error for an image classification task where the number of classes steadily increases. We compare a number of modern architectures at different scales in this setting. The methodology, setup, and hypotheses described in this proposal were evaluated by peer review before experiments were conducted.
Tasks Image Classification
Published 2019-11-11
URL https://arxiv.org/abs/1911.04120v1
PDF https://arxiv.org/pdf/1911.04120v1.pdf
PWC https://paperswithcode.com/paper/an-empirical-study-of-the-relation-between
Repo
Framework

DARTS+: Improved Differentiable Architecture Search with Early Stopping

Title DARTS+: Improved Differentiable Architecture Search with Early Stopping
Authors Hanwen Liang, Shifeng Zhang, Jiacheng Sun, Xingqiu He, Weiran Huang, Kechen Zhuang, Zhenguo Li
Abstract Recently, there has been a growing interest in automating the process of neural architecture design, and the Differentiable Architecture Search (DARTS) method makes the process available within a few GPU days. In particular, a hyper-network called a one-shot model is introduced, over which the architecture can be searched continuously with gradient descent. However, the performance of DARTS is often observed to collapse when the number of search epochs becomes large. Meanwhile, lots of “skip-connects” are found in the selected architectures. In this paper, we claim that the cause of the collapse is that there exist cooperation and competition in the bi-level optimization in DARTS, where the architecture parameters and model weights are updated alternately. Therefore, we propose a simple and effective algorithm, named “DARTS+”, to avoid the collapse and improve the original DARTS, by “early stopping” the search procedure when a certain criterion is met. We demonstrate that the proposed early stopping criterion is effective in avoiding the collapse issue. We also conduct experiments on benchmark datasets and show the effectiveness of our DARTS+ algorithm, where DARTS+ achieves 2.32% test error on CIFAR10, 14.87% on CIFAR100, and 23.7% on ImageNet. We further remark that the idea of “early stopping” is implicitly included in some existing DARTS variants by manually setting a small number of search epochs, while we give an explicit criterion for “early stopping”.
Tasks
Published 2019-09-13
URL https://arxiv.org/abs/1909.06035v1
PDF https://arxiv.org/pdf/1909.06035v1.pdf
PWC https://paperswithcode.com/paper/darts-improved-differentiable-architecture
Repo
Framework
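
In spirit, the early-stopping check watches the architecture being derived during search and halts once degenerate patterns appear. A hedged sketch — the threshold and the paper's exact conditions, which also involve the stability of the architecture-parameter ranking, are simplified here:

```python
def should_stop(selected_ops, max_skips=2):
    """Stop the search once the derived normal cell has too many skip-connects.

    selected_ops: operation names currently chosen for the normal cell.
    """
    return sum(op == "skip_connect" for op in selected_ops) >= max_skips

# Inside the search loop (derive_normal_cell is a hypothetical helper):
# if should_stop(derive_normal_cell(alphas)):
#     break
```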

Robust computation with rhythmic spike patterns

Title Robust computation with rhythmic spike patterns
Authors E. Paxon Frady, Friedrich T. Sommer
Abstract Information coding by precise timing of spikes can be faster and more energy-efficient than traditional rate coding. However, spike-timing codes are often brittle, which has limited their use in theoretical neuroscience and computing applications. Here, we propose a novel type of attractor neural network in complex state space, and show how it can be leveraged to construct spiking neural networks with robust computational properties through a phase-to-timing mapping. Building on Hebbian neural associative memories, like Hopfield networks, we first propose threshold phasor associative memory (TPAM) networks. Complex phasor patterns whose components can assume continuous-valued phase angles and binary magnitudes can be stored and retrieved as stable fixed points in the network dynamics. TPAM achieves high memory capacity when storing sparse phasor patterns, and we derive the energy function that governs its fixed point attractor dynamics. Second, through simulation experiments we show how the complex algebraic computations in TPAM can be approximated by a biologically plausible network of integrate-and-fire neurons with synaptic delays and recurrently connected inhibitory interneurons. The fixed points of TPAM in the complex domain are commensurate with stable periodic states of precisely timed spiking activity that are robust to perturbation. The link established between rhythmic firing patterns and complex attractor dynamics has implications for the interpretation of spike patterns seen in neuroscience, and can serve as a framework for computation in emerging neuromorphic devices.
Tasks
Published 2019-01-23
URL http://arxiv.org/abs/1901.07718v1
PDF http://arxiv.org/pdf/1901.07718v1.pdf
PWC https://paperswithcode.com/paper/robust-computation-with-rhythmic-spike
Repo
Framework
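
A toy version of the phasor-memory idea — a sketch in which TPAM's magnitude thresholding is simplified to a pure phase projection — stores unit-magnitude complex patterns with a Hebbian rule and retrieves them by iterating the network dynamics:

```python
import numpy as np

rng = np.random.default_rng(0)
N, P = 64, 3
patterns = np.exp(1j * 2 * np.pi * rng.random((P, N)))  # unit-magnitude phasors

# Hebbian storage: sum of outer products with conjugates, zero self-coupling.
W = sum(np.outer(p, p.conj()) for p in patterns) / N
np.fill_diagonal(W, 0)

def retrieve(z, steps=20):
    for _ in range(steps):
        u = W @ z
        z = u / np.maximum(np.abs(u), 1e-12)  # project each unit onto the unit circle
    return z

noisy = patterns[0] * np.exp(1j * 0.3 * rng.standard_normal(N))
similarity = np.abs(np.vdot(retrieve(noisy), patterns[0])) / N  # ~1 on success
```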

Influence of Neighborhood on the Preference of an Item in eCommerce Search

Title Influence of Neighborhood on the Preference of an Item in eCommerce Search
Authors Saratchandra Indrakanti, Svetlana Strunjas, Shubhangi Tandon, Manojkumar Rangasamy Kannadasan
Abstract Surfacing a ranked list of items for a search query to help buyers discover inventory and make purchase decisions is a critical problem in eCommerce search. Typically, items are independently assigned a predicted probability of sale with respect to a given search query. But in a dynamic marketplace like eBay, even for a single product, there are various factors distinguishing one item from another which can influence the purchase decision for the user. Users have to make a purchase decision by considering all of these options. The majority of existing learning-to-rank algorithms model the relative relevance between labeled items only at the loss function, via pairwise or list-wise losses. But they are limited to point-wise scoring functions where items are ranked independently, based on the features of the item itself. In this paper, we study the influence of an item’s neighborhood on its purchase decision. Here, we consider the neighborhood to be the items ranked above and below the current item in search results. By adding delta features comparing items within a neighborhood and learning a ranking model, we are able to experimentally show that the new ranker with delta features outperforms our baseline ranker in terms of Mean Reciprocal Rank (MRR). The ranking models with the proposed delta features result in a 3-5% improvement in MRR over the baseline model. We also study the impact of different neighborhood sizes. Experimental results show that a neighborhood size of 3 performs best based on MRR, with an improvement of 4-5% over the baseline model.
Tasks Learning-To-Rank
Published 2019-08-10
URL https://arxiv.org/abs/1908.03825v2
PDF https://arxiv.org/pdf/1908.03825v2.pdf
PWC https://paperswithcode.com/paper/exploring-the-effect-of-an-items-neighborhood
Repo
Framework
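
The delta features can be sketched as follows, assuming a plain feature matrix in ranked order (the paper's exact feature construction may differ): each item gets the difference between its own features and the mean features of its neighbors within the window.

```python
import numpy as np

def add_delta_features(X, window=3):
    """X: (n_items, n_features) float array in ranked order.

    Returns X augmented with delta columns: each item's features minus the
    mean features of items within `window` positions above and below it.
    """
    n = len(X)
    deltas = np.zeros_like(X)
    for i in range(n):
        lo, hi = max(0, i - window), min(n, i + window + 1)
        neighbors = np.concatenate([X[lo:i], X[i + 1:hi]])
        if len(neighbors):
            deltas[i] = X[i] - neighbors.mean(axis=0)
    return np.hstack([X, deltas])
```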

Conquering the CNN Over-Parameterization Dilemma: A Volterra Filtering Approach for Action Recognition

Title Conquering the CNN Over-Parameterization Dilemma: A Volterra Filtering Approach for Action Recognition
Authors Siddharth Roheda, Hamid Krim
Abstract The importance of inference in Machine Learning (ML) has led to an explosive number of different proposals in ML, and particularly in Deep Learning. In an attempt to reduce the complexity of Convolutional Neural Networks, we propose a Volterra filter-inspired network architecture. This architecture introduces controlled non-linearities in the form of interactions between the delayed input samples of data. We propose a cascaded implementation of Volterra filtering so as to significantly reduce the number of parameters required to carry out the same classification task as that of a conventional Neural Network. We demonstrate an efficient parallel implementation of this Volterra Neural Network (VNN), along with its remarkable performance while retaining a relatively simpler and potentially more tractable structure. Furthermore, we show a rather sophisticated adaptation of this network to nonlinearly fuse the RGB (spatial) information and the Optical Flow (temporal) information of a video sequence for action recognition. The proposed approach is evaluated on the UCF-101 and HMDB-51 datasets for action recognition, and is shown to outperform state-of-the-art CNN approaches.
Tasks Optical Flow Estimation
Published 2019-10-21
URL https://arxiv.org/abs/1910.09616v2
PDF https://arxiv.org/pdf/1910.09616v2.pdf
PWC https://paperswithcode.com/paper/conquering-the-cnn-over-parameterization
Repo
Framework
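
The controlled non-linearity at the heart of the architecture is the classical Volterra expansion. A 1-D second-order sketch — the paper cascades multidimensional versions of this, so this only shows the core interaction term:

```python
import numpy as np

def volterra2(x, w1, w2):
    """Second-order Volterra filter on a 1-D signal.

    x: signal (T,); w1: linear kernel (K,); w2: quadratic kernel (K, K).
    Output mixes a linear term with pairwise products of delayed samples.
    """
    K = len(w1)
    y = np.zeros(len(x) - K + 1)
    for t in range(len(y)):
        window = x[t:t + K]
        y[t] = w1 @ window + window @ w2 @ window  # linear + quadratic terms
    return y
```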

Adapting Language Models for Non-Parallel Author-Stylized Rewriting

Title Adapting Language Models for Non-Parallel Author-Stylized Rewriting
Authors Bakhtiyar Syed, Gaurav Verma, Balaji Vasan Srinivasan, Anandhavelu N, Vasudeva Varma
Abstract Given the recent progress in language modeling using Transformer-based neural models and an active interest in generating stylized text, we present an approach to leverage the generalization capabilities of a language model to rewrite an input text in a target author’s style. Our proposed approach adapts a pre-trained language model to generate author-stylized text by fine-tuning on the author-specific corpus using a denoising autoencoder (DAE) loss in a cascaded encoder-decoder framework. Optimizing over DAE loss allows our model to learn the nuances of an author’s style without relying on parallel data, which has been a severe limitation of the previous related works in this space. To evaluate the efficacy of our approach, we propose a linguistically-motivated framework to quantify stylistic alignment of the generated text to the target author at lexical, syntactic and surface levels. The evaluation framework is both interpretable as it leads to several insights about the model, and self-contained as it does not rely on external classifiers, e.g. sentiment or formality classifiers. Qualitative and quantitative assessment indicates that the proposed approach rewrites the input text with better alignment to the target style while preserving the original content better than state-of-the-art baselines.
Tasks Denoising, Language Modelling
Published 2019-09-22
URL https://arxiv.org/abs/1909.09962v2
PDF https://arxiv.org/pdf/1909.09962v2.pdf
PWC https://paperswithcode.com/paper/190909962
Repo
Framework
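
The DAE objective amounts to corrupting a sentence from the author's corpus and training the model to reconstruct the original. The specific noise function below (word dropping plus local shuffling) is a common DAE choice and an assumption about this paper's exact setup:

```python
import random

def noise(tokens, drop_p=0.1, shuffle_k=3):
    """Corrupt a token list by random dropping and small local shuffles."""
    kept = [t for t in tokens if random.random() > drop_p]
    for i in range(len(kept)):
        j = min(len(kept) - 1, i + random.randrange(shuffle_k))
        kept[i], kept[j] = kept[j], kept[i]
    return kept

# Fine-tuning step around a seq2seq language model (pseudocode):
# loss = cross_entropy(model(noise(sentence)), sentence)
```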