Paper Group ANR 711
How Sequence-to-Sequence Models Perceive Language Styles?
Title | How Sequence-to-Sequence Models Perceive Language Styles? |
Authors | Ruozi Huang, Mi Zhang, Xudong Pan, Beina Sheng |
Abstract | Style is ubiquitous in our daily language use, but what is language style to a learning machine? In this paper, by exploiting the second-order statistics of semantic vectors from different corpora, we present a novel perspective on this question via the style matrix, i.e. the covariance matrix of semantic vectors, and explain for the first time how Sequence-to-Sequence models innately encode style information in their semantic vectors. As an application, we devise a learning-free text style transfer algorithm that explicitly constructs a pair of transfer operators from the style matrices. Moreover, our algorithm is also observed to be flexible enough to transfer out-of-domain sentences. Extensive experimental evidence justifies the informativeness of the style matrix and the competitive performance of our proposed style transfer algorithm against state-of-the-art methods. |
Tasks | Style Transfer, Text Style Transfer |
Published | 2019-08-16 |
URL | https://arxiv.org/abs/1908.05947v1 |
PDF | https://arxiv.org/pdf/1908.05947v1.pdf |
PWC | https://paperswithcode.com/paper/how-sequence-to-sequence-models-perceive |
Repo | |
Framework | |
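The abstract's key objects are easy to make concrete. Below is a minimal NumPy sketch, not the paper's algorithm, under the assumption that the "pair of transfer operators" acts as a whitening-coloring transform: whiten semantic vectors with the source style matrix, then color them with the target's. The toy data stands in for encoder semantic vectors.

```python
import numpy as np

def style_matrix(H):
    # Style matrix as defined in the abstract: covariance of a corpus's
    # semantic vectors (one vector per row of H).
    Hc = H - H.mean(axis=0)
    return Hc.T @ Hc / (len(H) - 1)

def make_transfer(C_src, C_tgt, eps=1e-5):
    # Hypothetical transfer operator: whiten with the source style matrix,
    # then color with the target style matrix.
    def eig_pow(C, p):
        w, V = np.linalg.eigh(C + eps * np.eye(len(C)))
        return V @ np.diag(w ** p) @ V.T
    W = eig_pow(C_src, -0.5) @ eig_pow(C_tgt, 0.5)
    return lambda H: (H - H.mean(axis=0)) @ W

rng = np.random.default_rng(0)
src = rng.normal(size=(500, 64))        # toy "semantic vectors", style A
tgt = 2.0 * rng.normal(size=(500, 64))  # toy "semantic vectors", style B
transfer = make_transfer(style_matrix(src), style_matrix(tgt))
moved = transfer(src)
# The transferred vectors now carry (approximately) the target style matrix.
print(np.allclose(style_matrix(moved), style_matrix(tgt), atol=0.5))  # True
```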
Deep Learning for Physical-Layer 5G Wireless Techniques: Opportunities, Challenges and Solutions
Title | Deep Learning for Physical-Layer 5G Wireless Techniques: Opportunities, Challenges and Solutions |
Authors | Hongji Huang, Song Guo, Guan Gui, Zhen Yang, Jianhua Zhang, Hikmet Sari, Fumiyuki Adachi |
Abstract | The new demands for high-reliability and ultra-high-capacity wireless communication have led to extensive research into 5G communications. However, the current communication systems, which were designed on the basis of conventional communication theories, significantly restrict further performance improvements and lead to severe limitations. Recently, the emerging deep learning techniques have been recognized as a promising tool for handling complicated communication systems, and their potential for optimizing wireless communications has been demonstrated. In this article, we first review the development of deep learning solutions for 5G communication, and then propose efficient schemes for deep learning-based 5G scenarios. Specifically, the key ideas for several important deep learning-based communication methods are presented along with the research opportunities and challenges. In particular, novel communication frameworks for non-orthogonal multiple access (NOMA), massive multiple-input multiple-output (MIMO), and millimeter wave (mmWave) are investigated, and their superior performance is demonstrated. We envision that the appealing deep learning-based wireless physical-layer frameworks will bring a new direction to communication theory and that this work will move us forward along this road. |
Tasks | |
Published | 2019-04-21 |
URL | http://arxiv.org/abs/1904.09673v1 |
PDF | http://arxiv.org/pdf/1904.09673v1.pdf |
PWC | https://paperswithcode.com/paper/deep-learning-for-physical-layer-5g-wireless |
Repo | |
Framework | |
Soft edit distance for differentiable comparison of symbolic sequences
Title | Soft edit distance for differentiable comparison of symbolic sequences |
Authors | Evgenii Ofitserov, Vasily Tsvetkov, Vadim Nazarov |
Abstract | Edit distance, also known as Levenshtein distance, is an essential way to compare two strings that has proved particularly useful in the analysis of genetic sequences and in natural language processing. However, edit distance is a discrete function that is known to be hard to optimize, which hampers the use of this metric in machine learning. Even an algorithm as simple as K-means fails to cluster a set of sequences under edit distance if they are of variable length and abundance. In this paper we propose a novel metric, soft edit distance (SED), which is a smooth approximation of edit distance. It is differentiable and can therefore be optimized with gradient methods. Like the original edit distance, SED and its derivatives can be computed with recurrent formulas in polynomial time. We demonstrate the usefulness of the proposed metric on synthetic datasets and on clustering of biological sequences. |
Tasks | |
Published | 2019-04-29 |
URL | http://arxiv.org/abs/1904.12562v1 |
PDF | http://arxiv.org/pdf/1904.12562v1.pdf |
PWC | https://paperswithcode.com/paper/soft-edit-distance-for-differentiable |
Repo | |
Framework | |
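The recursion behind SED can be sketched compactly. The following NumPy snippet is a minimal illustration, not the paper's exact formulation: it keeps the standard Levenshtein dynamic program but replaces the hard min with a temperature-controlled softmin, which is what makes the distance differentiable.

```python
import numpy as np

def softmin(values, tau=0.1):
    # Smooth approximation of min(values): -tau * log(sum(exp(-v / tau))),
    # computed with the usual max-shift for numerical stability.
    v = -np.asarray(values) / tau
    m = v.max()
    return -tau * (m + np.log(np.exp(v - m).sum()))

def soft_edit_distance(a, b, tau=0.1):
    # Levenshtein DP with the hard min replaced by softmin, so the result
    # is smooth in any real-valued costs (here a 0/1 substitution cost;
    # the paper works with smooth costs between symbol representations).
    n, m = len(a), len(b)
    D = np.zeros((n + 1, m + 1))
    D[:, 0] = np.arange(n + 1)  # boundary: pure deletions
    D[0, :] = np.arange(m + 1)  # boundary: pure insertions
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0.0 if a[i - 1] == b[j - 1] else 1.0
            D[i, j] = softmin([D[i - 1, j] + 1.0,        # deletion
                               D[i, j - 1] + 1.0,        # insertion
                               D[i - 1, j - 1] + sub])   # substitution
    return D[n, m]

print(soft_edit_distance("kitten", "sitting"))  # ~3 (exactly 3 as tau -> 0)
```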
Subtractive Perceptrons for Learning Images: A Preliminary Report
Title | Subtractive Perceptrons for Learning Images: A Preliminary Report |
Authors | H. R. Tizhoosh, Shivam Kalra, Shalev Lifshitz, Morteza Babaie |
Abstract | In recent years, artificial neural networks have achieved tremendous success for many vision-based tasks. However, this success remains within the paradigm of *weak AI*, where networks, among other things, are specialized for just one given task. The path toward *strong AI*, or Artificial General Intelligence, remains rather obscure. One factor, however, is clear, namely that the feed-forward structure of current networks is not a realistic abstraction of the human brain. In this preliminary work, some ideas are proposed to define a *subtractive Perceptron* (s-Perceptron), a graph-based neural network that delivers a more compact topology for learning one specific task. We test the s-Perceptron on the MNIST dataset, a commonly used image archive for digit recognition. The proposed network achieves excellent results compared to benchmark networks that rely on more complex topologies. |
Tasks | |
Published | 2019-09-15 |
URL | https://arxiv.org/abs/1909.12933v1 |
PDF | https://arxiv.org/pdf/1909.12933v1.pdf |
PWC | https://paperswithcode.com/paper/subtractive-perceptrons-for-learning-images-a |
Repo | |
Framework | |
Fast Training of Sparse Graph Neural Networks on Dense Hardware
Title | Fast Training of Sparse Graph Neural Networks on Dense Hardware |
Authors | Matej Balog, Bart van Merriënboer, Subhodeep Moitra, Yujia Li, Daniel Tarlow |
Abstract | Graph neural networks have become increasingly popular in recent years due to their ability to naturally encode relational input data and their ability to scale to large graphs by operating on a sparse representation of graph adjacency matrices. As we look to scale up these models using custom hardware, a natural assumption would be that we need hardware tailored to sparse operations and/or dynamic control flow. In this work, we question this assumption by scaling up sparse graph neural networks using a platform targeted at dense computation on fixed-size data. Drawing inspiration from optimization of numerical algorithms on sparse matrices, we develop techniques that enable training the sparse graph neural network model from Allamanis et al. [2018] in 13 minutes using a 512-core TPUv2 Pod, whereas the original training takes almost a day. |
Tasks | |
Published | 2019-06-27 |
URL | https://arxiv.org/abs/1906.11786v1 |
PDF | https://arxiv.org/pdf/1906.11786v1.pdf |
PWC | https://paperswithcode.com/paper/fast-training-of-sparse-graph-neural-networks |
Repo | |
Framework | |
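The core trick of running a sparse model on dense hardware can be illustrated in a few lines. This toy NumPy sketch (not the paper's TPU pipeline) pads every graph to a fixed node count and writes message passing as dense matmuls, so shapes and control flow are static.

```python
import numpy as np

MAX_NODES = 8  # every graph is padded to this fixed size

def dense_gnn_layer(h, adj, w):
    # One message-passing step written as two dense matmuls:
    # sum neighbour features, then apply a shared linear map + ReLU.
    # No sparse ops, no data-dependent control flow.
    return np.maximum((adj @ h) @ w, 0.0)

# A 3-node path graph (edges 0-1 and 1-2), zero-padded to MAX_NODES so
# that every example has an identical shape, as dense hardware prefers.
adj = np.zeros((MAX_NODES, MAX_NODES))
for i, j in [(0, 1), (1, 0), (1, 2), (2, 1)]:
    adj[i, j] = 1.0

rng = np.random.default_rng(0)
h = np.zeros((MAX_NODES, 4))
h[:3] = rng.normal(size=(3, 4))          # features for the real nodes
w = rng.normal(size=(4, 4))              # shared layer weights
print(dense_gnn_layer(h, adj, w).shape)  # (8, 4): fixed, regardless of graph
```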
An Efficient and Margin-Approaching Zero-Confidence Adversarial Attack
Title | An Efficient and Margin-Approaching Zero-Confidence Adversarial Attack |
Authors | Yang Zhang, Shiyu Chang, Mo Yu, Kaizhi Qian |
Abstract | There are two major paradigms of white-box adversarial attacks that attempt to impose input perturbations. The first paradigm, called the fix-perturbation attack, crafts adversarial samples within a given perturbation level. The second paradigm, called the zero-confidence attack, finds the smallest perturbation needed to cause misclassification, also known as the margin of an input feature. While the former paradigm is well-resolved, the latter is not. Existing zero-confidence attacks either introduce significant approximation errors, or are too time-consuming. We therefore propose MARGINATTACK, a zero-confidence attack framework that is able to compute the margin with improved accuracy and efficiency. Our experiments show that MARGINATTACK is able to compute a smaller margin than the state-of-the-art zero-confidence attacks, and matches the state-of-the-art fix-perturbation attacks. In addition, it runs significantly faster than the Carlini-Wagner attack, currently the most accurate zero-confidence attack algorithm. |
Tasks | Adversarial Attack |
Published | 2019-10-01 |
URL | https://arxiv.org/abs/1910.00511v1 |
PDF | https://arxiv.org/pdf/1910.00511v1.pdf |
PWC | https://paperswithcode.com/paper/an-efficient-and-margin-approaching-zero-1 |
Repo | |
Framework | |
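For intuition about the margin the abstract defines, the linear case has a closed form; the sketch below computes it. Everything here is standard geometry, not MARGINATTACK itself, whose iterative updates are what handle the nonlinear case.

```python
import numpy as np

def margin_perturbation(w, b, x):
    # For a linear classifier f(x) = w.x + b, the smallest L2 perturbation
    # reaching the decision boundary is -f(x) * w / ||w||^2, with norm
    # |f(x)| / ||w|| (the "margin"). Nonlinear models have no such closed
    # form, which is the gap iterative schemes like MARGINATTACK address.
    f = w @ x + b
    return -f * w / (w @ w)

w = np.array([1.0, -2.0])
b = 0.5
x = np.array([3.0, 1.0])
delta = margin_perturbation(w, b, x)
print(np.linalg.norm(delta))   # the margin of x
print(w @ (x + delta) + b)     # ~0.0: x + delta sits on the boundary
```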
Multi-view Locality Low-rank Embedding for Dimension Reduction
Title | Multi-view Locality Low-rank Embedding for Dimension Reduction |
Authors | Lin Feng, Xiangzhu Meng, Huibing Wang |
Abstract | During the last decades, we have witnessed a surge of interest in learning a low-dimensional space with discriminative information from a single view. Even though most such methods can achieve satisfactory performance in certain situations, they fail to fully consider information from multiple views, which are highly relevant but sometimes look different from each other. Besides, correlations between features from multiple views always vary greatly, which challenges multi-view subspace learning. Therefore, how to learn an appropriate subspace that maintains the valuable information in multi-view features is of vital importance but challenging. To tackle this problem, this paper proposes a novel multi-view dimension reduction method named Multi-view Locality Low-rank Embedding for Dimension Reduction (MvL2E). MvL2E makes full use of correlations between multi-view features by adopting low-rank representations. Meanwhile, it aims to maintain these correlations and construct a suitable manifold space to capture the low-dimensional embedding of multi-view features. A centroid-based scheme is designed to force multiple views to learn from each other, and an iterative alternating strategy is developed to obtain the optimal solution of MvL2E. The proposed method is evaluated on 5 benchmark datasets. Comprehensive experiments show that MvL2E achieves performance comparable to previous approaches proposed in the recent literature. |
Tasks | Dimensionality Reduction |
Published | 2019-05-20 |
URL | https://arxiv.org/abs/1905.08138v1 |
PDF | https://arxiv.org/pdf/1905.08138v1.pdf |
PWC | https://paperswithcode.com/paper/multi-view-locality-low-rank-embedding-for |
Repo | |
Framework | |
Generic Multilayer Network Data Analysis with the Fusion of Content and Structure
Title | Generic Multilayer Network Data Analysis with the Fusion of Content and Structure |
Authors | Xuan-Son Vu, Abhishek Santra, Sharma Chakravarthy, Lili Jiang |
Abstract | Multi-feature data analysis (e.g., on Facebook, LinkedIn) is challenging, especially if one wants to do it efficiently while retaining the flexibility to choose features of interest for analysis. Features (e.g., age, gender, relationship, political view) can be given explicitly in datasets, but can also be derived from content (e.g., political view based on Facebook posts). Analysis from multiple perspectives is needed to understand the datasets (or subsets of them) and to infer meaningful knowledge. For example, the influence of age, location, and marital status on political views may need to be inferred separately (or in combination). In this paper, we adapt multilayer network (MLN) analysis, a nontraditional approach, to model the Facebook datasets, integrate content analysis, and conduct analysis driven by a list of desired application-based queries. Our experimental analysis shows the flexibility and efficiency of the proposed approach when modeling and analyzing datasets with multiple features. |
Tasks | |
Published | 2019-05-21 |
URL | https://arxiv.org/abs/1905.08635v1 |
PDF | https://arxiv.org/pdf/1905.08635v1.pdf |
PWC | https://paperswithcode.com/paper/generic-multilayer-network-data-analysis-with |
Repo | |
Framework | |
The Secrets of Machine Learning: Ten Things You Wish You Had Known Earlier to be More Effective at Data Analysis
Title | The Secrets of Machine Learning: Ten Things You Wish You Had Known Earlier to be More Effective at Data Analysis |
Authors | Cynthia Rudin, David Carlson |
Abstract | Despite the widespread usage of machine learning throughout organizations, there are some key principles that are commonly missed. In particular: 1) There are at least four main families for supervised learning: logical modeling methods, linear combination methods, case-based reasoning methods, and iterative summarization methods. 2) For many application domains, almost all machine learning methods perform similarly (with some caveats). Deep learning methods, which are the leading technique for computer vision problems, do not maintain an edge over other methods for most problems (and there are reasons why). 3) Neural networks are hard to train and weird stuff often happens when you try to train them. 4) If you don’t use an interpretable model, you can make bad mistakes. 5) Explanations can be misleading and you can’t trust them. 6) You can pretty much always find an accurate-yet-interpretable model, even for deep neural networks. 7) Special properties such as decision making or robustness must be built in, they don’t happen on their own. 8) Causal inference is different than prediction (correlation is not causation). 9) There is a method to the madness of deep neural architectures, but not always. 10) It is a myth that artificial intelligence can do anything. |
Tasks | Causal Inference, Decision Making |
Published | 2019-06-04 |
URL | https://arxiv.org/abs/1906.01998v1 |
PDF | https://arxiv.org/pdf/1906.01998v1.pdf |
PWC | https://paperswithcode.com/paper/the-secrets-of-machine-learning-ten-things |
Repo | |
Framework | |
Efficient two-sample functional estimation and the super-oracle phenomenon
Title | Efficient two-sample functional estimation and the super-oracle phenomenon |
Authors | Thomas B. Berrett, Richard J. Samworth |
Abstract | We consider the estimation of two-sample integral functionals, of the type that occur naturally, for example, when the object of interest is a divergence between unknown probability densities. Our first main result is that, in wide generality, a weighted nearest neighbour estimator is efficient, in the sense of achieving the local asymptotic minimax lower bound. Moreover, we also prove a corresponding central limit theorem, which facilitates the construction of asymptotically valid confidence intervals for the functional, having asymptotically minimal width. One interesting consequence of our results is the discovery that, for certain functionals, the worst-case performance of our estimator may improve on that of the natural 'oracle' estimator, which is given access to the values of the unknown densities at the observations. |
Tasks | |
Published | 2019-04-18 |
URL | http://arxiv.org/abs/1904.09347v1 |
PDF | http://arxiv.org/pdf/1904.09347v1.pdf |
PWC | https://paperswithcode.com/paper/190409347 |
Repo | |
Framework | |
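As a concrete instance of a two-sample functional estimator, the snippet below implements the classical unweighted k-nearest-neighbour estimator of the Kullback-Leibler divergence (Wang, Kulkarni and Verdú, 2009), the kind of estimator the paper's weighted variant refines. It assumes SciPy is available; the paper's weighting scheme is not reproduced.

```python
import numpy as np
from scipy.spatial import cKDTree

def knn_kl_divergence(x, y, k=5):
    # Classical k-NN estimate of KL(P || Q) from samples x ~ P, y ~ Q:
    # (d/n) * sum_i log(nu_k(i) / rho_k(i)) + log(m / (n - 1)).
    n, d = x.shape
    m = y.shape[0]
    rho = cKDTree(x).query(x, k + 1)[0][:, -1]  # k-th NN within x (skip self)
    nu = cKDTree(y).query(x, k)[0][:, -1]       # k-th NN from x into y
    return d * np.mean(np.log(nu / rho)) + np.log(m / (n - 1))

rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, size=(2000, 1))
y = rng.normal(1.0, 1.0, size=(2000, 1))
print(knn_kl_divergence(x, y))  # true value here is 0.5
```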
Multi-Robot Path Planning Via Genetic Programming
Title | Multi-Robot Path Planning Via Genetic Programming |
Authors | Alexandre Trudeau, Christopher M. Clark |
Abstract | This paper presents a Genetic Programming (GP) approach to solving multi-robot path planning (MRPP) problems in single-lane workspaces, specifically those easily mapped to graph representations. GP’s versatility enables this approach to produce programs optimizing for multiple attributes rather than a single attribute such as path length or completeness. When optimizing for the number of time steps needed to solve individual MRPP problems, the GP-constructed programs outperformed complete MRPP algorithms, i.e. Push-Swap-Wait (PSW), by 54.1%. The GP-constructed programs also consistently outperformed PSW in solving problems that did not meet PSW’s completeness conditions. Furthermore, the GP-constructed programs exhibited a greater capacity for scaling than PSW as the number of robots navigating within an MRPP environment increased. This research illustrates the benefits of using Genetic Programming for solving individual MRPP problems, including instances in which the number of robots exceeds the number of leaves in the tree-modeled workspace. |
Tasks | |
Published | 2019-12-19 |
URL | https://arxiv.org/abs/1912.09503v1 |
PDF | https://arxiv.org/pdf/1912.09503v1.pdf |
PWC | https://paperswithcode.com/paper/multi-robot-path-planning-via-genetic |
Repo | |
Framework | |
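The GP machinery itself is generic; the sketch below shows a minimal tree-based GP loop on a toy symbolic-regression stand-in (selection plus mutation only, crossover omitted for brevity). The paper's MRPP-specific function and terminal sets, and its time-step fitness, are not reproduced here.

```python
import random, operator

OPS = [(operator.add, '+'), (operator.sub, '-'), (operator.mul, '*')]

def rand_tree(depth=3):
    # Grow a random expression tree over {+, -, *} with terminals {x, 1.0}.
    if depth == 0 or random.random() < 0.3:
        return random.choice(['x', 1.0])
    return (random.choice(OPS), rand_tree(depth - 1), rand_tree(depth - 1))

def evaluate(t, x):
    if t == 'x':
        return x
    if isinstance(t, float):
        return t
    (fn, _), a, b = t
    return fn(evaluate(a, x), evaluate(b, x))

def fitness(t):
    # Toy objective standing in for "time steps to solve an MRPP instance".
    try:
        return sum((evaluate(t, x) - (x * x + x)) ** 2 for x in range(-5, 6))
    except OverflowError:
        return float('inf')

def mutate(t):
    # Point mutation: occasionally regrow a subtree.
    if not isinstance(t, tuple) or random.random() < 0.3:
        return rand_tree(2)
    return (t[0], mutate(t[1]), mutate(t[2]))

random.seed(0)
pop = [rand_tree() for _ in range(200)]
for _ in range(30):                      # generations
    pop.sort(key=fitness)                # select the fittest programs
    pop = pop[:50] + [mutate(random.choice(pop[:50])) for _ in range(150)]
print(fitness(min(pop, key=fitness)))    # approaches 0 as evolution proceeds
```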
Science and Technology Advance through Surprise
Title | Science and Technology Advance through Surprise |
Authors | Feng Shi, James Evans |
Abstract | Breakthrough discoveries and inventions involve unexpected combinations of contents including problems, methods, and natural entities, and also diverse contexts such as journals, subfields, and conferences. Drawing on data from tens of millions of research papers, patents, and researchers, we construct models that predict next year’s content and context combinations with an AUC of 95% based on embeddings constructed from high-dimensional stochastic block models, where the improbability of new combinations itself predicts up to 50% of the likelihood that they will gain outsized citations and major awards. Most of these breakthroughs occur when problems in one field are unexpectedly solved by researchers from a distant other. These findings demonstrate the critical role of surprise in advance, and enable evaluation of scientific institutions ranging from education and peer review to awards in supporting it. |
Tasks | |
Published | 2019-10-18 |
URL | https://arxiv.org/abs/1910.09370v2 |
PDF | https://arxiv.org/pdf/1910.09370v2.pdf |
PWC | https://paperswithcode.com/paper/science-and-technology-advance-through |
Repo | |
Framework | |
DoubleTransfer at MEDIQA 2019: Multi-Source Transfer Learning for Natural Language Understanding in the Medical Domain
Title | DoubleTransfer at MEDIQA 2019: Multi-Source Transfer Learning for Natural Language Understanding in the Medical Domain |
Authors | Yichong Xu, Xiaodong Liu, Chunyuan Li, Hoifung Poon, Jianfeng Gao |
Abstract | This paper describes our system entered in the MEDIQA-2019 competition. We use a multi-source transfer learning approach to transfer knowledge from MT-DNN and SciBERT to natural language understanding tasks in the medical domain. During fine-tuning, we use multi-task learning on the NLI, RQE and QA tasks over both general and medical domains to improve performance. The proposed methods prove effective for natural language understanding in the medical domain, and we rank first on the QA task. |
Tasks | Multi-Task Learning, Transfer Learning |
Published | 2019-06-11 |
URL | https://arxiv.org/abs/1906.04382v1 |
PDF | https://arxiv.org/pdf/1906.04382v1.pdf |
PWC | https://paperswithcode.com/paper/doubletransfer-at-mediqa-2019-multi-source |
Repo | |
Framework | |
Anomaly Detection with Inexact Labels
Title | Anomaly Detection with Inexact Labels |
Authors | Tomoharu Iwata, Machiko Toyoda, Shotaro Tora, Naonori Ueda |
Abstract | We propose a supervised anomaly detection method for data with inexact anomaly labels, where each label, which is assigned to a set of instances, indicates that at least one instance in the set is anomalous. Although many anomaly detection methods have been proposed, they cannot handle inexact anomaly labels. To measure the performance with inexact anomaly labels, we define the inexact AUC, which is our extension of the area under the ROC curve (AUC) for inexact labels. The proposed method trains an anomaly score function so that the smooth approximation of the inexact AUC increases while anomaly scores for non-anomalous instances become low. We model the anomaly score function by a neural network-based unsupervised anomaly detection method, e.g., autoencoders. The proposed method performs well even when only a small number of inexact labels are available by incorporating an unsupervised anomaly detection mechanism with inexact AUC maximization. Using various datasets, we experimentally demonstrate that our proposed method improves the anomaly detection performance with inexact anomaly labels, and outperforms existing unsupervised and supervised anomaly detection and multiple instance learning methods. |
Tasks | Anomaly Detection, Multiple Instance Learning, Unsupervised Anomaly Detection |
Published | 2019-09-11 |
URL | https://arxiv.org/abs/1909.04807v1 |
PDF | https://arxiv.org/pdf/1909.04807v1.pdf |
PWC | https://paperswithcode.com/paper/anomaly-detection-with-inexact-labels |
Repo | |
Framework | |
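The inexact-AUC idea can be sketched directly from the abstract. The snippet below is a toy reading, not the paper's objective: each inexactly labelled bag is ranked by its maximum instance score (the label only says at least one instance is anomalous), and the ranking indicator is smoothed so it can be maximized by gradient methods; the paper additionally couples this with an unsupervised anomaly-scoring term.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def smooth_inexact_auc(bag_scores, normal_scores, scale=10.0):
    # Smooth surrogate of the inexact AUC: a labelled bag is represented by
    # its highest-scoring instance, and the 0/1 "bag outranks a non-anomalous
    # instance" indicator is replaced by a sigmoid of the score difference.
    bag_max = np.array([s.max() for s in bag_scores])
    diffs = bag_max[:, None] - normal_scores[None, :]
    return sigmoid(scale * diffs).mean()

bags = [np.array([0.1, 0.9]), np.array([0.2, 0.7, 0.8])]  # inexactly labelled
normal = np.array([0.05, 0.3, 0.2])                       # non-anomalous
print(smooth_inexact_auc(bags, normal))  # near 1: bags outrank normal points
```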
Randomized Ablation Feature Importance
Title | Randomized Ablation Feature Importance |
Authors | Luke Merrick |
Abstract | Given a model $f$ that predicts a target $y$ from a vector of input features $\pmb{x} = x_1, x_2, \ldots, x_M$, we seek to measure the importance of each feature with respect to the model’s ability to make a good prediction. To this end, we consider how (on average) some measure of goodness or badness of prediction (which we term “loss” $\ell$), changes when we hide or ablate each feature from the model. To ablate a feature, we replace its value with another possible value randomly. By averaging over many points and many possible replacements, we measure the importance of a feature on the model’s ability to make good predictions. Furthermore, we present statistical measures of uncertainty that quantify how confident we are that the feature importance we measure from our finite dataset and finite number of ablations is close to the theoretical true importance value. |
Tasks | Feature Importance |
Published | 2019-10-01 |
URL | https://arxiv.org/abs/1910.00174v2 |
PDF | https://arxiv.org/pdf/1910.00174v2.pdf |
PWC | https://paperswithcode.com/paper/randomized-ablation-feature-importance |
Repo | |
Framework | |
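The procedure is simple enough to state in code. Below is a minimal NumPy sketch of randomized ablation importance as the abstract describes it: ablate a feature by swapping in values drawn at random from other rows, and report the average increase in loss. The paper's uncertainty measures are omitted, and the model and data here are illustrative.

```python
import numpy as np

def ablation_importance(model, X, y, loss, n_repeats=10, rng=None):
    # For each feature j, replace its column with values sampled from other
    # rows, and average the resulting increase in loss over n_repeats draws.
    rng = rng or np.random.default_rng()
    base = loss(y, model(X))
    n, d = X.shape
    imp = np.zeros(d)
    for j in range(d):
        for _ in range(n_repeats):
            Xa = X.copy()
            Xa[:, j] = X[rng.integers(0, n, size=n), j]  # ablate feature j
            imp[j] += loss(y, model(Xa)) - base
    return imp / n_repeats

# Toy check: only feature 0 matters, so it gets the larger importance.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))
y = 3 * X[:, 0]
model = lambda X: 3 * X[:, 0]
mse = lambda y, p: np.mean((y - p) ** 2)
print(ablation_importance(model, X, y, mse, rng=rng))  # ~[18, 0]
```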