Paper Group ANR 135
Generate High-Resolution Adversarial Samples by Identifying Effective Features. Simple and Effective Prevention of Mode Collapse in Deep One-Class Classification. Communication Efficient Sparsification for Large Scale Machine Learning. Semantic Search of Memes on Twitter. Transformer++. Soft-Root-Sign Activation Function. Modeling Future Cost for Neural Machine Translation …
Generate High-Resolution Adversarial Samples by Identifying Effective Features
Title | Generate High-Resolution Adversarial Samples by Identifying Effective Features |
Authors | Sizhe Chen, Peidong Zhang, Chengjin Sun, Jia Cai, Xiaolin Huang |
Abstract | With the prevalence of deep learning in computer vision, adversarial samples that weaken neural networks are emerging in large numbers, revealing their deep-rooted defects. Most adversarial attacks calculate an imperceptible perturbation in image space to fool the DNNs. In this strategy, the perturbation looks like noise and thus can be mitigated. Attacks in feature space produce semantic perturbations, but they can only handle low-resolution samples, because a great number of coupled features is needed to express a high-resolution image. In this paper, we propose Attack by Identifying Effective Features (AIEF), which learns different weights for the features to attack. Effective features, those with great weights, influence the victim model strongly but distort the image little, and are therefore more effective for the attack. By concentrating the attack on them, AIEF produces high-resolution adversarial samples with acceptable distortions. We demonstrate the effectiveness of AIEF by attacking different tasks with different generative models. |
Tasks | |
Published | 2020-01-21 |
URL | https://arxiv.org/abs/2001.07631v1 |
https://arxiv.org/pdf/2001.07631v1.pdf | |
PWC | https://paperswithcode.com/paper/generate-high-resolution-adversarial-samples |
Repo | |
Framework | |
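The abstract above only sketches how feature weights are chosen. The toy numpy snippet below illustrates the stated idea, scoring features by their influence on the victim model versus the image distortion they cause; the scoring rule, both gradient inputs, and the update step are illustrative assumptions, not the paper's actual loss.

```python
import numpy as np

def effectiveness(victim_grad, decoder_grad, eps=1e-8):
    """Toy per-feature effectiveness score: large when a feature moves
    the victim model a lot (large victim gradient) while distorting the
    decoded image little (small decoder gradient). Hypothetical rule,
    not the loss from the AIEF paper."""
    return np.abs(victim_grad) / (np.abs(decoder_grad) + eps)

def weighted_feature_step(z, victim_grad, weights, lr=0.1):
    """Perturb latent features z mostly along the effective directions."""
    return z + lr * weights * np.sign(victim_grad)
```

In this sketch, features that fool the victim cheaply receive large weights, so the perturbation budget concentrates on them.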
Simple and Effective Prevention of Mode Collapse in Deep One-Class Classification
Title | Simple and Effective Prevention of Mode Collapse in Deep One-Class Classification |
Authors | Penny Chong, Lukas Ruff, Marius Kloft, Alexander Binder |
Abstract | Anomaly detection algorithms find extensive use in various fields. This area of research has recently made great advances thanks to deep learning. A recent method, the deep Support Vector Data Description (deep SVDD), which is inspired by the classic kernel-based Support Vector Data Description (SVDD), is capable of simultaneously learning a feature representation of the data and a data-enclosing hypersphere. The method has shown promising results in both unsupervised and semi-supervised settings. However, deep SVDD suffers from hypersphere collapse (also known as mode collapse) if the architecture of the model does not comply with certain architectural constraints, e.g. the removal of bias terms. These constraints limit the adaptability of the model and in some cases may affect the model performance due to learning sub-optimal features. In this work, we consider two regularizers to prevent hypersphere collapse in deep SVDD. The first regularizer is based on injecting random noise via the standard cross-entropy loss. The second regularizer penalizes the minibatch variance when it becomes too small. Moreover, we introduce an adaptive weighting scheme to control the amount of penalization between the SVDD loss and the respective regularizer. Our proposed regularized variants of deep SVDD show encouraging results and outperform a prominent state-of-the-art method on a setup where the anomalies have no apparent geometrical structure. |
Tasks | Anomaly Detection |
Published | 2020-01-24 |
URL | https://arxiv.org/abs/2001.08873v3 |
https://arxiv.org/pdf/2001.08873v3.pdf | |
PWC | https://paperswithcode.com/paper/simple-and-effective-prevention-of-mode |
Repo | |
Framework | |
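The second regularizer described above penalizes a too-small minibatch variance. Below is a minimal numpy sketch of such a penalty alongside the deep SVDD objective; the exact functional form and the threshold `eps` are assumptions, not the paper's formulation, and the adaptive weighting scheme is left out.

```python
import numpy as np

def svdd_loss(features, center):
    """Deep SVDD objective: mean squared distance to the hypersphere center."""
    return np.mean(np.sum((features - center) ** 2, axis=1))

def variance_penalty(features, eps=1e-2):
    """Penalty that activates only when the minibatch feature variance
    drops below eps, discouraging the collapsed (constant-output) solution.

    features: (batch, dim) array of network outputs."""
    var = features.var(axis=0).mean()  # average per-dimension variance
    return max(0.0, eps - var)
```

Training would minimize something like `svdd_loss + weight * variance_penalty`, where a collapsed minibatch (all features equal) incurs the maximal penalty.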
Communication Efficient Sparsification for Large Scale Machine Learning
Title | Communication Efficient Sparsification for Large Scale Machine Learning |
Authors | Sarit Khirirat, Sindri Magnússon, Arda Aytekin, Mikael Johansson |
Abstract | The increasing scale of distributed learning problems necessitates the development of compression techniques for reducing the information exchange between compute nodes. The level of accuracy in existing compression techniques is typically chosen before training, meaning that they are unlikely to adapt well to the problems that they are solving without extensive hyper-parameter tuning. In this paper, we propose dynamic tuning rules that adapt to the communicated gradients at each iteration. In particular, our rules optimize the communication efficiency at each iteration by maximizing the improvement in the objective function that is achieved per communicated bit. Our theoretical results and experiments indicate that the automatic tuning strategies significantly increase communication efficiency on several state-of-the-art compression schemes. |
Tasks | |
Published | 2020-03-13 |
URL | https://arxiv.org/abs/2003.06377v1 |
https://arxiv.org/pdf/2003.06377v1.pdf | |
PWC | https://paperswithcode.com/paper/communication-efficient-sparsification-for |
Repo | |
Framework | |
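The rules proposed above adapt the compression level each iteration to maximize objective improvement per communicated bit. The sketch below shows a standard top-k gradient sparsifier plus a toy stand-in for that selection rule (retained gradient energy per communicated entry); the actual tuning rules in the paper are more principled than this proxy.

```python
import numpy as np

def sparsify_topk(grad, k):
    """Keep the k largest-magnitude gradient entries, zero out the rest."""
    if k >= grad.size:
        return grad.copy()
    idx = np.argpartition(np.abs(grad), -k)[-k:]  # top-k magnitude indices
    out = np.zeros_like(grad)
    out[idx] = grad[idx]
    return out

def choose_k(grad, candidates):
    """Toy stand-in for the paper's dynamic rule: pick the sparsity level
    that maximizes retained gradient energy per communicated entry, a
    crude proxy for objective improvement per bit."""
    def score(k):
        s = sparsify_topk(grad, k)
        return float(s @ s) / k
    return max(candidates, key=score)
```

With this proxy, a gradient dominated by one large entry communicates only that entry, while a flatter gradient justifies sending more coordinates.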
Semantic Search of Memes on Twitter
Title | Semantic Search of Memes on Twitter |
Authors | Jesus Perez-Martin, Benjamin Bustos, Magdalena Saldana |
Abstract | Memes are becoming a useful source of data for analyzing behavior on social media. However, a problem to tackle is how to correctly identify a meme. As the number of memes published every day on social media is huge, there is a need for automatic methods for classifying and searching in large meme datasets. This paper proposes and compares several methods for automatically classifying images as memes. Also, we propose a method that allows us to implement a system for retrieving memes from a dataset using a textual query. We experimentally evaluate the methods using a large dataset of memes collected from Twitter users in Chile, which was annotated by a group of experts. Though some of the evaluated methods are effective, there is still room for improvement. |
Tasks | |
Published | 2020-02-04 |
URL | https://arxiv.org/abs/2002.01462v2 |
https://arxiv.org/pdf/2002.01462v2.pdf | |
PWC | https://paperswithcode.com/paper/semantic-search-of-memes-on-twitter |
Repo | |
Framework | |
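For the retrieval part described above, a textual query and the memes can be embedded in a shared space and ranked by cosine similarity. This is a generic sketch of that pipeline step; the embedding models and the exact ranking used in the paper are not specified here.

```python
import numpy as np

def retrieve(query_vec, meme_vecs, top_k=3):
    """Rank memes by cosine similarity between a textual query embedding
    and meme embeddings assumed to live in the same vector space.

    Returns (indices of the top_k memes, their similarity scores)."""
    q = query_vec / np.linalg.norm(query_vec)
    M = meme_vecs / np.linalg.norm(meme_vecs, axis=1, keepdims=True)
    sims = M @ q                       # cosine similarity per meme
    order = np.argsort(-sims)[:top_k]  # best matches first
    return order, sims[order]
```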
Transformer++
Title | Transformer++ |
Authors | Prakhar Thapak, Prodip Hore |
Abstract | Recent advancements in attention mechanisms have replaced recurrent neural networks and their variants for machine translation tasks. The Transformer, using the attention mechanism alone, achieved state-of-the-art results in sequence modeling. Neural machine translation based on the attention mechanism is parallelizable and addresses the problem of handling long-range dependencies among words in sentences more effectively than recurrent neural networks. One of the key concepts in attention is to learn three matrices, query, key, and value, where global dependencies among words are learned by linearly projecting word embeddings through these matrices. Multiple query, key, and value matrices can be learned simultaneously, each focusing on a different subspace of the embedding dimension, which is called multi-head attention in the Transformer. We argue that certain dependencies among words could be learned better through an intermediate context than by directly modeling word-word dependencies. This could happen due to the nature of certain dependencies, or a lack of patterns that makes them difficult to model globally using multi-head self-attention. In this work, we propose a new way of learning dependencies through a context in multi-head attention using convolution. This new form of multi-head attention, along with the traditional form, achieves better results than the Transformer on the WMT 2014 English-to-German and English-to-French translation tasks. We also introduce a framework to learn POS tagging and NER information during the training of the encoder, which further improves results, achieving a new state-of-the-art of 32.1 BLEU, better than the existing best by 1.4 BLEU, on the WMT 2014 English-to-German task, and 44.6 BLEU, better than the existing best by 1.1 BLEU, on the WMT 2014 English-to-French translation task. We call this Transformer++. |
Tasks | Machine Translation, Word Embeddings |
Published | 2020-03-02 |
URL | https://arxiv.org/abs/2003.04974v1 |
https://arxiv.org/pdf/2003.04974v1.pdf | |
PWC | https://paperswithcode.com/paper/transformer |
Repo | |
Framework | |
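As background for the Q/K/V mechanism the abstract describes, here is standard single-head scaled dot-product self-attention in numpy; the paper's convolutional, context-based variant is not reproduced here.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))  # stable softmax
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over a sentence.

    X: (seq_len, d_model) word embeddings. Wq, Wk, Wv linearly project
    the embeddings to queries, keys and values; the softmaxed Q.K^T
    scores encode the learned global word-word dependencies."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(Q.shape[-1])  # pairwise word affinities
    return softmax(scores, axis=-1) @ V      # context-weighted values
```

Multi-head attention repeats this with several independent projection triples and concatenates the outputs, each head attending to a different subspace.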
Soft-Root-Sign Activation Function
Title | Soft-Root-Sign Activation Function |
Authors | Yuan Zhou, Dandan Li, Shuwei Huo, Sun-Yuan Kung |
Abstract | The choice of activation function in deep networks has a significant effect on the training dynamics and task performance. At present, the most effective and widely used activation function is ReLU. However, because of its non-zero mean, missing negative values and unbounded output, ReLU is at a potential disadvantage during optimization. To this end, we introduce a novel activation function that manages to overcome the above three challenges. The proposed nonlinearity, namely “Soft-Root-Sign” (SRS), is smooth, non-monotonic, and bounded. Notably, the bounded property of SRS distinguishes it from most state-of-the-art activation functions. In contrast to ReLU, SRS can adaptively adjust the output through a pair of independent trainable parameters to capture negative information and provide a zero-mean property, leading not only to better generalization performance but also to faster learning speed. It also keeps the output distribution from being confined to the non-negative real numbers, making it more compatible with batch normalization (BN) and less sensitive to initialization. In experiments, we evaluated SRS on deep networks applied to a variety of tasks, including image classification, machine translation and generative modelling. Our SRS matches or exceeds models with ReLU and other state-of-the-art nonlinearities, showing that the proposed activation function generalizes and can achieve high performance across tasks. An ablation study further verified its compatibility with BN and its self-adaptability to different initializations. |
Tasks | Image Classification, Machine Translation |
Published | 2020-03-01 |
URL | https://arxiv.org/abs/2003.00547v1 |
https://arxiv.org/pdf/2003.00547v1.pdf | |
PWC | https://paperswithcode.com/paper/soft-root-sign-activation-function |
Repo | |
Framework | |
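The SRS nonlinearity is commonly written as x / (x/alpha + exp(-x/beta)) with trainable alpha and beta; the sketch below uses that form with illustrative defaults, so verify both the formula and the constants against the paper before relying on them.

```python
import numpy as np

def srs(x, alpha=2.0, beta=3.0):
    """Soft-Root-Sign activation: x / (x/alpha + exp(-x/beta)).

    alpha and beta are trainable in the paper; the defaults here are
    illustrative. The output is bounded (it approaches alpha as
    x -> +inf and 0 as x -> -inf), smooth, and non-monotonic on the
    negative axis, unlike ReLU."""
    return x / (x / alpha + np.exp(-x / beta))
```

Because negative inputs map to a bounded negative range instead of zero, the output distribution is not confined to non-negative values, which is the zero-mean property the abstract highlights.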
Modeling Future Cost for Neural Machine Translation
Title | Modeling Future Cost for Neural Machine Translation |
Authors | Chaoqun Duan, Kehai Chen, Rui Wang, Masao Utiyama, Eiichiro Sumita, Conghui Zhu, Tiejun Zhao |
Abstract | Existing neural machine translation (NMT) systems utilize sequence-to-sequence neural networks to generate target translations word by word, and then make the generated word at each time-step as consistent as possible with its counterpart in the references. However, the trained translation model tends to focus on ensuring the accuracy of the generated target word at the current time-step and does not consider its future cost, i.e., the expected cost of generating the subsequent target translation (the next target word). To address this issue, we propose a simple and effective method to model the future cost of each target word for NMT systems. In detail, a time-dependent future cost is estimated based on the currently generated target word and its contextual information to boost the training of the NMT model. Furthermore, the learned future context representation at the current time-step is used to help the generation of the next target word during decoding. Experimental results on three widely used translation datasets, including WMT14 German-to-English, WMT14 English-to-French, and WMT17 Chinese-to-English, show that the proposed approach achieves significant improvements over a strong Transformer-based NMT baseline. |
Tasks | Machine Translation |
Published | 2020-02-28 |
URL | https://arxiv.org/abs/2002.12558v1 |
https://arxiv.org/pdf/2002.12558v1.pdf | |
PWC | https://paperswithcode.com/paper/modeling-future-cost-for-neural-machine |
Repo | |
Framework | |
RPR: Random Partition Relaxation for Training Binary and Ternary Weight Neural Networks
Title | RPR: Random Partition Relaxation for Training Binary and Ternary Weight Neural Networks |
Authors | Lukas Cavigelli, Luca Benini |
Abstract | We present Random Partition Relaxation (RPR), a method for strong quantization of neural network weights to binary (+1/-1) and ternary (+1/0/-1) values. Starting from a pre-trained model, we quantize the weights and then relax random partitions of them to their continuous values for retraining, before re-quantizing them and switching to another weight partition for further adaptation. We demonstrate binary- and ternary-weight networks with accuracies beyond the state-of-the-art for GoogLeNet, and competitive performance for ResNet-18 and ResNet-50, using an SGD-based training method that can easily be integrated into existing frameworks. |
Tasks | Quantization |
Published | 2020-01-04 |
URL | https://arxiv.org/abs/2001.01091v1 |
https://arxiv.org/pdf/2001.01091v1.pdf | |
PWC | https://paperswithcode.com/paper/rpr-random-partition-relaxation-for-training |
Repo | |
Framework | |
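One RPR cycle, as described above, quantizes the weights and then relaxes a random partition back to continuous values for retraining. Below is a numpy sketch of that cycle; the partition fraction, the ternary threshold, and the schedule are assumed hyper-parameters, not the authors' settings.

```python
import numpy as np

def quantize_ternary(w, thresh=0.5):
    """Project continuous weights onto {-1, 0, +1} (illustrative threshold)."""
    q = np.sign(w)
    q[np.abs(w) < thresh] = 0.0
    return q

def rpr_step(w, relax_frac, rng):
    """One Random Partition Relaxation cycle (illustrative, not the
    authors' exact schedule): quantize all weights, then relax a random
    partition back to its continuous values for retraining.

    Returns (weights, mask), where mask marks the relaxed entries that
    would remain trainable until the next re-quantization."""
    mask = rng.random(w.shape) < relax_frac  # random partition to relax
    out = quantize_ternary(w)
    out[mask] = w[mask]                      # relaxed weights stay continuous
    return out, mask
```

Repeating this with fresh random partitions gradually adapts the continuous weights to the quantization constraint.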
A Total Variation Denoising Method Based on Median Filter and Phase Consistency
Title | A Total Variation Denoising Method Based on Median Filter and Phase Consistency |
Authors | Shuo Huang, Suiren Wan |
Abstract | The total variation method is widely used in image noise suppression. However, this method easily causes the loss of image details, and it is also sensitive to parameters such as the iteration time. In this work, the total variation method has been modified using a diffusion-rate adjuster based on phase congruency and a fusion filter combining a median filter and a phase-consistency boundary, which we call the MPC-TV method. Experimental results indicate that the MPC-TV method is effective in noise suppression, especially for removing speckle noise, and that it also improves the robustness of the TV method with respect to iteration time on noise of different variances. |
Tasks | Denoising |
Published | 2020-01-01 |
URL | https://arxiv.org/abs/2001.00150v1 |
https://arxiv.org/pdf/2001.00150v1.pdf | |
PWC | https://paperswithcode.com/paper/a-total-variation-denoising-method-based-on |
Repo | |
Framework | |
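For reference, the baseline the paper modifies is plain total variation denoising; a small gradient-descent sketch follows. The median-filter and phase-congruency components of MPC-TV are not reproduced, and the discretization choices here (forward differences, replicated borders) are assumptions.

```python
import numpy as np

def tv_denoise(img, lam=0.1, step=0.1, iters=50):
    """Plain gradient-descent total variation denoising.

    Minimizes ||u - img||^2 / 2 + lam * TV(u) by explicit gradient
    steps. This is only the TV baseline that MPC-TV modifies."""
    u = img.astype(float).copy()
    eps = 1e-6  # smoothing constant so the TV gradient is defined at 0
    for _ in range(iters):
        dx = np.diff(u, axis=1, append=u[:, -1:])  # forward differences
        dy = np.diff(u, axis=0, append=u[-1:, :])
        mag = np.sqrt(dx ** 2 + dy ** 2 + eps)
        px, py = dx / mag, dy / mag                # normalized gradient field
        div = (px - np.roll(px, 1, axis=1)) + (py - np.roll(py, 1, axis=0))
        u -= step * ((u - img) - lam * div)        # data term + TV term
    return u
```

The data term keeps the estimate close to the observed image while the divergence term shrinks oscillations, which is exactly the detail-loss trade-off the abstract mentions.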
Interpolating Between Gradient Descent and Exponentiated Gradient Using Reparameterized Gradient Descent
Title | Interpolating Between Gradient Descent and Exponentiated Gradient Using Reparameterized Gradient Descent |
Authors | Ehsan Amid, Manfred K. Warmuth |
Abstract | Continuous-time mirror descent (CMD) can be seen as the limit case of the discrete-time MD update when the step-size is infinitesimally small. In this paper, we focus on the geometry of the primal and dual CMD updates and introduce a general framework for reparameterizing one CMD update as another. Specifically, the reparameterized update also corresponds to a CMD, but on the composite loss w.r.t. the new variables, and the original variables are obtained via the reparameterization map. We employ these results to introduce a new family of reparameterizations that interpolate between the two commonly used updates, namely the continuous-time gradient descent (GD) and unnormalized exponentiated gradient (EGU), while extending to many other well-known updates. In particular, we show that for the underdetermined linear regression problem, these updates generalize the known behavior of GD and EGU, and provably converge to the minimum $\mathrm{L}_{2-\tau}$-norm solution for $\tau\in[0,1]$. Our new results also have implications for the regularized training of neural networks to induce sparsity. |
Tasks | |
Published | 2020-02-24 |
URL | https://arxiv.org/abs/2002.10487v1 |
https://arxiv.org/pdf/2002.10487v1.pdf | |
PWC | https://paperswithcode.com/paper/interpolating-between-gradient-descent-and |
Repo | |
Framework | |
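The two endpoint updates being interpolated, GD and unnormalized EGU, can be compared directly on a linear regression loss. The sketch shows only their discrete-time updates; the paper's continuous-time reparameterization map between them is not implemented here.

```python
import numpy as np

def gd_step(w, X, y, lr):
    """Discretized gradient descent on 0.5 * ||Xw - y||^2 (the tau = 0 endpoint)."""
    return w - lr * X.T @ (X @ w - y)

def egu_step(w, X, y, lr):
    """Unnormalized exponentiated gradient (the tau = 1 endpoint): a
    multiplicative update that keeps positive weights positive and,
    per the abstract, biases toward sparser solutions."""
    return w * np.exp(-lr * X.T @ (X @ w - y))
```

On an underdetermined system both updates drive the loss to zero but converge to different interpolants, matching the abstract's minimum L_{2-tau}-norm characterization at tau = 0 (GD) and tau = 1 (EGU).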
Coverage of Non-Planar Areas with Unmanned Vehicles (İnsansız Araçlarla Düzlemsel Olmayan Araçların Taranması)
Title | Coverage of Non-Planar Areas with Unmanned Vehicles (İnsansız Araçlarla Düzlemsel Olmayan Araçların Taranması) |
Authors | Çağlar Seylan, Özgür Saygın Bican, Fatih Semiz |
Abstract | The importance of area coverage with unmanned vehicles, in other words, traveling an area with an unmanned vehicle such as a robot or a UAV completely or partially with minimum cost, is increasing with the growing usage of such vehicles today. Area coverage with unmanned vehicles is used today in exploring an area with UAVs, sweeping mines with robots, cleaning floors with robots in large shopping malls, mowing lawns over a large area, etc. The problem has versions such as area coverage with a single unmanned vehicle, area coverage with multiple unmanned vehicles, and on-line area coverage with unmanned vehicles (where the map of the area to be covered is not known before starting the coverage). In addition, the area may contain obstacles that the vehicles cannot move over. Naturally, many researchers are working on the problem and a great deal of research has been done on it to date. Spanning-tree coverage is one of the major approaches to the problem. In this approach, at the basic level, the planar area is divided into identical squares according to the vehicle's range of sight, and the centers of these squares are taken as the vertices of a graph. The vertices of this graph are connected by edges with unit costs, and after finding the minimum spanning tree of the graph, the vehicle travels around the spanning tree. The method we propose suggests a way to cover a non-planar area with unmanned vehicles. It also takes advantage of the spanning-tree coverage approach, but instead of assigning unit costs to the edges, we assign a weight to each edge using the slopes between the vertices that the edge connects. We obtained noticeably better results than with the classical spanning-tree approach, which does not consider the slope between two squares. |
Tasks | |
Published | 2020-03-16 |
URL | https://arxiv.org/abs/2003.09310v1 |
https://arxiv.org/pdf/2003.09310v1.pdf | |
PWC | https://paperswithcode.com/paper/insansz-araclarla-duzlemsel-olmayan-araclarn |
Repo | |
Framework | |
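The weighting idea above can be sketched by building a grid graph whose edge costs grow with the slope between adjacent cells and extracting a minimum spanning tree (here via Prim's algorithm). The 3-D-distance edge cost is an illustrative choice, not necessarily the authors' exact weight function.

```python
import heapq

def slope_weight(h1, h2, unit=1.0):
    """Edge cost between adjacent cells with heights h1, h2: the 3-D
    distance actually traveled, instead of a flat unit cost."""
    return (unit ** 2 + (h2 - h1) ** 2) ** 0.5

def coverage_mst(heights):
    """Minimum spanning tree over a grid of cell-center heights using
    Prim's algorithm; the vehicle then circumnavigates this tree to
    cover the area. Returns the list of tree edges ((r, c), (r, c))."""
    rows, cols = len(heights), len(heights[0])
    seen = {(0, 0)}
    edges, heap = [], []

    def push(cell):
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols and (nr, nc) not in seen:
                w = slope_weight(heights[r][c], heights[nr][nc])
                heapq.heappush(heap, (w, cell, (nr, nc)))

    push((0, 0))
    while heap and len(seen) < rows * cols:
        w, a, b = heapq.heappop(heap)
        if b in seen:
            continue
        seen.add(b)
        edges.append((a, b))
        push(b)
    return edges
```

With slope-aware weights the tree routes around steep transitions whenever a flatter connection exists, which is the paper's stated improvement over unit costs.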
Ensemble Genetic Programming
Title | Ensemble Genetic Programming |
Authors | Nuno M. Rodrigues, João E. Batista, Sara Silva |
Abstract | Ensemble learning is a powerful paradigm that has been used in the top state-of-the-art machine learning methods like Random Forests and XGBoost. Inspired by the success of such methods, we have developed a new Genetic Programming method called Ensemble GP. The evolutionary cycle of Ensemble GP follows the same steps as other Genetic Programming systems, but with differences in the population structure, fitness evaluation and genetic operators. We have tested this method on eight binary classification problems, achieving results significantly better than standard GP, with much smaller models. Although other methods like M3GP and XGBoost were the best overall, Ensemble GP was able to achieve exceptionally good generalization results on a particularly hard problem where none of the other methods was able to succeed. |
Tasks | |
Published | 2020-01-21 |
URL | https://arxiv.org/abs/2001.07553v1 |
https://arxiv.org/pdf/2001.07553v1.pdf | |
PWC | https://paperswithcode.com/paper/ensemble-genetic-programming |
Repo | |
Framework | |
Provenance for the Description Logic ELHr
Title | Provenance for the Description Logic ELHr |
Authors | Camille Bourgaux, Ana Ozaki, Rafael Peñaloza, Livia Predoiu |
Abstract | We address the problem of handling provenance information in ELHr ontologies. We consider a setting recently introduced for ontology-based data access, based on semirings and extending classical data provenance, in which ontology axioms are annotated with provenance tokens. A consequence inherits the provenance of the axioms involved in deriving it, yielding a provenance polynomial as an annotation. We analyse the semantics for the ELHr case and show that the presence of conjunctions poses various difficulties for handling provenance, some of which are mitigated by assuming multiplicative idempotency of the semiring. Under this assumption, we study three problems: ontology completion with provenance, computing the set of relevant axioms for a consequence, and query answering. |
Tasks | |
Published | 2020-01-21 |
URL | https://arxiv.org/abs/2001.07541v1 |
https://arxiv.org/pdf/2001.07541v1.pdf | |
PWC | https://paperswithcode.com/paper/provenance-for-the-description-logic-elhr |
Repo | |
Framework | |
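Under multiplicative idempotency, a provenance annotation can be represented as a set of monomials, each a set of axiom tokens. Below is a tiny sketch of the two semiring operations (product for joint use of axioms in a derivation, sum for alternative derivations); the ELHr reasoning itself is not implemented.

```python
def derive(prov1, prov2):
    """Semiring product: provenance of a consequence obtained by jointly
    using two annotated axioms. With multiplicative idempotency (as
    assumed in the paper), a monomial is simply the set of tokens it
    uses, so x * x collapses to x."""
    return {frozenset(m1 | m2) for m1 in prov1 for m2 in prov2}

def alternatives(*provs):
    """Semiring sum: union over alternative derivations of a consequence."""
    out = set()
    for p in provs:
        out |= p
    return out
```

For example, deriving A ⊑ C from axioms A ⊑ B (token x) and B ⊑ C (token y) yields the single monomial {x, y}, and a second independent derivation would simply add another monomial to the polynomial.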
Hybrid Attention-Based Transformer Block Model for Distant Supervision Relation Extraction
Title | Hybrid Attention-Based Transformer Block Model for Distant Supervision Relation Extraction |
Authors | Yan Xiao, Yaochu Jin, Ran Cheng, Kuangrong Hao |
Abstract | With the explosive growth of digital text information, it is challenging to efficiently obtain specific knowledge from massive unstructured text. As one basic task in natural language processing (NLP), relation extraction aims to extract the semantic relation between entity pairs based on a given text. To avoid manual labeling of datasets, distant supervision relation extraction (DSRE) has been widely used, aiming to utilize a knowledge base to automatically annotate datasets. Unfortunately, this method suffers heavily from wrong labeling due to its underlying strong assumptions. To address this issue, we propose a new framework using a hybrid attention-based Transformer block with multi-instance learning to perform the DSRE task. More specifically, the Transformer block is first used as the sentence encoder to capture syntactic information of sentences, mainly utilizing multi-head self-attention to extract features at the word level. Then, a more concise sentence-level attention mechanism is adopted to constitute the bag representation, incorporating valid information from each sentence to effectively represent the bag. Experimental results on the public New York Times (NYT) dataset demonstrate that the proposed approach outperforms state-of-the-art algorithms on the evaluation dataset, which verifies the effectiveness of our model for the DSRE task. |
Tasks | Relation Extraction |
Published | 2020-03-10 |
URL | https://arxiv.org/abs/2003.11518v2 |
https://arxiv.org/pdf/2003.11518v2.pdf | |
PWC | https://paperswithcode.com/paper/hybrid-attention-based-transformer-block |
Repo | |
Framework | |
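The sentence-level attention step described above can be sketched in a few lines: the encoded sentences in a bag are weighted by their affinity to a relation query vector, down-weighting wrongly labelled sentences. The dot-product scoring function is an assumption; the paper's exact parametrization may differ.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def bag_representation(sentence_vecs, relation_query):
    """Sentence-level attention for multi-instance learning.

    sentence_vecs: (n_sentences, dim) encoded sentences of one bag;
    relation_query: (dim,) query vector for the target relation.
    Returns (bag_vector, attention_weights)."""
    scores = sentence_vecs @ relation_query  # affinity of each sentence
    alpha = softmax(scores)                  # normalized attention weights
    return alpha @ sentence_vecs, alpha
```

Sentences aligned with the relation dominate the bag vector, which is how the mechanism mitigates the wrong-labeling problem of distant supervision.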
AI-Powered GUI Attack and Its Defensive Methods
Title | AI-Powered GUI Attack and Its Defensive Methods |
Authors | Ning Yu, Zachary Tuttle, Carl Jake Thurnau, Emmanuel Mireku |
Abstract | Since the first Graphical User Interface (GUI) prototype was invented in the 1970s, GUI systems have been deployed into various personal computer systems and server platforms. Recently, with the development of artificial intelligence (AI) technology, AI-powered malware is emerging as a potential threat to GUI systems. This type of AI-based cybersecurity attack, targeting GUI systems, is explored in this paper. The work is twofold: (1) a piece of malware is designed to attack existing GUI systems by using AI-based object recognition techniques; (2) defensive methods are developed, by generating adversarial examples and through other means, to alleviate the threats from the intelligent GUI attack. The results show that a generic GUI attack can be implemented in a simple way with current AI techniques, and that its countermeasures, while temporary, are so far effective in mitigating the threat of GUI attacks. |
Tasks | Object Recognition |
Published | 2020-01-26 |
URL | https://arxiv.org/abs/2001.09388v1 |
https://arxiv.org/pdf/2001.09388v1.pdf | |
PWC | https://paperswithcode.com/paper/ai-powered-gui-attack-and-its-defensive |
Repo | |
Framework | |