October 18, 2019

Paper Group ANR 434

Neural Machine Translation with Key-Value Memory-Augmented Attention

Title Neural Machine Translation with Key-Value Memory-Augmented Attention
Authors Fandong Meng, Zhaopeng Tu, Yong Cheng, Haiyang Wu, Junjie Zhai, Yuekui Yang, Di Wang
Abstract Although attention-based Neural Machine Translation (NMT) has achieved remarkable progress in recent years, it still suffers from issues of repeating and dropping translations. To alleviate these issues, we propose a novel key-value memory-augmented attention model for NMT, called KVMEMATT. Specifically, we maintain a timely updated key-memory to keep track of attention history and a fixed value-memory to store the representation of the source sentence throughout the whole translation process. Via nontrivial transformations and iterative interactions between the two memories, the decoder focuses on more appropriate source word(s) for predicting the next target word at each decoding step, and can therefore improve the adequacy of translations. Experimental results on Chinese=>English and WMT17 German<=>English translation tasks demonstrate the superiority of the proposed model.
Tasks Machine Translation
Published 2018-06-29
URL http://arxiv.org/abs/1806.11249v1
PDF http://arxiv.org/pdf/1806.11249v1.pdf
PWC https://paperswithcode.com/paper/neural-machine-translation-with-key-value
Repo
Framework

AdaGrad stepsizes: Sharp convergence over nonconvex landscapes, from any initialization

Title AdaGrad stepsizes: Sharp convergence over nonconvex landscapes, from any initialization
Authors Rachel Ward, Xiaoxia Wu, Leon Bottou
Abstract Adaptive gradient methods such as AdaGrad and its variants update the stepsize in stochastic gradient descent on the fly according to the gradients received along the way; such methods have gained widespread use in large-scale optimization for their ability to converge robustly, without the need to fine-tune the stepsize schedule. Yet, the theoretical guarantees to date for AdaGrad are for online and convex optimization. We bridge this gap by providing theoretical guarantees for the convergence of AdaGrad for smooth, nonconvex functions. We show that the norm version of AdaGrad (AdaGrad-Norm) converges to a stationary point at the $\mathcal{O}(\log(N)/\sqrt{N})$ rate in the stochastic setting, and at the optimal $\mathcal{O}(1/N)$ rate in the batch (non-stochastic) setting – in this sense, our convergence guarantees are ‘sharp’. In particular, the convergence of AdaGrad-Norm is robust to the choice of all hyper-parameters of the algorithm, in contrast to stochastic gradient descent whose convergence depends crucially on tuning the step-size to the (generally unknown) Lipschitz smoothness constant and level of stochastic noise on the gradient. Extensive numerical experiments are provided to corroborate our theory; moreover, the experiments suggest that the robustness of AdaGrad-Norm extends to state-of-the-art models in deep learning, without sacrificing generalization.
Tasks
Published 2018-06-05
URL http://arxiv.org/abs/1806.01811v6
PDF http://arxiv.org/pdf/1806.01811v6.pdf
PWC https://paperswithcode.com/paper/adagrad-stepsizes-sharp-convergence-over
Repo
Framework
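
The AdaGrad-Norm update summarized in the abstract above can be sketched in a few lines. This is a minimal illustration, not the authors' code: it assumes the commonly stated form of the rule, in which a single scalar accumulates the squared gradient norms and divides a base stepsize. The names `adagrad_norm`, `eta`, `b0`, and the quadratic test function are hypothetical choices made for this example.

```python
import math

def adagrad_norm(grad, x0, eta=1.0, b0=0.1, steps=200):
    """AdaGrad-Norm sketch: one scalar stepsize shared by all coordinates.

    Assumed update (labelled as an assumption, not taken from the listing):
        b_sq <- b_sq + ||g||^2
        x    <- x - (eta / sqrt(b_sq)) * g
    """
    x = list(x0)
    b_sq = b0 * b0
    for _ in range(steps):
        g = grad(x)
        b_sq += sum(gi * gi for gi in g)      # accumulate squared gradient norm
        step = eta / math.sqrt(b_sq)          # stepsize shrinks as gradients accumulate
        x = [xi - step * gi for xi, gi in zip(x, g)]
    return x

def quad_grad(x):
    """Gradient of the toy objective f(x) = ||x||^2 (batch, noise-free setting)."""
    return [2.0 * xi for xi in x]

x_final = adagrad_norm(quad_grad, [3.0, -4.0])
```

On this toy problem the iterate moves toward the minimizer at the origin without any tuning of `eta` against the smoothness constant, which is the kind of robustness the abstract describes.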

The Malicious Use of Artificial Intelligence: Forecasting, Prevention, and Mitigation

Title The Malicious Use of Artificial Intelligence: Forecasting, Prevention, and Mitigation
Authors Miles Brundage, Shahar Avin, Jack Clark, Helen Toner, Peter Eckersley, Ben Garfinkel, Allan Dafoe, Paul Scharre, Thomas Zeitzoff, Bobby Filar, Hyrum Anderson, Heather Roff, Gregory C. Allen, Jacob Steinhardt, Carrick Flynn, Seán Ó hÉigeartaigh, Simon Beard, Haydn Belfield, Sebastian Farquhar, Clare Lyle, Rebecca Crootof, Owain Evans, Michael Page, Joanna Bryson, Roman Yampolskiy, Dario Amodei
Abstract This report surveys the landscape of potential security threats from malicious uses of AI, and proposes ways to better forecast, prevent, and mitigate these threats. After analyzing the ways in which AI may influence the threat landscape in the digital, physical, and political domains, we make four high-level recommendations for AI researchers and other stakeholders. We also suggest several promising areas for further research that could expand the portfolio of defenses, or make attacks less effective or harder to execute. Finally, we discuss, but do not conclusively resolve, the long-term equilibrium of attackers and defenders.
Tasks
Published 2018-02-20
URL http://arxiv.org/abs/1802.07228v1
PDF http://arxiv.org/pdf/1802.07228v1.pdf
PWC https://paperswithcode.com/paper/the-malicious-use-of-artificial-intelligence
Repo
Framework

Improving the Modularity of AUV Control Systems using Behaviour Trees

Title Improving the Modularity of AUV Control Systems using Behaviour Trees
Authors Christopher Iliffe Sprague, Özer Özkahraman, Andrea Munafo, Rachel Marlow, Alexander Phillips, Petter Ögren
Abstract In this paper, we show how behaviour trees (BTs) can be used to design modular, versatile, and robust control architectures for mission-critical systems. In particular, we show this in the context of autonomous underwater vehicles (AUVs). Robustness, in terms of system safety, is important since manual recovery of AUVs is often extremely difficult. Furthermore, versatility is important to be able to execute many different kinds of missions. Finally, modularity is needed to achieve a combination of robustness and versatility, as the complexity of a versatile system needs to be encapsulated in modules, in order to create a simple overall structure enabling robustness analysis. The proposed design is illustrated using a typical AUV mission.
Tasks
Published 2018-11-01
URL http://arxiv.org/abs/1811.00426v1
PDF http://arxiv.org/pdf/1811.00426v1.pdf
PWC https://paperswithcode.com/paper/improving-the-modularity-of-auv-control
Repo
Framework
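
To make the behaviour-tree idea above concrete, here is a minimal sketch of the two standard composite nodes (Sequence and Fallback) wired into a toy mission. It is not the architecture from the paper: the leaf behaviours `battery_ok`, `follow_waypoints`, and `surface`, the 0.2 battery threshold, and the `state` dictionary are all hypothetical.

```python
SUCCESS, FAILURE, RUNNING = "success", "failure", "running"

class Leaf:
    """Wraps a function state -> status as a tree node."""
    def __init__(self, fn):
        self.fn = fn
    def tick(self, state):
        return self.fn(state)

class Sequence:
    """Ticks children in order; returns at the first child that does not succeed."""
    def __init__(self, *children):
        self.children = children
    def tick(self, state):
        for child in self.children:
            status = child.tick(state)
            if status != SUCCESS:
                return status
        return SUCCESS

class Fallback:
    """Ticks children in order; returns at the first child that does not fail."""
    def __init__(self, *children):
        self.children = children
    def tick(self, state):
        for child in self.children:
            status = child.tick(state)
            if status != FAILURE:
                return status
        return FAILURE

# Hypothetical leaf behaviours for a toy AUV mission.
def battery_ok(state):
    return SUCCESS if state["battery"] > 0.2 else FAILURE

def follow_waypoints(state):
    state["action"] = "follow_waypoints"
    return RUNNING

def surface(state):
    state["action"] = "surface"
    return RUNNING

# Safety check first; if it fails, fall back to surfacing.
mission = Fallback(Sequence(Leaf(battery_ok), Leaf(follow_waypoints)), Leaf(surface))
```

Because each subtree exposes only `tick`, the safety branch can be inspected or replaced without touching the mission branch, which is the modularity argument the abstract makes.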

Non-native children speech recognition through transfer learning

Title Non-native children speech recognition through transfer learning
Authors Marco Matassoni, Roberto Gretter, Daniele Falavigna, Diego Giuliani
Abstract This work deals with non-native children’s speech and investigates both multi-task and transfer learning approaches to adapt a multi-language Deep Neural Network (DNN) to speakers, specifically children, learning a foreign language. The application scenario is characterized by young students learning English and German and reading sentences in these second languages, as well as in their mother language. The paper analyzes and discusses techniques for training effective DNN-based acoustic models starting from children’s native speech and performing adaptation with limited non-native audio material. A multi-lingual model is adopted as baseline, where a common phonetic lexicon, defined in terms of the units of the International Phonetic Alphabet (IPA), is shared across the three languages at hand (Italian, German and English); DNN adaptation methods based on transfer learning are evaluated on significant non-native evaluation sets. Results show that the resulting non-native models allow a significant improvement with respect to a mono-lingual system adapted to speakers of the target language.
Tasks Speech Recognition, Transfer Learning
Published 2018-09-25
URL http://arxiv.org/abs/1809.09658v1
PDF http://arxiv.org/pdf/1809.09658v1.pdf
PWC https://paperswithcode.com/paper/non-native-children-speech-recognition
Repo
Framework

Smoothed analysis of the low-rank approach for smooth semidefinite programs

Title Smoothed analysis of the low-rank approach for smooth semidefinite programs
Authors Thomas Pumir, Samy Jelassi, Nicolas Boumal
Abstract We consider semidefinite programs (SDPs) of size n with equality constraints. In order to overcome scalability issues, Burer and Monteiro proposed a factorized approach based on optimizing over a matrix Y of size $n$ by $k$ such that $X = YY^*$ is the SDP variable. The advantages of such formulation are twofold: the dimension of the optimization variable is reduced and positive semidefiniteness is naturally enforced. However, the problem in Y is non-convex. In prior work, it has been shown that, when the constraints on the factorized variable regularly define a smooth manifold, provided k is large enough, for almost all cost matrices, all second-order stationary points (SOSPs) are optimal. Importantly, in practice, one can only compute points which approximately satisfy necessary optimality conditions, leading to the question: are such points also approximately optimal? To this end, and under similar assumptions, we use smoothed analysis to show that approximate SOSPs for a randomly perturbed objective function are approximate global optima, with k scaling like the square root of the number of constraints (up to log factors). Moreover, we bound the optimality gap at the approximate solution of the perturbed problem with respect to the original problem. We particularize our results to an SDP relaxation of phase retrieval.
Tasks
Published 2018-06-11
URL http://arxiv.org/abs/1806.03763v2
PDF http://arxiv.org/pdf/1806.03763v2.pdf
PWC https://paperswithcode.com/paper/smoothed-analysis-of-the-low-rank-approach
Repo
Framework
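
The factorization X = YY^* described above can be illustrated numerically. The sketch below works with real matrices (so Y^* is the transpose) and only demonstrates the two advantages named in the abstract: fewer variables and automatic positive semidefiniteness. It does not solve an SDP; the sizes `n`, `k` and the helper names are hypothetical.

```python
import random

def gram(Y):
    """Return X = Y Y^T for a real n-by-k matrix Y stored as a list of rows."""
    return [[sum(a * b for a, b in zip(ri, rj)) for rj in Y] for ri in Y]

def quad_form(X, v):
    """Compute v^T X v."""
    n = len(v)
    return sum(v[i] * X[i][j] * v[j] for i in range(n) for j in range(n))

random.seed(0)
n, k = 6, 2                                   # n*k = 12 variables instead of n*n = 36
Y = [[random.gauss(0, 1) for _ in range(k)] for _ in range(n)]
X = gram(Y)

# v^T X v equals ||Y^T v||^2, so it is nonnegative for every v:
# positive semidefiniteness holds by construction, with no explicit constraint.
probes = [[random.gauss(0, 1) for _ in range(n)] for _ in range(100)]
min_q = min(quad_form(X, v) for v in probes)
```

The price of this reparameterization is that the problem in Y is no longer convex, which is the issue the paper's analysis addresses.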

Learning Sharing Behaviors with Arbitrary Numbers of Agents

Title Learning Sharing Behaviors with Arbitrary Numbers of Agents
Authors Katherine Metcalf, Barry-John Theobald, Nicholas Apostoloff
Abstract We propose a method for modeling and learning turn-taking behaviors for accessing a shared resource. We model the individual behavior for each agent in an interaction and then use a multi-agent fusion model to generate a summary over the expected actions of the group to render the model independent of the number of agents. The individual behavior models are weighted finite state transducers (WFSTs) with weights dynamically updated during interactions, and the multi-agent fusion model is a logistic regression classifier. We test our models in a multi-agent tower-building environment, where a Q-learning agent learns to interact with rule-based agents. Our approach accurately models the underlying behavior patterns of the rule-based agents with accuracy ranging between 0.63 and 1.0 depending on the stochasticity of the other agent behaviors. In addition we show using KL-divergence that the model accurately captures the distribution of next actions when interacting with both a single agent (KL-divergence < 0.1) and with multiple agents (KL-divergence < 0.37). Finally, we demonstrate that our behavior model can be used by a Q-learning agent to take turns in an interactive turn-taking environment.
Tasks Q-Learning
Published 2018-12-10
URL http://arxiv.org/abs/1812.04145v1
PDF http://arxiv.org/pdf/1812.04145v1.pdf
PWC https://paperswithcode.com/paper/learning-sharing-behaviors-with-arbitrary
Repo
Framework
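
The KL-divergence figures quoted above compare two distributions over next actions. A minimal sketch of the discrete computation follows. The argument order (which distribution is treated as the reference) and the `eps` floor are assumptions made for this example; the listing does not say how the authors computed it.

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) in nats for two discrete distributions over the same actions.

    Terms with p_i == 0 contribute nothing; q_i is floored at eps (an assumption
    of this sketch) to avoid division by zero.
    """
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical example: observed vs. predicted next-action distributions.
observed = [0.5, 0.5]
predicted = [0.9, 0.1]
gap = kl_divergence(observed, predicted)
```

A value of zero means the predicted distribution matches the observed one exactly; larger values indicate a worse match.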

The Logistic Network Lasso

Title The Logistic Network Lasso
Authors Henrik Ambos, Nguyen Tran, Alexander Jung
Abstract We apply the network Lasso to solve binary classification and clustering problems for network-structured data. To this end, we generalize ordinary logistic regression to non-Euclidean data with an intrinsic network structure. The resulting “logistic network Lasso” amounts to solving a non-smooth convex regularized empirical risk minimization. The risk is measured using the logistic loss incurred over a small set of labeled nodes. For the regularization, we propose to use the total variation of the classifier, requiring it to conform to the underlying network structure. A scalable implementation of the learning method is obtained using an inexact variant of the alternating direction method of multipliers.
Tasks
Published 2018-05-07
URL http://arxiv.org/abs/1805.02483v4
PDF http://arxiv.org/pdf/1805.02483v4.pdf
PWC https://paperswithcode.com/paper/the-logistic-network-lasso
Repo
Framework
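
The objective described above (logistic loss on labeled nodes plus a total-variation penalty over the graph) can be written down directly. The sketch below only evaluates such an objective; it does not implement the ADMM solver. The exact form used here (mean logistic loss, edge-weighted Euclidean differences, regularization weight `lam`) is an assumption of this example, and all names are hypothetical.

```python
import math

def logistic_loss(w, x, y):
    """log(1 + exp(-y * <w, x>)) for a label y in {-1, +1}."""
    margin = y * sum(wi * xi for wi, xi in zip(w, x))
    return math.log1p(math.exp(-margin))

def total_variation(W, edges):
    """Sum over edges (i, j, a_ij) of a_ij * ||w_i - w_j||."""
    return sum(a * math.dist(W[i], W[j]) for i, j, a in edges)

def objective(W, X, labels, edges, lam):
    """Empirical risk over labeled nodes plus lam times the total variation."""
    risk = sum(logistic_loss(W[i], X[i], y) for i, y in labels.items()) / len(labels)
    return risk + lam * total_variation(W, edges)

# Toy chain graph 0 - 1 - 2 with labels only on the end nodes.
X = [[1.0], [1.0], [1.0]]
labels = {0: +1, 2: -1}
edges = [(0, 1, 1.0), (1, 2, 1.0)]
W_flat = [[0.0], [0.0], [0.0]]                # constant over the graph: zero penalty
W_split = [[1.0], [1.0], [-1.0]]              # one jump across edge (1, 2)
```

The penalty charges only for weight changes across edges, so it favors classifiers that are piecewise constant over well-connected clusters.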

A Tensor-Based Sub-Mode Coordinate Algorithm for Stock Prediction

Title A Tensor-Based Sub-Mode Coordinate Algorithm for Stock Prediction
Authors Jieyun Huang, Yunjia Zhang, Jialai Zhang, Xi Zhang
Abstract Investment in the stock market is prone to being affected by the Internet. To improve prediction accuracy, we propose a multi-task stock prediction model that not only considers the stock correlations but also supports multi-source data fusion. Our proposed model first uses a tensor to integrate the multi-sourced data, including financial Web news, investors’ sentiments extracted from the social network, and some quantitative data on stocks. In this way, the intrinsic relationships among different information sources can be captured and, meanwhile, multi-sourced information can be complemented to solve the data sparsity problem. Secondly, we propose an improved sub-mode coordinate algorithm (SMC). SMC is based on the stock similarity, aiming to reduce the variance of their subspace in each dimension produced by the tensor decomposition. The algorithm is able to improve the quality of the input features, and thus improves the prediction accuracy. The paper then uses the Long Short-Term Memory (LSTM) neural network model to predict the stock fluctuation trends. Finally, experiments are conducted on 78 A-share stocks in CSI 100 and thirteen popular HK stocks in 2015 and 2016. The results demonstrate the improvement in prediction accuracy and the effectiveness of the proposed model.
Tasks Stock Prediction
Published 2018-05-21
URL http://arxiv.org/abs/1805.07979v1
PDF http://arxiv.org/pdf/1805.07979v1.pdf
PWC https://paperswithcode.com/paper/a-tensor-based-sub-mode-coordinate-algorithm
Repo
Framework

Minimax Distribution Estimation in Wasserstein Distance

Title Minimax Distribution Estimation in Wasserstein Distance
Authors Shashank Singh, Barnabás Póczos
Abstract The Wasserstein metric is an important measure of distance between probability distributions, with applications in machine learning, statistics, probability theory, and data analysis. This paper provides upper and lower bounds on statistical minimax rates for the problem of estimating a probability distribution under Wasserstein loss, using only metric properties, such as covering and packing numbers, of the sample space, and weak moment assumptions on the probability distributions.
Tasks
Published 2018-02-24
URL https://arxiv.org/abs/1802.08855v3
PDF https://arxiv.org/pdf/1802.08855v3.pdf
PWC https://paperswithcode.com/paper/minimax-distribution-estimation-in
Repo
Framework
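
For intuition about the Wasserstein loss mentioned above, the one-dimensional case has a simple closed form for two samples of equal size: sort both samples and average the p-th powers of the pairwise gaps. The sketch below covers only this special case on the real line, not the general metric spaces treated in the paper. The function name is hypothetical.

```python
def wasserstein_1d(xs, ys, p=1):
    """Empirical p-Wasserstein distance between two equal-size samples on the real line."""
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes samples of equal size")
    xs, ys = sorted(xs), sorted(ys)           # the optimal coupling in 1D matches order statistics
    cost = sum(abs(a - b) ** p for a, b in zip(xs, ys)) / len(xs)
    return cost ** (1.0 / p)

# Shifting every point by 1 moves the distribution by exactly 1.
shifted = wasserstein_1d([0.0, 1.0, 2.0], [3.0, 1.0, 2.0])
```

Unlike divergences that compare densities pointwise, this distance reflects how far mass has to move, which is why it depends on the metric structure of the sample space.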

Binarizer at SemEval-2018 Task 3: Parsing dependency and deep learning for irony detection

Title Binarizer at SemEval-2018 Task 3: Parsing dependency and deep learning for irony detection
Authors Nishant Nikhil, Muktabh Mayank Srivastava
Abstract In this paper, we describe the system submitted for the SemEval 2018 Task 3 (Irony detection in English tweets) Subtask A by the team Binarizer. Irony detection is a key task for many natural language processing applications. Our method treats ironic tweets as consisting of smaller parts containing different emotions. We break down tweets into separate phrases using a dependency parser. We then embed those phrases using an LSTM-based neural network model which is pre-trained to predict emoticons for tweets. Finally, we train a fully-connected network to achieve classification.
Tasks
Published 2018-05-03
URL http://arxiv.org/abs/1805.01112v1
PDF http://arxiv.org/pdf/1805.01112v1.pdf
PWC https://paperswithcode.com/paper/binarizer-at-semeval-2018-task-3-parsing
Repo
Framework

A Century Long Commitment to Assessing Artificial Intelligence and its Impact on Society

Title A Century Long Commitment to Assessing Artificial Intelligence and its Impact on Society
Authors Barbara J. Grosz, Peter Stone
Abstract In September 2016, Stanford’s “One Hundred Year Study on Artificial Intelligence” project (AI100) issued the first report of its planned long-term periodic assessment of artificial intelligence (AI) and its impact on society. The report, entitled “Artificial Intelligence and Life in 2030,” examines eight domains of typical urban settings on which AI is likely to have impact over the coming years: transportation, home and service robots, healthcare, education, public safety and security, low-resource communities, employment and workplace, and entertainment. It aims to provide the general public with a scientifically and technologically accurate portrayal of the current state of AI and its potential and to help guide decisions in industry and governments, as well as to inform research and development in the field. This article by the chair of the 2016 Study Panel and the inaugural chair of the AI100 Standing Committee describes the origins of this ambitious longitudinal study, discusses the framing of the inaugural report, and presents the report’s main findings. It concludes with a brief description of the AI100 project’s ongoing efforts and planned next steps.
Tasks
Published 2018-08-23
URL http://arxiv.org/abs/1808.07899v1
PDF http://arxiv.org/pdf/1808.07899v1.pdf
PWC https://paperswithcode.com/paper/a-century-long-commitment-to-assessing
Repo
Framework

Structured Weight Matrices-Based Hardware Accelerators in Deep Neural Networks: FPGAs and ASICs

Title Structured Weight Matrices-Based Hardware Accelerators in Deep Neural Networks: FPGAs and ASICs
Authors Caiwen Ding, Ao Ren, Geng Yuan, Xiaolong Ma, Jiayu Li, Ning Liu, Bo Yuan, Yanzhi Wang
Abstract Both industry and academia have extensively investigated hardware accelerations. In this work, to address the increasing demands in computational capability and memory requirement, we propose structured weight matrices (SWM)-based compression techniques for both field programmable gate array (FPGA) and application-specific integrated circuit (ASIC) implementations. On the algorithm side, the SWM-based framework adopts block-circulant matrices to achieve a fine-grained tradeoff between accuracy and compression ratio. The SWM-based technique can reduce computational complexity from O($n^2$) to O($n\log n$) and storage complexity from O($n^2$) to O($n$) for each layer, in both the training and inference phases. For FPGA implementations on deep convolutional neural networks (DCNNs), we achieve at least 152X and 72X improvement in performance and energy efficiency, respectively, using the SWM-based framework, compared with the baseline of the IBM TrueNorth processor under the same accuracy constraints on the MNIST, SVHN, and CIFAR-10 data sets. For FPGA implementations on long short term memory (LSTM) networks, the proposed SWM-based LSTM can achieve up to 21X enhancement in performance and 33.5X gains in energy efficiency compared with the baseline accelerator. For ASIC implementations, the SWM-based ASIC design exhibits impressive advantages in terms of power, throughput, and energy efficiency. Experimental results indicate that this method is well suited for deploying DNNs on both FPGAs and mobile/IoT devices.
Tasks
Published 2018-03-28
URL http://arxiv.org/abs/1804.11239v1
PDF http://arxiv.org/pdf/1804.11239v1.pdf
PWC https://paperswithcode.com/paper/structured-weight-matrices-based-hardware
Repo
Framework
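
The complexity reduction quoted above rests on a standard fact: multiplying by a circulant matrix is a circular convolution, which the FFT computes in O(n log n) from the n entries of the first column. The sketch below shows this for a single circulant block with a small pure-Python radix-2 FFT (block length must be a power of two). It is an illustration of the principle, not the accelerator design from the paper.

```python
import cmath

def fft(a, invert=False):
    """Recursive radix-2 FFT; len(a) must be a power of two. No 1/n scaling."""
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2], invert)
    odd = fft(a[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def circulant_matvec_fft(c, x):
    """y = C x where C[i][j] = c[(i - j) % n]; only the first column c is stored."""
    n = len(c)
    spectrum = [a * b for a, b in zip(fft(c), fft(x))]
    return [v.real / n for v in fft(spectrum, invert=True)]

def circulant_matvec_direct(c, x):
    """Reference O(n^2) product against the explicit circulant matrix."""
    n = len(c)
    return [sum(c[(i - j) % n] * x[j] for j in range(n)) for i in range(n)]

c = [1.0, 2.0, 3.0, 4.0]                      # first column defines the whole 4x4 block
x = [1.0, 0.0, -1.0, 2.0]
y_fast = circulant_matvec_fft(c, x)
y_ref = circulant_matvec_direct(c, x)
```

A block-circulant weight matrix applies this per block, so storage per block drops from n^2 entries to n.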

Correlated discrete data generation using adversarial training

Title Correlated discrete data generation using adversarial training
Authors Shreyas Patel, Ashutosh Kakadiya, Maitrey Mehta, Raj Derasari, Rahul Patel, Ratnik Gandhi
Abstract Generative Adversarial Networks (GANs) have shown great promise in tasks like synthetic image generation, image inpainting, style transfer, and anomaly detection. However, generating discrete data is a challenge. This work presents an adversarial-training-based correlated discrete data (CDD) generation model. It also details an approach for conditional CDD generation. The results of our approach are presented over two datasets: a job-seeking candidates’ skill set (private dataset) and MNIST (public dataset). From quantitative and qualitative analysis of these results, we show that our model, because it leverages inherent correlation in the data, performs better than an existing model that overlooks correlation.
Tasks Anomaly Detection, Image Generation, Image Inpainting, Style Transfer
Published 2018-04-03
URL http://arxiv.org/abs/1804.00925v1
PDF http://arxiv.org/pdf/1804.00925v1.pdf
PWC https://paperswithcode.com/paper/correlated-discrete-data-generation-using
Repo
Framework

Incremental Learning Framework Using Cloud Computing

Title Incremental Learning Framework Using Cloud Computing
Authors Kumarjit Pathak, Prabhukiran G, Jitin Kapila, Nikit Gawande
Abstract A high volume of data can be perceived as either a challenge or an opportunity. Deep learning architectures demand a high volume of data to back-propagate effectively and train the weights without bias. At the same time, a large volume of data demands a higher-capacity machine on which training can be executed seamlessly. Budding data scientists, along with many research professionals, face frequent disconnection issues with cloud computing frameworks (working without a dedicated connection) due to free subscriptions to the platform. Similar issues also arise while working on a local computer, where the computer may sometimes run out of resources or power and the researcher has to start training the models all over again. In this paper, we intend to provide a way to resolve this issue and to progressively train the neural network even after frequent disconnections or resource outages, without losing much of the progress.
Tasks
Published 2018-05-12
URL http://arxiv.org/abs/1805.04754v1
PDF http://arxiv.org/pdf/1805.04754v1.pdf
PWC https://paperswithcode.com/paper/incremental-learning-framework-using-cloud
Repo
Framework
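
The resume-after-disconnection idea above is commonly realized with periodic checkpoints. The sketch below is a generic illustration under stated assumptions, not the framework from the paper: the "training" step is a stand-in that increments a weight, the checkpoint is a JSON file, and `stop_after` simulates a disconnection. All names are hypothetical.

```python
import json
import os
import tempfile

def save_checkpoint(path, state):
    """Write to a temporary file first, then rename, so a crash never leaves a half-written file."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(state, f)
    os.replace(tmp, path)

def load_checkpoint(path):
    """Return the saved state, or a fresh one if no checkpoint exists yet."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"epoch": 0, "weights": [0.0]}

def train(path, total_epochs, stop_after=None):
    """Resume from the last checkpoint; stop_after simulates a disconnection."""
    state = load_checkpoint(path)
    ran = 0
    while state["epoch"] < total_epochs:
        if stop_after is not None and ran >= stop_after:
            break                              # simulated outage
        state["weights"] = [w + 1.0 for w in state["weights"]]  # stand-in for one epoch
        state["epoch"] += 1
        save_checkpoint(path, state)
        ran += 1
    return state

ckpt = os.path.join(tempfile.mkdtemp(), "ckpt.json")
first = train(ckpt, total_epochs=5, stop_after=2)   # interrupted after 2 epochs
second = train(ckpt, total_epochs=5)                # picks up at epoch 2
```

At most one epoch of work is lost per interruption, at the cost of one file write per epoch; a real setup would tune how often to checkpoint.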