October 19, 2019

3091 words 15 mins read

Paper Group ANR 271

Generalizable Adversarial Examples Detection Based on Bi-model Decision Mismatch

Title Generalizable Adversarial Examples Detection Based on Bi-model Decision Mismatch
Authors João Monteiro, Isabela Albuquerque, Zahid Akhtar, Tiago H. Falk
Abstract Modern applications of artificial neural networks have yielded remarkable performance gains in a wide range of tasks. However, recent studies have discovered that such a modelling strategy is vulnerable to Adversarial Examples, i.e. examples with subtle perturbations, often too small to be perceptible to humans, that can nevertheless easily fool neural networks. Defense techniques against adversarial examples have been proposed, but ensuring robust performance against varying or novel types of attacks remains an open problem. In this work, we focus on the detection setting, in which case attackers become identifiable while models remain vulnerable. In particular, we employ the decision layer of independently trained models as features for posterior detection. The proposed framework does not require any prior knowledge of adversarial example generation techniques, and can be directly employed along with unmodified off-the-shelf models. Experiments on the standard MNIST and CIFAR10 datasets deliver empirical evidence that this detection approach generalizes well not only across different adversarial example generation methods but also to quality-degradation attacks. Non-linear binary classifiers trained on top of our proposed features can achieve a high detection rate (>90%) on a set of white-box attacks and maintain such performance when tested against unseen attacks.
Tasks
Published 2018-02-21
URL http://arxiv.org/abs/1802.07770v3
PDF http://arxiv.org/pdf/1802.07770v3.pdf
PWC https://paperswithcode.com/paper/generalizable-adversarial-examples-detection
Repo
Framework
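
The detection recipe is straightforward to prototype: stack the decision-layer outputs of two independently trained models into one feature vector and fit a non-linear binary classifier on top. A minimal sketch of that idea follows; `model_a.logits` and `model_b.logits` are hypothetical hooks for however your framework exposes pre-softmax outputs, and the scikit-learn MLP stands in for the paper's non-linear detector.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

def mismatch_features(model_a, model_b, x):
    """Concatenate the decision-layer outputs (logits) of two independently
    trained classifiers into a single feature vector per input."""
    return np.concatenate([model_a.logits(x), model_b.logits(x)], axis=-1)

def train_detector(X_clean, X_adv):
    """Fit a non-linear binary detector on bi-model features computed from
    clean inputs (label 0) and adversarial inputs (label 1)."""
    X = np.vstack([X_clean, X_adv])
    y = np.concatenate([np.zeros(len(X_clean)), np.ones(len(X_adv))])
    detector = MLPClassifier(hidden_layer_sizes=(64,), max_iter=500)
    detector.fit(X, y)
    return detector
```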

Data-driven Blockbuster Planning on Online Movie Knowledge Library

Title Data-driven Blockbuster Planning on Online Movie Knowledge Library
Authors Ye Liu, Jiawei Zhang, Chenwei Zhang, Philip S. Yu
Abstract In the era of big data, logistic planning can be made data-driven to take advantage of accumulated knowledge from the past. In the movie industry, movie planning can likewise exploit existing online movie knowledge libraries to achieve better results. However, it is ineffective to rely solely on conventional heuristics for movie planning, due to the large number of existing movies and the various real-world factors that contribute to each movie’s success, such as the movie genre, available budget, and production team (involving actors, actresses, directors, and writers). In this paper, we study a “Blockbuster Planning” (BP) problem to learn from previous movies and plan new low-budget yet high-return movies in a totally data-driven fashion. After a thorough investigation of an online movie knowledge library, a novel movie planning framework, “Blockbuster Planning with Maximized Movie Configuration Acquaintance” (BigMovie), is introduced in this paper. From the investment perspective, BigMovie maximizes the estimated gross of the planned movies with a given budget. It is able to accurately estimate movie gross with a 0.26 mean absolute percentage error (and 0.16 for budget). Meanwhile, from the production team’s perspective, BigMovie is able to formulate an optimized team with people and movie genres that team members are acquainted with. Historical collaboration records are utilized to estimate acquaintance scores of movie configuration factors via an acquaintance tensor. We formulate the BP problem as a non-linear binary programming problem and prove its NP-hardness. To solve it in polynomial time, BigMovie relaxes the hard binary constraints and addresses the BP problem as a cubic programming problem. Extensive experiments conducted on the IMDB movie database demonstrate the capability of BigMovie for effective data-driven blockbuster planning.
Tasks
Published 2018-10-24
URL http://arxiv.org/abs/1810.10175v1
PDF http://arxiv.org/pdf/1810.10175v1.pdf
PWC https://paperswithcode.com/paper/data-driven-blockbuster-planning-on-online
Repo
Framework
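
The key computational move is relaxing the hard binary constraints to the box $[0,1]$ so the plan can be optimized in polynomial time. A rough sketch of that relaxation with SciPy is below; `gross` and `cost` are hypothetical stand-ins for the paper's learned gross estimator and per-factor prices, and recovering a discrete team from the fractional solution (e.g. by thresholding) is left out.

```python
import numpy as np
from scipy.optimize import minimize

def plan_movie(gross, cost, budget, n_factors):
    """Maximize estimated gross over a relaxed configuration vector x in
    [0,1]^n, subject to the budget constraint cost @ x <= budget."""
    x0 = np.full(n_factors, 0.5)
    res = minimize(lambda x: -gross(x), x0,
                   bounds=[(0.0, 1.0)] * n_factors,
                   constraints=[{"type": "ineq",
                                 "fun": lambda x: budget - cost @ x}])
    return res.x  # fractional plan; threshold to recover a concrete team
```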

Fast determinantal point processes via distortion-free intermediate sampling

Title Fast determinantal point processes via distortion-free intermediate sampling
Authors Michał Dereziński
Abstract Given a fixed $n\times d$ matrix $\mathbf{X}$, where $n\gg d$, we study the complexity of sampling from a distribution over all subsets of rows where the probability of a subset is proportional to the squared volume of the parallelepiped spanned by the rows (a.k.a. a determinantal point process). In this task, it is important to minimize the preprocessing cost of the procedure (performed once) as well as the sampling cost (performed repeatedly). To that end, we propose a new determinantal point process algorithm which has the following two properties, both of which are novel: (1) a preprocessing step which runs in time $O(\text{number-of-non-zeros}(\mathbf{X})\cdot\log n)+\text{poly}(d)$, and (2) a sampling step which runs in $\text{poly}(d)$ time, independent of the number of rows $n$. We achieve this by introducing a new regularized determinantal point process (R-DPP), which serves as an intermediate distribution in the sampling procedure by reducing the number of rows from $n$ to $\text{poly}(d)$. Crucially, this intermediate distribution does not distort the probabilities of the target sample. Our key novelty in defining the R-DPP is the use of a Poisson random variable for controlling the probabilities of different subset sizes, leading to new determinantal formulas such as the normalization constant for this distribution. Our algorithm has applications in many diverse areas where determinantal point processes have been used, such as machine learning, stochastic optimization, data summarization and low-rank matrix reconstruction.
Tasks Data Summarization, Point Processes, Stochastic Optimization
Published 2018-11-08
URL http://arxiv.org/abs/1811.03717v2
PDF http://arxiv.org/pdf/1811.03717v2.pdf
PWC https://paperswithcode.com/paper/fast-determinantal-point-processes-via
Repo
Framework
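
The target distribution is easy to state even though sampling from it efficiently is the hard part: the probability of a row subset is proportional to the squared volume spanned by those rows. The sketch below computes that unnormalized weight; it illustrates the distribution being sampled, not the paper's R-DPP procedure.

```python
import numpy as np

def volume_weight(X, S):
    """Unnormalized DPP probability of row subset S of X: the squared
    volume of the parallelepiped spanned by the rows, det(X_S X_S^T)."""
    XS = X[list(S)]
    return np.linalg.det(XS @ XS.T)

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 4))          # n >> d, as in the paper's setting
print(volume_weight(X, [3, 17, 256]))
```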

Quantization Error as a Metric for Dynamic Precision Scaling in Neural Net Training

Title Quantization Error as a Metric for Dynamic Precision Scaling in Neural Net Training
Authors Ian Taras, Dylan Malone Stuart
Abstract Recent work has explored reduced numerical precision for parameters, activations, and gradients during neural network training as a way to reduce the computational cost of training (Na & Mukhopadhyay, 2016; Courbariaux et al., 2014). We present a novel dynamic precision scaling (DPS) scheme. Using stochastic fixed-point rounding, a quantization-error based scaling scheme, and dynamic bit-widths during training, we achieve 98.8% test accuracy on the MNIST dataset using an average bit-width of just 16 bits for weights and 14 bits for activations, compared to the standard 32-bit floating point values used in deep learning frameworks.
Tasks Quantization
Published 2018-01-25
URL http://arxiv.org/abs/1801.08621v2
PDF http://arxiv.org/pdf/1801.08621v2.pdf
PWC https://paperswithcode.com/paper/quantization-error-as-a-metric-for-dynamic
Repo
Framework
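
Stochastic fixed-point rounding, one ingredient of the scheme, is compact enough to sketch: rounding up with probability equal to the fractional remainder makes the quantizer unbiased in expectation, and the gap between a value and its quantized version is exactly the quantization error used as the scaling metric. A minimal sketch, not the authors' implementation:

```python
import numpy as np

def stochastic_fixed_point(x, frac_bits):
    """Stochastically round x onto a fixed-point grid with `frac_bits`
    fractional bits; unbiased in expectation."""
    scale = 2.0 ** frac_bits
    scaled = x * scale
    floor = np.floor(scaled)
    round_up = np.random.random(np.shape(x)) < (scaled - floor)
    return (floor + round_up) / scale

w = np.array([0.1234, -0.5678])
q = stochastic_fixed_point(w, frac_bits=8)
print(q, np.abs(w - q))                 # quantized values and their error
```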

Multi-Task Determinantal Point Processes for Recommendation

Title Multi-Task Determinantal Point Processes for Recommendation
Authors Romain Warlop, Jérémie Mary, Mike Gartrell
Abstract Determinantal point processes (DPPs) have received significant attention in recent years as an elegant model for a variety of machine learning tasks, due to their ability to model set diversity together with item quality or popularity. Recent work has shown that DPPs can be effective models for product recommendation and basket completion tasks. We present an enhanced DPP model that is specialized for the task of basket completion, the multi-task DPP. We view the basket completion problem as a multi-class classification problem, and leverage ideas from tensor factorization and multi-class classification to design the multi-task DPP model. We evaluate our model on several real-world datasets, and find that the multi-task DPP provides significantly better predictive quality than a number of state-of-the-art models.
Tasks Point Processes, Product Recommendation
Published 2018-05-24
URL http://arxiv.org/abs/1805.09916v2
PDF http://arxiv.org/pdf/1805.09916v2.pdf
PWC https://paperswithcode.com/paper/multi-task-determinantal-point-processes-for
Repo
Framework
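
For intuition on why DPPs fit basket completion: conditioning a DPP on the current basket scores each candidate item by a determinant ratio that trades off item quality against redundancy with items already in the basket. The sketch below is this generic DPP scoring rule, not the paper's multi-task extension; `L` is an item kernel assumed to have been learned elsewhere.

```python
import numpy as np

def completion_scores(L, basket, candidates):
    """Score candidates for basket completion under a DPP with kernel L:
    det(L over basket + item) / det(L over basket)."""
    base = np.linalg.det(L[np.ix_(basket, basket)])
    scores = {}
    for i in candidates:
        idx = basket + [i]
        scores[i] = np.linalg.det(L[np.ix_(idx, idx)]) / base
    return scores
```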

Using Large Ensembles of Control Variates for Variational Inference

Title Using Large Ensembles of Control Variates for Variational Inference
Authors Tomas Geffner, Justin Domke
Abstract Variational inference is increasingly being addressed with stochastic optimization. In this setting, the gradient’s variance plays a crucial role in the optimization procedure, since high-variance gradients lead to poor convergence. A popular approach to reducing the gradient’s variance is the use of control variates. Despite the good results obtained, control variates developed for variational inference are typically looked at in isolation. In this paper we clarify the large number of control variates that are available by giving a systematic view of how they are derived. We also present a Bayesian risk minimization framework in which the quality of a procedure for combining control variates is quantified by its effect on optimization convergence rates, which leads to a very simple combination rule. Results show that combining a large number of control variates this way significantly improves the convergence of inference over using typical gradient estimators or a reduced number of control variates.
Tasks Stochastic Optimization
Published 2018-10-30
URL http://arxiv.org/abs/1810.12482v1
PDF http://arxiv.org/pdf/1810.12482v1.pdf
PWC https://paperswithcode.com/paper/using-large-ensembles-of-control-variates-for
Repo
Framework
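
A common way to combine many control variates is to regress the noisy gradient samples on the centred control variates and subtract the fitted part. The sketch below shows that generic regression-based combination; it is not necessarily the paper's Bayesian risk minimization rule.

```python
import numpy as np

def cv_gradient(g, h, h_mean):
    """g: (S, D) gradient samples; h: (S, K) control variate evaluations
    with known mean h_mean. Returns variance-reduced gradient samples
    (average them afterwards for the final estimate)."""
    hc = h - h_mean                             # centred control variates
    C, *_ = np.linalg.lstsq(hc, g, rcond=None)  # variance-minimizing weights
    return g - hc @ C
```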

Causal Inference in Nonverbal Dyadic Communication with Relevant Interval Selection and Granger Causality

Title Causal Inference in Nonverbal Dyadic Communication with Relevant Interval Selection and Granger Causality
Authors Lea Müller, Maha Shadaydeh, Martin Thümmel, Thomas Kessler, Dana Schneider, Joachim Denzler
Abstract Human nonverbal emotional communication in dyadic dialogs is a process of mutual influence and adaptation. Identifying the direction of influence, or cause-effect relation, between participants is a challenging task, due to two main obstacles. First, distinct emotions might not be clearly visible. Second, the participants’ cause-effect relation is transient and varies over time. In this paper, we address these difficulties by using facial expressions that can be present even when strong distinct facial emotions are not visible. We also propose to apply a relevant interval selection approach prior to causal inference to identify those transient intervals where the adaptation process occurs. To identify the direction of influence, we apply the concept of Granger causality to the time series of facial expressions on the set of relevant intervals. We tested our approach on synthetic data and then applied it to newly obtained experimental data. Here, we were able to show that a more sensitive facial expression detection algorithm combined with a relevant interval detection approach is most promising for revealing the cause-effect pattern of dyadic communication in various instructed interaction conditions.
Tasks Causal Inference, Time Series
Published 2018-10-29
URL http://arxiv.org/abs/1810.12171v1
PDF http://arxiv.org/pdf/1810.12171v1.pdf
PWC https://paperswithcode.com/paper/causal-inference-in-nonverbal-dyadic
Repo
Framework
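
Once the relevant intervals are selected, the causal test itself is classical Granger causality, available off the shelf. A sketch with statsmodels on synthetic series (hypothetical data; the real pipeline would feed per-interval facial expression intensities):

```python
import numpy as np
from statsmodels.tsa.stattools import grangercausalitytests

rng = np.random.default_rng(0)
expr_b = rng.normal(size=200)                              # participant B
expr_a = np.roll(expr_b, 2) + 0.1 * rng.normal(size=200)   # A follows B by 2 steps

# Test whether the second column (expr_b) Granger-causes the first (expr_a).
data = np.column_stack([expr_a, expr_b])
results = grangercausalitytests(data, maxlag=4)
f_stat, p_value = results[2][0]["ssr_ftest"][:2]
print(f"lag 2: F={f_stat:.1f}, p={p_value:.4f}")
```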

Distributional Term Set Expansion

Title Distributional Term Set Expansion
Authors Amaru Cuba Gyllensten, Magnus Sahlgren
Abstract This paper is a short empirical study of the performance of centrality- and classification-based iterative term set expansion methods for distributional semantic models. Iterative term set expansion is an interactive process using distributional semantic models in which a user labels terms as belonging to some sought-after term set, and a system uses this labeling to supply the user with new candidate terms to label, trying to maximize the number of positive examples found. While centrality-based methods have a long history in term set expansion, we compare them to classification methods based on the Simple Margin method, an Active Learning approach to classification using Support Vector Machines. Examining the performance of various centrality- and classification-based methods for a variety of distributional models over five different term sets, we show that active learning based methods consistently outperform centrality-based methods.
Tasks Active Learning
Published 2018-02-14
URL http://arxiv.org/abs/1802.05014v1
PDF http://arxiv.org/pdf/1802.05014v1.pdf
PWC https://paperswithcode.com/paper/distributional-term-set-expansion
Repo
Framework
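
The Simple Margin strategy picks the unlabeled terms whose embeddings lie closest to the SVM decision boundary, since those labels are the most informative. A minimal sketch, assuming term embeddings from some distributional model:

```python
import numpy as np
from sklearn.svm import SVC

def simple_margin_query(clf, X_unlabeled, n_queries=5):
    """Return indices of the unlabeled terms closest to the decision
    boundary (smallest absolute decision-function value)."""
    margins = np.abs(clf.decision_function(X_unlabeled))
    return np.argsort(margins)[:n_queries]

# Hypothetical loop: X holds term embeddings, y the user's labels so far.
# clf = SVC(kernel="linear").fit(X[labeled_idx], y[labeled_idx])
# ask_user_to_label(simple_margin_query(clf, X[unlabeled_idx]))
```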

Active Mini-Batch Sampling using Repulsive Point Processes

Title Active Mini-Batch Sampling using Repulsive Point Processes
Authors Cheng Zhang, Cengiz Öztireli, Stephan Mandt, Giampiero Salvi
Abstract The convergence speed of stochastic gradient descent (SGD) can be improved by actively selecting mini-batches. We explore sampling schemes where similar data points are less likely to be selected in the same mini-batch. In particular, we prove that such repulsive sampling schemes lower the variance of the gradient estimator. This generalizes recent work on using Determinantal Point Processes (DPPs) for mini-batch diversification (Zhang et al., 2017) to the broader class of repulsive point processes. We first show that the phenomenon of variance reduction by diversified sampling generalizes in particular to non-stationary point processes. We then show that other point processes may be computationally much more efficient than DPPs. In particular, we propose and investigate Poisson Disk sampling—frequently encountered in the computer graphics community—for this task. We show empirically that our approach improves over standard SGD both in terms of convergence speed and final model performance.
Tasks Point Processes
Published 2018-04-08
URL http://arxiv.org/abs/1804.02772v2
PDF http://arxiv.org/pdf/1804.02772v2.pdf
PWC https://paperswithcode.com/paper/active-mini-batch-sampling-using-repulsive
Repo
Framework
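
Poisson disk sampling enforces a minimum distance between the points of a mini-batch, which is precisely what makes similar data points unlikely to co-occur. A naive dart-throwing sketch follows (the paper uses more efficient constructions, and the loop can stall if `radius` is set too large for the data):

```python
import numpy as np

def poisson_disk_minibatch(X, batch_size, radius, rng):
    """Sample indices for a mini-batch, accepting a candidate only if it is
    at least `radius` away from every point already accepted."""
    batch = []
    while len(batch) < batch_size:
        i = rng.integers(len(X))
        if all(np.linalg.norm(X[i] - X[j]) >= radius for j in batch):
            batch.append(i)
    return batch

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 10))
print(poisson_disk_minibatch(X, batch_size=8, radius=3.0, rng=rng))
```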

Sampling Can Be Faster Than Optimization

Title Sampling Can Be Faster Than Optimization
Authors Yi-An Ma, Yuansi Chen, Chi Jin, Nicolas Flammarion, Michael I. Jordan
Abstract Optimization algorithms and Monte Carlo sampling algorithms have provided the computational foundations for the rapid growth in applications of statistical machine learning in recent years. There is, however, limited theoretical understanding of the relationships between these two kinds of methodology, and limited understanding of relative strengths and weaknesses. Moreover, existing results have been obtained primarily in the setting of convex functions (for optimization) and log-concave functions (for sampling). In this setting, where local properties determine global properties, optimization algorithms are unsurprisingly more efficient computationally than sampling algorithms. We instead examine a class of nonconvex objective functions that arise in mixture modeling and multi-stable systems. In this nonconvex setting, we find that the computational complexity of sampling algorithms scales linearly with the model dimension while that of optimization algorithms scales exponentially.
Tasks
Published 2018-11-20
URL https://arxiv.org/abs/1811.08413v2
PDF https://arxiv.org/pdf/1811.08413v2.pdf
PWC https://paperswithcode.com/paper/sampling-can-be-faster-than-optimization
Repo
Framework
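
For concreteness, the kind of sampler such results concern is Langevin dynamics: gradient ascent on the log-density plus injected Gaussian noise, which lets the chain hop between the modes of a multi-modal landscape where plain gradient descent gets stuck. A one-step sketch of the unadjusted Langevin algorithm:

```python
import numpy as np

def langevin_step(x, grad_log_p, step, rng):
    """One unadjusted Langevin update: follow the gradient of log p and add
    noise scaled so that the chain targets (approximately) p itself."""
    return x + step * grad_log_p(x) + np.sqrt(2.0 * step) * rng.normal(size=x.shape)
```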

Towards Gradient Free and Projection Free Stochastic Optimization

Title Towards Gradient Free and Projection Free Stochastic Optimization
Authors Anit Kumar Sahu, Manzil Zaheer, Soummya Kar
Abstract This paper focuses on the problem of constrained stochastic optimization. A zeroth order Frank-Wolfe algorithm is proposed, which in addition to the projection-free nature of the vanilla Frank-Wolfe algorithm makes it gradient free. Under convexity and smoothness assumptions, we show that the proposed algorithm converges to the optimal objective value at a rate $O\left(1/T^{1/3}\right)$, where $T$ denotes the iteration count. In particular, the primal sub-optimality gap is shown to have a dimension dependence of $O\left(d^{1/3}\right)$, which is the best known dimension dependence among all zeroth order optimization algorithms with one directional derivative per iteration. For non-convex functions, we obtain the Frank-Wolfe gap to be $O\left(d^{1/3}T^{-1/4}\right)$. Experiments on black-box optimization setups demonstrate the efficacy of the proposed algorithm.
Tasks Stochastic Optimization
Published 2018-10-08
URL http://arxiv.org/abs/1810.03233v3
PDF http://arxiv.org/pdf/1810.03233v3.pdf
PWC https://paperswithcode.com/paper/towards-gradient-free-and-projection-free
Repo
Framework
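
The two ingredients can be sketched directly: a finite-difference estimate of a single directional derivative per iteration, and a Frank-Wolfe step whose linear subproblem over an $\ell_1$ ball has a closed-form vertex solution. The step sizes below follow the classic Frank-Wolfe schedule; the paper's exact smoothing and averaging choices may differ.

```python
import numpy as np

def zeroth_order_frank_wolfe(f, x, radius, T, delta=1e-4, rng=None):
    """Gradient- and projection-free optimization of f over the l1 ball of
    the given radius, using one directional derivative per iteration."""
    rng = rng or np.random.default_rng()
    d = x.size
    for t in range(1, T + 1):
        u = rng.normal(size=d)
        u /= np.linalg.norm(u)
        g = d * (f(x + delta * u) - f(x)) / delta * u   # gradient estimate
        s = np.zeros(d)                                  # linear minimizer:
        k = np.argmax(np.abs(g))                         # a signed vertex of
        s[k] = -radius * np.sign(g[k])                   # the l1 ball
        x = x + (2.0 / (t + 2)) * (s - x)                # Frank-Wolfe step
    return x

print(zeroth_order_frank_wolfe(lambda z: np.sum(z ** 2), np.full(5, 0.2), 1.0, 200))
```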

End-to-end Speech Recognition with Adaptive Computation Steps

Title End-to-end Speech Recognition with Adaptive Computation Steps
Authors Mohan Li, Min Liu, Masanori Hattori
Abstract In this paper, we present the Adaptive Computation Steps (ACS) algorithm, which enables end-to-end speech recognition models to dynamically decide how many frames should be processed to predict a linguistic output. A model that applies the ACS algorithm follows the encoder-decoder framework, but unlike attention-based models, it produces alignments independently at the encoder side using the correlation between adjacent frames. Thus, predictions can be made as soon as sufficient acoustic information is received, which makes the model applicable in online cases. Besides, a small change is made to the decoding stage of the encoder-decoder framework, which allows the prediction to exploit bidirectional contexts. We verify the ACS algorithm on the Mandarin speech corpus AIShell-1, where it achieves a 31.2% CER in the online setting, compared to the 32.4% CER of the attention-based model. To fully demonstrate the advantage of the ACS algorithm, offline experiments are conducted, in which our ACS model achieves an 18.7% CER, outperforming the attention-based counterpart with its CER of 22.0%.
Tasks End-To-End Speech Recognition, Speech Recognition
Published 2018-08-30
URL http://arxiv.org/abs/1808.10088v2
PDF http://arxiv.org/pdf/1808.10088v2.pdf
PWC https://paperswithcode.com/paper/end-to-end-speech-recognition-with-adaptive
Repo
Framework
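
The halting mechanics can be sketched independently of the full encoder-decoder: accumulate a per-frame halting probability and emit one linguistic output whenever the sum crosses a threshold, so a prediction fires as soon as enough acoustic evidence has arrived. The scalar probabilities here are hypothetical placeholders for the ACS alignment scores computed from adjacent-frame correlations.

```python
def adaptive_computation_steps(halting_probs, threshold=1.0):
    """Return the frame indices at which the accumulated halting probability
    crosses the threshold, i.e. where an output token would be predicted."""
    boundaries, acc = [], 0.0
    for t, p in enumerate(halting_probs):
        acc += p
        if acc >= threshold:
            boundaries.append(t)
            acc = 0.0
    return boundaries

print(adaptive_computation_steps([0.2, 0.3, 0.6, 0.1, 0.9, 0.4]))  # [2, 4]
```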

Convergence and Dynamical Behavior of the ADAM Algorithm for Non Convex Stochastic Optimization

Title Convergence and Dynamical Behavior of the ADAM Algorithm for Non Convex Stochastic Optimization
Authors Anas Barakat, Pascal Bianchi
Abstract ADAM is a popular variant of stochastic gradient descent for finding a local minimizer of a function. The objective function is unknown, but a random estimate of the current gradient vector is observed at each round of the algorithm. Assuming that the objective function is differentiable and non-convex, we establish that, in the long run, the iterates converge to a stationary point. The key ingredient is the introduction of a continuous-time version of ADAM, in the form of a non-autonomous ordinary differential equation. The existence and uniqueness of the solution are established, as well as the convergence of the solution towards the stationary points of the objective function. The continuous-time system is a relevant approximation of the ADAM iterates, in the sense that the interpolated ADAM process converges weakly to the solution of the ODE.
Tasks Stochastic Optimization
Published 2018-10-04
URL https://arxiv.org/abs/1810.02263v3
PDF https://arxiv.org/pdf/1810.02263v3.pdf
PWC https://paperswithcode.com/paper/convergence-of-the-adam-algorithm-from-a
Repo
Framework
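
For reference, the discrete update whose continuous-time limit the paper analyzes is the standard ADAM iteration with bias correction; a plain-NumPy sketch:

```python
import numpy as np

def adam_step(x, m, v, grad, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """One ADAM update (t starts at 1): exponential moving averages of the
    gradient and its square, bias-corrected, then a scaled step."""
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad ** 2
    m_hat = m / (1 - b1 ** t)
    v_hat = v / (1 - b2 ** t)
    return x - lr * m_hat / (np.sqrt(v_hat) + eps), m, v
```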

Neural Relation Extraction via Inner-Sentence Noise Reduction and Transfer Learning

Title Neural Relation Extraction via Inner-Sentence Noise Reduction and Transfer Learning
Authors Tianyi Liu, Xinsong Zhang, Wanhao Zhou, Weijia Jia
Abstract Extracting relations is critical for knowledge base completion and construction, where distantly supervised methods are widely used to extract relational facts automatically from existing knowledge bases. However, the automatically constructed datasets contain large numbers of low-quality sentences with noisy words, a problem neglected by current distantly supervised methods that results in unacceptable precision. To mitigate this problem, we propose a novel word-level distantly supervised approach for relation extraction. We first build a Sub-Tree Parse (STP) to remove noisy words that are irrelevant to relations. Then we construct a neural network that takes the sub-tree as input, applying entity-wise attention to identify the important semantic features of relational words in each instance. To make our model more robust against noisy words, we initialize our network with a priori knowledge learned from the related task of entity classification by transfer learning. We conduct extensive experiments using the New York Times (NYT) and Freebase corpora. Experiments show that our approach is effective and improves the area under the Precision/Recall (PR) curve from 0.35 to 0.39 over the state-of-the-art work.
Tasks Knowledge Base Completion, Relation Extraction, Relationship Extraction (Distant Supervised), Transfer Learning
Published 2018-08-21
URL http://arxiv.org/abs/1808.06738v2
PDF http://arxiv.org/pdf/1808.06738v2.pdf
PWC https://paperswithcode.com/paper/neural-relation-extraction-via-inner-sentence
Repo
Framework
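
The Sub-Tree Parse idea can be approximated with an off-the-shelf dependency parser: locate the lowest common ancestor of the two entity tokens and keep only its subtree, dropping words irrelevant to the relation. A rough sketch with spaCy (assumes the `en_core_web_sm` model is installed; the paper's exact STP construction may root the subtree differently):

```python
import spacy

nlp = spacy.load("en_core_web_sm")

def sub_tree_parse(sentence, e1, e2):
    """Return the tokens of the dependency subtree rooted at the lowest
    common ancestor of the two (single-token) entity mentions."""
    doc = nlp(sentence)
    t1 = next(t for t in doc if t.text == e1)
    t2 = next(t for t in doc if t.text == e2)
    t2_chain = [t2] + list(t2.ancestors)
    lca = next(a for a in [t1] + list(t1.ancestors) if a in t2_chain)
    return [t.text for t in lca.subtree]

print(sub_tree_parse("Obama was born in Hawaii in 1961.", "Obama", "Hawaii"))
```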

The RLLChatbot: a solution to the ConvAI challenge

Title The RLLChatbot: a solution to the ConvAI challenge
Authors Nicolas Gontier, Koustuv Sinha, Peter Henderson, Iulian Serban, Michael Noseworthy, Prasanna Parthasarathi, Joelle Pineau
Abstract Current conversational systems can follow simple commands and answer basic questions, but they have difficulty maintaining coherent and open-ended conversations about specific topics. Competitions like the Conversational Intelligence (ConvAI) challenge are being organized to push research development towards that goal. This article presents in detail the RLLChatbot that participated in the 2017 ConvAI challenge. The goal of this research is to better understand how current deep learning and reinforcement learning tools can be used to build a robust yet flexible open-domain conversational agent. We provide a thorough description of how a dialog system can be built and trained from mostly public-domain datasets using an ensemble model. The first contribution of this work is a detailed description and analysis of different text generation models, in addition to novel message ranking and selection methods. Moreover, a new open-source conversational dataset is presented. Training on this data significantly improves the Recall@k score of the ranking and selection mechanisms compared to our baseline model responsible for selecting the message returned at each interaction.
Tasks Text Generation
Published 2018-11-07
URL http://arxiv.org/abs/1811.02714v2
PDF http://arxiv.org/pdf/1811.02714v2.pdf
PWC https://paperswithcode.com/paper/the-rllchatbot-a-solution-to-the-convai
Repo
Framework
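
Since Recall@k is the headline metric for the ranking and selection mechanisms, it is worth pinning down; a minimal sketch with toy candidate lists:

```python
def recall_at_k(ranked_candidates, references, k):
    """Fraction of turns whose reference response appears among the top-k
    ranked candidate messages."""
    hits = sum(ref in ranked[:k]
               for ranked, ref in zip(ranked_candidates, references))
    return hits / len(references)

ranked = [["hi there", "good, you?", "bye"], ["lol", "yes", "no"]]
refs = ["good, you?", "no"]
print(recall_at_k(ranked, refs, k=2))   # 0.5
```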