Paper Group ANR 704
Towards Dialogue-based Navigation with Multivariate Adaptation driven by Intention and Politeness for Social Robots. Neural Network Compression using Transform Coding and Clustering. Asynchronous Stochastic Composition Optimization with Variance Reduction. Data-Dependent Coresets for Compressing Neural Networks with Applications to Generalization B …
Towards Dialogue-based Navigation with Multivariate Adaptation driven by Intention and Politeness for Social Robots
Title | Towards Dialogue-based Navigation with Multivariate Adaptation driven by Intention and Politeness for Social Robots |
Authors | Chandrakant Bothe, Fernando Garcia, Arturo Cruz Maya, Amit Kumar Pandey, Stefan Wermter |
Abstract | Service robots need to show appropriate social behaviour in order to be deployed in social environments such as healthcare, education, retail, etc. Some of the main capabilities that robots should have are navigation and conversational skills. If the person is impatient, the person might want a robot to navigate faster and vice versa. Linguistic features that indicate politeness can provide social cues about a person’s patient and impatient behaviour. The novelty presented in this paper is to dynamically incorporate politeness in robotic dialogue systems for navigation. Understanding the politeness in users’ speech can be used to modulate the robot behaviour and responses. Therefore, we developed a dialogue system to navigate in an indoor environment, which produces different robot behaviours and responses based on users’ intention and degree of politeness. We deploy and test our system with the Pepper robot that adapts to the changes in user’s politeness. |
Tasks | |
Published | 2018-09-19 |
URL | http://arxiv.org/abs/1809.07269v2 |
http://arxiv.org/pdf/1809.07269v2.pdf | |
PWC | https://paperswithcode.com/paper/towards-dialogue-based-navigation-with |
Repo | |
Framework | |
Neural Network Compression using Transform Coding and Clustering
Title | Neural Network Compression using Transform Coding and Clustering |
Authors | Thorsten Laude, Yannick Richter, Jörn Ostermann |
Abstract | With the deployment of neural networks on mobile devices and the necessity of transmitting neural networks over limited or expensive channels, the file size of the trained model was identified as a bottleneck. In this paper, we propose a codec for the compression of neural networks which is based on transform coding for convolutional and dense layers and on clustering for biases and normalizations. By using this codec, we achieve average compression factors between 7.9 and 9.3 while the accuracy of the compressed networks for image classification decreases only by 1%-2%, respectively. |
Tasks | Image Classification, Neural Network Compression |
Published | 2018-05-18 |
URL | http://arxiv.org/abs/1805.07258v1 |
http://arxiv.org/pdf/1805.07258v1.pdf | |
PWC | https://paperswithcode.com/paper/neural-network-compression-using-transform |
Repo | |
Framework | |
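The clustering step that the abstract above mentions for biases and normalizations can be illustrated with a small one-dimensional k-means quantizer. This is only a sketch under stated assumptions: the function name `kmeans_1d`, the evenly spaced initialisation, and the toy values are hypothetical and are not taken from the paper's codec.

```python
def kmeans_1d(values, k, iters=20):
    """Cluster scalar values into k groups with Lloyd's algorithm.

    Returns (centers, labels); each value can then be stored as a small
    integer label plus a shared table of k centers.
    """
    lo, hi = min(values), max(values)
    # Assumption: initialise the centers evenly across the value range.
    centers = [lo + (hi - lo) * (i + 0.5) / k for i in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centers[i]))
            groups[nearest].append(v)
        # Keep the old center when a cluster receives no values.
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    labels = [min(range(k), key=lambda i: abs(v - centers[i])) for v in values]
    return centers, labels


# Hypothetical bias values, for illustration only.
biases = [0.0, 0.1, 0.9, 1.0]
centers, labels = kmeans_1d(biases, k=2)
```

With the toy values, the four biases collapse onto two shared centers, so each bias needs only a one-bit label.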
Asynchronous Stochastic Composition Optimization with Variance Reduction
Title | Asynchronous Stochastic Composition Optimization with Variance Reduction |
Authors | Shuheng Shen, Linli Xu, Jingchang Liu, Junliang Guo, Qing Ling |
Abstract | Composition optimization has drawn a lot of attention in a wide variety of machine learning domains from risk management to reinforcement learning. Existing methods solving the composition optimization problem often work in a sequential and single-machine manner, which limits their applications in large-scale problems. To address this issue, this paper proposes two asynchronous parallel variance reduced stochastic compositional gradient (AsyVRSC) algorithms that are suitable to handle large-scale data sets. The two algorithms are AsyVRSC-Shared for the shared-memory architecture and AsyVRSC-Distributed for the master-worker architecture. The embedded variance reduction techniques enable the algorithms to achieve linear convergence rates. Furthermore, AsyVRSC-Shared and AsyVRSC-Distributed enjoy provable linear speedup, when the time delays are bounded by the data dimensionality or the sparsity ratio of the partial gradients, respectively. Extensive experiments are conducted to verify the effectiveness of the proposed algorithms. |
Tasks | |
Published | 2018-11-15 |
URL | http://arxiv.org/abs/1811.06396v1 |
http://arxiv.org/pdf/1811.06396v1.pdf | |
PWC | https://paperswithcode.com/paper/asynchronous-stochastic-composition |
Repo | |
Framework | |
Data-Dependent Coresets for Compressing Neural Networks with Applications to Generalization Bounds
Title | Data-Dependent Coresets for Compressing Neural Networks with Applications to Generalization Bounds |
Authors | Cenk Baykal, Lucas Liebenwein, Igor Gilitschenski, Dan Feldman, Daniela Rus |
Abstract | We present an efficient coresets-based neural network compression algorithm that sparsifies the parameters of a trained fully-connected neural network in a manner that provably approximates the network’s output. Our approach is based on an importance sampling scheme that judiciously defines a sampling distribution over the neural network parameters, and as a result, retains parameters of high importance while discarding redundant ones. We leverage a novel, empirical notion of sensitivity and extend traditional coreset constructions to the application of compressing parameters. Our theoretical analysis establishes guarantees on the size and accuracy of the resulting compressed network and gives rise to generalization bounds that may provide new insights into the generalization properties of neural networks. We demonstrate the practical effectiveness of our algorithm on a variety of neural network configurations and real-world data sets. |
Tasks | Neural Network Compression |
Published | 2018-04-15 |
URL | https://arxiv.org/abs/1804.05345v6 |
https://arxiv.org/pdf/1804.05345v6.pdf | |
PWC | https://paperswithcode.com/paper/data-dependent-coresets-for-compressing |
Repo | |
Framework | |
Distribution-based Prediction of the Degree of Grammaticalization for German Prepositions
Title | Distribution-based Prediction of the Degree of Grammaticalization for German Prepositions |
Authors | Dominik Schlechtweg, Sabine Schulte im Walde |
Abstract | We test the hypothesis that the degree of grammaticalization of German prepositions correlates with their corpus-based contextual dispersion measured by word entropy. We find that there is indeed a moderate correlation for entropy, but a stronger correlation for frequency and number of context types. |
Tasks | |
Published | 2018-04-14 |
URL | http://arxiv.org/abs/1804.06719v1 |
http://arxiv.org/pdf/1804.06719v1.pdf | |
PWC | https://paperswithcode.com/paper/distribution-based-prediction-of-the-degree |
Repo | |
Framework | |
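As a rough illustration of the dispersion measure named in the abstract above, the sketch below computes the Shannon entropy of a word's empirical context distribution. The function name `context_entropy` and the toy context lists are assumptions made for this example; the paper's corpus, context definition, and preprocessing are not reproduced here.

```python
import math
from collections import Counter


def context_entropy(contexts):
    """Shannon entropy in bits of the empirical distribution of context words."""
    counts = Counter(contexts)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Hypothetical toy data: context words observed next to two prepositions.
narrow = ["Haus", "Haus", "Haus", "Haus"]   # always the same context
broad = ["Haus", "Zeit", "Grund", "Weg"]    # four equally frequent contexts
```

A word that always occurs in the same context has entropy 0, while a word spread evenly over four context types has entropy 2 bits, so a higher value indicates wider contextual dispersion.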
Training Convolutional Networks with Web Images
Title | Training Convolutional Networks with Web Images |
Authors | Nizar Massouh |
Abstract | In this thesis we investigate the effect of using web images to build a large scale database to be used alongside a deep learning method for a classification task. We replicate the ImageNet large scale database (ILSVRC-2012) from images collected from the web using 4 different download strategies varying the search engine, the query and the image resolution. As a deep learning method, we choose a Convolutional Neural Network that was very successful with recognition tasks: AlexNet. |
Tasks | |
Published | 2018-05-22 |
URL | http://arxiv.org/abs/1805.08416v1 |
http://arxiv.org/pdf/1805.08416v1.pdf | |
PWC | https://paperswithcode.com/paper/training-convolutional-networks-with-web |
Repo | |
Framework | |
High Dimensional Linear Regression using Lattice Basis Reduction
Title | High Dimensional Linear Regression using Lattice Basis Reduction |
Authors | David Gamarnik, Ilias Zadik |
Abstract | We consider a high dimensional linear regression problem where the goal is to efficiently recover an unknown vector $\beta^*$ from $n$ noisy linear observations $Y=X\beta^*+W \in \mathbb{R}^n$, for known $X \in \mathbb{R}^{n \times p}$ and unknown $W \in \mathbb{R}^n$. Unlike most of the literature on this model we make no sparsity assumption on $\beta^*$. Instead we adopt a regularization based on assuming that the underlying vectors $\beta^*$ have rational entries with the same denominator $Q \in \mathbb{Z}_{>0}$. We call this $Q$-rationality assumption. We propose a new polynomial-time algorithm for this task which is based on the seminal Lenstra-Lenstra-Lovasz (LLL) lattice basis reduction algorithm. We establish that under the $Q$-rationality assumption, our algorithm recovers exactly the vector $\beta^*$ for a large class of distributions for the iid entries of $X$ and non-zero noise $W$. We prove that it is successful under small noise, even when the learner has access to only one observation ($n=1$). Furthermore, we prove that in the case of the Gaussian white noise for $W$, $n=o\left(p/\log p\right)$ and $Q$ sufficiently large, our algorithm tolerates a nearly optimal information-theoretic level of the noise. |
Tasks | |
Published | 2018-03-18 |
URL | http://arxiv.org/abs/1803.06716v2 |
http://arxiv.org/pdf/1803.06716v2.pdf | |
PWC | https://paperswithcode.com/paper/high-dimensional-linear-regression-using |
Repo | |
Framework | |
Radiomic Synthesis Using Deep Convolutional Neural Networks
Title | Radiomic Synthesis Using Deep Convolutional Neural Networks |
Authors | Vishwa S. Parekh, Michael A. Jacobs |
Abstract | Radiomics is a rapidly growing field that deals with modeling the textural information present in the different tissues of interest for clinical decision support. However, the process of generating radiomic images is computationally very expensive and could take substantial time per radiological image for certain higher order features, such as the gray-level co-occurrence matrix (GLCM), even with high-end GPUs. To that end, we developed RadSynth, a deep convolutional neural network (CNN) model, to efficiently generate radiomic images. RadSynth was tested on a breast cancer patient cohort of twenty-four patients (ten benign, ten malignant and four normal) for computation of GLCM entropy images from post-contrast DCE-MRI. RadSynth produced excellent synthetic entropy images compared to traditional GLCM entropy images. The average percentage difference and correlation between the two techniques were 0.07 $\pm$ 0.06 and 0.97, respectively. In conclusion, RadSynth presents a new powerful tool for fast computation and visualization of the textural information present in the radiological images. |
Tasks | |
Published | 2018-10-25 |
URL | https://arxiv.org/abs/1810.11090v2 |
https://arxiv.org/pdf/1810.11090v2.pdf | |
PWC | https://paperswithcode.com/paper/radiomic-synthesis-using-deep-convolutional |
Repo | |
Framework | |
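For readers unfamiliar with the traditional feature that RadSynth is compared against, the sketch below computes the entropy of a gray-level co-occurrence matrix for one small patch. It is a minimal illustration, not the paper's pipeline: the function name `glcm_entropy`, the single non-symmetric offset, and the toy patches are assumptions, and a radiomic entropy image would repeat this computation in a sliding window around every pixel.

```python
import math
from collections import Counter


def glcm_entropy(image, dx=1, dy=0):
    """Entropy in bits of the normalised co-occurrence counts for offset (dx, dy).

    `image` is a list of rows of integer gray levels. The patch must contain
    at least one valid pixel pair for the chosen offset.
    """
    rows, cols = len(image), len(image[0])
    pairs = Counter()
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                # Count the gray-level pair (reference pixel, neighbour pixel).
                pairs[(image[r][c], image[r2][c2])] += 1
    total = sum(pairs.values())
    return -sum((n / total) * math.log2(n / total) for n in pairs.values())


# Hypothetical 2x2 patches, for illustration only.
flat = [[1, 1], [1, 1]]      # uniform texture
checker = [[0, 1], [1, 0]]   # alternating texture
```

The nested loop over every pixel, repeated per window, hints at why the abstract describes the traditional computation as expensive.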
Improving Grey-Box Fuzzing by Modeling Program Behavior
Title | Improving Grey-Box Fuzzing by Modeling Program Behavior |
Authors | Siddharth Karamcheti, Gideon Mann, David Rosenberg |
Abstract | Grey-box fuzzers such as American Fuzzy Lop (AFL) are popular tools for finding bugs and potential vulnerabilities in programs. While these fuzzers have been able to find vulnerabilities in many widely used programs, they are not efficient; of the millions of inputs executed by AFL in a typical fuzzing run, only a handful discover unseen behavior or trigger a crash. The remaining inputs are redundant, exhibiting behavior that has already been observed. Here, we present an approach to increase the efficiency of fuzzers like AFL by applying machine learning to directly model how programs behave. We learn a forward prediction model that maps program inputs to execution traces, training on the thousands of inputs collected during standard fuzzing. This learned model guides exploration by focusing on fuzzing inputs on which our model is the most uncertain (measured via the entropy of the predicted execution trace distribution). By focusing on executing inputs our learned model is unsure about, and ignoring any input whose behavior our model is certain about, we show that we can significantly limit wasteful execution. Through testing our approach on a set of binaries released as part of the DARPA Cyber Grand Challenge, we show that our approach is able to find a set of inputs that result in more code coverage and discovered crashes than baseline fuzzers with significantly fewer executions. |
Tasks | |
Published | 2018-11-21 |
URL | http://arxiv.org/abs/1811.08973v1 |
http://arxiv.org/pdf/1811.08973v1.pdf | |
PWC | https://paperswithcode.com/paper/improving-grey-box-fuzzing-by-modeling |
Repo | |
Framework | |
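The selection rule described in the abstract above, executing only the inputs on which the learned model is most uncertain, can be sketched as ranking candidates by the entropy of their predicted trace distribution. Everything concrete here is hypothetical: the function names, the candidate byte strings, and the probability vectors are invented for illustration, and the forward prediction model itself is not shown.

```python
import math


def entropy(probs):
    """Shannon entropy in bits of a discrete probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def select_uncertain(predictions, k):
    """Return the k candidate inputs whose predicted distribution has the highest entropy.

    `predictions` is a list of (input_bytes, probabilities) pairs, where the
    probabilities stand in for a model's predicted execution-trace distribution.
    """
    ranked = sorted(predictions, key=lambda item: entropy(item[1]), reverse=True)
    return [candidate for candidate, _ in ranked[:k]]


# Hypothetical candidates with made-up model outputs.
candidates = [
    (b"AAAA", [0.98, 0.01, 0.01]),    # model is nearly certain: skip
    (b"AB\x00", [0.4, 0.3, 0.3]),     # model is unsure: worth executing
    (b"\xff\xff", [0.7, 0.2, 0.1]),
]
to_execute = select_uncertain(candidates, k=2)
```

Inputs with a confidently predicted trace are dropped before execution, which is the source of the savings the abstract reports.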
Embedding Individual Table Columns for Resilient SQL Chatbots
Title | Embedding Individual Table Columns for Resilient SQL Chatbots |
Authors | Bojan Petrovski, Ignacio Aguado, Andreea Hossmann, Michael Baeriswyl, Claudiu Musat |
Abstract | Most of the world’s data is stored in relational databases. Accessing these requires specialized knowledge of the Structured Query Language (SQL), putting them out of the reach of many people. A recent research thread in Natural Language Processing (NLP) aims to alleviate this problem by automatically translating natural language questions into SQL queries. While the proposed solutions are a great start, they lack robustness and do not easily generalize: the methods require high quality descriptions of the database table columns, and the most widely used training dataset, WikiSQL, is heavily biased towards using those descriptions as part of the questions. In this work, we propose solutions to both problems: we entirely eliminate the need for column descriptions, by relying solely on their contents, and we augment the WikiSQL dataset by paraphrasing column names to reduce bias. We show that the accuracy of existing methods drops when trained on our augmented, column-agnostic dataset, and that our own method reaches state of the art accuracy, while relying on column contents only. |
Tasks | SQL Chatbots |
Published | 2018-11-01 |
URL | http://arxiv.org/abs/1811.00633v1 |
http://arxiv.org/pdf/1811.00633v1.pdf | |
PWC | https://paperswithcode.com/paper/embedding-individual-table-columns-for |
Repo | |
Framework | |
Magnitude Bounded Matrix Factorisation for Recommender Systems
Title | Magnitude Bounded Matrix Factorisation for Recommender Systems |
Authors | Shuai Jiang, Kan Li, Richard Yi Da Xu |
Abstract | Low rank matrix factorisation is often used in recommender systems as a way of extracting latent features. When dealing with large and sparse datasets, traditional recommendation algorithms face the problem of acquiring large, unrestrained, fluctuating values over predictions especially for users/items with very few corresponding observations. Although the problem has been somewhat solved by imposing bounding constraints over its objectives, and/or over all entries to be within a fixed range, in terms of gaining better recommendations, these approaches have two major shortcomings that we aim to mitigate in this work: one is they can only deal with one pair of fixed bounds for all entries, and the other one is they are very time-consuming when applied on large scale recommender systems. In this paper, we propose a novel algorithm named Magnitude Bounded Matrix Factorisation (MBMF), which allows different bounds for individual users/items and performs very fast on large scale datasets. The key idea of our algorithm is to construct a model by constraining the magnitudes of each individual user/item feature vector. We achieve this by converting from the Cartesian to Spherical coordinate system with radii set as the corresponding magnitudes, which allows the above constrained optimisation problem to become an unconstrained one. The Stochastic Gradient Descent (SGD) method is then applied to solve the unconstrained task efficiently. Experiments on synthetic and real datasets demonstrate that in most cases the proposed MBMF is superior over all existing algorithms in terms of accuracy and time complexity. |
Tasks | Recommendation Systems |
Published | 2018-07-15 |
URL | http://arxiv.org/abs/1807.05515v1 |
http://arxiv.org/pdf/1807.05515v1.pdf | |
PWC | https://paperswithcode.com/paper/magnitude-bounded-matrix-factorisation-for |
Repo | |
Framework | |
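The coordinate change mentioned in the abstract above can be sketched as follows: a feature vector is parameterised by a fixed radius and free angles, so its magnitude equals the radius regardless of the angle values and no explicit constraint is needed during optimisation. The function name `spherical_to_cartesian` and the example numbers are assumptions, and the paper's exact parameterisation may differ from this standard hyperspherical form.

```python
import math


def spherical_to_cartesian(radius, angles):
    """Map a radius and n-1 angles to an n-dimensional Cartesian vector.

    The Euclidean norm of the result equals |radius| for any angle values,
    so the angles can be optimised without constraints.
    """
    coords = []
    sin_prod = 1.0  # running product sin(a_1) * ... * sin(a_{i-1})
    for theta in angles:
        coords.append(radius * sin_prod * math.cos(theta))
        sin_prod *= math.sin(theta)
    coords.append(radius * sin_prod)
    return coords


# Hypothetical user feature vector with magnitude fixed at 2.0.
user_vec = spherical_to_cartesian(2.0, [0.3, 1.2, -0.7])
norm = math.sqrt(sum(x * x for x in user_vec))
```

Because the radius can be chosen separately for each user or item, this form also accommodates the individual bounds that the abstract emphasises.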
Implicit Regularization of Stochastic Gradient Descent in Natural Language Processing: Observations and Implications
Title | Implicit Regularization of Stochastic Gradient Descent in Natural Language Processing: Observations and Implications |
Authors | Deren Lei, Zichen Sun, Yijun Xiao, William Yang Wang |
Abstract | Deep neural networks with remarkably strong generalization performances are usually over-parameterized. Although explicit regularization strategies are used by practitioners to avoid over-fitting, the impacts are often small. Some theoretical studies have analyzed the implicit regularization effect of stochastic gradient descent (SGD) on simple machine learning models with certain assumptions. However, how it behaves practically in state-of-the-art models and real-world datasets is still unknown. To bridge this gap, we study the role of SGD implicit regularization in deep learning systems. We show pure SGD tends to converge to minima that have better generalization performances in multiple natural language processing (NLP) tasks. This phenomenon coexists with dropout, an explicit regularizer. In addition, neural network’s finite learning capability does not impact the intrinsic nature of SGD’s implicit regularization effect. Specifically, under limited training samples or with certain corrupted labels, the implicit regularization effect remains strong. We further analyze the stability by varying the weight initialization range. We corroborate these experimental findings with a decision boundary visualization using a 3-layer neural network for interpretation. Altogether, our work enables a deepened understanding on how implicit regularization affects the deep learning model and sheds light on the future study of the over-parameterized model’s generalization ability. |
Tasks | |
Published | 2018-11-01 |
URL | http://arxiv.org/abs/1811.00659v1 |
http://arxiv.org/pdf/1811.00659v1.pdf | |
PWC | https://paperswithcode.com/paper/implicit-regularization-of-stochastic |
Repo | |
Framework | |
A Structural Correlation Filter Combined with A Multi-task Gaussian Particle Filter for Visual Tracking
Title | A Structural Correlation Filter Combined with A Multi-task Gaussian Particle Filter for Visual Tracking |
Authors | Manna Dai, Shuying Cheng, Xiangjian He, Dadong Wang |
Abstract | In this paper, we propose a novel structural correlation filter combined with a multi-task Gaussian particle filter (KCF-GPF) model for robust visual tracking. We first present an assemble structure where several KCF trackers as weak experts provide a preliminary decision for a Gaussian particle filter to make a final decision. The proposed method is designed to exploit and complement the strength of a KCF and a Gaussian particle filter. Compared with the existing tracking methods based on correlation filters or particle filters, the proposed tracker has several advantages. First, it can detect the tracked target in a large-scale search scope via weak KCF trackers and evaluate the reliability of weak trackers’ decisions for a Gaussian particle filter to make a strong decision, and hence it can tackle fast motions, appearance variations, occlusions and re-detections. Second, it can effectively handle large-scale variations via a Gaussian particle filter. Third, it can be amenable to fully parallel implementation using importance sampling without resampling, thereby it is convenient for VLSI implementation and can lower the computational costs. Extensive experiments on the OTB-2013 dataset containing 50 challenging sequences demonstrate that the proposed algorithm performs favourably against 16 state-of-the-art trackers. |
Tasks | Visual Tracking |
Published | 2018-03-03 |
URL | http://arxiv.org/abs/1803.05845v1 |
http://arxiv.org/pdf/1803.05845v1.pdf | |
PWC | https://paperswithcode.com/paper/a-structural-correlation-filter-combined-with |
Repo | |
Framework | |
Why are Sequence-to-Sequence Models So Dull? Understanding the Low-Diversity Problem of Chatbots
Title | Why are Sequence-to-Sequence Models So Dull? Understanding the Low-Diversity Problem of Chatbots |
Authors | Shaojie Jiang, Maarten de Rijke |
Abstract | Diversity is a long-studied topic in information retrieval that usually refers to the requirement that retrieved results should be non-repetitive and cover different aspects. In a conversational setting, an additional dimension of diversity matters: an engaging response generation system should be able to output responses that are diverse and interesting. Sequence-to-sequence (Seq2Seq) models have been shown to be very effective for response generation. However, dialogue responses generated by Seq2Seq models tend to have low diversity. In this paper, we review known sources and existing approaches to this low-diversity problem. We also identify a source of low diversity that has been little studied so far, namely model over-confidence. We sketch several directions for tackling model over-confidence and, hence, the low-diversity problem, including confidence penalties and label smoothing. |
Tasks | Information Retrieval |
Published | 2018-09-06 |
URL | http://arxiv.org/abs/1809.01941v1 |
http://arxiv.org/pdf/1809.01941v1.pdf | |
PWC | https://paperswithcode.com/paper/why-are-sequence-to-sequence-models-so-dull |
Repo | |
Framework | |
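The two remedies for over-confidence named at the end of the abstract above, label smoothing and confidence penalties, can be written down in a few lines. This is a generic sketch of the commonly used formulations, not code from the paper: the function names and the default values of `epsilon` and `beta` are assumptions.

```python
import math


def smoothed_targets(true_index, num_classes, epsilon=0.1):
    """Label smoothing: mix the one-hot target with a uniform distribution."""
    base = epsilon / num_classes
    return [base + (1.0 - epsilon if i == true_index else 0.0)
            for i in range(num_classes)]


def cross_entropy(targets, probs):
    """Cross entropy in nats; assumes probs[i] > 0 wherever targets[i] > 0."""
    return -sum(t * math.log(p) for t, p in zip(targets, probs) if t > 0)


def confidence_penalised_loss(true_index, probs, beta=0.1):
    """Negative log-likelihood minus beta times the entropy of the prediction.

    Low-entropy (over-confident) output distributions receive a higher loss.
    """
    nll = -math.log(probs[true_index])
    ent = -sum(p * math.log(p) for p in probs if p > 0)
    return nll - beta * ent


# Hypothetical model output over a 4-token vocabulary.
probs = [0.7, 0.1, 0.1, 0.1]
loss_smooth = cross_entropy(smoothed_targets(0, 4), probs)
loss_penalised = confidence_penalised_loss(0, probs)
```

Both variants discourage the model from placing nearly all probability mass on a single token, which the abstract identifies as a source of low diversity.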
Method for Hybrid Precision Convolutional Neural Network Representation
Title | Method for Hybrid Precision Convolutional Neural Network Representation |
Authors | Mo’taz Al-Hami, Marcin Pietron, Rishi Kumar, Raul A. Casas, Samer L. Hijazi, Chris Rowen |
Abstract | This invention addresses fixed-point representations of convolutional neural networks (CNN) in integrated circuits. When quantizing a CNN for a practical implementation there is a trade-off between the precision used for operations between coefficients and data and the accuracy of the system. A homogeneous representation may not be sufficient to achieve the best level of performance at a reasonable cost in implementation complexity or power consumption. Parsimonious ways of representing data and coefficients are needed to improve power efficiency and throughput while maintaining accuracy of a CNN. |
Tasks | |
Published | 2018-07-24 |
URL | http://arxiv.org/abs/1807.09760v1 |
http://arxiv.org/pdf/1807.09760v1.pdf | |
PWC | https://paperswithcode.com/paper/method-for-hybrid-precision-convolutional |
Repo | |
Framework | |
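The precision trade-off described in the abstract above can be made concrete with a small signed fixed-point quantizer. This is a generic sketch rather than the method of the invention: the function name `quantize`, the saturating behaviour, and the example bit widths are assumptions chosen for illustration.

```python
def quantize(value, total_bits, frac_bits):
    """Round a real value to signed fixed point and return the represented number.

    `total_bits` includes the sign bit; `frac_bits` of them are fractional.
    Values outside the representable range saturate at the limits.
    Note: Python's round() rounds exact halves to the nearest even integer.
    """
    scale = 1 << frac_bits
    lo = -(1 << (total_bits - 1))
    hi = (1 << (total_bits - 1)) - 1
    code = max(lo, min(hi, round(value * scale)))
    return code / scale


# Hypothetical per-layer formats: a hybrid scheme could assign a different
# (total_bits, frac_bits) pair to each layer instead of one global format.
coarse = quantize(0.3, total_bits=8, frac_bits=4)   # step size 1/16
fine = quantize(0.3, total_bits=8, frac_bits=6)     # step size 1/64
clipped = quantize(100.0, total_bits=8, frac_bits=4)
```

More fractional bits reduce rounding error but shrink the representable range for a fixed word length, which is one reason a single format for every layer may be a poor fit.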