May 7, 2019

Paper Group AWR 23

The ZipML Framework for Training Models with End-to-End Low Precision: The Cans, the Cannots, and a Little Bit of Deep Learning

Title The ZipML Framework for Training Models with End-to-End Low Precision: The Cans, the Cannots, and a Little Bit of Deep Learning
Authors Hantian Zhang, Jerry Li, Kaan Kara, Dan Alistarh, Ji Liu, Ce Zhang
Abstract Recently there has been significant interest in training machine-learning models at low precision: by reducing precision, one can reduce computation and communication by one order of magnitude. We examine training at reduced precision, both from a theoretical and practical perspective, and ask: is it possible to train models at end-to-end low precision with provable guarantees? Can this lead to consistent order-of-magnitude speedups? We present a framework called ZipML to answer these questions. For linear models, the answer is yes. We develop a simple framework based on one simple but novel strategy called double sampling. Our framework is able to execute training at low precision with no bias, guaranteeing convergence, whereas naive quantization would introduce significant bias. We validate our framework across a range of applications, and show that it enables an FPGA prototype that is up to 6.5x faster than an implementation using full 32-bit precision. We further develop a variance-optimal stochastic quantization strategy and show that it can make a significant difference in a variety of settings. When applied to linear models together with double sampling, we save up to another 1.7x in data movement compared with uniform quantization. When training deep networks with quantized models, we achieve higher accuracy than the state-of-the-art XNOR-Net. Finally, we extend our framework through approximation to non-linear models, such as SVM. We show that, although using low-precision data induces bias, we can appropriately bound and control the bias. We find in practice 8-bit precision is often sufficient to converge to the correct solution. Interestingly, however, in practice we notice that our framework does not always outperform the naive rounding approach. We discuss this negative result in detail.
Tasks Quantization
Published 2016-11-16
URL http://arxiv.org/abs/1611.05402v3
PDF http://arxiv.org/pdf/1611.05402v3.pdf
PWC https://paperswithcode.com/paper/the-zipml-framework-for-training-models-with
Repo https://github.com/IST-DASLab/smart-quantizer
Framework pytorch
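As a rough illustration of the double-sampling idea from the abstract, the NumPy sketch below builds an unbiased stochastic quantizer and uses two independent quantizations of each sample when forming the least-squares gradient. The uniform grid and level count are illustrative assumptions; the paper also develops non-uniform, variance-optimal quantizers and an FPGA implementation not shown here.

```python
import numpy as np

def stochastic_quantize(v, num_levels=256, rng=np.random):
    """Unbiased stochastic quantization onto a uniform grid over [min, max].

    E[Q(v)] == v because each entry is rounded up or down at random with
    probabilities proportional to its distance from the two neighbouring levels.
    """
    lo, hi = v.min(), v.max()
    scale = (hi - lo) / (num_levels - 1) if hi > lo else 1.0
    t = (v - lo) / scale                      # position in grid units
    floor = np.floor(t)
    prob_up = t - floor                       # chance of rounding up
    q = floor + (rng.random_sample(v.shape) < prob_up)
    return lo + q * scale

def double_sampling_gradient(a, b, x, num_levels=256):
    """Unbiased least-squares gradient a * (a^T x - b) from quantized data.

    Two independent quantizations of the sample `a` are used, one inside the
    inner product and one as the outer factor, so the bias of Q(a) Q(a)^T
    cancels in expectation (the double-sampling idea from the abstract).
    """
    a1 = stochastic_quantize(a, num_levels)
    a2 = stochastic_quantize(a, num_levels)
    return a1 * (a2 @ x - b)
```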

Tracing liquid level and material boundaries in transparent vessels using the graph cut computer vision approach

Title Tracing liquid level and material boundaries in transparent vessels using the graph cut computer vision approach
Authors Sagi Eppel
Abstract Detection of boundaries of materials stored in transparent vessels is essential for identifying properties such as liquid level and phase boundaries, which are vital for controlling numerous processes in industry and the chemistry laboratory. This work presents a computer vision method for identifying the boundary of materials in transparent vessels using the graph-cut algorithm. The method receives an image of a transparent vessel containing a material and the contour of the vessel in the image. The boundary of the material in the vessel is found by the graph-cut method. In general, the method uses the vessel region of the image to create a graph, where pixels are vertices, and the cost of an edge between two pixels is inversely correlated with their intensity difference. The bottom 10% of the vessel region in the image is assumed to correspond to the material phase and is defined as the graph source. The top 10% of the pixels in the vessel region are assumed to correspond to the air phase and are defined as the graph sink. The minimal cut that splits the resulting graph between the source and sink (hence, material and air) is traced using the max-flow/min-cut approach. This cut corresponds to the boundary of the material in the image. The method gave high accuracy in boundary recognition for a wide range of liquid, solid, granular and powder materials in various glass vessels from everyday life and the chemistry laboratory, such as bottles, jars, glasses, chromatography columns and separatory funnels.
Tasks
Published 2016-01-31
URL http://arxiv.org/abs/1602.00177v1
PDF http://arxiv.org/pdf/1602.00177v1.pdf
PWC https://paperswithcode.com/paper/tracing-liquid-level-and-material-boundaries
Repo https://github.com/sagieppel/Tracing-liquid-level-and-material-boundaries-in-transparent-vessels-using-the-graph-cut--maxflow-mod
Framework none
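The segmentation recipe in the abstract maps directly onto a standard max-flow/min-cut solver. Below is a hedged sketch using the PyMaxflow package (an assumption; any min-cut library would do): pixels become vertices, neighbour edges are cheap to cut where intensities differ, and the bottom/top strips of the vessel region seed the source and sink.

```python
import numpy as np
import maxflow  # PyMaxflow (assumed available); any max-flow/min-cut solver would do

def material_boundary(vessel_gray, frac=0.10, sigma=10.0):
    """Segment a cropped grayscale vessel region into material (bottom) vs. air (top).

    Pixels are graph vertices; an edge between two neighbours is cheap to cut
    when their intensities differ strongly. The bottom `frac` of rows seeds the
    source (material) and the top `frac` seeds the sink (air); the minimum cut
    then traces the material boundary, as described in the abstract.
    """
    img = vessel_gray.astype(float)
    h, w = img.shape
    g = maxflow.Graph[float]()
    nodes = g.add_grid_nodes((h, w))

    def smooth_cost(a, b):
        # inversely correlated with the intensity difference of the two pixels
        return np.exp(-((a - b) ** 2) / (2.0 * sigma ** 2))

    for y in range(h):
        for x in range(w):
            if x + 1 < w:
                c = smooth_cost(img[y, x], img[y, x + 1])
                g.add_edge(nodes[y, x], nodes[y, x + 1], c, c)
            if y + 1 < h:
                c = smooth_cost(img[y, x], img[y + 1, x])
                g.add_edge(nodes[y, x], nodes[y + 1, x], c, c)

    # hard seeds: bottom strip -> source (material), top strip -> sink (air)
    inf = 1e9
    src = np.zeros((h, w)); src[int((1 - frac) * h):, :] = inf
    snk = np.zeros((h, w)); snk[:int(frac * h), :] = inf
    g.add_grid_tedges(nodes, src, snk)

    g.maxflow()
    return g.get_grid_segments(nodes)  # True on the air (sink) side, False on the material side
```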

An Empirical Evaluation of doc2vec with Practical Insights into Document Embedding Generation

Title An Empirical Evaluation of doc2vec with Practical Insights into Document Embedding Generation
Authors Jey Han Lau, Timothy Baldwin
Abstract Recently, Le and Mikolov (2014) proposed doc2vec as an extension to word2vec (Mikolov et al., 2013a) to learn document-level embeddings. Despite promising results in the original paper, others have struggled to reproduce those results. This paper presents a rigorous empirical evaluation of doc2vec over two tasks. We compare doc2vec to two baselines and two state-of-the-art document embedding methodologies. We found that doc2vec performs robustly when using models trained on large external corpora, and can be further improved by using pre-trained word embeddings. We also provide recommendations on hyper-parameter settings for general purpose applications, and release source code to induce document embeddings using our trained doc2vec models.
Tasks Document Embedding, Word Embeddings
Published 2016-07-19
URL http://arxiv.org/abs/1607.05368v1
PDF http://arxiv.org/pdf/1607.05368v1.pdf
PWC https://paperswithcode.com/paper/an-empirical-evaluation-of-doc2vec-with
Repo https://github.com/tsandefer/capstone_2
Framework tf
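For readers who want to reproduce the dbow setup the paper recommends, here is a minimal gensim sketch; the hyper-parameters are illustrative defaults, not the tuned values reported in the paper, and the toy corpus stands in for a large external corpus such as Wikipedia.

```python
# Minimal dbow doc2vec training and inference with gensim.
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

corpus = [
    ["the", "cat", "sat", "on", "the", "mat"],
    ["dogs", "are", "great", "pets"],
]  # toy stand-in for a large external corpus

docs = [TaggedDocument(words, tags=[i]) for i, words in enumerate(corpus)]
model = Doc2Vec(docs, dm=0, vector_size=300, window=15, min_count=1,
                sample=1e-5, negative=5, epochs=20, workers=4)

# embedding for unseen text, as used when applying pre-trained doc2vec models
vec = model.infer_vector(["a", "small", "dog", "on", "a", "mat"])
```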

A Dictionary-based Approach to Racism Detection in Dutch Social Media

Title A Dictionary-based Approach to Racism Detection in Dutch Social Media
Authors Stéphan Tulkens, Lisa Hilte, Elise Lodewyckx, Ben Verhoeven, Walter Daelemans
Abstract We present a dictionary-based approach to racism detection in Dutch social media comments, which were retrieved from two public Belgian social media sites likely to attract racist reactions. These comments were labeled as racist or non-racist by multiple annotators. For our approach, three discourse dictionaries were created: first, we created a dictionary by retrieving possibly racist and more neutral terms from the training data, and then augmenting these with more general words to remove some bias. A second dictionary was created through automatic expansion using a \texttt{word2vec} model trained on a large corpus of general Dutch text. Finally, a third dictionary was created by manually filtering out incorrect expansions. We trained multiple Support Vector Machines, using the distribution of words over the different categories in the dictionaries as features. The best-performing model used the manually cleaned dictionary and obtained an F-score of 0.46 for the racist class on a test set consisting of unseen Dutch comments, retrieved from the same sites used for the training set. The automated expansion of the dictionary only slightly boosted the model’s performance, and this increase in performance was not statistically significant. The fact that the coverage of the expanded dictionaries did increase indicates that the words that were automatically added did occur in the corpus, but were not able to meaningfully impact performance. The dictionaries, code, and the procedure for requesting the corpus are available at: https://github.com/clips/hades
Tasks
Published 2016-08-31
URL http://arxiv.org/abs/1608.08738v1
PDF http://arxiv.org/pdf/1608.08738v1.pdf
PWC https://paperswithcode.com/paper/a-dictionary-based-approach-to-racism
Repo https://github.com/clips/hades
Framework none

Bidirectional LSTM-CRF for Clinical Concept Extraction

Title Bidirectional LSTM-CRF for Clinical Concept Extraction
Authors Raghavendra Chalapathy, Ehsan Zare Borzeshi, Massimo Piccardi
Abstract Automated extraction of concepts from patient clinical records is an essential facilitator of clinical research. For this reason, the 2010 i2b2/VA Natural Language Processing Challenges for Clinical Records introduced a concept extraction task aimed at identifying and classifying concepts into predefined categories (i.e., treatments, tests and problems). State-of-the-art concept extraction approaches heavily rely on handcrafted features and domain-specific resources which are hard to collect and define. For this reason, this paper proposes an alternative, streamlined approach: a recurrent neural network (the bidirectional LSTM with CRF decoding) initialized with general-purpose, off-the-shelf word embeddings. The experimental results achieved on the 2010 i2b2/VA reference corpora using the proposed framework outperform all recent methods and rank closely to the best submission from the original 2010 i2b2/VA challenge.
Tasks Clinical Concept Extraction, Word Embeddings
Published 2016-11-25
URL http://arxiv.org/abs/1611.08373v1
PDF http://arxiv.org/pdf/1611.08373v1.pdf
PWC https://paperswithcode.com/paper/bidirectional-lstm-crf-for-clinical-concept
Repo https://github.com/raghavchalapathy/Bidirectional-LSTM-CRF-for-Clinical-Concept-Extraction
Framework none
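A minimal PyTorch sketch of the bidirectional LSTM tagger initialized with off-the-shelf embeddings; the CRF decoding layer the paper places on top is omitted here for brevity, so this covers only the emission model, and the sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn

class BiLSTMTagger(nn.Module):
    """Per-token emission scores for BIO-encoded clinical concept tags.

    `pretrained` is a (vocab_size, emb_dim) tensor of general-purpose word
    embeddings; the CRF transition layer used in the paper is not included.
    """
    def __init__(self, pretrained, hidden_dim=100, num_tags=7):
        super().__init__()
        self.emb = nn.Embedding.from_pretrained(pretrained, freeze=False)
        self.lstm = nn.LSTM(pretrained.size(1), hidden_dim,
                            batch_first=True, bidirectional=True)
        self.proj = nn.Linear(2 * hidden_dim, num_tags)

    def forward(self, token_ids):                 # (batch, seq_len)
        h, _ = self.lstm(self.emb(token_ids))     # (batch, seq_len, 2*hidden)
        return self.proj(h)                       # per-token tag scores

# usage sketch with random stand-ins for the off-the-shelf embeddings
emb = torch.randn(1000, 50)
model = BiLSTMTagger(emb)
scores = model(torch.randint(0, 1000, (2, 12)))   # (2, 12, 7) emission scores
```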

Additive Approximations in High Dimensional Nonparametric Regression via the SALSA

Title Additive Approximations in High Dimensional Nonparametric Regression via the SALSA
Authors Kirthevasan Kandasamy, Yaoliang Yu
Abstract High dimensional nonparametric regression is an inherently difficult problem, with known lower bounds that depend exponentially on dimension. A popular strategy to alleviate this curse of dimensionality has been to use additive models of \emph{first order}, which model the regression function as a sum of independent functions on each dimension. Though useful in controlling the variance of the estimate, such models are often too restrictive in practical settings. Between non-additive models, which often have large variance, and first order additive models, which have large bias, there has been little work to exploit the trade-off in the middle via additive models of intermediate order. In this work, we propose SALSA, which bridges this gap by allowing interactions between variables, but controls model capacity by limiting the order of interactions. SALSA minimises the residual sum of squares with squared RKHS norm penalties. Algorithmically, it can be viewed as Kernel Ridge Regression with an additive kernel. When the regression function is additive, the excess risk is only polynomial in dimension. Using the Girard-Newton formulae, we efficiently sum over a combinatorial number of terms in the additive expansion. Via a comparison on $15$ real datasets, we show that our method is competitive against $21$ other alternatives.
Tasks
Published 2016-01-31
URL http://arxiv.org/abs/1602.00287v3
PDF http://arxiv.org/pdf/1602.00287v3.pdf
PWC https://paperswithcode.com/paper/additive-approximations-in-high-dimensional
Repo https://github.com/kirthevasank/salsa
Framework none
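The abstract's recipe, kernel ridge regression with an order-d additive kernel assembled via the Girard-Newton formulae, can be sketched in a few lines of NumPy. The RBF base kernels and bandwidth below are illustrative assumptions, not the kernels or tuning used in the released package.

```python
import numpy as np

def base_kernels(X, Z, bandwidth=1.0):
    """One RBF kernel per input dimension: shape (D, n, m)."""
    diff = X[:, None, :] - Z[None, :, :]                       # (n, m, D)
    return np.exp(-(diff ** 2) / (2 * bandwidth ** 2)).transpose(2, 0, 1)

def additive_kernel(X, Z, order=2, bandwidth=1.0):
    """Sum of all interaction terms up to `order` via the Newton-Girard recursion."""
    K = base_kernels(X, Z, bandwidth)                          # (D, n, m)
    p = [None] + [np.sum(K ** j, axis=0) for j in range(1, order + 1)]  # power sums
    e = [np.ones(K.shape[1:])]                                 # e_0 = 1
    for j in range(1, order + 1):
        e_j = sum((-1) ** (i - 1) * e[j - i] * p[i] for i in range(1, j + 1)) / j
        e.append(e_j)                                          # elementary symmetric polynomial
    return sum(e[1:])                                          # e_1 + ... + e_order

def fit_salsa_like(X, y, order=2, lam=1e-3):
    """Kernel ridge regression with the additive kernel; returns a predictor."""
    K = additive_kernel(X, X, order)
    alpha = np.linalg.solve(K + lam * len(y) * np.eye(len(y)), y)
    return lambda Xnew: additive_kernel(Xnew, X, order) @ alpha
```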

Efficient Smoothed Concomitant Lasso Estimation for High Dimensional Regression

Title Efficient Smoothed Concomitant Lasso Estimation for High Dimensional Regression
Authors Eugene Ndiaye, Olivier Fercoq, Alexandre Gramfort, Vincent Leclère, Joseph Salmon
Abstract In high dimensional settings, sparse structures are crucial for efficiency, in terms of memory, computation, and performance. It is customary to consider an $\ell_1$ penalty to enforce sparsity in such scenarios. Sparsity-enforcing methods, the Lasso being a canonical example, are popular candidates to address high dimension. For efficiency, they rely on tuning a parameter that trades data fitting against sparsity. For the Lasso theory to hold, this tuning parameter should be proportional to the noise level, yet the latter is often unknown in practice. A possible remedy is to jointly optimize over the regression parameter as well as over the noise level. This has been considered under several names in the literature: Scaled-Lasso, Square-root Lasso, and Concomitant Lasso estimation, for instance, and could be of interest for confidence sets or uncertainty quantification. In this work, after illustrating numerical difficulties with the Concomitant Lasso formulation, we propose a modification we coined the Smoothed Concomitant Lasso, aimed at increasing numerical stability. We propose an efficient and accurate solver leading to a computational cost no more expensive than that of the Lasso. We leverage standard ingredients behind the success of fast Lasso solvers: a coordinate descent algorithm, combined with safe screening rules to achieve speed efficiency by eliminating irrelevant features early.
Tasks
Published 2016-06-08
URL http://arxiv.org/abs/1606.02702v1
PDF http://arxiv.org/pdf/1606.02702v1.pdf
PWC https://paperswithcode.com/paper/efficient-smoothed-concomitant-lasso
Repo https://github.com/EugeneNdiaye/smoothed_concomitant_lasso
Framework none
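A coarse sketch of the smoothed joint estimation of the coefficients and the noise level, using alternating minimization with scikit-learn's Lasso for the coefficient step; the paper's actual solver is a dedicated coordinate-descent algorithm with safe screening rules, so this only illustrates the formulation (the noise level is floored at sigma0, which is the "smoothing").

```python
import numpy as np
from sklearn.linear_model import Lasso

def smoothed_concomitant_lasso(X, y, lam=0.1, sigma0=1e-2, n_iter=20):
    """Alternating minimization of
        ||y - X b||^2 / (2 n sigma) + sigma / 2 + lam * ||b||_1,   sigma >= sigma0.
    """
    n = len(y)
    sigma = max(np.std(y), sigma0)
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        # beta-step: a plain Lasso with its penalty rescaled by the current noise level
        lasso = Lasso(alpha=lam * sigma, fit_intercept=False, max_iter=5000)
        beta = lasso.fit(X, y).coef_
        # sigma-step: closed form, floored at sigma0 for numerical stability
        sigma = max(np.linalg.norm(y - X @ beta) / np.sqrt(n), sigma0)
    return beta, sigma
```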

Trusting SVM for Piecewise Linear CNNs

Title Trusting SVM for Piecewise Linear CNNs
Authors Leonard Berrada, Andrew Zisserman, M. Pawan Kumar
Abstract We present a novel layerwise optimization algorithm for the learning objective of Piecewise-Linear Convolutional Neural Networks (PL-CNNs), a large class of convolutional neural networks. Specifically, PL-CNNs employ piecewise linear non-linearities such as the commonly used ReLU and max-pool, and an SVM classifier as the final layer. The key observation of our approach is that the problem corresponding to the parameter estimation of a layer can be formulated as a difference-of-convex (DC) program, which happens to be a latent structured SVM. We optimize the DC program using the concave-convex procedure, which requires us to iteratively solve a structured SVM problem. This allows us to design an optimization algorithm with an optimal learning rate that does not require any tuning. Using the MNIST, CIFAR and ImageNet data sets, we show that our approach always improves over the state-of-the-art variants of backpropagation and scales to large data and large network settings.
Tasks
Published 2016-11-07
URL http://arxiv.org/abs/1611.02185v5
PDF http://arxiv.org/pdf/1611.02185v5.pdf
PWC https://paperswithcode.com/paper/trusting-svm-for-piecewise-linear-cnns
Repo https://github.com/lukasruff/Deep-SVDD
Framework pytorch

Zoneout: Regularizing RNNs by Randomly Preserving Hidden Activations

Title Zoneout: Regularizing RNNs by Randomly Preserving Hidden Activations
Authors David Krueger, Tegan Maharaj, János Kramár, Mohammad Pezeshki, Nicolas Ballas, Nan Rosemary Ke, Anirudh Goyal, Yoshua Bengio, Aaron Courville, Chris Pal
Abstract We propose zoneout, a novel method for regularizing RNNs. At each timestep, zoneout stochastically forces some hidden units to maintain their previous values. Like dropout, zoneout uses random noise to train a pseudo-ensemble, improving generalization. But by preserving instead of dropping hidden units, gradient information and state information are more readily propagated through time, as in feedforward stochastic depth networks. We perform an empirical investigation of various RNN regularizers, and find that zoneout gives significant performance improvements across tasks. We achieve competitive results with relatively simple models in character- and word-level language modelling on the Penn Treebank and Text8 datasets, and combining with recurrent batch normalization yields state-of-the-art results on permuted sequential MNIST.
Tasks Language Modelling
Published 2016-06-03
URL http://arxiv.org/abs/1606.01305v4
PDF http://arxiv.org/pdf/1606.01305v4.pdf
PWC https://paperswithcode.com/paper/zoneout-regularizing-rnns-by-randomly
Repo https://github.com/weixsong/zoneout
Framework tf
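Zoneout itself is a one-line modification to the recurrent update; a small PyTorch sketch follows (cell type and rate are illustrative assumptions, and the paper zones out LSTM cell and hidden states with separately tuned rates).

```python
import torch

def zoneout(h_prev, h_new, rate=0.15, training=True):
    """Randomly keep some hidden units at their previous value instead of updating them."""
    if training:
        keep = (torch.rand_like(h_new) < rate).float()
        return keep * h_prev + (1.0 - keep) * h_new
    # at test time, use the expected update (analogous to dropout's rescaling)
    return rate * h_prev + (1.0 - rate) * h_new

# usage inside a manual RNN loop
cell = torch.nn.GRUCell(input_size=32, hidden_size=64)
h = torch.zeros(8, 64)
for x_t in torch.randn(10, 8, 32):          # (time, batch, features)
    h = zoneout(h, cell(x_t, h), rate=0.15, training=True)
```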

Learning Python Code Suggestion with a Sparse Pointer Network

Title Learning Python Code Suggestion with a Sparse Pointer Network
Authors Avishkar Bhoopchand, Tim Rocktäschel, Earl Barr, Sebastian Riedel
Abstract To enhance developer productivity, all modern integrated development environments (IDEs) include code suggestion functionality that proposes likely next tokens at the cursor. While current IDEs work well for statically-typed languages, their reliance on type annotations means that they do not provide the same level of support for dynamic programming languages as for statically-typed languages. Moreover, suggestion engines in modern IDEs do not propose expressions or multi-statement idiomatic code. Recent work has shown that language models can improve code suggestion systems by learning from software repositories. This paper introduces a neural language model with a sparse pointer network aimed at capturing very long-range dependencies. We release a large-scale code suggestion corpus of 41M lines of Python code crawled from GitHub. On this corpus, we found standard neural language models to perform well at suggesting local phenomena, but struggle to refer to identifiers that were introduced many tokens in the past. By augmenting a neural language model with a pointer network specialized in referring to predefined classes of identifiers, we obtain a much lower perplexity and a 5 percentage point increase in accuracy for code suggestion compared to an LSTM baseline. In fact, this increase in code suggestion accuracy is due to a 13 times more accurate prediction of identifiers. Furthermore, a qualitative analysis shows this model indeed captures interesting long-range dependencies, like referring to a class member defined over 60 tokens in the past.
Tasks Language Modelling
Published 2016-11-24
URL http://arxiv.org/abs/1611.08307v1
PDF http://arxiv.org/pdf/1611.08307v1.pdf
PWC https://paperswithcode.com/paper/learning-python-code-suggestion-with-a-sparse
Repo https://github.com/SamuelGabriel/R252
Framework tf
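The key mechanism is a gated mixture of the usual vocabulary softmax and a pointer distribution over previously seen identifiers. Below is a rough PyTorch sketch of that mixing step, with assumed names and shapes rather than the released code.

```python
import torch
import torch.nn.functional as F

def mix_vocab_and_pointer(vocab_logits, ptr_scores, ptr_token_ids, gate_logit):
    """Combine the language-model softmax with a pointer over past identifiers.

    vocab_logits:  (batch, vocab_size)  standard next-token scores
    ptr_scores:    (batch, mem)         attention scores over stored identifier slots
    ptr_token_ids: (batch, mem)         vocabulary ids of those identifiers
    gate_logit:    (batch, 1)           controller deciding pointer vs. vocabulary
    """
    p_vocab = F.softmax(vocab_logits, dim=-1)
    p_ptr_slots = F.softmax(ptr_scores, dim=-1)
    # scatter the pointer mass back onto vocabulary ids
    p_ptr = torch.zeros_like(p_vocab).scatter_add_(1, ptr_token_ids, p_ptr_slots)
    g = torch.sigmoid(gate_logit)                  # probability of copying an identifier
    return g * p_ptr + (1.0 - g) * p_vocab         # final next-token distribution
```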

Narrative Smoothing: Dynamic Conversational Network for the Analysis of TV Series Plots

Title Narrative Smoothing: Dynamic Conversational Network for the Analysis of TV Series Plots
Authors Xavier Bost, Vincent Labatut, Serigne Gueye, Georges Linarès
Abstract Modern popular TV series often develop complex storylines spanning several seasons, but are usually watched in quite a discontinuous way. As a result, the viewer generally needs a comprehensive summary of the previous season plot before the new one starts. The generation of such summaries requires first identifying and characterizing the dynamics of the series subplots. One way of doing so is to study the underlying social network of interactions between the characters involved in the narrative. The standard tools used in the Social Network Analysis field to extract such a network rely on an integration of time, either over the whole considered period, or as a sequence of several time-slices. However, they turn out to be inappropriate in the case of TV series, due to the fact that the scenes shown onscreen alternately focus on parallel storylines, and do not necessarily respect a traditional chronology. This makes existing extraction methods ineffective at describing the dynamics of relationships between characters, or at giving a relevant instantaneous view of the current social state in the plot. This is especially true for characters shown interacting with each other at some previous point in the plot but temporarily neglected by the narrative. In this article, we introduce narrative smoothing, a novel, still exploratory, network extraction method. It smooths the relationship dynamics based on the plot properties, aiming to overcome some of the limitations of the standard approaches. In order to assess our method, we apply it to a new corpus of 3 popular TV series, and compare it to both standard approaches. Our results are promising, showing that narrative smoothing leads to more relevant observations when it comes to the characterization of the protagonists and their relationships. It could be used as a basis for further modeling of the intertwined storylines constituting TV series plots.
Tasks
Published 2016-02-25
URL https://arxiv.org/abs/1602.07811v5
PDF https://arxiv.org/pdf/1602.07811v5.pdf
PWC https://paperswithcode.com/paper/narrative-smoothing-dynamic-conversational
Repo https://github.com/bostxavier/Narrative-Smoothing
Framework none

Parameterized Machine Learning for High-Energy Physics

Title Parameterized Machine Learning for High-Energy Physics
Authors Pierre Baldi, Kyle Cranmer, Taylor Faucett, Peter Sadowski, Daniel Whiteson
Abstract We investigate a new structure for machine learning classifiers applied to problems in high-energy physics by expanding the inputs to include not only measured features but also physics parameters. The physics parameters represent a smoothly varying learning task, and the resulting parameterized classifier can smoothly interpolate between them and replace sets of classifiers trained at individual values. This simplifies the training process and gives improved performance at intermediate values, even for complex problems requiring deep learning. Applications include tools parameterized in terms of theoretical model parameters, such as the mass of a particle, which allow for a single network to provide improved discrimination across a range of masses. This concept is simple to implement and allows for optimized interpolatable results.
Tasks
Published 2016-01-28
URL http://arxiv.org/abs/1601.07913v1
PDF http://arxiv.org/pdf/1601.07913v1.pdf
PWC https://paperswithcode.com/paper/parameterized-machine-learning-for-high
Repo https://github.com/zhuhel/Parameterized-DNN
Framework none
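The core idea is simply to feed the theory parameter to the network alongside the measured event features, so one model interpolates across parameter values. A small PyTorch sketch follows; the architecture and sizes are assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class ParameterizedClassifier(nn.Module):
    def __init__(self, n_features, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_features + 1, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, features, mass):
        # mass: (batch, 1) theory parameter appended to the measured features
        return torch.sigmoid(self.net(torch.cat([features, mass], dim=1)))

# At training time, signal events carry their generated mass and background events
# are paired with masses drawn from the same set; at test time the same network can
# be evaluated at any mass, including values never seen during training.
model = ParameterizedClassifier(n_features=20)
scores = model(torch.randn(32, 20), torch.full((32, 1), 750.0))
```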

Learning a No-Reference Quality Metric for Single-Image Super-Resolution

Title Learning a No-Reference Quality Metric for Single-Image Super-Resolution
Authors Chao Ma, Chih-Yuan Yang, Xiaokang Yang, Ming-Hsuan Yang
Abstract Numerous single-image super-resolution algorithms have been proposed in the literature, but few studies address the problem of performance evaluation based on visual perception. While most super-resolution images are evaluated by full-reference metrics, their effectiveness is not clear and the required ground-truth images are not always available in practice. To address these problems, we conduct human subject studies using a large set of super-resolution images and propose a no-reference metric learned from visual perceptual scores. Specifically, we design three types of low-level statistical features in both spatial and frequency domains to quantify super-resolved artifacts, and learn a two-stage regression model to predict the quality scores of super-resolution images without referring to ground-truth images. Extensive experimental results show that the proposed metric is effective and efficient at assessing the quality of super-resolution images based on human perception.
Tasks Image Super-Resolution, Super-Resolution
Published 2016-12-18
URL http://arxiv.org/abs/1612.05890v1
PDF http://arxiv.org/pdf/1612.05890v1.pdf
PWC https://paperswithcode.com/paper/learning-a-no-reference-quality-metric-for
Repo https://github.com/manricheon/eusr-pcl-tf
Framework tf

A Simple Approach to Sparse Clustering

Title A Simple Approach to Sparse Clustering
Authors Ery Arias-Castro, Xiao Pu
Abstract Consider the problem of sparse clustering, where it is assumed that only a subset of the features are useful for clustering purposes. In the framework of the COSA method of Friedman and Meulman, subsequently improved in the form of the Sparse K-means method of Witten and Tibshirani, a natural and simpler hill-climbing approach is introduced. The new method is shown to be competitive with these two methods and others.
Tasks
Published 2016-02-23
URL http://arxiv.org/abs/1602.07277v2
PDF http://arxiv.org/pdf/1602.07277v2.pdf
PWC https://paperswithcode.com/paper/a-simple-approach-to-sparse-clustering
Repo https://github.com/victorpu/SAS_Hill_Climb
Framework none
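A rough sketch of a hill-climbing loop in the spirit of the paper: alternate k-means on the currently selected features with a greedy re-selection of the features whose within-cluster scatter is smallest under the current labels. The exact per-feature criterion and update schedule in the paper may differ; this is only meant to convey the flavour of the method.

```python
import numpy as np
from sklearn.cluster import KMeans

def sparse_kmeans_hill_climb(X, n_clusters=3, n_features=10, n_iter=20, seed=0):
    """Alternate k-means on the selected features with a greedy re-selection step."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    selected = rng.choice(d, size=n_features, replace=False)
    labels = None
    for _ in range(n_iter):
        labels = KMeans(n_clusters, n_init=10, random_state=seed).fit_predict(X[:, selected])
        # score every feature by its within-cluster / total sum of squares under these labels
        total_ss = ((X - X.mean(axis=0)) ** 2).sum(axis=0)
        within_ss = np.zeros(d)
        for k in range(n_clusters):
            Xk = X[labels == k]
            within_ss += ((Xk - Xk.mean(axis=0)) ** 2).sum(axis=0)
        new_selected = np.argsort(within_ss / np.maximum(total_ss, 1e-12))[:n_features]
        if set(new_selected) == set(selected):
            break                                 # selection stabilized
        selected = new_selected
    return labels, selected
```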

Semi-Supervised Learning with Context-Conditional Generative Adversarial Networks

Title Semi-Supervised Learning with Context-Conditional Generative Adversarial Networks
Authors Emily Denton, Sam Gross, Rob Fergus
Abstract We introduce a simple semi-supervised learning approach for images based on in-painting using an adversarial loss. Images with random patches removed are presented to a generator whose task is to fill in the hole, based on the surrounding pixels. The in-painted images are then presented to a discriminator network that judges if they are real (unaltered training images) or not. This task acts as a regularizer for standard supervised training of the discriminator. Using our approach we are able to directly train large VGG-style networks in a semi-supervised fashion. We evaluate on STL-10 and PASCAL datasets, where our approach obtains performance comparable or superior to existing methods.
Tasks Image Classification, Semi-Supervised Image Classification
Published 2016-11-19
URL http://arxiv.org/abs/1611.06430v1
PDF http://arxiv.org/pdf/1611.06430v1.pdf
PWC https://paperswithcode.com/paper/semi-supervised-learning-with-context
Repo https://github.com/eriklindernoren/PyTorch-GAN
Framework pytorch
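A compact PyTorch sketch of the CC-GAN training signal described above: remove a random patch, let the generator in-paint it, and train the discriminator to distinguish in-painted images from untouched ones. The generator and discriminator passed in are placeholders, not the paper's VGG-style networks.

```python
import torch
import torch.nn.functional as F

def mask_random_patch(images, patch=16):
    """Zero out one random patch per image and return the binary mask used."""
    b, _, h, w = images.shape
    mask = torch.ones_like(images)
    for i in range(b):
        y = torch.randint(0, h - patch + 1, (1,)).item()
        x = torch.randint(0, w - patch + 1, (1,)).item()
        mask[i, :, y:y + patch, x:x + patch] = 0.0
    return images * mask, mask

def ccgan_step(generator, discriminator, real_images):
    """One adversarial in-painting step; returns the two losses to backpropagate."""
    holed, mask = mask_random_patch(real_images)
    filled = mask * real_images + (1 - mask) * generator(holed)  # paste the in-painted hole back
    d_real = discriminator(real_images)
    d_fake = discriminator(filled.detach())
    d_loss = F.binary_cross_entropy_with_logits(d_real, torch.ones_like(d_real)) + \
             F.binary_cross_entropy_with_logits(d_fake, torch.zeros_like(d_fake))
    g_out = discriminator(filled)
    g_loss = F.binary_cross_entropy_with_logits(g_out, torch.ones_like(g_out))
    return d_loss, g_loss  # the same discriminator is reused as the semi-supervised classifier
```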