Paper Group AWR 23
The ZipML Framework for Training Models with End-to-End Low Precision: The Cans, the Cannots, and a Little Bit of Deep Learning
Title | The ZipML Framework for Training Models with End-to-End Low Precision: The Cans, the Cannots, and a Little Bit of Deep Learning |
Authors | Hantian Zhang, Jerry Li, Kaan Kara, Dan Alistarh, Ji Liu, Ce Zhang |
Abstract | Recently there has been significant interest in training machine-learning models at low precision: by reducing precision, one can reduce computation and communication by one order of magnitude. We examine training at reduced precision, both from a theoretical and practical perspective, and ask: is it possible to train models at end-to-end low precision with provable guarantees? Can this lead to consistent order-of-magnitude speedups? We present a framework called ZipML to answer these questions. For linear models, the answer is yes. We develop a simple framework based on a simple but novel strategy called double sampling. Our framework is able to execute training at low precision with no bias, guaranteeing convergence, whereas naive quantization would introduce significant bias. We validate our framework across a range of applications, and show that it enables an FPGA prototype that is up to 6.5x faster than an implementation using full 32-bit precision. We further develop a variance-optimal stochastic quantization strategy and show that it can make a significant difference in a variety of settings. When applied to linear models together with double sampling, we save up to another 1.7x in data movement compared with uniform quantization. When training deep networks with quantized models, we achieve higher accuracy than the state-of-the-art XNOR-Net. Finally, we extend our framework through approximation to non-linear models, such as SVM. We show that, although using low-precision data induces bias, we can appropriately bound and control the bias. We find in practice 8-bit precision is often sufficient to converge to the correct solution. Interestingly, however, in practice we notice that our framework does not always outperform the naive rounding approach. We discuss this negative result in detail. |
Tasks | Quantization |
Published | 2016-11-16 |
URL | http://arxiv.org/abs/1611.05402v3 |
http://arxiv.org/pdf/1611.05402v3.pdf | |
PWC | https://paperswithcode.com/paper/the-zipml-framework-for-training-models-with |
Repo | https://github.com/IST-DASLab/smart-quantizer |
Framework | pytorch |
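A minimal numpy sketch of the double-sampling idea described in the ZipML abstract above: stochastic rounding gives an unbiased quantizer, and using two independent quantized copies of each sample keeps the linear-regression gradient unbiased (reusing a single quantized copy would bias the x xᵀ term). The grid, level count and toy check are illustrative assumptions, not the paper's released code.

```python
import numpy as np

def stochastic_quantize(v, num_levels=256):
    """Unbiased stochastic rounding of v onto a uniform grid: E[q(v)] = v."""
    lo = v.min()
    span = v.max() - lo
    scale = span / (num_levels - 1) if span > 0 else 1.0
    t = (v - lo) / scale                      # position on the grid
    floor = np.floor(t)
    prob_up = t - floor                       # round up with this probability
    q = floor + (np.random.random(v.shape) < prob_up)
    return lo + q * scale

def double_sampled_gradient(x, y, w, num_levels=256):
    """Gradient of 0.5*(x.w - y)^2 from two independent quantizations of x.

    A single quantized copy used in both factors would bias the x x^T term;
    two independent samples keep the gradient unbiased in expectation.
    """
    q1 = stochastic_quantize(x, num_levels)
    q2 = stochastic_quantize(x, num_levels)
    return q1 * (q2 @ w - y)

# toy check: averaged low-precision gradients approach the full-precision one
rng = np.random.default_rng(0)
x, w = rng.normal(size=100), rng.normal(size=100)
y = 0.3
full = x * (x @ w - y)
est = np.mean([double_sampled_gradient(x, y, w, 16) for _ in range(2000)], axis=0)
print(np.abs(est - full).mean())
```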
Tracing liquid level and material boundaries in transparent vessels using the graph cut computer vision approach
Title | Tracing liquid level and material boundaries in transparent vessels using the graph cut computer vision approach |
Authors | Sagi Eppel |
Abstract | Detection of boundaries of materials stored in transparent vessels is essential for identifying properties such as liquid level and phase boundaries, which are vital for controlling numerous processes in industry and the chemistry laboratory. This work presents a computer vision method for identifying the boundary of materials in transparent vessels using the graph-cut algorithm. The method receives an image of a transparent vessel containing a material and the contour of the vessel in the image. The boundary of the material in the vessel is then found by the graph-cut method. In general, the method uses the vessel region of the image to create a graph, where pixels are vertices and the cost of an edge between two pixels is inversely correlated with their intensity difference. The bottom 10% of the vessel region in the image is assumed to correspond to the material phase and is defined as the graph source. The top 10% of the pixels in the vessel region are assumed to correspond to the air phase and are defined as the graph sink. The minimal cut that splits the resulting graph between the source and sink (hence, material and air) is traced using the max-flow/min-cut approach. This cut corresponds to the boundary of the material in the image. The method gave high accuracy in boundary recognition for a wide range of liquid, solid, granular and powder materials in various glass vessels from everyday life and the chemistry laboratory, such as bottles, jars, glasses, chromatography columns and separatory funnels. |
Tasks | |
Published | 2016-01-31 |
URL | http://arxiv.org/abs/1602.00177v1 |
http://arxiv.org/pdf/1602.00177v1.pdf | |
PWC | https://paperswithcode.com/paper/tracing-liquid-level-and-material-boundaries |
Repo | https://github.com/sagieppel/Tracing-liquid-level-and-material-boundaries-in-transparent-vessels-using-the-graph-cut--maxflow-mod |
Framework | none |
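A rough sketch of the graph construction with networkx, under stated assumptions: the exponential edge weighting, the 10% seed bands and the 4-connected neighbourhood are illustrative choices and may differ from the paper's exact formulation.

```python
import numpy as np
import networkx as nx

def material_boundary(gray, vessel_mask, seed_frac=0.10, sigma=10.0):
    """Label pixels inside vessel_mask as material (True) or air (False).

    gray: 2-D float image; vessel_mask: boolean array of the vessel region.
    Edge capacities decay with intensity difference, so the minimum cut
    prefers to pass along strong edges (the material boundary).
    """
    h, w = gray.shape
    G = nx.DiGraph()
    ys, xs = np.nonzero(vessel_mask)
    y_bottom = ys.min() + (1 - seed_frac) * (ys.max() - ys.min())
    y_top = ys.min() + seed_frac * (ys.max() - ys.min())

    for y, x in zip(ys, xs):
        p = (y, x)
        # seed edges: bottom band -> source (material), top band -> sink (air)
        if y >= y_bottom:
            G.add_edge("src", p, capacity=float("inf"))
        if y <= y_top:
            G.add_edge(p, "snk", capacity=float("inf"))
        # 4-connected neighbours, capacity inversely related to contrast
        for dy, dx in ((1, 0), (0, 1)):
            q = (y + dy, x + dx)
            if q[0] < h and q[1] < w and vessel_mask[q]:
                cap = float(np.exp(-abs(gray[p] - gray[q]) / sigma))
                G.add_edge(p, q, capacity=cap)
                G.add_edge(q, p, capacity=cap)

    _, (material, _air) = nx.minimum_cut(G, "src", "snk")
    labels = np.zeros_like(vessel_mask)
    for node in material:
        if node != "src":
            labels[node] = True
    return labels
```

Note that networkx is far too slow for full-resolution images; in practice a dedicated max-flow library would be used, but the graph layout is the same.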
An Empirical Evaluation of doc2vec with Practical Insights into Document Embedding Generation
Title | An Empirical Evaluation of doc2vec with Practical Insights into Document Embedding Generation |
Authors | Jey Han Lau, Timothy Baldwin |
Abstract | Recently, Le and Mikolov (2014) proposed doc2vec as an extension to word2vec (Mikolov et al., 2013a) to learn document-level embeddings. Despite promising results in the original paper, others have struggled to reproduce those results. This paper presents a rigorous empirical evaluation of doc2vec over two tasks. We compare doc2vec to two baselines and two state-of-the-art document embedding methodologies. We found that doc2vec performs robustly when using models trained on large external corpora, and can be further improved by using pre-trained word embeddings. We also provide recommendations on hyper-parameter settings for general purpose applications, and release source code to induce document embeddings using our trained doc2vec models. |
Tasks | Document Embedding, Word Embeddings |
Published | 2016-07-19 |
URL | http://arxiv.org/abs/1607.05368v1 |
http://arxiv.org/pdf/1607.05368v1.pdf | |
PWC | https://paperswithcode.com/paper/an-empirical-evaluation-of-doc2vec-with |
Repo | https://github.com/tsandefer/capstone_2 |
Framework | tf |
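A minimal gensim (>= 4.0) usage sketch of training a dbow doc2vec model and inferring a vector for an unseen document. The toy corpus and hyper-parameters below are placeholders, not the settings recommended in the paper.

```python
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

# toy corpus; the paper trains on large external corpora
corpus = [
    "machine learning for document similarity",
    "word embeddings capture lexical semantics",
    "graph cuts segment images into regions",
]
docs = [TaggedDocument(words=text.split(), tags=[i]) for i, text in enumerate(corpus)]

# dbow (dm=0) with concurrently trained word vectors (dbow_words=1)
model = Doc2Vec(docs, vector_size=50, window=5, min_count=1,
                dm=0, dbow_words=1, epochs=40, workers=2)

# infer an embedding for an unseen document and find its nearest neighbour
vec = model.infer_vector("learning embeddings for documents".split())
print(model.dv.most_similar([vec], topn=2))
```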
A Dictionary-based Approach to Racism Detection in Dutch Social Media
Title | A Dictionary-based Approach to Racism Detection in Dutch Social Media |
Authors | Stéphan Tulkens, Lisa Hilte, Elise Lodewyckx, Ben Verhoeven, Walter Daelemans |
Abstract | We present a dictionary-based approach to racism detection in Dutch social media comments, which were retrieved from two public Belgian social media sites likely to attract racist reactions. These comments were labeled as racist or non-racist by multiple annotators. For our approach, three discourse dictionaries were created: first, we created a dictionary by retrieving possibly racist and more neutral terms from the training data, and then augmenting these with more general words to remove some bias. A second dictionary was created through automatic expansion using a word2vec model trained on a large corpus of general Dutch text. Finally, a third dictionary was created by manually filtering out incorrect expansions. We trained multiple Support Vector Machines, using the distribution of words over the different categories in the dictionaries as features. The best-performing model used the manually cleaned dictionary and obtained an F-score of 0.46 for the racist class on a test set consisting of unseen Dutch comments, retrieved from the same sites used for the training set. The automated expansion of the dictionary only slightly boosted the model's performance, and this increase in performance was not statistically significant. The fact that the coverage of the expanded dictionaries did increase indicates that the words that were automatically added did occur in the corpus, but were not able to meaningfully impact performance. The dictionaries, code, and the procedure for requesting the corpus are available at: https://github.com/clips/hades |
Tasks | |
Published | 2016-08-31 |
URL | http://arxiv.org/abs/1608.08738v1 |
http://arxiv.org/pdf/1608.08738v1.pdf | |
PWC | https://paperswithcode.com/paper/a-dictionary-based-approach-to-racism |
Repo | https://github.com/clips/hades |
Framework | none |
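A small sketch of the feature construction and classifier described above: the distribution of a comment's tokens over dictionary categories feeds a linear SVM. The toy dictionary, comments and labels are made up for illustration; the real dictionaries and corpus procedure are in the linked repository.

```python
import numpy as np
from sklearn.svm import LinearSVC

# toy discourse dictionary: category -> terms (the real dictionaries are in the repo)
dictionary = {
    "slurs": {"termA", "termB"},
    "nationality": {"termC", "termD"},
    "neutral": {"the", "a", "people"},
}
categories = sorted(dictionary)

def featurize(comment):
    """Relative frequency of each dictionary category in the comment."""
    tokens = comment.lower().split()
    counts = np.array([sum(t in dictionary[c] for t in tokens) for c in categories],
                      dtype=float)
    return counts / max(len(tokens), 1)

comments = ["termA termC people", "the people", "termB termB a"]
labels = [1, 0, 1]                      # 1 = racist, 0 = non-racist (toy labels)

X = np.vstack([featurize(c) for c in comments])
clf = LinearSVC(C=1.0).fit(X, labels)
print(clf.predict([featurize("people termA")]))
```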
Bidirectional LSTM-CRF for Clinical Concept Extraction
Title | Bidirectional LSTM-CRF for Clinical Concept Extraction |
Authors | Raghavendra Chalapathy, Ehsan Zare Borzeshi, Massimo Piccardi |
Abstract | Automated extraction of concepts from patient clinical records is an essential facilitator of clinical research. For this reason, the 2010 i2b2/VA Natural Language Processing Challenges for Clinical Records introduced a concept extraction task aimed at identifying and classifying concepts into predefined categories (i.e., treatments, tests and problems). State-of-the-art concept extraction approaches heavily rely on handcrafted features and domain-specific resources which are hard to collect and define. For this reason, this paper proposes an alternative, streamlined approach: a recurrent neural network (the bidirectional LSTM with CRF decoding) initialized with general-purpose, off-the-shelf word embeddings. The experimental results achieved on the 2010 i2b2/VA reference corpora using the proposed framework outperform all recent methods and rank closely to the best submission from the original 2010 i2b2/VA challenge. |
Tasks | Clinical Concept Extraction, Word Embeddings |
Published | 2016-11-25 |
URL | http://arxiv.org/abs/1611.08373v1 |
http://arxiv.org/pdf/1611.08373v1.pdf | |
PWC | https://paperswithcode.com/paper/bidirectional-lstm-crf-for-clinical-concept |
Repo | https://github.com/raghavchalapathy/Bidirectional-LSTM-CRF-for-Clinical-Concept-Extraction |
Framework | none |
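A compact PyTorch sketch of the bidirectional-LSTM tagger; for brevity the CRF decoding layer is replaced by independent per-token softmax predictions, which is a simplification of the paper's model, and the vocabulary, tag set and dimensions are placeholders.

```python
import torch
import torch.nn as nn

class BiLSTMTagger(nn.Module):
    """Bidirectional LSTM emitting per-token tag scores (BIO over problem/test/treatment).

    The paper decodes these emission scores with a CRF layer; here a per-token
    softmax stands in for it to keep the sketch short.
    """
    def __init__(self, vocab_size, num_tags, emb_dim=100, hidden=128):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)   # would be init. from pre-trained embeddings
        self.lstm = nn.LSTM(emb_dim, hidden, bidirectional=True, batch_first=True)
        self.out = nn.Linear(2 * hidden, num_tags)

    def forward(self, token_ids):
        h, _ = self.lstm(self.emb(token_ids))
        return self.out(h)                              # (batch, seq_len, num_tags)

model = BiLSTMTagger(vocab_size=5000, num_tags=7)       # 7 = B/I for 3 concept types + O
tokens = torch.randint(0, 5000, (2, 12))                # toy batch of token ids
gold = torch.randint(0, 7, (2, 12))
loss = nn.CrossEntropyLoss()(model(tokens).reshape(-1, 7), gold.reshape(-1))
loss.backward()
print(model(tokens).argmax(-1).shape)                   # greedy tag per token
```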
Additive Approximations in High Dimensional Nonparametric Regression via the SALSA
Title | Additive Approximations in High Dimensional Nonparametric Regression via the SALSA |
Authors | Kirthevasan Kandasamy, Yaoliang Yu |
Abstract | High dimensional nonparametric regression is an inherently difficult problem with known lower bounds depending exponentially on dimension. A popular strategy to alleviate this curse of dimensionality has been to use additive models of first order, which model the regression function as a sum of independent functions on each dimension. Though useful in controlling the variance of the estimate, such models are often too restrictive in practical settings. Between non-additive models which often have large variance and first order additive models which have large bias, there has been little work to exploit the trade-off in the middle via additive models of intermediate order. In this work, we propose SALSA, which bridges this gap by allowing interactions between variables, but controls model capacity by limiting the order of interactions. SALSA minimises the residual sum of squares with squared RKHS norm penalties. Algorithmically, it can be viewed as Kernel Ridge Regression with an additive kernel. When the regression function is additive, the excess risk is only polynomial in dimension. Using the Girard-Newton formulae, we efficiently sum over a combinatorial number of terms in the additive expansion. Via a comparison on 15 real datasets, we show that our method is competitive with 21 other alternatives. |
Tasks | |
Published | 2016-01-31 |
URL | http://arxiv.org/abs/1602.00287v3 |
http://arxiv.org/pdf/1602.00287v3.pdf | |
PWC | https://paperswithcode.com/paper/additive-approximations-in-high-dimensional |
Repo | https://github.com/kirthevasank/salsa |
Framework | none |
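A small numpy sketch of the "kernel ridge regression with an additive kernel" view: the kernel sums an RBF term over every subset of at most `order` variables, enumerated here by brute force rather than via the Girard-Newton formulae used by SALSA; the data, bandwidth and regularisation below are placeholders.

```python
import numpy as np
from itertools import combinations

def additive_rbf_kernel(A, B, order=2, gamma=1.0):
    """Sum of RBF kernels over all variable subsets of size <= order.

    SALSA sums these terms efficiently via the Girard-Newton formulae; this
    brute-force loop is only meant to show what is being summed.
    """
    d = A.shape[1]
    K = np.zeros((A.shape[0], B.shape[0]))
    for r in range(1, order + 1):
        for dims in combinations(range(d), r):
            dims = list(dims)
            diff = A[:, dims][:, None, :] - B[:, dims][None, :, :]
            K += np.exp(-gamma * np.sum(diff ** 2, axis=-1))
    return K

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = np.sin(X[:, 0]) + X[:, 1] * X[:, 2] + 0.1 * rng.normal(size=200)  # order-2 truth
Xtr, ytr, Xte, yte = X[:150], y[:150], X[150:], y[150:]

# kernel ridge regression: solve (K + lam*I) alpha = y, predict with K(test, train) alpha
lam = 1.0
K = additive_rbf_kernel(Xtr, Xtr, order=2, gamma=0.5)
alpha = np.linalg.solve(K + lam * np.eye(len(Xtr)), ytr)
pred = additive_rbf_kernel(Xte, Xtr, order=2, gamma=0.5) @ alpha
print(np.mean((pred - yte) ** 2))
```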
Efficient Smoothed Concomitant Lasso Estimation for High Dimensional Regression
Title | Efficient Smoothed Concomitant Lasso Estimation for High Dimensional Regression |
Authors | Eugene Ndiaye, Olivier Fercoq, Alexandre Gramfort, Vincent Leclère, Joseph Salmon |
Abstract | In high dimensional settings, sparse structures are crucial for efficiency, both in terms of memory, computation and performance. It is customary to consider an $\ell_1$ penalty to enforce sparsity in such scenarios. Sparsity-enforcing methods, the Lasso being a canonical example, are popular candidates to address high dimension. For efficiency, they rely on tuning a parameter that trades data fitting against sparsity. For the Lasso theory to hold, this tuning parameter should be proportional to the noise level, yet the latter is often unknown in practice. A possible remedy is to jointly optimize over the regression parameter as well as over the noise level. This has been considered under several names in the literature, for instance Scaled-Lasso, Square-root Lasso and Concomitant Lasso estimation, and could be of interest for confidence sets or uncertainty quantification. In this work, after illustrating numerical difficulties with the Concomitant Lasso formulation, we propose a modification we coined the Smoothed Concomitant Lasso, aimed at increasing numerical stability. We propose an efficient and accurate solver whose computational cost is no more expensive than that of the Lasso. We leverage standard ingredients behind the success of fast Lasso solvers: a coordinate descent algorithm combined with safe screening rules, which achieve speed by eliminating irrelevant features early. |
Tasks | |
Published | 2016-06-08 |
URL | http://arxiv.org/abs/1606.02702v1 |
http://arxiv.org/pdf/1606.02702v1.pdf | |
PWC | https://paperswithcode.com/paper/efficient-smoothed-concomitant-lasso |
Repo | https://github.com/EugeneNdiaye/smoothed_concomitant_lasso |
Framework | none |
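A minimal sketch of the joint estimation idea, assuming an alternating scheme: a Lasso step whose penalty is scaled by the current noise estimate, then a closed-form noise update clipped below by a floor (the smoothing). It reuses scikit-learn's Lasso instead of the paper's dedicated coordinate-descent solver with screening rules, and the constants are illustrative.

```python
import numpy as np
from sklearn.linear_model import Lasso

def smoothed_concomitant_lasso(X, y, lam=0.1, sigma_floor=1e-2, n_iter=20):
    """Alternate between beta (Lasso with penalty lam*sigma) and sigma updates.

    sigma is the estimated noise level; clipping it at sigma_floor is the
    smoothing that avoids the numerical issues of the plain concomitant
    formulation when the residual shrinks towards zero.
    """
    n = len(y)
    sigma = max(np.std(y), sigma_floor)
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        lasso = Lasso(alpha=lam * sigma, fit_intercept=False, max_iter=10000).fit(X, y)
        beta = lasso.coef_
        sigma = max(np.linalg.norm(y - X @ beta) / np.sqrt(n), sigma_floor)
    return beta, sigma

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 200))
true = np.zeros(200)
true[:5] = 2.0
y = X @ true + 0.5 * rng.normal(size=100)
beta, sigma = smoothed_concomitant_lasso(X, y, lam=0.2)
print((np.abs(beta) > 1e-8).sum(), round(sigma, 3))     # selected features, noise estimate
```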
Trusting SVM for Piecewise Linear CNNs
Title | Trusting SVM for Piecewise Linear CNNs |
Authors | Leonard Berrada, Andrew Zisserman, M. Pawan Kumar |
Abstract | We present a novel layerwise optimization algorithm for the learning objective of Piecewise-Linear Convolutional Neural Networks (PL-CNNs), a large class of convolutional neural networks. Specifically, PL-CNNs employ piecewise linear non-linearities such as the commonly used ReLU and max-pool, and an SVM classifier as the final layer. The key observation of our approach is that the problem corresponding to the parameter estimation of a layer can be formulated as a difference-of-convex (DC) program, which happens to be a latent structured SVM. We optimize the DC program using the concave-convex procedure, which requires us to iteratively solve a structured SVM problem. This allows us to design an optimization algorithm with an optimal learning rate that does not require any tuning. Using the MNIST, CIFAR and ImageNet data sets, we show that our approach always improves over state-of-the-art variants of backpropagation and scales to large data and large network settings. |
Tasks | |
Published | 2016-11-07 |
URL | http://arxiv.org/abs/1611.02185v5 |
http://arxiv.org/pdf/1611.02185v5.pdf | |
PWC | https://paperswithcode.com/paper/trusting-svm-for-piecewise-linear-cnns |
Repo | https://github.com/lukasruff/Deep-SVDD |
Framework | pytorch |
Zoneout: Regularizing RNNs by Randomly Preserving Hidden Activations
Title | Zoneout: Regularizing RNNs by Randomly Preserving Hidden Activations |
Authors | David Krueger, Tegan Maharaj, János Kramár, Mohammad Pezeshki, Nicolas Ballas, Nan Rosemary Ke, Anirudh Goyal, Yoshua Bengio, Aaron Courville, Chris Pal |
Abstract | We propose zoneout, a novel method for regularizing RNNs. At each timestep, zoneout stochastically forces some hidden units to maintain their previous values. Like dropout, zoneout uses random noise to train a pseudo-ensemble, improving generalization. But by preserving instead of dropping hidden units, gradient information and state information are more readily propagated through time, as in feedforward stochastic depth networks. We perform an empirical investigation of various RNN regularizers, and find that zoneout gives significant performance improvements across tasks. We achieve competitive results with relatively simple models in character- and word-level language modelling on the Penn Treebank and Text8 datasets, and combining with recurrent batch normalization yields state-of-the-art results on permuted sequential MNIST. |
Tasks | Language Modelling |
Published | 2016-06-03 |
URL | http://arxiv.org/abs/1606.01305v4 |
http://arxiv.org/pdf/1606.01305v4.pdf | |
PWC | https://paperswithcode.com/paper/zoneout-regularizing-rnns-by-randomly |
Repo | https://github.com/weixsong/zoneout |
Framework | tf |
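A minimal PyTorch sketch of zoneout around a GRU cell: during training each hidden unit keeps its previous value with some probability, and at test time the expected update is used (analogous to dropout's inference rule). The zoneout probability and toy unroll are placeholders; for LSTMs the paper zones out cell and hidden states with separate probabilities.

```python
import torch
import torch.nn as nn

class ZoneoutGRUCell(nn.Module):
    """GRU cell where each hidden unit keeps its previous value with probability z_prob."""
    def __init__(self, input_size, hidden_size, z_prob=0.15):
        super().__init__()
        self.cell = nn.GRUCell(input_size, hidden_size)
        self.z_prob = z_prob

    def forward(self, x, h_prev):
        h_new = self.cell(x, h_prev)
        if self.training:
            # sample units that "zone out": they copy their previous activation
            keep_prev = torch.bernoulli(torch.full_like(h_prev, self.z_prob))
            return keep_prev * h_prev + (1 - keep_prev) * h_new
        # at test time use the expected update, as with dropout
        return self.z_prob * h_prev + (1 - self.z_prob) * h_new

cell = ZoneoutGRUCell(input_size=8, hidden_size=16)
h = torch.zeros(4, 16)
for t in range(5):                       # unroll over a toy sequence
    h = cell(torch.randn(4, 8), h)
print(h.shape)
```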
Learning Python Code Suggestion with a Sparse Pointer Network
Title | Learning Python Code Suggestion with a Sparse Pointer Network |
Authors | Avishkar Bhoopchand, Tim Rocktäschel, Earl Barr, Sebastian Riedel |
Abstract | To enhance developer productivity, all modern integrated development environments (IDEs) include code suggestion functionality that proposes likely next tokens at the cursor. While current IDEs work well for statically-typed languages, their reliance on type annotations means that they do not provide the same level of support for dynamically-typed languages. Moreover, suggestion engines in modern IDEs do not propose expressions or multi-statement idiomatic code. Recent work has shown that language models can improve code suggestion systems by learning from software repositories. This paper introduces a neural language model with a sparse pointer network aimed at capturing very long-range dependencies. We release a large-scale code suggestion corpus of 41M lines of Python code crawled from GitHub. On this corpus, we found standard neural language models to perform well at suggesting local phenomena, but struggle to refer to identifiers that are introduced many tokens in the past. By augmenting a neural language model with a pointer network specialized in referring to predefined classes of identifiers, we obtain a much lower perplexity and a 5 percentage point increase in accuracy for code suggestion compared to an LSTM baseline. In fact, this increase in code suggestion accuracy is due to 13 times more accurate prediction of identifiers. Furthermore, a qualitative analysis shows this model indeed captures interesting long-range dependencies, like referring to a class member defined over 60 tokens in the past. |
Tasks | Language Modelling |
Published | 2016-11-24 |
URL | http://arxiv.org/abs/1611.08307v1 |
http://arxiv.org/pdf/1611.08307v1.pdf | |
PWC | https://paperswithcode.com/paper/learning-python-code-suggestion-with-a-sparse |
Repo | https://github.com/SamuelGabriel/R252 |
Framework | tf |
Narrative Smoothing: Dynamic Conversational Network for the Analysis of TV Series Plots
Title | Narrative Smoothing: Dynamic Conversational Network for the Analysis of TV Series Plots |
Authors | Xavier Bost, Vincent Labatut, Serigne Gueye, Georges Linarès |
Abstract | Modern popular TV series often develop complex storylines spanning several seasons, but are usually watched in quite a discontinuous way. As a result, the viewer generally needs a comprehensive summary of the previous season's plot before the new one starts. Generating such summaries requires first identifying and characterizing the dynamics of the series' subplots. One way of doing so is to study the underlying social network of interactions between the characters involved in the narrative. The standard tools used in the Social Network Analysis field to extract such a network rely on an integration of time, either over the whole considered period or as a sequence of several time-slices. However, they turn out to be inappropriate for TV series, because the scenes shown onscreen alternately focus on parallel storylines and do not necessarily respect a traditional chronology. This makes existing extraction methods ill-suited to describing the dynamics of relationships between characters, or to providing a relevant instantaneous view of the current social state of the plot. This is especially true for characters shown interacting with each other at some earlier point in the plot but temporarily neglected by the narrative. In this article, we introduce narrative smoothing, a novel, still exploratory, network extraction method. It smooths the relationship dynamics based on properties of the plot, aiming to resolve some of the limitations of the standard approaches. To assess our method, we apply it to a new corpus of three popular TV series and compare it to both standard approaches. Our results are promising, showing that narrative smoothing leads to more relevant observations when it comes to characterizing the protagonists and their relationships. It could serve as a basis for further modeling of the intertwined storylines that constitute TV series plots. |
Tasks | |
Published | 2016-02-25 |
URL | https://arxiv.org/abs/1602.07811v5 |
https://arxiv.org/pdf/1602.07811v5.pdf | |
PWC | https://paperswithcode.com/paper/narrative-smoothing-dynamic-conversational |
Repo | https://github.com/bostxavier/Narrative-Smoothing |
Framework | none |
Parameterized Machine Learning for High-Energy Physics
Title | Parameterized Machine Learning for High-Energy Physics |
Authors | Pierre Baldi, Kyle Cranmer, Taylor Faucett, Peter Sadowski, Daniel Whiteson |
Abstract | We investigate a new structure for machine learning classifiers applied to problems in high-energy physics by expanding the inputs to include not only measured features but also physics parameters. The physics parameters represent a smoothly varying learning task, and the resulting parameterized classifier can smoothly interpolate between them and replace sets of classifiers trained at individual values. This simplifies the training process and gives improved performance at intermediate values, even for complex problems requiring deep learning. Applications include tools parameterized in terms of theoretical model parameters, such as the mass of a particle, which allow for a single network to provide improved discrimination across a range of masses. This concept is simple to implement and allows for optimized interpolatable results. |
Tasks | |
Published | 2016-01-28 |
URL | http://arxiv.org/abs/1601.07913v1 |
http://arxiv.org/pdf/1601.07913v1.pdf | |
PWC | https://paperswithcode.com/paper/parameterized-machine-learning-for-high |
Repo | https://github.com/zhuhel/Parameterized-DNN |
Framework | none |
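A minimal PyTorch sketch of a parameterized classifier: the network input is the measured features concatenated with the physics parameter (here a toy "mass"), training mixes a few parameter values, and evaluation probes an intermediate value never seen in training. The toy event generator and network sizes are invented for illustration.

```python
import numpy as np
import torch
import torch.nn as nn

def toy_events(n, mass):
    """Toy signal/background events whose signal feature distribution shifts with mass."""
    y = np.random.randint(0, 2, n)
    feats = np.random.normal(loc=y[:, None] * mass, scale=1.0, size=(n, 3))
    return feats.astype(np.float32), y.astype(np.float32), np.full((n, 1), mass, np.float32)

net = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 1))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
loss_fn = nn.BCEWithLogitsLoss()

for step in range(500):
    mass = float(np.random.choice([0.5, 1.0, 2.0]))   # train only at a few mass points
    x, y, m = toy_events(256, mass)
    inputs = torch.from_numpy(np.hstack([x, m]))      # measured features + parameter
    loss = loss_fn(net(inputs).squeeze(1), torch.from_numpy(y))
    opt.zero_grad()
    loss.backward()
    opt.step()

# evaluate at an intermediate mass never seen during training
x, y, m = toy_events(2000, 1.5)
with torch.no_grad():
    pred = torch.sigmoid(net(torch.from_numpy(np.hstack([x, m])))).squeeze(1) > 0.5
print("accuracy at mass=1.5:", (pred.numpy() == y).mean())
```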
Learning a No-Reference Quality Metric for Single-Image Super-Resolution
Title | Learning a No-Reference Quality Metric for Single-Image Super-Resolution |
Authors | Chao Ma, Chih-Yuan Yang, Xiaokang Yang, Ming-Hsuan Yang |
Abstract | Numerous single-image super-resolution algorithms have been proposed in the literature, but few studies address the problem of performance evaluation based on visual perception. While most super-resolution images are evaluated by full-reference metrics, their effectiveness is not clear and the required ground-truth images are not always available in practice. To address these problems, we conduct human subject studies using a large set of super-resolution images and propose a no-reference metric learned from visual perceptual scores. Specifically, we design three types of low-level statistical features in both spatial and frequency domains to quantify super-resolved artifacts, and learn a two-stage regression model to predict the quality scores of super-resolution images without referring to ground-truth images. Extensive experimental results show that the proposed metric is effective and efficient at assessing the quality of super-resolution images based on human perception. |
Tasks | Image Super-Resolution, Super-Resolution |
Published | 2016-12-18 |
URL | http://arxiv.org/abs/1612.05890v1 |
http://arxiv.org/pdf/1612.05890v1.pdf | |
PWC | https://paperswithcode.com/paper/learning-a-no-reference-quality-metric-for |
Repo | https://github.com/manricheon/eusr-pcl-tf |
Framework | tf |
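A rough sketch of the overall pipeline under loose assumptions: a few spatial and frequency statistics stand in for the paper's three feature types, and a single random-forest regressor stands in for its two-stage regression model; real training would use human perceptual scores rather than the random placeholders below.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def sr_features(img):
    """Very rough spatial and frequency statistics of a grayscale image in [0, 1]."""
    gx, gy = np.gradient(img)
    grad_mag = np.hypot(gx, gy)
    spectrum = np.abs(np.fft.fft2(img))
    high_freq = spectrum[spectrum.shape[0] // 4:, spectrum.shape[1] // 4:]
    return np.array([
        img.mean(), img.std(),                  # global intensity statistics
        grad_mag.mean(), grad_mag.std(),        # edge / ringing strength
        np.log1p(high_freq).mean(),             # high-frequency energy
    ])

# placeholder data: in the paper the targets are human perceptual ratings of SR outputs
rng = np.random.default_rng(0)
images = [rng.random((64, 64)) for _ in range(50)]
scores = rng.uniform(0, 10, size=50)

X = np.vstack([sr_features(im) for im in images])
model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, scores)
print(model.predict(X[:3]))
```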
A Simple Approach to Sparse Clustering
Title | A Simple Approach to Sparse Clustering |
Authors | Ery Arias-Castro, Xiao Pu |
Abstract | Consider the problem of sparse clustering, where it is assumed that only a subset of the features are useful for clustering purposes. In the framework of the COSA method of Friedman and Meulman, subsequently improved in the form of the Sparse K-means method of Witten and Tibshirani, a natural and simpler hill-climbing approach is introduced. The new method is shown to be competitive with these two methods and others. |
Tasks | |
Published | 2016-02-23 |
URL | http://arxiv.org/abs/1602.07277v2 |
http://arxiv.org/pdf/1602.07277v2.pdf | |
PWC | https://paperswithcode.com/paper/a-simple-approach-to-sparse-clustering |
Repo | https://github.com/victorpu/SAS_Hill_Climb |
Framework | none |
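A small sketch of a hill-climbing feature search for sparse k-means: toggle one feature at a time and keep the change if the fraction of variance explained by the clustering on the selected features improves. The acceptance criterion and search order are illustrative simplifications, not the exact procedure of the paper.

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_score(X, sel, n_clusters):
    """Fraction of variance in the selected features explained by the clustering."""
    if not sel:
        return 0.0
    Z = X[:, sel]
    labels = KMeans(n_clusters, n_init=10, random_state=0).fit_predict(Z)
    total = ((Z - Z.mean(0)) ** 2).sum()
    within = sum(((Z[labels == k] - Z[labels == k].mean(0)) ** 2).sum()
                 for k in np.unique(labels))
    return (total - within) / total

def hill_climb_sparse_kmeans(X, n_clusters=3, max_passes=5):
    selected = set()
    for _ in range(max_passes):
        improved = False
        for j in range(X.shape[1]):                 # try toggling each feature in turn
            trial = selected ^ {j}
            if cluster_score(X, sorted(trial), n_clusters) > cluster_score(
                    X, sorted(selected), n_clusters):
                selected, improved = trial, True
        if not improved:
            break
    return sorted(selected)

# toy data: only the first 2 of 10 features carry cluster structure
rng = np.random.default_rng(1)
centers = rng.normal(scale=4.0, size=(3, 2))
X = rng.normal(size=(300, 10))
X[:, :2] += centers[rng.integers(0, 3, size=300)]
print(hill_climb_sparse_kmeans(X))
```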
Semi-Supervised Learning with Context-Conditional Generative Adversarial Networks
Title | Semi-Supervised Learning with Context-Conditional Generative Adversarial Networks |
Authors | Emily Denton, Sam Gross, Rob Fergus |
Abstract | We introduce a simple semi-supervised learning approach for images based on in-painting using an adversarial loss. Images with random patches removed are presented to a generator whose task is to fill in the hole, based on the surrounding pixels. The in-painted images are then presented to a discriminator network that judges if they are real (unaltered training images) or not. This task acts as a regularizer for standard supervised training of the discriminator. Using our approach we are able to directly train large VGG-style networks in a semi-supervised fashion. We evaluate on STL-10 and PASCAL datasets, where our approach obtains performance comparable or superior to existing methods. |
Tasks | Image Classification, Semi-Supervised Image Classification |
Published | 2016-11-19 |
URL | http://arxiv.org/abs/1611.06430v1 |
http://arxiv.org/pdf/1611.06430v1.pdf | |
PWC | https://paperswithcode.com/paper/semi-supervised-learning-with-context |
Repo | https://github.com/eriklindernoren/PyTorch-GAN |
Framework | pytorch |
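A condensed PyTorch sketch of the training signal described above: a generator fills a randomly masked patch, a discriminator separates real from in-painted images, and the same discriminator body feeds a supervised classification head, so the adversarial task regularises the classifier. Architectures, patch size and losses are toy stand-ins for the paper's VGG-style setup.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def mask_random_patch(x, size=8):
    """Zero out a random square patch in each image of the batch."""
    x = x.clone()
    for i in range(x.size(0)):
        r, c = torch.randint(0, x.size(2) - size, (2,)).tolist()
        x[i, :, r:r + size, c:c + size] = 0
    return x

# toy generator (in-painter) and a shared discriminator body with two heads
gen = nn.Sequential(nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
                    nn.Conv2d(32, 3, 3, padding=1))
disc_body = nn.Sequential(nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
                          nn.Flatten())
real_head = nn.Linear(32 * 16 * 16, 1)     # real vs. in-painted
cls_head = nn.Linear(32 * 16 * 16, 10)     # the actual (semi-)supervised task

x = torch.rand(4, 3, 32, 32)                # toy labelled batch
labels = torch.randint(0, 10, (4,))

filled = gen(mask_random_patch(x))          # generator fills the holes
feat_real, feat_fake = disc_body(x), disc_body(filled.detach())
d_loss = (F.binary_cross_entropy_with_logits(real_head(feat_real), torch.ones(4, 1))
          + F.binary_cross_entropy_with_logits(real_head(feat_fake), torch.zeros(4, 1))
          + F.cross_entropy(cls_head(feat_real), labels))   # adversarial loss regularises the classifier
g_loss = F.binary_cross_entropy_with_logits(
    real_head(disc_body(gen(mask_random_patch(x)))), torch.ones(4, 1))
print(d_loss.item(), g_loss.item())
```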