Paper Group ANR 320
Rectifier Neural Network with a Dual-Pathway Architecture for Image Denoising. Understanding the Impact of Precision Quantization on the Accuracy and Energy of Neural Networks. Multimodal Classification of Events in Social Media. Knowledge Representation via Joint Learning of Sequential Text and Knowledge Graphs. Selecting the Selection. An improved chromosome formulation for genetic algorithms applied to variable selection with the inclusion of interaction terms. Automated Extraction of Number of Subjects in Randomised Controlled Trials. Finding Common Characteristics Among NBA Playoff and Championship Teams: A Machine Learning Approach. Deep Recurrent Neural Networks for Supernovae Classification. Distributed User Association in Energy Harvesting Small Cell Networks: A Probabilistic Model. Clustering with a Reject Option: Interactive Clustering as Bayesian Prior Elicitation. Optimally Pruning Decision Tree Ensembles With Feature Cost. Visual Congruent Ads for Image Search. Measuring the non-asymptotic convergence of sequential Monte Carlo samplers using probabilistic programming. Summary - TerpreT: A Probabilistic Programming Language for Program Induction.
Rectifier Neural Network with a Dual-Pathway Architecture for Image Denoising
Title | Rectifier Neural Network with a Dual-Pathway Architecture for Image Denoising |
Authors | Keting Zhang, Liqing Zhang |
Abstract | Recently, deep neural networks based on the tanh activation function have shown impressive power in image denoising. In this letter, we use the rectifier function in place of tanh and propose a dual-pathway rectifier neural network by combining two rectifier neurons with reversed input and output weights in the same hidden layer. We derive the equivalent activation function and compare it to some typical activation functions for image denoising under the same network architecture. The experimental results show that our model achieves superior performance more quickly, especially when the noise is small. |
Tasks | Denoising, Image Denoising |
Published | 2016-09-10 |
URL | http://arxiv.org/abs/1609.03024v2 |
PDF | http://arxiv.org/pdf/1609.03024v2.pdf |
PWC | https://paperswithcode.com/paper/rectifier-neural-network-with-a-dual-pathway |
Repo | |
Framework | |
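
The dual-pathway unit above admits a closed-form equivalent activation. Below is a minimal sketch under one plausible reading of the construction (two ReLU pathways with sign-reversed input and output weights plus an assumed shared bias `b`; the exact parameterization is the paper's, not reproduced here):

```python
import numpy as np

def dual_pathway(z, b=1.0):
    """Combined response of two rectifier (ReLU) units whose input and
    output weights have reversed signs, with an assumed shared bias b:
        g(z) = ReLU(z + b) - ReLU(-z + b)
    This is piecewise linear: slope 2 on |z| <= b, slope 1 outside.
    """
    return np.maximum(0.0, z + b) - np.maximum(0.0, -z + b)

z = np.linspace(-3.0, 3.0, 7)
print(dual_pathway(z))   # [-4. -3. -2.  0.  2.  3.  4.]
```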
Understanding the Impact of Precision Quantization on the Accuracy and Energy of Neural Networks
Title | Understanding the Impact of Precision Quantization on the Accuracy and Energy of Neural Networks |
Authors | Soheil Hashemi, Nicholas Anthony, Hokchhay Tann, R. Iris Bahar, Sherief Reda |
Abstract | Deep neural networks are gaining in popularity as they are used to generate state-of-the-art results for a variety of computer vision and machine learning applications. At the same time, these networks have grown in depth and complexity in order to solve harder problems. Given the limitations in the power budgets dedicated to these networks, the importance of low-power, low-memory solutions has been stressed in recent years. While a large number of dedicated hardware designs using different precisions have recently been proposed, there exists no comprehensive study of different bit precisions and arithmetic in both inputs and network parameters. In this work, we address this issue and perform a study of different bit precisions in neural networks (from floating point to fixed point, powers of two, and binary). In our evaluation, we consider and analyze the effect of precision scaling on both network accuracy and hardware metrics, including memory footprint, power and energy consumption, and design area. We also investigate training-time methodologies to compensate for the reduction in accuracy due to limited bit precision and demonstrate that in most cases, precision scaling can deliver significant benefits in design metrics at the cost of very modest decreases in network accuracy. In addition, we propose that a small portion of the benefits achieved when using lower precisions can be forfeited to increase the network size and therefore the accuracy. We conduct our experiments using three well-recognized networks and datasets to show the generality of our findings, and we investigate the trade-offs and highlight the benefits of using lower precisions in terms of energy and memory footprint. |
Tasks | Quantization |
Published | 2016-12-12 |
URL | http://arxiv.org/abs/1612.03940v1 |
PDF | http://arxiv.org/pdf/1612.03940v1.pdf |
PWC | https://paperswithcode.com/paper/understanding-the-impact-of-precision |
Repo | |
Framework | |
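
As a concrete illustration of the precision-scaling axes studied above, here is a hedged sketch of fixed-point and power-of-two weight quantization (bit widths are illustrative; the paper's evaluation covers more schemes and the associated hardware metrics):

```python
import numpy as np

def to_fixed_point(x, int_bits=2, frac_bits=6):
    """Quantize to signed fixed point with the given integer/fraction bits."""
    scale = 2.0 ** frac_bits
    lo, hi = -(2.0 ** int_bits), 2.0 ** int_bits - 1.0 / scale
    return np.clip(np.round(x * scale) / scale, lo, hi)

def to_power_of_two(x):
    """Round magnitudes to the nearest power of two, keeping the sign."""
    mag = np.abs(x)
    exp = np.round(np.log2(np.where(mag > 0, mag, 1e-12)))
    return np.sign(x) * 2.0 ** exp

w = np.random.randn(5).astype(np.float32)
print(w, to_fixed_point(w), to_power_of_two(w), sep="\n")
```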
Multimodal Classification of Events in Social Media
Title | Multimodal Classification of Events in Social Media |
Authors | Matthias Zeppelzauer, Daniel Schopfhauser |
Abstract | A large amount of social media hosted on platforms like Flickr and Instagram is related to social events. The task of social event classification refers to the distinction of event and non-event-related content as well as the classification of event types (e.g. sports events, concerts, etc.). In this paper, we provide an extensive study of textual, visual, as well as multimodal representations for social event classification. We investigate strengths and weaknesses of the modalities and study synergy effects between the modalities. Experimental results obtained with our multimodal representation outperform state-of-the-art methods and provide a new baseline for future research. |
Tasks | |
Published | 2016-01-04 |
URL | http://arxiv.org/abs/1601.00599v1 |
PDF | http://arxiv.org/pdf/1601.00599v1.pdf |
PWC | https://paperswithcode.com/paper/multimodal-classification-of-events-in-social |
Repo | |
Framework | |
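
For intuition about the multimodal representations compared above, a minimal early-fusion sketch (random stand-in features; the paper's actual visual and textual features and classifier choices differ):

```python
import numpy as np
from sklearn.svm import LinearSVC

# Toy stand-ins for the two modalities; real features would come from
# e.g. a CNN for images and a bag-of-words model for text.
rng = np.random.default_rng(0)
visual = rng.normal(size=(200, 64))
textual = rng.normal(size=(200, 300))
y = rng.integers(0, 2, size=200)        # event vs. non-event labels

fused = np.hstack([visual, textual])    # simple early (feature-level) fusion
clf = LinearSVC(max_iter=5000).fit(fused, y)
print(clf.score(fused, y))
```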
Knowledge Representation via Joint Learning of Sequential Text and Knowledge Graphs
Title | Knowledge Representation via Joint Learning of Sequential Text and Knowledge Graphs |
Authors | Jiawei Wu, Ruobing Xie, Zhiyuan Liu, Maosong Sun |
Abstract | Textual information is considered a significant supplement to knowledge representation learning (KRL). There are two main challenges for constructing knowledge representations from plain texts: (1) how to take full advantage of the sequential contexts of entities in plain texts for KRL; (2) how to dynamically select the informative sentences of the corresponding entities for KRL. In this paper, we propose Sequential Text-embodied Knowledge Representation Learning to build knowledge representations from multiple sentences. Given each reference sentence of an entity, we first utilize a recurrent neural network with pooling, or a long short-term memory network, to encode the semantic information of the sentence with respect to the entity. We then design an attention model to measure the informativeness of each sentence and build text-based representations of entities. We evaluate our method on two tasks: triple classification and link prediction. Experimental results demonstrate that our method outperforms the baselines on both tasks, which indicates that it is capable of selecting informative sentences and encoding the textual information well into knowledge representations. |
Tasks | Knowledge Graphs, Link Prediction, Representation Learning |
Published | 2016-09-22 |
URL | http://arxiv.org/abs/1609.07075v1 |
PDF | http://arxiv.org/pdf/1609.07075v1.pdf |
PWC | https://paperswithcode.com/paper/knowledge-representation-via-joint-learning |
Repo | |
Framework | |
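
The attention step described above (scoring each encoded sentence against its entity, then pooling) can be sketched as follows; a generic dot-product scorer is assumed, which may differ from the paper's scoring function:

```python
import numpy as np

def softmax(v):
    e = np.exp(v - v.max())
    return e / e.sum()

def entity_text_embedding(sentence_vecs, entity_vec):
    """Attention-weighted pooling of sentence encodings for one entity:
    each encoded sentence is scored against the entity embedding and the
    scores are normalized into attention weights."""
    scores = sentence_vecs @ entity_vec      # (num_sentences,)
    attn = softmax(scores)
    return attn @ sentence_vecs              # weighted sum of sentences

S = np.random.randn(5, 50)   # 5 encoded reference sentences
e = np.random.randn(50)      # current entity embedding
print(entity_text_embedding(S, e).shape)     # (50,)
```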
Selecting the Selection
Title | Selecting the Selection |
Authors | Giles Reger, Martin Suda, Andrei Voronkov, Krystof Hoder |
Abstract | Modern saturation-based Automated Theorem Provers typically implement the superposition calculus for reasoning about first-order logic with or without equality. Practical implementations of this calculus use a variety of literal selections and term orderings to tame the growth of the search space and help steer proof search. This paper introduces the notion of lookahead selection that estimates (looks ahead) the effect on the search space of selecting a literal. There is also a case made for the use of incomplete selection functions that attempt to restrict the search space instead of satisfying some completeness criteria. Experimental evaluation in the Vampire theorem prover shows that both lookahead selection and incomplete selection significantly contribute to solving hard problems unsolvable by other methods. |
Tasks | |
Published | 2016-04-27 |
URL | http://arxiv.org/abs/1604.08055v1 |
PDF | http://arxiv.org/pdf/1604.08055v1.pdf |
PWC | https://paperswithcode.com/paper/selecting-the-selection |
Repo | |
Framework | |
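
A toy rendering of lookahead selection as described above: among a clause's literals, pick the one whose selection would enable the fewest immediate inferences (the counting callback is a stand-in for the prover's term-indexing machinery):

```python
def lookahead_select(literals, count_inferences):
    """One-step lookahead: choose the literal that, if selected, would
    generate the fewest new clauses, thereby restricting the search space."""
    return min(literals, key=count_inferences)

# Toy usage with precomputed inference counts per literal.
counts = {"P(x)": 12, "~Q(f(x))": 3, "R(x, y)": 7}
print(lookahead_select(list(counts), counts.get))   # ~Q(f(x))
```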
An improved chromosome formulation for genetic algorithms applied to variable selection with the inclusion of interaction terms
Title | An improved chromosome formulation for genetic algorithms applied to variable selection with the inclusion of interaction terms |
Authors | Chee Chun Gan, Gerard Learmonth |
Abstract | Genetic algorithms are a well-known method for tackling the problem of variable selection. As they are non-parametric and can use a large variety of fitness functions, they are well-suited as a variable selection wrapper that can be applied to many different models. In almost all cases, the chromosome formulation used in these genetic algorithms consists of a binary vector of length n, for n potential variables, indicating the presence or absence of the corresponding variables. While this formulation has exhibited good performance for relatively small n, there are potential problems when n grows very large, especially when interaction terms are considered. We introduce a modification to the standard chromosome formulation that allows for better scalability and model sparsity when interaction terms are included in the predictor search space. Experimental results show that the indexed chromosome formulation demonstrates improved computational efficiency and sparsity on high-dimensional datasets with interaction terms compared to the standard chromosome formulation. |
Tasks | |
Published | 2016-04-22 |
URL | http://arxiv.org/abs/1604.06727v1 |
PDF | http://arxiv.org/pdf/1604.06727v1.pdf |
PWC | https://paperswithcode.com/paper/an-improved-chromosome-formulation-for |
Repo | |
Framework | |
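
The indexed formulation described above can be sketched directly: a chromosome is a short list of selected variable indices rather than a length-n binary mask (the operators below are illustrative, not the paper's exact design):

```python
import random

N_VARS = 10_000     # predictor space, incl. interaction terms
CHROM_LEN = 20      # fixed number of selected-variable slots

def random_chromosome():
    """Indexed chromosome: CHROM_LEN variable indices out of N_VARS."""
    return random.sample(range(N_VARS), CHROM_LEN)

def crossover(a, b):
    cut = random.randrange(1, CHROM_LEN)
    child = a[:cut] + [g for g in b if g not in a[:cut]]
    return child[:CHROM_LEN]

def mutate(chrom, rate=0.05):
    # duplicates are tolerated in this sketch
    return [random.randrange(N_VARS) if random.random() < rate else g
            for g in chrom]

print(mutate(crossover(random_chromosome(), random_chromosome())))
```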
Automated Extraction of Number of Subjects in Randomised Controlled Trials
Title | Automated Extraction of Number of Subjects in Randomised Controlled Trials |
Authors | Abeed Sarker |
Abstract | We present a simple approach for automatically extracting the number of subjects involved in randomised controlled trials (RCT). Our approach first applies a set of rule-based techniques to extract candidate study sizes from the abstracts of the articles. Supervised classification is then performed over the candidates with support vector machines, using a small set of lexical, structural, and contextual features. With only a small annotated training set of 201 RCTs, we obtained an accuracy of 88%. We believe that this system will aid complex medical text processing tasks such as summarisation and question answering. |
Tasks | Question Answering |
Published | 2016-06-22 |
URL | http://arxiv.org/abs/1606.07137v1 |
PDF | http://arxiv.org/pdf/1606.07137v1.pdf |
PWC | https://paperswithcode.com/paper/automated-extraction-of-number-of-subjects-in |
Repo | |
Framework | |
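
The two-stage pipeline above (rule-based candidate extraction, then supervised filtering) might look roughly like this; the regular expression and the two toy features are illustrative, not the paper's feature set:

```python
import re
from sklearn.svm import SVC

# Stage 1: rule-based extraction of candidate study sizes.
PATTERN = re.compile(r"(\d[\d,]*)\s+(?:patients|subjects|participants)")

def candidates(abstract):
    return [int(m.group(1).replace(",", "")) for m in PATTERN.finditer(abstract)]

text = "We randomised 1,245 patients; 620 subjects received the drug."
cands = candidates(text)                       # [1245, 620]

# Stage 2: classify candidates with simple magnitude/position features.
X = [[c, i] for i, c in enumerate(cands)]
y = [1, 0]                                     # 1 = the true study size
clf = SVC(kernel="linear").fit(X, y)
print(clf.predict(X))
```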
Finding Common Characteristics Among NBA Playoff and Championship Teams: A Machine Learning Approach
Title | Finding Common Characteristics Among NBA Playoff and Championship Teams: A Machine Learning Approach |
Authors | Ikjyot Singh Kohli |
Abstract | In this paper, we employ machine learning techniques to analyze seventeen seasons (1999-2000 to 2015-2016) of NBA regular season data from every team to determine the common characteristics among NBA playoff teams. Each team was characterized by 26 predictor variables and one binary response variable taking on a value of “TRUE” if a team had made the playoffs, and a value of “FALSE” if a team had missed the playoffs. After fitting an initial classification tree to this problem, the tree was pruned, which decreased the test error rate. Further, a random forest of classification trees was grown, which provided a very accurate model from which a variable importance plot was generated to determine which predictor variables had the greatest influence on the response variable. The result of this work was the conclusion that the most important factors in characterizing a team’s playoff eligibility are a team’s opponent number of assists per game, a team’s opponent number of made two-point shots per game, and a team’s number of steals per game. This suggests that defensive factors, as opposed to offensive factors, are the most important characteristics shared among NBA playoff teams. We then use neural networks to classify championship teams based on regular season data. From this, we show that the most important factor in a team not winning a championship is that team’s opponent number of made three-point shots per game. This once again implies that defensive characteristics are of great importance, not only in determining a team’s playoff eligibility: a lack of perimeter defense negatively impacts a team’s championship chances in a given season. Further, it is shown that made two-point shots and defensive rebounding are by far the most important factors in a team’s chances of winning a championship in a given season. |
Tasks | |
Published | 2016-04-18 |
URL | http://arxiv.org/abs/1604.05266v7 |
PDF | http://arxiv.org/pdf/1604.05266v7.pdf |
PWC | https://paperswithcode.com/paper/finding-common-characteristics-among-nba |
Repo | |
Framework | |
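
The random-forest importance analysis described above follows a standard recipe, sketched here on random stand-in data (26 predictors, binary playoff response):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 26))        # 26 team-level predictor variables
y = rng.integers(0, 2, size=500)      # made the playoffs or not

forest = RandomForestClassifier(n_estimators=500, random_state=0).fit(X, y)
order = np.argsort(forest.feature_importances_)[::-1]
print("most influential predictor columns:", order[:3])
```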
Deep Recurrent Neural Networks for Supernovae Classification
Title | Deep Recurrent Neural Networks for Supernovae Classification |
Authors | Tom Charnock, Adam Moss |
Abstract | We apply deep recurrent neural networks, which are capable of learning complex sequential information, to classify supernovae (code available at https://github.com/adammoss/supernovae). The observational time and filter fluxes are used as inputs to the network, but since the inputs are agnostic, additional data such as host galaxy information can also be included. Using the Supernovae Photometric Classification Challenge (SPCC) data, we find that deep networks are capable of learning about light curves; however, the performance of the network is highly sensitive to the amount of training data. For a training size of 50% of the representational SPCC dataset (around $10^4$ supernovae) we obtain a type-Ia vs. non-type-Ia classification accuracy of 94.7%, an area under the Receiver Operating Characteristic curve (AUC) of 0.986, and an SPCC figure-of-merit $F_1=0.64$. When using only the data for the early-epoch challenge defined by the SPCC, we achieve a classification accuracy of 93.1%, AUC of 0.977, and $F_1=0.58$, results almost as good as with the whole light curve. By employing bidirectional neural networks we obtain classification results between supernovae types I, II, and III at an accuracy of 90.4% and AUC of 0.974. We also apply a pre-trained model to obtain classification probabilities as a function of time, and show it can give early indications of supernova type. Our method is competitive with existing algorithms and has applications for future large-scale photometric surveys. |
Tasks | |
Published | 2016-06-23 |
URL | http://arxiv.org/abs/1606.07442v2 |
PDF | http://arxiv.org/pdf/1606.07442v2.pdf |
PWC | https://paperswithcode.com/paper/deep-recurrent-neural-networks-for-supernovae |
Repo | |
Framework | |
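
A minimal bidirectional recurrent classifier in the spirit of the paper (layer sizes and the per-step feature count are assumptions, not the published architecture):

```python
import tensorflow as tf

# Each time step carries (time, per-filter flux) features of a light curve.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(None, 5)),                  # variable-length sequences
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(16)),
    tf.keras.layers.Dense(2, activation="softmax"),   # type Ia vs. non-Ia
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```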
Distributed User Association in Energy Harvesting Small Cell Networks: A Probabilistic Model
Title | Distributed User Association in Energy Harvesting Small Cell Networks: A Probabilistic Model |
Authors | Setareh Maghsudi, Ekram Hossain |
Abstract | We consider a distributed downlink user association problem in a small cell network, where small cells obtain the energy required for providing wireless services to users through ambient energy harvesting. Since energy harvesting is opportunistic in nature, the amount of harvested energy is a random variable with no a priori known characteristics. Moreover, since users arrive in the network randomly and require different wireless services, the energy consumption is a random variable as well. In this paper, we propose a probabilistic framework to mathematically model and analyze the random behavior of energy harvesting and energy consumption in dense small cell networks. Furthermore, as acquiring even statistical channel and network knowledge is very costly in a distributed dense network, we develop a bandit-theoretical formulation for distributed user association when no information is available at the users. |
Tasks | |
Published | 2016-01-27 |
URL | http://arxiv.org/abs/1601.07795v1 |
PDF | http://arxiv.org/pdf/1601.07795v1.pdf |
PWC | https://paperswithcode.com/paper/distributed-user-association-in-energy |
Repo | |
Framework | |
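
Since the paper casts user association with no prior information as a bandit problem, here is a sketch using the standard UCB1 index policy as a stand-in (the paper develops its own bandit formulation):

```python
import math, random

def ucb1(n_cells, rounds, reward):
    """A user repeatedly picks a small cell, balancing exploration and
    exploitation; `reward` returns a payoff in [0, 1], e.g. service
    success under the cell's random harvested-energy state."""
    counts, means = [0] * n_cells, [0.0] * n_cells
    for t in range(1, rounds + 1):
        if t <= n_cells:
            cell = t - 1                       # try every cell once
        else:
            cell = max(range(n_cells), key=lambda k:
                       means[k] + math.sqrt(2 * math.log(t) / counts[k]))
        r = reward(cell)
        counts[cell] += 1
        means[cell] += (r - means[cell]) / counts[cell]
    return means

print(ucb1(3, 1000, lambda k: random.random() * (k + 1) / 3))
```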
Clustering with a Reject Option: Interactive Clustering as Bayesian Prior Elicitation
Title | Clustering with a Reject Option: Interactive Clustering as Bayesian Prior Elicitation |
Authors | Akash Srivastava, James Zou, Charles Sutton |
Abstract | A good clustering can help a data analyst to explore and understand a data set, but what constitutes a good clustering may depend on domain-specific and application-specific criteria. These criteria can be difficult to formalize, even when it is easy for an analyst to know a good clustering when she sees one. We present a new approach to interactive clustering for data exploration, called CIIF, based on a particularly simple feedback mechanism, in which an analyst can choose to reject individual clusters and request new ones. The new clusters should be different from previously rejected clusters while still fitting the data well. We formalize this interaction in a novel Bayesian prior elicitation framework. In each iteration, the prior is adapted to account for all the previous feedback, and a new clustering is then produced from the posterior distribution. To achieve the computational efficiency necessary for an interactive setting, we propose an incremental optimization method over data minibatches using Lagrangian relaxation. Experiments demonstrate that CIIF can produce accurate and diverse clusterings. |
Tasks | |
Published | 2016-02-22 |
URL | http://arxiv.org/abs/1602.06886v2 |
PDF | http://arxiv.org/pdf/1602.06886v2.pdf |
PWC | https://paperswithcode.com/paper/clustering-with-a-reject-option-interactive-1 |
Repo | |
Framework | |
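
A crude way to mimic the reject-and-request loop (plain k-means restarts scored against rejected centers; the paper instead adapts a Bayesian prior and samples a new clustering from the posterior):

```python
import numpy as np
from sklearn.cluster import KMeans

def refit_avoiding(X, k, rejected_centers, tries=20):
    """Re-run k-means over several seeds and keep the clustering whose
    centers are farthest from previously rejected ones while still
    fitting the data well (low inertia)."""
    best, best_score = None, -np.inf
    for seed in range(tries):
        km = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(X)
        dist = min((np.linalg.norm(c - r) for c in km.cluster_centers_
                    for r in rejected_centers), default=0.0)
        score = dist - 1e-3 * km.inertia_
        if score > best_score:
            best, best_score = km, score
    return best

X = np.random.randn(300, 2)
km1 = KMeans(n_clusters=3, n_init=10).fit(X)
km2 = refit_avoiding(X, 3, [km1.cluster_centers_[0]])  # analyst rejects cluster 0
```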
Optimally Pruning Decision Tree Ensembles With Feature Cost
Title | Optimally Pruning Decision Tree Ensembles With Feature Cost |
Authors | Feng Nan, Joseph Wang, Venkatesh Saligrama |
Abstract | We consider the problem of learning decision rules for prediction with feature budget constraint. In particular, we are interested in pruning an ensemble of decision trees to reduce expected feature cost while maintaining high prediction accuracy for any test example. We propose a novel 0-1 integer program formulation for ensemble pruning. Our pruning formulation is general: it takes any ensemble of decision trees as input. By explicitly accounting for feature-sharing across trees together with accuracy/cost trade-off, our method is able to significantly reduce feature cost by pruning subtrees that introduce more loss in terms of feature cost than benefit in terms of prediction accuracy gain. Theoretically, we prove that a linear programming relaxation produces the exact solution of the original integer program. This allows us to use efficient convex optimization tools to obtain an optimally pruned ensemble for any given budget. Empirically, we see that our pruning algorithm significantly improves the performance of the state of the art ensemble method BudgetRF. |
Tasks | |
Published | 2016-01-05 |
URL | http://arxiv.org/abs/1601.00955v1 |
PDF | http://arxiv.org/pdf/1601.00955v1.pdf |
PWC | https://paperswithcode.com/paper/optimally-pruning-decision-tree-ensembles |
Repo | |
Framework | |
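
The LP relaxation mentioned above can be seen in miniature with scipy (per-tree costs and accuracy contributions are toy numbers; the paper prunes at subtree granularity and models feature sharing across trees):

```python
import numpy as np
from scipy.optimize import linprog

cost = np.array([4.0, 2.0, 7.0, 1.0, 3.0, 5.0])  # expected feature cost per tree
acc  = np.array([0.8, 0.5, 0.9, 0.3, 0.6, 0.7])  # accuracy contribution per tree

# Relax the 0-1 program: minimize total cost s.t. total accuracy >= 2.0,
# with each keep-variable in [0, 1].
res = linprog(c=cost, A_ub=-acc[None, :], b_ub=[-2.0], bounds=[(0, 1)] * 6)
print(res.x.round(3))   # integral solutions recover the exact 0-1 optimum
```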
Visual Congruent Ads for Image Search
Title | Visual Congruent Ads for Image Search |
Authors | Yannis Kalantidis, Ayman Farahat, Lyndon Kennedy, Ricardo Baeza-Yates, David A. Shamma |
Abstract | The quality of user experience online is affected by the relevance and placement of advertisements. We propose a new system for selecting and displaying visual advertisements in image search result sets. Our method compares the visual similarity of candidate ads to the image search results and selects the most visually similar ad to be displayed. The method further selects an appropriate location in the displayed image grid to minimize the perceptual visual differences between the ad and its neighbors. We conduct an experiment with about 900 users and find that our proposed method provides significant improvement in the users’ overall satisfaction with the image search experience, without diminishing the users’ ability to see the ad or recall the advertised brand. |
Tasks | Image Retrieval |
Published | 2016-04-21 |
URL | http://arxiv.org/abs/1604.06481v1 |
PDF | http://arxiv.org/pdf/1604.06481v1.pdf |
PWC | https://paperswithcode.com/paper/visual-congruent-ads-for-image-search |
Repo | |
Framework | |
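
The two selection steps above (pick the most visually similar ad, then the best slot among its neighbours) can be sketched with cosine similarity on visual features; for brevity, only left/right neighbours in the grid are scored:

```python
import numpy as np

def place_ad(result_feats, ad_feats):
    """Pick the candidate ad closest to the result set's visual centroid,
    then the grid slot whose neighbours it matches best."""
    def cos(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    centroid = result_feats.mean(axis=0)
    ad = max(range(len(ad_feats)), key=lambda i: cos(ad_feats[i], centroid))
    def slot_score(s):
        nbrs = [n for n in (s - 1, s + 1) if 0 <= n < len(result_feats)]
        return np.mean([cos(ad_feats[ad], result_feats[n]) for n in nbrs])
    slot = max(range(len(result_feats)), key=slot_score)
    return ad, slot

R = np.random.rand(12, 128)   # visual features of 12 image results
A = np.random.rand(5, 128)    # visual features of 5 candidate ads
print(place_ad(R, A))
```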
Measuring the non-asymptotic convergence of sequential Monte Carlo samplers using probabilistic programming
Title | Measuring the non-asymptotic convergence of sequential Monte Carlo samplers using probabilistic programming |
Authors | Marco F. Cusumano-Towner, Vikash K. Mansinghka |
Abstract | A key limitation of sampling algorithms for approximate inference is that it is difficult to quantify their approximation error. Widely used sampling schemes, such as sequential importance sampling with resampling and Metropolis-Hastings, produce output samples drawn from a distribution that may be far from the target posterior distribution. This paper shows how to upper-bound the symmetric KL divergence between the output distribution of a broad class of sequential Monte Carlo (SMC) samplers and their target posterior distributions, subject to assumptions about the accuracy of a separate gold-standard sampler. The proposed method applies to samplers that combine multiple particles, multinomial resampling, and rejuvenation kernels. The experiments show the technique being used to estimate bounds on the divergence of SMC samplers for posterior inference in a Bayesian linear regression model and a Dirichlet process mixture model. |
Tasks | Probabilistic Programming |
Published | 2016-12-07 |
URL | http://arxiv.org/abs/1612.02161v2 |
PDF | http://arxiv.org/pdf/1612.02161v2.pdf |
PWC | https://paperswithcode.com/paper/measuring-the-non-asymptotic-convergence-of |
Repo | |
Framework | |
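
For reference, the quantity the method upper-bounds is the symmetric KL divergence between the sampler's output distribution $p$ and the target posterior $q$:

```latex
D_{\mathrm{sym}}(p, q) \;=\; \mathrm{KL}(p \,\|\, q) + \mathrm{KL}(q \,\|\, p)
\;=\; \mathbb{E}_{p}\!\left[\log\frac{p(x)}{q(x)}\right]
    + \mathbb{E}_{q}\!\left[\log\frac{q(x)}{p(x)}\right].
```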
Summary - TerpreT: A Probabilistic Programming Language for Program Induction
Title | Summary - TerpreT: A Probabilistic Programming Language for Program Induction |
Authors | Alexander L. Gaunt, Marc Brockschmidt, Rishabh Singh, Nate Kushman, Pushmeet Kohli, Jonathan Taylor, Daniel Tarlow |
Abstract | We study machine learning formulations of inductive program synthesis; that is, given input-output examples, synthesize source code that maps inputs to corresponding outputs. Our key contribution is TerpreT, a domain-specific language for expressing program synthesis problems. A TerpreT model is composed of a specification of a program representation and an interpreter that describes how programs map inputs to outputs. The inference task is to observe a set of input-output examples and infer the underlying program. From a TerpreT model we automatically perform inference using four different back-ends: gradient descent (thus each TerpreT model can be seen as defining a differentiable interpreter), linear program (LP) relaxations for graphical models, discrete satisfiability solving, and the Sketch program synthesis system. TerpreT has two main benefits. First, it enables rapid exploration of a range of domains, program representations, and interpreter models. Second, it separates the model specification from the inference algorithm, allowing proper comparisons between different approaches to inference. We illustrate the value of TerpreT by developing several interpreter models and performing an extensive empirical comparison between alternative inference algorithms on a variety of program models. To our knowledge, this is the first work to compare gradient-based search over program space to traditional search-based alternatives. Our key empirical finding is that constraint solvers dominate the gradient descent and LP-based formulations. This is a workshop summary of a longer report at arXiv:1608.04428. |
Tasks | Probabilistic Programming, Program Synthesis |
Published | 2016-12-02 |
URL | http://arxiv.org/abs/1612.00817v1 |
PDF | http://arxiv.org/pdf/1612.00817v1.pdf |
PWC | https://paperswithcode.com/paper/summary-terpret-a-probabilistic-programming |
Repo | |
Framework | |
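
For intuition about the inference task TerpreT targets (infer a program from input-output examples), here is a naive enumerate-and-check baseline; it is deliberately generic and is not TerpreT's modeling language or any of its four back-ends:

```python
import itertools

OPS = {"add1": lambda x: x + 1, "dbl": lambda x: 2 * x, "neg": lambda x: -x}

def synthesize(examples, max_len=3):
    """Search sequences of primitive ops until one maps every input to
    its corresponding output."""
    for n in range(1, max_len + 1):
        for prog in itertools.product(OPS, repeat=n):
            def run(x, prog=prog):
                for op in prog:
                    x = OPS[op](x)
                return x
            if all(run(i) == o for i, o in examples):
                return prog
    return None

print(synthesize([(1, 4), (3, 8)]))   # ('add1', 'dbl')
```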