Paper Group ANR 193
Railway Track Specific Traffic Signal Selection Using Deep Learning
Title | Railway Track Specific Traffic Signal Selection Using Deep Learning |
Authors | S Ritika, Shruti Mittal, Dattaraj Rao |
Abstract | With the railway transportation industry moving actively towards automation, accurate location and inventory of wayside track assets like traffic signals, crossings, switches, mileposts, etc. is of extreme importance. With the new Positive Train Control (PTC) regulation coming into effect, many railway safety rules will be tied directly to the location of assets like mileposts and signals. Newer speed regulations will be enforced based on the location of the train with respect to a wayside asset. Hence it is essential for the railroads to have an accurate database of the types and locations of these assets. This paper describes a real-world use case of detecting railway signals from a camera mounted on a moving locomotive and tracking their locations. The camera is engineered to withstand the environmental factors on a moving train and provide a consistent, steady image at around 30 frames per second. Using advanced image analysis and deep learning techniques, signals are detected in these camera images and a database of their locations is created. Railway signals differ a lot from road signals in terms of shapes and rules for placement with respect to the track. Due to space constraints and traffic densities in urban areas, signals are not placed on the same side of the track and multiple lines can run in parallel. Hence there is a need to associate each detected signal with the track on which the train runs. We present a method to associate the signals to the specific track they belong to using a video feed from the front-facing camera mounted on the lead locomotive. A pipeline of track detection, region of interest selection, and signal detection has been implemented which gives an overall accuracy of 94.7% on a route covering 150km with 247 signals. |
Tasks | |
Published | 2017-12-17 |
URL | http://arxiv.org/abs/1712.06107v1 |
http://arxiv.org/pdf/1712.06107v1.pdf | |
PWC | https://paperswithcode.com/paper/railway-track-specific-traffic-signal |
Repo | |
Framework | |
Creating and Characterizing a Diverse Corpus of Sarcasm in Dialogue
Title | Creating and Characterizing a Diverse Corpus of Sarcasm in Dialogue |
Authors | Shereen Oraby, Vrindavan Harrison, Lena Reed, Ernesto Hernandez, Ellen Riloff, Marilyn Walker |
Abstract | The use of irony and sarcasm in social media allows us to study them at scale for the first time. However, their diversity has made it difficult to construct a high-quality corpus of sarcasm in dialogue. Here, we describe the process of creating a large-scale, highly-diverse corpus of online debate forums dialogue, and our novel methods for operationalizing classes of sarcasm in the form of rhetorical questions and hyperbole. We show that we can use lexico-syntactic cues to reliably retrieve sarcastic utterances with high accuracy. To demonstrate the properties and quality of our corpus, we conduct supervised learning experiments with simple features, and show that we achieve both higher precision and F than previous work on sarcasm in debate forums dialogue. We apply a weakly-supervised linguistic pattern learner and qualitatively analyze the linguistic differences in each class. |
Tasks | |
Published | 2017-09-15 |
URL | http://arxiv.org/abs/1709.05404v1 |
http://arxiv.org/pdf/1709.05404v1.pdf | |
PWC | https://paperswithcode.com/paper/creating-and-characterizing-a-diverse-corpus |
Repo | |
Framework | |
Acoustic Feature Learning via Deep Variational Canonical Correlation Analysis
Title | Acoustic Feature Learning via Deep Variational Canonical Correlation Analysis |
Authors | Qingming Tang, Weiran Wang, Karen Livescu |
Abstract | We study the problem of acoustic feature learning in the setting where we have access to another (non-acoustic) modality for feature learning but not at test time. We use deep variational canonical correlation analysis (VCCA), a recently proposed deep generative method for multi-view representation learning. We also extend VCCA with improved latent variable priors and with adversarial learning. Compared to other techniques for multi-view feature learning, VCCA’s advantages include an intuitive latent variable interpretation and a variational lower bound objective that can be trained end-to-end efficiently. We compare VCCA and its extensions with previous feature learning methods on the University of Wisconsin X-ray Microbeam Database, and show that VCCA-based feature learning improves over previous methods for speaker-independent phonetic recognition. |
Tasks | Representation Learning |
Published | 2017-08-11 |
URL | http://arxiv.org/abs/1708.04673v2 |
http://arxiv.org/pdf/1708.04673v2.pdf | |
PWC | https://paperswithcode.com/paper/acoustic-feature-learning-via-deep |
Repo | |
Framework | |
Probabilistic Synchronous Parallel
Title | Probabilistic Synchronous Parallel |
Authors | Liang Wang, Ben Catterall, Richard Mortier |
Abstract | Most machine learning and deep neural network algorithms rely on certain iterative algorithms to optimise their utility/cost functions, e.g. Stochastic Gradient Descent. In distributed learning, the networked nodes have to work collaboratively to update the model parameters, and the way they proceed is referred to as synchronous parallel design (or barrier control). Synchronous parallel protocol is the building block of any distributed learning framework, and its design has direct impact on the performance and scalability of the system. In this paper, we propose a new barrier control technique - Probabilistic Synchronous Parallel (PSP). Compared to the previous Bulk Synchronous Parallel (BSP), Stale Synchronous Parallel (SSP), and Asynchronous Parallel (ASP), the proposed solution effectively improves both the convergence speed and the scalability of the SGD algorithm by introducing a sampling primitive into the system. Moreover, we also show that the sampling primitive can be applied atop the existing barrier control mechanisms to derive fully distributed PSP-based synchronous parallel. We not only provide a thorough theoretical analysis on the convergence of PSP-based SGD algorithm, but also implement a full-featured distributed learning framework called Actor and perform intensive evaluation atop it. |
Tasks | |
Published | 2017-09-22 |
URL | http://arxiv.org/abs/1709.07772v2 |
http://arxiv.org/pdf/1709.07772v2.pdf | |
PWC | https://paperswithcode.com/paper/probabilistic-synchronous-parallel |
Repo | |
Framework | |
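The barrier-control schemes named in the abstract above can be contrasted with a small sketch. This is an illustrative reading only, not the authors' Actor framework: the function `can_advance`, its parameters, and the idea of passing each worker a list of peer step counters are all assumptions made for this example. A staleness of 0 behaves like BSP, a positive bound like SSP, an infinite bound like ASP, and supplying `sample_size` applies the same check to a random subset of peers, which is the sampling idea the abstract attributes to PSP.

```python
import random


def can_advance(my_step, peer_steps, staleness=0, sample_size=None, rng=random):
    """Return True if a worker that finished `my_step` may start its next step.

    All names here are hypothetical, chosen for illustration.
    peer_steps  : steps completed by the other workers
    staleness   : 0 acts like BSP, a positive bound like SSP,
                  float("inf") like ASP
    sample_size : if given, only a random subset of peers is inspected
                  (the sampling primitive described for PSP)
    """
    observed = list(peer_steps)
    if sample_size is not None and sample_size < len(observed):
        # Inspect only a random sample instead of every peer.
        observed = rng.sample(observed, sample_size)
    if not observed:
        # Nothing to wait for.
        return True
    # Proceed only if the slowest observed peer is within the allowed lag.
    return my_step - min(observed) <= staleness


if __name__ == "__main__":
    peers = [5, 5, 3]
    print("BSP-like:", can_advance(5, peers))
    print("SSP-like:", can_advance(5, peers, staleness=2))
    print("ASP-like:", can_advance(5, peers, staleness=float("inf")))
    print("PSP-like:", can_advance(5, peers, sample_size=2))
```

With sampling, the decision depends on which peers happen to be drawn, so a straggler delays only the workers that sample it; that trade-off is what the sketch is meant to make visible.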
Multi-level Representations for Fine-Grained Typing of Knowledge Base Entities
Title | Multi-level Representations for Fine-Grained Typing of Knowledge Base Entities |
Authors | Yadollah Yaghoobzadeh, Hinrich Schütze |
Abstract | Entities are essential elements of natural language. In this paper, we present methods for learning multi-level representations of entities on three complementary levels: character (character patterns in entity names extracted, e.g., by neural networks), word (embeddings of words in entity names) and entity (entity embeddings). We investigate state-of-the-art learning methods on each level and find large differences, e.g., for deep learning models, traditional ngram features and the subword model of fasttext (Bojanowski et al., 2016) on the character level; for word2vec (Mikolov et al., 2013) on the word level; and for the order-aware model wang2vec (Ling et al., 2015a) on the entity level. We confirm experimentally that each level of representation contributes complementary information and a joint representation of all three levels improves the existing embedding based baseline for fine-grained entity typing by a large margin. Additionally, we show that adding information from entity descriptions further improves multi-level representations of entities. |
Tasks | Entity Embeddings, Entity Typing, Word Embeddings |
Published | 2017-01-08 |
URL | http://arxiv.org/abs/1701.02025v2 |
http://arxiv.org/pdf/1701.02025v2.pdf | |
PWC | https://paperswithcode.com/paper/multi-level-representations-for-fine-grained |
Repo | |
Framework | |
Sparse Markov Decision Processes with Causal Sparse Tsallis Entropy Regularization for Reinforcement Learning
Title | Sparse Markov Decision Processes with Causal Sparse Tsallis Entropy Regularization for Reinforcement Learning |
Authors | Kyungjae Lee, Sungjoon Choi, Songhwai Oh |
Abstract | In this paper, a sparse Markov decision process (MDP) with novel causal sparse Tsallis entropy regularization is proposed. The proposed policy regularization induces a sparse and multi-modal optimal policy distribution of a sparse MDP. The full mathematical analysis of the proposed sparse MDP is provided. We first analyze the optimality condition of a sparse MDP. Then, we propose a sparse value iteration method which solves a sparse MDP and then prove the convergence and optimality of sparse value iteration using the Banach fixed point theorem. The proposed sparse MDP is compared to soft MDPs which utilize causal entropy regularization. We show that the performance error of a sparse MDP has a constant bound, while the error of a soft MDP increases logarithmically with respect to the number of actions, where this performance error is caused by the introduced regularization term. In experiments, we apply sparse MDPs to reinforcement learning problems. The proposed method outperforms existing methods in terms of the convergence speed and performance. |
Tasks | |
Published | 2017-09-19 |
URL | http://arxiv.org/abs/1709.06293v3 |
http://arxiv.org/pdf/1709.06293v3.pdf | |
PWC | https://paperswithcode.com/paper/sparse-markov-decision-processes-with-causal |
Repo | |
Framework | |
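To make the contrast between a sparse policy and a soft (entropy-regularized) policy concrete, the sketch below compares softmax with the sparsemax projection onto the probability simplex. The abstract above gives no formulas, so using sparsemax as the form of the sparse policy is an assumption of this example, and the action values in `q_values` are made up. The point is only that sparsemax can assign exactly zero probability to poor actions while keeping several good ones, whereas softmax always gives every action some mass.

```python
import math


def sparsemax(z):
    """Euclidean projection of the score vector z onto the probability simplex."""
    z_sorted = sorted(z, reverse=True)
    cumulative = 0.0
    tau = 0.0
    for i, value in enumerate(z_sorted, start=1):
        cumulative += value
        # The i largest scores stay in the support while this holds.
        if 1.0 + i * value > cumulative:
            tau = (cumulative - 1.0) / i
    # Scores below the threshold tau are clipped to exactly zero.
    return [max(v - tau, 0.0) for v in z]


def softmax(z):
    """Standard softmax, shown for comparison."""
    shift = max(z)
    exps = [math.exp(v - shift) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    q_values = [1.0, 0.9, -1.0]  # hypothetical action values for one state
    print("sparsemax:", sparsemax(q_values))
    print("softmax:  ", softmax(q_values))
```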
Comparison of Decoding Strategies for CTC Acoustic Models
Title | Comparison of Decoding Strategies for CTC Acoustic Models |
Authors | Thomas Zenkel, Ramon Sanabria, Florian Metze, Jan Niehues, Matthias Sperber, Sebastian Stüker, Alex Waibel |
Abstract | Connectionist Temporal Classification has recently attracted a lot of interest as it offers an elegant approach to building acoustic models (AMs) for speech recognition. The CTC loss function maps an input sequence of observable feature vectors to an output sequence of symbols. Output symbols are conditionally independent of each other under CTC loss, so a language model (LM) can be incorporated conveniently during decoding, retaining the traditional separation of acoustic and linguistic components in ASR. For fixed vocabularies, Weighted Finite State Transducers provide a strong baseline for efficient integration of CTC AMs with n-gram LMs. Character-based neural LMs provide a straightforward solution for open vocabulary speech recognition and all-neural models, and can be decoded with beam search. Finally, sequence-to-sequence models can be used to translate a sequence of individual sounds into a word string. We compare the performance of these three approaches, and analyze their error patterns, which provides insightful guidance for future research and development in this important area. |
Tasks | Language Modelling, Speech Recognition |
Published | 2017-08-15 |
URL | http://arxiv.org/abs/1708.04469v1 |
http://arxiv.org/pdf/1708.04469v1.pdf | |
PWC | https://paperswithcode.com/paper/comparison-of-decoding-strategies-for-ctc |
Repo | |
Framework | |
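As background for the decoding strategies compared above, here is a minimal sketch of the simplest CTC decoder: greedy (best-path) decoding without any language model. It is not one of the three approaches the abstract evaluates; it only illustrates the CTC output convention that those approaches build on, namely that repeated frame labels are merged and blank symbols are dropped. The function names, the blank index of 0, and the toy scores are assumptions made for this example.

```python
def best_path(frame_scores):
    """Pick the highest-scoring label index for every frame."""
    return [max(range(len(scores)), key=scores.__getitem__) for scores in frame_scores]


def ctc_greedy_decode(frame_labels, blank=0):
    """Collapse a per-frame label sequence into an output symbol sequence.

    Consecutive repeats are merged first, then blanks are removed, so a
    blank between two identical labels keeps them as separate symbols.
    """
    decoded = []
    previous = None
    for label in frame_labels:
        if label != previous and label != blank:
            decoded.append(label)
        previous = label
    return decoded


if __name__ == "__main__":
    # Hypothetical per-frame scores over the labels {0: blank, 1, 2}.
    scores = [
        [0.8, 0.1, 0.1],
        [0.1, 0.7, 0.2],
        [0.2, 0.6, 0.2],
        [0.9, 0.05, 0.05],
        [0.1, 0.8, 0.1],
        [0.1, 0.2, 0.7],
    ]
    print(ctc_greedy_decode(best_path(scores)))
```

Beam search and WFST decoding replace the per-frame argmax with a search over many label sequences, which is where a language model can be brought in.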
Comparison of multi-task convolutional neural network (MT-CNN) and a few other methods for toxicity prediction
Title | Comparison of multi-task convolutional neural network (MT-CNN) and a few other methods for toxicity prediction |
Authors | Kedi Wu, Guo-Wei Wei |
Abstract | Toxicity analysis and prediction are of paramount importance to human health and environmental protection. Existing computational methods are built from a wide variety of descriptors and regressors, which makes their performance analysis difficult. For example, deep neural network (DNN), a successful approach in many occasions, acts like a black box and offers little conceptual elegance or physical understanding. The present work constructs a common set of microscopic descriptors based on established physical models for charges, surface areas and free energies to assess the performance of multi-task convolutional neural network (MT-CNN) architectures and a few other approaches, including random forest (RF) and gradient boosting decision tree (GBDT), on an equal footing. Comparison is also given to convolutional neural network (CNN) and non-convolutional deep neural network (DNN) algorithms. Four benchmark toxicity data sets (i.e., endpoints) are used to evaluate various approaches. Extensive numerical studies indicate that the present MT-CNN architecture is able to outperform the state-of-the-art methods. |
Tasks | |
Published | 2017-03-31 |
URL | http://arxiv.org/abs/1703.10951v1 |
http://arxiv.org/pdf/1703.10951v1.pdf | |
PWC | https://paperswithcode.com/paper/comparison-of-multi-task-convolutional-neural |
Repo | |
Framework | |
Budgeted Batch Bayesian Optimization With Unknown Batch Sizes
Title | Budgeted Batch Bayesian Optimization With Unknown Batch Sizes |
Authors | Vu Nguyen, Santu Rana, Sunil Gupta, Cheng Li, Svetha Venkatesh |
Abstract | Parameter settings profoundly impact the performance of machine learning algorithms and laboratory experiments. The classical grid search or trial-and-error methods are exponentially expensive in large parameter spaces, and Bayesian optimization (BO) offers an elegant alternative for global optimization of black box functions. In situations where the black box function can be evaluated at multiple points simultaneously, batch Bayesian optimization is used. Current batch BO approaches are restrictive in that they fix the number of evaluations per batch, and this can be wasteful when the number of specified evaluations is larger than the number of real maxima in the underlying acquisition function. We present the Budgeted Batch Bayesian Optimization (B3O) for hyper-parameter tuning and experimental design - we identify the appropriate batch size for each iteration in an elegant way. To set the batch size flexibly, we use the infinite Gaussian mixture model (IGMM) for automatically identifying the number of peaks in the underlying acquisition functions. We solve the intractability of estimating the IGMM directly from the acquisition function by formulating the batch generalized slice sampling to efficiently draw samples from the acquisition function. We perform extensive experiments for both synthetic functions and two real world applications - machine learning hyper-parameter tuning and experimental design for alloy hardening. We show empirically that the proposed B3O outperforms the existing fixed batch BO approaches in finding the optimum whilst requiring fewer evaluations, thus saving cost and time. |
Tasks | |
Published | 2017-03-15 |
URL | http://arxiv.org/abs/1703.04842v2 |
http://arxiv.org/pdf/1703.04842v2.pdf | |
PWC | https://paperswithcode.com/paper/budgeted-batch-bayesian-optimization-with |
Repo | |
Framework | |
An optimal unrestricted learning procedure
Title | An optimal unrestricted learning procedure |
Authors | Shahar Mendelson |
Abstract | We study learning problems involving arbitrary classes of functions $F$, distributions $X$ and targets $Y$. Because proper learning procedures, i.e., procedures that are only allowed to select functions in $F$, tend to perform poorly unless the problem satisfies some additional structural property (e.g., that $F$ is convex), we consider unrestricted learning procedures that are free to choose functions outside the given class. We present a new unrestricted procedure that is optimal in a very strong sense: the required sample complexity is essentially the best one can hope for, and the estimate holds for (almost) any problem, including heavy-tailed situations. Moreover, the sample complexity coincides with what one would expect if $F$ were convex, even when $F$ is not. And if $F$ is convex, the procedure turns out to be proper. Thus, the unrestricted procedure is actually optimal in both realms, for convex classes as a proper procedure and for arbitrary classes as an unrestricted procedure. |
Tasks | |
Published | 2017-07-17 |
URL | http://arxiv.org/abs/1707.05342v3 |
http://arxiv.org/pdf/1707.05342v3.pdf | |
PWC | https://paperswithcode.com/paper/an-optimal-unrestricted-learning-procedure |
Repo | |
Framework | |
Towards “AlphaChem”: Chemical Synthesis Planning with Tree Search and Deep Neural Network Policies
Title | Towards “AlphaChem”: Chemical Synthesis Planning with Tree Search and Deep Neural Network Policies |
Authors | Marwin Segler, Mike Preuß, Mark P. Waller |
Abstract | Retrosynthesis is a technique to plan the chemical synthesis of organic molecules, for example drugs, agro- and fine chemicals. In retrosynthesis, a search tree is built by analysing molecules recursively and dissecting them into simpler molecular building blocks until one obtains a set of known building blocks. The search space is intractably large, and it is difficult to determine the value of retrosynthetic positions. Here, we propose to model retrosynthesis as a Markov Decision Process. In combination with a Deep Neural Network policy learned from essentially the complete published knowledge of chemistry, Monte Carlo Tree Search (MCTS) can be used to evaluate positions. In exploratory studies, we demonstrate that MCTS with neural network policies outperforms the traditionally used best-first search with hand-coded heuristics. |
Tasks | |
Published | 2017-01-31 |
URL | http://arxiv.org/abs/1702.00020v1 |
http://arxiv.org/pdf/1702.00020v1.pdf | |
PWC | https://paperswithcode.com/paper/towards-alphachem-chemical-synthesis-planning |
Repo | |
Framework | |
The All-Paths and Cycles Graph Kernel
Title | The All-Paths and Cycles Graph Kernel |
Authors | P. -L. Giscard, R. C. Wilson |
Abstract | With the recent rise in the amount of structured data available, there has been considerable interest in methods for machine learning with graphs. Many of these approaches have been kernel methods, which focus on measuring the similarity between graphs. These generally involve measuring the similarity of structural elements such as walks or paths. Borgwardt and Kriegel proposed the all-paths kernel but emphasized that it is NP-hard to compute and infeasible in practice, favouring instead the shortest-path kernel. In this paper, we introduce a new algorithm for computing the all-paths kernel which is very efficient and enrich it further by including the simple cycles as well. We demonstrate how it is feasible even on large datasets to compute all the paths and simple cycles up to a moderate length. We show how to count labelled paths/simple cycles between vertices of a graph and evaluate a labelled path and simple cycles kernel. Extensive evaluations on a variety of graph datasets demonstrate that the all-paths and cycles kernel has superior performance to the shortest-path kernel and state-of-the-art performance overall. |
Tasks | |
Published | 2017-08-04 |
URL | http://arxiv.org/abs/1708.01410v1 |
http://arxiv.org/pdf/1708.01410v1.pdf | |
PWC | https://paperswithcode.com/paper/the-all-paths-and-cycles-graph-kernel |
Repo | |
Framework | |
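To illustrate what counting simple paths "up to a moderate length" means, the following sketch enumerates them by brute-force depth-first search. This is explicitly not the efficient algorithm the abstract above announces; it is a naive baseline whose cost grows quickly with the length bound, shown only to pin down the quantity being counted. The function name `count_simple_paths` and the adjacency-dictionary input are assumptions for this example, and simple cycles (source equal to target) are not handled.

```python
def count_simple_paths(adjacency, source, target, max_length):
    """Count simple paths from source to target, grouped by number of edges.

    adjacency  : dict mapping each vertex to a list of its neighbours
    max_length : largest number of edges a counted path may have
    Returns a dict {length: number of simple paths of that length}.
    """
    counts = {}

    def extend(node, visited, length):
        if node == target and length > 0:
            counts[length] = counts.get(length, 0) + 1
            return  # a simple path cannot pass through its endpoint again
        if length == max_length:
            return
        for neighbour in adjacency[node]:
            if neighbour not in visited:
                extend(neighbour, visited | {neighbour}, length + 1)

    extend(source, {source}, 0)
    return counts


if __name__ == "__main__":
    triangle = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
    print(count_simple_paths(triangle, 0, 1, max_length=2))
```

A path-based kernel would then compare two graphs through such counts, typically after grouping paths by their vertex labels.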
No Classification without Representation: Assessing Geodiversity Issues in Open Data Sets for the Developing World
Title | No Classification without Representation: Assessing Geodiversity Issues in Open Data Sets for the Developing World |
Authors | Shreya Shankar, Yoni Halpern, Eric Breck, James Atwood, Jimbo Wilson, D. Sculley |
Abstract | Modern machine learning systems such as image classifiers rely heavily on large scale data sets for training. Such data sets are costly to create, thus in practice a small number of freely available, open source data sets are widely used. We suggest that examining the geo-diversity of open data sets is critical before adopting a data set for use cases in the developing world. We analyze two large, publicly available image data sets to assess geo-diversity and find that these data sets appear to exhibit an observable amerocentric and eurocentric representation bias. Further, we analyze classifiers trained on these data sets to assess the impact of these training distributions and find strong differences in the relative performance on images from different locales. These results emphasize the need to ensure geo-representation when constructing data sets for use in the developing world. |
Tasks | |
Published | 2017-11-22 |
URL | http://arxiv.org/abs/1711.08536v1 |
http://arxiv.org/pdf/1711.08536v1.pdf | |
PWC | https://paperswithcode.com/paper/no-classification-without-representation |
Repo | |
Framework | |
Structured Deep Neural Network Pruning via Matrix Pivoting
Title | Structured Deep Neural Network Pruning via Matrix Pivoting |
Authors | Ranko Sredojevic, Shaoyi Cheng, Lazar Supic, Rawan Naous, Vladimir Stojanovic |
Abstract | Deep Neural Networks (DNNs) are the key to the state-of-the-art machine vision, sensor fusion and audio/video signal processing. Unfortunately, their computation complexity and tight resource constraints on the Edge make them hard to leverage on mobile, embedded and IoT devices. Due to great diversity of Edge devices, DNN designers have to take into account the hardware platform and application requirements during network training. In this work we introduce pruning via matrix pivoting as a way to improve network pruning by compromising between the design flexibility of architecture-oblivious and performance efficiency of architecture-aware pruning, the two dominant techniques for obtaining resource-efficient DNNs. We also describe local and global network optimization techniques for efficient implementation of the resulting pruned networks. In combination, the proposed pruning and implementation result in close to linear speed up with the reduction of network coefficients during pruning. |
Tasks | Network Pruning, Sensor Fusion |
Published | 2017-12-01 |
URL | http://arxiv.org/abs/1712.01084v1 |
http://arxiv.org/pdf/1712.01084v1.pdf | |
PWC | https://paperswithcode.com/paper/structured-deep-neural-network-pruning-via |
Repo | |
Framework | |
Hierarchical modeling of molecular energies using a deep neural network
Title | Hierarchical modeling of molecular energies using a deep neural network |
Authors | Nicholas Lubbers, Justin S. Smith, Kipton Barros |
Abstract | We introduce the Hierarchically Interacting Particle Neural Network (HIP-NN) to model molecular properties from datasets of quantum calculations. Inspired by a many-body expansion, HIP-NN decomposes properties, such as energy, as a sum over hierarchical terms. These terms are generated from a neural network (a composition of many nonlinear transformations) acting on a representation of the molecule. HIP-NN achieves state-of-the-art performance on a dataset of 131k ground state organic molecules, and predicts energies with 0.26 kcal/mol mean absolute error. With minimal tuning, our model is also competitive on a dataset of molecular dynamics trajectories. In addition to enabling accurate energy predictions, the hierarchical structure of HIP-NN helps to identify regions of model uncertainty. |
Tasks | |
Published | 2017-09-29 |
URL | http://arxiv.org/abs/1710.00017v1 |
http://arxiv.org/pdf/1710.00017v1.pdf | |
PWC | https://paperswithcode.com/paper/hierarchical-modeling-of-molecular-energies |
Repo | |
Framework | |