Paper Group ANR 445
Continuation of Nesterov’s Smoothing for Regression with Structured Sparsity in High-Dimensional Neuroimaging. Buried object detection using handheld WEMI with task-driven extended functions of multiple instances. Improved Dropout for Shallow and Deep Learning. Memory-enhanced Decoder for Neural Machine Translation. An Alternative Softmax Operator …
Continuation of Nesterov’s Smoothing for Regression with Structured Sparsity in High-Dimensional Neuroimaging
Title | Continuation of Nesterov’s Smoothing for Regression with Structured Sparsity in High-Dimensional Neuroimaging |
Authors | Fouad Hadj-Selem, Tommy Lofstedt, Elvis Dohmatob, Vincent Frouin, Mathieu Dubois, Vincent Guillemot, Edouard Duchesnay |
Abstract | Predictive models can be used on high-dimensional brain images for diagnosis of a clinical condition. Spatial regularization through structured sparsity offers new perspectives in this context and reduces the risk of overfitting the model while providing interpretable neuroimaging signatures by forcing the solution to adhere to domain-specific constraints. Total Variation (TV) enforces spatial smoothness of the solution while segmenting predictive regions from the background. We consider the problem of minimizing the sum of a smooth convex loss, a non-smooth convex penalty (whose proximal operator is known) and a wide range of possible complex, non-smooth convex structured penalties such as TV or overlapping group Lasso. Existing solvers are either limited in the functions they can minimize or in their practical capacity to scale to high-dimensional imaging data. Nesterov’s smoothing technique can be used to minimize a large number of non-smooth convex structured penalties, but reasonable precision requires a small smoothing parameter, which slows down the convergence speed. To benefit from the versatility of Nesterov’s smoothing technique, we propose a first-order continuation algorithm, CONESTA, which automatically generates a sequence of decreasing smoothing parameters. The generated sequence maintains the optimal convergence speed towards any globally desired precision. Our main contributions are: (i) an expression of the duality gap that probes the current distance to the global optimum, used to adapt the smoothing parameter and hence the convergence speed; (ii) a convergence rate that improves on classical proximal gradient smoothing methods; and (iii) a demonstration, on both simulated and high-dimensional structural neuroimaging data, that CONESTA significantly outperforms many state-of-the-art solvers in convergence speed and precision. [A toy sketch of the continuation idea follows this entry.] |
Tasks | |
Published | 2016-05-31 |
URL | http://arxiv.org/abs/1605.09658v6 |
PDF | http://arxiv.org/pdf/1605.09658v6.pdf |
PWC | https://paperswithcode.com/paper/continuation-of-nesterovs-smoothing-for |
Repo | |
Framework | |
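A minimal sketch of the continuation idea on a toy TV-regularized least-squares problem. This is not the authors' CONESTA: instead of the duality-gap-driven schedule, the smoothing parameter `mu` is simply halved geometrically, and the inner solver is plain gradient descent rather than an accelerated proximal method. All names and constants are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy problem: min_x 0.5*||Ax - b||^2 + lam * TV(x), TV(x) = sum_i |x_{i+1} - x_i|.
n, p, lam = 50, 100, 0.5
A = rng.normal(size=(n, p))
x_true = np.zeros(p)
x_true[30:60] = 2.0                                # piecewise-constant signal
b = A @ x_true + 0.1 * rng.normal(size=n)
D = np.diff(np.eye(p), axis=0)                     # finite-difference operator, (p-1) x p

def grad_smoothed(x, mu):
    # Nesterov smoothing of TV: the dual variable has a closed form (clipping).
    u = np.clip(D @ x / mu, -1.0, 1.0)
    return A.T @ (A @ x - b) + lam * (D.T @ u)

x = np.zeros(p)
L_A = np.linalg.norm(A, 2) ** 2
L_D = np.linalg.norm(D, 2) ** 2
mu = 1.0
for _ in range(8):                                 # continuation: shrink mu geometrically
    step = 1.0 / (L_A + lam * L_D / mu)            # Lipschitz constant of the smoothed gradient
    for _ in range(200):                           # inner gradient loop at fixed mu
        x -= step * grad_smoothed(x, mu)
    mu /= 2.0

print("relative recovery error:", np.linalg.norm(x - x_true) / np.linalg.norm(x_true))
```

Each halving of `mu` tightens the smooth approximation of TV at the cost of a smaller step size; automating that trade-off via the duality gap is precisely the paper's contribution.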
Buried object detection using handheld WEMI with task-driven extended functions of multiple instances
Title | Buried object detection using handheld WEMI with task-driven extended functions of multiple instances |
Authors | Matthew Cook, Alina Zare, Dominic Ho |
Abstract | Many effective supervised discriminative dictionary learning methods have been developed in the literature. However, training these algorithms requires precise ground truth in the form of very accurate point-wise labels. Yet, in many applications, accurate labels are not feasible. This is especially true for buried object detection, in which the sizes of the objects are not consistent. In this paper, a new multiple instance dictionary learning algorithm for detecting buried objects using a handheld WEMI sensor is detailed. The new algorithm, Task-Driven Extended Functions of Multiple Instances, can learn a highly discriminative dictionary from data that lacks precise point-wise labels. Results are presented and discussed on measured WEMI data. [A toy multiple-instance sketch follows this entry.] |
Tasks | Dictionary Learning, Object Detection |
Published | 2016-03-19 |
URL | http://arxiv.org/abs/1603.06121v1 |
PDF | http://arxiv.org/pdf/1603.06121v1.pdf |
PWC | https://paperswithcode.com/paper/buried-object-detection-using-handheld-wemi |
Repo | |
Framework | |
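The paper couples dictionary learning with bag-level (multiple-instance) labels. The toy sketch below captures only the multiple-instance intuition, alternating between picking the most target-like instance in each positive bag and refitting a single target signature. It is not the authors' TD-eFUMI algorithm; the data generation and update rules are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy bags: positive bags contain at least one target instance, negative bags
# only background clutter. Instance-level labels are unknown; bag labels are known.
d, n_bags, bag_size = 16, 40, 10
true_target = rng.normal(size=d)

def make_bag(positive):
    X = rng.normal(scale=0.5, size=(bag_size, d))      # background clutter
    if positive:
        X[rng.integers(bag_size)] += true_target       # hide one target instance
    return X

bags = [make_bag(i < n_bags // 2) for i in range(n_bags)]
labels = np.array([1] * (n_bags // 2) + [0] * (n_bags // 2))

# Alternate: (E) pick the most target-like instance in each positive bag,
# (M) refit the target signature from those picks, minus the background mean.
bg_mean = np.mean([X for X, y in zip(bags, labels) if y == 0], axis=(0, 1))
target = rng.normal(size=d)
for _ in range(20):
    picks = [X[np.argmax((X - bg_mean) @ target)]
             for X, y in zip(bags, labels) if y == 1]
    target = np.mean(picks, axis=0) - bg_mean
    target /= np.linalg.norm(target)

print("cosine(target, true):", target @ true_target / np.linalg.norm(true_target))
```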
Improved Dropout for Shallow and Deep Learning
Title | Improved Dropout for Shallow and Deep Learning |
Authors | Zhe Li, Boqing Gong, Tianbao Yang |
Abstract | Dropout has seen great success in training deep neural networks by independently zeroing out the outputs of neurons at random. It has also received a surge of interest for shallow learning, e.g., logistic regression. However, the independent sampling of dropout could be suboptimal for the sake of convergence. In this paper, we propose to use multinomial sampling for dropout, i.e., sampling features or neurons according to a multinomial distribution with different probabilities for different features/neurons. To exhibit the optimal dropout probabilities, we analyze shallow learning with multinomial dropout and establish a risk bound for stochastic optimization. By minimizing a sampling-dependent factor in the risk bound, we obtain a distribution-dependent dropout with sampling probabilities that depend on the second-order statistics of the data distribution. To tackle the evolving distribution of neurons in deep learning, we propose an efficient adaptive dropout (named evolutional dropout) that computes the sampling probabilities on the fly from a mini-batch of examples. Empirical studies on several benchmark datasets demonstrate that the proposed dropouts achieve not only much faster convergence but also a smaller testing error than the standard dropout. For example, on the CIFAR-100 data, evolutional dropout achieves relative improvements of over 10% in prediction performance and over 50% in convergence speed compared to the standard dropout. [A sketch of the sampling rule follows this entry.] |
Tasks | Stochastic Optimization |
Published | 2016-02-06 |
URL | http://arxiv.org/abs/1602.02220v2 |
PDF | http://arxiv.org/pdf/1602.02220v2.pdf |
PWC | https://paperswithcode.com/paper/improved-dropout-for-shallow-and-deep |
Repo | |
Framework | |
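A sketch of the data-dependent sampling rule the abstract describes: keep probabilities derived from the per-feature second moments of the current mini-batch, with multinomial (rather than independent) sampling and a rescaling that keeps the dropped features unbiased. The proportionality p_i ∝ sqrt(E[x_i²]) follows the paper's description of second-order statistics; the `keep_frac` parameter and exact constants are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def evolutional_dropout_mask(X, keep_frac=0.5, rng=rng):
    """Dropout mask computed on the fly from a mini-batch X (rows = examples)."""
    d = X.shape[1]
    p = np.sqrt(np.mean(X ** 2, axis=0))   # p_i proportional to sqrt(E[x_i^2])
    p /= p.sum()
    k = max(1, int(keep_frac * d))         # number of multinomial draws
    counts = rng.multinomial(k, p)         # multinomial, not independent, sampling
    return counts / (k * p)                # rescale so E[mask_i] = 1 coordinate-wise

X = rng.normal(size=(32, 10)) * np.linspace(0.1, 2.0, 10)  # unequal feature scales
X_dropped = X * evolutional_dropout_mask(X)                # apply to the mini-batch
```

Because `E[counts_i] = k * p_i`, the mask is an unbiased reweighting: high-variance features are kept more often but scaled down, and vice versa.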
Memory-enhanced Decoder for Neural Machine Translation
Title | Memory-enhanced Decoder for Neural Machine Translation |
Authors | Mingxuan Wang, Zhengdong Lu, Hang Li, Qun Liu |
Abstract | We propose to enhance the RNN decoder in a neural machine translator (NMT) with external memory, as a natural but powerful extension of the state in the decoding RNN. This memory-enhanced RNN decoder is called MemDec. At each step during decoding, MemDec reads from and writes to this memory once, both with content-based addressing. Unlike the unbounded memory of previous work (RNNsearch), which stores the representation of the source sentence, the memory in MemDec is a matrix of pre-determined size, designed to better capture the information important to the decoding process at each time step. Our empirical study on Chinese-English translation shows that it improves by 4.8 BLEU upon Groundhog and 5.3 BLEU upon Moses, yielding the best performance achieved with the same training set. [A toy sketch of content-based read/write follows this entry.] |
Tasks | Machine Translation |
Published | 2016-06-07 |
URL | http://arxiv.org/abs/1606.02003v1 |
PDF | http://arxiv.org/pdf/1606.02003v1.pdf |
PWC | https://paperswithcode.com/paper/memory-enhanced-decoder-for-neural-machine |
Repo | |
Framework | |
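A toy numpy sketch of the content-based read/write primitive the abstract describes: one read and one write per decoding step against a fixed-size memory matrix. The erase/add gating and the use of the hidden state as the query key are generic memory-network conventions assumed for illustration, not the paper's exact MemDec equations.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

n_slots, d = 8, 16                       # memory is a fixed-size matrix, per the abstract
M = rng.normal(scale=0.1, size=(n_slots, d))

def read(M, key):
    w = softmax(M @ key)                 # content-based addressing
    return w @ M                         # convex combination of memory slots

def write(M, key, add, erase):
    w = softmax(M @ key)                 # same addressing for the single write
    return M * (1 - np.outer(w, erase)) + np.outer(w, add)

state = rng.normal(size=d)               # decoder hidden state acts as the query key
r = read(M, state)                       # one read ...
M = write(M, state, add=0.1 * state, erase=np.full(d, 0.5))  # ... one write per step
print(r.shape, M.shape)
```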
An Alternative Softmax Operator for Reinforcement Learning
Title | An Alternative Softmax Operator for Reinforcement Learning |
Authors | Kavosh Asadi, Michael L. Littman |
Abstract | A softmax operator applied to a set of values acts somewhat like the maximization function and somewhat like an average. In sequential decision making, softmax is often used in settings where it is necessary to maximize utility but also to hedge against problems that arise from putting all of one’s weight behind a single maximum-utility decision. The Boltzmann softmax operator is the most commonly used softmax operator in this setting, but we show that this operator is prone to misbehavior. In this work, we study a differentiable softmax operator that, among other properties, is a non-expansion, ensuring convergent behavior in learning and planning. We introduce a variant of the SARSA algorithm that, by utilizing the new operator, computes a Boltzmann policy with a state-dependent temperature parameter. We show that the algorithm is convergent and that it performs favorably in practice. [A minimal implementation of the operator follows this entry.] |
Tasks | Decision Making |
Published | 2016-12-16 |
URL | http://arxiv.org/abs/1612.05628v5 |
PDF | http://arxiv.org/pdf/1612.05628v5.pdf |
PWC | https://paperswithcode.com/paper/an-alternative-softmax-operator-for |
Repo | |
Framework | |
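The operator studied in this paper is mellowmax, a log-average-exp that sweeps from the mean (ω → 0) to the max (ω → ∞) while remaining a non-expansion. A minimal implementation:

```python
import numpy as np
from scipy.special import logsumexp

def mellowmax(x, omega):
    """log-mean-exp: differentiable and a non-expansion, unlike Boltzmann softmax."""
    x = np.asarray(x, dtype=float)
    return (logsumexp(omega * x) - np.log(len(x))) / omega

q = np.array([1.0, 2.0, 5.0])
for omega in (0.01, 1.0, 10.0, 100.0):
    print(omega, mellowmax(q, omega))    # interpolates from the mean toward the max
```

The non-expansion property is what guarantees convergence when the operator replaces max inside value-iteration-style updates.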
A Review of Co-saliency Detection Technique: Fundamentals, Applications, and Challenges
Title | A Review of Co-saliency Detection Technique: Fundamentals, Applications, and Challenges |
Authors | Dingwen Zhang, Huazhu Fu, Junwei Han, Ali Borji, Xuelong Li |
Abstract | Co-saliency detection is a newly emerging and rapidly growing research area in the computer vision community. As a novel branch of visual saliency, co-saliency detection refers to the discovery of common and salient foregrounds in two or more relevant images, and it can be widely used in many computer vision tasks. Existing co-saliency detection algorithms mainly consist of three components: extracting effective features to represent the image regions, exploring the informative cues or factors that characterize co-saliency, and designing effective computational frameworks to formulate co-saliency. Although numerous methods have been developed, the literature still lacks a deep review and evaluation of co-saliency detection techniques. In this paper, we aim to provide a comprehensive review of the fundamentals, challenges, and applications of co-saliency detection. Specifically, we provide an overview of related computer vision work, review the history of co-saliency detection, summarize and categorize the major algorithms in this research area, discuss open issues, present potential applications of co-saliency detection, and finally point out unsolved challenges and promising future work. We expect this review to be beneficial to both new and senior researchers in this field, and to give researchers in related areas insight into the utility of co-saliency detection algorithms. |
Tasks | Co-Saliency Detection, Saliency Detection |
Published | 2016-04-24 |
URL | http://arxiv.org/abs/1604.07090v5 |
PDF | http://arxiv.org/pdf/1604.07090v5.pdf |
PWC | https://paperswithcode.com/paper/a-review-of-co-saliency-detection-technique |
Repo | |
Framework | |
Event Abstraction for Process Mining using Supervised Learning Techniques
Title | Event Abstraction for Process Mining using Supervised Learning Techniques |
Authors | Niek Tax, Natalia Sidorova, Reinder Haakma, Wil M. P. van der Aalst |
Abstract | Process mining techniques focus on extracting insights into processes from event logs. In many cases, the events recorded in the event log are too fine-grained, causing process discovery algorithms to discover incomprehensible process models or process models that are not representative of the event log. We show that when process discovery algorithms can only discover an unrepresentative process model from a low-level event log, structure in the process can in some cases still be discovered by first abstracting the event log to a higher level of granularity. This gives rise to the challenge of bridging the gap between an original low-level event log and a desired high-level perspective on this log, such that a more structured or more comprehensible process model can be discovered. We show that supervised learning can be leveraged for the event abstraction task when annotations with high-level interpretations of the low-level events are available for a subset of the sequences (i.e., traces). We present a method to generate feature-vector representations of events based on XES extensions, and describe an approach to abstract events in an event log with Conditional Random Fields using these event features. Furthermore, we propose a sequence-focused metric to evaluate supervised event abstraction results that fits closely with the tasks of process discovery and conformance checking. We conclude by demonstrating the usefulness of supervised event abstraction for obtaining more structured and/or more comprehensible process models on both real-life and synthetic event data. [A minimal CRF sketch follows this entry.] |
Tasks | |
Published | 2016-06-23 |
URL | http://arxiv.org/abs/1606.07283v1 |
PDF | http://arxiv.org/pdf/1606.07283v1.pdf |
PWC | https://paperswithcode.com/paper/event-abstraction-for-process-mining-using |
Repo | |
Framework | |
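A minimal sketch of the supervised abstraction step using the third-party sklearn-crfsuite package (an assumption; the paper does not prescribe an implementation). Each trace is a sequence of low-level events with per-event feature dicts, and the labels are high-level interpretations; the event names and labels below are invented stand-ins for XES-derived features.

```python
import sklearn_crfsuite

# Traces of low-level events, each event represented as a feature dict.
X_train = [
    [{"event": "water_on"}, {"event": "water_off"}, {"event": "stove_on"}],
    [{"event": "fridge_open"}, {"event": "fridge_close"}, {"event": "stove_on"}],
]
# High-level interpretation of every low-level event in each trace.
y_train = [
    ["washing", "washing", "cooking"],
    ["cooking", "cooking", "cooking"],
]

crf = sklearn_crfsuite.CRF(algorithm="lbfgs", max_iterations=100)
crf.fit(X_train, y_train)
print(crf.predict([[{"event": "water_on"}, {"event": "stove_on"}]]))
```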
Large-scale Analysis of Chess Games with Chess Engines: A Preliminary Report
Title | Large-scale Analysis of Chess Games with Chess Engines: A Preliminary Report |
Authors | Mathieu Acher, François Esnault |
Abstract | The strength of chess engines, together with the availability of numerous chess games, has attracted the attention of chess players, data scientists, and researchers over the last decades. State-of-the-art engines now provide an authoritative judgement that can be used in many applications such as cheating detection, intrinsic rating computation, skill assessment, or the study of human decision-making. A key issue for the research community is to gather a large dataset of chess games together with the judgement of chess engines. Unfortunately, the analysis of each move takes a lot of time. In this paper, we report our effort to analyse almost 5 million chess games with a computing grid. During summer 2015, we processed 270 million unique positions using the Stockfish engine at a relatively high depth (20). We populated a database of over a terabyte of chess evaluations, representing an estimated 50 years of computation on a single machine. Our effort is a first step towards the replication of research results, the supply of open data and procedures for exploring new directions, and the investigation of software engineering/scalability issues when computing billions of moves. [A sketch of the per-position analysis follows this entry.] |
Tasks | Decision Making |
Published | 2016-04-28 |
URL | http://arxiv.org/abs/1607.04186v1 |
PDF | http://arxiv.org/pdf/1607.04186v1.pdf |
PWC | https://paperswithcode.com/paper/large-scale-analysis-of-chess-games-with |
Repo | |
Framework | |
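The per-position evaluation step can be reproduced with the python-chess library driving a local Stockfish binary over UCI. This is a sketch of one game's analysis, not the authors' grid pipeline; the `"games.pgn"` path is a hypothetical input file.

```python
import chess
import chess.engine
import chess.pgn

engine = chess.engine.SimpleEngine.popen_uci("stockfish")  # path to a Stockfish binary

with open("games.pgn") as f:                               # hypothetical PGN file
    game = chess.pgn.read_game(f)

board = game.board()
for move in game.mainline_moves():
    board.push(move)
    # Depth 20, as in the paper; returns a score from White's point of view.
    info = engine.analyse(board, chess.engine.Limit(depth=20))
    print(board.fen(), info["score"].white())

engine.quit()
```

At roughly one position per second per core at this depth, the 50-machine-years figure for 270 million positions is plausible, which is why the authors resorted to a computing grid.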
Accessing accurate documents by mining auxiliary document information
Title | Accessing accurate documents by mining auxiliary document information |
Authors | Jinju Joby, Jyothi Korra |
Abstract | Earlier text mining techniques relied on algorithms such as k-means, Naive Bayes, and SVM, which classify and cluster text documents to mine relevant information about them. The need to improve these techniques motivates the search for better ways to use the available algorithms. This paper proposes a technique that uses the auxiliary information present inside text documents to improve the mining. This auxiliary information can be a description of the content; it can be either useful or completely useless for mining, so the user should assess its worth before adopting this technique. In this paper, a combination of classical clustering algorithms is used to mine the datasets. The algorithm runs in two stages, which carry out mining at different levels of abstraction; the clustered documents are then classified into the necessary groups. The proposed technique aims at improved document clustering results. [A two-stage sketch follows this entry.] |
Tasks | |
Published | 2016-04-15 |
URL | http://arxiv.org/abs/1604.04558v1 |
PDF | http://arxiv.org/pdf/1604.04558v1.pdf |
PWC | https://paperswithcode.com/paper/accessing-accurate-documents-by-mining |
Repo | |
Framework | |
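A minimal sketch of the two-stage idea, assuming TF-IDF features and k-means at both levels (the paper leaves the concrete algorithm choices open): cluster on document content first, then refine each coarse cluster using the auxiliary descriptions. All documents and descriptions below are invented.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

docs = ["neural nets for vision", "convolutional image models",
        "stock market forecasting", "financial time series"]
aux  = ["computer vision tutorial", "vision survey",
        "finance report", "finance notes"]          # auxiliary descriptions

# Stage 1: coarse clusters from document content.
X_doc = TfidfVectorizer().fit_transform(docs)
coarse = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X_doc)

# Stage 2: refine each coarse cluster using the auxiliary information
# (only worthwhile if the side information has been judged useful).
X_aux = TfidfVectorizer().fit_transform(aux)
for c in set(coarse):
    idx = [i for i, g in enumerate(coarse) if g == c]
    if len(idx) > 1:
        fine = KMeans(n_clusters=min(2, len(idx)), n_init=10,
                      random_state=0).fit_predict(X_aux[idx])
        print(c, idx, fine)
```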
Driving in the Matrix: Can Virtual Worlds Replace Human-Generated Annotations for Real World Tasks?
Title | Driving in the Matrix: Can Virtual Worlds Replace Human-Generated Annotations for Real World Tasks? |
Authors | Matthew Johnson-Roberson, Charles Barto, Rounak Mehta, Sharath Nittur Sridhar, Karl Rosaen, Ram Vasudevan |
Abstract | Deep learning has rapidly transformed the state-of-the-art algorithms used to address a variety of problems in computer vision and robotics. These breakthroughs have relied upon massive amounts of human-annotated training data. This time-consuming process has begun to impede the progress of these deep learning efforts. This paper describes a method to incorporate photo-realistic computer images from a simulation engine to rapidly generate annotated data that can be used for the training of machine learning algorithms. We demonstrate that a state-of-the-art architecture, trained only on these synthetic annotations, performs better than the identical architecture trained on human-annotated real-world data when tested on the KITTI data set for vehicle detection. By training machine learning algorithms on a rich virtual world, real objects in real scenes can be learned and classified using synthetic data. This approach offers the possibility of accelerating deep learning’s application to sensor-based classification problems like those that appear in self-driving cars. The source code and data to train and validate the networks described in this paper are made available for researchers. |
Tasks | Self-Driving Cars |
Published | 2016-10-06 |
URL | http://arxiv.org/abs/1610.01983v2 |
PDF | http://arxiv.org/pdf/1610.01983v2.pdf |
PWC | https://paperswithcode.com/paper/driving-in-the-matrix-can-virtual-worlds |
Repo | |
Framework | |
A Minimax Theory for Adaptive Data Analysis
Title | A Minimax Theory for Adaptive Data Analysis |
Authors | Yu-Xiang Wang, Jing Lei, Stephen E. Fienberg |
Abstract | In adaptive data analysis, the user makes a sequence of queries on the data, where at each step the choice of query may depend on the results of previous steps. The releases are often randomized in order to reduce overfitting for such adaptively chosen queries. In this paper, we propose a minimax framework for adaptive data analysis. Assuming Gaussianity of queries, we establish the first sharp minimax lower bound on the squared error, of order $O(\frac{\sqrt{k}\sigma^2}{n})$, where $k$ is the number of queries asked and $\sigma^2/n$ is the ordinary signal-to-noise ratio for a single query. Our lower bound is based on the construction of an approximately least favorable adversary who picks a sequence of queries that are most likely to be affected by overfitting. This adversary uses only one level of adaptivity, suggesting that the minimax risk for 1-step adaptivity with $k-1$ initial releases and that for $k$-step adaptivity are of the same order. The key technical component of the lower-bound proof is a reduction to finding the convoluting distribution that optimally obfuscates the sign of a Gaussian signal. Our construction also reveals a transparent and elementary proof of the matching upper bound, as an alternative to the approach of Russo and Zou (2015), who used information-theoretic tools to obtain the same bound. We believe the proposed framework opens up opportunities to obtain theoretical insights for many other settings of adaptive data analysis, extending the idea to more practical realms. [The headline bound is restated in display form after this entry.] |
Tasks | |
Published | 2016-02-13 |
URL | http://arxiv.org/abs/1602.04287v1 |
PDF | http://arxiv.org/pdf/1602.04287v1.pdf |
PWC | https://paperswithcode.com/paper/a-minimax-theory-for-adaptive-data-analysis |
Repo | |
Framework | |
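Restated in display form, the bound from the abstract. The notation is schematic: $\mathcal{R}^*(k)$ stands for the minimax squared error after $k$ adaptively chosen Gaussian queries; this is a transcription of the stated rate, not a derivation.

```latex
% Minimax risk of k-step adaptive data analysis (schematic form of the
% bound stated in the abstract):
\[
  \mathcal{R}^*(k) \;\asymp\; \frac{\sqrt{k}\,\sigma^2}{n}
\]
% Adaptivity inflates the single-query rate sigma^2/n by a factor sqrt(k);
% per the abstract, one round of adaptivity after k-1 non-adaptive releases
% already attains this order.
```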
Deep attractor network for single-microphone speaker separation
Title | Deep attractor network for single-microphone speaker separation |
Authors | Zhuo Chen, Yi Luo, Nima Mesgarani |
Abstract | Despite the overwhelming success of deep learning in various speech processing tasks, the problem of separating simultaneous speakers in a mixture remains challenging. Two major difficulties in such systems are the arbitrary source permutation and the unknown number of sources in the mixture. We propose a novel deep learning framework for single-channel speech separation that creates attractor points in a high-dimensional embedding space of the acoustic signals, which pull together the time-frequency bins corresponding to each source. Attractor points in this study are created by finding the centroids of the sources in the embedding space, which are subsequently used to determine the similarity of each bin in the mixture to each source. The network is then trained to minimize the reconstruction error of each source by optimizing the embeddings. The proposed model differs from prior work in that it is trained end-to-end and does not depend on the number of sources in the mixture. Two strategies are explored at test time, K-means and fixed attractor points, where the latter requires no post-processing and can be implemented in real time. We evaluated our system on the Wall Street Journal dataset and show a 5.49% improvement over the previous state-of-the-art methods. [A sketch of the attractor computation follows this entry.] |
Tasks | Speaker Separation, Speech Separation |
Published | 2016-11-27 |
URL | http://arxiv.org/abs/1611.08930v2 |
PDF | http://arxiv.org/pdf/1611.08930v2.pdf |
PWC | https://paperswithcode.com/paper/deep-attractor-network-for-single-microphone |
Repo | |
Framework | |
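A numpy sketch of the training-time attractor computation: attractors are the centroids of each source's embeddings, and masks come from the similarity of each time-frequency bin to each attractor. The network producing the embeddings is omitted, and the shapes and softmax mask are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

TF, D, S = 1000, 20, 2                        # time-frequency bins, embed dim, sources
V = rng.normal(size=(TF, D))                  # embeddings the network would produce
Y = np.eye(S)[rng.integers(S, size=TF)]       # oracle bin-to-source assignments (train time)

A = (Y.T @ V) / Y.sum(axis=0)[:, None]        # attractors = per-source centroids, (S, D)
masks = softmax(V @ A.T, axis=1)              # similarity of each bin to each attractor
print(masks.sum(axis=1)[:3])                  # mask rows sum to 1
```

At test time the oracle assignments `Y` are unavailable, which is where the two strategies from the abstract come in: estimate attractors by K-means over `V`, or reuse fixed attractors learned during training.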
Inductive Coherence
Title | Inductive Coherence |
Authors | Scott Garrabrant, Benya Fallenstein, Abram Demski, Nate Soares |
Abstract | While probability theory is normally applied to external environments, there has been some recent interest in probabilistic modeling of the outputs of computations that are too expensive to run. Since mathematical logic is a powerful tool for reasoning about computer programs, we consider this problem from the perspective of integrating probability and logic. Recent work on assigning probabilities to mathematical statements has used the concept of coherent distributions, which satisfy logical constraints such as the probability of a sentence and its negation summing to one. Although there are algorithms which converge to a coherent probability distribution in the limit, this yields only weak guarantees about finite approximations of these distributions. In our setting, this is a significant limitation: Coherent distributions assign probability one to all statements provable in a specific logical theory, such as Peano Arithmetic, which can prove what the output of any terminating computation is; thus, a coherent distribution must assign probability one to the output of any terminating computation. To model uncertainty about computations, we propose to work with approximations to coherent distributions. We introduce inductive coherence, a strengthening of coherence that provides appropriate constraints on finite approximations, and propose an algorithm which satisfies this criterion. |
Tasks | |
Published | 2016-04-18 |
URL | http://arxiv.org/abs/1604.05288v3 |
PDF | http://arxiv.org/pdf/1604.05288v3.pdf |
PWC | https://paperswithcode.com/paper/inductive-coherence |
Repo | |
Framework | |
Automatic recognition of element classes and boundaries in the birdsong with variable sequences
Title | Automatic recognition of element classes and boundaries in the birdsong with variable sequences |
Authors | Takuya Koumura, Kazuo Okanoya |
Abstract | Research on sequential vocalization often requires the analysis of vocalizations in long continuous sounds. In studies of development, or across generations, in which days or months of vocalizations must be analyzed, methods for automatic recognition are strongly desired. Although methods for automatic speech recognition have been intensively studied for application purposes, blindly applying them to biological problems may not be optimal, because, unlike human speech recognition, the analysis of sequential vocalizations often requires accurate extraction of timing information. In the present study we propose automated systems suitable for recognizing birdsong, one of the most intensively investigated sequential vocalizations, focusing on three of its properties. First, a song is a sequence of vocal elements, called notes, which can be grouped into categories. Second, the temporal structure of birdsong is precisely controlled, meaning that temporal information is important in song analysis. Finally, notes are produced according to certain probabilistic rules, which may facilitate accurate song recognition. We divided the procedure of song recognition into three sub-steps, local classification, boundary detection, and global sequencing, each corresponding to one of the three properties, and compared the performance of several arrangements of these steps. We demonstrate that a hybrid model of a deep neural network and a hidden Markov model is effective in recognizing birdsong with variable note sequences, and we propose suitable arrangements of methods according to whether accurate boundary detection is needed. We also designed a new measure to jointly evaluate the accuracy of note classification and boundary detection. Our methods should be applicable, with small modification and tuning, to the songs of other species that share the three properties of sequential vocalization. [A sketch of the global-sequencing step follows this entry.] |
Tasks | Boundary Detection, Speech Recognition |
Published | 2016-01-23 |
URL | http://arxiv.org/abs/1601.06248v1 |
PDF | http://arxiv.org/pdf/1601.06248v1.pdf |
PWC | https://paperswithcode.com/paper/automatic-recognition-of-element-classes-and |
Repo | |
Framework | |
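The global-sequencing step of a DNN-HMM hybrid amounts to Viterbi decoding of per-frame classifier posteriors under note-transition statistics. Below is a generic Viterbi decoder; the DNN, the feature extraction, and the transition values are omitted or made up for illustration.

```python
import numpy as np

def viterbi(log_emit, log_trans, log_init):
    """Most likely state sequence given per-frame log-posteriors (T x N)
    and log transition probabilities (N x N, from -> to)."""
    T, N = log_emit.shape
    dp = np.full((T, N), -np.inf)
    bp = np.zeros((T, N), dtype=int)      # backpointers
    dp[0] = log_init + log_emit[0]
    for t in range(1, T):
        scores = dp[t - 1][:, None] + log_trans
        bp[t] = scores.argmax(axis=0)
        dp[t] = scores.max(axis=0) + log_emit[t]
    path = [int(dp[-1].argmax())]
    for t in range(T - 1, 0, -1):         # trace back the best path
        path.append(bp[t][path[-1]])
    return path[::-1]

rng = np.random.default_rng(0)
N, T = 4, 50                              # note classes, frames
post = rng.dirichlet(np.ones(N), size=T)  # stand-in for DNN frame posteriors
trans = np.full((N, N), 0.02) + 0.92 * np.eye(N)   # sticky notes -> clean boundaries
seq = viterbi(np.log(post), np.log(trans), np.log(np.full(N, 1 / N)))
print(seq[:10])
```

The self-transition weight encodes the paper's second property (precisely controlled temporal structure): stickier transitions smooth over frame-level classification noise, sharpening note boundaries.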
Computational and Statistical Tradeoffs in Learning to Rank
Title | Computational and Statistical Tradeoffs in Learning to Rank |
Authors | Ashish Khetan, Sewoong Oh |
Abstract | For massive and heterogeneous modern datasets, it is of fundamental interest to provide guarantees on the accuracy of estimation when computational resources are limited. In the application of learning to rank, we provide a hierarchy of rank-breaking mechanisms ordered by the complexity of the sketch of the data they generate. This allows the number of data points collected to be gracefully traded off against the computational resources available, while guaranteeing the desired level of accuracy. Theoretical guarantees on the proposed generalized rank-breaking implicitly provide such trade-offs, which can be explicitly characterized under certain canonical scenarios on the structure of the data. [A toy rank-breaking sketch follows this entry.] |
Tasks | Learning-To-Rank |
Published | 2016-08-22 |
URL | http://arxiv.org/abs/1608.06203v1 |
PDF | http://arxiv.org/pdf/1608.06203v1.pdf |
PWC | https://paperswithcode.com/paper/computational-and-statistical-tradeoffs-in |
Repo | |
Framework | |
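A sketch of the basic rank-breaking idea: shatter full rankings into pairwise comparisons (the finest breaking; coarser breakings keep fewer pairs and trade accuracy for compute) and fit Bradley-Terry scores on the resulting pairs. The data generation and the simple gradient-ascent fit are illustrative, not the paper's generalized rank-breaking estimator.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy Plackett-Luce rankings over m items with true scores theta (Gumbel trick).
m, n_rank = 6, 500
theta = rng.normal(size=m)
rankings = [np.argsort(-(theta + rng.gumbel(size=m))) for _ in range(n_rank)]

# Full rank-breaking: every ranking yields all m*(m-1)/2 (winner, loser) pairs.
W = np.array([r[i] for r in rankings for i in range(m) for j in range(i + 1, m)])
L = np.array([r[j] for r in rankings for i in range(m) for j in range(i + 1, m)])

# Fit Bradley-Terry scores on the broken pairs by gradient ascent.
est = np.zeros(m)
for _ in range(200):
    resid = 1 - 1 / (1 + np.exp(est[L] - est[W]))   # 1 - P(winner beats loser)
    grad = np.zeros(m)
    np.add.at(grad, W, resid)
    np.subtract.at(grad, L, resid)
    est += 0.5 * grad / n_rank

print("corr(theta, est):", np.corrcoef(theta, est)[0, 1])
```

Keeping only a subset of the broken pairs (e.g., top item versus the rest) is the kind of cheaper sketch whose accuracy/compute trade-off the paper characterizes.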