Paper Group ANR 360
High Dimensional Structured Superposition Models
Title | High Dimensional Structured Superposition Models |
Authors | Qilong Gu, Arindam Banerjee |
Abstract | High dimensional superposition models characterize observations using parameters which can be written as a sum of multiple component parameters, each with its own structure, e.g., sum of low rank and sparse matrices, sum of sparse and rotated sparse vectors, etc. In this paper, we consider general superposition models which allow sum of any number of component parameters, and each component structure can be characterized by any norm. We present a simple estimator for such models, give a geometric condition under which the components can be accurately estimated, characterize sample complexity of the estimator, and give high probability non-asymptotic bounds on the componentwise estimation error. We use tools from empirical processes and generic chaining for the statistical analysis, and our results, which substantially generalize prior work on superposition models, are in terms of Gaussian widths of suitable sets. |
Tasks | |
Published | 2017-05-30 |
URL | http://arxiv.org/abs/1705.10886v1 |
http://arxiv.org/pdf/1705.10886v1.pdf | |
PWC | https://paperswithcode.com/paper/high-dimensional-structured-superposition |
Repo | |
Framework | |
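As a rough illustration of the kind of estimator the abstract describes, a constrained least-squares formulation for a sum of norm-structured components might look as follows. The notation, constraint form, and choice of norms are assumptions for illustration, not necessarily the paper's exact estimator.

```latex
% Plausible form of a constrained least-squares estimator for a k-component
% superposition model (notation and constraint form are assumptions):
%   observations  y = X\theta^* + w,  with  \theta^* = \sum_{i=1}^{k} \theta_i^*,
%   each component \theta_i^* structured with respect to its own norm \|\cdot\|_{(i)}.
\[
(\hat{\theta}_1,\dots,\hat{\theta}_k)
  \;\in\; \arg\min_{\theta_1,\dots,\theta_k}
  \frac{1}{2n}\Bigl\| y - X\sum_{i=1}^{k}\theta_i \Bigr\|_2^2
  \quad \text{subject to} \quad
  \|\theta_i\|_{(i)} \le \alpha_i,\quad i=1,\dots,k.
\]
% e.g., \|\cdot\|_{(1)} the \ell_1 norm for a sparse component and
% \|\cdot\|_{(2)} the nuclear norm for a low-rank component.
```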
Multi Resolution LSTM For Long Term Prediction In Neural Activity Video
Title | Multi Resolution LSTM For Long Term Prediction In Neural Activity Video |
Authors | Yilin Song, Jonathan Viventi, Yao Wang |
Abstract | Epileptic seizures are caused by abnormal, overly synchronized, electrical activity in the brain. The abnormal electrical activity manifests as waves, propagating across the brain. Accurate prediction of the propagation velocity and direction of these waves could enable real-time responsive brain stimulation to suppress or prevent the seizures entirely. However, this problem is very challenging because the algorithm must be able to predict the neural signals over a sufficiently long time horizon to allow enough time for medical intervention. We consider how to accomplish long term prediction using an LSTM network. To alleviate the vanishing gradient problem, we propose two encoder-decoder-predictor structures, both using multi-resolution representation. The novel LSTM structure with multi-resolution layers can significantly outperform the single-resolution benchmark with a similar number of parameters. To overcome the blurring effect associated with video prediction in the pixel domain using standard mean square error (MSE) loss, we use energy-based adversarial training to improve the long-term prediction. We demonstrate and analyze how a discriminative model with an encoder-decoder structure using a 3D CNN model improves long term prediction. |
Tasks | Video Prediction |
Published | 2017-05-08 |
URL | http://arxiv.org/abs/1705.02893v2 |
http://arxiv.org/pdf/1705.02893v2.pdf | |
PWC | https://paperswithcode.com/paper/multi-resolution-lstm-for-long-term |
Repo | |
Framework | |
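A minimal sketch of the two-resolution sequence-prediction idea, assuming a PyTorch setup with plain `nn.LSTM` layers over flattened frames; the paper's actual multi-resolution LSTM layers and energy-based adversarial training are not reproduced here, and all sizes are illustrative.

```python
# Two-resolution next-frame predictor (assumption-based sketch).
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoResolutionPredictor(nn.Module):
    def __init__(self, frame_size=16, hidden=128):
        super().__init__()
        fine_dim = frame_size * frame_size
        coarse_dim = (frame_size // 2) ** 2
        self.fine_lstm = nn.LSTM(fine_dim, hidden, batch_first=True)
        self.coarse_lstm = nn.LSTM(coarse_dim, hidden, batch_first=True)
        self.decode = nn.Linear(2 * hidden, fine_dim)

    def forward(self, frames):
        # frames: (batch, time, H, W) neural-activity video
        b, t, h, w = frames.shape
        fine = frames.reshape(b, t, h * w)
        coarse = F.avg_pool2d(frames.reshape(b * t, 1, h, w), 2).reshape(b, t, -1)
        fine_out, _ = self.fine_lstm(fine)        # (b, t, hidden)
        coarse_out, _ = self.coarse_lstm(coarse)  # (b, t, hidden)
        joint = torch.cat([fine_out, coarse_out], dim=-1)
        return self.decode(joint).reshape(b, t, h, w)  # next-frame prediction per step

if __name__ == "__main__":
    model = TwoResolutionPredictor()
    video = torch.randn(2, 10, 16, 16)
    pred = model(video)
    # simple MSE target: frame at t+1 predicted from frames up to t
    loss = F.mse_loss(pred[:, :-1], video[:, 1:])
    print(pred.shape, loss.item())
```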
Profiling of OCR’ed Historical Texts Revisited
Title | Profiling of OCR’ed Historical Texts Revisited |
Authors | Florian Fink, Klaus-U. Schulz, Uwe Springmann |
Abstract | In the absence of ground truth it is not possible to automatically determine the exact spectrum and occurrences of OCR errors in an OCR’ed text. Yet, for interactive postcorrection of OCR’ed historical printings it is extremely useful to have a statistical profile available that provides an estimate of error classes with associated frequencies, and that points to conjectured errors and suspicious tokens. The method introduced in Reffle (2013) computes such a profile, combining lexica, pattern sets and advanced matching techniques in a specialized Expectation Maximization (EM) procedure. Here we improve this method in three respects: First, the method in Reffle (2013) is not adaptive: user feedback obtained by actual postcorrection steps cannot be used to compute refined profiles. We introduce a variant of the method that is open for adaptivity, taking correction steps of the user into account. This leads to higher precision with respect to recognition of erroneous OCR tokens. Second, during postcorrection often new historical patterns are found. We show that adding new historical patterns to the linguistic background resources leads to a second kind of improvement, enabling even higher precision by telling historical spellings apart from OCR errors. Third, the method in Reffle (2013) does not make any active use of tokens that cannot be interpreted in the underlying channel model. We show that adding these uninterpretable tokens to the set of conjectured errors leads to a significant improvement of the recall for error detection, at the same time improving precision. |
Tasks | Optical Character Recognition |
Published | 2017-01-19 |
URL | http://arxiv.org/abs/1701.05377v1 |
http://arxiv.org/pdf/1701.05377v1.pdf | |
PWC | https://paperswithcode.com/paper/profiling-of-ocred-historical-texts-revisited |
Repo | |
Framework | |
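To make the profiling idea concrete, here is a toy EM loop in the spirit of the abstract: each OCR token comes with candidate interpretations, each explained by a set of error/variant patterns, and EM estimates how frequent each pattern is across the corpus. The token data, pattern notation, and weighting scheme are invented for illustration and are not the Reffle (2013) channel model.

```python
# Toy EM estimation of OCR error-pattern frequencies (assumption-based sketch).
from collections import defaultdict

# Each OCR token has candidate interpretations; an interpretation is the set of
# error/variant patterns needed to explain it (empty set = token is fine as is).
candidates = {
    "fein": [frozenset(), frozenset({"s:f"})],           # "fein", or misread "sein"
    "vnnd": [frozenset({"u:v", "nn:n"}), frozenset()],    # variant/misread "und", or ok
    "fo":   [frozenset({"s:f"}), frozenset()],            # misread "so", or ok
}

patterns = {"s:f": 0.5, "u:v": 0.5, "nn:n": 0.5}  # initial pattern probabilities
p_clean = 0.5                                     # prior weight of "no pattern applied"

def interp_weight(pats):
    w = p_clean if not pats else 1.0
    for p in pats:
        w *= patterns[p]
    return w

for _ in range(20):                                # EM iterations
    counts, n_tokens = defaultdict(float), 0.0
    for tok, interps in candidates.items():        # E-step: posterior per token
        weights = [interp_weight(p) for p in interps]
        z = sum(weights)
        for pats, w in zip(interps, weights):
            for p in pats:
                counts[p] += w / z
        n_tokens += 1.0
    for p in patterns:                             # M-step: relative frequencies
        patterns[p] = max(counts[p] / n_tokens, 1e-6)

print({p: round(v, 3) for p, v in patterns.items()})
```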
What caused what? A quantitative account of actual causation using dynamical causal networks
Title | What caused what? A quantitative account of actual causation using dynamical causal networks |
Authors | Larissa Albantakis, William Marshall, Erik Hoel, Giulio Tononi |
Abstract | Actual causation is concerned with the question “what caused what?” Consider a transition between two states within a system of interacting elements, such as an artificial neural network, or a biological brain circuit. Which combination of synapses caused the neuron to fire? Which image features caused the classifier to misinterpret the picture? Even detailed knowledge of the system’s causal network, its elements, their states, connectivity, and dynamics does not automatically provide a straightforward answer to the “what caused what?” question. Counterfactual accounts of actual causation based on graphical models, paired with system interventions, have demonstrated initial success in addressing specific problem cases in line with intuitive causal judgments. Here, we start from a set of basic requirements for causation (realization, composition, information, integration, and exclusion) and develop a rigorous, quantitative account of actual causation that is generally applicable to discrete dynamical systems. We present a formal framework to evaluate these causal requirements that is based on system interventions and partitions, and considers all counterfactuals of a state transition. This framework is used to provide a complete causal account of the transition by identifying and quantifying the strength of all actual causes and effects linking the two consecutive system states. Finally, we examine several exemplary cases and paradoxes of causation and show that they can be illuminated by the proposed framework for quantifying actual causation. |
Tasks | |
Published | 2017-08-22 |
URL | http://arxiv.org/abs/1708.06716v2 |
http://arxiv.org/pdf/1708.06716v2.pdf | |
PWC | https://paperswithcode.com/paper/what-caused-what-a-quantitative-account-of |
Repo | |
Framework | |
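The following toy calculation illustrates the counterfactual flavor of the question on a one-step transition in a tiny Boolean network: it compares how likely the observed effect is when a candidate cause is clamped versus under uniform counterfactual inputs. The log-ratio used here is only an illustrative stand-in; the paper's formal measures (composition, integration, exclusion) are not reproduced.

```python
# Toy counterfactual analysis of a state transition (illustrative sketch only).
from itertools import product
from math import log2

def and_gate(a, b):          # mechanism: C(t+1) = A(t) AND B(t)
    return a & b

observed_input = (1, 1)      # A=1, B=1 at time t
observed_effect = 1          # C=1 at time t+1

def p_effect_given(fixed):
    """Probability of the observed effect when the variables in `fixed`
    are clamped and the remaining inputs are counterfactually uniform."""
    states = [s for s in product([0, 1], repeat=2)
              if all(s[i] == v for i, v in fixed.items())]
    hits = sum(and_gate(*s) == observed_effect for s in states)
    return hits / len(states)

baseline = p_effect_given({})                 # all inputs uniform: 1/4
for name, idx in [("A=1", 0), ("B=1", 1), ("A=1,B=1", None)]:
    fixed = {0: 1, 1: 1} if idx is None else {idx: observed_input[idx]}
    strength = log2(p_effect_given(fixed) / baseline)
    print(f"candidate cause {name}: log-ratio {strength:.2f} bits")
```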
Evaluating Ising Processing Units with Integer Programming
Title | Evaluating Ising Processing Units with Integer Programming |
Authors | Carleton Coffrin, Harsha Nagarajan, Russell Bent |
Abstract | The recent emergence of novel computational devices, such as adiabatic quantum computers, CMOS annealers, and optical parametric oscillators, presents new opportunities for hybrid-optimization algorithms that are hardware accelerated by these devices. In this work, we propose the idea of an Ising processing unit as a computational abstraction for reasoning about these emerging devices. The challenges involved in using and benchmarking these devices are presented, and commercial mixed integer programming solvers are proposed as a valuable tool for the validation of these disparate hardware platforms. The proposed validation methodology is demonstrated on a D-Wave 2X adiabatic quantum computer, one example of an Ising processing unit. The computational results demonstrate that the D-Wave hardware consistently produces high-quality solutions and suggest that, as IPU technology matures, it could become a valuable co-processor in hybrid-optimization algorithms. |
Tasks | |
Published | 2017-07-02 |
URL | https://arxiv.org/abs/1707.00355v2 |
https://arxiv.org/pdf/1707.00355v2.pdf | |
PWC | https://paperswithcode.com/paper/ising-processing-units-potential-and |
Repo | |
Framework | |
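For context, the sketch below shows the standard recasting of an Ising energy over spins s ∈ {-1, +1} into 0/1 variables via s = 2x - 1, which is the kind of transformation an integer-programming formulation would rely on. The instance is made up, and brute-force enumeration stands in for a commercial MIP solver; it is only viable for tiny problems.

```python
# Ising minimization recast over binary variables (assumption-based sketch).
from itertools import product

h = {0: 0.5, 1: -1.0, 2: 0.2}                 # linear fields
J = {(0, 1): -1.0, (1, 2): 0.8, (0, 2): 0.3}  # pairwise couplings

def ising_energy(spins):
    e = sum(h[i] * spins[i] for i in h)
    e += sum(Jij * spins[i] * spins[j] for (i, j), Jij in J.items())
    return e

best = min(product([-1, 1], repeat=len(h)), key=ising_energy)
print("ground state:", best, "energy:", ising_energy(best))

# The same objective in 0/1 variables, i.e., a quadratic integer program:
def binary_energy(x):
    spins = tuple(2 * xi - 1 for xi in x)      # s_i = 2*x_i - 1
    return ising_energy(spins)

best_x = min(product([0, 1], repeat=len(h)), key=binary_energy)
print("binary solution:", best_x, "energy:", binary_energy(best_x))
```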
Dr.VAE: Drug Response Variational Autoencoder
Title | Dr.VAE: Drug Response Variational Autoencoder |
Authors | Ladislav Rampasek, Daniel Hidru, Petr Smirnov, Benjamin Haibe-Kains, Anna Goldenberg |
Abstract | We present two deep generative models based on Variational Autoencoders to improve the accuracy of drug response prediction. Our models, Perturbation Variational Autoencoder and its semi-supervised extension, Drug Response Variational Autoencoder (Dr.VAE), learn latent representation of the underlying gene states before and after drug application that depend on: (i) drug-induced biological change of each gene and (ii) overall treatment response outcome. Our VAE-based models outperform the current published benchmarks in the field by anywhere from 3 to 11% AUROC and 2 to 30% AUPR. In addition, we found that better reconstruction accuracy does not necessarily lead to improvement in classification accuracy and that jointly trained models perform better than models that minimize reconstruction error independently. |
Tasks | |
Published | 2017-06-26 |
URL | http://arxiv.org/abs/1706.08203v2 |
http://arxiv.org/pdf/1706.08203v2.pdf | |
PWC | https://paperswithcode.com/paper/drvae-drug-response-variational-autoencoder |
Repo | |
Framework | |
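A minimal sketch of a jointly trained VAE plus response classifier on expression vectors, assuming PyTorch and toy data; Dr.VAE itself models pre/post-treatment gene states and a semi-supervised objective that are not reproduced here, and all layer sizes are invented.

```python
# VAE with a classifier head on the latent code (assumption-based sketch).
import torch
import torch.nn as nn
import torch.nn.functional as F

class VAEClassifier(nn.Module):
    def __init__(self, n_genes=100, latent=16):
        super().__init__()
        self.enc = nn.Linear(n_genes, 2 * latent)   # outputs mean and log-variance
        self.dec = nn.Linear(latent, n_genes)
        self.clf = nn.Linear(latent, 1)             # drug-response logit

    def forward(self, x):
        mu, logvar = self.enc(x).chunk(2, dim=-1)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization
        return self.dec(z), self.clf(mu).squeeze(-1), mu, logvar

def loss_fn(x, y, model, beta=1.0, gamma=1.0):
    recon, logit, mu, logvar = model(x)
    rec = F.mse_loss(recon, x)
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    cls = F.binary_cross_entropy_with_logits(logit, y)
    return rec + beta * kl + gamma * cls            # joint training objective

model = VAEClassifier()
x = torch.randn(32, 100)                 # toy expression profiles
y = torch.randint(0, 2, (32,)).float()   # toy response labels
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for _ in range(5):
    opt.zero_grad()
    loss = loss_fn(x, y, model)
    loss.backward()
    opt.step()
print("final loss:", loss.item())
```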
Learning Complex Dexterous Manipulation with Deep Reinforcement Learning and Demonstrations
Title | Learning Complex Dexterous Manipulation with Deep Reinforcement Learning and Demonstrations |
Authors | Aravind Rajeswaran, Vikash Kumar, Abhishek Gupta, Giulia Vezzani, John Schulman, Emanuel Todorov, Sergey Levine |
Abstract | Dexterous multi-fingered hands are extremely versatile and provide a generic way to perform a multitude of tasks in human-centric environments. However, effectively controlling them remains challenging due to their high dimensionality and large number of potential contacts. Deep reinforcement learning (DRL) provides a model-agnostic approach to control complex dynamical systems, but has not been shown to scale to high-dimensional dexterous manipulation. Furthermore, deployment of DRL on physical systems remains challenging due to sample inefficiency. Consequently, the success of DRL in robotics has thus far been limited to simpler manipulators and tasks. In this work, we show that model-free DRL can effectively scale up to complex manipulation tasks with a high-dimensional 24-DoF hand, and solve them from scratch in simulated experiments. Furthermore, with the use of a small number of human demonstrations, the sample complexity can be significantly reduced, which enables learning with sample sizes equivalent to a few hours of robot experience. The use of demonstrations results in policies that exhibit very natural movements and, surprisingly, are also substantially more robust. |
Tasks | |
Published | 2017-09-28 |
URL | http://arxiv.org/abs/1709.10087v2 |
http://arxiv.org/pdf/1709.10087v2.pdf | |
PWC | https://paperswithcode.com/paper/learning-complex-dexterous-manipulation-with |
Repo | |
Framework | |
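The general recipe of combining a policy-gradient objective with a demonstration-imitation term can be sketched as below, assuming PyTorch, a Gaussian policy, and toy stand-in rollout and demonstration data. The paper's method (DAPG) uses natural policy gradient and a decaying demonstration weighting, which are not shown; the fixed weight here is an assumption.

```python
# Policy gradient + behavior-cloning term on demonstrations (assumption-based sketch).
import torch
import torch.nn as nn

obs_dim, act_dim = 24, 4                      # e.g., a dexterous-hand-like setting
policy = nn.Sequential(nn.Linear(obs_dim, 64), nn.Tanh(), nn.Linear(64, act_dim))
log_std = nn.Parameter(torch.zeros(act_dim))
opt = torch.optim.Adam(list(policy.parameters()) + [log_std], lr=3e-4)

def log_prob(obs, act):
    dist = torch.distributions.Normal(policy(obs), log_std.exp())
    return dist.log_prob(act).sum(-1)

# toy stand-ins for rollout data (with advantages) and human demonstrations
roll_obs, roll_act = torch.randn(256, obs_dim), torch.randn(256, act_dim)
advantages = torch.randn(256)
demo_obs, demo_act = torch.randn(64, obs_dim), torch.randn(64, act_dim)

for step in range(10):
    pg_loss = -(log_prob(roll_obs, roll_act) * advantages).mean()
    bc_loss = -log_prob(demo_obs, demo_act).mean()   # imitate demonstrations
    loss = pg_loss + 0.1 * bc_loss                   # fixed demo weight (assumption)
    opt.zero_grad()
    loss.backward()
    opt.step()
print("final combined loss:", loss.item())
```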
Towards Better Analysis of Machine Learning Models: A Visual Analytics Perspective
Title | Towards Better Analysis of Machine Learning Models: A Visual Analytics Perspective |
Authors | Shixia Liu, Xiting Wang, Mengchen Liu, Jun Zhu |
Abstract | Interactive model analysis, the process of understanding, diagnosing, and refining a machine learning model with the help of interactive visualization, is very important for users to efficiently solve real-world artificial intelligence and data mining problems. Dramatic advances in big data analytics have led to a wide variety of interactive model analysis tasks. In this paper, we present a comprehensive analysis and interpretation of this rapidly developing area. Specifically, we classify the relevant work into three categories: understanding, diagnosis, and refinement. Each category is exemplified by recent influential work. Possible future research opportunities are also explored and discussed. |
Tasks | |
Published | 2017-02-04 |
URL | http://arxiv.org/abs/1702.01226v1 |
http://arxiv.org/pdf/1702.01226v1.pdf | |
PWC | https://paperswithcode.com/paper/towards-better-analysis-of-machine-learning |
Repo | |
Framework | |
Overcoming Exploration in Reinforcement Learning with Demonstrations
Title | Overcoming Exploration in Reinforcement Learning with Demonstrations |
Authors | Ashvin Nair, Bob McGrew, Marcin Andrychowicz, Wojciech Zaremba, Pieter Abbeel |
Abstract | Exploration in environments with sparse rewards has been a persistent problem in reinforcement learning (RL). Many tasks are natural to specify with a sparse reward, and manually shaping a reward function can result in suboptimal performance. However, finding a non-zero reward is exponentially more difficult with increasing task horizon or action dimensionality. This puts many real-world tasks out of practical reach of RL methods. In this work, we use demonstrations to overcome the exploration problem and successfully learn to perform long-horizon, multi-step robotics tasks with continuous control, such as stacking blocks with a robot arm. Our method, which builds on top of Deep Deterministic Policy Gradients and Hindsight Experience Replay, provides an order-of-magnitude speedup over RL on simulated robotics tasks. It is simple to implement and makes only the additional assumption that we can collect a small set of demonstrations. Furthermore, our method is able to solve tasks not solvable by either RL or behavior cloning alone, and often ends up outperforming the demonstrator policy. |
Tasks | Continuous Control |
Published | 2017-09-28 |
URL | http://arxiv.org/abs/1709.10089v2 |
http://arxiv.org/pdf/1709.10089v2.pdf | |
PWC | https://paperswithcode.com/paper/overcoming-exploration-in-reinforcement |
Repo | |
Framework | |
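One distinctive ingredient suggested by this line of work is a behavior-cloning auxiliary loss gated by a Q-filter, so that only demonstration actions the critic currently rates above the policy's own actions are imitated. The sketch below shows that filtering step in isolation, assuming PyTorch and toy networks; the full method builds on DDPG and Hindsight Experience Replay with replay buffers and goal relabeling, all omitted here.

```python
# Behavior-cloning loss with a Q-filter (assumption-based sketch).
import torch
import torch.nn as nn

obs_dim, act_dim = 10, 4
actor = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(),
                      nn.Linear(64, act_dim), nn.Tanh())
critic = nn.Sequential(nn.Linear(obs_dim + act_dim, 64), nn.ReLU(), nn.Linear(64, 1))

def q_value(obs, act):
    return critic(torch.cat([obs, act], dim=-1)).squeeze(-1)

demo_obs = torch.randn(32, obs_dim)
demo_act = torch.rand(32, act_dim) * 2 - 1        # toy demo actions in [-1, 1]

pi_act = actor(demo_obs)
# Q-filter: only clone demo actions the critic currently rates above the
# actor's own actions, so poor demonstrations do not drag the policy down.
with torch.no_grad():
    mask = (q_value(demo_obs, demo_act) > q_value(demo_obs, pi_act)).float()
bc_loss = (mask * ((pi_act - demo_act) ** 2).sum(-1)).mean()

actor_loss = -q_value(demo_obs, actor(demo_obs)).mean()   # standard DDPG actor term
(actor_loss + bc_loss).backward()
print("actor loss:", actor_loss.item(), "filtered BC loss:", bc_loss.item())
```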
Optimized Structured Sparse Sensing Matrices for Compressive Sensing
Title | Optimized Structured Sparse Sensing Matrices for Compressive Sensing |
Authors | Tao Hong, Xiao Li, Zhihui Zhu, Qiuwei Li |
Abstract | We consider designing a robust structured sparse sensing matrix, consisting of a sparse matrix with a few non-zero entries per row and a dense base matrix, for capturing signals efficiently. We design the robust structured sparse sensing matrix by minimizing the distance between the Gram matrix of the equivalent dictionary and a target Gram matrix with small mutual coherence. Moreover, a regularization term is added to enforce the robustness of the optimized structured sparse sensing matrix to the sparse representation error (SRE) of the signals of interest. An alternating minimization algorithm with global sequence convergence is proposed for solving the corresponding optimization problem. Numerical experiments on synthetic data and natural images show that the obtained structured sensing matrix achieves higher signal reconstruction accuracy than a random dense sensing matrix. |
Tasks | Compressive Sensing, Image Compression |
Published | 2017-09-19 |
URL | http://arxiv.org/abs/1709.06895v3 |
http://arxiv.org/pdf/1709.06895v3.pdf | |
PWC | https://paperswithcode.com/paper/optimized-structured-sparse-sensing-matrices |
Repo | |
Framework | |
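The Gram-matrix-fitting idea can be sketched as follows with NumPy: take a sparse sensing matrix with a fixed support, form the equivalent dictionary against a dense orthonormal base, and run gradient steps that push its Gram matrix toward a target while preserving the sparsity pattern. Dimensions, step size, and the identity target are assumptions; the paper's alternating minimization and SRE-robustness regularizer are not reproduced.

```python
# Gram-matrix fitting for a structured sparse sensing matrix (assumption-based sketch).
import numpy as np

rng = np.random.default_rng(0)
m, n, k = 20, 64, 4                                  # measurements, atoms, nonzeros per row
psi = np.linalg.qr(rng.standard_normal((n, n)))[0]   # dense orthonormal base dictionary

phi = np.zeros((m, n))                               # sparse sensing matrix, fixed support
for i in range(m):
    support = rng.choice(n, size=k, replace=False)
    phi[i, support] = rng.standard_normal(k)

def coherence(phi):
    a = phi @ psi                                    # equivalent dictionary
    a = a / np.linalg.norm(a, axis=0, keepdims=True)
    g = np.abs(a.T @ a)
    np.fill_diagonal(g, 0.0)
    return g.max()

target = np.eye(n)                                   # idealized (incoherent) target Gram
step, mask = 1e-2, (phi != 0)
print("mutual coherence before:", round(coherence(phi), 3))
for _ in range(200):                                 # gradient steps on ||A^T A - I||_F^2
    a = phi @ psi
    grad = 4 * (a @ (a.T @ a - target)) @ psi.T
    phi -= step * grad
    phi *= mask                                      # keep the sparse structure
print("mutual coherence after: ", round(coherence(phi), 3))
```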
Attention-based Vocabulary Selection for NMT Decoding
Title | Attention-based Vocabulary Selection for NMT Decoding |
Authors | Baskaran Sankaran, Markus Freitag, Yaser Al-Onaizan |
Abstract | Neural Machine Translation (NMT) models usually use large target vocabulary sizes to capture most of the words in the target language. The vocabulary size is a big factor when decoding new sentences, as the final softmax layer normalizes over all possible target words. To address this problem, it is common to restrict the target vocabulary with candidate lists based on the source sentence. Usually, the candidate lists are a combination of external word-to-word aligners, phrase-table entries, or the most frequent words. In this work, we propose a simple yet novel approach to learn candidate lists directly from the attention layer during NMT training. The candidate lists are highly optimized for the current NMT model and do not need any external computation of the candidate pool. We show significant decoding speedup compared with using the entire vocabulary, without losing any translation quality for two language pairs. |
Tasks | Machine Translation |
Published | 2017-06-12 |
URL | http://arxiv.org/abs/1706.03824v1 |
http://arxiv.org/pdf/1706.03824v1.pdf | |
PWC | https://paperswithcode.com/paper/attention-based-vocabulary-selection-for-nmt |
Repo | |
Framework | |
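A small sketch of turning attention statistics into per-source-word candidate lists and a decode-time shortlist. The attention weights here are stand-in numbers rather than values collected from an NMT model's attention layer, and the corpus, top-k value, and shortlist construction are assumptions for illustration.

```python
# Attention-mass candidate lists for vocabulary selection (assumption-based sketch).
from collections import defaultdict

# (source sentence, target sentence, attention[t][s] weights) triples
corpus = [
    (["das", "haus"], ["the", "house"], [[0.9, 0.1], [0.2, 0.8]]),
    (["das", "auto"], ["the", "car"],   [[0.85, 0.15], [0.1, 0.9]]),
]

attn_mass = defaultdict(lambda: defaultdict(float))
for src, tgt, attn in corpus:
    for t, tgt_word in enumerate(tgt):
        for s, src_word in enumerate(src):
            attn_mass[src_word][tgt_word] += attn[t][s]   # accumulate attention mass

top_k = 2
candidates = {w: sorted(m, key=m.get, reverse=True)[:top_k]
              for w, m in attn_mass.items()}

# At decoding time, the softmax would be restricted to the union of candidates.
test_source = ["das", "auto"]
shortlist = sorted({c for w in test_source for c in candidates.get(w, [])})
print("candidate lists:", candidates)
print("decode-time shortlist for", test_source, "->", shortlist)
```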
A Recurrent Neural Model with Attention for the Recognition of Chinese Implicit Discourse Relations
Title | A Recurrent Neural Model with Attention for the Recognition of Chinese Implicit Discourse Relations |
Authors | Samuel Rönnqvist, Niko Schenk, Christian Chiarcos |
Abstract | We introduce an attention-based Bi-LSTM for Chinese implicit discourse relations and demonstrate that modeling argument pairs as a joint sequence can outperform word order-agnostic approaches. Our model benefits from a partial sampling scheme and is conceptually simple, yet achieves state-of-the-art performance on the Chinese Discourse Treebank. We also visualize its attention activity to illustrate the model’s ability to selectively focus on the relevant parts of an input sequence. |
Tasks | |
Published | 2017-04-26 |
URL | http://arxiv.org/abs/1704.08092v1 |
http://arxiv.org/pdf/1704.08092v1.pdf | |
PWC | https://paperswithcode.com/paper/a-recurrent-neural-model-with-attention-for |
Repo | |
Framework | |
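A minimal Bi-LSTM-with-attention classifier over a jointly encoded argument pair, assuming PyTorch; the embedding setup, partial sampling scheme, and the Chinese Discourse Treebank label set are not modeled, and all sizes are illustrative.

```python
# Bi-LSTM with attention pooling over a joint argument-pair sequence (sketch).
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttnBiLSTM(nn.Module):
    def __init__(self, vocab=1000, emb=50, hidden=64, n_relations=4):
        super().__init__()
        self.emb = nn.Embedding(vocab, emb)
        self.lstm = nn.LSTM(emb, hidden, batch_first=True, bidirectional=True)
        self.attn = nn.Linear(2 * hidden, 1)
        self.out = nn.Linear(2 * hidden, n_relations)

    def forward(self, token_ids):
        h, _ = self.lstm(self.emb(token_ids))                   # (batch, seq, 2*hidden)
        scores = F.softmax(self.attn(h).squeeze(-1), dim=-1)    # attention weights
        pooled = torch.bmm(scores.unsqueeze(1), h).squeeze(1)   # weighted sum over time
        return self.out(pooled), scores

model = AttnBiLSTM()
pair = torch.randint(0, 1000, (8, 20))   # arg1 and arg2 concatenated into one sequence
logits, attention = model(pair)
print(logits.shape, attention.shape)     # (8, 4) relation scores, (8, 20) weights
```

The returned attention weights are what one would visualize to see which tokens the model focuses on.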
Online Learning with Randomized Feedback Graphs for Optimal PUE Attacks in Cognitive Radio Networks
Title | Online Learning with Randomized Feedback Graphs for Optimal PUE Attacks in Cognitive Radio Networks |
Authors | Monireh Dabaghchian, Amir Alipour-Fanid, Kai Zeng, Qingsi Wang, Peter Auer |
Abstract | In a cognitive radio network, a secondary user learns the spectrum environment and dynamically accesses the channel where the primary user is inactive. At the same time, a primary user emulation (PUE) attacker can send falsified primary user signals and prevent the secondary user from utilizing the available channel. The best attacking strategies that an attacker can apply have not been well studied. In this paper, for the first time, we study optimal PUE attack strategies by formulating an online learning problem where the attacker needs to dynamically decide the attacking channel in each time slot based on its attacking experience. The challenge in our problem is that since the PUE attack happens in the spectrum sensing phase, the attacker cannot observe the reward on the attacked channel. To address this challenge, we propose online learning-based attacking strategies that exploit the attacker’s observation capabilities. Through our analysis, we show that with no observation within the attacking slot, the attacker loses on the regret order, and with the observation of at least one channel, there is a significant improvement in the attacking performance. Observation of multiple channels does not give additional benefit to the attacker (only a constant scaling), though it gives insight into the number of observations required to achieve the minimum constant factor. Our proposed algorithms are optimal in the sense that their regret upper bounds match their corresponding regret lower bounds. We show consistency between simulation and analytical results under various system parameters. |
Tasks | |
Published | 2017-09-28 |
URL | http://arxiv.org/abs/1709.10128v3 |
http://arxiv.org/pdf/1709.10128v3.pdf | |
PWC | https://paperswithcode.com/paper/online-learning-with-randomized-feedback |
Repo | |
Framework | |
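To illustrate the online-learning framing, here is an adversarial-bandit (EXP3-style) channel-attack loop with importance-weighted reward estimates; the channel model, parameters, and algorithm are stand-ins, not the paper's algorithms, feedback graphs, or regret analysis.

```python
# EXP3-style channel selection for an attacker (assumption-based sketch).
import math
import random

random.seed(0)
n_channels, horizon, gamma = 4, 2000, 0.1
p_idle = [0.2, 0.5, 0.7, 0.9]       # unknown chance the primary user is absent per channel

weights = [1.0] * n_channels
hits = [0.0] * n_channels
for t in range(horizon):
    total = sum(weights)
    probs = [(1 - gamma) * w / total + gamma / n_channels for w in weights]
    channel = random.choices(range(n_channels), weights=probs)[0]
    reward = 1.0 if random.random() < p_idle[channel] else 0.0  # attack pays off if channel idle
    hits[channel] += reward
    est = reward / probs[channel]                   # importance-weighted reward estimate
    weights[channel] *= math.exp(gamma * est / n_channels)

print("final selection probabilities:", [round(p, 2) for p in probs])
print("successful attacks per channel:", hits)
```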
Low Resolution Face Recognition Using a Two-Branch Deep Convolutional Neural Network Architecture
Title | Low Resolution Face Recognition Using a Two-Branch Deep Convolutional Neural Network Architecture |
Authors | Erfan Zangeneh, Mohammad Rahmati, Yalda Mohsenzadeh |
Abstract | We propose a novel coupled mappings method for low resolution face recognition using deep convolutional neural networks (DCNNs). The proposed architecture consists of two branches of DCNNs that map the high and low resolution face images into a common space with nonlinear transformations. The branch corresponding to the transformation of high resolution images consists of 14 layers, and the other branch, which maps the low resolution face images to the common space, includes a 5-layer super-resolution network connected to a 14-layer network. The distance between the features of corresponding high and low resolution images is backpropagated to train the networks. Our proposed method is evaluated on the FERET data set and compared with state-of-the-art competing methods. Our extensive experimental results show that the proposed method significantly improves the recognition performance, especially for very low resolution probe face images (11.4% improvement in recognition accuracy). Furthermore, it can reconstruct a high resolution image from its corresponding low resolution probe image, which is comparable with state-of-the-art super-resolution methods in terms of visual quality. |
Tasks | Face Recognition, Super-Resolution |
Published | 2017-06-20 |
URL | http://arxiv.org/abs/1706.06247v1 |
http://arxiv.org/pdf/1706.06247v1.pdf | |
PWC | https://paperswithcode.com/paper/low-resolution-face-recognition-using-a-two |
Repo | |
Framework | |
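A sketch of the coupled-mapping idea: two branches embed high- and low-resolution faces into a shared space, and training pulls matching pairs together. This is assumption-based; the tiny networks below do not match the paper's 14-layer and 5-layer super-resolution design, and the distance loss is a simplification.

```python
# Two-branch coupled mapping for low-resolution face recognition (sketch).
import torch
import torch.nn as nn
import torch.nn.functional as F

def embed_net():
    return nn.Sequential(
        nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
        nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, 64))

class CoupledMapping(nn.Module):
    def __init__(self):
        super().__init__()
        self.hr_branch = embed_net()
        self.sr = nn.Sequential(                      # tiny stand-in "super-resolution" head
            nn.Upsample(scale_factor=4, mode="bilinear", align_corners=False),
            nn.Conv2d(1, 1, 3, padding=1))
        self.lr_branch = embed_net()

    def forward(self, hr, lr):
        return self.hr_branch(hr), self.lr_branch(self.sr(lr))

model = CoupledMapping()
hr = torch.randn(8, 1, 64, 64)                        # high-resolution faces
lr = F.avg_pool2d(hr, 4)                              # 16x16 low-resolution versions
e_hr, e_lr = model(hr, lr)
loss = F.mse_loss(e_lr, e_hr.detach())                # pull LR embedding toward HR embedding
loss.backward()
print(e_hr.shape, e_lr.shape, loss.item())
```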
Image denoising using group sparsity residual and external nonlocal self-similarity prior
Title | Image denoising using group sparsity residual and external nonlocal self-similarity prior |
Authors | Zhiyuan Zha, Xinggan Zhang, Qiong Wang, Yechao Bai, Lan Tang |
Abstract | Nonlocal image representation has been successfully used in many image-related inverse problems, including denoising, deblurring and deblocking. However, the majority of reconstruction methods only exploit the nonlocal self-similarity (NSS) prior of the degraded observation image, which makes it very challenging to reconstruct the latent clean image. In this paper we propose a novel model for image denoising via group sparsity residual and an external NSS prior. To boost the performance of image denoising, the concept of group sparsity residual is proposed, and thus the problem of image denoising is transformed into one of reducing the group sparsity residual. Because the groups contain a large amount of NSS information of natural images, we obtain a good estimate of the group sparse coefficients of the original image from the external NSS prior, based on Gaussian Mixture Model (GMM) learning, and use the group sparse coefficients of the noisy image to approximate this estimate. Experimental results demonstrate that the proposed method not only outperforms many state-of-the-art methods, but also delivers the best qualitative denoising results with finer details and less ringing artifacts. |
Tasks | Deblurring, Denoising, Image Denoising |
Published | 2017-01-03 |
URL | http://arxiv.org/abs/1701.00723v1 |
http://arxiv.org/pdf/1701.00723v1.pdf | |
PWC | https://paperswithcode.com/paper/image-denoising-using-group-sparsity-residual |
Repo | |
Framework | |
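A toy NumPy sketch of the group-sparsity-residual idea: shrink the noisy group's coefficients toward an externally estimated set of coefficients instead of toward zero. Assumptions: an orthonormal dictionary, synthetic patch groups, and a simulated "external" estimate standing in for the GMM-based prior; the threshold choice is also arbitrary.

```python
# Group sparsity residual shrinkage on a synthetic patch group (assumption-based sketch).
import numpy as np

rng = np.random.default_rng(1)
dim, n_patches, sigma = 64, 16, 0.3                      # 8x8 patches stacked as a group

d = np.linalg.qr(rng.standard_normal((dim, dim)))[0]     # orthonormal dictionary
clean_coeffs = rng.standard_normal((dim, n_patches)) * (rng.random((dim, 1)) < 0.2)
clean = d @ clean_coeffs                                 # group of similar clean patches
noisy = clean + sigma * rng.standard_normal(clean.shape)

def soft(x, tau):
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

a_noisy = d.T @ noisy                                    # noisy group sparse coefficients
b_external = clean_coeffs + 0.1 * rng.standard_normal(clean_coeffs.shape)  # stand-in prior
a_hat = b_external + soft(a_noisy - b_external, 1.5 * sigma)  # shrink the residual, not the coeffs
denoised = d @ a_hat

def psnr(x, ref):
    return 10 * np.log10(ref.size * np.ptp(ref) ** 2 / np.sum((x - ref) ** 2))

print("PSNR noisy:   ", round(psnr(noisy, clean), 2))
print("PSNR denoised:", round(psnr(denoised, clean), 2))
```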