Paper Group ANR 531
A Comprehensive Survey on Bengali Phoneme Recognition
Title | A Comprehensive Survey on Bengali Phoneme Recognition |
Authors | Sadia Tasnim Swarna, Shamim Ehsan, Md. Saiful Islam, Marium E Jannat |
Abstract | Various hidden Markov model based phoneme recognition methods for the Bengali language are reviewed. Automatic phoneme recognition for Bengali using multilayer neural networks is also reviewed, and the usefulness of multilayer over single-layer neural networks is discussed. Bangla phonetic feature table construction and enhancement for Bengali speech recognition are discussed as well, and the surveyed methods are compared. |
Tasks | Speech Recognition |
Published | 2017-01-27 |
URL | http://arxiv.org/abs/1701.08156v2 |
http://arxiv.org/pdf/1701.08156v2.pdf | |
PWC | https://paperswithcode.com/paper/a-comprehensive-survey-on-bengali-phoneme |
Repo | |
Framework | |
Imitation from Observation: Learning to Imitate Behaviors from Raw Video via Context Translation
Title | Imitation from Observation: Learning to Imitate Behaviors from Raw Video via Context Translation |
Authors | YuXuan Liu, Abhishek Gupta, Pieter Abbeel, Sergey Levine |
Abstract | Imitation learning is an effective approach for autonomous systems to acquire control policies when an explicit reward function is unavailable, using supervision provided as demonstrations from an expert, typically a human operator. However, standard imitation learning methods assume that the agent receives examples of observation-action tuples that could be provided, for instance, to a supervised learning algorithm. This stands in contrast to how humans and animals imitate: we observe another person performing some behavior and then figure out which actions will realize that behavior, compensating for changes in viewpoint, surroundings, object positions and types, and other factors. We term this kind of imitation learning “imitation-from-observation,” and propose an imitation learning method based on video prediction with context translation and deep reinforcement learning. This lifts the assumption in imitation learning that the demonstration should consist of observations in the same environment configuration, and enables a variety of interesting applications, including learning robotic skills that involve tool use simply by observing videos of human tool use. Our experimental results show the effectiveness of our approach in learning a wide range of real-world robotic tasks modeled after common household chores from videos of a human demonstrator, including sweeping, ladling almonds, pushing objects as well as a number of tasks in simulation. |
Tasks | Imitation Learning, Video Prediction |
Published | 2017-07-11 |
URL | http://arxiv.org/abs/1707.03374v2 |
http://arxiv.org/pdf/1707.03374v2.pdf | |
PWC | https://paperswithcode.com/paper/imitation-from-observation-learning-to |
Repo | |
Framework | |
Simplified Long Short-term Memory Recurrent Neural Networks: part III
Title | Simplified Long Short-term Memory Recurrent Neural Networks: part III |
Authors | Atra Akandeh, Fathi M. Salem |
Abstract | This is part III of a three-part work. In parts I and II, we presented eight variants of simplified Long Short-Term Memory (LSTM) recurrent neural networks (RNNs). Fast computation, especially on constrained computing resources, is an important factor in processing big time-sequence data. In this part III paper, we present and evaluate two new LSTM model variants which dramatically reduce the computational load while retaining comparable performance to the base (standard) LSTM RNNs. In these new variants, we impose (Hadamard) pointwise state multiplications in the cell-memory network in addition to the gating signal networks. (An illustrative sketch of such a pointwise-gated cell follows this entry.) |
Tasks | |
Published | 2017-07-14 |
URL | http://arxiv.org/abs/1707.04626v1 |
http://arxiv.org/pdf/1707.04626v1.pdf | |
PWC | https://paperswithcode.com/paper/simplified-long-short-term-memory-recurrent |
Repo | |
Framework | |
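The exact gate equations for the two new variants are defined in the paper; as a rough illustration of the underlying idea — replacing full matrix products in the gating signals with (Hadamard) pointwise multiplications against the cell state — here is a minimal NumPy sketch of one such reduced cell. The parameter names and the specific wiring below are assumptions for illustration, not the paper's definitions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def simplified_lstm_step(x, h, c, p):
    """One step of a reduced LSTM cell: gates use pointwise (Hadamard)
    products with the cell state instead of full matrix products,
    cutting both parameters and compute."""
    f = sigmoid(p["uf"] * c + p["bf"])        # forget gate: Hadamard, not matmul
    i = sigmoid(p["ui"] * c + p["bi"])        # input gate
    o = sigmoid(p["uo"] * c + p["bo"])        # output gate
    c_tilde = np.tanh(p["Wc"] @ x + p["uc"] * h + p["bc"])  # candidate memory
    c_new = f * c + i * c_tilde
    h_new = o * np.tanh(c_new)
    return h_new, c_new

# toy usage: 8-dim input, 16-dim state
rng = np.random.default_rng(0)
nx, nh = 8, 16
p = {"Wc": rng.normal(0, 0.1, (nh, nx)),
     "uc": rng.normal(0, 0.1, nh),
     **{k: rng.normal(0, 0.1, nh) for k in ("uf", "ui", "uo")},
     **{k: np.zeros(nh) for k in ("bf", "bi", "bo", "bc")}}
h, c = np.zeros(nh), np.zeros(nh)
for t in range(5):
    h, c = simplified_lstm_step(rng.normal(size=nx), h, c, p)
```

Note how the only matrix-vector product left is against the input; everything touching the recurrent state is elementwise, which is where the computational savings come from.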
Fingerprint Orientation Refinement through Iterative Smoothing
Title | Fingerprint Orientation Refinement through Iterative Smoothing |
Authors | Pierluigi Maponi, Riccardo Piergallini, Filippo Santarelli |
Abstract | We propose a new gradient-based method for extracting the orientation field associated with a fingerprint, and a regularisation procedure to improve the orientation field computed from noisy fingerprint images. The regularisation algorithm is based on three new integral operators, introduced and discussed in this paper. A pre-processing technique is also proposed to improve the algorithm's performance. The results of a numerical experiment are reported as evidence of the efficiency of the proposed algorithm. (A sketch of the classical gradient-based baseline follows this entry.) |
Tasks | |
Published | 2017-11-09 |
URL | http://arxiv.org/abs/1711.03214v1 |
http://arxiv.org/pdf/1711.03214v1.pdf | |
PWC | https://paperswithcode.com/paper/fingerprint-orientation-refinement-through |
Repo | |
Framework | |
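The paper's regularisation is built from three new integral operators; for orientation, here is the classical gradient/structure-tensor estimator of the orientation field, followed by a generic iterative smoothing pass in the doubled-angle representation. This is the standard baseline such methods refine, not the paper's operators; block sizes and iteration counts are placeholder assumptions.

```python
import numpy as np
from scipy.ndimage import sobel, uniform_filter

def orientation_field(img, block=16):
    """Classical gradient-based ridge orientation: average the structure
    tensor over blocks, then use the doubled-angle formula."""
    gx, gy = sobel(img, axis=1), sobel(img, axis=0)
    gxx = uniform_filter(gx * gx, block)
    gyy = uniform_filter(gy * gy, block)
    gxy = uniform_filter(gx * gy, block)
    # ridge direction is perpendicular to the dominant gradient direction
    return 0.5 * np.arctan2(2.0 * gxy, gxx - gyy) + np.pi / 2

def iterative_smoothing(theta, iters=5, size=5):
    """Regularise by repeatedly averaging cos(2*theta) and sin(2*theta);
    the doubled-angle trick avoids the pi-periodicity wrap-around."""
    for _ in range(iters):
        c, s = np.cos(2 * theta), np.sin(2 * theta)
        theta = 0.5 * np.arctan2(uniform_filter(s, size), uniform_filter(c, size))
    return theta

# toy usage on synthetic parallel ridges
yy, xx = np.mgrid[0:128, 0:128]
img = np.sin(0.3 * (xx + 0.5 * yy))
theta = iterative_smoothing(orientation_field(img))
```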
Converting Cascade-Correlation Neural Nets into Probabilistic Generative Models
Title | Converting Cascade-Correlation Neural Nets into Probabilistic Generative Models |
Authors | Ardavan Salehi Nobandegani, Thomas R. Shultz |
Abstract | Humans are not only adept at recognizing what class an input instance belongs to (i.e., the classification task), but perhaps more remarkably, they can imagine (i.e., generate) plausible instances of a desired class with ease when prompted. Inspired by this, we propose a framework for transforming Cascade-Correlation Neural Networks (CCNNs) into probabilistic generative models, thereby enabling CCNNs to generate samples from a category of interest. CCNNs are a well-known class of deterministic, discriminative NNs, which autonomously construct their topology and have been successful in accounting for a variety of psychological phenomena. Our proposed framework is based on a Markov Chain Monte Carlo (MCMC) method, the Metropolis-adjusted Langevin algorithm, which capitalizes on the gradient information of the target distribution to direct its exploration towards regions of high probability, thereby achieving good mixing properties. Through extensive simulations, we demonstrate the efficacy of our proposed framework. (A sketch of the Langevin sampler follows this entry.) |
Tasks | |
Published | 2017-01-18 |
URL | http://arxiv.org/abs/1701.05004v1 |
http://arxiv.org/pdf/1701.05004v1.pdf | |
PWC | https://paperswithcode.com/paper/converting-cascade-correlation-neural-nets |
Repo | |
Framework | |
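The sampler named in the abstract, the Metropolis-adjusted Langevin algorithm (MALA), is standard and short enough to sketch in full. The paper's contribution is computing the gradient of a CCNN-derived target; below, a toy Gaussian stands in for that target as an assumption.

```python
import numpy as np

def mala(grad_log_p, log_p, x0, step=0.05, n_samples=5000, rng=None):
    """MALA: take a noisy gradient-ascent (Langevin) step on log p,
    then Metropolis-accept/reject so the chain targets p exactly."""
    if rng is None:
        rng = np.random.default_rng()
    x = np.asarray(x0, dtype=float)
    out = []
    for _ in range(n_samples):
        noise = rng.standard_normal(x.shape)
        prop = x + step * grad_log_p(x) + np.sqrt(2.0 * step) * noise
        # Hastings correction: the Langevin proposal is not symmetric.
        def log_q(a, b):  # log density of proposing a when currently at b
            return -np.sum((a - b - step * grad_log_p(b)) ** 2) / (4.0 * step)
        log_alpha = log_p(prop) - log_p(x) + log_q(x, prop) - log_q(prop, x)
        if np.log(rng.uniform()) < log_alpha:
            x = prop
        out.append(x.copy())
    return np.array(out)

# toy usage: standard 2-D Gaussian standing in for the CCNN target
samples = mala(lambda x: -x, lambda x: -0.5 * x @ x, np.zeros(2))
```

The gradient term biases proposals uphill, which is exactly the "directed exploration" the abstract credits for good mixing.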
Learning Robust Representations for Computer Vision
Title | Learning Robust Representations for Computer Vision |
Authors | Peng Zheng, Aleksandr Y. Aravkin, Karthikeyan Natesan Ramamurthy, Jayaraman J. Thiagarajan |
Abstract | Unsupervised learning techniques in computer vision often require learning latent representations, such as low-dimensional linear and non-linear subspaces. Noise and outliers in the data can frustrate these approaches by obscuring the latent spaces. Our main goal is a deeper understanding, and new development, of robust approaches for representation learning. We provide a new interpretation of existing robust approaches and present two specific contributions: a new robust PCA approach, which can separate foreground features from a dynamic background, and a novel robust spectral clustering method, which can cluster facial images with high accuracy. Both contributions show superior performance to standard methods on real-world test sets. (A sketch of a classical robust-PCA baseline follows this entry.) |
Tasks | Representation Learning |
Published | 2017-07-31 |
URL | http://arxiv.org/abs/1708.00069v1 |
http://arxiv.org/pdf/1708.00069v1.pdf | |
PWC | https://paperswithcode.com/paper/learning-robust-representations-for-computer |
Repo | |
Framework | |
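The paper develops its own robust formulations; as a point of reference, here is the classical convex baseline — principal component pursuit solved by inexact augmented Lagrangian iterations — which performs the same foreground/background split the abstract mentions. The parameter defaults follow the usual PCP conventions and are assumptions here, not the paper's algorithm.

```python
import numpy as np

def shrink(X, tau):                        # soft-thresholding (prox of the l1 norm)
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

def svt(X, tau):                           # singular value thresholding (prox of the nuclear norm)
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U * shrink(s, tau)) @ Vt

def robust_pca(M, max_iter=500, tol=1e-7):
    """Decompose M ~ L (low rank) + S (sparse) by inexact ALM on the
    principal component pursuit objective. Stacking video frames as
    columns of M makes L the static background and S the foreground."""
    M = np.asarray(M, dtype=float)
    lam = 1.0 / np.sqrt(max(M.shape))
    mu = M.size / (4.0 * np.abs(M).sum())
    S, Y = np.zeros_like(M), np.zeros_like(M)
    for _ in range(max_iter):
        L = svt(M - S + Y / mu, 1.0 / mu)      # low-rank update
        S = shrink(M - L + Y / mu, lam / mu)   # sparse update
        Y += mu * (M - L - S)                  # dual ascent on the constraint M = L + S
        if np.linalg.norm(M - L - S) <= tol * np.linalg.norm(M):
            break
    return L, S
```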
Combining Machine Learning and Physics to Understand Glassy Systems
Title | Combining Machine Learning and Physics to Understand Glassy Systems |
Authors | Samuel S. Schoenholz |
Abstract | Our understanding of supercooled liquids and glasses has lagged significantly behind that of simple liquids and crystalline solids. This is due in part to the many potentially relevant degrees of freedom arising from the disorder inherent to these systems, and in part to non-equilibrium effects which are difficult to treat in the standard context of statistical physics. Together these issues have resulted in a field whose theories are under-constrained by experiment and where fundamental questions remain unresolved. Mean-field results have been successful in infinite dimensions, but they assume uniform local structure and it is unclear to what extent they apply to realistic systems. At odds with this are theories premised on the existence of structural defects. However, until recently it has been impossible to find structural signatures that are predictive of dynamics. Here we summarize and recast the results from several recent papers offering a data-driven approach to building a phenomenological theory of disordered materials by combining machine learning with physical intuition. |
Tasks | |
Published | 2017-09-23 |
URL | http://arxiv.org/abs/1709.08015v1 |
http://arxiv.org/pdf/1709.08015v1.pdf | |
PWC | https://paperswithcode.com/paper/combining-machine-learning-and-physics-to |
Repo | |
Framework | |
The landscape of the spiked tensor model
Title | The landscape of the spiked tensor model |
Authors | Gerard Ben Arous, Song Mei, Andrea Montanari, Mihai Nica |
Abstract | We consider the problem of estimating a large rank-one tensor ${\boldsymbol u}^{\otimes k}\in({\mathbb R}^{n})^{\otimes k}$, $k\ge 3$, in Gaussian noise. Earlier work characterized a critical signal-to-noise ratio $\lambda_{Bayes}= O(1)$ above which an ideal estimator achieves strictly positive correlation with the unknown vector of interest. Remarkably, no polynomial-time algorithm is known that achieves this goal unless $\lambda\ge C n^{(k-2)/4}$, and even powerful semidefinite programming relaxations appear to fail for $1\ll \lambda\ll n^{(k-2)/4}$. In order to elucidate this behavior, we consider the maximum likelihood estimator, which requires maximizing a degree-$k$ homogeneous polynomial over the unit sphere in $n$ dimensions. We compute the expected number of critical points and local maxima of this objective function, show that it is exponential in the dimension $n$, and give exact formulas for the exponential growth rate. We show that (for $\lambda$ larger than a constant) critical points are either very close to the unknown vector ${\boldsymbol u}$, or are confined in a band of width $\Theta(\lambda^{-1/(k-1)})$ around the maximum circle that is orthogonal to ${\boldsymbol u}$. For local maxima, this band shrinks to be of size $\Theta(\lambda^{-1/(k-2)})$. These 'uninformative' local maxima are likely to cause the failure of optimization algorithms. (A numerical sketch follows this entry.) |
Tasks | |
Published | 2017-11-15 |
URL | http://arxiv.org/abs/1711.05424v2 |
http://arxiv.org/pdf/1711.05424v2.pdf | |
PWC | https://paperswithcode.com/paper/the-landscape-of-the-spiked-tensor-model |
Repo | |
Framework | |
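A small numerical companion to the abstract: generate a spiked tensor and run tensor power iteration, a simple local ascent on the degree-3 objective $\langle T, x^{\otimes 3}\rangle$ over the sphere. The noise normalisation below is one common convention and may differ from the paper's; the qualitative point is that a warm start near ${\boldsymbol u}$ finds the informative maximum, while a cold random start can stall in the uninformative equatorial band.

```python
import numpy as np

def spiked_tensor(n, lam, rng):
    """T = lam * u (x) u (x) u + W, with one common noise scaling (assumption)."""
    u = rng.standard_normal(n); u /= np.linalg.norm(u)
    W = rng.standard_normal((n, n, n)) / np.sqrt(n)
    return lam * np.einsum('i,j,k->ijk', u, u, u) + W, u

def power_iteration(T, x, iters=200):
    """Local ascent: x <- T(., x, x) / ||T(., x, x)||, a stand-in for any
    local optimizer of the degree-3 MLE objective on the sphere."""
    for _ in range(iters):
        x = np.einsum('ijk,j,k->i', T, x, x)
        x /= np.linalg.norm(x)
    return x

rng = np.random.default_rng(0)
n, lam = 50, 6.0
T, u = spiked_tensor(n, lam, rng)

cold = rng.standard_normal(n); cold /= np.linalg.norm(cold)
warm = u + 0.3 * rng.standard_normal(n); warm /= np.linalg.norm(warm)
for name, x0 in (("cold start", cold), ("warm start", warm)):
    x = power_iteration(T, x0)
    print(name, "|<x,u>| =", round(abs(x @ u), 3))
```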
An Exploration of Word Embedding Initialization in Deep-Learning Tasks
Title | An Exploration of Word Embedding Initialization in Deep-Learning Tasks |
Authors | Tom Kocmi, Ondřej Bojar |
Abstract | Word embeddings are the interface between the world of discrete units of text processing and the continuous, differentiable world of neural networks. In this work, we examine various random and pretrained initialization methods for embeddings used in deep networks and their effect on the performance on four NLP tasks with both recurrent and convolutional architectures. We confirm that pretrained embeddings are a little better than random initialization, especially considering the speed of learning. On the other hand, we do not see any significant difference between various methods of random initialization, as long as the variance is kept reasonably low. High-variance initialization prevents the network from using the space of embeddings and forces it to use other free parameters to accomplish the task. We support this hypothesis by observing the performance in learning lexical relations and by the fact that the network can learn to perform reasonably in its task even with fixed random embeddings. (A sketch of the initialization options follows this entry.) |
Tasks | Word Embeddings |
Published | 2017-11-24 |
URL | http://arxiv.org/abs/1711.09160v1 |
http://arxiv.org/pdf/1711.09160v1.pdf | |
PWC | https://paperswithcode.com/paper/an-exploration-of-word-embedding |
Repo | |
Framework | |
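The regimes the paper compares translate directly into code. A hedged PyTorch sketch of an embedding layer supporting low-variance random initialization, pretrained vectors, and fixed (frozen) random embeddings; the std value is an assumption, the paper's finding being only that the variance should stay reasonably low.

```python
import torch
import torch.nn as nn

def make_embedding(vocab_size, dim, std=0.1, pretrained=None, freeze=False):
    """Embedding layer with the knobs the paper varies: random init with a
    chosen variance, or pretrained vectors, optionally frozen."""
    emb = nn.Embedding(vocab_size, dim)
    if pretrained is not None:
        emb.weight.data.copy_(torch.as_tensor(pretrained, dtype=torch.float32))
    else:
        nn.init.normal_(emb.weight, mean=0.0, std=std)  # keep the variance low
    emb.weight.requires_grad = not freeze               # frozen ~= fixed random embeddings
    return emb

emb = make_embedding(10_000, 300)                   # trainable, low-variance random init
fixed = make_embedding(10_000, 300, freeze=True)    # fixed-random baseline from the abstract
```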
Towards continuous control of flippers for a multi-terrain robot using deep reinforcement learning
Title | Towards continuous control of flippers for a multi-terrain robot using deep reinforcement learning |
Authors | Giuseppe Paolo, Lei Tai, Ming Liu |
Abstract | In this paper we focus on developing a control algorithm for multi-terrain tracked robots with flippers using a reinforcement learning (RL) approach. The work is based on the deep deterministic policy gradient (DDPG) algorithm, proven to be very successful in simple simulation environments. The algorithm works in an end-to-end fashion to control the continuous position of the flippers. This end-to-end approach makes it easy to apply the controller to a wide array of circumstances, but the huge flexibility comes at the cost of an increased difficulty of solution. The complexity of the task is increased further by the fact that real multi-terrain robots move in partially observable environments. Notwithstanding these complications, being able to smoothly control a multi-terrain robot can produce huge benefits in the daily lives of impaired people or in search and rescue situations. (A sketch of the DDPG update follows this entry.) |
Tasks | Continuous Control |
Published | 2017-09-25 |
URL | http://arxiv.org/abs/1709.08430v1 |
http://arxiv.org/pdf/1709.08430v1.pdf | |
PWC | https://paperswithcode.com/paper/towards-continuous-control-of-flippers-for-a |
Repo | |
Framework | |
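DDPG itself is a published algorithm; a compact PyTorch sketch of its core update is below. The observation/action dimensions, network sizes, and learning rates are placeholders, not the paper's robot setup; the tanh output maps naturally to bounded continuous flipper positions.

```python
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

obs_dim, act_dim = 10, 4   # placeholder dims: robot state -> continuous flipper commands

actor = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(),
                      nn.Linear(64, act_dim), nn.Tanh())
critic = nn.Sequential(nn.Linear(obs_dim + act_dim, 64), nn.ReLU(),
                       nn.Linear(64, 1))
actor_t, critic_t = copy.deepcopy(actor), copy.deepcopy(critic)
opt_a = torch.optim.Adam(actor.parameters(), lr=1e-4)
opt_c = torch.optim.Adam(critic.parameters(), lr=1e-3)

def ddpg_update(s, a, r, s2, done, gamma=0.99, tau=0.005):
    # Critic: regress Q(s,a) onto the bootstrapped target from the target nets.
    with torch.no_grad():
        q_next = critic_t(torch.cat([s2, actor_t(s2)], dim=1))
        y = r + gamma * (1 - done) * q_next
    q = critic(torch.cat([s, a], dim=1))
    opt_c.zero_grad(); F.mse_loss(q, y).backward(); opt_c.step()
    # Actor: ascend the critic's value of its own actions (deterministic policy gradient).
    opt_a.zero_grad()
    (-critic(torch.cat([s, actor(s)], dim=1)).mean()).backward()
    opt_a.step()
    # Polyak-average the target networks for stability.
    for net, tgt in ((actor, actor_t), (critic, critic_t)):
        for p, tp in zip(net.parameters(), tgt.parameters()):
            tp.data.mul_(1 - tau).add_(tau * p.data)

# toy usage with a fake minibatch
B = 32
s, s2 = torch.randn(B, obs_dim), torch.randn(B, obs_dim)
a, r, done = torch.randn(B, act_dim), torch.randn(B, 1), torch.zeros(B, 1)
ddpg_update(s, a, r, s2, done)
```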
A Novel Brain Decoding Method: a Correlation Network Framework for Revealing Brain Connections
Title | A Novel Brain Decoding Method: a Correlation Network Framework for Revealing Brain Connections |
Authors | Siyu Yu, Nanning Zheng, Yongqiang Ma, Hao Wu, Badong Chen |
Abstract | Brain decoding is an active topic in cognitive science, focused on reconstructing perceptual images from brain activity. Analyzing the correlations in data collected from human brain activity and representing activity patterns are two problems in brain decoding based on functional magnetic resonance imaging (fMRI) signals. However, existing correlation analysis methods mainly focus on the strength information of voxels, which reveals functional connectivity in the cerebral cortex. They tend to neglect the structural information that implies intracortical or intrinsic connections, that is, structural connectivity. Hence, the effective connectivity inferred by these methods is relatively one-sided. We therefore propose a correlation network (CorrNet) framework that can be flexibly combined with diverse pattern representation models. In the CorrNet framework, topological correlation is introduced to reveal structural information. Rich correlations are obtained, which helps specify the underlying effective connectivity. We also combine the CorrNet framework with a linear support vector machine (SVM) and a dynamic evolving spiking neural network (SNN) for pattern representation separately, thus providing a novel method for decoding cognitive activity patterns. Experimental results verify the reliability and robustness of our CorrNet framework and demonstrate that the new method achieves significant improvement in brain decoding over comparable methods. (A sketch of the plain correlation baseline follows this entry.) |
Tasks | Brain Decoding |
Published | 2017-12-01 |
URL | http://arxiv.org/abs/1712.01668v1 |
http://arxiv.org/pdf/1712.01668v1.pdf | |
PWC | https://paperswithcode.com/paper/a-novel-brain-decoding-method-a-correlation |
Repo | |
Framework | |
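For context, here is the plain functional-correlation baseline the abstract contrasts with (strength information only), fed to a linear SVM. The paper's CorrNet additionally injects topological correlation, which this sketch does not attempt; data shapes and labels below are synthetic placeholders.

```python
import numpy as np
from sklearn.svm import LinearSVC

def correlation_features(trials):
    """Per-trial functional-connectivity features: the upper triangle of the
    voxel-by-voxel correlation matrix. trials: (n_trials, n_voxels, n_timepoints)."""
    feats = []
    for x in trials:
        c = np.corrcoef(x)                       # voxel-by-voxel correlation
        iu = np.triu_indices_from(c, k=1)        # keep each pair once
        feats.append(c[iu])
    return np.asarray(feats)

# toy usage with synthetic data (random labels -> chance-level accuracy)
rng = np.random.default_rng(0)
X = rng.standard_normal((40, 30, 100))           # 40 trials, 30 voxels, 100 timepoints
y = rng.integers(0, 2, 40)
clf = LinearSVC(max_iter=5000).fit(correlation_features(X), y)
```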
Traffic-Aware Transmission Mode Selection in D2D-enabled Cellular Networks with Token System
Title | Traffic-Aware Transmission Mode Selection in D2D-enabled Cellular Networks with Token System |
Authors | Yiling Yuan, Tao Yang, Hui Feng, Bo Hu, Jianqiu Zhang, Bin Wang, Qiyong Lu |
Abstract | We consider a D2D-enabled cellular network where user equipments (UEs) owned by rational users are incentivized to form D2D pairs using tokens, which they exchange electronically to “buy” and “sell” D2D services. Meanwhile, the devices can choose the transmission mode, i.e. receiving data via cellular links or D2D links. Thus, taking as given the different benefits brought by diverse traffic types, the UEs can utilize their tokens more efficiently via transmission mode selection. In this paper, the optimal transmission mode selection strategy as well as the token collection policy are investigated to maximize the long-term utility in a dynamic network environment. The optimal policy is proved to be a threshold strategy, and the thresholds have a monotonicity property. Numerical simulations verify our observations, and the gain from transmission mode selection is observed. (A toy sketch of the threshold structure follows this entry.) |
Tasks | |
Published | 2017-03-02 |
URL | http://arxiv.org/abs/1703.00660v3 |
http://arxiv.org/pdf/1703.00660v3.pdf | |
PWC | https://paperswithcode.com/paper/traffic-aware-transmission-mode-selection-in |
Repo | |
Framework | |
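A heavily simplified toy of the decision problem — all dynamics, rewards, and probabilities below are invented for illustration, not the paper's model. Value iteration runs over (token count, traffic type) states; the greedy policy printed at the end can then be inspected for the threshold structure in the token count that the paper proves for its model.

```python
import numpy as np

K, gamma = 10, 0.95              # token cap and discount factor (toy values)
p_earn, q_high = 0.4, 0.3        # prob. of earning a token; prob. next request is high-value
r_cell = np.array([0.2, 0.5])    # reward via cellular link, by traffic type (low, high)
r_d2d  = np.array([0.5, 1.5])    # reward via D2D link (better, but spends one token)

def q_values(V):
    """Q-values for each (tokens, traffic type) and each mode, given value V."""
    EV = q_high * V[:, 1] + (1 - q_high) * V[:, 0]   # average over the next traffic type
    cont = lambda t: p_earn * EV[min(t + 1, K)] + (1 - p_earn) * EV[t]
    qc = np.array([[r_cell[g] + gamma * cont(t) for g in (0, 1)]
                   for t in range(K + 1)])
    qd = np.array([[r_d2d[g] + gamma * cont(t - 1) if t > 0 else -np.inf
                    for g in (0, 1)] for t in range(K + 1)])
    return qc, qd

V = np.zeros((K + 1, 2))
for _ in range(1000):            # value iteration to (near) convergence
    V = np.maximum(*q_values(V))

qc, qd = q_values(V)
for g, name in ((0, "low-value "), (1, "high-value")):
    print(name, ["D2D" if qd[t, g] >= qc[t, g] else "cell" for t in range(K + 1)])
```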
Analyzing Knowledge Transfer in Deep Q-Networks for Autonomously Handling Multiple Intersections
Title | Analyzing Knowledge Transfer in Deep Q-Networks for Autonomously Handling Multiple Intersections |
Authors | David Isele, Akansel Cosgun, Kikuo Fujimura |
Abstract | We analyze how the knowledge to autonomously handle one type of intersection, represented as a Deep Q-Network, translates to other types of intersections (tasks). We view intersection handling as a deep reinforcement learning problem, which approximates the state-action Q-function as a deep neural network. Using a traffic simulator, we show that directly copying a network trained for one type of intersection to another type decreases the success rate. We also show that a network pre-trained on Task A and then fine-tuned on Task B not only performs better on Task B than a network exclusively trained on Task A, but also retains knowledge of Task A. Finally, we examine a lifelong learning setting, where we train a single network on five different types of intersections sequentially, and show that the resulting network exhibits catastrophic forgetting of knowledge from previous tasks. This result suggests the need for a long-term memory component to preserve knowledge. (A sketch of the copy/fine-tune regimes follows this entry.) |
Tasks | Transfer Learning |
Published | 2017-05-02 |
URL | http://arxiv.org/abs/1705.01197v1 |
http://arxiv.org/pdf/1705.01197v1.pdf | |
PWC | https://paperswithcode.com/paper/analyzing-knowledge-transfer-in-deep-q |
Repo | |
Framework | |
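The two transfer regimes the abstract compares reduce to a few lines of PyTorch; network sizes and the fine-tuning learning rate below are assumptions, and the DQN training loop itself is omitted.

```python
import copy
import torch
import torch.nn as nn

def make_q_net(obs_dim, n_actions):
    """Placeholder Q-network; the paper's architecture is not reproduced here."""
    return nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU(),
                         nn.Linear(128, n_actions))

# Task A network, assumed already trained on intersection type A.
q_a = make_q_net(obs_dim=32, n_actions=5)

# Regime 1 (direct copy): reuse A's weights unchanged on Task B — per the
# paper, this decreases the success rate.
# Regime 2 (fine-tuning): initialise from A, then keep training on Task B.
q_b = copy.deepcopy(q_a)
opt = torch.optim.Adam(q_b.parameters(), lr=1e-4)   # smaller lr for fine-tuning (assumption)

# Forgetting check: after fine-tuning on B, re-evaluate on Task A against a
# frozen snapshot of the original network.
q_a_snapshot = copy.deepcopy(q_a).eval()
for p in q_a_snapshot.parameters():
    p.requires_grad_(False)
```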
Significance of Softmax-based Features in Comparison to Distance Metric Learning-based Features
Title | Significance of Softmax-based Features in Comparison to Distance Metric Learning-based Features |
Authors | Shota Horiguchi, Daiki Ikami, Kiyoharu Aizawa |
Abstract | The extraction of useful deep features is important for many computer vision tasks. Deep features extracted from classification networks have proved to perform well in those tasks. To obtain features of greater usefulness, end-to-end distance metric learning (DML) has been applied to train the feature extractor directly. However, in these DML studies, there were no equitable comparisons between features extracted from a DML-based network and those from a softmax-based network. In this paper, by presenting objective comparisons between these two approaches under the same network architecture, we show that softmax-based features perform competitively with, or even better than, state-of-the-art DML features when the size of the dataset, that is, the number of training samples per class, is large. The results suggest that softmax-based features should be properly taken into account when evaluating the performance of deep features. (A sketch of the softmax feature pipeline follows this entry.) |
Tasks | Metric Learning |
Published | 2017-12-29 |
URL | http://arxiv.org/abs/1712.10151v2 |
http://arxiv.org/pdf/1712.10151v2.pdf | |
PWC | https://paperswithcode.com/paper/significance-of-softmax-based-features-in |
Repo | |
Framework | |
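The softmax-based pipeline the paper evaluates amounts to training an ordinary classifier and reading off penultimate-layer activations at test time. A sketch using a torchvision backbone as a stand-in; the specific architecture and the L2 normalisation step are assumptions, not necessarily the paper's exact setup.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import resnet18

# Any classification backbone works; resnet18 is just an example choice.
model = resnet18(num_classes=100)
# ... train with plain softmax cross-entropy on the labelled data ...

# At test time, drop the classifier head and use the penultimate activations
# as the embedding, L2-normalised so cosine/Euclidean retrieval behaves sensibly.
backbone = nn.Sequential(*list(model.children())[:-1])   # everything but model.fc

def extract_features(x):
    with torch.no_grad():
        f = backbone(x).flatten(1)        # (batch, 512) for resnet18
        return F.normalize(f, dim=1)

feats = extract_features(torch.randn(4, 3, 224, 224))
```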
A Gaussian Process Regression Model for Distribution Inputs
Title | A Gaussian Process Regression Model for Distribution Inputs |
Authors | François Bachoc, Fabrice Gamboa, Jean-Michel Loubes, Nil Venet |
Abstract | Monge-Kantorovich distances, otherwise known as Wasserstein distances, have received growing attention in statistics and machine learning as a powerful discrepancy measure for probability distributions. In this paper, we focus on forecasting a Gaussian process indexed by probability distributions. To this end, we provide a family of positive definite kernels built from transportation-based distances. We give a probabilistic understanding of these kernels and characterize the corresponding stochastic processes. We prove that the Gaussian processes indexed by distributions corresponding to these kernels can be efficiently forecast, opening new perspectives in Gaussian process modeling. (A toy sketch of one such kernel follows this entry.) |
Tasks | Gaussian Processes |
Published | 2017-01-31 |
URL | http://arxiv.org/abs/1701.09055v2 |
http://arxiv.org/pdf/1701.09055v2.pdf | |
PWC | https://paperswithcode.com/paper/a-gaussian-process-regression-model-for |
Repo | |
Framework | |
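One concrete instance of the construction: a squared-exponential kernel in the 2-Wasserstein distance between 1-D empirical distributions, plugged into standard GP regression. Whether a given transportation-based kernel is positive definite is precisely the kind of property the paper characterises; the toy below uses the 1-D case, where W2 between equal-size samples reduces to sorted-sample differences. The data, target function, and lengthscale are assumptions.

```python
import numpy as np

def wasserstein2_1d(x, y):
    """W2 between two 1-D empirical distributions with equally many samples:
    the quantile coupling reduces the distance to a sort."""
    xs, ys = np.sort(x), np.sort(y)
    return np.sqrt(np.mean((xs - ys) ** 2))

def gram(dists_a, dists_b, ell=1.0):
    """Squared-exponential kernel in W2 — one transportation-based kernel of
    the family the paper studies."""
    return np.array([[np.exp(-wasserstein2_1d(a, b) ** 2 / (2 * ell ** 2))
                      for b in dists_b] for a in dists_a])

# GP regression where each input is a distribution given by samples
rng = np.random.default_rng(0)
train = [rng.normal(mu, 1.0, 200) for mu in np.linspace(-2, 2, 8)]
y = np.array([np.mean(d) ** 2 for d in train])      # toy target: squared mean
test = [rng.normal(0.7, 1.0, 200)]

K = gram(train, train) + 1e-6 * np.eye(len(train))  # jitter for numerical stability
k_star = gram(test, train)
y_pred = k_star @ np.linalg.solve(K, y)
print(y_pred)   # compare with the toy target 0.7**2 = 0.49
```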