January 28, 2020

Paper Group ANR 928

Best Pair Formulation & Accelerated Scheme for Non-convex Principal Component Pursuit

Title Best Pair Formulation & Accelerated Scheme for Non-convex Principal Component Pursuit
Authors Aritra Dutta, Filip Hanzely, Jingwei Liang, Peter Richtárik
Abstract The best pair problem aims to find a pair of points that minimizes the distance between two disjoint sets. In this paper, we formulate the classical robust principal component analysis (RPCA) problem as a best pair problem, a formulation that has not been considered before. We design an accelerated proximal gradient scheme to solve it, for which we show global convergence as well as a local linear rate. Our extensive numerical experiments on both real and synthetic data suggest that the algorithm outperforms relevant baseline algorithms in the literature.
Tasks
Published 2019-05-25
URL https://arxiv.org/abs/1905.10598v2
PDF https://arxiv.org/pdf/1905.10598v2.pdf
PWC https://paperswithcode.com/paper/best-pair-formulation-accelerated-scheme-for
Repo
Framework
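
The best pair view of RPCA can be illustrated with plain alternating projections between the set of rank-r matrices and the set of k-sparse matrices. This is a minimal NumPy sketch, not the authors' accelerated proximal gradient scheme; the rank r, the sparsity level k, and all function names are illustrative assumptions. Each half-step solves its subproblem exactly, so the residual ||D - L - S||_F can never increase.

```python
import numpy as np


def best_rank_r(M, r):
    """Closest matrix of rank <= r in Frobenius norm (truncated SVD)."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U[:, :r] * s[:r]) @ Vt[:r, :]


def best_k_sparse(M, k):
    """Closest matrix with at most k nonzeros: keep the k largest-magnitude entries."""
    out = np.zeros(M.size)
    if k > 0:
        flat = M.ravel()
        idx = np.argpartition(np.abs(flat), -k)[-k:]
        out[idx] = flat[idx]
    return out.reshape(M.shape)


def rpca_alternating(D, r, k, iters=50):
    """Alternate exact minimisation of ||D - L - S||_F over L (rank <= r) and S (k-sparse)."""
    S = np.zeros_like(D)
    history = []
    for _ in range(iters):
        L = best_rank_r(D - S, r)    # project the residual onto the low-rank set
        S = best_k_sparse(D - L, k)  # project the residual onto the sparse set
        history.append(np.linalg.norm(D - L - S))
    return L, S, history
```

Usage: on a synthetic low-rank-plus-sparse matrix D, `L, S, history = rpca_alternating(D, r=1, k=20)` returns the two components and a non-increasing residual history. An accelerated scheme such as the one in the paper adds extrapolation steps on top of this basic structure.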

Compression and Interpretability of Deep Neural Networks via Tucker Tensor Layer: From First Principles to Tensor Valued Back-Propagation

Title Compression and Interpretability of Deep Neural Networks via Tucker Tensor Layer: From First Principles to Tensor Valued Back-Propagation
Authors Giuseppe G. Calvi, Ahmad Moniri, Mahmoud Mahfouz, Qibin Zhao, Danilo P. Mandic
Abstract This work aims to help resolve the two main stumbling blocks in the application of Deep Neural Networks (DNNs), that is, the exceedingly large number of trainable parameters and their physical interpretability. This is achieved through a tensor valued approach, based on the proposed Tucker Tensor Layer (TTL), as an alternative to the dense weight-matrices of DNNs. This allows us to treat the weight-matrices of general DNNs as a matrix unfolding of a higher order weight-tensor. By virtue of the compression properties of tensor decompositions, this enables us to introduce a novel and efficient framework for exploiting the multi-way nature of the weight-tensor in order to dramatically reduce the number of DNN parameters. We also derive the tensor valued back-propagation algorithm within the TTL framework, by extending the notion of matrix derivatives to tensors. In this way, the physical interpretability of the Tucker decomposition is exploited to gain physical insights into the NN training, through the process of computing gradients with respect to each factor matrix. The proposed framework is validated on both synthetic data, and the benchmark datasets MNIST, Fashion-MNIST, and CIFAR-10. Overall, through the ability to provide the relative importance of each data feature in training, the TTL back-propagation is shown to help mitigate the “black-box” nature inherent to NNs. Experiments also illustrate that the TTL achieves a 66.63-fold compression on MNIST and Fashion-MNIST, while, by simplifying the VGG-16 network, it achieves a 10% speed up in training time, at a comparable performance.
Tasks
Published 2019-03-14
URL https://arxiv.org/abs/1903.06133v2
PDF https://arxiv.org/pdf/1903.06133v2.pdf
PWC https://paperswithcode.com/paper/tucker-tensor-layer-in-fully-connected-neural
Repo
Framework
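
To make the compression argument concrete, here is a minimal NumPy sketch assuming a fully connected layer that maps a 28 x 28 input to a 10 x 10 output, with its 784 x 100 weight matrix treated as the unfolding of a fourth-order tensor in Tucker format. The shapes, the ranks (4, 4, 4, 4), and the function names are hypothetical, and the sketch covers only the forward pass and the parameter count, not the paper's tensor-valued back-propagation. The layer is applied mode by mode, so the full weight tensor never has to be formed.

```python
import numpy as np


def tucker_to_tensor(core, factors):
    """Rebuild the full weight tensor W = core x1 A1 x2 A2 x3 B1 x4 B2."""
    A1, A2, B1, B2 = factors
    return np.einsum("abcd,ia,jb,kc,ld->ijkl", core, A1, A2, B1, B2)


def tucker_layer(x, core, factors):
    """Apply the layer to a matrix-shaped input x (I1 x I2) without forming W."""
    A1, A2, B1, B2 = factors
    z = np.einsum("ij,ia,jb->ab", x, A1, A2)    # project input onto input-mode factors
    z = np.einsum("ab,abcd->cd", z, core)        # mix through the core tensor
    return np.einsum("cd,kc,ld->kl", z, B1, B2)  # expand with output-mode factors


def parameter_counts(core, factors):
    """(Tucker parameters, parameters of the equivalent dense weight matrix)."""
    dense = int(np.prod([f.shape[0] for f in factors]))
    return core.size + sum(f.size for f in factors), dense
```

With the hypothetical shapes above, the Tucker layer stores 560 numbers instead of 78,400 for the dense matrix; the compression ratio reported in the abstract comes from the authors' own configurations, not from this toy setting.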

Efficient Differentiable Neural Architecture Search with Meta Kernels

Title Efficient Differentiable Neural Architecture Search with Meta Kernels
Authors Shoufa Chen, Yunpeng Chen, Shuicheng Yan, Jiashi Feng
Abstract The search procedure of neural architecture search (NAS) is notoriously time-consuming and cost-prohibitive. To make the search space continuous, most existing gradient-based NAS methods relax the categorical choice of a particular operation to a softmax over all possible operations and calculate the weighted sum of multiple features, resulting in a large memory requirement and a huge computational burden. In this work, we propose an efficient and novel search strategy with meta kernels. We directly encode the supernet from the perspective of convolution kernels and “shrink” multiple convolution kernel candidates into a single one before these candidates operate on the input feature. In this way, only a single feature is generated between two intermediate nodes. The memory for storing intermediate features and the resource budget for conducting convolution operations are both reduced remarkably. Despite its high efficiency, our search strategy searches in a more fine-grained way than existing works and increases the capacity for representing possible networks. We demonstrate the effectiveness of our search strategy through extensive experiments. Specifically, our method achieves 77.0% top-1 accuracy on the ImageNet benchmark dataset with merely 357M FLOPs, outperforming both EfficientNet and MobileNetV3 under the same FLOPs constraints. Compared to models discovered by the state-of-the-art NAS method, our method achieves the same (sometimes even better) performance while being three orders of magnitude faster.
Tasks Neural Architecture Search
Published 2019-12-10
URL https://arxiv.org/abs/1912.04749v1
PDF https://arxiv.org/pdf/1912.04749v1.pdf
PWC https://paperswithcode.com/paper/efficient-differentiable-neural-architecture
Repo
Framework
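
The "shrinking" step relies on convolution being linear in the kernel: a softmax-weighted sum of convolutions with several candidate kernels equals a single convolution with the weighted sum of the (zero-padded) kernels, so only one feature map has to be computed. The minimal single-channel NumPy sketch below illustrates this identity; the 3 x 3 and 5 x 5 candidates, the centred zero-padding, and the helper names are illustrative assumptions rather than the paper's exact encoding.

```python
import numpy as np


def softmax(z):
    e = np.exp(z - np.max(z))
    return e / e.sum()


def pad_to(kernel, size):
    """Zero-pad an odd square kernel symmetrically to size x size."""
    p = (size - kernel.shape[0]) // 2
    return np.pad(kernel, p)


def conv2d_same(x, kernel):
    """Single-channel cross-correlation with zero padding ('same' output size)."""
    k = kernel.shape[0]
    xp = np.pad(x, k // 2)
    out = np.empty(x.shape)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = np.sum(xp[i:i + k, j:j + k] * kernel)
    return out


def merged_kernel(kernels, logits):
    """Collapse candidate kernels into one kernel using softmax architecture weights."""
    size = max(k.shape[0] for k in kernels)
    alpha = softmax(logits)
    return sum(a * pad_to(k, size) for a, k in zip(alpha, kernels))
```

Usage: `conv2d_same(x, merged_kernel(kernels, logits))` produces the same map as summing `alpha_i * conv2d_same(x, kernels[i])` over all candidates, but with one convolution and one stored feature instead of one per candidate.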

Evaluating Hierarchies through A Partially Observable Markov Decision Processes Methodology

Title Evaluating Hierarchies through A Partially Observable Markov Decision Processes Methodology
Authors Weipeng Huang, Guangyuan Piao, Raul Moreno, Neil J. Hurley
Abstract Hierarchical clustering has been shown to be valuable in many scenarios, e.g. catalogues, biology research, and image processing. Despite its usefulness in many situations, there is no agreed methodology on how to properly evaluate the hierarchies produced by different techniques, particularly when ground-truth labels are unavailable. This motivates us to propose a framework for assessing the quality of hierarchical clustering allocations that also covers the case of no ground-truth information. Such a quality measurement is useful, for example, to assess the hierarchical structures used by online retailer websites to display their product catalogues. Unlike previous measures and metrics, our framework tackles the evaluation from a decision-theoretic perspective. We model the process as a bot searching stochastically for items in the hierarchy and establish a measure representing the degree to which the hierarchy supports this search. We employ Partially Observable Markov Decision Processes (POMDPs) to model the uncertainty, the decision making, and the cognitive return for searchers in such a scenario. In this paper, we discuss the modeling details in full and demonstrate the framework's application on some datasets.
Tasks Decision Making
Published 2019-08-19
URL https://arxiv.org/abs/1908.07031v2
PDF https://arxiv.org/pdf/1908.07031v2.pdf
PWC https://paperswithcode.com/paper/evaluating-hierarchies-through-a-partially
Repo
Framework
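
The core operation a searcher performs under uncertainty in a POMDP is a Bayesian belief update, b'(s') proportional to O(o | s', a) * sum_s T(s' | s, a) b(s). The sketch below is this generic discrete update, not the authors' hierarchy model; the two-state example (target item in the left or the right branch of a node) and its probabilities are hypothetical.

```python
import numpy as np


def belief_update(b, T, O, a, o):
    """Discrete POMDP belief update.

    b: (S,) current belief over hidden states
    T: T[a][s, s2] = P(s2 | s, a)
    O: O[a][s2, o] = P(o | s2, a)
    """
    predicted = b @ T[a]                   # predict: sum_s b(s) T(s2 | s, a)
    unnormalised = predicted * O[a][:, o]  # weight by observation likelihood
    total = unnormalised.sum()
    if total == 0:
        raise ValueError("observation has zero probability under the current belief")
    return unnormalised / total


# Hypothetical example: states 0/1 = target in left/right branch, a single "look" action.
T = np.array([np.eye(2)])                 # the target does not move
O = np.array([[[0.8, 0.2], [0.3, 0.7]]])  # o = 0 is a cue pointing to the left branch
b = belief_update(np.array([0.5, 0.5]), T, O, a=0, o=0)
```

Starting from a uniform belief, one cue pointing left moves the belief to 8/11 for the left branch and 3/11 for the right; a search policy would then decide where to descend based on this belief.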

Guiding CTC Posterior Spike Timings for Improved Posterior Fusion and Knowledge Distillation

Title Guiding CTC Posterior Spike Timings for Improved Posterior Fusion and Knowledge Distillation
Authors Gakuto Kurata, Kartik Audhkhasi
Abstract Conventional automatic speech recognition (ASR) systems trained from frame-level alignments can easily leverage posterior fusion to improve ASR accuracy and build a better single model with knowledge distillation. End-to-end ASR systems trained using the Connectionist Temporal Classification (CTC) loss do not require frame-level alignment and hence simplify model training. However, sparse and arbitrary posterior spike timings from CTC models pose a new set of challenges in posterior fusion from multiple models and knowledge distillation between CTC models. We propose a method to train a CTC model so that its spike timings are guided to align with those of a pre-trained guiding CTC model. As a result, all models that share the same guiding model have aligned spike timings. We show the advantage of our method in various scenarios including posterior fusion of CTC models and knowledge distillation between CTC models with different architectures. With the 300-hour Switchboard training data, the single word CTC model distilled from multiple models improved the word error rates to 13.7%/23.1% from 14.9%/24.1% on the Hub5 2000 Switchboard/CallHome test sets without using any data augmentation, language model, or complex decoder.
Tasks Data Augmentation, End-To-End Speech Recognition, Language Modelling, Speech Recognition
Published 2019-04-17
URL https://arxiv.org/abs/1904.08311v2
PDF https://arxiv.org/pdf/1904.08311v2.pdf
PWC https://paperswithcode.com/paper/guiding-ctc-posterior-spike-timings-for
Repo
Framework
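
Why unaligned spikes hurt posterior fusion can be shown with toy frame-level posteriors: averaging two CTC models whose spikes fall on different frames lets the blank symbol dominate, and greedy (best-path) decoding then drops the labels. The posteriors, the three-symbol vocabulary, and the helper names below are hypothetical; the sketch illustrates the problem the paper addresses, not its guided-training method.

```python
import numpy as np

BLANK = 0


def greedy_ctc_decode(posteriors):
    """Best-path decoding: argmax per frame, collapse repeats, drop blanks."""
    best = posteriors.argmax(axis=1)
    out, prev = [], None
    for label in best:
        if label != prev and label != BLANK:
            out.append(int(label))
        prev = label
    return out


def fuse(*posteriors):
    """Frame-level posterior fusion by simple averaging."""
    return np.mean(posteriors, axis=0)


def spiky(num_frames, spikes, vocab=3, peak=0.8, blank=0.9):
    """Toy CTC-like posteriors: blank everywhere except one spike per label."""
    P = np.full((num_frames, vocab), (1 - blank) / (vocab - 1))
    P[:, BLANK] = blank
    for t, label in spikes.items():
        P[t] = (1 - peak) / (vocab - 1)
        P[t, label] = peak
    return P
```

Usage: with `A = spiky(6, {1: 1, 4: 2})` and a second model `C = spiky(6, {2: 1, 5: 2})` whose spikes are shifted by one frame, each model alone decodes to `[1, 2]`, but `greedy_ctc_decode(fuse(A, C))` returns an empty sequence. Models trained to share the same spike timings avoid this failure.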

Improved Reinforcement Learning with Curriculum

Title Improved Reinforcement Learning with Curriculum
Authors Joseph West, Frederic Maire, Cameron Browne, Simon Denman
Abstract Humans tend to learn complex abstract concepts faster if examples are presented in a structured manner. For instance, when learning how to play a board game, one of the first concepts usually learned is how the game ends, i.e. the actions that lead to a terminal state (win, lose or draw). The advantage of learning end-games first is that once the actions which lead to a terminal state are understood, it becomes possible to incrementally learn the consequences of actions that are further away from a terminal state; we call this an end-game-first curriculum. Currently the state-of-the-art machine learning player for general board games, AlphaZero by Google DeepMind, does not employ a structured training curriculum; instead, it learns from the entire game at all times. By employing an end-game-first training curriculum to train an AlphaZero-inspired player, we empirically show that the rate of learning of an artificial player can be improved during the early stages of training when compared to a player not using a training curriculum.
Tasks Board Games
Published 2019-03-29
URL https://arxiv.org/abs/1903.12328v2
PDF https://arxiv.org/pdf/1903.12328v2.pdf
PWC https://paperswithcode.com/paper/improved-reinforcement-learning-with
Repo
Framework
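
One way to picture an end-game-first curriculum is a window over each self-play game that starts at the terminal position and widens toward the opening as training progresses. The linear schedule and function names below are hypothetical assumptions in a minimal sketch, not the schedule used in the paper.

```python
def curriculum_window(step, total_steps, game_length, min_window=1):
    """Number of final positions of a game used for training at a given step.

    Hypothetical linear schedule: only the last position(s) at first,
    growing to the whole game by total_steps.
    """
    frac = min(1.0, step / total_steps)
    return max(min_window, int(round(frac * game_length)))


def end_game_first_positions(game, step, total_steps):
    """Select training positions from one self-play game (a list of states, last = terminal)."""
    window = curriculum_window(step, total_steps, len(game))
    return game[-window:]
```

Usage: with `game = list(range(10))` and `total_steps = 100`, step 0 yields `[9]` (the terminal position only), step 50 yields the last five positions, and from step 100 onward the whole game is used, which matches the uncurricular AlphaZero setting.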

LSTMs can capture information beyond order

Title LSTMs can capture information beyond order
Authors Kaili Wang, Jose Oramas, Tinne Tuytelaars
Abstract LSTMs have a proven track record in analyzing sequential data. But are they also useful for processing unordered instance sets? Here, we investigate the potential of LSTMs at capturing information beyond order. We formulate the learning of the underlying structure within a set of instances using LSTMs as a Multiple Instance Learning (MIL) problem. In addition, we show that LSTMs are capable of indirectly capturing instance-level information using only set-level annotations. Thus, they can be used to learn instance-level models in a weakly supervised manner. Our empirical evaluation on both simplified (MNIST) and realistic (Lookbook and Histopathology) datasets shows that the proposed method is competitive with or even surpasses state-of-the-art methods specifically designed for handling MIL problems. Moreover, we show that its performance on instance-level prediction is close to that of fully supervised methods.
Tasks Multiple Instance Learning
Published 2019-09-11
URL https://arxiv.org/abs/1909.05690v3
PDF https://arxiv.org/pdf/1909.05690v3.pdf
PWC https://paperswithcode.com/paper/an-iterative-approach-for-multiple-instance
Repo
Framework
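
In the MIL framing, a bag (set) of instance feature vectors is read by the LSTM as a sequence in arbitrary order, and only the bag-level label supervises the final state. The NumPy sketch below shows this forward pass with untrained random weights; the layer sizes, initialisation, and function names are hypothetical, and it does not reproduce the paper's training or instance-level read-out.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def init_params(input_dim, hidden_dim, seed=0):
    """Random weights for one LSTM layer plus a logistic bag classifier (illustrative only)."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((4 * hidden_dim, input_dim)) * 0.1
    U = rng.standard_normal((4 * hidden_dim, hidden_dim)) * 0.1
    b = np.zeros(4 * hidden_dim)
    w_out = rng.standard_normal(hidden_dim) * 0.1
    return W, U, b, w_out, 0.0


def lstm_bag_probability(instances, params):
    """Read a bag of instances as a sequence; classify the bag from the final hidden state."""
    W, U, b, w_out, b_out = params
    hidden_dim = U.shape[1]
    h = np.zeros(hidden_dim)
    c = np.zeros(hidden_dim)
    for x in instances:
        z = W @ x + U @ h + b
        i, f, o, g = np.split(z, 4)  # input, forget, output gates and candidate cell
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
    return float(sigmoid(w_out @ h + b_out))
```

Usage: `lstm_bag_probability(bag, init_params(5, 8))` for a bag of shape (num_instances, 5) returns a bag-level probability; training would fit the weights with a binary loss on bag labels only.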

Latent Complete Row Space Recovery for Multi-view Subspace Clustering

Title Latent Complete Row Space Recovery for Multi-view Subspace Clustering
Authors Hong Tao, Chenping Hou, Yuhua Qian, Jubo Zhu, Dongyun Yi
Abstract Multi-view subspace clustering has been used in applications such as image processing and video surveillance, and has attracted increasing attention. Most existing methods learn view-specific self-representation matrices and construct a combined affinity matrix from multiple views. The affinity construction process is time-consuming, and the combined affinity matrix is not guaranteed to reflect the whole true subspace structure. To overcome these issues, the Latent Complete Row Space Recovery (LCRSR) method is proposed. Concretely, LCRSR is based on the assumption that the multi-view observations are generated from an underlying latent representation, which is further assumed to collect the authentic samples drawn exactly from multiple subspaces. LCRSR is able to recover the row space of the latent representation, which not only carries complete information from multiple views but also determines the subspace membership under certain conditions. LCRSR does not involve a graph construction procedure and is solved with an efficient and convergent algorithm, which makes it more scalable to large-scale datasets. The effectiveness and efficiency of LCRSR are validated by clustering various kinds of multi-view data and illustrated in a background subtraction task.
Tasks graph construction, Multi-view Subspace Clustering
Published 2019-12-16
URL https://arxiv.org/abs/1912.07248v1
PDF https://arxiv.org/pdf/1912.07248v1.pdf
PWC https://paperswithcode.com/paper/latent-complete-row-space-recovery-for-multi
Repo
Framework
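
The claim that the row space determines subspace membership echoes a classical fact: if samples come from independent linear subspaces, the projector V V^T onto the row space of the data matrix (often called the shape interaction matrix) is block diagonal with respect to the subspaces. The NumPy sketch below checks this on synthetic single-view data; it is not LCRSR itself, which recovers the row space of a latent representation from multiple views, and the data sizes are arbitrary assumptions.

```python
import numpy as np


def row_space_projector(X, r):
    """Projector onto the top-r row space of X (columns of X are samples)."""
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    V = Vt[:r].T      # n x r orthonormal basis of the row space
    return V @ V.T    # n x n; entry (i, j) couples samples i and j


# Synthetic data: two independent 2-D subspaces of R^10, 15 samples each
rng = np.random.default_rng(0)
B1, B2 = rng.standard_normal((10, 2)), rng.standard_normal((10, 2))
X = np.hstack([B1 @ rng.standard_normal((2, 15)), B2 @ rng.standard_normal((2, 15))])
Q = row_space_projector(X, r=4)
```

The cross-subspace block `Q[:15, 15:]` is zero up to rounding error, so samples can be grouped without building an affinity graph from self-representation matrices.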

Expected path length on random manifolds

Title Expected path length on random manifolds
Authors David Eklund, Søren Hauberg
Abstract Manifold learning seeks a low-dimensional representation that faithfully captures the essence of data. Current methods can successfully learn such representations, but do not provide a meaningful set of operations that are associated with the representation. Working towards operational representation learning, we endow the latent space of a large class of generative models with a random Riemannian metric, which provides us with elementary operators. As computational tools are unavailable for random Riemannian manifolds, we study deterministic approximations and derive tight error bounds on expected distances.
Tasks Representation Learning
Published 2019-08-20
URL https://arxiv.org/abs/1908.07377v1
PDF https://arxiv.org/pdf/1908.07377v1.pdf
PWC https://paperswithcode.com/paper/expected-path-length-on-random-manifolds
Repo
Framework
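
One simple deterministic approximation of an expected curve length replaces the random metric J^T J by its expectation. By Jensen's inequality, E||J dz|| <= sqrt(dz^T E[J^T J] dz) for every curve segment dz, so the expected-metric length upper-bounds the expected length. The sketch assumes a hypothetical random linear decoder with Jacobian J = M + S * E (E with i.i.d. standard normal entries) and compares this bound with a Monte Carlo estimate; it is not the error bound derived in the paper.

```python
import numpy as np


def curve_length(J, curve):
    """Length of the image of a latent polyline under a linear decoder with Jacobian J (D x d)."""
    dz = np.diff(curve, axis=0)  # segment vectors in latent space
    return float(np.sum(np.linalg.norm(dz @ J.T, axis=1)))


def expected_metric(M, S):
    """E[J^T J] for J = M + S * E with E_ij iid N(0, 1): M^T M + diag(sum_i S_ik^2)."""
    return M.T @ M + np.diag(np.sum(S ** 2, axis=0))


def expected_metric_length(G, curve):
    """Deterministic approximation: curve length under the expected metric G."""
    dz = np.diff(curve, axis=0)
    return float(np.sum(np.sqrt(np.einsum("ti,ij,tj->t", dz, G, dz))))


def monte_carlo_length(M, S, curve, n=5000, seed=0):
    """Monte Carlo estimate of the expected curve length under the random decoder."""
    rng = np.random.default_rng(seed)
    samples = [curve_length(M + S * rng.standard_normal(M.shape), curve) for _ in range(n)]
    return float(np.mean(samples))
```

Usage: for a quarter circle in a 2-D latent space decoded into 3-D, `expected_metric_length(expected_metric(M, S), curve)` is a cheap deterministic value that sits above `monte_carlo_length(M, S, curve)`; the gap between the two is the kind of error that bounds on expected distances quantify.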

Adaptation of Hierarchical Structured Models for Speech Act Recognition in Asynchronous Conversation

Title Adaptation of Hierarchical Structured Models for Speech Act Recognition in Asynchronous Conversation
Authors Tasnim Mohiuddin, Thanh-Tung Nguyen, Shafiq Joty
Abstract We address the problem of speech act recognition (SAR) in asynchronous conversations (forums, emails). Unlike synchronous conversations (e.g., meetings, phone calls), asynchronous domains lack large labeled datasets to train an effective SAR model. In this paper, we propose methods to effectively leverage abundant unlabeled conversational data and the available labeled data from synchronous domains. We carry out our research in three main steps. First, we introduce a neural architecture based on hierarchical LSTMs and conditional random fields (CRF) for SAR, and show that our method outperforms existing methods when trained on in-domain data only. Second, we improve our initial SAR models by semi-supervised learning in the form of pretrained word embeddings learned from a large unlabeled conversational corpus. Finally, we employ adversarial training to improve the results further by leveraging the labeled data from synchronous domains and by explicitly modeling the distributional shift between the two domains.
Tasks Word Embeddings
Published 2019-04-01
URL http://arxiv.org/abs/1904.04021v1
PDF http://arxiv.org/pdf/1904.04021v1.pdf
PWC https://paperswithcode.com/paper/adaptation-of-hierarchical-structured-models
Repo
Framework
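
Assuming the CRF layer is a linear chain over the posts of a conversation, the best sequence of speech-act tags is found with Viterbi decoding. A minimal NumPy sketch, assuming the per-post emission scores (for example, produced by the hierarchical LSTM) and the tag-transition scores are already given; the names and shapes are illustrative.

```python
import numpy as np


def viterbi(emissions, transitions):
    """Highest-scoring tag sequence of a linear-chain CRF.

    emissions:   (T, K) score of tag k at position t
    transitions: (K, K) score of moving from tag i to tag j
    """
    T, K = emissions.shape
    score = emissions[0].copy()           # best score of paths ending in each tag
    back = np.zeros((T, K), dtype=int)    # backpointers to the previous tag
    for t in range(1, T):
        cand = score[:, None] + transitions + emissions[t][None, :]
        back[t] = cand.argmax(axis=0)
        score = cand.max(axis=0)
    best = [int(score.argmax())]
    for t in range(T - 1, 0, -1):         # follow backpointers from the last position
        best.append(int(back[t, best[-1]]))
    return best[::-1], float(score.max())
```

Dynamic programming makes decoding O(T K^2) instead of enumerating all K^T tag sequences, which matters for long forum threads with many speech-act classes.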

Playing Games in the Dark: An approach for cross-modality transfer in reinforcement learning

Title Playing Games in the Dark: An approach for cross-modality transfer in reinforcement learning
Authors Rui Silva, Miguel Vasco, Francisco S. Melo, Ana Paiva, Manuela Veloso
Abstract In this work we explore the use of latent representations obtained from multiple input sensory modalities (such as images or sounds) to allow an agent to learn and exploit policies over different subsets of input modalities. We propose a three-stage architecture that allows a reinforcement learning agent trained over a given sensory modality to execute its task on a different sensory modality: for example, learning a visual policy over image inputs and then executing that policy when only sound inputs are available. We show that the generalized policies achieve better out-of-the-box performance when compared to different baselines. Moreover, we show this holds in different OpenAI Gym and video game environments, even when using different multimodal generative models and reinforcement learning algorithms.
Tasks
Published 2019-11-28
URL https://arxiv.org/abs/1911.12851v1
PDF https://arxiv.org/pdf/1911.12851v1.pdf
PWC https://paperswithcode.com/paper/playing-games-in-the-dark-an-approach-for
Repo
Framework

Online Boosting for Multilabel Ranking with Top-k Feedback

Title Online Boosting for Multilabel Ranking with Top-k Feedback
Authors Daniel T. Zhang, Young Hun Jung, Ambuj Tewari
Abstract We present online boosting algorithms for multilabel ranking with top-k feedback, where the learner only receives information about the top-k items from the ranking it provides. We propose a novel surrogate loss function and unbiased estimator, allowing weak learners to update themselves with limited information. Using these techniques we adapt full information multilabel ranking algorithms (Jung and Tewari, 2018) to the top-k feedback setting and provide theoretical performance bounds which closely match the bounds of their full information counterparts, at the cost of increased sample complexity. Experimental results also verify these claims.
Tasks
Published 2019-10-24
URL https://arxiv.org/abs/1910.10937v2
PDF https://arxiv.org/pdf/1910.10937v2.pdf
PWC https://paperswithcode.com/paper/online-boosting-for-multilabel-ranking-with
Repo
Framework

An efficient and perceptually motivated auditory neural encoding and decoding algorithm for spiking neural networks

Title An efficient and perceptually motivated auditory neural encoding and decoding algorithm for spiking neural networks
Authors Zihan Pan, Yansong Chua, Jibin Wu, Malu Zhang, Haizhou Li, Eliathamby Ambikairajah
Abstract The auditory front-end is an integral part of a spiking neural network (SNN) when performing auditory cognitive tasks. It encodes the temporally dynamic stimulus, such as speech and audio, into an efficient, effective and reconstructable spike pattern to facilitate subsequent processing. However, most auditory front-ends in current studies have not made use of recent findings in psychoacoustics and physiology concerning human listening. In this paper, we propose a neural encoding and decoding scheme that is optimized for speech processing. The neural encoding scheme, which we call Biologically plausible Auditory Encoding (BAE), emulates the functions of the perceptual components of the human auditory system, which include the cochlear filter bank, the inner hair cells, auditory masking effects from psychoacoustic models, and the spike neural encoding by the auditory nerve. We evaluate the perceptual quality of the BAE scheme using PESQ, and the performance of BAE through speech recognition experiments. Finally, we also built and published two spike versions of speech datasets, the Spike-TIDIGITS and the Spike-TIMIT, for researchers to use in benchmarking future SNN research.
Tasks Speech Recognition
Published 2019-09-03
URL https://arxiv.org/abs/1909.01302v2
PDF https://arxiv.org/pdf/1909.01302v2.pdf
PWC https://paperswithcode.com/paper/an-efficient-and-perceptually-motivated
Repo
Framework

User Friendly Automatic Construction of Background Knowledge: Mode Construction from ER Diagrams

Title User Friendly Automatic Construction of Background Knowledge: Mode Construction from ER Diagrams
Authors Alexander L. Hayes, Mayukh Das, Phillip Odom, Sriraam Natarajan
Abstract One of the key advantages of Inductive Logic Programming systems is the ability of the domain experts to provide background knowledge as modes that allow for efficient search through the space of hypotheses. However, there is an inherent assumption that this expert should also be an ILP expert to provide effective modes. We relax this assumption by designing a graphical user interface that allows the domain expert to interact with the system using Entity Relationship diagrams. These interactions are used to construct modes for the learning system. We evaluate our algorithm on a probabilistic logic learning system where we demonstrate that the user is able to construct effective background knowledge on par with the expert-encoded knowledge on five data sets.
Tasks
Published 2019-12-16
URL https://arxiv.org/abs/1912.07650v1
PDF https://arxiv.org/pdf/1912.07650v1.pdf
PWC https://paperswithcode.com/paper/user-friendly-automatic-construction-of
Repo
Framework

Global visual localization in LiDAR-maps through shared 2D-3D embedding space

Title Global visual localization in LiDAR-maps through shared 2D-3D embedding space
Authors Daniele Cattaneo, Matteo Vaghi, Simone Fontana, Augusto Luis Ballardini, Domenico Giorgio Sorrenti
Abstract Global localization is an important and widely studied problem for many robotic applications. Place recognition approaches can be exploited to solve this task, e.g., in the autonomous driving field. While most vision-based approaches match an image w.r.t. an image database, global visual localization within LiDAR-maps remains fairly unexplored, even though the path toward high definition 3D maps, produced mainly from LiDARs, is clear. In this work we leverage Deep Neural Network (DNN) approaches to create a shared embedding space between images and LiDAR-maps, allowing for image to 3D-LiDAR place recognition. We trained a 2D and a 3D DNN that create embeddings, respectively from images and from point clouds, that are close to each other if they refer to the same place. Extensive experiments are presented to assess the effectiveness of the approach w.r.t. different learning paradigms, network architectures, and loss functions. All evaluations have been performed using the Oxford RobotCar Dataset, which encompasses a wide range of weather and light conditions.
Tasks Autonomous Driving, Visual Localization
Published 2019-10-02
URL https://arxiv.org/abs/1910.04871v2
PDF https://arxiv.org/pdf/1910.04871v2.pdf
PWC https://paperswithcode.com/paper/global-visual-localization-in-lidar-maps
Repo
Framework
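
Once both networks map into the same space, localization reduces to nearest-neighbour search over the embeddings of LiDAR submaps, and a metric-learning loss pulls matching image and point-cloud embeddings together. The triplet margin loss, the margin value 0.5, and the function names below are illustrative assumptions in a minimal NumPy sketch; the abstract does not say which loss functions were compared.

```python
import numpy as np


def l2_normalize(x, axis=-1):
    """Project embeddings onto the unit sphere."""
    return x / np.linalg.norm(x, axis=axis, keepdims=True)


def triplet_loss(anchor, positive, negative, margin=0.5):
    """Hinge on squared distances: pull anchor-positive together, push anchor-negative apart."""
    d_pos = np.sum((anchor - positive) ** 2, axis=-1)
    d_neg = np.sum((anchor - negative) ** 2, axis=-1)
    return float(np.mean(np.maximum(0.0, d_pos - d_neg + margin)))


def localize(image_embedding, map_embeddings, k=1):
    """Indices of the k LiDAR-map submaps whose embeddings are closest to the image embedding."""
    dists = np.linalg.norm(map_embeddings - image_embedding[None, :], axis=1)
    return np.argsort(dists)[:k]
```

Usage: with image embeddings as anchors, embeddings of the matching LiDAR submap as positives, and those of other places as negatives, `triplet_loss` is minimized during training; at test time `localize(image_embedding, map_embeddings, k=5)` returns candidate places for the query image.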