July 27, 2019

3068 words 15 mins read

Paper Group ANR 555


Block-Sparse Recurrent Neural Networks

Title Block-Sparse Recurrent Neural Networks
Authors Sharan Narang, Eric Undersander, Gregory Diamos
Abstract Recurrent Neural Networks (RNNs) are used in state-of-the-art models in domains such as speech recognition, machine translation, and language modelling. Sparsity is a technique to reduce compute and memory requirements of deep learning models. Sparse RNNs are easier to deploy on devices and high-end server processors. Even though sparse operations need less compute and memory relative to their dense counterparts, the speed-up observed by using sparse operations is less than expected on different hardware platforms. In order to address this issue, we investigate two different approaches to induce block sparsity in RNNs: pruning blocks of weights in a layer and using group lasso regularization to create blocks of weights with zeros. Using these techniques, we demonstrate that we can create block-sparse RNNs with sparsity ranging from 80% to 90% with small loss in accuracy. This allows us to reduce the model size by roughly 10x. Additionally, we can prune a larger dense network to recover this loss in accuracy while maintaining high block sparsity and reducing the overall parameter count. Our technique works with a variety of block sizes up to 32x32. Block-sparse RNNs eliminate overheads related to data storage and irregular memory accesses while increasing hardware efficiency compared to unstructured sparsity.
Tasks Language Modelling, Machine Translation, Speech Recognition
Published 2017-11-08
URL http://arxiv.org/abs/1711.02782v1
PDF http://arxiv.org/pdf/1711.02782v1.pdf
PWC https://paperswithcode.com/paper/block-sparse-recurrent-neural-networks
Repo
Framework
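To make the pruning idea concrete, here is a minimal NumPy sketch of weight-block pruning in the spirit of the abstract above: a whole block is zeroed when its representative magnitude falls below a threshold. The max-of-block criterion, the fixed threshold, and all names are illustrative assumptions, not the authors' implementation (in practice the threshold would grow over training to induce sparsity gradually).

```python
import numpy as np

def block_prune(W, block=(4, 4), threshold=0.1):
    """Zero out every block of W whose largest absolute entry falls
    below `threshold`; returns the pruned matrix and the keep-mask."""
    bh, bw = block
    Wp = W.copy()
    keep = np.ones_like(W, dtype=bool)
    for i in range(0, W.shape[0], bh):
        for j in range(0, W.shape[1], bw):
            if np.abs(W[i:i + bh, j:j + bw]).max() < threshold:
                Wp[i:i + bh, j:j + bw] = 0.0
                keep[i:i + bh, j:j + bw] = False
    return Wp, keep

# toy usage: prune an RNN weight matrix in 8x8 blocks
rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(32, 32))
Wp, keep = block_prune(W, block=(8, 8), threshold=0.25)
print("block sparsity:", 1.0 - keep.mean())
```

Because zeros come in aligned blocks, the surviving weights can be stored and multiplied as small dense tiles, which is where the hardware-efficiency gain over unstructured sparsity comes from.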

Sparse Neural Networks Topologies

Title Sparse Neural Networks Topologies
Authors Alfred Bourely, John Patrick Boueri, Krzysztof Choromonski
Abstract We propose Sparse Neural Network architectures that are based on random or structured bipartite graph topologies. Sparse architectures provide compression of the learned models and speed-ups of computations; they can also surpass their unstructured or fully connected counterparts. As we show, even more compact topologies of the so-called SNN (Sparse Neural Network) can be achieved with the use of structured graphs of connections between consecutive layers of neurons. In this paper, we investigate how the accuracy and training speed of the models depend on the topology and sparsity of the neural network. Previous approaches using sparsity are all based on fully connected neural network models and create sparsity during the training phase; instead, we explicitly define sparse architectures of connections before training. Building compact neural network models is consistent with empirical observations showing that there is much redundancy in learned neural network models. We show experimentally that the accuracy of the models learned with neural networks depends on expander-like properties of the underlying topologies, such as the spectral gap and algebraic connectivity, rather than on the density of the graphs of connections.
Tasks
Published 2017-06-18
URL http://arxiv.org/abs/1706.05683v1
PDF http://arxiv.org/pdf/1706.05683v1.pdf
PWC https://paperswithcode.com/paper/sparse-neural-networks-topologies
Repo
Framework
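As a companion to the abstract above, a small NumPy sketch of the two quantities it links: a random bipartite connection mask between consecutive layers, and the algebraic connectivity (second-smallest Laplacian eigenvalue) of the resulting graph. The fixed-in-degree generator is an illustrative assumption, not necessarily the authors' construction.

```python
import numpy as np

def random_bipartite_mask(n_in, n_out, degree, rng):
    """Random bipartite connection mask: each output neuron keeps
    `degree` incoming edges chosen uniformly at random."""
    mask = np.zeros((n_out, n_in), dtype=bool)
    for j in range(n_out):
        mask[j, rng.choice(n_in, size=degree, replace=False)] = True
    return mask

def algebraic_connectivity(mask):
    """Second-smallest eigenvalue of the Laplacian of the bipartite
    graph defined by the mask -- an expander-quality proxy."""
    n_out, n_in = mask.shape
    A = np.zeros((n_in + n_out, n_in + n_out))
    A[:n_in, n_in:] = mask.T   # input -> output edges
    A[n_in:, :n_in] = mask     # symmetric adjacency
    L = np.diag(A.sum(axis=1)) - A
    return np.sort(np.linalg.eigvalsh(L))[1]

rng = np.random.default_rng(0)
m = random_bipartite_mask(64, 64, degree=8, rng=rng)
print("density:", m.mean(), "lambda_2:", algebraic_connectivity(m))
```

The paper's claim is that, at fixed density, masks with a larger spectral gap / algebraic connectivity train to better accuracy, which a comparison across generators along these lines could probe.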

Using Optimal Ratio Mask as Training Target for Supervised Speech Separation

Title Using Optimal Ratio Mask as Training Target for Supervised Speech Separation
Authors Shasha Xia, Hao Li, Xueliang Zhang
Abstract Supervised speech separation uses supervised learning algorithms to learn a mapping from an input noisy signal to an output target. With the fast development of deep learning, supervised separation has become the most important direction in the speech separation area in recent years. For supervised algorithms, the training target has a significant impact on performance. The ideal ratio mask is a commonly used training target that can improve the intelligibility and quality of the separated speech. However, it does not take into account the correlation between noise and clean speech. In this paper, we use the optimal ratio mask as the training target of a deep neural network (DNN) for speech separation. The experiments are carried out under various noise environments and signal-to-noise ratio (SNR) conditions. The results show that the optimal ratio mask outperforms other training targets in general.
Tasks Speech Separation
Published 2017-09-04
URL http://arxiv.org/abs/1709.00917v1
PDF http://arxiv.org/pdf/1709.00917v1.pdf
PWC https://paperswithcode.com/paper/using-optimal-ratio-mask-as-training-target
Repo
Framework
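For reference, the two masks the abstract contrasts can be computed directly from the clean-speech and noise STFTs. The sketch below uses the standard definitions, with the ORM as the MSE-optimal real-valued mask applied to the mixture Y = S + N; the epsilon is a numerical-stability assumption, and the paper's handling of the ORM's unbounded range is omitted here.

```python
import numpy as np

def irm(S, N, eps=1e-12):
    """Ideal ratio mask: ignores the speech/noise correlation."""
    return np.abs(S) ** 2 / (np.abs(S) ** 2 + np.abs(N) ** 2 + eps)

def orm(S, N, eps=1e-12):
    """Optimal ratio mask: MSE-optimal real mask on the mixture
    Y = S + N; the cross term Re(S * conj(N)) carries the
    speech/noise correlation that the IRM discards."""
    cross = np.real(S * np.conj(N))
    num = np.abs(S) ** 2 + cross
    den = np.abs(S) ** 2 + np.abs(N) ** 2 + 2.0 * cross + eps
    return num / den

# toy complex STFTs of speech and noise
rng = np.random.default_rng(0)
S = rng.normal(size=(257, 100)) + 1j * rng.normal(size=(257, 100))
N = rng.normal(size=(257, 100)) + 1j * rng.normal(size=(257, 100))
print(np.abs(irm(S, N) - orm(S, N)).mean())  # correlation shifts the mask
```

When speech and noise are uncorrelated the cross term vanishes and the ORM reduces to the IRM; the paper's gains come from the cases where it does not.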

Learning Unified Embedding for Apparel Recognition

Title Learning Unified Embedding for Apparel Recognition
Authors Yang Song, Yuan Li, Bo Wu, Chao-Yeh Chen, Xiao Zhang, Hartwig Adam
Abstract In apparel recognition, specialized models (e.g. models trained for a particular vertical like dresses) can significantly outperform general models (i.e. models that cover a wide range of verticals). Therefore, deep neural network models are often trained separately for different verticals. However, using specialized models for different verticals is not scalable and is expensive to deploy. This paper addresses the problem of learning one unified embedding model for multiple object verticals (e.g. all apparel classes) without sacrificing accuracy. The problem is tackled from two aspects: training data and training difficulty. On the training data aspect, we find that for a single model trained with triplet loss, there is an accuracy sweet spot in terms of how many verticals are trained together. To ease the training difficulty, a novel learning scheme is proposed that uses the output of specialized models as learning targets, so that an L2 loss can be used instead of a triplet loss. This new loss makes training easier and allows more efficient use of the feature space. The end result is a unified model that achieves the same retrieval accuracy as a number of separate specialized models, while having the complexity of a single model. The effectiveness of our approach is shown in experiments.
Tasks
Published 2017-07-19
URL http://arxiv.org/abs/1707.05929v2
PDF http://arxiv.org/pdf/1707.05929v2.pdf
PWC https://paperswithcode.com/paper/learning-unified-embedding-for-apparel
Repo
Framework
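The key training trick in the abstract, regressing the unified model onto the specialized models' embeddings with an L2 loss instead of a triplet loss, fits in a few lines. A minimal PyTorch sketch with a toy network; the architecture, dimensions, and the assumption that each training example arrives with its vertical's teacher embedding are all illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class UnifiedEmbedder(nn.Module):
    """Toy stand-in for the unified embedding network."""
    def __init__(self, dim_in=512, dim_emb=64):
        super().__init__()
        self.fc = nn.Linear(dim_in, dim_emb)

    def forward(self, x):
        return F.normalize(self.fc(x), dim=1)  # unit-length embedding

model = UnifiedEmbedder()
opt = torch.optim.SGD(model.parameters(), lr=0.01)

# one training step on synthetic data: features x, teacher embeddings t
x = torch.randn(16, 512)
t = F.normalize(torch.randn(16, 64), dim=1)  # from a specialized model
loss = F.mse_loss(model(x), t)  # L2 target regression instead of triplet
opt.zero_grad()
loss.backward()
opt.step()
```

Compared with triplet loss, this turns the problem into plain regression: no hard-negative mining, and the teacher embeddings fix where each vertical lives in the feature space.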

A Survey on Content-Aware Video Analysis for Sports

Title A Survey on Content-Aware Video Analysis for Sports
Authors Huang-Chia Shih
Abstract Sports data analysis is becoming increasingly large-scale, diversified, and shared, but difficulty persists in rapidly accessing the most crucial information. Previous surveys have focused on the methodologies of sports video analysis from the spatiotemporal viewpoint instead of a content-based viewpoint, and few of these studies have considered semantics. This study develops a deeper interpretation of content-aware sports video analysis by examining the insight offered by research into the structure of content under different scenarios. On the basis of this insight, we provide an overview of the themes particularly relevant to the research on content-aware systems for broadcast sports. Specifically, we focus on the video content analysis techniques applied in sportscasts over the past decade from the perspectives of fundamentals and general review, a content hierarchical model, and trends and challenges. Content-aware analysis methods are discussed with respect to object-, event-, and context-oriented groups. In each group, the gap between sensation and content excitement must be bridged using proper strategies. In this regard, a content-aware approach is required to determine user demands. Finally, the paper summarizes the future trends and challenges for sports video analysis. We believe that our findings can advance the field of research on content-aware video analysis for broadcast sports.
Tasks
Published 2017-03-03
URL http://arxiv.org/abs/1703.01170v1
PDF http://arxiv.org/pdf/1703.01170v1.pdf
PWC https://paperswithcode.com/paper/a-survey-on-content-aware-video-analysis-for
Repo
Framework

Large-scale Datasets: Faces with Partial Occlusions and Pose Variations in the Wild

Title Large-scale Datasets: Faces with Partial Occlusions and Pose Variations in the Wild
Authors Tarik Alafif, Zeyad Hailat, Melih Aslan, Xuewen Chen
Abstract Face detection methods have relied on face datasets for training. However, existing face datasets tend to be small in scale for face learning in both constrained and unconstrained environments. In this paper, we first introduce our large-scale image datasets, Large-scale Labeled Face (LSLF) and the noisy Large-scale Labeled Non-face (LSLNF). Our LSLF dataset consists of a large number of unconstrained multi-view and partially occluded faces. The faces exhibit many variations in color and grayscale, image quality, image resolution, image illumination, image background, image illusion, human versus cartoon faces, facial expression, light and severe partial facial occlusion, makeup, gender, age, and race. Many of these faces are partially occluded by accessories or other objects or persons, such as tattoos, hats, glasses, sunglasses, hands, hair, beards, scarves, or microphones. The LSLF dataset is currently the largest labeled face image dataset in the literature in terms of the number of labeled images and the number of individuals. Second, we introduce our CrowedFaces and CrowedNonFaces image datasets, which include face and non-face images from crowded scenes. These datasets aim to provide researchers with a large number of training examples with many variations for large-scale face learning and face recognition tasks.
Tasks Face Detection, Face Recognition
Published 2017-06-27
URL http://arxiv.org/abs/1706.08690v1
PDF http://arxiv.org/pdf/1706.08690v1.pdf
PWC https://paperswithcode.com/paper/large-scale-datasets-faces-with-partial
Repo
Framework

The CLaC Discourse Parser at CoNLL-2016

Title The CLaC Discourse Parser at CoNLL-2016
Authors Majid Laali, Andre Cianflone, Leila Kosseim
Abstract This paper describes our submission “CLaC” to the CoNLL-2016 shared task on shallow discourse parsing. We used two complementary approaches for the task: a standard machine learning approach for the parsing of explicit relations, and a deep learning approach for non-explicit relations. Overall, our parser achieves an F1-score of 0.2106 on the identification of discourse relations (0.3110 for explicit relations and 0.1219 for non-explicit relations) on the blind CoNLL-2016 test set.
Tasks
Published 2017-08-19
URL http://arxiv.org/abs/1708.05798v1
PDF http://arxiv.org/pdf/1708.05798v1.pdf
PWC https://paperswithcode.com/paper/the-clac-discourse-parser-at-conll-2016
Repo
Framework

Co-segmentation for Space-Time Co-located Collections

Title Co-segmentation for Space-Time Co-located Collections
Authors Hadar Averbuch-Elor, Johannes Kopf, Tamir Hazan, Daniel Cohen-Or
Abstract We present a co-segmentation technique for space-time co-located image collections. These prevalent collections capture various dynamic events, usually by multiple photographers, and may contain multiple co-occurring objects which are not necessarily part of the intended foreground object, resulting in ambiguities for traditional co-segmentation techniques. Thus, to disambiguate what the common foreground object is, we introduce a weakly-supervised technique, where we assume only a small seed, given in the form of a single segmented image. We take a distributed approach, where local belief models are propagated and reinforced with similar images. Our technique progressively expands the foreground and background belief models across the entire collection. The technique exploits the power of the entire image set without building a global model, and thus successfully overcomes large variability in the appearance of the common foreground object. We demonstrate that our method outperforms previous co-segmentation techniques on challenging space-time co-located collections, including dense benchmark datasets which were adapted for our novel problem setting.
Tasks
Published 2017-01-31
URL http://arxiv.org/abs/1701.08931v1
PDF http://arxiv.org/pdf/1701.08931v1.pdf
PWC https://paperswithcode.com/paper/co-segmentation-for-space-time-co-located
Repo
Framework
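The distributed propagation idea can be caricatured as label propagation of a scalar foreground belief over a similarity graph of the collection, starting from the single segmented seed. The NumPy sketch below is a deliberately simplified stand-in (scalar beliefs and a cosine k-NN graph are my assumptions), not the paper's richer local belief models.

```python
import numpy as np

def propagate_foreground_belief(features, seed_idx, k=5, n_iter=30, alpha=0.9):
    """Spread a foreground belief score from a seed image through a
    k-NN cosine-similarity graph over per-image feature vectors."""
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    S = f @ f.T
    np.fill_diagonal(S, 0.0)
    W = np.zeros_like(S)
    for i in range(len(f)):                    # keep k strongest neighbours
        nb = np.argsort(S[i])[-k:]
        W[i, nb] = np.clip(S[i, nb], 0.0, None)
    W /= W.sum(axis=1, keepdims=True) + 1e-12  # row-stochastic graph
    b0 = np.zeros(len(f))
    b0[seed_idx] = 1.0                         # the single segmented seed
    b = b0.copy()
    for _ in range(n_iter):                    # standard label propagation
        b = alpha * (W @ b) + (1 - alpha) * b0
    return b

rng = np.random.default_rng(0)
beliefs = propagate_foreground_belief(rng.normal(size=(40, 64)), seed_idx=0)
print(beliefs.round(3))
```

The point the sketch preserves is that no global model is ever built: each image's belief comes only from its similar neighbours, reinforced over iterations.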

Finite-Time Stabilization of Longitudinal Control for Autonomous Vehicles via a Model-Free Approach

Title Finite-Time Stabilization of Longitudinal Control for Autonomous Vehicles via a Model-Free Approach
Authors Philip Polack, Brigitte d’Andréa-Novel, Michel Fliess, Arnaud de la Fortelle, Lghani Menhour
Abstract This communication presents a longitudinal model-free control approach for computing the wheel torque command to be applied to a vehicle. This setting enables us to overcome the problem of unknown vehicle parameters when generating a suitable control law. An important parameter in this control setting is made time-varying to ensure finite-time stability. Several convincing computer simulations are displayed and discussed: overshoots become smaller, driving comfort is increased, and robustness to time delays is improved.
Tasks Autonomous Vehicles
Published 2017-04-05
URL http://arxiv.org/abs/1704.01383v1
PDF http://arxiv.org/pdf/1704.01383v1.pdf
PWC https://paperswithcode.com/paper/finite-time-stabilization-of-longitudinal
Repo
Framework
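Model-free control in this line of work is typically built on a Fliess-style “ultra-local” model dy/dt = F + alpha * u, where F lumps all unknown dynamics and is re-estimated at every step from measurements. The sketch below shows that loop on a toy first-order plant, with a time-varying gain standing in for the abstract's time-varying parameter; the plant, constants, and the crude derivative estimate are all assumptions.

```python
import numpy as np

def model_free_step(y, y_prev, u_prev, yd, yd_dot, alpha, kp, dt):
    """One step of an 'intelligent proportional' model-free controller
    on the ultra-local model  dy/dt = F + alpha * u:  estimate F from
    the measured derivative, then cancel it in the control law."""
    y_dot = (y - y_prev) / dt          # crude numerical derivative
    F_hat = y_dot - alpha * u_prev     # estimate of the unknown dynamics
    e = y - yd                         # tracking error
    return (-F_hat + yd_dot - kp * e) / alpha

# toy plant dy/dt = -y + 2u, tracking the setpoint yd = 1
dt, alpha = 0.01, 2.0
y, y_prev, u = 0.0, 0.0, 0.0
for k in range(1000):
    kp = 5.0 + 0.01 * k                # time-varying gain (finite-time flavour)
    u = model_free_step(y, y_prev, u, yd=1.0, yd_dot=0.0,
                        alpha=alpha, kp=kp, dt=dt)
    y_prev, y = y, y + dt * (-y + 2.0 * u)
print("final y:", round(y, 4))
```

Because F is re-estimated continuously, the same loop keeps working when the plant's parameters are unknown or drift, which is the point of the model-free setting.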

Autonomous Reactive Mission Scheduling and Task-Path Planning Architecture for Autonomous Underwater Vehicle

Title Autonomous Reactive Mission Scheduling and Task-Path Planning Architecture for Autonomous Underwater Vehicle
Authors Somaiyeh Mahmoud.Zadeh
Abstract An Autonomous Underwater Vehicle (AUV) should carry out complex tasks in a limited time interval. Since existing AUVs have limited battery capacity and restricted endurance, they should autonomously manage mission time and resources to perform effective, persistent deployment in longer missions. Task assignment requires making decisions subject to resource constraints, while tasks are assigned with costs and/or values that are budgeted in advance. Tasks are distributed in a particular operation zone and mapped by a waypoint-covered network. Thus, designing an efficient routing and task-priority assignment framework that considers the vehicle’s availability and properties is essential for increasing mission productivity and on-time mission completion. This depends strongly on the order and priority of the tasks located between node-like waypoints in the operation network. On the other hand, autonomous operation of AUVs in an unfamiliar, dynamic underwater environment, with quick responses to sudden environmental changes, is a complicated process. Water current instabilities can deflect the vehicle in an undesired direction and endanger the AUV’s safety. The vehicle’s robustness to strong environmental variations is crucial for safe and optimal operation in an uncertain and dynamic environment. To this end, the AUV needs a general overview of the environment at the top level to perform autonomous action (task) selection, and a lower-level local motion planner to deal successfully with continuously changing situations. This research develops a novel reactive control architecture that provides a higher level of decision autonomy for AUV operation, enabling a single vehicle to accomplish multiple tasks in a single mission in the face of periodic disturbances in a turbulent and highly uncertain environment.
Tasks
Published 2017-06-13
URL http://arxiv.org/abs/1706.04189v1
PDF http://arxiv.org/pdf/1706.04189v1.pdf
PWC https://paperswithcode.com/paper/autonomous-reactive-mission-scheduling-and
Repo
Framework

CT-SRCNN: Cascade Trained and Trimmed Deep Convolutional Neural Networks for Image Super Resolution

Title CT-SRCNN: Cascade Trained and Trimmed Deep Convolutional Neural Networks for Image Super Resolution
Authors Haoyu Ren, Mostafa El-Khamy, Jungwon Lee
Abstract We propose methodologies to train highly accurate and efficient deep convolutional neural networks (CNNs) for image super resolution (SR). A cascade training approach to deep learning is proposed to improve the accuracy of the neural networks while gradually increasing the number of network layers. Next, we explore how to improve the SR efficiency by making the network slimmer. Two methodologies, the one-shot trimming and the cascade trimming, are proposed. With the cascade trimming, the network’s size is gradually reduced layer by layer, without significant loss on its discriminative ability. Experiments on benchmark image datasets show that our proposed SR network achieves the state-of-the-art super resolution accuracy, while being more than 4 times faster compared to existing deep super resolution networks.
Tasks Image Super-Resolution, Super-Resolution
Published 2017-11-11
URL http://arxiv.org/abs/1711.04048v1
PDF http://arxiv.org/pdf/1711.04048v1.pdf
PWC https://paperswithcode.com/paper/ct-srcnn-cascade-trained-and-trimmed-deep
Repo
Framework
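Cascade training as described, train shallow, deepen, warm-start from the previous stage, can be sketched as follows in PyTorch. The layer sizes, stage schedule, and weight-copying rule are illustrative assumptions; the commented-out training step would run on standard SR patch data.

```python
import torch.nn as nn

def make_srcnn(n_mid_layers):
    """Toy SRCNN-style network with a variable number of 3x3 mid layers."""
    layers = [nn.Conv2d(1, 32, 5, padding=2), nn.ReLU()]
    for _ in range(n_mid_layers):
        layers += [nn.Conv2d(32, 32, 3, padding=1), nn.ReLU()]
    layers += [nn.Conv2d(32, 1, 5, padding=2)]
    return nn.Sequential(*layers)

def cascade_train(stages=(1, 3, 5)):
    """Sketch of cascade training: train a shallow net, then build a
    deeper one, copy the already-trained weights into matching layers,
    and continue training (the newly inserted layers start fresh)."""
    prev = None
    for depth in stages:
        net = make_srcnn(depth)
        if prev is not None:
            old, new = list(prev), list(net)
            for lo, ln in zip(old[:-1], new[:len(old) - 1]):
                if isinstance(lo, nn.Conv2d):      # copy shared prefix
                    ln.load_state_dict(lo.state_dict())
            new[-1].load_state_dict(old[-1].state_dict())  # keep output conv
        # ... run optimisation steps on SR training patches here ...
        prev = net
    return prev

final_net = cascade_train()
```

Cascade trimming would then run the same warm-start idea in reverse, shrinking layer widths stage by stage while fine-tuning to hold accuracy.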

Classification of entities via their descriptive sentences

Title Classification of entities via their descriptive sentences
Authors Chao Zhao, Min Zhao, Yi Guan
Abstract Hypernym identification of open-domain entities is crucial for taxonomy construction as well as many higher-level applications. Current methods suffer from either low precision or low recall. To decrease the difficulty of this problem, we adopt a classification-based method. We pre-define a concept taxonomy and classify an entity into one of its leaf concepts, based on the name and description information of the entity. A convolutional neural network classifier and a K-means clustering module are adopted for classification. We applied this system to 2.1 million Baidu Baike entities, and 1.1 million of them were successfully identified with a precision of 99.36%.
Tasks
Published 2017-11-28
URL http://arxiv.org/abs/1711.10317v1
PDF http://arxiv.org/pdf/1711.10317v1.pdf
PWC https://paperswithcode.com/paper/classification-of-entities-via-their
Repo
Framework
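A minimal PyTorch sketch of the CNN-classifier half of the pipeline: a max-over-time text CNN mapping an entity's description tokens to leaf-concept logits. Vocabulary size, dimensions, filter widths, and class count are all illustrative assumptions, and the paper's K-means module is omitted.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DescriptionCNN(nn.Module):
    """Text CNN over description tokens -> leaf-concept logits."""
    def __init__(self, vocab=50000, dim=128, n_classes=200, widths=(2, 3, 4)):
        super().__init__()
        self.emb = nn.Embedding(vocab, dim)
        self.convs = nn.ModuleList(nn.Conv1d(dim, 100, w) for w in widths)
        self.out = nn.Linear(100 * len(widths), n_classes)

    def forward(self, tokens):                    # tokens: (batch, seq_len)
        x = self.emb(tokens).transpose(1, 2)      # (batch, dim, seq_len)
        pooled = [F.relu(c(x)).max(dim=2).values  # max-over-time pooling
                  for c in self.convs]
        return self.out(torch.cat(pooled, dim=1))

logits = DescriptionCNN()(torch.randint(0, 50000, (4, 60)))
print(logits.shape)  # torch.Size([4, 200])
```

Classifying into a fixed taxonomy, rather than extracting hypernyms from open text, is what lets the system trade a little coverage (1.1M of 2.1M entities resolved) for very high precision.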

A Multichannel Convolutional Neural Network For Cross-language Dialog State Tracking

Title A Multichannel Convolutional Neural Network For Cross-language Dialog State Tracking
Authors Hongjie Shi, Takashi Ushio, Mitsuru Endo, Katsuyoshi Yamagami, Noriaki Horii
Abstract The fifth Dialog State Tracking Challenge (DSTC5) introduces a new cross-language dialog state tracking scenario, where participants are asked to build their trackers on the English training corpus, while evaluating them on the unlabeled Chinese corpus. Although computer-generated translations for both the English and Chinese corpora are provided in the dataset, these translations contain errors, and careless use of them can easily hurt the performance of the built trackers. To address this problem, we propose a multichannel Convolutional Neural Network (CNN) architecture in which we treat English and Chinese as different input channels of a single CNN model. In the DSTC5 evaluation, we found that such a multichannel architecture can effectively improve robustness against translation errors. Additionally, our method for DSTC5 is purely machine learning based and requires no prior knowledge about the target language. We consider this a desirable property for building a tracker in the cross-language context, as not every developer will be familiar with both languages.
Tasks
Published 2017-01-23
URL http://arxiv.org/abs/1701.06247v1
PDF http://arxiv.org/pdf/1701.06247v1.pdf
PWC https://paperswithcode.com/paper/a-multichannel-convolutional-neural-network
Repo
Framework
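The multichannel idea, one embedding channel per language feeding a single CNN, can be sketched directly. A hedged PyTorch illustration: it assumes both token sequences are padded to the same length, and every size below is made up.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultichannelCNN(nn.Module):
    """An utterance and its (noisy) translation are embedded separately
    and stacked as two channels of one CNN, so each filter sees both
    languages at once."""
    def __init__(self, vocab_en=30000, vocab_zh=30000, dim=100,
                 n_filters=150, width=3, n_labels=20):
        super().__init__()
        self.emb_en = nn.Embedding(vocab_en, dim)
        self.emb_zh = nn.Embedding(vocab_zh, dim)
        self.conv = nn.Conv2d(2, n_filters, (width, dim))
        self.out = nn.Linear(n_filters, n_labels)

    def forward(self, tok_en, tok_zh):       # both: (batch, seq_len), padded
        x = torch.stack([self.emb_en(tok_en),
                         self.emb_zh(tok_zh)], dim=1)  # (B, 2, L, dim)
        h = F.relu(self.conv(x)).squeeze(3)            # (B, F, L - w + 1)
        h = h.max(dim=2).values                        # max-over-time
        return self.out(h)                             # dialog-state logits

model = MultichannelCNN()
print(model(torch.randint(0, 30000, (4, 32)),
            torch.randint(0, 30000, (4, 32))).shape)  # torch.Size([4, 20])
```

Because both channels contribute to every filter response, an error in one language's translation can be compensated by the other channel, which is the claimed source of robustness.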

Learning Sparse Visual Representations with Leaky Capped Norm Regularizers

Title Learning Sparse Visual Representations with Leaky Capped Norm Regularizers
Authors Jianqiao Wangni, Dahua Lin
Abstract Sparsity inducing regularization is an important part for learning over-complete visual representations. Despite the popularity of $\ell_1$ regularization, in this paper, we investigate the usage of non-convex regularizations in this problem. Our contribution consists of three parts. First, we propose the leaky capped norm regularization (LCNR), which allows model weights below a certain threshold to be regularized more strongly than those above, and therefore imposes strong sparsity while introducing only a controllable estimation bias. We propose a majorization-minimization algorithm to optimize the joint objective function. Second, in our study of monocular 3D shape recovery and neural networks, LCNR outperforms $\ell_1$ and other non-convex regularizations, achieving state-of-the-art performance and faster convergence. Third, we prove a theoretical global convergence speed on the 3D recovery problem. To the best of our knowledge, this is the first convergence analysis of the 3D recovery problem.
Tasks
Published 2017-11-08
URL http://arxiv.org/abs/1711.02857v1
PDF http://arxiv.org/pdf/1711.02857v1.pdf
PWC https://paperswithcode.com/paper/learning-sparse-visual-representations-with
Repo
Framework
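The abstract pins down the shape of the regularizer: a full-strength penalty below a threshold (driving small weights to exact zero) and a weaker “leaky” slope above it (bounding the bias on large weights). One plausible parameterisation, written out in NumPy; the exact form and constants in the paper may differ.

```python
import numpy as np

def leaky_capped_norm(w, theta=0.1, alpha=1.0, beta=0.05):
    """A plausible leaky capped norm: slope `alpha` on |w| below the
    threshold `theta`, a much smaller slope `beta` above it. With
    beta = 0 this reduces to the (non-leaky) capped norm; with
    beta = alpha it reduces to the plain l1 norm."""
    a = np.abs(w)
    return np.sum(alpha * np.minimum(a, theta)
                  + beta * np.maximum(a - theta, 0.0))

w = np.array([0.01, -0.05, 0.3, -2.0])
print(leaky_capped_norm(w))  # small weights dominate the penalty
```

The penalty is non-convex (concave in |w| at the kink), which is why the paper pairs it with a majorization-minimization scheme rather than plain subgradient descent.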

Long-Term Online Smoothing Prediction Using Expert Advice

Title Long-Term Online Smoothing Prediction Using Expert Advice
Authors Alexander Korotin, Vladimir V’yugin, Evgeny Burnaev
Abstract For the prediction with experts’ advice setting, we construct forecasting algorithms that suffer loss not much larger than that of any expert in the pool. In contrast to the standard approach, we investigate the case of long-term forecasting of time series and consider two scenarios. In the first, at each step $t$ the learner has to combine the point forecasts of the experts issued for the time interval $[t+1, t+d]$ ahead. Our approach implies that at each time step experts issue point forecasts for arbitrarily many steps ahead, and the learner (algorithm) then combines these forecasts and the forecasts made earlier into one vector forecast for steps $[t+1,t+d]$. By combining past and current long-term forecasts we obtain a smoothing mechanism that protects our algorithm from temporary trend changes, noise, and outliers. In the second scenario, at each step $t$ experts issue a prediction function, and the learner has to combine these functions into a single one to be used for long-term time-series prediction. For each scenario, we develop an algorithm for combining the experts’ forecasts and prove an $O(\ln T)$ adversarial regret upper bound for both algorithms.
Tasks Time Series, Time Series Prediction
Published 2017-11-08
URL http://arxiv.org/abs/1711.03194v3
PDF http://arxiv.org/pdf/1711.03194v3.pdf
PWC https://paperswithcode.com/paper/long-term-online-smoothing-prediction-using
Repo
Framework
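The backbone of prediction-with-expert-advice algorithms like these is exponential weighting: average the experts' d-step-ahead forecasts under the current weights, then down-weight each expert by its realised loss. A generic NumPy sketch of that backbone only; the paper's smoothing over past long-term forecasts and its specific losses are not reproduced here.

```python
import numpy as np

def aggregate(forecasts, weights):
    """Combine the experts' forecasts; `forecasts` has shape
    (n_experts, d). Returns one vector forecast for steps [t+1, t+d]."""
    w = weights / weights.sum()
    return w @ forecasts

def exp_weights_update(weights, losses, eta=0.5):
    """Exponential weights: experts with larger loss lose weight."""
    return weights * np.exp(-eta * losses)

# toy run: 3 experts forecasting d = 4 steps ahead
rng = np.random.default_rng(0)
w = np.ones(3)
for t in range(50):
    F = rng.normal(size=(3, 4))            # experts' long-term forecasts
    pred = aggregate(F, w)                 # learner's vector forecast
    truth = rng.normal(size=4)             # realised values
    losses = ((F - truth) ** 2).mean(axis=1)
    w = exp_weights_update(w, losses)
print("final normalised weights:", (w / w.sum()).round(3))
```

The paper's smoothing twist is that the forecast for step $t+s$ also blends the forecasts for that same step issued at earlier times, which is what damps reactions to transient trend changes and outliers.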