Paper Group ANR 555
Block-Sparse Recurrent Neural Networks
Title | Block-Sparse Recurrent Neural Networks |
Authors | Sharan Narang, Eric Undersander, Gregory Diamos |
Abstract | Recurrent Neural Networks (RNNs) are used in state-of-the-art models in domains such as speech recognition, machine translation, and language modelling. Sparsity is a technique to reduce compute and memory requirements of deep learning models. Sparse RNNs are easier to deploy on devices and high-end server processors. Even though sparse operations need less compute and memory relative to their dense counterparts, the speed-up observed by using sparse operations is less than expected on different hardware platforms. In order to address this issue, we investigate two different approaches to induce block sparsity in RNNs: pruning blocks of weights in a layer and using group lasso regularization to create blocks of weights with zeros. Using these techniques, we demonstrate that we can create block-sparse RNNs with sparsity ranging from 80% to 90% with small loss in accuracy. This allows us to reduce the model size by roughly 10x. Additionally, we can prune a larger dense network to recover this loss in accuracy while maintaining high block sparsity and reducing the overall parameter count. Our technique works with a variety of block sizes up to 32x32. Block-sparse RNNs eliminate overheads related to data storage and irregular memory accesses while increasing hardware efficiency compared to unstructured sparsity. |
Tasks | Language Modelling, Machine Translation, Speech Recognition |
Published | 2017-11-08 |
URL | http://arxiv.org/abs/1711.02782v1 |
http://arxiv.org/pdf/1711.02782v1.pdf | |
PWC | https://paperswithcode.com/paper/block-sparse-recurrent-neural-networks |
Repo | |
Framework | |
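The block-pruning approach described in the abstract can be sketched as follows: score each block of a weight matrix and zero out the lowest-scoring blocks. The block size, sparsity level, and max-magnitude scoring criterion below are illustrative choices, not the paper's exact recipe:

```python
import numpy as np

def block_prune(W, block=4, sparsity=0.75):
    """Zero out the fraction `sparsity` of block x block weight blocks
    whose largest-magnitude entry is smallest."""
    rows, cols = W.shape
    assert rows % block == 0 and cols % block == 0
    # One score per block: the block's largest absolute weight.
    scores = np.abs(W).reshape(rows // block, block,
                               cols // block, block).max(axis=(1, 3))
    k = int(scores.size * sparsity)               # number of blocks to remove
    thresh = np.partition(scores.ravel(), k - 1)[k - 1]
    mask = scores > thresh                        # keep blocks above threshold
    mask = mask.repeat(block, axis=0).repeat(block, axis=1)
    return W * mask, mask

W = np.random.default_rng(0).standard_normal((16, 16))
Wp, mask = block_prune(W, block=4, sparsity=0.75)
```

Group lasso regularization, the paper's second approach, would instead drive whole-block norms toward zero during training rather than pruning after the fact.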
Sparse Neural Networks Topologies
Title | Sparse Neural Networks Topologies |
Authors | Alfred Bourely, John Patrick Boueri, Krzysztof Choromonski |
Abstract | We propose Sparse Neural Network architectures that are based on random or structured bipartite graph topologies. Sparse architectures provide compression of the learned models and speed-ups of computations; they can also surpass their unstructured or fully connected counterparts. As we show, even more compact topologies of the so-called SNN (Sparse Neural Network) can be achieved with the use of structured graphs of connections between consecutive layers of neurons. In this paper, we investigate how the accuracy and training speed of the models depend on the topology and sparsity of the neural network. Previous approaches using sparsity are all based on fully connected neural network models and create sparsity during the training phase; instead, we explicitly define a sparse architecture of connections before training. Building compact neural network models is coherent with empirical observations showing that there is much redundancy in learned neural network models. We show experimentally that the accuracy of the models learned with neural networks depends on expander-like properties of the underlying topologies, such as the spectral gap and algebraic connectivity, rather than the density of the graphs of connections. |
Tasks | |
Published | 2017-06-18 |
URL | http://arxiv.org/abs/1706.05683v1 |
http://arxiv.org/pdf/1706.05683v1.pdf | |
PWC | https://paperswithcode.com/paper/sparse-neural-networks-topologies |
Repo | |
Framework | |
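The abstract ties model accuracy to expander-like properties such as algebraic connectivity. As a rough illustration (not the paper's code), the algebraic connectivity of the bipartite connection graph between two consecutive layers can be computed from the graph Laplacian:

```python
import numpy as np

def algebraic_connectivity(mask):
    """Second-smallest Laplacian eigenvalue of the bipartite graph given
    by a binary (n_in x n_out) layer-connection mask; larger values
    indicate a better-connected, more expander-like topology."""
    n_in, n_out = mask.shape
    A = np.zeros((n_in + n_out, n_in + n_out))
    A[:n_in, n_in:] = mask             # edges from input to output neurons
    A[n_in:, :n_in] = mask.T
    L = np.diag(A.sum(axis=1)) - A     # graph Laplacian
    return np.sort(np.linalg.eigvalsh(L))[1]

lam2 = algebraic_connectivity(np.ones((4, 4)))   # fully connected K_{4,4}
```

A fully connected bipartite layer maximizes this quantity, while a disconnected mask (e.g. a bare identity matching) drives it to zero; the paper's claim is that sparse topologies can stay accurate as long as this connectivity remains high.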
Using Optimal Ratio Mask as Training Target for Supervised Speech Separation
Title | Using Optimal Ratio Mask as Training Target for Supervised Speech Separation |
Authors | Shasha Xia, Hao Li, Xueliang Zhang |
Abstract | Supervised speech separation uses supervised learning algorithms to learn a mapping from an input noisy signal to an output target. With the fast development of deep learning, supervised separation has become the most important direction in the speech separation area in recent years. For supervised algorithms, the training target has a significant impact on performance. The ideal ratio mask is a commonly used training target, which can improve the intelligibility and quality of the separated speech. However, it does not take into account the correlation between noise and clean speech. In this paper, we use the optimal ratio mask as the training target of a deep neural network (DNN) for speech separation. The experiments are carried out under various noise environments and signal-to-noise ratio (SNR) conditions. The results show that the optimal ratio mask outperforms other training targets in general. |
Tasks | Speech Separation |
Published | 2017-09-04 |
URL | http://arxiv.org/abs/1709.00917v1 |
http://arxiv.org/pdf/1709.00917v1.pdf | |
PWC | https://paperswithcode.com/paper/using-optimal-ratio-mask-as-training-target |
Repo | |
Framework | |
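As a sketch of the training target, the optimal ratio mask is commonly defined in the speech-separation literature from the complex STFTs of clean speech and noise; the cross-term is what distinguishes it from the ideal ratio mask. The formula below follows that literature definition and may differ in detail from the paper:

```python
import numpy as np

def optimal_ratio_mask(S, N):
    """ORM from complex STFT coefficients of clean speech S and noise N.
    The Re(S * conj(N)) cross-term models the speech-noise correlation
    that the ideal ratio mask ignores."""
    cross = np.real(S * np.conj(N))
    num = np.abs(S) ** 2 + cross
    den = np.abs(S) ** 2 + np.abs(N) ** 2 + 2 * cross   # equals |S + N|^2
    return num / den

# Sanity check: with zero noise the mask passes everything through.
m = optimal_ratio_mask(np.array([1.0 + 1.0j, 2.0]), np.zeros(2))
```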
Learning Unified Embedding for Apparel Recognition
Title | Learning Unified Embedding for Apparel Recognition |
Authors | Yang Song, Yuan Li, Bo Wu, Chao-Yeh Chen, Xiao Zhang, Hartwig Adam |
Abstract | In apparel recognition, specialized models (e.g. models trained for a particular vertical like dresses) can significantly outperform general models (i.e. models that cover a wide range of verticals). Therefore, deep neural network models are often trained separately for different verticals. However, using specialized models for different verticals is not scalable and is expensive to deploy. This paper addresses the problem of learning one unified embedding model for multiple object verticals (e.g. all apparel classes) without sacrificing accuracy. The problem is tackled from two aspects: training data and training difficulty. On the training data aspect, we find that for a single model trained with triplet loss, there is an accuracy sweet spot in terms of how many verticals are trained together. To ease the training difficulty, a novel learning scheme is proposed that uses the output from specialized models as learning targets so that L2 loss can be used instead of triplet loss. This new loss makes the training easier and makes it possible to use the feature space more efficiently. The end result is a unified model which can achieve the same retrieval accuracy as a number of separate specialized models, while having the complexity of a single model. The effectiveness of our approach is shown in experiments. |
Tasks | |
Published | 2017-07-19 |
URL | http://arxiv.org/abs/1707.05929v2 |
http://arxiv.org/pdf/1707.05929v2.pdf | |
PWC | https://paperswithcode.com/paper/learning-unified-embedding-for-apparel |
Repo | |
Framework | |
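The distillation idea, using specialized-model embeddings as regression targets so that a simple L2 loss replaces the triplet loss, can be sketched with a toy linear "unified" model; everything here (shapes, learning rate, the random data standing in for real features and targets) is illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((64, 16))   # input features for a batch of items
T = rng.standard_normal((64, 8))    # targets: embeddings from specialized models
W = np.zeros((16, 8))               # unified embedding, reduced to a linear map

for _ in range(200):                # plain gradient descent on the L2 loss
    E = X @ W - T                   # residual against the distillation targets
    W -= 0.01 * (X.T @ E) / len(X)

l2_loss = np.mean((X @ W - T) ** 2)
```

Regressing onto fixed targets gives a well-behaved convex objective per example, which is the sense in which the L2 loss "makes the training easier" than mining and optimizing triplets.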
A Survey on Content-Aware Video Analysis for Sports
Title | A Survey on Content-Aware Video Analysis for Sports |
Authors | Huang-Chia Shih |
Abstract | Sports data analysis is becoming increasingly large-scale, diversified, and shared, but difficulty persists in rapidly accessing the most crucial information. Previous surveys have focused on the methodologies of sports video analysis from the spatiotemporal viewpoint instead of a content-based viewpoint, and few of these studies have considered semantics. This study develops a deeper interpretation of content-aware sports video analysis by examining the insight offered by research into the structure of content under different scenarios. On the basis of this insight, we provide an overview of the themes particularly relevant to the research on content-aware systems for broadcast sports. Specifically, we focus on the video content analysis techniques applied in sportscasts over the past decade from the perspectives of fundamentals and general review, a content hierarchical model, and trends and challenges. Content-aware analysis methods are discussed with respect to object-, event-, and context-oriented groups. In each group, the gap between sensation and content excitement must be bridged using proper strategies. In this regard, a content-aware approach is required to determine user demands. Finally, the paper summarizes the future trends and challenges for sports video analysis. We believe that our findings can advance the field of research on content-aware video analysis for broadcast sports. |
Tasks | |
Published | 2017-03-03 |
URL | http://arxiv.org/abs/1703.01170v1 |
http://arxiv.org/pdf/1703.01170v1.pdf | |
PWC | https://paperswithcode.com/paper/a-survey-on-content-aware-video-analysis-for |
Repo | |
Framework | |
Large-scale Datasets: Faces with Partial Occlusions and Pose Variations in the Wild
Title | Large-scale Datasets: Faces with Partial Occlusions and Pose Variations in the Wild |
Authors | Tarik Alafif, Zeyad Hailat, Melih Aslan, Xuewen Chen |
Abstract | Face detection methods have relied on face datasets for training. However, existing face datasets tend to be small in scale for face learning in both constrained and unconstrained environments. In this paper, we first introduce our large-scale image datasets, Large-scale Labeled Face (LSLF) and noisy Large-scale Labeled Non-face (LSLNF). Our LSLF dataset consists of a large number of unconstrained multi-view and partially occluded faces. The faces have many variations in color and grayscale, image quality, image resolution, image illumination, image background, image illusion, human and cartoon faces, facial expression, light and severe partial facial occlusion, makeup, gender, age, and race. Many of these faces are partially occluded with accessories such as tattoos, hats, glasses, sunglasses, hands, hair, beards, scarves, microphones, or other objects or persons. The LSLF dataset is currently the largest labeled face image dataset in the literature in terms of the number of labeled images and the number of individuals. Second, we introduce our CrowedFaces and CrowedNonFaces image datasets, which include face and non-face images from crowded scenes. These datasets essentially aim to provide researchers with a large number of training examples with many variations for large-scale face learning and face recognition tasks. |
Tasks | Face Detection, Face Recognition |
Published | 2017-06-27 |
URL | http://arxiv.org/abs/1706.08690v1 |
http://arxiv.org/pdf/1706.08690v1.pdf | |
PWC | https://paperswithcode.com/paper/large-scale-datasets-faces-with-partial |
Repo | |
Framework | |
The CLaC Discourse Parser at CoNLL-2016
Title | The CLaC Discourse Parser at CoNLL-2016 |
Authors | Majid Laali, Andre Cianflone, Leila Kosseim |
Abstract | This paper describes our submission “CLaC” to the CoNLL-2016 shared task on shallow discourse parsing. We used two complementary approaches for the task: a standard machine learning approach for the parsing of explicit relations, and a deep learning approach for non-explicit relations. Overall, our parser achieves an F1-score of 0.2106 on the identification of discourse relations (0.3110 for explicit relations and 0.1219 for non-explicit relations) on the blind CoNLL-2016 test set. |
Tasks | |
Published | 2017-08-19 |
URL | http://arxiv.org/abs/1708.05798v1 |
http://arxiv.org/pdf/1708.05798v1.pdf | |
PWC | https://paperswithcode.com/paper/the-clac-discourse-parser-at-conll-2016 |
Repo | |
Framework | |
Co-segmentation for Space-Time Co-located Collections
Title | Co-segmentation for Space-Time Co-located Collections |
Authors | Hadar Averbuch-Elor, Johannes Kopf, Tamir Hazan, Daniel Cohen-Or |
Abstract | We present a co-segmentation technique for space-time co-located image collections. These prevalent collections capture various dynamic events, usually by multiple photographers, and may contain multiple co-occurring objects which are not necessarily part of the intended foreground object, resulting in ambiguities for traditional co-segmentation techniques. Thus, to disambiguate what the common foreground object is, we introduce a weakly-supervised technique, where we assume only a small seed, given in the form of a single segmented image. We take a distributed approach, where local belief models are propagated and reinforced with similar images. Our technique progressively expands the foreground and background belief models across the entire collection. The technique exploits the power of the entire set of images without building a global model, and thus successfully overcomes large variability in the appearance of the common foreground object. We demonstrate that our method outperforms previous co-segmentation techniques on challenging space-time co-located collections, including dense benchmark datasets which were adapted for our novel problem setting. |
Tasks | |
Published | 2017-01-31 |
URL | http://arxiv.org/abs/1701.08931v1 |
http://arxiv.org/pdf/1701.08931v1.pdf | |
PWC | https://paperswithcode.com/paper/co-segmentation-for-space-time-co-located |
Repo | |
Framework | |
Finite-Time Stabilization of Longitudinal Control for Autonomous Vehicles via a Model-Free Approach
Title | Finite-Time Stabilization of Longitudinal Control for Autonomous Vehicles via a Model-Free Approach |
Authors | Philip Polack, Brigitte d’Andréa-Novel, Michel Fliess, Arnaud de la Fortelle, Lghani Menhour |
Abstract | This communication presents a longitudinal model-free control approach for computing the wheel torque command to be applied on a vehicle. This setting enables us to overcome the problem of unknown vehicle parameters for generating a suitable control law. An important parameter in this control setting is made time-varying for ensuring finite-time stability. Several convincing computer simulations are displayed and discussed. Overshoots therefore become smaller, the driving comfort is increased, and the robustness to time-delays is improved. |
Tasks | Autonomous Vehicles |
Published | 2017-04-05 |
URL | http://arxiv.org/abs/1704.01383v1 |
http://arxiv.org/pdf/1704.01383v1.pdf | |
PWC | https://paperswithcode.com/paper/finite-time-stabilization-of-longitudinal |
Repo | |
Framework | |
Autonomous Reactive Mission Scheduling and Task-Path Planning Architecture for Autonomous Underwater Vehicle
Title | Autonomous Reactive Mission Scheduling and Task-Path Planning Architecture for Autonomous Underwater Vehicle |
Authors | Somaiyeh Mahmoud. Zadeh |
Abstract | An Autonomous Underwater Vehicle (AUV) should carry out complex tasks in a limited time interval. Since existing AUVs have limited battery capacity and restricted endurance, they should autonomously manage mission time and resources to perform effective persistent deployment in longer missions. Task assignment requires making decisions subject to resource constraints, while tasks are assigned costs and/or values that are budgeted in advance. Tasks are distributed in a particular operation zone and mapped by a waypoint-covered network. Thus, designing an efficient routing and task-priority assignment framework that considers the vehicle’s availability and properties is essential for increasing mission productivity and on-time mission completion. This depends strongly on the order and priority of the tasks that are located between node-like waypoints in an operation network. On the other hand, autonomous operation of AUVs in an unfamiliar, dynamic underwater environment, and responding quickly to sudden environmental changes, is a complicated process. Water current instabilities can deflect the vehicle in an undesired direction and compromise the AUV’s safety. The vehicle’s robustness to strong environmental variations is extremely crucial for its safe and optimal operation in an uncertain and dynamic environment. To this end, the AUV needs a general overview of the environment at the top level to perform autonomous action selection (task selection), and a lower-level local motion planner to operate successfully when dealing with continuously changing situations. This research develops a novel reactive control architecture to provide a higher level of decision autonomy for AUV operation, enabling a single vehicle to accomplish multiple tasks in a single mission in the face of periodic disturbances in a turbulent and highly uncertain environment. |
Tasks | |
Published | 2017-06-13 |
URL | http://arxiv.org/abs/1706.04189v1 |
http://arxiv.org/pdf/1706.04189v1.pdf | |
PWC | https://paperswithcode.com/paper/autonomous-reactive-mission-scheduling-and |
Repo | |
Framework | |
CT-SRCNN: Cascade Trained and Trimmed Deep Convolutional Neural Networks for Image Super Resolution
Title | CT-SRCNN: Cascade Trained and Trimmed Deep Convolutional Neural Networks for Image Super Resolution |
Authors | Haoyu Ren, Mostafa El-Khamy, Jungwon Lee |
Abstract | We propose methodologies to train highly accurate and efficient deep convolutional neural networks (CNNs) for image super resolution (SR). A cascade training approach to deep learning is proposed to improve the accuracy of the neural networks while gradually increasing the number of network layers. Next, we explore how to improve the SR efficiency by making the network slimmer. Two methodologies, the one-shot trimming and the cascade trimming, are proposed. With the cascade trimming, the network’s size is gradually reduced layer by layer, without significant loss on its discriminative ability. Experiments on benchmark image datasets show that our proposed SR network achieves the state-of-the-art super resolution accuracy, while being more than 4 times faster compared to existing deep super resolution networks. |
Tasks | Image Super-Resolution, Super-Resolution |
Published | 2017-11-11 |
URL | http://arxiv.org/abs/1711.04048v1 |
http://arxiv.org/pdf/1711.04048v1.pdf | |
PWC | https://paperswithcode.com/paper/ct-srcnn-cascade-trained-and-trimmed-deep |
Repo | |
Framework | |
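The cascade-trimming idea, shrinking the network layer by layer rather than in one shot, can be sketched as channel trimming on a stack of fully connected weight matrices (a hypothetical simplification; the paper trims convolutional layers and fine-tunes after each trimming stage):

```python
import numpy as np

def cascade_trim(layers, keep_frac=0.5):
    """Trim one layer at a time: drop the output channels of each weight
    matrix with the smallest L2 norm, along with the matching input rows
    of the next layer, so that consecutive shapes stay compatible."""
    layers = [W.copy() for W in layers]
    for i in range(len(layers) - 1):          # last layer's outputs stay intact
        norms = np.linalg.norm(layers[i], axis=0)   # per-output-channel norm
        keep = np.argsort(norms)[-int(len(norms) * keep_frac):]
        keep.sort()
        layers[i] = layers[i][:, keep]        # trim this layer's outputs
        layers[i + 1] = layers[i + 1][keep, :]  # trim next layer's inputs
    return layers

trimmed = cascade_trim([np.random.default_rng(0).standard_normal((4, 8)),
                        np.random.default_rng(1).standard_normal((8, 6))])
```

Trimming stage by stage, with retraining in between, is what lets the cascade shrink the network without the sharp accuracy drop of one-shot trimming.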
Classification of entities via their descriptive sentences
Title | Classification of entities via their descriptive sentences |
Authors | Chao Zhao, Min Zhao, Yi Guan |
Abstract | Hypernym identification of open-domain entities is crucial for taxonomy construction as well as many higher-level applications. Current methods suffer from either low precision or low recall. To decrease the difficulty of this problem, we adopt a classification-based method. We pre-define a concept taxonomy and classify an entity into one of its leaf concepts, based on the name and description information of the entity. A convolutional neural network classifier and a K-means clustering module are adopted for classification. We applied this system to 2.1 million Baidu Baike entities, and 1.1 million of them were successfully identified with a precision of 99.36%. |
Tasks | |
Published | 2017-11-28 |
URL | http://arxiv.org/abs/1711.10317v1 |
http://arxiv.org/pdf/1711.10317v1.pdf | |
PWC | https://paperswithcode.com/paper/classification-of-entities-via-their |
Repo | |
Framework | |
A Multichannel Convolutional Neural Network For Cross-language Dialog State Tracking
Title | A Multichannel Convolutional Neural Network For Cross-language Dialog State Tracking |
Authors | Hongjie Shi, Takashi Ushio, Mitsuru Endo, Katsuyoshi Yamagami, Noriaki Horii |
Abstract | The fifth Dialog State Tracking Challenge (DSTC5) introduces a new cross-language dialog state tracking scenario, where the participants are asked to build their trackers based on the English training corpus, while evaluating them on the unlabeled Chinese corpus. Although computer-generated translations for both the English and Chinese corpora are provided in the dataset, these translations contain errors, and careless use of them can easily hurt the performance of the built trackers. To address this problem, we propose a multichannel Convolutional Neural Network (CNN) architecture, in which we treat the English and Chinese languages as different input channels of one single CNN model. In the evaluation of DSTC5, we found that such a multichannel architecture can effectively improve the robustness against translation errors. Additionally, our method for DSTC5 is purely machine-learning based and requires no prior knowledge about the target language. We consider this a desirable property for building a tracker in the cross-language context, as not every developer will be familiar with both languages. |
Tasks | |
Published | 2017-01-23 |
URL | http://arxiv.org/abs/1701.06247v1 |
http://arxiv.org/pdf/1701.06247v1.pdf | |
PWC | https://paperswithcode.com/paper/a-multichannel-convolutional-neural-network |
Repo | |
Framework | |
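The multichannel idea, feeding the English and Chinese embeddings of the same utterance as two input channels of one CNN, can be sketched with a single valid-padding 1-D convolution over time (the shapes and the single shared filter are illustrative, not the paper's architecture):

```python
import numpy as np

def multichannel_conv(en_emb, zh_emb, filt):
    """1-D convolution over time with the English and Chinese word
    embeddings of one utterance as two input channels of a single
    feature map (valid padding, ReLU activation)."""
    x = np.stack([en_emb, zh_emb])            # (2 channels, T, D)
    C, T, D = x.shape
    k = filt.shape[1]                         # filt: (C, k, D)
    out = np.array([np.sum(x[:, t:t + k, :] * filt)
                    for t in range(T - k + 1)])
    return np.maximum(out, 0)                 # one feature map of length T-k+1

fm = multichannel_conv(np.ones((5, 3)), np.ones((5, 3)), np.ones((2, 2, 3)))
```

Because each filter sees both languages at once, a translation error in one channel can be compensated by the other, which is the robustness the abstract describes.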
Learning Sparse Visual Representations with Leaky Capped Norm Regularizers
Title | Learning Sparse Visual Representations with Leaky Capped Norm Regularizers |
Authors | Jianqiao Wangni, Dahua Lin |
Abstract | Sparsity-inducing regularization is an important part of learning over-complete visual representations. Despite the popularity of $\ell_1$ regularization, in this paper we investigate the use of non-convex regularizations in this problem. Our contribution consists of three parts. First, we propose the leaky capped norm regularization (LCNR), which allows model weights below a certain threshold to be regularized more strongly than those above, and therefore imposes strong sparsity while introducing only a controllable estimation bias. We propose a majorization-minimization algorithm to optimize the joint objective function. Second, our study of monocular 3D shape recovery and neural networks shows that LCNR outperforms $\ell_1$ and other non-convex regularizations, achieving state-of-the-art performance and faster convergence. Third, we prove a theoretical global convergence speed on the 3D recovery problem. To the best of our knowledge, this is the first convergence analysis of the 3D recovery problem. |
Tasks | |
Published | 2017-11-08 |
URL | http://arxiv.org/abs/1711.02857v1 |
http://arxiv.org/pdf/1711.02857v1.pdf | |
PWC | https://paperswithcode.com/paper/learning-sparse-visual-representations-with |
Repo | |
Framework | |
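One piecewise-linear penalty matching the description, a steep slope below the threshold and a leaky slope above it, continuous at the joint, might look like the sketch below; the constants and the exact functional form used in the paper may differ:

```python
import numpy as np

def leaky_capped_norm(w, theta=0.1, alpha=1.0, beta=0.1):
    """Illustrative leaky capped norm: slope alpha on |w| <= theta, leaky
    slope beta above it, continuous at theta. With alpha > beta, small
    weights are pushed toward zero harder than large ones, which induces
    sparsity while keeping the bias on large weights small."""
    a = np.abs(w)
    return float(np.sum(np.where(a <= theta, alpha * a,
                                 alpha * theta + beta * (a - theta))))

r = leaky_capped_norm(np.array([0.05, 0.5]))
```

Compared with a hard capped norm (beta = 0), the leaky slope keeps the penalty coercive, which is what makes a majorization-minimization treatment and a convergence analysis tractable.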
Long-Term Online Smoothing Prediction Using Expert Advice
Title | Long-Term Online Smoothing Prediction Using Expert Advice |
Authors | Alexander Korotin, Vladimir V’yugin, Evgeny Burnaev |
Abstract | For the prediction with experts’ advice setting, we construct forecasting algorithms that suffer loss not much more than any expert in the pool. In contrast to the standard approach, we investigate the case of long-term forecasting of time series and consider two scenarios. In the first one, at each step $t$ the learner has to combine the point forecasts of the experts issued for the time interval $[t+1, t+d]$ ahead. Our approach implies that at each time step experts issue point forecasts for arbitrarily many steps ahead, and then the learner (algorithm) combines these forecasts and the forecasts made earlier into one vector forecast for steps $[t+1,t+d]$. By combining past and current long-term forecasts we obtain a smoothing mechanism that protects our algorithm from temporary trend changes, noise, and outliers. In the second scenario, at each step $t$ experts issue a prediction function, and the learner has to combine these functions into a single one, which will be used for long-term time-series prediction. For each scenario, we develop an algorithm for combining the experts’ forecasts and prove an $O(\ln T)$ adversarial regret upper bound for both algorithms. |
Tasks | Time Series, Time Series Prediction |
Published | 2017-11-08 |
URL | http://arxiv.org/abs/1711.03194v3 |
http://arxiv.org/pdf/1711.03194v3.pdf | |
PWC | https://paperswithcode.com/paper/long-term-online-smoothing-prediction-using |
Repo | |
Framework | |
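The general expert-aggregation machinery behind such algorithms can be sketched as an exponentially weighted average forecaster over $d$-step-ahead forecasts; the paper's smoothing of past and current long-term forecasts adds structure beyond this minimal sketch:

```python
import numpy as np

def aggregate(forecasts, cum_losses, eta=1.0):
    """Exponentially weighted average of the experts' d-step-ahead
    forecasts: experts with smaller cumulative past loss get larger
    weight (a standard aggregation rule, not the paper's exact one)."""
    w = np.exp(-eta * (cum_losses - cum_losses.min()))  # shift for stability
    w /= w.sum()
    return w @ forecasts          # forecasts: (n_experts, d) -> (d,)

forecasts = np.array([[1.0, 2.0],    # expert 0's forecast for steps t+1, t+2
                      [3.0, 4.0]])   # expert 1's forecast
combined = aggregate(forecasts, cum_losses=np.zeros(2))
```

With equal past losses the rule reduces to a simple average; as one expert's cumulative loss grows, its influence on the combined vector forecast decays exponentially, which is what yields logarithmic regret bounds for exp-concave losses.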