Paper Group ANR 887
Generalization Challenges for Neural Architectures in Audio Source Separation. Actor-Centric Relation Network. Improving Fingerprint Pore Detection with a Small FCN. Learning Actionable Representations from Visual Observations. Distributed optimization in wireless sensor networks: an island-model framework. End-to-end Language Identification using …
Generalization Challenges for Neural Architectures in Audio Source Separation
Title | Generalization Challenges for Neural Architectures in Audio Source Separation |
Authors | Shariq Mobin, Brian Cheung, Bruno Olshausen |
Abstract | Recent work has shown that recurrent neural networks can be trained to separate individual speakers in a sound mixture with high fidelity. Here we explore convolutional neural network models as an alternative and show that they achieve state-of-the-art results with an order of magnitude fewer parameters. We also characterize and compare the robustness and ability of these different approaches to generalize under three different test conditions: longer time sequences, the addition of intermittent noise, and different datasets not seen during training. For the last condition, we create a new dataset, RealTalkLibri, to test source separation in real-world environments. We show that the acoustics of the environment have a significant impact on the structure of the waveform and the overall performance of neural network models, with the convolutional model showing a superior ability to generalize to new environments. The code for our study is available at https://github.com/ShariqM/source_separation. |
Tasks | |
Published | 2018-03-23 |
URL | http://arxiv.org/abs/1803.08629v2 |
http://arxiv.org/pdf/1803.08629v2.pdf | |
PWC | https://paperswithcode.com/paper/generalization-challenges-for-neural |
Repo | |
Framework | |
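The abstract does not spell out the separation mechanism, but speaker-separation systems in this family typically predict per-speaker time-frequency masks over the mixture spectrogram. Below is a minimal numpy sketch of that masking pipeline, with a random-weight stand-in (`toy_network`) for the paper's convolutional model; the stand-in and all shapes are illustrative assumptions, and the real architecture lives in the linked repo.

```python
import numpy as np

rng = np.random.default_rng(0)

def toy_network(mix_mag, n_speakers=2):
    """Stand-in for the conv net: maps a (freq, time) magnitude
    spectrogram to per-speaker mask logits. Random weights, purely
    illustrative."""
    n_freq, _ = mix_mag.shape
    W = rng.normal(size=(n_speakers, n_freq, n_freq)) * 0.01
    return np.einsum('sfg,gt->sft', W, mix_mag)

def separate(mix_mag):
    logits = toy_network(mix_mag)
    # Softmax across speakers: masks sum to 1 in every TF bin.
    e = np.exp(logits - logits.max(axis=0, keepdims=True))
    masks = e / e.sum(axis=0, keepdims=True)
    return masks * mix_mag  # per-speaker magnitude estimates

mix = np.abs(rng.normal(size=(129, 100)))  # |STFT| of a 2-speaker mixture
print(separate(mix).shape)                 # (2, 129, 100)
```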
Actor-Centric Relation Network
Title | Actor-Centric Relation Network |
Authors | Chen Sun, Abhinav Shrivastava, Carl Vondrick, Kevin Murphy, Rahul Sukthankar, Cordelia Schmid |
Abstract | Current state-of-the-art approaches for spatio-temporal action localization rely on detections at the frame level and model temporal context with 3D ConvNets. Here, we go one step further and model spatio-temporal relations to capture the interactions between human actors, relevant objects and scene elements essential to differentiate similar human actions. Our approach is weakly supervised and mines the relevant elements automatically with an actor-centric relational network (ACRN). ACRN computes and accumulates pair-wise relation information from actor and global scene features, and generates relation features for action classification. It is implemented as a neural network and can be trained jointly with an existing action detection system. We show that ACRN outperforms alternative approaches which capture relation information, and that the proposed framework improves upon the state-of-the-art performance on JHMDB and AVA. A visualization of the learned relation features confirms that our approach is able to attend to the relevant relations for each action. |
Tasks | Action Classification, Action Detection, Action Localization, Action Recognition In Videos, Spatio-Temporal Action Localization, Temporal Action Localization |
Published | 2018-07-28 |
URL | http://arxiv.org/abs/1807.10982v1 |
http://arxiv.org/pdf/1807.10982v1.pdf | |
PWC | https://paperswithcode.com/paper/actor-centric-relation-network |
Repo | |
Framework | |
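A minimal numpy sketch of the pair-then-accumulate pattern the ACRN abstract describes: pair the actor feature with every location of the global scene feature map, embed each pair with a shared layer, and pool into one relation feature for classification. The single ReLU layer and max-pooling are simplifying assumptions, not the paper's exact design.

```python
import numpy as np

rng = np.random.default_rng(0)

def relation_features(actor, scene, W, b):
    """Pair the actor descriptor with every scene location, embed each
    pair with a shared layer, then pool over locations."""
    h, w, _ = scene.shape
    pairs = np.concatenate(
        [np.broadcast_to(actor, (h, w, actor.size)), scene], axis=-1)
    relations = np.maximum(pairs @ W + b, 0.0)  # shared one-layer MLP
    return relations.max(axis=(0, 1))           # accumulate over the scene

actor = rng.normal(size=(64,))            # pooled actor feature
scene = rng.normal(size=(7, 7, 64))       # global scene feature map
W = rng.normal(size=(128, 256)) * 0.05    # 128 = actor dim + scene dim
b = np.zeros(256)
print(relation_features(actor, scene, W, b).shape)  # (256,)
```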
Improving Fingerprint Pore Detection with a Small FCN
Title | Improving Fingerprint Pore Detection with a Small FCN |
Authors | Gabriel Dahia, Maurício Pamplona Segundo |
Abstract | In this work, we investigate whether previously proposed CNNs for fingerprint pore detection overestimate the number of model parameters required for this task. We show that this is indeed the case by proposing a fully convolutional neural network that has significantly fewer parameters. We evaluate this model using a rigorous and reproducible protocol, which was, prior to our work, not available to the community. Using our protocol, we show that the proposed model, when combined with post-processing, performs better than previous methods while being much more efficient. All our code is available at https://github.com/gdahia/fingerprint-pore-detection |
Tasks | |
Published | 2018-11-14 |
URL | http://arxiv.org/abs/1811.06846v1 |
http://arxiv.org/pdf/1811.06846v1.pdf | |
PWC | https://paperswithcode.com/paper/improving-fingerprint-pore-detection-with-a |
Repo | |
Framework | |
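The abstract credits post-processing for part of the gain without detailing it here; a common scheme for pore probability maps is thresholding plus local-maximum suppression. The sketch below assumes that scheme; the `detect_pores` helper and its parameter values are hypothetical, and the authors' actual pipeline is in the linked repo.

```python
import numpy as np
from scipy.ndimage import maximum_filter

def detect_pores(prob_map, threshold=0.5, window=5):
    """Keep pixels that are local maxima of the FCN's pore probability
    map and exceed a confidence threshold; return their coordinates."""
    is_peak = prob_map == maximum_filter(prob_map, size=window)
    ys, xs = np.nonzero(is_peak & (prob_map > threshold))
    return list(zip(ys.tolist(), xs.tolist()))

rng = np.random.default_rng(0)
prob = rng.random((64, 64))       # stand-in for the FCN's output map
print(len(detect_pores(prob)), "candidate pores")
```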
Learning Actionable Representations from Visual Observations
Title | Learning Actionable Representations from Visual Observations |
Authors | Debidatta Dwibedi, Jonathan Tompson, Corey Lynch, Pierre Sermanet |
Abstract | In this work we explore a new approach for robots to teach themselves about the world simply by observing it. In particular we investigate the effectiveness of learning task-agnostic representations for continuous control tasks. We extend Time-Contrastive Networks (TCN), which learn from visual observations, by embedding multiple frames jointly in the embedding space as opposed to a single frame. We show that by doing so we are able to encode both position and velocity attributes significantly more accurately. We test the usefulness of this self-supervised approach in a reinforcement learning setting. We show that the representations learned by agents observing themselves taking random actions, or observing other agents performing tasks successfully, can enable the learning of continuous control policies with algorithms like Proximal Policy Optimization (PPO), using only the learned embeddings as input. We also demonstrate significant improvements on the real-world Pouring dataset, with a relative error reduction of 39.4% for motion attributes and 11.1% for static attributes compared to the single-frame baseline. Video results are available at https://sites.google.com/view/actionablerepresentations . |
Tasks | Continuous Control |
Published | 2018-08-02 |
URL | http://arxiv.org/abs/1808.00928v3 |
http://arxiv.org/pdf/1808.00928v3.pdf | |
PWC | https://paperswithcode.com/paper/learning-actionable-representations-from |
Repo | |
Framework | |
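The key mechanism is embedding a short window of frames jointly, so that velocity becomes recoverable, and training with a time-contrastive (triplet-style) objective. A toy numpy sketch under those assumptions, with a linear stand-in for the encoder:

```python
import numpy as np

rng = np.random.default_rng(0)

def embed(frames, W):
    """Multi-frame embedder: flatten a window of frame features and
    project it, so frame-to-frame change (velocity) is visible to the
    embedding, unlike a single-frame encoder."""
    return frames.reshape(-1) @ W

def triplet_loss(anchor, positive, negative, margin=0.2):
    d_pos = np.sum((anchor - positive) ** 2)
    d_neg = np.sum((anchor - negative) ** 2)
    return max(0.0, d_pos - d_neg + margin)

n_frames, feat_dim, emb_dim = 3, 32, 16
W = rng.normal(size=(n_frames * feat_dim, emb_dim)) * 0.1
view1 = rng.normal(size=(n_frames, feat_dim))        # moment t, camera 1
view2 = view1 + 0.01 * rng.normal(size=view1.shape)  # moment t, camera 2
other = rng.normal(size=(n_frames, feat_dim))        # a different moment
print(triplet_loss(embed(view1, W), embed(view2, W), embed(other, W)))
```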
Distributed optimization in wireless sensor networks: an island-model framework
Title | Distributed optimization in wireless sensor networks: an island-model framework |
Authors | Giovanni Iacca |
Abstract | Wireless Sensor Networks (WSNs) are an emerging technology in several application domains, ranging from urban surveillance to environmental and structural monitoring. Computational Intelligence (CI) techniques are particularly suitable for enhancing these systems. However, when embedding CI into wireless sensors, severe hardware limitations must be taken into account. In this paper we investigate the possibility of performing an online, distributed optimization process within a WSN. Such a system might be used, for example, to implement advanced network features like distributed modelling, self-optimizing protocols, and anomaly detection, to name a few. The proposed approach, called DOWSN (Distributed Optimization for WSN), is an island-model infrastructure in which each node executes a simple, computationally cheap (both in terms of CPU and memory) optimization algorithm and shares promising solutions with its neighbors. We perform extensive tests of different DOWSN configurations on a benchmark made up of continuous optimization problems; we analyze the influence of the network parameters (number of nodes, inter-node communication period and probability of accepting incoming solutions) on the optimization performance. Finally, we profile energy and memory consumption of DOWSN to show the efficient usage of the limited hardware resources available on the sensor nodes. |
Tasks | Anomaly Detection, Distributed Optimization |
Published | 2018-10-05 |
URL | http://arxiv.org/abs/1810.02679v1 |
http://arxiv.org/pdf/1810.02679v1.pdf | |
PWC | https://paperswithcode.com/paper/distributed-optimization-in-wireless-sensor |
Repo | |
Framework | |
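The island-model loop is concrete enough to simulate: each node runs a cheap local search and periodically pushes its incumbent solution to a neighbor, which accepts it with some probability. In the self-contained sketch below, the ring topology, (1+1)-style mutation, test function and all parameter values are our illustrative choices, not DOWSN's.

```python
import numpy as np

rng = np.random.default_rng(0)

def sphere(x):
    """Toy continuous objective; minimum 0 at the origin."""
    return float(np.sum(x ** 2))

n_nodes, dim, iters = 8, 5, 200
period, p_accept = 10, 0.5          # communication period, acceptance prob
pop = rng.normal(size=(n_nodes, dim))

for t in range(iters):
    # Each node: one cheap (1+1)-style local search step.
    for i in range(n_nodes):
        cand = pop[i] + 0.1 * rng.normal(size=dim)
        if sphere(cand) < sphere(pop[i]):
            pop[i] = cand
    # Every `period` iterations: share incumbents along a ring.
    if t % period == 0:
        for i in range(n_nodes):
            j = (i + 1) % n_nodes
            if rng.random() < p_accept and sphere(pop[i]) < sphere(pop[j]):
                pop[j] = pop[i].copy()

print(min(sphere(x) for x in pop))
```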
End-to-end Language Identification using NetFV and NetVLAD
Title | End-to-end Language Identification using NetFV and NetVLAD |
Authors | Jinkun Chen, Weicheng Cai, Danwei Cai, Zexin Cai, Haibin Zhong, Ming Li |
Abstract | In this paper, we apply the NetFV and NetVLAD layers to the end-to-end language identification task. NetFV and NetVLAD layers are differentiable implementations of the standard Fisher Vector and Vector of Locally Aggregated Descriptors (VLAD) methods, respectively. Both can encode a sequence of feature vectors into a fixed-dimensional vector, which is essential for processing variable-length utterances. We first discuss the connections and differences between the classical i-vector and the aforementioned encoding schemes. Then, we construct a flexible end-to-end framework, comprising a convolutional neural network (CNN) architecture and an encoding layer (NetFV or NetVLAD), for the language identification task. Experimental results on the NIST LRE 2007 closed-set task show that the proposed system achieves significant EER reductions over both the conventional i-vector baseline and the CNN temporal average pooling system. |
Tasks | Language Identification |
Published | 2018-09-09 |
URL | http://arxiv.org/abs/1809.02906v1 |
http://arxiv.org/pdf/1809.02906v1.pdf | |
PWC | https://paperswithcode.com/paper/end-to-end-language-identification-using |
Repo | |
Framework | |
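A numpy sketch of the NetVLAD-style encoding step that makes variable-length utterances tractable: soft-assign each frame vector to K clusters, accumulate residuals, and normalize into a fixed K x D descriptor. The distance-based soft assignment below matches NetVLAD's standard initialization; in the trained layer the assignment weights are learned, and all shapes here are illustrative.

```python
import numpy as np

def netvlad(X, centers, alpha=10.0):
    """Encode a (T, D) sequence into a fixed (K*D,) vector, whatever T is:
    soft-assign frames to clusters, accumulate residuals, normalize."""
    sq = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)   # (T, K)
    a = np.exp(-alpha * sq)
    a /= a.sum(axis=1, keepdims=True)                 # soft assignment
    resid = X[:, None, :] - centers[None, :, :]       # (T, K, D)
    V = (a[:, :, None] * resid).sum(axis=0)           # (K, D)
    V /= np.linalg.norm(V, axis=1, keepdims=True) + 1e-12  # intra-normalize
    v = V.reshape(-1)
    return v / (np.linalg.norm(v) + 1e-12)

rng = np.random.default_rng(0)
utt = rng.normal(size=(137, 64))     # a variable-length utterance
C = rng.normal(size=(8, 64))         # K=8 cluster centers
print(netvlad(utt, C).shape)         # (512,), independent of length 137
```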
Spatial-Temporal Synergic Residual Learning for Video Person Re-Identification
Title | Spatial-Temporal Synergic Residual Learning for Video Person Re-Identification |
Authors | Xinxing Su, Yingtian Zou, Yu Cheng, Shuangjie Xu, Mo Yu, Pan Zhou |
Abstract | We tackle the problem of person re-identification in the video setting, which has been viewed as a crucial task in many applications. The task is challenging because it requires learning effective representations from video sequences with heterogeneous spatial-temporal information. We present a novel method, the Spatial-Temporal Synergic Residual Network (STSRN), for this problem. STSRN contains a spatial residual extractor, a temporal residual processor and a spatial-temporal smooth module. The smoother alleviates sample noise along the spatial-temporal dimensions, enabling STSRN to extract more robust spatial-temporal features from consecutive frames. Extensive experiments are conducted on several challenging datasets including iLIDS-VID, PRID2011 and MARS. The results demonstrate that the proposed method achieves consistently superior performance over most state-of-the-art methods. |
Tasks | Person Re-Identification, Video-Based Person Re-Identification |
Published | 2018-07-16 |
URL | http://arxiv.org/abs/1807.05799v1 |
http://arxiv.org/pdf/1807.05799v1.pdf | |
PWC | https://paperswithcode.com/paper/spatial-temporal-synergic-residual-learning |
Repo | |
Framework | |
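The abstract names the smooth module without specifying it; as one plausible reading, the sketch below damps per-frame feature noise with a simple exponential moving average along time before pooling into a sequence descriptor. This is purely illustrative: the paper's smoother is a learned module, not a fixed filter.

```python
import numpy as np

def temporal_smooth(frame_feats, beta=0.8):
    """Exponential moving average along the time axis, damping per-frame
    noise before the tracklet is pooled into one feature."""
    out = np.empty_like(frame_feats)
    out[0] = frame_feats[0]
    for t in range(1, len(frame_feats)):
        out[t] = beta * out[t - 1] + (1 - beta) * frame_feats[t]
    return out

rng = np.random.default_rng(0)
frames = rng.normal(size=(16, 128))   # 16 frame embeddings of one tracklet
video_feat = temporal_smooth(frames).mean(axis=0)
print(video_feat.shape)               # (128,)
```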
Fast OBDD Reordering using Neural Message Passing on Hypergraph
Title | Fast OBDD Reordering using Neural Message Passing on Hypergraph |
Authors | Feifan Xu, Fei He, Enze Xie, Liang Li |
Abstract | Ordered binary decision diagrams (OBDDs) are an efficient data structure for representing and manipulating Boolean formulas. Depending on the variable order, an OBDD's size may vary from linear to exponential in the number of Boolean variables. Finding the optimal variable order has been proven NP-complete, and many heuristics have been proposed to find near-optimal solutions. In this paper, we propose a neural network-based method to predict near-optimal variable orders for unseen formulas. Viewing these formulas as hypergraphs, and lifting the message passing neural network to 3-hypergraphs (MPNN3), we are able to learn the patterns of Boolean formulas. Compared to traditional methods, our method finds a near-optimal solution in far less time, even on some hard examples. To the best of our knowledge, this is the first work applying neural networks to OBDD reordering. |
Tasks | |
Published | 2018-11-06 |
URL | http://arxiv.org/abs/1811.02178v1 |
http://arxiv.org/pdf/1811.02178v1.pdf | |
PWC | https://paperswithcode.com/paper/fast-obdd-reordering-using-neural-message |
Repo | |
Framework | |
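A hedged sketch of one message-passing round on a 3-uniform hypergraph, the structure MPNN3 lifts to: each hyperedge (u, v, w) sends every member a message computed from the other two members' states. The sum aggregation, tanh update and weight shapes are illustrative assumptions, not the paper's learned architecture.

```python
import numpy as np

def mpnn3_round(h, hyperedges, W_msg, W_upd):
    """One message-passing round: for each 3-hyperedge, every member
    receives a message built from the other two members' states; node
    states are then updated from the aggregated messages."""
    msgs = np.zeros_like(h)
    for (u, v, w) in hyperedges:
        for tgt, (a, b) in ((u, (v, w)), (v, (u, w)), (w, (u, v))):
            msgs[tgt] += np.concatenate([h[a], h[b]]) @ W_msg
    return np.tanh(np.concatenate([h, msgs], axis=1) @ W_upd)

rng = np.random.default_rng(0)
n, d = 6, 8
h = rng.normal(size=(n, d))                 # one state vector per variable
edges = [(0, 1, 2), (1, 3, 4), (2, 4, 5)]   # 3-hyperedges of the formula
W_msg = rng.normal(size=(2 * d, d)) * 0.1
W_upd = rng.normal(size=(2 * d, d)) * 0.1
print(mpnn3_round(h, edges, W_msg, W_upd).shape)   # (6, 8)
```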
UCNN: Exploiting Computational Reuse in Deep Neural Networks via Weight Repetition
Title | UCNN: Exploiting Computational Reuse in Deep Neural Networks via Weight Repetition |
Authors | Kartik Hegde, Jiyong Yu, Rohit Agrawal, Mengjia Yan, Michael Pellauer, Christopher W. Fletcher |
Abstract | Convolutional Neural Networks (CNNs) have begun to permeate all corners of electronic society (from voice recognition to scene generation) due to their high accuracy and machine efficiency per operation. At their core, CNN computations are made up of multi-dimensional dot products between weight and input vectors. This paper studies how weight repetition (when the same weight occurs multiple times in or across weight vectors) can be exploited to save energy and improve performance during CNN inference. This generalizes a popular line of work on improving efficiency from CNN weight sparsity, as reducing computation due to repeated zero weights is a special case of reducing computation due to repeated weights. To exploit weight repetition, this paper proposes a new CNN accelerator called the Unique Weight CNN Accelerator (UCNN). UCNN uses weight repetition to reuse CNN sub-computations (e.g., dot products) and to reduce CNN model size when stored in off-chip DRAM, both of which save energy. UCNN further improves performance by exploiting sparsity in weights. We evaluate UCNN with an accelerator-level cycle and energy model and with an RTL implementation of the UCNN processing element. On three contemporary CNNs, UCNN improves throughput-normalized energy consumption by 1.2x-4x, relative to a similarly provisioned baseline accelerator that uses Eyeriss-style sparsity optimizations. At the same time, the UCNN processing element adds only 17-24% area overhead relative to the same baseline. |
Tasks | Scene Generation |
Published | 2018-04-18 |
URL | http://arxiv.org/abs/1804.06508v1 |
http://arxiv.org/pdf/1804.06508v1.pdf | |
PWC | https://paperswithcode.com/paper/ucnn-exploiting-computational-reuse-in-deep |
Repo | |
Framework | |
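The arithmetic identity behind UCNN is easy to demonstrate: when weights repeat, a dot product can be factorized by summing the inputs that share each unique weight and multiplying once per group, and zero weights drop out entirely, which is why sparsity is the special case. A numpy sketch of the factorization (the hardware dataflow is of course far more involved):

```python
import numpy as np

def dot_by_weight_groups(weights, inputs):
    """Dot product via weight repetition: one input-sum and one multiply
    per *unique* weight value, skipping zeros entirely."""
    total = 0.0
    for w in np.unique(weights):
        if w == 0.0:
            continue                          # sparsity falls out for free
        total += w * inputs[weights == w].sum()
    return total

rng = np.random.default_rng(0)
w = rng.choice([0.0, 0.25, -0.5, 1.0], size=64)   # heavily repeated weights
x = rng.normal(size=64)
assert np.isclose(dot_by_weight_groups(w, x), np.dot(w, x))
print(dot_by_weight_groups(w, x))   # 3 multiplies instead of up to 64
```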
VISER: Visual Self-Regularization
Title | VISER: Visual Self-Regularization |
Authors | Hamid Izadinia, Pierre Garrigues |
Abstract | In this work, we propose the use of a large set of unlabeled images as a source of regularization data for learning robust visual representations. Given a visual model trained on a labeled dataset in a supervised fashion, we augment our training samples by incorporating a large number of unlabeled images and train a semi-supervised model. We demonstrate that our proposed learning approach leverages an abundance of unlabeled images and boosts visual recognition performance, alleviating the need to rely on large labeled datasets for learning robust representations. To increase the number of image instances available for learning robust visual models, each labeled image propagates its label to its nearest unlabeled image instances. These retrieved unlabeled images serve as local perturbations of each labeled image to perform Visual Self-Regularization (VISER). To retrieve such visual self-regularizers, we compute the cosine similarity in a semantic space defined by the penultimate layer of a fully convolutional neural network. We use the publicly available Yahoo Flickr Creative Commons 100M dataset as the source of our unlabeled image set and propose a distributed approximate nearest neighbor algorithm to make retrieval practical at that scale. Using the labeled instances and their regularizer samples, we show that we significantly improve object categorization and localization performance on the MS COCO and Visual Genome datasets, where objects appear in context. |
Tasks | |
Published | 2018-02-07 |
URL | http://arxiv.org/abs/1802.02568v1 |
http://arxiv.org/pdf/1802.02568v1.pdf | |
PWC | https://paperswithcode.com/paper/viser-visual-self-regularization |
Repo | |
Framework | |
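The retrieval step is concrete enough to sketch: each labeled image propagates its label to its nearest unlabeled images under cosine similarity in the penultimate-layer feature space. The brute-force numpy version below (with a hypothetical k-per-image choice) stands in for the paper's distributed approximate nearest-neighbor search, which is what makes this feasible at the 100M-image scale.

```python
import numpy as np

def propagate_labels(labeled_feats, labels, unlabeled_feats, k=3):
    """Each labeled image passes its label to its k most cosine-similar
    unlabeled images; returns {unlabeled index: propagated label}."""
    def l2norm(X):
        return X / (np.linalg.norm(X, axis=1, keepdims=True) + 1e-12)
    sims = l2norm(labeled_feats) @ l2norm(unlabeled_feats).T   # (n_l, n_u)
    pseudo = {}
    for i, row in enumerate(sims):
        for j in np.argsort(-row)[:k]:
            pseudo.setdefault(int(j), labels[i])   # first writer wins
    return pseudo

rng = np.random.default_rng(0)
L = rng.normal(size=(10, 32))        # penultimate-layer features, labeled
U = rng.normal(size=(100, 32))       # unlabeled pool
print(len(propagate_labels(L, list(range(10)), U)))
```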
Deep multi-task learning for a geographically-regularized semantic segmentation of aerial images
Title | Deep multi-task learning for a geographically-regularized semantic segmentation of aerial images |
Authors | Michele Volpi, Devis Tuia |
Abstract | When approaching the semantic segmentation of overhead imagery in the decimeter spatial resolution range, successful strategies usually combine powerful methods for learning the visual appearance of the semantic classes (e.g. convolutional neural networks) with strategies for spatial regularization (e.g. graphical models such as conditional random fields). In this paper, we propose a method to learn evidence in the form of semantic class likelihoods, semantic boundaries across classes and shallow-to-deep visual features, each modeled by a multi-task convolutional neural network architecture. We combine this bottom-up information with top-down spatial regularization encoded by a conditional random field model optimizing the label space across a hierarchy of segments, with constraints related to structural, spatial and data-dependent pairwise relationships between regions. Our results show that this strategy provides better regularization than a series of strong baselines reflecting state-of-the-art technologies. The proposed strategy offers a flexible and principled framework for including several sources of visual and structural information, while allowing for different degrees of spatial regularization accounting for priors about the expected output structures. |
Tasks | Multi-Task Learning, Semantic Segmentation |
Published | 2018-08-23 |
URL | http://arxiv.org/abs/1808.07675v1 |
http://arxiv.org/pdf/1808.07675v1.pdf | |
PWC | https://paperswithcode.com/paper/deep-multi-task-learning-for-a-geographically |
Repo | |
Framework | |
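A hedged sketch of the kind of energy such a CRF minimizes: per-segment class-likelihood costs plus a Potts-style pairwise term that is discounted where the network predicts a semantic boundary between two segments. The paper's model is hierarchical and uses richer structural and data-dependent pairwise terms; this flat version only illustrates how the multi-task outputs (class likelihoods and boundaries) can be combined.

```python
import numpy as np

def crf_energy(unary, pairs, boundary, labels, lam=1.0):
    """Energy of a labeling: unary -log p(class|segment) costs plus a
    pairwise cost for differing labels, cheaper across predicted
    boundaries."""
    e = sum(unary[s, labels[s]] for s in range(len(labels)))
    for s, t in pairs:
        if labels[s] != labels[t]:
            e += lam * (1.0 - boundary[s, t])
    return e

rng = np.random.default_rng(0)
n_seg, n_cls = 5, 3
unary = -np.log(rng.dirichlet(np.ones(n_cls), size=n_seg))  # CNN likelihoods
pairs = [(0, 1), (1, 2), (2, 3), (3, 4)]     # adjacent segments
boundary = rng.random((n_seg, n_seg))        # predicted boundary strengths
print(crf_energy(unary, pairs, boundary, labels=[0, 0, 1, 1, 2]))
```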
Predictive Local Smoothness for Stochastic Gradient Methods
Title | Predictive Local Smoothness for Stochastic Gradient Methods |
Authors | Jun Li, Hongfu Liu, Bineng Zhong, Yue Wu, Yun Fu |
Abstract | Stochastic gradient methods dominate nonconvex optimization, especially for deep models, but converge slowly in the asymptotic regime because they assume a fixed smoothness. To address this problem, we propose a simple yet effective method for improving stochastic gradient methods, named predictive local smoothness (PLS). First, we derive a convergence condition that yields a learning rate which varies adaptively with the local smoothness. Second, the local smoothness can be predicted from the latest gradients. Third, we use this adaptive learning rate to update the stochastic gradients, targeting linear convergence rates. By applying the PLS method, we implement new variants of three popular algorithms: PLS-stochastic gradient descent (PLS-SGD), PLS-accelerated SGD (PLS-AccSGD), and PLS-AMSGrad. Moreover, we provide much simpler proofs to ensure their linear convergence. Empirical results show that the variants outperform the original algorithms, with faster convergence and less gradient explosion and vanishing. |
Tasks | |
Published | 2018-05-23 |
URL | http://arxiv.org/abs/1805.09386v1 |
http://arxiv.org/pdf/1805.09386v1.pdf | |
PWC | https://paperswithcode.com/paper/predictive-local-smoothness-for-stochastic |
Repo | |
Framework | |
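The recipe (estimate the local smoothness from the latest gradients, then set the learning rate from it) admits a compact sketch. We read "predicted from the latest gradients" as the finite-difference estimate L_t = ||g_t - g_{t-1}|| / ||x_t - x_{t-1}|| with step size 1/L_t; the paper's exact update rule and constants may differ.

```python
import numpy as np

def pls_sgd(grad_fn, x0, steps=100, eps=1e-8):
    """PLS-flavored gradient descent: finite-difference estimate of the
    local smoothness from the two latest gradients/iterates, then step
    with learning rate ~ 1/L. A reading of the abstract, not the paper's
    exact algorithm."""
    x_prev, g_prev = x0, grad_fn(x0)
    x = x_prev - 0.1 * g_prev                 # small bootstrap step
    for _ in range(steps):
        g = grad_fn(x)
        L = np.linalg.norm(g - g_prev) / (np.linalg.norm(x - x_prev) + eps)
        x_prev, g_prev = x, g
        x = x - g / (L + eps)                 # adaptive learning rate 1/L
    return x

grad = lambda x: 2.0 * x                      # gradient of f(x) = ||x||^2
print(pls_sgd(grad, np.ones(3)))              # converges to ~0
```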
Detecting GAN-generated Imagery using Color Cues
Title | Detecting GAN-generated Imagery using Color Cues |
Authors | Scott McCloskey, Michael Albright |
Abstract | Image forensics is an increasingly relevant problem, as it can potentially address online disinformation campaigns and mitigate problematic aspects of social media. Of particular interest, given its recent successes, is the detection of imagery produced by Generative Adversarial Networks (GANs), e.g. ‘deepfakes’. Leveraging large training sets and extensive computing resources, recent work has shown that GANs can be trained to generate synthetic imagery which is (in some ways) indistinguishable from real imagery. We analyze the structure of the generating network of a popular GAN implementation, and show that the network’s treatment of color is markedly different from a real camera’s in two ways. We further show that these two cues can be used to distinguish GAN-generated imagery from camera imagery, demonstrating effective discrimination between GAN imagery and the real camera images used to train the GAN. |
Tasks | |
Published | 2018-12-19 |
URL | http://arxiv.org/abs/1812.08247v1 |
http://arxiv.org/pdf/1812.08247v1.pdf | |
PWC | https://paperswithcode.com/paper/detecting-gan-generated-imagery-using-color |
Repo | |
Framework | |
A Unified Framework of Deep Neural Networks by Capsules
Title | A Unified Framework of Deep Neural Networks by Capsules |
Authors | Yujian Li, Chuanhui Shan |
Abstract | With the growth of deep learning, how to describe deep neural networks in a unified manner is becoming an important issue. We first formalize neural networks mathematically with their directed graph representations, and prove a generation theorem about the induced networks of connected directed acyclic graphs. Then, we set up a unified framework for deep learning with capsule networks. This capsule framework can simplify the description of existing deep neural networks and provide a theoretical basis for graphical design and programming techniques for deep learning models, and would thus be of great significance to the advancement of deep learning. |
Tasks | |
Published | 2018-05-09 |
URL | http://arxiv.org/abs/1805.03551v2 |
http://arxiv.org/pdf/1805.03551v2.pdf | |
PWC | https://paperswithcode.com/paper/a-unified-framework-of-deep-neural-networks |
Repo | |
Framework | |
Dynamic Programming Approach to Template-based OCR
Title | Dynamic Programming Approach to Template-based OCR |
Authors | M. A. Povolotskiy, D. V. Tropin |
Abstract | In this paper we propose a dynamic programming solution to the template-based recognition task in the OCR setting. We formulate a problem of optimal position search for complex objects consisting of parts that form a sequence. We limit the distance between every two adjacent elements by predefined upper and lower thresholds, and choose the sum of penalties for each part in its given position as the function to be minimized. We show that this choice of restrictions allows a faster algorithm to be used than the one for the general form of deformation penalties. We name this algorithm Dynamic Squeezeboxes Packing (DSP) and apply it to two OCR problems: extracting text fields from an image of a document's Visual Inspection Zone (VIZ), and license plate segmentation. The quality and performance of the resulting solutions were experimentally shown to meet the requirements of state-of-the-art industrial recognition systems. |
Tasks | Optical Character Recognition |
Published | 2018-12-19 |
URL | http://arxiv.org/abs/1812.07933v1 |
http://arxiv.org/pdf/1812.07933v1.pdf | |
PWC | https://paperswithcode.com/paper/dynamic-programming-approach-to-template |
Repo | |
Framework | |
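The optimization problem is stated precisely enough to code: choose one position per part, left to right, with every adjacent gap bounded in [lo, hi], minimizing the sum of per-part penalties. The straightforward dynamic program below solves exactly that; the paper's DSP algorithm exploits this interval structure to run faster than the DP for general deformation penalties, and the toy penalty matrix is of course illustrative.

```python
import numpy as np

def dsp_baseline(penalties, lo, hi):
    """Pick one position per part so each adjacent gap lies in [lo, hi],
    minimizing summed penalties. Plain DP with backtracking."""
    n_parts, n_pos = penalties.shape
    cost = penalties[0].astype(float).copy()
    back = np.zeros((n_parts, n_pos), dtype=int)
    for i in range(1, n_parts):
        new = np.full(n_pos, np.inf)
        for p in range(n_pos):
            # predecessors q with lo <= p - q <= hi
            for q in range(max(0, p - hi), max(0, p - lo + 1)):
                c = cost[q] + penalties[i, p]
                if c < new[p]:
                    new[p], back[i, p] = c, q
        cost = new
    best = float(cost.min())
    p = int(np.argmin(cost))
    path = [p]
    for i in range(n_parts - 1, 0, -1):   # backtrack through the parts
        p = int(back[i, p])
        path.append(p)
    return path[::-1], best

rng = np.random.default_rng(0)
pen = rng.random((4, 20))     # 4 characters, 20 candidate x-positions
print(dsp_baseline(pen, lo=3, hi=6))
```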