October 17, 2019

Paper Group ANR 887

Generalization Challenges for Neural Architectures in Audio Source Separation. Actor-Centric Relation Network. Improving Fingerprint Pore Detection with a Small FCN. Learning Actionable Representations from Visual Observations. Distributed optimization in wireless sensor networks: an island-model framework. End-to-end Language Identification using …

Generalization Challenges for Neural Architectures in Audio Source Separation


Title	Generalization Challenges for Neural Architectures in Audio Source Separation
Authors	Shariq Mobin, Brian Cheung, Bruno Olshausen
Abstract	Recent work has shown that recurrent neural networks can be trained to separate individual speakers in a sound mixture with high fidelity. Here we explore convolutional neural network models as an alternative and show that they achieve state-of-the-art results with an order of magnitude fewer parameters. We also characterize and compare the robustness and ability of these different approaches to generalize under three different test conditions: longer time sequences, the addition of intermittent noise, and different datasets not seen during training. For the last condition, we create a new dataset, RealTalkLibri, to test source separation in real-world environments. We show that the acoustics of the environment have significant impact on the structure of the waveform and the overall performance of neural network models, with the convolutional model showing superior ability to generalize to new environments. The code for our study is available at https://github.com/ShariqM/source_separation.
Tasks
Published	2018-03-23
URL	http://arxiv.org/abs/1803.08629v2
PDF	http://arxiv.org/pdf/1803.08629v2.pdf
PWC	https://paperswithcode.com/paper/generalization-challenges-for-neural
Repo
Framework

Actor-Centric Relation Network


Title	Actor-Centric Relation Network
Authors	Chen Sun, Abhinav Shrivastava, Carl Vondrick, Kevin Murphy, Rahul Sukthankar, Cordelia Schmid
Abstract	Current state-of-the-art approaches for spatio-temporal action localization rely on detections at the frame level and model temporal context with 3D ConvNets. Here, we go one step further and model spatio-temporal relations to capture the interactions between human actors, relevant objects and scene elements essential to differentiate similar human actions. Our approach is weakly supervised and mines the relevant elements automatically with an actor-centric relational network (ACRN). ACRN computes and accumulates pair-wise relation information from actor and global scene features, and generates relation features for action classification. It is implemented as neural networks and can be trained jointly with an existing action detection system. We show that ACRN outperforms alternative approaches which capture relation information, and that the proposed framework improves upon the state-of-the-art performance on JHMDB and AVA. A visualization of the learned relation features confirms that our approach is able to attend to the relevant relations for each action.
Tasks	Action Classification, Action Detection, Action Localization, Action Recognition In Videos, Spatio-Temporal Action Localization, Temporal Action Localization
Published	2018-07-28
URL	http://arxiv.org/abs/1807.10982v1
PDF	http://arxiv.org/pdf/1807.10982v1.pdf
PWC	https://paperswithcode.com/paper/actor-centric-relation-network
Repo
Framework

Improving Fingerprint Pore Detection with a Small FCN


Title	Improving Fingerprint Pore Detection with a Small FCN
Authors	Gabriel Dahia, Maurício Pamplona Segundo
Abstract	In this work, we investigate if previously proposed CNNs for fingerprint pore detection overestimate the number of required model parameters for this task. We show that this is indeed the case by proposing a fully convolutional neural network that has significantly fewer parameters. We evaluate this model using a rigorous and reproducible protocol, which was, prior to our work, not available to the community. Using our protocol, we show that the proposed model, when combined with post-processing, performs better than previous methods while also being much more efficient. All our code is available at https://github.com/gdahia/fingerprint-pore-detection.
Tasks
Published	2018-11-14
URL	http://arxiv.org/abs/1811.06846v1
PDF	http://arxiv.org/pdf/1811.06846v1.pdf
PWC	https://paperswithcode.com/paper/improving-fingerprint-pore-detection-with-a
Repo
Framework

Learning Actionable Representations from Visual Observations


Title	Learning Actionable Representations from Visual Observations
Authors	Debidatta Dwibedi, Jonathan Tompson, Corey Lynch, Pierre Sermanet
Abstract	In this work we explore a new approach for robots to teach themselves about the world simply by observing it. In particular we investigate the effectiveness of learning task-agnostic representations for continuous control tasks. We extend Time-Contrastive Networks (TCN) that learn from visual observations by embedding multiple frames jointly in the embedding space as opposed to a single frame. We show that by doing so, we are now able to encode both position and velocity attributes significantly more accurately. We test the usefulness of this self-supervised approach in a reinforcement learning setting. We show that the representations learned by agents observing themselves take random actions, or other agents perform tasks successfully, can enable the learning of continuous control policies using algorithms like Proximal Policy Optimization (PPO) using only the learned embeddings as input. We also demonstrate significant improvements on the real-world Pouring dataset with a relative error reduction of 39.4% for motion attributes and 11.1% for static attributes compared to the single-frame baseline. Video results are available at https://sites.google.com/view/actionablerepresentations .
Tasks	Continuous Control
Published	2018-08-02
URL	http://arxiv.org/abs/1808.00928v3
PDF	http://arxiv.org/pdf/1808.00928v3.pdf
PWC	https://paperswithcode.com/paper/learning-actionable-representations-from
Repo
Framework

Distributed optimization in wireless sensor networks: an island-model framework


Title	Distributed optimization in wireless sensor networks: an island-model framework
Authors	Giovanni Iacca
Abstract	Wireless Sensor Networks (WSNs) are an emerging technology in several application domains, ranging from urban surveillance to environmental and structural monitoring. Computational Intelligence (CI) techniques are particularly suitable for enhancing these systems. However, when embedding CI into wireless sensors, severe hardware limitations must be taken into account. In this paper we investigate the possibility of performing an online, distributed optimization process within a WSN. Such a system might be used, for example, to implement advanced network features like distributed modelling, self-optimizing protocols, and anomaly detection, to name a few. The proposed approach, called DOWSN (Distributed Optimization for WSN), is an island-model infrastructure in which each node executes a simple, computationally cheap (both in terms of CPU and memory) optimization algorithm, and shares promising solutions with its neighbors. We perform extensive tests of different DOWSN configurations on a benchmark made up of continuous optimization problems; we analyze the influence of the network parameters (number of nodes, inter-node communication period and probability of accepting incoming solutions) on the optimization performance. Finally, we profile energy and memory consumption of DOWSN to show the efficient usage of the limited hardware resources available on the sensor nodes.
Tasks	Anomaly Detection, Distributed Optimization
Published	2018-10-05
URL	http://arxiv.org/abs/1810.02679v1
PDF	http://arxiv.org/pdf/1810.02679v1.pdf
PWC	https://paperswithcode.com/paper/distributed-optimization-in-wireless-sensor
Repo
Framework
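
The island-model idea in the abstract above (each node runs a cheap local optimizer and periodically shares promising solutions with neighbors) can be illustrated with a minimal sketch. This is not the DOWSN implementation: the ring topology, the simple accept-if-not-worse local search, the sphere benchmark, and all names and parameter values (`island_model`, `period`, `accept_prob`) are assumptions chosen for illustration. Only the three network parameters (number of nodes, communication period, acceptance probability) come from the abstract.

```python
import random


def sphere(x):
    """Toy continuous benchmark (assumed here): sum of squares, minimum at 0."""
    return sum(v * v for v in x)


def island_model(num_nodes=4, dim=3, steps=200, period=10, accept_prob=0.5, seed=0):
    """Hypothetical island-model loop: one candidate solution per node."""
    rng = random.Random(seed)
    sols = [[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(num_nodes)]
    fits = [sphere(s) for s in sols]
    for t in range(1, steps + 1):
        # Local step: each node perturbs its own solution, keeps it if not worse.
        for i in range(num_nodes):
            cand = [v + rng.gauss(0, 0.3) for v in sols[i]]
            f = sphere(cand)
            if f <= fits[i]:
                sols[i], fits[i] = cand, f
        # Migration step: every `period` iterations, each node receives the
        # solution of its left neighbor on a ring (topology is an assumption)
        # and adopts it with probability `accept_prob` if it is strictly better.
        if t % period == 0:
            snapshot = list(zip(sols, fits))
            for i in range(num_nodes):
                s, f = snapshot[(i - 1) % num_nodes]
                if f < fits[i] and rng.random() < accept_prob:
                    sols[i], fits[i] = list(s), f
    return sols, fits


if __name__ == "__main__":
    _, final_fits = island_model()
    print("best fitness per node:", final_fits)
```

Because each node only ever replaces its solution with one that is no worse, the per-node fitness is non-increasing; varying `period` and `accept_prob` is a simple way to explore the trade-off between communication and local search that the abstract describes.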

End-to-end Language Identification using NetFV and NetVLAD


Title	End-to-end Language Identification using NetFV and NetVLAD
Authors	Jinkun Chen, Weicheng Cai, Danwei Cai, Zexin Cai, Haibin Zhong, Ming Li
Abstract	In this paper, we apply the NetFV and NetVLAD layers for the end-to-end language identification task. NetFV and NetVLAD layers are the differentiable implementations of the standard Fisher Vector and Vector of Locally Aggregated Descriptors (VLAD) methods, respectively. Both of them can encode a sequence of feature vectors into a fixed dimensional vector which is very important to process those variable-length utterances. We first present the relevances and differences between the classical i-vector and the aforementioned encoding schemes. Then, we construct a flexible end-to-end framework including a convolutional neural network (CNN) architecture and an encoding layer (NetFV or NetVLAD) for the language identification task. Experimental results on the NIST LRE 2007 close-set task show that the proposed system achieves significant EER reductions against the conventional i-vector baseline and the CNN temporal average pooling system, respectively.
Tasks	Language Identification
Published	2018-09-09
URL	http://arxiv.org/abs/1809.02906v1
PDF	http://arxiv.org/pdf/1809.02906v1.pdf
PWC	https://paperswithcode.com/paper/end-to-end-language-identification-using
Repo
Framework
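
To make concrete how a VLAD-style layer turns a variable-length sequence of feature vectors into a fixed-dimensional vector, here is a minimal sketch of soft-assignment VLAD encoding in plain Python. It is not the authors' NetVLAD layer: in NetVLAD the cluster centers and assignment weights are learned end-to-end, whereas here the centers are fixed inputs and the function name `soft_vlad` and the sharpness parameter `alpha` are assumptions for illustration.

```python
import math


def soft_vlad(features, centers, alpha=1.0):
    """Encode a list of D-dim feature vectors into one K*D-dim vector.

    features: list of T vectors (T may vary per utterance)
    centers:  list of K cluster centers (fixed here; learned in NetVLAD)
    """
    K, D = len(centers), len(centers[0])
    v = [[0.0] * D for _ in range(K)]
    for x in features:
        # Squared distance from this frame to every center.
        d2 = [sum((xi - ci) ** 2 for xi, ci in zip(x, c)) for c in centers]
        # Softmax over negative distances (shifted by the minimum for stability).
        m = min(d2)
        w = [math.exp(-alpha * (d - m)) for d in d2]
        z = sum(w)
        # Accumulate assignment-weighted residuals per cluster.
        for k in range(K):
            a = w[k] / z
            for j in range(D):
                v[k][j] += a * (x[j] - centers[k][j])
    flat = [val for row in v for val in row]
    norm = math.sqrt(sum(val * val for val in flat))
    # Global L2 normalization; the output length is K*D regardless of T.
    return [val / norm for val in flat] if norm > 0 else flat


if __name__ == "__main__":
    centers = [[0.0, 0.0], [1.0, 1.0]]
    short = [[0.1, 0.2], [0.9, 1.1]]
    longer = short + [[0.5, 0.4], [0.2, 0.0], [1.0, 0.8]]
    print(len(soft_vlad(short, centers)), len(soft_vlad(longer, centers)))
```

Both the two-frame and the five-frame input produce a vector of length K*D, which is the property that lets a fixed-size classifier sit on top of utterances of different durations.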

Spatial-Temporal Synergic Residual Learning for Video Person Re-Identification


Title	Spatial-Temporal Synergic Residual Learning for Video Person Re-Identification
Authors	Xinxing Su, Yingtian Zou, Yu Cheng, Shuangjie Xu, Mo Yu, Pan Zhou
Abstract	We tackle the problem of person re-identification in the video setting in this paper, which has been viewed as a crucial task in many applications. Meanwhile, it is very challenging since the task requires learning effective representations from video sequences with heterogeneous spatial-temporal information. We present a novel method, the Spatial-Temporal Synergic Residual Network (STSRN), for this problem. STSRN contains a spatial residual extractor, a temporal residual processor and a spatial-temporal smooth module. The smoother can alleviate sample noise along the spatial-temporal dimensions, thus enabling STSRN to extract more robust spatial-temporal features of consecutive frames. Extensive experiments are conducted on several challenging datasets including iLIDS-VID, PRID2011 and MARS. The results demonstrate that the proposed method achieves consistently superior performance over most state-of-the-art methods.
Tasks	Person Re-Identification, Video-Based Person Re-Identification
Published	2018-07-16
URL	http://arxiv.org/abs/1807.05799v1
PDF	http://arxiv.org/pdf/1807.05799v1.pdf
PWC	https://paperswithcode.com/paper/spatial-temporal-synergic-residual-learning
Repo
Framework

Fast OBDD Reordering using Neural Message Passing on Hypergraph


Title	Fast OBDD Reordering using Neural Message Passing on Hypergraph
Authors	Feifan Xu, Fei He, Enze Xie, Liang Li
Abstract	Ordered binary decision diagrams (OBDDs) are an efficient data structure for representing and manipulating Boolean formulas. With respect to different variable orders, the OBDDs’ sizes may vary from linear to exponential in the number of the Boolean variables. Finding the optimal variable order has been proven to be an NP-complete problem. Many heuristics have been proposed to find a near-optimal solution of this problem. In this paper, we propose a neural network-based method to predict near-optimal variable orders for unknown formulas. Viewing these formulas as hypergraphs, and lifting the message passing neural network into 3-hypergraph (MPNN3), we are able to learn the patterns of Boolean formulas. Compared to the traditional methods, our method can find a near-best solution in far less time, even for some hard examples. To the best of our knowledge, this is the first work on applying neural networks to OBDD reordering.
Tasks
Published	2018-11-06
URL	http://arxiv.org/abs/1811.02178v1
PDF	http://arxiv.org/pdf/1811.02178v1.pdf
PWC	https://paperswithcode.com/paper/fast-obdd-reordering-using-neural-message
Repo
Framework

UCNN: Exploiting Computational Reuse in Deep Neural Networks via Weight Repetition


Title	UCNN: Exploiting Computational Reuse in Deep Neural Networks via Weight Repetition
Authors	Kartik Hegde, Jiyong Yu, Rohit Agrawal, Mengjia Yan, Michael Pellauer, Christopher W. Fletcher
Abstract	Convolutional Neural Networks (CNNs) have begun to permeate all corners of electronic society (from voice recognition to scene generation) due to their high accuracy and machine efficiency per operation. At their core, CNN computations are made up of multi-dimensional dot products between weight and input vectors. This paper studies how weight repetition —when the same weight occurs multiple times in or across weight vectors— can be exploited to save energy and improve performance during CNN inference. This generalizes a popular line of work to improve efficiency from CNN weight sparsity, as reducing computation due to repeated zero weights is a special case of reducing computation due to repeated weights. To exploit weight repetition, this paper proposes a new CNN accelerator called the Unique Weight CNN Accelerator (UCNN). UCNN uses weight repetition to reuse CNN sub-computations (e.g., dot products) and to reduce CNN model size when stored in off-chip DRAM —both of which save energy. UCNN further improves performance by exploiting sparsity in weights. We evaluate UCNN with an accelerator-level cycle and energy model and with an RTL implementation of the UCNN processing element. On three contemporary CNNs, UCNN improves throughput-normalized energy consumption by 1.2x - 4x, relative to a similarly provisioned baseline accelerator that uses Eyeriss-style sparsity optimizations. At the same time, the UCNN processing element adds only 17-24% area overhead relative to the same baseline.
Tasks	Scene Generation
Published	2018-04-18
URL	http://arxiv.org/abs/1804.06508v1
PDF	http://arxiv.org/pdf/1804.06508v1.pdf
PWC	https://paperswithcode.com/paper/ucnn-exploiting-computational-reuse-in-deep
Repo
Framework
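
The weight-repetition idea in the abstract above can be shown with a small arithmetic sketch: when a weight value occurs several times in a dot product, the matching inputs can be summed first and multiplied once, and zero weights (sparsity) are simply the special case where the whole group is skipped. This is only an illustration of the factorization, not the UCNN accelerator dataflow; the function names are hypothetical.

```python
from collections import defaultdict


def dot_naive(weights, inputs):
    """Reference dot product: one multiply per (weight, input) pair."""
    return sum(w * x for w, x in zip(weights, inputs))


def dot_with_weight_reuse(weights, inputs):
    """Group inputs by weight value, add within each group, multiply once per group.

    Uses the identity  sum_i w_i * x_i = sum_u u * (sum of x_i where w_i == u).
    """
    group_sums = defaultdict(float)
    for w, x in zip(weights, inputs):
        if w != 0:  # zero weights contribute nothing (the sparsity special case)
            group_sums[w] += x
    return sum(w * s for w, s in group_sums.items())


def multiplies_needed(weights):
    """Number of multiplications after factorization: one per unique nonzero weight."""
    return len({w for w in weights if w != 0})


if __name__ == "__main__":
    weights = [2, 0, 2, -1, -1, 2]
    inputs = [1, 5, 3, 4, 6, 2]
    print(dot_naive(weights, inputs), dot_with_weight_reuse(weights, inputs))
    print("multiplies:", multiplies_needed(weights), "instead of", len(weights))
```

In this example the six-element dot product needs two multiplications instead of six, since only the values 2 and -1 occur among the nonzero weights.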

VISER: Visual Self-Regularization


Title	VISER: Visual Self-Regularization
Authors	Hamid Izadinia, Pierre Garrigues
Abstract	In this work, we propose the use of a large set of unlabeled images as a source of regularization data for learning robust visual representations. Given a visual model trained on a labeled dataset in a supervised fashion, we augment our training samples by incorporating a large amount of unlabeled data and train a semi-supervised model. We demonstrate that our proposed learning approach leverages an abundance of unlabeled images and boosts the visual recognition performance, which alleviates the need to rely on large labeled datasets for learning robust representations. To increase the number of image instances needed to learn robust visual models in our approach, each labeled image propagates its label to its nearest unlabeled image instances. These retrieved unlabeled images serve as local perturbations of each labeled image to perform Visual Self-Regularization (VISER). To retrieve such visual self regularizers, we compute the cosine similarity in a semantic space defined by the penultimate layer in a fully convolutional neural network. We use the publicly available Yahoo Flickr Creative Commons 100M dataset as the source of our unlabeled image set and propose a distributed approximate nearest neighbor algorithm to make retrieval practical at that scale. Using the labeled instances and their regularizer samples we show that we significantly improve object categorization and localization performance on the MS COCO and Visual Genome datasets where objects appear in context.
Tasks
Published	2018-02-07
URL	http://arxiv.org/abs/1802.02568v1
PDF	http://arxiv.org/pdf/1802.02568v1.pdf
PWC	https://paperswithcode.com/paper/viser-visual-self-regularization
Repo
Framework
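
The label-propagation step described in the abstract above (each labeled image passes its label to its nearest unlabeled images under cosine similarity in an embedding space) can be sketched as follows. This is a toy, exact brute-force version; the paper proposes a distributed approximate nearest-neighbor algorithm for the 100M-image scale, which is not reproduced here. The function names, the value of `k`, and the hand-written two-dimensional embeddings are assumptions for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0


def propagate_labels(labeled, unlabeled, k=2):
    """For each (embedding, label) pair, give the label to its k most similar
    unlabeled embeddings. Returns a list of (unlabeled_index, label) pairs."""
    pseudo_labeled = []
    for emb, label in labeled:
        ranked = sorted(
            range(len(unlabeled)),
            key=lambda i: cosine(emb, unlabeled[i]),
            reverse=True,
        )
        pseudo_labeled.extend((i, label) for i in ranked[:k])
    return pseudo_labeled


if __name__ == "__main__":
    labeled = [([1.0, 0.0], "cat"), ([0.0, 1.0], "dog")]
    unlabeled = [[0.9, 0.1], [0.1, 0.9], [0.8, 0.3], [0.2, 1.0]]
    print(propagate_labels(labeled, unlabeled, k=2))
```

In a real pipeline the embeddings would come from the penultimate layer of a trained network, as the abstract states, and the retrieved images would be added to the training set with the propagated labels.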

Deep multi-task learning for a geographically-regularized semantic segmentation of aerial images


Title	Deep multi-task learning for a geographically-regularized semantic segmentation of aerial images
Authors	Michele Volpi, Devis Tuia
Abstract	When approaching the semantic segmentation of overhead imagery in the decimeter spatial resolution range, successful strategies usually combine powerful methods to learn the visual appearance of the semantic classes (e.g. convolutional neural networks) with strategies for spatial regularization (e.g. graphical models such as conditional random fields). In this paper, we propose a method to learn evidence in the form of semantic class likelihoods, semantic boundaries across classes and shallow-to-deep visual features, each one modeled by a multi-task convolutional neural network architecture. We combine this bottom-up information with top-down spatial regularization encoded by a conditional random field model optimizing the label space across a hierarchy of segments with constraints related to structural, spatial and data-dependent pairwise relationships between regions. Our results show that such a strategy provides better regularization than a series of strong baselines reflecting state-of-the-art technologies. The proposed strategy offers a flexible and principled framework to include several sources of visual and structural information, while allowing for different degrees of spatial regularization accounting for priors about the expected output structures.
Tasks	Multi-Task Learning, Semantic Segmentation
Published	2018-08-23
URL	http://arxiv.org/abs/1808.07675v1
PDF	http://arxiv.org/pdf/1808.07675v1.pdf
PWC	https://paperswithcode.com/paper/deep-multi-task-learning-for-a-geographically
Repo
Framework

Predictive Local Smoothness for Stochastic Gradient Methods


Title	Predictive Local Smoothness for Stochastic Gradient Methods
Authors	Jun Li, Hongfu Liu, Bineng Zhong, Yue Wu, Yun Fu
Abstract	Stochastic gradient methods are dominant in nonconvex optimization, especially for deep models, but have low asymptotic convergence due to the fixed smoothness. To address this problem, we propose a simple yet effective method for improving stochastic gradient methods named predictive local smoothness (PLS). First, we create a convergence condition to build a learning rate which varies adaptively with local smoothness. Second, the local smoothness can be predicted by the latest gradients. Third, we use the adaptive learning rate to update the stochastic gradients for exploring linear convergence rates. By applying the PLS method, we implement new variants of three popular algorithms: PLS-stochastic gradient descent (PLS-SGD), PLS-accelerated SGD (PLS-AccSGD), and PLS-AMSGrad. Moreover, we provide much simpler proofs to ensure their linear convergence. Empirical results show that the variants have better performance gains than the popular algorithms, such as faster convergence and alleviation of exploding and vanishing gradients.
Tasks
Published	2018-05-23
URL	http://arxiv.org/abs/1805.09386v1
PDF	http://arxiv.org/pdf/1805.09386v1.pdf
PWC	https://paperswithcode.com/paper/predictive-local-smoothness-for-stochastic
Repo
Framework

Detecting GAN-generated Imagery using Color Cues


Title	Detecting GAN-generated Imagery using Color Cues
Authors	Scott McCloskey, Michael Albright
Abstract	Image forensics is an increasingly relevant problem, as it can potentially address online disinformation campaigns and mitigate problematic aspects of social media. Of particular interest, given its recent successes, is the detection of imagery produced by Generative Adversarial Networks (GANs), e.g. 'deepfakes'. Leveraging large training sets and extensive computing resources, recent work has shown that GANs can be trained to generate synthetic imagery which is (in some ways) indistinguishable from real imagery. We analyze the structure of the generating network of a popular GAN implementation, and show that the network’s treatment of color is markedly different from a real camera in two ways. We further show that these two cues can be used to distinguish GAN-generated imagery from camera imagery, demonstrating effective discrimination between GAN imagery and real camera images used to train the GAN.
Tasks
Published	2018-12-19
URL	http://arxiv.org/abs/1812.08247v1
PDF	http://arxiv.org/pdf/1812.08247v1.pdf
PWC	https://paperswithcode.com/paper/detecting-gan-generated-imagery-using-color
Repo
Framework

A Unified Framework of Deep Neural Networks by Capsules


Title	A Unified Framework of Deep Neural Networks by Capsules
Authors	Yujian Li, Chuanhui Shan
Abstract	With the growth of deep learning, how to describe deep neural networks in a unified way is becoming an important issue. We first formalize neural networks mathematically with their directed graph representations, and prove a generation theorem about the induced networks of connected directed acyclic graphs. Then, we set up a unified framework for deep learning with capsule networks. This capsule framework could simplify the description of existing deep neural networks, and provide a theoretical basis of graphic designing and programming techniques for deep learning models, and would thus be of great significance to the advancement of deep learning.
Tasks
Published	2018-05-09
URL	http://arxiv.org/abs/1805.03551v2
PDF	http://arxiv.org/pdf/1805.03551v2.pdf
PWC	https://paperswithcode.com/paper/a-unified-framework-of-deep-neural-networks
Repo
Framework

Dynamic Programming Approach to Template-based OCR


Title	Dynamic Programming Approach to Template-based OCR
Authors	M. A. Povolotskiy, D. V. Tropin
Abstract	In this paper we propose a dynamic programming solution to the template-based recognition task in the OCR setting. We formulate a problem of optimal position search for complex objects consisting of parts forming a sequence. We limit the distance between every two adjacent elements with predefined upper and lower thresholds. We choose the sum of penalties for each part in a given position as the function to be minimized. We show that such a choice of restrictions allows a faster algorithm to be used than the one for the general form of deformation penalties. We named this algorithm Dynamic Squeezeboxes Packing (DSP) and applied it to solve two OCR problems: extraction of text fields from an image of a document Visual Inspection Zone (VIZ) and license plate segmentation. The quality and the performance of the resulting solutions were experimentally shown to meet the requirements of state-of-the-art industrial recognition systems.
Tasks	Optical Character Recognition
Published	2018-12-19
URL	http://arxiv.org/abs/1812.07933v1
PDF	http://arxiv.org/pdf/1812.07933v1.pdf
PWC	https://paperswithcode.com/paper/dynamic-programming-approach-to-template
Repo
Framework
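
The problem formulated in the abstract above (choose one position per part, keep the gap between adjacent parts within lower and upper thresholds, minimize the sum of per-part penalties) admits a straightforward dynamic program, sketched below. This is a plain textbook recurrence under stated assumptions, not the authors' DSP algorithm, which the abstract describes as faster. The one-dimensional integer position grid, the function name `best_positions`, and the requirement `1 <= dmin <= dmax` are assumptions for illustration.

```python
def best_positions(penalties, dmin, dmax):
    """penalties[i][p] is the penalty of placing part i at position p.

    Adjacent parts must satisfy dmin <= pos[i] - pos[i-1] <= dmax
    (assumes 1 <= dmin <= dmax). Returns (total_penalty, positions),
    or (inf, []) if no placement satisfies the constraints.
    """
    n, P = len(penalties), len(penalties[0])
    INF = float("inf")
    # cost[i][p]: best total penalty for parts 0..i with part i at position p.
    cost = [list(penalties[0])] + [[INF] * P for _ in range(n - 1)]
    back = [[-1] * P for _ in range(n)]
    for i in range(1, n):
        for p in range(P):
            # Allowed positions q of the previous part: p - dmax <= q <= p - dmin.
            for q in range(max(0, p - dmax), min(P, p - dmin + 1)):
                c = cost[i - 1][q] + penalties[i][p]
                if c < cost[i][p]:
                    cost[i][p] = c
                    back[i][p] = q
    end = min(range(P), key=lambda p: cost[n - 1][p])
    if cost[n - 1][end] == INF:
        return INF, []
    # Follow back-pointers from the last part to the first.
    positions = [end]
    for i in range(n - 1, 0, -1):
        positions.append(back[i][positions[-1]])
    positions.reverse()
    return cost[n - 1][end], positions


if __name__ == "__main__":
    penalties = [
        [3, 0, 5, 5, 5, 5, 5],
        [9, 9, 9, 4, 1, 9, 9],
        [9, 9, 9, 9, 9, 2, 0],
    ]
    print(best_positions(penalties, dmin=2, dmax=3))
```

For the toy penalties in the usage example, the minimum total penalty is 1, reached by placing the three parts at positions 1, 4 and 6, whose gaps of 3 and 2 respect the thresholds. The cost of this sketch grows with the number of parts, the number of positions, and the width of the allowed gap window.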