February 1, 2020


Paper Group AWR 93


Selective Kernel Networks


Title	Selective Kernel Networks
Authors	Xiang Li, Wenhai Wang, Xiaolin Hu, Jian Yang
Abstract	In standard Convolutional Neural Networks (CNNs), the receptive fields of artificial neurons in each layer are designed to share the same size. It is well-known in the neuroscience community that the receptive field sizes of visual cortical neurons are modulated by the stimulus, which has been rarely considered in constructing CNNs. We propose a dynamic selection mechanism in CNNs that allows each neuron to adaptively adjust its receptive field size based on multiple scales of input information. A building block called Selective Kernel (SK) unit is designed, in which multiple branches with different kernel sizes are fused using softmax attention that is guided by the information in these branches. Different attentions on these branches yield different sizes of the effective receptive fields of neurons in the fusion layer. Multiple SK units are stacked to a deep network termed Selective Kernel Networks (SKNets). On the ImageNet and CIFAR benchmarks, we empirically show that SKNet outperforms the existing state-of-the-art architectures with lower model complexity. Detailed analyses show that the neurons in SKNet can capture target objects with different scales, which verifies the capability of neurons for adaptively adjusting their receptive field sizes according to the input. The code and models are available at https://github.com/implus/SKNet.
Tasks	Image Classification
Published	2019-03-15
URL	http://arxiv.org/abs/1903.06586v2
PDF	http://arxiv.org/pdf/1903.06586v2.pdf
PWC	https://paperswithcode.com/paper/selective-kernel-networks
Repo	https://github.com/implus/PytorchInsight
Framework	pytorch
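
The fusion step described in the abstract, softmax attention across branches with different kernel sizes, can be sketched as follows. This is a minimal illustration, not the authors' code: the branch outputs are reduced to one number per channel, and the attention logits are passed in directly, whereas in the paper they are derived from the branches themselves. The names `selective_fuse`, `small_kernel` and `large_kernel` are hypothetical.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def selective_fuse(branches, logits):
    """Fuse per-channel branch responses with softmax attention across branches.

    branches[b][c] is the response of branch b in channel c.
    logits[b][c] is the attention logit of branch b in channel c
    (assumption: supplied by the caller to keep the sketch small).
    """
    n_channels = len(branches[0])
    fused = []
    for c in range(n_channels):
        # The softmax runs across branches, separately for every channel.
        weights = softmax([branch_logits[c] for branch_logits in logits])
        fused.append(sum(w * branch[c] for w, branch in zip(weights, branches)))
    return fused


# Two hypothetical branches (say a 3x3 and a 5x5 kernel), two channels each.
small_kernel = [1.0, 4.0]
large_kernel = [3.0, 0.0]

# Equal logits give a plain average of the two branches.
averaged = selective_fuse([small_kernel, large_kernel], [[0.0, 0.0], [0.0, 0.0]])

# A strongly positive logit makes channel 0 follow the small-kernel branch.
selected = selective_fuse([small_kernel, large_kernel], [[10.0, 0.0], [-10.0, 0.0]])
```

Because the weights are computed per channel, each channel can lean towards a different kernel size, which is what lets the effective receptive field vary with the input.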

Efficient training of energy-based models via spin-glass control


Title	Efficient training of energy-based models via spin-glass control
Authors	Alejandro Pozas-Kerstjens, Gorka Muñoz-Gil, Miguel Ángel García-March, Antonio Acín, Maciej Lewenstein, Przemysław R. Grzybowski
Abstract	We present an efficient method for unsupervised learning using Boltzmann machines. The method is rooted in the control of the spin-glass properties of the Ising model described by the Boltzmann machine’s weights. This allows for very easy access to low-energy configurations. We apply RAPID, the combination of Restricting the Axons (RA) of the model and training via Pattern-InDuced correlations (PID), to learn the Bars and Stripes dataset of various sizes and the MNIST dataset. We show how, in these tasks, RAPID quickly outperforms standard techniques for unsupervised learning in generalization ability. Indeed, both the number of epochs needed for effective learning and the computation time per training step are greatly reduced. In its simplest form, PID allows the negative phase of the log-likelihood gradient to be computed with no Markov chain Monte Carlo sampling costs at all.
Tasks
Published	2019-10-03
URL	https://arxiv.org/abs/1910.01592v1
PDF	https://arxiv.org/pdf/1910.01592v1.pdf
PWC	https://paperswithcode.com/paper/efficient-training-of-energy-based-models-via
Repo	https://github.com/apozas/rapid
Framework	pytorch

CommunityGAN: Community Detection with Generative Adversarial Nets


Title	CommunityGAN: Community Detection with Generative Adversarial Nets
Authors	Yuting Jia, Qinqin Zhang, Weinan Zhang, Xinbing Wang
Abstract	Community detection refers to the task of discovering groups of vertices sharing similar properties or functions so as to understand the network data. With the recent development of deep learning, graph representation learning techniques are also utilized for community detection. However, the communities can only be inferred by applying clustering algorithms based on learned vertex embeddings. These general cluster algorithms like K-means and Gaussian Mixture Model cannot output much overlapped communities, which have been proved to be very common in many real-world networks. In this paper, we propose CommunityGAN, a novel community detection framework that jointly solves overlapping community detection and graph representation learning. First, unlike the embedding of conventional graph representation learning algorithms where the vector entry values have no specific meanings, the embedding of CommunityGAN indicates the membership strength of vertices to communities. Second, a specifically designed Generative Adversarial Net (GAN) is adopted to optimize such embedding. Through the minimax competition between the motif-level generator and discriminator, both of them can alternatively and iteratively boost their performance and finally output a better community structure. Extensive experiments on synthetic data and real-world tasks demonstrate that CommunityGAN achieves substantial community detection performance gains over the state-of-the-art methods.
Tasks	Community Detection, Graph Representation Learning, Representation Learning
Published	2019-01-20
URL	https://arxiv.org/abs/1901.06631v3
PDF	https://arxiv.org/pdf/1901.06631v3.pdf
PWC	https://paperswithcode.com/paper/communitygan-community-detection-with
Repo	https://github.com/SamJia/CommunityGAN
Framework	tf

Utterance-level Aggregation For Speaker Recognition In The Wild


Title	Utterance-level Aggregation For Speaker Recognition In The Wild
Authors	Weidi Xie, Arsha Nagrani, Joon Son Chung, Andrew Zisserman
Abstract	The objective of this paper is speaker recognition “in the wild”, where utterances may be of variable length and also contain irrelevant signals. Crucial elements in the design of deep networks for this task are the type of trunk (frame level) network, and the method of temporal aggregation. We propose a powerful speaker recognition deep network, using a “thin-ResNet” trunk architecture, and a dictionary-based NetVLAD or GhostVLAD layer to aggregate features across time, that can be trained end-to-end. We show that our network achieves state-of-the-art performance by a significant margin on the VoxCeleb1 test set for speaker recognition, whilst requiring fewer parameters than previous methods. We also investigate the effect of utterance length on performance, and conclude that for “in the wild” data, a longer length is beneficial.
Tasks	Speaker Recognition, Text-Independent Speaker Verification
Published	2019-02-26
URL	https://arxiv.org/abs/1902.10107v2
PDF	https://arxiv.org/pdf/1902.10107v2.pdf
PWC	https://paperswithcode.com/paper/utterance-level-aggregation-for-speaker
Repo	https://github.com/WeidiXie/VGG-Speaker-Recognition
Framework	tf
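
The temporal aggregation mentioned in the abstract can be illustrated with a small NetVLAD-style sketch: each frame feature is softly assigned to a set of cluster centres, and the weighted residuals are summed over time, so utterances of any length map to a descriptor of fixed size. This is an assumption-laden illustration rather than the authors' implementation: the function name `vlad_aggregate` and its arguments are hypothetical, and the normalisation steps usually applied afterwards are left out.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def vlad_aggregate(frames, centers, weights, biases, n_ghost=0):
    """Aggregate frame-level features into a fixed-size descriptor.

    frames:  list of feature vectors, one per time step (any length).
    centers: cluster centres; the last n_ghost entries are "ghost" clusters.
    weights, biases: linear soft-assignment parameters, one row per cluster.
    Returns one residual sum per non-ghost cluster (no normalisation applied).
    """
    n_clusters = len(centers)
    dim = len(centers[0])
    residual_sums = [[0.0] * dim for _ in range(n_clusters)]
    for x in frames:
        scores = [
            sum(w_i * x_i for w_i, x_i in zip(w, x)) + b
            for w, b in zip(weights, biases)
        ]
        assign = softmax(scores)  # soft assignment of this frame to clusters
        for k in range(n_clusters):
            for d in range(dim):
                residual_sums[k][d] += assign[k] * (x[d] - centers[k][d])
    # Ghost clusters take part in the assignment but are dropped from the output.
    return residual_sums[: n_clusters - n_ghost]


# One cluster at the origin: the descriptor is the sum of the frames.
single = vlad_aggregate([[1.0, 2.0], [3.0, 4.0]], [[0.0, 0.0]], [[0.0, 0.0]], [0.0])

# Two clusters, the second a ghost: three frames still give one output row.
ghosted = vlad_aggregate(
    [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
    [[0.0, 0.0], [5.0, 5.0]],
    [[0.0, 0.0], [0.0, 0.0]],
    [0.0, 0.0],
    n_ghost=1,
)
```

The ghost cluster absorbs part of each frame's assignment weight, which gives the layer a place to send frames that carry little speaker information.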

Data-Driven Neuron Allocation for Scale Aggregation Networks


Title	Data-Driven Neuron Allocation for Scale Aggregation Networks
Authors	Yi Li, Zhanghui Kuang, Yimin Chen, Wayne Zhang
Abstract	Successful visual recognition networks benefit from aggregating information spanning from a wide range of scales. Previous research has investigated information fusion of connected layers or multiple branches in a block, seeking to strengthen the power of multi-scale representations. Despite their great successes, existing practices often allocate the neurons for each scale manually, and keep the same ratio in all aggregation blocks of an entire network, rendering suboptimal performance. In this paper, we propose to learn the neuron allocation for aggregating multi-scale information in different building blocks of a deep network. The most informative output neurons in each block are preserved while others are discarded, and thus neurons for multiple scales are competitively and adaptively allocated. Our scale aggregation network (ScaleNet) is constructed by repeating a scale aggregation (SA) block that concatenates feature maps at a wide range of scales. Feature maps for each scale are generated by a stack of downsampling, convolution and upsampling operations. The data-driven neuron allocation and SA block achieve strong representational power at the cost of considerably low computational complexity. The proposed ScaleNet, by replacing all 3x3 convolutions in ResNet with our SA blocks, achieves better performance than ResNet and its outstanding variants like ResNeXt and SE-ResNet, in the same computational complexity. On ImageNet classification, ScaleNets absolutely reduce the top-1 error rate of ResNets by 1.12 (101 layers) and 1.82 (50 layers). On COCO object detection, ScaleNets absolutely improve the mmAP with backbone of ResNets by 3.6 (101 layers) and 4.6 (50 layers) on Faster RCNN, respectively. Code and models are released at https://github.com/Eli-YiLi/ScaleNet.
Tasks	Image Classification, Object Detection
Published	2019-04-20
URL	http://arxiv.org/abs/1904.09460v1
PDF	http://arxiv.org/pdf/1904.09460v1.pdf
PWC	https://paperswithcode.com/paper/190409460
Repo	https://github.com/Eli-YiLi/ScaleNet
Framework	tf

Boltzmann Exploration Expectation-Maximisation


Title	Boltzmann Exploration Expectation-Maximisation
Authors	Mathias Edman, Neil Dhir
Abstract	We present a general method for fitting finite mixture models (FMM). Learning in a mixture model consists of finding the most likely cluster assignment for each data-point, as well as finding the parameters of the clusters themselves. In many mixture models, this is difficult with current learning methods, where the most common approach is to employ monotone learning algorithms, e.g. the conventional expectation-maximisation algorithm. While effective, the success of any monotone algorithm is crucially dependent on good parameter initialisation, where a common choice is $K$-means initialisation, commonly employed for Gaussian mixture models. For other types of mixture models, the path to good initialisation parameters is often unclear and may require a problem-specific solution. To this end, we propose a general heuristic learning algorithm that utilises Boltzmann exploration to assign each observation to a specific base distribution within the mixture model, which we call Boltzmann exploration expectation-maximisation (BEEM). With BEEM, hard assignments allow straightforward parameter learning for each base distribution by conditioning only on its assigned observations. Consequently, it can be applied to mixtures of any base distribution where single component parameter learning is tractable. The stochastic learning procedure is able to escape local optima and is thus insensitive to parameter initialisation. We show competitive performance on a number of synthetic benchmark cases as well as on real-world datasets.
Tasks
Published	2019-12-18
URL	https://arxiv.org/abs/1912.08869v1
PDF	https://arxiv.org/pdf/1912.08869v1.pdf
PWC	https://paperswithcode.com/paper/boltzmann-exploration-expectation
Repo	https://github.com/kaminAI/beem
Framework	none
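
The idea of sampling hard assignments and then fitting each component on its own observations can be sketched as below. This is a simplified reading of the abstract, not the published algorithm: it assumes one-dimensional, unit-variance, equally weighted Gaussian components, learns only the means, and uses a fixed temperature. The names `boltzmann_probs` and `fit_hard_assignment_mixture` are hypothetical.

```python
import math
import random


def boltzmann_probs(log_likelihoods, temperature):
    """Softmax of log-likelihoods scaled by a temperature."""
    scaled = [ll / temperature for ll in log_likelihoods]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def fit_hard_assignment_mixture(data, means, n_iters=20, temperature=1.0, seed=0):
    """Alternate sampled hard assignments with per-component mean updates.

    Assumption: unit-variance 1D Gaussian components with equal weights.
    """
    rng = random.Random(seed)
    means = list(means)
    for _ in range(n_iters):
        assigned = [[] for _ in means]
        for x in data:
            # Unnormalised Gaussian log-likelihood of x under each component.
            log_liks = [-0.5 * (x - m) ** 2 for m in means]
            probs = boltzmann_probs(log_liks, temperature)
            # Sample one component instead of taking the arg-max.
            k = rng.choices(range(len(means)), weights=probs)[0]
            assigned[k].append(x)
        for k, points in enumerate(assigned):
            if points:  # keep the old mean if a component received no data
                means[k] = sum(points) / len(points)
    return means


data = [-5.2, -5.0, -4.8, -5.1, -4.9, 4.8, 5.0, 5.2, 4.9, 5.1]
learned = sorted(fit_hard_assignment_mixture(data, means=[-1.0, 1.0]))
```

Because every component is updated only from the points assigned to it, the update step needs nothing beyond a single-distribution fit, which is the property the abstract highlights. A higher temperature makes the assignments more exploratory.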

Type-Driven Automated Learning with Lale


Title	Type-Driven Automated Learning with Lale
Authors	Martin Hirzel, Kiran Kate, Avraham Shinnar, Subhrajit Roy, Parikshit Ram
Abstract	Machine-learning automation tools, ranging from humble grid-search to hyperopt, auto-sklearn, and TPOT, help explore large search spaces of possible pipelines. Unfortunately, each of these tools has a different syntax for specifying its search space, leading to lack of portability, missed relevant points, and spurious points that are inconsistent with error checks and documentation of the searchable base components. This paper proposes using types (such as enum, float, or dictionary) both for checking the correctness of, and for automatically searching over, hyperparameters and pipeline configurations. Using types for both of these purposes guarantees consistency. We present Lale, an embedded language that resembles scikit-learn but provides better automation, correctness checks, and portability. Lale extends the reach of existing automation tools across data modalities (tables, text, images, time-series) and programming languages (Python, Java, R). Thus, data scientists can leverage automation while remaining in control of their work.
Tasks	Time Series
Published	2019-05-24
URL	https://arxiv.org/abs/1906.03957v1
PDF	https://arxiv.org/pdf/1906.03957v1.pdf
PWC	https://paperswithcode.com/paper/type-driven-automated-learning-with-lale
Repo	https://github.com/IBM/lale
Framework	pytorch

Modeling plate and spring reverberation using a DSP-informed deep neural network


Title	Modeling plate and spring reverberation using a DSP-informed deep neural network
Authors	Marco A. Martínez Ramírez, Emmanouil Benetos, Joshua D. Reiss
Abstract	Plate and spring reverberators are electromechanical systems first used and researched as means to substitute real room reverberation. Nowadays they are often used in music production for aesthetic reasons due to their particular sonic characteristics. The modeling of these audio processors and their perceptual qualities is difficult since they use mechanical elements together with analog electronics resulting in an extremely complex response. Based on digital reverberators that use sparse FIR filters, we propose a signal processing-informed deep learning architecture for the modeling of artificial reverberators. We explore the capabilities of deep neural networks to learn such highly nonlinear electromechanical responses and we perform modeling of plate and spring reverberators. In order to measure the performance of the model, we conduct a perceptual evaluation experiment and we also analyze how the given task is accomplished and what the model is actually learning.
Tasks
Published	2019-10-22
URL	https://arxiv.org/abs/1910.10105v1
PDF	https://arxiv.org/pdf/1910.10105v1.pdf
PWC	https://paperswithcode.com/paper/modeling-plate-and-spring-reverberation-using
Repo	https://github.com/mchijmma/modeling-plate-spring-reverb
Framework	none

Understanding the Behaviors of BERT in Ranking


Title	Understanding the Behaviors of BERT in Ranking
Authors	Yifan Qiao, Chenyan Xiong, Zhenghao Liu, Zhiyuan Liu
Abstract	This paper studies the performances and behaviors of BERT in ranking tasks. We explore several different ways to leverage the pre-trained BERT and fine-tune it on two ranking tasks: MS MARCO passage reranking and TREC Web Track ad hoc document ranking. Experimental results on MS MARCO demonstrate the strong effectiveness of BERT in question-answering focused passage ranking tasks, as well as the fact that BERT is a strong interaction-based seq2seq matching model. Experimental results on TREC show the gaps between the BERT pre-trained on surrounding contexts and the needs of ad hoc document ranking. Analyses illustrate how BERT allocates its attentions between query-document tokens in its Transformer layers, how it prefers semantic matches between paraphrase tokens, and how that differs with the soft match patterns learned by a click-trained neural ranker.
Tasks	Document Ranking, Question Answering
Published	2019-04-16
URL	http://arxiv.org/abs/1904.07531v4
PDF	http://arxiv.org/pdf/1904.07531v4.pdf
PWC	https://paperswithcode.com/paper/understanding-the-behaviors-of-bert-in
Repo	https://github.com/NavePnow/Google-BERT-on-fake_or_real-news-dataset
Framework	pytorch

Gym-Ignition: Reproducible Robotic Simulations for Reinforcement Learning


Title	Gym-Ignition: Reproducible Robotic Simulations for Reinforcement Learning
Authors	Diego Ferigo, Silvio Traversaro, Giorgio Metta, Daniele Pucci
Abstract	This paper presents Gym-Ignition, a new framework to create reproducible robotic environments for reinforcement learning research. It interfaces with the new generation of Gazebo, part of the Ignition Robotics suite, which provides three main improvements for reinforcement learning applications compared to the alternatives: 1) the modular architecture enables using the simulator as a C++ library, simplifying the interconnection with external software; 2) multiple physics and rendering engines are supported as plugins, simplifying their selection during the execution; 3) the new distributed simulation capability allows simulating complex scenarios while sharing the load on multiple workers and machines. The core of Gym-Ignition is a component that contains the Ignition Gazebo simulator and exposes a simple interface for its configuration and execution. We provide a Python package that allows developers to create robotic environments simulated in Ignition Gazebo. Environments expose the common OpenAI Gym interface, making them compatible out-of-the-box with third-party frameworks containing reinforcement learning algorithms. Simulations can be executed in both headless and GUI mode, the physics engine can run in accelerated mode, and instances can be parallelized. Furthermore, the Gym-Ignition software architecture provides abstraction of the Robot and the Task, making environments agnostic on the specific runtime. This abstraction allows their execution also in a real-time setting on actual robotic platforms, even if driven by different middlewares.
Tasks
Published	2019-11-05
URL	https://arxiv.org/abs/1911.01715v2
PDF	https://arxiv.org/pdf/1911.01715v2.pdf
PWC	https://paperswithcode.com/paper/gym-ignition-reproducible-robotic-simulations
Repo	https://github.com/robotology/gym-ignition
Framework	none

Optimising Trotter-Suzuki Decompositions for Quantum Simulation Using Evolutionary Strategies


Title	Optimising Trotter-Suzuki Decompositions for Quantum Simulation Using Evolutionary Strategies
Authors	Benjamin D. M. Jones, George O. O’Brien, David R. White, Earl T. Campbell, John A. Clark
Abstract	One of the most promising applications of near-term quantum computing is the simulation of quantum systems, a classically intractable task. Quantum simulation requires computationally expensive matrix exponentiation; Trotter-Suzuki decomposition of this exponentiation enables efficient simulation to a desired accuracy on a quantum computer. We apply the Covariance Matrix Adaptation Evolutionary Strategy (CMA-ES) algorithm to optimise the Trotter-Suzuki decompositions of a canonical quantum system, the Heisenberg Chain; we reduce simulation error by around 60%. We introduce this problem to the computational search community, show that an evolutionary optimisation approach is robust across runs and problem instances, and find that optimisation results generalise to the simulation of larger systems.
Tasks
Published	2019-04-02
URL	http://arxiv.org/abs/1904.01336v3
PDF	http://arxiv.org/pdf/1904.01336v3.pdf
PWC	https://paperswithcode.com/paper/optimising-trotter-suzuki-decompositions-for
Repo	https://github.com/sheffieldquantum/qsim
Framework	none
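
For orientation, the textbook first- and second-order product formulas for a Hamiltonian split into two parts, $H = A + B$, are given below. The abstract does not state which parameterisation the authors optimise, so these are the standard forms only, with $n$ the number of time steps:

$$
e^{-i(A+B)t} = \left(e^{-iAt/n}\, e^{-iBt/n}\right)^{n} + O\!\left(\frac{t^{2}}{n}\right),
\qquad
e^{-i(A+B)t} = \left(e^{-iAt/(2n)}\, e^{-iBt/n}\, e^{-iAt/(2n)}\right)^{n} + O\!\left(\frac{t^{3}}{n^{2}}\right).
$$

The error terms vanish when $A$ and $B$ commute. Otherwise, the coefficients multiplying each exponent are the natural quantities for a search method to tune.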

End to End Trainable Active Contours via Differentiable Rendering


Title	End to End Trainable Active Contours via Differentiable Rendering
Authors	Shir Gur, Tal Shaharabany, Lior Wolf
Abstract	We present an image segmentation method that iteratively evolves a polygon. At each iteration, the vertices of the polygon are displaced based on the local value of a 2D shift map that is inferred from the input image via an encoder-decoder architecture. The main training loss that is used is the difference between the polygon shape and the ground truth segmentation mask. The network employs a neural renderer to create the polygon from its vertices, making the process fully differentiable. We demonstrate that our method outperforms the state of the art segmentation networks and deep active contour solutions in a variety of benchmarks, including medical imaging and aerial images. Our code is available at https://github.com/shirgur/ACDRNet.
Tasks	Semantic Segmentation
Published	2019-12-01
URL	https://arxiv.org/abs/1912.00367v1
PDF	https://arxiv.org/pdf/1912.00367v1.pdf
PWC	https://paperswithcode.com/paper/end-to-end-trainable-active-contours-via-1
Repo	https://github.com/shirgur/ACDRNet
Framework	pytorch

Gated-SCNN: Gated Shape CNNs for Semantic Segmentation


Title	Gated-SCNN: Gated Shape CNNs for Semantic Segmentation
Authors	Towaki Takikawa, David Acuna, Varun Jampani, Sanja Fidler
Abstract	Current state-of-the-art methods for image segmentation form a dense image representation where the color, shape and texture information are all processed together inside a deep CNN. This, however, may not be ideal as they contain very different types of information relevant for recognition. Here, we propose a new two-stream CNN architecture for semantic segmentation that explicitly wires shape information as a separate processing branch, i.e. shape stream, that processes information in parallel to the classical stream. Key to this architecture is a new type of gate that connects the intermediate layers of the two streams. Specifically, we use the higher-level activations in the classical stream to gate the lower-level activations in the shape stream, effectively removing noise and helping the shape stream to only focus on processing the relevant boundary-related information. This enables us to use a very shallow architecture for the shape stream that operates on the image-level resolution. Our experiments show that this leads to a highly effective architecture that produces sharper predictions around object boundaries and significantly boosts performance on thinner and smaller objects. Our method achieves state-of-the-art performance on the Cityscapes benchmark, in terms of both mask (mIoU) and boundary (F-score) quality, improving by 2% and 4% over strong baselines.
Tasks	Semantic Segmentation
Published	2019-07-12
URL	https://arxiv.org/abs/1907.05740v1
PDF	https://arxiv.org/pdf/1907.05740v1.pdf
PWC	https://paperswithcode.com/paper/gated-scnn-gated-shape-cnns-for-semantic
Repo	https://github.com/moonkeyd/Note-for-GSCNN
Framework	none

Sequential estimation of quantiles with applications to A/B-testing and best-arm identification


Title	Sequential estimation of quantiles with applications to A/B-testing and best-arm identification
Authors	Steven R. Howard, Aaditya Ramdas
Abstract	Consider the problem of sequentially estimating quantiles of any distribution over a complete, fully-ordered set, based on a stream of i.i.d. observations. We propose new, theoretically sound and practically tight confidence sequences for quantiles, that is, sequences of confidence intervals which are valid uniformly over time. We give two methods for tracking a fixed quantile and two methods for tracking all quantiles simultaneously. Specifically, we provide explicit expressions with small constants for intervals whose widths shrink at the fastest possible $\sqrt{t^{-1} \log\log t}$ rate, as determined by the law of the iterated logarithm (LIL). As a byproduct, we give a non-asymptotic concentration inequality for the empirical distribution function which holds uniformly over time with the LIL rate, thus strengthening Smirnov’s asymptotic empirical process LIL, and extending the famed Dvoretzky-Kiefer-Wolfowitz (DKW) inequality to hold uniformly over all sample sizes while only being about twice as wide in practice. This inequality directly yields sequential analogues of the one- and two-sample Kolmogorov-Smirnov tests, and a test of stochastic dominance. We apply our results to the problem of selecting an arm with an approximately best quantile in a multi-armed bandit framework, proving a state-of-the-art sample complexity bound for a novel allocation strategy. Simulations demonstrate that our method stops with fewer samples than existing methods by a factor of five to fifty. Finally, we show how to compute confidence sequences for the difference between quantiles of two arms in an A/B test, along with corresponding always-valid $p$-values.
Tasks
Published	2019-06-24
URL	https://arxiv.org/abs/1906.09712v2
PDF	https://arxiv.org/pdf/1906.09712v2.pdf
PWC	https://paperswithcode.com/paper/sequential-estimation-of-quantiles-with
Repo	https://github.com/gostevehoward/confseq
Framework	none
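
As a point of comparison for the DKW inequality mentioned in the abstract, the sketch below builds a fixed-sample-size confidence interval for a quantile from the classical DKW bound, $P(\sup_x |F_n(x) - F(x)| > \varepsilon) \le 2e^{-2n\varepsilon^2}$. It is not the paper's time-uniform confidence sequence and is valid only for a sample size chosen in advance. The function names are hypothetical.

```python
import math


def dkw_epsilon(n, alpha):
    """Half-width of the DKW band for n samples at confidence level 1 - alpha."""
    return math.sqrt(math.log(2.0 / alpha) / (2.0 * n))


def quantile_interval(sample, p, alpha):
    """Fixed-n confidence interval for the p-quantile using the DKW band.

    The lower end is the empirical (p - eps)-quantile and the upper end the
    empirical (p + eps)-quantile; an end is infinite when the band leaves [0, 1].
    """
    ordered = sorted(sample)
    n = len(ordered)
    eps = dkw_epsilon(n, alpha)
    lower_rank = math.ceil(n * (p - eps))
    upper_rank = math.ceil(n * (p + eps))
    lower = ordered[lower_rank - 1] if lower_rank >= 1 else -math.inf
    upper = ordered[upper_rank - 1] if upper_rank <= n else math.inf
    return lower, upper


# Median of the integers 1..1000 at 95% confidence.
eps = dkw_epsilon(1000, 0.05)
low, high = quantile_interval(range(1, 1001), p=0.5, alpha=0.05)
```

Re-running this interval after every new observation would not keep its coverage guarantee, which is the gap that confidence sequences are designed to close.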

Exploring Bit-Slice Sparsity in Deep Neural Networks for Efficient ReRAM-Based Deployment


Title	Exploring Bit-Slice Sparsity in Deep Neural Networks for Efficient ReRAM-Based Deployment
Authors	Jingyang Zhang, Huanrui Yang, Fan Chen, Yitu Wang, Hai Li
Abstract	Emerging resistive random-access memory (ReRAM) has recently been intensively investigated to accelerate the processing of deep neural networks (DNNs). Due to the in-situ computation capability, analog ReRAM crossbars yield significant throughput improvement and energy reduction compared to traditional digital methods. However, the power-hungry analog-to-digital converters (ADCs) prevent the practical deployment of ReRAM-based DNN accelerators on end devices with limited chip area and power budget. We observe that due to the limited bit-density of ReRAM cells, DNN weights are bit sliced and correspondingly stored on multiple ReRAM bitlines. The accumulated current on the bitlines that results from the weights directly dictates the overhead of ADCs. As such, bitwise weight sparsity, rather than the sparsity of the full weight, is desirable for efficient ReRAM deployment. In this work, we propose bit-slice L1, the first algorithm to induce bit-slice sparsity during the training of dynamic fixed-point DNNs. Experimental results show that our approach achieves 2x sparsity improvement compared to previous algorithms. The resulting sparsity allows the ADC resolution to be reduced to 1-bit for the most significant bit-slice and down to 3-bit for the other bits, which significantly speeds up processing and reduces power and area overhead.
Tasks
Published	2019-09-18
URL	https://arxiv.org/abs/1909.08496v2
PDF	https://arxiv.org/pdf/1909.08496v2.pdf
PWC	https://paperswithcode.com/paper/exploring-bit-slice-sparsity-in-deep-neural
Repo	https://github.com/zjysteven/bitslice_sparsity
Framework	pytorch
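
The difference between full-weight sparsity and bit-slice sparsity can be made concrete with a short sketch. The abstract gives neither the slice width nor the exact form of the regulariser, so the 8-bit weights, 2-bit slices and the "sum of slice values" penalty below are assumptions chosen for illustration. All function names are hypothetical.

```python
def bit_slices(value, total_bits=8, slice_bits=2):
    """Split the magnitude of a fixed-point weight into slices, MSB first.

    Assumption: total_bits is a multiple of slice_bits.
    """
    magnitude = abs(value)
    if magnitude >= 1 << total_bits:
        raise ValueError("value does not fit in total_bits")
    mask = (1 << slice_bits) - 1
    n_slices = total_bits // slice_bits
    return [(magnitude >> (slice_bits * i)) & mask for i in reversed(range(n_slices))]


def bit_slice_l1(weights, total_bits=8, slice_bits=2):
    """Sum of all slice values: an L1-style penalty taken over slices."""
    return sum(sum(bit_slices(w, total_bits, slice_bits)) for w in weights)


def slice_sparsity(weights, total_bits=8, slice_bits=2):
    """Fraction of slices that are exactly zero."""
    slices = [s for w in weights for s in bit_slices(w, total_bits, slice_bits)]
    return sum(1 for s in slices if s == 0) / len(slices)


# 19 = 0b00010011 and 64 = 0b01000000: both weights are non-zero,
# yet most of their 2-bit slices are zero.
slices_19 = bit_slices(19)
slices_64 = bit_slices(64)
penalty = bit_slice_l1([19, 64])
sparsity = slice_sparsity([19, 64])
```

Here 64 has the larger magnitude but the smaller slice sum (1 against 4), which shows why a penalty on slices can favour different weights than an ordinary L1 penalty on the full values.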