January 27, 2020


Paper Group ANR 1230

Efficient Residual Dense Block Search for Image Super-Resolution. Political Discussions in Homogeneous and Cross-Cutting Communication Spaces. Restricted Boltzmann Machines for galaxy morphology classification with a quantum annealer. Solving the Robot-World Hand-Eye(s) Calibration Problem with Iterative Methods. Dynamic Convolution: Attention over …

Efficient Residual Dense Block Search for Image Super-Resolution


Title	Efficient Residual Dense Block Search for Image Super-Resolution
Authors	Dehua Song, Chang Xu, Xu Jia, Yiyi Chen, Chunjing Xu, Yunhe Wang
Abstract	Although remarkable progress has been made on single image super-resolution due to the revival of deep convolutional neural networks, deep learning methods are confronted with the challenges of computation and memory consumption in practice, especially for mobile devices. Focusing on this issue, we propose an efficient residual dense block search algorithm with multiple objectives to hunt for fast, lightweight and accurate networks for image super-resolution. Firstly, to accelerate super-resolution network, we exploit the variation of feature scale adequately with the proposed efficient residual dense blocks. In the proposed evolutionary algorithm, the locations of pooling and upsampling operator are searched automatically. Secondly, network architecture is evolved with the guidance of block credits to acquire accurate super-resolution network. The block credit reflects the effect of current block and is earned during model evaluation process. It guides the evolution by weighing the sampling probability of mutation to favor admirable blocks. Extensive experimental results demonstrate the effectiveness of the proposed searching method and the found efficient super-resolution models achieve better performance than the state-of-the-art methods with limited number of parameters and FLOPs.
Tasks	Image Super-Resolution, Super-Resolution
Published	2019-09-25
URL	https://arxiv.org/abs/1909.11409v3
PDF	https://arxiv.org/pdf/1909.11409v3.pdf
PWC	https://paperswithcode.com/paper/efficient-residual-dense-block-search-for
Repo
Framework

Political Discussions in Homogeneous and Cross-Cutting Communication Spaces


Title	Political Discussions in Homogeneous and Cross-Cutting Communication Spaces
Authors	Jisun An, Haewoon Kwak, Oliver Posegga, Andreas Jungherr
Abstract	Online platforms, such as Facebook, Twitter, and Reddit, provide users with a rich set of features for sharing and consuming political information, expressing political opinions, and exchanging potentially contrary political views. In such activities, two types of communication spaces naturally emerge: those dominated by exchanges between politically homogeneous users and those that allow and encourage cross-cutting exchanges in politically heterogeneous groups. While research on political talk in online environments abounds, we know surprisingly little about the potentially varying nature of discussions in politically homogeneous spaces as compared to cross-cutting communication spaces. To fill this gap, we use Reddit to explore the nature of political discussions in homogeneous and cross-cutting communication spaces. In particular, we develop an analytical template to study interaction and linguistic patterns within and between politically homogeneous and heterogeneous communication spaces. Our analyses reveal different behavioral patterns in homogeneous and cross-cutting communications spaces. We discuss theoretical and practical implications in the context of research on political talk online.
Tasks
Published	2019-04-11
URL	http://arxiv.org/abs/1904.05643v1
PDF	http://arxiv.org/pdf/1904.05643v1.pdf
PWC	https://paperswithcode.com/paper/political-discussions-in-homogeneous-and
Repo
Framework

Restricted Boltzmann Machines for galaxy morphology classification with a quantum annealer


Title	Restricted Boltzmann Machines for galaxy morphology classification with a quantum annealer
Authors	João Caldeira, Joshua Job, Steven H. Adachi, Brian Nord, Gabriel N. Perdue
Abstract	We present the application of Restricted Boltzmann Machines (RBMs) to the task of astronomical image classification using a quantum annealer built by D-Wave Systems. Morphological analysis of galaxies provides critical information for studying their formation and evolution across cosmic time scales. We compress galaxy images using principal component analysis to fit a representation on the quantum hardware. Then, we train RBMs with discriminative and generative algorithms, including contrastive divergence and hybrid generative-discriminative approaches, to classify different galaxy morphologies. The methods we compare include Quantum Annealing (QA), Markov Chain Monte Carlo (MCMC) Gibbs Sampling, and Simulated Annealing (SA) as well as machine learning algorithms like gradient boosted decision trees. We find that RBMs implemented on D-Wave hardware perform well, and that they show some classification performance advantages on small datasets, but they don’t offer a broadly strategic advantage for this task. During this exploration, we analyzed the steps required for Boltzmann sampling with the D-Wave 2000Q, including a study of temperature estimation, and examined the impact of qubit noise by comparing and contrasting the original D-Wave 2000Q to the lower-noise version recently made available. While these analyses ultimately had minimal impact on the performance of the RBMs, we include them for reference.
Tasks	Image Classification, Morphological Analysis
Published	2019-11-14
URL	https://arxiv.org/abs/1911.06259v2
PDF	https://arxiv.org/pdf/1911.06259v2.pdf
PWC	https://paperswithcode.com/paper/restricted-boltzmann-machines-for-galaxy
Repo
Framework

Solving the Robot-World Hand-Eye(s) Calibration Problem with Iterative Methods


Title	Solving the Robot-World Hand-Eye(s) Calibration Problem with Iterative Methods
Authors	Amy Tabb, Khalil M. Ahmad Yousef
Abstract	Robot-world, hand-eye calibration is the problem of determining the transformation between the robot end-effector and a camera, as well as the transformation between the robot base and the world coordinate system. This relationship has been modeled as $\mathbf{AX}=\mathbf{ZB}$, where $\mathbf{X}$ and $\mathbf{Z}$ are unknown homogeneous transformation matrices. The successful execution of many robot manipulation tasks depends on determining these matrices accurately, and we are particularly interested in the use of calibration for use in vision tasks. In this work, we describe a collection of methods consisting of two cost function classes, three different parameterizations of rotation components, and separable versus simultaneous formulations. We explore the behavior of this collection of methods on real datasets and simulated datasets, and compare to seven other state-of-the-art methods. Our collection of methods return greater accuracy on many metrics as compared to the state-of-the-art. The collection of methods is extended to the problem of robot-world hand-multiple-eye calibration, and results are shown with two and three cameras mounted on the same robot.
Tasks	Calibration
Published	2019-07-29
URL	https://arxiv.org/abs/1907.12425v1
PDF	https://arxiv.org/pdf/1907.12425v1.pdf
PWC	https://paperswithcode.com/paper/solving-the-robot-world-hand-eyes-calibration
Repo
Framework
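
The relation $\mathbf{AX}=\mathbf{ZB}$ from the abstract can be checked numerically on synthetic data. The sketch below is only an illustration: the helper names (`make_transform`, `residual`) are hypothetical, the poses are random, and the squared Frobenius-norm residual is an assumed cost, not necessarily one of the two cost function classes the paper describes.

```python
import numpy as np

def make_transform(rotvec, translation):
    """Build a 4x4 homogeneous transform from an axis-angle vector and a translation."""
    rotvec = np.asarray(rotvec, dtype=float)
    theta = np.linalg.norm(rotvec)
    K = np.zeros((3, 3))
    if theta > 0:
        k = rotvec / theta
        # Skew-symmetric cross-product matrix of the unit rotation axis.
        K = np.array([[0.0, -k[2], k[1]],
                      [k[2], 0.0, -k[0]],
                      [-k[1], k[0], 0.0]])
    # Rodrigues' rotation formula.
    R = np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = translation
    return T

def residual(X_est, Z_est, As, Bs):
    """Sum of squared Frobenius norms of A_i X - Z B_i (an assumed cost)."""
    return sum(np.linalg.norm(A @ X_est - Z_est @ B, "fro") ** 2
               for A, B in zip(As, Bs))

rng = np.random.default_rng(0)
# Ground-truth unknowns X and Z (synthetic).
X = make_transform(rng.normal(size=3), rng.normal(size=3))
Z = make_transform(rng.normal(size=3), rng.normal(size=3))
# Random B_i, with A_i chosen so that A_i X = Z B_i holds exactly.
Bs = [make_transform(rng.normal(size=3), rng.normal(size=3)) for _ in range(5)]
As = [Z @ B @ np.linalg.inv(X) for B in Bs]

print("residual at the true (X, Z):", residual(X, Z, As, Bs))
print("residual at identity guess: ", residual(np.eye(4), np.eye(4), As, Bs))
```

An iterative method would start from a guess such as the identity and reduce a residual of this kind over a parameterization of the rotations; the sketch stops at evaluating the residual.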

Dynamic Convolution: Attention over Convolution Kernels


Title	Dynamic Convolution: Attention over Convolution Kernels
Authors	Yinpeng Chen, Xiyang Dai, Mengchen Liu, Dongdong Chen, Lu Yuan, Zicheng Liu
Abstract	Light-weight convolutional neural networks (CNNs) suffer performance degradation as their low computational budgets constrain both the depth (number of convolution layers) and the width (number of channels) of CNNs, resulting in limited representation capability. To address this issue, we present Dynamic Convolution, a new design that increases model complexity without increasing the network depth or width. Instead of using a single convolution kernel per layer, dynamic convolution aggregates multiple parallel convolution kernels dynamically based upon their attentions, which are input dependent. Assembling multiple kernels is not only computationally efficient due to the small kernel size, but also has more representation power since these kernels are aggregated in a non-linear way via attention. By simply using dynamic convolution for the state-of-the-art architecture MobileNetV3-Small, the top-1 accuracy of ImageNet classification is boosted by 2.9% with only 4% additional FLOPs and 2.9 AP gain is achieved on COCO keypoint detection.
Tasks	Keypoint Detection
Published	2019-12-07
URL	https://arxiv.org/abs/1912.03458v2
PDF	https://arxiv.org/pdf/1912.03458v2.pdf
PWC	https://paperswithcode.com/paper/dynamic-convolution-attention-over
Repo
Framework
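
The kernel aggregation described in the abstract can be sketched for the simplest case of 1x1 convolutions. Everything specific here is an assumption made for illustration: the function name `dynamic_conv1x1`, the attention computed as global average pooling followed by one linear map and a softmax, and the tensor shapes. The abstract only states that kernels are aggregated by input-dependent attention.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a 1-D array."""
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def dynamic_conv1x1(x, kernels, attn_weights):
    """Aggregate K parallel 1x1 kernels with input-dependent attention.

    x:            (C_in, H, W) input feature map
    kernels:      (K, C_out, C_in) parallel 1x1 convolution kernels
    attn_weights: (K, C_in) linear map producing attention logits (assumed form)
    """
    pooled = x.mean(axis=(1, 2))               # global average pool -> (C_in,)
    pi = softmax(attn_weights @ pooled)        # attention over kernels -> (K,)
    W = np.tensordot(pi, kernels, axes=1)      # aggregated kernel -> (C_out, C_in)
    y = np.einsum("oc,chw->ohw", W, x)         # a single 1x1 convolution
    return y, pi

rng = np.random.default_rng(0)
x = rng.normal(size=(3, 4, 4))
kernels = rng.normal(size=(2, 5, 3))
attn_weights = rng.normal(size=(2, 3))
y, pi = dynamic_conv1x1(x, kernels, attn_weights)
print(y.shape, pi)
```

Because the kernels are summed before the convolution is applied, the layer runs one convolution per input regardless of K; the extra cost is the small attention computation.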

Block based Singular Value Decomposition approach to matrix factorization for recommender systems


Title	Block based Singular Value Decomposition approach to matrix factorization for recommender systems
Authors	Prasad Bhavana, Vikas Kumar, Vineet Padmanabhan
Abstract	With the abundance of data in recent years, interesting challenges are posed in the area of recommender systems. Producing high quality recommendations with scalability and performance is the need of the hour. Singular Value Decomposition (SVD) based recommendation algorithms have been leveraged to produce better results. In this paper, we extend the SVD technique further for scalability and performance in the context of 1) multi-threading, 2) multiple computational units (with the use of Graphical Processing Units), and 3) distributed computation. We propose block based matrix factorization (BMF) paired with SVD. This enabled us to take advantage of SVD over basic matrix factorization (MF) while taking advantage of parallelism and scalability through BMF. We used the Compute Unified Device Architecture (CUDA) platform and related hardware for leveraging Graphical Processing Unit (GPU) along with block based SVD to demonstrate the advantages in terms of performance and memory.
Tasks	Recommendation Systems
Published	2019-07-17
URL	https://arxiv.org/abs/1907.07410v1
PDF	https://arxiv.org/pdf/1907.07410v1.pdf
PWC	https://paperswithcode.com/paper/block-based-singular-value-decomposition
Repo
Framework

High-Throughput In-Memory Computing for Binary Deep Neural Networks with Monolithically Integrated RRAM and 90nm CMOS


Title	High-Throughput In-Memory Computing for Binary Deep Neural Networks with Monolithically Integrated RRAM and 90nm CMOS
Authors	Shihui Yin, Xiaoyu Sun, Shimeng Yu, Jae-sun Seo
Abstract	Deep learning hardware designs have been bottlenecked by conventional memories such as SRAM due to density, leakage and parallel computing challenges. Resistive devices can address the density and volatility issues, but have been limited by peripheral circuit integration. In this work, we demonstrate a scalable RRAM based in-memory computing design, termed XNOR-RRAM, which is fabricated in a 90nm CMOS technology with monolithic integration of RRAM devices between metal 1 and 2. We integrated a 128x64 RRAM array with CMOS peripheral circuits including row/column decoders and flash analog-to-digital converters (ADCs), which collectively become a core component for scalable RRAM-based in-memory computing towards large deep neural networks (DNNs). To maximize the parallelism of in-memory computing, we assert all 128 wordlines of the RRAM array simultaneously, perform analog computing along the bitlines, and digitize the bitline voltages using ADCs. The resistance distribution of low resistance states is tightened by write-verify scheme, and the ADC offset is calibrated. Prototype chip measurements show that the proposed design achieves high binary DNN accuracy of 98.5% for MNIST and 83.5% for CIFAR-10 datasets, respectively, with energy efficiency of 24 TOPS/W and 158 GOPS throughput. This represents 5.6X, 3.2X, 14.1X improvements in throughput, energy-delay product (EDP), and energy-delay-squared product (ED2P), respectively, compared to the state-of-the-art literature. The proposed XNOR-RRAM can enable intelligent functionalities for area-/energy-constrained edge computing devices.
Tasks
Published	2019-09-16
URL	https://arxiv.org/abs/1909.07514v1
PDF	https://arxiv.org/pdf/1909.07514v1.pdf
PWC	https://paperswithcode.com/paper/high-throughput-in-memory-computing-for
Repo
Framework
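
The arithmetic behind a binary (XNOR) dot product can be emulated digitally. This sketch only illustrates the identity that a dot product of ±1 vectors equals twice the number of XNOR matches minus the vector length; it does not model the analog bitline computation, the ADCs, or any other detail of the chip. The function name `xnor_popcount_columns` and the 0/1 encoding of -1/+1 are assumptions.

```python
import numpy as np

def xnor_popcount_columns(a_bits, w_bits):
    """Binary dot products of one input vector against every weight column.

    a_bits: (rows,) array of 0/1, encoding activations -1/+1
    w_bits: (rows, cols) array of 0/1, encoding weights -1/+1
    Returns a (cols,) integer array equal to the ±1 dot products.
    """
    a = np.asarray(a_bits, dtype=bool)[:, None]   # (rows, 1), broadcast over columns
    w = np.asarray(w_bits, dtype=bool)            # (rows, cols)
    # XNOR is 1 where the two bits agree; count agreements per column.
    matches = np.logical_not(np.logical_xor(a, w)).sum(axis=0)
    return 2 * matches - w.shape[0]

rng = np.random.default_rng(0)
a_bits = rng.integers(0, 2, size=128)        # one bit per wordline
w_bits = rng.integers(0, 2, size=(128, 64))  # same shape as the 128x64 array
out = xnor_popcount_columns(a_bits, w_bits)

# Reference: the same product computed directly on ±1 values.
ref = (2 * a_bits - 1) @ (2 * w_bits - 1)
print(np.array_equal(out, ref))
```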

SPARK: Spatial-aware Online Incremental Attack Against Visual Tracking


Title	SPARK: Spatial-aware Online Incremental Attack Against Visual Tracking
Authors	Qing Guo, Xiaofei Xie, Felix Juefei-Xu, Lei Ma, Zhongguo Li, Wanli Xue, Wei Feng, Yang Liu
Abstract	Adversarial attacks of deep neural networks have been intensively studied on image, audio, natural language, patch, and pixel classification tasks. Nevertheless, as a typical yet important real-world application, the adversarial attacks of online video object tracking that traces an object’s moving trajectory instead of its category are rarely explored. In this paper, we identify a new task for the adversarial attack to visual tracking: generating imperceptible perturbations online that mislead trackers along an incorrect (Untargeted Attack, UA) or specified trajectory (Targeted Attack, TA). To this end, we first propose a spatial-aware basic attack by adapting existing attack methods, i.e., FGSM, BIM, and C&W, and comprehensively analyze the attacking performance. We identify that online object tracking poses two new challenges: 1) it is difficult to generate imperceptible perturbations that can transfer across frames, and 2) real-time trackers require the attack to satisfy a certain level of efficiency. To address these challenges, we further propose the spatial-aware online incremental attack (a.k.a. SPARK) that performs spatial-temporal sparse incremental perturbations online and makes the adversarial attack less perceptible. In addition, as an optimization-based method, SPARK quickly converges to very small losses within several iterations by considering historical incremental perturbations, making it much more efficient than basic attacks. The in-depth evaluation on state-of-the-art trackers (i.e., SiamRPN++ with AlexNet, MobileNetv2, and ResNet-50, and SiamDW) on OTB100, VOT2018, UAV123, and LaSOT demonstrates the effectiveness and transferability of SPARK in misleading the trackers under both UA and TA with minor perturbations.
Tasks	Adversarial Attack, Object Tracking, Video Object Tracking, Visual Object Tracking, Visual Tracking
Published	2019-10-19
URL	https://arxiv.org/abs/1910.08681v3
PDF	https://arxiv.org/pdf/1910.08681v3.pdf
PWC	https://paperswithcode.com/paper/spatial-aware-online-adversarial
Repo
Framework

Establishing an Evaluation Metric to Quantify Climate Change Image Realism


Title	Establishing an Evaluation Metric to Quantify Climate Change Image Realism
Authors	Sharon Zhou, Alexandra Luccioni, Gautier Cosne, Michael S. Bernstein, Yoshua Bengio
Abstract	With success on controlled tasks, generative models are being increasingly applied to humanitarian applications [1,2]. In this paper, we focus on the evaluation of a conditional generative model that illustrates the consequences of climate change-induced flooding to encourage public interest and awareness on the issue. Because metrics for comparing the realism of different modes in a conditional generative model do not exist, we propose several automated and human-based methods for evaluation. To do this, we adapt several existing metrics, and assess the automated metrics against gold standard human evaluation. We find that using Fréchet Inception Distance (FID) with embeddings from an intermediary Inception-V3 layer that precedes the auxiliary classifier produces results most correlated with human realism. While insufficient alone to establish a human-correlated automatic evaluation metric, we believe this work begins to bridge the gap between human and automated generative evaluation procedures.
Tasks
Published	2019-10-22
URL	https://arxiv.org/abs/1910.10143v1
PDF	https://arxiv.org/pdf/1910.10143v1.pdf
PWC	https://paperswithcode.com/paper/establishing-an-evaluation-metric-to-quantify
Repo
Framework
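
For reference, the Fréchet distance between two Gaussians fitted to feature sets, which is the quantity FID computes, can be sketched in NumPy. Random arrays stand in for the Inception-V3 embeddings mentioned in the abstract, and the function name `frechet_distance` is hypothetical. The sketch uses the fact that the trace of the square root of a product of two covariance matrices equals the sum of the square roots of that product's eigenvalues.

```python
import numpy as np

def frechet_distance(feats_a, feats_b):
    """Fréchet distance between Gaussians fitted to two feature sets.

    feats_a, feats_b: (n_samples, n_features) arrays with n_features >= 2.
    Computes ||mu_a - mu_b||^2 + Tr(C_a + C_b - 2 (C_a C_b)^(1/2)).
    """
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    cov_a = np.cov(feats_a, rowvar=False)
    cov_b = np.cov(feats_b, rowvar=False)
    # Eigenvalues of a product of two PSD matrices are real and non-negative
    # in exact arithmetic; drop tiny imaginary or negative numerical noise.
    eigvals = np.linalg.eigvals(cov_a @ cov_b)
    trace_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    mean_term = np.sum((mu_a - mu_b) ** 2)
    return float(mean_term + np.trace(cov_a) + np.trace(cov_b) - 2.0 * trace_sqrt)

rng = np.random.default_rng(0)
real = rng.normal(size=(200, 4))   # stand-in for embeddings of real images
fake = real + 3.0                  # same covariance, mean shifted by 3 per dimension
print(frechet_distance(real, real))
print(frechet_distance(real, fake))
```

With identical covariances, only the mean term contributes, so the second value is the squared distance between the two means.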

LanCe: A Comprehensive and Lightweight CNN Defense Methodology against Physical Adversarial Attacks on Embedded Multimedia Applications


Title	LanCe: A Comprehensive and Lightweight CNN Defense Methodology against Physical Adversarial Attacks on Embedded Multimedia Applications
Authors	Zirui Xu, Fuxun Yu, Xiang Chen
Abstract	Recently, adversarial attacks can be applied to the physical world, causing practical issues to various Convolutional Neural Networks (CNNs) powered applications. Most existing physical adversarial attack defense works only focus on eliminating explicit perturbation patterns from inputs, ignoring interpretation to CNN’s intrinsic vulnerability. Therefore, they lack the expected versatility to different attacks and thereby depend on considerable data processing costs. In this paper, we propose LanCe – a comprehensive and lightweight CNN defense methodology against different physical adversarial attacks. By interpreting CNN’s vulnerability, we find that non-semantic adversarial perturbations can activate CNN with significantly abnormal activations and even overwhelm other semantic input patterns’ activations. We improve the CNN recognition process by adding a self-verification stage to detect the potential adversarial input with only one CNN inference cost. Based on the detection result, we further propose a data recovery methodology to defend the physical adversarial attacks. We apply this defense methodology to both image and audio CNN recognition scenarios and analyze the computational complexity for each scenario, respectively. Experiments show that our methodology can achieve an average 91% success rate for attack detection and 89% accuracy recovery. Moreover, it is at most 3x faster compared with the state-of-the-art defense methods, making it feasible for resource-constrained embedded systems, such as mobile devices.
Tasks	Adversarial Attack
Published	2019-10-17
URL	https://arxiv.org/abs/1910.08536v1
PDF	https://arxiv.org/pdf/1910.08536v1.pdf
PWC	https://paperswithcode.com/paper/lance-a-comprehensive-and-lightweight-cnn
Repo
Framework

Deep learning approach to control of prosthetic hands with electromyography signals


Title	Deep learning approach to control of prosthetic hands with electromyography signals
Authors	Mohsen Jafarzadeh, Daniel Curtiss Hussey, Yonas Tadesse
Abstract	Natural muscles provide mobility in response to nerve impulses. Electromyography (EMG) measures the electrical activity of muscles in response to a nerve’s stimulation. In the past few decades, EMG signals have been used extensively in the identification of user intention to potentially control assistive devices such as smart wheelchairs, exoskeletons, and prosthetic devices. In the design of conventional assistive devices, developers optimize multiple subsystems independently. Feature extraction and feature description are essential subsystems of this approach. Therefore, researchers proposed various hand-crafted features to interpret EMG signals. However, the performance of conventional assistive devices is still unsatisfactory. In this paper, we propose a deep learning approach to control prosthetic hands with raw EMG signals. We use a novel deep convolutional neural network to eschew the feature-engineering step. Removing the feature extraction and feature description is an important step toward the paradigm of end-to-end optimization. Fine-tuning and personalization are additional advantages of our approach. The proposed approach is implemented in Python with the TensorFlow deep learning library, and it runs in real-time in general-purpose graphics processing units of the NVIDIA Jetson TX2 developer kit. Our results demonstrate the ability of our system to predict finger positions from raw EMG signals. We anticipate our EMG-based control system to be a starting point to design more sophisticated prosthetic hands. For example, a pressure measurement unit can be added to transfer the perception of the environment to the user. Furthermore, our system can be modified for other prosthetic devices.
Tasks	Electromyography (EMG), Feature Engineering
Published	2019-09-21
URL	https://arxiv.org/abs/1909.09910v2
PDF	https://arxiv.org/pdf/1909.09910v2.pdf
PWC	https://paperswithcode.com/paper/190909910
Repo
Framework

MITAS: A Compressed Time-Domain Audio Separation Network with Parameter Sharing


Title	MITAS: A Compressed Time-Domain Audio Separation Network with Parameter Sharing
Authors	Chao-I Tuan, Yuan-Kuei Wu, Hung-yi Lee, Yu Tsao
Abstract	Deep learning methods have brought substantial advancements in speech separation (SS). Nevertheless, it remains challenging to deploy deep-learning-based models on edge devices. Thus, identifying an effective way to compress these large models without hurting SS performance has become an important research topic. Recently, TasNet and Conv-TasNet have been proposed. They achieved state-of-the-art results on several standardized SS tasks. Moreover, their low latency natures make them definitely suitable for real-time on-device applications. In this study, we propose two parameter-sharing schemes to lower the memory consumption on TasNet and Conv-TasNet. Accordingly, we derive a novel so-called MiTAS (Mini TasNet). Our experimental results first confirmed the robustness of our MiTAS on two types of perturbations in mixed audio. We also designed a series of ablation experiments to analyze the relation between SS performance and the amount of parameters in the model. The results show that MiTAS is able to reduce the model size by a factor of four while maintaining comparable SS performance with improved stability as compared to TasNet and Conv-TasNet. This suggests that MiTAS is more suitable for real-time low latency applications.
Tasks	Speech Separation
Published	2019-12-09
URL	https://arxiv.org/abs/1912.03884v1
PDF	https://arxiv.org/pdf/1912.03884v1.pdf
PWC	https://paperswithcode.com/paper/mitas-a-compressed-time-domain-audio
Repo
Framework

On Robustness of Neural Ordinary Differential Equations


Title	On Robustness of Neural Ordinary Differential Equations
Authors	Hanshu Yan, Jiawei Du, Vincent Y. F. Tan, Jiashi Feng
Abstract	Neural ordinary differential equations (ODEs) have been attracting increasing attention in various research domains recently. There have been some works studying optimization issues and approximation capabilities of neural ODEs, but their robustness remains unclear. In this work, we fill this important gap by exploring robustness properties of neural ODEs both empirically and theoretically. We first present an empirical study on the robustness of the neural ODE-based networks (ODENets) by exposing them to inputs with various types of perturbations and subsequently investigating the changes of the corresponding outputs. In contrast to conventional convolutional neural networks (CNNs), we find that the ODENets are more robust against both random Gaussian perturbations and adversarial attack examples. We then provide an insightful understanding of this phenomenon by exploiting a certain desirable property of the flow of a continuous-time ODE, namely that integral curves are non-intersecting. Our work suggests that, due to their intrinsic robustness, it is promising to use neural ODEs as a basic block for building robust deep network models. To further enhance the robustness of vanilla neural ODEs, we propose the time-invariant steady neural ODE (TisODE), which regularizes the flow on perturbed data via the time-invariant property and the imposition of a steady-state constraint. We show that the TisODE method outperforms vanilla neural ODEs and also can work in conjunction with other state-of-the-art architectural methods to build more robust deep networks.
Tasks	Adversarial Attack
Published	2019-10-12
URL	https://arxiv.org/abs/1910.05513v2
PDF	https://arxiv.org/pdf/1910.05513v2.pdf
PWC	https://paperswithcode.com/paper/on-robustness-of-neural-ordinary-differential
Repo
Framework

Demystifying TasNet: A Dissecting Approach


Title	Demystifying TasNet: A Dissecting Approach
Authors	Jens Heitkaemper, Darius Jakobeit, Christoph Boeddeker, Lukas Drude, Reinhold Haeb-Umbach
Abstract	In recent years, time domain speech separation has excelled over frequency domain separation in single channel scenarios and noise-free environments. In this paper we dissect the gains of the time-domain audio separation network (TasNet) approach by gradually replacing components of an utterance-level permutation invariant training (u-PIT) based separation system in the frequency domain until the TasNet system is reached, thus blending components of frequency domain approaches with those of time domain approaches. Some of the intermediate variants achieve comparable signal-to-distortion ratio (SDR) gains to TasNet, but retain the advantage of frequency domain processing: compatibility with classic signal processing tools such as frequency-domain beamforming and the human interpretability of the masks. Furthermore, we show that the scale invariant signal-to-distortion ratio (si-SDR) criterion used as loss function in TasNet is related to a logarithmic mean square error criterion and that it is this criterion which contributes most reliably to the performance advantage of TasNet. Finally, we critically assess which gains in a noise-free single channel environment generalize to more realistic reverberant conditions.
Tasks	Speech Separation
Published	2019-11-20
URL	https://arxiv.org/abs/1911.08895v2
PDF	https://arxiv.org/pdf/1911.08895v2.pdf
PWC	https://paperswithcode.com/paper/demystifying-tasnet-a-dissecting-approach
Repo
Framework
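
The si-SDR criterion named in the abstract has a short closed form: project the estimate onto the target, then take the log ratio of the projected energy to the residual energy. The sketch below follows that common definition. The function name `si_sdr` and the `eps` guard are assumptions, the signals are synthetic, and inputs are assumed to be zero-mean 1-D arrays.

```python
import numpy as np

def si_sdr(estimate, target, eps=1e-12):
    """Scale-invariant signal-to-distortion ratio in dB for 1-D signals."""
    estimate = np.asarray(estimate, dtype=float)
    target = np.asarray(target, dtype=float)
    # Optimal scaling of the target that best explains the estimate.
    alpha = np.dot(estimate, target) / (np.dot(target, target) + eps)
    projection = alpha * target
    distortion = estimate - projection
    ratio = (np.dot(projection, projection) + eps) / (np.dot(distortion, distortion) + eps)
    return 10.0 * np.log10(ratio)

rng = np.random.default_rng(0)
target = rng.normal(size=1000)
noise = rng.normal(size=1000)
estimate = target + 0.1 * noise

print(si_sdr(estimate, target))
print(si_sdr(3.0 * estimate, target))        # rescaling the estimate leaves it unchanged
print(si_sdr(target + 0.5 * noise, target))  # a noisier estimate scores lower
```

Used as a training loss, the negative of this value would be minimized; the logarithm is what links it to the logarithmic mean square error criterion the abstract refers to.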

Interpretability with Accurate Small Models


Title	Interpretability with Accurate Small Models
Authors	Abhishek Ghose, Balaraman Ravindran
Abstract	Models often need to be constrained to a certain size for them to be considered interpretable. For example, a decision tree of depth 5 is much easier to understand than one of depth 50. Limiting model size, however, often reduces accuracy. We suggest a practical technique that minimizes this trade-off between interpretability and classification accuracy. This enables an arbitrary learning algorithm to produce highly accurate small-sized models. Our technique identifies the training data distribution to learn from that leads to the highest accuracy for a model of a given size. We represent the training distribution as a combination of sampling schemes. Each scheme is defined by a parameterized probability mass function applied to the segmentation produced by a decision tree. An Infinite Mixture Model with Beta components is used to represent a combination of such schemes. The mixture model parameters are learned using Bayesian Optimization. Under simplistic assumptions, we would need to optimize for $O(d)$ variables for a distribution over a $d$-dimensional input space, which is cumbersome for most real-world data. However, we show that our technique significantly reduces this number to a fixed set of eight variables at the cost of relatively cheap preprocessing. The proposed technique is flexible: it is model-agnostic, i.e., it may be applied to the learning algorithm for any model family, and it admits a general notion of model size. We demonstrate its effectiveness using multiple real-world datasets to construct decision trees, linear probability models and gradient boosted models with different sizes. We observe significant improvements in the F1-score in most instances, exceeding an improvement of $100\%$ in some cases.
Tasks
Published	2019-05-04
URL	https://arxiv.org/abs/1905.01520v2
PDF	https://arxiv.org/pdf/1905.01520v2.pdf
PWC	https://paperswithcode.com/paper/optimal-resampling-for-learning-small-models
Repo
Framework