January 31, 2020

3382 words 16 mins read

Paper Group ANR 67

Bootstrapping Disjoint Datasets for Multilingual Multimodal Representation Learning. Incorporating Context and External Knowledge for Pronoun Coreference Resolution. Dynamic Vision Sensor integration on FPGA-based CNN accelerators for high-speed visual classification. An End-to-End Audio Classification System based on Raw Waveforms and Mix-Training …

Bootstrapping Disjoint Datasets for Multilingual Multimodal Representation Learning

Title Bootstrapping Disjoint Datasets for Multilingual Multimodal Representation Learning
Authors Ákos Kádár, Grzegorz Chrupała, Afra Alishahi, Desmond Elliott
Abstract Recent work has highlighted the advantage of jointly learning grounded sentence representations from multiple languages. However, the data used in these studies has been limited to an aligned scenario: the same images annotated with sentences in multiple languages. We focus on the more realistic disjoint scenario in which there is no overlap between the images in multilingual image–caption datasets. We confirm that training with aligned data results in better grounded sentence representations than training with disjoint data, as measured by image–sentence retrieval performance. In order to close this gap in performance, we propose a pseudopairing method to generate synthetically aligned English–German–image triplets from the disjoint sets. The method works by first training a model on the disjoint data, and then creating new triples across datasets using sentence similarity under the learned model. Experiments show that pseudopairs improve image–sentence retrieval performance compared to disjoint training, despite requiring no external data or models. However, we do find that using an external machine translation model to generate the synthetic data sets results in better performance.
Tasks Machine Translation, Representation Learning
Published 2019-11-09
URL https://arxiv.org/abs/1911.03678v1
PDF https://arxiv.org/pdf/1911.03678v1.pdf
PWC https://paperswithcode.com/paper/bootstrapping-disjoint-datasets-for
Repo
Framework
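
As a rough illustration of the pseudo-pairing step described above, the toy sketch below pairs each English caption’s image with the most similar German caption under a sentence encoder. The encoder here is a random-vector placeholder standing in for the model trained on the disjoint data, and all names and sizes are illustrative, not the paper’s setup.

```python
# Hypothetical sketch of pseudo-pairing: pair each English caption's image with the
# most similar German caption under a (placeholder) learned sentence encoder.
import numpy as np

rng = np.random.default_rng(0)

def encode(sentences):
    # Placeholder for the multilingual sentence encoder trained on the disjoint data;
    # here we just return random unit vectors of the right shape.
    vecs = rng.normal(size=(len(sentences), 128))
    return vecs / np.linalg.norm(vecs, axis=1, keepdims=True)

english = [("img_en_%d" % i, "an english caption %d" % i) for i in range(5)]
german = ["eine deutsche Bildunterschrift %d" % j for j in range(7)]

en_vecs = encode([caption for _, caption in english])
de_vecs = encode(german)

# Cosine similarity between every English and every German caption.
sims = en_vecs @ de_vecs.T

pseudo_triplets = []
for (image_id, en_caption), row in zip(english, sims):
    best_de = german[int(row.argmax())]          # most similar German caption
    pseudo_triplets.append((image_id, en_caption, best_de))

for triplet in pseudo_triplets:
    print(triplet)
```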

Incorporating Context and External Knowledge for Pronoun Coreference Resolution

Title Incorporating Context and External Knowledge for Pronoun Coreference Resolution
Authors Hongming Zhang, Yan Song, Yangqiu Song
Abstract Linking pronominal expressions to the correct references requires, in many cases, a careful analysis of the contextual information and external knowledge. In this paper, we propose a two-layer model for pronoun coreference resolution that leverages both context and external knowledge, where a knowledge attention mechanism is designed to ensure that the model draws on the appropriate source of external knowledge for a given context. Experimental results demonstrate the validity and effectiveness of our model, which outperforms state-of-the-art models by a large margin.
Tasks Coreference Resolution
Published 2019-05-24
URL https://arxiv.org/abs/1905.10238v1
PDF https://arxiv.org/pdf/1905.10238v1.pdf
PWC https://paperswithcode.com/paper/incorporating-context-and-external-knowledge
Repo
Framework
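
A minimal sketch of a knowledge-attention layer in the spirit of the abstract above: a context vector scores several external-knowledge feature vectors and a softmax decides how much each source contributes. The projection-based scoring, dimensions, and module name are assumptions, not the paper’s exact architecture.

```python
# Hedged sketch: attend over several external-knowledge vectors given a context vector.
import torch
import torch.nn as nn
import torch.nn.functional as F

class KnowledgeAttention(nn.Module):
    def __init__(self, context_dim, knowledge_dim):
        super().__init__()
        self.proj = nn.Linear(context_dim, knowledge_dim)  # map context into knowledge space

    def forward(self, context, knowledge):
        # context:   (batch, context_dim)
        # knowledge: (batch, num_sources, knowledge_dim), one row per knowledge source
        query = self.proj(context).unsqueeze(-1)                              # (batch, knowledge_dim, 1)
        weights = F.softmax(torch.bmm(knowledge, query).squeeze(-1), dim=-1)  # attention over sources
        return (weights.unsqueeze(-1) * knowledge).sum(dim=1)                 # weighted knowledge summary

layer = KnowledgeAttention(context_dim=64, knowledge_dim=32)
print(layer(torch.randn(2, 64), torch.randn(2, 3, 32)).shape)  # torch.Size([2, 32])
```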

Dynamic Vision Sensor integration on FPGA-based CNN accelerators for high-speed visual classification

Title Dynamic Vision Sensor integration on FPGA-based CNN accelerators for high-speed visual classification
Authors Alejandro Linares-Barranco, Antonio Rios-Navarro, Ricardo Tapiador-Morales, Tobi Delbruck
Abstract Deep learning is a cutting-edge technique being applied in many fields. For vision applications, Convolutional Neural Networks (CNNs) achieve significant accuracy on classification tasks. Numerous hardware accelerators have emerged in recent years to improve on CPU- or GPU-based solutions. This technology is commonly prototyped and tested on FPGAs before being considered for ASIC fabrication and mass production. The use of typical commercial cameras (30 fps) limits the capabilities of these systems in high-speed applications. Dynamic vision sensors (DVS), which emulate the behavior of a biological retina, are becoming increasingly important for such applications because of their nature: information is represented by a continuous stream of spikes, and the frames to be processed by the CNN are constructed by collecting a fixed number of these spikes (called events). The faster an object moves, the more events the DVS produces, and thus the higher the equivalent frame rate. Using a DVS therefore allows frames to be computed at the maximum speed a CNN accelerator can offer. In this paper we present a VHDL/HLS description of a pipelined FPGA design that collects events from an Address-Event-Representation (AER) DVS retina and builds a normalized histogram for a particular CNN accelerator, called NullHop. VHDL is used to describe the circuit, and HLS for the computation blocks that perform the frame normalization needed by the CNN. The results outperform previous implementations of frame collection and normalization on ARM processors running at 800 MHz on a Zynq7100, in both latency and power consumption. A measured 67% speedup factor is reported for a real-time Roshambo CNN experiment running at a 160 fps peak rate.
Tasks
Published 2019-05-17
URL https://arxiv.org/abs/1905.07419v1
PDF https://arxiv.org/pdf/1905.07419v1.pdf
PWC https://paperswithcode.com/paper/dynamic-vision-sensor-integration-on-fpga
Repo
Framework
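
The FPGA pipeline above builds frames from events in hardware; the software sketch below shows the same idea in Python: accumulate a fixed number of DVS address events into a 2D histogram and normalize it before handing it to the CNN. Sensor resolution and the events-per-frame count are illustrative placeholders, not the paper’s parameters.

```python
# Software sketch of event-to-frame conversion: accumulate a fixed number of DVS
# address events into a 2D histogram and normalize it to a fixed value range.
import numpy as np

HEIGHT, WIDTH = 128, 128          # illustrative sensor resolution
EVENTS_PER_FRAME = 2048           # frame is "full" after this many events (placeholder)

def events_to_frame(events):
    """events: iterable of (x, y, polarity) address events."""
    frame = np.zeros((HEIGHT, WIDTH), dtype=np.int32)
    for i, (x, y, polarity) in enumerate(events):
        frame[y, x] += 1 if polarity else -1
        if i + 1 == EVENTS_PER_FRAME:
            break
    # Normalize to [0, 1]; the faster the scene moves, the sooner this returns.
    lo, hi = frame.min(), frame.max()
    return (frame - lo) / max(hi - lo, 1)

rng = np.random.default_rng(0)
fake_events = zip(rng.integers(0, WIDTH, 4096), rng.integers(0, HEIGHT, 4096),
                  rng.integers(0, 2, 4096))
print(events_to_frame(fake_events).shape)   # (128, 128)
```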

An End-to-End Audio Classification System based on Raw Waveforms and Mix-Training Strategy

Title An End-to-End Audio Classification System based on Raw Waveforms and Mix-Training Strategy
Authors Jiaxu Chen, Jing Hao, Kai Chen, Di Xie, Shicai Yang, Shiliang Pu
Abstract Audio classification distinguishes different kinds of sounds, which is helpful for intelligent applications in daily life. However, it remains a challenging task because an audio clip may contain multiple, even overlapping, sound events. This paper introduces an end-to-end audio classification system based on raw waveforms and a mix-training strategy. Compared to the human-designed features that have been widely used in existing research, raw waveforms contain more complete information and are more appropriate for multi-label classification. Taking raw waveforms as input, our network consists of two variants of the ResNet structure that learn a discriminative representation. To exploit the information in intermediate layers, a multi-level prediction with attention structure is applied in our model. Furthermore, we design a mix-training strategy to break the performance limitation caused by the amount of training data. Experiments show that the mean average precision of the proposed audio classification system on the Audio Set dataset is 37.2%. Without using extra training data, our system exceeds the state-of-the-art multi-level attention model.
Tasks Audio Classification, Multi-Label Classification
Published 2019-11-21
URL https://arxiv.org/abs/1911.09349v1
PDF https://arxiv.org/pdf/1911.09349v1.pdf
PWC https://paperswithcode.com/paper/an-end-to-end-audio-classification-system
Repo
Framework
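
A hedged, mixup-style sketch of what a mix-training step on raw waveforms could look like: two clips are blended and their multi-label targets are merged. The paper’s exact mixing recipe is not spelled out in the abstract, so the Beta-sampled weight and the label union below are assumptions.

```python
# Mixup-style mixing of raw waveforms and their multi-label targets (illustrative only).
import numpy as np

rng = np.random.default_rng(0)

def mix_batch(waveforms, labels, alpha=0.5):
    """waveforms: (batch, samples) float array; labels: (batch, classes) multi-hot."""
    lam = rng.beta(alpha, alpha)
    perm = rng.permutation(len(waveforms))
    mixed_wave = lam * waveforms + (1 - lam) * waveforms[perm]
    # Multi-label targets: keep every sound event present in either clip.
    mixed_labels = np.clip(labels + labels[perm], 0, 1)
    return mixed_wave, mixed_labels

waves = rng.normal(size=(4, 16000)).astype(np.float32)     # 1 s of audio at 16 kHz
labels = rng.integers(0, 2, size=(4, 10)).astype(np.float32)
mixed_waves, mixed_labels = mix_batch(waves, labels)
print(mixed_waves.shape, mixed_labels.shape)
```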

Introducing the Hearthstone-AI Competition

Title Introducing the Hearthstone-AI Competition
Authors Alexander Dockhorn, Sanaz Mostaghim
Abstract The Hearthstone AI framework and competition motivate the development of artificial intelligence agents that can play collectible card games. A special feature of these games is the high variety of cards, which players can choose from to create their own decks. In contrast to simpler card games, the value of many cards is determined by their possible synergies. The vast number of possible decks, the randomness of the game, and the restricted information during the player’s turn pose quite a hard challenge for the development of game-playing agents. This short paper introduces the competition framework and goes into more detail on the problems and challenges that need to be faced during the development process.
Tasks Card Games
Published 2019-05-06
URL https://arxiv.org/abs/1906.04238v1
PDF https://arxiv.org/pdf/1906.04238v1.pdf
PWC https://paperswithcode.com/paper/introducing-the-hearthstone-ai-competition
Repo
Framework

Unified Multifaceted Feature Learning for Person Re-Identification

Title Unified Multifaceted Feature Learning for Person Re-Identification
Authors Cheng Yan, Guansong Pang, Xiao Bai, Chunhua Shen
Abstract Person re-identification (ReID) aims at re-identifying persons from different viewpoints across multiple cameras, for which it is important to learn multifaceted features expressed in different parts of a person, e.g., clothes, bags and other accessories on the main body, the appearance of the head, and the shoes on the feet. To learn such features, existing methods focus on the striping-based approach that builds multi-branch neural networks to learn local features in each part of the identities, with one branch dedicated to each part. This results in complex models with a large number of parameters. To address this issue, this paper proposes to learn the multifaceted features in a simple, unified single-branch neural network. The Unified Multifaceted Feature Learning (UMFL) framework is introduced to fulfill this goal, which consists of two key collaborative modules: compound batch image erasing (including batch constant erasing and random erasing) and a hierarchical structured loss. The loss structures the augmented images resulting from the two types of image erasing into a two-level hierarchy and enforces multifaceted attention to different parts. As we show in extensive experimental results on four benchmark person ReID datasets, despite using a significantly simplified network structure, our method performs substantially better than state-of-the-art competing methods. Our method can also effectively generalize to vehicle ReID, achieving similar improvement on two vehicle ReID datasets.
Tasks Person Re-Identification
Published 2019-11-20
URL https://arxiv.org/abs/1911.08651v2
PDF https://arxiv.org/pdf/1911.08651v2.pdf
PWC https://paperswithcode.com/paper/unified-multifaceted-feature-learning-for
Repo
Framework
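
To make the two erasing modes named above concrete, the sketch below blanks either the same region for every image in the batch (batch-constant erasing) or a different region per image (random erasing). Region sizes and the zero fill value are illustrative choices, not the paper’s settings.

```python
# Illustrative sketch of batch-constant erasing vs. per-image random erasing.
import numpy as np

rng = np.random.default_rng(0)

def erase(batch, h=32, w=32, batch_constant=True):
    """batch: (N, C, H, W) float images, erased in place and returned."""
    n, _, H, W = batch.shape
    if batch_constant:
        # Same rectangle blanked for every image in the batch.
        top, left = rng.integers(0, H - h), rng.integers(0, W - w)
        batch[:, :, top:top + h, left:left + w] = 0.0
    else:
        # Independent rectangle per image.
        for i in range(n):
            top, left = rng.integers(0, H - h), rng.integers(0, W - w)
            batch[i, :, top:top + h, left:left + w] = 0.0
    return batch

images = rng.normal(size=(8, 3, 256, 128)).astype(np.float32)
_ = erase(images.copy(), batch_constant=True)
_ = erase(images.copy(), batch_constant=False)
```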

Lifelong Machine Learning with Deep Streaming Linear Discriminant Analysis

Title Lifelong Machine Learning with Deep Streaming Linear Discriminant Analysis
Authors Tyler L. Hayes, Christopher Kanan
Abstract When an agent acquires new information, ideally it would immediately be capable of using that information to understand its environment. This is not possible using conventional deep neural networks, which suffer from catastrophic forgetting when they are incrementally updated, with new knowledge overwriting established representations. A variety of approaches have been developed that attempt to mitigate catastrophic forgetting in the incremental batch learning scenario, where a model learns from a series of large collections of labeled samples. However, in this setting, inference is only possible after a batch has been accumulated, which prohibits many applications. An alternative paradigm is online learning in a single pass through the training dataset on a resource-constrained budget, which is known as streaming learning. Streaming learning has been much less studied in the deep learning community. In streaming learning, an agent learns instances one-by-one and can be tested at any time, rather than only after learning a large batch. Here, we revisit streaming linear discriminant analysis, which has been widely used in the data mining research community. By combining streaming linear discriminant analysis with deep learning, we are able to outperform both incremental batch learning and streaming learning algorithms on both ImageNet ILSVRC-2012 and CORe50, a dataset that involves learning to classify from temporally ordered samples.
Tasks
Published 2019-09-04
URL https://arxiv.org/abs/1909.01520v2
PDF https://arxiv.org/pdf/1909.01520v2.pdf
PWC https://paperswithcode.com/paper/lifelong-machine-learning-with-deep-streaming
Repo
Framework
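
A compact sketch of the streaming LDA idea over fixed deep features: per-class running means plus a shared running covariance, updated one example at a time, with the usual LDA decision rule. The shrinkage term, update order, and toy data below are assumptions for illustration, not the authors’ exact implementation.

```python
# Streaming LDA sketch: one-example-at-a-time updates, standard LDA decision rule.
import numpy as np

class StreamingLDA:
    def __init__(self, dim, num_classes, shrinkage=1e-2):
        self.means = np.zeros((num_classes, dim))   # running per-class means
        self.counts = np.zeros(num_classes)
        self.cov = np.zeros((dim, dim))             # shared running covariance
        self.seen = 0
        self.shrinkage = shrinkage

    def fit_one(self, x, y):
        # Update the shared covariance with the residual from the current class mean.
        delta = x - self.means[y]
        self.seen += 1
        self.cov += (np.outer(delta, delta) - self.cov) / self.seen
        # Then update the running mean of class y.
        self.counts[y] += 1
        self.means[y] += delta / self.counts[y]

    def predict(self, x):
        prec = np.linalg.inv(self.cov + self.shrinkage * np.eye(len(x)))
        w = self.means @ prec                             # per-class discriminant weights
        b = -0.5 * np.einsum("cd,cd->c", w, self.means)   # per-class biases
        return int(np.argmax(w @ x + b))

rng = np.random.default_rng(0)
slda = StreamingLDA(dim=16, num_classes=3)
for _ in range(300):
    y = rng.integers(0, 3)
    slda.fit_one(rng.normal(size=16) + 3 * y, y)    # well-separated toy classes
print(slda.predict(rng.normal(size=16) + 3 * 2))    # usually prints 2
```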

Shapley Interpretation and Activation in Neural Networks

Title Shapley Interpretation and Activation in Neural Networks
Authors Yadong Li, Xin Cui
Abstract We propose a novel Shapley value approach to help address neural networks’ interpretability and “vanishing gradient” problems. Our method is based on an accurate analytical approximation to the Shapley value of a neuron with ReLU activation. This analytical approximation admits a linear propagation of relevance across neural network layers, resulting in a simple, fast and sensible interpretation of the neural network’s decision-making process. We then derive a globally continuous and non-vanishing Shapley gradient, which can replace the conventional gradient in training neural network layers with ReLU activation, leading to better training performance. We further derive a Shapley Activation (SA) function, which is a close approximation to ReLU but features the Shapley gradient. The SA is easy to implement in existing machine learning frameworks. Numerical tests show that SA consistently outperforms ReLU in training convergence, accuracy and stability.
Tasks Decision Making
Published 2019-09-13
URL https://arxiv.org/abs/1909.06143v2
PDF https://arxiv.org/pdf/1909.06143v2.pdf
PWC https://paperswithcode.com/paper/shapley-interpretation-and-activation-in
Repo
Framework
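
The abstract does not give the closed form of the Shapley Activation or its gradient, so the sketch below only illustrates the general idea of keeping a ReLU-like forward pass while back-propagating a smooth, non-vanishing surrogate gradient; the sigmoid-shaped surrogate is an assumption, not the paper’s formula.

```python
# Generic illustration: ReLU-like forward pass, smooth non-vanishing surrogate gradient.
import torch

class ReLUWithSmoothGrad(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return torch.relu(x)

    @staticmethod
    def backward(ctx, grad_out):
        (x,) = ctx.saved_tensors
        # Sigmoid-shaped surrogate: close to the ReLU step but never exactly zero,
        # so gradients keep flowing through inactive units.
        return grad_out * torch.sigmoid(4.0 * x)

x = torch.randn(5, requires_grad=True)
ReLUWithSmoothGrad.apply(x).sum().backward()
print(x.grad)   # nonzero gradient even for negative inputs
```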

Election Coding for Distributed Learning: Protecting SignSGD against Byzantine Attacks

Title Election Coding for Distributed Learning: Protecting SignSGD against Byzantine Attacks
Authors Jy-yong Sohn, Dong-Jun Han, Beongjun Choi, Jaekyun Moon
Abstract Recent advances in large-scale distributed learning algorithms have enabled communication-efficient training via SIGNSGD. Unfortunately, a major issue continues to plague distributed learning: namely, Byzantine failures may incur serious degradation in learning accuracy. This paper proposes ELECTION CODING, a coding-theoretic framework to guarantee Byzantine-robustness for SIGNSGD WITH MAJORITY VOTE, which uses minimal worker-master communication in both directions. The suggested framework explores new information-theoretic limits of finding the majority opinion when some workers could be malicious, and paves the way toward robust and efficient distributed learning algorithms. Under this framework, we construct two types of explicit codes, random Bernoulli codes and deterministic algebraic codes, that can tolerate Byzantine attacks with a controlled amount of computational redundancy. For the Bernoulli codes, we provide upper bounds on the error probability in estimating the majority opinion, which give useful insights into code design for tolerating Byzantine attacks. As for the deterministic codes, we construct an explicit code that perfectly tolerates Byzantines, and provide tight upper/lower bounds on the minimum required computational redundancy. Finally, the Byzantine tolerance of the suggested coding schemes is confirmed by deep learning experiments on Amazon EC2 using Python with the MPI4py package.
Tasks
Published 2019-10-14
URL https://arxiv.org/abs/1910.06093v1
PDF https://arxiv.org/pdf/1910.06093v1.pdf
PWC https://paperswithcode.com/paper/election-coding-for-distributed-learning
Repo
Framework
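
A toy simulation of SignSGD with majority vote under Byzantine workers, where a Bernoulli assignment of data partitions to workers stands in, as a simplification, for the paper’s explicit Election Coding constructions; all sizes below are illustrative.

```python
# Toy SignSGD-with-majority-vote simulation under Byzantine workers.
import numpy as np

rng = np.random.default_rng(0)
DIM, PARTITIONS, WORKERS, BYZANTINE, P_ASSIGN = 8, 10, 15, 3, 0.4

true_grads = rng.normal(size=(PARTITIONS, DIM))          # per-partition gradients
assign = rng.random((WORKERS, PARTITIONS)) < P_ASSIGN    # Bernoulli assignment matrix

votes = []
for w in range(WORKERS):
    parts = true_grads[assign[w]]
    local = np.sign(parts.sum(axis=0)) if len(parts) else np.zeros(DIM)
    if w < BYZANTINE:                                     # adversary flips its vote
        local = -local
    votes.append(local)

majority = np.sign(np.sum(votes, axis=0))                 # coordinate-wise majority vote
reference = np.sign(true_grads.sum(axis=0))               # sign of the true aggregate gradient
print("agreement:", float((majority == reference).mean()))
```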

AdderNet: Do We Really Need Multiplications in Deep Learning?

Title AdderNet: Do We Really Need Multiplications in Deep Learning?
Authors Hanting Chen, Yunhe Wang, Chunjing Xu, Boxin Shi, Chao Xu, Qi Tian, Chang Xu
Abstract Compared with the cheap addition operation, multiplication has much higher computational complexity. The widely used convolutions in deep neural networks are exactly cross-correlations that measure the similarity between input features and convolution filters, which involves massive multiplications between floating-point values. In this paper, we present adder networks (AdderNets) to trade these massive multiplications in deep neural networks, especially convolutional neural networks (CNNs), for much cheaper additions to reduce computation costs. In AdderNets, we take the $\ell_1$-norm distance between filters and the input feature as the output response. The influence of this new similarity measure on the optimization of the neural network has been thoroughly analyzed. To achieve better performance, we develop a special back-propagation approach for AdderNets by investigating the full-precision gradient. We then propose an adaptive learning rate strategy to enhance the training procedure of AdderNets according to the magnitude of each neuron’s gradient. As a result, the proposed AdderNets achieve 74.9% Top-1 accuracy and 91.7% Top-5 accuracy using ResNet-50 on the ImageNet dataset without any multiplications in the convolution layers.
Tasks
Published 2019-12-31
URL https://arxiv.org/abs/1912.13200v3
PDF https://arxiv.org/pdf/1912.13200v3.pdf
PWC https://paperswithcode.com/paper/addernet-do-we-really-need-multiplications-in
Repo
Framework
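
A direct, deliberately slow sketch of the AdderNet similarity: each output value is the negated ℓ1 distance between a filter and the corresponding input patch, so the sliding-window step uses only additions and subtractions. Shapes are illustrative and no optimized kernels are attempted.

```python
# Naive l1-distance "convolution" in the AdderNet style (no multiplications in the window step).
import numpy as np

def adder_conv2d(x, filters):
    """x: (C, H, W), filters: (K, C, h, w) -> (K, H-h+1, W-w+1)."""
    C, H, W = x.shape
    K, _, h, w = filters.shape
    out = np.zeros((K, H - h + 1, W - w + 1))
    for k in range(K):
        for i in range(H - h + 1):
            for j in range(W - w + 1):
                patch = x[:, i:i + h, j:j + w]
                # Negated l1 distance plays the role of the usual dot-product response.
                out[k, i, j] = -np.abs(patch - filters[k]).sum()
    return out

rng = np.random.default_rng(0)
response = adder_conv2d(rng.normal(size=(3, 8, 8)), rng.normal(size=(4, 3, 3, 3)))
print(response.shape)   # (4, 6, 6)
```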

A Cyclically-Trained Adversarial Network for Invariant Representation Learning

Title A Cyclically-Trained Adversarial Network for Invariant Representation Learning
Authors Jiawei Chen, Janusz Konrad, Prakash Ishwar
Abstract We propose a cyclically-trained adversarial network to learn mappings from image space to a latent representation space and back such that the latent representation is invariant to a specified factor of variation (e.g., identity). The learned mappings also assure that the synthesized image is not only realistic, but has the same values for unspecified factors (e.g., pose and illumination) as the original image and a desired value of the specified factor. We encourage invariance to a specified factor, by applying adversarial training using a variational autoencoder in the image space as opposed to the latent space. We strengthen this invariance by introducing a cyclic training process (forward and backward pass). We also propose a new method to evaluate conditional generative networks. It compares how well different factors of variation can be predicted from the synthesized, as opposed to real, images. We demonstrate the effectiveness of our approach on factors such as identity, pose, illumination or style on three datasets and compare it with state-of-the-art methods. Our network produces good quality synthetic images and, interestingly, can be used to perform face morphing in latent space.
Tasks Representation Learning
Published 2019-06-21
URL https://arxiv.org/abs/1906.09313v1
PDF https://arxiv.org/pdf/1906.09313v1.pdf
PWC https://paperswithcode.com/paper/a-cyclically-trained-adversarial-network-for
Repo
Framework

Ethanos: Lightweight Bootstrapping for Ethereum

Title Ethanos: Lightweight Bootstrapping for Ethereum
Authors Jae-Yun Kim, Jun-Mo Lee, Yeon-Jae Koo, Sang-Hyeon Park, Soo-Mook Moon
Abstract As the Ethereum blockchain has become popular, the number of users and transactions has skyrocketed, causing an explosive increase in its data size. As a result, ordinary clients using PCs or smartphones cannot easily bootstrap as full nodes and instead rely on other full nodes, such as the miners, to run or verify transactions. This may affect the security of Ethereum, so light bootstrapping techniques such as fast sync have been proposed to download only part of the full data, yet the space overhead is still too high. One of the biggest space overheads, which cannot easily be reduced, is caused by saving the state of all accounts in the block’s state trie. Fortunately, we found that more than 90% of accounts are inactive and that old transactions are hard to manipulate. Based on these observations, this paper proposes a novel optimization technique called Ethanos that can reduce the bootstrapping cost by sweeping inactive accounts periodically and by not downloading old transactions. If an inactive account becomes active again, Ethanos restores its state by running a restoration transaction. Ethanos also gives archive nodes incentives to maintain the old transactions for possible re-verification. We implemented Ethanos by instrumenting the go-ethereum (geth) client and evaluated it with the 113 million real transactions from 14 million accounts between the 7M-th and 8M-th blocks in Ethereum. Our experimental results show that Ethanos can reduce the size of the account state by half, which, combined with removing old transactions, may reduce the storage required for bootstrapping to around 1 GB. This would be reasonable enough for ordinary clients to bootstrap on their personal devices.
Tasks
Published 2019-11-14
URL https://arxiv.org/abs/1911.05953v1
PDF https://arxiv.org/pdf/1911.05953v1.pdf
PWC https://paperswithcode.com/paper/ethanos-lightweight-bootstrapping-for
Repo
Framework
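
A toy model of the account-sweeping idea: accounts untouched for a full sweep period are dropped from the active state, and a restoration step re-inserts them when they become active again. The period, addresses, and balances below are placeholders, not Ethanos’s actual protocol parameters.

```python
# Toy sketch of periodic sweeping of inactive accounts plus restoration.
SWEEP_PERIOD = 100   # blocks between sweeps (placeholder value)

state = {"0xA": {"balance": 10, "last_touched": 5},
         "0xB": {"balance": 3, "last_touched": 180}}
archive = {}          # what an archive node would keep for later re-verification

def sweep(state, current_block):
    """Move accounts inactive for a full period out of the bootstrap state."""
    for addr in [a for a, acc in state.items()
                 if current_block - acc["last_touched"] >= SWEEP_PERIOD]:
        archive[addr] = state.pop(addr)

def restore(addr, current_block):
    """Restoration transaction: bring a swept account back into the active state."""
    account = archive.pop(addr)
    account["last_touched"] = current_block
    state[addr] = account

sweep(state, current_block=200)      # 0xA is swept, 0xB stays active
restore("0xA", current_block=210)    # 0xA becomes active again
print(sorted(state))                 # ['0xA', '0xB']
```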

A Fourier Analytical Approach to Estimation of Smooth Functions in Gaussian Shift Model

Title A Fourier Analytical Approach to Estimation of Smooth Functions in Gaussian Shift Model
Authors Fan Zhou, Ping Li
Abstract We study the estimation of $f(\theta)$ under the Gaussian shift model $x = \theta + \xi$, where $\theta \in \mathbb{R}^d$ is an unknown parameter, $\xi \sim \mathcal{N}(\mathbf{0},\Sigma)$ is the random noise with covariance matrix $\Sigma$, and $f$ is a given function belonging to a Besov space with smoothness index $s>1$. Let $\sigma^2 = \|\Sigma\|_{\mathrm{op}}$ be the operator norm of $\Sigma$ and $\sigma^{-2\alpha} = \mathbf{r}(\Sigma)$ be its effective rank, for some $0<\alpha<1$ and $\sigma>0$. We develop a new estimator $g(x)$ based on a Fourier analytical approach that achieves effective bias reduction. We show that when the intrinsic dimension of the problem is large enough that nontrivial bias reduction is needed, the mean squared error (MSE) rate of $g(x)$ is $O\big(\sigma^2 \vee \sigma^{2(1-\alpha)s}\big)$ as $\sigma\rightarrow 0$. By developing new methods to establish minimax lower bounds under the standard Gaussian shift model, we show that this rate is indeed minimax optimal and so is $g(x)$. The minimax rate implies a sharp threshold on the smoothness $s$ such that only for $f$ with smoothness above the threshold can $f(\theta)$ be estimated efficiently with an MSE rate of order $O(\sigma^2)$. Normal approximation and asymptotic efficiency are proved for $g(x)$ under mild restrictions. Furthermore, we propose a data-driven procedure to build an adaptive estimator when the covariance matrix $\Sigma$ is unknown. Numerical simulations are presented to validate our analysis. The simplicity of implementation and its superiority over the plug-in approach indicate that the new estimator can be applied to a broad range of real-world applications.
Tasks
Published 2019-11-05
URL https://arxiv.org/abs/1911.02010v1
PDF https://arxiv.org/pdf/1911.02010v1.pdf
PWC https://paperswithcode.com/paper/a-fourier-analytical-approach-to-estimation
Repo
Framework
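
To make the setup concrete, the Monte Carlo below simulates the Gaussian shift model and the naive plug-in estimate $f(x)$ whose bias the paper’s Fourier-analytical estimator $g(x)$ is designed to remove; $g(x)$ itself is not reproduced here, and the test functional and noise level are arbitrary.

```python
# Toy Monte Carlo of the Gaussian shift model x = theta + xi and the plug-in estimate f(x).
import numpy as np

rng = np.random.default_rng(0)
d, sigma, trials = 50, 0.3, 2000

theta = rng.normal(size=d)
f = lambda t: np.sum(np.sin(t))                  # a smooth test functional (arbitrary choice)

errors = []
for _ in range(trials):
    x = theta + sigma * rng.normal(size=d)       # one noisy observation of theta
    errors.append(f(x) - f(theta))               # plug-in error for this draw
errors = np.array(errors)
print("plug-in bias ~ %.4f, MSE ~ %.4f" % (errors.mean(), (errors ** 2).mean()))
```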

Distribution Context Aware Loss for Person Re-identification

Title Distribution Context Aware Loss for Person Re-identification
Authors Zhigang Chang, Qin Zhou, Mingyang Yu, Shibao Zheng, Hua Yang, Tai-Pang Wu
Abstract To learn the optimal similarity function between probe and gallery images in person re-identification, effective deep metric learning methods have been extensively explored to obtain discriminative feature embeddings. However, existing metric losses such as the triplet loss and its variants always emphasize pair-wise relations but ignore the distribution context in feature space, leading to inconsistency and sub-optimal results. In fact, the similarity of one pair not only decides the match of that pair, but also has potential impacts on other sample pairs. In this paper, we propose a novel Distribution Context Aware (DCA) loss based on the triplet loss that combines both numerical similarity and relation similarity in feature space for better clustering. Extensive experiments on three benchmarks, including Market-1501, DukeMTMC-reID and MSMT17, evidence the favorable performance of our method against the corresponding baseline and other state-of-the-art methods.
Tasks Metric Learning, Person Re-Identification
Published 2019-11-17
URL https://arxiv.org/abs/1911.07273v1
PDF https://arxiv.org/pdf/1911.07273v1.pdf
PWC https://paperswithcode.com/paper/distribution-context-aware-loss-for-person-re
Repo
Framework
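
The DCA loss itself is not spelled out in the abstract, so the sketch below shows only the batch-hard triplet baseline it builds on; a distribution-context term would be added on top of this pairwise loss. The margin, mining rule, and toy embeddings are illustrative.

```python
# Batch-hard triplet loss baseline (the pairwise loss that DCA extends).
import torch
import torch.nn.functional as F

def batch_hard_triplet(features, labels, margin=0.3):
    """features: (N, D) embeddings, labels: (N,) identity ids."""
    dist = torch.cdist(features, features)                   # pairwise Euclidean distances
    same = labels.unsqueeze(0) == labels.unsqueeze(1)
    pos = (dist * same.float()).max(dim=1).values             # hardest positive per anchor
    neg = (dist + 1e6 * same.float()).min(dim=1).values       # hardest negative per anchor
    return F.relu(pos - neg + margin).mean()

feats = F.normalize(torch.randn(8, 128), dim=1)
ids = torch.tensor([0, 0, 1, 1, 2, 2, 3, 3])
print(batch_hard_triplet(feats, ids))
```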

Ranking-based Deep Cross-modal Hashing

Title Ranking-based Deep Cross-modal Hashing
Authors Xuanwu Liu, Guoxian Yu, Carlotta Domeniconi, Jun Wang, Yazhou Ren, Maozu Guo
Abstract Cross-modal hashing has been receiving increasing interest for its low storage cost and fast query speed in multi-modal data retrieval. However, most existing hashing methods are based on hand-crafted or raw-level features of objects, which may not be optimally compatible with the coding process. Besides, these hashing methods are mainly designed to handle simple pairwise similarity; the complex multilevel ranking semantic structure of instances associated with multiple labels has not been well explored yet. In this paper, we propose a ranking-based deep cross-modal hashing approach (RDCMH). RDCMH first uses the feature and label information of the data to derive a semi-supervised semantic ranking list. Next, to expand the semantic representation power of hand-crafted features, RDCMH integrates the semantic ranking information into deep cross-modal hashing and jointly optimizes the compatible parameters of the deep feature representations and of the hashing functions. Experiments on real multi-modal datasets show that RDCMH outperforms other competitive baselines and achieves state-of-the-art performance in cross-modal retrieval applications.
Tasks Cross-Modal Retrieval
Published 2019-05-11
URL https://arxiv.org/abs/1905.04450v1
PDF https://arxiv.org/pdf/1905.04450v1.pdf
PWC https://paperswithcode.com/paper/ranking-based-deep-cross-modal-hashing
Repo
Framework
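
As background for the retrieval side, the sketch below shows the generic mechanics of cross-modal hashing rather than RDCMH’s ranking-based training: both modalities are mapped to short binary codes and gallery items are ranked by Hamming distance to the query code. The code length and sign-threshold encoders are placeholders.

```python
# Generic cross-modal hashing retrieval: binary codes plus Hamming ranking.
import numpy as np

rng = np.random.default_rng(0)
BITS = 32

def to_code(real_valued):
    # Sign-threshold a real-valued embedding into a {0, 1} binary code.
    return (real_valued > 0).astype(np.uint8)

image_codes = to_code(rng.normal(size=(100, BITS)))   # gallery: image modality
text_query = to_code(rng.normal(size=BITS))           # query: text modality

hamming = (image_codes != text_query).sum(axis=1)     # distance to every gallery code
ranking = np.argsort(hamming)                          # best matches first
print(ranking[:5], hamming[ranking[:5]])
```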