January 31, 2020

3300 words · 16 min read

Paper Group AWR 447

ALFA: Agglomerative Late Fusion Algorithm for Object Detection. Improving Deep Lesion Detection Using 3D Contextual and Spatial Attention. Towards Scalable Multi-domain Conversational Agents: The Schema-Guided Dialogue Dataset. Semantic Matching of Documents from Heterogeneous Collections: A Simple and Transparent Method for Practical Applications. …

ALFA: Agglomerative Late Fusion Algorithm for Object Detection

Title ALFA: Agglomerative Late Fusion Algorithm for Object Detection
Authors Evgenii Razinkov, Iuliia Saveleva, Jiři Matas
Abstract We propose ALFA, a novel late fusion algorithm for object detection. ALFA is based on agglomerative clustering of object detector predictions, taking into consideration both the bounding box locations and the class scores. Each cluster represents a single object hypothesis whose location is a weighted combination of the clustered bounding boxes. ALFA was evaluated using combinations of a pair (SSD and DeNet) and a triplet (SSD, DeNet and Faster R-CNN) of recent object detectors that are close to the state of the art. ALFA achieves state-of-the-art results on PASCAL VOC 2007 and PASCAL VOC 2012, outperforming the individual detectors as well as baseline combination strategies, achieving up to 32% lower error than the best individual detectors and up to 6% lower error than the reference fusion algorithm DBF (Dynamic Belief Fusion).
Tasks Object Detection
Published 2019-07-13
URL https://arxiv.org/abs/1907.06067v1
PDF https://arxiv.org/pdf/1907.06067v1.pdf
PWC https://paperswithcode.com/paper/alfa-agglomerative-late-fusion-algorithm-for
Repo https://github.com/IuliiaSaveleva/ALFA
Framework none
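The fusion step described in the abstract can be sketched in a few lines. This is an illustrative stand-in, not the authors' implementation: it uses greedy single-link clustering by IoU and keeps the maximum cluster score, whereas ALFA's exact clustering criterion and score aggregation differ.

```python
# Hypothetical sketch of agglomerative late fusion: cluster detections
# from multiple detectors by bounding-box overlap, then fuse each
# cluster into a single score-weighted box.

def iou(a, b):
    # Intersection-over-union of two boxes given as (x1, y1, x2, y2).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter + 1e-9)

def fuse_detections(dets, iou_thr=0.5):
    """dets: list of (box, score) from all detectors. Greedy single-link
    clustering by IoU; each cluster becomes one object hypothesis whose
    box is the score-weighted average of its members."""
    clusters = []
    for box, score in sorted(dets, key=lambda d: -d[1]):
        for c in clusters:
            if any(iou(box, b) >= iou_thr for b, _ in c):
                c.append((box, score))
                break
        else:
            clusters.append([(box, score)])
    fused = []
    for c in clusters:
        w = sum(s for _, s in c)
        avg = tuple(sum(b[i] * s for b, s in c) / w for i in range(4))
        fused.append((avg, max(s for _, s in c)))
    return fused
```

Two heavily overlapping detections collapse into one weighted hypothesis, while a distant box survives as its own cluster.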

Improving Deep Lesion Detection Using 3D Contextual and Spatial Attention

Title Improving Deep Lesion Detection Using 3D Contextual and Spatial Attention
Authors Qingyi Tao, Zongyuan Ge, Jianfei Cai, Jianxiong Yin, Simon See
Abstract Lesion detection from computed tomography (CT) scans is challenging compared to natural object detection for two major reasons: small lesion size and small inter-class variation. Firstly, lesions usually occupy only a small region of the CT image, and the features of such a small region may not provide sufficient information due to limited spatial resolution. Secondly, in CT scans, lesions are often indistinguishable from the background, since lesion and non-lesion areas may have very similar appearances. To tackle both problems, we need to enrich the feature representation and improve feature discriminativeness. We therefore introduce a dual-attention mechanism into the 3D contextual lesion detection framework: a cross-slice contextual attention that selectively aggregates information from different slices through a soft re-sampling process, and an intra-slice spatial attention that focuses feature learning on the most prominent regions. Our method can be easily trained end-to-end without adding heavy overhead to the base detection network. We use the DeepLesion dataset and train a universal lesion detector to detect all kinds of lesions, such as liver tumors and lung nodules. The results show that our model significantly boosts the results of the baseline lesion detector (with 3D contextual information) while using far fewer slices.
Tasks Computed Tomography (CT), Object Detection
Published 2019-07-09
URL https://arxiv.org/abs/1907.04052v1
PDF https://arxiv.org/pdf/1907.04052v1.pdf
PWC https://paperswithcode.com/paper/improving-deep-lesion-detection-using-3d
Repo https://github.com/truetqy/lesion_det_dual_att
Framework mxnet
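The intra-slice spatial attention amounts to re-weighting a feature map by a squashed per-location score. A hedged stand-in, with toy values in place of the paper's learned score map:

```python
# Sketch of spatial attention: a per-location score is passed through a
# sigmoid and multiplied into the feature map, so learning concentrates
# on prominent regions. The score map here is illustrative, not learned.
import math

def spatial_attention(feature_map, score_map):
    """Element-wise: out[i][j] = sigmoid(score[i][j]) * feature[i][j]."""
    sig = lambda z: 1.0 / (1.0 + math.exp(-z))
    return [[sig(s) * f for f, s in zip(frow, srow)]
            for frow, srow in zip(feature_map, score_map)]
```

A score of zero halves a feature (sigmoid(0) = 0.5), while strongly positive scores pass features through almost unchanged.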

Towards Scalable Multi-domain Conversational Agents: The Schema-Guided Dialogue Dataset

Title Towards Scalable Multi-domain Conversational Agents: The Schema-Guided Dialogue Dataset
Authors Abhinav Rastogi, Xiaoxue Zang, Srinivas Sunkara, Raghav Gupta, Pranav Khaitan
Abstract Virtual assistants such as Google Assistant, Alexa and Siri provide a conversational interface to a large number of services and APIs spanning multiple domains. Such systems need to support an ever-increasing number of services with possibly overlapping functionality. Furthermore, some of these services have little to no training data available. Existing public datasets for task-oriented dialogue do not sufficiently capture these challenges since they cover few domains and assume a single static ontology per domain. In this work, we introduce the Schema-Guided Dialogue (SGD) dataset, containing over 16k multi-domain conversations spanning 16 domains. Our dataset exceeds the existing task-oriented dialogue corpora in scale, while also highlighting the challenges associated with building large-scale virtual assistants. It provides a challenging testbed for a number of tasks including language understanding, slot filling, dialogue state tracking and response generation. Along the same lines, we present a schema-guided paradigm for task-oriented dialogue, in which predictions are made over a dynamic set of intents and slots, provided as input, using their natural language descriptions. This allows a single dialogue system to easily support a large number of services and facilitates simple integration of new services without requiring additional training data. Building upon the proposed paradigm, we release a model for dialogue state tracking capable of zero-shot generalization to new APIs, while remaining competitive in the regular setting.
Tasks Dialogue State Tracking, Slot Filling
Published 2019-09-12
URL https://arxiv.org/abs/1909.05855v2
PDF https://arxiv.org/pdf/1909.05855v2.pdf
PWC https://paperswithcode.com/paper/towards-scalable-multi-domain-conversational
Repo https://github.com/google-research-datasets/dstc8-schema-guided-dialogue
Framework none
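The schema-guided paradigm can be illustrated compactly: the intent set is supplied at inference time as natural language descriptions, so an unseen service needs no retraining. In this sketch a bag-of-words cosine similarity stands in for the paper's learned encoder, and the schema entries are invented examples:

```python
# Minimal schema-guided prediction sketch: score each candidate intent
# by the similarity between the utterance and the intent's natural
# language description. Swapping the schema swaps the supported intents.
import math
from collections import Counter

def bow_cosine(a, b):
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    na = math.sqrt(sum(v * v for v in va.values()))
    nb = math.sqrt(sum(v * v for v in vb.values()))
    return dot / (na * nb or 1.0)

def predict_intent(utterance, schema):
    """schema: {intent_name: description}. The intent set is dynamic
    input, not baked into the model."""
    return max(schema, key=lambda i: bow_cosine(utterance, schema[i]))

schema = {
    "BookFlight": "reserve a seat on a flight to a destination",
    "FindRestaurant": "search for a restaurant to eat at",
}
print(predict_intent("search for a cheap restaurant", schema))  # -> FindRestaurant
```

A real system replaces the bag-of-words scorer with a trained sentence encoder, which is what enables the zero-shot generalization claimed in the abstract.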

Semantic Matching of Documents from Heterogeneous Collections: A Simple and Transparent Method for Practical Applications

Title Semantic Matching of Documents from Heterogeneous Collections: A Simple and Transparent Method for Practical Applications
Authors Mark-Christoph Müller
Abstract We present a very simple, unsupervised method for the pairwise matching of documents from heterogeneous collections. We demonstrate our method with the Concept-Project matching task, which is a binary classification task involving pairs of documents from heterogeneous collections. Although our method only employs standard resources without any domain- or task-specific modifications, it clearly outperforms the more complex system of the original authors. In addition, our method is transparent, because it provides explicit information about how a similarity score was computed, and efficient, because it is based on the aggregation of (pre-computable) word-level similarities.
Tasks
Published 2019-04-29
URL http://arxiv.org/abs/1904.12550v1
PDF http://arxiv.org/pdf/1904.12550v1.pdf
PWC https://paperswithcode.com/paper/semantic-matching-of-documents-from
Repo https://github.com/nlpAThits/TopNCosSimAvg
Framework tf
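The transparency claim follows from the aggregation scheme: a document-pair score is assembled from pre-computable word-level cosine similarities, so every contribution is inspectable. A minimal sketch, with toy 2-d vectors standing in for real word embeddings:

```python
# Sketch of transparent document matching: for each word in document A,
# take the cosine similarity of its best match in document B, then
# average. Each word's contribution to the final score is explicit.
import math

def cos(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v) or 1.0)

def doc_similarity(doc_a, doc_b, vectors):
    best = [max(cos(vectors[w], vectors[x]) for x in doc_b) for w in doc_a]
    return sum(best) / len(best)

# Illustrative embeddings; real ones would be pre-trained word vectors.
vectors = {"car": (1.0, 0.0), "auto": (0.9, 0.1), "banana": (0.0, 1.0)}
```

Because the word-level similarities are pre-computable, matching a new document pair reduces to lookups and an average, which accounts for the efficiency claim.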

StarGAN-VC2: Rethinking Conditional Methods for StarGAN-Based Voice Conversion

Title StarGAN-VC2: Rethinking Conditional Methods for StarGAN-Based Voice Conversion
Authors Takuhiro Kaneko, Hirokazu Kameoka, Kou Tanaka, Nobukatsu Hojo
Abstract Non-parallel multi-domain voice conversion (VC) is a technique for learning mappings among multiple domains without relying on parallel data. This is important but challenging owing to the requirement of learning multiple mappings and the non-availability of explicit supervision. Recently, StarGAN-VC has garnered attention owing to its ability to solve this problem only using a single generator. However, there is still a gap between real and converted speech. To bridge this gap, we rethink conditional methods of StarGAN-VC, which are key components for achieving non-parallel multi-domain VC in a single model, and propose an improved variant called StarGAN-VC2. Particularly, we rethink conditional methods in two aspects: training objectives and network architectures. For the former, we propose a source-and-target conditional adversarial loss that allows all source domain data to be convertible to the target domain data. For the latter, we introduce a modulation-based conditional method that can transform the modulation of the acoustic feature in a domain-specific manner. We evaluated our methods on non-parallel multi-speaker VC. An objective evaluation demonstrates that our proposed methods improve speech quality in terms of both global and local structure measures. Furthermore, a subjective evaluation shows that StarGAN-VC2 outperforms StarGAN-VC in terms of naturalness and speaker similarity. The converted speech samples are provided at http://www.kecl.ntt.co.jp/people/kaneko.takuhiro/projects/stargan-vc2/index.html.
Tasks Voice Conversion
Published 2019-07-29
URL https://arxiv.org/abs/1907.12279v2
PDF https://arxiv.org/pdf/1907.12279v2.pdf
PWC https://paperswithcode.com/paper/stargan-vc2-rethinking-conditional-methods
Repo https://github.com/SamuelBroughton/StarGAN-Voice-Conversion-2
Framework pytorch
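The modulation-based conditional method is akin to conditional instance normalization: features are normalized, then scaled and shifted by per-domain parameters, so one generator serves every domain. A hedged sketch with hypothetical parameters, not the paper's learned weights:

```python
# Sketch of modulation-based conditioning: normalize the acoustic
# features, then apply the target domain's gain (gamma) and bias (beta).
import math

def modulate(features, domain, params):
    """features: list of floats; params[domain] = (gamma, beta)."""
    mu = sum(features) / len(features)
    var = sum((f - mu) ** 2 for f in features) / len(features)
    std = math.sqrt(var) + 1e-9
    gamma, beta = params[domain]
    return [gamma * (f - mu) / std + beta for f in features]
```

Conditioning through the normalization statistics, rather than by concatenating a domain code, lets the transform reshape feature statistics in a domain-specific way.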

GMNN: Graph Markov Neural Networks

Title GMNN: Graph Markov Neural Networks
Authors Meng Qu, Yoshua Bengio, Jian Tang
Abstract This paper studies semi-supervised object classification in relational data, which is a fundamental problem in relational data modeling. The problem has been extensively studied in the literature of both statistical relational learning (e.g. relational Markov networks) and graph neural networks (e.g. graph convolutional networks). Statistical relational learning methods can effectively model the dependency of object labels through conditional random fields for collective classification, whereas graph neural networks learn effective object representations for classification through end-to-end training. In this paper, we propose the Graph Markov Neural Network (GMNN) that combines the advantages of both worlds. A GMNN models the joint distribution of object labels with a conditional random field, which can be effectively trained with the variational EM algorithm. In the E-step, one graph neural network learns effective object representations for approximating the posterior distributions of object labels. In the M-step, another graph neural network is used to model the local label dependency. Experiments on object classification, link classification, and unsupervised node representation learning show that GMNN achieves state-of-the-art results.
Tasks Object Classification, Relational Reasoning, Representation Learning
Published 2019-05-15
URL https://arxiv.org/abs/1905.06214v2
PDF https://arxiv.org/pdf/1905.06214v2.pdf
PWC https://paperswithcode.com/paper/gmnn-graph-markov-neural-networks
Repo https://github.com/DeepGraphLearning/GMNN
Framework pytorch
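The variational EM structure is the heart of GMNN, and it can be sketched on a toy graph. Both "networks" below are replaced by trivial rules (a feature threshold for q, a neighbor vote for p); the alternation, not the models, is what the sketch shows:

```python
# Illustrative GMNN-style variational EM: q infers labels from node
# features (E-step stand-in), then p re-predicts unobserved labels from
# neighbors' current labels (M-step stand-in), and the two alternate.

def gmnn_em(adj, features, observed, rounds=3):
    """adj: {node: [neighbors]}; features: {node: float};
    observed: {node: label} for the labeled subset."""
    # E-step stand-in: label each node from its own feature.
    labels = {n: observed.get(n, int(features[n] > 0.5)) for n in adj}
    for _ in range(rounds):
        # M-step stand-in: model local label dependency via neighbor vote.
        new = dict(labels)
        for n in adj:
            if n not in observed and adj[n]:
                vote = sum(labels[m] for m in adj[n]) / len(adj[n])
                new[n] = int(vote >= 0.5)
        labels = new
    return labels
```

In the paper, both steps are graph neural networks trained jointly: q approximates the posterior over labels, and p models the conditional-random-field dependency that q's pseudo-labels expose.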

Robust copy-move forgery detection by false alarms control

Title Robust copy-move forgery detection by false alarms control
Authors Thibaud Ehret
Abstract Detecting reliably copy-move forgeries is difficult because images do contain similar objects. The question is: how to discard natural image self-similarities while still detecting copy-moved parts as being “unnaturally similar”? Copy-move may have been performed after a rotation, a change of scale and followed by JPEG compression or the addition of noise. For this reason, we base our method on SIFT, which provides sparse keypoints with scale, rotation and illumination invariant descriptors. To discriminate natural descriptor matches from artificial ones, we introduce an a contrario method which gives theoretical guarantees on the number of false alarms. We validate our method on several databases. Being fully unsupervised it can be integrated into any generic automated image tampering detection pipeline.
Tasks
Published 2019-06-03
URL https://arxiv.org/abs/1906.00649v1
PDF https://arxiv.org/pdf/1906.00649v1.pdf
PWC https://paperswithcode.com/paper/190600649
Repo https://github.com/tehret/rcmfd
Framework none
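The a contrario test can be made concrete: a descriptor match is kept only if its Number of False Alarms (NFA), the expected count of equally good matches under a background model, falls below a threshold. The uniform background model below is a toy assumption, not the paper's calibrated model:

```python
# Hedged sketch of a contrario match validation: natural self-similar
# matches get NFA >= eps and are discarded; copy-moved regions produce
# matches so close that their NFA is astronomically small.

def nfa(distance, num_pairs, dims=128, max_dist=1.0):
    # Toy background model: P(descriptor distance <= d) scales like
    # (d / max_dist) ** dims; NFA = number of tests * that probability.
    return num_pairs * (distance / max_dist) ** dims

def significant_matches(matches, num_pairs, eps=1.0):
    """matches: list of (pair_id, descriptor_distance)."""
    return [p for p, d in matches if nfa(d, num_pairs) < eps]
```

Setting eps = 1 means accepting, on average, less than one false alarm per image, which is the kind of theoretical guarantee the abstract refers to.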

Contrastive Representation Distillation

Title Contrastive Representation Distillation
Authors Yonglong Tian, Dilip Krishnan, Phillip Isola
Abstract Often we wish to transfer representational knowledge from one neural network to another. Examples include distilling a large network into a smaller one, transferring knowledge from one sensory modality to a second, or ensembling a collection of models into a single estimator. Knowledge distillation, the standard approach to these problems, minimizes the KL divergence between the probabilistic outputs of a teacher and student network. We demonstrate that this objective ignores important structural knowledge of the teacher network. This motivates an alternative objective by which we train a student to capture significantly more information in the teacher’s representation of the data. We formulate this objective as contrastive learning. Experiments demonstrate that our resulting new objective outperforms knowledge distillation and other cutting-edge distillers on a variety of knowledge transfer tasks, including single model compression, ensemble distillation, and cross-modal transfer. Our method sets a new state-of-the-art in many transfer tasks, and sometimes even outperforms the teacher network when combined with knowledge distillation. Code: http://github.com/HobbitLong/RepDistiller.
Tasks Model Compression, Transfer Learning
Published 2019-10-23
URL https://arxiv.org/abs/1910.10699v2
PDF https://arxiv.org/pdf/1910.10699v2.pdf
PWC https://paperswithcode.com/paper/contrastive-representation-distillation-1
Repo https://github.com/HobbitLong/RepDistiller
Framework pytorch
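The contrastive objective can be sketched directly: the student embedding of an input should score higher against the teacher embedding of the same input (positive) than against teacher embeddings of other inputs (negatives). A softmax over dot products is a minimal InfoNCE-style stand-in for the paper's exact critic:

```python
# Sketch of a contrastive distillation loss: negative log-probability
# that the student picks its own teacher embedding out of a candidate
# set. Minimizing it pulls student and teacher representations of the
# same input together while pushing mismatched pairs apart.
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def contrastive_loss(student, teachers, pos_idx, temperature=0.5):
    logits = [dot(student, t) / temperature for t in teachers]
    m = max(logits)  # log-sum-exp with max-shift for numerical stability
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_z - logits[pos_idx]
```

Unlike plain KL distillation over class probabilities, this objective depends on relations between representations across inputs, which is the "structural knowledge" the abstract says KL ignores.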

Adversarial Policies: Attacking Deep Reinforcement Learning

Title Adversarial Policies: Attacking Deep Reinforcement Learning
Authors Adam Gleave, Michael Dennis, Cody Wild, Neel Kant, Sergey Levine, Stuart Russell
Abstract Deep reinforcement learning (RL) policies are known to be vulnerable to adversarial perturbations to their observations, similar to adversarial examples for classifiers. However, an attacker is not usually able to directly modify another agent’s observations. This might lead one to wonder: is it possible to attack an RL agent simply by choosing an adversarial policy acting in a multi-agent environment so as to create natural observations that are adversarial? We demonstrate the existence of adversarial policies in zero-sum games between simulated humanoid robots with proprioceptive observations, against state-of-the-art victims trained via self-play to be robust to opponents. The adversarial policies reliably win against the victims but generate seemingly random and uncoordinated behavior. We find that these policies are more successful in high-dimensional environments, and induce substantially different activations in the victim policy network than when the victim plays against a normal opponent. Videos are available at https://adversarialpolicies.github.io/.
Tasks
Published 2019-05-25
URL https://arxiv.org/abs/1905.10615v2
PDF https://arxiv.org/pdf/1905.10615v2.pdf
PWC https://paperswithcode.com/paper/adversarial-policies-attacking-deep
Repo https://github.com/HumanCompatibleAI/adversarial-policies
Framework none

Disentangling feature and lazy training in deep neural networks

Title Disentangling feature and lazy training in deep neural networks
Authors Mario Geiger, Stefano Spigler, Arthur Jacot, Matthieu Wyart
Abstract Two distinct limits for deep learning have been derived as the network width $h\rightarrow \infty$, depending on how the weights of the last layer scale with $h$. In the Neural Tangent Kernel (NTK) limit, the dynamics becomes linear in the weights and is described by a frozen kernel $\Theta$ (the NTK). By contrast, in the Mean Field limit, the dynamics can be expressed in terms of the distribution of the parameters associated to a neuron, that follows a partial differential equation. In this work we consider deep networks where the weights in the last layer scale as $\alpha h^{-1/2}$ at initialization. By varying $\alpha$ and $h$, we probe the crossover between the two limits. We observe two regimes that we call “lazy training” and “feature training”. In the lazy-training regime, the dynamics is almost linear and the NTK barely changes after initialization. The feature-training regime includes the mean-field formulation as a limiting case and is characterized by a kernel that evolves in time, and thus learns some features. We perform numerical experiments on MNIST, Fashion-MNIST, EMNIST and CIFAR10 and consider various architectures. We find that: (i) The two regimes are separated by an $\alpha^*$ that scales as $\frac{1}{\sqrt{h}}$. (ii) Network architecture and data structure play an important role in determining which regime is better: in our tests, fully-connected networks generally perform better in the lazy-training regime (except when we reduce the dataset via PCA), and we provide an example of a convolutional network that achieves a lower error in the feature-training regime. (iii) In both regimes, the fluctuations $\delta F$ induced by initial conditions on the learned function decay as $\delta F \sim 1/\sqrt{h}$, leading to a performance that increases with $h$. (iv) In the feature-training regime we identify a time scale $t_1 \sim \sqrt{h}\alpha$.
Tasks
Published 2019-06-19
URL https://arxiv.org/abs/1906.08034v2
PDF https://arxiv.org/pdf/1906.08034v2.pdf
PWC https://paperswithcode.com/paper/disentangling-feature-and-lazy-learning-in
Repo https://github.com/mariogeiger/feature_lazy
Framework pytorch
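The $\alpha h^{-1/2}$ readout scaling has a simple numerical consequence worth making explicit: the typical output fluctuation stays of order $\alpha$, independent of the width $h$. A toy check under idealized assumptions (unit-size activations, identical weight magnitudes), illustrative only:

```python
# Toy check of the paper's last-layer scaling: with readout weights of
# size alpha * h**-0.5, the root-sum-square of the h contributions to
# the output equals alpha exactly, regardless of width.
import math

def readout_scale(h, alpha):
    acts = [1.0 if i % 2 == 0 else -1.0 for i in range(h)]  # toy activations
    w = [alpha * h ** -0.5] * h
    return math.sqrt(sum((wi * a) ** 2 for wi, a in zip(w, acts)))
```

With the output pinned at order $\alpha$, each individual weight shrinks like $h^{-1/2}$ as $h$ grows, which is why large widths at fixed $\alpha$ push training toward the lazy (NTK-like) regime.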

Auto-Retoucher(ART) - A framework for Background Replacement and Image Editing

Title Auto-Retoucher(ART) - A framework for Background Replacement and Image Editing
Authors Yunxuan Xiao, Yikai Li, Yuwei Wu, Lizhen Zhu
Abstract Replacing the background and simultaneously adjusting foreground objects is a challenging task in image editing. Current techniques for generating such images rely heavily on user interaction with image editing software, which is tedious for professional retouchers. To reduce their workload, some exciting progress has been made on generating images with a given background. However, these models can neither adjust the position and scale of the foreground objects, nor guarantee the semantic consistency between foreground and background. To overcome these limitations, we propose a framework, ART (Auto-Retoucher), to generate images with sufficient semantic and spatial consistency. Images are first processed by semantic matting and scene parsing modules, then a multi-task verifier model gives two confidence scores for the current background and position setting. We demonstrate that our jointly optimized verifier model successfully improves the visual consistency, and our ART framework performs well on images with the human body as foreground.
Tasks Scene Parsing
Published 2019-01-13
URL http://arxiv.org/abs/1901.03954v1
PDF http://arxiv.org/pdf/1901.03954v1.pdf
PWC https://paperswithcode.com/paper/auto-retoucherart-a-framework-for-background
Repo https://github.com/woshiyyya/Auto-Retoucher-pytorch
Framework pytorch

Adversarial Embedding: A robust and elusive Steganography and Watermarking technique

Title Adversarial Embedding: A robust and elusive Steganography and Watermarking technique
Authors Salah Ghamizi, Maxime Cordy, Mike Papadakis, Yves Le Traon
Abstract We propose adversarial embedding, a new steganography and watermarking technique that embeds secret information within images. The key idea of our method is to use deep neural networks for image classification and adversarial attacks to embed secret information within images. Thus, we use the attacks to embed an encoding of the message within images and the related deep neural network outputs to extract it. The key properties of adversarial attacks (invisible perturbations, non-transferability, resilience to tampering) offer guarantees regarding the confidentiality and the integrity of the hidden messages. We empirically evaluate adversarial embedding using more than 100 models and 1,000 messages. Our results confirm that our embedding passes unnoticed by both humans and steganalysis methods, while at the same time impeding illicit retrieval of the message (less than 13% recovery rate when the interceptor has some knowledge about our model), and is resilient to soft and (to some extent) aggressive image tampering (up to 100% recovery rate under JPEG compression). We further develop our method by proposing a new type of adversarial attack which improves the embedding density (amount of hidden information) of our method to up to 10 bits per pixel.
Tasks Adversarial Attack, Image Classification
Published 2019-11-14
URL https://arxiv.org/abs/1912.01487v1
PDF https://arxiv.org/pdf/1912.01487v1.pdf
PWC https://paperswithcode.com/paper/adversarial-embedding-a-robust-and-elusive
Repo https://github.com/yamizi/Adversarial-Embedding
Framework tf
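The encoding side of the scheme is easy to illustrate: the secret message becomes a sequence of target class labels, an adversarial attack (not reproduced here) perturbs the cover images so the classifier outputs those labels, and the receiver recovers the message by classifying. Only the label codec is sketched; any attack step would be a hypothetical placeholder:

```python
# Sketch of the message <-> class-label codec behind adversarial
# embedding. The adversarial attack that forces a classifier to emit
# these labels, and the classifier itself, are deliberately omitted.

def encode_message(msg, num_classes=256):
    # One byte per attacked image (toy encoding); requires a classifier
    # with at least num_classes outputs to carry a full byte per image.
    return [b % num_classes for b in msg.encode()]

def decode_message(labels):
    return bytes(labels).decode(errors="replace")
```

The round trip only survives if the classifier's predictions on the perturbed images are stable, which is why the attack's resilience-to-tampering property matters for integrity.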

Unsupervised Neural Single-Document Summarization of Reviews via Learning Latent Discourse Structure and its Ranking

Title Unsupervised Neural Single-Document Summarization of Reviews via Learning Latent Discourse Structure and its Ranking
Authors Masaru Isonuma, Junichiro Mori, Ichiro Sakata
Abstract This paper focuses on the end-to-end abstractive summarization of a single product review without supervision. We assume that a review can be described as a discourse tree, in which the summary is the root, and the child sentences explain their parent in detail. By recursively estimating a parent from its children, our model learns the latent discourse tree without an external parser and generates a concise summary. We also introduce an architecture that ranks the importance of each sentence on the tree to support summary generation focusing on the main review point. The experimental results demonstrate that our model is competitive with or outperforms other unsupervised approaches. In particular, for relatively long reviews, it achieves a competitive or better performance than supervised models. The induced tree shows that the child sentences provide additional information about their parent, and the generated summary abstracts the entire review.
Tasks Abstractive Text Summarization, Document Summarization
Published 2019-06-13
URL https://arxiv.org/abs/1906.05691v1
PDF https://arxiv.org/pdf/1906.05691v1.pdf
PWC https://paperswithcode.com/paper/unsupervised-neural-single-document
Repo https://github.com/misonuma/strsum
Framework tf

Music Artist Classification with Convolutional Recurrent Neural Networks

Title Music Artist Classification with Convolutional Recurrent Neural Networks
Authors Zain Nasrullah, Yue Zhao
Abstract Previous attempts at music artist classification use frame level audio features which summarize frequency content within short intervals of time. Comparatively, more recent music information retrieval tasks take advantage of temporal structure in audio spectrograms using deep convolutional and recurrent models. This paper revisits artist classification with this new framework and empirically explores the impacts of incorporating temporal structure in the feature representation. To this end, an established classification architecture, a Convolutional Recurrent Neural Network (CRNN), is applied to the artist20 music artist identification dataset under a comprehensive set of conditions. These include audio clip length, which is a novel contribution in this work, and previously identified considerations such as dataset split and feature level. Our results improve upon baseline works, verify the influence of the producer effect on classification performance and demonstrate the trade-offs between audio length and training set size. The best performing model achieves an average F1 score of 0.937 across three independent trials which is a substantial improvement over the corresponding baseline under similar conditions. Additionally, to showcase the effectiveness of the CRNN’s feature extraction capabilities, we visualize audio samples at the model’s bottleneck layer demonstrating that learned representations segment into clusters belonging to their respective artists.
Tasks Information Retrieval, Music Information Retrieval
Published 2019-01-14
URL http://arxiv.org/abs/1901.04555v2
PDF http://arxiv.org/pdf/1901.04555v2.pdf
PWC https://paperswithcode.com/paper/music-artist-classification-with
Repo https://github.com/winstonll/SynC
Framework none

SpykeTorch: Efficient Simulation of Convolutional Spiking Neural Networks with at most one Spike per Neuron

Title SpykeTorch: Efficient Simulation of Convolutional Spiking Neural Networks with at most one Spike per Neuron
Authors Milad Mozafari, Mohammad Ganjtabesh, Abbas Nowzari-Dalini, Timothée Masquelier
Abstract Application of deep convolutional spiking neural networks (SNNs) to artificial intelligence (AI) tasks has recently gained a lot of interest, since SNNs are hardware-friendly and energy-efficient. Unlike their non-spiking counterparts, most existing SNN simulation frameworks are not practically efficient enough for large-scale AI tasks. In this paper, we introduce SpykeTorch, an open-source high-speed simulation framework based on PyTorch. This framework simulates convolutional SNNs with at most one spike per neuron and the rank-order encoding scheme. In terms of learning rules, both spike-timing-dependent plasticity (STDP) and reward-modulated STDP (R-STDP) are implemented, but other rules could be implemented easily. Apart from the aforementioned properties, SpykeTorch is highly generic and capable of reproducing the results of various studies. Computations in the proposed framework are tensor-based and performed entirely by PyTorch functions, which in turn brings the ability of just-in-time optimization for running on CPUs, GPUs, or multi-GPU platforms.
Tasks
Published 2019-03-06
URL https://arxiv.org/abs/1903.02440v2
PDF https://arxiv.org/pdf/1903.02440v2.pdf
PWC https://paperswithcode.com/paper/spyketorch-efficient-simulation-of
Repo https://github.com/miladmozafari/SpykeTorch
Framework pytorch