Paper Group AWR 447
ALFA: Agglomerative Late Fusion Algorithm for Object Detection
Title | ALFA: Agglomerative Late Fusion Algorithm for Object Detection |
Authors | Evgenii Razinkov, Iuliia Saveleva, Jiří Matas |
Abstract | We propose ALFA, a novel late fusion algorithm for object detection. ALFA is based on agglomerative clustering of object detector predictions, taking into consideration both the bounding box locations and the class scores. Each cluster represents a single object hypothesis whose location is a weighted combination of the clustered bounding boxes. ALFA was evaluated using combinations of a pair (SSD and DeNet) and a triplet (SSD, DeNet and Faster R-CNN) of recent object detectors that are close to the state-of-the-art. ALFA achieves state-of-the-art results on PASCAL VOC 2007 and PASCAL VOC 2012, outperforming the individual detectors as well as baseline combination strategies, with up to 32% lower error than the best individual detector and up to 6% lower error than the reference fusion algorithm DBF (Dynamic Belief Fusion). |
Tasks | Object Detection |
Published | 2019-07-13 |
URL | https://arxiv.org/abs/1907.06067v1 |
PDF | https://arxiv.org/pdf/1907.06067v1.pdf |
PWC | https://paperswithcode.com/paper/alfa-agglomerative-late-fusion-algorithm-for |
Repo | https://github.com/IuliiaSaveleva/ALFA |
Framework | none |
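The abstract describes the algorithm only at a high level; below is a minimal, hedged Python sketch of the agglomerative late-fusion idea: detections pooled from several detectors are clustered by a similarity that mixes box IoU with class-score agreement, and each cluster is fused into one confidence-weighted box. The similarity, the `sim_thresh` value, and the weighting scheme are illustrative assumptions, not ALFA's exact design.

```python
import numpy as np

def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter + 1e-9)

def fuse_detections(dets, sim_thresh=0.5):
    """dets: list of (box, class_score_vector) pooled from all detectors."""
    clusters = [[d] for d in dets]
    merged = True
    while merged:                      # agglomerate until no pair is similar
        merged = False
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                sims = [iou(a[0], b[0]) * float(np.dot(a[1], b[1]))
                        for a in clusters[i] for b in clusters[j]]
                if np.mean(sims) > sim_thresh:
                    clusters[i] += clusters.pop(j)
                    merged = True
                    break
            if merged:
                break
    fused = []
    for c in clusters:                 # one object hypothesis per cluster
        boxes = np.array([d[0] for d in c], dtype=float)
        scores = np.array([d[1] for d in c])
        w = scores.max(axis=1)         # each detection's confidence as weight
        fused.append(((boxes * w[:, None]).sum(0) / w.sum(), scores.mean(0)))
    return fused

dets = [((10, 10, 50, 50), np.array([0.9, 0.1])),    # detector A
        ((12, 11, 52, 49), np.array([0.8, 0.2])),    # detector B, same object
        ((200, 200, 240, 260), np.array([0.3, 0.7]))]
print(fuse_detections(dets))           # two fused hypotheses
```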
Improving Deep Lesion Detection Using 3D Contextual and Spatial Attention
Title | Improving Deep Lesion Detection Using 3D Contextual and Spatial Attention |
Authors | Qingyi Tao, Zongyuan Ge, Jianfei Cai, Jianxiong Yin, Simon See |
Abstract | Lesion detection from computed tomography (CT) scans is challenging compared to natural object detection for two major reasons: small lesion size and small inter-class variation. Firstly, the lesions usually occupy only a small region of the CT image, and the features of such a small region may not provide sufficient information owing to the limited spatial feature resolution. Secondly, in CT scans the lesions are often indistinguishable from the background, since lesion and non-lesion areas may have very similar appearances. To tackle both problems, we need to enrich the feature representation and improve the feature discriminativeness. We therefore introduce a dual-attention mechanism to the 3D contextual lesion detection framework: a cross-slice contextual attention that selectively aggregates information from different slices through a soft re-sampling process, and an intra-slice spatial attention that focuses feature learning on the most prominent regions. Our method can easily be trained end-to-end without adding heavy overhead to the base detection network. We use the DeepLesion dataset to train a universal lesion detector covering all kinds of lesions, such as liver tumors and lung nodules. The results show that our model significantly boosts the results of the baseline lesion detector (with 3D contextual information) while using far fewer slices. |
Tasks | Computed Tomography (CT), Object Detection |
Published | 2019-07-09 |
URL | https://arxiv.org/abs/1907.04052v1 |
PDF | https://arxiv.org/pdf/1907.04052v1.pdf |
PWC | https://paperswithcode.com/paper/improving-deep-lesion-detection-using-3d |
Repo | https://github.com/truetqy/lesion_det_dual_att |
Framework | mxnet |
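A hedged PyTorch sketch of the two attentions named in the abstract: cross-slice contextual attention softly re-weights per-slice feature maps, and intra-slice spatial attention highlights the most prominent regions. Layer shapes and the pooling choice are assumptions for illustration, not the paper's architecture.

```python
import torch
import torch.nn as nn

class DualAttention(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.slice_scorer = nn.Linear(channels, 1)        # scores each slice
        self.spatial_conv = nn.Conv2d(channels, 1, 3, padding=1)

    def forward(self, x):              # x: (batch, slices, channels, H, W)
        # cross-slice contextual attention: soft re-sampling over slices
        pooled = x.mean(dim=(3, 4))                          # (b, s, c)
        weights = torch.softmax(self.slice_scorer(pooled), dim=1)
        context = (x * weights[..., None, None]).sum(dim=1)  # (b, c, H, W)
        # intra-slice spatial attention on the aggregated feature map
        gate = torch.sigmoid(self.spatial_conv(context))     # (b, 1, H, W)
        return context * gate

feats = torch.randn(2, 7, 64, 32, 32)  # 7 CT slices of 64-channel features
out = DualAttention(64)(feats)         # -> (2, 64, 32, 32)
```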
Towards Scalable Multi-domain Conversational Agents: The Schema-Guided Dialogue Dataset
Title | Towards Scalable Multi-domain Conversational Agents: The Schema-Guided Dialogue Dataset |
Authors | Abhinav Rastogi, Xiaoxue Zang, Srinivas Sunkara, Raghav Gupta, Pranav Khaitan |
Abstract | Virtual assistants such as Google Assistant, Alexa and Siri provide a conversational interface to a large number of services and APIs spanning multiple domains. Such systems need to support an ever-increasing number of services with possibly overlapping functionality. Furthermore, some of these services have little to no training data available. Existing public datasets for task-oriented dialogue do not sufficiently capture these challenges since they cover few domains and assume a single static ontology per domain. In this work, we introduce the Schema-Guided Dialogue (SGD) dataset, containing over 16k multi-domain conversations spanning 16 domains. Our dataset exceeds the existing task-oriented dialogue corpora in scale, while also highlighting the challenges associated with building large-scale virtual assistants. It provides a challenging testbed for a number of tasks including language understanding, slot filling, dialogue state tracking and response generation. Along the same lines, we present a schema-guided paradigm for task-oriented dialogue, in which predictions are made over a dynamic set of intents and slots, provided as input, using their natural language descriptions. This allows a single dialogue system to easily support a large number of services and facilitates simple integration of new services without requiring additional training data. Building upon the proposed paradigm, we release a model for dialogue state tracking capable of zero-shot generalization to new APIs, while remaining competitive in the regular setting. |
Tasks | Dialogue State Tracking, Slot Filling |
Published | 2019-09-12 |
URL | https://arxiv.org/abs/1909.05855v2 |
PDF | https://arxiv.org/pdf/1909.05855v2.pdf |
PWC | https://paperswithcode.com/paper/towards-scalable-multi-domain-conversational |
Repo | https://github.com/google-research-datasets/dstc8-schema-guided-dialogue |
Framework | none |
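The schema-guided paradigm is easiest to see with a concrete schema. The sketch below shows how a service's intents and slots carry natural-language descriptions that a tracker can embed, which is what enables zero-shot transfer to unseen services; the field names mirror the dataset's general structure but are illustrative rather than an exact copy of its JSON format.

```python
# A toy schema in the spirit of the SGD dataset (field names illustrative).
restaurant_schema = {
    "service_name": "Restaurants_1",
    "description": "A service for finding and reserving restaurants.",
    "intents": [
        {"name": "ReserveRestaurant",
         "description": "Reserve a table at a restaurant.",
         "required_slots": ["restaurant_name", "time"]},
    ],
    "slots": [
        {"name": "restaurant_name",
         "description": "Name of the restaurant to reserve."},
        {"name": "time", "description": "Time for the reservation."},
    ],
}

def embed_schema_elements(schema, encode):
    """Embed each slot via its description; a state tracker that scores
    utterances against these embeddings needs no training data for the
    service itself, which is the zero-shot mechanism described above."""
    return {slot["name"]: encode(slot["description"])
            for slot in schema["slots"]}
```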
Semantic Matching of Documents from Heterogeneous Collections: A Simple and Transparent Method for Practical Applications
Title | Semantic Matching of Documents from Heterogeneous Collections: A Simple and Transparent Method for Practical Applications |
Authors | Mark-Christoph Müller |
Abstract | We present a very simple, unsupervised method for the pairwise matching of documents from heterogeneous collections. We demonstrate our method with the Concept-Project matching task, which is a binary classification task involving pairs of documents from heterogeneous collections. Although our method only employs standard resources without any domain- or task-specific modifications, it clearly outperforms the more complex system of the original authors. In addition, our method is transparent, because it provides explicit information about how a similarity score was computed, and efficient, because it is based on the aggregation of (pre-computable) word-level similarities. |
Tasks | |
Published | 2019-04-29 |
URL | http://arxiv.org/abs/1904.12550v1 |
PDF | http://arxiv.org/pdf/1904.12550v1.pdf |
PWC | https://paperswithcode.com/paper/semantic-matching-of-documents-from |
Repo | https://github.com/nlpAThits/TopNCosSimAvg |
Framework | tf |
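The abstract emphasizes that the score is an aggregation of pre-computable word-level similarities. The repository name (TopNCosSimAvg) suggests averaging the top-n word-pair cosine similarities, so the sketch below implements that reading; treat the exact variant and the choice of n as assumptions.

```python
import numpy as np

def top_n_cos_sim_avg(doc_a, doc_b, n=5):
    """doc_a, doc_b: L2-normalized word embeddings, shape (words, dim)."""
    sims = doc_a @ doc_b.T              # all pairwise cosine similarities
    best = np.sort(sims.ravel())[-n:]   # n highest word-pair similarities
    return float(best.mean())

a = np.random.randn(8, 50); a /= np.linalg.norm(a, axis=1, keepdims=True)
b = np.random.randn(6, 50); b /= np.linalg.norm(b, axis=1, keepdims=True)
print(top_n_cos_sim_avg(a, b))          # score in [-1, 1]; threshold to classify
```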
StarGAN-VC2: Rethinking Conditional Methods for StarGAN-Based Voice Conversion
Title | StarGAN-VC2: Rethinking Conditional Methods for StarGAN-Based Voice Conversion |
Authors | Takuhiro Kaneko, Hirokazu Kameoka, Kou Tanaka, Nobukatsu Hojo |
Abstract | Non-parallel multi-domain voice conversion (VC) is a technique for learning mappings among multiple domains without relying on parallel data. This is important but challenging owing to the requirement of learning multiple mappings and the non-availability of explicit supervision. Recently, StarGAN-VC has garnered attention owing to its ability to solve this problem only using a single generator. However, there is still a gap between real and converted speech. To bridge this gap, we rethink conditional methods of StarGAN-VC, which are key components for achieving non-parallel multi-domain VC in a single model, and propose an improved variant called StarGAN-VC2. Particularly, we rethink conditional methods in two aspects: training objectives and network architectures. For the former, we propose a source-and-target conditional adversarial loss that allows all source domain data to be convertible to the target domain data. For the latter, we introduce a modulation-based conditional method that can transform the modulation of the acoustic feature in a domain-specific manner. We evaluated our methods on non-parallel multi-speaker VC. An objective evaluation demonstrates that our proposed methods improve speech quality in terms of both global and local structure measures. Furthermore, a subjective evaluation shows that StarGAN-VC2 outperforms StarGAN-VC in terms of naturalness and speaker similarity. The converted speech samples are provided at http://www.kecl.ntt.co.jp/people/kaneko.takuhiro/projects/stargan-vc2/index.html. |
Tasks | Voice Conversion |
Published | 2019-07-29 |
URL | https://arxiv.org/abs/1907.12279v2 |
PDF | https://arxiv.org/pdf/1907.12279v2.pdf |
PWC | https://paperswithcode.com/paper/stargan-vc2-rethinking-conditional-methods |
Repo | https://github.com/SamuelBroughton/StarGAN-Voice-Conversion-2 |
Framework | pytorch |
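A hedged PyTorch sketch of the two rethought conditional methods the abstract names: a source-and-target conditional adversarial loss, in which the discriminator sees both domain codes, and a modulation-based conditioning layer that scales and shifts features per domain. The least-squares loss form and all layer sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn

class DomainModulation(nn.Module):
    """Transform the feature modulation in a domain-specific manner."""
    def __init__(self, num_domains: int, channels: int):
        super().__init__()
        self.scale = nn.Embedding(num_domains, channels)
        self.shift = nn.Embedding(num_domains, channels)

    def forward(self, x, domain):       # x: (batch, channels, time)
        return (x * self.scale(domain)[:, :, None]
                + self.shift(domain)[:, :, None])

def st_adv_loss(disc, real_trg, fake_trg, src, trg):
    """Source-and-target conditional adversarial loss (least-squares form);
    `disc` is any discriminator taking features plus both domain codes."""
    real, fake = disc(real_trg, src, trg), disc(fake_trg, src, trg)
    d_loss = ((real - 1) ** 2).mean() + (fake ** 2).mean()
    g_loss = ((fake - 1) ** 2).mean()
    return d_loss, g_loss

mod = DomainModulation(num_domains=4, channels=64)
y = mod(torch.randn(2, 64, 100), torch.tensor([1, 3]))  # per-domain modulation
```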
GMNN: Graph Markov Neural Networks
Title | GMNN: Graph Markov Neural Networks |
Authors | Meng Qu, Yoshua Bengio, Jian Tang |
Abstract | This paper studies semi-supervised object classification in relational data, which is a fundamental problem in relational data modeling. The problem has been extensively studied in the literature of both statistical relational learning (e.g. relational Markov networks) and graph neural networks (e.g. graph convolutional networks). Statistical relational learning methods can effectively model the dependency of object labels through conditional random fields for collective classification, whereas graph neural networks learn effective object representations for classification through end-to-end training. In this paper, we propose the Graph Markov Neural Network (GMNN) that combines the advantages of both worlds. A GMNN models the joint distribution of object labels with a conditional random field, which can be effectively trained with the variational EM algorithm. In the E-step, one graph neural network learns effective object representations for approximating the posterior distributions of object labels. In the M-step, another graph neural network is used to model the local label dependency. Experiments on object classification, link classification, and unsupervised node representation learning show that GMNN achieves state-of-the-art results. |
Tasks | Object Classification, Relational Reasoning, Representation Learning |
Published | 2019-05-15 |
URL | https://arxiv.org/abs/1905.06214v2 |
PDF | https://arxiv.org/pdf/1905.06214v2.pdf |
PWC | https://paperswithcode.com/paper/gmnn-graph-markov-neural-networks |
Repo | https://github.com/DeepGraphLearning/GMNN |
Framework | pytorch |
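A schematic of the variational-EM loop the abstract describes, with the two GNNs left as abstract callables: `q_net` maps node features to label logits (the inference model) and `p_net` maps neighbors' label distributions plus the graph to logits (the local dependency model). The loss forms and pseudo-labeling details are simplified assumptions, not the paper's implementation.

```python
import torch
import torch.nn.functional as F

def gmnn_em_step(q_net, p_net, q_opt, p_opt, x, adj, y, labeled):
    """One E-step/M-step pair; `labeled` is a boolean node mask."""
    # E-step: fit q to gold labels plus p's predictions on unlabeled nodes
    with torch.no_grad():
        y_soft = p_net(q_net(x).softmax(-1), adj).softmax(-1)
    q_opt.zero_grad()
    logits_q = q_net(x)
    loss_q = F.cross_entropy(logits_q[labeled], y[labeled]) + \
             F.kl_div(logits_q[~labeled].log_softmax(-1), y_soft[~labeled],
                      reduction="batchmean")
    loss_q.backward(); q_opt.step()
    # M-step: fit p to model label dependency, using q's pseudo-labels
    with torch.no_grad():
        pseudo = q_net(x).argmax(-1)
        pseudo[labeled] = y[labeled]
        onehot = F.one_hot(pseudo, logits_q.size(-1)).float()
    p_opt.zero_grad()
    loss_p = F.cross_entropy(p_net(onehot, adj), pseudo)
    loss_p.backward(); p_opt.step()
```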
Robust copy-move forgery detection by false alarms control
Title | Robust copy-move forgery detection by false alarms control |
Authors | Thibaud Ehret |
Abstract | Detecting reliably copy-move forgeries is difficult because images do contain similar objects. The question is: how to discard natural image self-similarities while still detecting copy-moved parts as being “unnaturally similar”? Copy-move may have been performed after a rotation, a change of scale and followed by JPEG compression or the addition of noise. For this reason, we base our method on SIFT, which provides sparse keypoints with scale, rotation and illumination invariant descriptors. To discriminate natural descriptor matches from artificial ones, we introduce an a contrario method which gives theoretical guarantees on the number of false alarms. We validate our method on several databases. Being fully unsupervised it can be integrated into any generic automated image tampering detection pipeline. |
Tasks | |
Published | 2019-06-03 |
URL | https://arxiv.org/abs/1906.00649v1 |
PDF | https://arxiv.org/pdf/1906.00649v1.pdf |
PWC | https://paperswithcode.com/paper/190600649 |
Repo | https://github.com/tehret/rcmfd |
Framework | none |
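A hedged OpenCV sketch of the pipeline: SIFT keypoints are matched against the same image, trivial self-matches and spatially close pairs are discarded, and the survivors mark candidate copy-moved regions. The paper's a contrario control of the number of false alarms is replaced here by a plain descriptor-distance threshold, so the `min_shift` and `max_desc_dist` values are illustrative assumptions.

```python
import cv2
import numpy as np

def copy_move_candidates(gray, min_shift=20.0, max_desc_dist=150.0):
    """gray: single-channel uint8 image; returns matched point pairs."""
    sift = cv2.SIFT_create()
    kps, desc = sift.detectAndCompute(gray, None)
    if desc is None or len(kps) < 2:
        return []
    matcher = cv2.BFMatcher(cv2.NORM_L2)
    pairs = []
    for m, n in matcher.knnMatch(desc, desc, k=2):
        # m matches each keypoint to itself; n is its nearest *other* keypoint
        p = np.array(kps[n.queryIdx].pt)
        q = np.array(kps[n.trainIdx].pt)
        if np.linalg.norm(p - q) > min_shift and n.distance < max_desc_dist:
            pairs.append((tuple(p), tuple(q)))  # candidate copied region
    return pairs

# usage on an illustrative file path:
# img = cv2.imread("suspect.png", cv2.IMREAD_GRAYSCALE)
# print(len(copy_move_candidates(img)), "suspicious matches")
```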
Contrastive Representation Distillation
Title | Contrastive Representation Distillation |
Authors | Yonglong Tian, Dilip Krishnan, Phillip Isola |
Abstract | Often we wish to transfer representational knowledge from one neural network to another. Examples include distilling a large network into a smaller one, transferring knowledge from one sensory modality to a second, or ensembling a collection of models into a single estimator. Knowledge distillation, the standard approach to these problems, minimizes the KL divergence between the probabilistic outputs of a teacher and student network. We demonstrate that this objective ignores important structural knowledge of the teacher network. This motivates an alternative objective by which we train a student to capture significantly more information in the teacher’s representation of the data. We formulate this objective as contrastive learning. Experiments demonstrate that our resulting new objective outperforms knowledge distillation and other cutting-edge distillers on a variety of knowledge transfer tasks, including single model compression, ensemble distillation, and cross-modal transfer. Our method sets a new state-of-the-art in many transfer tasks, and sometimes even outperforms the teacher network when combined with knowledge distillation. Code: http://github.com/HobbitLong/RepDistiller. |
Tasks | Model Compression, Transfer Learning |
Published | 2019-10-23 |
URL | https://arxiv.org/abs/1910.10699v2 |
PDF | https://arxiv.org/pdf/1910.10699v2.pdf |
PWC | https://paperswithcode.com/paper/contrastive-representation-distillation-1 |
Repo | https://github.com/HobbitLong/RepDistiller |
Framework | pytorch |
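The contrastive objective lends itself to a compact sketch: student and teacher embeddings of the same input form a positive pair, other items in the batch serve as negatives, and an InfoNCE-style loss pulls the positives together. The in-batch negatives here are a simplification; the paper's memory-buffer machinery and exact objective are omitted.

```python
import torch
import torch.nn.functional as F

def contrastive_distill_loss(z_student, z_teacher, temperature=0.1):
    """z_student, z_teacher: (batch, dim) projections of the same inputs."""
    zs = F.normalize(z_student, dim=1)
    zt = F.normalize(z_teacher, dim=1).detach()  # teacher is frozen
    logits = zs @ zt.t() / temperature           # (batch, batch) similarities
    labels = torch.arange(zs.size(0), device=zs.device)
    return F.cross_entropy(logits, labels)       # diagonal pairs are positives

loss = contrastive_distill_loss(torch.randn(8, 128), torch.randn(8, 128))
```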
Adversarial Policies: Attacking Deep Reinforcement Learning
Title | Adversarial Policies: Attacking Deep Reinforcement Learning |
Authors | Adam Gleave, Michael Dennis, Cody Wild, Neel Kant, Sergey Levine, Stuart Russell |
Abstract | Deep reinforcement learning (RL) policies are known to be vulnerable to adversarial perturbations to their observations, similar to adversarial examples for classifiers. However, an attacker is not usually able to directly modify another agent’s observations. This might lead one to wonder: is it possible to attack an RL agent simply by choosing an adversarial policy acting in a multi-agent environment so as to create natural observations that are adversarial? We demonstrate the existence of adversarial policies in zero-sum games between simulated humanoid robots with proprioceptive observations, against state-of-the-art victims trained via self-play to be robust to opponents. The adversarial policies reliably win against the victims but generate seemingly random and uncoordinated behavior. We find that these policies are more successful in high-dimensional environments, and induce substantially different activations in the victim policy network than when the victim plays against a normal opponent. Videos are available at https://adversarialpolicies.github.io/. |
Tasks | |
Published | 2019-05-25 |
URL | https://arxiv.org/abs/1905.10615v2 |
PDF | https://arxiv.org/pdf/1905.10615v2.pdf |
PWC | https://paperswithcode.com/paper/adversarial-policies-attacking-deep |
Repo | https://github.com/HumanCompatibleAI/adversarial-policies |
Framework | none |
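The attack setting reduces cleanly to single-agent RL once the victim is frozen inside the environment, which the stub below makes explicit. The environment and policy interfaces are hypothetical placeholders; any standard RL algorithm (e.g. PPO) could then train the adversary on the wrapped environment.

```python
class VictimWrappedEnv:
    """Wraps a two-player environment around a fixed victim policy, so the
    adversary faces an ordinary single-agent MDP whose 'natural' observations
    are shaped by its own behavior."""
    def __init__(self, two_player_env, victim_policy):
        self.env = two_player_env      # stub: steps both agents jointly
        self.victim = victim_policy    # frozen, never updated

    def reset(self):
        obs_adv, self._obs_victim = self.env.reset()
        return obs_adv

    def step(self, adv_action):
        victim_action = self.victim(self._obs_victim)  # frozen victim acts
        (obs_adv, self._obs_victim), reward_adv, done = \
            self.env.step(adv_action, victim_action)
        return obs_adv, reward_adv, done  # adversary sees a 1-player MDP
```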
Disentangling feature and lazy training in deep neural networks
Title | Disentangling feature and lazy training in deep neural networks |
Authors | Mario Geiger, Stefano Spigler, Arthur Jacot, Matthieu Wyart |
Abstract | Two distinct limits for deep learning have been derived as the network width $h\rightarrow \infty$, depending on how the weights of the last layer scale with $h$. In the Neural Tangent Kernel (NTK) limit, the dynamics becomes linear in the weights and is described by a frozen kernel $\Theta$ (the NTK). By contrast, in the Mean Field limit, the dynamics can be expressed in terms of the distribution of the parameters associated to a neuron, that follows a partial differential equation. In this work we consider deep networks where the weights in the last layer scale as $\alpha h^{-1/2}$ at initialization. By varying $\alpha$ and $h$, we probe the crossover between the two limits. We observe two regimes that we call “lazy training” and “feature training”. In the lazy-training regime, the dynamics is almost linear and the NTK does barely change after initialization. The feature-training regime includes the mean-field formulation as a limiting case and is characterized by a kernel that evolves in time, and thus learns some features. We perform numerical experiments on MNIST, Fashion-MNIST, EMNIST and CIFAR10 and consider various architectures. We find that: (i) The two regimes are separated by an $\alpha^*$ that scales as $\frac{1}{\sqrt{h}}$. (ii) Network architecture and data structure play an important role in determining which regime is better: in our tests, fully-connected networks generally perform better in the lazy-training regime (except when we reduce the dataset via PCA), and we provide an example of a convolutional network that achieves a lower error in the feature-training regime. (iii) In both regimes, the fluctuations $\delta F$ induced by initial conditions on the learned function decay as $\delta F \sim 1/\sqrt{h}$, leading to a performance that increases with $h$. (iv) In the feature-training regime we identify a time scale $t_1 \sim \sqrt{h}\alpha$ |
Tasks | |
Published | 2019-06-19 |
URL | https://arxiv.org/abs/1906.08034v2 |
PDF | https://arxiv.org/pdf/1906.08034v2.pdf |
PWC | https://paperswithcode.com/paper/disentangling-feature-and-lazy-learning-in |
Repo | https://github.com/mariogeiger/feature_lazy |
Framework | pytorch |
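A small PyTorch sketch of the probed setup: a fully-connected network of width $h$ whose last-layer weights are rescaled to $\alpha h^{-1/2}$ at initialization, so that sweeping $\alpha$ and $h$ explores the crossover between the lazy and feature regimes. Depth, activation, and the rescaling mechanics are illustrative assumptions.

```python
import torch
import torch.nn as nn

def make_net(d_in, h, alpha):
    net = nn.Sequential(
        nn.Linear(d_in, h), nn.ReLU(),
        nn.Linear(h, h), nn.ReLU(),
        nn.Linear(h, 1, bias=False),
    )
    scale = alpha * h ** -0.5
    with torch.no_grad():
        # rescale the last layer so its weights have std ~ alpha * h**-0.5
        net[-1].weight.mul_(scale / net[-1].weight.std())
    return net

# larger alpha * sqrt(h) pushes training toward the lazy (nearly linear) regime
for alpha in (0.01, 1.0, 100.0):
    net = make_net(10, h=512, alpha=alpha)
    print(alpha, net[-1].weight.std().item())
```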
Auto-Retoucher(ART) - A framework for Background Replacement and Image Editing
Title | Auto-Retoucher(ART) - A framework for Background Replacement and Image Editing |
Authors | Yunxuan Xiao, Yikai Li, Yuwei Wu, Lizhen Zhu |
Abstract | Replacing the background and simultaneously adjusting foreground objects is a challenging task in image editing. Current techniques for generating such images rely heavily on user interaction with image editing software, which is a tedious job for professional retouchers. To reduce their workload, some exciting progress has been made on generating images with a given background. However, these models can neither adjust the position and scale of the foreground objects, nor guarantee semantic consistency between foreground and background. To overcome these limitations, we propose a framework, ART (Auto-Retoucher), to generate images with sufficient semantic and spatial consistency. Images are first processed by semantic matting and scene parsing modules; then a multi-task verifier model gives two confidence scores for the current background and position setting. We demonstrate that our jointly optimized verifier model successfully improves the visual consistency, and that our ART framework performs well on images with the human body as foreground. |
Tasks | Scene Parsing |
Published | 2019-01-13 |
URL | http://arxiv.org/abs/1901.03954v1 |
PDF | http://arxiv.org/pdf/1901.03954v1.pdf |
PWC | https://paperswithcode.com/paper/auto-retoucherart-a-framework-for-background |
Repo | https://github.com/woshiyyya/Auto-Retoucher-pytorch |
Framework | pytorch |
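A hedged sketch of the multi-task verifier the abstract mentions: one shared backbone scores a composited image twice, once for background (semantic) consistency and once for foreground position and scale plausibility. The backbone and head sizes are placeholders, not the paper's model.

```python
import torch
import torch.nn as nn

class Verifier(nn.Module):
    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.background_head = nn.Linear(64, 1)  # semantic-consistency score
        self.position_head = nn.Linear(64, 1)    # position/scale score

    def forward(self, composite):
        feat = self.backbone(composite)
        return (torch.sigmoid(self.background_head(feat)),
                torch.sigmoid(self.position_head(feat)))

scores = Verifier()(torch.randn(1, 3, 128, 128))  # two confidence scores
```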
Adversarial Embedding: A robust and elusive Steganography and Watermarking technique
Title | Adversarial Embedding: A robust and elusive Steganography and Watermarking technique |
Authors | Salah Ghamizi, Maxime Cordy, Mike Papadakis, Yves Le Traon |
Abstract | We propose adversarial embedding, a new steganography and watermarking technique that embeds secret information within images. The key idea of our method is to use deep neural networks for image classification and adversarial attacks to embed secret information within images. Thus, we use the attacks to embed an encoding of the message within images, and the related deep neural network outputs to extract it. The key properties of adversarial attacks (invisible perturbations, nontransferability, resilience to tampering) offer guarantees regarding the confidentiality and the integrity of the hidden messages. We empirically evaluate adversarial embedding using more than 100 models and 1,000 messages. Our results confirm that our embedding passes unnoticed by both humans and steganalysis methods, while at the same time impeding illicit retrieval of the message (less than 13% recovery rate when the interceptor has some knowledge about our model) and remaining resilient to soft and (to some extent) aggressive image tampering (up to 100% recovery rate under JPEG compression). We further develop our method by proposing a new type of adversarial attack which improves the embedding density (amount of hidden information) of our method to up to 10 bits per pixel. |
Tasks | Adversarial Attack, Image Classification |
Published | 2019-11-14 |
URL | https://arxiv.org/abs/1912.01487v1 |
PDF | https://arxiv.org/pdf/1912.01487v1.pdf |
PWC | https://paperswithcode.com/paper/adversarial-embedding-a-robust-and-elusive |
Repo | https://github.com/yamizi/Adversarial-Embedding |
Framework | tf |
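A schematic of the embedding idea: each message chunk is mapped to a target class, a targeted adversarial perturbation pushes the classifier toward that class, and the receiver decodes by running the same model. The iterated FGSM step below is a generic stand-in, not the paper's improved attack, and `model` is any differentiable image classifier passed in by the caller.

```python
import torch
import torch.nn.functional as F

def embed_chunk(model, image, target_class, eps=0.03, steps=10):
    """Perturb `image` (shape (1, C, H, W)) so `model` predicts `target_class`."""
    x = image.clone().requires_grad_(True)
    for _ in range(steps):
        loss = F.cross_entropy(model(x), torch.tensor([target_class]))
        grad, = torch.autograd.grad(loss, x)
        # descend the targeted loss; a real attack would also clamp to [0, 1]
        x = (x - (eps / steps) * grad.sign()).detach().requires_grad_(True)
    return x.detach()                     # stego image carrying one chunk

def decode_chunk(model, stego_image):
    with torch.no_grad():
        return model(stego_image).argmax(dim=1).item()  # recovered chunk
```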
Unsupervised Neural Single-Document Summarization of Reviews via Learning Latent Discourse Structure and its Ranking
Title | Unsupervised Neural Single-Document Summarization of Reviews via Learning Latent Discourse Structure and its Ranking |
Authors | Masaru Isonuma, Junichiro Mori, Ichiro Sakata |
Abstract | This paper focuses on the end-to-end abstractive summarization of a single product review without supervision. We assume that a review can be described as a discourse tree, in which the summary is the root, and the child sentences explain their parent in detail. By recursively estimating a parent from its children, our model learns the latent discourse tree without an external parser and generates a concise summary. We also introduce an architecture that ranks the importance of each sentence on the tree to support summary generation focusing on the main review point. The experimental results demonstrate that our model is competitive with or outperforms other unsupervised approaches. In particular, for relatively long reviews, it achieves a competitive or better performance than supervised models. The induced tree shows that the child sentences provide additional information about their parent, and the generated summary abstracts the entire review. |
Tasks | Abstractive Text Summarization, Document Summarization |
Published | 2019-06-13 |
URL | https://arxiv.org/abs/1906.05691v1 |
PDF | https://arxiv.org/pdf/1906.05691v1.pdf |
PWC | https://paperswithcode.com/paper/unsupervised-neural-single-document |
Repo | https://github.com/misonuma/strsum |
Framework | tf |
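A schematic of the latent-tree intuition: given sentence embeddings, the root (the summary candidate) is the sentence with the most support from the others, and the remaining sentences attach greedily to their most related, already-attached parent. Both the scoring and the greedy attachment are illustrative stand-ins for the paper's learned recursive estimation.

```python
import numpy as np

def induce_tree(sent_embs):
    """sent_embs: (n_sents, dim) array; returns (root_index, parent_map)."""
    sims = sent_embs @ sent_embs.T          # pairwise sentence affinities
    np.fill_diagonal(sims, 0.0)
    root = int(sims.sum(axis=0).argmax())   # best supported by all others
    attached, parent = {root}, {}
    remaining = set(range(len(sent_embs))) - attached
    while remaining:
        # attach the highest-affinity (sentence, parent) pair still available
        i, p = max(((i, p) for i in remaining for p in attached),
                   key=lambda ip: sims[ip[0], ip[1]])
        parent[i] = p
        attached.add(i)
        remaining.remove(i)
    return root, parent

embs = np.random.randn(5, 16)
root, parents = induce_tree(embs)           # root plays the summary role
```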
Music Artist Classification with Convolutional Recurrent Neural Networks
Title | Music Artist Classification with Convolutional Recurrent Neural Networks |
Authors | Zain Nasrullah, Yue Zhao |
Abstract | Previous attempts at music artist classification use frame level audio features which summarize frequency content within short intervals of time. Comparatively, more recent music information retrieval tasks take advantage of temporal structure in audio spectrograms using deep convolutional and recurrent models. This paper revisits artist classification with this new framework and empirically explores the impacts of incorporating temporal structure in the feature representation. To this end, an established classification architecture, a Convolutional Recurrent Neural Network (CRNN), is applied to the artist20 music artist identification dataset under a comprehensive set of conditions. These include audio clip length, which is a novel contribution in this work, and previously identified considerations such as dataset split and feature level. Our results improve upon baseline works, verify the influence of the producer effect on classification performance and demonstrate the trade-offs between audio length and training set size. The best performing model achieves an average F1 score of 0.937 across three independent trials which is a substantial improvement over the corresponding baseline under similar conditions. Additionally, to showcase the effectiveness of the CRNN’s feature extraction capabilities, we visualize audio samples at the model’s bottleneck layer demonstrating that learned representations segment into clusters belonging to their respective artists. |
Tasks | Information Retrieval, Music Information Retrieval |
Published | 2019-01-14 |
URL | http://arxiv.org/abs/1901.04555v2 |
PDF | http://arxiv.org/pdf/1901.04555v2.pdf |
PWC | https://paperswithcode.com/paper/music-artist-classification-with |
Repo | https://github.com/winstonll/SynC |
Framework | none |
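A hedged PyTorch sketch of the architecture class used in the paper: convolutional blocks summarize local time-frequency structure of a mel-spectrogram, a GRU models the temporal sequence, and the final hidden state is classified into artists. All layer sizes are illustrative, not the paper's configuration.

```python
import torch
import torch.nn as nn

class CRNN(nn.Module):
    def __init__(self, n_mels=128, n_artists=20):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d((4, 2)),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d((4, 2)),
        )
        self.gru = nn.GRU(64 * (n_mels // 16), 128, batch_first=True)
        self.fc = nn.Linear(128, n_artists)

    def forward(self, spec):               # spec: (batch, 1, n_mels, frames)
        f = self.conv(spec)                # (batch, 64, n_mels/16, frames/4)
        f = f.permute(0, 3, 1, 2).flatten(2)  # (batch, time, features)
        _, h = self.gru(f)
        return self.fc(h[-1])              # artist logits

logits = CRNN()(torch.randn(2, 1, 128, 94))  # short clip of 94 mel frames
```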
SpykeTorch: Efficient Simulation of Convolutional Spiking Neural Networks with at most one Spike per Neuron
Title | SpykeTorch: Efficient Simulation of Convolutional Spiking Neural Networks with at most one Spike per Neuron |
Authors | Milad Mozafari, Mohammad Ganjtabesh, Abbas Nowzari-Dalini, Timothée Masquelier |
Abstract | Application of deep convolutional spiking neural networks (SNNs) to artificial intelligence (AI) tasks has recently gained a lot of interest, since SNNs are hardware-friendly and energy-efficient. Unlike their non-spiking counterparts, most existing SNN simulation frameworks are not efficient enough in practice for large-scale AI tasks. In this paper, we introduce SpykeTorch, an open-source high-speed simulation framework based on PyTorch. This framework simulates convolutional SNNs with at most one spike per neuron and the rank-order encoding scheme. In terms of learning rules, both spike-timing-dependent plasticity (STDP) and reward-modulated STDP (R-STDP) are implemented, but other rules can be implemented easily. Apart from the aforementioned properties, SpykeTorch is highly generic and capable of reproducing the results of various studies. Computations in the proposed framework are tensor-based and performed entirely by PyTorch functions, which in turn enables just-in-time optimization for running on CPUs, GPUs, or multi-GPU platforms. |
Tasks | |
Published | 2019-03-06 |
URL | https://arxiv.org/abs/1903.02440v2 |
PDF | https://arxiv.org/pdf/1903.02440v2.pdf |
PWC | https://paperswithcode.com/paper/spyketorch-efficient-simulation-of |
Repo | https://github.com/miladmozafari/SpykeTorch |
Framework | pytorch |
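To make the "at most one spike per neuron" constraint concrete, here is a plain-PyTorch illustration (not the SpykeTorch API) of time-to-first-spike coding, where stronger inputs fire at earlier discrete time steps and every neuron spikes at most once:

```python
import torch

def rank_order_encode(intensities, n_steps=8):
    """Map input intensities in [0, 1) to one-hot spike trains where larger
    values fire earlier; each neuron emits at most one spike."""
    latency = ((1.0 - intensities) * (n_steps - 1)).long()  # earlier = stronger
    spikes = torch.zeros(n_steps, *intensities.shape)
    spikes.scatter_(0, latency.unsqueeze(0), 1.0)
    return spikes                          # (time, ...) binary spike train

train = rank_order_encode(torch.rand(5, 5))
assert train.sum(dim=0).max() <= 1         # at most one spike per neuron
print(train.argmax(dim=0))                 # firing time of each neuron
```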