February 1, 2020


Paper Group AWR 296

BERT4Rec: Sequential Recommendation with Bidirectional Encoder Representations from Transformer. Levenshtein Transformer. Localization, Detection and Tracking of Multiple Moving Sound Sources with a Convolutional Recurrent Neural Network. Online Adaptive Principal Component Analysis and Its extensions. Task-Oriented Language Grounding for Language …

BERT4Rec: Sequential Recommendation with Bidirectional Encoder Representations from Transformer


Title	BERT4Rec: Sequential Recommendation with Bidirectional Encoder Representations from Transformer
Authors	Fei Sun, Jun Liu, Jian Wu, Changhua Pei, Xiao Lin, Wenwu Ou, Peng Jiang
Abstract	Modeling users’ dynamic and evolving preferences from their historical behaviors is challenging and crucial for recommendation systems. Previous methods employ sequential neural networks (e.g., Recurrent Neural Network) to encode users’ historical interactions from left to right into hidden representations for making recommendations. Although these methods achieve satisfactory results, they often assume a rigidly ordered sequence, which is not always practical. We argue that such left-to-right unidirectional architectures restrict the power of the historical sequence representations. To this end, we introduce Bidirectional Encoder Representations from Transformers for sequential Recommendation (BERT4Rec). However, jointly conditioning on both left and right context in a deep bidirectional model would make training trivial, since each item can indirectly “see the target item”. To address this problem, we train the bidirectional model using the Cloze task, predicting the masked items in the sequence by jointly conditioning on their left and right context. Compared with predicting the next item at each position in a sequence, the Cloze task can produce more samples to train a more powerful bidirectional model. Extensive experiments on four benchmark datasets show that our model consistently outperforms various state-of-the-art sequential models.
Tasks	Recommendation Systems
Published	2019-04-14
URL	https://arxiv.org/abs/1904.06690v2
PDF	https://arxiv.org/pdf/1904.06690v2.pdf
PWC	https://paperswithcode.com/paper/bert4rec-sequential-recommendation-with
Repo	https://github.com/jaywonchung/BERT4Rec-VAE-Pytorch
Framework	pytorch
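
The Cloze objective amounts to a masking step applied to each interaction sequence before it reaches the bidirectional Transformer. Below is a minimal Python sketch, assuming integer item ids with 0 reserved for the mask token and the common practice of appending a mask at inference time; the names `cloze_mask`, `next_item_query` and `MASK_ID` and the masking probability are illustrative, not taken from the BERT4Rec code, and the Transformer itself is not shown.

```python
import random
from typing import List, Optional, Tuple

MASK_ID = 0  # hypothetical id reserved for the [mask] token; real items use ids >= 1


def cloze_mask(seq: List[int], mask_prob: float,
               rng: random.Random) -> Tuple[List[int], List[Optional[int]]]:
    """Randomly replace items with MASK_ID (Cloze task).

    Returns (inputs, labels): labels hold the original item at masked
    positions and None elsewhere, so the loss is computed only there.
    """
    inputs: List[int] = []
    labels: List[Optional[int]] = []
    for item in seq:
        if rng.random() < mask_prob:
            inputs.append(MASK_ID)
            labels.append(item)
        else:
            inputs.append(item)
            labels.append(None)
    # Make sure every non-empty sequence yields at least one training target.
    if seq and all(label is None for label in labels):
        pos = rng.randrange(len(seq))
        inputs[pos], labels[pos] = MASK_ID, seq[pos]
    return inputs, labels


def next_item_query(seq: List[int]) -> List[int]:
    """At inference time, append a mask so the model predicts the next item."""
    return seq + [MASK_ID]
```

Because one history can be masked in many different ways, a single user sequence yields many training samples, which is the sample-efficiency argument made in the abstract.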

Levenshtein Transformer


Title	Levenshtein Transformer
Authors	Jiatao Gu, Changhan Wang, Jake Zhao
Abstract	Modern neural sequence generation models are built to either generate tokens step-by-step from scratch or (iteratively) modify a sequence of tokens bounded by a fixed length. In this work, we develop Levenshtein Transformer, a new partially autoregressive model devised for more flexible and amenable sequence generation. Unlike previous approaches, the atomic operations of our model are insertion and deletion. Their combination facilitates not only generation but also sequence refinement allowing dynamic length changes. We also propose a set of new training techniques dedicated to them, effectively exploiting one as the other’s learning signal thanks to their complementary nature. Experiments applying the proposed model achieve comparable performance but much-improved efficiency on both generation (e.g. machine translation, text summarization) and refinement tasks (e.g. automatic post-editing). We further confirm the flexibility of our model by showing that a Levenshtein Transformer trained for machine translation can straightforwardly be used for automatic post-editing.
Tasks	Automatic Post-Editing, Machine Translation, Text Summarization
Published	2019-05-27
URL	https://arxiv.org/abs/1905.11006v2
PDF	https://arxiv.org/pdf/1905.11006v2.pdf
PWC	https://paperswithcode.com/paper/levenshtein-transformer
Repo	https://github.com/pytorch/fairseq
Framework	pytorch
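
Since the model's only atomic operations are insertion and deletion, the natural distance between a draft and a reference is the Levenshtein distance without substitutions. Below is a minimal Python sketch of that distance, computed through a longest common subsequence, together with the mask of draft tokens a deletion step would remove. It illustrates the edit-operation view only; it is not the fairseq implementation or the paper's exact training oracle, and the function name is hypothetical.

```python
from typing import List, Sequence, Tuple


def insert_delete_alignment(src: Sequence[str], tgt: Sequence[str]) -> Tuple[int, List[bool]]:
    """Edit distance restricted to insertions and deletions (no substitutions).

    Returns (distance, delete_mask) where delete_mask[i] is True if src[i]
    must be deleted; the kept tokens form a longest common subsequence and
    the missing target tokens are supplied by insertions.
    """
    n, m = len(src), len(tgt)
    # lcs[i][j] = length of the LCS of src[i:] and tgt[j:].
    lcs = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if src[i] == tgt[j]:
                lcs[i][j] = lcs[i + 1][j + 1] + 1
            else:
                lcs[i][j] = max(lcs[i + 1][j], lcs[i][j + 1])
    keep = [False] * n
    i = j = 0
    while i < n and j < m:
        if src[i] == tgt[j]:
            keep[i] = True          # matched token survives
            i, j = i + 1, j + 1
        elif lcs[i + 1][j] >= lcs[i][j + 1]:
            i += 1                  # skipping src[i] loses nothing: delete it
        else:
            j += 1                  # tgt[j] will have to be inserted
    kept = sum(keep)
    distance = (n - kept) + (m - kept)
    return distance, [not k for k in keep]
```

For "kitten" to "sitting", the sketch deletes k and e and inserts s, i and g: five operations in total.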

Localization, Detection and Tracking of Multiple Moving Sound Sources with a Convolutional Recurrent Neural Network


Title	Localization, Detection and Tracking of Multiple Moving Sound Sources with a Convolutional Recurrent Neural Network
Authors	Sharath Adavanne, Archontis Politis, Tuomas Virtanen
Abstract	This paper investigates the joint localization, detection, and tracking of sound events using a convolutional recurrent neural network (CRNN). We use a CRNN previously proposed for the localization and detection of stationary sources, and show that the recurrent layers enable the spatial tracking of moving sources when trained with dynamic scenes. The tracking performance of the CRNN is compared with a stand-alone tracking method that combines a multi-source direction-of-arrival (DOA) estimator and a particle filter. Their respective performance is evaluated in various acoustic conditions such as anechoic and reverberant scenarios, stationary and moving sources at several angular velocities, and with a varying number of overlapping sources. The results show that the CRNN manages to track multiple sources more consistently than the parametric method across acoustic scenarios, but at the cost of higher localization error.
Tasks
Published	2019-04-29
URL	http://arxiv.org/abs/1904.12769v1
PDF	http://arxiv.org/pdf/1904.12769v1.pdf
PWC	https://paperswithcode.com/paper/localization-detection-and-tracking-of
Repo	https://github.com/sharathadavanne/seld-net
Framework	none

Online Adaptive Principal Component Analysis and Its extensions


Title	Online Adaptive Principal Component Analysis and Its extensions
Authors	Jianjun Yuan, Andrew Lamperski
Abstract	We propose algorithms for online principal component analysis (PCA) and variance minimization for adaptive settings. Previous literature has focused on upper bounding the static adversarial regret, whose comparator is the optimal fixed action in hindsight. However, static regret is not an appropriate metric when the underlying environment is changing. Instead, we adopt the adaptive regret metric from the previous literature and propose online adaptive algorithms for PCA and variance minimization that have sub-linear adaptive regret guarantees. We demonstrate both theoretically and experimentally that the proposed algorithms can adapt to changing environments.
Tasks
Published	2019-01-23
URL	https://arxiv.org/abs/1901.07687v3
PDF	https://arxiv.org/pdf/1901.07687v3.pdf
PWC	https://paperswithcode.com/paper/online-adaptive-principal-component-analysis
Repo	https://github.com/yuanx270/online-adaptive-PCA
Framework	none
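
The difference between static and adaptive regret is easiest to see on a toy problem. Below is a minimal Python sketch, assuming a finite set of actions with known per-round losses (the paper's PCA setting uses projection matrices instead) and taking adaptive regret as the worst regret over all contiguous intervals of rounds; the function names are illustrative.

```python
from typing import Sequence


def interval_regret(learner: Sequence[float], losses: Sequence[Sequence[float]],
                    r: int, s: int) -> float:
    """Regret on rounds r..s (inclusive) against the best fixed action for that interval."""
    n_actions = len(losses[0])
    learner_total = sum(learner[r:s + 1])
    best_fixed = min(sum(losses[t][k] for t in range(r, s + 1)) for k in range(n_actions))
    return learner_total - best_fixed


def static_regret(learner: Sequence[float], losses: Sequence[Sequence[float]]) -> float:
    """Comparator is the single best action over the whole horizon."""
    return interval_regret(learner, losses, 0, len(learner) - 1)


def adaptive_regret(learner: Sequence[float], losses: Sequence[Sequence[float]]) -> float:
    """Worst regret over every contiguous interval of rounds."""
    T = len(learner)
    return max(interval_regret(learner, losses, r, s) for r in range(T) for s in range(r, T))


# Two actions; the better one switches halfway through.
losses = [[0.0, 1.0]] * 4 + [[1.0, 0.0]] * 4
learner = [row[0] for row in losses]  # a learner that never leaves action 0
print(static_regret(learner, losses))    # 0.0
print(adaptive_regret(learner, losses))  # 4.0
```

The learner that never adapts has zero static regret, because no single fixed action beats it over the whole horizon, yet it accumulates regret 4 on the second half, which only the adaptive metric exposes.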

Task-Oriented Language Grounding for Language Input with Multiple Sub-Goals of Non-Linear Order


Title	Task-Oriented Language Grounding for Language Input with Multiple Sub-Goals of Non-Linear Order
Authors	Vladislav Kurenkov, Bulat Maksudov, Adil Khan
Abstract	In this work, we analyze the performance of general deep reinforcement learning algorithms for a task-oriented language grounding problem, where language input contains multiple sub-goals and their order of execution is non-linear. We generate a simple instructional language for the GridWorld environment that is built around three language elements (order connectors) defining the order of execution: one linear (“comma”) and two non-linear (“but first” and “but before”). We apply one of the deep reinforcement learning baselines, Double DQN with frame stacking, and ablate several extensions such as Prioritized Experience Replay and a Gated-Attention architecture. Our results show that the introduction of non-linear order connectors improves the success rate on instructions with a higher number of sub-goals by a factor of 2-3, but it still does not exceed 20%. We also observe that Gated-Attention provides no competitive advantage over concatenation in this setting. Source code and experiments’ results are available at https://github.com/vkurenkov/language-grounding-multigoal
Tasks
Published	2019-10-27
URL	https://arxiv.org/abs/1910.12354v1
PDF	https://arxiv.org/pdf/1910.12354v1.pdf
PWC	https://paperswithcode.com/paper/task-oriented-language-grounding-for-language
Repo	https://github.com/vkurenkov/language-grounding-multigoal
Framework	pytorch

Attention-Informed Mixed-Language Training for Zero-shot Cross-lingual Task-oriented Dialogue Systems


Title	Attention-Informed Mixed-Language Training for Zero-shot Cross-lingual Task-oriented Dialogue Systems
Authors	Zihan Liu, Genta Indra Winata, Zhaojiang Lin, Peng Xu, Pascale Fung
Abstract	Recently, data-driven task-oriented dialogue systems have achieved promising performance in English. However, developing dialogue systems that support low-resource languages remains a long-standing challenge due to the absence of high-quality data. To circumvent expensive and time-consuming data collection, we introduce Attention-Informed Mixed-Language Training (MLT), a novel zero-shot adaptation method for cross-lingual task-oriented dialogue systems. It leverages very few task-related parallel word pairs to generate code-switching sentences for learning the inter-lingual semantics across languages. Instead of manually selecting the word pairs, we propose to extract source words based on the scores computed by the attention layer of a trained English task-related model and then generate word pairs using existing bilingual dictionaries. Furthermore, extensive experiments with different cross-lingual embeddings demonstrate the effectiveness of our approach. Finally, with very few word pairs, our model achieves significant zero-shot adaptation performance improvements in both cross-lingual dialogue state tracking and natural language understanding (i.e., intent detection and slot filling) tasks compared to the current state-of-the-art approaches, which utilize a much larger amount of bilingual data.
Tasks	Dialogue State Tracking, Intent Detection, Slot Filling, Task-Oriented Dialogue Systems
Published	2019-11-21
URL	https://arxiv.org/abs/1911.09273v1
PDF	https://arxiv.org/pdf/1911.09273v1.pdf
PWC	https://paperswithcode.com/paper/attention-informed-mixed-language-training
Repo	https://github.com/zliucr/mixed-language-training
Framework	pytorch
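
The code-switching step can be pictured as replacing the words an English task model attends to most with their dictionary translations. Below is a simplified per-sentence Python sketch: the attention scores, the toy English-Spanish dictionary and `top_k` are hypothetical, and the paper selects its word pairs from attention scores over the training data rather than sentence by sentence.

```python
from typing import Dict, List, Sequence


def code_switch(tokens: Sequence[str], attention: Sequence[float],
                dictionary: Dict[str, str], top_k: int) -> List[str]:
    """Replace the top_k most-attended tokens that have a dictionary entry."""
    # Rank positions by attention score, highest first.
    ranked = sorted(range(len(tokens)), key=lambda i: attention[i], reverse=True)
    chosen = [i for i in ranked if tokens[i] in dictionary][:top_k]
    switched = list(tokens)
    for i in chosen:
        switched[i] = dictionary[tokens[i]]
    return switched


tokens = ["set", "an", "alarm", "for", "tomorrow"]
attention = [0.10, 0.02, 0.60, 0.03, 0.25]                         # hypothetical scores
en_es = {"alarm": "alarma", "tomorrow": "mañana", "set": "poner"}  # toy dictionary
print(code_switch(tokens, attention, en_es, top_k=2))
# ['set', 'an', 'alarma', 'for', 'mañana']
```

Restricting replacements to highly attended words keeps the number of required word pairs small, which matches the abstract's emphasis on using very few task-related pairs.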

Complex Signal Denoising and Interference Mitigation for Automotive Radar Using Convolutional Neural Networks


Title	Complex Signal Denoising and Interference Mitigation for Automotive Radar Using Convolutional Neural Networks
Authors	Johanna Rock, Mate Toth, Elmar Messner, Paul Meissner, Franz Pernkopf
Abstract	Driver assistance systems as well as autonomous cars have to rely on sensors to perceive their environment. A heterogeneous set of sensors is used to perform this task robustly. Among them, radar sensors are indispensable because of their range resolution and the possibility to directly measure velocity. Since more and more radar sensors are deployed on the streets, mutual interference must be dealt with. In the so far unregulated automotive radar frequency band, a sensor must be capable of detecting, or even mitigating, the harmful effects of interference, which include a decreased detection sensitivity. In this paper, we address this issue with Convolutional Neural Networks (CNNs), which are state-of-the-art machine learning tools. We show that the ability of CNNs to find structured information in data while preserving local information enables superior denoising performance. To achieve this, CNN parameters are learned from simulated data, and the trained network is integrated into the automotive radar signal processing chain. The presented method is compared with the state of the art, highlighting its promising performance. Hence, CNNs can be employed for interference mitigation as an alternative to conventional signal processing methods. Code and pre-trained models are available at https://github.com/johanna-rock/imRICnn.
Tasks	Denoising
Published	2019-06-24
URL	https://arxiv.org/abs/1906.10044v2
PDF	https://arxiv.org/pdf/1906.10044v2.pdf
PWC	https://paperswithcode.com/paper/complex-signal-denoising-and-interference
Repo	https://github.com/johanna-rock/imRICnn
Framework	pytorch

CodeSearchNet Challenge: Evaluating the State of Semantic Code Search


Title	CodeSearchNet Challenge: Evaluating the State of Semantic Code Search
Authors	Hamel Husain, Ho-Hsiang Wu, Tiferet Gazit, Miltiadis Allamanis, Marc Brockschmidt
Abstract	Semantic code search is the task of retrieving relevant code given a natural language query. While related to other information retrieval tasks, it requires bridging the gap between the language used in code (often abbreviated and highly technical) and natural language more suitable to describe vague concepts and ideas. To enable evaluation of progress on code search, we are releasing the CodeSearchNet Corpus and are presenting the CodeSearchNet Challenge, which consists of 99 natural language queries with about 4k expert relevance annotations of likely results from the CodeSearchNet Corpus. The corpus contains about 6 million functions from open-source code spanning six programming languages (Go, Java, JavaScript, PHP, Python, and Ruby). The CodeSearchNet Corpus also contains automatically generated query-like natural language for 2 million functions, obtained by mechanically scraping and preprocessing associated function documentation. In this article, we describe the methodology used to obtain the corpus and expert labels, as well as a number of simple baseline solutions for the task. We hope that the CodeSearchNet Challenge encourages researchers and practitioners to study this interesting task further, and we will host a competition and leaderboard to track progress on the challenge. We are also keen on extending the CodeSearchNet Challenge to more queries and programming languages in the future.
Tasks	Code Search, Information Retrieval
Published	2019-09-20
URL	https://arxiv.org/abs/1909.09436v2
PDF	https://arxiv.org/pdf/1909.09436v2.pdf
PWC	https://paperswithcode.com/paper/codesearchnet-challenge-evaluating-the-state
Repo	https://github.com/UmarFarooqui/Springboard
Framework	none
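
Ranking quality against graded expert annotations such as these is commonly scored with normalized discounted cumulative gain (NDCG). Below is a minimal Python sketch, assuming numeric relevance grades per (query, result) pair, linear gains, and unannotated results counted as irrelevant; the identifiers are hypothetical and this is not the official CodeSearchNet evaluation script.

```python
import math
from typing import Dict, Optional, Sequence


def dcg(gains: Sequence[float]) -> float:
    """Discounted cumulative gain: the gain at 1-based position p is divided by log2(p + 1)."""
    return sum(g / math.log2(rank + 2) for rank, g in enumerate(gains))


def ndcg(ranked_ids: Sequence[str], relevance: Dict[str, float],
         k: Optional[int] = None) -> float:
    """NDCG of one ranked result list; results without an annotation get relevance 0."""
    gains = [relevance.get(doc, 0.0) for doc in ranked_ids]
    ideal = sorted(relevance.values(), reverse=True)
    if k is not None:
        gains, ideal = gains[:k], ideal[:k]
    best = dcg(ideal)
    return dcg(gains) / best if best > 0 else 0.0
```

A perfect ordering scores 1.0; placing the only relevant result second instead of first scores 1/log2(3), roughly 0.63.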

$ρ$-VAE: Autoregressive parametrization of the VAE encoder


Title	$ρ$-VAE: Autoregressive parametrization of the VAE encoder
Authors	Sohrab Ferdowsi, Maurits Diephuis, Shideh Rezaeifar, Slava Voloshynovskiy
Abstract	We make a minimal but very effective alteration to the VAE model: a drop-in replacement for the (sample-dependent) approximate posterior that changes it from the standard white Gaussian with diagonal covariance to a first-order autoregressive Gaussian. We argue that this is a more reasonable choice for natural signals like images, as it does not force the existing correlation in the data to disappear in the posterior. Moreover, it gives the approximate posterior more freedom to match the true posterior. The reparametrization trick still applies, and the KL-divergence term still has a closed-form expression, obviating the need for its sample-based estimation. Although it provides more freedom to adapt to correlated distributions, our parametrization has even fewer parameters than the diagonal covariance, as it requires only two scalars, $\rho$ and $s$, to characterize correlation and scaling, respectively. As validated by the experiments, our proposition noticeably and consistently improves the quality of image generation in a plug-and-play manner, needing no further parameter tuning, across all setups. The code to reproduce our experiments is available at https://github.com/sssohrab/rho_VAE/.
Tasks	Image Generation
Published	2019-09-13
URL	https://arxiv.org/abs/1909.06236v1
PDF	https://arxiv.org/pdf/1909.06236v1.pdf
PWC	https://paperswithcode.com/paper/-vae-autoregressive-parametrization-of-the
Repo	https://github.com/sssohrab/rho_VAE
Framework	pytorch
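
The closed-form KL term follows from the structure of the first-order autoregressive covariance. Assuming the posterior covariance is $\Sigma_{ij} = s^2\rho^{|i-j|}$ (our reading of the two-scalar parametrization), $\operatorname{tr}\Sigma = d s^2$ and $\det\Sigma = s^{2d}(1-\rho^2)^{d-1}$, so the KL divergence to a standard normal prior is $\frac{1}{2}\left(d s^2 + \lVert\mu\rVert^2 - d - 2d\ln s - (d-1)\ln(1-\rho^2)\right)$. The Python sketch below evaluates this expression and draws a reparametrized sample by running the AR(1) recursion; the function names are illustrative, not from the rho_VAE repository.

```python
import math
import random
from typing import List, Sequence


def ar1_kl_to_standard_normal(mu: Sequence[float], rho: float, s: float) -> float:
    """Closed-form KL( N(mu, Sigma) || N(0, I) ) for Sigma_ij = s^2 * rho^|i-j|."""
    d = len(mu)
    if not (-1.0 < rho < 1.0) or s <= 0.0:
        raise ValueError("need |rho| < 1 and s > 0")
    trace = d * s * s                                                  # tr(Sigma)
    log_det = 2 * d * math.log(s) + (d - 1) * math.log(1 - rho * rho)  # ln det(Sigma)
    return 0.5 * (trace + sum(m * m for m in mu) - d - log_det)


def sample_ar1(mu: Sequence[float], rho: float, s: float,
               rng: random.Random) -> List[float]:
    """Reparametrized sample z = mu + noise, where the noise is a stationary AR(1) chain."""
    z: List[float] = []
    prev = 0.0  # deviation of the previous coordinate from its mean
    for i, m in enumerate(mu):
        eps = rng.gauss(0.0, 1.0)
        if i == 0:
            dev = s * eps
        else:
            # Variance stays s^2 at every step; correlation with the previous step is rho.
            dev = rho * prev + s * math.sqrt(1.0 - rho * rho) * eps
        z.append(m + dev)
        prev = dev
    return z
```

With $\rho = 0$ and $s = 1$ the KL reduces to $\frac{1}{2}\lVert\mu\rVert^2$, the familiar diagonal-Gaussian case, so the extra correlation costs only the single $(d-1)\ln(1-\rho^2)$ term.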

SherLIiC: A Typed Event-Focused Lexical Inference Benchmark for Evaluating Natural Language Inference


Title	SherLIiC: A Typed Event-Focused Lexical Inference Benchmark for Evaluating Natural Language Inference
Authors	Martin Schmitt, Hinrich Schütze
Abstract	We present SherLIiC, a testbed for lexical inference in context (LIiC), consisting of 3985 manually annotated inference rule candidates (InfCands), accompanied by (i) ~960k unlabeled InfCands, and (ii) ~190k typed textual relations between Freebase entities extracted from the large entity-linked corpus ClueWeb09. Each InfCand consists of one of these relations, expressed as a lemmatized dependency path, and two argument placeholders, each linked to one or more Freebase types. Due to our candidate selection process based on strong distributional evidence, SherLIiC is much harder than existing testbeds because distributional evidence is of little utility in the classification of InfCands. We also show that, due to its construction, many of SherLIiC’s correct InfCands are novel and missing from existing rule bases. We evaluate a number of strong baselines on SherLIiC, ranging from semantic vector space models to state-of-the-art neural models of natural language inference (NLI). We show that SherLIiC poses a tough challenge to existing NLI systems.
Tasks	Natural Language Inference
Published	2019-06-04
URL	https://arxiv.org/abs/1906.01393v1
PDF	https://arxiv.org/pdf/1906.01393v1.pdf
PWC	https://paperswithcode.com/paper/sherliic-a-typed-event-focused-lexical
Repo	https://github.com/mnschmit/SherLIiC
Framework	none

One-Shot Neural Architecture Search via Self-Evaluated Template Network


Title	One-Shot Neural Architecture Search via Self-Evaluated Template Network
Authors	Xuanyi Dong, Yi Yang
Abstract	Neural architecture search (NAS) aims to automate the search procedure of architecture instead of manual design. Even though recent NAS approaches finish the search within days, lengthy training is still required for a specific architecture candidate to get the parameters for its accurate evaluation. Recently, one-shot NAS methods have been proposed to greatly shorten the tedious training process by sharing parameters across candidates. In this way, the parameters for each candidate can be directly extracted from the shared parameters instead of training them from scratch. However, these methods have no sense of which candidate will perform better until evaluation, so the candidates to evaluate are randomly sampled and the top-1 candidate is considered the best. In this paper, we propose a Self-Evaluated Template Network (SETN) to improve the quality of the architecture candidates for evaluation so that they are more likely to cover competitive candidates. SETN consists of two components: (1) an evaluator, which learns to indicate the probability of each individual architecture having a lower validation loss, so that the candidates for evaluation can be selectively sampled according to this evaluator; and (2) a template network, which shares parameters among all candidates to amortize the training cost of generated candidates. In experiments, the architecture found by SETN achieves state-of-the-art performance on CIFAR and ImageNet benchmarks within comparable computation costs. Code is publicly available on GitHub: https://github.com/D-X-Y/NAS-Projects.
Tasks	Neural Architecture Search
Published	2019-10-13
URL	https://arxiv.org/abs/1910.05733v3
PDF	https://arxiv.org/pdf/1910.05733v3.pdf
PWC	https://paperswithcode.com/paper/one-shot-neural-architecture-search-via-self
Repo	https://github.com/D-X-Y/GDAS
Framework	pytorch
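
The two components can be read as a sample-then-evaluate loop. Below is a minimal Python sketch with hypothetical callables: `evaluator_score` stands in for the learned evaluator (a non-negative score per candidate) and `shared_val_loss` for the validation loss computed with weights inherited from the template network. Training either component is not shown, and this is not the NAS-Projects code.

```python
import random
from typing import Callable, Sequence, TypeVar

Arch = TypeVar("Arch")


def setn_style_select(candidates: Sequence[Arch],
                      evaluator_score: Callable[[Arch], float],
                      shared_val_loss: Callable[[Arch], float],
                      n_samples: int,
                      rng: random.Random) -> Arch:
    """Sample candidates in proportion to the evaluator's scores, then keep
    the sampled one with the lowest validation loss under shared weights."""
    weights = [evaluator_score(c) for c in candidates]
    sampled = rng.choices(candidates, weights=weights, k=n_samples)
    return min(sampled, key=shared_val_loss)
```

Replacing `evaluator_score` with a constant recovers the uniform random sampling the abstract criticizes; the evaluator only changes which candidates ever reach the (cheap but noisy) shared-weight evaluation.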

MelGAN-VC: Voice Conversion and Audio Style Transfer on arbitrarily long samples using Spectrograms


Title	MelGAN-VC: Voice Conversion and Audio Style Transfer on arbitrarily long samples using Spectrograms
Authors	Marco Pasini
Abstract	Traditional voice conversion methods rely on parallel recordings of multiple speakers pronouncing the same sentences. For real-world applications, however, parallel data is rarely available. We propose MelGAN-VC, a voice conversion method that relies on non-parallel speech data and is able to convert audio signals of arbitrary length from a source voice to a target voice. We first compute spectrograms from waveform data and then perform a domain translation using a Generative Adversarial Network (GAN) architecture. An additional siamese network helps preserve speech information in the translation process, without sacrificing the ability to flexibly model the style of the target speaker. We test our framework with a dataset of clean speech recordings, as well as with a collection of noisy real-world speech examples. Finally, we apply the same method to perform music style transfer, translating arbitrarily long music samples from one genre to another, and showing that our framework is flexible and can be used for audio manipulation applications different from voice conversion.
Tasks	Style Transfer, Voice Conversion
Published	2019-10-08
URL	https://arxiv.org/abs/1910.03713v2
PDF	https://arxiv.org/pdf/1910.03713v2.pdf
PWC	https://paperswithcode.com/paper/melgan-vc-voice-conversion-and-audio-style
Repo	https://github.com/marcoppasini/MelGAN-VC
Framework	tf

Temporal Knowledge Propagation for Image-to-Video Person Re-identification


Title	Temporal Knowledge Propagation for Image-to-Video Person Re-identification
Authors	Xinqian Gu, Bingpeng Ma, Hong Chang, Shiguang Shan, Xilin Chen
Abstract	In many scenarios of Person Re-identification (Re-ID), the gallery set consists of many surveillance videos while the query is just an image, so Re-ID has to be conducted between images and videos. Compared with videos, still person images lack temporal information. Besides, the information asymmetry between image and video features increases the difficulty of matching images and videos. To solve this problem, we propose a novel Temporal Knowledge Propagation (TKP) method which propagates the temporal knowledge learned by the video representation network to the image representation network. Specifically, given the input videos, we force the image representation network to fit the outputs of the video representation network in a shared feature space. With back propagation, temporal knowledge can be transferred to enhance the image features and the information asymmetry problem can be alleviated. With additional classification and integrated triplet losses, our model can learn expressive and discriminative image and video features for image-to-video re-identification. Extensive experiments demonstrate the effectiveness of our method, and the overall results on two widely used datasets surpass the state-of-the-art methods by a large margin. Code is available at: https://github.com/guxinqian/TKP
Tasks	Image-To-Video Person Re-Identification, Person Re-Identification, Video-Based Person Re-Identification
Published	2019-08-11
URL	https://arxiv.org/abs/1908.03885v3
PDF	https://arxiv.org/pdf/1908.03885v3.pdf
PWC	https://paperswithcode.com/paper/temporal-knowledge-propagation-for-image-to
Repo	https://github.com/guxinqian/TKP
Framework	pytorch

Layer-Dependent Importance Sampling for Training Deep and Large Graph Convolutional Networks


Title	Layer-Dependent Importance Sampling for Training Deep and Large Graph Convolutional Networks
Authors	Difan Zou, Ziniu Hu, Yewen Wang, Song Jiang, Yizhou Sun, Quanquan Gu
Abstract	Graph convolutional networks (GCNs) have recently received wide attention due to their successful applications in different graph tasks and domains. Training GCNs for a large graph, however, is still a challenge. Original full-batch GCN training requires calculating the representation of all the nodes in the graph per GCN layer, which brings in high computation and memory costs. To alleviate this issue, several sampling-based methods have been proposed to train GCNs on a subset of nodes. Among them, the node-wise neighbor-sampling method recursively samples a fixed number of neighbor nodes, and thus its computation cost suffers from an exponentially growing neighborhood size; meanwhile, the layer-wise importance-sampling method discards the neighbor-dependent constraints, and thus the nodes sampled across layers suffer from a sparse connection problem. To deal with these two problems, we propose a new effective sampling algorithm called LAyer-Dependent ImportancE Sampling (LADIES). Based on the sampled nodes in the upper layer, LADIES selects their neighborhood nodes, constructs a bipartite subgraph and computes the importance probability accordingly. Then, it samples a fixed number of nodes by the calculated probability, and recursively conducts this procedure per layer to construct the whole computation graph. We prove theoretically and experimentally that our proposed sampling algorithm outperforms the previous sampling methods in terms of both time and memory costs. Furthermore, LADIES is shown to have better generalization accuracy than original full-batch GCN, due to its stochastic nature.
Tasks	Node Classification
Published	2019-11-17
URL	https://arxiv.org/abs/1911.07323v1
PDF	https://arxiv.org/pdf/1911.07323v1.pdf
PWC	https://paperswithcode.com/paper/layer-dependent-importance-sampling-for-1
Repo	https://github.com/acbull/LADIES
Framework	pytorch
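
The layer-dependent step can be sketched on a small graph stored as sparse rows. Below is a minimal Python sketch, assuming the symmetric normalized adjacency $P = D^{-1/2}(A+I)D^{-1/2}$ and a probability for each candidate node proportional to the squared norm of its column of $P$ restricted to the rows of the already-sampled upper-layer nodes. It draws distinct nodes by repeated weighted draws (a simplification) and omits any reweighting or normalization of the sampled adjacency used in actual training; the function names are illustrative.

```python
import math
import random
from typing import Dict, List, Sequence, Set, Tuple

SparseMatrix = List[Dict[int, float]]  # row i -> {column j: value}


def normalized_adjacency(edges: Sequence[Tuple[int, int]], n: int) -> SparseMatrix:
    """P = D^{-1/2} (A + I) D^{-1/2} stored as one dict per row."""
    nbrs: List[Set[int]] = [{i} for i in range(n)]  # self-loops
    for u, v in edges:
        nbrs[u].add(v)
        nbrs[v].add(u)
    deg = [len(nb) for nb in nbrs]
    return [{j: 1.0 / math.sqrt(deg[i] * deg[j]) for j in nbrs[i]} for i in range(n)]


def ladies_probs(P: SparseMatrix, upper: Set[int]) -> Dict[int, float]:
    """Probability of node j proportional to the squared norm of column j of P
    restricted to the rows of the sampled upper-layer nodes."""
    col_sq: Dict[int, float] = {}
    for i in upper:
        for j, w in P[i].items():
            col_sq[j] = col_sq.get(j, 0.0) + w * w
    total = sum(col_sq.values())
    return {j: v / total for j, v in col_sq.items()}


def ladies_sample(P: SparseMatrix, upper: Set[int], k: int,
                  rng: random.Random) -> Set[int]:
    """Draw up to k distinct lower-layer nodes; only neighbors of `upper` can appear."""
    probs = ladies_probs(P, upper)
    nodes = list(probs)
    weights = [probs[j] for j in nodes]
    chosen: Set[int] = set()
    while len(chosen) < min(k, len(nodes)):
        chosen.add(rng.choices(nodes, weights=weights)[0])
    return chosen
```

Because only columns touched by upper-layer rows receive probability mass, every sampled node is connected to at least one node above it, which is how the layer-dependent scheme avoids the sparse-connection problem of independent layer-wise sampling.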

A-CNN: Annularly Convolutional Neural Networks on Point Clouds


Title	A-CNN: Annularly Convolutional Neural Networks on Point Clouds
Authors	Artem Komarichev, Zichun Zhong, Jing Hua
Abstract	Analyzing the geometric and semantic properties of 3D point clouds with deep networks is still challenging due to the irregularity and sparsity of samplings of their geometric structures. This paper presents a new method to define and compute convolution directly on 3D point clouds by the proposed annular convolution. This new convolution operator can better capture the local neighborhood geometry of each point by specifying the (regular and dilated) ring-shaped structures and directions in the computation. It can adapt to the geometric variability and scalability at the signal processing level. We apply it to the developed hierarchical neural networks for object classification, part segmentation, and semantic segmentation in large-scale scenes. The extensive experiments and comparisons demonstrate that our approach outperforms the state-of-the-art methods on a variety of standard benchmark datasets (e.g., ModelNet10, ModelNet40, ShapeNet-part, S3DIS, and ScanNet).
Tasks	Object Classification, Semantic Segmentation
Published	2019-04-16
URL	http://arxiv.org/abs/1904.08017v1
PDF	http://arxiv.org/pdf/1904.08017v1.pdf
PWC	https://paperswithcode.com/paper/190408017
Repo	https://github.com/artemkomarichev/a-cnn
Framework	tf
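
The ring-shaped neighborhoods can be illustrated in three steps: select points whose distance to a center falls inside an annulus, order them by angle around the center's normal, and convolve circularly along that order. Below is a minimal pure-Python sketch, assuming per-point normals are given and one scalar feature per neighbor; treating a larger inner radius as a "dilated" ring is our simplification, and the code illustrates the idea rather than reproducing the A-CNN TensorFlow implementation.

```python
import math
from typing import List, Sequence, Tuple

Vec = Tuple[float, float, float]


def sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])


def normalize(v: Vec) -> Vec:
    length = math.sqrt(dot(v, v))
    return (v[0] / length, v[1] / length, v[2] / length)


def annular_neighbors(points: Sequence[Vec], center: int,
                      r_inner: float, r_outer: float) -> List[int]:
    """Indices of points whose distance to the center lies in [r_inner, r_outer).
    A larger r_inner skips the closest points, mimicking a dilated ring (assumption)."""
    c = points[center]
    return [i for i, p in enumerate(points)
            if i != center and r_inner <= math.dist(p, c) < r_outer]


def order_on_ring(points: Sequence[Vec], center: int, normal: Vec,
                  idx: Sequence[int]) -> List[int]:
    """Sort ring neighbors by angle in the tangent plane orthogonal to `normal`."""
    c = points[center]
    n = normalize(normal)
    helper = (1.0, 0.0, 0.0) if abs(n[0]) < 0.9 else (0.0, 1.0, 0.0)
    e1 = normalize(cross(n, helper))  # orthonormal tangent basis (e1, e2)
    e2 = cross(n, e1)

    def angle(i: int) -> float:
        v = sub(points[i], c)
        return math.atan2(dot(v, e2), dot(v, e1))

    return sorted(idx, key=angle)


def ring_conv(values: Sequence[float], kernel: Sequence[float]) -> List[float]:
    """Circular 1-D cross-correlation over features ordered around the ring."""
    n = len(values)
    return [sum(w * values[(i + j) % n] for j, w in enumerate(kernel)) for i in range(n)]
```

Ordering by angle gives the neighbors a consistent direction around each point, so a 1-D kernel can slide along the ring the way an ordinary convolution slides along a grid row.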