October 20, 2019

3251 words 16 mins read

Paper Group AWR 289

A Bayesian Approach for Sequence Tagging with Crowds. Weighted Spectral Embedding of Graphs. Lip Movements Generation at a Glance. Navigating the Landscape for Real-time Localisation and Mapping for Robotics and Virtual and Augmented Reality. Resource Aware Person Re-identification across Multiple Resolutions. Spatial and Temporal Mutual Promotion …

A Bayesian Approach for Sequence Tagging with Crowds

Title A Bayesian Approach for Sequence Tagging with Crowds
Authors Edwin Simpson, Iryna Gurevych
Abstract Current methods for sequence tagging, a core task in NLP, are data hungry, which motivates the use of crowdsourcing as a cheap way to obtain labelled data. However, annotators are often unreliable and current aggregation methods cannot capture common types of span annotation errors. To address this, we propose a Bayesian method for aggregating sequence tags that reduces errors by modelling sequential dependencies between the annotations as well as the ground-truth labels. By taking a Bayesian approach, we account for uncertainty in the model due to both annotator errors and the lack of data for modelling annotators who complete few tasks. We evaluate our model on crowdsourced data for named entity recognition, information extraction and argument mining, showing that our sequential model outperforms the previous state of the art. We also find that our approach can reduce crowdsourcing costs through more effective active learning, as it better captures uncertainty in the sequence labels when there are few annotations.
Tasks Active Learning, Argument Mining, Named Entity Recognition
Published 2018-11-02
URL https://arxiv.org/abs/1811.00780v3
PDF https://arxiv.org/pdf/1811.00780v3.pdf
PWC https://paperswithcode.com/paper/bayesian-ensembles-of-crowds-and-deep
Repo https://github.com/UKPLab/arxiv2018-bayesian-ensembles
Framework none
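
The aggregation idea is easiest to see against the non-sequential baseline it improves on: model each annotator with a confusion matrix and infer the true labels by EM, as in Dawid-Skene. Below is a minimal NumPy sketch of that baseline only; the paper's model additionally captures sequential dependencies between neighbouring tags (e.g. BIO transition errors) and is fully Bayesian, so treat this as the point of departure, not the method itself.

```python
# Toy Dawid-Skene-style EM aggregator (non-sequential, non-Bayesian baseline).
import numpy as np

def aggregate(crowd_labels, n_classes, n_iter=50):
    """crowd_labels: (n_tokens, n_annotators) int array, -1 = missing."""
    n_tokens, n_annot = crowd_labels.shape
    # Initialise token posteriors with per-token vote proportions.
    post = np.ones((n_tokens, n_classes))
    for i in range(n_tokens):
        for a in range(n_annot):
            if crowd_labels[i, a] >= 0:
                post[i, crowd_labels[i, a]] += 1
    post /= post.sum(axis=1, keepdims=True)

    for _ in range(n_iter):
        # M-step: per-annotator confusion matrices pi[a, true, observed].
        pi = np.full((n_annot, n_classes, n_classes), 1e-2)  # light smoothing
        for a in range(n_annot):
            for i in np.where(crowd_labels[:, a] >= 0)[0]:
                pi[a, :, crowd_labels[i, a]] += post[i]
        pi /= pi.sum(axis=2, keepdims=True)
        # E-step: recompute token posteriors (uniform class prior for brevity).
        log_post = np.zeros((n_tokens, n_classes))
        for a in range(n_annot):
            mask = crowd_labels[:, a] >= 0
            log_post[mask] += np.log(pi[a][:, crowd_labels[mask, a]].T)
        post = np.exp(log_post - log_post.max(axis=1, keepdims=True))
        post /= post.sum(axis=1, keepdims=True)
    return post.argmax(axis=1)
```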

Weighted Spectral Embedding of Graphs

Title Weighted Spectral Embedding of Graphs
Authors Thomas Bonald, Alexandre Hollocou, Marc Lelarge
Abstract We present a novel spectral embedding of graphs that incorporates weights assigned to the nodes, quantifying their relative importance. This spectral embedding is based on the first eigenvectors of some properly normalized version of the Laplacian. We prove that these eigenvectors correspond to the configurations of lowest energy of an equivalent physical system, either mechanical or electrical, in which the weight of each node can be interpreted as its mass or its capacitance, respectively. Experiments on a real dataset illustrate the impact of weighting on the embedding.
Tasks
Published 2018-09-28
URL http://arxiv.org/abs/1809.11115v2
PDF http://arxiv.org/pdf/1809.11115v2.pdf
PWC https://paperswithcode.com/paper/weighted-spectral-embedding-of-graphs
Repo https://github.com/tbonald/spectral_embedding
Framework none
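
A minimal sketch of how such a weighted embedding could be computed, assuming the normalisation takes the form of the generalised eigenproblem Lx = λWx with W the diagonal matrix of node weights (the mass/capacitance interpretation); the paper's exact normalisation may differ:

```python
import numpy as np
from scipy.linalg import eigh

def weighted_spectral_embedding(adjacency, weights, dim=2):
    degree = adjacency.sum(axis=1)
    laplacian = np.diag(degree) - adjacency
    # Generalised eigenproblem L x = lam W x; eigh handles symmetric pairs.
    eigvals, eigvecs = eigh(laplacian, np.diag(weights))
    # Skip the trivial constant eigenvector (eigenvalue ~ 0).
    return eigvecs[:, 1:dim + 1]

# Toy usage: a 4-node path graph with one heavily weighted node.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
emb = weighted_spectral_embedding(A, weights=np.array([1.0, 1.0, 1.0, 10.0]))
```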

Lip Movements Generation at a Glance

Title Lip Movements Generation at a Glance
Authors Lele Chen, Zhiheng Li, Ross K. Maddox, Zhiyao Duan, Chenliang Xu
Abstract Cross-modality generation is an emerging topic that aims to synthesize data in one modality based on information in a different modality. In this paper, we consider one such task: given an arbitrary audio speech clip and one lip image of an arbitrary target identity, generate synthesized lip movements of the target identity saying the speech. To perform well on this task, a model must not only consider the retention of the target identity, the photo-realism of the synthesized images, and the consistency and smoothness of lip images in a sequence, but, more importantly, learn the correlations between audio speech and lip movements. To solve these collective problems, we explore the best modeling of the audio-visual correlations in building and training a lip-movement generator network. Specifically, we devise a method to fuse audio and image embeddings to generate multiple lip images at once and propose a novel correlation loss to synchronize lip changes and speech changes. Our final model utilizes a combination of four losses for a comprehensive consideration in generating lip movements; it is trained in an end-to-end fashion and is robust to lip shapes, view angles and different facial characteristics. Thorough experiments on three datasets ranging from lab-recorded to lips in-the-wild show that our model significantly outperforms other state-of-the-art methods extended to this task.
Tasks
Published 2018-03-28
URL http://arxiv.org/abs/1803.10404v3
PDF http://arxiv.org/pdf/1803.10404v3.pdf
PWC https://paperswithcode.com/paper/lip-movements-generation-at-a-glance
Repo https://github.com/lelechen63/3d_gan
Framework pytorch
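
One concrete piece of the recipe is the correlation loss. A hedged PyTorch sketch of the idea, synchronising the frame-to-frame changes of the two modalities, is below; it assumes both modalities are already projected to features of the same dimensionality, and the paper's actual formulation and loss weighting may differ.

```python
import torch

def correlation_loss(lip_feats, audio_feats, eps=1e-8):
    """lip_feats, audio_feats: (batch, time, dim), same time steps and dim."""
    # Frame-to-frame derivatives capture the "change" in each modality.
    dlip = lip_feats[:, 1:] - lip_feats[:, :-1]
    daud = audio_feats[:, 1:] - audio_feats[:, :-1]
    # Pearson-style correlation per sample, computed on flattened deltas.
    dlip = dlip.flatten(1) - dlip.flatten(1).mean(dim=1, keepdim=True)
    daud = daud.flatten(1) - daud.flatten(1).mean(dim=1, keepdim=True)
    corr = (dlip * daud).sum(1) / (dlip.norm(dim=1) * daud.norm(dim=1) + eps)
    return (1 - corr).mean()  # maximising correlation = minimising loss
```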

Navigating the Landscape for Real-time Localisation and Mapping for Robotics and Virtual and Augmented Reality

Title Navigating the Landscape for Real-time Localisation and Mapping for Robotics and Virtual and Augmented Reality
Authors Sajad Saeedi, Bruno Bodin, Harry Wagstaff, Andy Nisbet, Luigi Nardi, John Mawer, Nicolas Melot, Oscar Palomar, Emanuele Vespa, Tom Spink, Cosmin Gorgovan, Andrew Webb, James Clarkson, Erik Tomusk, Thomas Debrunner, Kuba Kaszyk, Pablo Gonzalez-de-Aledo, Andrey Rodchenko, Graham Riley, Christos Kotselidis, Björn Franke, Michael F. P. O’Boyle, Andrew J. Davison, Paul H. J. Kelly, Mikel Luján, Steve Furber
Abstract Visual understanding of 3D environments in real-time, at low power, is a huge computational challenge. Often referred to as SLAM (Simultaneous Localisation and Mapping), it is central to applications spanning domestic and industrial robotics, autonomous vehicles, virtual and augmented reality. This paper describes the results of a major research effort to assemble the algorithms, architectures, tools, and systems software needed to enable delivery of SLAM, by supporting applications specialists in selecting and configuring the appropriate algorithm and the appropriate hardware, and compilation pathway, to meet their performance, accuracy, and energy consumption goals. The major contributions we present are (1) tools and methodology for systematic quantitative evaluation of SLAM algorithms, (2) automated, machine-learning-guided exploration of the algorithmic and implementation design space with respect to multiple objectives, (3) end-to-end simulation tools to enable optimisation of heterogeneous, accelerated architectures for the specific algorithmic requirements of the various SLAM algorithmic approaches, and (4) tools for delivering, where appropriate, accelerated, adaptive SLAM solutions in a managed, JIT-compiled, adaptive runtime context.
Tasks Autonomous Vehicles
Published 2018-08-20
URL http://arxiv.org/abs/1808.06352v1
PDF http://arxiv.org/pdf/1808.06352v1.pdf
PWC https://paperswithcode.com/paper/navigating-the-landscape-for-real-time
Repo https://github.com/xiexiexiaoxiexie/Udacity-self-driving-car-engineer-P6-Kidnapped-Vehicle
Framework none
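
Contribution (1), systematic quantitative evaluation, typically rests on standard trajectory metrics. As one concrete example, here is a minimal sketch of absolute trajectory error (ATE) RMSE, assuming the estimated and ground-truth trajectories are already aligned and time-associated; the paper's tooling evaluates much more than this (accuracy, runtime, energy).

```python
import numpy as np

def ate_rmse(estimated, ground_truth):
    """estimated, ground_truth: (n, 3) arrays of camera positions."""
    errors = np.linalg.norm(estimated - ground_truth, axis=1)
    return np.sqrt(np.mean(errors ** 2))
```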

Resource Aware Person Re-identification across Multiple Resolutions

Title Resource Aware Person Re-identification across Multiple Resolutions
Authors Yan Wang, Lequn Wang, Yurong You, Xu Zou, Vincent Chen, Serena Li, Gao Huang, Bharath Hariharan, Kilian Q. Weinberger
Abstract Not all people are equally easy to identify: color statistics might be enough for some cases while others might require careful reasoning about high- and low-level details. However, prevailing person re-identification (re-ID) methods use one-size-fits-all high-level embeddings from deep convolutional networks for all cases. This might limit their accuracy on difficult examples or make them needlessly expensive for the easy ones. To remedy this, we present a new person re-ID model that combines effective embeddings built on multiple convolutional network layers, trained with deep supervision. On traditional re-ID benchmarks, our method improves substantially over the previous state-of-the-art results on all five datasets that we evaluate on. We then propose two new formulations of the person re-ID problem under resource constraints, and show how our model can be used to effectively trade off accuracy and computation in the presence of resource constraints. Code and pre-trained models are available at https://github.com/mileyan/DARENet.
Tasks Person Re-Identification
Published 2018-05-22
URL http://arxiv.org/abs/1805.08805v3
PDF http://arxiv.org/pdf/1805.08805v3.pdf
PWC https://paperswithcode.com/paper/resource-aware-person-re-identification
Repo https://github.com/mileyan/DARENet
Framework pytorch
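
The resource-aware mechanism can be pictured as deep-supervised embedding heads at several network stages plus an early-exit rule at query time. The PyTorch sketch below is illustrative only: the real DaRe architecture, losses and exit criterion are specified in the paper and repo, and the margin-based rule here is an assumption.

```python
import torch
import torch.nn as nn

class MultiStageEmbedder(nn.Module):
    def __init__(self, stages, embed_dim=128):
        super().__init__()
        self.stages = nn.ModuleList(stages)  # e.g. ResNet blocks
        self.heads = nn.ModuleList(
            [nn.LazyLinear(embed_dim) for _ in stages])  # one head per stage

    def forward(self, x):
        embeddings = []
        for stage, head in zip(self.stages, self.heads):
            x = stage(x)
            pooled = x.mean(dim=(2, 3))      # global average pool
            embeddings.append(head(pooled))  # deep-supervision target
        return embeddings                    # one embedding per stage

def match_with_budget(query_embs, gallery_embs, margin=0.2):
    """Stop at the first stage whose best match is confident enough."""
    for q, g in zip(query_embs, gallery_embs):  # shallow to deep stages
        dists = torch.cdist(q.unsqueeze(0), g).squeeze(0)
        best2 = dists.topk(2, largest=False).values
        if best2[1] - best2[0] > margin:     # clear winner: exit early
            return dists.argmin().item()
    return dists.argmin().item()             # fall back to the last stage
```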

Spatial and Temporal Mutual Promotion for Video-based Person Re-identification

Title Spatial and Temporal Mutual Promotion for Video-based Person Re-identification
Authors Yiheng Liu, Zhenxun Yuan, Wengang Zhou, Houqiang Li
Abstract Video-based person re-identification is a crucial task of matching video sequences of a person across multiple camera views. Generally, features directly extracted from a single frame suffer from occlusion, blur, illumination and posture changes. This leads to false activation or missing activation in some regions, which corrupts the appearance and motion representation. How to exploit the abundant spatial-temporal information in video sequences is the key to solving this problem. To this end, we propose a Refining Recurrent Unit (RRU) that recovers the missing parts and suppresses noisy parts of the current frame's features by referring to historical frames. With RRU, the quality of each frame's appearance representation is improved. Then we use the Spatial-Temporal clues Integration Module (STIM) to mine the spatial-temporal information from those upgraded features. Meanwhile, the multi-level training objective is used to enhance the capability of RRU and STIM. Through the cooperation of these modules, the spatial and temporal features mutually promote each other and the final spatial-temporal feature representation is more discriminative and robust. Extensive experiments are conducted on three challenging datasets, i.e., iLIDS-VID, PRID-2011 and MARS. The experimental results demonstrate that our approach outperforms existing state-of-the-art methods of video-based person re-identification on iLIDS-VID and MARS and achieves favorable results on PRID-2011.
Tasks Person Re-Identification, Video-Based Person Re-Identification
Published 2018-12-26
URL http://arxiv.org/abs/1812.10305v1
PDF http://arxiv.org/pdf/1812.10305v1.pdf
PWC https://paperswithcode.com/paper/spatial-and-temporal-mutual-promotion-for
Repo https://github.com/yolomax/rru-reid
Framework pytorch
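
A gated cell that repairs the current frame's features from an accumulated history conveys the flavour of the RRU. The sketch below is a loose, illustrative reading, not the paper's architecture; the gate, update and memory rule are all assumptions.

```python
import torch
import torch.nn as nn

class RefineCell(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.gate = nn.Sequential(nn.Linear(2 * dim, dim), nn.Sigmoid())
        self.update = nn.Linear(2 * dim, dim)

    def forward(self, frame_feat, history):
        """frame_feat, history: (batch, dim)."""
        joint = torch.cat([frame_feat, history], dim=-1)
        g = self.gate(joint)  # where is the current frame reliable?
        refined = g * frame_feat + (1 - g) * torch.tanh(self.update(joint))
        new_history = 0.5 * (history + refined)  # simple running memory
        return refined, new_history
```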

Watch, Listen, and Describe: Globally and Locally Aligned Cross-Modal Attentions for Video Captioning

Title Watch, Listen, and Describe: Globally and Locally Aligned Cross-Modal Attentions for Video Captioning
Authors Xin Wang, Yuan-Fang Wang, William Yang Wang
Abstract A major challenge for video captioning is to combine audio and visual cues. Existing multi-modal fusion methods have shown encouraging results in video understanding. However, the temporal structures of multiple modalities at different granularities are rarely explored, and how to selectively fuse the multi-modal representations at different levels of details remains uncharted. In this paper, we propose a novel hierarchically aligned cross-modal attention (HACA) framework to learn and selectively fuse both global and local temporal dynamics of different modalities. Furthermore, for the first time, we validate the superior performance of the deep audio features on the video captioning task. Finally, our HACA model significantly outperforms the previous best systems and achieves new state-of-the-art results on the widely used MSR-VTT dataset.
Tasks Video Captioning, Video Understanding
Published 2018-04-15
URL http://arxiv.org/abs/1804.05448v1
PDF http://arxiv.org/pdf/1804.05448v1.pdf
PWC https://paperswithcode.com/paper/watch-listen-and-describe-globally-and
Repo https://github.com/chitwansaharia/HACAModel
Framework pytorch
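
At the core of HACA is attention from one modality's states over the other modality's features, applied at both global and local granularities. A minimal sketch of one such cross-modal attention step (shapes and names are illustrative, not the paper's exact alignment scheme):

```python
import torch
import torch.nn.functional as F

def cross_modal_attention(queries, keys_values):
    """queries: (B, Tq, D); keys_values: (B, Tk, D) from the other modality."""
    scores = torch.bmm(queries, keys_values.transpose(1, 2))   # (B, Tq, Tk)
    attn = F.softmax(scores / queries.size(-1) ** 0.5, dim=-1)
    return torch.bmm(attn, keys_values)  # e.g. audio-aware visual context
```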

Sparse Gaussian Process Audio Source Separation Using Spectrum Priors in the Time-Domain

Title Sparse Gaussian Process Audio Source Separation Using Spectrum Priors in the Time-Domain
Authors Pablo A. Alvarado, Mauricio A. Álvarez, Dan Stowell
Abstract Gaussian process (GP) audio source separation is a time-domain approach that circumvents the inherent phase approximation issue of spectrogram based methods. Furthermore, through its kernel, GPs elegantly incorporate prior knowledge about the sources into the separation model. Despite these compelling advantages, the computational complexity of GP inference scales cubically with the number of audio samples. As a result, source separation GP models have been restricted to the analysis of short audio frames. We introduce an efficient application of GPs to time-domain audio source separation, without compromising performance. For this purpose, we used GP regression, together with spectral mixture kernels, and variational sparse GPs. We compared our method with LD-PSDTF (positive semi-definite tensor factorization), KL-NMF (Kullback-Leibler non-negative matrix factorization), and IS-NMF (Itakura-Saito NMF). Results show that the proposed method outperforms these techniques.
Tasks
Published 2018-10-30
URL http://arxiv.org/abs/1810.12679v3
PDF http://arxiv.org/pdf/1810.12679v3.pdf
PWC https://paperswithcode.com/paper/sparse-gaussian-process-audio-source
Repo https://github.com/PabloAlvarado/ssgp
Framework tf
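
The spectral mixture (SM) kernel of Wilson and Adams (2013) is what lets the GP encode spectrum priors: each component places a Gaussian on the frequency axis, which suits pitched audio sources. A plain NumPy sketch of the kernel (the paper's model combines it with variational sparse GP inference, omitted here):

```python
import numpy as np

def spectral_mixture_kernel(t1, t2, weights, means, variances):
    """k(tau) = sum_q w_q * exp(-2 pi^2 tau^2 v_q) * cos(2 pi tau mu_q)."""
    tau = t1[:, None] - t2[None, :]  # pairwise time differences
    k = np.zeros_like(tau)
    for w, mu, v in zip(weights, means, variances):
        k += w * np.exp(-2 * np.pi**2 * tau**2 * v) * np.cos(2 * np.pi * tau * mu)
    return k

# One component peaked at 440 Hz models an A4-like quasi-periodic source.
t = np.linspace(0, 0.05, 800)  # 50 ms at 16 kHz
K = spectral_mixture_kernel(t, t, weights=[1.0], means=[440.0], variances=[10.0])
```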

NSGA-Net: Neural Architecture Search using Multi-Objective Genetic Algorithm

Title NSGA-Net: Neural Architecture Search using Multi-Objective Genetic Algorithm
Authors Zhichao Lu, Ian Whalen, Vishnu Boddeti, Yashesh Dhebar, Kalyanmoy Deb, Erik Goodman, Wolfgang Banzhaf
Abstract This paper introduces NSGA-Net, an evolutionary approach for neural architecture search (NAS). NSGA-Net is designed with three goals in mind: (1) a procedure considering multiple and conflicting objectives, (2) an efficient procedure balancing exploration and exploitation of the space of potential neural network architectures, and (3) a procedure finding a diverse set of trade-off network architectures in a single run. NSGA-Net is a population-based search algorithm that explores a space of potential neural network architectures in three steps, namely, a population initialization step that is based on prior knowledge from hand-crafted architectures, an exploration step comprising crossover and mutation of architectures, and finally an exploitation step that utilizes the hidden useful knowledge stored in the entire history of evaluated neural architectures in the form of a Bayesian network. Experimental results suggest that combining the dual objectives of minimizing an error metric and computational complexity, as measured by FLOPs, allows NSGA-Net to find competitive neural architectures. Moreover, NSGA-Net achieves an error rate on the CIFAR-10 dataset on par with other state-of-the-art NAS methods while using orders of magnitude less computational resources. These results are encouraging and show the promise of applying evolutionary computation (EC) methods more widely in deep learning.
Tasks Efficient Exploration, Neural Architecture Search, Object Classification
Published 2018-10-08
URL http://arxiv.org/abs/1810.03522v2
PDF http://arxiv.org/pdf/1810.03522v2.pdf
PWC https://paperswithcode.com/paper/nsga-net-a-multi-objective-genetic-algorithm
Repo https://github.com/ianwhale/nsga-net
Framework pytorch
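
The multi-objective selection at the heart of NSGA-Net follows NSGA-II: candidate architectures are compared on (error, FLOPs) and nondominated ones are preferred. A minimal sketch of extracting the first Pareto front; the full algorithm additionally ranks later fronts and uses crowding distance:

```python
def first_pareto_front(points):
    """points: list of (error, flops) tuples; both objectives are minimised."""
    front = []
    for i, p in enumerate(points):
        dominated = any(
            q[0] <= p[0] and q[1] <= p[1] and q != p
            for j, q in enumerate(points) if j != i)
        if not dominated:
            front.append(p)
    return front

archs = [(0.08, 1.2e9), (0.06, 2.5e9), (0.09, 0.8e9), (0.07, 3.0e9)]
print(first_pareto_front(archs))  # (0.07, 3.0e9) is dominated by (0.06, 2.5e9)
```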

Multi-View Silhouette and Depth Decomposition for High Resolution 3D Object Representation

Title Multi-View Silhouette and Depth Decomposition for High Resolution 3D Object Representation
Authors Edward Smith, Scott Fujimoto, David Meger
Abstract We consider the problem of scaling deep generative shape models to high-resolution. Drawing motivation from the canonical view representation of objects, we introduce a novel method for the fast up-sampling of 3D objects in voxel space through networks that perform super-resolution on the six orthographic depth projections. This allows us to generate high-resolution objects with more efficient scaling than methods which work directly in 3D. We decompose the problem of 2D depth super-resolution into silhouette and depth prediction to capture both structure and fine detail. This allows our method to generate sharp edges more easily than an individual network. We evaluate our work on multiple experiments concerning high-resolution 3D objects, and show our system is capable of accurately predicting novel objects at resolutions as large as 512×512×512, the highest resolution reported for this task. We achieve state-of-the-art performance on 3D object reconstruction from RGB images on the ShapeNet dataset, and further demonstrate the first effective 3D super-resolution method.
Tasks 3D Object Reconstruction, 3D Object Super-Resolution, Depth Estimation, Object Reconstruction, Super-Resolution
Published 2018-02-27
URL http://arxiv.org/abs/1802.09987v3
PDF http://arxiv.org/pdf/1802.09987v3.pdf
PWC https://paperswithcode.com/paper/multi-view-silhouette-and-depth-decomposition
Repo https://github.com/kingcheng2000/Multi-View-Silhouette-and-Depth-Decomposition-for-High-Resolution-3D-Object-Representation
Framework tf
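
The decomposition pays off at recombination time: the silhouette network contributes sharp structure and the depth network fine detail, and the two are merged into one orthographic depth map. A minimal sketch of that merge, assuming the two high-resolution predictions are given (the paper's super-resolution networks themselves are omitted):

```python
import numpy as np

def combine(silhouette_prob, depth_pred, threshold=0.5, background=0.0):
    """silhouette_prob, depth_pred: (H, W) outputs of the two networks."""
    mask = silhouette_prob > threshold             # sharp object boundary
    return np.where(mask, depth_pred, background)  # depth valid only inside
```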

Sparse Label Smoothing Regularization for Person Re-Identification

Title Sparse Label Smoothing Regularization for Person Re-Identification
Authors Jean-Paul Ainam, Ke Qin, Guisong Liu, Guangchun Luo
Abstract Person re-identification (re-id) is a cross-camera retrieval task which establishes a correspondence between images of a person from multiple cameras. Deep learning methods have been successfully applied to this problem and have achieved impressive results. However, these methods require a large amount of labeled training data. Currently, labeled datasets in person re-id are limited in their scale, and manual acquisition of such large-scale datasets from surveillance cameras is a tedious and labor-intensive task. In this paper, we propose a framework that performs intelligent data augmentation and assigns partially smoothed labels to the generated data. Our approach first exploits the clustering property of existing person re-id datasets to create groups of similar objects that model cross-view variations. Each group is then used to generate realistic images through adversarial training. Our aim is to emphasize feature similarity between generated samples and the original samples. Finally, we assign a non-uniform label distribution to the generated samples and define a regularized loss function for training. The proposed approach tackles two problems: (1) how to efficiently use the generated data and (2) how to address the over-smoothness problem found in current regularization methods. Extensive experiments on four large-scale datasets show that our regularization method significantly improves the re-ID accuracy compared to existing methods.
Tasks Data Augmentation, Person Re-Identification, Semi-Supervised Person Re-Identification
Published 2018-09-13
URL http://arxiv.org/abs/1809.04976v3
PDF http://arxiv.org/pdf/1809.04976v3.pdf
PWC https://paperswithcode.com/paper/sparse-label-smoothing-regularization-for
Repo https://github.com/jpainam/SLS_ReID
Framework pytorch
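
A hedged sketch of the target construction: generated images receive a label distribution spread only over the identities of the cluster they were generated from, rather than uniformly over all classes as in standard label smoothing. The exact mixing weights and the regularized loss are specified in the paper; the functions below are illustrative assumptions.

```python
import numpy as np

def sls_target(n_classes, cluster_ids):
    """Target for a GAN-generated sample: uniform over its source cluster."""
    target = np.zeros(n_classes)
    target[cluster_ids] = 1.0 / len(cluster_ids)  # non-uniform over classes
    return target

def cross_entropy(pred_probs, target, eps=1e-12):
    """Training then minimises CE against these soft targets."""
    return -np.sum(target * np.log(pred_probs + eps))
```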

General audio tagging with ensembling convolutional neural network and statistical features

Title General audio tagging with ensembling convolutional neural network and statistical features
Authors Kele Xu, Boqing Zhu, Qiuqiang Kong, Haibo Mi, Bo Ding, Dezhi Wang, Huaimin Wang
Abstract Audio tagging aims to infer descriptive labels from audio clips. Audio tagging is challenging due to the limited size of data and noisy labels. In this paper, we describe our solution for the DCASE 2018 Task 2 general audio tagging challenge. The contributions of our solution include the following: we investigated a variety of convolutional neural network architectures to solve the audio tagging task; statistical features are applied to capture statistical patterns of audio features to improve the classification performance; ensemble learning is applied to combine the outputs from the deep classifiers to utilize complementary information; and a sample re-weighting strategy is employed for ensemble training to address the noisy label problem. Our system achieves a mean average precision (mAP@3) of 0.958, outperforming the baseline system's 0.704. Our system ranked 1st and 4th out of 558 submissions on the public and private leaderboards of the DCASE 2018 Task 2 challenge. Our code is available at https://github.com/Cocoxili/DCASE2018Task2/.
Tasks Audio Tagging
Published 2018-10-30
URL http://arxiv.org/abs/1810.12832v1
PDF http://arxiv.org/pdf/1810.12832v1.pdf
PWC https://paperswithcode.com/paper/general-audio-tagging-with-ensembling
Repo https://github.com/Cocoxili/DCASE2018Task2
Framework pytorch
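
The reported metric, mAP@3, credits each clip with the reciprocal rank of its true label within the top three predictions. A minimal sketch:

```python
import numpy as np

def map_at_3(true_labels, pred_scores):
    """true_labels: (n,) ints; pred_scores: (n, n_classes) scores."""
    top3 = np.argsort(-pred_scores, axis=1)[:, :3]
    score = 0.0
    for y, preds in zip(true_labels, top3):
        hits = np.where(preds == y)[0]
        if hits.size:
            score += 1.0 / (hits[0] + 1)  # contributes 1, 1/2 or 1/3
    return score / len(true_labels)
```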

Neural Collective Entity Linking

Title Neural Collective Entity Linking
Authors Yixin Cao, Lei Hou, Juanzi Li, Zhiyuan Liu
Abstract Entity linking aims to link entity mentions in texts to knowledge bases, and neural models have achieved recent success in this task. However, most existing methods rely on local contexts to resolve entities independently, which may often fail due to the data sparsity of local information. To address this issue, we propose a novel neural model for collective entity linking, named NCEL. NCEL applies a Graph Convolutional Network to integrate both local contextual features and global coherence information for entity linking. To improve computational efficiency, we approximately perform graph convolution on a subgraph of adjacent entity mentions instead of those in the entire text. We further introduce an attention scheme to improve the robustness of NCEL to data noise and train the model on Wikipedia hyperlinks to avoid overfitting and domain bias. In experiments, we evaluate NCEL on five publicly available datasets to verify the linking performance as well as generalization ability. We also conduct an extensive analysis of time complexity, the impact of key modules, and qualitative results, which demonstrates the effectiveness and efficiency of our proposed method.
Tasks Entity Linking
Published 2018-11-21
URL http://arxiv.org/abs/1811.08603v1
PDF http://arxiv.org/pdf/1811.08603v1.pdf
PWC https://paperswithcode.com/paper/neural-collective-entity-linking
Repo https://github.com/TaoMiner/NCEL
Framework pytorch
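
The graph convolution NCEL applies is the standard layer of Kipf & Welling (2017); what is paper-specific is the subgraph of adjacent mentions and candidates it runs on. A minimal PyTorch sketch of the layer itself:

```python
import torch
import torch.nn as nn

class GCNLayer(nn.Module):
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)

    def forward(self, x, adj):
        """x: (n_nodes, in_dim); adj: (n_nodes, n_nodes), self-loops included."""
        deg = adj.sum(dim=1)
        norm = deg.rsqrt()                               # D^{-1/2}
        adj_norm = norm[:, None] * adj * norm[None, :]   # D^-1/2 A D^-1/2
        return torch.relu(adj_norm @ self.linear(x))
```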

End-to-End Neural Entity Linking

Title End-to-End Neural Entity Linking
Authors Nikolaos Kolitsas, Octavian-Eugen Ganea, Thomas Hofmann
Abstract Entity Linking (EL) is an essential task for semantic text understanding and information extraction. Popular methods separately address the Mention Detection (MD) and Entity Disambiguation (ED) stages of EL, without leveraging their mutual dependency. We here propose the first neural end-to-end EL system that jointly discovers and links entities in a text document. The main idea is to consider all possible spans as potential mentions and learn contextual similarity scores over their entity candidates that are useful for both MD and ED decisions. Key components are context-aware mention embeddings, entity embeddings and a probabilistic mention-entity map, without demanding other engineered features. Empirically, we show that our end-to-end method significantly outperforms popular systems on the Gerbil platform when enough training data is available. Conversely, if testing datasets follow different annotation conventions compared to the training set (e.g., queries/tweets vs. news documents), our ED model coupled with a traditional NER system offers the best or second-best EL accuracy.
Tasks Entity Disambiguation, Entity Embeddings, Entity Linking
Published 2018-08-23
URL http://arxiv.org/abs/1808.07699v2
PDF http://arxiv.org/pdf/1808.07699v2.pdf
PWC https://paperswithcode.com/paper/end-to-end-neural-entity-linking
Repo https://github.com/dalab/end2end_neural_el
Framework none
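
The central scoring idea can be sketched as follows: enumerate candidate spans, prune those absent from the mention-entity map, and score each surviving span against its candidate entities' embeddings, so mention detection and disambiguation share one objective. The mean-pooled span representation and the dictionary candidate source below are simplifying assumptions, not the paper's exact components.

```python
import torch

def score_spans(token_embs, candidates, max_len=5):
    """token_embs: (T, D); candidates: dict span -> (K, D) entity embeddings."""
    scores = {}
    T = token_embs.size(0)
    for start in range(T):
        for end in range(start + 1, min(start + max_len, T) + 1):
            span = (start, end)
            if span not in candidates:
                continue  # not in the mention-entity map: prune
            mention = token_embs[start:end].mean(dim=0)  # span representation
            scores[span] = candidates[span] @ mention    # (K,) similarities
    return scores
```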

The SWAG Algorithm; a Mathematical Approach that Outperforms Traditional Deep Learning. Theory and Implementation

Title The SWAG Algorithm; a Mathematical Approach that Outperforms Traditional Deep Learning. Theory and Implementation
Authors Saeid Safaei, Vahid Safaei, Solmazi Safaei, Zerotti Woods, Hamid R. Arabnia, Juan B. Gutierrez
Abstract The performance of artificial neural networks (ANNs) is influenced by weight initialization, the nature of activation functions, and their architecture. There is a wide range of activation functions that are traditionally used to train a neural network, e.g. sigmoid, tanh, and the Rectified Linear Unit (ReLU). A widespread practice is to use the same type of activation function in all neurons in a given layer. In this manuscript, we present a type of neural network in which the activation functions in every layer form a polynomial basis; we name this method SWAG after the initials of the last names of the authors. We tested SWAG on three complex, highly non-linear functions as well as the MNIST handwritten digit dataset. SWAG outperforms state-of-the-art fully connected neural networks and converges faster. Given the low computational complexity of SWAG, and the fact that it was capable of solving problems current architectures cannot, it has the potential to change the way we approach deep learning.
Tasks
Published 2018-11-28
URL http://arxiv.org/abs/1811.11813v1
PDF http://arxiv.org/pdf/1811.11813v1.pdf
PWC https://paperswithcode.com/paper/the-swag-algorithm-a-mathematical-approach
Repo https://github.com/DeepLearningSaeid/New-Type-of-Deep-Learning
Framework none
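
A hedged reading of the SWAG construction: within a layer, groups of neurons apply the monomials x, x^2, ..., x^P to their pre-activations, so the layer's outputs span a polynomial basis. The PyTorch sketch below is illustrative only; the paper specifies the exact architecture and training procedure.

```python
import torch
import torch.nn as nn

class PolynomialBasisLayer(nn.Module):
    def __init__(self, in_dim, units_per_power, max_power=3):
        super().__init__()
        self.linears = nn.ModuleList(
            [nn.Linear(in_dim, units_per_power) for _ in range(max_power)])

    def forward(self, x):
        # Group p applies the activation z -> z^(p+1) to its pre-activations.
        outs = [lin(x) ** (p + 1) for p, lin in enumerate(self.linears)]
        return torch.cat(outs, dim=-1)
```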