Paper Group AWR 289
A Bayesian Approach for Sequence Tagging with Crowds. Weighted Spectral Embedding of Graphs. Lip Movements Generation at a Glance. Navigating the Landscape for Real-time Localisation and Mapping for Robotics and Virtual and Augmented Reality. Resource Aware Person Re-identification across Multiple Resolutions. Spatial and Temporal Mutual Promotion for Video-based Person Re-identification. Watch, Listen, and Describe: Globally and Locally Aligned Cross-Modal Attentions for Video Captioning. Sparse Gaussian Process Audio Source Separation Using Spectrum Priors in the Time-Domain. NSGA-Net: Neural Architecture Search using Multi-Objective Genetic Algorithm. Multi-View Silhouette and Depth Decomposition for High Resolution 3D Object Representation. Sparse Label Smoothing Regularization for Person Re-Identification. General audio tagging with ensembling convolutional neural network and statistical features. Neural Collective Entity Linking. End-to-End Neural Entity Linking. The SWAG Algorithm; a Mathematical Approach that Outperforms Traditional Deep Learning. Theory and Implementation
A Bayesian Approach for Sequence Tagging with Crowds
Title | A Bayesian Approach for Sequence Tagging with Crowds |
Authors | Edwin Simpson, Iryna Gurevych |
Abstract | Current methods for sequence tagging, a core task in NLP, are data hungry, which motivates the use of crowdsourcing as a cheap way to obtain labelled data. However, annotators are often unreliable and current aggregation methods cannot capture common types of span annotation errors. To address this, we propose a Bayesian method for aggregating sequence tags that reduces errors by modelling sequential dependencies between the annotations as well as the ground-truth labels. By taking a Bayesian approach, we account for uncertainty in the model due to both annotator errors and the lack of data for modelling annotators who complete few tasks. We evaluate our model on crowdsourced data for named entity recognition, information extraction and argument mining, showing that our sequential model outperforms the previous state of the art. We also find that our approach can reduce crowdsourcing costs through more effective active learning, as it better captures uncertainty in the sequence labels when there are few annotations. |
Tasks | Active Learning, Argument Mining, Named Entity Recognition |
Published | 2018-11-02 |
URL | https://arxiv.org/abs/1811.00780v3 |
PDF | https://arxiv.org/pdf/1811.00780v3.pdf |
PWC | https://paperswithcode.com/paper/bayesian-ensembles-of-crowds-and-deep |
Repo | https://github.com/UKPLab/arxiv2018-bayesian-ensembles |
Framework | none |
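The paper's model is Bayesian and sequential; as a point of reference only, the sketch below implements a classic Dawid–Skene-style EM aggregator (independent per-token labels, per-annotator confusion matrices), the kind of baseline the sequential model improves on. All shapes and names are illustrative assumptions, not the authors' code.

```python
import numpy as np

def aggregate_crowd_labels(labels, n_classes, n_iter=20):
    """Dawid-Skene-style EM. labels: (n_items, n_annotators) ints, -1 = missing."""
    n_items, n_annot = labels.shape
    # Initialise posteriors over true labels from raw vote counts.
    post = np.ones((n_items, n_classes))
    for i in range(n_items):
        for l in labels[i]:
            if l >= 0:
                post[i, l] += 1
    post /= post.sum(1, keepdims=True)
    for _ in range(n_iter):
        # M-step: per-annotator confusion matrices pi[a, true, observed].
        pi = np.full((n_annot, n_classes, n_classes), 1e-2)
        for i in range(n_items):
            for a, l in enumerate(labels[i]):
                if l >= 0:
                    pi[a, :, l] += post[i]
        pi /= pi.sum(2, keepdims=True)
        # E-step: recompute posteriors over the true labels.
        logp = np.zeros((n_items, n_classes))
        for i in range(n_items):
            for a, l in enumerate(labels[i]):
                if l >= 0:
                    logp[i] += np.log(pi[a, :, l])
        post = np.exp(logp - logp.max(1, keepdims=True))
        post /= post.sum(1, keepdims=True)
    return post.argmax(1)
```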
Weighted Spectral Embedding of Graphs
Title | Weighted Spectral Embedding of Graphs |
Authors | Thomas Bonald, Alexandre Hollocou, Marc Lelarge |
Abstract | We present a novel spectral embedding of graphs that incorporates weights assigned to the nodes, quantifying their relative importance. This spectral embedding is based on the first eigenvectors of some properly normalized version of the Laplacian. We prove that these eigenvectors correspond to the configurations of lowest energy of an equivalent physical system, either mechanical or electrical, in which the weight of each node can be interpreted as its mass or its capacitance, respectively. Experiments on a real dataset illustrate the impact of weighting on the embedding. |
Tasks | |
Published | 2018-09-28 |
URL | http://arxiv.org/abs/1809.11115v2 |
PDF | http://arxiv.org/pdf/1809.11115v2.pdf |
PWC | https://paperswithcode.com/paper/weighted-spectral-embedding-of-graphs |
Repo | https://github.com/tbonald/spectral_embedding |
Framework | none |
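The embedding reduces to a generalised eigenproblem: with Laplacian L and a diagonal matrix W of node weights, the first eigenvectors of L x = λ W x give the node coordinates. A minimal numpy/scipy sketch, with degree weights as an example choice (the paper's exact normalisation may differ):

```python
import numpy as np
from scipy.linalg import eigh

def weighted_spectral_embedding(adj, weights, dim=2):
    """Solve L x = lambda W x and keep the first non-trivial eigenvectors."""
    lap = np.diag(adj.sum(1)) - adj           # unnormalised graph Laplacian
    _, vecs = eigh(lap, np.diag(weights))     # generalised eigenproblem
    return vecs[:, 1:dim + 1]                 # drop the constant eigenvector

adj = np.array([[0, 1, 1, 0],
                [1, 0, 1, 0],
                [1, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)
print(weighted_spectral_embedding(adj, adj.sum(1)))  # degree-weighted example
```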
Lip Movements Generation at a Glance
Title | Lip Movements Generation at a Glance |
Authors | Lele Chen, Zhiheng Li, Ross K. Maddox, Zhiyao Duan, Chenliang Xu |
Abstract | Cross-modality generation is an emerging topic that aims to synthesize data in one modality based on information in a different modality. In this paper, we consider one such task: given arbitrary audio speech and one lip image of an arbitrary target identity, generate synthesized lip movements of the target identity saying the speech. To perform well on this task, a model must not only consider the retention of target identity, the photo-realism of the synthesized images, and the consistency and smoothness of lip images in a sequence, but, more importantly, learn the correlations between audio speech and lip movements. To solve these problems collectively, we explore the best modeling of the audio-visual correlations in building and training a lip-movement generator network. Specifically, we devise a method to fuse audio and image embeddings to generate multiple lip images at once and propose a novel correlation loss to synchronize lip changes and speech changes. Our final model utilizes a combination of four losses for a comprehensive consideration in generating lip movements; it is trained in an end-to-end fashion and is robust to lip shapes, view angles and different facial characteristics. Thorough experiments on three datasets ranging from lab-recorded to lips in-the-wild show that our model significantly outperforms other state-of-the-art methods extended to this task. |
Tasks | |
Published | 2018-03-28 |
URL | http://arxiv.org/abs/1803.10404v3 |
PDF | http://arxiv.org/pdf/1803.10404v3.pdf |
PWC | https://paperswithcode.com/paper/lip-movements-generation-at-a-glance |
Repo | https://github.com/lelechen63/3d_gan |
Framework | pytorch |
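As a hedged illustration of the correlation-loss idea (synchronizing lip changes with speech changes), the PyTorch sketch below correlates frame-to-frame differences of the two feature sequences. The exact loss in the paper may differ; the shared feature dimension is an assumption.

```python
import torch

def correlation_loss(audio_feats, lip_feats, eps=1e-8):
    """audio_feats, lip_feats: (batch, time, dim) tensors, assumed already
    projected to a shared dimension."""
    da = audio_feats[:, 1:] - audio_feats[:, :-1]   # audio "change" signal
    dv = lip_feats[:, 1:] - lip_feats[:, :-1]       # lip "change" signal
    da = da - da.mean(dim=1, keepdim=True)
    dv = dv - dv.mean(dim=1, keepdim=True)
    corr = (da * dv).sum(dim=1) / (
        da.norm(dim=1) * dv.norm(dim=1) + eps)      # per-dim Pearson correlation
    return 1 - corr.mean()                          # minimised when changes align
```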
Navigating the Landscape for Real-time Localisation and Mapping for Robotics and Virtual and Augmented Reality
Title | Navigating the Landscape for Real-time Localisation and Mapping for Robotics and Virtual and Augmented Reality |
Authors | Sajad Saeedi, Bruno Bodin, Harry Wagstaff, Andy Nisbet, Luigi Nardi, John Mawer, Nicolas Melot, Oscar Palomar, Emanuele Vespa, Tom Spink, Cosmin Gorgovan, Andrew Webb, James Clarkson, Erik Tomusk, Thomas Debrunner, Kuba Kaszyk, Pablo Gonzalez-de-Aledo, Andrey Rodchenko, Graham Riley, Christos Kotselidis, Björn Franke, Michael F. P. O’Boyle, Andrew J. Davison, Paul H. J. Kelly, Mikel Luján, Steve Furber |
Abstract | Visual understanding of 3D environments in real-time, at low power, is a huge computational challenge. Often referred to as SLAM (Simultaneous Localisation and Mapping), it is central to applications spanning domestic and industrial robotics, autonomous vehicles, virtual and augmented reality. This paper describes the results of a major research effort to assemble the algorithms, architectures, tools, and systems software needed to enable delivery of SLAM, by supporting applications specialists in selecting and configuring the appropriate algorithm and the appropriate hardware, and compilation pathway, to meet their performance, accuracy, and energy consumption goals. The major contributions we present are (1) tools and methodology for systematic quantitative evaluation of SLAM algorithms, (2) automated, machine-learning-guided exploration of the algorithmic and implementation design space with respect to multiple objectives, (3) end-to-end simulation tools to enable optimisation of heterogeneous, accelerated architectures for the specific algorithmic requirements of the various SLAM algorithmic approaches, and (4) tools for delivering, where appropriate, accelerated, adaptive SLAM solutions in a managed, JIT-compiled, adaptive runtime context. |
Tasks | Autonomous Vehicles |
Published | 2018-08-20 |
URL | http://arxiv.org/abs/1808.06352v1 |
PDF | http://arxiv.org/pdf/1808.06352v1.pdf |
PWC | https://paperswithcode.com/paper/navigating-the-landscape-for-real-time |
Repo | https://github.com/xiexiexiaoxiexie/Udacity-self-driving-car-engineer-P6-Kidnapped-Vehicle |
Framework | none |
Resource Aware Person Re-identification across Multiple Resolutions
Title | Resource Aware Person Re-identification across Multiple Resolutions |
Authors | Yan Wang, Lequn Wang, Yurong You, Xu Zou, Vincent Chen, Serena Li, Gao Huang, Bharath Hariharan, Kilian Q. Weinberger |
Abstract | Not all people are equally easy to identify: color statistics might be enough for some cases while others might require careful reasoning about high- and low-level details. However, prevailing person re-identification (re-ID) methods use one-size-fits-all high-level embeddings from deep convolutional networks for all cases. This might limit their accuracy on difficult examples or make them needlessly expensive for the easy ones. To remedy this, we present a new person re-ID model that combines effective embeddings built on multiple convolutional network layers, trained with deep supervision. On traditional re-ID benchmarks, our method improves substantially over the previous state-of-the-art results on all five datasets that we evaluate on. We then propose two new formulations of the person re-ID problem under resource constraints, and show how our model can be used to effectively trade off accuracy and computation in the presence of resource constraints. Code and pre-trained models are available at https://github.com/mileyan/DARENet. |
Tasks | Person Re-Identification |
Published | 2018-05-22 |
URL | http://arxiv.org/abs/1805.08805v3 |
PDF | http://arxiv.org/pdf/1805.08805v3.pdf |
PWC | https://paperswithcode.com/paper/resource-aware-person-re-identification |
Repo | https://github.com/mileyan/DARENet |
Framework | pytorch |
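The resource-aware formulation relies on exposing an embedding at several depths, each trained with its own supervision signal, so inference can stop early on easy queries. A hedged PyTorch sketch of that structure (stage widths, pooling and the stopping rule are assumptions, not the DARENet architecture itself):

```python
import torch
import torch.nn as nn

class MultiStageEmbedder(nn.Module):
    """Emit one deeply-supervised embedding per stage for anytime inference."""
    def __init__(self, dim=128):
        super().__init__()
        self.stages = nn.ModuleList([
            nn.Sequential(nn.Conv2d(c_in, c_out, 3, stride=2, padding=1),
                          nn.ReLU())
            for c_in, c_out in [(3, 32), (32, 64), (64, 128)]])
        self.heads = nn.ModuleList([nn.Linear(c, dim) for c in (32, 64, 128)])

    def forward(self, x):
        embeddings = []
        for stage, head in zip(self.stages, self.heads):
            x = stage(x)
            pooled = x.mean(dim=(2, 3))        # global average pooling
            embeddings.append(head(pooled))    # deep-supervision target
        return embeddings                      # one embedding per depth
```

At query time one would compare each stage's embedding against the gallery and stop as soon as the match is confident, trading accuracy for computation.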
Spatial and Temporal Mutual Promotion for Video-based Person Re-identification
Title | Spatial and Temporal Mutual Promotion for Video-based Person Re-identification |
Authors | Yiheng Liu, Zhenxun Yuan, Wengang Zhou, Houqiang Li |
Abstract | Video-based person re-identification is a crucial task of matching video sequences of a person across multiple camera views. Generally, features directly extracted from a single frame suffer from occlusion, blur, illumination and posture changes. This leads to false activation or missing activation in some regions, which corrupts the appearance and motion representation. How to explore the abundant spatial-temporal information in video sequences is the key to solving this problem. To this end, we propose a Refining Recurrent Unit (RRU) that recovers the missing parts and suppresses noisy parts of the current frame’s features by referring to historical frames. With RRU, the quality of each frame’s appearance representation is improved. Then we use the Spatial-Temporal clues Integration Module (STIM) to mine the spatial-temporal information from those upgraded features. Meanwhile, the multi-level training objective is used to enhance the capability of RRU and STIM. Through the cooperation of those modules, the spatial and temporal features mutually promote each other and the final spatial-temporal feature representation is more discriminative and robust. Extensive experiments are conducted on three challenging datasets, i.e., iLIDS-VID, PRID-2011 and MARS. The experimental results demonstrate that our approach outperforms existing state-of-the-art methods of video-based person re-identification on iLIDS-VID and MARS and achieves favorable results on PRID-2011. |
Tasks | Person Re-Identification, Video-Based Person Re-Identification |
Published | 2018-12-26 |
URL | http://arxiv.org/abs/1812.10305v1 |
PDF | http://arxiv.org/pdf/1812.10305v1.pdf |
PWC | https://paperswithcode.com/paper/spatial-and-temporal-mutual-promotion-for |
Repo | https://github.com/yolomax/rru-reid |
Framework | pytorch |
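A hedged sketch of the refining idea: a running summary of past frames gates, per channel, how much of the current frame's feature to keep versus restore from history (e.g. under occlusion). The gating form and the simple exponential summary are assumptions, not the published RRU equations.

```python
import torch
import torch.nn as nn

class RefiningUnit(nn.Module):
    """Repair each frame's features using a summary of earlier frames."""
    def __init__(self, dim):
        super().__init__()
        self.gate = nn.Linear(2 * dim, dim)

    def forward(self, feats):
        """feats: (time, batch, dim); returns refined features, same shape."""
        history = feats[0]
        refined = [feats[0]]
        for t in range(1, feats.size(0)):
            g = torch.sigmoid(self.gate(torch.cat([history, feats[t]], -1)))
            frame = g * feats[t] + (1 - g) * history   # restore noisy parts
            history = 0.5 * history + 0.5 * frame      # update the summary
            refined.append(frame)
        return torch.stack(refined)
```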
Watch, Listen, and Describe: Globally and Locally Aligned Cross-Modal Attentions for Video Captioning
Title | Watch, Listen, and Describe: Globally and Locally Aligned Cross-Modal Attentions for Video Captioning |
Authors | Xin Wang, Yuan-Fang Wang, William Yang Wang |
Abstract | A major challenge for video captioning is to combine audio and visual cues. Existing multi-modal fusion methods have shown encouraging results in video understanding. However, the temporal structures of multiple modalities at different granularities are rarely explored, and how to selectively fuse the multi-modal representations at different levels of details remains uncharted. In this paper, we propose a novel hierarchically aligned cross-modal attention (HACA) framework to learn and selectively fuse both global and local temporal dynamics of different modalities. Furthermore, for the first time, we validate the superior performance of the deep audio features on the video captioning task. Finally, our HACA model significantly outperforms the previous best systems and achieves new state-of-the-art results on the widely used MSR-VTT dataset. |
Tasks | Video Captioning, Video Understanding |
Published | 2018-04-15 |
URL | http://arxiv.org/abs/1804.05448v1 |
PDF | http://arxiv.org/pdf/1804.05448v1.pdf |
PWC | https://paperswithcode.com/paper/watch-listen-and-describe-globally-and |
Repo | https://github.com/chitwansaharia/HACAModel |
Framework | pytorch |
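The fusion step can be pictured as two standard attention reads, one per modality, whose context vectors are concatenated for the caption decoder; the sketch below shows only that skeleton (the paper's hierarchical, globally-and-locally aligned structure is richer than this).

```python
import torch
import torch.nn.functional as F

def attend(query, keys):
    """query: (batch, dim); keys: (batch, time, dim) -> (batch, dim) context."""
    scores = torch.bmm(keys, query.unsqueeze(2)).squeeze(2)   # dot-product
    weights = F.softmax(scores, dim=1)
    return torch.bmm(weights.unsqueeze(1), keys).squeeze(1)

def fuse(query, audio_feats, visual_feats):
    """Read each modality separately, then concatenate the contexts."""
    return torch.cat([attend(query, audio_feats),
                      attend(query, visual_feats)], dim=1)
```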
Sparse Gaussian Process Audio Source Separation Using Spectrum Priors in the Time-Domain
Title | Sparse Gaussian Process Audio Source Separation Using Spectrum Priors in the Time-Domain |
Authors | Pablo A. Alvarado, Mauricio A. Álvarez, Dan Stowell |
Abstract | Gaussian process (GP) audio source separation is a time-domain approach that circumvents the inherent phase approximation issue of spectrogram-based methods. Furthermore, through their kernels, GPs elegantly incorporate prior knowledge about the sources into the separation model. Despite these compelling advantages, the computational complexity of GP inference scales cubically with the number of audio samples. As a result, source separation GP models have been restricted to the analysis of short audio frames. We introduce an efficient application of GPs to time-domain audio source separation, without compromising performance. For this purpose, we use GP regression together with spectral mixture kernels and variational sparse GPs. We compare our method with LD-PSDTF (positive semi-definite tensor factorization), KL-NMF (Kullback-Leibler non-negative matrix factorization), and IS-NMF (Itakura-Saito NMF). Results show that the proposed method outperforms these techniques. |
Tasks | |
Published | 2018-10-30 |
URL | http://arxiv.org/abs/1810.12679v3 |
PDF | http://arxiv.org/pdf/1810.12679v3.pdf |
PWC | https://paperswithcode.com/paper/sparse-gaussian-process-audio-source |
Repo | https://github.com/PabloAlvarado/ssgp |
Framework | tf |
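The spectrum priors come from spectral mixture kernels, whose standard form (Wilson & Adams, 2013) is a sum of cosines damped by Gaussian envelopes. A small numpy sketch with an illustrative single 440 Hz component (parameter values are assumptions):

```python
import numpy as np

def spectral_mixture_kernel(t1, t2, weights, means, variances):
    """k(tau) = sum_q w_q * exp(-2 pi^2 tau^2 v_q) * cos(2 pi tau mu_q)."""
    tau = t1[:, None] - t2[None, :]
    k = np.zeros_like(tau)
    for w, mu, v in zip(weights, means, variances):
        k += w * np.exp(-2 * np.pi**2 * tau**2 * v) * np.cos(2 * np.pi * tau * mu)
    return k

t = np.linspace(0, 0.02, 200)                               # 20 ms of samples
K = spectral_mixture_kernel(t, t, [1.0], [440.0], [25.0])   # one 440 Hz partial
```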
NSGA-Net: Neural Architecture Search using Multi-Objective Genetic Algorithm
Title | NSGA-Net: Neural Architecture Search using Multi-Objective Genetic Algorithm |
Authors | Zhichao Lu, Ian Whalen, Vishnu Boddeti, Yashesh Dhebar, Kalyanmoy Deb, Erik Goodman, Wolfgang Banzhaf |
Abstract | This paper introduces NSGA-Net – an evolutionary approach for neural architecture search (NAS). NSGA-Net is designed with three goals in mind: (1) a procedure considering multiple and conflicting objectives, (2) an efficient procedure balancing exploration and exploitation of the space of potential neural network architectures, and (3) a procedure finding a diverse set of trade-off network architectures achieved in a single run. NSGA-Net is a population-based search algorithm that explores a space of potential neural network architectures in three steps, namely, a population initialization step that is based on prior knowledge from hand-crafted architectures, an exploration step comprising crossover and mutation of architectures, and finally an exploitation step that utilizes the hidden useful knowledge stored in the entire history of evaluated neural architectures in the form of a Bayesian network. Experimental results suggest that combining the dual objectives of minimizing an error metric and computational complexity, as measured by FLOPs, allows NSGA-Net to find competitive neural architectures. Moreover, NSGA-Net achieves an error rate on the CIFAR-10 dataset on par with other state-of-the-art NAS methods while using orders of magnitude less computational resources. These results are encouraging and show the promise of further use of evolutionary computation (EC) methods in various deep-learning paradigms. |
Tasks | Efficient Exploration, Neural Architecture Search, Object Classification |
Published | 2018-10-08 |
URL | http://arxiv.org/abs/1810.03522v2 |
PDF | http://arxiv.org/pdf/1810.03522v2.pdf |
PWC | https://paperswithcode.com/paper/nsga-net-a-multi-objective-genetic-algorithm |
Repo | https://github.com/ianwhale/nsga-net |
Framework | pytorch |
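The multi-objective core is Pareto selection over (error, FLOPs). The sketch below shows that selection step only, under the assumption that both objectives are minimised; crossover, mutation and the Bayesian-network exploitation step are left out.

```python
def non_dominated(points):
    """points: list of (error, flops) tuples; returns the Pareto front."""
    front = []
    for p in points:
        # p survives unless some other point is at least as good on both
        # objectives (and differs from p somewhere).
        if not any(q[0] <= p[0] and q[1] <= p[1] and q != p for q in points):
            front.append(p)
    return front

archs = [(6.1, 2.1e8), (5.8, 4.0e8), (6.5, 1.5e8), (5.8, 5.5e8)]
print(non_dominated(archs))  # keeps the three mutually non-dominated points
```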
Multi-View Silhouette and Depth Decomposition for High Resolution 3D Object Representation
Title | Multi-View Silhouette and Depth Decomposition for High Resolution 3D Object Representation |
Authors | Edward Smith, Scott Fujimoto, David Meger |
Abstract | We consider the problem of scaling deep generative shape models to high-resolution. Drawing motivation from the canonical view representation of objects, we introduce a novel method for the fast up-sampling of 3D objects in voxel space through networks that perform super-resolution on the six orthographic depth projections. This allows us to generate high-resolution objects with more efficient scaling than methods which work directly in 3D. We decompose the problem of 2D depth super-resolution into silhouette and depth prediction to capture both structure and fine detail. This allows our method to generate sharp edges more easily than an individual network. We evaluate our work on multiple experiments concerning high-resolution 3D objects, and show our system is capable of accurately predicting novel objects at resolutions as large as 512×512×512 – the highest resolution reported for this task. We achieve state-of-the-art performance on 3D object reconstruction from RGB images on the ShapeNet dataset, and further demonstrate the first effective 3D super-resolution method. |
Tasks | 3D Object Reconstruction, 3D Object Super-Resolution, Depth Estimation, Object Reconstruction, Super-Resolution |
Published | 2018-02-27 |
URL | http://arxiv.org/abs/1802.09987v3 |
PDF | http://arxiv.org/pdf/1802.09987v3.pdf |
PWC | https://paperswithcode.com/paper/multi-view-silhouette-and-depth-decomposition |
Repo | https://github.com/kingcheng2000/Multi-View-Silhouette-and-Depth-Decomposition-for-High-Resolution-3D-Object-Representation |
Framework | tf |
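A sketch of the decomposition the method builds on: a voxel object is summarised by six orthographic depth maps (first occupied voxel along each axis, from both directions), and each map splits into a silhouette (is any voxel hit?) plus a depth value. Only the projection step is sketched here; super-resolving these 2D maps and re-fusing them into a volume is the paper's contribution.

```python
import numpy as np

def orthographic_depths(vox):
    """vox: (N, N, N) boolean occupancy grid -> six (silhouette, depth) maps."""
    n = vox.shape[0]
    maps = []
    for axis in range(3):
        for flip in (False, True):
            v = np.flip(vox, axis) if flip else vox
            hit = v.any(axis)                         # silhouette map
            depth = v.argmax(axis).astype(float)      # first occupied voxel
            depth[~hit] = n                           # empty rays -> max depth
            maps.append((hit, depth))
    return maps
```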
Sparse Label Smoothing Regularization for Person Re-Identification
Title | Sparse Label Smoothing Regularization for Person Re-Identification |
Authors | Jean-Paul Ainam, Ke Qin, Guisong Liu, Guangchun Luo |
Abstract | Person re-identification (re-id) is a cross-camera retrieval task which establishes a correspondence between images of a person from multiple cameras. Deep Learning methods have been successfully applied to this problem and have achieved impressive results. However, these methods require a large amount of labeled training data. Currently labeled datasets in person re-id are limited in their scale and manual acquisition of such large-scale datasets from surveillance cameras is a tedious and labor-intensive task. In this paper, we propose a framework that performs intelligent data augmentation and assigns partially smoothed labels to the generated data. Our approach first exploits the clustering property of existing person re-id datasets to create groups of similar objects that model cross-view variations. Each group is then used to generate realistic images through adversarial training. Our aim is to emphasize feature similarity between generated samples and the original samples. Finally, we assign a non-uniform label distribution to the generated samples and define a regularized loss function for training. The proposed approach tackles two problems: (1) how to efficiently use the generated data and (2) how to address the over-smoothness problem found in current regularization methods. Extensive experiments on four large-scale datasets show that our regularization method significantly improves the Re-ID accuracy compared to existing methods. |
Tasks | Data Augmentation, Person Re-Identification, Semi-Supervised Person Re-Identification |
Published | 2018-09-13 |
URL | http://arxiv.org/abs/1809.04976v3 |
PDF | http://arxiv.org/pdf/1809.04976v3.pdf |
PWC | https://paperswithcode.com/paper/sparse-label-smoothing-regularization-for |
Repo | https://github.com/jpainam/SLS_ReID |
Framework | pytorch |
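The regularization gives generated images a label distribution that is sparse: mass falls only on the identities in the cluster the image was generated from, rather than over all classes as in uniform label smoothing. A minimal sketch of such a target (the uniform within-cluster split is an assumption):

```python
import numpy as np

def sparse_smooth_target(n_classes, cluster_ids):
    """Target for a generated sample: uniform over its source cluster only."""
    target = np.zeros(n_classes)
    target[np.asarray(cluster_ids)] = 1.0 / len(cluster_ids)
    return target

# Training would minimise cross-entropy against this soft target, e.g.
# loss = -(sparse_smooth_target(C, cluster) * log_softmax(logits)).sum()
print(sparse_smooth_target(6, [1, 4]))  # mass only on identities 1 and 4
```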
General audio tagging with ensembling convolutional neural network and statistical features
Title | General audio tagging with ensembling convolutional neural network and statistical features |
Authors | Kele Xu, Boqing Zhu, Qiuqiang Kong, Haibo Mi, Bo Ding, Dezhi Wang, Huaimin Wang |
Abstract | Audio tagging aims to infer descriptive labels from audio clips. Audio tagging is challenging due to the limited size of data and noisy labels. In this paper, we describe our solution for the DCASE 2018 Task 2 general audio tagging challenge. The contributions of our solution include the following: we investigated a variety of convolutional neural network architectures to solve the audio tagging task; statistical features are applied to capture statistical patterns of audio features to improve the classification performance; ensemble learning is applied to combine the outputs from the deep classifiers to utilize complementary information; and a sample re-weighting strategy is employed for ensemble training to address the noisy label problem. Our system achieves a mean average precision (mAP@3) of 0.958, outperforming the baseline system's 0.704. Our system ranked 1st and 4th out of 558 submissions on the public and private leaderboards of the DCASE 2018 Task 2 challenge. Our codes are available at https://github.com/Cocoxili/DCASE2018Task2/. |
Tasks | Audio Tagging |
Published | 2018-10-30 |
URL | http://arxiv.org/abs/1810.12832v1 |
PDF | http://arxiv.org/pdf/1810.12832v1.pdf |
PWC | https://paperswithcode.com/paper/general-audio-tagging-with-ensembling |
Repo | https://github.com/Cocoxili/DCASE2018Task2 |
Framework | pytorch |
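The reported metric, mAP@3 with one true label per clip, credits a prediction with 1, 1/2 or 1/3 depending on the rank at which the true label appears in the submitted top three, and 0 otherwise; a small self-contained sketch:

```python
def map_at_3(top3_preds, truths):
    """top3_preds: list of ranked label lists; truths: one true label per clip."""
    score = 0.0
    for preds, truth in zip(top3_preds, truths):
        for rank, p in enumerate(preds[:3], start=1):
            if p == truth:
                score += 1.0 / rank   # 1, 1/2 or 1/3 depending on the rank
                break
    return score / len(truths)

print(map_at_3([["bark", "meow", "hiss"]], ["meow"]))  # 0.5
```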
Neural Collective Entity Linking
Title | Neural Collective Entity Linking |
Authors | Yixin Cao, Lei Hou, Juanzi Li, Zhiyuan Liu |
Abstract | Entity Linking aims to link entity mentions in texts to knowledge bases, and neural models have achieved recent success in this task. However, most existing methods rely on local contexts to resolve entities independently, which may usually fail due to the data sparsity of local information. To address this issue, we propose a novel neural model for collective entity linking, named NCEL. NCEL applies a Graph Convolutional Network to integrate both local contextual features and global coherence information for entity linking. To improve computational efficiency, we approximately perform graph convolution on a subgraph of adjacent entity mentions instead of those in the entire text. We further introduce an attention scheme to improve the robustness of NCEL to data noise and train the model on Wikipedia hyperlinks to avoid overfitting and domain bias. In experiments, we evaluate NCEL on five publicly available datasets to verify the linking performance as well as generalization ability. We also conduct an extensive analysis of time complexity, the impact of key modules, and qualitative results, which demonstrate the effectiveness and efficiency of our proposed method. |
Tasks | Entity Linking |
Published | 2018-11-21 |
URL | http://arxiv.org/abs/1811.08603v1 |
PDF | http://arxiv.org/pdf/1811.08603v1.pdf |
PWC | https://paperswithcode.com/paper/neural-collective-entity-linking |
Repo | https://github.com/TaoMiner/NCEL |
Framework | pytorch |
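The subgraph convolution can be pictured as the standard GCN propagation rule (Kipf & Welling) applied to mention-candidate features over a normalised adjacency with self-loops; whether NCEL uses exactly this normalisation is an assumption of the sketch.

```python
import numpy as np

def gcn_layer(feats, adj, weight):
    """feats: (n, d); adj: (n, n) 0/1 mention-candidate graph; weight: (d, d_out)."""
    a_hat = adj + np.eye(adj.shape[0])            # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(1))
    norm = a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(norm @ feats @ weight, 0)   # mix neighbours, then ReLU
```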
End-to-End Neural Entity Linking
Title | End-to-End Neural Entity Linking |
Authors | Nikolaos Kolitsas, Octavian-Eugen Ganea, Thomas Hofmann |
Abstract | Entity Linking (EL) is an essential task for semantic text understanding and information extraction. Popular methods separately address the Mention Detection (MD) and Entity Disambiguation (ED) stages of EL, without leveraging their mutual dependency. We here propose the first neural end-to-end EL system that jointly discovers and links entities in a text document. The main idea is to consider all possible spans as potential mentions and learn contextual similarity scores over their entity candidates that are useful for both MD and ED decisions. Key components are context-aware mention embeddings, entity embeddings and a probabilistic mention–entity map, without demanding other engineered features. Empirically, we show that our end-to-end method significantly outperforms popular systems on the Gerbil platform when enough training data is available. Conversely, if testing datasets follow different annotation conventions compared to the training set (e.g., queries/tweets vs. news documents), our ED model coupled with a traditional NER system offers the best or second best EL accuracy. |
Tasks | Entity Disambiguation, Entity Embeddings, Entity Linking |
Published | 2018-08-23 |
URL | http://arxiv.org/abs/1808.07699v2 |
PDF | http://arxiv.org/pdf/1808.07699v2.pdf |
PWC | https://paperswithcode.com/paper/end-to-end-neural-entity-linking |
Repo | https://github.com/dalab/end2end_neural_el |
Framework | none |
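The core scoring loop enumerates all spans up to a maximum width and scores each (span, candidate) pair by embedding similarity plus the log prior from the mention–entity map. The sketch below uses a mean-pooled span embedding as a stand-in for the paper's context-aware one; all names are hypothetical.

```python
import numpy as np

def score_spans(token_vecs, candidates, max_len=3):
    """token_vecs: (n, d) array; candidates: {(i, j): [(name, vec, prior), ...]}
    where (i, j) is a token span with j exclusive and prior > 0."""
    out = []
    n = len(token_vecs)
    for i in range(n):
        for j in range(i + 1, min(i + max_len, n) + 1):
            span_vec = token_vecs[i:j].mean(axis=0)   # simple span embedding
            for name, vec, prior in candidates.get((i, j), []):
                score = float(span_vec @ vec) + np.log(prior)
                out.append(((i, j), name, score))
    return out   # jointly usable for mention detection and disambiguation
```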
The SWAG Algorithm; a Mathematical Approach that Outperforms Traditional Deep Learning. Theory and Implementation
Title | The SWAG Algorithm; a Mathematical Approach that Outperforms Traditional Deep Learning. Theory and Implementation |
Authors | Saeid Safaei, Vahid Safaei, Solmazi Safaei, Zerotti Woods, Hamid R. Arabnia, Juan B. Gutierrez |
Abstract | The performance of artificial neural networks (ANNs) is influenced by weight initialization, the nature of activation functions, and their architecture. There is a wide range of activation functions that are traditionally used to train a neural network, e.g., sigmoid, tanh, and the Rectified Linear Unit (ReLU). A widespread practice is to use the same type of activation function in all neurons in a given layer. In this manuscript, we present a type of neural network in which the activation functions in every layer form a polynomial basis; we name this method SWAG after the initials of the last names of the authors. We tested SWAG on three complex, highly non-linear functions as well as the MNIST handwriting dataset. SWAG outperforms state-of-the-art fully connected neural networks and converges faster. Given the low computational complexity of SWAG, and the fact that it was capable of solving problems current architectures cannot, it has the potential to change the way that we approach deep learning. |
Tasks | |
Published | 2018-11-28 |
URL | http://arxiv.org/abs/1811.11813v1 |
PDF | http://arxiv.org/pdf/1811.11813v1.pdf |
PWC | https://paperswithcode.com/paper/the-swag-algorithm-a-mathematical-approach |
Repo | https://github.com/DeepLearningSaeid/New-Type-of-Deep-Learning |
Framework | none |
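One reading of the abstract is that layer k applies the monomial activation z ↦ z^k, so the stacked layers give the network a polynomial basis to combine. The sketch below implements that reading only; layer sizes, initialisation and the absence of a readout layer are pure assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def swag_forward(x, weights):
    """x: (batch, d0); weights: list of (d_in, d_out) matrices.
    Layer k applies the activation z -> z**k, forming a polynomial basis."""
    h = x
    for k, w in enumerate(weights, start=1):
        h = (h @ w) ** k
    return h

weights = [rng.normal(scale=0.5, size=s) for s in [(1, 8), (8, 8), (8, 1)]]
x = np.linspace(-1, 1, 5).reshape(-1, 1)
print(swag_forward(x, weights))   # a cubic-degree response in this toy setup
```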