Paper Group ANR 600
Using Deep Object Features for Image Descriptions
Title | Using Deep Object Features for Image Descriptions |
Authors | Ashutosh Mishra, Marcus Liwicki |
Abstract | Inspired by recent advances in leveraging multiple modalities in machine translation, we introduce an encoder-decoder pipeline that uses (1) specific objects within an image and their object labels, and (2) a language model for decoding the joint embedding of object features and object labels. Our pipeline merges previously detected objects from the image with their object labels and then learns the sequences of captions describing the particular image. The decoder model learns to extract descriptions for the image from scratch by decoding the joint representation of the object visual features and their object classes, conditioned on the encoder component. The idea of the model is to concentrate only on the specific objects of the image and their labels when generating descriptions of the image, rather than on the visual features of the entire image. The model needs further calibration, by adjusting its parameters and settings, to achieve better accuracy and performance. |
Tasks | Language Modelling, Machine Translation |
Published | 2019-02-25 |
URL | http://arxiv.org/abs/1902.09969v1 |
http://arxiv.org/pdf/1902.09969v1.pdf | |
PWC | https://paperswithcode.com/paper/using-deep-object-features-for-image |
Repo | |
Framework | |
CHD:Consecutive Horizontal Dropout for Human Gait Feature Extraction
Title | CHD:Consecutive Horizontal Dropout for Human Gait Feature Extraction |
Authors | Chengtao Cai, Yueyuan Zhou, Yanming Wang |
Abstract | Although research on gait recognition and person re-identification has made a lot of progress, identification accuracy is still not high enough in some specific situations, for example when people carry bags or change coats. To alleviate this, we propose a simple but effective Consecutive Horizontal Dropout (CHD) method, applied to human feature extraction in deep learning networks to avoid overfitting. With CHD, we improve the robustness of deep learning networks for cross-view gait recognition and person re-identification. The experiments show that rank-1 accuracy increases by about 10%, from 68.0% to 78.201%, on the cross-view gait recognition task and by about 8%, from 83.545% to 91.364%, on the person re-identification task in the wearing coat or jacket condition. In addition, 100% accuracy in the NM condition was obtained for the first time with CHD. On the CASIA-B benchmark, the above accuracies are state of the art. |
Tasks | Gait Recognition, Person Re-Identification |
Published | 2019-10-11 |
URL | https://arxiv.org/abs/1910.05039v2 |
https://arxiv.org/pdf/1910.05039v2.pdf | |
PWC | https://paperswithcode.com/paper/chdconsecutive-horizontal-dropout-for-human |
Repo | |
Framework | |
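The CHD abstract above does not spell out the mechanics of the dropout, so the following is only a minimal sketch of one plausible reading: during training, a randomly placed band of consecutive rows is zeroed in each sample's feature maps. The function name, the `band_height` parameter, and the choice of one band per sample are assumptions made for illustration, not the authors' implementation.

```python
import numpy as np

def consecutive_horizontal_dropout(features, band_height, rng, training=True):
    """Zero one randomly placed band of consecutive rows per sample.

    features: array of shape (batch, channels, height, width).
    band_height: number of consecutive rows to drop (hypothetical parameter).
    """
    if not training or band_height <= 0:
        return features
    batch, _, height, _ = features.shape
    band_height = min(band_height, height)
    out = features.copy()
    for i in range(batch):
        # Pick the first row of the band so that the band fits inside the map.
        start = rng.integers(0, height - band_height + 1)
        out[i, :, start:start + band_height, :] = 0.0
    return out

rng = np.random.default_rng(0)
x = np.ones((2, 3, 8, 4))
y = consecutive_horizontal_dropout(x, band_height=2, rng=rng)
```

Dropping a contiguous horizontal strip, rather than independent units, removes a whole body region at once, which is one way a network could be pushed to rely less on parts that change with bags or coats.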
Exploration by Optimisation in Partial Monitoring
Title | Exploration by Optimisation in Partial Monitoring |
Authors | Tor Lattimore, Csaba Szepesvari |
Abstract | We provide a simple and efficient algorithm for adversarial $k$-action $d$-outcome non-degenerate locally observable partial monitoring games, for which the $n$-round minimax regret is bounded by $6(d+1) k^{3/2} \sqrt{n \log(k)}$, matching the best known information-theoretic upper bound. The same algorithm also achieves near-optimal regret for full information, bandit and globally observable games. |
Tasks | |
Published | 2019-07-12 |
URL | https://arxiv.org/abs/1907.05772v3 |
https://arxiv.org/pdf/1907.05772v3.pdf | |
PWC | https://paperswithcode.com/paper/exploration-by-optimisation-in-partial |
Repo | |
Framework | |
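To get a feel for the regret bound $6(d+1) k^{3/2} \sqrt{n \log(k)}$ quoted in the abstract above, the short sketch below simply evaluates it. The function name is hypothetical and the natural logarithm is assumed; the code only restates the formula and says nothing about the algorithm itself.

```python
import math

def regret_upper_bound(n, k, d):
    """Evaluate 6 (d + 1) k^{3/2} sqrt(n log k) for n rounds, k actions, d outcomes."""
    return 6 * (d + 1) * k ** 1.5 * math.sqrt(n * math.log(k))

# The bound grows like sqrt(n): quadrupling the horizon doubles it.
ratio = regret_upper_bound(4000, k=3, d=2) / regret_upper_bound(1000, k=3, d=2)
```

Because the bound is sublinear in $n$, the average regret per round, `regret_upper_bound(n, k, d) / n`, shrinks as the horizon grows.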
Learning 3D Human Body Embedding
Title | Learning 3D Human Body Embedding |
Authors | Boyi Jiang, Juyong Zhang, Jianfei Cai, Jianmin Zheng |
Abstract | Although human body shapes vary for different identities with different poses, they can be embedded into a low-dimensional space due to their similarity in structure. Inspired by recent work on latent representation learning with a deformation-based mesh representation, we propose an autoencoder-like network architecture to learn disentangled shape and pose embeddings specifically for the 3D human body. We also integrate a coarse-to-fine reconstruction pipeline into the disentangling process to improve the reconstruction accuracy. Moreover, we construct a large dataset of human body models with consistent topology for training the neural network. Our learned embedding not only achieves superior reconstruction accuracy but also provides great flexibility in 3D human body creation via interpolation, bilateral interpolation and latent space sampling, which is confirmed by extensive experiments. The constructed dataset and trained model will be made publicly available. |
Tasks | Representation Learning |
Published | 2019-05-14 |
URL | https://arxiv.org/abs/1905.05622v1 |
https://arxiv.org/pdf/1905.05622v1.pdf | |
PWC | https://paperswithcode.com/paper/190505622 |
Repo | |
Framework | |
Measuring Domain Portability and Error Propagation in Biomedical QA
Title | Measuring Domain Portability and Error Propagation in Biomedical QA |
Authors | Stefan Hosein, Daniel Andor, Ryan McDonald |
Abstract | In this work we present Google’s submission to the BioASQ 7 biomedical question answering (QA) task (specifically Task 7b, Phase B). The core of our systems is based on BERT QA models, specifically the model of Alberti et al. (2019). In this report, and via our submissions, we aimed to investigate two research questions. We start by studying how domain portable QA systems are when they have been pre-trained and fine-tuned on general texts, e.g., Wikipedia. We measure this via two submissions. The first is a non-adapted model that uses a public pre-trained BERT model and is fine-tuned on the Natural Questions data set (Kwiatkowski et al., 2019). The second system takes this non-adapted model and fine-tunes it with the BioASQ training data. Next, we study the impact of error propagation in end-to-end retrieval and QA systems. Again we test this via two submissions. The first uses human-annotated relevant documents and snippets as input to the model, and the second uses predicted documents and snippets. Our main findings are that domain-specific fine-tuning can benefit biomedical QA. However, the biggest quality bottleneck is at the retrieval stage, where we see large drops in metrics (over 10 points absolute) when using non-gold inputs to the QA model. |
Tasks | Question Answering |
Published | 2019-09-12 |
URL | https://arxiv.org/abs/1909.09704v2 |
https://arxiv.org/pdf/1909.09704v2.pdf | |
PWC | https://paperswithcode.com/paper/measuring-domain-portability-and |
Repo | |
Framework | |
Novel Applications of Factored Neural Machine Translation
Title | Novel Applications of Factored Neural Machine Translation |
Authors | Patrick Wilken, Evgeny Matusov |
Abstract | In this work, we explore the usefulness of target factors in neural machine translation (NMT) beyond their original purpose of predicting word lemmas and their inflections, as proposed by García-Martínez et al., 2016. For this, we introduce three novel applications of the factored output architecture. In the first one, we use a factor to explicitly predict the word case separately from the target word itself. This allows for information to be shared between different casing variants of a word. In a second task, we use a factor to predict when two consecutive subwords have to be joined, eliminating the need for target subword joining markers. The third task is the prediction of special tokens of the operation sequence NMT model (OSNMT) of Stahlberg et al., 2018. Automatic evaluation on English-to-German and English-to-Turkish tasks showed that integration of such auxiliary prediction tasks into NMT is at least as good as the standard NMT approach. For the OSNMT, we observed a significant improvement in BLEU over the baseline OSNMT implementation due to a reduced output sequence length that resulted from the introduction of the target factors. |
Tasks | Machine Translation |
Published | 2019-10-09 |
URL | https://arxiv.org/abs/1910.03912v1 |
https://arxiv.org/pdf/1910.03912v1.pdf | |
PWC | https://paperswithcode.com/paper/novel-applications-of-factored-neural-machine |
Repo | |
Framework | |
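The first two factor applications described in the abstract above (a case factor and a subword-joining factor) can be illustrated on the data side with a small sketch. The factor labels (`"lower"`, `"Cap"`, `"UPPER"`, `"mixed"`), the function names, and the convention that a join flag attaches a subword to the previous one are assumptions chosen for illustration; the paper's actual factor inventory may differ.

```python
def split_case(token):
    """Split a token into a lowercased word and a case factor (labels are hypothetical)."""
    if token.isupper() and len(token) > 1:
        return token.lower(), "UPPER"
    if token[:1].isupper() and token[1:] == token[1:].lower():
        return token.lower(), "Cap"
    if token == token.lower():
        return token, "lower"
    return token, "mixed"  # irregular casing is left untouched

def restore_case(word, factor):
    """Invert split_case using the predicted case factor."""
    if factor == "UPPER":
        return word.upper()
    if factor == "Cap":
        return word[:1].upper() + word[1:]
    return word

def join_subwords(subwords, join_flags):
    """Merge subwords into words; a True flag joins a piece to the previous one."""
    words = []
    for piece, joined in zip(subwords, join_flags):
        if joined and words:
            words[-1] += piece
        else:
            words.append(piece)
    return words

tokens = ["Das", "NMT", "system", "iPhone"]
factored = [split_case(t) for t in tokens]
restored = [restore_case(word, factor) for word, factor in factored]
words = join_subwords(["trans", "lation", "works"], [False, True, False])
```

With the case moved into a factor, `"Das"` and `"das"` share one vocabulary entry, and with the join decision in a factor the subword vocabulary needs no joining markers.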
Multimodal Classification of Urban Micro-Events
Title | Multimodal Classification of Urban Micro-Events |
Authors | Maarten Sukel, Stevan Rudinac, Marcel Worring |
Abstract | In this paper we seek methods to effectively detect urban micro-events. Urban micro-events are events which occur in cities, have limited geographical coverage and typically affect only a small group of citizens. Because of their scale these are difficult to identify in most data sources. However, by using citizen sensing to gather data, detecting them becomes feasible. The data gathered by citizen sensing is often multimodal and, as a consequence, the information required to detect urban micro-events is distributed over multiple modalities. This makes it essential to have a classifier capable of combining them. In this paper we explore several methods of creating such a classifier, including early, late, hybrid fusion and representation learning using multimodal graphs. We evaluate performance on a real world dataset obtained from a live citizen reporting system. We show that a multimodal approach yields higher performance than unimodal alternatives. Furthermore, we demonstrate that our hybrid combination of early and late fusion with multimodal embeddings performs best in classification of urban micro-events. |
Tasks | Representation Learning |
Published | 2019-04-30 |
URL | http://arxiv.org/abs/1904.13349v1 |
http://arxiv.org/pdf/1904.13349v1.pdf | |
PWC | https://paperswithcode.com/paper/multimodal-classification-of-urban-micro |
Repo | |
Framework | |
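The difference between the early and late fusion strategies named in the abstract above can be sketched in a few lines. This is a generic illustration with random, untrained weights and hypothetical feature sizes, not the classifier evaluated in the paper: early fusion concatenates modality features before a single classifier, while late fusion combines the outputs of per-modality classifiers.

```python
import numpy as np

def softmax(logits):
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

def early_fusion(text_feat, image_feat, weights):
    """Concatenate modality features, then apply one linear classifier."""
    joint = np.concatenate([text_feat, image_feat], axis=-1)
    return softmax(joint @ weights)

def late_fusion(text_probs, image_probs, text_weight=0.5):
    """Average the class probabilities of separate per-modality classifiers."""
    return text_weight * text_probs + (1.0 - text_weight) * image_probs

rng = np.random.default_rng(0)
n_classes = 2
text_feat = rng.normal(size=(4, 5))    # 4 reports, 5 text features (made-up sizes)
image_feat = rng.normal(size=(4, 3))   # 4 reports, 3 image features (made-up sizes)

early = early_fusion(text_feat, image_feat, rng.normal(size=(8, n_classes)))
late = late_fusion(
    softmax(text_feat @ rng.normal(size=(5, n_classes))),
    softmax(image_feat @ rng.normal(size=(3, n_classes))),
)
```

A hybrid scheme, as mentioned in the abstract, would combine both kinds of signal, for example by feeding the late-fusion probabilities alongside the concatenated features into a further classifier.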
Learning to Select Knowledge for Response Generation in Dialog Systems
Title | Learning to Select Knowledge for Response Generation in Dialog Systems |
Authors | Rongzhong Lian, Min Xie, Fan Wang, Jinhua Peng, Hua Wu |
Abstract | End-to-end neural models for intelligent dialogue systems suffer from the problem of generating uninformative responses. Various methods have been proposed to generate more informative responses by leveraging external knowledge. However, little previous work has focused on selecting appropriate knowledge in the learning process. Inappropriate selection of knowledge could prevent the model from learning to make full use of the knowledge. Motivated by this, we propose an end-to-end neural model which employs a novel knowledge selection mechanism where both prior and posterior distributions over knowledge are used to facilitate knowledge selection. Specifically, a posterior distribution over knowledge is inferred from both utterances and responses, and it ensures the appropriate selection of knowledge during the training process. Meanwhile, a prior distribution, which is inferred from utterances only, is used to approximate the posterior distribution so that appropriate knowledge can be selected even without responses during the inference process. Compared with previous work, our model can better incorporate appropriate knowledge in response generation. Experiments with both automatic and human evaluation verify the superiority of our model over previous baselines. |
Tasks | |
Published | 2019-02-13 |
URL | https://arxiv.org/abs/1902.04911v2 |
https://arxiv.org/pdf/1902.04911v2.pdf | |
PWC | https://paperswithcode.com/paper/learning-to-select-knowledge-for-response |
Repo | |
Framework | |
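The prior/posterior mechanism described in the abstract above can be sketched with toy vectors. Everything concrete here is an assumption made for illustration: dot-product scoring, summing the utterance and response vectors for the posterior, and a KL divergence as the term that pulls the prior toward the posterior. The abstract only states that the prior is used to approximate the posterior.

```python
import numpy as np

def softmax(scores):
    shifted = scores - scores.max()
    exp = np.exp(shifted)
    return exp / exp.sum()

def knowledge_distributions(knowledge, utterance, response):
    """Return (prior, posterior) over knowledge candidates.

    knowledge: (num_candidates, dim) candidate embeddings.
    The prior sees only the utterance; the posterior also sees the response.
    """
    prior = softmax(knowledge @ utterance)
    posterior = softmax(knowledge @ (utterance + response))
    return prior, posterior

def kl_divergence(p, q):
    """KL(p || q) for strictly positive discrete distributions."""
    return float(np.sum(p * (np.log(p) - np.log(q))))

rng = np.random.default_rng(0)
knowledge = rng.normal(size=(3, 4))
utterance = rng.normal(size=4)
response = rng.normal(size=4)

prior, posterior = knowledge_distributions(knowledge, utterance, response)
loss = kl_divergence(posterior, prior)  # training signal that moves the prior
```

At inference time no response is available, so only `prior` would be used to pick knowledge, which is why training it to match `posterior` matters.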
An Efficient and Effective Second-Order Training Algorithm For LSTM-based Adaptive Learning
Title | An Efficient and Effective Second-Order Training Algorithm For LSTM-based Adaptive Learning |
Authors | N. Mert Vural, Salih Ergut, Suleyman S. Kozat |
Abstract | We study the problem of adaptive (or online) nonlinear regression with long short-term memory (LSTM) based networks, i.e., LSTM-based adaptive learning. For LSTM-based adaptive learning, we introduce a highly efficient and effective extended Kalman filter (EKF) based training algorithm. Our algorithm is truly online, i.e., it does not make any assumption about the underlying data-generating process or about future information such as the data length or data change statistics. Through an extensive set of simulations, we demonstrate significant performance improvements achieved by our algorithm with respect to the widely used LSTM training methods in the adaptive learning and machine learning literature. In particular, we show that our algorithm provides error performance very similar to that of the EKF learning algorithm with 9 to 38 times shorter training time, depending on the parameter size of the network. |
Tasks | |
Published | 2019-10-22 |
URL | https://arxiv.org/abs/1910.09857v3 |
https://arxiv.org/pdf/1910.09857v3.pdf | |
PWC | https://paperswithcode.com/paper/an-efficient-ekf-based-algorithm-for-lstm |
Repo | |
Framework | |
MRS-VPR: a multi-resolution sampling based global visual place recognition method
Title | MRS-VPR: a multi-resolution sampling based global visual place recognition method |
Authors | Peng Yin, Rangaprasad Arun Srivatsan, Yin Chen, Xueqian Li, Hongda Zhang, Lingyun Xu, Lu Li, Zhenzhong Jia, Jianmin Ji, Yuqing He |
Abstract | Place recognition and loop closure detection are challenging for long-term visual navigation tasks. SeqSLAM is considered to be one of the most successful approaches to achieving long-term localization under varying environmental conditions and changing viewpoints. It depends on a brute-force, time-consuming sequential matching method. We propose MRS-VPR, a multi-resolution, sampling-based place recognition method, which can significantly improve the matching efficiency and accuracy in sequential matching. The novelty of this method lies in the coarse-to-fine searching pipeline and a particle filter-based global sampling scheme that can balance matching efficiency and accuracy in the long-term navigation task. Moreover, our model works much better than SeqSLAM when the testing sequence has a much smaller scale than the reference sequence. Our experiments demonstrate that the proposed method is efficient in locating short temporary trajectories within long-term reference ones without losing accuracy compared to SeqSLAM. |
Tasks | Loop Closure Detection, Visual Navigation, Visual Place Recognition |
Published | 2019-02-26 |
URL | http://arxiv.org/abs/1902.10059v1 |
http://arxiv.org/pdf/1902.10059v1.pdf | |
PWC | https://paperswithcode.com/paper/mrs-vpr-a-multi-resolution-sampling-based |
Repo | |
Framework | |
Multivariate-Information Adversarial Ensemble for Scalable Joint Distribution Matching
Title | Multivariate-Information Adversarial Ensemble for Scalable Joint Distribution Matching |
Authors | Ziliang Chen, Zhanfu Yang, Xiaoxi Wang, Xiaodan Liang, Xiaopeng Yan, Guanbin Li, Liang Lin |
Abstract | A broad range of cross-$m$-domain generation research boils down to matching a joint distribution with deep generative models (DGMs). Existing algorithms excel on pairs of domains but, as $m$ increases, still struggle to scale to fitting a joint distribution. In this paper, we propose a domain-scalable DGM, MMI-ALI, for $m$-domain joint distribution matching. As an $m$-domain ensemble model of ALIs (Dumoulin et al., 2016), MMI-ALI is adversarially trained by maximizing Multivariate Mutual Information (MMI) w.r.t. joint variables of each pair of domains and their shared feature. The negative MMIs are upper bounded by a series of feasible losses that provably lead to matching $m$-domain joint distributions. MMI-ALI scales linearly as $m$ increases and thus strikes the right balance between efficacy and scalability. We evaluate MMI-ALI in diverse challenging $m$-domain scenarios and verify its superiority. |
Tasks | |
Published | 2019-07-08 |
URL | https://arxiv.org/abs/1907.03426v1 |
https://arxiv.org/pdf/1907.03426v1.pdf | |
PWC | https://paperswithcode.com/paper/multivariate-information-adversarial-ensemble |
Repo | |
Framework | |
Improved Analysis of Spectral Algorithm for Clustering
Title | Improved Analysis of Spectral Algorithm for Clustering |
Authors | Tomohiko Mizutani |
Abstract | Spectral algorithms are graph partitioning algorithms that partition the node set of a graph into groups by using a spectral embedding map. Clustering techniques based on these algorithms are referred to as spectral clustering and are widely used in data analysis. To gain a better understanding of why spectral clustering is successful, Peng et al. (2015) and Kolev and Mehlhorn (2016) studied the behavior of a certain type of spectral algorithm for a class of graphs called well-clustered graphs. Specifically, they placed an assumption on graphs and showed a performance guarantee for the spectral algorithm under it. The algorithm they studied used the spectral embedding map developed by Shi and Malik (2000). In this paper, we improve on their results, giving a better performance guarantee under a weaker assumption. We also evaluate the performance of the spectral algorithm with the spectral embedding map developed by Ng et al. (2002). |
Tasks | Graph Partitioning |
Published | 2019-12-06 |
URL | https://arxiv.org/abs/1912.02997v1 |
https://arxiv.org/pdf/1912.02997v1.pdf | |
PWC | https://paperswithcode.com/paper/improved-analysis-of-spectral-algorithm-for |
Repo | |
Framework | |
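The abstract above refers to two spectral embedding maps. The sketch below computes commonly used textbook forms of both from the bottom $k$ eigenvectors of the normalized Laplacian: a degree-rescaled embedding in the style of Shi and Malik, and a row-normalized embedding in the style of Ng et al. The exact definitions analyzed in the paper may differ in detail, and the toy graph and function name are made up for illustration.

```python
import numpy as np

def spectral_embeddings(W, k):
    """Return (shi_malik_style, ng_style) embeddings of an undirected weighted graph.

    W: symmetric (n, n) adjacency matrix with positive degrees.
    k: number of clusters, i.e. embedding dimension.
    """
    degrees = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(degrees)
    # Normalized Laplacian: I - D^{-1/2} W D^{-1/2}.
    laplacian = np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, eigvecs = np.linalg.eigh(laplacian)   # eigenvalues in ascending order
    U = eigvecs[:, :k]                       # bottom-k eigenvectors
    shi_malik_style = d_inv_sqrt[:, None] * U            # rescale rows by D^{-1/2}
    ng_style = U / np.linalg.norm(U, axis=1, keepdims=True)  # unit-length rows
    return shi_malik_style, ng_style

# Toy graph: two triangles joined by one weak edge between nodes 2 and 3.
W = np.zeros((6, 6))
for a, b in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5)]:
    W[a, b] = W[b, a] = 1.0
W[2, 3] = W[3, 2] = 0.01

shi_malik_style, ng_style = spectral_embeddings(W, k=2)
```

In either embedding, nodes from the same triangle land close together, so a simple procedure such as k-means on the rows would recover the two groups.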
End-to-End Open-Domain Question Answering with BERTserini
Title | End-to-End Open-Domain Question Answering with BERTserini |
Authors | Wei Yang, Yuqing Xie, Aileen Lin, Xingyu Li, Luchen Tan, Kun Xiong, Ming Li, Jimmy Lin |
Abstract | We demonstrate an end-to-end question answering system that integrates BERT with the open-source Anserini information retrieval toolkit. In contrast to most question answering and reading comprehension models today, which operate over small amounts of input text, our system integrates best practices from IR with a BERT-based reader to identify answers from a large corpus of Wikipedia articles in an end-to-end fashion. We report large improvements over previous results on a standard benchmark test collection, showing that fine-tuning pretrained BERT with SQuAD is sufficient to achieve high accuracy in identifying answer spans. |
Tasks | Information Retrieval, Open-Domain Question Answering, Question Answering, Reading Comprehension |
Published | 2019-02-05 |
URL | https://arxiv.org/abs/1902.01718v2 |
https://arxiv.org/pdf/1902.01718v2.pdf | |
PWC | https://paperswithcode.com/paper/end-to-end-open-domain-question-answering |
Repo | |
Framework | |
Dual-attention Focused Module for Weakly Supervised Object Localization
Title | Dual-attention Focused Module for Weakly Supervised Object Localization |
Authors | Yukun Zhou, Zailiang Chen, Hailan Shen, Qing Liu, Rongchang Zhao, Yixiong Liang |
Abstract | Research on recognizing the most discriminative regions provides referential information for weakly supervised object localization with only image-level annotations. However, the most discriminative regions usually conceal the other parts of the object, thereby impeding recognition and localization of the entire object. To tackle this problem, the Dual-attention Focused Module (DFM) is proposed to enhance object localization performance. Specifically, we present a dual attention module for information fusion, consisting of a position branch and a channel branch. In each branch, the input feature map is transformed into an enhancement map and a mask map, thereby highlighting the most discriminative parts or hiding them. For the position mask map, we introduce a focused matrix to enhance it, which utilizes the principle that the pixels of an object are continuous. Between these two branches, the enhancement map is integrated with the mask map, aiming to partially compensate for the lost information and diversify the features. With the dual-attention module and focused matrix, the entire object region can be precisely recognized with implicit information. We demonstrate superior results for DFM in experiments. In particular, DFM achieves state-of-the-art localization accuracy on ILSVRC 2016 and CUB-200-2011. |
Tasks | Object Localization, Object Recognition, Weakly-Supervised Object Localization |
Published | 2019-09-11 |
URL | https://arxiv.org/abs/1909.04813v1 |
https://arxiv.org/pdf/1909.04813v1.pdf | |
PWC | https://paperswithcode.com/paper/dual-attention-focused-module-for-weakly |
Repo | |
Framework | |
The Wiki Music dataset: A tool for computational analysis of popular music
Title | The Wiki Music dataset: A tool for computational analysis of popular music |
Authors | Fabio Celli |
Abstract | Is it possible to use algorithms to find trends in the history of popular music? And is it possible to predict the characteristics of future music genres? In order to answer these questions, we produced a hand-crafted dataset with the intent to put together features about style, psychology, sociology and typology, annotated by music genre and indexed by time and decade. We collected a list of popular genres by decade from Wikipedia and scored music genres based on Wikipedia descriptions. Using statistical and machine learning techniques, we find trends in musical preferences and use time series forecasting to evaluate the prediction of future music genres. |
Tasks | Time Series, Time Series Forecasting |
Published | 2019-08-27 |
URL | https://arxiv.org/abs/1908.10275v1 |
https://arxiv.org/pdf/1908.10275v1.pdf | |
PWC | https://paperswithcode.com/paper/the-wiki-music-dataset-a-tool-for |
Repo | |
Framework | |