January 29, 2020


Paper Group ANR 600

Using Deep Object Features for Image Descriptions. CHD:Consecutive Horizontal Dropout for Human Gait Feature Extraction. Exploration by Optimisation in Partial Monitoring. Learning 3D Human Body Embedding. Measuring Domain Portability and Error Propagation in Biomedical QA. Novel Applications of Factored Neural Machine Translation. Multimodal Classi …

Using Deep Object Features for Image Descriptions


Title	Using Deep Object Features for Image Descriptions
Authors	Ashutosh Mishra, Marcus Liwicki
Abstract	Inspired by recent advances in leveraging multiple modalities in machine translation, we introduce an encoder-decoder pipeline that uses (1) specific objects within an image and their object labels, and (2) a language model for decoding the joint embedding of object features and object labels. Our pipeline merges previously detected objects from the image with their object labels and then learns the sequences of captions describing the particular image. The decoder model learns to extract descriptions for the image from scratch by decoding the joint representation of the object visual features and their object classes, conditioned by the encoder component. The idea of the model is to concentrate only on the specific objects of the image and their labels when generating descriptions of the image, rather than on the visual features of the entire image. The model needs further calibration, by adjusting the parameters and settings, to achieve better accuracy and performance.
Tasks	Language Modelling, Machine Translation
Published	2019-02-25
URL	http://arxiv.org/abs/1902.09969v1
PDF	http://arxiv.org/pdf/1902.09969v1.pdf
PWC	https://paperswithcode.com/paper/using-deep-object-features-for-image
Repo
Framework

CHD:Consecutive Horizontal Dropout for Human Gait Feature Extraction


Title	CHD:Consecutive Horizontal Dropout for Human Gait Feature Extraction
Authors	Chengtao Cai, Yueyuan Zhou, Yanming Wang
Abstract	Although research on gait recognition and person re-identification has made considerable progress, identification accuracy is still not high enough in some specific situations, for example when people carry bags or change coats. To alleviate these situations, we propose a simple but effective Consecutive Horizontal Dropout (CHD) method, applied to human feature extraction in deep learning networks to avoid overfitting. With CHD, we strengthen the robustness of deep learning networks for cross-view gait recognition and person re-identification. The experiments show that rank-1 accuracy on the cross-view gait recognition task increased by about 10%, from 68.0% to 78.201%, and by about 8%, from 83.545% to 91.364%, on the person re-identification task in the wearing coat or jacket condition. In addition, 100% accuracy in the NM condition was obtained for the first time with CHD. On the CASIA-B benchmark, these accuracies are state of the art.
Tasks	Gait Recognition, Person Re-Identification
Published	2019-10-11
URL	https://arxiv.org/abs/1910.05039v2
PDF	https://arxiv.org/pdf/1910.05039v2.pdf
PWC	https://paperswithcode.com/paper/chdconsecutive-horizontal-dropout-for-human
Repo
Framework
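
The abstract above does not spell out how CHD operates, so the following is only a minimal sketch under one plausible reading of the name: during training, a band of consecutive horizontal rows of a feature map is zeroed out. The function name, the `band_height` parameter, the uniform choice of band position, and zeroing as the drop operation are all assumptions made for illustration, not details taken from the paper.

```python
import random


def consecutive_horizontal_dropout(feature_map, band_height, rng=None):
    """Return a copy of a 2D feature map with one band of consecutive rows zeroed.

    Hypothetical illustration: the band position is drawn uniformly at random.
    """
    if rng is None:
        rng = random.Random()
    n_rows = len(feature_map)
    if not 0 <= band_height <= n_rows:
        raise ValueError("band_height must be between 0 and the number of rows")
    # randint is inclusive on both ends, so the band always fits in the map.
    start = rng.randint(0, n_rows - band_height)
    dropped = []
    for i, row in enumerate(feature_map):
        if start <= i < start + band_height:
            dropped.append([0.0] * len(row))  # row inside the dropped band
        else:
            dropped.append(list(row))  # row kept unchanged
    return dropped


# Toy 8x4 feature map of ones; drop a band of 3 consecutive rows.
fmap = [[1.0] * 4 for _ in range(8)]
dropped = consecutive_horizontal_dropout(fmap, band_height=3, rng=random.Random(0))
```

In a real network the same idea would be applied to batched tensors and disabled at inference time, as with ordinary dropout.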

Exploration by Optimisation in Partial Monitoring


Title	Exploration by Optimisation in Partial Monitoring
Authors	Tor Lattimore, Csaba Szepesvari
Abstract	We provide a simple and efficient algorithm for adversarial $k$-action $d$-outcome non-degenerate locally observable partial monitoring games for which the $n$-round minimax regret is bounded by $6(d+1) k^{3/2} \sqrt{n \log(k)}$, matching the best known information-theoretic upper bound. The same algorithm also achieves near-optimal regret for full information, bandit and globally observable games.
Tasks
Published	2019-07-12
URL	https://arxiv.org/abs/1907.05772v3
PDF	https://arxiv.org/pdf/1907.05772v3.pdf
PWC	https://paperswithcode.com/paper/exploration-by-optimisation-in-partial
Repo
Framework
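
To get a feel for how the stated bound $6(d+1) k^{3/2} \sqrt{n \log(k)}$ behaves, it can simply be evaluated numerically. The sketch below does only that; the function name is hypothetical, and treating $\log$ as the natural logarithm is an assumption, since the abstract does not state the base.

```python
import math


def regret_upper_bound(n, k, d):
    """Evaluate 6 (d + 1) k^{3/2} sqrt(n log k) for n rounds, k actions, d outcomes.

    Assumes the natural logarithm; the abstract does not specify the base.
    """
    if n < 0 or k < 1 or d < 0:
        raise ValueError("need n >= 0, k >= 1 and d >= 0")
    return 6 * (d + 1) * k ** 1.5 * math.sqrt(n * math.log(k))


# The bound grows like sqrt(n): quadrupling the horizon doubles it,
# so the average regret per round shrinks as n grows.
short = regret_upper_bound(n=10_000, k=3, d=2)
long = regret_upper_bound(n=40_000, k=3, d=2)
```

Because the bound is sublinear in $n$, dividing it by $n$ gives a per-round quantity that tends to zero as the number of rounds grows.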

Learning 3D Human Body Embedding


Title	Learning 3D Human Body Embedding
Authors	Boyi Jiang, Juyong Zhang, Jianfei Cai, Jianmin Zheng
Abstract	Although human body shapes vary for different identities with different poses, they can be embedded into a low-dimensional space due to their similarity in structure. Inspired by the recent work on latent representation learning with a deformation-based mesh representation, we propose an autoencoder like network architecture to learn disentangled shape and pose embedding specifically for 3D human body. We also integrate a coarse-to-fine reconstruction pipeline into the disentangling process to improve the reconstruction accuracy. Moreover, we construct a large dataset of human body models with consistent topology for the learning of neural network. Our learned embedding can achieve not only superior reconstruction accuracy but also provide great flexibilities in 3D human body creations via interpolation, bilateral interpolation and latent space sampling, which is confirmed by extensive experiments. The constructed dataset and trained model will be made publicly available.
Tasks	Representation Learning
Published	2019-05-14
URL	https://arxiv.org/abs/1905.05622v1
PDF	https://arxiv.org/pdf/1905.05622v1.pdf
PWC	https://paperswithcode.com/paper/190505622
Repo
Framework

Measuring Domain Portability and Error Propagation in Biomedical QA


Title	Measuring Domain Portability and Error Propagation in Biomedical QA
Authors	Stefan Hosein, Daniel Andor, Ryan McDonald
Abstract	In this work we present Google’s submission to the BioASQ 7 biomedical question answering (QA) task (specifically Task 7b, Phase B). The core of our systems is based on BERT QA models, specifically the model of Alberti et al. (2019). In this report, and via our submissions, we aimed to investigate two research questions. We start by studying how domain portable QA systems are when they have been pre-trained and fine-tuned on general texts, e.g., Wikipedia. We measure this via two submissions. The first is a non-adapted model that uses a public pre-trained BERT model and is fine-tuned on the Natural Questions data set (Kwiatkowski et al., 2019). The second system takes this non-adapted model and fine-tunes it with the BioASQ training data. Next, we study the impact of error propagation in end-to-end retrieval and QA systems. Again we test this via two submissions. The first uses human-annotated relevant documents and snippets as input to the model, and the second uses predicted documents and snippets. Our main findings are that domain-specific fine-tuning can benefit biomedical QA. However, the biggest quality bottleneck is at the retrieval stage, where we see large drops in metrics (over 10 points absolute) when using non-gold inputs to the QA model.
Tasks	Question Answering
Published	2019-09-12
URL	https://arxiv.org/abs/1909.09704v2
PDF	https://arxiv.org/pdf/1909.09704v2.pdf
PWC	https://paperswithcode.com/paper/measuring-domain-portability-and
Repo
Framework

Novel Applications of Factored Neural Machine Translation


Title	Novel Applications of Factored Neural Machine Translation
Authors	Patrick Wilken, Evgeny Matusov
Abstract	In this work, we explore the usefulness of target factors in neural machine translation (NMT) beyond their original purpose of predicting word lemmas and their inflections, as proposed by García-Martínez et al., 2016. For this, we introduce three novel applications of the factored output architecture: In the first one, we use a factor to explicitly predict the word case separately from the target word itself. This allows for information to be shared between different casing variants of a word. In a second task, we use a factor to predict when two consecutive subwords have to be joined, eliminating the need for target subword joining markers. The third task is the prediction of special tokens of the operation sequence NMT model (OSNMT) of Stahlberg et al., 2018. Automatic evaluation on English-to-German and English-to-Turkish tasks showed that integration of such auxiliary prediction tasks into NMT is at least as good as the standard NMT approach. For the OSNMT, we observed a significant improvement in BLEU over the baseline OSNMT implementation due to a reduced output sequence length that resulted from the introduction of the target factors.
Tasks	Machine Translation
Published	2019-10-09
URL	https://arxiv.org/abs/1910.03912v1
PDF	https://arxiv.org/pdf/1910.03912v1.pdf
PWC	https://paperswithcode.com/paper/novel-applications-of-factored-neural-machine
Repo
Framework
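
The first application in the abstract above, predicting word case as a separate factor, can be illustrated on the data-preparation side. The sketch below is not the authors' preprocessing: the factor labels (`lower`, `title`, `upper`, `mixed`) and both function names are hypothetical. It only shows how splitting case from the word lets casing variants such as "The" and "the" share one vocabulary entry while remaining recoverable.

```python
def to_factors(token):
    """Split a surface token into (word, case factor)."""
    if token.isupper() and len(token) > 1:
        return token.lower(), "upper"
    if token[:1].isupper() and token[1:] == token[1:].lower():
        return token.lower(), "title"
    if token == token.lower():
        return token, "lower"
    # Irregular casing (e.g. "iPhone") is kept as-is in this sketch.
    return token, "mixed"


def from_factors(word, case):
    """Rebuild the surface token from a word and its case factor."""
    if case == "upper":
        return word.upper()
    if case == "title":
        return word[:1].upper() + word[1:]
    return word


sentence = "The NATO summit in Berlin was the first".split()
factored = [to_factors(tok) for tok in sentence]
restored = [from_factors(word, case) for word, case in factored]
```

In a factored model the word and the case label would be predicted by separate output layers; here they are simply paired per token.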

Multimodal Classification of Urban Micro-Events


Title	Multimodal Classification of Urban Micro-Events
Authors	Maarten Sukel, Stevan Rudinac, Marcel Worring
Abstract	In this paper we seek methods to effectively detect urban micro-events. Urban micro-events are events which occur in cities, have limited geographical coverage and typically affect only a small group of citizens. Because of their scale these are difficult to identify in most data sources. However, by using citizen sensing to gather data, detecting them becomes feasible. The data gathered by citizen sensing is often multimodal and, as a consequence, the information required to detect urban micro-events is distributed over multiple modalities. This makes it essential to have a classifier capable of combining them. In this paper we explore several methods of creating such a classifier, including early, late, hybrid fusion and representation learning using multimodal graphs. We evaluate performance on a real world dataset obtained from a live citizen reporting system. We show that a multimodal approach yields higher performance than unimodal alternatives. Furthermore, we demonstrate that our hybrid combination of early and late fusion with multimodal embeddings performs best in classification of urban micro-events.
Tasks	Representation Learning
Published	2019-04-30
URL	http://arxiv.org/abs/1904.13349v1
PDF	http://arxiv.org/pdf/1904.13349v1.pdf
PWC	https://paperswithcode.com/paper/multimodal-classification-of-urban-micro
Repo
Framework
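
The abstract above contrasts early and late fusion. As a rough illustration of the two standard ideas (not of the paper's hybrid model or its multimodal graph embeddings), early fusion combines modality features before classification, while late fusion combines the outputs of per-modality classifiers. All names and numbers below are hypothetical.

```python
def early_fusion(text_features, image_features):
    """Early fusion: concatenate modality features into one classifier input."""
    return list(text_features) + list(image_features)


def late_fusion(per_modality_probs, weights=None):
    """Late fusion: weighted average of per-modality class probabilities."""
    n = len(per_modality_probs)
    if weights is None:
        weights = [1.0 / n] * n  # equal weight per modality by default
    n_classes = len(per_modality_probs[0])
    return [
        sum(w * probs[c] for w, probs in zip(weights, per_modality_probs))
        for c in range(n_classes)
    ]


# Toy example with two modalities and three classes.
joint_input = early_fusion([0.2, 0.9], [0.5, 0.1, 0.3])
text_probs = [0.7, 0.2, 0.1]   # output of a hypothetical text classifier
image_probs = [0.4, 0.5, 0.1]  # output of a hypothetical image classifier
fused = late_fusion([text_probs, image_probs])
predicted_class = max(range(len(fused)), key=fused.__getitem__)
```

A hybrid scheme would feed both kinds of signal into a final classifier; how the paper does this is not specified in the abstract.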

Learning to Select Knowledge for Response Generation in Dialog Systems


Title	Learning to Select Knowledge for Response Generation in Dialog Systems
Authors	Rongzhong Lian, Min Xie, Fan Wang, Jinhua Peng, Hua Wu
Abstract	End-to-end neural models for intelligent dialogue systems suffer from the problem of generating uninformative responses. Various methods have been proposed to generate more informative responses by leveraging external knowledge. However, little previous work has focused on selecting appropriate knowledge in the learning process. Inappropriate selection of knowledge could prevent the model from learning to make full use of the knowledge. Motivated by this, we propose an end-to-end neural model which employs a novel knowledge selection mechanism where both prior and posterior distributions over knowledge are used to facilitate knowledge selection. Specifically, a posterior distribution over knowledge is inferred from both utterances and responses, and it ensures the appropriate selection of knowledge during the training process. Meanwhile, a prior distribution, which is inferred from utterances only, is used to approximate the posterior distribution so that appropriate knowledge can be selected even without responses during the inference process. Compared with previous work, our model can better incorporate appropriate knowledge in response generation. Experiments with both automatic and human evaluation verify the superiority of our model over previous baselines.
Tasks
Published	2019-02-13
URL	https://arxiv.org/abs/1902.04911v2
PDF	https://arxiv.org/pdf/1902.04911v2.pdf
PWC	https://paperswithcode.com/paper/learning-to-select-knowledge-for-response
Repo
Framework
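
The prior/posterior mechanism described in the abstract above can be sketched with toy vectors. This is a minimal illustration under stated assumptions, not the authors' model: dot-product scoring, summing the utterance and response vectors to form the posterior query, and using a KL divergence as the term that pulls the prior towards the posterior are all assumptions, and every name below is hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def knowledge_distribution(query, knowledge_vectors):
    """Distribution over knowledge items given a query vector."""
    return softmax([dot(query, k) for k in knowledge_vectors])


def kl_divergence(p, q):
    """KL(p || q) for discrete distributions with q > 0 everywhere."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


knowledge = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy knowledge encodings
utterance = [0.5, 0.5]
response = [2.0, 0.0]

# Prior sees the utterance only; posterior also sees the response.
prior = knowledge_distribution(utterance, knowledge)
posterior_query = [u + r for u, r in zip(utterance, response)]
posterior = knowledge_distribution(posterior_query, knowledge)

# Training would minimise this term so the prior can stand in at inference.
kl_loss = kl_divergence(posterior, prior)
```

At inference time no response is available, so only `prior` would be used to pick knowledge.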

An Efficient and Effective Second-Order Training Algorithm For LSTM-based Adaptive Learning


Title	An Efficient and Effective Second-Order Training Algorithm For LSTM-based Adaptive Learning
Authors	N. Mert Vural, Salih Ergut, Suleyman S. Kozat
Abstract	We study the problem of adaptive (or online) nonlinear regression with long short term memory (LSTM) based networks, i.e., LSTM-based adaptive learning. For LSTM-based adaptive learning, we introduce a highly efficient and effective extended Kalman filter (EKF) based training algorithm. Our algorithm is truly online, i.e., it does not make any assumption on the underlying data generating process or on future information such as the data length or data change statistics. Through an extensive set of simulations, we demonstrate significant performance improvements achieved by our algorithm with respect to the widely used LSTM training methods in the adaptive learning and machine learning literature. In particular, we show that our algorithm provides error performance very similar to that of the EKF learning algorithm in 9 to 38 times shorter training time, depending on the parameter size of the network.
Tasks
Published	2019-10-22
URL	https://arxiv.org/abs/1910.09857v3
PDF	https://arxiv.org/pdf/1910.09857v3.pdf
PWC	https://paperswithcode.com/paper/an-efficient-ekf-based-algorithm-for-lstm
Repo
Framework

MRS-VPR: a multi-resolution sampling based global visual place recognition method


Title	MRS-VPR: a multi-resolution sampling based global visual place recognition method
Authors	Peng Yin, Rangaprasad Arun Srivatsan, Yin Chen, Xueqian Li, Hongda Zhang, Lingyun Xu, Lu Li, Zhenzhong Jia, Jianmin Ji, Yuqing He
Abstract	Place recognition and loop closure detection are challenging for long-term visual navigation tasks. SeqSLAM is considered to be one of the most successful approaches to achieving long-term localization under varying environmental conditions and changing viewpoints. It depends on a brute-force, time-consuming sequential matching method. We propose MRS-VPR, a multi-resolution, sampling-based place recognition method, which can significantly improve the matching efficiency and accuracy in sequential matching. The novelty of this method lies in the coarse-to-fine searching pipeline and a particle filter-based global sampling scheme, that can balance the matching efficiency and accuracy in the long-term navigation task. Moreover, our model works much better than SeqSLAM when the testing sequence has a much smaller scale than the reference sequence. Our experiments demonstrate that the proposed method is efficient in locating short temporary trajectories within long-term reference ones without losing accuracy compared to SeqSLAM.
Tasks	Loop Closure Detection, Visual Navigation, Visual Place Recognition
Published	2019-02-26
URL	http://arxiv.org/abs/1902.10059v1
PDF	http://arxiv.org/pdf/1902.10059v1.pdf
PWC	https://paperswithcode.com/paper/mrs-vpr-a-multi-resolution-sampling-based
Repo
Framework

Multivariate-Information Adversarial Ensemble for Scalable Joint Distribution Matching


Title	Multivariate-Information Adversarial Ensemble for Scalable Joint Distribution Matching
Authors	Ziliang Chen, Zhanfu Yang, Xiaoxi Wang, Xiaodan Liang, Xiaopeng Yan, Guanbin Li, Liang Lin
Abstract	A broad range of research on cross-$m$-domain generation boils down to matching a joint distribution with deep generative models (DGMs). Existing algorithms excel on pairwise domains but, as $m$ increases, still struggle to scale to fit a joint distribution. In this paper, we propose a domain-scalable DGM, MMI-ALI, for $m$-domain joint distribution matching. As an $m$-domain ensemble model of ALIs (Dumoulin et al., 2016), MMI-ALI is adversarially trained by maximizing Multivariate Mutual Information (MMI) w.r.t. joint variables of each pair of domains and their shared feature. The negative MMIs are upper bounded by a series of feasible losses that provably lead to matching $m$-domain joint distributions. MMI-ALI scales linearly as $m$ increases and thus strikes the right balance between efficacy and scalability. We evaluate MMI-ALI in diverse challenging $m$-domain scenarios and verify its superiority.
Tasks
Published	2019-07-08
URL	https://arxiv.org/abs/1907.03426v1
PDF	https://arxiv.org/pdf/1907.03426v1.pdf
PWC	https://paperswithcode.com/paper/multivariate-information-adversarial-ensemble
Repo
Framework

Improved Analysis of Spectral Algorithm for Clustering


Title	Improved Analysis of Spectral Algorithm for Clustering
Authors	Tomohiko Mizutani
Abstract	Spectral algorithms are graph partitioning algorithms that partition a node set of a graph into groups by using a spectral embedding map. Clustering techniques based on the algorithms are referred to as spectral clustering and are widely used in data analysis. To gain a better understanding of why spectral clustering is successful, Peng et al. (2015) and Kolev and Mehlhorn (2016) studied the behavior of a certain type of spectral algorithm for a class of graphs, called well-clustered graphs. Specifically, they put an assumption on graphs and showed the performance guarantee of the spectral algorithm under it. The algorithm they studied used the spectral embedding map developed by Shi and Malik (2000). In this paper, we improve on their results, giving a better performance guarantee under a weaker assumption. We also evaluate the performance of the spectral algorithm with the spectral embedding map developed by Ng et al. (2002).
Tasks	graph partitioning
Published	2019-12-06
URL	https://arxiv.org/abs/1912.02997v1
PDF	https://arxiv.org/pdf/1912.02997v1.pdf
PWC	https://paperswithcode.com/paper/improved-analysis-of-spectral-algorithm-for
Repo
Framework

End-to-End Open-Domain Question Answering with BERTserini


Title	End-to-End Open-Domain Question Answering with BERTserini
Authors	Wei Yang, Yuqing Xie, Aileen Lin, Xingyu Li, Luchen Tan, Kun Xiong, Ming Li, Jimmy Lin
Abstract	We demonstrate an end-to-end question answering system that integrates BERT with the open-source Anserini information retrieval toolkit. In contrast to most question answering and reading comprehension models today, which operate over small amounts of input text, our system integrates best practices from IR with a BERT-based reader to identify answers from a large corpus of Wikipedia articles in an end-to-end fashion. We report large improvements over previous results on a standard benchmark test collection, showing that fine-tuning pretrained BERT with SQuAD is sufficient to achieve high accuracy in identifying answer spans.
Tasks	Information Retrieval, Open-Domain Question Answering, Question Answering, Reading Comprehension
Published	2019-02-05
URL	https://arxiv.org/abs/1902.01718v2
PDF	https://arxiv.org/pdf/1902.01718v2.pdf
PWC	https://paperswithcode.com/paper/end-to-end-open-domain-question-answering
Repo
Framework

Dual-attention Focused Module for Weakly Supervised Object Localization


Title	Dual-attention Focused Module for Weakly Supervised Object Localization
Authors	Yukun Zhou, Zailiang Chen, Hailan Shen, Qing Liu, Rongchang Zhao, Yixiong Liang
Abstract	Research on recognizing the most discriminative regions provides referential information for weakly supervised object localization with only image-level annotations. However, the most discriminative regions usually conceal the other parts of the object, thereby impeding recognition and localization of the entire object. To tackle this problem, the Dual-attention Focused Module (DFM) is proposed to enhance object localization performance. Specifically, we present a dual attention module for information fusion, consisting of a position branch and a channel branch. In each branch, the input feature map is transformed into an enhancement map and a mask map, thereby highlighting the most discriminative parts or hiding them. For the position mask map, we introduce a focused matrix to enhance it, which utilizes the principle that the pixels of an object are continuous. Between these two branches, the enhancement map is integrated with the mask map, aiming to partially compensate for the lost information and diversify the features. With the dual-attention module and focused matrix, the entire object region can be precisely recognized with implicit information. We demonstrate the superior results of DFM in experiments. In particular, DFM achieves state-of-the-art localization accuracy on ILSVRC 2016 and CUB-200-2011.
Tasks	Object Localization, Object Recognition, Weakly-Supervised Object Localization
Published	2019-09-11
URL	https://arxiv.org/abs/1909.04813v1
PDF	https://arxiv.org/pdf/1909.04813v1.pdf
PWC	https://paperswithcode.com/paper/dual-attention-focused-module-for-weakly
Repo
Framework

The Wiki Music dataset: A tool for computational analysis of popular music


Title	The Wiki Music dataset: A tool for computational analysis of popular music
Authors	Fabio Celli
Abstract	Is it possible to use algorithms to find trends in the history of popular music? And is it possible to predict the characteristics of future music genres? In order to answer these questions, we produced a hand-crafted dataset with the intent to put together features about style, psychology, sociology and typology, annotated by music genre and indexed by time and decade. We collected a list of popular genres by decade from Wikipedia and scored music genres based on Wikipedia descriptions. Using statistical and machine learning techniques, we find trends in musical preferences and use time series forecasting to evaluate the prediction of future music genres.
Tasks	Time Series, Time Series Forecasting
Published	2019-08-27
URL	https://arxiv.org/abs/1908.10275v1
PDF	https://arxiv.org/pdf/1908.10275v1.pdf
PWC	https://paperswithcode.com/paper/the-wiki-music-dataset-a-tool-for
Repo
Framework