Paper Group AWR 150
Predicting the Driver’s Focus of Attention: the DR(eye)VE Project. A Crowd-Annotated Spanish Corpus for Humor Analysis. Discriminative models for multi-instance problems with tree-structure. Person Transfer GAN to Bridge Domain Gap for Person Re-Identification. Convolutional Image Captioning. FoldingNet: Point Cloud Auto-encoder via Deep Grid Defor …
Predicting the Driver’s Focus of Attention: the DR(eye)VE Project
Title | Predicting the Driver’s Focus of Attention: the DR(eye)VE Project |
Authors | Andrea Palazzi, Davide Abati, Simone Calderara, Francesco Solera, Rita Cucchiara |
Abstract | In this work we aim to predict the driver’s focus of attention. The goal is to estimate what a person would pay attention to while driving, and which part of the scene around the vehicle is more critical for the task. To this end we propose a new computer vision model based on a multi-branch deep architecture that integrates three sources of information: raw video, motion and scene semantics. We also introduce DR(eye)VE, the largest dataset of driving scenes for which eye-tracking annotations are available. This dataset features more than 500,000 registered frames, matching ego-centric views (from glasses worn by drivers) and car-centric views (from a roof-mounted camera), further enriched by other sensor measurements. Results highlight that several attention patterns are shared across drivers and can be reproduced to some extent. The indication of which elements in the scene are likely to capture the driver’s attention may benefit several applications in the context of human-vehicle interaction and driver attention analysis. |
Tasks | Eye Tracking |
Published | 2017-05-10 |
URL | http://arxiv.org/abs/1705.03854v3 |
http://arxiv.org/pdf/1705.03854v3.pdf | |
PWC | https://paperswithcode.com/paper/predicting-the-drivers-focus-of-attention-the |
Repo | https://github.com/ndrplz/dreyeve |
Framework | none |
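The model fuses three information streams (appearance, motion, semantics) into a single fixation map. As a rough illustration of that multi-branch idea, here is a minimal sketch; the branch design, layer sizes and additive fusion are simplified assumptions, not the released DR(eye)VE architecture.

```python
# Simplified multi-branch attention predictor: three convolutional branches
# (RGB frame, optical flow, semantic segmentation scores) each produce a
# coarse saliency map, and the maps are fused by summation.
import torch
import torch.nn as nn

class Branch(nn.Module):
    def __init__(self, in_channels):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 1, 1),  # one-channel attention map
        )

    def forward(self, x):
        return self.net(x)

class MultiBranchAttention(nn.Module):
    def __init__(self):
        super().__init__()
        self.rgb = Branch(3)    # raw video frame
        self.flow = Branch(2)   # optical flow (dx, dy)
        self.seg = Branch(19)   # semantic segmentation scores

    def forward(self, rgb, flow, seg):
        fused = self.rgb(rgb) + self.flow(flow) + self.seg(seg)
        return torch.sigmoid(fused)  # predicted fixation map

model = MultiBranchAttention()
out = model(torch.rand(1, 3, 112, 112), torch.rand(1, 2, 112, 112),
            torch.rand(1, 19, 112, 112))
print(out.shape)  # torch.Size([1, 1, 112, 112])
```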
A Crowd-Annotated Spanish Corpus for Humor Analysis
Title | A Crowd-Annotated Spanish Corpus for Humor Analysis |
Authors | Santiago Castro, Luis Chiruzzo, Aiala Rosá, Diego Garat, Guillermo Moncecchi |
Abstract | Computational Humor involves several tasks, such as humor recognition, humor generation, and humor scoring, for which it is useful to have human-curated data. In this work we present a corpus of 27,000 tweets written in Spanish and crowd-annotated by their humor value and funniness score, with about four annotations per tweet, tagged by 1,300 people over the Internet. It is equally divided between tweets coming from humorous and non-humorous accounts. The inter-annotator agreement Krippendorff’s alpha value is 0.5710. The dataset is available for general use and can serve as a basis for humor detection and as a first step to tackle subjectivity. |
Tasks | Humor Detection |
Published | 2017-10-02 |
URL | http://arxiv.org/abs/1710.00477v4 |
http://arxiv.org/pdf/1710.00477v4.pdf | |
PWC | https://paperswithcode.com/paper/a-crowd-annotated-spanish-corpus-for-humor |
Repo | https://github.com/pln-fing-udelar/humor |
Framework | none |
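The reported Krippendorff's alpha of 0.5710 is a chance-corrected agreement measure over the roughly four annotations each tweet received. A minimal sketch of how such a value is computed, using the third-party `krippendorff` package (an assumption on my part, not necessarily part of the released corpus tooling) on toy annotations:

```python
# Inter-annotator agreement over crowd labels; the toy matrix below is
# illustrative, not the actual corpus data. Requires: pip install krippendorff
import numpy as np
import krippendorff

# rows = annotators, columns = tweets; 1 = humorous, 0 = not, nan = not rated
ratings = np.array([
    [1,      0, 1, np.nan, 0],
    [1,      0, 0, 1,      0],
    [np.nan, 0, 1, 1,      0],
], dtype=float)

alpha = krippendorff.alpha(reliability_data=ratings,
                           level_of_measurement="nominal")
print(f"Krippendorff's alpha: {alpha:.4f}")
```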
Discriminative models for multi-instance problems with tree-structure
Title | Discriminative models for multi-instance problems with tree-structure |
Authors | Tomas Pevny, Petr Somol |
Abstract | Modeling network traffic is gaining importance as a way to counter modern threats of ever increasing sophistication. It is, however, surprisingly difficult and costly to construct reliable classifiers on top of telemetry data due to the variety and complexity of signals that no human can manage to interpret in full. Obtaining training data with a sufficiently large and variable body of labels can thus be seen as a prohibitive problem. The goal of this work is to detect infected computers by observing their HTTP(S) traffic collected from network sensors, which are typically proxy servers or network firewalls, while relying on only minimal human input in the model training phase. We propose a discriminative model that makes decisions based on all of a computer’s traffic observed during a predefined time window (5 minutes in our case). The model is trained on traffic samples collected over equally sized time windows from a large number of computers, where the only labels needed are human verdicts about each computer as a whole (presumed infected vs. presumed clean). As part of training, the model itself recognizes discriminative patterns in traffic targeted to individual servers and constructs the final high-level classifier on top of them. We show the classifier to perform with very high precision, while the learned traffic patterns can be interpreted as Indicators of Compromise. In the following, we implement the discriminative model as a neural network with a special structure reflecting two stacked multi-instance problems. The main advantages of the proposed configuration include not only improved accuracy and the ability to learn from gross labels, but also automatic learning of the server types (together with their detectors) that are typically visited by infected computers. |
Tasks | |
Published | 2017-03-07 |
URL | http://arxiv.org/abs/1703.02868v1 |
http://arxiv.org/pdf/1703.02868v1.pdf | |
PWC | https://paperswithcode.com/paper/discriminative-models-for-multi-instance |
Repo | https://github.com/pevnak/Mill.jl |
Framework | none |
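The "two stacked multi-instance problems" structure means flows are pooled per server and server representations are pooled per computer, with a single computer-level label supervising everything. The official code is the Julia package Mill.jl; the Python sketch below is an assumed, simplified rendering of that nesting, not a port of the released implementation.

```python
# Two stacked multi-instance layers with mean pooling: flows -> server bags,
# servers -> computer bag, then a computer-level infected/clean classifier.
import torch
import torch.nn as nn

class BagLayer(nn.Module):
    """Embed each instance, then mean-pool the bag."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.embed = nn.Sequential(nn.Linear(in_dim, out_dim), nn.ReLU())

    def forward(self, instances):                 # (n_instances, in_dim)
        return self.embed(instances).mean(dim=0)  # (out_dim,)

class ComputerClassifier(nn.Module):
    def __init__(self, flow_dim=10, hidden=32):
        super().__init__()
        self.server_layer = BagLayer(flow_dim, hidden)
        self.computer_layer = BagLayer(hidden, hidden)
        self.head = nn.Linear(hidden, 1)          # infected vs. clean logit

    def forward(self, servers):                   # list of (n_flows, flow_dim) tensors
        server_vecs = torch.stack([self.server_layer(s) for s in servers])
        return self.head(self.computer_layer(server_vecs))

model = ComputerClassifier()
computer = [torch.rand(5, 10), torch.rand(3, 10)]  # traffic to two servers
print(model(computer))
```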
Person Transfer GAN to Bridge Domain Gap for Person Re-Identification
Title | Person Transfer GAN to Bridge Domain Gap for Person Re-Identification |
Authors | Longhui Wei, Shiliang Zhang, Wen Gao, Qi Tian |
Abstract | Although the performance of person Re-Identification (ReID) has been significantly boosted, many challenging issues in real scenarios have not been fully investigated, e.g., the complex scenes and lighting variations, viewpoint and pose changes, and the large number of identities in a camera network. To facilitate research towards conquering those issues, this paper contributes a new dataset called MSMT17 with many important features, e.g., 1) the raw videos are taken by a 15-camera network deployed in both indoor and outdoor scenes, 2) the videos cover a long period of time and present complex lighting variations, and 3) it contains currently the largest number of annotated identities, i.e., 4,101 identities and 126,441 bounding boxes. We also observe that a domain gap commonly exists between datasets, which essentially causes a severe performance drop when training and testing on different datasets. As a result, available training data cannot be effectively leveraged for new testing domains. To relieve the expensive cost of annotating new training samples, we propose a Person Transfer Generative Adversarial Network (PTGAN) to bridge the domain gap. Comprehensive experiments show that the domain gap can be substantially narrowed down by PTGAN. |
Tasks | Person Re-Identification |
Published | 2017-11-23 |
URL | http://arxiv.org/abs/1711.08565v2 |
http://arxiv.org/pdf/1711.08565v2.pdf | |
PWC | https://paperswithcode.com/paper/person-transfer-gan-to-bridge-domain-gap-for |
Repo | https://github.com/yxgeee/MMT |
Framework | pytorch |
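PTGAN translates person images into the style of a target dataset while keeping the identity intact. The sketch below only illustrates how such an objective can be composed, a plain adversarial term plus a mask-weighted identity-preservation term; the loss weights, discriminator, and person masks are assumptions rather than the released implementation.

```python
# Hedged sketch of a PTGAN-style generator objective: adversarial/style loss
# on the whole image, plus an L1 identity loss restricted to the person's
# foreground region so the identity is preserved after transfer.
import torch
import torch.nn.functional as F

def ptgan_loss(real, transferred, disc_fake_logits, person_mask, lambda_id=10.0):
    # Adversarial term: the generator wants the discriminator to call the
    # transferred image "real" (non-saturating GAN loss).
    adv = F.binary_cross_entropy_with_logits(
        disc_fake_logits, torch.ones_like(disc_fake_logits))
    # Identity term: the masked person region should stay unchanged.
    identity = F.l1_loss(transferred * person_mask, real * person_mask)
    return adv + lambda_id * identity

real = torch.rand(1, 3, 128, 64)
fake = torch.rand(1, 3, 128, 64)
mask = (torch.rand(1, 1, 128, 64) > 0.5).float()
print(ptgan_loss(real, fake, torch.randn(1, 1), mask))
```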
Convolutional Image Captioning
Title | Convolutional Image Captioning |
Authors | Jyoti Aneja, Aditya Deshpande, Alexander Schwing |
Abstract | Image captioning is an important but challenging task, applicable to virtual assistants, editing tools, image indexing, and support of the disabled. Its challenges are due to the variability and ambiguity of possible image descriptions. In recent years significant progress has been made in image captioning, using Recurrent Neural Networks powered by long short-term memory (LSTM) units. Despite mitigating the vanishing gradient problem, and despite their compelling ability to memorize dependencies, LSTM units are complex and inherently sequential across time. To address this issue, recent work has shown the benefits of convolutional networks for machine translation and conditional image generation. Inspired by their success, in this paper we develop a convolutional image captioning technique. We demonstrate its efficacy on the challenging MSCOCO dataset and show performance on par with the baseline, while having a faster training time per number of parameters. We also perform a detailed analysis, providing compelling reasons in favor of convolutional language generation approaches. |
Tasks | Image Captioning, Text Generation |
Published | 2017-11-24 |
URL | http://arxiv.org/abs/1711.09151v1 |
http://arxiv.org/pdf/1711.09151v1.pdf | |
PWC | https://paperswithcode.com/paper/convolutional-image-captioning |
Repo | https://github.com/NaskyD/convnet |
Framework | pytorch |
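The core idea is to replace the LSTM decoder with masked (causal) convolutions over the partial caption, so every position can be computed in parallel during training. The sketch below shows that mechanism; the layer sizes and the additive image conditioning are assumptions, not the paper's exact model.

```python
# Causal convolutional caption decoder: left-padding each convolution so a
# word position only sees earlier words, conditioned on an image feature.
import torch
import torch.nn as nn

class CausalConvDecoder(nn.Module):
    def __init__(self, vocab, dim=128, kernel=5, layers=3):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        self.pad = kernel - 1
        self.convs = nn.ModuleList(
            [nn.Conv1d(dim, dim, kernel) for _ in range(layers)])
        self.out = nn.Linear(dim, vocab)

    def forward(self, tokens, image_feat):
        # tokens: (batch, seq); image_feat: (batch, dim) conditions every step
        x = self.embed(tokens).transpose(1, 2)          # (batch, dim, seq)
        x = x + image_feat.unsqueeze(-1)
        for conv in self.convs:
            # left-pad so each position only attends to previous words (causal)
            x = torch.relu(conv(nn.functional.pad(x, (self.pad, 0))))
        return self.out(x.transpose(1, 2))              # (batch, seq, vocab)

dec = CausalConvDecoder(vocab=1000)
logits = dec(torch.randint(0, 1000, (2, 12)), torch.rand(2, 128))
print(logits.shape)  # torch.Size([2, 12, 1000])
```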
FoldingNet: Point Cloud Auto-encoder via Deep Grid Deformation
Title | FoldingNet: Point Cloud Auto-encoder via Deep Grid Deformation |
Authors | Yaoqing Yang, Chen Feng, Yiru Shen, Dong Tian |
Abstract | Recent deep networks that directly handle points in a point set, e.g., PointNet, have been state-of-the-art for supervised learning tasks on point clouds such as classification and segmentation. In this work, a novel end-to-end deep auto-encoder is proposed to address unsupervised learning challenges on point clouds. On the encoder side, a graph-based enhancement is enforced to promote local structures on top of PointNet. Then, a novel folding-based decoder deforms a canonical 2D grid onto the underlying 3D object surface of a point cloud, achieving low reconstruction errors even for objects with delicate structures. The proposed decoder only uses about 7% of the parameters of a decoder with fully-connected neural networks, yet leads to a more discriminative representation that achieves higher linear SVM classification accuracy than the benchmark. In addition, the proposed decoder structure is shown, in theory, to be a generic architecture that is able to reconstruct an arbitrary point cloud from a 2D grid. Our code is available at http://www.merl.com/research/license#FoldingNet |
Tasks | |
Published | 2017-12-19 |
URL | http://arxiv.org/abs/1712.07262v2 |
http://arxiv.org/pdf/1712.07262v2.pdf | |
PWC | https://paperswithcode.com/paper/foldingnet-point-cloud-auto-encoder-via-deep |
Repo | https://github.com/AnTao97/UnsupervisedPointCloudReconstruction |
Framework | pytorch |
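The folding decoder works by concatenating the encoder's codeword to every point of a fixed 2D grid and passing the result through two successive "folds" that warp the grid onto the 3D surface. A minimal sketch of that decoder follows; grid size and MLP widths are assumed values.

```python
# Folding-based decoder: a fixed 2D grid, replicated codeword, and two MLP
# folds that deform the grid into a reconstructed point cloud.
import torch
import torch.nn as nn

class FoldingDecoder(nn.Module):
    def __init__(self, code_dim=512, grid_side=45):
        super().__init__()
        lin = torch.linspace(-0.3, 0.3, grid_side)
        u, v = torch.meshgrid(lin, lin, indexing="ij")
        self.register_buffer("grid", torch.stack([u.flatten(), v.flatten()], dim=1))
        self.fold1 = nn.Sequential(nn.Linear(code_dim + 2, 256), nn.ReLU(),
                                   nn.Linear(256, 3))
        self.fold2 = nn.Sequential(nn.Linear(code_dim + 3, 256), nn.ReLU(),
                                   nn.Linear(256, 3))

    def forward(self, codeword):                        # (batch, code_dim)
        n = self.grid.shape[0]
        code = codeword.unsqueeze(1).expand(-1, n, -1)  # repeat codeword per grid point
        grid = self.grid.unsqueeze(0).expand(code.shape[0], -1, -1)
        points = self.fold1(torch.cat([code, grid], dim=-1))   # first fold: 2D -> 3D
        return self.fold2(torch.cat([code, points], dim=-1))   # second fold refines the surface

dec = FoldingDecoder()
print(dec(torch.rand(2, 512)).shape)  # torch.Size([2, 2025, 3]) reconstructed cloud
```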
A Pose-Sensitive Embedding for Person Re-Identification with Expanded Cross Neighborhood Re-Ranking
Title | A Pose-Sensitive Embedding for Person Re-Identification with Expanded Cross Neighborhood Re-Ranking |
Authors | M. Saquib Sarfraz, Arne Schumann, Andreas Eberle, Rainer Stiefelhagen |
Abstract | Person re-identification is a challenging retrieval task that requires matching a person’s acquired image across non-overlapping camera views. In this paper we propose an effective approach that incorporates both the fine and coarse pose information of the person to learn a discriminative embedding. In contrast to the recent direction of explicitly modeling body parts or correcting for misalignment based on these, we show that a rather straightforward inclusion of the acquired camera view and/or the detected joint locations into a convolutional neural network helps to learn a very effective representation. To increase retrieval performance, re-ranking techniques based on computed distances have recently gained much attention. We propose a new unsupervised and automatic re-ranking framework that achieves state-of-the-art re-ranking performance. We show that, in contrast to the current state-of-the-art re-ranking methods, our approach does not require computing new rank lists for each image pair (e.g., based on reciprocal neighbors) and performs well by using simple direct rank-list-based comparison or even by just using the already computed Euclidean distances between the images. We show that both our learned representation and our re-ranking method achieve state-of-the-art performance on a number of challenging surveillance image and video datasets. The code is available online at: https://github.com/pse-ecn/pose-sensitive-embedding |
Tasks | Person Re-Identification |
Published | 2017-11-28 |
URL | http://arxiv.org/abs/1711.10378v2 |
http://arxiv.org/pdf/1711.10378v2.pdf | |
PWC | https://paperswithcode.com/paper/a-pose-sensitive-embedding-for-person-re |
Repo | https://github.com/pse-ecn/pose-sensitive-embedding |
Framework | tf |
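The expanded cross neighborhood (ECN) re-ranking redefines the distance between two images as an aggregate over their neighbors' distances rather than their direct distance alone. The sketch below captures that spirit in a very simplified form (single image set, plain Euclidean distances, no neighborhood expansion), so it is not the exact published formulation.

```python
# Simplified neighbor-based re-ranking: distance between i and j is the
# average distance from i's top-k neighbors to j and from j's top-k
# neighbors to i.
import numpy as np
from scipy.spatial.distance import pdist, squareform

def ecn_distances(features, k=4):
    d = squareform(pdist(features))          # (n, n) Euclidean distances
    nn = np.argsort(d, axis=1)[:, 1:k + 1]   # top-k neighbors, skipping self
    n = d.shape[0]
    ecn = np.zeros_like(d)
    for i in range(n):
        for j in range(n):
            ecn[i, j] = 0.5 * (d[nn[i], j].mean() + d[nn[j], i].mean())
    return ecn

feats = np.random.rand(20, 128)               # stand-in for learned embeddings
print(ecn_distances(feats).shape)             # (20, 20) re-ranked distance matrix
```

The appeal of this family of methods is that no new rank list has to be recomputed per image pair; everything is derived from the already computed distance matrix.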
Proximity Variational Inference
Title | Proximity Variational Inference |
Authors | Jaan Altosaar, Rajesh Ranganath, David M. Blei |
Abstract | Variational inference is a powerful approach for approximate posterior inference. However, it is sensitive to initialization and can be subject to poor local optima. In this paper, we develop proximity variational inference (PVI). PVI is a new method for optimizing the variational objective that constrains subsequent iterates of the variational parameters to robustify the optimization path. Consequently, PVI is less sensitive to initialization and optimization quirks and finds better local optima. We demonstrate our method on three proximity statistics. We study PVI on a Bernoulli factor model and sigmoid belief network with both real and synthetic data and compare to deterministic annealing (Katahira et al., 2008). We highlight the flexibility of PVI by designing a proximity statistic for Bayesian deep learning models such as the variational autoencoder (Kingma and Welling, 2014; Rezende et al., 2014). Empirically, we show that PVI consistently finds better local optima and gives better predictive performance. |
Tasks | |
Published | 2017-05-24 |
URL | http://arxiv.org/abs/1705.08931v1 |
http://arxiv.org/pdf/1705.08931v1.pdf | |
PWC | https://paperswithcode.com/paper/proximity-variational-inference |
Repo | https://github.com/altosaar/proximity_vi |
Framework | tf |
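PVI keeps successive variational parameter iterates close to each other as measured by a proximity statistic. The toy loop below shows one plausible form of that idea (gradient steps on a stand-in objective penalized toward an exponential moving average of the statistic); the exact statistic, penalty form and schedule in the paper may differ.

```python
# Toy PVI-style update: maximize a surrogate ELBO while penalizing movement
# of a proximity statistic f(lambda) away from its running average.
import torch

def proximity_stat(lam):
    return torch.tanh(lam)          # example statistic; the paper studies several

lam = torch.zeros(5, requires_grad=True)      # variational parameters
ema = proximity_stat(lam).detach()            # running average of the statistic
opt = torch.optim.SGD([lam], lr=0.1)
k, decay = 1.0, 0.9                           # penalty weight, EMA decay

for step in range(100):
    opt.zero_grad()
    elbo = -(lam - 2.0).pow(2).sum()          # stand-in for the true ELBO
    penalty = (proximity_stat(lam) - ema).pow(2).sum()
    loss = -(elbo - k * penalty)              # maximize ELBO minus proximity penalty
    loss.backward()
    opt.step()
    ema = decay * ema + (1 - decay) * proximity_stat(lam).detach()

print(lam.data)
```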
In-Place Activated BatchNorm for Memory-Optimized Training of DNNs
Title | In-Place Activated BatchNorm for Memory-Optimized Training of DNNs |
Authors | Samuel Rota Bulò, Lorenzo Porzi, Peter Kontschieder |
Abstract | In this work we present In-Place Activated Batch Normalization (InPlace-ABN) - a novel approach to drastically reduce the training memory footprint of modern deep neural networks in a computationally efficient way. Our solution substitutes the conventionally used succession of BatchNorm + Activation layers with a single plugin layer, hence avoiding invasive framework surgery while providing straightforward applicability for existing deep learning frameworks. We obtain memory savings of up to 50% by dropping intermediate results and by recovering required information during the backward pass through the inversion of stored forward results, with only a minor increase (0.8-2%) in computation time. Also, we demonstrate how frequently used checkpointing approaches can be made computationally as efficient as InPlace-ABN. In our experiments on image classification, we demonstrate on-par results on ImageNet-1k with state-of-the-art approaches. On the memory-demanding task of semantic segmentation, we report results for COCO-Stuff, Cityscapes and Mapillary Vistas, obtaining new state-of-the-art results on the latter without additional training data but in a single-scale and -model scenario. Code can be found at https://github.com/mapillary/inplace_abn . |
Tasks | Image Classification, Semantic Segmentation |
Published | 2017-12-07 |
URL | http://arxiv.org/abs/1712.02616v3 |
http://arxiv.org/pdf/1712.02616v3.pdf | |
PWC | https://paperswithcode.com/paper/in-place-activated-batchnorm-for-memory |
Repo | https://github.com/ternaus/TernausNetV2 |
Framework | pytorch |
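The memory saving comes from storing only the activation output and recovering the BatchNorm output by inverting the (invertible) activation during the backward pass. The snippet below demonstrates just that inversion for a leaky ReLU; it is not the fused CUDA layer from the repository.

```python
# The invertibility trick behind InPlace-ABN: keep only y = leaky_relu(x) and
# recover x exactly when needed for the backward pass.
import torch

def leaky_relu(x, slope=0.01):
    return torch.where(x >= 0, x, slope * x)

def invert_leaky_relu(y, slope=0.01):
    # exact inverse: positive values map to themselves, negatives are rescaled
    return torch.where(y >= 0, y, y / slope)

x = torch.randn(4, 8)                 # pretend this is the BatchNorm output
y = leaky_relu(x)                     # only y needs to be stored for backward
print(torch.allclose(invert_leaky_relu(y), x))  # True: x is recoverable from y
```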
InfiniteBoost: building infinite ensembles with gradient descent
Title | InfiniteBoost: building infinite ensembles with gradient descent |
Authors | Alex Rogozhnikov, Tatiana Likhomanenko |
Abstract | In machine learning, ensemble methods have demonstrated high accuracy on a variety of problems in different areas. Two notable ensemble methods widely used in practice are gradient boosting and random forests. In this paper we present InfiniteBoost - a novel algorithm which combines important properties of these two approaches. The algorithm constructs an ensemble of trees for which two properties hold: trees of the ensemble incorporate the mistakes made by others, and at the same time the ensemble can contain an infinite number of trees without over-fitting. The proposed algorithm is evaluated on regression, classification, and ranking tasks using large-scale, publicly available datasets. |
Tasks | |
Published | 2017-06-04 |
URL | http://arxiv.org/abs/1706.01109v2 |
http://arxiv.org/pdf/1706.01109v2.pdf | |
PWC | https://paperswithcode.com/paper/infiniteboost-building-infinite-ensembles |
Repo | https://github.com/arogozhnikov/infiniteboost |
Framework | none |
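InfiniteBoost fits each new tree to the current ensemble's residuals like gradient boosting, but the ensemble prediction is a capacity-scaled average of the trees like a random forest, so adding more trees does not inflate the prediction. The sketch below is a simplified regression version under squared loss; the fixed capacity is an assumption (the paper also adapts it during training).

```python
# Simplified InfiniteBoost for regression: trees are fit to residuals of a
# *capped average* ensemble rather than an ever-growing boosted sum.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def infiniteboost(X, y, n_trees=100, capacity=4.0, max_depth=3):
    trees, pred_sum = [], np.zeros(len(y))
    for t in range(n_trees):
        ensemble_pred = capacity * pred_sum / max(t, 1)   # capacity * average of trees so far
        residual = y - ensemble_pred                      # negative gradient of squared loss
        tree = DecisionTreeRegressor(max_depth=max_depth).fit(X, residual)
        trees.append(tree)
        pred_sum += tree.predict(X)
    return trees

X = np.random.rand(200, 5)
y = np.sin(X[:, 0] * 6) + 0.1 * np.random.randn(200)
trees = infiniteboost(X, y)
pred = 4.0 * np.mean([t.predict(X) for t in trees], axis=0)
print(np.mean((pred - y) ** 2))                           # training MSE of the ensemble
```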
Simultaneous Policy Learning and Latent State Inference for Imitating Driver Behavior
Title | Simultaneous Policy Learning and Latent State Inference for Imitating Driver Behavior |
Authors | Jeremy Morton, Mykel J. Kochenderfer |
Abstract | In this work, we propose a method for learning driver models that account for variables that cannot be observed directly. When trained on a synthetic dataset, our models are able to learn encodings for vehicle trajectories that distinguish between four distinct classes of driver behavior. Such encodings are learned without any knowledge of the number of driver classes or any objective that directly requires the models to learn encodings for each class. We show that driving policies trained with knowledge of latent variables are more effective than baseline methods at imitating the driver behavior that they are trained to replicate. Furthermore, we demonstrate that the actions chosen by our policy are heavily influenced by the latent variable settings that are provided to them. |
Tasks | |
Published | 2017-04-19 |
URL | http://arxiv.org/abs/1704.05566v1 |
http://arxiv.org/pdf/1704.05566v1.pdf | |
PWC | https://paperswithcode.com/paper/simultaneous-policy-learning-and-latent-state |
Repo | https://github.com/sisl/latent_driver |
Framework | tf |
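The approach pairs a trajectory encoder that infers a latent driver-style variable with a policy conditioned on that latent. The schematic sketch below shows one plausible arrangement (a recurrent encoder with a reparameterized Gaussian latent feeding a feed-forward policy); dimensions and the training losses are assumptions, not the released TensorFlow model.

```python
# Latent-variable imitation sketch: encode a demonstrated (state, action)
# trajectory into a latent driver style, then predict actions from
# (state, latent).
import torch
import torch.nn as nn

class LatentDriverPolicy(nn.Module):
    def __init__(self, state_dim=8, action_dim=2, latent_dim=4, hidden=64):
        super().__init__()
        self.encoder = nn.GRU(state_dim + action_dim, hidden, batch_first=True)
        self.to_latent = nn.Linear(hidden, 2 * latent_dim)    # mean and log-variance
        self.policy = nn.Sequential(
            nn.Linear(state_dim + latent_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim))

    def forward(self, trajectory, state):
        _, h = self.encoder(trajectory)                       # summarize the demonstration
        mu, logvar = self.to_latent(h[-1]).chunk(2, dim=-1)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterized sample
        return self.policy(torch.cat([state, z], dim=-1)), mu, logvar

model = LatentDriverPolicy()
traj = torch.rand(16, 50, 10)        # batch of 50-step (state, action) sequences
state = torch.rand(16, 8)
action, mu, logvar = model(traj, state)
print(action.shape)                  # torch.Size([16, 2])
```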
Element-centric clustering comparison unifies overlaps and hierarchy
Title | Element-centric clustering comparison unifies overlaps and hierarchy |
Authors | Alexander J. Gates, Ian B. Wood, William P. Hetrick, Yong-Yeol Ahn |
Abstract | Clustering is one of the most universal approaches for understanding complex data. A pivotal aspect of clustering analysis is quantitatively comparing clusterings; clustering comparison is the basis for many tasks such as clustering evaluation, consensus clustering, and tracking the temporal evolution of clusters. In particular, the extrinsic evaluation of clustering methods requires comparing the uncovered clusterings to planted clusterings or known metadata. Yet, as we demonstrate, existing clustering comparison measures have critical biases which undermine their usefulness, and no measure accommodates both overlapping and hierarchical clusterings. Here we unify the comparison of disjoint, overlapping, and hierarchically structured clusterings by proposing a new element-centric framework: elements are compared based on the relationships induced by the cluster structure, as opposed to the traditional cluster-centric philosophy. We demonstrate that, in contrast to standard clustering similarity measures, our framework does not suffer from critical biases and naturally provides unique insights into how the clusterings differ. We illustrate the strengths of our framework by revealing new insights into the organization of clusters in two applications: the improved classification of schizophrenia based on the overlapping and hierarchical community structure of fMRI brain networks, and the disentanglement of various social homophily factors in Facebook social networks. The universality of clustering suggests far-reaching impact of our framework throughout all areas of science. |
Tasks | |
Published | 2017-06-19 |
URL | https://arxiv.org/abs/1706.06136v2 |
https://arxiv.org/pdf/1706.06136v2.pdf | |
PWC | https://paperswithcode.com/paper/on-comparing-clusterings-an-element-centric |
Repo | https://github.com/Hoosier-Clusters/clusim |
Framework | none |
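The authors released the framework as the clusim package, which compares clusterings element by element and handles overlapping membership directly. A small usage sketch follows; the API calls are taken from the package's documentation as I understand it, so treat them as assumed rather than verified.

```python
# Element-centric similarity of two clusterings with clusim
# (pip install clusim); element "b" belongs to two clusters in c2,
# which standard pair-counting measures cannot represent.
from clusim.clustering import Clustering
import clusim.sim as sim

c1 = Clustering(elm2clu_dict={"a": [0], "b": [0], "c": [1], "d": [1]})
c2 = Clustering(elm2clu_dict={"a": [0], "b": [0, 1], "c": [1], "d": [1]})

print(sim.element_sim(c1, c2, alpha=0.9))   # overall element-centric similarity in [0, 1]
```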
Outlier Detection for Text Data: An Extended Version
Title | Outlier Detection for Text Data: An Extended Version |
Authors | Ramakrishnan Kannan, Hyenkyun Woo, Charu C. Aggarwal, Haesun Park |
Abstract | The problem of outlier detection is extremely challenging in many domains such as text, in which the attribute values are typically non-negative, and most values are zero. In such cases, it often becomes difficult to separate the outliers from the natural variations in the patterns in the underlying data. In this paper, we present a matrix factorization method, which is naturally able to distinguish the anomalies with the use of low rank approximations of the underlying data. Our iterative algorithm TONMF is based on the block coordinate descent (BCD) framework. We define blocks over the term-document matrix such that the function becomes solvable. Given the most recently updated values of the other matrix blocks, we always update one block at a time to its optimum. Our approach has significant advantages over traditional methods for text outlier detection. Finally, we present experimental results illustrating the effectiveness of our method over competing methods. |
Tasks | Outlier Detection |
Published | 2017-01-05 |
URL | http://arxiv.org/abs/1701.01325v1 |
http://arxiv.org/pdf/1701.01325v1.pdf | |
PWC | https://paperswithcode.com/paper/outlier-detection-for-text-data-an-extended |
Repo | https://github.com/manojkumar-github/NLP-TextAnalytics-DeepLearning |
Framework | none |
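The underlying model splits the term-document matrix into a low-rank part plus a column-sparse outlier matrix, and scores each document by the norm of its outlier column. The sketch below conveys that decomposition with a deliberately simplified alternating scheme (sklearn NMF for the low-rank part, column-wise soft-thresholding for the outlier part); the exact BCD updates in the paper differ.

```python
# Simplified TONMF-style decomposition: D ~ W @ H + Z with column-sparse Z;
# documents with large ||Z[:, j]|| are flagged as outliers.
import numpy as np
from sklearn.decomposition import NMF

def text_outlier_scores(D, rank=5, alpha=1.0, iters=10):
    Z = np.zeros_like(D)
    for _ in range(iters):
        # (1) low-rank fit of the non-outlier part
        nmf = NMF(n_components=rank, init="nndsvda", max_iter=200)
        W = nmf.fit_transform(np.maximum(D - Z, 0))
        H = nmf.components_
        # (2) column-wise soft-thresholding of the residual -> sparse outlier columns
        R = D - W @ H
        norms = np.linalg.norm(R, axis=0) + 1e-12
        Z = R * np.maximum(1 - alpha / norms, 0)
    return np.linalg.norm(Z, axis=0)          # higher score = more anomalous document

D = np.abs(np.random.rand(100, 40))           # toy term-document matrix (terms x docs)
D[:, 0] += 5 * np.random.rand(100)            # make document 0 an obvious outlier
print(np.argmax(text_outlier_scores(D)))      # typically 0, the injected outlier
```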
Application of a Hybrid Bi-LSTM-CRF model to the task of Russian Named Entity Recognition
Title | Application of a Hybrid Bi-LSTM-CRF model to the task of Russian Named Entity Recognition |
Authors | L. T. Anh, M. Y. Arkhipov, M. S. Burtsev |
Abstract | Named Entity Recognition (NER) is one of the most common tasks in natural language processing. The purpose of NER is to find and classify tokens in text documents into predefined categories called tags, such as person names, quantity expressions, percentage expressions, names of locations, organizations, as well as expressions of time, currency and others. Although a number of approaches have been proposed for this task in the Russian language, there is still substantial potential for better solutions. In this work, we studied several deep neural network models, starting from a vanilla Bi-directional Long Short-Term Memory (Bi-LSTM) model, then supplementing it with Conditional Random Fields (CRF) as well as highway networks, and finally adding external word embeddings. All models were evaluated across three datasets: Gareev’s dataset, Person-1000, and FactRuEval-2016. We found that extending the Bi-LSTM model with a CRF significantly increased the quality of predictions. Encoding input tokens with external word embeddings reduced training time and allowed us to achieve state of the art for the Russian NER task. |
Tasks | Named Entity Recognition, Word Embeddings |
Published | 2017-09-27 |
URL | http://arxiv.org/abs/1709.09686v2 |
http://arxiv.org/pdf/1709.09686v2.pdf | |
PWC | https://paperswithcode.com/paper/application-of-a-hybrid-bi-lstm-crf-model-to |
Repo | https://github.com/deepmipt/DeepPavlov |
Framework | tf |
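Structurally, the tagger is a Bi-LSTM producing per-token emission scores with a CRF layer on top that decodes the best tag sequence. The sketch below shows that skeleton in PyTorch using the third-party pytorch-crf package; it is not the paper's TensorFlow/DeepPavlov code, and the hyperparameters are assumed.

```python
# Bi-LSTM-CRF skeleton for sequence tagging. Requires: pip install pytorch-crf
import torch
import torch.nn as nn
from torchcrf import CRF

class BiLSTMCRF(nn.Module):
    def __init__(self, vocab_size, num_tags, emb_dim=100, hidden=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)   # external embeddings could be loaded here
        self.lstm = nn.LSTM(emb_dim, hidden, bidirectional=True, batch_first=True)
        self.emit = nn.Linear(2 * hidden, num_tags)
        self.crf = CRF(num_tags, batch_first=True)

    def loss(self, tokens, tags):
        emissions = self.emit(self.lstm(self.embed(tokens))[0])
        return -self.crf(emissions, tags)                # negative log-likelihood

    def predict(self, tokens):
        emissions = self.emit(self.lstm(self.embed(tokens))[0])
        return self.crf.decode(emissions)                # best tag sequence per sentence

model = BiLSTMCRF(vocab_size=5000, num_tags=9)           # e.g. BIO tags for PER/LOC/ORG/MISC
tokens = torch.randint(0, 5000, (2, 7))
tags = torch.randint(0, 9, (2, 7))
print(model.loss(tokens, tags), model.predict(tokens))
```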
Making Neural QA as Simple as Possible but not Simpler
Title | Making Neural QA as Simple as Possible but not Simpler |
Authors | Dirk Weissenborn, Georg Wiese, Laura Seiffe |
Abstract | Recent development of large-scale question answering (QA) datasets triggered a substantial amount of research into end-to-end neural architectures for QA. Increasingly complex systems have been conceived without comparison to simpler neural baseline systems that would justify their complexity. In this work, we propose a simple heuristic that guides the development of neural baseline systems for the extractive QA task. We find that there are two ingredients necessary for building a high-performing neural QA system: first, the awareness of question words while processing the context and second, a composition function that goes beyond simple bag-of-words modeling, such as recurrent neural networks. Our results show that FastQA, a system that meets these two requirements, can achieve very competitive performance compared with existing models. We argue that this surprising finding puts results of previous systems and the complexity of recent QA datasets into perspective. |
Tasks | Question Answering, Reading Comprehension |
Published | 2017-03-14 |
URL | http://arxiv.org/abs/1703.04816v3 |
http://arxiv.org/pdf/1703.04816v3.pdf | |
PWC | https://paperswithcode.com/paper/making-neural-qa-as-simple-as-possible-but |
Repo | https://github.com/newmast/QA-Deep-Learning |
Framework | none |
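The two ingredients FastQA identifies translate into a very small model: a binary word-in-question feature appended to each context embedding, a recurrent composition over the context, and linear heads scoring the answer span. The sketch below follows that recipe in a condensed form; the sizes and the span scoring are simplified assumptions rather than the published system.

```python
# Condensed FastQA-style extractive QA model: word-in-question feature +
# BiLSTM composition + start/end span scoring.
import torch
import torch.nn as nn

class FastQASketch(nn.Module):
    def __init__(self, vocab=10000, dim=100, hidden=128):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        self.lstm = nn.LSTM(dim + 1, hidden, bidirectional=True, batch_first=True)
        self.start = nn.Linear(2 * hidden, 1)
        self.end = nn.Linear(2 * hidden, 1)

    def forward(self, context, question):
        # word-in-question feature: 1 if the context token also occurs in the question
        wiq = (context.unsqueeze(-1) == question.unsqueeze(1)).any(-1).float()
        x = torch.cat([self.embed(context), wiq.unsqueeze(-1)], dim=-1)
        h, _ = self.lstm(x)
        return self.start(h).squeeze(-1), self.end(h).squeeze(-1)   # span logits

model = FastQASketch()
ctx = torch.randint(0, 10000, (2, 50))
q = torch.randint(0, 10000, (2, 8))
start_logits, end_logits = model(ctx, q)
print(start_logits.shape)  # torch.Size([2, 50])
```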