Paper Group AWR 150
Predicting the Driver’s Focus of Attention: the DR(eye)VE Project. A Crowd-Annotated Spanish Corpus for Humor Analysis. Discriminative models for multi-instance problems with tree-structure. Person Transfer GAN to Bridge Domain Gap for Person Re-Identification. Convolutional Image Captioning. FoldingNet: Point Cloud Auto-encoder via Deep Grid Defor …
Predicting the Driver’s Focus of Attention: the DR(eye)VE Project
Title | Predicting the Driver’s Focus of Attention: the DR(eye)VE Project |
Authors | Andrea Palazzi, Davide Abati, Simone Calderara, Francesco Solera, Rita Cucchiara |
Abstract | In this work we aim to predict the driver’s focus of attention. The goal is to estimate what a person would pay attention to while driving, and which part of the scene around the vehicle is more critical for the task. To this end we propose a new computer vision model based on a multi-branch deep architecture that integrates three sources of information: raw video, motion and scene semantics. We also introduce DR(eye)VE, the largest dataset of driving scenes for which eye-tracking annotations are available. This dataset features more than 500,000 registered frames, matching ego-centric views (from glasses worn by drivers) and car-centric views (from a roof-mounted camera), further enriched by other sensor measurements. Results highlight that several attention patterns are shared across drivers and can be reproduced to some extent. The indication of which elements in the scene are likely to capture the driver’s attention may benefit several applications in the context of human-vehicle interaction and driver attention analysis. |
Tasks | Eye Tracking |
Published | 2017-05-10 |
URL | http://arxiv.org/abs/1705.03854v3 |
http://arxiv.org/pdf/1705.03854v3.pdf | |
PWC | https://paperswithcode.com/paper/predicting-the-drivers-focus-of-attention-the |
Repo | https://github.com/ndrplz/dreyeve |
Framework | none |
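The model fuses three information streams (appearance, motion, semantics) into a single fixation map. As a rough illustration of that multi-branch idea, here is a minimal sketch; the branch design, layer sizes and additive fusion are simplified assumptions, not the released DR(eye)VE architecture.

```python
# Simplified multi-branch attention predictor: three convolutional branches
# (RGB frame, optical flow, semantic segmentation scores) each produce a
# coarse saliency map, and the maps are fused by summation.
import torch
import torch.nn as nn

class Branch(nn.Module):
    def __init__(self, in_channels):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 1, 1),  # one-channel attention map
        )

    def forward(self, x):
        return self.net(x)

class MultiBranchAttention(nn.Module):
    def __init__(self):
        super().__init__()
        self.rgb = Branch(3)    # raw video frame
        self.flow = Branch(2)   # optical flow (dx, dy)
        self.seg = Branch(19)   # semantic segmentation scores

    def forward(self, rgb, flow, seg):
        fused = self.rgb(rgb) + self.flow(flow) + self.seg(seg)
        return torch.sigmoid(fused)  # predicted fixation map

model = MultiBranchAttention()
out = model(torch.rand(1, 3, 112, 112), torch.rand(1, 2, 112, 112),
            torch.rand(1, 19, 112, 112))
print(out.shape)  # torch.Size([1, 1, 112, 112])
```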
A Crowd-Annotated Spanish Corpus for Humor Analysis
Title | A Crowd-Annotated Spanish Corpus for Humor Analysis |
Authors | Santiago Castro, Luis Chiruzzo, Aiala Rosá, Diego Garat, Guillermo Moncecchi |
Abstract | Computational Humor involves several tasks, such as humor recognition, humor generation, and humor scoring, for which it is useful to have human-curated data. In this work we present a corpus of 27,000 tweets written in Spanish and crowd-annotated by their humor value and funniness score, with about four annotations per tweet, tagged by 1,300 people over the Internet. It is equally divided between tweets coming from humorous and non-humorous accounts. The inter-annotator agreement Krippendorff’s alpha value is 0.5710. The dataset is available for general use and can serve as a basis for humor detection and as a first step to tackle subjectivity. |
Tasks | Humor Detection |
Published | 2017-10-02 |
URL | http://arxiv.org/abs/1710.00477v4 |
http://arxiv.org/pdf/1710.00477v4.pdf | |
PWC | https://paperswithcode.com/paper/a-crowd-annotated-spanish-corpus-for-humor |
Repo | https://github.com/pln-fing-udelar/humor |
Framework | none |
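The reported Krippendorff's alpha of 0.5710 is a chance-corrected agreement measure over the roughly four annotations each tweet received. A minimal sketch of how such a value is computed, using the third-party `krippendorff` package (an assumption on my part, not necessarily part of the released corpus tooling) on toy annotations:

```python
# Inter-annotator agreement over crowd labels; the toy matrix below is
# illustrative, not the actual corpus data. Requires: pip install krippendorff
import numpy as np
import krippendorff

# rows = annotators, columns = tweets; 1 = humorous, 0 = not, nan = not rated
ratings = np.array([
    [1,      0, 1, np.nan, 0],
    [1,      0, 0, 1,      0],
    [np.nan, 0, 1, 1,      0],
], dtype=float)

alpha = krippendorff.alpha(reliability_data=ratings,
                           level_of_measurement="nominal")
print(f"Krippendorff's alpha: {alpha:.4f}")
```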
Discriminative models for multi-instance problems with tree-structure
Title | Discriminative models for multi-instance problems with tree-structure |
Authors | Tomas Pevny, Petr Somol |
Abstract | Modeling network traffic is gaining importance as a way to counter modern threats of ever increasing sophistication. It is, however, surprisingly difficult and costly to construct reliable classifiers on top of telemetry data due to the variety and complexity of signals that no human can manage to interpret in full. Obtaining training data with a sufficiently large and variable body of labels can thus be seen as a prohibitive problem. The goal of this work is to detect infected computers by observing their HTTP(S) traffic collected from network sensors, which are typically proxy servers or network firewalls, while relying on only minimal human input in the model training phase. We propose a discriminative model that makes decisions based on all of a computer’s traffic observed during a predefined time window (5 minutes in our case). The model is trained on traffic samples collected over equally sized time windows from a large number of computers, where the only labels needed are human verdicts about each computer as a whole (presumed infected vs. presumed clean). As part of training, the model itself recognizes discriminative patterns in traffic targeted to individual servers and constructs the final high-level classifier on top of them. We show the classifier to perform with very high precision, while the learned traffic patterns can be interpreted as Indicators of Compromise. In the following, we implement the discriminative model as a neural network with a special structure reflecting two stacked multi-instance problems. The main advantages of the proposed configuration include not only improved accuracy and the ability to learn from gross labels, but also automatic learning of the server types (together with their detectors) that are typically visited by infected computers. |
Tasks | |
Published | 2017-03-07 |
URL | http://arxiv.org/abs/1703.02868v1 |
http://arxiv.org/pdf/1703.02868v1.pdf | |
PWC | https://paperswithcode.com/paper/discriminative-models-for-multi-instance |
Repo | https://github.com/pevnak/Mill.jl |
Framework | none |
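The "two stacked multi-instance problems" structure means flows are pooled per server and server representations are pooled per computer, with a single computer-level label supervising everything. The official code is the Julia package Mill.jl; the Python sketch below is an assumed, simplified rendering of that nesting, not a port of the released implementation.

```python
# Two stacked multi-instance layers with mean pooling: flows -> server bags,
# servers -> computer bag, then a computer-level infected/clean classifier.
import torch
import torch.nn as nn

class BagLayer(nn.Module):
    """Embed each instance, then mean-pool the bag."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.embed = nn.Sequential(nn.Linear(in_dim, out_dim), nn.ReLU())

    def forward(self, instances):                 # (n_instances, in_dim)
        return self.embed(instances).mean(dim=0)  # (out_dim,)

class ComputerClassifier(nn.Module):
    def __init__(self, flow_dim=10, hidden=32):
        super().__init__()
        self.server_layer = BagLayer(flow_dim, hidden)
        self.computer_layer = BagLayer(hidden, hidden)
        self.head = nn.Linear(hidden, 1)          # infected vs. clean logit

    def forward(self, servers):                   # list of (n_flows, flow_dim) tensors
        server_vecs = torch.stack([self.server_layer(s) for s in servers])
        return self.head(self.computer_layer(server_vecs))

model = ComputerClassifier()
computer = [torch.rand(5, 10), torch.rand(3, 10)]  # traffic to two servers
print(model(computer))
```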
Person Transfer GAN to Bridge Domain Gap for Person Re-Identification
Title | Person Transfer GAN to Bridge Domain Gap for Person Re-Identification |
Authors | Longhui Wei, Shiliang Zhang, Wen Gao, Qi Tian |
Abstract | Although the performance of person Re-Identification (ReID) has been significantly boosted, many challenging issues in real scenarios have not been fully investigated, e.g., the complex scenes and lighting variations, viewpoint and pose changes, and the large number of identities in a camera network. To facilitate research towards conquering those issues, this paper contributes a new dataset called MSMT17 with many important features, e.g., 1) the raw videos are taken by a 15-camera network deployed in both indoor and outdoor scenes, 2) the videos cover a long period of time and present complex lighting variations, and 3) it contains currently the largest number of annotated identities, i.e., 4,101 identities and 126,441 bounding boxes. We also observe that a domain gap commonly exists between datasets, which essentially causes a severe performance drop when training and testing on different datasets. As a result, available training data cannot be effectively leveraged for new testing domains. To relieve the expensive cost of annotating new training samples, we propose a Person Transfer Generative Adversarial Network (PTGAN) to bridge the domain gap. Comprehensive experiments show that the domain gap can be substantially narrowed down by PTGAN. |
Tasks | Person Re-Identification |
Published | 2017-11-23 |
URL | http://arxiv.org/abs/1711.08565v2 |
http://arxiv.org/pdf/1711.08565v2.pdf | |
PWC | https://paperswithcode.com/paper/person-transfer-gan-to-bridge-domain-gap-for |
Repo | https://github.com/yxgeee/MMT |
Framework | pytorch |
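PTGAN translates person images into the style of a target dataset while keeping the identity intact. The sketch below only illustrates how such an objective can be composed, a plain adversarial term plus a mask-weighted identity-preservation term; the loss weights, discriminator, and person masks are assumptions rather than the released implementation.

```python
# Hedged sketch of a PTGAN-style generator objective: adversarial/style loss
# on the whole image, plus an L1 identity loss restricted to the person's
# foreground region so the identity is preserved after transfer.
import torch
import torch.nn.functional as F

def ptgan_loss(real, transferred, disc_fake_logits, person_mask, lambda_id=10.0):
    # Adversarial term: the generator wants the discriminator to call the
    # transferred image "real" (non-saturating GAN loss).
    adv = F.binary_cross_entropy_with_logits(
        disc_fake_logits, torch.ones_like(disc_fake_logits))
    # Identity term: the masked person region should stay unchanged.
    identity = F.l1_loss(transferred * person_mask, real * person_mask)
    return adv + lambda_id * identity

real = torch.rand(1, 3, 128, 64)
fake = torch.rand(1, 3, 128, 64)
mask = (torch.rand(1, 1, 128, 64) > 0.5).float()
print(ptgan_loss(real, fake, torch.randn(1, 1), mask))
```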
Convolutional Image Captioning
Title | Convolutional Image Captioning |
Authors | Jyoti Aneja, Aditya Deshpande, Alexander Schwing |
Abstract | Image captioning is an important but challenging task, applicable to virtual assistants, editing tools, image indexing, and support of the disabled. Its challenges are due to the variability and ambiguity of possible image descriptions. In recent years significant progress has been made in image captioning, using Recurrent Neural Networks powered by long short-term memory (LSTM) units. Despite mitigating the vanishing gradient problem, and despite their compelling ability to memorize dependencies, LSTM units are complex and inherently sequential across time. To address this issue, recent work has shown the benefits of convolutional networks for machine translation and conditional image generation. Inspired by their success, in this paper we develop a convolutional image captioning technique. We demonstrate its efficacy on the challenging MSCOCO dataset and show performance on par with the baseline, while having a faster training time per number of parameters. We also perform a detailed analysis, providing compelling reasons in favor of convolutional language generation approaches. |
Tasks | Image Captioning, Text Generation |
Published | 2017-11-24 |
URL | http://arxiv.org/abs/1711.09151v1 |
http://arxiv.org/pdf/1711.09151v1.pdf | |
PWC | https://paperswithcode.com/paper/convolutional-image-captioning |
Repo | https://github.com/NaskyD/convnet |
Framework | pytorch |
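The core idea is to replace the LSTM decoder with masked (causal) convolutions over the partial caption, so every position can be computed in parallel during training. The sketch below shows that mechanism; the layer sizes and the additive image conditioning are assumptions, not the paper's exact model.

```python
# Causal convolutional caption decoder: left-padding each convolution so a
# word position only sees earlier words, conditioned on an image feature.
import torch
import torch.nn as nn

class CausalConvDecoder(nn.Module):
    def __init__(self, vocab, dim=128, kernel=5, layers=3):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        self.pad = kernel - 1
        self.convs = nn.ModuleList(
            [nn.Conv1d(dim, dim, kernel) for _ in range(layers)])
        self.out = nn.Linear(dim, vocab)

    def forward(self, tokens, image_feat):
        # tokens: (batch, seq); image_feat: (batch, dim) conditions every step
        x = self.embed(tokens).transpose(1, 2)          # (batch, dim, seq)
        x = x + image_feat.unsqueeze(-1)
        for conv in self.convs:
            # left-pad so each position only attends to previous words (causal)
            x = torch.relu(conv(nn.functional.pad(x, (self.pad, 0))))
        return self.out(x.transpose(1, 2))              # (batch, seq, vocab)

dec = CausalConvDecoder(vocab=1000)
logits = dec(torch.randint(0, 1000, (2, 12)), torch.rand(2, 128))
print(logits.shape)  # torch.Size([2, 12, 1000])
```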
FoldingNet: Point Cloud Auto-encoder via Deep Grid Deformation
Title | FoldingNet: Point Cloud Auto-encoder via Deep Grid Deformation |
Authors | Yaoqing Yang, Chen Feng, Yiru Shen, Dong Tian |
Abstract | Recent deep networks that directly handle points in a point set, e.g., PointNet, have been state-of-the-art for supervised learning tasks on point clouds such as classification and segmentation. In this work, a novel end-to-end deep auto-encoder is proposed to address unsupervised learning challenges on point clouds. On the encoder side, a graph-based enhancement is enforced to promote local structures on top of PointNet. Then, a novel folding-based decoder deforms a canonical 2D grid onto the underlying 3D object surface of a point cloud, achieving low reconstruction errors even for objects with delicate structures. The proposed decoder only uses about 7% of the parameters of a decoder with fully-connected neural networks, yet leads to a more discriminative representation that achieves higher linear SVM classification accuracy than the benchmark. In addition, the proposed decoder structure is shown, in theory, to be a generic architecture that is able to reconstruct an arbitrary point cloud from a 2D grid. Our code is available at http://www.merl.com/research/license#FoldingNet |
Tasks | |
Published | 2017-12-19 |
URL | http://arxiv.org/abs/1712.07262v2 |
http://arxiv.org/pdf/1712.07262v2.pdf | |
PWC | https://paperswithcode.com/paper/foldingnet-point-cloud-auto-encoder-via-deep |
Repo | https://github.com/AnTao97/UnsupervisedPointCloudReconstruction |
Framework | pytorch |
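The folding decoder works by concatenating the encoder's codeword to every point of a fixed 2D grid and passing the result through two successive "folds" that warp the grid onto the 3D surface. A minimal sketch of that decoder follows; grid size and MLP widths are assumed values.

```python
# Folding-based decoder: a fixed 2D grid, replicated codeword, and two MLP
# folds that deform the grid into a reconstructed point cloud.
import torch
import torch.nn as nn

class FoldingDecoder(nn.Module):
    def __init__(self, code_dim=512, grid_side=45):
        super().__init__()
        lin = torch.linspace(-0.3, 0.3, grid_side)
        u, v = torch.meshgrid(lin, lin, indexing="ij")
        self.register_buffer("grid", torch.stack([u.flatten(), v.flatten()], dim=1))
        self.fold1 = nn.Sequential(nn.Linear(code_dim + 2, 256), nn.ReLU(),
                                   nn.Linear(256, 3))
        self.fold2 = nn.Sequential(nn.Linear(code_dim + 3, 256), nn.ReLU(),
                                   nn.Linear(256, 3))

    def forward(self, codeword):                        # (batch, code_dim)
        n = self.grid.shape[0]
        code = codeword.unsqueeze(1).expand(-1, n, -1)  # repeat codeword per grid point
        grid = self.grid.unsqueeze(0).expand(code.shape[0], -1, -1)
        points = self.fold1(torch.cat([code, grid], dim=-1))   # first fold: 2D -> 3D
        return self.fold2(torch.cat([code, points], dim=-1))   # second fold refines the surface

dec = FoldingDecoder()
print(dec(torch.rand(2, 512)).shape)  # torch.Size([2, 2025, 3]) reconstructed cloud
```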
A Pose-Sensitive Embedding for Person Re-Identification with Expanded Cross Neighborhood Re-Ranking
Title | A Pose-Sensitive Embedding for Person Re-Identification with Expanded Cross Neighborhood Re-Ranking |
Authors | M. Saquib Sarfraz, Arne Schumann, Andreas Eberle, Rainer Stiefelhagen |
Abstract | Person re-identification is a challenging retrieval task that requires matching a person’s acquired image across non-overlapping camera views. In this paper we propose an effective approach that incorporates both the fine and coarse pose information of the person to learn a discriminative embedding. In contrast to the recent direction of explicitly modeling body parts or correcting for misalignment based on these, we show that a rather straightforward inclusion of the acquired camera view and/or the detected joint locations into a convolutional neural network helps to learn a very effective representation. To increase retrieval performance, re-ranking techniques based on computed distances have recently gained much attention. We propose a new unsupervised and automatic re-ranking framework that achieves state-of-the-art re-ranking performance. We show that, in contrast to the current state-of-the-art re-ranking methods, our approach does not require computing new rank lists for each image pair (e.g., based on reciprocal neighbors) and performs well by using simple direct rank-list-based comparison or even by just using the already computed Euclidean distances between the images. We show that both our learned representation and our re-ranking method achieve state-of-the-art performance on a number of challenging surveillance image and video datasets. The code is available online at: https://github.com/pse-ecn/pose-sensitive-embedding |
Tasks | Person Re-Identification |
Published | 2017-11-28 |
URL | http://arxiv.org/abs/1711.10378v2 |
http://arxiv.org/pdf/1711.10378v2.pdf | |
PWC | https://paperswithcode.com/paper/a-pose-sensitive-embedding-for-person-re |
Repo | https://github.com/pse-ecn/pose-sensitive-embedding |
Framework | tf |
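The expanded cross neighborhood (ECN) re-ranking redefines the distance between two images as an aggregate over their neighbors' distances rather than their direct distance alone. The sketch below captures that spirit in a very simplified form (single image set, plain Euclidean distances, no neighborhood expansion), so it is not the exact published formulation.

```python
# Simplified neighbor-based re-ranking: distance between i and j is the
# average distance from i's top-k neighbors to j and from j's top-k
# neighbors to i.
import numpy as np
from scipy.spatial.distance import pdist, squareform

def ecn_distances(features, k=4):
    d = squareform(pdist(features))          # (n, n) Euclidean distances
    nn = np.argsort(d, axis=1)[:, 1:k + 1]   # top-k neighbors, skipping self
    n = d.shape[0]
    ecn = np.zeros_like(d)
    for i in range(n):
        for j in range(n):
            ecn[i, j] = 0.5 * (d[nn[i], j].mean() + d[nn[j], i].mean())
    return ecn

feats = np.random.rand(20, 128)               # stand-in for learned embeddings
print(ecn_distances(feats).shape)             # (20, 20) re-ranked distance matrix
```

The appeal of this family of methods is that no new rank list has to be recomputed per image pair; everything is derived from the already computed distance matrix.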
Proximity Variational Inference
Title | Proximity Variational Inference |
Authors | Jaan Altosaar, Rajesh Ranganath, David M. Blei |
Abstract | Variational inference is a powerful approach for approximate posterior inference. However, it is sensitive to initialization and can be subject to poor local optima. In this paper, we develop proximity variational inference (PVI). PVI is a new method for optimizing the variational objective that constrains subsequent iterates of the variational parameters to robustify the optimization path. Consequently, PVI is less sensitive to initialization and optimization quirks and finds better local optima. We demonstrate our method on three proximity statistics. We study PVI on a Bernoulli factor model and sigmoid belief network with both real and synthetic data and compare to deterministic annealing (Katahira et al., 2008). We highlight the flexibility of PVI by designing a proximity statistic for Bayesian deep learning models such as the variational autoencoder (Kingma and Welling, 2014; Rezende et al., 2014). Empirically, we show that PVI consistently finds better local optima and gives better predictive performance. |
Tasks | |
Published | 2017-05-24 |
URL | http://arxiv.org/abs/1705.08931v1 |
http://arxiv.org/pdf/1705.08931v1.pdf | |
PWC | https://paperswithcode.com/paper/proximity-variational-inference |
Repo | https://github.com/altosaar/proximity_vi |
Framework | tf |
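PVI keeps successive variational parameter iterates close to each other as measured by a proximity statistic. The toy loop below shows one plausible form of that idea (gradient steps on a stand-in objective penalized toward an exponential moving average of the statistic); the exact statistic, penalty form and schedule in the paper may differ.

```python
# Toy PVI-style update: maximize a surrogate ELBO while penalizing movement
# of a proximity statistic f(lambda) away from its running average.
import torch

def proximity_stat(lam):
    return torch.tanh(lam)          # example statistic; the paper studies several

lam = torch.zeros(5, requires_grad=True)      # variational parameters
ema = proximity_stat(lam).detach()            # running average of the statistic
opt = torch.optim.SGD([lam], lr=0.1)
k, decay = 1.0, 0.9                           # penalty weight, EMA decay

for step in range(100):
    opt.zero_grad()
    elbo = -(lam - 2.0).pow(2).sum()          # stand-in for the true ELBO
    penalty = (proximity_stat(lam) - ema).pow(2).sum()
    loss = -(elbo - k * penalty)              # maximize ELBO minus proximity penalty
    loss.backward()
    opt.step()
    ema = decay * ema + (1 - decay) * proximity_stat(lam).detach()

print(lam.data)
```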
In-Place Activated BatchNorm for Memory-Optimized Training of DNNs
Title | In-Place Activated BatchNorm for Memory-Optimized Training of DNNs |
Authors | Samuel Rota Bulò, Lorenzo Porzi, Peter Kontschieder |
Abstract | In this work we present In-Place Activated Batch Normalization (InPlace-ABN) - a novel approach to drastically reduce the training memory footprint of modern deep neural networks in a computationally efficient way. Our solution substitutes the conventionally used succession of BatchNorm + Activation layers with a single plugin layer, hence avoiding invasive framework surgery while providing straightforward applicability for existing deep learning frameworks. We obtain memory savings of up to 50% by dropping intermediate results and by recovering required information during the backward pass through the inversion of stored forward results, with only a minor increase (0.8-2%) in computation time. Also, we demonstrate how frequently used checkpointing approaches can be made computationally as efficient as InPlace-ABN. In our experiments on image classification, we demonstrate on-par results on ImageNet-1k with state-of-the-art approaches. On the memory-demanding task of semantic segmentation, we report results for COCO-Stuff, Cityscapes and Mapillary Vistas, obtaining new state-of-the-art results on the latter without additional training data but in a single-scale and -model scenario. Code can be found at https://github.com/mapillary/inplace_abn . |
Tasks | Image Classification, Semantic Segmentation |
Published | 2017-12-07 |
URL | http://arxiv.org/abs/1712.02616v3 |
http://arxiv.org/pdf/1712.02616v3.pdf | |
PWC | https://paperswithcode.com/paper/in-place-activated-batchnorm-for-memory |
Repo | https://github.com/ternaus/TernausNetV2 |
Framework | pytorch |
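The memory saving comes from storing only the activation output and recovering the BatchNorm output by inverting the (invertible) activation during the backward pass. The snippet below demonstrates just that inversion for a leaky ReLU; it is not the fused CUDA layer from the repository.

```python
# The invertibility trick behind InPlace-ABN: keep only y = leaky_relu(x) and
# recover x exactly when needed for the backward pass.
import torch

def leaky_relu(x, slope=0.01):
    return torch.where(x >= 0, x, slope * x)

def invert_leaky_relu(y, slope=0.01):
    # exact inverse: positive values map to themselves, negatives are rescaled
    return torch.where(y >= 0, y, y / slope)

x = torch.randn(4, 8)                 # pretend this is the BatchNorm output
y = leaky_relu(x)                     # only y needs to be stored for backward
print(torch.allclose(invert_leaky_relu(y), x))  # True: x is recoverable from y
```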
InfiniteBoost: building infinite ensembles with gradient descent
Title | InfiniteBoost: building infinite ensembles with gradient descent |
Authors | Alex Rogozhnikov, Tatiana Likhomanenko |
Abstract | In machine learning, ensemble methods have demonstrated high accuracy on a variety of problems in different areas. Two notable ensemble methods widely used in practice are gradient boosting and random forests. In this paper we present InfiniteBoost - a novel algorithm which combines important properties of these two approaches. The algorithm constructs an ensemble of trees for which two properties hold: trees of the ensemble incorporate the mistakes made by others, and at the same time the ensemble can contain an infinite number of trees without over-fitting. The proposed algorithm is evaluated on regression, classification, and ranking tasks using large-scale, publicly available datasets. |
Tasks | |
Published | 2017-06-04 |
URL | http://arxiv.org/abs/1706.01109v2 |
http://arxiv.org/pdf/1706.01109v2.pdf | |
PWC | https://paperswithcode.com/paper/infiniteboost-building-infinite-ensembles |
Repo | https://github.com/arogozhnikov/infiniteboost |
Framework | none |
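InfiniteBoost fits each new tree to the current ensemble's residuals like gradient boosting, but the ensemble prediction is a capacity-scaled average of the trees like a random forest, so adding more trees does not inflate the prediction. The sketch below is a simplified regression version under squared loss; the fixed capacity is an assumption (the paper also adapts it during training).

```python
# Simplified InfiniteBoost for regression: trees are fit to residuals of a
# *capped average* ensemble rather than an ever-growing boosted sum.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def infiniteboost(X, y, n_trees=100, capacity=4.0, max_depth=3):
    trees, pred_sum = [], np.zeros(len(y))
    for t in range(n_trees):
        ensemble_pred = capacity * pred_sum / max(t, 1)   # capacity * average of trees so far
        residual = y - ensemble_pred                      # negative gradient of squared loss
        tree = DecisionTreeRegressor(max_depth=max_depth).fit(X, residual)
        trees.append(tree)
        pred_sum += tree.predict(X)
    return trees

X = np.random.rand(200, 5)
y = np.sin(X[:, 0] * 6) + 0.1 * np.random.randn(200)
trees = infiniteboost(X, y)
pred = 4.0 * np.mean([t.predict(X) for t in trees], axis=0)
print(np.mean((pred - y) ** 2))                           # training MSE of the ensemble
```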
Simultaneous Policy Learning and Latent State Inference for Imitating Driver Behavior
Title | Simultaneous Policy Learning and Latent State Inference for Imitating Driver Behavior |
Authors | Jeremy Morton, Mykel J. Kochenderfer |
Abstract | In this work, we propose a method for learning driver models that account for variables that cannot be observed directly. When trained on a synthetic dataset, our models are able to learn encodings for vehicle trajectories that distinguish between four distinct classes of driver behavior. Such encodings are learned without any knowledge of the number of driver classes or any objective that directly requires the models to learn encodings for each class. We show that driving policies trained with knowledge of latent variables are more effective than baseline methods at imitating the driver behavior that they are trained to replicate. Furthermore, we demonstrate that the actions chosen by our policy are heavily influenced by the latent variable settings that are provided to them. |
Tasks | |
Published | 2017-04-19 |
URL | http://arxiv.org/abs/1704.05566v1 |
http://arxiv.org/pdf/1704.05566v1.pdf | |
PWC | https://paperswithcode.com/paper/simultaneous-policy-learning-and-latent-state |
Repo | https://github.com/sisl/latent_driver |
Framework | tf |
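The approach pairs a trajectory encoder that infers a latent driver-style variable with a policy conditioned on that latent. The schematic sketch below shows one plausible arrangement (a recurrent encoder with a reparameterized Gaussian latent feeding a feed-forward policy); dimensions and the training losses are assumptions, not the released TensorFlow model.

```python
# Latent-variable imitation sketch: encode a demonstrated (state, action)
# trajectory into a latent driver style, then predict actions from
# (state, latent).
import torch
import torch.nn as nn

class LatentDriverPolicy(nn.Module):
    def __init__(self, state_dim=8, action_dim=2, latent_dim=4, hidden=64):
        super().__init__()
        self.encoder = nn.GRU(state_dim + action_dim, hidden, batch_first=True)
        self.to_latent = nn.Linear(hidden, 2 * latent_dim)    # mean and log-variance
        self.policy = nn.Sequential(
            nn.Linear(state_dim + latent_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim))

    def forward(self, trajectory, state):
        _, h = self.encoder(trajectory)                       # summarize the demonstration
        mu, logvar = self.to_latent(h[-1]).chunk(2, dim=-1)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterized sample
        return self.policy(torch.cat([state, z], dim=-1)), mu, logvar

model = LatentDriverPolicy()
traj = torch.rand(16, 50, 10)        # batch of 50-step (state, action) sequences
state = torch.rand(16, 8)
action, mu, logvar = model(traj, state)
print(action.shape)                  # torch.Size([16, 2])
```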
Element-centric clustering comparison unifies overlaps and hierarchy
Title | Element-centric clustering comparison unifies overlaps and hierarchy |
Authors | Alexander J. Gates, Ian B. Wood, William P. Hetrick, Yong-Yeol Ahn |
Abstract | Clustering is one of the most universal approaches for understanding complex data. A pivotal aspect of clustering analysis is quantitatively comparing clusterings; clustering comparison is the basis for many tasks such as clustering evaluation, consensus clustering, and tracking the temporal evolution of clusters. In particular, the extrinsic evaluation of clustering methods requires comparing the uncovered clusterings to planted clusterings or known metadata. Yet, as we demonstrate, existing clustering comparison measures have critical biases which undermine their usefulness, and no measure accommodates both overlapping and hierarchical clusterings. Here we unify the comparison of disjoint, overlapping, and hierarchically structured clusterings by proposing a new element-centric framework: elements are compared based on the relationships induced by the cluster structure, as opposed to the traditional cluster-centric philosophy. We demonstrate that, in contrast to standard clustering similarity measures, our framework does not suffer from critical biases and naturally provides unique insights into how the clusterings differ. We illustrate the strengths of our framework by revealing new insights into the organization of clusters in two applications: the improved classification of schizophrenia based on the overlapping and hierarchical community structure of fMRI brain networks, and the disentanglement of various social homophily factors in Facebook social networks. The universality of clustering suggests far-reaching impact of our framework throughout all areas of science. |
Tasks | |
Published | 2017-06-19 |
URL | https://arxiv.org/abs/1706.06136v2 |
https://arxiv.org/pdf/1706.06136v2.pdf | |
PWC | https://paperswithcode.com/paper/on-comparing-clusterings-an-element-centric |
Repo | https://github.com/Hoosier-Clusters/clusim |
Framework | none |
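The authors released the framework as the clusim package, which compares clusterings element by element and handles overlapping membership directly. A small usage sketch follows; the API calls are taken from the package's documentation as I understand it, so treat them as assumed rather than verified.

```python
# Element-centric similarity of two clusterings with clusim
# (pip install clusim); element "b" belongs to two clusters in c2,
# which standard pair-counting measures cannot represent.
from clusim.clustering import Clustering
import clusim.sim as sim

c1 = Clustering(elm2clu_dict={"a": [0], "b": [0], "c": [1], "d": [1]})
c2 = Clustering(elm2clu_dict={"a": [0], "b": [0, 1], "c": [1], "d": [1]})

print(sim.element_sim(c1, c2, alpha=0.9))   # overall element-centric similarity in [0, 1]
```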
Outlier Detection for Text Data: An Extended Version
Title | Outlier Detection for Text Data: An Extended Version |
Authors | Ramakrishnan Kannan, Hyenkyun Woo, Charu C. Aggarwal, Haesun Park |
Abstract | The problem of outlier detection is extremely challenging in many domains such as text, in which the attribute values are typically non-negative, and most values are zero. In such cases, it often becomes difficult to separate the outliers from the natural variations in the patterns in the underlying data. In this paper, we present a matrix factorization method, which is naturally able to distinguish the anomalies with the use of low rank approximations of the underlying data. Our iterative algorithm TONMF is based on the block coordinate descent (BCD) framework. We define blocks over the term-document matrix such that the function becomes solvable. Given the most recently updated values of the other matrix blocks, we always update one block at a time to its optimum. Our approach has significant advantages over traditional methods for text outlier detection. Finally, we present experimental results illustrating the effectiveness of our method over competing methods. |
Tasks | Outlier Detection |
Published | 2017-01-05 |
URL | http://arxiv.org/abs/1701.01325v1 |
http://arxiv.org/pdf/1701.01325v1.pdf | |
PWC | https://paperswithcode.com/paper/outlier-detection-for-text-data-an-extended |
Repo | https://github.com/manojkumar-github/NLP-TextAnalytics-DeepLearning |
Framework | none |
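The underlying model splits the term-document matrix into a low-rank part plus a column-sparse outlier matrix, and scores each document by the norm of its outlier column. The sketch below conveys that decomposition with a deliberately simplified alternating scheme (sklearn NMF for the low-rank part, column-wise soft-thresholding for the outlier part); the exact BCD updates in the paper differ.

```python
# Simplified TONMF-style decomposition: D ~ W @ H + Z with column-sparse Z;
# documents with large ||Z[:, j]|| are flagged as outliers.
import numpy as np
from sklearn.decomposition import NMF

def text_outlier_scores(D, rank=5, alpha=1.0, iters=10):
    Z = np.zeros_like(D)
    for _ in range(iters):
        # (1) low-rank fit of the non-outlier part
        nmf = NMF(n_components=rank, init="nndsvda", max_iter=200)
        W = nmf.fit_transform(np.maximum(D - Z, 0))
        H = nmf.components_
        # (2) column-wise soft-thresholding of the residual -> sparse outlier columns
        R = D - W @ H
        norms = np.linalg.norm(R, axis=0) + 1e-12
        Z = R * np.maximum(1 - alpha / norms, 0)
    return np.linalg.norm(Z, axis=0)          # higher score = more anomalous document

D = np.abs(np.random.rand(100, 40))           # toy term-document matrix (terms x docs)
D[:, 0] += 5 * np.random.rand(100)            # make document 0 an obvious outlier
print(np.argmax(text_outlier_scores(D)))      # typically 0, the injected outlier
```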
Application of a Hybrid Bi-LSTM-CRF model to the task of Russian Named Entity Recognition
Title | Application of a Hybrid Bi-LSTM-CRF model to the task of Russian Named Entity Recognition |
Authors | L. T. Anh, M. Y. Arkhipov, M. S. Burtsev |
Abstract | Named Entity Recognition (NER) is one of the most common tasks in natural language processing. The purpose of NER is to find and classify tokens in text documents into predefined categories called tags, such as person names, quantity expressions, percentage expressions, names of locations, organizations, as well as expressions of time, currency and others. Although a number of approaches have been proposed for this task in the Russian language, there is still substantial potential for better solutions. In this work, we studied several deep neural network models, starting from a vanilla Bi-directional Long Short-Term Memory (Bi-LSTM) model, then supplementing it with Conditional Random Fields (CRF) as well as highway networks, and finally adding external word embeddings. All models were evaluated across three datasets: Gareev’s dataset, Person-1000, and FactRuEval-2016. We found that extending the Bi-LSTM model with a CRF significantly increased the quality of predictions. Encoding input tokens with external word embeddings reduced training time and allowed us to achieve state of the art for the Russian NER task. |
Tasks | Named Entity Recognition, Word Embeddings |
Published | 2017-09-27 |
URL | http://arxiv.org/abs/1709.09686v2 |
http://arxiv.org/pdf/1709.09686v2.pdf | |
PWC | https://paperswithcode.com/paper/application-of-a-hybrid-bi-lstm-crf-model-to |
Repo | https://github.com/deepmipt/DeepPavlov |
Framework | tf |
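Structurally, the tagger is a Bi-LSTM producing per-token emission scores with a CRF layer on top that decodes the best tag sequence. The sketch below shows that skeleton in PyTorch using the third-party pytorch-crf package; it is not the paper's TensorFlow/DeepPavlov code, and the hyperparameters are assumed.

```python
# Bi-LSTM-CRF skeleton for sequence tagging. Requires: pip install pytorch-crf
import torch
import torch.nn as nn
from torchcrf import CRF

class BiLSTMCRF(nn.Module):
    def __init__(self, vocab_size, num_tags, emb_dim=100, hidden=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)   # external embeddings could be loaded here
        self.lstm = nn.LSTM(emb_dim, hidden, bidirectional=True, batch_first=True)
        self.emit = nn.Linear(2 * hidden, num_tags)
        self.crf = CRF(num_tags, batch_first=True)

    def loss(self, tokens, tags):
        emissions = self.emit(self.lstm(self.embed(tokens))[0])
        return -self.crf(emissions, tags)                # negative log-likelihood

    def predict(self, tokens):
        emissions = self.emit(self.lstm(self.embed(tokens))[0])
        return self.crf.decode(emissions)                # best tag sequence per sentence

model = BiLSTMCRF(vocab_size=5000, num_tags=9)           # e.g. BIO tags for PER/LOC/ORG/MISC
tokens = torch.randint(0, 5000, (2, 7))
tags = torch.randint(0, 9, (2, 7))
print(model.loss(tokens, tags), model.predict(tokens))
```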
Making Neural QA as Simple as Possible but not Simpler
Title | Making Neural QA as Simple as Possible but not Simpler |
Authors | Dirk Weissenborn, Georg Wiese, Laura Seiffe |
Abstract | Recent development of large-scale question answering (QA) datasets triggered a substantial amount of research into end-to-end neural architectures for QA. Increasingly complex systems have been conceived without comparison to simpler neural baseline systems that would justify their complexity. In this work, we propose a simple heuristic that guides the development of neural baseline systems for the extractive QA task. We find that there are two ingredients necessary for building a high-performing neural QA system: first, the awareness of question words while processing the context and second, a composition function that goes beyond simple bag-of-words modeling, such as recurrent neural networks. Our results show that FastQA, a system that meets these two requirements, can achieve very competitive performance compared with existing models. We argue that this surprising finding puts results of previous systems and the complexity of recent QA datasets into perspective. |
Tasks | Question Answering, Reading Comprehension |
Published | 2017-03-14 |
URL | http://arxiv.org/abs/1703.04816v3 |
http://arxiv.org/pdf/1703.04816v3.pdf | |
PWC | https://paperswithcode.com/paper/making-neural-qa-as-simple-as-possible-but |
Repo | https://github.com/newmast/QA-Deep-Learning |
Framework | none |
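The two ingredients FastQA identifies translate into a very small model: a binary word-in-question feature appended to each context embedding, a recurrent composition over the context, and linear heads scoring the answer span. The sketch below follows that recipe in a condensed form; the sizes and the span scoring are simplified assumptions rather than the published system.

```python
# Condensed FastQA-style extractive QA model: word-in-question feature +
# BiLSTM composition + start/end span scoring.
import torch
import torch.nn as nn

class FastQASketch(nn.Module):
    def __init__(self, vocab=10000, dim=100, hidden=128):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        self.lstm = nn.LSTM(dim + 1, hidden, bidirectional=True, batch_first=True)
        self.start = nn.Linear(2 * hidden, 1)
        self.end = nn.Linear(2 * hidden, 1)

    def forward(self, context, question):
        # word-in-question feature: 1 if the context token also occurs in the question
        wiq = (context.unsqueeze(-1) == question.unsqueeze(1)).any(-1).float()
        x = torch.cat([self.embed(context), wiq.unsqueeze(-1)], dim=-1)
        h, _ = self.lstm(x)
        return self.start(h).squeeze(-1), self.end(h).squeeze(-1)   # span logits

model = FastQASketch()
ctx = torch.randint(0, 10000, (2, 50))
q = torch.randint(0, 10000, (2, 8))
start_logits, end_logits = model(ctx, q)
print(start_logits.shape)  # torch.Size([2, 50])
```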