July 29, 2019

3023 words 15 mins read

Paper Group AWR 98

Nighttime sky/cloud image segmentation. Minimizing Supervision for Free-space Segmentation. Hyperbolic Representation Learning for Fast and Efficient Neural Question Answering. Detecting and Explaining Causes From Text For a Time Series Event. Exploring text datasets by visualizing relevant words. Detecting Visual Relationships with Deep Relational …

Nighttime sky/cloud image segmentation

Title Nighttime sky/cloud image segmentation
Authors Soumyabrata Dev, Florian M. Savoy, Yee Hui Lee, Stefan Winkler
Abstract Imaging the atmosphere using ground-based sky cameras is a popular approach to study various atmospheric phenomena. However, it usually focuses on the daytime. Nighttime sky/cloud images are darker and noisier, and thus harder to analyze. An accurate segmentation of sky/cloud images is already challenging because of the clouds’ non-rigid structure and size, and the lower and less stable illumination of the night sky increases the difficulty. Nonetheless, nighttime cloud imaging is essential in certain applications, such as continuous weather analysis and satellite communication. In this paper, we propose a superpixel-based method to segment nighttime sky/cloud images. We also release the first nighttime sky/cloud image segmentation database to the research community. The experimental results show the efficacy of our proposed algorithm for nighttime images.
Tasks Semantic Segmentation
Published 2017-05-30
URL http://arxiv.org/abs/1705.10583v1
PDF http://arxiv.org/pdf/1705.10583v1.pdf
PWC https://paperswithcode.com/paper/nighttime-skycloud-image-segmentation
Repo https://github.com/Soumyabrata/nighttime-imaging
Framework none
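
A minimal sketch of the superpixel idea (not the authors' exact pipeline): oversegment the image with SLIC, then label each superpixel as cloud or sky from its mean brightness. The input filename and the 0.2 threshold are assumptions for illustration.

```python
import numpy as np
from skimage import color, io
from skimage.segmentation import slic

image = io.imread("night_sky.jpg")  # hypothetical nighttime sky image
segments = slic(image, n_segments=200, compactness=10, start_label=0)

gray = color.rgb2gray(image)
cloud_mask = np.zeros(gray.shape, dtype=bool)
for label in np.unique(segments):
    region = segments == label
    # Brighter superpixels are treated as cloud; the threshold is illustrative.
    cloud_mask[region] = gray[region].mean() > 0.2
```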

Minimizing Supervision for Free-space Segmentation

Title Minimizing Supervision for Free-space Segmentation
Authors Satoshi Tsutsui, Tommi Kerola, Shunta Saito, David J. Crandall
Abstract Identifying “free-space,” or safely driveable regions in the scene ahead, is a fundamental task for autonomous navigation. While this task can be addressed using semantic segmentation, the manual labor involved in creating pixelwise annotations to train the segmentation model is very costly. Although weakly supervised segmentation addresses this issue, most methods are not designed for free-space. In this paper, we observe that homogeneous texture and location are two key characteristics of free-space, and develop a novel, practical framework for free-space segmentation with minimal human supervision. Our experiments show that our framework performs better than other weakly supervised methods while using less supervision. Our work demonstrates the potential for performing free-space segmentation without tedious and costly manual annotation, which will be important for adapting autonomous driving systems to different types of vehicles and environments.
Tasks Autonomous Driving, Autonomous Navigation, Semantic Segmentation
Published 2017-11-16
URL http://arxiv.org/abs/1711.05998v3
PDF http://arxiv.org/pdf/1711.05998v3.pdf
PWC https://paperswithcode.com/paper/minimizing-supervision-for-free-space
Repo https://github.com/apple2373/min-seg-road
Framework pytorch
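
As a hedged illustration of the location-prior idea (not the paper's full framework), the sketch below marks superpixels that mostly overlap a bottom-centre seed region as free space. The file name and seed geometry are assumptions.

```python
import numpy as np
from skimage import io
from skimage.segmentation import slic

image = io.imread("road_scene.jpg")  # hypothetical driving-scene image
h, w = image.shape[:2]
segments = slic(image, n_segments=300, compactness=10, start_label=0)

# Location prior: bottom quarter of the image, middle half of its width.
seed = np.zeros((h, w), dtype=bool)
seed[int(0.75 * h):, int(0.25 * w):int(0.75 * w)] = True

free_space = np.zeros((h, w), dtype=bool)
for label in np.unique(segments):
    region = segments == label
    # A superpixel counts as free space if most of it lies inside the seed region.
    if (region & seed).sum() > 0.5 * region.sum():
        free_space[region] = True
```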

Hyperbolic Representation Learning for Fast and Efficient Neural Question Answering

Title Hyperbolic Representation Learning for Fast and Efficient Neural Question Answering
Authors Yi Tay, Luu Anh Tuan, Siu Cheung Hui
Abstract The dominant neural architectures in question-answer retrieval are based on recurrent or convolutional encoders configured with complex word matching layers. Given that recent architectural innovations are mostly new word interaction layers or attention-based matching mechanisms, it seems to be a well-established fact that these components are mandatory for good performance. Unfortunately, the memory and computation costs incurred by these complex mechanisms are undesirable for practical applications. As such, this paper tackles the question of whether it is possible to achieve competitive performance with simple neural architectures. We propose a simple but novel deep learning architecture for fast and efficient question-answer ranking and retrieval. More specifically, our proposed model, HyperQA, is a parameter-efficient neural network that outperforms other parameter-intensive models such as Attentive Pooling BiLSTMs and Multi-Perspective CNNs on multiple QA benchmarks. The novelty behind HyperQA is a pairwise ranking objective that models the relationship between question and answer embeddings in hyperbolic space instead of Euclidean space. This empowers our model with a self-organizing ability and enables automatic discovery of latent hierarchies while learning embeddings of questions and answers. Our model requires no feature engineering, no similarity matrix matching, no complicated attention mechanisms, and no over-parameterized layers, yet outperforms or remains competitive with many models that have these functionalities on multiple benchmarks.
Tasks Feature Engineering, Question Answering, Representation Learning
Published 2017-07-25
URL http://arxiv.org/abs/1707.07847v3
PDF http://arxiv.org/pdf/1707.07847v3.pdf
PWC https://paperswithcode.com/paper/hyperbolic-representation-learning-for-fast
Repo https://github.com/vanzytay/WSDM2018_HyperQA
Framework tf
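
A minimal PyTorch sketch of the core idea: score question-answer pairs by their distance in the Poincaré ball and train with a pairwise hinge ranking loss. The embedding dimension, margin, and the small random embeddings (assumed to lie inside the unit ball) are illustrative, not the paper's settings.

```python
import torch

def poincare_distance(u, v, eps=1e-5):
    # d(u, v) = arcosh(1 + 2 * ||u - v||^2 / ((1 - ||u||^2) * (1 - ||v||^2)))
    sq = ((u - v) ** 2).sum(-1)
    nu = 1.0 - (u ** 2).sum(-1).clamp(max=1.0 - eps)
    nv = 1.0 - (v ** 2).sum(-1).clamp(max=1.0 - eps)
    return torch.acosh(1.0 + 2.0 * sq / (nu * nv))

def ranking_loss(q, a_pos, a_neg, margin=1.0):
    # Correct answers should sit closer to the question than sampled negatives.
    d_pos = poincare_distance(q, a_pos)
    d_neg = poincare_distance(q, a_neg)
    return torch.clamp(margin + d_pos - d_neg, min=0.0).mean()

q, a_pos, a_neg = (0.1 * torch.randn(8, 50) for _ in range(3))
loss = ranking_loss(q, a_pos, a_neg)
```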

Detecting and Explaining Causes From Text For a Time Series Event

Title Detecting and Explaining Causes From Text For a Time Series Event
Authors Dongyeop Kang, Varun Gangal, Ang Lu, Zheng Chen, Eduard Hovy
Abstract Explaining underlying causes or effects about events is a challenging but valuable task. We define a novel problem of generating explanations of a time series event by (1) searching cause and effect relationships of the time series with textual data and (2) constructing a connecting chain between them to generate an explanation. To detect causal features from text, we propose a novel method based on the Granger causality of time series between features extracted from text such as N-grams, topics, sentiments, and their composition. The generation of the sequence of causal entities requires a commonsense causative knowledge base with efficient reasoning. To ensure good interpretability and appropriate lexical usage, we combine symbolic and neural representations, using a neural reasoning algorithm trained on commonsense causal tuples to predict the next cause step. Our quantitative and human analyses show empirical evidence that our method successfully extracts meaningful causal relationships between time series and textual features and generates appropriate explanations connecting them.
Tasks Time Series
Published 2017-07-27
URL http://arxiv.org/abs/1707.08852v1
PDF http://arxiv.org/pdf/1707.08852v1.pdf
PWC https://paperswithcode.com/paper/detecting-and-explaining-causes-from-text-for
Repo https://github.com/dykang/cgraph
Framework none
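
The causal-feature detection step can be illustrated with an off-the-shelf Granger causality test. The synthetic series below are stand-ins for the target time series and a text-derived feature (e.g. an n-gram frequency); this is a sketch of the testing idea, not the paper's full method.

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.stattools import grangercausalitytests

rng = np.random.default_rng(0)
target = rng.normal(size=200).cumsum()                      # e.g. a stock-price series
text_feature = np.roll(target, -3) + rng.normal(size=200)   # noisy feature leading the target by 3 steps

data = pd.DataFrame({"target": target, "feature": text_feature})
# Column order matters: this tests whether the second column Granger-causes the first.
results = grangercausalitytests(data[["target", "feature"]], maxlag=5, verbose=False)
p_value = results[3][0]["ssr_ftest"][1]
print(f"p-value at lag 3: {p_value:.4f}")
```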

Exploring text datasets by visualizing relevant words

Title Exploring text datasets by visualizing relevant words
Authors Franziska Horn, Leila Arras, Grégoire Montavon, Klaus-Robert Müller, Wojciech Samek
Abstract When working with a new dataset, it is important to first explore and familiarize oneself with it, before applying any advanced machine learning algorithms. However, to the best of our knowledge, no tools exist that quickly and reliably give insight into the contents of a selection of documents with respect to what distinguishes them from other documents belonging to different categories. In this paper we propose to extract ‘relevant words’ from a collection of texts, which summarize the contents of documents belonging to a certain class (or discovered cluster in the case of unlabeled datasets), and visualize them in word clouds to allow for a survey of salient features at a glance. We compare three methods for extracting relevant words and demonstrate the usefulness of the resulting word clouds by providing an overview of the classes contained in a dataset of scientific publications as well as by discovering trending topics from recent New York Times article snippets.
Tasks
Published 2017-07-17
URL http://arxiv.org/abs/1707.05261v1
PDF http://arxiv.org/pdf/1707.05261v1.pdf
PWC https://paperswithcode.com/paper/exploring-text-datasets-by-visualizing
Repo https://github.com/acdreyer/thesis
Framework none
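
A hedged approximation of the relevant-words idea: score each word by how much higher its mean TF-IDF is in one class than in the rest. The 20-newsgroups categories are stand-ins for the paper's corpora, and this is only close in spirit to the compared methods, not the authors' exact measure.

```python
import numpy as np
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer

data = fetch_20newsgroups(subset="train", categories=["sci.space", "rec.autos"])
tfidf = TfidfVectorizer(stop_words="english", max_features=20000)
X = tfidf.fit_transform(data.data)
vocab = np.array(tfidf.get_feature_names_out())

for class_idx, name in enumerate(data.target_names):
    in_class = X[data.target == class_idx].mean(axis=0).A1
    out_class = X[data.target != class_idx].mean(axis=0).A1
    # Words with the largest in-class vs. out-of-class score gap.
    top = np.argsort(in_class - out_class)[-10:][::-1]
    print(name, vocab[top])
```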

Detecting Visual Relationships with Deep Relational Networks

Title Detecting Visual Relationships with Deep Relational Networks
Authors Bo Dai, Yuqi Zhang, Dahua Lin
Abstract Relationships among objects play a crucial role in image understanding. Despite the great success of deep learning techniques in recognizing individual objects, reasoning about the relationships among objects remains a challenging task. Previous methods often treat this as a classification problem, considering each type of relationship (e.g. “ride”) or each distinct visual phrase (e.g. “person-ride-horse”) as a category. Such approaches are faced with significant difficulties caused by the high diversity of visual appearance for each kind of relationship or the large number of distinct visual phrases. We propose an integrated framework to tackle this problem. At the heart of this framework is the Deep Relational Network, a novel formulation designed specifically for exploiting the statistical dependencies between objects and their relationships. On two large datasets, the proposed method achieves substantial improvement over state-of-the-art.
Tasks
Published 2017-04-11
URL http://arxiv.org/abs/1704.03114v2
PDF http://arxiv.org/pdf/1704.03114v2.pdf
PWC https://paperswithcode.com/paper/detecting-visual-relationships-with-deep
Repo https://github.com/doubledaibo/drnet
Framework caffe2

Self-supervised Learning of Motion Capture

Title Self-supervised Learning of Motion Capture
Authors Hsiao-Yu Fish Tung, Hsiao-Wei Tung, Ersin Yumer, Katerina Fragkiadaki
Abstract Current state-of-the-art solutions for motion capture from a single camera are optimization driven: they optimize the parameters of a 3D human model so that its re-projection matches measurements in the video (e.g. person segmentation, optical flow, keypoint detections etc.). Optimization models are susceptible to local minima. This has been the bottleneck that forced the use of clean, green-screen-like backgrounds at capture time, manual initialization, or switching to multiple cameras as the input resource. In this work, we propose a learning-based motion capture model for single camera input. Instead of optimizing mesh and skeleton parameters directly, our model optimizes neural network weights that predict 3D shape and skeleton configurations given a monocular RGB video. Our model is trained using a combination of strong supervision from synthetic data, and self-supervision from differentiable rendering of (a) skeletal keypoints, (b) dense 3D mesh motion, and (c) human-background segmentation, in an end-to-end framework. Empirically we show our model combines the best of both worlds of supervised learning and test-time optimization: supervised learning initializes the model parameters in the right regime, ensuring good pose and surface initialization at test time, without manual effort. Self-supervision by back-propagating through differentiable rendering allows (unsupervised) adaptation of the model to the test data, and offers a much tighter fit than a pretrained fixed model. We show that the proposed model improves with experience and converges to low-error solutions where previous optimization methods fail.
Tasks Motion Capture, Optical Flow Estimation
Published 2017-12-04
URL http://arxiv.org/abs/1712.01337v1
PDF http://arxiv.org/pdf/1712.01337v1.pdf
PWC https://paperswithcode.com/paper/self-supervised-learning-of-motion-capture
Repo https://github.com/chingswy/HumanPoseMemo
Framework pytorch
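
A schematic PyTorch sketch of how the three self-supervision terms might be combined. The predicted quantities would come from differentiable rendering of the estimated mesh (omitted here), and the specific loss forms, weights, and tensor shapes are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def self_supervised_loss(kp_pred, kp_obs, flow_pred, flow_obs, mask_pred, mask_obs,
                         w_kp=1.0, w_flow=1.0, w_seg=1.0):
    loss_kp = F.mse_loss(kp_pred, kp_obs)                 # (a) re-projected skeletal keypoints
    loss_flow = F.l1_loss(flow_pred, flow_obs)            # (b) dense mesh motion vs. optical flow
    loss_seg = F.binary_cross_entropy(mask_pred, mask_obs)  # (c) person/background segmentation
    return w_kp * loss_kp + w_flow * loss_flow + w_seg * loss_seg

kp_p, kp_o = torch.rand(2, 17, 2), torch.rand(2, 17, 2)
flow_p, flow_o = torch.rand(2, 2, 64, 64), torch.rand(2, 2, 64, 64)
mask_p, mask_o = torch.rand(2, 1, 64, 64), torch.rand(2, 1, 64, 64).round()
loss = self_supervised_loss(kp_p, kp_o, flow_p, flow_o, mask_p, mask_o)
```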

Scene Recognition by Combining Local and Global Image Descriptors

Title Scene Recognition by Combining Local and Global Image Descriptors
Authors Jobin Wilson, Muhammad Arif
Abstract Object recognition is an important problem in computer vision, having diverse applications. In this work, we construct an end-to-end scene recognition pipeline consisting of feature extraction, encoding, pooling and classification. Our approach simultaneously utilizes global feature descriptors as well as local feature descriptors from images, to form a hybrid feature descriptor corresponding to each image. We utilize DAISY features associated with key points within images as our local feature descriptor and histogram of oriented gradients (HOG) corresponding to an entire image as a global descriptor. We make use of a bag-of-visual-words encoding and apply the Mini-Batch K-Means algorithm to reduce the complexity of our feature encoding scheme. A 2-level pooling procedure is used to combine DAISY and HOG features corresponding to each image. Finally, we experiment with a multi-class SVM classifier with several kernels, in a cross-validation setting, and tabulate our results on the fifteen scene categories dataset. The average accuracy of our model was 76.4% in the case of a 40%-60% random split of images into training and testing datasets respectively. The primary objective of this work is to clearly outline the practical implementation of a basic scene-recognition pipeline having a reasonable accuracy, in Python, using open-source libraries. A full implementation of the proposed model is available in our GitHub repository.
Tasks Object Recognition, Scene Recognition
Published 2017-02-21
URL http://arxiv.org/abs/1702.06850v1
PDF http://arxiv.org/pdf/1702.06850v1.pdf
PWC https://paperswithcode.com/paper/scene-recognition-by-combining-local-and
Repo https://github.com/flytxtds/scene-recognition
Framework none
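
Since the abstract spells out the pipeline, a condensed sketch is straightforward: DAISY descriptors encoded as a bag of visual words with MiniBatchKMeans, concatenated with a global HOG descriptor, then fed to an SVM. The step size, vocabulary size, and image size are illustrative, grayscale float images are assumed, and the 2-level pooling is simplified to a plain histogram.

```python
import numpy as np
from skimage.feature import daisy, hog
from skimage.transform import resize
from sklearn.cluster import MiniBatchKMeans
from sklearn.svm import SVC

def daisy_descriptors(img):
    d = daisy(img, step=16)              # grid of local descriptors
    return d.reshape(-1, d.shape[-1])

def hybrid_features(gray_images, n_words=200):
    all_desc = [daisy_descriptors(img) for img in gray_images]
    # Visual vocabulary built from all local descriptors.
    kmeans = MiniBatchKMeans(n_clusters=n_words, n_init=3).fit(np.vstack(all_desc))
    feats = []
    for img, desc in zip(gray_images, all_desc):
        hist = np.bincount(kmeans.predict(desc), minlength=n_words).astype(float)
        hist /= hist.sum()                                          # local BoVW histogram
        g = hog(resize(img, (128, 128)), pixels_per_cell=(16, 16))  # global HOG descriptor
        feats.append(np.concatenate([hist, g]))
    return np.array(feats)

# clf = SVC(kernel="rbf").fit(hybrid_features(train_images), train_labels)
```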

Detecting Oriented Text in Natural Images by Linking Segments

Title Detecting Oriented Text in Natural Images by Linking Segments
Authors Baoguang Shi, Xiang Bai, Serge Belongie
Abstract Most state-of-the-art text detection methods are specific to horizontal Latin text and are not fast enough for real-time applications. We introduce Segment Linking (SegLink), an oriented text detection method. The main idea is to decompose text into two locally detectable elements, namely segments and links. A segment is an oriented box covering a part of a word or text line; a link connects two adjacent segments, indicating that they belong to the same word or text line. Both elements are detected densely at multiple scales by an end-to-end trained, fully-convolutional neural network. Final detections are produced by combining segments connected by links. Compared with previous methods, SegLink improves along the dimensions of accuracy, speed, and ease of training. It achieves an f-measure of 75.0% on the standard ICDAR 2015 Incidental (Challenge 4) benchmark, outperforming the previous best by a large margin. It runs at over 20 FPS on 512x512 images. Moreover, without modification, SegLink is able to detect long lines of non-Latin text, such as Chinese.
Tasks Curved Text Detection, Scene Text Detection
Published 2017-03-19
URL http://arxiv.org/abs/1703.06520v3
PDF http://arxiv.org/pdf/1703.06520v3.pdf
PWC https://paperswithcode.com/paper/detecting-oriented-text-in-natural-images-by
Repo https://github.com/bgshih/seglink
Framework tf
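
The combining step is essentially connected components over segments joined by links. The tiny union-find illustration below shows only that grouping; detection of the segments and links themselves (the fully-convolutional network) is omitted.

```python
def combine_segments(num_segments, links):
    parent = list(range(num_segments))

    def find(i):
        # Find the group root, compressing the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for a, b in links:                 # each predicted link joins two segment ids
        parent[find(a)] = find(b)

    groups = {}
    for i in range(num_segments):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())

print(combine_segments(5, [(0, 1), (1, 2), (3, 4)]))  # [[0, 1, 2], [3, 4]]
```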

Single Shot Text Detector with Regional Attention

Title Single Shot Text Detector with Regional Attention
Authors Pan He, Weilin Huang, Tong He, Qile Zhu, Yu Qiao, Xiaolin Li
Abstract We present a novel single-shot text detector that directly outputs word-level bounding boxes in a natural image. We propose an attention mechanism which roughly identifies text regions via an automatically learned attentional map. This substantially suppresses background interference in the convolutional features, which is the key to producing accurate inference of words, particularly at extremely small sizes. This results in a single model that essentially works in a coarse-to-fine manner. It departs from recent FCN-based text detectors which cascade multiple FCN models to achieve an accurate prediction. Furthermore, we develop a hierarchical inception module which efficiently aggregates multi-scale inception features. This enhances local details, and also encodes strong context information, allowing the detector to work reliably on multi-scale and multi-orientation text with single-scale images. Our text detector achieves an F-measure of 77% on the ICDAR 2015 benchmark, advancing the state-of-the-art results in [18, 28]. Demo is available at: http://sstd.whuang.org/.
Tasks Scene Text Detection
Published 2017-09-01
URL http://arxiv.org/abs/1709.00138v1
PDF http://arxiv.org/pdf/1709.00138v1.pdf
PWC https://paperswithcode.com/paper/single-shot-text-detector-with-regional
Repo https://github.com/BestSonny/SSTD
Framework none
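
A schematic PyTorch module for the attention idea: predict a rough per-pixel text map from the convolutional features and use it to reweight them before the detection head. The channel size is an assumption, and the hierarchical inception module and the rest of the detector are omitted.

```python
import torch
import torch.nn as nn

class RegionalAttention(nn.Module):
    def __init__(self, channels=256):
        super().__init__()
        self.attn = nn.Sequential(
            nn.Conv2d(channels, 1, kernel_size=1),
            nn.Sigmoid(),                 # rough per-pixel text probability
        )

    def forward(self, feats):
        a = self.attn(feats)              # (N, 1, H, W) attention map
        return feats * a                  # background responses are suppressed

feats = torch.randn(1, 256, 64, 64)
out = RegionalAttention()(feats)
```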

A Discourse-Level Named Entity Recognition and Relation Extraction Dataset for Chinese Literature Text

Title A Discourse-Level Named Entity Recognition and Relation Extraction Dataset for Chinese Literature Text
Authors Jingjing Xu, Ji Wen, Xu Sun, Qi Su
Abstract Named Entity Recognition and Relation Extraction for Chinese literature text is regarded as a highly difficult problem, partially because of the lack of tagging sets. In this paper, we build a discourse-level dataset from hundreds of Chinese literature articles for improving this task. To build a high-quality dataset, we propose two tagging methods to solve the problem of data inconsistency, including a heuristic tagging method and a machine auxiliary tagging method. Based on this corpus, we also introduce several widely used models to conduct experiments. Experimental results not only show the usefulness of the proposed dataset, but also provide baselines for further research. The dataset is available at https://github.com/lancopku/Chinese-Literature-NER-RE-Dataset
Tasks Named Entity Recognition, Relation Extraction
Published 2017-11-19
URL https://arxiv.org/abs/1711.07010v5
PDF https://arxiv.org/pdf/1711.07010v5.pdf
PWC https://paperswithcode.com/paper/a-discourse-level-named-entity-recognition
Repo https://github.com/lancopku/Chinese-Literature-NER-RE-Dataset
Framework none

StarGAN: Unified Generative Adversarial Networks for Multi-Domain Image-to-Image Translation

Title StarGAN: Unified Generative Adversarial Networks for Multi-Domain Image-to-Image Translation
Authors Yunjey Choi, Minje Choi, Munyoung Kim, Jung-Woo Ha, Sunghun Kim, Jaegul Choo
Abstract Recent studies have shown remarkable success in image-to-image translation for two domains. However, existing approaches have limited scalability and robustness in handling more than two domains, since different models should be built independently for every pair of image domains. To address this limitation, we propose StarGAN, a novel and scalable approach that can perform image-to-image translations for multiple domains using only a single model. Such a unified model architecture of StarGAN allows simultaneous training of multiple datasets with different domains within a single network. This leads to StarGAN’s superior quality of translated images compared to existing models as well as the novel capability of flexibly translating an input image to any desired target domain. We empirically demonstrate the effectiveness of our approach on facial attribute transfer and facial expression synthesis tasks.
Tasks Image-to-Image Translation
Published 2017-11-24
URL http://arxiv.org/abs/1711.09020v3
PDF http://arxiv.org/pdf/1711.09020v3.pdf
PWC https://paperswithcode.com/paper/stargan-unified-generative-adversarial
Repo https://github.com/cosmic119/StarGAN
Framework pytorch
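
StarGAN's single-generator trick can be sketched by broadcasting the target-domain label over the spatial dimensions and concatenating it to the input image channels, so one network serves every domain. The tiny convolution below is only a stand-in for the real generator, and the domain count and image size are illustrative.

```python
import torch
import torch.nn as nn

def concat_domain_label(images, labels):
    # images: (N, 3, H, W); labels: (N, n_domains) one-hot target-domain codes
    n, _, h, w = images.shape
    label_maps = labels.view(n, -1, 1, 1).expand(n, labels.size(1), h, w)
    return torch.cat([images, label_maps], dim=1)

n_domains = 5
generator = nn.Conv2d(3 + n_domains, 3, kernel_size=3, padding=1)  # stand-in for the real G
x = torch.randn(4, 3, 128, 128)
target = torch.eye(n_domains)[torch.randint(n_domains, (4,))]
fake = generator(concat_domain_label(x, target))
```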

Concrete Dropout

Title Concrete Dropout
Authors Yarin Gal, Jiri Hron, Alex Kendall
Abstract Dropout is used as a practical tool to obtain uncertainty estimates in large vision models and reinforcement learning (RL) tasks. But to obtain well-calibrated uncertainty estimates, a grid-search over the dropout probabilities is necessary - a prohibitive operation with large models, and an impossible one with RL. We propose a new dropout variant which gives improved performance and better calibrated uncertainties. Relying on recent developments in Bayesian deep learning, we use a continuous relaxation of dropout’s discrete masks. Together with a principled optimisation objective, this allows for automatic tuning of the dropout probability in large models, and as a result faster experimentation cycles. In RL this allows the agent to adapt its uncertainty dynamically as more data is observed. We analyse the proposed variant extensively on a range of tasks, and give insights into common practice in the field where larger dropout probabilities are often used in deeper model layers.
Tasks
Published 2017-05-22
URL http://arxiv.org/abs/1705.07832v1
PDF http://arxiv.org/pdf/1705.07832v1.pdf
PWC https://paperswithcode.com/paper/concrete-dropout
Repo https://github.com/yaringal/ConcreteDropout
Framework none
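
The core relaxation can be sketched in a few lines of PyTorch: the dropout probability is a learnable parameter, and the Bernoulli mask is replaced by a sigmoid-relaxed (Concrete) sample so gradients flow into p. The temperature and initial p are illustrative, and the weight and dropout regularisation terms of the full objective are omitted; see the authors' repository for the complete layer.

```python
import torch
import torch.nn as nn

class ConcreteDropout(nn.Module):
    def __init__(self, init_p=0.1, temperature=0.1):
        super().__init__()
        self.temperature = temperature
        # Store p through its logit so it stays in (0, 1) under gradient updates.
        self.p_logit = nn.Parameter(torch.log(torch.tensor(init_p / (1 - init_p))))

    def forward(self, x, eps=1e-7):
        p = torch.sigmoid(self.p_logit)
        u = torch.rand_like(x)
        # Relaxed Bernoulli (Concrete) drop probability per unit.
        drop_prob = torch.sigmoid(
            (torch.log(p + eps) - torch.log(1 - p + eps)
             + torch.log(u + eps) - torch.log(1 - u + eps)) / self.temperature
        )
        mask = 1 - drop_prob
        return x * mask / (1 - p)         # rescale to keep the expected activation

layer = ConcreteDropout()
out = layer(torch.randn(32, 100))
```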

Riemannian Optimization for Skip-Gram Negative Sampling

Title Riemannian Optimization for Skip-Gram Negative Sampling
Authors Alexander Fonarev, Oleksii Hrinchuk, Gleb Gusev, Pavel Serdyukov, Ivan Oseledets
Abstract The Skip-Gram Negative Sampling (SGNS) word embedding model, well known through its implementation in the “word2vec” software, is usually optimized by stochastic gradient descent. However, the optimization of the SGNS objective can be viewed as a problem of searching for a good matrix under a low-rank constraint. The most standard way to solve this type of problem is to apply the Riemannian optimization framework to optimize the SGNS objective over the manifold of required low-rank matrices. In this paper, we propose an algorithm that optimizes the SGNS objective using Riemannian optimization and demonstrate its superiority over popular competitors, such as the original method to train SGNS and SVD over the SPPMI matrix.
Tasks
Published 2017-04-26
URL http://arxiv.org/abs/1704.08059v1
PDF http://arxiv.org/pdf/1704.08059v1.pdf
PWC https://paperswithcode.com/paper/riemannian-optimization-for-skip-gram
Repo https://github.com/AlexGrinch/ro_sgns
Framework none
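
An illustrative low-rank optimization step, assuming a dense matrix for simplicity: move along the ambient gradient, then retract onto the rank-d manifold with a truncated SVD. The paper itself uses a more efficient retraction tailored to the SGNS objective, so this is only a sketch of the general Riemannian idea; the sizes, rank, and step size are arbitrary.

```python
import numpy as np

def retract_to_rank(X, d):
    # Project back onto the manifold of rank-d matrices via truncated SVD.
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U[:, :d] * s[:d]) @ Vt[:d]

rng = np.random.default_rng(0)
X = retract_to_rank(rng.normal(size=(300, 300)), d=50)   # current low-rank iterate
grad = rng.normal(size=X.shape)                          # stand-in for the SGNS gradient
X_next = retract_to_rank(X - 0.01 * grad, d=50)          # gradient step, then retraction
```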

Scalable Training of Artificial Neural Networks with Adaptive Sparse Connectivity inspired by Network Science

Title Scalable Training of Artificial Neural Networks with Adaptive Sparse Connectivity inspired by Network Science
Authors Decebal Constantin Mocanu, Elena Mocanu, Peter Stone, Phuong H. Nguyen, Madeleine Gibescu, Antonio Liotta
Abstract Through the success of deep learning in various domains, artificial neural networks are currently among the most used artificial intelligence methods. Taking inspiration from the network properties of biological neural networks (e.g. sparsity, scale-freeness), we argue that (contrary to general practice) artificial neural networks, too, should not have fully-connected layers. Here we propose sparse evolutionary training of artificial neural networks, an algorithm which evolves an initial sparse topology (Erdős-Rényi random graph) of two consecutive layers of neurons into a scale-free topology, during learning. Our method replaces artificial neural networks’ fully-connected layers with sparse ones before training, quadratically reducing the number of parameters, with no decrease in accuracy. We demonstrate our claims on restricted Boltzmann machines, multi-layer perceptrons, and convolutional neural networks for unsupervised and supervised learning on 15 datasets. Our approach has the potential to enable artificial neural networks to scale up beyond what is currently possible.
Tasks
Published 2017-07-15
URL http://arxiv.org/abs/1707.04780v2
PDF http://arxiv.org/pdf/1707.04780v2.pdf
PWC https://paperswithcode.com/paper/scalable-training-of-artificial-neural
Repo https://github.com/gru2/DoubleBlockSparse
Framework none
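
One rewiring epoch of sparse evolutionary training for a single weight matrix can be sketched as follows: prune the fraction zeta of active connections with the smallest magnitude, then regrow the same number at random inactive positions, keeping the parameter count constant. The zeta value, layer sizes, sparsity level, and initialisation scale are illustrative.

```python
import numpy as np

def set_rewire(weights, mask, zeta=0.3, rng=None):
    rng = rng or np.random.default_rng()
    active = np.flatnonzero(mask)
    n_prune = int(zeta * active.size)
    # Remove the fraction zeta of active weights closest to zero.
    prune = active[np.argsort(np.abs(weights.flat[active]))[:n_prune]]
    mask.flat[prune] = False
    weights.flat[prune] = 0.0
    # Regrow the same number of connections at random inactive positions.
    inactive = np.flatnonzero(~mask)
    grow = rng.choice(inactive, size=n_prune, replace=False)
    mask.flat[grow] = True
    weights.flat[grow] = rng.normal(scale=0.01, size=n_prune)
    return weights, mask

rng = np.random.default_rng(0)
w = rng.normal(scale=0.01, size=(784, 300))
m = rng.random((784, 300)) < 0.05       # ~5% initial Erdős-Rényi-style sparsity
w *= m
w, m = set_rewire(w, m, rng=rng)        # one evolutionary rewiring step
```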