Paper Group AWR 161
Forward and Reverse Gradient-Based Hyperparameter Optimization
Title | Forward and Reverse Gradient-Based Hyperparameter Optimization |
Authors | Luca Franceschi, Michele Donini, Paolo Frasconi, Massimiliano Pontil |
Abstract | We study two procedures (reverse-mode and forward-mode) for computing the gradient of the validation error with respect to the hyperparameters of any iterative learning algorithm such as stochastic gradient descent. These procedures mirror two methods of computing gradients for recurrent neural networks and have different trade-offs in terms of running time and space requirements. Our formulation of the reverse-mode procedure is linked to previous work by Maclaurin et al. [2015] but does not require reversible dynamics. The forward-mode procedure is suitable for real-time hyperparameter updates, which may significantly speed up hyperparameter optimization on large datasets. We present experiments on data cleaning and on learning task interactions. We also present one large-scale experiment where the use of previous gradient-based methods would be prohibitive. |
Tasks | Hyperparameter Optimization |
Published | 2017-03-06 |
URL | http://arxiv.org/abs/1703.01785v3 |
http://arxiv.org/pdf/1703.01785v3.pdf | |
PWC | https://paperswithcode.com/paper/forward-and-reverse-gradient-based |
Repo | https://github.com/lucfra/FAR-HO |
Framework | tf |
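As a rough illustration of the forward-mode idea in the abstract above, the toy sketch below carries the derivative of the weight with respect to the learning rate through every gradient-descent step, then chains it with the validation error. The 1-D quadratic losses, the constants, and the function names are assumptions made for illustration; this is not the FAR-HO implementation.

```python
def train(lr, steps=20, w0=0.0, a=2.0, target=1.0):
    """Gradient descent on 0.5*a*(w - target)**2, tracking z = dw/dlr."""
    w, z = w0, 0.0
    for _ in range(steps):
        grad = a * (w - target)
        # Differentiate the update w <- w - lr * grad with respect to lr.
        z = z - grad - lr * a * z
        w = w - lr * grad
    return w, z


def val_error(w, v=0.8):
    """Hypothetical validation error: squared distance to a held-out target v."""
    return 0.5 * (w - v) ** 2


def hypergradient(lr, v=0.8):
    """d(val_error)/d(lr), obtained by the chain rule through w_T."""
    w, z = train(lr)
    return (w - v) * z
```

Because `z` is updated alongside `w`, the hypergradient is available after any number of steps without storing the trajectory, which is the property that makes forward mode usable for real-time hyperparameter updates. A central finite difference on `val_error(train(lr)[0])` is a simple way to sanity-check the value.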
Hierarchical RNN with Static Sentence-Level Attention for Text-Based Speaker Change Detection
Title | Hierarchical RNN with Static Sentence-Level Attention for Text-Based Speaker Change Detection |
Authors | Zhao Meng, Lili Mou, Zhi Jin |
Abstract | Speaker change detection (SCD) is an important task in dialog modeling. Our paper addresses the problem of text-based SCD, which differs from existing audio-based studies and is useful in various scenarios, for example, processing dialog transcripts where speaker identities are missing (e.g., OpenSubtitle), and enhancing audio SCD with textual information. We formulate text-based SCD as a matching problem of utterances before and after a certain decision point; we propose a hierarchical recurrent neural network (RNN) with static sentence-level attention. Experimental results show that neural networks consistently achieve better performance than feature-based approaches, and that our attention-based model significantly outperforms non-attention neural networks. |
Tasks | |
Published | 2017-03-22 |
URL | http://arxiv.org/abs/1703.07713v2 |
http://arxiv.org/pdf/1703.07713v2.pdf | |
PWC | https://paperswithcode.com/paper/hierarchical-rnn-with-static-sentence-level |
Repo | https://github.com/stoplime/OpenAudioAI |
Framework | none |
Good Semi-supervised Learning that Requires a Bad GAN
Title | Good Semi-supervised Learning that Requires a Bad GAN |
Authors | Zihang Dai, Zhilin Yang, Fan Yang, William W. Cohen, Ruslan Salakhutdinov |
Abstract | Semi-supervised learning methods based on generative adversarial networks (GANs) obtained strong empirical results, but it is not clear 1) how the discriminator benefits from joint training with a generator, and 2) why good semi-supervised classification performance and a good generator cannot be obtained at the same time. Theoretically, we show that given the discriminator objective, good semi-supervised learning indeed requires a bad generator, and propose the definition of a preferred generator. Empirically, we derive a novel formulation based on our analysis that substantially improves over feature matching GANs, obtaining state-of-the-art results on multiple benchmark datasets. |
Tasks | Semi-Supervised Image Classification |
Published | 2017-05-27 |
URL | http://arxiv.org/abs/1705.09783v3 |
http://arxiv.org/pdf/1705.09783v3.pdf | |
PWC | https://paperswithcode.com/paper/good-semi-supervised-learning-that-requires-a |
Repo | https://github.com/kimiyoung/ssl_bad_gan |
Framework | pytorch |
“Liar, Liar Pants on Fire”: A New Benchmark Dataset for Fake News Detection
Title | “Liar, Liar Pants on Fire”: A New Benchmark Dataset for Fake News Detection |
Authors | William Yang Wang |
Abstract | Automatic fake news detection is a challenging problem in deception detection, and it has tremendous real-world political and social impacts. However, statistical approaches to combating fake news have been dramatically limited by the lack of labeled benchmark datasets. In this paper, we present LIAR: a new, publicly available dataset for fake news detection. We collected a decade-long, 12.8K manually labeled short statements in various contexts from PolitiFact.com, which provides a detailed analysis report and links to source documents for each case. This dataset can be used for fact-checking research as well. Notably, this new dataset is an order of magnitude larger than the previously largest public fake news datasets of similar type. Empirically, we investigate automatic fake news detection based on surface-level linguistic patterns. We have designed a novel, hybrid convolutional neural network to integrate meta-data with text. We show that this hybrid approach can improve a text-only deep learning model. |
Tasks | Deception Detection, Fake News Detection |
Published | 2017-05-01 |
URL | http://arxiv.org/abs/1705.00648v1 |
http://arxiv.org/pdf/1705.00648v1.pdf | |
PWC | https://paperswithcode.com/paper/liar-liar-pants-on-fire-a-new-benchmark |
Repo | https://github.com/JelenaBanjac/AppliedDataAnalysis |
Framework | none |
Unsupervised Domain Adaptation for 3D Keypoint Estimation via View Consistency
Title | Unsupervised Domain Adaptation for 3D Keypoint Estimation via View Consistency |
Authors | Xingyi Zhou, Arjun Karpur, Chuang Gan, Linjie Luo, Qixing Huang |
Abstract | In this paper, we introduce a novel unsupervised domain adaptation technique for the task of 3D keypoint prediction from a single depth scan or image. Our key idea is to utilize the fact that predictions from different views of the same or similar objects should be consistent with each other. Such view consistency can provide effective regularization for keypoint prediction on unlabeled instances. In addition, we introduce a geometric alignment term to regularize predictions in the target domain. The resulting loss function can be effectively optimized via alternating minimization. We demonstrate the effectiveness of our approach on real datasets and present experimental results showing that our approach is superior to state-of-the-art general-purpose domain adaptation techniques. |
Tasks | Domain Adaptation, Unsupervised Domain Adaptation |
Published | 2017-12-15 |
URL | http://arxiv.org/abs/1712.05765v2 |
http://arxiv.org/pdf/1712.05765v2.pdf | |
PWC | https://paperswithcode.com/paper/unsupervised-domain-adaptation-for-3d |
Repo | https://github.com/xingyizhou/3DKeypoints-DA |
Framework | pytorch |
Learning to Look Around: Intelligently Exploring Unseen Environments for Unknown Tasks
Title | Learning to Look Around: Intelligently Exploring Unseen Environments for Unknown Tasks |
Authors | Dinesh Jayaraman, Kristen Grauman |
Abstract | It is common to implicitly assume access to intelligently captured inputs (e.g., photos from a human photographer), yet autonomously capturing good observations is itself a major challenge. We address the problem of learning to look around: if a visual agent has the ability to voluntarily acquire new views to observe its environment, how can it learn efficient exploratory behaviors to acquire informative observations? We propose a reinforcement learning solution, where the agent is rewarded for actions that reduce its uncertainty about the unobserved portions of its environment. Based on this principle, we develop a recurrent neural network-based approach to perform active completion of panoramic natural scenes and 3D object shapes. Crucially, the learned policies are not tied to any recognition task nor to the particular semantic content seen during training. As a result, 1) the learned “look around” behavior is relevant even for new tasks in unseen environments, and 2) training data acquisition involves no manual labeling. Through tests in diverse settings, we demonstrate that our approach learns useful generic policies that transfer to new unseen tasks and environments. Completion episodes are shown at https://goo.gl/BgWX3W. |
Tasks | |
Published | 2017-09-01 |
URL | http://arxiv.org/abs/1709.00507v2 |
http://arxiv.org/pdf/1709.00507v2.pdf | |
PWC | https://paperswithcode.com/paper/learning-to-look-around-intelligently |
Repo | https://github.com/dineshj1/lookaround |
Framework | torch |
Incorporating Feedback into Tree-based Anomaly Detection
Title | Incorporating Feedback into Tree-based Anomaly Detection |
Authors | Shubhomoy Das, Weng-Keen Wong, Alan Fern, Thomas G. Dietterich, Md Amran Siddiqui |
Abstract | Anomaly detectors are often used to produce a ranked list of statistical anomalies, which are examined by human analysts in order to extract the actual anomalies of interest. Unfortunately, in real-world applications, this process can be exceedingly difficult for the analyst since a large fraction of high-ranking anomalies are false positives and not interesting from the application perspective. In this paper, we aim to make the analyst’s job easier by allowing for analyst feedback during the investigation process. Ideally, the feedback influences the ranking of the anomaly detector in a way that reduces the number of false positives that must be examined before discovering the anomalies of interest. In particular, we introduce a novel technique for incorporating simple binary feedback into tree-based anomaly detectors. We focus on the Isolation Forest algorithm as a representative tree-based anomaly detector, and show that we can significantly improve its performance by incorporating feedback, when compared with the baseline algorithm that does not incorporate feedback. Our technique is simple and scales well as the size of the data increases, which makes it suitable for interactive discovery of anomalies in large datasets. |
Tasks | Anomaly Detection |
Published | 2017-08-30 |
URL | http://arxiv.org/abs/1708.09441v1 |
http://arxiv.org/pdf/1708.09441v1.pdf | |
PWC | https://paperswithcode.com/paper/incorporating-feedback-into-tree-based |
Repo | https://github.com/shubhomoydas/ad_examples |
Framework | tf |
Neural Ranking Models with Weak Supervision
Title | Neural Ranking Models with Weak Supervision |
Authors | Mostafa Dehghani, Hamed Zamani, Aliaksei Severyn, Jaap Kamps, W. Bruce Croft |
Abstract | Despite the impressive improvements achieved by unsupervised deep neural networks in computer vision and NLP tasks, such improvements have not yet been observed in ranking for information retrieval. The reason may be the complexity of the ranking problem, as it is not obvious how to learn from queries and documents when no supervised signal is available. Hence, in this paper, we propose to train a neural ranking model using weak supervision, where labels are obtained automatically without human annotators or any external resources (e.g., click data). To this aim, we use the output of an unsupervised ranking model, such as BM25, as a weak supervision signal. We further train a set of simple yet effective ranking models based on feed-forward neural networks. We study their effectiveness under various learning scenarios (point-wise and pair-wise models) and using different input representations (i.e., from encoding query-document pairs into dense/sparse vectors to using word embedding representation). We train our networks using tens of millions of training instances and evaluate them on two standard collections: a homogeneous news collection (Robust) and a heterogeneous large-scale web collection (ClueWeb). Our experiments indicate that employing proper objective functions and letting the networks learn the input representation based on weakly supervised data leads to impressive performance, with over 13% and 35% MAP improvements over the BM25 model on the Robust and the ClueWeb collections. Our findings also suggest that supervised neural ranking models can greatly benefit from pre-training on large amounts of weakly labeled data that can be easily obtained from unsupervised IR models. |
Tasks | Ad-Hoc Information Retrieval, Information Retrieval |
Published | 2017-04-28 |
URL | http://arxiv.org/abs/1704.08803v2 |
http://arxiv.org/pdf/1704.08803v2.pdf | |
PWC | https://paperswithcode.com/paper/neural-ranking-models-with-weak-supervision |
Repo | https://github.com/mikvrax/TrecingLab |
Framework | none |
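The weak-supervision signal described in the abstract above can be sketched as follows: score every document with BM25 and turn the score ordering into pair-wise training labels. This is a minimal sketch under stated assumptions. The whitespace tokenizer, the `k1` and `b` defaults, the non-negative IDF variant, and the helper names `bm25_scores` and `weak_pairs` are illustrative choices, not details taken from the paper.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Okapi BM25 score of each document for the query (whitespace tokens)."""
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(d) for d in tokenized) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter(t for d in tokenized for t in set(d))
    scores = []
    for d in tokenized:
        tf = Counter(d)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def weak_pairs(query, docs):
    """Pairs (i, j) meaning: BM25 ranks docs[i] above docs[j] for this query."""
    scores = bm25_scores(query, docs)
    n = len(docs)
    return [(i, j) for i in range(n) for j in range(n) if scores[i] > scores[j]]
```

A pair-wise ranker would then be trained to reproduce these orderings. The labels come from BM25 alone, so no human annotation or click data is involved.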
Compressed Sensing using Generative Models
Title | Compressed Sensing using Generative Models |
Authors | Ashish Bora, Ajil Jalal, Eric Price, Alexandros G. Dimakis |
Abstract | The goal of compressed sensing is to estimate a vector from an underdetermined system of noisy linear measurements, by making use of prior knowledge on the structure of vectors in the relevant domain. For almost all results in this literature, the structure is represented by sparsity in a well-chosen basis. We show how to achieve guarantees similar to standard compressed sensing but without employing sparsity at all. Instead, we suppose that vectors lie near the range of a generative model $G: \mathbb{R}^k \to \mathbb{R}^n$. Our main theorem is that, if $G$ is $L$-Lipschitz, then roughly $O(k \log L)$ random Gaussian measurements suffice for an $\ell_2/\ell_2$ recovery guarantee. We demonstrate our results using generative models from published variational autoencoder and generative adversarial networks. Our method can use $5$-$10$x fewer measurements than Lasso for the same accuracy. |
Tasks | |
Published | 2017-03-09 |
URL | http://arxiv.org/abs/1703.03208v1 |
http://arxiv.org/pdf/1703.03208v1.pdf | |
PWC | https://paperswithcode.com/paper/compressed-sensing-using-generative-models |
Repo | https://github.com/gabsens/Compressed-Sensing-ENSAE |
Framework | none |
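To make the recovery idea in the abstract above concrete, the sketch below searches the latent space of a generator for a code whose output matches the measurements, by running gradient descent on the squared measurement error. It assumes a linear stand-in generator G(z) = W z so that it runs with the standard library only; with a linear G the problem reduces to ordinary least squares, whereas the abstract refers to VAE and GAN generators. All names, sizes, and the step-size rule are assumptions made for illustration.

```python
import random


def matvec(M, v):
    return [sum(a * b for a, b in zip(row, v)) for row in M]


def transpose(M):
    return [list(col) for col in zip(*M)]


def recover(A, W, y, steps=5000):
    """Gradient descent over z on 0.5 * ||A G(z) - y||^2 with G(z) = W z."""
    M = [matvec(transpose(W), row) for row in A]   # M = A W, shape m x k
    Mt = transpose(M)
    # 1 / ||M||_F^2 is a safe step: it never exceeds 1 / lambda_max(M^T M).
    step = 1.0 / sum(v * v for row in M for v in row)
    z = [0.0] * len(W[0])
    for _ in range(steps):
        residual = [p - q for p, q in zip(matvec(M, z), y)]
        grad = matvec(Mt, residual)
        z = [zi - step * gi for zi, gi in zip(z, grad)]
    return matvec(W, z)                            # reconstructed signal G(z)


rng = random.Random(0)
n, k, m = 30, 3, 10    # signal size, latent size, number of measurements
W = [[rng.gauss(0, 1) for _ in range(k)] for _ in range(n)]
A = [[rng.gauss(0, 1) / m ** 0.5 for _ in range(n)] for _ in range(m)]
x_true = matvec(W, [rng.gauss(0, 1) for _ in range(k)])
y = matvec(A, x_true)                              # noiseless measurements
x_hat = recover(A, W, y)
```

Here 10 measurements recover a 30-dimensional signal because the signal lies in the range of a 3-dimensional generator, which is the role that sparsity plays in standard compressed sensing.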
Character-Based Handwritten Text Transcription with Attention Networks
Title | Character-Based Handwritten Text Transcription with Attention Networks |
Authors | Jason Poulos, Rafael Valle |
Abstract | The paper approaches the task of handwritten text transcription with attentional encoder-decoder networks that are trained on sequences of characters. We experiment on lines of text from a popular handwriting database and compare different attention mechanisms for the decoder. The model trained with softmax attention achieves the lowest test error, outperforming several other RNN-based models. Softmax attention is able to learn a linear alignment between image pixels and target characters whereas the alignment generated by sigmoid attention is linear but much less precise. When no function is used to obtain attention weights, the model performs poorly because it lacks a precise alignment between the source and text output. |
Tasks | |
Published | 2017-12-11 |
URL | http://arxiv.org/abs/1712.04046v2 |
http://arxiv.org/pdf/1712.04046v2.pdf | |
PWC | https://paperswithcode.com/paper/attention-networks-for-image-to-text |
Repo | https://github.com/jvpoulos/Attention-OCR |
Framework | tf |
Beyond RGB: Very High Resolution Urban Remote Sensing With Multimodal Deep Networks
Title | Beyond RGB: Very High Resolution Urban Remote Sensing With Multimodal Deep Networks |
Authors | Nicolas Audebert, Bertrand Le Saux, Sébastien Lefèvre |
Abstract | In this work, we investigate various methods to deal with semantic labeling of very high resolution multi-modal remote sensing data. In particular, we study how deep fully convolutional networks can be adapted to deal with multi-modal and multi-scale remote sensing data for semantic labeling. Our contributions are threefold: a) we present an efficient multi-scale approach to leverage both a large spatial context and the high resolution data, b) we investigate early and late fusion of Lidar and multispectral data, c) we validate our methods on two public datasets with state-of-the-art results. Our results indicate that late fusion makes it possible to recover errors stemming from ambiguous data, while early fusion allows for better joint-feature learning but at the cost of higher sensitivity to missing data. |
Tasks | |
Published | 2017-11-23 |
URL | http://arxiv.org/abs/1711.08681v1 |
http://arxiv.org/pdf/1711.08681v1.pdf | |
PWC | https://paperswithcode.com/paper/beyond-rgb-very-high-resolution-urban-remote |
Repo | https://github.com/nshaud/DeepNetsForEO |
Framework | pytorch |
Effective Use of Dilated Convolutions for Segmenting Small Object Instances in Remote Sensing Imagery
Title | Effective Use of Dilated Convolutions for Segmenting Small Object Instances in Remote Sensing Imagery |
Authors | Ryuhei Hamaguchi, Aito Fujita, Keisuke Nemoto, Tomoyuki Imaizumi, Shuhei Hikosaka |
Abstract | Thanks to recent advances in CNNs, solid improvements have been made in semantic segmentation of high resolution remote sensing imagery. However, most of the previous works have not fully taken into account the specific difficulties that exist in remote sensing tasks. One such difficulty is that objects are small and crowded in remote sensing imagery. To tackle this challenging task, we propose a novel architecture called the local feature extraction (LFE) module, attached on top of a dilated front-end module. The LFE module is based on our finding that aggressively increasing dilation factors fails to aggregate local features due to sparsity of the kernel, and is detrimental to small objects. The proposed LFE module solves this problem by aggregating local features with decreasing dilation factors. We tested our network on three remote sensing datasets and acquired remarkably good results for all datasets, especially for small objects. |
Tasks | Semantic Segmentation |
Published | 2017-09-01 |
URL | http://arxiv.org/abs/1709.00179v1 |
http://arxiv.org/pdf/1709.00179v1.pdf | |
PWC | https://paperswithcode.com/paper/effective-use-of-dilated-convolutions-for |
Repo | https://github.com/minerva-ml/open-solution-mapping-challenge |
Framework | pytorch |
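The kernel sparsity mentioned in the abstract above can be seen by listing which input positions a dilated kernel actually reads. The sketch below does this for a 1-D, stride-1 convolution; the helper names and the example dilation schedules are assumptions for illustration and are not the configuration used in the paper.

```python
def taps(center, dilation, kernel_size=3):
    """Input positions read by one output of a 1-D dilated convolution."""
    half = kernel_size // 2
    return [center + dilation * k for k in range(-half, half + 1)]


def skipped_between_taps(dilation):
    """Number of input positions ignored between two neighboring taps."""
    return dilation - 1


def receptive_field(dilations, kernel_size=3):
    """Receptive field of a stack of stride-1 dilated convolutions."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)
```

For example, `taps(10, 4)` returns `[6, 10, 14]`, so three neighboring inputs are skipped between taps and a small object can fall in the gaps. A schedule that first increases and then decreases, such as `[1, 2, 4, 2, 1]`, keeps a wide receptive field (21 positions here) while the final layers read adjacent positions again.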
Algorithms for Semantic Segmentation of Multispectral Remote Sensing Imagery using Deep Learning
Title | Algorithms for Semantic Segmentation of Multispectral Remote Sensing Imagery using Deep Learning |
Authors | Ronald Kemker, Carl Salvaggio, Christopher Kanan |
Abstract | Deep convolutional neural networks (DCNNs) have been used to achieve state-of-the-art performance on many computer vision tasks (e.g., object recognition, object detection, semantic segmentation) thanks to a large repository of annotated image data. Large labeled datasets for other sensor modalities, e.g., multispectral imagery (MSI), are not available due to the large cost and manpower required. In this paper, we adapt state-of-the-art DCNN frameworks in computer vision for semantic segmentation for MSI imagery. To overcome label scarcity for MSI data, we substitute generated synthetic MSI for real MSI in order to initialize a DCNN framework. We evaluate our network initialization scheme on the new RIT-18 dataset that we present in this paper. This dataset contains very-high resolution MSI collected by an unmanned aircraft system. The models initialized with synthetic imagery were less prone to over-fitting and provide a state-of-the-art baseline for future work. |
Tasks | Object Detection, Object Recognition, Semantic Segmentation |
Published | 2017-03-19 |
URL | http://arxiv.org/abs/1703.06452v3 |
http://arxiv.org/pdf/1703.06452v3.pdf | |
PWC | https://paperswithcode.com/paper/algorithms-for-semantic-segmentation-of |
Repo | https://github.com/rmkemker/RIT-18 |
Framework | none |
Sub-sampled Cubic Regularization for Non-convex Optimization
Title | Sub-sampled Cubic Regularization for Non-convex Optimization |
Authors | Jonas Moritz Kohler, Aurelien Lucchi |
Abstract | We consider the minimization of non-convex functions that typically arise in machine learning. Specifically, we focus our attention on a variant of trust region methods known as cubic regularization. This approach is particularly attractive because it escapes strict saddle points and it provides stronger convergence guarantees than first- and second-order as well as classical trust region methods. However, it suffers from a high computational complexity that makes it impractical for large-scale learning. Here, we propose a novel method that uses sub-sampling to lower this computational cost. By the use of concentration inequalities we provide a sampling scheme that gives sufficiently accurate gradient and Hessian approximations to retain the strong global and local convergence guarantees of cubically regularized methods. To the best of our knowledge this is the first work that gives global convergence guarantees for a sub-sampled variant of cubic regularization on non-convex functions. Furthermore, we provide experimental results supporting our theory. |
Tasks | |
Published | 2017-05-16 |
URL | http://arxiv.org/abs/1705.05933v3 |
http://arxiv.org/pdf/1705.05933v3.pdf | |
PWC | https://paperswithcode.com/paper/sub-sampled-cubic-regularization-for-non |
Repo | https://github.com/dalab/subsampled_cubic_regularization |
Framework | none |
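A one-dimensional toy version of the sub-sampled cubic regularization idea in the abstract above: each iteration estimates the gradient and Hessian from a random mini-batch, then takes the exact minimizer of the cubic model. This is a sketch under simplifying assumptions. The objective, the fixed `sigma`, and the fixed batch size are chosen for illustration, whereas the paper derives its sampling scheme from concentration inequalities.

```python
import math
import random


def cubic_step(g, h, sigma):
    """Minimizer of the 1-D model g*s + 0.5*h*s**2 + (sigma/3)*abs(s)**3."""
    t = (-h + math.sqrt(h * h + 4.0 * sigma * abs(g))) / (2.0 * sigma)
    direction = -1.0 if g > 0 else 1.0   # move against the gradient
    return direction * t


def minimize(a, w=0.1, sigma=1.0, batch=32, steps=50, seed=0):
    """Minimize mean_i (0.25*w**4 - 0.5*a_i*w**2) with sub-sampled steps."""
    rng = random.Random(seed)
    for _ in range(steps):
        a_bar = sum(rng.sample(a, batch)) / batch
        g = w ** 3 - a_bar * w       # sub-sampled gradient
        h = 3 * w ** 2 - a_bar       # sub-sampled Hessian
        w += cubic_step(g, h, sigma)
    return w


data_rng = random.Random(1)
a = [data_rng.uniform(0.5, 1.5) for _ in range(1000)]
w_final = minimize(a)
```

The toy objective is non-convex, with a stationary point at `w = 0` where the curvature is negative. When the gradient vanishes there, the cubic model still yields a non-zero step along the negative curvature, which illustrates how the method moves away from such points.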
A Reasoning System for a First-Order Logic of Limited Belief
Title | A Reasoning System for a First-Order Logic of Limited Belief |
Authors | Christoph Schwering |
Abstract | Logics of limited belief aim at enabling computationally feasible reasoning in highly expressive representation languages. These languages are often dialects of first-order logic with a weaker form of logical entailment that keeps reasoning decidable or even tractable. While a number of such logics have been proposed in the past, they tend to remain for theoretical analysis only and their practical relevance is very limited. In this paper, we aim to go beyond the theory. Building on earlier work by Liu, Lakemeyer, and Levesque, we develop a logic of limited belief that is highly expressive while remaining decidable in the first-order and tractable in the propositional case and exhibits some characteristics that make it attractive for an implementation. We introduce a reasoning system that employs this logic as representation language and present experimental results that showcase the benefit of limited belief. |
Tasks | |
Published | 2017-05-04 |
URL | http://arxiv.org/abs/1705.01817v1 |
http://arxiv.org/pdf/1705.01817v1.pdf | |
PWC | https://paperswithcode.com/paper/a-reasoning-system-for-a-first-order-logic-of |
Repo | https://github.com/schwering/limbo |
Framework | none |