Paper Group AWR 63
Weakly Supervised Action Learning with RNN based Fine-to-coarse Modeling
Title | Weakly Supervised Action Learning with RNN based Fine-to-coarse Modeling |
Authors | Alexander Richard, Hilde Kuehne, Juergen Gall |
Abstract | We present an approach for weakly supervised learning of human actions. Given a set of videos and an ordered list of the occurring actions, the goal is to infer start and end frames of the related action classes within the video and to train the respective action classifiers without any need for hand-labeled frame boundaries. To address this task, we propose a combination of a discriminative representation of subactions, modeled by a recurrent neural network, and a coarse probabilistic model to allow for temporal alignment and inference over long sequences. While this system alone already generates good results, we show that the performance can be further improved by adapting the number of subactions to the characteristics of the different action classes. To this end, we adapt the number of subaction classes by iterating realignment and reestimation during training. The proposed system is evaluated on two benchmark datasets, the Breakfast and the Hollywood Extended dataset, showing competitive performance on various weak learning tasks such as temporal action segmentation and action alignment. |
Tasks | Action Segmentation |
Published | 2017-03-23 |
URL | http://arxiv.org/abs/1703.08132v3 |
PDF | http://arxiv.org/pdf/1703.08132v3.pdf |
PWC | https://paperswithcode.com/paper/weakly-supervised-action-learning-with-rnn |
Repo | https://github.com/alexanderrichard/weakly-sup-action-learning |
Framework | none |
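The fine-to-coarse idea pairs a frame-level recurrent model with a coarse alignment model. As a rough illustration of the fine level only, here is a minimal PyTorch sketch (not the authors' code) of a GRU mapping per-frame features to subaction logits; the feature dimension, hidden size, and subaction count are made-up placeholders, and the coarse alignment that supplies the weak frame labels is omitted.

```python
import torch
import torch.nn as nn

class SubactionRNN(nn.Module):
    """Frame-level subaction classifier; labels would come from realignment."""
    def __init__(self, feat_dim=64, hidden=128, n_subactions=48):
        super().__init__()
        self.gru = nn.GRU(feat_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_subactions)

    def forward(self, frames):            # frames: (batch, T, feat_dim)
        h, _ = self.gru(frames)
        return self.head(h)               # per-frame subaction logits

model = SubactionRNN()
logits = model(torch.randn(2, 100, 64))  # two videos, 100 frames each
print(logits.shape)                      # torch.Size([2, 100, 48])
```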
Automated Phrase Mining from Massive Text Corpora
Title | Automated Phrase Mining from Massive Text Corpora |
Authors | Jingbo Shang, Jialu Liu, Meng Jiang, Xiang Ren, Clare R Voss, Jiawei Han |
Abstract | As one of the fundamental tasks in text analysis, phrase mining aims at extracting quality phrases from a text corpus. Phrase mining is important in various tasks such as information extraction/retrieval, taxonomy construction, and topic modeling. Most existing methods rely on complex, trained linguistic analyzers, and thus likely have unsatisfactory performance on text corpora of new domains and genres without extra but expensive adaptation. Recently, a few data-driven methods have been developed successfully for extracting phrases from massive domain-specific text. However, none of the state-of-the-art models is fully automated because they require human experts to design rules or label phrases. Since one can easily obtain many quality phrases from public knowledge bases at a scale much larger than what human experts can produce, in this paper we propose a novel framework for automated phrase mining, AutoPhrase, which leverages this large amount of high-quality phrases in an effective way and achieves better performance compared to limited human-labeled phrases. In addition, we develop a POS-guided phrasal segmentation model, which incorporates the shallow syntactic information in part-of-speech (POS) tags to further enhance performance when a POS tagger is available. Note that AutoPhrase can support any language as long as a general knowledge base (e.g., Wikipedia) in that language is available, while benefiting from, but not requiring, a POS tagger. Compared to the state-of-the-art methods, the new method shows significant improvements in effectiveness on five real-world datasets across different domains and languages. |
Tasks | |
Published | 2017-02-15 |
URL | http://arxiv.org/abs/1702.04457v2 |
PDF | http://arxiv.org/pdf/1702.04457v2.pdf |
PWC | https://paperswithcode.com/paper/automated-phrase-mining-from-massive-text |
Repo | https://github.com/shangjingbo1226/AutoNER |
Framework | pytorch |
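The core of AutoPhrase's supervision is distant labeling: candidate n-grams that also appear in a public knowledge base form the positive pool for a phrase-quality classifier over statistical features. A toy sketch under those assumptions; the pool, candidates, and three features below are illustrative stand-ins, not the paper's actual feature set.

```python
# Distant supervision for phrase quality: knowledge-base phrases are positives.
from sklearn.ensemble import RandomForestClassifier

positive_pool = {"support vector machine", "neural network"}  # e.g. Wikipedia titles
candidates = [
    ("support vector machine", [120, 8.3, 0.9]),  # (phrase, [freq, PMI, concordance])
    ("neural network",         [300, 7.1, 0.8]),
    ("of the model",           [500, 0.4, 0.1]),
]
X = [feats for _, feats in candidates]
y = [int(p in positive_pool) for p, _ in candidates]
clf = RandomForestClassifier(n_estimators=10).fit(X, y)
print(dict(zip([p for p, _ in candidates], clf.predict_proba(X)[:, 1])))
```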
Neural Attentive Session-based Recommendation
Title | Neural Attentive Session-based Recommendation |
Authors | Jing Li, Pengjie Ren, Zhumin Chen, Zhaochun Ren, Jun Ma |
Abstract | In e-commerce scenarios where user profiles are invisible, session-based recommendation is proposed to generate recommendation results from short sessions. Previous work considers only the user’s sequential behavior in the current session, whereas the user’s main purpose in the current session is not emphasized. In this paper, we propose a novel neural network framework, the Neural Attentive Recommendation Machine (NARM), to tackle this problem. Specifically, we explore a hybrid encoder with an attention mechanism to model the user’s sequential behavior and capture the user’s main purpose in the current session, which are then combined into a unified session representation. We then compute the recommendation score for each candidate item with a bi-linear matching scheme based on this unified session representation. We train NARM by jointly learning the item and session representations as well as their matchings. We carried out extensive experiments on two benchmark datasets. Our experimental results show that NARM outperforms state-of-the-art baselines on both datasets. Furthermore, we find that NARM achieves a significant improvement on long sessions, which demonstrates its advantage in modeling the user’s sequential behavior and main purpose simultaneously. |
Tasks | Session-Based Recommendations |
Published | 2017-11-13 |
URL | https://arxiv.org/abs/1711.04725v1 |
PDF | https://arxiv.org/pdf/1711.04725v1.pdf |
PWC | https://paperswithcode.com/paper/neural-attentive-session-based-recommendation |
Repo | https://github.com/lijingsdu/sessionRec_NARM |
Framework | none |
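A hedged PyTorch sketch of the architecture as the abstract describes it: a GRU encoder, attention of each step against the final hidden state (the "main purpose"), and a bilinear match against item embeddings. The dimensions and the exact attention parameterization are assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class NARMSketch(nn.Module):
    def __init__(self, n_items=1000, dim=64):
        super().__init__()
        self.emb = nn.Embedding(n_items, dim)
        self.gru = nn.GRU(dim, dim, batch_first=True)
        self.attn = nn.Linear(dim * 2, 1)
        self.B = nn.Linear(dim * 2, dim, bias=False)   # bilinear matching

    def forward(self, session):                  # session: (batch, T) item ids
        h, h_T = self.gru(self.emb(session))     # h: (B, T, d); h_T: (1, B, d)
        h_T = h_T.squeeze(0)
        # attend over all steps against the last hidden state (main purpose)
        scores = self.attn(torch.cat([h, h_T.unsqueeze(1).expand_as(h)], -1))
        c_local = (torch.softmax(scores, 1) * h).sum(1)
        c = torch.cat([h_T, c_local], -1)        # unified session representation
        return self.B(c) @ self.emb.weight.t()   # score every candidate item

model = NARMSketch()
print(model(torch.randint(0, 1000, (4, 10))).shape)  # torch.Size([4, 1000])
```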
The Robust Manifold Defense: Adversarial Training using Generative Models
Title | The Robust Manifold Defense: Adversarial Training using Generative Models |
Authors | Ajil Jalal, Andrew Ilyas, Constantinos Daskalakis, Alexandros G. Dimakis |
Abstract | We propose a new type of attack for finding adversarial examples for image classifiers. Our method exploits spanners, i.e. deep neural networks whose input space is low-dimensional and whose output range approximates the set of images of interest. Spanners may be generators of GANs or decoders of VAEs. The key idea in our attack is to search over latent code pairs to find ones that generate nearby images with different classifier outputs. We argue that our attack is stronger than searching over perturbations of real images. Moreover, we show that our stronger attack can be used to reduce the accuracy of Defense-GAN to 3%, resolving an open problem from the well-known paper by Athalye et al. We combine our attack with normal adversarial training to obtain the most robust known MNIST classifier, significantly improving the state of the art against PGD attacks. Our formulation involves solving a min-max problem, where the min player sets the parameters of the classifier and the max player is running our attack, and is thus searching for adversarial examples in the low-dimensional input space of the spanner. All code and models are available at https://github.com/ajiljalal/manifold-defense.git |
Tasks | |
Published | 2017-12-26 |
URL | https://arxiv.org/abs/1712.09196v5 |
PDF | https://arxiv.org/pdf/1712.09196v5.pdf |
PWC | https://paperswithcode.com/paper/the-robust-manifold-defense-adversarial |
Repo | https://github.com/ajiljalal/manifold-defense |
Framework | pytorch |
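The latent-space attack can be summarized in a few lines: optimize a pair of latent codes so that the spanner generates nearby images which the classifier treats differently. The sketch below is a simplified illustration with stand-in modules for the generator and classifier, not the released code; the loss weighting and optimizer settings are arbitrary.

```python
import torch

def latent_attack(G, clf, dim_z=100, steps=200, lam=10.0, lr=0.05):
    z1 = torch.randn(1, dim_z, requires_grad=True)
    z2 = torch.randn(1, dim_z, requires_grad=True)
    opt = torch.optim.Adam([z1, z2], lr=lr)
    for _ in range(steps):
        x1, x2 = G(z1), G(z2)
        # push the two predictions apart while keeping the images close
        loss = -(clf(x1) - clf(x2)).abs().sum() + lam * (x1 - x2).pow(2).sum()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return z1.detach(), z2.detach()

# usage with toy stand-ins for a spanner and a classifier
G = torch.nn.Sequential(torch.nn.Linear(100, 784), torch.nn.Tanh())
clf = torch.nn.Linear(784, 10)
z1, z2 = latent_attack(G, clf)
```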
Learning Efficient Convolutional Networks through Network Slimming
Title | Learning Efficient Convolutional Networks through Network Slimming |
Authors | Zhuang Liu, Jianguo Li, Zhiqiang Shen, Gao Huang, Shoumeng Yan, Changshui Zhang |
Abstract | The deployment of deep convolutional neural networks (CNNs) in many real world applications is largely hindered by their high computational cost. In this paper, we propose a novel learning scheme for CNNs to simultaneously 1) reduce the model size; 2) decrease the run-time memory footprint; and 3) lower the number of computing operations, without compromising accuracy. This is achieved by enforcing channel-level sparsity in the network in a simple but effective way. Different from many existing approaches, the proposed method directly applies to modern CNN architectures, introduces minimum overhead to the training process, and requires no special software/hardware accelerators for the resulting models. We call our approach network slimming, which takes wide and large networks as input models, but during training insignificant channels are automatically identified and pruned afterwards, yielding thin and compact models with comparable accuracy. We empirically demonstrate the effectiveness of our approach with several state-of-the-art CNN models, including VGGNet, ResNet and DenseNet, on various image classification datasets. For VGGNet, a multi-pass version of network slimming gives a 20x reduction in model size and a 5x reduction in computing operations. |
Tasks | Image Classification, Neural Architecture Search |
Published | 2017-08-22 |
URL | http://arxiv.org/abs/1708.06519v1 |
PDF | http://arxiv.org/pdf/1708.06519v1.pdf |
PWC | https://paperswithcode.com/paper/learning-efficient-convolutional-networks |
Repo | https://github.com/liuzhuang13/slimming |
Framework | pytorch |
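The channel-level sparsity mechanism is concrete enough to sketch: an L1 penalty on the BatchNorm scale factors (one per channel) is added to the training loss, and channels whose factor ends up below a threshold are pruned. A minimal PyTorch sketch with an illustrative penalty weight and threshold; actually rebuilding the slimmed network is omitted.

```python
import torch
import torch.nn as nn

def bn_l1_penalty(model, lam=1e-4):
    # L1 on every BatchNorm scale factor encourages channel-level sparsity
    return lam * sum(m.weight.abs().sum()
                     for m in model.modules() if isinstance(m, nn.BatchNorm2d))

def channels_to_keep(bn, threshold=1e-2):
    # channels whose scale factor survives the threshold
    return (bn.weight.abs() > threshold).nonzero().flatten()

net = nn.Sequential(nn.Conv2d(3, 16, 3), nn.BatchNorm2d(16), nn.ReLU())
x = torch.randn(1, 3, 32, 32)
loss = net(x).mean() + bn_l1_penalty(net)   # task loss + sparsity penalty
loss.backward()
print(channels_to_keep(net[1]))             # indices of surviving channels
```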
Quantifying Mental Health from Social Media with Neural User Embeddings
Title | Quantifying Mental Health from Social Media with Neural User Embeddings |
Authors | Silvio Amir, Glen Coppersmith, Paula Carvalho, Mário J. Silva, Byron C. Wallace |
Abstract | Mental illnesses adversely affect a significant proportion of the population worldwide. However, the methods traditionally used for estimating and characterizing the prevalence of mental health conditions are time-consuming and expensive. Consequently, best-available estimates concerning the prevalence of mental health conditions are often years out of date. Automated approaches that supplement these survey methods with broad, aggregated information derived from social media content provide a potential means for near real-time estimates at scale. These may, in turn, provide grist for supporting, evaluating and iteratively improving upon public health programs and interventions. We propose a novel model for automated mental health status quantification that incorporates user embeddings. This builds upon recent work exploring representation learning methods that induce embeddings by leveraging social media post histories. Such embeddings capture latent characteristics of individuals (e.g., political leanings) and encode a soft notion of homophily. In this paper, we investigate whether user embeddings learned from Twitter post histories encode information that correlates with mental health status. To this end, we estimated user embeddings for a set of users known to be affected by depression and post-traumatic stress disorder (PTSD), and for a set of demographically matched ‘control’ users. We then evaluated these embeddings with respect to: (i) their ability to capture homophilic relations with respect to mental health status; and (ii) the performance of downstream mental health prediction models based on these features. Our experimental results demonstrate that the user embeddings capture similarities between users with respect to mental conditions, and are predictive of mental health. |
Tasks | Representation Learning |
Published | 2017-04-30 |
URL | http://arxiv.org/abs/1705.00335v1 |
PDF | http://arxiv.org/pdf/1705.00335v1.pdf |
PWC | https://paperswithcode.com/paper/quantifying-mental-health-from-social-media |
Repo | https://github.com/samiroid/usr2vec |
Framework | none |
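One simple way to induce user embeddings from post histories, in the spirit of the usr2vec repository, is to train each user vector to predict the words that user writes, so users with similar language end up nearby. The sketch below is a toy version under that assumption; the vocabulary, data, and hyperparameters are invented.

```python
import torch
import torch.nn as nn

n_users, n_words, dim = 3, 100, 16
user_emb = nn.Embedding(n_users, dim)            # one trainable vector per user
word_out = nn.Linear(dim, n_words)               # predicts words from the user vector
opt = torch.optim.Adam(list(user_emb.parameters()) + list(word_out.parameters()))

history = [(0, 5), (0, 7), (1, 5), (2, 42)]      # (user id, word id) pairs from posts
for user, word in history * 50:
    logits = word_out(user_emb(torch.tensor([user])))
    loss = nn.functional.cross_entropy(logits, torch.tensor([word]))
    opt.zero_grad()
    loss.backward()
    opt.step()
# downstream: feed user_emb.weight into a mental-health status classifier
```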
Non-Autoregressive Neural Machine Translation
Title | Non-Autoregressive Neural Machine Translation |
Authors | Jiatao Gu, James Bradbury, Caiming Xiong, Victor O. K. Li, Richard Socher |
Abstract | Existing approaches to neural machine translation condition each output word on previously generated outputs. We introduce a model that avoids this autoregressive property and produces its outputs in parallel, allowing an order of magnitude lower latency during inference. Through knowledge distillation, the use of input token fertilities as a latent variable, and policy gradient fine-tuning, we achieve this at a cost of as little as 2.0 BLEU points relative to the autoregressive Transformer network used as a teacher. We demonstrate substantial cumulative improvements associated with each of the three aspects of our training strategy, and validate our approach on IWSLT 2016 English-German and two WMT language pairs. By sampling fertilities in parallel at inference time, our non-autoregressive model achieves near-state-of-the-art performance of 29.8 BLEU on WMT 2016 English-Romanian. |
Tasks | Machine Translation |
Published | 2017-11-07 |
URL | http://arxiv.org/abs/1711.02281v2 |
PDF | http://arxiv.org/pdf/1711.02281v2.pdf |
PWC | https://paperswithcode.com/paper/non-autoregressive-neural-machine-translation-1 |
Repo | https://github.com/MultiPath/NA-NMT |
Framework | pytorch |
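The fertility mechanism can be sketched compactly: a per-token fertility predictor decides how many target positions each source token accounts for, the source representations are copied accordingly, and all target tokens are decoded in one parallel pass. A minimal PyTorch illustration with made-up sizes and an untrained model; the real system uses a Transformer plus knowledge distillation.

```python
import torch
import torch.nn as nn

src = torch.tensor([[4, 9, 2]])                   # one toy source sentence (token ids)
emb = nn.Embedding(50, 32)
fert_head = nn.Linear(32, 4)                      # predicts a fertility in 0..3 per token
dec_head = nn.Linear(32, 50)                      # position-wise (parallel) output head

e = emb(src)                                      # (1, 3, 32)
fert = fert_head(e).argmax(-1)                    # fertilities, arbitrary at random init
copied = torch.repeat_interleave(e[0], fert[0], dim=0)   # fertility-based copy of the source
logits = dec_head(copied)                         # every target position decoded at once
print(fert, logits.argmax(-1))                    # fertilities and predicted target ids
```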
Bias and high-dimensional adjustment in observational studies of peer effects
Title | Bias and high-dimensional adjustment in observational studies of peer effects |
Authors | Dean Eckles, Eytan Bakshy |
Abstract | Peer effects, in which the behavior of an individual is affected by the behavior of their peers, are posited by multiple theories in the social sciences. Other processes can also produce behaviors that are correlated in networks and groups, thereby generating debate about the credibility of observational (i.e. nonexperimental) studies of peer effects. Randomized field experiments that identify peer effects, however, are often expensive or infeasible. Thus, many studies of peer effects use observational data, and prior evaluations of causal inference methods for adjusting observational data to estimate peer effects have lacked an experimental “gold standard” for comparison. Here we show, in the context of information and media diffusion on Facebook, that high-dimensional adjustment of a nonexperimental control group (677 million observations) using propensity score models produces estimates of peer effects statistically indistinguishable from those from using a large randomized experiment (220 million observations). Naive observational estimators overstate peer effects by 320% and commonly used variables (e.g., demographics) offer little bias reduction, but adjusting for a measure of prior behaviors closely related to the focal behavior reduces bias by 91%. High-dimensional models adjusting for over 3,700 past behaviors provide additional bias reduction, such that the full model reduces bias by over 97%. This experimental evaluation demonstrates that detailed records of individuals’ past behavior can improve studies of social influence, information diffusion, and imitation; these results are encouraging for the credibility of some studies but also cautionary for studies of rare or new behaviors. More generally, these results show how large, high-dimensional data sets and statistical learning techniques can be used to improve causal inference in the behavioral sciences. |
Tasks | Causal Inference |
Published | 2017-06-14 |
URL | http://arxiv.org/abs/1706.04692v1 |
PDF | http://arxiv.org/pdf/1706.04692v1.pdf |
PWC | https://paperswithcode.com/paper/bias-and-high-dimensional-adjustment-in |
Repo | https://github.com/fghjorth/vkme18 |
Framework | none |
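The adjustment strategy is standard propensity-score weighting, applied at unusual scale and dimensionality. A small synthetic sketch of the estimator, not the paper's data or model: fit exposure on many past-behavior covariates, then compare a naive difference in means against the inverse-propensity-weighted one.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 50))                           # past-behavior covariates
exposed = rng.binomial(1, 1 / (1 + np.exp(-X[:, 0])))     # exposure depends on X
y = 0.3 * exposed + X[:, 0] + rng.normal(size=5000)       # outcome (confounded)

p = LogisticRegression(max_iter=1000).fit(X, exposed).predict_proba(X)[:, 1]
w = exposed / p + (1 - exposed) / (1 - p)                 # inverse-propensity weights
naive = y[exposed == 1].mean() - y[exposed == 0].mean()
ipw = (np.average(y, weights=w * exposed)
       - np.average(y, weights=w * (1 - exposed)))
print(f"naive={naive:.2f}  ipw-adjusted={ipw:.2f}  truth=0.30")
```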
Task-Oriented Query Reformulation with Reinforcement Learning
Title | Task-Oriented Query Reformulation with Reinforcement Learning |
Authors | Rodrigo Nogueira, Kyunghyun Cho |
Abstract | Search engines play an important role in our everyday lives by assisting us in finding the information we need. When we input a complex query, however, results are often far from satisfactory. In this work, we introduce a query reformulation system based on a neural network that rewrites a query to maximize the number of relevant documents returned. We train this neural network with reinforcement learning. The actions correspond to selecting terms to build a reformulated query, and the reward is the document recall. We evaluate our approach on three datasets against strong baselines and show a relative improvement of 5-20% in terms of recall. Furthermore, we present a simple method to estimate a conservative upper-bound performance of a model in a particular environment and verify that there is still large room for improvements. |
Tasks | |
Published | 2017-04-15 |
URL | https://arxiv.org/abs/1704.04572v4 |
PDF | https://arxiv.org/pdf/1704.04572v4.pdf |
PWC | https://paperswithcode.com/paper/task-oriented-query-reformulation-with-1 |
Repo | https://github.com/nyu-dl/dl4ir-query-reformulator |
Framework | none |
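A toy REINFORCE sketch of the setup described above: a policy selects which candidate terms join the reformulated query, and the reward is document recall. The search engine, term list, and relevance set below are hypothetical stand-ins, and the real system uses a neural policy over query context rather than one logit per term.

```python
import torch
import torch.nn as nn

terms = ["neural", "ranking", "banana", "retrieval"]
relevant = {"doc1", "doc2"}

def search(query):                               # stand-in "search engine"
    hits = set()
    if "neural" in query:
        hits.add("doc1")
    if "retrieval" in query:
        hits.update({"doc1", "doc2"})
    return hits

policy = nn.Parameter(torch.zeros(len(terms)))   # one selection logit per term
opt = torch.optim.Adam([policy], lr=0.1)
for _ in range(200):
    probs = torch.sigmoid(policy).clamp(1e-4, 1 - 1e-4)
    pick = torch.bernoulli(probs)                # sample a subset of terms
    query = [t for t, p in zip(terms, pick) if p]
    reward = len(search(query) & relevant) / len(relevant)   # document recall
    log_prob = (pick * probs.log() + (1 - pick) * (1 - probs).log()).sum()
    loss = -reward * log_prob                    # REINFORCE objective
    opt.zero_grad()
    loss.backward()
    opt.step()
print({t: round(p.item(), 2) for t, p in zip(terms, torch.sigmoid(policy))})
```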
Fast Shadow Detection from a Single Image Using a Patched Convolutional Neural Network
Title | Fast Shadow Detection from a Single Image Using a Patched Convolutional Neural Network |
Authors | Sepideh Hosseinzadeh, Moein Shakeri, Hong Zhang |
Abstract | In recent years, various methods for detecting shadows from a single image have been proposed and used in vision systems; however, most of them are not appropriate for robotic applications due to their high time complexity. This paper introduces a fast shadow detection method based on a deep learning framework, with a time cost suitable for robotic applications. In our solution, we first obtain a shadow prior map with the help of a multi-class support vector machine using statistical features. Then, we use a semantic-aware patch-level Convolutional Neural Network that trains efficiently on shadow examples by combining the original image and the shadow prior map. Experiments on benchmark datasets demonstrate that the proposed method decreases the time complexity of shadow detection by one to two orders of magnitude compared with state-of-the-art methods, without losing accuracy. |
Tasks | Shadow Detection |
Published | 2017-09-26 |
URL | http://arxiv.org/abs/1709.09283v2 |
PDF | http://arxiv.org/pdf/1709.09283v2.pdf |
PWC | https://paperswithcode.com/paper/fast-shadow-detection-from-a-single-image |
Repo | https://github.com/sepidehhosseinzadeh/Fast-Shadow-Detection |
Framework | none |
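The second stage is concrete enough to sketch: each training example is an RGB patch stacked with the matching crop of the SVM-produced shadow prior map as a fourth input channel, classified as shadow or non-shadow. A minimal PyTorch sketch with illustrative sizes; the prior map and the SVM stage are faked here with random tensors.

```python
import torch
import torch.nn as nn

patch_cnn = nn.Sequential(
    nn.Conv2d(4, 16, 3, padding=1),   # 3 RGB channels + 1 prior-map channel
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(16, 2),                 # shadow vs. non-shadow
)
rgb_patch = torch.randn(8, 3, 32, 32)
prior_patch = torch.rand(8, 1, 32, 32)    # would come from the SVM prior stage
logits = patch_cnn(torch.cat([rgb_patch, prior_patch], dim=1))
print(logits.shape)                       # torch.Size([8, 2])
```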
Dynamic Entity Representations in Neural Language Models
Title | Dynamic Entity Representations in Neural Language Models |
Authors | Yangfeng Ji, Chenhao Tan, Sebastian Martschat, Yejin Choi, Noah A. Smith |
Abstract | Understanding a long document requires tracking how entities are introduced and evolve over time. We present a new type of language model, EntityNLM, that can explicitly model entities, dynamically update their representations, and contextually generate their mentions. Our model is generative and flexible; it can model an arbitrary number of entities in context while generating each entity mention at an arbitrary length. In addition, it can be used for several different tasks such as language modeling, coreference resolution, and entity prediction. Experimental results with all these tasks demonstrate that our model consistently outperforms strong baselines and prior work. |
Tasks | Coreference Resolution, Language Modelling |
Published | 2017-08-02 |
URL | http://arxiv.org/abs/1708.00781v1 |
PDF | http://arxiv.org/pdf/1708.00781v1.pdf |
PWC | https://paperswithcode.com/paper/dynamic-entity-representations-in-neural |
Repo | https://github.com/smartschat/cort |
Framework | none |
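A much-simplified sketch of the dynamic-entity idea: alongside an ordinary LSTM language model, keep one vector per entity, mix it into the hidden state at each mention, and update it afterwards. The fixed mixing and update coefficients below are invented placeholders; EntityNLM learns these interactions rather than hard-coding them.

```python
import torch
import torch.nn as nn

vocab, dim = 100, 32
emb = nn.Embedding(vocab, dim)
lstm = nn.LSTMCell(dim, dim)
out = nn.Linear(dim, vocab)
entities = {}                                   # entity id -> running entity vector

h = c = torch.zeros(1, dim)
tokens = [(3, None), (17, "e1"), (8, None), (17, "e1")]  # (word id, entity id or None)
for word, ent in tokens:
    h, c = lstm(emb(torch.tensor([word])), (h, c))
    if ent is not None:
        e = entities.get(ent, torch.zeros(1, dim))
        h = 0.5 * h + 0.5 * e                   # condition the state on the entity
        entities[ent] = 0.9 * e + 0.1 * h.detach()   # dynamically update the entity
    logits = out(h)                             # next-word distribution at this step
```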
Progressive Color Transfer with Dense Semantic Correspondences
Title | Progressive Color Transfer with Dense Semantic Correspondences |
Authors | Mingming He, Jing Liao, Dongdong Chen, Lu Yuan, Pedro V. Sander |
Abstract | We propose a new algorithm for color transfer between images that have perceptually similar semantic structures. We aim to achieve a more accurate color transfer that leverages semantically-meaningful dense correspondence between images. To accomplish this, our algorithm uses neural representations for matching. Additionally, the color transfer should be spatially variant and globally coherent. Therefore, our algorithm optimizes a local linear model for color transfer satisfying both local and global constraints. Our proposed approach jointly optimizes matching and color transfer, adopting a coarse-to-fine strategy. The proposed method can be successfully extended from one-to-one to one-to-many color transfer. The latter further addresses the problem of mismatching elements of the input image. We validate our proposed method by testing it on a large variety of image content. |
Tasks | |
Published | 2017-10-02 |
URL | http://arxiv.org/abs/1710.00756v2 |
PDF | http://arxiv.org/pdf/1710.00756v2.pdf |
PWC | https://paperswithcode.com/paper/progressive-color-transfer-with-dense |
Repo | https://github.com/hokkaido/otomo |
Framework | pytorch |
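The local linear model is easy to illustrate: within each pair of matched regions, colors are mapped by a per-region a*x + b chosen so the source region's color statistics match the reference's. A toy NumPy sketch, with random arrays standing in for regions found by the dense semantic correspondence; the paper additionally enforces global coherence across regions, which this omits.

```python
import numpy as np

def local_linear_transfer(src_region, ref_region):
    # choose a and b per channel so mean/std of src match those of ref
    a = ref_region.std(axis=(0, 1)) / (src_region.std(axis=(0, 1)) + 1e-6)
    b = ref_region.mean(axis=(0, 1)) - a * src_region.mean(axis=(0, 1))
    return a * src_region + b

src = np.random.rand(16, 16, 3)               # one matched region of the source
ref = np.random.rand(16, 16, 3) * 0.5 + 0.3   # its semantic correspondence
out = local_linear_transfer(src, ref)
print(out.mean(axis=(0, 1)), ref.mean(axis=(0, 1)))   # means now agree
```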
Clustering Signed Networks with the Geometric Mean of Laplacians
Title | Clustering Signed Networks with the Geometric Mean of Laplacians |
Authors | Pedro Mercado, Francesco Tudisco, Matthias Hein |
Abstract | Signed networks allow one to model positive and negative relationships. We analyze existing extensions of spectral clustering to signed networks. It turns out that existing approaches fail to recover the ground-truth clustering in several situations where either the positive or the negative network structure contains no noise. Our analysis shows that these problems arise because existing approaches take some form of arithmetic mean of the Laplacians of the positive and negative parts. As a solution, we propose to use the geometric mean of the Laplacians of the positive and negative parts and show that it outperforms the existing approaches. While the geometric mean of matrices is computationally expensive, we show that eigenvectors of the geometric mean can be computed efficiently, leading to a numerical scheme for sparse matrices which is of independent interest. |
Tasks | |
Published | 2017-01-03 |
URL | http://arxiv.org/abs/1701.00757v1 |
PDF | http://arxiv.org/pdf/1701.00757v1.pdf |
PWC | https://paperswithcode.com/paper/clustering-signed-networks-with-the-geometric |
Repo | https://github.com/melopeo/GM |
Framework | none |
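For small dense matrices the estimator can be written directly from the definition A # B = A^{1/2} (A^{-1/2} B A^{-1/2})^{1/2} A^{1/2}. The sketch below is a simplified, unnormalized variant on a three-node toy graph, pairing the Laplacian of the positive part with the signless Laplacian of the negative part (the paper works with normalized operators); its actual contribution, computing eigenvectors of the geometric mean without ever forming it, is not shown here.

```python
import numpy as np
from scipy.linalg import sqrtm, inv, eigh

def laplacian(W):
    return np.diag(W.sum(axis=1)) - W

def signless_laplacian(W):
    return np.diag(W.sum(axis=1)) + W

def geometric_mean(A, B):
    A_half = np.real(sqrtm(A))
    A_half_inv = inv(A_half)
    M = A_half @ np.real(sqrtm(A_half_inv @ B @ A_half_inv)) @ A_half
    return (M + M.T) / 2          # re-symmetrize against round-off

# toy signed graph: nodes 0 and 1 are friends, both are enemies of node 2
W = np.array([[0, 1, -1], [1, 0, -1], [-1, -1, 0]], dtype=float)
Wp, Wn = np.maximum(W, 0), np.maximum(-W, 0)   # positive / negative parts
eps = 1e-6 * np.eye(len(W))                    # keep both operators invertible
Lgm = geometric_mean(laplacian(Wp) + eps, signless_laplacian(Wn) + eps)
vals, vecs = eigh(Lgm)
print(vecs[:, 0])   # sign pattern of the bottom eigenvector separates {0,1} from {2}
```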
Gradient Episodic Memory for Continual Learning
Title | Gradient Episodic Memory for Continual Learning |
Authors | David Lopez-Paz, Marc’Aurelio Ranzato |
Abstract | One major obstacle towards AI is the poor ability of models to solve new problems quickly without forgetting previously acquired knowledge. To better understand this issue, we study the problem of continual learning, where the model observes, once and one by one, examples concerning a sequence of tasks. First, we propose a set of metrics to evaluate models learning over a continuum of data. These metrics characterize models not only by their test accuracy, but also in terms of their ability to transfer knowledge across tasks. Second, we propose a model for continual learning, called Gradient Episodic Memory (GEM), that alleviates forgetting while allowing beneficial transfer of knowledge to previous tasks. Our experiments on variants of the MNIST and CIFAR-100 datasets demonstrate the strong performance of GEM when compared to the state of the art. |
Tasks | Continual Learning |
Published | 2017-06-26 |
URL | http://arxiv.org/abs/1706.08840v5 |
PDF | http://arxiv.org/pdf/1706.08840v5.pdf |
PWC | https://paperswithcode.com/paper/gradient-episodic-memory-for-continual |
Repo | https://github.com/facebookresearch/GradientEpisodicMemory |
Framework | pytorch |
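GEM's core step is easy to state: if the current gradient has negative inner product with the gradient computed on stored memories of an old task, project it so the conflict disappears. A one-constraint sketch of that step; the full method solves a small quadratic program with one constraint per previous task.

```python
import torch

def gem_project(g, g_mem):
    # project g onto the half-space where it no longer increases the old loss
    dot = torch.dot(g, g_mem)
    if dot < 0:                              # would hurt the old task
        g = g - (dot / g_mem.dot(g_mem)) * g_mem
    return g

g = torch.tensor([1.0, -1.0])        # gradient on the current task
g_mem = torch.tensor([0.0, 1.0])     # gradient on a stored old example
print(gem_project(g, g_mem))         # tensor([1., 0.]): conflict removed
```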
Sketchy Decisions: Convex Low-Rank Matrix Optimization with Optimal Storage
Title | Sketchy Decisions: Convex Low-Rank Matrix Optimization with Optimal Storage |
Authors | Alp Yurtsever, Madeleine Udell, Joel A. Tropp, Volkan Cevher |
Abstract | This paper concerns a fundamental class of convex matrix optimization problems. It presents the first algorithm that uses optimal storage and provably computes a low-rank approximation of a solution. In particular, when all solutions have low rank, the algorithm converges to a solution. This algorithm, SketchyCGM, modifies a standard convex optimization scheme, the conditional gradient method, to store only a small randomized sketch of the matrix variable. After the optimization terminates, the algorithm extracts a low-rank approximation of the solution from the sketch. In contrast to nonconvex heuristics, the guarantees for SketchyCGM do not rely on statistical models for the problem data. Numerical work demonstrates the benefits of SketchyCGM over heuristics. |
Tasks | |
Published | 2017-02-22 |
URL | http://arxiv.org/abs/1702.06838v1 |
PDF | http://arxiv.org/pdf/1702.06838v1.pdf |
PWC | https://paperswithcode.com/paper/sketchy-decisions-convex-low-rank-matrix |
Repo | https://github.com/dseuss/pyscgm |
Framework | none |
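The storage idea can be illustrated with a toy trace-constrained problem: the conditional-gradient iteration touches only a randomized sketch Y = X @ Omega rather than the full matrix variable X, and a low-rank approximation is recovered from Y at the end. This is a heavily simplified sketch, with a Nystrom-style recovery and a linear objective so the linear minimization oracle is constant; it is not the paper's SketchyCGM implementation.

```python
import numpy as np

n, r = 50, 5
rng = np.random.default_rng(0)
C = rng.normal(size=(n, n))
C = (C + C.T) / 2                                # objective: min <C, X>, tr(X)=1, X PSD
Omega = rng.normal(size=(n, 2 * r + 1))          # random test matrix
Y = np.zeros((n, 2 * r + 1))                     # sketch of the iterate: Y = X @ Omega

# LMO for a linear objective: bottom eigenvector of the (constant) gradient C.
# For a general smooth objective this would be recomputed each iteration.
_, vecs = np.linalg.eigh(C)
v = vecs[:, 0]

for t in range(100):
    eta = 2.0 / (t + 2)                          # standard conditional-gradient step
    Y = (1 - eta) * Y + eta * np.outer(v, v @ Omega)   # rank-one update, sketched only

X_hat = Y @ np.linalg.pinv(Omega.T @ Y) @ Y.T    # Nystrom-style low-rank recovery
print(round(np.trace(X_hat), 3))                 # ~1.0: trace constraint is satisfied
```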