Paper Group ANR 684
Visual Reasoning with Natural Language. Towards Interpretable Deep Neural Networks by Leveraging Adversarial Examples. Deep Learning for Sensor-based Activity Recognition: A Survey. Generative Semantic Manipulation with Contrasting GAN. An interpretable latent variable model for attribute applicability in the Amazon catalogue. LIUBoost : Locality Informed Underboosting for Imbalanced Data Classification. GazeDirector: Fully Articulated Eye Gaze Redirection in Video. Personal Names in Modern Turkey. Learning Generalized Reactive Policies using Deep Neural Networks. Variational Autoencoders for Learning Latent Representations of Speech Emotion: A Preliminary Study. AI-Powered Social Bots. Multi-Modal Obstacle Detection in Unstructured Environments with Conditional Random Fields. Linear Convergence of Accelerated Stochastic Gradient Descent for Nonconvex Nonsmooth Optimization. Sparse Communication for Distributed Gradient Descent. Optimal Cooperative Inference.
Visual Reasoning with Natural Language
Title | Visual Reasoning with Natural Language |
Authors | Stephanie Zhou, Alane Suhr, Yoav Artzi |
Abstract | Natural language provides a widely accessible and expressive interface for robotic agents. To understand language in complex environments, agents must reason about the full range of language inputs and their correspondence to the world. Such reasoning over language and vision is an open problem that is receiving increasing attention. While existing data sets focus on visual diversity, they do not display the full range of natural language expressions, such as counting, set reasoning, and comparisons. We propose a simple task for natural language visual reasoning, where images are paired with descriptive statements. The task is to predict if a statement is true for the given scene. This abstract describes our existing corpus of synthetic images and our current work on collecting real vision data. |
Tasks | Visual Reasoning |
Published | 2017-10-02 |
URL | http://arxiv.org/abs/1710.00453v1 http://arxiv.org/pdf/1710.00453v1.pdf |
PWC | https://paperswithcode.com/paper/visual-reasoning-with-natural-language |
Repo | |
Framework | |
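The task above is, at its core, binary classification over (image, statement) pairs. As a hedged illustration only, here is a minimal PyTorch sketch of one plausible baseline, assuming a tiny CNN image encoder and an LSTM statement encoder; the architecture, vocabulary size, and dimensions are placeholder assumptions, not the authors' model.

```python
import torch
import torch.nn as nn

class TrueFalseBaseline(nn.Module):
    """Illustrative baseline: encode image and statement, predict true/false."""
    def __init__(self, vocab_size=1000, embed_dim=64, hidden_dim=128):
        super().__init__()
        self.cnn = nn.Sequential(               # stand-in image encoder
            nn.Conv2d(3, 16, 5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.rnn = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(32 + hidden_dim, 2)   # logits for {false, true}

    def forward(self, image, statement):
        img = self.cnn(image)                        # (B, 32)
        _, (h, _) = self.rnn(self.embed(statement))  # h[-1]: (B, hidden_dim)
        return self.head(torch.cat([img, h[-1]], dim=1))

model = TrueFalseBaseline()
logits = model(torch.randn(2, 3, 64, 64), torch.randint(0, 1000, (2, 12)))
print(logits.shape)  # torch.Size([2, 2])
```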
Towards Interpretable Deep Neural Networks by Leveraging Adversarial Examples
Title | Towards Interpretable Deep Neural Networks by Leveraging Adversarial Examples |
Authors | Yinpeng Dong, Hang Su, Jun Zhu, Fan Bao |
Abstract | Deep neural networks (DNNs) have demonstrated impressive performance on a wide array of tasks, but they are usually considered opaque since internal structure and learned parameters are not interpretable. In this paper, we re-examine the internal representations of DNNs using adversarial images, which are generated by an ensemble-optimization algorithm. We find that: (1) the neurons in DNNs do not truly detect semantic objects/parts, but respond to objects/parts only as recurrent discriminative patches; (2) deep visual representations are not robust distributed codes of visual concepts because the representations of adversarial images are largely not consistent with those of real images, although they have similar visual appearance, both of which are different from previous findings. To further improve the interpretability of DNNs, we propose an adversarial training scheme with a consistent loss such that the neurons are endowed with human-interpretable concepts. The induced interpretable representations enable us to trace eventual outcomes back to influential neurons. Therefore, human users can know how the models make predictions, as well as when and why they make errors. |
Tasks | |
Published | 2017-08-18 |
URL | http://arxiv.org/abs/1708.05493v1 http://arxiv.org/pdf/1708.05493v1.pdf |
PWC | https://paperswithcode.com/paper/towards-interpretable-deep-neural-networks-by |
Repo | |
Framework | |
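To make the proposed "adversarial training scheme with a consistent loss" concrete, here is a hedged sketch of one training step. It substitutes a single-step FGSM perturbation for the paper's ensemble-optimization attack, and assumes a hypothetical `model.features()` hook exposing internal representations; neither detail is from the paper.

```python
import torch
import torch.nn.functional as F

def adversarial_consistency_step(model, x, y, eps=0.03, lam=1.0):
    """Sketched step: classify the adversarial input correctly while a
    consistency penalty pulls its internal representation toward the clean
    one. FGSM stands in for the paper's ensemble-optimization attack, and
    `model.features` is a hypothetical hook exposing internal codes."""
    x = x.clone().requires_grad_(True)
    grad, = torch.autograd.grad(F.cross_entropy(model(x), y), x)
    x_adv = (x + eps * grad.sign()).detach()        # one-step perturbation

    with torch.no_grad():
        feat_clean = model.features(x.detach())     # clean representation
    consistency = F.mse_loss(model.features(x_adv), feat_clean)

    return F.cross_entropy(model(x_adv), y) + lam * consistency
```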
Deep Learning for Sensor-based Activity Recognition: A Survey
Title | Deep Learning for Sensor-based Activity Recognition: A Survey |
Authors | Jindong Wang, Yiqiang Chen, Shuji Hao, Xiaohui Peng, Lisha Hu |
Abstract | Sensor-based activity recognition seeks profound high-level knowledge about human activities from multitudes of low-level sensor readings. Conventional pattern recognition approaches have made tremendous progress in past years. However, those methods often rely heavily on heuristic hand-crafted feature extraction, which can hinder their generalization performance. Additionally, existing methods are undermined for unsupervised and incremental learning tasks. Recently, the advancement of deep learning has made it possible to perform automatic high-level feature extraction, achieving promising performance in many areas. Since then, deep learning based methods have been widely adopted for sensor-based activity recognition tasks. This paper surveys recent advances in deep learning based sensor-based activity recognition. We summarize the existing literature from three aspects: sensor modality, deep model, and application. We also present detailed insights on existing work and propose grand challenges for future research. |
Tasks | Activity Recognition |
Published | 2017-07-12 |
URL | http://arxiv.org/abs/1707.03502v2 http://arxiv.org/pdf/1707.03502v2.pdf |
PWC | https://paperswithcode.com/paper/deep-learning-for-sensor-based-activity |
Repo | |
Framework | |
Generative Semantic Manipulation with Contrasting GAN
Title | Generative Semantic Manipulation with Contrasting GAN |
Authors | Xiaodan Liang, Hao Zhang, Eric P. Xing |
Abstract | Generative Adversarial Networks (GANs) have recently achieved significant improvement on paired/unpaired image-to-image translation, such as photo$\rightarrow$sketch and artist painting style transfer. However, existing models are only capable of transferring low-level information (e.g., color or texture changes), but fail to edit high-level semantic meanings (e.g., geometric structure or content) of objects. On the other hand, while some works can synthesize compelling real-world images given a class label or caption, they cannot condition on arbitrary shapes or structures, which largely limits their application scenarios and the interpretive capability of model results. In this work, we focus on a more challenging semantic manipulation task, which aims to modify the semantic meaning of an object while preserving its own characteristics (e.g. viewpoints and shapes), such as cow$\rightarrow$sheep, motor$\rightarrow$bicycle, cat$\rightarrow$dog. To tackle such large semantic changes, we introduce a contrasting GAN (contrast-GAN) with a novel adversarial contrasting objective. Instead of directly making the synthesized samples close to the target data as previous GANs did, our adversarial contrasting objective optimizes over distance comparisons between samples, that is, enforcing the manipulated data to be semantically closer to the real data of the target category than to the input data. Equipped with the new contrasting objective, a novel mask-conditional contrast-GAN architecture is proposed to disentangle the image background from object semantic changes. Experiments on several semantic manipulation tasks on the ImageNet and MSCOCO datasets show considerable performance gains by our contrast-GAN over other conditional GANs. Quantitative results further demonstrate the superiority of our model in generating manipulated results with high visual fidelity and reasonable object semantics. |
Tasks | Image-to-Image Translation, Style Transfer |
Published | 2017-08-01 |
URL | http://arxiv.org/abs/1708.00315v1 http://arxiv.org/pdf/1708.00315v1.pdf |
PWC | https://paperswithcode.com/paper/generative-semantic-manipulation-with |
Repo | |
Framework | |
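The "adversarial contrasting objective" quoted above optimizes distance comparisons rather than matching the target distribution directly. A hedged sketch of one way to write such a loss, as a triplet-style hinge over feature distances; the distance function and margin are assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def contrasting_loss(f_fake, f_target_real, f_input, margin=0.5):
    """Triplet-style reading of the contrasting objective: push the
    manipulated sample's features closer to real target-category features
    than to its input's. Distance and margin are assumptions."""
    d_pos = F.pairwise_distance(f_fake, f_target_real)  # fake vs. real target
    d_neg = F.pairwise_distance(f_fake, f_input)        # fake vs. its input
    return F.relu(d_pos - d_neg + margin).mean()

# toy usage: random 128-d features for a batch of 4 manipulated images
loss = contrasting_loss(torch.randn(4, 128), torch.randn(4, 128),
                        torch.randn(4, 128))
```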
An interpretable latent variable model for attribute applicability in the Amazon catalogue
Title | An interpretable latent variable model for attribute applicability in the Amazon catalogue |
Authors | Tammo Rukat, Dustin Lange, Cédric Archambeau |
Abstract | Learning attribute applicability of products in the Amazon catalog (e.g., predicting that a shoe should have a value for size, but not for battery-type) at scale is a challenge. The need for an interpretable model is contingent on (1) the lack of ground truth training data, (2) the need to utilise prior information about the underlying latent space and (3) the ability to understand the quality of predictions on new, unseen data. To this end, we develop the MaxMachine, a probabilistic latent variable model that learns distributed binary representations, associated with sets of features that are likely to co-occur in the data. Layers of MaxMachines can be stacked such that higher layers encode more abstract information. Any set of variables can be clamped to encode prior information. We develop fast sampling-based posterior inference. Preliminary results show that the model improves over the baseline in 17 out of 19 product groups and provides qualitatively reasonable predictions. |
Tasks | |
Published | 2017-11-30 |
URL | http://arxiv.org/abs/1712.00126v2 http://arxiv.org/pdf/1712.00126v2.pdf |
PWC | https://paperswithcode.com/paper/an-interpretable-latent-variable-model-for |
Repo | |
Framework | |
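As a rough, assumption-laden reading of the max link behind the MaxMachine, the sketch below samples binary data whose attribute probabilities are the maximum over active latent factors; all dimensions, priors, and the noise floor are invented for illustration and are not the paper's specification.

```python
import numpy as np

rng = np.random.default_rng(0)

# Sketch of a max-link generative model in the spirit of the MaxMachine:
# each latent factor l has binary loadings u[l, d] and a reliability alpha[l];
# an observation x[n, d] is 1 with probability max over its active factors.
N, D, L = 6, 8, 3                     # products, attributes, latent factors
z = rng.random((N, L)) < 0.5          # binary latent codes, one row per product
u = rng.random((L, D)) < 0.3          # binary factor-to-attribute loadings
alpha = np.array([0.9, 0.8, 0.7])     # per-factor reliability (assumed)

p = np.max(z[:, :, None] * u[None, :, :] * alpha[None, :, None], axis=1)
x = rng.random((N, D)) < np.maximum(p, 0.05)   # 0.05 = assumed noise floor
print(x.astype(int))
```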
LIUBoost : Locality Informed Underboosting for Imbalanced Data Classification
Title | LIUBoost : Locality Informed Underboosting for Imbalanced Data Classification |
Authors | Sajid Ahmed, Farshid Rayhan, Asif Mahbub, Md. Rafsan Jani, Swakkhar Shatabda, Dewan Md. Farid, Chowdhury Mofizur Rahman |
Abstract | The problem of class imbalance, along with class overlap, has become a major issue in the domain of supervised learning. Most supervised learning algorithms assume equal cardinality of the classes under consideration while optimizing the cost function, and this assumption does not hold for imbalanced datasets, which results in sub-optimal classification. Therefore, various approaches, such as undersampling, oversampling, cost-sensitive learning and ensemble based methods, have been proposed for dealing with imbalanced datasets. However, undersampling suffers from information loss, oversampling suffers from increased runtime and potential overfitting, while cost-sensitive methods suffer from inadequately defined cost assignment schemes. In this paper, we propose a novel boosting based method called LIUBoost. Like RUSBoost, LIUBoost uses undersampling to balance the datasets in every boosting iteration, while incorporating into the weight update formula a cost term for every instance based on its hardness, minimizing the information loss introduced by undersampling. LIUBoost has been extensively evaluated on 18 imbalanced datasets, and the results indicate significant improvement over RUSBoost, the existing best-performing method. |
Tasks | |
Published | 2017-11-15 |
URL | http://arxiv.org/abs/1711.05365v1 http://arxiv.org/pdf/1711.05365v1.pdf |
PWC | https://paperswithcode.com/paper/liuboost-locality-informed-underboosting-for |
Repo | |
Framework | |
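A hedged sketch of the boosting loop described above: RUSBoost-style undersampling in each round, plus a per-instance cost folded into the weight update. The hardness definition here (fraction of opposite-class neighbors among the k nearest) and the decision-stump weak learner are illustrative assumptions, not the paper's exact choices.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors
from sklearn.tree import DecisionTreeClassifier

def liuboost_sketch(X, y, rounds=10, k=5):
    """Undersample the majority class in every boosting round (as in
    RUSBoost) and fold a locality-informed cost into the weight update."""
    n = len(y)
    w = np.full(n, 1.0 / n)
    _, idx = NearestNeighbors(n_neighbors=k + 1).fit(X).kneighbors(X)
    cost = (y[idx[:, 1:]] != y[:, None]).mean(axis=1)   # local class overlap
    maj_c, min_c = (0, 1) if (y == 0).sum() > (y == 1).sum() else (1, 0)
    learners, alphas = [], []
    for _ in range(rounds):
        maj_idx = np.random.choice(np.where(y == maj_c)[0],
                                   (y == min_c).sum(), replace=False)
        sel = np.concatenate([maj_idx, np.where(y == min_c)[0]])
        h = DecisionTreeClassifier(max_depth=1).fit(X[sel], y[sel],
                                                    sample_weight=w[sel])
        pred = h.predict(X)
        err = np.clip(np.sum(w * (pred != y)), 1e-10, 1 - 1e-10)
        a = 0.5 * np.log((1 - err) / err)
        w *= np.exp(a * (pred != y) * (1 + cost))   # cost-augmented update
        w /= w.sum()
        learners.append(h)
        alphas.append(a)
    return learners, alphas

X = np.random.randn(200, 4)
y = (np.random.rand(200) < 0.15).astype(int)   # imbalanced toy labels
learners, alphas = liuboost_sketch(X, y)
```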
GazeDirector: Fully Articulated Eye Gaze Redirection in Video
Title | GazeDirector: Fully Articulated Eye Gaze Redirection in Video |
Authors | Erroll Wood, Tadas Baltrusaitis, Louis-Philippe Morency, Peter Robinson, Andreas Bulling |
Abstract | We present GazeDirector, a new approach for eye gaze redirection that uses model-fitting. Our method first tracks the eyes by fitting a multi-part eye region model to video frames using analysis-by-synthesis, thereby recovering eye region shape, texture, pose, and gaze simultaneously. It then redirects gaze by 1) warping the eyelids from the original image using a model-derived flow field, and 2) rendering and compositing synthesized 3D eyeballs onto the output image in a photorealistic manner. GazeDirector allows us to change where people are looking without person-specific training data, and with full articulation, i.e. we can precisely specify new gaze directions in 3D. Quantitatively, we evaluate both model-fitting and gaze synthesis, with experiments for gaze estimation and redirection on the Columbia gaze dataset. Qualitatively, we compare GazeDirector against recent work on gaze redirection, showing better results especially for large redirection angles. Finally, we demonstrate gaze redirection on YouTube videos by introducing new 3D gaze targets and by manipulating visual behavior. |
Tasks | Gaze Estimation |
Published | 2017-04-27 |
URL | http://arxiv.org/abs/1704.08763v1 http://arxiv.org/pdf/1704.08763v1.pdf |
PWC | https://paperswithcode.com/paper/gazedirector-fully-articulated-eye-gaze |
Repo | |
Framework | |
Personal Names in Modern Turkey
Title | Personal Names in Modern Turkey |
Authors | Amaç Herdağdelen |
Abstract | We analyzed the most common 5000 male and 5000 female Turkish names based on their etymological, morphological, and semantic attributes. The name statistics are based on all Turkish citizens who were alive in 2014 and cover 90% of the population. To the best of our knowledge, this study is the most comprehensive data-driven analysis of Turkish personal names. Female names have a greater diversity than male names (e.g., the top 15 male names cover 25% of the male population, whereas the top 28 female names cover 25% of the female population). Despite their diversity, female names exhibit predictable patterns. For example, certain roots such as gül and nar (rose and pomegranate/red, respectively) are used to generate hundreds of unique female names. Turkish personal names have their origins mainly in Arabic, followed by Turkish and Persian. We computed overall frequencies of names according to broad semantic themes that were identified in previous studies. We found that foreign-origin names such as olga and khaled, pastoral names such as yağmur and deniz (rain and sea, respectively), and names based on fruits and plants such as filiz and menekşe (sprout and violet, respectively) are observed more frequently among females. Among males, names based on animals such as arslan and yunus (lion and dolphin, respectively) and names based on famous and/or historical figures such as mustafa kemal and oğuz kağan (the founder of the Turkish Republic and the founder of the Turks in Turkish mythology, respectively) are observed more frequently. |
Tasks | |
Published | 2017-12-29 |
URL | http://arxiv.org/abs/1801.00049v2 http://arxiv.org/pdf/1801.00049v2.pdf |
PWC | https://paperswithcode.com/paper/personal-names-in-modern-turkey |
Repo | |
Framework | |
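The coverage statistic quoted above (e.g., the top 15 male names covering 25% of the male population) is straightforward to compute; a toy sketch with made-up counts:

```python
from collections import Counter

# Toy illustration of the coverage statistic: how many of the most frequent
# names are needed to cover 25% of a population (counts below are invented).
counts = Counter({"mehmet": 900, "mustafa": 850, "ahmet": 800, "ali": 700,
                  "hasan": 600, "ibrahim": 500, "osman": 400, "yusuf": 300})

def names_to_cover(counts, fraction=0.25):
    total, running = sum(counts.values()), 0
    for k, (_, c) in enumerate(counts.most_common(), start=1):
        running += c
        if running >= fraction * total:
            return k

print(names_to_cover(counts))  # 2 for these toy counts
```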
Learning Generalized Reactive Policies using Deep Neural Networks
Title | Learning Generalized Reactive Policies using Deep Neural Networks |
Authors | Edward Groshev, Maxwell Goldstein, Aviv Tamar, Siddharth Srivastava, Pieter Abbeel |
Abstract | We present a new approach to learning for planning, where knowledge acquired while solving a given set of planning problems is used to plan faster in related, but new problem instances. We show that a deep neural network can be used to learn and represent a \emph{generalized reactive policy} (GRP) that maps a problem instance and a state to an action, and that the learned GRPs efficiently solve large classes of challenging problem instances. In contrast to prior efforts in this direction, our approach significantly reduces the dependence of learning on handcrafted domain knowledge or feature selection. Instead, the GRP is trained from scratch using a set of successful execution traces. We show that our approach can also be used to automatically learn a heuristic function that can be used in directed search algorithms. We evaluate our approach using an extensive suite of experiments on two challenging planning problem domains and show that our approach facilitates learning complex decision making policies and powerful heuristic functions with minimal human input. Videos of our results are available at goo.gl/Hpy4e3. |
Tasks | Decision Making, Feature Selection |
Published | 2017-08-24 |
URL | http://arxiv.org/abs/1708.07280v3 http://arxiv.org/pdf/1708.07280v3.pdf |
PWC | https://paperswithcode.com/paper/learning-generalized-reactive-policies-using |
Repo | |
Framework | |
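As a hedged sketch of the GRP idea, the network below maps a problem-instance encoding plus a state encoding to action logits and is trained imitation-style from actions in successful execution traces; the encodings and layer sizes are placeholders, not the paper's architecture.

```python
import torch
import torch.nn as nn

class GRPNet(nn.Module):
    """Sketch of a generalized reactive policy: a network mapping a problem
    instance encoding plus the current state to action logits."""
    def __init__(self, instance_dim=32, state_dim=16, n_actions=6):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(instance_dim + state_dim, 64), nn.ReLU(),
            nn.Linear(64, n_actions),
        )

    def forward(self, instance, state):
        return self.net(torch.cat([instance, state], dim=-1))  # action logits

# imitation-style signal: target actions come from successful execution traces
policy = GRPNet()
loss = nn.functional.cross_entropy(
    policy(torch.randn(8, 32), torch.randn(8, 16)), torch.randint(0, 6, (8,)))
```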
Variational Autoencoders for Learning Latent Representations of Speech Emotion: A Preliminary Study
Title | Variational Autoencoders for Learning Latent Representations of Speech Emotion: A Preliminary Study |
Authors | Siddique Latif, Rajib Rana, Junaid Qadir, Julien Epps |
Abstract | Learning the latent representation of data in an unsupervised fashion is a very interesting process that provides relevant features for enhancing the performance of a classifier. For speech emotion recognition tasks, generating effective features is crucial. Currently, handcrafted features are mostly used for speech emotion recognition; however, features learned automatically using deep learning have shown strong success in many problems, especially in image processing. In particular, deep generative models such as Variational Autoencoders (VAEs) have gained enormous success in generating features for natural images. Inspired by this, we propose VAEs for deriving the latent representation of speech signals and use this representation to classify emotions. To the best of our knowledge, we are the first to propose VAEs for speech emotion classification. Evaluations on the IEMOCAP dataset demonstrate that features learned by VAEs can produce state-of-the-art results for speech emotion classification. |
Tasks | Emotion Classification, Emotion Recognition, Speech Emotion Recognition |
Published | 2017-12-23 |
URL | http://arxiv.org/abs/1712.08708v2 http://arxiv.org/pdf/1712.08708v2.pdf |
PWC | https://paperswithcode.com/paper/variational-autoencoders-for-learning-latent |
Repo | |
Framework | |
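A minimal VAE sketch in the spirit of the abstract: encode fixed-size speech feature frames into a latent code z (which would then feed an emotion classifier), trained with the standard reconstruction-plus-KL objective. Input and latent dimensions are assumptions.

```python
import torch
import torch.nn as nn

class SpeechVAE(nn.Module):
    """Minimal VAE over fixed-size speech feature frames; the latent code z
    would then feed a downstream emotion classifier. Sizes are assumptions."""
    def __init__(self, in_dim=80, z_dim=16):
        super().__init__()
        self.enc = nn.Linear(in_dim, 64)
        self.mu, self.logvar = nn.Linear(64, z_dim), nn.Linear(64, z_dim)
        self.dec = nn.Sequential(nn.Linear(z_dim, 64), nn.ReLU(),
                                 nn.Linear(64, in_dim))

    def forward(self, x):
        h = torch.relu(self.enc(x))
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()   # reparameterize
        return self.dec(z), mu, logvar

def vae_loss(x, recon, mu, logvar):
    rec = nn.functional.mse_loss(recon, x, reduction="sum")    # reconstruction
    kld = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())  # KL term
    return rec + kld

x = torch.randn(4, 80)                 # stand-in for 80-d feature frames
recon, mu, logvar = SpeechVAE()(x)
print(vae_loss(x, recon, mu, logvar))
```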
AI-Powered Social Bots
Title | AI-Powered Social Bots |
Authors | Terrence Adams |
Abstract | This paper gives an overview of impersonation bots that generate output in one, or possibly, multiple modalities. We also discuss rapidly advancing areas of machine learning and artificial intelligence that could lead to frighteningly powerful new multi-modal social bots. Our main conclusion is that most commonly known bots are one dimensional (i.e., chatterbot), and far from deceiving serious interrogators. However, using recent advances in machine learning, it is possible to unleash incredibly powerful, human-like armies of social bots, in potentially well coordinated campaigns of deception and influence. |
Tasks | |
Published | 2017-06-16 |
URL | http://arxiv.org/abs/1706.05143v1 http://arxiv.org/pdf/1706.05143v1.pdf |
PWC | https://paperswithcode.com/paper/ai-powered-social-bots |
Repo | |
Framework | |
Multi-Modal Obstacle Detection in Unstructured Environments with Conditional Random Fields
Title | Multi-Modal Obstacle Detection in Unstructured Environments with Conditional Random Fields |
Authors | Mikkel Kragh, James Underwood |
Abstract | Reliable obstacle detection and classification in rough and unstructured terrain such as agricultural fields or orchards remains a challenging problem. These environments involve large variations in both geometry and appearance, challenging perception systems that rely on only a single sensor modality. Geometrically, tall grass, fallen leaves, or terrain roughness can mistakenly be perceived as nontraversable or might even obscure actual obstacles. Likewise, traversable grass or dirt roads and obstacles such as trees and bushes might be visually ambiguous. In this paper, we combine appearance- and geometry-based detection methods by probabilistically fusing lidar and camera sensing with semantic segmentation using a conditional random field. We apply a state-of-the-art multimodal fusion algorithm from the scene analysis domain and adjust it for obstacle detection in agriculture with moving ground vehicles. This involves explicitly handling sparse point cloud data and exploiting spatial, temporal, and multimodal links between corresponding 2D and 3D regions. The proposed method was evaluated on a diverse data set, comprising a dairy paddock and different orchards gathered with a perception research robot in Australia. Results showed that for a two-class classification problem (ground and nonground), only the camera benefited from information provided by the other modality, with an increase in the mean classification score of 0.5%. However, as more classes were introduced (ground, sky, vegetation, and object), both modalities complemented each other, with improvements of 1.4% in 2D and 7.9% in 3D. Finally, introducing temporal links between successive frames resulted in improvements of 0.2% in 2D and 1.5% in 3D. |
Tasks | Semantic Segmentation |
Published | 2017-06-09 |
URL | http://arxiv.org/abs/1706.02908v2 http://arxiv.org/pdf/1706.02908v2.pdf |
PWC | https://paperswithcode.com/paper/multi-modal-obstacle-detection-in |
Repo | |
Framework | |
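A hedged sketch of the fusion idea: per-region unaries combine camera and lidar class log-probabilities, and Potts pairwise terms penalize label disagreement across the spatial/temporal/multimodal links. Only the energy is shown; inference is omitted and all weights are assumptions, not the paper's trained CRF.

```python
import numpy as np

def fused_crf_energy(labels, cam_logp, lidar_logp, edges, w_pair=1.0):
    """CRF energy sketch: unaries sum camera and lidar log-probabilities for
    the chosen label; Potts pairwise terms penalize disagreeing labels on
    linked regions. Minimizing this energy (not shown) yields the labeling."""
    unary = -(cam_logp + lidar_logp)[np.arange(len(labels)), labels].sum()
    pairwise = sum(w_pair * (labels[i] != labels[j]) for i, j in edges)
    return unary + pairwise

# toy: 3 regions, 2 classes (ground / nonground), a chain of spatial links
cam = np.log(np.array([[0.8, 0.2], [0.6, 0.4], [0.3, 0.7]]))
lidar = np.log(np.array([[0.7, 0.3], [0.5, 0.5], [0.2, 0.8]]))
print(fused_crf_energy(np.array([0, 0, 1]), cam, lidar, [(0, 1), (1, 2)]))
```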
Linear Convergence of Accelerated Stochastic Gradient Descent for Nonconvex Nonsmooth Optimization
Title | Linear Convergence of Accelerated Stochastic Gradient Descent for Nonconvex Nonsmooth Optimization |
Authors | Feihu Huang, Songcan Chen |
Abstract | In this paper, we study the stochastic gradient descent (SGD) method for nonconvex nonsmooth optimization, and propose an accelerated SGD method by combining the variance reduction technique with Nesterov’s extrapolation technique. Moreover, based on the local error bound condition, we establish the linear convergence of our method for obtaining a stationary point of the nonconvex optimization. In particular, we prove that not only does the generated sequence converge linearly to a stationary point of the problem, but the corresponding sequence of objective values also converges linearly. Finally, some numerical experiments demonstrate the effectiveness of our method. To the best of our knowledge, this is the first proof that an accelerated SGD method converges linearly to a local minimum of a nonconvex optimization problem. |
Tasks | |
Published | 2017-04-26 |
URL | http://arxiv.org/abs/1704.07953v2 http://arxiv.org/pdf/1704.07953v2.pdf |
PWC | https://paperswithcode.com/paper/linear-convergence-of-accelerated-stochastic |
Repo | |
Framework | |
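To illustrate the combination described above, here is a sketch of SVRG-style variance reduction with Nesterov-type extrapolation, applied to a toy nonsmooth objective via subgradients; step sizes, momentum, and schedule are illustrative assumptions, not the paper's analyzed settings.

```python
import numpy as np

def svrg_nesterov(grad, x0, data, lr=0.1, beta=0.9, epochs=5, inner=50):
    """Accelerated variance-reduced SGD sketch: SVRG control variates built
    from a periodic full-gradient snapshot, combined with Nesterov-style
    extrapolation between consecutive iterates."""
    x, x_prev = x0.copy(), x0.copy()
    for _ in range(epochs):
        snapshot = x.copy()
        full_grad = np.mean([grad(snapshot, d) for d in data], axis=0)
        for _ in range(inner):
            y = x + beta * (x - x_prev)                       # extrapolation
            d = data[np.random.randint(len(data))]
            g = grad(y, d) - grad(snapshot, d) + full_grad    # variance-reduced
            x_prev, x = x, y - lr * g
    return x

# toy nonsmooth objective: mean_i |a_i . x - b_i|, optimized via subgradients
data = [(np.random.randn(3), np.random.randn()) for _ in range(100)]
sub = lambda x, d: np.sign(d[0] @ x - d[1]) * d[0]
print(svrg_nesterov(sub, np.zeros(3), data))
```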
Sparse Communication for Distributed Gradient Descent
Title | Sparse Communication for Distributed Gradient Descent |
Authors | Alham Fikri Aji, Kenneth Heafield |
Abstract | We make distributed stochastic gradient descent faster by exchanging sparse updates instead of dense updates. Gradient updates are positively skewed as most updates are near zero, so we map the 99% smallest updates (by absolute value) to zero then exchange sparse matrices. This method can be combined with quantization to further improve the compression. We explore different configurations and apply them to neural machine translation and MNIST image classification tasks. Most configurations work on MNIST, whereas different configurations reduce convergence rate on the more complex translation task. Our experiments show that we can achieve up to 49% speed up on MNIST and 22% on NMT without damaging the final accuracy or BLEU. |
Tasks | Image Classification, Machine Translation, Quantization |
Published | 2017-04-17 |
URL | http://arxiv.org/abs/1704.05021v2 http://arxiv.org/pdf/1704.05021v2.pdf |
PWC | https://paperswithcode.com/paper/sparse-communication-for-distributed-gradient |
Repo | |
Framework | |
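The core of the method is easy to sketch: zero out the 99% smallest updates by absolute value and exchange only the survivors. The residual bookkeeping below (carrying dropped mass to later rounds) is an assumption about implementation detail, not something stated in this abstract.

```python
import numpy as np

def sparsify_top1pct(grad):
    """Keep only the largest 1% of updates by absolute value, as described
    above; the rest are zeroed locally and returned as a residual that could
    be accumulated for later exchanges (an assumed bookkeeping detail)."""
    flat = np.abs(grad).ravel()
    kth = int(0.99 * flat.size)
    thresh = np.partition(flat, kth)[kth]     # 99th-percentile magnitude
    sparse = np.where(np.abs(grad) >= thresh, grad, 0.0)
    residual = grad - sparse                  # dropped mass, kept locally
    return sparse, residual

g = np.random.randn(1000)
s, r = sparsify_top1pct(g)
print(np.count_nonzero(s))  # roughly 10 of 1000 entries survive
```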
Optimal Cooperative Inference
Title | Optimal Cooperative Inference |
Authors | Scott Cheng-Hsin Yang, Yue Yu, Arash Givchi, Pei Wang, Wai Keen Vong, Patrick Shafto |
Abstract | Cooperative transmission of data fosters rapid accumulation of knowledge by efficiently combining experiences across learners. Although well studied in human learning and increasingly in machine learning, we lack formal frameworks through which we may reason about the benefits and limitations of cooperative inference. We present such a framework. We introduce novel indices for measuring the effectiveness of probabilistic and cooperative information transmission. We relate our indices to the well-known Teaching Dimension in deterministic settings. We prove conditions under which optimal cooperative inference can be achieved, including a representation theorem that constrains the form of inductive biases for learners optimized for cooperative inference. We conclude by demonstrating how these principles may inform the design of machine learning algorithms and discuss implications for human and machine learning. |
Tasks | |
Published | 2017-05-24 |
URL | http://arxiv.org/abs/1705.08971v2 http://arxiv.org/pdf/1705.08971v2.pdf |
PWC | https://paperswithcode.com/paper/optimal-cooperative-inference |
Repo | |
Framework | |
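One common formalization of cooperative inference, offered here as an illustrative reading rather than the paper's exact construction, is alternating row/column normalization of a hypothesis-by-data consistency matrix, iterated to a fixed point:

```python
import numpy as np

def cooperative_inference(M, iters=50):
    """Sketch: given a consistency matrix M[h, d] (hypotheses x data), the
    learner normalizes over hypotheses and the teacher over data, alternating
    Sinkhorn-style until the teaching/learning distributions stabilize."""
    P = M.astype(float)
    for _ in range(iters):
        P = P / P.sum(axis=0, keepdims=True)   # learner: P(h | d)
        P = P / P.sum(axis=1, keepdims=True)   # teacher: P(d | h)
    return P

M = np.array([[1, 1, 0], [0, 1, 1], [0, 0, 1.0]])  # toy hypothesis-data matrix
print(np.round(cooperative_inference(M), 3))
```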