May 7, 2019

3444 words 17 mins read

Paper Group AWR 29


Online Learning of Event Definitions. Neural Summarization by Extracting Sentences and Words. Image Captioning with Deep Bidirectional LSTMs. Trained Ternary Quantization. View Synthesis by Appearance Flow. Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization. Action Recognition with Dynamic Image Networks. Regularizing …

Online Learning of Event Definitions

Title Online Learning of Event Definitions
Authors Nikos Katzouris, Alexander Artikis, Georgios Paliouras
Abstract Systems for symbolic event recognition infer occurrences of events in time using a set of event definitions in the form of first-order rules. The Event Calculus is a temporal logic that has been used as a basis in event recognition applications, providing, among others, direct connections to machine learning via Inductive Logic Programming (ILP). We present an ILP system for online learning of Event Calculus theories. To allow for a single-pass learning strategy, we use the Hoeffding bound for evaluating clauses on a subset of the input stream. We employ a decoupling scheme of the Event Calculus axioms during the learning process that allows each clause to be learned in isolation. Moreover, we use abductive-inductive logic programming techniques to handle unobserved target predicates. We evaluate our approach on an activity recognition application and compare it to a number of batch learning techniques. We obtain results of comparable predictive accuracy with significant speed-ups in training time. We also outperform hand-crafted rules and match the performance of a sound incremental learner that can only operate on noise-free datasets. This paper is under consideration for acceptance in TPLP.
Tasks Activity Recognition
Published 2016-07-30
URL http://arxiv.org/abs/1608.00100v1
PDF http://arxiv.org/pdf/1608.00100v1.pdf
PWC https://paperswithcode.com/paper/online-learning-of-event-definitions
Repo https://github.com/nkatzz/OLED
Framework none
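
The single-pass strategy hinges on the Hoeffding bound: a clause is kept once its observed advantage over the runner-up, measured on the examples seen so far, exceeds the bound's epsilon. A minimal sketch of that decision rule (function names are illustrative, not taken from OLED):

```python
import math

def hoeffding_epsilon(value_range: float, delta: float, n: int) -> float:
    """Hoeffding bound: with probability 1 - delta, the true mean of a
    variable with range `value_range` is within epsilon of the mean
    observed over n samples."""
    return math.sqrt((value_range ** 2) * math.log(1.0 / delta) / (2.0 * n))

def should_commit(best_mean, runner_up_mean, value_range, delta, n):
    """Single-pass selection rule: commit to the best clause once its
    observed advantage over the runner-up exceeds the Hoeffding epsilon."""
    return (best_mean - runner_up_mean) > hoeffding_epsilon(value_range, delta, n)

# Example: clause scores in [0, 1], 95% confidence, 2000 stream examples seen so far.
print(should_commit(0.81, 0.74, value_range=1.0, delta=0.05, n=2000))  # True
```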

Neural Summarization by Extracting Sentences and Words

Title Neural Summarization by Extracting Sentences and Words
Authors Jianpeng Cheng, Mirella Lapata
Abstract Traditional approaches to extractive summarization rely heavily on human-engineered features. In this work we propose a data-driven approach based on neural networks and continuous sentence features. We develop a general framework for single-document summarization composed of a hierarchical document encoder and an attention-based extractor. This architecture allows us to develop different classes of summarization models which can extract sentences or words. We train our models on large scale corpora containing hundreds of thousands of document-summary pairs. Experimental results on two summarization datasets demonstrate that our models obtain results comparable to the state of the art without any access to linguistic annotation.
Tasks Document Summarization
Published 2016-03-23
URL http://arxiv.org/abs/1603.07252v3
PDF http://arxiv.org/pdf/1603.07252v3.pdf
PWC https://paperswithcode.com/paper/neural-summarization-by-extracting-sentences
Repo https://github.com/kedz/nnsum
Framework pytorch
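
As a rough illustration of the extractive setup, a sentence-level encoder feeding a document-level encoder that scores each sentence for extraction, here is a generic PyTorch sketch. It is not the paper's architecture (which adds an attention-based extractor and a word extractor); all module names and sizes are illustrative.

```python
import torch
import torch.nn as nn

class SentenceExtractor(nn.Module):
    """Illustrative hierarchical extractor: encode each sentence from its
    word embeddings, run a document-level LSTM over sentence vectors, and
    score every sentence for extraction."""
    def __init__(self, vocab_size, emb_dim=128, sent_dim=256, doc_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
        self.sent_enc = nn.LSTM(emb_dim, sent_dim, batch_first=True)
        self.doc_enc = nn.LSTM(sent_dim, doc_dim, batch_first=True)
        self.score = nn.Linear(doc_dim, 1)

    def forward(self, docs):  # docs: (batch, n_sents, n_words) word ids
        b, s, w = docs.shape
        words = self.embed(docs.view(b * s, w))      # (b*s, w, emb)
        _, (h, _) = self.sent_enc(words)             # h: (1, b*s, sent_dim)
        sents = h.squeeze(0).view(b, s, -1)          # (b, s, sent_dim)
        ctx, _ = self.doc_enc(sents)                 # (b, s, doc_dim)
        return self.score(ctx).squeeze(-1)           # extraction logits per sentence

logits = SentenceExtractor(vocab_size=10000)(torch.randint(1, 10000, (2, 12, 30)))
print(logits.shape)  # torch.Size([2, 12])
```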

Image Captioning with Deep Bidirectional LSTMs

Title Image Captioning with Deep Bidirectional LSTMs
Authors Cheng Wang, Haojin Yang, Christian Bartz, Christoph Meinel
Abstract This work presents an end-to-end trainable deep bidirectional LSTM (Long Short-Term Memory) model for image captioning. Our model builds on a deep convolutional neural network (CNN) and two separate LSTM networks. It is capable of learning long-term visual-language interactions by making use of history and future context information in a high-level semantic space. Two novel deep bidirectional variant models, in which we increase the depth of the nonlinearity transition in different ways, are proposed to learn hierarchical visual-language embeddings. Data augmentation techniques such as multi-crop, multi-scale and vertical mirror are proposed to prevent overfitting when training deep models. We visualize the evolution of bidirectional LSTM internal states over time and qualitatively analyze how our models “translate” an image into a sentence. Our proposed models are evaluated on caption generation and image-sentence retrieval tasks with three benchmark datasets: Flickr8K, Flickr30K and MSCOCO. We demonstrate that bidirectional LSTM models achieve performance highly competitive with state-of-the-art results on caption generation, even without integrating additional mechanisms (e.g., object detection or attention models), and significantly outperform recent methods on the retrieval task.
Tasks Data Augmentation, Image Captioning, Object Detection
Published 2016-04-04
URL http://arxiv.org/abs/1604.00790v3
PDF http://arxiv.org/pdf/1604.00790v3.pdf
PWC https://paperswithcode.com/paper/image-captioning-with-deep-bidirectional
Repo https://github.com/deepsemantic/image_captioning
Framework none
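
The retrieval side of the evaluation boils down to scoring image-sentence pairs with a bidirectional LSTM over the caption and CNN features for the image. Below is a generic PyTorch sketch of such a scorer, not the paper's specific deep bidirectional variants; the pooling and similarity choices are assumptions.

```python
import torch
import torch.nn as nn

class BiLSTMCaptionScorer(nn.Module):
    """Illustrative image-sentence scorer: a bidirectional LSTM encodes the
    caption, and its pooled states are compared to a projected CNN feature."""
    def __init__(self, vocab_size, feat_dim=2048, emb_dim=256, hid_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
        self.bilstm = nn.LSTM(emb_dim, hid_dim, batch_first=True, bidirectional=True)
        self.img_proj = nn.Linear(feat_dim, 2 * hid_dim)

    def forward(self, captions, img_feats):
        out, _ = self.bilstm(self.embed(captions))      # (b, T, 2*hid)
        cap_vec = out.mean(dim=1)                       # mean-pool over time
        img_vec = self.img_proj(img_feats)              # (b, 2*hid)
        return nn.functional.cosine_similarity(cap_vec, img_vec)  # (b,)

scores = BiLSTMCaptionScorer(8000)(torch.randint(1, 8000, (4, 16)), torch.randn(4, 2048))
print(scores.shape)  # torch.Size([4])
```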

Trained Ternary Quantization

Title Trained Ternary Quantization
Authors Chenzhuo Zhu, Song Han, Huizi Mao, William J. Dally
Abstract Deep neural networks are widely used in machine learning applications. However, large neural network models can be difficult to deploy on mobile devices with limited power budgets. To solve this problem, we propose Trained Ternary Quantization (TTQ), a method that can reduce the precision of weights in neural networks to ternary values. This method has very little accuracy degradation and can even improve the accuracy of some models (32-, 44-, and 56-layer ResNet) on CIFAR-10 and AlexNet on ImageNet. Our AlexNet model is trained from scratch, which means it is as easy to train as a normal full-precision model. We highlight that our trained quantization method learns both the ternary values and the ternary assignment. During inference, only the ternary values (2-bit weights) and scaling factors are needed, so our models are nearly 16x smaller than full-precision models. Our ternary models can also be viewed as sparse binary weight networks, which can potentially be accelerated with custom circuits. Experiments on CIFAR-10 show that the ternary models obtained by our trained quantization method outperform full-precision ResNet-32, ResNet-44, and ResNet-56 by 0.04%, 0.16%, and 0.36%, respectively. On ImageNet, our model outperforms the full-precision AlexNet model by 0.3% Top-1 accuracy and outperforms previous ternary models by 3%.
Tasks Quantization
Published 2016-12-04
URL http://arxiv.org/abs/1612.01064v3
PDF http://arxiv.org/pdf/1612.01064v3.pdf
PWC https://paperswithcode.com/paper/trained-ternary-quantization
Repo https://github.com/vinsis/ternary-quantization
Framework pytorch
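
The heart of TTQ is the ternarization step: weights with small magnitude are zeroed, and the rest are mapped to two learned scaling factors. A hedged PyTorch sketch of just that step (the threshold fraction and the training loop around it are assumptions, not the paper's exact recipe):

```python
import torch

def ternarize(weights, wp, wn, threshold_frac=0.05):
    """Quantize full-precision weights to {-wn, 0, +wp}.
    The threshold is a fixed fraction of the largest weight magnitude
    (an assumption for this sketch); wp and wn are learned scalars."""
    delta = threshold_frac * weights.abs().max()
    pos = (weights > delta).float()
    neg = (weights < -delta).float()
    return wp * pos - wn * neg

w = torch.randn(64, 64)
wp = torch.tensor(1.2, requires_grad=True)   # learned positive scale
wn = torch.tensor(0.9, requires_grad=True)   # learned negative scale
w_t = ternarize(w, wp, wn)
print(sorted(w_t.detach().unique().tolist()))  # ≈ [-0.9, 0.0, 1.2]: 2-bit weights at inference
```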

View Synthesis by Appearance Flow

Title View Synthesis by Appearance Flow
Authors Tinghui Zhou, Shubham Tulsiani, Weilun Sun, Jitendra Malik, Alexei A. Efros
Abstract We address the problem of novel view synthesis: given an input image, synthesizing new images of the same object or scene observed from arbitrary viewpoints. We approach this as a learning task but, critically, instead of learning to synthesize pixels from scratch, we learn to copy them from the input image. Our approach exploits the observation that the visual appearance of different views of the same instance is highly correlated, and such correlation could be explicitly learned by training a convolutional neural network (CNN) to predict appearance flows – 2-D coordinate vectors specifying which pixels in the input view could be used to reconstruct the target view. Furthermore, the proposed framework easily generalizes to multiple input views by learning how to optimally combine single-view predictions. We show that for both objects and scenes, our approach is able to synthesize novel views of higher perceptual quality than previous CNN-based techniques.
Tasks Novel View Synthesis
Published 2016-05-11
URL http://arxiv.org/abs/1605.03557v3
PDF http://arxiv.org/pdf/1605.03557v3.pdf
PWC https://paperswithcode.com/paper/view-synthesis-by-appearance-flow
Repo https://github.com/RenYurui/Global-Flow-Local-Attention
Framework pytorch
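
The sampling step at the core of appearance flow, copying input pixels according to a predicted 2-D coordinate field, maps directly onto bilinear grid sampling. A small PyTorch sketch with a random flow standing in for the CNN prediction:

```python
import torch
import torch.nn.functional as F

# Input view: batch of RGB images (N, C, H, W).
src = torch.rand(1, 3, 64, 64)

# Appearance flow: for every target pixel, normalized (x, y) coordinates in
# [-1, 1] pointing at the source pixel to copy. A CNN would predict this;
# here it is an identity grid plus noise, purely for illustration.
ys, xs = torch.meshgrid(torch.linspace(-1, 1, 64), torch.linspace(-1, 1, 64), indexing="ij")
flow = torch.stack([xs, ys], dim=-1).unsqueeze(0)        # (1, H, W, 2)
flow = flow + 0.02 * torch.randn_like(flow)

# Synthesize the target view by bilinearly resampling the input view.
target = F.grid_sample(src, flow, mode="bilinear", align_corners=True)
print(target.shape)  # torch.Size([1, 3, 64, 64])
```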

Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization

Title Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization
Authors Ramprasaath R. Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, Dhruv Batra
Abstract We propose a technique for producing “visual explanations” for decisions from a large class of CNN-based models, making them more transparent. Our approach, Gradient-weighted Class Activation Mapping (Grad-CAM), uses the gradients of any target concept, flowing into the final convolutional layer, to produce a coarse localization map highlighting important regions in the image for predicting the concept. Grad-CAM is applicable to a wide variety of CNN model families: (1) CNNs with fully-connected layers, (2) CNNs used for structured outputs, and (3) CNNs used in tasks with multimodal inputs or reinforcement learning, without any architectural changes or re-training. We combine Grad-CAM with fine-grained visualizations to create a high-resolution class-discriminative visualization and apply it to off-the-shelf image classification, captioning, and visual question answering (VQA) models, including ResNet-based architectures. In the context of image classification models, our visualizations (a) lend insights into their failure modes, (b) are robust to adversarial images, (c) outperform previous methods on localization, (d) are more faithful to the underlying model, and (e) help achieve generalization by identifying dataset bias. For captioning and VQA, we show that even non-attention-based models can localize inputs. We devise a way to identify important neurons through Grad-CAM and combine it with neuron names to provide textual explanations for model decisions. Finally, we design and conduct human studies to measure whether Grad-CAM helps users establish appropriate trust in predictions from models, and show that Grad-CAM helps untrained users successfully discern a ‘stronger’ model from a ‘weaker’ one even when both make identical predictions. Our code is available at https://github.com/ramprs/grad-cam/, along with a demo at http://gradcam.cloudcv.org, and a video at youtu.be/COjUB9Izk6E.
Tasks Image Classification, Interpretable Machine Learning, Visual Question Answering
Published 2016-10-07
URL https://arxiv.org/abs/1610.02391v4
PDF https://arxiv.org/pdf/1610.02391v4.pdf
PWC https://paperswithcode.com/paper/grad-cam-visual-explanations-from-deep
Repo https://github.com/Murali81/Grad-CAM
Framework none
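
A minimal PyTorch sketch of the Grad-CAM computation itself: grab the activations of the last convolutional block, backpropagate the target class score, weight each channel by its global-average-pooled gradient, and apply a ReLU. The choice of layer4 and the random input are assumptions for illustration; a real use would load pretrained weights and a preprocessed image.

```python
import torch
import torch.nn.functional as F
from torchvision import models

model = models.resnet18().eval()
acts = {}
# Capture the activations of the last convolutional block (assumed target layer).
model.layer4.register_forward_hook(lambda module, inp, out: acts.update(a=out))

x = torch.rand(1, 3, 224, 224)
logits = model(x)
acts["a"].retain_grad()                        # keep gradients of the feature maps
logits[0, logits.argmax()].backward()          # gradient of the predicted-class score

weights = acts["a"].grad.mean(dim=(2, 3), keepdim=True)   # channel importance: GAP of gradients
cam = F.relu((weights * acts["a"]).sum(dim=1, keepdim=True))
cam = F.interpolate(cam, size=x.shape[2:], mode="bilinear", align_corners=False)
cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)  # coarse localization map in [0, 1]
print(cam.shape)  # torch.Size([1, 1, 224, 224])
```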

Action Recognition with Dynamic Image Networks

Title Action Recognition with Dynamic Image Networks
Authors Hakan Bilen, Basura Fernando, Efstratios Gavves, Andrea Vedaldi
Abstract We introduce the concept of the “dynamic image”, a novel compact representation of videos useful for video analysis, particularly in combination with convolutional neural networks (CNNs). A dynamic image encodes temporal data such as RGB or optical flow videos by using the concept of “rank pooling”. The idea is to learn a ranking machine that captures the temporal evolution of the data and to use the parameters of the latter as a representation. When a linear ranking machine is used, the resulting representation is in the form of an image, which we call dynamic because it summarizes the video dynamics in addition to its appearance. This is a powerful idea because it allows any video to be converted into an image, so that existing CNN models pre-trained for the analysis of still images can be immediately extended to videos. We also present an efficient and effective approximate rank pooling operator that accelerates standard rank pooling algorithms by orders of magnitude, and formulate it as a CNN layer. This new layer allows generalizing dynamic images to dynamic feature maps. We demonstrate the power of the new representations on standard action recognition benchmarks, achieving state-of-the-art performance.
Tasks Optical Flow Estimation, Temporal Action Localization
Published 2016-12-02
URL http://arxiv.org/abs/1612.00738v2
PDF http://arxiv.org/pdf/1612.00738v2.pdf
PWC https://paperswithcode.com/paper/action-recognition-with-dynamic-image
Repo https://github.com/hbilen/dynamic-image-nets
Framework none
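
With approximate rank pooling, a dynamic image reduces to a fixed weighted sum of the frames. The NumPy sketch below uses the simple linear coefficients alpha_t = 2t - T - 1, a common simplification; the paper also derives harmonic-number-based coefficients, and pooling over running averages of frames changes the weights.

```python
import numpy as np

def dynamic_image(frames):
    """Collapse a video (T, H, W, C) into a single image by approximate rank
    pooling with linear coefficients alpha_t = 2t - T - 1 (a simplified form;
    see the paper for the harmonic-number variant)."""
    T = frames.shape[0]
    t = np.arange(1, T + 1, dtype=np.float64)
    alpha = 2.0 * t - T - 1.0                      # later frames get larger weights
    di = np.tensordot(alpha, frames.astype(np.float64), axes=(0, 0))
    # Rescale to [0, 255] so the result can be fed to an image CNN.
    di = 255.0 * (di - di.min()) / (di.max() - di.min() + 1e-8)
    return di.astype(np.uint8)

video = np.random.randint(0, 256, size=(16, 112, 112, 3), dtype=np.uint8)
print(dynamic_image(video).shape)  # (112, 112, 3)
```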

Regularizing CNNs with Locally Constrained Decorrelations

Title Regularizing CNNs with Locally Constrained Decorrelations
Authors Pau Rodríguez, Jordi Gonzàlez, Guillem Cucurull, Josep M. Gonfaus, Xavier Roca
Abstract Regularization is key for deep learning since it allows training more complex models while keeping overfitting low. However, the most prevalent regularizers do not leverage the full capacity of the models, since they rely on reducing the effective number of parameters. Feature decorrelation is an alternative that uses the full capacity of the models, but its overfitting-reduction margins are narrow given the overhead it introduces. In this paper, we show that regularizing negatively correlated features is an obstacle to effective decorrelation, and present OrthoReg, a novel regularization technique that locally enforces feature orthogonality. Imposing locality constraints in feature decorrelation removes interferences between negatively correlated feature weights, allowing the regularizer to reach higher decorrelation bounds and reduce overfitting more effectively. In particular, we show that models regularized with OrthoReg have higher accuracy bounds even when batch normalization and dropout are present. Moreover, since our regularization is performed directly on the weights, it is especially suitable for fully convolutional neural networks, where the weight space is constant compared to the feature map space. As a result, we are able to reduce the overfitting of state-of-the-art CNNs on CIFAR-10, CIFAR-100, and SVHN.
Tasks
Published 2016-11-07
URL http://arxiv.org/abs/1611.01967v2
PDF http://arxiv.org/pdf/1611.01967v2.pdf
PWC https://paperswithcode.com/paper/regularizing-cnns-with-locally-constrained
Repo https://github.com/prlz77/orthoreg
Framework none
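
In spirit, OrthoReg penalizes only the positively correlated filter pairs, so that negatively correlated ones do not interfere with decorrelation. The sketch below expresses that idea as a differentiable penalty on a convolution's weights; the exact functional form and update rule in the paper differ, so treat this as an illustration rather than the reference implementation.

```python
import torch
import torch.nn.functional as F

def orthoreg_penalty(conv_weight, gamma=10.0):
    """Illustrative local decorrelation penalty: flatten each filter, compute
    pairwise cosine similarities, and penalize only the positively correlated
    pairs with a softplus of the scaled similarity."""
    w = F.normalize(conv_weight.flatten(1), dim=1)   # one unit vector per output filter
    cos = w @ w.t()                                  # pairwise cosine similarities
    n = cos.size(0)
    off_diag = ~torch.eye(n, dtype=torch.bool, device=cos.device)
    positive = off_diag & (cos > 0)                  # "local": positively correlated pairs only
    return F.softplus(gamma * cos[positive]).sum() / n

conv = torch.nn.Conv2d(64, 128, kernel_size=3)
penalty = orthoreg_penalty(conv.weight)
print(penalty)  # add a scaled version of this to the task loss during training
```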

DeepMind Lab

Title DeepMind Lab
Authors Charles Beattie, Joel Z. Leibo, Denis Teplyashin, Tom Ward, Marcus Wainwright, Heinrich Küttler, Andrew Lefrancq, Simon Green, Víctor Valdés, Amir Sadik, Julian Schrittwieser, Keith Anderson, Sarah York, Max Cant, Adam Cain, Adrian Bolton, Stephen Gaffney, Helen King, Demis Hassabis, Shane Legg, Stig Petersen
Abstract DeepMind Lab is a first-person 3D game platform designed for research and development of general artificial intelligence and machine learning systems. DeepMind Lab can be used to study how autonomous artificial agents may learn complex tasks in large, partially observed, and visually diverse worlds. DeepMind Lab has a simple and flexible API enabling creative task-designs and novel AI-designs to be explored and quickly iterated upon. It is powered by a fast and widely recognised game engine, and tailored for effective use by the research community.
Tasks
Published 2016-12-12
URL http://arxiv.org/abs/1612.03801v2
PDF http://arxiv.org/pdf/1612.03801v2.pdf
PWC https://paperswithcode.com/paper/deepmind-lab
Repo https://github.com/deepmind/lab
Framework none
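
A typical interaction loop with the Python API looks roughly like the sketch below. The level name, observation name, and config values are assumptions that vary across releases; env.observation_spec() and env.action_spec() report the exact strings for a given build.

```python
# Minimal interaction loop with the DeepMind Lab Python API (sketch; names may
# differ by release -- check env.observation_spec() / env.action_spec()).
import numpy as np
import deepmind_lab

env = deepmind_lab.Lab('seekavoid_arena_01', ['RGB_INTERLEAVED'],
                       config={'width': '84', 'height': '84'})
env.reset()

noop = np.zeros((len(env.action_spec()),), dtype=np.intc)  # one entry per action dimension
total_reward = 0.0
while env.is_running():
    obs = env.observations()     # dict, e.g. obs['RGB_INTERLEAVED'] is an (84, 84, 3) array
    total_reward += env.step(noop, num_steps=4)
print(total_reward)
```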

Structured and Efficient Variational Deep Learning with Matrix Gaussian Posteriors

Title Structured and Efficient Variational Deep Learning with Matrix Gaussian Posteriors
Authors Christos Louizos, Max Welling
Abstract We introduce a variational Bayesian neural network where the parameters are governed via a probability distribution on random matrices. Specifically, we employ a matrix variate Gaussian \cite{gupta1999matrix} parameter posterior distribution where we explicitly model the covariance among the input and output dimensions of each layer. Furthermore, with approximate covariance matrices we can achieve a more efficient way to represent those correlations that is also cheaper than fully factorized parameter posteriors. We further show that with the “local reparametrization trick” \cite{kingma2015variational} on this posterior distribution we arrive at a Gaussian Process \cite{rasmussen2006gaussian} interpretation of the hidden units in each layer and we, similarly to \cite{gal2015dropout}, provide connections with deep Gaussian processes. We continue by taking advantage of this duality and incorporate “pseudo-data” \cite{snelson2005sparse} in our model, which in turn allows for more efficient sampling while maintaining the properties of the original model. The validity of the proposed approach is verified through extensive experiments.
Tasks Gaussian Processes
Published 2016-03-15
URL http://arxiv.org/abs/1603.04733v5
PDF http://arxiv.org/pdf/1603.04733v5.pdf
PWC https://paperswithcode.com/paper/structured-and-efficient-variational-deep
Repo https://github.com/AMLab-Amsterdam/SEVDL_MGP
Framework none
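
The central object is a matrix-variate Gaussian posterior over each weight matrix, with one covariance over the input dimensions and one over the output dimensions. Sampling from it only needs two Cholesky factors and a standard-normal matrix; a NumPy sketch with illustrative covariances:

```python
import numpy as np

def sample_matrix_gaussian(M, U, V, rng=None):
    """Draw W ~ MN(M, U, V): mean M (n_in x n_out), row covariance U
    (n_in x n_in), column covariance V (n_out x n_out).
    Uses W = M + A E B^T with U = A A^T, V = B B^T and E ~ N(0, I)."""
    if rng is None:
        rng = np.random.default_rng(0)
    A = np.linalg.cholesky(U)
    B = np.linalg.cholesky(V)
    E = rng.standard_normal(M.shape)
    return M + A @ E @ B.T

n_in, n_out = 5, 3
M = np.zeros((n_in, n_out))
U = 0.1 * np.eye(n_in) + 0.05   # row (input-dimension) covariance
V = 0.2 * np.eye(n_out)         # column (output-dimension) covariance
W = sample_matrix_gaussian(M, U, V)
print(W.shape)  # (5, 3); far fewer covariance parameters than a full posterior over n_in*n_out weights
```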

Stealing Machine Learning Models via Prediction APIs

Title Stealing Machine Learning Models via Prediction APIs
Authors Florian Tramèr, Fan Zhang, Ari Juels, Michael K. Reiter, Thomas Ristenpart
Abstract Machine learning (ML) models may be deemed confidential due to their sensitive training data, commercial value, or use in security applications. Increasingly often, confidential ML models are being deployed with publicly accessible query interfaces. ML-as-a-service (“predictive analytics”) systems are an example: Some allow users to train models on potentially sensitive data and charge others for access on a pay-per-query basis. The tension between model confidentiality and public access motivates our investigation of model extraction attacks. In such attacks, an adversary with black-box access, but no prior knowledge of an ML model’s parameters or training data, aims to duplicate the functionality of (i.e., “steal”) the model. Unlike in classical learning theory settings, ML-as-a-service offerings may accept partial feature vectors as inputs and include confidence values with predictions. Given these practices, we show simple, efficient attacks that extract target ML models with near-perfect fidelity for popular model classes including logistic regression, neural networks, and decision trees. We demonstrate these attacks against the online services of BigML and Amazon Machine Learning. We further show that the natural countermeasure of omitting confidence values from model outputs still admits potentially harmful model extraction attacks. Our results highlight the need for careful ML model deployment and new model extraction countermeasures.
Tasks
Published 2016-09-09
URL http://arxiv.org/abs/1609.02943v2
PDF http://arxiv.org/pdf/1609.02943v2.pdf
PWC https://paperswithcode.com/paper/stealing-machine-learning-models-via
Repo https://github.com/ftramer/Steal-ML
Framework none
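
For a logistic regression that returns confidence values, the equation-solving flavor of the attack is strikingly simple: d + 1 queries, an inverse sigmoid, and one linear solve recover the weights exactly. A NumPy sketch with a locally simulated victim standing in for the prediction API:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 10
w_true, b_true = rng.normal(size=d), 0.5          # the victim's secret parameters

def api_predict(X):
    """Stand-in for the victim's prediction API: returns confidence values."""
    return 1.0 / (1.0 + np.exp(-(X @ w_true + b_true)))

# Equation-solving attack: d + 1 random queries give d + 1 linear equations
# logit(p) = w . x + b, which determine (w, b) exactly.
X = rng.normal(size=(d + 1, d))
p = api_predict(X)
A = np.hstack([X, np.ones((d + 1, 1))])           # unknowns: [w, b]
theta = np.linalg.solve(A, np.log(p / (1.0 - p)))
print(np.allclose(theta[:-1], w_true), np.isclose(theta[-1], b_true))  # True True
```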

From Node Embedding To Community Embedding

Title From Node Embedding To Community Embedding
Authors Vincent W. Zheng, Sandro Cavallari, Hongyun Cai, Kevin Chen-Chuan Chang, Erik Cambria
Abstract Most of the existing graph embedding methods focus on nodes: they aim to output a vector representation for each node in the graph such that two nodes being “close” on the graph are also close in the low-dimensional space. Despite the success of embedding individual nodes for graph analytics, we notice that an important concept of embedding communities (i.e., groups of nodes) is missing. Embedding communities is useful, not only for supporting various community-level applications, but also to help preserve community structure in graph embedding. In fact, we see community embedding as providing a higher-order proximity to define the node closeness, whereas most of the popular graph embedding methods focus on first-order and/or second-order proximities. To learn the community embedding, we hinge upon the insight that community embedding and node embedding reinforce each other. As a result, we propose ComEmbed, the first community embedding method, which jointly optimizes the community embedding and node embedding together. We evaluate ComEmbed on real-world data sets. We show it outperforms the state-of-the-art baselines in both tasks of node classification and community prediction.
Tasks Graph Embedding, Node Classification
Published 2016-10-31
URL http://arxiv.org/abs/1610.09950v2
PDF http://arxiv.org/pdf/1610.09950v2.pdf
PWC https://paperswithcode.com/paper/from-node-embedding-to-community-embedding
Repo https://github.com/vwz/topolstm
Framework none
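
One concrete way to realize community embedding on top of node embeddings is to model each community as a Gaussian component in the embedding space and alternate with node-embedding updates. The sketch below shows only the community side with scikit-learn, on synthetic embeddings; the joint optimization loop and the paper's exact objective are omitted.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Pretend node embeddings from any node-embedding method (e.g. DeepWalk/node2vec).
rng = np.random.default_rng(0)
node_emb = np.vstack([rng.normal(loc=c, scale=0.3, size=(50, 16)) for c in (-1.0, 0.0, 1.0)])

# Community embedding: each community is a Gaussian (mean + covariance) in the
# embedding space; memberships are the posterior responsibilities.
gmm = GaussianMixture(n_components=3, covariance_type='diag', random_state=0).fit(node_emb)
community_means = gmm.means_                 # (3, 16) community embeddings
membership = gmm.predict_proba(node_emb)     # soft node-to-community assignment
print(community_means.shape, membership.shape)  # (3, 16) (150, 3)

# In a joint scheme, these community parameters would in turn regularize the
# next round of node-embedding updates (pulling nodes toward their community).
```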

Cost-Sensitive Label Embedding for Multi-Label Classification

Title Cost-Sensitive Label Embedding for Multi-Label Classification
Authors Kuan-Hao Huang, Hsuan-Tien Lin
Abstract Label embedding (LE) is an important family of multi-label classification algorithms that digest the label information jointly for better performance. Different real-world applications evaluate performance by different cost functions of interest. Current LE algorithms often aim to optimize one specific cost function, but they can suffer from bad performance with respect to other cost functions. In this paper, we resolve the performance issue by proposing a novel cost-sensitive LE algorithm that takes the cost function of interest into account. The proposed algorithm, cost-sensitive label embedding with multidimensional scaling (CLEMS), approximates the cost information with the distances of the embedded vectors by using the classic multidimensional scaling approach for manifold learning. CLEMS is able to deal with both symmetric and asymmetric cost functions, and effectively makes cost-sensitive decisions by nearest-neighbor decoding within the embedded vectors. We derive theoretical results that justify how CLEMS achieves the desired cost-sensitivity. Furthermore, extensive experimental results demonstrate that CLEMS is significantly better than a wide spectrum of existing LE algorithms and state-of-the-art cost-sensitive algorithms across different cost functions.
Tasks Multi-Label Classification
Published 2016-03-30
URL http://arxiv.org/abs/1603.09048v5
PDF http://arxiv.org/pdf/1603.09048v5.pdf
PWC https://paperswithcode.com/paper/cost-sensitive-label-embedding-for-multi
Repo https://github.com/ej0cl6/csmlc
Framework none
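
CLEMS has three moving parts: pairwise costs between label vectors, a multidimensional-scaling embedding whose distances mimic those costs, and nearest-neighbor decoding in the embedded space. The scikit-learn sketch below walks through those steps with a plain Hamming cost and unweighted MDS, which glosses over the weighted, asymmetric-cost machinery of the actual algorithm.

```python
import numpy as np
from sklearn.manifold import MDS
from sklearn.metrics import pairwise_distances
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
Y = rng.integers(0, 2, size=(40, 6))          # observed label vectors (40 samples, 6 labels)

# (i) Cost between label vectors -- Hamming loss here; CLEMS supports general
#     (even asymmetric) cost functions via a weighted MDS.
cost = pairwise_distances(Y, metric='hamming')

# (ii) Embed label vectors so that Euclidean distances approximate the costs.
embed = MDS(n_components=4, dissimilarity='precomputed', random_state=0)
Z = embed.fit_transform(cost)                  # (40, 4) embedded label vectors

# (iii) A regressor would map features x -> z; decoding then returns the label
#       vector of the nearest embedded training point.
nn = NearestNeighbors(n_neighbors=1).fit(Z)
z_pred = Z[3] + 0.01 * rng.normal(size=4)      # stand-in for a regressor's output
_, idx = nn.kneighbors(z_pred.reshape(1, -1))
print(Y[idx[0, 0]])                            # decoded label vector
```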

The Little Engine that Could: Regularization by Denoising (RED)

Title The Little Engine that Could: Regularization by Denoising (RED)
Authors Yaniv Romano, Michael Elad, Peyman Milanfar
Abstract Removal of noise from an image is an extensively studied problem in image processing. Indeed, the recent advent of sophisticated and highly effective denoising algorithms has led some to believe that existing methods are touching the ceiling in terms of noise removal performance. Can we leverage this impressive achievement to treat other tasks in image processing? Recent work has answered this question positively, in the form of the Plug-and-Play Prior ($P^3$) method, showing that any inverse problem can be handled by sequentially applying image denoising steps. This relies heavily on the ADMM optimization technique in order to obtain this chained denoising interpretation. Is this the only way in which tasks in image processing can exploit the image denoising engine? In this paper we provide an alternative, more powerful and more flexible framework for achieving the same goal. As opposed to the $P^3$ method, we offer Regularization by Denoising (RED): using the denoising engine to define the regularization of the inverse problem. We propose an explicit image-adaptive Laplacian-based regularization functional, making the overall objective functional clearer and better defined. With complete flexibility to choose the iterative optimization procedure for minimizing this functional, RED can incorporate any image denoising algorithm, treats general inverse problems very effectively, and is guaranteed to converge to the globally optimal result. We test this approach and demonstrate state-of-the-art results in the image deblurring and super-resolution problems.
Tasks Deblurring, Denoising, Image Denoising, Super-Resolution
Published 2016-11-09
URL http://arxiv.org/abs/1611.02862v3
PDF http://arxiv.org/pdf/1611.02862v3.pdf
PWC https://paperswithcode.com/paper/the-little-engine-that-could-regularization
Repo https://github.com/happyhongt/Acceleration-of-RED-via-Vector-Extrapolation
Framework none
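
RED's regularizer is rho(x) = (1/2) x^T (x - f(x)) for a denoiser f, and under the paper's conditions on f (local homogeneity and a symmetric Jacobian) its gradient is simply x - f(x), so any first-order solver applies. Here is a NumPy sketch of gradient-descent RED on the simplest case, plain denoising, with a Gaussian-blur stand-in for the denoising engine; a real application would plug in a stronger denoiser and a task-specific forward operator.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def red_denoise(y, denoiser, lam=1.0, steps=100, mu=0.2):
    """Gradient-descent RED for the simplest inverse problem (H = I):
    minimize 0.5 ||x - y||^2 + (lam / 2) x^T (x - f(x)).
    Under RED's conditions on the denoiser f, the gradient of the prior
    term is simply lam * (x - f(x))."""
    x = y.copy()
    for _ in range(steps):
        grad = (x - y) + lam * (x - denoiser(x))
        x = x - mu * grad
    return x

rng = np.random.default_rng(0)
clean = np.zeros((64, 64)); clean[16:48, 16:48] = 1.0
noisy = clean + 0.2 * rng.normal(size=clean.shape)
restored = red_denoise(noisy, denoiser=lambda x: gaussian_filter(x, sigma=1.0))
print(np.mean((restored - clean) ** 2) < np.mean((noisy - clean) ** 2))  # True: error is reduced
```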

Knowledge Guided Disambiguation for Large-Scale Scene Classification with Multi-Resolution CNNs

Title Knowledge Guided Disambiguation for Large-Scale Scene Classification with Multi-Resolution CNNs
Authors Limin Wang, Sheng Guo, Weilin Huang, Yuanjun Xiong, Yu Qiao
Abstract Convolutional Neural Networks (CNNs) have made remarkable progress on scene recognition, partially due to recent large-scale scene datasets such as Places and Places2. Scene categories are often defined by multi-level information, including local objects, global layout, and background environment, thus leading to large intra-class variations. In addition, with the increasing number of scene categories, label ambiguity has become another crucial issue in large-scale classification. This paper focuses on large-scale scene recognition and makes two major contributions to tackle these issues. First, we propose a multi-resolution CNN architecture that captures visual content and structure at multiple levels. The multi-resolution CNNs are composed of coarse-resolution CNNs and fine-resolution CNNs, which are complementary to each other. Second, we design two knowledge-guided disambiguation techniques to deal with the problem of label ambiguity. (i) We exploit the knowledge from the confusion matrix computed on validation data to merge ambiguous classes into a super category. (ii) We utilize the knowledge of extra networks to produce a soft label for each image. The super categories or soft labels are then employed to guide CNN training on Places2. We conduct extensive experiments on three large-scale image datasets (ImageNet, Places, and Places2), demonstrating the effectiveness of our approach. Furthermore, our method took part in two major scene recognition challenges, achieving second place in the Places2 challenge at ILSVRC 2015 and first place in the LSUN challenge at CVPR 2016. Finally, we directly test the learned representations on other scene benchmarks and obtain new state-of-the-art results on MIT Indoor67 (86.7%) and SUN397 (72.0%). We release the code and models at https://github.com/wanglimin/MRCNN-Scene-Recognition.
Tasks Scene Classification, Scene Recognition
Published 2016-10-04
URL http://arxiv.org/abs/1610.01119v2
PDF http://arxiv.org/pdf/1610.01119v2.pdf
PWC https://paperswithcode.com/paper/knowledge-guided-disambiguation-for-large
Repo https://github.com/wanglimin/MRCNN-Scene-Recognition
Framework none
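
The second disambiguation technique, soft labels from an extra network, amounts to mixing the usual hard-label cross-entropy with a soft-target term, much like knowledge distillation. A hedged PyTorch sketch (the mixing weight, temperature, and class count are illustrative, not the paper's settings):

```python
import torch
import torch.nn.functional as F

def knowledge_guided_loss(student_logits, hard_labels, teacher_logits,
                          alpha=0.5, temperature=2.0):
    """Combine the usual hard-label loss with a soft-label term whose targets
    come from an extra (teacher) network. alpha and temperature are
    illustrative choices, not values from the paper."""
    hard = F.cross_entropy(student_logits, hard_labels)
    soft_targets = F.softmax(teacher_logits / temperature, dim=1)
    log_probs = F.log_softmax(student_logits / temperature, dim=1)
    soft = F.kl_div(log_probs, soft_targets, reduction='batchmean')
    return (1 - alpha) * hard + alpha * soft

num_classes = 400                         # illustrative class count
logits_s = torch.randn(8, num_classes)    # outputs of the network being trained
logits_t = torch.randn(8, num_classes)    # outputs of the extra (soft-label) network
labels = torch.randint(0, num_classes, (8,))
print(knowledge_guided_loss(logits_s, labels, logits_t))
```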