Paper Group AWR 29
Online Learning of Event Definitions. Neural Summarization by Extracting Sentences and Words. Image Captioning with Deep Bidirectional LSTMs. Trained Ternary Quantization. View Synthesis by Appearance Flow. Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization. Action Recognition with Dynamic Image Networks. Regularizing …
Online Learning of Event Definitions
Title | Online Learning of Event Definitions |
Authors | Nikos Katzouris, Alexander Artikis, Georgios Paliouras |
Abstract | Systems for symbolic event recognition infer occurrences of events in time using a set of event definitions in the form of first-order rules. The Event Calculus is a temporal logic that has been used as a basis in event recognition applications, providing, among others, direct connections to machine learning via Inductive Logic Programming (ILP). We present an ILP system for online learning of Event Calculus theories. To allow for a single-pass learning strategy, we use the Hoeffding bound for evaluating clauses on a subset of the input stream. We employ a scheme that decouples the Event Calculus axioms during the learning process, allowing each clause to be learned in isolation. Moreover, we use abductive-inductive logic programming techniques to handle unobserved target predicates. We evaluate our approach on an activity recognition application and compare it to a number of batch learning techniques. We obtain results of comparable predictive accuracy with significant speed-ups in training time. We also outperform hand-crafted rules and match the performance of a sound incremental learner that can only operate on noise-free datasets. This paper is under consideration for acceptance in TPLP. |
Tasks | Activity Recognition |
Published | 2016-07-30 |
URL | http://arxiv.org/abs/1608.00100v1 |
http://arxiv.org/pdf/1608.00100v1.pdf | |
PWC | https://paperswithcode.com/paper/online-learning-of-event-definitions |
Repo | https://github.com/nkatzz/OLED |
Framework | none |
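The clause-selection criterion mentioned in the abstract rests on the Hoeffding bound. As a rough illustration (not the OLED implementation, which operates on Event Calculus clauses in a logic-programming setting), the sketch below shows how such a bound decides when enough stream examples have been seen to commit to the best-scoring clause refinement; the function names, score range, and default delta are illustrative assumptions.

```python
import math

def hoeffding_bound(value_range: float, n: int, delta: float) -> float:
    """Hoeffding bound epsilon for a mean estimated from n observations
    of a quantity bounded in an interval of length `value_range`."""
    return math.sqrt((value_range ** 2) * math.log(1.0 / delta) / (2.0 * n))

def best_clause_is_reliable(score_best: float, score_second: float,
                            n: int, delta: float = 1e-4,
                            value_range: float = 1.0) -> bool:
    """Accept the current best refinement once the observed score gap exceeds
    the Hoeffding bound, i.e. with probability 1 - delta the best refinement
    on the n examples seen so far is also the best on the full stream."""
    return (score_best - score_second) > hoeffding_bound(value_range, n, delta)

# Example: after 2000 stream examples a gap of 0.08 is large enough.
print(best_clause_is_reliable(0.71, 0.63, n=2000))
```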
Neural Summarization by Extracting Sentences and Words
Title | Neural Summarization by Extracting Sentences and Words |
Authors | Jianpeng Cheng, Mirella Lapata |
Abstract | Traditional approaches to extractive summarization rely heavily on human-engineered features. In this work we propose a data-driven approach based on neural networks and continuous sentence features. We develop a general framework for single-document summarization composed of a hierarchical document encoder and an attention-based extractor. This architecture allows us to develop different classes of summarization models which can extract sentences or words. We train our models on large scale corpora containing hundreds of thousands of document-summary pairs. Experimental results on two summarization datasets demonstrate that our models obtain results comparable to the state of the art without any access to linguistic annotation. |
Tasks | Document Summarization |
Published | 2016-03-23 |
URL | http://arxiv.org/abs/1603.07252v3 |
http://arxiv.org/pdf/1603.07252v3.pdf | |
PWC | https://paperswithcode.com/paper/neural-summarization-by-extracting-sentences |
Repo | https://github.com/kedz/nnsum |
Framework | pytorch |
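A minimal sketch of the sentence-extraction side of such a model, assuming mean-pooled word embeddings as the sentence encoder and a single document-level LSTM; the paper's model uses a richer hierarchical encoder and an attention-based extractor, and all layer sizes here are arbitrary.

```python
import torch
import torch.nn as nn

class SentenceExtractor(nn.Module):
    """Toy hierarchical extractor: encode each sentence, run an LSTM over the
    sentence sequence, and score each sentence for extraction."""
    def __init__(self, vocab_size=10000, emb_dim=128, hid_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
        self.doc_rnn = nn.LSTM(emb_dim, hid_dim, batch_first=True)
        self.score = nn.Linear(hid_dim, 1)

    def forward(self, docs):
        # docs: (batch, n_sents, n_words) word-id tensor; 0 is padding.
        emb = self.embed(docs)                      # (B, S, W, E)
        sent_vecs = emb.mean(dim=2)                 # average-pool words per sentence
        hidden, _ = self.doc_rnn(sent_vecs)         # (B, S, H)
        return torch.sigmoid(self.score(hidden)).squeeze(-1)  # extraction probs

docs = torch.randint(1, 10000, (2, 5, 12))   # 2 docs, 5 sentences, 12 words each
probs = SentenceExtractor()(docs)             # (2, 5) per-sentence probabilities
print(probs.shape)
```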
Image Captioning with Deep Bidirectional LSTMs
Title | Image Captioning with Deep Bidirectional LSTMs |
Authors | Cheng Wang, Haojin Yang, Christian Bartz, Christoph Meinel |
Abstract | This work presents an end-to-end trainable deep bidirectional LSTM (Long Short-Term Memory) model for image captioning. Our model builds on a deep convolutional neural network (CNN) and two separate LSTM networks. It is capable of learning long-term visual-language interactions by making use of history and future context information in a high-level semantic space. Two novel deep bidirectional variant models, in which we increase the depth of the nonlinearity transition in different ways, are proposed to learn hierarchical visual-language embeddings. Data augmentation techniques such as multi-crop, multi-scale and vertical mirror are proposed to prevent overfitting when training deep models. We visualize the evolution of bidirectional LSTM internal states over time and qualitatively analyze how our models “translate” image to sentence. Our proposed models are evaluated on caption generation and image-sentence retrieval tasks with three benchmark datasets: Flickr8K, Flickr30K and MSCOCO. We demonstrate that bidirectional LSTM models achieve highly competitive performance compared to state-of-the-art results on caption generation, even without integrating additional mechanisms (e.g. object detection, attention models), and significantly outperform recent methods on the retrieval task. |
Tasks | Data Augmentation, Image Captioning, Object Detection |
Published | 2016-04-04 |
URL | http://arxiv.org/abs/1604.00790v3 |
http://arxiv.org/pdf/1604.00790v3.pdf | |
PWC | https://paperswithcode.com/paper/image-captioning-with-deep-bidirectional |
Repo | https://github.com/deepsemantic/image_captioning |
Framework | none |
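As a rough, hypothetical sketch of the image-sentence retrieval setting evaluated in the paper, the snippet below scores image-caption pairs with a bidirectional LSTM text encoder and a projected CNN feature; this is not the paper's architecture (which couples two LSTMs with a CNN end-to-end), and all dimensions are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BiLSTMCaptionScorer(nn.Module):
    """Toy image-sentence matching: a bidirectional LSTM encodes the caption,
    a linear layer projects a precomputed CNN image feature, and the cosine
    similarity of the two embeddings scores the pair."""
    def __init__(self, vocab_size=10000, emb_dim=128, hid_dim=256, img_dim=2048):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.bilstm = nn.LSTM(emb_dim, hid_dim, batch_first=True, bidirectional=True)
        self.img_proj = nn.Linear(img_dim, 2 * hid_dim)

    def forward(self, captions, img_feats):
        txt, _ = self.bilstm(self.embed(captions))   # (B, T, 2H)
        txt = txt.mean(dim=1)                        # pool over time
        img = self.img_proj(img_feats)               # (B, 2H)
        return F.cosine_similarity(txt, img)         # (B,) matching scores

scores = BiLSTMCaptionScorer()(torch.randint(0, 10000, (4, 15)), torch.randn(4, 2048))
print(scores)
```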
Trained Ternary Quantization
Title | Trained Ternary Quantization |
Authors | Chenzhuo Zhu, Song Han, Huizi Mao, William J. Dally |
Abstract | Deep neural networks are widely used in machine learning applications. However, large neural network models can be difficult to deploy on mobile devices with limited power budgets. To solve this problem, we propose Trained Ternary Quantization (TTQ), a method that can reduce the precision of weights in neural networks to ternary values. This method has very little accuracy degradation and can even improve the accuracy of some models (32-, 44-, and 56-layer ResNet) on CIFAR-10 and AlexNet on ImageNet. Our AlexNet model is trained from scratch, which means it is as easy to train as a normal full-precision model. We highlight our trained quantization method, which can learn both the ternary values and the ternary assignment. During inference, only ternary values (2-bit weights) and scaling factors are needed, therefore our models are nearly 16x smaller than full-precision models. Our ternary models can also be viewed as sparse binary weight networks, which can potentially be accelerated with custom circuits. Experiments on CIFAR-10 show that the ternary models obtained by our trained quantization method outperform full-precision ResNet-32, 44, and 56 models by 0.04%, 0.16%, and 0.36%, respectively. On ImageNet, our model outperforms the full-precision AlexNet model by 0.3% Top-1 accuracy and outperforms previous ternary models by 3%. |
Tasks | Quantization |
Published | 2016-12-04 |
URL | http://arxiv.org/abs/1612.01064v3 |
http://arxiv.org/pdf/1612.01064v3.pdf | |
PWC | https://paperswithcode.com/paper/trained-ternary-quantization |
Repo | https://github.com/vinsis/ternary-quantization |
Framework | pytorch |
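The core quantization step is easy to sketch: weights whose magnitude falls below a threshold become zero, and the rest snap to a positive or negative scale (learned during training in TTQ, plain arguments here). A minimal PyTorch illustration with an illustrative threshold factor:

```python
import torch

def ternarize(weights: torch.Tensor, threshold_factor: float = 0.05,
              w_pos: float = 1.0, w_neg: float = 1.0) -> torch.Tensor:
    """Quantize a float weight tensor to {-w_neg, 0, +w_pos}. The threshold is
    a fraction of the largest absolute weight; in TTQ the positive/negative
    scales are learned jointly with the network, here they are fixed inputs."""
    t = threshold_factor * weights.abs().max()
    ternary = torch.zeros_like(weights)
    ternary[weights > t] = w_pos
    ternary[weights < -t] = -w_neg
    return ternary

w = torch.randn(64, 64)
wq = ternarize(w, threshold_factor=0.05, w_pos=0.8, w_neg=1.1)
print(torch.unique(wq))       # tensor([-1.1000, 0.0000, 0.8000])
```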
View Synthesis by Appearance Flow
Title | View Synthesis by Appearance Flow |
Authors | Tinghui Zhou, Shubham Tulsiani, Weilun Sun, Jitendra Malik, Alexei A. Efros |
Abstract | We address the problem of novel view synthesis: given an input image, synthesizing new images of the same object or scene observed from arbitrary viewpoints. We approach this as a learning task but, critically, instead of learning to synthesize pixels from scratch, we learn to copy them from the input image. Our approach exploits the observation that the visual appearance of different views of the same instance is highly correlated, and such correlation could be explicitly learned by training a convolutional neural network (CNN) to predict appearance flows – 2-D coordinate vectors specifying which pixels in the input view could be used to reconstruct the target view. Furthermore, the proposed framework easily generalizes to multiple input views by learning how to optimally combine single-view predictions. We show that for both objects and scenes, our approach is able to synthesize novel views of higher perceptual quality than previous CNN-based techniques. |
Tasks | Novel View Synthesis |
Published | 2016-05-11 |
URL | http://arxiv.org/abs/1605.03557v3 |
http://arxiv.org/pdf/1605.03557v3.pdf | |
PWC | https://paperswithcode.com/paper/view-synthesis-by-appearance-flow |
Repo | https://github.com/RenYurui/Global-Flow-Local-Attention |
Framework | pytorch |
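The "copy pixels rather than synthesize them" idea maps directly onto a bilinear sampling operation. A sketch assuming the appearance flow is already given (in the paper it is predicted by a CNN) and a reasonably recent PyTorch (for the `meshgrid` indexing argument):

```python
import torch
import torch.nn.functional as F

def warp_by_appearance_flow(source: torch.Tensor, flow: torch.Tensor) -> torch.Tensor:
    """Synthesize a target view by sampling source pixels at predicted 2-D
    coordinates. `source` is (B, C, H, W); `flow` is (B, H, W, 2) with x, y
    sampling locations normalized to [-1, 1], as grid_sample expects."""
    return F.grid_sample(source, flow, mode='bilinear', align_corners=True)

# Sanity check: an identity flow reproduces the source image.
B, C, H, W = 1, 3, 32, 32
ys, xs = torch.meshgrid(torch.linspace(-1, 1, H), torch.linspace(-1, 1, W), indexing='ij')
identity = torch.stack([xs, ys], dim=-1).unsqueeze(0)      # (1, H, W, 2)
src = torch.rand(B, C, H, W)
out = warp_by_appearance_flow(src, identity)
print(torch.allclose(out, src, atol=1e-5))
```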
Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization
Title | Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization |
Authors | Ramprasaath R. Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, Dhruv Batra |
Abstract | We propose a technique for producing “visual explanations” for decisions from a large class of CNN-based models, making them more transparent. Our approach, Gradient-weighted Class Activation Mapping (Grad-CAM), uses the gradients of any target concept flowing into the final convolutional layer to produce a coarse localization map highlighting important regions in the image for predicting the concept. Grad-CAM is applicable to a wide variety of CNN model families: (1) CNNs with fully-connected layers, (2) CNNs used for structured outputs, (3) CNNs used in tasks with multimodal inputs or reinforcement learning, without any architectural changes or re-training. We combine Grad-CAM with fine-grained visualizations to create a high-resolution class-discriminative visualization and apply it to off-the-shelf image classification, captioning, and visual question answering (VQA) models, including ResNet-based architectures. In the context of image classification models, our visualizations (a) lend insights into their failure modes, (b) are robust to adversarial images, (c) outperform previous methods on localization, (d) are more faithful to the underlying model and (e) help achieve generalization by identifying dataset bias. For captioning and VQA, we show that even non-attention based models can localize inputs. We devise a way to identify important neurons through Grad-CAM and combine it with neuron names to provide textual explanations for model decisions. Finally, we design and conduct human studies to measure if Grad-CAM helps users establish appropriate trust in predictions from models and show that Grad-CAM helps untrained users successfully discern a ‘stronger’ model from a ‘weaker’ one even when both make identical predictions. Our code is available at https://github.com/ramprs/grad-cam/, along with a demo at http://gradcam.cloudcv.org, and a video at youtu.be/COjUB9Izk6E. |
Tasks | Image Classification, Interpretable Machine Learning, Visual Question Answering |
Published | 2016-10-07 |
URL | https://arxiv.org/abs/1610.02391v4 |
https://arxiv.org/pdf/1610.02391v4.pdf | |
PWC | https://paperswithcode.com/paper/grad-cam-visual-explanations-from-deep |
Repo | https://github.com/Murali81/Grad-CAM |
Framework | none |
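The Grad-CAM computation itself is compact: back-propagate the class score to the last convolutional feature maps, average the gradients per channel to obtain weights, and take a ReLU of the weighted sum of the maps. A PyTorch sketch using forward/backward hooks; the torchvision ResNet-18 and the choice of `layer4[-1]` are only for illustration, and the `weights=None` argument assumes a recent torchvision.

```python
import torch
import torch.nn.functional as F
from torchvision import models

def grad_cam(model, conv_layer, image, class_idx):
    """Grad-CAM: gradients of the class score w.r.t. the last conv feature maps
    are global-average-pooled into channel weights, the weighted sum of the
    maps goes through ReLU, and the result is upsampled to the input size."""
    acts, grads = {}, {}
    h1 = conv_layer.register_forward_hook(lambda m, i, o: acts.update(a=o))
    h2 = conv_layer.register_full_backward_hook(lambda m, gi, go: grads.update(g=go[0]))
    score = model(image)[0, class_idx]
    model.zero_grad()
    score.backward()
    h1.remove(); h2.remove()
    weights = grads['g'].mean(dim=(2, 3), keepdim=True)           # (1, C, 1, 1)
    cam = F.relu((weights * acts['a']).sum(dim=1, keepdim=True))  # (1, 1, h, w)
    cam = F.interpolate(cam, size=image.shape[2:], mode='bilinear', align_corners=False)
    return cam / (cam.max() + 1e-8)

model = models.resnet18(weights=None).eval()
cam = grad_cam(model, model.layer4[-1], torch.rand(1, 3, 224, 224), class_idx=0)
print(cam.shape)   # torch.Size([1, 1, 224, 224])
```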
Action Recognition with Dynamic Image Networks
Title | Action Recognition with Dynamic Image Networks |
Authors | Hakan Bilen, Basura Fernando, Efstratios Gavves, Andrea Vedaldi |
Abstract | We introduce the concept of a “dynamic image”, a novel compact representation of videos useful for video analysis, particularly in combination with convolutional neural networks (CNNs). A dynamic image encodes temporal data such as RGB or optical flow videos by using the concept of “rank pooling”. The idea is to learn a ranking machine that captures the temporal evolution of the data and to use the parameters of the latter as a representation. When a linear ranking machine is used, the resulting representation is in the form of an image, which we call dynamic because it summarizes the video dynamics in addition to appearance. This is a powerful idea because it allows any video to be converted into an image, so that existing CNN models pre-trained for the analysis of still images can be immediately extended to videos. We also present an efficient and effective approximate rank pooling operator, accelerating standard rank pooling algorithms by orders of magnitude, and formulate it as a CNN layer. This new layer allows generalizing dynamic images to dynamic feature maps. We demonstrate the power of the new representations on standard benchmarks in action recognition, achieving state-of-the-art performance. |
Tasks | Optical Flow Estimation, Temporal Action Localization |
Published | 2016-12-02 |
URL | http://arxiv.org/abs/1612.00738v2 |
http://arxiv.org/pdf/1612.00738v2.pdf | |
PWC | https://paperswithcode.com/paper/action-recognition-with-dynamic-image |
Repo | https://github.com/hbilen/dynamic-image-nets |
Framework | none |
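A hedged sketch of approximate rank pooling in NumPy, using one simple form of the weighting (alpha_t = 2t - T - 1 applied to running frame averages); the paper derives the exact coefficients from the rank-pooling objective, so treat this only as an illustration of the "collapse a clip into one image" idea.

```python
import numpy as np

def dynamic_image(frames: np.ndarray) -> np.ndarray:
    """Collapse a video (T, H, W, C) into a single 'dynamic image' by weighting
    the running time-averages of the frames with alpha_t = 2t - T - 1 and
    summing. The result can be fed to any still-image CNN."""
    T = frames.shape[0]
    running_mean = np.cumsum(frames.astype(np.float64), axis=0) / \
        np.arange(1, T + 1).reshape(-1, 1, 1, 1)
    alphas = 2.0 * np.arange(1, T + 1) - T - 1            # t = 1..T
    dyn = np.tensordot(alphas, running_mean, axes=(0, 0))
    # Rescale to [0, 255] for visualization / CNN input.
    dyn = 255.0 * (dyn - dyn.min()) / (dyn.max() - dyn.min() + 1e-8)
    return dyn.astype(np.uint8)

video = np.random.randint(0, 256, size=(16, 64, 64, 3), dtype=np.uint8)
print(dynamic_image(video).shape)    # (64, 64, 3)
```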
Regularizing CNNs with Locally Constrained Decorrelations
Title | Regularizing CNNs with Locally Constrained Decorrelations |
Authors | Pau Rodríguez, Jordi Gonzàlez, Guillem Cucurull, Josep M. Gonfaus, Xavier Roca |
Abstract | Regularization is key for deep learning since it allows training more complex models while keeping overfitting lower. However, the most prevalent regularizations do not leverage all the capacity of the models since they rely on reducing the effective number of parameters. Feature decorrelation is an alternative that uses the full capacity of the models, but its overfitting-reduction margins are too narrow given the overhead it introduces. In this paper, we show that regularizing negatively correlated features is an obstacle for effective decorrelation and present OrthoReg, a novel regularization technique that locally enforces feature orthogonality. As a result, imposing locality constraints in feature decorrelation removes interferences between negatively correlated feature weights, allowing the regularizer to reach higher decorrelation bounds and reducing overfitting more effectively. In particular, we show that models regularized with OrthoReg have higher accuracy bounds even when batch normalization and dropout are present. Moreover, since our regularization is directly performed on the weights, it is especially suitable for fully convolutional neural networks, where the weight space is constant compared to the feature map space. As a result, we are able to reduce the overfitting of state-of-the-art CNNs on CIFAR-10, CIFAR-100, and SVHN. |
Tasks | |
Published | 2016-11-07 |
URL | http://arxiv.org/abs/1611.01967v2 |
http://arxiv.org/pdf/1611.01967v2.pdf | |
PWC | https://paperswithcode.com/paper/regularizing-cnns-with-locally-constrained |
Repo | https://github.com/prlz77/orthoreg |
Framework | none |
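A simplified stand-in for the OrthoReg idea: measure pairwise cosine similarity between a layer's filters and penalize only the positively correlated pairs, leaving negatively correlated filters alone. The exact squashing function and schedule in the paper differ, and the weighting constant below is arbitrary.

```python
import torch
import torch.nn as nn

def local_decorrelation_penalty(weight: torch.Tensor) -> torch.Tensor:
    """Simplified OrthoReg-style penalty: pairwise cosine similarities between
    a layer's filters, keeping only the positive (correlated) ones."""
    w = weight.flatten(1)                                  # (out_channels, fan_in)
    w = w / (w.norm(dim=1, keepdim=True) + 1e-8)
    cos = w @ w.t()                                        # pairwise cosines
    off_diag = cos - torch.eye(cos.size(0), device=cos.device)
    return torch.clamp(off_diag, min=0.0).pow(2).sum()     # only positive correlations

layer = nn.Conv2d(3, 64, kernel_size=3)
reg = 1e-2 * local_decorrelation_penalty(layer.weight)
print(reg)   # add this term to the task loss during training
```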
DeepMind Lab
Title | DeepMind Lab |
Authors | Charles Beattie, Joel Z. Leibo, Denis Teplyashin, Tom Ward, Marcus Wainwright, Heinrich Küttler, Andrew Lefrancq, Simon Green, Víctor Valdés, Amir Sadik, Julian Schrittwieser, Keith Anderson, Sarah York, Max Cant, Adam Cain, Adrian Bolton, Stephen Gaffney, Helen King, Demis Hassabis, Shane Legg, Stig Petersen |
Abstract | DeepMind Lab is a first-person 3D game platform designed for research and development of general artificial intelligence and machine learning systems. DeepMind Lab can be used to study how autonomous artificial agents may learn complex tasks in large, partially observed, and visually diverse worlds. DeepMind Lab has a simple and flexible API enabling creative task-designs and novel AI-designs to be explored and quickly iterated upon. It is powered by a fast and widely recognised game engine, and tailored for effective use by the research community. |
Tasks | |
Published | 2016-12-12 |
URL | http://arxiv.org/abs/1612.03801v2 |
http://arxiv.org/pdf/1612.03801v2.pdf | |
PWC | https://paperswithcode.com/paper/deepmind-lab |
Repo | https://github.com/deepmind/lab |
Framework | none |
Structured and Efficient Variational Deep Learning with Matrix Gaussian Posteriors
Title | Structured and Efficient Variational Deep Learning with Matrix Gaussian Posteriors |
Authors | Christos Louizos, Max Welling |
Abstract | We introduce a variational Bayesian neural network where the parameters are governed via a probability distribution on random matrices. Specifically, we employ a matrix variate Gaussian \cite{gupta1999matrix} parameter posterior distribution where we explicitly model the covariance among the input and output dimensions of each layer. Furthermore, with approximate covariance matrices we can achieve a more efficient way to represent those correlations that is also cheaper than fully factorized parameter posteriors. We further show that with the “local reparametrization trick” \cite{kingma2015variational} on this posterior distribution we arrive at a Gaussian process \cite{rasmussen2006gaussian} interpretation of the hidden units in each layer and, similarly to \cite{gal2015dropout}, we provide connections with deep Gaussian processes. We then take advantage of this duality and incorporate “pseudo-data” \cite{snelson2005sparse} in our model, which in turn allows for more efficient sampling while maintaining the properties of the original model. The validity of the proposed approach is verified through extensive experiments. |
Tasks | Gaussian Processes |
Published | 2016-03-15 |
URL | http://arxiv.org/abs/1603.04733v5 |
http://arxiv.org/pdf/1603.04733v5.pdf | |
PWC | https://paperswithcode.com/paper/structured-and-efficient-variational-deep |
Repo | https://github.com/AMLab-Amsterdam/SEVDL_MGP |
Framework | none |
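Sampling from a matrix variate Gaussian is the basic building block of such a posterior. A small NumPy sketch using Cholesky factors of illustrative row and column covariances (the variational machinery around it is omitted):

```python
import numpy as np

def sample_matrix_gaussian(M, U, V, rng=None):
    """Draw one sample from a matrix variate Gaussian MN(M, U, V):
    W = M + A E B^T, with E ~ N(0, I), U = A A^T the row (input) covariance
    and V = B B^T the column (output) covariance."""
    rng = np.random.default_rng(rng)
    A = np.linalg.cholesky(U)
    B = np.linalg.cholesky(V)
    E = rng.standard_normal(M.shape)
    return M + A @ E @ B.T

n_in, n_out = 5, 3
M = np.zeros((n_in, n_out))
U = np.eye(n_in) * 0.1            # input-side (row) covariance, illustrative
V = np.eye(n_out) * 0.5           # output-side (column) covariance, illustrative
W = sample_matrix_gaussian(M, U, V, rng=0)
print(W.shape)                     # (5, 3)
```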
Stealing Machine Learning Models via Prediction APIs
Title | Stealing Machine Learning Models via Prediction APIs |
Authors | Florian Tramèr, Fan Zhang, Ari Juels, Michael K. Reiter, Thomas Ristenpart |
Abstract | Machine learning (ML) models may be deemed confidential due to their sensitive training data, commercial value, or use in security applications. Increasingly often, confidential ML models are being deployed with publicly accessible query interfaces. ML-as-a-service (“predictive analytics”) systems are an example: Some allow users to train models on potentially sensitive data and charge others for access on a pay-per-query basis. The tension between model confidentiality and public access motivates our investigation of model extraction attacks. In such attacks, an adversary with black-box access, but no prior knowledge of an ML model’s parameters or training data, aims to duplicate the functionality of (i.e., “steal”) the model. Unlike in classical learning theory settings, ML-as-a-service offerings may accept partial feature vectors as inputs and include confidence values with predictions. Given these practices, we show simple, efficient attacks that extract target ML models with near-perfect fidelity for popular model classes including logistic regression, neural networks, and decision trees. We demonstrate these attacks against the online services of BigML and Amazon Machine Learning. We further show that the natural countermeasure of omitting confidence values from model outputs still admits potentially harmful model extraction attacks. Our results highlight the need for careful ML model deployment and new model extraction countermeasures. |
Tasks | |
Published | 2016-09-09 |
URL | http://arxiv.org/abs/1609.02943v2 |
http://arxiv.org/pdf/1609.02943v2.pdf | |
PWC | https://paperswithcode.com/paper/stealing-machine-learning-models-via |
Repo | https://github.com/ftramer/Steal-ML |
Framework | none |
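For models that expose confidence scores, the equation-solving attack on logistic regression is only a few lines: each query yields one linear equation in the unknown weights, so roughly d + 1 queries recover the model exactly. A self-contained NumPy illustration against a toy victim model (not the paper's code):

```python
import numpy as np

def steal_logistic_regression(query, n_features, rng=None):
    """Equation-solving extraction of a logistic regression model that returns
    confidence scores: each query gives logit(p) = w . x + b, so ~(d + 1)
    random queries determine (w, b)."""
    rng = np.random.default_rng(rng)
    X = rng.standard_normal((n_features + 1, n_features))
    logits = np.array([np.log(p / (1 - p)) for p in (query(x) for x in X)])
    A = np.hstack([X, np.ones((X.shape[0], 1))])        # unknowns: [w, b]
    sol, *_ = np.linalg.lstsq(A, logits, rcond=None)
    return sol[:-1], sol[-1]

# Victim model (unknown to the attacker except through `query`).
true_w, true_b = np.array([1.5, -2.0, 0.3]), 0.7
query = lambda x: 1.0 / (1.0 + np.exp(-(true_w @ x + true_b)))

w_hat, b_hat = steal_logistic_regression(query, n_features=3, rng=42)
print(np.allclose(w_hat, true_w), np.isclose(b_hat, true_b))   # True True
```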
From Node Embedding To Community Embedding
Title | From Node Embedding To Community Embedding |
Authors | Vincent W. Zheng, Sandro Cavallari, Hongyun Cai, Kevin Chen-Chuan Chang, Erik Cambria |
Abstract | Most of the existing graph embedding methods focus on nodes: they aim to output a vector representation for each node in the graph such that two nodes that are “close” on the graph are also close in the low-dimensional space. Despite the success of embedding individual nodes for graph analytics, we notice that an important concept of embedding communities (i.e., groups of nodes) is missing. Embedding communities is useful, not only for supporting various community-level applications, but also for helping to preserve community structure in graph embedding. In fact, we see community embedding as providing a higher-order proximity to define node closeness, whereas most of the popular graph embedding methods focus on first-order and/or second-order proximities. To learn the community embedding, we hinge upon the insight that community embedding and node embedding reinforce each other. As a result, we propose ComEmbed, the first community embedding method, which jointly optimizes the community embedding and the node embedding. We evaluate ComEmbed on real-world data sets and show that it outperforms the state-of-the-art baselines in both node classification and community prediction. |
Tasks | Graph Embedding, Node Classification |
Published | 2016-10-31 |
URL | http://arxiv.org/abs/1610.09950v2 |
http://arxiv.org/pdf/1610.09950v2.pdf | |
PWC | https://paperswithcode.com/paper/from-node-embedding-to-community-embedding |
Repo | https://github.com/vwz/topolstm |
Framework | none |
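A hedged illustration of the "communities as distributions over the embedding space" view: fit a Gaussian mixture to fixed node embeddings and read off the component means as community embeddings. The actual method optimizes node and community embeddings jointly rather than in this one-shot fashion, and the synthetic embeddings below are purely illustrative.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Stand-in node embeddings; in the paper these come from (and are refined by)
# the joint node/community objective rather than being fixed up front.
rng = np.random.default_rng(0)
node_emb = np.vstack([rng.normal(loc, 0.3, size=(50, 2))
                      for loc in ([0, 0], [3, 3], [0, 4])])

# Each community is represented by a Gaussian over the node embedding space:
# its mean acts as the "community embedding", its covariance as the spread.
gmm = GaussianMixture(n_components=3, random_state=0).fit(node_emb)
community_embeddings = gmm.means_            # (3, 2)
memberships = gmm.predict_proba(node_emb)    # soft community assignments
print(community_embeddings.shape, memberships.shape)
```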
Cost-Sensitive Label Embedding for Multi-Label Classification
Title | Cost-Sensitive Label Embedding for Multi-Label Classification |
Authors | Kuan-Hao Huang, Hsuan-Tien Lin |
Abstract | Label embedding (LE) is an important family of multi-label classification algorithms that digest the label information jointly for better performance. Different real-world applications evaluate performance by different cost functions of interest. Current LE algorithms often aim to optimize one specific cost function, but they can suffer from bad performance with respect to other cost functions. In this paper, we resolve the performance issue by proposing a novel cost-sensitive LE algorithm that takes the cost function of interest into account. The proposed algorithm, cost-sensitive label embedding with multidimensional scaling (CLEMS), approximates the cost information with the distances of the embedded vectors by using the classic multidimensional scaling approach for manifold learning. CLEMS is able to deal with both symmetric and asymmetric cost functions, and effectively makes cost-sensitive decisions by nearest-neighbor decoding within the embedded vectors. We derive theoretical results that justify how CLEMS achieves the desired cost-sensitivity. Furthermore, extensive experimental results demonstrate that CLEMS is significantly better than a wide spectrum of existing LE algorithms and state-of-the-art cost-sensitive algorithms across different cost functions. |
Tasks | Multi-Label Classification |
Published | 2016-03-30 |
URL | http://arxiv.org/abs/1603.09048v5 |
http://arxiv.org/pdf/1603.09048v5.pdf | |
PWC | https://paperswithcode.com/paper/cost-sensitive-label-embedding-for-multi |
Repo | https://github.com/ej0cl6/csmlc |
Framework | none |
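A toy sketch of the two geometric ingredients described in the abstract, multidimensional scaling of a label-cost matrix and nearest-neighbor decoding, using scikit-learn; the label set, the Hamming cost, and the faked prediction are all illustrative, and the feature-to-embedding regressor used in CLEMS is omitted.

```python
import numpy as np
from sklearn.manifold import MDS

# Toy setup: 4 distinct label vectors and a symmetric cost between them
# (here Hamming distance). The labels are embedded so that embedded
# distances mimic these costs; predictions are decoded by nearest neighbor.
label_vectors = np.array([[0, 0, 1], [0, 1, 1], [1, 0, 0], [1, 1, 0]])
cost = np.array([[np.sum(a != b) for b in label_vectors] for a in label_vectors],
                dtype=float)

mds = MDS(n_components=2, dissimilarity='precomputed', random_state=0)
label_emb = mds.fit_transform(cost)          # one embedded point per label vector

# A regressor would normally map features -> embedded space; here we fake one
# prediction near the second label vector and decode it by nearest neighbor.
predicted_point = label_emb[1] + 0.05
nearest = np.argmin(np.linalg.norm(label_emb - predicted_point, axis=1))
print(label_vectors[nearest])                # [0 1 1]
```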
The Little Engine that Could: Regularization by Denoising (RED)
Title | The Little Engine that Could: Regularization by Denoising (RED) |
Authors | Yaniv Romano, Michael Elad, Peyman Milanfar |
Abstract | Removal of noise from an image is an extensively studied problem in image processing. Indeed, the recent advent of sophisticated and highly effective denoising algorithms has led some to believe that existing methods are touching the ceiling in terms of noise removal performance. Can we leverage this impressive achievement to treat other tasks in image processing? Recent work has answered this question positively, in the form of the Plug-and-Play Prior ($P^3$) method, showing that any inverse problem can be handled by sequentially applying image denoising steps. This relies heavily on the ADMM optimization technique in order to obtain this chained denoising interpretation. Is this the only way in which tasks in image processing can exploit the image denoising engine? In this paper we provide an alternative, more powerful and more flexible framework for achieving the same goal. As opposed to the $P^3$ method, we offer Regularization by Denoising (RED): using the denoising engine to define the regularization of the inverse problem. We propose an explicit image-adaptive Laplacian-based regularization functional, making the overall objective functional clearer and better defined. With complete flexibility to choose the iterative optimization procedure for minimizing this functional, RED can incorporate any image denoising algorithm, treats general inverse problems very effectively, and is guaranteed to converge to the globally optimal result. We test this approach and demonstrate state-of-the-art results in the image deblurring and super-resolution problems. |
Tasks | Deblurring, Denoising, Image Denoising, Super-Resolution |
Published | 2016-11-09 |
URL | http://arxiv.org/abs/1611.02862v3 |
http://arxiv.org/pdf/1611.02862v3.pdf | |
PWC | https://paperswithcode.com/paper/the-little-engine-that-could-regularization |
Repo | https://github.com/happyhongt/Acceleration-of-RED-via-Vector-Extrapolation |
Framework | none |
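Under the RED assumptions the regularizer's gradient is simply x - f(x) for a denoiser f, so a steepest-descent solver needs one line per iteration. A sketch for the plainest inverse problem (denoising, identity forward operator) with a median filter standing in for a sophisticated denoising engine; the regularization weight, step size, and iteration count are arbitrary choices, not values from the paper.

```python
import numpy as np
from scipy.ndimage import median_filter

def red_denoise(y, denoiser, lam=0.2, step=0.2, n_iters=50):
    """Steepest-descent RED for denoising (H = identity): the data-term
    gradient is (x - y) and, under the RED conditions, the regularizer's
    gradient is lam * (x - denoiser(x))."""
    x = y.copy()
    for _ in range(n_iters):
        grad = (x - y) + lam * (x - denoiser(x))
        x = x - step * grad
    return x

rng = np.random.default_rng(0)
clean = np.tile(np.linspace(0, 1, 64), (64, 1))          # smooth test image
noisy = clean + 0.1 * rng.standard_normal(clean.shape)
restored = red_denoise(noisy, lambda im: median_filter(im, size=3))
# The restored error should come out below the noisy error.
print(float(np.mean((noisy - clean) ** 2)), float(np.mean((restored - clean) ** 2)))
```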
Knowledge Guided Disambiguation for Large-Scale Scene Classification with Multi-Resolution CNNs
Title | Knowledge Guided Disambiguation for Large-Scale Scene Classification with Multi-Resolution CNNs |
Authors | Limin Wang, Sheng Guo, Weilin Huang, Yuanjun Xiong, Yu Qiao |
Abstract | Convolutional Neural Networks (CNNs) have made remarkable progress on scene recognition, partially due to recent large-scale scene datasets such as Places and Places2. Scene categories are often defined by multi-level information, including local objects, global layout, and background environment, thus leading to large intra-class variations. In addition, with the increasing number of scene categories, label ambiguity has become another crucial issue in large-scale classification. This paper focuses on large-scale scene recognition and makes two major contributions to tackle these issues. First, we propose a multi-resolution CNN architecture that captures visual content and structure at multiple levels. The multi-resolution CNNs are composed of coarse-resolution CNNs and fine-resolution CNNs, which are complementary to each other. Second, we design two knowledge-guided disambiguation techniques to deal with the problem of label ambiguity. (i) We exploit the knowledge from the confusion matrix computed on validation data to merge ambiguous classes into a super category. (ii) We utilize the knowledge of extra networks to produce a soft label for each image. The super categories or soft labels are then employed to guide CNN training on Places2. We conduct extensive experiments on three large-scale image datasets (ImageNet, Places, and Places2), demonstrating the effectiveness of our approach. Furthermore, our method took part in two major scene recognition challenges, achieving second place in the Places2 challenge at ILSVRC 2015 and first place in the LSUN challenge at CVPR 2016. Finally, we directly test the learned representations on other scene benchmarks and obtain new state-of-the-art results on MIT Indoor67 (86.7%) and SUN397 (72.0%). We release the code and models at https://github.com/wanglimin/MRCNN-Scene-Recognition. |
Tasks | Scene Classification, Scene Recognition |
Published | 2016-10-04 |
URL | http://arxiv.org/abs/1610.01119v2 |
http://arxiv.org/pdf/1610.01119v2.pdf | |
PWC | https://paperswithcode.com/paper/knowledge-guided-disambiguation-for-large |
Repo | https://github.com/wanglimin/MRCNN-Scene-Recognition |
Framework | none |
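The second disambiguation technique, soft labels from an extra network, amounts to mixing the usual cross-entropy with a KL term towards the extra network's predictions. A hedged PyTorch sketch; the mixing weight and the 365-way output are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn.functional as F

def knowledge_guided_loss(student_logits, hard_labels, teacher_probs, alpha=0.5):
    """Combine cross-entropy on ground-truth scene labels with a KL term
    towards soft labels produced by an extra (teacher) network, which is how
    ambiguous categories are smoothed over during training."""
    ce = F.cross_entropy(student_logits, hard_labels)
    kl = F.kl_div(F.log_softmax(student_logits, dim=1), teacher_probs,
                  reduction='batchmean')
    return (1 - alpha) * ce + alpha * kl

logits = torch.randn(8, 365, requires_grad=True)        # e.g. Places2-style output
hard = torch.randint(0, 365, (8,))
soft = torch.softmax(torch.randn(8, 365), dim=1)        # soft labels from an extra net
loss = knowledge_guided_loss(logits, hard, soft)
loss.backward()
print(loss.item())
```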