Paper Group AWR 29
Online Learning of Event Definitions. Neural Summarization by Extracting Sentences and Words. Image Captioning with Deep Bidirectional LSTMs. Trained Ternary Quantization. View Synthesis by Appearance Flow. Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization. Action Recognition with Dynamic Image Networks. Regularizing …
Online Learning of Event Definitions
Title | Online Learning of Event Definitions |
Authors | Nikos Katzouris, Alexander Artikis, Georgios Paliouras |
Abstract | Systems for symbolic event recognition infer occurrences of events in time using a set of event definitions in the form of first-order rules. The Event Calculus is a temporal logic that has been used as a basis in event recognition applications, providing, among others, direct connections to machine learning via Inductive Logic Programming (ILP). We present an ILP system for online learning of Event Calculus theories. To allow for a single-pass learning strategy, we use the Hoeffding bound for evaluating clauses on a subset of the input stream. We employ a scheme that decouples the Event Calculus axioms during the learning process, allowing each clause to be learned in isolation. Moreover, we use abductive-inductive logic programming techniques to handle unobserved target predicates. We evaluate our approach on an activity recognition application and compare it to a number of batch learning techniques. We obtain results of comparable predictive accuracy with significant speed-ups in training time. We also outperform hand-crafted rules and match the performance of a sound incremental learner that can only operate on noise-free datasets. This paper is under consideration for acceptance in TPLP. |
Tasks | Activity Recognition |
Published | 2016-07-30 |
URL | http://arxiv.org/abs/1608.00100v1 |
http://arxiv.org/pdf/1608.00100v1.pdf | |
PWC | https://paperswithcode.com/paper/online-learning-of-event-definitions |
Repo | https://github.com/nkatzz/OLED |
Framework | none |
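The clause-selection criterion mentioned in the abstract rests on the Hoeffding bound. As a rough illustration (not the OLED implementation, which operates on Event Calculus clauses in a logic-programming setting), the sketch below shows how such a bound decides when enough stream examples have been seen to commit to the best-scoring clause refinement; the function names, score range, and default delta are illustrative assumptions.

```python
import math

def hoeffding_bound(value_range: float, n: int, delta: float) -> float:
    """Hoeffding bound epsilon for a mean estimated from n observations
    of a quantity bounded in an interval of length `value_range`."""
    return math.sqrt((value_range ** 2) * math.log(1.0 / delta) / (2.0 * n))

def best_clause_is_reliable(score_best: float, score_second: float,
                            n: int, delta: float = 1e-4,
                            value_range: float = 1.0) -> bool:
    """Accept the current best refinement once the observed score gap exceeds
    the Hoeffding bound, i.e. with probability 1 - delta the best refinement
    on the n examples seen so far is also the best on the full stream."""
    return (score_best - score_second) > hoeffding_bound(value_range, n, delta)

# Example: after 2000 stream examples a gap of 0.08 is large enough.
print(best_clause_is_reliable(0.71, 0.63, n=2000))
```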
Neural Summarization by Extracting Sentences and Words
Title | Neural Summarization by Extracting Sentences and Words |
Authors | Jianpeng Cheng, Mirella Lapata |
Abstract | Traditional approaches to extractive summarization rely heavily on human-engineered features. In this work we propose a data-driven approach based on neural networks and continuous sentence features. We develop a general framework for single-document summarization composed of a hierarchical document encoder and an attention-based extractor. This architecture allows us to develop different classes of summarization models which can extract sentences or words. We train our models on large scale corpora containing hundreds of thousands of document-summary pairs. Experimental results on two summarization datasets demonstrate that our models obtain results comparable to the state of the art without any access to linguistic annotation. |
Tasks | Document Summarization |
Published | 2016-03-23 |
URL | http://arxiv.org/abs/1603.07252v3 |
http://arxiv.org/pdf/1603.07252v3.pdf | |
PWC | https://paperswithcode.com/paper/neural-summarization-by-extracting-sentences |
Repo | https://github.com/kedz/nnsum |
Framework | pytorch |
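A minimal sketch of the sentence-extraction side of such a model, assuming mean-pooled word embeddings as the sentence encoder and a single document-level LSTM; the paper's model uses a richer hierarchical encoder and an attention-based extractor, and all layer sizes here are arbitrary.

```python
import torch
import torch.nn as nn

class SentenceExtractor(nn.Module):
    """Toy hierarchical extractor: encode each sentence, run an LSTM over the
    sentence sequence, and score each sentence for extraction."""
    def __init__(self, vocab_size=10000, emb_dim=128, hid_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
        self.doc_rnn = nn.LSTM(emb_dim, hid_dim, batch_first=True)
        self.score = nn.Linear(hid_dim, 1)

    def forward(self, docs):
        # docs: (batch, n_sents, n_words) word-id tensor; 0 is padding.
        emb = self.embed(docs)                      # (B, S, W, E)
        sent_vecs = emb.mean(dim=2)                 # average-pool words per sentence
        hidden, _ = self.doc_rnn(sent_vecs)         # (B, S, H)
        return torch.sigmoid(self.score(hidden)).squeeze(-1)  # extraction probs

docs = torch.randint(1, 10000, (2, 5, 12))   # 2 docs, 5 sentences, 12 words each
probs = SentenceExtractor()(docs)             # (2, 5) per-sentence probabilities
print(probs.shape)
```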
Image Captioning with Deep Bidirectional LSTMs
Title | Image Captioning with Deep Bidirectional LSTMs |
Authors | Cheng Wang, Haojin Yang, Christian Bartz, Christoph Meinel |
Abstract | This work presents an end-to-end trainable deep bidirectional LSTM (Long Short-Term Memory) model for image captioning. Our model builds on a deep convolutional neural network (CNN) and two separate LSTM networks. It is capable of learning long-term visual-language interactions by making use of history and future context information in a high-level semantic space. Two novel deep bidirectional variant models, in which we increase the depth of the nonlinearity transition in different ways, are proposed to learn hierarchical visual-language embeddings. Data augmentation techniques such as multi-crop, multi-scale and vertical mirror are proposed to prevent overfitting when training deep models. We visualize the evolution of bidirectional LSTM internal states over time and qualitatively analyze how our models “translate” image to sentence. Our proposed models are evaluated on caption generation and image-sentence retrieval tasks with three benchmark datasets: Flickr8K, Flickr30K and MSCOCO. We demonstrate that bidirectional LSTM models achieve highly competitive performance compared to state-of-the-art results on caption generation, even without integrating additional mechanisms (e.g. object detection, attention models), and significantly outperform recent methods on the retrieval task. |
Tasks | Data Augmentation, Image Captioning, Object Detection |
Published | 2016-04-04 |
URL | http://arxiv.org/abs/1604.00790v3 |
http://arxiv.org/pdf/1604.00790v3.pdf | |
PWC | https://paperswithcode.com/paper/image-captioning-with-deep-bidirectional |
Repo | https://github.com/deepsemantic/image_captioning |
Framework | none |
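As a rough, hypothetical sketch of the image-sentence retrieval setting evaluated in the paper, the snippet below scores image-caption pairs with a bidirectional LSTM text encoder and a projected CNN feature; this is not the paper's architecture (which couples two LSTMs with a CNN end-to-end), and all dimensions are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BiLSTMCaptionScorer(nn.Module):
    """Toy image-sentence matching: a bidirectional LSTM encodes the caption,
    a linear layer projects a precomputed CNN image feature, and the cosine
    similarity of the two embeddings scores the pair."""
    def __init__(self, vocab_size=10000, emb_dim=128, hid_dim=256, img_dim=2048):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.bilstm = nn.LSTM(emb_dim, hid_dim, batch_first=True, bidirectional=True)
        self.img_proj = nn.Linear(img_dim, 2 * hid_dim)

    def forward(self, captions, img_feats):
        txt, _ = self.bilstm(self.embed(captions))   # (B, T, 2H)
        txt = txt.mean(dim=1)                        # pool over time
        img = self.img_proj(img_feats)               # (B, 2H)
        return F.cosine_similarity(txt, img)         # (B,) matching scores

scores = BiLSTMCaptionScorer()(torch.randint(0, 10000, (4, 15)), torch.randn(4, 2048))
print(scores)
```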
Trained Ternary Quantization
Title | Trained Ternary Quantization |
Authors | Chenzhuo Zhu, Song Han, Huizi Mao, William J. Dally |
Abstract | Deep neural networks are widely used in machine learning applications. However, large neural network models can be difficult to deploy on mobile devices with limited power budgets. To solve this problem, we propose Trained Ternary Quantization (TTQ), a method that can reduce the precision of weights in neural networks to ternary values. This method has very little accuracy degradation and can even improve the accuracy of some models (32-, 44-, and 56-layer ResNet) on CIFAR-10 and AlexNet on ImageNet. Our AlexNet model is trained from scratch, which means it is as easy to train as a normal full-precision model. We highlight our trained quantization method, which can learn both the ternary values and the ternary assignment. During inference, only ternary values (2-bit weights) and scaling factors are needed, therefore our models are nearly 16x smaller than full-precision models. Our ternary models can also be viewed as sparse binary weight networks, which can potentially be accelerated with custom circuits. Experiments on CIFAR-10 show that the ternary models obtained by our trained quantization method outperform full-precision ResNet-32, 44, and 56 models by 0.04%, 0.16%, and 0.36%, respectively. On ImageNet, our model outperforms the full-precision AlexNet model by 0.3% Top-1 accuracy and outperforms previous ternary models by 3%. |
Tasks | Quantization |
Published | 2016-12-04 |
URL | http://arxiv.org/abs/1612.01064v3 |
http://arxiv.org/pdf/1612.01064v3.pdf | |
PWC | https://paperswithcode.com/paper/trained-ternary-quantization |
Repo | https://github.com/vinsis/ternary-quantization |
Framework | pytorch |
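The core quantization step is easy to sketch: weights whose magnitude falls below a threshold become zero, and the rest snap to a positive or negative scale (learned during training in TTQ, plain arguments here). A minimal PyTorch illustration with an illustrative threshold factor:

```python
import torch

def ternarize(weights: torch.Tensor, threshold_factor: float = 0.05,
              w_pos: float = 1.0, w_neg: float = 1.0) -> torch.Tensor:
    """Quantize a float weight tensor to {-w_neg, 0, +w_pos}. The threshold is
    a fraction of the largest absolute weight; in TTQ the positive/negative
    scales are learned jointly with the network, here they are fixed inputs."""
    t = threshold_factor * weights.abs().max()
    ternary = torch.zeros_like(weights)
    ternary[weights > t] = w_pos
    ternary[weights < -t] = -w_neg
    return ternary

w = torch.randn(64, 64)
wq = ternarize(w, threshold_factor=0.05, w_pos=0.8, w_neg=1.1)
print(torch.unique(wq))       # tensor([-1.1000, 0.0000, 0.8000])
```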
View Synthesis by Appearance Flow
Title | View Synthesis by Appearance Flow |
Authors | Tinghui Zhou, Shubham Tulsiani, Weilun Sun, Jitendra Malik, Alexei A. Efros |
Abstract | We address the problem of novel view synthesis: given an input image, synthesizing new images of the same object or scene observed from arbitrary viewpoints. We approach this as a learning task but, critically, instead of learning to synthesize pixels from scratch, we learn to copy them from the input image. Our approach exploits the observation that the visual appearance of different views of the same instance is highly correlated, and such correlation could be explicitly learned by training a convolutional neural network (CNN) to predict appearance flows – 2-D coordinate vectors specifying which pixels in the input view could be used to reconstruct the target view. Furthermore, the proposed framework easily generalizes to multiple input views by learning how to optimally combine single-view predictions. We show that for both objects and scenes, our approach is able to synthesize novel views of higher perceptual quality than previous CNN-based techniques. |
Tasks | Novel View Synthesis |
Published | 2016-05-11 |
URL | http://arxiv.org/abs/1605.03557v3 |
http://arxiv.org/pdf/1605.03557v3.pdf | |
PWC | https://paperswithcode.com/paper/view-synthesis-by-appearance-flow |
Repo | https://github.com/RenYurui/Global-Flow-Local-Attention |
Framework | pytorch |
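The "copy pixels rather than synthesize them" idea maps directly onto a bilinear sampling operation. A sketch assuming the appearance flow is already given (in the paper it is predicted by a CNN) and a reasonably recent PyTorch (for the `meshgrid` indexing argument):

```python
import torch
import torch.nn.functional as F

def warp_by_appearance_flow(source: torch.Tensor, flow: torch.Tensor) -> torch.Tensor:
    """Synthesize a target view by sampling source pixels at predicted 2-D
    coordinates. `source` is (B, C, H, W); `flow` is (B, H, W, 2) with x, y
    sampling locations normalized to [-1, 1], as grid_sample expects."""
    return F.grid_sample(source, flow, mode='bilinear', align_corners=True)

# Sanity check: an identity flow reproduces the source image.
B, C, H, W = 1, 3, 32, 32
ys, xs = torch.meshgrid(torch.linspace(-1, 1, H), torch.linspace(-1, 1, W), indexing='ij')
identity = torch.stack([xs, ys], dim=-1).unsqueeze(0)      # (1, H, W, 2)
src = torch.rand(B, C, H, W)
out = warp_by_appearance_flow(src, identity)
print(torch.allclose(out, src, atol=1e-5))
```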
Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization
Title | Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization |
Authors | Ramprasaath R. Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, Dhruv Batra |
Abstract | We propose a technique for producing “visual explanations” for decisions from a large class of CNN-based models, making them more transparent. Our approach, Gradient-weighted Class Activation Mapping (Grad-CAM), uses the gradients of any target concept flowing into the final convolutional layer to produce a coarse localization map highlighting important regions in the image for predicting the concept. Grad-CAM is applicable to a wide variety of CNN model families: (1) CNNs with fully-connected layers, (2) CNNs used for structured outputs, (3) CNNs used in tasks with multimodal inputs or reinforcement learning, without any architectural changes or re-training. We combine Grad-CAM with fine-grained visualizations to create a high-resolution class-discriminative visualization and apply it to off-the-shelf image classification, captioning, and visual question answering (VQA) models, including ResNet-based architectures. In the context of image classification models, our visualizations (a) lend insights into their failure modes, (b) are robust to adversarial images, (c) outperform previous methods on localization, (d) are more faithful to the underlying model and (e) help achieve generalization by identifying dataset bias. For captioning and VQA, we show that even non-attention based models can localize inputs. We devise a way to identify important neurons through Grad-CAM and combine it with neuron names to provide textual explanations for model decisions. Finally, we design and conduct human studies to measure if Grad-CAM helps users establish appropriate trust in predictions from models and show that Grad-CAM helps untrained users successfully discern a ‘stronger’ model from a ‘weaker’ one even when both make identical predictions. Our code is available at https://github.com/ramprs/grad-cam/, along with a demo at http://gradcam.cloudcv.org, and a video at youtu.be/COjUB9Izk6E. |
Tasks | Image Classification, Interpretable Machine Learning, Visual Question Answering |
Published | 2016-10-07 |
URL | https://arxiv.org/abs/1610.02391v4 |
https://arxiv.org/pdf/1610.02391v4.pdf | |
PWC | https://paperswithcode.com/paper/grad-cam-visual-explanations-from-deep |
Repo | https://github.com/Murali81/Grad-CAM |
Framework | none |
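The Grad-CAM computation itself is compact: back-propagate the class score to the last convolutional feature maps, average the gradients per channel to obtain weights, and take a ReLU of the weighted sum of the maps. A PyTorch sketch using forward/backward hooks; the torchvision ResNet-18 and the choice of `layer4[-1]` are only for illustration, and the `weights=None` argument assumes a recent torchvision.

```python
import torch
import torch.nn.functional as F
from torchvision import models

def grad_cam(model, conv_layer, image, class_idx):
    """Grad-CAM: gradients of the class score w.r.t. the last conv feature maps
    are global-average-pooled into channel weights, the weighted sum of the
    maps goes through ReLU, and the result is upsampled to the input size."""
    acts, grads = {}, {}
    h1 = conv_layer.register_forward_hook(lambda m, i, o: acts.update(a=o))
    h2 = conv_layer.register_full_backward_hook(lambda m, gi, go: grads.update(g=go[0]))
    score = model(image)[0, class_idx]
    model.zero_grad()
    score.backward()
    h1.remove(); h2.remove()
    weights = grads['g'].mean(dim=(2, 3), keepdim=True)           # (1, C, 1, 1)
    cam = F.relu((weights * acts['a']).sum(dim=1, keepdim=True))  # (1, 1, h, w)
    cam = F.interpolate(cam, size=image.shape[2:], mode='bilinear', align_corners=False)
    return cam / (cam.max() + 1e-8)

model = models.resnet18(weights=None).eval()
cam = grad_cam(model, model.layer4[-1], torch.rand(1, 3, 224, 224), class_idx=0)
print(cam.shape)   # torch.Size([1, 1, 224, 224])
```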
Action Recognition with Dynamic Image Networks
Title | Action Recognition with Dynamic Image Networks |
Authors | Hakan Bilen, Basura Fernando, Efstratios Gavves, Andrea Vedaldi |
Abstract | We introduce the concept of a “dynamic image”, a novel compact representation of videos useful for video analysis, particularly in combination with convolutional neural networks (CNNs). A dynamic image encodes temporal data such as RGB or optical flow videos by using the concept of “rank pooling”. The idea is to learn a ranking machine that captures the temporal evolution of the data and to use the parameters of the latter as a representation. When a linear ranking machine is used, the resulting representation is in the form of an image, which we call dynamic because it summarizes the video dynamics in addition to appearance. This is a powerful idea because it allows any video to be converted into an image, so that existing CNN models pre-trained for the analysis of still images can be immediately extended to videos. We also present an efficient and effective approximate rank pooling operator, accelerating standard rank pooling algorithms by orders of magnitude, and formulate it as a CNN layer. This new layer allows generalizing dynamic images to dynamic feature maps. We demonstrate the power of the new representations on standard benchmarks in action recognition, achieving state-of-the-art performance. |
Tasks | Optical Flow Estimation, Temporal Action Localization |
Published | 2016-12-02 |
URL | http://arxiv.org/abs/1612.00738v2 |
http://arxiv.org/pdf/1612.00738v2.pdf | |
PWC | https://paperswithcode.com/paper/action-recognition-with-dynamic-image |
Repo | https://github.com/hbilen/dynamic-image-nets |
Framework | none |
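A hedged sketch of approximate rank pooling in NumPy, using one simple form of the weighting (alpha_t = 2t - T - 1 applied to running frame averages); the paper derives the exact coefficients from the rank-pooling objective, so treat this only as an illustration of the "collapse a clip into one image" idea.

```python
import numpy as np

def dynamic_image(frames: np.ndarray) -> np.ndarray:
    """Collapse a video (T, H, W, C) into a single 'dynamic image' by weighting
    the running time-averages of the frames with alpha_t = 2t - T - 1 and
    summing. The result can be fed to any still-image CNN."""
    T = frames.shape[0]
    running_mean = np.cumsum(frames.astype(np.float64), axis=0) / \
        np.arange(1, T + 1).reshape(-1, 1, 1, 1)
    alphas = 2.0 * np.arange(1, T + 1) - T - 1            # t = 1..T
    dyn = np.tensordot(alphas, running_mean, axes=(0, 0))
    # Rescale to [0, 255] for visualization / CNN input.
    dyn = 255.0 * (dyn - dyn.min()) / (dyn.max() - dyn.min() + 1e-8)
    return dyn.astype(np.uint8)

video = np.random.randint(0, 256, size=(16, 64, 64, 3), dtype=np.uint8)
print(dynamic_image(video).shape)    # (64, 64, 3)
```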
Regularizing CNNs with Locally Constrained Decorrelations
Title | Regularizing CNNs with Locally Constrained Decorrelations |
Authors | Pau Rodríguez, Jordi Gonzàlez, Guillem Cucurull, Josep M. Gonfaus, Xavier Roca |
Abstract | Regularization is key for deep learning since it allows training more complex models while keeping overfitting lower. However, the most prevalent regularizations do not leverage all the capacity of the models since they rely on reducing the effective number of parameters. Feature decorrelation is an alternative that uses the full capacity of the models, but its overfitting-reduction margins are too narrow given the overhead it introduces. In this paper, we show that regularizing negatively correlated features is an obstacle for effective decorrelation and present OrthoReg, a novel regularization technique that locally enforces feature orthogonality. As a result, imposing locality constraints in feature decorrelation removes interferences between negatively correlated feature weights, allowing the regularizer to reach higher decorrelation bounds and reducing overfitting more effectively. In particular, we show that models regularized with OrthoReg have higher accuracy bounds even when batch normalization and dropout are present. Moreover, since our regularization is directly performed on the weights, it is especially suitable for fully convolutional neural networks, where the weight space is constant compared to the feature map space. As a result, we are able to reduce the overfitting of state-of-the-art CNNs on CIFAR-10, CIFAR-100, and SVHN. |
Tasks | |
Published | 2016-11-07 |
URL | http://arxiv.org/abs/1611.01967v2 |
http://arxiv.org/pdf/1611.01967v2.pdf | |
PWC | https://paperswithcode.com/paper/regularizing-cnns-with-locally-constrained |
Repo | https://github.com/prlz77/orthoreg |
Framework | none |
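A simplified stand-in for the OrthoReg idea: measure pairwise cosine similarity between a layer's filters and penalize only the positively correlated pairs, leaving negatively correlated filters alone. The exact squashing function and schedule in the paper differ, and the weighting constant below is arbitrary.

```python
import torch
import torch.nn as nn

def local_decorrelation_penalty(weight: torch.Tensor) -> torch.Tensor:
    """Simplified OrthoReg-style penalty: pairwise cosine similarities between
    a layer's filters, keeping only the positive (correlated) ones."""
    w = weight.flatten(1)                                  # (out_channels, fan_in)
    w = w / (w.norm(dim=1, keepdim=True) + 1e-8)
    cos = w @ w.t()                                        # pairwise cosines
    off_diag = cos - torch.eye(cos.size(0), device=cos.device)
    return torch.clamp(off_diag, min=0.0).pow(2).sum()     # only positive correlations

layer = nn.Conv2d(3, 64, kernel_size=3)
reg = 1e-2 * local_decorrelation_penalty(layer.weight)
print(reg)   # add this term to the task loss during training
```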
DeepMind Lab
Title | DeepMind Lab |
Authors | Charles Beattie, Joel Z. Leibo, Denis Teplyashin, Tom Ward, Marcus Wainwright, Heinrich Küttler, Andrew Lefrancq, Simon Green, Víctor Valdés, Amir Sadik, Julian Schrittwieser, Keith Anderson, Sarah York, Max Cant, Adam Cain, Adrian Bolton, Stephen Gaffney, Helen King, Demis Hassabis, Shane Legg, Stig Petersen |
Abstract | DeepMind Lab is a first-person 3D game platform designed for research and development of general artificial intelligence and machine learning systems. DeepMind Lab can be used to study how autonomous artificial agents may learn complex tasks in large, partially observed, and visually diverse worlds. DeepMind Lab has a simple and flexible API enabling creative task-designs and novel AI-designs to be explored and quickly iterated upon. It is powered by a fast and widely recognised game engine, and tailored for effective use by the research community. |
Tasks | |
Published | 2016-12-12 |
URL | http://arxiv.org/abs/1612.03801v2 |
http://arxiv.org/pdf/1612.03801v2.pdf | |
PWC | https://paperswithcode.com/paper/deepmind-lab |
Repo | https://github.com/deepmind/lab |
Framework | none |
Structured and Efficient Variational Deep Learning with Matrix Gaussian Posteriors
Title | Structured and Efficient Variational Deep Learning with Matrix Gaussian Posteriors |
Authors | Christos Louizos, Max Welling |
Abstract | We introduce a variational Bayesian neural network where the parameters are governed via a probability distribution on random matrices. Specifically, we employ a matrix variate Gaussian \cite{gupta1999matrix} parameter posterior distribution where we explicitly model the covariance among the input and output dimensions of each layer. Furthermore, with approximate covariance matrices we can achieve a more efficient way to represent those correlations that is also cheaper than fully factorized parameter posteriors. We further show that with the “local reparametrization trick” \cite{kingma2015variational} on this posterior distribution we arrive at a Gaussian process \cite{rasmussen2006gaussian} interpretation of the hidden units in each layer and, similarly to \cite{gal2015dropout}, we provide connections with deep Gaussian processes. We then take advantage of this duality and incorporate “pseudo-data” \cite{snelson2005sparse} in our model, which in turn allows for more efficient sampling while maintaining the properties of the original model. The validity of the proposed approach is verified through extensive experiments. |
Tasks | Gaussian Processes |
Published | 2016-03-15 |
URL | http://arxiv.org/abs/1603.04733v5 |
http://arxiv.org/pdf/1603.04733v5.pdf | |
PWC | https://paperswithcode.com/paper/structured-and-efficient-variational-deep |
Repo | https://github.com/AMLab-Amsterdam/SEVDL_MGP |
Framework | none |
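Sampling from a matrix variate Gaussian is the basic building block of such a posterior. A small NumPy sketch using Cholesky factors of illustrative row and column covariances (the variational machinery around it is omitted):

```python
import numpy as np

def sample_matrix_gaussian(M, U, V, rng=None):
    """Draw one sample from a matrix variate Gaussian MN(M, U, V):
    W = M + A E B^T, with E ~ N(0, I), U = A A^T the row (input) covariance
    and V = B B^T the column (output) covariance."""
    rng = np.random.default_rng(rng)
    A = np.linalg.cholesky(U)
    B = np.linalg.cholesky(V)
    E = rng.standard_normal(M.shape)
    return M + A @ E @ B.T

n_in, n_out = 5, 3
M = np.zeros((n_in, n_out))
U = np.eye(n_in) * 0.1            # input-side (row) covariance, illustrative
V = np.eye(n_out) * 0.5           # output-side (column) covariance, illustrative
W = sample_matrix_gaussian(M, U, V, rng=0)
print(W.shape)                     # (5, 3)
```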
Stealing Machine Learning Models via Prediction APIs
Title | Stealing Machine Learning Models via Prediction APIs |
Authors | Florian Tramèr, Fan Zhang, Ari Juels, Michael K. Reiter, Thomas Ristenpart |
Abstract | Machine learning (ML) models may be deemed confidential due to their sensitive training data, commercial value, or use in security applications. Increasingly often, confidential ML models are being deployed with publicly accessible query interfaces. ML-as-a-service (“predictive analytics”) systems are an example: Some allow users to train models on potentially sensitive data and charge others for access on a pay-per-query basis. The tension between model confidentiality and public access motivates our investigation of model extraction attacks. In such attacks, an adversary with black-box access, but no prior knowledge of an ML model’s parameters or training data, aims to duplicate the functionality of (i.e., “steal”) the model. Unlike in classical learning theory settings, ML-as-a-service offerings may accept partial feature vectors as inputs and include confidence values with predictions. Given these practices, we show simple, efficient attacks that extract target ML models with near-perfect fidelity for popular model classes including logistic regression, neural networks, and decision trees. We demonstrate these attacks against the online services of BigML and Amazon Machine Learning. We further show that the natural countermeasure of omitting confidence values from model outputs still admits potentially harmful model extraction attacks. Our results highlight the need for careful ML model deployment and new model extraction countermeasures. |
Tasks | |
Published | 2016-09-09 |
URL | http://arxiv.org/abs/1609.02943v2 |
http://arxiv.org/pdf/1609.02943v2.pdf | |
PWC | https://paperswithcode.com/paper/stealing-machine-learning-models-via |
Repo | https://github.com/ftramer/Steal-ML |
Framework | none |
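For models that expose confidence scores, the equation-solving attack on logistic regression is only a few lines: each query yields one linear equation in the unknown weights, so roughly d + 1 queries recover the model exactly. A self-contained NumPy illustration against a toy victim model (not the paper's code):

```python
import numpy as np

def steal_logistic_regression(query, n_features, rng=None):
    """Equation-solving extraction of a logistic regression model that returns
    confidence scores: each query gives logit(p) = w . x + b, so ~(d + 1)
    random queries determine (w, b)."""
    rng = np.random.default_rng(rng)
    X = rng.standard_normal((n_features + 1, n_features))
    logits = np.array([np.log(p / (1 - p)) for p in (query(x) for x in X)])
    A = np.hstack([X, np.ones((X.shape[0], 1))])        # unknowns: [w, b]
    sol, *_ = np.linalg.lstsq(A, logits, rcond=None)
    return sol[:-1], sol[-1]

# Victim model (unknown to the attacker except through `query`).
true_w, true_b = np.array([1.5, -2.0, 0.3]), 0.7
query = lambda x: 1.0 / (1.0 + np.exp(-(true_w @ x + true_b)))

w_hat, b_hat = steal_logistic_regression(query, n_features=3, rng=42)
print(np.allclose(w_hat, true_w), np.isclose(b_hat, true_b))   # True True
```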
From Node Embedding To Community Embedding
Title | From Node Embedding To Community Embedding |
Authors | Vincent W. Zheng, Sandro Cavallari, Hongyun Cai, Kevin Chen-Chuan Chang, Erik Cambria |
Abstract | Most of the existing graph embedding methods focus on nodes: they aim to output a vector representation for each node in the graph such that two nodes that are “close” on the graph are also close in the low-dimensional space. Despite the success of embedding individual nodes for graph analytics, we notice that an important concept of embedding communities (i.e., groups of nodes) is missing. Embedding communities is useful, not only for supporting various community-level applications, but also for helping to preserve community structure in graph embedding. In fact, we see community embedding as providing a higher-order proximity to define node closeness, whereas most of the popular graph embedding methods focus on first-order and/or second-order proximities. To learn the community embedding, we hinge upon the insight that community embedding and node embedding reinforce each other. As a result, we propose ComEmbed, the first community embedding method, which jointly optimizes the community embedding and the node embedding. We evaluate ComEmbed on real-world data sets and show that it outperforms the state-of-the-art baselines in both node classification and community prediction. |
Tasks | Graph Embedding, Node Classification |
Published | 2016-10-31 |
URL | http://arxiv.org/abs/1610.09950v2 |
http://arxiv.org/pdf/1610.09950v2.pdf | |
PWC | https://paperswithcode.com/paper/from-node-embedding-to-community-embedding |
Repo | https://github.com/vwz/topolstm |
Framework | none |
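A hedged illustration of the "communities as distributions over the embedding space" view: fit a Gaussian mixture to fixed node embeddings and read off the component means as community embeddings. The actual method optimizes node and community embeddings jointly rather than in this one-shot fashion, and the synthetic embeddings below are purely illustrative.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Stand-in node embeddings; in the paper these come from (and are refined by)
# the joint node/community objective rather than being fixed up front.
rng = np.random.default_rng(0)
node_emb = np.vstack([rng.normal(loc, 0.3, size=(50, 2))
                      for loc in ([0, 0], [3, 3], [0, 4])])

# Each community is represented by a Gaussian over the node embedding space:
# its mean acts as the "community embedding", its covariance as the spread.
gmm = GaussianMixture(n_components=3, random_state=0).fit(node_emb)
community_embeddings = gmm.means_            # (3, 2)
memberships = gmm.predict_proba(node_emb)    # soft community assignments
print(community_embeddings.shape, memberships.shape)
```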
Cost-Sensitive Label Embedding for Multi-Label Classification
Title | Cost-Sensitive Label Embedding for Multi-Label Classification |
Authors | Kuan-Hao Huang, Hsuan-Tien Lin |
Abstract | Label embedding (LE) is an important family of multi-label classification algorithms that digest the label information jointly for better performance. Different real-world applications evaluate performance by different cost functions of interest. Current LE algorithms often aim to optimize one specific cost function, but they can suffer from bad performance with respect to other cost functions. In this paper, we resolve the performance issue by proposing a novel cost-sensitive LE algorithm that takes the cost function of interest into account. The proposed algorithm, cost-sensitive label embedding with multidimensional scaling (CLEMS), approximates the cost information with the distances of the embedded vectors by using the classic multidimensional scaling approach for manifold learning. CLEMS is able to deal with both symmetric and asymmetric cost functions, and effectively makes cost-sensitive decisions by nearest-neighbor decoding within the embedded vectors. We derive theoretical results that justify how CLEMS achieves the desired cost-sensitivity. Furthermore, extensive experimental results demonstrate that CLEMS is significantly better than a wide spectrum of existing LE algorithms and state-of-the-art cost-sensitive algorithms across different cost functions. |
Tasks | Multi-Label Classification |
Published | 2016-03-30 |
URL | http://arxiv.org/abs/1603.09048v5 |
http://arxiv.org/pdf/1603.09048v5.pdf | |
PWC | https://paperswithcode.com/paper/cost-sensitive-label-embedding-for-multi |
Repo | https://github.com/ej0cl6/csmlc |
Framework | none |
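A toy sketch of the two geometric ingredients described in the abstract, multidimensional scaling of a label-cost matrix and nearest-neighbor decoding, using scikit-learn; the label set, the Hamming cost, and the faked prediction are all illustrative, and the feature-to-embedding regressor used in CLEMS is omitted.

```python
import numpy as np
from sklearn.manifold import MDS

# Toy setup: 4 distinct label vectors and a symmetric cost between them
# (here Hamming distance). The labels are embedded so that embedded
# distances mimic these costs; predictions are decoded by nearest neighbor.
label_vectors = np.array([[0, 0, 1], [0, 1, 1], [1, 0, 0], [1, 1, 0]])
cost = np.array([[np.sum(a != b) for b in label_vectors] for a in label_vectors],
                dtype=float)

mds = MDS(n_components=2, dissimilarity='precomputed', random_state=0)
label_emb = mds.fit_transform(cost)          # one embedded point per label vector

# A regressor would normally map features -> embedded space; here we fake one
# prediction near the second label vector and decode it by nearest neighbor.
predicted_point = label_emb[1] + 0.05
nearest = np.argmin(np.linalg.norm(label_emb - predicted_point, axis=1))
print(label_vectors[nearest])                # [0 1 1]
```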
The Little Engine that Could: Regularization by Denoising (RED)
Title | The Little Engine that Could: Regularization by Denoising (RED) |
Authors | Yaniv Romano, Michael Elad, Peyman Milanfar |
Abstract | Removal of noise from an image is an extensively studied problem in image processing. Indeed, the recent advent of sophisticated and highly effective denoising algorithms has led some to believe that existing methods are touching the ceiling in terms of noise removal performance. Can we leverage this impressive achievement to treat other tasks in image processing? Recent work has answered this question positively, in the form of the Plug-and-Play Prior ($P^3$) method, showing that any inverse problem can be handled by sequentially applying image denoising steps. This relies heavily on the ADMM optimization technique in order to obtain this chained denoising interpretation. Is this the only way in which tasks in image processing can exploit the image denoising engine? In this paper we provide an alternative, more powerful and more flexible framework for achieving the same goal. As opposed to the $P^3$ method, we offer Regularization by Denoising (RED): using the denoising engine to define the regularization of the inverse problem. We propose an explicit image-adaptive Laplacian-based regularization functional, making the overall objective functional clearer and better defined. With complete flexibility to choose the iterative optimization procedure for minimizing this functional, RED can incorporate any image denoising algorithm, treats general inverse problems very effectively, and is guaranteed to converge to the globally optimal result. We test this approach and demonstrate state-of-the-art results in the image deblurring and super-resolution problems. |
Tasks | Deblurring, Denoising, Image Denoising, Super-Resolution |
Published | 2016-11-09 |
URL | http://arxiv.org/abs/1611.02862v3 |
http://arxiv.org/pdf/1611.02862v3.pdf | |
PWC | https://paperswithcode.com/paper/the-little-engine-that-could-regularization |
Repo | https://github.com/happyhongt/Acceleration-of-RED-via-Vector-Extrapolation |
Framework | none |
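Under the RED assumptions the regularizer's gradient is simply x - f(x) for a denoiser f, so a steepest-descent solver needs one line per iteration. A sketch for the plainest inverse problem (denoising, identity forward operator) with a median filter standing in for a sophisticated denoising engine; the regularization weight, step size, and iteration count are arbitrary choices, not values from the paper.

```python
import numpy as np
from scipy.ndimage import median_filter

def red_denoise(y, denoiser, lam=0.2, step=0.2, n_iters=50):
    """Steepest-descent RED for denoising (H = identity): the data-term
    gradient is (x - y) and, under the RED conditions, the regularizer's
    gradient is lam * (x - denoiser(x))."""
    x = y.copy()
    for _ in range(n_iters):
        grad = (x - y) + lam * (x - denoiser(x))
        x = x - step * grad
    return x

rng = np.random.default_rng(0)
clean = np.tile(np.linspace(0, 1, 64), (64, 1))          # smooth test image
noisy = clean + 0.1 * rng.standard_normal(clean.shape)
restored = red_denoise(noisy, lambda im: median_filter(im, size=3))
# The restored error should come out below the noisy error.
print(float(np.mean((noisy - clean) ** 2)), float(np.mean((restored - clean) ** 2)))
```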
Knowledge Guided Disambiguation for Large-Scale Scene Classification with Multi-Resolution CNNs
Title | Knowledge Guided Disambiguation for Large-Scale Scene Classification with Multi-Resolution CNNs |
Authors | Limin Wang, Sheng Guo, Weilin Huang, Yuanjun Xiong, Yu Qiao |
Abstract | Convolutional Neural Networks (CNNs) have made remarkable progress on scene recognition, partially due to recent large-scale scene datasets such as Places and Places2. Scene categories are often defined by multi-level information, including local objects, global layout, and background environment, thus leading to large intra-class variations. In addition, with the increasing number of scene categories, label ambiguity has become another crucial issue in large-scale classification. This paper focuses on large-scale scene recognition and makes two major contributions to tackle these issues. First, we propose a multi-resolution CNN architecture that captures visual content and structure at multiple levels. The multi-resolution CNNs are composed of coarse-resolution CNNs and fine-resolution CNNs, which are complementary to each other. Second, we design two knowledge-guided disambiguation techniques to deal with the problem of label ambiguity. (i) We exploit the knowledge from the confusion matrix computed on validation data to merge ambiguous classes into a super category. (ii) We utilize the knowledge of extra networks to produce a soft label for each image. The super categories or soft labels are then employed to guide CNN training on Places2. We conduct extensive experiments on three large-scale image datasets (ImageNet, Places, and Places2), demonstrating the effectiveness of our approach. Furthermore, our method took part in two major scene recognition challenges, achieving second place in the Places2 challenge at ILSVRC 2015 and first place in the LSUN challenge at CVPR 2016. Finally, we directly test the learned representations on other scene benchmarks and obtain new state-of-the-art results on MIT Indoor67 (86.7%) and SUN397 (72.0%). We release the code and models at https://github.com/wanglimin/MRCNN-Scene-Recognition. |
Tasks | Scene Classification, Scene Recognition |
Published | 2016-10-04 |
URL | http://arxiv.org/abs/1610.01119v2 |
http://arxiv.org/pdf/1610.01119v2.pdf | |
PWC | https://paperswithcode.com/paper/knowledge-guided-disambiguation-for-large |
Repo | https://github.com/wanglimin/MRCNN-Scene-Recognition |
Framework | none |
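The second disambiguation technique, soft labels from an extra network, amounts to mixing the usual cross-entropy with a KL term towards the extra network's predictions. A hedged PyTorch sketch; the mixing weight and the 365-way output are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn.functional as F

def knowledge_guided_loss(student_logits, hard_labels, teacher_probs, alpha=0.5):
    """Combine cross-entropy on ground-truth scene labels with a KL term
    towards soft labels produced by an extra (teacher) network, which is how
    ambiguous categories are smoothed over during training."""
    ce = F.cross_entropy(student_logits, hard_labels)
    kl = F.kl_div(F.log_softmax(student_logits, dim=1), teacher_probs,
                  reduction='batchmean')
    return (1 - alpha) * ce + alpha * kl

logits = torch.randn(8, 365, requires_grad=True)        # e.g. Places2-style output
hard = torch.randint(0, 365, (8,))
soft = torch.softmax(torch.randn(8, 365), dim=1)        # soft labels from an extra net
loss = knowledge_guided_loss(logits, hard, soft)
loss.backward()
print(loss.item())
```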