October 20, 2019

Paper Group ANR 52

TIDBD: Adapting Temporal-difference Step-sizes Through Stochastic Meta-descent

Title TIDBD: Adapting Temporal-difference Step-sizes Through Stochastic Meta-descent
Authors Alex Kearney, Vivek Veeriah, Jaden B. Travnik, Richard S. Sutton, Patrick M. Pilarski
Abstract In this paper, we introduce a method for adapting the step-sizes of temporal difference (TD) learning. The performance of TD methods often depends on well chosen step-sizes, yet few algorithms have been developed for setting the step-size automatically for TD learning. An important limitation of current methods is that they adapt a single step-size shared by all the weights of the learning system. A vector step-size enables greater optimization by specifying parameters on a per-feature basis. Furthermore, adapting parameters at different rates has the added benefit of being a simple form of representation learning. We generalize Incremental Delta Bar Delta (IDBD)—a vectorized adaptive step-size method for supervised learning—to TD learning, which we name TIDBD. We demonstrate that TIDBD is able to find appropriate step-sizes in both stationary and non-stationary prediction tasks, outperforming ordinary TD methods and TD methods with scalar step-size adaptation; we demonstrate that it can differentiate between features which are relevant and irrelevant for a given task, performing representation learning; and we show on a real-world robot prediction task that TIDBD is able to outperform ordinary TD methods and TD methods augmented with AlphaBound and RMSprop.
Tasks Representation Learning
Published 2018-04-10
URL http://arxiv.org/abs/1804.03334v1
PDF http://arxiv.org/pdf/1804.03334v1.pdf
PWC https://paperswithcode.com/paper/tidbd-adapting-temporal-difference-step-sizes
Repo
Framework
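
The abstract describes TIDBD as a generalization of Incremental Delta Bar Delta (IDBD) to TD learning. As a point of reference, the following is a minimal sketch of the supervised IDBD update only, with one step-size per feature (`alpha_i = exp(beta_i)`) adapted by a meta step-size `theta`. It is not the paper's TD variant, and the toy regression task, the function names and all constants are assumptions chosen for illustration.

```python
import math
import random


def idbd_update(w, beta, h, x, y, theta):
    """Apply one supervised IDBD step in place and return the prediction error."""
    # Error of the current linear prediction, computed before any weight changes.
    delta = y - sum(wi * xi for wi, xi in zip(w, x))
    for i, xi in enumerate(x):
        # Meta-descent on the log step-size of feature i.
        beta[i] += theta * delta * xi * h[i]
        alpha = math.exp(beta[i])
        # Ordinary LMS weight update, using the per-feature step-size.
        w[i] += alpha * delta * xi
        # Decaying trace of recent updates to w[i].
        h[i] = h[i] * max(0.0, 1.0 - alpha * xi * xi) + alpha * delta * xi
    return delta


def run_demo(steps=2000, seed=0):
    """Toy task (an assumption): two relevant and two irrelevant +/-1 features."""
    rng = random.Random(seed)
    target = [1.0, -1.0, 0.0, 0.0]
    n = len(target)
    w = [0.0] * n
    beta = [math.log(0.05)] * n  # initial step-size of 0.05 for every feature
    h = [0.0] * n
    for _ in range(steps):
        x = [rng.choice((-1.0, 1.0)) for _ in range(n)]
        y = sum(t * xi for t, xi in zip(target, x))
        idbd_update(w, beta, h, x, y, theta=0.01)
    return w, [math.exp(b) for b in beta]
```

Calling `run_demo()` returns the learned weights and the final per-feature step-sizes, which can be inspected to see how the step-sizes of relevant and irrelevant features drift apart on this toy problem.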

DeepTwist: Learning Model Compression via Occasional Weight Distortion

Title DeepTwist: Learning Model Compression via Occasional Weight Distortion
Authors Dongsoo Lee, Parichay Kapoor, Byeongwook Kim
Abstract Model compression has been introduced to reduce the required hardware resources while maintaining the model accuracy. Lots of techniques for model compression, such as pruning, quantization, and low-rank approximation, have been suggested along with different inference implementation characteristics. Adopting model compression is, however, still challenging because the design complexity of model compression is rapidly increasing due to additional hyper-parameters and computation overhead in order to achieve a high compression ratio. In this paper, we propose a simple and efficient model compression framework called DeepTwist which distorts weights in an occasional manner without modifying the underlying training algorithms. The ideas of designing weight distortion functions are intuitive and straightforward given formats of compressed weights. We show that our proposed framework improves compression rate significantly for pruning, quantization, and low-rank approximation techniques while the efforts of additional retraining and/or hyper-parameter search are highly reduced. Regularization effects of DeepTwist are also reported.
Tasks Model Compression, Quantization
Published 2018-10-30
URL http://arxiv.org/abs/1810.12823v1
PDF http://arxiv.org/pdf/1810.12823v1.pdf
PWC https://paperswithcode.com/paper/deeptwist-learning-model-compression-via
Repo
Framework
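
The idea of distorting weights occasionally while leaving the training algorithm untouched can be illustrated with a small sketch. Here the distortion is magnitude pruning applied every `period` steps of plain gradient descent. The choice of distortion, the schedule and every name below are assumptions for illustration, not the authors' implementation.

```python
def prune_distortion(weights, sparsity):
    """Zero out the given fraction of weights with the smallest magnitude."""
    k = int(len(weights) * sparsity)
    if k == 0:
        return list(weights)
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    drop = set(order[:k])
    return [0.0 if i in drop else w for i, w in enumerate(weights)]


def train_with_occasional_distortion(weights, grad_fn, lr, steps, period, sparsity):
    """Plain gradient descent; every `period` steps the weights are distorted."""
    w = list(weights)
    for step in range(1, steps + 1):
        grads = grad_fn(w)
        w = [wi - lr * gi for wi, gi in zip(w, grads)]
        if step % period == 0:
            # The optimizer itself is unchanged; only the weights are overwritten.
            w = prune_distortion(w, sparsity)
    return w
```

Between distortions the pruned weights are free to move again, so the training loop decides on each occasion which weights are currently the smallest.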

Learning to Evaluate Image Captioning

Title Learning to Evaluate Image Captioning
Authors Yin Cui, Guandao Yang, Andreas Veit, Xun Huang, Serge Belongie
Abstract Evaluation metrics for image captioning face two challenges. Firstly, commonly used metrics such as CIDEr, METEOR, ROUGE and BLEU often do not correlate well with human judgments. Secondly, each metric has well-known blind spots to pathological caption constructions, and rule-based metrics lack provisions to repair such blind spots once identified. For example, the newly proposed SPICE correlates well with human judgments, but fails to capture the syntactic structure of a sentence. To address these two challenges, we propose a novel learning-based discriminative evaluation metric that is directly trained to distinguish between human and machine-generated captions. In addition, we propose a data augmentation scheme to explicitly incorporate pathological transformations as negative examples during training. The proposed metric is evaluated with three kinds of robustness tests and its correlation with human judgments. Extensive experiments show that the proposed data augmentation scheme not only makes our metric more robust toward several pathological transformations, but also improves its correlation with human judgments. Our metric outperforms other metrics on both caption-level human correlation in Flickr 8k and system-level human correlation in COCO. The proposed approach could serve as a learning-based evaluation metric that is complementary to existing rule-based metrics.
Tasks Data Augmentation, Image Captioning
Published 2018-06-17
URL http://arxiv.org/abs/1806.06422v1
PDF http://arxiv.org/pdf/1806.06422v1.pdf
PWC https://paperswithcode.com/paper/learning-to-evaluate-image-captioning
Repo
Framework

To Compress, or Not to Compress: Characterizing Deep Learning Model Compression for Embedded Inference

Title To Compress, or Not to Compress: Characterizing Deep Learning Model Compression for Embedded Inference
Authors Qing Qin, Jie Ren, Jialong Yu, Ling Gao, Hai Wang, Jie Zheng, Yansong Feng, Jianbin Fang, Zheng Wang
Abstract The recent advances in deep neural networks (DNNs) make them attractive for embedded systems. However, it can take a long time for DNNs to make an inference on resource-constrained computing devices. Model compression techniques can address the computation issue of deep inference on embedded devices. This technique is highly attractive, as it does not rely on specialized hardware, or computation-offloading that is often infeasible due to privacy concerns or high latency. However, it remains unclear how model compression techniques perform across a wide range of DNNs. To design efficient embedded deep learning solutions, we need to understand their behaviors. This work develops a quantitative approach to characterize model compression techniques on a representative embedded deep learning architecture, the NVIDIA Jetson Tx2. We perform extensive experiments by considering 11 influential neural network architectures from the image classification and the natural language processing domains. We experimentally show how two mainstream compression techniques, data quantization and pruning, perform on these network architectures, and the implications of compression techniques for the model storage size, inference time, energy consumption and performance metrics. We demonstrate that there are opportunities to achieve fast deep inference on embedded systems, but one must carefully choose the compression settings. Our results provide insights on when and how to apply model compression techniques and guidelines for designing efficient embedded deep learning systems.
Tasks Image Classification, Model Compression, Quantization
Published 2018-10-21
URL http://arxiv.org/abs/1810.08899v1
PDF http://arxiv.org/pdf/1810.08899v1.pdf
PWC https://paperswithcode.com/paper/to-compress-or-not-to-compress-characterizing
Repo
Framework
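
Data quantization, one of the two techniques the abstract studies, trades numerical precision for storage. The sketch below shows symmetric 8-bit linear quantization of a list of weights. This is one common scheme and is not necessarily the one evaluated in the paper; the function names are hypothetical.

```python
def quantize_int8(values):
    """Map floats to integers in [-127, 127] using a single shared scale."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate floats from the integers and the shared scale."""
    return [q * scale for q in quantized]
```

Each weight is then stored as one byte plus a shared scale, and the reconstruction error per weight is bounded by about half the scale.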

Domain2Vec: Deep Domain Generalization

Title Domain2Vec: Deep Domain Generalization
Authors Aniket Anand Deshmukh, Ankit Bansal, Akash Rastogi
Abstract We address the problem of domain generalization where a decision function is learned from the data of several related domains, and the goal is to apply it successfully to an unseen domain. It is assumed that plenty of labeled data is available in the source domains (also called training domains), but no labeled data is available for the unseen domain (also called the target domain or test domain). We propose a novel neural network architecture, Domain2Vec (D2V), that learns a domain-specific embedding and then uses this embedding to generalize the learning across related domains. The proposed algorithm, D2V, extends the idea of distribution regression and kernelized domain generalization to the neural network setting. We propose a neural network architecture to learn a domain-specific embedding and then use this embedding along with the data-point-specific features to label each data point. We show the effectiveness of the architecture by accurately estimating domain-to-domain similarity. We evaluate our algorithm against standard domain generalization datasets for image classification and outperform other state-of-the-art algorithms.
Tasks Domain Generalization, Image Classification
Published 2018-07-09
URL http://arxiv.org/abs/1807.02919v1
PDF http://arxiv.org/pdf/1807.02919v1.pdf
PWC https://paperswithcode.com/paper/domain2vec-deep-domain-generalization
Repo
Framework

Detecting the Trend in Musical Taste over the Decade – A Novel Feature Extraction Algorithm to Classify Musical Content with Simple Features

Title Detecting the Trend in Musical Taste over the Decade – A Novel Feature Extraction Algorithm to Classify Musical Content with Simple Features
Authors Anish Acharya
Abstract This work proposes a novel feature selection algorithm to classify songs into different groups. Classification of musical content is often a non-trivial job and is still a relatively unexplored area. The main idea of this article is a new feature selection scheme that performs the classification with high accuracy while using a small number of simple but wisely chosen features, which makes it less prone to over-fitting. The scheme relies on a very basic, general idea about the structure of an audio signal, which is generally in the shape of a trapezium. Using this idea from the musical community, we propose three frames to be considered and analyzed for feature extraction for each audio signal (opening, stanzas and closing). Extensive experiments establish that this scheme leads to much more efficient classification with less complex features in a low-dimensional feature space, and is therefore also a computationally less expensive method. A step-by-step analysis of feature extraction, feature ranking and dimensionality reduction using PCA is carried out in this article. The Sequential Forward Selection (SFS) algorithm is used to explore the most significant features, both with the raw Fisher Discriminant Ratio (FDR) and with the significant eigenvalues after PCA. During classification, extensive validation and cross-validation have been done in a Monte Carlo manner to ensure the validity of the claims.
Tasks Dimensionality Reduction, Feature Selection
Published 2018-12-19
URL http://arxiv.org/abs/1901.02053v1
PDF http://arxiv.org/pdf/1901.02053v1.pdf
PWC https://paperswithcode.com/paper/detecting-the-trend-in-musical-taste-over-the
Repo
Framework
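
The abstract ranks features by their Fisher Discriminant Ratio (FDR). A minimal sketch of that ranking step, assuming the common two-class form FDR = (mean_a - mean_b)^2 / (var_a + var_b), is given below. The paper may use a different or multi-class variant, and the function names are hypothetical.

```python
def fisher_discriminant_ratio(a, b):
    """Two-class FDR of one feature, given its values in class a and class b."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    var_a = sum((v - mean_a) ** 2 for v in a) / len(a)
    var_b = sum((v - mean_b) ** 2 for v in b) / len(b)
    spread = var_a + var_b
    gap = (mean_a - mean_b) ** 2
    if spread == 0.0:
        # Degenerate case: both classes are constant in this feature.
        return float("inf") if gap > 0.0 else 0.0
    return gap / spread


def rank_features(rows_a, rows_b):
    """Return feature indices sorted from most to least discriminative."""
    n_features = len(rows_a[0])
    scores = [
        fisher_discriminant_ratio([r[j] for r in rows_a], [r[j] for r in rows_b])
        for j in range(n_features)
    ]
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)
```

A selection procedure such as SFS could then consider features in this order, adding one at a time while the validation score improves.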

Divide and Grow: Capturing Huge Diversity in Crowd Images with Incrementally Growing CNN

Title Divide and Grow: Capturing Huge Diversity in Crowd Images with Incrementally Growing CNN
Authors Deepak Babu Sam, Neeraj N Sajjan, R. Venkatesh Babu
Abstract Automated counting of people in crowd images is a challenging task. The major difficulty stems from the large diversity in the way people appear in crowds. In fact, features available for crowd discrimination largely depend on the crowd density to the extent that people are only seen as blobs in a highly dense scene. We tackle this problem with a growing CNN which can progressively increase its capacity to account for the wide variability seen in crowd scenes. Our model starts from a base CNN density regressor, which is trained equally on all types of crowd images. In order to adapt to the huge diversity, we create two child regressors which are exact copies of the base CNN. A differential training procedure divides the dataset into two clusters and fine-tunes the child networks on their respective specialties. Consequently, without any hand-crafted criteria for forming specialties, the child regressors become experts on certain types of crowds. The child networks are again split recursively, creating two experts at every division. This hierarchical training leads to a CNN tree, where the child regressors are finer experts than any of their parents. The leaf nodes are taken as the final experts and a classifier network is then trained to predict the correct specialty for a given test image patch. The proposed model achieves higher count accuracy on major crowd datasets. Further, we analyse the characteristics of specialties mined automatically by our method.
Tasks
Published 2018-07-26
URL http://arxiv.org/abs/1807.09993v1
PDF http://arxiv.org/pdf/1807.09993v1.pdf
PWC https://paperswithcode.com/paper/divide-and-grow-capturing-huge-diversity-in
Repo
Framework

Review: Metaheuristic Search-Based Fuzzy Clustering Algorithms

Title Review: Metaheuristic Search-Based Fuzzy Clustering Algorithms
Authors Waleed Alomoush, Ayat Alrosan
Abstract Fuzzy clustering is a well-known unsupervised learning method used to collect similar data elements within a cluster according to some similarity measure. However, clustering algorithms suffer from some drawbacks. The main weaknesses include selecting the initial cluster centres and the fact that the appropriate number of clusters is normally unknown. These weaknesses are considered the most challenging tasks in clustering algorithms. This paper presents a comprehensive review of metaheuristic search methods for solving the problems of fuzzy clustering algorithms.
Tasks
Published 2018-01-21
URL http://arxiv.org/abs/1802.08729v1
PDF http://arxiv.org/pdf/1802.08729v1.pdf
PWC https://paperswithcode.com/paper/review-metaheuristic-search-based-fuzzy
Repo
Framework

Viewpoint-aware Video Summarization

Title Viewpoint-aware Video Summarization
Authors Atsushi Kanehira, Luc Van Gool, Yoshitaka Ushiku, Tatsuya Harada
Abstract This paper introduces a novel variant of video summarization, namely building a summary that depends on the particular aspect of a video the viewer focuses on. We refer to this as $\textit{viewpoint}$. To infer what the desired $\textit{viewpoint}$ may be, we assume that several other videos are available, especially groups of videos, e.g., as folders on a person’s phone or laptop. The semantic similarity between videos in a group vs. the dissimilarity between groups is used to produce $\textit{viewpoint}$-specific summaries. To account for similarity while avoiding redundancy, the output summary should be (A) diverse, (B) representative of videos in the same group, and (C) discriminative against videos in the different groups. To satisfy these requirements (A)-(C) simultaneously, we propose a novel video summarization method from multiple groups of videos. Inspired by Fisher’s discriminant criteria, it selects a summary by optimizing the combination of three terms, (a) inner-summary, (b) inner-group, and (c) between-group variances defined on the feature representation of the summary, which can simply represent (A)-(C). Moreover, we developed a novel dataset to investigate how well the generated summary reflects the underlying $\textit{viewpoint}$. Quantitative and qualitative experiments conducted on the dataset demonstrate the effectiveness of the proposed method.
Tasks Semantic Similarity, Semantic Textual Similarity, Video Summarization
Published 2018-04-09
URL http://arxiv.org/abs/1804.02843v2
PDF http://arxiv.org/pdf/1804.02843v2.pdf
PWC https://paperswithcode.com/paper/viewpoint-aware-video-summarization
Repo
Framework

NICT’s Corpus Filtering Systems for the WMT18 Parallel Corpus Filtering Task

Title NICT’s Corpus Filtering Systems for the WMT18 Parallel Corpus Filtering Task
Authors Rui Wang, Benjamin Marie, Masao Utiyama, Eiichiro Sumita
Abstract This paper presents NICT’s participation in the WMT18 shared parallel corpus filtering task. The organizers provided a 1-billion-word German-English corpus crawled from the web as part of the Paracrawl project. This corpus is too noisy to build an acceptable neural machine translation (NMT) system. Using the clean data of the WMT18 shared news translation task, we designed several features and trained a classifier to score each sentence pair in the noisy data. Finally, we sampled 100 million and 10 million words and built corresponding NMT systems. Empirical results show that our NMT systems trained on the sampled data achieve promising performance.
Tasks Machine Translation
Published 2018-09-19
URL http://arxiv.org/abs/1809.07043v2
PDF http://arxiv.org/pdf/1809.07043v2.pdf
PWC https://paperswithcode.com/paper/nicts-corpus-filtering-systems-for-the-wmt18
Repo
Framework

An Ontology Based Modeling Framework for Design of Educational Technologies

Title An Ontology Based Modeling Framework for Design of Educational Technologies
Authors Sridhar Chimalakonda, Kesav V. Nori
Abstract Despite rapid progress, most of the educational technologies today lack a strong instructional design knowledge basis leading to questionable quality of instruction. In addition, a major challenge is to customize these educational technologies for a wide range of instructional designs. Ontologies are one of the pertinent mechanisms to represent instructional design in the literature. However, existing approaches do not support modeling of flexible instructional designs. To address this problem, in this paper, we propose an ontology based framework for systematic modeling of different aspects of instructional design knowledge based on domain patterns. As part of the framework, we present ontologies for modeling goals, instructional processes and instructional materials. We demonstrate the ontology framework by presenting instances of the ontology for the large scale case study of adult literacy in India (287 million learners spread across 22 Indian Languages), which requires creation of 1000 similar but varied eLearning Systems based on flexible instructional designs. The implemented framework is available at http://rice.iiit.ac.in and is transferred to National Literacy Mission of Government of India. This framework could be used for modeling instructional design knowledge of systems for skills, school education and beyond.
Tasks
Published 2018-02-07
URL http://arxiv.org/abs/1802.04337v1
PDF http://arxiv.org/pdf/1802.04337v1.pdf
PWC https://paperswithcode.com/paper/an-ontology-based-modeling-framework-for
Repo
Framework

Using Machine Learning to Discern Eruption in Noisy Environments: A Case Study using CO2-driven Cold-Water Geyser in Chimayo, New Mexico

Title Using Machine Learning to Discern Eruption in Noisy Environments: A Case Study using CO2-driven Cold-Water Geyser in Chimayo, New Mexico
Authors B. Yuan, Y. J. Tan, M. K. Mudunuru, O. E. Marcillo, A. A. Delorey, P. M. Roberts, J. D. Webster, C. N. L. Gammans, S. Karra, G. D. Guthrie, P. A. Johnson
Abstract We present an approach based on machine learning (ML) to distinguish eruption and precursory signals of Chimayó geyser (New Mexico, USA) under noisy environments. This geyser can be considered as a natural analog of $\mathrm{CO}_2$ intrusion into shallow water aquifers. By studying this geyser, we can understand upwelling of $\mathrm{CO}_2$-rich fluids from depth, which has relevance to leak monitoring in a $\mathrm{CO}_2$ sequestration project. ML methods such as Random Forests (RF) are known to be robust multi-class classifiers and perform well under unfavorable noisy conditions. However, the extent of the RF method’s accuracy is poorly understood for this $\mathrm{CO}_2$-driven geysering application. The current study aims to quantify the performance of RF classifiers to discern the geyser state. Towards this goal, we first present the data collected from the seismometer that is installed near the Chimayó geyser. The seismic signals collected at this site contain different types of noise such as daily temperature variations, seasonal trends, animal movement near the geyser, and human activity. First, we filter these noises out of the signals by combining the Butterworth high-pass filter and an autoregressive method in a multi-level fashion. We show that combining these filtering techniques in a hierarchical fashion leads to a reduction in the noise in the seismic data without removing the precursors and eruption event signals. We then use RF on the filtered data to classify the state of the geyser into three classes: remnant noise, precursor, and eruption states. We show that the classification accuracy using RF on the filtered data is greater than 90%. These aspects make the proposed ML framework attractive for event discrimination and signal enhancement under noisy conditions, with strong potential for application to monitoring leaks in $\mathrm{CO}_2$ sequestration.
Tasks
Published 2018-10-01
URL http://arxiv.org/abs/1810.01488v1
PDF http://arxiv.org/pdf/1810.01488v1.pdf
PWC https://paperswithcode.com/paper/using-machine-learning-to-discern-eruption-in
Repo
Framework

Understanding Visual Ads by Aligning Symbols and Objects using Co-Attention

Title Understanding Visual Ads by Aligning Symbols and Objects using Co-Attention
Authors Karuna Ahuja, Karan Sikka, Anirban Roy, Ajay Divakaran
Abstract We tackle the problem of understanding visual ads where given an ad image, our goal is to rank appropriate human generated statements describing the purpose of the ad. This problem is generally addressed by jointly embedding images and candidate statements to establish correspondence. Decoding a visual ad requires inference of both semantic and symbolic nuances referenced in an image and prior methods may fail to capture such associations especially with weakly annotated symbols. In order to create better embeddings, we leverage an attention mechanism to associate image proposals with symbols and thus effectively aggregate information from aligned multimodal representations. We propose a multihop co-attention mechanism that iteratively refines the attention map to ensure accurate attention estimation. Our attention based embedding model is learned end-to-end guided by a max-margin loss function. We show that our model outperforms other baselines on the benchmark Ad dataset and also show qualitative results to highlight the advantages of using multihop co-attention.
Tasks
Published 2018-07-04
URL http://arxiv.org/abs/1807.01448v1
PDF http://arxiv.org/pdf/1807.01448v1.pdf
PWC https://paperswithcode.com/paper/understanding-visual-ads-by-aligning-symbols
Repo
Framework
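
The abstract states that the embedding model is trained with a max-margin loss over images and candidate statements. A minimal sketch of a generic max-margin ranking loss follows, assuming dot-product similarity between an image embedding and statement embeddings. The similarity function, the margin value and all names are assumptions, not details taken from the paper.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))


def max_margin_ranking_loss(image_vec, pos_vecs, neg_vecs, margin=1.0):
    """Sum of hinge penalties over every (matching, non-matching) statement pair."""
    loss = 0.0
    for pos in pos_vecs:
        for neg in neg_vecs:
            # Penalize whenever a non-matching statement scores within
            # `margin` of a matching one.
            loss += max(0.0, margin - dot(image_vec, pos) + dot(image_vec, neg))
    return loss
```

The loss is zero only when every matching statement outscores every non-matching one by at least the margin, which is the ranking behaviour the task asks for.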

Deep Hybrid Scattering Image Learning

Title Deep Hybrid Scattering Image Learning
Authors Mu Yang, Zheng-Hao Liu, Ze-Di Cheng, Jin-Shi Xu, Chuan-Feng Li, Guang-Can Guo
Abstract A well-trained deep neural network is shown to gain the capability of simultaneously restoring two kinds of images, which are completely destroyed by two distinct scattering media respectively. The network, based on the U-net architecture, can be trained on a blended dataset of speckle-reference image pairs. We experimentally demonstrate the power of the network in reconstructing images which are strongly diffused by a glass diffuser or a multi-mode fiber. The learning model further shows good generalization ability to reconstruct images that differ from those in the training dataset. Our work facilitates the study of optical transmission and expands machine learning’s application in optics.
Tasks
Published 2018-09-19
URL http://arxiv.org/abs/1809.07706v1
PDF http://arxiv.org/pdf/1809.07706v1.pdf
PWC https://paperswithcode.com/paper/deep-hybrid-scattering-image-learning
Repo
Framework

Mutual Suppression Network for Video Prediction using Disentangled Features

Title Mutual Suppression Network for Video Prediction using Disentangled Features
Authors Jungbeom Lee, Jangho Lee, Sungmin Lee, Sungroh Yoon
Abstract Video prediction has been considered a difficult problem because the video contains not only high-dimensional spatial information but also complex temporal information. Video prediction can be performed by finding features in recent frames, and using them to generate approximations to upcoming frames. We approach this problem by disentangling spatial and temporal features in videos. We introduce a mutual suppression network (MSnet) which is trained in an adversarial manner and produces spatial features which are free of motion information, and motion features with no spatial information. MSnet then uses a motion-guided connection within an encoder-decoder-based architecture to transform spatial features from a previous frame to the time of an upcoming frame. We show how MSnet can be used for video prediction using disentangled representations. We also carry out experiments to assess the effectiveness of our method in disentangling features. MSnet obtains better results than other recent video prediction methods even though it has simpler encoders.
Tasks Optical Flow Estimation, Representation Learning, Video Prediction
Published 2018-04-13
URL https://arxiv.org/abs/1804.04810v2
PDF https://arxiv.org/pdf/1804.04810v2.pdf
PWC https://paperswithcode.com/paper/msnet-mutual-suppression-network-for
Repo
Framework