May 7, 2019

2911 words 14 mins read

Paper Group AWR 9

Unsupervised Learning of Important Objects from First-Person Videos. Diachronic Word Embeddings Reveal Statistical Laws of Semantic Change. Hyperparameter optimization with approximate gradient. Learning Typographic Style. openXBOW - Introducing the Passau Open-Source Crossmodal Bag-of-Words Toolkit. Minimax Filter: Learning to Preserve Privacy fro …

Unsupervised Learning of Important Objects from First-Person Videos

Title Unsupervised Learning of Important Objects from First-Person Videos
Authors Gedas Bertasius, Hyun Soo Park, Stella X. Yu, Jianbo Shi
Abstract A first-person camera, placed at a person’s head, captures which objects are important to the camera wearer. Most prior methods for this task learn to detect such important objects from manually labeled first-person data in a supervised fashion. However, important objects are strongly related to the camera wearer’s internal state, such as their intentions and attention, and thus only the person wearing the camera can provide the importance labels. Such a constraint makes the annotation process costly and limits its scalability. In this work, we show that we can detect important objects in first-person images without supervision by the camera wearer or even by third-person labelers. We formulate the important object detection problem as an interplay between 1) a segmentation agent and 2) a recognition agent. The segmentation agent first proposes a possible important object segmentation mask for each image and then feeds it to the recognition agent, which learns to predict an important object mask using visual semantics and spatial features. We implement this interplay between the two agents via an alternating cross-pathway supervision scheme inside our proposed Visual-Spatial Network (VSN). Our VSN consists of spatial (“where”) and visual (“what”) pathways, one of which learns common visual semantics while the other focuses on spatial location cues. Our unsupervised learning is accomplished via cross-pathway supervision, where one pathway feeds its predictions to a segmentation agent, which proposes a candidate important object segmentation mask that is then used by the other pathway as a supervisory signal. We show our method’s success on two different important object datasets, where it achieves results similar to or better than those of the supervised methods.
Tasks Semantic Segmentation
Published 2016-11-16
URL http://arxiv.org/abs/1611.05335v3
PDF http://arxiv.org/pdf/1611.05335v3.pdf
PWC https://paperswithcode.com/paper/unsupervised-learning-of-important-objects
Repo https://github.com/gberta/Visual-Spatial-Network
Framework none
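
The cross-pathway scheme can be illustrated with a toy alternating-training loop: one pathway's thresholded prediction serves as the pseudo-mask that supervises the other. Below is a minimal PyTorch sketch with placeholder networks and random frames, not the authors' released code.

```python
# A minimal sketch of alternating cross-pathway supervision on toy data.
import torch
import torch.nn as nn

def tiny_pathway():
    # stand-in for the "visual" / "spatial" pathways in the paper
    return nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
                         nn.Conv2d(8, 1, 3, padding=1))

visual, spatial = tiny_pathway(), tiny_pathway()
opt_v = torch.optim.Adam(visual.parameters(), lr=1e-3)
opt_s = torch.optim.Adam(spatial.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()

images = torch.rand(4, 3, 64, 64)          # unlabeled first-person frames (toy)

for step in range(10):
    teacher, student, opt = ((visual, spatial, opt_s) if step % 2 == 0
                             else (spatial, visual, opt_v))
    with torch.no_grad():
        # "segmentation agent": turn the teacher's prediction into a candidate mask
        pseudo_mask = (torch.sigmoid(teacher(images)) > 0.5).float()
    loss = bce(student(images), pseudo_mask)   # cross-pathway supervisory signal
    opt.zero_grad()
    loss.backward()
    opt.step()
```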

Diachronic Word Embeddings Reveal Statistical Laws of Semantic Change

Title Diachronic Word Embeddings Reveal Statistical Laws of Semantic Change
Authors William L. Hamilton, Jure Leskovec, Dan Jurafsky
Abstract Understanding how words change their meanings over time is key to models of language and cultural evolution, but historical data on meaning is scarce, making theories hard to develop and test. Word embeddings show promise as a diachronic tool, but have not been carefully evaluated. We develop a robust methodology for quantifying semantic change by evaluating word embeddings (PPMI, SVD, word2vec) against known historical changes. We then use this methodology to reveal statistical laws of semantic evolution. Using six historical corpora spanning four languages and two centuries, we propose two quantitative laws of semantic change: (i) the law of conformity—the rate of semantic change scales with an inverse power-law of word frequency; (ii) the law of innovation—independent of frequency, words that are more polysemous have higher rates of semantic change.
Tasks Word Embeddings
Published 2016-05-30
URL http://arxiv.org/abs/1605.09096v6
PDF http://arxiv.org/pdf/1605.09096v6.pdf
PWC https://paperswithcode.com/paper/diachronic-word-embeddings-reveal-statistical
Repo https://github.com/network-embeddings/temporal_embedding_matching
Framework none
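
The core measurement behind these laws can be reproduced in a few lines: embeddings trained on two time periods are aligned with orthogonal Procrustes, and a word's semantic change is its cosine distance across the aligned spaces. The sketch below uses random matrices as stand-ins for embeddings trained on two historical corpora.

```python
# Align two embedding spaces and measure per-word semantic change.
import numpy as np
from scipy.linalg import orthogonal_procrustes

rng = np.random.default_rng(0)
vocab = ["gay", "broadcast", "cell"]
E_1900 = rng.normal(size=(len(vocab), 50))   # embeddings from the earlier corpus (toy)
E_1990 = rng.normal(size=(len(vocab), 50))   # embeddings from the later corpus (toy)

R, _ = orthogonal_procrustes(E_1900, E_1990)  # rotation aligning the 1900 space onto 1990
E_1900_aligned = E_1900 @ R

def cosine_distance(u, v):
    return 1.0 - u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

for i, w in enumerate(vocab):
    print(w, cosine_distance(E_1900_aligned[i], E_1990[i]))
```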

Hyperparameter optimization with approximate gradient

Title Hyperparameter optimization with approximate gradient
Authors Fabian Pedregosa
Abstract Most models in machine learning contain at least one hyperparameter to control for model complexity. Choosing an appropriate set of hyperparameters is both crucial in terms of model accuracy and computationally challenging. In this work we propose an algorithm for the optimization of continuous hyperparameters using inexact gradient information. An advantage of this method is that hyperparameters can be updated before model parameters have fully converged. We also give sufficient conditions for the global convergence of this method, based on regularity conditions of the involved functions and summability of errors. Finally, we validate the empirical performance of this method on the estimation of regularization constants of L2-regularized logistic regression and kernel Ridge regression. Empirical benchmarks indicate that our approach is highly competitive with respect to state of the art methods.
Tasks Hyperparameter Optimization
Published 2016-02-07
URL http://arxiv.org/abs/1602.02355v5
PDF http://arxiv.org/pdf/1602.02355v5.pdf
PWC https://paperswithcode.com/paper/hyperparameter-optimization-with-approximate
Repo https://github.com/fabianp/hoag
Framework none
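
The idea can be illustrated on ridge regression, where the hypergradient of the validation loss with respect to the regularization constant follows from implicit differentiation; HOAG's contribution is tolerating approximate inner solutions and linear-system solves, which this toy sketch does not model.

```python
# Gradient steps on the ridge regularization constant via implicit differentiation.
import numpy as np

rng = np.random.default_rng(0)
X_tr, y_tr = rng.normal(size=(80, 10)), rng.normal(size=80)
X_va, y_va = rng.normal(size=(40, 10)), rng.normal(size=40)

lam, lr = 1.0, 0.05
for it in range(50):
    H = X_tr.T @ X_tr + lam * np.eye(10)
    w = np.linalg.solve(H, X_tr.T @ y_tr)      # inner problem (here solved exactly)
    g_w = X_va.T @ (X_va @ w - y_va)           # gradient of the validation loss in w
    q = np.linalg.solve(H, g_w)                # linear system from implicit differentiation
    hypergrad = -w @ q                         # d(validation loss) / d(lambda)
    lam = max(1e-6, lam - lr * hypergrad)      # gradient step on the hyperparameter
print("lambda:", lam)
```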

Learning Typographic Style

Title Learning Typographic Style
Authors Shumeet Baluja
Abstract Typography is a ubiquitous art form that affects our understanding, perception, and trust in what we read. Thousands of different font-faces have been created with enormous variations in the characters. In this paper, we learn the style of a font by analyzing a small subset of only four letters. From these four letters, we learn two tasks. The first is a discrimination task: given the four letters and a new candidate letter, does the new letter belong to the same font? Second, given the four basis letters, can we generate all of the other letters with the same characteristics as those in the basis set? We use deep neural networks to address both tasks, quantitatively and qualitatively measure the results in a variety of novel manners, and present a thorough investigation of the weaknesses and strengths of the approach.
Tasks
Published 2016-03-13
URL http://arxiv.org/abs/1603.04000v1
PDF http://arxiv.org/pdf/1603.04000v1.pdf
PWC https://paperswithcode.com/paper/learning-typographic-style
Repo https://github.com/kaonashi-tyc/Rewrite
Framework tf
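
As a rough illustration of the discrimination task, a network can take the four basis glyphs and a candidate glyph stacked as channels and output a same-font probability; the architecture and sizes below are placeholders, not the paper's network.

```python
# Toy same-font discriminator over four basis glyphs plus one candidate glyph.
import torch
import torch.nn as nn

discriminator = nn.Sequential(
    nn.Conv2d(5, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(32 * 16 * 16, 1),          # logit: same font or not
)

basis = torch.rand(8, 4, 64, 64)         # four basis letters per example (toy data)
candidate = torch.rand(8, 1, 64, 64)     # candidate letter to judge
logits = discriminator(torch.cat([basis, candidate], dim=1))
print(torch.sigmoid(logits).shape)       # (8, 1) same-font probabilities
```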

openXBOW - Introducing the Passau Open-Source Crossmodal Bag-of-Words Toolkit

Title openXBOW - Introducing the Passau Open-Source Crossmodal Bag-of-Words Toolkit
Authors Maximilian Schmitt, Björn W. Schuller
Abstract We introduce openXBOW, an open-source toolkit for the generation of bag-of-words (BoW) representations from multimodal input. In the BoW principle, word histograms were first used as features in document classification, but the idea has been, and can easily be, adapted to, e.g., acoustic or visual low-level descriptors by introducing a prior step of vector quantisation. The openXBOW toolkit supports arbitrary numeric input features as well as text input and concatenates the computed sub-bags into a final bag. It provides a variety of extensions and options. To our knowledge, openXBOW is the first publicly available toolkit for the generation of crossmodal bags-of-words. The capabilities of the tool are exemplified in two sample scenarios: time-continuous speech-based emotion recognition and sentiment analysis in tweets, where improved results over other feature representation forms were observed.
Tasks Document Classification, Emotion Recognition, Sentiment Analysis
Published 2016-05-22
URL http://arxiv.org/abs/1605.06778v1
PDF http://arxiv.org/pdf/1605.06778v1.pdf
PWC https://paperswithcode.com/paper/openxbow-introducing-the-passau-open-source
Repo https://github.com/openXBOW/openXBOW
Framework tf
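
The toolkit itself is a command-line tool, but the crossmodal bag-of-words principle it implements can be sketched with a k-means codebook: quantise frame-level descriptors and count codeword assignments per instance.

```python
# Bag-of-words over numeric low-level descriptors via vector quantisation (toy data).
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
frames = rng.normal(size=(500, 13))        # e.g. MFCC frames pooled from many utterances
codebook = KMeans(n_clusters=20, n_init=10, random_state=0).fit(frames)

utterance = rng.normal(size=(120, 13))     # frames of one instance
assignments = codebook.predict(utterance)
bow = np.bincount(assignments, minlength=20).astype(float)
bow /= bow.sum()                           # normalised term-frequency histogram
print(bow)
```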

Minimax Filter: Learning to Preserve Privacy from Inference Attacks

Title Minimax Filter: Learning to Preserve Privacy from Inference Attacks
Authors Jihun Hamm
Abstract Preserving the privacy of continuous and/or high-dimensional data such as images, videos, and audio can be challenging with syntactic anonymization methods, which are designed for discrete attributes. Differential privacy, which provides a more formal definition of privacy, has shown more success in sanitizing continuous data. However, both syntactic and differential privacy are susceptible to inference attacks, i.e., an adversary can accurately infer sensitive attributes from sanitized data. The paper proposes a novel filter-based mechanism which preserves the privacy of continuous and high-dimensional attributes against inference attacks. Finding the optimal utility-privacy tradeoff is formulated as a min-diff-max optimization problem. The paper provides an ERM-like analysis of the generalization error as well as a practical algorithm to perform the optimization. In addition, the paper proposes an extension that combines the minimax filter with a differentially private noisy mechanism. Advantages of the method over purely noisy mechanisms are explained and demonstrated with examples. Experiments on several real-world tasks, including facial expression classification, speech emotion classification, and activity classification from motion, show that the minimax filter can simultaneously achieve similar or better target task accuracy and lower inference accuracy, often significantly lower than that of previous methods.
Tasks Emotion Classification
Published 2016-10-12
URL http://arxiv.org/abs/1610.03577v3
PDF http://arxiv.org/pdf/1610.03577v3.pdf
PWC https://paperswithcode.com/paper/minimax-filter-learning-to-preserve-privacy
Repo https://github.com/jihunhamm/MinimaxFilter
Framework none
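
A minimal sketch of the minimax training loop, assuming toy linear networks: the filter and target classifier are updated to keep utility high while making an adversary's inference of the private attribute fail, and the adversary is alternately refit on the filtered features.

```python
# Alternating min-diff-max training of a privacy filter against an inference attacker.
import torch
import torch.nn as nn

d, k = 20, 8
filt = nn.Linear(d, k)                    # the learned privacy filter
target = nn.Linear(k, 2)                  # utility task (e.g. expression)
adversary = nn.Linear(k, 2)               # inference attack (e.g. identity)
ce = nn.CrossEntropyLoss()
opt_f = torch.optim.Adam(list(filt.parameters()) + list(target.parameters()), lr=1e-2)
opt_a = torch.optim.Adam(adversary.parameters(), lr=1e-2)

x = torch.randn(64, d)
y_util = torch.randint(0, 2, (64,))
y_priv = torch.randint(0, 2, (64,))
rho = 1.0                                 # utility/privacy trade-off weight

for step in range(100):
    # adversary step: best response on the current filtered features
    opt_a.zero_grad()
    ce(adversary(filt(x).detach()), y_priv).backward()
    opt_a.step()
    # filter/target step: preserve utility, degrade the adversary
    opt_f.zero_grad()
    z = filt(x)
    loss = ce(target(z), y_util) - rho * ce(adversary(z), y_priv)
    loss.backward()
    opt_f.step()
```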

GAP Safe Screening Rules for Sparse-Group-Lasso

Title GAP Safe Screening Rules for Sparse-Group-Lasso
Authors Eugene Ndiaye, Olivier Fercoq, Alexandre Gramfort, Joseph Salmon
Abstract In high-dimensional settings, sparse structures are crucial for efficiency, whether in terms of memory, computation, or performance. In some contexts, it is natural to handle more refined structures than pure sparsity, such as group sparsity. The Sparse-Group Lasso has recently been introduced in the context of linear regression to enforce sparsity both at the feature level and at the group level. We adapt to the Sparse-Group Lasso recent safe screening rules that discard irrelevant features/groups early in the solver. Such rules have led to important speed-ups for a wide range of iterative methods. Thanks to dual gap computations, we provide new safe screening rules for the Sparse-Group Lasso and show significant gains in terms of computing time for a coordinate descent implementation.
Tasks
Published 2016-02-19
URL http://arxiv.org/abs/1602.06225v1
PDF http://arxiv.org/pdf/1602.06225v1.pdf
PWC https://paperswithcode.com/paper/gap-safe-screening-rules-for-sparse-group
Repo https://github.com/EugeneNdiaye/GAPSAFE_SGL
Framework none
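
For brevity, here is a sketch of a GAP safe test for the plain Lasso rather than the Sparse-Group Lasso: a dual feasible point is built by rescaling the residual, the duality gap gives a safe sphere radius, and any feature whose correlation bound stays below one can be discarded.

```python
# GAP safe screening test, written for the plain Lasso as an illustration.
import numpy as np

def gap_safe_screen(X, y, w, lam):
    """Return a boolean mask of features that can be safely discarded."""
    residual = y - X @ w
    # dual feasible point obtained by rescaling the residual
    theta = residual / max(lam, np.max(np.abs(X.T @ residual)))
    primal = 0.5 * residual @ residual + lam * np.abs(w).sum()
    dual = 0.5 * y @ y - 0.5 * np.sum((y - lam * theta) ** 2)
    gap = max(primal - dual, 0.0)
    radius = np.sqrt(2.0 * gap) / lam
    # feature j is inactive at the optimum if |x_j' theta| + r * ||x_j|| < 1
    return np.abs(X.T @ theta) + radius * np.linalg.norm(X, axis=0) < 1.0

rng = np.random.default_rng(0)
X, y = rng.normal(size=(50, 100)), rng.normal(size=50)
print(gap_safe_screen(X, y, np.zeros(100), lam=5.0).sum(), "features screened out")
```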

Efficient softmax approximation for GPUs

Title Efficient softmax approximation for GPUs
Authors Edouard Grave, Armand Joulin, Moustapha Cissé, David Grangier, Hervé Jégou
Abstract We propose an approximate strategy to efficiently train neural network based language models over very large vocabularies. Our approach, called adaptive softmax, circumvents the linear dependency on the vocabulary size by exploiting the unbalanced word distribution to form clusters that explicitly minimize the expectation of computation time. Our approach further reduces the computational time by exploiting the specificities of modern architectures and matrix-matrix vector operations, making it particularly suited for graphical processing units. Our experiments carried out on standard benchmarks, such as EuroParl and One Billion Word, show that our approach brings a large gain in efficiency over standard approximations while achieving an accuracy close to that of the full softmax. The code of our method is available at https://github.com/facebookresearch/adaptive-softmax.
Tasks
Published 2016-09-14
URL http://arxiv.org/abs/1609.04309v3
PDF http://arxiv.org/pdf/1609.04309v3.pdf
PWC https://paperswithcode.com/paper/efficient-softmax-approximation-for-gpus
Repo https://github.com/simon555/LM_word
Framework pytorch
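
PyTorch ships this technique as nn.AdaptiveLogSoftmaxWithLoss; a minimal usage sketch with made-up sizes follows. Word ids must be sorted by decreasing frequency so that the cutoffs split the vocabulary into a frequent head and rarer tail clusters.

```python
# Using PyTorch's built-in adaptive softmax on toy language-model hidden states.
import torch
import torch.nn as nn

hidden, vocab = 256, 50000
adaptive = nn.AdaptiveLogSoftmaxWithLoss(
    in_features=hidden,
    n_classes=vocab,
    cutoffs=[2000, 10000],      # head plus two tail clusters, chosen by word frequency
    div_value=4.0,              # shrink the projection size for rarer clusters
)

h = torch.randn(32, hidden)                 # hidden states from a language model (toy)
targets = torch.randint(0, vocab, (32,))    # next-word ids (frequency-sorted vocabulary)
out = adaptive(h, targets)
print(out.loss)                             # mean negative log-likelihood
log_probs = adaptive.log_prob(h)            # full (32, vocab) log-probabilities if needed
```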

Robust Audio Event Recognition with 1-Max Pooling Convolutional Neural Networks

Title Robust Audio Event Recognition with 1-Max Pooling Convolutional Neural Networks
Authors Huy Phan, Lars Hertel, Marco Maass, Alfred Mertins
Abstract We present in this paper a simple, yet efficient, convolutional neural network (CNN) architecture for robust audio event recognition. In contrast to deep CNN architectures with multiple convolutional and pooling layers topped by multiple fully connected layers, the proposed network consists of only three layers: a convolutional, a pooling, and a softmax layer. Two further features distinguish it from the deep architectures that have been proposed for the task: varying-size convolutional filters at the convolutional layer and a 1-max pooling scheme at the pooling layer. Intuitively, the network tends to select the most discriminative features from the whole audio signal for recognition. Our proposed CNN not only shows state-of-the-art performance on the standard task of robust audio event recognition but also outperforms other deep architectures by up to 4.5% in terms of recognition accuracy, which is equivalent to a 76.3% relative error reduction.
Tasks
Published 2016-04-21
URL http://arxiv.org/abs/1604.06338v2
PDF http://arxiv.org/pdf/1604.06338v2.pdf
PWC https://paperswithcode.com/paper/robust-audio-event-recognition-with-1-max
Repo https://github.com/9552nZ/SmartSheetMusic
Framework none
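
The three-layer architecture is easy to sketch: parallel 1-D convolutions with varying filter widths, 1-max pooling over time for each filter, and a softmax classifier on the pooled activations. Filter widths and counts below are placeholders.

```python
# Varying-width 1-D convolutions with 1-max pooling for audio event recognition.
import torch
import torch.nn as nn

class OneMaxCNN(nn.Module):
    def __init__(self, n_classes=10, widths=(20, 40, 60), n_filters=32):
        super().__init__()
        self.convs = nn.ModuleList(nn.Conv1d(1, n_filters, w) for w in widths)
        self.fc = nn.Linear(n_filters * len(widths), n_classes)

    def forward(self, x):                      # x: (batch, 1, samples)
        # 1-max pooling: keep only each filter's strongest response over time
        feats = [torch.relu(c(x)).max(dim=2).values for c in self.convs]
        return self.fc(torch.cat(feats, dim=1))

model = OneMaxCNN()
logits = model(torch.randn(4, 1, 4000))        # 4 toy audio clips
print(logits.shape)                            # (4, 10) class logits for the softmax
```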

BattRAE: Bidimensional Attention-Based Recursive Autoencoders for Learning Bilingual Phrase Embeddings

Title BattRAE: Bidimensional Attention-Based Recursive Autoencoders for Learning Bilingual Phrase Embeddings
Authors Biao Zhang, Deyi Xiong, Jinsong Su
Abstract In this paper, we propose a bidimensional attention-based recursive autoencoder (BattRAE) to integrate clues and source-target interactions at multiple levels of granularity into bilingual phrase representations. We employ recursive autoencoders to generate tree structures of phrases with embeddings at different levels of granularity (e.g., words, sub-phrases, and phrases). Over these embeddings on the source and target sides, we introduce a bidimensional attention network to learn their interactions, encoded in a bidimensional attention matrix from which we extract two soft attention weight distributions simultaneously. These weight distributions enable BattRAE to generate composite phrase representations via convolution. Based on the learned phrase representations, we further use a bilinear neural model, trained via a max-margin method, to measure bilingual semantic similarity. To evaluate the effectiveness of BattRAE, we incorporate this semantic similarity as an additional feature into a state-of-the-art SMT system. Extensive experiments on NIST Chinese-English test sets show that our model achieves a substantial improvement of up to 1.63 BLEU points on average over the baseline.
Tasks Semantic Similarity, Semantic Textual Similarity
Published 2016-05-25
URL http://arxiv.org/abs/1605.07874v2
PDF http://arxiv.org/pdf/1605.07874v2.pdf
PWC https://paperswithcode.com/paper/battrae-bidimensional-attention-based
Repo https://github.com/DeepLearnXMU/BattRAE
Framework none
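
Here is a rough sketch of the bidimensional attention step only, with an assumed bilinear scoring form: every pair of source and target node embeddings is scored, and row/column sums of the attention matrix give one soft weight distribution per side, which in turn yield attention-weighted phrase representations.

```python
# Toy bidimensional attention over source/target node embeddings.
import torch

m, n, d = 5, 7, 32                       # source nodes, target nodes, embedding size
S = torch.randn(m, d)                    # source-side embeddings (words/sub-phrases/phrase)
T = torch.randn(n, d)                    # target-side embeddings
W = torch.randn(d, d)                    # stand-in for the learned attention parameters

A = torch.tanh(S @ W @ T.T)              # bidimensional attention matrix, shape (m, n)
src_weights = torch.softmax(A.sum(dim=1), dim=0)   # soft attention over source nodes
tgt_weights = torch.softmax(A.sum(dim=0), dim=0)   # soft attention over target nodes

src_repr = src_weights @ S               # attention-weighted source phrase representation
tgt_repr = tgt_weights @ T               # attention-weighted target phrase representation
similarity = src_repr @ torch.randn(d, d) @ tgt_repr   # bilinear score (random stand-in)
print(similarity)
```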

Unsupervised Learning for Computational Phenotyping

Title Unsupervised Learning for Computational Phenotyping
Authors Chris Hodapp
Abstract With large volumes of health care data comes the research area of computational phenotyping, which uses techniques such as machine learning to describe illnesses and other clinical concepts from the data itself. The “traditional” approach of using supervised learning relies on a domain expert and has two main limitations: requiring skilled humans to supply correct labels limits its scalability and accuracy, and relying on existing clinical descriptions limits the sorts of patterns that can be found. For instance, it may fail to acknowledge that a disease treated as a single condition may really have several subtypes with different phenotypes, as seems to be the case with asthma and heart disease. Some recent papers instead cite successes using unsupervised learning. This shows great potential for finding patterns in electronic health records that would otherwise be hidden and that can lead to a greater understanding of conditions and treatments. This work implements a method derived largely from Lasko et al., reimplementing it in Apache Spark and Python and generalizing it to laboratory time-series data in MIMIC-III. It is released as an open-source tool for exploration, analysis, and visualization, available at https://github.com/Hodapp87/mimic3_phenotyping
Tasks Computational Phenotyping, Time Series
Published 2016-12-26
URL http://arxiv.org/abs/1612.08425v2
PDF http://arxiv.org/pdf/1612.08425v2.pdf
PWC https://paperswithcode.com/paper/unsupervised-learning-for-computational
Repo https://github.com/Hodapp87/mimic3_phenotyping
Framework tf
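
One building block such pipelines rely on is smoothing an irregularly sampled laboratory time series with a Gaussian process so it can be resampled on a regular grid; the sketch below uses scikit-learn on toy data rather than Spark on MIMIC-III.

```python
# GP smoothing of an irregularly sampled lab value onto a regular time grid (toy data).
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)
t_obs = np.sort(rng.uniform(0, 30, size=15))        # days with a lab measurement
y_obs = np.sin(t_obs / 5.0) + 0.1 * rng.normal(size=15)

gp = GaussianProcessRegressor(kernel=RBF(length_scale=5.0) + WhiteKernel(0.01))
gp.fit(t_obs[:, None], y_obs)

t_grid = np.linspace(0, 30, 61)                     # regular 12-hour grid
y_grid, y_std = gp.predict(t_grid[:, None], return_std=True)
print(y_grid[:5], y_std[:5])                        # smoothed values and uncertainty
```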

Learning a Discriminative Filter Bank within a CNN for Fine-grained Recognition

Title Learning a Discriminative Filter Bank within a CNN for Fine-grained Recognition
Authors Yaming Wang, Vlad I. Morariu, Larry S. Davis
Abstract Compared to earlier multistage frameworks using CNN features, recent end-to-end deep approaches for fine-grained recognition essentially enhance the mid-level learning capability of CNNs. Previous approaches achieve this by introducing an auxiliary network to infuse localization information into the main classification network, or a sophisticated feature encoding method to capture higher order feature statistics. We show that mid-level representation learning can be enhanced within the CNN framework, by learning a bank of convolutional filters that capture class-specific discriminative patches without extra part or bounding box annotations. Such a filter bank is well structured, properly initialized and discriminatively learned through a novel asymmetric multi-stream architecture with convolutional filter supervision and a non-random layer initialization. Experimental results show that our approach achieves state-of-the-art on three publicly available fine-grained recognition datasets (CUB-200-2011, Stanford Cars and FGVC-Aircraft). Ablation studies and visualizations are provided to understand our approach.
Tasks Representation Learning
Published 2016-11-29
URL http://arxiv.org/abs/1611.09932v3
PDF http://arxiv.org/pdf/1611.09932v3.pdf
PWC https://paperswithcode.com/paper/learning-a-discriminative-filter-bank-within
Repo https://github.com/jobinkv/Ongoing
Framework none
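
The filter-bank idea reduces to a 1x1 convolution over mid-level feature maps followed by global max pooling, so each filter contributes its strongest patch response as a feature. The sketch below uses a placeholder backbone and omits the paper's asymmetric streams and filter supervision.

```python
# 1x1 convolutional filter bank with global max pooling over patch responses.
import torch
import torch.nn as nn

n_classes, filters_per_class = 10, 4
backbone = nn.Sequential(nn.Conv2d(3, 64, 3, padding=1), nn.ReLU())   # stand-in backbone
filter_bank = nn.Conv2d(64, n_classes * filters_per_class, kernel_size=1)
classifier = nn.Linear(n_classes * filters_per_class, n_classes)

x = torch.randn(2, 3, 56, 56)
responses = filter_bank(backbone(x))                   # per-filter patch responses
peaks = responses.amax(dim=(2, 3))                     # global max pooling per filter
logits = classifier(peaks)
print(logits.shape)                                    # (2, 10)
```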

DoReFa-Net: Training Low Bitwidth Convolutional Neural Networks with Low Bitwidth Gradients

Title DoReFa-Net: Training Low Bitwidth Convolutional Neural Networks with Low Bitwidth Gradients
Authors Shuchang Zhou, Yuxin Wu, Zekun Ni, Xinyu Zhou, He Wen, Yuheng Zou
Abstract We propose DoReFa-Net, a method to train convolutional neural networks that have low bitwidth weights and activations using low bitwidth parameter gradients. In particular, during backward pass, parameter gradients are stochastically quantized to low bitwidth numbers before being propagated to convolutional layers. As convolutions during forward/backward passes can now operate on low bitwidth weights and activations/gradients respectively, DoReFa-Net can use bit convolution kernels to accelerate both training and inference. Moreover, as bit convolutions can be efficiently implemented on CPU, FPGA, ASIC and GPU, DoReFa-Net opens the way to accelerate training of low bitwidth neural network on these hardware. Our experiments on SVHN and ImageNet datasets prove that DoReFa-Net can achieve comparable prediction accuracy as 32-bit counterparts. For example, a DoReFa-Net derived from AlexNet that has 1-bit weights, 2-bit activations, can be trained from scratch using 6-bit gradients to get 46.1% top-1 accuracy on ImageNet validation set. The DoReFa-Net AlexNet model is released publicly.
Tasks
Published 2016-06-20
URL http://arxiv.org/abs/1606.06160v3
PDF http://arxiv.org/pdf/1606.06160v3.pdf
PWC https://paperswithcode.com/paper/dorefa-net-training-low-bitwidth
Repo https://github.com/hpi-xnor/BMXNet
Framework mxnet
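
The forward-pass quantisers are short enough to write out: a k-bit uniform quantiser on [0, 1], applied to tanh-normalised weights and to clipped activations. The straight-through estimator and the stochastic gradient quantisation used during the backward pass are omitted from this sketch.

```python
# DoReFa-style k-bit quantisation of weights and activations (forward pass only).
import torch

def quantize_k(x, k):
    """Quantise x in [0, 1] to k bits."""
    n = 2 ** k - 1
    return torch.round(x * n) / n

def quantize_weights(w, k):
    """Map weights to [0, 1] via tanh normalisation, quantise, rescale to [-1, 1]."""
    t = torch.tanh(w)
    x = t / (2 * t.abs().max()) + 0.5
    return 2 * quantize_k(x, k) - 1

def quantize_activations(a, k):
    """Clip activations to [0, 1], then quantise."""
    return quantize_k(torch.clamp(a, 0, 1), k)

w = torch.randn(5)
print(quantize_weights(w, k=1))               # 1-bit weights in {-1, +1}
print(quantize_activations(torch.rand(5), k=2))
```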

Do Deep Convolutional Nets Really Need to be Deep and Convolutional?

Title Do Deep Convolutional Nets Really Need to be Deep and Convolutional?
Authors Gregor Urban, Krzysztof J. Geras, Samira Ebrahimi Kahou, Ozlem Aslan, Shengjie Wang, Rich Caruana, Abdelrahman Mohamed, Matthai Philipose, Matt Richardson
Abstract Yes, they do. This paper provides the first empirical demonstration that deep convolutional models really need to be both deep and convolutional, even when trained with methods such as distillation that allow small or shallow models of high accuracy to be trained. Although previous research showed that shallow feed-forward nets sometimes can learn the complex functions previously learned by deep nets while using the same number of parameters as the deep models they mimic, in this paper we demonstrate that the same methods cannot be used to train accurate models on CIFAR-10 unless the student models contain multiple layers of convolution. Although the student models do not have to be as deep as the teacher model they mimic, the students need multiple convolutional layers to learn functions of comparable accuracy as the deep convolutional teacher.
Tasks
Published 2016-03-17
URL http://arxiv.org/abs/1603.05691v4
PDF http://arxiv.org/pdf/1603.05691v4.pdf
PWC https://paperswithcode.com/paper/do-deep-convolutional-nets-really-need-to-be
Repo https://github.com/LeonardoGracioS/DD2424_Deep_Learning_Project
Framework none
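
The mimic-learning setup used in these experiments can be sketched as a student regressing the teacher's logits with an L2 loss instead of training on hard labels; the teacher and student below are toy stand-ins rather than the paper's models.

```python
# One distillation step: the student mimics the teacher's logits on a batch.
import torch
import torch.nn as nn

teacher = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 256), nn.ReLU(),
                        nn.Linear(256, 10))            # would be pretrained in practice
student = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 64), nn.ReLU(),
                        nn.Linear(64, 10))             # smaller/shallower mimic model
opt = torch.optim.SGD(student.parameters(), lr=1e-3)

x = torch.randn(32, 3, 32, 32)                         # toy CIFAR-10-like batch
with torch.no_grad():
    target_logits = teacher(x)                         # soft targets from the teacher
loss = nn.functional.mse_loss(student(x), target_logits)
loss.backward()
opt.step()
```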

Song Recommendation with Non-Negative Matrix Factorization and Graph Total Variation

Title Song Recommendation with Non-Negative Matrix Factorization and Graph Total Variation
Authors Kirell Benzi, Vassilis Kalofolias, Xavier Bresson, Pierre Vandergheynst
Abstract This work formulates a novel song recommender system as a matrix completion problem that benefits from collaborative filtering through Non-negative Matrix Factorization (NMF) and content-based filtering via total variation (TV) on graphs. The graphs encode both playlist proximity information and song similarity, using a rich combination of audio, metadata, and social features. As we demonstrate, our hybrid recommendation system is very versatile and incorporates several well-known methods while outperforming them. In particular, we show on real-world data that, with respect to two evaluation metrics, our model outperforms recommendations based solely on low-rank information, on graph-based information, or on a combination of both.
Tasks Matrix Completion, Recommendation Systems
Published 2016-01-08
URL http://arxiv.org/abs/1601.01892v2
PDF http://arxiv.org/pdf/1601.01892v2.pdf
PWC https://paperswithcode.com/paper/song-recommendation-with-non-negative-matrix
Repo https://github.com/kikohs/recog
Framework none
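
The collaborative-filtering half of the model is plain non-negative matrix factorisation of the playlist-song matrix; the graph total-variation regularisation on the factors, which is the paper's main addition, is omitted in this sketch.

```python
# Low-rank completion of a toy playlist-song play matrix with NMF.
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(0)
plays = rng.poisson(0.3, size=(100, 400)).astype(float)   # playlists x songs (toy counts)

nmf = NMF(n_components=15, init="nndsvda", max_iter=500, random_state=0)
W = nmf.fit_transform(plays)        # playlist factors
H = nmf.components_                 # song factors
scores = W @ H                      # completed matrix: predicted affinity for unseen songs
print(scores.shape)
```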