July 29, 2019

3072 words 15 mins read

Paper Group AWR 84

Subregular Complexity and Deep Learning. Tensor-Train Recurrent Neural Networks for Video Classification. Content-Based Video-Music Retrieval Using Soft Intra-Modal Structure Constraint. Convolutional Neural Networks for Facial Expression Recognition. In Defense of the Triplet Loss for Person Re-Identification. Truly Multi-modal YouTube-8M Video Cl …

Subregular Complexity and Deep Learning

Title Subregular Complexity and Deep Learning
Authors Enes Avcu, Chihiro Shibata, Jeffrey Heinz
Abstract This paper argues that the judicious use of formal language theory and grammatical inference are invaluable tools in understanding how deep neural networks can and cannot represent and learn long-term dependencies in temporal sequences. Learning experiments were conducted with two types of Recurrent Neural Networks (RNNs) on six formal languages drawn from the Strictly Local (SL) and Strictly Piecewise (SP) classes. The networks were Simple RNNs (s-RNNs) and Long Short-Term Memory RNNs (LSTMs) of varying sizes. The SL and SP classes are among the simplest in a mathematically well-understood hierarchy of subregular classes. They encode local and long-term dependencies, respectively. The grammatical inference algorithm Regular Positive and Negative Inference (RPNI) provided a baseline. According to earlier research, the LSTM architecture should be capable of learning long-term dependencies and should outperform s-RNNs. The results of these experiments challenge this narrative. First, the LSTMs’ performance was generally worse in the SP experiments than in the SL ones. Second, the s-RNNs outperformed the LSTMs on the most complex SP experiment and performed comparably to them on the others.
Tasks
Published 2017-05-16
URL http://arxiv.org/abs/1705.05940v3
PDF http://arxiv.org/pdf/1705.05940v3.pdf
PWC https://paperswithcode.com/paper/subregular-complexity-and-deep-learning
Repo https://github.com/enesavc/subreg_deeplearning
Framework none
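
As a hedged illustration of the language classes this paper studies (a toy sketch, not code from the authors' repository, and with made-up alphabet and constraints): Strictly Local grammars forbid certain contiguous substrings, while Strictly Piecewise grammars forbid certain subsequences, which is what makes SP languages a probe for long-distance dependencies.

```python
# Hypothetical illustration: SL grammars ban contiguous "factors", SP grammars
# ban subsequences that may occur at any distance.

def violates_sl(string, forbidden_factors):
    """True if any forbidden substring (local factor) occurs contiguously."""
    return any(f in string for f in forbidden_factors)

def violates_sp(string, forbidden_subsequences):
    """True if any forbidden subsequence occurs, possibly with gaps."""
    def is_subsequence(sub, s):
        it = iter(s)
        return all(ch in it for ch in sub)
    return any(is_subsequence(f, string) for f in forbidden_subsequences)

# Toy grammars over {a, b, c}: SL-2 bans the bigram "ab"; SP-2 bans "a ... b".
print(violates_sl("acb", ["ab"]))   # False: "ab" never appears contiguously
print(violates_sp("acb", ["ab"]))   # True: an "a" is eventually followed by a "b"
```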

Tensor-Train Recurrent Neural Networks for Video Classification

Title Tensor-Train Recurrent Neural Networks for Video Classification
Authors Yinchong Yang, Denis Krompass, Volker Tresp
Abstract The Recurrent Neural Networks and their variants have shown promising performance in sequence modeling tasks such as Natural Language Processing. These models, however, turn out to be impractical and difficult to train when exposed to very high-dimensional inputs due to the large input-to-hidden weight matrix. This may have prevented RNNs’ large-scale application in tasks that involve very high input dimensions such as video modeling; current approaches reduce the input dimensions using various feature extractors. To address this challenge, we propose a new, more general and efficient approach by factorizing the input-to-hidden weight matrix using Tensor-Train decomposition which is trained simultaneously with the weights themselves. We test our model on classification tasks using multiple real-world video datasets and achieve performance competitive with state-of-the-art models, even though our model architecture is orders of magnitude less complex. We believe that the proposed approach provides a novel and fundamental building block for modeling high-dimensional sequential data with RNN architectures and opens up many possibilities to transfer the expressive and advanced architectures from other domains such as NLP to modeling high-dimensional sequential data.
Tasks Video Classification
Published 2017-07-06
URL http://arxiv.org/abs/1707.01786v1
PDF http://arxiv.org/pdf/1707.01786v1.pdf
PWC https://paperswithcode.com/paper/tensor-train-recurrent-neural-networks-for
Repo https://github.com/Tuyki/TT_RNN
Framework tf
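
A minimal sketch of the core idea, assuming hypothetical shapes and ranks (this is not the authors' TT_RNN code): a large input-to-hidden weight matrix of shape (prod(m), prod(n)) is stored as a chain of small Tensor-Train cores, which cuts the parameter count dramatically while still allowing the full matrix to be reconstructed.

```python
import numpy as np

# Hypothetical factor shapes: W would be 256 x 64, the TT cores hold ~432 numbers.
m, n, ranks = [4, 8, 8], [4, 4, 4], [1, 3, 3, 1]
cores = [np.random.randn(ranks[k], m[k], n[k], ranks[k + 1])
         for k in range(len(m))]

def tt_to_matrix(cores, m, n):
    """Contract TT cores back into the full (prod(m), prod(n)) matrix."""
    full = cores[0]                                   # (r0, m0, n0, r1)
    for core in cores[1:]:
        # merge the trailing rank index of `full` with the leading rank of `core`
        full = np.tensordot(full, core, axes=([-1], [0]))
    full = full[0, ..., 0]                            # drop boundary ranks r0 = rK = 1
    # reorder (m0, n0, m1, n1, ...) -> (m0, m1, ..., n0, n1, ...)
    d = len(m)
    perm = list(range(0, 2 * d, 2)) + list(range(1, 2 * d, 2))
    return full.transpose(perm).reshape(int(np.prod(m)), int(np.prod(n)))

W = tt_to_matrix(cores, m, n)
full_params = int(np.prod(m)) * int(np.prod(n))
tt_params = sum(c.size for c in cores)
print(W.shape, full_params, tt_params)                # (256, 64) 16384 432
```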

Content-Based Video-Music Retrieval Using Soft Intra-Modal Structure Constraint

Title Content-Based Video-Music Retrieval Using Soft Intra-Modal Structure Constraint
Authors Sungeun Hong, Woobin Im, Hyun S. Yang
Abstract Up to now, only limited research has been conducted on cross-modal retrieval of suitable music for a specified video or vice versa. Moreover, much of the existing research relies on metadata such as keywords, tags, or associated description that must be individually produced and attached posterior. This paper introduces a new content-based, cross-modal retrieval method for video and music that is implemented through deep neural networks. We train the network via inter-modal ranking loss such that videos and music with similar semantics end up close together in the embedding space. However, if only the inter-modal ranking constraint is used for embedding, modality-specific characteristics can be lost. To address this problem, we propose a novel soft intra-modal structure loss that leverages the relative distance relationship between intra-modal samples before embedding. We also introduce reasonable quantitative and qualitative experimental protocols to solve the lack of standard protocols for less-mature video-music related tasks. Finally, we construct a large-scale 200K video-music pair benchmark. All the datasets and source code can be found in our online repository (https://github.com/csehong/VM-NET).
Tasks Cross-Modal Retrieval
Published 2017-04-22
URL http://arxiv.org/abs/1704.06761v2
PDF http://arxiv.org/pdf/1704.06761v2.pdf
PWC https://paperswithcode.com/paper/content-based-video-music-retrieval-using
Repo https://github.com/csehong/VM-NET
Framework tf
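
The abstract describes two loss terms; the sketch below is only an illustration of how they could look (the paper's exact formulation may differ): an inter-modal triplet ranking loss that pulls matched video-music pairs together, plus a "soft" intra-modal term that asks the embedding to preserve each modality's pre-embedding pairwise distance structure.

```python
import numpy as np

def l2norm(x):
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + 1e-12)

def inter_modal_ranking_loss(v_emb, m_emb, margin=0.2):
    """Matched pairs (row i of each) should be closer than mismatched ones."""
    v_emb, m_emb = l2norm(v_emb), l2norm(m_emb)
    sim = v_emb @ m_emb.T                       # cosine similarity matrix
    pos = np.diag(sim)[:, None]                 # similarity of true pairs
    cost = np.maximum(0.0, margin - pos + sim)  # hinge on every negative
    np.fill_diagonal(cost, 0.0)
    return cost.mean()

def soft_intra_modal_structure_loss(feat, emb):
    """Penalize disagreement between pre- and post-embedding pairwise distances."""
    def pdist(x):
        d = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
        return d / (d.max() + 1e-12)            # scale-free comparison
    return np.abs(pdist(feat) - pdist(emb)).mean()

rng = np.random.default_rng(0)
video_feat, music_feat = rng.normal(size=(8, 128)), rng.normal(size=(8, 64))
video_emb, music_emb = rng.normal(size=(8, 32)), rng.normal(size=(8, 32))
total = (inter_modal_ranking_loss(video_emb, music_emb)
         + soft_intra_modal_structure_loss(video_feat, video_emb)
         + soft_intra_modal_structure_loss(music_feat, music_emb))
print(total)
```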

Convolutional Neural Networks for Facial Expression Recognition

Title Convolutional Neural Networks for Facial Expression Recognition
Authors Shima Alizadeh, Azar Fazel
Abstract We have developed convolutional neural networks (CNN) for a facial expression recognition task. The goal is to classify each facial image into one of the seven facial emotion categories considered in this study. We trained CNN models with different depth using gray-scale images. We developed our models in Torch and exploited Graphics Processing Unit (GPU) computation in order to expedite the training process. In addition to the networks that operate on raw pixel data, we employed a hybrid feature strategy by which we trained a novel CNN model with the combination of raw pixel data and Histogram of Oriented Gradients (HOG) features. To reduce the overfitting of the models, we utilized different techniques including dropout and batch normalization in addition to L2 regularization. We applied cross-validation to determine the optimal hyper-parameters and evaluated the performance of the developed models by looking at their training histories. We also present the visualization of different layers of a network to show what features of a face can be learned by CNN models.
Tasks Facial Expression Recognition, L2 Regularization
Published 2017-04-22
URL http://arxiv.org/abs/1704.06756v1
PDF http://arxiv.org/pdf/1704.06756v1.pdf
PWC https://paperswithcode.com/paper/convolutional-neural-networks-for-facial
Repo https://github.com/jaydeepthik/kaggle-facial-expression-recognition
Framework tf

In Defense of the Triplet Loss for Person Re-Identification

Title In Defense of the Triplet Loss for Person Re-Identification
Authors Alexander Hermans, Lucas Beyer, Bastian Leibe
Abstract In the past few years, the field of computer vision has gone through a revolution fueled mainly by the advent of large datasets and the adoption of deep convolutional neural networks for end-to-end learning. The person re-identification subfield is no exception to this. Unfortunately, a prevailing belief in the community seems to be that the triplet loss is inferior to using surrogate losses (classification, verification) followed by a separate metric learning step. We show that, for models trained from scratch as well as pretrained ones, using a variant of the triplet loss to perform end-to-end deep metric learning outperforms most other published methods by a large margin.
Tasks Metric Learning, Person Re-Identification
Published 2017-03-22
URL http://arxiv.org/abs/1703.07737v4
PDF http://arxiv.org/pdf/1703.07737v4.pdf
PWC https://paperswithcode.com/paper/in-defense-of-the-triplet-loss-for-person-re
Repo https://github.com/AsuradaYuci/tripletreid-zhushi
Framework tf
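
One of the triplet-loss variants this line of work is known for is the "batch hard" formulation; the numpy sketch below (not the authors' code) shows the idea: for every anchor in a batch, mine the hardest positive (farthest sample of the same identity) and hardest negative (closest sample of another identity) and apply a hinge with a margin.

```python
import numpy as np

def batch_hard_triplet_loss(embeddings, labels, margin=0.3):
    # pairwise Euclidean distances within the batch
    dist = np.linalg.norm(embeddings[:, None, :] - embeddings[None, :, :], axis=-1)
    same = labels[:, None] == labels[None, :]
    hardest_pos = np.where(same, dist, -np.inf).max(axis=1)   # farthest positive
    hardest_neg = np.where(~same, dist, np.inf).min(axis=1)   # closest negative
    return np.maximum(0.0, margin + hardest_pos - hardest_neg).mean()

rng = np.random.default_rng(0)
emb = rng.normal(size=(8, 16))              # a PK-style batch: 4 identities x 2 images
ids = np.repeat(np.arange(4), 2)
print(batch_hard_triplet_loss(emb, ids))
```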

Truly Multi-modal YouTube-8M Video Classification with Video, Audio, and Text

Title Truly Multi-modal YouTube-8M Video Classification with Video, Audio, and Text
Authors Zhe Wang, Kingsley Kuan, Mathieu Ravaut, Gaurav Manek, Sibo Song, Yuan Fang, Seokhwan Kim, Nancy Chen, Luis Fernando D’Haro, Luu Anh Tuan, Hongyuan Zhu, Zeng Zeng, Ngai Man Cheung, Georgios Piliouras, Jie Lin, Vijay Chandrasekhar
Abstract The YouTube-8M video classification challenge requires teams to classify 0.7 million videos into one or more of 4,716 classes. In this Kaggle competition, we placed in the top 3% out of 650 participants using released video and audio features. Beyond that, we extend the original competition by including text information in the classification, making this a truly multi-modal approach with vision, audio and text. The newly introduced text data is termed YouTube-8M-Text. We present a classification framework for the joint use of text, visual and audio features, and conduct an extensive set of experiments to quantify the benefit that this additional modality brings. The inclusion of text yields state-of-the-art results, e.g. 86.7% GAP on the YouTube-8M-Text validation dataset.
Tasks Video Classification
Published 2017-06-17
URL http://arxiv.org/abs/1706.05461v3
PDF http://arxiv.org/pdf/1706.05461v3.pdf
PWC https://paperswithcode.com/paper/truly-multi-modal-youtube-8m-video
Repo https://github.com/hrx2010/YouTube8m-Text
Framework tf
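
The 86.7% figure refers to the challenge's Global Average Precision (GAP) metric. Below is a rough numpy sketch of GAP as I understand it (details such as the top-k cutoff are assumptions, not taken from this paper): pool the top-k predictions of every video, sort them globally by confidence, and compute average precision over that single ranked list.

```python
import numpy as np

def gap(predictions, labels, top_k=20):
    confidences, hits = [], []
    for scores, truth in zip(predictions, labels):
        top = np.argsort(scores)[::-1][:top_k]     # top-k classes for this video
        confidences.extend(scores[top])
        hits.extend(truth[top])
    order = np.argsort(confidences)[::-1]          # global ranking by confidence
    hits = np.asarray(hits, dtype=float)[order]
    precisions = np.cumsum(hits) / (np.arange(len(hits)) + 1)
    total_positives = max(1, int(sum(l.sum() for l in labels)))
    return float((precisions * hits).sum() / total_positives)

rng = np.random.default_rng(0)
preds = rng.random((5, 10))                        # 5 videos, 10 classes
labs = (rng.random((5, 10)) > 0.8).astype(float)   # multi-label ground truth
print(gap(preds, labs))
```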

NoReC: The Norwegian Review Corpus

Title NoReC: The Norwegian Review Corpus
Authors Erik Velldal, Lilja Øvrelid, Eivind Alexander Bergem, Cathrine Stadsnes, Samia Touileb, Fredrik Jørgensen
Abstract This paper presents the Norwegian Review Corpus (NoReC), created for training and evaluating models for document-level sentiment analysis. The full-text reviews have been collected from major Norwegian news sources and cover a range of different domains, including literature, movies, video games, restaurants, music and theater, in addition to product reviews across a range of categories. Each review is labeled with a manually assigned score of 1-6, as provided by the rating of the original author. This first release of the corpus comprises more than 35,000 reviews. It is distributed using the CoNLL-U format, pre-processed using UDPipe, along with a rich set of metadata. The work reported in this paper forms part of the SANT initiative (Sentiment Analysis for Norwegian Text), a project seeking to provide resources and tools for sentiment analysis and opinion mining for Norwegian. As resources for sentiment analysis have so far been unavailable for Norwegian, NoReC represents a highly valuable and sought-after addition to Norwegian language technology.
Tasks Opinion Mining, Sentiment Analysis
Published 2017-10-15
URL http://arxiv.org/abs/1710.05370v1
PDF http://arxiv.org/pdf/1710.05370v1.pdf
PWC https://paperswithcode.com/paper/norec-the-norwegian-review-corpus
Repo https://github.com/ltgoslo/norec
Framework none
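
Since the corpus is distributed in CoNLL-U format, a minimal reader is easy to sketch (the file name below is hypothetical): each non-comment line carries 10 tab-separated fields and sentences are separated by blank lines.

```python
CONLLU_FIELDS = ["id", "form", "lemma", "upos", "xpos",
                 "feats", "head", "deprel", "deps", "misc"]

def read_conllu(path):
    sentences, current = [], []
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            line = line.rstrip("\n")
            if not line:                       # blank line ends a sentence
                if current:
                    sentences.append(current)
                    current = []
            elif line.startswith("#"):         # sentence-level metadata/comment
                continue
            else:
                current.append(dict(zip(CONLLU_FIELDS, line.split("\t"))))
    if current:
        sentences.append(current)
    return sentences

# e.g. sents = read_conllu("norec_train.conllu"); print(sents[0][0]["form"])
```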

Style2Vec: Representation Learning for Fashion Items from Style Sets

Title Style2Vec: Representation Learning for Fashion Items from Style Sets
Authors Hanbit Lee, Jinseok Seol, Sang-goo Lee
Abstract With the rapid growth of online fashion market, demand for effective fashion recommendation systems has never been greater. In fashion recommendation, the ability to find items that go well with a few other items based on style is more important than picking a single item based on the user’s entire purchase history. Since the same user may have purchased dress suits in one month and casual denims in another, it is impossible to learn the latent style features of those items using only the user ratings. If we were able to represent the style features of fashion items in a reasonable way, we would be able to recommend new items that conform to some small subset of pre-purchased items that make up a coherent style set. We propose Style2Vec, a vector representation model for fashion items. Based on the intuition of distributional semantics used in word embeddings, Style2Vec learns the representation of a fashion item using other items in matching outfits as context. Two different convolutional neural networks are trained to maximize the probability of item co-occurrences. For evaluation, a fashion analogy test is conducted to show that the resulting representation connotes diverse fashion-related semantics like shapes, colors, patterns and even latent styles. We also perform style classification using Style2Vec features and show that our method outperforms other baselines.
Tasks Recommendation Systems, Representation Learning, Word Embeddings
Published 2017-08-14
URL http://arxiv.org/abs/1708.04014v1
PDF http://arxiv.org/pdf/1708.04014v1.pdf
PWC https://paperswithcode.com/paper/style2vec-representation-learning-for-fashion
Repo https://github.com/trhgu/awesome-fashion-contents
Framework none
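
A toy sketch of the training signal just described, with one big simplification: the item vectors below are free parameters rather than CNN outputs, and the outfits are made up. Items in the same outfit predict each other, randomly drawn items do not, in the style of skip-gram with negative sampling.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def outfit_loss(target_vecs, context_vecs, outfits, n_items, negatives=2, seed=0):
    rng = np.random.default_rng(seed)
    loss = 0.0
    for outfit in outfits:
        for i in outfit:
            for j in outfit:
                if i == j:
                    continue
                # co-occurring items should score high ...
                loss -= np.log(sigmoid(target_vecs[i] @ context_vecs[j]))
                # ... random items should score low
                for k in rng.integers(0, n_items, size=negatives):
                    loss -= np.log(sigmoid(-target_vecs[i] @ context_vecs[k]))
    return loss

n_items, dim = 20, 8
rng = np.random.default_rng(0)
target = rng.normal(scale=0.1, size=(n_items, dim))
context = rng.normal(scale=0.1, size=(n_items, dim))
outfits = [[0, 3, 7], [1, 4, 7], [2, 5, 6]]       # made-up "style sets"
print(outfit_loss(target, context, outfits, n_items))
```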

Clustering with t-SNE, provably

Title Clustering with t-SNE, provably
Authors George C. Linderman, Stefan Steinerberger
Abstract t-distributed Stochastic Neighborhood Embedding (t-SNE), a clustering and visualization method proposed by van der Maaten & Hinton in 2008, has rapidly become a standard tool in a number of natural sciences. Despite its overwhelming success, there is a distinct lack of mathematical foundations and the inner workings of the algorithm are not well understood. The purpose of this paper is to prove that t-SNE is able to recover well-separated clusters; more precisely, we prove that t-SNE in the ‘early exaggeration’ phase, an optimization technique proposed by van der Maaten & Hinton (2008) and van der Maaten (2014), can be rigorously analyzed. As a byproduct, the proof suggests novel ways for setting the exaggeration parameter $\alpha$ and step size $h$. Numerical examples illustrate the effectiveness of these rules: in particular, the quality of embedding of topological structures (e.g. the Swiss roll) improves. We also discuss a connection to spectral clustering methods.
Tasks
Published 2017-06-08
URL http://arxiv.org/abs/1706.02582v1
PDF http://arxiv.org/pdf/1706.02582v1.pdf
PWC https://paperswithcode.com/paper/clustering-with-t-sne-provably
Repo https://github.com/KlugerLab/t-SNE-Heatmaps
Framework none
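
To make the role of the exaggeration parameter $\alpha$ and step size $h$ concrete, here is a heavily simplified t-SNE loop (an illustrative sketch, not a faithful implementation and not the paper's analysis): the input similarities P are multiplied by alpha during the first iterations, strengthening attraction so clusters contract and separate before the ordinary optimization takes over.

```python
import numpy as np

def tsne_sketch(P, n_points, alpha=12.0, h=0.1, iters=300, exag_iters=100, seed=0):
    rng = np.random.default_rng(seed)
    Y = 1e-4 * rng.normal(size=(n_points, 2))
    for it in range(iters):
        P_eff = alpha * P if it < exag_iters else P        # early exaggeration
        diff = Y[:, None, :] - Y[None, :, :]
        dist2 = (diff ** 2).sum(-1)
        num = 1.0 / (1.0 + dist2)                          # Student-t kernel
        np.fill_diagonal(num, 0.0)
        Q = num / num.sum()
        # standard t-SNE gradient: 4 * sum_j (p_ij - q_ij) (y_i - y_j) / (1 + d_ij^2)
        grad = 4.0 * ((P_eff - Q)[:, :, None] * num[:, :, None] * diff).sum(axis=1)
        Y -= h * grad                                      # plain gradient step
    return Y

# Toy P: two blocks of points that are similar only within their own block.
P = np.kron(np.eye(2), np.ones((5, 5)))
np.fill_diagonal(P, 0.0)
P /= P.sum()
print(tsne_sketch(P, n_points=10))                          # the 2-D embedding
```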

Procedural Content Generation via Machine Learning (PCGML)

Title Procedural Content Generation via Machine Learning (PCGML)
Authors Adam Summerville, Sam Snodgrass, Matthew Guzdial, Christoffer Holmgård, Amy K. Hoover, Aaron Isaksen, Andy Nealen, Julian Togelius
Abstract This survey explores Procedural Content Generation via Machine Learning (PCGML), defined as the generation of game content using machine learning models trained on existing content. As the importance of PCG for game development increases, researchers explore new avenues for generating high-quality content with or without human involvement; this paper addresses the relatively new paradigm of using machine learning (in contrast with search-based, solver-based, and constructive methods). We focus on what is most often considered functional game content such as platformer levels, game maps, interactive fiction stories, and cards in collectible card games, as opposed to cosmetic content such as sprites and sound effects. In addition to using PCG for autonomous generation, co-creativity, mixed-initiative design, and compression, PCGML is suited for repair, critique, and content analysis because of its focus on modeling existing content. We discuss various data sources and representations that affect the resulting generated content. Multiple PCGML methods are covered, including neural networks, long short-term memory (LSTM) networks, autoencoders, and deep convolutional networks; Markov models, $n$-grams, and multi-dimensional Markov chains; clustering; and matrix factorization. Finally, we discuss open problems in the application of PCGML, including learning from small datasets, lack of training data, multi-layered learning, style-transfer, parameter tuning, and PCG as a game mechanic.
Tasks Card Games, Style Transfer
Published 2017-02-02
URL http://arxiv.org/abs/1702.00539v3
PDF http://arxiv.org/pdf/1702.00539v3.pdf
PWC https://paperswithcode.com/paper/procedural-content-generation-via-machine
Repo https://github.com/michaelbrave/Procedural-Generation-And-Generative-Systems-Resources
Framework none
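
As a tiny, hedged example of one PCGML method the survey covers, the sketch below trains a first-order Markov chain over vertical tile columns of a (made-up) platformer level and samples new columns from it.

```python
import random

# "_" is empty space, "#" is a solid tile; each string is one vertical column.
training_level = ["___", "__#", "_##", "___", "__#", "__#", "_##", "___"]

def train_markov(columns):
    """Count which column follows which in the training level."""
    counts = {}
    for prev, nxt in zip(columns, columns[1:]):
        counts.setdefault(prev, []).append(nxt)
    return counts

def generate(counts, start, length, seed=0):
    random.seed(seed)
    level = [start]
    for _ in range(length - 1):
        options = counts.get(level[-1])
        if not options:                       # unseen column: stop early
            break
        level.append(random.choice(options))
    return level

model = train_markov(training_level)
print(generate(model, start="___", length=10))
```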

Time-Contrastive Networks: Self-Supervised Learning from Video

Title Time-Contrastive Networks: Self-Supervised Learning from Video
Authors Pierre Sermanet, Corey Lynch, Yevgen Chebotar, Jasmine Hsu, Eric Jang, Stefan Schaal, Sergey Levine
Abstract We propose a self-supervised approach for learning representations and robotic behaviors entirely from unlabeled videos recorded from multiple viewpoints, and study how this representation can be used in two robotic imitation settings: imitating object interactions from videos of humans, and imitating human poses. Imitation of human behavior requires a viewpoint-invariant representation that captures the relationships between end-effectors (hands or robot grippers) and the environment, object attributes, and body pose. We train our representations using a metric learning loss, where multiple simultaneous viewpoints of the same observation are attracted in the embedding space, while being repelled from temporal neighbors which are often visually similar but functionally different. In other words, the model simultaneously learns to recognize what is common between different-looking images, and what is different between similar-looking images. This signal causes our model to discover attributes that do not change across viewpoint, but do change across time, while ignoring nuisance variables such as occlusions, motion blur, lighting and background. We demonstrate that this representation can be used by a robot to directly mimic human poses without an explicit correspondence, and that it can be used as a reward function within a reinforcement learning algorithm. While representations are learned from an unlabeled collection of task-related videos, robot behaviors such as pouring are learned by watching a single 3rd-person demonstration by a human. Reward functions obtained by following the human demonstrations under the learned representation enable efficient reinforcement learning that is practical for real-world robotic systems. Video results, open-source code and dataset are available at https://sermanet.github.io/imitate
Tasks Metric Learning
Published 2017-04-23
URL http://arxiv.org/abs/1704.06888v3
PDF http://arxiv.org/pdf/1704.06888v3.pdf
PWC https://paperswithcode.com/paper/time-contrastive-networks-self-supervised
Repo https://github.com/tensorflow/models
Framework tf
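
The distinctive part of the method is how triplets are built from multi-view video; the sketch below (shapes and margin are assumptions, and real embeddings would come from a trained network) keeps only that structure: the anchor and positive are co-occurring frames from two viewpoints, the negative is a temporally distant frame from the anchor's own viewpoint.

```python
import numpy as np

def tc_triplet_loss(view1, view2, t, t_neg, margin=0.2):
    anchor, positive, negative = view1[t], view2[t], view1[t_neg]
    d_pos = np.sum((anchor - positive) ** 2)   # same moment, different camera
    d_neg = np.sum((anchor - negative) ** 2)   # same camera, different moment
    return max(0.0, d_pos - d_neg + margin)

rng = np.random.default_rng(0)
T, D = 50, 32                                  # frames and embedding size
view1_emb = rng.normal(size=(T, D))            # embeddings of camera-1 frames
view2_emb = rng.normal(size=(T, D))            # embeddings of camera-2 frames
print(tc_triplet_loss(view1_emb, view2_emb, t=10, t_neg=40))
```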

Continuous Adaptation via Meta-Learning in Nonstationary and Competitive Environments

Title Continuous Adaptation via Meta-Learning in Nonstationary and Competitive Environments
Authors Maruan Al-Shedivat, Trapit Bansal, Yuri Burda, Ilya Sutskever, Igor Mordatch, Pieter Abbeel
Abstract Ability to continuously learn and adapt from limited experience in nonstationary environments is an important milestone on the path towards general intelligence. In this paper, we cast the problem of continuous adaptation into the learning-to-learn framework. We develop a simple gradient-based meta-learning algorithm suitable for adaptation in dynamically changing and adversarial scenarios. Additionally, we design a new multi-agent competitive environment, RoboSumo, and define iterated adaptation games for testing various aspects of continuous adaptation strategies. We demonstrate that meta-learning enables significantly more efficient adaptation than reactive baselines in the few-shot regime. Our experiments with a population of agents that learn and compete suggest that meta-learners are the fittest.
Tasks Meta-Learning
Published 2017-10-10
URL http://arxiv.org/abs/1710.03641v2
PDF http://arxiv.org/pdf/1710.03641v2.pdf
PWC https://paperswithcode.com/paper/continuous-adaptation-via-meta-learning-in
Repo https://github.com/openai/robosumo
Framework tf
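
For readers unfamiliar with gradient-based meta-learning, here is a generic MAML-style update on a toy quadratic task family; this is not the paper's exact algorithm, only the inner-adapt / outer-update structure it builds on.

```python
import numpy as np

def task_loss(theta, center):
    return 0.5 * np.sum((theta - center) ** 2)

def task_grad(theta, center):
    return theta - center

def maml_step(theta, task_centers, inner_lr=0.1, outer_lr=0.05):
    meta_grad = np.zeros_like(theta)
    for c in task_centers:
        adapted = theta - inner_lr * task_grad(theta, c)     # inner adaptation
        # outer gradient through the adapted parameters; for this quadratic,
        # d/dtheta L(adapted) = (1 - inner_lr) * (adapted - c)
        meta_grad += (1.0 - inner_lr) * task_grad(adapted, c)
    return theta - outer_lr * meta_grad / len(task_centers)

theta = np.zeros(2)
tasks = [np.array([1.0, 0.0]), np.array([0.0, 1.0]), np.array([2.0, 2.0])]
for _ in range(100):
    theta = maml_step(theta, tasks)
print(theta)   # converges toward a point that is easy to adapt from for all tasks
```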

Learning Representations and Generative Models for 3D Point Clouds

Title Learning Representations and Generative Models for 3D Point Clouds
Authors Panos Achlioptas, Olga Diamanti, Ioannis Mitliagkas, Leonidas Guibas
Abstract Three-dimensional geometric data offer an excellent domain for studying representation learning and generative modeling. In this paper, we look at geometric data represented as point clouds. We introduce a deep AutoEncoder (AE) network with state-of-the-art reconstruction quality and generalization ability. The learned representations outperform existing methods on 3D recognition tasks and enable shape editing via simple algebraic manipulations, such as semantic part editing, shape analogies and shape interpolation, as well as shape completion. We perform a thorough study of different generative models including GANs operating on the raw point clouds, significantly improved GANs trained in the fixed latent space of our AEs, and Gaussian Mixture Models (GMMs). To quantitatively evaluate generative models we introduce measures of sample fidelity and diversity based on matchings between sets of point clouds. Interestingly, our evaluation of generalization, fidelity and diversity reveals that GMMs trained in the latent space of our AEs yield the best results overall.
Tasks Representation Learning
Published 2017-07-08
URL http://arxiv.org/abs/1707.02392v3
PDF http://arxiv.org/pdf/1707.02392v3.pdf
PWC https://paperswithcode.com/paper/learning-representations-and-generative
Repo https://github.com/optas/latent_3d_points
Framework tf
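
A standard reconstruction loss for point-cloud autoencoders of this kind is the (symmetric) Chamfer distance; whether it matches this paper's exact loss is an assumption, but the numpy sketch below shows the idea.

```python
import numpy as np

def chamfer_distance(a, b):
    """a: (N, 3) point cloud, b: (M, 3) point cloud."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)   # pairwise squared distances
    # each point is matched to its nearest neighbor in the other cloud
    return d2.min(axis=1).mean() + d2.min(axis=0).mean()

rng = np.random.default_rng(0)
cloud = rng.normal(size=(1024, 3))
reconstruction = cloud + 0.01 * rng.normal(size=(1024, 3))
print(chamfer_distance(cloud, reconstruction))             # small for a good AE
```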

Analogical Inference for Multi-Relational Embeddings

Title Analogical Inference for Multi-Relational Embeddings
Authors Hanxiao Liu, Yuexin Wu, Yiming Yang
Abstract Large-scale multi-relational embedding refers to the task of learning the latent representations for entities and relations in large knowledge graphs. An effective and scalable solution for this problem is crucial for the true success of knowledge-based inference in a broad range of applications. This paper proposes a novel framework for optimizing the latent representations with respect to the analogical properties of the embedded entities and relations. By formulating the learning objective in a differentiable fashion, our model enjoys both theoretical power and computational scalability, and significantly outperformed a large number of representative baseline methods on benchmark datasets. Furthermore, the model offers an elegant unification of several well-known methods in multi-relational embedding, which can be proven to be special instantiations of our framework.
Tasks Knowledge Graphs, Link Prediction
Published 2017-05-06
URL http://arxiv.org/abs/1705.02426v2
PDF http://arxiv.org/pdf/1705.02426v2.pdf
PWC https://paperswithcode.com/paper/analogical-inference-for-multi-relational
Repo https://github.com/quark0/ANALOGY
Framework none
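
A hedged sketch of a bilinear multi-relational scoring function score(h, r, t) = h^T W_r t, with W_r constrained to be block-diagonal out of scalars and 2x2 rotation-like blocks [[a, -b], [b, a]]; this is my reading of the "analogical" constraint and the paper's exact parameterization may differ.

```python
import numpy as np

def relation_matrix(scalars, pairs):
    """Build an almost-diagonal relation matrix from 1x1 and 2x2 blocks."""
    blocks = [np.array([[s]]) for s in scalars]
    blocks += [np.array([[a, -b], [b, a]]) for a, b in pairs]
    dim = sum(bl.shape[0] for bl in blocks)
    W = np.zeros((dim, dim))
    i = 0
    for bl in blocks:
        k = bl.shape[0]
        W[i:i + k, i:i + k] = bl
        i += k
    return W

def score(head, W_r, tail):
    return head @ W_r @ tail

rng = np.random.default_rng(0)
dim = 6                                      # 2 scalar blocks + 2 rotation blocks
h, t = rng.normal(size=dim), rng.normal(size=dim)
W_r = relation_matrix(scalars=[0.5, -1.2], pairs=[(0.3, 0.9), (1.0, -0.4)])
print(score(h, W_r, t))                      # higher score = more plausible triple
```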

An Overview of Multi-Task Learning in Deep Neural Networks

Title An Overview of Multi-Task Learning in Deep Neural Networks
Authors Sebastian Ruder
Abstract Multi-task learning (MTL) has led to successes in many applications of machine learning, from natural language processing and speech recognition to computer vision and drug discovery. This article aims to give a general overview of MTL, particularly in deep neural networks. It introduces the two most common methods for MTL in Deep Learning, gives an overview of the literature, and discusses recent advances. In particular, it seeks to help ML practitioners apply MTL by shedding light on how MTL works and providing guidelines for choosing appropriate auxiliary tasks.
Tasks Drug Discovery, Multi-Task Learning, Speech Recognition
Published 2017-06-15
URL http://arxiv.org/abs/1706.05098v1
PDF http://arxiv.org/pdf/1706.05098v1.pdf
PWC https://paperswithcode.com/paper/an-overview-of-multi-task-learning-in-deep
Repo https://github.com/HazyResearch/metal
Framework pytorch
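
The most common of the two MTL methods the overview introduces is hard parameter sharing; a minimal numpy sketch (with arbitrary sizes and made-up task names) is enough to show the structure: all tasks share the hidden layers, only the output heads are task-specific.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_hidden = 16, 32
W_shared = rng.normal(scale=0.1, size=(d_in, d_hidden))        # shared trunk
heads = {"sentiment": rng.normal(scale=0.1, size=(d_hidden, 2)),
         "topic": rng.normal(scale=0.1, size=(d_hidden, 5))}    # task-specific heads

def forward(x, task):
    hidden = np.tanh(x @ W_shared)          # representation shared across tasks
    return hidden @ heads[task]             # task-specific output layer

x = rng.normal(size=(4, d_in))              # a batch of 4 examples
print(forward(x, "sentiment").shape, forward(x, "topic").shape)   # (4, 2) (4, 5)
```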