July 29, 2019

3070 words 15 mins read

Paper Group AWR 109

Geometric GAN. Ridiculously Fast Shot Boundary Detection with Fully Convolutional Neural Networks. Conditional Image-Text Embedding Networks. Cost Adaptation for Robust Decentralized Swarm Behaviour. Learning Discrete Representations via Information Maximizing Self-Augmented Training. TextureGAN: Controlling Deep Image Synthesis with Texture Patches …

Geometric GAN

Title Geometric GAN
Authors Jae Hyun Lim, Jong Chul Ye
Abstract Generative Adversarial Nets (GANs) represent an important milestone for effective generative models, which have inspired numerous variants that are seemingly different from each other. One of the main contributions of this paper is to reveal a unified geometric structure in GAN and its variants. Specifically, we show that the adversarial generative model training can be decomposed into three geometric steps: separating hyperplane search, discriminator parameter update away from the separating hyperplane, and the generator update along the normal vector direction of the separating hyperplane. This geometric intuition reveals the limitations of the existing approaches and leads us to propose a new formulation called geometric GAN using the SVM separating hyperplane that maximizes the margin. Our theoretical analysis shows that the geometric GAN converges to a Nash equilibrium between the discriminator and generator. In addition, extensive numerical results show the superior performance of geometric GAN.
Tasks
Published 2017-05-08
URL http://arxiv.org/abs/1705.02894v2
PDF http://arxiv.org/pdf/1705.02894v2.pdf
PWC https://paperswithcode.com/paper/geometric-gan
Repo https://github.com/WangZesen/Spectral-Normalization-GAN
Framework tf
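
The proposed objective is the margin-based hinge loss that has since become common in GAN code. A minimal PyTorch sketch (the linked repo is TensorFlow), where `d_real` and `d_fake` are raw discriminator scores:

```python
import torch.nn.functional as F

def d_loss(d_real, d_fake):
    # Steps 1-2: find the SVM-style separating hyperplane and push the
    # discriminator away from it (soft-margin hinge loss).
    return F.relu(1.0 - d_real).mean() + F.relu(1.0 + d_fake).mean()

def g_loss(d_fake):
    # Step 3: move generated samples along the hyperplane's normal
    # direction, toward the real side.
    return -d_fake.mean()
```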

Ridiculously Fast Shot Boundary Detection with Fully Convolutional Neural Networks

Title Ridiculously Fast Shot Boundary Detection with Fully Convolutional Neural Networks
Authors Michael Gygli
Abstract Shot boundary detection (SBD) is an important component of many video analysis tasks, such as action recognition, video indexing, summarization and editing. Previous work typically used low-level features like color histograms in conjunction with simple models such as SVMs. Instead, we propose to learn shot detection end-to-end, from pixels to final shot boundaries. For training such a model, we rely on our insight that all shot boundaries are generated. Thus, we create a dataset with one million frames and automatically generated transitions such as cuts, dissolves and fades. In order to efficiently analyze hours of video, we propose a Convolutional Neural Network (CNN) that is fully convolutional in time, allowing the use of a large temporal context without repeatedly processing frames. With this architecture our method obtains state-of-the-art results while running at an unprecedented speed of more than 120x real-time.
Tasks Boundary Detection, Temporal Action Localization
Published 2017-05-23
URL http://arxiv.org/abs/1705.08214v1
PDF http://arxiv.org/pdf/1705.08214v1.pdf
PWC https://paperswithcode.com/paper/ridiculously-fast-shot-boundary-detection
Repo https://github.com/abramjos/Scene-boundary-detection
Framework tf
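
A hedged PyTorch sketch of the "fully convolutional in time" idea: spatial dimensions are pooled away, but the temporal axis is never collapsed, so one forward pass scores every frame of an arbitrarily long clip. Channel and kernel sizes are illustrative, not the paper's architecture:

```python
import torch
import torch.nn as nn

class ShotBoundaryNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(3, 16, kernel_size=(3, 5, 5), padding=(1, 2, 2)),
            nn.ReLU(),
            nn.MaxPool3d(kernel_size=(1, 2, 2)),    # pool space, keep time
            nn.Conv3d(16, 32, kernel_size=(3, 5, 5), padding=(1, 2, 2)),
            nn.ReLU(),
            nn.AdaptiveAvgPool3d((None, 1, 1)),     # collapse space only
        )
        self.head = nn.Conv3d(32, 1, kernel_size=(3, 1, 1), padding=(1, 0, 0))

    def forward(self, clip):                        # clip: (B, 3, T, H, W)
        h = self.head(self.features(clip))
        return h[:, 0, :, 0, 0]                     # (B, T) boundary logits

logits = ShotBoundaryNet()(torch.randn(1, 3, 16, 64, 64))  # -> (1, 16)
```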

Conditional Image-Text Embedding Networks

Title Conditional Image-Text Embedding Networks
Authors Bryan A. Plummer, Paige Kordas, M. Hadi Kiapour, Shuai Zheng, Robinson Piramuthu, Svetlana Lazebnik
Abstract This paper presents an approach for grounding phrases in images which jointly learns multiple text-conditioned embeddings in a single end-to-end model. In order to differentiate text phrases into semantically distinct subspaces, we propose a concept weight branch that automatically assigns phrases to embeddings, whereas prior works predefine such assignments. Our proposed solution simplifies the representation requirements for individual embeddings and allows the underrepresented concepts to take advantage of the shared representations before feeding them into concept-specific layers. Comprehensive experiments verify the effectiveness of our approach across three phrase grounding datasets, Flickr30K Entities, ReferIt Game, and Visual Genome, where we obtain a (resp.) 4%, 3%, and 4% improvement in grounding performance over a strong region-phrase embedding baseline.
Tasks Phrase Grounding
Published 2017-11-22
URL http://arxiv.org/abs/1711.08389v4
PDF http://arxiv.org/pdf/1711.08389v4.pdf
PWC https://paperswithcode.com/paper/conditional-image-text-embedding-networks
Repo https://github.com/BryanPlummer/cite
Framework tf
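
A rough PyTorch sketch of the concept-weight idea: K embedding branches plus a small assignment branch that softly routes each phrase, instead of a predefined assignment. All dimensions and the scoring function here are illustrative assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConditionalEmbedding(nn.Module):
    def __init__(self, phrase_dim=300, region_dim=2048, embed_dim=256, K=4):
        super().__init__()
        self.phrase_branches = nn.ModuleList(
            [nn.Linear(phrase_dim, embed_dim) for _ in range(K)])
        self.region_proj = nn.Linear(region_dim, embed_dim)
        self.concept_weights = nn.Linear(phrase_dim, K)   # assignment branch

    def forward(self, phrase, region):
        w = F.softmax(self.concept_weights(phrase), dim=-1)              # (B, K)
        branches = torch.stack(
            [b(phrase) for b in self.phrase_branches], dim=1)            # (B, K, D)
        p = (w.unsqueeze(-1) * branches).sum(dim=1)   # soft mix of subspaces
        r = self.region_proj(region)
        return F.cosine_similarity(p, r, dim=-1)      # phrase-region score

score = ConditionalEmbedding()(torch.randn(2, 300), torch.randn(2, 2048))
```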

Cost Adaptation for Robust Decentralized Swarm Behaviour

Title Cost Adaptation for Robust Decentralized Swarm Behaviour
Authors Peter Henderson, Matthew Vertescher, David Meger, Mark Coates
Abstract Decentralized receding horizon control (D-RHC) provides a mechanism for coordination in multi-agent settings without a centralized command center. However, combining a set of different goals, costs, and constraints to form an efficient optimization objective for D-RHC can be difficult. To allay this problem, we use a meta-learning process – cost adaptation – which generates the optimization objective for D-RHC to solve based on a set of human-generated priors (cost and constraint functions) and an auxiliary heuristic. We use this adaptive D-RHC method for control of mesh-networked swarm agents. This formulation allows a wide range of tasks to be encoded and can account for network delays, heterogeneous capabilities, and increasingly large swarms through the adaptation mechanism. We leverage the Unity3D game engine to build a simulator capable of introducing artificial networking failures and delays in the swarm. Using the simulator we validate our method on an example coordinated exploration task. We demonstrate that cost adaptation allows for more efficient and safer task completion under varying environment conditions and increasingly large swarm sizes. We release our simulator and code to the community for future work.
Tasks Meta-Learning
Published 2017-09-21
URL http://arxiv.org/abs/1709.07114v2
PDF http://arxiv.org/pdf/1709.07114v2.pdf
PWC https://paperswithcode.com/paper/cost-adaptation-for-robust-decentralized
Repo https://github.com/Breakend/SocraticSwarm
Framework none
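
A loose numpy illustration of the cost-adaptation idea, under our own assumptions: the D-RHC objective is a weighted sum of human-supplied cost priors, and the weights are re-normalized from an auxiliary heuristic signal. The multiplicative update below is a stand-in, not the paper's exact mechanism:

```python
import numpy as np

def combined_cost(u, cost_priors, weights):
    # The D-RHC objective: a weighted combination of human-generated priors.
    return sum(w * c(u) for w, c in zip(weights, cost_priors))

def adapt_weights(weights, heuristic_scores, lr=0.1):
    # Illustrative adaptation: up-weight priors the auxiliary heuristic
    # flags as currently important, then re-normalize.
    w = weights * np.exp(lr * heuristic_scores)
    return w / w.sum()

# Example priors for one agent's control u = (vx, vy):
goal = np.array([5.0, 0.0])
priors = [lambda u: np.linalg.norm(u - goal),   # reach the goal
          lambda u: np.linalg.norm(u)]          # spend little energy
w = np.ones(2) / 2
print(combined_cost(np.zeros(2), priors, w))
```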

Learning Discrete Representations via Information Maximizing Self-Augmented Training

Title Learning Discrete Representations via Information Maximizing Self-Augmented Training
Authors Weihua Hu, Takeru Miyato, Seiya Tokui, Eiichi Matsumoto, Masashi Sugiyama
Abstract Learning discrete representations of data is a central machine learning task because of the compactness of the representations and ease of interpretation. The task includes clustering and hash learning as special cases. Deep neural networks are promising for this task because they can model the non-linearity of data and scale to large datasets. However, their model complexity is huge, and therefore we need to carefully regularize the networks in order to learn useful representations that exhibit intended invariance for applications of interest. To this end, we propose a method called Information Maximizing Self-Augmented Training (IMSAT). In IMSAT, we use data augmentation to impose the invariance on discrete representations. More specifically, we encourage the predicted representations of augmented data points to be close to those of the original data points in an end-to-end fashion. At the same time, we maximize the information-theoretic dependency between data and their predicted discrete representations. Extensive experiments on benchmark datasets show that IMSAT produces state-of-the-art results for both clustering and unsupervised hash learning.
Tasks Data Augmentation, Unsupervised Image Classification
Published 2017-02-28
URL http://arxiv.org/abs/1702.08720v3
PDF http://arxiv.org/pdf/1702.08720v3.pdf
PWC https://paperswithcode.com/paper/learning-discrete-representations-via
Repo https://github.com/weihua916/imsat
Framework none
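
The IMSAT objective combines a mutual-information term with an augmentation-consistency penalty. A PyTorch sketch, assuming `net` outputs logits over K clusters and `augment` is any label-preserving perturbation (the paper uses virtual adversarial perturbations):

```python
import torch
import torch.nn.functional as F

def imsat_loss(net, x, augment, trade_off=0.1):
    # Mutual information I(X;Y) = H(Y) - H(Y|X) over the K cluster labels.
    p = F.softmax(net(x), dim=1)                              # (B, K)
    p_mean = p.mean(dim=0)                                    # marginal p(y)
    marg_ent = -(p_mean * torch.log(p_mean + 1e-8)).sum()     # H(Y)
    cond_ent = -(p * torch.log(p + 1e-8)).sum(dim=1).mean()   # H(Y|X)
    mutual_info = marg_ent - cond_ent

    # Self-augmented training: predictions on augmented points should
    # stay close (in KL) to predictions on the originals.
    log_p_aug = F.log_softmax(net(augment(x)), dim=1)
    sat = F.kl_div(log_p_aug, p.detach(), reduction="batchmean")

    return sat - trade_off * mutual_info   # minimize
```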

TextureGAN: Controlling Deep Image Synthesis with Texture Patches

Title TextureGAN: Controlling Deep Image Synthesis with Texture Patches
Authors Wenqi Xian, Patsorn Sangkloy, Varun Agrawal, Amit Raj, Jingwan Lu, Chen Fang, Fisher Yu, James Hays
Abstract In this paper, we investigate deep image synthesis guided by sketch, color, and texture. Previous image synthesis methods can be controlled by sketch and color strokes but we are the first to examine texture control. We allow a user to place a texture patch on a sketch at arbitrary locations and scales to control the desired output texture. Our generative network learns to synthesize objects consistent with these texture suggestions. To achieve this, we develop a local texture loss in addition to adversarial and content loss to train the generative network. We conduct experiments using sketches generated from real images and textures sampled from a separate texture database and results show that our proposed algorithm is able to generate plausible images that are faithful to user controls. Ablation studies show that our proposed pipeline can generate more realistic images than adapting existing methods directly.
Tasks Image Generation, Texture Synthesis
Published 2017-06-09
URL http://arxiv.org/abs/1706.02823v3
PDF http://arxiv.org/pdf/1706.02823v3.pdf
PWC https://paperswithcode.com/paper/texturegan-controlling-deep-image-synthesis
Repo https://github.com/yuchuanhui/TextureGanPython3
Framework pytorch
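
A hedged PyTorch sketch of a local texture loss in the spirit described above: match Gram matrices of deep features between random patches of the output and of the user's texture patch. `feat` stands in for a fixed pretrained extractor (e.g. a VGG layer), and the patch sampling is simplified:

```python
import torch

def gram(f):                             # f: (B, C, H, W)
    b, c, h, w = f.shape
    f = f.reshape(b, c, h * w)
    return f @ f.transpose(1, 2) / (c * h * w)

def local_texture_loss(feat, output, texture, patch=32):
    def rand_patch(img):
        y = torch.randint(0, img.shape[2] - patch + 1, (1,)).item()
        x = torch.randint(0, img.shape[3] - patch + 1, (1,)).item()
        return img[:, :, y:y + patch, x:x + patch]
    # Texture statistics (Gram matrices) of patches should agree, so the
    # synthesized region follows the user's texture suggestion.
    g_out = gram(feat(rand_patch(output)))
    g_tex = gram(feat(rand_patch(texture)))
    return ((g_out - g_tex) ** 2).sum()
```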

CP-decomposition with Tensor Power Method for Convolutional Neural Networks Compression

Title CP-decomposition with Tensor Power Method for Convolutional Neural Networks Compression
Authors Marcella Astrid, Seung-Ik Lee
Abstract Convolutional Neural Networks (CNNs) have shown great success in many areas, including complex image classification tasks. However, they require a lot of memory and computation, which hinders them from running on relatively low-end smart devices such as smart phones. We propose a CNN compression method based on CP-decomposition and the Tensor Power Method. We also propose iterative fine-tuning, with which we fine-tune the whole network after decomposing each layer, but before decomposing the next layer. Significant reduction in memory and computation cost is achieved compared to state-of-the-art previous work, with no additional loss in accuracy.
Tasks Image Classification
Published 2017-01-25
URL http://arxiv.org/abs/1701.07148v1
PDF http://arxiv.org/pdf/1701.07148v1.pdf
PWC https://paperswithcode.com/paper/cp-decomposition-with-tensor-power-method-for
Repo https://github.com/larry0123du/Decompose-CNN
Framework pytorch
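
A numpy sketch of the tensor power method for CP: greedily extract rank-1 terms from a 4-way convolution kernel by alternating power iterations, deflating after each term. This greedy deflation is the usual power-method approximation to CP, not necessarily the paper's exact procedure:

```python
import numpy as np

def rank1_power(T, iters=50):
    # Higher-order power iteration for the best rank-1 term of a 4-way tensor.
    a, b, c, d = (np.random.randn(n) for n in T.shape)
    for _ in range(iters):
        a = np.einsum('ijkl,j,k,l->i', T, b, c, d); a /= np.linalg.norm(a)
        b = np.einsum('ijkl,i,k,l->j', T, a, c, d); b /= np.linalg.norm(b)
        c = np.einsum('ijkl,i,j,l->k', T, a, b, d); c /= np.linalg.norm(c)
        d = np.einsum('ijkl,i,j,k->l', T, a, b, c); d /= np.linalg.norm(d)
    lam = np.einsum('ijkl,i,j,k,l->', T, a, b, c, d)
    return lam, a, b, c, d

def cp_decompose(T, rank):
    terms, R = [], T.copy()
    for _ in range(rank):
        lam, a, b, c, d = rank1_power(R)
        terms.append((lam, a, b, c, d))
        R = R - lam * np.einsum('i,j,k,l->ijkl', a, b, c, d)  # deflate
    return terms

kernel = np.random.randn(64, 32, 3, 3)      # (out, in, kh, kw) conv kernel
terms = cp_decompose(kernel, rank=4)        # 4 rank-1 factors -> small convs
```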

Learning to Represent Programs with Graphs

Title Learning to Represent Programs with Graphs
Authors Miltiadis Allamanis, Marc Brockschmidt, Mahmoud Khademi
Abstract Learning tasks on source code (i.e., formal languages) have been considered recently, but most work has tried to transfer natural language methods and does not capitalize on the unique opportunities offered by code’s known syntax. For example, long-range dependencies induced by using the same variable or function in distant locations are often not considered. We propose to use graphs to represent both the syntactic and semantic structure of code and use graph-based deep learning methods to learn to reason over program structures. In this work, we show how to construct graphs from source code and how to scale Gated Graph Neural Networks training to such large graphs. We evaluate our method on two tasks: VarNaming, in which a network attempts to predict the name of a variable given its usage, and VarMisuse, in which the network learns to reason about selecting the correct variable that should be used at a given program location. Our comparison to methods that use less structured program representations shows the advantages of modeling known structure, and suggests that our models learn to infer meaningful names and to solve the VarMisuse task in many cases. Additionally, our testing showed that the VarMisuse model identifies a number of bugs in mature open-source projects.
Tasks
Published 2017-11-01
URL http://arxiv.org/abs/1711.00740v3
PDF http://arxiv.org/pdf/1711.00740v3.pdf
PWC https://paperswithcode.com/paper/learning-to-represent-programs-with-graphs
Repo https://github.com/kano1021/my-internship
Framework none
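
A minimal PyTorch sketch of one common GGNN propagation scheme: per-edge-type linear messages aggregated into each node, followed by a GRU state update. A program graph would supply one adjacency matrix per edge type (child, next-token, last-use, and so on):

```python
import torch
import torch.nn as nn

class GGNNLayer(nn.Module):
    def __init__(self, dim, num_edge_types):
        super().__init__()
        self.msg = nn.ModuleList(
            [nn.Linear(dim, dim) for _ in range(num_edge_types)])
        self.gru = nn.GRUCell(dim, dim)

    def forward(self, h, adjs, steps=8):   # h: (N, D); adjs: list of (N, N)
        for _ in range(steps):
            # Sum edge-type-specific messages from neighbors, then update
            # each node's state with a GRU cell.
            m = sum(a @ f(h) for a, f in zip(adjs, self.msg))
            h = self.gru(m, h)
        return h

h = torch.randn(10, 64)
adjs = [torch.bernoulli(torch.full((10, 10), 0.1)) for _ in range(3)]
out = GGNNLayer(64, 3)(h, adjs)
```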

MLC Toolbox: A MATLAB/OCTAVE Library for Multi-Label Classification

Title MLC Toolbox: A MATLAB/OCTAVE Library for Multi-Label Classification
Authors Keigo Kimura, Lu Sun, Mineichi Kudo
Abstract The Multi-Label Classification toolbox is a MATLAB/OCTAVE library for Multi-Label Classification (MLC). There exist a few Java libraries for MLC, but no MATLAB/OCTAVE library that covers various methods. This toolbox offers an environment for evaluation, comparison and visualization of MLC results. One attraction of this toolbox is that it enables us to try many combinations of feature space dimension reduction, sample clustering, label space dimension reduction, and ensembling.
Tasks Dimensionality Reduction, Multi-Label Classification
Published 2017-04-09
URL http://arxiv.org/abs/1704.02592v1
PDF http://arxiv.org/pdf/1704.02592v1.pdf
PWC https://paperswithcode.com/paper/mlc-toolbox-a-matlaboctave-library-for-multi
Repo https://github.com/hinanmu/multi-label-papers
Framework none

A practical guide and software for analysing pairwise comparison experiments

Title A practical guide and software for analysing pairwise comparison experiments
Authors Maria Perez-Ortiz, Rafal K. Mantiuk
Abstract Most popular strategies to capture subjective judgments from humans involve the construction of a unidimensional relative measurement scale, representing order preferences or judgments about a set of objects or conditions. This information is generally captured by means of direct scoring, either in the form of a Likert or cardinal scale, or by comparative judgments in pairs or sets. In this sense, the use of pairwise comparisons is becoming increasingly popular because of the simplicity of this experimental procedure. However, this strategy requires non-trivial data analysis to aggregate the comparison ranks into a quality scale and analyse the results, in order to take full advantage of the collected data. This paper explains the process of translating pairwise comparison data into a measurement scale, discusses the benefits and limitations of such scaling methods and introduces publicly available Matlab software. We improve on existing scaling methods by introducing outlier analysis, providing methods for computing confidence intervals and statistical testing, and introducing a prior that reduces estimation error when the number of observers is low. Most of our examples focus on image quality assessment.
Tasks Image Quality Assessment
Published 2017-12-11
URL http://arxiv.org/abs/1712.03686v2
PDF http://arxiv.org/pdf/1712.03686v2.pdf
PWC https://paperswithcode.com/paper/a-practical-guide-and-software-for-analysing
Repo https://github.com/mantiuk/pwcmp
Framework none
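
The core of such scaling can be stated compactly. A Python sketch of maximum-likelihood scaling under a Thurstone Case V model, where `C[i][j]` counts how often condition i was preferred over j (the paper's Matlab toolbox adds the prior, outlier analysis, and confidence intervals on top of this core idea):

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

def scale(C):
    # Thurstone Case V: P(i preferred over j) = Phi(q_i - q_j).
    n = C.shape[0]
    def nll(q):
        q = np.concatenate([[0.0], q])      # anchor q_0 = 0 for identifiability
        D = q[:, None] - q[None, :]
        return -(C * norm.logcdf(D)).sum()  # negative log-likelihood
    res = minimize(nll, np.zeros(n - 1))
    return np.concatenate([[0.0], res.x])   # quality scores (JND-like units)

C = np.array([[0, 8, 9],
              [2, 0, 7],
              [1, 3, 0]])                    # 10 comparisons per pair
print(scale(C))
```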

Network Embedding as Matrix Factorization: Unifying DeepWalk, LINE, PTE, and node2vec

Title Network Embedding as Matrix Factorization: Unifying DeepWalk, LINE, PTE, and node2vec
Authors Jiezhong Qiu, Yuxiao Dong, Hao Ma, Jian Li, Kuansan Wang, Jie Tang
Abstract Since the invention of word2vec, the skip-gram model has significantly advanced the research of network embedding, such as the recent emergence of the DeepWalk, LINE, PTE, and node2vec approaches. In this work, we show that all of the aforementioned models with negative sampling can be unified into the matrix factorization framework with closed forms. Our analysis and proofs reveal that: (1) DeepWalk empirically produces a low-rank transformation of a network’s normalized Laplacian matrix; (2) LINE, in theory, is a special case of DeepWalk when the size of vertices’ context is set to one; (3) As an extension of LINE, PTE can be viewed as the joint factorization of multiple networks’ Laplacians; (4) node2vec is factorizing a matrix related to the stationary distribution and transition probability tensor of a 2nd-order random walk. We further provide the theoretical connections between skip-gram based network embedding algorithms and the theory of graph Laplacian. Finally, we present the NetMF method as well as its approximation algorithm for computing network embedding. Our method offers significant improvements over DeepWalk and LINE for conventional network mining tasks. This work lays the theoretical foundation for skip-gram based network embedding methods, leading to a better understanding of latent network representation learning.
Tasks Network Embedding, Representation Learning
Published 2017-10-09
URL http://arxiv.org/abs/1710.02971v4
PDF http://arxiv.org/pdf/1710.02971v4.pdf
PWC https://paperswithcode.com/paper/network-embedding-as-matrix-factorization
Repo https://github.com/benedekrozemberczki/karateclub
Framework none
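
A numpy sketch of the small-window NetMF recipe implied by the analysis: build DeepWalk's implicit matrix in closed form, truncate, and factorize with SVD. Hyperparameters are illustrative:

```python
import numpy as np

def netmf(A, T=10, b=1.0, dim=32):
    # DeepWalk's implicit matrix: log( vol(G)/(bT) * sum_{r=1..T} (D^-1 A)^r D^-1 )
    vol = A.sum()
    d = np.maximum(A.sum(axis=1), 1e-12)
    P = A / d[:, None]                      # random-walk matrix D^-1 A
    S, Pr = np.zeros_like(A), np.eye(len(A))
    for _ in range(T):
        Pr = Pr @ P
        S += Pr
    M = (vol / (b * T)) * S / d[None, :]    # trailing D^-1
    M = np.log(np.maximum(M, 1.0))          # elementwise truncated log
    U, s, _ = np.linalg.svd(M)              # factorize to get embeddings
    return U[:, :dim] * np.sqrt(s[:dim])

A = (np.random.rand(50, 50) < 0.15).astype(float)
A = np.maximum(A, A.T); np.fill_diagonal(A, 0)
emb = netmf(A)                              # (50, 32) node embeddings
```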

Curriculum Domain Adaptation for Semantic Segmentation of Urban Scenes

Title Curriculum Domain Adaptation for Semantic Segmentation of Urban Scenes
Authors Yang Zhang, Philip David, Boqing Gong
Abstract During the last half decade, convolutional neural networks (CNNs) have come to dominate semantic segmentation, which is one of the core tasks in many applications such as autonomous driving. However, training CNNs requires a considerable amount of data, which is difficult to collect and laborious to annotate. Recent advances in computer graphics make it possible to train CNNs on photo-realistic synthetic imagery with computer-generated annotations. Despite this, the domain mismatch between the real images and the synthetic data cripples the models’ performance. Hence, we propose a curriculum-style learning approach to minimize the domain gap in urban scenery semantic segmentation. The curriculum domain adaptation solves easy tasks first to infer necessary properties about the target domain; in particular, the first task is to learn global label distributions over images and local distributions over landmark superpixels. These are easy to estimate because images of urban scenes have strong idiosyncrasies (e.g., the size and spatial relations of buildings, streets, cars, etc.). We then train a segmentation network while regularizing its predictions in the target domain to follow those inferred properties. In experiments, our method outperforms the baselines on two datasets and two backbone networks. We also report extensive ablation studies about our approach.
Tasks Autonomous Driving, Domain Adaptation, Image-to-Image Translation, Semantic Segmentation, Synthetic-to-Real Translation
Published 2017-07-29
URL http://arxiv.org/abs/1707.09465v5
PDF http://arxiv.org/pdf/1707.09465v5.pdf
PWC https://paperswithcode.com/paper/curriculum-domain-adaptation-for-semantic
Repo https://github.com/YangZhang4065/AdaptationSeg
Framework tf
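
A hedged PyTorch sketch of the kind of regularizer described: keep the segmentation network's predicted label distribution on a target image close to the distribution inferred by the easy task. The exact distance and how the target distribution is estimated are our assumptions here:

```python
import torch
import torch.nn.functional as F

def label_distribution_loss(logits, target_dist):
    """logits: (B, C, H, W) segmentation outputs on target-domain images;
    target_dist: (B, C) label distributions inferred by the easy task."""
    p = F.softmax(logits, dim=1)
    pred_dist = p.mean(dim=(2, 3))          # fraction of pixels per class
    # Cross-entropy between inferred and predicted label distributions.
    return -(target_dist * torch.log(pred_dist + 1e-8)).sum(dim=1).mean()
```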

Reluplex: An Efficient SMT Solver for Verifying Deep Neural Networks

Title Reluplex: An Efficient SMT Solver for Verifying Deep Neural Networks
Authors Guy Katz, Clark Barrett, David Dill, Kyle Julian, Mykel Kochenderfer
Abstract Deep neural networks have emerged as a widely used and effective means for tackling complex, real-world problems. However, a major obstacle in applying them to safety-critical systems is the great difficulty in providing formal guarantees about their behavior. We present a novel, scalable, and efficient technique for verifying properties of deep neural networks (or providing counter-examples). The technique is based on the simplex method, extended to handle the non-convex Rectified Linear Unit (ReLU) activation function, which is a crucial ingredient in many modern neural networks. The verification procedure tackles neural networks as a whole, without making any simplifying assumptions. We evaluated our technique on a prototype deep neural network implementation of the next-generation airborne collision avoidance system for unmanned aircraft (ACAS Xu). Results show that our technique can successfully prove properties of networks that are an order of magnitude larger than the largest networks verified using existing methods.
Tasks
Published 2017-02-03
URL http://arxiv.org/abs/1702.01135v2
PDF http://arxiv.org/pdf/1702.01135v2.pdf
PWC https://paperswithcode.com/paper/reluplex-an-efficient-smt-solver-for
Repo https://github.com/eth-sri/eran
Framework tf
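
The underlying case-split view can be shown on a toy network: fixing each ReLU to its active or inactive phase turns verification into a linear program. The exhaustive enumeration below is only for illustration; Reluplex's contribution is splitting on ReLU phases lazily inside a simplex-like solver instead of checking all 2^n phase assignments as done here:

```python
import itertools
import numpy as np
from scipy.optimize import linprog

# Toy query: can y = w2 . relu(W1 x + b1) exceed t for x in a box?
W1 = np.array([[1.0, -1.0], [0.5, 2.0]])
b1 = np.array([0.0, -1.0])
w2 = np.array([1.0, 1.0])
t = 1.9
box = [(-1.0, 1.0), (-1.0, 1.0)]

def feasible(phases):
    A, b = [], []
    for i, on in enumerate(phases):       # phase constraints on z = W1 x + b1
        sign = -1.0 if on else 1.0        # on: z_i >= 0; off: z_i <= 0
        A.append(sign * W1[i]); b.append(-sign * b1[i])
    active = [i for i, on in enumerate(phases) if on]
    wy = sum(w2[i] * W1[i] for i in active) if active else np.zeros(2)
    cy = sum(w2[i] * b1[i] for i in active)
    A.append(-wy); b.append(cy - t - 1e-6)   # y >= t + eps as a linear row
    return linprog(np.zeros(2), A_ub=np.array(A), b_ub=np.array(b),
                   bounds=box).success

witness = any(feasible(p) for p in itertools.product([0, 1], repeat=2))
print("y > t is", "reachable" if witness else "unreachable")
```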

Acquiring Common Sense Spatial Knowledge through Implicit Spatial Templates

Title Acquiring Common Sense Spatial Knowledge through Implicit Spatial Templates
Authors Guillem Collell, Luc Van Gool, Marie-Francine Moens
Abstract Spatial understanding is a fundamental problem with wide-reaching real-world applications. The representation of spatial knowledge is often modeled with spatial templates, i.e., regions of acceptability of two objects under an explicit spatial relationship (e.g., “on”, “below”, etc.). In contrast with prior work that restricts spatial templates to explicit spatial prepositions (e.g., “glass on table”), here we extend this concept to implicit spatial language, i.e., those relationships (generally actions) for which the spatial arrangement of the objects is only implied (e.g., “man riding horse”). In contrast with explicit relationships, predicting spatial arrangements from implicit spatial language requires significant common sense spatial understanding. Here, we introduce the task of predicting spatial templates for two objects under a relationship, which can be seen as a spatial question-answering task with a (2D) continuous output (“where is the man w.r.t. a horse when the man is walking the horse?”). We present two simple neural-based models that leverage annotated images and structured text to learn this task. The good performance of these models reveals that spatial locations are to a large extent predictable from implicit spatial language. Crucially, the models attain similar performance in a challenging generalized setting, where the object-relation-object combinations (e.g., “man walking dog”) have never been seen before. Next, we go one step further by presenting the models with unseen objects (e.g., “dog”). In this scenario, we show that leveraging word embeddings enables the models to output accurate spatial predictions, showing that the models acquire solid common-sense spatial knowledge that allows for such generalization.
Tasks Common Sense Reasoning, Question Answering, Word Embeddings
Published 2017-11-18
URL https://arxiv.org/abs/1711.06821v3
PDF https://arxiv.org/pdf/1711.06821v3.pdf
PWC https://paperswithcode.com/paper/acquiring-common-sense-spatial-knowledge
Repo https://github.com/gcollell/spatial-commonsense
Framework tf
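
A hedged PyTorch sketch of the simpler flavor of model described above: map word embeddings of (object, relation, object) to a 2D location of the first object relative to the second. Dimensions are illustrative; using pretrained word vectors is what permits generalization to unseen objects:

```python
import torch
import torch.nn as nn

class SpatialTemplateNet(nn.Module):
    def __init__(self, emb_dim=300, hidden=100):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(3 * emb_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 2))           # (x, y) relative location

    def forward(self, subj_emb, rel_emb, obj_emb):
        # Concatenate pretrained word vectors of the triplet and regress
        # the subject's 2D position w.r.t. the object.
        return self.mlp(torch.cat([subj_emb, rel_emb, obj_emb], dim=-1))

net = SpatialTemplateNet()
pred = net(torch.randn(1, 300), torch.randn(1, 300), torch.randn(1, 300))
```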

Improved Training of Wasserstein GANs

Title Improved Training of Wasserstein GANs
Authors Ishaan Gulrajani, Faruk Ahmed, Martin Arjovsky, Vincent Dumoulin, Aaron Courville
Abstract Generative Adversarial Networks (GANs) are powerful generative models, but suffer from training instability. The recently proposed Wasserstein GAN (WGAN) makes progress toward stable training of GANs, but sometimes can still generate only low-quality samples or fail to converge. We find that these problems are often due to the use of weight clipping in WGAN to enforce a Lipschitz constraint on the critic, which can lead to undesired behavior. We propose an alternative to clipping weights: penalize the norm of the gradient of the critic with respect to its input. Our proposed method performs better than standard WGAN and enables stable training of a wide variety of GAN architectures with almost no hyperparameter tuning, including 101-layer ResNets and language models over discrete data. We also achieve high quality generations on CIFAR-10 and LSUN bedrooms.
Tasks Conditional Image Generation, Image Generation
Published 2017-03-31
URL http://arxiv.org/abs/1704.00028v3
PDF http://arxiv.org/pdf/1704.00028v3.pdf
PWC https://paperswithcode.com/paper/improved-training-of-wasserstein-gans
Repo https://github.com/adler-j/bwgan
Framework tf
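
The gradient penalty itself is a few lines. A PyTorch sketch (the linked repo is TensorFlow), assuming image-shaped inputs (B, C, H, W) and a critic returning raw scores:

```python
import torch

def gradient_penalty(critic, real, fake, lam=10.0):
    # Sample points on straight lines between real and fake samples and
    # penalize the critic's gradient norm there for deviating from 1.
    eps = torch.rand(real.size(0), 1, 1, 1, device=real.device)
    x_hat = (eps * real + (1 - eps) * fake).requires_grad_(True)
    scores = critic(x_hat)
    grads, = torch.autograd.grad(scores.sum(), x_hat, create_graph=True)
    norm = grads.reshape(grads.size(0), -1).norm(2, dim=1)
    return lam * ((norm - 1) ** 2).mean()
```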