Paper Group AWR 109
Geometric GAN
Title | Geometric GAN |
Authors | Jae Hyun Lim, Jong Chul Ye |
Abstract | Generative Adversarial Nets (GANs) represent an important milestone for effective generative models, which has inspired numerous variants seemingly different from each other. One of the main contributions of this paper is to reveal a unified geometric structure in GAN and its variants. Specifically, we show that the adversarial generative model training can be decomposed into three geometric steps: separating hyperplane search, discriminator parameter update away from the separating hyperplane, and the generator update along the normal vector direction of the separating hyperplane. This geometric intuition reveals the limitations of the existing approaches and leads us to propose a new formulation called geometric GAN using an SVM separating hyperplane that maximizes the margin. Our theoretical analysis shows that the geometric GAN converges to a Nash equilibrium between the discriminator and generator. In addition, extensive numerical results demonstrate the superior performance of geometric GAN. |
Tasks | |
Published | 2017-05-08 |
URL | http://arxiv.org/abs/1705.02894v2 |
PDF | http://arxiv.org/pdf/1705.02894v2.pdf |
PWC | https://paperswithcode.com/paper/geometric-gan |
Repo | https://github.com/WangZesen/Spectral-Normalization-GAN |
Framework | tf |
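The margin-maximizing formulation described in the abstract corresponds to what is now widely known as the GAN hinge loss. A minimal PyTorch sketch of that loss, assuming `d_real` and `d_fake` are the discriminator's scalar outputs on real and generated batches (the names are ours, not the paper's):

```python
import torch.nn.functional as F

def geometric_gan_d_loss(d_real, d_fake):
    # SVM-style hinge loss: push real scores above +1 and fake scores
    # below -1, maximizing the margin around the separating hyperplane.
    return F.relu(1.0 - d_real).mean() + F.relu(1.0 + d_fake).mean()

def geometric_gan_g_loss(d_fake):
    # Update the generator along the normal of the separating hyperplane,
    # i.e. raise the discriminator's score on generated samples.
    return -d_fake.mean()
```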
Ridiculously Fast Shot Boundary Detection with Fully Convolutional Neural Networks
Title | Ridiculously Fast Shot Boundary Detection with Fully Convolutional Neural Networks |
Authors | Michael Gygli |
Abstract | Shot boundary detection (SBD) is an important component of many video analysis tasks, such as action recognition, video indexing, summarization and editing. Previous work typically combined low-level features, such as color histograms, with simple models such as SVMs. Instead, we propose to learn shot detection end-to-end, from pixels to final shot boundaries. For training such a model, we rely on our insight that all shot boundaries are generated. Thus, we create a dataset with one million frames and automatically generated transitions such as cuts, dissolves and fades. In order to efficiently analyze hours of videos, we propose a Convolutional Neural Network (CNN) which is fully convolutional in time, thus allowing the use of a large temporal context without repeatedly processing frames. With this architecture, our method obtains state-of-the-art results while running at an unprecedented speed of more than 120x real-time. |
Tasks | Boundary Detection, Temporal Action Localization |
Published | 2017-05-23 |
URL | http://arxiv.org/abs/1705.08214v1 |
PDF | http://arxiv.org/pdf/1705.08214v1.pdf |
PWC | https://paperswithcode.com/paper/ridiculously-fast-shot-boundary-detection |
Repo | https://github.com/abramjos/Scene-boundary-detection |
Framework | tf |
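As a rough illustration of what "fully convolutional in time" buys: the network below accepts clips of arbitrary length and emits one prediction per valid temporal position, so overlapping windows share computation. This is a hedged PyTorch sketch with made-up layer sizes, not the paper's architecture:

```python
import torch
import torch.nn as nn

class TemporalFCN(nn.Module):
    def __init__(self):
        super().__init__()
        # 3D convs with no temporal padding: output length shrinks by the
        # temporal receptive field, but any input length T is accepted.
        self.features = nn.Sequential(
            nn.Conv3d(3, 16, kernel_size=(3, 5, 5), stride=(1, 2, 2)),
            nn.ReLU(),
            nn.Conv3d(16, 32, kernel_size=(3, 5, 5), stride=(1, 2, 2)),
            nn.ReLU(),
        )
        # A 1x1x1 conv acts as a per-position boundary classifier head.
        self.head = nn.Conv3d(32, 1, kernel_size=1)

    def forward(self, clip):            # clip: (B, 3, T, H, W)
        logits = self.head(self.features(clip))
        return logits.mean(dim=(3, 4))  # pool space -> (B, 1, T')

probs = torch.sigmoid(TemporalFCN()(torch.randn(1, 3, 16, 64, 64)))
```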
Conditional Image-Text Embedding Networks
Title | Conditional Image-Text Embedding Networks |
Authors | Bryan A. Plummer, Paige Kordas, M. Hadi Kiapour, Shuai Zheng, Robinson Piramuthu, Svetlana Lazebnik |
Abstract | This paper presents an approach for grounding phrases in images which jointly learns multiple text-conditioned embeddings in a single end-to-end model. In order to differentiate text phrases into semantically distinct subspaces, we propose a concept weight branch that automatically assigns phrases to embeddings, whereas prior works predefine such assignments. Our proposed solution simplifies the representation requirements for individual embeddings and allows the underrepresented concepts to take advantage of the shared representations before feeding them into concept-specific layers. Comprehensive experiments verify the effectiveness of our approach across three phrase grounding datasets, Flickr30K Entities, ReferIt Game, and Visual Genome, where we obtain a (resp.) 4%, 3%, and 4% improvement in grounding performance over a strong region-phrase embedding baseline. |
Tasks | Phrase Grounding |
Published | 2017-11-22 |
URL | http://arxiv.org/abs/1711.08389v4 |
PDF | http://arxiv.org/pdf/1711.08389v4.pdf |
PWC | https://paperswithcode.com/paper/conditional-image-text-embedding-networks |
Repo | https://github.com/BryanPlummer/cite |
Framework | tf |
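A hedged sketch of the concept-weight idea from the abstract: a phrase is softly assigned to K concept-specific embedding spaces, and the final region-phrase score blends the per-concept similarities. All dimensions and names below are illustrative assumptions, not the paper's implementation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConditionalEmbedding(nn.Module):
    def __init__(self, phrase_dim=300, region_dim=2048, K=4, emb_dim=256):
        super().__init__()
        self.assign = nn.Linear(phrase_dim, K)   # concept weight branch
        self.phrase_proj = nn.ModuleList(
            [nn.Linear(phrase_dim, emb_dim) for _ in range(K)])
        self.region_proj = nn.ModuleList(
            [nn.Linear(region_dim, emb_dim) for _ in range(K)])

    def forward(self, phrase, region):
        w = F.softmax(self.assign(phrase), dim=-1)           # (B, K)
        sims = torch.stack([
            F.cosine_similarity(p(phrase), r(region), dim=-1)
            for p, r in zip(self.phrase_proj, self.region_proj)
        ], dim=-1)                                           # (B, K)
        return (w * sims).sum(-1)    # soft blend of concept-specific scores
```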
Cost Adaptation for Robust Decentralized Swarm Behaviour
Title | Cost Adaptation for Robust Decentralized Swarm Behaviour |
Authors | Peter Henderson, Matthew Vertescher, David Meger, Mark Coates |
Abstract | Decentralized receding horizon control (D-RHC) provides a mechanism for coordination in multi-agent settings without a centralized command center. However, combining a set of different goals, costs, and constraints to form an efficient optimization objective for D-RHC can be difficult. To allay this problem, we use a meta-learning process – cost adaptation – which generates the optimization objective for D-RHC to solve based on a set of human-generated priors (cost and constraint functions) and an auxiliary heuristic. We use this adaptive D-RHC method for control of mesh-networked swarm agents. This formulation allows a wide range of tasks to be encoded and can account for network delays, heterogeneous capabilities, and increasingly large swarms through the adaptation mechanism. We leverage the Unity3D game engine to build a simulator capable of introducing artificial networking failures and delays in the swarm. Using the simulator we validate our method on an example coordinated exploration task. We demonstrate that cost adaptation allows for more efficient and safer task completion under varying environment conditions and increasingly large swarm sizes. We release our simulator and code to the community for future work. |
Tasks | Meta-Learning |
Published | 2017-09-21 |
URL | http://arxiv.org/abs/1709.07114v2 |
PDF | http://arxiv.org/pdf/1709.07114v2.pdf |
PWC | https://paperswithcode.com/paper/cost-adaptation-for-robust-decentralized |
Repo | https://github.com/Breakend/SocraticSwarm |
Framework | none |
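The abstract leaves the adaptation mechanism abstract, so the following is only a schematic reading: the D-RHC objective is a weighted combination of human-supplied cost and constraint priors, with weights nudged by an auxiliary heuristic. Every name here is hypothetical:

```python
def adapted_cost(state, plan, priors, weights):
    # priors: list of callables cost_i(state, plan); weights: list of floats.
    # The optimization objective D-RHC solves is their weighted sum.
    return sum(w * cost(state, plan) for w, cost in zip(weights, priors))

def adapt_weights(weights, heuristic_scores, lr=0.1):
    # Nudge weight mass toward the priors the auxiliary heuristic
    # currently favours (one of many possible adaptation rules).
    total = sum(heuristic_scores)
    return [(1 - lr) * w + lr * s / total
            for w, s in zip(weights, heuristic_scores)]
```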
Learning Discrete Representations via Information Maximizing Self-Augmented Training
Title | Learning Discrete Representations via Information Maximizing Self-Augmented Training |
Authors | Weihua Hu, Takeru Miyato, Seiya Tokui, Eiichi Matsumoto, Masashi Sugiyama |
Abstract | Learning discrete representations of data is a central machine learning task because of the compactness of the representations and ease of interpretation. The task includes clustering and hash learning as special cases. Deep neural networks are promising for this purpose because they can model the non-linearity of data and scale to large datasets. However, their model complexity is huge, and therefore we need to carefully regularize the networks in order to learn useful representations that exhibit intended invariance for applications of interest. To this end, we propose a method called Information Maximizing Self-Augmented Training (IMSAT). In IMSAT, we use data augmentation to impose invariance on discrete representations. More specifically, we encourage the predicted representations of augmented data points to be close to those of the original data points in an end-to-end fashion. At the same time, we maximize the information-theoretic dependency between data and their predicted discrete representations. Extensive experiments on benchmark datasets show that IMSAT produces state-of-the-art results for both clustering and unsupervised hash learning. |
Tasks | Data Augmentation, Unsupervised Image Classification |
Published | 2017-02-28 |
URL | http://arxiv.org/abs/1702.08720v3 |
PDF | http://arxiv.org/pdf/1702.08720v3.pdf |
PWC | https://paperswithcode.com/paper/learning-discrete-representations-via |
Repo | https://github.com/weihua916/imsat |
Framework | none |
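A compact sketch of the IMSAT objective as described: maximize the information-theoretic dependency I(X;Y) = H(Y) - H(Y|X) between inputs and predicted discrete labels, while a self-augmented training term keeps predictions invariant under augmentation. This PyTorch version is our simplified reading (names and the weighting scheme are assumptions):

```python
import torch
import torch.nn.functional as F

def imsat_loss(logits, aug_logits, mi_weight=0.1):
    # logits / aug_logits: classifier outputs on original and augmented
    # data points (same classifier, same batch order).
    p = F.softmax(logits, dim=1)
    p_mean = p.mean(dim=0)                                # marginal p(y)
    h_y = -(p_mean * torch.log(p_mean + 1e-8)).sum()      # H(Y)
    h_y_x = -(p * torch.log(p + 1e-8)).sum(dim=1).mean()  # H(Y|X)
    # Self-augmented training: keep augmented predictions close
    # to the (fixed) predictions on the original points.
    sat = F.kl_div(F.log_softmax(aug_logits, dim=1), p.detach(),
                   reduction="batchmean")
    return sat - mi_weight * (h_y - h_y_x)  # minimize => maximize I(X;Y)
```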
TextureGAN: Controlling Deep Image Synthesis with Texture Patches
Title | TextureGAN: Controlling Deep Image Synthesis with Texture Patches |
Authors | Wenqi Xian, Patsorn Sangkloy, Varun Agrawal, Amit Raj, Jingwan Lu, Chen Fang, Fisher Yu, James Hays |
Abstract | In this paper, we investigate deep image synthesis guided by sketch, color, and texture. Previous image synthesis methods can be controlled by sketch and color strokes but we are the first to examine texture control. We allow a user to place a texture patch on a sketch at arbitrary locations and scales to control the desired output texture. Our generative network learns to synthesize objects consistent with these texture suggestions. To achieve this, we develop a local texture loss in addition to adversarial and content loss to train the generative network. We conduct experiments using sketches generated from real images and textures sampled from a separate texture database and results show that our proposed algorithm is able to generate plausible images that are faithful to user controls. Ablation studies show that our proposed pipeline can generate more realistic images than adapting existing methods directly. |
Tasks | Image Generation, Texture Synthesis |
Published | 2017-06-09 |
URL | http://arxiv.org/abs/1706.02823v3 |
PDF | http://arxiv.org/pdf/1706.02823v3.pdf |
PWC | https://paperswithcode.com/paper/texturegan-controlling-deep-image-synthesis |
Repo | https://github.com/yuchuanhui/TextureGanPython3 |
Framework | pytorch |
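The local texture loss can be pictured as a Gram-matrix comparison between deep features of output patches and of the user's texture patch, as in style-transfer losses. In the hedged sketch below the feature extractor (e.g. a pretrained VGG slice) is assumed, and the paper's exact patch-sampling scheme is omitted:

```python
import torch

def gram(feat):
    # feat: (B, C, H, W) -> per-sample Gram matrix of channel correlations
    b, c, h, w = feat.shape
    f = feat.view(b, c, h * w)
    return f @ f.transpose(1, 2) / (c * h * w)

def local_texture_loss(gen_patch_feats, tex_patch_feats):
    # Match texture statistics of patches cropped from the generated
    # image against the user-supplied texture patch.
    return torch.mean((gram(gen_patch_feats) - gram(tex_patch_feats)) ** 2)
```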
CP-decomposition with Tensor Power Method for Convolutional Neural Networks Compression
Title | CP-decomposition with Tensor Power Method for Convolutional Neural Networks Compression |
Authors | Marcella Astrid, Seung-Ik Lee |
Abstract | Convolutional Neural Networks (CNNs) have shown great success in many areas, including complex image classification tasks. However, they require substantial memory and computation, which hinders them from running on relatively low-end smart devices such as smartphones. We propose a CNN compression method based on CP-decomposition and the Tensor Power Method. We also propose an iterative fine-tuning scheme, in which we fine-tune the whole network after decomposing each layer, but before decomposing the next layer. Significant reductions in memory and computation cost are achieved compared to state-of-the-art previous work, with no additional accuracy loss. |
Tasks | Image Classification |
Published | 2017-01-25 |
URL | http://arxiv.org/abs/1701.07148v1 |
PDF | http://arxiv.org/pdf/1701.07148v1.pdf |
PWC | https://paperswithcode.com/paper/cp-decomposition-with-tensor-power-method-for |
Repo | https://github.com/larry0123du/Decompose-CNN |
Framework | pytorch |
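A NumPy sketch of the core numerical tool, the tensor power method: alternately contract a 3-way tensor against two factor vectors to update the third, then deflate and repeat to collect further rank-1 CP components. This is a generic illustration, not the paper's exact procedure for convolutional kernels:

```python
import numpy as np

def rank1_power_method(T, iters=50):
    # Best rank-1 approximation of a 3-way tensor via alternating
    # power iterations on the three factor vectors.
    a = np.random.randn(T.shape[0])
    b = np.random.randn(T.shape[1])
    c = np.random.randn(T.shape[2])
    for _ in range(iters):
        a = np.einsum('ijk,j,k->i', T, b, c); a /= np.linalg.norm(a)
        b = np.einsum('ijk,i,k->j', T, a, c); b /= np.linalg.norm(b)
        c = np.einsum('ijk,i,j->k', T, a, b); c /= np.linalg.norm(c)
    lam = np.einsum('ijk,i,j,k->', T, a, b, c)  # component weight
    return lam, a, b, c

def greedy_cp(T, rank):
    # Greedy CP: extract a rank-1 term, subtract it, repeat.
    comps, R = [], T.copy()
    for _ in range(rank):
        lam, a, b, c = rank1_power_method(R)
        comps.append((lam, a, b, c))
        R = R - lam * np.einsum('i,j,k->ijk', a, b, c)  # deflate
    return comps
```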
Learning to Represent Programs with Graphs
Title | Learning to Represent Programs with Graphs |
Authors | Miltiadis Allamanis, Marc Brockschmidt, Mahmoud Khademi |
Abstract | Learning tasks on source code (i.e., formal languages) have been considered recently, but most work has tried to transfer natural language methods and does not capitalize on the unique opportunities offered by code’s known syntax. For example, long-range dependencies induced by using the same variable or function in distant locations are often not considered. We propose to use graphs to represent both the syntactic and semantic structure of code and use graph-based deep learning methods to learn to reason over program structures. In this work, we present how to construct graphs from source code and how to scale Gated Graph Neural Networks training to such large graphs. We evaluate our method on two tasks: VarNaming, in which a network attempts to predict the name of a variable given its usage, and VarMisuse, in which the network learns to reason about selecting the correct variable that should be used at a given program location. Our comparison to methods that use less structured program representations shows the advantages of modeling known structure, and suggests that our models learn to infer meaningful names and to solve the VarMisuse task in many cases. Additionally, our testing showed that VarMisuse identifies a number of bugs in mature open-source projects. |
Tasks | |
Published | 2017-11-01 |
URL | http://arxiv.org/abs/1711.00740v3 |
PDF | http://arxiv.org/pdf/1711.00740v3.pdf |
PWC | https://paperswithcode.com/paper/learning-to-represent-programs-with-graphs |
Repo | https://github.com/kano1021/my-internship |
Framework | none |
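The heavy lifting is done by Gated Graph Neural Networks; one propagation step looks roughly like the hedged PyTorch sketch below, with edge-type-specific message functions and a GRU state update. Sizes and the edge encoding are our assumptions:

```python
import torch
import torch.nn as nn

class GGNNStep(nn.Module):
    def __init__(self, dim=64, num_edge_types=3):
        super().__init__()
        # One linear message function per edge type (e.g. syntax child,
        # next token, same-variable use in a program graph).
        self.msg = nn.ModuleList(
            [nn.Linear(dim, dim) for _ in range(num_edge_types)])
        self.gru = nn.GRUCell(dim, dim)

    def forward(self, h, edges):
        # h: (N, dim) node states; edges: list of (src, dst, etype) with
        # src/dst LongTensors of node indices and etype a Python int.
        agg = torch.zeros_like(h)
        for src, dst, etype in edges:
            agg.index_add_(0, dst, self.msg[etype](h[src]))
        return self.gru(agg, h)  # gated update of every node state
```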
MLC Toolbox: A MATLAB/OCTAVE Library for Multi-Label Classification
Title | MLC Toolbox: A MATLAB/OCTAVE Library for Multi-Label Classification |
Authors | Keigo Kimura, Lu Sun, Mineichi Kudo |
Abstract | Multi-Label Classification toolbox is a MATLAB/OCTAVE library for Multi-Label Classification (MLC). There exist a few Java libraries for MLC, but no MATLAB/OCTAVE library that covers a variety of methods. This toolbox offers an environment for evaluation, comparison and visualization of MLC results. One attraction of this toolbox is that it enables us to try many combinations of feature-space dimension reduction, sample clustering, label-space dimension reduction, ensembling, and so on. |
Tasks | Dimensionality Reduction, Multi-Label Classification |
Published | 2017-04-09 |
URL | http://arxiv.org/abs/1704.02592v1 |
PDF | http://arxiv.org/pdf/1704.02592v1.pdf |
PWC | https://paperswithcode.com/paper/mlc-toolbox-a-matlaboctave-library-for-multi |
Repo | https://github.com/hinanmu/multi-label-papers |
Framework | none |
A practical guide and software for analysing pairwise comparison experiments
Title | A practical guide and software for analysing pairwise comparison experiments |
Authors | Maria Perez-Ortiz, Rafal K. Mantiuk |
Abstract | Most popular strategies to capture subjective judgments from humans involve the construction of a unidimensional relative measurement scale, representing order preferences or judgments about a set of objects or conditions. This information is generally captured by means of direct scoring, either in the form of a Likert or cardinal scale, or by comparative judgments in pairs or sets. In this sense, the use of pairwise comparisons is becoming increasingly popular because of the simplicity of this experimental procedure. However, this strategy requires non-trivial data analysis to aggregate the comparison ranks into a quality scale and analyse the results, in order to take full advantage of the collected data. This paper explains the process of translating pairwise comparison data into a measurement scale, discusses the benefits and limitations of such scaling methods and introduces publicly available Matlab software. We improve on existing scaling methods by introducing outlier analysis, providing methods for computing confidence intervals and for statistical testing, and introducing a prior which reduces estimation error when the number of observers is low. Most of our examples focus on image quality assessment. |
Tasks | Image Quality Assessment |
Published | 2017-12-11 |
URL | http://arxiv.org/abs/1712.03686v2 |
PDF | http://arxiv.org/pdf/1712.03686v2.pdf |
PWC | https://paperswithcode.com/paper/a-practical-guide-and-software-for-analysing |
Repo | https://github.com/mantiuk/pwcmp |
Framework | none |
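The scaling step can be illustrated with a maximum-likelihood fit of a Thurstone Case V model to a comparison-count matrix, which is the general idea the paper's Matlab toolbox implements. The Python/SciPy sketch below simplifies away the outlier analysis, prior, and confidence intervals the paper adds:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

def scale_pairwise(C):
    # C[i, j]: number of times condition i was preferred over condition j.
    n = C.shape[0]

    def neg_log_lik(q):
        q = np.concatenate([[0.0], q])   # pin q[0] = 0 as the reference
        d = q[:, None] - q[None, :]      # pairwise score differences
        p = norm.cdf(d)                  # Thurstone Case V: P(i beats j)
        return -np.sum(C * np.log(p + 1e-12))

    res = minimize(neg_log_lik, np.zeros(n - 1), method="L-BFGS-B")
    return np.concatenate([[0.0], res.x])  # quality scores, JND-like units
```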
Network Embedding as Matrix Factorization: Unifying DeepWalk, LINE, PTE, and node2vec
Title | Network Embedding as Matrix Factorization: Unifying DeepWalk, LINE, PTE, and node2vec |
Authors | Jiezhong Qiu, Yuxiao Dong, Hao Ma, Jian Li, Kuansan Wang, Jie Tang |
Abstract | Since the invention of word2vec, the skip-gram model has significantly advanced the research of network embedding, such as the recent emergence of the DeepWalk, LINE, PTE, and node2vec approaches. In this work, we show that all of the aforementioned models with negative sampling can be unified into the matrix factorization framework with closed forms. Our analysis and proofs reveal that: (1) DeepWalk empirically produces a low-rank transformation of a network’s normalized Laplacian matrix; (2) LINE, in theory, is a special case of DeepWalk when the size of vertices’ context is set to one; (3) As an extension of LINE, PTE can be viewed as the joint factorization of multiple networks’ Laplacians; (4) node2vec factorizes a matrix related to the stationary distribution and transition probability tensor of a 2nd-order random walk. We further provide the theoretical connections between skip-gram based network embedding algorithms and the theory of graph Laplacian. Finally, we present the NetMF method as well as its approximation algorithm for computing network embedding. Our method offers significant improvements over DeepWalk and LINE for conventional network mining tasks. This work lays the theoretical foundation for skip-gram based network embedding methods, leading to a better understanding of latent network representation learning. |
Tasks | Network Embedding, Representation Learning |
Published | 2017-10-09 |
URL | http://arxiv.org/abs/1710.02971v4 |
PDF | http://arxiv.org/pdf/1710.02971v4.pdf |
PWC | https://paperswithcode.com/paper/network-embedding-as-matrix-factorization |
Repo | https://github.com/benedekrozemberczki/karateclub |
Framework | none |
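For small graphs, the NetMF construction can be written down directly: form M = vol(G)/(bT) * (sum_{r=1}^{T} P^r) D^{-1} with P = D^{-1}A, take the element-wise truncated logarithm, and factorize by SVD. A dense NumPy sketch (the paper also derives an approximation algorithm for large graphs, omitted here):

```python
import numpy as np

def netmf_embed(A, dim=32, T=10, b=1.0):
    # A: dense adjacency matrix (floats, no isolated nodes assumed).
    deg = A.sum(axis=1)
    vol = deg.sum()
    P = A / deg[:, None]              # random-walk matrix D^{-1} A
    S, Pr = np.zeros_like(P), np.eye(A.shape[0])
    for _ in range(T):                # accumulate sum of P^r, r = 1..T
        Pr = Pr @ P
        S += Pr
    M = (vol / (b * T)) * S / deg[None, :]   # right-multiply by D^{-1}
    logM = np.log(np.maximum(M, 1.0))        # truncated logarithm
    U, s, _ = np.linalg.svd(logM)
    return U[:, :dim] * np.sqrt(s[:dim])     # embedding = U_d sqrt(S_d)
```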
Curriculum Domain Adaptation for Semantic Segmentation of Urban Scenes
Title | Curriculum Domain Adaptation for Semantic Segmentation of Urban Scenes |
Authors | Yang Zhang, Philip David, Boqing Gong |
Abstract | During the last half decade, convolutional neural networks (CNNs) have triumphed over semantic segmentation, which is one of the core tasks in many applications such as autonomous driving. However, to train CNNs requires a considerable amount of data, which is difficult to collect and laborious to annotate. Recent advances in computer graphics make it possible to train CNNs on photo-realistic synthetic imagery with computer-generated annotations. Despite this, the domain mismatch between the real images and the synthetic data cripples the models’ performance. Hence, we propose a curriculum-style learning approach to minimize the domain gap in urban scenery semantic segmentation. The curriculum domain adaptation solves easy tasks first to infer necessary properties about the target domain; in particular, the first task is to learn global label distributions over images and local distributions over landmark superpixels. These are easy to estimate because images of urban scenes have strong idiosyncrasies (e.g., the size and spatial relations of buildings, streets, cars, etc.). We then train a segmentation network while regularizing its predictions in the target domain to follow those inferred properties. In experiments, our method outperforms the baselines on two datasets and two backbone networks. We also report extensive ablation studies about our approach. |
Tasks | Autonomous Driving, Domain Adaptation, Image-to-Image Translation, Semantic Segmentation, Synthetic-to-Real Translation |
Published | 2017-07-29 |
URL | http://arxiv.org/abs/1707.09465v5 |
PDF | http://arxiv.org/pdf/1707.09465v5.pdf |
PWC | https://paperswithcode.com/paper/curriculum-domain-adaptation-for-semantic |
Repo | https://github.com/YangZhang4065/AdaptationSeg |
Framework | tf |
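One way to picture the regularization described in the abstract: penalize divergence between the segmentation network's predicted label frequencies on a target image and the frequencies inferred by the easy first task. The PyTorch sketch below is our simplified reading, not the released code:

```python
import torch.nn.functional as F

def label_dist_loss(seg_logits, target_dist):
    # seg_logits: (C, H, W) per-pixel class scores on a target image;
    # target_dist: (C,) global label frequencies inferred by the easy task.
    p = F.softmax(seg_logits, dim=0).mean(dim=(1, 2))  # predicted frequencies
    # KL(target_dist || p): regularize predictions toward inferred stats.
    return F.kl_div(p.log(), target_dist, reduction="sum")
```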
Reluplex: An Efficient SMT Solver for Verifying Deep Neural Networks
Title | Reluplex: An Efficient SMT Solver for Verifying Deep Neural Networks |
Authors | Guy Katz, Clark Barrett, David Dill, Kyle Julian, Mykel Kochenderfer |
Abstract | Deep neural networks have emerged as a widely used and effective means for tackling complex, real-world problems. However, a major obstacle in applying them to safety-critical systems is the great difficulty in providing formal guarantees about their behavior. We present a novel, scalable, and efficient technique for verifying properties of deep neural networks (or providing counter-examples). The technique is based on the simplex method, extended to handle the non-convex Rectified Linear Unit (ReLU) activation function, which is a crucial ingredient in many modern neural networks. The verification procedure tackles neural networks as a whole, without making any simplifying assumptions. We evaluated our technique on a prototype deep neural network implementation of the next-generation airborne collision avoidance system for unmanned aircraft (ACAS Xu). Results show that our technique can successfully prove properties of networks that are an order of magnitude larger than the largest networks verified using existing methods. |
Tasks | |
Published | 2017-02-03 |
URL | http://arxiv.org/abs/1702.01135v2 |
PDF | http://arxiv.org/pdf/1702.01135v2.pdf |
PWC | https://paperswithcode.com/paper/reluplex-an-efficient-smt-solver-for |
Repo | https://github.com/eth-sri/eran |
Framework | tf |
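Each constraint y = ReLU(x) induces a disjunction of two linear phases; naive verification enumerates all 2^n phase combinations, which Reluplex avoids by handling ReLUs lazily inside an extended simplex procedure. The toy recursion below shows only the underlying case split that makes the problem hard; the LP oracle `linear_feasible` is assumed, and constraints are plain strings for illustration:

```python
def split_relus(relu_pairs, linear_feasible, constraints=()):
    # relu_pairs: list of (x_name, y_name) with y = ReLU(x);
    # linear_feasible: assumed oracle checking a tuple of linear
    # constraints (plus the network's other constraints) for satisfiability.
    if not relu_pairs:
        return linear_feasible(constraints)
    (x, y), rest = relu_pairs[0], relu_pairs[1:]
    active = constraints + ((f"{x} >= 0", f"{y} == {x}"),)   # active phase
    inactive = constraints + ((f"{x} <= 0", f"{y} == 0"),)   # inactive phase
    return (split_relus(rest, linear_feasible, active)
            or split_relus(rest, linear_feasible, inactive))
```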
Acquiring Common Sense Spatial Knowledge through Implicit Spatial Templates
Title | Acquiring Common Sense Spatial Knowledge through Implicit Spatial Templates |
Authors | Guillem Collell, Luc Van Gool, Marie-Francine Moens |
Abstract | Spatial understanding is a fundamental problem with wide-reaching real-world applications. The representation of spatial knowledge is often modeled with spatial templates, i.e., regions of acceptability of two objects under an explicit spatial relationship (e.g., “on”, “below”, etc.). In contrast with prior work that restricts spatial templates to explicit spatial prepositions (e.g., “glass on table”), here we extend this concept to implicit spatial language, i.e., those relationships (generally actions) for which the spatial arrangement of the objects is only implicitly implied (e.g., “man riding horse”). In contrast with explicit relationships, predicting spatial arrangements from implicit spatial language requires significant common sense spatial understanding. Here, we introduce the task of predicting spatial templates for two objects under a relationship, which can be seen as a spatial question-answering task with a (2D) continuous output (“where is the man w.r.t. a horse when the man is walking the horse?"). We present two simple neural-based models that leverage annotated images and structured text to learn this task. The good performance of these models reveals that spatial locations are to a large extent predictable from implicit spatial language. Crucially, the models attain similar performance in a challenging generalized setting, where the object-relation-object combinations (e.g.,“man walking dog”) have never been seen before. Next, we go one step further by presenting the models with unseen objects (e.g., “dog”). In this scenario, we show that leveraging word embeddings enables the models to output accurate spatial predictions, proving that the models acquire solid common sense spatial knowledge allowing for such generalization. |
Tasks | Common Sense Reasoning, Question Answering, Word Embeddings |
Published | 2017-11-18 |
URL | https://arxiv.org/abs/1711.06821v3 |
PDF | https://arxiv.org/pdf/1711.06821v3.pdf |
PWC | https://paperswithcode.com/paper/acquiring-common-sense-spatial-knowledge |
Repo | https://github.com/gcollell/spatial-commonsense |
Framework | tf |
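In its simplest form, the task maps word embeddings for (object, relation, reference object) to a 2D spatial template; the hedged PyTorch sketch below reduces the template to a predicted mean offset. Dimensions are assumptions, and the paper's models additionally exploit structured annotations from images:

```python
import torch
import torch.nn as nn

class SpatialTemplateNet(nn.Module):
    def __init__(self, emb_dim=300):
        super().__init__()
        # MLP over concatenated (object, relation, reference) embeddings.
        self.mlp = nn.Sequential(
            nn.Linear(3 * emb_dim, 128), nn.ReLU(), nn.Linear(128, 2))

    def forward(self, obj_emb, rel_emb, ref_emb):
        # Output: predicted 2D offset of the object w.r.t. the reference.
        return self.mlp(torch.cat([obj_emb, rel_emb, ref_emb], dim=-1))
```

Because the inputs are word embeddings, unseen objects can still yield sensible predictions, which is the generalization the abstract highlights.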
Improved Training of Wasserstein GANs
Title | Improved Training of Wasserstein GANs |
Authors | Ishaan Gulrajani, Faruk Ahmed, Martin Arjovsky, Vincent Dumoulin, Aaron Courville |
Abstract | Generative Adversarial Networks (GANs) are powerful generative models, but suffer from training instability. The recently proposed Wasserstein GAN (WGAN) makes progress toward stable training of GANs, but sometimes can still generate only low-quality samples or fail to converge. We find that these problems are often due to the use of weight clipping in WGAN to enforce a Lipschitz constraint on the critic, which can lead to undesired behavior. We propose an alternative to clipping weights: penalize the norm of gradient of the critic with respect to its input. Our proposed method performs better than standard WGAN and enables stable training of a wide variety of GAN architectures with almost no hyperparameter tuning, including 101-layer ResNets and language models over discrete data. We also achieve high quality generations on CIFAR-10 and LSUN bedrooms. |
Tasks | Conditional Image Generation, Image Generation |
Published | 2017-03-31 |
URL | http://arxiv.org/abs/1704.00028v3 |
PDF | http://arxiv.org/pdf/1704.00028v3.pdf |
PWC | https://paperswithcode.com/paper/improved-training-of-wasserstein-gans |
Repo | https://github.com/adler-j/bwgan |
Framework | tf |
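The proposed penalty is easy to state in code: sample points on straight lines between real and generated examples and push the critic's gradient norm at those points toward 1. A short PyTorch sketch (tensor shapes assume image batches):

```python
import torch

def gradient_penalty(critic, real, fake, lam=10.0):
    # Random interpolation points between real and fake samples.
    eps = torch.rand(real.size(0), 1, 1, 1, device=real.device)
    x_hat = (eps * real + (1 - eps) * fake).requires_grad_(True)
    # Gradient of the critic's output w.r.t. the interpolated inputs.
    grads = torch.autograd.grad(critic(x_hat).sum(), x_hat,
                                create_graph=True)[0]
    gnorm = grads.view(grads.size(0), -1).norm(2, dim=1)
    # Two-sided penalty: keep the gradient norm close to 1.
    return lam * ((gnorm - 1) ** 2).mean()
```

This term is added to the standard WGAN critic loss in place of weight clipping, which is what stabilizes training across architectures.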