Paper Group AWR 109
Geometric GAN
Title | Geometric GAN |
Authors | Jae Hyun Lim, Jong Chul Ye |
Abstract | Generative Adversarial Nets (GANs) represent an important milestone for effective generative models, which has inspired numerous variants seemingly different from each other. One of the main contributions of this paper is to reveal a unified geometric structure in GAN and its variants. Specifically, we show that the adversarial generative model training can be decomposed into three geometric steps: separating hyperplane search, discriminator parameter update away from the separating hyperplane, and the generator update along the normal vector direction of the separating hyperplane. This geometric intuition reveals the limitations of the existing approaches and leads us to propose a new formulation called geometric GAN using an SVM separating hyperplane that maximizes the margin. Our theoretical analysis shows that the geometric GAN converges to a Nash equilibrium between the discriminator and generator. In addition, extensive numerical results demonstrate the superior performance of geometric GAN. |
Tasks | |
Published | 2017-05-08 |
URL | http://arxiv.org/abs/1705.02894v2 |
PDF | http://arxiv.org/pdf/1705.02894v2.pdf |
PWC | https://paperswithcode.com/paper/geometric-gan |
Repo | https://github.com/WangZesen/Spectral-Normalization-GAN |
Framework | tf |
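The margin-maximizing formulation described in the abstract corresponds to what is now widely known as the GAN hinge loss. A minimal PyTorch sketch of that loss, assuming `d_real` and `d_fake` are the discriminator's scalar outputs on real and generated batches (the names are ours, not the paper's):

```python
import torch.nn.functional as F

def geometric_gan_d_loss(d_real, d_fake):
    # SVM-style hinge loss: push real scores above +1 and fake scores
    # below -1, maximizing the margin around the separating hyperplane.
    return F.relu(1.0 - d_real).mean() + F.relu(1.0 + d_fake).mean()

def geometric_gan_g_loss(d_fake):
    # Update the generator along the normal of the separating hyperplane,
    # i.e. raise the discriminator's score on generated samples.
    return -d_fake.mean()
```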
Ridiculously Fast Shot Boundary Detection with Fully Convolutional Neural Networks
Title | Ridiculously Fast Shot Boundary Detection with Fully Convolutional Neural Networks |
Authors | Michael Gygli |
Abstract | Shot boundary detection (SBD) is an important component of many video analysis tasks, such as action recognition, video indexing, summarization and editing. Previous work typically combined low-level features, such as color histograms, with simple models such as SVMs. Instead, we propose to learn shot detection end-to-end, from pixels to final shot boundaries. For training such a model, we rely on our insight that all shot boundaries are generated. Thus, we create a dataset with one million frames and automatically generated transitions such as cuts, dissolves and fades. In order to efficiently analyze hours of videos, we propose a Convolutional Neural Network (CNN) which is fully convolutional in time, thus allowing the use of a large temporal context without repeatedly processing frames. With this architecture, our method obtains state-of-the-art results while running at an unprecedented speed of more than 120x real-time. |
Tasks | Boundary Detection, Temporal Action Localization |
Published | 2017-05-23 |
URL | http://arxiv.org/abs/1705.08214v1 |
PDF | http://arxiv.org/pdf/1705.08214v1.pdf |
PWC | https://paperswithcode.com/paper/ridiculously-fast-shot-boundary-detection |
Repo | https://github.com/abramjos/Scene-boundary-detection |
Framework | tf |
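As a rough illustration of what "fully convolutional in time" buys: the network below accepts clips of arbitrary length and emits one prediction per valid temporal position, so overlapping windows share computation. This is a hedged PyTorch sketch with made-up layer sizes, not the paper's architecture:

```python
import torch
import torch.nn as nn

class TemporalFCN(nn.Module):
    def __init__(self):
        super().__init__()
        # 3D convs with no temporal padding: output length shrinks by the
        # temporal receptive field, but any input length T is accepted.
        self.features = nn.Sequential(
            nn.Conv3d(3, 16, kernel_size=(3, 5, 5), stride=(1, 2, 2)),
            nn.ReLU(),
            nn.Conv3d(16, 32, kernel_size=(3, 5, 5), stride=(1, 2, 2)),
            nn.ReLU(),
        )
        # A 1x1x1 conv acts as a per-position boundary classifier head.
        self.head = nn.Conv3d(32, 1, kernel_size=1)

    def forward(self, clip):            # clip: (B, 3, T, H, W)
        logits = self.head(self.features(clip))
        return logits.mean(dim=(3, 4))  # pool space -> (B, 1, T')

probs = torch.sigmoid(TemporalFCN()(torch.randn(1, 3, 16, 64, 64)))
```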
Conditional Image-Text Embedding Networks
Title | Conditional Image-Text Embedding Networks |
Authors | Bryan A. Plummer, Paige Kordas, M. Hadi Kiapour, Shuai Zheng, Robinson Piramuthu, Svetlana Lazebnik |
Abstract | This paper presents an approach for grounding phrases in images which jointly learns multiple text-conditioned embeddings in a single end-to-end model. In order to differentiate text phrases into semantically distinct subspaces, we propose a concept weight branch that automatically assigns phrases to embeddings, whereas prior works predefine such assignments. Our proposed solution simplifies the representation requirements for individual embeddings and allows the underrepresented concepts to take advantage of the shared representations before feeding them into concept-specific layers. Comprehensive experiments verify the effectiveness of our approach across three phrase grounding datasets, Flickr30K Entities, ReferIt Game, and Visual Genome, where we obtain a (resp.) 4%, 3%, and 4% improvement in grounding performance over a strong region-phrase embedding baseline. |
Tasks | Phrase Grounding |
Published | 2017-11-22 |
URL | http://arxiv.org/abs/1711.08389v4 |
PDF | http://arxiv.org/pdf/1711.08389v4.pdf |
PWC | https://paperswithcode.com/paper/conditional-image-text-embedding-networks |
Repo | https://github.com/BryanPlummer/cite |
Framework | tf |
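A hedged sketch of the concept-weight idea from the abstract: a phrase is softly assigned to K concept-specific embedding spaces, and the final region-phrase score blends the per-concept similarities. All dimensions and names below are illustrative assumptions, not the paper's implementation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConditionalEmbedding(nn.Module):
    def __init__(self, phrase_dim=300, region_dim=2048, K=4, emb_dim=256):
        super().__init__()
        self.assign = nn.Linear(phrase_dim, K)   # concept weight branch
        self.phrase_proj = nn.ModuleList(
            [nn.Linear(phrase_dim, emb_dim) for _ in range(K)])
        self.region_proj = nn.ModuleList(
            [nn.Linear(region_dim, emb_dim) for _ in range(K)])

    def forward(self, phrase, region):
        w = F.softmax(self.assign(phrase), dim=-1)           # (B, K)
        sims = torch.stack([
            F.cosine_similarity(p(phrase), r(region), dim=-1)
            for p, r in zip(self.phrase_proj, self.region_proj)
        ], dim=-1)                                           # (B, K)
        return (w * sims).sum(-1)    # soft blend of concept-specific scores
```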
Cost Adaptation for Robust Decentralized Swarm Behaviour
Title | Cost Adaptation for Robust Decentralized Swarm Behaviour |
Authors | Peter Henderson, Matthew Vertescher, David Meger, Mark Coates |
Abstract | Decentralized receding horizon control (D-RHC) provides a mechanism for coordination in multi-agent settings without a centralized command center. However, combining a set of different goals, costs, and constraints to form an efficient optimization objective for D-RHC can be difficult. To allay this problem, we use a meta-learning process – cost adaptation – which generates the optimization objective for D-RHC to solve based on a set of human-generated priors (cost and constraint functions) and an auxiliary heuristic. We use this adaptive D-RHC method for control of mesh-networked swarm agents. This formulation allows a wide range of tasks to be encoded and can account for network delays, heterogeneous capabilities, and increasingly large swarms through the adaptation mechanism. We leverage the Unity3D game engine to build a simulator capable of introducing artificial networking failures and delays in the swarm. Using the simulator we validate our method on an example coordinated exploration task. We demonstrate that cost adaptation allows for more efficient and safer task completion under varying environment conditions and increasingly large swarm sizes. We release our simulator and code to the community for future work. |
Tasks | Meta-Learning |
Published | 2017-09-21 |
URL | http://arxiv.org/abs/1709.07114v2 |
PDF | http://arxiv.org/pdf/1709.07114v2.pdf |
PWC | https://paperswithcode.com/paper/cost-adaptation-for-robust-decentralized |
Repo | https://github.com/Breakend/SocraticSwarm |
Framework | none |
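The abstract leaves the adaptation mechanism abstract, so the following is only a schematic reading: the D-RHC objective is a weighted combination of human-supplied cost and constraint priors, with weights nudged by an auxiliary heuristic. Every name here is hypothetical:

```python
def adapted_cost(state, plan, priors, weights):
    # priors: list of callables cost_i(state, plan); weights: list of floats.
    # The optimization objective D-RHC solves is their weighted sum.
    return sum(w * cost(state, plan) for w, cost in zip(weights, priors))

def adapt_weights(weights, heuristic_scores, lr=0.1):
    # Nudge weight mass toward the priors the auxiliary heuristic
    # currently favours (one of many possible adaptation rules).
    total = sum(heuristic_scores)
    return [(1 - lr) * w + lr * s / total
            for w, s in zip(weights, heuristic_scores)]
```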
Learning Discrete Representations via Information Maximizing Self-Augmented Training
Title | Learning Discrete Representations via Information Maximizing Self-Augmented Training |
Authors | Weihua Hu, Takeru Miyato, Seiya Tokui, Eiichi Matsumoto, Masashi Sugiyama |
Abstract | Learning discrete representations of data is a central machine learning task because of the compactness of the representations and ease of interpretation. The task includes clustering and hash learning as special cases. Deep neural networks are promising for this purpose because they can model the non-linearity of data and scale to large datasets. However, their model complexity is huge, and therefore we need to carefully regularize the networks in order to learn useful representations that exhibit intended invariance for applications of interest. To this end, we propose a method called Information Maximizing Self-Augmented Training (IMSAT). In IMSAT, we use data augmentation to impose invariance on discrete representations. More specifically, we encourage the predicted representations of augmented data points to be close to those of the original data points in an end-to-end fashion. At the same time, we maximize the information-theoretic dependency between data and their predicted discrete representations. Extensive experiments on benchmark datasets show that IMSAT produces state-of-the-art results for both clustering and unsupervised hash learning. |
Tasks | Data Augmentation, Unsupervised Image Classification |
Published | 2017-02-28 |
URL | http://arxiv.org/abs/1702.08720v3 |
PDF | http://arxiv.org/pdf/1702.08720v3.pdf |
PWC | https://paperswithcode.com/paper/learning-discrete-representations-via |
Repo | https://github.com/weihua916/imsat |
Framework | none |
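A compact sketch of the IMSAT objective as described: maximize the information-theoretic dependency I(X;Y) = H(Y) - H(Y|X) between inputs and predicted discrete labels, while a self-augmented training term keeps predictions invariant under augmentation. This PyTorch version is our simplified reading (names and the weighting scheme are assumptions):

```python
import torch
import torch.nn.functional as F

def imsat_loss(logits, aug_logits, mi_weight=0.1):
    # logits / aug_logits: classifier outputs on original and augmented
    # data points (same classifier, same batch order).
    p = F.softmax(logits, dim=1)
    p_mean = p.mean(dim=0)                                # marginal p(y)
    h_y = -(p_mean * torch.log(p_mean + 1e-8)).sum()      # H(Y)
    h_y_x = -(p * torch.log(p + 1e-8)).sum(dim=1).mean()  # H(Y|X)
    # Self-augmented training: keep augmented predictions close
    # to the (fixed) predictions on the original points.
    sat = F.kl_div(F.log_softmax(aug_logits, dim=1), p.detach(),
                   reduction="batchmean")
    return sat - mi_weight * (h_y - h_y_x)  # minimize => maximize I(X;Y)
```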
TextureGAN: Controlling Deep Image Synthesis with Texture Patches
Title | TextureGAN: Controlling Deep Image Synthesis with Texture Patches |
Authors | Wenqi Xian, Patsorn Sangkloy, Varun Agrawal, Amit Raj, Jingwan Lu, Chen Fang, Fisher Yu, James Hays |
Abstract | In this paper, we investigate deep image synthesis guided by sketch, color, and texture. Previous image synthesis methods can be controlled by sketch and color strokes but we are the first to examine texture control. We allow a user to place a texture patch on a sketch at arbitrary locations and scales to control the desired output texture. Our generative network learns to synthesize objects consistent with these texture suggestions. To achieve this, we develop a local texture loss in addition to adversarial and content loss to train the generative network. We conduct experiments using sketches generated from real images and textures sampled from a separate texture database and results show that our proposed algorithm is able to generate plausible images that are faithful to user controls. Ablation studies show that our proposed pipeline can generate more realistic images than adapting existing methods directly. |
Tasks | Image Generation, Texture Synthesis |
Published | 2017-06-09 |
URL | http://arxiv.org/abs/1706.02823v3 |
PDF | http://arxiv.org/pdf/1706.02823v3.pdf |
PWC | https://paperswithcode.com/paper/texturegan-controlling-deep-image-synthesis |
Repo | https://github.com/yuchuanhui/TextureGanPython3 |
Framework | pytorch |
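The local texture loss can be pictured as a Gram-matrix comparison between deep features of output patches and of the user's texture patch, as in style-transfer losses. In the hedged sketch below the feature extractor (e.g. a pretrained VGG slice) is assumed, and the paper's exact patch-sampling scheme is omitted:

```python
import torch

def gram(feat):
    # feat: (B, C, H, W) -> per-sample Gram matrix of channel correlations
    b, c, h, w = feat.shape
    f = feat.view(b, c, h * w)
    return f @ f.transpose(1, 2) / (c * h * w)

def local_texture_loss(gen_patch_feats, tex_patch_feats):
    # Match texture statistics of patches cropped from the generated
    # image against the user-supplied texture patch.
    return torch.mean((gram(gen_patch_feats) - gram(tex_patch_feats)) ** 2)
```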
CP-decomposition with Tensor Power Method for Convolutional Neural Networks Compression
Title | CP-decomposition with Tensor Power Method for Convolutional Neural Networks Compression |
Authors | Marcella Astrid, Seung-Ik Lee |
Abstract | Convolutional Neural Networks (CNNs) have shown great success in many areas, including complex image classification tasks. However, they require substantial memory and computation, which hinders them from running on relatively low-end smart devices such as smartphones. We propose a CNN compression method based on CP-decomposition and the Tensor Power Method. We also propose an iterative fine-tuning scheme, in which we fine-tune the whole network after decomposing each layer, but before decomposing the next layer. Significant reductions in memory and computation cost are achieved compared to state-of-the-art previous work, with no additional accuracy loss. |
Tasks | Image Classification |
Published | 2017-01-25 |
URL | http://arxiv.org/abs/1701.07148v1 |
PDF | http://arxiv.org/pdf/1701.07148v1.pdf |
PWC | https://paperswithcode.com/paper/cp-decomposition-with-tensor-power-method-for |
Repo | https://github.com/larry0123du/Decompose-CNN |
Framework | pytorch |
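A NumPy sketch of the core numerical tool, the tensor power method: alternately contract a 3-way tensor against two factor vectors to update the third, then deflate and repeat to collect further rank-1 CP components. This is a generic illustration, not the paper's exact procedure for convolutional kernels:

```python
import numpy as np

def rank1_power_method(T, iters=50):
    # Best rank-1 approximation of a 3-way tensor via alternating
    # power iterations on the three factor vectors.
    a = np.random.randn(T.shape[0])
    b = np.random.randn(T.shape[1])
    c = np.random.randn(T.shape[2])
    for _ in range(iters):
        a = np.einsum('ijk,j,k->i', T, b, c); a /= np.linalg.norm(a)
        b = np.einsum('ijk,i,k->j', T, a, c); b /= np.linalg.norm(b)
        c = np.einsum('ijk,i,j->k', T, a, b); c /= np.linalg.norm(c)
    lam = np.einsum('ijk,i,j,k->', T, a, b, c)  # component weight
    return lam, a, b, c

def greedy_cp(T, rank):
    # Greedy CP: extract a rank-1 term, subtract it, repeat.
    comps, R = [], T.copy()
    for _ in range(rank):
        lam, a, b, c = rank1_power_method(R)
        comps.append((lam, a, b, c))
        R = R - lam * np.einsum('i,j,k->ijk', a, b, c)  # deflate
    return comps
```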
Learning to Represent Programs with Graphs
Title | Learning to Represent Programs with Graphs |
Authors | Miltiadis Allamanis, Marc Brockschmidt, Mahmoud Khademi |
Abstract | Learning tasks on source code (i.e., formal languages) have been considered recently, but most work has tried to transfer natural language methods and does not capitalize on the unique opportunities offered by code’s known syntax. For example, long-range dependencies induced by using the same variable or function in distant locations are often not considered. We propose to use graphs to represent both the syntactic and semantic structure of code and use graph-based deep learning methods to learn to reason over program structures. In this work, we present how to construct graphs from source code and how to scale Gated Graph Neural Networks training to such large graphs. We evaluate our method on two tasks: VarNaming, in which a network attempts to predict the name of a variable given its usage, and VarMisuse, in which the network learns to reason about selecting the correct variable that should be used at a given program location. Our comparison to methods that use less structured program representations shows the advantages of modeling known structure, and suggests that our models learn to infer meaningful names and to solve the VarMisuse task in many cases. Additionally, our testing showed that VarMisuse identifies a number of bugs in mature open-source projects. |
Tasks | |
Published | 2017-11-01 |
URL | http://arxiv.org/abs/1711.00740v3 |
PDF | http://arxiv.org/pdf/1711.00740v3.pdf |
PWC | https://paperswithcode.com/paper/learning-to-represent-programs-with-graphs |
Repo | https://github.com/kano1021/my-internship |
Framework | none |
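The heavy lifting is done by Gated Graph Neural Networks; one propagation step looks roughly like the hedged PyTorch sketch below, with edge-type-specific message functions and a GRU state update. Sizes and the edge encoding are our assumptions:

```python
import torch
import torch.nn as nn

class GGNNStep(nn.Module):
    def __init__(self, dim=64, num_edge_types=3):
        super().__init__()
        # One linear message function per edge type (e.g. syntax child,
        # next token, same-variable use in a program graph).
        self.msg = nn.ModuleList(
            [nn.Linear(dim, dim) for _ in range(num_edge_types)])
        self.gru = nn.GRUCell(dim, dim)

    def forward(self, h, edges):
        # h: (N, dim) node states; edges: list of (src, dst, etype) with
        # src/dst LongTensors of node indices and etype a Python int.
        agg = torch.zeros_like(h)
        for src, dst, etype in edges:
            agg.index_add_(0, dst, self.msg[etype](h[src]))
        return self.gru(agg, h)  # gated update of every node state
```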
MLC Toolbox: A MATLAB/OCTAVE Library for Multi-Label Classification
Title | MLC Toolbox: A MATLAB/OCTAVE Library for Multi-Label Classification |
Authors | Keigo Kimura, Lu Sun, Mineichi Kudo |
Abstract | Multi-Label Classification toolbox is a MATLAB/OCTAVE library for Multi-Label Classification (MLC). There exist a few Java libraries for MLC, but no MATLAB/OCTAVE library that covers a variety of methods. This toolbox offers an environment for evaluation, comparison and visualization of MLC results. One attraction of this toolbox is that it enables us to try many combinations of feature-space dimension reduction, sample clustering, label-space dimension reduction, ensembling, and so on. |
Tasks | Dimensionality Reduction, Multi-Label Classification |
Published | 2017-04-09 |
URL | http://arxiv.org/abs/1704.02592v1 |
PDF | http://arxiv.org/pdf/1704.02592v1.pdf |
PWC | https://paperswithcode.com/paper/mlc-toolbox-a-matlaboctave-library-for-multi |
Repo | https://github.com/hinanmu/multi-label-papers |
Framework | none |
A practical guide and software for analysing pairwise comparison experiments
Title | A practical guide and software for analysing pairwise comparison experiments |
Authors | Maria Perez-Ortiz, Rafal K. Mantiuk |
Abstract | Most popular strategies to capture subjective judgments from humans involve the construction of a unidimensional relative measurement scale, representing order preferences or judgments about a set of objects or conditions. This information is generally captured by means of direct scoring, either in the form of a Likert or cardinal scale, or by comparative judgments in pairs or sets. In this sense, the use of pairwise comparisons is becoming increasingly popular because of the simplicity of this experimental procedure. However, this strategy requires non-trivial data analysis to aggregate the comparison ranks into a quality scale and analyse the results, in order to take full advantage of the collected data. This paper explains the process of translating pairwise comparison data into a measurement scale, discusses the benefits and limitations of such scaling methods and introduces publicly available Matlab software. We improve on existing scaling methods by introducing outlier analysis, providing methods for computing confidence intervals and for statistical testing, and introducing a prior which reduces estimation error when the number of observers is low. Most of our examples focus on image quality assessment. |
Tasks | Image Quality Assessment |
Published | 2017-12-11 |
URL | http://arxiv.org/abs/1712.03686v2 |
PDF | http://arxiv.org/pdf/1712.03686v2.pdf |
PWC | https://paperswithcode.com/paper/a-practical-guide-and-software-for-analysing |
Repo | https://github.com/mantiuk/pwcmp |
Framework | none |
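The scaling step can be illustrated with a maximum-likelihood fit of a Thurstone Case V model to a comparison-count matrix, which is the general idea the paper's Matlab toolbox implements. The Python/SciPy sketch below simplifies away the outlier analysis, prior, and confidence intervals the paper adds:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

def scale_pairwise(C):
    # C[i, j]: number of times condition i was preferred over condition j.
    n = C.shape[0]

    def neg_log_lik(q):
        q = np.concatenate([[0.0], q])   # pin q[0] = 0 as the reference
        d = q[:, None] - q[None, :]      # pairwise score differences
        p = norm.cdf(d)                  # Thurstone Case V: P(i beats j)
        return -np.sum(C * np.log(p + 1e-12))

    res = minimize(neg_log_lik, np.zeros(n - 1), method="L-BFGS-B")
    return np.concatenate([[0.0], res.x])  # quality scores, JND-like units
```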
Network Embedding as Matrix Factorization: Unifying DeepWalk, LINE, PTE, and node2vec
Title | Network Embedding as Matrix Factorization: Unifying DeepWalk, LINE, PTE, and node2vec |
Authors | Jiezhong Qiu, Yuxiao Dong, Hao Ma, Jian Li, Kuansan Wang, Jie Tang |
Abstract | Since the invention of word2vec, the skip-gram model has significantly advanced the research of network embedding, such as the recent emergence of the DeepWalk, LINE, PTE, and node2vec approaches. In this work, we show that all of the aforementioned models with negative sampling can be unified into the matrix factorization framework with closed forms. Our analysis and proofs reveal that: (1) DeepWalk empirically produces a low-rank transformation of a network’s normalized Laplacian matrix; (2) LINE, in theory, is a special case of DeepWalk when the size of vertices’ context is set to one; (3) As an extension of LINE, PTE can be viewed as the joint factorization of multiple networks’ Laplacians; (4) node2vec factorizes a matrix related to the stationary distribution and transition probability tensor of a 2nd-order random walk. We further provide the theoretical connections between skip-gram based network embedding algorithms and the theory of graph Laplacian. Finally, we present the NetMF method as well as its approximation algorithm for computing network embedding. Our method offers significant improvements over DeepWalk and LINE for conventional network mining tasks. This work lays the theoretical foundation for skip-gram based network embedding methods, leading to a better understanding of latent network representation learning. |
Tasks | Network Embedding, Representation Learning |
Published | 2017-10-09 |
URL | http://arxiv.org/abs/1710.02971v4 |
PDF | http://arxiv.org/pdf/1710.02971v4.pdf |
PWC | https://paperswithcode.com/paper/network-embedding-as-matrix-factorization |
Repo | https://github.com/benedekrozemberczki/karateclub |
Framework | none |
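For small graphs, the NetMF construction can be written down directly: form M = vol(G)/(bT) * (sum_{r=1}^{T} P^r) D^{-1} with P = D^{-1}A, take the element-wise truncated logarithm, and factorize by SVD. A dense NumPy sketch (the paper also derives an approximation algorithm for large graphs, omitted here):

```python
import numpy as np

def netmf_embed(A, dim=32, T=10, b=1.0):
    # A: dense adjacency matrix (floats, no isolated nodes assumed).
    deg = A.sum(axis=1)
    vol = deg.sum()
    P = A / deg[:, None]              # random-walk matrix D^{-1} A
    S, Pr = np.zeros_like(P), np.eye(A.shape[0])
    for _ in range(T):                # accumulate sum of P^r, r = 1..T
        Pr = Pr @ P
        S += Pr
    M = (vol / (b * T)) * S / deg[None, :]   # right-multiply by D^{-1}
    logM = np.log(np.maximum(M, 1.0))        # truncated logarithm
    U, s, _ = np.linalg.svd(logM)
    return U[:, :dim] * np.sqrt(s[:dim])     # embedding = U_d sqrt(S_d)
```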
Curriculum Domain Adaptation for Semantic Segmentation of Urban Scenes
Title | Curriculum Domain Adaptation for Semantic Segmentation of Urban Scenes |
Authors | Yang Zhang, Philip David, Boqing Gong |
Abstract | During the last half decade, convolutional neural networks (CNNs) have triumphed over semantic segmentation, which is one of the core tasks in many applications such as autonomous driving. However, to train CNNs requires a considerable amount of data, which is difficult to collect and laborious to annotate. Recent advances in computer graphics make it possible to train CNNs on photo-realistic synthetic imagery with computer-generated annotations. Despite this, the domain mismatch between the real images and the synthetic data cripples the models’ performance. Hence, we propose a curriculum-style learning approach to minimize the domain gap in urban scenery semantic segmentation. The curriculum domain adaptation solves easy tasks first to infer necessary properties about the target domain; in particular, the first task is to learn global label distributions over images and local distributions over landmark superpixels. These are easy to estimate because images of urban scenes have strong idiosyncrasies (e.g., the size and spatial relations of buildings, streets, cars, etc.). We then train a segmentation network while regularizing its predictions in the target domain to follow those inferred properties. In experiments, our method outperforms the baselines on two datasets and two backbone networks. We also report extensive ablation studies about our approach. |
Tasks | Autonomous Driving, Domain Adaptation, Image-to-Image Translation, Semantic Segmentation, Synthetic-to-Real Translation |
Published | 2017-07-29 |
URL | http://arxiv.org/abs/1707.09465v5 |
PDF | http://arxiv.org/pdf/1707.09465v5.pdf |
PWC | https://paperswithcode.com/paper/curriculum-domain-adaptation-for-semantic |
Repo | https://github.com/YangZhang4065/AdaptationSeg |
Framework | tf |
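One way to picture the regularization described in the abstract: penalize divergence between the segmentation network's predicted label frequencies on a target image and the frequencies inferred by the easy first task. The PyTorch sketch below is our simplified reading, not the released code:

```python
import torch.nn.functional as F

def label_dist_loss(seg_logits, target_dist):
    # seg_logits: (C, H, W) per-pixel class scores on a target image;
    # target_dist: (C,) global label frequencies inferred by the easy task.
    p = F.softmax(seg_logits, dim=0).mean(dim=(1, 2))  # predicted frequencies
    # KL(target_dist || p): regularize predictions toward inferred stats.
    return F.kl_div(p.log(), target_dist, reduction="sum")
```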
Reluplex: An Efficient SMT Solver for Verifying Deep Neural Networks
Title | Reluplex: An Efficient SMT Solver for Verifying Deep Neural Networks |
Authors | Guy Katz, Clark Barrett, David Dill, Kyle Julian, Mykel Kochenderfer |
Abstract | Deep neural networks have emerged as a widely used and effective means for tackling complex, real-world problems. However, a major obstacle in applying them to safety-critical systems is the great difficulty in providing formal guarantees about their behavior. We present a novel, scalable, and efficient technique for verifying properties of deep neural networks (or providing counter-examples). The technique is based on the simplex method, extended to handle the non-convex Rectified Linear Unit (ReLU) activation function, which is a crucial ingredient in many modern neural networks. The verification procedure tackles neural networks as a whole, without making any simplifying assumptions. We evaluated our technique on a prototype deep neural network implementation of the next-generation airborne collision avoidance system for unmanned aircraft (ACAS Xu). Results show that our technique can successfully prove properties of networks that are an order of magnitude larger than the largest networks verified using existing methods. |
Tasks | |
Published | 2017-02-03 |
URL | http://arxiv.org/abs/1702.01135v2 |
PDF | http://arxiv.org/pdf/1702.01135v2.pdf |
PWC | https://paperswithcode.com/paper/reluplex-an-efficient-smt-solver-for |
Repo | https://github.com/eth-sri/eran |
Framework | tf |
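Each constraint y = ReLU(x) induces a disjunction of two linear phases; naive verification enumerates all 2^n phase combinations, which Reluplex avoids by handling ReLUs lazily inside an extended simplex procedure. The toy recursion below shows only the underlying case split that makes the problem hard; the LP oracle `linear_feasible` is assumed, and constraints are plain strings for illustration:

```python
def split_relus(relu_pairs, linear_feasible, constraints=()):
    # relu_pairs: list of (x_name, y_name) with y = ReLU(x);
    # linear_feasible: assumed oracle checking a tuple of linear
    # constraints (plus the network's other constraints) for satisfiability.
    if not relu_pairs:
        return linear_feasible(constraints)
    (x, y), rest = relu_pairs[0], relu_pairs[1:]
    active = constraints + ((f"{x} >= 0", f"{y} == {x}"),)   # active phase
    inactive = constraints + ((f"{x} <= 0", f"{y} == 0"),)   # inactive phase
    return (split_relus(rest, linear_feasible, active)
            or split_relus(rest, linear_feasible, inactive))
```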
Acquiring Common Sense Spatial Knowledge through Implicit Spatial Templates
Title | Acquiring Common Sense Spatial Knowledge through Implicit Spatial Templates |
Authors | Guillem Collell, Luc Van Gool, Marie-Francine Moens |
Abstract | Spatial understanding is a fundamental problem with wide-reaching real-world applications. The representation of spatial knowledge is often modeled with spatial templates, i.e., regions of acceptability of two objects under an explicit spatial relationship (e.g., “on”, “below”, etc.). In contrast with prior work that restricts spatial templates to explicit spatial prepositions (e.g., “glass on table”), here we extend this concept to implicit spatial language, i.e., those relationships (generally actions) for which the spatial arrangement of the objects is only implicitly implied (e.g., “man riding horse”). In contrast with explicit relationships, predicting spatial arrangements from implicit spatial language requires significant common sense spatial understanding. Here, we introduce the task of predicting spatial templates for two objects under a relationship, which can be seen as a spatial question-answering task with a (2D) continuous output (“where is the man w.r.t. a horse when the man is walking the horse?"). We present two simple neural-based models that leverage annotated images and structured text to learn this task. The good performance of these models reveals that spatial locations are to a large extent predictable from implicit spatial language. Crucially, the models attain similar performance in a challenging generalized setting, where the object-relation-object combinations (e.g.,“man walking dog”) have never been seen before. Next, we go one step further by presenting the models with unseen objects (e.g., “dog”). In this scenario, we show that leveraging word embeddings enables the models to output accurate spatial predictions, proving that the models acquire solid common sense spatial knowledge allowing for such generalization. |
Tasks | Common Sense Reasoning, Question Answering, Word Embeddings |
Published | 2017-11-18 |
URL | https://arxiv.org/abs/1711.06821v3 |
PDF | https://arxiv.org/pdf/1711.06821v3.pdf |
PWC | https://paperswithcode.com/paper/acquiring-common-sense-spatial-knowledge |
Repo | https://github.com/gcollell/spatial-commonsense |
Framework | tf |
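In its simplest form, the task maps word embeddings for (object, relation, reference object) to a 2D spatial template; the hedged PyTorch sketch below reduces the template to a predicted mean offset. Dimensions are assumptions, and the paper's models additionally exploit structured annotations from images:

```python
import torch
import torch.nn as nn

class SpatialTemplateNet(nn.Module):
    def __init__(self, emb_dim=300):
        super().__init__()
        # MLP over concatenated (object, relation, reference) embeddings.
        self.mlp = nn.Sequential(
            nn.Linear(3 * emb_dim, 128), nn.ReLU(), nn.Linear(128, 2))

    def forward(self, obj_emb, rel_emb, ref_emb):
        # Output: predicted 2D offset of the object w.r.t. the reference.
        return self.mlp(torch.cat([obj_emb, rel_emb, ref_emb], dim=-1))
```

Because the inputs are word embeddings, unseen objects can still yield sensible predictions, which is the generalization the abstract highlights.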
Improved Training of Wasserstein GANs
Title | Improved Training of Wasserstein GANs |
Authors | Ishaan Gulrajani, Faruk Ahmed, Martin Arjovsky, Vincent Dumoulin, Aaron Courville |
Abstract | Generative Adversarial Networks (GANs) are powerful generative models, but suffer from training instability. The recently proposed Wasserstein GAN (WGAN) makes progress toward stable training of GANs, but sometimes can still generate only low-quality samples or fail to converge. We find that these problems are often due to the use of weight clipping in WGAN to enforce a Lipschitz constraint on the critic, which can lead to undesired behavior. We propose an alternative to clipping weights: penalize the norm of gradient of the critic with respect to its input. Our proposed method performs better than standard WGAN and enables stable training of a wide variety of GAN architectures with almost no hyperparameter tuning, including 101-layer ResNets and language models over discrete data. We also achieve high quality generations on CIFAR-10 and LSUN bedrooms. |
Tasks | Conditional Image Generation, Image Generation |
Published | 2017-03-31 |
URL | http://arxiv.org/abs/1704.00028v3 |
PDF | http://arxiv.org/pdf/1704.00028v3.pdf |
PWC | https://paperswithcode.com/paper/improved-training-of-wasserstein-gans |
Repo | https://github.com/adler-j/bwgan |
Framework | tf |
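The proposed penalty is easy to state in code: sample points on straight lines between real and generated examples and push the critic's gradient norm at those points toward 1. A short PyTorch sketch (tensor shapes assume image batches):

```python
import torch

def gradient_penalty(critic, real, fake, lam=10.0):
    # Random interpolation points between real and fake samples.
    eps = torch.rand(real.size(0), 1, 1, 1, device=real.device)
    x_hat = (eps * real + (1 - eps) * fake).requires_grad_(True)
    # Gradient of the critic's output w.r.t. the interpolated inputs.
    grads = torch.autograd.grad(critic(x_hat).sum(), x_hat,
                                create_graph=True)[0]
    gnorm = grads.view(grads.size(0), -1).norm(2, dim=1)
    # Two-sided penalty: keep the gradient norm close to 1.
    return lam * ((gnorm - 1) ** 2).mean()
```

This term is added to the standard WGAN critic loss in place of weight clipping, which is what stabilizes training across architectures.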