Paper Group AWR 92
Visual Attribute Transfer through Deep Image Analogy
Title | Visual Attribute Transfer through Deep Image Analogy |
Authors | Jing Liao, Yuan Yao, Lu Yuan, Gang Hua, Sing Bing Kang |
Abstract | We propose a new technique for visual attribute transfer across images that may have very different appearance but have perceptually similar semantic structure. By visual attribute transfer, we mean transfer of visual information (such as color, tone, texture, and style) from one image to another. For example, one image could be that of a painting or a sketch while the other is a photo of a real scene, and both depict the same type of scene. Our technique finds semantically-meaningful dense correspondences between two input images. To accomplish this, it adapts the notion of “image analogy” with features extracted from a Deep Convolutional Neural Network for matching; we call our technique Deep Image Analogy. A coarse-to-fine strategy is used to compute the nearest-neighbor field for generating the results. We validate the effectiveness of our proposed method in a variety of cases, including style/texture transfer, color/style swap, sketch/painting to photo, and time lapse. |
Tasks | |
Published | 2017-05-02 |
URL | http://arxiv.org/abs/1705.01088v2 |
PDF | http://arxiv.org/pdf/1705.01088v2.pdf |
PWC | https://paperswithcode.com/paper/visual-attribute-transfer-through-deep-image |
Repo | https://github.com/Ben-Louis/Deep-Image-Analogy-PyTorch |
Framework | pytorch |
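To make the matching step above concrete: the sketch below computes a brute-force dense nearest-neighbor field between two CNN feature maps using cosine similarity. This is only an illustration of the idea; the paper's Deep Image Analogy uses a PatchMatch-style search over deep features with coarse-to-fine refinement and bidirectional constraints, none of which are reproduced here, and the random arrays stand in for real VGG activations.

```python
import numpy as np

def nearest_neighbor_field(feat_a, feat_b):
    """Brute-force NN field between two feature maps of shape (H, W, C).

    Returns an (H, W, 2) array mapping each position in feat_a to the (row, col)
    of its most similar feature vector in feat_b (cosine similarity).
    """
    h, w, c = feat_a.shape
    a = feat_a.reshape(-1, c)
    b = feat_b.reshape(-1, c)
    a = a / (np.linalg.norm(a, axis=1, keepdims=True) + 1e-8)
    b = b / (np.linalg.norm(b, axis=1, keepdims=True) + 1e-8)
    sim = a @ b.T                      # (H*W, H*W) cosine similarities
    idx = sim.argmax(axis=1)           # best match in feat_b for each position in feat_a
    return np.stack(np.unravel_index(idx, (h, w)), axis=-1).reshape(h, w, 2)

# Toy usage with random "conv features" standing in for real network activations.
nnf = nearest_neighbor_field(np.random.rand(14, 14, 256), np.random.rand(14, 14, 256))
```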
Fader Networks: Manipulating Images by Sliding Attributes
Title | Fader Networks: Manipulating Images by Sliding Attributes |
Authors | Guillaume Lample, Neil Zeghidour, Nicolas Usunier, Antoine Bordes, Ludovic Denoyer, Marc’Aurelio Ranzato |
Abstract | This paper introduces a new encoder-decoder architecture that is trained to reconstruct images by disentangling the salient information of the image and the values of attributes directly in the latent space. As a result, after training, our model can generate different realistic versions of an input image by varying the attribute values. By using continuous attribute values, we can choose how much a specific attribute is perceivable in the generated image. This property could allow for applications where users can modify an image using sliding knobs, like faders on a mixing console, to change the facial expression of a portrait, or to update the color of some objects. Compared to the state-of-the-art which mostly relies on training adversarial networks in pixel space by altering attribute values at train time, our approach results in much simpler training schemes and nicely scales to multiple attributes. We present evidence that our model can significantly change the perceived value of the attributes while preserving the naturalness of images. |
Tasks | |
Published | 2017-06-01 |
URL | http://arxiv.org/abs/1706.00409v2 |
PDF | http://arxiv.org/pdf/1706.00409v2.pdf |
PWC | https://paperswithcode.com/paper/fader-networks-manipulating-images-by-sliding |
Repo | https://github.com/facebookresearch/FaderNetworks |
Framework | pytorch |
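A minimal PyTorch sketch of the fader idea described above: a discriminator tries to predict the attribute from the latent code, the encoder is trained to fool it so the code becomes attribute-invariant, and the decoder receives the attribute value explicitly. The layer sizes, loss weights, and flattened-image input are placeholder assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

latent_dim, attr_dim, img_dim = 128, 1, 3 * 64 * 64   # assumed toy sizes

enc = nn.Sequential(nn.Linear(img_dim, 512), nn.ReLU(), nn.Linear(512, latent_dim))
dec = nn.Sequential(nn.Linear(latent_dim + attr_dim, 512), nn.ReLU(), nn.Linear(512, img_dim))
disc = nn.Sequential(nn.Linear(latent_dim, 128), nn.ReLU(), nn.Linear(128, attr_dim))

opt_ae = torch.optim.Adam(list(enc.parameters()) + list(dec.parameters()), lr=2e-4)
opt_d = torch.optim.Adam(disc.parameters(), lr=2e-4)

def train_step(x, attr, lam=0.1):
    """x: flattened images (B, img_dim); attr: attribute values in [0, 1] of shape (B, attr_dim)."""
    z = enc(x)

    # 1) Discriminator learns to recover the attribute from the latent code.
    d_loss = F.binary_cross_entropy_with_logits(disc(z.detach()), attr)
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # 2) Autoencoder: reconstruct the image while making the latent code
    #    uninformative about the attribute (fool the discriminator).
    recon = dec(torch.cat([z, attr], dim=1))
    fool = F.binary_cross_entropy_with_logits(disc(z), 1.0 - attr)
    ae_loss = F.mse_loss(recon, x) + lam * fool
    opt_ae.zero_grad(); ae_loss.backward(); opt_ae.step()
    return d_loss.item(), ae_loss.item()

# At test time, sweeping the attribute value fed to the decoder acts as the "fader".
```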
ZOOpt: Toolbox for Derivative-Free Optimization
Title | ZOOpt: Toolbox for Derivative-Free Optimization |
Authors | Yu-Ren Liu, Yi-Qi Hu, Hong Qian, Yang Yu, Chao Qian |
Abstract | Recent advances in derivative-free optimization allow efficient approximation of the globally optimal solutions of sophisticated functions, such as functions with many local optima and non-differentiable or non-continuous functions. This article describes the ZOOpt (https://github.com/eyounx/ZOOpt) toolbox, which provides efficient derivative-free solvers and is designed to be easy to use. ZOOpt provides a Python package for single-thread optimization, and a lightweight distributed version, built with the help of the Julia language, for functions described in Python. The ZOOpt toolbox particularly focuses on optimization problems in machine learning, addressing high-dimensional, noisy, and large-scale problems. The toolbox is being maintained as a ready-to-use tool for real-world machine learning tasks. |
Tasks | |
Published | 2017-12-31 |
URL | http://arxiv.org/abs/1801.00329v2 |
PDF | http://arxiv.org/pdf/1801.00329v2.pdf |
PWC | https://paperswithcode.com/paper/zoopt-toolbox-for-derivative-free |
Repo | https://github.com/eyounx/ZOOpt |
Framework | none |
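A minimal usage sketch of the Python package, following the toolbox's documented Dimension / Objective / Parameter / Opt interface; exact signatures may differ between ZOOpt versions, and the sphere objective is just an illustration.

```python
import numpy as np
from zoopt import Dimension, Objective, Parameter, Opt

def sphere(solution):
    """Objective to minimize; ZOOpt passes a Solution object holding the vector x."""
    x = np.array(solution.get_x())
    return float(np.sum(x ** 2))

dim = 100
# 100 continuous variables, each constrained to the region [-1, 1].
objective = Objective(sphere, Dimension(dim, [[-1, 1]] * dim, [True] * dim))
solution = Opt.min(objective, Parameter(budget=100 * dim))   # budget = number of evaluations
print(solution.get_x(), solution.get_value())
```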
Simplified Gating in Long Short-term Memory (LSTM) Recurrent Neural Networks
Title | Simplified Gating in Long Short-term Memory (LSTM) Recurrent Neural Networks |
Authors | Yuzhen Lu, Fathi M. Salem |
Abstract | Standard LSTM recurrent neural networks, while very powerful in long-range dependency sequence applications, have a highly complex structure and a relatively large number of (adaptive) parameters. In this work, we present an empirical comparison between the standard LSTM recurrent neural network architecture and three new parameter-reduced variants obtained by eliminating combinations of the input signal, bias, and hidden unit signals from the individual gating signals. Experiments on two sequence datasets show that the three new variants, referred to simply as LSTM1, LSTM2, and LSTM3, can achieve performance comparable to the standard LSTM model with fewer (adaptive) parameters. |
Tasks | |
Published | 2017-01-12 |
URL | http://arxiv.org/abs/1701.03441v1 |
PDF | http://arxiv.org/pdf/1701.03441v1.pdf |
PWC | https://paperswithcode.com/paper/simplified-gating-in-long-short-term-memory |
Repo | https://github.com/jingweimo/Modified-LSTM |
Framework | none |
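To make the parameter reduction concrete, the sketch below implements a standard LSTM cell next to one illustrative reduced variant in which the gates are driven only by the previous hidden state and a bias (the input-to-gate terms are dropped). The precise definitions of LSTM1, LSTM2, and LSTM3 are given in the paper; this is an assumed example of the general idea, not a reproduction of those variants.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h, c, W, U, b):
    """Standard LSTM cell: every gate sees the input x, the hidden state h, and a bias."""
    i = sigmoid(W["i"] @ x + U["i"] @ h + b["i"])
    f = sigmoid(W["f"] @ x + U["f"] @ h + b["f"])
    o = sigmoid(W["o"] @ x + U["o"] @ h + b["o"])
    g = np.tanh(W["g"] @ x + U["g"] @ h + b["g"])
    c_new = f * c + i * g
    return o * np.tanh(c_new), c_new

def reduced_lstm_step(x, h, c, W, U, b):
    """Illustrative reduced variant: gates use only h and b, removing the W @ x terms
    and hence the input-to-gate weight matrices."""
    i = sigmoid(U["i"] @ h + b["i"])
    f = sigmoid(U["f"] @ h + b["f"])
    o = sigmoid(U["o"] @ h + b["o"])
    g = np.tanh(W["g"] @ x + U["g"] @ h + b["g"])   # candidate update still sees the input
    c_new = f * c + i * g
    return o * np.tanh(c_new), c_new

# Toy initialization and one step.
n_in, n_h = 8, 16
rng = np.random.default_rng(0)
W = {k: rng.normal(scale=0.1, size=(n_h, n_in)) for k in "ifog"}
U = {k: rng.normal(scale=0.1, size=(n_h, n_h)) for k in "ifog"}
b = {k: np.zeros(n_h) for k in "ifog"}
h, c = reduced_lstm_step(rng.normal(size=n_in), np.zeros(n_h), np.zeros(n_h), W, U, b)
```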
Molecular De Novo Design through Deep Reinforcement Learning
Title | Molecular De Novo Design through Deep Reinforcement Learning |
Authors | Marcus Olivecrona, Thomas Blaschke, Ola Engkvist, Hongming Chen |
Abstract | This work introduces a method to tune a sequence-based generative model for molecular de novo design that, through augmented episodic likelihood, can learn to generate structures with certain specified desirable properties. We demonstrate how this model can execute a range of tasks such as generating analogues to a query structure and generating compounds predicted to be active against a biological target. As a proof of principle, the model is first trained to generate molecules that do not contain sulphur. As a second example, the model is trained to generate analogues to the drug Celecoxib, a technique that could be used for scaffold hopping or library expansion starting from a single molecule. Finally, when tuning the model towards generating compounds predicted to be active against the dopamine receptor type 2, the model generates structures of which more than 95% are predicted to be active, including experimentally confirmed actives that were not included in either the generative model or the activity prediction model. |
Tasks | Activity Prediction |
Published | 2017-04-25 |
URL | http://arxiv.org/abs/1704.07555v2 |
PDF | http://arxiv.org/pdf/1704.07555v2.pdf |
PWC | https://paperswithcode.com/paper/molecular-de-novo-design-through-deep |
Repo | https://github.com/MarcusOlivecrona/REINVENT |
Framework | pytorch |
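The heart of the fine-tuning scheme is the augmented episodic likelihood: the prior's log-likelihood of a sampled sequence is augmented by a scaled score, and the agent is trained to match it. A PyTorch sketch of that loss is below; the scoring function, the RNN sampler, and the value of sigma are placeholders rather than the paper's settings.

```python
import torch

def reinvent_loss(agent_logp, prior_logp, score, sigma=60.0):
    """Augmented-likelihood loss for one batch of sampled SMILES sequences.

    agent_logp, prior_logp: per-sequence log-likelihoods under agent / fixed prior, shape (B,)
    score: per-sequence desirability in [0, 1] from a user-defined scoring function, shape (B,)
    """
    augmented_logp = prior_logp + sigma * score          # augmented episodic likelihood
    return torch.mean((augmented_logp - agent_logp) ** 2)

# Toy usage with random stand-ins for the real sampled-sequence quantities.
B = 4
loss = reinvent_loss(torch.randn(B, requires_grad=True), torch.randn(B), torch.rand(B))
loss.backward()
```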
Towards Faster Training of Global Covariance Pooling Networks by Iterative Matrix Square Root Normalization
Title | Towards Faster Training of Global Covariance Pooling Networks by Iterative Matrix Square Root Normalization |
Authors | Peihua Li, Jiangtao Xie, Qilong Wang, Zilin Gao |
Abstract | Global covariance pooling in convolutional neural networks has achieved impressive improvement over the classical first-order pooling. Recent works have shown that matrix square root normalization plays a central role in achieving state-of-the-art performance. However, existing methods depend heavily on eigendecomposition (EIG) or singular value decomposition (SVD), suffering from inefficient training due to limited support of EIG and SVD on GPU. Towards addressing this problem, we propose an iterative matrix square root normalization method for fast end-to-end training of global covariance pooling networks. At the core of our method is a meta-layer designed with a loop-embedded directed graph structure. The meta-layer consists of three consecutive nonlinear structured layers, which perform pre-normalization, coupled matrix iteration and post-compensation, respectively. Our method is much faster than EIG- or SVD-based ones, since it involves only matrix multiplications, suitable for parallel implementation on GPU. Moreover, the proposed network with ResNet architecture can converge in many fewer epochs, further accelerating network training. On large-scale ImageNet, we achieve competitive performance superior to that of existing counterparts. By finetuning our models pre-trained on ImageNet, we establish state-of-the-art results on three challenging fine-grained benchmarks. The source code and network models will be available at http://www.peihuali.org/iSQRT-COV |
Tasks | Fine-Grained Image Classification, Fine-Grained Image Recognition, Image Classification |
Published | 2017-12-04 |
URL | http://arxiv.org/abs/1712.01034v2 |
PDF | http://arxiv.org/pdf/1712.01034v2.pdf |
PWC | https://paperswithcode.com/paper/towards-faster-training-of-global-covariance |
Repo | https://github.com/jiangtaoxie/fast-MPN-COV |
Framework | pytorch |
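The coupled matrix iteration referred to above is the Newton-Schulz iteration for the matrix square root, wrapped between trace-based pre-normalization and post-compensation so that the iteration converges. A NumPy sketch of the forward computation (the iteration count is an assumption, and the custom backward pass from the paper is not shown):

```python
import numpy as np

def isqrt_cov_forward(A, num_iters=5):
    """Approximate matrix square root of an SPD covariance matrix A via coupled
    Newton-Schulz iterations, with trace pre-normalization and post-compensation."""
    n = A.shape[0]
    I = np.eye(n)
    tr = np.trace(A)
    Y = A / tr                      # pre-normalization so the iteration converges
    Z = I.copy()
    for _ in range(num_iters):      # only matrix multiplications -> GPU friendly
        T = 0.5 * (3.0 * I - Z @ Y)
        Y, Z = Y @ T, T @ Z         # coupled update using the previous Y and Z
    return np.sqrt(tr) * Y          # post-compensation restores the original scale

# Toy usage: covariance of random "conv features" (N samples, d channels).
X = np.random.randn(512, 64)
Xc = X - X.mean(axis=0)
cov = Xc.T @ Xc / (X.shape[0] - 1)
S = isqrt_cov_forward(cov)
print(np.linalg.norm(S @ S - cov))  # approximation error of the square root
```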
Forward Thinking: Building and Training Neural Networks One Layer at a Time
Title | Forward Thinking: Building and Training Neural Networks One Layer at a Time |
Authors | Chris Hettinger, Tanner Christensen, Ben Ehlert, Jeffrey Humpherys, Tyler Jarvis, Sean Wade |
Abstract | We present a general framework for training deep neural networks without backpropagation. This substantially decreases training time and also allows for construction of deep networks with many sorts of learners, including networks whose layers are defined by functions that are not easily differentiated, like decision trees. The main idea is that layers can be trained one at a time, and once they are trained, the input data are mapped forward through the layer to create a new learning problem. The process is repeated, transforming the data through multiple layers, one at a time, rendering a new data set, which is expected to be better behaved, and on which a final output layer can achieve good performance. We call this forward thinking and demonstrate a proof of concept by achieving state-of-the-art accuracy on the MNIST dataset for convolutional neural networks. We also provide a general mathematical formulation of forward thinking that allows for other types of deep learning problems to be considered. |
Tasks | |
Published | 2017-06-08 |
URL | http://arxiv.org/abs/1706.02480v1 |
PDF | http://arxiv.org/pdf/1706.02480v1.pdf |
PWC | https://paperswithcode.com/paper/forward-thinking-building-and-training-neural |
Repo | https://github.com/tkchris93/ForwardThinking |
Framework | tf |
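A toy PyTorch sketch of the layer-at-a-time idea: each new layer is trained jointly with a temporary output head, then frozen, and the data are mapped forward through it to form the next learning problem. The dense layers, optimizer, and synthetic data are assumptions for illustration; the paper's MNIST result uses convolutional layers.

```python
import torch
import torch.nn as nn

def train_layer_wise(X, y, hidden_sizes=(256, 128), classes=10, epochs=50):
    """Greedy 'forward thinking' training: no end-to-end backprop across layers."""
    frozen_layers, feats = [], X
    for h in hidden_sizes:
        layer = nn.Sequential(nn.Linear(feats.shape[1], h), nn.ReLU())
        head = nn.Linear(h, classes)                       # temporary output head
        opt = torch.optim.Adam(list(layer.parameters()) + list(head.parameters()), lr=1e-3)
        for _ in range(epochs):
            opt.zero_grad()
            loss = nn.functional.cross_entropy(head(layer(feats)), y)
            loss.backward()
            opt.step()
        with torch.no_grad():                              # freeze and map the data forward
            feats = layer(feats)
        frozen_layers.append(layer)
    # A final classifier trained on the last transformed features, chained with the
    # frozen layers, forms the full network.
    return frozen_layers, feats

X, y = torch.randn(1000, 784), torch.randint(0, 10, (1000,))
layers, transformed = train_layer_wise(X, y)
```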
The Case for Learned Index Structures
Title | The Case for Learned Index Structures |
Authors | Tim Kraska, Alex Beutel, Ed H. Chi, Jeffrey Dean, Neoklis Polyzotis |
Abstract | Indexes are models: a B-Tree-Index can be seen as a model to map a key to the position of a record within a sorted array, a Hash-Index as a model to map a key to a position of a record within an unsorted array, and a BitMap-Index as a model to indicate if a data record exists or not. In this exploratory research paper, we start from this premise and posit that all existing index structures can be replaced with other types of models, including deep-learning models, which we term learned indexes. The key idea is that a model can learn the sort order or structure of lookup keys and use this signal to effectively predict the position or existence of records. We theoretically analyze under which conditions learned indexes outperform traditional index structures and describe the main challenges in designing learned index structures. Our initial results show that, by using neural nets, we are able to outperform cache-optimized B-Trees by up to 70% in speed while saving an order of magnitude in memory over several real-world data sets. More importantly though, we believe that the idea of replacing core components of a data management system with learned models has far-reaching implications for future systems designs and that this work just provides a glimpse of what might be possible. |
Tasks | |
Published | 2017-12-04 |
URL | http://arxiv.org/abs/1712.01208v3 |
PDF | http://arxiv.org/pdf/1712.01208v3.pdf |
PWC | https://paperswithcode.com/paper/the-case-for-learned-index-structures |
Repo | https://github.com/stoianmihail/XY-sorting |
Framework | none |
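A minimal NumPy sketch of a learned index over a sorted array: a model (plain linear regression here, standing in for the paper's staged neural models) approximates the cumulative distribution of the keys to predict a record's position, and a search bounded by the model's maximum error guarantees a correct lookup.

```python
import numpy as np

class LearnedIndex:
    """Toy learned index over a sorted array of keys: the model predicts the position,
    then a bounded search inside [pred - err, pred + err] finds the exact slot."""

    def __init__(self, keys):
        self.keys = np.asarray(keys)
        positions = np.arange(len(self.keys))
        # Linear model of the key -> position mapping (an approximate CDF).
        self.slope, self.intercept = np.polyfit(self.keys, positions, deg=1)
        preds = np.clip(self.keys * self.slope + self.intercept, 0, len(self.keys) - 1)
        self.max_err = int(np.ceil(np.max(np.abs(preds - positions))))

    def lookup(self, key):
        pred = int(np.clip(key * self.slope + self.intercept, 0, len(self.keys) - 1))
        lo = max(0, pred - self.max_err)
        hi = min(len(self.keys), pred + self.max_err + 1)
        # Correctness comes from searching only the error-bounded window.
        return lo + int(np.searchsorted(self.keys[lo:hi], key))

idx = LearnedIndex(np.sort(np.random.randint(0, 10**6, size=10**5)))
print(idx.keys[idx.lookup(idx.keys[12345])] == idx.keys[12345])  # True
```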
A Domain Based Approach to Social Relation Recognition
Title | A Domain Based Approach to Social Relation Recognition |
Authors | Qianru Sun, Bernt Schiele, Mario Fritz |
Abstract | Social relations are the foundation of human daily life. Developing techniques to analyze such relations from visual data bears great potential to build machines that better understand us and are capable of interacting with us at a social level. Previous investigations have remained partial due to the overwhelming diversity and complexity of the topic and consequently have only focused on a handful of social relations. In this paper, we argue that the domain-based theory from social psychology is a great starting point to systematically approach this problem. The theory provides coverage of all aspects of social relations and is equally concrete and predictive about the visual attributes and behaviors defining the relations included in each domain. We provide the first dataset built on this holistic conceptualization of social life, composed of a hierarchical label space of social domains and social relations. We also contribute the first models to recognize such domains and relations and find superior performance for attribute-based features. Beyond the encouraging performance of the attribute-based approach, we also find interpretable features that are in accordance with the predictions from the social psychology literature. Beyond our findings, we believe that our contributions more tightly interleave visual recognition and social psychology theory, which has the potential to complement the theoretical work in the area with empirical and data-driven models of social life. |
Tasks | |
Published | 2017-04-21 |
URL | http://arxiv.org/abs/1704.06456v1 |
PDF | http://arxiv.org/pdf/1704.06456v1.pdf |
PWC | https://paperswithcode.com/paper/a-domain-based-approach-to-social-relation |
Repo | https://github.com/HCPLab-SYSU/SR |
Framework | pytorch |
A Downsampled Variant of ImageNet as an Alternative to the CIFAR datasets
Title | A Downsampled Variant of ImageNet as an Alternative to the CIFAR datasets |
Authors | Patryk Chrabaszcz, Ilya Loshchilov, Frank Hutter |
Abstract | The original ImageNet dataset is a popular large-scale benchmark for training Deep Neural Networks. Since the cost of performing experiments (e.g., algorithm design, architecture search, and hyperparameter tuning) on the original dataset might be prohibitive, we propose to consider a downsampled version of ImageNet. In contrast to the CIFAR datasets and earlier downsampled versions of ImageNet, our proposed ImageNet32$\times$32 (and its variants ImageNet64$\times$64 and ImageNet16$\times$16) contains exactly the same number of classes and images as ImageNet, with the only difference that the images are downsampled to 32$\times$32 pixels per image (64$\times$64 and 16$\times$16 pixels for the variants, respectively). Experiments on these downsampled variants are dramatically faster than on the original ImageNet and the characteristics of the downsampled datasets with respect to optimal hyperparameters appear to remain similar. The proposed datasets and scripts to reproduce our results are available at http://image-net.org/download-images and https://github.com/PatrykChrabaszcz/Imagenet32_Scripts |
Tasks | Neural Architecture Search |
Published | 2017-07-27 |
URL | http://arxiv.org/abs/1707.08819v3 |
PDF | http://arxiv.org/pdf/1707.08819v3.pdf |
PWC | https://paperswithcode.com/paper/a-downsampled-variant-of-imagenet-as-an |
Repo | https://github.com/PatrykChrabaszcz/Imagenet32_Scripts |
Framework | none |
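If one only needs the preprocessing in spirit, downsampling an image to 32x32 is a few lines with Pillow; the authors' linked scripts are authoritative and compare several resampling filters, so the filter and file names below are merely assumptions.

```python
from PIL import Image

img = Image.open("example.jpg")                       # assumed input path
img32 = img.resize((32, 32), resample=Image.LANCZOS)  # one of several possible filters
img32.save("example_32x32.png")
```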
Convex Formulation of Multiple Instance Learning from Positive and Unlabeled Bags
Title | Convex Formulation of Multiple Instance Learning from Positive and Unlabeled Bags |
Authors | Han Bao, Tomoya Sakai, Issei Sato, Masashi Sugiyama |
Abstract | Multiple instance learning (MIL) is a variation of traditional supervised learning problems where data (referred to as bags) are composed of sub-elements (referred to as instances) and only bag labels are available. MIL has a variety of applications such as content-based image retrieval, text categorization and medical diagnosis. Most previous work on MIL assumes that the training bags are fully labeled. However, it is often difficult to obtain a sufficient number of labeled bags in practical situations, while many unlabeled bags are available. A learning framework called PU learning (positive and unlabeled learning) can address this problem. In this paper, we propose a convex PU learning method to solve an MIL problem. We experimentally show that the proposed method achieves better performance with significantly lower computational costs than an existing method for PU-MIL. |
Tasks | Content-Based Image Retrieval, Image Retrieval, Multiple Instance Learning, Text Categorization |
Published | 2017-04-22 |
URL | http://arxiv.org/abs/1704.06767v3 |
PDF | http://arxiv.org/pdf/1704.06767v3.pdf |
PWC | https://paperswithcode.com/paper/convex-formulation-of-multiple-instance |
Repo | https://github.com/levelfour/pumil |
Framework | none |
Dual-Glance Model for Deciphering Social Relationships
Title | Dual-Glance Model for Deciphering Social Relationships |
Authors | Junnan Li, Yongkang Wong, Qi Zhao, Mohan S. Kankanhalli |
Abstract | Since the beginning of early civilizations, social relationships derived from each individual have fundamentally formed the basis of social structure in our daily life. In the computer vision literature, much progress has been made in scene understanding, such as object detection and scene parsing. Recent research focuses on the relationships between objects based on their functionality and geometric relations. In this work, we aim to study the problem of social relationship recognition in still images. We propose a dual-glance model for social relationship recognition, where the first glance fixates at the individual pair of interest and the second glance deploys an attention mechanism to explore contextual cues. We have also collected a new large-scale People in Social Context (PISC) dataset, which comprises 22,670 images and 76,568 annotated samples covering 9 types of social relationships. We provide benchmark results on the PISC dataset, and qualitatively demonstrate the efficacy of the proposed model. |
Tasks | Object Detection, Scene Parsing, Scene Understanding, Visual Social Relationship Recognition |
Published | 2017-08-02 |
URL | http://arxiv.org/abs/1708.00634v1 |
PDF | http://arxiv.org/pdf/1708.00634v1.pdf |
PWC | https://paperswithcode.com/paper/dual-glance-model-for-deciphering-social |
Repo | https://github.com/HCPLab-SYSU/SR |
Framework | pytorch |
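The second glance's use of attention over contextual cues can be sketched generically: features of contextual regions are scored against the first-glance representation of the person pair and combined as a weighted sum. The PyTorch snippet below is a generic soft-attention sketch with assumed feature dimensions, not the paper's exact formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SecondGlanceAttention(nn.Module):
    """Generic soft attention: weight contextual region features by their relevance
    to the first-glance (person-pair) feature, then sum them."""

    def __init__(self, pair_dim=512, region_dim=512, hidden=256):
        super().__init__()
        self.score = nn.Sequential(
            nn.Linear(pair_dim + region_dim, hidden), nn.Tanh(), nn.Linear(hidden, 1))

    def forward(self, pair_feat, region_feats):
        # pair_feat: (pair_dim,), region_feats: (num_regions, region_dim)
        expanded = pair_feat.unsqueeze(0).expand(region_feats.size(0), -1)
        scores = self.score(torch.cat([expanded, region_feats], dim=1)).squeeze(1)
        weights = F.softmax(scores, dim=0)               # attention weights over regions
        return (weights.unsqueeze(1) * region_feats).sum(dim=0), weights

attn = SecondGlanceAttention()
context, w = attn(torch.randn(512), torch.randn(8, 512))
```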
Transfer Learning for Performance Modeling of Configurable Systems: An Exploratory Analysis
Title | Transfer Learning for Performance Modeling of Configurable Systems: An Exploratory Analysis |
Authors | Pooyan Jamshidi, Norbert Siegmund, Miguel Velez, Christian Kästner, Akshay Patel, Yuvraj Agarwal |
Abstract | Modern software systems provide many configuration options which significantly influence their non-functional properties. To understand and predict the effect of configuration options, several sampling and learning strategies have been proposed, albeit often with significant cost to cover the high-dimensional configuration space. Recently, transfer learning has been applied to reduce the effort of constructing performance models by transferring knowledge about performance behavior across environments. While this line of research is promising for learning more accurate models at a lower cost, it is unclear why and when transfer learning works for performance modeling. To shed light on when it is beneficial to apply transfer learning, we conducted an empirical study on four popular software systems, varying software configurations and environmental conditions, such as hardware, workload, and software versions, to identify the key knowledge pieces that can be exploited for transfer learning. Our results show that for small environmental changes (e.g., homogeneous workload change), by applying a linear transformation to the performance model, we can understand the performance behavior of the target environment, while for severe environmental changes (e.g., drastic workload change) we can transfer only knowledge that makes sampling more efficient, e.g., by reducing the dimensionality of the configuration space. |
Tasks | Transfer Learning |
Published | 2017-09-07 |
URL | http://arxiv.org/abs/1709.02280v1 |
PDF | http://arxiv.org/pdf/1709.02280v1.pdf |
PWC | https://paperswithcode.com/paper/transfer-learning-for-performance-modeling-of-1 |
Repo | https://github.com/pooyanjamshidi/ase17 |
Framework | none |
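The "small environmental change" finding suggests a very cheap transfer scheme: measure a handful of configurations in the target environment and fit a linear transformation from source-environment performance to target-environment performance. A scikit-learn sketch with synthetic placeholder data:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# Synthetic stand-ins: performance of 200 configurations in a source environment,
# and a roughly linear shift of it in the target environment (e.g. new hardware).
source_perf = rng.uniform(10, 100, size=200)
target_perf = 1.7 * source_perf + 5 + rng.normal(0, 2, size=200)

# Measure only a few configurations in the (expensive) target environment ...
sample = rng.choice(200, size=10, replace=False)
lin = LinearRegression().fit(source_perf[sample].reshape(-1, 1), target_perf[sample])

# ... and predict the rest via the learned linear transformation.
pred = lin.predict(source_perf.reshape(-1, 1))
print("mean abs error:", np.mean(np.abs(pred - target_perf)))
```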
QMDP-Net: Deep Learning for Planning under Partial Observability
Title | QMDP-Net: Deep Learning for Planning under Partial Observability |
Authors | Peter Karkus, David Hsu, Wee Sun Lee |
Abstract | This paper introduces the QMDP-net, a neural network architecture for planning under partial observability. The QMDP-net combines the strengths of model-free learning and model-based planning. It is a recurrent policy network, but it represents a policy for a parameterized set of tasks by connecting a model with a planning algorithm that solves the model, thus embedding the solution structure of planning in a network learning architecture. The QMDP-net is fully differentiable and allows for end-to-end training. We train a QMDP-net on different tasks so that it can generalize to new ones in the parameterized task set and “transfer” to other similar tasks beyond the set. In preliminary experiments, QMDP-net showed strong performance on several robotic tasks in simulation. Interestingly, while QMDP-net encodes the QMDP algorithm, it sometimes outperforms the QMDP algorithm in the experiments, as a result of end-to-end learning. |
Tasks | |
Published | 2017-03-20 |
URL | http://arxiv.org/abs/1703.06692v3 |
PDF | http://arxiv.org/pdf/1703.06692v3.pdf |
PWC | https://paperswithcode.com/paper/qmdp-net-deep-learning-for-planning-under |
Repo | https://github.com/AdaCompNUS/qmdp-net |
Framework | tf |
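For reference, the QMDP approximation that the network embeds is simple to state: solve the underlying MDP for Q(s, a) by value iteration, then act on a belief b by maximizing the belief-weighted Q-values, ignoring future observations. A tabular NumPy sketch (the paper's version is learned and differentiable):

```python
import numpy as np

def qmdp_policy(T, R, belief, gamma=0.95, iters=200):
    """T: transition tensor (A, S, S); R: reward matrix (S, A); belief: (S,).

    Returns the QMDP action argmax_a sum_s b(s) Q(s, a).
    """
    S, A = R.shape
    Q = np.zeros((S, A))
    for _ in range(iters):                       # value iteration on the underlying MDP
        V = Q.max(axis=1)                        # (S,)
        Q = R + gamma * np.einsum("ast,t->sa", T, V)
    return int(np.argmax(belief @ Q))            # maximize belief-weighted Q-values

# Tiny random POMDP stand-in.
A, S = 3, 5
T = np.random.dirichlet(np.ones(S), size=(A, S))   # (A, S, S), rows sum to 1
R = np.random.rand(S, A)
print(qmdp_policy(T, R, np.ones(S) / S))
```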
Open Source Dataset and Deep Learning Models for Online Digit Gesture Recognition on Touchscreens
Title | Open Source Dataset and Deep Learning Models for Online Digit Gesture Recognition on Touchscreens |
Authors | Philip J. Corr, Guenole C. Silvestre, Chris J. Bleakley |
Abstract | This paper presents an evaluation of deep neural networks for the recognition of digits entered by users on a smartphone touchscreen. A new large dataset of Arabic numerals was collected for training and evaluation of the network. The dataset consists of spatial and temporal touch data recorded for 80 digits entered by 260 users. Two neural network models were investigated. The first model was a 2D convolutional neural network (ConvNet) applied to bitmaps of the glyphs created by interpolation of the sensed screen touches; its topology is similar to that of previously published models for offline handwriting recognition from scanned images. The second model used a 1D ConvNet architecture but was applied to the sequence of polar vectors connecting the touch points. The models were found to provide accuracies of 98.50% and 95.86%, respectively. The second model was much simpler, providing a reduction in the number of parameters from 1,663,370 to 287,690. The dataset has been made available to the community as an open source resource. |
Tasks | Gesture Recognition |
Published | 2017-09-20 |
URL | http://arxiv.org/abs/1709.06871v1 |
PDF | http://arxiv.org/pdf/1709.06871v1.pdf |
PWC | https://paperswithcode.com/paper/open-source-dataset-and-deep-learning-models |
Repo | https://github.com/PhilipCorr/numeral-gesture-dataset |
Framework | none |
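The 1D ConvNet's input described above is the sequence of polar vectors connecting consecutive touch points. A small NumPy sketch of that preprocessing step (segment length and angle per step); any normalization applied in the paper is not reproduced here.

```python
import numpy as np

def touches_to_polar(points):
    """Convert an (N, 2) sequence of touch coordinates into (N-1, 2) polar vectors
    (segment length, segment angle) between consecutive points."""
    points = np.asarray(points, dtype=float)
    deltas = np.diff(points, axis=0)                       # (N-1, 2) dx, dy
    radii = np.hypot(deltas[:, 0], deltas[:, 1])
    angles = np.arctan2(deltas[:, 1], deltas[:, 0])
    return np.stack([radii, angles], axis=1)

# Toy stroke: a few sensed touch points for a digit.
stroke = [(10, 10), (12, 14), (15, 19), (19, 23)]
print(touches_to_polar(stroke))
```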