Paper Group AWR 92
Visual Attribute Transfer through Deep Image Analogy
Title | Visual Attribute Transfer through Deep Image Analogy |
Authors | Jing Liao, Yuan Yao, Lu Yuan, Gang Hua, Sing Bing Kang |
Abstract | We propose a new technique for visual attribute transfer across images that may have very different appearance but have perceptually similar semantic structure. By visual attribute transfer, we mean transfer of visual information (such as color, tone, texture, and style) from one image to another. For example, one image could be that of a painting or a sketch while the other is a photo of a real scene, and both depict the same type of scene. Our technique finds semantically-meaningful dense correspondences between two input images. To accomplish this, it adapts the notion of “image analogy” with features extracted from a Deep Convolutional Neural Network for matching; we call our technique Deep Image Analogy. A coarse-to-fine strategy is used to compute the nearest-neighbor field for generating the results. We validate the effectiveness of our proposed method in a variety of cases, including style/texture transfer, color/style swap, sketch/painting to photo, and time lapse. |
Tasks | |
Published | 2017-05-02 |
URL | http://arxiv.org/abs/1705.01088v2 |
PDF | http://arxiv.org/pdf/1705.01088v2.pdf |
PWC | https://paperswithcode.com/paper/visual-attribute-transfer-through-deep-image |
Repo | https://github.com/Ben-Louis/Deep-Image-Analogy-PyTorch |
Framework | pytorch |
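To make the matching step above concrete: the sketch below computes a brute-force dense nearest-neighbor field between two CNN feature maps using cosine similarity. This is only an illustration of the idea; the paper's Deep Image Analogy uses a PatchMatch-style search over deep features with coarse-to-fine refinement and bidirectional constraints, none of which are reproduced here, and the random arrays stand in for real VGG activations.

```python
import numpy as np

def nearest_neighbor_field(feat_a, feat_b):
    """Brute-force NN field between two feature maps of shape (H, W, C).

    Returns an (H, W, 2) array mapping each position in feat_a to the (row, col)
    of its most similar feature vector in feat_b (cosine similarity).
    """
    h, w, c = feat_a.shape
    a = feat_a.reshape(-1, c)
    b = feat_b.reshape(-1, c)
    a = a / (np.linalg.norm(a, axis=1, keepdims=True) + 1e-8)
    b = b / (np.linalg.norm(b, axis=1, keepdims=True) + 1e-8)
    sim = a @ b.T                      # (H*W, H*W) cosine similarities
    idx = sim.argmax(axis=1)           # best match in feat_b for each position in feat_a
    return np.stack(np.unravel_index(idx, (h, w)), axis=-1).reshape(h, w, 2)

# Toy usage with random "conv features" standing in for real network activations.
nnf = nearest_neighbor_field(np.random.rand(14, 14, 256), np.random.rand(14, 14, 256))
```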
Fader Networks: Manipulating Images by Sliding Attributes
Title | Fader Networks: Manipulating Images by Sliding Attributes |
Authors | Guillaume Lample, Neil Zeghidour, Nicolas Usunier, Antoine Bordes, Ludovic Denoyer, Marc’Aurelio Ranzato |
Abstract | This paper introduces a new encoder-decoder architecture that is trained to reconstruct images by disentangling the salient information of the image and the values of attributes directly in the latent space. As a result, after training, our model can generate different realistic versions of an input image by varying the attribute values. By using continuous attribute values, we can choose how much a specific attribute is perceivable in the generated image. This property could allow for applications where users can modify an image using sliding knobs, like faders on a mixing console, to change the facial expression of a portrait, or to update the color of some objects. Compared to the state-of-the-art which mostly relies on training adversarial networks in pixel space by altering attribute values at train time, our approach results in much simpler training schemes and nicely scales to multiple attributes. We present evidence that our model can significantly change the perceived value of the attributes while preserving the naturalness of images. |
Tasks | |
Published | 2017-06-01 |
URL | http://arxiv.org/abs/1706.00409v2 |
PDF | http://arxiv.org/pdf/1706.00409v2.pdf |
PWC | https://paperswithcode.com/paper/fader-networks-manipulating-images-by-sliding |
Repo | https://github.com/facebookresearch/FaderNetworks |
Framework | pytorch |
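A minimal PyTorch sketch of the fader idea described above: a discriminator tries to predict the attribute from the latent code, the encoder is trained to fool it so the code becomes attribute-invariant, and the decoder receives the attribute value explicitly. The layer sizes, loss weights, and flattened-image input are placeholder assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

latent_dim, attr_dim, img_dim = 128, 1, 3 * 64 * 64   # assumed toy sizes

enc = nn.Sequential(nn.Linear(img_dim, 512), nn.ReLU(), nn.Linear(512, latent_dim))
dec = nn.Sequential(nn.Linear(latent_dim + attr_dim, 512), nn.ReLU(), nn.Linear(512, img_dim))
disc = nn.Sequential(nn.Linear(latent_dim, 128), nn.ReLU(), nn.Linear(128, attr_dim))

opt_ae = torch.optim.Adam(list(enc.parameters()) + list(dec.parameters()), lr=2e-4)
opt_d = torch.optim.Adam(disc.parameters(), lr=2e-4)

def train_step(x, attr, lam=0.1):
    """x: flattened images (B, img_dim); attr: attribute values in [0, 1] of shape (B, attr_dim)."""
    z = enc(x)

    # 1) Discriminator learns to recover the attribute from the latent code.
    d_loss = F.binary_cross_entropy_with_logits(disc(z.detach()), attr)
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # 2) Autoencoder: reconstruct the image while making the latent code
    #    uninformative about the attribute (fool the discriminator).
    recon = dec(torch.cat([z, attr], dim=1))
    fool = F.binary_cross_entropy_with_logits(disc(z), 1.0 - attr)
    ae_loss = F.mse_loss(recon, x) + lam * fool
    opt_ae.zero_grad(); ae_loss.backward(); opt_ae.step()
    return d_loss.item(), ae_loss.item()

# At test time, sweeping the attribute value fed to the decoder acts as the "fader".
```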
ZOOpt: Toolbox for Derivative-Free Optimization
Title | ZOOpt: Toolbox for Derivative-Free Optimization |
Authors | Yu-Ren Liu, Yi-Qi Hu, Hong Qian, Yang Yu, Chao Qian |
Abstract | Recent advances in derivative-free optimization allow efficient approximation of the globally optimal solutions of sophisticated functions, such as functions with many local optima and non-differentiable or non-continuous functions. This article describes the ZOOpt (https://github.com/eyounx/ZOOpt) toolbox, which provides efficient derivative-free solvers and is designed to be easy to use. ZOOpt provides a Python package for single-thread optimization, and a lightweight distributed version, built with the help of the Julia language, for functions described in Python. The ZOOpt toolbox particularly focuses on optimization problems in machine learning, addressing high-dimensional, noisy, and large-scale problems. The toolbox is being maintained as a ready-to-use tool for real-world machine learning tasks. |
Tasks | |
Published | 2017-12-31 |
URL | http://arxiv.org/abs/1801.00329v2 |
PDF | http://arxiv.org/pdf/1801.00329v2.pdf |
PWC | https://paperswithcode.com/paper/zoopt-toolbox-for-derivative-free |
Repo | https://github.com/eyounx/ZOOpt |
Framework | none |
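A minimal usage sketch of the Python package, following the toolbox's documented Dimension / Objective / Parameter / Opt interface; exact signatures may differ between ZOOpt versions, and the sphere objective is just an illustration.

```python
import numpy as np
from zoopt import Dimension, Objective, Parameter, Opt

def sphere(solution):
    """Objective to minimize; ZOOpt passes a Solution object holding the vector x."""
    x = np.array(solution.get_x())
    return float(np.sum(x ** 2))

dim = 100
# 100 continuous variables, each constrained to the region [-1, 1].
objective = Objective(sphere, Dimension(dim, [[-1, 1]] * dim, [True] * dim))
solution = Opt.min(objective, Parameter(budget=100 * dim))   # budget = number of evaluations
print(solution.get_x(), solution.get_value())
```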
Simplified Gating in Long Short-term Memory (LSTM) Recurrent Neural Networks
Title | Simplified Gating in Long Short-term Memory (LSTM) Recurrent Neural Networks |
Authors | Yuzhen Lu, Fathi M. Salem |
Abstract | Standard LSTM recurrent neural networks, while very powerful in long-range dependency sequence applications, have a highly complex structure and a relatively large number of (adaptive) parameters. In this work, we present an empirical comparison between the standard LSTM recurrent neural network architecture and three new parameter-reduced variants obtained by eliminating combinations of the input signal, bias, and hidden unit signals from the individual gating signals. Experiments on two sequence datasets show that the three new variants, referred to simply as LSTM1, LSTM2, and LSTM3, can achieve performance comparable to the standard LSTM model with fewer (adaptive) parameters. |
Tasks | |
Published | 2017-01-12 |
URL | http://arxiv.org/abs/1701.03441v1 |
PDF | http://arxiv.org/pdf/1701.03441v1.pdf |
PWC | https://paperswithcode.com/paper/simplified-gating-in-long-short-term-memory |
Repo | https://github.com/jingweimo/Modified-LSTM |
Framework | none |
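To make the parameter reduction concrete, the sketch below implements a standard LSTM cell next to one illustrative reduced variant in which the gates are driven only by the previous hidden state and a bias (the input-to-gate terms are dropped). The precise definitions of LSTM1, LSTM2, and LSTM3 are given in the paper; this is an assumed example of the general idea, not a reproduction of those variants.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h, c, W, U, b):
    """Standard LSTM cell: every gate sees the input x, the hidden state h, and a bias."""
    i = sigmoid(W["i"] @ x + U["i"] @ h + b["i"])
    f = sigmoid(W["f"] @ x + U["f"] @ h + b["f"])
    o = sigmoid(W["o"] @ x + U["o"] @ h + b["o"])
    g = np.tanh(W["g"] @ x + U["g"] @ h + b["g"])
    c_new = f * c + i * g
    return o * np.tanh(c_new), c_new

def reduced_lstm_step(x, h, c, W, U, b):
    """Illustrative reduced variant: gates use only h and b, removing the W @ x terms
    and hence the input-to-gate weight matrices."""
    i = sigmoid(U["i"] @ h + b["i"])
    f = sigmoid(U["f"] @ h + b["f"])
    o = sigmoid(U["o"] @ h + b["o"])
    g = np.tanh(W["g"] @ x + U["g"] @ h + b["g"])   # candidate update still sees the input
    c_new = f * c + i * g
    return o * np.tanh(c_new), c_new

# Toy initialization and one step.
n_in, n_h = 8, 16
rng = np.random.default_rng(0)
W = {k: rng.normal(scale=0.1, size=(n_h, n_in)) for k in "ifog"}
U = {k: rng.normal(scale=0.1, size=(n_h, n_h)) for k in "ifog"}
b = {k: np.zeros(n_h) for k in "ifog"}
h, c = reduced_lstm_step(rng.normal(size=n_in), np.zeros(n_h), np.zeros(n_h), W, U, b)
```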
Molecular De Novo Design through Deep Reinforcement Learning
Title | Molecular De Novo Design through Deep Reinforcement Learning |
Authors | Marcus Olivecrona, Thomas Blaschke, Ola Engkvist, Hongming Chen |
Abstract | This work introduces a method to tune a sequence-based generative model for molecular de novo design that, through augmented episodic likelihood, can learn to generate structures with certain specified desirable properties. We demonstrate how this model can execute a range of tasks such as generating analogues to a query structure and generating compounds predicted to be active against a biological target. As a proof of principle, the model is first trained to generate molecules that do not contain sulphur. As a second example, the model is trained to generate analogues to the drug Celecoxib, a technique that could be used for scaffold hopping or library expansion starting from a single molecule. Finally, when tuning the model towards generating compounds predicted to be active against the dopamine receptor type 2, the model generates structures of which more than 95% are predicted to be active, including experimentally confirmed actives that were not included in either the generative model or the activity prediction model. |
Tasks | Activity Prediction |
Published | 2017-04-25 |
URL | http://arxiv.org/abs/1704.07555v2 |
PDF | http://arxiv.org/pdf/1704.07555v2.pdf |
PWC | https://paperswithcode.com/paper/molecular-de-novo-design-through-deep |
Repo | https://github.com/MarcusOlivecrona/REINVENT |
Framework | pytorch |
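The heart of the fine-tuning scheme is the augmented episodic likelihood: the prior's log-likelihood of a sampled sequence is augmented by a scaled score, and the agent is trained to match it. A PyTorch sketch of that loss is below; the scoring function, the RNN sampler, and the value of sigma are placeholders rather than the paper's settings.

```python
import torch

def reinvent_loss(agent_logp, prior_logp, score, sigma=60.0):
    """Augmented-likelihood loss for one batch of sampled SMILES sequences.

    agent_logp, prior_logp: per-sequence log-likelihoods under agent / fixed prior, shape (B,)
    score: per-sequence desirability in [0, 1] from a user-defined scoring function, shape (B,)
    """
    augmented_logp = prior_logp + sigma * score          # augmented episodic likelihood
    return torch.mean((augmented_logp - agent_logp) ** 2)

# Toy usage with random stand-ins for the real sampled-sequence quantities.
B = 4
loss = reinvent_loss(torch.randn(B, requires_grad=True), torch.randn(B), torch.rand(B))
loss.backward()
```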
Towards Faster Training of Global Covariance Pooling Networks by Iterative Matrix Square Root Normalization
Title | Towards Faster Training of Global Covariance Pooling Networks by Iterative Matrix Square Root Normalization |
Authors | Peihua Li, Jiangtao Xie, Qilong Wang, Zilin Gao |
Abstract | Global covariance pooling in convolutional neural networks has achieved impressive improvement over the classical first-order pooling. Recent works have shown that matrix square root normalization plays a central role in achieving state-of-the-art performance. However, existing methods depend heavily on eigendecomposition (EIG) or singular value decomposition (SVD), suffering from inefficient training due to limited support of EIG and SVD on GPU. Towards addressing this problem, we propose an iterative matrix square root normalization method for fast end-to-end training of global covariance pooling networks. At the core of our method is a meta-layer designed with a loop-embedded directed graph structure. The meta-layer consists of three consecutive nonlinear structured layers, which perform pre-normalization, coupled matrix iteration and post-compensation, respectively. Our method is much faster than EIG- or SVD-based ones, since it involves only matrix multiplications, suitable for parallel implementation on GPU. Moreover, the proposed network with ResNet architecture can converge in many fewer epochs, further accelerating network training. On large-scale ImageNet, we achieve competitive performance superior to that of existing counterparts. By finetuning our models pre-trained on ImageNet, we establish state-of-the-art results on three challenging fine-grained benchmarks. The source code and network models will be available at http://www.peihuali.org/iSQRT-COV |
Tasks | Fine-Grained Image Classification, Fine-Grained Image Recognition, Image Classification |
Published | 2017-12-04 |
URL | http://arxiv.org/abs/1712.01034v2 |
PDF | http://arxiv.org/pdf/1712.01034v2.pdf |
PWC | https://paperswithcode.com/paper/towards-faster-training-of-global-covariance |
Repo | https://github.com/jiangtaoxie/fast-MPN-COV |
Framework | pytorch |
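The coupled matrix iteration referred to above is the Newton-Schulz iteration for the matrix square root, wrapped between trace-based pre-normalization and post-compensation so that the iteration converges. A NumPy sketch of the forward computation (the iteration count is an assumption, and the custom backward pass from the paper is not shown):

```python
import numpy as np

def isqrt_cov_forward(A, num_iters=5):
    """Approximate matrix square root of an SPD covariance matrix A via coupled
    Newton-Schulz iterations, with trace pre-normalization and post-compensation."""
    n = A.shape[0]
    I = np.eye(n)
    tr = np.trace(A)
    Y = A / tr                      # pre-normalization so the iteration converges
    Z = I.copy()
    for _ in range(num_iters):      # only matrix multiplications -> GPU friendly
        T = 0.5 * (3.0 * I - Z @ Y)
        Y, Z = Y @ T, T @ Z         # coupled update using the previous Y and Z
    return np.sqrt(tr) * Y          # post-compensation restores the original scale

# Toy usage: covariance of random "conv features" (N samples, d channels).
X = np.random.randn(512, 64)
Xc = X - X.mean(axis=0)
cov = Xc.T @ Xc / (X.shape[0] - 1)
S = isqrt_cov_forward(cov)
print(np.linalg.norm(S @ S - cov))  # approximation error of the square root
```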
Forward Thinking: Building and Training Neural Networks One Layer at a Time
Title | Forward Thinking: Building and Training Neural Networks One Layer at a Time |
Authors | Chris Hettinger, Tanner Christensen, Ben Ehlert, Jeffrey Humpherys, Tyler Jarvis, Sean Wade |
Abstract | We present a general framework for training deep neural networks without backpropagation. This substantially decreases training time and also allows for construction of deep networks with many sorts of learners, including networks whose layers are defined by functions that are not easily differentiated, like decision trees. The main idea is that layers can be trained one at a time, and once they are trained, the input data are mapped forward through the layer to create a new learning problem. The process is repeated, transforming the data through multiple layers, one at a time, rendering a new data set, which is expected to be better behaved, and on which a final output layer can achieve good performance. We call this forward thinking and demonstrate a proof of concept by achieving state-of-the-art accuracy on the MNIST dataset for convolutional neural networks. We also provide a general mathematical formulation of forward thinking that allows for other types of deep learning problems to be considered. |
Tasks | |
Published | 2017-06-08 |
URL | http://arxiv.org/abs/1706.02480v1 |
PDF | http://arxiv.org/pdf/1706.02480v1.pdf |
PWC | https://paperswithcode.com/paper/forward-thinking-building-and-training-neural |
Repo | https://github.com/tkchris93/ForwardThinking |
Framework | tf |
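A toy PyTorch sketch of the layer-at-a-time idea: each new layer is trained jointly with a temporary output head, then frozen, and the data are mapped forward through it to form the next learning problem. The dense layers, optimizer, and synthetic data are assumptions for illustration; the paper's MNIST result uses convolutional layers.

```python
import torch
import torch.nn as nn

def train_layer_wise(X, y, hidden_sizes=(256, 128), classes=10, epochs=50):
    """Greedy 'forward thinking' training: no end-to-end backprop across layers."""
    frozen_layers, feats = [], X
    for h in hidden_sizes:
        layer = nn.Sequential(nn.Linear(feats.shape[1], h), nn.ReLU())
        head = nn.Linear(h, classes)                       # temporary output head
        opt = torch.optim.Adam(list(layer.parameters()) + list(head.parameters()), lr=1e-3)
        for _ in range(epochs):
            opt.zero_grad()
            loss = nn.functional.cross_entropy(head(layer(feats)), y)
            loss.backward()
            opt.step()
        with torch.no_grad():                              # freeze and map the data forward
            feats = layer(feats)
        frozen_layers.append(layer)
    # A final classifier trained on the last transformed features, chained with the
    # frozen layers, forms the full network.
    return frozen_layers, feats

X, y = torch.randn(1000, 784), torch.randint(0, 10, (1000,))
layers, transformed = train_layer_wise(X, y)
```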
The Case for Learned Index Structures
Title | The Case for Learned Index Structures |
Authors | Tim Kraska, Alex Beutel, Ed H. Chi, Jeffrey Dean, Neoklis Polyzotis |
Abstract | Indexes are models: a B-Tree-Index can be seen as a model to map a key to the position of a record within a sorted array, a Hash-Index as a model to map a key to a position of a record within an unsorted array, and a BitMap-Index as a model to indicate if a data record exists or not. In this exploratory research paper, we start from this premise and posit that all existing index structures can be replaced with other types of models, including deep-learning models, which we term learned indexes. The key idea is that a model can learn the sort order or structure of lookup keys and use this signal to effectively predict the position or existence of records. We theoretically analyze under which conditions learned indexes outperform traditional index structures and describe the main challenges in designing learned index structures. Our initial results show that, by using neural nets, we are able to outperform cache-optimized B-Trees by up to 70% in speed while saving an order of magnitude in memory over several real-world data sets. More importantly though, we believe that the idea of replacing core components of a data management system with learned models has far-reaching implications for future systems designs and that this work just provides a glimpse of what might be possible. |
Tasks | |
Published | 2017-12-04 |
URL | http://arxiv.org/abs/1712.01208v3 |
PDF | http://arxiv.org/pdf/1712.01208v3.pdf |
PWC | https://paperswithcode.com/paper/the-case-for-learned-index-structures |
Repo | https://github.com/stoianmihail/XY-sorting |
Framework | none |
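A minimal NumPy sketch of a learned index over a sorted array: a model (plain linear regression here, standing in for the paper's staged neural models) approximates the cumulative distribution of the keys to predict a record's position, and a search bounded by the model's maximum error guarantees a correct lookup.

```python
import numpy as np

class LearnedIndex:
    """Toy learned index over a sorted array of keys: the model predicts the position,
    then a bounded search inside [pred - err, pred + err] finds the exact slot."""

    def __init__(self, keys):
        self.keys = np.asarray(keys)
        positions = np.arange(len(self.keys))
        # Linear model of the key -> position mapping (an approximate CDF).
        self.slope, self.intercept = np.polyfit(self.keys, positions, deg=1)
        preds = np.clip(self.keys * self.slope + self.intercept, 0, len(self.keys) - 1)
        self.max_err = int(np.ceil(np.max(np.abs(preds - positions))))

    def lookup(self, key):
        pred = int(np.clip(key * self.slope + self.intercept, 0, len(self.keys) - 1))
        lo = max(0, pred - self.max_err)
        hi = min(len(self.keys), pred + self.max_err + 1)
        # Correctness comes from searching only the error-bounded window.
        return lo + int(np.searchsorted(self.keys[lo:hi], key))

idx = LearnedIndex(np.sort(np.random.randint(0, 10**6, size=10**5)))
print(idx.keys[idx.lookup(idx.keys[12345])] == idx.keys[12345])  # True
```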
A Domain Based Approach to Social Relation Recognition
Title | A Domain Based Approach to Social Relation Recognition |
Authors | Qianru Sun, Bernt Schiele, Mario Fritz |
Abstract | Social relations are the foundation of human daily life. Developing techniques to analyze such relations from visual data bears great potential to build machines that better understand us and are capable of interacting with us at a social level. Previous investigations have remained partial due to the overwhelming diversity and complexity of the topic and consequently have only focused on a handful of social relations. In this paper, we argue that the domain-based theory from social psychology is a great starting point to systematically approach this problem. The theory provides coverage of all aspects of social relations and is equally concrete and predictive about the visual attributes and behaviors defining the relations included in each domain. We provide the first dataset built on this holistic conceptualization of social life, composed of a hierarchical label space of social domains and social relations. We also contribute the first models to recognize such domains and relations and find superior performance for attribute-based features. Beyond the encouraging performance of the attribute-based approach, we also find interpretable features that are in accordance with the predictions from the social psychology literature. Beyond our findings, we believe that our contributions more tightly interleave visual recognition and social psychology theory, which has the potential to complement the theoretical work in the area with empirical and data-driven models of social life. |
Tasks | |
Published | 2017-04-21 |
URL | http://arxiv.org/abs/1704.06456v1 |
PDF | http://arxiv.org/pdf/1704.06456v1.pdf |
PWC | https://paperswithcode.com/paper/a-domain-based-approach-to-social-relation |
Repo | https://github.com/HCPLab-SYSU/SR |
Framework | pytorch |
A Downsampled Variant of ImageNet as an Alternative to the CIFAR datasets
Title | A Downsampled Variant of ImageNet as an Alternative to the CIFAR datasets |
Authors | Patryk Chrabaszcz, Ilya Loshchilov, Frank Hutter |
Abstract | The original ImageNet dataset is a popular large-scale benchmark for training Deep Neural Networks. Since the cost of performing experiments (e.g., algorithm design, architecture search, and hyperparameter tuning) on the original dataset might be prohibitive, we propose to consider a downsampled version of ImageNet. In contrast to the CIFAR datasets and earlier downsampled versions of ImageNet, our proposed ImageNet32$\times$32 (and its variants ImageNet64$\times$64 and ImageNet16$\times$16) contains exactly the same number of classes and images as ImageNet, with the only difference that the images are downsampled to 32$\times$32 pixels per image (64$\times$64 and 16$\times$16 pixels for the variants, respectively). Experiments on these downsampled variants are dramatically faster than on the original ImageNet and the characteristics of the downsampled datasets with respect to optimal hyperparameters appear to remain similar. The proposed datasets and scripts to reproduce our results are available at http://image-net.org/download-images and https://github.com/PatrykChrabaszcz/Imagenet32_Scripts |
Tasks | Neural Architecture Search |
Published | 2017-07-27 |
URL | http://arxiv.org/abs/1707.08819v3 |
PDF | http://arxiv.org/pdf/1707.08819v3.pdf |
PWC | https://paperswithcode.com/paper/a-downsampled-variant-of-imagenet-as-an |
Repo | https://github.com/PatrykChrabaszcz/Imagenet32_Scripts |
Framework | none |
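If one only needs the preprocessing in spirit, downsampling an image to 32x32 is a few lines with Pillow; the authors' linked scripts are authoritative and compare several resampling filters, so the filter and file names below are merely assumptions.

```python
from PIL import Image

img = Image.open("example.jpg")                       # assumed input path
img32 = img.resize((32, 32), resample=Image.LANCZOS)  # one of several possible filters
img32.save("example_32x32.png")
```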
Convex Formulation of Multiple Instance Learning from Positive and Unlabeled Bags
Title | Convex Formulation of Multiple Instance Learning from Positive and Unlabeled Bags |
Authors | Han Bao, Tomoya Sakai, Issei Sato, Masashi Sugiyama |
Abstract | Multiple instance learning (MIL) is a variation of traditional supervised learning problems where data (referred to as bags) are composed of sub-elements (referred to as instances) and only bag labels are available. MIL has a variety of applications such as content-based image retrieval, text categorization and medical diagnosis. Most previous work on MIL assumes that the training bags are fully labeled. However, it is often difficult to obtain a sufficient number of labeled bags in practical situations, while many unlabeled bags are available. A learning framework called PU learning (positive and unlabeled learning) can address this problem. In this paper, we propose a convex PU learning method to solve an MIL problem. We experimentally show that the proposed method achieves better performance with significantly lower computational costs than an existing method for PU-MIL. |
Tasks | Content-Based Image Retrieval, Image Retrieval, Multiple Instance Learning, Text Categorization |
Published | 2017-04-22 |
URL | http://arxiv.org/abs/1704.06767v3 |
PDF | http://arxiv.org/pdf/1704.06767v3.pdf |
PWC | https://paperswithcode.com/paper/convex-formulation-of-multiple-instance |
Repo | https://github.com/levelfour/pumil |
Framework | none |
Dual-Glance Model for Deciphering Social Relationships
Title | Dual-Glance Model for Deciphering Social Relationships |
Authors | Junnan Li, Yongkang Wong, Qi Zhao, Mohan S. Kankanhalli |
Abstract | Since the beginning of early civilizations, social relationships derived from each individual have fundamentally formed the basis of social structure in our daily life. In the computer vision literature, much progress has been made in scene understanding, such as object detection and scene parsing. Recent research focuses on the relationships between objects based on their functionality and geometric relations. In this work, we aim to study the problem of social relationship recognition in still images. We propose a dual-glance model for social relationship recognition, where the first glance fixates at the individual pair of interest and the second glance deploys an attention mechanism to explore contextual cues. We have also collected a new large-scale People in Social Context (PISC) dataset, which comprises 22,670 images and 76,568 annotated samples covering 9 types of social relationships. We provide benchmark results on the PISC dataset, and qualitatively demonstrate the efficacy of the proposed model. |
Tasks | Object Detection, Scene Parsing, Scene Understanding, Visual Social Relationship Recognition |
Published | 2017-08-02 |
URL | http://arxiv.org/abs/1708.00634v1 |
PDF | http://arxiv.org/pdf/1708.00634v1.pdf |
PWC | https://paperswithcode.com/paper/dual-glance-model-for-deciphering-social |
Repo | https://github.com/HCPLab-SYSU/SR |
Framework | pytorch |
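The second glance's use of attention over contextual cues can be sketched generically: features of contextual regions are scored against the first-glance representation of the person pair and combined as a weighted sum. The PyTorch snippet below is a generic soft-attention sketch with assumed feature dimensions, not the paper's exact formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SecondGlanceAttention(nn.Module):
    """Generic soft attention: weight contextual region features by their relevance
    to the first-glance (person-pair) feature, then sum them."""

    def __init__(self, pair_dim=512, region_dim=512, hidden=256):
        super().__init__()
        self.score = nn.Sequential(
            nn.Linear(pair_dim + region_dim, hidden), nn.Tanh(), nn.Linear(hidden, 1))

    def forward(self, pair_feat, region_feats):
        # pair_feat: (pair_dim,), region_feats: (num_regions, region_dim)
        expanded = pair_feat.unsqueeze(0).expand(region_feats.size(0), -1)
        scores = self.score(torch.cat([expanded, region_feats], dim=1)).squeeze(1)
        weights = F.softmax(scores, dim=0)               # attention weights over regions
        return (weights.unsqueeze(1) * region_feats).sum(dim=0), weights

attn = SecondGlanceAttention()
context, w = attn(torch.randn(512), torch.randn(8, 512))
```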
Transfer Learning for Performance Modeling of Configurable Systems: An Exploratory Analysis
Title | Transfer Learning for Performance Modeling of Configurable Systems: An Exploratory Analysis |
Authors | Pooyan Jamshidi, Norbert Siegmund, Miguel Velez, Christian Kästner, Akshay Patel, Yuvraj Agarwal |
Abstract | Modern software systems provide many configuration options which significantly influence their non-functional properties. To understand and predict the effect of configuration options, several sampling and learning strategies have been proposed, albeit often with significant cost to cover the high-dimensional configuration space. Recently, transfer learning has been applied to reduce the effort of constructing performance models by transferring knowledge about performance behavior across environments. While this line of research is promising for learning more accurate models at a lower cost, it is unclear why and when transfer learning works for performance modeling. To shed light on when it is beneficial to apply transfer learning, we conducted an empirical study on four popular software systems, varying software configurations and environmental conditions, such as hardware, workload, and software versions, to identify the key knowledge pieces that can be exploited for transfer learning. Our results show that for small environmental changes (e.g., homogeneous workload change), by applying a linear transformation to the performance model, we can understand the performance behavior of the target environment, while for severe environmental changes (e.g., drastic workload change) we can transfer only knowledge that makes sampling more efficient, e.g., by reducing the dimensionality of the configuration space. |
Tasks | Transfer Learning |
Published | 2017-09-07 |
URL | http://arxiv.org/abs/1709.02280v1 |
PDF | http://arxiv.org/pdf/1709.02280v1.pdf |
PWC | https://paperswithcode.com/paper/transfer-learning-for-performance-modeling-of-1 |
Repo | https://github.com/pooyanjamshidi/ase17 |
Framework | none |
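The "small environmental change" finding suggests a very cheap transfer scheme: measure a handful of configurations in the target environment and fit a linear transformation from source-environment performance to target-environment performance. A scikit-learn sketch with synthetic placeholder data:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# Synthetic stand-ins: performance of 200 configurations in a source environment,
# and a roughly linear shift of it in the target environment (e.g. new hardware).
source_perf = rng.uniform(10, 100, size=200)
target_perf = 1.7 * source_perf + 5 + rng.normal(0, 2, size=200)

# Measure only a few configurations in the (expensive) target environment ...
sample = rng.choice(200, size=10, replace=False)
lin = LinearRegression().fit(source_perf[sample].reshape(-1, 1), target_perf[sample])

# ... and predict the rest via the learned linear transformation.
pred = lin.predict(source_perf.reshape(-1, 1))
print("mean abs error:", np.mean(np.abs(pred - target_perf)))
```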
QMDP-Net: Deep Learning for Planning under Partial Observability
Title | QMDP-Net: Deep Learning for Planning under Partial Observability |
Authors | Peter Karkus, David Hsu, Wee Sun Lee |
Abstract | This paper introduces the QMDP-net, a neural network architecture for planning under partial observability. The QMDP-net combines the strengths of model-free learning and model-based planning. It is a recurrent policy network, but it represents a policy for a parameterized set of tasks by connecting a model with a planning algorithm that solves the model, thus embedding the solution structure of planning in a network learning architecture. The QMDP-net is fully differentiable and allows for end-to-end training. We train a QMDP-net on different tasks so that it can generalize to new ones in the parameterized task set and “transfer” to other similar tasks beyond the set. In preliminary experiments, QMDP-net showed strong performance on several robotic tasks in simulation. Interestingly, while QMDP-net encodes the QMDP algorithm, it sometimes outperforms the QMDP algorithm in the experiments, as a result of end-to-end learning. |
Tasks | |
Published | 2017-03-20 |
URL | http://arxiv.org/abs/1703.06692v3 |
PDF | http://arxiv.org/pdf/1703.06692v3.pdf |
PWC | https://paperswithcode.com/paper/qmdp-net-deep-learning-for-planning-under |
Repo | https://github.com/AdaCompNUS/qmdp-net |
Framework | tf |
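For reference, the QMDP approximation that the network embeds is simple to state: solve the underlying MDP for Q(s, a) by value iteration, then act on a belief b by maximizing the belief-weighted Q-values, ignoring future observations. A tabular NumPy sketch (the paper's version is learned and differentiable):

```python
import numpy as np

def qmdp_policy(T, R, belief, gamma=0.95, iters=200):
    """T: transition tensor (A, S, S); R: reward matrix (S, A); belief: (S,).

    Returns the QMDP action argmax_a sum_s b(s) Q(s, a).
    """
    S, A = R.shape
    Q = np.zeros((S, A))
    for _ in range(iters):                       # value iteration on the underlying MDP
        V = Q.max(axis=1)                        # (S,)
        Q = R + gamma * np.einsum("ast,t->sa", T, V)
    return int(np.argmax(belief @ Q))            # maximize belief-weighted Q-values

# Tiny random POMDP stand-in.
A, S = 3, 5
T = np.random.dirichlet(np.ones(S), size=(A, S))   # (A, S, S), rows sum to 1
R = np.random.rand(S, A)
print(qmdp_policy(T, R, np.ones(S) / S))
```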
Open Source Dataset and Deep Learning Models for Online Digit Gesture Recognition on Touchscreens
Title | Open Source Dataset and Deep Learning Models for Online Digit Gesture Recognition on Touchscreens |
Authors | Philip J. Corr, Guenole C. Silvestre, Chris J. Bleakley |
Abstract | This paper presents an evaluation of deep neural networks for the recognition of digits entered by users on a smartphone touchscreen. A new large dataset of Arabic numerals was collected for training and evaluation of the network. The dataset consists of spatial and temporal touch data recorded for 80 digits entered by 260 users. Two neural network models were investigated. The first model was a 2D convolutional neural network (ConvNet) applied to bitmaps of the glyphs created by interpolation of the sensed screen touches; its topology is similar to that of previously published models for offline handwriting recognition from scanned images. The second model used a 1D ConvNet architecture but was applied to the sequence of polar vectors connecting the touch points. The models were found to provide accuracies of 98.50% and 95.86%, respectively. The second model was much simpler, providing a reduction in the number of parameters from 1,663,370 to 287,690. The dataset has been made available to the community as an open source resource. |
Tasks | Gesture Recognition |
Published | 2017-09-20 |
URL | http://arxiv.org/abs/1709.06871v1 |
PDF | http://arxiv.org/pdf/1709.06871v1.pdf |
PWC | https://paperswithcode.com/paper/open-source-dataset-and-deep-learning-models |
Repo | https://github.com/PhilipCorr/numeral-gesture-dataset |
Framework | none |
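The 1D ConvNet's input described above is the sequence of polar vectors connecting consecutive touch points. A small NumPy sketch of that preprocessing step (segment length and angle per step); any normalization applied in the paper is not reproduced here.

```python
import numpy as np

def touches_to_polar(points):
    """Convert an (N, 2) sequence of touch coordinates into (N-1, 2) polar vectors
    (segment length, segment angle) between consecutive points."""
    points = np.asarray(points, dtype=float)
    deltas = np.diff(points, axis=0)                       # (N-1, 2) dx, dy
    radii = np.hypot(deltas[:, 0], deltas[:, 1])
    angles = np.arctan2(deltas[:, 1], deltas[:, 0])
    return np.stack([radii, angles], axis=1)

# Toy stroke: a few sensed touch points for a digit.
stroke = [(10, 10), (12, 14), (15, 19), (19, 23)]
print(touches_to_polar(stroke))
```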