October 20, 2019

3402 words 16 mins read

Paper Group AWR 232


Semi-supervised Deep Kernel Learning: Regression with Unlabeled Data by Minimizing Predictive Variance

Title Semi-supervised Deep Kernel Learning: Regression with Unlabeled Data by Minimizing Predictive Variance
Authors Neal Jean, Sang Michael Xie, Stefano Ermon
Abstract Large amounts of labeled data are typically required to train deep learning models. For many real-world problems, however, acquiring additional data can be expensive or even impossible. We present semi-supervised deep kernel learning (SSDKL), a semi-supervised regression model based on minimizing predictive variance in the posterior regularization framework. SSDKL combines the hierarchical representation learning of neural networks with the probabilistic modeling capabilities of Gaussian processes. By leveraging unlabeled data, we show improvements on a diverse set of real-world regression tasks over supervised deep kernel learning and semi-supervised methods such as VAT and mean teacher adapted for regression.
Tasks Gaussian Processes, Representation Learning
Published 2018-05-26
URL http://arxiv.org/abs/1805.10407v4
PDF http://arxiv.org/pdf/1805.10407v4.pdf
PWC https://paperswithcode.com/paper/semi-supervised-deep-kernel-learning
Repo https://github.com/ermongroup/ssdkl
Framework none
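
As a rough illustration of the variance-minimization idea above, here is a minimal NumPy sketch of the regularizer: the GP posterior variance at unlabeled points, computed from kernel values over (here, fixed) features. The kernel choice, lengthscale, and dimensions are illustrative placeholders, not the paper's settings.

```python
import numpy as np

def rbf_kernel(A, B, lengthscale=1.0):
    # Squared-exponential kernel between the rows of A and B.
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / lengthscale ** 2)

def gp_predictive_variance(X_lab, X_unl, noise=0.1):
    # var(x*) = k(x*, x*) - k(x*, X) (K + sigma^2 I)^{-1} k(X, x*)
    K = rbf_kernel(X_lab, X_lab) + noise * np.eye(len(X_lab))
    K_star = rbf_kernel(X_unl, X_lab)
    return 1.0 - np.einsum('ij,ji->i', K_star, np.linalg.solve(K, K_star.T))

# SSDKL-style objective = supervised GP likelihood term (omitted here)
# + alpha * mean posterior variance on the unlabeled points, which pulls
# the learned feature space toward explaining the unlabeled data.
X_lab = np.random.randn(20, 5)  # features of labeled inputs
X_unl = np.random.randn(50, 5)  # features of unlabeled inputs
print(gp_predictive_variance(X_lab, X_unl).mean())
```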

Image classification and retrieval with random depthwise signed convolutional neural networks

Title Image classification and retrieval with random depthwise signed convolutional neural networks
Authors Yunzhe Xue, Usman Roshan
Abstract We propose a random convolutional neural network to generate a feature space in which we study image classification and retrieval performance. Put briefly, we apply random convolutional blocks followed by global average pooling to generate a new feature, and we repeat this k times to produce a k-dimensional feature space. This can be interpreted as partitioning the space of image patches with random hyperplanes, which we formalize as a random depthwise convolutional neural network. In the network’s final layer we perform image classification and retrieval with the linear support vector machine and k-nearest neighbor classifiers and study other empirical properties. We show that the ratio of image pixel distribution similarity across classes to within classes is higher in our network’s final layer compared to the input space. When we apply the linear support vector machine for image classification we see that the accuracy is higher than if we were to train just the final layer of VGG16, ResNet18, and DenseNet40 with random weights. In the same setting we compare it to an unsupervised feature learning method and find our accuracy to be comparable on CIFAR10 but higher on CIFAR100 and STL10. We see that the accuracy is not far behind that of trained networks, particularly in the top-k setting. For example, the top-2 accuracy of our network is near 90% on both CIFAR10 and a 10-class mini ImageNet, and 85% on STL10. We find that k-nearest neighbor gives precision on the Corel Princeton Image Similarity Benchmark comparable to using the final layer of trained networks. As with other networks, we find that ours is vulnerable to a black-box attack even though it lacks a gradient and uses the sign activation. We highlight our network’s sensitivity to background as both a potential pitfall and an advantage. Overall our work pushes the boundary of what can be achieved with random weights.
Tasks Image Classification
Published 2018-06-15
URL http://arxiv.org/abs/1806.05789v3
PDF http://arxiv.org/pdf/1806.05789v3.pdf
PWC https://paperswithcode.com/paper/image-classification-and-retrieval-with
Repo https://github.com/xyzacademic/RandomDepthwiseCNN
Framework pytorch
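
A minimal PyTorch sketch of the core construction: k untrained depthwise convolution blocks, each followed by a sign activation and global average pooling, produce a k-dimensional feature vector per image. Single-layer blocks and the sizes below are simplifications for brevity, not the paper's architecture.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

def random_signed_features(x, k=16):
    # One random depthwise conv layer + sign activation + global average
    # pooling per block; repeating k blocks gives a k-dim feature vector.
    # (Single-layer blocks for brevity; the paper stacks deeper blocks.)
    c = x.shape[1]
    feats = []
    with torch.no_grad():  # weights stay random, nothing is trained
        for _ in range(k):
            conv = nn.Conv2d(c, c, kernel_size=3, padding=1, groups=c)
            h = torch.sign(conv(x))              # random hyperplanes + sign
            feats.append(h.mean(dim=(1, 2, 3)))  # global average pool
    return torch.stack(feats, dim=1)             # (batch, k)

images = torch.randn(4, 3, 32, 32)               # CIFAR-sized toy batch
print(random_signed_features(images).shape)      # torch.Size([4, 16])
# A linear SVM or k-NN classifier would then run on these features.
```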

TextureNet: Consistent Local Parametrizations for Learning from High-Resolution Signals on Meshes

Title TextureNet: Consistent Local Parametrizations for Learning from High-Resolution Signals on Meshes
Authors Jingwei Huang, Haotian Zhang, Li Yi, Thomas Funkhouser, Matthias Nießner, Leonidas Guibas
Abstract We introduce TextureNet, a neural network architecture designed to extract features from high-resolution signals associated with 3D surface meshes (e.g., color texture maps). The key idea is to utilize a 4-rotational symmetric (4-RoSy) field to define a domain for convolution on a surface. Though 4-RoSy fields have several properties favorable for convolution on surfaces (low distortion, few singularities, consistent parameterization, etc.), orientations are ambiguous up to 4-fold rotation at any sample point. So, we introduce a new convolutional operator invariant to the 4-RoSy ambiguity and use it in a network to extract features from high-resolution signals on geodesic neighborhoods of a surface. In comparison to alternatives, such as PointNet based methods which lack a notion of orientation, the coherent structure given by these neighborhoods results in significantly stronger features. As an example application, we demonstrate the benefits of our architecture for 3D semantic segmentation of textured 3D meshes. The results show that our method outperforms all existing methods on the basis of mean IoU by a significant margin in both geometry-only (6.4%) and RGB+Geometry (6.9-8.2%) settings.
Tasks 3D Semantic Segmentation, Semantic Segmentation
Published 2018-11-30
URL http://arxiv.org/abs/1812.00020v2
PDF http://arxiv.org/pdf/1812.00020v2.pdf
PWC https://paperswithcode.com/paper/texturenet-consistent-local-parametrizations
Repo https://github.com/hjwdzh/TextureNet
Framework none
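
A hedged sketch of a 4-fold-rotation-invariant convolution on a regular grid (the paper defines its operator on surface parametrizations, not on images): apply one shared filter bank under all four 90-degree rotations of the local frame and max-pool over them, so the output does not depend on which of the four ambiguous orientations is chosen.

```python
import torch
import torch.nn as nn

class FourRotInvariantConv(nn.Module):
    """Grid analogue of a 4-RoSy-invariant convolution: one shared conv is
    applied in all four rotated frames and the responses are max-pooled,
    giving invariance to the 4-fold orientation ambiguity."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1)

    def forward(self, x):
        responses = []
        for r in range(4):
            xr = torch.rot90(x, r, dims=(2, 3))   # rotate the frame
            hr = self.conv(xr)
            responses.append(torch.rot90(hr, -r, dims=(2, 3)))  # rotate back
        return torch.stack(responses, 0).max(0).values

layer = FourRotInvariantConv(3, 16)
print(layer(torch.randn(2, 3, 8, 8)).shape)  # torch.Size([2, 16, 8, 8])
```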

DORA The Explorer: Directed Outreaching Reinforcement Action-Selection

Title DORA The Explorer: Directed Outreaching Reinforcement Action-Selection
Authors Leshem Choshen, Lior Fox, Yonatan Loewenstein
Abstract Exploration is a fundamental aspect of Reinforcement Learning, typically implemented using stochastic action-selection. Exploration, however, can be more efficient if directed toward gaining new world knowledge. Visit-counters have been proven useful both in practice and in theory for directed exploration. However, a major limitation of counters is their locality. While there are a few model-based solutions to this shortcoming, a model-free approach is still missing. We propose $E$-values, a generalization of counters that can be used to evaluate the propagating exploratory value over state-action trajectories. We compare our approach to commonly used RL techniques, and show that using $E$-values improves learning and performance over traditional counters. We also show how our method can be implemented with function approximation to efficiently learn continuous MDPs. We demonstrate this by showing that our approach surpasses state-of-the-art performance in the Freeway Atari 2600 game.
Tasks
Published 2018-04-11
URL http://arxiv.org/abs/1804.04012v1
PDF http://arxiv.org/pdf/1804.04012v1.pdf
PWC https://paperswithcode.com/paper/dora-the-explorer-directed-outreaching
Repo https://github.com/borgr/DORA
Framework none
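
A tabular sketch of $E$-values, assuming the zero-reward, SARSA-style update the paper describes; the particular exploration-bonus form below is one simple count-based choice, not necessarily the paper's.

```python
import numpy as np

n_states, n_actions = 10, 4
gamma_E, lr = 0.9, 0.1

# E-values: a generalization of visit counters. Initialized at 1 and
# learned like a value function with reward fixed at 0, so "unexploredness"
# propagates along state-action trajectories instead of staying local.
E = np.ones((n_states, n_actions))

def update_E(s, a, s_next, a_next):
    # SARSA-style update with zero reward: E(s, a) shrinks as (s, a) is
    # visited, but also reflects how explored its successors are.
    E[s, a] = (1 - lr) * E[s, a] + lr * gamma_E * E[s_next, a_next]

def exploration_bonus(s):
    # Read E as a generalized counter via a log-base-gamma_E transform,
    # then apply a count-based bonus (one simple choice among several).
    counters = np.log(E[s]) / np.log(gamma_E)
    return 1.0 / np.sqrt(counters + 1.0)

update_E(0, 1, 2, 3)
print(E[0, 1])               # 0.99: a fractional "visit"
print(exploration_bonus(0))  # bonus shrinks as (s, a) pairs get explored
```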

Macro action selection with deep reinforcement learning in StarCraft

Title Macro action selection with deep reinforcement learning in StarCraft
Authors Sijia Xu, Hongyu Kuang, Zhi Zhuang, Renjie Hu, Yang Liu, Huyang Sun
Abstract StarCraft (SC) is one of the most popular and successful Real Time Strategy (RTS) games. In recent years, SC has also been widely accepted as a challenging testbed for AI research because of its enormous state space, partially observed information, multi-agent collaboration, and so on. With the help of the annual AIIDE and CIG competitions, a growing number of SC bots have been proposed and continuously improved. However, a large gap remains between the top-level bots and professional human players. One vital reason is that current SC bots mainly rely on predefined rules to select macro actions during their games. These rules are neither scalable nor efficient enough to cope with the enormous yet partially observed state space of the game. In this paper, we propose a deep reinforcement learning (DRL) framework to improve the selection of macro actions. Our framework is based on the combination of the Ape-X DQN and the Long Short-Term Memory (LSTM). We use this framework to build our bot, named LastOrder. Our evaluation, based on training against all bots from the AIIDE 2017 StarCraft AI competition set, shows that LastOrder achieves an 83% win rate, outperforming 26 of the 28 total entrants.
Tasks Real-Time Strategy Games, Starcraft
Published 2018-12-02
URL https://arxiv.org/abs/1812.00336v3
PDF https://arxiv.org/pdf/1812.00336v3.pdf
PWC https://paperswithcode.com/paper/macro-action-selection-with-deep
Repo https://github.com/Bilibili/LastOrder
Framework tf
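
A minimal PyTorch sketch of the network shape implied above: an LSTM summarizes the partially observed game state over time and a Q head scores a fixed set of macro actions. The Ape-X distributed training loop is omitted, and all sizes are illustrative placeholders.

```python
import torch
import torch.nn as nn

class MacroActionDQN(nn.Module):
    """Sketch of the DQN-plus-LSTM idea: the recurrent state carries
    memory over partial observations; the head outputs macro-action
    Q-values (dimensions are made-up, not the bot's)."""
    def __init__(self, obs_dim=128, hidden=256, n_macro_actions=20):
        super().__init__()
        self.encoder = nn.Linear(obs_dim, hidden)
        self.lstm = nn.LSTM(hidden, hidden, batch_first=True)
        self.q_head = nn.Linear(hidden, n_macro_actions)

    def forward(self, obs_seq, state=None):
        h = torch.relu(self.encoder(obs_seq))  # (B, T, hidden)
        out, state = self.lstm(h, state)
        return self.q_head(out[:, -1]), state  # Q-values at the last step

net = MacroActionDQN()
q, _ = net(torch.randn(2, 10, 128))
print(q.shape)  # torch.Size([2, 20])
```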

Attention on Attention: Architectures for Visual Question Answering (VQA)

Title Attention on Attention: Architectures for Visual Question Answering (VQA)
Authors Jasdeep Singh, Vincent Ying, Alex Nutkiewicz
Abstract Visual Question Answering (VQA) is an increasingly popular topic in deep learning research, requiring coordination of natural language processing and computer vision modules into a single architecture. We build upon the model which placed first in the VQA Challenge by developing thirteen new attention mechanisms and introducing a simplified classifier. We performed 300 GPU hours of extensive hyperparameter and architecture searches and were able to achieve an evaluation score of 64.78%, outperforming the existing state-of-the-art single model’s validation score of 63.15%.
Tasks Question Answering, Visual Question Answering
Published 2018-03-21
URL http://arxiv.org/abs/1803.07724v1
PDF http://arxiv.org/pdf/1803.07724v1.pdf
PWC https://paperswithcode.com/paper/attention-on-attention-architectures-for
Repo https://github.com/VincentYing/Attention-on-Attention-for-VQA
Framework pytorch
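
As a hedged sketch of the kind of attention mechanism such an architecture search contains: a question embedding attends over image-region features, and the attended vector is fused with the question for answer classification. The fusion choice and dimensions below are illustrative, not one of the paper's thirteen mechanisms.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class QuestionGuidedAttention(nn.Module):
    """Question-guided attention over image regions, followed by a
    simple concatenation-based classifier (a generic VQA building
    block, sketched with placeholder sizes)."""
    def __init__(self, img_dim=2048, q_dim=512, hidden=512, n_answers=3000):
        super().__init__()
        self.proj_img = nn.Linear(img_dim, hidden)
        self.proj_q = nn.Linear(q_dim, hidden)
        self.att = nn.Linear(hidden, 1)
        self.classifier = nn.Linear(hidden + q_dim, n_answers)

    def forward(self, img_feats, q_emb):
        # img_feats: (B, R, img_dim) region features; q_emb: (B, q_dim)
        joint = torch.tanh(self.proj_img(img_feats) + self.proj_q(q_emb).unsqueeze(1))
        alpha = F.softmax(self.att(joint), dim=1)             # (B, R, 1)
        attended = (alpha * self.proj_img(img_feats)).sum(1)  # (B, hidden)
        return self.classifier(torch.cat([attended, q_emb], dim=-1))

model = QuestionGuidedAttention()
print(model(torch.randn(2, 36, 2048), torch.randn(2, 512)).shape)
```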

A Hybrid Genetic Algorithm for the Traveling Salesman Problem with Drone

Title A Hybrid Genetic Algorithm for the Traveling Salesman Problem with Drone
Authors Quang Minh Ha, Yves Deville, Quang Dung Pham, Minh Hoàng Hà
Abstract This paper addresses the Traveling Salesman Problem with Drone (TSP-D), in which a truck and drone are used to deliver parcels to customers. The objective of this problem is to either minimize the total operational cost (min-cost TSP-D) or minimize the completion time for the truck and drone (min-time TSP-D). This problem has gained a lot of attention in the last few years since it matches recent trends in new delivery methods among logistics companies. To solve the TSP-D, we propose a hybrid genetic search with dynamic population management and adaptive diversity control based on a split algorithm, problem-tailored crossover and local search operators, a new restore method to speed up convergence, and an adaptive penalization mechanism to dynamically balance the search between feasible and infeasible solutions. The computational results show that the proposed algorithm outperforms existing methods in terms of solution quality and improves the best known solutions found in the literature. Moreover, various analyses of the impacts of the crossover choice and the heuristic components have been conducted to further analyze their influence on the performance of our method.
Tasks
Published 2018-12-21
URL http://arxiv.org/abs/1812.09351v1
PDF http://arxiv.org/pdf/1812.09351v1.pdf
PWC https://paperswithcode.com/paper/a-hybrid-genetic-algorithm-for-the-traveling
Repo https://github.com/ovidiuchile/AEA2019
Framework none
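
A minimal Python skeleton of a hybrid genetic search, shown on plain TSP for brevity: ordered crossover plus 2-opt local search as the "education" step. The drone, the split algorithm, the restore method, and the adaptive penalization are all omitted here.

```python
import random

def tour_length(tour, dist):
    return sum(dist[tour[i]][tour[(i + 1) % len(tour)]] for i in range(len(tour)))

def ordered_crossover(p1, p2):
    # Classic OX crossover: keep a slice of one parent, fill the remaining
    # positions in the order they appear in the other parent.
    a, b = sorted(random.sample(range(len(p1)), 2))
    child = [None] * len(p1)
    child[a:b] = p1[a:b]
    rest = [c for c in p2 if c not in child]
    return [rest.pop(0) if c is None else c for c in child]

def two_opt(tour, dist):
    # Local search ("education"): reverse segments while it helps.
    best = tour[:]
    for i in range(1, len(tour) - 1):
        for j in range(i + 1, len(tour)):
            cand = best[:i] + best[i:j][::-1] + best[j:]
            if tour_length(cand, dist) < tour_length(best, dist):
                best = cand
    return best

def hybrid_ga(dist, pop_size=30, generations=200):
    n = len(dist)
    pop = [random.sample(range(n), n) for _ in range(pop_size)]
    for _ in range(generations):
        p1, p2 = random.sample(pop, 2)
        child = two_opt(ordered_crossover(p1, p2), dist)
        pop.sort(key=lambda t: tour_length(t, dist))
        pop[-1] = child  # replace the worst individual (no diversity control)
    return min(pop, key=lambda t: tour_length(t, dist))

random.seed(1)
pts = [(random.random(), random.random()) for _ in range(12)]
dist = [[((ax - bx) ** 2 + (ay - by) ** 2) ** 0.5 for bx, by in pts] for ax, ay in pts]
print(round(tour_length(hybrid_ga(dist), dist), 3))
```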

Sim-to-Real: Learning Agile Locomotion For Quadruped Robots

Title Sim-to-Real: Learning Agile Locomotion For Quadruped Robots
Authors Jie Tan, Tingnan Zhang, Erwin Coumans, Atil Iscen, Yunfei Bai, Danijar Hafner, Steven Bohez, Vincent Vanhoucke
Abstract Designing agile locomotion for quadruped robots often requires extensive expertise and tedious manual tuning. In this paper, we present a system to automate this process by leveraging deep reinforcement learning techniques. Our system can learn quadruped locomotion from scratch using simple reward signals. In addition, users can provide an open loop reference to guide the learning process when more control over the learned gait is needed. The control policies are learned in a physics simulator and then deployed on real robots. In robotics, policies trained in simulation often do not transfer to the real world. We narrow this reality gap by improving the physics simulator and learning robust policies. We improve the simulation using system identification, developing an accurate actuator model and simulating latency. We learn robust controllers by randomizing the physical environments, adding perturbations and designing a compact observation space. We evaluate our system on two agile locomotion gaits: trotting and galloping. After learning in simulation, a quadruped robot can successfully perform both gaits in the real world.
Tasks
Published 2018-04-27
URL http://arxiv.org/abs/1804.10332v2
PDF http://arxiv.org/pdf/1804.10332v2.pdf
PWC https://paperswithcode.com/paper/sim-to-real-learning-agile-locomotion-for
Repo https://github.com/taku-y/20181125-pybullet
Framework tf
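
A toy sketch of two of the sim-to-real ingredients mentioned above: per-episode randomization of physical parameters, and a delay buffer that simulates observation latency. The parameter ranges and latency values are made up for illustration.

```python
import random
from collections import deque

class RandomizedSimEnv:
    """Toy environment wrapper: resamples physics each episode (domain
    randomization) and delays observations by a fixed number of steps
    (latency simulation). The physics step itself is a placeholder."""
    def __init__(self, latency_steps=2):
        self.obs_buffer = deque(maxlen=latency_steps + 1)
        self.reset()

    def reset(self):
        # Domain randomization: sample new physics for this episode.
        self.mass_scale = random.uniform(0.8, 1.2)
        self.friction = random.uniform(0.5, 1.25)
        self.obs_buffer.clear()
        obs = self._observe()
        for _ in range(self.obs_buffer.maxlen):
            self.obs_buffer.append(obs)
        return self.obs_buffer[0]

    def _observe(self):
        # Stand-in for the true simulator state.
        return [random.gauss(0, 1) for _ in range(4)]

    def step(self, action):
        # ... a physics step would use self.mass_scale / self.friction ...
        self.obs_buffer.append(self._observe())
        return self.obs_buffer[0]  # the policy sees a delayed observation

env = RandomizedSimEnv()
obs = env.reset()
print(env.mass_scale, env.friction, obs)
```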

Complex Sequential Question Answering: Towards Learning to Converse Over Linked Question Answer Pairs with a Knowledge Graph

Title Complex Sequential Question Answering: Towards Learning to Converse Over Linked Question Answer Pairs with a Knowledge Graph
Authors Amrita Saha, Vardaan Pahuja, Mitesh M. Khapra, Karthik Sankaranarayanan, Sarath Chandar
Abstract While conversing with chatbots, humans typically tend to ask many questions, a significant portion of which can be answered by referring to large-scale knowledge graphs (KG). While Question Answering (QA) and dialog systems have been studied independently, there is a need to study them closely to evaluate such real-world scenarios faced by bots involving both these tasks. Towards this end, we introduce the task of Complex Sequential QA which combines the two tasks of (i) answering factual questions through complex inferencing over a realistic-sized KG of millions of entities, and (ii) learning to converse through a series of coherently linked QA pairs. Through a labor-intensive semi-automatic process, involving in-house and crowdsourced workers, we created a dataset containing around 200K dialogs with a total of 1.6M turns. Further, unlike existing large-scale QA datasets which contain simple questions that can be answered from a single tuple, the questions in our dialogs require a larger subgraph of the KG. Specifically, our dataset has questions which require logical, quantitative, and comparative reasoning as well as their combinations. This calls for models which can: (i) parse complex natural language questions, (ii) use conversation context to resolve coreferences and ellipsis in utterances, (iii) ask for clarifications for ambiguous queries, and finally (iv) retrieve relevant subgraphs of the KG to answer such questions. However, our experiments with a combination of state-of-the-art dialog and QA models show that they clearly do not achieve the above objectives and are inadequate for dealing with such complex real-world settings. We believe that this new dataset, coupled with the limitations of existing models as reported in this paper, should encourage further research in Complex Sequential QA.
Tasks Knowledge Graphs, Question Answering
Published 2018-01-31
URL http://arxiv.org/abs/1801.10314v2
PDF http://arxiv.org/pdf/1801.10314v2.pdf
PWC https://paperswithcode.com/paper/complex-sequential-question-answering-towards
Repo https://github.com/mali-git/CSQA_Implementation
Framework none

Complete the Look: Scene-based Complementary Product Recommendation

Title Complete the Look: Scene-based Complementary Product Recommendation
Authors Wang-Cheng Kang, Eric Kim, Jure Leskovec, Charles Rosenberg, Julian McAuley
Abstract Modeling fashion compatibility is challenging due to its complexity and subjectivity. Existing work focuses on predicting compatibility between product images (e.g. an image containing a t-shirt and an image containing a pair of jeans). However, these approaches ignore real-world ‘scene’ images (e.g. selfies); such images are hard to deal with due to their complexity, clutter, variations in lighting and pose (etc.) but on the other hand could potentially provide key context (e.g. the user’s body type, or the season) for making more accurate recommendations. In this work, we propose a new task called ‘Complete the Look’, which seeks to recommend visually compatible products based on scene images. We design an approach to extract training data for this task, and propose a novel way to learn the scene-product compatibility from fashion or interior design images. Our approach measures compatibility both globally and locally via CNNs and attention mechanisms. Extensive experiments show that our method achieves significant performance gains over alternative systems. Human evaluation and qualitative analysis are also conducted to further understand model behavior. We hope this work could lead to useful applications which link large corpora of real-world scenes with shoppable products.
Tasks Product Recommendation
Published 2018-12-04
URL http://arxiv.org/abs/1812.01748v2
PDF http://arxiv.org/pdf/1812.01748v2.pdf
PWC https://paperswithcode.com/paper/complete-the-look-scene-based-complementary
Repo https://github.com/kang205/STL-Dataset
Framework none
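
A hedged sketch of scoring scene-product compatibility both globally and locally: a global embedding distance plus an attention-weighted average of region-level distances. The softmax weighting below is one simple construction, not the paper's exact formulation, and all embeddings are assumed to come from CNNs upstream.

```python
import torch
import torch.nn.functional as F

def compatibility(scene_global, scene_regions, product_emb):
    # scene_global: (B, D); scene_regions: (B, R, D); product_emb: (B, D)
    d_global = (scene_global - product_emb).pow(2).sum(-1)              # (B,)
    d_local = (scene_regions - product_emb.unsqueeze(1)).pow(2).sum(-1)  # (B, R)
    # Attention focuses the local term on the most compatible regions.
    attn = F.softmax(-d_local, dim=1)
    return d_global + (attn * d_local).sum(1)

B, R, D = 2, 9, 128
score = compatibility(torch.randn(B, D), torch.randn(B, R, D), torch.randn(B, D))
print(score.shape)  # torch.Size([2]) -- lower means more compatible
```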

Correlated pseudo-marginal Metropolis-Hastings using quasi-Newton proposals

Title Correlated pseudo-marginal Metropolis-Hastings using quasi-Newton proposals
Authors Johan Dahlin, Adrian Wills, Brett Ninness
Abstract Pseudo-marginal Metropolis-Hastings (pmMH) is a versatile algorithm for sampling from target distributions which are not easy to evaluate point-wise. However, pmMH requires good proposal distributions to sample efficiently from the target, which can be problematic to construct in practice. This is especially a problem for high-dimensional targets when the standard random-walk proposal is inefficient. We extend pmMH to allow for constructing the proposal based on information from multiple past iterations. As a consequence, quasi-Newton (qN) methods can be employed to form proposals which utilize gradient information to guide the Markov chain to areas of high probability and to construct approximations of the local curvature to scale step sizes. The proposed method is demonstrated on several problems which indicate that qN proposals can perform better than other common Hessian-based proposals.
Tasks
Published 2018-06-26
URL http://arxiv.org/abs/1806.09780v2
PDF http://arxiv.org/pdf/1806.09780v2.pdf
PWC https://paperswithcode.com/paper/correlated-pseudo-marginal-metropolis
Repo https://github.com/compops/pmmh-qn
Framework none
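
A minimal NumPy sketch of plain pseudo-marginal MH with a random-walk proposal, using a toy noisy log-likelihood estimator. The paper's contribution, replacing this proposal with one built from quasi-Newton curvature estimates over multiple past iterations, is not sketched here.

```python
import numpy as np

rng = np.random.default_rng(0)

def log_like_estimate(theta):
    # Toy noisy but unbiased-in-spirit estimator, standing in for e.g. a
    # particle filter; the underlying target here is N(1, 1) over theta.
    return -0.5 * (theta - 1.0) ** 2 + rng.normal(0, 0.1)

def pm_metropolis_hastings(n_iters=5000, step=0.5):
    theta = 0.0
    ll = log_like_estimate(theta)
    samples = []
    for _ in range(n_iters):
        theta_prop = theta + step * rng.normal()  # random-walk proposal
        ll_prop = log_like_estimate(theta_prop)
        # Accept using the *estimated* likelihoods; reusing the stored
        # estimate for the current state is what keeps the chain exact.
        if np.log(rng.uniform()) < ll_prop - ll:
            theta, ll = theta_prop, ll_prop
        samples.append(theta)
    return np.array(samples)

samples = pm_metropolis_hastings()
print(samples[1000:].mean())  # should be near 1.0
```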

Hierarchical Generative Modeling for Controllable Speech Synthesis

Title Hierarchical Generative Modeling for Controllable Speech Synthesis
Authors Wei-Ning Hsu, Yu Zhang, Ron J. Weiss, Heiga Zen, Yonghui Wu, Yuxuan Wang, Yuan Cao, Ye Jia, Zhifeng Chen, Jonathan Shen, Patrick Nguyen, Ruoming Pang
Abstract This paper proposes a neural sequence-to-sequence text-to-speech (TTS) model which can control latent attributes in the generated speech that are rarely annotated in the training data, such as speaking style, accent, background noise, and recording conditions. The model is formulated as a conditional generative model based on the variational autoencoder (VAE) framework, with two levels of hierarchical latent variables. The first level is a categorical variable, which represents attribute groups (e.g. clean/noisy) and provides interpretability. The second level, conditioned on the first, is a multivariate Gaussian variable, which characterizes specific attribute configurations (e.g. noise level, speaking rate) and enables disentangled fine-grained control over these attributes. This amounts to using a Gaussian mixture model (GMM) for the latent distribution. Extensive evaluation demonstrates its ability to control the aforementioned attributes. In particular, we train a high-quality controllable TTS model on real found data, which is capable of inferring speaker and style attributes from a noisy utterance and using them to synthesize clean speech with controllable speaking style.
Tasks Speech Synthesis
Published 2018-10-16
URL http://arxiv.org/abs/1810.07217v2
PDF http://arxiv.org/pdf/1810.07217v2.pdf
PWC https://paperswithcode.com/paper/hierarchical-generative-modeling-for
Repo https://github.com/rarefin/TTS_VAE
Framework pytorch
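
A minimal PyTorch sketch of the two-level latent prior: a categorical y picks an attribute group, and a Gaussian z is sampled conditioned on y, so the marginal prior over z is a GMM. Sizes are illustrative, and conditioning a seq2seq TTS decoder on z is omitted.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HierarchicalLatent(nn.Module):
    """Two-level latent: categorical attribute group y, then Gaussian
    z | y with per-group mean and scale (a GMM prior over z)."""
    def __init__(self, n_groups=2, z_dim=16):
        super().__init__()
        self.logits = nn.Parameter(torch.zeros(n_groups))      # p(y)
        self.mu = nn.Parameter(torch.randn(n_groups, z_dim))   # p(z|y) means
        self.log_sigma = nn.Parameter(torch.zeros(n_groups, z_dim))

    def sample(self, n):
        y = torch.multinomial(F.softmax(self.logits, -1), n, replacement=True)
        eps = torch.randn(n, self.mu.shape[1])
        z = self.mu[y] + self.log_sigma[y].exp() * eps  # reparameterized
        return y, z

prior = HierarchicalLatent()
y, z = prior.sample(4)
print(y.shape, z.shape)  # torch.Size([4]) torch.Size([4, 16])
```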

End-to-End Dense Video Captioning with Masked Transformer

Title End-to-End Dense Video Captioning with Masked Transformer
Authors Luowei Zhou, Yingbo Zhou, Jason J. Corso, Richard Socher, Caiming Xiong
Abstract Dense video captioning aims to generate text descriptions for all events in an untrimmed video. This involves both detecting and describing events. Therefore, all previous methods on dense video captioning tackle this problem by building two models, i.e. an event proposal model and a captioning model, for these two sub-problems. The models are either trained separately or in alternation. This prevents direct influence of the language description on the event proposal, which is important for generating accurate descriptions. To address this problem, we propose an end-to-end transformer model for dense video captioning. The encoder encodes the video into appropriate representations. The proposal decoder decodes from the encoding with different anchors to form video event proposals. The captioning decoder employs a masking network to restrict its attention to the proposal event over the encoding feature. This masking network converts the event proposal to a differentiable mask, which ensures the consistency between the proposal and captioning during training. In addition, our model employs a self-attention mechanism, which enables the use of an efficient non-recurrent structure during encoding and leads to performance improvements. We demonstrate the effectiveness of this end-to-end model on the ActivityNet Captions and YouCookII datasets, where we achieve METEOR scores of 10.12 and 6.58, respectively.
Tasks Dense Video Captioning, Video Captioning
Published 2018-04-03
URL http://arxiv.org/abs/1804.00819v1
PDF http://arxiv.org/pdf/1804.00819v1.pdf
PWC https://paperswithcode.com/paper/end-to-end-dense-video-captioning-with-masked
Repo https://github.com/salesforce/densecap
Framework pytorch
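
A hedged sketch of the differentiable-mask idea: turn a predicted event proposal into a soft 0/1 mask over encoded time steps, so the captioning decoder attends only inside the proposed segment while gradients still flow back into the proposal. The sigmoid-product construction below is one simple way to realize this, not necessarily the paper's exact form.

```python
import torch

def differentiable_event_mask(center, length, T, sharpness=10.0):
    # center and length are in [0, 1] relative to the video duration;
    # the mask is near 1 inside [center - length/2, center + length/2]
    # and near 0 outside, with smooth (differentiable) edges.
    t = torch.linspace(0, 1, T)  # normalized time axis over T steps
    start, end = center - length / 2, center + length / 2
    return torch.sigmoid(sharpness * (t - start)) * \
           torch.sigmoid(sharpness * (end - t))

m = differentiable_event_mask(center=torch.tensor(0.5),
                              length=torch.tensor(0.3), T=10)
print(m)  # (T,) soft mask, usable as a multiplicative attention mask
```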

Semi-supervised Transfer Learning for Image Rain Removal

Title Semi-supervised Transfer Learning for Image Rain Removal
Authors Wei Wei, Deyu Meng, Qian Zhao, Zongben Xu, Ying Wu
Abstract Single image rain removal is a typical inverse problem in computer vision. The deep learning technique has been verified to be effective for this task and has achieved state-of-the-art performance. However, previous deep learning methods need to pre-collect a large set of image pairs with/without synthesized rain for training, which tends to bias the neural network toward learning the specific patterns of the synthesized rain and makes it less able to generalize to real test samples whose rain types differ from those in the training data. To address this issue, this paper proposes a semi-supervised learning paradigm for this task. Different from traditional deep learning methods, which only use supervised image pairs with/without synthesized rain, we further feed real rainy images, without their clean counterparts, into the network training process. This is realized by elaborately formulating the residual between an input rainy image and its expected network output (a clear image without rain) as a specific parametrized rain-streak distribution. The network is therefore trained to adapt to diverse real rain types by transferring from the supervised synthesized rain, so that both the shortage of training samples and the bias toward supervised samples are evidently alleviated. Experiments on synthetic and real data verify the superiority of our model compared to the state of the art.
Tasks Rain Removal, Transfer Learning
Published 2018-07-29
URL http://arxiv.org/abs/1807.11078v2
PDF http://arxiv.org/pdf/1807.11078v2.pdf
PWC https://paperswithcode.com/paper/semi-supervised-cnn-for-single-image-rain
Repo https://github.com/wwzjer/Semi-supervised-IRR
Framework none
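
A toy PyTorch sketch of the training signal: a supervised loss on synthetic rainy/clean pairs plus an unsupervised term on real rainy images that scores the residual (input minus predicted clean image) under a parametrized distribution. A zero-mean Gaussian prior stands in for the paper's learned rain-streak model, and the network is a placeholder.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def semi_supervised_rain_loss(net, x_syn, y_syn, x_real, alpha=0.1):
    # Supervised term on synthetic pairs.
    sup = F.mse_loss(net(x_syn), y_syn)
    # Unsupervised term on real rainy images: the residual is treated as
    # the rain layer and scored under a simple zero-mean Gaussian prior
    # (negative log-likelihood up to constants).
    residual = x_real - net(x_real)
    unsup = residual.pow(2).mean()
    return sup + alpha * unsup

net = nn.Conv2d(3, 3, 3, padding=1)  # placeholder derainer
loss = semi_supervised_rain_loss(net,
                                 torch.randn(2, 3, 32, 32),
                                 torch.randn(2, 3, 32, 32),
                                 torch.randn(2, 3, 32, 32))
loss.backward()
print(float(loss))
```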

Deep Back-Projection Networks For Super-Resolution

Title Deep Back-Projection Networks For Super-Resolution
Authors Muhammad Haris, Greg Shakhnarovich, Norimichi Ukita
Abstract The feed-forward architectures of recently proposed deep super-resolution networks learn representations of low-resolution inputs and the non-linear mapping from those to high-resolution output. However, this approach does not fully address the mutual dependencies of low- and high-resolution images. We propose Deep Back-Projection Networks (DBPN), which exploit iterative up- and down-sampling layers, providing an error feedback mechanism for projection errors at each stage. We construct mutually connected up- and down-sampling stages, each of which represents different types of image degradation and high-resolution components. We show that extending this idea to allow concatenation of features across up- and down-sampling stages (Dense DBPN) further improves super-resolution, yielding superior results and in particular establishing new state-of-the-art results for large scaling factors such as 8x across multiple datasets.
Tasks Image Super-Resolution, Super-Resolution, Video Super-Resolution
Published 2018-03-07
URL http://arxiv.org/abs/1803.02735v1
PDF http://arxiv.org/pdf/1803.02735v1.pdf
PWC https://paperswithcode.com/paper/deep-back-projection-networks-for-super
Repo https://github.com/SimoneDutto/EDSR
Framework pytorch
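
A minimal PyTorch sketch of an up-projection unit with error feedback, following the abstract's description: upsample, project back down, and correct the upsampled estimate with the back-projection error. Kernel and stride are chosen for 2x scaling; activations and the dense connections are omitted.

```python
import torch
import torch.nn as nn

class UpProjection(nn.Module):
    """DBPN-style up-projection: an initial high-res estimate is projected
    back to low resolution, and the resulting error is upsampled and added
    as a correction (a simplified sketch without activations)."""
    def __init__(self, ch, scale=2):
        super().__init__()
        k, s, p = 6, scale, 2  # 2x up/down sampling geometry
        self.up1 = nn.ConvTranspose2d(ch, ch, k, s, p)
        self.down = nn.Conv2d(ch, ch, k, s, p)
        self.up2 = nn.ConvTranspose2d(ch, ch, k, s, p)

    def forward(self, low):
        high = self.up1(low)         # initial high-res estimate
        err = self.down(high) - low  # back-projection error in low-res space
        return high + self.up2(err)  # error feedback corrects the estimate

block = UpProjection(32)
x = torch.randn(1, 32, 16, 16)
print(block(x).shape)  # torch.Size([1, 32, 32, 32])
```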