Paper Group AWR 59
Attention, Learn to Solve Routing Problems!. MAN: Moment Alignment Network for Natural Language Moment Retrieval via Iterative Graph Adjustment. Graph-based Filtering of Out-of-Vocabulary Words for Encoder-Decoder Models. SubGram: Extending Skip-gram Word Representation with Substrings. Inverse Problems in Asteroseismology. Scalable Factorized Hierarchical Variational Autoencoder Training. GPGPU Linear Complexity t-SNE Optimization. What Do We Understand About Convolutional Networks?. AutoFocus: Efficient Multi-Scale Inference. Transfer of Deep Reactive Policies for MDP Planning. Translations as Additional Contexts for Sentence Classification. Demystifying Differentiable Programming: Shift/Reset the Penultimate Backpropagator. Speech Intention Understanding in a Head-final Language: A Disambiguation Utilizing Intonation-dependency. Exploring the Limits of Weakly Supervised Pretraining. Free-Form Image Inpainting with Gated Convolution.
Attention, Learn to Solve Routing Problems!
Title | Attention, Learn to Solve Routing Problems! |
Authors | Wouter Kool, Herke van Hoof, Max Welling |
Abstract | The recently presented idea to learn heuristics for combinatorial optimization problems is promising as it can save costly development. However, to push this idea towards practical implementation, we need better models and better ways of training. We contribute in both directions: we propose a model based on attention layers with benefits over the Pointer Network and we show how to train this model using REINFORCE with a simple baseline based on a deterministic greedy rollout, which we find is more efficient than using a value function. We significantly improve over recent learned heuristics for the Travelling Salesman Problem (TSP), getting close to optimal results for problems up to 100 nodes. With the same hyperparameters, we learn strong heuristics for two variants of the Vehicle Routing Problem (VRP), the Orienteering Problem (OP) and (a stochastic variant of) the Prize Collecting TSP (PCTSP), outperforming a wide range of baselines and getting results close to highly optimized and specialized algorithms. |
Tasks | Combinatorial Optimization |
Published | 2018-03-22 |
URL | http://arxiv.org/abs/1803.08475v3 |
http://arxiv.org/pdf/1803.08475v3.pdf | |
PWC | https://paperswithcode.com/paper/attention-learn-to-solve-routing-problems |
Repo | https://github.com/raphaelavalos/attention_tsp_graph_net |
Framework | tf |
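A minimal sketch of the training signal described in the abstract above: REINFORCE with a deterministic greedy-rollout baseline instead of a learned value function. The `policy(batch, decode=...)` interface is an assumption for illustration, not the API of the linked repository.

```python
import torch

def reinforce_greedy_rollout_loss(policy, baseline_policy, batch):
    # Sample tours with the current policy and score them against a greedy
    # rollout of a frozen baseline copy of the policy (no critic needed).
    cost, log_likelihood = policy(batch, decode="sample")            # assumed interface
    with torch.no_grad():
        baseline_cost, _ = baseline_policy(batch, decode="greedy")   # deterministic rollout
    advantage = cost - baseline_cost             # > 0 means worse than the greedy baseline
    return (advantage * log_likelihood).mean()   # minimizing this pushes good tours up
```

The baseline policy is periodically replaced by the current policy when the latter is significantly better, so the advantage stays informative as training progresses.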
MAN: Moment Alignment Network for Natural Language Moment Retrieval via Iterative Graph Adjustment
Title | MAN: Moment Alignment Network for Natural Language Moment Retrieval via Iterative Graph Adjustment |
Authors | Da Zhang, Xiyang Dai, Xin Wang, Yuan-Fang Wang, Larry S. Davis |
Abstract | This research strives for natural language moment retrieval in long, untrimmed video streams. The problem is not trivial, especially when a video contains multiple moments of interest and the language describes complex temporal dependencies, which often happens in real scenarios. We identify two crucial challenges: semantic misalignment and structural misalignment. However, existing approaches treat different moments separately and do not explicitly model complex moment-wise temporal relations. In this paper, we present Moment Alignment Network (MAN), a novel framework that unifies the candidate moment encoding and temporal structural reasoning in a single-shot feed-forward network. MAN naturally assigns candidate moment representations aligned with language semantics over different temporal locations and scales. Most importantly, we propose to explicitly model moment-wise temporal relations as a structured graph and devise an iterative graph adjustment network to jointly learn the best structure in an end-to-end manner. We evaluate the proposed approach on two challenging public benchmarks, DiDeMo and Charades-STA, where our MAN significantly outperforms the state-of-the-art by a large margin. |
Tasks | Natural Language Moment Retrieval |
Published | 2018-11-30 |
URL | https://arxiv.org/abs/1812.00087v2 |
https://arxiv.org/pdf/1812.00087v2.pdf | |
PWC | https://paperswithcode.com/paper/man-moment-alignment-network-for-natural |
Repo | https://github.com/dazhang-cv/Project |
Framework | none |
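The "iterative graph adjustment" of the abstract above can be pictured as repeatedly re-estimating a relation graph over candidate-moment features and propagating messages over it. The sketch below is only an illustration under assumed shapes and parameterizations; the actual MAN architecture differs in its details.

```python
import torch
import torch.nn as nn

class IterativeGraphAdjustment(nn.Module):
    """Hedged sketch: refine relations between N candidate-moment features over T steps."""
    def __init__(self, dim, steps=3):
        super().__init__()
        self.steps = steps
        self.edge_proj = nn.Linear(dim, dim)   # scores pairwise moment relations
        self.node_proj = nn.Linear(dim, dim)   # updates moment features

    def forward(self, x):                      # x: (N, dim) candidate-moment features
        for _ in range(self.steps):
            scores = self.edge_proj(x) @ x.t()         # (N, N) relation scores
            adj = torch.softmax(scores, dim=-1)        # re-estimated graph structure
            x = torch.relu(self.node_proj(adj @ x)) + x  # message passing + residual
        return x
```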
Graph-based Filtering of Out-of-Vocabulary Words for Encoder-Decoder Models
Title | Graph-based Filtering of Out-of-Vocabulary Words for Encoder-Decoder Models |
Authors | Satoru Katsumata, Yukio Matsumura, Hayahide Yamagishi, Mamoru Komachi |
Abstract | Encoder-decoder models typically only employ words that are frequently used in the training corpus to reduce the computational costs and exclude noise. However, this vocabulary set may still include words that interfere with learning in encoder-decoder models. This paper proposes a method for selecting more suitable words for learning encoders by utilizing not only frequency, but also co-occurrence information, which we capture using the HITS algorithm. We apply our proposed method to two tasks: machine translation and grammatical error correction. For Japanese-to-English translation, this method achieves a BLEU score that is 0.56 points more than that of a baseline. It also outperforms the baseline method for English grammatical error correction, with an F0.5-measure that is 1.48 points higher. |
Tasks | Grammatical Error Correction, Machine Translation |
Published | 2018-05-28 |
URL | http://arxiv.org/abs/1805.11189v1 |
http://arxiv.org/pdf/1805.11189v1.pdf | |
PWC | https://paperswithcode.com/paper/graph-based-filtering-of-out-of-vocabulary |
Repo | https://github.com/Katsumata420/HITS_Ranking |
Framework | none |
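The HITS step mentioned in the abstract above is a standard power iteration; a compact version over a word co-occurrence matrix is sketched below. How exactly the graph is built from the corpus (window size, directionality) is an assumption here, not the paper's exact construction.

```python
import numpy as np

def hits_scores(cooc, iters=50):
    """Hub and authority scores via HITS power iteration.
    cooc: (V, V) matrix where cooc[i, j] counts word j occurring near word i."""
    n = cooc.shape[0]
    hubs = np.ones(n)
    auths = np.ones(n)
    for _ in range(iters):
        auths = cooc.T @ hubs            # authorities are pointed to by good hubs
        auths /= np.linalg.norm(auths)
        hubs = cooc @ auths              # hubs point to good authorities
        hubs /= np.linalg.norm(hubs)
    return hubs, auths
```

Words with high scores under this co-occurrence-based ranking would then be kept in the encoder vocabulary in addition to (or instead of) the most frequent ones.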
SubGram: Extending Skip-gram Word Representation with Substrings
Title | SubGram: Extending Skip-gram Word Representation with Substrings |
Authors | Tom Kocmi, Ondřej Bojar |
Abstract | Skip-gram (word2vec) is a recent method for creating vector representations of words (“distributed word representations”) using a neural network. The representation gained popularity in various areas of natural language processing, because it seems to capture syntactic and semantic information about words without any explicit supervision in this respect. We propose SubGram, a refinement of the Skip-gram model to consider also the word structure during the training process, achieving large gains on the Skip-gram original test set. |
Tasks | |
Published | 2018-06-18 |
URL | http://arxiv.org/abs/1806.06571v1 |
http://arxiv.org/pdf/1806.06571v1.pdf | |
PWC | https://paperswithcode.com/paper/subgram-extending-skip-gram-word |
Repo | https://github.com/tomkocmi/SubGram |
Framework | none |
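A hedged illustration of "extending Skip-gram with substrings": enumerate character n-grams of a word (with boundary markers) and represent the word as the sum of their embeddings. SubGram's exact substring scheme and combination may differ; this is only the general idea.

```python
def substrings(word, min_n=3, max_n=6):
    """All character n-grams of `word` with boundary markers (fastText-style assumption)."""
    w = f"<{word}>"
    return [w[i:i + n]
            for n in range(min_n, max_n + 1)
            for i in range(len(w) - n + 1)]

# A word vector could then be the sum of the embeddings of its substrings:
#   v(word) = sum(E[s] for s in substrings(word))
```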
Inverse Problems in Asteroseismology
Title | Inverse Problems in Asteroseismology |
Authors | Earl Patrick Bellinger |
Abstract | Asteroseismology allows us to probe the internal structure of stars through their global modes of oscillation. Thanks to missions such as the NASA Kepler space observatory, we now have high-quality asteroseismic data for nearly 100 solar-type stars. In this thesis, new techniques to measure the ages, masses, and radii of stars are presented, as well as a way to infer their internal structure. |
Tasks | |
Published | 2018-08-20 |
URL | http://arxiv.org/abs/1808.06649v1 |
http://arxiv.org/pdf/1808.06649v1.pdf | |
PWC | https://paperswithcode.com/paper/inverse-problems-in-asteroseismology |
Repo | https://github.com/earlbellinger/thesis |
Framework | none |
Scalable Factorized Hierarchical Variational Autoencoder Training
Title | Scalable Factorized Hierarchical Variational Autoencoder Training |
Authors | Wei-Ning Hsu, James Glass |
Abstract | Deep generative models have achieved great success in unsupervised learning with the ability to capture complex nonlinear relationships between latent generating factors and observations. Among them, a factorized hierarchical variational autoencoder (FHVAE) is a variational inference-based model that formulates a hierarchical generative process for sequential data. Specifically, an FHVAE model can learn disentangled and interpretable representations, which have been proven useful for numerous speech applications, such as speaker verification, robust speech recognition, and voice conversion. However, as we will elaborate in this paper, the training algorithm proposed in the original paper is not scalable to datasets of thousands of hours, which makes this model less applicable on a larger scale. After identifying limitations in terms of runtime, memory, and hyperparameter optimization, we propose a hierarchical sampling training algorithm to address all three issues. Our proposed method is evaluated comprehensively on a wide variety of datasets, ranging from 3 to 1,000 hours and involving different types of generating factors, such as recording conditions and noise types. In addition, we also present a new visualization method for qualitatively evaluating the performance with respect to the interpretability and disentanglement. Models trained with our proposed algorithm demonstrate the desired characteristics on all the datasets. |
Tasks | Hyperparameter Optimization, Robust Speech Recognition, Speaker Verification, Speech Recognition, Voice Conversion |
Published | 2018-04-09 |
URL | http://arxiv.org/abs/1804.03201v2 |
http://arxiv.org/pdf/1804.03201v2.pdf | |
PWC | https://paperswithcode.com/paper/scalable-factorized-hierarchical-variational |
Repo | https://github.com/wnhsu/ScalableFHVAE |
Framework | tf |
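The hierarchical sampling idea from the abstract above can be pictured as a two-level loop: repeatedly draw a small subset of sequences, then train on batches drawn only from that subset, so sequence-level statistics never need to be held for the whole corpus at once. Names and arguments below are illustrative, not the repository's API.

```python
import random

def hierarchical_batches(sequences, subset_size, batches_per_subset, make_batches):
    # Outer loop: sample a manageable subset of sequences (sequence-level sampling).
    # Inner loop: draw training batches only from that subset (segment-level sampling).
    while True:
        subset = random.sample(sequences, subset_size)
        for batch in make_batches(subset, batches_per_subset):
            yield batch
```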
GPGPU Linear Complexity t-SNE Optimization
Title | GPGPU Linear Complexity t-SNE Optimization |
Authors | Nicola Pezzotti, Julian Thijssen, Alexander Mordvintsev, Thomas Hollt, Baldur van Lew, Boudewijn P. F. Lelieveldt, Elmar Eisemann, Anna Vilanova |
Abstract | The t-distributed Stochastic Neighbor Embedding (tSNE) algorithm has become in recent years one of the most used and insightful techniques for the exploratory data analysis of high-dimensional data. tSNE reveals clusters of high-dimensional data points at different scales while it requires only minimal tuning of its parameters. Despite these advantages, the computational complexity of the algorithm limits its application to relatively small datasets. To address this problem, several evolutions of tSNE have been developed in recent years, mainly focusing on the scalability of the similarity computations between data points. However, these contributions are insufficient to achieve interactive rates when visualizing the evolution of the tSNE embedding for large datasets. In this work, we present a novel approach to the minimization of the tSNE objective function that heavily relies on modern graphics hardware and has linear computational complexity. Our technique not only beats the state of the art, but can even be executed on the client side in a browser. We propose to approximate the repulsion forces between data points using adaptive-resolution textures that are drawn at every iteration with WebGL. This approximation allows us to reformulate the tSNE minimization problem as a series of tensor operations that are computed with TensorFlow.js, a JavaScript library for scalable tensor computations. |
Tasks | |
Published | 2018-05-28 |
URL | https://arxiv.org/abs/1805.10817v2 |
https://arxiv.org/pdf/1805.10817v2.pdf | |
PWC | https://paperswithcode.com/paper/linear-tsne-optimization-for-the-web |
Repo | https://github.com/tensorflow/tfjs-tsne |
Framework | tf |
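For reference, the exact (quadratic-time) tSNE gradient splits into an attractive and a repulsive term; the repulsive term is the part the paper approximates with adaptive-resolution textures on the GPU. A small numpy version of the exact gradient, for a 2-D embedding `Y` and joint probabilities `P`:

```python
import numpy as np

def tsne_gradient(P, Y):
    """Exact tSNE gradient, written as attraction + repulsion (O(N^2), for reference only)."""
    diff = Y[:, None, :] - Y[None, :, :]            # (N, N, 2) pairwise differences
    D = np.square(diff).sum(-1)                     # pairwise squared distances
    W = 1.0 / (1.0 + D)                             # Student-t kernel
    np.fill_diagonal(W, 0.0)
    Z = W.sum()
    Q = W / Z                                       # low-dimensional similarities
    attraction = 4.0 * ((P * W)[:, :, None] * diff).sum(axis=1)
    repulsion = -4.0 * ((Q * W)[:, :, None] * diff).sum(axis=1)
    return attraction + repulsion
```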
What Do We Understand About Convolutional Networks?
Title | What Do We Understand About Convolutional Networks? |
Authors | Isma Hadji, Richard P. Wildes |
Abstract | This document will review the most prominent proposals using multilayer convolutional architectures. Importantly, the various components of a typical convolutional network will be discussed through a review of different approaches that base their design decisions on biological findings and/or sound theoretical bases. In addition, the different attempts at understanding ConvNets via visualizations and empirical studies will be reviewed. The ultimate goal is to shed light on the role of each layer of processing involved in a ConvNet architecture, distill what we currently understand about ConvNets and highlight critical open problems. |
Tasks | |
Published | 2018-03-23 |
URL | http://arxiv.org/abs/1803.08834v1 |
http://arxiv.org/pdf/1803.08834v1.pdf | |
PWC | https://paperswithcode.com/paper/what-do-we-understand-about-convolutional |
Repo | https://github.com/joshua-ns-jordan/Inspirations |
Framework | none |
AutoFocus: Efficient Multi-Scale Inference
Title | AutoFocus: Efficient Multi-Scale Inference |
Authors | Mahyar Najibi, Bharat Singh, Larry S. Davis |
Abstract | This paper describes AutoFocus, an efficient multi-scale inference algorithm for deep-learning based object detectors. Instead of processing an entire image pyramid, AutoFocus adopts a coarse to fine approach and only processes regions which are likely to contain small objects at finer scales. This is achieved by predicting category agnostic segmentation maps for small objects at coarser scales, called FocusPixels. FocusPixels can be predicted with high recall, and in many cases, they only cover a small fraction of the entire image. To make efficient use of FocusPixels, an algorithm is proposed which generates compact rectangular FocusChips which enclose FocusPixels. The detector is only applied inside FocusChips, which reduces computation while processing finer scales. Different types of error can arise when detections from FocusChips of multiple scales are combined, hence techniques to correct them are proposed. AutoFocus obtains an mAP of 47.9% (68.3% at 50% overlap) on the COCO test-dev set while processing 6.4 images per second on a Titan X (Pascal) GPU. This is 2.5X faster than our multi-scale baseline detector and matches its mAP. The number of pixels processed in the pyramid can be reduced by 5X with a 1% drop in mAP. AutoFocus obtains more than 10% mAP gain compared to RetinaNet but runs at the same speed with the same ResNet-101 backbone. |
Tasks | |
Published | 2018-12-04 |
URL | https://arxiv.org/abs/1812.01600v2 |
https://arxiv.org/pdf/1812.01600v2.pdf | |
PWC | https://paperswithcode.com/paper/autofocus-efficient-multi-scale-inference |
Repo | https://github.com/MahyarNajibi/SNIPER |
Framework | mxnet |
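A hedged sketch of turning predicted FocusPixels into rectangular FocusChips: threshold the focus map, group connected regions, and take padded bounding boxes. The paper's actual chip-generation algorithm includes further merging and size constraints; the thresholds below are illustrative.

```python
import numpy as np
from scipy import ndimage

def focus_chips(focus_prob, thresh=0.5, dilate=3):
    """Return rectangular chips (x1, y1, x2, y2) enclosing likely small-object regions."""
    mask = focus_prob > thresh
    mask = ndimage.binary_dilation(mask, iterations=dilate)  # pad around focus pixels
    labels, _ = ndimage.label(mask)                           # connected regions
    slices = ndimage.find_objects(labels)                     # (slice_rows, slice_cols) per region
    return [(s[1].start, s[0].start, s[1].stop, s[0].stop) for s in slices]
```

The detector would then be run at the finer scale only inside these chips, which is where the reported speed-up comes from.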
Transfer of Deep Reactive Policies for MDP Planning
Title | Transfer of Deep Reactive Policies for MDP Planning |
Authors | Aniket Bajpai, Sankalp Garg, Mausam |
Abstract | Domain-independent probabilistic planners input an MDP description in a factored representation language such as PPDDL or RDDL, and exploit the specifics of the representation for faster planning. Traditional algorithms operate on each problem instance independently, and good methods for transferring experience from policies of other instances of a domain to a new instance do not exist. Recently, researchers have begun exploring the use of deep reactive policies, trained via deep reinforcement learning (RL), for MDP planning domains. One advantage of deep reactive policies is that they are more amenable to transfer learning. In this paper, we present the first domain-independent transfer algorithm for MDP planning domains expressed in an RDDL representation. Our architecture exploits the symbolic state configuration and transition function of the domain (available via RDDL) to learn a shared embedding space for states and state-action pairs for all problem instances of a domain. We then learn an RL agent in the embedding space, making a near zero-shot transfer possible, i.e., without much training on the new instance, and without using the domain simulator at all. Experiments on three different benchmark domains underscore the value of our transfer algorithm. Compared against planning from scratch, and a state-of-the-art RL transfer algorithm, our transfer solution has significantly superior learning curves. |
Tasks | Transfer Learning |
Published | 2018-10-26 |
URL | http://arxiv.org/abs/1810.11488v1 |
http://arxiv.org/pdf/1810.11488v1.pdf | |
PWC | https://paperswithcode.com/paper/transfer-of-deep-reactive-policies-for-mdp |
Repo | https://github.com/dair-iitd/torpido |
Framework | none |
Translations as Additional Contexts for Sentence Classification
Title | Translations as Additional Contexts for Sentence Classification |
Authors | Reinald Kim Amplayo, Kyungjae Lee, Jinyeong Yeo, Seung-won Hwang |
Abstract | In sentence classification tasks, additional contexts, such as the neighboring sentences, may improve the accuracy of the classifier. However, such contexts are domain-dependent and thus cannot be used for another classification task with an inappropriate domain. In contrast, we propose the use of translated sentences as context that is always available regardless of the domain. We find that naive feature expansion of translations gains only marginal improvements and may decrease the performance of the classifier, due to possible inaccurate translations thus producing noisy sentence vectors. To this end, we present multiple context fixing attachment (MCFA), a series of modules attached to multiple sentence vectors to fix the noise in the vectors using the other sentence vectors as context. We show that our method performs competitively compared to previous models, achieving best classification performance on multiple data sets. We are the first to use translations as domain-free contexts for sentence classification. |
Tasks | Sentence Classification, Subjectivity Analysis, Text Classification |
Published | 2018-06-14 |
URL | http://arxiv.org/abs/1806.05516v1 |
http://arxiv.org/pdf/1806.05516v1.pdf | |
PWC | https://paperswithcode.com/paper/translations-as-additional-contexts-for |
Repo | https://github.com/rktamplayo/MCFA |
Framework | tf |
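A hedged sketch of the context-fixing idea from the abstract above: each sentence vector (the original sentence and its translations) attends to the other vectors and is corrected by a learned gate. The real MCFA modules are more elaborate; shapes and layers here are assumptions.

```python
import torch
import torch.nn as nn

class ContextFix(nn.Module):
    """Correct each of K views of a sentence using the other views as context."""
    def __init__(self, dim):
        super().__init__()
        self.query = nn.Linear(dim, dim)
        self.gate = nn.Linear(2 * dim, dim)

    def forward(self, vecs):                              # vecs: (K, dim)
        scores = self.query(vecs) @ vecs.t()              # (K, K) attention scores
        mask = torch.eye(vecs.size(0), dtype=torch.bool)
        scores = scores.masked_fill(mask, float("-inf"))  # a view does not attend to itself
        ctx = torch.softmax(scores, dim=-1) @ vecs        # context from the other views
        g = torch.sigmoid(self.gate(torch.cat([vecs, ctx], dim=-1)))
        return g * vecs + (1.0 - g) * ctx                 # gated correction of each view
```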
Demystifying Differentiable Programming: Shift/Reset the Penultimate Backpropagator
Title | Demystifying Differentiable Programming: Shift/Reset the Penultimate Backpropagator |
Authors | Fei Wang, Daniel Zheng, James Decker, Xilun Wu, Grégory M. Essertel, Tiark Rompf |
Abstract | Deep learning has seen tremendous success over the past decade in computer vision, machine translation, and gameplay. This success rests in crucial ways on gradient-descent optimization and the ability to learn parameters of a neural network by backpropagating observed errors. However, neural network architectures are growing increasingly sophisticated and diverse, which motivates an emerging quest for even more general forms of differentiable programming, where arbitrary parameterized computations can be trained by gradient descent. In this paper, we take a fresh look at automatic differentiation (AD) techniques, and especially aim to demystify the reverse-mode form of AD that generalizes backpropagation in neural networks. We uncover a tight connection between reverse-mode AD and delimited continuations, which permits implementing reverse-mode AD purely via operator overloading and without any auxiliary data structures. We further show how this formulation of AD can be fruitfully combined with multi-stage programming (staging), leading to a highly efficient implementation that combines the performance benefits of deep learning frameworks based on explicit reified computation graphs (e.g., TensorFlow) with the expressiveness of pure library approaches (e.g., PyTorch). |
Tasks | Machine Translation |
Published | 2018-03-27 |
URL | https://arxiv.org/abs/1803.10228v3 |
https://arxiv.org/pdf/1803.10228v3.pdf | |
PWC | https://paperswithcode.com/paper/demystifying-differentiable-programming |
Repo | https://github.com/sunze1/Differential-Programming |
Framework | tf |
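The core claim of the paper above, that reverse-mode AD can be implemented purely via operator overloading, can be demonstrated with a tiny example: each overloaded operation records a closure that propagates adjoints to its parents. This sketch is in Python with an explicit topological sort, rather than the paper's delimited-continuation formulation in Scala.

```python
class Dual:
    """A value that carries a gradient slot and a backward closure."""
    def __init__(self, value):
        self.value = value
        self.grad = 0.0
        self._backward = lambda: None

    def __add__(self, other):
        out = Dual(self.value + other.value)
        def backward():
            self.grad += out.grad
            other.grad += out.grad
        out._backward, out._parents = backward, (self, other)
        return out

    def __mul__(self, other):
        out = Dual(self.value * other.value)
        def backward():
            self.grad += other.value * out.grad
            other.grad += self.value * out.grad
        out._backward, out._parents = backward, (self, other)
        return out

def grad(output):
    """Run the backward closures in reverse topological order."""
    order, seen = [], set()
    def visit(node):
        if id(node) not in seen:
            seen.add(id(node))
            for p in getattr(node, "_parents", ()):
                visit(p)
            order.append(node)
    visit(output)
    output.grad = 1.0
    for node in reversed(order):
        node._backward()

# Example: d/dx (x*x + x) at x = 3 is 2*3 + 1 = 7
x = Dual(3.0)
y = x * x + x
grad(y)
print(x.grad)  # 7.0
```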
Speech Intention Understanding in a Head-final Language: A Disambiguation Utilizing Intonation-dependency
Title | Speech Intention Understanding in a Head-final Language: A Disambiguation Utilizing Intonation-dependency |
Authors | Won Ik Cho, Hyeon Seung Lee, Ji Won Yoon, Seok Min Kim, Nam Soo Kim |
Abstract | For a large portion of real-life utterances, the intention cannot be solely decided by either their semantic or syntactic characteristics. Although not all the sociolinguistic and pragmatic information can be digitized, at least phonetic features are indispensable in understanding the spoken language. Especially in head-final languages such as Korean, sentence-final prosody has great importance in identifying the speaker’s intention. This paper suggests a system which identifies the inherent intention of a spoken utterance given its transcript, in some cases using auxiliary acoustic features. The main point here is a separate distinction for cases where discrimination of intention requires an acoustic cue. Thus, the proposed classification system decides whether the given utterance is a fragment, statement, question, command, or a rhetorical question/command, utilizing the intonation-dependency coming from the head-finality. Based on an intuitive understanding of the Korean language that is engaged in the data annotation, we construct a network which identifies the intention of a speech, and validate its utility with the test sentences. The system, if combined with up-to-date speech recognizers, is expected to be flexibly inserted into various language understanding modules. |
Tasks | |
Published | 2018-11-10 |
URL | https://arxiv.org/abs/1811.04231v2 |
https://arxiv.org/pdf/1811.04231v2.pdf | |
PWC | https://paperswithcode.com/paper/speech-intention-understanding-in-a-head |
Repo | https://github.com/warnikchow/3i4k |
Framework | tf |
Exploring the Limits of Weakly Supervised Pretraining
Title | Exploring the Limits of Weakly Supervised Pretraining |
Authors | Dhruv Mahajan, Ross Girshick, Vignesh Ramanathan, Kaiming He, Manohar Paluri, Yixuan Li, Ashwin Bharambe, Laurens van der Maaten |
Abstract | State-of-the-art visual perception models for a wide range of tasks rely on supervised pretraining. ImageNet classification is the de facto pretraining task for these models. Yet, ImageNet is now nearly ten years old and is by modern standards “small”. Even so, relatively little is known about the behavior of pretraining with datasets that are multiple orders of magnitude larger. The reasons are obvious: such datasets are difficult to collect and annotate. In this paper, we present a unique study of transfer learning with large convolutional networks trained to predict hashtags on billions of social media images. Our experiments demonstrate that training for large-scale hashtag prediction leads to excellent results. We show improvements on several image classification and object detection tasks, and report the highest ImageNet-1k single-crop, top-1 accuracy to date: 85.4% (97.6% top-5). We also perform extensive experiments that provide novel empirical data on the relationship between large-scale pretraining and transfer learning performance. |
Tasks | Image Classification, Object Detection, Transfer Learning |
Published | 2018-05-02 |
URL | http://arxiv.org/abs/1805.00932v1 |
http://arxiv.org/pdf/1805.00932v1.pdf | |
PWC | https://paperswithcode.com/paper/exploring-the-limits-of-weakly-supervised |
Repo | https://github.com/eminorhan/ood-benchmarks |
Framework | pytorch |
Free-Form Image Inpainting with Gated Convolution
Title | Free-Form Image Inpainting with Gated Convolution |
Authors | Jiahui Yu, Zhe Lin, Jimei Yang, Xiaohui Shen, Xin Lu, Thomas Huang |
Abstract | We present a generative image inpainting system to complete images with free-form mask and guidance. The system is based on gated convolutions learned from millions of images without additional labelling efforts. The proposed gated convolution solves the issue of vanilla convolution that treats all input pixels as valid ones, and generalizes partial convolution by providing a learnable dynamic feature selection mechanism for each channel at each spatial location across all layers. Moreover, as free-form masks may appear anywhere in images with any shape, global and local GANs designed for a single rectangular mask are not applicable. Thus, we also present a patch-based GAN loss, named SN-PatchGAN, by applying a spectral-normalized discriminator on dense image patches. SN-PatchGAN is simple in formulation, fast and stable in training. Results on automatic image inpainting and user-guided extension demonstrate that our system generates higher-quality and more flexible results than previous methods. Our system helps users quickly remove distracting objects, modify image layouts, clear watermarks and edit faces. Code, demo and models are available at: https://github.com/JiahuiYu/generative_inpainting |
Tasks | Feature Selection, Image Inpainting |
Published | 2018-06-10 |
URL | https://arxiv.org/abs/1806.03589v2 |
https://arxiv.org/pdf/1806.03589v2.pdf | |
PWC | https://paperswithcode.com/paper/free-form-image-inpainting-with-gated |
Repo | https://github.com/ShnitzelKiller/generative_inpainting |
Framework | tf |
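The gated convolution itself is simple to state: every layer computes both a feature response and a soft per-pixel, per-channel gate, and multiplies the two. A minimal PyTorch-style sketch; the activation and layer sizes are illustrative choices, not the released model's exact configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GatedConv2d(nn.Module):
    def __init__(self, in_ch, out_ch, kernel_size=3, stride=1, padding=1):
        super().__init__()
        self.feature = nn.Conv2d(in_ch, out_ch, kernel_size, stride, padding)
        self.gating = nn.Conv2d(in_ch, out_ch, kernel_size, stride, padding)

    def forward(self, x):
        # The sigmoid gate decides, per channel and per spatial location,
        # how much of the feature response passes through (learned feature selection).
        return F.elu(self.feature(x)) * torch.sigmoid(self.gating(x))
```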