Paper Group AWR 59
Attention, Learn to Solve Routing Problems!. MAN: Moment Alignment Network for Natural Language Moment Retrieval via Iterative Graph Adjustment. Graph-based Filtering of Out-of-Vocabulary Words for Encoder-Decoder Models. SubGram: Extending Skip-gram Word Representation with Substrings. Inverse Problems in Asteroseismology. Scalable Factorized Hierarchical Variational Autoencoder Training. GPGPU Linear Complexity t-SNE Optimization. What Do We Understand About Convolutional Networks?. AutoFocus: Efficient Multi-Scale Inference. Transfer of Deep Reactive Policies for MDP Planning. Translations as Additional Contexts for Sentence Classification. Demystifying Differentiable Programming: Shift/Reset the Penultimate Backpropagator. Speech Intention Understanding in a Head-final Language: A Disambiguation Utilizing Intonation-dependency. Exploring the Limits of Weakly Supervised Pretraining. Free-Form Image Inpainting with Gated Convolution.
Attention, Learn to Solve Routing Problems!
Title | Attention, Learn to Solve Routing Problems! |
Authors | Wouter Kool, Herke van Hoof, Max Welling |
Abstract | The recently presented idea to learn heuristics for combinatorial optimization problems is promising as it can save costly development. However, to push this idea towards practical implementation, we need better models and better ways of training. We contribute in both directions: we propose a model based on attention layers with benefits over the Pointer Network and we show how to train this model using REINFORCE with a simple baseline based on a deterministic greedy rollout, which we find is more efficient than using a value function. We significantly improve over recent learned heuristics for the Travelling Salesman Problem (TSP), getting close to optimal results for problems up to 100 nodes. With the same hyperparameters, we learn strong heuristics for two variants of the Vehicle Routing Problem (VRP), the Orienteering Problem (OP) and (a stochastic variant of) the Prize Collecting TSP (PCTSP), outperforming a wide range of baselines and getting results close to highly optimized and specialized algorithms. |
Tasks | Combinatorial Optimization |
Published | 2018-03-22 |
URL | http://arxiv.org/abs/1803.08475v3 |
http://arxiv.org/pdf/1803.08475v3.pdf | |
PWC | https://paperswithcode.com/paper/attention-learn-to-solve-routing-problems |
Repo | https://github.com/raphaelavalos/attention_tsp_graph_net |
Framework | tf |
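A minimal sketch of the training signal described in the abstract above: REINFORCE with a deterministic greedy-rollout baseline instead of a learned value function. The `policy(batch, decode=...)` interface is an assumption for illustration, not the API of the linked repository.

```python
import torch

def reinforce_greedy_rollout_loss(policy, baseline_policy, batch):
    # Sample tours with the current policy and score them against a greedy
    # rollout of a frozen baseline copy of the policy (no critic needed).
    cost, log_likelihood = policy(batch, decode="sample")            # assumed interface
    with torch.no_grad():
        baseline_cost, _ = baseline_policy(batch, decode="greedy")   # deterministic rollout
    advantage = cost - baseline_cost             # > 0 means worse than the greedy baseline
    return (advantage * log_likelihood).mean()   # minimizing this pushes good tours up
```

The baseline policy is periodically replaced by the current policy when the latter is significantly better, so the advantage stays informative as training progresses.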
MAN: Moment Alignment Network for Natural Language Moment Retrieval via Iterative Graph Adjustment
Title | MAN: Moment Alignment Network for Natural Language Moment Retrieval via Iterative Graph Adjustment |
Authors | Da Zhang, Xiyang Dai, Xin Wang, Yuan-Fang Wang, Larry S. Davis |
Abstract | This research strives for natural language moment retrieval in long, untrimmed video streams. The problem is not trivial, especially when a video contains multiple moments of interest and the language describes complex temporal dependencies, which often happens in real scenarios. We identify two crucial challenges: semantic misalignment and structural misalignment. However, existing approaches treat different moments separately and do not explicitly model complex moment-wise temporal relations. In this paper, we present Moment Alignment Network (MAN), a novel framework that unifies the candidate moment encoding and temporal structural reasoning in a single-shot feed-forward network. MAN naturally assigns candidate moment representations aligned with language semantics over different temporal locations and scales. Most importantly, we propose to explicitly model moment-wise temporal relations as a structured graph and devise an iterative graph adjustment network to jointly learn the best structure in an end-to-end manner. We evaluate the proposed approach on two challenging public benchmarks, DiDeMo and Charades-STA, where our MAN significantly outperforms the state-of-the-art by a large margin. |
Tasks | Natural Language Moment Retrieval |
Published | 2018-11-30 |
URL | https://arxiv.org/abs/1812.00087v2 |
https://arxiv.org/pdf/1812.00087v2.pdf | |
PWC | https://paperswithcode.com/paper/man-moment-alignment-network-for-natural |
Repo | https://github.com/dazhang-cv/Project |
Framework | none |
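The "iterative graph adjustment" of the abstract above can be pictured as repeatedly re-estimating a relation graph over candidate-moment features and propagating messages over it. The sketch below is only an illustration under assumed shapes and parameterizations; the actual MAN architecture differs in its details.

```python
import torch
import torch.nn as nn

class IterativeGraphAdjustment(nn.Module):
    """Hedged sketch: refine relations between N candidate-moment features over T steps."""
    def __init__(self, dim, steps=3):
        super().__init__()
        self.steps = steps
        self.edge_proj = nn.Linear(dim, dim)   # scores pairwise moment relations
        self.node_proj = nn.Linear(dim, dim)   # updates moment features

    def forward(self, x):                      # x: (N, dim) candidate-moment features
        for _ in range(self.steps):
            scores = self.edge_proj(x) @ x.t()         # (N, N) relation scores
            adj = torch.softmax(scores, dim=-1)        # re-estimated graph structure
            x = torch.relu(self.node_proj(adj @ x)) + x  # message passing + residual
        return x
```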
Graph-based Filtering of Out-of-Vocabulary Words for Encoder-Decoder Models
Title | Graph-based Filtering of Out-of-Vocabulary Words for Encoder-Decoder Models |
Authors | Satoru Katsumata, Yukio Matsumura, Hayahide Yamagishi, Mamoru Komachi |
Abstract | Encoder-decoder models typically only employ words that are frequently used in the training corpus to reduce the computational costs and exclude noise. However, this vocabulary set may still include words that interfere with learning in encoder-decoder models. This paper proposes a method for selecting more suitable words for learning encoders by utilizing not only frequency, but also co-occurrence information, which we capture using the HITS algorithm. We apply our proposed method to two tasks: machine translation and grammatical error correction. For Japanese-to-English translation, this method achieves a BLEU score that is 0.56 points more than that of a baseline. It also outperforms the baseline method for English grammatical error correction, with an F0.5-measure that is 1.48 points higher. |
Tasks | Grammatical Error Correction, Machine Translation |
Published | 2018-05-28 |
URL | http://arxiv.org/abs/1805.11189v1 |
http://arxiv.org/pdf/1805.11189v1.pdf | |
PWC | https://paperswithcode.com/paper/graph-based-filtering-of-out-of-vocabulary |
Repo | https://github.com/Katsumata420/HITS_Ranking |
Framework | none |
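The HITS step mentioned in the abstract above is a standard power iteration; a compact version over a word co-occurrence matrix is sketched below. How exactly the graph is built from the corpus (window size, directionality) is an assumption here, not the paper's exact construction.

```python
import numpy as np

def hits_scores(cooc, iters=50):
    """Hub and authority scores via HITS power iteration.
    cooc: (V, V) matrix where cooc[i, j] counts word j occurring near word i."""
    n = cooc.shape[0]
    hubs = np.ones(n)
    auths = np.ones(n)
    for _ in range(iters):
        auths = cooc.T @ hubs            # authorities are pointed to by good hubs
        auths /= np.linalg.norm(auths)
        hubs = cooc @ auths              # hubs point to good authorities
        hubs /= np.linalg.norm(hubs)
    return hubs, auths
```

Words with high scores under this co-occurrence-based ranking would then be kept in the encoder vocabulary in addition to (or instead of) the most frequent ones.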
SubGram: Extending Skip-gram Word Representation with Substrings
Title | SubGram: Extending Skip-gram Word Representation with Substrings |
Authors | Tom Kocmi, Ondřej Bojar |
Abstract | Skip-gram (word2vec) is a recent method for creating vector representations of words (“distributed word representations”) using a neural network. The representation gained popularity in various areas of natural language processing, because it seems to capture syntactic and semantic information about words without any explicit supervision in this respect. We propose SubGram, a refinement of the Skip-gram model to consider also the word structure during the training process, achieving large gains on the Skip-gram original test set. |
Tasks | |
Published | 2018-06-18 |
URL | http://arxiv.org/abs/1806.06571v1 |
http://arxiv.org/pdf/1806.06571v1.pdf | |
PWC | https://paperswithcode.com/paper/subgram-extending-skip-gram-word |
Repo | https://github.com/tomkocmi/SubGram |
Framework | none |
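A hedged illustration of "extending Skip-gram with substrings": enumerate character n-grams of a word (with boundary markers) and represent the word as the sum of their embeddings. SubGram's exact substring scheme and combination may differ; this is only the general idea.

```python
def substrings(word, min_n=3, max_n=6):
    """All character n-grams of `word` with boundary markers (fastText-style assumption)."""
    w = f"<{word}>"
    return [w[i:i + n]
            for n in range(min_n, max_n + 1)
            for i in range(len(w) - n + 1)]

# A word vector could then be the sum of the embeddings of its substrings:
#   v(word) = sum(E[s] for s in substrings(word))
```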
Inverse Problems in Asteroseismology
Title | Inverse Problems in Asteroseismology |
Authors | Earl Patrick Bellinger |
Abstract | Asteroseismology allows us to probe the internal structure of stars through their global modes of oscillation. Thanks to missions such as the NASA Kepler space observatory, we now have high-quality asteroseismic data for nearly 100 solar-type stars. In this thesis, new techniques to measure the ages, masses, and radii of stars are presented, as well as a way to infer their internal structure. |
Tasks | |
Published | 2018-08-20 |
URL | http://arxiv.org/abs/1808.06649v1 |
http://arxiv.org/pdf/1808.06649v1.pdf | |
PWC | https://paperswithcode.com/paper/inverse-problems-in-asteroseismology |
Repo | https://github.com/earlbellinger/thesis |
Framework | none |
Scalable Factorized Hierarchical Variational Autoencoder Training
Title | Scalable Factorized Hierarchical Variational Autoencoder Training |
Authors | Wei-Ning Hsu, James Glass |
Abstract | Deep generative models have achieved great success in unsupervised learning with the ability to capture complex nonlinear relationships between latent generating factors and observations. Among them, a factorized hierarchical variational autoencoder (FHVAE) is a variational inference-based model that formulates a hierarchical generative process for sequential data. Specifically, an FHVAE model can learn disentangled and interpretable representations, which have been proven useful for numerous speech applications, such as speaker verification, robust speech recognition, and voice conversion. However, as we will elaborate in this paper, the training algorithm proposed in the original paper is not scalable to datasets of thousands of hours, which makes this model less applicable on a larger scale. After identifying limitations in terms of runtime, memory, and hyperparameter optimization, we propose a hierarchical sampling training algorithm to address all three issues. Our proposed method is evaluated comprehensively on a wide variety of datasets, ranging from 3 to 1,000 hours and involving different types of generating factors, such as recording conditions and noise types. In addition, we also present a new visualization method for qualitatively evaluating the performance with respect to the interpretability and disentanglement. Models trained with our proposed algorithm demonstrate the desired characteristics on all the datasets. |
Tasks | Hyperparameter Optimization, Robust Speech Recognition, Speaker Verification, Speech Recognition, Voice Conversion |
Published | 2018-04-09 |
URL | http://arxiv.org/abs/1804.03201v2 |
http://arxiv.org/pdf/1804.03201v2.pdf | |
PWC | https://paperswithcode.com/paper/scalable-factorized-hierarchical-variational |
Repo | https://github.com/wnhsu/ScalableFHVAE |
Framework | tf |
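The hierarchical sampling idea from the abstract above can be pictured as a two-level loop: repeatedly draw a small subset of sequences, then train on batches drawn only from that subset, so sequence-level statistics never need to be held for the whole corpus at once. Names and arguments below are illustrative, not the repository's API.

```python
import random

def hierarchical_batches(sequences, subset_size, batches_per_subset, make_batches):
    # Outer loop: sample a manageable subset of sequences (sequence-level sampling).
    # Inner loop: draw training batches only from that subset (segment-level sampling).
    while True:
        subset = random.sample(sequences, subset_size)
        for batch in make_batches(subset, batches_per_subset):
            yield batch
```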
GPGPU Linear Complexity t-SNE Optimization
Title | GPGPU Linear Complexity t-SNE Optimization |
Authors | Nicola Pezzotti, Julian Thijssen, Alexander Mordvintsev, Thomas Hollt, Baldur van Lew, Boudewijn P. F. Lelieveldt, Elmar Eisemann, Anna Vilanova |
Abstract | The t-distributed Stochastic Neighbor Embedding (tSNE) algorithm has become in recent years one of the most used and insightful techniques for the exploratory data analysis of high-dimensional data. tSNE reveals clusters of high-dimensional data points at different scales while it requires only minimal tuning of its parameters. Despite these advantages, the computational complexity of the algorithm limits its application to relatively small datasets. To address this problem, several evolutions of tSNE have been developed in recent years, mainly focusing on the scalability of the similarity computations between data points. However, these contributions are insufficient to achieve interactive rates when visualizing the evolution of the tSNE embedding for large datasets. In this work, we present a novel approach to the minimization of the tSNE objective function that heavily relies on modern graphics hardware and has linear computational complexity. Our technique not only beats the state of the art, but can even be executed on the client side in a browser. We propose to approximate the repulsion forces between data points using adaptive-resolution textures that are drawn at every iteration with WebGL. This approximation allows us to reformulate the tSNE minimization problem as a series of tensor operations that are computed with TensorFlow.js, a JavaScript library for scalable tensor computations. |
Tasks | |
Published | 2018-05-28 |
URL | https://arxiv.org/abs/1805.10817v2 |
https://arxiv.org/pdf/1805.10817v2.pdf | |
PWC | https://paperswithcode.com/paper/linear-tsne-optimization-for-the-web |
Repo | https://github.com/tensorflow/tfjs-tsne |
Framework | tf |
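For reference, the exact (quadratic-time) tSNE gradient splits into an attractive and a repulsive term; the repulsive term is the part the paper approximates with adaptive-resolution textures on the GPU. A small numpy version of the exact gradient, for a 2-D embedding `Y` and joint probabilities `P`:

```python
import numpy as np

def tsne_gradient(P, Y):
    """Exact tSNE gradient, written as attraction + repulsion (O(N^2), for reference only)."""
    diff = Y[:, None, :] - Y[None, :, :]            # (N, N, 2) pairwise differences
    D = np.square(diff).sum(-1)                     # pairwise squared distances
    W = 1.0 / (1.0 + D)                             # Student-t kernel
    np.fill_diagonal(W, 0.0)
    Z = W.sum()
    Q = W / Z                                       # low-dimensional similarities
    attraction = 4.0 * ((P * W)[:, :, None] * diff).sum(axis=1)
    repulsion = -4.0 * ((Q * W)[:, :, None] * diff).sum(axis=1)
    return attraction + repulsion
```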
What Do We Understand About Convolutional Networks?
Title | What Do We Understand About Convolutional Networks? |
Authors | Isma Hadji, Richard P. Wildes |
Abstract | This document will review the most prominent proposals using multilayer convolutional architectures. Importantly, the various components of a typical convolutional network will be discussed through a review of different approaches that base their design decisions on biological findings and/or sound theoretical bases. In addition, the different attempts at understanding ConvNets via visualizations and empirical studies will be reviewed. The ultimate goal is to shed light on the role of each layer of processing involved in a ConvNet architecture, distill what we currently understand about ConvNets and highlight critical open problems. |
Tasks | |
Published | 2018-03-23 |
URL | http://arxiv.org/abs/1803.08834v1 |
http://arxiv.org/pdf/1803.08834v1.pdf | |
PWC | https://paperswithcode.com/paper/what-do-we-understand-about-convolutional |
Repo | https://github.com/joshua-ns-jordan/Inspirations |
Framework | none |
AutoFocus: Efficient Multi-Scale Inference
Title | AutoFocus: Efficient Multi-Scale Inference |
Authors | Mahyar Najibi, Bharat Singh, Larry S. Davis |
Abstract | This paper describes AutoFocus, an efficient multi-scale inference algorithm for deep-learning based object detectors. Instead of processing an entire image pyramid, AutoFocus adopts a coarse to fine approach and only processes regions which are likely to contain small objects at finer scales. This is achieved by predicting category agnostic segmentation maps for small objects at coarser scales, called FocusPixels. FocusPixels can be predicted with high recall, and in many cases, they only cover a small fraction of the entire image. To make efficient use of FocusPixels, an algorithm is proposed which generates compact rectangular FocusChips which enclose FocusPixels. The detector is only applied inside FocusChips, which reduces computation while processing finer scales. Different types of error can arise when detections from FocusChips of multiple scales are combined, hence techniques to correct them are proposed. AutoFocus obtains an mAP of 47.9% (68.3% at 50% overlap) on the COCO test-dev set while processing 6.4 images per second on a Titan X (Pascal) GPU. This is 2.5X faster than our multi-scale baseline detector and matches its mAP. The number of pixels processed in the pyramid can be reduced by 5X with a 1% drop in mAP. AutoFocus obtains more than 10% mAP gain compared to RetinaNet but runs at the same speed with the same ResNet-101 backbone. |
Tasks | |
Published | 2018-12-04 |
URL | https://arxiv.org/abs/1812.01600v2 |
https://arxiv.org/pdf/1812.01600v2.pdf | |
PWC | https://paperswithcode.com/paper/autofocus-efficient-multi-scale-inference |
Repo | https://github.com/MahyarNajibi/SNIPER |
Framework | mxnet |
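A hedged sketch of turning predicted FocusPixels into rectangular FocusChips: threshold the focus map, group connected regions, and take padded bounding boxes. The paper's actual chip-generation algorithm includes further merging and size constraints; the thresholds below are illustrative.

```python
import numpy as np
from scipy import ndimage

def focus_chips(focus_prob, thresh=0.5, dilate=3):
    """Return rectangular chips (x1, y1, x2, y2) enclosing likely small-object regions."""
    mask = focus_prob > thresh
    mask = ndimage.binary_dilation(mask, iterations=dilate)  # pad around focus pixels
    labels, _ = ndimage.label(mask)                           # connected regions
    slices = ndimage.find_objects(labels)                     # (slice_rows, slice_cols) per region
    return [(s[1].start, s[0].start, s[1].stop, s[0].stop) for s in slices]
```

The detector would then be run at the finer scale only inside these chips, which is where the reported speed-up comes from.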
Transfer of Deep Reactive Policies for MDP Planning
Title | Transfer of Deep Reactive Policies for MDP Planning |
Authors | Aniket Bajpai, Sankalp Garg, Mausam |
Abstract | Domain-independent probabilistic planners input an MDP description in a factored representation language such as PPDDL or RDDL, and exploit the specifics of the representation for faster planning. Traditional algorithms operate on each problem instance independently, and good methods for transferring experience from policies of other instances of a domain to a new instance do not exist. Recently, researchers have begun exploring the use of deep reactive policies, trained via deep reinforcement learning (RL), for MDP planning domains. One advantage of deep reactive policies is that they are more amenable to transfer learning. In this paper, we present the first domain-independent transfer algorithm for MDP planning domains expressed in an RDDL representation. Our architecture exploits the symbolic state configuration and transition function of the domain (available via RDDL) to learn a shared embedding space for states and state-action pairs for all problem instances of a domain. We then learn an RL agent in the embedding space, making a near zero-shot transfer possible, i.e., without much training on the new instance, and without using the domain simulator at all. Experiments on three different benchmark domains underscore the value of our transfer algorithm. Compared against planning from scratch, and a state-of-the-art RL transfer algorithm, our transfer solution has significantly superior learning curves. |
Tasks | Transfer Learning |
Published | 2018-10-26 |
URL | http://arxiv.org/abs/1810.11488v1 |
http://arxiv.org/pdf/1810.11488v1.pdf | |
PWC | https://paperswithcode.com/paper/transfer-of-deep-reactive-policies-for-mdp |
Repo | https://github.com/dair-iitd/torpido |
Framework | none |
Translations as Additional Contexts for Sentence Classification
Title | Translations as Additional Contexts for Sentence Classification |
Authors | Reinald Kim Amplayo, Kyungjae Lee, Jinyeong Yeo, Seung-won Hwang |
Abstract | In sentence classification tasks, additional contexts, such as the neighboring sentences, may improve the accuracy of the classifier. However, such contexts are domain-dependent and thus cannot be used for another classification task with an inappropriate domain. In contrast, we propose the use of translated sentences as context that is always available regardless of the domain. We find that naive feature expansion of translations gains only marginal improvements and may decrease the performance of the classifier, due to possible inaccurate translations thus producing noisy sentence vectors. To this end, we present multiple context fixing attachment (MCFA), a series of modules attached to multiple sentence vectors to fix the noise in the vectors using the other sentence vectors as context. We show that our method performs competitively compared to previous models, achieving best classification performance on multiple data sets. We are the first to use translations as domain-free contexts for sentence classification. |
Tasks | Sentence Classification, Subjectivity Analysis, Text Classification |
Published | 2018-06-14 |
URL | http://arxiv.org/abs/1806.05516v1 |
http://arxiv.org/pdf/1806.05516v1.pdf | |
PWC | https://paperswithcode.com/paper/translations-as-additional-contexts-for |
Repo | https://github.com/rktamplayo/MCFA |
Framework | tf |
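A hedged sketch of the context-fixing idea from the abstract above: each sentence vector (the original sentence and its translations) attends to the other vectors and is corrected by a learned gate. The real MCFA modules are more elaborate; shapes and layers here are assumptions.

```python
import torch
import torch.nn as nn

class ContextFix(nn.Module):
    """Correct each of K views of a sentence using the other views as context."""
    def __init__(self, dim):
        super().__init__()
        self.query = nn.Linear(dim, dim)
        self.gate = nn.Linear(2 * dim, dim)

    def forward(self, vecs):                              # vecs: (K, dim)
        scores = self.query(vecs) @ vecs.t()              # (K, K) attention scores
        mask = torch.eye(vecs.size(0), dtype=torch.bool)
        scores = scores.masked_fill(mask, float("-inf"))  # a view does not attend to itself
        ctx = torch.softmax(scores, dim=-1) @ vecs        # context from the other views
        g = torch.sigmoid(self.gate(torch.cat([vecs, ctx], dim=-1)))
        return g * vecs + (1.0 - g) * ctx                 # gated correction of each view
```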
Demystifying Differentiable Programming: Shift/Reset the Penultimate Backpropagator
Title | Demystifying Differentiable Programming: Shift/Reset the Penultimate Backpropagator |
Authors | Fei Wang, Daniel Zheng, James Decker, Xilun Wu, Grégory M. Essertel, Tiark Rompf |
Abstract | Deep learning has seen tremendous success over the past decade in computer vision, machine translation, and gameplay. This success rests in crucial ways on gradient-descent optimization and the ability to learn parameters of a neural network by backpropagating observed errors. However, neural network architectures are growing increasingly sophisticated and diverse, which motivates an emerging quest for even more general forms of differentiable programming, where arbitrary parameterized computations can be trained by gradient descent. In this paper, we take a fresh look at automatic differentiation (AD) techniques, and especially aim to demystify the reverse-mode form of AD that generalizes backpropagation in neural networks. We uncover a tight connection between reverse-mode AD and delimited continuations, which permits implementing reverse-mode AD purely via operator overloading and without any auxiliary data structures. We further show how this formulation of AD can be fruitfully combined with multi-stage programming (staging), leading to a highly efficient implementation that combines the performance benefits of deep learning frameworks based on explicit reified computation graphs (e.g., TensorFlow) with the expressiveness of pure library approaches (e.g., PyTorch). |
Tasks | Machine Translation |
Published | 2018-03-27 |
URL | https://arxiv.org/abs/1803.10228v3 |
https://arxiv.org/pdf/1803.10228v3.pdf | |
PWC | https://paperswithcode.com/paper/demystifying-differentiable-programming |
Repo | https://github.com/sunze1/Differential-Programming |
Framework | tf |
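The core claim of the paper above, that reverse-mode AD can be implemented purely via operator overloading, can be demonstrated with a tiny example: each overloaded operation records a closure that propagates adjoints to its parents. This sketch is in Python with an explicit topological sort, rather than the paper's delimited-continuation formulation in Scala.

```python
class Dual:
    """A value that carries a gradient slot and a backward closure."""
    def __init__(self, value):
        self.value = value
        self.grad = 0.0
        self._backward = lambda: None

    def __add__(self, other):
        out = Dual(self.value + other.value)
        def backward():
            self.grad += out.grad
            other.grad += out.grad
        out._backward, out._parents = backward, (self, other)
        return out

    def __mul__(self, other):
        out = Dual(self.value * other.value)
        def backward():
            self.grad += other.value * out.grad
            other.grad += self.value * out.grad
        out._backward, out._parents = backward, (self, other)
        return out

def grad(output):
    """Run the backward closures in reverse topological order."""
    order, seen = [], set()
    def visit(node):
        if id(node) not in seen:
            seen.add(id(node))
            for p in getattr(node, "_parents", ()):
                visit(p)
            order.append(node)
    visit(output)
    output.grad = 1.0
    for node in reversed(order):
        node._backward()

# Example: d/dx (x*x + x) at x = 3 is 2*3 + 1 = 7
x = Dual(3.0)
y = x * x + x
grad(y)
print(x.grad)  # 7.0
```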
Speech Intention Understanding in a Head-final Language: A Disambiguation Utilizing Intonation-dependency
Title | Speech Intention Understanding in a Head-final Language: A Disambiguation Utilizing Intonation-dependency |
Authors | Won Ik Cho, Hyeon Seung Lee, Ji Won Yoon, Seok Min Kim, Nam Soo Kim |
Abstract | For a large portion of real-life utterances, the intention cannot be solely decided by either their semantic or syntactic characteristics. Although not all the sociolinguistic and pragmatic information can be digitized, at least phonetic features are indispensable in understanding the spoken language. Especially in head-final languages such as Korean, sentence-final prosody has great importance in identifying the speaker’s intention. This paper suggests a system which identifies the inherent intention of a spoken utterance given its transcript, in some cases using auxiliary acoustic features. The main point here is a separate distinction for cases where discrimination of intention requires an acoustic cue. Thus, the proposed classification system decides whether the given utterance is a fragment, statement, question, command, or a rhetorical question/command, utilizing the intonation-dependency coming from the head-finality. Based on an intuitive understanding of the Korean language that is engaged in the data annotation, we construct a network which identifies the intention of a speech, and validate its utility with the test sentences. The system, if combined with up-to-date speech recognizers, is expected to be flexibly inserted into various language understanding modules. |
Tasks | |
Published | 2018-11-10 |
URL | https://arxiv.org/abs/1811.04231v2 |
https://arxiv.org/pdf/1811.04231v2.pdf | |
PWC | https://paperswithcode.com/paper/speech-intention-understanding-in-a-head |
Repo | https://github.com/warnikchow/3i4k |
Framework | tf |
Exploring the Limits of Weakly Supervised Pretraining
Title | Exploring the Limits of Weakly Supervised Pretraining |
Authors | Dhruv Mahajan, Ross Girshick, Vignesh Ramanathan, Kaiming He, Manohar Paluri, Yixuan Li, Ashwin Bharambe, Laurens van der Maaten |
Abstract | State-of-the-art visual perception models for a wide range of tasks rely on supervised pretraining. ImageNet classification is the de facto pretraining task for these models. Yet, ImageNet is now nearly ten years old and is by modern standards “small”. Even so, relatively little is known about the behavior of pretraining with datasets that are multiple orders of magnitude larger. The reasons are obvious: such datasets are difficult to collect and annotate. In this paper, we present a unique study of transfer learning with large convolutional networks trained to predict hashtags on billions of social media images. Our experiments demonstrate that training for large-scale hashtag prediction leads to excellent results. We show improvements on several image classification and object detection tasks, and report the highest ImageNet-1k single-crop, top-1 accuracy to date: 85.4% (97.6% top-5). We also perform extensive experiments that provide novel empirical data on the relationship between large-scale pretraining and transfer learning performance. |
Tasks | Image Classification, Object Detection, Transfer Learning |
Published | 2018-05-02 |
URL | http://arxiv.org/abs/1805.00932v1 |
http://arxiv.org/pdf/1805.00932v1.pdf | |
PWC | https://paperswithcode.com/paper/exploring-the-limits-of-weakly-supervised |
Repo | https://github.com/eminorhan/ood-benchmarks |
Framework | pytorch |
Free-Form Image Inpainting with Gated Convolution
Title | Free-Form Image Inpainting with Gated Convolution |
Authors | Jiahui Yu, Zhe Lin, Jimei Yang, Xiaohui Shen, Xin Lu, Thomas Huang |
Abstract | We present a generative image inpainting system to complete images with free-form mask and guidance. The system is based on gated convolutions learned from millions of images without additional labelling efforts. The proposed gated convolution solves the issue of vanilla convolution that treats all input pixels as valid ones, and generalizes partial convolution by providing a learnable dynamic feature selection mechanism for each channel at each spatial location across all layers. Moreover, as free-form masks may appear anywhere in images with any shape, global and local GANs designed for a single rectangular mask are not applicable. Thus, we also present a patch-based GAN loss, named SN-PatchGAN, by applying a spectral-normalized discriminator on dense image patches. SN-PatchGAN is simple in formulation, fast and stable in training. Results on automatic image inpainting and user-guided extension demonstrate that our system generates higher-quality and more flexible results than previous methods. Our system helps users quickly remove distracting objects, modify image layouts, clear watermarks and edit faces. Code, demo and models are available at: https://github.com/JiahuiYu/generative_inpainting |
Tasks | Feature Selection, Image Inpainting |
Published | 2018-06-10 |
URL | https://arxiv.org/abs/1806.03589v2 |
https://arxiv.org/pdf/1806.03589v2.pdf | |
PWC | https://paperswithcode.com/paper/free-form-image-inpainting-with-gated |
Repo | https://github.com/ShnitzelKiller/generative_inpainting |
Framework | tf |
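The gated convolution itself is simple to state: every layer computes both a feature response and a soft per-pixel, per-channel gate, and multiplies the two. A minimal PyTorch-style sketch; the activation and layer sizes are illustrative choices, not the released model's exact configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GatedConv2d(nn.Module):
    def __init__(self, in_ch, out_ch, kernel_size=3, stride=1, padding=1):
        super().__init__()
        self.feature = nn.Conv2d(in_ch, out_ch, kernel_size, stride, padding)
        self.gating = nn.Conv2d(in_ch, out_ch, kernel_size, stride, padding)

    def forward(self, x):
        # The sigmoid gate decides, per channel and per spatial location,
        # how much of the feature response passes through (learned feature selection).
        return F.elu(self.feature(x)) * torch.sigmoid(self.gating(x))
```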