February 1, 2020

3410 words 17 mins read

Paper Group AWR 267

Multiple Object Forecasting: Predicting Future Object Locations in Diverse Environments. Visual Story Post-Editing. Knowledge-Enriched Visual Storytelling. Multiple Testing and Variable Selection along Least Angle Regression’s path. Playgol: learning programs through play. Control Synthesis from Linear Temporal Logic Specifications using Model-Free …

Multiple Object Forecasting: Predicting Future Object Locations in Diverse Environments

Title Multiple Object Forecasting: Predicting Future Object Locations in Diverse Environments
Authors Olly Styles, Tanaya Guha, Victor Sanchez
Abstract This paper introduces the problem of multiple object forecasting (MOF), in which the goal is to predict future bounding boxes of tracked objects. In contrast to existing works on object trajectory forecasting, which primarily consider the problem from a bird's-eye perspective, we formulate the problem from an object-level perspective and call for the prediction of full object bounding boxes, rather than trajectories alone. Towards solving this task, we introduce the Citywalks dataset, which consists of over 200k high-resolution video frames. Citywalks comprises footage recorded in 21 cities from 10 European countries in a variety of weather conditions and over 3.5k unique pedestrian trajectories. For evaluation, we adapt existing trajectory forecasting methods for MOF and confirm cross-dataset generalizability on the MOT-17 dataset without fine-tuning. Finally, we present STED, a novel encoder-decoder architecture for MOF. STED combines visual and temporal features to model both object-motion and ego-motion, and outperforms existing approaches for MOF. Code & dataset link: https://github.com/olly-styles/Multiple-Object-Forecasting
Tasks Multiple Object Forecasting
Published 2019-09-26
URL https://arxiv.org/abs/1909.11944v2
PDF https://arxiv.org/pdf/1909.11944v2.pdf
PWC https://paperswithcode.com/paper/multiple-object-forecasting-predicting-future
Repo https://github.com/olly-styles/Multiple-Object-Forecasting
Framework pytorch
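
A minimal sketch (an assumption, not the released STED code) of the general recipe the abstract describes: a recurrent encoder consumes past boxes together with per-frame visual features, and a decoder rolls out future bounding boxes. Layer sizes, the displacement-based decoding, and the feature extractor are all illustrative.

```python
import torch
import torch.nn as nn

class BoxForecaster(nn.Module):
    """Illustrative GRU encoder-decoder for multiple object forecasting."""
    def __init__(self, box_dim=4, feat_dim=128, hidden=256, horizon=15):
        super().__init__()
        self.horizon = horizon
        self.encoder = nn.GRU(box_dim + feat_dim, hidden, batch_first=True)
        self.decoder = nn.GRUCell(box_dim, hidden)
        self.out = nn.Linear(hidden, box_dim)

    def forward(self, past_boxes, visual_feats):
        # past_boxes: (B, T, 4), visual_feats: (B, T, feat_dim)
        _, h = self.encoder(torch.cat([past_boxes, visual_feats], dim=-1))
        h = h.squeeze(0)
        box, preds = past_boxes[:, -1], []
        for _ in range(self.horizon):
            h = self.decoder(box, h)
            box = box + self.out(h)          # predict a displacement and accumulate
            preds.append(box)
        return torch.stack(preds, dim=1)     # (B, horizon, 4) future boxes

future = BoxForecaster()(torch.randn(2, 8, 4), torch.randn(2, 8, 128))
```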

Visual Story Post-Editing

Title Visual Story Post-Editing
Authors Ting-Yao Hsu, Chieh-Yang Huang, Yen-Chia Hsu, Ting-Hao ‘Kenneth’ Huang
Abstract We introduce the first dataset for human edits of machine-generated visual stories and explore how these collected edits may be used for the visual story post-editing task. The dataset, VIST-Edit, includes 14,905 human edited versions of 2,981 machine-generated visual stories. The stories were generated by two state-of-the-art visual storytelling models, each aligned to 5 human-edited versions. We establish baselines for the task, showing how a relatively small set of human edits can be leveraged to boost the performance of large visual storytelling models. We also discuss the weak correlation between automatic evaluation scores and human ratings, motivating the need for new automatic metrics.
Tasks Visual Storytelling
Published 2019-06-05
URL https://arxiv.org/abs/1906.01764v1
PDF https://arxiv.org/pdf/1906.01764v1.pdf
PWC https://paperswithcode.com/paper/visual-story-post-editing
Repo https://github.com/tingyaohsu/VIST-Edit
Framework none

Knowledge-Enriched Visual Storytelling

Title Knowledge-Enriched Visual Storytelling
Authors Chao-Chun Hsu, Zi-Yuan Chen, Chi-Yang Hsu, Chih-Chia Li, Tzu-Yuan Lin, Ting-Hao ‘Kenneth’ Huang, Lun-Wei Ku
Abstract Stories are diverse and highly personalized, resulting in a large possible output space for story generation. Existing end-to-end approaches produce monotonous stories because they are limited to the vocabulary and knowledge in a single training dataset. This paper introduces KG-Story, a three-stage framework that allows the story generation model to take advantage of external Knowledge Graphs to produce interesting stories. KG-Story distills a set of representative words from the input prompts, enriches the word set by using external knowledge graphs, and finally generates stories based on the enriched word set. This distill-enrich-generate framework allows the use of external resources not only for the enrichment phase, but also for the distillation and generation phases. In this paper, we show the superiority of KG-Story for visual storytelling, where the input prompt is a sequence of five photos and the output is a short story. Per the human ranking evaluation, stories generated by KG-Story are on average ranked higher than those of the state-of-the-art systems. Our code and output stories are available at https://github.com/zychen423/KE-VIST.
Tasks Knowledge Graphs, Visual Storytelling
Published 2019-12-03
URL https://arxiv.org/abs/1912.01496v1
PDF https://arxiv.org/pdf/1912.01496v1.pdf
PWC https://paperswithcode.com/paper/knowledge-enriched-visual-storytelling
Repo https://github.com/zychen423/KE-VIST
Framework pytorch
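
The distill-enrich-generate pipeline lends itself to a short skeleton. The sketch below is a hypothetical outline of the three stages, not the released KE-VIST code; the term extractor, knowledge graph lookup, and language model are placeholder callables.

```python
def distill(photos, term_extractor):
    """Stage 1: extract a few representative words per input photo."""
    return [term_extractor(p) for p in photos]

def enrich(word_sets, knowledge_graph, max_neighbors=2):
    """Stage 2: expand each word set with related terms from an external KG (here a plain dict)."""
    enriched = []
    for words in word_sets:
        extra = {n for w in words for n in knowledge_graph.get(w, [])[:max_neighbors]}
        enriched.append(list(words) + sorted(extra - set(words)))
    return enriched

def generate(enriched_sets, language_model):
    """Stage 3: turn each enriched word set into one story sentence."""
    return [language_model(words) for words in enriched_sets]

def kg_story(photos, term_extractor, knowledge_graph, language_model):
    return generate(enrich(distill(photos, term_extractor), knowledge_graph), language_model)
```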

Multiple Testing and Variable Selection along Least Angle Regression’s path

Title Multiple Testing and Variable Selection along Least Angle Regression’s path
Authors J. -M. Azaïs, Y. De Castro
Abstract In this article we investigate the outcomes of the standard Least Angle Regression (LAR) algorithm in high dimensions under the Gaussian noise assumption. We give the exact law of the sequence of knots conditional on the sequence of variables entering the model, i.e., the post-selection law of the knots of the LAR. Based on this result, we prove an exact control of the False Discovery Rate (FDR) in the orthogonal design case and an exact control of the existence of false negatives in the general design case. First, we build a sequence of testing procedures on the variables entering the model and we give an exact control of the FDR in the orthogonal design case even when the noise level is unknown. Second, we introduce a new exact testing procedure on the existence of false negatives, again valid when the noise level is unknown. This testing procedure can be deployed after any support selection procedure that produces an estimate of the support (i.e., the indexes of nonzero coefficients) for any design. The type I error of the test can be exactly controlled as long as the selection procedure follows some elementary hypotheses, referred to as admissible selection procedures. These support selection procedures are such that the estimate of the support is given by the $k$ first variables entering the model, where the random variable $k$ is a stopping time. Monte-Carlo simulations and a real data experiment are provided to illustrate our results.
Tasks
Published 2019-06-28
URL https://arxiv.org/abs/1906.12072v1
PDF https://arxiv.org/pdf/1906.12072v1.pdf
PWC https://paperswithcode.com/paper/multiple-testing-and-variable-selection-along
Repo https://github.com/ydecastro/lar_testing
Framework none
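
As a small, hedged illustration of the objects the paper studies (not the authors' testing procedures), scikit-learn's lars_path exposes both the LAR knots and the order in which variables enter the model:

```python
import numpy as np
from sklearn.linear_model import lars_path

rng = np.random.default_rng(0)
n, p, k = 100, 50, 5
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:k] = 3.0                      # only the first k variables are truly active
y = X @ beta + rng.standard_normal(n)

# 'alphas' are the knots lambda_1 >= lambda_2 >= ... and 'active' lists the variables
# in order of entry; the paper derives the post-selection law of these knots.
alphas, active, coefs = lars_path(X, y, method="lar")
print("first knots:", alphas[:6])
print("entry order:", active[:6])
```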

Playgol: learning programs through play

Title Playgol: learning programs through play
Authors Andrew Cropper
Abstract Children learn through play. We introduce the analogous idea of learning programs through play. In this approach, a program induction system (the learner) is given a set of tasks and initial background knowledge (BK). Before solving the tasks, the learner enters an unsupervised playing stage where it creates its own tasks to solve, tries to solve them, and saves any solutions (programs) to the background knowledge. After the playing stage is finished, the learner enters the supervised building stage where it tries to solve the user-supplied tasks and can reuse solutions learnt whilst playing. The idea is that playing allows the learner to discover reusable general programs on its own which can then help solve the user-supplied tasks. We claim that playing can improve learning performance. We show that playing can reduce the textual complexity of target concepts which in turn reduces the sample complexity of a learner. We implement our idea in Playgol, a new inductive logic programming system. We experimentally test our claim on two domains: robot planning and real-world string transformations. Our experimental results suggest that playing can substantially improve learning performance. We think that the idea of playing (or, more verbosely, unsupervised bootstrapping for supervised program induction) is an important contribution to the problem of developing program induction approaches that self-discover BK.
Tasks
Published 2019-04-18
URL https://arxiv.org/abs/1904.08993v2
PDF https://arxiv.org/pdf/1904.08993v2.pdf
PWC https://paperswithcode.com/paper/playgol-learning-programs-through-play
Repo https://github.com/metagol/metagol
Framework none
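
A hedged Python sketch of the play-then-build loop described in the abstract; `sample_play_task`, `induce`, and the task/program representations are hypothetical placeholders rather than Playgol's actual Prolog implementation.

```python
def playgol(user_tasks, background_knowledge, induce, sample_play_task, num_play_tasks=1000):
    # Playing stage: invent tasks, try to solve them, and keep any solutions as new BK.
    for _ in range(num_play_tasks):
        play_task = sample_play_task()
        program = induce(play_task, background_knowledge)
        if program is not None:
            background_knowledge = background_knowledge | {program}
    # Building stage: solve the user-supplied tasks, reusing programs learnt while playing.
    return {task: induce(task, background_knowledge) for task in user_tasks}
```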

Control Synthesis from Linear Temporal Logic Specifications using Model-Free Reinforcement Learning

Title Control Synthesis from Linear Temporal Logic Specifications using Model-Free Reinforcement Learning
Authors Alper Kamil Bozkurt, Yu Wang, Michael M. Zavlanos, Miroslav Pajic
Abstract We present a reinforcement learning (RL) framework to synthesize a control policy from a given linear temporal logic (LTL) specification in an unknown stochastic environment that can be modeled as a Markov Decision Process (MDP). Specifically, we learn a policy that maximizes the probability of satisfying the LTL formula without learning the transition probabilities. We introduce a novel rewarding and path-dependent discounting mechanism based on the LTL formula such that (i) an optimal policy maximizing the total discounted reward effectively maximizes the probabilities of satisfying LTL objectives, and (ii) a model-free RL algorithm using these rewards and discount factors is guaranteed to converge to such a policy. Finally, we illustrate the applicability of our RL-based synthesis approach on two motion planning case studies.
Tasks Motion Planning
Published 2019-09-16
URL https://arxiv.org/abs/1909.07299v2
PDF https://arxiv.org/pdf/1909.07299v2.pdf
PWC https://paperswithcode.com/paper/control-synthesis-from-linear-temporal-logic
Repo https://github.com/alperkamil/csrl
Framework none
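
The sketch below shows the general shape of model-free learning on the product of the MDP with an automaton built from the LTL formula. The environment interface and the specific reward/discount values attached to accepting transitions are illustrative assumptions, not the exact scheme the paper proves correct.

```python
import random
from collections import defaultdict

def q_learning(env, accepting, episodes=5000, alpha=0.1, gamma=0.99, gamma_b=0.9, eps=0.1):
    Q = defaultdict(float)               # keyed by (product state, action)
    for _ in range(episodes):
        s, done = env.reset(), False     # product state: (MDP state, automaton state)
        while not done:
            actions = env.actions(s)
            a = random.choice(actions) if random.random() < eps else max(actions, key=lambda b: Q[s, b])
            s2, done = env.step(s, a)
            # Accepting transitions earn a small reward and a reduced discount, so that
            # maximizing discounted return relates to maximizing satisfaction probability.
            r, disc = (1.0 - gamma_b, gamma_b) if accepting(s2) else (0.0, gamma)
            target = r if done else r + disc * max(Q[s2, b] for b in env.actions(s2))
            Q[s, a] += alpha * (target - Q[s, a])
            s = s2
    return Q
```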

Who did They Respond to? Conversation Structure Modeling using Masked Hierarchical Transformer

Title Who did They Respond to? Conversation Structure Modeling using Masked Hierarchical Transformer
Authors Henghui Zhu, Feng Nan, Zhiguo Wang, Ramesh Nallapati, Bing Xiang
Abstract Conversation structure is useful both for understanding the nature of conversation dynamics and for providing features for many downstream applications such as summarization of conversations. In this work, we define the problem of conversation structure modeling as identifying the parent utterance(s) to which each utterance in the conversation responds. Previous work usually takes a pair of utterances and decides whether one utterance is the parent of the other. We believe the entire ancestral history is an important information source for making accurate predictions. Therefore, we design a novel masking mechanism to guide the ancestor flow, and leverage the transformer model to aggregate all ancestors to predict parent utterances. Our experiments are performed on the Reddit dataset (Zhang, Culbertson, and Paritosh 2017) and the Ubuntu IRC dataset (Kummerfeld et al. 2019). In addition, we report experiments on a new larger corpus from the Reddit platform and release this dataset. We show that the proposed model, which takes into account the ancestral history of the conversation, significantly outperforms several strong baselines, including the BERT model, on all datasets.
Tasks
Published 2019-11-25
URL https://arxiv.org/abs/1911.10666v1
PDF https://arxiv.org/pdf/1911.10666v1.pdf
PWC https://paperswithcode.com/paper/who-did-they-respond-to-conversation
Repo https://github.com/henghuiz/MaskedHierarchicalTransformer
Framework none
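
A minimal sketch of the masking idea: given a parent link per utterance, build a boolean attention mask so each utterance can attend only to itself and its ancestors. How the mask is wired into the transformer is omitted, and the representation is an assumption.

```python
import numpy as np

def ancestor_mask(parents):
    """parents[i] is the index of utterance i's parent, or -1 for the root."""
    n = len(parents)
    mask = np.zeros((n, n), dtype=bool)
    for i in range(n):
        mask[i, i] = True
        j = parents[i]
        while j != -1:
            mask[i, j] = True
            j = parents[j]
    return mask

# Example thread: 0 is the root, 1 and 2 reply to 0, 3 replies to 2.
print(ancestor_mask([-1, 0, 0, 2]).astype(int))
```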

Deep Learning Development Environment in Virtual Reality

Title Deep Learning Development Environment in Virtual Reality
Authors Kevin C. VanHorn, Meyer Zinn, Murat Can Cobanoglu
Abstract Virtual reality (VR) offers immersive visualization and intuitive interaction. We leverage VR to enable any biomedical professional to deploy a deep learning (DL) model for image classification. While DL models can be powerful tools for data analysis, they are also challenging to understand and develop. To make deep learning more accessible and intuitive, we have built a virtual reality-based DL development environment. Within our environment, the user can move tangible objects to construct a neural network using only their hands. Our software automatically translates these configurations into a trainable model and then reports its resulting accuracy on a test dataset in real time. Furthermore, we have enriched the virtual objects with visualizations of the model’s components such that users can achieve insight about the DL models that they are developing. With this approach, we bridge the gap between professionals in different fields of expertise while offering a novel perspective for model analysis and data interaction. We further suggest that techniques of development and visualization in deep learning can benefit from integrating virtual reality.
Tasks Image Classification
Published 2019-06-13
URL https://arxiv.org/abs/1906.05925v1
PDF https://arxiv.org/pdf/1906.05925v1.pdf
PWC https://paperswithcode.com/paper/deep-learning-development-environment-in
Repo https://github.com/Cobanoglu-Lab/VR4DL
Framework tf

Exploring Model-based Planning with Policy Networks

Title Exploring Model-based Planning with Policy Networks
Authors Tingwu Wang, Jimmy Ba
Abstract Model-based reinforcement learning (MBRL) with model-predictive control or online planning has shown great potential for locomotion control tasks in terms of both sample efficiency and asymptotic performance. Despite their initial successes, the existing planning methods search over candidate action sequences randomly generated in the action space, which is inefficient in complex high-dimensional environments. In this paper, we propose a novel MBRL algorithm, model-based policy planning (POPLIN), that combines policy networks with online planning. More specifically, we formulate action planning at each time-step as an optimization problem using neural networks. We experiment with both optimization w.r.t. the action sequences initialized from the policy network, and also online optimization directly w.r.t. the parameters of the policy network. We show that POPLIN obtains state-of-the-art performance in the MuJoCo benchmarking environments, being about 3x more sample efficient than state-of-the-art algorithms such as PETS, TD3 and SAC. To explain the effectiveness of our algorithm, we show that the optimization surface in parameter space is smoother than in action space. Furthermore, we find that the distilled policy network can be effectively applied without the expensive model predictive control at test time for some environments such as Cheetah. Code is released at https://github.com/WilsonWangTHU/POPLIN.
Tasks
Published 2019-06-20
URL https://arxiv.org/abs/1906.08649v1
PDF https://arxiv.org/pdf/1906.08649v1.pdf
PWC https://paperswithcode.com/paper/exploring-model-based-planning-with-policy
Repo https://github.com/WilsonWangTHU/POPLIN
Framework tf
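
A hedged sketch of planning with a policy-initialized sampler, in the spirit of the paper's action-space variant: candidate action sequences are drawn around the policy's proposal and refined with a cross-entropy-method loop against a learned dynamics model. Interfaces, shapes, and hyperparameters are illustrative assumptions.

```python
import numpy as np

def plan(state, policy, dynamics, reward, horizon=10, pop=500, elites=50, iters=5, sigma=0.1):
    # Initialize the mean action sequence by rolling the policy through the learned model.
    mean = np.zeros((horizon, policy.act_dim))
    s = state
    for t in range(horizon):
        mean[t] = policy(s)
        s = dynamics(s, mean[t])
    std = np.full_like(mean, sigma)
    for _ in range(iters):                               # CEM refinement around the policy proposal
        candidates = mean + std * np.random.randn(pop, *mean.shape)
        returns = np.empty(pop)
        for i, seq in enumerate(candidates):
            s, ret = state, 0.0
            for a in seq:
                ret += reward(s, a)
                s = dynamics(s, a)
            returns[i] = ret
        elite = candidates[np.argsort(returns)[-elites:]]
        mean, std = elite.mean(axis=0), elite.std(axis=0) + 1e-6
    return mean[0]                                       # execute the first action, then re-plan (MPC)
```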

Seq-U-Net: A One-Dimensional Causal U-Net for Efficient Sequence Modelling

Title Seq-U-Net: A One-Dimensional Causal U-Net for Efficient Sequence Modelling
Authors Daniel Stoller, Mi Tian, Sebastian Ewert, Simon Dixon
Abstract Convolutional neural networks (CNNs) with dilated filters such as the Wavenet or the Temporal Convolutional Network (TCN) have shown good results in a variety of sequence modelling tasks. However, efficiently modelling long-term dependencies in these sequences is still challenging. Although the receptive field of these models grows exponentially with the number of layers, computing the convolutions over very long sequences of features in each layer is time and memory-intensive, prohibiting the use of longer receptive fields in practice. To increase efficiency, we make use of the “slow feature” hypothesis stating that many features of interest are slowly varying over time. For this, we use a U-Net architecture that computes features at multiple time-scales and adapt it to our auto-regressive scenario by making convolutions causal. We apply our model (“Seq-U-Net”) to a variety of tasks including language and audio generation. In comparison to TCN and Wavenet, our network consistently saves memory and computation time, with speed-ups for training and inference of over 4x in the audio generation experiment in particular, while achieving a comparable performance in all tasks.
Tasks Audio Generation
Published 2019-11-14
URL https://arxiv.org/abs/1911.06393v1
PDF https://arxiv.org/pdf/1911.06393v1.pdf
PWC https://paperswithcode.com/paper/seq-u-net-a-one-dimensional-causal-u-net-for
Repo https://github.com/f90/Seq-U-Net
Framework pytorch
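
A minimal sketch of the causal 1-D convolution such an architecture builds on: the input is padded only on the left so each output frame depends solely on current and past frames. Channel sizes and dilation are illustrative; the U-Net down/upsampling structure is omitted.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalConv1d(nn.Module):
    def __init__(self, in_ch, out_ch, kernel_size=3, dilation=1):
        super().__init__()
        self.pad = (kernel_size - 1) * dilation
        self.conv = nn.Conv1d(in_ch, out_ch, kernel_size, dilation=dilation)

    def forward(self, x):                     # x: (B, C, T)
        x = F.pad(x, (self.pad, 0))           # left-pad in time only, so no future leakage
        return self.conv(x)

y = CausalConv1d(1, 16)(torch.randn(4, 1, 100))   # -> (4, 16, 100)
```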

Revisit Knowledge Distillation: a Teacher-free Framework

Title Revisit Knowledge Distillation: a Teacher-free Framework
Authors Li Yuan, Francis E. H. Tay, Guilin Li, Tao Wang, Jiashi Feng
Abstract Knowledge Distillation (KD) aims to distill the knowledge of a cumbersome teacher model into a lightweight student model. Its success is generally attributed to the privileged information on similarities among categories provided by the teacher model, and in this sense, only strong teacher models are deployed to teach weaker students in practice. In this work, we challenge this common belief through the following experimental observations: 1) beyond the acknowledgment that the teacher can improve the student, the student can also enhance the teacher significantly by reversing the KD procedure; 2) a poorly-trained teacher with much lower accuracy than the student can still improve the latter significantly. To explain these observations, we provide a theoretical analysis of the relationships between KD and label smoothing regularization. We prove that 1) KD is a type of learned label smoothing regularization and 2) label smoothing regularization provides a virtual teacher model for KD. From these results, we argue that the success of KD is not fully due to the similarity information between categories, but also to the regularization of soft targets, which is equally or even more important. Based on these analyses, we further propose a novel Teacher-free Knowledge Distillation (Tf-KD) framework, where a student model learns from itself or from a manually-designed regularization distribution. Tf-KD achieves comparable performance with normal KD from a superior teacher, making it applicable when a stronger teacher model is unavailable. Meanwhile, Tf-KD is generic and can be directly deployed for training deep neural networks. Without any extra computation cost, Tf-KD achieves up to 0.65% improvement on ImageNet over well-established baseline models, which is superior to label smoothing regularization. The code is at: \url{https://github.com/yuanli2333/Teacher-free-Knowledge-Distillation}
Tasks
Published 2019-09-25
URL https://arxiv.org/abs/1909.11723v1
PDF https://arxiv.org/pdf/1909.11723v1.pdf
PWC https://paperswithcode.com/paper/revisit-knowledge-distillation-a-teacher-free
Repo https://github.com/yuanli2333/Teacher-free-Knowledge-Distillation
Framework pytorch
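
A hedged sketch of the "virtual teacher" variant described in the abstract: the student is regularized toward a hand-designed distribution that places most of its mass on the correct class. The temperature and weighting below are assumptions; the released repo has the paper's actual settings.

```python
import torch
import torch.nn.functional as F

def tf_kd_loss(logits, targets, correct_prob=0.9, temperature=20.0, alpha=0.5):
    num_classes = logits.size(1)
    # Manually designed teacher: the correct class gets `correct_prob`, the rest share the remainder.
    teacher = torch.full_like(logits, (1.0 - correct_prob) / (num_classes - 1))
    teacher.scatter_(1, targets.unsqueeze(1), correct_prob)
    ce = F.cross_entropy(logits, targets)
    kd = F.kl_div(F.log_softmax(logits / temperature, dim=1),
                  teacher, reduction="batchmean") * temperature ** 2
    return (1 - alpha) * ce + alpha * kd

loss = tf_kd_loss(torch.randn(8, 10), torch.randint(0, 10, (8,)))
```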

Blow: a single-scale hyperconditioned flow for non-parallel raw-audio voice conversion

Title Blow: a single-scale hyperconditioned flow for non-parallel raw-audio voice conversion
Authors Joan Serrà, Santiago Pascual, Carlos Segura
Abstract End-to-end models for raw audio generation are a challenge, especially if they have to work with non-parallel data, which is a desirable setup in many situations. Voice conversion, in which a model has to impersonate a speaker in a recording, is one of those situations. In this paper, we propose Blow, a single-scale normalizing flow using hypernetwork conditioning to perform many-to-many voice conversion on raw audio. Blow is trained end-to-end, with non-parallel data, on a frame-by-frame basis using a single speaker identifier. We show that Blow compares favorably to existing flow-based architectures and other competitive baselines, obtaining equal or better performance in both objective and subjective evaluations. We further assess the impact of its main components with an ablation study, and quantify a number of properties such as the necessary amount of training data or the preference for source or target speakers.
Tasks Audio Generation, Voice Conversion
Published 2019-06-03
URL https://arxiv.org/abs/1906.00794v2
PDF https://arxiv.org/pdf/1906.00794v2.pdf
PWC https://paperswithcode.com/paper/190600794
Repo https://github.com/liusongxiang/StarGAN-Voice-Conversion
Framework pytorch
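
A simplified sketch of speaker-conditioned coupling in a normalizing flow. For brevity it conditions by concatenating a speaker embedding rather than using a hypernetwork to generate the coupling weights as Blow does, and all sizes are assumptions.

```python
import torch
import torch.nn as nn

class ConditionedCoupling(nn.Module):
    def __init__(self, frame_dim=4096, n_speakers=100, emb_dim=128, hidden=512):
        super().__init__()
        self.emb = nn.Embedding(n_speakers, emb_dim)
        half = frame_dim // 2
        # The coupling net sees half of the frame plus the speaker embedding and
        # predicts a log-scale and bias for the other half.
        self.net = nn.Sequential(nn.Linear(half + emb_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 2 * half))

    def forward(self, x, speaker_id):          # x: (B, frame_dim), speaker_id: (B,)
        x_a, x_b = x.chunk(2, dim=1)
        log_s, b = self.net(torch.cat([x_a, self.emb(speaker_id)], dim=1)).chunk(2, dim=1)
        y_b = x_b * torch.exp(log_s) + b       # invertible affine transform of the second half
        log_det = log_s.sum(dim=1)             # contribution to the flow's log-likelihood
        return torch.cat([x_a, y_b], dim=1), log_det
```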

Assisted Sound Sample Generation with Musical Conditioning in Adversarial Auto-Encoders

Title Assisted Sound Sample Generation with Musical Conditioning in Adversarial Auto-Encoders
Authors Adrien Bitton, Philippe Esling, Antoine Caillon, Martin Fouilleul
Abstract Generative models have thrived in computer vision, enabling unprecedented image processes. Yet the results in audio remain less advanced. Our project targets real-time sound synthesis from a reduced set of high-level parameters, including semantic controls that can be adapted to different sound libraries and specific tags. These generative variables should allow expressive modulations of target musical qualities and continuously mix into new styles. To this end, we train AEs on an orchestral database of individual note samples, along with their intrinsic attributes: note class, timbre domain and extended playing techniques. We condition the decoder for control over the rendered note attributes and use latent adversarial training for learning expressive style parameters that can ultimately be mixed. We evaluate both generative performance and the latent representation. Our ablation study demonstrates the effectiveness of the musical conditioning mechanisms. The proposed model generates notes as magnitude spectrograms from any probabilistic latent code samples, with expressive control of orchestral timbres and playing styles. Its training data subsets can directly be visualized in the 3D latent representation. Waveform rendering can be done offline with GLA. In order to allow real-time interactions, we fine-tune the decoder with a pretrained MCNN and embed the full waveform generation pipeline in a plugin. Moreover, the encoder can be used to process new input samples; after manipulating their latent attribute representation, the decoder can generate sample variations, much as an audio effect would. Our solution remains rather fast to train and can directly be applied to other sound domains, including a user’s own libraries with custom sound tags that could be mapped to specific generative controls. As a result, it fosters creativity and intuitive audio style experimentation.
Tasks Audio Generation
Published 2019-04-12
URL https://arxiv.org/abs/1904.06215v2
PDF https://arxiv.org/pdf/1904.06215v2.pdf
PWC https://paperswithcode.com/paper/assisted-sound-sample-generation-with-musical
Repo https://github.com/adrienchaton/Expressive_WAE_FADER
Framework none

Attending to Discriminative Certainty for Domain Adaptation

Title Attending to Discriminative Certainty for Domain Adaptation
Authors Vinod Kumar Kurmi, Shanu Kumar, Vinay P Namboodiri
Abstract In this paper, we address unsupervised domain adaptation of classifiers, where label information is available for the source domain but not for the target domain. While various methods have been proposed, including adversarial discriminator-based methods, most approaches have focused on adapting the entire image. In an image, however, some regions can be adapted more readily; for instance, the foreground object may be similar in nature across domains. To exploit such regions, we propose methods that consider the probabilistic certainty estimates of various regions and focus on these during classification for adaptation. We observe that just by incorporating the probabilistic certainty of the discriminator while training the classifier, we obtain state-of-the-art results on various datasets compared against all the recent methods. We provide a thorough empirical analysis of the method through ablation analysis, a statistical significance test, and visualization of the attention maps and t-SNE embeddings. These evaluations convincingly demonstrate the effectiveness of the proposed approach.
Tasks Domain Adaptation, Unsupervised Domain Adaptation
Published 2019-06-08
URL https://arxiv.org/abs/1906.03502v2
PDF https://arxiv.org/pdf/1906.03502v2.pdf
PWC https://paperswithcode.com/paper/attending-to-discriminative-certainty-for-1
Repo https://github.com/DelTA-Lab-IITK/CADA
Framework none
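
A loose, hedged sketch of the core intuition: use the (un)certainty of the domain discriminator to decide how much each example should drive the classification loss. The paper attends to region-level certainty inside the network; here, as a simplification, per-region certainty is averaged into a per-image weight, and all interfaces are assumptions.

```python
import torch
import torch.nn.functional as F

def certainty_weighted_loss(class_logits, labels, domain_logits):
    # domain_logits: (B, 1, H, W) per-region domain predictions from the discriminator.
    p = torch.sigmoid(domain_logits)
    entropy = -(p * torch.log(p + 1e-8) + (1 - p) * torch.log(1 - p + 1e-8))
    certainty = 1.0 - entropy / torch.log(torch.tensor(2.0))   # 1 = confident, 0 = uncertain
    weight = certainty.mean(dim=(1, 2, 3))                     # one scalar weight per image
    ce = F.cross_entropy(class_logits, labels, reduction="none")
    return (weight * ce).mean()
```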

SalsaNet: Fast Road and Vehicle Segmentation in LiDAR Point Clouds for Autonomous Driving

Title SalsaNet: Fast Road and Vehicle Segmentation in LiDAR Point Clouds for Autonomous Driving
Authors Eren Erdal Aksoy, Saimir Baci, Selcuk Cavdar
Abstract In this paper, we introduce a deep encoder-decoder network, named SalsaNet, for efficient semantic segmentation of 3D LiDAR point clouds. SalsaNet segments the road, i.e. drivable free-space, and vehicles in the scene by employing a bird's-eye-view (BEV) image projection of the point cloud. To overcome the lack of annotated point cloud data, in particular for the road segments, we introduce an auto-labeling process which transfers automatically generated labels from the camera to the LiDAR. We also explore the role of image-like projections of LiDAR data in semantic segmentation by comparing BEV with the spherical-front-view projection, and show that SalsaNet is projection-agnostic. We perform quantitative and qualitative evaluations on the KITTI dataset, which demonstrate that the proposed SalsaNet outperforms other state-of-the-art semantic segmentation networks in terms of accuracy and computation time. Our code and data are publicly available at https://gitlab.com/aksoyeren/salsanet.git.
Tasks 3D Semantic Segmentation, Autonomous Driving, Semantic Segmentation
Published 2019-09-18
URL https://arxiv.org/abs/1909.08291v1
PDF https://arxiv.org/pdf/1909.08291v1.pdf
PWC https://paperswithcode.com/paper/salsanet-fast-road-and-vehicle-segmentation
Repo https://github.com/aksoyeren/salsanet.git
Framework none
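
A minimal sketch of projecting a LiDAR point cloud onto a bird's-eye-view grid, the kind of input representation the abstract describes; grid extent, resolution, and channel choices are illustrative assumptions.

```python
import numpy as np

def bev_projection(points, x_range=(0, 50), y_range=(-25, 25), res=0.25):
    """points: (N, 4) array of (x, y, z, intensity). Returns an (H, W, 3) BEV image."""
    h = int((x_range[1] - x_range[0]) / res)
    w = int((y_range[1] - y_range[0]) / res)
    bev = np.zeros((h, w, 3), dtype=np.float32)   # channels: top height, top intensity, density
    xi = ((points[:, 0] - x_range[0]) / res).astype(int)
    yi = ((points[:, 1] - y_range[0]) / res).astype(int)
    keep = (xi >= 0) & (xi < h) & (yi >= 0) & (yi < w)
    order = np.argsort(points[keep][:, 2])        # process lowest points first
    for r, c, (_, _, z, inten) in zip(xi[keep][order], yi[keep][order], points[keep][order]):
        bev[r, c, 0] = z                          # the highest point in the cell wins
        bev[r, c, 1] = inten
        bev[r, c, 2] += 1.0                       # point density
    return bev
```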