Paper Group AWR 27
Deep Dynamical Modeling and Control of Unsteady Fluid Flows
Title | Deep Dynamical Modeling and Control of Unsteady Fluid Flows |
Authors | Jeremy Morton, Freddie D. Witherden, Antony Jameson, Mykel J. Kochenderfer |
Abstract | The design of flow control systems remains a challenge due to the nonlinear nature of the equations that govern fluid flow. However, recent advances in computational fluid dynamics (CFD) have enabled the simulation of complex fluid flows with high accuracy, opening the possibility of using learning-based approaches to facilitate controller design. We present a method for learning the forced and unforced dynamics of airflow over a cylinder directly from CFD data. The proposed approach, grounded in Koopman theory, is shown to produce stable dynamical models that can predict the time evolution of the cylinder system over extended time horizons. Finally, by performing model predictive control with the learned dynamical models, we are able to find a straightforward, interpretable control law for suppressing vortex shedding in the wake of the cylinder. |
Tasks | |
Published | 2018-05-18 |
URL | http://arxiv.org/abs/1805.07472v2 |
http://arxiv.org/pdf/1805.07472v2.pdf | |
PWC | https://paperswithcode.com/paper/deep-dynamical-modeling-and-control-of |
Repo | https://github.com/sisl/deep_flow_control |
Framework | tf |
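The paper learns forced and unforced dynamics with a deep network grounded in Koopman theory and then applies model predictive control. As a rough, assumption-laden illustration of the underlying idea only, the sketch below fits a linear operator to snapshot pairs with a plain DMD-style least-squares fit on synthetic data and rolls it forward; it is not the authors' architecture or their CFD data.

```python
import numpy as np

# Toy Koopman-style linear surrogate: fit A so that x_{t+1} ~ A x_t from
# snapshot pairs, then roll the fitted operator forward over a longer horizon.
rng = np.random.default_rng(0)
n, T = 16, 200
A_true = np.linalg.qr(rng.normal(size=(n, n)))[0] * 0.99   # stable toy dynamics
X = np.zeros((n, T))
X[:, 0] = rng.normal(size=n)
for t in range(T - 1):
    X[:, t + 1] = A_true @ X[:, t] + 1e-3 * rng.normal(size=n)

X0, X1 = X[:, :-1], X[:, 1:]            # snapshot pairs (x_t, x_{t+1})
A_fit = X1 @ np.linalg.pinv(X0)         # least-squares linear operator

# Multi-step prediction from the last training state
x = X[:, -1].copy()
preds = []
for _ in range(50):
    x = A_fit @ x
    preds.append(x.copy())
print("spectral radius of fitted operator:", max(abs(np.linalg.eigvals(A_fit))))
```

A spectral radius below one is the kind of stability property the paper emphasizes; with a stable learned model, the multi-step rollout above is what an MPC loop would optimize over.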
Real Time Dense Depth Estimation by Fusing Stereo with Sparse Depth Measurements
Title | Real Time Dense Depth Estimation by Fusing Stereo with Sparse Depth Measurements |
Authors | Shreyas S. Shivakumar, Kartik Mohta, Bernd Pfrommer, Vijay Kumar, Camillo J. Taylor |
Abstract | We present an approach to depth estimation that fuses information from a stereo pair with sparse range measurements derived from a LIDAR sensor or a range camera. The goal of this work is to exploit the complementary strengths of the two sensor modalities, the accurate but sparse range measurements and the ambiguous but dense stereo information. These two sources are effectively and efficiently fused by combining ideas from anisotropic diffusion and semi-global matching. We evaluate our approach on the KITTI 2015 and Middlebury 2014 datasets, using randomly sampled ground truth range measurements as our sparse depth input. We achieve significant performance improvements with a small fraction of range measurements on both datasets. We also provide qualitative results from our platform using the PMDTec Monstar sensor. Our entire pipeline runs on an NVIDIA TX-2 platform at 5Hz on 1280x1024 stereo images with 128 disparity levels. |
Tasks | Depth Estimation |
Published | 2018-09-20 |
URL | http://arxiv.org/abs/1809.07677v1 |
http://arxiv.org/pdf/1809.07677v1.pdf | |
PWC | https://paperswithcode.com/paper/real-time-dense-depth-estimation-by-fusing |
Repo | https://github.com/ShreyasSkandanS/stereo_sparse_depth_fusion |
Framework | none |
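The method combines ideas from anisotropic diffusion and semi-global matching to fuse sparse range measurements with stereo. The snippet below is a minimal sketch of only the first idea, diffusing sparse depth with image-edge-aware weights; the weighting scheme, iteration count, and toy data are assumptions, and the full fused SGM pipeline is not reproduced here.

```python
import numpy as np

def diffuse_sparse_depth(image, sparse_depth, valid, iters=200, beta=10.0):
    """Fill missing depth by anisotropic-diffusion-style averaging.

    image: (H, W) grayscale guide; sparse_depth holds values only where
    valid is True. Neighbours across strong image edges get small weights,
    so depth does not bleed over object boundaries. Periodic boundaries
    from np.roll are ignored for simplicity."""
    depth = np.where(valid, sparse_depth, sparse_depth[valid].mean())
    shifts = [(1, 0), (-1, 0), (0, 1), (0, -1)]
    for _ in range(iters):
        num = np.zeros_like(depth)
        den = np.zeros_like(depth)
        for dy, dx in shifts:
            d_n = np.roll(depth, (dy, dx), axis=(0, 1))
            i_n = np.roll(image, (dy, dx), axis=(0, 1))
            w = np.exp(-beta * np.abs(image - i_n))   # small weight across edges
            num += w * d_n
            den += w
        depth = num / np.maximum(den, 1e-8)
        depth[valid] = sparse_depth[valid]            # clamp measured pixels
    return depth

# Toy usage with random data standing in for an image and LIDAR samples.
rng = np.random.default_rng(0)
img = rng.random((64, 64))
gt = np.tile(np.linspace(1.0, 5.0, 64), (64, 1))
mask = rng.random((64, 64)) < 0.05                    # ~5% sparse measurements
dense = diffuse_sparse_depth(img, np.where(mask, gt, 0.0), mask)
```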
Re-evaluating Continual Learning Scenarios: A Categorization and Case for Strong Baselines
Title | Re-evaluating Continual Learning Scenarios: A Categorization and Case for Strong Baselines |
Authors | Yen-Chang Hsu, Yen-Cheng Liu, Anita Ramasamy, Zsolt Kira |
Abstract | Continual learning has received a great deal of attention recently with several approaches being proposed. However, evaluations involve a diverse set of scenarios making meaningful comparison difficult. This work provides a systematic categorization of the scenarios and evaluates them within a consistent framework including strong baselines and state-of-the-art methods. The results provide an understanding of the relative difficulty of the scenarios and show that simple baselines (Adagrad, L2 regularization, and naive rehearsal strategies) can surprisingly achieve performance similar to current mainstream methods. We conclude with several suggestions for creating harder evaluation scenarios and future research directions. The code is available at https://github.com/GT-RIPL/Continual-Learning-Benchmark |
Tasks | Continual Learning, L2 Regularization |
Published | 2018-10-30 |
URL | http://arxiv.org/abs/1810.12488v4 |
http://arxiv.org/pdf/1810.12488v4.pdf | |
PWC | https://paperswithcode.com/paper/re-evaluating-continual-learning-scenarios-a |
Repo | https://github.com/GT-RIPL/Continual-Learning-Benchmark |
Framework | pytorch |
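One of the strong baselines highlighted in the abstract is naive rehearsal: keep a small memory of past examples and replay them alongside the current task's data. Below is a minimal, framework-free sketch of that idea; the reservoir capacity and mixing ratio are illustrative choices, not the paper's exact settings.

```python
import random

class RehearsalBuffer:
    """Naive rehearsal baseline: keep a bounded reservoir of past examples
    and mix them into every batch of the current task."""

    def __init__(self, capacity=1000, seed=0):
        self.capacity = capacity
        self.buffer = []
        self.seen = 0
        self.rng = random.Random(seed)

    def add(self, example):
        # Reservoir sampling keeps a uniform sample over everything seen so far.
        self.seen += 1
        if len(self.buffer) < self.capacity:
            self.buffer.append(example)
        else:
            j = self.rng.randrange(self.seen)
            if j < self.capacity:
                self.buffer[j] = example

    def mixed_batch(self, current_batch, replay_size):
        replay = self.rng.sample(self.buffer, min(replay_size, len(self.buffer)))
        return list(current_batch) + replay

# Usage: while training on task t, call buffer.add(x) on each example and train
# on buffer.mixed_batch(batch, replay_size=len(batch)) instead of the raw batch.
```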
Synthetically Trained Icon Proposals for Parsing and Summarizing Infographics
Title | Synthetically Trained Icon Proposals for Parsing and Summarizing Infographics |
Authors | Spandan Madan, Zoya Bylinskii, Matthew Tancik, Adrià Recasens, Kimberli Zhong, Sami Alsheikh, Hanspeter Pfister, Aude Oliva, Fredo Durand |
Abstract | Widely used in news, business, and educational media, infographics are handcrafted to effectively communicate messages about complex and often abstract topics including ‘ways to conserve the environment’ and ‘understanding the financial crisis’. Composed of stylistically and semantically diverse visual and textual elements, infographics pose new challenges for computer vision. While automatic text extraction works well on infographics, computer vision approaches trained on natural images fail to identify the stand-alone visual elements in infographics, or ‘icons’. To bridge this representation gap, we propose a synthetic data generation strategy: we augment background patches in infographics from our Visually29K dataset with Internet-scraped icons which we use as training data for an icon proposal mechanism. On a test set of 1K annotated infographics, icons are located with 38% precision and 34% recall (the best model trained with natural images achieves 14% precision and 7% recall). Combining our icon proposals with icon classification and text extraction, we present a multi-modal summarization application. Our application takes an infographic as input and automatically produces text tags and visual hashtags that are textually and visually representative of the infographic’s topics respectively. |
Tasks | Synthetic Data Generation |
Published | 2018-07-27 |
URL | http://arxiv.org/abs/1807.10441v1 |
http://arxiv.org/pdf/1807.10441v1.pdf | |
PWC | https://paperswithcode.com/paper/synthetically-trained-icon-proposals-for |
Repo | https://github.com/cvzoya/visuallydata |
Framework | none |
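The core trick is the synthetic data generation step: pasting Internet-scraped icons onto infographic background patches and recording where they were placed, so an icon proposal model can be trained without manual boxes. The sketch below shows that recipe with Pillow; the file paths, scale range, and placement policy are hypothetical stand-ins, not the paper's exact pipeline.

```python
import random
from PIL import Image

def paste_icon(background_path, icon_path, out_path):
    """Paste one scraped icon onto a background patch and return its box.

    Illustrative synthetic-data step: the returned (x0, y0, x1, y1) box is
    the free ground truth used to train an icon proposal mechanism."""
    bg = Image.open(background_path).convert("RGB")
    icon = Image.open(icon_path).convert("RGBA")
    # Random scale and position, kept fully inside the background patch.
    s = random.uniform(0.1, 0.3)
    w, h = int(bg.width * s), int(bg.height * s)
    icon = icon.resize((max(w, 1), max(h, 1)))
    x = random.randint(0, bg.width - icon.width)
    y = random.randint(0, bg.height - icon.height)
    bg.paste(icon, (x, y), icon)            # alpha channel acts as the paste mask
    bg.save(out_path)
    return (x, y, x + icon.width, y + icon.height)
```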
A Hierarchical Multi-task Approach for Learning Embeddings from Semantic Tasks
Title | A Hierarchical Multi-task Approach for Learning Embeddings from Semantic Tasks |
Authors | Victor Sanh, Thomas Wolf, Sebastian Ruder |
Abstract | Much effort has been devoted to evaluating whether multi-task learning can be leveraged to learn rich representations that can be used in various Natural Language Processing (NLP) downstream applications. However, there is still a lack of understanding of the settings in which multi-task learning has a significant effect. In this work, we introduce a hierarchical model trained in a multi-task learning setup on a set of carefully selected semantic tasks. The model is trained in a hierarchical fashion to introduce an inductive bias by supervising a set of low-level tasks at the bottom layers of the model and more complex tasks at the top layers of the model. This model achieves state-of-the-art results on a number of tasks, namely Named Entity Recognition, Entity Mention Detection and Relation Extraction without hand-engineered features or external NLP tools like syntactic parsers. The hierarchical training supervision induces a set of shared semantic representations at lower layers of the model. We show that as we move from the bottom to the top layers of the model, the hidden states of the layers tend to represent more complex semantic information. |
Tasks | Multi-Task Learning, Named Entity Recognition, Relation Extraction |
Published | 2018-11-14 |
URL | http://arxiv.org/abs/1811.06031v2 |
http://arxiv.org/pdf/1811.06031v2.pdf | |
PWC | https://paperswithcode.com/paper/a-hierarchical-multi-task-approach-for |
Repo | https://github.com/huggingface/hmtl |
Framework | pytorch |
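The key structural idea is supervising simpler tasks at the bottom layers and more complex tasks at the top. Below is a minimal PyTorch sketch of that wiring with one low-level and one high-level head; layer sizes, tag sets, and the two-task setup are illustrative assumptions, not the HMTL model itself.

```python
import torch
import torch.nn as nn

class HierarchicalTagger(nn.Module):
    """Hierarchical multi-task sketch: a low-level head (e.g. NER-like tags)
    reads the bottom encoder layer, a higher-level head reads the top layer."""

    def __init__(self, vocab=1000, emb=64, hidden=64, low_tags=9, high_tags=5):
        super().__init__()
        self.embed = nn.Embedding(vocab, emb)
        self.lower = nn.LSTM(emb, hidden, batch_first=True, bidirectional=True)
        self.upper = nn.LSTM(2 * hidden, hidden, batch_first=True, bidirectional=True)
        self.low_head = nn.Linear(2 * hidden, low_tags)    # supervised at the bottom
        self.high_head = nn.Linear(2 * hidden, high_tags)  # supervised at the top

    def forward(self, tokens):
        x = self.embed(tokens)
        h_low, _ = self.lower(x)
        h_high, _ = self.upper(h_low)
        return self.low_head(h_low), self.high_head(h_high)

# Toy forward pass: a batch of 2 sentences with 7 tokens each.
model = HierarchicalTagger()
tokens = torch.randint(0, 1000, (2, 7))
low_logits, high_logits = model(tokens)
# Training would sum one loss per head, each computed on its own task's labels.
```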
Visual Robot Task Planning
Title | Visual Robot Task Planning |
Authors | Chris Paxton, Yotam Barnoy, Kapil Katyal, Raman Arora, Gregory D. Hager |
Abstract | Prospection, the act of predicting the consequences of many possible futures, is intrinsic to human planning and action, and may even be at the root of consciousness. Surprisingly, this idea has been explored comparatively little in robotics. In this work, we propose a neural network architecture and associated planning algorithm that (1) learns a representation of the world useful for generating prospective futures after the application of high-level actions, (2) uses this generative model to simulate the result of sequences of high-level actions in a variety of environments, and (3) uses this same representation to evaluate these actions and perform tree search to find a sequence of high-level actions in a new environment. Models are trained via imitation learning on a variety of domains, including navigation, pick-and-place, and a surgical robotics task. Our approach allows us to visualize intermediate motion goals and learn to plan complex activity from visual information. |
Tasks | Imitation Learning, Robot Task Planning |
Published | 2018-03-30 |
URL | http://arxiv.org/abs/1804.00062v1 |
http://arxiv.org/pdf/1804.00062v1.pdf | |
PWC | https://paperswithcode.com/paper/visual-robot-task-planning |
Repo | https://github.com/jhu-lcsr/costar_plan |
Framework | tf |
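At planning time, the learned generative model is used to imagine the outcome of sequences of high-level actions and a learned evaluation picks the best one. The sketch below shows that prospection-and-search loop in the simplest possible form, an exhaustive depth-limited search; `transition` and `score` are hypothetical placeholders for the paper's learned forward model and state evaluation, and the 1-D toy world is only for illustration.

```python
import itertools

def plan(state, actions, transition, score, depth=3):
    """Depth-limited search over imagined futures.

    transition(state, action) -> predicted next state (stand-in for the
    learned generative model); score(state) -> learned value estimate.
    Returns the best action sequence found and its score."""
    best_seq, best_val = None, float("-inf")
    for seq in itertools.product(actions, repeat=depth):
        s = state
        for a in seq:
            s = transition(s, a)              # imagined ("prospective") rollout
        v = score(s)
        if v > best_val:
            best_seq, best_val = seq, v
    return best_seq, best_val

# Toy usage: a 1-D world where actions shift the state and the goal is position 4.
actions = [-1, 0, +1]
best = plan(0, actions, transition=lambda s, a: s + a, score=lambda s: -abs(s - 4))
print(best)   # ((1, 1, 1), -1)
```

The paper uses tree search rather than brute-force enumeration, but the interface is the same: a learned model to roll forward and a learned score to compare the imagined end states.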
Massively Multilingual Sentence Embeddings for Zero-Shot Cross-Lingual Transfer and Beyond
Title | Massively Multilingual Sentence Embeddings for Zero-Shot Cross-Lingual Transfer and Beyond |
Authors | Mikel Artetxe, Holger Schwenk |
Abstract | We introduce an architecture to learn joint multilingual sentence representations for 93 languages, belonging to more than 30 different families and written in 28 different scripts. Our system uses a single BiLSTM encoder with a shared BPE vocabulary for all languages, which is coupled with an auxiliary decoder and trained on publicly available parallel corpora. This enables us to learn a classifier on top of the resulting embeddings using English annotated data only, and transfer it to any of the 93 languages without any modification. Our experiments in cross-lingual natural language inference (XNLI dataset), cross-lingual document classification (MLDoc dataset) and parallel corpus mining (BUCC dataset) show the effectiveness of our approach. We also introduce a new test set of aligned sentences in 112 languages, and show that our sentence embeddings obtain strong results in multilingual similarity search even for low-resource languages. Our implementation, the pre-trained encoder and the multilingual test set are available at https://github.com/facebookresearch/LASER |
Tasks | Cross-Lingual Bitext Mining, Cross-Lingual Document Classification, Cross-Lingual Natural Language Inference, Cross-Lingual Transfer, Document Classification, Joint Multilingual Sentence Representations, Natural Language Inference, Parallel Corpus Mining, Sentence Embeddings |
Published | 2018-12-26 |
URL | https://arxiv.org/abs/1812.10464v2 |
https://arxiv.org/pdf/1812.10464v2.pdf | |
PWC | https://paperswithcode.com/paper/massively-multilingual-sentence-embeddings |
Repo | https://github.com/LawrenceDuan/myLASER |
Framework | pytorch |
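The zero-shot transfer recipe in the abstract is: freeze the multilingual encoder, train a classifier on English embeddings only, then apply it unchanged to other languages. The sketch below shows that recipe end to end; the `encode` function is a random-projection placeholder standing in for the paper's BiLSTM encoder (LASER), and the tiny sentiment data is purely illustrative, so the non-English prediction is only meaningful with a real multilingual encoder.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
proj = rng.normal(size=(300, 64))

def encode(sentences):
    # Placeholder encoder: hash words into a bag-of-words vector, then project.
    vecs = []
    for s in sentences:
        bow = np.zeros(300)
        for w in s.lower().split():
            bow[hash(w) % 300] += 1.0
        vecs.append(bow @ proj)
    return np.array(vecs)

# Train on English annotated data only.
en_sents = ["great movie", "terrible plot", "loved it", "awful acting"]
en_labels = [1, 0, 1, 0]
clf = LogisticRegression().fit(encode(en_sents), en_labels)

# At test time the very same classifier scores embeddings of non-English text;
# with a shared multilingual encoder, no per-language training is needed.
print(clf.predict(encode(["una película estupenda"])))
```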
Learning Converged Propagations with Deep Prior Ensemble for Image Enhancement
Title | Learning Converged Propagations with Deep Prior Ensemble for Image Enhancement |
Authors | Risheng Liu, Long Ma, Yiyang Wang, Lei Zhang |
Abstract | Enhancing the visual quality of images plays a very important role in various vision and learning applications. In the past few years, both knowledge-driven maximum a posteriori (MAP) approaches with prior modeling and fully data-dependent convolutional neural network (CNN) techniques have been investigated to address specific enhancement tasks. In this paper, by exploiting the advantages of these two types of mechanisms within a complementary propagation perspective, we propose a unified framework, named deep prior ensemble (DPE), for solving various image enhancement tasks. Specifically, we first establish the basic propagation scheme based on the fundamental image modeling cues and then introduce residual CNNs to help predict the propagation direction at each stage. By designing prior projections to perform feedback control, we theoretically prove that even with experience-inspired CNNs, DPE is guaranteed to converge and the output will always satisfy our fundamental task constraints. The main advantage over conventional optimization-based MAP approaches is that our descent directions are learned from collected training data, and are thus much more robust to unwanted local minima. Meanwhile, compared with existing CNN-type networks, which are often designed in heuristic manners without theoretical guarantees, DPE is able to gain advantages from rich task cues investigated on the basis of domain knowledge. Therefore, DPE actually provides a generic ensemble methodology to integrate both knowledge- and data-based cues for different image enhancement tasks. More importantly, our theoretical investigations verify that the feedforward propagations of DPE are properly controlled toward our desired solution. Experimental results demonstrate that the proposed DPE outperforms the state-of-the-art on a variety of image enhancement tasks in terms of both quantitative measures and visual perception quality. |
Tasks | Image Enhancement |
Published | 2018-10-09 |
URL | http://arxiv.org/abs/1810.04012v1 |
http://arxiv.org/pdf/1810.04012v1.pdf | |
PWC | https://paperswithcode.com/paper/learning-converged-propagations-with-deep |
Repo | https://github.com/dlut-dimt/DPE-Deep-Prior-Ensemble |
Framework | none |
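The propagation structure described in the abstract (a model-based update, a learned residual correction, and a prior projection acting as feedback control) can be sketched schematically as below. This is a generic, simplified loop under stated assumptions: the data term is a plain least-squares fidelity, the "learned" residual is replaced by a local-mean smoother, and the projection is just clipping to the valid intensity range; none of this is the paper's actual DPE instantiation.

```python
import numpy as np

def propagate(y, learned_residual, steps=20, lam=0.1, step_size=0.5):
    """Schematic MAP-style propagation with a plug-in learned correction.

    Each stage takes a gradient step on the data term ||x - y||^2, adds a
    correction from learned_residual (standing in for a residual CNN), and
    projects back onto the constraint set (here: the [0, 1] intensity range)."""
    x = y.copy()
    for _ in range(steps):
        grad = x - y                            # gradient of the data term
        x = x - step_size * grad + lam * learned_residual(x)
        x = np.clip(x, 0.0, 1.0)                # prior projection / feedback control
    return x

# Toy usage: the "learned" residual is replaced by a small local-mean smoother.
def box_smooth_residual(x):
    pad = np.pad(x, 1, mode="edge")
    local_mean = sum(np.roll(np.roll(pad, dy, 0), dx, 1)[1:-1, 1:-1]
                     for dy in (-1, 0, 1) for dx in (-1, 0, 1)) / 9.0
    return local_mean - x

rng = np.random.default_rng(0)
noisy = np.clip(0.5 + 0.1 * rng.normal(size=(32, 32)), 0, 1)
out = propagate(noisy, box_smooth_residual)
```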
Image-based Guidance of Autonomous Aircraft for Wildfire Surveillance and Prediction
Title | Image-based Guidance of Autonomous Aircraft for Wildfire Surveillance and Prediction |
Authors | Kyle D. Julian, Mykel J. Kochenderfer |
Abstract | Small unmanned aircraft can help firefighters combat wildfires by providing real-time surveillance of the growing fires. However, guiding the aircraft autonomously given only wildfire images is a challenging problem. This work models noisy images obtained from on-board cameras and proposes two approaches to filtering the wildfire images. The first approach uses a simple Kalman filter to reduce noise and update a belief map in observed areas. The second approach uses a particle filter to predict wildfire growth and uses observations to estimate uncertainties relating to wildfire expansion. The belief maps are used to train a deep reinforcement learning controller, which learns a policy to navigate the aircraft to survey the wildfire while avoiding flight directly over the fire. Simulation results show that the proposed controllers precisely guide the aircraft and accurately estimate wildfire growth, and a study of observation noise demonstrates the robustness of the particle filter approach. |
Tasks | |
Published | 2018-10-04 |
URL | http://arxiv.org/abs/1810.02455v2 |
http://arxiv.org/pdf/1810.02455v2.pdf | |
PWC | https://paperswithcode.com/paper/image-based-guidance-of-autonomous-aircraft |
Repo | https://github.com/sisl/UAV_Wildfire_Monitoring |
Framework | none |
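The first filtering approach in the abstract maintains a belief map from noisy camera observations with a simple Kalman filter. A minimal sketch of that idea, with an independent scalar filter per grid cell and made-up noise parameters, is shown below; the particle-filter variant and the reinforcement learning controller are not covered here.

```python
import numpy as np

class BeliefMap:
    """Independent scalar Kalman filter per grid cell: noisy image observations
    of the fire state update a belief mean and variance for each observed cell."""

    def __init__(self, shape, prior_var=1.0, process_var=0.01, obs_var=0.25):
        self.mean = np.zeros(shape)          # belief that a cell is burning (0..1)
        self.var = np.full(shape, prior_var)
        self.process_var = process_var       # uncertainty added between frames
        self.obs_var = obs_var               # camera noise

    def predict(self):
        self.var += self.process_var         # the fire may have grown since last frame

    def update(self, observation, observed_mask):
        k = self.var / (self.var + self.obs_var)          # Kalman gain per cell
        innov = observation - self.mean
        self.mean = np.where(observed_mask, self.mean + k * innov, self.mean)
        self.var = np.where(observed_mask, (1.0 - k) * self.var, self.var)

# Toy usage: a noisy snapshot of a 50x50 field, with only the camera footprint observed.
rng = np.random.default_rng(0)
truth = (rng.random((50, 50)) < 0.1).astype(float)
belief = BeliefMap((50, 50))
footprint = np.zeros((50, 50), bool); footprint[10:30, 10:30] = True
belief.predict()
belief.update(truth + 0.3 * rng.normal(size=truth.shape), footprint)
```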
Pansori: ASR Corpus Generation from Open Online Video Contents
Title | Pansori: ASR Corpus Generation from Open Online Video Contents |
Authors | Yoona Choi, Bowon Lee |
Abstract | This paper introduces Pansori, a program used to create ASR (automatic speech recognition) corpora from online video contents. It utilizes a cloud-based speech API to easily create a corpus in different languages. Using this program, we semi-automatically generated the Pansori-TEDxKR dataset from Korean TED conference talks with community-transcribed subtitles. It is the first high-quality corpus for the Korean language freely available for independent research. Pansori is released as an open-source software and the generated corpus is released under a permissive public license for community use and participation. |
Tasks | Speech Recognition |
Published | 2018-12-23 |
URL | http://arxiv.org/abs/1812.09798v1 |
http://arxiv.org/pdf/1812.09798v1.pdf | |
PWC | https://paperswithcode.com/paper/pansori-asr-corpus-generation-from-open |
Repo | https://github.com/yc9701/pansori-tedxkr-corpus |
Framework | none |
Pelee: A Real-Time Object Detection System on Mobile Devices
Title | Pelee: A Real-Time Object Detection System on Mobile Devices |
Authors | Robert J. Wang, Xiang Li, Charles X. Ling |
Abstract | An increasing need of running Convolutional Neural Network (CNN) models on mobile devices with limited computing power and memory resource encourages studies on efficient model design. A number of efficient architectures have been proposed in recent years, for example, MobileNet, ShuffleNet, and MobileNetV2. However, all these models are heavily dependent on depthwise separable convolution which lacks efficient implementation in most deep learning frameworks. In this study, we propose an efficient architecture named PeleeNet, which is built with conventional convolution instead. On ImageNet ILSVRC 2012 dataset, our proposed PeleeNet achieves a higher accuracy and over 1.8 times faster speed than MobileNet and MobileNetV2 on NVIDIA TX2. Meanwhile, PeleeNet is only 66% of the model size of MobileNet. We then propose a real-time object detection system by combining PeleeNet with Single Shot MultiBox Detector (SSD) method and optimizing the architecture for fast speed. Our proposed detection system, named Pelee, achieves 76.4% mAP (mean average precision) on PASCAL VOC2007 and 22.4 mAP on MS COCO dataset at the speed of 23.6 FPS on iPhone 8 and 125 FPS on NVIDIA TX2. The result on COCO outperforms YOLOv2 in consideration of a higher precision, 13.6 times lower computational cost and 11.3 times smaller model size. |
Tasks | Object Detection, Real-Time Object Detection |
Published | 2018-04-18 |
URL | http://arxiv.org/abs/1804.06882v3 |
http://arxiv.org/pdf/1804.06882v3.pdf | |
PWC | https://paperswithcode.com/paper/pelee-a-real-time-object-detection-system-on |
Repo | https://github.com/koshian2/PeleeNet-Keras |
Framework | tf |
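PeleeNet's distinguishing choice is building its dense blocks from conventional convolutions rather than depthwise separable ones. The tf.keras sketch below shows a two-way dense layer in that spirit (one 3x3 branch and one stacked-3x3 branch, densely concatenated with the input); the exact filter widths and block counts are illustrative, not the published configuration.

```python
from tensorflow.keras import layers, Model

def conv_bn_relu(x, filters, kernel):
    # Conventional convolution (no depthwise separable ops), as in PeleeNet.
    x = layers.Conv2D(filters, kernel, padding="same", use_bias=False)(x)
    x = layers.BatchNormalization()(x)
    return layers.ReLU()(x)

def two_way_dense_layer(x, growth_rate=32):
    """Dense layer with two branches of different receptive fields."""
    half = growth_rate // 2
    b1 = conv_bn_relu(x, 2 * half, 1)
    b1 = conv_bn_relu(b1, half, 3)            # 3x3 branch
    b2 = conv_bn_relu(x, 2 * half, 1)
    b2 = conv_bn_relu(b2, half, 3)
    b2 = conv_bn_relu(b2, half, 3)            # stacked 3x3s ~ 5x5 receptive field
    return layers.Concatenate()([x, b1, b2])  # dense connectivity

inp = layers.Input((224, 224, 32))
out = two_way_dense_layer(two_way_dense_layer(inp))
model = Model(inp, out)
model.summary()
```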
Discriminator Rejection Sampling
Title | Discriminator Rejection Sampling |
Authors | Samaneh Azadi, Catherine Olsson, Trevor Darrell, Ian Goodfellow, Augustus Odena |
Abstract | We propose a rejection sampling scheme using the discriminator of a GAN to approximately correct errors in the GAN generator distribution. We show that under quite strict assumptions, this will allow us to recover the data distribution exactly. We then examine where those strict assumptions break down and design a practical algorithm - called Discriminator Rejection Sampling (DRS) - that can be used on real data-sets. Finally, we demonstrate the efficacy of DRS on a mixture of Gaussians and on the SAGAN model, state-of-the-art in the image generation task at the time of developing this work. On ImageNet, we train an improved baseline that increases the Inception Score from 52.52 to 62.36 and reduces the Frechet Inception Distance from 18.65 to 14.79. We then use DRS to further improve on this baseline, improving the Inception Score to 76.08 and the FID to 13.75. |
Tasks | Image Generation |
Published | 2018-10-16 |
URL | http://arxiv.org/abs/1810.06758v3 |
http://arxiv.org/pdf/1810.06758v3.pdf | |
PWC | https://paperswithcode.com/paper/discriminator-rejection-sampling |
Repo | https://github.com/vita-epfl/collaborative-gan-sampling |
Framework | tf |
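The DRS idea is to post-filter generator samples using the discriminator's logits, which (under the ideal-discriminator assumption) estimate the density ratio between data and generator. Below is a simplified numpy sketch of the acceptance rule on synthetic logits; the gamma shift and epsilon follow the spirit of the paper's practical algorithm, but the values and toy data are assumptions, and no GAN is trained here.

```python
import numpy as np

def drs_filter(logits, gamma=0.0, eps=1e-6, rng=None):
    """Discriminator Rejection Sampling over a batch of generated samples.

    logits are discriminator logits D(x); samples are kept with a probability
    that up-weights regions the generator under-samples. gamma shifts the
    overall acceptance rate. Returns a boolean keep-mask."""
    rng = rng or np.random.default_rng()
    d_max = logits.max()                                   # batch estimate of the max ratio
    f = logits - d_max - np.log(1.0 - np.exp(logits - d_max - eps)) - gamma
    p_accept = 1.0 / (1.0 + np.exp(-f))                    # sigmoid
    return rng.random(len(logits)) < p_accept

# Toy usage: DRS keeps higher-scoring samples more often.
rng = np.random.default_rng(0)
fake_logits = rng.normal(loc=-1.0, scale=1.0, size=1000)
keep = drs_filter(fake_logits, gamma=0.0, rng=rng)
print(keep.mean(), fake_logits[keep].mean(), fake_logits.mean())
```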
IAM at CLEF eHealth 2018: Concept Annotation and Coding in French Death Certificates
Title | IAM at CLEF eHealth 2018: Concept Annotation and Coding in French Death Certificates |
Authors | Sébastien Cossin, Vianney Jouhet, Fleur Mougin, Gayo Diallo, Frantz Thiessard |
Abstract | In this paper, we describe the approach and results for our participation in the task 1 (multilingual information extraction) of the CLEF eHealth 2018 challenge. We addressed the task of automatically assigning ICD-10 codes to French death certificates. We used a dictionary-based approach using materials provided by the task organizers. The terms of the ICD-10 terminology were normalized, tokenized and stored in a tree data structure. The Levenshtein distance was used to detect typos. Frequent abbreviations were detected by manually creating a small set of them. Our system achieved an F-score of 0.786 (precision: 0.794, recall: 0.779). These scores were substantially higher than the average score of the systems that participated in the challenge. |
Tasks | |
Published | 2018-07-10 |
URL | http://arxiv.org/abs/1807.03674v1 |
http://arxiv.org/pdf/1807.03674v1.pdf | |
PWC | https://paperswithcode.com/paper/iam-at-clef-ehealth-2018-concept-annotation |
Repo | https://github.com/scossin/IAMsystem |
Framework | none |
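The system is dictionary-based: normalized ICD-10 terms are matched against certificate text, with the Levenshtein distance used to tolerate typos. The sketch below shows that matching idea in miniature; it uses a flat dictionary instead of the paper's tree data structure, and the French terms and code mappings are illustrative examples, not the official terminology files.

```python
def levenshtein(a, b):
    # Classic dynamic-programming edit distance.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

# Illustrative term-to-code dictionary (not the official ICD-10 terminology).
DICTIONARY = {
    ("infarctus", "du", "myocarde"): "I21",
    ("insuffisance", "cardiaque"): "I50",
}

def normalize(text):
    return text.lower().replace(",", " ").split()

def match_tokens(cert_tok, dict_tok):
    # Exact match, or within edit distance 1 to absorb common typos.
    return cert_tok == dict_tok or levenshtein(cert_tok, dict_tok) <= 1

def code_line(line):
    tokens = normalize(line)
    codes = []
    for term, code in DICTIONARY.items():
        n = len(term)
        for i in range(len(tokens) - n + 1):
            if all(match_tokens(tokens[i + k], term[k]) for k in range(n)):
                codes.append(code)
                break
    return codes

print(code_line("Infarctus du mycarde"))   # missing letter still maps to I21
```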
Learning Rigidity in Dynamic Scenes with a Moving Camera for 3D Motion Field Estimation
Title | Learning Rigidity in Dynamic Scenes with a Moving Camera for 3D Motion Field Estimation |
Authors | Zhaoyang Lv, Kihwan Kim, Alejandro Troccoli, Deqing Sun, James M. Rehg, Jan Kautz |
Abstract | Estimation of 3D motion in a dynamic scene from a temporal pair of images is a core task in many scene understanding problems. In real world applications, a dynamic scene is commonly captured by a moving camera (i.e., panning, tilting or hand-held), increasing the task complexity because the scene is observed from different view points. The main challenge is the disambiguation of the camera motion from scene motion, which becomes more difficult as the amount of rigidity observed decreases, even with successful estimation of 2D image correspondences. Compared to other state-of-the-art 3D scene flow estimation methods, in this paper we propose to \emph{learn} the rigidity of a scene in a supervised manner from a large collection of dynamic scene data, and directly infer a rigidity mask from two sequential images with depths. With the learned network, we show how we can effectively estimate camera motion and projected scene flow using computed 2D optical flow and the inferred rigidity mask. For training and testing the rigidity network, we also provide a new semi-synthetic dynamic scene dataset (synthetic foreground objects with a real background) and an evaluation split that accounts for the percentage of observed non-rigid pixels. Through our evaluation we show the proposed framework outperforms current state-of-the-art scene flow estimation methods in challenging dynamic scenes. |
Tasks | Optical Flow Estimation, Scene Flow Estimation, Scene Understanding |
Published | 2018-04-12 |
URL | http://arxiv.org/abs/1804.04259v2 |
http://arxiv.org/pdf/1804.04259v2.pdf | |
PWC | https://paperswithcode.com/paper/learning-rigidity-in-dynamic-scenes-with-a |
Repo | https://github.com/NVlabs/learningrigidity |
Framework | pytorch |
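Around the learned rigidity network, the pipeline uses standard geometry: the flow induced by camera motion alone can be computed from depth and the camera transform, and subtracting it from the measured optical flow (wherever the rigidity mask says "non-rigid") leaves the projected scene flow of moving objects. The numpy sketch below shows only that geometric step with made-up intrinsics and a fronto-parallel toy scene; the rigidity network itself is not reproduced.

```python
import numpy as np

def ego_flow(depth, K, R, t):
    """2D flow induced purely by camera motion (R, t) for a rigid scene.

    (R, t) maps 3D points from the first to the second camera frame. Each
    pixel is back-projected with its depth, transformed, and re-projected;
    the difference to the original pixel grid is the camera-induced flow."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).T   # 3 x HW
    rays = np.linalg.inv(K) @ pix
    pts = rays * depth.reshape(1, -1)                                   # 3D points
    pts2 = R @ pts + t.reshape(3, 1)                                    # camera-motion transform
    proj = K @ pts2
    proj = proj[:2] / proj[2:]                                          # re-projected pixels
    return (proj - pix[:2]).T.reshape(h, w, 2)

# Toy usage: a fronto-parallel plane at 5 m seen under a sideways camera translation.
K = np.array([[100.0, 0, 32], [0, 100.0, 24], [0, 0, 1]])
depth = np.full((48, 64), 5.0)
R, t = np.eye(3), np.array([0.1, 0.0, 0.0])
flow_cam = ego_flow(depth, K, R, t)
# measured_flow - flow_cam should be ~0 where the rigidity mask says "rigid";
# elsewhere the residual is the projected scene flow of moving objects.
print(flow_cam[0, 0])   # roughly [2, 0] pixels of horizontal flow
```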
Efficient Sequence Labeling with Actor-Critic Training
Title | Efficient Sequence Labeling with Actor-Critic Training |
Authors | Saeed Najafi, Colin Cherry, Grzegorz Kondrak |
Abstract | Neural approaches to sequence labeling often use a Conditional Random Field (CRF) to model their output dependencies, while Recurrent Neural Networks (RNN) are used for the same purpose in other tasks. We set out to establish RNNs as an attractive alternative to CRFs for sequence labeling. To do so, we address one of the RNN’s most prominent shortcomings, the fact that it is not exposed to its own errors with the maximum-likelihood training. We frame the prediction of the output sequence as a sequential decision-making process, where we train the network with an adjusted actor-critic algorithm (AC-RNN). We comprehensively compare this strategy with maximum-likelihood training for both RNNs and CRFs on three structured-output tasks. The proposed AC-RNN efficiently matches the performance of the CRF on NER and CCG tagging, and outperforms it on Machine Transliteration. We also show that our training strategy is significantly better than other techniques for addressing RNN’s exposure bias, such as Scheduled Sampling, and Self-Critical policy training. |
Tasks | Decision Making, Transliteration |
Published | 2018-09-30 |
URL | http://arxiv.org/abs/1810.00428v1 |
http://arxiv.org/pdf/1810.00428v1.pdf | |
PWC | https://paperswithcode.com/paper/efficient-sequence-labeling-with-actor-critic |
Repo | https://github.com/SaeedNajafi/ac-tagger |
Framework | pytorch |
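The training idea is to let the tagger sample its own predictions and update with an advantage-weighted policy gradient, with a critic providing the baseline. The sketch below shows that loop in a deliberately tiny form: a non-recurrent linear softmax policy trained with REINFORCE plus a learned linear baseline on toy features. It illustrates only the sampled-prediction / advantage-update mechanism, not the paper's adjusted actor-critic RNN.

```python
import numpy as np

rng = np.random.default_rng(0)
n_feats, n_tags, T = 8, 4, 6
W_pi = rng.normal(scale=0.1, size=(n_feats, n_tags))   # actor (policy) weights
w_v = np.zeros(n_feats)                                 # critic (baseline) weights
x = rng.normal(size=(T, n_feats))                       # toy token features
gold = rng.integers(0, n_tags, size=T)                  # toy gold tags

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

lr = 0.05
for step in range(200):
    grad_pi = np.zeros_like(W_pi)
    grad_v = np.zeros_like(w_v)
    for t in range(T):
        p = softmax(x[t] @ W_pi)           # action distribution over tags
        a = rng.choice(n_tags, p=p)        # sample a tag: the model sees its own choices
        r = 1.0 if a == gold[t] else 0.0   # token-level reward
        v = x[t] @ w_v                     # critic baseline
        adv = r - v
        onehot = np.zeros(n_tags); onehot[a] = 1.0
        grad_pi += np.outer(x[t], onehot - p) * adv   # d log p(a|x)/dW, advantage-weighted
        grad_v += x[t] * (r - v)                      # delta rule: move critic toward reward
    W_pi += lr * grad_pi / T
    w_v += lr * grad_v / T
```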