Paper Group AWR 27
Deep Dynamical Modeling and Control of Unsteady Fluid Flows
Title | Deep Dynamical Modeling and Control of Unsteady Fluid Flows |
Authors | Jeremy Morton, Freddie D. Witherden, Antony Jameson, Mykel J. Kochenderfer |
Abstract | The design of flow control systems remains a challenge due to the nonlinear nature of the equations that govern fluid flow. However, recent advances in computational fluid dynamics (CFD) have enabled the simulation of complex fluid flows with high accuracy, opening the possibility of using learning-based approaches to facilitate controller design. We present a method for learning the forced and unforced dynamics of airflow over a cylinder directly from CFD data. The proposed approach, grounded in Koopman theory, is shown to produce stable dynamical models that can predict the time evolution of the cylinder system over extended time horizons. Finally, by performing model predictive control with the learned dynamical models, we are able to find a straightforward, interpretable control law for suppressing vortex shedding in the wake of the cylinder. |
Tasks | |
Published | 2018-05-18 |
URL | http://arxiv.org/abs/1805.07472v2 |
http://arxiv.org/pdf/1805.07472v2.pdf | |
PWC | https://paperswithcode.com/paper/deep-dynamical-modeling-and-control-of |
Repo | https://github.com/sisl/deep_flow_control |
Framework | tf |
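The paper learns forced and unforced dynamics with a deep network grounded in Koopman theory and then applies model predictive control. As a rough, assumption-laden illustration of the underlying idea only, the sketch below fits a linear operator to snapshot pairs with a plain DMD-style least-squares fit on synthetic data and rolls it forward; it is not the authors' architecture or their CFD data.

```python
import numpy as np

# Toy Koopman-style linear surrogate: fit A so that x_{t+1} ~ A x_t from
# snapshot pairs, then roll the fitted operator forward over a longer horizon.
rng = np.random.default_rng(0)
n, T = 16, 200
A_true = np.linalg.qr(rng.normal(size=(n, n)))[0] * 0.99   # stable toy dynamics
X = np.zeros((n, T))
X[:, 0] = rng.normal(size=n)
for t in range(T - 1):
    X[:, t + 1] = A_true @ X[:, t] + 1e-3 * rng.normal(size=n)

X0, X1 = X[:, :-1], X[:, 1:]            # snapshot pairs (x_t, x_{t+1})
A_fit = X1 @ np.linalg.pinv(X0)         # least-squares linear operator

# Multi-step prediction from the last training state
x = X[:, -1].copy()
preds = []
for _ in range(50):
    x = A_fit @ x
    preds.append(x.copy())
print("spectral radius of fitted operator:", max(abs(np.linalg.eigvals(A_fit))))
```

A spectral radius below one is the kind of stability property the paper emphasizes; with a stable learned model, the multi-step rollout above is what an MPC loop would optimize over.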
Real Time Dense Depth Estimation by Fusing Stereo with Sparse Depth Measurements
Title | Real Time Dense Depth Estimation by Fusing Stereo with Sparse Depth Measurements |
Authors | Shreyas S. Shivakumar, Kartik Mohta, Bernd Pfrommer, Vijay Kumar, Camillo J. Taylor |
Abstract | We present an approach to depth estimation that fuses information from a stereo pair with sparse range measurements derived from a LIDAR sensor or a range camera. The goal of this work is to exploit the complementary strengths of the two sensor modalities, the accurate but sparse range measurements and the ambiguous but dense stereo information. These two sources are effectively and efficiently fused by combining ideas from anisotropic diffusion and semi-global matching. We evaluate our approach on the KITTI 2015 and Middlebury 2014 datasets, using randomly sampled ground truth range measurements as our sparse depth input. We achieve significant performance improvements with a small fraction of range measurements on both datasets. We also provide qualitative results from our platform using the PMDTec Monstar sensor. Our entire pipeline runs on an NVIDIA TX-2 platform at 5Hz on 1280x1024 stereo images with 128 disparity levels. |
Tasks | Depth Estimation |
Published | 2018-09-20 |
URL | http://arxiv.org/abs/1809.07677v1 |
http://arxiv.org/pdf/1809.07677v1.pdf | |
PWC | https://paperswithcode.com/paper/real-time-dense-depth-estimation-by-fusing |
Repo | https://github.com/ShreyasSkandanS/stereo_sparse_depth_fusion |
Framework | none |
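The method combines ideas from anisotropic diffusion and semi-global matching to fuse sparse range measurements with stereo. The snippet below is a minimal sketch of only the first idea, diffusing sparse depth with image-edge-aware weights; the weighting scheme, iteration count, and toy data are assumptions, and the full fused SGM pipeline is not reproduced here.

```python
import numpy as np

def diffuse_sparse_depth(image, sparse_depth, valid, iters=200, beta=10.0):
    """Fill missing depth by anisotropic-diffusion-style averaging.

    image: (H, W) grayscale guide; sparse_depth holds values only where
    valid is True. Neighbours across strong image edges get small weights,
    so depth does not bleed over object boundaries. Periodic boundaries
    from np.roll are ignored for simplicity."""
    depth = np.where(valid, sparse_depth, sparse_depth[valid].mean())
    shifts = [(1, 0), (-1, 0), (0, 1), (0, -1)]
    for _ in range(iters):
        num = np.zeros_like(depth)
        den = np.zeros_like(depth)
        for dy, dx in shifts:
            d_n = np.roll(depth, (dy, dx), axis=(0, 1))
            i_n = np.roll(image, (dy, dx), axis=(0, 1))
            w = np.exp(-beta * np.abs(image - i_n))   # small weight across edges
            num += w * d_n
            den += w
        depth = num / np.maximum(den, 1e-8)
        depth[valid] = sparse_depth[valid]            # clamp measured pixels
    return depth

# Toy usage with random data standing in for an image and LIDAR samples.
rng = np.random.default_rng(0)
img = rng.random((64, 64))
gt = np.tile(np.linspace(1.0, 5.0, 64), (64, 1))
mask = rng.random((64, 64)) < 0.05                    # ~5% sparse measurements
dense = diffuse_sparse_depth(img, np.where(mask, gt, 0.0), mask)
```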
Re-evaluating Continual Learning Scenarios: A Categorization and Case for Strong Baselines
Title | Re-evaluating Continual Learning Scenarios: A Categorization and Case for Strong Baselines |
Authors | Yen-Chang Hsu, Yen-Cheng Liu, Anita Ramasamy, Zsolt Kira |
Abstract | Continual learning has received a great deal of attention recently with several approaches being proposed. However, evaluations involve a diverse set of scenarios making meaningful comparison difficult. This work provides a systematic categorization of the scenarios and evaluates them within a consistent framework including strong baselines and state-of-the-art methods. The results provide an understanding of the relative difficulty of the scenarios and show that simple baselines (Adagrad, L2 regularization, and naive rehearsal strategies) can surprisingly achieve performance similar to current mainstream methods. We conclude with several suggestions for creating harder evaluation scenarios and future research directions. The code is available at https://github.com/GT-RIPL/Continual-Learning-Benchmark |
Tasks | Continual Learning, L2 Regularization |
Published | 2018-10-30 |
URL | http://arxiv.org/abs/1810.12488v4 |
http://arxiv.org/pdf/1810.12488v4.pdf | |
PWC | https://paperswithcode.com/paper/re-evaluating-continual-learning-scenarios-a |
Repo | https://github.com/GT-RIPL/Continual-Learning-Benchmark |
Framework | pytorch |
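One of the strong baselines highlighted in the abstract is naive rehearsal: keep a small memory of past examples and replay them alongside the current task's data. Below is a minimal, framework-free sketch of that idea; the reservoir capacity and mixing ratio are illustrative choices, not the paper's exact settings.

```python
import random

class RehearsalBuffer:
    """Naive rehearsal baseline: keep a bounded reservoir of past examples
    and mix them into every batch of the current task."""

    def __init__(self, capacity=1000, seed=0):
        self.capacity = capacity
        self.buffer = []
        self.seen = 0
        self.rng = random.Random(seed)

    def add(self, example):
        # Reservoir sampling keeps a uniform sample over everything seen so far.
        self.seen += 1
        if len(self.buffer) < self.capacity:
            self.buffer.append(example)
        else:
            j = self.rng.randrange(self.seen)
            if j < self.capacity:
                self.buffer[j] = example

    def mixed_batch(self, current_batch, replay_size):
        replay = self.rng.sample(self.buffer, min(replay_size, len(self.buffer)))
        return list(current_batch) + replay

# Usage: while training on task t, call buffer.add(x) on each example and train
# on buffer.mixed_batch(batch, replay_size=len(batch)) instead of the raw batch.
```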
Synthetically Trained Icon Proposals for Parsing and Summarizing Infographics
Title | Synthetically Trained Icon Proposals for Parsing and Summarizing Infographics |
Authors | Spandan Madan, Zoya Bylinskii, Matthew Tancik, Adrià Recasens, Kimberli Zhong, Sami Alsheikh, Hanspeter Pfister, Aude Oliva, Fredo Durand |
Abstract | Widely used in news, business, and educational media, infographics are handcrafted to effectively communicate messages about complex and often abstract topics including ‘ways to conserve the environment’ and ‘understanding the financial crisis’. Composed of stylistically and semantically diverse visual and textual elements, infographics pose new challenges for computer vision. While automatic text extraction works well on infographics, computer vision approaches trained on natural images fail to identify the stand-alone visual elements in infographics, or ‘icons’. To bridge this representation gap, we propose a synthetic data generation strategy: we augment background patches in infographics from our Visually29K dataset with Internet-scraped icons which we use as training data for an icon proposal mechanism. On a test set of 1K annotated infographics, icons are located with 38% precision and 34% recall (the best model trained with natural images achieves 14% precision and 7% recall). Combining our icon proposals with icon classification and text extraction, we present a multi-modal summarization application. Our application takes an infographic as input and automatically produces text tags and visual hashtags that are textually and visually representative of the infographic’s topics respectively. |
Tasks | Synthetic Data Generation |
Published | 2018-07-27 |
URL | http://arxiv.org/abs/1807.10441v1 |
http://arxiv.org/pdf/1807.10441v1.pdf | |
PWC | https://paperswithcode.com/paper/synthetically-trained-icon-proposals-for |
Repo | https://github.com/cvzoya/visuallydata |
Framework | none |
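The core trick is the synthetic data generation step: pasting Internet-scraped icons onto infographic background patches and recording where they were placed, so an icon proposal model can be trained without manual boxes. The sketch below shows that recipe with Pillow; the file paths, scale range, and placement policy are hypothetical stand-ins, not the paper's exact pipeline.

```python
import random
from PIL import Image

def paste_icon(background_path, icon_path, out_path):
    """Paste one scraped icon onto a background patch and return its box.

    Illustrative synthetic-data step: the returned (x0, y0, x1, y1) box is
    the free ground truth used to train an icon proposal mechanism."""
    bg = Image.open(background_path).convert("RGB")
    icon = Image.open(icon_path).convert("RGBA")
    # Random scale and position, kept fully inside the background patch.
    s = random.uniform(0.1, 0.3)
    w, h = int(bg.width * s), int(bg.height * s)
    icon = icon.resize((max(w, 1), max(h, 1)))
    x = random.randint(0, bg.width - icon.width)
    y = random.randint(0, bg.height - icon.height)
    bg.paste(icon, (x, y), icon)            # alpha channel acts as the paste mask
    bg.save(out_path)
    return (x, y, x + icon.width, y + icon.height)
```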
A Hierarchical Multi-task Approach for Learning Embeddings from Semantic Tasks
Title | A Hierarchical Multi-task Approach for Learning Embeddings from Semantic Tasks |
Authors | Victor Sanh, Thomas Wolf, Sebastian Ruder |
Abstract | Much effort has been devoted to evaluating whether multi-task learning can be leveraged to learn rich representations that can be used in various Natural Language Processing (NLP) downstream applications. However, there is still a lack of understanding of the settings in which multi-task learning has a significant effect. In this work, we introduce a hierarchical model trained in a multi-task learning setup on a set of carefully selected semantic tasks. The model is trained in a hierarchical fashion to introduce an inductive bias by supervising a set of low-level tasks at the bottom layers of the model and more complex tasks at the top layers of the model. This model achieves state-of-the-art results on a number of tasks, namely Named Entity Recognition, Entity Mention Detection and Relation Extraction without hand-engineered features or external NLP tools like syntactic parsers. The hierarchical training supervision induces a set of shared semantic representations at lower layers of the model. We show that as we move from the bottom to the top layers of the model, the hidden states of the layers tend to represent more complex semantic information. |
Tasks | Multi-Task Learning, Named Entity Recognition, Relation Extraction |
Published | 2018-11-14 |
URL | http://arxiv.org/abs/1811.06031v2 |
http://arxiv.org/pdf/1811.06031v2.pdf | |
PWC | https://paperswithcode.com/paper/a-hierarchical-multi-task-approach-for |
Repo | https://github.com/huggingface/hmtl |
Framework | pytorch |
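The key structural idea is supervising simpler tasks at the bottom layers and more complex tasks at the top. Below is a minimal PyTorch sketch of that wiring with one low-level and one high-level head; layer sizes, tag sets, and the two-task setup are illustrative assumptions, not the HMTL model itself.

```python
import torch
import torch.nn as nn

class HierarchicalTagger(nn.Module):
    """Hierarchical multi-task sketch: a low-level head (e.g. NER-like tags)
    reads the bottom encoder layer, a higher-level head reads the top layer."""

    def __init__(self, vocab=1000, emb=64, hidden=64, low_tags=9, high_tags=5):
        super().__init__()
        self.embed = nn.Embedding(vocab, emb)
        self.lower = nn.LSTM(emb, hidden, batch_first=True, bidirectional=True)
        self.upper = nn.LSTM(2 * hidden, hidden, batch_first=True, bidirectional=True)
        self.low_head = nn.Linear(2 * hidden, low_tags)    # supervised at the bottom
        self.high_head = nn.Linear(2 * hidden, high_tags)  # supervised at the top

    def forward(self, tokens):
        x = self.embed(tokens)
        h_low, _ = self.lower(x)
        h_high, _ = self.upper(h_low)
        return self.low_head(h_low), self.high_head(h_high)

# Toy forward pass: a batch of 2 sentences with 7 tokens each.
model = HierarchicalTagger()
tokens = torch.randint(0, 1000, (2, 7))
low_logits, high_logits = model(tokens)
# Training would sum one loss per head, each computed on its own task's labels.
```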
Visual Robot Task Planning
Title | Visual Robot Task Planning |
Authors | Chris Paxton, Yotam Barnoy, Kapil Katyal, Raman Arora, Gregory D. Hager |
Abstract | Prospection, the act of predicting the consequences of many possible futures, is intrinsic to human planning and action, and may even be at the root of consciousness. Surprisingly, this idea has been explored comparatively little in robotics. In this work, we propose a neural network architecture and associated planning algorithm that (1) learns a representation of the world useful for generating prospective futures after the application of high-level actions, (2) uses this generative model to simulate the result of sequences of high-level actions in a variety of environments, and (3) uses this same representation to evaluate these actions and perform tree search to find a sequence of high-level actions in a new environment. Models are trained via imitation learning on a variety of domains, including navigation, pick-and-place, and a surgical robotics task. Our approach allows us to visualize intermediate motion goals and learn to plan complex activity from visual information. |
Tasks | Imitation Learning, Robot Task Planning |
Published | 2018-03-30 |
URL | http://arxiv.org/abs/1804.00062v1 |
http://arxiv.org/pdf/1804.00062v1.pdf | |
PWC | https://paperswithcode.com/paper/visual-robot-task-planning |
Repo | https://github.com/jhu-lcsr/costar_plan |
Framework | tf |
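At planning time, the learned generative model is used to imagine the outcome of sequences of high-level actions and a learned evaluation picks the best one. The sketch below shows that prospection-and-search loop in the simplest possible form, an exhaustive depth-limited search; `transition` and `score` are hypothetical placeholders for the paper's learned forward model and state evaluation, and the 1-D toy world is only for illustration.

```python
import itertools

def plan(state, actions, transition, score, depth=3):
    """Depth-limited search over imagined futures.

    transition(state, action) -> predicted next state (stand-in for the
    learned generative model); score(state) -> learned value estimate.
    Returns the best action sequence found and its score."""
    best_seq, best_val = None, float("-inf")
    for seq in itertools.product(actions, repeat=depth):
        s = state
        for a in seq:
            s = transition(s, a)              # imagined ("prospective") rollout
        v = score(s)
        if v > best_val:
            best_seq, best_val = seq, v
    return best_seq, best_val

# Toy usage: a 1-D world where actions shift the state and the goal is position 4.
actions = [-1, 0, +1]
best = plan(0, actions, transition=lambda s, a: s + a, score=lambda s: -abs(s - 4))
print(best)   # ((1, 1, 1), -1)
```

The paper uses tree search rather than brute-force enumeration, but the interface is the same: a learned model to roll forward and a learned score to compare the imagined end states.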
Massively Multilingual Sentence Embeddings for Zero-Shot Cross-Lingual Transfer and Beyond
Title | Massively Multilingual Sentence Embeddings for Zero-Shot Cross-Lingual Transfer and Beyond |
Authors | Mikel Artetxe, Holger Schwenk |
Abstract | We introduce an architecture to learn joint multilingual sentence representations for 93 languages, belonging to more than 30 different families and written in 28 different scripts. Our system uses a single BiLSTM encoder with a shared BPE vocabulary for all languages, which is coupled with an auxiliary decoder and trained on publicly available parallel corpora. This enables us to learn a classifier on top of the resulting embeddings using English annotated data only, and transfer it to any of the 93 languages without any modification. Our experiments in cross-lingual natural language inference (XNLI dataset), cross-lingual document classification (MLDoc dataset) and parallel corpus mining (BUCC dataset) show the effectiveness of our approach. We also introduce a new test set of aligned sentences in 112 languages, and show that our sentence embeddings obtain strong results in multilingual similarity search even for low-resource languages. Our implementation, the pre-trained encoder and the multilingual test set are available at https://github.com/facebookresearch/LASER |
Tasks | Cross-Lingual Bitext Mining, Cross-Lingual Document Classification, Cross-Lingual Natural Language Inference, Cross-Lingual Transfer, Document Classification, Joint Multilingual Sentence Representations, Natural Language Inference, Parallel Corpus Mining, Sentence Embeddings |
Published | 2018-12-26 |
URL | https://arxiv.org/abs/1812.10464v2 |
https://arxiv.org/pdf/1812.10464v2.pdf | |
PWC | https://paperswithcode.com/paper/massively-multilingual-sentence-embeddings |
Repo | https://github.com/LawrenceDuan/myLASER |
Framework | pytorch |
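The zero-shot transfer recipe in the abstract is: freeze the multilingual encoder, train a classifier on English embeddings only, then apply it unchanged to other languages. The sketch below shows that recipe end to end; the `encode` function is a random-projection placeholder standing in for the paper's BiLSTM encoder (LASER), and the tiny sentiment data is purely illustrative, so the non-English prediction is only meaningful with a real multilingual encoder.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
proj = rng.normal(size=(300, 64))

def encode(sentences):
    # Placeholder encoder: hash words into a bag-of-words vector, then project.
    vecs = []
    for s in sentences:
        bow = np.zeros(300)
        for w in s.lower().split():
            bow[hash(w) % 300] += 1.0
        vecs.append(bow @ proj)
    return np.array(vecs)

# Train on English annotated data only.
en_sents = ["great movie", "terrible plot", "loved it", "awful acting"]
en_labels = [1, 0, 1, 0]
clf = LogisticRegression().fit(encode(en_sents), en_labels)

# At test time the very same classifier scores embeddings of non-English text;
# with a shared multilingual encoder, no per-language training is needed.
print(clf.predict(encode(["una película estupenda"])))
```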
Learning Converged Propagations with Deep Prior Ensemble for Image Enhancement
Title | Learning Converged Propagations with Deep Prior Ensemble for Image Enhancement |
Authors | Risheng Liu, Long Ma, Yiyang Wang, Lei Zhang |
Abstract | Enhancing the visual quality of images plays a very important role in various vision and learning applications. In the past few years, both knowledge-driven maximum a posteriori (MAP) approaches with prior modeling and fully data-dependent convolutional neural network (CNN) techniques have been investigated to address specific enhancement tasks. In this paper, by exploiting the advantages of these two types of mechanisms within a complementary propagation perspective, we propose a unified framework, named deep prior ensemble (DPE), for solving various image enhancement tasks. Specifically, we first establish the basic propagation scheme based on the fundamental image modeling cues and then introduce residual CNNs to help predict the propagation direction at each stage. By designing prior projections to perform feedback control, we theoretically prove that even with experience-inspired CNNs, DPE is guaranteed to converge and the output will always satisfy our fundamental task constraints. The main advantage over conventional optimization-based MAP approaches is that our descent directions are learned from collected training data, and are thus much more robust to unwanted local minima. Meanwhile, compared with existing CNN-type networks, which are often designed in heuristic manners without theoretical guarantees, DPE is able to gain advantages from rich task cues investigated on the basis of domain knowledge. Therefore, DPE actually provides a generic ensemble methodology to integrate both knowledge- and data-based cues for different image enhancement tasks. More importantly, our theoretical investigations verify that the feedforward propagations of DPE are properly controlled toward our desired solution. Experimental results demonstrate that the proposed DPE outperforms the state-of-the-art on a variety of image enhancement tasks in terms of both quantitative measures and visual perception quality. |
Tasks | Image Enhancement |
Published | 2018-10-09 |
URL | http://arxiv.org/abs/1810.04012v1 |
http://arxiv.org/pdf/1810.04012v1.pdf | |
PWC | https://paperswithcode.com/paper/learning-converged-propagations-with-deep |
Repo | https://github.com/dlut-dimt/DPE-Deep-Prior-Ensemble |
Framework | none |
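The propagation structure described in the abstract (a model-based update, a learned residual correction, and a prior projection acting as feedback control) can be sketched schematically as below. This is a generic, simplified loop under stated assumptions: the data term is a plain least-squares fidelity, the "learned" residual is replaced by a local-mean smoother, and the projection is just clipping to the valid intensity range; none of this is the paper's actual DPE instantiation.

```python
import numpy as np

def propagate(y, learned_residual, steps=20, lam=0.1, step_size=0.5):
    """Schematic MAP-style propagation with a plug-in learned correction.

    Each stage takes a gradient step on the data term ||x - y||^2, adds a
    correction from learned_residual (standing in for a residual CNN), and
    projects back onto the constraint set (here: the [0, 1] intensity range)."""
    x = y.copy()
    for _ in range(steps):
        grad = x - y                            # gradient of the data term
        x = x - step_size * grad + lam * learned_residual(x)
        x = np.clip(x, 0.0, 1.0)                # prior projection / feedback control
    return x

# Toy usage: the "learned" residual is replaced by a small local-mean smoother.
def box_smooth_residual(x):
    pad = np.pad(x, 1, mode="edge")
    local_mean = sum(np.roll(np.roll(pad, dy, 0), dx, 1)[1:-1, 1:-1]
                     for dy in (-1, 0, 1) for dx in (-1, 0, 1)) / 9.0
    return local_mean - x

rng = np.random.default_rng(0)
noisy = np.clip(0.5 + 0.1 * rng.normal(size=(32, 32)), 0, 1)
out = propagate(noisy, box_smooth_residual)
```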
Image-based Guidance of Autonomous Aircraft for Wildfire Surveillance and Prediction
Title | Image-based Guidance of Autonomous Aircraft for Wildfire Surveillance and Prediction |
Authors | Kyle D. Julian, Mykel J. Kochenderfer |
Abstract | Small unmanned aircraft can help firefighters combat wildfires by providing real-time surveillance of the growing fires. However, guiding the aircraft autonomously given only wildfire images is a challenging problem. This work models noisy images obtained from on-board cameras and proposes two approaches to filtering the wildfire images. The first approach uses a simple Kalman filter to reduce noise and update a belief map in observed areas. The second approach uses a particle filter to predict wildfire growth and uses observations to estimate uncertainties relating to wildfire expansion. The belief maps are used to train a deep reinforcement learning controller, which learns a policy to navigate the aircraft to survey the wildfire while avoiding flight directly over the fire. Simulation results show that the proposed controllers precisely guide the aircraft and accurately estimate wildfire growth, and a study of observation noise demonstrates the robustness of the particle filter approach. |
Tasks | |
Published | 2018-10-04 |
URL | http://arxiv.org/abs/1810.02455v2 |
http://arxiv.org/pdf/1810.02455v2.pdf | |
PWC | https://paperswithcode.com/paper/image-based-guidance-of-autonomous-aircraft |
Repo | https://github.com/sisl/UAV_Wildfire_Monitoring |
Framework | none |
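The first filtering approach in the abstract maintains a belief map from noisy camera observations with a simple Kalman filter. A minimal sketch of that idea, with an independent scalar filter per grid cell and made-up noise parameters, is shown below; the particle-filter variant and the reinforcement learning controller are not covered here.

```python
import numpy as np

class BeliefMap:
    """Independent scalar Kalman filter per grid cell: noisy image observations
    of the fire state update a belief mean and variance for each observed cell."""

    def __init__(self, shape, prior_var=1.0, process_var=0.01, obs_var=0.25):
        self.mean = np.zeros(shape)          # belief that a cell is burning (0..1)
        self.var = np.full(shape, prior_var)
        self.process_var = process_var       # uncertainty added between frames
        self.obs_var = obs_var               # camera noise

    def predict(self):
        self.var += self.process_var         # the fire may have grown since last frame

    def update(self, observation, observed_mask):
        k = self.var / (self.var + self.obs_var)          # Kalman gain per cell
        innov = observation - self.mean
        self.mean = np.where(observed_mask, self.mean + k * innov, self.mean)
        self.var = np.where(observed_mask, (1.0 - k) * self.var, self.var)

# Toy usage: a noisy snapshot of a 50x50 field, with only the camera footprint observed.
rng = np.random.default_rng(0)
truth = (rng.random((50, 50)) < 0.1).astype(float)
belief = BeliefMap((50, 50))
footprint = np.zeros((50, 50), bool); footprint[10:30, 10:30] = True
belief.predict()
belief.update(truth + 0.3 * rng.normal(size=truth.shape), footprint)
```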
Pansori: ASR Corpus Generation from Open Online Video Contents
Title | Pansori: ASR Corpus Generation from Open Online Video Contents |
Authors | Yoona Choi, Bowon Lee |
Abstract | This paper introduces Pansori, a program used to create ASR (automatic speech recognition) corpora from online video contents. It utilizes a cloud-based speech API to easily create a corpus in different languages. Using this program, we semi-automatically generated the Pansori-TEDxKR dataset from Korean TED conference talks with community-transcribed subtitles. It is the first high-quality corpus for the Korean language freely available for independent research. Pansori is released as an open-source software and the generated corpus is released under a permissive public license for community use and participation. |
Tasks | Speech Recognition |
Published | 2018-12-23 |
URL | http://arxiv.org/abs/1812.09798v1 |
http://arxiv.org/pdf/1812.09798v1.pdf | |
PWC | https://paperswithcode.com/paper/pansori-asr-corpus-generation-from-open |
Repo | https://github.com/yc9701/pansori-tedxkr-corpus |
Framework | none |
Pelee: A Real-Time Object Detection System on Mobile Devices
Title | Pelee: A Real-Time Object Detection System on Mobile Devices |
Authors | Robert J. Wang, Xiang Li, Charles X. Ling |
Abstract | An increasing need of running Convolutional Neural Network (CNN) models on mobile devices with limited computing power and memory resource encourages studies on efficient model design. A number of efficient architectures have been proposed in recent years, for example, MobileNet, ShuffleNet, and MobileNetV2. However, all these models are heavily dependent on depthwise separable convolution which lacks efficient implementation in most deep learning frameworks. In this study, we propose an efficient architecture named PeleeNet, which is built with conventional convolution instead. On ImageNet ILSVRC 2012 dataset, our proposed PeleeNet achieves a higher accuracy and over 1.8 times faster speed than MobileNet and MobileNetV2 on NVIDIA TX2. Meanwhile, PeleeNet is only 66% of the model size of MobileNet. We then propose a real-time object detection system by combining PeleeNet with Single Shot MultiBox Detector (SSD) method and optimizing the architecture for fast speed. Our proposed detection system, named Pelee, achieves 76.4% mAP (mean average precision) on PASCAL VOC2007 and 22.4 mAP on MS COCO dataset at the speed of 23.6 FPS on iPhone 8 and 125 FPS on NVIDIA TX2. The result on COCO outperforms YOLOv2 in consideration of a higher precision, 13.6 times lower computational cost and 11.3 times smaller model size. |
Tasks | Object Detection, Real-Time Object Detection |
Published | 2018-04-18 |
URL | http://arxiv.org/abs/1804.06882v3 |
http://arxiv.org/pdf/1804.06882v3.pdf | |
PWC | https://paperswithcode.com/paper/pelee-a-real-time-object-detection-system-on |
Repo | https://github.com/koshian2/PeleeNet-Keras |
Framework | tf |
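PeleeNet's distinguishing choice is building its dense blocks from conventional convolutions rather than depthwise separable ones. The tf.keras sketch below shows a two-way dense layer in that spirit (one 3x3 branch and one stacked-3x3 branch, densely concatenated with the input); the exact filter widths and block counts are illustrative, not the published configuration.

```python
from tensorflow.keras import layers, Model

def conv_bn_relu(x, filters, kernel):
    # Conventional convolution (no depthwise separable ops), as in PeleeNet.
    x = layers.Conv2D(filters, kernel, padding="same", use_bias=False)(x)
    x = layers.BatchNormalization()(x)
    return layers.ReLU()(x)

def two_way_dense_layer(x, growth_rate=32):
    """Dense layer with two branches of different receptive fields."""
    half = growth_rate // 2
    b1 = conv_bn_relu(x, 2 * half, 1)
    b1 = conv_bn_relu(b1, half, 3)            # 3x3 branch
    b2 = conv_bn_relu(x, 2 * half, 1)
    b2 = conv_bn_relu(b2, half, 3)
    b2 = conv_bn_relu(b2, half, 3)            # stacked 3x3s ~ 5x5 receptive field
    return layers.Concatenate()([x, b1, b2])  # dense connectivity

inp = layers.Input((224, 224, 32))
out = two_way_dense_layer(two_way_dense_layer(inp))
model = Model(inp, out)
model.summary()
```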
Discriminator Rejection Sampling
Title | Discriminator Rejection Sampling |
Authors | Samaneh Azadi, Catherine Olsson, Trevor Darrell, Ian Goodfellow, Augustus Odena |
Abstract | We propose a rejection sampling scheme using the discriminator of a GAN to approximately correct errors in the GAN generator distribution. We show that under quite strict assumptions, this will allow us to recover the data distribution exactly. We then examine where those strict assumptions break down and design a practical algorithm - called Discriminator Rejection Sampling (DRS) - that can be used on real data-sets. Finally, we demonstrate the efficacy of DRS on a mixture of Gaussians and on the SAGAN model, state-of-the-art in the image generation task at the time of developing this work. On ImageNet, we train an improved baseline that increases the Inception Score from 52.52 to 62.36 and reduces the Frechet Inception Distance from 18.65 to 14.79. We then use DRS to further improve on this baseline, improving the Inception Score to 76.08 and the FID to 13.75. |
Tasks | Image Generation |
Published | 2018-10-16 |
URL | http://arxiv.org/abs/1810.06758v3 |
http://arxiv.org/pdf/1810.06758v3.pdf | |
PWC | https://paperswithcode.com/paper/discriminator-rejection-sampling |
Repo | https://github.com/vita-epfl/collaborative-gan-sampling |
Framework | tf |
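The DRS idea is to post-filter generator samples using the discriminator's logits, which (under the ideal-discriminator assumption) estimate the density ratio between data and generator. Below is a simplified numpy sketch of the acceptance rule on synthetic logits; the gamma shift and epsilon follow the spirit of the paper's practical algorithm, but the values and toy data are assumptions, and no GAN is trained here.

```python
import numpy as np

def drs_filter(logits, gamma=0.0, eps=1e-6, rng=None):
    """Discriminator Rejection Sampling over a batch of generated samples.

    logits are discriminator logits D(x); samples are kept with a probability
    that up-weights regions the generator under-samples. gamma shifts the
    overall acceptance rate. Returns a boolean keep-mask."""
    rng = rng or np.random.default_rng()
    d_max = logits.max()                                   # batch estimate of the max ratio
    f = logits - d_max - np.log(1.0 - np.exp(logits - d_max - eps)) - gamma
    p_accept = 1.0 / (1.0 + np.exp(-f))                    # sigmoid
    return rng.random(len(logits)) < p_accept

# Toy usage: DRS keeps higher-scoring samples more often.
rng = np.random.default_rng(0)
fake_logits = rng.normal(loc=-1.0, scale=1.0, size=1000)
keep = drs_filter(fake_logits, gamma=0.0, rng=rng)
print(keep.mean(), fake_logits[keep].mean(), fake_logits.mean())
```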
IAM at CLEF eHealth 2018: Concept Annotation and Coding in French Death Certificates
Title | IAM at CLEF eHealth 2018: Concept Annotation and Coding in French Death Certificates |
Authors | Sébastien Cossin, Vianney Jouhet, Fleur Mougin, Gayo Diallo, Frantz Thiessard |
Abstract | In this paper, we describe the approach and results for our participation in the task 1 (multilingual information extraction) of the CLEF eHealth 2018 challenge. We addressed the task of automatically assigning ICD-10 codes to French death certificates. We used a dictionary-based approach using materials provided by the task organizers. The terms of the ICD-10 terminology were normalized, tokenized and stored in a tree data structure. The Levenshtein distance was used to detect typos. Frequent abbreviations were detected by manually creating a small set of them. Our system achieved an F-score of 0.786 (precision: 0.794, recall: 0.779). These scores were substantially higher than the average score of the systems that participated in the challenge. |
Tasks | |
Published | 2018-07-10 |
URL | http://arxiv.org/abs/1807.03674v1 |
http://arxiv.org/pdf/1807.03674v1.pdf | |
PWC | https://paperswithcode.com/paper/iam-at-clef-ehealth-2018-concept-annotation |
Repo | https://github.com/scossin/IAMsystem |
Framework | none |
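The system is dictionary-based: normalized ICD-10 terms are matched against certificate text, with the Levenshtein distance used to tolerate typos. The sketch below shows that matching idea in miniature; it uses a flat dictionary instead of the paper's tree data structure, and the French terms and code mappings are illustrative examples, not the official terminology files.

```python
def levenshtein(a, b):
    # Classic dynamic-programming edit distance.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

# Illustrative term-to-code dictionary (not the official ICD-10 terminology).
DICTIONARY = {
    ("infarctus", "du", "myocarde"): "I21",
    ("insuffisance", "cardiaque"): "I50",
}

def normalize(text):
    return text.lower().replace(",", " ").split()

def match_tokens(cert_tok, dict_tok):
    # Exact match, or within edit distance 1 to absorb common typos.
    return cert_tok == dict_tok or levenshtein(cert_tok, dict_tok) <= 1

def code_line(line):
    tokens = normalize(line)
    codes = []
    for term, code in DICTIONARY.items():
        n = len(term)
        for i in range(len(tokens) - n + 1):
            if all(match_tokens(tokens[i + k], term[k]) for k in range(n)):
                codes.append(code)
                break
    return codes

print(code_line("Infarctus du mycarde"))   # missing letter still maps to I21
```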
Learning Rigidity in Dynamic Scenes with a Moving Camera for 3D Motion Field Estimation
Title | Learning Rigidity in Dynamic Scenes with a Moving Camera for 3D Motion Field Estimation |
Authors | Zhaoyang Lv, Kihwan Kim, Alejandro Troccoli, Deqing Sun, James M. Rehg, Jan Kautz |
Abstract | Estimation of 3D motion in a dynamic scene from a temporal pair of images is a core task in many scene understanding problems. In real world applications, a dynamic scene is commonly captured by a moving camera (i.e., panning, tilting or hand-held), increasing the task complexity because the scene is observed from different view points. The main challenge is the disambiguation of the camera motion from scene motion, which becomes more difficult as the amount of rigidity observed decreases, even with successful estimation of 2D image correspondences. Compared to other state-of-the-art 3D scene flow estimation methods, in this paper we propose to \emph{learn} the rigidity of a scene in a supervised manner from a large collection of dynamic scene data, and directly infer a rigidity mask from two sequential images with depths. With the learned network, we show how we can effectively estimate camera motion and projected scene flow using computed 2D optical flow and the inferred rigidity mask. For training and testing the rigidity network, we also provide a new semi-synthetic dynamic scene dataset (synthetic foreground objects with a real background) and an evaluation split that accounts for the percentage of observed non-rigid pixels. Through our evaluation we show the proposed framework outperforms current state-of-the-art scene flow estimation methods in challenging dynamic scenes. |
Tasks | Optical Flow Estimation, Scene Flow Estimation, Scene Understanding |
Published | 2018-04-12 |
URL | http://arxiv.org/abs/1804.04259v2 |
http://arxiv.org/pdf/1804.04259v2.pdf | |
PWC | https://paperswithcode.com/paper/learning-rigidity-in-dynamic-scenes-with-a |
Repo | https://github.com/NVlabs/learningrigidity |
Framework | pytorch |
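Around the learned rigidity network, the pipeline uses standard geometry: the flow induced by camera motion alone can be computed from depth and the camera transform, and subtracting it from the measured optical flow (wherever the rigidity mask says "non-rigid") leaves the projected scene flow of moving objects. The numpy sketch below shows only that geometric step with made-up intrinsics and a fronto-parallel toy scene; the rigidity network itself is not reproduced.

```python
import numpy as np

def ego_flow(depth, K, R, t):
    """2D flow induced purely by camera motion (R, t) for a rigid scene.

    (R, t) maps 3D points from the first to the second camera frame. Each
    pixel is back-projected with its depth, transformed, and re-projected;
    the difference to the original pixel grid is the camera-induced flow."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).T   # 3 x HW
    rays = np.linalg.inv(K) @ pix
    pts = rays * depth.reshape(1, -1)                                   # 3D points
    pts2 = R @ pts + t.reshape(3, 1)                                    # camera-motion transform
    proj = K @ pts2
    proj = proj[:2] / proj[2:]                                          # re-projected pixels
    return (proj - pix[:2]).T.reshape(h, w, 2)

# Toy usage: a fronto-parallel plane at 5 m seen under a sideways camera translation.
K = np.array([[100.0, 0, 32], [0, 100.0, 24], [0, 0, 1]])
depth = np.full((48, 64), 5.0)
R, t = np.eye(3), np.array([0.1, 0.0, 0.0])
flow_cam = ego_flow(depth, K, R, t)
# measured_flow - flow_cam should be ~0 where the rigidity mask says "rigid";
# elsewhere the residual is the projected scene flow of moving objects.
print(flow_cam[0, 0])   # roughly [2, 0] pixels of horizontal flow
```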
Efficient Sequence Labeling with Actor-Critic Training
Title | Efficient Sequence Labeling with Actor-Critic Training |
Authors | Saeed Najafi, Colin Cherry, Grzegorz Kondrak |
Abstract | Neural approaches to sequence labeling often use a Conditional Random Field (CRF) to model their output dependencies, while Recurrent Neural Networks (RNN) are used for the same purpose in other tasks. We set out to establish RNNs as an attractive alternative to CRFs for sequence labeling. To do so, we address one of the RNN’s most prominent shortcomings, the fact that it is not exposed to its own errors with the maximum-likelihood training. We frame the prediction of the output sequence as a sequential decision-making process, where we train the network with an adjusted actor-critic algorithm (AC-RNN). We comprehensively compare this strategy with maximum-likelihood training for both RNNs and CRFs on three structured-output tasks. The proposed AC-RNN efficiently matches the performance of the CRF on NER and CCG tagging, and outperforms it on Machine Transliteration. We also show that our training strategy is significantly better than other techniques for addressing RNN’s exposure bias, such as Scheduled Sampling, and Self-Critical policy training. |
Tasks | Decision Making, Transliteration |
Published | 2018-09-30 |
URL | http://arxiv.org/abs/1810.00428v1 |
http://arxiv.org/pdf/1810.00428v1.pdf | |
PWC | https://paperswithcode.com/paper/efficient-sequence-labeling-with-actor-critic |
Repo | https://github.com/SaeedNajafi/ac-tagger |
Framework | pytorch |
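The training idea is to let the tagger sample its own predictions and update with an advantage-weighted policy gradient, with a critic providing the baseline. The sketch below shows that loop in a deliberately tiny form: a non-recurrent linear softmax policy trained with REINFORCE plus a learned linear baseline on toy features. It illustrates only the sampled-prediction / advantage-update mechanism, not the paper's adjusted actor-critic RNN.

```python
import numpy as np

rng = np.random.default_rng(0)
n_feats, n_tags, T = 8, 4, 6
W_pi = rng.normal(scale=0.1, size=(n_feats, n_tags))   # actor (policy) weights
w_v = np.zeros(n_feats)                                 # critic (baseline) weights
x = rng.normal(size=(T, n_feats))                       # toy token features
gold = rng.integers(0, n_tags, size=T)                  # toy gold tags

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

lr = 0.05
for step in range(200):
    grad_pi = np.zeros_like(W_pi)
    grad_v = np.zeros_like(w_v)
    for t in range(T):
        p = softmax(x[t] @ W_pi)           # action distribution over tags
        a = rng.choice(n_tags, p=p)        # sample a tag: the model sees its own choices
        r = 1.0 if a == gold[t] else 0.0   # token-level reward
        v = x[t] @ w_v                     # critic baseline
        adv = r - v
        onehot = np.zeros(n_tags); onehot[a] = 1.0
        grad_pi += np.outer(x[t], onehot - p) * adv   # d log p(a|x)/dW, advantage-weighted
        grad_v += x[t] * (r - v)                      # delta rule: move critic toward reward
    W_pi += lr * grad_pi / T
    w_v += lr * grad_v / T
```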