Paper Group AWR 235
This group collects the following papers:

- Decoupling Localization and Classification in Single Shot Temporal Action Detection
- Dataset Culling: Towards Efficient Training Of Distillation-Based Domain Specific Models
- Reinforcement Learning for Market Making in a Multi-agent Dealer Market
- LEDNet: A Lightweight Encoder-Decoder Network for Real-Time Semantic Segmentation
- Learning Hierarchical Discourse-level Structure for Fake News Detection
- Complex Transformer: A Framework for Modeling Complex-Valued Sequence
- Robust Chinese Word Segmentation with Contextualized Word Representations
- Improving Adversarial Robustness via Promoting Ensemble Diversity
- Self-Supervised Correspondence in Visuomotor Policy Learning
- Image to Images Translation for Multi-Task Organ Segmentation and Bone Suppression in Chest X-Ray Radiography
- PadChest: A large chest x-ray image dataset with multi-label annotated reports
- Mapped Convolutions
- HPLFlowNet: Hierarchical Permutohedral Lattice FlowNet for Scene Flow Estimation on Large-scale Point Clouds
- Kernel computations from large-scale random features obtained by Optical Processing Units
- Generative Image Translation for Data Augmentation in Colorectal Histopathology Images
Decoupling Localization and Classification in Single Shot Temporal Action Detection
Title | Decoupling Localization and Classification in Single Shot Temporal Action Detection |
Authors | Yupan Huang, Qi Dai, Yutong Lu |
Abstract | Video temporal action detection aims to temporally localize and recognize actions in untrimmed videos. Existing one-stage approaches mostly focus on unifying the two subtasks, i.e., localization of action proposals and classification of each proposal, through a fully shared backbone. However, encapsulating all components of the two subtasks in one single network may restrict training by ignoring the specialized characteristics of each subtask. In this paper, we propose a novel Decoupled Single Shot temporal Action Detection (Decouple-SSAD) method to mitigate this problem by decoupling localization and classification in a one-stage scheme. In particular, two separate branches are designed in parallel so that each component owns its representations privately, for accurate localization or classification. Each branch produces a set of action anchor layers by applying deconvolution to the feature maps of the main stream. High-level semantic information from deeper layers is thus incorporated to enhance the feature representations. We conduct extensive experiments on the THUMOS14 dataset and demonstrate superior performance over state-of-the-art methods. Our code is available online. |
Tasks | Action Detection |
Published | 2019-04-16 |
URL | http://arxiv.org/abs/1904.07442v1 |
PDF | http://arxiv.org/pdf/1904.07442v1.pdf |
PWC | https://paperswithcode.com/paper/decoupling-localization-and-classification-in |
Repo | https://github.com/hypjudy/Decouple-SSAD |
Framework | tf |
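To make the decoupling concrete, below is a minimal PyTorch sketch of the idea as the abstract describes it (the official repo above is TensorFlow, and all layer sizes, anchor counts, and the fusion scheme here are illustrative assumptions, not the authors' configuration): a shared main stream of strided temporal convolutions, plus two private branches that deconvolve deeper feature maps before predicting classification scores and localization offsets separately.

```python
import torch
import torch.nn as nn

class DecoupledHead(nn.Module):
    def __init__(self, channels=256, num_classes=21, num_anchors=5):
        super().__init__()
        self.main = nn.Sequential(                      # shared main stream
            nn.Conv1d(channels, channels, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv1d(channels, channels, 3, stride=2, padding=1), nn.ReLU(),
        )
        # each branch upsamples deeper features via its own deconvolution
        self.cls_deconv = nn.ConvTranspose1d(channels, channels, 4, stride=2, padding=1)
        self.loc_deconv = nn.ConvTranspose1d(channels, channels, 4, stride=2, padding=1)
        self.cls_head = nn.Conv1d(channels, num_anchors * num_classes, 3, padding=1)
        self.loc_head = nn.Conv1d(channels, num_anchors * 2, 3, padding=1)  # center/width offsets

    def forward(self, feats):                           # feats: (B, C, T)
        mid = self.main[0:2](feats)                     # stride-2 feature map
        deep = self.main[2:4](mid)                      # stride-4 feature map
        cls_feat = torch.relu(self.cls_deconv(deep) + mid)  # private, semantics-enriched
        loc_feat = torch.relu(self.loc_deconv(deep) + mid)
        return self.cls_head(cls_feat), self.loc_head(loc_feat)

scores, offsets = DecoupledHead()(torch.randn(2, 256, 64))
print(scores.shape, offsets.shape)  # torch.Size([2, 105, 32]) torch.Size([2, 10, 32])
```

The point of the structure is that `cls_head` and `loc_head` never share the branch-specific features, so each subtask can specialize while still reading the same main stream.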
Dataset Culling: Towards Efficient Training Of Distillation-Based Domain Specific Models
Title | Dataset Culling: Towards Efficient Training Of Distillation-Based Domain Specific Models |
Authors | Kentaro Yoshioka, Edward Lee, Simon Wong, Mark Horowitz |
Abstract | Real-time CNN-based object detection models for applications like surveillance can achieve high accuracy but are computationally expensive. Recent works have shown 10 to 100x reduction in computation cost for inference by using domain-specific networks. However, prior works have focused on inference only. If the domain model requires frequent retraining, training costs can pose a significant bottleneck. To address this, we propose Dataset Culling: a pipeline to reduce the size of the dataset for training, based on the prediction difficulty. Images that are easy to classify are filtered out since they contribute little to improving the accuracy. The difficulty is measured using our proposed confidence loss metric with little computational overhead. Dataset Culling is extended to optimize the image resolution to further improve training and inference costs. We develop fixed-angle, long-duration video datasets across several domains, and we show that the dataset size can be culled by a factor of 300x to reduce the total training time by 47x with no accuracy loss or even with slight improvement. Codes are available: https://github.com/kentaroy47/DatasetCulling |
Tasks | Object Detection |
Published | 2019-02-01 |
URL | https://arxiv.org/abs/1902.00173v3 |
PDF | https://arxiv.org/pdf/1902.00173v3.pdf |
PWC | https://paperswithcode.com/paper/dataset-culling-towards-efficient-training-of |
Repo | https://github.com/kentaroy47/DatasetCulling |
Framework | pytorch |
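A hedged sketch of the pipeline's shape: score each image's prediction difficulty from its detection confidences, then keep only the hardest images for retraining. The scoring function below is a stand-in for the paper's confidence loss metric, and the uncertainty-band thresholds are invented for illustration.

```python
import numpy as np

def difficulty(confidences, lo=0.3, hi=0.7):
    """Score an image by its uncertain detections: boxes that are clearly
    accepted (near 1) or clearly rejected (near 0) contribute little."""
    c = np.asarray(confidences, dtype=float)
    if c.size == 0:
        return 0.0
    return float(np.sum((c > lo) & (c < hi)))

def cull(scores, keep=256):
    """Return indices of the `keep` hardest images for (re)training."""
    return np.argsort(scores)[::-1][:keep]

scores = np.array([difficulty(np.random.rand(np.random.randint(0, 20)))
                   for _ in range(5000)])
train_idx = cull(scores, keep=256)
print(len(train_idx), scores[train_idx[:5]])
```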
Reinforcement Learning for Market Making in a Multi-agent Dealer Market
Title | Reinforcement Learning for Market Making in a Multi-agent Dealer Market |
Authors | Sumitra Ganesh, Nelson Vadori, Mengda Xu, Hua Zheng, Prashant Reddy, Manuela Veloso |
Abstract | Market makers play an important role in providing liquidity to markets by continuously quoting prices at which they are willing to buy and sell, and by managing inventory risk. In this paper, we build a multi-agent simulation of a dealer market and demonstrate that it can be used to understand the behavior of a reinforcement learning (RL) based market maker agent. We use the simulator to train an RL-based market maker agent under different competitive scenarios, reward formulations and market price trends (drifts). We show that the reinforcement learning agent is able to learn about its competitor's pricing policy; it also learns to manage inventory by smartly selecting asymmetric prices on the buy and sell sides (skewing), and by maintaining a positive (or negative) inventory depending on whether the market price drift is positive (or negative). Finally, we propose and test reward formulations for creating risk-averse RL-based market maker agents. |
Tasks | |
Published | 2019-11-14 |
URL | https://arxiv.org/abs/1911.05892v1 |
PDF | https://arxiv.org/pdf/1911.05892v1.pdf |
PWC | https://paperswithcode.com/paper/reinforcement-learning-for-market-making-in-a |
Repo | https://github.com/denisewong1/ASX300 |
Framework | tf |
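The two learned behaviors the abstract highlights, inventory skewing and drift-dependent inventory, can be illustrated with a toy hand-coded policy (this is not the paper's RL agent or simulator; every number and the reward shape below are assumptions):

```python
import random

def quotes(mid, inventory, half_spread=0.05, skew=0.01):
    # long inventory -> shift both quotes down to attract buyers and shed risk
    bid = mid - half_spread - skew * inventory
    ask = mid + half_spread - skew * inventory
    return bid, ask

def reward(pnl_step, inventory, risk_aversion=0.1):
    # one risk-averse formulation: PnL capture minus an inventory penalty
    return pnl_step - risk_aversion * inventory ** 2

mid, inv, pnl = 100.0, 0, 0.0
for _ in range(1000):
    bid, ask = quotes(mid, inv)
    if random.random() < 0.5:        # a buyer lifts our ask
        inv -= 1; pnl += ask - mid
    else:                            # a seller hits our bid
        inv += 1; pnl += mid - bid
    mid += random.gauss(0.0, 0.02)   # market price drift/noise
print(round(pnl, 2), inv)
```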
LEDNet: A Lightweight Encoder-Decoder Network for Real-Time Semantic Segmentation
Title | LEDNet: A Lightweight Encoder-Decoder Network for Real-Time Semantic Segmentation |
Authors | Yu Wang, Quan Zhou, Jia Liu, Jian Xiong, Guangwei Gao, Xiaofu Wu, Longin Jan Latecki |
Abstract | LEDNet: A Lightweight Encoder-Decoder Network for Real-time Semantic Segmentation |
Tasks | Real-Time Semantic Segmentation, Semantic Segmentation |
Published | 2019-05-07 |
URL | https://arxiv.org/abs/1905.02423v3 |
PDF | https://arxiv.org/pdf/1905.02423v3.pdf |
PWC | https://paperswithcode.com/paper/lednet-a-lightweight-encoder-decoder-network |
Repo | https://github.com/EEEGUI/LEDNet-pytorch |
Framework | pytorch |
Learning Hierarchical Discourse-level Structure for Fake News Detection
Title | Learning Hierarchical Discourse-level Structure for Fake News Detection |
Authors | Hamid Karimi, Jiliang Tang |
Abstract | On the one hand, nowadays, fake news articles are easily propagated through various online media platforms and have become a grand threat to the trustworthiness of information. On the other hand, our understanding of the language of fake news is still minimal. Incorporating the hierarchical discourse-level structure of fake and real news articles is one crucial step toward a better understanding of how these articles are structured. Nevertheless, this has rarely been investigated in the fake news detection domain and faces tremendous challenges. First, existing methods for capturing discourse-level structure rely on annotated corpora, which are not available for fake news datasets. Second, how to extract useful information from such discovered structures is another challenge. To address these challenges, we propose the Hierarchical Discourse-level Structure for Fake news detection (HDSF) framework. HDSF learns and constructs a discourse-level structure for fake/real news articles in an automated and data-driven manner. Moreover, we identify insightful structure-related properties, which can explain the discovered structures and boost our understanding of fake news. The conducted experiments show the effectiveness of the proposed approach. Further structural analysis suggests that real and fake news present substantial differences in their hierarchical discourse-level structures. |
Tasks | Fake News Detection |
Published | 2019-02-27 |
URL | http://arxiv.org/abs/1903.07389v6 |
PDF | http://arxiv.org/pdf/1903.07389v6.pdf |
PWC | https://paperswithcode.com/paper/learning-hierarchical-discourse-level |
Repo | https://github.com/hamidkarimi/DHSF |
Framework | pytorch |
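A hedged sketch of the data-driven structure idea: given learned parent-attachment probabilities between sentences (random placeholders below, where HDSF would supply model outputs), build a discourse dependency tree and read off a structure-related property such as tree depth. The greedy attach-to-an-earlier-node rule is a simplification for illustration, not the paper's decoding procedure.

```python
import numpy as np

def build_tree(parent_probs):
    """parent_probs[i, j] ~ P(sentence j is the parent of sentence i);
    sentence 0 is treated as the discourse root."""
    n = parent_probs.shape[0]
    parents = [-1]                                           # root has no parent
    for i in range(1, n):
        parents.append(int(np.argmax(parent_probs[i, :i])))  # attach to an earlier node
    return parents

def node_depth(parents, i):
    d = 0
    while parents[i] != -1:
        i, d = parents[i], d + 1
    return d

probs = np.random.rand(8, 8)     # placeholder for HDSF's learned attachment matrix
parents = build_tree(probs)
tree_depth = max(node_depth(parents, i) for i in range(len(parents)))
print(parents, tree_depth)
```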
Complex Transformer: A Framework for Modeling Complex-Valued Sequence
Title | Complex Transformer: A Framework for Modeling Complex-Valued Sequence |
Authors | Muqiao Yang, Martin Q. Ma, Dongyu Li, Yao-Hung Hubert Tsai, Ruslan Salakhutdinov |
Abstract | While deep learning has received a surge of interest in a variety of fields in recent years, major deep learning models barely use complex numbers. However, speech, signal and audio data are naturally complex-valued after the Fourier transform, and studies have shown that complex-valued networks can learn potentially richer representations. In this paper, we propose the Complex Transformer, which incorporates the transformer model as a backbone for sequence modeling; we also develop attention and encoder-decoder modules that operate on complex-valued input. The model achieves state-of-the-art performance on the MusicNet dataset and an In-phase Quadrature (IQ) signal dataset. |
Tasks | |
Published | 2019-10-22 |
URL | https://arxiv.org/abs/1910.10202v1 |
PDF | https://arxiv.org/pdf/1910.10202v1.pdf |
PWC | https://paperswithcode.com/paper/complex-transformer-a-framework-for-modeling |
Repo | https://github.com/muqiaoy/dl_signal |
Framework | pytorch |
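The arithmetic such a model rests on can be shown compactly: complex matrix products expressed with real tensors, and one possible way to turn complex attention scores into real mixing weights. The paper's actual attention design may differ; this is only a sketch of the building block.

```python
import torch

def complex_matmul(a_re, a_im, b_re, b_im):
    # (A_re + i A_im)(B_re + i B_im) = (A_re B_re - A_im B_im) + i(A_re B_im + A_im B_re)
    return a_re @ b_re - a_im @ b_im, a_re @ b_im + a_im @ b_re

def complex_attention(q_re, q_im, k_re, k_im, v_re, v_im):
    s_re, s_im = complex_matmul(q_re, q_im,
                                k_re.transpose(-2, -1), k_im.transpose(-2, -1))
    # use the scores' modulus as real attention logits (one of several options)
    attn = torch.softmax(torch.sqrt(s_re**2 + s_im**2) / q_re.shape[-1]**0.5, dim=-1)
    return attn @ v_re, attn @ v_im

q = k = v = torch.randn(2, 16, 32)
out_re, out_im = complex_attention(q, q, k, k, v, v)
print(out_re.shape)  # torch.Size([2, 16, 32])
```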
Robust Chinese Word Segmentation with Contextualized Word Representations
Title | Robust Chinese Word Segmentation with Contextualized Word Representations |
Authors | Yung-Sung Chuang |
Abstract | In recent years, since neural-network-based methods were proposed, the accuracy of the Chinese word segmentation task has made great progress. However, when dealing with out-of-vocabulary words, there is still a large error rate. We used a simple bidirectional LSTM architecture and a large-scale pretrained language model to generate high-quality contextualized character representations, which successfully mitigated the widespread ambiguity of individual Chinese characters and hence effectively reduced the OOV error rate. State-of-the-art performance is achieved on many datasets. |
Tasks | Chinese Word Segmentation, Language Modelling |
Published | 2019-01-17 |
URL | http://arxiv.org/abs/1901.05816v1 |
PDF | http://arxiv.org/pdf/1901.05816v1.pdf |
PWC | https://paperswithcode.com/paper/robust-chinese-word-segmentation-with |
Repo | https://github.com/voidism/pywordseg |
Framework | pytorch |
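A minimal sketch of the described pipeline: contextualized character vectors (random stand-ins below for the pretrained language model's output) fed to a bidirectional LSTM that emits a segmentation tag per character. The four-tag B/M/E/S scheme is a common convention assumed here, not necessarily the paper's exact label set.

```python
import torch
import torch.nn as nn

class BiLSTMSegmenter(nn.Module):
    def __init__(self, char_dim=768, hidden=256, num_tags=4):  # B, M, E, S
        super().__init__()
        self.lstm = nn.LSTM(char_dim, hidden, batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hidden, num_tags)

    def forward(self, char_vecs):          # (B, seq_len, char_dim)
        h, _ = self.lstm(char_vecs)
        return self.out(h)                 # (B, seq_len, 4) tag logits

lm_vectors = torch.randn(1, 10, 768)       # stand-in for LM character features
tags = BiLSTMSegmenter()(lm_vectors).argmax(-1)
print(tags.shape)                          # torch.Size([1, 10])
```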
Improving Adversarial Robustness via Promoting Ensemble Diversity
Title | Improving Adversarial Robustness via Promoting Ensemble Diversity |
Authors | Tianyu Pang, Kun Xu, Chao Du, Ning Chen, Jun Zhu |
Abstract | Though deep neural networks have achieved significant progress on various tasks, often enhanced by model ensemble, existing high-performance models can be vulnerable to adversarial attacks. Many efforts have been devoted to enhancing the robustness of individual networks and then constructing a straightforward ensemble, e.g., by directly averaging the outputs, which ignores the interaction among networks. This paper presents a new method that explores the interaction among individual networks to improve robustness for ensemble models. Technically, we define a new notion of ensemble diversity in the adversarial setting as the diversity among non-maximal predictions of individual members, and present an adaptive diversity promoting (ADP) regularizer to encourage the diversity, which leads to globally better robustness for the ensemble by making adversarial examples difficult to transfer among individual members. Our method is computationally efficient and compatible with the defense methods acting on individual networks. Empirical results on various datasets verify that our method can improve adversarial robustness while maintaining state-of-the-art accuracy on normal examples. |
Tasks | |
Published | 2019-01-25 |
URL | https://arxiv.org/abs/1901.08846v3 |
PDF | https://arxiv.org/pdf/1901.08846v3.pdf |
PWC | https://paperswithcode.com/paper/improving-adversarial-robustness-via |
Repo | https://github.com/P2333/Adaptive-Diversity-Promoting |
Framework | tf |
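A hedged re-implementation sketch of the regularizer the abstract describes: reward entropy of the averaged ensemble prediction plus diversity, measured as the (log) squared volume spanned by each member's normalized non-maximal predictions. The exact normalizations and coefficients follow one reading of the paper and may differ from the official TensorFlow code above.

```python
import torch

def adp_regularizer(probs, labels, alpha=2.0, beta=0.5, eps=1e-12):
    """probs: (K, B, L) member class probabilities; labels: (B,) true classes."""
    K, B, L = probs.shape
    ens = probs.mean(0)                                    # averaged ensemble prediction
    entropy = -(ens * (ens + eps).log()).sum(-1)           # (B,)
    mask = torch.ones(B, L, dtype=torch.bool)
    mask[torch.arange(B), labels] = False                  # drop the true-class entry
    nonmax = probs.transpose(0, 1)[mask.unsqueeze(1).expand(B, K, L)].reshape(B, K, L - 1)
    nonmax = nonmax / (nonmax.norm(dim=-1, keepdim=True) + eps)
    gram = nonmax @ nonmax.transpose(1, 2)                 # (B, K, K) Gram of unit vectors
    log_ed = torch.logdet(gram + 1e-6 * torch.eye(K))      # log squared spanned volume
    return -(alpha * entropy + beta * log_ed).mean()       # add this to the CE loss

probs = torch.softmax(torch.randn(3, 8, 10), dim=-1)       # K=3 members, B=8, L=10
print(adp_regularizer(probs, torch.randint(0, 10, (8,))).item())
```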
Self-Supervised Correspondence in Visuomotor Policy Learning
Title | Self-Supervised Correspondence in Visuomotor Policy Learning |
Authors | Peter Florence, Lucas Manuelli, Russ Tedrake |
Abstract | In this paper we explore using self-supervised correspondence for improving the generalization performance and sample efficiency of visuomotor policy learning. Prior work has primarily used approaches such as autoencoding, pose-based losses, and end-to-end policy optimization in order to train the visual portion of visuomotor policies. We instead propose an approach using self-supervised dense visual correspondence training, and show this enables visuomotor policy learning with surprisingly high generalization performance with modest amounts of data: using imitation learning, we demonstrate extensive hardware validation on challenging manipulation tasks with as few as 50 demonstrations. Our learned policies can generalize across classes of objects, react to deformable object configurations, and manipulate textureless symmetrical objects in a variety of backgrounds, all with closed-loop, real-time vision-based policies. Simulated imitation learning experiments suggest that correspondence training offers sample complexity and generalization benefits compared to autoencoding and end-to-end training. |
Tasks | Imitation Learning |
Published | 2019-09-16 |
URL | https://arxiv.org/abs/1909.06933v1 |
PDF | https://arxiv.org/pdf/1909.06933v1.pdf |
PWC | https://paperswithcode.com/paper/self-supervised-correspondence-in-visuomotor |
Repo | https://github.com/peteflorence/visuomotor_correspondence |
Framework | pytorch |
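The self-supervised dense-correspondence pretraining can be sketched as a standard pixelwise contrastive loss, pulling descriptors of matching pixels together and pushing non-matches apart. The margin and squared-distance choices below are common defaults, assumed rather than taken from the paper.

```python
import torch

def correspondence_loss(desc_a, desc_b, matches_a, matches_b, non_matches_b, margin=0.5):
    """desc_*: (D, H, W) descriptor images; matches_*: (N, 2) pixel (row, col)."""
    da = desc_a[:, matches_a[:, 0], matches_a[:, 1]]       # (D, N) matched descriptors
    db = desc_b[:, matches_b[:, 0], matches_b[:, 1]]
    dn = desc_b[:, non_matches_b[:, 0], non_matches_b[:, 1]]
    match_loss = ((da - db) ** 2).sum(0).mean()            # pull matches together
    nonmatch_loss = torch.clamp(margin - (da - dn).norm(dim=0), min=0).pow(2).mean()
    return match_loss + nonmatch_loss

D, H, W, N = 3, 60, 80, 128
desc_a, desc_b = torch.randn(D, H, W), torch.randn(D, H, W)
pix = lambda: torch.stack([torch.randint(0, H, (N,)), torch.randint(0, W, (N,))], dim=1)
print(correspondence_loss(desc_a, desc_b, pix(), pix(), pix()).item())
```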
Image to Images Translation for Multi-Task Organ Segmentation and Bone Suppression in Chest X-Ray Radiography
Title | Image to Images Translation for Multi-Task Organ Segmentation and Bone Suppression in Chest X-Ray Radiography |
Authors | Mohammad Eslami, Solale Tabarestani, Shadi Albarqouni, Ehsan Adeli, Nassir Navab, Malek Adjouadi |
Abstract | Chest X-ray radiography is one of the earliest medical imaging technologies and remains one of the most widely used for the diagnosis, screening, and treatment follow-up of diseases related to the lungs and heart. The literature in this field reports many interesting studies dealing with the challenging tasks of bone suppression and organ segmentation, but performed separately, limiting any learning that comes with consolidating parameters that could optimize both processes. This study introduces, for the first time, a multitask deep learning model that simultaneously generates the bone-suppressed image and the organ-segmented image, enhancing the accuracy of both tasks, minimizing the number of model parameters, and optimizing processing time, all by exploiting the interplay between the network parameters to benefit the performance of both tasks. The architectural design of this model, which relies on a conditional generative adversarial network, shows how the well-established pix2pix (image-to-image) network is modified to fit the needs of multitasking and extended to the new image-to-images architecture. The source code of this multitask model is shared publicly on GitHub as a first attempt at providing a two-task pix2pix extension, a supervised/paired/aligned/registered image-to-images translation, which would be useful in many multitask applications. Dilated convolutions are also used to improve the results through a more effective receptive field. A comparison with state-of-the-art algorithms, an ablation study, and a demonstration video are provided to evaluate the efficacy and gauge the merits of the proposed approach. |
Tasks | Decision Making |
Published | 2019-06-24 |
URL | https://arxiv.org/abs/1906.10089v2 |
PDF | https://arxiv.org/pdf/1906.10089v2.pdf |
PWC | https://paperswithcode.com/paper/image-to-images-translation-for-multi-task |
Repo | https://github.com/mohaEs/image-to-images-translation |
Framework | tf |
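A sketch of the image-to-images extension as the abstract presents it: one shared encoder whose parameters serve both tasks, and two task-specific decoders for bone suppression and organ segmentation. Layer sizes are placeholders, not the paper's pix2pix configuration, and the adversarial discriminator is omitted.

```python
import torch
import torch.nn as nn

def conv(i, o): return nn.Sequential(nn.Conv2d(i, o, 4, 2, 1), nn.ReLU())
def deconv(i, o): return nn.Sequential(nn.ConvTranspose2d(i, o, 4, 2, 1), nn.ReLU())

class TwoHeadGenerator(nn.Module):
    def __init__(self):
        super().__init__()
        self.enc = nn.Sequential(conv(1, 64), conv(64, 128))   # shared encoder
        self.dec_suppress = nn.Sequential(deconv(128, 64),
                                          nn.ConvTranspose2d(64, 1, 4, 2, 1))
        self.dec_segment = nn.Sequential(deconv(128, 64),
                                         nn.ConvTranspose2d(64, 1, 4, 2, 1))

    def forward(self, x):                       # x: chest x-ray (B, 1, H, W)
        z = self.enc(x)                          # shared parameters serve both tasks
        return self.dec_suppress(z), self.dec_segment(z)

bone_free, lung_mask = TwoHeadGenerator()(torch.randn(1, 1, 256, 256))
print(bone_free.shape, lung_mask.shape)         # both (1, 1, 256, 256)
```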
PadChest: A large chest x-ray image dataset with multi-label annotated reports
Title | PadChest: A large chest x-ray image dataset with multi-label annotated reports |
Authors | Aurelia Bustos, Antonio Pertusa, Jose-Maria Salinas, Maria de la Iglesia-Vayá |
Abstract | We present a large-scale, high-resolution labeled chest x-ray dataset for the automated exploration of medical images along with their associated reports. This dataset includes more than 160,000 images obtained from 67,000 patients that were interpreted and reported by radiologists at Hospital San Juan (Spain) from 2009 to 2017, covering six different position views and additional information on image acquisition and patient demographics. The reports were labeled with 174 different radiographic findings, 19 differential diagnoses and 104 anatomic locations organized as a hierarchical taxonomy and mapped onto standard Unified Medical Language System (UMLS) terminology. Of these reports, 27% were manually annotated by trained physicians and the remaining set was labeled using a supervised method based on a recurrent neural network with attention mechanisms. The generated labels were then validated on an independent test set, achieving a 0.93 Micro-F1 score. To the best of our knowledge, this is one of the largest public chest x-ray databases suitable for training supervised models on radiographs, and the first to contain radiographic reports in Spanish. The PadChest dataset can be downloaded from http://bimcv.cipf.es/bimcv-projects/padchest/. |
Tasks | |
Published | 2019-01-22 |
URL | http://arxiv.org/abs/1901.07441v2 |
PDF | http://arxiv.org/pdf/1901.07441v2.pdf |
PWC | https://paperswithcode.com/paper/padchest-a-large-chest-x-ray-image-dataset |
Repo | https://github.com/auriml/Rx-thorax-automatic-captioning |
Framework | none |
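As a small, self-contained illustration of the validation metric quoted above, this computes a micro-averaged F1 by pooling all label decisions across the 174 findings before taking precision and recall (toy labels, not PadChest data):

```python
import numpy as np

def micro_f1(y_true, y_pred):
    # pool true/false positives and negatives over every (report, label) pair
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

y_true = np.random.randint(0, 2, size=(100, 174))   # 174 findings per report
y_pred = np.where(np.random.rand(100, 174) < 0.9, y_true, 1 - y_true)
print(round(micro_f1(y_true, y_pred), 3))
```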
Mapped Convolutions
Title | Mapped Convolutions |
Authors | Marc Eder, True Price, Thanh Vu, Akash Bapat, Jan-Michael Frahm |
Abstract | We present a versatile formulation of the convolution operation that we term a “mapped convolution.” The standard convolution operation implicitly samples the pixel grid and computes a weighted sum. Our mapped convolution decouples these two components, freeing the operation from the confines of the image grid and allowing the kernel to process any type of structured data. As a test case, we demonstrate its use by applying it to dense inference on spherical data. We perform an in-depth study of existing spherical image convolution methods and propose an improved sampling method for equirectangular images. Then, we discuss the impact of data discretization when deriving a sampling function, highlighting drawbacks of the cube map representation for spherical data. Finally, we illustrate how mapped convolutions enable us to convolve directly on a mesh by projecting the spherical image onto a geodesic grid and training on the textured mesh. This method exceeds the state of the art for spherical depth estimation by nearly 17%. Our findings suggest that mapped convolutions can be instrumental in expanding the application scope of convolutional neural networks. |
Tasks | Depth Estimation |
Published | 2019-06-26 |
URL | https://arxiv.org/abs/1906.11096v1 |
PDF | https://arxiv.org/pdf/1906.11096v1.pdf |
PWC | https://paperswithcode.com/paper/mapped-convolutions |
Repo | https://github.com/meder411/MappedConvolutions |
Framework | pytorch |
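The decoupling the abstract describes can be shown in a few lines: the sampling pattern becomes an explicit index map, so the same weighted sum runs on a grid, a sphere, or a mesh alike. A real implementation would interpolate fractional sample locations; this integer-index NumPy version is only illustrative.

```python
import numpy as np

def mapped_conv(values, sample_map, weights):
    """values: (N,) data; sample_map: (M, K) indices into values giving the
    K samples for each output element; weights: (K,) shared kernel weights."""
    return values[sample_map] @ weights          # gather, then weighted sum

x = np.arange(10, dtype=float)

# a standard 1-D 3-tap convolution is just one particular sample map...
centers = np.arange(1, 9)
grid_map = np.stack([centers - 1, centers, centers + 1], axis=1)
print(mapped_conv(x, grid_map, np.array([0.25, 0.5, 0.25])))

# ...while any other neighborhood structure (e.g. mesh vertices) works too
mesh_map = np.array([[0, 3, 7], [2, 4, 9]])
print(mapped_conv(x, mesh_map, np.array([0.25, 0.5, 0.25])))
```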
HPLFlowNet: Hierarchical Permutohedral Lattice FlowNet for Scene Flow Estimation on Large-scale Point Clouds
Title | HPLFlowNet: Hierarchical Permutohedral Lattice FlowNet for Scene Flow Estimation on Large-scale Point Clouds |
Authors | Xiuye Gu, Yijie Wang, Chongruo Wu, Yong-Jae Lee, Panqu Wang |
Abstract | We present a novel deep neural network architecture for end-to-end scene flow estimation that directly operates on large-scale 3D point clouds. Inspired by Bilateral Convolutional Layers (BCL), we propose novel DownBCL, UpBCL, and CorrBCL operations that restore structural information from unstructured point clouds, and fuse information from two consecutive point clouds. Operating on discrete and sparse permutohedral lattice points, our architectural design is parsimonious in computational cost. Our model can efficiently process a pair of point cloud frames at once with a maximum of 86K points per frame. Our approach achieves state-of-the-art performance on the FlyingThings3D and KITTI Scene Flow 2015 datasets. Moreover, trained on synthetic data, our approach shows great generalization ability on real-world data and on different point densities without fine-tuning. |
Tasks | Scene Flow Estimation |
Published | 2019-06-12 |
URL | https://arxiv.org/abs/1906.05332v1 |
PDF | https://arxiv.org/pdf/1906.05332v1.pdf |
PWC | https://paperswithcode.com/paper/hplflownet-hierarchical-permutohedral-lattice-1 |
Repo | https://github.com/laoreja/HPLFlowNet |
Framework | pytorch |
Kernel computations from large-scale random features obtained by Optical Processing Units
Title | Kernel computations from large-scale random features obtained by Optical Processing Units |
Authors | Ruben Ohana, Jonas Wacker, Jonathan Dong, Sébastien Marmin, Florent Krzakala, Maurizio Filippone, Laurent Daudet |
Abstract | Approximating kernel functions with random features (RFs) has been a successful application of random projections for nonparametric estimation. However, performing random projections presents computational challenges for large-scale problems. Recently, a new optical hardware called an Optical Processing Unit (OPU) has been developed for fast and energy-efficient computation of large-scale RFs in the analog domain. More specifically, the OPU performs the multiplication of input vectors by a large random matrix with complex-valued i.i.d. Gaussian entries, followed by an element-wise squared absolute value operation, a nonlinearity intrinsic to the sensing process. In this paper, we show that this operation results in a dot-product kernel that has connections to the polynomial kernel, and we extend this computation to arbitrary powers of the feature map. Experiments demonstrate that the OPU kernel and its RF approximation achieve competitive performance in applications using kernel ridge regression and transfer learning for image classification. Crucially, thanks to the use of the OPU, these results are obtained with time and energy savings. |
Tasks | Image Classification, Transfer Learning |
Published | 2019-10-22 |
URL | https://arxiv.org/abs/1910.09880v2 |
PDF | https://arxiv.org/pdf/1910.09880v2.pdf |
PWC | https://paperswithcode.com/paper/kernel-computations-from-large-scale-random |
Repo | https://github.com/joneswack/opu-kernel-experiments |
Framework | pytorch |
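The feature map is simple enough to simulate directly from the abstract's description: multiply by a complex Gaussian matrix and take the squared modulus. On the hardware the projection happens optically; NumPy stands in here so the induced dot-product kernel can be checked against the closed form one can derive for circular complex Gaussian projections.

```python
import numpy as np

rng = np.random.default_rng(0)
d, D = 20, 50_000                      # input dim, number of random features

# complex Gaussian projection matrix with unit-variance entries
W = (rng.standard_normal((D, d)) + 1j * rng.standard_normal((D, d))) / np.sqrt(2)

def opu_features(x):
    return np.abs(W @ x) ** 2 / np.sqrt(D)   # squared-modulus nonlinearity

x, y = rng.standard_normal(d), rng.standard_normal(d)
approx = opu_features(x) @ opu_features(y)   # Monte-Carlo kernel estimate
exact = (x @ y) ** 2 + (x @ x) * (y @ y)     # closed form for this feature map
print(approx, exact)
```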
Generative Image Translation for Data Augmentation in Colorectal Histopathology Images
Title | Generative Image Translation for Data Augmentation in Colorectal Histopathology Images |
Authors | Jerry Wei, Arief Suriawinata, Louis Vaickus, Bing Ren, Xiaoying Liu, Jason Wei, Saeed Hassanpour |
Abstract | We present an image translation approach to generate augmented data for mitigating data imbalances in a dataset of histopathology images of colorectal polyps, adenomatous tumors that can lead to colorectal cancer if left untreated. By applying cycle-consistent generative adversarial networks (CycleGANs) to a source domain of normal colonic mucosa images, we generate synthetic colorectal polyp images that belong to diagnostically less common polyp classes. Generated images maintain the general structure of their source image but exhibit adenomatous features that can be enhanced with our proposed filtration module, called Path-Rank-Filter. We evaluate the quality of generated images through Turing tests with four gastrointestinal pathologists, finding that at least two of the four pathologists could not identify generated images at a statistically significant level. Finally, we demonstrate that using CycleGAN-generated images to augment training data improves the AUC of a convolutional neural network for detecting sessile serrated adenomas by over 10%, suggesting that our approach might warrant further research for other histopathology image classification tasks. |
Tasks | Data Augmentation, Image Classification |
Published | 2019-10-13 |
URL | https://arxiv.org/abs/1910.05827v1 |
PDF | https://arxiv.org/pdf/1910.05827v1.pdf |
PWC | https://paperswithcode.com/paper/generative-image-translation-for-data-1 |
Repo | https://github.com/BMIRDS/HistoGAN |
Framework | pytorch |
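The Path-Rank-Filter module can be sketched from the abstract's description: rank generated images by how strongly a trained classifier scores them for the target (rare) class, and keep only the most adenomatous-looking ones. The classifier below is a random stand-in and the keep fraction is an assumption.

```python
import torch

def path_rank_filter(generated, classifier, target_class, keep_frac=0.5):
    """generated: (N, C, H, W) CycleGAN outputs; returns the kept subset."""
    with torch.no_grad():
        probs = torch.softmax(classifier(generated), dim=-1)[:, target_class]
    k = max(1, int(keep_frac * len(generated)))
    keep = probs.argsort(descending=True)[:k]   # most target-class-like first
    return generated[keep]

# hypothetical stand-in for a trained polyp classifier
classifier = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, 2))
fake = torch.randn(100, 3, 32, 32)
augmented = path_rank_filter(fake, classifier, target_class=1)
print(augmented.shape)  # torch.Size([50, 3, 32, 32])
```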