February 1, 2020

Paper Group AWR 336

Hamiltonian Neural Networks. Passage Re-ranking with BERT. FSS-1000: A 1000-Class Dataset for Few-Shot Segmentation. A Real-Time Wideband Neural Vocoder at 1.6 kb/s Using LPCNet. Planning with Goal-Conditioned Policies. Accurate and Robust Alignment of Variable-stained Histologic Images Using a General-purpose Greedy Diffeomorphic Registration Tool …

Hamiltonian Neural Networks

Title Hamiltonian Neural Networks
Authors Sam Greydanus, Misko Dzamba, Jason Yosinski
Abstract Even though neural networks enjoy widespread use, they still struggle to learn the basic laws of physics. How might we endow them with better inductive biases? In this paper, we draw inspiration from Hamiltonian mechanics to train models that learn and respect exact conservation laws in an unsupervised manner. We evaluate our models on problems where conservation of energy is important, including the two-body problem and pixel observations of a pendulum. Our model trains faster and generalizes better than a regular neural network. An interesting side effect is that our model is perfectly reversible in time.
Tasks
Published 2019-06-04
URL https://arxiv.org/abs/1906.01563v3
PDF https://arxiv.org/pdf/1906.01563v3.pdf
PWC https://paperswithcode.com/paper/hamiltonian-neural-networks
Repo https://github.com/greydanus/hamiltonian-nn
Framework pytorch
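
To make the idea of respecting a conservation law concrete, the following minimal Python sketch evaluates Hamilton's equations, dq/dt = dH/dp and dp/dt = -dH/dq, for a fixed energy function. It is an illustration under stated assumptions, not the authors' implementation: the paper learns H with a neural network, whereas here `hamiltonian` is a hand-written unit mass-spring energy, gradients come from central finite differences rather than automatic differentiation, and the names `hamiltonian`, `gradient`, `symplectic_gradient`, and `energy_rate` are hypothetical.

```python
def hamiltonian(q, p):
    """Stand-in energy function (assumption): unit mass on a unit spring."""
    return 0.5 * (q * q + p * p)


def gradient(h, q, p, eps=1e-5):
    """Approximate (dH/dq, dH/dp) with central finite differences."""
    dh_dq = (h(q + eps, p) - h(q - eps, p)) / (2 * eps)
    dh_dp = (h(q, p + eps) - h(q, p - eps)) / (2 * eps)
    return dh_dq, dh_dp


def symplectic_gradient(h, q, p):
    """Hamilton's equations: (dq/dt, dp/dt) = (dH/dp, -dH/dq)."""
    dh_dq, dh_dp = gradient(h, q, p)
    return dh_dp, -dh_dq


def energy_rate(h, q, p):
    """dH/dt along the flow; the two terms cancel, so energy is conserved."""
    dh_dq, dh_dp = gradient(h, q, p)
    dq_dt, dp_dt = symplectic_gradient(h, q, p)
    return dh_dq * dq_dt + dh_dp * dp_dt


if __name__ == "__main__":
    print(symplectic_gradient(hamiltonian, 1.0, 0.0))
    print(energy_rate(hamiltonian, 0.3, -0.7))
```

Because the vector field is built from the gradient of a single scalar H, the energy rate vanishes by construction, whatever function is used for H. That structural property is what the abstract refers to as respecting a conservation law.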

Passage Re-ranking with BERT

Title Passage Re-ranking with BERT
Authors Rodrigo Nogueira, Kyunghyun Cho
Abstract Recently, neural models pretrained on a language modeling task, such as ELMo (Peters et al., 2017), OpenAI GPT (Radford et al., 2018), and BERT (Devlin et al., 2018), have achieved impressive results on various natural language processing tasks such as question-answering and natural language inference. In this paper, we describe a simple re-implementation of BERT for query-based passage re-ranking. Our system is the state of the art on the TREC-CAR dataset and the top entry in the leaderboard of the MS MARCO passage retrieval task, outperforming the previous state of the art by 27% (relative) in MRR@10. The code to reproduce our results is available at https://github.com/nyu-dl/dl4marco-bert
Tasks Passage Re-Ranking
Published 2019-01-13
URL http://arxiv.org/abs/1901.04085v4
PDF http://arxiv.org/pdf/1901.04085v4.pdf
PWC https://paperswithcode.com/paper/passage-re-ranking-with-bert
Repo https://github.com/nyu-dl/dl4marco-bert
Framework tf
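
The MRR@10 figure quoted in the abstract is a mean reciprocal rank truncated at the top 10 results: for each query, take 1/rank of the first relevant passage if it appears in the top 10, otherwise 0, then average over queries. A minimal sketch follows. The function name `mrr_at_k` and the input layout (one ranked list of passage ids and one set of relevant ids per query) are assumptions for illustration; this is not the official MS MARCO evaluation script.

```python
def mrr_at_k(ranked_lists, relevant_sets, k=10):
    """Mean reciprocal rank of the first relevant item within the top k."""
    if not ranked_lists:
        raise ValueError("need at least one query")
    total = 0.0
    for ranked, relevant in zip(ranked_lists, relevant_sets):
        for rank, passage_id in enumerate(ranked[:k], start=1):
            if passage_id in relevant:
                total += 1.0 / rank  # only the first hit counts
                break
    return total / len(ranked_lists)


if __name__ == "__main__":
    # Query 1 finds its relevant passage at rank 2; query 2 finds none.
    rankings = [["a", "b", "c"], ["d", "e", "f"]]
    relevant = [{"b"}, {"z"}]
    print(mrr_at_k(rankings, relevant))
```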

FSS-1000: A 1000-Class Dataset for Few-Shot Segmentation

Title FSS-1000: A 1000-Class Dataset for Few-Shot Segmentation
Authors Tianhan Wei, Xiang Li, Yau Pun Chen, Yu-Wing Tai, Chi-Keung Tang
Abstract Over the past few years, we have witnessed the success of deep learning in image recognition thanks to the availability of large-scale human-annotated datasets such as PASCAL VOC, ImageNet, and COCO. Although these datasets have covered a wide range of object categories, there are still a significant number of objects that are not included. Can we perform the same task without a lot of human annotations? In this paper, we are interested in few-shot object segmentation where the number of annotated training examples is limited to only 5. To evaluate and validate the performance of our approach, we have built a few-shot segmentation dataset, FSS-1000, which consists of 1000 object classes with pixelwise annotation of ground-truth segmentation. Unique to FSS-1000, our dataset contains a significant number of objects that have never been seen or annotated in previous datasets, such as tiny daily objects, merchandise, cartoon characters, logos, etc. We build our baseline model using standard backbone networks such as VGG-16, ResNet-101, and Inception. To our surprise, we found that training our model from scratch using FSS-1000 achieves comparable and even better results than training with weights pre-trained on ImageNet, which is more than 100 times larger than FSS-1000. Both our approach and dataset are simple, effective, and easily extensible to learn segmentation of new object classes given very few annotated training examples. The dataset is available at https://github.com/HKUSTCV/FSS-1000.
Tasks Few-Shot Semantic Segmentation, Semantic Segmentation
Published 2019-07-29
URL https://arxiv.org/abs/1907.12347v1
PDF https://arxiv.org/pdf/1907.12347v1.pdf
PWC https://paperswithcode.com/paper/fss-1000-a-1000-class-dataset-for-few-shot
Repo https://github.com/HKUSTCV/FSS-1000
Framework pytorch

A Real-Time Wideband Neural Vocoder at 1.6 kb/s Using LPCNet

Title A Real-Time Wideband Neural Vocoder at 1.6 kb/s Using LPCNet
Authors Jean-Marc Valin, Jan Skoglund
Abstract Neural speech synthesis algorithms are a promising new approach for coding speech at very low bitrate. They have so far demonstrated quality that far exceeds traditional vocoders, at the cost of very high complexity. In this work, we present a low-bitrate neural vocoder based on the LPCNet model. The use of linear prediction and sparse recurrent networks makes it possible to achieve real-time operation on general-purpose hardware. We demonstrate that LPCNet operating at 1.6 kb/s achieves significantly higher quality than MELP and that uncompressed LPCNet can exceed the quality of a waveform codec operating at low bitrate. This opens the way for new codec designs based on neural synthesis models.
Tasks Speech Synthesis
Published 2019-03-28
URL https://arxiv.org/abs/1903.12087v2
PDF https://arxiv.org/pdf/1903.12087v2.pdf
PWC https://paperswithcode.com/paper/a-real-time-wideband-neural-vocoder-at-16-kbs
Repo https://github.com/mozilla/LPCNet
Framework none

Planning with Goal-Conditioned Policies

Title Planning with Goal-Conditioned Policies
Authors Soroush Nasiriany, Vitchyr H. Pong, Steven Lin, Sergey Levine
Abstract Planning methods can solve temporally extended sequential decision making problems by composing simple behaviors. However, planning requires suitable abstractions for the states and transitions, which typically need to be designed by hand. In contrast, model-free reinforcement learning (RL) can acquire behaviors from low-level inputs directly, but often struggles with temporally extended tasks. Can we utilize reinforcement learning to automatically form the abstractions needed for planning, thus obtaining the best of both approaches? We show that goal-conditioned policies learned with RL can be incorporated into planning, so that a planner can focus on which states to reach, rather than how those states are reached. However, with complex state observations such as images, not all inputs represent valid states. We therefore also propose using a latent variable model to compactly represent the set of valid states for the planner, so that the policies provide an abstraction of actions, and the latent variable model provides an abstraction of states. We compare our method with planning-based and model-free methods and find that our method significantly outperforms prior work when evaluated on image-based robot navigation and manipulation tasks that require non-greedy, multi-staged behavior.
Tasks Decision Making, Robot Navigation
Published 2019-11-19
URL https://arxiv.org/abs/1911.08453v1
PDF https://arxiv.org/pdf/1911.08453v1.pdf
PWC https://paperswithcode.com/paper/planning-with-goal-conditioned-policies-1
Repo https://github.com/snasiriany/leap
Framework none

Accurate and Robust Alignment of Variable-stained Histologic Images Using a General-purpose Greedy Diffeomorphic Registration Tool

Title Accurate and Robust Alignment of Variable-stained Histologic Images Using a General-purpose Greedy Diffeomorphic Registration Tool
Authors Ludovic Venet, Sarthak Pati, Paul Yushkevich, Spyridon Bakas
Abstract Variously stained histology slices are routinely used by pathologists to assess extracted tissue samples from various anatomical sites and determine the presence or extent of a disease. Evaluation of sequential slides is expected to enable a better understanding of the spatial arrangement and growth patterns of cells and vessels. In this paper we present a practical two-step approach based on diffeomorphic registration to align digitized sequential histopathology stained slides to each other, starting with an initial affine step followed by the estimation of a detailed deformation field.
Tasks
Published 2019-04-26
URL http://arxiv.org/abs/1904.11929v1
PDF http://arxiv.org/pdf/1904.11929v1.pdf
PWC https://paperswithcode.com/paper/accurate-and-robust-alignment-of-variable
Repo https://github.com/CBICA/HistoReg
Framework none

DP-LSTM: Differential Privacy-inspired LSTM for Stock Prediction Using Financial News

Title DP-LSTM: Differential Privacy-inspired LSTM for Stock Prediction Using Financial News
Authors Xinyi Li, Yinchuan Li, Hongyang Yang, Liuqing Yang, Xiao-Yang Liu
Abstract Stock price prediction is important for value investments in the stock market. In particular, short-term prediction that exploits financial news articles has shown promise in recent years. In this paper, we propose a novel deep neural network, DP-LSTM, for stock price prediction, which incorporates the news articles as hidden information and integrates different news sources through the differential privacy mechanism. First, based on the autoregressive moving average model (ARMA), a sentiment-ARMA is formulated by taking into consideration the information of financial news articles in the model. Then, an LSTM-based deep neural network is designed, which consists of three components: LSTM, the VADER model, and a differential privacy (DP) mechanism. The proposed DP-LSTM scheme can reduce prediction errors and increase robustness. Extensive experiments on S&P 500 stocks show that (i) the proposed DP-LSTM achieves a 0.32% improvement in the mean MPA of the prediction results, and (ii) for the prediction of the market index S&P 500, we achieve up to a 65.79% improvement in MSE.
Tasks Stock Prediction, Stock Price Prediction
Published 2019-12-20
URL https://arxiv.org/abs/1912.10806v1
PDF https://arxiv.org/pdf/1912.10806v1.pdf
PWC https://paperswithcode.com/paper/dp-lstm-differential-privacy-inspired-lstm
Repo https://github.com/Xinyi6/DP-LSTM-Differential-Privacy-inspired-LSTM-for-Stock-Prediction-Using-Financial-News
Framework tf

advertorch v0.1: An Adversarial Robustness Toolbox based on PyTorch

Title advertorch v0.1: An Adversarial Robustness Toolbox based on PyTorch
Authors Gavin Weiguang Ding, Luyu Wang, Xiaomeng Jin
Abstract advertorch is a toolbox for adversarial robustness research. It contains various implementations for attacks, defenses and robust training methods. advertorch is built on PyTorch (Paszke et al., 2017), and leverages the advantages of the dynamic computational graph to provide concise and efficient reference implementations. The code is licensed under the LGPL license and is open sourced at https://github.com/BorealisAI/advertorch .
Tasks Adversarial Attack, Adversarial Defense
Published 2019-02-20
URL http://arxiv.org/abs/1902.07623v1
PDF http://arxiv.org/pdf/1902.07623v1.pdf
PWC https://paperswithcode.com/paper/advertorch-v01-an-adversarial-robustness
Repo https://github.com/BorealisAI/advertorch
Framework pytorch

Learnable Gated Temporal Shift Module for Deep Video Inpainting

Title Learnable Gated Temporal Shift Module for Deep Video Inpainting
Authors Ya-Liang Chang, Zhe Yu Liu, Kuan-Ying Lee, Winston Hsu
Abstract How to efficiently utilize temporal information to recover videos in a consistent way is the main issue for video inpainting problems. Conventional 2D CNNs have achieved good performance on image inpainting but often lead to temporally inconsistent results where frames flicker when applied to videos (see https://www.youtube.com/watch?v=87Vh1HDBjD0&list=PLPoVtv-xp_dL5uckIzz1PKwNjg1yI0I94&index=1); 3D CNNs can capture temporal information but are computationally intensive and hard to train. In this paper, we present a novel component termed Learnable Gated Temporal Shift Module (LGTSM) for video inpainting models that can effectively tackle arbitrary video masks without additional parameters from 3D convolutions. LGTSM is designed to let 2D convolutions make use of neighboring frames more efficiently, which is crucial for video inpainting. Specifically, in each layer, LGTSM learns to shift some channels to their temporal neighbors so that 2D convolutions can be enhanced to handle temporal information. Meanwhile, a gated convolution is applied to the layer to identify the masked areas that poison conventional convolutions. On the FaceForensics and Free-form Video Inpainting (FVI) datasets, our model achieves state-of-the-art results with only 33% of the parameters and inference time.
Tasks Image Inpainting, Video Inpainting
Published 2019-07-02
URL https://arxiv.org/abs/1907.01131v2
PDF https://arxiv.org/pdf/1907.01131v2.pdf
PWC https://paperswithcode.com/paper/learnable-gated-temporal-shift-module-for
Repo https://github.com/amjltc295/Free-Form-Video-Inpainting
Framework pytorch
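
The channel-shifting step described in the abstract can be illustrated with a fixed (non-learnable) temporal shift. The sketch below is a simplification under stated assumptions: it omits the learnable shift weights and the gated convolution that the paper adds, treats each frame as a flat list of channel values, and uses hypothetical names (`temporal_shift`, `fold`). The first `fold` channels are taken from the next frame, the next `fold` channels from the previous frame, and the rest stay in place, with zeros where no neighbor exists.

```python
def temporal_shift(frames, fold):
    """Shift channel groups along time.

    frames: list over time; each entry is a list of per-channel values.
    fold:   number of channels shifted in each direction.
    """
    num_frames = len(frames)
    num_channels = len(frames[0])
    shifted = [[0.0] * num_channels for _ in range(num_frames)]
    for t in range(num_frames):
        for c in range(num_channels):
            if c < fold:
                source = t + 1      # borrow from the next frame
            elif c < 2 * fold:
                source = t - 1      # borrow from the previous frame
            else:
                source = t          # channel left unshifted
            if 0 <= source < num_frames:
                shifted[t][c] = frames[source][c]
    return shifted


if __name__ == "__main__":
    clip = [[1, 10, 100], [2, 20, 200], [3, 30, 300]]
    for row in temporal_shift(clip, fold=1):
        print(row)
```

After the shift, a per-frame 2D operation sees values from neighboring frames in some of its input channels, which is how temporal context reaches 2D convolutions without 3D kernels.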

Learning to combine primitive skills: A step towards versatile robotic manipulation

Title Learning to combine primitive skills: A step towards versatile robotic manipulation
Authors Robin Strudel, Alexander Pashevich, Igor Kalevatykh, Ivan Laptev, Josef Sivic, Cordelia Schmid
Abstract Manipulation tasks such as preparing a meal or assembling furniture remain highly challenging for robotics and vision. Traditional task and motion planning (TAMP) methods can solve complex tasks but require full state observability and are not adapted to dynamic scene changes. Recent learning methods can operate directly on visual inputs but typically require many demonstrations and/or task-specific reward engineering. In this work we aim to overcome previous limitations and propose a reinforcement learning (RL) approach to task planning that learns to combine primitive skills. First, compared to previous learning methods, our approach requires neither intermediate rewards nor complete task demonstrations during training. Second, we demonstrate the versatility of our vision-based task planning in challenging settings with temporary occlusions and dynamic scene changes. Third, we propose an efficient training of basic skills from few synthetic demonstrations by exploring recent CNN architectures and data augmentation. Notably, while all of our policies are learned on visual inputs in simulated environments, we demonstrate the successful transfer and high success rates when applying such policies to manipulation tasks on a real UR5 robotic arm.
Tasks Data Augmentation, Imitation Learning, Motion Planning
Published 2019-08-02
URL https://arxiv.org/abs/1908.00722v2
PDF https://arxiv.org/pdf/1908.00722v2.pdf
PWC https://paperswithcode.com/paper/combining-learned-skills-and-reinforcement
Repo https://github.com/rstrudel/rlbc
Framework pytorch

Omnidirectional Scene Text Detection with Sequential-free Box Discretization

Title Omnidirectional Scene Text Detection with Sequential-free Box Discretization
Authors Yuliang Liu, Sheng Zhang, Lianwen Jin, Lele Xie, Yaqiang Wu, Zhepeng Wang
Abstract Scene text in the wild commonly exhibits highly variable characteristics. Using a quadrilateral bounding box to localize the text instance is nearly indispensable for detection methods. However, recent research reveals that introducing a quadrilateral bounding box for scene text detection brings a label confusion issue that is easily overlooked, and this issue may significantly undermine detection performance. To address this issue, in this paper, we propose a novel method called Sequential-free Box Discretization (SBD), which discretizes the bounding box into key edges (KE) that can further yield more effective methods to improve detection performance. Experiments showed that the proposed method can outperform state-of-the-art methods on many popular scene text benchmarks, including ICDAR 2015, MLT, and MSRA-TD500. An ablation study also showed that simply integrating SBD into the Mask R-CNN framework substantially improves detection performance. Furthermore, an experiment on the general object dataset HRSC2016 (multi-oriented ships) showed that our method can outperform recent state-of-the-art methods by a large margin, demonstrating its powerful generalization ability. Source code: https://github.com/Yuliang-Liu/Box_Discretization_Network.
Tasks Scene Text Detection
Published 2019-06-06
URL https://arxiv.org/abs/1906.02371v3
PDF https://arxiv.org/pdf/1906.02371v3.pdf
PWC https://paperswithcode.com/paper/omnidirectional-scene-text-detection-with
Repo https://github.com/Yuliang-Liu/Box_Discretization_Network
Framework pytorch

Enforcing geometric constraints of virtual normal for depth prediction

Title Enforcing geometric constraints of virtual normal for depth prediction
Authors Wei Yin, Yifan Liu, Chunhua Shen, Youliang Yan
Abstract Monocular depth prediction plays a crucial role in understanding 3D scene geometry. Although recent methods have achieved impressive progress in evaluation metrics such as the pixel-wise relative error, most methods neglect the geometric constraints in the 3D space. In this work, we show the importance of the high-order 3D geometric constraints for depth prediction. By designing a loss term that enforces one simple type of geometric constraint, namely, virtual normal directions determined by three randomly sampled points in the reconstructed 3D space, we can considerably improve the depth prediction accuracy. Significantly, the byproduct of this predicted depth being sufficiently accurate is that we are now able to recover good 3D structures of the scene such as the point cloud and surface normal directly from the depth, eliminating the necessity of training new sub-models as was previously done. Experiments on two benchmarks, NYU Depth-V2 and KITTI, demonstrate the effectiveness of our method and state-of-the-art performance.
Tasks Depth Estimation, Monocular Depth Estimation
Published 2019-07-29
URL https://arxiv.org/abs/1907.12209v2
PDF https://arxiv.org/pdf/1907.12209v2.pdf
PWC https://paperswithcode.com/paper/enforcing-geometric-constraints-of-virtual
Repo https://github.com/YvanYin/VNL_Monocular_Depth_Prediction
Framework pytorch
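
The geometric quantity named in the abstract, a normal direction determined by three sampled 3D points, can be computed with a cross product. The sketch below is a minimal illustration, not the authors' loss: the function names `virtual_normal` and `normal_difference` are hypothetical, and the L1 comparison between the predicted and ground-truth normals is an assumption about how such a term might be scored.

```python
import math


def virtual_normal(a, b, c):
    """Unit normal of the plane through 3D points a, b, c."""
    u = [b[i] - a[i] for i in range(3)]
    v = [c[i] - a[i] for i in range(3)]
    n = [
        u[1] * v[2] - u[2] * v[1],
        u[2] * v[0] - u[0] * v[2],
        u[0] * v[1] - u[1] * v[0],
    ]
    length = math.sqrt(sum(x * x for x in n))
    if length == 0.0:
        raise ValueError("points are collinear; no unique normal")
    return [x / length for x in n]


def normal_difference(pred_points, true_points):
    """L1 distance between normals of predicted and ground-truth triplets."""
    n_pred = virtual_normal(*pred_points)
    n_true = virtual_normal(*true_points)
    return sum(abs(p - t) for p, t in zip(n_pred, n_true))


if __name__ == "__main__":
    flat = [(0, 0, 0), (1, 0, 0), (0, 1, 0)]
    tilted = [(0, 0, 0), (1, 0, 1), (0, 1, 0)]
    print(virtual_normal(*flat))
    print(normal_difference(tilted, flat))
```

Collinear triplets have no well-defined normal, so the sketch rejects them; any sampling scheme built on this idea needs some comparable filter.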

Delog: A Privacy Preserving Log Filtering Framework for Online Compute Platforms

Title Delog: A Privacy Preserving Log Filtering Framework for Online Compute Platforms
Authors Amey Agrawal, Abhishek Dixit, Namrata Shettar, Darshil Kapadia, Rohit Karlupia, Vikram Agrawal, Rajat Gupta
Abstract In many software applications, logs serve as the only interface between the application and the developer. However, navigating through the logs of long-running applications is often challenging. Logs from previously successful application runs can be leveraged to automatically identify errors and provide users with only the logs that are relevant to the debugging process. We describe a privacy-preserving framework which can be employed by Platform as a Service (PaaS) providers to utilize the user logs generated on the platform while protecting the potentially sensitive logged data. Further, in order to accurately and scalably parse log lines, we present a distributed log parsing algorithm which leverages Locality Sensitive Hashing (LSH). We outperform the state of the art on multiple datasets. We further demonstrate the scalability of Delog on the publicly available Thunderbird log dataset with close to 27,000 unique patterns and 211 million lines.
Tasks
Published 2019-02-13
URL https://arxiv.org/abs/1902.04843v3
PDF https://arxiv.org/pdf/1902.04843v3.pdf
PWC https://paperswithcode.com/paper/delog-a-privacy-preserving-log-filtering
Repo https://github.com/qubole/qubole-log-datasets
Framework none

Gromov-Wasserstein Learning for Graph Matching and Node Embedding

Title Gromov-Wasserstein Learning for Graph Matching and Node Embedding
Authors Hongteng Xu, Dixin Luo, Hongyuan Zha, Lawrence Carin
Abstract A novel Gromov-Wasserstein learning framework is proposed to jointly match (align) graphs and learn embedding vectors for the associated graph nodes. Using Gromov-Wasserstein discrepancy, we measure the dissimilarity between two graphs and find their correspondence, according to the learned optimal transport. The node embeddings associated with the two graphs are learned under the guidance of the optimal transport, the distance of which not only reflects the topological structure of each graph but also yields the correspondence across the graphs. These two learning steps are mutually-beneficial, and are unified here by minimizing the Gromov-Wasserstein discrepancy with structural regularizers. This framework leads to an optimization problem that is solved by a proximal point method. We apply the proposed method to matching problems in real-world networks, and demonstrate its superior performance compared to alternative approaches.
Tasks Graph Matching
Published 2019-01-17
URL https://arxiv.org/abs/1901.06003v2
PDF https://arxiv.org/pdf/1901.06003v2.pdf
PWC https://paperswithcode.com/paper/gromov-wasserstein-learning-for-graph
Repo https://github.com/HongtengXu/gwl
Framework pytorch
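
The Gromov-Wasserstein discrepancy mentioned in the abstract compares pairwise structure inside each graph rather than node features across graphs. For a fixed coupling T between the nodes of two graphs with intra-graph cost matrices A and B, a common form of the objective is the sum over i, j, k, l of (A[i][k] - B[j][l])^2 * T[i][j] * T[k][l]. The sketch below only evaluates that objective for a given coupling. The squared loss, the function name `gw_discrepancy`, and the use of shortest-path distances as costs are assumptions; the paper's proximal point solver, embeddings, and regularizers are not reproduced.

```python
def gw_discrepancy(cost_a, cost_b, coupling):
    """Gromov-Wasserstein objective with squared loss for a fixed coupling.

    cost_a:   n x n intra-graph costs of the first graph.
    cost_b:   m x m intra-graph costs of the second graph.
    coupling: n x m transport plan (non-negative, entries sum to 1).
    """
    n, m = len(cost_a), len(cost_b)
    total = 0.0
    for i in range(n):
        for j in range(m):
            if coupling[i][j] == 0.0:
                continue  # this node pair carries no mass
            for k in range(n):
                for l in range(m):
                    diff = cost_a[i][k] - cost_b[j][l]
                    total += diff * diff * coupling[i][j] * coupling[k][l]
    return total


if __name__ == "__main__":
    # Shortest-path distances on a 3-node path graph 0 - 1 - 2.
    path = [[0, 1, 2], [1, 0, 1], [2, 1, 0]]
    third = 1.0 / 3.0
    identity = [[third, 0.0, 0.0], [0.0, third, 0.0], [0.0, 0.0, third]]
    swapped = [[0.0, third, 0.0], [third, 0.0, 0.0], [0.0, 0.0, third]]
    print(gw_discrepancy(path, path, identity))
    print(gw_discrepancy(path, path, swapped))
```

Matching the path graph to itself with the identity coupling gives zero discrepancy, while swapping an end node with the middle node distorts pairwise distances and gives a positive value; a matching method searches for the coupling that minimizes this quantity.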

Generalizing Monocular 3D Human Pose Estimation in the Wild

Title Generalizing Monocular 3D Human Pose Estimation in the Wild
Authors Luyang Wang, Yan Chen, Zhenhua Guo, Keyuan Qian, Mude Lin, Hongsheng Li, Jimmy S. Ren
Abstract The availability of the large-scale labeled 3D poses in the Human3.6M dataset plays an important role in advancing the algorithms for 3D human pose estimation from a still image. We observe that recent innovation in this area mainly focuses on new techniques that explicitly address the generalization issue when using this dataset, because this database is constructed in a highly controlled environment with limited human subjects and background variations. Despite such efforts, we can show that the results of the current methods are still error-prone, especially when tested against images taken in the wild. In this paper, we aim to tackle this problem from a different perspective. We propose a principled approach to generate high-quality 3D pose ground truth given any in-the-wild image with a person inside. We achieve this by first devising a novel stereo-inspired neural network to directly map any 2D pose to a high-quality 3D counterpart. We then perform a carefully designed geometric searching scheme to further refine the joints. Based on this scheme, we build a large-scale dataset with 400,000 in-the-wild images and their corresponding 3D pose ground truth. This enables the training of a high-quality neural network model, without a specialized training scheme or auxiliary loss function, which performs favorably against the state-of-the-art 3D pose estimation methods. We also evaluate the generalization ability of our model both quantitatively and qualitatively. Results show that our approach convincingly outperforms the previous methods. We make our dataset and code publicly available.
Tasks 3D Human Pose Estimation, 3D Pose Estimation, Pose Estimation
Published 2019-04-11
URL http://arxiv.org/abs/1904.05512v1
PDF http://arxiv.org/pdf/1904.05512v1.pdf
PWC https://paperswithcode.com/paper/generalizing-monocular-3d-human-pose
Repo https://github.com/llcshappy/Monocular-3D-Human-Pose
Framework tf