Paper Group AWR 336
Hamiltonian Neural Networks. Passage Re-ranking with BERT. FSS-1000: A 1000-Class Dataset for Few-Shot Segmentation. A Real-Time Wideband Neural Vocoder at 1.6 kb/s Using LPCNet. Planning with Goal-Conditioned Policies. Accurate and Robust Alignment of Variable-stained Histologic Images Using a General-purpose Greedy Diffeomorphic Registration Tool …
Hamiltonian Neural Networks
Title | Hamiltonian Neural Networks |
Authors | Sam Greydanus, Misko Dzamba, Jason Yosinski |
Abstract | Even though neural networks enjoy widespread use, they still struggle to learn the basic laws of physics. How might we endow them with better inductive biases? In this paper, we draw inspiration from Hamiltonian mechanics to train models that learn and respect exact conservation laws in an unsupervised manner. We evaluate our models on problems where conservation of energy is important, including the two-body problem and pixel observations of a pendulum. Our model trains faster and generalizes better than a regular neural network. An interesting side effect is that our model is perfectly reversible in time. |
Tasks | |
Published | 2019-06-04 |
URL | https://arxiv.org/abs/1906.01563v3 |
PDF | https://arxiv.org/pdf/1906.01563v3.pdf |
PWC | https://paperswithcode.com/paper/hamiltonian-neural-networks |
Repo | https://github.com/greydanus/hamiltonian-nn |
Framework | pytorch |
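The core trick is easy to show in code. Below is a minimal PyTorch sketch, not the authors' implementation (shapes and hyperparameters are illustrative): a network outputs a scalar H(q, p), and the predicted dynamics are its symplectic gradient, so the learned H is conserved along trajectories.

```python
import torch
import torch.nn as nn

class HNN(nn.Module):
    """Learn a scalar Hamiltonian H(q, p); dynamics follow its symplectic gradient."""
    def __init__(self, dim=2, hidden=200):
        super().__init__()
        self.H = nn.Sequential(nn.Linear(dim, hidden), nn.Tanh(), nn.Linear(hidden, 1))

    def time_derivative(self, x):
        # x = [q, p]; Hamilton's equations: dq/dt = dH/dp, dp/dt = -dH/dq
        x = x.requires_grad_(True)
        dH = torch.autograd.grad(self.H(x).sum(), x, create_graph=True)[0]
        dHdq, dHdp = dH.chunk(2, dim=-1)
        return torch.cat([dHdp, -dHdq], dim=-1)

model = HNN()
x = torch.randn(32, 2)       # observed (q, p) states
dxdt = torch.randn(32, 2)    # their time derivatives (e.g. from finite differences)
loss = ((model.time_derivative(x) - dxdt) ** 2).mean()
loss.backward()              # train H so its symplectic gradient matches the data
```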
Passage Re-ranking with BERT
Title | Passage Re-ranking with BERT |
Authors | Rodrigo Nogueira, Kyunghyun Cho |
Abstract | Recently, neural models pretrained on a language modeling task, such as ELMo (Peters et al., 2017), OpenAI GPT (Radford et al., 2018), and BERT (Devlin et al., 2018), have achieved impressive results on various natural language processing tasks such as question-answering and natural language inference. In this paper, we describe a simple re-implementation of BERT for query-based passage re-ranking. Our system is the state of the art on the TREC-CAR dataset and the top entry in the leaderboard of the MS MARCO passage retrieval task, outperforming the previous state of the art by 27% (relative) in MRR@10. The code to reproduce our results is available at https://github.com/nyu-dl/dl4marco-bert |
Tasks | Passage Re-Ranking |
Published | 2019-01-13 |
URL | http://arxiv.org/abs/1901.04085v4 |
PDF | http://arxiv.org/pdf/1901.04085v4.pdf |
PWC | https://paperswithcode.com/paper/passage-re-ranking-with-bert |
Repo | https://github.com/nyu-dl/dl4marco-bert |
Framework | tf |
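The recipe is a pointwise cross-encoder: feed the concatenated query and passage to BERT and rank by the relevance probability from the classification head. Here is a sketch with Hugging Face `transformers`; the authors released TensorFlow code, and the untuned `bert-base-uncased` head below is only a placeholder for their fine-tuned MS MARCO weights.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)
model.eval()

query = "what causes tides"
passages = ["Tides are caused by the gravitational pull of the moon ...",
            "The stock market closed higher on Monday ..."]

with torch.no_grad():
    enc = tok([query] * len(passages), passages,
              truncation=True, padding=True, return_tensors="pt")
    # P(relevant | query, passage) from the [CLS] classification head
    scores = model(**enc).logits.softmax(dim=-1)[:, 1]

ranked = sorted(zip(scores.tolist(), passages), reverse=True)
```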
FSS-1000: A 1000-Class Dataset for Few-Shot Segmentation
Title | FSS-1000: A 1000-Class Dataset for Few-Shot Segmentation |
Authors | Tianhan Wei, Xiang Li, Yau Pun Chen, Yu-Wing Tai, Chi-Keung Tang |
Abstract | Over the past few years, we have witnessed the success of deep learning in image recognition thanks to the availability of large-scale human-annotated datasets such as PASCAL VOC, ImageNet, and COCO. Although these datasets cover a wide range of object categories, a significant number of objects are still not included. Can we perform the same task without many human annotations? In this paper, we are interested in few-shot object segmentation, where the number of annotated training examples is limited to only 5. To evaluate and validate the performance of our approach, we have built a few-shot segmentation dataset, FSS-1000, which consists of 1000 object classes with pixelwise ground-truth segmentation annotations. Uniquely, FSS-1000 contains a significant number of objects that have never been seen or annotated in previous datasets, such as tiny daily objects, merchandise, cartoon characters, and logos. We build our baseline model using standard backbone networks such as VGG-16, ResNet-101, and Inception. To our surprise, we found that training our model from scratch on FSS-1000 achieves comparable or even better results than training with weights pre-trained on ImageNet, which is more than 100 times larger than FSS-1000. Both our approach and dataset are simple, effective, and easily extensible to learning segmentation of new object classes given very few annotated training examples. The dataset is available at https://github.com/HKUSTCV/FSS-1000. |
Tasks | Few-Shot Semantic Segmentation, Semantic Segmentation |
Published | 2019-07-29 |
URL | https://arxiv.org/abs/1907.12347v1 |
PDF | https://arxiv.org/pdf/1907.12347v1.pdf |
PWC | https://paperswithcode.com/paper/fss-1000-a-1000-class-dataset-for-few-shot |
Repo | https://github.com/HKUSTCV/FSS-1000 |
Framework | pytorch |
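As a flavor of what a few-shot segmentation baseline on FSS-1000 computes, here is a prototype-style sketch in PyTorch (masked average pooling over support features plus cosine similarity to the query). The paper's baseline is a relation-network-style architecture, so treat this as an illustrative stand-in, not the authors' model.

```python
import torch
import torch.nn.functional as F

def prototype_segment(feat_s, mask_s, feat_q):
    """feat_s / feat_q: (C, H, W) backbone features; mask_s: (H, W) binary support mask."""
    # Masked average pooling: one foreground prototype from the support example
    proto = (feat_s * mask_s).flatten(1).sum(1) / mask_s.sum().clamp(min=1)   # (C,)
    # Cosine similarity between the prototype and every query location
    sim = F.cosine_similarity(feat_q, proto[:, None, None], dim=0)            # (H, W)
    return sim > 0.5   # crude binary prediction

C, H, W = 256, 32, 32
pred = prototype_segment(torch.randn(C, H, W),
                         (torch.rand(H, W) > 0.7).float(),
                         torch.randn(C, H, W))
```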
A Real-Time Wideband Neural Vocoder at 1.6 kb/s Using LPCNet
Title | A Real-Time Wideband Neural Vocoder at 1.6 kb/s Using LPCNet |
Authors | Jean-Marc Valin, Jan Skoglund |
Abstract | Neural speech synthesis algorithms are a promising new approach for coding speech at very low bitrates. They have so far demonstrated quality that far exceeds that of traditional vocoders, at the cost of very high complexity. In this work, we present a low-bitrate neural vocoder based on the LPCNet model. The use of linear prediction and sparse recurrent networks makes it possible to achieve real-time operation on general-purpose hardware. We demonstrate that LPCNet operating at 1.6 kb/s achieves significantly higher quality than MELP and that uncompressed LPCNet can exceed the quality of a waveform codec operating at a low bitrate. This opens the way for new codec designs based on neural synthesis models. |
Tasks | Speech Synthesis |
Published | 2019-03-28 |
URL | https://arxiv.org/abs/1903.12087v2 |
PDF | https://arxiv.org/pdf/1903.12087v2.pdf |
PWC | https://paperswithcode.com/paper/a-real-time-wideband-neural-vocoder-at-16-kbs |
Repo | https://github.com/mozilla/LPCNet |
Framework | none |
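The "LPC" half of LPCNet is classical signal processing: per frame, solve for linear-prediction coefficients and let the network model only the residual. A NumPy sketch of that step follows (Levinson-Durbin on the frame autocorrelation; frame size and order are illustrative, and the neural part is not shown).

```python
import numpy as np

def lpc_coeffs(frame, order=16):
    """Levinson-Durbin: find a[k] so that s[n] is approximated by sum_k a[k] * s[n-1-k]."""
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:len(frame) + order]
    a, err = np.zeros(order), r[0]
    for i in range(order):
        k = (r[i + 1] - a[:i] @ r[1:i + 1][::-1]) / err
        a[:i], a[i] = a[:i] - k * a[:i][::-1], k
        err *= 1.0 - k * k
    return a

# One 20 ms frame at 16 kHz (a noisy tone as toy input)
frame = np.sin(0.3 * np.arange(320)) + 0.01 * np.random.randn(320)
a = lpc_coeffs(frame)
pred = np.convolve(frame, np.concatenate(([0.0], a)))[:len(frame)]
residual = frame - pred   # the part LPCNet's recurrent network actually models
```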
Planning with Goal-Conditioned Policies
Title | Planning with Goal-Conditioned Policies |
Authors | Soroush Nasiriany, Vitchyr H. Pong, Steven Lin, Sergey Levine |
Abstract | Planning methods can solve temporally extended sequential decision making problems by composing simple behaviors. However, planning requires suitable abstractions for the states and transitions, which typically need to be designed by hand. In contrast, model-free reinforcement learning (RL) can acquire behaviors from low-level inputs directly, but often struggles with temporally extended tasks. Can we utilize reinforcement learning to automatically form the abstractions needed for planning, thus obtaining the best of both approaches? We show that goal-conditioned policies learned with RL can be incorporated into planning, so that a planner can focus on which states to reach, rather than how those states are reached. However, with complex state observations such as images, not all inputs represent valid states. We therefore also propose using a latent variable model to compactly represent the set of valid states for the planner, so that the policies provide an abstraction of actions, and the latent variable model provides an abstraction of states. We compare our method with planning-based and model-free methods and find that our method significantly outperforms prior work when evaluated on image-based robot navigation and manipulation tasks that require non-greedy, multi-staged behavior. |
Tasks | Decision Making, Robot Navigation |
Published | 2019-11-19 |
URL | https://arxiv.org/abs/1911.08453v1 |
PDF | https://arxiv.org/pdf/1911.08453v1.pdf |
PWC | https://paperswithcode.com/paper/planning-with-goal-conditioned-policies-1 |
Repo | https://github.com/snasiriany/leap |
Framework | none |
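The planner searches over latent subgoals rather than raw images. A heavily simplified random-shooting sketch follows; the `value` stub stands in for the paper's learned goal-conditioned value function, the latent codes would come from its VAE encoder, and the actual method optimizes subgoals under the latent prior rather than by naive sampling.

```python
import numpy as np

LATENT = 8

def value(z_a, z_b):                      # stub: goal-conditioned reachability score
    return -np.linalg.norm(z_b - z_a)

def plan_subgoals(z_start, z_goal, n_subgoals=3, n_samples=512, rng=np.random):
    """Sample subgoal chains in latent space; keep the chain whose weakest
    link is most reachable, then hand subgoals to the policy one by one."""
    chains = rng.standard_normal((n_samples, n_subgoals, LATENT))
    scores = [min(value(a, b) for a, b in zip([z_start, *c], [*c, z_goal]))
              for c in chains]
    return chains[int(np.argmax(scores))]

# z_start / z_goal would be VAE encodings of the current and goal images
subgoals = plan_subgoals(np.zeros(LATENT), np.ones(LATENT))
```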
Accurate and Robust Alignment of Variable-stained Histologic Images Using a General-purpose Greedy Diffeomorphic Registration Tool
Title | Accurate and Robust Alignment of Variable-stained Histologic Images Using a General-purpose Greedy Diffeomorphic Registration Tool |
Authors | Ludovic Venet, Sarthak Pati, Paul Yushkevich, Spyridon Bakas |
Abstract | Variously stained histology slices are routinely used by pathologists to assess tissue samples extracted from different anatomical sites and to determine the presence or extent of a disease. Evaluating sequential slides is expected to enable a better understanding of the spatial arrangement and growth patterns of cells and vessels. In this paper we present a practical two-step approach based on diffeomorphic registration to align digitized sequential histopathology stained slides to each other, starting with an initial affine step followed by the estimation of a detailed deformation field. |
Tasks | |
Published | 2019-04-26 |
URL | http://arxiv.org/abs/1904.11929v1 |
PDF | http://arxiv.org/pdf/1904.11929v1.pdf |
PWC | https://paperswithcode.com/paper/accurate-and-robust-alignment-of-variable |
Repo | https://github.com/CBICA/HistoReg |
Framework | none |
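The paper wraps the general-purpose `greedy` registration tool. As a rough analogue of the two-step pipeline, here is a SimpleITK sketch: an affine initialization followed by a dense deformable refinement (B-spline here, rather than greedy diffeomorphic). File names are hypothetical and the slides are assumed pre-converted to grayscale.

```python
import SimpleITK as sitk

# Hypothetical paths; grayscale, intensity-standardized copies of the slides
fixed = sitk.ReadImage("he_slide.png", sitk.sitkFloat32)
moving = sitk.ReadImage("ihc_slide.png", sitk.sitkFloat32)

# Step 1: affine alignment (mutual information copes with stain differences)
reg = sitk.ImageRegistrationMethod()
reg.SetMetricAsMattesMutualInformation(numberOfHistogramBins=50)
reg.SetOptimizerAsGradientDescent(learningRate=1.0, numberOfIterations=200)
reg.SetOptimizerScalesFromPhysicalShift()
reg.SetInterpolator(sitk.sitkLinear)
init = sitk.CenteredTransformInitializer(
    fixed, moving, sitk.AffineTransform(fixed.GetDimension()),
    sitk.CenteredTransformInitializerFilter.GEOMETRY)
reg.SetInitialTransform(init, inPlace=False)
affine = reg.Execute(fixed, moving)
moving_affine = sitk.Resample(moving, fixed, affine, sitk.sitkLinear, 0.0)

# Step 2: detailed deformation field on the affinely aligned image
bspline = sitk.BSplineTransformInitializer(fixed, [8, 8])
reg.SetInitialTransform(bspline, inPlace=False)
deform = reg.Execute(fixed, moving_affine)
warped = sitk.Resample(moving_affine, fixed, deform, sitk.sitkLinear, 0.0)
```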
DP-LSTM: Differential Privacy-inspired LSTM for Stock Prediction Using Financial News
Title | DP-LSTM: Differential Privacy-inspired LSTM for Stock Prediction Using Financial News |
Authors | Xinyi Li, Yinchuan Li, Hongyang Yang, Liuqing Yang, Xiao-Yang Liu |
Abstract | Stock price prediction is important for value investment in the stock market. In particular, short-term prediction that exploits financial news articles has shown promise in recent years. In this paper, we propose a novel deep neural network, DP-LSTM, for stock price prediction, which incorporates news articles as hidden information and integrates different news sources through a differential privacy mechanism. First, based on the autoregressive moving average (ARMA) model, a sentiment-ARMA model is formulated by incorporating the information in financial news articles. Then, an LSTM-based deep neural network is designed, consisting of three components: an LSTM, the VADER model, and a differential privacy (DP) mechanism. The proposed DP-LSTM scheme reduces prediction errors and increases robustness. Extensive experiments on S&P 500 stocks show that (i) the proposed DP-LSTM achieves a 0.32% improvement in mean prediction accuracy (MPA), and (ii) for predicting the S&P 500 market index, we achieve up to a 65.79% improvement in MSE. |
Tasks | Stock Prediction, Stock Price Prediction |
Published | 2019-12-20 |
URL | https://arxiv.org/abs/1912.10806v1 |
PDF | https://arxiv.org/pdf/1912.10806v1.pdf |
PWC | https://paperswithcode.com/paper/dp-lstm-differential-privacy-inspired-lstm |
Repo | https://github.com/Xinyi6/DP-LSTM-Differential-Privacy-inspired-LSTM-for-Stock-Prediction-Using-Financial-News |
Framework | tf |
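A sketch of the two ingredients that are easy to reproduce: VADER sentiment scores for headlines and an LSTM over [return, sentiment] features. The paper's differential-privacy noise over news sources and the sentiment-ARMA term are omitted here, and the network shape is illustrative. Run `nltk.download('vader_lexicon')` once before using VADER.

```python
import torch
import torch.nn as nn
from nltk.sentiment.vader import SentimentIntensityAnalyzer

sia = SentimentIntensityAnalyzer()
headlines = ["Company X beats earnings expectations",
             "Regulators open probe into Company X"]
sentiment = [sia.polarity_scores(h)["compound"] for h in headlines]  # in [-1, 1]

class PriceLSTM(nn.Module):
    def __init__(self, n_features=2, hidden=32):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)
    def forward(self, x):               # x: (batch, days, [return, sentiment])
        out, _ = self.lstm(x)
        return self.head(out[:, -1])    # next-day return

x = torch.randn(4, 10, 2)               # toy batch: 4 series, 10-day windows
pred = PriceLSTM()(x)
```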
advertorch v0.1: An Adversarial Robustness Toolbox based on PyTorch
Title | advertorch v0.1: An Adversarial Robustness Toolbox based on PyTorch |
Authors | Gavin Weiguang Ding, Luyu Wang, Xiaomeng Jin |
Abstract | advertorch is a toolbox for adversarial robustness research. It contains various implementations of attacks, defenses, and robust training methods. advertorch is built on PyTorch (Paszke et al., 2017) and leverages the advantages of the dynamic computational graph to provide concise and efficient reference implementations. The code is licensed under the LGPL and is open-sourced at https://github.com/BorealisAI/advertorch . |
Tasks | Adversarial Attack, Adversarial Defense |
Published | 2019-02-20 |
URL | http://arxiv.org/abs/1902.07623v1 |
PDF | http://arxiv.org/pdf/1902.07623v1.pdf |
PWC | https://paperswithcode.com/paper/advertorch-v01-an-adversarial-robustness |
Repo | https://github.com/BorealisAI/advertorch |
Framework | pytorch |
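Usage follows the attacker-object pattern from the repository's README, for example an L-infinity PGD attack (the toy model and random inputs below are stand-ins for a trained classifier and real data):

```python
import torch
import torch.nn as nn
from advertorch.attacks import LinfPGDAttack

model = nn.Sequential(nn.Flatten(), nn.Linear(784, 10))  # toy MNIST-shaped classifier
model.eval()

adversary = LinfPGDAttack(
    model, loss_fn=nn.CrossEntropyLoss(reduction="sum"),
    eps=0.3, nb_iter=40, eps_iter=0.01,
    rand_init=True, clip_min=0.0, clip_max=1.0, targeted=False)

images = torch.rand(8, 1, 28, 28)
labels = torch.randint(0, 10, (8,))
adv_images = adversary.perturb(images, labels)   # L_inf-bounded PGD examples
```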
Learnable Gated Temporal Shift Module for Deep Video Inpainting
Title | Learnable Gated Temporal Shift Module for Deep Video Inpainting |
Authors | Ya-Liang Chang, Zhe Yu Liu, Kuan-Ying Lee, Winston Hsu |
Abstract | How to efficiently utilize temporal information to recover videos in a consistent way is the main issue for video inpainting. Conventional 2D CNNs have achieved good performance on image inpainting but often lead to temporally inconsistent results in which frames flicker when applied to videos (see https://www.youtube.com/watch?v=87Vh1HDBjD0&list=PLPoVtv-xp_dL5uckIzz1PKwNjg1yI0I94&index=1); 3D CNNs can capture temporal information but are computationally intensive and hard to train. In this paper, we present a novel component, the Learnable Gated Temporal Shift Module (LGTSM), for video inpainting models that can effectively tackle arbitrary video masks without the additional parameters of 3D convolutions. LGTSM is designed to let 2D convolutions make use of neighboring frames more efficiently, which is crucial for video inpainting. Specifically, in each layer, LGTSM learns to shift some channels to their temporal neighbors so that 2D convolutions are enhanced to handle temporal information. Meanwhile, a gated convolution is applied in each layer to identify the masked areas that poison conventional convolutions. On the FaceForensics and Free-form Video Inpainting (FVI) datasets, our model achieves state-of-the-art results with only 33% of the parameters and inference time. |
Tasks | Image Inpainting, Video Inpainting |
Published | 2019-07-02 |
URL | https://arxiv.org/abs/1907.01131v2 |
PDF | https://arxiv.org/pdf/1907.01131v2.pdf |
PWC | https://paperswithcode.com/paper/learnable-gated-temporal-shift-module-for |
Repo | https://github.com/amjltc295/Free-Form-Video-Inpainting |
Framework | pytorch |
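The module builds on the temporal shift idea: move a fraction of channels one frame forward or backward so that plain 2D convolutions see temporal context. Below is a sketch of the fixed (non-learnable, ungated) shift that LGTSM generalizes; the learnable shift kernels and gating are not shown.

```python
import torch

def temporal_shift(x, fold_div=8):
    """Plain TSM: shift 1/8 of channels one frame forward and 1/8 backward.
    x: (batch, time, channels, H, W)."""
    fold = x.size(2) // fold_div
    out = torch.zeros_like(x)
    out[:, 1:, :fold] = x[:, :-1, :fold]                   # take features from t-1
    out[:, :-1, fold:2 * fold] = x[:, 1:, fold:2 * fold]   # take features from t+1
    out[:, :, 2 * fold:] = x[:, :, 2 * fold:]              # remaining channels untouched
    return out

x = torch.randn(2, 5, 64, 32, 32)
y = temporal_shift(x)   # a following 2D conv now sees temporal context per frame
```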
Learning to combine primitive skills: A step towards versatile robotic manipulation
Title | Learning to combine primitive skills: A step towards versatile robotic manipulation |
Authors | Robin Strudel, Alexander Pashevich, Igor Kalevatykh, Ivan Laptev, Josef Sivic, Cordelia Schmid |
Abstract | Manipulation tasks such as preparing a meal or assembling furniture remain highly challenging for robotics and vision. Traditional task and motion planning (TAMP) methods can solve complex tasks but require full state observability and are not adapted to dynamic scene changes. Recent learning methods can operate directly on visual inputs but typically require many demonstrations and/or task-specific reward engineering. In this work we aim to overcome previous limitations and propose a reinforcement learning (RL) approach to task planning that learns to combine primitive skills. First, compared to previous learning methods, our approach requires neither intermediate rewards nor complete task demonstrations during training. Second, we demonstrate the versatility of our vision-based task planning in challenging settings with temporary occlusions and dynamic scene changes. Third, we propose an efficient training of basic skills from few synthetic demonstrations by exploring recent CNN architectures and data augmentation. Notably, while all of our policies are learned on visual inputs in simulated environments, we demonstrate the successful transfer and high success rates when applying such policies to manipulation tasks on a real UR5 robotic arm. |
Tasks | Data Augmentation, Imitation Learning, Motion Planning |
Published | 2019-08-02 |
URL | https://arxiv.org/abs/1908.00722v2 |
PDF | https://arxiv.org/pdf/1908.00722v2.pdf |
PWC | https://paperswithcode.com/paper/combining-learned-skills-and-reinforcement |
Repo | https://github.com/rstrudel/rlbc |
Framework | pytorch |
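Structurally, the method is a master policy selecting among pre-trained skill policies every few steps. A toy sketch with stubs follows; everything here is hypothetical, since in the paper the master policy is trained with RL on visual inputs and the skills by behavioral cloning from synthetic demonstrations.

```python
import numpy as np

# Hypothetical stand-ins for learned skill policies
def skill_grasp(obs): return np.array([0.0, 0.0, -1.0])   # stub action: move down
def skill_lift(obs):  return np.array([0.0, 0.0, +1.0])   # stub action: move up
SKILLS = [skill_grasp, skill_lift]

def master_policy(obs):             # trained with RL in the paper; a toy rule here
    return int(obs[2] < 0.1)        # switch to "lift" once the gripper is low

def rollout(env_step, obs, horizon=50, skill_len=10):
    for t in range(horizon):
        if t % skill_len == 0:      # re-select a primitive skill every few steps
            skill = SKILLS[master_policy(obs)]
        obs = env_step(skill(obs))
    return obs

final = rollout(lambda action: np.random.randn(3), np.zeros(3))  # dummy environment
```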
Omnidirectional Scene Text Detection with Sequential-free Box Discretization
Title | Omnidirectional Scene Text Detection with Sequential-free Box Discretization |
Authors | Yuliang Liu, Sheng Zhang, Lianwen Jin, Lele Xie, Yaqiang Wu, Zhepeng Wang |
Abstract | Scene text in the wild commonly exhibits highly variable characteristics, and using a quadrilateral bounding box to localize each text instance is nearly indispensable for detection methods. However, recent research reveals that quadrilateral bounding boxes for scene text detection introduce an easily overlooked label-confusion issue that may significantly undermine detection performance. To address this issue, in this paper we propose a novel method called Sequential-free Box Discretization (SBD), which discretizes the bounding box into key edges (KE) and can further derive more effective methods to improve detection performance. Experiments show that the proposed method outperforms state-of-the-art methods on many popular scene text benchmarks, including ICDAR 2015, MLT, and MSRA-TD500. An ablation study also shows that simply integrating SBD into the Mask R-CNN framework substantially improves detection performance. Furthermore, an experiment on the general object dataset HRSC2016 (multi-oriented ships) shows that our method outperforms recent state-of-the-art methods by a large margin, demonstrating its strong generalization ability. Source code: https://github.com/Yuliang-Liu/Box_Discretization_Network. |
Tasks | Scene Text Detection |
Published | 2019-06-06 |
URL | https://arxiv.org/abs/1906.02371v3 |
PDF | https://arxiv.org/pdf/1906.02371v3.pdf |
PWC | https://paperswithcode.com/paper/omnidirectional-scene-text-detection-with |
Repo | https://github.com/Yuliang-Liu/Box_Discretization_Network |
Framework | pytorch |
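The key-edge idea can be shown in a few lines: encode a quadrilateral by the sorted x and y coordinates of its corners, which is invariant to how the ground-truth points happen to be ordered. SBD additionally predicts a "match type" to reassemble corners from the key edges, which this sketch omits.

```python
import numpy as np

def quad_to_key_edges(quad):
    """Discretize an arbitrarily ordered quadrilateral into key edges:
    the sorted x and y coordinates of its corners. Any permutation of
    the four points yields the same target, removing label confusion."""
    quad = np.asarray(quad, dtype=float)        # (4, 2) corners, any order
    return np.sort(quad[:, 0]), np.sort(quad[:, 1])

box = [(10, 5), (50, 8), (48, 40), (8, 36)]
xs, ys = quad_to_key_edges(box)
assert np.allclose(xs, quad_to_key_edges(box[::-1])[0])  # order-invariant
```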
Enforcing geometric constraints of virtual normal for depth prediction
Title | Enforcing geometric constraints of virtual normal for depth prediction |
Authors | Wei Yin, Yifan Liu, Chunhua Shen, Youliang Yan |
Abstract | Monocular depth prediction plays a crucial role in understanding 3D scene geometry. Although recent methods have achieved impressive progress on evaluation metrics such as the pixel-wise relative error, most methods neglect geometric constraints in 3D space. In this work, we show the importance of high-order 3D geometric constraints for depth prediction. By designing a loss term that enforces one simple type of geometric constraint, namely virtual normal directions determined by three randomly sampled points in the reconstructed 3D space, we can considerably improve depth prediction accuracy. Significantly, a byproduct of the predicted depth being sufficiently accurate is that we can now recover good 3D structures of the scene, such as the point cloud and surface normals, directly from the depth, eliminating the need to train new sub-models as was previously done. Experiments on two benchmarks, NYU Depth-V2 and KITTI, demonstrate the effectiveness of our method and its state-of-the-art performance. |
Tasks | Depth Estimation, Monocular Depth Estimation |
Published | 2019-07-29 |
URL | https://arxiv.org/abs/1907.12209v2 |
PDF | https://arxiv.org/pdf/1907.12209v2.pdf |
PWC | https://paperswithcode.com/paper/enforcing-geometric-constraints-of-virtual |
Repo | https://github.com/YvanYin/VNL_Monocular_Depth_Prediction |
Framework | pytorch |
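A sketch of the virtual-normal loss in PyTorch: sample random triplets of 3D points, form the normal of the plane each triplet spans, and penalize the difference between normals from predicted and ground-truth depth. The paper also rejects near-colinear triplets and back-projects depth maps with camera intrinsics; both steps are omitted here.

```python
import torch
import torch.nn.functional as F

def virtual_normal_loss(pts_pred, pts_gt, n_triplets=1000):
    """pts_*: (N, 3) point clouds back-projected from predicted / ground-truth depth."""
    idx = torch.randint(0, pts_pred.size(0), (n_triplets, 3))
    def normals(p):
        a, b, c = p[idx[:, 0]], p[idx[:, 1]], p[idx[:, 2]]
        # Unit normal of the plane through each sampled triplet
        return F.normalize(torch.cross(b - a, c - a, dim=1), dim=1)
    return (normals(pts_pred) - normals(pts_gt)).norm(dim=1).mean()

loss = virtual_normal_loss(torch.rand(5000, 3), torch.rand(5000, 3))
```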
Delog: A Privacy Preserving Log Filtering Framework for Online Compute Platforms
Title | Delog: A Privacy Preserving Log Filtering Framework for Online Compute Platforms |
Authors | Amey Agrawal, Abhishek Dixit, Namrata Shettar, Darshil Kapadia, Rohit Karlupia, Vikram Agrawal, Rajat Gupta |
Abstract | In many software applications, logs serve as the only interface between the application and the developer. However, navigating through the logs of long-running applications is often challenging. Logs from previously successful application runs can be leveraged to automatically identify errors and provide users with only the logs that are relevant to the debugging process. We describe a privacy-preserving framework which can be employed by Platform as a Service (PaaS) providers to utilize the user logs generated on the platform while protecting the potentially sensitive logged data. Further, in order to accurately and scalably parse log lines, we present a distributed log parsing algorithm which leverages Locality Sensitive Hashing (LSH). We outperform the state of the art on multiple datasets. We further demonstrate the scalability of Delog on the publicly available Thunderbird log dataset with close to 27,000 unique patterns and 211 million lines. |
Tasks | |
Published | 2019-02-13 |
URL | https://arxiv.org/abs/1902.04843v3 |
PDF | https://arxiv.org/pdf/1902.04843v3.pdf |
PWC | https://paperswithcode.com/paper/delog-a-privacy-preserving-log-filtering |
Repo | https://github.com/qubole/qubole-log-datasets |
Framework | none |
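The LSH-based grouping step can be approximated with `datasketch` MinHash. This is not the paper's distributed algorithm, and masking numeric tokens below is only a crude stand-in for its pattern templating, but it shows how similar log lines hash into the same bucket.

```python
from datasketch import MinHash, MinHashLSH

def minhash(line, num_perm=64):
    m = MinHash(num_perm=num_perm)
    for token in line.split():
        # Mask variable numeric fields so lines with the same template collide
        m.update(b"<num>" if token.isdigit() else token.encode("utf8"))
    return m

lsh = MinHashLSH(threshold=0.7, num_perm=64)
logs = ["Task 17 finished in 210 ms",
        "Task 42 finished in 95 ms",
        "OutOfMemoryError on executor 3"]
for i, line in enumerate(logs):
    lsh.insert(f"line-{i}", minhash(line))

# Lines matching a known-good pattern can be filtered out as routine
print(lsh.query(minhash("Task 99 finished in 5 ms")))  # -> ['line-0', 'line-1']
```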
Gromov-Wasserstein Learning for Graph Matching and Node Embedding
Title | Gromov-Wasserstein Learning for Graph Matching and Node Embedding |
Authors | Hongteng Xu, Dixin Luo, Hongyuan Zha, Lawrence Carin |
Abstract | A novel Gromov-Wasserstein learning framework is proposed to jointly match (align) graphs and learn embedding vectors for the associated graph nodes. Using the Gromov-Wasserstein discrepancy, we measure the dissimilarity between two graphs and find their correspondence, according to the learned optimal transport. The node embeddings associated with the two graphs are learned under the guidance of the optimal transport, the distance of which not only reflects the topological structure of each graph but also yields the correspondence across the graphs. These two learning steps are mutually beneficial, and are unified here by minimizing the Gromov-Wasserstein discrepancy with structural regularizers. This framework leads to an optimization problem that is solved by a proximal point method. We apply the proposed method to matching problems in real-world networks, and demonstrate its superior performance compared to alternative approaches. |
Tasks | Graph Matching |
Published | 2019-01-17 |
URL | https://arxiv.org/abs/1901.06003v2 |
PDF | https://arxiv.org/pdf/1901.06003v2.pdf |
PWC | https://paperswithcode.com/paper/gromov-wasserstein-learning-for-graph |
Repo | https://github.com/HongtengXu/gwl |
Framework | pytorch |
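The matching component is available off the shelf in POT (Python Optimal Transport). The paper additionally learns node embeddings jointly and solves the problem with a proximal point method, which this sketch omits; it only shows Gromov-Wasserstein matching between two structure matrices.

```python
import numpy as np
import ot  # POT: Python Optimal Transport

# Two toy graphs given as adjacency matrices (same 4-cycle, nodes permuted)
A1 = np.array([[0, 1, 1, 0],
               [1, 0, 0, 1],
               [1, 0, 0, 1],
               [0, 1, 1, 0]], dtype=float)
perm = [2, 3, 0, 1]
A2 = A1[np.ix_(perm, perm)]

# Uniform node distributions; adjacency as the intra-graph structure matrix
p, q = ot.unif(4), ot.unif(4)
T = ot.gromov.gromov_wasserstein(A1, A2, p, q, "square_loss")

matching = T.argmax(axis=1)   # node correspondence read off the coupling
```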
Generalizing Monocular 3D Human Pose Estimation in the Wild
Title | Generalizing Monocular 3D Human Pose Estimation in the Wild |
Authors | Luyang Wang, Yan Chen, Zhenhua Guo, Keyuan Qian, Mude Lin, Hongsheng Li, Jimmy S. Ren |
Abstract | The availability of large-scale labeled 3D poses in the Human3.6M dataset plays an important role in advancing algorithms for 3D human pose estimation from a still image. We observe that recent innovation in this area mainly focuses on new techniques that explicitly address the generalization issue when using this dataset, because the database was constructed in a highly controlled environment with limited human subjects and background variation. Despite such efforts, we show that the results of current methods are still error-prone, especially when tested on images taken in the wild. In this paper, we aim to tackle this problem from a different perspective. We propose a principled approach to generate high-quality 3D pose ground truth for any in-the-wild image containing a person. We achieve this by first devising a novel stereo-inspired neural network that directly maps any 2D pose to its high-quality 3D counterpart. We then apply a carefully designed geometric search scheme to further refine the joints. Based on this scheme, we build a large-scale dataset of 400,000 in-the-wild images and their corresponding 3D pose ground truth. This enables training a high-quality neural network model, without any specialized training scheme or auxiliary loss function, that performs favorably against state-of-the-art 3D pose estimation methods. We also evaluate the generalization ability of our model both quantitatively and qualitatively. Results show that our approach convincingly outperforms previous methods. We make our dataset and code publicly available. |
Tasks | 3D Human Pose Estimation, 3D Pose Estimation, Pose Estimation |
Published | 2019-04-11 |
URL | http://arxiv.org/abs/1904.05512v1 |
PDF | http://arxiv.org/pdf/1904.05512v1.pdf |
PWC | https://paperswithcode.com/paper/generalizing-monocular-3d-human-pose |
Repo | https://github.com/llcshappy/Monocular-3D-Human-Pose |
Framework | tf |
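The 2D-to-3D lifting step can be sketched as a simple fully connected network (a Martinez-style lifter; the paper's stereo-inspired generator and geometric search refinement are more elaborate, so treat the architecture below as illustrative):

```python
import torch
import torch.nn as nn

class Lifter(nn.Module):
    """Map normalized 2D keypoints to 3D joint positions."""
    def __init__(self, n_joints=16, hidden=1024):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_joints * 2, hidden), nn.ReLU(), nn.Dropout(0.5),
            nn.Linear(hidden, hidden), nn.ReLU(), nn.Dropout(0.5),
            nn.Linear(hidden, n_joints * 3))
    def forward(self, pose2d):                 # pose2d: (batch, n_joints, 2)
        out = self.net(pose2d.flatten(1))
        return out.view(-1, pose2d.size(1), 3)

pose2d = torch.randn(8, 16, 2)                 # normalized 2D keypoints
pose3d = Lifter()(pose2d)                      # (8, 16, 3)
```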