Paper Group AWR 336
Hamiltonian Neural Networks. Passage Re-ranking with BERT. FSS-1000: A 1000-Class Dataset for Few-Shot Segmentation. A Real-Time Wideband Neural Vocoder at 1.6 kb/s Using LPCNet. Planning with Goal-Conditioned Policies. Accurate and Robust Alignment of Variable-stained Histologic Images Using a General-purpose Greedy Diffeomorphic Registration Tool …
Hamiltonian Neural Networks
Title | Hamiltonian Neural Networks |
Authors | Sam Greydanus, Misko Dzamba, Jason Yosinski |
Abstract | Even though neural networks enjoy widespread use, they still struggle to learn the basic laws of physics. How might we endow them with better inductive biases? In this paper, we draw inspiration from Hamiltonian mechanics to train models that learn and respect exact conservation laws in an unsupervised manner. We evaluate our models on problems where conservation of energy is important, including the two-body problem and pixel observations of a pendulum. Our model trains faster and generalizes better than a regular neural network. An interesting side effect is that our model is perfectly reversible in time. |
Tasks | |
Published | 2019-06-04 |
URL | https://arxiv.org/abs/1906.01563v3 |
PDF | https://arxiv.org/pdf/1906.01563v3.pdf |
PWC | https://paperswithcode.com/paper/hamiltonian-neural-networks |
Repo | https://github.com/greydanus/hamiltonian-nn |
Framework | pytorch |
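The core trick is easy to show in code. Below is a minimal PyTorch sketch, not the authors' implementation (shapes and hyperparameters are illustrative): a network outputs a scalar H(q, p), and the predicted dynamics are its symplectic gradient, so the learned H is conserved along trajectories.

```python
import torch
import torch.nn as nn

class HNN(nn.Module):
    """Learn a scalar Hamiltonian H(q, p); dynamics follow its symplectic gradient."""
    def __init__(self, dim=2, hidden=200):
        super().__init__()
        self.H = nn.Sequential(nn.Linear(dim, hidden), nn.Tanh(), nn.Linear(hidden, 1))

    def time_derivative(self, x):
        # x = [q, p]; Hamilton's equations: dq/dt = dH/dp, dp/dt = -dH/dq
        x = x.requires_grad_(True)
        dH = torch.autograd.grad(self.H(x).sum(), x, create_graph=True)[0]
        dHdq, dHdp = dH.chunk(2, dim=-1)
        return torch.cat([dHdp, -dHdq], dim=-1)

model = HNN()
x = torch.randn(32, 2)       # observed (q, p) states
dxdt = torch.randn(32, 2)    # their time derivatives (e.g. from finite differences)
loss = ((model.time_derivative(x) - dxdt) ** 2).mean()
loss.backward()              # train H so its symplectic gradient matches the data
```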
Passage Re-ranking with BERT
Title | Passage Re-ranking with BERT |
Authors | Rodrigo Nogueira, Kyunghyun Cho |
Abstract | Recently, neural models pretrained on a language modeling task, such as ELMo (Peters et al., 2017), OpenAI GPT (Radford et al., 2018), and BERT (Devlin et al., 2018), have achieved impressive results on various natural language processing tasks such as question-answering and natural language inference. In this paper, we describe a simple re-implementation of BERT for query-based passage re-ranking. Our system is the state of the art on the TREC-CAR dataset and the top entry in the leaderboard of the MS MARCO passage retrieval task, outperforming the previous state of the art by 27% (relative) in MRR@10. The code to reproduce our results is available at https://github.com/nyu-dl/dl4marco-bert |
Tasks | Passage Re-Ranking |
Published | 2019-01-13 |
URL | http://arxiv.org/abs/1901.04085v4 |
PDF | http://arxiv.org/pdf/1901.04085v4.pdf |
PWC | https://paperswithcode.com/paper/passage-re-ranking-with-bert |
Repo | https://github.com/nyu-dl/dl4marco-bert |
Framework | tf |
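The recipe is a pointwise cross-encoder: feed the concatenated query and passage to BERT and rank by the relevance probability from the classification head. Here is a sketch with Hugging Face `transformers`; the authors released TensorFlow code, and the untuned `bert-base-uncased` head below is only a placeholder for their fine-tuned MS MARCO weights.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)
model.eval()

query = "what causes tides"
passages = ["Tides are caused by the gravitational pull of the moon ...",
            "The stock market closed higher on Monday ..."]

with torch.no_grad():
    enc = tok([query] * len(passages), passages,
              truncation=True, padding=True, return_tensors="pt")
    # P(relevant | query, passage) from the [CLS] classification head
    scores = model(**enc).logits.softmax(dim=-1)[:, 1]

ranked = sorted(zip(scores.tolist(), passages), reverse=True)
```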
FSS-1000: A 1000-Class Dataset for Few-Shot Segmentation
Title | FSS-1000: A 1000-Class Dataset for Few-Shot Segmentation |
Authors | Tianhan Wei, Xiang Li, Yau Pun Chen, Yu-Wing Tai, Chi-Keung Tang |
Abstract | Over the past few years, we have witnessed the success of deep learning in image recognition thanks to the availability of large-scale human-annotated datasets such as PASCAL VOC, ImageNet, and COCO. Although these datasets cover a wide range of object categories, a significant number of objects are still not included. Can we perform the same task without many human annotations? In this paper, we are interested in few-shot object segmentation, where the number of annotated training examples is limited to only 5. To evaluate and validate the performance of our approach, we have built a few-shot segmentation dataset, FSS-1000, which consists of 1000 object classes with pixelwise ground-truth segmentation annotations. Uniquely, FSS-1000 contains a significant number of objects that have never been seen or annotated in previous datasets, such as tiny daily objects, merchandise, cartoon characters, and logos. We build our baseline model using standard backbone networks such as VGG-16, ResNet-101, and Inception. To our surprise, we found that training our model from scratch on FSS-1000 achieves comparable or even better results than training with weights pre-trained on ImageNet, which is more than 100 times larger than FSS-1000. Both our approach and dataset are simple, effective, and easily extensible to learning segmentation of new object classes given very few annotated training examples. The dataset is available at https://github.com/HKUSTCV/FSS-1000. |
Tasks | Few-Shot Semantic Segmentation, Semantic Segmentation |
Published | 2019-07-29 |
URL | https://arxiv.org/abs/1907.12347v1 |
PDF | https://arxiv.org/pdf/1907.12347v1.pdf |
PWC | https://paperswithcode.com/paper/fss-1000-a-1000-class-dataset-for-few-shot |
Repo | https://github.com/HKUSTCV/FSS-1000 |
Framework | pytorch |
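As a flavor of what a few-shot segmentation baseline on FSS-1000 computes, here is a prototype-style sketch in PyTorch (masked average pooling over support features plus cosine similarity to the query). The paper's baseline is a relation-network-style architecture, so treat this as an illustrative stand-in, not the authors' model.

```python
import torch
import torch.nn.functional as F

def prototype_segment(feat_s, mask_s, feat_q):
    """feat_s / feat_q: (C, H, W) backbone features; mask_s: (H, W) binary support mask."""
    # Masked average pooling: one foreground prototype from the support example
    proto = (feat_s * mask_s).flatten(1).sum(1) / mask_s.sum().clamp(min=1)   # (C,)
    # Cosine similarity between the prototype and every query location
    sim = F.cosine_similarity(feat_q, proto[:, None, None], dim=0)            # (H, W)
    return sim > 0.5   # crude binary prediction

C, H, W = 256, 32, 32
pred = prototype_segment(torch.randn(C, H, W),
                         (torch.rand(H, W) > 0.7).float(),
                         torch.randn(C, H, W))
```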
A Real-Time Wideband Neural Vocoder at 1.6 kb/s Using LPCNet
Title | A Real-Time Wideband Neural Vocoder at 1.6 kb/s Using LPCNet |
Authors | Jean-Marc Valin, Jan Skoglund |
Abstract | Neural speech synthesis algorithms are a promising new approach for coding speech at very low bitrates. They have so far demonstrated quality that far exceeds that of traditional vocoders, at the cost of very high complexity. In this work, we present a low-bitrate neural vocoder based on the LPCNet model. The use of linear prediction and sparse recurrent networks makes it possible to achieve real-time operation on general-purpose hardware. We demonstrate that LPCNet operating at 1.6 kb/s achieves significantly higher quality than MELP and that uncompressed LPCNet can exceed the quality of a waveform codec operating at a low bitrate. This opens the way for new codec designs based on neural synthesis models. |
Tasks | Speech Synthesis |
Published | 2019-03-28 |
URL | https://arxiv.org/abs/1903.12087v2 |
PDF | https://arxiv.org/pdf/1903.12087v2.pdf |
PWC | https://paperswithcode.com/paper/a-real-time-wideband-neural-vocoder-at-16-kbs |
Repo | https://github.com/mozilla/LPCNet |
Framework | none |
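The "LPC" half of LPCNet is classical signal processing: per frame, solve for linear-prediction coefficients and let the network model only the residual. A NumPy sketch of that step follows (Levinson-Durbin on the frame autocorrelation; frame size and order are illustrative, and the neural part is not shown).

```python
import numpy as np

def lpc_coeffs(frame, order=16):
    """Levinson-Durbin: find a[k] so that s[n] is approximated by sum_k a[k] * s[n-1-k]."""
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:len(frame) + order]
    a, err = np.zeros(order), r[0]
    for i in range(order):
        k = (r[i + 1] - a[:i] @ r[1:i + 1][::-1]) / err
        a[:i], a[i] = a[:i] - k * a[:i][::-1], k
        err *= 1.0 - k * k
    return a

# One 20 ms frame at 16 kHz (a noisy tone as toy input)
frame = np.sin(0.3 * np.arange(320)) + 0.01 * np.random.randn(320)
a = lpc_coeffs(frame)
pred = np.convolve(frame, np.concatenate(([0.0], a)))[:len(frame)]
residual = frame - pred   # the part LPCNet's recurrent network actually models
```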
Planning with Goal-Conditioned Policies
Title | Planning with Goal-Conditioned Policies |
Authors | Soroush Nasiriany, Vitchyr H. Pong, Steven Lin, Sergey Levine |
Abstract | Planning methods can solve temporally extended sequential decision making problems by composing simple behaviors. However, planning requires suitable abstractions for the states and transitions, which typically need to be designed by hand. In contrast, model-free reinforcement learning (RL) can acquire behaviors from low-level inputs directly, but often struggles with temporally extended tasks. Can we utilize reinforcement learning to automatically form the abstractions needed for planning, thus obtaining the best of both approaches? We show that goal-conditioned policies learned with RL can be incorporated into planning, so that a planner can focus on which states to reach, rather than how those states are reached. However, with complex state observations such as images, not all inputs represent valid states. We therefore also propose using a latent variable model to compactly represent the set of valid states for the planner, so that the policies provide an abstraction of actions, and the latent variable model provides an abstraction of states. We compare our method with planning-based and model-free methods and find that our method significantly outperforms prior work when evaluated on image-based robot navigation and manipulation tasks that require non-greedy, multi-staged behavior. |
Tasks | Decision Making, Robot Navigation |
Published | 2019-11-19 |
URL | https://arxiv.org/abs/1911.08453v1 |
PDF | https://arxiv.org/pdf/1911.08453v1.pdf |
PWC | https://paperswithcode.com/paper/planning-with-goal-conditioned-policies-1 |
Repo | https://github.com/snasiriany/leap |
Framework | none |
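The planner searches over latent subgoals rather than raw images. A heavily simplified random-shooting sketch follows; the `value` stub stands in for the paper's learned goal-conditioned value function, the latent codes would come from its VAE encoder, and the actual method optimizes subgoals under the latent prior rather than by naive sampling.

```python
import numpy as np

LATENT = 8

def value(z_a, z_b):                      # stub: goal-conditioned reachability score
    return -np.linalg.norm(z_b - z_a)

def plan_subgoals(z_start, z_goal, n_subgoals=3, n_samples=512, rng=np.random):
    """Sample subgoal chains in latent space; keep the chain whose weakest
    link is most reachable, then hand subgoals to the policy one by one."""
    chains = rng.standard_normal((n_samples, n_subgoals, LATENT))
    scores = [min(value(a, b) for a, b in zip([z_start, *c], [*c, z_goal]))
              for c in chains]
    return chains[int(np.argmax(scores))]

# z_start / z_goal would be VAE encodings of the current and goal images
subgoals = plan_subgoals(np.zeros(LATENT), np.ones(LATENT))
```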
Accurate and Robust Alignment of Variable-stained Histologic Images Using a General-purpose Greedy Diffeomorphic Registration Tool
Title | Accurate and Robust Alignment of Variable-stained Histologic Images Using a General-purpose Greedy Diffeomorphic Registration Tool |
Authors | Ludovic Venet, Sarthak Pati, Paul Yushkevich, Spyridon Bakas |
Abstract | Variously stained histology slices are routinely used by pathologists to assess tissue samples extracted from different anatomical sites and to determine the presence or extent of a disease. Evaluating sequential slides is expected to enable a better understanding of the spatial arrangement and growth patterns of cells and vessels. In this paper we present a practical two-step approach based on diffeomorphic registration to align digitized sequential histopathology stained slides to each other, starting with an initial affine step followed by the estimation of a detailed deformation field. |
Tasks | |
Published | 2019-04-26 |
URL | http://arxiv.org/abs/1904.11929v1 |
PDF | http://arxiv.org/pdf/1904.11929v1.pdf |
PWC | https://paperswithcode.com/paper/accurate-and-robust-alignment-of-variable |
Repo | https://github.com/CBICA/HistoReg |
Framework | none |
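The paper wraps the general-purpose `greedy` registration tool. As a rough analogue of the two-step pipeline, here is a SimpleITK sketch: an affine initialization followed by a dense deformable refinement (B-spline here, rather than greedy diffeomorphic). File names are hypothetical and the slides are assumed pre-converted to grayscale.

```python
import SimpleITK as sitk

# Hypothetical paths; grayscale, intensity-standardized copies of the slides
fixed = sitk.ReadImage("he_slide.png", sitk.sitkFloat32)
moving = sitk.ReadImage("ihc_slide.png", sitk.sitkFloat32)

# Step 1: affine alignment (mutual information copes with stain differences)
reg = sitk.ImageRegistrationMethod()
reg.SetMetricAsMattesMutualInformation(numberOfHistogramBins=50)
reg.SetOptimizerAsGradientDescent(learningRate=1.0, numberOfIterations=200)
reg.SetOptimizerScalesFromPhysicalShift()
reg.SetInterpolator(sitk.sitkLinear)
init = sitk.CenteredTransformInitializer(
    fixed, moving, sitk.AffineTransform(fixed.GetDimension()),
    sitk.CenteredTransformInitializerFilter.GEOMETRY)
reg.SetInitialTransform(init, inPlace=False)
affine = reg.Execute(fixed, moving)
moving_affine = sitk.Resample(moving, fixed, affine, sitk.sitkLinear, 0.0)

# Step 2: detailed deformation field on the affinely aligned image
bspline = sitk.BSplineTransformInitializer(fixed, [8, 8])
reg.SetInitialTransform(bspline, inPlace=False)
deform = reg.Execute(fixed, moving_affine)
warped = sitk.Resample(moving_affine, fixed, deform, sitk.sitkLinear, 0.0)
```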
DP-LSTM: Differential Privacy-inspired LSTM for Stock Prediction Using Financial News
Title | DP-LSTM: Differential Privacy-inspired LSTM for Stock Prediction Using Financial News |
Authors | Xinyi Li, Yinchuan Li, Hongyang Yang, Liuqing Yang, Xiao-Yang Liu |
Abstract | Stock price prediction is important for value investment in the stock market. In particular, short-term prediction that exploits financial news articles has shown promise in recent years. In this paper, we propose a novel deep neural network, DP-LSTM, for stock price prediction, which incorporates news articles as hidden information and integrates different news sources through a differential privacy mechanism. First, based on the autoregressive moving average (ARMA) model, a sentiment-ARMA model is formulated by incorporating the information in financial news articles. Then, an LSTM-based deep neural network is designed, consisting of three components: an LSTM, the VADER model, and a differential privacy (DP) mechanism. The proposed DP-LSTM scheme reduces prediction errors and increases robustness. Extensive experiments on S&P 500 stocks show that (i) the proposed DP-LSTM achieves a 0.32% improvement in mean prediction accuracy (MPA), and (ii) for predicting the S&P 500 market index, we achieve up to a 65.79% improvement in MSE. |
Tasks | Stock Prediction, Stock Price Prediction |
Published | 2019-12-20 |
URL | https://arxiv.org/abs/1912.10806v1 |
PDF | https://arxiv.org/pdf/1912.10806v1.pdf |
PWC | https://paperswithcode.com/paper/dp-lstm-differential-privacy-inspired-lstm |
Repo | https://github.com/Xinyi6/DP-LSTM-Differential-Privacy-inspired-LSTM-for-Stock-Prediction-Using-Financial-News |
Framework | tf |
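A sketch of the two ingredients that are easy to reproduce: VADER sentiment scores for headlines and an LSTM over [return, sentiment] features. The paper's differential-privacy noise over news sources and the sentiment-ARMA term are omitted here, and the network shape is illustrative. Run `nltk.download('vader_lexicon')` once before using VADER.

```python
import torch
import torch.nn as nn
from nltk.sentiment.vader import SentimentIntensityAnalyzer

sia = SentimentIntensityAnalyzer()
headlines = ["Company X beats earnings expectations",
             "Regulators open probe into Company X"]
sentiment = [sia.polarity_scores(h)["compound"] for h in headlines]  # in [-1, 1]

class PriceLSTM(nn.Module):
    def __init__(self, n_features=2, hidden=32):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)
    def forward(self, x):               # x: (batch, days, [return, sentiment])
        out, _ = self.lstm(x)
        return self.head(out[:, -1])    # next-day return

x = torch.randn(4, 10, 2)               # toy batch: 4 series, 10-day windows
pred = PriceLSTM()(x)
```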
advertorch v0.1: An Adversarial Robustness Toolbox based on PyTorch
Title | advertorch v0.1: An Adversarial Robustness Toolbox based on PyTorch |
Authors | Gavin Weiguang Ding, Luyu Wang, Xiaomeng Jin |
Abstract | advertorch is a toolbox for adversarial robustness research. It contains various implementations of attacks, defenses, and robust training methods. advertorch is built on PyTorch (Paszke et al., 2017) and leverages the advantages of the dynamic computational graph to provide concise and efficient reference implementations. The code is licensed under the LGPL and is open-sourced at https://github.com/BorealisAI/advertorch . |
Tasks | Adversarial Attack, Adversarial Defense |
Published | 2019-02-20 |
URL | http://arxiv.org/abs/1902.07623v1 |
PDF | http://arxiv.org/pdf/1902.07623v1.pdf |
PWC | https://paperswithcode.com/paper/advertorch-v01-an-adversarial-robustness |
Repo | https://github.com/BorealisAI/advertorch |
Framework | pytorch |
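Usage follows the attacker-object pattern from the repository's README, for example an L-infinity PGD attack (the toy model and random inputs below are stand-ins for a trained classifier and real data):

```python
import torch
import torch.nn as nn
from advertorch.attacks import LinfPGDAttack

model = nn.Sequential(nn.Flatten(), nn.Linear(784, 10))  # toy MNIST-shaped classifier
model.eval()

adversary = LinfPGDAttack(
    model, loss_fn=nn.CrossEntropyLoss(reduction="sum"),
    eps=0.3, nb_iter=40, eps_iter=0.01,
    rand_init=True, clip_min=0.0, clip_max=1.0, targeted=False)

images = torch.rand(8, 1, 28, 28)
labels = torch.randint(0, 10, (8,))
adv_images = adversary.perturb(images, labels)   # L_inf-bounded PGD examples
```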
Learnable Gated Temporal Shift Module for Deep Video Inpainting
Title | Learnable Gated Temporal Shift Module for Deep Video Inpainting |
Authors | Ya-Liang Chang, Zhe Yu Liu, Kuan-Ying Lee, Winston Hsu |
Abstract | How to efficiently utilize temporal information to recover videos in a consistent way is the main issue for video inpainting. Conventional 2D CNNs have achieved good performance on image inpainting but often lead to temporally inconsistent results in which frames flicker when applied to videos (see https://www.youtube.com/watch?v=87Vh1HDBjD0&list=PLPoVtv-xp_dL5uckIzz1PKwNjg1yI0I94&index=1); 3D CNNs can capture temporal information but are computationally intensive and hard to train. In this paper, we present a novel component, the Learnable Gated Temporal Shift Module (LGTSM), for video inpainting models that can effectively tackle arbitrary video masks without the additional parameters of 3D convolutions. LGTSM is designed to let 2D convolutions make use of neighboring frames more efficiently, which is crucial for video inpainting. Specifically, in each layer, LGTSM learns to shift some channels to their temporal neighbors so that 2D convolutions are enhanced to handle temporal information. Meanwhile, a gated convolution is applied in each layer to identify the masked areas that poison conventional convolutions. On the FaceForensics and Free-form Video Inpainting (FVI) datasets, our model achieves state-of-the-art results with only 33% of the parameters and inference time. |
Tasks | Image Inpainting, Video Inpainting |
Published | 2019-07-02 |
URL | https://arxiv.org/abs/1907.01131v2 |
PDF | https://arxiv.org/pdf/1907.01131v2.pdf |
PWC | https://paperswithcode.com/paper/learnable-gated-temporal-shift-module-for |
Repo | https://github.com/amjltc295/Free-Form-Video-Inpainting |
Framework | pytorch |
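The module builds on the temporal shift idea: move a fraction of channels one frame forward or backward so that plain 2D convolutions see temporal context. Below is a sketch of the fixed (non-learnable, ungated) shift that LGTSM generalizes; the learnable shift kernels and gating are not shown.

```python
import torch

def temporal_shift(x, fold_div=8):
    """Plain TSM: shift 1/8 of channels one frame forward and 1/8 backward.
    x: (batch, time, channels, H, W)."""
    fold = x.size(2) // fold_div
    out = torch.zeros_like(x)
    out[:, 1:, :fold] = x[:, :-1, :fold]                   # take features from t-1
    out[:, :-1, fold:2 * fold] = x[:, 1:, fold:2 * fold]   # take features from t+1
    out[:, :, 2 * fold:] = x[:, :, 2 * fold:]              # remaining channels untouched
    return out

x = torch.randn(2, 5, 64, 32, 32)
y = temporal_shift(x)   # a following 2D conv now sees temporal context per frame
```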
Learning to combine primitive skills: A step towards versatile robotic manipulation
Title | Learning to combine primitive skills: A step towards versatile robotic manipulation |
Authors | Robin Strudel, Alexander Pashevich, Igor Kalevatykh, Ivan Laptev, Josef Sivic, Cordelia Schmid |
Abstract | Manipulation tasks such as preparing a meal or assembling furniture remain highly challenging for robotics and vision. Traditional task and motion planning (TAMP) methods can solve complex tasks but require full state observability and are not adapted to dynamic scene changes. Recent learning methods can operate directly on visual inputs but typically require many demonstrations and/or task-specific reward engineering. In this work we aim to overcome previous limitations and propose a reinforcement learning (RL) approach to task planning that learns to combine primitive skills. First, compared to previous learning methods, our approach requires neither intermediate rewards nor complete task demonstrations during training. Second, we demonstrate the versatility of our vision-based task planning in challenging settings with temporary occlusions and dynamic scene changes. Third, we propose an efficient training of basic skills from few synthetic demonstrations by exploring recent CNN architectures and data augmentation. Notably, while all of our policies are learned on visual inputs in simulated environments, we demonstrate the successful transfer and high success rates when applying such policies to manipulation tasks on a real UR5 robotic arm. |
Tasks | Data Augmentation, Imitation Learning, Motion Planning |
Published | 2019-08-02 |
URL | https://arxiv.org/abs/1908.00722v2 |
PDF | https://arxiv.org/pdf/1908.00722v2.pdf |
PWC | https://paperswithcode.com/paper/combining-learned-skills-and-reinforcement |
Repo | https://github.com/rstrudel/rlbc |
Framework | pytorch |
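Structurally, the method is a master policy selecting among pre-trained skill policies every few steps. A toy sketch with stubs follows; everything here is hypothetical, since in the paper the master policy is trained with RL on visual inputs and the skills by behavioral cloning from synthetic demonstrations.

```python
import numpy as np

# Hypothetical stand-ins for learned skill policies
def skill_grasp(obs): return np.array([0.0, 0.0, -1.0])   # stub action: move down
def skill_lift(obs):  return np.array([0.0, 0.0, +1.0])   # stub action: move up
SKILLS = [skill_grasp, skill_lift]

def master_policy(obs):             # trained with RL in the paper; a toy rule here
    return int(obs[2] < 0.1)        # switch to "lift" once the gripper is low

def rollout(env_step, obs, horizon=50, skill_len=10):
    for t in range(horizon):
        if t % skill_len == 0:      # re-select a primitive skill every few steps
            skill = SKILLS[master_policy(obs)]
        obs = env_step(skill(obs))
    return obs

final = rollout(lambda action: np.random.randn(3), np.zeros(3))  # dummy environment
```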
Omnidirectional Scene Text Detection with Sequential-free Box Discretization
Title | Omnidirectional Scene Text Detection with Sequential-free Box Discretization |
Authors | Yuliang Liu, Sheng Zhang, Lianwen Jin, Lele Xie, Yaqiang Wu, Zhepeng Wang |
Abstract | Scene text in the wild commonly exhibits highly variable characteristics, and using a quadrilateral bounding box to localize each text instance is nearly indispensable for detection methods. However, recent research reveals that quadrilateral bounding boxes for scene text detection introduce an easily overlooked label-confusion issue that may significantly undermine detection performance. To address this issue, in this paper we propose a novel method called Sequential-free Box Discretization (SBD), which discretizes the bounding box into key edges (KE) and can further derive more effective methods to improve detection performance. Experiments show that the proposed method outperforms state-of-the-art methods on many popular scene text benchmarks, including ICDAR 2015, MLT, and MSRA-TD500. An ablation study also shows that simply integrating SBD into the Mask R-CNN framework substantially improves detection performance. Furthermore, an experiment on the general object dataset HRSC2016 (multi-oriented ships) shows that our method outperforms recent state-of-the-art methods by a large margin, demonstrating its strong generalization ability. Source code: https://github.com/Yuliang-Liu/Box_Discretization_Network. |
Tasks | Scene Text Detection |
Published | 2019-06-06 |
URL | https://arxiv.org/abs/1906.02371v3 |
PDF | https://arxiv.org/pdf/1906.02371v3.pdf |
PWC | https://paperswithcode.com/paper/omnidirectional-scene-text-detection-with |
Repo | https://github.com/Yuliang-Liu/Box_Discretization_Network |
Framework | pytorch |
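The key-edge idea can be shown in a few lines: encode a quadrilateral by the sorted x and y coordinates of its corners, which is invariant to how the ground-truth points happen to be ordered. SBD additionally predicts a "match type" to reassemble corners from the key edges, which this sketch omits.

```python
import numpy as np

def quad_to_key_edges(quad):
    """Discretize an arbitrarily ordered quadrilateral into key edges:
    the sorted x and y coordinates of its corners. Any permutation of
    the four points yields the same target, removing label confusion."""
    quad = np.asarray(quad, dtype=float)        # (4, 2) corners, any order
    return np.sort(quad[:, 0]), np.sort(quad[:, 1])

box = [(10, 5), (50, 8), (48, 40), (8, 36)]
xs, ys = quad_to_key_edges(box)
assert np.allclose(xs, quad_to_key_edges(box[::-1])[0])  # order-invariant
```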
Enforcing geometric constraints of virtual normal for depth prediction
Title | Enforcing geometric constraints of virtual normal for depth prediction |
Authors | Wei Yin, Yifan Liu, Chunhua Shen, Youliang Yan |
Abstract | Monocular depth prediction plays a crucial role in understanding 3D scene geometry. Although recent methods have achieved impressive progress on evaluation metrics such as the pixel-wise relative error, most methods neglect geometric constraints in 3D space. In this work, we show the importance of high-order 3D geometric constraints for depth prediction. By designing a loss term that enforces one simple type of geometric constraint, namely virtual normal directions determined by three randomly sampled points in the reconstructed 3D space, we can considerably improve depth prediction accuracy. Significantly, a byproduct of the predicted depth being sufficiently accurate is that we can now recover good 3D structures of the scene, such as the point cloud and surface normals, directly from the depth, eliminating the need to train new sub-models as was previously done. Experiments on two benchmarks, NYU Depth-V2 and KITTI, demonstrate the effectiveness of our method and its state-of-the-art performance. |
Tasks | Depth Estimation, Monocular Depth Estimation |
Published | 2019-07-29 |
URL | https://arxiv.org/abs/1907.12209v2 |
PDF | https://arxiv.org/pdf/1907.12209v2.pdf |
PWC | https://paperswithcode.com/paper/enforcing-geometric-constraints-of-virtual |
Repo | https://github.com/YvanYin/VNL_Monocular_Depth_Prediction |
Framework | pytorch |
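A sketch of the virtual-normal loss in PyTorch: sample random triplets of 3D points, form the normal of the plane each triplet spans, and penalize the difference between normals from predicted and ground-truth depth. The paper also rejects near-colinear triplets and back-projects depth maps with camera intrinsics; both steps are omitted here.

```python
import torch
import torch.nn.functional as F

def virtual_normal_loss(pts_pred, pts_gt, n_triplets=1000):
    """pts_*: (N, 3) point clouds back-projected from predicted / ground-truth depth."""
    idx = torch.randint(0, pts_pred.size(0), (n_triplets, 3))
    def normals(p):
        a, b, c = p[idx[:, 0]], p[idx[:, 1]], p[idx[:, 2]]
        # Unit normal of the plane through each sampled triplet
        return F.normalize(torch.cross(b - a, c - a, dim=1), dim=1)
    return (normals(pts_pred) - normals(pts_gt)).norm(dim=1).mean()

loss = virtual_normal_loss(torch.rand(5000, 3), torch.rand(5000, 3))
```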
Delog: A Privacy Preserving Log Filtering Framework for Online Compute Platforms
Title | Delog: A Privacy Preserving Log Filtering Framework for Online Compute Platforms |
Authors | Amey Agrawal, Abhishek Dixit, Namrata Shettar, Darshil Kapadia, Rohit Karlupia, Vikram Agrawal, Rajat Gupta |
Abstract | In many software applications, logs serve as the only interface between the application and the developer. However, navigating through the logs of long-running applications is often challenging. Logs from previously successful application runs can be leveraged to automatically identify errors and provide users with only the logs that are relevant to the debugging process. We describe a privacy-preserving framework which can be employed by Platform as a Service (PaaS) providers to utilize the user logs generated on the platform while protecting the potentially sensitive logged data. Further, in order to accurately and scalably parse log lines, we present a distributed log parsing algorithm which leverages Locality Sensitive Hashing (LSH). We outperform the state of the art on multiple datasets. We further demonstrate the scalability of Delog on the publicly available Thunderbird log dataset with close to 27,000 unique patterns and 211 million lines. |
Tasks | |
Published | 2019-02-13 |
URL | https://arxiv.org/abs/1902.04843v3 |
PDF | https://arxiv.org/pdf/1902.04843v3.pdf |
PWC | https://paperswithcode.com/paper/delog-a-privacy-preserving-log-filtering |
Repo | https://github.com/qubole/qubole-log-datasets |
Framework | none |
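The LSH-based grouping step can be approximated with `datasketch` MinHash. This is not the paper's distributed algorithm, and masking numeric tokens below is only a crude stand-in for its pattern templating, but it shows how similar log lines hash into the same bucket.

```python
from datasketch import MinHash, MinHashLSH

def minhash(line, num_perm=64):
    m = MinHash(num_perm=num_perm)
    for token in line.split():
        # Mask variable numeric fields so lines with the same template collide
        m.update(b"<num>" if token.isdigit() else token.encode("utf8"))
    return m

lsh = MinHashLSH(threshold=0.7, num_perm=64)
logs = ["Task 17 finished in 210 ms",
        "Task 42 finished in 95 ms",
        "OutOfMemoryError on executor 3"]
for i, line in enumerate(logs):
    lsh.insert(f"line-{i}", minhash(line))

# Lines matching a known-good pattern can be filtered out as routine
print(lsh.query(minhash("Task 99 finished in 5 ms")))  # -> ['line-0', 'line-1']
```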
Gromov-Wasserstein Learning for Graph Matching and Node Embedding
Title | Gromov-Wasserstein Learning for Graph Matching and Node Embedding |
Authors | Hongteng Xu, Dixin Luo, Hongyuan Zha, Lawrence Carin |
Abstract | A novel Gromov-Wasserstein learning framework is proposed to jointly match (align) graphs and learn embedding vectors for the associated graph nodes. Using the Gromov-Wasserstein discrepancy, we measure the dissimilarity between two graphs and find their correspondence, according to the learned optimal transport. The node embeddings associated with the two graphs are learned under the guidance of the optimal transport, the distance of which not only reflects the topological structure of each graph but also yields the correspondence across the graphs. These two learning steps are mutually beneficial, and are unified here by minimizing the Gromov-Wasserstein discrepancy with structural regularizers. This framework leads to an optimization problem that is solved by a proximal point method. We apply the proposed method to matching problems in real-world networks, and demonstrate its superior performance compared to alternative approaches. |
Tasks | Graph Matching |
Published | 2019-01-17 |
URL | https://arxiv.org/abs/1901.06003v2 |
PDF | https://arxiv.org/pdf/1901.06003v2.pdf |
PWC | https://paperswithcode.com/paper/gromov-wasserstein-learning-for-graph |
Repo | https://github.com/HongtengXu/gwl |
Framework | pytorch |
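The matching component is available off the shelf in POT (Python Optimal Transport). The paper additionally learns node embeddings jointly and solves the problem with a proximal point method, which this sketch omits; it only shows Gromov-Wasserstein matching between two structure matrices.

```python
import numpy as np
import ot  # POT: Python Optimal Transport

# Two toy graphs given as adjacency matrices (same 4-cycle, nodes permuted)
A1 = np.array([[0, 1, 1, 0],
               [1, 0, 0, 1],
               [1, 0, 0, 1],
               [0, 1, 1, 0]], dtype=float)
perm = [2, 3, 0, 1]
A2 = A1[np.ix_(perm, perm)]

# Uniform node distributions; adjacency as the intra-graph structure matrix
p, q = ot.unif(4), ot.unif(4)
T = ot.gromov.gromov_wasserstein(A1, A2, p, q, "square_loss")

matching = T.argmax(axis=1)   # node correspondence read off the coupling
```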
Generalizing Monocular 3D Human Pose Estimation in the Wild
Title | Generalizing Monocular 3D Human Pose Estimation in the Wild |
Authors | Luyang Wang, Yan Chen, Zhenhua Guo, Keyuan Qian, Mude Lin, Hongsheng Li, Jimmy S. Ren |
Abstract | The availability of large-scale labeled 3D poses in the Human3.6M dataset plays an important role in advancing algorithms for 3D human pose estimation from a still image. We observe that recent innovation in this area mainly focuses on new techniques that explicitly address the generalization issue when using this dataset, because the database was constructed in a highly controlled environment with limited human subjects and background variation. Despite such efforts, we show that the results of current methods are still error-prone, especially when tested on images taken in the wild. In this paper, we aim to tackle this problem from a different perspective. We propose a principled approach to generate high-quality 3D pose ground truth for any in-the-wild image containing a person. We achieve this by first devising a novel stereo-inspired neural network that directly maps any 2D pose to its high-quality 3D counterpart. We then apply a carefully designed geometric search scheme to further refine the joints. Based on this scheme, we build a large-scale dataset of 400,000 in-the-wild images and their corresponding 3D pose ground truth. This enables training a high-quality neural network model, without any specialized training scheme or auxiliary loss function, that performs favorably against state-of-the-art 3D pose estimation methods. We also evaluate the generalization ability of our model both quantitatively and qualitatively. Results show that our approach convincingly outperforms previous methods. We make our dataset and code publicly available. |
Tasks | 3D Human Pose Estimation, 3D Pose Estimation, Pose Estimation |
Published | 2019-04-11 |
URL | http://arxiv.org/abs/1904.05512v1 |
PDF | http://arxiv.org/pdf/1904.05512v1.pdf |
PWC | https://paperswithcode.com/paper/generalizing-monocular-3d-human-pose |
Repo | https://github.com/llcshappy/Monocular-3D-Human-Pose |
Framework | tf |
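The 2D-to-3D lifting step can be sketched as a simple fully connected network (a Martinez-style lifter; the paper's stereo-inspired generator and geometric search refinement are more elaborate, so treat the architecture below as illustrative):

```python
import torch
import torch.nn as nn

class Lifter(nn.Module):
    """Map normalized 2D keypoints to 3D joint positions."""
    def __init__(self, n_joints=16, hidden=1024):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_joints * 2, hidden), nn.ReLU(), nn.Dropout(0.5),
            nn.Linear(hidden, hidden), nn.ReLU(), nn.Dropout(0.5),
            nn.Linear(hidden, n_joints * 3))
    def forward(self, pose2d):                 # pose2d: (batch, n_joints, 2)
        out = self.net(pose2d.flatten(1))
        return out.view(-1, pose2d.size(1), 3)

pose2d = torch.randn(8, 16, 2)                 # normalized 2D keypoints
pose3d = Lifter()(pose2d)                      # (8, 16, 3)
```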