January 25, 2020

2942 words 14 mins read

Paper Group NAWR 7

Building English-to-Serbian Machine Translation System for IMDb Movie Reviews. Decoupling Direction and Norm for Efficient Gradient-Based L2 Adversarial Attacks and Defenses. Limited Data Rolling Bearing Fault Diagnosis with Few-shot Learning. Coloring With Limited Data: Few-Shot Colorization via Memory Augmented Networks. Scalable Bayesian inferen …

Building English-to-Serbian Machine Translation System for IMDb Movie Reviews

Title Building English-to-Serbian Machine Translation System for IMDb Movie Reviews
Authors Pintu Lohar, Maja Popović, Andy Way
Abstract This paper reports the results of the first experiment dealing with the challenges of building a machine translation system for user-generated content involving a complex South Slavic language. We focus on translation of English IMDb user movie reviews into Serbian, in a low-resource scenario. We explore the potential and limits of (i) phrase-based and neural machine translation systems trained on out-of-domain clean parallel data from news articles and (ii) creating an additional synthetic in-domain parallel corpus by machine-translating the English IMDb corpus into Serbian. Our main findings are that morphology and syntax are better handled by the neural approach than by the phrase-based approach even in this low-resource mismatched-domain scenario; however, the situation is different for the lexical aspect, especially for person names. This finding also indicates that, in general, machine translation of person names into Slavic languages (especially those which require/allow transcription) should be investigated more systematically.
Tasks Machine Translation
Published 2019-08-01
URL https://www.aclweb.org/anthology/W19-3715/
PDF https://www.aclweb.org/anthology/W19-3715
PWC https://paperswithcode.com/paper/building-english-to-serbian-machine
Repo https://github.com/m-popovic/imdb-corpus-for-MT
Framework none

Decoupling Direction and Norm for Efficient Gradient-Based L2 Adversarial Attacks and Defenses

Title Decoupling Direction and Norm for Efficient Gradient-Based L2 Adversarial Attacks and Defenses
Authors Jerome Rony, Luiz G. Hafemann, Luiz S. Oliveira, Ismail Ben Ayed, Robert Sabourin, Eric Granger
Abstract Research on adversarial examples in computer vision tasks has shown that small, often imperceptible changes to an image can induce misclassification, which has security implications for a wide range of image processing systems. Considering L2 norm distortions, the Carlini and Wagner attack is presently the most effective white-box attack in the literature. However, this method is slow since it performs a line-search for one of the optimization terms, and often requires thousands of iterations. In this paper, an efficient approach is proposed to generate gradient-based attacks that induce misclassifications with low L2 norm, by decoupling the direction and the norm of the adversarial perturbation that is added to the image. Experiments conducted on the MNIST, CIFAR-10 and ImageNet datasets indicate that our attack achieves comparable results to the state-of-the-art (in terms of L2 norm) with considerably fewer iterations (as few as 100 iterations), which opens the possibility of using these attacks for adversarial training. Models trained with our attack achieve state-of-the-art robustness against white-box gradient-based L2 attacks on the MNIST and CIFAR-10 datasets, outperforming the Madry defense when the attacks are limited to a maximum norm.
Tasks Adversarial Attack, Adversarial Defense
Published 2019-06-01
URL http://openaccess.thecvf.com/content_CVPR_2019/html/Rony_Decoupling_Direction_and_Norm_for_Efficient_Gradient-Based_L2_Adversarial_Attacks_CVPR_2019_paper.html
PDF http://openaccess.thecvf.com/content_CVPR_2019/papers/Rony_Decoupling_Direction_and_Norm_for_Efficient_Gradient-Based_L2_Adversarial_Attacks_CVPR_2019_paper.pdf
PWC https://paperswithcode.com/paper/decoupling-direction-and-norm-for-efficient-1
Repo https://github.com/jeromerony/fast_adversarial
Framework pytorch
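
The entry lists PyTorch, so here is a minimal PyTorch sketch of the decoupled update the abstract describes: a normalized gradient step sets the perturbation's direction, while a separate norm budget shrinks when the example is already adversarial and grows when it is not. The step size `alpha`, the norm-adjustment factor `gamma`, and the assumption of 4-D image batches in [0, 1] are illustrative choices, not the paper's exact settings.

```python
import torch
import torch.nn.functional as F

def ddn_attack(model, x, y, steps=100, alpha=0.05, gamma=0.05):
    """Hedged sketch of a decoupled direction-and-norm L2 attack.

    Assumes x is a 4-D image batch in [0, 1] and model(x) returns logits.
    Hyper-parameters are illustrative, not the paper's.
    """
    delta = torch.zeros_like(x, requires_grad=True)
    eps = torch.ones(x.size(0), device=x.device)          # per-sample norm budget

    for _ in range(steps):
        x_adv = (x + delta).clamp(0, 1)
        logits = model(x_adv)
        loss = F.cross_entropy(logits, y)
        grad, = torch.autograd.grad(loss, delta)

        with torch.no_grad():
            # direction: normalized gradient-ascent step on the loss
            g = grad / grad.flatten(1).norm(dim=1).clamp_min(1e-12).view(-1, 1, 1, 1)
            delta += alpha * g
            # norm: shrink the budget if already adversarial, grow it otherwise
            is_adv = logits.argmax(dim=1) != y
            eps = torch.where(is_adv, eps * (1 - gamma), eps * (1 + gamma))
            # renormalize the perturbation to the current budget
            d_norm = delta.flatten(1).norm(dim=1).clamp_min(1e-12)
            delta *= (eps / d_norm).view(-1, 1, 1, 1)

    return (x + delta).clamp(0, 1).detach()
```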

Limited Data Rolling Bearing Fault Diagnosis with Few-shot Learning

Title Limited Data Rolling Bearing Fault Diagnosis with Few-shot Learning
Authors Ansi Zhang, Shaobo Li, Yuxin Cui, Wanli Yang, Rongzhi Dong, Jianjun Hu
Abstract This paper focuses on bearing fault diagnosis with limited training data. A major challenge in fault diagnosis is the infeasibility of obtaining sufficient training samples for every fault type under all working conditions. Recently, deep learning-based fault diagnosis methods have achieved promising results. However, most of these methods require a large amount of training data. In this study, we propose a deep neural network based few-shot learning approach for rolling bearing fault diagnosis with limited data. Our model is based on the siamese neural network, which learns by exploiting sample pairs of the same or different categories. Experimental results over the standard Case Western Reserve University (CWRU) bearing fault diagnosis benchmark dataset showed that our few-shot learning approach is more effective in fault diagnosis with limited data availability. When tested over different noise environments with a minimal amount of training data, the performance of our few-shot learning model surpasses that of the baseline at reasonable noise levels. When evaluated over test sets with new fault types or new working conditions, few-shot models work better than the baseline trained with all fault types. All our models and datasets in this study are open sourced and can be downloaded from https://mekhub.cn/as/fault_diagnosis_with_few-shot_learning/ .
Tasks Few-Shot Learning
Published 2019-08-22
URL https://ieeexplore.ieee.org/abstract/document/8793060
PDF https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=8793060
PWC https://paperswithcode.com/paper/limited-data-rolling-bearing-fault-diagnosis
Repo https://github.com/SNBQT/Limited-Data-Rolling-Bearing-Fault-Diagnosis-with-Few-shot-Learning
Framework none
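
A hedged sketch of the pair-based siamese setup the abstract describes, written in PyTorch for concreteness (the entry lists no framework). The 1-D CNN layers, embedding size, and signal length are illustrative assumptions; only the overall scheme — a shared encoder plus a same/different head trained on signal pairs — follows the abstract.

```python
import torch
import torch.nn as nn

class SiameseFaultNet(nn.Module):
    """Minimal sketch of a siamese 1-D CNN for vibration-signal pairs.
    A shared encoder embeds each raw segment; a sigmoid head on the
    |difference| of the embeddings predicts whether the pair shares a fault type."""
    def __init__(self, embed_dim=64):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=64, stride=8), nn.ReLU(),
            nn.MaxPool1d(2),
            nn.Conv1d(16, 32, kernel_size=3), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1), nn.Flatten(),
            nn.Linear(32, embed_dim),
        )
        self.head = nn.Linear(embed_dim, 1)

    def forward(self, x1, x2):
        # shared weights: both segments pass through the same encoder
        e1, e2 = self.encoder(x1), self.encoder(x2)
        return torch.sigmoid(self.head(torch.abs(e1 - e2))).squeeze(-1)

# pair-based training step: label 1 for same fault type, 0 otherwise
model = SiameseFaultNet()
x1, x2 = torch.randn(8, 1, 2048), torch.randn(8, 1, 2048)
same = torch.randint(0, 2, (8,)).float()
loss = nn.functional.binary_cross_entropy(model(x1, x2), same)
loss.backward()
```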

Coloring With Limited Data: Few-Shot Colorization via Memory Augmented Networks

Title Coloring With Limited Data: Few-Shot Colorization via Memory Augmented Networks
Authors Seungjoo Yoo, Hyojin Bahng, Sunghyo Chung, Junsoo Lee, Jaehyuk Chang, Jaegul Choo
Abstract Despite recent advancements, deep learning-based automatic colorization models are still limited when it comes to few-shot learning. Existing models require a significant amount of training data. To tackle this issue, we present a novel memory-augmented colorization model MemoPainter that can produce high-quality colorization with limited data. In particular, our model is able to capture rare instances and successfully colorize them. Also, we propose a novel threshold triplet loss that enables unsupervised training of memory networks without the need for class labels. Experiments show that our model has superior quality in both few-shot and one-shot colorization tasks.
Tasks Colorization, Few-Shot Learning
Published 2019-06-01
URL http://openaccess.thecvf.com/content_CVPR_2019/html/Yoo_Coloring_With_Limited_Data_Few-Shot_Colorization_via_Memory_Augmented_Networks_CVPR_2019_paper.html
PDF http://openaccess.thecvf.com/content_CVPR_2019/papers/Yoo_Coloring_With_Limited_Data_Few-Shot_Colorization_via_Memory_Augmented_Networks_CVPR_2019_paper.pdf
PWC https://paperswithcode.com/paper/coloring-with-limited-data-few-shot
Repo https://github.com/dongheehand/MemoPainter-PyTorch
Framework pytorch
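
A hedged sketch of what a threshold-based triplet loss for a memory network can look like: instead of class labels, memory slots whose color distance to the query falls below a threshold are treated as positives and the rest as negatives. The cosine-similarity formulation, the threshold `tau`, and the margin are illustrative assumptions, not MemoPainter's exact loss.

```python
import torch
import torch.nn.functional as F

def threshold_triplet_loss(query, keys, color_dist, tau=0.7, margin=0.1):
    """query: (B, D) image features, keys: (M, D) memory keys,
    color_dist: (B, M) distances between each image's color feature and each
    slot's stored color value (its computation is not shown here).
    Slots with color_dist < tau play the role of same-class positives."""
    sim = F.normalize(query, dim=1) @ F.normalize(keys, dim=1).t()   # (B, M) cosine similarity
    pos_mask = color_dist < tau
    # most similar positive and most similar negative key for each query
    pos_sim = sim.masked_fill(~pos_mask, float('-inf')).max(dim=1).values
    neg_sim = sim.masked_fill(pos_mask, float('-inf')).max(dim=1).values
    valid = pos_mask.any(dim=1) & (~pos_mask).any(dim=1)   # need at least one of each
    return F.relu(neg_sim - pos_sim + margin)[valid].mean()

# toy usage with random features and color distances
loss = threshold_triplet_loss(torch.randn(16, 64), torch.randn(100, 64), torch.rand(16, 100))
```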

Scalable Bayesian inference of dendritic voltage via spatiotemporal recurrent state space models

Title Scalable Bayesian inference of dendritic voltage via spatiotemporal recurrent state space models
Authors Ruoxi Sun, Ian Kinsella, Scott Linderman, Liam Paninski
Abstract Recent advances in optical voltage sensors have brought us closer to a critical goal in cellular neuroscience: imaging the full spatiotemporal voltage on a dendritic tree. However, current sensors and imaging approaches still face significant limitations in SNR and sampling frequency; therefore statistical denoising and interpolation methods remain critical for understanding single-trial spatiotemporal dendritic voltage dynamics. Previous denoising approaches were either based on an inadequate linear voltage model or scaled poorly to large trees. Here we introduce a scalable fully Bayesian approach. We develop a generative nonlinear model that requires few parameters per compartment of the cell but is nonetheless flexible enough to sample realistic spatiotemporal data. The model captures different dynamics in each compartment and leverages biophysical knowledge to constrain intra- and inter-compartmental dynamics. We obtain a full posterior distribution over spatiotemporal voltage via an augmented Gibbs sampling algorithm. The nonlinear smoother model outperforms previously developed linear methods, and scales to much larger systems than previous methods based on sequential Monte Carlo approaches.
Tasks Bayesian Inference, Denoising
Published 2019-12-01
URL http://papers.nips.cc/paper/9206-scalable-bayesian-inference-of-dendritic-voltage-via-spatiotemporal-recurrent-state-space-models
PDF http://papers.nips.cc/paper/9206-scalable-bayesian-inference-of-dendritic-voltage-via-spatiotemporal-recurrent-state-space-models.pdf
PWC https://paperswithcode.com/paper/scalable-bayesian-inference-of-dendritic
Repo https://github.com/SunRuoxi/Voltage_
Framework none

Loaded DiCE: Trading off Bias and Variance in Any-Order Score Function Gradient Estimators for Reinforcement Learning

Title Loaded DiCE: Trading off Bias and Variance in Any-Order Score Function Gradient Estimators for Reinforcement Learning
Authors Gregory Farquhar, Shimon Whiteson, Jakob Foerster
Abstract Gradient-based methods for optimisation of objectives in stochastic settings with unknown or intractable dynamics require estimators of derivatives. We derive an objective that, under automatic differentiation, produces low-variance unbiased estimators of derivatives at any order. Our objective is compatible with arbitrary advantage estimators, which allows the control of the bias and variance of any-order derivatives when using function approximation. Furthermore, we propose a method to trade off bias and variance of higher order derivatives by discounting the impact of more distant causal dependencies. We demonstrate the correctness and utility of our estimator in analytically tractable MDPs and in meta-reinforcement-learning for continuous control.
Tasks Continuous Control
Published 2019-12-01
URL http://papers.nips.cc/paper/9026-loaded-dice-trading-off-bias-and-variance-in-any-order-score-function-gradient-estimators-for-reinforcement-learning
PDF http://papers.nips.cc/paper/9026-loaded-dice-trading-off-bias-and-variance-in-any-order-score-function-gradient-estimators-for-reinforcement-learning.pdf
PWC https://paperswithcode.com/paper/loaded-dice-trading-off-bias-and-variance-in-1
Repo https://github.com/oxwhirl/loaded-dice
Framework none
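
The objective builds on the DiCE "magic box" operator, which evaluates to 1 in the forward pass but regenerates the score-function terms under repeated differentiation. Below is a hedged, first-order PyTorch sketch of a DiCE-style surrogate for a single trajectory; Loaded DiCE's actual contributions (arbitrary advantage estimators and discounting of distant causal dependencies) are deliberately omitted.

```python
import torch

def magic_box(log_probs):
    """DiCE 'magic box': forward value 1, but differentiating it reproduces
    the score-function (log-probability) gradient terms at every order."""
    return torch.exp(log_probs - log_probs.detach())

def dice_surrogate(log_probs, rewards):
    """Sketch of a first-order DiCE-style surrogate for one trajectory.
    log_probs[t] = log pi(a_t | s_t), rewards[t] = reward at step t."""
    # reward at time t depends on all actions taken up to and including t
    causal_logp = torch.cumsum(log_probs, dim=0)
    return (magic_box(causal_logp) * rewards).sum()

# toy usage: gradients of the surrogate are policy-gradient estimates
logits = torch.randn(5, 3, requires_grad=True)
actions = torch.randint(0, 3, (5,))
log_probs = torch.log_softmax(logits, dim=1)[torch.arange(5), actions]
rewards = torch.randn(5)
dice_surrogate(log_probs, rewards).backward()
```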

Distilling Discrimination and Generalization Knowledge for Event Detection via Delta-Representation Learning

Title Distilling Discrimination and Generalization Knowledge for Event Detection via Delta-Representation Learning
Authors Yaojie Lu, Hongyu Lin, Xianpei Han, Le Sun
Abstract Event detection systems rely on discrimination knowledge to distinguish ambiguous trigger words and generalization knowledge to detect unseen/sparse trigger words. Current neural event detection approaches focus on trigger-centric representations, which work well on distilling discrimination knowledge, but poorly on learning generalization knowledge. To address this problem, this paper proposes a Delta-learning approach to distill discrimination and generalization knowledge by effectively decoupling, incrementally learning and adaptively fusing event representation. Experiments show that our method significantly outperforms previous approaches on unseen/sparse trigger words, and achieves state-of-the-art performance on both ACE2005 and KBP2017 datasets.
Tasks Representation Learning
Published 2019-07-01
URL https://www.aclweb.org/anthology/P19-1429/
PDF https://www.aclweb.org/anthology/P19-1429
PWC https://paperswithcode.com/paper/distilling-discrimination-and-generalization
Repo https://github.com/luyaojie/delta-learning-for-ed
Framework pytorch

A Self-Training Approach for Short Text Clustering

Title A Self-Training Approach for Short Text Clustering
Authors Amir Hadifar, Lucas Sterckx, Thomas Demeester, Chris Develder
Abstract Short text clustering is a challenging problem when adopting traditional bag-of-words or TF-IDF representations, since these lead to sparse vector representations of the short texts. Low-dimensional continuous representations or embeddings can counter that sparseness problem: their high representational power is exploited in deep clustering algorithms. While deep clustering has been studied extensively in computer vision, relatively little work has focused on NLP. The method we propose learns discriminative features from both an autoencoder and a sentence embedding, then uses assignments from a clustering algorithm as supervision to update weights of the encoder network. Experiments on three short text datasets empirically validate the effectiveness of our method.
Tasks Sentence Embedding, Text Clustering
Published 2019-08-01
URL https://www.aclweb.org/anthology/W19-4322/
PDF https://www.aclweb.org/anthology/W19-4322
PWC https://paperswithcode.com/paper/a-self-training-approach-for-short-text
Repo https://github.com/hadifar/stc_clustering
Framework none
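
A hedged sketch of the self-training step: soft cluster assignments are sharpened into a target distribution, and the encoder is updated to match it (the DEC-style recipe this kind of method builds on). The stand-in `encoder`, the sentence-embedding step that would precede it, and all sizes are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def soft_assign(z, centroids, alpha=1.0):
    """Student's-t soft assignment of embeddings to cluster centroids (DEC-style)."""
    d2 = torch.cdist(z, centroids).pow(2)
    q = (1.0 + d2 / alpha).pow(-(alpha + 1.0) / 2.0)
    return q / q.sum(dim=1, keepdim=True)

def target_distribution(q):
    """Sharpened targets: emphasize confident assignments, normalize per cluster."""
    w = q.pow(2) / q.sum(dim=0)
    return w / w.sum(dim=1, keepdim=True)

# toy self-training step on random "sentence embeddings"; in the paper these
# would come from an autoencoder's encoder applied to weighted word vectors
encoder = torch.nn.Linear(50, 10)                       # stand-in for the encoder network
centroids = torch.randn(4, 10, requires_grad=True)      # K = 4 cluster centres
emb = torch.randn(32, 50)
q = soft_assign(encoder(emb), centroids)
loss = F.kl_div(q.log(), target_distribution(q).detach(), reduction='batchmean')
loss.backward()
```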

Octave Deep Plane-Sweeping Network: Reducing Spatial Redundancy for Learning-Based Plane-Sweeping Stereo

Title Octave Deep Plane-Sweeping Network: Reducing Spatial Redundancy for Learning-Based Plane-Sweeping Stereo
Authors R. Komatsu, H. Fujii, Y. Tamura, A. Yamashita, H. Asama
Abstract In this paper, we propose the octave deep plane-sweeping network (OctDPSNet). OctDPSNet is a novel learning-based plane-sweeping stereo method, which drastically reduces the required GPU memory and computation time while achieving a state-of-the-art depth estimation accuracy. Inspired by octave convolution, we divide image features into high and low spatial frequency features, and two cost volumes are generated from these using our proposed plane-sweeping module. To reduce spatial redundancy, the resolution of the cost volume from the low spatial frequency features is set to half that of the high spatial frequency features, which enables the memory consumption and computational cost to be reduced. After refinement, the two cost volumes are integrated into a final cost volume through our proposed pixel-wise “squeeze-and-excitation” based attention mechanism, and the depth maps are estimated from the final cost volume. We evaluate the proposed model on five datasets: SUN3D, RGB-D SLAM, MVS, Scenes11, and ETH3D. Our model outperforms previous methods on five datasets while drastically reducing the memory consumption and computational cost. Our source code is available at https://github.com/matsuren/octDPSNet.
Tasks Depth Estimation, Stereo Depth Estimation
Published 2019-10-14
URL https://ieeexplore.ieee.org/document/8867874
PDF https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=8867874
PWC https://paperswithcode.com/paper/octave-deep-plane-sweeping-network-reducing
Repo https://github.com/matsuren/octDPSNet
Framework pytorch
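
A hedged sketch of the octave-style feature split the abstract credits as its inspiration: some channels stay at full resolution (high spatial frequency) and the rest are kept at half resolution (low spatial frequency), so a cost volume built from the low branch is half the resolution of the high-frequency one. Layer choices, channel counts, and the `alpha` split ratio are illustrative, not OctDPSNet's.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class OctaveSplit(nn.Module):
    """Split input features into a full-resolution high-frequency branch and a
    half-resolution low-frequency branch, in the spirit of octave convolution."""
    def __init__(self, in_ch=32, out_ch=32, alpha=0.5):
        super().__init__()
        low_ch = int(out_ch * alpha)
        self.to_high = nn.Conv2d(in_ch, out_ch - low_ch, 3, padding=1)
        self.to_low = nn.Conv2d(in_ch, low_ch, 3, padding=1)

    def forward(self, x):
        high = self.to_high(x)                            # (B, C_h, H, W)
        low = self.to_low(F.avg_pool2d(x, 2))             # (B, C_l, H/2, W/2)
        return high, low

feat_high, feat_low = OctaveSplit()(torch.randn(1, 32, 64, 64))
# plane sweeping then builds one cost volume per branch; the half-resolution
# low-frequency volume is where the memory and compute savings come from
```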

Effective Adversarial Regularization for Neural Machine Translation

Title Effective Adversarial Regularization for Neural Machine Translation
Authors Motoki Sato, Jun Suzuki, Shun Kiyono
Abstract A regularization technique based on adversarial perturbation, which was initially developed in the field of image processing, has been successfully applied to text classification tasks and has yielded attractive improvements. We aim to further leverage this promising methodology into more sophisticated and critical neural models in the natural language processing field, i.e., neural machine translation (NMT) models. However, it is not trivial to apply this methodology to such models. Thus, this paper investigates the effectiveness of several possible configurations of applying the adversarial perturbation and reveals that the adversarial regularization technique can significantly and consistently improve the performance of widely used NMT models, such as LSTM-based and Transformer-based models.
Tasks Machine Translation, Text Classification
Published 2019-07-01
URL https://www.aclweb.org/anthology/P19-1020/
PDF https://www.aclweb.org/anthology/P19-1020
PWC https://paperswithcode.com/paper/effective-adversarial-regularization-for
Repo https://github.com/pfnet-research/vat_nmt
Framework none
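
A hedged sketch of the underlying regularizer: compute the gradient of the loss with respect to the input embeddings, add a norm-bounded perturbation in that direction, and train on the sum of the clean and perturbed losses. The toy stand-in model, the (batch, tokens, dim) embedding shape, and `epsilon` are illustrative assumptions; the paper's actual study compares several configurations for injecting the perturbation into LSTM- and Transformer-based NMT models.

```python
import torch
import torch.nn.functional as F

def adversarial_regularization(model, embeds, targets, epsilon=1.0):
    """Sketch of adversarial regularization on already looked-up embeddings.
    `model` is assumed to map embeddings of shape (B, T, D) to logits; the
    encoder-decoder plumbing of a real NMT system is omitted."""
    embeds = embeds.detach().requires_grad_(True)
    clean_loss = F.cross_entropy(model(embeds), targets)
    grad, = torch.autograd.grad(clean_loss, embeds, retain_graph=True)
    # worst-case perturbation of bounded L2 norm in the gradient direction
    r_adv = epsilon * grad / grad.flatten(1).norm(dim=1).clamp_min(1e-12).view(-1, 1, 1)
    adv_loss = F.cross_entropy(model(embeds + r_adv.detach()), targets)
    return clean_loss + adv_loss

# toy usage with a stand-in "model" that flattens token embeddings and classifies
toy_model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(6 * 16, 4))
embeds = torch.randn(8, 6, 16)
targets = torch.randint(0, 4, (8,))
adversarial_regularization(toy_model, embeds, targets).backward()
```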

Difference of Convolution for Deep Compressive Sensing

Title Difference of Convolution for Deep Compressive Sensing
Authors Thuong Nguyen Canh, Byeungwoo Jeon
Abstract Deep learning-based compressive sensing (DCS) has improved on conventional compressive sensing (CS) with fast, high-quality reconstruction. Researchers have further extended it to multi-scale DCS, which improves reconstruction quality through wavelet decomposition. In this work, we mimic the Difference of Gaussians via convolution and propose a scheme named Difference of Convolution-based multi-scale DCS (DoC-DCS). Unlike multi-scale DCS based on well-designed filters in the wavelet domain, the proposed DoC-DCS learns the decomposition and thereby outperforms other state-of-the-art compressive sensing methods.
Tasks Compressive Sensing
Published 2019-09-22
URL https://github.com/AtenaKid/DoC-DCS
PDF https://github.com/AtenaKid/DoC-DCS
PWC https://paperswithcode.com/paper/difference-of-convolution-for-deep
Repo https://github.com/AtenaKid/DoC-DCS
Framework none
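
A hedged sketch of the "difference of convolution" idea named in the title: two learned convolutions with different receptive fields stand in for the two blurs of a Difference-of-Gaussians filter, and their difference acts as a learned band-pass decomposition. This only illustrates that decomposition idea; the actual DoC-DCS sensing and reconstruction architecture is not reproduced here.

```python
import torch
import torch.nn as nn

class DiffOfConv(nn.Module):
    """Learned DoG-like decomposition: a wide convolution gives the base
    (low-frequency) component, and the difference with a narrow convolution
    gives a band-pass (detail) component."""
    def __init__(self, ch=1):
        super().__init__()
        self.narrow = nn.Conv2d(ch, ch, kernel_size=3, padding=1)
        self.wide = nn.Conv2d(ch, ch, kernel_size=7, padding=3)

    def forward(self, x):
        base = self.wide(x)
        detail = self.narrow(x) - base          # learned difference-of-convolution band
        return base, detail

base, detail = DiffOfConv()(torch.randn(1, 1, 64, 64))
```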

Automated characterization of noise distributions in diffusion MRI data

Title Automated characterization of noise distributions in diffusion MRI data
Authors Samuel St-Jean, Alberto De Luca, Chantal M. W. Tax, Max A. Viergever, Alexander Leemans
Abstract Purpose: To understand and characterize noise distributions in parallel imaging for diffusion MRI. Theory and Methods: Two new automated methods using the moments and the maximum likelihood equations of the Gamma distribution were developed. Simulations using stationary and spatially varying noncentral chi noise distributions were created for two diffusion weightings with SENSE or GRAPPA reconstruction and 8, 12 or 32 receiver coils. Furthermore, MRI data of a water phantom with different combinations of multiband and SENSE acceleration were acquired on a 3T scanner along with noise-only measurements. Finally, an in vivo dataset was acquired at 3T using multiband acceleration and GRAPPA reconstruction. Estimation of the noise distribution was performed with the proposed methods and compared with 3 other existing algorithms. Results: Simulations showed that assuming a Rician distribution can lead to misestimation in parallel imaging. Results on the acquired datasets showed that signal leakage in multiband can lead to a misestimation of the parameters. Noise maps are robust to these artifacts, but may misestimate parameters in some cases. The algorithms proposed herein can estimate both parameters of the noise distribution, are robust to signal leakage artifacts and perform best when used on acquired noise maps. Conclusion: Misestimation of the correct noise distribution can hamper further processing such as bias correction and denoising, especially when the measured distribution differs too much from the actual signal distribution, e.g., due to artifacts. The use of noise maps can yield more robust estimates than the use of diffusion-weighted images as input for algorithms.
Tasks Denoising
Published 2019-06-28
URL https://arxiv.org/abs/1906.12121
PDF https://arxiv.org/pdf/1906.12121.pdf
PWC https://paperswithcode.com/paper/automated-characterization-of-noise
Repo https://github.com/samuelstjean/autodmri
Framework none
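
A hedged sketch of fitting a Gamma distribution to noise-only samples by the method of moments, with SciPy's maximum-likelihood fit as a cross-check. The paper's actual equations additionally tie the fitted parameters to the noise standard deviation and the effective number of coils, which this generic sketch does not do.

```python
import numpy as np
from scipy import stats

def gamma_moments(samples):
    """Method-of-moments fit of a Gamma(shape k, scale theta) distribution:
    k = mean^2 / var, theta = var / mean."""
    m, v = samples.mean(), samples.var()
    return m * m / v, v / m

rng = np.random.default_rng(0)
noise = rng.gamma(shape=8.0, scale=2.0, size=100_000)      # stand-in for noise-map samples
k_mom, theta_mom = gamma_moments(noise)
k_mle, _, theta_mle = stats.gamma.fit(noise, floc=0)       # MLE counterpart via scipy
print(k_mom, theta_mom, k_mle, theta_mle)
```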

R2D2: Reliable and Repeatable Detector and Descriptor

Title R2D2: Reliable and Repeatable Detector and Descriptor
Authors Jerome Revaud, Cesar De Souza, Martin Humenberger, Philippe Weinzaepfel
Abstract Interest point detection and local feature description are fundamental steps in many computer vision applications. Classical approaches are based on a detect-then-describe paradigm where separate handcrafted methods are used to first identify repeatable keypoints and then represent them with a local descriptor. Neural networks trained with metric learning losses have recently caught up with these techniques, focusing on learning repeatable saliency maps for keypoint detection or learning descriptors at the detected keypoint locations. In this work, we argue that repeatable regions are not necessarily discriminative and can therefore lead to the selection of suboptimal keypoints. Furthermore, we claim that descriptors should be learned only in regions for which matching can be performed with high confidence. We thus propose to jointly learn keypoint detection and description together with a predictor of the local descriptor discriminativeness. This allows us to avoid ambiguous areas and thus leads to reliable keypoint detection and description. Our detection-and-description approach simultaneously outputs sparse, repeatable and reliable keypoints that outperform state-of-the-art detectors and descriptors on the HPatches dataset and on the recent Aachen Day-Night localization benchmark.
Tasks Interest Point Detection, Keypoint Detection, Metric Learning
Published 2019-12-01
URL http://papers.nips.cc/paper/9407-r2d2-reliable-and-repeatable-detector-and-descriptor
PDF http://papers.nips.cc/paper/9407-r2d2-reliable-and-repeatable-detector-and-descriptor.pdf
PWC https://paperswithcode.com/paper/r2d2-reliable-and-repeatable-detector-and
Repo https://github.com/naver/r2d2
Framework pytorch
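
A hedged sketch of the three dense outputs the abstract describes: unit-norm local descriptors plus per-pixel repeatability and reliability maps, with keypoints taken where both maps are high. The tiny backbone is illustrative and the training losses are omitted; only the output structure follows the abstract.

```python
import torch
import torch.nn as nn

class TinyDetectorDescriptor(nn.Module):
    """Joint detection-and-description head: dense descriptors plus
    repeatability (where keypoints repeat across views) and reliability
    (where descriptor matching can be trusted) maps."""
    def __init__(self, dim=128):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, dim, 3, padding=1),
        )
        self.repeatability = nn.Conv2d(dim, 1, 1)
        self.reliability = nn.Conv2d(dim, 1, 1)

    def forward(self, img):
        feat = self.backbone(img)
        desc = nn.functional.normalize(feat, dim=1)        # unit-norm descriptors per pixel
        rep = torch.sigmoid(self.repeatability(feat))
        rel = torch.sigmoid(self.reliability(feat))
        return desc, rep, rel

desc, rep, rel = TinyDetectorDescriptor()(torch.randn(1, 3, 64, 64))
# keypoints would be selected where both repeatability and reliability are high
```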

Sampling Networks and Aggregate Simulation for Online POMDP Planning

Title Sampling Networks and Aggregate Simulation for Online POMDP Planning
Authors Hao (Jackson) Cui, Roni Khardon
Abstract The paper introduces a new algorithm for planning in partially observable Markov decision processes (POMDP) based on the idea of aggregate simulation. The algorithm uses product distributions to approximate the belief state and shows how to build a representation graph of an approximate action-value function over belief space. The graph captures the result of simulating the model in aggregate under independence assumptions, giving a symbolic representation of the value function. The algorithm supports large observation spaces using sampling networks, a representation of the process of sampling values of observations, which is integrated into the graph representation. Following previous work on MDPs, this approach enables action selection in POMDPs through gradient optimization over the graph representation. This approach complements recent algorithms for POMDPs which are based on particle representations of belief states and an explicit search for action selection. Our approach enables scaling to large factored action spaces in addition to large state spaces and observation spaces. An experimental evaluation demonstrates that the algorithm provides excellent performance relative to the state of the art on large POMDP problems.
Tasks
Published 2019-12-01
URL http://papers.nips.cc/paper/9121-sampling-networks-and-aggregate-simulation-for-online-pomdp-planning
PDF http://papers.nips.cc/paper/9121-sampling-networks-and-aggregate-simulation-for-online-pomdp-planning.pdf
PWC https://paperswithcode.com/paper/sampling-networks-and-aggregate-simulation
Repo https://github.com/hcui01/SNAP
Framework none

Park: An Open Platform for Learning-Augmented Computer Systems

Title Park: An Open Platform for Learning-Augmented Computer Systems
Authors Hongzi Mao, Parimarjan Negi, Akshay Narayan, Hanrui Wang, Jiacheng Yang, Haonan Wang, Ryan Marcus, Ravichandra Addanki, Mehrdad Khani Shirkoohi, Songtao He, Vikram Nathan, Frank Cangialosi, Shaileshh Venkatakrishnan, Wei-Hung Weng, Song Han, Tim Kraska, Mohammad Alizadeh
Abstract We present Park, a platform for researchers to experiment with Reinforcement Learning (RL) for computer systems. Using RL to improve the performance of systems has a lot of potential, but is also in many ways very different from, for example, using RL for games. Thus, in this work we first discuss the unique challenges that RL for systems presents, and then propose Park, an open, extensible platform that makes it easier for ML researchers to work on systems problems. Currently, Park consists of 12 real-world system-centric optimization problems with one common, easy-to-use interface. Finally, we present the performance of existing RL approaches on those 12 problems and outline potential areas of future work.
Tasks
Published 2019-12-01
URL http://papers.nips.cc/paper/8519-park-an-open-platform-for-learning-augmented-computer-systems
PDF http://papers.nips.cc/paper/8519-park-an-open-platform-for-learning-augmented-computer-systems.pdf
PWC https://paperswithcode.com/paper/park-an-open-platform-for-learning-augmented
Repo https://github.com/park-project/park
Framework tf
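
A hedged sketch of interacting with a Park environment; the gym-style `park.make` / `reset` / `step` interface and the 'load_balance' environment name are assumptions based on the project README rather than a guaranteed API.

```python
# Sketch only: assumes Park exposes a gym-like environment interface.
import park

env = park.make('load_balance')        # one of the system-centric problems
obs = env.reset()
done = False
while not done:
    action = env.action_space.sample()             # replace with an RL agent's policy
    obs, reward, done, info = env.step(action)
```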