April 2, 2020

Paper Group ANR 168

A Multilingual View of Unsupervised Machine Translation


Title	A Multilingual View of Unsupervised Machine Translation
Authors	Xavier Garcia, Pierre Foret, Thibault Sellam, Ankur P. Parikh
Abstract	We present a probabilistic framework for multilingual neural machine translation that encompasses supervised and unsupervised setups, focusing on unsupervised translation. In addition to studying the vanilla case where there is only monolingual data available, we propose a novel setup where one language in the (source, target) pair is not associated with any parallel data, but there may exist auxiliary parallel data that contains the other. This auxiliary data can naturally be utilized in our probabilistic framework via a novel cross-translation loss term. Empirically, we show that our approach results in higher BLEU scores over state-of-the-art unsupervised models on the WMT’14 English-French, WMT’16 English-German, and WMT’16 English-Romanian datasets in most directions. In particular, we obtain a +1.65 BLEU advantage over the best-performing unsupervised model in the Romanian-English direction.
Tasks	Machine Translation, Unsupervised Machine Translation
Published	2020-02-07
URL	https://arxiv.org/abs/2002.02955v2
PDF	https://arxiv.org/pdf/2002.02955v2.pdf
PWC	https://paperswithcode.com/paper/a-multilingual-view-of-unsupervised-machine
Repo
Framework

Lost in Embedding Space: Explaining Cross-Lingual Task Performance with Eigenvalue Divergence


Title	Lost in Embedding Space: Explaining Cross-Lingual Task Performance with Eigenvalue Divergence
Authors	Haim Dubossarsky, Ivan Vulić, Roi Reichart, Anna Korhonen
Abstract	Performance in cross-lingual NLP tasks is impacted by the (dis)similarity of languages at hand: e.g., previous work has suggested there is a connection between the expected success of bilingual lexicon induction (BLI) and the assumption of (approximate) isomorphism between monolingual embedding spaces. In this work, we present a large-scale study focused on the correlations between language similarity and task performance, covering thousands of language pairs and four different tasks: BLI, machine translation, parsing, and POS tagging. We propose a novel language distance measure, Eigenvalue Divergence (EVD), which quantifies the degree of isomorphism between two monolingual spaces. We empirically show that 1) language similarity scores derived from embedding-based EVD distances are strongly associated with performance observed in different cross-lingual tasks, 2) EVD outperforms other standard embedding-based language distance measures across the board, at the same time being computationally more tractable and easier to interpret. Finally, we demonstrate that EVD captures information which is complementary to typologically driven language distance measures. We report that their combination yields even higher correlations with performance levels in all cross-lingual tasks.
Tasks	Machine Translation
Published	2020-01-30
URL	https://arxiv.org/abs/2001.11136v1
PDF	https://arxiv.org/pdf/2001.11136v1.pdf
PWC	https://paperswithcode.com/paper/lost-in-embedding-space-explaining-cross
Repo
Framework

Suphx: Mastering Mahjong with Deep Reinforcement Learning


Title	Suphx: Mastering Mahjong with Deep Reinforcement Learning
Authors	Junjie Li, Sotetsu Koyamada, Qiwei Ye, Guoqing Liu, Chao Wang, Ruihan Yang, Li Zhao, Tao Qin, Tie-Yan Liu, Hsiao-Wuen Hon
Abstract	Artificial Intelligence (AI) has achieved great success in many domains, and game AI is widely regarded as its beachhead since the dawn of AI. In recent years, studies on game AI have gradually evolved from relatively simple environments (e.g., perfect-information games such as Go, chess, shogi or two-player imperfect-information games such as heads-up Texas hold’em) to more complex ones (e.g., multi-player imperfect-information games such as multi-player Texas hold’em and StarCraft II). Mahjong is a popular multi-player imperfect-information game worldwide but very challenging for AI research due to its complex playing/scoring rules and rich hidden information. We design an AI for Mahjong, named Suphx, based on deep reinforcement learning with some newly introduced techniques including global reward prediction, oracle guiding, and run-time policy adaptation. Suphx has demonstrated stronger performance than most top human players in terms of stable rank and is rated above 99.99% of all the officially ranked human players in the Tenhou platform. This is the first time that a computer program outperforms most top human players in Mahjong.
Tasks
Published	2020-03-30
URL	https://arxiv.org/abs/2003.13590v2
PDF	https://arxiv.org/pdf/2003.13590v2.pdf
PWC	https://paperswithcode.com/paper/suphx-mastering-mahjong-with-deep
Repo
Framework

Equivalence of Dataflow Graphs via Rewrite Rules Using a Graph-to-Sequence Neural Model


Title	Equivalence of Dataflow Graphs via Rewrite Rules Using a Graph-to-Sequence Neural Model
Authors	Steve Kommrusch, Théo Barollet, Louis-Noël Pouchet
Abstract	In this work we target the problem of provably computing the equivalence between two programs represented as dataflow graphs. To this end, we formalize the problem of equivalence between two programs as finding a set of semantics-preserving rewrite rules from one into the other, such that after the rewrite the two programs are structurally identical, and therefore trivially equivalent. We then develop the first graph-to-sequence neural network system for program equivalence, trained to produce such rewrite sequences from a carefully crafted automatic example generation algorithm. We extensively evaluate our system on a rich multi-type linear algebra expression language, using arbitrary combinations of 100+ graph-rewriting axioms of equivalence. Our system outputs via inference a correct rewrite sequence for 96% of the 10,000 program pairs isolated for testing, using 30-term programs. And in all cases, the validity of the sequence produced and therefore the provable assertion of program equivalence is computable, in negligible time.
Tasks	Graph-to-Sequence
Published	2020-02-17
URL	https://arxiv.org/abs/2002.06799v1
PDF	https://arxiv.org/pdf/2002.06799v1.pdf
PWC	https://paperswithcode.com/paper/equivalence-of-dataflow-graphs-via-rewrite
Repo
Framework
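
The abstract above notes that the validity of a produced rewrite sequence can be checked cheaply. The following is a minimal sketch of that idea only, not the authors' system: expressions are nested tuples, and the two rules (`commute`, `drop_add_zero`), the path encoding, and all function names are hypothetical stand-ins for the paper's 100+ axioms.

```python
# Toy checker: apply a sequence of semantics-preserving rewrites to one
# expression and test whether the result is structurally identical to another.
# Expressions are nested tuples such as ("+", ("*", "a", "b"), "c").

def commute(expr):
    """Hypothetical axiom: x op y -> y op x for commutative operators."""
    if not isinstance(expr, tuple) or len(expr) != 3 or expr[0] not in ("+", "*"):
        raise ValueError("commute does not apply here")
    op, left, right = expr
    return (op, right, left)

def drop_add_zero(expr):
    """Hypothetical axiom: x + 0 -> x."""
    if not isinstance(expr, tuple) or len(expr) != 3 or expr[0] != "+" or expr[2] != 0:
        raise ValueError("drop_add_zero does not apply here")
    return expr[1]

def apply_at(expr, path, rule):
    """Apply `rule` to the sub-expression reached by following operand indices in `path`."""
    if not path:
        return rule(expr)
    if not isinstance(expr, tuple):
        raise ValueError("path leaves the expression")
    op, *operands = expr
    operands[path[0]] = apply_at(operands[path[0]], path[1:], rule)
    return (op, *operands)

def check_rewrite_sequence(source, target, steps):
    """Return True if applying `steps` (pairs of path and rule) to `source` yields `target`."""
    current = source
    for path, rule in steps:
        current = apply_at(current, path, rule)
    return current == target

source = ("+", ("*", "a", "b"), "c")
target = ("+", "c", ("*", "b", "a"))
steps = [((), commute), ((1,), commute)]
is_valid = check_rewrite_sequence(source, target, steps)
```

Because each step is a plain tree transformation, checking a proposed sequence costs time proportional to the sequence length and expression size, regardless of how the sequence was generated.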

Segmentation of Cellular Patterns in Confocal Images of Melanocytic Lesions in vivo via a Multiscale Encoder-Decoder Network (MED-Net)


Title	Segmentation of Cellular Patterns in Confocal Images of Melanocytic Lesions in vivo via a Multiscale Encoder-Decoder Network (MED-Net)
Authors	Kivanc Kose, Alican Bozkurt, Christi Alessi-Fox, Melissa Gill, Caterina Longo, Giovanni Pellacani, Jennifer Dy, Dana H. Brooks, Milind Rajadhyaksha
Abstract	In-vivo optical microscopy is advancing into routine clinical practice for non-invasively guiding diagnosis and treatment of cancer and other diseases, and thus beginning to reduce the need for traditional biopsy. However, reading and analysis of the optical microscopic images are generally still qualitative, relying mainly on visual examination. Here we present an automated semantic segmentation method called “Multiscale Encoder-Decoder Network (MED-Net)” that provides pixel-wise labeling into classes of patterns in a quantitative manner. The novelty in our approach is the modeling of textural patterns at multiple scales. This mimics the procedure for examining pathology images, which routinely starts with low magnification (low resolution, large field of view) followed by closer inspection of suspicious areas with higher magnification (higher resolution, smaller fields of view). We trained and tested our model on non-overlapping partitions of 117 reflectance confocal microscopy (RCM) mosaics of melanocytic lesions, an extensive dataset for this application, collected at four clinics in the US, and two in Italy. With patient-wise cross-validation, we achieved pixel-wise mean sensitivity and specificity of $70\pm11\%$ and $95\pm2\%$, respectively, with $0.71\pm0.09$ Dice coefficient over six classes. In a second scenario, we partitioned the data clinic-wise and tested the generalizability of the model over multiple clinics. In this setting, we achieved pixel-wise mean sensitivity and specificity of $74\%$ and $95\%$, respectively, with $0.75$ Dice coefficient. We compared MED-Net against the state-of-the-art semantic segmentation models and achieved better quantitative segmentation performance. Our results also suggest that, due to its nested multiscale architecture, the MED-Net model annotated RCM mosaics more coherently, avoiding unrealistic-fragmented annotations.
Tasks	Semantic Segmentation
Published	2020-01-03
URL	https://arxiv.org/abs/2001.01005v1
PDF	https://arxiv.org/pdf/2001.01005v1.pdf
PWC	https://paperswithcode.com/paper/segmentation-of-cellular-patterns-in-confocal
Repo
Framework
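
For readers unfamiliar with the metrics quoted in the abstract above, here is a minimal sketch of pixel-wise sensitivity, specificity, and Dice coefficient for one class against the rest. It uses the standard definitions; the function name, the flat-list label encoding, and the toy labels are assumptions for illustration and are not taken from the paper.

```python
# One-vs-rest pixel metrics for a single class, using standard definitions:
#   sensitivity = TP / (TP + FN)
#   specificity = TN / (TN + FP)
#   Dice        = 2 TP / (2 TP + FP + FN)

def one_vs_rest_metrics(pred, truth, cls):
    """Compute metrics for class `cls` from flat sequences of per-pixel labels."""
    tp = fp = fn = tn = 0
    for p, t in zip(pred, truth):
        if p == cls and t == cls:
            tp += 1
        elif p == cls:
            fp += 1
        elif t == cls:
            fn += 1
        else:
            tn += 1

    def ratio(num, den):
        # Undefined ratios (empty denominator) are reported as NaN.
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "dice": ratio(2 * tp, 2 * tp + fp + fn),
    }

# Toy labels (hypothetical): six pixels, class 1 versus everything else.
pred = [1, 1, 0, 0, 1, 0]
truth = [1, 0, 0, 0, 1, 1]
metrics = one_vs_rest_metrics(pred, truth, cls=1)
```

A multi-class mean, such as the six-class figures reported above, would average these per-class values; how the paper aggregates across classes and folds is not specified in the abstract.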

Diffusion State Distances: Multitemporal Analysis, Fast Algorithms, and Applications to Biological Networks


Title	Diffusion State Distances: Multitemporal Analysis, Fast Algorithms, and Applications to Biological Networks
Authors	Lenore Cowen, Kapil Devkota, Xiaozhe Hu, James M. Murphy, Kaiyi Wu
Abstract	Data-dependent metrics are powerful tools for learning the underlying structure of high-dimensional data. This article develops and analyzes a data-dependent metric known as diffusion state distance (DSD), which compares points using a data-driven diffusion process. Unlike related diffusion methods, DSDs incorporate information across time scales, which allows for the intrinsic data structure to be inferred in a parameter-free manner. This article develops a theory for DSD based on the multitemporal emergence of mesoscopic equilibria in the underlying diffusion process. New algorithms for denoising and dimension reduction with DSD are also proposed and analyzed. These approaches are based on a weighted spectral decomposition of the underlying diffusion process, and experiments on synthetic datasets and real biological networks illustrate the efficacy of the proposed algorithms in terms of both speed and accuracy. Throughout, comparisons with related methods are made, in order to illustrate the distinct advantages of DSD for datasets exhibiting multiscale structure.
Tasks	Denoising, Dimensionality Reduction
Published	2020-03-07
URL	https://arxiv.org/abs/2003.03616v1
PDF	https://arxiv.org/pdf/2003.03616v1.pdf
PWC	https://paperswithcode.com/paper/diffusion-state-distances-multitemporal
Repo
Framework

Extending BrainScaleS OS for BrainScaleS-2


Title	Extending BrainScaleS OS for BrainScaleS-2
Authors	Eric Müller, Christian Mauch, Philipp Spilger, Oliver Julien Breitwieser, Johann Klähn, David Stöckel, Timo Wunderlich, Johannes Schemmel
Abstract	BrainScaleS-2 is a mixed-signal accelerated neuromorphic system targeted for research in the fields of computational neuroscience and beyond-von-Neumann computing. To augment its flexibility, the analog neural network core is accompanied by an embedded SIMD microprocessor. The BrainScaleS Operating System (BrainScaleS OS) is a software stack designed for the user-friendly operation of the BrainScaleS architectures. We present and walk through the software-architectural enhancements that were introduced for the BrainScaleS-2 architecture. Finally, using a second-version BrainScaleS-2 prototype we demonstrate its application in an example experiment based on spike-based expectation maximization.
Tasks
Published	2020-03-30
URL	https://arxiv.org/abs/2003.13750v1
PDF	https://arxiv.org/pdf/2003.13750v1.pdf
PWC	https://paperswithcode.com/paper/extending-brainscales-os-for-brainscales-2
Repo
Framework

Cross-View Tracking for Multi-Human 3D Pose Estimation at over 100 FPS


Title	Cross-View Tracking for Multi-Human 3D Pose Estimation at over 100 FPS
Authors	Long Chen, Haizhou Ai, Rui Chen, Zijie Zhuang, Shuang Liu
Abstract	Estimating 3D poses of multiple humans in real-time is a classic but still challenging task in computer vision. Its major difficulty lies in the ambiguity in cross-view association of 2D poses and the huge state space when there are multiple people in multiple views. In this paper, we present a novel solution for multi-human 3D pose estimation from multiple calibrated camera views. It takes 2D poses in different camera coordinates as inputs and aims for the accurate 3D poses in the global coordinate. Unlike previous methods that associate 2D poses among all pairs of views from scratch at every frame, we exploit the temporal consistency in videos to match the 2D inputs with 3D poses directly in 3-space. More specifically, we propose to retain the 3D pose for each person and update them iteratively via the cross-view multi-human tracking. This novel formulation improves both accuracy and efficiency, as we demonstrated on widely-used public datasets. To further verify the scalability of our method, we propose a new large-scale multi-human dataset with 12 to 28 camera views. Without bells and whistles, our solution achieves 154 FPS on 12 cameras and 34 FPS on 28 cameras, indicating its ability to handle large-scale real-world applications. The proposed dataset will be released soon.
Tasks	3D Pose Estimation, Pose Estimation
Published	2020-03-09
URL	https://arxiv.org/abs/2003.03972v1
PDF	https://arxiv.org/pdf/2003.03972v1.pdf
PWC	https://paperswithcode.com/paper/cross-view-tracking-for-multi-human-3d-pose
Repo
Framework

Three-Stream Fusion Network for First-Person Interaction Recognition


Title	Three-Stream Fusion Network for First-Person Interaction Recognition
Authors	Ye-Ji Kim, Dong-Gyu Lee, Seong-Whan Lee
Abstract	First-person interaction recognition is a challenging task because of unstable video conditions resulting from the camera wearer’s movement. For human interaction recognition from a first-person viewpoint, this paper proposes a three-stream fusion network with two main parts: three-stream architecture and three-stream correlation fusion. The three-stream architecture captures the characteristics of the target appearance, target motion, and camera ego-motion. Meanwhile, the three-stream correlation fusion combines the feature map of each of the three streams to consider the correlations among the target appearance, target motion, and camera ego-motion. The fused feature vector is robust to the camera movement and compensates for the noise of the camera ego-motion. Short-term intervals are modeled using the fused feature vector, and a long short-term memory (LSTM) model considers the temporal dynamics of the video. We evaluated the proposed method on two public benchmark datasets to validate the effectiveness of our approach. The experimental results show that the proposed fusion method successfully generated a discriminative feature vector, and our network outperformed all competing activity recognition methods in first-person videos where considerable camera ego-motion occurs.
Tasks	Activity Recognition, Human Interaction Recognition
Published	2020-02-19
URL	https://arxiv.org/abs/2002.08219v1
PDF	https://arxiv.org/pdf/2002.08219v1.pdf
PWC	https://paperswithcode.com/paper/three-stream-fusion-network-for-first-person
Repo
Framework

Stochastic Recursive Momentum for Policy Gradient Methods


Title	Stochastic Recursive Momentum for Policy Gradient Methods
Authors	Huizhuo Yuan, Xiangru Lian, Ji Liu, Yuren Zhou
Abstract	In this paper, we propose a novel algorithm named STOchastic Recursive Momentum for Policy Gradient (STORM-PG), which operates a SARAH-type stochastic recursive variance-reduced policy gradient in an exponential moving average fashion. STORM-PG enjoys a provably sharp $O(1/\epsilon^3)$ sample complexity bound, matching the best-known convergence rate for policy gradient algorithms. In the meantime, STORM-PG avoids the alternations between large batches and small batches which persist in comparable variance-reduced policy gradient methods, allowing considerably simpler parameter tuning. Numerical experiments depict the superiority of our algorithm over comparative policy gradient algorithms.
Tasks	Policy Gradient Methods
Published	2020-03-09
URL	https://arxiv.org/abs/2003.04302v1
PDF	https://arxiv.org/pdf/2003.04302v1.pdf
PWC	https://paperswithcode.com/paper/stochastic-recursive-momentum-for-policy
Repo
Framework
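
To make "recursive momentum in an exponential moving average fashion" concrete, here is a minimal sketch of a STORM-style gradient estimator on a toy stochastic quadratic. It is not the STORM-PG algorithm: there is no policy, no trajectory sampling, and no importance weighting, and the objective, step size `lr`, momentum weight `a`, and function names are all assumptions chosen for illustration. The update used is d_t = g(x_t; xi_t) + (1 - a) * (d_{t-1} - g(x_{t-1}; xi_t)), where the same sample xi_t is evaluated at both iterates.

```python
import random

def noisy_grad(x, xi):
    """Stochastic gradient of the toy objective 0.5 * (x - 3)^2 with additive noise xi."""
    return (x - 3.0) + xi

def storm_style_descent(steps=500, lr=0.1, a=0.1, noise_std=0.5, seed=0):
    """Minimize the toy objective using a recursive-momentum gradient estimate."""
    rng = random.Random(seed)
    x_prev = 0.0
    # Initialize the estimator with a single plain stochastic gradient.
    d = noisy_grad(x_prev, rng.gauss(0.0, noise_std))
    x = x_prev - lr * d
    for _ in range(steps):
        xi = rng.gauss(0.0, noise_std)  # one fresh sample per step, no large batch
        # Recursive momentum: correct the previous estimate with the gradient
        # difference at the new and old iterates under the SAME sample.
        d = noisy_grad(x, xi) + (1.0 - a) * (d - noisy_grad(x_prev, xi))
        x_prev, x = x, x - lr * d
    return x

x_noise_free = storm_style_descent(steps=200, noise_std=0.0)
x_noisy = storm_style_descent(steps=500, noise_std=0.5)
```

Each step draws a single sample, which illustrates the point in the abstract about avoiding alternation between large and small batches; setting `a = 1` would reduce the estimator to plain stochastic gradient descent.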

Statistically Efficient Off-Policy Policy Gradients


Title	Statistically Efficient Off-Policy Policy Gradients
Authors	Nathan Kallus, Masatoshi Uehara
Abstract	Policy gradient methods in reinforcement learning update policy parameters by taking steps in the direction of an estimated gradient of policy value. In this paper, we consider the statistically efficient estimation of policy gradients from off-policy data, where the estimation is particularly non-trivial. We derive the asymptotic lower bound on the feasible mean-squared error in both Markov and non-Markov decision processes and show that existing estimators fail to achieve it in general settings. We propose a meta-algorithm that achieves the lower bound without any parametric assumptions and exhibits a unique 3-way double robustness property. We discuss how to estimate nuisances that the algorithm relies on. Finally, we establish guarantees on the rate at which we approach a stationary point when we take steps in the direction of our new estimated policy gradient.
Tasks	Policy Gradient Methods
Published	2020-02-10
URL	https://arxiv.org/abs/2002.04014v2
PDF	https://arxiv.org/pdf/2002.04014v2.pdf
PWC	https://paperswithcode.com/paper/statistically-efficient-off-policy-policy
Repo
Framework

SGLB: Stochastic Gradient Langevin Boosting


Title	SGLB: Stochastic Gradient Langevin Boosting
Authors	Aleksei Ustimenko, Liudmila Prokhorenkova
Abstract	In this paper, we introduce Stochastic Gradient Langevin Boosting (SGLB) - a powerful and efficient machine learning framework, which can deal with a wide range of loss functions and has provable generalization guarantees. The method is based on a special form of Langevin Diffusion equation specifically designed for gradient boosting. This allows us to guarantee the global convergence, while standard gradient boosting algorithms can guarantee only local optima, which is a problem for multimodal loss functions. To illustrate the advantages of SGLB, we apply it to a classification task with 0-1 loss function, which is known to be multimodal, and to a standard logistic regression task that is convex. The algorithm is implemented as a part of the CatBoost gradient boosting library and outperforms classic gradient boosting methods.
Tasks
Published	2020-01-20
URL	https://arxiv.org/abs/2001.07248v2
PDF	https://arxiv.org/pdf/2001.07248v2.pdf
PWC	https://paperswithcode.com/paper/sglb-stochastic-gradient-langevin-boosting
Repo
Framework

Using CNNs For Users Segmentation In Video See-Through Augmented Virtuality


Title	Using CNNs For Users Segmentation In Video See-Through Augmented Virtuality
Authors	Pierre-Olivier Pigny, Lionel Dominjon
Abstract	In this paper, we present preliminary results on the use of deep learning techniques to integrate the user’s self-body and other participants into a head-mounted video see-through augmented virtuality scenario. It has been previously shown that seeing users’ bodies in such simulations may improve the feeling of both self and social presence in the virtual environment, as well as user performance. We propose to use a convolutional neural network for real-time semantic segmentation of users’ bodies in the stereoscopic RGB video streams acquired from the perspective of the user. We describe design issues as well as implementation details of the system and demonstrate the feasibility of using such neural networks for merging users’ bodies in an augmented virtuality simulation.
Tasks	Real-Time Semantic Segmentation, Semantic Segmentation
Published	2020-01-02
URL	https://arxiv.org/abs/2001.00487v1
PDF	https://arxiv.org/pdf/2001.00487v1.pdf
PWC	https://paperswithcode.com/paper/using-cnns-for-users-segmentation-in-video
Repo
Framework

A Novel Generative Neural Approach for InSAR Joint Phase Filtering and Coherence Estimation


Title	A Novel Generative Neural Approach for InSAR Joint Phase Filtering and Coherence Estimation
Authors	Subhayan Mukherjee, Aaron Zimmer, Xinyao Sun, Parwant Ghuman, Irene Cheng
Abstract	Earth’s physical properties like atmosphere, topography and ground instability can be determined by differencing billions of phase measurements (pixels) in subsequent matching Interferometric Synthetic Aperture Radar (InSAR) images. Quality (coherence) of each pixel can vary from perfect information (1) to complete noise (0), which needs to be quantified, alongside filtering information-bearing pixels. Phase filtering is thus critical to InSAR’s Digital Elevation Model (DEM) production pipeline, as it removes spatial inconsistencies (residues), immensely improving the subsequent unwrapping. Recent explosion in quantity of available InSAR data can facilitate Wide Area Monitoring (WAM) over several geographical regions, if effective and efficient automated processing can obviate manual quality-control. Advances in parallel computing architectures and Convolutional Neural Networks (CNNs), which thrive on them to rival human performance on visual pattern recognition, make this approach ideal for InSAR phase filtering for WAM, but it remains largely unexplored. We propose “GenInSAR”, a CNN-based generative model for joint phase filtering and coherence estimation. We use satellite and simulated InSAR images to show overall superior performance of GenInSAR over five algorithms qualitatively, and quantitatively using Phase and Coherence Root-Mean-Squared-Error, Residue Reduction Percentage, and Phase Cosine Error.
Tasks
Published	2020-01-27
URL	https://arxiv.org/abs/2001.09631v1
PDF	https://arxiv.org/pdf/2001.09631v1.pdf
PWC	https://paperswithcode.com/paper/a-novel-generative-neural-approach-for-insar
Repo
Framework
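
As background for the coherence scale mentioned in the abstract above (1 for perfect information, 0 for complete noise), the following sketch computes the classical windowed sample coherence between two co-registered complex signals. This is the textbook estimator, not GenInSAR's learned estimate; the function name and the toy pixel values are assumptions for illustration.

```python
import cmath
import math

def sample_coherence(z1, z2):
    """Magnitude of the normalized complex cross-correlation over a window.

    z1, z2: equal-length sequences of complex pixel values from two
    co-registered images. Returns a value in [0, 1].
    """
    cross = sum(a * b.conjugate() for a, b in zip(z1, z2))
    power = sum(abs(a) ** 2 for a in z1) * sum(abs(b) ** 2 for b in z2)
    return abs(cross) / math.sqrt(power) if power else 0.0

# Toy window (hypothetical values): the second image is the first one with a
# constant phase offset, so the pair is fully coherent.
window_a = [1 + 1j, 2 - 1j, 0.5 + 0.2j]
window_b = [z * cmath.exp(0.7j) for z in window_a]
coherent = sample_coherence(window_a, window_b)

# Two signals whose cross terms cancel exactly give zero coherence.
incoherent = sample_coherence([1, 1], [1, -1])
```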

Estimating Uncertainty Intervals from Collaborating Networks


Title	Estimating Uncertainty Intervals from Collaborating Networks
Authors	Tianhui Zhou, Yitong Li, Yuan Wu, David Carlson
Abstract	Effective decision making requires understanding the uncertainty inherent in a prediction. To estimate uncertainty in regression, one could modify a deep neural network to predict coverage intervals, such as by predicting the mean and standard deviation. Unfortunately, in our empirical evaluations the predicted coverage from existing approaches is either overconfident or lacks sharpness (gives imprecise intervals). To address this challenge, we propose a novel method to estimate uncertainty based on two distinct neural networks with two distinct loss functions in a similar vein to Generative Adversarial Networks. Specifically, one network tries to learn the cumulative distribution function, and the second network tries to learn its inverse. Theoretical analysis demonstrates that the idealized solution is a fixed point and that under certain conditions the approach is asymptotically consistent to ground truth. We benchmark the approach on one synthetic and five real-world datasets, including forecasting A1c values in diabetic patients from electronic health records, where uncertainty is critical. In synthetic data, the proposed approach essentially matches the theoretically optimal solution in all aspects. In the real datasets, the proposed approach is empirically more faithful in its coverage estimates and typically gives sharper intervals than competing methods.
Tasks	Decision Making
Published	2020-02-12
URL	https://arxiv.org/abs/2002.05212v1
PDF	https://arxiv.org/pdf/2002.05212v1.pdf
PWC	https://paperswithcode.com/paper/estimating-uncertainty-intervals-from
Repo
Framework
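
The abstract above describes one network learning the cumulative distribution function and another learning its inverse. As a rough illustration of why that pair is useful, the sketch below derives a central coverage interval from an inverse CDF and checks the coverage with the CDF. A closed-form Gaussian from Python's `statistics.NormalDist` stands in for both learned functions; the mean and standard deviation are hypothetical numbers, and nothing here reproduces the paper's training procedure.

```python
from statistics import NormalDist

def central_interval(dist, coverage=0.95):
    """Return (lower, upper) quantiles enclosing `coverage` probability mass."""
    tail = (1.0 - coverage) / 2.0
    return dist.inv_cdf(tail), dist.inv_cdf(1.0 - tail)

# Stand-in predictive distribution for a single regression target
# (hypothetical values, e.g. a forecast with mean 7.0 and std 1.2).
dist = NormalDist(mu=7.0, sigma=1.2)

lower, upper = central_interval(dist, coverage=0.95)

# The CDF lets us verify that the interval really has the requested coverage.
achieved = dist.cdf(upper) - dist.cdf(lower)
```

In this framing, the inverse CDF produces the interval endpoints and the CDF measures how much probability the interval actually contains, which is the sense in which the two functions check each other.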