February 1, 2020

3395 words 16 mins read

Paper Group AWR 220

Exploring Language Similarities with Dimensionality Reduction Technique. Practical Deep Learning with Bayesian Principles. MLQA: Evaluating Cross-lingual Extractive Question Answering. Attentive Modality Hopping Mechanism for Speech Emotion Recognition. Temporal Localization of Moments in Video Collections with Natural Language. An Unsupervised Aut …

Exploring Language Similarities with Dimensionality Reduction Technique

Title Exploring Language Similarities with Dimensionality Reduction Technique
Authors Sangarshanan Veeraraghavan
Abstract In recent years, several novel models have been developed to process natural language, and accurate language translation systems have helped us overcome geographical barriers and communicate ideas effectively. These models are mostly developed for a few widely used languages, while other languages are ignored. Most spoken languages share lexical, syntactic and semantic similarities with several other languages, and knowing this can help us leverage existing models to build more specific and accurate models for other languages. Here I explore the idea of representing several popular languages in a lower dimension such that their similarities can be visualized using simple two-dimensional plots. This can even help us understand newly discovered languages that may not share their vocabulary with any existing language.
Tasks Dimensionality Reduction
Published 2019-02-16
URL http://arxiv.org/abs/1902.06092v1
PDF http://arxiv.org/pdf/1902.06092v1.pdf
PWC https://paperswithcode.com/paper/exploring-language-similarities-with
Repo https://github.com/Sangarshanan/Exploring-Language-similarities
Framework none
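
As a rough illustration of the idea (not the linked repository's exact pipeline), the sketch below embeds a handful of languages as character-trigram frequency vectors and projects them to two dimensions with PCA; the sample sentences are hypothetical stand-ins for real corpora.

```python
# Minimal sketch (not the paper's exact pipeline): embed languages as
# character-trigram frequency vectors and project them to 2D with PCA.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import PCA
import matplotlib.pyplot as plt

# Hypothetical sample text per language; a real study would use large corpora.
samples = {
    "english": "the quick brown fox jumps over the lazy dog",
    "german":  "der schnelle braune fuchs springt ueber den faulen hund",
    "dutch":   "de snelle bruine vos springt over de luie hond",
    "spanish": "el rapido zorro marron salta sobre el perro perezoso",
    "italian": "la rapida volpe marrone salta sopra il cane pigro",
}

vectorizer = TfidfVectorizer(analyzer="char", ngram_range=(3, 3))
X = vectorizer.fit_transform(samples.values()).toarray()

coords = PCA(n_components=2).fit_transform(X)
for (lang, _), (x, y) in zip(samples.items(), coords):
    plt.scatter(x, y)
    plt.annotate(lang, (x, y))
plt.title("Languages projected to 2D (toy example)")
plt.show()
```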

Practical Deep Learning with Bayesian Principles

Title Practical Deep Learning with Bayesian Principles
Authors Kazuki Osawa, Siddharth Swaroop, Anirudh Jain, Runa Eschenhagen, Richard E. Turner, Rio Yokota, Mohammad Emtiyaz Khan
Abstract Bayesian methods promise to fix many shortcomings of deep learning, but they are impractical and rarely match the performance of standard methods, let alone improve them. In this paper, we demonstrate practical training of deep networks with natural-gradient variational inference. By applying techniques such as batch normalisation, data augmentation, and distributed training, we achieve similar performance in about the same number of epochs as the Adam optimiser, even on large datasets such as ImageNet. Importantly, the benefits of Bayesian principles are preserved: predictive probabilities are well-calibrated, uncertainties on out-of-distribution data are improved, and continual-learning performance is boosted. This work enables practical deep learning while preserving benefits of Bayesian principles. A PyTorch implementation is available as a plug-and-play optimiser.
Tasks Continual Learning, Data Augmentation
Published 2019-06-06
URL https://arxiv.org/abs/1906.02506v2
PDF https://arxiv.org/pdf/1906.02506v2.pdf
PWC https://paperswithcode.com/paper/practical-deep-learning-with-bayesian
Repo https://github.com/team-approx-bayes/dl-with-bayes
Framework pytorch
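
The repository ships this as a plug-and-play PyTorch optimiser; its exact API is not reproduced here. As a rough illustration of what variational training of a network layer involves, the following is a minimal mean-field Gaussian sketch trained with Adam and the reparameterisation trick (not the paper's natural-gradient method); the data is synthetic.

```python
# Generic mean-field variational inference sketch in PyTorch (not the authors'
# natural-gradient optimiser): a Bayesian linear layer trained by maximising
# the ELBO with the reparameterisation trick.
import torch
import torch.nn as nn
import torch.nn.functional as F

class BayesLinear(nn.Module):
    def __init__(self, d_in, d_out):
        super().__init__()
        self.mu = nn.Parameter(torch.zeros(d_out, d_in))
        self.log_sigma = nn.Parameter(torch.full((d_out, d_in), -3.0))

    def forward(self, x):
        sigma = self.log_sigma.exp()
        w = self.mu + sigma * torch.randn_like(sigma)   # reparameterised sample
        return F.linear(x, w)

    def kl(self):  # KL(q(w) || N(0, I))
        sigma = self.log_sigma.exp()
        return 0.5 * (sigma**2 + self.mu**2 - 1 - 2 * self.log_sigma).sum()

layer = BayesLinear(20, 2)
opt = torch.optim.Adam(layer.parameters(), lr=1e-2)
x, y = torch.randn(64, 20), torch.randint(0, 2, (64,))
for _ in range(200):
    loss = F.cross_entropy(layer(x), y) + layer.kl() / len(x)  # NLL + scaled KL
    opt.zero_grad()
    loss.backward()
    opt.step()
```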

MLQA: Evaluating Cross-lingual Extractive Question Answering

Title MLQA: Evaluating Cross-lingual Extractive Question Answering
Authors Patrick Lewis, Barlas Oğuz, Ruty Rinott, Sebastian Riedel, Holger Schwenk
Abstract Question answering (QA) models have shown rapid progress enabled by the availability of large, high-quality benchmark datasets. Such annotated datasets are difficult and costly to collect, and rarely exist in languages other than English, making training QA systems in other languages challenging. An alternative to building large monolingual training datasets is to develop cross-lingual systems which can transfer to a target language without requiring training data in that language. In order to develop such systems, it is crucial to invest in high quality multilingual evaluation benchmarks to measure progress. We present MLQA, a multi-way aligned extractive QA evaluation benchmark intended to spur research in this area. MLQA contains QA instances in 7 languages, namely English, Arabic, German, Spanish, Hindi, Vietnamese and Simplified Chinese. It consists of over 12K QA instances in English and 5K in each other language, with each QA instance being parallel between 4 languages on average. MLQA is built using a novel alignment context strategy on Wikipedia articles, and serves as a cross-lingual extension to existing extractive QA datasets. We evaluate current state-of-the-art cross-lingual representations on MLQA, and also provide machine-translation-based baselines. In all cases, transfer results are shown to be significantly behind training-language performance.
Tasks Machine Translation, Question Answering
Published 2019-10-16
URL https://arxiv.org/abs/1910.07475v2
PDF https://arxiv.org/pdf/1910.07475v2.pdf
PWC https://paperswithcode.com/paper/mlqa-evaluating-cross-lingual-extractive
Repo https://github.com/facebookresearch/MLQA
Framework none
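
As background for how extractive QA benchmarks such as MLQA are scored, here is a minimal sketch of the token-level F1 metric used in the SQuAD family; the official repository ships its own multilingual evaluation tooling with language-specific normalisation, which this does not replace.

```python
# Sketch of the core extractive-QA metric (SQuAD-style token F1).
from collections import Counter

def f1_score(prediction: str, gold: str) -> float:
    pred_toks, gold_toks = prediction.lower().split(), gold.lower().split()
    common = Counter(pred_toks) & Counter(gold_toks)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_toks)
    recall = num_same / len(gold_toks)
    return 2 * precision * recall / (precision + recall)

print(f1_score("the Eiffel Tower", "Eiffel Tower"))  # ≈ 0.8
```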

Attentive Modality Hopping Mechanism for Speech Emotion Recognition

Title Attentive Modality Hopping Mechanism for Speech Emotion Recognition
Authors Seunghyun Yoon, Subhadeep Dey, Hwanhee Lee, Kyomin Jung
Abstract In this work, we explore the impact of the visual modality, in addition to speech and text, on improving the accuracy of an emotion detection system. Traditional approaches tackle this task by fusing knowledge from the various modalities independently before performing emotion classification. In contrast, we tackle the problem by introducing an attention mechanism to combine the information. We first apply a neural network to obtain hidden representations of the modalities. Then, an attention mechanism is defined to select and aggregate important parts of the video data by conditioning on the audio and text data. Furthermore, the attention mechanism is applied again to attend to important parts of the speech and textual data by conditioning on the other modalities. Experiments are performed on the standard IEMOCAP dataset using all three modalities (audio, text, and video). The results show a significant improvement of 3.65% in weighted accuracy compared to the baseline system.
Tasks Emotion Classification, Emotion Recognition, Multimodal Emotion Recognition, Speech Emotion Recognition
Published 2019-11-29
URL https://arxiv.org/abs/1912.00846v1
PDF https://arxiv.org/pdf/1912.00846v1.pdf
PWC https://paperswithcode.com/paper/attentive-modality-hopping-mechanism-for
Repo https://github.com/david-yoon/attentive-modality-hopping-for-SER
Framework tf
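
The following is a hedged sketch of a single cross-modal attention "hop" in the spirit of the abstract: video-frame features are aggregated with weights conditioned on audio and text summaries. Layer sizes, shapes, and the projection layer are illustrative assumptions, not the paper's configuration.

```python
# One "hop" of cross-modal attention: aggregate video-frame features with
# weights conditioned on audio and text summary vectors.
import torch
import torch.nn as nn
import torch.nn.functional as F

B, T, D = 8, 50, 128                      # batch, video frames, feature dim
video = torch.randn(B, T, D)              # per-frame video features
audio_summary = torch.randn(B, D)         # e.g. last RNN state over audio
text_summary = torch.randn(B, D)          # e.g. last RNN state over text

query_proj = nn.Linear(2 * D, D)
query = query_proj(torch.cat([audio_summary, text_summary], dim=-1))  # (B, D)

scores = torch.bmm(video, query.unsqueeze(-1)).squeeze(-1)            # (B, T)
weights = F.softmax(scores, dim=-1)
video_summary = torch.bmm(weights.unsqueeze(1), video).squeeze(1)     # (B, D)
# Subsequent hops would re-attend over audio and text, conditioned on the
# updated summaries of the other modalities.
```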

Temporal Localization of Moments in Video Collections with Natural Language

Title Temporal Localization of Moments in Video Collections with Natural Language
Authors Victor Escorcia, Mattia Soldan, Josef Sivic, Bernard Ghanem, Bryan Russell
Abstract In this paper, we introduce the task of retrieving relevant video moments from a large corpus of untrimmed, unsegmented videos given a natural language query. Our task poses unique challenges as a system must efficiently identify both the relevant videos and localize the relevant moments in the videos. This task is in contrast to prior work that localizes relevant moments in a single video or searches a large collection of already-segmented videos. For our task, we introduce Clip Alignment with Language (CAL), a model that aligns features for a natural language query to a sequence of short video clips that compose a candidate moment in a video. Our approach goes beyond prior work that aggregates video features over a candidate moment by allowing for finer clip alignment. Moreover, our approach is amenable to efficient indexing of the resulting clip-level representations, which makes it suitable for moment localization in large video collections. We evaluate our approach on three recently proposed datasets for temporal localization of moments in video with natural language extended to our video corpus moment retrieval setting: DiDeMo, Charades-STA, and ActivityNet-captions. We show that our CAL model outperforms the recently proposed Moment Context Network (MCN) on all criteria across all datasets on our proposed task, obtaining an 8%-85% and 11%-47% boost for average recall and median rank, respectively, and achieves 5x faster retrieval and 8x smaller index size with a 500K video corpus.
Tasks Temporal Localization
Published 2019-07-30
URL https://arxiv.org/abs/1907.12763v1
PDF https://arxiv.org/pdf/1907.12763v1.pdf
PWC https://paperswithcode.com/paper/temporal-localization-of-moments-in-video
Repo https://github.com/escorciav/moments-retrieval-page
Framework pytorch
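
To make the clip-alignment idea concrete, here is a small sketch that scores a candidate moment by comparing a query embedding against each of its clip embeddings and averaging the similarities; the embedding networks, dimensions, and training loss of CAL are not reproduced.

```python
# Score candidate moments by query-clip similarity and rank them.
import torch
import torch.nn.functional as F

def moment_score(query_emb: torch.Tensor, clip_embs: torch.Tensor) -> torch.Tensor:
    """query_emb: (D,); clip_embs: (num_clips, D) for one candidate moment."""
    sims = F.cosine_similarity(clip_embs, query_emb.unsqueeze(0), dim=-1)
    return sims.mean()

query = torch.randn(256)                                # language query embedding
candidates = [torch.randn(n, 256) for n in (3, 5, 2)]   # clip embeddings per moment
scores = torch.stack([moment_score(query, c) for c in candidates])
best = scores.argmax().item()                           # best-scoring moment
```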

An Unsupervised Autoregressive Model for Speech Representation Learning

Title An Unsupervised Autoregressive Model for Speech Representation Learning
Authors Yu-An Chung, Wei-Ning Hsu, Hao Tang, James Glass
Abstract This paper proposes a novel unsupervised autoregressive neural model for learning generic speech representations. In contrast to other speech representation learning methods that aim to remove noise or speaker variabilities, ours is designed to preserve information for a wide range of downstream tasks. In addition, the proposed model does not require any phonetic or word boundary labels, allowing the model to benefit from large quantities of unlabeled data. Speech representations learned by our model significantly improve performance on both phone classification and speaker verification over the surface features and other supervised and unsupervised approaches. Further analysis shows that different levels of speech information are captured by our model at different layers. In particular, the lower layers tend to be more discriminative for speakers, while the upper layers provide more phonetic content.
Tasks Representation Learning, Speaker Verification
Published 2019-04-05
URL https://arxiv.org/abs/1904.03240v2
PDF https://arxiv.org/pdf/1904.03240v2.pdf
PWC https://paperswithcode.com/paper/an-unsupervised-autoregressive-model-for
Repo https://github.com/samirsahoo007/Audio-and-Speech-Processing
Framework pytorch
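
A minimal sketch of autoregressive predictive training on speech features, in the spirit of the abstract: an RNN reads past frames and is trained to predict a frame several steps ahead with an L1 loss. The architecture sizes and the 3-frame shift are illustrative assumptions.

```python
# Autoregressive predictive training on speech features (illustrative sizes).
import torch
import torch.nn as nn

class APC(nn.Module):
    def __init__(self, feat_dim=80, hidden=512):
        super().__init__()
        self.rnn = nn.GRU(feat_dim, hidden, num_layers=3, batch_first=True)
        self.out = nn.Linear(hidden, feat_dim)

    def forward(self, x):                 # x: (batch, time, feat_dim)
        h, _ = self.rnn(x)
        return self.out(h), h             # per-step prediction, hidden states

model, shift = APC(), 3                   # predict 3 frames into the future
feats = torch.randn(4, 200, 80)           # e.g. log-mel spectrogram frames
pred, hidden = model(feats[:, :-shift])
loss = nn.functional.l1_loss(pred, feats[:, shift:])
loss.backward()
# `hidden` (or a chosen RNN layer) serves as the learned speech representation.
```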

TERMINATOR: Better Automated UI Test Case Prioritization

Title TERMINATOR: Better Automated UI Test Case Prioritization
Authors Zhe Yu, Fahmid M. Fahid, Tim Menzies, Gregg Rothermel, Kyle Patrick, Snehit Cherian
Abstract Automated UI testing is an important component of the continuous integration process of software development. A modern web-based UI is an amalgam of reports from dozens of microservices written by multiple teams. Queries on a page that opens up another will fail if any of that page’s microservices fails. As a result, the overall cost for automated UI testing is high since the UI elements cannot be tested in isolation. For example, the entire automated UI testing suite at LexisNexis takes around 30 hours (3-5 hours on the cloud) to execute, which slows down the continuous integration process. To mitigate this problem and give developers faster feedback on their code, test case prioritization (TCP) techniques are used to reorder the automated UI test cases so that more failures can be detected earlier. Given that much of the automated UI testing is “black box” in nature, very little information (only the test case descriptions and testing results) can be utilized to prioritize these automated UI test cases. Hence, this paper evaluates 17 “black box” test case prioritization approaches that do not rely on source code information. Among these, we propose a novel TCP approach that dynamically re-prioritizes the test cases when new failures are detected, by applying and adapting a state-of-the-art framework from the total recall problem. Experimental results on LexisNexis automated UI testing data show that our new approach, which we call TERMINATOR, outperforms prior state-of-the-art approaches in terms of failure detection rates with negligible CPU overhead.
Tasks
Published 2019-05-16
URL https://arxiv.org/abs/1905.07019v2
PDF https://arxiv.org/pdf/1905.07019v2.pdf
PWC https://paperswithcode.com/paper/terminator-better-automated-ui-test-case
Repo https://github.com/ai-se/Data-for-automated-UI-testing-from-LexisNexis
Framework none
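
To illustrate the kind of feedback-driven, black-box prioritisation the abstract describes (this is not the TERMINATOR implementation), the sketch below featurises test descriptions with TF-IDF and re-ranks the remaining tests whenever new pass/fail results arrive; the test data is made up.

```python
# Dynamic, feedback-driven test case prioritisation on synthetic data.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

descriptions = ["login page smoke test", "report export to pdf",
                "search filter by date", "report export to csv",
                "user profile settings"]
true_fail = np.array([0, 1, 0, 1, 0])      # unknown before execution

X = TfidfVectorizer().fit_transform(descriptions).toarray()
remaining = list(range(len(descriptions)))
executed, labels, order = [], [], []

while remaining:
    if len(set(labels)) > 1:               # both classes observed: re-prioritise
        clf = LogisticRegression().fit(X[executed], labels)
        remaining.sort(key=lambda i: -clf.predict_proba(X[i:i + 1])[0, 1])
    nxt = remaining.pop(0)                 # run the highest-ranked test next
    order.append(nxt)
    executed.append(nxt)
    labels.append(true_fail[nxt])

print("execution order:", order)
```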

A Learnable ScatterNet: Locally Invariant Convolutional Layers

Title A Learnable ScatterNet: Locally Invariant Convolutional Layers
Authors Fergal Cotter, Nick Kingsbury
Abstract In this paper we explore tying together the ideas from Scattering Transforms and Convolutional Neural Networks (CNN) for Image Analysis by proposing a learnable ScatterNet. Previous attempts at tying them together in hybrid networks have tended to keep the two parts separate, with the ScatterNet forming a fixed front end and a CNN forming a learned backend. We instead look at adding learning between scattering orders, as well as adding learned layers before the ScatterNet. We do this by breaking down the scattering orders into single convolutional-like layers we call ‘locally invariant’ layers, and adding a learned mixing term to this layer. Our experiments show that these locally invariant layers can improve accuracy when added to either a CNN or a ScatterNet. We also discover some surprising results in that the ScatterNet may be best positioned after one or more layers of learning rather than at the front of a neural network.
Tasks
Published 2019-03-07
URL http://arxiv.org/abs/1903.03137v1
PDF http://arxiv.org/pdf/1903.03137v1.pdf
PWC https://paperswithcode.com/paper/a-learnable-scatternet-locally-invariant
Repo https://github.com/fbcotter/scatnet_learn
Framework pytorch
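
A hedged sketch of a "locally invariant" layer in the spirit of the abstract: a fixed two-channel (real/imaginary) filter bank, a modulus nonlinearity, and a learned 1x1 mixing convolution. Real ScatterNets use dual-tree complex wavelets; fixed random filters stand in for them here, so this is illustrative only.

```python
# Fixed oriented filters + modulus + learned 1x1 mixing (toy stand-in).
import torch
import torch.nn as nn
import torch.nn.functional as F

class LocallyInvariantLayer(nn.Module):
    def __init__(self, in_ch, n_orient=6, out_ch=32):
        super().__init__()
        self.in_ch = in_ch
        # Fixed (non-learned) real/imaginary filter pairs per orientation.
        self.register_buffer("w_real", torch.randn(n_orient * in_ch, 1, 5, 5))
        self.register_buffer("w_imag", torch.randn(n_orient * in_ch, 1, 5, 5))
        self.mix = nn.Conv2d(in_ch + n_orient * in_ch, out_ch, kernel_size=1)

    def forward(self, x):
        real = F.conv2d(x, self.w_real, padding=2, groups=self.in_ch)
        imag = F.conv2d(x, self.w_imag, padding=2, groups=self.in_ch)
        mag = torch.sqrt(real**2 + imag**2 + 1e-8)     # modulus: local invariance
        lowpass = F.avg_pool2d(x, 3, stride=1, padding=1)
        return self.mix(torch.cat([lowpass, mag], dim=1))  # learned mixing term

layer = LocallyInvariantLayer(in_ch=3)
out = layer(torch.randn(2, 3, 32, 32))    # -> (2, 32, 32, 32)
```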

Deep Neural Networks Improve Radiologists’ Performance in Breast Cancer Screening

Title Deep Neural Networks Improve Radiologists’ Performance in Breast Cancer Screening
Authors Nan Wu, Jason Phang, Jungkyu Park, Yiqiu Shen, Zhe Huang, Masha Zorin, Stanisław Jastrzębski, Thibault Févry, Joe Katsnelson, Eric Kim, Stacey Wolfson, Ujas Parikh, Sushma Gaddam, Leng Leng Young Lin, Kara Ho, Joshua D. Weinstein, Beatriu Reig, Yiming Gao, Hildegard Toth, Kristine Pysarenko, Alana Lewin, Jiyon Lee, Krystal Airola, Eralda Mema, Stephanie Chung, Esther Hwang, Naziya Samreen, S. Gene Kim, Laura Heacock, Linda Moy, Kyunghyun Cho, Krzysztof J. Geras
Abstract We present a deep convolutional neural network for breast cancer screening exam classification, trained and evaluated on over 200,000 exams (over 1,000,000 images). Our network achieves an AUC of 0.895 in predicting whether there is a cancer in the breast, when tested on the screening population. We attribute the high accuracy of our model to a two-stage training procedure, which allows us to use a very high-capacity patch-level network to learn from pixel-level labels alongside a network learning from macroscopic breast-level labels. To validate our model, we conducted a reader study with 14 readers, each reading 720 screening mammogram exams, and find our model to be as accurate as experienced radiologists when presented with the same data. Finally, we show that a hybrid model, averaging probability of malignancy predicted by a radiologist with a prediction of our neural network, is more accurate than either of the two separately. To better understand our results, we conduct a thorough analysis of our network’s performance on different subpopulations of the screening population, model design, training procedure, errors, and properties of its internal representations.
Tasks Breast Cancer Detection
Published 2019-03-20
URL http://arxiv.org/abs/1903.08297v1
PDF http://arxiv.org/pdf/1903.08297v1.pdf
PWC https://paperswithcode.com/paper/deep-neural-networks-improve-radiologists
Repo https://github.com/nyukat/breast_cancer_classifier
Framework pytorch
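
The hybrid-model idea described in the abstract reduces to averaging probabilities, as in the sketch below; the labels and scores are synthetic placeholders, not study data.

```python
# Average radiologist and model malignancy probabilities, compare AUCs.
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
labels = rng.integers(0, 2, size=500)                          # 1 = malignant
model_prob = np.clip(labels * 0.6 + rng.normal(0.2, 0.25, 500), 0, 1)
reader_prob = np.clip(labels * 0.5 + rng.normal(0.25, 0.3, 500), 0, 1)
hybrid_prob = (model_prob + reader_prob) / 2

for name, p in [("model", model_prob), ("reader", reader_prob),
                ("hybrid", hybrid_prob)]:
    print(name, round(roc_auc_score(labels, p), 3))
```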

HigherHRNet: Scale-Aware Representation Learning for Bottom-Up Human Pose Estimation

Title HigherHRNet: Scale-Aware Representation Learning for Bottom-Up Human Pose Estimation
Authors Bowen Cheng, Bin Xiao, Jingdong Wang, Honghui Shi, Thomas S. Huang, Lei Zhang
Abstract Bottom-up human pose estimation methods have difficulties in predicting the correct pose for small persons due to challenges in scale variation. In this paper, we present HigherHRNet: a novel bottom-up human pose estimation method for learning scale-aware representations using high-resolution feature pyramids. Equipped with multi-resolution supervision for training and multi-resolution aggregation for inference, the proposed approach is able to solve the scale variation challenge in bottom-up multi-person pose estimation and localize keypoints more precisely, especially for small persons. The feature pyramid in HigherHRNet consists of feature map outputs from HRNet and upsampled higher-resolution outputs through a transposed convolution. HigherHRNet outperforms the previous best bottom-up method by 2.5% AP for medium persons on COCO test-dev, showing its effectiveness in handling scale variation. Furthermore, HigherHRNet achieves a new state-of-the-art result on COCO test-dev (70.5% AP) without using refinement or other post-processing techniques, surpassing all existing bottom-up methods. HigherHRNet even surpasses all top-down methods on CrowdPose test (67.6% AP), suggesting its robustness in crowded scenes. The code and models are available at https://github.com/HRNet/Higher-HRNet-Human-Pose-Estimation.
Tasks Multi-Person Pose Estimation, Pose Estimation, Pose Prediction, Representation Learning
Published 2019-08-27
URL https://arxiv.org/abs/1908.10357v3
PDF https://arxiv.org/pdf/1908.10357v3.pdf
PWC https://paperswithcode.com/paper/bottom-up-higher-resolution-networks-for
Repo https://github.com/leoxiaobin/deep-high-resolution-net.pytorch
Framework pytorch
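
A small sketch of the inference-time aggregation step described in the abstract: upsample lower-resolution keypoint heatmaps to the highest resolution and average them. Grouping keypoints into persons is a separate step not shown.

```python
# Multi-resolution heatmap aggregation at inference time (illustrative shapes).
import torch
import torch.nn.functional as F

num_joints = 17
heatmap_lo = torch.rand(1, num_joints, 64, 64)     # lower-resolution head output
heatmap_hi = torch.rand(1, num_joints, 128, 128)   # higher-resolution head output

heatmap_lo_up = F.interpolate(heatmap_lo, size=(128, 128),
                              mode="bilinear", align_corners=False)
aggregated = (heatmap_lo_up + heatmap_hi) / 2      # (1, 17, 128, 128)

# Peak location per joint (person grouping is a separate step).
flat = aggregated.flatten(2)
peaks = flat.argmax(dim=-1)
ys, xs = peaks // 128, peaks % 128
```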

Tsallis Reinforcement Learning: A Unified Framework for Maximum Entropy Reinforcement Learning

Title Tsallis Reinforcement Learning: A Unified Framework for Maximum Entropy Reinforcement Learning
Authors Kyungjae Lee, Sungyub Kim, Sungbin Lim, Sungjoon Choi, Songhwai Oh
Abstract In this paper, we present a new class of Markov decision processes (MDPs), called Tsallis MDPs, with Tsallis entropy maximization, which generalizes existing maximum entropy reinforcement learning (RL). A Tsallis MDP provides a unified framework for the original RL problem and RL with various types of entropy, including the well-known standard Shannon-Gibbs (SG) entropy, using an additional real-valued parameter, called an entropic index. By controlling the entropic index, we can generate various types of entropy, including the SG entropy, and a different entropy results in a different class of optimal policies in Tsallis MDPs. We also provide a full mathematical analysis of Tsallis MDPs, including the optimality condition, performance error bounds, and convergence. Our theoretical result enables us to use any positive entropic index in RL. To handle complex and large-scale problems, we propose a model-free actor-critic RL method using Tsallis entropy maximization. We evaluate the regularization effect of the Tsallis entropy with various values of entropic indices and show that the entropic index controls the exploration tendency of the proposed method. For different types of RL problems, we find that different values of the entropic index are desirable. The proposed method is evaluated using the MuJoCo simulator and achieves state-of-the-art performance.
Tasks
Published 2019-01-31
URL http://arxiv.org/abs/1902.00137v2
PDF http://arxiv.org/pdf/1902.00137v2.pdf
PWC https://paperswithcode.com/paper/tsallis-reinforcement-learning-a-unified
Repo https://github.com/rllab-snu/Reinforcement-Learning
Framework none
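
For intuition about the entropic index, the sketch below evaluates one common parameterisation of the Tsallis entropy of a discrete policy for several values of q; the paper's exact definition and scaling may differ, and q → 1 recovers the Shannon-Gibbs entropy.

```python
# Tsallis entropy of a discrete policy for different entropic indices q.
import numpy as np

def tsallis_entropy(p: np.ndarray, q: float) -> float:
    p = p / p.sum()
    if abs(q - 1.0) < 1e-8:                  # limit case: Shannon-Gibbs entropy
        return float(-(p * np.log(p)).sum())
    return float((1.0 - (p**q).sum()) / (q - 1.0))

policy = np.array([0.5, 0.3, 0.15, 0.05])
for q in (0.5, 1.0, 2.0):
    print(f"q={q}: {tsallis_entropy(policy, q):.4f}")
# Different q values reward stochastic policies to different degrees, i.e. they
# change how much exploration the entropy-regularised objective encourages.
```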

Spatio-Temporal Filter Adaptive Network for Video Deblurring

Title Spatio-Temporal Filter Adaptive Network for Video Deblurring
Authors Shangchen Zhou, Jiawei Zhang, Jinshan Pan, Haozhe Xie, Wangmeng Zuo, Jimmy Ren
Abstract Video deblurring is a challenging task due to the spatially variant blur caused by camera shake, object motion, and depth variations. Existing methods usually estimate optical flow in the blurry video to align consecutive frames or approximate blur kernels. However, they tend to generate artifacts or cannot effectively remove blur when the estimated optical flow is not accurate. To overcome the limitation of separate optical flow estimation, we propose a Spatio-Temporal Filter Adaptive Network (STFAN) that performs alignment and deblurring in a unified framework. The proposed STFAN takes the blurry and restored images of the previous frame as well as the blurry image of the current frame as input, and dynamically generates spatially adaptive filters for alignment and deblurring. We then propose a new Filter Adaptive Convolutional (FAC) layer to align the deblurred features of the previous frame with the current frame and remove the spatially variant blur from the features of the current frame. Finally, we develop a reconstruction network which takes the fusion of the two transformed features to restore clear frames. Both quantitative and qualitative evaluation results on benchmark datasets and real-world videos demonstrate that the proposed algorithm performs favorably against state-of-the-art methods in terms of accuracy, speed, and model size.
Tasks Deblurring, Optical Flow Estimation
Published 2019-04-28
URL https://arxiv.org/abs/1904.12257v2
PDF https://arxiv.org/pdf/1904.12257v2.pdf
PWC https://paperswithcode.com/paper/spatio-temporal-filter-adaptive-network-for
Repo https://github.com/sczhou/STFAN
Framework pytorch
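
A sketch of the core filter-adaptive convolution operation: a different k × k kernel is applied at every spatial location, with the kernels predicted by another network. Sharing one kernel across channels here is a simplification; the paper's FAC layer is also used for feature alignment.

```python
# Apply a per-pixel predicted kernel to a feature map (simplified FAC-style op).
import torch
import torch.nn.functional as F

def filter_adaptive_conv(feat, filters, k=5):
    """feat: (B, C, H, W); filters: (B, k*k, H, W), one kernel per pixel
    (shared across channels for simplicity)."""
    B, C, H, W = feat.shape
    patches = F.unfold(feat, kernel_size=k, padding=k // 2)     # (B, C*k*k, H*W)
    patches = patches.view(B, C, k * k, H * W)
    weights = filters.view(B, 1, k * k, H * W)                  # broadcast over C
    return (patches * weights).sum(dim=2).view(B, C, H, W)

feat = torch.randn(2, 32, 64, 64)
filters = torch.softmax(torch.randn(2, 25, 64, 64), dim=1)      # predicted kernels
out = filter_adaptive_conv(feat, filters)                       # (2, 32, 64, 64)
```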

Unsupervised Scalable Representation Learning for Multivariate Time Series

Title Unsupervised Scalable Representation Learning for Multivariate Time Series
Authors Jean-Yves Franceschi, Aymeric Dieuleveut, Martin Jaggi
Abstract Time series constitute a challenging data type for machine learning algorithms, due to their highly variable lengths and sparse labeling in practice. In this paper, we tackle this challenge by proposing an unsupervised method to learn universal embeddings of time series. Unlike previous works, it is scalable with respect to their length and we demonstrate the quality, transferability and practicability of the learned representations with thorough experiments and comparisons. To this end, we combine an encoder based on causal dilated convolutions with a novel triplet loss employing time-based negative sampling, obtaining general-purpose representations for variable length and multivariate time series.
Tasks Representation Learning, Time Series
Published 2019-01-30
URL https://arxiv.org/abs/1901.10738v4
PDF https://arxiv.org/pdf/1901.10738v4.pdf
PWC https://paperswithcode.com/paper/unsupervised-scalable-representation-learning
Repo https://github.com/White-Link/UnsupervisedScalableRepresentationLearningTimeSeries
Framework pytorch
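
A sketch of the training signal described in the abstract: a causal dilated-convolution encoder and a triplet-style logistic loss whose positive is a sub-segment of the same series and whose negatives come from other series. Window positions and hyperparameters are illustrative.

```python
# Causal dilated-conv encoder + time-based triplet loss (illustrative settings).
import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalEncoder(nn.Module):
    def __init__(self, in_ch=1, hidden=40, out_dim=160):
        super().__init__()
        self.convs = nn.ModuleList([
            nn.Conv1d(in_ch if i == 0 else hidden, hidden, 3, dilation=2**i)
            for i in range(3)])
        self.head = nn.Linear(hidden, out_dim)

    def forward(self, x):                       # x: (B, C, T), variable T
        for i, conv in enumerate(self.convs):
            x = F.relu(conv(F.pad(x, (2 * 2**i, 0))))   # left-pad: causal
        return self.head(x.max(dim=-1).values)          # max-pool over time

enc = CausalEncoder()
series = torch.randn(8, 1, 300)                 # batch of time series
anchor = enc(series[:, :, 50:250])              # a window of each series
positive = enc(series[:, :, 100:200])           # sub-window of the anchor
negative = enc(torch.roll(series, 1, dims=0)[:, :, 50:250])  # other series
loss = -F.logsigmoid((anchor * positive).sum(-1)).mean() \
       - F.logsigmoid(-(anchor * negative).sum(-1)).mean()
loss.backward()
```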

Reweighted Expectation Maximization

Title Reweighted Expectation Maximization
Authors Adji B. Dieng, John Paisley
Abstract Training deep generative models with maximum likelihood remains a challenge. The typical workaround is to use variational inference (VI) and maximize a lower bound to the log marginal likelihood of the data. Variational auto-encoders (VAEs) adopt this approach. They further amortize the cost of inference by using a recognition network to parameterize the variational family. Amortized VI scales approximate posterior inference in deep generative models to large datasets. However it introduces an amortization gap and leads to approximate posteriors of reduced expressivity due to the problem known as posterior collapse. In this paper, we consider expectation maximization (EM) as a paradigm for fitting deep generative models. Unlike VI, EM directly maximizes the log marginal likelihood of the data. We rediscover the importance weighted auto-encoder (IWAE) as an instance of EM and propose a new EM-based algorithm for fitting deep generative models called reweighted expectation maximization (REM). REM learns better generative models than the IWAE by decoupling the learning dynamics of the generative model and the recognition network using a separate expressive proposal found by moment matching. We compared REM to the VAE and the IWAE on several density estimation benchmarks and found it leads to significantly better performance as measured by log-likelihood.
Tasks Bayesian Inference, Density Estimation
Published 2019-06-13
URL https://arxiv.org/abs/1906.05850v2
PDF https://arxiv.org/pdf/1906.05850v2.pdf
PWC https://paperswithcode.com/paper/reweighted-expectation-maximization
Repo https://github.com/adjidieng/REM
Framework pytorch
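
For reference, the importance weighted bound that the paper reinterprets through an EM lens can be computed as below; the log densities here are toy placeholders rather than outputs of an actual encoder and decoder.

```python
# Importance weighted bound: average K importance weights inside the log.
import math
import torch

def iwae_bound(log_p_xz, log_q_z):
    """log_p_xz, log_q_z: (K, batch) log joint and log proposal densities
    evaluated at K samples z_k ~ q(z|x)."""
    K = log_p_xz.shape[0]
    log_w = log_p_xz - log_q_z
    return (torch.logsumexp(log_w, dim=0) - math.log(K)).mean()

K, B = 16, 32
log_p_xz = torch.randn(K, B) - 1.0     # placeholder log p(x, z_k)
log_q_z = torch.randn(K, B)            # placeholder log q(z_k | x)
print(iwae_bound(log_p_xz, log_q_z))
# With K = 1 this reduces to the standard ELBO maximised by the VAE.
```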

Region Deformer Networks for Unsupervised Depth Estimation from Unconstrained Monocular Videos

Title Region Deformer Networks for Unsupervised Depth Estimation from Unconstrained Monocular Videos
Authors Haofei Xu, Jianmin Zheng, Jianfei Cai, Juyong Zhang
Abstract While learning-based depth estimation from images/videos has achieved substantial progress, there still exist intrinsic limitations. Supervised methods are limited by the small amount of ground truth or labeled data, and unsupervised methods for monocular videos are mostly based on the static-scene assumption and do not perform well in real-world scenarios with dynamic objects. In this paper, we propose a new learning-based method consisting of DepthNet, PoseNet and Region Deformer Networks (RDN) to estimate depth from unconstrained monocular videos without ground truth supervision. The core contribution lies in RDN for proper handling of the rigid and non-rigid motions of various objects, such as rigidly moving cars and deformable humans. In particular, a deformation-based motion representation is proposed to model individual object motion on 2D images. This representation enables our method to be applicable to diverse unconstrained monocular videos. Our method not only achieves state-of-the-art results on the standard benchmarks KITTI and Cityscapes, but also shows promising results on a crowded pedestrian tracking dataset, which demonstrates the effectiveness of the deformation-based motion representation. Code and trained models are available at https://github.com/haofeixu/rdn4depth.
Tasks Depth Estimation
Published 2019-02-26
URL https://arxiv.org/abs/1902.09907v2
PDF https://arxiv.org/pdf/1902.09907v2.pdf
PWC https://paperswithcode.com/paper/region-deformer-networks-for-unsupervised
Repo https://github.com/haofeixu/rdn4depth
Framework tf
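
A heavily hedged sketch of the general idea of a deformation-based 2D motion representation: per-object pixel offsets restricted to an instance mask, realised as a grid-sampling warp. The paper's RDN parameterises and learns the deformation differently; the shapes, mask, and offsets below are illustrative.

```python
# Warp an image with a per-object 2D deformation field (generic illustration).
import torch
import torch.nn.functional as F

B, H, W = 1, 64, 64
image = torch.rand(B, 3, H, W)
mask = torch.zeros(B, 1, H, W)
mask[:, :, 20:40, 20:40] = 1.0                                    # one object
offsets = torch.zeros(B, 2, H, W)
offsets[:, 0] = 0.1                      # horizontal offset (normalised coords)

# Base sampling grid in [-1, 1] coordinates, as expected by grid_sample.
ys, xs = torch.meshgrid(torch.linspace(-1, 1, H),
                        torch.linspace(-1, 1, W), indexing="ij")
base = torch.stack([xs, ys], dim=-1).unsqueeze(0)                 # (1, H, W, 2)

flow = (offsets * mask).permute(0, 2, 3, 1)                       # zero outside object
warped = F.grid_sample(image, base + flow, align_corners=False)
```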