January 29, 2020

Paper Group ANR 578

Taming Momentum in a Distributed Asynchronous Environment


Title	Taming Momentum in a Distributed Asynchronous Environment
Authors	Ido Hakimi, Saar Barkai, Moshe Gabel, Assaf Schuster
Abstract	Although distributed computing can significantly reduce the training time of deep neural networks, scaling the training process while maintaining high efficiency and final accuracy is challenging. Distributed asynchronous training enjoys near-linear speedup, but asynchrony causes gradient staleness, the main difficulty in scaling stochastic gradient descent to large clusters. Momentum, which is often used to accelerate convergence and escape local minima, exacerbates the gradient staleness, thereby hindering convergence. We propose DANA: a novel asynchronous distributed technique which is based on a new gradient staleness measure that we call the gap. By minimizing the gap, DANA mitigates the gradient staleness, despite using momentum, and therefore scales to large clusters while maintaining high final accuracy and fast convergence. DANA adapts Nesterov’s Accelerated Gradient to a distributed setting, computing the gradient on an estimated future position of the model’s parameters. In turn, we show that DANA’s estimation of the future position amplifies the use of a Taylor expansion, which relies on a fast Hessian approximation, making it much more effective and accurate. Our evaluation on the CIFAR and ImageNet datasets shows that DANA outperforms existing methods, in both final accuracy and convergence speed.
Tasks
Published	2019-07-26
URL	https://arxiv.org/abs/1907.11612v1
PDF	https://arxiv.org/pdf/1907.11612v1.pdf
PWC	https://paperswithcode.com/paper/taming-momentum-in-a-distributed-asynchronous
Repo
Framework
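
The abstract says DANA builds on Nesterov's Accelerated Gradient, which computes the gradient at an estimated future position of the parameters. The sketch below shows only that single-worker building block on a toy quadratic. It does not reproduce DANA's distributed estimation or its gap measure; the function names, the learning rate, and the momentum value are illustrative assumptions.

```python
def nesterov_step(theta, velocity, grad_fn, lr=0.1, momentum=0.9):
    """One Nesterov Accelerated Gradient step on a list of floats."""
    # Estimated future position: where momentum alone would carry the parameters.
    lookahead = [t + momentum * v for t, v in zip(theta, velocity)]
    # The gradient is evaluated at the lookahead point, not at theta itself.
    grad = grad_fn(lookahead)
    new_velocity = [momentum * v - lr * g for v, g in zip(velocity, grad)]
    new_theta = [t + v for t, v in zip(theta, new_velocity)]
    return new_theta, new_velocity


def quadratic_grad(theta):
    # Gradient of the toy objective f(theta) = 0.5 * sum(theta_i ** 2).
    return list(theta)


theta, velocity = [5.0, -3.0], [0.0, 0.0]
for _ in range(200):
    theta, velocity = nesterov_step(theta, velocity, quadratic_grad)
```

In an asynchronous setting each worker would compute `grad` on parameters that may already be stale by the time the update is applied, which is the difficulty the abstract describes.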

End-to-End Visual Speech Recognition for Small-Scale Datasets


Title	End-to-End Visual Speech Recognition for Small-Scale Datasets
Authors	Stavros Petridis, Yujiang Wang, Pingchuan Ma, Zuwei Li, Maja Pantic
Abstract	Visual speech recognition models traditionally consist of two stages, feature extraction and classification. Several deep learning approaches have been recently presented aiming to replace the feature extraction stage by automatically extracting features from mouth images. However, research on joint learning of features and classification remains limited. In addition, most of the existing methods require large amounts of data in order to achieve state-of-the-art performance; otherwise they under-perform. In this work, we present an end-to-end visual speech recognition system based on fully-connected layers and Long Short-Term Memory (LSTM) networks which is suitable for small-scale datasets. The model consists of two streams which extract features directly from the mouth and difference images, respectively. The temporal dynamics in each stream are modelled by a Bidirectional LSTM (BLSTM) and the fusion of the two streams takes place via another BLSTM. An absolute improvement of 0.6%, 3.4%, 3.9%, 11.4% over the state-of-the-art is reported on the OuluVS2, CUAVE, AVLetters and AVLetters2 databases, respectively.
Tasks	Speech Recognition, Visual Speech Recognition
Published	2019-04-02
URL	https://arxiv.org/abs/1904.01954v4
PDF	https://arxiv.org/pdf/1904.01954v4.pdf
PWC	https://paperswithcode.com/paper/end-to-end-visual-speech-recognition-for
Repo
Framework

Radial and Directional Posteriors for Bayesian Neural Networks


Title	Radial and Directional Posteriors for Bayesian Neural Networks
Authors	Changyong Oh, Kamil Adamczewski, Mijung Park
Abstract	We propose a new variational family for Bayesian neural networks. We decompose the variational posterior into two components, where the radial component captures the strength of each neuron in terms of its magnitude; while the directional component captures the statistical dependencies among the weight parameters. The dependencies learned via the directional density provide better modeling performance compared to the widely-used Gaussian mean-field-type variational family. In addition, the strength of input and output neurons learned via the radial density provides a structured way to compress neural networks. Indeed, experiments show that our variational family improves predictive performance and yields compressed networks simultaneously.
Tasks
Published	2019-02-07
URL	http://arxiv.org/abs/1902.02603v2
PDF	http://arxiv.org/pdf/1902.02603v2.pdf
PWC	https://paperswithcode.com/paper/radial-and-directional-posteriors-for
Repo
Framework

Discretization based Solutions for Secure Machine Learning against Adversarial Attacks


Title	Discretization based Solutions for Secure Machine Learning against Adversarial Attacks
Authors	Priyadarshini Panda, Indranil Chakraborty, Kaushik Roy
Abstract	Adversarial examples are perturbed inputs that are designed (from a deep learning network’s (DLN) parameter gradients) to mislead the DLN during test time. Intuitively, constraining the dimensionality of inputs or parameters of a network reduces the ‘space’ in which adversarial examples exist. Guided by this intuition, we demonstrate that discretization greatly improves the robustness of DLNs against adversarial attacks. Specifically, discretizing the input space (reducing the allowed pixel levels from 256 values, or 8-bit, to 4 values, or 2-bit) extensively improves the adversarial robustness of DLNs for a substantial range of perturbations with minimal loss in test accuracy. Furthermore, we find that Binary Neural Networks (BNNs) and related variants are intrinsically more robust than their full precision counterparts in adversarial scenarios. Combining input discretization with BNNs further improves robustness, even waiving the need for adversarial training for certain perturbation magnitudes. We evaluate the effect of discretization on the MNIST, CIFAR10, CIFAR100 and ImageNet datasets. Across all datasets, we observe maximal adversarial resistance with 2-bit input discretization, which incurs an adversarial accuracy loss of just ~1-2% as compared to clean test accuracy.
Tasks
Published	2019-02-08
URL	http://arxiv.org/abs/1902.03151v2
PDF	http://arxiv.org/pdf/1902.03151v2.pdf
PWC	https://paperswithcode.com/paper/discretization-based-solutions-for-secure
Repo
Framework
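
To make the 8-bit to 2-bit reduction concrete, here is a minimal sketch of input discretization. It assumes uniform binning with evenly spaced output levels; the abstract does not specify the quantization scheme, so the paper's scheme may differ. The function name `discretize_pixel` is hypothetical.

```python
def discretize_pixel(value, bits=2):
    """Map an 8-bit pixel (0-255) to one of 2**bits evenly spaced levels."""
    if not 0 <= value <= 255:
        raise ValueError("expected an 8-bit pixel value")
    levels = 2 ** bits            # 4 levels for 2-bit input
    bin_width = 256 // levels     # 64 intensity values share one level
    index = value // bin_width    # which bin the pixel falls into
    # Spread the bin indices evenly over the original 0..255 range.
    return index * 255 // (levels - 1)


row = [0, 63, 64, 130, 255]
quantized = [discretize_pixel(p) for p in row]  # [0, 0, 85, 170, 255]
```

A perturbation that keeps a pixel inside its bin (for example, moving 130 to 140) leaves the discretized input unchanged, which is the intuition behind the reduced adversarial 'space'.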

Pure Exploration with Multiple Correct Answers


Title	Pure Exploration with Multiple Correct Answers
Authors	Rémy Degenne, Wouter M. Koolen
Abstract	We determine the sample complexity of pure exploration bandit problems with multiple good answers. We derive a lower bound using a new game equilibrium argument. We show how continuity and convexity properties of single-answer problems ensure that the Track-and-Stop algorithm has asymptotically optimal sample complexity. However, that convexity is lost when going to the multiple-answer setting. We present a new algorithm which extends Track-and-Stop to the multiple-answer case and has asymptotic sample complexity matching the lower bound.
Tasks
Published	2019-02-09
URL	http://arxiv.org/abs/1902.03475v1
PDF	http://arxiv.org/pdf/1902.03475v1.pdf
PWC	https://paperswithcode.com/paper/pure-exploration-with-multiple-correct
Repo
Framework

A Ranking Model Motivated by Nonnegative Matrix Factorization with Applications to Tennis Tournaments


Title	A Ranking Model Motivated by Nonnegative Matrix Factorization with Applications to Tennis Tournaments
Authors	Rui Xia, Vincent Y. F. Tan, Louis Filstroff, Cédric Févotte
Abstract	We propose a novel ranking model that combines the Bradley-Terry-Luce probability model with a nonnegative matrix factorization framework to model and uncover the presence of latent variables that influence the performance of top tennis players. We derive an efficient, provably convergent, and numerically stable majorization-minimization-based algorithm to maximize the likelihood of datasets under the proposed statistical model. The model is tested on datasets involving the outcomes of matches between 20 top male and female tennis players over 14 major tournaments for men (including the Grand Slams and the ATP Masters 1000) and 16 major tournaments for women over the past 10 years. Our model automatically infers that the surface of the court (e.g., clay or hard court) is a key determinant of the performances of male players, but less so for females. Top players on various surfaces over this longitudinal period are also identified in an objective manner.
Tasks
Published	2019-03-15
URL	https://arxiv.org/abs/1903.06500v2
PDF	https://arxiv.org/pdf/1903.06500v2.pdf
PWC	https://paperswithcode.com/paper/a-ranking-model-motivated-by-nonnegative
Repo
Framework
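
For orientation, the sketch below fits the plain Bradley-Terry-Luce model with the classic majorization-minimization update: each skill becomes total wins divided by the sum of games played over summed skills. It is not the paper's NMF-based hybrid model or its algorithm. The win counts are invented for illustration and the function names are assumptions.

```python
def btl_win_probability(skill_i, skill_j):
    """BTL probability that player i beats player j."""
    return skill_i / (skill_i + skill_j)


def fit_btl(wins, iterations=200):
    """Fit BTL skills, where wins[i][j] is how often player i beat player j."""
    n = len(wins)
    skills = [1.0] * n
    for _ in range(iterations):
        updated = []
        for i in range(n):
            total_wins = sum(wins[i][j] for j in range(n) if j != i)
            # Games between i and j, weighted by the current skill sum.
            denom = sum(
                (wins[i][j] + wins[j][i]) / (skills[i] + skills[j])
                for j in range(n) if j != i
            )
            updated.append(total_wins / denom)
        norm = sum(updated)
        skills = [s / norm for s in updated]  # fix the scale: skills sum to 1
    return skills


# Hypothetical head-to-head counts for three players.
wins = [
    [0, 3, 4],
    [1, 0, 3],
    [1, 2, 0],
]
skills = fit_btl(wins)
```

The fit needs every player to have at least one win and the comparison graph to be connected; otherwise the maximum-likelihood skills are not finite.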

Refine and Distill: Exploiting Cycle-Inconsistency and Knowledge Distillation for Unsupervised Monocular Depth Estimation


Title	Refine and Distill: Exploiting Cycle-Inconsistency and Knowledge Distillation for Unsupervised Monocular Depth Estimation
Authors	Andrea Pilzer, Stéphane Lathuilière, Nicu Sebe, Elisa Ricci
Abstract	Nowadays, the majority of state-of-the-art monocular depth estimation techniques are based on supervised deep learning models. However, collecting RGB images with associated depth maps is a very time-consuming procedure. Therefore, recent works have proposed deep architectures for addressing the monocular depth prediction task as a reconstruction problem, thus avoiding the need to collect ground-truth depth. Following these works, we propose a novel self-supervised deep model for estimating depth maps. Our framework exploits two main strategies: refinement via cycle-inconsistency and distillation. Specifically, first a student network is trained to predict a disparity map so as to recover, from a frame in a camera view, the associated image in the opposite view. Then, a backward cycle network is applied to the generated image to re-synthesize the input image, estimating the opposite disparity. A third network exploits the inconsistency between the original and the reconstructed input frame in order to output a refined depth map. Finally, knowledge distillation is exploited so as to transfer information from the refinement network to the student. Our extensive experimental evaluation demonstrates the effectiveness of the proposed framework, which outperforms state-of-the-art unsupervised methods on the KITTI benchmark.
Tasks	Depth Estimation, Monocular Depth Estimation
Published	2019-03-11
URL	http://arxiv.org/abs/1903.04202v2
PDF	http://arxiv.org/pdf/1903.04202v2.pdf
PWC	https://paperswithcode.com/paper/refine-and-distill-exploiting-cycle
Repo
Framework
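
The abstract moves between disparity maps and depth maps. For a rectified stereo pair the two are linked by the standard relation depth = focal length × baseline / disparity. The snippet below is a small worked example of that relation only; the camera values are illustrative assumptions, not figures from the paper.

```python
def disparity_to_depth(disparity_px, focal_px, baseline_m):
    """Convert a stereo disparity (pixels) to metric depth (metres)."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px


# Assumed camera: 720 px focal length, 0.54 m baseline.
depth_near = disparity_to_depth(50.0, focal_px=720.0, baseline_m=0.54)
depth_far = disparity_to_depth(5.0, focal_px=720.0, baseline_m=0.54)
```

Depth is inversely proportional to disparity, so a small disparity error on a distant point translates into a large depth error.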

Online Structured Sparsity-based Moving Object Detection from Satellite Videos


Title	Online Structured Sparsity-based Moving Object Detection from Satellite Videos
Authors	Junpeng Zhang, Xiuping Jia, Jiankun Hu, Jocelyn Chanussot
Abstract	Inspired by recent developments in computer vision, low-rank and structured sparse matrix decomposition can potentially be used to extract moving objects in satellite videos. This set of approaches seeks rank minimization on the background, which typically requires batch-based optimization over a sequence of frames; this causes delays in processing and limits their applications. To remedy this delay, we propose an Online Low-rank and Structured Sparse Decomposition (O-LSD). O-LSD reformulates the batch-based low-rank matrix decomposition with the structured sparse penalty to its equivalent frame-wise separable counterpart, which then defines a stochastic optimization problem for online subspace basis estimation. In order to promote online processing, O-LSD alternates between foreground and background separation and the subspace basis update for every frame in a video. We also show the convergence of O-LSD theoretically. Experimental results on two satellite videos demonstrate that the performance of O-LSD in terms of accuracy and time consumption is comparable with the batch-based approaches, with significantly reduced delay in processing.
Tasks	Object Detection, Stochastic Optimization
Published	2019-11-29
URL	https://arxiv.org/abs/1911.12989v3
PDF	https://arxiv.org/pdf/1911.12989v3.pdf
PWC	https://paperswithcode.com/paper/online-structured-sparsity-based-moving
Repo
Framework

Image Recognition using Region Creep


Title	Image Recognition using Region Creep
Authors	Kieran Greer
Abstract	This paper describes a new type of image classifier that uses a shallow architecture with a very quick learning phase. The image is parsed into smaller areas and each area is saved directly for a region, along with the related output category. When a new image is presented, a direct match with each part is made and the best matching areas returned. These areas can overlap with each other and when moving from a region to its neighbours, there are likely to be only small changes in the area image part. It would therefore be possible to guess what the best image part is for one region by cumulating the results of its neighbours. This is in fact an associative feature of the classifier that can re-construct missing or noisy input by substituting the direct match with what the region match suggests, and is called ‘Region Creep’. As each area stores the categories it belongs to, the image classification process sums these to return a preferred category for the whole image. The classifier works mostly at a local level and so to give it some type of global picture, rules are added. These rules work at the whole image level and basically state that if one set of pixels is present, another set should be removed or should also be present. While the rules appear to be very specific, most of the construction can be done automatically. Tests on a set of hand-written numbers have produced state-of-the-art results.
Tasks	Image Classification
Published	2019-09-24
URL	https://arxiv.org/abs/1909.10811v1
PDF	https://arxiv.org/pdf/1909.10811v1.pdf
PWC	https://paperswithcode.com/paper/image-recognition-using-region-creep
Repo
Framework

A Unified Formulation for Visual Odometry


Title	A Unified Formulation for Visual Odometry
Authors	Georges Younes, Daniel Asmar, John Zelek
Abstract	Monocular Odometry systems can be broadly categorized as being either Direct, Indirect, or a hybrid of both. While Indirect systems process an alternative image representation to compute geometric residuals, Direct methods process the image pixels directly to generate photometric residuals. Both paradigms have distinct but often complementary properties. This paper presents a Unified Formulation for Visual Odometry, referred to as UFVO, with the following key contributions: (1) a tight coupling of photometric (Direct) and geometric (Indirect) measurements using a joint multi-objective optimization, (2) the use of a utility function as a decision maker that incorporates prior knowledge on both paradigms, (3) descriptor sharing, where a feature can have more than one type of descriptor and its different descriptors are used for tracking and mapping, (4) the depth estimation of both corner features and pixel features within the same map using an inverse depth parametrization, and (5) a corner and pixel selection strategy that extracts both types of information, while promoting a uniform distribution over the image domain. Experiments show that our proposed system can handle large inter-frame motions, inherits the sub-pixel accuracy of direct methods, can run efficiently in real-time, and can generate an Indirect map representation at a marginal computational cost when compared to traditional Indirect systems, all while outperforming the state of the art in Direct, Indirect and hybrid systems.
Tasks	Depth Estimation, Visual Odometry
Published	2019-03-11
URL	http://arxiv.org/abs/1903.04253v1
PDF	http://arxiv.org/pdf/1903.04253v1.pdf
PWC	https://paperswithcode.com/paper/a-unified-formulation-for-visual-odometry
Repo
Framework

Dense Depth Estimation of a Complex Dynamic Scene without Explicit 3D Motion Estimation


Title	Dense Depth Estimation of a Complex Dynamic Scene without Explicit 3D Motion Estimation
Authors	Suryansh Kumar, Ram Srivatsav Ghorakavi, Yuchao Dai, Hongdong Li
Abstract	Recent geometric methods need reliable estimates of 3D motion parameters to procure an accurate dense depth map of a complex dynamic scene from monocular images (Kumar et al., 2017; Ranftl et al., 2016). Generally, estimating precise measurements of relative 3D motion parameters and validating their accuracy using image data is a challenging task. In this work, we propose an alternative approach that circumvents the 3D motion estimation requirement to obtain a dense depth map of a dynamic scene. Given per-pixel optical flow correspondences between two consecutive frames and the sparse depth prior for the reference frame, we show that we can effectively recover the dense depth map for the successive frames without solving for 3D motion parameters. Our method assumes a piece-wise planar model of a dynamic scene, which undergoes rigid transformation locally, and as-rigid-as-possible transformation globally between two successive frames. Under our assumption, we can avoid the explicit estimation of 3D rotation and translation to estimate scene depth. In essence, our formulation provides an unconventional way to think about and recover the dense depth map of a complex dynamic scene, one which is incremental and motion-free in nature. Our proposed method does not make object-level or any other high-level prior assumption about the dynamic scene; as a result, it is applicable to a wide range of scenarios. Experimental results on benchmark datasets show the competence of our approach for multiple frames.
Tasks	Depth Estimation, Motion Estimation, Optical Flow Estimation
Published	2019-02-11
URL	http://arxiv.org/abs/1902.03791v2
PDF	http://arxiv.org/pdf/1902.03791v2.pdf
PWC	https://paperswithcode.com/paper/a-motion-free-approach-to-dense-depth
Repo
Framework

NeurAll: Towards a Unified Model for Visual Perception in Automated Driving


Title	NeurAll: Towards a Unified Model for Visual Perception in Automated Driving
Authors	Ganesh Sistu, Isabelle Leang, Sumanth Chennupati, Ciaran Hughes, Stefan Milz, Senthil Yogamani, Samir Rawashdeh
Abstract	Convolutional Neural Networks (CNNs) are successfully used for important automotive visual perception tasks including object recognition, motion and depth estimation, visual SLAM, etc. However, these tasks are typically independently explored and modeled. In this paper, we propose a joint multi-task network design for learning several tasks simultaneously. Our main motivation is the computational efficiency achieved by sharing the expensive initial convolutional layers between all tasks. Indeed, the main bottleneck in automated driving systems is the limited processing power available on deployment hardware. There is also some evidence for other benefits in improving accuracy for some tasks and easing development effort. It also offers scalability to add more tasks leveraging existing features and achieving better generalization. We survey various CNN-based solutions for visual perception tasks in automated driving. Then we propose a unified CNN model for the important tasks and discuss several advanced optimization and architecture design techniques to improve the baseline model. The paper is partly review and partly positional, with a demonstration of several preliminary results promising for future research. We first demonstrate results of multi-stream learning and auxiliary learning, which are important ingredients to scale to a large multi-task model. Finally, we implement a two-stream three-task network which in many cases performs better than the corresponding single-task models, while maintaining network size.
Tasks	Auxiliary Learning, Depth Estimation, Object Detection, Object Recognition, Semantic Segmentation
Published	2019-02-10
URL	https://arxiv.org/abs/1902.03589v2
PDF	https://arxiv.org/pdf/1902.03589v2.pdf
PWC	https://paperswithcode.com/paper/neurall-towards-a-unified-model-for-visual
Repo
Framework

Lung Nodules Detection and Segmentation Using 3D Mask-RCNN


Title	Lung Nodules Detection and Segmentation Using 3D Mask-RCNN
Authors	Evi Kopelowitz, Guy Engelhard
Abstract	Accurate assessment of lung nodules is a time-consuming and error-prone part of the radiologist's interpretation work. Automating 3D volume detection and segmentation can improve workflow as well as patient care. Previous works have focused either on detecting lung nodules from a full CT scan or on segmenting them from a small ROI. We adapt the state-of-the-art architecture for 2D object detection and segmentation, Mask-RCNN, to handle 3D images and employ it to detect and segment lung nodules from CT scans. We report competitive results for lung nodule detection on the LUNA16 data set. The added value of our method is that in addition to lung nodule detection, our framework produces 3D segmentations of the detected nodules.
Tasks	Lung Nodule Detection, Object Detection
Published	2019-07-17
URL	https://arxiv.org/abs/1907.07676v1
PDF	https://arxiv.org/pdf/1907.07676v1.pdf
PWC	https://paperswithcode.com/paper/lung-nodules-detection-and-segmentation-using
Repo
Framework

Diabetes Mellitus Forecasting Using Population Health Data in Ontario, Canada


Title	Diabetes Mellitus Forecasting Using Population Health Data in Ontario, Canada
Authors	Mathieu Ravaut, Hamed Sadeghi, Kin Kwan Leung, Maksims Volkovs, Laura C. Rosella
Abstract	Leveraging health administrative data (HAD) datasets for predicting the risk of chronic diseases including diabetes has gained a lot of attention in the machine learning community recently. In this paper, we use the largest health records datasets of patients in Ontario, Canada. Provided by the Institute of Clinical Evaluative Sciences (ICES), this database is age, gender and ethnicity-diverse. The datasets include demographics, lab measurements, drug benefits, healthcare system interactions, and ambulatory and hospitalization records. We perform one of the first large-scale machine learning studies with this data to study the task of predicting diabetes in a range of 1-10 years ahead, which requires no additional screening of individuals. In the best setup, we reach a test AUC of 80.3 with a single model trained on an observation window of 5 years with a one-year buffer using all datasets. A subset of the top 15 features alone (out of a total of 963) could provide a test AUC of 79.1. In this paper, we provide extensive machine learning model performance and feature contribution analysis, which enables us to narrow down to the most important features useful for diabetes forecasting. Examples include chronic conditions such as asthma and hypertension, lab results, diagnostic codes in insurance claims, age and geographical information.
Tasks
Published	2019-04-08
URL	http://arxiv.org/abs/1904.04137v1
PDF	http://arxiv.org/pdf/1904.04137v1.pdf
PWC	https://paperswithcode.com/paper/diabetes-mellitus-forecasting-using
Repo
Framework
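
The results above are reported as test AUC. As a reminder of what that number measures, here is a minimal pairwise sketch of ROC AUC: the fraction of (positive, negative) pairs in which the positive case gets the higher score, with ties counted as half. The labels and scores are made up for illustration and the function name is an assumption.

```python
def roc_auc(labels, scores):
    """ROC AUC as the probability that a positive outranks a negative."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    # Count correctly ordered pairs; a tie contributes half a pair.
    ordered = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return ordered / (len(positives) * len(negatives))


# Hypothetical risk scores for four patients (1 = later diagnosed).
auc = roc_auc([1, 0, 1, 0], [0.9, 0.8, 0.4, 0.1])  # 0.75
```

On this 0 to 1 scale, the abstract's 80.3 corresponds to 0.803. The pairwise loop is quadratic in the number of patients, so large cohorts would normally use a rank-based computation instead.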

Time-aware Test Case Execution Scheduling for Cyber-Physical Systems


Title	Time-aware Test Case Execution Scheduling for Cyber-Physical Systems
Authors	Morten Mossige, Arnaud Gotlieb, Helge Spieker, Hein Meling, Mats Carlsson
Abstract	Testing cyber-physical systems involves the execution of test cases on target-machines equipped with the latest release of a software control system. When testing industrial robots, it is common that the target machines need to share some common resources, e.g., costly hardware devices, and so there is a need to schedule test case execution on the target machines, accounting for these shared resources. With a large number of such tests executed on a regular basis, this scheduling becomes difficult to manage manually. In fact, with manual test execution planning and scheduling, some robots may remain unoccupied for long periods of time and some test cases may not be executed. This paper introduces TC-Sched, a time-aware method for automated test case execution scheduling. TC-Sched uses Constraint Programming to schedule tests to run on multiple machines constrained by the tests’ access to shared resources, such as measurement or networking devices. The CP model is written in SICStus Prolog and uses the Cumulatives global constraint. Given a set of test cases, a set of machines, and a set of shared resources, TC-Sched produces an execution schedule where each test is executed once with minimal time between when a source code change is committed and the test results are reported to the developer. Experiments reveal that TC-Sched can schedule 500 test cases over 100 machines in less than 4 minutes for 99.5% of the instances. In addition, TC-Sched largely outperforms simpler methods based on a greedy algorithm and is suitable for deployment on industrial robot testing.
Tasks
Published	2019-02-12
URL	http://arxiv.org/abs/1902.04627v1
PDF	http://arxiv.org/pdf/1902.04627v1.pdf
PWC	https://paperswithcode.com/paper/time-aware-test-case-execution-scheduling-for
Repo
Framework
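
The abstract compares TC-Sched against simpler greedy methods. The sketch below shows what such a greedy baseline might look like: longest test first, earliest-free machine, and a wait whenever an exclusive shared resource is busy. It is neither TC-Sched's constraint model nor the paper's actual baseline. The test names, durations, and the one-resource-per-test simplification are all assumptions.

```python
def greedy_schedule(tests, num_machines):
    """Greedily place tests; tests is a list of (name, duration, resource or None).

    Returns {name: (machine, start, end)}. Each shared resource is assumed
    to be usable by only one test at a time.
    """
    machine_free = [0] * num_machines  # time at which each machine is idle
    resource_free = {}                 # time at which each resource is idle
    schedule = {}
    # Longest tests first, a common greedy heuristic.
    for name, duration, resource in sorted(tests, key=lambda t: -t[1]):
        machine = min(range(num_machines), key=lambda m: machine_free[m])
        start = machine_free[machine]
        if resource is not None:
            # Wait until the shared resource has been released.
            start = max(start, resource_free.get(resource, 0))
        end = start + duration
        machine_free[machine] = end
        if resource is not None:
            resource_free[resource] = end
        schedule[name] = (machine, start, end)
    return schedule


tests = [("t1", 4, "scope"), ("t2", 3, "scope"), ("t3", 2, None)]
schedule = greedy_schedule(tests, num_machines=2)
makespan = max(end for _, _, end in schedule.values())  # 7
```

In this example machine 1 sits idle for four time units while `t2` waits for the shared `scope`, although `t3` could have run there immediately. A constraint-based scheduler can avoid this kind of waste.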