Paper Group ANR 578
Taming Momentum in a Distributed Asynchronous Environment
Title | Taming Momentum in a Distributed Asynchronous Environment |
Authors | Ido Hakimi, Saar Barkai, Moshe Gabel, Assaf Schuster |
Abstract | Although distributed computing can significantly reduce the training time of deep neural networks, scaling the training process while maintaining high efficiency and final accuracy is challenging. Distributed asynchronous training enjoys near-linear speedup, but asynchrony causes gradient staleness, the main difficulty in scaling stochastic gradient descent to large clusters. Momentum, which is often used to accelerate convergence and escape local minima, exacerbates the gradient staleness, thereby hindering convergence. We propose DANA: a novel asynchronous distributed technique which is based on a new gradient staleness measure that we call the gap. By minimizing the gap, DANA mitigates the gradient staleness, despite using momentum, and therefore scales to large clusters while maintaining high final accuracy and fast convergence. DANA adapts Nesterov’s Accelerated Gradient to a distributed setting, computing the gradient on an estimated future position of the model’s parameters. In turn, we show that DANA’s estimation of the future position amplifies the use of a Taylor expansion, which relies on a fast Hessian approximation, making it much more effective and accurate. Our evaluation on the CIFAR and ImageNet datasets shows that DANA outperforms existing methods, in both final accuracy and convergence speed. |
Tasks | |
Published | 2019-07-26 |
URL | https://arxiv.org/abs/1907.11612v1 |
https://arxiv.org/pdf/1907.11612v1.pdf | |
PWC | https://paperswithcode.com/paper/taming-momentum-in-a-distributed-asynchronous |
Repo | |
Framework | |
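The abstract above says DANA builds on Nesterov's Accelerated Gradient, which evaluates the gradient at an estimated future position of the parameters. The following is a minimal single-worker sketch of plain Nesterov momentum only, not of DANA or its gap measure; the function name, the hyperparameter values, and the toy quadratic objective are all assumptions chosen for illustration.

```python
def nesterov_minimize(grad, x0, lr=0.1, momentum=0.9, steps=100):
    """Plain single-worker Nesterov's Accelerated Gradient on a scalar.

    This is the textbook update, not the distributed DANA algorithm.
    """
    x = x0
    velocity = 0.0
    for _ in range(steps):
        # Estimate where the momentum term alone would carry the parameter.
        lookahead = x + momentum * velocity
        # Evaluate the gradient at that estimated future position.
        velocity = momentum * velocity - lr * grad(lookahead)
        x = x + velocity
    return x


# Hypothetical toy objective f(x) = (x - 3)^2 with gradient 2 * (x - 3).
x_final = nesterov_minimize(lambda x: 2.0 * (x - 3.0), x0=0.0)
```

In an asynchronous setting, the parameters a worker reads may already be stale by the time its gradient is applied, which is the staleness problem the abstract describes; the sketch does not model that.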
End-to-End Visual Speech Recognition for Small-Scale Datasets
Title | End-to-End Visual Speech Recognition for Small-Scale Datasets |
Authors | Stavros Petridis, Yujiang Wang, Pingchuan Ma, Zuwei Li, Maja Pantic |
Abstract | Visual speech recognition models traditionally consist of two stages, feature extraction and classification. Several deep learning approaches have been recently presented aiming to replace the feature extraction stage by automatically extracting features from mouth images. However, research on joint learning of features and classification remains limited. In addition, most of the existing methods require large amounts of data in order to achieve state-of-the-art performance; otherwise they under-perform. In this work, we present an end-to-end visual speech recognition system based on fully-connected layers and Long Short-Term Memory (LSTM) networks which is suitable for small-scale datasets. The model consists of two streams which extract features directly from the mouth and difference images, respectively. The temporal dynamics in each stream are modelled by a Bidirectional LSTM (BLSTM) and the fusion of the two streams takes place via another BLSTM. An absolute improvement of 0.6%, 3.4%, 3.9%, 11.4% over the state-of-the-art is reported on the OuluVS2, CUAVE, AVLetters and AVLetters2 databases, respectively. |
Tasks | Speech Recognition, Visual Speech Recognition |
Published | 2019-04-02 |
URL | https://arxiv.org/abs/1904.01954v4 |
https://arxiv.org/pdf/1904.01954v4.pdf | |
PWC | https://paperswithcode.com/paper/end-to-end-visual-speech-recognition-for |
Repo | |
Framework | |
Radial and Directional Posteriors for Bayesian Neural Networks
Title | Radial and Directional Posteriors for Bayesian Neural Networks |
Authors | Changyong Oh, Kamil Adamczewski, Mijung Park |
Abstract | We propose a new variational family for Bayesian neural networks. We decompose the variational posterior into two components: the radial component captures the strength of each neuron in terms of its magnitude, while the directional component captures the statistical dependencies among the weight parameters. The dependencies learned via the directional density provide better modeling performance compared to the widely-used Gaussian mean-field-type variational family. In addition, the strength of input and output neurons learned via the radial density provides a structured way to compress neural networks. Indeed, experiments show that our variational family improves predictive performance and yields compressed networks simultaneously. |
Tasks | |
Published | 2019-02-07 |
URL | http://arxiv.org/abs/1902.02603v2 |
http://arxiv.org/pdf/1902.02603v2.pdf | |
PWC | https://paperswithcode.com/paper/radial-and-directional-posteriors-for |
Repo | |
Framework | |
Discretization based Solutions for Secure Machine Learning against Adversarial Attacks
Title | Discretization based Solutions for Secure Machine Learning against Adversarial Attacks |
Authors | Priyadarshini Panda, Indranil Chakraborty, Kaushik Roy |
Abstract | Adversarial examples are perturbed inputs that are designed (from a deep learning network’s (DLN) parameter gradients) to mislead the DLN during test time. Intuitively, constraining the dimensionality of inputs or parameters of a network reduces the ‘space’ in which adversarial examples exist. Guided by this intuition, we demonstrate that discretization greatly improves the robustness of DLNs against adversarial attacks. Specifically, discretizing the input space (reducing the allowed pixel levels from 256 values, or 8-bit, to 4 values, or 2-bit) extensively improves the adversarial robustness of DLNs for a substantial range of perturbations, with minimal loss in test accuracy. Furthermore, we find that Binary Neural Networks (BNNs) and related variants are intrinsically more robust than their full precision counterparts in adversarial scenarios. Combining input discretization with BNNs furthers the robustness, even waiving the need for adversarial training for certain magnitudes of perturbation values. We evaluate the effect of discretization on the MNIST, CIFAR10, CIFAR100 and ImageNet datasets. Across all datasets, we observe maximal adversarial resistance with 2-bit input discretization, which incurs an adversarial accuracy loss of just ~1-2% as compared to clean test accuracy. |
Tasks | |
Published | 2019-02-08 |
URL | http://arxiv.org/abs/1902.03151v2 |
http://arxiv.org/pdf/1902.03151v2.pdf | |
PWC | https://paperswithcode.com/paper/discretization-based-solutions-for-secure |
Repo | |
Framework | |
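The input discretization described in the abstract above (256 pixel levels reduced to 4) can be illustrated with a small sketch. The abstract does not give the exact quantization rule, so the uniform binning and the rescaling back to the 0..255 range below are assumptions, as are the function names.

```python
def discretize_pixel(value, bits=2):
    """Map an 8-bit pixel value (0..255) onto 2**bits evenly spaced levels.

    Uniform binning is an assumption; the paper may use a different rule.
    """
    if not 0 <= value <= 255:
        raise ValueError("expected an 8-bit pixel value in 0..255")
    if not 1 <= bits <= 8:
        raise ValueError("bits must be between 1 and 8")
    levels = 2 ** bits
    bin_width = 256 // levels
    index = value // bin_width
    # Rescale the bin index so the output still spans 0..255.
    return index * 255 // (levels - 1)


def discretize_image(image, bits=2):
    """Apply discretize_pixel to every entry of a list-of-rows image."""
    return [[discretize_pixel(p, bits) for p in row] for row in image]


# Hypothetical 2x3 grayscale patch.
patch = [[0, 63, 64], [128, 200, 255]]
quantized = discretize_image(patch)
```

With `bits=2`, every pixel collapses to one of four values, which is the sense in which the input space available to a perturbation shrinks.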
Pure Exploration with Multiple Correct Answers
Title | Pure Exploration with Multiple Correct Answers |
Authors | Rémy Degenne, Wouter M. Koolen |
Abstract | We determine the sample complexity of pure exploration bandit problems with multiple good answers. We derive a lower bound using a new game equilibrium argument. We show how continuity and convexity properties of single-answer problems ensure that the Track-and-Stop algorithm has asymptotically optimal sample complexity. However, that convexity is lost when going to the multiple-answer setting. We present a new algorithm which extends Track-and-Stop to the multiple-answer case and has asymptotic sample complexity matching the lower bound. |
Tasks | |
Published | 2019-02-09 |
URL | http://arxiv.org/abs/1902.03475v1 |
http://arxiv.org/pdf/1902.03475v1.pdf | |
PWC | https://paperswithcode.com/paper/pure-exploration-with-multiple-correct |
Repo | |
Framework | |
A Ranking Model Motivated by Nonnegative Matrix Factorization with Applications to Tennis Tournaments
Title | A Ranking Model Motivated by Nonnegative Matrix Factorization with Applications to Tennis Tournaments |
Authors | Rui Xia, Vincent Y. F. Tan, Louis Filstroff, Cédric Févotte |
Abstract | We propose a novel ranking model that combines the Bradley-Terry-Luce probability model with a nonnegative matrix factorization framework to model and uncover the presence of latent variables that influence the performance of top tennis players. We derive an efficient, provably convergent, and numerically stable majorization-minimization-based algorithm to maximize the likelihood of datasets under the proposed statistical model. The model is tested on datasets involving the outcomes of matches between 20 top male and female tennis players over 14 major tournaments for men (including the Grand Slams and the ATP Masters 1000) and 16 major tournaments for women over the past 10 years. Our model automatically infers that the surface of the court (e.g., clay or hard court) is a key determinant of the performances of male players, but less so for females. Top players on various surfaces over this longitudinal period are also identified in an objective manner. |
Tasks | |
Published | 2019-03-15 |
URL | https://arxiv.org/abs/1903.06500v2 |
https://arxiv.org/pdf/1903.06500v2.pdf | |
PWC | https://paperswithcode.com/paper/a-ranking-model-motivated-by-nonnegative |
Repo | |
Framework | |
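The Bradley-Terry-Luce (BTL) model named in the abstract above assigns each player a positive skill and sets the probability that player i beats player j to s_i / (s_i + s_j). The sketch below covers only this plain BTL likelihood, not the paper's combination with nonnegative matrix factorization or its majorization-minimization algorithm; the player labels and skill values are hypothetical.

```python
import math


def btl_win_probability(skill_i, skill_j):
    """Probability that player i beats player j under plain BTL."""
    return skill_i / (skill_i + skill_j)


def btl_log_likelihood(skills, matches):
    """Log-likelihood of observed results.

    skills:  dict mapping player name to a positive skill value.
    matches: list of (winner, loser) pairs.
    """
    return sum(
        math.log(btl_win_probability(skills[winner], skills[loser]))
        for winner, loser in matches
    )


# Hypothetical skills and results for two made-up players.
skills = {"A": 3.0, "B": 1.0}
matches = [("A", "B"), ("A", "B"), ("B", "A")]
log_lik = btl_log_likelihood(skills, matches)
```

Maximizing this log-likelihood over the skills is the plain BTL estimation problem; the paper's model additionally lets skills depend on latent factors such as court surface.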
Refine and Distill: Exploiting Cycle-Inconsistency and Knowledge Distillation for Unsupervised Monocular Depth Estimation
Title | Refine and Distill: Exploiting Cycle-Inconsistency and Knowledge Distillation for Unsupervised Monocular Depth Estimation |
Authors | Andrea Pilzer, Stéphane Lathuilière, Nicu Sebe, Elisa Ricci |
Abstract | Nowadays, the majority of state-of-the-art monocular depth estimation techniques are based on supervised deep learning models. However, collecting RGB images with associated depth maps is a very time-consuming procedure. Therefore, recent works have proposed deep architectures for addressing the monocular depth prediction task as a reconstruction problem, thus avoiding the need to collect ground-truth depth. Following these works, we propose a novel self-supervised deep model for estimating depth maps. Our framework exploits two main strategies: refinement via cycle-inconsistency and distillation. Specifically, first a student network is trained to predict a disparity map so as to recover, from a frame in one camera view, the associated image in the opposite view. Then, a backward cycle network is applied to the generated image to re-synthesize the input image, estimating the opposite disparity. A third network exploits the inconsistency between the original and the reconstructed input frame in order to output a refined depth map. Finally, knowledge distillation is exploited to transfer information from the refinement network to the student. Our extensive experimental evaluation demonstrates the effectiveness of the proposed framework, which outperforms state-of-the-art unsupervised methods on the KITTI benchmark. |
Tasks | Depth Estimation, Monocular Depth Estimation |
Published | 2019-03-11 |
URL | http://arxiv.org/abs/1903.04202v2 |
http://arxiv.org/pdf/1903.04202v2.pdf | |
PWC | https://paperswithcode.com/paper/refine-and-distill-exploiting-cycle |
Repo | |
Framework | |
Online Structured Sparsity-based Moving Object Detection from Satellite Videos
Title | Online Structured Sparsity-based Moving Object Detection from Satellite Videos |
Authors | Junpeng Zhang, Xiuping Jia, Jiankun Hu, Jocelyn Chanussot |
Abstract | Inspired by the recent developments in computer vision, low-rank and structured sparse matrix decomposition can potentially be used to extract moving objects in satellite videos. This set of approaches seeks rank minimization on the background, which typically requires batch-based optimization over a sequence of frames; this causes delays in processing and limits their applications. To remedy this delay, we propose an Online Low-rank and Structured Sparse Decomposition (O-LSD). O-LSD reformulates the batch-based low-rank matrix decomposition with the structured sparse penalty to its equivalent frame-wise separable counterpart, which then defines a stochastic optimization problem for online subspace basis estimation. In order to promote online processing, O-LSD conducts the foreground and background separation and the subspace basis update alternately for every frame in a video. We also show the convergence of O-LSD theoretically. Experimental results on two satellite videos demonstrate that the performance of O-LSD in terms of accuracy and time consumption is comparable with the batch-based approaches, with significantly reduced delay in processing. |
Tasks | Object Detection, Stochastic Optimization |
Published | 2019-11-29 |
URL | https://arxiv.org/abs/1911.12989v3 |
https://arxiv.org/pdf/1911.12989v3.pdf | |
PWC | https://paperswithcode.com/paper/online-structured-sparsity-based-moving |
Repo | |
Framework | |
Image Recognition using Region Creep
Title | Image Recognition using Region Creep |
Authors | Kieran Greer |
Abstract | This paper describes a new type of image classifier that uses a shallow architecture with a very quick learning phase. The image is parsed into smaller areas and each area is saved directly for a region, along with the related output category. When a new image is presented, a direct match with each part is made and the best matching areas returned. These areas can overlap with each other and when moving from a region to its neighbours, there are likely to be only small changes in the area image part. It would therefore be possible to guess what the best image part is for one region by cumulating the results of its neighbours. This is in fact an associative feature of the classifier that can re-construct missing or noisy input by substituting the direct match with what the region match suggests, and is called ‘Region Creep’. As each area stores the categories it belongs to, the image classification process sums these to return a preferred category for the whole image. The classifier works mostly at a local level and so, to give it some type of global picture, rules are added. These rules work at the whole image level and basically state that if one set of pixels is present, another set should be removed or should also be present. While the rules appear to be very specific, most of the construction can be done automatically. Tests on a set of hand-written numbers have produced state-of-the-art results. |
Tasks | Image Classification |
Published | 2019-09-24 |
URL | https://arxiv.org/abs/1909.10811v1 |
https://arxiv.org/pdf/1909.10811v1.pdf | |
PWC | https://paperswithcode.com/paper/image-recognition-using-region-creep |
Repo | |
Framework | |
A Unified Formulation for Visual Odometry
Title | A Unified Formulation for Visual Odometry |
Authors | Georges Younes, Daniel Asmar, John Zelek |
Abstract | Monocular Odometry systems can be broadly categorized as being either Direct, Indirect, or a hybrid of both. While Indirect systems process an alternative image representation to compute geometric residuals, Direct methods process the image pixels directly to generate photometric residuals. Both paradigms have distinct but often complementary properties. This paper presents a Unified Formulation for Visual Odometry, referred to as UFVO, with the following key contributions: (1) a tight coupling of photometric (Direct) and geometric (Indirect) measurements using a joint multi-objective optimization, (2) the use of a utility function as a decision maker that incorporates prior knowledge on both paradigms, (3) descriptor sharing, where a feature can have more than one type of descriptor and its different descriptors are used for tracking and mapping, (4) the depth estimation of both corner features and pixel features within the same map using an inverse depth parametrization, and (5) a corner and pixel selection strategy that extracts both types of information, while promoting a uniform distribution over the image domain. Experiments show that our proposed system can handle large inter-frame motions, inherits the sub-pixel accuracy of direct methods, can run efficiently in real-time, can generate an Indirect map representation at a marginal computational cost when compared to traditional Indirect systems, all while outperforming state of the art in Direct, Indirect and hybrid systems. |
Tasks | Depth Estimation, Visual Odometry |
Published | 2019-03-11 |
URL | http://arxiv.org/abs/1903.04253v1 |
http://arxiv.org/pdf/1903.04253v1.pdf | |
PWC | https://paperswithcode.com/paper/a-unified-formulation-for-visual-odometry |
Repo | |
Framework | |
Dense Depth Estimation of a Complex Dynamic Scene without Explicit 3D Motion Estimation
Title | Dense Depth Estimation of a Complex Dynamic Scene without Explicit 3D Motion Estimation |
Authors | Suryansh Kumar, Ram Srivatsav Ghorakavi, Yuchao Dai, Hongdong Li |
Abstract | Recent geometric methods need reliable estimates of 3D motion parameters to procure an accurate dense depth map of a complex dynamic scene from monocular images [kumar2017monocular, ranftl2016dense]. Generally, estimating precise measurements of relative 3D motion parameters and validating their accuracy using image data is a challenging task. In this work, we propose an alternative approach that circumvents the 3D motion estimation requirement to obtain a dense depth map of a dynamic scene. Given per-pixel optical flow correspondences between two consecutive frames and the sparse depth prior for the reference frame, we show that we can effectively recover the dense depth map for the successive frames without solving for 3D motion parameters. Our method assumes a piece-wise planar model of a dynamic scene, which undergoes rigid transformation locally and as-rigid-as-possible transformation globally between two successive frames. Under our assumption, we can avoid the explicit estimation of 3D rotation and translation to estimate scene depth. In essence, our formulation provides an unconventional way to think about and recover the dense depth map of a complex dynamic scene, one which is incremental and motion-free in nature. Our proposed method does not make object-level or any other high-level prior assumption about the dynamic scene; as a result, it is applicable to a wide range of scenarios. Experimental results on benchmark datasets show the competence of our approach for multiple frames. |
Tasks | Depth Estimation, Motion Estimation, Optical Flow Estimation |
Published | 2019-02-11 |
URL | http://arxiv.org/abs/1902.03791v2 |
http://arxiv.org/pdf/1902.03791v2.pdf | |
PWC | https://paperswithcode.com/paper/a-motion-free-approach-to-dense-depth |
Repo | |
Framework | |
NeurAll: Towards a Unified Model for Visual Perception in Automated Driving
Title | NeurAll: Towards a Unified Model for Visual Perception in Automated Driving |
Authors | Ganesh Sistu, Isabelle Leang, Sumanth Chennupati, Ciaran Hughes, Stefan Milz, Senthil Yogamani, Samir Rawashdeh |
Abstract | Convolutional Neural Networks (CNNs) are successfully used for the important automotive visual perception tasks including object recognition, motion and depth estimation, visual SLAM, etc. However, these tasks are typically independently explored and modeled. In this paper, we propose a joint multi-task network design for learning several tasks simultaneously. Our main motivation is the computational efficiency achieved by sharing the expensive initial convolutional layers between all tasks. Indeed, the main bottleneck in automated driving systems is the limited processing power available on deployment hardware. There is also some evidence for other benefits in improving accuracy for some tasks and easing development effort. It also offers scalability to add more tasks leveraging existing features and achieving better generalization. We survey various CNN based solutions for visual perception tasks in automated driving. Then we propose a unified CNN model for the important tasks and discuss several advanced optimization and architecture design techniques to improve the baseline model. The paper is partly review and partly positional with demonstration of several preliminary results promising for future research. We first demonstrate results of multi-stream learning and auxiliary learning which are important ingredients to scale to a large multi-task model. Finally, we implement a two-stream three-task network which performs better in many cases compared to their corresponding single-task models, while maintaining network size. |
Tasks | Auxiliary Learning, Depth Estimation, Object Detection, Object Recognition, Semantic Segmentation |
Published | 2019-02-10 |
URL | https://arxiv.org/abs/1902.03589v2 |
https://arxiv.org/pdf/1902.03589v2.pdf | |
PWC | https://paperswithcode.com/paper/neurall-towards-a-unified-model-for-visual |
Repo | |
Framework | |
Lung Nodules Detection and Segmentation Using 3D Mask-RCNN
Title | Lung Nodules Detection and Segmentation Using 3D Mask-RCNN |
Authors | Evi Kopelowitz, Guy Engelhard |
Abstract | Accurate assessment of Lung nodules is a time consuming and error prone ingredient of the radiologist interpretation work. Automating 3D volume detection and segmentation can improve workflow as well as patient care. Previous works have focused either on detecting lung nodules from a full CT scan or on segmenting them from a small ROI. We adapt the state of the art architecture for 2D object detection and segmentation, MaskRCNN, to handle 3D images and employ it to detect and segment lung nodules from CT scans. We report on competitive results for the lung nodule detection on LUNA16 data set. The added value of our method is that in addition to lung nodule detection, our framework produces 3D segmentations of the detected nodules. |
Tasks | Lung Nodule Detection, Object Detection |
Published | 2019-07-17 |
URL | https://arxiv.org/abs/1907.07676v1 |
https://arxiv.org/pdf/1907.07676v1.pdf | |
PWC | https://paperswithcode.com/paper/lung-nodules-detection-and-segmentation-using |
Repo | |
Framework | |
Diabetes Mellitus Forecasting Using Population Health Data in Ontario, Canada
Title | Diabetes Mellitus Forecasting Using Population Health Data in Ontario, Canada |
Authors | Mathieu Ravaut, Hamed Sadeghi, Kin Kwan Leung, Maksims Volkovs, Laura C. Rosella |
Abstract | Leveraging health administrative data (HAD) datasets for predicting the risk of chronic diseases including diabetes has gained a lot of attention in the machine learning community recently. In this paper, we use the largest health records datasets of patients in Ontario, Canada. Provided by the Institute of Clinical Evaluative Sciences (ICES), this database is age, gender and ethnicity-diverse. The datasets include demographics, lab measurements, drug benefits, healthcare system interactions, and ambulatory and hospitalization records. We perform one of the first large-scale machine learning studies with this data to study the task of predicting diabetes in a range of 1-10 years ahead, which requires no additional screening of individuals. In the best setup, we reach a test AUC of 80.3 with a single model trained on an observation window of 5 years with a one-year buffer using all datasets. A subset of the top 15 features alone (out of a total of 963) could provide a test AUC of 79.1. In this paper, we provide extensive machine learning model performance and feature contribution analysis, which enables us to narrow down to the most important features useful for diabetes forecasting. Examples include chronic conditions such as asthma and hypertension, lab results, diagnostic codes in insurance claims, age and geographical information. |
Tasks | |
Published | 2019-04-08 |
URL | http://arxiv.org/abs/1904.04137v1 |
http://arxiv.org/pdf/1904.04137v1.pdf | |
PWC | https://paperswithcode.com/paper/diabetes-mellitus-forecasting-using |
Repo | |
Framework | |
Time-aware Test Case Execution Scheduling for Cyber-Physical Systems
Title | Time-aware Test Case Execution Scheduling for Cyber-Physical Systems |
Authors | Morten Mossige, Arnaud Gotlieb, Helge Spieker, Hein Meling, Mats Carlsson |
Abstract | Testing cyber-physical systems involves the execution of test cases on target-machines equipped with the latest release of a software control system. When testing industrial robots, it is common that the target machines need to share some common resources, e.g., costly hardware devices, and so there is a need to schedule test case execution on the target machines, accounting for these shared resources. With a large number of such tests executed on a regular basis, this scheduling becomes difficult to manage manually. In fact, with manual test execution planning and scheduling, some robots may remain unoccupied for long periods of time and some test cases may not be executed. This paper introduces TC-Sched, a time-aware method for automated test case execution scheduling. TC-Sched uses Constraint Programming to schedule tests to run on multiple machines constrained by the tests’ access to shared resources, such as measurement or networking devices. The CP model is written in SICStus Prolog and uses the Cumulatives global constraint. Given a set of test cases, a set of machines, and a set of shared resources, TC-Sched produces an execution schedule where each test is executed once with minimal time between when a source code change is committed and the test results are reported to the developer. Experiments reveal that TC-Sched can schedule 500 test cases over 100 machines in less than 4 minutes for 99.5% of the instances. In addition, TC-Sched largely outperforms simpler methods based on a greedy algorithm and is suitable for deployment on industrial robot testing. |
Tasks | |
Published | 2019-02-12 |
URL | http://arxiv.org/abs/1902.04627v1 |
http://arxiv.org/pdf/1902.04627v1.pdf | |
PWC | https://paperswithcode.com/paper/time-aware-test-case-execution-scheduling-for |
Repo | |
Framework | |