Paper Group AWR 55
Probabilistic Residual Learning for Aleatoric Uncertainty in Image Restoration. GolfDB: A Video Database for Golf Swing Sequencing. Street Crossing Aid Using Light-weight CNNs for the Visually Impaired. Batch Policy Learning under Constraints. Green AI. Accurate and interpretable evaluation of surgical skills from kinematic data using fully convolutional neural networks. Who Blames Whom in a Crisis? Detecting Blame Ties from News Articles Using Neural Networks. Image Difficulty Curriculum for Generative Adversarial Networks (CuGAN). Interpretable Generative Neural Spatio-Temporal Point Processes. COBRA: Data-Efficient Model-Based RL through Unsupervised Object Discovery and Curiosity-Driven Exploration. VINE: Visualizing Statistical Interactions in Black Box Models. Depth from Videos in the Wild: Unsupervised Monocular Depth Learning from Unknown Cameras. Formal derivation of Mesh Neural Networks with their Forward-Only gradient Propagation. Probabilistic Formulation of the Take The Best Heuristic. Object-Contextual Representations for Semantic Segmentation.
Probabilistic Residual Learning for Aleatoric Uncertainty in Image Restoration
Title | Probabilistic Residual Learning for Aleatoric Uncertainty in Image Restoration |
Authors | Chen Zhang, Bangti Jin |
Abstract | Aleatoric uncertainty is an intrinsic property of ill-posed inverse and imaging problems. Its quantification is vital for assessing the reliability of relevant point estimates. In this paper, we propose an efficient framework for quantifying aleatoric uncertainty for deep residual learning and showcase its significant potential on image restoration. In the framework, we divide the conditional probability modeling for the residual variable into a deterministic homo-dimensional level, a stochastic low-dimensional level and a merging level. The low dimensionality is especially suitable for the sparse correlation between image pixels, enables efficient sampling for high-dimensional problems, and acts as a regularizer for the distribution. Preliminary numerical experiments show that the proposed method can give not only state-of-the-art point estimates of image restoration but also useful associated uncertainty information. |
Tasks | Image Restoration |
Published | 2019-08-01 |
URL | https://arxiv.org/abs/1908.01010v2 |
https://arxiv.org/pdf/1908.01010v2.pdf | |
PWC | https://paperswithcode.com/paper/probabilistic-residual-learning-for-aleatoric |
Repo | https://github.com/chenzxyz/prob_res_learning |
Framework | tf |
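To make the three-level split concrete, here is a minimal PyTorch sketch (not the authors' TensorFlow implementation linked above): a deterministic full-resolution branch, a low-dimensional Gaussian latent, and a merging convolution. All layer sizes, the Gaussian form of the latent, and the sampling loop are illustrative assumptions.

```python
import torch
import torch.nn as nn

class ProbResidualSketch(nn.Module):
    """Toy three-level residual model: deterministic + low-dim stochastic + merge."""
    def __init__(self, channels=1, latent_dim=8):
        super().__init__()
        # Deterministic, homo-dimensional level: residual features at full resolution.
        self.deterministic = nn.Sequential(
            nn.Conv2d(channels, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
        )
        # Stochastic, low-dimensional level: amortized Gaussian over a small latent code.
        self.encoder = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, 2 * latent_dim)
        )
        # Merging level: combine deterministic features with the broadcast latent sample.
        self.merge = nn.Conv2d(32 + latent_dim, channels, 3, padding=1)

    def forward(self, x, n_samples=1):
        h = self.deterministic(x)
        mu, logvar = self.encoder(h).chunk(2, dim=1)
        restorations = []
        for _ in range(n_samples):
            z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()     # reparameterized sample
            zmap = z[:, :, None, None].expand(-1, -1, h.shape[2], h.shape[3])
            restorations.append(x + self.merge(torch.cat([h, zmap], dim=1)))  # input + residual
        return torch.stack(restorations, dim=0)

# Example: draw 10 restoration samples; their pixel-wise std is a simple uncertainty map.
samples = ProbResidualSketch()(torch.randn(2, 1, 32, 32), n_samples=10)
uncertainty = samples.std(dim=0)
```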
GolfDB: A Video Database for Golf Swing Sequencing
Title | GolfDB: A Video Database for Golf Swing Sequencing |
Authors | William McNally, Kanav Vats, Tyler Pinto, Chris Dulhanty, John McPhee, Alexander Wong |
Abstract | The golf swing is a complex movement requiring considerable full-body coordination to execute proficiently. As such, it is the subject of frequent scrutiny and extensive biomechanical analyses. In this paper, we introduce the notion of golf swing sequencing for detecting key events in the golf swing and facilitating golf swing analysis. To enable consistent evaluation of golf swing sequencing performance, we also introduce the benchmark database GolfDB, consisting of 1400 high-quality golf swing videos, each labeled with event frames, bounding box, player name and sex, club type, and view type. Furthermore, to act as a reference baseline for evaluating golf swing sequencing performance on GolfDB, we propose a lightweight deep neural network called SwingNet, which possesses a hybrid deep convolutional and recurrent neural network architecture. SwingNet correctly detects eight golf swing events at an average rate of 76.1%, and six out of eight events at a rate of 91.8%. In line with the proposed baseline SwingNet, we advocate the use of computationally efficient models in future research to promote in-the-field analysis via deployment on readily-available mobile devices. |
Tasks | |
Published | 2019-03-15 |
URL | http://arxiv.org/abs/1903.06528v1 |
http://arxiv.org/pdf/1903.06528v1.pdf | |
PWC | https://paperswithcode.com/paper/golfdb-a-video-database-for-golf-swing |
Repo | https://github.com/wmcnally/GolfDB |
Framework | pytorch |
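SwingNet itself pairs a MobileNetV2 backbone with a bidirectional LSTM (see the repo above). Below is a much smaller, hedged sketch of that CNN-plus-RNN pattern for per-frame event classification; the layer sizes and the nine-class output (eight swing events plus a no-event class) are assumptions for illustration only.

```python
import torch
import torch.nn as nn

class SwingNetSketch(nn.Module):
    """Toy CNN + bidirectional LSTM for per-frame golf-swing event classification."""
    def __init__(self, n_classes=9, hidden=64):
        super().__init__()
        # Lightweight per-frame feature extractor (the paper uses a MobileNetV2 backbone).
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.rnn = nn.LSTM(32, hidden, batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, n_classes)

    def forward(self, frames):                    # frames: (B, T, 3, H, W)
        b, t = frames.shape[:2]
        feats = self.cnn(frames.flatten(0, 1))    # (B*T, 32) per-frame features
        out, _ = self.rnn(feats.view(b, t, -1))   # temporal context across the swing
        return self.head(out)                     # per-frame event logits

logits = SwingNetSketch()(torch.randn(1, 16, 3, 64, 64))   # (1, 16, 9)
```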
Street Crossing Aid Using Light-weight CNNs for the Visually Impaired
Title | Street Crossing Aid Using Light-weight CNNs for the Visually Impaired |
Authors | Samuel Yu, Heon Lee, Jung Hoon Kim |
Abstract | In this paper, we address an issue that the visually impaired commonly face while crossing intersections and propose a solution that takes the form of a mobile application. The application utilizes a deep learning convolutional neural network model, LytNetV2, to output necessary information that the visually impaired may lack when without human companions or guide dogs. A prototype of the application runs on iOS devices running version 11 or above. It is designed for comprehensiveness, concision, accuracy, and computational efficiency, delivering in real time the two most important pieces of information required to cross the road: pedestrian traffic light color and crossing direction. Furthermore, it is specifically aimed at supporting those facing financial burdens, as the solution is distributed as a free mobile application. Through the modification and utilization of key principles in MobileNetV3 such as depthwise separable convolutions and squeeze-excite layers, the deep neural network model achieves a classification accuracy of 96% and an average angle error of 6.15 degrees, while running at a frame rate of 16.34 frames per second. Additionally, the model is trained as an image classifier, allowing for a faster and more accurate model. The network is able to outperform other methods such as object detection and non-deep learning algorithms in both accuracy and thoroughness. The information is delivered through both auditory signals and vibrations, and the system has been tested with seven visually impaired users and received above-satisfactory responses. |
Tasks | Object Detection |
Published | 2019-09-14 |
URL | https://arxiv.org/abs/1909.09598v1 |
https://arxiv.org/pdf/1909.09598v1.pdf | |
PWC | https://paperswithcode.com/paper/street-crossing-aid-using-light-weight-cnns |
Repo | https://github.com/samuelyu2002/pedestrian-traffic-lights |
Framework | pytorch |
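A hedged sketch of the two MobileNetV3 ingredients named in the abstract, depthwise separable convolutions and squeeze-excite layers, wired into a small two-head network (traffic-light class plus crossing direction). Channel sizes, the five-class light head, and the four-value direction head are illustrative assumptions, not LytNetV2's actual configuration.

```python
import torch
import torch.nn as nn

class SepConvSE(nn.Module):
    """Depthwise separable convolution followed by a squeeze-and-excite gate."""
    def __init__(self, c_in, c_out, reduction=4):
        super().__init__()
        self.depthwise = nn.Conv2d(c_in, c_in, 3, padding=1, groups=c_in)  # per-channel spatial filter
        self.pointwise = nn.Conv2d(c_in, c_out, 1)                         # 1x1 channel mixing
        self.se = nn.Sequential(                                           # squeeze-excite gate
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(c_out, c_out // reduction, 1), nn.ReLU(),
            nn.Conv2d(c_out // reduction, c_out, 1), nn.Sigmoid(),
        )
        self.act = nn.ReLU()

    def forward(self, x):
        y = self.act(self.pointwise(self.depthwise(x)))
        return y * self.se(y)

class CrossingAidSketch(nn.Module):
    """Toy LytNetV2-style network: shared trunk, classification + direction heads."""
    def __init__(self, n_light_classes=5):
        super().__init__()
        self.trunk = nn.Sequential(SepConvSE(3, 16), SepConvSE(16, 32),
                                   nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.light_head = nn.Linear(32, n_light_classes)   # pedestrian traffic light state
        self.dir_head = nn.Linear(32, 4)                    # endpoints describing the crossing direction

    def forward(self, x):
        h = self.trunk(x)
        return self.light_head(h), self.dir_head(h)

cls_logits, direction = CrossingAidSketch()(torch.randn(1, 3, 96, 96))
```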
Batch Policy Learning under Constraints
Title | Batch Policy Learning under Constraints |
Authors | Hoang M. Le, Cameron Voloshin, Yisong Yue |
Abstract | When learning policies for real-world domains, two important questions arise: (i) how to efficiently use pre-collected off-policy, non-optimal behavior data; and (ii) how to mediate among different competing objectives and constraints. We thus study the problem of batch policy learning under multiple constraints, and offer a systematic solution. We first propose a flexible meta-algorithm that admits any batch reinforcement learning and online learning procedure as subroutines. We then present a specific algorithmic instantiation and provide performance guarantees for the main objective and all constraints. To certify constraint satisfaction, we propose a new and simple method for off-policy policy evaluation (OPE) and derive PAC-style bounds. Our algorithm achieves strong empirical results in different domains, including in a challenging problem of simulated car driving subject to multiple constraints such as lane keeping and smooth driving. We also show experimentally that our OPE method outperforms other popular OPE techniques on a standalone basis, especially in a high-dimensional setting. |
Tasks | |
Published | 2019-03-20 |
URL | http://arxiv.org/abs/1903.08738v1 |
http://arxiv.org/pdf/1903.08738v1.pdf | |
PWC | https://paperswithcode.com/paper/batch-policy-learning-under-constraints |
Repo | https://github.com/clvoloshin/constrained_batch_policy_learning |
Framework | none |
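A hedged skeleton of the Lagrangian-style meta-loop the abstract describes: an online learner updates constraint multipliers while a batch-RL oracle best-responds, and off-policy evaluation certifies the constraint values. `batch_rl_oracle` and `ope_estimate` are stand-in callables the reader must supply, and the exponentiated-gradient update is one common instantiation, not necessarily the paper's exact choice.

```python
import numpy as np

def constrained_batch_policy_learning(batch_rl_oracle, ope_estimate, thresholds,
                                      n_iters=50, lam_bound=10.0, lr=0.1):
    """Meta-loop: mixed policy vs. Lagrange multipliers over m constraints.

    batch_rl_oracle(weights) -> policy minimizing cost + weights . constraint_costs
    ope_estimate(policy)     -> (main_cost, constraint_costs) estimated off-policy
    thresholds               -> array of m constraint thresholds tau_i
    """
    m = len(thresholds)
    lam = np.full(m + 1, lam_bound / (m + 1))   # extra coordinate keeps ||lam||_1 = lam_bound
    mixture, values = [], []
    for _ in range(n_iters):
        policy = batch_rl_oracle(lam[:m])                  # best response to current multipliers
        main, constraints = ope_estimate(policy)           # certify with off-policy evaluation
        mixture.append(policy)
        values.append((main, constraints))
        # Exponentiated-gradient ascent on the multipliers: violated constraints gain weight.
        grad = np.append(np.asarray(constraints) - np.asarray(thresholds), 0.0)
        lam *= np.exp(lr * grad)
        lam *= lam_bound / lam.sum()
    return mixture, values   # the returned mixture approximates the equilibrium policy
```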
Green AI
Title | Green AI |
Authors | Roy Schwartz, Jesse Dodge, Noah A. Smith, Oren Etzioni |
Abstract | The computations required for deep learning research have been doubling every few months, resulting in an estimated 300,000x increase from 2012 to 2018 [2]. These computations have a surprisingly large carbon footprint [38]. Ironically, deep learning was inspired by the human brain, which is remarkably energy efficient. Moreover, the financial cost of the computations can make it difficult for academics, students, and researchers, in particular those from emerging economies, to engage in deep learning research. This position paper advocates a practical solution by making efficiency an evaluation criterion for research alongside accuracy and related measures. In addition, we propose reporting the financial cost or “price tag” of developing, training, and running models to provide baselines for the investigation of increasingly efficient methods. Our goal is to make AI both greener and more inclusive—enabling any inspired undergraduate with a laptop to write high-quality research papers. Green AI is an emerging focus at the Allen Institute for AI. |
Tasks | |
Published | 2019-07-22 |
URL | https://arxiv.org/abs/1907.10597v3 |
https://arxiv.org/pdf/1907.10597v3.pdf | |
PWC | https://paperswithcode.com/paper/green-ai |
Repo | https://github.com/sagabanana/-60daysofudacity |
Framework | pytorch |
Accurate and interpretable evaluation of surgical skills from kinematic data using fully convolutional neural networks
Title | Accurate and interpretable evaluation of surgical skills from kinematic data using fully convolutional neural networks |
Authors | Hassan Ismail Fawaz, Germain Forestier, Jonathan Weber, Lhassane Idoumghar, Pierre-Alain Muller |
Abstract | Purpose: Manual feedback from senior surgeons observing less experienced trainees is a laborious task that is very expensive, time-consuming and prone to subjectivity. With the number of surgical procedures increasing annually, there is an unprecedented need to provide an accurate, objective and automatic evaluation of trainees’ surgical skills in order to improve surgical practice. Methods: In this paper, we designed a convolutional neural network (CNN) to classify surgical skills by extracting latent patterns in the trainees’ motions performed during robotic surgery. The method is validated on the JIGSAWS dataset for two surgical skills evaluation tasks: classification and regression. Results: Our results show that deep neural networks constitute robust machine learning models that are able to reach new competitive state-of-the-art performance on the JIGSAWS dataset. While leveraging the efficiency of CNNs, we were able to minimize their black-box effect using the class activation map technique. Conclusions: This characteristic allowed our method to automatically pinpoint which parts of the surgery influenced the skill evaluation the most, thus allowing us to explain a surgical skill classification and provide surgeons with a novel personalized feedback technique. We believe this type of interpretable machine learning model could integrate within “Operation Room 2.0” and support novice surgeons in improving their skills to eventually become experts. |
Tasks | Interpretable Machine Learning, Surgical Skills Evaluation |
Published | 2019-08-20 |
URL | https://arxiv.org/abs/1908.07319v1 |
https://arxiv.org/pdf/1908.07319v1.pdf | |
PWC | https://paperswithcode.com/paper/accurate-and-interpretable-evaluation-of |
Repo | https://github.com/hfawaz/ijcars19 |
Framework | none |
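A hedged sketch of the pipeline the abstract describes: a 1-D fully convolutional network over multichannel kinematic time series, global average pooling, a linear skill classifier, and a class activation map (CAM) that projects the classifier weights back onto time to show which parts of the trial drove the prediction. The channel count (76 kinematic variables), layer sizes, and three skill classes are assumptions, not the authors' exact configuration.

```python
import torch
import torch.nn as nn

class SkillFCN(nn.Module):
    """1-D FCN over kinematic time series with a CAM-friendly GAP + linear head."""
    def __init__(self, n_channels=76, n_classes=3):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(n_channels, 64, 7, padding=3), nn.ReLU(),
            nn.Conv1d(64, 64, 5, padding=2), nn.ReLU(),
        )
        self.head = nn.Linear(64, n_classes)

    def forward(self, x):                       # x: (B, n_channels, T)
        fmap = self.features(x)                 # (B, 64, T)
        logits = self.head(fmap.mean(dim=2))    # global average pooling over time
        return logits, fmap

    def class_activation_map(self, fmap, class_idx):
        # CAM: weight each feature channel by the classifier weight of the chosen class.
        w = self.head.weight[class_idx]                    # (64,)
        return torch.einsum('c,bct->bt', w, fmap)          # (B, T) importance over time

model = SkillFCN()
logits, fmap = model(torch.randn(1, 76, 500))
cam = model.class_activation_map(fmap, logits.argmax(dim=1).item())
```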
Who Blames Whom in a Crisis? Detecting Blame Ties from News Articles Using Neural Networks
Title | Who Blames Whom in a Crisis? Detecting Blame Ties from News Articles Using Neural Networks |
Authors | Shuailong Liang, Olivia Nicol, Yue Zhang |
Abstract | Blame games tend to follow major disruptions, be they financial crises, natural disasters or terrorist attacks. Studying how the blame game evolves and shapes the dominant crisis narratives is of great significance, as sense-making processes can affect regulatory outcomes, social hierarchies, and cultural norms. However, it takes tremendous time and effort for social scientists to manually examine each relevant news article and extract the blame ties (A blames B). In this study, we define a new task, Blame Tie Extraction, and construct a new dataset related to the United States financial crisis (2007-2010) from The New York Times, The Wall Street Journal and USA Today. We build a Bi-directional Long Short-Term Memory (BiLSTM) network over the contexts in which the entities appear, and it learns to automatically extract such blame ties at the document level. Leveraging large unsupervised models such as GloVe and ELMo, our best model achieves an F1 score of 70% on the test set for blame tie extraction, making it a useful tool for social scientists to extract blame ties more efficiently. |
Tasks | |
Published | 2019-04-24 |
URL | http://arxiv.org/abs/1904.10637v1 |
http://arxiv.org/pdf/1904.10637v1.pdf | |
PWC | https://paperswithcode.com/paper/who-blames-whom-in-a-crisis-detecting-blame |
Repo | https://github.com/Shuailong/BlamePipeline |
Framework | pytorch |
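A hedged sketch of the document-level setup in the abstract: encode the contexts in which an entity pair co-occurs with a BiLSTM, pool them, and classify whether A blames B. Embedding sizes and the pooling choices are assumptions; the authors' actual pipeline (with GloVe/ELMo inputs) is in the repo above.

```python
import torch
import torch.nn as nn

class BlameTieSketch(nn.Module):
    """BiLSTM over entity-pair contexts, pooled to a document-level blame decision."""
    def __init__(self, vocab_size=30000, emb=100, hidden=128):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb)          # would be initialized from GloVe
        self.rnn = nn.LSTM(emb, hidden, batch_first=True, bidirectional=True)
        self.clf = nn.Linear(2 * hidden, 2)               # blames / does not blame

    def forward(self, contexts):                          # contexts: (n_contexts, seq_len) token ids
        h, _ = self.rnn(self.emb(contexts))               # (n_contexts, seq_len, 2*hidden)
        pooled = h.max(dim=1).values                      # max-pool each context
        doc = pooled.mean(dim=0)                          # aggregate contexts for the entity pair
        return self.clf(doc)                              # logits for the (A blames B) decision

logits = BlameTieSketch()(torch.randint(0, 30000, (5, 40)))
```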
Image Difficulty Curriculum for Generative Adversarial Networks (CuGAN)
Title | Image Difficulty Curriculum for Generative Adversarial Networks (CuGAN) |
Authors | Petru Soviany, Claudiu Ardei, Radu Tudor Ionescu, Marius Leordeanu |
Abstract | Despite the significant advances in recent years, Generative Adversarial Networks (GANs) are still notoriously hard to train. In this paper, we propose three novel curriculum learning strategies for training GANs. All strategies are first based on ranking the training images by their difficulty scores, which are estimated by a state-of-the-art image difficulty predictor. Our first strategy is to divide images into gradually more difficult batches. Our second strategy introduces a novel curriculum loss function for the discriminator that takes into account the difficulty scores of the real images. Our third strategy is based on sampling from an evolving distribution, which favors the easier images during the initial training stages and gradually converges to a uniform distribution, in which samples are equally likely, regardless of difficulty. We compare our curriculum learning strategies with the classic training procedure on two tasks: image generation and image translation. Our experiments indicate that all strategies provide faster convergence and superior results. For example, our best curriculum learning strategy applied on spectrally normalized GANs (SNGANs) fooled human annotators into thinking that generated CIFAR-like images are real in 25.0% of the presented cases, while the SNGANs trained using the classic procedure fooled the annotators in only 18.4% of cases. Similarly, in image translation, the human annotators preferred the images produced by the Cycle-consistent GAN (CycleGAN) trained using curriculum learning in 40.5% of cases and those produced by CycleGAN based on classic training in only 19.8% of cases, with the remaining 39.7% of cases labeled as ties. |
Tasks | Image Generation |
Published | 2019-10-20 |
URL | https://arxiv.org/abs/1910.08967v2 |
https://arxiv.org/pdf/1910.08967v2.pdf | |
PWC | https://paperswithcode.com/paper/image-difficulty-curriculum-for-generative |
Repo | https://github.com/pittyacg/CurriculumSNGAN |
Framework | none |
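A hedged sketch of the third strategy from the abstract: sample training images from a difficulty-aware distribution that favors easy images early and anneals to uniform. The exponential, temperature-based schedule used here is an assumption for illustration, not necessarily the authors' exact schedule.

```python
import numpy as np

def curriculum_sampling_probs(difficulty, step, total_steps, sharpness=5.0):
    """Sampling distribution over images: easy-biased early, uniform by the end.

    difficulty: array of per-image difficulty scores (higher = harder).
    """
    progress = min(step / total_steps, 1.0)
    temperature = sharpness * (1.0 - progress)          # decays to 0, i.e. a uniform distribution
    logits = -temperature * (difficulty - difficulty.mean())
    probs = np.exp(logits - logits.max())
    return probs / probs.sum()

rng = np.random.default_rng(0)
difficulty = rng.random(1000)
for step in (0, 5000, 10000):
    p = curriculum_sampling_probs(difficulty, step, total_steps=10000)
    batch_idx = rng.choice(len(difficulty), size=64, p=p)   # indices used to draw the real batch
```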
Interpretable Generative Neural Spatio-Temporal Point Processes
Title | Interpretable Generative Neural Spatio-Temporal Point Processes |
Authors | Shixiang Zhu, Shuang Li, Yao Xie |
Abstract | We present a novel generative model for spatio-temporal correlated discrete event data. Despite the rapid development of one-dimensional point processes for temporal event data, the study of how to model spatial aspects of such discrete event data is scarce. Our proposed Neural Embedding Spatio-Temporal (NEST) point process is a probabilistic generative model that captures complex spatial influence by carefully combining statistical models with flexible neural networks for spatial information embedding. NEST also enjoys computational efficiency, high interpretability, and strong expressive capacity for complex spatio-temporal dependencies. We present two computationally efficient approaches based on maximum likelihood and imitation learning, the latter being robust to model mismatch. Experiments based on real data show the superior performance of our method relative to the state-of-the-art. |
Tasks | Imitation Learning, Point Processes |
Published | 2019-06-13 |
URL | https://arxiv.org/abs/1906.05467v2 |
https://arxiv.org/pdf/1906.05467v2.pdf | |
PWC | https://paperswithcode.com/paper/reinforcement-learning-of-spatio-temporal |
Repo | https://github.com/meowoodie/Spatio-Temporal-Point-Process-with-Gaussian-Mixture-Diffusion-Kernel |
Framework | tf |
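A hedged NumPy sketch of a spatio-temporal self-exciting intensity in the spirit of NEST: each past event contributes an exponentially decaying temporal kernel times a Gaussian spatial kernel, and the log-likelihood sums log-intensities at events minus an (approximate) compensator. In the paper the kernel parameters come from neural embeddings of location; here they are fixed scalars, which is purely an assumption for illustration.

```python
import numpy as np

def intensity(t, s, events, mu=0.1, alpha=0.5, beta=1.0, sigma=0.2):
    """lambda(t, s | history): background + sum of space-time kernels over past events.

    events: array of shape (n, 3) with columns (t_i, x_i, y_i), sorted by time.
    """
    lam = mu
    for ti, xi, yi in events:
        if ti >= t:
            break
        dt = t - ti
        dist2 = (s[0] - xi) ** 2 + (s[1] - yi) ** 2
        spatial = np.exp(-dist2 / (2 * sigma ** 2)) / (2 * np.pi * sigma ** 2)
        lam += alpha * beta * np.exp(-beta * dt) * spatial
    return lam

def log_likelihood(events, T, mu=0.1, alpha=0.5, beta=1.0, sigma=0.2):
    """Sum of log-intensities at observed events minus the compensator over [0, T]."""
    ll = 0.0
    for i, (ti, xi, yi) in enumerate(events):
        ll += np.log(intensity(ti, (xi, yi), events[:i], mu, alpha, beta, sigma))
        # Each event's spatial kernel integrates to 1, so its compensator term is
        # alpha * (1 - exp(-beta * (T - t_i))).
        ll -= alpha * (1.0 - np.exp(-beta * (T - ti)))
    ll -= mu * T   # background term (per unit area; a simplifying assumption)
    return ll

events = np.array([[0.5, 0.1, 0.2], [1.2, 0.15, 0.25], [2.0, -0.3, 0.4]])
print(log_likelihood(events, T=3.0))
```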
COBRA: Data-Efficient Model-Based RL through Unsupervised Object Discovery and Curiosity-Driven Exploration
Title | COBRA: Data-Efficient Model-Based RL through Unsupervised Object Discovery and Curiosity-Driven Exploration |
Authors | Nicholas Watters, Loic Matthey, Matko Bosnjak, Christopher P. Burgess, Alexander Lerchner |
Abstract | Data efficiency and robustness to task-irrelevant perturbations are long-standing challenges for deep reinforcement learning algorithms. Here we introduce a modular approach to addressing these challenges in a continuous control environment, without using hand-crafted or supervised information. Our Curious Object-Based seaRch Agent (COBRA) uses task-free intrinsically motivated exploration and unsupervised learning to build object-based models of its environment and action space. Subsequently, it can learn a variety of tasks through model-based search in very few steps and excel on structured hold-out tests of policy robustness. |
Tasks | Continuous Control |
Published | 2019-05-22 |
URL | https://arxiv.org/abs/1905.09275v2 |
https://arxiv.org/pdf/1905.09275v2.pdf | |
PWC | https://paperswithcode.com/paper/cobra-data-efficient-model-based-rl-through |
Repo | https://github.com/deepmind/spriteworld |
Framework | none |
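COBRA's components (scene decomposition, a learned transition model, an exploration policy, and model-based search) are not released as a single implementation; the repo above is the Spriteworld environment. Below is a heavily hedged, generic sketch of only the last step, model-based search: score candidate actions with a learned transition model and reward predictor and act greedily. `transition_model`, `reward_model`, and `action_sampler` are stand-ins the reader would learn or define beforehand.

```python
import numpy as np

def model_based_search(state, transition_model, reward_model, action_sampler,
                       n_candidates=64, horizon=1, rng=None):
    """Short-horizon search: pick the candidate action whose imagined rollout under
    the learned model yields the highest predicted return."""
    if rng is None:
        rng = np.random.default_rng()
    best_action, best_return = None, -np.inf
    for _ in range(n_candidates):
        actions = [action_sampler(rng) for _ in range(horizon)]
        s, total = state, 0.0
        for a in actions:
            s = transition_model(s, a)          # imagined next (object-based) state
            total += reward_model(s)            # predicted reward of the imagined state
        if total > best_return:
            best_return, best_action = total, actions[0]
    return best_action
```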
VINE: Visualizing Statistical Interactions in Black Box Models
Title | VINE: Visualizing Statistical Interactions in Black Box Models |
Authors | Matthew Britton |
Abstract | As machine learning becomes more pervasive, there is an urgent need for interpretable explanations of predictive models. Prior work has developed effective methods for visualizing global model behavior, as well as generating local (instance-specific) explanations. However, relatively little work has addressed regional explanations - how groups of similar instances behave in a complex model, and the related issue of visualizing statistical feature interactions. The lack of utilities available for these analytical needs hinders the development of models that are mission-critical, transparent, and align with social goals. We present VINE (Visual INteraction Effects), a novel algorithm to extract and visualize statistical interaction effects in black box models. We also present a novel evaluation metric for visualizations in the interpretable ML space. |
Tasks | |
Published | 2019-04-01 |
URL | http://arxiv.org/abs/1904.00561v1 |
http://arxiv.org/pdf/1904.00561v1.pdf | |
PWC | https://paperswithcode.com/paper/vine-visualizing-statistical-interactions-in |
Repo | https://github.com/MattJBritton/VINE |
Framework | none |
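A hedged sketch of the core computation behind regional explanations of the kind VINE visualizes: compute individual conditional expectation (ICE) curves for one feature, then cluster the centered curves to surface groups of instances that respond differently, which is evidence of a feature interaction. The clustering choice (k-means) and the synthetic data are assumptions; VINE's actual algorithm and evaluation metric are described in the paper.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestRegressor

def ice_curves(model, X, feature, grid):
    """ICE: for every instance, sweep one feature over `grid` and record the prediction."""
    curves = np.empty((X.shape[0], len(grid)))
    for j, v in enumerate(grid):
        Xv = X.copy()
        Xv[:, feature] = v
        curves[:, j] = model.predict(Xv)
    return curves

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
y = X[:, 0] * (X[:, 1] > 0) + 0.1 * rng.normal(size=500)    # interaction between features 0 and 1
model = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)

grid = np.linspace(-2, 2, 20)
curves = ice_curves(model, X, feature=0, grid=grid)
centered = curves - curves[:, [0]]                           # center each curve at its left endpoint
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(centered)
# The two curve clusters here track the sign of feature 1: a regional interaction effect.
```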
Depth from Videos in the Wild: Unsupervised Monocular Depth Learning from Unknown Cameras
Title | Depth from Videos in the Wild: Unsupervised Monocular Depth Learning from Unknown Cameras |
Authors | Ariel Gordon, Hanhan Li, Rico Jonschkowski, Anelia Angelova |
Abstract | We present a novel method for simultaneous learning of depth, egomotion, object motion, and camera intrinsics from monocular videos, using only consistency across neighboring video frames as supervision signal. Similarly to prior work, our method learns by applying differentiable warping to frames and comparing the result to adjacent ones, but it provides several improvements: We address occlusions geometrically and differentiably, directly using the depth maps as predicted during training. We introduce randomized layer normalization, a novel powerful regularizer, and we account for object motion relative to the scene. To the best of our knowledge, our work is the first to learn the camera intrinsic parameters, including lens distortion, from video in an unsupervised manner, thereby allowing us to extract accurate depth and motion from arbitrary videos of unknown origin at scale. We evaluate our results on the Cityscapes, KITTI and EuRoC datasets, establishing new state of the art on depth prediction and odometry, and demonstrate qualitatively that depth prediction can be learned from a collection of YouTube videos. |
Tasks | Depth Estimation |
Published | 2019-04-10 |
URL | http://arxiv.org/abs/1904.04998v1 |
http://arxiv.org/pdf/1904.04998v1.pdf | |
PWC | https://paperswithcode.com/paper/depth-from-videos-in-the-wild-unsupervised |
Repo | https://github.com/robot-love/depth_from_video_in_the_wild |
Framework | tf |
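A hedged sketch of the view-synthesis supervision at the heart of the method: use predicted depth, camera intrinsics, and relative pose to reproject one frame onto another and compare photometrically. The geometric occlusion handling, learned lens distortion, and randomized layer normalization from the paper are omitted; shapes and the simple L1 loss are assumptions.

```python
import torch
import torch.nn.functional as F

def reprojection_loss(src, tgt, depth_tgt, K, R, t):
    """Warp `src` into the target view using target depth and relative pose (R, t),
    then take an L1 photometric loss against `tgt`."""
    b, _, h, w = tgt.shape
    ys, xs = torch.meshgrid(torch.arange(h, dtype=torch.float32),
                            torch.arange(w, dtype=torch.float32), indexing='ij')
    pix = torch.stack([xs, ys, torch.ones_like(xs)], dim=0).reshape(3, -1)   # homogeneous pixels
    # Back-project target pixels to 3-D, move them into the source frame, project again.
    cam = torch.linalg.inv(K) @ pix                                  # rays in the target camera
    pts = depth_tgt.reshape(b, 1, -1) * cam.unsqueeze(0)             # (b, 3, h*w) 3-D points
    proj = K @ (R @ pts + t.reshape(b, 3, 1))                        # into source pixel coordinates
    u = proj[:, 0] / proj[:, 2].clamp(min=1e-6)
    v = proj[:, 1] / proj[:, 2].clamp(min=1e-6)
    grid = torch.stack([2 * u / (w - 1) - 1, 2 * v / (h - 1) - 1], dim=-1).reshape(b, h, w, 2)
    warped = F.grid_sample(src, grid, align_corners=True)            # src resampled into target view
    return (warped - tgt).abs().mean()

K = torch.eye(3); K[0, 0] = K[1, 1] = 100.0; K[0, 2] = 32.0; K[1, 2] = 32.0
loss = reprojection_loss(torch.rand(1, 3, 64, 64), torch.rand(1, 3, 64, 64),
                         torch.rand(1, 1, 64, 64) + 1.0, K,
                         torch.eye(3).unsqueeze(0), torch.zeros(1, 3))
```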
Formal derivation of Mesh Neural Networks with their Forward-Only gradient Propagation
Title | Formal derivation of Mesh Neural Networks with their Forward-Only gradient Propagation |
Authors | Federico A. Galatolo, Mario G. C. A. Cimino, Gigliola Vaglini |
Abstract | This paper proposes the Mesh Neural Network (MNN), a novel architecture which allows neurons to be connected in any topology in order to efficiently route information. In MNNs, information is propagated between neurons through a state transition function. State and error gradients are then directly computed from state updates without backward computation. The MNN architecture and the error propagation schema are formalized and derived in tensor algebra. The proposed computational model can fully supply a gradient descent process and is suitable for very large scale NNs, owing to its expressivity and training efficiency with respect to NNs based on back-propagation and computational graphs. |
Tasks | |
Published | 2019-05-16 |
URL | https://arxiv.org/abs/1905.06684v3 |
https://arxiv.org/pdf/1905.06684v3.pdf | |
PWC | https://paperswithcode.com/paper/formal-derivation-of-mesh-neural-networks |
Repo | https://github.com/galatolofederico/mesh-neural-networks |
Framework | none |
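A hedged sketch of the forward dynamics only: an MNN keeps one state vector over all neurons (inputs, hidden, and outputs alike) and repeatedly applies a state-transition function through an arbitrary weighted adjacency matrix. The activation, clamping scheme, and sparsity pattern are assumptions, and the forward-only gradient computation derived in the paper is not reproduced here.

```python
import numpy as np

def mnn_forward(adjacency, x_in, n_in, n_out, steps=5):
    """Run a mesh neural network forward: state[t+1] = act(W @ state[t]),
    with input neurons clamped to x_in and outputs read from the last neurons.

    adjacency: (n, n) weight matrix over all neurons; any topology is allowed.
    """
    state = np.zeros(adjacency.shape[0])
    for _ in range(steps):
        state[:n_in] = x_in                       # keep input neurons clamped
        state = np.tanh(adjacency @ state)        # state-transition function
    return state[-n_out:]                         # read the output neurons

rng = np.random.default_rng(0)
W = rng.normal(scale=0.3, size=(10, 10)) * (rng.random((10, 10)) < 0.4)  # sparse, arbitrary topology
print(mnn_forward(W, x_in=np.array([1.0, -0.5]), n_in=2, n_out=1))
```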
Probabilistic Formulation of the Take The Best Heuristic
Title | Probabilistic Formulation of the Take The Best Heuristic |
Authors | Tomi Peltola, Jussi Jokinen, Samuel Kaski |
Abstract | The framework of cognitively bounded rationality treats problem solving as fundamentally rational, but emphasises that it is constrained by cognitive architecture and the task environment. This paper investigates a simple decision making heuristic, Take The Best (TTB), within that framework. We formulate TTB as a likelihood-based probabilistic model, where the decision strategy arises by probabilistic inference based on the training data and the model constraints. The strengths of the probabilistic formulation, in addition to providing a bounded rational account of the learning of the heuristic, include natural extensibility with additional cognitively plausible constraints and prior information, and the possibility to embed the heuristic as a subpart of a larger probabilistic model. We extend the model to learn cue discrimination thresholds for continuous-valued cues and experiment with using the model to account for biased preference feedback from a boundedly rational agent in a simulated interactive machine learning task. |
Tasks | Decision Making |
Published | 2019-11-01 |
URL | https://arxiv.org/abs/1911.00572v1 |
https://arxiv.org/pdf/1911.00572v1.pdf | |
PWC | https://paperswithcode.com/paper/probabilistic-formulation-of-the-take-the |
Repo | https://github.com/to-mi/pttb |
Framework | none |
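For reference, a hedged sketch of the classic, deterministic Take The Best heuristic that the paper recasts probabilistically: order cues by validity and decide with the first cue that discriminates between the two options. The probabilistic, likelihood-based formulation itself is in the repo above; the cue names and validities below are made up for illustration.

```python
def take_the_best(option_a, option_b, cue_validities):
    """Return 'A', 'B', or 'guess' using the first discriminating cue.

    option_a, option_b: dicts mapping cue name -> binary cue value (1 = present).
    cue_validities:     dict mapping cue name -> validity in (0.5, 1.0].
    """
    for cue in sorted(cue_validities, key=cue_validities.get, reverse=True):
        a, b = option_a.get(cue, 0), option_b.get(cue, 0)
        if a != b:                       # this cue discriminates: take the best, ignore the rest
            return 'A' if a > b else 'B'
    return 'guess'                       # no cue discriminates

validities = {'capital': 0.9, 'airport': 0.8, 'team': 0.7}
print(take_the_best({'capital': 1, 'team': 0}, {'capital': 0, 'airport': 1}, validities))  # -> 'A'
```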
Object-Contextual Representations for Semantic Segmentation
Title | Object-Contextual Representations for Semantic Segmentation |
Authors | Yuhui Yuan, Xilin Chen, Jingdong Wang |
Abstract | In this paper, we address the semantic segmentation problem with a focus on the context aggregation strategy. Our motivation is that the label of a pixel is the category of the object that the pixel belongs to. We present a simple yet effective approach, object-contextual representations, characterizing a pixel by exploiting the representation of the corresponding object class. First, we learn object regions under the supervision of the ground-truth segmentation. Second, we compute the object region representation by aggregating the representations of the pixels lying in the object region. Last, we compute the relation between each pixel and each object region, and augment the representation of each pixel with the object-contextual representation, which is a weighted aggregation of all the object region representations according to their relations with the pixel. We empirically demonstrate that the proposed approach achieves competitive performance on various challenging semantic segmentation benchmarks: Cityscapes, ADE20K, LIP, PASCAL-Context, and COCO-Stuff. |
Tasks | Semantic Segmentation |
Published | 2019-09-24 |
URL | https://arxiv.org/abs/1909.11065v2 |
https://arxiv.org/pdf/1909.11065v2.pdf | |
PWC | https://paperswithcode.com/paper/object-contextual-representations-for |
Repo | https://github.com/PkuRainBow/OCNet |
Framework | pytorch |
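A hedged sketch of the aggregation described in the abstract: soft object-region representations are computed as mask-weighted sums of pixel features, each pixel attends to those region representations, and the resulting object-contextual vector is concatenated back onto the pixel feature. OCR's projection layers and exact dimensions are omitted, and the scaled-softmax relation is an assumption for illustration.

```python
import torch
import torch.nn.functional as F

def object_contextual_representations(pixel_feats, region_logits):
    """pixel_feats: (B, C, H, W) backbone features.
    region_logits: (B, K, H, W) coarse per-pixel scores for K object regions."""
    b, c, h, w = pixel_feats.shape
    feats = pixel_feats.flatten(2)                               # (B, C, HW)
    masks = F.softmax(region_logits.flatten(2), dim=-1)          # soft assignment of pixels to regions
    # 1) Object region representations: mask-weighted aggregation of pixel features.
    regions = torch.einsum('bkn,bcn->bkc', masks, feats)         # (B, K, C)
    # 2) Pixel-region relation: similarity between each pixel and each region representation.
    relation = F.softmax(torch.einsum('bcn,bkc->bnk', feats, regions) / c ** 0.5, dim=-1)
    # 3) Object-contextual representation: relation-weighted sum of region representations.
    context = torch.einsum('bnk,bkc->bcn', relation, regions)    # (B, C, HW)
    return torch.cat([feats, context], dim=1).view(b, 2 * c, h, w)  # fed to the final classifier

out = object_contextual_representations(torch.randn(2, 64, 32, 32), torch.randn(2, 19, 32, 32))
```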