January 28, 2020

Paper Group ANR 947

Learning In Practice: Reasoning About Quantization


Title	Learning In Practice: Reasoning About Quantization
Authors	Annie Cherkaev, Waiming Tai, Jeff Phillips, Vivek Srikumar
Abstract	There is a mismatch between the standard theoretical analyses of statistical machine learning and how learning is used in practice. The foundational assumption supporting the theory is that we can represent features and models using real-valued parameters. In practice, however, we do not use real numbers at any point during training or deployment. Instead, we rely on discrete and finite quantizations of the reals, typically floating-point numbers. In this paper, we propose a framework for reasoning about learning under arbitrary quantizations. Using this formalization, we prove the convergence of quantization-aware versions of the Perceptron and Frank-Wolfe algorithms. Finally, we report the results of an extensive empirical study of the impact of quantization using a broad spectrum of datasets.
Tasks	Quantization
Published	2019-05-27
URL	https://arxiv.org/abs/1905.11478v1
PDF	https://arxiv.org/pdf/1905.11478v1.pdf
PWC	https://paperswithcode.com/paper/learning-in-practice-reasoning-about
Repo
Framework
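
As a rough illustration of what a quantization-aware Perceptron could look like, the sketch below rounds every weight back onto a uniform grid after each update. This is an assumption made for illustration only: the abstract does not specify the paper's quantization scheme, and the `quantize` helper, the grid step of 0.25, and the toy data are all hypothetical.

```python
def quantize(x, step=0.25):
    """Round x to the nearest multiple of `step` (a hypothetical uniform grid)."""
    return round(x / step) * step


def quantized_perceptron(samples, labels, step=0.25, epochs=20):
    """Perceptron whose weights are re-quantized after every update."""
    w = [0.0] * len(samples[0])
    for _ in range(epochs):
        mistakes = 0
        for x, y in zip(samples, labels):
            score = sum(wi * xi for wi, xi in zip(w, x))
            if y * score <= 0:  # misclassified, or exactly on the boundary
                # Standard Perceptron update, then snap back onto the grid.
                w = [quantize(wi + y * xi, step) for wi, xi in zip(w, x)]
                mistakes += 1
        if mistakes == 0:  # a full pass without errors: stop early
            break
    return w


# Toy linearly separable data (hypothetical), labels in {-1, +1}.
samples = [(1.0, 1.0), (2.0, 0.5), (-1.0, -1.0), (-0.5, -2.0)]
labels = [1, 1, -1, -1]
weights = quantized_perceptron(samples, labels)
```

On this toy data the loop stops as soon as a full pass makes no mistakes; with a coarser grid or less cleanly separated data, rounding can undo small updates, which is the kind of effect a convergence analysis under quantization has to account for.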

Inefficiency of K-FAC for Large Batch Size Training


Title	Inefficiency of K-FAC for Large Batch Size Training
Authors	Linjian Ma, Gabe Montague, Jiayu Ye, Zhewei Yao, Amir Gholami, Kurt Keutzer, Michael W. Mahoney
Abstract	In stochastic optimization, using large batch sizes during training can leverage parallel resources to produce faster wall-clock training times per training epoch. However, for both training loss and testing error, recent results analyzing large batch Stochastic Gradient Descent (SGD) have found sharp diminishing returns beyond a certain critical batch size. In the hopes of addressing this, it has been suggested that the Kronecker-Factored Approximate Curvature (K-FAC) method allows for greater scalability to large batch sizes, for non-convex machine learning problems such as neural network optimization, as well as greater robustness to variation in model hyperparameters. Here, we perform a detailed empirical analysis of large batch size training for both K-FAC and SGD, evaluating performance in terms of both wall-clock time and aggregate computational cost. Our main results are twofold: first, we find that neither K-FAC nor SGD has ideal scalability behavior beyond a certain batch size, and that K-FAC does not exhibit improved large-batch scalability behavior compared to SGD; and second, we find that K-FAC, in addition to requiring more hyperparameters to tune, suffers from hyperparameter sensitivity similar to that of SGD. We discuss extensive results using ResNet and AlexNet on CIFAR-10 and SVHN, respectively, as well as more general implications of our findings.
Tasks	Stochastic Optimization
Published	2019-03-14
URL	https://arxiv.org/abs/1903.06237v3
PDF	https://arxiv.org/pdf/1903.06237v3.pdf
PWC	https://paperswithcode.com/paper/inefficiency-of-k-fac-for-large-batch-size
Repo
Framework

Graph Representation learning for Audio & Music genre Classification


Title	Graph Representation learning for Audio & Music genre Classification
Authors	Shubham Dokania, Vasudev Singh
Abstract	Music genre is arguably one of the most important and discriminative pieces of information for music and audio content. Visual representation based approaches have been explored on spectrograms for music genre classification. However, a lack of quality data and augmentation techniques makes it difficult to employ deep learning techniques successfully. We discuss the application of graph neural networks to this task due to their strong inductive bias, and show that a combination of CNN and GNN is able to achieve state-of-the-art results on the GTZAN and AudioSet (Imbalanced Music) datasets. We also discuss the role of Siamese Neural Networks as an analogue to GNNs for learning edge similarity weights. Furthermore, we perform visual analysis to understand the field-of-view of our model into the spectrogram based on genre labels.
Tasks	Graph Representation Learning, Representation Learning
Published	2019-10-23
URL	https://arxiv.org/abs/1910.11117v1
PDF	https://arxiv.org/pdf/1910.11117v1.pdf
PWC	https://paperswithcode.com/paper/graph-representation-learning-for-audio-music
Repo
Framework

Solving Rubik’s Cube with a Robot Hand


Title	Solving Rubik’s Cube with a Robot Hand
Authors	OpenAI, Ilge Akkaya, Marcin Andrychowicz, Maciek Chociej, Mateusz Litwin, Bob McGrew, Arthur Petron, Alex Paino, Matthias Plappert, Glenn Powell, Raphael Ribas, Jonas Schneider, Nikolas Tezak, Jerry Tworek, Peter Welinder, Lilian Weng, Qiming Yuan, Wojciech Zaremba, Lei Zhang
Abstract	We demonstrate that models trained only in simulation can be used to solve a manipulation problem of unprecedented complexity on a real robot. This is made possible by two key components: a novel algorithm, which we call automatic domain randomization (ADR) and a robot platform built for machine learning. ADR automatically generates a distribution over randomized environments of ever-increasing difficulty. Control policies and vision state estimators trained with ADR exhibit vastly improved sim2real transfer. For control policies, memory-augmented models trained on an ADR-generated distribution of environments show clear signs of emergent meta-learning at test time. The combination of ADR with our custom robot platform allows us to solve a Rubik’s cube with a humanoid robot hand, which involves both control and state estimation problems. Videos summarizing our results are available: https://openai.com/blog/solving-rubiks-cube/
Tasks	Meta-Learning
Published	2019-10-16
URL	https://arxiv.org/abs/1910.07113v1
PDF	https://arxiv.org/pdf/1910.07113v1.pdf
PWC	https://paperswithcode.com/paper/solving-rubiks-cube-with-a-robot-hand
Repo
Framework

Physics Enhanced Artificial Intelligence


Title	Physics Enhanced Artificial Intelligence
Authors	Patrick O’Driscoll, Jaehoon Lee, Bo Fu
Abstract	We propose that intelligently combining models from the domains of Artificial Intelligence or Machine Learning with Physical and Expert models will yield a more “trustworthy” model than any one model from a single domain, given a complex and narrow enough problem. Based on mean-variance portfolio theory and bias-variance trade-off analysis, we prove that combining models from various domains produces a model that has lower risk, increasing user trust. We call such combined models physics enhanced artificial intelligence (PEAI), and suggest use cases for PEAI.
Tasks
Published	2019-03-11
URL	http://arxiv.org/abs/1903.04442v1
PDF	http://arxiv.org/pdf/1903.04442v1.pdf
PWC	https://paperswithcode.com/paper/physics-enhanced-artificial-intelligence
Repo
Framework

On the approximation of rough functions with deep neural networks


Title	On the approximation of rough functions with deep neural networks
Authors	Tim De Ryck, Siddhartha Mishra, Deep Ray
Abstract	Deep neural networks and the ENO procedure are both efficient frameworks for approximating rough functions. We prove that at any order, the ENO interpolation procedure can be cast as a deep ReLU neural network. This surprising fact enables the transfer of several desirable properties of the ENO procedure to deep neural networks, including its high-order accuracy at approximating Lipschitz functions. Numerical tests for the resulting neural networks show excellent performance for approximating solutions of nonlinear conservation laws and at data compression.
Tasks
Published	2019-12-13
URL	https://arxiv.org/abs/1912.06732v1
PDF	https://arxiv.org/pdf/1912.06732v1.pdf
PWC	https://paperswithcode.com/paper/on-the-approximation-of-rough-functions-with
Repo
Framework

Leveraging Multimodal Haptic Sensory Data for Robust Cutting


Title	Leveraging Multimodal Haptic Sensory Data for Robust Cutting
Authors	Kevin Zhang, Mohit Sharma, Manuela Veloso, Oliver Kroemer
Abstract	Cutting is a common form of manipulation when working with divisible objects such as food, rope, or clay. Cooking in particular relies heavily on cutting to divide food items into desired shapes. However, cutting food is a challenging task due to the wide range of material properties exhibited by food items. Due to this variability, the same cutting motions cannot be used for all food items. Sensations from contact events, e.g., when placing the knife on the food item, will also vary depending on the material properties, and the robot will need to adapt accordingly. In this paper, we propose using vibrations and force-torque feedback from the interactions to adapt the slicing motions and monitor for contact events. The robot learns neural networks for performing each of these tasks and generalizing across different material properties. By adapting and monitoring the skill executions, the robot is able to reliably cut through more than 20 different types of food items and even detect whether certain food items are fresh or old.
Tasks
Published	2019-09-27
URL	https://arxiv.org/abs/1909.12460v1
PDF	https://arxiv.org/pdf/1909.12460v1.pdf
PWC	https://paperswithcode.com/paper/leveraging-multimodal-haptic-sensory-data-for
Repo
Framework

Active Learning with Siamese Twins for Sequence Tagging


Title	Active Learning with Siamese Twins for Sequence Tagging
Authors	Rishi Hazra, Shubham Gupta, Ambedkar Dukkipati
Abstract	Deep learning, in general, and natural language processing methods, in particular, rely heavily on annotated samples to achieve good performance. However, manually annotating data is expensive and time consuming. Active Learning (AL) strategies reduce the need for huge volumes of labelled data by iteratively selecting a small number of examples for manual annotation based on their estimated utility in training the given model. In this paper, we argue that since AL strategies choose examples independently, they may potentially select similar examples, not all of which aid the learning process. We propose a method, referred to as Active² Learning (A²L), that actively adapts to the sequence tagging model being trained, to further eliminate such redundant examples chosen by an AL strategy. We empirically demonstrate that A²L improves the performance of state-of-the-art AL strategies on different sequence tagging tasks. Furthermore, we show that A²L is widely applicable by using it in conjunction with different AL strategies and sequence tagging models. We demonstrate that the proposed A²L is able to reach the full-data F-score with approximately 2-16% less data compared to state-of-the-art AL strategies on different sequence tagging datasets.
Tasks	Active Learning
Published	2019-11-01
URL	https://arxiv.org/abs/1911.00234v1
PDF	https://arxiv.org/pdf/1911.00234v1.pdf
PWC	https://paperswithcode.com/paper/active-learning-with-siamese-twins-for
Repo
Framework

Residual Bi-Fusion Feature Pyramid Network for Accurate Single-shot Object Detection


Title	Residual Bi-Fusion Feature Pyramid Network for Accurate Single-shot Object Detection
Authors	Ping-Yang Chen, Jun-Wei Hsieh, Chien-Yao Wang, Hong-Yuan Mark Liao, Munkhjargal Gochoo
Abstract	State-of-the-art (SoTA) models have improved the accuracy of object detection by a large margin via a feature pyramid (FP). FP is a top-down aggregation that collects semantically strong features to improve scale invariance in both two-stage and one-stage detectors. However, this top-down pathway cannot preserve accurate object positions due to the shift-effect of pooling. Thus, the advantage of FP in improving detection accuracy disappears when more layers are used. The original FP lacks a bottom-up pathway to offset the lost information from lower-layer feature maps. It performs well in large-sized object detection but poorly in small-sized object detection. A new structure, the “residual feature pyramid”, is proposed in this paper. It is bidirectional, fusing both deep and shallow features for more effective and robust detection of both small-sized and large-sized objects. Due to its “residual” nature, it can be trained and integrated into different backbones (even deeper or lighter ones) more easily than other bi-directional methods. One important property of this residual FP is that accuracy improvement is still found even if more layers are adopted. Extensive experiments on the VOC and MS COCO datasets showed that the proposed method achieved SoTA results for highly accurate and efficient object detection.
Tasks	Object Detection
Published	2019-11-27
URL	https://arxiv.org/abs/1911.12051v2
PDF	https://arxiv.org/pdf/1911.12051v2.pdf
PWC	https://paperswithcode.com/paper/residual-bi-fusion-feature-pyramid-network
Repo
Framework

Challenges and Prospects in Vision and Language Research


Title	Challenges and Prospects in Vision and Language Research
Authors	Kushal Kafle, Robik Shrestha, Christopher Kanan
Abstract	Language grounded image understanding tasks have often been proposed as a method for evaluating progress in artificial intelligence. Ideally, these tasks should test a plethora of capabilities that integrate computer vision, reasoning, and natural language understanding. However, rather than behaving as visual Turing tests, recent studies have demonstrated state-of-the-art systems are achieving good performance through flaws in datasets and evaluation procedures. We review the current state of affairs and outline a path forward.
Tasks
Published	2019-04-19
URL	https://arxiv.org/abs/1904.09317v2
PDF	https://arxiv.org/pdf/1904.09317v2.pdf
PWC	https://paperswithcode.com/paper/190409317
Repo
Framework

FrameNet: Learning Local Canonical Frames of 3D Surfaces from a Single RGB Image


Title	FrameNet: Learning Local Canonical Frames of 3D Surfaces from a Single RGB Image
Authors	Jingwei Huang, Yichao Zhou, Thomas Funkhouser, Leonidas Guibas
Abstract	In this work, we introduce the novel problem of identifying dense canonical 3D coordinate frames from a single RGB image. We observe that each pixel in an image corresponds to a surface in the underlying 3D geometry, where a canonical frame can be identified as represented by three orthogonal axes, one along its normal direction and two in its tangent plane. We propose an algorithm to predict these axes from RGB. Our first insight is that canonical frames computed automatically with recently introduced direction field synthesis methods can provide training data for the task. Our second insight is that networks designed for surface normal prediction provide better results when trained jointly to predict canonical frames, and even better when trained to also predict 2D projections of canonical frames. We conjecture this is because projections of canonical tangent directions often align with local gradients in images, and because those directions are tightly linked to 3D canonical frames through projective geometry and orthogonality constraints. In our experiments, we find that our method predicts 3D canonical frames that can be used in applications ranging from surface normal estimation and feature matching to augmented reality.
Tasks
Published	2019-03-29
URL	http://arxiv.org/abs/1903.12305v1
PDF	http://arxiv.org/pdf/1903.12305v1.pdf
PWC	https://paperswithcode.com/paper/framenet-learning-local-canonical-frames-of
Repo
Framework

Measuring the Transferability of Adversarial Examples


Title	Measuring the Transferability of Adversarial Examples
Authors	Deyan Petrov, Timothy M. Hospedales
Abstract	Adversarial examples are of wide concern due to their impact on the reliability of contemporary machine learning systems. Effective adversarial examples are mostly found via white-box attacks. However, in some cases they can be transferred across models, thus enabling them to attack black-box models. In this work we evaluate the transferability of three adversarial attacks (the Fast Gradient Sign Method, the Basic Iterative Method, and the Carlini & Wagner method) across two classes of models: the VGG class (using VGG16, VGG19 and an ensemble of VGG16 and VGG19), and the Inception class (Inception V3, Xception, Inception Resnet V2, and an ensemble of the three). We also outline the problems with the assessment of transferability in the current body of research and attempt to amend them by picking specific “strong” parameters for the attacks, and by using an L-infinity clipping technique and the SSIM metric for the final evaluation of the attack transferability.
Tasks
Published	2019-07-14
URL	https://arxiv.org/abs/1907.06291v1
PDF	https://arxiv.org/pdf/1907.06291v1.pdf
PWC	https://paperswithcode.com/paper/measuring-the-transferability-of-adversarial
Repo
Framework
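
To make two of the attacks named in the abstract concrete, here is a minimal sketch of the Fast Gradient Sign Method and the Basic Iterative Method with L-infinity clipping. It runs against a hypothetical two-feature logistic regression rather than the VGG or Inception models evaluated in the paper, and the weights, step sizes, and function names are assumptions chosen for illustration. The Carlini & Wagner method and the SSIM evaluation are not shown.

```python
import math


def sign(v):
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def input_gradient(w, b, x, y):
    """Gradient of the logistic loss w.r.t. the input x, for label y in {0, 1}."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    p = 1.0 / (1.0 + math.exp(-z))  # predicted probability of class 1
    return [(p - y) * wi for wi in w]


def fgsm(w, b, x, y, eps):
    """One signed-gradient step of size eps that increases the loss."""
    g = input_gradient(w, b, x, y)
    return [xi + eps * sign(gi) for xi, gi in zip(x, g)]


def basic_iterative(w, b, x, y, eps, alpha, steps):
    """Repeated small FGSM steps, clipped back into the L-infinity ball around x."""
    x_adv = list(x)
    for _ in range(steps):
        x_adv = fgsm(w, b, x_adv, y, alpha)
        # L-infinity clipping: no coordinate may move more than eps from x.
        x_adv = [min(max(a, xi - eps), xi + eps) for a, xi in zip(x_adv, x)]
    return x_adv


# Hypothetical model and input, used only to exercise the functions.
w, b = [2.0, -1.0], 0.0
x, y = [0.5, 0.2], 1
x_fgsm = fgsm(w, b, x, y, eps=0.1)
x_bim = basic_iterative(w, b, x, y, eps=0.1, alpha=0.04, steps=5)
```

Both perturbed inputs stay within 0.1 of the original in every coordinate, and both lower the model's score for the true class. In a transferability study, an input crafted against one model in this way would then be fed to a different model.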

Deep learning for seismic phase detection and picking in the aftershock zone of 2008 Mw7.9 Wenchuan earthquake


Title	Deep learning for seismic phase detection and picking in the aftershock zone of 2008 Mw7.9 Wenchuan earthquake
Authors	Lijun Zhu, Zhigang Peng, James McClellan, Chenyu Li, Dongdong Yao, Zefeng Li, Lihua Fang
Abstract	The increasing volume of seismic data from long-term continuous monitoring motivates the development of algorithms based on convolutional neural networks (CNNs) for faster and more reliable phase detection and picking. However, many less studied regions lack the significant number of labeled events needed for traditional CNN approaches. In this paper, we present a CNN-based Phase-Identification Classifier (CPIC) designed for phase detection and picking on small to medium sized training datasets. When trained on 30,146 labeled phases and applied to one month of continuous recordings during the aftershock sequences of the 2008 Mw 7.9 Wenchuan Earthquake in Sichuan, China, CPIC detects 97.5% of the manually picked phases in the standard catalog and predicts their arrival times with a five-times improvement over the ObsPy AR picker. In addition, unlike other CNN-based approaches that require millions of training samples, when the off-line training set size of CPIC is reduced to only a few thousand training samples the accuracy stays above 95%. The online implementation of CPIC takes less than 12 hours to pick arrivals in 31-day recordings on 14 stations. In addition to the catalog phases manually picked by analysts, CPIC finds more phases for existing events and new events missed in the catalog. Among those additional detections, some are confirmed by a matched filter method while others require further investigation. Finally, when tested on a small dataset from a different region (Oklahoma, US), CPIC achieves 97% accuracy after fine-tuning only the fully connected layer of the model. This result suggests that the CPIC developed in this study can be used to identify and pick P/S arrivals in other regions with few or no labeled phases.
Tasks
Published	2019-01-18
URL	http://arxiv.org/abs/1901.06396v2
PDF	http://arxiv.org/pdf/1901.06396v2.pdf
PWC	https://paperswithcode.com/paper/deep-learning-for-seismic-phase-detection-and
Repo
Framework

CrackGAN: A Labor-Light Crack Detection Approach Using Industrial Pavement Images Based on Generative Adversarial Learning


Title	CrackGAN: A Labor-Light Crack Detection Approach Using Industrial Pavement Images Based on Generative Adversarial Learning
Authors	Kaige Zhang, Yingtao Zhang, Heng-Da Cheng
Abstract	Fully convolutional networks are a powerful tool for per-pixel semantic segmentation/detection. However, they are problematic when applied to crack detection on industrial pavement images: the network may easily “converge” to a state that treats all pixels as background (BG) and still achieves a very good loss, the so-called “All Black” phenomenon, due to data imbalance and the unavailability of accurate ground truths (GTs). To tackle this problem, we introduce crack-patch-only (CPO) supervision and generative adversarial learning for end-to-end training, which forces the network to always produce crack-GT images while preserving both crack and BG-image translation abilities by feeding a larger-size crack image into an asymmetric U-shape generator to overcome the “All Black” issue. The proposed approach is validated on four crack datasets and achieves state-of-the-art performance compared with recently published works in efficiency and accuracy.
Tasks	Semantic Segmentation
Published	2019-09-18
URL	https://arxiv.org/abs/1909.08216v1
PDF	https://arxiv.org/pdf/1909.08216v1.pdf
PWC	https://paperswithcode.com/paper/crackgan-a-labor-light-crack-detection
Repo
Framework

Highly-scalable, physics-informed GANs for learning solutions of stochastic PDEs


Title	Highly-scalable, physics-informed GANs for learning solutions of stochastic PDEs
Authors	Liu Yang, Sean Treichler, Thorsten Kurth, Keno Fischer, David Barajas-Solano, Josh Romero, Valentin Churavy, Alexandre Tartakovsky, Michael Houston, Prabhat, George Karniadakis
Abstract	Uncertainty quantification for forward and inverse problems is a central challenge across physical and biomedical disciplines. We address this challenge for the problem of modeling subsurface flow at the Hanford Site by combining stochastic computational models with observational data using physics-informed GAN models. The geographic extent, spatial heterogeneity, and multiple correlation length scales of the Hanford Site require training a computationally intensive GAN model to thousands of dimensions. We develop a hierarchical scheme for exploiting domain parallelism, map discriminators and generators to multiple GPUs, and employ efficient communication schemes to ensure training stability and convergence. We developed a highly optimized implementation of this scheme that scales to 27,500 NVIDIA Volta GPUs and 4584 nodes on the Summit supercomputer with a 93.1% scaling efficiency, achieving peak and sustained half-precision rates of 1228 PF/s and 1207 PF/s.
Tasks
Published	2019-10-29
URL	https://arxiv.org/abs/1910.13444v1
PDF	https://arxiv.org/pdf/1910.13444v1.pdf
PWC	https://paperswithcode.com/paper/highly-scalable-physics-informed-gans-for
Repo
Framework