Paper Group ANR 947
Learning In Practice: Reasoning About Quantization. Inefficiency of K-FAC for Large Batch Size Training. Graph Representation learning for Audio & Music genre Classification. Solving Rubik’s Cube with a Robot Hand. Physics Enhanced Artificial Intelligence. On the approximation of rough functions with deep neural networks. Leveraging Multimodal Hapt …
Learning In Practice: Reasoning About Quantization
Title | Learning In Practice: Reasoning About Quantization |
Authors | Annie Cherkaev, Waiming Tai, Jeff Phillips, Vivek Srikumar |
Abstract | There is a mismatch between the standard theoretical analyses of statistical machine learning and how learning is used in practice. The foundational assumption supporting the theory is that we can represent features and models using real-valued parameters. In practice, however, we do not use real numbers at any point during training or deployment. Instead, we rely on discrete and finite quantizations of the reals, typically floating points. In this paper, we propose a framework for reasoning about learning under arbitrary quantizations. Using this formalization, we prove the convergence of quantization-aware versions of the Perceptron and Frank-Wolfe algorithms. Finally, we report the results of an extensive empirical study of the impact of quantization using a broad spectrum of datasets. |
Tasks | Quantization |
Published | 2019-05-27 |
URL | https://arxiv.org/abs/1905.11478v1 |
https://arxiv.org/pdf/1905.11478v1.pdf | |
PWC | https://paperswithcode.com/paper/learning-in-practice-reasoning-about |
Repo | |
Framework | |
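To make the quantized-learning setting concrete, below is a minimal sketch (not the paper's formal algorithm) of a quantization-aware Perceptron: after every mistake-driven update, the weight vector is rounded back onto a finite fixed-point grid. The step size, clipping range, and helper names are illustrative assumptions.

```python
import numpy as np

def quantize(w, step=2**-8, clip=8.0):
    """Round each weight onto a fixed-point grid (hypothetical quantization scheme)."""
    return np.clip(np.round(w / step) * step, -clip, clip)

def quantized_perceptron(X, y, epochs=50, step=2**-8):
    """Perceptron whose weight vector lives on a finite grid after every update."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        mistakes = 0
        for xi, yi in zip(X, y):          # labels y in {-1, +1}
            if yi * np.dot(w, xi) <= 0:   # misclassified (or on the boundary)
                w = quantize(w + yi * xi, step)
                mistakes += 1
        if mistakes == 0:
            break
    return w
```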
Inefficiency of K-FAC for Large Batch Size Training
Title | Inefficiency of K-FAC for Large Batch Size Training |
Authors | Linjian Ma, Gabe Montague, Jiayu Ye, Zhewei Yao, Amir Gholami, Kurt Keutzer, Michael W. Mahoney |
Abstract | In stochastic optimization, using large batch sizes during training can leverage parallel resources to produce faster wall-clock training times per training epoch. However, for both training loss and testing error, recent results analyzing large batch Stochastic Gradient Descent (SGD) have found sharp diminishing returns beyond a certain critical batch size. In the hopes of addressing this, it has been suggested that the Kronecker-Factored Approximate Curvature (K-FAC) method allows for greater scalability to large batch sizes for non-convex machine learning problems such as neural network optimization, as well as greater robustness to variation in model hyperparameters. Here, we perform a detailed empirical analysis of large batch size training for both K-FAC and SGD, evaluating performance in terms of both wall-clock time and aggregate computational cost. Our main results are twofold: first, we find that neither K-FAC nor SGD has ideal scalability behavior beyond a certain batch size, and that K-FAC does not exhibit improved large-batch scalability compared to SGD; and second, we find that K-FAC, in addition to requiring more hyperparameters to tune, suffers from similar hyperparameter sensitivity as SGD. We discuss extensive results using ResNet and AlexNet on CIFAR-10 and SVHN, respectively, as well as more general implications of our findings. |
Tasks | Stochastic Optimization |
Published | 2019-03-14 |
URL | https://arxiv.org/abs/1903.06237v3 |
https://arxiv.org/pdf/1903.06237v3.pdf | |
PWC | https://paperswithcode.com/paper/inefficiency-of-k-fac-for-large-batch-size |
Repo | |
Framework | |
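For reference, a rough numpy sketch of a single K-FAC-style preconditioned step for one fully connected layer, following the standard Kronecker-factored approximation of the layer Fisher (A = E[a aᵀ], G = E[g gᵀ]). Damping, learning rate, and function names are illustrative, and this omits the amortized inverse updates a practical implementation would use.

```python
import numpy as np

def kfac_layer_update(W, grad_W, acts, grads_out, lr=0.1, damping=1e-2):
    """One K-FAC-style preconditioned step for a fully connected layer.

    acts:      (batch, d_in)  layer inputs a
    grads_out: (batch, d_out) back-propagated gradients g w.r.t. pre-activations
    The layer Fisher is approximated by the Kronecker product of A = E[a aᵀ] and G = E[g gᵀ].
    """
    n = acts.shape[0]
    A = acts.T @ acts / n                         # (d_in, d_in)
    G = grads_out.T @ grads_out / n               # (d_out, d_out)
    A_inv = np.linalg.inv(A + damping * np.eye(A.shape[0]))
    G_inv = np.linalg.inv(G + damping * np.eye(G.shape[0]))
    # Preconditioned gradient: for grad_W of shape (d_out, d_in), (G ⊗ A)^-1 vec(grad) == G^-1 grad A^-1
    precond = G_inv @ grad_W @ A_inv
    return W - lr * precond
```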
Graph Representation learning for Audio & Music genre Classification
Title | Graph Representation learning for Audio & Music genre Classification |
Authors | Shubham Dokania, Vasudev Singh |
Abstract | Music genre is arguably one of the most important and discriminative pieces of information for music and audio content. Visual-representation-based approaches have been explored on spectrograms for music genre classification. However, the lack of quality data and augmentation techniques makes it difficult to employ deep learning techniques successfully. We discuss the application of graph neural networks to this task, motivated by their strong inductive bias, and show that a combination of a CNN and a GNN achieves state-of-the-art results on the GTZAN and AudioSet (Imbalanced Music) datasets. We also discuss the role of Siamese neural networks as an analogue to GNNs for learning edge similarity weights. Furthermore, we perform a visual analysis to understand the field of view of our model on the spectrogram based on genre labels. |
Tasks | Graph Representation Learning, Representation Learning |
Published | 2019-10-23 |
URL | https://arxiv.org/abs/1910.11117v1 |
https://arxiv.org/pdf/1910.11117v1.pdf | |
PWC | https://paperswithcode.com/paper/graph-representation-learning-for-audio-music |
Repo | |
Framework | |
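As a rough illustration of the kind of graph propagation involved (not the paper's architecture), the sketch below builds a k-nearest-neighbour graph from CNN clip embeddings and applies one GCN-style layer over it; all names and hyperparameters are assumptions.

```python
import numpy as np

def similarity_adjacency(H, k=5):
    """Build a k-nearest-neighbour adjacency from cosine similarity of clip embeddings H (n, d)."""
    Hn = H / np.linalg.norm(H, axis=1, keepdims=True)
    S = Hn @ Hn.T
    np.fill_diagonal(S, -np.inf)                  # exclude self-similarity
    A = np.zeros_like(S)
    idx = np.argsort(-S, axis=1)[:, :k]           # indices of the k most similar clips
    rows = np.arange(S.shape[0])[:, None]
    A[rows, idx] = 1.0
    return np.maximum(A, A.T)                     # symmetrise

def gcn_layer(H, A, W):
    """One GCN-style propagation step: H' = ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    A_hat = A + np.eye(A.shape[0])
    d = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    return np.maximum(D_inv_sqrt @ A_hat @ D_inv_sqrt @ H @ W, 0.0)
```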
Solving Rubik’s Cube with a Robot Hand
Title | Solving Rubik’s Cube with a Robot Hand |
Authors | OpenAI, Ilge Akkaya, Marcin Andrychowicz, Maciek Chociej, Mateusz Litwin, Bob McGrew, Arthur Petron, Alex Paino, Matthias Plappert, Glenn Powell, Raphael Ribas, Jonas Schneider, Nikolas Tezak, Jerry Tworek, Peter Welinder, Lilian Weng, Qiming Yuan, Wojciech Zaremba, Lei Zhang |
Abstract | We demonstrate that models trained only in simulation can be used to solve a manipulation problem of unprecedented complexity on a real robot. This is made possible by two key components: a novel algorithm, which we call automatic domain randomization (ADR) and a robot platform built for machine learning. ADR automatically generates a distribution over randomized environments of ever-increasing difficulty. Control policies and vision state estimators trained with ADR exhibit vastly improved sim2real transfer. For control policies, memory-augmented models trained on an ADR-generated distribution of environments show clear signs of emergent meta-learning at test time. The combination of ADR with our custom robot platform allows us to solve a Rubik’s cube with a humanoid robot hand, which involves both control and state estimation problems. Videos summarizing our results are available: https://openai.com/blog/solving-rubiks-cube/ |
Tasks | Meta-Learning |
Published | 2019-10-16 |
URL | https://arxiv.org/abs/1910.07113v1 |
https://arxiv.org/pdf/1910.07113v1.pdf | |
PWC | https://paperswithcode.com/paper/solving-rubiks-cube-with-a-robot-hand |
Repo | |
Framework | |
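The core loop of automatic domain randomization widens a randomization range when the policy performs well at its boundary and narrows it when the policy struggles. A toy sketch of that expand/shrink step is below; the thresholds, step size, and evaluation interface are assumptions, not the paper's settings.

```python
import random

def adr_step(ranges, evaluate, thresh_hi=0.8, thresh_lo=0.2, delta=0.05):
    """One automatic-domain-randomization update on a dict of (lo, hi) parameter ranges.

    evaluate(param, value) -> success rate in [0, 1] when that parameter is pinned at a
    boundary value while the other parameters are sampled from their current ranges.
    """
    param = random.choice(list(ranges))
    lo, hi = ranges[param]
    boundary = random.choice([lo, hi])
    perf = evaluate(param, boundary)
    if perf >= thresh_hi:                 # doing well -> widen the range at that boundary
        lo, hi = (lo - delta, hi) if boundary == lo else (lo, hi + delta)
    elif perf <= thresh_lo:               # struggling -> shrink it back
        lo, hi = (lo + delta, hi) if boundary == lo else (lo, hi - delta)
    ranges[param] = (min(lo, hi), max(lo, hi))
    return ranges
```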
Physics Enhanced Artificial Intelligence
Title | Physics Enhanced Artificial Intelligence |
Authors | Patrick O’Driscoll, Jaehoon Lee, Bo Fu |
Abstract | We propose that intelligently combining models from the domains of Artificial Intelligence or Machine Learning with Physical and Expert models will yield a more “trustworthy” model than any one model from a single domain, given a complex and narrow enough problem. Based on mean-variance portfolio theory and bias-variance trade-off analysis, we prove that combining models from various domains produces a model that has lower risk, increasing user trust. We call such combined models physics enhanced artificial intelligence (PEAI), and suggest use cases for PEAI. |
Tasks | |
Published | 2019-03-11 |
URL | http://arxiv.org/abs/1903.04442v1 |
http://arxiv.org/pdf/1903.04442v1.pdf | |
PWC | https://paperswithcode.com/paper/physics-enhanced-artificial-intelligence |
Repo | |
Framework | |
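The portfolio analogy in the abstract suggests weighting the individual models so that the combined prediction has minimal error variance. A toy sketch of that closed-form combination is below, assuming access to held-out residuals for each model; all names are illustrative.

```python
import numpy as np

def min_variance_weights(residuals):
    """Portfolio-style combination weights for an ensemble of models.

    residuals: (n_samples, n_models) prediction errors on held-out data.
    Returns weights w minimising wᵀ Σ w subject to the weights summing to one (closed form).
    """
    cov = np.cov(residuals, rowvar=False)
    ones = np.ones(cov.shape[0])
    inv = np.linalg.pinv(cov)
    return inv @ ones / (ones @ inv @ ones)

def combine(predictions, w):
    """Weighted combination of model predictions (n_samples, n_models) -> (n_samples,)."""
    return predictions @ w
```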
On the approximation of rough functions with deep neural networks
Title | On the approximation of rough functions with deep neural networks |
Authors | Tim De Ryck, Siddhartha Mishra, Deep Ray |
Abstract | Deep neural networks and the ENO procedure are both efficient frameworks for approximating rough functions. We prove that at any order, the ENO interpolation procedure can be cast as a deep ReLU neural network. This surprising fact enables the transfer of several desirable properties of the ENO procedure to deep neural networks, including its high-order accuracy at approximating Lipschitz functions. Numerical tests for the resulting neural networks show excellent performance for approximating solutions of nonlinear conservation laws and at data compression. |
Tasks | |
Published | 2019-12-13 |
URL | https://arxiv.org/abs/1912.06732v1 |
https://arxiv.org/pdf/1912.06732v1.pdf | |
PWC | https://paperswithcode.com/paper/on-the-approximation-of-rough-functions-with |
Repo | |
Framework | |
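For readers unfamiliar with ENO, a second-order ENO interpolation step looks roughly like the following: two candidate stencils are compared via their (undivided) second differences and the smoother one is used. This is a textbook-style sketch on a uniform grid, not the paper's construction, which casts the full procedure as a ReLU network.

```python
def eno2_interpolate(f, i, theta):
    """Second-order ENO interpolation of samples f at x_{i+theta}, theta in [0, 1].

    Chooses between the left-biased stencil {i-1, i, i+1} and the right-biased stencil
    {i, i+1, i+2} by comparing undivided second differences, then evaluates the chosen
    quadratic in Newton form. Assumes a uniform grid (h = 1) and a valid interior index i.
    """
    d2_left = f[i + 1] - 2 * f[i] + f[i - 1]
    d2_right = f[i + 2] - 2 * f[i + 1] + f[i]
    d2 = d2_left if abs(d2_left) <= abs(d2_right) else d2_right
    return f[i] + theta * (f[i + 1] - f[i]) + 0.5 * theta * (theta - 1) * d2
```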
Leveraging Multimodal Haptic Sensory Data for Robust Cutting
Title | Leveraging Multimodal Haptic Sensory Data for Robust Cutting |
Authors | Kevin Zhang, Mohit Sharma, Manuela Veloso, Oliver Kroemer |
Abstract | Cutting is a common form of manipulation when working with divisible objects such as food, rope, or clay. Cooking in particular relies heavily on cutting to divide food items into desired shapes. However, cutting food is a challenging task due to the wide range of material properties exhibited by food items. Due to this variability, the same cutting motions cannot be used for all food items. Sensations from contact events, e.g., when placing the knife on the food item, will also vary depending on the material properties, and the robot will need to adapt accordingly. In this paper, we propose using vibrations and force-torque feedback from the interactions to adapt the slicing motions and monitor for contact events. The robot learns neural networks for performing each of these tasks and generalizing across different material properties. By adapting and monitoring the skill executions, the robot is able to reliably cut through more than 20 different types of food items and even detect whether certain food items are fresh or old. |
Tasks | |
Published | 2019-09-27 |
URL | https://arxiv.org/abs/1909.12460v1 |
https://arxiv.org/pdf/1909.12460v1.pdf | |
PWC | https://paperswithcode.com/paper/leveraging-multimodal-haptic-sensory-data-for |
Repo | |
Framework | |
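As a loose illustration of the monitoring side only (not the authors' models), one could featurize short force-torque windows and train a small classifier to flag contact events such as the knife touching the food item; the feature choices, window shape, and label encoding below are assumptions.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

def window_features(ft_window):
    """Summary features of a (T, 6) force-torque window: per-axis mean, std, and peak magnitude."""
    return np.concatenate([ft_window.mean(axis=0),
                           ft_window.std(axis=0),
                           np.abs(ft_window).max(axis=0)])

def train_contact_detector(windows, labels):
    """Fit a small MLP that flags contact events (1 = contact event, 0 = free motion)."""
    X = np.stack([window_features(w) for w in windows])
    clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500)
    clf.fit(X, labels)
    return clf
```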
Active Learning with Siamese Twins for Sequence Tagging
Title | Active Learning with Siamese Twins for Sequence Tagging |
Authors | Rishi Hazra, Shubham Gupta, Ambedkar Dukkipati |
Abstract | Deep learning, in general, and natural language processing methods, in particular, rely heavily on annotated samples to achieve good performance. However, manually annotating data is expensive and time consuming. Active Learning (AL) strategies reduce the need for huge volumes of labelled data by iteratively selecting a small number of examples for manual annotation based on their estimated utility in training the given model. In this paper, we argue that since AL strategies choose examples independently, they may potentially select similar examples, all of which do not aid in the learning process. We propose a method, referred to as Active$\mathbf{^2}$ Learning (A$\mathbf{^2}$L), that actively adapts to the sequence tagging model being trained, to further eliminate such redundant examples chosen by an AL strategy. We empirically demonstrate that A$\mathbf{^2}$L improves the performance of state-of-the-art AL strategies on different sequence tagging tasks. Furthermore, we show that A$\mathbf{^2}$L is widely applicable by using it in conjunction with different AL strategies and sequence tagging models. We demonstrate that the proposed A$\mathbf{^2}$L is able to reach the full-data F-score with $\approx\mathbf{2-16\%}$ less data compared to state-of-the-art AL strategies on different sequence tagging datasets. |
Tasks | Active Learning |
Published | 2019-11-01 |
URL | https://arxiv.org/abs/1911.00234v1 |
https://arxiv.org/pdf/1911.00234v1.pdf | |
PWC | https://paperswithcode.com/paper/active-learning-with-siamese-twins-for |
Repo | |
Framework | |
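A crude sketch of the redundancy-filtering idea: given the examples picked by a base AL strategy, greedily keep only those whose embeddings are not too close to an already-kept example. Plain cosine similarity stands in here for the paper's learned Siamese similarity, and the threshold is an assumption.

```python
import numpy as np

def deduplicate_batch(candidates, embed, sim_threshold=0.9):
    """Greedily keep AL-selected examples whose embeddings are sufficiently dissimilar.

    candidates: examples chosen by the base AL strategy, in utility order.
    embed(x) -> 1-D vector; cosine similarity stands in for a learned Siamese score.
    """
    kept, kept_vecs = [], []
    for x in candidates:
        v = embed(x)
        v = v / np.linalg.norm(v)
        if all(float(v @ u) < sim_threshold for u in kept_vecs):
            kept.append(x)
            kept_vecs.append(v)
    return kept
```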
Residual Bi-Fusion Feature Pyramid Network for Accurate Single-shot Object Detection
Title | Residual Bi-Fusion Feature Pyramid Network for Accurate Single-shot Object Detection |
Authors | Ping-Yang Chen, Jun-Wei Hsieh, Chien-Yao Wang, Hong-Yuan Mark Liao, Munkhjargal Gochoo |
Abstract | State-of-the-art (SoTA) models have improved the accuracy of object detection by a large margin via the FP (feature pyramid). The FP is a top-down aggregation that collects semantically strong features to improve scale invariance in both two-stage and one-stage detectors. However, this top-down pathway cannot preserve accurate object positions due to the shift effect of pooling, so the advantage of the FP for detection accuracy disappears when more layers are used. The original FP lacks a bottom-up pathway to offset the information lost from lower-layer feature maps; it performs well on large-sized object detection but poorly on small-sized object detection. A new structure, the “residual feature pyramid”, is proposed in this paper. It is bidirectional, fusing both deep and shallow features for more effective and robust detection of both small-sized and large-sized objects. Due to its “residual” nature, it can be trained and integrated into different backbones (even deeper or lighter ones) more easily than other bi-directional methods. One important property of this residual FP is that accuracy improvements are still found even when more layers are adopted. Extensive experiments on the VOC and MS COCO datasets show that the proposed method achieves SoTA results for highly accurate and efficient object detection. |
Tasks | Object Detection |
Published | 2019-11-27 |
URL | https://arxiv.org/abs/1911.12051v2 |
https://arxiv.org/pdf/1911.12051v2.pdf | |
PWC | https://paperswithcode.com/paper/residual-bi-fusion-feature-pyramid-network |
Repo | |
Framework | |
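To illustrate the fusion topology only (no learned convolutions, and not the paper's exact network), the sketch below runs a top-down pass followed by a bottom-up pass over a feature pyramid; the top-down pass keeps a residual connection to the input features and the bottom-up pass to the top-down result. Spatial sizes are assumed to be exact powers of two.

```python
import numpy as np

def upsample2x(x):
    """Nearest-neighbour 2x upsampling of a (C, H, W) feature map."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

def downsample2x(x):
    """2x downsampling of a (C, H, W) feature map by striding."""
    return x[:, ::2, ::2]

def bidirectional_residual_fusion(features):
    """Toy bi-directional fusion over a list of (C, H_i, W_i) pyramid levels (fine -> coarse)."""
    td = list(features)
    for i in range(len(td) - 2, -1, -1):              # top-down: coarse -> fine
        td[i] = features[i] + upsample2x(td[i + 1])
    out = list(td)
    for i in range(1, len(out)):                      # bottom-up: fine -> coarse
        out[i] = td[i] + downsample2x(out[i - 1])
    return out
```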
Challenges and Prospects in Vision and Language Research
Title | Challenges and Prospects in Vision and Language Research |
Authors | Kushal Kafle, Robik Shrestha, Christopher Kanan |
Abstract | Language-grounded image understanding tasks have often been proposed as a method for evaluating progress in artificial intelligence. Ideally, these tasks should test a plethora of capabilities that integrate computer vision, reasoning, and natural language understanding. However, recent studies have demonstrated that, rather than serving as visual Turing tests, state-of-the-art systems are achieving good performance through flaws in datasets and evaluation procedures. We review the current state of affairs and outline a path forward. |
Tasks | |
Published | 2019-04-19 |
URL | https://arxiv.org/abs/1904.09317v2 |
https://arxiv.org/pdf/1904.09317v2.pdf | |
PWC | https://paperswithcode.com/paper/190409317 |
Repo | |
Framework | |
FrameNet: Learning Local Canonical Frames of 3D Surfaces from a Single RGB Image
Title | FrameNet: Learning Local Canonical Frames of 3D Surfaces from a Single RGB Image |
Authors | Jingwei Huang, Yichao Zhou, Thomas Funkhouser, Leonidas Guibas |
Abstract | In this work, we introduce the novel problem of identifying dense canonical 3D coordinate frames from a single RGB image. We observe that each pixel in an image corresponds to a surface in the underlying 3D geometry, where a canonical frame can be identified, represented by three orthogonal axes: one along the normal direction and two in the tangent plane. We propose an algorithm to predict these axes from RGB. Our first insight is that canonical frames computed automatically with recently introduced direction-field synthesis methods can provide training data for the task. Our second insight is that networks designed for surface normal prediction provide better results when trained jointly to predict canonical frames, and even better when trained to also predict 2D projections of canonical frames. We conjecture this is because projections of canonical tangent directions often align with local gradients in images, and because those directions are tightly linked to 3D canonical frames through projective geometry and orthogonality constraints. In our experiments, we find that our method predicts 3D canonical frames that can be used in applications including surface normal estimation, feature matching, and augmented reality. |
Tasks | |
Published | 2019-03-29 |
URL | http://arxiv.org/abs/1903.12305v1 |
http://arxiv.org/pdf/1903.12305v1.pdf | |
PWC | https://paperswithcode.com/paper/framenet-learning-local-canonical-frames-of |
Repo | |
Framework | |
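A small geometric sketch of the two operations the abstract alludes to: orthonormalising a predicted normal and rough tangent into a full frame, and projecting a 3D axis direction into the image to see how it appears in 2D. The camera-intrinsics matrix K and all names here are assumptions, not the paper's code.

```python
import numpy as np

def orthonormal_frame(normal, tangent_guess):
    """Build a canonical frame (n, t, b) from a predicted normal and a rough tangent direction."""
    n = normal / np.linalg.norm(normal)
    t = tangent_guess - (tangent_guess @ n) * n       # Gram-Schmidt: remove the normal component
    t = t / np.linalg.norm(t)
    b = np.cross(n, t)                                # third axis completes the orthonormal frame
    return np.stack([n, t, b])

def project_direction(p_cam, d, K):
    """Approximate 2D image direction of a 3D axis d anchored at camera-space point p_cam.

    K is the 3x3 intrinsics matrix; the result is the normalised image-plane direction along
    which the 3D axis appears (assumes p_cam lies in front of the camera).
    """
    def to_pixel(x):
        q = K @ x
        return q[:2] / q[2]
    v = to_pixel(p_cam + 1e-3 * d) - to_pixel(p_cam)
    return v / np.linalg.norm(v)
```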
Measuring the Transferability of Adversarial Examples
Title | Measuring the Transferability of Adversarial Examples |
Authors | Deyan Petrov, Timothy M. Hospedales |
Abstract | Adversarial examples are of wide concern due to their impact on the reliability of contemporary machine learning systems. Effective adversarial examples are mostly found via white-box attacks. However, in some cases they can be transferred across models, thus enabling them to attack black-box models. In this work we evaluate the transferability of three adversarial attacks - the Fast Gradient Sign Method, the Basic Iterative Method, and the Carlini & Wagner method - across two classes of models: the VGG class (VGG16, VGG19, and an ensemble of VGG16 and VGG19) and the Inception class (Inception V3, Xception, Inception ResNet V2, and an ensemble of the three). We also outline problems with the assessment of transferability in the current body of research and attempt to amend them by picking specific “strong” parameters for the attacks, and by using an L-infinity clipping technique and the SSIM metric for the final evaluation of attack transferability. |
Tasks | |
Published | 2019-07-14 |
URL | https://arxiv.org/abs/1907.06291v1 |
https://arxiv.org/pdf/1907.06291v1.pdf | |
PWC | https://paperswithcode.com/paper/measuring-the-transferability-of-adversarial |
Repo | |
Framework | |
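For concreteness, minimal numpy sketches of two of the evaluated attacks, FGSM and BIM, with the L-infinity clipping mentioned in the abstract. Here grad_fn is an assumed callable returning the loss gradient with respect to the input, and the eps/alpha values are illustrative.

```python
import numpy as np

def fgsm(x, grad_x, eps=8 / 255):
    """Fast Gradient Sign Method: a single signed-gradient step of size eps."""
    return np.clip(x + eps * np.sign(grad_x), 0.0, 1.0)

def linf_clip(x_adv, x, eps=8 / 255):
    """Project an adversarial example back into the L-infinity ball of radius eps around x."""
    return np.clip(np.clip(x_adv, x - eps, x + eps), 0.0, 1.0)

def bim(x, grad_fn, eps=8 / 255, alpha=2 / 255, steps=10):
    """Basic Iterative Method: repeated signed-gradient steps with L-infinity clipping after each."""
    x_adv = x.copy()
    for _ in range(steps):
        x_adv = x_adv + alpha * np.sign(grad_fn(x_adv))
        x_adv = linf_clip(x_adv, x, eps)
    return x_adv
```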
Deep learning for seismic phase detection and picking in the aftershock zone of 2008 Mw7.9 Wenchuan earthquake
Title | Deep learning for seismic phase detection and picking in the aftershock zone of 2008 Mw7.9 Wenchuan earthquake |
Authors | Lijun Zhu, Zhigang Peng, James McClellan, Chenyu Li, Dongdong Yao, Zefeng Li, Lihua Fang |
Abstract | The increasing volume of seismic data from long-term continuous monitoring motivates the development of algorithms based on convolutional neural networks (CNNs) for faster and more reliable phase detection and picking. However, many less studied regions lack the significant amount of labeled events needed for traditional CNN approaches. In this paper, we present a CNN-based Phase-Identification Classifier (CPIC) designed for phase detection and picking on small-to-medium-sized training datasets. When trained on 30,146 labeled phases and applied to one month of continuous recordings during the aftershock sequences of the 2008 MW 7.9 Wenchuan Earthquake in Sichuan, China, CPIC detects 97.5% of the manually picked phases in the standard catalog and predicts their arrival times with a five-fold improvement over the ObsPy AR picker. In addition, unlike other CNN-based approaches that require millions of training samples, when the off-line training set size of CPIC is reduced to only a few thousand training samples, the accuracy stays above 95%. The online implementation of CPIC takes less than 12 hours to pick arrivals in 31-day recordings on 14 stations. In addition to the catalog phases manually picked by analysts, CPIC finds more phases for existing events and new events missed in the catalog. Among those additional detections, some are confirmed by a matched filter method while others require further investigation. Finally, when tested on a small dataset from a different region (Oklahoma, US), CPIC achieves 97% accuracy after fine tuning only the fully connected layer of the model. This result suggests that the CPIC developed in this study can be used to identify and pick P/S arrivals in other regions with no or minimal labeled phases. |
Tasks | |
Published | 2019-01-18 |
URL | http://arxiv.org/abs/1901.06396v2 |
http://arxiv.org/pdf/1901.06396v2.pdf | |
PWC | https://paperswithcode.com/paper/deep-learning-for-seismic-phase-detection-and |
Repo | |
Framework | |
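A simple sketch of the transfer recipe described at the end of the abstract (fine-tuning only the final layer on a new region): freeze the convolutional feature extractor and refit a lightweight classification head. The logistic-regression head and the label encoding here are stand-ins, not the paper's implementation.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def finetune_last_layer(feature_extractor, waveforms, labels):
    """Adapt a trained phase classifier to a new region by refitting only its final layer.

    feature_extractor(w) -> 1-D feature vector from the frozen convolutional layers;
    labels: e.g. 0 = noise, 1 = P arrival, 2 = S arrival (illustrative encoding).
    """
    X = np.stack([feature_extractor(w) for w in waveforms])
    head = LogisticRegression(max_iter=1000)   # stands in for the retrained fully connected layer
    head.fit(X, labels)
    return head
```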
CrackGAN: A Labor-Light Crack Detection Approach Using Industrial Pavement Images Based on Generative Adversarial Learning
Title | CrackGAN: A Labor-Light Crack Detection Approach Using Industrial Pavement Images Based on Generative Adversarial Learning |
Authors | Kaige Zhang, Yingtao Zhang, Heng-Da Cheng |
Abstract | Fully convolutional networks are a powerful tool for per-pixel semantic segmentation/detection. However, they are problematic when coping with crack detection using industrial pavement images: the network may easily “converge” to a state that treats all pixels as background (BG) and still achieve a very good loss, named the “All Black” phenomenon, due to data imbalance and the unavailability of accurate ground truths (GTs). To tackle this problem, we introduce crack-patch-only (CPO) supervision and generative adversarial learning for end-to-end training, which forces the network to always produce crack-GT images while preserving both crack- and BG-image translation abilities, by feeding a larger-size crack image into an asymmetric U-shaped generator to overcome the “All Black” issue. The proposed approach is validated on four crack datasets and achieves state-of-the-art efficiency and accuracy compared with recently published works. |
Tasks | Semantic Segmentation |
Published | 2019-09-18 |
URL | https://arxiv.org/abs/1909.08216v1 |
https://arxiv.org/pdf/1909.08216v1.pdf | |
PWC | https://paperswithcode.com/paper/crackgan-a-labor-light-crack-detection |
Repo | |
Framework | |
Highly-scalable, physics-informed GANs for learning solutions of stochastic PDEs
Title | Highly-scalable, physics-informed GANs for learning solutions of stochastic PDEs |
Authors | Liu Yang, Sean Treichler, Thorsten Kurth, Keno Fischer, David Barajas-Solano, Josh Romero, Valentin Churavy, Alexandre Tartakovsky, Michael Houston, Prabhat, George Karniadakis |
Abstract | Uncertainty quantification for forward and inverse problems is a central challenge across physical and biomedical disciplines. We address this challenge for the problem of modeling subsurface flow at the Hanford Site by combining stochastic computational models with observational data using physics-informed GAN models. The geographic extent, spatial heterogeneity, and multiple correlation length scales of the Hanford Site require training a computationally intensive GAN model to thousands of dimensions. We develop a hierarchical scheme for exploiting domain parallelism, map discriminators and generators to multiple GPUs, and employ efficient communication schemes to ensure training stability and convergence. We developed a highly optimized implementation of this scheme that scales to 27,500 NVIDIA Volta GPUs and 4584 nodes on the Summit supercomputer with a 93.1% scaling efficiency, achieving peak and sustained half-precision rates of 1228 PF/s and 1207 PF/s. |
Tasks | |
Published | 2019-10-29 |
URL | https://arxiv.org/abs/1910.13444v1 |
https://arxiv.org/pdf/1910.13444v1.pdf | |
PWC | https://paperswithcode.com/paper/highly-scalable-physics-informed-gans-for |
Repo | |
Framework | |
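As a very rough picture of the domain parallelism described above (not the paper's implementation), one can tile the spatial domain into overlapping subdomains, one per worker, so that each GPU handles a local generator/discriminator while halo regions keep neighbouring tiles consistent. Tile counts and halo width below are assumptions.

```python
import numpy as np

def split_domain(field, n_x, n_y, halo=1):
    """Split a 2-D field into n_x * n_y overlapping tiles for domain-parallel training.

    Each tile would be assigned to one worker; the halo rows/columns give neighbouring
    tiles a small overlap so local models can be kept consistent at subdomain boundaries.
    """
    H, W = field.shape
    tiles = []
    for i in range(n_y):
        for j in range(n_x):
            r0 = max(i * H // n_y - halo, 0)
            r1 = min((i + 1) * H // n_y + halo, H)
            c0 = max(j * W // n_x - halo, 0)
            c1 = min((j + 1) * W // n_x + halo, W)
            tiles.append(field[r0:r1, c0:c1])
    return tiles
```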