Paper Group ANR 789
Fast Point R-CNN
Title | Fast Point R-CNN |
Authors | Yilun Chen, Shu Liu, Xiaoyong Shen, Jiaya Jia |
Abstract | We present a unified, efficient and effective framework for point-cloud based 3D object detection. Our two-stage approach utilizes both voxel representation and raw point cloud data to exploit their respective advantages. The first-stage network, with voxel representation as input, consists only of light convolutional operations, producing a small number of high-quality initial predictions. The coordinates and indexed convolutional features of each point in an initial prediction are effectively fused with an attention mechanism, preserving both accurate localization and context information. The second stage works on interior points with their fused features to further refine the prediction. Our method is evaluated on the KITTI dataset, in terms of both 3D and Bird’s Eye View (BEV) detection, and achieves state-of-the-art results at a detection rate of 15 FPS. |
Tasks | 3D Object Detection, Object Detection |
Published | 2019-08-08 |
URL | https://arxiv.org/abs/1908.02990v2 |
https://arxiv.org/pdf/1908.02990v2.pdf | |
PWC | https://paperswithcode.com/paper/fast-point-r-cnn |
Repo | |
Framework | |
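The voxel representation consumed by the first stage can be illustrated with a minimal sketch. It only groups raw points into a regular grid; the function name `voxelize`, the 0.5 voxel size, and the dictionary layout are assumptions made for illustration and are not taken from the paper.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Voxel = Tuple[int, int, int]


def voxelize(points: Sequence[Tuple[float, float, float]],
             voxel_size: float = 0.5) -> Dict[Voxel, List[int]]:
    """Map each voxel index to the indices of the points that fall inside it.

    Hypothetical helper: the grid is axis-aligned and anchored at the origin.
    """
    grid: Dict[Voxel, List[int]] = defaultdict(list)
    for idx, (x, y, z) in enumerate(points):
        # floor() keeps negative coordinates in the correct cell.
        key = (math.floor(x / voxel_size),
               math.floor(y / voxel_size),
               math.floor(z / voxel_size))
        grid[key].append(idx)
    return dict(grid)
```

Keeping point indices per voxel, rather than only occupancy, is one way to let a later stage look up the raw coordinates of the points inside a predicted box.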
Encoding Musical Style with Transformer Autoencoders
Title | Encoding Musical Style with Transformer Autoencoders |
Authors | Kristy Choi, Curtis Hawthorne, Ian Simon, Monica Dinculescu, Jesse Engel |
Abstract | We consider the problem of learning high-level controls over the global structure of sequence generation, particularly in the context of symbolic music generation with complex language models. In this work, we present the Transformer autoencoder, which aggregates encodings of the input data across time to obtain a global representation of style from a given performance. We show it is possible to combine this global embedding with other temporally distributed embeddings, enabling improved control over the separate aspects of performance style and melody. Empirically, we demonstrate the effectiveness of our method on a variety of music generation tasks on the MAESTRO dataset and a YouTube dataset with 10,000+ hours of piano performances, where we achieve improvements in terms of log-likelihood and mean listening scores as compared to relevant baselines. |
Tasks | Music Generation |
Published | 2019-12-10 |
URL | https://arxiv.org/abs/1912.05537v1 |
https://arxiv.org/pdf/1912.05537v1.pdf | |
PWC | https://paperswithcode.com/paper/encoding-musical-style-with-transformer-1 |
Repo | |
Framework | |
Pathological spectra of the Fisher information metric and its variants in deep neural networks
Title | Pathological spectra of the Fisher information metric and its variants in deep neural networks |
Authors | Ryo Karakida, Shotaro Akaho, Shun-ichi Amari |
Abstract | The Fisher information matrix (FIM) plays an essential role in statistics and machine learning as a Riemannian metric tensor. Focusing on the FIM and its variants in deep neural networks (DNNs), we reveal their characteristic behavior when the network is sufficiently wide and has random weights and biases. Various FIMs asymptotically show pathological eigenvalue spectra in the sense that a small number of eigenvalues take on large values while most of them are close to zero. This implies that the local shape of the parameter space or loss landscape is very steep in a few specific directions and almost flat in the other directions. Similar pathological spectra appear in other variants of FIMs: one is the neural tangent kernel; another is a metric for the input signal and feature space that arises from feedforward signal propagation. The quantitative understanding of the FIM and its variants provided here offers important perspectives on learning and signal processing in large-scale DNNs. |
Tasks | |
Published | 2019-10-14 |
URL | https://arxiv.org/abs/1910.05992v1 |
https://arxiv.org/pdf/1910.05992v1.pdf | |
PWC | https://paperswithcode.com/paper/pathological-spectra-of-the-fisher |
Repo | |
Framework | |
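For reference, the Fisher information matrix discussed in the abstract above is commonly written as follows. The notation (parameters $\theta$, input $x$, output $y$, model density $p$) is an assumption chosen here for illustration; the paper may use different symbols or a different normalization.

```latex
F(\theta) \;=\; \mathbb{E}_{x}\,\mathbb{E}_{y \sim p(y \mid x;\theta)}
\left[ \nabla_{\theta} \log p(y \mid x;\theta)\;
       \nabla_{\theta} \log p(y \mid x;\theta)^{\top} \right]
```

In these terms, a pathological spectrum means that a few eigenvalues of $F(\theta)$ are large while most are close to zero.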
Forecasting Transformative AI: An Expert Survey
Title | Forecasting Transformative AI: An Expert Survey |
Authors | Ross Gruetzemacher, David Paradice, Kang Bok Lee |
Abstract | Transformative AI technologies have the potential to reshape critical aspects of society in the near future. However, in order to properly prepare policy initiatives for the arrival of such technologies, accurate forecasts and timelines are necessary. A survey was administered to attendees of three AI conferences during the summer of 2018 (ICML, IJCAI and the HLAI conference). The survey included questions for estimating AI capabilities over the next decade, questions for forecasting five scenarios of transformative AI and questions concerning the impact of computational resources in AI research. Respondents indicated a median of 21.5% of human tasks (i.e., all tasks that humans are currently paid to do) can be feasibly automated now, and that this figure would rise to 40% in 5 years and 60% in 10 years. Median forecasts indicated a 50% probability of AI systems being capable of automating 90% of current human tasks in 25 years and 99% of current human tasks in 50 years. The conference of attendance was found to have a statistically significant impact on all forecasts, with attendees of HLAI providing more optimistic timelines with less uncertainty. These findings suggest that AI experts expect major advances in AI technology to continue over the next decade to a degree that will likely have profound transformative impacts on society. |
Tasks | |
Published | 2019-01-24 |
URL | https://arxiv.org/abs/1901.08579v2 |
https://arxiv.org/pdf/1901.08579v2.pdf | |
PWC | https://paperswithcode.com/paper/forecasting-transformative-ai-an-expert |
Repo | |
Framework | |
Regression Equilibrium
Title | Regression Equilibrium |
Authors | Omer Ben-Porat, Moshe Tennenholtz |
Abstract | Prediction is a well-studied machine learning task, and prediction algorithms are core ingredients in online products and services. Despite their centrality in the competition between online companies who offer prediction-based products, the strategic use of prediction algorithms remains unexplored. The goal of this paper is to examine strategic use of prediction algorithms. We introduce a novel game-theoretic setting that is based on the PAC learning framework, where each player (aka a prediction algorithm aimed at competition) seeks to maximize the sum of points for which it produces an accurate prediction and the others do not. We show that algorithms aiming at generalization may wittingly mispredict some points to perform better than others in expectation. We analyze the empirical game, i.e., the game induced on a given sample, prove that it always possesses a pure Nash equilibrium, and show that every better-response learning process converges. Moreover, our learning-theoretic analysis suggests that players can, with high probability, learn an approximate pure Nash equilibrium for the whole population using a small number of samples. |
Tasks | |
Published | 2019-05-04 |
URL | https://arxiv.org/abs/1905.02576v1 |
https://arxiv.org/pdf/1905.02576v1.pdf | |
PWC | https://paperswithcode.com/paper/regression-equilibrium |
Repo | |
Framework | |
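The payoff described in the abstract above, where a player scores on a point only if it predicts accurately and the others do not, can be sketched on a finite sample as follows. The function name `payoffs` and the absolute-error tolerance `tol` used to decide what counts as "accurate" are assumptions for illustration, not definitions taken from the paper.

```python
from typing import List, Sequence


def payoffs(predictions: Sequence[Sequence[float]],
            targets: Sequence[float],
            tol: float = 0.5) -> List[int]:
    """Per-player count of sample points that only that player predicts accurately.

    predictions[i][j] is player i's prediction for point j.
    A prediction counts as accurate when |prediction - target| <= tol (assumed rule).
    """
    n_players = len(predictions)
    scores = [0] * n_players
    for j, y in enumerate(targets):
        accurate = [abs(predictions[i][j] - y) <= tol for i in range(n_players)]
        # A point is awarded only when exactly one player is accurate on it.
        if sum(accurate) == 1:
            scores[accurate.index(True)] += 1
    return scores
```

With two players who both hit the first point and each hit one of the remaining two alone, `payoffs([[1.0, 2.0, 9.0], [1.0, 5.0, 3.0]], [1.0, 2.0, 3.0])` gives each player a payoff of 1, which shows why matching a rival on every point earns nothing.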
MiSC: Mixed Strategies Crowdsourcing
Title | MiSC: Mixed Strategies Crowdsourcing |
Authors | Ching-Yun Ko, Rui Lin, Shu Li, Ngai Wong |
Abstract | Popular crowdsourcing techniques mostly focus on evaluating workers’ labeling quality before adjusting their weights during label aggregation. Recently, another cohort of models regard crowdsourced annotations as incomplete tensors and recover unfilled labels by tensor completion. However, mixed strategies of the two methodologies have never been comprehensively investigated, leaving them as rather independent approaches. In this work, we propose MiSC (Mixed Strategies Crowdsourcing), a versatile framework integrating arbitrary conventional crowdsourcing and tensor completion techniques. In particular, we propose a novel iterative Tucker label aggregation algorithm that outperforms state-of-the-art methods in extensive experiments. |
Tasks | |
Published | 2019-05-17 |
URL | https://arxiv.org/abs/1905.07394v1 |
https://arxiv.org/pdf/1905.07394v1.pdf | |
PWC | https://paperswithcode.com/paper/misc-mixed-strategies-crowdsourcing |
Repo | |
Framework | |
Effects of Illumination on the Categorization of Shiny Materials
Title | Effects of Illumination on the Categorization of Shiny Materials |
Authors | J. Farley Norman, James T. Todd, Flip Phillips |
Abstract | The present research was designed to examine how patterns of illumination influence the perceptual categorization of metal, shiny black, and shiny white materials. The stimuli depicted three possible objects that were illuminated by five possible HDRI light maps, which varied in their overall distributions of illuminant directions and intensities. The surfaces included a low roughness chrome material, a shiny black material, and a shiny white material with both diffuse and specular components. Observers rated each stimulus by adjusting four sliders to indicate their confidence that the depicted material was metal, shiny black, shiny white or something else, and these adjustments were constrained so that the sum of all four settings was always 100%. The results revealed that the metal and shiny black categories are easily confused. For example, metal materials with low intensity light maps or a narrow range of illuminant directions are often judged as shiny black, whereas shiny black materials with high intensity light maps or a wide range of illuminant directions are often judged as metal. A spherical harmonic analysis was performed on the different light maps in an effort to quantitatively predict how they would bias observers’ judgments of metal and shiny black surfaces. |
Tasks | |
Published | 2019-08-02 |
URL | https://arxiv.org/abs/1908.00902v2 |
https://arxiv.org/pdf/1908.00902v2.pdf | |
PWC | https://paperswithcode.com/paper/effects-of-illumination-on-the-categorization |
Repo | |
Framework | |
Towards Automatic Screening of Typical and Atypical Behaviors in Children With Autism
Title | Towards Automatic Screening of Typical and Atypical Behaviors in Children With Autism |
Authors | Andrew Cook, Bappaditya Mandal, Donna Berry, Matthew Johnson |
Abstract | This paper has been withdrawn by the authors due to insufficient or definition error(s) in the ethics approval protocol. Autism spectrum disorders (ASD) impact the cognitive, social, communicative and behavioral abilities of an individual. The development of new clinical decision support systems is of importance in reducing the delay between presentation of symptoms and an accurate diagnosis. In this work, we contribute a new database consisting of video clips of typical (normal) and atypical (such as hand flapping, spinning or rocking) behaviors, displayed in natural settings, which have been collected from the YouTube video website. We propose a preliminary non-intrusive approach based on skeleton keypoint identification using pretrained deep neural networks on human body video clips to extract features and perform body movement analysis that differentiates typical and atypical behaviors of children. Experimental results on the newly contributed database show that our platform performs best with decision tree as the classifier when compared to other popular methodologies and offers a baseline against which alternate approaches may be developed and tested. |
Tasks | |
Published | 2019-07-29 |
URL | https://arxiv.org/abs/1907.12537v2 |
https://arxiv.org/pdf/1907.12537v2.pdf | |
PWC | https://paperswithcode.com/paper/towards-automatic-screening-of-typical-and |
Repo | |
Framework | |
Deep Semantic Segmentation of Natural and Medical Images: A Review
Title | Deep Semantic Segmentation of Natural and Medical Images: A Review |
Authors | Saeid Asgari Taghanaki, Kumar Abhishek, Joseph Paul Cohen, Julien Cohen-Adad, Ghassan Hamarneh |
Abstract | The (medical) image semantic segmentation task consists of classifying each pixel of an image (or just several ones) into an instance, where each instance (or category) corresponds to a class. This task is a part of the concept of scene understanding or better explaining the global context of an image. In the medical image analysis domain, image segmentation can be used for image-guided interventions, radiotherapy, or improved radiological diagnostics. In this review, we categorize the main deep learning-based medical and non-medical image segmentation solutions into six main groups: deep architectural improvements, data synthesis-based, loss function-based improvements, sequenced models, weakly supervised, and multi-task methods. For each group, we analyze its variants and discuss limitations of the current approaches and future research directions for semantic image segmentation. |
Tasks | Medical Image Segmentation, Scene Understanding, Semantic Segmentation |
Published | 2019-10-16 |
URL | https://arxiv.org/abs/1910.07655v2 |
https://arxiv.org/pdf/1910.07655v2.pdf | |
PWC | https://paperswithcode.com/paper/deep-semantic-segmentation-of-natural-and |
Repo | |
Framework | |
Generalized Integrated Gradients: A practical method for explaining diverse ensembles
Title | Generalized Integrated Gradients: A practical method for explaining diverse ensembles |
Authors | John Merrill, Geoff Ward, Sean Kamkar, Jay Budzik, Douglas Merrill |
Abstract | We introduce Generalized Integrated Gradients (GIG), a formal extension of the Integrated Gradients (IG) (Sundararajan et al., 2017) method for attributing credit to the input variables of a predictive model. GIG improves IG by explaining a broader variety of functions that arise from practical applications of ML in domains like financial services. GIG is constructed to overcome limitations of Shapley (1953) and Aumann-Shapley (1974), and has desirable properties when compared to other approaches. We prove GIG is the only correct method, under a small set of reasonable axioms, for providing explanations for mixed-type models or games. We describe the implementation, and present results of experiments on several datasets and systems of models. |
Tasks | |
Published | 2019-09-04 |
URL | https://arxiv.org/abs/1909.01869v2 |
https://arxiv.org/pdf/1909.01869v2.pdf | |
PWC | https://paperswithcode.com/paper/generalized-integrated-gradients-a-practical |
Repo | |
Framework | |
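As background for the abstract above, the original Integrated Gradients method (not the generalized GIG variant, which is not reproduced here) attributes a prediction by integrating the gradient along a straight path from a baseline to the input. The sketch below is a minimal pure-Python version under stated assumptions: the model `f` is any smooth scalar function, gradients are approximated by central differences, and the names `integrated_gradients`, `steps`, and `eps` are chosen for illustration.

```python
from typing import Callable, List, Sequence


def integrated_gradients(f: Callable[[List[float]], float],
                         x: Sequence[float],
                         baseline: Sequence[float],
                         steps: int = 200,
                         eps: float = 1e-6) -> List[float]:
    """Approximate IG attributions with a midpoint Riemann sum.

    Gradients are estimated numerically, so this only suits small, smooth
    functions; a real implementation would use automatic differentiation.
    """
    n = len(x)
    attributions = [0.0] * n
    for k in range(steps):
        alpha = (k + 0.5) / steps  # midpoint of the k-th sub-interval of [0, 1]
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        for i in range(n):
            up = list(point)
            up[i] += eps
            down = list(point)
            down[i] -= eps
            grad_i = (f(up) - f(down)) / (2 * eps)  # central difference
            attributions[i] += grad_i * (x[i] - baseline[i]) / steps
    return attributions
```

A useful sanity check is the completeness property of IG: the attributions should sum to `f(x) - f(baseline)`, up to numerical error.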
Deep-seismic-prior-based reconstruction of seismic data using convolutional neural networks
Title | Deep-seismic-prior-based reconstruction of seismic data using convolutional neural networks |
Authors | Qun Liu, Lihua Fu, Meng Zhang |
Abstract | Reconstruction of seismic data with missing traces is a long-standing issue in seismic data processing. In recent years, rank reduction operations have been commonly used to overcome this problem, but they require the rank of the seismic data to be known a priori. However, the rank of field data is unknown; manually adjusting it usually takes much time and yields only an approximate rank. Methods based on deep learning require very large datasets for training; however, acquiring large datasets is difficult owing to physical or financial constraints in practice. Therefore, in this work, we developed a novel method based on unsupervised learning using the intrinsic properties of a convolutional neural network known as U-net, without training datasets. Only a single undersampled seismic dataset was needed, and the deep seismic prior of the input data could be exploited by the network itself, thus making the reconstruction convenient. Furthermore, this method can handle both irregular and regular seismic data. Synthetic and field data were tested to assess the performance of the proposed algorithm (DSPRecon algorithm); the advantages of using our method were evaluated by comparing it with the singular spectrum analysis (SSA) method for irregular data reconstruction and the de-aliased Cadzow method for regular data reconstruction. Experimental results showed that our method provided better reconstruction performance than the SSA or Cadzow methods. The recovered signal-to-noise ratios (SNRs) were 32.68 dB and 19.11 dB for the DSPRecon and SSA algorithms, respectively. Those for the DSPRecon and Cadzow methods were 35.91 dB and 15.32 dB, respectively. |
Tasks | |
Published | 2019-11-20 |
URL | https://arxiv.org/abs/1911.08784v1 |
https://arxiv.org/pdf/1911.08784v1.pdf | |
PWC | https://paperswithcode.com/paper/deep-seismic-prior-based-reconstruction-of |
Repo | |
Framework | |
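The SNR figures quoted above are in decibels. A common way to compute a reconstruction SNR is the ratio of reference energy to residual energy, sketched below; whether the paper uses exactly this definition is an assumption, and the function name `snr_db` is hypothetical.

```python
import math
from typing import Sequence


def snr_db(reference: Sequence[float], recovered: Sequence[float]) -> float:
    """Return 10 * log10(||reference||^2 / ||reference - recovered||^2).

    Assumes both sequences have the same length and the reference is non-zero.
    """
    signal = sum(r * r for r in reference)
    noise = sum((r - e) ** 2 for r, e in zip(reference, recovered))
    if signal == 0:
        raise ValueError("reference signal has zero energy")
    if noise == 0:
        return math.inf  # perfect reconstruction
    return 10 * math.log10(signal / noise)
```

Under this definition, every 10 dB of SNR corresponds to a tenfold reduction in residual energy relative to the reference.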
SegNAS3D: Network Architecture Search with Derivative-Free Global Optimization for 3D Image Segmentation
Title | SegNAS3D: Network Architecture Search with Derivative-Free Global Optimization for 3D Image Segmentation |
Authors | Ken C. L. Wong, Mehdi Moradi |
Abstract | Deep learning has largely reduced the need for manual feature selection in image segmentation. Nevertheless, network architecture optimization and hyperparameter tuning are mostly manual and time consuming. Although there are increasing research efforts on network architecture search in computer vision, most works concentrate on image classification but not segmentation, and there are very limited efforts on medical image segmentation especially in 3D. To remedy this, here we propose a framework, SegNAS3D, for network architecture search of 3D image segmentation. In this framework, a network architecture comprises interconnected building blocks that consist of operations such as convolution and skip connection. By representing the block structure as a learnable directed acyclic graph, hyperparameters such as the number of feature channels and the option of using deep supervision can be learned together through derivative-free global optimization. Experiments on 43 3D brain magnetic resonance images with 19 structures achieved an average Dice coefficient of 82%. Each architecture search required less than three days on three GPUs and produced architectures that were much smaller than the state-of-the-art manually created architectures. |
Tasks | Feature Selection, Image Classification, Medical Image Segmentation, Semantic Segmentation |
Published | 2019-09-12 |
URL | https://arxiv.org/abs/1909.05962v1 |
https://arxiv.org/pdf/1909.05962v1.pdf | |
PWC | https://paperswithcode.com/paper/segnas3d-network-architecture-search-with |
Repo | |
Framework | |
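The Dice coefficient reported above measures overlap between a predicted and a reference segmentation. A minimal sketch for a single structure follows, assuming binary masks flattened into equal-length sequences; the function name `dice` and the convention of returning 1.0 for two empty masks are choices made here for illustration.

```python
from typing import Sequence


def dice(pred: Sequence[int], truth: Sequence[int]) -> float:
    """Dice = 2 * |pred AND truth| / (|pred| + |truth|) for flattened binary masks."""
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    if total == 0:
        return 1.0  # both masks empty: treated as perfect agreement (assumed convention)
    return 2 * intersection / total
```

For example, `dice([1, 1, 0, 0], [1, 0, 1, 0])` evaluates to 0.5. An average over several structures, as in the abstract, would apply this per structure and take the mean.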
Generalized tensor regression with covariates on multiple modes
Title | Generalized tensor regression with covariates on multiple modes |
Authors | Zhuoyan Xu, Jiaxin Hu, Miaoyan Wang |
Abstract | We consider the problem of tensor-response regression given covariates on multiple modes. Such data problems arise frequently in applications such as neuroimaging, network analysis, and spatial-temporal modeling. We propose a new family of tensor response regression models that incorporate covariates, and establish theoretical accuracy guarantees. Unlike earlier methods, our estimation allows high-dimensionality in both the tensor response and the covariate matrices on multiple modes. An efficient alternating updating algorithm is further developed. Our proposal handles a broad range of data types, including continuous, count, and binary observations. Through simulation and applications to two real datasets, we demonstrate that our approach outperforms the state of the art. |
Tasks | |
Published | 2019-10-21 |
URL | https://arxiv.org/abs/1910.09499v1 |
https://arxiv.org/pdf/1910.09499v1.pdf | |
PWC | https://paperswithcode.com/paper/generalized-tensor-regression-with-covariates |
Repo | |
Framework | |
A CNN-based approach to classify cricket bowlers based on their bowling actions
Title | A CNN-based approach to classify cricket bowlers based on their bowling actions |
Authors | Md Nafee Al Islam, Tanzil Bin Hassan, Siamul Karim Khan |
Abstract | With the advances in hardware technologies and deep learning techniques, it has become feasible to apply these techniques in diverse fields. Convolutional Neural Network (CNN), an architecture from the field of deep learning, has revolutionized Computer Vision. Sports is one of the avenues in which the use of computer vision is thriving. Cricket is a complex game consisting of different types of shots, bowling actions and many other activities. Every bowler, in a game of cricket, bowls with a different bowling action. We leverage this point to identify different bowlers. In this paper, we have proposed a CNN model to identify eighteen different cricket bowlers based on their bowling actions using transfer learning. Additionally, we have created a completely new dataset containing 8100 images of these eighteen bowlers to train the proposed framework and evaluate its performance. We have used the VGG16 model pre-trained with the ImageNet dataset and added a few layers on top of it to build our model. After trying out different strategies, we found that freezing the weights for the first 14 layers of the network and training the rest of the layers works best. Our approach achieves an overall average accuracy of 93.3% on the test set and converges to a very low cross-entropy loss. |
Tasks | Game of Cricket, Transfer Learning |
Published | 2019-09-03 |
URL | https://arxiv.org/abs/1909.01228v1 |
https://arxiv.org/pdf/1909.01228v1.pdf | |
PWC | https://paperswithcode.com/paper/a-cnn-based-approach-to-classify-cricket |
Repo | |
Framework | |
Graph Nets for Partial Charge Prediction
Title | Graph Nets for Partial Charge Prediction |
Authors | Yuanqing Wang, Josh Fass, Chaya D. Stern, Kun Luo, John Chodera |
Abstract | Atomic partial charges are crucial parameters for Molecular Dynamics (MD) simulations, molecular mechanics calculations, and virtual screening, as they determine the electrostatic contributions to interaction energies. Current methods for calculating partial charges, however, are either slow and scale poorly with molecular size (quantum chemical methods) or unreliable (empirical methods). Here, we present a new charge derivation method based on Graph Nets—a set of update and aggregate functions that operate on molecular topologies and propagate information thereon—that could approximate charges derived from Density Functional Theory (DFT) calculations with high accuracy and an over 500-fold speed up. |
Tasks | |
Published | 2019-09-17 |
URL | https://arxiv.org/abs/1909.07903v1 |
https://arxiv.org/pdf/1909.07903v1.pdf | |
PWC | https://paperswithcode.com/paper/graph-nets-for-partial-charge-prediction |
Repo | |
Framework | |
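The idea of update and aggregate functions operating on a molecular topology can be illustrated with one round of message passing. This is a toy sketch only: the sum aggregation, the fixed 0.5/0.5 update rule, scalar node features, and the function name `message_passing_round` are assumptions for illustration, whereas the method in the paper uses learned functions.

```python
from typing import Dict, List, Sequence, Tuple


def message_passing_round(node_features: Sequence[float],
                          bonds: Sequence[Tuple[int, int]]) -> List[float]:
    """Run one aggregate-then-update step over an undirected molecular graph.

    node_features[i] is a scalar feature for atom i; bonds lists atom index pairs.
    """
    neighbors: Dict[int, List[int]] = {i: [] for i in range(len(node_features))}
    for a, b in bonds:
        neighbors[a].append(b)
        neighbors[b].append(a)

    updated = []
    for i, h in enumerate(node_features):
        # Aggregate: sum the current features of bonded atoms.
        aggregated = sum(node_features[j] for j in neighbors[i])
        # Update: fixed blend of own feature and aggregate (learned in practice).
        updated.append(0.5 * h + 0.5 * aggregated)
    return updated
```

On a three-atom chain with features `[1.0, 0.0, 0.0]` and bonds `[(0, 1), (1, 2)]`, one round yields `[0.5, 0.5, 0.0]`: information from atom 0 reaches its neighbor, and further rounds would carry it to atom 2.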