January 29, 2020


Paper Group ANR 727


A numerical measure of the instability of Mapper-type algorithms. Learning to Generalize to Unseen Tasks with Bilevel Optimization. Convergence Rates of Variational Inference in Sparse Deep Learning. On the Global Convergence of Actor-Critic: A Case for Linear Quadratic Regulator with Ergodic Cost. Machine: The New Art Connoisseur. Improving Model …

A numerical measure of the instability of Mapper-type algorithms

Title A numerical measure of the instability of Mapper-type algorithms
Authors Francisco Belchí, Jacek Brodzki, Matthew Burfitt, Mahesan Niranjan
Abstract Mapper is an unsupervised machine learning algorithm generalising the notion of clustering to obtain a geometric description of a dataset. The procedure splits the data into possibly overlapping bins which are then clustered. The output of the algorithm is a graph where nodes represent clusters and edges represent the sharing of data points between two clusters. However, several parameters must be selected before applying Mapper and the resulting graph may vary dramatically with the choice of parameters. We define an intrinsic notion of Mapper instability that measures the variability of the output as a function of the choice of parameters required to construct a Mapper output. Our results and discussion are general and apply to all Mapper-type algorithms. We derive theoretical results that provide estimates for the instability and suggest practical ways to control it. We provide also experiments to illustrate our results and in particular we demonstrate that a reliable candidate Mapper output can be identified as a local minimum of instability regarded as a function of Mapper input parameters.
Tasks
Published 2019-06-04
URL https://arxiv.org/abs/1906.01507v1
PDF https://arxiv.org/pdf/1906.01507v1.pdf
PWC https://paperswithcode.com/paper/a-numerical-measure-of-the-instability-of
Repo
Framework
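
The procedure described in the abstract above (overlapping bins, clustering inside each bin, edges for shared points) can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the names `interval_cover`, `single_linkage` and `mapper`, the one-dimensional interval cover, and the threshold-based single-linkage clustering are all assumptions chosen for brevity.

```python
import math
from itertools import combinations

def interval_cover(lo, hi, n_bins, overlap):
    """Split [lo, hi] into n_bins intervals, each widened so neighbours overlap."""
    width = (hi - lo) / n_bins
    pad = width * overlap / 2
    return [(lo + i * width - pad, lo + (i + 1) * width + pad) for i in range(n_bins)]

def single_linkage(points, idx, eps):
    """Cluster the given indices: connected components of the eps-neighbourhood graph."""
    clusters, unseen = [], set(idx)
    while unseen:
        stack = [unseen.pop()]
        comp = set(stack)
        while stack:
            p = stack.pop()
            near = {q for q in unseen if math.dist(points[p], points[q]) <= eps}
            unseen -= near
            comp |= near
            stack.extend(near)
        clusters.append(frozenset(comp))
    return clusters

def mapper(points, filter_fn, n_bins, overlap, eps):
    """Return (nodes, edges): nodes are clusters, edges join clusters sharing a point."""
    values = [filter_fn(p) for p in points]
    nodes = []
    for a, b in interval_cover(min(values), max(values), n_bins, overlap):
        idx = [i for i, v in enumerate(values) if a <= v <= b]
        nodes.extend(single_linkage(points, idx, eps))
    edges = {(i, j) for i, j in combinations(range(len(nodes)), 2) if nodes[i] & nodes[j]}
    return nodes, edges

# Five points on a line, filtered by their x coordinate.
pts = [(0, 0), (1, 0), (2, 0), (3, 0), (4, 0)]
coarse_nodes, coarse_edges = mapper(pts, lambda p: p[0], n_bins=2, overlap=0.5, eps=1.0)
fine_nodes, fine_edges = mapper(pts, lambda p: p[0], n_bins=2, overlap=0.5, eps=0.5)
```

On this toy input, changing only the clustering threshold `eps` changes the number of nodes in the output graph, which is the kind of parameter sensitivity the abstract refers to.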

Learning to Generalize to Unseen Tasks with Bilevel Optimization

Title Learning to Generalize to Unseen Tasks with Bilevel Optimization
Authors Hayeon Lee, Donghyun Na, Hae Beom Lee, Sung Ju Hwang
Abstract Recent metric-based meta-learning approaches, which learn a metric space that generalizes well over a combinatorial number of different classification tasks sampled from a task distribution, have been shown to be effective for few-shot classification tasks of unseen classes. They are often trained with episodic training, where they iteratively train a common metric space that reduces the distance between the class representatives and instances belonging to each class, over a large number of episodes with random classes. However, this training is limited in that while the main target is the generalization to the classification of unseen classes during training, there is no explicit consideration of generalization during the meta-training phase. To tackle this issue, we propose a simple yet effective meta-learning framework for metric-based approaches, which we refer to as learning to generalize (L2G), that explicitly constrains the learning on a sampled classification task to reduce the classification error on a randomly sampled unseen classification task with a bilevel optimization scheme. This explicit learning aimed toward generalization allows the model to obtain a metric that separates well between unseen classes. We validate our L2G framework on mini-ImageNet and tiered-ImageNet datasets with two base meta-learning few-shot classification models, Prototypical Networks and Relation Networks. The results show that L2G significantly improves the performance of the two methods over episodic training. Further visualization shows that L2G obtains a metric space that clusters and separates unseen classes well.
Tasks bilevel optimization, Meta-Learning
Published 2019-08-05
URL https://arxiv.org/abs/1908.01457v1
PDF https://arxiv.org/pdf/1908.01457v1.pdf
PWC https://paperswithcode.com/paper/learning-to-generalize-to-unseen-tasks-with
Repo
Framework
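
As a rough illustration of the metric-based classification step the abstract builds on (class representatives and distances to them), here is a minimal nearest-prototype sketch in the style of Prototypical Networks. It does not implement the L2G bilevel scheme; the function names `prototypes` and `classify`, the Euclidean distance, and the toy two-dimensional embeddings are assumptions for illustration only.

```python
import math

def prototypes(support):
    """support maps each class label to a list of embedding vectors.
    Returns a mapping from label to the mean embedding (the class representative)."""
    return {
        label: [sum(coord) / len(vecs) for coord in zip(*vecs)]
        for label, vecs in support.items()
    }

def classify(query, protos):
    """Assign the query embedding to the class with the nearest prototype."""
    return min(protos, key=lambda label: math.dist(query, protos[label]))

# Hypothetical support set for one 2-way, 2-shot episode.
support = {"cat": [[0.0, 0.0], [0.0, 2.0]], "dog": [[4.0, 4.0], [6.0, 4.0]]}
protos = prototypes(support)
label_a = classify([1.0, 1.0], protos)
label_b = classify([4.0, 3.0], protos)
```

In a real system the embeddings would come from a trained network; here they are fixed by hand so the distance computation is easy to follow.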

Convergence Rates of Variational Inference in Sparse Deep Learning

Title Convergence Rates of Variational Inference in Sparse Deep Learning
Authors Badr-Eddine Chérief-Abdellatif
Abstract Variational inference is becoming more and more popular for approximating intractable posterior distributions in Bayesian statistics and machine learning. Meanwhile, a few recent works have provided theoretical justification and new insights on deep neural networks for estimating smooth functions in usual settings such as nonparametric regression. In this paper, we show that variational inference for sparse deep learning retains the same generalization properties as exact Bayesian inference. In particular, we highlight the connection between estimation and approximation theories via the classical bias-variance trade-off and show that it leads to near-minimax rates of convergence for Hölder smooth functions. Additionally, we show that the model selection framework over the neural network architecture via ELBO maximization does not overfit and adaptively achieves the optimal rate of convergence.
Tasks Bayesian Inference, Model Selection
Published 2019-08-09
URL https://arxiv.org/abs/1908.04847v2
PDF https://arxiv.org/pdf/1908.04847v2.pdf
PWC https://paperswithcode.com/paper/generalization-error-bounds-for-deep
Repo
Framework

On the Global Convergence of Actor-Critic: A Case for Linear Quadratic Regulator with Ergodic Cost

Title On the Global Convergence of Actor-Critic: A Case for Linear Quadratic Regulator with Ergodic Cost
Authors Zhuoran Yang, Yongxin Chen, Mingyi Hong, Zhaoran Wang
Abstract Despite the empirical success of the actor-critic algorithm, its theoretical understanding lags behind. In a broader context, actor-critic can be viewed as an online alternating update algorithm for bilevel optimization, whose convergence is known to be fragile. To understand the instability of actor-critic, we focus on its application to linear quadratic regulators, a simple yet fundamental setting of reinforcement learning. We establish a nonasymptotic convergence analysis of actor-critic in this setting. In particular, we prove that actor-critic finds a globally optimal pair of actor (policy) and critic (action-value function) at a linear rate of convergence. Our analysis may serve as a preliminary step towards a complete theoretical understanding of bilevel optimization with nonconvex subproblems, which is NP-hard in the worst case and is often solved using heuristics.
Tasks bilevel optimization
Published 2019-07-14
URL https://arxiv.org/abs/1907.06246v1
PDF https://arxiv.org/pdf/1907.06246v1.pdf
PWC https://paperswithcode.com/paper/on-the-global-convergence-of-actor-critic-a
Repo
Framework

Machine: The New Art Connoisseur

Title Machine: The New Art Connoisseur
Authors Yucheng Zhu, Yanrong Ji, Yueying Zhang, Linxin Xu, Aven Le Zhou, Ellick Chan
Abstract The process of identifying and understanding art styles to discover artistic influences is essential to the study of art history. Traditionally, trained experts review fine details of the works and compare them to other known works. To automate and scale this task, we use several state-of-the-art CNN architectures to explore how a machine may help perceive and quantify art styles. This study explores: (1) How accurately can a machine classify art styles? (2) What may be the underlying relationships among different styles and artists? To help answer the first question, our best-performing model using Inception V3 achieves a 9-class classification accuracy of 88.35%, which outperforms the model in Elgammal et al.'s study by more than 20 percent. Visualizations using Grad-CAM heat maps confirm that the model correctly focuses on the characteristic parts of paintings. To help address the second question, we conduct network analysis on the influences among styles and artists by extracting 512 features from the best-performing classification model. Through 2D and 3D t-SNE visualizations, we observe clear chronological patterns of development and separation among the art styles. The network analysis also appears to show anticipated artist-level connections from an art historical perspective. This technique appears to help identify some previously unknown linkages that may shed light upon new directions for further exploration by art historians. We hope that humans and machines working in concert may bring new opportunities to the field.
Tasks
Published 2019-11-22
URL https://arxiv.org/abs/1911.10091v2
PDF https://arxiv.org/pdf/1911.10091v2.pdf
PWC https://paperswithcode.com/paper/machine-the-new-art-connoisseur
Repo
Framework

Improving Model Drift for Robust Object Tracking

Title Improving Model Drift for Robust Object Tracking
Authors Qiujie Dong, Xuedong He, Haiyan Ge, Qin Liu, Aifu Han, Shengzong Zhou
Abstract Discriminative correlation filters show excellent performance in object tracking. However, in complex scenes, the apparent characteristics of the tracked target are variable, which makes it easy to pollute the model and cause model drift. In this paper, considering that the secondary peak has a greater impact on the model update, we first propose a method for detecting the primary and secondary peaks of the response map. Secondly, a novel confidence function which uses the adaptive update discriminant mechanism is proposed, which yields good robustness. Thirdly, we propose a robust tracker with correlation filters, which uses hand-crafted features and can reduce model drift in complex scenes. Finally, in order to cope with the current trackers' multi-feature response merge, we propose a simple exponential adaptive merge approach. Extensive experiments are performed on the OTB2013, OTB100 and TC128 datasets. Our approach performs favorably against several state-of-the-art trackers while running in real time.
Tasks Object Tracking
Published 2019-12-02
URL https://arxiv.org/abs/1912.00826v1
PDF https://arxiv.org/pdf/1912.00826v1.pdf
PWC https://paperswithcode.com/paper/improving-model-drift-for-robust-object
Repo
Framework

Gaze Gestures and Their Applications in human-computer interaction with a head-mounted display

Title Gaze Gestures and Their Applications in human-computer interaction with a head-mounted display
Authors W. X. Chen, X. Y. Cui, J. Zheng, J. M. Zhang, S. Chen, Y. D. Yao
Abstract A head-mounted display (HMD) is a portable and interactive display device. With the development of 5G technology, it may become a general-purpose computing platform in the future. Human-computer interaction (HCI) technology for HMDs has also been of significant interest in recent years. In addition to tracking gestures and speech, tracking human eyes as a means of interaction is highly effective. In this paper, we propose two UnityEyes-based convolutional neural network models, UEGazeNet and UEGazeNet*, which can be used for input images with low resolution and high resolution, respectively. These models can perform rapid interactions by classifying gaze trajectories (GTs), and a GTgestures dataset containing data for 10,200 "eye-painting gestures" collected from 15 individuals is established with our gaze-tracking method. We evaluated the performance both indoors and outdoors, and UEGazeNet can obtain results 52% and 67% better than those of state-of-the-art networks. The generalizability of our GTgestures dataset using a variety of gaze-tracking models is evaluated, and an average recognition rate of 96.71% is obtained by our method.
Tasks
Published 2019-10-16
URL https://arxiv.org/abs/1910.07428v1
PDF https://arxiv.org/pdf/1910.07428v1.pdf
PWC https://paperswithcode.com/paper/gaze-gestures-and-their-applications-in-human
Repo
Framework

Representing Schema Structure with Graph Neural Networks for Text-to-SQL Parsing

Title Representing Schema Structure with Graph Neural Networks for Text-to-SQL Parsing
Authors Ben Bogin, Matt Gardner, Jonathan Berant
Abstract Research on parsing language to SQL has largely ignored the structure of the database (DB) schema, either because the DB was very simple, or because it was observed at both training and test time. In Spider, a recently-released text-to-SQL dataset, new and complex DBs are given at test time, and so the structure of the DB schema can inform the predicted SQL query. In this paper, we present an encoder-decoder semantic parser, where the structure of the DB schema is encoded with a graph neural network, and this representation is later used at both encoding and decoding time. Evaluation shows that encoding the schema structure improves our parser accuracy from 33.8% to 39.4%, dramatically above the current state of the art, which is at 19.7%.
Tasks Text-To-Sql
Published 2019-05-15
URL https://arxiv.org/abs/1905.06241v2
PDF https://arxiv.org/pdf/1905.06241v2.pdf
PWC https://paperswithcode.com/paper/representing-schema-structure-with-graph
Repo
Framework

Visual enhancement of Cone-beam CT by use of CycleGAN

Title Visual enhancement of Cone-beam CT by use of CycleGAN
Authors S. Kida, S. Kaji, K. Nawa, T. Imae, T. Nakamoto, S. Ozaki, T. Ohta, Y. Nozawa, K. Nakagawa
Abstract Cone-beam computed tomography (CBCT) offers advantages over conventional fan-beam CT in that it requires a shorter time and less exposure to obtain images. CBCT has found a wide variety of applications in patient positioning for image-guided radiation therapy, extracting radiomic information for designing patient-specific treatment, and computing fractional dose distributions for adaptive radiation therapy. However, CBCT images suffer from low soft-tissue contrast, noise, and artifacts compared to conventional fan-beam CT images. Therefore, it is essential to improve the image quality of CBCT. In this paper, we propose a synthetic approach to translate CBCT images with deep neural networks. Our method requires only unpaired and unaligned CBCT images and planning fan-beam CT (PlanCT) images for training. Once trained, 3D reconstructed CBCT images can be directly translated to high-quality PlanCT-like images. We demonstrate the effectiveness of our method with images obtained from 24 prostate patients, and we provide a statistical and visual comparison. The image quality of the translated images shows substantial improvement in voxel values, spatial uniformity, and artifact suppression compared to those of the original CBCT. The anatomical structures of the original CBCT images were also well preserved in the translated images. Our method enables more accurate adaptive radiation therapy, and opens up new applications for CBCT that hinge on high-quality images.
Tasks
Published 2019-01-17
URL https://arxiv.org/abs/1901.05773v3
PDF https://arxiv.org/pdf/1901.05773v3.pdf
PWC https://paperswithcode.com/paper/cone-beam-ct-to-planning-ct-synthesis-using
Repo
Framework

Multi-Perspective Fusion Network for Commonsense Reading Comprehension

Title Multi-Perspective Fusion Network for Commonsense Reading Comprehension
Authors Chunhua Liu, Yan Zhao, Qingyi Si, Haiou Zhang, Bohan Li, Dong Yu
Abstract Commonsense Reading Comprehension (CRC) is a significantly challenging task, aiming at choosing the right answer for the question referring to a narrative passage, which may require commonsense knowledge inference. Most of the existing approaches only fuse the interaction information of choice, passage, and question in a simple combination manner from a union perspective, which lacks the comparison information on a deeper level. Instead, we propose a Multi-Perspective Fusion Network (MPFN), extending the single fusion method with multiple perspectives by introducing the difference and similarity fusion. More comprehensive and accurate information can be captured through the three types of fusion. We design several groups of experiments on the MCScript dataset (Ostermann et al., LREC 2018) to evaluate the effectiveness of the three types of fusion respectively. From the experimental results, we can conclude that the difference fusion is comparable with union fusion, and the similarity fusion needs to be activated by the union fusion. The experimental result also shows that our MPFN model achieves the state-of-the-art with an accuracy of 83.52% on the official test set.
Tasks Reading Comprehension
Published 2019-01-08
URL http://arxiv.org/abs/1901.02257v1
PDF http://arxiv.org/pdf/1901.02257v1.pdf
PWC https://paperswithcode.com/paper/multi-perspective-fusion-network-for
Repo
Framework

Precipitation nowcasting using a stochastic variational frame predictor with learned prior distribution

Title Precipitation nowcasting using a stochastic variational frame predictor with learned prior distribution
Authors Alexander Bihlo
Abstract We propose the use of a stochastic variational frame prediction deep neural network with a learned prior distribution trained on two-dimensional rain radar reflectivity maps for precipitation nowcasting with lead times of up to 2 1/2 hours. We present a comparison to a standard convolutional LSTM network and assess the evolution of the structural similarity index for both methods. Case studies are presented that illustrate that the novel methodology can yield meaningful forecasts without excessive blur for the time horizons of interest.
Tasks
Published 2019-05-13
URL https://arxiv.org/abs/1905.05037v1
PDF https://arxiv.org/pdf/1905.05037v1.pdf
PWC https://paperswithcode.com/paper/precipitation-nowcasting-using-a-stochastic
Repo
Framework

How do Mixture Density RNNs Predict the Future?

Title How do Mixture Density RNNs Predict the Future?
Authors Kai Olav Ellefsen, Charles Patrick Martin, Jim Torresen
Abstract Gaining a better understanding of how and what machine learning systems learn is important to increase confidence in their decisions and catalyze further research. In this paper, we analyze the predictions made by a specific type of recurrent neural network, mixture density RNNs (MD-RNNs). These networks learn to model predictions as a combination of multiple Gaussian distributions, making them particularly interesting for problems where a sequence of inputs may lead to several distinct future possibilities. An example is learning internal models of an environment, where different events may or may not occur, but where the average over different events is not meaningful. By analyzing the predictions made by trained MD-RNNs, we find that their different Gaussian components have two complementary roles: 1) Separately modeling different stochastic events and 2) Separately modeling scenarios governed by different rules. These findings increase our understanding of what is learned by predictive MD-RNNs, and open up new research directions for further understanding how we can benefit from their self-organizing model decomposition.
Tasks
Published 2019-01-23
URL http://arxiv.org/abs/1901.07859v1
PDF http://arxiv.org/pdf/1901.07859v1.pdf
PWC https://paperswithcode.com/paper/how-do-mixture-density-rnns-predict-the
Repo
Framework
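
The abstract's point that the average over distinct outcomes may not be meaningful can be seen in a small worked example with a two-component Gaussian mixture. This is only an illustrative sketch: the function names and the specific weights, means and standard deviations are assumptions, and no recurrent network is involved.

```python
import math

def gaussian_pdf(x, mu, sigma):
    """Density of a one-dimensional Gaussian N(mu, sigma^2) at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def mixture_pdf(x, weights, mus, sigmas):
    """Density of a Gaussian mixture: weighted sum of the component densities."""
    return sum(w * gaussian_pdf(x, m, s) for w, m, s in zip(weights, mus, sigmas))

def mixture_mean(weights, mus):
    """Mean of the mixture: weighted sum of the component means."""
    return sum(w * m for w, m in zip(weights, mus))

# Two equally likely futures, centred at -2 and +2 (hypothetical values).
weights, mus, sigmas = [0.5, 0.5], [-2.0, 2.0], [0.5, 0.5]
mean = mixture_mean(weights, mus)
density_at_mean = mixture_pdf(mean, weights, mus, sigmas)
density_at_mode = mixture_pdf(2.0, weights, mus, sigmas)
```

Here the mixture mean falls between the two components, where the density is far lower than at either component centre, so a single averaged prediction would describe an outcome the model considers very unlikely.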

Spam Review Detection with Graph Convolutional Networks

Title Spam Review Detection with Graph Convolutional Networks
Authors Ao Li, Zhou Qin, Runshi Liu, Yiqun Yang, Dong Li
Abstract Customers write a lot of reviews on online shopping websites every day, e.g., Amazon and Taobao. Reviews affect the buying decisions of customers and, meanwhile, attract lots of spammers aiming at misleading buyers. Xianyu, the largest second-hand goods app in China, is suffering from spam reviews. The anti-spam system of Xianyu faces two major challenges: scalability of the data and adversarial actions taken by spammers. In this paper, we present our technical solutions to address these challenges. We propose a large-scale anti-spam method based on graph convolutional networks (GCN) for detecting spam advertisements at Xianyu, named the GCN-based Anti-Spam (GAS) model. In this model, a heterogeneous graph and a homogeneous graph are integrated to capture the local context and global context of a comment. Offline experiments show that the proposed method is superior to our baseline model, in which the information of reviews and the features of users and of the items being reviewed are utilized. Furthermore, we deploy our system to process million-scale data daily at Xianyu. The online performance also demonstrates the effectiveness of the proposed method.
Tasks
Published 2019-08-22
URL https://arxiv.org/abs/1908.10679v1
PDF https://arxiv.org/pdf/1908.10679v1.pdf
PWC https://paperswithcode.com/paper/spam-review-detection-with-graph
Repo
Framework

Towards a general model for psychopathology

Title Towards a general model for psychopathology
Authors Alessandro Fontana
Abstract The DSM-1 was published in 1952 and contains 128 diagnostic categories, described in 132 pages. The DSM-5 appeared in 2013 and contains 541 diagnostic categories, described in 947 pages. The field of psychology is characterised by a steady proliferation of diagnostic models and subcategories that seems to be inspired by the principle of "divide and inflate". This approach is in contrast with experimental evidence, which suggests on one hand that traumas of various kinds are often present in the anamnesis of patients and, on the other, that the gene variants implicated are shared across a wide range of diagnoses. In this work I propose a holistic approach, built with tools borrowed from the field of Artificial Intelligence. My model is based on two pillars. The first one is trauma, which represents the attack on the mind, is psychological in nature and has its origin in the environment. The second pillar is dissociation, which represents the mind's defence in both physiological and pathological conditions, and incorporates all other defence mechanisms. Damage to dissociation can be considered as another category of attacks, which are neurobiological in nature and can be of genetic or environmental origin. They include, among other factors, synaptic over-pruning, abuse of drugs and inflammation. These factors concur to weaken the defence, represented by the neural networks that implement the dissociation mechanism in the brain. The model is subsequently used to interpret five mental conditions: PTSD, complex PTSD, dissociative identity disorder, schizophrenia and bipolar disorder. Ideally, this is a first step towards building a model that aims to explain a wider range of psychopathological affections with a single theoretical framework. The last part is dedicated to sketching a new psychotherapy for psychological trauma.
Tasks
Published 2019-09-05
URL https://arxiv.org/abs/1909.02199v1
PDF https://arxiv.org/pdf/1909.02199v1.pdf
PWC https://paperswithcode.com/paper/towards-a-general-model-for-psychopathology
Repo
Framework

Ternary MobileNets via Per-Layer Hybrid Filter Banks

Title Ternary MobileNets via Per-Layer Hybrid Filter Banks
Authors Dibakar Gope, Jesse Beu, Urmish Thakker, Matthew Mattina
Abstract MobileNets family of computer vision neural networks have fueled tremendous progress in the design and organization of resource-efficient architectures in recent years. New applications with stringent real-time requirements on highly constrained devices require further compression of MobileNets-like already compute-efficient networks. Model quantization is a widely used technique to compress and accelerate neural network inference and prior works have quantized MobileNets to 4-6 bits albeit with a modest to significant drop in accuracy. While quantization to sub-byte values (i.e. precision less than or equal to 8 bits) has been valuable, even further quantization of MobileNets to binary or ternary values is necessary to realize significant energy savings and possibly runtime speedups on specialized hardware, such as ASICs and FPGAs. Under the key observation that convolutional filters at each layer of a deep neural network may respond differently to ternary quantization, we propose a novel quantization method that generates per-layer hybrid filter banks consisting of full-precision and ternary weight filters for MobileNets. The layer-wise hybrid filter banks essentially combine the strengths of full-precision and ternary weight filters to derive a compact, energy-efficient architecture for MobileNets. Using this proposed quantization method, we quantized a substantial portion of weight filters of MobileNets to ternary values resulting in 27.98% savings in energy, and a 51.07% reduction in the model size, while achieving comparable accuracy and no degradation in throughput on specialized hardware in comparison to the baseline full-precision MobileNets.
Tasks Quantization
Published 2019-11-04
URL https://arxiv.org/abs/1911.01028v1
PDF https://arxiv.org/pdf/1911.01028v1.pdf
PWC https://paperswithcode.com/paper/ternary-mobilenets-via-per-layer-hybrid-1
Repo
Framework
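
To make the idea of ternary weight values concrete, here is a minimal sketch of threshold-based ternarization of a single filter. It is not the per-layer hybrid filter bank method from the abstract above; it uses a simple magnitude threshold in the style of earlier ternary weight network work, and the function name `ternarize`, the `delta_factor` of 0.7 and the example weights are assumptions.

```python
def ternarize(weights, delta_factor=0.7):
    """Map full-precision weights to codes in {-1, 0, +1} plus one shared scale.

    Weights whose magnitude is at most delta are zeroed; the rest keep only
    their sign. The scale is the mean magnitude of the weights that were kept.
    """
    delta = delta_factor * sum(abs(w) for w in weights) / len(weights)
    codes = [1 if w > delta else -1 if w < -delta else 0 for w in weights]
    kept = [abs(w) for w, c in zip(weights, codes) if c != 0]
    scale = sum(kept) / len(kept) if kept else 0.0
    return codes, scale

# Hypothetical weights of one small filter.
codes, scale = ternarize([0.9, -0.8, 0.05, -0.1, 0.7])
approx = [scale * c for c in codes]  # dequantized approximation of the filter
```

Each ternary code needs only two bits instead of a full floating-point value, which is where the model size savings of this family of methods come from; the accuracy cost depends on how well `approx` matches the original weights.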