April 2, 2020

Paper Group ANR 189

Event-Based Control for Online Training of Neural Networks. Overinterpretation reveals image classification model pathologies. Copy and Paste GAN: Face Hallucination from Shaded Thumbnails. Object-based Metamorphic Testing through Image Structuring. Analysis of Hyper-Parameters for Small Games: Iterations or Epochs in Self-Play?. A Lagrangian Appro …

Event-Based Control for Online Training of Neural Networks

Title Event-Based Control for Online Training of Neural Networks
Authors Zilong Zhao, Sophie Cerf, Bogdan Robu, Nicolas Marchand
Abstract Convolutional Neural Networks (CNNs) have become the most used method for image classification tasks. During training, the learning rate and the gradient are two key factors to tune, since they influence the convergence speed of the model. Usual learning rate strategies are time-based, i.e. monotonic decay over time. Recent state-of-the-art techniques focus on adaptive gradient algorithms, e.g. Adam and its variants. In this paper we consider an online learning scenario and propose two Event-Based control loops to adjust the learning rate of a classical algorithm, E (Exponential)/PD (Proportional Derivative)-Control. The first Event-Based control loop is implemented to prevent a sudden drop of the learning rate when the model is approaching the optimum. The second Event-Based control loop decides, based on the learning speed, when to switch to the next data batch. Experimental evaluation is provided using two state-of-the-art machine learning image datasets (CIFAR-10 and CIFAR-100). Results show that the Event-Based E/PD is better than the original algorithm (higher final accuracy, lower final loss value), and that the Double-Event-Based E/PD can accelerate the training process, saving up to 67% of training time compared to state-of-the-art algorithms while even resulting in better performance.
Tasks Image Classification
Published 2020-03-20
URL https://arxiv.org/abs/2003.09503v1
PDF https://arxiv.org/pdf/2003.09503v1.pdf
PWC https://paperswithcode.com/paper/event-based-control-for-online-training-of

Overinterpretation reveals image classification model pathologies

Title Overinterpretation reveals image classification model pathologies
Authors Brandon Carter, Siddhartha Jain, Jonas Mueller, David Gifford
Abstract Image classifiers are typically scored on their test set accuracy, but high accuracy can mask a subtle type of model failure. We find that high scoring convolutional neural networks (CNN) exhibit troubling pathologies that allow them to display high accuracy even in the absence of semantically salient features. When a model provides a high-confidence decision without salient supporting input features we say that the classifier has overinterpreted its input, finding too much class-evidence in patterns that appear nonsensical to humans. Here, we demonstrate that state of the art neural networks for CIFAR-10 and ImageNet suffer from overinterpretation, and find CIFAR-10 trained models make confident predictions even when 95% of an input image has been masked and humans are unable to discern salient features in the remaining pixel subset. Although these patterns portend potential model fragility in real-world deployment, they are in fact valid statistical patterns of the image classification benchmark that alone suffice to attain high test accuracy. We find that ensembling strategies can help mitigate model overinterpretation, and classifiers which rely on more semantically meaningful features can improve accuracy over both the test set and out-of-distribution images from a different source than the training data.
Tasks Image Classification
Published 2020-03-19
URL https://arxiv.org/abs/2003.08907v1
PDF https://arxiv.org/pdf/2003.08907v1.pdf
PWC https://paperswithcode.com/paper/overinterpretation-reveals-image

Copy and Paste GAN: Face Hallucination from Shaded Thumbnails

Title Copy and Paste GAN: Face Hallucination from Shaded Thumbnails
Authors Yang Zhang, Ivor Tsang, Yawei Luo, Changhui Hu, Xiaobo Lu, Xin Yu
Abstract Existing face hallucination methods based on convolutional neural networks (CNN) have achieved impressive performance on low-resolution (LR) faces in a normal illumination condition. However, their performance degrades dramatically when LR faces are captured in low or non-uniform illumination conditions. This paper proposes a Copy and Paste Generative Adversarial Network (CPGAN) to recover authentic high-resolution (HR) face images while compensating for low and non-uniform illumination. To this end, we develop two key components in our CPGAN: internal and external Copy and Paste nets (CPnets). Specifically, our internal CPnet exploits facial information residing in the input image to enhance facial details; while our external CPnet leverages an external HR face for illumination compensation. A new illumination compensation loss is thus developed to capture illumination from the external guided face image effectively. Furthermore, our method offsets illumination and upsamples facial details alternately in a coarse-to-fine fashion, thus alleviating the correspondence ambiguity between LR inputs and external HR inputs. Extensive experiments demonstrate that our method manifests authentic HR face images in a uniform illumination condition and outperforms state-of-the-art methods qualitatively and quantitatively.
Tasks Face Hallucination
Published 2020-02-25
URL https://arxiv.org/abs/2002.10650v3
PDF https://arxiv.org/pdf/2002.10650v3.pdf
PWC https://paperswithcode.com/paper/copy-and-paste-gan-face-hallucination-from

Object-based Metamorphic Testing through Image Structuring

Title Object-based Metamorphic Testing through Image Structuring
Authors Adrian Wildandyawan, Yasuharu Nishi
Abstract Testing software is often costly due to the need for mass-producing test cases and providing a test oracle for them. This is often referred to as the oracle problem. One method that has been proposed to alleviate the oracle problem is metamorphic testing. Metamorphic testing produces new test cases by altering an existing test case, and uses the metamorphic relation between the inputs and the outputs of the System Under Test (SUT) to predict the expected outputs of the produced test cases. Metamorphic testing has often been used for image processing software, where changes are applied to the image’s attributes to create new test cases with annotations that are the same as those of the original image. We refer to this existing method as image-based metamorphic testing. In this research, we propose object-based metamorphic testing and composite metamorphic testing, which combines different metamorphic testing approaches to increase relative test coverage.
Published 2020-02-12
URL https://arxiv.org/abs/2002.07046v1
PDF https://arxiv.org/pdf/2002.07046v1.pdf
PWC https://paperswithcode.com/paper/object-based-metamorphic-testing-through
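
The image-based metamorphic testing that the abstract describes as the existing method can be illustrated with a minimal sketch. Everything named here is hypothetical and not taken from the paper: `classify_brightness` stands in for the System Under Test, and a horizontal flip is assumed as the label-preserving transformation.

```python
# Hypothetical system under test (SUT): labels a grayscale image
# (a list of pixel rows) by its mean brightness.
def classify_brightness(image):
    pixels = [p for row in image for p in row]
    return "bright" if sum(pixels) / len(pixels) > 127 else "dark"

# Image-level transformation used to derive a follow-up test case:
# mirror every row horizontally.
def flip_horizontal(image):
    return [list(reversed(row)) for row in image]

# Assumed metamorphic relation: flipping must not change the label, so the
# output on the source image acts as the expected output of the follow-up
# test case and no hand-written oracle is needed.
def check_flip_relation(sut, image):
    source_output = sut(image)
    follow_up_output = sut(flip_horizontal(image))
    return source_output == follow_up_output

image = [[200, 220, 10], [180, 240, 30]]
print(check_flip_relation(classify_brightness, image))
```

A SUT that only inspects the first pixel would violate the relation on this image, which is how a metamorphic test exposes a defect without knowing the correct label.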

Analysis of Hyper-Parameters for Small Games: Iterations or Epochs in Self-Play?

Title Analysis of Hyper-Parameters for Small Games: Iterations or Epochs in Self-Play?
Authors Hui Wang, Michael Emmerich, Mike Preuss, Aske Plaat
Abstract The landmark achievements of AlphaGo Zero have created great research interest in self-play in reinforcement learning. In self-play, Monte Carlo Tree Search is used to train a deep neural network that is then used in tree searches. Training itself is governed by many hyper-parameters. There has been surprisingly little research on design choices for hyper-parameter values and loss functions, presumably because of the prohibitive computational cost of exploring the parameter space. In this paper, we investigate 12 hyper-parameters in an AlphaZero-like self-play algorithm and evaluate how these parameters contribute to training. We use small games to achieve meaningful exploration with moderate computational effort. The experimental results show that training is highly sensitive to hyper-parameter choices. Through multi-objective analysis we identify 4 important hyper-parameters to further assess. To start, we find surprising results where too much training can sometimes lead to lower performance. Our main result is that the number of self-play iterations subsumes MCTS-search simulations, game-episodes, and training epochs. The intuition is that these three increase together as self-play iterations increase, and that increasing them individually is sub-optimal. A consequence of our experiments is a direct recommendation for setting hyper-parameter values in self-play: the overarching outer-loop of self-play iterations should be maximized, in favor of the three inner-loop hyper-parameters, which should be set at lower values. A secondary result of our experiments concerns the choice of optimization goals, for which we also provide recommendations.
Published 2020-03-12
URL https://arxiv.org/abs/2003.05988v1
PDF https://arxiv.org/pdf/2003.05988v1.pdf
PWC https://paperswithcode.com/paper/analysis-of-hyper-parameters-for-small-games

A Lagrangian Approach to Information Propagation in Graph Neural Networks

Title A Lagrangian Approach to Information Propagation in Graph Neural Networks
Authors Matteo Tiezzi, Giuseppe Marra, Stefano Melacci, Marco Maggini, Marco Gori
Abstract In many real world applications, data are characterized by a complex structure, that can be naturally encoded as a graph. In the last years, the popularity of deep learning techniques has renewed the interest in neural models able to process complex patterns. In particular, inspired by the Graph Neural Network (GNN) model, different architectures have been proposed to extend the original GNN scheme. GNNs exploit a set of state variables, each assigned to a graph node, and a diffusion mechanism of the states among neighbor nodes, to implement an iterative procedure to compute the fixed point of the (learnable) state transition function. In this paper, we propose a novel approach to the state computation and the learning algorithm for GNNs, based on a constraint optimisation task solved in the Lagrangian framework. The state convergence procedure is implicitly expressed by the constraint satisfaction mechanism and does not require a separate iterative phase for each epoch of the learning procedure. In fact, the computational structure is based on the search for saddle points of the Lagrangian in the adjoint space composed of weights, neural outputs (node states), and Lagrange multipliers. The proposed approach is compared experimentally with other popular models for processing graphs.
Published 2020-02-18
URL https://arxiv.org/abs/2002.07684v2
PDF https://arxiv.org/pdf/2002.07684v2.pdf
PWC https://paperswithcode.com/paper/a-lagrangian-approach-to-information
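
The iterative fixed-point computation that the abstract attributes to the original GNN scheme (and that the Lagrangian formulation is meant to avoid) can be sketched as follows. This is only an illustration under stated assumptions: the scalar node states, the tanh transition function, and the `weight` parameter are hypothetical choices made so that the map is a contraction, not the model used in the paper.

```python
import math

def fixed_point_states(neighbors, features, weight=0.5, tol=1e-10, max_iters=1000):
    """Iterate a toy state transition function until the node states stop changing.

    neighbors[v] lists the nodes adjacent to v; features[v] is a scalar input.
    With |weight| < 1 the update is a contraction, so the iteration converges.
    """
    states = [0.0] * len(features)
    for iteration in range(1, max_iters + 1):
        new_states = []
        for v, nbrs in enumerate(neighbors):
            # Diffusion step: aggregate the current states of the neighbors.
            mean_nbr = sum(states[u] for u in nbrs) / len(nbrs) if nbrs else 0.0
            new_states.append(math.tanh(weight * mean_nbr + features[v]))
        change = max(abs(a - b) for a, b in zip(new_states, states))
        states = new_states
        if change < tol:
            return states, iteration
    return states, max_iters

# Small star graph: node 0 is connected to nodes 1 and 2.
states, iterations = fixed_point_states([[1, 2], [0], [0]], [1.0, -0.5, 0.25])
print(states, iterations)
```

In a scheme of this kind the inner loop has to be rerun whenever the transition function's parameters change, which is the separate iterative phase per epoch that the abstract says the proposed approach does not require.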

Tight Lower Bounds for Combinatorial Multi-Armed Bandits

Title Tight Lower Bounds for Combinatorial Multi-Armed Bandits
Authors Nadav Merlis, Shie Mannor
Abstract The Combinatorial Multi-Armed Bandit problem is a sequential decision-making problem in which an agent selects a set of arms on each round, observes feedback for each of these arms and aims to maximize a known reward function of the arms it chose. While previous work proved regret upper bounds in this setting for general reward functions, only a few works provided matching lower bounds, all for specific reward functions. In this work, we prove regret lower bounds for combinatorial bandits that hold under mild assumptions for all smooth reward functions. We derive both problem-dependent and problem-independent bounds and show that the recently proposed Gini-weighted smoothness parameter (Merlis and Mannor, 2019) also determines the lower bounds for monotone reward functions. Notably, this implies that our lower bounds are tight up to log-factors.
Tasks Decision Making, Multi-Armed Bandits
Published 2020-02-13
URL https://arxiv.org/abs/2002.05392v1
PDF https://arxiv.org/pdf/2002.05392v1.pdf
PWC https://paperswithcode.com/paper/tight-lower-bounds-for-combinatorial-multi

Semi-Modular Inference: enhanced learning in multi-modular models by tempering the influence of components

Title Semi-Modular Inference: enhanced learning in multi-modular models by tempering the influence of components
Authors Chris U. Carmona, Geoff K. Nicholls
Abstract Bayesian statistical inference loses predictive optimality when generative models are misspecified. Working within an existing coherent loss-based generalisation of Bayesian inference, we show existing Modular/Cut-model inference is coherent, and write down a new family of Semi-Modular Inference (SMI) schemes, indexed by an influence parameter, with Bayesian inference and Cut-models as special cases. We give a meta-learning criterion and estimation procedure to choose the inference scheme. This returns Bayesian inference when there is no misspecification. The framework applies naturally to Multi-modular models. Cut-model inference allows directed information flow from well-specified modules to misspecified modules, but not vice versa. An existing alternative power posterior method gives tunable but undirected control of information flow, improving prediction in some settings. In contrast, SMI allows tunable and directed information flow between modules. We illustrate our methods on two standard test cases from the literature and a motivating archaeological data set.
Tasks Bayesian Inference, Meta-Learning
Published 2020-03-15
URL https://arxiv.org/abs/2003.06804v1
PDF https://arxiv.org/pdf/2003.06804v1.pdf
PWC https://paperswithcode.com/paper/semi-modular-inference-enhanced-learning-in

Causality based Feature Fusion for Brain Neuro-Developmental Analysis

Title Causality based Feature Fusion for Brain Neuro-Developmental Analysis
Authors Peyman Hosseinzadeh Kassani, Li Xiao, Gemeng Zhang, Julia M. Stephen, Tony W. Wilson, Vince D. Calhoun, Yu Ping Wang
Abstract Human brain development is a complex and dynamic process that is affected by several factors such as genetics, sex hormones, and environmental changes. A number of recent studies on brain development have examined functional connectivity (FC) defined by the temporal correlation between time series of different brain regions. We propose to add the directional flow of information during brain maturation. To do so, we extract effective connectivity (EC) through Granger causality (GC) for two different groups of subjects, i.e., children and young adults. The motivation is that the inclusion of causal interaction may further discriminate brain connections between two age groups and help to discover new connections between brain regions. The contributions of this study are threefold. First, there has been a lack of attention to EC-based feature extraction in the context of brain development. To this end, we propose a new kernel-based GC (KGC) method to learn nonlinearity of complex brain network, where a reduced Sine hyperbolic polynomial (RSP) neural network was used as our proposed learner. Second, we used causality values as the weight for the directional connectivity between brain regions. Our findings indicated that the strength of connections was significantly higher in young adults relative to children. In addition, our new EC-based feature outperformed FC-based analysis from Philadelphia neurocohort (PNC) study with better discrimination of the different age groups. Moreover, the fusion of these two sets of features (FC + EC) improved brain age prediction accuracy by more than 4%, indicating that they should be used together for brain development studies.
Tasks Time Series
Published 2020-01-22
URL https://arxiv.org/abs/2001.08173v1
PDF https://arxiv.org/pdf/2001.08173v1.pdf
PWC https://paperswithcode.com/paper/causality-based-feature-fusion-for-brain

Vector logic and counterfactuals

Title Vector logic and counterfactuals
Authors Eduardo Mizraji
Abstract In this work we investigate the representation of counterfactual conditionals using vector logic, a matrix-vector formalism for logical functions and truth values. With this formalism, we can describe the counterfactuals as complex matrix operators that appear when preprocessing the implication matrix with one of the square roots of the negation, a complex matrix. This mathematical approach puts in evidence the virtual character of the counterfactuals. The reason is that this representation of a counterfactual proposition produces a valuation that is the superposition of the two opposite truth values weighted, respectively, by two complex conjugate coefficients. This result shows that this procedure produces an uncertain evaluation projected on the complex domain. After this basic representation, the judgment of the plausibility of a given counterfactual allows us to shift the decision towards an acceptance or a refusal represented by the real vectors ‘true’ or ‘false’, and we can represent this shift symbolically by applying for a second time the two square roots of the negation.
Published 2020-03-09
URL https://arxiv.org/abs/2003.11519v2
PDF https://arxiv.org/pdf/2003.11519v2.pdf
PWC https://paperswithcode.com/paper/vector-logic-and-counterfactuals

Stream-Flow Forecasting of Small Rivers Based on LSTM

Title Stream-Flow Forecasting of Small Rivers Based on LSTM
Authors Youchuan Hu, Le Yan, Tingting Hang, Jun Feng
Abstract Stream-flow forecasting for small rivers has always been of great importance, yet comparatively challenging due to the special features of rivers with smaller volume. Artificial Intelligence (AI) methods have long been employed in this area, but forecast quality still leaves room for improvement. In this paper, we provide a new forecasting method using the Long Short-Term Memory (LSTM) deep learning model, which is designed for time-series data. We collected stream flow data from one hydrologic station in Tunxi, China, and precipitation data from 11 surrounding rainfall stations, and used LSTM to forecast the stream flow at that hydrologic station 6 hours in the future. We evaluated the prediction results using three criteria: root mean square error (RMSE), mean absolute error (MAE), and coefficient of determination (R^2). By comparing LSTM’s prediction with predictions of Support Vector Regression (SVR) and Multilayer Perceptron (MLP) models, we showed that LSTM has better performance, achieving RMSE of 82.007, MAE of 27.752, and R^2 of 0.970. We also ran extended experiments on the LSTM model, discussing the factors that influence its performance.
Tasks Time Series
Published 2020-01-16
URL https://arxiv.org/abs/2001.05681v1
PDF https://arxiv.org/pdf/2001.05681v1.pdf
PWC https://paperswithcode.com/paper/stream-flow-forecasting-of-small-rivers-based
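
The three evaluation criteria named in the abstract have standard definitions, sketched below in plain Python. The function names and the toy numbers are illustrative assumptions; they are not the paper's code or data.

```python
import math

def rmse(y_true, y_pred):
    # Root mean square error: square root of the mean squared residual.
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def mae(y_true, y_pred):
    # Mean absolute error: mean of the absolute residuals.
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def r2(y_true, y_pred):
    # Coefficient of determination: 1 - (residual sum of squares / total sum of squares).
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Made-up observed and forecast values, for illustration only.
observed = [1.0, 2.0, 3.0]
forecast = [2.0, 2.0, 2.0]
print(rmse(observed, forecast), mae(observed, forecast), r2(observed, forecast))
```

RMSE and MAE are in the units of the forecast quantity and lower is better, while R^2 is unitless, with 1 meaning a perfect fit and 0 meaning no better than always predicting the mean of the observations.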

Human-like Time Series Summaries via Trend Utility Estimation

Title Human-like Time Series Summaries via Trend Utility Estimation
Authors Pegah Jandaghi, Jay Pujara
Abstract In many scenarios, humans prefer a text-based representation of quantitative data over numerical, tabular, or graphical representations. The attractiveness of textual summaries for complex data has inspired research on data-to-text systems. While there are several data-to-text tools for time series, few of them try to mimic how humans summarize time series. In this paper, we propose a model to create human-like text descriptions for time series. Our system finds patterns in time series data and ranks these patterns based on empirical observations of human behavior using utility estimation. Our proposed utility estimation model is a Bayesian network capturing interdependencies between different patterns. We describe the learning steps for this network and introduce baselines along with their performance for each step. The output of our system is a natural language description of time series that attempts to match a human’s summary of the same data.
Tasks Time Series
Published 2020-01-16
URL https://arxiv.org/abs/2001.05665v1
PDF https://arxiv.org/pdf/2001.05665v1.pdf
PWC https://paperswithcode.com/paper/human-like-time-series-summaries-via-trend

Weakly-Supervised Semantic Segmentation by Iterative Affinity Learning

Title Weakly-Supervised Semantic Segmentation by Iterative Affinity Learning
Authors Xiang Wang, Sifei Liu, Huimin Ma, Ming-Hsuan Yang
Abstract Weakly-supervised semantic segmentation is a challenging task as no pixel-wise label information is provided for training. Recent methods have exploited classification networks to localize objects by selecting regions with strong response. While such response maps provide only sparse information, there exist strong pairwise relations between pixels in natural images, which can be utilized to propagate the sparse map to a much denser one. In this paper, we propose an iterative algorithm to learn such pairwise relations, which consists of two branches: a unary segmentation network which learns the label probabilities for each pixel, and a pairwise affinity network which learns an affinity matrix and refines the probability map generated from the unary network. The results refined by the pairwise network are then used as supervision to train the unary network, and the procedures are conducted iteratively to obtain progressively better segmentation. To learn reliable pixel affinity without accurate annotation, we also propose to mine confident regions. We show that iteratively training this framework is equivalent to optimizing an energy function with convergence to a local minimum. Experimental results on the PASCAL VOC 2012 and COCO datasets demonstrate that the proposed algorithm performs favorably against the state-of-the-art methods.
Tasks Semantic Segmentation, Weakly-Supervised Semantic Segmentation
Published 2020-02-19
URL https://arxiv.org/abs/2002.08098v1
PDF https://arxiv.org/pdf/2002.08098v1.pdf
PWC https://paperswithcode.com/paper/weakly-supervised-semantic-segmentation-by-2

Global Convergence and Variance-Reduced Optimization for a Class of Nonconvex-Nonconcave Minimax Problems

Title Global Convergence and Variance-Reduced Optimization for a Class of Nonconvex-Nonconcave Minimax Problems
Authors Junchi Yang, Negar Kiyavash, Niao He
Abstract Nonconvex minimax problems appear frequently in emerging machine learning applications, such as generative adversarial networks and adversarial learning. Simple algorithms such as the gradient descent ascent (GDA) are the common practice for solving these nonconvex games and receive lots of empirical success. Yet, it is known that these vanilla GDA algorithms with constant step size can potentially diverge even in the convex setting. In this work, we show that for a subclass of nonconvex-nonconcave objectives satisfying a so-called two-sided Polyak-Łojasiewicz inequality, the alternating gradient descent ascent (AGDA) algorithm converges globally at a linear rate and the stochastic AGDA achieves a sublinear rate. We further develop a variance reduced algorithm that attains a provably faster rate than AGDA when the problem has the finite-sum structure.
Published 2020-02-22
URL https://arxiv.org/abs/2002.09621v1
PDF https://arxiv.org/pdf/2002.09621v1.pdf
PWC https://paperswithcode.com/paper/global-convergence-and-variance-reduced
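
The alternating update that distinguishes AGDA from simultaneous GDA can be shown with a minimal deterministic sketch. The objective, step sizes, and iteration count below are assumptions chosen for illustration: f(x, y) = 0.5*x^2 + 2*x*y - 0.5*y^2 is strongly convex in x and strongly concave in y, which is a simple special case satisfying a two-sided Polyak-Łojasiewicz condition, not one of the nonconvex-nonconcave problems studied in the paper.

```python
def agda(grad_x, grad_y, x, y, step_x=0.1, step_y=0.1, iters=200):
    """Alternating gradient descent ascent for min over x, max over y of f(x, y)."""
    for _ in range(iters):
        x = x - step_x * grad_x(x, y)  # descent step on the minimizing variable
        y = y + step_y * grad_y(x, y)  # ascent step, evaluated at the updated x
    return x, y

# Gradients of the illustrative objective f(x, y) = 0.5*x**2 + 2*x*y - 0.5*y**2.
def grad_x(x, y):
    return x + 2.0 * y

def grad_y(x, y):
    return 2.0 * x - y

x_final, y_final = agda(grad_x, grad_y, 1.0, 1.0)
print(x_final, y_final)  # both approach the saddle point at (0, 0)
```

For this quadratic the update is a linear map whose eigenvalues have modulus 0.9 at the chosen step sizes, so the distance to the saddle point shrinks geometrically, a toy instance of the linear rate mentioned in the abstract.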

A study of resting-state EEG biomarkers for depression recognition

Title A study of resting-state EEG biomarkers for depression recognition
Authors Shuting Sun, Jianxiu Li, Huayu Chen, Tao Gong, Xiaowei Li, Bin Hu
Abstract Background: Depression has become a major health burden worldwide, and effective detection of depression is a great public-health challenge. This electroencephalography (EEG)-based research explores effective biomarkers for depression recognition. Methods: Resting-state EEG data was collected from 24 patients with major depressive disorder (MDD) and 29 normal controls using a 128-channel HydroCel Geodesic Sensor Net (HCGSN). To better identify depression, we extracted different types of EEG features, including linear features, nonlinear features, and the functional connectivity feature phase lagging index (PLI), to comprehensively analyze the EEG signals in patients with MDD. We then used different feature selection methods and classifiers to evaluate the optimal feature sets. Results: The functional connectivity feature PLI is superior to the linear and nonlinear features. When combining all types of features to classify MDD patients, we obtain the highest classification accuracy, 82.31%, using the ReliefF feature selection method and a logistic regression (LR) classifier. Analyzing the distribution of the optimal feature set, we found that intrahemispheric connection edges of PLI were far more numerous than interhemispheric connection edges, and that the intrahemispheric connection edges differed significantly between the two groups. Conclusion: The functional connectivity feature PLI plays an important role in depression recognition. In particular, intrahemispheric connection edges of PLI might be an effective biomarker for identifying depression. Statistical results suggested that MDD patients might have functional dysfunction in the left hemisphere.
Tasks EEG, Feature Selection
Published 2020-02-23
URL https://arxiv.org/abs/2002.11039v1
PDF https://arxiv.org/pdf/2002.11039v1.pdf
PWC https://paperswithcode.com/paper/a-study-of-resting-state-eeg-biomarkers-for
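
The phase lag index used as the functional connectivity feature is commonly defined as the absolute value of the time-averaged sign of the phase difference between two channels. The sketch below follows that common definition and assumes the instantaneous phases (in radians) have already been extracted, for example with a Hilbert transform; the function name and toy signals are hypothetical and the paper's own pipeline may differ.

```python
import math

def phase_lag_index(phases_a, phases_b):
    """PLI = |mean over time of sign(sin(phase_a - phase_b))|, a value in [0, 1]."""
    signs = []
    for a, b in zip(phases_a, phases_b):
        s = math.sin(a - b)  # sin wraps the phase difference, only its sign matters
        signs.append((s > 0) - (s < 0))
    return abs(sum(signs)) / len(signs)

# Channel A consistently leads channel B by 0.5 rad: PLI is 1.
lead = phase_lag_index([0.1 * t + 0.5 for t in range(100)],
                       [0.1 * t for t in range(100)])

# The sign of the lag alternates evenly between samples: PLI is 0.
mixed = phase_lag_index([0.3 if t % 2 == 0 else -0.3 for t in range(100)],
                        [0.0] * 100)
print(lead, mixed)
```

Computing this value for every pair of channels yields the connection edges that the abstract then splits into intrahemispheric and interhemispheric groups.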