April 3, 2020


Paper Group ANR 47



Piecewise linear activations substantially shape the loss surfaces of neural networks

Title Piecewise linear activations substantially shape the loss surfaces of neural networks
Authors Fengxiang He, Bohan Wang, Dacheng Tao
Abstract Understanding the loss surface of a neural network is fundamentally important to the understanding of deep learning. This paper presents how piecewise linear activation functions substantially shape the loss surfaces of neural networks. We first prove that the loss surfaces of many neural networks have infinite spurious local minima which are defined as the local minima with higher empirical risks than the global minima. Our result demonstrates that the networks with piecewise linear activations possess substantial differences to the well-studied linear neural networks. This result holds for any neural network with arbitrary depth and arbitrary piecewise linear activation functions (excluding linear functions) under most loss functions in practice. Essentially, the underlying assumptions are consistent with most practical circumstances where the output layer is narrower than any hidden layer. In addition, the loss surface of a neural network with piecewise linear activations is partitioned into multiple smooth and multilinear cells by nondifferentiable boundaries. The constructed spurious local minima are concentrated in one cell as a valley: they are connected with each other by a continuous path, on which empirical risk is invariant. Further for one-hidden-layer networks, we prove that all local minima in a cell constitute an equivalence class; they are concentrated in a valley; and they are all global minima in the cell.
Published 2020-03-27
URL https://arxiv.org/abs/2003.12236v1
PDF https://arxiv.org/pdf/2003.12236v1.pdf
PWC https://paperswithcode.com/paper/piecewise-linear-activations-substantially
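
One familiar way a continuous path of constant empirical risk can arise in a ReLU network is the positive rescaling symmetry: multiplying the first-layer weights by a > 0 and dividing the second-layer weights by a leaves the output unchanged. The sketch below illustrates only that well-known symmetry; it is not the construction used in the paper, and the toy data, layer sizes, and function names are assumptions made for illustration.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def risk(W1, W2, X, Y):
    """Mean squared error of a one-hidden-layer ReLU network."""
    pred = relu(X @ W1) @ W2
    return float(np.mean((pred - Y) ** 2))

rng = np.random.default_rng(0)
X = rng.normal(size=(32, 4))   # toy inputs (assumed)
Y = rng.normal(size=(32, 1))   # toy targets (assumed)
W1 = rng.normal(size=(4, 8))   # hidden layer wider than the output layer
W2 = rng.normal(size=(8, 1))

base = risk(W1, W2, X, Y)
# relu(a * z) == a * relu(z) for a > 0, so the two rescalings cancel.
path = [risk(a * W1, W2 / a, X, Y) for a in np.linspace(0.5, 2.0, 7)]
```

Every entry of `path` matches `base` up to floating-point error, so the parameters along this curve all share the same empirical risk.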

Mixed integer programming formulation of unsupervised learning

Title Mixed integer programming formulation of unsupervised learning
Authors Arturo Berrones-Santos
Abstract A novel formulation and training procedure for full Boltzmann machines in terms of a mixed binary quadratic feasibility problem is given. As a proof of concept, the theory is analytically and numerically tested on XOR patterns.
Published 2020-01-20
URL https://arxiv.org/abs/2001.07278v1
PDF https://arxiv.org/pdf/2001.07278v1.pdf
PWC https://paperswithcode.com/paper/mixed-integer-programming-formulation-of

Zooming for Efficient Model-Free Reinforcement Learning in Metric Spaces

Title Zooming for Efficient Model-Free Reinforcement Learning in Metric Spaces
Authors Ahmed Touati, Adrien Ali Taiga, Marc G. Bellemare
Abstract Despite the wealth of research into provably efficient reinforcement learning algorithms, most works focus on tabular representation and thus struggle to handle exponentially or infinitely large state-action spaces. In this paper, we consider episodic reinforcement learning with a continuous state-action space which is assumed to be equipped with a natural metric that characterizes the proximity between different states and actions. We propose ZoomRL, an online algorithm that leverages ideas from continuous bandits to learn an adaptive discretization of the joint space by zooming in more promising and frequently visited regions while carefully balancing the exploitation-exploration trade-off. We show that ZoomRL achieves a worst-case regret $\tilde{O}(H^{\frac{5}{2}} K^{\frac{d+1}{d+2}})$ where $H$ is the planning horizon, $K$ is the number of episodes and $d$ is the covering dimension of the space with respect to the metric. Moreover, our algorithm enjoys improved metric-dependent guarantees that reflect the geometry of the underlying space. Finally, we show that our algorithm is robust to small misspecification errors.
Published 2020-03-09
URL https://arxiv.org/abs/2003.04069v1
PDF https://arxiv.org/pdf/2003.04069v1.pdf
PWC https://paperswithcode.com/paper/zooming-for-efficient-model-free
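
To get a feel for the stated regret bound, the sketch below evaluates the exponent of K for a few covering dimensions. It only restates the formula from the abstract, ignoring constants and logarithmic factors; the function names are hypothetical.

```python
def zoomrl_regret_exponent(d: float) -> float:
    """Exponent of K in the stated bound: (d + 1) / (d + 2)."""
    return (d + 1) / (d + 2)

def regret_scale(H: float, K: float, d: float) -> float:
    """H^(5/2) * K^((d+1)/(d+2)), ignoring constants and log factors."""
    return H ** 2.5 * K ** zoomrl_regret_exponent(d)

for d in (1, 2, 5, 10):
    print(d, round(zoomrl_regret_exponent(d), 3))
```

The exponent stays below 1 for every finite d, so the average regret per episode scales as K^(-1/(d+2)) and vanishes as K grows, although more slowly in higher-dimensional spaces.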

Stochasticity in Neural ODEs: An Empirical Study

Title Stochasticity in Neural ODEs: An Empirical Study
Authors Viktor Oganesyan, Alexandra Volokhova, Dmitry Vetrov
Abstract Stochastic regularization of neural networks (e.g. dropout) is a wide-spread technique in deep learning that allows for better generalization. Despite its success, continuous-time models, such as neural ordinary differential equation (ODE), usually rely on a completely deterministic feed-forward operation. This work provides an empirical study of stochastically regularized neural ODE on several image-classification tasks (CIFAR-10, CIFAR-100, TinyImageNet). Building upon the formalism of stochastic differential equations (SDEs), we demonstrate that neural SDE is able to outperform its deterministic counterpart. Further, we show that data augmentation during the training improves the performance of both deterministic and stochastic versions of the same model. However, the improvements obtained by the data augmentation completely eliminate the empirical gains of the stochastic regularization, making the difference in the performance of neural ODE and neural SDE negligible.
Tasks Data Augmentation, Image Classification
Published 2020-02-22
URL https://arxiv.org/abs/2002.09779v1
PDF https://arxiv.org/pdf/2002.09779v1.pdf
PWC https://paperswithcode.com/paper/stochasticity-in-neural-odes-an-empirical
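
The difference between a deterministic and a stochastic forward pass can be sketched with Euler and Euler-Maruyama integration. This is a minimal illustration, assuming constant additive noise and a tiny fixed `tanh` layer as a stand-in for the learned dynamics; the paper's actual architectures and noise types are not reproduced here.

```python
import numpy as np

def integrate(f, x0, t1=1.0, steps=100, sigma=0.0, rng=None):
    """Euler (sigma == 0) or Euler-Maruyama (sigma > 0) integration of
    dx = f(x) dt + sigma dW from t = 0 to t = t1."""
    rng = rng if rng is not None else np.random.default_rng()
    dt = t1 / steps
    x = np.array(x0, dtype=float)
    for _ in range(steps):
        noise = sigma * np.sqrt(dt) * rng.standard_normal(x.shape)
        x = x + f(x) * dt + noise
    return x

W = np.array([[0.0, -1.0], [1.0, 0.0]])  # stand-in for learned weights
f = lambda x: np.tanh(W @ x)              # stand-in for the network

x0 = [1.0, 0.0]
ode_out = integrate(f, x0)                # deterministic forward pass
sde_out = integrate(f, x0, sigma=0.1, rng=np.random.default_rng(0))
```

With `sigma=0.0` the noise term vanishes and the update reduces to a plain Euler step, which is what makes the two models directly comparable.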

FiniteNet: A Fully Convolutional LSTM Network Architecture for Time-Dependent Partial Differential Equations

Title FiniteNet: A Fully Convolutional LSTM Network Architecture for Time-Dependent Partial Differential Equations
Authors Ben Stevens, Tim Colonius
Abstract In this work, we present a machine learning approach for reducing the error when numerically solving time-dependent partial differential equations (PDE). We use a fully convolutional LSTM network to exploit the spatiotemporal dynamics of PDEs. The neural network serves to enhance finite-difference and finite-volume methods (FDM/FVM) that are commonly used to solve PDEs, allowing us to maintain guarantees on the order of convergence of our method. We train the network on simulation data, and show that our network can reduce error by a factor of 2 to 3 compared to the baseline algorithms. We demonstrate our method on three PDEs that each feature qualitatively different dynamics. We look at the linear advection equation, which propagates its initial conditions at a constant speed, the inviscid Burgers’ equation, which develops shockwaves, and the Kuramoto-Sivashinsky (KS) equation, which is chaotic.
Published 2020-02-07
URL https://arxiv.org/abs/2002.03014v1
PDF https://arxiv.org/pdf/2002.03014v1.pdf
PWC https://paperswithcode.com/paper/finitenet-a-fully-convolutional-lstm-network
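
For context, the kind of baseline that such a network would enhance can be sketched with a first-order upwind finite-difference scheme for the linear advection equation. This is a standard textbook scheme, not the FiniteNet model; the grid size, CFL number, and initial condition are assumptions.

```python
import numpy as np

def upwind_advection(u0, c, dx, dt, steps):
    """First-order upwind scheme for u_t + c u_x = 0 with c > 0 and
    periodic boundaries."""
    u = np.array(u0, dtype=float)
    cfl = c * dt / dx
    if not 0.0 < cfl <= 1.0:
        raise ValueError("CFL condition violated")
    for _ in range(steps):
        u = u - cfl * (u - np.roll(u, 1))  # backward difference in space
    return u

n = 100
x = np.linspace(0.0, 1.0, n, endpoint=False)
dx, c = 1.0 / n, 1.0
dt = 0.5 * dx                           # CFL number 0.5
u0 = np.sin(2 * np.pi * x)
steps = int(round(1.0 / (c * dt)))      # advect for one full period
u = upwind_advection(u0, c, dx, dt, steps)
error = float(np.max(np.abs(u - u0)))   # exact solution returns to u0
```

After one period the exact solution coincides with the initial condition, so `error` measures the numerical diffusion of the baseline, which is the type of discretization error a learned correction would aim to reduce.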

DIHARD II is Still Hard: Experimental Results and Discussions from the DKU-LENOVO Team

Title DIHARD II is Still Hard: Experimental Results and Discussions from the DKU-LENOVO Team
Authors Qingjian Lin, Weicheng Cai, Lin Yang, Junjie Wang, Jun Zhang, Ming Li
Abstract In this paper, we present the submitted system for the second DIHARD Speech Diarization Challenge from the DKU-LENOVO team. Our diarization system includes multiple modules, namely voice activity detection (VAD), segmentation, speaker embedding extraction, similarity scoring, clustering, resegmentation and overlap detection. For each module, we explore different techniques to enhance performance. Our final submission employs the ResNet-LSTM based VAD, the Deep ResNet based speaker embedding, the LSTM based similarity scoring and spectral clustering. Variational Bayes (VB) diarization is applied in the resegmentation stage and overlap detection also brings slight improvement. Our proposed system achieves 18.84% DER in Track1 and 27.90% DER in Track2. Although our systems have reduced the DERs by 27.5% and 31.7% relatively against the official baselines, we believe that the diarization task is still very difficult.
Tasks Action Detection, Activity Detection
Published 2020-02-23
URL https://arxiv.org/abs/2002.12761v1
PDF https://arxiv.org/pdf/2002.12761v1.pdf
PWC https://paperswithcode.com/paper/dihard-ii-is-still-hard-experimental-results
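
The reported relative reductions can be turned back into the baseline error rates they imply. The short calculation below assumes the usual definition of relative reduction, DER = baseline × (1 − reduction); the function name is hypothetical.

```python
def implied_baseline(der: float, relative_reduction: float) -> float:
    """Baseline DER implied by der = baseline * (1 - relative_reduction)."""
    return der / (1.0 - relative_reduction)

track1 = implied_baseline(18.84, 0.275)
track2 = implied_baseline(27.90, 0.317)
print(f"Track1 implied baseline DER: {track1:.2f}%")
print(f"Track2 implied baseline DER: {track2:.2f}%")
```

Under that definition the baselines come out at roughly 26% and 41% DER, which puts the absolute gains at about 7 and 13 percentage points.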

Reproducing Kernel Hilbert Spaces Cannot Contain all Continuous Functions on a Compact Metric Space

Title Reproducing Kernel Hilbert Spaces Cannot Contain all Continuous Functions on a Compact Metric Space
Authors Ingo Steinwart
Abstract Given an uncountable, compact metric space, we show that there exists no reproducing kernel Hilbert space that contains the space of all continuous functions on this compact space.
Published 2020-02-08
URL https://arxiv.org/abs/2002.03171v2
PDF https://arxiv.org/pdf/2002.03171v2.pdf
PWC https://paperswithcode.com/paper/reproducing-kernel-hilbert-spaces-cannot

Intermittent Pulling with Local Compensation for Communication-Efficient Federated Learning

Title Intermittent Pulling with Local Compensation for Communication-Efficient Federated Learning
Authors Haozhao Wang, Zhihao Qu, Song Guo, Xin Gao, Ruixuan Li, Baoliu Ye
Abstract Federated Learning is a powerful machine learning paradigm to cooperatively train a global model with highly distributed data. A major bottleneck on the performance of distributed Stochastic Gradient Descent (SGD) algorithm for large-scale Federated Learning is the communication overhead on pushing local gradients and pulling global model. In this paper, to reduce the communication complexity of Federated Learning, a novel approach named Pulling Reduction with Local Compensation (PRLC) is proposed. Specifically, each training node intermittently pulls the global model from the server in SGD iterations, resulting in that it is sometimes unsynchronized with the server. In such a case, it will use its local update to compensate the gap between the local model and the global model. Our rigorous theoretical analysis of PRLC achieves two important findings. First, we prove that the convergence rate of PRLC preserves the same order as the classical synchronous SGD for both strongly-convex and non-convex cases with good scalability due to the linear speedup with respect to the number of training nodes. Second, we show that PRLC admits lower pulling frequency than the existing pulling reduction method without local compensation. We also conduct extensive experiments on various machine learning models to validate our theoretical results. Experimental results show that our approach achieves a significant pulling reduction over the state-of-the-art methods, e.g., PRLC requiring only half of the pulling operations of LAG.
Published 2020-01-22
URL https://arxiv.org/abs/2001.08277v1
PDF https://arxiv.org/pdf/2001.08277v1.pdf
PWC https://paperswithcode.com/paper/intermittent-pulling-with-local-compensation
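
The idea of intermittent pulling with local compensation can be sketched on a toy problem. The code below is an illustration only, assuming a fixed pull period, full (noise-free) gradients, and simple quadratic objectives per worker; the paper's actual pulling schedule and compensation rule may differ, and all names are hypothetical.

```python
import numpy as np

def prlc_sketch(centers, lr=0.1, pull_period=4, steps=200):
    """Toy run: worker i minimizes 0.5 * ||w - centers[i]||^2.

    Every worker pushes its gradient each step but pulls the global
    model only every `pull_period` steps; in between it applies its
    own gradient step to its local copy (the local compensation).
    Returns the global model and the total number of pulls."""
    centers = np.asarray(centers, dtype=float)
    global_w = np.zeros(centers.shape[1])
    local_w = np.tile(global_w, (centers.shape[0], 1))
    pulls = 0
    for t in range(1, steps + 1):
        grads = local_w - centers                      # gradient at each local model
        global_w = global_w - lr * grads.mean(axis=0)  # server update
        if t % pull_period == 0:
            local_w[:] = global_w                      # synchronize with the server
            pulls += centers.shape[0]
        else:
            local_w -= lr * grads                      # compensate locally
    return global_w, pulls

centers = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
w, pulls = prlc_sketch(centers)
```

In this toy setting the global model still reaches the minimizer of the averaged objective (the mean of `centers`), while each worker pulls once every four steps instead of every step.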

Domain-Aware Dialogue State Tracker for Multi-Domain Dialogue Systems

Title Domain-Aware Dialogue State Tracker for Multi-Domain Dialogue Systems
Authors Vevake Balaraman, Bernardo Magnini
Abstract In task-oriented dialogue systems the dialogue state tracker (DST) component is responsible for predicting the state of the dialogue based on the dialogue history. Current DST approaches rely on a predefined domain ontology, a fact that limits their effective usage for large scale conversational agents, where the DST constantly needs to be interfaced with ever-increasing services and APIs. To overcome this drawback, we propose a domain-aware dialogue state tracker that is completely data-driven and is modeled to predict for dynamic service schemas. The proposed model utilizes domain and slot information to extract both domain and slot specific representations for a given dialogue, and then uses such representations to predict the values of the corresponding slot. Integrating this mechanism with a pretrained language model (i.e. BERT), our approach can effectively learn semantic relations.
Tasks Language Modelling, Task-Oriented Dialogue Systems
Published 2020-01-21
URL https://arxiv.org/abs/2001.07526v1
PDF https://arxiv.org/pdf/2001.07526v1.pdf
PWC https://paperswithcode.com/paper/domain-aware-dialogue-state-tracker-for-multi

A Survey on Contextual Embeddings

Title A Survey on Contextual Embeddings
Authors Qi Liu, Matt J. Kusner, Phil Blunsom
Abstract Contextual embeddings, such as ELMo and BERT, move beyond global word representations like Word2Vec and achieve ground-breaking performance on a wide range of natural language processing tasks. Contextual embeddings assign each word a representation based on its context, thereby capturing uses of words across varied contexts and encoding knowledge that transfers across languages. In this survey, we review existing contextual embedding models, cross-lingual polyglot pre-training, the application of contextual embeddings in downstream tasks, model compression, and model analyses.
Tasks Model Compression
Published 2020-03-16
URL https://arxiv.org/abs/2003.07278v1
PDF https://arxiv.org/pdf/2003.07278v1.pdf
PWC https://paperswithcode.com/paper/a-survey-on-contextual-embeddings

Speech Enhancement based on Denoising Autoencoder with Multi-branched Encoders

Title Speech Enhancement based on Denoising Autoencoder with Multi-branched Encoders
Authors Cheng Yu, Ryandhimas E. Zezario, Jonathan Sherman, Yi-Yen Hsieh, Xugang Lu, Hsin-Min Wang, Yu Tsao
Abstract Deep learning-based models have greatly advanced the performance of speech enhancement (SE) systems. However, two problems remain unsolved, which are closely related to model generalizability to noisy conditions: (1) mismatched noisy condition during testing, i.e., the performance is generally sub-optimal when models are tested with unseen noise types that are not involved in the training data; (2) local focus on specific noisy conditions, i.e., models trained using multiple types of noises cannot optimally remove a specific noise type even though the noise type has been involved in the training data. These problems are common in real applications. In this paper, we propose a novel denoising autoencoder with a multi-branched encoder (termed DAEME) model to deal with these two problems. In the DAEME model, two stages are involved: offline and online. In the offline stage, we build multiple component models to form a multi-branched encoder based on a dynamically-sized decision tree (DSDT). The DSDT is built based on a prior knowledge of speech and noisy conditions (the speaker, environment, and signal factors are considered in this paper), where each component of the multi-branched encoder performs a particular mapping from noisy to clean speech along the branch in the DSDT. Finally, a decoder is trained on top of the multi-branched encoder. In the online stage, noisy speech is first processed by the tree and fed to each component model. The multiple outputs from these models are then integrated into the decoder to determine the final enhanced speech. Experimental results show that DAEME is superior to several baseline models in terms of objective evaluation metrics and the quality of subjective human listening tests.
Tasks Denoising, Speech Enhancement
Published 2020-01-06
URL https://arxiv.org/abs/2001.01538v1
PDF https://arxiv.org/pdf/2001.01538v1.pdf
PWC https://paperswithcode.com/paper/speech-enhancement-based-on-denoising

CHAOS Challenge – Combined (CT-MR) Healthy Abdominal Organ Segmentation

Title CHAOS Challenge – Combined (CT-MR) Healthy Abdominal Organ Segmentation
Authors A. Emre Kavur, N. Sinem Gezer, Mustafa Barış, Pierre-Henri Conze, Vladimir Groza, Duc Duy Pham, Soumick Chatterjee, Philipp Ernst, Savaş Özkan, Bora Baydar, Dmitry Lachinov, Shuo Han, Josef Pauli, Fabian Isensee, Matthias Perkonigg, Rachana Sathish, Ronnie Rajan, Sinem Aslan, Debdoot Sheet, Gurbandurdy Dovletov, Oliver Speck, Andreas Nürnberger, Klaus H. Maier-Hein, Gözde Bozdağı Akar, Gözde Ünal, Oğuz Dicle, M. Alper Selver
Abstract Segmentation of abdominal organs has been a comprehensive, yet unresolved, research field for many years. In the last decade, intensive developments in deep learning (DL) have introduced new state-of-the-art segmentation systems. Despite outperforming the overall accuracy of existing systems, the effects of DL model properties and parameters on the performance are hard to interpret. This makes comparative analysis a necessary tool to achieve explainable studies and systems. Moreover, the performance of DL for emerging learning approaches such as cross-modality and multi-modal tasks has rarely been discussed. In order to expand the knowledge in these topics, the CHAOS – Combined (CT-MR) Healthy Abdominal Organ Segmentation challenge has been organized in the IEEE International Symposium on Biomedical Imaging (ISBI), 2019, in Venice, Italy. Despite a large number of the previous abdomen related challenges, the majority of which are focused on tumor/lesion detection and/or classification with a single modality, CHAOS provides both abdominal CT and MR data from healthy subjects. Five different and complementary tasks have been designed to analyze the capabilities of the current approaches from multiple perspectives. The results are investigated thoroughly, compared with manual annotations and interactive methods. The outcomes are reported in detail to reflect the latest advancements in the field. CHAOS challenge and data will be available online to provide a continuous benchmark resource for segmentation.
Published 2020-01-17
URL https://arxiv.org/abs/2001.06535v1
PDF https://arxiv.org/pdf/2001.06535v1.pdf
PWC https://paperswithcode.com/paper/chaos-challenge-combined-ct-mr-healthy

An efficient constraint based framework for handling floating point SMT problems

Title An efficient constraint based framework for handling floating point SMT problems
Authors Heytem Zitoun, Claude Michel, Laurent Michel, Michel Rueher
Abstract This paper introduces the 2019 version of \us{}, a novel Constraint Programming framework for floating point verification problems expressed with the SMT language of SMTLIB. SMT solvers decompose their task by delegating to specific theories (e.g., floating point, bit vectors, arrays, …) the task to reason about combinatorial or otherwise complex constraints for which the SAT encoding would be cumbersome or ineffective. This decomposition and encoding processes lead to the obfuscation of the high-level constraints and a loss of information on the structure of the combinatorial model. In \us{}, constraints over the floats are first class objects, and the purpose is to expose and exploit structures of floating point domains to enhance the search process. A symbolic phase rewrites each SMTLIB instance to elementary constraints, and eliminates auxiliary variables whose presence is counterproductive. A diversification technique within the search steers it away from costly enumerations in unproductive areas of the search space. The empirical evaluation demonstrates that the 2019 version of \us{} is competitive on computationally challenging floating point benchmarks that induce significant search efforts even for other CP solvers. It highlights that the ability to harness both inference and search is critical. Indeed, it yields a factor 3 improvement over Colibri and is up to 10 times faster than SMT solvers. The evaluation was conducted over 214 benchmarks (The Griggio suite) which is a standard within SMTLIB.
Published 2020-02-27
URL https://arxiv.org/abs/2002.12441v1
PDF https://arxiv.org/pdf/2002.12441v1.pdf
PWC https://paperswithcode.com/paper/an-efficient-constraint-based-framework

Machine Learning enabled Spectrum Sharing in Dense LTE-U/Wi-Fi Coexistence Scenarios

Title Machine Learning enabled Spectrum Sharing in Dense LTE-U/Wi-Fi Coexistence Scenarios
Authors Adam Dziedzic, Vanlin Sathya, Muhammad Iqbal Rochman, Monisha Ghosh, Sanjay Krishnan
Abstract The application of Machine Learning (ML) techniques to complex engineering problems has proved to be an attractive and efficient solution. ML has been successfully applied to several practical tasks like image recognition, automating industrial operations, etc. The promise of ML techniques in solving non-linear problems influenced this work which aims to apply known ML techniques and develop new ones for wireless spectrum sharing between Wi-Fi and LTE in the unlicensed spectrum. In this work, we focus on the LTE-Unlicensed (LTE-U) specification developed by the LTE-U Forum, which uses the duty-cycle approach for fair coexistence. The specification suggests reducing the duty cycle at the LTE-U base-station (BS) when the number of co-channel Wi-Fi basic service sets (BSSs) increases from one to two or more. However, without decoding the Wi-Fi packets, detecting the number of Wi-Fi BSSs operating on the channel in real-time is a challenging problem. In this work, we demonstrate a novel ML-based approach which solves this problem by using energy values observed during the LTE-U OFF duration. It is relatively straightforward to observe only the energy values during the LTE-U BS OFF time compared to decoding the entire Wi-Fi packet, which would require a full Wi-Fi receiver at the LTE-U base-station. We implement and validate the proposed ML-based approach by real-time experiments and demonstrate that there exist distinct patterns between the energy distributions between one and many Wi-Fi AP transmissions. The proposed ML-based approach results in a higher accuracy (close to 99% in all cases) as compared to the existing auto-correlation (AC) and energy detection (ED) approaches.
Published 2020-03-18
URL https://arxiv.org/abs/2003.13652v1
PDF https://arxiv.org/pdf/2003.13652v1.pdf
PWC https://paperswithcode.com/paper/machine-learning-enabled-spectrum-sharing-in

Probabilistic Dual Network Architecture Search on Graphs

Title Probabilistic Dual Network Architecture Search on Graphs
Authors Yiren Zhao, Duo Wang, Xitong Gao, Robert Mullins, Pietro Lio, Mateja Jamnik
Abstract We present the first differentiable Network Architecture Search (NAS) for Graph Neural Networks (GNNs). GNNs show promising performance on a wide range of tasks, but require a large amount of architecture engineering. First, graphs are inherently a non-Euclidean and sophisticated data structure, leading to poor adaptivity of GNN architectures across different datasets. Second, a typical graph block contains numerous different components, such as aggregation and attention, generating a large combinatorial search space. To counter these problems, we propose a Probabilistic Dual Network Architecture Search (PDNAS) framework for GNNs. PDNAS not only optimises the operations within a single graph block (micro-architecture), but also considers how these blocks should be connected to each other (macro-architecture). The dual architecture (micro- and macro-architectures) optimisation allows PDNAS to find deeper GNNs on diverse datasets with better performance compared to other graph NAS methods. Moreover, we use a fully gradient-based search approach to update architectural parameters, making it the first differentiable graph NAS method. PDNAS outperforms existing hand-designed GNNs and NAS results, for example, on the PPI dataset, PDNAS beats its best competitors by 1.67 and 0.17 in F1 scores.
Published 2020-03-21
URL https://arxiv.org/abs/2003.09676v1
PDF https://arxiv.org/pdf/2003.09676v1.pdf
PWC https://paperswithcode.com/paper/probabilistic-dual-network-architecture