Paper Group AWR 343
Neural ODEs as the Deep Limit of ResNets with constant weights. DialectGram: Detecting Dialectal Variation at Multiple Geographic Resolutions. Lung nodule segmentation via level set machine learning. A New Lower Bound for Kullback-Leibler Divergence Based on Hammersley-Chapman-Robbins Bound. Does Adam optimizer keep close to the optimal point?. N2D …
Neural ODEs as the Deep Limit of ResNets with constant weights
Title | Neural ODEs as the Deep Limit of ResNets with constant weights |
Authors | Benny Avelin, Kaj Nyström |
Abstract | In this paper we prove that, in the deep limit, the stochastic gradient descent on a ResNet type deep neural network, where each layer shares the same weight matrix, converges to the stochastic gradient descent for a Neural ODE and that the corresponding value/loss functions converge. Our result gives, in the context of minimization by stochastic gradient descent, a theoretical foundation for considering Neural ODEs as the deep limit of ResNets. Our proof is based on certain decay estimates for associated Fokker-Planck equations. |
Tasks | |
Published | 2019-06-28 |
URL | https://arxiv.org/abs/1906.12183v2 |
https://arxiv.org/pdf/1906.12183v2.pdf | |
PWC | https://paperswithcode.com/paper/neural-odes-as-the-deep-limit-of-resnets-with |
Repo | https://github.com/BennyAvelin/DeepLimitNeuralODE |
Framework | none |
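The entry above frames a weight-tied ResNet as a forward-Euler discretization of a Neural ODE. A minimal NumPy sketch of that correspondence follows; the residual map f(x, θ) = tanh(θx) and the layer counts are illustrative assumptions, not the paper's experimental setup.

```python
import numpy as np

def f(x, theta):
    # Shared-weight residual map; tanh(theta @ x) is an illustrative choice.
    return np.tanh(theta @ x)

def resnet_shared_weights(x, theta, n_layers, T=1.0):
    # ResNet in which every layer uses the same weight matrix:
    #   x_{k+1} = x_k + (T / n_layers) * f(x_k, theta),
    # i.e. forward Euler for the Neural ODE dx/dt = f(x, theta) on [0, T].
    h = T / n_layers
    for _ in range(n_layers):
        x = x + h * f(x, theta)
    return x

rng = np.random.default_rng(0)
theta = rng.normal(size=(4, 4)) / 4
x0 = rng.normal(size=4)

# As depth grows, the shared-weight ResNet output approaches the ODE solution at time T.
for depth in (4, 16, 64, 256):
    print(depth, resnet_shared_weights(x0, theta, depth))
```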
DialectGram: Detecting Dialectal Variation at Multiple Geographic Resolutions
Title | DialectGram: Detecting Dialectal Variation at Multiple Geographic Resolutions |
Authors | Hang Jiang, Haoshen Hong, Yuxing Chen, Vivek Kulkarni |
Abstract | Several computational models have been developed to detect and analyze dialect variation in recent years. Most of these models assume a predefined set of geographical regions over which they detect and analyze dialectal variation. However, dialect variation occurs at multiple levels of geographic resolution ranging from cities within a state, states within a country, and between countries across continents. In this work, we propose a model that enables detection of dialectal variation at multiple levels of geographic resolution obviating the need for a-priori definition of the resolution level. Our method DialectGram, learns dialect-sensitive word embeddings while being agnostic of the geographic resolution. Specifically it only requires one-time training and enables analysis of dialectal variation at a chosen resolution post-hoc – a significant departure from prior models which need to be re-trained whenever the pre-defined set of regions changes. Furthermore, DialectGram explicitly models senses thus enabling one to estimate the proportion of each sense usage in any given region. Finally, we quantitatively evaluate our model against other baselines on a new evaluation dataset DialectSim (in English) and show that DialectGram can effectively model linguistic variation. |
Tasks | Word Embeddings |
Published | 2019-10-04 |
URL | https://arxiv.org/abs/1910.01818v2 |
https://arxiv.org/pdf/1910.01818v2.pdf | |
PWC | https://paperswithcode.com/paper/dialectgram-automatic-detection-of-dialectal |
Repo | https://github.com/yuxingch/DialectGram |
Framework | pytorch |
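A toy sketch of the post-hoc analysis the abstract describes: once a sense-aware embedding model has assigned a sense to each occurrence of a word, sense-usage proportions can be aggregated at any chosen geographic resolution without retraining. The sense labels and region fields below are made-up placeholders, not DialectGram's data format.

```python
from collections import Counter

# Hypothetical per-occurrence records: (word, assigned_sense, city, country).
occurrences = [
    ("buggy", "pram",    "Belfast",  "UK"),
    ("buggy", "vehicle", "New York", "US"),
    ("buggy", "pram",    "London",   "UK"),
    ("buggy", "vehicle", "Boston",   "US"),
]

def sense_proportions(records, word, resolution):
    # resolution: index of the geographic field to group by (2 = city, 3 = country).
    by_region = {}
    for w, sense, *geo in records:
        if w != word:
            continue
        by_region.setdefault(geo[resolution - 2], Counter())[sense] += 1
    return {region: {s: c / sum(cnt.values()) for s, c in cnt.items()}
            for region, cnt in by_region.items()}

print(sense_proportions(occurrences, "buggy", resolution=3))  # country level
print(sense_proportions(occurrences, "buggy", resolution=2))  # city level
```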
Lung nodule segmentation via level set machine learning
Title | Lung nodule segmentation via level set machine learning |
Authors | Matthew C Hancock, Jerry F Magnan |
Abstract | Lung cancer has the highest mortality rate of all cancers in both men and women. The algorithmic detection, characterization, and diagnosis of abnormalities found in chest CT scan images can potentially aid radiologists by providing additional medical information to consider in their assessment. Lung nodule segmentation, i.e., the algorithmic delineation of the lung nodule surface, is a fundamental component of an automated nodule analysis pipeline. We introduce an extension of the vanilla level set image segmentation method where the velocity function is learned from data via machine learning regression methods, rather than manually designed. This mitigates the tedious process of manually designing the velocity term required by the standard method. We apply the method to image volumes of lung nodules from CT scans in the publicly available LIDC dataset, obtaining an average intersection over union score of 0.7185 ($\pm$0.1114). |
Tasks | Lung Nodule Segmentation, Semantic Segmentation |
Published | 2019-10-08 |
URL | https://arxiv.org/abs/1910.03191v1 |
https://arxiv.org/pdf/1910.03191v1.pdf | |
PWC | https://paperswithcode.com/paper/lung-nodule-segmentation-via-level-set |
Repo | https://github.com/notmatthancock/level-set-machine-learning |
Framework | none |
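A minimal sketch of a level-set update in which the velocity is supplied by a learned regressor rather than hand-designed, as the abstract describes. The per-pixel features and the stand-in regressor are placeholders, not the authors' model.

```python
import numpy as np

def gradient_magnitude(phi):
    gy, gx = np.gradient(phi)
    return np.sqrt(gx**2 + gy**2)

class ConstantVelocity:
    # Stand-in for a trained regressor (any scikit-learn-style model with .predict).
    def predict(self, feats):
        return np.full(len(feats), 0.1)

def evolve(phi, image, velocity_model, n_steps=50, dt=0.5):
    # Standard level-set evolution d(phi)/dt = v * |grad phi|,
    # except that v is predicted per pixel by a trained regression model.
    for _ in range(n_steps):
        feats = np.stack([image, phi], axis=-1).reshape(-1, 2)  # toy features
        v = velocity_model.predict(feats).reshape(phi.shape)
        phi = phi + dt * v * gradient_magnitude(phi)
    return phi  # the segmentation is the region {phi > 0}

img = np.random.rand(32, 32)
phi0 = -np.ones((32, 32)); phi0[12:20, 12:20] = 1.0   # initial seed region
segmentation = evolve(phi0, img, ConstantVelocity(), n_steps=10) > 0
```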
A New Lower Bound for Kullback-Leibler Divergence Based on Hammersley-Chapman-Robbins Bound
Title | A New Lower Bound for Kullback-Leibler Divergence Based on Hammersley-Chapman-Robbins Bound |
Authors | Tomohiro Nishiyama |
Abstract | In this paper, we derive a useful lower bound for the Kullback-Leibler divergence (KL-divergence) based on the Hammersley-Chapman-Robbins bound (HCRB). The HCRB states that the variance of an estimator is bounded from below by the Chi-square divergence and the expectation value of the estimator. By using the relation between the KL-divergence and the Chi-square divergence, we obtain a lower bound for the KL-divergence which only depends on the expectation value and the variance of a function we choose. This lower bound can also be derived from an information-geometric approach. Furthermore, we show that equality holds for Bernoulli distributions and that the inequality converges to the Cramér-Rao bound when two distributions are very close. We also describe application examples and examples of numerical calculation. |
Tasks | |
Published | 2019-06-29 |
URL | https://arxiv.org/abs/1907.00288v3 |
https://arxiv.org/pdf/1907.00288v3.pdf | |
PWC | https://paperswithcode.com/paper/a-new-lower-bound-for-kullback-leibler |
Repo | https://github.com/nissy220/KL_divergence |
Framework | none |
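A small numerical check of the HCRB ingredient mentioned in the abstract, in its divergence form $(E_P[f]-E_Q[f])^2 \le \mathrm{Var}_Q(f)\,\chi^2(P\|Q)$, evaluated for two Bernoulli distributions. This illustrates the quantities the paper combines (and equality for Bernoullis with f the identity), but it is not the paper's new KL lower bound.

```python
import numpy as np

def bernoulli_stats(p, q, f=(0.0, 1.0)):
    P = np.array([1 - p, p])
    Q = np.array([1 - q, q])
    f = np.array(f)
    chi2 = np.sum((P - Q) ** 2 / Q)     # chi-square divergence chi^2(P||Q)
    kl = np.sum(P * np.log(P / Q))      # KL divergence D(P||Q)
    gap = P @ f - Q @ f                 # E_P[f] - E_Q[f]
    var_q = Q @ (f - Q @ f) ** 2        # Var_Q(f)
    return chi2, kl, gap, var_q

chi2, kl, gap, var_q = bernoulli_stats(0.6, 0.4)
# Hammersley-Chapman-Robbins in divergence form: gap^2 <= Var_Q(f) * chi^2(P||Q).
print(gap**2, "<=", var_q * chi2)   # equality for Bernoullis with f = identity
print("KL(P||Q) =", kl)
```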
Does Adam optimizer keep close to the optimal point?
Title | Does Adam optimizer keep close to the optimal point? |
Authors | Kiwook Bae, Heechang Ryu, Hayong Shin |
Abstract | The adaptive optimizer for training neural networks has continually evolved to overcome the limitations of previously proposed adaptive methods. Recent studies have found rare counterexamples in which Adam cannot converge to the optimal point. Those counterexamples reveal a distortion of Adam caused by a small second moment arising from small gradients. Unlike previous studies, we show that Adam cannot stay close to the optimal point, not only in the counterexamples but also in a general convex region, when the effective learning rate exceeds a certain bound. Subsequently, we propose an algorithm that overcomes Adam's limitation and ensures that it can reach and stay in the region of the optimal point. |
Tasks | |
Published | 2019-11-01 |
URL | https://arxiv.org/abs/1911.00289v1 |
https://arxiv.org/pdf/1911.00289v1.pdf | |
PWC | https://paperswithcode.com/paper/does-adam-optimizer-keep-close-to-the-optimal |
Repo | https://github.com/lessw2020/Best-Deep-Learning-Optimizers |
Framework | pytorch |
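For reference, a bare-bones Adam step showing the "effective learning rate" alpha / (sqrt(v_hat) + eps) that the abstract argues can grow too large when the second moment is small. This is the standard Adam update, not the authors' proposed fix.

```python
import numpy as np

def adam_step(theta, grad, m, v, t, alpha=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad**2
    m_hat = m / (1 - beta1**t)
    v_hat = v / (1 - beta2**t)
    effective_lr = alpha / (np.sqrt(v_hat) + eps)   # large when v_hat is tiny
    theta = theta - effective_lr * m_hat
    return theta, m, v

theta, m, v = np.zeros(3), np.zeros(3), np.zeros(3)
# At t = 1 the step size is close to alpha regardless of gradient magnitude.
theta, m, v = adam_step(theta, np.array([1e-4, 1e-2, 1.0]), m, v, t=1)
print(theta)
```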
N2D: (Not Too) Deep Clustering via Clustering the Local Manifold of an Autoencoded Embedding
Title | N2D: (Not Too) Deep Clustering via Clustering the Local Manifold of an Autoencoded Embedding |
Authors | Ryan McConville, Raul Santos-Rodriguez, Robert J Piechocki, Ian Craddock |
Abstract | Deep clustering has increasingly been demonstrating superiority over conventional shallow clustering algorithms. Deep clustering algorithms usually combine representation learning with deep neural networks to achieve this performance, typically optimizing a clustering and non-clustering loss. In such cases, an autoencoder is typically connected with a clustering network, and the final clustering is jointly learned by both the autoencoder and clustering network. Instead, we propose to learn an autoencoded embedding and then search this further for the underlying manifold. For simplicity, we then cluster this with a shallow clustering algorithm, rather than a deeper network. We study a number of local and global manifold learning methods on both the raw data and autoencoded embedding, concluding that UMAP in our framework is best able to find the most clusterable manifold in the embedding, suggesting that local manifold learning on an autoencoded embedding is effective for discovering higher quality clusters. We quantitatively show across a range of image and time-series datasets that our method has competitive performance against the latest deep clustering algorithms, including outperforming the current state-of-the-art on several. We postulate that these results show a promising research direction for deep clustering. The code can be found at https://github.com/rymc/n2d |
Tasks | Image Clustering, Representation Learning, Time Series, Time Series Clustering |
Published | 2019-08-16 |
URL | https://arxiv.org/abs/1908.05968v5 |
https://arxiv.org/pdf/1908.05968v5.pdf | |
PWC | https://paperswithcode.com/paper/n2dnot-too-deep-clustering-via-clustering-the |
Repo | https://github.com/josephsdavid/N2D |
Framework | tf |
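A condensed sketch of the pipeline the abstract describes: encode with an autoencoder, re-embed the code with a local manifold learner (UMAP), then apply a shallow clusterer. The library choices and parameters (umap-learn, scikit-learn GaussianMixture) are plausible stand-ins and are not guaranteed to match the released code.

```python
import umap                                   # pip install umap-learn
from sklearn.mixture import GaussianMixture

def n2d_style_clustering(X, encoder, n_clusters, manifold_dim=2):
    # 1) Autoencoded embedding (encoder is a pretrained network exposing .predict).
    Z = encoder.predict(X)
    # 2) Local manifold learning on the embedding.
    Z_manifold = umap.UMAP(n_components=manifold_dim, n_neighbors=20,
                           min_dist=0.0).fit_transform(Z)
    # 3) Shallow clustering on the learned manifold.
    gmm = GaussianMixture(n_components=n_clusters, covariance_type="full")
    return gmm.fit_predict(Z_manifold)
```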
Target-Guided Open-Domain Conversation
Title | Target-Guided Open-Domain Conversation |
Authors | Jianheng Tang, Tiancheng Zhao, Chenyan Xiong, Xiaodan Liang, Eric P. Xing, Zhiting Hu |
Abstract | Many real-world open-domain conversation applications have specific goals to achieve during open-ended chats, such as recommendation, psychotherapy, education, etc. We study the problem of imposing conversational goals on open-domain chat agents. In particular, we want a conversational system to chat naturally with humans and proactively guide the conversation to a designated target subject. The problem is challenging as no public data is available for learning such a target-guided strategy. We propose a structured approach that introduces coarse-grained keywords to control the intended content of system responses. We then attain smooth conversation transition through turn-level supervised learning, and drive the conversation towards the target with discourse-level constraints. We further derive a keyword-augmented conversation dataset for the study. Quantitative and human evaluations show our system can produce meaningful and effective conversations, significantly improving over other approaches. |
Tasks | |
Published | 2019-05-28 |
URL | https://arxiv.org/abs/1905.11553v2 |
https://arxiv.org/pdf/1905.11553v2.pdf | |
PWC | https://paperswithcode.com/paper/target-guided-open-domain-conversation |
Repo | https://github.com/MashiMaroLjc/KnowYouAI |
Framework | tf |
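A toy illustration of one way to realize the discourse-level constraint mentioned in the abstract: among candidate keywords proposed for the next turn, keep only those at least as close to the target word as the current keyword in embedding space, so the conversation moves monotonically toward the target. The embedding lookup, candidates, and scores are placeholders, and whether this matches the paper's exact constraint is an assumption.

```python
import numpy as np

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12))

def choose_next_keyword(candidates, scores, current_kw, target_kw, embed):
    # candidates: keyword strings; scores: model scores for each candidate;
    # embed: dict mapping keyword -> embedding vector.
    # Constraint: the next keyword must not be farther from the target than the current one.
    threshold = cosine(embed[current_kw], embed[target_kw])
    valid = [(s, w) for w, s in zip(candidates, scores)
             if cosine(embed[w], embed[target_kw]) >= threshold]
    if not valid:
        return target_kw            # fall back to the target itself
    return max(valid, key=lambda t: t[0])[1]
```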
Monocular Pedestrian Orientation Estimation Based on Deep 2D-3D Feedforward
Title | Monocular Pedestrian Orientation Estimation Based on Deep 2D-3D Feedforward |
Authors | Chenchen Zhao, Yeqiang Qian, Ming Yang |
Abstract | Accurate pedestrian orientation estimation in autonomous driving helps the ego vehicle infer the intentions of pedestrians in the surrounding environment, which form the basis of safety measures such as collision avoidance and prewarning. However, because of the relatively small size and high degree of deformation of pedestrians, common pedestrian orientation estimation models fail to extract sufficient and comprehensive information from them, which restricts their performance, especially for monocular models, which cannot obtain depth information about objects and the surrounding environment. In this paper, a novel monocular pedestrian orientation estimation model, called FFNet, is proposed. Apart from camera captures, the model adds the 2D and 3D dimensions of pedestrians as two additional inputs, exploiting the logical relationship between these dimensions and orientation. The 2D and 3D dimensions of pedestrians are determined from the camera captures and further utilized through two feedforward links connected to the orientation estimator. The feedforward links strengthen the logical structure and interpretability of the network architecture of the proposed model. Experiments show that the proposed model achieves at least a 1.72% AOS increase over most state-of-the-art models after identical training processes. The model also has competitive results in orientation estimation evaluation on the KITTI dataset. |
Tasks | Autonomous Driving |
Published | 2019-09-24 |
URL | https://arxiv.org/abs/1909.10970v2 |
https://arxiv.org/pdf/1909.10970v2.pdf | |
PWC | https://paperswithcode.com/paper/monocular-pedestrian-orientation-estimation |
Repo | https://github.com/zcc31415926/FFNet |
Framework | tf |
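A rough PyTorch sketch of the kind of 2D-3D feedforward link the abstract describes: predicted 2D and 3D pedestrian dimensions are fed forward and fused with image features before the orientation head. Layer sizes and the fusion scheme are assumptions, not the published FFNet architecture.

```python
import torch
import torch.nn as nn

class OrientationWithFeedforward(nn.Module):
    def __init__(self, img_feat_dim=512):
        super().__init__()
        self.dim2d_head = nn.Linear(img_feat_dim, 2)   # predicted (w, h) in pixels
        self.dim3d_head = nn.Linear(img_feat_dim, 3)   # predicted (h, w, l) in meters
        self.orient_head = nn.Sequential(
            nn.Linear(img_feat_dim + 2 + 3, 256), nn.ReLU(),
            nn.Linear(256, 1),                         # orientation angle
        )

    def forward(self, img_feat):
        d2 = self.dim2d_head(img_feat)
        d3 = self.dim3d_head(img_feat)
        # Feedforward links: dimension estimates are concatenated with image features.
        angle = self.orient_head(torch.cat([img_feat, d2, d3], dim=-1))
        return angle, d2, d3

model = OrientationWithFeedforward()
angle, d2, d3 = model(torch.randn(8, 512))
```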
Object Counting and Instance Segmentation with Image-level Supervision
Title | Object Counting and Instance Segmentation with Image-level Supervision |
Authors | Hisham Cholakkal, Guolei Sun, Fahad Shahbaz Khan, Ling Shao |
Abstract | Common object counting in a natural scene is a challenging problem in computer vision with numerous real-world applications. Existing image-level supervised common object counting approaches only predict the global object count and rely on additional instance-level supervision to also determine object locations. We propose an image-level supervised approach that provides both the global object count and the spatial distribution of object instances by constructing an object category density map. Motivated by psychological studies, we further reduce image-level supervision using limited object count information (up to four). To the best of our knowledge, we are the first to propose image-level supervised density map estimation for common object counting and demonstrate its effectiveness in image-level supervised instance segmentation. Comprehensive experiments are performed on the PASCAL VOC and COCO datasets. Our approach outperforms existing methods, including those using instance-level supervision, on both datasets for common object counting. Moreover, our approach improves state-of-the-art image-level supervised instance segmentation with a relative gain of 17.8% in terms of average best overlap, on the PASCAL VOC 2012 dataset. Code link: https://github.com/GuoleiSun/CountSeg |
Tasks | Instance Segmentation, Object Counting, Semantic Segmentation |
Published | 2019-03-06 |
URL | https://arxiv.org/abs/1903.02494v2 |
https://arxiv.org/pdf/1903.02494v2.pdf | |
PWC | https://paperswithcode.com/paper/object-counting-and-instance-segmentation |
Repo | https://github.com/GuoleiSun/CountSeg |
Framework | pytorch |
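A minimal illustration of how a per-category density map yields both a global count and a spatial distribution, as described above: the count is the spatial sum of the map, while peak locations indicate instances. This shows only the read-out, not the paper's training procedure.

```python
import numpy as np

def counts_from_density_maps(density_maps):
    # density_maps: array of shape (num_classes, H, W), non-negative.
    # Global per-class count = sum over spatial locations.
    return density_maps.reshape(density_maps.shape[0], -1).sum(axis=1)

density = np.zeros((2, 8, 8))
density[0, 2, 3] = 1.0   # one instance of class 0
density[0, 6, 6] = 1.0   # another instance of class 0
density[1, 4, 4] = 1.0   # one instance of class 1
print(counts_from_density_maps(density))  # -> [2., 1.]
```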
Decentralized Deep Learning with Arbitrary Communication Compression
Title | Decentralized Deep Learning with Arbitrary Communication Compression |
Authors | Anastasia Koloskova, Tao Lin, Sebastian U. Stich, Martin Jaggi |
Abstract | Decentralized training of deep learning models is a key element for enabling data privacy and on-device learning over networks, as well as for efficient scaling to large compute clusters. As current approaches suffer from limited bandwidth of the network, we propose the use of communication compression in the decentralized training context. We show that Choco-SGD, recently introduced and analyzed for strongly-convex objectives only, converges under arbitrarily high compression ratios on general non-convex functions at the rate $O\bigl(1/\sqrt{nT}\bigr)$ where $T$ denotes the number of iterations and $n$ the number of workers. The algorithm achieves linear speedup in the number of workers and supports higher compression than previous state-of-the-art methods. We demonstrate the practical performance of the algorithm in two key scenarios: the training of deep learning models (i) over distributed user devices, connected by a social network and (ii) in a datacenter (outperforming all-reduce time-wise). |
Tasks | |
Published | 2019-07-22 |
URL | https://arxiv.org/abs/1907.09356v2 |
https://arxiv.org/pdf/1907.09356v2.pdf | |
PWC | https://paperswithcode.com/paper/decentralized-deep-learning-with-arbitrary |
Repo | https://github.com/epfml/ChocoSGD |
Framework | pytorch |
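A sketch of one Choco-SGD-style iteration with top-k compression, simulating n workers on a single machine. This follows our reading of the algorithm (local SGD step, compressed exchange of the difference to a public copy, then a gossip correction) and should be treated as illustrative, not as the reference implementation in the linked repo.

```python
import numpy as np

def top_k(x, k):
    # Keep the k largest-magnitude entries, zero out the rest.
    out = np.zeros_like(x)
    idx = np.argsort(np.abs(x))[-k:]
    out[idx] = x[idx]
    return out

def choco_sgd_step(x, x_hat, grads, W, lr, gamma, k):
    # x, x_hat: (n_workers, dim) private iterates and public (compressed) copies.
    # W: (n, n) symmetric, doubly-stochastic mixing matrix of the communication graph.
    x = x - lr * grads                                    # local SGD step on each worker
    q = np.array([top_k(xi - xh, k) for xi, xh in zip(x, x_hat)])
    x_hat = x_hat + q                                     # workers update the public copies
    x = x + gamma * (W @ x_hat - x_hat)                   # gossip correction using public copies
    return x, x_hat
```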
Domain Adaptation for sEMG-based Gesture Recognition with Recurrent Neural Networks
Title | Domain Adaptation for sEMG-based Gesture Recognition with Recurrent Neural Networks |
Authors | István Ketykó, Ferenc Kovács, Krisztián Zsolt Varga |
Abstract | Surface electromyography (sEMG/EMG) records muscles’ electrical activity from a restricted area of the skin using electrodes. sEMG-based gesture recognition is extremely sensitive to inter-session and inter-subject variance. We propose a model and a deep-learning-based domain adaptation method that compensates for the domain shift to enhance recognition accuracy. Analyses performed on public sparse and High-Density (HD) sEMG datasets validate that our approach outperforms state-of-the-art methods. |
Tasks | Domain Adaptation, Gesture Recognition |
Published | 2019-01-21 |
URL | https://arxiv.org/abs/1901.06958v2 |
https://arxiv.org/pdf/1901.06958v2.pdf | |
PWC | https://paperswithcode.com/paper/domain-adaptation-for-semg-based-gesture |
Repo | https://github.com/ketyi/2SRNN |
Framework | tf |
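A hedged PyTorch sketch of one common way to realize the adaptation the abstract describes: pretrain an RNN gesture classifier on source sessions/subjects, then fit a small linear adaptation layer on the input signal for a new session while keeping the recognizer frozen. Whether this matches the authors' exact 2SRNN procedure is an assumption.

```python
import torch
import torch.nn as nn

class GestureRNN(nn.Module):
    def __init__(self, n_channels, n_classes, hidden=128):
        super().__init__()
        self.adapt = nn.Linear(n_channels, n_channels)   # per-session adaptation layer
        self.rnn = nn.LSTM(n_channels, hidden, batch_first=True)
        self.cls = nn.Linear(hidden, n_classes)

    def forward(self, x):                 # x: (batch, time, channels) raw sEMG
        h, _ = self.rnn(self.adapt(x))
        return self.cls(h[:, -1])

def adapt_to_new_session(model):
    # Stage 2: freeze the pretrained recognizer, train only the adaptation layer.
    for p in model.parameters():
        p.requires_grad = False
    for p in model.adapt.parameters():
        p.requires_grad = True
    return torch.optim.Adam(model.adapt.parameters(), lr=1e-3)
```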
Cyclical Stochastic Gradient MCMC for Bayesian Deep Learning
Title | Cyclical Stochastic Gradient MCMC for Bayesian Deep Learning |
Authors | Ruqi Zhang, Chunyuan Li, Jianyi Zhang, Changyou Chen, Andrew Gordon Wilson |
Abstract | The posteriors over neural network weights are high dimensional and multimodal. Each mode typically characterizes a meaningfully different representation of the data. We develop Cyclical Stochastic Gradient MCMC (SG-MCMC) to automatically explore such distributions. In particular, we propose a cyclical stepsize schedule, where larger steps discover new modes, and smaller steps characterize each mode. We prove that our proposed learning rate schedule provides faster convergence to samples from a stationary distribution than SG-MCMC with standard decaying schedules. Moreover, we provide extensive experimental results to demonstrate the effectiveness of cyclical SG-MCMC in learning complex multimodal distributions, especially for fully Bayesian inference with modern deep neural networks. |
Tasks | Bayesian Inference, Stochastic Optimization |
Published | 2019-02-11 |
URL | http://arxiv.org/abs/1902.03932v1 |
http://arxiv.org/pdf/1902.03932v1.pdf | |
PWC | https://paperswithcode.com/paper/cyclical-stochastic-gradient-mcmc-for |
Repo | https://github.com/ruqizhang/csgmcmc |
Framework | pytorch |
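The cyclical stepsize schedule is the core device here. Below is the cosine form as we understand it from the paper (the stepsize restarts at the start of each cycle and decays toward zero within it), with the exploration/sampling split expressed as a simple threshold; treat the exact constants as assumptions.

```python
import math

def cyclical_stepsize(k, total_iters, n_cycles, alpha0):
    # alpha_k = (alpha0 / 2) * [cos(pi * mod(k-1, ceil(K/M)) / ceil(K/M)) + 1]
    cycle_len = math.ceil(total_iters / n_cycles)
    r = ((k - 1) % cycle_len) / cycle_len
    return 0.5 * alpha0 * (math.cos(math.pi * r) + 1.0)

def in_sampling_stage(k, total_iters, n_cycles, exploration_frac=0.8):
    # Early part of each cycle: large steps, SGD-like exploration of new modes.
    # Late part: small steps, SG-MCMC sampling (collect posterior samples).
    cycle_len = math.ceil(total_iters / n_cycles)
    return ((k - 1) % cycle_len) / cycle_len >= exploration_frac

for k in (1, 100, 200, 250):
    print(k,
          round(cyclical_stepsize(k, total_iters=1000, n_cycles=4, alpha0=0.1), 4),
          in_sampling_stage(k, 1000, 4))
```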
Data-driven Estimation of Sinusoid Frequencies
Title | Data-driven Estimation of Sinusoid Frequencies |
Authors | Gautier Izacard, Sreyas Mohan, Carlos Fernandez-Granda |
Abstract | Frequency estimation is a fundamental problem in signal processing, with applications in radar imaging, underwater acoustics, seismic imaging, and spectroscopy. The goal is to estimate the frequency of each component in a multisinusoidal signal from a finite number of noisy samples. A recent machine-learning approach uses a neural network to output a learned representation with local maxima at the position of the frequency estimates. In this work, we propose a novel neural-network architecture that produces a significantly more accurate representation, and combine it with an additional neural-network module trained to detect the number of frequencies. This yields a fast, fully-automatic method for frequency estimation that achieves state-of-the-art results. In particular, it outperforms existing techniques by a substantial margin at medium-to-high noise levels. |
Tasks | |
Published | 2019-06-03 |
URL | https://arxiv.org/abs/1906.00823v2 |
https://arxiv.org/pdf/1906.00823v2.pdf | |
PWC | https://paperswithcode.com/paper/190600823 |
Repo | https://github.com/sreyas-mohan/DeepFreq |
Framework | pytorch |
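A small sketch of the read-out stage the abstract describes: given a learned frequency representation (a pseudo-spectrum over a grid) and a predicted number of components, the frequency estimates are taken at its largest local maxima. The network producing the representation is omitted, and SciPy's peak finder stands in for the paper's exact procedure.

```python
import numpy as np
from scipy.signal import find_peaks

def frequencies_from_representation(pseudo_spectrum, grid, n_components):
    # pseudo_spectrum: 1-D array over a frequency grid, with local maxima at estimates.
    peaks, props = find_peaks(pseudo_spectrum, height=0.0)
    # Keep the n_components highest peaks (n predicted by a separate counting module).
    top = peaks[np.argsort(props["peak_heights"])[-n_components:]]
    return np.sort(grid[top])

grid = np.linspace(0.0, 0.5, 1000)
rep = np.exp(-((grid - 0.1) / 0.005) ** 2) + 0.8 * np.exp(-((grid - 0.27) / 0.005) ** 2)
print(frequencies_from_representation(rep, grid, n_components=2))  # ~[0.1, 0.27]
```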
Fast Efficient Hyperparameter Tuning for Policy Gradients
Title | Fast Efficient Hyperparameter Tuning for Policy Gradients |
Authors | Supratik Paul, Vitaly Kurin, Shimon Whiteson |
Abstract | The performance of policy gradient methods is sensitive to hyperparameter settings that must be tuned for any new application. Widely used grid search methods for tuning hyperparameters are sample inefficient and computationally expensive. More advanced methods like Population Based Training, which learn optimal schedules for hyperparameters instead of fixed settings, can yield better results, but are also sample inefficient and computationally expensive. In this paper, we propose Hyperparameter Optimisation on the Fly (HOOF), a gradient-free algorithm that requires no more than one training run to automatically adapt the hyperparameters that affect the policy update directly through the gradient. The main idea is to use existing trajectories sampled by the policy gradient method to optimise a one-step improvement objective, yielding a sample- and computationally-efficient algorithm that is easy to implement. Our experimental results across multiple domains and algorithms show that using HOOF to learn these hyperparameter schedules leads to faster learning with improved performance. |
Tasks | Meta-Learning, Policy Gradient Methods |
Published | 2019-02-18 |
URL | https://arxiv.org/abs/1902.06583v2 |
https://arxiv.org/pdf/1902.06583v2.pdf | |
PWC | https://paperswithcode.com/paper/fast-efficient-hyperparameter-tuning-for |
Repo | https://github.com/supratikp/HOOF |
Framework | none |
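A compact sketch of the idea as the abstract presents it: candidate hyperparameter settings each induce a candidate policy update, each candidate is scored with weighted importance sampling on the trajectories already collected, and the best one is kept. The policy and trajectory interfaces below are placeholders, not the released API.

```python
import numpy as np

def wis_value(trajectories, new_policy, old_policy):
    # Weighted importance sampling estimate of the new policy's return,
    # using trajectories sampled from the old policy (no new environment samples).
    weights, returns = [], []
    for traj in trajectories:                      # traj: list of (state, action, reward)
        logp_new = sum(new_policy.log_prob(s, a) for s, a, _ in traj)
        logp_old = sum(old_policy.log_prob(s, a) for s, a, _ in traj)
        weights.append(np.exp(logp_new - logp_old))
        returns.append(sum(r for _, _, r in traj))
    w = np.asarray(weights)
    return float(np.dot(w, returns) / (w.sum() + 1e-12))

def hoof_style_select(old_policy, trajectories, candidate_hypers, update_fn):
    # update_fn(policy, trajectories, hyper) -> candidate updated policy.
    scored = [(wis_value(trajectories, update_fn(old_policy, trajectories, h), old_policy), h)
              for h in candidate_hypers]
    return max(scored, key=lambda t: t[0])[1]   # hyperparameter whose one-step update scores best
```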
Deep Learning for Cardiologist-level Myocardial Infarction Detection in Electrocardiograms
Title | Deep Learning for Cardiologist-level Myocardial Infarction Detection in Electrocardiograms |
Authors | Arjun Gupta, E. A. Huerta, Zhizhen Zhao, Issam Moussa |
Abstract | Heart disease is the leading cause of death worldwide. Amongst patients with cardiovascular diseases, myocardial infarction is the main cause of death. In order to provide adequate healthcare support to patients who may experience this clinical event, it is essential to gather supportive evidence in a timely manner to help secure a correct diagnosis. In this article, we study the feasibility of using deep learning to identify suggestive electrocardiographic (ECG) changes that may correctly classify heart conditions using the Physikalisch-Technische Bundesanstalt (PTB) database. As part of this study, we systematically quantify the contribution of each ECG lead to correctly distinguishing a healthy heart from an unhealthy one. For this study we fine-tune the ConvNetQuake neural network model, which was originally designed to identify earthquakes. Our findings indicate that out of 15 ECG leads, data from the v6 and vz leads are critical to correctly identify myocardial infarction. Based on these findings, we modify ConvNetQuake to simultaneously take in raw ECG data from leads v6 and vz, achieving $99.43\%$ classification accuracy, which represents cardiologist-level performance for myocardial infarction detection after feeding only 10 seconds of raw ECG data to our neural network model. This approach differs from others in the community in that the ECG data fed into the neural network model does not require any kind of manual feature extraction or pre-processing. |
Tasks | Myocardial infarction detection |
Published | 2019-12-16 |
URL | https://arxiv.org/abs/1912.07618v1 |
https://arxiv.org/pdf/1912.07618v1.pdf | |
PWC | https://paperswithcode.com/paper/deep-learning-for-cardiologist-level |
Repo | https://github.com/arjung128/mi_detection |
Framework | pytorch |
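A hedged sketch of a two-lead 1D-CNN classifier of the kind the abstract describes, taking 10 seconds of raw ECG from leads v6 and vz as two input channels. The layer sizes and sampling rate are assumptions; this is not the authors' modified ConvNetQuake.

```python
import torch
import torch.nn as nn

class TwoLeadECGNet(nn.Module):
    # Input: (batch, 2, T) raw ECG, channels = leads v6 and vz,
    # T = 10 s * sampling_rate (assumed here: 1000 Hz -> T = 10000 samples).
    def __init__(self, n_blocks=6, width=32):
        super().__init__()
        layers, in_ch = [], 2
        for _ in range(n_blocks):
            layers += [nn.Conv1d(in_ch, width, kernel_size=3, stride=2, padding=1),
                       nn.ReLU()]
            in_ch = width
        self.features = nn.Sequential(*layers)
        self.head = nn.Linear(width, 2)       # healthy vs. myocardial infarction

    def forward(self, x):
        h = self.features(x).mean(dim=-1)     # global average pooling over time
        return self.head(h)

model = TwoLeadECGNet()
print(model(torch.randn(1, 2, 10000)).shape)  # torch.Size([1, 2])
```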