July 29, 2019


Paper Group AWR 198


Application of Multi-channel 3D-cube Successive Convolution Network for Convective Storm Nowcasting


Title	Application of Multi-channel 3D-cube Successive Convolution Network for Convective Storm Nowcasting
Authors	Wei Zhang, Lei Han, Juanzhen Sun, Hanyang Guo, Jie Dai
Abstract	Convective storm nowcasting has attracted substantial attention in various fields. Existing methods under a deep learning framework rely primarily on radar data. Although they nowcast storm advection well, it is still challenging to nowcast storm initiation and growth, due to the limitations of the radar observations. This paper describes the first attempt to nowcast storm initiation, growth, and advection simultaneously under a deep learning framework using multi-source meteorological data. To this end, we present a multi-channel 3D-cube successive convolution network (3D-SCN). As real-time re-analysis meteorological data can now provide valuable atmospheric boundary layer thermal dynamic information, which is essential to predict storm initiation and growth, both raw 3D radar and re-analysis data are used directly without any handcrafted feature engineering. These data are formulated as multi-channel 3D cubes, to be fed into our network, which are convolved by cross-channel 3D convolutions. By stacking successive convolutional layers without pooling, we build an end-to-end trainable model for nowcasting. Experimental results show that deep learning methods achieve better performance than traditional extrapolation methods. The qualitative analyses of 3D-SCN show encouraging results of nowcasting of storm initiation, growth, and advection.
Tasks	Feature Engineering
Published	2017-02-15
URL	https://arxiv.org/abs/1702.04517v5
PDF	https://arxiv.org/pdf/1702.04517v5.pdf
PWC	https://paperswithcode.com/paper/application-of-multi-channel-3d-cube
Repo	https://github.com/DandelionX/research-references
Framework	none
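
To make the phrase "cross-channel 3D convolutions" concrete, here is a minimal pure-Python sketch of a single filter that sums over every input channel of a multi-channel 3D cube. It is not the authors' implementation: the function name, the "valid" (no padding) border handling, and the all-ones toy data standing in for radar and re-analysis channels are all assumptions made for illustration.

```python
def conv3d_cross_channel(cube, kernel):
    """Apply one 3D filter across all channels (no padding, stride 1).

    cube:   nested lists indexed [channel][z][y][x]
    kernel: nested lists indexed [channel][dz][dy][dx]
    Returns a single output volume indexed [z][y][x].
    As in most deep learning libraries, this is a cross-correlation
    (the kernel is not flipped).
    """
    channels = len(cube)
    depth, height, width = len(cube[0]), len(cube[0][0]), len(cube[0][0][0])
    kd, kh, kw = len(kernel[0]), len(kernel[0][0]), len(kernel[0][0][0])
    out = []
    for z in range(depth - kd + 1):
        plane = []
        for y in range(height - kh + 1):
            row = []
            for x in range(width - kw + 1):
                total = 0.0
                # Sum over channels as well as the 3D neighbourhood.
                for c in range(channels):
                    for dz in range(kd):
                        for dy in range(kh):
                            for dx in range(kw):
                                total += (cube[c][z + dz][y + dy][x + dx]
                                          * kernel[c][dz][dy][dx])
                row.append(total)
            plane.append(row)
        out.append(plane)
    return out


def ones(channels, depth, height, width):
    """Hypothetical helper: an all-ones cube used only as toy input."""
    return [[[[1.0] * width for _ in range(height)]
             for _ in range(depth)] for _ in range(channels)]


# Two toy channels (say, radar and re-analysis), each a 3x3x3 volume.
output = conv3d_cross_channel(ones(2, 3, 3, 3), ones(2, 2, 2, 2))
```

Because no pooling is applied, stacking such layers shrinks the volume only by the kernel size minus one per axis per layer (or not at all if padding is used).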

A Morphology-aware Network for Morphological Disambiguation


Title	A Morphology-aware Network for Morphological Disambiguation
Authors	Eray Yildiz, Caglar Tirkaz, H. Bahadir Sahin, Mustafa Tolga Eren, Ozan Sonmez
Abstract	Agglutinative languages such as Turkish, Finnish and Hungarian require morphological disambiguation before further processing due to the complex morphology of words. A morphological disambiguator is used to select the correct morphological analysis of a word. Morphological disambiguation is important because it generally is one of the first steps of natural language processing and its performance affects subsequent analyses. In this paper, we propose a system that uses deep learning techniques for morphological disambiguation. Many of the state-of-the-art results in computer vision, speech recognition and natural language processing have been obtained through deep learning models. However, applying deep learning techniques to morphologically rich languages is not well studied. In this work, while we focus on Turkish morphological disambiguation we also present results for French and German in order to show that the proposed architecture achieves high accuracy with no language-specific feature engineering or additional resource. In the experiments, we achieve 84.12, 88.35 and 93.78 morphological disambiguation accuracy among the ambiguous words for Turkish, German and French respectively.
Tasks	Feature Engineering, Morphological Analysis, Speech Recognition
Published	2017-02-13
URL	http://arxiv.org/abs/1702.03654v1
PDF	http://arxiv.org/pdf/1702.03654v1.pdf
PWC	https://paperswithcode.com/paper/a-morphology-aware-network-for-morphological
Repo	https://github.com/hbahadirsahin/text_categorization
Framework	tf
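
The abstract reports accuracy "among the ambiguous words", which differs from accuracy over all tokens. The sketch below shows one way such a metric could be computed; the function name, the data layout, and the example analyses are hypothetical and not taken from the paper.

```python
def ambiguous_accuracy(tokens):
    """Accuracy restricted to tokens with more than one candidate analysis.

    tokens: list of (candidate_analyses, gold_analysis, predicted_analysis).
    Returns None when no token is ambiguous.
    """
    scored = [(gold, pred) for candidates, gold, pred in tokens
              if len(candidates) > 1]
    if not scored:
        return None
    correct = sum(1 for gold, pred in scored if gold == pred)
    return correct / len(scored)


# Illustrative data only: two ambiguous tokens and one unambiguous token.
sample = [
    (["ev+Noun+Acc", "ev+Noun+P3sg"], "ev+Noun+Acc", "ev+Noun+Acc"),
    (["gel+Verb+Imp", "gel+Noun"], "gel+Verb+Imp", "gel+Noun"),
    (["ve+Conj"], "ve+Conj", "ve+Conj"),
]
score = ambiguous_accuracy(sample)
```

The unambiguous token is ignored, so a trivially correct prediction on it does not inflate the score.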

An Empirical Analysis of Feature Engineering for Predictive Modeling


Title	An Empirical Analysis of Feature Engineering for Predictive Modeling
Authors	Jeff Heaton
Abstract	Machine learning models, such as neural networks, decision trees, random forests and gradient boosting machines accept a feature vector and provide a prediction. These models learn in a supervised fashion where a set of feature vectors with expected output is provided. It is very common practice to engineer new features from the provided feature set. Such engineered features will either augment, or replace portions of the existing feature vector. These engineered features are essentially calculated fields, based on the values of the other features. Engineering such features is primarily a manual, time-consuming task. Additionally, each type of model will respond differently to different types of engineered features. This paper reports on empirical research to demonstrate what types of engineered features are best suited to which machine learning model type. This is accomplished by generating several datasets that are designed to benefit from a particular type of engineered feature. The experiment demonstrates to what degree the machine learning model is capable of synthesizing the needed feature on its own. If a model is capable of synthesizing an engineered feature, it is not necessary to provide that feature. The research demonstrated that the studied models do indeed perform differently with various types of engineered features.
Tasks	Feature Engineering
Published	2017-01-26
URL	http://arxiv.org/abs/1701.07852v1
PDF	http://arxiv.org/pdf/1701.07852v1.pdf
PWC	https://paperswithcode.com/paper/an-empirical-analysis-of-feature-engineering
Repo	https://github.com/jeffheaton/papers
Framework	tf
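
As a small illustration of an engineered feature being "a calculated field based on the values of the other features", the sketch below appends a ratio of two existing columns to a feature vector. The function name and the choice of a ratio are assumptions for this example; the paper studies several feature types.

```python
def add_ratio_feature(row, numerator, denominator):
    """Return a copy of `row` with row[numerator] / row[denominator] appended.

    A zero denominator yields 0.0 here; that fallback is an arbitrary
    choice for this sketch, not a recommendation from the paper.
    """
    if row[denominator] == 0:
        value = 0.0
    else:
        value = row[numerator] / row[denominator]
    return row + [value]


augmented = add_ratio_feature([6.0, 3.0], numerator=0, denominator=1)
```

The experiment described in the abstract asks, in effect, whether a model could have learned that extra column by itself from the first two.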

Recurrent neural networks with specialized word embeddings for health-domain named-entity recognition


Title	Recurrent neural networks with specialized word embeddings for health-domain named-entity recognition
Authors	Inigo Jauregi Unanue, Ehsan Zare Borzeshi, Massimo Piccardi
Abstract	Background. Previous state-of-the-art systems on Drug Name Recognition (DNR) and Clinical Concept Extraction (CCE) have focused on a combination of text “feature engineering” and conventional machine learning algorithms such as conditional random fields and support vector machines. However, developing good features is inherently heavily time-consuming. Conversely, more modern machine learning approaches such as recurrent neural networks (RNNs) have proved capable of automatically learning effective features from either random assignments or automated word “embeddings”. Objectives. (i) To create a highly accurate DNR and CCE system that avoids conventional, time-consuming feature engineering. (ii) To create richer, more specialized word embeddings by using health domain datasets such as MIMIC-III. (iii) To evaluate our systems over three contemporary datasets. Methods. Two deep learning methods, namely the Bidirectional LSTM and the Bidirectional LSTM-CRF, are evaluated. A CRF model is set as the baseline to compare the deep learning systems to a traditional machine learning approach. The same features are used for all the models. Results. We have obtained the best results with the Bidirectional LSTM-CRF model, which has outperformed all previously proposed systems. The specialized embeddings have helped to cover unusual words in DDI-DrugBank and DDI-MedLine, but not in the 2010 i2b2/VA IRB Revision dataset. Conclusion. We present a state-of-the-art system for DNR and CCE. Automated word embeddings have allowed us to avoid costly feature engineering and achieve higher accuracy. Nevertheless, the embeddings need to be retrained over datasets that are adequate for the domain, in order to adequately cover the domain-specific vocabulary.
Tasks	Clinical Concept Extraction, Feature Engineering, Named Entity Recognition, Word Embeddings
Published	2017-06-29
URL	http://arxiv.org/abs/1706.09569v2
PDF	http://arxiv.org/pdf/1706.09569v2.pdf
PWC	https://paperswithcode.com/paper/recurrent-neural-networks-with-specialized
Repo	https://github.com/ijauregiCMCRC/healthNER
Framework	none

CRF Autoencoder for Unsupervised Dependency Parsing


Title	CRF Autoencoder for Unsupervised Dependency Parsing
Authors	Jiong Cai, Yong Jiang, Kewei Tu
Abstract	Unsupervised dependency parsing, which tries to discover linguistic dependency structures from unannotated data, is a very challenging task. Almost all previous work on this task focuses on learning generative models. In this paper, we develop an unsupervised dependency parsing model based on the CRF autoencoder. The encoder part of our model is discriminative and globally normalized which allows us to use rich features as well as universal linguistic priors. We propose an exact algorithm for parsing as well as a tractable learning algorithm. We evaluated the performance of our model on eight multilingual treebanks and found that our model achieved comparable performance with state-of-the-art approaches.
Tasks	Dependency Grammar Induction
Published	2017-08-03
URL	http://arxiv.org/abs/1708.01018v1
PDF	http://arxiv.org/pdf/1708.01018v1.pdf
PWC	https://paperswithcode.com/paper/crf-autoencoder-for-unsupervised-dependency
Repo	https://github.com/caijiong/CRFAE-Dep-Parser
Framework	none

Embedding Words as Distributions with a Bayesian Skip-gram Model


Title	Embedding Words as Distributions with a Bayesian Skip-gram Model
Authors	Arthur Bražinskas, Serhii Havrylov, Ivan Titov
Abstract	We introduce a method for embedding words as probability densities in a low-dimensional space. Rather than assuming that a word embedding is fixed across the entire text collection, as in standard word embedding methods, in our Bayesian model we generate it from a word-specific prior density for each occurrence of a given word. Intuitively, for each word, the prior density encodes the distribution of its potential ‘meanings’. These prior densities are conceptually similar to Gaussian embeddings. Interestingly, unlike the Gaussian embeddings, we can also obtain context-specific densities: they encode uncertainty about the sense of a word given its context and correspond to posterior distributions within our model. The context-dependent densities have many potential applications: for example, we show that they can be directly used in the lexical substitution task. We describe an effective estimation method based on the variational autoencoding framework. We also demonstrate that our embeddings achieve competitive results on standard benchmarks.
Tasks
Published	2017-11-29
URL	http://arxiv.org/abs/1711.11027v2
PDF	http://arxiv.org/pdf/1711.11027v2.pdf
PWC	https://paperswithcode.com/paper/embedding-words-as-distributions-with-a
Repo	https://github.com/ixlan/BSG
Framework	none
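
The abstract contrasts a word-specific prior density with a context-specific posterior. Variational autoencoding objectives typically include a KL divergence between such densities, so the sketch below computes the closed-form KL between two diagonal Gaussians. Treating both densities as diagonal Gaussians, and all names used here, are assumptions for illustration and not details confirmed by the abstract.

```python
import math


def kl_diag_gaussians(mu_q, sigma_q, mu_p, sigma_p):
    """KL(q || p) for diagonal Gaussians given per-dimension means and
    standard deviations (all arguments are equal-length lists)."""
    total = 0.0
    for mq, sq, mp, sp in zip(mu_q, sigma_q, mu_p, sigma_p):
        total += (math.log(sp / sq)
                  + (sq ** 2 + (mq - mp) ** 2) / (2.0 * sp ** 2)
                  - 0.5)
    return total


# Toy one-dimensional example: posterior shifted by one unit from the prior.
kl_shifted = kl_diag_gaussians([1.0], [1.0], [0.0], [1.0])
kl_same = kl_diag_gaussians([0.0], [1.0], [0.0], [1.0])
```

A small divergence would mean the context barely changes the belief about the word's sense; a large one would mean the context is highly informative.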

Backprop as Functor: A compositional perspective on supervised learning


Title	Backprop as Functor: A compositional perspective on supervised learning
Authors	Brendan Fong, David I. Spivak, Rémy Tuyéras
Abstract	A supervised learning algorithm searches over a set of functions $A \to B$ parametrised by a space $P$ to find the best approximation to some ideal function $f\colon A \to B$. It does this by taking examples $(a,f(a)) \in A\times B$, and updating the parameter according to some rule. We define a category where these update rules may be composed, and show that gradient descent—with respect to a fixed step size and an error function satisfying a certain property—defines a monoidal functor from a category of parametrised functions to this category of update rules. This provides a structural perspective on backpropagation, as well as a broad generalisation of neural networks.
Tasks
Published	2017-11-28
URL	http://arxiv.org/abs/1711.10455v3
PDF	http://arxiv.org/pdf/1711.10455v3.pdf
PWC	https://paperswithcode.com/paper/backprop-as-functor-a-compositional
Repo	https://github.com/TomohikoK/backprop-as-functor
Framework	none
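
The idea that update rules "may be composed" can be sketched in a few lines. Below, a learner is a triple of functions (implement, update, request), and two learners are chained so that the second passes a requested input back to the first. The scalar model $I(p, a) = p \cdot a$, the quadratic error, and the specific form of the request function are assumptions based on one reading of the construction, not a verified transcription of the paper.

```python
from collections import namedtuple

# implement(p, a) -> b, update(p, a, b) -> new p, request(p, a, b) -> new a
Learner = namedtuple("Learner", ["implement", "update", "request"])


def scalar_learner(step):
    """Learner for I(p, a) = p * a trained by gradient descent on
    E = (I(p, a) - b)^2 / 2 with a fixed step size."""
    def implement(p, a):
        return p * a

    def update(p, a, b):
        return p - step * (p * a - b) * a   # gradient with respect to p

    def request(p, a, b):
        return a - (p * a - b) * p          # input the learner "asks for"

    return Learner(implement, update, request)


def compose(first, second):
    """Chain two learners; parameters of the composite are pairs (p, q)."""
    def implement(params, a):
        p, q = params
        return second.implement(q, first.implement(p, a))

    def update(params, a, c):
        p, q = params
        b = first.implement(p, a)
        # The second learner's request becomes the first learner's target.
        return (first.update(p, a, second.request(q, b, c)),
                second.update(q, b, c))

    def request(params, a, c):
        p, q = params
        b = first.implement(p, a)
        return first.request(p, a, second.request(q, b, c))

    return Learner(implement, update, request)


net = compose(scalar_learner(0.1), scalar_learner(0.1))
before = net.implement((1.0, 1.0), 1.0)
new_params = net.update((1.0, 1.0), 1.0, 2.0)
after = net.implement(new_params, 1.0)
```

In this toy run a single composite update moves the output closer to the target 2.0, which is the behaviour backpropagation through two layers would produce.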

Adaptive Gaussian process approximation for Bayesian inference with expensive likelihood functions


Title	Adaptive Gaussian process approximation for Bayesian inference with expensive likelihood functions
Authors	Hongqiao Wang, Jinglai Li
Abstract	We consider Bayesian inference problems with computationally intensive likelihood functions. We propose a Gaussian process (GP) based method to approximate the joint distribution of the unknown parameters and the data. In particular, we write the joint density approximately as a product of an approximate posterior density and an exponentiated GP surrogate. We then provide an adaptive algorithm to construct such an approximation, where an active learning method is used to choose the design points. With numerical examples, we illustrate that the proposed method has competitive performance against existing approaches for Bayesian computation.
Tasks	Active Learning, Bayesian Inference
Published	2017-03-29
URL	http://arxiv.org/abs/1703.09930v4
PDF	http://arxiv.org/pdf/1703.09930v4.pdf
PWC	https://paperswithcode.com/paper/adaptive-gaussian-process-approximation-for
Repo	https://github.com/dflemin3/approxposterior
Framework	none

Unsupervised Deep Homography: A Fast and Robust Homography Estimation Model


Title	Unsupervised Deep Homography: A Fast and Robust Homography Estimation Model
Authors	Ty Nguyen, Steven W. Chen, Shreyas S. Shivakumar, Camillo J. Taylor, Vijay Kumar
Abstract	Homography estimation between multiple aerial images can provide relative pose estimation for collaborative autonomous exploration and monitoring. The usage on a robotic system requires a fast and robust homography estimation algorithm. In this study, we propose an unsupervised learning algorithm that trains a Deep Convolutional Neural Network to estimate planar homographies. We compare the proposed algorithm to traditional feature-based and direct methods, as well as a corresponding supervised learning algorithm. Our empirical results demonstrate that compared to traditional approaches, the unsupervised algorithm achieves faster inference speed, while maintaining comparable or better accuracy and robustness to illumination variation. In addition, on both a synthetic dataset and representative real-world aerial dataset, our unsupervised method has superior adaptability and performance compared to the supervised deep learning method.
Tasks	Homography Estimation, Pose Estimation
Published	2017-09-12
URL	http://arxiv.org/abs/1709.03966v3
PDF	http://arxiv.org/pdf/1709.03966v3.pdf
PWC	https://paperswithcode.com/paper/unsupervised-deep-homography-a-fast-and
Repo	https://github.com/tynguyen/unsupervisedDeepHomographyRAL2018
Framework	tf
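
For readers unfamiliar with planar homographies, the sketch below shows what the estimated quantity does: a 3x3 matrix maps a point in one image to a point in another via homogeneous coordinates. This illustrates the standard definition only, not the network or loss from the paper; the function name and the example matrix are hypothetical.

```python
def apply_homography(H, point):
    """Map a 2D point through a 3x3 homography given as nested lists."""
    x, y = point
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    # Divide by the homogeneous coordinate to return to pixel coordinates.
    return (u / w, v / w)


# A pure translation by (2, 3), written as a homography.
H_shift = [[1.0, 0.0, 2.0],
           [0.0, 1.0, 3.0],
           [0.0, 0.0, 1.0]]
mapped = apply_homography(H_shift, (1.0, 1.0))

# Homographies are defined only up to scale: 2 * H maps points identically.
H_scaled = [[2.0 * value for value in row] for row in H_shift]
mapped_scaled = apply_homography(H_scaled, (1.0, 1.0))
```

The scale ambiguity is why a homography has eight degrees of freedom even though the matrix has nine entries.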

Scaling the Scattering Transform: Deep Hybrid Networks


Title	Scaling the Scattering Transform: Deep Hybrid Networks
Authors	Edouard Oyallon, Eugene Belilovsky, Sergey Zagoruyko
Abstract	We use the scattering network as a generic and fixed initialization of the first layers of a supervised hybrid deep network. We show that early layers do not necessarily need to be learned, providing the best results to date with pre-defined representations while being competitive with Deep CNNs. Using a shallow cascade of 1 x 1 convolutions, which encodes scattering coefficients that correspond to spatial windows of very small sizes, makes it possible to obtain AlexNet accuracy on the imagenet ILSVRC2012. We demonstrate that this local encoding explicitly learns invariance w.r.t. rotations. Combining scattering networks with a modern ResNet, we achieve a single-crop top 5 error of 11.4% on imagenet ILSVRC2012, comparable to the Resnet-18 architecture, while utilizing only 10 layers. We also find that hybrid architectures can yield excellent performance in the small sample regime, exceeding their end-to-end counterparts, through their ability to incorporate geometrical priors. We demonstrate this on subsets of the CIFAR-10 dataset and on the STL-10 dataset.
Tasks
Published	2017-03-27
URL	http://arxiv.org/abs/1703.08961v2
PDF	http://arxiv.org/pdf/1703.08961v2.pdf
PWC	https://paperswithcode.com/paper/scaling-the-scattering-transform-deep-hybrid
Repo	https://github.com/edouardoyallon/pyscatwave
Framework	pytorch
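
A 1 x 1 convolution, as mentioned in the abstract, mixes channels at each spatial position without looking at neighbouring positions. The pure-Python sketch below shows that operation on a tiny feature map; the function name and the toy numbers are assumptions, and the input merely stands in for scattering coefficients.

```python
def conv1x1(feature_map, weights):
    """Per-pixel linear map across channels.

    feature_map: nested lists indexed [channel][y][x]
    weights:     nested lists indexed [out_channel][in_channel]
    Returns nested lists indexed [out_channel][y][x].
    """
    height, width = len(feature_map[0]), len(feature_map[0][0])
    out = []
    for row_weights in weights:
        channel = [
            [sum(w * feature_map[c][y][x] for c, w in enumerate(row_weights))
             for x in range(width)]
            for y in range(height)
        ]
        out.append(channel)
    return out


# Two input channels on a 1x2 grid, mixed into a sum and a difference.
mixed = conv1x1([[[1.0, 2.0]], [[3.0, 4.0]]],
                [[1.0, 1.0], [1.0, -1.0]])
```

Because each output value depends on a single spatial location, a cascade of such layers encodes only local information, consistent with the abstract's remark about very small spatial windows.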

Learning Depth from Monocular Videos using Direct Methods


Title	Learning Depth from Monocular Videos using Direct Methods
Authors	Chaoyang Wang, Jose Miguel Buenaposada, Rui Zhu, Simon Lucey
Abstract	The ability to predict depth from a single image, using recent advances in CNNs, is of increasing interest to the vision community. Unsupervised strategies to learning are particularly appealing as they can utilize much larger and varied monocular video datasets during learning without the need for ground truth depth or stereo. In previous works, separate pose and depth CNN predictors had to be determined such that their joint outputs minimized the photometric error. Inspired by recent advances in direct visual odometry (DVO), we argue that the depth CNN predictor can be learned without a pose CNN predictor. Further, we demonstrate empirically that incorporating a differentiable implementation of DVO, along with a novel depth normalization strategy, substantially improves performance over state-of-the-art methods that use monocular videos for training.
Tasks	Depth And Camera Motion, Visual Odometry
Published	2017-12-01
URL	http://arxiv.org/abs/1712.00175v1
PDF	http://arxiv.org/pdf/1712.00175v1.pdf
PWC	https://paperswithcode.com/paper/learning-depth-from-monocular-videos-using
Repo	https://github.com/yzcjtr/GeoNet
Framework	tf

Interactive Visual Data Exploration with Subjective Feedback: An Information-Theoretic Approach


Title	Interactive Visual Data Exploration with Subjective Feedback: An Information-Theoretic Approach
Authors	Kai Puolamäki, Emilia Oikarinen, Bo Kang, Jefrey Lijffijt, Tijl De Bie
Abstract	Visual exploration of high-dimensional real-valued datasets is a fundamental task in exploratory data analysis (EDA). Existing methods use predefined criteria to choose the representation of data. There is a lack of methods that (i) elicit from the user what she has learned from the data and (ii) show patterns that she does not know yet. We construct a theoretical model where identified patterns can be input as knowledge to the system. The knowledge syntax here is intuitive, such as “this set of points forms a cluster”, and requires no knowledge of maths. This background knowledge is used to find a Maximum Entropy distribution of the data, after which the system provides the user data projections in which the data and the Maximum Entropy distribution differ the most, hence showing the user aspects of the data that are maximally informative given the user’s current knowledge. We provide an open source EDA system with tailored interactive visualizations to demonstrate these concepts. We study the performance of the system and present use cases on both synthetic and real data. We find that the model and the prototype system allow the user to learn information efficiently from various data sources and the system works sufficiently fast in practice. We conclude that the information theoretic approach to exploratory data analysis where patterns observed by a user are formalized as constraints provides a principled, intuitive, and efficient basis for constructing an EDA system.
Tasks
Published	2017-10-23
URL	http://arxiv.org/abs/1710.08167v1
PDF	http://arxiv.org/pdf/1710.08167v1.pdf
PWC	https://paperswithcode.com/paper/interactive-visual-data-exploration-with
Repo	https://github.com/edahelsinki/EDAdemoR
Framework	none

SMASH: One-Shot Model Architecture Search through HyperNetworks


Title	SMASH: One-Shot Model Architecture Search through HyperNetworks
Authors	Andrew Brock, Theodore Lim, J. M. Ritchie, Nick Weston
Abstract	Designing architectures for deep neural networks requires expert knowledge and substantial computation time. We propose a technique to accelerate architecture selection by learning an auxiliary HyperNet that generates the weights of a main model conditioned on that model’s architecture. By comparing the relative validation performance of networks with HyperNet-generated weights, we can effectively search over a wide range of architectures at the cost of a single training run. To facilitate this search, we develop a flexible mechanism based on memory read-writes that allows us to define a wide range of network connectivity patterns, with ResNet, DenseNet, and FractalNet blocks as special cases. We validate our method (SMASH) on CIFAR-10 and CIFAR-100, STL-10, ModelNet10, and Imagenet32x32, achieving competitive performance with similarly-sized hand-designed networks. Our code is available at https://github.com/ajbrock/SMASH
Tasks	Neural Architecture Search
Published	2017-08-17
URL	http://arxiv.org/abs/1708.05344v1
PDF	http://arxiv.org/pdf/1708.05344v1.pdf
PWC	https://paperswithcode.com/paper/smash-one-shot-model-architecture-search
Repo	https://github.com/ajbrock/SMASH
Framework	pytorch

Conditional independence testing based on a nearest-neighbor estimator of conditional mutual information


Title	Conditional independence testing based on a nearest-neighbor estimator of conditional mutual information
Authors	Jakob Runge
Abstract	Conditional independence testing is a fundamental problem underlying causal discovery and a particularly challenging task in the presence of nonlinear and high-dimensional dependencies. Here a fully non-parametric test for continuous data based on conditional mutual information combined with a local permutation scheme is presented. Through a nearest neighbor approach, the test also adapts efficiently to non-smooth distributions due to strongly nonlinear dependencies. Numerical experiments demonstrate that the test reliably simulates the null distribution even for small sample sizes and with high-dimensional conditioning sets. The test is better calibrated than kernel-based tests utilizing an analytical approximation of the null distribution, especially for non-smooth densities, and reaches the same or higher power levels. Combining the local permutation scheme with the kernel tests leads to better calibration, but suffers in power. For smaller sample sizes and lower dimensions, the test is faster than random Fourier feature-based kernel tests if the permutation scheme is (embarrassingly) parallelized, but the runtime increases more sharply with sample size and dimensionality. Thus, more theoretical research to analytically approximate the null distribution and speed up the estimation for larger sample sizes is desirable.
Tasks	Calibration, Causal Discovery
Published	2017-09-05
URL	http://arxiv.org/abs/1709.01447v1
PDF	http://arxiv.org/pdf/1709.01447v1.pdf
PWC	https://paperswithcode.com/paper/conditional-independence-testing-based-on-a
Repo	https://github.com/jakobrunge/tigramite
Framework	none

Kernel Cross-Correlator


Title	Kernel Cross-Correlator
Authors	Chen Wang, Le Zhang, Lihua Xie, Junsong Yuan
Abstract	Cross-correlator plays a significant role in many visual perception tasks, such as object detection and tracking. Beyond the linear cross-correlator, this paper proposes a kernel cross-correlator (KCC) that breaks traditional limitations. First, by introducing the kernel trick, the KCC extends the linear cross-correlation to non-linear space, which is more robust to signal noises and distortions. Second, the connection to the existing works shows that KCC provides a unified solution for correlation filters. Third, KCC is applicable to any kernel function and is not limited to circulant structure on training data, thus it is able to predict affine transformations with customized properties. Last, by leveraging the fast Fourier transform (FFT), KCC eliminates direct calculation of kernel vectors, thus achieves better performance yet still with a reasonable computational cost. Comprehensive experiments on visual tracking and human activity recognition using wearable devices demonstrate its robustness, flexibility, and efficiency. The source codes of both experiments are released at https://github.com/wang-chen/KCC
Tasks	Activity Recognition, Human Activity Recognition, Object Detection, Visual Tracking
Published	2017-09-12
URL	http://arxiv.org/abs/1709.05936v4
PDF	http://arxiv.org/pdf/1709.05936v4.pdf
PWC	https://paperswithcode.com/paper/kernel-cross-correlator
Repo	https://github.com/wang-chen/KCC
Framework	none