Paper Group AWR 198
Application of Multi-channel 3D-cube Successive Convolution Network for Convective Storm Nowcasting
Title | Application of Multi-channel 3D-cube Successive Convolution Network for Convective Storm Nowcasting |
Authors | Wei Zhang, Lei Han, Juanzhen Sun, Hanyang Guo, Jie Dai |
Abstract | Convective storm nowcasting has attracted substantial attention in various fields. Existing methods under a deep learning framework rely primarily on radar data. Although they nowcast storm advection well, it is still challenging to nowcast storm initiation and growth, due to the limitations of the radar observations. This paper describes the first attempt to nowcast storm initiation, growth, and advection simultaneously under a deep learning framework using multi-source meteorological data. To this end, we present a multi-channel 3D-cube successive convolution network (3D-SCN). As real-time re-analysis meteorological data can now provide valuable atmospheric boundary layer thermal dynamic information, which is essential to predict storm initiation and growth, both raw 3D radar and re-analysis data are used directly without any handcrafted feature engineering. These data are formulated as multi-channel 3D cubes, to be fed into our network, which are convolved by cross-channel 3D convolutions. By stacking successive convolutional layers without pooling, we build an end-to-end trainable model for nowcasting. Experimental results show that deep learning methods achieve better performance than traditional extrapolation methods. The qualitative analyses of 3D-SCN show encouraging results of nowcasting of storm initiation, growth, and advection. |
Tasks | Feature Engineering |
Published | 2017-02-15 |
URL | https://arxiv.org/abs/1702.04517v5 |
https://arxiv.org/pdf/1702.04517v5.pdf | |
PWC | https://paperswithcode.com/paper/application-of-multi-channel-3d-cube |
Repo | https://github.com/DandelionX/research-references |
Framework | none |
A Morphology-aware Network for Morphological Disambiguation
Title | A Morphology-aware Network for Morphological Disambiguation |
Authors | Eray Yildiz, Caglar Tirkaz, H. Bahadir Sahin, Mustafa Tolga Eren, Ozan Sonmez |
Abstract | Agglutinative languages such as Turkish, Finnish and Hungarian require morphological disambiguation before further processing due to the complex morphology of words. A morphological disambiguator is used to select the correct morphological analysis of a word. Morphological disambiguation is important because it generally is one of the first steps of natural language processing and its performance affects subsequent analyses. In this paper, we propose a system that uses deep learning techniques for morphological disambiguation. Many of the state-of-the-art results in computer vision, speech recognition and natural language processing have been obtained through deep learning models. However, applying deep learning techniques to morphologically rich languages is not well studied. In this work, while we focus on Turkish morphological disambiguation we also present results for French and German in order to show that the proposed architecture achieves high accuracy with no language-specific feature engineering or additional resource. In the experiments, we achieve 84.12, 88.35 and 93.78 morphological disambiguation accuracy among the ambiguous words for Turkish, German and French respectively. |
Tasks | Feature Engineering, Morphological Analysis, Speech Recognition |
Published | 2017-02-13 |
URL | http://arxiv.org/abs/1702.03654v1 |
http://arxiv.org/pdf/1702.03654v1.pdf | |
PWC | https://paperswithcode.com/paper/a-morphology-aware-network-for-morphological |
Repo | https://github.com/hbahadirsahin/text_categorization |
Framework | tf |
An Empirical Analysis of Feature Engineering for Predictive Modeling
Title | An Empirical Analysis of Feature Engineering for Predictive Modeling |
Authors | Jeff Heaton |
Abstract | Machine learning models, such as neural networks, decision trees, random forests and gradient boosting machines accept a feature vector and provide a prediction. These models learn in a supervised fashion where a set of feature vectors with expected output is provided. It is very common practice to engineer new features from the provided feature set. Such engineered features will either augment, or replace portions of the existing feature vector. These engineered features are essentially calculated fields, based on the values of the other features. Engineering such features is primarily a manual, time-consuming task. Additionally, each type of model will respond differently to different types of engineered features. This paper reports on empirical research to demonstrate what types of engineered features are best suited to which machine learning model type. This is accomplished by generating several datasets that are designed to benefit from a particular type of engineered feature. The experiment demonstrates to what degree the machine learning model is capable of synthesizing the needed feature on its own. If a model is capable of synthesizing an engineered feature, it is not necessary to provide that feature. The research demonstrated that the studied models do indeed perform differently with various types of engineered features. |
Tasks | Feature Engineering |
Published | 2017-01-26 |
URL | http://arxiv.org/abs/1701.07852v1 |
http://arxiv.org/pdf/1701.07852v1.pdf | |
PWC | https://paperswithcode.com/paper/an-empirical-analysis-of-feature-engineering |
Repo | https://github.com/jeffheaton/papers |
Framework | tf |
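To make the idea of an engineered feature as a "calculated field" concrete, the following is a minimal sketch that augments a feature vector with a ratio and a product of two existing features. The function name `augment` and the choice of ratio and product are illustrative assumptions, not necessarily the feature types or datasets studied in the paper.

```python
from typing import List


def augment(row: List[float]) -> List[float]:
    """Return the row extended with two calculated fields (hypothetical choice)."""
    x0, x1 = row[0], row[1]
    ratio = x0 / x1 if x1 != 0 else 0.0  # guard against division by zero
    product = x0 * x1
    return row + [ratio, product]


rows = [[6.0, 3.0], [1.0, 4.0]]
augmented = [augment(r) for r in rows]
```

Whether a model needs such a column depends on whether it can synthesize the ratio or product on its own, which is the question the abstract poses.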
Recurrent neural networks with specialized word embeddings for health-domain named-entity recognition
Title | Recurrent neural networks with specialized word embeddings for health-domain named-entity recognition |
Authors | Inigo Jauregi Unanue, Ehsan Zare Borzeshi, Massimo Piccardi |
Abstract | Background. Previous state-of-the-art systems on Drug Name Recognition (DNR) and Clinical Concept Extraction (CCE) have focused on a combination of text “feature engineering” and conventional machine learning algorithms such as conditional random fields and support vector machines. However, developing good features is inherently time-consuming. Conversely, more modern machine learning approaches such as recurrent neural networks (RNNs) have proved capable of automatically learning effective features from either random assignments or automated word “embeddings”. Objectives. (i) To create a highly accurate DNR and CCE system that avoids conventional, time-consuming feature engineering. (ii) To create richer, more specialized word embeddings by using health domain datasets such as MIMIC-III. (iii) To evaluate our systems over three contemporary datasets. Methods. Two deep learning methods, namely the Bidirectional LSTM and the Bidirectional LSTM-CRF, are evaluated. A CRF model is set as the baseline to compare the deep learning systems to a traditional machine learning approach. The same features are used for all the models. Results. We have obtained the best results with the Bidirectional LSTM-CRF model, which has outperformed all previously proposed systems. The specialized embeddings have helped to cover unusual words in DDI-DrugBank and DDI-MedLine, but not in the 2010 i2b2/VA IRB Revision dataset. Conclusion. We present a state-of-the-art system for DNR and CCE. Automated word embeddings have allowed us to avoid costly feature engineering and achieve higher accuracy. Nevertheless, the embeddings need to be retrained over datasets that are adequate for the domain, in order to adequately cover the domain-specific vocabulary. |
Tasks | Clinical Concept Extraction, Feature Engineering, Named Entity Recognition, Word Embeddings |
Published | 2017-06-29 |
URL | http://arxiv.org/abs/1706.09569v2 |
http://arxiv.org/pdf/1706.09569v2.pdf | |
PWC | https://paperswithcode.com/paper/recurrent-neural-networks-with-specialized |
Repo | https://github.com/ijauregiCMCRC/healthNER |
Framework | none |
CRF Autoencoder for Unsupervised Dependency Parsing
Title | CRF Autoencoder for Unsupervised Dependency Parsing |
Authors | Jiong Cai, Yong Jiang, Kewei Tu |
Abstract | Unsupervised dependency parsing, which tries to discover linguistic dependency structures from unannotated data, is a very challenging task. Almost all previous work on this task focuses on learning generative models. In this paper, we develop an unsupervised dependency parsing model based on the CRF autoencoder. The encoder part of our model is discriminative and globally normalized which allows us to use rich features as well as universal linguistic priors. We propose an exact algorithm for parsing as well as a tractable learning algorithm. We evaluated the performance of our model on eight multilingual treebanks and found that our model achieved comparable performance with state-of-the-art approaches. |
Tasks | Dependency Grammar Induction |
Published | 2017-08-03 |
URL | http://arxiv.org/abs/1708.01018v1 |
http://arxiv.org/pdf/1708.01018v1.pdf | |
PWC | https://paperswithcode.com/paper/crf-autoencoder-for-unsupervised-dependency |
Repo | https://github.com/caijiong/CRFAE-Dep-Parser |
Framework | none |
Embedding Words as Distributions with a Bayesian Skip-gram Model
Title | Embedding Words as Distributions with a Bayesian Skip-gram Model |
Authors | Arthur Bražinskas, Serhii Havrylov, Ivan Titov |
Abstract | We introduce a method for embedding words as probability densities in a low-dimensional space. Rather than assuming that a word embedding is fixed across the entire text collection, as in standard word embedding methods, in our Bayesian model we generate it from a word-specific prior density for each occurrence of a given word. Intuitively, for each word, the prior density encodes the distribution of its potential ‘meanings’. These prior densities are conceptually similar to Gaussian embeddings. Interestingly, unlike the Gaussian embeddings, we can also obtain context-specific densities: they encode uncertainty about the sense of a word given its context and correspond to posterior distributions within our model. The context-dependent densities have many potential applications: for example, we show that they can be directly used in the lexical substitution task. We describe an effective estimation method based on the variational autoencoding framework. We also demonstrate that our embeddings achieve competitive results on standard benchmarks. |
Tasks | |
Published | 2017-11-29 |
URL | http://arxiv.org/abs/1711.11027v2 |
http://arxiv.org/pdf/1711.11027v2.pdf | |
PWC | https://paperswithcode.com/paper/embedding-words-as-distributions-with-a |
Repo | https://github.com/ixlan/BSG |
Framework | none |
Backprop as Functor: A compositional perspective on supervised learning
Title | Backprop as Functor: A compositional perspective on supervised learning |
Authors | Brendan Fong, David I. Spivak, Rémy Tuyéras |
Abstract | A supervised learning algorithm searches over a set of functions $A \to B$ parametrised by a space $P$ to find the best approximation to some ideal function $f\colon A \to B$. It does this by taking examples $(a,f(a)) \in A\times B$, and updating the parameter according to some rule. We define a category where these update rules may be composed, and show that gradient descent—with respect to a fixed step size and an error function satisfying a certain property—defines a monoidal functor from a category of parametrised functions to this category of update rules. This provides a structural perspective on backpropagation, as well as a broad generalisation of neural networks. |
Tasks | |
Published | 2017-11-28 |
URL | http://arxiv.org/abs/1711.10455v3 |
http://arxiv.org/pdf/1711.10455v3.pdf | |
PWC | https://paperswithcode.com/paper/backprop-as-functor-a-compositional |
Repo | https://github.com/TomohikoK/backprop-as-functor |
Framework | none |
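The composition of update rules described in the abstract can be illustrated with a small sketch. It assumes scalar linear maps, a quadratic error, and a fixed step size `EPS`. The `request` function, which passes a corrected input back to the preceding learner, is an assumption here since the abstract does not spell it out; the names `Learner` and `compose_update` are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable

EPS = 0.1  # fixed step size (illustrative value)


@dataclass
class Learner:
    implement: Callable[[float, float], float]       # I(p, a) -> b
    update: Callable[[float, float, float], float]   # U(p, a, b) -> new p
    request: Callable[[float, float, float], float]  # r(p, a, b) -> requested a


def linear_learner() -> Learner:
    """Scalar map a -> p * a trained by gradient descent on 0.5 * (p*a - b)**2."""
    return Learner(
        implement=lambda p, a: p * a,
        update=lambda p, a, b: p - EPS * (p * a - b) * a,
        request=lambda p, a, b: a - (p * a - b) * p,  # assumed form for quadratic error
    )


def compose_update(f: Learner, g: Learner, p: float, q: float, a: float, c: float):
    """One update step of the composite (g after f) on the example (a, c)."""
    b = f.implement(p, a)                # forward pass through f
    new_q = g.update(q, b, c)            # g trains on the pair (b, c)
    b_requested = g.request(q, b, c)     # g reports the input it would have preferred
    new_p = f.update(p, a, b_requested)  # f trains on the pair (a, requested b)
    return new_p, new_q


f, g = linear_learner(), linear_learner()
p, q, a, c = 2.0, 3.0, 1.5, 4.0
new_p, new_q = compose_update(f, g, p, q, a, c)

# Direct gradient descent on the composite error 0.5 * (q*p*a - c)**2.
residual = q * p * a - c
direct_p = p - EPS * residual * q * a
direct_q = q - EPS * residual * p * a
```

In this scalar case the composed update agrees with plain gradient descent on the composite function, which is the sense in which backpropagation is compositional.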
Adaptive Gaussian process approximation for Bayesian inference with expensive likelihood functions
Title | Adaptive Gaussian process approximation for Bayesian inference with expensive likelihood functions |
Authors | Hongqiao Wang, Jinglai Li |
Abstract | We consider Bayesian inference problems with computationally intensive likelihood functions. We propose a Gaussian process (GP) based method to approximate the joint distribution of the unknown parameters and the data. In particular, we write the joint density approximately as a product of an approximate posterior density and an exponentiated GP surrogate. We then provide an adaptive algorithm to construct such an approximation, where an active learning method is used to choose the design points. With numerical examples, we illustrate that the proposed method has competitive performance against existing approaches for Bayesian computation. |
Tasks | Active Learning, Bayesian Inference |
Published | 2017-03-29 |
URL | http://arxiv.org/abs/1703.09930v4 |
http://arxiv.org/pdf/1703.09930v4.pdf | |
PWC | https://paperswithcode.com/paper/adaptive-gaussian-process-approximation-for |
Repo | https://github.com/dflemin3/approxposterior |
Framework | none |
Unsupervised Deep Homography: A Fast and Robust Homography Estimation Model
Title | Unsupervised Deep Homography: A Fast and Robust Homography Estimation Model |
Authors | Ty Nguyen, Steven W. Chen, Shreyas S. Shivakumar, Camillo J. Taylor, Vijay Kumar |
Abstract | Homography estimation between multiple aerial images can provide relative pose estimation for collaborative autonomous exploration and monitoring. The usage on a robotic system requires a fast and robust homography estimation algorithm. In this study, we propose an unsupervised learning algorithm that trains a Deep Convolutional Neural Network to estimate planar homographies. We compare the proposed algorithm to traditional feature-based and direct methods, as well as a corresponding supervised learning algorithm. Our empirical results demonstrate that compared to traditional approaches, the unsupervised algorithm achieves faster inference speed, while maintaining comparable or better accuracy and robustness to illumination variation. In addition, on both a synthetic dataset and representative real-world aerial dataset, our unsupervised method has superior adaptability and performance compared to the supervised deep learning method. |
Tasks | Homography Estimation, Pose Estimation |
Published | 2017-09-12 |
URL | http://arxiv.org/abs/1709.03966v3 |
http://arxiv.org/pdf/1709.03966v3.pdf | |
PWC | https://paperswithcode.com/paper/unsupervised-deep-homography-a-fast-and |
Repo | https://github.com/tynguyen/unsupervisedDeepHomographyRAL2018 |
Framework | tf |
Scaling the Scattering Transform: Deep Hybrid Networks
Title | Scaling the Scattering Transform: Deep Hybrid Networks |
Authors | Edouard Oyallon, Eugene Belilovsky, Sergey Zagoruyko |
Abstract | We use the scattering network as a generic and fixed initialization of the first layers of a supervised hybrid deep network. We show that early layers do not necessarily need to be learned, providing the best results to date with pre-defined representations while being competitive with deep CNNs. Using a shallow cascade of 1 x 1 convolutions, which encodes scattering coefficients that correspond to spatial windows of very small sizes, makes it possible to reach AlexNet accuracy on ImageNet ILSVRC2012. We demonstrate that this local encoding explicitly learns invariance w.r.t. rotations. Combining scattering networks with a modern ResNet, we achieve a single-crop top-5 error of 11.4% on ImageNet ILSVRC2012, comparable to the ResNet-18 architecture, while utilizing only 10 layers. We also find that hybrid architectures can yield excellent performance in the small sample regime, exceeding their end-to-end counterparts, through their ability to incorporate geometrical priors. We demonstrate this on subsets of the CIFAR-10 dataset and on the STL-10 dataset. |
Tasks | |
Published | 2017-03-27 |
URL | http://arxiv.org/abs/1703.08961v2 |
http://arxiv.org/pdf/1703.08961v2.pdf | |
PWC | https://paperswithcode.com/paper/scaling-the-scattering-transform-deep-hybrid |
Repo | https://github.com/edouardoyallon/pyscatwave |
Framework | pytorch |
Learning Depth from Monocular Videos using Direct Methods
Title | Learning Depth from Monocular Videos using Direct Methods |
Authors | Chaoyang Wang, Jose Miguel Buenaposada, Rui Zhu, Simon Lucey |
Abstract | The ability to predict depth from a single image, using recent advances in CNNs, is of increasing interest to the vision community. Unsupervised learning strategies are particularly appealing as they can utilize much larger and more varied monocular video datasets during learning without the need for ground truth depth or stereo. In previous works, separate pose and depth CNN predictors had to be determined such that their joint outputs minimized the photometric error. Inspired by recent advances in direct visual odometry (DVO), we argue that the depth CNN predictor can be learned without a pose CNN predictor. Further, we demonstrate empirically that incorporating a differentiable implementation of DVO, along with a novel depth normalization strategy, substantially improves performance over state-of-the-art methods that use monocular videos for training. |
Tasks | Depth And Camera Motion, Visual Odometry |
Published | 2017-12-01 |
URL | http://arxiv.org/abs/1712.00175v1 |
http://arxiv.org/pdf/1712.00175v1.pdf | |
PWC | https://paperswithcode.com/paper/learning-depth-from-monocular-videos-using |
Repo | https://github.com/yzcjtr/GeoNet |
Framework | tf |
Interactive Visual Data Exploration with Subjective Feedback: An Information-Theoretic Approach
Title | Interactive Visual Data Exploration with Subjective Feedback: An Information-Theoretic Approach |
Authors | Kai Puolamäki, Emilia Oikarinen, Bo Kang, Jefrey Lijffijt, Tijl De Bie |
Abstract | Visual exploration of high-dimensional real-valued datasets is a fundamental task in exploratory data analysis (EDA). Existing methods use predefined criteria to choose the representation of data. There is a lack of methods that (i) elicit from the user what she has learned from the data and (ii) show patterns that she does not know yet. We construct a theoretical model where identified patterns can be input as knowledge to the system. The knowledge syntax here is intuitive, such as “this set of points forms a cluster”, and requires no knowledge of maths. This background knowledge is used to find a Maximum Entropy distribution of the data, after which the system provides the user data projections in which the data and the Maximum Entropy distribution differ the most, hence showing the user aspects of the data that are maximally informative given the user’s current knowledge. We provide an open source EDA system with tailored interactive visualizations to demonstrate these concepts. We study the performance of the system and present use cases on both synthetic and real data. We find that the model and the prototype system allow the user to learn information efficiently from various data sources and the system works sufficiently fast in practice. We conclude that the information theoretic approach to exploratory data analysis where patterns observed by a user are formalized as constraints provides a principled, intuitive, and efficient basis for constructing an EDA system. |
Tasks | |
Published | 2017-10-23 |
URL | http://arxiv.org/abs/1710.08167v1 |
http://arxiv.org/pdf/1710.08167v1.pdf | |
PWC | https://paperswithcode.com/paper/interactive-visual-data-exploration-with |
Repo | https://github.com/edahelsinki/EDAdemoR |
Framework | none |
SMASH: One-Shot Model Architecture Search through HyperNetworks
Title | SMASH: One-Shot Model Architecture Search through HyperNetworks |
Authors | Andrew Brock, Theodore Lim, J. M. Ritchie, Nick Weston |
Abstract | Designing architectures for deep neural networks requires expert knowledge and substantial computation time. We propose a technique to accelerate architecture selection by learning an auxiliary HyperNet that generates the weights of a main model conditioned on that model’s architecture. By comparing the relative validation performance of networks with HyperNet-generated weights, we can effectively search over a wide range of architectures at the cost of a single training run. To facilitate this search, we develop a flexible mechanism based on memory read-writes that allows us to define a wide range of network connectivity patterns, with ResNet, DenseNet, and FractalNet blocks as special cases. We validate our method (SMASH) on CIFAR-10 and CIFAR-100, STL-10, ModelNet10, and Imagenet32x32, achieving competitive performance with similarly-sized hand-designed networks. Our code is available at https://github.com/ajbrock/SMASH |
Tasks | Neural Architecture Search |
Published | 2017-08-17 |
URL | http://arxiv.org/abs/1708.05344v1 |
http://arxiv.org/pdf/1708.05344v1.pdf | |
PWC | https://paperswithcode.com/paper/smash-one-shot-model-architecture-search |
Repo | https://github.com/ajbrock/SMASH |
Framework | pytorch |
Conditional independence testing based on a nearest-neighbor estimator of conditional mutual information
Title | Conditional independence testing based on a nearest-neighbor estimator of conditional mutual information |
Authors | Jakob Runge |
Abstract | Conditional independence testing is a fundamental problem underlying causal discovery and a particularly challenging task in the presence of nonlinear and high-dimensional dependencies. Here a fully non-parametric test for continuous data based on conditional mutual information combined with a local permutation scheme is presented. Through a nearest neighbor approach, the test efficiently adapts also to non-smooth distributions due to strongly nonlinear dependencies. Numerical experiments demonstrate that the test reliably simulates the null distribution even for small sample sizes and with high-dimensional conditioning sets. The test is better calibrated than kernel-based tests utilizing an analytical approximation of the null distribution, especially for non-smooth densities, and reaches the same or higher power levels. Combining the local permutation scheme with the kernel tests leads to better calibration, but suffers in power. For smaller sample sizes and lower dimensions, the test is faster than random fourier feature-based kernel tests if the permutation scheme is (embarrassingly) parallelized, but the runtime increases more sharply with sample size and dimensionality. Thus, more theoretical research to analytically approximate the null distribution and speed up the estimation for larger sample sizes is desirable. |
Tasks | Calibration, Causal Discovery |
Published | 2017-09-05 |
URL | http://arxiv.org/abs/1709.01447v1 |
http://arxiv.org/pdf/1709.01447v1.pdf | |
PWC | https://paperswithcode.com/paper/conditional-independence-testing-based-on-a |
Repo | https://github.com/jakobrunge/tigramite |
Framework | none |
Kernel Cross-Correlator
Title | Kernel Cross-Correlator |
Authors | Chen Wang, Le Zhang, Lihua Xie, Junsong Yuan |
Abstract | The cross-correlator plays a significant role in many visual perception tasks, such as object detection and tracking. Beyond the linear cross-correlator, this paper proposes a kernel cross-correlator (KCC) that breaks traditional limitations. First, by introducing the kernel trick, the KCC extends the linear cross-correlation to non-linear space, which is more robust to signal noise and distortions. Second, the connection to existing works shows that KCC provides a unified solution for correlation filters. Third, KCC is applicable to any kernel function and is not limited to circulant structure on training data, thus it is able to predict affine transformations with customized properties. Last, by leveraging the fast Fourier transform (FFT), KCC eliminates direct calculation of kernel vectors, and thus achieves better performance at a reasonable computational cost. Comprehensive experiments on visual tracking and human activity recognition using wearable devices demonstrate its robustness, flexibility, and efficiency. The source code for both experiments is released at https://github.com/wang-chen/KCC |
Tasks | Activity Recognition, Human Activity Recognition, Object Detection, Visual Tracking |
Published | 2017-09-12 |
URL | http://arxiv.org/abs/1709.05936v4 |
http://arxiv.org/pdf/1709.05936v4.pdf | |
PWC | https://paperswithcode.com/paper/kernel-cross-correlator |
Repo | https://github.com/wang-chen/KCC |
Framework | none |
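As background for the abstract's FFT remark, the sketch below computes the plain linear (circular) cross-correlation in the frequency domain and uses it to recover a shift between two signals. This is the baseline that KCC generalizes, not the kernelized method itself; the helper names `fft` and `circular_cross_correlation` and the toy signal are illustrative assumptions.

```python
import cmath
from typing import List


def fft(x: List[complex], inverse: bool = False) -> List[complex]:
    """Recursive radix-2 FFT (unnormalised); len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def circular_cross_correlation(x: List[float], z: List[float]) -> List[float]:
    """corr[k] = sum_n x[n] * z[(n + k) % N], computed in the frequency domain."""
    n = len(x)
    fx = fft([complex(v) for v in x])
    fz = fft([complex(v) for v in z])
    spectrum = [b * a.conjugate() for a, b in zip(fx, fz)]
    # The inverse transform above is unnormalised, so divide by n here.
    return [v.real / n for v in fft(spectrum, inverse=True)]


signal = [0.0, 1.0, 3.0, 1.0, 0.0, 0.0, 0.0, 0.0]
shift = 3
shifted = signal[-shift:] + signal[:-shift]  # circular delay by 3 samples
corr = circular_cross_correlation(signal, shifted)
estimated_shift = max(range(len(corr)), key=corr.__getitem__)
```

The peak of `corr` sits at the lag where the two signals align, so `estimated_shift` gives the delay without evaluating every lag in the time domain.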