Paper Group ANR 542
Backpropagation generalized for output derivatives
Title | Backpropagation generalized for output derivatives |
Authors | V. I. Avrutskiy |
Abstract | The backpropagation algorithm is the cornerstone of neural network analysis. This paper extends it to training any derivatives of a neural network's output with respect to its input. As a result, feedforward networks can be used to solve, or verify solutions of, partial or ordinary, linear or nonlinear differential equations. This method differs vastly from traditional ones such as finite differences on a mesh: it contains no approximations, but rather an exact form of the differential operators. The algorithm is built to train a feedforward network with any number of hidden layers and any kind of sufficiently smooth activation functions. It is presented in the form of matrix-vector products, so a highly parallel implementation is readily possible. The first part derives the method for the 2D case with first- and second-order derivatives; the second part extends it to the N-dimensional case with arbitrary derivatives. All expressions necessary for applying this method to most applied PDEs can be found in Appendix D. |
Tasks | |
Published | 2017-12-12 |
URL | http://arxiv.org/abs/1712.04185v1 |
http://arxiv.org/pdf/1712.04185v1.pdf | |
PWC | https://paperswithcode.com/paper/backpropagation-generalized-for-output |
Repo | |
Framework | |
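To make the idea above concrete, here is a minimal sketch (my illustration, not the paper's exact algorithm) that trains a network's output derivative so the residual of a simple ODE vanishes. The paper derives exact backpropagation rules for such derivatives as matrix-vector products; this sketch leans on PyTorch autograd for the same quantities, and the ODE, architecture, and optimizer settings are illustrative assumptions.

```python
import torch

# Train a small MLP u(x) so that u'(x) + u(x) = 0 with u(0) = 1,
# i.e. u(x) ~= exp(-x). The derivative du/dx enters the loss, so the
# derivative itself is being trained, as in the paper's setting.
net = torch.nn.Sequential(
    torch.nn.Linear(1, 32), torch.nn.Tanh(),
    torch.nn.Linear(32, 32), torch.nn.Tanh(),
    torch.nn.Linear(32, 1),
)
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

for step in range(2000):
    x = torch.rand(256, 1, requires_grad=True)   # collocation points in [0, 1]
    u = net(x)
    du = torch.autograd.grad(u.sum(), x, create_graph=True)[0]
    residual = du + u                             # exact ODE residual, no mesh
    loss = (residual ** 2).mean() + (net(torch.zeros(1, 1)) - 1.0).pow(2).mean()
    opt.zero_grad(); loss.backward(); opt.step()
```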
A Computational Model of Afterimages based on Simultaneous and Successive Contrasts
Title | A Computational Model of Afterimages based on Simultaneous and Successive Contrasts |
Authors | Jinhui Yu, Kailin Wu, Kang Zhang, Xianjun Sam Zheng |
Abstract | A negative afterimage appears in our vision when we shift our gaze from an overstimulated original image to a new area with a uniform color. The colors of negative afterimages differ from the original stimulating colors when the color in the new area is either neutral or chromatic. The interaction between the stimulating colors in the test and inducing fields of the original image changes our color perception due to simultaneous contrast, and the interaction between the changed colors perceived in the previously viewed field and the color in the currently viewed field also affects our perception of colors in negative afterimages due to successive contrast. Based on these observations, we propose a computational model to estimate the colors of negative afterimages in more general cases, where the original stimulating color in the test field is chromatic, and the original stimulating color in the inducing field and the new stimulating color can be either neutral or chromatic. We validate our model with human experiments. |
Tasks | |
Published | 2017-09-13 |
URL | http://arxiv.org/abs/1709.04550v1 |
http://arxiv.org/pdf/1709.04550v1.pdf | |
PWC | https://paperswithcode.com/paper/a-computational-model-of-afterimages-based-on |
Repo | |
Framework | |
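As a rough, hedged companion to the model above (this is a toy illustration, not the authors' model): negative afterimages are approximately complementary to the adapting color, so a crude first-order estimate blends the complement of the stimulus with the newly viewed field. The blend weight and per-channel RGB treatment are assumptions; the paper's model additionally accounts for simultaneous contrast between the test and inducing fields and for successive contrast.

```python
# Toy estimate of a negative afterimage color; all parameters illustrative.
def naive_afterimage(stimulus_rgb, new_field_rgb, adaptation=0.5):
    complement = tuple(1.0 - c for c in stimulus_rgb)   # complementary hue
    return tuple(
        adaptation * comp + (1.0 - adaptation) * new    # fade into the new field
        for comp, new in zip(complement, new_field_rgb)
    )

# A red stimulus viewed against a neutral gray yields a cyan-shifted percept.
print(naive_afterimage((1.0, 0.0, 0.0), (0.5, 0.5, 0.5)))
```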
Erratum: Link prediction in drug-target interactions network using similarity indices
Title | Erratum: Link prediction in drug-target interactions network using similarity indices |
Authors | Yiding Lu, Yufan Guo, Anna Korhonen |
Abstract | Background: In silico drug-target interaction (DTI) prediction plays an integral role in drug repositioning: the discovery of new uses for existing drugs. One popular method of drug repositioning is network-based DTI prediction, which uses complex network theory to predict DTIs from a drug-target network. Currently, most network-based DTI prediction is based on machine learning methods such as Restricted Boltzmann Machines (RBM) or Support Vector Machines (SVM). These methods require additional information about the characteristics of drugs, targets and DTIs, such as chemical structure, genome sequence, binding types, and causes of interactions, and do not perform satisfactorily when such information is unavailable. To address this problem, we propose a new, alternative method for DTI prediction that makes use of network topology information only. Results: We compare our method for DTI prediction against the well-known RBM approach. We show that, when applied to the MATADOR database, our approach based on node neighborhoods yields higher precision for high-ranking predictions than RBM when no information regarding DTI types is available. Conclusion: This demonstrates that approaches based purely on network topology provide a more suitable approach to DTI prediction in the many real-life situations where little or no prior knowledge is available about the characteristics of drugs, targets, or their interactions. |
Tasks | Link Prediction |
Published | 2017-11-01 |
URL | http://arxiv.org/abs/1711.00150v1 |
http://arxiv.org/pdf/1711.00150v1.pdf | |
PWC | https://paperswithcode.com/paper/erratum-link-prediction-in-drug-target |
Repo | |
Framework | |
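A minimal sketch of the kind of topology-only similarity index the paper relies on (the paper evaluates several indices on MATADOR; this bipartite common-neighbors variant and the toy edge list are illustrative assumptions):

```python
from collections import defaultdict

# Toy bipartite drug-target graph.
edges = [("drug_a", "t1"), ("drug_a", "t2"), ("drug_b", "t2"), ("drug_b", "t3")]
nbrs = defaultdict(set)
for drug, target in edges:
    nbrs[drug].add(target)
    nbrs[target].add(drug)

def score(drug, target):
    # Bipartite analogue of the common-neighbors index: count drugs that
    # share a target with `drug` and already interact with `target`
    # (i.e. paths of length 3 between the candidate pair).
    similar = {d for t in nbrs[drug] for d in nbrs[t] if d != drug}
    return sum(1 for d in similar if target in nbrs[d])

print(score("drug_a", "t3"))  # drug_b shares t2 with drug_a and binds t3 -> 1
```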
An Effective Training Method For Deep Convolutional Neural Network
Title | An Effective Training Method For Deep Convolutional Neural Network |
Authors | Yang Jiang, Zeyang Dou, Qun Hao, Jie Cao, Kun Gao, Xi Chen |
Abstract | In this paper, we propose the nonlinearity generation method to speed up and stabilize the training of deep convolutional neural networks. The proposed method modifies a family of activation functions into nonlinearity generators (NGs). NGs make the activation functions linear and symmetric with respect to their inputs to lower the model capacity, and automatically introduce nonlinearity to enhance the capacity of the model during training. The proposed method can be considered an unusual form of regularization: the model parameters are obtained by training a relatively low-capacity model, which is relatively easy to optimize at the beginning, for only a few iterations, and these parameters are then reused to initialize a higher-capacity model. We derive upper and lower bounds on the variance of the weight variation, and show that the initial symmetric structure of NGs helps stabilize training. We evaluate the proposed method on different convolutional neural network architectures over two object recognition benchmarks (CIFAR-10 and CIFAR-100). Experimental results show that the proposed method allows us to (1) speed up the convergence of training, (2) rely on less careful weight initialization, (3) improve or at least maintain the performance of the model at negligible extra computational cost, and (4) easily train very deep models. |
Tasks | Object Recognition |
Published | 2017-07-31 |
URL | http://arxiv.org/abs/1708.01666v5 |
http://arxiv.org/pdf/1708.01666v5.pdf | |
PWC | https://paperswithcode.com/paper/an-effective-training-method-for-deep |
Repo | |
Framework | |
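One plausible way to realize an NG-style activation, sketched here as an assumption rather than the paper's exact parameterization: start from a unit that is effectively the identity (linear, hence low capacity and easy to optimize) and let a learnable parameter pull in nonlinearity as training proceeds.

```python
import torch

class NonlinearityGenerator(torch.nn.Module):
    """Hedged sketch: linear at initialization, learns to become nonlinear."""

    def __init__(self, a_init=10.0):
        super().__init__()
        self.a = torch.nn.Parameter(torch.tensor(a_init))

    def forward(self, x):
        # With a large `a`, relu(x + a) - a == x for all practical inputs,
        # so the unit starts out linear; as `a` shrinks it approaches a ReLU.
        return torch.relu(x + self.a) - self.a
```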
Improving Multilingual Named Entity Recognition with Wikipedia Entity Type Mapping
Title | Improving Multilingual Named Entity Recognition with Wikipedia Entity Type Mapping |
Authors | Jian Ni, Radu Florian |
Abstract | State-of-the-art named entity recognition (NER) systems are statistical machine learning models that have strong generalization capability (i.e., they can recognize unseen entities that do not appear in the training data) based on lexical and contextual information. However, such a model can still make mistakes if its features favor a wrong entity type. In this paper, we utilize Wikipedia as an open knowledge base to improve multilingual NER systems. Central to our approach is the construction of high-accuracy, high-coverage multilingual Wikipedia entity type mappings. These mappings are built from weakly annotated data and can be extended to new languages with no human annotation or language-dependent knowledge involved. Based on these mappings, we develop several approaches to improving an NER system. We evaluate the performance of the approaches via experiments on NER systems trained for 6 languages. Experimental results show that the proposed approaches are effective in improving the accuracy of such systems on unseen entities, especially when a system is applied to a new domain or is trained with little training data (up to an 18.3 F1 score improvement). |
Tasks | Named Entity Recognition |
Published | 2017-07-08 |
URL | http://arxiv.org/abs/1707.02459v1 |
http://arxiv.org/pdf/1707.02459v1.pdf | |
PWC | https://paperswithcode.com/paper/improving-multilingual-named-entity |
Repo | |
Framework | |
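The correction idea can be sketched in a few lines: when a high-confidence entity type mapping disagrees with the tagger, override the prediction. The mapping entries and the override rule below are illustrative assumptions, not the paper's actual resources or decision procedure.

```python
# Tiny stand-in for a multilingual Wikipedia entity-type mapping.
wiki_type = {"Berlin": "LOCATION", "Siemens": "ORGANIZATION"}

def correct(entity_text, predicted_type):
    # Trust the mapping when it covers the entity, else keep the tagger's type.
    mapped = wiki_type.get(entity_text)
    return mapped if mapped is not None else predicted_type

print(correct("Siemens", "PERSON"))  # -> ORGANIZATION
```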
Voice Conversion Using Sequence-to-Sequence Learning of Context Posterior Probabilities
Title | Voice Conversion Using Sequence-to-Sequence Learning of Context Posterior Probabilities |
Authors | Hiroyuki Miyoshi, Yuki Saito, Shinnosuke Takamichi, Hiroshi Saruwatari |
Abstract | Voice conversion (VC) using sequence-to-sequence learning of context posterior probabilities is proposed. Conventional VC using shared context posterior probabilities predicts target speech parameters from the context posterior probabilities estimated from the source speech parameters. Although conventional VC can be built from non-parallel data, it is difficult to convert speaker individuality, such as phonetic properties and speaking rate, contained in the posterior probabilities, because the source posterior probabilities are used directly to predict the target speech parameters. In this work, we assume that the training data partly include parallel speech data and propose sequence-to-sequence learning between the source and target posterior probabilities. The conversion models perform a non-linear, variable-length transformation from the source probability sequence to the target one. Further, we propose a joint training algorithm for the modules. In contrast to conventional VC, which separately trains the speech recognition module that estimates posterior probabilities and the speech synthesis module that predicts target speech parameters, our proposed method jointly trains these modules along with the proposed probability conversion modules. Experimental results demonstrate that our approach outperforms conventional VC. |
Tasks | Speech Recognition, Speech Synthesis, Voice Conversion |
Published | 2017-04-10 |
URL | http://arxiv.org/abs/1704.02360v4 |
http://arxiv.org/pdf/1704.02360v4.pdf | |
PWC | https://paperswithcode.com/paper/voice-conversion-using-sequence-to-sequence |
Repo | |
Framework | |
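A compact encoder-decoder skeleton for the probability-conversion module described above; the dimensions, GRU choice, and KL-style loss are illustrative assumptions, and the jointly trained recognition and synthesis modules are omitted.

```python
import torch
import torch.nn as nn

class PosteriorSeq2Seq(nn.Module):
    """Maps a source posterior sequence to a variable-length target sequence."""

    def __init__(self, n_phones=40, hidden=128):
        super().__init__()
        self.enc = nn.GRU(n_phones, hidden, batch_first=True)
        self.dec = nn.GRU(n_phones, hidden, batch_first=True)
        self.out = nn.Linear(hidden, n_phones)

    def forward(self, src, tgt_in):
        _, h = self.enc(src)              # summarize the source sequence
        dec_out, _ = self.dec(tgt_in, h)  # teacher-forced decoding
        return torch.log_softmax(self.out(dec_out), dim=-1)

model = PosteriorSeq2Seq()
src = torch.softmax(torch.randn(2, 120, 40), dim=-1)  # source posteriors (len 120)
tgt = torch.softmax(torch.randn(2, 100, 40), dim=-1)  # parallel targets (len 100)
loss = nn.functional.kl_div(model(src, tgt), tgt, reduction="batchmean")
```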
Semi-Supervised Model Training for Unbounded Conversational Speech Recognition
Title | Semi-Supervised Model Training for Unbounded Conversational Speech Recognition |
Authors | Shane Walker, Morten Pedersen, Iroro Orife, Jason Flaks |
Abstract | For conversational large-vocabulary continuous speech recognition (LVCSR) tasks, up to about two thousand hours of audio is commonly used to train state-of-the-art models. Collecting labeled conversational audio, however, is prohibitively expensive, laborious, and error-prone. Furthermore, academic corpora like Fisher English (2004) or Switchboard (1992) are inadequate for training models with sufficient accuracy in the unbounded space of conversational speech. These corpora are also timeworn due to dated acoustic telephony features and the rapid advancement of colloquial vocabulary and idiomatic speech over the last decades. Utilizing the colossal scale of our unlabeled telephony dataset, we propose a technique to construct a modern, high-quality conversational speech training corpus on the order of hundreds of millions of utterances (or tens of thousands of hours) for both acoustic and language model training. We describe the data collection, selection, and training, and evaluate the results of our updated speech recognition system on a test corpus of 7K manually transcribed utterances. We show relative word error rate (WER) reductions of {35%, 19%} on {agent, caller} utterances over our seed model, and a 5% absolute WER improvement over IBM Watson STT on this conversational speech task. |
Tasks | Language Modelling, Large Vocabulary Continuous Speech Recognition, Speech Recognition |
Published | 2017-05-26 |
URL | http://arxiv.org/abs/1705.09724v1 |
http://arxiv.org/pdf/1705.09724v1.pdf | |
PWC | https://paperswithcode.com/paper/semi-supervised-model-training-for-unbounded |
Repo | |
Framework | |
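The core recipe (decode unlabeled audio with a seed model, keep only trustworthy hypotheses, retrain on them) can be sketched as a confidence filter. The decoder stub, confidence definition, and threshold below are illustrative assumptions, not the paper's published selection criteria.

```python
def decode(utterance):
    # Stand-in for the seed recognizer: returns (hypothesis, word confidences).
    return "hello world", [0.92, 0.88]

def select_training_data(utterances, min_avg_conf=0.85):
    selected = []
    for utt in utterances:
        hyp, confs = decode(utt)
        if confs and sum(confs) / len(confs) >= min_avg_conf:
            selected.append((utt, hyp))  # feeds acoustic and language model training
    return selected

print(len(select_training_data(["call_0001.wav", "call_0002.wav"])))
```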
A New Multifocus Image Fusion Method Using Contourlet Transform
Title | A New Multifocus Image Fusion Method Using Contourlet Transform |
Authors | Fatemeh Vakili Moghadam, Hamid Reza Shahdoosti |
Abstract | A new multifocus image fusion approach is presented in this paper. First, the contourlet transform is used to decompose the source images into different components. Then, salient features are extracted from the components using spatial frequency. Subsequently, the best coefficients are selected from the components by the maximum selection rule. Finally, the inverse contourlet transform is applied to the selected coefficients. Experiments show the superiority of the proposed method. |
Tasks | |
Published | 2017-09-13 |
URL | http://arxiv.org/abs/1709.09528v1 |
http://arxiv.org/pdf/1709.09528v1.pdf | |
PWC | https://paperswithcode.com/paper/a-new-multifocus-image-fusion-method-using |
Repo | |
Framework | |
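The decompose-select-reconstruct pipeline above fits in a short sketch. A 2-D wavelet stands in for the contourlet transform here (contourlets are not in common Python libraries), and the absolute-maximum rule stands in for the spatial-frequency salience measure, so this is an assumption-laden approximation of the method, not a reimplementation.

```python
import numpy as np
import pywt  # wavelet used in place of the contourlet transform

def fuse(img_a, img_b, wavelet="db2", level=2):
    ca = pywt.wavedec2(img_a, wavelet, level=level)
    cb = pywt.wavedec2(img_b, wavelet, level=level)
    # Approximation band: keep the coefficient with the larger magnitude.
    fused = [np.where(np.abs(ca[0]) >= np.abs(cb[0]), ca[0], cb[0])]
    # Detail bands (horizontal, vertical, diagonal) at each level: same rule.
    for bands_a, bands_b in zip(ca[1:], cb[1:]):
        fused.append(tuple(
            np.where(np.abs(x) >= np.abs(y), x, y)
            for x, y in zip(bands_a, bands_b)
        ))
    return pywt.waverec2(fused, wavelet)
```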
Computing the quality of the Laplace approximation
Title | Computing the quality of the Laplace approximation |
Authors | Guillaume P. Dehaene |
Abstract | Bayesian inference requires approximation methods to become computable, but for most of them it is impossible to quantify how close the approximation is to the true posterior. In this work, we present a theorem upper-bounding the KL divergence between a log-concave target density $f\left(\boldsymbol{\theta}\right)$ and its Laplace approximation $g\left(\boldsymbol{\theta}\right)$. The bound we present is computable: on the classical logistic regression model, we find our bound to be almost exact as long as the dimensionality of the parameter space is high. The approach we followed in this work can be extended to other Gaussian approximations, as we will do in an extended version of this work, to be submitted to the Annals of Statistics. It will then become a critical tool for characterizing whether, for a given problem, a given Gaussian approximation is suitable, or whether a more precise alternative method should be used instead. |
Tasks | Bayesian Inference |
Published | 2017-11-24 |
URL | http://arxiv.org/abs/1711.08911v1 |
http://arxiv.org/pdf/1711.08911v1.pdf | |
PWC | https://paperswithcode.com/paper/computing-the-quality-of-the-laplace |
Repo | |
Framework | |
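For context, here is what the object being bounded looks like in the paper's running example: the Laplace approximation $g$ to a Bayesian logistic-regression posterior is the Gaussian centered at the MAP estimate with covariance given by the inverse Hessian there. The data, prior, and optimizer below are illustrative; the paper's computable KL bound itself is not reproduced.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = (X @ np.array([1.0, -2.0, 0.5]) > 0).astype(float)

def neg_log_posterior(theta):
    logits = X @ theta
    log_lik = y @ logits - np.logaddexp(0.0, logits).sum()  # Bernoulli likelihood
    return -(log_lik - 0.5 * theta @ theta)                 # standard-normal prior

theta_map = minimize(neg_log_posterior, np.zeros(3)).x      # mode of f
p = 1.0 / (1.0 + np.exp(-(X @ theta_map)))
hess = (X * (p * (1.0 - p))[:, None]).T @ X + np.eye(3)     # curvature at the mode
cov = np.linalg.inv(hess)                                   # g = N(theta_map, cov)
```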
Simulating Action Dynamics with Neural Process Networks
Title | Simulating Action Dynamics with Neural Process Networks |
Authors | Antoine Bosselut, Omer Levy, Ari Holtzman, Corin Ennis, Dieter Fox, Yejin Choi |
Abstract | Understanding procedural language requires anticipating the causal effects of actions, even when they are not explicitly stated. In this work, we introduce Neural Process Networks to understand procedural text through (neural) simulation of action dynamics. Our model complements existing memory architectures with dynamic entity tracking by explicitly modeling actions as state transformers. The model updates the states of the entities by executing learned action operators. Empirical results demonstrate that our proposed model can reason about the unstated causal effects of actions, allowing it to provide more accurate contextual information for understanding and generating procedural text, all while offering more interpretable internal representations than existing alternatives. |
Tasks | |
Published | 2017-11-14 |
URL | http://arxiv.org/abs/1711.05313v2 |
http://arxiv.org/pdf/1711.05313v2.pdf | |
PWC | https://paperswithcode.com/paper/simulating-action-dynamics-with-neural |
Repo | |
Framework | |
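A deliberately non-neural toy that mirrors the bookkeeping described above: actions act as state transformers over tracked entities. The paper learns these operators end-to-end from procedural text; here they are hard-coded only to make the mechanism concrete.

```python
# Entity states are sets of attributes; actions transform them.
state = {"tomato": {"raw", "whole"}, "onion": {"raw", "whole"}}
actions = {
    "chop": lambda s: (s - {"whole"}) | {"chopped"},
    "cook": lambda s: (s - {"raw"}) | {"cooked"},
}

def apply(action, entity):
    state[entity] = actions[action](state[entity])

apply("chop", "tomato")   # even effects left unstated in the text persist
apply("cook", "tomato")
print(state["tomato"])    # {'chopped', 'cooked'}
```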
Deep Learning for Time-Series Analysis
Title | Deep Learning for Time-Series Analysis |
Authors | John Cristian Borges Gamboa |
Abstract | In many real-world applications, e.g., speech recognition or sleep stage classification, data are captured over the course of time, constituting a time series. Time series often contain temporal dependencies that cause two otherwise identical points in time to belong to different classes or to predict different behavior. This characteristic generally increases the difficulty of analysing them. Existing techniques often depend on hand-crafted features that are expensive to create and require expert knowledge of the field. With the advent of Deep Learning, new models for unsupervised feature learning in time-series analysis and forecasting have been developed. Such new developments are the topic of this paper: a review of the main Deep Learning techniques is presented, and some applications to time-series analysis are summarized. The results make it clear that Deep Learning has a lot to contribute to the field. |
Tasks | Speech Recognition, Time Series, Time Series Analysis |
Published | 2017-01-07 |
URL | http://arxiv.org/abs/1701.01887v1 |
http://arxiv.org/pdf/1701.01887v1.pdf | |
PWC | https://paperswithcode.com/paper/deep-learning-for-time-series-analysis |
Repo | |
Framework | |
Providing theoretical learning guarantees to Deep Learning Networks
Title | Providing theoretical learning guarantees to Deep Learning Networks |
Authors | Rodrigo Fernandes de Mello, Martha Dais Ferreira, Moacir Antonelli Ponti |
Abstract | Deep Learning (DL) is one of the most common subjects when Machine Learning and Data Science approaches are considered. There are clearly two movements related to DL: the first aggregates researchers in a quest to outperform other algorithms from the literature, trying to win contests by considering often small decreases in the empirical risk; the second investigates evidence of overfitting, questioning the learning capabilities of DL classifiers. Motivated by such opposed points of view, this paper employs Statistical Learning Theory (SLT) to study the convergence of Deep Neural Networks, with particular interest in Convolutional Neural Networks. In order to draw theoretical conclusions, we propose an approach to estimate the shattering coefficient of those classification algorithms, providing a lower bound for the complexity of their space of admissible functions, a.k.a. the algorithm bias. Based on this estimator, we generalize the complexity of network biases and then study the AlexNet and VGG16 architectures from the point of view of their shattering coefficients and of the number of training examples required to provide theoretical learning guarantees. From our theoretical formulation, we show the conditions under which Deep Neural Networks learn, and point out another issue: DL benchmarks may be strictly driven by empirical risks, disregarding the complexity of algorithm biases. |
Tasks | |
Published | 2017-11-28 |
URL | http://arxiv.org/abs/1711.10292v1 |
http://arxiv.org/pdf/1711.10292v1.pdf | |
PWC | https://paperswithcode.com/paper/providing-theoretical-learning-guarantees-to |
Repo | |
Framework | |
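For context on how the shattering coefficient yields learning guarantees, one textbook form of the uniform-convergence bound is shown below; the constants vary across statements of the theorem, and this is background rather than the paper's own estimator.

```latex
P\!\left( \sup_{f \in \mathcal{F}} \bigl| R(f) - R_{\mathrm{emp}}(f) \bigr| > \epsilon \right)
\;\le\; 2\,\mathcal{S}(\mathcal{F}, 2n)\, e^{-n\epsilon^{2}/4}
```

Here $\mathcal{S}(\mathcal{F}, 2n)$ is the shattering coefficient of the function space $\mathcal{F}$ on $2n$ points, so estimating it, as the paper does for convolutional architectures, directly controls how many training examples are needed before the empirical risk is trustworthy.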
Why Pay More When You Can Pay Less: A Joint Learning Framework for Active Feature Acquisition and Classification
Title | Why Pay More When You Can Pay Less: A Joint Learning Framework for Active Feature Acquisition and Classification |
Authors | Hajin Shim, Sung Ju Hwang, Eunho Yang |
Abstract | We consider the problem of active feature acquisition, where we sequentially select a subset of features in order to achieve the maximum prediction performance in the most cost-effective way. In this work, we formulate active feature acquisition as a reinforcement learning problem and provide a novel framework for jointly learning both the RL agent and the classifier (environment). We also introduce a more systematic way of encoding subsets of features that properly handles the innate challenge of missing entries in active feature acquisition problems: an orderless LSTM-based set-encoding mechanism that readily fits into the joint learning framework. We evaluate our model on a carefully designed synthetic dataset for active feature acquisition as well as several real datasets, such as electronic health record (EHR) datasets, on which it outperforms all baselines in terms of prediction performance as well as feature acquisition cost. |
Tasks | |
Published | 2017-09-18 |
URL | http://arxiv.org/abs/1709.05964v1 |
http://arxiv.org/pdf/1709.05964v1.pdf | |
PWC | https://paperswithcode.com/paper/why-pay-more-when-you-can-pay-less-a-joint |
Repo | |
Framework | |
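A toy episode of sequential, cost-aware acquisition, to make the setup concrete; the hard-coded cheapest-first policy and the NaN encoding below are placeholders for the RL agent and the orderless LSTM set encoder that the paper learns jointly.

```python
import numpy as np

feature_costs = np.array([1.0, 0.2, 0.5])
x_full = np.array([0.3, -1.2, 0.8])   # complete feature vector, hidden from the agent

acquired = np.zeros(3, dtype=bool)
spent = 0.0
for _ in range(2):                     # acquire two features, then classify
    unseen = np.where(acquired, np.inf, feature_costs)
    a = int(np.argmin(unseen))         # placeholder policy: cheapest unseen feature
    acquired[a] = True
    spent += feature_costs[a]

observation = np.where(acquired, x_full, np.nan)  # missing entries stay unobserved
print(observation, spent)              # a classifier over partial features acts here
```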
Tramp Ship Scheduling Problem with Berth Allocation Considerations and Time-dependent Constraints
Title | Tramp Ship Scheduling Problem with Berth Allocation Considerations and Time-dependent Constraints |
Authors | Francisco López-Ramos, Armando Guarnaschelli, José-Fernando Camacho-Vallejo, Laura Hervert-Escobar, Rosa G. González-Ramírez |
Abstract | This work presents a model for the Tramp Ship Scheduling problem including berth allocation considerations, motivated by a real case of a shipping company. The aim is to determine the travel schedule for each vessel considering multiple docking and multiple time windows at the berths. This work is innovative due to the consideration of both spatial and temporal attributes during the scheduling process. The resulting model is formulated as a mixed-integer linear programming problem, and a heuristic method to deal with multiple vessel schedules is also presented. Numerical experimentation is performed to highlight the benefits of the proposed approach and the applicability of the heuristic. Conclusions and recommendations for further research are provided. |
Tasks | |
Published | 2017-05-04 |
URL | http://arxiv.org/abs/1705.01681v1 |
http://arxiv.org/pdf/1705.01681v1.pdf | |
PWC | https://paperswithcode.com/paper/tramp-ship-scheduling-problem-with-berth |
Repo | |
Framework | |
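To give a feel for the kind of formulation involved, here is a deliberately tiny assignment MILP in PuLP: vessels pick berth time windows to maximize profit. This is an illustrative toy, orders of magnitude simpler than the paper's model with multiple dockings, multiple time windows, and time-dependent constraints.

```python
import pulp

vessels, windows = ["v1", "v2"], ["w1", "w2", "w3"]
profit = {("v1", "w1"): 5, ("v1", "w2"): 3, ("v1", "w3"): 1,
          ("v2", "w1"): 2, ("v2", "w2"): 6, ("v2", "w3"): 4}

prob = pulp.LpProblem("berth_assignment", pulp.LpMaximize)
x = pulp.LpVariable.dicts("x", profit, cat="Binary")
prob += pulp.lpSum(profit[k] * x[k] for k in profit)
for v in vessels:   # each vessel sails in exactly one window
    prob += pulp.lpSum(x[(v, w)] for w in windows) == 1
for w in windows:   # each window hosts at most one vessel
    prob += pulp.lpSum(x[(v, w)] for v in vessels) <= 1
prob.solve()
print({k: x[k].value() for k in profit if x[k].value() == 1})
```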
On the Linearity of Semantic Change: Investigating Meaning Variation via Dynamic Graph Models
Title | On the Linearity of Semantic Change: Investigating Meaning Variation via Dynamic Graph Models |
Authors | Steffen Eger, Alexander Mehler |
Abstract | We consider two graph models of semantic change. The first is a time-series model that relates embedding vectors from one time period to embedding vectors of previous time periods. In the second, we construct one graph for each word: nodes in this graph correspond to time points and edge weights to the similarity of the word’s meaning across two time points. We apply our two models to corpora across three different languages. We find that semantic change is linear in two senses. Firstly, today’s embedding vectors (= meaning) of words can be derived as linear combinations of embedding vectors of their neighbors in previous time periods. Secondly, self-similarity of words decays linearly in time. We consider both findings as new laws/hypotheses of semantic change. |
Tasks | Time Series |
Published | 2017-04-08 |
URL | http://arxiv.org/abs/1704.02497v1 |
http://arxiv.org/pdf/1704.02497v1.pdf | |
PWC | https://paperswithcode.com/paper/on-the-linearity-of-semantic-change |
Repo | |
Framework | |
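The second finding (linear decay of self-similarity) invites a one-liner check: regress a word's self-similarity across time periods on elapsed time and inspect the slope. The similarity values below are fabricated for illustration only.

```python
import numpy as np

years = np.array([0, 1, 2, 3, 4, 5])                      # elapsed time periods
selfsim = np.array([1.00, 0.97, 0.93, 0.90, 0.86, 0.83])  # illustrative values
slope, intercept = np.polyfit(years, selfsim, deg=1)
print(f"decay per period: {slope:.3f}")  # near-constant negative slope = linear decay
```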