Paper Group ANR 1172
Papers in this group:

- Improving Neural Protein-Protein Interaction Extraction with Knowledge Selection
- Self-supervised learning for autonomous vehicles perception: A conciliation between analytical and learning methods
- Measuring the intelligence of an idealized mechanical knowing agent
- Incorporating Symbolic Sequential Modeling for Speech Enhancement
- Identify treatment effect patterns for personalised decisions
- Cascaded Projection: End-to-End Network Compression and Acceleration
- Emergence of Writing Systems Through Multi-Agent Cooperation
- Towards End-to-End Learning for Efficient Dialogue Agent by Modeling Looking-ahead Ability
- Quantifying Model Complexity via Functional Decomposition for Better Post-Hoc Interpretability
- eCNN: A Block-Based and Highly-Parallel CNN Accelerator for Edge Inference
- Cross-Domain Image Classification through Neural-Style Transfer Data Augmentation
- Variational Inference for Sparse Gaussian Process Modulated Hawkes Process
- Risk Bounds for Low Cost Bipartite Ranking
- P3SGD: Patient Privacy Preserving SGD for Regularizing Deep CNNs in Pathological Image Classification
- Near-Optimal Methods for Minimizing Star-Convex Functions and Beyond
Improving Neural Protein-Protein Interaction Extraction with Knowledge Selection
Title | Improving Neural Protein-Protein Interaction Extraction with Knowledge Selection |
Authors | Huiwei Zhou, Xuefei Li, Weihong Yao, Zhuang Liu, Shixian Ning, Chengkun Lang, Lei Du |
Abstract | Protein-protein interaction (PPI) extraction from published scientific literature provides additional support for precision medicine efforts. Meanwhile, knowledge bases (KBs) contain huge amounts of structured information about protein entities and their relations, which can be encoded in entity and relation embeddings to help PPI extraction. However, the prior knowledge of protein-protein pairs must be selectively used so that it is suitable for different contexts. This paper proposes a Knowledge Selection Model (KSM) to fuse the selected prior knowledge and context information for PPI extraction. Firstly, two Transformers encode the context sequence of a protein pair, each conditioned on one of the two protein embeddings. Then, the two outputs are fed to a mutual attention module to capture the context features most relevant to the protein pair. Next, the context features are used to distill the relation embedding by a knowledge selector. Finally, the selected relation embedding and the context features are concatenated for PPI extraction. Experiments on the BioCreative VI PPI dataset show that KSM achieves a new state-of-the-art performance (38.08% F1-score) by adding knowledge selection. |
Tasks | |
Published | 2019-12-11 |
URL | https://arxiv.org/abs/1912.05147v1 |
PDF | https://arxiv.org/pdf/1912.05147v1.pdf |
PWC | https://paperswithcode.com/paper/improving-neural-protein-protein-interaction |
Repo | |
Framework | |
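Below is a minimal PyTorch sketch of the knowledge-selection step described in the abstract above: context features gate a KB relation embedding before both are concatenated for classification. The dimensions, the sigmoid-gate formulation, and all layer names are illustrative assumptions, not the authors' exact architecture.

```python
# Minimal sketch of the knowledge-selection idea: context features gate a
# KB relation embedding before classification. Dimensions and layer choices
# are illustrative assumptions, not the paper's exact architecture.
import torch
import torch.nn as nn

class KnowledgeSelector(nn.Module):
    def __init__(self, ctx_dim=256, rel_dim=100, n_classes=2):
        super().__init__()
        self.gate = nn.Linear(ctx_dim + rel_dim, rel_dim)   # knowledge selector
        self.clf = nn.Linear(ctx_dim + rel_dim, n_classes)  # final classifier

    def forward(self, ctx, rel):
        # ctx: (batch, ctx_dim) context features from the mutual attention
        # rel: (batch, rel_dim) KB relation embedding for the protein pair
        g = torch.sigmoid(self.gate(torch.cat([ctx, rel], dim=-1)))
        selected = g * rel                       # distill the relation embedding
        return self.clf(torch.cat([ctx, selected], dim=-1))

model = KnowledgeSelector()
logits = model(torch.randn(4, 256), torch.randn(4, 100))
```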
Self-supervised learning for autonomous vehicles perception: A conciliation between analytical and learning methods
Title | Self-supervised learning for autonomous vehicles perception: A conciliation between analytical and learning methods |
Authors | Florent Chiaroni, Mohamed-Cherif Rahal, Nicolas Hueber, Frederic Dufaux |
Abstract | This article aims to motivate further investigation of self-supervised learning (SSL) perception techniques and their applications in autonomous driving. Such approaches are of broad interest as they can improve the performance of analytical methods, for example to perceive farther and more accurately in space or time. At the same time, they can reduce the need for hand-labeled training data in learning methods, while offering the possibility of updating the learned models in an online process. This can help an autonomous system deal with unexpected changing conditions in the ego-vehicle environment. The article first highlights the analytical and learning tools that may be of interest for improving or developing SSL techniques. It then surveys existing SSL perception techniques for autonomous driving, the connections between them, and some of their remaining limitations, which open up future research perspectives. |
Tasks | Autonomous Driving, Autonomous Vehicles |
Published | 2019-10-03 |
URL | https://arxiv.org/abs/1910.01636v1 |
PDF | https://arxiv.org/pdf/1910.01636v1.pdf |
PWC | https://paperswithcode.com/paper/self-supervised-learning-for-autonomous |
Repo | |
Framework | |
Measuring the intelligence of an idealized mechanical knowing agent
Title | Measuring the intelligence of an idealized mechanical knowing agent |
Authors | Samuel Allen Alexander |
Abstract | We define a notion of the intelligence level of an idealized mechanical knowing agent. This is motivated by efforts within artificial intelligence research to define real-number intelligence levels of complicated intelligent systems. Our agents are more idealized, which allows us to define a much simpler measure of intelligence level for them. In short, we define the intelligence level of a mechanical knowing agent to be the supremum of the computable ordinals that have codes the agent knows to be codes of computable ordinals. We prove that if one agent knows certain things about another agent, then the former necessarily has a higher intelligence level than the latter. This allows our intelligence notion to serve as a stepping stone to obtain results which, by themselves, are not stated in terms of our intelligence notion (results of potential interest even to readers totally skeptical that our notion correctly captures intelligence). As an application, we argue that these results comprise evidence against the possibility of intelligence explosion (that is, the notion that sufficiently intelligent machines will eventually be capable of designing even more intelligent machines, which can then design even more intelligent machines, and so on). |
Tasks | |
Published | 2019-12-03 |
URL | https://arxiv.org/abs/1912.09571v1 |
PDF | https://arxiv.org/pdf/1912.09571v1.pdf |
PWC | https://paperswithcode.com/paper/measuring-the-intelligence-of-an-idealized |
Repo | |
Framework | |
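As a reading aid, here is one way to write the measure from the abstract in symbols; the notation ‖A‖ is ours, not the paper's.

```latex
% The intelligence level of a knowing agent A, per the abstract: the supremum
% of the computable ordinals whose codes A knows to be codes of computable
% ordinals. Notation is illustrative.
\[
  \|A\| \;=\; \sup \{\, \alpha \;:\; \alpha \text{ is a computable ordinal and }
  A \text{ knows, of some code } n \text{ for } \alpha,
  \text{ that } n \text{ codes a computable ordinal} \,\}
\]
```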
Incorporating Symbolic Sequential Modeling for Speech Enhancement
Title | Incorporating Symbolic Sequential Modeling for Speech Enhancement |
Authors | Chien-Feng Liao, Yu Tsao, Xugang Lu, Hisashi Kawai |
Abstract | In a noisy environment, a degraded speech signal can be automatically restored by a listener who knows the language well. That is, with the built-in knowledge of a “language model”, a listener may effectively suppress noise interference and retrieve the target speech signal. Accordingly, we argue that familiarity with the underlying linguistic content of spoken utterances benefits speech enhancement (SE) in noisy environments. In this study, in addition to the conventional modeling for learning the acoustic noisy-to-clean speech mapping, an abstract symbolic sequential modeling is incorporated into the SE framework. This symbolic sequential modeling can be regarded as a “linguistic constraint” on learning the acoustic noisy-to-clean mapping function. The symbolic sequences for acoustic signals are obtained as discrete representations with a Vector Quantized Variational Autoencoder algorithm, and the obtained symbols capture high-level phoneme-like content from speech signals. The experimental results demonstrate that the proposed framework obtains notable performance improvements in terms of perceptual evaluation of speech quality (PESQ) and short-time objective intelligibility (STOI) on the TIMIT dataset. |
Tasks | Language Modelling, Speech Enhancement |
Published | 2019-04-30 |
URL | https://arxiv.org/abs/1904.13142v3 |
PDF | https://arxiv.org/pdf/1904.13142v3.pdf |
PWC | https://paperswithcode.com/paper/incorporating-symbolic-sequential-modeling |
Repo | |
Framework | |
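A minimal sketch of the vector-quantization step that produces the symbolic sequences (VQ-VAE style): each continuous encoder frame is replaced by its nearest codebook entry, yielding a discrete, phoneme-like symbol sequence. Shapes and the codebook size are assumptions for illustration; training details (codebook updates, straight-through gradients) are omitted.

```python
# Nearest-codebook quantization of encoder frames, VQ-VAE style.
# Shapes and codebook size are illustrative assumptions.
import torch

def quantize(z, codebook):
    # z: (frames, dim) continuous encoder outputs for one utterance
    # codebook: (K, dim) learned embedding vectors
    dists = torch.cdist(z, codebook)        # (frames, K) pairwise distances
    symbols = dists.argmin(dim=1)           # discrete "phoneme-like" symbols
    return codebook[symbols], symbols       # quantized frames + symbol sequence

z = torch.randn(120, 64)                    # e.g. 120 frames, 64-dim features
codebook = torch.randn(512, 64)             # e.g. 512 codebook entries
zq, syms = quantize(z, codebook)
```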
Identify treatment effect patterns for personalised decisions
Title | Identify treatment effect patterns for personalised decisions |
Authors | Jiuyong Li, Saisai Ma, Lin Liu, Thuc Duy Le, Jixue Liu, Yizhao Han |
Abstract | In personalised decision making, evidence is required to determine suitable actions for individuals. Such evidence can be obtained by identifying treatment effect heterogeneity across different subgroups of the population. In this paper, we design a new type of pattern, the treatment effect pattern, to represent and discover treatment effect heterogeneity from data, for determining whether a treatment will work for an individual or not. Our purpose is to use computational power to find the most specific and relevant conditions for individuals with respect to a treatment or an action, to assist with personalised decision making. Most existing work on identifying treatment effect heterogeneity takes a top-down or partitioning-based approach to search for subgroups with heterogeneous treatment effects. We propose a bottom-up generalisation algorithm to obtain the most specific patterns that best fit individual circumstances for personalised decision making. For the generalisation, we follow a consistency-driven strategy to maintain inner-group homogeneity and inter-group heterogeneity of treatment effects. We also employ graphical causal modelling techniques to identify adjustment variables for reliable treatment effect pattern discovery. Our method finds treatment effect patterns reliably, as validated by the experiments. It is faster than two existing machine learning methods for heterogeneous treatment effect identification and produces subgroups with higher inner-group treatment effect homogeneity. |
Tasks | Decision Making |
Published | 2019-06-14 |
URL | https://arxiv.org/abs/1906.06080v1 |
PDF | https://arxiv.org/pdf/1906.06080v1.pdf |
PWC | https://paperswithcode.com/paper/identify-treatment-effect-patterns-for |
Repo | |
Framework | |
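For intuition, a minimal sketch of the quantity a treatment effect pattern is built around: the difference in mean outcomes between treated and control units within the subgroup the pattern defines. Column names are hypothetical, and the paper's actual bottom-up generalisation and adjustment-variable machinery are not reproduced here.

```python
# Estimate the treatment effect within one subgroup ("pattern"): the mean
# outcome difference between treated and control units that match the pattern.
# "treatment" and "outcome" are hypothetical column names.
import pandas as pd

def pattern_effect(df, pattern):
    # pattern: dict of {column: value} conditions defining the subgroup
    mask = pd.Series(True, index=df.index)
    for col, val in pattern.items():
        mask &= df[col] == val
    sub = df[mask]
    treated = sub.loc[sub["treatment"] == 1, "outcome"].mean()
    control = sub.loc[sub["treatment"] == 0, "outcome"].mean()
    return treated - control  # estimated effect for individuals fitting the pattern
```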
Cascaded Projection: End-to-End Network Compression and Acceleration
Title | Cascaded Projection: End-to-End Network Compression and Acceleration |
Authors | Breton Minnehan, Andreas Savakis |
Abstract | We propose a data-driven approach for deep convolutional neural network compression that achieves high accuracy with high throughput and low memory requirements. Current network compression methods either find a low-rank factorization of the features that requires more memory, or select only a subset of features by pruning entire filter channels. We propose the Cascaded Projection (CaP) compression method that projects the output and input filter channels of successive layers to a unified low dimensional space based on a low-rank projection. We optimize the projection to minimize classification loss and the difference between the next layer’s features in the compressed and uncompressed networks. To solve this non-convex optimization problem we propose a new optimization method of a proxy matrix using backpropagation and Stochastic Gradient Descent (SGD) with geometric constraints. Our cascaded projection approach leads to improvements in all critical areas of network compression: high accuracy, low memory consumption, low parameter count and high processing speed. The proposed CaP method demonstrates state-of-the-art results compressing VGG16 and ResNet networks with over 4x reduction in the number of computations and excellent performance in top-5 accuracy on the ImageNet dataset before and after fine-tuning. |
Tasks | Neural Network Compression |
Published | 2019-03-12 |
URL | http://arxiv.org/abs/1903.04988v1 |
PDF | http://arxiv.org/pdf/1903.04988v1.pdf |
PWC | https://paperswithcode.com/paper/cascaded-projection-end-to-end-network |
Repo | |
Framework | |
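A toy linear-algebra sketch of the core projection idea: compress the output channels of one layer and the input channels of the next with a shared low-rank projection so that their composition is approximately preserved. Plain matrices and an SVD-based projection stand in for illustration; the paper instead optimizes the projection end-to-end with a proxy matrix and SGD under geometric constraints.

```python
# Shared low-rank projection between two successive (linearized) layers.
# SVD stands in for the paper's learned projection.
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.standard_normal((64, 128))   # layer l:   128 -> 64 channels
W2 = rng.standard_normal((32, 64))    # layer l+1:  64 -> 32 channels
r = 16                                # compressed channel dimension

U, _, _ = np.linalg.svd(W1, full_matrices=False)
P = U[:, :r]                          # (64, r) projection onto top subspace

W1_c = P.T @ W1                       # compressed layer l:   128 -> r
W2_c = W2 @ P                         # compressed layer l+1: r  -> 32
# W2_c @ W1_c approximates W2 @ W1 with fewer channels in between:
err = np.linalg.norm(W2 @ W1 - W2_c @ W1_c) / np.linalg.norm(W2 @ W1)
print(f"relative reconstruction error: {err:.3f}")
```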
Emergence of Writing Systems Through Multi-Agent Cooperation
Title | Emergence of Writing Systems Through Multi-Agent Cooperation |
Authors | Shresth Verma, Joydip Dhar |
Abstract | Learning to communicate is considered an essential task to develop a general AI. While recent literature in language evolution has studied emergent language through discrete or continuous message symbols, there has been little work in the emergence of writing systems in artificial agents. In this paper, we present a referential game setup with two agents, where the mode of communication is a written language system that emerges during the play. We show that the agents can learn to coordinate successfully using this mode of communication. Further, we study how the game rules affect the writing system taxonomy by proposing a consistency metric. |
Tasks | |
Published | 2019-10-02 |
URL | https://arxiv.org/abs/1910.00741v1 |
PDF | https://arxiv.org/pdf/1910.00741v1.pdf |
PWC | https://paperswithcode.com/paper/emergence-of-writing-systems-through-multi |
Repo | |
Framework | |
Towards End-to-End Learning for Efficient Dialogue Agent by Modeling Looking-ahead Ability
Title | Towards End-to-End Learning for Efficient Dialogue Agent by Modeling Looking-ahead Ability |
Authors | Zhuoxuan Jiang, Xian-Ling Mao, Ziming Huang, Jie Ma, Shaochun Li |
Abstract | Learning an efficient dialogue manager from data with little manual intervention is important, especially for goal-oriented dialogues. However, existing methods either require too much manual effort (e.g. reinforcement learning methods) or cannot guarantee dialogue efficiency (e.g. sequence-to-sequence methods). In this paper, we address this problem by proposing a novel end-to-end learning model to train a dialogue agent that can look ahead for several future turns and generate an optimal response to make the dialogue efficient. Our method is data-driven and requires little manual intervention during system design. We evaluate our method on two datasets of different scenarios, and the experimental results demonstrate the efficiency of our model. |
Tasks | |
Published | 2019-08-15 |
URL | https://arxiv.org/abs/1908.05408v1 |
PDF | https://arxiv.org/pdf/1908.05408v1.pdf |
PWC | https://paperswithcode.com/paper/towards-end-to-end-learning-for-efficient |
Repo | |
Framework | |
Quantifying Model Complexity via Functional Decomposition for Better Post-Hoc Interpretability
Title | Quantifying Model Complexity via Functional Decomposition for Better Post-Hoc Interpretability |
Authors | Christoph Molnar, Giuseppe Casalicchio, Bernd Bischl |
Abstract | Post-hoc model-agnostic interpretation methods such as partial dependence plots can be employed to interpret complex machine learning models. While these interpretation methods can be applied regardless of model complexity, they can produce misleading and verbose results if the model is too complex, especially w.r.t. feature interactions. To quantify the complexity of arbitrary machine learning models, we propose model-agnostic complexity measures based on functional decomposition: number of features used, interaction strength and main effect complexity. We show that post-hoc interpretation of models that minimize the three measures is more reliable and compact. Furthermore, we demonstrate the application of these measures in a multi-objective optimization approach which simultaneously minimizes loss and complexity. |
Tasks | Interpretable Machine Learning |
Published | 2019-04-08 |
URL | https://arxiv.org/abs/1904.03867v2 |
PDF | https://arxiv.org/pdf/1904.03867v2.pdf |
PWC | https://paperswithcode.com/paper/quantifying-interpretability-of-arbitrary |
Repo | |
Framework | |
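A minimal sketch of the first of the three measures, the number of features used: a feature counts as used if perturbing it ever changes the model's predictions. The permutation-based check below is an illustrative approximation, not the paper's exact estimator; the interaction-strength and main-effect-complexity measures are not shown.

```python
# Count features a black-box model actually uses: permute one column at a
# time and check whether predictions change. Illustrative approximation only.
import numpy as np

def n_features_used(predict, X, n_perm=5, seed=0):
    rng = np.random.default_rng(seed)
    base = predict(X)
    used = 0
    for j in range(X.shape[1]):
        changed = False
        for _ in range(n_perm):
            Xp = X.copy()
            Xp[:, j] = rng.permutation(Xp[:, j])  # break feature j's signal
            if not np.allclose(predict(Xp), base):
                changed = True
                break
        used += changed
    return used

X = np.random.default_rng(1).standard_normal((200, 4))
print(n_features_used(lambda A: A[:, 0] + 2 * A[:, 2], X))  # -> 2
```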
eCNN: A Block-Based and Highly-Parallel CNN Accelerator for Edge Inference
Title | eCNN: A Block-Based and Highly-Parallel CNN Accelerator for Edge Inference |
Authors | Chao-Tsung Huang, Yu-Chun Ding, Huan-Ching Wang, Chi-Wen Weng, Kai-Ping Lin, Li-Wei Wang, Li-De Chen |
Abstract | Convolutional neural networks (CNNs) have recently demonstrated superior quality for computational imaging applications. Therefore, they have great potential to revolutionize the image pipelines on cameras and displays. However, it is difficult for conventional CNN accelerators to support ultra-high-resolution videos at the edge due to their considerable DRAM bandwidth and power consumption. Therefore, finding a further memory- and computation-efficient microarchitecture is crucial to speed up this coming revolution. In this paper, we approach this goal by considering the inference flow, network model, instruction set, and processor design jointly to optimize hardware performance and image quality. We apply a block-based inference flow which can eliminate all the DRAM bandwidth for feature maps, and accordingly propose a hardware-oriented network model, ERNet, to optimize image quality under hardware constraints. Then we devise a coarse-grained instruction set architecture, FBISA, to support power-hungry convolution with massive parallelism. Finally, we implement an embedded processor, eCNN, which accommodates ERNet and FBISA with a flexible processing architecture. Layout results show that it can support high-quality ERNets for super-resolution and denoising at up to 4K Ultra-HD 30 fps while using only DDR-400 and consuming 6.94 W on average. By comparison, the state-of-the-art Diffy uses dual-channel DDR3-2133 and consumes 54.3 W to support the lower-quality VDSR at Full HD 30 fps. Lastly, we also present application examples of high-performance style transfer and object recognition to demonstrate the flexibility of eCNN. |
Tasks | Denoising, Object Recognition, Style Transfer, Super-Resolution |
Published | 2019-10-13 |
URL | https://arxiv.org/abs/1910.05680v1 |
PDF | https://arxiv.org/pdf/1910.05680v1.pdf |
PWC | https://paperswithcode.com/paper/ecnn-a-block-based-and-highly-parallel-cnn |
Repo | |
Framework | |
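A toy sketch of the block-based inference flow in ordinary Python: the frame is split into tiles, each processed with a halo wide enough to cover the network's receptive field, so full-resolution feature maps never need to round-trip through DRAM. Tile and halo sizes are made-up values; nothing of the FBISA instruction set or the eCNN layout is modeled.

```python
# Tile a frame and process each tile independently (with a halo), so
# intermediate feature maps stay block-sized. f must map a block to an
# equally sized block (e.g. a denoising CNN).
import numpy as np

def blockwise_apply(image, f, tile=128, halo=8):
    h, w = image.shape[:2]
    out = np.empty_like(image)
    for y in range(0, h, tile):
        for x in range(0, w, tile):
            y0, x0 = max(y - halo, 0), max(x - halo, 0)
            y1, x1 = min(y + tile + halo, h), min(x + tile + halo, w)
            block = f(image[y0:y1, x0:x1])          # run the CNN on one block
            out[y:y + tile, x:x + tile] = block[y - y0:y - y0 + tile,
                                                x - x0:x - x0 + tile]
    return out

frame = np.zeros((1080, 1920), dtype=np.float32)
restored = blockwise_apply(frame, lambda b: b)      # identity stands in for the CNN
```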
Cross-Domain Image Classification through Neural-Style Transfer Data Augmentation
Title | Cross-Domain Image Classification through Neural-Style Transfer Data Augmentation |
Authors | Yijie Xu, Arushi Goel |
Abstract | The lack of sufficient amounts of domain-specific data can reduce the accuracy of a classifier. In this paper, we explore the effects of style transfer-based data transformation on the accuracy of convolutional neural network classifiers in the context of automobile detection under adverse winter weather conditions. The detection of automobiles under highly adverse weather conditions is a difficult task, as such conditions introduce large amounts of noise in each image. The InceptionV2 architecture is trained on a composite dataset consisting of either a normal car image dataset, a mixture of normal and style-transferred car images, or a mixture of normal car images and images taken in blizzard conditions, at a ratio of 80:20. All three classifiers are then tested on a dataset of car images taken in blizzard conditions and on vehicle-free snow landscape images. We evaluate and contrast the effectiveness of each classifier on each dataset, and discuss the strengths and weaknesses of style transfer-based approaches to data augmentation. |
Tasks | Data Augmentation, Image Classification, Style Transfer |
Published | 2019-10-12 |
URL | https://arxiv.org/abs/1910.05611v1 |
PDF | https://arxiv.org/pdf/1910.05611v1.pdf |
PWC | https://paperswithcode.com/paper/cross-domain-image-classification-through |
Repo | |
Framework | |
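A minimal sketch of composing the 80:20 training mixtures described above: 80% ordinary car images plus 20% augmented images (style-transferred or blizzard shots). The function and list names are hypothetical; it only illustrates the ratio logic.

```python
# Build a composite training set at a given normal:augmented ratio.
# Input lists must be at least as large as the requested sample counts.
import random

def compose_dataset(normal_images, augmented_images, ratio=0.8, size=1000, seed=0):
    rng = random.Random(seed)
    n_normal = int(size * ratio)
    dataset = (rng.sample(normal_images, n_normal) +
               rng.sample(augmented_images, size - n_normal))
    rng.shuffle(dataset)
    return dataset
```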
Variational Inference for Sparse Gaussian Process Modulated Hawkes Process
Title | Variational Inference for Sparse Gaussian Process Modulated Hawkes Process |
Authors | Rui Zhang, Christian Walder, Marian-Andrei Rizoiu |
Abstract | The Hawkes process (HP) has been widely applied to modeling self-exciting events, including neuron spikes, earthquakes and tweets. To avoid designing a parametric triggering kernel and to be able to quantify prediction confidence, the non-parametric Bayesian HP has been proposed. However, inference for such models suffers from poor scalability or slow convergence. In this paper, we aim to solve both problems. First, we propose a new non-parametric Bayesian HP in which the triggering kernel is modeled as a squared sparse Gaussian process. Then, we propose a novel variational inference schema for model optimization. We employ the branching structure of the HP so that maximization of the evidence lower bound (ELBO) is tractable via the expectation-maximization algorithm, and we propose a tighter ELBO which improves the fitting performance. Further, we accelerate the novel variational inference schema to linear time complexity by leveraging the stationarity of the triggering kernel; unlike prior acceleration methods, ours enjoys higher efficiency. Finally, we use synthetic data and two large social media datasets to evaluate our method. We show that our approach outperforms state-of-the-art non-parametric frequentist and Bayesian methods, and we validate the efficiency of our accelerated variational inference schema and the practical utility of our tighter ELBO for model selection. |
Tasks | Model Selection |
Published | 2019-05-25 |
URL | https://arxiv.org/abs/1905.10496v2 |
PDF | https://arxiv.org/pdf/1905.10496v2.pdf |
PWC | https://paperswithcode.com/paper/sparse-gaussian-process-modulated-hawkes |
Repo | |
Framework | |
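For orientation, a minimal sketch of the Hawkes conditional intensity: a background rate plus a sum of triggering-kernel evaluations over past events. An exponential kernel stands in here for illustration; in the paper the triggering kernel is instead the square of a sparse Gaussian process and is fitted variationally.

```python
# Hawkes intensity lambda(t) = mu + sum_{t_i < t} phi(t - t_i), with an
# exponential phi as a stand-in for the paper's squared sparse GP kernel.
import numpy as np

def intensity(t, events, mu=0.1, alpha=0.5, beta=1.0):
    past = events[events < t]
    return mu + alpha * np.sum(np.exp(-beta * (t - past)))

events = np.array([0.5, 1.2, 3.0])
print(intensity(4.0, events))
```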
Risk Bounds for Low Cost Bipartite Ranking
Title | Risk Bounds for Low Cost Bipartite Ranking |
Authors | San Gultekin, John Paisley |
Abstract | Bipartite ranking is an important supervised learning problem; however, unlike regression or classification, it has a quadratic dependence on the number of samples. To circumvent this prohibitive sample cost, much recent work focuses on stochastic gradient-based methods. In this paper we consider an alternative approach, which leverages the structure of the widely-adopted pairwise squared loss, to obtain a stochastic and low-cost algorithm that does not require stochastic gradients or learning rates. Using a novel uniform risk bound, based on matrix and vector concentration inequalities, we show that the sample size required for competitive performance against the all-pairs batch algorithm does not have a quadratic dependence. Generalization bounds for both the batch and low-cost stochastic algorithms are presented. Experimental results show significant speed gains over the batch algorithm, as well as competitive performance against state-of-the-art bipartite ranking algorithms on real datasets. |
Tasks | |
Published | 2019-12-02 |
URL | https://arxiv.org/abs/1912.00537v1 |
PDF | https://arxiv.org/pdf/1912.00537v1.pdf |
PWC | https://paperswithcode.com/paper/risk-bounds-for-low-cost-bipartite-ranking |
Repo | |
Framework | |
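A minimal sketch of the pairwise squared loss the method builds on: every (positive, negative) score pair should be separated by a margin of 1. The naive evaluation below costs O(n_pos · n_neg), which is exactly the quadratic sample dependence the paper's low-cost stochastic algorithm avoids.

```python
# All-pairs pairwise squared loss for bipartite ranking: penalize
# (positive, negative) score differences that deviate from a margin of 1.
import numpy as np

def pairwise_squared_loss(scores_pos, scores_neg):
    diffs = scores_pos[:, None] - scores_neg[None, :]   # all pos/neg pairs
    return np.mean((1.0 - diffs) ** 2)

print(pairwise_squared_loss(np.array([2.0, 1.5]), np.array([0.2, -0.3])))
```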
P3SGD: Patient Privacy Preserving SGD for Regularizing Deep CNNs in Pathological Image Classification
Title | P3SGD: Patient Privacy Preserving SGD for Regularizing Deep CNNs in Pathological Image Classification |
Authors | Bingzhe Wu, Shiwan Zhao, Guangyu Sun, Xiaolu Zhang, Zhong Su, Caihong Zeng, Zhihong Liu |
Abstract | Recently, deep convolutional neural networks (CNNs) have achieved great success in pathological image classification. However, due to the limited number of labeled pathological images, two challenges remain to be addressed: (1) overfitting: the performance of a CNN model is undermined by overfitting due to its huge number of parameters and the insufficiency of labeled training data; (2) privacy leakage: a model trained using a conventional method may involuntarily reveal private information about the patients in the training dataset, and the smaller the dataset, the worse the leakage. To tackle these two challenges, we introduce a novel stochastic gradient descent (SGD) scheme, named patient privacy preserving SGD (P3SGD), which performs the model update at the patient level via a large-step update built upon each patient’s data. Specifically, to protect privacy and regularize the CNN model, we propose to inject well-designed noise into the updates. Moreover, we equip P3SGD with an elaborate strategy to adaptively control the scale of the injected noise. To validate the effectiveness of P3SGD, we perform extensive experiments on a real-world clinical dataset and quantitatively demonstrate its superior ability to reduce the risk of overfitting. We also provide a rigorous analysis of the privacy cost under differential privacy. Additionally, we find that models trained with P3SGD are more resistant to model-inversion attacks than those trained using non-private SGD. |
Tasks | Image Classification |
Published | 2019-05-30 |
URL | https://arxiv.org/abs/1905.12883v1 |
PDF | https://arxiv.org/pdf/1905.12883v1.pdf |
PWC | https://paperswithcode.com/paper/p3sgd-patient-privacy-preserving-sgd-for-1 |
Repo | |
Framework | |
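A minimal sketch of the patient-level update idea, under loud assumptions: per-patient updates are averaged and Gaussian noise is injected before the step is applied. The fixed noise scale below is a placeholder; P3SGD adapts it and comes with clipping and a differential-privacy analysis that this sketch does not provide.

```python
# One noisy patient-level step: average per-patient updates, add Gaussian
# noise, apply. noise_std is a fixed placeholder, not the paper's adaptive scale.
import numpy as np

def p3sgd_step(weights, patient_updates, lr=0.1, noise_std=0.01, seed=0):
    # patient_updates: list of update vectors, one per patient's data
    rng = np.random.default_rng(seed)
    mean_update = np.mean(patient_updates, axis=0)
    noisy = mean_update + rng.normal(0.0, noise_std, size=mean_update.shape)
    return weights - lr * noisy

w = np.zeros(3)
updates = [np.array([0.1, -0.2, 0.05]), np.array([0.3, 0.0, -0.1])]
w = p3sgd_step(w, updates)
```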
Near-Optimal Methods for Minimizing Star-Convex Functions and Beyond
Title | Near-Optimal Methods for Minimizing Star-Convex Functions and Beyond |
Authors | Oliver Hinder, Aaron Sidford, Nimit Sharad Sohoni |
Abstract | In this paper, we provide near-optimal accelerated first-order methods for minimizing a broad class of smooth nonconvex functions that are strictly unimodal on all lines through a minimizer. This function class, which we call the class of smooth quasar-convex functions, is parameterized by a constant $\gamma \in (0,1]$, where $\gamma = 1$ encompasses the classes of smooth convex and star-convex functions, and smaller values of $\gamma$ indicate that the function can be “more nonconvex.” We develop a variant of accelerated gradient descent that computes an $\epsilon$-approximate minimizer of a smooth $\gamma$-quasar-convex function with at most $O(\gamma^{-1} \epsilon^{-1/2} \log(\gamma^{-1} \epsilon^{-1}))$ total function and gradient evaluations. We also derive a lower bound of $\Omega(\gamma^{-1} \epsilon^{-1/2})$ on the number of gradient evaluations required by any deterministic first-order method in the worst case, showing that, up to a logarithmic factor, no deterministic first-order algorithm can improve upon ours. |
Tasks | |
Published | 2019-06-27 |
URL | https://arxiv.org/abs/1906.11985v1 |
PDF | https://arxiv.org/pdf/1906.11985v1.pdf |
PWC | https://paperswithcode.com/paper/near-optimal-methods-for-minimizing-star |
Repo | |
Framework | |
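The gradient inequality defining the $\gamma$-quasar-convex class discussed above, written out ($x^*$ denotes a minimizer; $\gamma = 1$ covers the smooth convex and star-convex cases):

```latex
% gamma-quasar-convexity: at every point, the gradient lower-bounds the gap
% to the optimum, weakened by the factor 1/gamma.
\[
  f(x^*) \;\ge\; f(x) + \frac{1}{\gamma}\,\langle \nabla f(x),\, x^* - x \rangle
  \qquad \text{for all } x, \quad \gamma \in (0, 1].
\]
```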