Paper Group ANR 1437
Extreme Tensoring for Low-Memory Preconditioning
Title | Extreme Tensoring for Low-Memory Preconditioning |
Authors | Xinyi Chen, Naman Agarwal, Elad Hazan, Cyril Zhang, Yi Zhang |
Abstract | State-of-the-art models are now trained with billions of parameters, reaching hardware limits in terms of memory consumption. This has created a recent demand for memory-efficient optimizers. To this end, we investigate the limits and performance tradeoffs of memory-efficient adaptively preconditioned gradient methods. We propose extreme tensoring for high-dimensional stochastic optimization, showing that an optimizer needs very little memory to benefit from adaptive preconditioning. Our technique applies to arbitrary models (not necessarily with tensor-shaped parameters), and is accompanied by regret and convergence guarantees, which shed light on the tradeoffs between preconditioner quality and expressivity. On a large-scale NLP model, we reduce the optimizer memory overhead by three orders of magnitude, without degrading performance. |
Tasks | Stochastic Optimization |
Published | 2019-02-12 |
URL | http://arxiv.org/abs/1902.04620v1 |
http://arxiv.org/pdf/1902.04620v1.pdf | |
PWC | https://paperswithcode.com/paper/extreme-tensoring-for-low-memory |
Repo | |
Framework | |
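The factored-preconditioner idea behind this abstract can be illustrated with a small sketch. The code below is a hypothetical rank-1 (Adafactor-style) factorization of a diagonal Adagrad accumulator for an (m, n) parameter, storing m + n values instead of m * n; the paper's actual extreme-tensoring construction and scaling details differ.

```python
import numpy as np

def tensored_adagrad_step(param, grad, row_acc, col_acc, lr=0.1, eps=1e-8):
    """One step of a diagonal-Adagrad variant whose second-moment
    accumulator is factored over tensor modes: row_acc and col_acc
    (updated in place) hold per-row and per-column squared-gradient sums."""
    row_acc += (grad ** 2).sum(axis=1)
    col_acc += (grad ** 2).sum(axis=0)
    # Rank-1 approximation of the full m*n accumulator; exact when the
    # squared-gradient matrix is rank one.
    acc_hat = np.outer(row_acc, col_acc) / row_acc.sum()
    param -= lr * grad / (np.sqrt(acc_hat) + eps)
    return param
```

For higher-order tensors the same idea factors the accumulator over every mode, which is what drives the memory reduction reported in the abstract.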
Entity-aware ELMo: Learning Contextual Entity Representation for Entity Disambiguation
Title | Entity-aware ELMo: Learning Contextual Entity Representation for Entity Disambiguation |
Authors | Hamed Shahbazi, Xiaoli Z. Fern, Reza Ghaeini, Rasha Obeidat, Prasad Tadepalli |
Abstract | We present a new local entity disambiguation system. The key to our system is a novel approach for learning entity representations. In our approach we learn an entity-aware extension of Embeddings from Language Models (ELMo), which we call Entity-ELMo (E-ELMo). Given a paragraph containing one or more named entity mentions, each mention is first defined as a function of the entire paragraph (including other mentions) and is then used to predict the referent entity. Utilizing E-ELMo for local entity disambiguation, we outperform all state-of-the-art local and global models on the popular benchmarks, improving micro-average accuracy by about 0.5% on AIDA test-b with the YAGO candidate set. The evaluation setup, training data, and candidate set are the same as those of our baselines for fair comparison. |
Tasks | Entity Disambiguation, Language Modelling |
Published | 2019-08-14 |
URL | https://arxiv.org/abs/1908.05762v2 |
https://arxiv.org/pdf/1908.05762v2.pdf | |
PWC | https://paperswithcode.com/paper/entity-aware-elmo-learning-contextual-entity |
Repo | |
Framework | |
A cost-reducing partial labeling estimator in text classification problem
Title | A cost-reducing partial labeling estimator in text classification problem |
Authors | Jiangning Chen, Zhibo Dai, Juntao Duan, Qianli Hu, Ruilin Li, Heinrich Matzinger, Ionel Popescu, Haoyan Zhai |
Abstract | We propose a new approach to text classification problems in which learning with partial labels is beneficial. Instead of offering each training sample a set of candidate labels, we assign negative-oriented labels to ambiguous training examples if they are unlikely to fall into certain classes. We construct new maximum likelihood estimators with a self-correction property, and prove that under some conditions our estimators converge faster. We also discuss the advantages of applying one of our estimators to a fully supervised learning problem. The proposed method has potential applicability in many areas, such as crowdsourcing, natural language processing, and medical image analysis. |
Tasks | Text Classification |
Published | 2019-06-10 |
URL | https://arxiv.org/abs/1906.03768v1 |
https://arxiv.org/pdf/1906.03768v1.pdf | |
PWC | https://paperswithcode.com/paper/a-cost-reducing-partial-labeling-estimator-in |
Repo | |
Framework | |
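The negative-oriented labeling idea can be made concrete with a short sketch. Assuming a model that outputs class probabilities, one natural partial-label objective is the negative log-likelihood that the true class lies outside the ruled-out set; this is an illustrative stand-in, not the paper's exact self-correcting estimator.

```python
import numpy as np

def partial_label_loss(probs, neg_mask):
    """Loss for negative-oriented labels: neg_mask[i, k] = 1 means
    example i is marked as unlikely to belong to class k. The loss is
    the negative log of the total probability left on allowed classes."""
    allowed = probs * (1 - neg_mask)
    return -np.log(allowed.sum(axis=1))
```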
A Visual Programming Paradigm for Abstract Deep Learning Model Development
Title | A Visual Programming Paradigm for Abstract Deep Learning Model Development |
Authors | Srikanth Tamilselvam, Naveen Panwar, Shreya Khare, Rahul Aralikatte, Anush Sankaran, Senthil Mani |
Abstract | Deep learning is one of the fastest-growing technologies in computer science, with a plethora of applications. But this unprecedented growth has so far been limited to consumption by deep learning experts. The primary challenges are the steep learning curve of the programming libraries and the lack of intuitive systems that enable non-experts to use deep learning. Towards this goal, we study the effectiveness of a no-code paradigm for designing deep learning models. In particular, a visual drag-and-drop interface is found to be more efficient than traditional programming and alternative visual programming paradigms. We conduct user studies across different expertise levels to measure the entry-level barrier and the developer load across different programming paradigms. We obtain a System Usability Scale (SUS) score of 90 and a NASA Task Load Index (TLX) score of 21 for the proposed visual programming, compared to 68 and 52, respectively, for traditional programming methods. |
Tasks | |
Published | 2019-05-07 |
URL | https://arxiv.org/abs/1905.02486v2 |
https://arxiv.org/pdf/1905.02486v2.pdf | |
PWC | https://paperswithcode.com/paper/a-visual-programming-paradigm-for-abstract |
Repo | |
Framework | |
Analyzing Deep Neural Networks with Symbolic Propagation: Towards Higher Precision and Faster Verification
Title | Analyzing Deep Neural Networks with Symbolic Propagation: Towards Higher Precision and Faster Verification |
Authors | Pengfei Yang, Jiangchao Liu, Jianlin Li, Liqian Chen, Xiaowei Huang |
Abstract | Deep neural networks (DNNs) have been shown to lack robustness: their classifications are vulnerable to small perturbations on the inputs. This has led to safety concerns about applying DNNs in safety-critical domains. Several verification approaches have been developed to automatically prove or disprove safety properties of DNNs. However, these approaches suffer from either the scalability problem, i.e., only small DNNs can be handled, or the precision problem, i.e., the obtained bounds are loose. This paper improves on a recent proposal for analyzing DNNs through the classic abstract interpretation technique with a novel symbolic propagation technique. More specifically, the values of neurons are represented symbolically and propagated forward from the input layer to the output layer, on top of abstract domains. We show that our approach can achieve significantly higher precision and thus prove more properties than using abstract domains alone. Moreover, we show that the bounds our approach derives on the hidden neurons, when fed to a state-of-the-art SMT-based verification tool, can improve its performance. We implement our approach in a software tool and validate it on several DNNs trained on benchmark datasets such as MNIST. |
Tasks | |
Published | 2019-02-26 |
URL | http://arxiv.org/abs/1902.09866v1 |
http://arxiv.org/pdf/1902.09866v1.pdf | |
PWC | https://paperswithcode.com/paper/analyzing-deep-neural-networks-with-symbolic |
Repo | |
Framework | |
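For context, the abstract-domain baseline that the paper's symbolic propagation improves upon can be sketched with the simplest such domain, intervals. The code below propagates box bounds through fully-connected ReLU layers; the paper layers symbolic expressions on top of domains like this to tighten the resulting bounds.

```python
import numpy as np

def interval_forward(layers, lo, hi):
    """Propagate elementwise input bounds [lo, hi] through a ReLU
    network given as a list of (W, b) pairs (ReLU after every layer).
    Sound but loose: each neuron's bounds ignore input correlations."""
    for W, b in layers:
        Wp, Wn = np.maximum(W, 0.0), np.minimum(W, 0.0)
        # Positive weights pull from the same bound, negative weights
        # from the opposite one.
        new_lo = Wp @ lo + Wn @ hi + b
        new_hi = Wp @ hi + Wn @ lo + b
        lo, hi = np.maximum(new_lo, 0.0), np.maximum(new_hi, 0.0)
    return lo, hi
```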
Learning Latent Dynamics for Partially-Observed Chaotic Systems
Title | Learning Latent Dynamics for Partially-Observed Chaotic Systems |
Authors | Said Ouala, Duong Nguyen, Lucas Drumetz, Bertrand Chapron, Ananda Pascual, Fabrice Collard, Lucile Gaultier, Ronan Fablet |
Abstract | This paper addresses the data-driven identification of latent dynamical representations of partially-observed systems, i.e., dynamical systems for which some components are never observed, with an emphasis on forecasting applications, including long-term asymptotic patterns. Whereas state-of-the-art data-driven approaches rely on delay embeddings and linear decompositions of the underlying operators, we introduce a framework based on the data-driven identification of an augmented state-space model using a neural-network-based representation. For a given training dataset, it amounts to jointly learning an ODE (Ordinary Differential Equation) representation in the latent space and reconstructing the latent states. Through numerical experiments, we demonstrate the relevance of the proposed framework with respect to state-of-the-art approaches in terms of short-term forecasting performance and long-term behaviour. We further discuss how the proposed framework relates to Koopman operator theory and Takens’ embedding theorem. |
Tasks | |
Published | 2019-07-04 |
URL | https://arxiv.org/abs/1907.02452v1 |
https://arxiv.org/pdf/1907.02452v1.pdf | |
PWC | https://paperswithcode.com/paper/learning-latent-dynamics-for-partially |
Repo | |
Framework | |
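The latent ODE component described above can be sketched in a few lines. Assuming a learned (or given) dynamics function f, forward Euler integration rolls the latent state forward in time; the paper's framework trains f jointly with the latent-state reconstruction, which this sketch omits.

```python
import numpy as np

def euler_rollout(f, z0, dt, steps):
    """Integrate dz/dt = f(z) with forward Euler from latent state z0,
    returning the trajectory including the initial state."""
    z = np.asarray(z0, dtype=float)
    traj = [z.copy()]
    for _ in range(steps):
        z = z + dt * f(z)
        traj.append(z.copy())
    return np.stack(traj)
```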
PatchNet: A Tool for Deep Patch Classification
Title | PatchNet: A Tool for Deep Patch Classification |
Authors | Thong Hoang, Julia Lawall, Richard J. Oentaryo, Yuan Tian, David Lo |
Abstract | This work proposes PatchNet, an automated tool based on hierarchical deep learning for classifying patches by extracting features from commit messages and code changes. PatchNet contains a deep hierarchical structure that mirrors the hierarchical and sequential structure of a code change, differentiating it from the existing deep learning models on source code. PatchNet provides several options allowing users to select parameters for the training process. The tool has been validated in the context of automatic identification of stable-relevant patches in the Linux kernel and is potentially applicable to automate other software engineering tasks that can be formulated as patch classification problems. A video demonstrating PatchNet is available at https://goo.gl/CZjG6X. The PatchNet implementation is available at https://github.com/hvdthong/PatchNetTool. |
Tasks | |
Published | 2019-02-16 |
URL | http://arxiv.org/abs/1903.02063v2 |
http://arxiv.org/pdf/1903.02063v2.pdf | |
PWC | https://paperswithcode.com/paper/patchnet-a-tool-for-deep-patch-classification |
Repo | |
Framework | |
Learning Fully Dense Neural Networks for Image Semantic Segmentation
Title | Learning Fully Dense Neural Networks for Image Semantic Segmentation |
Authors | Mingmin Zhen, Jinglu Wang, Lei Zhou, Tian Fang, Long Quan |
Abstract | Semantic segmentation is pixel-wise classification that retains critical spatial information. “Feature map reuse” has been commonly adopted in CNN-based approaches to take advantage of feature maps from the early layers for later spatial reconstruction. Along this direction, we go a step further by proposing a fully dense neural network with an encoder-decoder structure, which we abbreviate as FDNet. For each stage in the decoder module, the feature maps of all previous blocks are adaptively aggregated and fed forward as input. On the one hand, this reconstructs spatial boundaries accurately; on the other hand, it makes gradient backpropagation more efficient during training. In addition, we propose a boundary-aware loss function that focuses more attention on pixels near boundaries, which improves the labeling of “hard examples”. FDNet achieves the best performance on the two benchmark datasets PASCAL VOC 2012 and NYUDv2 among previous works that do not train on other datasets. |
Tasks | Semantic Segmentation |
Published | 2019-05-22 |
URL | https://arxiv.org/abs/1905.08929v1 |
https://arxiv.org/pdf/1905.08929v1.pdf | |
PWC | https://paperswithcode.com/paper/learning-fully-dense-neural-networks-for |
Repo | |
Framework | |
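The boundary-aware loss mentioned above can be illustrated with a per-pixel weighting sketch. The code marks any pixel whose 4-neighborhood contains a different class label and up-weights it; the actual FDNet loss formulation may weight a wider band around the boundary.

```python
import numpy as np

def boundary_aware_weights(labels, boost=2.0):
    """Per-pixel loss weights for a 2D label map: pixels adjacent to a
    different class (i.e., on a segmentation boundary) get weight `boost`."""
    lab = np.asarray(labels)
    boundary = np.zeros(lab.shape, dtype=bool)
    boundary[:-1, :] |= lab[:-1, :] != lab[1:, :]   # down neighbor differs
    boundary[1:, :] |= lab[1:, :] != lab[:-1, :]    # up neighbor differs
    boundary[:, :-1] |= lab[:, :-1] != lab[:, 1:]   # right neighbor differs
    boundary[:, 1:] |= lab[:, 1:] != lab[:, :-1]    # left neighbor differs
    return np.where(boundary, boost, 1.0)
```

Multiplying the per-pixel cross-entropy by these weights focuses training on the boundary pixels the abstract calls “hard examples”.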
Say Anything: Automatic Semantic Infelicity Detection in L2 English Indefinite Pronouns
Title | Say Anything: Automatic Semantic Infelicity Detection in L2 English Indefinite Pronouns |
Authors | Ella Rabinovich, Julia Watson, Barend Beekhuizen, Suzanne Stevenson |
Abstract | Computational research on error detection in second language speakers has mainly addressed clear grammatical anomalies typical to learners at the beginner-to-intermediate level. We focus instead on acquisition of subtle semantic nuances of English indefinite pronouns by non-native speakers at varying levels of proficiency. We first lay out theoretical, linguistically motivated hypotheses, and supporting empirical evidence on the nature of the challenges posed by indefinite pronouns to English learners. We then suggest and evaluate an automatic approach for detection of atypical usage patterns, demonstrating that deep learning architectures are promising for this task involving nuanced semantic anomalies. |
Tasks | |
Published | 2019-09-17 |
URL | https://arxiv.org/abs/1909.07928v1 |
https://arxiv.org/pdf/1909.07928v1.pdf | |
PWC | https://paperswithcode.com/paper/say-anything-automatic-semantic-infelicity |
Repo | |
Framework | |
Emotion recognition with 4K resolution database
Title | Emotion recognition with 4K resolution database |
Authors | Qian Zheng |
Abstract | Classifying human emotion through facial expressions is a big topic in both the computer vision and deep learning fields. Human emotion can be classified either as one of the basic emotion types, like being angry or happy, or as dimensional emotion with valence and arousal values. There are a lot of related challenges in this topic; one of the most famous is the ‘Affect-in-the-wild Challenge’ (Aff-Wild Challenge), the first challenge on the estimation of valence and arousal in-the-wild. This project is an extension of the Aff-Wild Challenge. The Aff-Wild database was created using images with a mean resolution of 607×359; Dimitrios and I sought to find out the performance of a model trained on a database of 4K-resolution in-the-wild images. Since no existing database satisfies this requirement, I built one from scratch with help from Dimitrios and trained neural network models with different hyperparameters on it. I used network models like VGG16, AlexNet, and ResNet, and also pre-trained models like ImageNet VGG. I compared the results of the different network models alongside the results from the Aff-Wild database to identify the optimal model for my database. |
Tasks | Emotion Recognition |
Published | 2019-10-24 |
URL | https://arxiv.org/abs/1910.11276v1 |
https://arxiv.org/pdf/1910.11276v1.pdf | |
PWC | https://paperswithcode.com/paper/emotion-recognition-with-4kresolution |
Repo | |
Framework | |
Forward and Backward Knowledge Transfer for Sentiment Classification
Title | Forward and Backward Knowledge Transfer for Sentiment Classification |
Authors | Hao Wang, Bing Liu, Shuai Wang, Nianzu Ma, Yan Yang |
Abstract | This paper studies the problem of learning a sequence of sentiment classification tasks. The knowledge learned from each task is retained and used to help future or subsequent task learning. This learning paradigm is called Lifelong Learning (LL). However, existing LL methods either only transfer knowledge forward to help future learning, never going back to improve the model of a previous task, or require the training data of the previous task to retrain its model in order to exploit backward/reverse knowledge transfer. This paper studies reverse knowledge transfer in LL in the context of naive Bayesian (NB) classification. It aims to improve the model of a previous task by leveraging future knowledge without retraining on its training data. This is done by exploiting a key characteristic of the generative model of NB: it is possible to improve the NB classifier for a task by improving its model parameters directly using the retained knowledge from other tasks. Experimental results show that the proposed method markedly outperforms existing LL baselines. |
Tasks | Sentiment Analysis, Transfer Learning |
Published | 2019-06-08 |
URL | https://arxiv.org/abs/1906.03506v1 |
https://arxiv.org/pdf/1906.03506v1.pdf | |
PWC | https://paperswithcode.com/paper/forward-and-backward-knowledge-transfer-for |
Repo | |
Framework | |
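The backward-transfer idea exploits the fact that a naive Bayes model is just a table of smoothed word counts, so a previous task's model can be updated by merging in counts retained from later tasks without touching the old training data. The sketch below shows this count-merging step with a hypothetical fixed weighting; the paper's exact update rule differs.

```python
import math
from collections import Counter

def nb_class_loglik(counts, vocab, alpha=1.0):
    """Laplace-smoothed log P(word | class) from a word-count table."""
    total = sum(counts[w] for w in vocab) + alpha * len(vocab)
    return {w: math.log((counts[w] + alpha) / total) for w in vocab}

def backward_transfer(old_counts, retained_counts, weight=0.5):
    """Update a previous task's NB counts using knowledge retained
    from other tasks, without access to the old training data."""
    merged = Counter(old_counts)
    for w, c in retained_counts.items():
        merged[w] += weight * c
    return merged
```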
Evolutionary Computation and AI Safety: Research Problems Impeding Routine and Safe Real-world Application of Evolution
Title | Evolutionary Computation and AI Safety: Research Problems Impeding Routine and Safe Real-world Application of Evolution |
Authors | Joel Lehman |
Abstract | Recent developments in artificial intelligence and machine learning have spurred interest in the growing field of AI safety, which studies how to prevent human-harming accidents when deploying AI systems. This paper thus explores the intersection of AI safety with evolutionary computation, to show how safety issues arise in evolutionary computation and how understanding from evolutionary computational and biological evolution can inform the broader study of AI safety. |
Tasks | |
Published | 2019-06-24 |
URL | https://arxiv.org/abs/1906.10189v2 |
https://arxiv.org/pdf/1906.10189v2.pdf | |
PWC | https://paperswithcode.com/paper/evolutionary-computation-and-ai-safety |
Repo | |
Framework | |
Stochastic Proximal AUC Maximization
Title | Stochastic Proximal AUC Maximization |
Authors | Yunwen Lei, Yiming Ying |
Abstract | In this paper we consider the problem of maximizing the area under the ROC curve (AUC), a widely used performance metric in imbalanced classification and anomaly detection. Due to the pairwise nonlinearity of the objective function, classical SGD algorithms do not apply to the task of AUC maximization. We propose a novel stochastic proximal algorithm for AUC maximization that is scalable to large-scale streaming data. Our algorithm can accommodate general penalty terms and is easy to implement, with favorable $O(d)$ space and per-iteration time complexities. We establish a high-probability convergence rate of $O(1/\sqrt{T})$ for the general convex setting, and improve it to a fast rate of $O(1/T)$ for the cases of strongly convex regularizers and of no regularization term (without strong convexity). Our proof does not need the uniform boundedness assumption on the loss function or the iterates, which makes it more faithful to practice. Finally, we perform extensive experiments on various benchmark data sets from real-world application domains, which show the superior performance of our algorithm over existing AUC maximization algorithms. |
Tasks | Anomaly Detection |
Published | 2019-06-14 |
URL | https://arxiv.org/abs/1906.06053v1 |
https://arxiv.org/pdf/1906.06053v1.pdf | |
PWC | https://paperswithcode.com/paper/stochastic-proximal-auc-maximization |
Repo | |
Framework | |
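The algorithm's two key ingredients, a stochastic gradient of a pairwise AUC surrogate and a proximal step for the penalty, can be sketched as follows. This is a generic pairwise-sampling illustration with an L1 penalty; the paper's actual algorithm avoids explicit pair storage and is the version covered by the stated convergence guarantees.

```python
import numpy as np

def prox_l1(w, lam):
    """Soft-thresholding: the proximal operator of lam * ||w||_1."""
    return np.sign(w) * np.maximum(np.abs(w) - lam, 0.0)

def auc_proximal_step(w, x_pos, x_neg, lr=0.1, lam=0.5):
    """One O(d) step on the pairwise squared surrogate
    (1 - w^T (x_pos - x_neg))^2 for one positive/negative pair,
    followed by the proximal step for the L1 penalty."""
    diff = x_pos - x_neg
    margin = 1.0 - w @ diff
    grad = -2.0 * margin * diff
    return prox_l1(w - lr * grad, lr * lam)
```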
Generative Adversarial Networks for Mitigating Biases in Machine Learning Systems
Title | Generative Adversarial Networks for Mitigating Biases in Machine Learning Systems |
Authors | Adel Abusitta, Esma Aïmeur, Omar Abdel Wahab |
Abstract | In this paper, we propose a new framework for mitigating biases in machine learning systems. The problem of the existing mitigation approaches is that they are model-oriented in the sense that they focus on tuning the training algorithms to produce fair results, while overlooking the fact that the training data can itself be the main reason for biased outcomes. Technically speaking, two essential limitations can be found in such model-based approaches: 1) the mitigation cannot be achieved without degrading the accuracy of the machine learning models, and 2) when the data used for training are largely biased, the training time automatically increases so as to find suitable learning parameters that help produce fair results. To address these shortcomings, we propose in this work a new framework that can largely mitigate the biases and discriminations in machine learning systems while at the same time enhancing the prediction accuracy of these systems. The proposed framework is based on conditional Generative Adversarial Networks (cGANs), which are used to generate new synthetic fair data with selective properties from the original data. We also propose a framework for analyzing data biases, which is important for understanding the amount and type of data that need to be synthetically sampled and labeled for each population group. Experimental results show that the proposed solution can efficiently mitigate different types of biases, while at the same time enhancing the prediction accuracy of the underlying machine learning model. |
Tasks | |
Published | 2019-05-23 |
URL | https://arxiv.org/abs/1905.09972v1 |
https://arxiv.org/pdf/1905.09972v1.pdf | |
PWC | https://paperswithcode.com/paper/generative-adversarial-networks-for-5 |
Repo | |
Framework | |
Detection of vertebral fractures in CT using 3D Convolutional Neural Networks
Title | Detection of vertebral fractures in CT using 3D Convolutional Neural Networks |
Authors | Joeri Nicolaes, Steven Raeymaeckers, David Robben, Guido Wilms, Dirk Vandermeulen, Cesar Libanati, Marc Debois |
Abstract | Osteoporosis-induced fractures occur worldwide about every 3 seconds. Vertebral compression fractures are early signs of the disease and are considered risk predictors for secondary osteoporotic fractures. We present a detection method to opportunistically screen spine-containing CT images for the presence of these vertebral fractures. Existing methods, inspired by radiology practice, are based on 2D and 2.5D features, but we present, to the best of our knowledge, the first method for detecting vertebral fractures in CT using automatically learned 3D feature maps. The presented method explicitly localizes these fractures, allowing radiologists to interpret its results. We train a voxel-classification 3D Convolutional Neural Network (CNN) on a training database of 90 cases that has been semi-automatically generated from radiologist readings that are readily available in clinical practice. Our 3D method produces an Area Under the Curve (AUC) of 95% for patient-level fracture detection and an AUC of 93% for vertebra-level fracture detection in a five-fold cross-validation experiment. |
Tasks | |
Published | 2019-11-05 |
URL | https://arxiv.org/abs/1911.01816v1 |
https://arxiv.org/pdf/1911.01816v1.pdf | |
PWC | https://paperswithcode.com/paper/detection-of-vertebral-fractures-in-ct-using |
Repo | |
Framework | |