Paper Group ANR 1725
Knowledge-augmented Column Networks: Guiding Deep Learning with Advice
Title | Knowledge-augmented Column Networks: Guiding Deep Learning with Advice |
Authors | Mayukh Das, Devendra Singh Dhami, Yang Yu, Gautam Kunapuli, Sriraam Natarajan |
Abstract | Recently, deep models have had considerable success in several tasks, especially with low-level representations. However, effective learning from sparse noisy samples is a major challenge in most deep models, especially in domains with structured representations. Inspired by the proven success of human guided machine learning, we propose Knowledge-augmented Column Networks, a relational deep learning framework that leverages human advice/knowledge to learn better models in presence of sparsity and systematic noise. |
Tasks | |
Published | 2019-05-31 |
URL | https://arxiv.org/abs/1906.01432v1 |
https://arxiv.org/pdf/1906.01432v1.pdf | |
PWC | https://paperswithcode.com/paper/knowledge-augmented-column-networks-guiding |
Repo | |
Framework | |
Regularizing Deep Multi-Task Networks using Orthogonal Gradients
Title | Regularizing Deep Multi-Task Networks using Orthogonal Gradients |
Authors | Mihai Suteu, Yike Guo |
Abstract | Deep neural networks are a promising approach towards multi-task learning because of their capability to leverage knowledge across domains and learn general purpose representations. Nevertheless, they can fail to live up to these promises as tasks often compete for a model’s limited resources, potentially leading to lower overall performance. In this work we tackle the issue of interfering tasks through a comprehensive analysis of their training, derived from looking at the interaction between gradients within their shared parameters. Our empirical results show that well-performing models have low variance in the angles between task gradients and that popular regularization methods implicitly reduce this measure. Based on this observation, we propose a novel gradient regularization term that minimizes task interference by enforcing near orthogonal gradients. Updating the shared parameters using this property encourages task specific decoders to optimize different parts of the feature extractor, thus reducing competition. We evaluate our method with classification and regression tasks on the multiDigitMNIST, NYUv2 and SUN RGB-D datasets where we obtain competitive results. |
Tasks | Multi-Task Learning |
Published | 2019-12-14 |
URL | https://arxiv.org/abs/1912.06844v1 |
https://arxiv.org/pdf/1912.06844v1.pdf | |
PWC | https://paperswithcode.com/paper/regularizing-deep-multi-task-networks-using-1 |
Repo | |
Framework | |
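The quantity the abstract above reasons about, the angle between task gradients on shared parameters, can be illustrated with a minimal sketch. The function names and the squared-cosine penalty below are assumptions chosen for illustration; the abstract does not give the exact form of the regularization term, so this should not be read as the authors' implementation.

```python
import math


def cosine(g1, g2):
    """Cosine of the angle between two flattened gradient vectors."""
    dot = sum(a * b for a, b in zip(g1, g2))
    norm1 = math.sqrt(sum(a * a for a in g1))
    norm2 = math.sqrt(sum(b * b for b in g2))
    return dot / (norm1 * norm2)


def orthogonality_penalty(task_grads):
    """Sum of squared cosines over all pairs of task gradients.

    The penalty is zero exactly when every pair of gradients is orthogonal.
    Illustrative only: the paper's actual regularizer may differ.
    """
    total = 0.0
    for i in range(len(task_grads)):
        for j in range(i + 1, len(task_grads)):
            total += cosine(task_grads[i], task_grads[j]) ** 2
    return total


# Hypothetical gradients of two task losses w.r.t. the shared parameters.
grads_orthogonal = [[1.0, 0.0], [0.0, 2.0]]
grads_aligned = [[1.0, 1.0], [2.0, 2.0]]

print(orthogonality_penalty(grads_orthogonal))
print(orthogonality_penalty(grads_aligned))
```

In a training loop, a term of this kind would be added to the summed task losses, so that minimizing it pushes the task gradients on the shared parameters toward orthogonality.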
Combining Context and Knowledge Representations for Chemical-Disease Relation Extraction
Title | Combining Context and Knowledge Representations for Chemical-Disease Relation Extraction |
Authors | Huiwei Zhou, Yunlong Yang, Shixian Ning, Zhuang Liu, Chengkun Lang, Yingyu Lin, Degen Huang |
Abstract | Automatically extracting the relationships between chemicals and diseases is highly important to various areas of biomedical research and health care. Biomedical experts have built many large-scale knowledge bases (KBs) to advance the development of biomedical research. KBs contain huge amounts of structured information about entities and relationships, and therefore play a pivotal role in chemical-disease relation (CDR) extraction. However, previous research has paid less attention to the prior knowledge existing in KBs. This paper proposes a neural network-based attention model (NAM) for CDR extraction, which makes full use of context information in documents and prior knowledge in KBs. For a pair of entities in a document, an attention mechanism is employed to select important context words with respect to the relation representations learned from KBs. Experiments on the BioCreative V CDR dataset show that combining context and knowledge representations through the attention mechanism could significantly improve the CDR extraction performance while achieving comparable results with state-of-the-art systems. |
Tasks | Relation Extraction |
Published | 2019-12-23 |
URL | https://arxiv.org/abs/1912.10604v1 |
https://arxiv.org/pdf/1912.10604v1.pdf | |
PWC | https://paperswithcode.com/paper/combining-context-and-knowledge |
Repo | |
Framework | |
Deja-vu: Double Feature Presentation and Iterated Loss in Deep Transformer Networks
Title | Deja-vu: Double Feature Presentation and Iterated Loss in Deep Transformer Networks |
Authors | Andros Tjandra, Chunxi Liu, Frank Zhang, Xiaohui Zhang, Yongqiang Wang, Gabriel Synnaeve, Satoshi Nakamura, Geoffrey Zweig |
Abstract | Deep acoustic models typically receive features in the first layer of the network, and process increasingly abstract representations in the subsequent layers. Here, we propose to feed the input features at multiple depths in the acoustic model. As our motivation is to allow acoustic models to re-examine their input features in light of partial hypotheses we introduce intermediate model heads and loss function. We study this architecture in the context of deep Transformer networks, and we use an attention mechanism over both the previous layer activations and the input features. To train this model’s intermediate output hypothesis, we apply the objective function at each layer right before feature re-use. We find that the use of such iterated loss significantly improves performance by itself, as well as enabling input feature re-use. We present results on both Librispeech, and a large scale video dataset, with relative improvements of 10 - 20% for Librispeech and 3.2 - 13% for videos. |
Tasks | |
Published | 2019-10-23 |
URL | https://arxiv.org/abs/1910.10324v2 |
https://arxiv.org/pdf/1910.10324v2.pdf | |
PWC | https://paperswithcode.com/paper/deja-vu-double-feature-presentation-in-deep |
Repo | |
Framework | |
One-vs-All Models for Asynchronous Training: An Empirical Analysis
Title | One-vs-All Models for Asynchronous Training: An Empirical Analysis |
Authors | Rahul Gupta, Aman Alok, Shankar Ananthakrishnan |
Abstract | Any given classification problem can be modeled using multi-class or One-vs-All (OVA) architecture. An OVA system consists of as many OVA models as the number of classes, providing the advantage of asynchrony, where each OVA model can be re-trained independent of other models. This is particularly advantageous in settings where scalable model training is a consideration (for instance in an industrial environment where multiple and frequent updates need to be made to the classification system). In this paper, we conduct empirical analysis on realizing independent updates to OVA models and its impact on the accuracy of the overall OVA system. Given that asynchronous updates lead to differences in training datasets for OVA models, we first define a metric to quantify the differences in datasets. Thereafter, using Natural Language Understanding as a task of interest, we estimate the impact of three factors: (i) number of classes, (ii) number of data points and, (iii) divergences in training datasets across OVA models; on the OVA system accuracy. Finally, we observe the accuracy impact of increased asynchrony in a Spoken Language Understanding system. We analyze the results and establish that the proposed metric correlates strongly with the model performances in both the experimental settings. |
Tasks | Spoken Language Understanding |
Published | 2019-06-20 |
URL | https://arxiv.org/abs/1906.08858v1 |
https://arxiv.org/pdf/1906.08858v1.pdf | |
PWC | https://paperswithcode.com/paper/one-vs-all-models-for-asynchronous-training |
Repo | |
Framework | |
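The asynchrony described in the abstract above, where each One-vs-All model can be re-trained independently and on a different dataset, can be sketched as follows. The nearest-mean binary scorer, the toy data, and all function names are assumptions made for illustration; the paper's models and its dataset-divergence metric are not reproduced here.

```python
def train_ova_model(samples, labels, target):
    """Fit one binary scorer for `target`: the means of its positive and negative samples."""
    pos = [x for x, y in zip(samples, labels) if y == target]
    neg = [x for x, y in zip(samples, labels) if y != target]

    def mean(rows):
        return [sum(col) / len(rows) for col in zip(*rows)]

    return mean(pos), mean(neg)


def score(model, x):
    """Higher when x is closer to the positive mean than to the negative mean."""
    pos_mean, neg_mean = model

    def sq_dist(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))

    return sq_dist(x, neg_mean) - sq_dist(x, pos_mean)


def predict(models, x):
    """The OVA system predicts the class whose binary model scores highest."""
    return max(models, key=lambda label: score(models[label], x))


samples = [[0.0, 0.0], [0.2, 0.1], [5.0, 5.0], [5.1, 4.9], [0.0, 5.0], [0.1, 5.2]]
labels = ["a", "a", "b", "b", "c", "c"]
models = {c: train_ova_model(samples, labels, c) for c in sorted(set(labels))}

# Asynchronous update: only the model for class "c" is re-trained, on a dataset
# that now differs from the one the models for "a" and "b" were trained on.
new_samples = samples + [[0.3, 4.8]]
new_labels = labels + ["c"]
models["c"] = train_ova_model(new_samples, new_labels, "c")

print(predict(models, [0.2, 5.0]))
```

After the update, the three models no longer share a training set, which is the situation whose accuracy impact the paper studies.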
Implementation of Fruits Recognition Classifier using Convolutional Neural Network Algorithm for Observation of Accuracies for Various Hidden Layers
Title | Implementation of Fruits Recognition Classifier using Convolutional Neural Network Algorithm for Observation of Accuracies for Various Hidden Layers |
Authors | Shadman Sakib, Zahidun Ashrafi, Md. Abu Bakr Siddique |
Abstract | Fruit recognition using Deep Convolutional Neural Network (CNN) is one of the most promising applications in computer vision. In recent times, deep learning based classifications are making it possible to recognize fruits from images. However, fruit recognition is still a problem for stacked fruits on a weighing scale because of the complexity and similarity. In this paper, a fruit recognition system using CNN is proposed. The proposed method uses deep learning techniques for the classification. We have used the Fruits-360 dataset for the evaluation purpose. From the dataset, we have established a dataset which contains 17,823 images from 25 different categories. The images are divided into training and test datasets. Moreover, for the classification accuracies, we have used various combinations of hidden layers and epochs for different cases and made a comparison between them. The overall performance losses of the network for different cases are also observed. Finally, we have achieved the best test accuracy of 100% and a training accuracy of 99.79%. |
Tasks | |
Published | 2019-04-01 |
URL | https://arxiv.org/abs/1904.00783v6 |
https://arxiv.org/pdf/1904.00783v6.pdf | |
PWC | https://paperswithcode.com/paper/implementation-of-fruits-recognition |
Repo | |
Framework | |
How degenerate is the parametrization of neural networks with the ReLU activation function?
Title | How degenerate is the parametrization of neural networks with the ReLU activation function? |
Authors | Julius Berner, Dennis Elbrächter, Philipp Grohs |
Abstract | Neural network training is usually accomplished by solving a non-convex optimization problem using stochastic gradient descent. Although one optimizes over the network's parameters, the main loss function generally only depends on the realization of the neural network, i.e. the function it computes. Studying the optimization problem over the space of realizations opens up new ways to understand neural network training. In particular, usual loss functions like mean squared error and categorical cross entropy are convex on spaces of neural network realizations, which themselves are non-convex. Approximation capabilities of neural networks can be used to deal with the latter non-convexity, which allows us to establish that for sufficiently large networks local minima of a regularized optimization problem on the realization space are almost optimal. Note, however, that each realization has many different, possibly degenerate, parametrizations. In particular, a local minimum in the parametrization space need not correspond to a local minimum in the realization space. To establish such a connection, inverse stability of the realization map is required, meaning that proximity of realizations must imply proximity of corresponding parametrizations. We present pathologies which prevent inverse stability in general, and, for shallow networks, proceed to establish a restricted space of parametrizations on which we have inverse stability w.r.t. a Sobolev norm. Furthermore, we show that by optimizing over such restricted sets, it is still possible to learn any function which can be learned by optimization over unrestricted sets. |
Tasks | |
Published | 2019-05-23 |
URL | https://arxiv.org/abs/1905.09803v2 |
https://arxiv.org/pdf/1905.09803v2.pdf | |
PWC | https://paperswithcode.com/paper/how-degenerate-is-the-parametrization-of |
Repo | |
Framework | |
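The statement in the abstract above that one realization has many parametrizations can be made concrete with a well-known example: because ReLU is positively homogeneous, scaling a hidden layer by c > 0 and the output layer by 1/c leaves the computed function unchanged. The sketch below is a hypothetical illustration for a shallow scalar-input network; it is not claimed to be one of the specific pathologies the paper analyzes.

```python
def relu(z):
    return max(0.0, z)


def realize(params, x):
    """Function computed by a shallow ReLU network with scalar input and output."""
    hidden_w, hidden_b, out_w = params
    return sum(v * relu(w * x + b) for w, b, v in zip(hidden_w, hidden_b, out_w))


def rescale(params, c):
    """Scale hidden weights and biases by c > 0 and output weights by 1/c."""
    hidden_w, hidden_b, out_w = params
    return (
        [c * w for w in hidden_w],
        [c * b for b in hidden_b],
        [v / c for v in out_w],
    )


# Hypothetical parameters: two hidden units.
theta = ([1.0, -2.0], [0.5, 1.0], [3.0, -1.0])
theta_scaled = rescale(theta, 4.0)

for x in [-2.0, -0.5, 0.0, 1.0, 3.0]:
    print(x, realize(theta, x), realize(theta_scaled, x))
```

The two parameter vectors are far apart, yet their realizations coincide everywhere, so closeness of realizations cannot imply closeness of parametrizations without restricting the parameter space.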
OctopusNet: A Deep Learning Segmentation Network for Multi-modal Medical Images
Title | OctopusNet: A Deep Learning Segmentation Network for Multi-modal Medical Images |
Authors | Yu Chen, Jiawei Chen, Dong Wei, Yuexiang Li, Yefeng Zheng |
Abstract | Deep learning models, such as the fully convolutional network (FCN), have been widely used in 3D biomedical segmentation and achieved state-of-the-art performance. Multiple modalities are often used for disease diagnosis and quantification. Two approaches are widely used in the literature to fuse multiple modalities in the segmentation networks: early-fusion (which stacks multiple modalities as different input channels) and late-fusion (which fuses the segmentation results from different modalities at the very end). These fusion methods easily suffer from the cross-modal interference caused by the input modalities which have wide variations. To address the problem, we propose a novel deep learning architecture, namely OctopusNet, to better leverage and fuse the information contained in multi-modalities. The proposed framework employs a separate encoder for each modality for feature extraction and exploits a hyper-fusion decoder to fuse the extracted features while avoiding feature explosion. We evaluate the proposed OctopusNet on two publicly available datasets, i.e. ISLES-2018 and MRBrainS-2013. The experimental results show that our framework outperforms the commonly-used feature fusion approaches and yields the state-of-the-art segmentation accuracy. |
Tasks | |
Published | 2019-06-05 |
URL | https://arxiv.org/abs/1906.02031v2 |
https://arxiv.org/pdf/1906.02031v2.pdf | |
PWC | https://paperswithcode.com/paper/octopusnet-a-deep-learning-segmentation |
Repo | |
Framework | |
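The two baseline fusion strategies named in the abstract above can be contrasted in a few lines. This is a toy sketch over flat lists of voxel values; the averaging-and-threshold rule used for late fusion and the function names are assumptions for illustration, and OctopusNet's per-modality encoders and hyper-fusion decoder are not modelled here.

```python
def early_fusion(modalities):
    """Stack modalities as input channels: each voxel becomes a tuple of per-modality values."""
    return [tuple(values) for values in zip(*modalities)]


def late_fusion(per_modality_probs, threshold=0.5):
    """Fuse per-modality segmentation outputs at the very end.

    Averages the foreground probabilities predicted from each modality,
    then thresholds; averaging is an assumed fusion rule.
    """
    fused = [sum(p) / len(p) for p in zip(*per_modality_probs)]
    return [1 if p >= threshold else 0 for p in fused]


# Two hypothetical modalities over the same two voxels.
print(early_fusion([[1, 2], [3, 4]]))
print(late_fusion([[0.9, 0.2], [0.7, 0.4]]))
```

Early fusion hands a single network the stacked channels, whereas late fusion runs one network per modality and combines only the final predictions.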
Multiview Aggregation for Learning Category-Specific Shape Reconstruction
Title | Multiview Aggregation for Learning Category-Specific Shape Reconstruction |
Authors | Srinath Sridhar, Davis Rempe, Julien Valentin, Sofien Bouaziz, Leonidas J. Guibas |
Abstract | We investigate the problem of learning category-specific 3D shape reconstruction from a variable number of RGB views of previously unobserved object instances. Most approaches for multiview shape reconstruction operate on sparse shape representations, or assume a fixed number of views. We present a method that can estimate dense 3D shape, and aggregate shape across multiple and varying number of input views. Given a single input view of an object instance, we propose a representation that encodes the dense shape of the visible object surface as well as the surface behind line of sight occluded by the visible surface. When multiple input views are available, the shape representation is designed to be aggregated into a single 3D shape using an inexpensive union operation. We train a 2D CNN to learn to predict this representation from a variable number of views (1 or more). We further aggregate multiview information by using permutation equivariant layers that promote order-agnostic view information exchange at the feature level. Experiments show that our approach is able to produce dense 3D reconstructions of objects that improve in quality as more views are added. |
Tasks | |
Published | 2019-07-01 |
URL | https://arxiv.org/abs/1907.01085v2 |
https://arxiv.org/pdf/1907.01085v2.pdf | |
PWC | https://paperswithcode.com/paper/multiview-aggregation-for-learning-category |
Repo | |
Framework | |
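The "inexpensive union operation" mentioned in the abstract above can be pictured as a set union over per-view predictions. The sketch assumes, purely for illustration, that each view yields a list of 3D points already expressed in a shared object-centred frame; the paper's actual shape representation is not specified in the abstract.

```python
def aggregate_views(per_view_points):
    """Union the 3D points predicted from each view into one point set."""
    shape = set()
    for points in per_view_points:
        shape |= set(points)
    return shape


# Hypothetical per-view predictions; the two views overlap in one point.
view_a = [(0, 0, 0), (1, 0, 0)]
view_b = [(1, 0, 0), (1, 1, 0)]
merged = aggregate_views([view_a, view_b])

print(sorted(merged))
```

A union is independent of the order and number of views, which matches the variable-view setting the abstract describes.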
ALERT: Accurate Anytime Learning for Energy and Timeliness
Title | ALERT: Accurate Anytime Learning for Energy and Timeliness |
Authors | Chengcheng Wan, Muhammad Santriaji, Eri Rogers, Henry Hoffmann, Michael Maire, Shan Lu |
Abstract | An increasing number of software applications incorporate runtime Deep Neural Network (DNN) inference for its great accuracy in many problem domains. While much prior work has separately tackled the problems of improving DNN-inference accuracy and improving DNN-inference efficiency, an important problem is under-explored: disciplined methods for dynamically managing application-specific latency, accuracy, and energy tradeoffs and constraints at run time. To address this need, we propose ALERT, a co-designed combination of runtime system and DNN nesting technique. The runtime takes latency, accuracy, and energy constraints, and uses dynamic feedback to predict the best DNN-model and system power-limit setting. The DNN nesting creates a type of flexible network that efficiently delivers a series of results with increasing accuracy as time goes on. These two parts complement each other well: the runtime is aware of the tradeoffs of different DNN settings, and the nested DNNs’ flexibility allows the runtime prediction to satisfy application requirements even in unpredictable, changing environments. On real systems for both image and speech, ALERT achieves close-to-optimal results. Compared with the optimal static DNN-model and power-limit setting, which is impractical to predict, ALERT achieves a harmonic mean of 33% energy savings while satisfying accuracy constraints, and reduces image-classification error rate by 58% and sentence-prediction perplexity by 52% while satisfying energy constraints. |
Tasks | Image Classification |
Published | 2019-10-31 |
URL | https://arxiv.org/abs/1911.00119v1 |
https://arxiv.org/pdf/1911.00119v1.pdf | |
PWC | https://paperswithcode.com/paper/alert-accurate-anytime-learning-for-energy |
Repo | |
Framework | |
Quadratic number of nodes is sufficient to learn a dataset via gradient descent
Title | Quadratic number of nodes is sufficient to learn a dataset via gradient descent |
Authors | Biswarup Das, Eugene A. Golikov |
Abstract | We prove that if an activation function satisfies some mild conditions and the number of neurons in a two-layered fully connected neural network with this activation function is beyond a certain threshold, then gradient descent on a quadratic loss function finds the optimal weights of the input layer for global minima in linear time. This threshold value is an improvement over previously obtained values. We hypothesise that this bound cannot be improved by the method we are using in this work. |
Tasks | |
Published | 2019-11-13 |
URL | https://arxiv.org/abs/1911.05402v1 |
https://arxiv.org/pdf/1911.05402v1.pdf | |
PWC | https://paperswithcode.com/paper/quadratic-number-of-nodes-is-sufficient-to |
Repo | |
Framework | |
Unmasking Bias in News
Title | Unmasking Bias in News |
Authors | Javier Sánchez-Junquera, Paolo Rosso, Manuel Montes-y-Gómez, Simone Paolo Ponzetto |
Abstract | We present experiments on detecting hyperpartisanship in news using a ‘masking’ method that allows us to assess the role of style vs. content for the task at hand. Our results corroborate previous research on this task in that topic related features yield better results than stylistic ones. We additionally show that competitive results can be achieved by simply including higher-length n-grams, which suggests the need to develop more challenging datasets and tasks that address implicit and more subtle forms of bias. |
Tasks | |
Published | 2019-06-11 |
URL | https://arxiv.org/abs/1906.04836v1 |
https://arxiv.org/pdf/1906.04836v1.pdf | |
PWC | https://paperswithcode.com/paper/unmasking-bias-in-news |
Repo | |
Framework | |
Potential Applications of Machine Learning at Multidisciplinary Medical Team Meetings
Title | Potential Applications of Machine Learning at Multidisciplinary Medical Team Meetings |
Authors | Bridget Kane, Jing Su, Saturnino Luz |
Abstract | While machine learning (ML) systems have produced great advances in several domains, their use in support of complex cooperative work remains a research challenge. A particularly challenging setting, and one that may benefit from ML support is the work of multidisciplinary medical teams (MDTs). This paper focuses on the activities performed during the multidisciplinary medical team meeting (MDTM), reviewing their main characteristics in light of a longitudinal analysis of several MDTs in a large teaching hospital over a period of ten years and of our development of ML methods to support MDTMs, and identifying opportunities and possible pitfalls for the use of ML to support MDTMs. |
Tasks | |
Published | 2019-11-03 |
URL | https://arxiv.org/abs/1911.00914v1 |
https://arxiv.org/pdf/1911.00914v1.pdf | |
PWC | https://paperswithcode.com/paper/potential-applications-of-machine-learning-at |
Repo | |
Framework | |
Deep Learning Regression of VLSI Plasma Etch Metrology
Title | Deep Learning Regression of VLSI Plasma Etch Metrology |
Authors | Jack Kenney, John Valcore, Scott Riggs, Edward Rietman |
Abstract | In computer chip manufacturing, the study of etch patterns on silicon wafers, or metrology, occurs on the nano-scale and is therefore subject to large variation from small, yet significant, perturbations in the manufacturing environment. An enormous amount of information can be gathered from a single etch process, a sequence of actions taken to produce an etched wafer from a blank piece of silicon. Each final wafer, however, is costly to take measurements from, which limits the number of examples available to train a predictive model. Part of the significance of this work is the success we saw from the models despite the limited number of examples. In order to accommodate the high dimensional process signatures, we isolated important sensor variables and applied domain-specific summarization on the data using multiple feature engineering techniques. We used a neural network architecture consisting of the summarized inputs, a single hidden layer of 4032 units, and an output layer of one unit. Two different models were learned, corresponding to the metrology measurements in the dataset, Recess and Remaining Mask. The outputs are related abstractly and do not form a two dimensional space, thus two separate models were learned. Our results approach the error tolerance of the microscopic imaging system. The model can make predictions for a class of etch recipes that include the correct number of etch steps and plasma reactors with the appropriate sensors, which are chambers containing an ionized gas that determine the manufacture environment. Notably, this method is not restricted to some maximum process length due to the summarization techniques used. This allows the method to be adapted to new processes that satisfy the aforementioned requirements. In order to automate semiconductor manufacturing, models like these will be needed throughout the process to evaluate production quality. |
Tasks | Feature Engineering |
Published | 2019-09-10 |
URL | https://arxiv.org/abs/1910.10067v1 |
https://arxiv.org/pdf/1910.10067v1.pdf | |
PWC | https://paperswithcode.com/paper/deep-learning-regression-of-vlsi-plasma-etch |
Repo | |
Framework | |
A Robust Data-Driven Approach for Dialogue State Tracking of Unseen Slot Values
Title | A Robust Data-Driven Approach for Dialogue State Tracking of Unseen Slot Values |
Authors | Vevake Balaraman, Bernardo Magnini |
Abstract | A Dialogue State Tracker is a key component in dialogue systems which estimates the beliefs of possible user goals at each dialogue turn. Deep learning approaches using recurrent neural networks have shown state-of-the-art performance for the task of dialogue state tracking. Generally, these approaches assume a predefined candidate list and struggle to predict any new dialogue state values that are not seen during training. This makes extending the candidate list for a slot without model retraining infeasible and also has limitations in modelling for low resource domains where training data for slot values are expensive. In this paper, we propose a novel dialogue state tracker based on a copying mechanism that can effectively track such unseen slot values without compromising performance on slot values seen during training. The proposed model is also flexible in extending the candidate list without requiring any retraining or change in the model. We evaluate the proposed model on various benchmark datasets (DSTC2, DSTC3 and WoZ2.0) and show that our approach outperforms other end-to-end data-driven approaches in tracking unseen slot values and also provides significant advantages in modelling for DST. |
Tasks | Dialogue State Tracking |
Published | 2019-11-01 |
URL | https://arxiv.org/abs/1911.00269v1 |
https://arxiv.org/pdf/1911.00269v1.pdf | |
PWC | https://paperswithcode.com/paper/a-robust-data-driven-approach-for-dialogue |
Repo | |
Framework | |