January 25, 2020

Paper Group ANR 1725

Knowledge-augmented Column Networks: Guiding Deep Learning with Advice


Title	Knowledge-augmented Column Networks: Guiding Deep Learning with Advice
Authors	Mayukh Das, Devendra Singh Dhami, Yang Yu, Gautam Kunapuli, Sriraam Natarajan
Abstract	Recently, deep models have had considerable success in several tasks, especially with low-level representations. However, effective learning from sparse noisy samples is a major challenge in most deep models, especially in domains with structured representations. Inspired by the proven success of human guided machine learning, we propose Knowledge-augmented Column Networks, a relational deep learning framework that leverages human advice/knowledge to learn better models in the presence of sparsity and systematic noise.
Tasks
Published	2019-05-31
URL	https://arxiv.org/abs/1906.01432v1
PDF	https://arxiv.org/pdf/1906.01432v1.pdf
PWC	https://paperswithcode.com/paper/knowledge-augmented-column-networks-guiding
Repo
Framework

Regularizing Deep Multi-Task Networks using Orthogonal Gradients


Title	Regularizing Deep Multi-Task Networks using Orthogonal Gradients
Authors	Mihai Suteu, Yike Guo
Abstract	Deep neural networks are a promising approach towards multi-task learning because of their capability to leverage knowledge across domains and learn general purpose representations. Nevertheless, they can fail to live up to these promises as tasks often compete for a model’s limited resources, potentially leading to lower overall performance. In this work we tackle the issue of interfering tasks through a comprehensive analysis of their training, derived from looking at the interaction between gradients within their shared parameters. Our empirical results show that well-performing models have low variance in the angles between task gradients and that popular regularization methods implicitly reduce this measure. Based on this observation, we propose a novel gradient regularization term that minimizes task interference by enforcing near orthogonal gradients. Updating the shared parameters using this property encourages task specific decoders to optimize different parts of the feature extractor, thus reducing competition. We evaluate our method with classification and regression tasks on the multiDigitMNIST, NYUv2 and SUN RGB-D datasets where we obtain competitive results.
Tasks	Multi-Task Learning
Published	2019-12-14
URL	https://arxiv.org/abs/1912.06844v1
PDF	https://arxiv.org/pdf/1912.06844v1.pdf
PWC	https://paperswithcode.com/paper/regularizing-deep-multi-task-networks-using-1
Repo
Framework
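
The idea of pushing task gradients toward orthogonality can be illustrated with a small sketch. This is not the paper's regularization term, whose exact form the abstract does not give; it assumes each task's gradient on the shared parameters is available as a flat list of floats and penalizes the squared cosine between every pair. The function names are hypothetical.

```python
import math


def cosine(g1, g2):
    """Cosine of the angle between two gradient vectors (lists of floats)."""
    dot = sum(a * b for a, b in zip(g1, g2))
    n1 = math.sqrt(sum(a * a for a in g1))
    n2 = math.sqrt(sum(b * b for b in g2))
    if n1 == 0.0 or n2 == 0.0:
        return 0.0  # treat a zero gradient as non-interfering
    return dot / (n1 * n2)


def orthogonality_penalty(task_grads):
    """Sum of squared cosines over all pairs of task gradients.

    The value is 0 when every pair is orthogonal and grows as gradients
    become parallel or anti-parallel. Illustrative stand-in only.
    """
    total = 0.0
    for i in range(len(task_grads)):
        for j in range(i + 1, len(task_grads)):
            total += cosine(task_grads[i], task_grads[j]) ** 2
    return total
```

In a training loop, a term like this would be added to the summed task losses with a weighting coefficient, which is itself an assumption here.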

Combining Context and Knowledge Representations for Chemical-Disease Relation Extraction


Title	Combining Context and Knowledge Representations for Chemical-Disease Relation Extraction
Authors	Huiwei Zhou, Yunlong Yang, Shixian Ning, Zhuang Liu, Chengkun Lang, Yingyu Lin, Degen Huang
Abstract	Automatically extracting the relationships between chemicals and diseases is of significant importance to various areas of biomedical research and health care. Biomedical experts have built many large-scale knowledge bases (KBs) to advance the development of biomedical research. KBs contain huge amounts of structured information about entities and relationships, and therefore play a pivotal role in chemical-disease relation (CDR) extraction. However, previous research has paid less attention to the prior knowledge existing in KBs. This paper proposes a neural network-based attention model (NAM) for CDR extraction, which makes full use of context information in documents and prior knowledge in KBs. For a pair of entities in a document, an attention mechanism is employed to select important context words with respect to the relation representations learned from KBs. Experiments on the BioCreative V CDR dataset show that combining context and knowledge representations through the attention mechanism could significantly improve the CDR extraction performance while achieving results comparable with state-of-the-art systems.
Tasks	Relation Extraction
Published	2019-12-23
URL	https://arxiv.org/abs/1912.10604v1
PDF	https://arxiv.org/pdf/1912.10604v1.pdf
PWC	https://paperswithcode.com/paper/combining-context-and-knowledge
Repo
Framework

Deja-vu: Double Feature Presentation and Iterated Loss in Deep Transformer Networks


Title	Deja-vu: Double Feature Presentation and Iterated Loss in Deep Transformer Networks
Authors	Andros Tjandra, Chunxi Liu, Frank Zhang, Xiaohui Zhang, Yongqiang Wang, Gabriel Synnaeve, Satoshi Nakamura, Geoffrey Zweig
Abstract	Deep acoustic models typically receive features in the first layer of the network, and process increasingly abstract representations in the subsequent layers. Here, we propose to feed the input features at multiple depths in the acoustic model. As our motivation is to allow acoustic models to re-examine their input features in light of partial hypotheses, we introduce intermediate model heads and loss functions. We study this architecture in the context of deep Transformer networks, and we use an attention mechanism over both the previous layer activations and the input features. To train this model’s intermediate output hypothesis, we apply the objective function at each layer right before feature re-use. We find that the use of such iterated loss significantly improves performance by itself, as well as enabling input feature re-use. We present results on both Librispeech and a large scale video dataset, with relative improvements of 10 - 20% for Librispeech and 3.2 - 13% for videos.
Tasks
Published	2019-10-23
URL	https://arxiv.org/abs/1910.10324v2
PDF	https://arxiv.org/pdf/1910.10324v2.pdf
PWC	https://paperswithcode.com/paper/deja-vu-double-feature-presentation-in-deep
Repo
Framework

One-vs-All Models for Asynchronous Training: An Empirical Analysis


Title	One-vs-All Models for Asynchronous Training: An Empirical Analysis
Authors	Rahul Gupta, Aman Alok, Shankar Ananthakrishnan
Abstract	Any given classification problem can be modeled using multi-class or One-vs-All (OVA) architecture. An OVA system consists of as many OVA models as the number of classes, providing the advantage of asynchrony, where each OVA model can be re-trained independently of other models. This is particularly advantageous in settings where scalable model training is a consideration (for instance in an industrial environment where multiple and frequent updates need to be made to the classification system). In this paper, we conduct empirical analysis on realizing independent updates to OVA models and their impact on the accuracy of the overall OVA system. Given that asynchronous updates lead to differences in training datasets for OVA models, we first define a metric to quantify the differences in datasets. Thereafter, using Natural Language Understanding as a task of interest, we estimate the impact of three factors: (i) number of classes, (ii) number of data points, and (iii) divergences in training datasets across OVA models; on the OVA system accuracy. Finally, we observe the accuracy impact of increased asynchrony in a Spoken Language Understanding system. We analyze the results and establish that the proposed metric correlates strongly with the model performances in both the experimental settings.
Tasks	Spoken Language Understanding
Published	2019-06-20
URL	https://arxiv.org/abs/1906.08858v1
PDF	https://arxiv.org/pdf/1906.08858v1.pdf
PWC	https://paperswithcode.com/paper/one-vs-all-models-for-asynchronous-training
Repo
Framework
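
The asynchrony described in this abstract, one model per class with each replaceable on its own, can be sketched as follows. The keyword-counting scorer and all names are hypothetical stand-ins for trained binary classifiers; the abstract does not describe the models or the dataset-divergence metric in enough detail to reproduce them.

```python
class OVASystem:
    """Holds one independent scoring model per class label."""

    def __init__(self):
        self.models = {}  # class label -> callable returning a score

    def update(self, label, scorer):
        # Replacing one class's model leaves every other model untouched.
        self.models[label] = scorer

    def predict(self, x):
        # The predicted class is the one whose model scores the input highest.
        return max(self.models, key=lambda label: self.models[label](x))


def keyword_scorer(keywords):
    """Toy binary scorer: counts how many known keywords appear in the text."""
    def score(text):
        return sum(word in keywords for word in text.lower().split())
    return score
```

Calling `update` for a single label models an asynchronous re-train: the other classes keep whatever model (and training data snapshot) they had before.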

Implementation of Fruits Recognition Classifier using Convolutional Neural Network Algorithm for Observation of Accuracies for Various Hidden Layers


Title	Implementation of Fruits Recognition Classifier using Convolutional Neural Network Algorithm for Observation of Accuracies for Various Hidden Layers
Authors	Shadman Sakib, Zahidun Ashrafi, Md. Abu Bakr Siddique
Abstract	Fruit recognition using a Deep Convolutional Neural Network (CNN) is one of the most promising applications in computer vision. In recent times, deep learning based classifications are making it possible to recognize fruits from images. However, fruit recognition is still a problem for stacked fruits on a weighing scale because of their complexity and similarity. In this paper, a fruit recognition system using CNN is proposed. The proposed method uses deep learning techniques for the classification. We have used the Fruits-360 dataset for evaluation. From this dataset, we have established a dataset which contains 17,823 images from 25 different categories. The images are divided into training and test datasets. Moreover, for the classification accuracies, we have used various combinations of hidden layers and epochs for different cases and made a comparison between them. The overall performance losses of the network for different cases were also observed. Finally, we have achieved the best test accuracy of 100% and a training accuracy of 99.79%.
Tasks
Published	2019-04-01
URL	https://arxiv.org/abs/1904.00783v6
PDF	https://arxiv.org/pdf/1904.00783v6.pdf
PWC	https://paperswithcode.com/paper/implementation-of-fruits-recognition
Repo
Framework

How degenerate is the parametrization of neural networks with the ReLU activation function?


Title	How degenerate is the parametrization of neural networks with the ReLU activation function?
Authors	Julius Berner, Dennis Elbrächter, Philipp Grohs
Abstract	Neural network training is usually accomplished by solving a non-convex optimization problem using stochastic gradient descent. Although one optimizes over the network's parameters, the main loss function generally only depends on the realization of the neural network, i.e. the function it computes. Studying the optimization problem over the space of realizations opens up new ways to understand neural network training. In particular, usual loss functions like mean squared error and categorical cross entropy are convex on spaces of neural network realizations, which themselves are non-convex. Approximation capabilities of neural networks can be used to deal with the latter non-convexity, which allows us to establish that for sufficiently large networks local minima of a regularized optimization problem on the realization space are almost optimal. Note, however, that each realization has many different, possibly degenerate, parametrizations. In particular, a local minimum in the parametrization space need not correspond to a local minimum in the realization space. To establish such a connection, inverse stability of the realization map is required, meaning that proximity of realizations must imply proximity of corresponding parametrizations. We present pathologies which prevent inverse stability in general, and, for shallow networks, proceed to establish a restricted space of parametrizations on which we have inverse stability w.r.t. a Sobolev norm. Furthermore, we show that by optimizing over such restricted sets, it is still possible to learn any function which can be learned by optimization over unrestricted sets.
Tasks
Published	2019-05-23
URL	https://arxiv.org/abs/1905.09803v2
PDF	https://arxiv.org/pdf/1905.09803v2.pdf
PWC	https://paperswithcode.com/paper/how-degenerate-is-the-parametrization-of
Repo
Framework

OctopusNet: A Deep Learning Segmentation Network for Multi-modal Medical Images


Title	OctopusNet: A Deep Learning Segmentation Network for Multi-modal Medical Images
Authors	Yu Chen, Jiawei Chen, Dong Wei, Yuexiang Li, Yefeng Zheng
Abstract	Deep learning models, such as the fully convolutional network (FCN), have been widely used in 3D biomedical segmentation and achieved state-of-the-art performance. Multiple modalities are often used for disease diagnosis and quantification. Two approaches are widely used in the literature to fuse multiple modalities in the segmentation networks: early-fusion (which stacks multiple modalities as different input channels) and late-fusion (which fuses the segmentation results from different modalities at the very end). These fusion methods easily suffer from the cross-modal interference caused by the input modalities which have wide variations. To address the problem, we propose a novel deep learning architecture, namely OctopusNet, to better leverage and fuse the information contained in multi-modalities. The proposed framework employs a separate encoder for each modality for feature extraction and exploits a hyper-fusion decoder to fuse the extracted features while avoiding feature explosion. We evaluate the proposed OctopusNet on two publicly available datasets, i.e. ISLES-2018 and MRBrainS-2013. The experimental results show that our framework outperforms the commonly-used feature fusion approaches and yields the state-of-the-art segmentation accuracy.
Tasks
Published	2019-06-05
URL	https://arxiv.org/abs/1906.02031v2
PDF	https://arxiv.org/pdf/1906.02031v2.pdf
PWC	https://paperswithcode.com/paper/octopusnet-a-deep-learning-segmentation
Repo
Framework
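
The two baseline fusion strategies named in this abstract can be contrasted in a few lines. This sketch covers only early and late fusion, not OctopusNet's hyper-fusion decoder; it assumes 2D images stored as nested lists, and it uses plain averaging as the late-fusion rule, which is one possible choice rather than anything the abstract specifies.

```python
def early_fusion(modalities):
    """Stack same-sized 2D images so each pixel holds one value per modality.

    The result is what a single network would receive as a multi-channel input.
    """
    rows, cols = len(modalities[0]), len(modalities[0][0])
    return [[[m[r][c] for m in modalities] for c in range(cols)]
            for r in range(rows)]


def late_fusion(prob_maps):
    """Combine per-modality segmentation outputs at the very end.

    Each map holds foreground probabilities predicted from one modality;
    averaging them pixel by pixel is an assumed fusion rule.
    """
    rows, cols = len(prob_maps[0]), len(prob_maps[0][0])
    n = len(prob_maps)
    return [[sum(p[r][c] for p in prob_maps) / n for c in range(cols)]
            for r in range(rows)]
```

Early fusion mixes modalities before any features are extracted, while late fusion never lets them interact until the outputs exist, which is the gap a per-modality encoder with a shared decoder aims to fill.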

Multiview Aggregation for Learning Category-Specific Shape Reconstruction


Title	Multiview Aggregation for Learning Category-Specific Shape Reconstruction
Authors	Srinath Sridhar, Davis Rempe, Julien Valentin, Sofien Bouaziz, Leonidas J. Guibas
Abstract	We investigate the problem of learning category-specific 3D shape reconstruction from a variable number of RGB views of previously unobserved object instances. Most approaches for multiview shape reconstruction operate on sparse shape representations, or assume a fixed number of views. We present a method that can estimate dense 3D shape, and aggregate shape across multiple and varying number of input views. Given a single input view of an object instance, we propose a representation that encodes the dense shape of the visible object surface as well as the surface behind line of sight occluded by the visible surface. When multiple input views are available, the shape representation is designed to be aggregated into a single 3D shape using an inexpensive union operation. We train a 2D CNN to learn to predict this representation from a variable number of views (1 or more). We further aggregate multiview information by using permutation equivariant layers that promote order-agnostic view information exchange at the feature level. Experiments show that our approach is able to produce dense 3D reconstructions of objects that improve in quality as more views are added.
Tasks
Published	2019-07-01
URL	https://arxiv.org/abs/1907.01085v2
PDF	https://arxiv.org/pdf/1907.01085v2.pdf
PWC	https://paperswithcode.com/paper/multiview-aggregation-for-learning-category
Repo
Framework
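
The "inexpensive union operation" mentioned in this abstract can be pictured with point sets. The sketch below assumes each view has already been converted into 3D points in a shared object-centered frame, and it snaps points to a voxel grid so that near-duplicates from different views merge; the representation, the grid size, and the function name are all assumptions made for illustration.

```python
def aggregate_views(per_view_points, voxel=0.01):
    """Union of per-view 3D point sets after snapping to a voxel grid.

    per_view_points: iterable of iterables of (x, y, z) tuples, one per view.
    Returns a set of integer voxel coordinates covering all views.
    """
    merged = set()
    for points in per_view_points:
        for x, y, z in points:
            merged.add((round(x / voxel), round(y / voxel), round(z / voxel)))
    return merged
```

Because a set union ignores ordering, the result does not depend on the order in which views arrive, and adding a view can only add coverage.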

ALERT: Accurate Anytime Learning for Energy and Timeliness


Title	ALERT: Accurate Anytime Learning for Energy and Timeliness
Authors	Chengcheng Wan, Muhammad Santriaji, Eri Rogers, Henry Hoffmann, Michael Maire, Shan Lu
Abstract	An increasing number of software applications incorporate runtime Deep Neural Network (DNN) inference for its great accuracy in many problem domains. While much prior work has separately tackled the problems of improving DNN-inference accuracy and improving DNN-inference efficiency, an important problem is under-explored: disciplined methods for dynamically managing application-specific latency, accuracy, and energy tradeoffs and constraints at run time. To address this need, we propose ALERT, a co-designed combination of runtime system and DNN nesting technique. The runtime takes latency, accuracy, and energy constraints, and uses dynamic feedback to predict the best DNN-model and system power-limit setting. The DNN nesting creates a type of flexible network that efficiently delivers a series of results with increasing accuracy as time goes on. These two parts complement each other well: the runtime is aware of the tradeoffs of different DNN settings, and the nested DNNs’ flexibility allows the runtime prediction to satisfy application requirements even in unpredictable, changing environments. On real systems for both image and speech, ALERT achieves close-to-optimal results. Compared with the optimal static DNN-model and power-limit setting, which is impractical to predict, ALERT achieves a harmonic mean of 33% energy savings while satisfying accuracy constraints, and reduces image-classification error rate by 58% and sentence-prediction perplexity by 52% while satisfying energy constraints.
Tasks	Image Classification
Published	2019-10-31
URL	https://arxiv.org/abs/1911.00119v1
PDF	https://arxiv.org/pdf/1911.00119v1.pdf
PWC	https://paperswithcode.com/paper/alert-accurate-anytime-learning-for-energy
Repo
Framework

Quadratic number of nodes is sufficient to learn a dataset via gradient descent


Title	Quadratic number of nodes is sufficient to learn a dataset via gradient descent
Authors	Biswarup Das, Eugene A. Golikov
Abstract	We prove that if an activation function satisfies some mild conditions and the number of neurons in a two-layered fully connected neural network with this activation function is beyond a certain threshold, then gradient descent on a quadratic loss function finds the optimal weights of the input layer for global minima in linear time. This threshold value is an improvement over previously obtained values. We hypothesise that this bound cannot be improved by the method we are using in this work.
Tasks
Published	2019-11-13
URL	https://arxiv.org/abs/1911.05402v1
PDF	https://arxiv.org/pdf/1911.05402v1.pdf
PWC	https://paperswithcode.com/paper/quadratic-number-of-nodes-is-sufficient-to
Repo
Framework

Unmasking Bias in News


Title	Unmasking Bias in News
Authors	Javier Sánchez-Junquera, Paolo Rosso, Manuel Montes-y-Gómez, Simone Paolo Ponzetto
Abstract	We present experiments on detecting hyperpartisanship in news using a ‘masking’ method that allows us to assess the role of style vs. content for the task at hand. Our results corroborate previous research on this task in that topic related features yield better results than stylistic ones. We additionally show that competitive results can be achieved by simply including higher-length n-grams, which suggests the need to develop more challenging datasets and tasks that address implicit and more subtle forms of bias.
Tasks
Published	2019-06-11
URL	https://arxiv.org/abs/1906.04836v1
PDF	https://arxiv.org/pdf/1906.04836v1.pdf
PWC	https://paperswithcode.com/paper/unmasking-bias-in-news
Repo
Framework
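
A masking experiment of the kind this abstract describes can be sketched as token replacement followed by n-gram extraction. The abstract does not say which words are masked or how; the version below assumes a caller-supplied set of words to keep and replaces everything else with a placeholder symbol, so it is an illustration of the general idea rather than the authors' procedure.

```python
def mask_tokens(tokens, keep, symbol="*"):
    """Replace every token not in `keep` with a mask symbol.

    Keeping only function words would hide content and leave style cues;
    keeping only topical words would do the reverse (both are assumptions).
    """
    return [t if t.lower() in keep else symbol for t in tokens]


def ngrams(tokens, n):
    """All contiguous n-grams of a token list, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
```

Features for a classifier could then be counts of `ngrams(mask_tokens(tokens, keep), n)` for several values of `n`, which makes it possible to compare what a model can learn from each kind of retained vocabulary.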

Potential Applications of Machine Learning at Multidisciplinary Medical Team Meetings


Title	Potential Applications of Machine Learning at Multidisciplinary Medical Team Meetings
Authors	Bridget Kane, Jing Su, Saturnino Luz
Abstract	While machine learning (ML) systems have produced great advances in several domains, their use in support of complex cooperative work remains a research challenge. A particularly challenging setting, and one that may benefit from ML support is the work of multidisciplinary medical teams (MDTs). This paper focuses on the activities performed during the multidisciplinary medical team meeting (MDTM), reviewing their main characteristics in light of a longitudinal analysis of several MDTs in a large teaching hospital over a period of ten years and of our development of ML methods to support MDTMs, and identifying opportunities and possible pitfalls for the use of ML to support MDTMs.
Tasks
Published	2019-11-03
URL	https://arxiv.org/abs/1911.00914v1
PDF	https://arxiv.org/pdf/1911.00914v1.pdf
PWC	https://paperswithcode.com/paper/potential-applications-of-machine-learning-at
Repo
Framework

Deep Learning Regression of VLSI Plasma Etch Metrology


Title	Deep Learning Regression of VLSI Plasma Etch Metrology
Authors	Jack Kenney, John Valcore, Scott Riggs, Edward Rietman
Abstract	In computer chip manufacturing, the study of etch patterns on silicon wafers, or metrology, occurs on the nano-scale and is therefore subject to large variation from small, yet significant, perturbations in the manufacturing environment. An enormous amount of information can be gathered from a single etch process, a sequence of actions taken to produce an etched wafer from a blank piece of silicon. Each final wafer, however, is costly to take measurements from, which limits the number of examples available to train a predictive model. Part of the significance of this work is the success we saw from the models despite the limited number of examples. In order to accommodate the high dimensional process signatures, we isolated important sensor variables and applied domain-specific summarization on the data using multiple feature engineering techniques. We used a neural network architecture consisting of the summarized inputs, a single hidden layer of 4032 units, and an output layer of one unit. Two different models were learned, corresponding to the metrology measurements in the dataset, Recess and Remaining Mask. The outputs are related abstractly and do not form a two dimensional space, thus two separate models were learned. Our results approach the error tolerance of the microscopic imaging system. The model can make predictions for a class of etch recipes that include the correct number of etch steps and plasma reactors (chambers containing an ionized gas that determine the manufacturing environment) with the appropriate sensors. Notably, this method is not restricted to some maximum process length due to the summarization techniques used. This allows the method to be adapted to new processes that satisfy the aforementioned requirements. In order to automate semiconductor manufacturing, models like these will be needed throughout the process to evaluate production quality.
Tasks	Feature Engineering
Published	2019-09-10
URL	https://arxiv.org/abs/1910.10067v1
PDF	https://arxiv.org/pdf/1910.10067v1.pdf
PWC	https://paperswithcode.com/paper/deep-learning-regression-of-vlsi-plasma-etch
Repo
Framework
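
The architecture stated in this abstract (summarized inputs, one hidden layer of 4032 units, one output unit) is small enough to write out directly. Only the layer sizes come from the abstract; the ReLU activation, the weight initialization, and the input dimension used below are assumptions, and no training step is shown.

```python
import random

HIDDEN_UNITS = 4032  # hidden-layer width stated in the abstract


def init_model(n_inputs, seed=0):
    """Randomly initialized weights for an n_inputs -> 4032 -> 1 regressor."""
    rng = random.Random(seed)
    scale = 0.01  # assumed initialization scale
    return {
        "w1": [[rng.gauss(0.0, scale) for _ in range(n_inputs)]
               for _ in range(HIDDEN_UNITS)],
        "b1": [0.0] * HIDDEN_UNITS,
        "w2": [rng.gauss(0.0, scale) for _ in range(HIDDEN_UNITS)],
        "b2": 0.0,
    }


def predict(model, x):
    """Forward pass: affine layer, ReLU (assumed), then a single linear output."""
    hidden = [
        max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(model["w1"], model["b1"])
    ]
    return sum(w * h for w, h in zip(model["w2"], hidden)) + model["b2"]


def parameter_count(n_inputs):
    """Weights and biases of both layers."""
    return n_inputs * HIDDEN_UNITS + HIDDEN_UNITS + HIDDEN_UNITS + 1
```

Following the abstract, one such model would be created per target, for example one for Recess and one for Remaining Mask, rather than a single model with two outputs.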

A Robust Data-Driven Approach for Dialogue State Tracking of Unseen Slot Values


Title	A Robust Data-Driven Approach for Dialogue State Tracking of Unseen Slot Values
Authors	Vevake Balaraman, Bernardo Magnini
Abstract	A Dialogue State Tracker is a key component in dialogue systems which estimates the beliefs of possible user goals at each dialogue turn. Deep learning approaches using recurrent neural networks have shown state-of-the-art performance for the task of dialogue state tracking. Generally, these approaches assume a predefined candidate list and struggle to predict any new dialogue state values that are not seen during training. This makes extending the candidate list for a slot without model retraining infeasible and also has limitations in modelling for low resource domains where training data for slot values are expensive. In this paper, we propose a novel dialogue state tracker based on a copying mechanism that can effectively track such unseen slot values without compromising performance on slot values seen during training. The proposed model is also flexible in extending the candidate list without requiring any retraining or change in the model. We evaluate the proposed model on various benchmark datasets (DSTC2, DSTC3 and WoZ2.0) and show that our approach outperforms other end-to-end data-driven approaches in tracking unseen slot values and also provides significant advantages in modelling for DST.
Tasks	Dialogue State Tracking
Published	2019-11-01
URL	https://arxiv.org/abs/1911.00269v1
PDF	https://arxiv.org/pdf/1911.00269v1.pdf
PWC	https://paperswithcode.com/paper/a-robust-data-driven-approach-for-dialogue
Repo
Framework