Paper Group ANR 984
Analysis of a Two-Layer Neural Network via Displacement Convexity. Look, Read and Enrich. Learning from Scientific Figures and their Captions. Fine-grained Optimization of Deep Neural Networks. New Radon Transform Based Texture Features of Handwritten Document. A Mean Field Theory of Quantized Deep Networks: The Quantization-Depth Trade-Off. AllenNLP Interpret: A Framework for Explaining Predictions of NLP Models. Simple yet Effective Bridge Reasoning for Open-Domain Multi-Hop Question Answering. PullNet: Open Domain Question Answering with Iterative Retrieval on Knowledge Bases and Text. KorQuAD1.0: Korean QA Dataset for Machine Reading Comprehension. Zero-shot Reading Comprehension by Cross-lingual Transfer Learning with Multi-lingual Language Representation Model. Span Selection Pre-training for Question Answering. FPSA: A Full System Stack Solution for Reconfigurable ReRAM-based NN Accelerator Architecture. Efficient Neural Architecture Transformation Search in Channel-Level for Object Detection. Analysis of Contraction Effort Level in EMG-Based Gesture Recognition Using Hyperdimensional Computing. Optimal Clustering from Noisy Binary Feedback.
Analysis of a Two-Layer Neural Network via Displacement Convexity
Title | Analysis of a Two-Layer Neural Network via Displacement Convexity |
Authors | Adel Javanmard, Marco Mondelli, Andrea Montanari |
Abstract | Fitting a function by using linear combinations of a large number $N$ of 'simple' components is one of the most fruitful ideas in statistical learning. This idea lies at the core of a variety of methods, from two-layer neural networks to kernel regression, to boosting. In general, the resulting risk minimization problem is non-convex and is solved by gradient descent or its variants. Unfortunately, little is known about global convergence properties of these approaches. Here we consider the problem of learning a concave function $f$ on a compact convex domain $\Omega\subseteq {\mathbb R}^d$, using linear combinations of 'bump-like' components (neurons). The parameters to be fitted are the centers of $N$ bumps, and the resulting empirical risk minimization problem is highly non-convex. We prove that, in the limit in which the number of neurons diverges, the evolution of gradient descent converges to a Wasserstein gradient flow in the space of probability distributions over $\Omega$. Further, when the bump width $\delta$ tends to $0$, this gradient flow has a limit which is a viscous porous medium equation. Remarkably, the cost function optimized by this gradient flow exhibits a special property known as displacement convexity, which implies exponential convergence rates for $N\to\infty$, $\delta\to 0$. Surprisingly, this asymptotic theory appears to capture well the behavior for moderate values of $\delta, N$. Explaining this phenomenon, and understanding the dependence on $\delta,N$ in a quantitative manner remains an outstanding challenge. |
Tasks | |
Published | 2019-01-05 |
URL | https://arxiv.org/abs/1901.01375v2 |
https://arxiv.org/pdf/1901.01375v2.pdf | |
PWC | https://paperswithcode.com/paper/analysis-of-a-two-layer-neural-network-via |
Repo | |
Framework | |
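To make the setup concrete, here is a brief LaTeX sketch of the kind of objective and limiting dynamics the abstract describes; the kernel $K_\delta$ and the potential $\Psi$ are assumed notation for illustration, not the paper's exact definitions.

```latex
% Sketch (assumed notation): fit f by N bumps centered at w_1, ..., w_N.
\hat{f}_N(x) = \frac{1}{N} \sum_{j=1}^{N} K_\delta(x - w_j), \qquad
R_N(w_1, \dots, w_N) = \mathbb{E}\left[ \big( f(X) - \hat{f}_N(X) \big)^2 \right].
% As N -> infinity, gradient descent on R_N converges to a Wasserstein
% gradient flow of a risk functional over probability measures rho on Omega:
\partial_t \rho_t = \nabla \cdot \big( \rho_t \, \nabla \Psi(x; \rho_t) \big).
% As delta -> 0, this flow limits to a viscous porous medium equation whose
% free energy is displacement convex, which yields exponential convergence.
```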
Look, Read and Enrich. Learning from Scientific Figures and their Captions
Title | Look, Read and Enrich. Learning from Scientific Figures and their Captions |
Authors | Jose Manuel Gomez-Perez, Raul Ortega |
Abstract | Compared to natural images, understanding scientific figures is particularly hard for machines. However, there is a valuable source of information in scientific literature that until now has remained untapped: the correspondence between a figure and its caption. In this paper we investigate what can be learnt by looking at a large number of figures and reading their captions, and introduce a figure-caption correspondence learning task that makes use of our observations. Training visual and language networks without supervision other than pairs of unconstrained figures and captions is shown to successfully solve this task. We also show that transferring lexical and semantic knowledge from a knowledge graph significantly enriches the resulting features. Finally, we demonstrate the positive impact of such features in other tasks involving scientific text and figures, like multi-modal classification and machine comprehension for question answering, outperforming supervised baselines and ad-hoc approaches. |
Tasks | Question Answering, Reading Comprehension |
Published | 2019-09-19 |
URL | https://arxiv.org/abs/1909.09070v1 |
https://arxiv.org/pdf/1909.09070v1.pdf | |
PWC | https://paperswithcode.com/paper/look-read-and-enrich-learning-from-scientific |
Repo | |
Framework | |
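As a rough illustration of the figure-caption correspondence task, here is a minimal two-tower sketch in PyTorch; the feature dimensions, projection heads, and binary match/mismatch loss are assumptions for illustration, not the paper's actual architecture (which additionally transfers knowledge-graph features).

```python
# Hypothetical two-tower correspondence sketch (sizes and loss assumed).
import torch
import torch.nn as nn

class CorrespondenceModel(nn.Module):
    def __init__(self, vision_dim=512, text_dim=300, joint_dim=128):
        super().__init__()
        self.vision_proj = nn.Sequential(nn.Linear(vision_dim, joint_dim), nn.ReLU())
        self.text_proj = nn.Sequential(nn.Linear(text_dim, joint_dim), nn.ReLU())
        self.classifier = nn.Linear(2 * joint_dim, 2)  # corresponds / does not

    def forward(self, fig_feats, cap_feats):
        v = self.vision_proj(fig_feats)
        t = self.text_proj(cap_feats)
        return self.classifier(torch.cat([v, t], dim=-1))

model = CorrespondenceModel()
figs, caps = torch.randn(8, 512), torch.randn(8, 300)
labels = torch.randint(0, 2, (8,))  # 1 = true figure-caption pair, 0 = shuffled
loss = nn.CrossEntropyLoss()(model(figs, caps), labels)
loss.backward()
```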
Fine-grained Optimization of Deep Neural Networks
Title | Fine-grained Optimization of Deep Neural Networks |
Authors | Mete Ozay |
Abstract | In recent studies, several asymptotic upper bounds on the generalization errors of deep neural networks (DNNs) have been theoretically derived. These bounds are functions of several norms of the weights of the DNNs, such as the Frobenius and spectral norms, and they are computed for weights grouped according to either the input or the output channels of the DNNs. In this work, we conjecture that if we can impose multiple constraints on the weights of DNNs to upper bound the norms of the weights, and train the DNNs with these weights, then we can attain empirical generalization errors closer to the derived theoretical bounds, and improve the accuracy of the DNNs. To this end, we pose two problems. First, we aim to obtain weights whose different norms are all upper bounded by a constant, e.g. 1.0. To achieve these bounds, we propose a two-stage renormalization procedure: (i) normalization of weights according to the different norms used in the bounds, and (ii) reparameterization of the normalized weights to set a constant and finite upper bound on their norms. In the second problem, we consider training DNNs with these renormalized weights. To this end, we first propose a strategy to construct joint spaces (manifolds) of weights according to different constraints in DNNs. Next, we propose a fine-grained SGD algorithm (FG-SGD) for optimization on the weight manifolds to train DNNs with assurance of convergence to minima. Experimental results show that the image classification accuracy of baseline DNNs can be boosted using FG-SGD on collections of manifolds identified by multiple constraints. |
Tasks | Image Classification |
Published | 2019-05-22 |
URL | https://arxiv.org/abs/1905.09054v1 |
https://arxiv.org/pdf/1905.09054v1.pdf | |
PWC | https://paperswithcode.com/paper/fine-grained-optimization-of-deep-neural |
Repo | |
Framework | |
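A minimal sketch of the stage-(i)/(ii) renormalization idea, assuming Frobenius and spectral norms as the constraint set; the paper's actual procedure operates per weight group and couples this with optimization on the resulting weight manifolds (FG-SGD).

```python
# Hedged sketch: rescale a weight matrix so several norms are <= a constant.
import torch

def renormalize(W, bound=1.0):
    """Scale W so that both its Frobenius and spectral norms are <= bound."""
    fro = torch.linalg.norm(W)                 # Frobenius norm
    spec = torch.linalg.matrix_norm(W, ord=2)  # spectral norm (top singular value)
    # Dividing by the larger norm bounds both, since scaling is linear in W
    # (and the spectral norm never exceeds the Frobenius norm).
    scale = max(fro.item(), spec.item()) / bound
    return W / scale if scale > 1.0 else W

W = torch.randn(64, 128)
W_hat = renormalize(W)
print(torch.linalg.norm(W_hat) <= 1.0 + 1e-6,
      torch.linalg.matrix_norm(W_hat, ord=2) <= 1.0 + 1e-6)
```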
New Radon Transform Based Texture Features of Handwritten Document
Title | New Radon Transform Based Texture Features of Handwritten Document |
Authors | Rustam Latypov, Evgeni Stolov |
Abstract | In this paper, we present some new features that describe a handwritten document as a texture. These features are based on the Radon transform. All values can be obtained easily and are suitable for coarse classification of documents. |
Tasks | |
Published | 2019-01-10 |
URL | http://arxiv.org/abs/1901.03068v1 |
http://arxiv.org/pdf/1901.03068v1.pdf | |
PWC | https://paperswithcode.com/paper/new-radon-transform-based-texture-features-of |
Repo | |
Framework | |
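A hedged sketch of Radon-transform texture features; the per-angle statistics below (energy and variance of each projection profile) are illustrative assumptions, not necessarily the exact features proposed in the paper.

```python
# Sketch: project a binarized document image at several angles and
# summarize each projection profile with simple texture statistics.
import numpy as np
from skimage.transform import radon

rng = np.random.default_rng(0)
image = (rng.random((256, 256)) > 0.7).astype(float)  # stand-in for a handwriting scan

angles = np.arange(0.0, 180.0, 5.0)
sinogram = radon(image, theta=angles, circle=False)   # one projection per angle

# Per-angle descriptors (assumed): energy and variance of each projection.
features = np.concatenate([(sinogram ** 2).sum(axis=0), sinogram.var(axis=0)])
print(features.shape)  # (2 * len(angles),) = (72,)
```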
A Mean Field Theory of Quantized Deep Networks: The Quantization-Depth Trade-Off
Title | A Mean Field Theory of Quantized Deep Networks: The Quantization-Depth Trade-Off |
Authors | Yaniv Blumenfeld, Dar Gilboa, Daniel Soudry |
Abstract | Reducing the precision of weights and activation functions in neural network training, with minimal impact on performance, is essential for the deployment of these models in resource-constrained environments. We apply mean-field techniques to networks with quantized activations in order to evaluate the degree to which quantization degrades signal propagation at initialization. We derive initialization schemes which maximize signal propagation in such networks and suggest why this is helpful for generalization. Building on these results, we obtain a closed-form implicit equation for $L_{\max}$, the maximal trainable depth (and hence model capacity), given $N$, the number of quantization levels in the activation function. Solving this equation numerically, we obtain asymptotically: $L_{\max}\propto N^{1.82}$. |
Tasks | Quantization |
Published | 2019-06-03 |
URL | https://arxiv.org/abs/1906.00771v2 |
https://arxiv.org/pdf/1906.00771v2.pdf | |
PWC | https://paperswithcode.com/paper/190600771 |
Repo | |
Framework | |
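For intuition, a small sketch of a uniformly quantized activation and of what the $L_{\max}\propto N^{1.82}$ scaling implies; the clipped-ramp activation shape is an assumption for illustration.

```python
# Sketch of a quantized activation: a bounded ramp discretized to N levels.
import numpy as np

def quantized_relu(x, n_levels, x_max=1.0):
    """Clip to [0, x_max], then round onto n_levels uniform steps."""
    step = x_max / (n_levels - 1)
    return np.clip(np.round(np.clip(x, 0.0, x_max) / step) * step, 0.0, x_max)

x = np.linspace(-2, 2, 9)
print(quantized_relu(x, n_levels=4))

# The paper's asymptotic result ties quantization to trainable depth:
# L_max grows roughly like N**1.82, so doubling the number of levels buys
# about a 2**1.82 ~ 3.5x increase in maximal trainable depth.
for N in (2, 4, 8, 16):
    print(N, round(N ** 1.82, 1))
```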
AllenNLP Interpret: A Framework for Explaining Predictions of NLP Models
Title | AllenNLP Interpret: A Framework for Explaining Predictions of NLP Models |
Authors | Eric Wallace, Jens Tuyls, Junlin Wang, Sanjay Subramanian, Matt Gardner, Sameer Singh |
Abstract | Neural NLP models are increasingly accurate but are imperfect and opaque—they break in counterintuitive ways and leave end users puzzled at their behavior. Model interpretation methods ameliorate this opacity by providing explanations for specific model predictions. Unfortunately, existing interpretation codebases make it difficult to apply these methods to new models and tasks, which hinders adoption for practitioners and burdens interpretability researchers. We introduce AllenNLP Interpret, a flexible framework for interpreting NLP models. The toolkit provides interpretation primitives (e.g., input gradients) for any AllenNLP model and task, a suite of built-in interpretation methods, and a library of front-end visualization components. We demonstrate the toolkit's flexibility and utility by implementing live demos for five interpretation methods (e.g., saliency maps and adversarial attacks) on a variety of models and tasks (e.g., masked language modeling using BERT and reading comprehension using BiDAF). These demos, alongside our code and tutorials, are available at https://allennlp.org/interpret. |
Tasks | Language Modelling, Reading Comprehension |
Published | 2019-09-19 |
URL | https://arxiv.org/abs/1909.09251v1 |
https://arxiv.org/pdf/1909.09251v1.pdf | |
PWC | https://paperswithcode.com/paper/allennlp-interpret-a-framework-for-explaining |
Repo | |
Framework | |
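Below is a generic input-gradient saliency sketch in PyTorch. It illustrates the primitive the toolkit exposes, but it is not the AllenNLP Interpret API; the toy model and mean pooling are assumptions.

```python
# Gradient-based saliency: norm of the loss gradient per token embedding.
import torch
import torch.nn as nn

torch.manual_seed(0)
vocab, dim = 100, 16
embed = nn.Embedding(vocab, dim)
model = nn.Sequential(nn.Linear(dim, 32), nn.ReLU(), nn.Linear(32, 2))

tokens = torch.tensor([[3, 17, 42, 9]])
emb = embed(tokens)                    # (1, seq_len, dim)
emb.retain_grad()                      # keep the gradient on a non-leaf tensor
logits = model(emb).mean(dim=1)        # toy pooling over the sequence -> (1, 2)
logits[0, logits.argmax()].backward()  # gradient of the top predicted class

saliency = emb.grad.norm(dim=-1).squeeze(0)  # one importance score per token
print(saliency)
```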
Simple yet Effective Bridge Reasoning for Open-Domain Multi-Hop Question Answering
Title | Simple yet Effective Bridge Reasoning for Open-Domain Multi-Hop Question Answering |
Authors | Wenhan Xiong, Mo Yu, Xiaoxiao Guo, Hong Wang, Shiyu Chang, Murray Campbell, William Yang Wang |
Abstract | A key challenge of multi-hop question answering (QA) in the open-domain setting is to accurately retrieve the supporting passages from a large corpus. Existing work on open-domain QA typically relies on off-the-shelf information retrieval (IR) techniques to retrieve **answer passages**, i.e., the passages containing the ground-truth answers. However, IR-based approaches are insufficient for multi-hop questions, as the topic of the second or further hops is not explicitly covered by the question. To resolve this issue, we introduce a new sub-problem of open-domain multi-hop QA, which aims to recognize the bridge (*i.e.*, the anchor that links to the answer passage) from the context of a set of start passages with a reading comprehension model. This model, the **bridge reasoner**, is trained with a weakly supervised signal and produces the candidate answer passages for the **passage reader** to extract the answer. On the full-wiki HotpotQA benchmark, we significantly improve the baseline method by 14 F1 points. Without using any memory-inefficient contextual embeddings, our result is also competitive with the state-of-the-art that applies BERT in multiple modules. |
Tasks | Information Retrieval, Question Answering, Reading Comprehension |
Published | 2019-09-17 |
URL | https://arxiv.org/abs/1909.07597v2 |
https://arxiv.org/pdf/1909.07597v2.pdf | |
PWC | https://paperswithcode.com/paper/simple-yet-effective-bridge-reasoning-for |
Repo | |
Framework | |
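A toy, word-overlap sketch of the retrieve, bridge, retrieve-again pipeline described above; every scorer below is a stand-in for the paper's trained reading-comprehension models.

```python
# Toy bridge-reasoning pipeline (all scorers are word-overlap stand-ins).
def overlap(a, b):
    return len(set(a.lower().split()) & set(b.lower().split()))

def retrieve(query, corpus, k=1, exclude=()):
    ranked = sorted((p for p in corpus if p not in exclude),
                    key=lambda p: -overlap(query, p))
    return ranked[:k]

def bridge_reasoner(question, start_passages):
    # Candidate bridge anchors: capitalized tokens from the start passages.
    anchors = {w.strip(".,") for p in start_passages
               for w in p.split() if w[:1].isupper()}
    # Prefer anchors not already in the question: they point to the next hop.
    return sorted(anchors, key=lambda a: overlap(a, question))[:2]

def answer(question, corpus):
    start = retrieve(question, corpus)                    # hop 1: IR on the question
    bridges = bridge_reasoner(question, start)            # recognize the bridge
    hop2 = [p for b in bridges for p in retrieve(b, corpus, exclude=start)]
    return (hop2 or start)[0]                             # stand-in for the reader

corpus = ["Alan Turing worked at Bletchley Park.",
          "Bletchley Park is in Milton Keynes."]
print(answer("Where is the place Alan Turing worked", corpus))
```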
PullNet: Open Domain Question Answering with Iterative Retrieval on Knowledge Bases and Text
Title | PullNet: Open Domain Question Answering with Iterative Retrieval on Knowledge Bases and Text |
Authors | Haitian Sun, Tania Bedrax-Weiss, William W. Cohen |
Abstract | We consider open-domain question answering (QA) where answers are drawn from either a corpus, a knowledge base (KB), or a combination of both of these. We focus on a setting in which a corpus is supplemented with a large but incomplete KB, and on questions that require non-trivial (e.g., "multi-hop") reasoning. We describe PullNet, an integrated framework for (1) learning what to retrieve (from the KB and/or corpus) and (2) reasoning with this heterogeneous information to find the best answer. PullNet uses an iterative process to construct a question-specific subgraph that contains information relevant to the question. In each iteration, a graph convolutional network (graph CNN) is used to identify subgraph nodes that should be expanded using retrieval (or "pull") operations on the corpus and/or KB. After the subgraph is complete, a similar graph CNN is used to extract the answer from the subgraph. This retrieve-and-reason process allows us to answer multi-hop questions using large KBs and corpora. PullNet is weakly supervised, requiring question-answer pairs but not gold inference paths. Experimentally, PullNet improves over the prior state of the art, and in the setting where a corpus is used with an incomplete KB these improvements are often dramatic. PullNet is also often superior to prior systems in a KB-only setting or a text-only setting. |
Tasks | Open-Domain Question Answering, Question Answering |
Published | 2019-04-21 |
URL | http://arxiv.org/abs/1904.09537v1 |
http://arxiv.org/pdf/1904.09537v1.pdf | |
PWC | https://paperswithcode.com/paper/pullnet-open-domain-question-answering-with |
Repo | |
Framework | |
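A toy sketch of the iterative "pull" loop; the graph CNN that scores which nodes to expand is omitted (everything is expanded), and the KB is a small dict stand-in.

```python
# Toy PullNet-style iterative subgraph construction over a dict "KB".
def seed_entities(question, entities):
    return {e for e in entities if e.lower() in question.lower()}

def pullnet(question, kb, hops=2):
    subgraph = seed_entities(question, kb.keys())
    for _ in range(hops):
        # The paper scores nodes with a graph CNN and expands only the best;
        # this toy version pulls every neighbor of every current node.
        for node in list(subgraph):
            subgraph |= set(kb.get(node, []))  # "pull" facts from the KB
    return subgraph  # a second graph CNN would read the answer off this subgraph

kb = {"Alan Turing": ["Bletchley Park"],
      "Bletchley Park": ["Milton Keynes"]}
print(pullnet("Where is the place where Alan Turing worked?", kb))
```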
KorQuAD1.0: Korean QA Dataset for Machine Reading Comprehension
Title | KorQuAD1.0: Korean QA Dataset for Machine Reading Comprehension |
Authors | Seungyoung Lim, Myungji Kim, Jooyoul Lee |
Abstract | Machine Reading Comprehension (MRC) is a task that requires a machine to understand natural language and answer questions by reading a document. It is the core of automatic response technology such as chatbots and automated customer support systems. We present the Korean Question Answering Dataset (KorQuAD), a large-scale Korean dataset for the extractive machine reading comprehension task. It consists of 70,000+ human-generated question-answer pairs on Korean Wikipedia articles. We release KorQuAD1.0 and launch a challenge at https://KorQuAD.github.io to encourage the development of multilingual natural language processing research. |
Tasks | Machine Reading Comprehension, Question Answering, Reading Comprehension |
Published | 2019-09-16 |
URL | https://arxiv.org/abs/1909.07005v2 |
https://arxiv.org/pdf/1909.07005v2.pdf | |
PWC | https://paperswithcode.com/paper/korquad10-korean-qa-dataset-for-machine |
Repo | |
Framework | |
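KorQuAD 1.0 is distributed in a SQuAD-style JSON layout (stated here as an assumption worth verifying against the release); a minimal reader might look like this:

```python
# Minimal reader for a SQuAD-style extractive QA file (assumed layout).
import json

def iter_qa_pairs(path):
    with open(path, encoding="utf-8") as f:
        data = json.load(f)["data"]
    for article in data:
        for paragraph in article["paragraphs"]:
            context = paragraph["context"]
            for qa in paragraph["qas"]:
                # Answers are extractive: text plus character offset in context.
                for ans in qa["answers"]:
                    yield qa["question"], context, ans["text"], ans["answer_start"]
```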
Zero-shot Reading Comprehension by Cross-lingual Transfer Learning with Multi-lingual Language Representation Model
Title | Zero-shot Reading Comprehension by Cross-lingual Transfer Learning with Multi-lingual Language Representation Model |
Authors | Tsung-yuan Hsu, Chi-liang Liu, Hung-yi Lee |
Abstract | Because it is not feasible to collect training data for every language, there is growing interest in cross-lingual transfer learning. In this paper, we systematically explore zero-shot cross-lingual transfer learning on reading comprehension tasks with a language representation model pre-trained on a multi-lingual corpus. The experimental results show that with pre-trained language representations, zero-shot learning is feasible, and that translating the source data into the target language is not necessary and can even degrade performance. We further explore what the model learns in the zero-shot setting. |
Tasks | Cross-Lingual Transfer, Reading Comprehension, Transfer Learning, Zero-Shot Learning |
Published | 2019-09-15 |
URL | https://arxiv.org/abs/1909.09587v1 |
https://arxiv.org/pdf/1909.09587v1.pdf | |
PWC | https://paperswithcode.com/paper/zero-shot-reading-comprehension-by-cross |
Repo | |
Framework | |
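A hedged sketch of the zero-shot recipe, using Hugging Face transformers as a stand-in toolchain: fine-tune a multilingual encoder on source-language (e.g., English) QA data, then evaluate directly in the target language. The checkpoint name is an assumption, and the fine-tuning step is elided.

```python
# Zero-shot cross-lingual QA sketch (checkpoint and toolchain assumed).
from transformers import AutoModelForQuestionAnswering, AutoTokenizer, pipeline

name = "bert-base-multilingual-cased"  # pre-trained on a multilingual corpus
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForQuestionAnswering.from_pretrained(name)
# ... fine-tune `model` on English SQuAD here; no target-language data is used ...

qa = pipeline("question-answering", model=model, tokenizer=tok)
# Zero-shot: ask in the target language without any target-language training.
print(qa(question="BERT는 무엇의 약자인가?",
         context="BERT는 Bidirectional Encoder Representations "
                 "from Transformers의 약자이다."))
```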
Span Selection Pre-training for Question Answering
Title | Span Selection Pre-training for Question Answering |
Authors | Michael Glass, Alfio Gliozzo, Rishav Chakravarti, Anthony Ferritto, Lin Pan, G P Shrivatsa Bhargav, Dinesh Garg, Avirup Sil |
Abstract | BERT (Bidirectional Encoder Representations from Transformers) and related pre-trained Transformers have provided large gains across many language understanding tasks, achieving a new state-of-the-art (SOTA). BERT is pre-trained on two auxiliary tasks: Masked Language Model and Next Sentence Prediction. In this paper we introduce a new pre-training task inspired by reading comprehension and an effort to avoid encoding general knowledge in the transformer network itself. We find significant and consistent improvements over both BERT-BASE and BERT-LARGE on multiple reading comprehension (MRC) and paraphrasing datasets. Specifically, our proposed model obtains SOTA results on Natural Questions, a new benchmark MRC dataset, outperforming BERT-LARGE by 3 F1 points on short answer prediction. We also establish a new SOTA on HotpotQA, improving answer prediction F1 by 4 points and supporting fact prediction F1 by 1 point. Moreover, we show that our pre-training approach is particularly effective when training data is limited, substantially improving the learning curve. |
Tasks | Language Modelling, Question Answering, Reading Comprehension |
Published | 2019-09-09 |
URL | https://arxiv.org/abs/1909.04120v1 |
https://arxiv.org/pdf/1909.04120v1.pdf | |
PWC | https://paperswithcode.com/paper/span-selection-pre-training-for-question |
Repo | |
Framework | |
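A toy construction of a span-selection pre-training instance: blank out a span in one passage and pair it with a second passage containing the same span, so the model must read the passage rather than rely on memorized knowledge. The pairing and span-choice details are assumptions for illustration.

```python
# Toy span-selection pre-training instance (span and pairing choices assumed).
def make_instance(query_passage, evidence_passage, span):
    assert span in query_passage and span in evidence_passage
    query = query_passage.replace(span, "[BLANK]", 1)  # cloze-style query
    start = evidence_passage.index(span)               # answer offsets in passage
    return {"query": query, "passage": evidence_passage,
            "start": start, "end": start + len(span)}

inst = make_instance(
    "Marie Curie won the Nobel Prize in Physics in 1903.",
    "In 1903 the Nobel Prize in Physics went to the Curies and Becquerel.",
    "Nobel Prize in Physics")
print(inst["query"])
print(inst["passage"][inst["start"]:inst["end"]])
```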
FPSA: A Full System Stack Solution for Reconfigurable ReRAM-based NN Accelerator Architecture
Title | FPSA: A Full System Stack Solution for Reconfigurable ReRAM-based NN Accelerator Architecture |
Authors | Yu Ji, Youyang Zhang, Xinfeng Xie, Shuangchen Li, Peiqi Wang, Xing Hu, Youhui Zhang, Yuan Xie |
Abstract | Neural Network (NN) accelerators with emerging ReRAM (resistive random access memory) technologies have been investigated as one of the promising solutions to address the *memory wall* challenge, due to the unique capability of *processing-in-memory* within ReRAM-crossbar-based processing elements (PEs). However, the high-efficiency and high-density advantages of ReRAM have not been fully utilized due to the huge communication demands among PEs and the overhead of peripheral circuits. In this paper, we propose a full system stack solution, composed of a reconfigurable architecture design, Field Programmable Synapse Array (FPSA), and its software system, which includes a neural synthesizer, a temporal-to-spatial mapper, and placement & routing. We heavily leverage the software system to make the hardware design compact and efficient. To satisfy the high-performance communication demand, we optimize it with a reconfigurable routing architecture and the placement & routing tool. To improve the computational density, we greatly simplify the PE circuit with a spiking scheme and then adopt the neural synthesizer to enable the high-density computation resources to support different kinds of NN operations. In addition, we provide spiking memory blocks (SMBs) and configurable logic blocks (CLBs) in hardware and leverage the temporal-to-spatial mapper to balance the storage and computation requirements of NNs. Owing to the end-to-end software system, we can efficiently deploy existing deep neural networks to FPSA. Evaluations show that, compared to PRIME, one of the state-of-the-art ReRAM-based NN accelerators, the computational density of FPSA improves by 31x; for representative NNs, its inference performance can achieve up to a 1000x speedup. |
Tasks | |
Published | 2019-01-28 |
URL | http://arxiv.org/abs/1901.09904v1 |
http://arxiv.org/pdf/1901.09904v1.pdf | |
PWC | https://paperswithcode.com/paper/fpsa-a-full-system-stack-solution-for |
Repo | |
Framework | |
Efficient Neural Architecture Transformation Search in Channel-Level for Object Detection
Title | Efficient Neural Architecture Transformation Search in Channel-Level for Object Detection |
Authors | Junran Peng, Ming Sun, Zhaoxiang Zhang, Tieniu Tan, Junjie Yan |
Abstract | Recently, Neural Architecture Search has achieved great success in large-scale image classification. In contrast, there have been limited works focusing on architecture search for object detection, mainly because costly ImageNet pre-training is always required for detectors. Training from scratch, as a substitute, demands more epochs to converge and brings no computation saving. To overcome this obstacle, we introduce a practical neural architecture transformation search (NATS) algorithm for object detection in this paper. Instead of searching for and constructing an entire network, NATS explores the architecture space on the basis of an existing network, reusing its weights. We propose a novel neural architecture search strategy at the channel level instead of the path level, and devise a search space specially targeted at object detection. With the combination of these two designs, an architecture transformation scheme can be discovered that adapts a network designed for image classification to the task of object detection. Since our method is gradient-based and only searches for a transformation scheme, the weights of models pretrained on ImageNet can be utilized in both the searching and retraining stages, which makes the whole process very efficient. The transformed network requires no extra parameters or FLOPs, and is friendly to hardware optimization, making it practical for real-time applications. In experiments, we demonstrate the effectiveness of NATS on networks like ResNet and ResNeXt. Our transformed networks, combined with various detection frameworks, achieve significant improvements on the COCO dataset while remaining fast. |
Tasks | Image Classification, Neural Architecture Search, Object Detection |
Published | 2019-09-05 |
URL | https://arxiv.org/abs/1909.02293v1 |
https://arxiv.org/pdf/1909.02293v1.pdf | |
PWC | https://paperswithcode.com/paper/efficient-neural-architecture-transformation |
Repo | |
Framework | |
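A hedged sketch of a channel-level differentiable search operator, DARTS-style: candidate convolutions (here, different dilations) compete per output channel through softmax architecture weights. This illustrates the idea of channel-level rather than path-level search; it is not the exact NATS operator.

```python
# Channel-level differentiable architecture mixing (illustrative assumption).
import torch
import torch.nn as nn

class ChannelSearchConv(nn.Module):
    def __init__(self, c_in, c_out, dilations=(1, 2, 4)):
        super().__init__()
        self.ops = nn.ModuleList(
            nn.Conv2d(c_in, c_out, 3, padding=d, dilation=d) for d in dilations)
        # One architecture logit per (candidate op, output channel).
        self.alpha = nn.Parameter(torch.zeros(len(dilations), c_out))

    def forward(self, x):
        w = torch.softmax(self.alpha, dim=0)             # per-channel op mixture
        outs = torch.stack([op(x) for op in self.ops])   # (ops, B, C, H, W)
        return (w[:, None, :, None, None] * outs).sum(dim=0)

layer = ChannelSearchConv(16, 32)
print(layer(torch.randn(2, 16, 8, 8)).shape)  # torch.Size([2, 32, 8, 8])
```

After search, each output channel would be assigned its argmax operation, so the transformed network needs no extra parameters or FLOPs at inference.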
Analysis of Contraction Effort Level in EMG-Based Gesture Recognition Using Hyperdimensional Computing
Title | Analysis of Contraction Effort Level in EMG-Based Gesture Recognition Using Hyperdimensional Computing |
Authors | Ali Moin, Andy Zhou, Simone Benatti, Abbas Rahimi, Luca Benini, Jan M. Rabaey |
Abstract | Varying muscle contraction levels are a major challenge in electromyography-based gesture recognition. Some use cases require the classifier to be robust to variations in contraction force, while others demand distinguishing between different effort levels of performing the same gesture. We use the brain-inspired hyperdimensional computing paradigm to build classification models that are both robust to these variations and able to recognize multiple contraction levels. Experimental results on 5 subjects performing 9 gestures with 3 effort levels show an accuracy drop of up to 39.17% when training and testing across different effort levels, with up to 30.35% recovered after applying our algorithm. |
Tasks | Gesture Recognition, Hand Gesture Recognition, Hand-Gesture Recognition |
Published | 2019-01-02 |
URL | https://arxiv.org/abs/1901.00234v3 |
https://arxiv.org/pdf/1901.00234v3.pdf | |
PWC | https://paperswithcode.com/paper/adaptive-emg-based-hand-gesture-recognition |
Repo | |
Framework | |
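A minimal hyperdimensional-computing classifier sketch: random bipolar hypervectors encode channels and quantized amplitudes, training bundles encoded samples into class prototypes, and inference picks the most similar prototype. The encoding is a much-simplified assumption relative to the paper's EMG pipeline.

```python
# Toy HD-computing classifier: bind channel and level hypervectors, bundle.
import numpy as np

rng = np.random.default_rng(0)
D, channels, levels = 10_000, 4, 8
item = rng.choice([-1, 1], size=(channels, D))   # one hypervector per EMG channel
level = rng.choice([-1, 1], size=(levels, D))    # one hypervector per amplitude level

def encode(sample):
    """Bind each channel HV with its quantized-amplitude HV, then bundle."""
    lv = np.clip((sample * levels).astype(int), 0, levels - 1)
    return np.sign(sum(item[c] * level[lv[c]] for c in range(channels)))

def train(samples, labels, n_classes=2):
    protos = np.zeros((n_classes, D))
    for s, y in zip(samples, labels):
        protos[y] += encode(s)                   # bundle examples per class
    return np.sign(protos)

def classify(protos, sample):
    return int(np.argmax(protos @ encode(sample)))  # nearest prototype

X = rng.random((20, channels))
y = (X.mean(axis=1) > 0.5).astype(int)           # synthetic two-class labels
protos = train(X, y)
print(classify(protos, X[0]), y[0])
```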
Optimal Clustering from Noisy Binary Feedback
Title | Optimal Clustering from Noisy Binary Feedback |
Authors | Kaito Ariu, Jungseul Ok, Alexandre Proutiere, Se-Young Yun |
Abstract | We study the problem of recovering clusters from binary user feedback. Items are grouped into initially unknown non-overlapping clusters. To recover these clusters, the learner sequentially presents to users a finite list of items together with a question with a binary answer selected from a fixed finite set. For each of these items, the user provides a random answer whose expectation is determined by the item cluster, the question, and an item-specific parameter characterizing the hardness of classifying the item. The objective is to devise an algorithm with a minimal cluster recovery error rate. We derive problem-specific information-theoretic lower bounds on the error rate satisfied by any algorithm, for both uniform and adaptive (list, question) selection strategies. For uniform selection, we present a simple algorithm built upon K-means whose performance almost matches the fundamental limits. For adaptive selection, we develop an adaptive algorithm that is inspired by the derivation of the information-theoretic error lower bounds and in turn allocates the budget efficiently. The algorithm learns to select items that are hard to cluster and relevant questions more often. We compare numerically the performance of our algorithms with and without the adaptive selection strategy, and illustrate the gain achieved by being adaptive. Our inference problems are motivated by the problem of solving large-scale labeling tasks with minimal effort from users. For example, in some recent CAPTCHA systems, users' clicks (binary answers) can be used to efficiently label images by optimally finding the best questions to present. |
Tasks | |
Published | 2019-10-14 |
URL | https://arxiv.org/abs/1910.06002v1 |
https://arxiv.org/pdf/1910.06002v1.pdf | |
PWC | https://paperswithcode.com/paper/optimal-clustering-from-noisy-binary-feedback |
Repo | |
Framework | |
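A toy sketch of the uniform-selection baseline: ask every (item, question) pair repeatedly, form empirical answer means, and cluster those statistics with K-means. The noise model and problem sizes are assumptions; the paper's algorithm refines plain K-means to approach the information-theoretic limits.

```python
# Toy uniform-selection clustering from noisy binary answers.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
n_items, n_questions, k = 120, 6, 3
true_cluster = rng.integers(k, size=n_items)
p = rng.random((k, n_questions))  # answer probability per (cluster, question)

# Each (item, question) pair is asked several times; users answer noisily.
asks = 30
answers = rng.binomial(asks, p[true_cluster]) / asks  # empirical answer means

est = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(answers)
print(answers.shape)              # (120, 6): one statistic vector per item
print(est[:10], true_cluster[:10])  # labels agree up to a permutation
```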