Paper Group ANR 656
Embodied Multimodal Multitask Learning. Improving Neural Relation Extraction with Positive and Unlabeled Learning. Amharic-Arabic Neural Machine Translation. Sparsely Grouped Input Variables for Neural Networks. Provably Efficient $Q$-learning with Function Approximation via Distribution Shift Error Checking Oracle. Deep Representation Learning for …
Embodied Multimodal Multitask Learning
Title | Embodied Multimodal Multitask Learning |
Authors | Devendra Singh Chaplot, Lisa Lee, Ruslan Salakhutdinov, Devi Parikh, Dhruv Batra |
Abstract | Recent efforts on training visual navigation agents conditioned on language using deep reinforcement learning have been successful in learning policies for different multimodal tasks, such as semantic goal navigation and embodied question answering. In this paper, we propose a multitask model capable of jointly learning these multimodal tasks, and transferring knowledge of words and their grounding in visual objects across the tasks. The proposed model uses a novel Dual-Attention unit to disentangle the knowledge of words in the textual representations and visual concepts in the visual representations, and align them with each other. This disentangled task-invariant alignment of representations facilitates grounding and knowledge transfer across both tasks. We show that the proposed model outperforms a range of baselines on both tasks in simulated 3D environments. We also show that this disentanglement of representations makes our model modular, interpretable, and allows for transfer to instructions containing new words by leveraging object detectors. |
Tasks | Embodied Question Answering, Question Answering, Transfer Learning, Visual Navigation |
Published | 2019-02-04 |
URL | http://arxiv.org/abs/1902.01385v1 |
PDF | http://arxiv.org/pdf/1902.01385v1.pdf |
PWC | https://paperswithcode.com/paper/embodied-multimodal-multitask-learning |
Repo | |
Framework | |
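A minimal PyTorch sketch of the kind of dual-attention gating described in the abstract above: word indicators gate per-word visual channels, and the resulting spatial map re-attends the visual features. The shapes, the bag-of-words text encoding, and the function name are illustrative assumptions, not the authors' implementation.

```python
import torch

def dual_attention(text_bow, vis_feat):
    """Illustrative gated alignment of word indicators and per-word visual channels.

    text_bow: (B, V) soft indicators over a vocabulary of V grounded words
    vis_feat: (B, V, H, W) visual feature maps, one channel per vocabulary word
    Returns a spatial attention map and an attended visual vector.
    """
    gated = vis_feat * text_bow.unsqueeze(-1).unsqueeze(-1)   # keep channels of mentioned words
    spatial = torch.sigmoid(gated.sum(dim=1, keepdim=True))   # (B, 1, H, W): where mentioned objects appear
    attended = (vis_feat * spatial).flatten(2).mean(-1)       # (B, V): pooled visual evidence per word
    return spatial, attended

# toy usage
text = torch.zeros(1, 8); text[0, 3] = 1.0                    # instruction mentions word #3
vis = torch.randn(1, 8, 7, 7)                                 # detector-style channels for 8 words
spatial, attended = dual_attention(text, vis)
```

Because each channel corresponds to one word, swapping in an object detector's output for a new word's channel is what makes transfer to unseen instructions plausible, as the abstract notes.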
Improving Neural Relation Extraction with Positive and Unlabeled Learning
Title | Improving Neural Relation Extraction with Positive and Unlabeled Learning |
Authors | Zhengqiu He, Wenliang Chen, Yuyi Wang, Wei zhang, Guanchun Wang, Min Zhang |
Abstract | We present a novel approach to improve the performance of distant supervision relation extraction with Positive and Unlabeled (PU) Learning. This approach first applies reinforcement learning to decide whether a sentence is a positive instance of a given relation, and then constructs positive and unlabeled bags. In contrast to most previous studies, which mainly use selected positive instances only, we make full use of unlabeled instances and propose two new representations for positive and unlabeled bags. These two representations are then combined in an appropriate way to make bag-level predictions. Experimental results on a widely used real-world dataset demonstrate that this new approach achieves significant and consistent improvements compared to several competitive baselines. |
Tasks | Relation Extraction |
Published | 2019-11-28 |
URL | https://arxiv.org/abs/1911.12556v1 |
PDF | https://arxiv.org/pdf/1911.12556v1.pdf |
PWC | https://paperswithcode.com/paper/improving-neural-relation-extraction-with-1 |
Repo | |
Framework | |
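A PyTorch sketch of the bag-level combination the abstract describes: attention-pool the sentences in the positive bag and in the unlabeled bag, then merge the two bag vectors before classifying the relation. The gated combination and module names are assumptions for illustration; the paper's exact representations differ.

```python
import torch
import torch.nn as nn

class PUBagClassifier(nn.Module):
    """Combine a positive bag and an unlabeled bag for bag-level relation prediction (sketch)."""
    def __init__(self, dim, n_relations):
        super().__init__()
        self.query = nn.Parameter(torch.randn(dim))    # relation query for attention pooling
        self.gate = nn.Linear(2 * dim, dim)
        self.cls = nn.Linear(dim, n_relations)

    def pool(self, sents):                             # sents: (n_sentences, dim) sentence encodings
        attn = torch.softmax(sents @ self.query, dim=0)
        return attn @ sents                            # (dim,) bag representation

    def forward(self, pos_sents, unl_sents):
        pos_rep = self.pool(pos_sents)
        unl_rep = self.pool(unl_sents)
        g = torch.sigmoid(self.gate(torch.cat([pos_rep, unl_rep])))
        combined = g * pos_rep + (1 - g) * unl_rep     # gated merge of the two bag views
        return self.cls(combined)

model = PUBagClassifier(dim=128, n_relations=53)
logits = model(torch.randn(5, 128), torch.randn(9, 128))
```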
Amharic-Arabic Neural Machine Translation
Title | Amharic-Arabic Neural Machine Translation |
Authors | Ibrahim Gashaw, H L Shashirekha |
Abstract | Much automatic translation work has addressed major European language pairs by taking advantage of large-scale parallel corpora, but very little research has been conducted on the Amharic-Arabic language pair due to the scarcity of parallel data. Two Neural Machine Translation (NMT) models, based on Long Short-Term Memory (LSTM) and Gated Recurrent Units (GRU) respectively, are developed using an attention-based encoder-decoder architecture adapted from the open-source OpenNMT system. To perform the experiments, a small parallel Quranic text corpus is constructed from the existing monolingual Arabic text and its equivalent Amharic translation available from Tanzile. The LSTM- and GRU-based NMT models and the Google Translate system are compared, and the LSTM-based OpenNMT model is found to outperform the GRU-based OpenNMT model and Google Translate, with BLEU scores of 12%, 11%, and 6%, respectively. |
Tasks | Machine Translation |
Published | 2019-12-26 |
URL | https://arxiv.org/abs/1912.13161v1 |
PDF | https://arxiv.org/pdf/1912.13161v1.pdf |
PWC | https://paperswithcode.com/paper/amharic-arabic-neural-machine-translation |
Repo | |
Framework | |
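A compact PyTorch sketch of an attention-based encoder-decoder in which the recurrent cell can be switched between LSTM and GRU, mirroring the comparison in the abstract above. The dimensions, vocabulary sizes, and dot-product attention are illustrative assumptions; this is not the OpenNMT configuration actually used.

```python
import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    """Minimal encoder-decoder; rnn_type toggles the cell being compared ('LSTM' or 'GRU')."""
    def __init__(self, src_vocab, tgt_vocab, dim=256, rnn_type="LSTM"):
        super().__init__()
        rnn = {"LSTM": nn.LSTM, "GRU": nn.GRU}[rnn_type]
        self.src_emb = nn.Embedding(src_vocab, dim)
        self.tgt_emb = nn.Embedding(tgt_vocab, dim)
        self.encoder = rnn(dim, dim, batch_first=True)
        self.decoder = rnn(dim, dim, batch_first=True)
        self.out = nn.Linear(2 * dim, tgt_vocab)           # combines decoder state and attention context

    def forward(self, src, tgt):
        enc_out, state = self.encoder(self.src_emb(src))               # (B, S, dim)
        dec_out, _ = self.decoder(self.tgt_emb(tgt), state)            # (B, T, dim)
        attn = torch.softmax(dec_out @ enc_out.transpose(1, 2), -1)    # Luong-style dot attention
        context = attn @ enc_out                                       # (B, T, dim)
        return self.out(torch.cat([dec_out, context], dim=-1))         # (B, T, tgt_vocab)

model = Seq2Seq(src_vocab=8000, tgt_vocab=8000, rnn_type="GRU")
logits = model(torch.randint(0, 8000, (2, 12)), torch.randint(0, 8000, (2, 10)))
```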
Sparsely Grouped Input Variables for Neural Networks
Title | Sparsely Grouped Input Variables for Neural Networks |
Authors | Beibin Li, Nicholas Nuechterlein, Erin Barney, Caitlin Hudac, Pamela Ventola, Linda Shapiro, Frederick Shic |
Abstract | In genomic analysis, biomarker discovery, image recognition, and other systems involving machine learning, input variables can often be organized into different groups by their source or semantic category. Eliminating some groups of variables can expedite the process of data acquisition and avoid over-fitting. Researchers have used the group lasso to ensure group sparsity in linear models and have extended it to create compact neural networks in meta-learning. Different from previous studies, we use multi-layer non-linear neural networks to find sparse groups of input variables. We propose a new loss function to regularize parameters for grouped input variables, design a new optimization algorithm for this loss function, and test these methods in three real-world settings. We achieve group sparsity for three datasets, maintaining satisfactory results while excluding one nucleotide position from an RNA splicing experiment, excluding 89.9% of stimuli from an eye-tracking experiment, and excluding 60% of image rows from an experiment on the MNIST dataset. |
Tasks | Eye Tracking, Meta-Learning |
Published | 2019-11-29 |
URL | https://arxiv.org/abs/1911.13068v1 |
PDF | https://arxiv.org/pdf/1911.13068v1.pdf |
PWC | https://paperswithcode.com/paper/sparsely-grouped-input-variables-for-neural |
Repo | |
Framework | |
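A minimal PyTorch sketch of a group-lasso-style penalty on the first-layer weights, with column groups given by the input-variable groups; driving a group's norm to zero removes that whole group of inputs. This is one standard formulation of group sparsity, assumed here for illustration rather than the paper's exact loss or optimizer.

```python
import torch
import torch.nn as nn

def group_sparsity_penalty(first_layer: nn.Linear, groups, lam=1e-3):
    """Sum of L2 norms of the weight columns belonging to each input-variable group.

    groups: list of index lists, one per input group.
    """
    W = first_layer.weight                       # (out_features, in_features)
    return lam * sum(W[:, idx].norm(p=2) for idx in groups)

# toy usage: 10 inputs organised into 3 groups feeding a small non-linear network
net = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 1))
groups = [list(range(0, 4)), list(range(4, 7)), list(range(7, 10))]
x, y = torch.randn(16, 10), torch.randn(16, 1)
loss = nn.functional.mse_loss(net(x), y) + group_sparsity_penalty(net[0], groups)
loss.backward()
```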
Provably Efficient $Q$-learning with Function Approximation via Distribution Shift Error Checking Oracle
Title | Provably Efficient $Q$-learning with Function Approximation via Distribution Shift Error Checking Oracle |
Authors | Simon S. Du, Yuping Luo, Ruosong Wang, Hanrui Zhang |
Abstract | $Q$-learning with function approximation is one of the most popular methods in reinforcement learning. Though the idea of using function approximation was proposed at least 60 years ago, even in the simplest setup, i.e., approximating $Q$-functions with linear functions, it remains an open problem how to design a provably efficient algorithm that learns a near-optimal policy. The key challenges are how to efficiently explore the state space and how to decide when to stop exploring, in conjunction with the function approximation scheme. The current paper presents a provably efficient algorithm for $Q$-learning with linear function approximation. Under certain regularity assumptions, our algorithm, Difference Maximization $Q$-learning (DMQ), combined with linear function approximation, returns a near-optimal policy using a polynomial number of trajectories. Our algorithm introduces a new notion, the Distribution Shift Error Checking (DSEC) oracle. This oracle tests whether there exists a function in the function class that predicts well on a distribution $\mathcal{D}_1$ but predicts poorly on another distribution $\mathcal{D}_2$, where $\mathcal{D}_1$ and $\mathcal{D}_2$ are distributions over states induced by two different exploration policies. For the linear function class, this oracle is equivalent to solving a top eigenvalue problem. We believe our algorithmic insights, especially the DSEC oracle, are also useful in designing and analyzing reinforcement learning algorithms with general function approximation. |
Tasks | Q-Learning |
Published | 2019-06-14 |
URL | https://arxiv.org/abs/1906.06321v2 |
PDF | https://arxiv.org/pdf/1906.06321v2.pdf |
PWC | https://paperswithcode.com/paper/provably-efficient-q-learning-with-function |
Repo | |
Framework | |
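A NumPy sketch of how a distribution-shift check can reduce to a top-eigenvalue problem for linear function classes: look for a direction whose second moment is small under $\mathcal{D}_1$ but large under $\mathcal{D}_2$. This is a schematic reading of the DSEC oracle under assumed regularization and without the paper's thresholds, not the exact procedure.

```python
import numpy as np

def dsec_linear(X1, X2, ridge=1e-3):
    """Largest ratio of second moments under D2 vs D1 over linear directions.

    X1, X2: (n1, d) and (n2, d) feature matrices sampled under the two exploration policies.
    A large value means some linear predictor can look accurate on D1 yet be unreliable
    on D2, i.e. more exploration is needed in that direction.
    """
    S1 = X1.T @ X1 / len(X1) + ridge * np.eye(X1.shape[1])     # regularised covariance under D1
    S2 = X2.T @ X2 / len(X2)
    # top generalised eigenvalue of (S2, S1): max_v (v^T S2 v) / (v^T S1 v)
    L = np.linalg.cholesky(S1)
    M = np.linalg.solve(L, np.linalg.solve(L, S2).T).T          # L^{-1} S2 L^{-T}
    return np.linalg.eigvalsh(M).max()

rng = np.random.default_rng(0)
shift = dsec_linear(rng.normal(size=(500, 5)), rng.normal(scale=3.0, size=(500, 5)))
```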
Deep Representation Learning for Social Network Analysis
Title | Deep Representation Learning for Social Network Analysis |
Authors | Qiaoyu Tan, Ninghao Liu, Xia Hu |
Abstract | Social network analysis is an important problem in data mining. A fundamental step in analyzing social networks is to encode network data into low-dimensional representations, i.e., network embeddings, so that the network topology and other attribute information can be effectively preserved. Network representation learning facilitates further applications such as classification, link prediction, anomaly detection, and clustering. In addition, techniques based on deep neural networks have attracted great interest over the past few years. In this survey, we conduct a comprehensive review of the current literature on network representation learning using neural network models. First, we introduce the basic models for learning node representations in homogeneous networks. We also introduce some extensions of these base models for tackling more complex scenarios, such as analyzing attributed networks, heterogeneous networks, and dynamic networks. Then, we introduce the techniques for embedding subgraphs. After that, we present the applications of network representation learning. Finally, we discuss some promising research directions for future work. |
Tasks | Anomaly Detection, Link Prediction, Representation Learning |
Published | 2019-04-18 |
URL | http://arxiv.org/abs/1904.08547v1 |
PDF | http://arxiv.org/pdf/1904.08547v1.pdf |
PWC | https://paperswithcode.com/paper/deep-representation-learning-for-social |
Repo | |
Framework | |
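A minimal DeepWalk-style sketch, one of the basic homogeneous-network embedding methods this survey covers: truncated random walks are treated as sentences and fed to a skip-gram model. It assumes the networkx and gensim packages are available; the hyperparameters are illustrative.

```python
import random
import networkx as nx
from gensim.models import Word2Vec

def random_walks(G, num_walks=10, walk_len=20, seed=0):
    """Uniform random walks over the graph, serialised as 'sentences' of node ids."""
    rng = random.Random(seed)
    walks = []
    for _ in range(num_walks):
        for start in G.nodes():
            walk = [start]
            while len(walk) < walk_len:
                nbrs = list(G.neighbors(walk[-1]))
                if not nbrs:
                    break
                walk.append(rng.choice(nbrs))
            walks.append([str(n) for n in walk])
    return walks

G = nx.karate_club_graph()
model = Word2Vec(random_walks(G), vector_size=64, window=5, min_count=0, sg=1, epochs=5)
embedding_of_node_0 = model.wv["0"]        # 64-dimensional node representation
```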
Aggregated Gradient Langevin Dynamics
Title | Aggregated Gradient Langevin Dynamics |
Authors | Chao Zhang, Jiahao Xie, Zebang Shen, Peilin Zhao, Tengfei Zhou, Hui Qian |
Abstract | In this paper, we explore a general Aggregated Gradient Langevin Dynamics (AGLD) framework for Markov Chain Monte Carlo (MCMC) sampling. We investigate the nonasymptotic convergence of AGLD with a unified analysis for different data accessing strategies (e.g., random access, cyclic access, and random reshuffling) and snapshot updating strategies, under convex and nonconvex settings respectively. It is the first time that bounds for I/O-friendly strategies such as cyclic access and random reshuffling have been established in the MCMC literature. The theoretical results also indicate that methods in the AGLD family possess the merits of both low per-iteration computational complexity and short mixing time. Empirical studies demonstrate that our framework allows us to derive novel schemes that generate high-quality samples for large-scale Bayesian posterior learning tasks. |
Tasks | |
Published | 2019-10-21 |
URL | https://arxiv.org/abs/1910.09223v1 |
PDF | https://arxiv.org/pdf/1910.09223v1.pdf |
PWC | https://paperswithcode.com/paper/aggregated-gradient-langevin-dynamics |
Repo | |
Framework | |
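A NumPy sketch of a variance-reduced (SVRG-style) stochastic gradient Langevin pass with cyclic data access, one plausible instance of the aggregated-gradient family discussed in the abstract. The snapshot schedule, step size, and toy target are illustrative assumptions.

```python
import numpy as np

def agld_epoch(theta, data, grad_fn, eta=1e-3, rng=np.random.default_rng(0)):
    """One pass of aggregated-gradient Langevin dynamics with a snapshot correction.

    grad_fn(theta, x) returns an unbiased estimate (scaled per-example gradient) of the
    full gradient of the negative log-posterior.
    """
    snapshot = theta.copy()
    full_grad = np.mean([grad_fn(snapshot, x) for x in data], axis=0)    # aggregated at the snapshot
    for x in data:                                                        # cyclic data access
        g = grad_fn(theta, x) - grad_fn(snapshot, x) + full_grad          # variance-reduced estimate
        theta = theta - eta * g + np.sqrt(2 * eta) * rng.normal(size=theta.shape)
    return theta

# toy usage: sample from the posterior mean of a unit-variance Gaussian -- illustration only
data = np.random.default_rng(1).normal(loc=2.0, size=(200, 1))
grad_fn = lambda t, x: len(data) * (t - x)          # N * per-example gradient of -log likelihood
theta = agld_epoch(np.zeros(1), data, grad_fn)
```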
Tuning Algorithms and Generators for Efficient Edge Inference
Title | Tuning Algorithms and Generators for Efficient Edge Inference |
Authors | Rawan Naous, Lazar Supic, Yoonhwan Kang, Ranko Seradejovic, Anish Singhani, Vladimir Stojanovic |
Abstract | A surge in artificial intelligence and autonomous technologies has increased the demand for enhanced edge-processing capabilities. The computational complexity and size of state-of-the-art Deep Neural Networks (DNNs) are rising exponentially with diverse network models and larger datasets. This growth limits the performance scaling and energy efficiency of both distributed and embedded inference platforms. Embedded designs at the edge are constrained by the energy and speed limitations of available processor substrates and by the processor-to-memory communication required to fetch the model coefficients. While many hardware accelerators and network deployment frameworks are in development, a framework is needed that allows the variety of existing and emerging architectures to be expressed in the critical parts of the flow that perform the various optimization steps. Moreover, premature architecture-blind network selection and optimization diminish the effectiveness of schedule optimizations and hardware-specific mappings. In this paper, we address these issues by creating a cross-layer software-hardware design framework that encompasses network training and model compression and is aware of and tuned to the underlying hardware architecture. This approach leverages the available degrees of DNN structure and sparsity to create a converged network that can be partitioned and efficiently scheduled on the target hardware platform, minimizing data movement and improving the overall throughput and energy. To further streamline the design, we leverage the high-level, flexible SoC generator platform based on the RISC-V ROCC framework. This integration allows seamless extensions of the RISC-V instruction set and rapid, Chisel-based generator design. Using this approach, we implemented a silicon prototype in a 16 nm TSMC process node, achieving a record processing efficiency of up to 18 TOPS/W. |
Tasks | Model Compression |
Published | 2019-07-31 |
URL | https://arxiv.org/abs/1908.02239v1 |
PDF | https://arxiv.org/pdf/1908.02239v1.pdf |
PWC | https://paperswithcode.com/paper/tuning-algorithms-and-generators-for |
Repo | |
Framework | |
Triple-to-Text: Converting RDF Triples into High-Quality Natural Languages via Optimizing an Inverse KL Divergence
Title | Triple-to-Text: Converting RDF Triples into High-Quality Natural Languages via Optimizing an Inverse KL Divergence |
Authors | Yaoming Zhu, Juncheng Wan, Zhiming Zhou, Liheng Chen, Lin Qiu, Weinan Zhang, Xin Jiang, Yong Yu |
Abstract | A knowledge base is one of the main forms of representing information in a structured way. A knowledge base typically consists of Resource Description Framework (RDF) triples that describe entities and their relations. Generating natural language descriptions of a knowledge base is an important task in NLP, which has been formulated as a conditional language generation task and tackled using the sequence-to-sequence framework. Current works mostly train the language models by maximum likelihood estimation, which tends to generate low-quality sentences. In this paper, we argue that this problem of maximum likelihood estimation is intrinsic and generally cannot be resolved by changing network structures. Accordingly, we propose a novel Triple-to-Text (T2T) framework, which approximately optimizes the inverse Kullback-Leibler (KL) divergence between the distributions of the real and generated sentences. Because inverse KL imposes a large penalty on fake-looking samples, the proposed method can significantly reduce the probability of generating low-quality sentences. Our experiments on three real-world datasets demonstrate that T2T can generate higher-quality sentences and outperforms baseline models on several evaluation metrics. |
Tasks | Text Generation |
Published | 2019-05-25 |
URL | https://arxiv.org/abs/1906.01965v1 |
PDF | https://arxiv.org/pdf/1906.01965v1.pdf |
PWC | https://paperswithcode.com/paper/190601965 |
Repo | |
Framework | |
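The distinction the abstract relies on, written out explicitly: forward KL (what maximum-likelihood training minimizes) is mass-covering, while the inverse KL that T2T approximately optimizes is zero-forcing, so putting probability mass on implausible sentences is heavily penalized. These are the standard definitions, stated here for clarity rather than taken from the paper.

```latex
% Forward KL: minimised by maximum-likelihood training (P = data distribution, Q_\theta = model).
\mathrm{KL}(P \,\|\, Q_\theta) = \mathbb{E}_{x \sim P}\!\left[\log \tfrac{P(x)}{Q_\theta(x)}\right]
\quad \text{(mass-covering: } Q_\theta \text{ must not miss any likely sentence)}

% Inverse KL: the divergence T2T approximately optimises.
\mathrm{KL}(Q_\theta \,\|\, P) = \mathbb{E}_{x \sim Q_\theta}\!\left[\log \tfrac{Q_\theta(x)}{P(x)}\right]
\quad \text{(zero-forcing: generating } x \text{ with } P(x) \approx 0 \text{ incurs a huge penalty)}
```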
CorNet: Generic 3D Corners for 6D Pose Estimation of New Objects without Retraining
Title | CorNet: Generic 3D Corners for 6D Pose Estimation of New Objects without Retraining |
Authors | Giorgia Pitteri, Slobodan Ilic, Vincent Lepetit |
Abstract | We present a novel approach to the detection and 3D pose estimation of objects in color images. Its main contribution is that it does not require any training phase or data for new objects, while state-of-the-art methods typically require hours of training time and hundreds of registered training images. Instead, our method relies only on the objects' geometries. Our method focuses on objects with prominent corners, which covers a large number of industrial objects. We first learn to detect object corners of various shapes in images, and also to predict their 3D poses, using training images of a small set of objects. To detect a new object in a given image, we first identify its corners from its CAD model; we also detect the corners visible in the image and predict their 3D poses. We then introduce a RANSAC-like algorithm that robustly and efficiently detects and estimates the object's 3D pose by matching its corners on the CAD model with their detected counterparts in the image. Because we also estimate the 3D poses of the corners in the image, detecting only one or two corners is sufficient to estimate the pose of the object, which makes the approach robust to occlusions. Finally, a check that exploits the full 3D geometry of the objects resolves cases in which multiple objects share the same spatial arrangement of corners. The advantages of our approach make it particularly attractive for industrial contexts, and we demonstrate it on the challenging T-LESS dataset. |
Tasks | 3D Pose Estimation, 6D Pose Estimation, Pose Estimation |
Published | 2019-08-29 |
URL | https://arxiv.org/abs/1908.11457v1 |
PDF | https://arxiv.org/pdf/1908.11457v1.pdf |
PWC | https://paperswithcode.com/paper/cornet-generic-3d-corners-for-6d-pose |
Repo | |
Framework | |
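A schematic RANSAC-style loop over corner correspondences, in the spirit of the matching strategy the abstract outlines. The pose solver `pose_from_corners` and the scoring function `reproj_error` are hypothetical placeholders passed in by the caller, since the abstract only specifies the overall robust-matching idea; because each detected corner carries its own 3D pose, a single correspondence can already propose an object pose.

```python
import random

def ransac_pose(model_corners, detected_corners, pose_from_corners, reproj_error,
                iters=200, inlier_thresh=5.0, min_corners=1, seed=0):
    """Robustly estimate an object pose by matching CAD-model corners to detected corners.

    pose_from_corners(sample) -> pose hypothesis or None; reproj_error(pose, pair) -> pixels.
    """
    rng = random.Random(seed)
    candidates = [(m, d) for m in model_corners for d in detected_corners]
    best_pose, best_inliers = None, []
    for _ in range(iters):
        sample = rng.sample(candidates, k=min_corners)          # minimal sample of correspondences
        pose = pose_from_corners(sample)
        if pose is None:
            continue
        inliers = [c for c in candidates if reproj_error(pose, c) < inlier_thresh]
        if len(inliers) > len(best_inliers):
            best_pose, best_inliers = pose, inliers
    return best_pose, best_inliers
```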
Memory- and Communication-Aware Model Compression for Distributed Deep Learning Inference on IoT
Title | Memory- and Communication-Aware Model Compression for Distributed Deep Learning Inference on IoT |
Authors | Kartikeya Bhardwaj, Chingyi Lin, Anderson Sartor, Radu Marculescu |
Abstract | Model compression has emerged as an important area of research for deploying deep learning models on Internet-of-Things (IoT) devices. However, for extremely memory-constrained scenarios, even the compressed models cannot fit within the memory of a single device and, as a result, must be distributed across multiple devices. This leads to a distributed inference paradigm in which memory and communication costs represent a major bottleneck. Yet, existing model compression techniques are not communication-aware. Therefore, we propose Network of Neural Networks (NoNN), a new distributed IoT learning paradigm that compresses a large pretrained 'teacher' deep network into several disjoint and highly compressed 'student' modules, without loss of accuracy. Moreover, we propose a network-science-based knowledge partitioning algorithm for the teacher model, and then train individual students on the resulting disjoint partitions. Extensive experimentation on five image classification datasets, for user-defined memory/performance budgets, shows that NoNN achieves higher accuracy than several baselines and accuracy similar to that of the teacher model, while using minimal communication among students. Finally, as a case study, we deploy the proposed model for the CIFAR-10 dataset on edge devices and demonstrate significant improvements in memory footprint (up to 24x), performance (up to 12x), and energy per node (up to 14x) compared to the large teacher model. We further show that for distributed inference on multiple edge devices, our proposed NoNN model yields up to a 33x reduction in total latency w.r.t. a state-of-the-art model compression baseline. |
Tasks | Image Classification, Model Compression |
Published | 2019-07-26 |
URL | https://arxiv.org/abs/1907.11804v1 |
PDF | https://arxiv.org/pdf/1907.11804v1.pdf |
PWC | https://paperswithcode.com/paper/memory-and-communication-aware-model |
Repo | |
Framework | |
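A PyTorch sketch of distilling disjoint slices of a frozen teacher's final feature map into separate small students. The contiguous channel split used here is a naive stand-in for the paper's network-science-based knowledge partitioning, and the student architectures are left to the caller.

```python
import torch
import torch.nn as nn

def nonn_distill_step(teacher_feat, students, optimizers, image):
    """One distillation step: each student mimics its own disjoint slice of teacher features.

    teacher_feat: (B, C, H, W) final feature map of the frozen teacher for `image`.
    students: list of k small networks; student_i outputs (B, C // k, H, W).
    """
    k = len(students)
    chunks = torch.chunk(teacher_feat, k, dim=1)          # naive disjoint channel partition
    for student, opt, target in zip(students, optimizers, chunks):
        opt.zero_grad()
        loss = nn.functional.mse_loss(student(image), target.detach())
        loss.backward()
        opt.step()
```

At inference time, each student runs on its own device and only the small pooled student outputs need to be communicated and concatenated for the final classifier, which is where the communication savings come from.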
Data-Independent Neural Pruning via Coresets
Title | Data-Independent Neural Pruning via Coresets |
Authors | Ben Mussay, Margarita Osadchy, Vladimir Braverman, Samson Zhou, Dan Feldman |
Abstract | Previous work showed empirically that large neural networks can be significantly reduced in size while preserving their accuracy. Model compression became a central research topic, as it is crucial for the deployment of neural networks on devices with limited computational and memory resources. The majority of compression methods are based on heuristics and offer no worst-case guarantees on the trade-off between the compression rate and the approximation error for an arbitrary new sample. We propose the first efficient, data-independent neural pruning algorithm with a provable trade-off between its compression rate and the approximation error for any future test sample. Our method is based on the coreset framework, which finds a small weighted subset of points that provably approximates the original inputs. Specifically, we approximate the output of a layer of neurons by a coreset of neurons in the previous layer and discard the rest. We apply this framework in a layer-by-layer fashion from the top to the bottom. Unlike previous works, our coreset is data-independent, meaning that it provably guarantees the accuracy of the function for any input $x\in \mathbb{R}^d$, including an adversarial one. We demonstrate the effectiveness of our method on popular network architectures. In particular, our coresets yield a 90% compression of the LeNet-300-100 architecture on MNIST while improving the accuracy. |
Tasks | Model Compression, Network Pruning |
Published | 2019-07-09 |
URL | https://arxiv.org/abs/1907.04018v3 |
PDF | https://arxiv.org/pdf/1907.04018v3.pdf |
PWC | https://paperswithcode.com/paper/on-activation-function-coresets-for-network |
Repo | |
Framework | |
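A NumPy sketch of the coreset idea at the level of a single layer: sample neurons of the previous layer with probabilities tied to their outgoing-weight norms, reweight the kept neurons so the approximation stays unbiased, and drop the rest. The per-neuron sensitivity used here is a simplification assumed for illustration, not the paper's exact construction or its guarantees.

```python
import numpy as np

def prune_layer_inputs(W_next, keep, rng=np.random.default_rng(0)):
    """Approximate a layer by a weighted subset of its input neurons (columns of W_next).

    W_next: (out_dim, in_dim) weights consuming the layer being pruned.
    Returns indices of kept neurons and a rescaled weight matrix using only them.
    """
    sens = np.linalg.norm(W_next, axis=0)                 # simplified per-neuron sensitivity
    p = sens / sens.sum()
    idx = rng.choice(len(p), size=keep, replace=True, p=p)
    scale = 1.0 / (keep * p[idx])                         # importance-sampling reweighting
    return idx, W_next[:, idx] * scale

W = np.random.default_rng(1).normal(size=(64, 300))
kept, W_small = prune_layer_inputs(W, keep=100)           # 300 -> 100 input neurons
```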
Neural Probabilistic Logic Programming in DeepProbLog
Title | Neural Probabilistic Logic Programming in DeepProbLog |
Authors | Robin Manhaeve, Sebastijan Dumančić, Angelika Kimmig, Thomas Demeester, Luc De Raedt |
Abstract | We introduce DeepProbLog, a neural probabilistic logic programming language that incorporates deep learning by means of neural predicates. We show how existing inference and learning techniques of the underlying probabilistic logic programming language ProbLog can be adapted for the new language. We theoretically and experimentally demonstrate that DeepProbLog supports (i) both symbolic and subsymbolic representations and inference, (ii) program induction, (iii) probabilistic (logic) programming, and (iv) (deep) learning from examples. To the best of our knowledge, this work is the first to propose a framework where general-purpose neural networks and expressive probabilistic-logical modeling and reasoning are integrated in a way that exploits the full expressiveness and strengths of both worlds and can be trained end-to-end based on examples. |
Tasks | |
Published | 2019-07-18 |
URL | https://arxiv.org/abs/1907.08194v2 |
PDF | https://arxiv.org/pdf/1907.08194v2.pdf |
PWC | https://paperswithcode.com/paper/deepproblog-neural-probabilistic-logic-2 |
Repo | |
Framework | |
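A NumPy illustration of the semantics of a neural predicate combined with a logic rule, in the style of the canonical MNIST-addition example: a rule addition(X,Y,Z) :- digit(X,A), digit(Y,B), Z = A+B marginalises over the digit distributions produced by a neural classifier. This is plain Python showing the probability computation, not DeepProbLog's actual syntax or inference engine.

```python
import numpy as np

def addition_prob(p_digit1, p_digit2):
    """P(addition(X, Y, Z) = z) obtained by summing over all digit pairs with A + B = z.

    p_digit1, p_digit2: length-10 probability vectors from a neural digit classifier.
    Returns a length-19 vector over the possible sums 0..18 (a discrete convolution).
    """
    return np.convolve(p_digit1, p_digit2)

p1 = np.full(10, 0.1)              # classifier is unsure about the first image
p2 = np.eye(10)[7]                 # classifier is certain the second image is a 7
print(addition_prob(p1, p2)[10])   # probability the sum is 10, here 0.1
```

Because the query probability is a differentiable function of the neural outputs, gradients from a loss on the query can flow back into the classifiers, which is what "learning from examples" through the logic program amounts to.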
AutoCompress: An Automatic DNN Structured Pruning Framework for Ultra-High Compression Rates
Title | AutoCompress: An Automatic DNN Structured Pruning Framework for Ultra-High Compression Rates |
Authors | Ning Liu, Xiaolong Ma, Zhiyuan Xu, Yanzhi Wang, Jian Tang, Jieping Ye |
Abstract | Structured weight pruning is a representative model compression technique for DNNs that reduces storage and computation requirements and accelerates inference. An automatic hyperparameter determination process is necessary due to the large number of flexible hyperparameters. This work proposes AutoCompress, an automatic structured pruning framework with the following key performance improvements: (i) effectively incorporating the combination of structured pruning schemes in the automatic process; (ii) adopting state-of-the-art ADMM-based structured weight pruning as the core algorithm, together with an innovative additional purification step for further weight reduction without accuracy loss; and (iii) developing an effective heuristic search method enhanced by experience-based guided search, replacing the prior deep reinforcement learning technique, which has an underlying incompatibility with the target pruning problem. Extensive experiments on the CIFAR-10 and ImageNet datasets demonstrate that AutoCompress is the key to achieving ultra-high pruning rates on the number of weights and FLOPs that could not be achieved before. As an example, AutoCompress outperforms prior work on automatic model compression by up to 33x in pruning rate (a 120x reduction in the actual parameter count) under the same accuracy. Significant inference speedups have been observed from the AutoCompress framework in actual measurements on a smartphone. We release all models of this work at an anonymous link: http://bit.ly/2VZ63dS. |
Tasks | Model Compression |
Published | 2019-07-06 |
URL | https://arxiv.org/abs/1907.03141v2 |
PDF | https://arxiv.org/pdf/1907.03141v2.pdf |
PWC | https://paperswithcode.com/paper/autoslim-an-automatic-dnn-structured-pruning |
Repo | |
Framework | |
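A PyTorch sketch of the Euclidean projection typically used inside ADMM-based structured (filter) pruning: keep the k filters with the largest L2 norm and zero the rest. The surrounding ADMM loop, the purification step, and AutoCompress's hyperparameter search are omitted; this shows only the structured-sparsity projection as a generic building block.

```python
import torch

def project_topk_filters(W, k):
    """Project a conv weight tensor onto the set of tensors with at most k nonzero filters.

    W: (out_channels, in_channels, kh, kw). The Euclidean projection keeps the k filters
    with the largest L2 norm and zeroes the others (the Z-update in ADMM-based pruning).
    """
    norms = W.flatten(1).norm(dim=1)                      # per-filter L2 norm
    keep = torch.topk(norms, k).indices
    mask = torch.zeros_like(norms, dtype=torch.bool)
    mask[keep] = True
    return W * mask.view(-1, 1, 1, 1)

W = torch.randn(64, 32, 3, 3)
Z = project_topk_filters(W, k=16)                         # structured-sparse copy of W
```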
Successive Point-of-Interest Recommendation with Local Differential Privacy
Title | Successive Point-of-Interest Recommendation with Local Differential Privacy |
Authors | Jong Seon Kim, Jong Wook Kim, Yon Dohn Chung |
Abstract | A point-of-interest (POI) recommendation system plays an important role in location-based services (LBS) because it can help people explore new locations and help advertisers launch ads aimed at target users. Existing POI recommendation methods need users' raw check-in data, which can lead to location privacy breaches. Even worse, several privacy-preserving recommendation systems cannot utilize the transition patterns in human movement. To address these problems, we propose the Successive Point-of-Interest REcommendation with Local differential privacy (SPIREL) framework. SPIREL employs two types of information from users' check-in histories: transition patterns between POIs and visit counts of POIs. We propose a novel objective function for learning the user-POI and POI-POI relationships simultaneously. We further propose two privacy-preserving mechanisms to train our recommendation system. Experiments using two public datasets demonstrate that SPIREL achieves better POI recommendation quality while preserving stronger privacy for check-in histories. |
Tasks | Recommendation Systems |
Published | 2019-08-26 |
URL | https://arxiv.org/abs/1908.09485v1 |
PDF | https://arxiv.org/pdf/1908.09485v1.pdf |
PWC | https://paperswithcode.com/paper/successive-point-of-interest-recommendation |
Repo | |
Framework | |
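A sketch of a standard local-differential-privacy primitive, k-ary randomized response, of the kind a user could apply before reporting a POI transition or visit count: keep the true value with probability e^ε / (e^ε + k - 1), otherwise report one of the other k - 1 values uniformly. SPIREL's actual mechanisms are more involved; this only illustrates the local privatisation step, and the transition-index encoding is a hypothetical example.

```python
import math
import random

def k_randomized_response(true_value, k, epsilon, rng=random.Random(0)):
    """Report a value from {0, ..., k-1} under epsilon-local differential privacy (k-ary RR)."""
    p_keep = math.exp(epsilon) / (math.exp(epsilon) + k - 1)
    if rng.random() < p_keep:
        return true_value
    other = rng.randrange(k - 1)                 # uniform over the k-1 remaining values
    return other if other < true_value else other + 1

# e.g. privatise the index of a POI-to-POI transition among k = 100 * 100 possible transitions
report = k_randomized_response(true_value=4217, k=10000, epsilon=2.0)
```

The server can debias aggregated reports because the flip probabilities are public, which is how frequency estimates over transitions remain usable despite each individual report being noisy.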