Paper Group ANR 78
Introduction to Tensor Decompositions and their Applications in Machine Learning
Title | Introduction to Tensor Decompositions and their Applications in Machine Learning |
Authors | Stephan Rabanser, Oleksandr Shchur, Stephan Günnemann |
Abstract | Tensors are multidimensional arrays of numerical values and therefore generalize matrices to multiple dimensions. While tensors first emerged in the psychometrics community in the $20^{\text{th}}$ century, they have since then spread to numerous other disciplines, including machine learning. Tensors and their decompositions are especially beneficial in unsupervised learning settings, but are gaining popularity in other sub-disciplines like temporal and multi-relational data analysis, too. The scope of this paper is to give a broad overview of tensors, their decompositions, and how they are used in machine learning. As part of this, we are going to introduce basic tensor concepts, discuss why tensors can be considered more rigid than matrices with respect to the uniqueness of their decomposition, explain the most important factorization algorithms and their properties, provide concrete examples of tensor decomposition applications in machine learning, conduct a case study on tensor-based estimation of mixture models, talk about the current state of research, and provide references to available software libraries. |
Tasks | |
Published | 2017-11-29 |
URL | http://arxiv.org/abs/1711.10781v1 |
PDF | http://arxiv.org/pdf/1711.10781v1.pdf |
PWC | https://paperswithcode.com/paper/introduction-to-tensor-decompositions-and |
Repo | |
Framework | |
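To make the survey's central object concrete, here is a minimal sketch of the CP decomposition fitted by alternating least squares (ALS), one of the factorization algorithms the paper covers. The rank, tensor sizes, and random synthetic data are illustrative assumptions, not anything from the paper.

```python
import numpy as np

def unfold(T, mode):
    """Mode-`mode` unfolding: rows indexed by that mode, remaining modes flattened."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def khatri_rao(A, B):
    """Column-wise Kronecker product of two factor matrices."""
    R = A.shape[1]
    return np.einsum('ir,jr->ijr', A, B).reshape(-1, R)

def cp_als(T, rank, n_iter=100, seed=0):
    """Fit T ~ sum_r a_r (outer) b_r (outer) c_r by cycling least-squares updates."""
    rng = np.random.default_rng(seed)
    I, J, K = T.shape
    A = rng.standard_normal((I, rank))
    B = rng.standard_normal((J, rank))
    C = rng.standard_normal((K, rank))
    for _ in range(n_iter):
        A = unfold(T, 0) @ np.linalg.pinv(khatri_rao(B, C).T)
        B = unfold(T, 1) @ np.linalg.pinv(khatri_rao(A, C).T)
        C = unfold(T, 2) @ np.linalg.pinv(khatri_rao(A, B).T)
    return A, B, C

# Recover the factors of a synthetic rank-3 tensor.
rng = np.random.default_rng(1)
A0, B0, C0 = (rng.standard_normal((s, 3)) for s in (4, 5, 6))
T = np.einsum('ir,jr,kr->ijk', A0, B0, C0)
A, B, C = cp_als(T, rank=3)
T_hat = np.einsum('ir,jr,kr->ijk', A, B, C)
print(np.linalg.norm(T - T_hat) / np.linalg.norm(T))  # ~0 at the true rank
```

The relative error printed at the end should be near zero when the fitted rank matches the true rank; ALS can need more iterations or restarts in degenerate cases.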
Model compression as constrained optimization, with application to neural nets. Part I: general framework
Title | Model compression as constrained optimization, with application to neural nets. Part I: general framework |
Authors | Miguel Á. Carreira-Perpiñán |
Abstract | Compressing neural nets is an active research problem, given the large size of state-of-the-art nets for tasks such as object recognition, and the computational limits imposed by mobile devices. We give a general formulation of model compression as constrained optimization. This includes many types of compression: quantization, low-rank decomposition, pruning, lossless compression and others. Then, we give a general algorithm to optimize this nonconvex problem based on the augmented Lagrangian and alternating optimization. This results in a “learning-compression” algorithm, which alternates a learning step of the uncompressed model, independent of the compression type, with a compression step of the model parameters, independent of the learning task. This simple, efficient algorithm is guaranteed to find the best compressed model for the task in a local sense under standard assumptions. We present separately in several companion papers the development of this general framework into specific algorithms for model compression based on quantization, pruning and other variations, including experimental results on compressing neural nets and other models. |
Tasks | Model Compression, Object Recognition, Quantization |
Published | 2017-07-05 |
URL | http://arxiv.org/abs/1707.01209v1 |
PDF | http://arxiv.org/pdf/1707.01209v1.pdf |
PWC | https://paperswithcode.com/paper/model-compression-as-constrained-optimization-1 |
Repo | |
Framework | |
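The abstract's "learning-compression" alternation can be sketched on a toy problem. Below, the loss is a simple quadratic, the compression step is scalar k-means quantization, and the penalty schedule, step sizes, and codebook size are illustrative assumptions; the paper develops the general augmented-Lagrangian formulation, not this specific instance.

```python
import numpy as np

rng = np.random.default_rng(0)
w_ref = rng.standard_normal(50)     # stands in for well-trained reference weights

def loss_grad(w):                   # gradient of the toy loss L(w) = 0.5*||w - w_ref||^2
    return w - w_ref

def c_step(w, k=4, iters=20):
    """Compression step: best k-level scalar quantization of w (1-D k-means)."""
    codebook = np.quantile(w, np.linspace(0.1, 0.9, k))
    for _ in range(iters):
        assign = np.argmin(np.abs(w[:, None] - codebook[None, :]), axis=1)
        for j in range(k):
            if np.any(assign == j):
                codebook[j] = w[assign == j].mean()
    return codebook[assign]

w = w_ref.copy()
lam = np.zeros_like(w)              # Lagrange multiplier estimates
theta = c_step(w)                   # decompressed model Delta(theta)
for mu in (1e-2 * 1.5 ** t for t in range(30)):   # increasing penalty schedule
    for _ in range(100):            # L-step: learn w, coupled to the compressed model
        w -= 0.1 / (1 + mu) * (loss_grad(w) + mu * (w - theta) - lam)
    theta = c_step(w - lam / mu)    # C-step: compress, independent of the task
    lam -= mu * (w - theta)         # multiplier update (augmented Lagrangian)
print(np.abs(w - theta).max())      # w is driven onto the quantized feasible set
```

Note how the L-step never sees the quantizer and the C-step never sees the loss, which is exactly the separation the abstract emphasizes.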
The Surfacing of Multiview 3D Drawings via Lofting and Occlusion Reasoning
Title | The Surfacing of Multiview 3D Drawings via Lofting and Occlusion Reasoning |
Authors | Anil Usumezbas, Ricardo Fabbri, Benjamin Kimia |
Abstract | The three-dimensional reconstruction of scenes from multiple views has made impressive strides in recent years, chiefly by methods correlating isolated feature points, intensities, or curvilinear structure. In the general setting, i.e., without requiring controlled acquisition, a limited number of objects, abundant patterns on objects, or object curves that follow particular models, the majority of these methods produce unorganized point clouds, meshes, or voxel representations of the reconstructed scene, with some exceptions producing 3D drawings as networks of curves. Many applications, e.g., robotics, urban planning, industrial design, and hard surface modeling, however, require structured representations which make explicit 3D curves, surfaces, and their spatial relationships. Reconstructing surface representations can now be constrained by the 3D drawing acting as a scaffold on which to hang the computed representations, leading to increased robustness and quality of reconstruction. This paper presents one way of completing such 3D drawings with surface reconstructions, by exploring occlusion reasoning through lofting algorithms. |
Tasks | |
Published | 2017-07-13 |
URL | http://arxiv.org/abs/1707.03946v1 |
PDF | http://arxiv.org/pdf/1707.03946v1.pdf |
PWC | https://paperswithcode.com/paper/the-surfacing-of-multiview-3d-drawings-via |
Repo | |
Framework | |
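As a point of reference for the lofting primitive the paper builds on, here is a minimal ruled-surface loft between two 3D polylines. This illustrates only the lofting operation itself, under assumed inputs; the paper's occlusion reasoning and 3D-drawing scaffolding are not modeled here.

```python
import numpy as np

def loft(curve_a, curve_b, n_sections=10):
    """Ruled surface: linearly interpolate two (N, 3) polylines point by point."""
    t = np.linspace(0.0, 1.0, n_sections)[:, None, None]
    return (1.0 - t) * curve_a[None] + t * curve_b[None]   # (n_sections, N, 3)

theta = np.linspace(0.0, np.pi, 50)
lower = np.stack([np.cos(theta), np.sin(theta), np.zeros_like(theta)], axis=1)
upper = lower + np.array([0.2, 0.0, 1.0])   # a shifted copy of the arc, one unit up
surface = loft(lower, upper)
print(surface.shape)                        # (10, 50, 3) grid of surface points
```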
Delineation of Skin Strata in Reflectance Confocal Microscopy Images using Recurrent Convolutional Networks with Toeplitz Attention
Title | Delineation of Skin Strata in Reflectance Confocal Microscopy Images using Recurrent Convolutional Networks with Toeplitz Attention |
Authors | Alican Bozkurt, Kivanc Kose, Jaume Coll-Font, Christi Alessi-Fox, Dana H. Brooks, Jennifer G. Dy, Milind Rajadhyaksha |
Abstract | Reflectance confocal microscopy (RCM) is an effective, non-invasive pre-screening tool for skin cancer diagnosis, but it requires extensive training and experience to assess accurately. There are few quantitative tools available to standardize image acquisition and analysis, and the ones that are available are not interpretable. In this study, we use a recurrent neural network with attention on convolutional network features. We apply it to delineate skin strata in vertically-oriented stacks of transverse RCM image slices in an interpretable manner. We introduce a new attention mechanism called Toeplitz attention, which constrains the attention map to have a Toeplitz structure. Testing our model on an expert-labeled dataset of 504 RCM stacks, we achieve 88.17% image-wise classification accuracy, which is the current state of the art. |
Tasks | |
Published | 2017-12-01 |
URL | http://arxiv.org/abs/1712.00192v1 |
PDF | http://arxiv.org/pdf/1712.00192v1.pdf |
PWC | https://paperswithcode.com/paper/delineation-of-skin-strata-in-reflectance |
Repo | |
Framework | |
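One plausible reading of the Toeplitz constraint is that the attention an output position pays to an input position depends only on their relative offset, so the map of attention logits is a Toeplitz matrix. The sketch below builds such a map; the one-logit-per-offset parameterization, the sizes, and the offset scores are assumptions for illustration, and the paper couples the mechanism with a recurrent network over convolutional RCM features.

```python
import numpy as np

def toeplitz_attention(offset_scores, n):
    """Row-stochastic (n, n) attention map whose logits are Toeplitz.

    offset_scores has length 2n-1, one logit per relative offset j - i in
    {-(n-1), ..., n-1}; entry (i, j) reads offset_scores[j - i + n - 1].
    """
    idx = np.arange(n)
    logits = offset_scores[idx[None, :] - idx[:, None] + n - 1]
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)   # softmax normalizes each row

n = 6
scores = -0.5 * np.abs(np.arange(-(n - 1), n))   # e.g. favour nearby slices
print(np.round(toeplitz_attention(scores, n), 3))
```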
Feature Analysis and Selection for Training an End-to-End Autonomous Vehicle Controller Using the Deep Learning Approach
Title | Feature Analysis and Selection for Training an End-to-End Autonomous Vehicle Controller Using the Deep Learning Approach |
Authors | Shun Yang, Wenshuo Wang, Chang Liu, Kevin Deng, J. Karl Hedrick |
Abstract | Deep learning-based approaches have been widely used for training controllers for autonomous vehicles due to their powerful ability to approximate nonlinear functions or policies. However, the training process usually requires large labeled data sets and takes a lot of time. In this paper, we analyze the influence of features on the performance of controllers trained using convolutional neural networks (CNNs), which provides a guideline for feature selection to reduce computation cost. We collect a large set of data using The Open Racing Car Simulator (TORCS) and classify the image features into three categories (sky-related, roadside-related, and road-related features). We then design two experimental frameworks to investigate the importance of each single feature for training a CNN controller. The first framework uses the training data with all three features included to train a controller, which is then tested with data that has one feature removed to evaluate the feature’s effects. The second framework is trained with data that has one feature excluded, while all three features are included in the test data. Different driving scenarios are selected to test and analyze the trained controllers using the two experimental frameworks. The experimental results show that (1) the road-related features are indispensable for training the controller, (2) the roadside-related features are useful for improving the generalizability of the controller to scenarios with complicated roadside information, and (3) the sky-related features contribute little to training an end-to-end autonomous vehicle controller. |
Tasks | Autonomous Vehicles, Feature Selection |
Published | 2017-03-28 |
URL | http://arxiv.org/abs/1703.09744v1 |
PDF | http://arxiv.org/pdf/1703.09744v1.pdf |
PWC | https://paperswithcode.com/paper/feature-analysis-and-selection-for-training |
Repo | |
Framework | |
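The two experimental frameworks can be mimicked on a toy problem. In the sketch below, a ridge-regression "controller" on synthetic images stands in for the CNN, and zeroing image bands (top = sky, bottom = road, sides = roadside) stands in for feature removal; all of this is an illustrative assumption, while the paper uses TORCS imagery and CNN controllers.

```python
import numpy as np

rng = np.random.default_rng(0)
H = W = 16
imgs = rng.random((500, H, W))
steer = imgs[:, H // 2 :, :].mean(axis=(1, 2))   # label depends only on the "road" band

def mask(x, region):
    """Zero out one feature region: top = sky, bottom = road, sides = roadside."""
    out = x.copy()
    if region == "sky":
        out[:, : H // 4, :] = 0
    elif region == "road":
        out[:, H // 2 :, :] = 0
    else:  # roadside
        out[:, :, : W // 8] = 0
        out[:, :, W - W // 8 :] = 0
    return out

def fit(x, y):
    """Ridge-regression 'controller' on flattened pixels."""
    X = x.reshape(len(x), -1)
    return np.linalg.solve(X.T @ X + 1e-3 * np.eye(X.shape[1]), X.T @ y)

def mse(w, x, y):
    return float(np.mean((x.reshape(len(x), -1) @ w - y) ** 2))

w_full = fit(imgs, steer)
for r in ("sky", "roadside", "road"):
    f1 = mse(w_full, mask(imgs, r), steer)            # framework 1: mask at test time
    f2 = mse(fit(mask(imgs, r), steer), imgs, steer)  # framework 2: mask at train time
    print(f"{r:9s}  F1={f1:.4f}  F2={f2:.4f}")        # masking "road" hurts most
```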
A Machine Learning Approach to Routing
Title | A Machine Learning Approach to Routing |
Authors | Asaf Valadarsky, Michael Schapira, Dafna Shahaf, Aviv Tamar |
Abstract | Can ideas and techniques from machine learning be leveraged to automatically generate “good” routing configurations? We investigate the power of data-driven routing protocols. Our results suggest that applying ideas and techniques from deep reinforcement learning to this context yields high performance, motivating further research along these lines. |
Tasks | |
Published | 2017-08-10 |
URL | http://arxiv.org/abs/1708.03074v2 |
PDF | http://arxiv.org/pdf/1708.03074v2.pdf |
PWC | https://paperswithcode.com/paper/a-machine-learning-approach-to-routing |
Repo | |
Framework | |
Learning to Disambiguate by Asking Discriminative Questions
Title | Learning to Disambiguate by Asking Discriminative Questions |
Authors | Yining Li, Chen Huang, Xiaoou Tang, Chen-Change Loy |
Abstract | The ability to ask questions is a powerful tool to gather information in order to learn about the world and resolve ambiguities. In this paper, we explore a novel problem of generating discriminative questions to help disambiguate visual instances. Our work can be seen as a complement and new extension to the rich research studies on image captioning and question answering. We introduce the first large-scale dataset with over 10,000 carefully annotated image-question tuples to facilitate benchmarking. In particular, each tuple consists of a pair of images and, on average, 4.6 discriminative questions (as positive samples) and 5.9 non-discriminative questions (as negative samples). In addition, we present an effective method for visual discriminative question generation. The method can be trained in a weakly supervised manner using only existing visual question answering datasets, without discriminative image-question tuples. Promising results are shown against representative baselines through quantitative evaluations and user studies. |
Tasks | Image Captioning, Question Answering, Question Generation, Visual Question Answering |
Published | 2017-08-09 |
URL | http://arxiv.org/abs/1708.02760v1 |
PDF | http://arxiv.org/pdf/1708.02760v1.pdf |
PWC | https://paperswithcode.com/paper/learning-to-disambiguate-by-asking |
Repo | |
Framework | |
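A sketch of the dataset unit the abstract describes: a pair of images with discriminative questions as positives and non-discriminative ones as negatives. The field names below are hypothetical, chosen for illustration; the abstract only specifies the averages of roughly 4.6 positives and 5.9 negatives per tuple.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class AmbiguousPair:
    image_a: str                       # path or URL of the first image
    image_b: str                       # path or URL of the second image
    discriminative: List[str] = field(default_factory=list)      # positive samples
    non_discriminative: List[str] = field(default_factory=list)  # negative samples

pair = AmbiguousPair(
    "dog_park.jpg", "dog_beach.jpg",
    discriminative=["Where is the dog?"],
    non_discriminative=["What animal is shown?"],
)
print(pair)
```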
Speaker-independent Speech Separation with Deep Attractor Network
Title | Speaker-independent Speech Separation with Deep Attractor Network |
Authors | Yi Luo, Zhuo Chen, Nima Mesgarani |
Abstract | Despite the recent success of deep learning for many speech processing tasks, single-microphone, speaker-independent speech separation remains challenging for two main reasons. The first is the arbitrary order of the target and masker speakers in the mixture (the permutation problem), and the second is the unknown number of speakers in the mixture (the output dimension problem). We propose a novel deep learning framework for speech separation that addresses both of these issues. We use a neural network to project the time-frequency representation of the mixture signal into a high-dimensional embedding space. A reference point (attractor) is created to represent each speaker, defined as the centroid of the speaker in the embedding space. The time-frequency embeddings of each speaker are then forced to cluster around the corresponding attractor point, which is used to determine the time-frequency assignment of the speaker. We propose three methods for finding the attractors for each source in the embedding space and compare their advantages and limitations. The objective function for the network is the standard signal reconstruction error, which enables end-to-end operation during both training and test phases. We evaluated our system on two- and three-speaker mixtures from the Wall Street Journal (WSJ0) dataset and report performance comparable to or better than other state-of-the-art deep learning methods for speech separation. |
Tasks | Speech Separation |
Published | 2017-07-12 |
URL | http://arxiv.org/abs/1707.03634v3 |
PDF | http://arxiv.org/pdf/1707.03634v3.pdf |
PWC | https://paperswithcode.com/paper/speaker-independent-speech-separation-with |
Repo | |
Framework | |
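The attractor mechanism can be sketched numerically: embed every time-frequency bin, form one attractor per speaker as the centroid of that speaker's embeddings (using oracle assignments, as available in training), and derive masks from embedding-attractor similarity. The random embeddings below stand in for a trained network's output; dimensions are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n_tf, K, n_spk = 1000, 20, 2                 # T-F bins, embedding dim, speakers
V = rng.standard_normal((n_tf, K))           # embeddings, one per T-F bin
Y = np.eye(n_spk)[rng.integers(0, n_spk, n_tf)]   # oracle speaker assignments

# Attractors: per-speaker centroids in the embedding space.
attractors = (Y.T @ V) / Y.sum(axis=0)[:, None]        # (n_spk, K)

# Masks: softmax over similarity between each embedding and each attractor.
logits = V @ attractors.T                              # (n_tf, n_spk)
logits -= logits.max(axis=1, keepdims=True)
masks = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
print(masks.sum(axis=1)[:3])                 # per-bin mask values sum to one
```

In the actual system the masks multiply the mixture spectrogram, and the signal reconstruction error then trains the embedding network end to end.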
Vehicle Routing with Drones
Title | Vehicle Routing with Drones |
Authors | Rami Daknama, Elisabeth Kraus |
Abstract | We introduce a package service model where trucks as well as drones can deliver packages. Drones can travel on trucks or fly; but while flying, drones can only carry one package at a time and have to return to a truck to charge after each delivery. We present a heuristic algorithm to solve the problem of finding a good schedule for all drones and trucks. The algorithm is based on two nested local searches, so the definition of suitable neighbourhoods of solutions is crucial for the algorithm. Empirical tests show that our algorithm performs significantly better than a natural greedy algorithm. Moreover, the savings compared to solutions without drones turn out to be substantial, suggesting that delivery systems might considerably benefit from using drones in addition to trucks. |
Tasks | |
Published | 2017-05-18 |
URL | http://arxiv.org/abs/1705.06431v1 |
PDF | http://arxiv.org/pdf/1705.06431v1.pdf |
PWC | https://paperswithcode.com/paper/vehicle-routing-with-drones |
Repo | |
Framework | |
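For reference, here is a generic local-search skeleton of the kind the algorithm nests (an outer search over truck routes, an inner one over drone schedules). The 2-opt neighbourhood on a single tour shown here is an illustrative stand-in, not the paper's neighbourhood definition.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
pts = rng.random((12, 2))                    # delivery locations
dist = np.linalg.norm(pts[:, None] - pts[None, :], axis=-1)

def tour_len(tour):
    return sum(dist[tour[i], tour[(i + 1) % len(tour)]] for i in range(len(tour)))

def local_search(tour):
    """Apply improving 2-opt moves (segment reversals) until none remain."""
    improved = True
    while improved:
        improved = False
        for i, j in itertools.combinations(range(1, len(tour)), 2):
            cand = tour[:i] + tour[i:j][::-1] + tour[j:]
            if tour_len(cand) < tour_len(tour) - 1e-12:
                tour, improved = cand, True
    return tour

tour = local_search(list(range(len(pts))))
print(round(tour_len(tour), 3))              # length of a locally optimal tour
```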
Break it Down for Me: A Study in Automated Lyric Annotation
Title | Break it Down for Me: A Study in Automated Lyric Annotation |
Authors | Lucas Sterckx, Jason Naradowsky, Bill Byrne, Thomas Demeester, Chris Develder |
Abstract | Comprehending lyrics, as found in songs and poems, can pose a challenge to human and machine readers alike. This motivates the need for systems that can understand the ambiguity and jargon found in such creative texts, and provide commentary to aid readers in reaching the correct interpretation. We introduce the task of automated lyric annotation (ALA). Like text simplification, a goal of ALA is to rephrase the original text in a more easily understandable manner. However, in ALA the system must often include additional information to clarify niche terminology and abstract concepts. To stimulate research on this task, we release a large collection of crowdsourced annotations for song lyrics. We analyze the performance of translation and retrieval models on this task, measuring performance with both automated and human evaluation. We find that each model captures a unique type of information important to the task. |
Tasks | Text Simplification |
Published | 2017-08-11 |
URL | http://arxiv.org/abs/1708.03492v1 |
PDF | http://arxiv.org/pdf/1708.03492v1.pdf |
PWC | https://paperswithcode.com/paper/break-it-down-for-me-a-study-in-automated |
Repo | |
Framework | |
Efficiently applying attention to sequential data with the Recurrent Discounted Attention unit
Title | Efficiently applying attention to sequential data with the Recurrent Discounted Attention unit |
Authors | Brendan Maginnis, Pierre H. Richemond |
Abstract | Recurrent neural network architectures excel at processing sequences by modelling dependencies over different timescales. The recently introduced Recurrent Weighted Average (RWA) unit captures long-term dependencies far better than an LSTM on several challenging tasks. The RWA achieves this by applying attention to each input and computing a weighted average over the full history of its computations. Unfortunately, the RWA cannot change the attention it has assigned to previous timesteps, and so struggles with carrying out consecutive tasks or tasks with changing requirements. We present the Recurrent Discounted Attention (RDA) unit, which builds on the RWA by additionally allowing the discounting of the past. We empirically compare our model to RWA, LSTM and GRU units on several challenging tasks. On tasks with a single output, the RWA, RDA and GRU units learn much more quickly than the LSTM, and with better performance. On the multiple-sequence copy task, our RDA unit learns the task three times as quickly as the LSTM or GRU units, while the RWA fails to learn at all. On the Wikipedia character prediction task, the LSTM performs best but is followed closely by our RDA unit. Overall, our RDA unit performs well and is sample-efficient on a large variety of sequence tasks. |
Tasks | |
Published | 2017-05-23 |
URL | http://arxiv.org/abs/1705.08480v2 |
PDF | http://arxiv.org/pdf/1705.08480v2.pdf |
PWC | https://paperswithcode.com/paper/efficiently-applying-attention-to-sequential |
Repo | |
Framework | |
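The recurrence the abstract describes can be sketched directly: the RWA keeps running attention-weighted numerator and denominator sums over the whole history, and the RDA additionally discounts both by a gate before each update. The toy dimensions, random parameters, and the exact gating parameterization below are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
D, H, T = 4, 8, 25                    # input dim, hidden dim, sequence length
Wz, Wa, Wg = (0.1 * rng.standard_normal((D + H, H)) for _ in range(3))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

h = np.zeros(H)
num = np.zeros(H)                     # attention-weighted numerator
den = np.zeros(H)                     # attention normalizer (denominator)
for x in rng.standard_normal((T, D)):
    xh = np.concatenate([x, h])
    z = np.tanh(xh @ Wz)              # candidate values
    a = np.exp(xh @ Wa)               # unnormalized attention weights (positive)
    g = sigmoid(xh @ Wg)              # discount gate in (0, 1): the RDA's addition
    num = g * num + z * a             # with g fixed at 1 this reduces to the RWA
    den = g * den + a
    h = np.tanh(num / den)            # hidden state = squashed weighted average
print(h)
```

The gate is what lets the unit shrink the weight of old timesteps, which the plain RWA cannot do.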
Constructing Datasets for Multi-hop Reading Comprehension Across Documents
Title | Constructing Datasets for Multi-hop Reading Comprehension Across Documents |
Authors | Johannes Welbl, Pontus Stenetorp, Sebastian Riedel |
Abstract | Most reading comprehension methods limit themselves to queries which can be answered using a single sentence, paragraph, or document. Enabling models to combine disjoint pieces of textual evidence would extend the scope of machine comprehension methods, but currently there exist no resources to train and test this capability. We propose a novel task to encourage the development of models for text understanding across multiple documents and to investigate the limits of existing methods. In our task, a model learns to seek and combine evidence, effectively performing multi-hop (alias multi-step) inference. We devise a methodology to produce datasets for this task, given a collection of query-answer pairs and thematically linked documents. Two datasets from different domains are induced, and we identify potential pitfalls and devise circumvention strategies. We evaluate two previously proposed competitive models and find that one can integrate information across documents. However, both models struggle to select relevant information, as providing documents guaranteed to be relevant greatly improves their performance. While the models outperform several strong baselines, their best accuracy reaches 42.9%, compared to human performance at 74.0%, leaving ample room for improvement. |
Tasks | Multi-Hop Reading Comprehension, Reading Comprehension |
Published | 2017-10-17 |
URL | http://arxiv.org/abs/1710.06481v2 |
PDF | http://arxiv.org/pdf/1710.06481v2.pdf |
PWC | https://paperswithcode.com/paper/constructing-datasets-for-multi-hop-reading |
Repo | |
Framework | |
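A simplified sketch of the chaining idea behind the construction: starting from an entity in the query, follow shared-entity links across documents until one containing the answer is reached, so that no single document suffices. The toy corpus and the traversal below are illustrative assumptions, not the paper's exact methodology.

```python
from collections import deque

docs = {
    "d1": {"mentions": {"Hanging Gardens", "Babylon"}},
    "d2": {"mentions": {"Babylon", "Iraq"}},
    "d3": {"mentions": {"Iraq", "Baghdad"}},
}

def evidence_chain(start_entity, answer_entity):
    """Breadth-first search over documents linked by shared entity mentions."""
    queue = deque([(start_entity, [])])
    seen = set()
    while queue:
        entity, chain = queue.popleft()
        for name, doc in docs.items():
            if name not in seen and entity in doc["mentions"]:
                seen.add(name)
                new_chain = chain + [name]
                if answer_entity in doc["mentions"]:
                    return new_chain           # documents a model must combine
                for nxt in doc["mentions"] - {entity}:
                    queue.append((nxt, new_chain))
    return None

print(evidence_chain("Hanging Gardens", "Baghdad"))   # ['d1', 'd2', 'd3']
```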
Best Practices for Applying Deep Learning to Novel Applications
Title | Best Practices for Applying Deep Learning to Novel Applications |
Authors | Leslie N. Smith |
Abstract | This report is targeted to groups who are subject matter experts in their application but deep learning novices. It contains practical advice for those interested in testing the use of deep neural networks on applications that are novel for deep learning. We suggest making your project more manageable by dividing it into phases. For each phase this report contains numerous recommendations and insights to assist novice practitioners. |
Tasks | |
Published | 2017-04-05 |
URL | http://arxiv.org/abs/1704.01568v1 |
PDF | http://arxiv.org/pdf/1704.01568v1.pdf |
PWC | https://paperswithcode.com/paper/best-practices-for-applying-deep-learning-to |
Repo | |
Framework | |
CRNN: A Joint Neural Network for Redundancy Detection
Title | CRNN: A Joint Neural Network for Redundancy Detection |
Authors | Xinyu Fu, Eugene Ch’ng, Uwe Aickelin, Simon See |
Abstract | This paper proposes a novel framework for detecting redundancy in supervised sentence categorisation. Unlike a traditional singleton neural network, our model incorporates a character-aware convolutional neural network (Char-CNN) with a character-aware recurrent neural network (Char-RNN) to form a convolutional recurrent neural network (CRNN). Our model benefits from Char-CNN in that only salient features are selected and fed into the integrated Char-RNN. Char-RNN effectively learns long-sequence semantics via a sophisticated update mechanism. We compare our framework against state-of-the-art text classification algorithms on four popular benchmark corpora. For instance, our model achieves competitive precision, recall, and F1 score on the Google News dataset. For the twenty-newsgroups data stream, our algorithm obtains the optimum precision, recall, and F1 score. For the Brown Corpus, our framework obtains the best F1 score and almost equivalent precision and recall to the top competitor. For the question classification collection, CRNN produces the optimal recall and F1 score and comparable precision. We also analyse the impact of three different RNN hidden recurrent cells on performance and their runtime efficiency. We observe that the MGU achieves the optimal runtime and comparable performance against the GRU and LSTM. For TFIDF-based algorithms, we experiment with word2vec, GloVe, and sent2vec embeddings and report their performance differences. |
Tasks | Text Classification |
Published | 2017-06-04 |
URL | http://arxiv.org/abs/1706.01069v1 |
PDF | http://arxiv.org/pdf/1706.01069v1.pdf |
PWC | https://paperswithcode.com/paper/crnn-a-joint-neural-network-for-redundancy |
Repo | |
Framework | |
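The composition the abstract describes (a character-level CNN selecting salient local features, whose sequence then feeds a recurrent layer) can be sketched as follows. Layer sizes, the GRU cell (the paper also compares LSTM and MGU), and the vocabulary size are illustrative assumptions.

```python
import torch
import torch.nn as nn

class CRNN(nn.Module):
    def __init__(self, n_chars=128, emb=16, conv_ch=32, hidden=64, n_classes=4):
        super().__init__()
        self.embed = nn.Embedding(n_chars, emb)
        self.conv = nn.Conv1d(emb, conv_ch, kernel_size=5, padding=2)
        self.pool = nn.MaxPool1d(kernel_size=2)    # keep only salient features
        self.rnn = nn.GRU(conv_ch, hidden, batch_first=True)
        self.out = nn.Linear(hidden, n_classes)

    def forward(self, char_ids):                    # (batch, seq_len) int64
        x = self.embed(char_ids).transpose(1, 2)    # (batch, emb, seq_len)
        x = self.pool(torch.relu(self.conv(x)))     # (batch, conv_ch, seq_len // 2)
        _, h = self.rnn(x.transpose(1, 2))          # recurrent pass over CNN features
        return self.out(h[-1])                      # class logits

logits = CRNN()(torch.randint(0, 128, (8, 100)))
print(logits.shape)                                 # torch.Size([8, 4])
```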
Empirical Risk Minimization for Stochastic Convex Optimization: $O(1/n)$- and $O(1/n^2)$-type of Risk Bounds
Title | Empirical Risk Minimization for Stochastic Convex Optimization: $O(1/n)$- and $O(1/n^2)$-type of Risk Bounds |
Authors | Lijun Zhang, Tianbao Yang, Rong Jin |
Abstract | Although there exist plentiful theories of empirical risk minimization (ERM) for supervised learning, the current theoretical understanding of ERM for a related problem, stochastic convex optimization (SCO), is limited. In this work, we strengthen the realm of ERM for SCO by exploiting smoothness and strong convexity conditions to improve the risk bounds. First, we establish an $\widetilde{O}(d/n + \sqrt{F_*/n})$ risk bound when the random function is nonnegative, convex and smooth, and the expected function is Lipschitz continuous, where $d$ is the dimensionality of the problem, $n$ is the number of samples, and $F_*$ is the minimal risk. Thus, when $F_*$ is small we obtain an $\widetilde{O}(d/n)$ risk bound, which is analogous to the $\widetilde{O}(1/n)$ optimistic rate of ERM for supervised learning. Second, if the objective function is also $\lambda$-strongly convex, we prove an $\widetilde{O}(d/n + \kappa F_*/n)$ risk bound, where $\kappa$ is the condition number, and improve it to $O(1/[\lambda n^2] + \kappa F_*/n)$ when $n=\widetilde{\Omega}(\kappa d)$. As a result, we obtain an $O(\kappa/n^2)$ risk bound under the condition that $n$ is large and $F_*$ is small, which, to the best of our knowledge, is the first $O(1/n^2)$-type of risk bound for ERM. Third, we stress that the above results are established in a unified framework, which allows us to derive new risk bounds under weaker conditions, e.g., without convexity of the random function and Lipschitz continuity of the expected function. Finally, we demonstrate that to achieve an $O(1/[\lambda n^2] + \kappa F_*/n)$ risk bound for supervised learning, the $\widetilde{\Omega}(\kappa d)$ requirement on $n$ can be replaced with $\Omega(\kappa^2)$, which is dimensionality-independent. |
Tasks | |
Published | 2017-02-07 |
URL | http://arxiv.org/abs/1702.02030v1 |
PDF | http://arxiv.org/pdf/1702.02030v1.pdf |
PWC | https://paperswithcode.com/paper/empirical-risk-minimization-for-stochastic |
Repo | |
Framework | |
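A toy instance of the setting studied, assuming least squares as the nonnegative, convex and smooth random function: the empirical risk minimizer is computed in closed form, and its excess risk shrinks when $F_*$ is small and $n$ is large, in the spirit of the bounds above. Dimensions and the data distribution are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 5, 2000
w_true = rng.standard_normal(d)
X = rng.standard_normal((n, d))
y = X @ w_true + 0.01 * rng.standard_normal(n)    # small noise, so F_* is small

# ERM for f(w; x, y) = 0.5 * (w.x - y)^2 is ordinary least squares.
w_erm = np.linalg.lstsq(X, y, rcond=None)[0]

# Empirical proxy for the excess risk F(w_erm) - F_*.
excess = 0.5 * np.mean((X @ (w_erm - w_true)) ** 2)
print(f"excess risk ~ {excess:.2e} (small when F_* is small and n is large)")
```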