January 30, 2020

Paper Group ANR 253

Common Artist Music Assistance


Title	Common Artist Music Assistance
Authors	Manish Agnihotri, Adiyta Rathod, Aditya Jajodia, Chethan Sharma
Abstract	As the number of available songs keeps growing, finding music that matches a user’s interests is crucial. Furthermore, recommendations suitable for one user may be irrelevant to another. In this paper, we propose a recommendation system for users with common-artist music listening patterns. We use the “random walk with restart” algorithm to obtain relevant recommendations and conduct experiments to find the optimal values of multiple parameters.
Tasks
Published	2019-11-17
URL	https://arxiv.org/abs/1911.07200v1
PDF	https://arxiv.org/pdf/1911.07200v1.pdf
PWC	https://paperswithcode.com/paper/common-artist-music-assistance
Repo
Framework
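
The abstract names random walk with restart but gives no details of the graph or parameters. The following is a minimal sketch of the generic algorithm, assuming an unweighted graph stored as a dictionary of neighbour lists; the function name, the `restart_prob` default and the handling of nodes without edges are illustrative assumptions, not the authors' settings.

```python
def random_walk_with_restart(adjacency, seed, restart_prob=0.15,
                             tol=1e-12, max_iter=10_000):
    """Score every node by how often a walker that keeps restarting at `seed` visits it.

    `adjacency` maps each node to a list of its neighbours (assumed format).
    """
    nodes = list(adjacency)
    # All probability mass starts on the seed node.
    scores = {node: 1.0 if node == seed else 0.0 for node in nodes}
    for _ in range(max_iter):
        updated = {node: 0.0 for node in nodes}
        for node in nodes:
            neighbours = adjacency[node]
            walk_mass = (1.0 - restart_prob) * scores[node]
            if neighbours:
                # Spread the walking mass evenly over the outgoing edges.
                for neighbour in neighbours:
                    updated[neighbour] += walk_mass / len(neighbours)
            else:
                # Assumption: a node with no edges sends its mass back to the seed.
                updated[seed] += walk_mass
        # With probability `restart_prob` the walker jumps back to the seed.
        updated[seed] += restart_prob * sum(scores.values())
        change = max(abs(updated[node] - scores[node]) for node in nodes)
        scores = updated
        if change < tol:
            break
    return scores
```

On a hypothetical user-artist graph, calling the function with a user as the seed gives higher scores to artists that the user, or users with overlapping artists, listen to; ranking the unseen artists by score would then give the recommendations.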

Dimensionality compression and expansion in Deep Neural Networks


Title	Dimensionality compression and expansion in Deep Neural Networks
Authors	Stefano Recanatesi, Matthew Farrell, Madhu Advani, Timothy Moore, Guillaume Lajoie, Eric Shea-Brown
Abstract	Datasets such as images, text, or movies are embedded in high-dimensional spaces. However, in important cases such as images of objects, the statistical structure in the data constrains samples to a manifold of dramatically lower dimensionality. Learning to identify and extract task-relevant variables from this embedded manifold is crucial when dealing with high-dimensional problems. We find that neural networks are often very effective at solving this task and investigate why. To this end, we apply state-of-the-art techniques for intrinsic dimensionality estimation to show that neural networks learn low-dimensional manifolds in two phases: first, dimensionality expansion driven by feature generation in initial layers, and second, dimensionality compression driven by the selection of task-relevant features in later layers. We model noise generated by Stochastic Gradient Descent and show how this noise balances the dimensionality of neural representations by inducing an effective regularization term in the loss. We highlight the important relationship between low-dimensional compressed representations and generalization properties of the network. Our work contributes by shedding light on the success of deep neural networks in disentangling data in high-dimensional space while achieving good generalization. Furthermore, it invites new learning strategies focused on optimizing measurable geometric properties of learned representations, beginning with their intrinsic dimensionality.
Tasks
Published	2019-06-02
URL	https://arxiv.org/abs/1906.00443v3
PDF	https://arxiv.org/pdf/1906.00443v3.pdf
PWC	https://paperswithcode.com/paper/190600443
Repo
Framework

Combination of Domain Knowledge and Deep Learning for Sentiment Analysis of Short and Informal Messages on Social Media


Title	Combination of Domain Knowledge and Deep Learning for Sentiment Analysis of Short and Informal Messages on Social Media
Authors	Khuong Vo, Tri Nguyen, Dang Pham, Mao Nguyen, Minh Truong, Trung Mai, Tho Quan
Abstract	Sentiment analysis has recently emerged as one of the major natural language processing (NLP) tasks in many applications. As social media channels (e.g. social networks or forums) have become significant sources for brands to observe user opinions about their products, this task is increasingly crucial. However, real data obtained from social media contains a high volume of short and informal messages posted by users on those channels. This kind of data is difficult for existing approaches to handle, especially those using deep learning. In this paper, we propose an approach to handle this problem. This work extends our previous work, in which we proposed combining the typical deep learning technique of Convolutional Neural Networks with domain knowledge. The combination is used for data augmentation, which yields additional training data, and for a more reasonable loss function. In this work, we further improve our architecture with various substantial enhancements, including negation-based data augmentation, transfer learning for word embeddings, the combination of word-level and character-level embeddings, and a multitask learning technique for attaching domain knowledge rules to the learning process. Those enhancements, specifically aimed at handling short and informal messages, yield a significant improvement in performance in experiments on real datasets.
Tasks	Data Augmentation, Sentiment Analysis, Transfer Learning, Word Embeddings
Published	2019-02-16
URL	https://arxiv.org/abs/1902.06050v2
PDF	https://arxiv.org/pdf/1902.06050v2.pdf
PWC	https://paperswithcode.com/paper/combination-of-domain-knowledge-and-deep
Repo
Framework

Imagine That! Leveraging Emergent Affordances for Tool Synthesis in Reaching Tasks


Title	Imagine That! Leveraging Emergent Affordances for Tool Synthesis in Reaching Tasks
Authors	Yizhe Wu, Sudhanshu Kasewa, Oliver Groth, Sasha Salter, Li Sun, Oiwi Parker Jones, Ingmar Posner
Abstract	In this paper we investigate an artificial agent’s ability to perform task-focused tool synthesis via imagination. Our motivation is to explore the richness of information captured by the latent space of an object-centric generative model – and how to exploit it. In particular, our approach employs activation maximisation of a task-based performance predictor to optimise the latent variable of a structured latent-space model in order to generate tool geometries appropriate for the task at hand. We evaluate our model using a novel dataset of synthetic reaching tasks inspired by the cognitive sciences and behavioural ecology. In doing so we examine the model’s ability to imagine tools for increasingly complex scenario types, beyond those seen during training. Our experiments demonstrate that the synthesis process modifies emergent, task-relevant object affordances in a targeted and deliberate way: the agents often specifically modify aspects of the tools which relate to meaningful (yet implicitly learned) concepts such as a tool’s length, width and configuration. Our results therefore suggest that task relevant object affordances are implicitly encoded as directions in a structured latent space shaped by experience.
Tasks
Published	2019-09-30
URL	https://arxiv.org/abs/1909.13561v2
PDF	https://arxiv.org/pdf/1909.13561v2.pdf
PWC	https://paperswithcode.com/paper/imagine-that-leveraging-emergent-affordances-1
Repo
Framework

Efficient 2.5D Hand Pose Estimation via Auxiliary Multi-Task Training for Embedded Devices


Title	Efficient 2.5D Hand Pose Estimation via Auxiliary Multi-Task Training for Embedded Devices
Authors	Prajwal Chidananda, Ayan Sinha, Adithya Rao, Douglas Lee, Andrew Rabinovich
Abstract	2D key-point estimation is an important precursor to 3D pose estimation problems for the human body and hands. In this work, we discuss the data, architecture, and training procedure necessary to deploy extremely efficient 2.5D hand pose estimation on embedded devices with a highly constrained memory and compute envelope, such as AR/VR wearables. Our 2.5D hand pose estimation consists of 2D key-point estimation of joint positions on an egocentric image, captured by a depth sensor, and lifted to 2.5D using the corresponding depth values. Our contributions are twofold: (a) We discuss data labeling and augmentation strategies, and the modules in the network architecture that collectively lead to 3% of the FLOP count and 2% of the number of parameters when compared to the state-of-the-art MobileNetV2 architecture. (b) We propose an auxiliary multi-task training strategy needed to compensate for the small capacity of the network while achieving comparable performance to MobileNetV2. Our 32-bit trained model has a memory footprint of less than 300 kilobytes and operates at more than 50 Hz with less than 35 MFLOPs.
Tasks	3D Pose Estimation, Hand Pose Estimation, Pose Estimation
Published	2019-09-12
URL	https://arxiv.org/abs/1909.05897v1
PDF	https://arxiv.org/pdf/1909.05897v1.pdf
PWC	https://paperswithcode.com/paper/efficient-25d-hand-pose-estimation-via
Repo
Framework
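
The lifting step mentioned in the abstract (2D key-points plus the corresponding depth values) can be illustrated with a short sketch. It assumes key-points in pixel coordinates `(u, v)` and a depth map stored as a list of rows; the nearest-pixel lookup and the clamping at the image border are assumptions made for illustration and are not taken from the paper.

```python
def lift_to_2_5d(keypoints_2d, depth_map):
    """Attach a depth value to each (u, v) pixel key-point.

    `depth_map[row][col]` holds the sensor depth at that pixel (assumed layout).
    """
    height = len(depth_map)
    width = len(depth_map[0])
    lifted = []
    for u, v in keypoints_2d:
        # Round to the nearest pixel and clamp to the image bounds.
        col = min(max(int(round(u)), 0), width - 1)
        row = min(max(int(round(v)), 0), height - 1)
        lifted.append((u, v, depth_map[row][col]))
    return lifted
```

Each output triple keeps the image-plane coordinates and adds the depth reading, which is what distinguishes a 2.5D pose from a full 3D pose in camera coordinates.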

Synthetic Data Generation and Adaption for Object Detection in Smart Vending Machines


Title	Synthetic Data Generation and Adaption for Object Detection in Smart Vending Machines
Authors	Kai Wang, Fuyuan Shi, Wenqi Wang, Yibing Nan, Shiguo Lian
Abstract	This paper presents an improved scheme for the generation and adaption of synthetic images for training deep Convolutional Neural Networks (CNNs) to perform the object detection task in smart vending machines. While generating synthetic data has proved to be effective for complementing the training data in supervised learning methods, challenges still exist in generating virtual images that are similar to those of complex real scenes and in minimizing redundant training data. To solve these problems, we simulate cluttered objects placed in a virtual scene and the wide-angle camera with distortions used to capture the whole scene in the data generation process, and post-process the generated images with an elaborately designed generative network to make them more similar to the real images. Various experiments have been conducted to demonstrate the efficiency of using the generated virtual images to enhance the detection precision on existing datasets with limited real training data and the generalization ability of applying the trained network to datasets collected in new environments.
Tasks	Object Detection, Synthetic Data Generation
Published	2019-04-28
URL	http://arxiv.org/abs/1904.12294v1
PDF	http://arxiv.org/pdf/1904.12294v1.pdf
PWC	https://paperswithcode.com/paper/synthetic-data-generation-and-adaption-for
Repo
Framework

Conditions on Features for Temporal Difference-Like Methods to Converge


Title	Conditions on Features for Temporal Difference-Like Methods to Converge
Authors	Marcus Hutter, Samuel Yang-Zhao, Sultan J. Majeed
Abstract	The convergence of many reinforcement learning (RL) algorithms with linear function approximation has been investigated extensively but most proofs assume that these methods converge to a unique solution. In this paper, we provide a complete characterization of non-uniqueness issues for a large class of reinforcement learning algorithms, simultaneously unifying many counter-examples to convergence in a theoretical framework. We achieve this by proving a new condition on features that can determine whether the convergence assumptions are valid or non-uniqueness holds. We consider a general class of RL methods, which we call natural algorithms, whose solutions are characterized as the fixed point of a projected Bellman equation (when it exists); notably, bootstrapped temporal difference-based methods such as $TD(\lambda)$ and $GTD(\lambda)$ are natural algorithms. Our main result proves that natural algorithms converge to the correct solution if and only if all the value functions in the approximation space satisfy a certain shape. This implies that natural algorithms are, in general, inherently prone to converge to the wrong solution for most feature choices even if the value function can be represented exactly. Given our results, we show that state aggregation based features are a safe choice for natural algorithms and we also provide a condition for finding convergent algorithms under other feature constructions.
Tasks
Published	2019-05-28
URL	https://arxiv.org/abs/1905.11702v1
PDF	https://arxiv.org/pdf/1905.11702v1.pdf
PWC	https://paperswithcode.com/paper/conditions-on-features-for-temporal
Repo
Framework
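
For readers unfamiliar with the methods the abstract refers to, here is a minimal sketch of $TD(\lambda)$ with linear function approximation and accumulating eligibility traces. It is the textbook algorithm, not the paper's analysis; the episode format `(state, reward, next_state)` with `None` marking termination, and all default parameter values, are assumptions.

```python
def td_lambda(episodes, features, n_weights, alpha=0.1, gamma=1.0, lam=0.8):
    """Estimate state values v(s) = w . features[s] from sampled episodes.

    `episodes` is a list of episodes; each episode is a list of
    (state, reward, next_state) transitions, with next_state None at the end.
    """
    w = [0.0] * n_weights

    def value(state):
        # Terminal states have value zero by convention.
        if state is None:
            return 0.0
        return sum(wi * xi for wi, xi in zip(w, features[state]))

    for episode in episodes:
        trace = [0.0] * n_weights  # eligibility trace, reset every episode
        for state, reward, next_state in episode:
            # Temporal-difference error of this transition.
            delta = reward + gamma * value(next_state) - value(state)
            # Decay the trace and add the current feature vector.
            trace = [gamma * lam * zi + xi
                     for zi, xi in zip(trace, features[state])]
            for i in range(n_weights):
                w[i] += alpha * delta * trace[i]
    return w
```

With one-hot features, which correspond to the simplest form of state aggregation (one state per group), the weights are the state values themselves. On a deterministic three-state chain that pays a reward of 1 on the last step, the learned weights approach 1 for every state.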

VisionISP: Repurposing the Image Signal Processor for Computer Vision Applications


Title	VisionISP: Repurposing the Image Signal Processor for Computer Vision Applications
Authors	Chyuan-Tyng Wu, Leo F. Isikdogan, Sushma Rao, Bhavin Nayak, Timo Gerasimow, Aleksandar Sutic, Liron Ain-kedem, Gilad Michael
Abstract	Traditional image signal processors (ISPs) are primarily designed and optimized to improve the image quality perceived by humans. However, optimal perceptual image quality does not always translate into optimal performance for computer vision applications. We propose a set of methods, which we collectively call VisionISP, to repurpose the ISP for machine consumption. VisionISP significantly reduces data transmission needs by reducing the bit-depth and resolution while preserving the relevant information. The blocks in VisionISP are simple, content-aware, and trainable. Experimental results show that VisionISP boosts the performance of a subsequent computer vision system trained to detect objects in an autonomous driving setting. The results demonstrate the potential and the practicality of VisionISP for computer vision applications.
Tasks	Autonomous Driving
Published	2019-11-14
URL	https://arxiv.org/abs/1911.05931v1
PDF	https://arxiv.org/pdf/1911.05931v1.pdf
PWC	https://paperswithcode.com/paper/visionisp-repurposing-the-image-signal
Repo
Framework

Learning Continually from Low-shot Data Stream


Title	Learning Continually from Low-shot Data Stream
Authors	Canyu Le, Xihan Wei, Biao Wang, Lei Zhang, Zhonggui Chen
Abstract	While deep learning has achieved remarkable results in various applications, it is usually data hungry and struggles to learn from non-stationary data streams. To overcome these two limitations, a deep learning model should not only be able to learn from a small amount of data, but also incrementally learn new concepts from a data stream over time without forgetting previous knowledge. Little of the literature addresses both problems simultaneously. In this work, we propose a novel approach, MetaCL, which enables neural networks to effectively learn meta knowledge from a low-shot data stream without catastrophic forgetting. MetaCL trains a model to exploit the intrinsic features of data (i.e. meta knowledge) and dynamically penalizes changes to important model parameters to preserve learned knowledge. In this way, the deep learning model can efficiently obtain new knowledge from a small volume of data and still maintain high performance on previous tasks. MetaCL is conceptually simple, easy to implement and model-agnostic. We implement our method on three recent regularization-based methods. Extensive experiments show that our approach leads to state-of-the-art performance on image classification benchmarks.
Tasks	Image Classification
Published	2019-08-27
URL	https://arxiv.org/abs/1908.10223v2
PDF	https://arxiv.org/pdf/1908.10223v2.pdf
PWC	https://paperswithcode.com/paper/learning-continually-from-low-shot-data
Repo
Framework

CONAN: Complementary Pattern Augmentation for Rare Disease Detection


Title	CONAN: Complementary Pattern Augmentation for Rare Disease Detection
Authors	Limeng Cui, Siddharth Biswal, Lucas M. Glass, Greg Lever, Jimeng Sun, Cao Xiao
Abstract	Rare diseases affect hundreds of millions of people worldwide but are hard to detect since they have extremely low prevalence rates (varying from 1/1,000 to 1/200,000 patients) and are massively underdiagnosed. How do we reliably detect rare diseases with such low prevalence rates? How can we further leverage patients with a possibly uncertain diagnosis to improve detection? In this paper, we propose a Complementary pattern Augmentation (CONAN) framework for rare disease detection. CONAN combines ideas from both adversarial training and max-margin classification. It first learns a self-attentive and hierarchical embedding for patient pattern characterization. Then, we develop a complementary generative adversarial network (GAN) model to generate candidate positive and negative samples from the uncertain patients by encouraging a max-margin between classes. In addition, CONAN has a disease detector that serves as the discriminator during the adversarial training for identifying rare diseases. We evaluated CONAN on two disease detection tasks. For low-prevalence inflammatory bowel disease (IBD) detection, CONAN achieved 0.96 precision-recall area under the curve (PR-AUC) and a 50.1% relative improvement over the best baseline. For rare disease idiopathic pulmonary fibrosis (IPF) detection, CONAN achieved 0.22 PR-AUC with a 41.3% relative improvement over the best baseline.
Tasks
Published	2019-11-26
URL	https://arxiv.org/abs/1911.13232v1
PDF	https://arxiv.org/pdf/1911.13232v1.pdf
PWC	https://paperswithcode.com/paper/conan-complementary-pattern-augmentation-for
Repo
Framework

Iterative Hessian Sketch in Input Sparsity Time


Title	Iterative Hessian Sketch in Input Sparsity Time
Authors	Graham Cormode, Charlie Dickens
Abstract	Scalable algorithms that solve optimization and regression tasks, even approximately, are needed to work with large datasets. In this paper we study efficient techniques from matrix sketching to solve a variety of convex constrained regression problems. We adopt “Iterative Hessian Sketching” (IHS) and show that the fast CountSketch and sparse Johnson-Lindenstrauss Transforms yield state-of-the-art accuracy guarantees under IHS, while drastically improving the time cost. As a result, we obtain significantly faster algorithms for constrained regression, for both sparse and dense inputs. Our empirical results show that we can summarize data roughly 100x faster for sparse data, and, surprisingly, 10x faster on dense data! Consequently, solutions accurate to within machine precision of the optimal solution can be found much faster than the previous state of the art.
Tasks
Published	2019-10-30
URL	https://arxiv.org/abs/1910.14166v1
PDF	https://arxiv.org/pdf/1910.14166v1.pdf
PWC	https://paperswithcode.com/paper/iterative-hessian-sketch-in-input-sparsity
Repo
Framework
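
The CountSketch transform named in the abstract is simple enough to sketch directly. The version below compresses an n x d matrix to `sketch_size` rows by adding each input row, with a random sign, into one randomly chosen output row. It covers only the sketching step, not the iterative Hessian sketch solver; the function name, the list-of-rows matrix format and the use of a seeded `random.Random` in place of hash functions are assumptions.

```python
import random


def countsketch(rows, sketch_size, seed=0):
    """Compress a matrix given as a list of equal-length rows to `sketch_size` rows."""
    rng = random.Random(seed)
    n_cols = len(rows[0])
    sketch = [[0.0] * n_cols for _ in range(sketch_size)]
    for row in rows:
        bucket = rng.randrange(sketch_size)  # which output row receives this input row
        sign = rng.choice((-1.0, 1.0))       # random sign applied to the whole row
        for j, value in enumerate(row):
            if value != 0.0:
                # Only nonzero entries cost any work.
                sketch[bucket][j] += sign * value
    return sketch
```

Because each nonzero entry is touched once, the cost is proportional to the number of nonzeros of the input, which is the sense of "input sparsity time" in the paper's title.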

Stochastic Gradients for Large-Scale Tensor Decomposition


Title	Stochastic Gradients for Large-Scale Tensor Decomposition
Authors	Tamara G. Kolda, David Hong
Abstract	Tensor decomposition is a well-known tool for multiway data analysis. This work proposes using stochastic gradients for efficient generalized canonical polyadic (GCP) tensor decomposition of large-scale tensors. GCP tensor decomposition is a recently proposed version of tensor decomposition that allows for a variety of loss functions such as Bernoulli loss for binary data or Huber loss for robust estimation. The stochastic gradient is formed from randomly sampled elements of the tensor and is efficient because it can be computed using the sparse matricized-tensor-times-Khatri-Rao product (MTTKRP) tensor kernel. For dense tensors, we simply use uniform sampling. For sparse tensors, we propose two types of stratified sampling that give precedence to sampling nonzeros. Numerical results demonstrate the advantages of the proposed approach and its scalability to large-scale problems.
Tasks
Published	2019-06-04
URL	https://arxiv.org/abs/1906.01687v2
PDF	https://arxiv.org/pdf/1906.01687v2.pdf
PWC	https://paperswithcode.com/paper/stochastic-gradients-for-large-scale-tensor
Repo
Framework
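
To make the idea of a sampled gradient concrete, the sketch below estimates the gradient of an elementwise loss for a three-way tensor using uniformly sampled entries, the scheme the abstract describes for dense tensors. It accumulates the gradient entry by entry and does not use the MTTKRP kernel; the function name, the nested-list tensor format and the `loss_grad` callback (the derivative of the loss with respect to the model value) are assumptions.

```python
import random


def gcp_stochastic_gradient(tensor, factors, n_samples, loss_grad, rng):
    """Estimate gradients of sum_ijk f(x_ijk, m_ijk) with respect to the factors.

    `factors` = (A, B, C), each a list of rows of length `rank`;
    the model value is m_ijk = sum_r A[i][r] * B[j][r] * C[k][r].
    """
    a, b, c = factors
    n_i, n_j, n_k = len(a), len(b), len(c)
    rank = len(a[0])
    grads = [[[0.0] * rank for _ in range(dim)] for dim in (n_i, n_j, n_k)]
    # Rescale so the sampled sum is an unbiased estimate of the full sum.
    scale = (n_i * n_j * n_k) / n_samples
    for _ in range(n_samples):
        i, j, k = rng.randrange(n_i), rng.randrange(n_j), rng.randrange(n_k)
        model = sum(a[i][r] * b[j][r] * c[k][r] for r in range(rank))
        y = scale * loss_grad(tensor[i][j][k], model)
        for r in range(rank):
            # Chain rule: d m_ijk / d A[i][r] = B[j][r] * C[k][r], and so on.
            grads[0][i][r] += y * b[j][r] * c[k][r]
            grads[1][j][r] += y * a[i][r] * c[k][r]
            grads[2][k][r] += y * a[i][r] * b[j][r]
    return grads
```

Swapping `loss_grad` changes the loss without touching the rest of the routine: `lambda x, m: 2 * (m - x)` corresponds to squared error, and a Bernoulli or Huber derivative could be passed in the same way.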

DRNet: Dissect and Reconstruct the Convolutional Neural Network via Interpretable Manners


Title	DRNet: Dissect and Reconstruct the Convolutional Neural Network via Interpretable Manners
Authors	Xiaolong Hu, Zhulin An, Chuanguang Yang, Hui Zhu, Kaiqaing Xu, Yongjun Xu
Abstract	Convolutional neural networks (ConvNets) are widely used in real life. People usually use ConvNets that are pre-trained on a fixed number of classes. However, different application scenarios usually do not need all of the classes, which means ConvNets are redundant when dealing with these tasks. This paper focuses on the redundancy of ConvNet channels. We propose a novel idea: using an interpretable manner to find the most important channels for every single class (dissect), and dynamically running channels according to the classes needed (reconstruct). For VGG16 pre-trained on CIFAR-10, we run only 11% of the parameters for two-class sub-tasks on average with negligible accuracy loss. For VGG16 pre-trained on ImageNet, our method gains 14.29% accuracy on average for two-class sub-tasks. In addition, analysis shows that our method captures some semantic meanings of channels, and uses context information in a more targeted way for sub-tasks of ConvNets.
Tasks
Published	2019-11-20
URL	https://arxiv.org/abs/1911.08691v2
PDF	https://arxiv.org/pdf/1911.08691v2.pdf
PWC	https://paperswithcode.com/paper/drnet-dissect-and-reconstruct-the
Repo
Framework

Speeding Up Natural Language Parsing by Reusing Partial Results


Title	Speeding Up Natural Language Parsing by Reusing Partial Results
Authors	Michalina Strzyz, Carlos Gómez-Rodríguez
Abstract	This paper proposes a novel technique that applies case-based reasoning in order to generate templates for reusable parse tree fragments, based on PoS tags of bigrams and trigrams that demonstrate low variability in their syntactic analyses from prior data. The aim of this approach is to improve the speed of dependency parsers by avoiding redundant calculations. This is achieved by applying the predefined templates that capture results of previous syntactic analyses and directly assigning the stored structure to a new n-gram that matches one of the templates, instead of parsing a similar text fragment again. The study shows that using a heuristic approach to select and reuse the partial results increases parsing speed by reducing the input length to be processed by a parser. The increase in parsing speed comes at some expense of accuracy. Experiments on English show promising results: the input dimension can be reduced by more than 20% at the cost of less than 3 points of Unlabeled Attachment Score.
Tasks
Published	2019-04-06
URL	http://arxiv.org/abs/1904.03417v1
PDF	http://arxiv.org/pdf/1904.03417v1.pdf
PWC	https://paperswithcode.com/paper/speeding-up-natural-language-parsing-by
Repo
Framework

Learning 3D-aware Egocentric Spatial-Temporal Interaction via Graph Convolutional Networks


Title	Learning 3D-aware Egocentric Spatial-Temporal Interaction via Graph Convolutional Networks
Authors	Chengxi Li, Yue Meng, Stanley H. Chan, Yi-Ting Chen
Abstract	To enable intelligent automated driving systems, a promising strategy is to understand how humans drive and interact with road users in complicated driving situations. In this paper, we propose a 3D-aware egocentric spatial-temporal interaction framework for automated driving applications. Graph convolutional networks (GCNs) are devised for interaction modeling. We introduce three novel concepts into GCNs. First, we decompose egocentric interactions into ego-thing and ego-stuff interactions, modeled by two GCNs. In both GCNs, ego nodes are introduced to encode the interaction between thing objects (e.g., car and pedestrian), and the interaction between stuff objects (e.g., lane marking and traffic light). Second, objects’ 3D locations are explicitly incorporated into the GCNs to better model egocentric interactions. Third, to implement ego-stuff interaction in the GCNs, we propose a MaskAlign operation to extract features for irregular objects. We validate the proposed framework on tactical driver behavior recognition. Extensive experiments are conducted using the Honda Research Institute Driving Dataset, the largest dataset with diverse tactical driver behavior annotations. Our framework demonstrates a substantial performance boost over baselines on the two experimental settings, by 3.9% and 6.0%, respectively. Furthermore, we visualize the learned affinity matrices, which encode ego-thing and ego-stuff interactions, to show that the proposed framework can capture interactions effectively.
Tasks
Published	2019-09-20
URL	https://arxiv.org/abs/1909.09272v3
PDF	https://arxiv.org/pdf/1909.09272v3.pdf
PWC	https://paperswithcode.com/paper/learning-3d-aware-egocentric-spatial-temporal
Repo
Framework
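
As background for the graph convolution building block the abstract relies on, here is a minimal sketch of one generic graph convolution layer with self-loops, symmetric degree normalization and a ReLU. It does not model the paper's ego nodes, 3D location inputs or MaskAlign operation; the fixed `adjacency` matrix stands in for the learned affinity matrices, and all names are assumptions.

```python
import math


def gcn_layer(adjacency, features, weights):
    """Apply one graph convolution: ReLU(D^-1/2 (A + I) D^-1/2 X W).

    `adjacency` is n x n, `features` is n x f, `weights` is f x o (lists of rows).
    """
    n = len(adjacency)
    n_in, n_out = len(weights), len(weights[0])
    # Add self-loops so each node keeps its own features.
    a_hat = [[adjacency[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    degree = [sum(row) for row in a_hat]
    # Symmetric normalization by the square roots of the node degrees.
    norm = [[a_hat[i][j] / math.sqrt(degree[i] * degree[j]) for j in range(n)]
            for i in range(n)]
    # Aggregate neighbour features, then apply the linear map and ReLU.
    aggregated = [[sum(norm[i][k] * features[k][f] for k in range(n))
                   for f in range(n_in)] for i in range(n)]
    return [[max(0.0, sum(aggregated[i][f] * weights[f][o] for f in range(n_in)))
             for o in range(n_out)] for i in range(n)]
```

In a setting like the one described, each node would carry the feature vector of an object or of the ego vehicle, and the output rows would be the interaction-aware node features passed on to the behavior classifier.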