July 29, 2019

3308 words 16 mins read

Paper Group AWR 135

Knowledge Graph Completion via Complex Tensor Factorization

Title Knowledge Graph Completion via Complex Tensor Factorization
Authors Théo Trouillon, Christopher R. Dance, Johannes Welbl, Sebastian Riedel, Éric Gaussier, Guillaume Bouchard
Abstract In statistical relational learning, knowledge graph completion deals with automatically understanding the structure of large knowledge graphs—labeled directed graphs—and predicting missing relationships—labeled edges. State-of-the-art embedding models propose different trade-offs between modeling expressiveness and time and space complexity. We reconcile expressiveness and complexity through the use of complex-valued embeddings and explore the link between such complex-valued embeddings and unitary diagonalization. We corroborate our approach theoretically and show that all real square matrices—thus all possible relation/adjacency matrices—are the real part of some unitarily diagonalizable matrix. This result opens the door to many other applications of square matrix factorization. Our approach based on complex embeddings is arguably simple, as it only involves a Hermitian dot product, the complex counterpart of the standard dot product between real vectors, whereas other methods resort to increasingly complicated composition functions to increase their expressiveness. The proposed complex embeddings are scalable to large datasets, as they remain linear in both space and time, while consistently outperforming alternative approaches on standard link prediction benchmarks.
Tasks Knowledge Graph Completion, Knowledge Graphs, Link Prediction, Relational Reasoning
Published 2017-02-22
URL http://arxiv.org/abs/1702.06879v2
PDF http://arxiv.org/pdf/1702.06879v2.pdf
PWC https://paperswithcode.com/paper/knowledge-graph-completion-via-complex-tensor
Repo https://github.com/ttrouill/complex
Framework none
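
The scoring function at the heart of ComplEx is compact enough to state in a few lines. Below is a minimal NumPy sketch of the Hermitian-product score described in the abstract; the random embeddings are illustrative placeholders, not trained parameters.

```python
import numpy as np

def complex_score(e_s, w_r, e_o):
    """ComplEx score: Re(<w_r, e_s, conj(e_o)>), a Hermitian trilinear product."""
    return float(np.real(np.sum(w_r * e_s * np.conj(e_o))))

rng = np.random.default_rng(0)
k = 10  # embedding dimension
e_s = rng.normal(size=k) + 1j * rng.normal(size=k)  # subject embedding
w_r = rng.normal(size=k) + 1j * rng.normal(size=k)  # relation embedding
e_o = rng.normal(size=k) + 1j * rng.normal(size=k)  # object embedding

print(complex_score(e_s, w_r, e_o))
# Swapping subject and object conjugates the product, so the real part
# generally differs: the model can represent asymmetric relations.
print(complex_score(e_o, w_r, e_s))
```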

CURE-TSR: Challenging Unreal and Real Environments for Traffic Sign Recognition

Title CURE-TSR: Challenging Unreal and Real Environments for Traffic Sign Recognition
Authors Dogancan Temel, Gukyeong Kwon, Mohit Prabhushankar, Ghassan AlRegib
Abstract In this paper, we investigate the robustness of traffic sign recognition algorithms under challenging conditions. Existing datasets are limited in terms of their size and challenging condition coverage, which motivated us to generate the Challenging Unreal and Real Environments for Traffic Sign Recognition (CURE-TSR) dataset. It includes more than two million traffic sign images that are based on real-world and simulator data. We benchmark the performance of existing solutions in real-world scenarios and analyze the performance variation with respect to challenging conditions. We show that challenging conditions can decrease the performance of baseline methods significantly, especially if these challenging conditions result in loss or misplacement of spatial information. We also investigate the effect of data augmentation and show that utilization of simulator data along with real-world data enhances the average recognition performance in real-world scenarios. The dataset is publicly available at https://ghassanalregib.com/cure-tsr/.
Tasks Data Augmentation, Traffic Sign Recognition
Published 2017-12-07
URL http://arxiv.org/abs/1712.02463v2
PDF http://arxiv.org/pdf/1712.02463v2.pdf
PWC https://paperswithcode.com/paper/cure-tsr-challenging-unreal-and-real
Repo https://github.com/olivesgatech/CURE-TSR
Framework pytorch
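
To make the per-condition analysis concrete, here is a hypothetical sketch of breaking accuracy down by challenge type and severity level. The condition names, number of classes, and label arrays are illustrative stand-ins, not the dataset's actual file or label format.

```python
import numpy as np

def accuracy_by_condition(y_true, y_pred, condition, level):
    """Break recognition accuracy down by (challenge condition, severity level)."""
    out = {}
    for c in np.unique(condition):
        for l in np.unique(level):
            mask = (condition == c) & (level == l)
            if mask.any():
                out[(str(c), int(l))] = float((y_true[mask] == y_pred[mask]).mean())
    return out

rng = np.random.default_rng(0)
n = 1000
y_true = rng.integers(0, 14, n)                        # hypothetical sign classes
y_pred = np.where(rng.random(n) < 0.7, y_true, rng.integers(0, 14, n))
condition = rng.choice(["rain", "haze", "blur"], n)    # illustrative condition names
level = rng.integers(1, 6, n)                          # illustrative severity levels
print(accuracy_by_condition(y_true, y_pred, condition, level))
```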

How to Make an Image More Memorable? A Deep Style Transfer Approach

Title How to Make an Image More Memorable? A Deep Style Transfer Approach
Authors Aliaksandr Siarohin, Gloria Zen, Cveta Majtanovic, Xavier Alameda-Pineda, Elisa Ricci, Nicu Sebe
Abstract Recent works have shown that it is possible to automatically predict intrinsic image properties like memorability. In this paper, we take a step forward by addressing the question: “Can we make an image more memorable?”. Methods for automatically increasing image memorability would have an impact in many application fields like education, gaming or advertising. Our work is inspired by the popular editing-by-applying-filters paradigm adopted in photo editing applications, like Instagram and Prisma. In this context, the problem of increasing image memorability maps to that of retrieving “memorabilizing” filters or style “seeds”. Still, users generally have to go through most of the available filters before finding the desired solution, turning the editing process into a resource- and time-consuming task. In this work, we show that it is possible to automatically retrieve the best style seeds for a given image, remarkably reducing the number of human attempts needed to find a good match. Our approach leverages recent advances in the field of image synthesis and adopts a deep architecture for generating a memorable picture from a given input image and a style seed. Importantly, to automatically select the best style, a novel learning-based solution, also relying on deep models, is proposed. Our experimental evaluation, conducted on publicly available benchmarks, demonstrates the effectiveness of the proposed approach for generating memorable images through automatic style seed selection.
Tasks Image Generation, Style Transfer
Published 2017-04-06
URL http://arxiv.org/abs/1704.01745v1
PDF http://arxiv.org/pdf/1704.01745v1.pdf
PWC https://paperswithcode.com/paper/how-to-make-an-image-more-memorable-a-deep
Repo https://github.com/aliaksandrsiarohin/mem-transfer
Framework torch
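
The retrieval step reduces to ranking candidate style seeds with a learned scorer and taking the argmax. A toy sketch, with cosine similarity standing in for the paper's deep scoring model and random vectors standing in for real image and seed features:

```python
import numpy as np

def best_style_seed(image_feat, seed_feats, scorer):
    """Rank candidate style seeds by a learned memorability-gain scorer
    and return the index of the best one."""
    scores = [scorer(image_feat, s) for s in seed_feats]
    return int(np.argmax(scores))

def toy_scorer(img, seed):
    # Stand-in for the learned deep scorer: plain cosine similarity.
    return float(img @ seed / (np.linalg.norm(img) * np.linalg.norm(seed) + 1e-8))

rng = np.random.default_rng(1)
img = rng.normal(size=128)          # hypothetical image feature
seeds = rng.normal(size=(25, 128))  # hypothetical seed features
print(best_style_seed(img, seeds, toy_scorer))
```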

Adversarial Semi-Supervised Audio Source Separation applied to Singing Voice Extraction

Title Adversarial Semi-Supervised Audio Source Separation applied to Singing Voice Extraction
Authors Daniel Stoller, Sebastian Ewert, Simon Dixon
Abstract The state of the art in music source separation employs neural networks trained in a supervised fashion on multi-track databases to estimate the sources from a given mixture. With only a few datasets available, extensive data augmentation is often used to combat overfitting. Mixing random tracks, however, can even reduce separation performance as instruments in real music are strongly correlated. The key concept in our approach is that source estimates of an optimal separator should be indistinguishable from real source signals. Based on this idea, we drive the separator towards outputs deemed realistic by discriminator networks that are trained to tell apart real from separator samples. This way, we can also use unpaired source and mixture recordings without the drawbacks of creating unrealistic music mixtures. Our framework is widely applicable as it does not assume a specific network architecture or number of sources. To our knowledge, this is the first adoption of adversarial training for music source separation. In a prototype experiment for singing voice separation, separation performance increases with our approach compared to purely supervised training.
Tasks Data Augmentation, Music Source Separation
Published 2017-10-31
URL http://arxiv.org/abs/1711.00048v2
PDF http://arxiv.org/pdf/1711.00048v2.pdf
PWC https://paperswithcode.com/paper/adversarial-semi-supervised-audio-source
Repo https://github.com/NullspaceSF/AAS
Framework tf
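
A rough sketch of how the loss composition could look: a supervised reconstruction term on paired data plus a critic-style adversarial term per source. The toy linear discriminators, shapes, and the specific adversarial loss variant are assumptions for illustration, not the paper's exact training setup.

```python
import torch
import torch.nn as nn

def make_disc(t):
    # Toy per-source discriminator: maps a waveform snippet to a realism score.
    return nn.Sequential(nn.Linear(t, 64), nn.ReLU(), nn.Linear(64, 1))

T, B, S = 256, 8, 2  # snippet length, batch size, number of sources
discs = [make_disc(T) for _ in range(S)]
est = torch.randn(B, S, T, requires_grad=True)  # separator outputs (stand-in)
ref = torch.randn(B, S, T)                      # paired ground-truth sources

recon = ((est - ref) ** 2).mean()               # supervised term (paired data)
adv = sum(-d(est[:, i, :]).mean()               # push each source towards
          for i, d in enumerate(discs))         # "realistic" per its critic
loss = recon + 0.01 * adv
loss.backward()  # gradients flow back into the separator outputs
```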

Towards better understanding of gradient-based attribution methods for Deep Neural Networks

Title Towards better understanding of gradient-based attribution methods for Deep Neural Networks
Authors Marco Ancona, Enea Ceolini, Cengiz Öztireli, Markus Gross
Abstract Understanding the flow of information in Deep Neural Networks (DNNs) is a challenging problem that has gained increasing attention over the last few years. While several methods have been proposed to explain network predictions, there have been only a few attempts to compare them from a theoretical perspective. What is more, no exhaustive empirical comparison has been performed in the past. In this work, we analyze four gradient-based attribution methods and formally prove conditions of equivalence and approximation between them. By reformulating two of these methods, we construct a unified framework which enables a direct comparison, as well as an easier implementation. Finally, we propose a novel evaluation metric, called Sensitivity-n, and test the gradient-based attribution methods alongside a simple perturbation-based attribution method on several datasets in the domains of image and text classification, using various network architectures.
Tasks Text Classification
Published 2017-11-16
URL http://arxiv.org/abs/1711.06104v4
PDF http://arxiv.org/pdf/1711.06104v4.pdf
PWC https://paperswithcode.com/paper/towards-better-understanding-of-gradient
Repo https://github.com/kundajelab/deeplift
Framework tf
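
One of the gradient-based methods the paper analyzes, Gradient * Input, fits in a few lines of PyTorch. The toy linear model below is only there to exercise the function; it is not the paper's experimental setup.

```python
import torch

def gradient_times_input(model, x, target_class):
    """Gradient * Input attribution: each input feature's gradient on the
    target-class score, scaled by the feature's own value."""
    x = x.clone().requires_grad_(True)
    score = model(x)[0, target_class]
    score.backward()
    return (x * x.grad).detach()

model = torch.nn.Sequential(torch.nn.Linear(4, 3))  # toy classifier
x = torch.randn(1, 4)
print(gradient_times_input(model, x, target_class=1))
```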

4DFAB: A Large Scale 4D Facial Expression Database for Biometric Applications

Title 4DFAB: A Large Scale 4D Facial Expression Database for Biometric Applications
Authors Shiyang Cheng, Irene Kotsia, Maja Pantic, Stefanos Zafeiriou
Abstract The progress we are currently witnessing in many computer vision applications, including automatic face analysis, would not be made possible without tremendous efforts in collecting and annotating large scale visual databases. To this end, we propose 4DFAB, a new large scale database of dynamic high-resolution 3D faces (over 1,800,000 3D meshes). 4DFAB contains recordings of 180 subjects captured in four different sessions spanning over a five-year period. It contains 4D videos of subjects displaying both spontaneous and posed facial behaviours. The database can be used for both face and facial expression recognition, as well as behavioural biometrics. It can also be used to learn very powerful blendshapes for parametrising facial behaviour. In this paper, we conduct several experiments and demonstrate the usefulness of the database for various applications. The database will be made publicly available for research purposes.
Tasks Facial Expression Recognition
Published 2017-12-05
URL http://arxiv.org/abs/1712.01443v2
PDF http://arxiv.org/pdf/1712.01443v2.pdf
PWC https://paperswithcode.com/paper/4dfab-a-large-scale-4d-facial-expression
Repo https://github.com/sw-gong/spiralnet_plus
Framework pytorch

Island Loss for Learning Discriminative Features in Facial Expression Recognition

Title Island Loss for Learning Discriminative Features in Facial Expression Recognition
Authors Jie Cai, Zibo Meng, Ahmed Shehab Khan, Zhiyuan Li, James O’Reilly, Yan Tong
Abstract Over the past few years, Convolutional Neural Networks (CNNs) have shown promise on facial expression recognition. However, the performance degrades dramatically under real-world settings due to variations introduced by subtle facial appearance changes, head pose variations, illumination changes, and occlusions. In this paper, a novel island loss (IL) is proposed to enhance the discriminative power of the deeply learned features. Specifically, the IL is designed to reduce the intra-class variations while enlarging the inter-class differences simultaneously. Experimental results on four benchmark expression databases have demonstrated that the CNN with the proposed island loss (IL-CNN) outperforms the baseline CNN models with either the traditional softmax loss or the center loss and achieves comparable or better performance compared with the state-of-the-art methods for facial expression recognition.
Tasks Facial Expression Recognition
Published 2017-10-09
URL http://arxiv.org/abs/1710.03144v3
PDF http://arxiv.org/pdf/1710.03144v3.pdf
PWC https://paperswithcode.com/paper/island-loss-for-learning-discriminative
Repo https://github.com/shanxuanchen/FacialExpressionRecognition
Framework none
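
Under the reading suggested by the abstract, the island loss decomposes into a center-loss term plus a pairwise penalty on class-center cosine similarity. A minimal NumPy sketch with that structure; the weighting constant and the random features/centers are illustrative assumptions.

```python
import numpy as np

def island_loss(features, labels, centers, lam=10.0):
    """Island loss sketch: pull features to their class centers (center loss)
    while pushing centers apart via pairwise cosine-similarity penalties."""
    center_term = 0.5 * np.sum((features - centers[labels]) ** 2)
    k = len(centers)
    pair_term = 0.0
    for j in range(k):
        for m in range(k):
            if j != m:
                cj, cm = centers[j], centers[m]
                cos = cj @ cm / (np.linalg.norm(cj) * np.linalg.norm(cm) + 1e-8)
                pair_term += cos + 1.0  # +1 keeps each pairwise term non-negative
    return center_term + lam * pair_term

rng = np.random.default_rng(0)
feats = rng.normal(size=(32, 16))        # batch of deep features (stand-in)
labels = rng.integers(0, 7, size=32)     # 7 expression classes (stand-in)
centers = rng.normal(size=(7, 16))       # learnable class centers (stand-in)
print(island_loss(feats, labels, centers))
```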

Machine learning for neural decoding

Title Machine learning for neural decoding
Authors Joshua I. Glaser, Ari S. Benjamin, Raeed H. Chowdhury, Matthew G. Perich, Lee E. Miller, Konrad P. Kording
Abstract Despite rapid advances in machine learning tools, the majority of neural decoding approaches still use traditional methods. Modern machine learning tools, which are versatile and easy to use, have the potential to significantly improve decoding performance. This tutorial describes how to effectively apply these algorithms for typical decoding problems. We provide descriptions, best practices, and code for applying common machine learning methods, including neural networks and gradient boosting. We also provide detailed comparisons of the performance of various methods at the task of decoding spiking activity in motor cortex, somatosensory cortex, and hippocampus. Modern methods, in particular neural networks and ensembles, significantly outperform traditional approaches, such as Wiener and Kalman filters. Improving the performance of neural decoding algorithms allows neuroscientists to better understand the information contained in a neural population, and can help advance engineering applications such as brain machine interfaces.
Tasks
Published 2017-08-02
URL https://arxiv.org/abs/1708.00909v3
PDF https://arxiv.org/pdf/1708.00909v3.pdf
PWC https://paperswithcode.com/paper/machine-learning-for-neural-decoding
Repo https://github.com/KordingLab/Neural_Decoding
Framework none
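
A minimal sketch of the tutorial's comparison pattern using scikit-learn: synthetic spike counts stand in for the real recordings, and plain linear regression stands in for a Wiener filter (which, on binned spike counts with history, is essentially linear regression).

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import LinearRegression

# Synthetic stand-in: binned spike counts (trials x neurons) -> decoded variable.
rng = np.random.default_rng(0)
X = rng.poisson(2.0, size=(1000, 50)).astype(float)
w = rng.normal(size=50)
y = X @ w + rng.normal(scale=5.0, size=1000)  # e.g., hand velocity

split = 800
models = {"linear (Wiener-style)": LinearRegression(),
          "gradient boosting": GradientBoostingRegressor()}
for name, m in models.items():
    m.fit(X[:split], y[:split])
    print(name, m.score(X[split:], y[split:]))  # R^2 on held-out trials
```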

The Mixing method: low-rank coordinate descent for semidefinite programming with diagonal constraints

Title The Mixing method: low-rank coordinate descent for semidefinite programming with diagonal constraints
Authors Po-Wei Wang, Wei-Cheng Chang, J. Zico Kolter
Abstract In this paper, we propose a low-rank coordinate descent approach to structured semidefinite programming with diagonal constraints. The approach, which we call the Mixing method, is extremely simple to implement, has no free parameters, and typically attains an order of magnitude or better improvement in optimization performance over the current state of the art. We show that the method is strictly decreasing, converges to a critical point, and further that for sufficient rank all non-optimal critical points are unstable. Moreover, we prove that with a suitable step size, the Mixing method converges to the global optimum of the semidefinite program almost surely at a locally linear rate under random initialization. This is the first low-rank semidefinite programming method that has been shown to achieve a global optimum on the spherical manifold without assumptions. We apply our algorithm to two related domains: solving the maximum cut semidefinite relaxation, and solving a maximum satisfiability relaxation (we also briefly consider additional applications such as learning word embeddings). In all settings, we demonstrate substantial improvement over the existing state of the art along various dimensions, and in total, this work expands the scope and scale of problems that can be solved using semidefinite programming methods.
Tasks Learning Word Embeddings, Word Embeddings
Published 2017-06-01
URL http://arxiv.org/abs/1706.00476v3
PDF http://arxiv.org/pdf/1706.00476v3.pdf
PWC https://paperswithcode.com/paper/the-mixing-method-low-rank-coordinate-descent
Repo https://github.com/locuslab/mixing
Framework none
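
The coordinate update is closed-form: each unit-norm column is replaced by the normalized negative weighted sum of the others. A NumPy sketch on a toy Max-Cut-style relaxation (the random cost matrix and rank choice are illustrative):

```python
import numpy as np

def mixing_method(C, k, iters=100, seed=0):
    """Mixing method sketch for min <C, V^T V> s.t. each column of V has
    unit norm (the low-rank form of an SDP with diagonal constraints)."""
    n = C.shape[0]
    rng = np.random.default_rng(seed)
    V = rng.normal(size=(k, n))
    V /= np.linalg.norm(V, axis=0)          # project columns onto the sphere
    for _ in range(iters):
        for i in range(n):
            g = V @ C[i]                     # sum_j C_ij v_j (C_ii assumed 0)
            norm = np.linalg.norm(g)
            if norm > 0:
                V[:, i] = -g / norm          # closed-form coordinate update
    return V

rng = np.random.default_rng(1)
A = rng.random((20, 20)); A = (A + A.T) / 2
np.fill_diagonal(A, 0)
V = mixing_method(-A, k=8)                   # toy Max-Cut-style objective
print(np.sum((V.T @ V) * (-A)))              # objective value <C, V^T V>
```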

Learning Efficient Point Cloud Generation for Dense 3D Object Reconstruction

Title Learning Efficient Point Cloud Generation for Dense 3D Object Reconstruction
Authors Chen-Hsuan Lin, Chen Kong, Simon Lucey
Abstract Conventional methods of 3D object generative modeling learn volumetric predictions using deep networks with 3D convolutional operations, which are direct analogies to classical 2D ones. However, these methods are computationally wasteful in attempting to predict 3D shapes, where information is rich only on the surfaces. In this paper, we propose a novel 3D generative modeling framework to efficiently generate object shapes in the form of dense point clouds. We use 2D convolutional operations to predict the 3D structure from multiple viewpoints and jointly apply geometric reasoning with 2D projection optimization. We introduce the pseudo-renderer, a differentiable module to approximate the true rendering operation, to synthesize novel depth maps for optimization. Experimental results for single-image 3D object reconstruction tasks show that our method outperforms state-of-the-art methods in terms of shape similarity and prediction density.
Tasks 3D Object Reconstruction, Object Reconstruction, Point Cloud Generation
Published 2017-06-21
URL http://arxiv.org/abs/1706.07036v1
PDF http://arxiv.org/pdf/1706.07036v1.pdf
PWC https://paperswithcode.com/paper/learning-efficient-point-cloud-generation-for
Repo https://github.com/chenhsuanlin/3D-point-cloud-generation
Framework tf
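
The projection idea behind pseudo-rendering can be illustrated with a plain z-buffer: scatter each point's depth into a pixel grid and keep the minimum. This is only a sketch of the geometry; the paper's actual pseudo-renderer is a differentiable module, which this non-differentiable version does not reproduce.

```python
import numpy as np

def pseudo_render(points, H=64, W=64):
    """Orthographically project 3D points in [-1,1]^3 into an H x W depth
    map, keeping the nearest depth per pixel (a z-buffer via scatter-min)."""
    depth = np.full((H, W), np.inf)
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    u = np.clip(((x + 1) / 2 * (W - 1)).astype(int), 0, W - 1)
    v = np.clip(((y + 1) / 2 * (H - 1)).astype(int), 0, H - 1)
    np.minimum.at(depth, (v, u), z)  # nearest point wins each pixel
    return depth

pts = np.random.default_rng(0).uniform(-1, 1, size=(5000, 3))
d = pseudo_render(pts)
print(np.isfinite(d).mean())  # fraction of pixels covered by the point cloud
```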

Parallel-Data-Free Voice Conversion Using Cycle-Consistent Adversarial Networks

Title Parallel-Data-Free Voice Conversion Using Cycle-Consistent Adversarial Networks
Authors Takuhiro Kaneko, Hirokazu Kameoka
Abstract We propose a parallel-data-free voice-conversion (VC) method that can learn a mapping from source to target speech without relying on parallel data. The proposed method is general purpose, high quality, and parallel-data free and works without any extra data, modules, or alignment procedure. It also avoids over-smoothing, which occurs in many conventional statistical model-based VC methods. Our method, called CycleGAN-VC, uses a cycle-consistent adversarial network (CycleGAN) with gated convolutional neural networks (CNNs) and an identity-mapping loss. A CycleGAN learns forward and inverse mappings simultaneously using adversarial and cycle-consistency losses. This makes it possible to find an optimal pseudo pair from unpaired data. Furthermore, the adversarial loss contributes to reducing over-smoothing of the converted feature sequence. We configure a CycleGAN with gated CNNs and train it with an identity-mapping loss. This allows the mapping function to capture sequential and hierarchical structures while preserving linguistic information. We evaluated our method on a parallel-data-free VC task. An objective evaluation showed that the converted feature sequence was near natural in terms of global variance and modulation spectra. A subjective evaluation showed that the quality of the converted speech was comparable to that obtained with a Gaussian mixture model-based method under advantageous conditions with parallel and twice the amount of data.
Tasks Voice Conversion
Published 2017-11-30
URL http://arxiv.org/abs/1711.11293v2
PDF http://arxiv.org/pdf/1711.11293v2.pdf
PWC https://paperswithcode.com/paper/parallel-data-free-voice-conversion-using
Repo https://github.com/eliceio/vocal-style-transfer
Framework tf
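
One direction of the objective, sketched with toy linear modules in place of the gated CNNs; the reverse direction is symmetric. The log-sigmoid adversarial term and the loss weights here are placeholder choices, so treat this as a shape-level sketch rather than the paper's exact formulation.

```python
import torch

def cyclegan_vc_losses(G_xy, G_yx, D_y, x, y, lam_cyc=10.0, lam_id=5.0):
    """One direction of a CycleGAN-VC-style objective: adversarial +
    cycle-consistency + identity-mapping losses."""
    fake_y = G_xy(x)
    adv = -torch.log(torch.sigmoid(D_y(fake_y)) + 1e-8).mean()  # fool D_y
    cyc = (G_yx(fake_y) - x).abs().mean()                       # x -> y -> x
    idt = (G_xy(y) - y).abs().mean()                            # G_xy keeps y
    return adv + lam_cyc * cyc + lam_id * idt

# Toy stand-ins for the gated-CNN generators and the discriminator.
G_xy, G_yx = torch.nn.Linear(24, 24), torch.nn.Linear(24, 24)
D_y = torch.nn.Linear(24, 1)
x, y = torch.randn(8, 24), torch.randn(8, 24)  # feature frames (stand-ins)
print(cyclegan_vc_losses(G_xy, G_yx, D_y, x, y))
```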

Multimodal Probabilistic Model-Based Planning for Human-Robot Interaction

Title Multimodal Probabilistic Model-Based Planning for Human-Robot Interaction
Authors Edward Schmerling, Karen Leung, Wolf Vollprecht, Marco Pavone
Abstract This paper presents a method for constructing human-robot interaction policies in settings where multimodality, i.e., the possibility of multiple highly distinct futures, plays a critical role in decision making. We are motivated in this work by the example of traffic weaving, e.g., at highway on-ramps/off-ramps, where entering and exiting cars must swap lanes in a short distance—a challenging negotiation even for experienced drivers due to the inherent multimodal uncertainty of who will pass whom. Our approach is to learn multimodal probability distributions over future human actions from a dataset of human-human exemplars and perform real-time robot policy construction in the resulting environment model through massively parallel sampling of human responses to candidate robot action sequences. Direct learning of these distributions is made possible by recent advances in the theory of conditional variational autoencoders (CVAEs), whereby we learn action distributions simultaneously conditioned on the present interaction history, as well as candidate future robot actions in order to take into account response dynamics. We demonstrate the efficacy of this approach with a human-in-the-loop simulation of a traffic weaving scenario.
Tasks Decision Making
Published 2017-10-25
URL http://arxiv.org/abs/1710.09483v1
PDF http://arxiv.org/pdf/1710.09483v1.pdf
PWC https://paperswithcode.com/paper/multimodal-probabilistic-model-based-planning
Repo https://github.com/StanfordASL/TrafficWeavingCVAE
Framework tf
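
The policy-construction loop is essentially: for each candidate robot action sequence, sample many human responses from the learned conditional model, average a cost, and pick the best candidate. A sketch with toy stand-ins for the CVAE sampler and the cost of a joint future:

```python
import numpy as np

def plan(candidates, sample_human, cost, n_samples=256, seed=0):
    """Sampling-based policy construction: choose the candidate robot action
    sequence with the lowest expected cost over sampled human responses."""
    rng = np.random.default_rng(seed)
    expected = []
    for a in candidates:
        costs = [cost(a, sample_human(a, rng)) for _ in range(n_samples)]
        expected.append(np.mean(costs))
    return candidates[int(np.argmin(expected))]

# Toy stand-ins: the real sampler would be the trained CVAE conditioned on
# interaction history and the candidate robot actions.
sample_human = lambda a, rng: a + rng.normal(size=a.shape)
cost = lambda a, h: float(np.sum((a - h) ** 2))
cands = [np.zeros(5), np.ones(5), 2 * np.ones(5)]
print(plan(cands, sample_human, cost))
```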

Deep Interest Network for Click-Through Rate Prediction

Title Deep Interest Network for Click-Through Rate Prediction
Authors Guorui Zhou, Chengru Song, Xiaoqiang Zhu, Ying Fan, Han Zhu, Xiao Ma, Yanghui Yan, Junqi Jin, Han Li, Kun Gai
Abstract Click-through rate prediction is an essential task in industrial applications, such as online advertising. Recently, deep learning based models have been proposed, which follow a similar Embedding&MLP paradigm. In these methods, large scale sparse input features are first mapped into low dimensional embedding vectors, then transformed into fixed-length vectors in a group-wise manner, and finally concatenated and fed into a multilayer perceptron (MLP) to learn the nonlinear relations among features. In this way, user features are compressed into a fixed-length representation vector, regardless of what the candidate ads are. This fixed-length vector becomes a bottleneck, making it difficult for Embedding&MLP methods to capture a user’s diverse interests effectively from rich historical behaviors. In this paper, we propose a novel model, the Deep Interest Network (DIN), which tackles this challenge by designing a local activation unit to adaptively learn the representation of user interests from historical behaviors with respect to a certain ad. This representation vector varies over different ads, greatly improving the expressive ability of the model. Besides, we develop two techniques, mini-batch aware regularization and a data adaptive activation function, which help in training industrial deep networks with hundreds of millions of parameters. Experiments on two public datasets as well as an Alibaba real production dataset with over 2 billion samples demonstrate the effectiveness of the proposed approaches, which achieve superior performance compared with state-of-the-art methods. DIN has now been successfully deployed in the online display advertising system at Alibaba, serving the main traffic.
Tasks Click-Through Rate Prediction
Published 2017-06-21
URL http://arxiv.org/abs/1706.06978v4
PDF http://arxiv.org/pdf/1706.06978v4.pdf
PWC https://paperswithcode.com/paper/deep-interest-network-for-click-through-rate
Repo https://github.com/johnlevi/recsys
Framework none
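
A sketch of the local activation unit: score each historical behavior embedding against the candidate ad with a small MLP, then form an ad-dependent weighted sum. The element-wise product used as the interaction feature and the MLP sizes are illustrative assumptions; only the overall weight-and-sum structure is taken from the abstract.

```python
import torch

def local_activation(behaviors, ad, att_mlp):
    """DIN-style local activation: weight each behavior embedding by its
    MLP-scored relevance to the candidate ad, then sum (no normalization)."""
    B, T, D = behaviors.shape
    ad_rep = ad.unsqueeze(1).expand(B, T, D)
    feats = torch.cat([behaviors, ad_rep, behaviors * ad_rep], dim=-1)
    w = att_mlp(feats)                   # (B, T, 1) relevance weights
    return (w * behaviors).sum(dim=1)    # ad-dependent user representation

D = 16
att_mlp = torch.nn.Sequential(torch.nn.Linear(3 * D, 32), torch.nn.ReLU(),
                              torch.nn.Linear(32, 1))
behaviors = torch.randn(4, 10, D)  # 10 past behaviors per user (stand-in)
ad = torch.randn(4, D)             # candidate ad embedding (stand-in)
print(local_activation(behaviors, ad, att_mlp).shape)  # torch.Size([4, 16])
```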

No More Discrimination: Cross City Adaptation of Road Scene Segmenters

Title No More Discrimination: Cross City Adaptation of Road Scene Segmenters
Authors Yi-Hsin Chen, Wei-Yu Chen, Yu-Ting Chen, Bo-Cheng Tsai, Yu-Chiang Frank Wang, Min Sun
Abstract Despite the recent success of deep-learning based semantic segmentation, deploying a pre-trained road scene segmenter to a city whose images are not presented in the training set would not achieve satisfactory performance due to dataset biases. Instead of collecting a large number of annotated images of each city of interest to train or refine the segmenter, we propose an unsupervised learning approach to adapt road scene segmenters across different cities. By utilizing Google Street View and its time-machine feature, we can collect unannotated images for each road scene at different times, so that the associated static-object priors can be extracted accordingly. By advancing a joint global and class-specific domain adversarial learning framework, adaptation of pre-trained segmenters to that city can be achieved without the need of any user annotation or interaction. We show that our method improves the performance of semantic segmentation in multiple cities across continents, while it performs favorably against state-of-the-art approaches requiring annotated training data.
Tasks Semantic Segmentation
Published 2017-04-27
URL http://arxiv.org/abs/1704.08509v1
PDF http://arxiv.org/pdf/1704.08509v1.pdf
PWC https://paperswithcode.com/paper/no-more-discrimination-cross-city-adaptation
Repo https://github.com/lym29/DASeg
Framework pytorch
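
Domain adversarial training of this kind is often implemented with a gradient reversal layer; the paper's joint global and class-wise framework is more elaborate, so read this only as a sketch of the core adversarial mechanism, not of their exact training procedure.

```python
import torch

class GradReverse(torch.autograd.Function):
    """Identity on the forward pass; negated, scaled gradient on the backward
    pass. Placed between the segmenter's features and a domain classifier,
    it trains the features to fool the classifier."""
    @staticmethod
    def forward(ctx, x, lam=1.0):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad):
        return -ctx.lam * grad, None

feat = torch.randn(2, 8, requires_grad=True)  # segmenter features (stand-in)
domain_head = torch.nn.Linear(8, 2)           # source vs. target classifier
logits = domain_head(GradReverse.apply(feat, 1.0))
logits.sum().backward()  # feat.grad now carries the reversed domain gradient
```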

Sliced Wasserstein Distance for Learning Gaussian Mixture Models

Title Sliced Wasserstein Distance for Learning Gaussian Mixture Models
Authors Soheil Kolouri, Gustavo K. Rohde, Heiko Hoffmann
Abstract Gaussian mixture models (GMM) are powerful parametric tools with many applications in machine learning and computer vision. Expectation maximization (EM) is the most popular algorithm for estimating the GMM parameters. However, EM guarantees only convergence to a stationary point of the log-likelihood function, which could be arbitrarily worse than the optimal solution. Inspired by the relationship between the negative log-likelihood function and the Kullback-Leibler (KL) divergence, we propose an alternative formulation for estimating the GMM parameters using the sliced Wasserstein distance, which gives rise to a new algorithm. Specifically, we propose minimizing the sliced-Wasserstein distance between the mixture model and the data distribution with respect to the GMM parameters. In contrast to the KL-divergence, the energy landscape for the sliced-Wasserstein distance is better behaved and therefore more suitable for a stochastic gradient descent scheme to obtain the optimal GMM parameters. We show that our formulation results in parameter estimates that are more robust to random initializations and demonstrate that it can estimate high-dimensional data distributions more faithfully than the EM algorithm.
Tasks
Published 2017-11-15
URL http://arxiv.org/abs/1711.05376v2
PDF http://arxiv.org/pdf/1711.05376v2.pdf
PWC https://paperswithcode.com/paper/sliced-wasserstein-distance-for-learning
Repo https://github.com/yokaze/swgmm
Framework tf
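
The distance at the heart of the method is cheap to compute between empirical samples: project both sets onto random directions and compare the sorted one-dimensional projections. A Monte Carlo NumPy sketch, assuming equal sample sizes (the number of projections is an illustrative choice):

```python
import numpy as np

def sliced_wasserstein(X, Y, n_proj=100, seed=0):
    """Monte Carlo sliced 2-Wasserstein distance between two equal-size
    samples: average 1-D W2^2 over random unit-vector projections."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    total = 0.0
    for _ in range(n_proj):
        theta = rng.normal(size=d)
        theta /= np.linalg.norm(theta)
        px, py = np.sort(X @ theta), np.sort(Y @ theta)
        total += np.mean((px - py) ** 2)  # 1-D W2^2 via order statistics
    return np.sqrt(total / n_proj)

rng = np.random.default_rng(1)
X = rng.normal(0, 1, size=(500, 2))
Y = rng.normal(3, 1, size=(500, 2))
print(sliced_wasserstein(X, Y))  # grows with the mean shift between samples
```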