October 19, 2019

Paper Group ANR 115

Neural Melody Composition from Lyrics


Title	Neural Melody Composition from Lyrics
Authors	Hangbo Bao, Shaohan Huang, Furu Wei, Lei Cui, Yu Wu, Chuanqi Tan, Songhao Piao, Ming Zhou
Abstract	In this paper, we study a novel task that learns to compose music from natural language. Given the lyrics as input, we propose a melody composition model that generates lyrics-conditional melody as well as the exact alignment between the generated melody and the given lyrics simultaneously. More specifically, we develop the melody composition model based on the sequence-to-sequence framework. It consists of two neural encoders to encode the current lyrics and the context melody respectively, and a hierarchical decoder to jointly produce musical notes and the corresponding alignment. Experimental results on lyrics-melody pairs of 18,451 pop songs demonstrate the effectiveness of our proposed methods. In addition, we apply singing voice synthesizer software to synthesize the “singing” of the lyrics and melodies for human evaluation. Results indicate that our generated melodies are more melodious and tuneful than those of the baseline method.
Tasks
Published	2018-09-12
URL	http://arxiv.org/abs/1809.04318v1
PDF	http://arxiv.org/pdf/1809.04318v1.pdf
PWC	https://paperswithcode.com/paper/neural-melody-composition-from-lyrics
Repo
Framework

An Optimal Transport View on Generalization


Title	An Optimal Transport View on Generalization
Authors	Jingwei Zhang, Tongliang Liu, Dacheng Tao
Abstract	We derive upper bounds on the generalization error of learning algorithms based on their algorithmic transport cost: the expected Wasserstein distance between the output hypothesis and the output hypothesis conditioned on an input example. The bounds provide a novel approach to study the generalization of learning algorithms from an optimal transport view and impose fewer constraints on the loss function, such as being sub-Gaussian or bounded. We further provide several upper bounds on the algorithmic transport cost in terms of total variation distance, relative entropy (or KL-divergence), and VC dimension, thus further bridging optimal transport theory and information theory with statistical learning theory. Moreover, we also study different conditions for loss functions under which the generalization error of a learning algorithm can be upper bounded by different probability metrics between distributions relating to the output hypothesis and/or the input data. Finally, under our established framework, we analyze the generalization in deep learning and conclude that the generalization error in deep neural networks (DNNs) decreases exponentially to zero as the number of layers increases. Our analyses of generalization error in deep learning mainly exploit the hierarchical structure in DNNs and the contraction property of $f$-divergence, which may be of independent interest in analyzing other learning models with hierarchical structure.
Tasks
Published	2018-11-08
URL	http://arxiv.org/abs/1811.03270v1
PDF	http://arxiv.org/pdf/1811.03270v1.pdf
PWC	https://paperswithcode.com/paper/an-optimal-transport-view-on-generalization
Repo
Framework
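
The abstract above builds its bounds on the Wasserstein distance. As a rough illustration only (not the paper's algorithmic transport cost or any of its bounds), the following sketch computes the 1-Wasserstein distance between two equal-size one-dimensional samples, which reduces to the mean absolute difference of the sorted values. The function name `wasserstein_1d` is a hypothetical choice for this example.

```python
def wasserstein_1d(xs, ys):
    """1-Wasserstein distance between two empirical 1D distributions.

    Assumes both samples have the same size and uniform weights; in that
    case the optimal transport plan matches the i-th smallest value of one
    sample with the i-th smallest value of the other.
    """
    if len(xs) != len(ys) or not xs:
        raise ValueError("samples must be non-empty and of equal size")
    a, b = sorted(xs), sorted(ys)
    return sum(abs(p - q) for p, q in zip(a, b)) / len(a)


if __name__ == "__main__":
    # Shifting every point by 1 moves each unit of mass a distance of 1.
    print(wasserstein_1d([0.0, 1.0, 2.0], [1.0, 2.0, 3.0]))
```

Sorting works here only because the samples are one-dimensional; in higher dimensions the optimal matching generally needs a linear-programming or assignment solver.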

Spatial Knowledge Distillation to aid Visual Reasoning


Title	Spatial Knowledge Distillation to aid Visual Reasoning
Authors	Somak Aditya, Rudra Saha, Yezhou Yang, Chitta Baral
Abstract	For tasks involving language and vision, the current state-of-the-art methods tend not to leverage any additional information that might be present to gather relevant (commonsense) knowledge. A representative task is Visual Question Answering where large diagnostic datasets have been proposed to test a system’s capability of answering questions about images. The training data is often accompanied by annotations of individual object properties and spatial locations. In this work, we take a step towards integrating this additional privileged information in the form of spatial knowledge to aid in visual reasoning. We propose a framework that combines recent advances in knowledge distillation (teacher-student framework), relational reasoning and probabilistic logical languages to incorporate such knowledge in existing neural networks for the task of Visual Question Answering. Specifically, for a question posed against an image, we use a probabilistic logical language to encode the spatial knowledge and the spatial understanding about the question in the form of a mask that is directly provided to the teacher network. The student network learns from the ground-truth information as well as the teacher’s prediction via distillation. We also demonstrate the impact of predicting such a mask inside the teacher’s network using attention. Empirically, we show that both methods improve the test accuracy over a state-of-the-art approach on a publicly available dataset.
Tasks	Question Answering, Relational Reasoning, Visual Question Answering, Visual Reasoning
Published	2018-12-10
URL	http://arxiv.org/abs/1812.03631v2
PDF	http://arxiv.org/pdf/1812.03631v2.pdf
PWC	https://paperswithcode.com/paper/spatial-knowledge-distillation-to-aid-visual
Repo
Framework
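
The abstract above says the student learns from both the ground truth and the teacher's prediction. A common way to express that idea is a weighted sum of a hard-label cross-entropy and a soft-label cross-entropy against temperature-softened teacher outputs. The sketch below shows this generic formulation; the weighting `alpha`, the `temperature`, and all function names are assumptions for illustration, and the paper's actual loss (including how the spatial mask enters the teacher) may differ.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax with an optional temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(target, predicted):
    """Cross-entropy H(target, predicted) for two probability vectors."""
    return -sum(t * math.log(p) for t, p in zip(target, predicted) if t > 0)


def distillation_loss(student_logits, teacher_logits, label,
                      alpha=0.5, temperature=2.0):
    """Blend of ground-truth loss and teacher-matching loss.

    alpha = 0 uses only the ground-truth label; alpha = 1 uses only the
    teacher's softened prediction. Both knobs are illustrative defaults.
    """
    one_hot = [1.0 if i == label else 0.0 for i in range(len(student_logits))]
    hard = cross_entropy(one_hot, softmax(student_logits))
    soft = cross_entropy(softmax(teacher_logits, temperature),
                         softmax(student_logits, temperature))
    return (1.0 - alpha) * hard + alpha * soft


if __name__ == "__main__":
    # A student that agrees with the teacher is penalized less.
    print(distillation_loss([2.0, 0.0], [2.0, 0.0], label=0))
    print(distillation_loss([0.0, 2.0], [2.0, 0.0], label=0))
```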

Consistency-aware Shading Orders Selective Fusion for Intrinsic Image Decomposition


Title	Consistency-aware Shading Orders Selective Fusion for Intrinsic Image Decomposition
Authors	Yuanliu Liu, Ang Li, Zejian Yuan, Badong Chen, Nanning Zheng
Abstract	We address the problem of decomposing a single image into reflectance and shading. The difficulty comes from the fact that the components of an image (the surface albedo, the direct illumination, and the ambient illumination) are coupled heavily in the observed image. We propose to infer the shading by ordering pixels by their relative brightness, without knowing the absolute values of the image components beforehand. The pairwise shading orders are estimated in two ways: brightness order and low-order fittings of local shading field. The brightness order is a non-local measure, which can be applied to any pair of pixels including those whose reflectance and shading are both different. The low-order fittings are used for pixel pairs within local regions of smooth shading. Together, they can capture both global order structure and local variations of the shading. We propose a Consistency-aware Selective Fusion (CSF) to integrate the pairwise orders into a globally consistent order. The iterative selection process solves the conflicts between the pairwise orders obtained by different estimation methods. Inconsistent or unreliable pairwise orders will be automatically excluded from the fusion to avoid polluting the global order. Experiments on the MIT Intrinsic Image dataset show that the proposed model is effective at recovering the shading including deep shadows. Our model also works well on natural images from the IIW dataset, the UIUC Shadow dataset and the NYU-Depth dataset, where the colors of direct lights and ambient lights are quite different.
Tasks	Intrinsic Image Decomposition
Published	2018-10-23
URL	http://arxiv.org/abs/1810.09706v1
PDF	http://arxiv.org/pdf/1810.09706v1.pdf
PWC	https://paperswithcode.com/paper/consistency-aware-shading-orders-selective
Repo
Framework

Discovering Spatio-Temporal Action Tubes


Title	Discovering Spatio-Temporal Action Tubes
Authors	Yuancheng Ye, Xiaodong Yang, Yingli Tian
Abstract	In this paper, we address the challenging problem of spatial and temporal action detection in videos. We first develop an effective approach to localize frame-level action regions through integrating static and kinematic information by the early- and late-fusion detection scheme. With the intention of exploring important temporal connections among the detected action regions, we propose a tracking-by-point-matching algorithm to stitch the discrete action regions into a continuous spatio-temporal action tube. A recurrent 3D convolutional neural network (R3DCNN) is used to predict action categories and determine temporal boundaries of the generated tubes. We then introduce an action footprint map to refine the candidate tubes based on the action-specific spatial characteristics preserved in the convolutional layers of R3DCNN. In extensive experiments, our method achieves superior detection results on three public benchmark datasets: UCFSports, J-HMDB and UCF101.
Tasks	Action Detection
Published	2018-11-29
URL	http://arxiv.org/abs/1811.12248v1
PDF	http://arxiv.org/pdf/1811.12248v1.pdf
PWC	https://paperswithcode.com/paper/discovering-spatio-temporal-action-tubes
Repo
Framework

CGIntrinsics: Better Intrinsic Image Decomposition through Physically-Based Rendering


Title	CGIntrinsics: Better Intrinsic Image Decomposition through Physically-Based Rendering
Authors	Zhengqi Li, Noah Snavely
Abstract	Intrinsic image decomposition is a challenging, long-standing computer vision problem for which ground truth data is very difficult to acquire. We explore the use of synthetic data for training CNN-based intrinsic image decomposition models, then applying these learned models to real-world images. To that end, we present CGIntrinsics, a new, large-scale dataset of physically-based rendered images of scenes with full ground truth decompositions. The rendering process we use is carefully designed to yield high-quality, realistic images, which we find to be crucial for this problem domain. We also propose a new end-to-end training method that learns better decompositions by leveraging CGIntrinsics, and optionally IIW and SAW, two recent datasets of sparse annotations on real-world images. Surprisingly, we find that a decomposition network trained solely on our synthetic data outperforms the state-of-the-art on both IIW and SAW, and performance improves even further when IIW and SAW data is added during training. Our work demonstrates the surprising effectiveness of carefully-rendered synthetic data for the intrinsic images task.
Tasks	Intrinsic Image Decomposition
Published	2018-08-26
URL	http://arxiv.org/abs/1808.08601v3
PDF	http://arxiv.org/pdf/1808.08601v3.pdf
PWC	https://paperswithcode.com/paper/cgintrinsics-better-intrinsic-image
Repo
Framework

Joint Learning of Intrinsic Images and Semantic Segmentation


Title	Joint Learning of Intrinsic Images and Semantic Segmentation
Authors	Anil S. Baslamisli, Thomas T. Groenestege, Partha Das, Hoang-An Le, Sezer Karaoglu, Theo Gevers
Abstract	Semantic segmentation of outdoor scenes is problematic when there are variations in imaging conditions. It is known that albedo (reflectance) is invariant to all kinds of illumination effects. Thus, using reflectance images for the semantic segmentation task can be favorable. Additionally, not only may segmentation benefit from reflectance, but segmentation may also be useful for reflectance computation. Therefore, in this paper, the tasks of semantic segmentation and intrinsic image decomposition are considered as a combined process by exploring their mutual relationship in a joint fashion. To that end, we propose a supervised end-to-end CNN architecture to jointly learn intrinsic image decomposition and semantic segmentation. We analyze the gains of addressing those two problems jointly. Moreover, new cascade CNN architectures for intrinsic-for-segmentation and segmentation-for-intrinsic are proposed as single tasks. Furthermore, a dataset of 35K synthetic images of natural environments is created with corresponding albedo and shading (intrinsics), as well as semantic labels (segmentation) assigned to each object/scene. The experiments show that joint learning of intrinsic image decomposition and semantic segmentation is beneficial for both tasks for natural scenes. Dataset and models are available at: https://ivi.fnwi.uva.nl/cv/intrinseg
Tasks	Intrinsic Image Decomposition, Semantic Segmentation
Published	2018-07-31
URL	http://arxiv.org/abs/1807.11857v1
PDF	http://arxiv.org/pdf/1807.11857v1.pdf
PWC	https://paperswithcode.com/paper/joint-learning-of-intrinsic-images-and
Repo
Framework

Loop Restricted Existential Rules and First-order Rewritability for Query Answering


Title	Loop Restricted Existential Rules and First-order Rewritability for Query Answering
Authors	Vernon Asuncion, Yan Zhang, Heng Zhang, Yun Bai, Weisheng Si
Abstract	In ontology-based data access (OBDA), the classical database is enhanced with an ontology in the form of logical assertions generating new intensional knowledge. A powerful form of such logical assertions is the tuple-generating dependencies (TGDs), also called existential rules, where Horn rules are extended by allowing existential quantifiers to appear in the rule heads. In this paper we introduce a new language called loop restricted (LR) TGDs (existential rules), which are TGDs with certain restrictions on the loops embedded in the underlying rule set. We study the complexity of this new language. We show that the conjunctive query answering (CQA) under the LR TGDs is decidable. In particular, we prove that this language satisfies the so-called bounded derivation-depth property (BDDP), which implies that the CQA is first-order rewritable, and its data complexity is in AC0. We also prove that the combined complexity of the CQA is EXPTIME complete, while the language membership is PSPACE complete. Then we extend the LR TGDs language to the generalised loop restricted (GLR) TGDs language, and prove that this class of TGDs remains first-order rewritable and properly contains most of the other first-order rewritable TGDs classes discovered in the literature so far.
Tasks
Published	2018-04-19
URL	http://arxiv.org/abs/1804.07099v2
PDF	http://arxiv.org/pdf/1804.07099v2.pdf
PWC	https://paperswithcode.com/paper/loop-restricted-existential-rules-and-first
Repo
Framework

Deep Hybrid Real and Synthetic Training for Intrinsic Decomposition


Title	Deep Hybrid Real and Synthetic Training for Intrinsic Decomposition
Authors	Sai Bi, Nima Khademi Kalantari, Ravi Ramamoorthi
Abstract	Intrinsic image decomposition is the process of separating the reflectance and shading layers of an image, which is a challenging and underdetermined problem. In this paper, we propose to systematically address this problem using a deep convolutional neural network (CNN). Although deep learning (DL) has been recently used to handle this application, the current DL methods train the network only on synthetic images as obtaining ground truth reflectance and shading for real images is difficult. Therefore, these methods fail to produce reasonable results on real images and often perform worse than the non-DL techniques. We overcome this limitation by proposing a novel hybrid approach to train our network on both synthetic and real images. Specifically, in addition to directly supervising the network using synthetic images, we train the network by enforcing it to produce the same reflectance for a pair of images of the same real-world scene with different illuminations. Furthermore, we improve the results by incorporating a bilateral solver layer into our system during both training and test stages. Experimental results show that our approach produces better results than the state-of-the-art DL and non-DL methods on various synthetic and real datasets both visually and numerically.
Tasks	Intrinsic Image Decomposition
Published	2018-07-30
URL	http://arxiv.org/abs/1807.11226v1
PDF	http://arxiv.org/pdf/1807.11226v1.pdf
PWC	https://paperswithcode.com/paper/deep-hybrid-real-and-synthetic-training-for
Repo
Framework

Feed-Forward Neural Networks Need Inductive Bias to Learn Equality Relations


Title	Feed-Forward Neural Networks Need Inductive Bias to Learn Equality Relations
Authors	Tillman Weyde, Radha Manisha Kopparti
Abstract	Basic binary relations such as equality and inequality are fundamental to relational data structures. Neural networks should learn such relations and generalise to new unseen data. We show in this study, however, that this generalisation fails with standard feed-forward networks on binary vectors. Even when trained with maximal training data, standard networks do not reliably detect equality. We introduce differential rectifier (DR) units that we add to the network in different configurations. The DR units create an inductive bias in the networks, so that they do learn to generalise, even from small numbers of examples, and we have not found any negative effect of their inclusion in the network. Given the fundamental nature of these relations, we hypothesize that feed-forward neural network learning benefits from inductive bias in other relations as well. Consequently, the further development of suitable inductive biases will be beneficial to many tasks in relational learning with neural networks.
Tasks	Relational Reasoning
Published	2018-12-04
URL	http://arxiv.org/abs/1812.01662v1
PDF	http://arxiv.org/pdf/1812.01662v1.pdf
PWC	https://paperswithcode.com/paper/feed-forward-neural-networks-need-inductive
Repo
Framework
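
The abstract above does not define the differential rectifier (DR) units. The sketch below assumes a DR unit outputs the absolute difference of two inputs, built from two rectified linear units, and that one DR unit compares each position of the first vector with the same position of the second. Both the definition and the wiring are assumptions for illustration, and no learning is shown, only the fixed comparison such units would make available to a network.

```python
def relu(z):
    """Rectified linear unit."""
    return max(0.0, z)


def dr_unit(x, y):
    """Assumed DR unit: |x - y| expressed as a sum of two ReLUs."""
    return relu(x - y) + relu(y - x)


def dr_layer(pair):
    """Compare the two halves of a concatenated vector position by position."""
    if len(pair) % 2 != 0:
        raise ValueError("expected two concatenated vectors of equal length")
    half = len(pair) // 2
    return [dr_unit(pair[i], pair[i + half]) for i in range(half)]


def vectors_equal(pair):
    """The two halves are equal exactly when every DR output is zero."""
    return sum(dr_layer(pair)) == 0


if __name__ == "__main__":
    print(vectors_equal([1, 0, 1, 1, 0, 1]))  # halves [1, 0, 1] and [1, 0, 1]
    print(vectors_equal([1, 0, 1, 1, 1, 1]))  # halves [1, 0, 1] and [1, 1, 1]
```

Because the comparison is hard-wired rather than learned, it applies equally to vector pairs never seen in training, which is one way to read the inductive bias the abstract describes.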

Revisiting Inaccuracies of Time Series Averaging under Dynamic Time Warping


Title	Revisiting Inaccuracies of Time Series Averaging under Dynamic Time Warping
Authors	Brijnesh Jain
Abstract	This article revisits an analysis on inaccuracies of time series averaging under dynamic time warping conducted by Niennattrakul and Ratanamahatana (2007). The authors presented a correctness-criterion and introduced drift-outs of averages from clusters. They claimed that averages are inaccurate if they are incorrect or drift-outs. Furthermore, they conjectured that such inaccuracies are caused by the lack of triangle inequality. We show that a rectified version of the correctness-criterion is unsatisfiable and that the concept of drift-out is geometrically and operationally inconclusive. Satisfying the triangle inequality is insufficient to achieve correctness and unnecessary to overcome the drift-out phenomenon. We place the concept of drift-out on a principled basis and show that sample means as global minimizers of a Fréchet function never drift out. The adjusted drift-out is a way to test to which extent an approximation is coherent. Empirical results show that solutions obtained by the state-of-the-art methods SSG and DBA are incoherent approximations of a sample mean in over a third of all trials.
Tasks	Time Series, Time Series Averaging
Published	2018-09-07
URL	http://arxiv.org/abs/1809.03371v1
PDF	http://arxiv.org/pdf/1809.03371v1.pdf
PWC	https://paperswithcode.com/paper/revisiting-inaccuracies-of-time-series
Repo
Framework
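
The abstract above refers to the lack of a triangle inequality under dynamic time warping (DTW). The following sketch implements a textbook DTW distance with absolute difference as the local cost and checks one small triple of series that violates the triangle inequality. The local cost, the step pattern, and the example series are choices made for this illustration, not taken from the paper.

```python
def dtw(a, b):
    """DTW distance with |x - y| local cost and the standard step pattern."""
    if not a or not b:
        raise ValueError("both series must be non-empty")
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] is the cheapest warping of a[:i] onto b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(cost[i - 1][j],      # repeat b[j-1]
                                    cost[i][j - 1],      # repeat a[i-1]
                                    cost[i - 1][j - 1])  # advance both
    return cost[n][m]


if __name__ == "__main__":
    x, y, z = [0], [1, 2], [2, 3, 3]
    # Going through y is cheaper than the direct DTW distance from x to z.
    print(dtw(x, y), dtw(y, z), dtw(x, z))
    print(dtw(x, z) > dtw(x, y) + dtw(y, z))
```

The violation arises because the single point in `x` must be matched to every element of the longer series, so the cost grows with the length of the other series.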

Unsupervised Speech Recognition via Segmental Empirical Output Distribution Matching


Title	Unsupervised Speech Recognition via Segmental Empirical Output Distribution Matching
Authors	Chih-Kuan Yeh, Jianshu Chen, Chengzhu Yu, Dong Yu
Abstract	We consider the problem of training speech recognition systems without using any labeled data, under the assumption that the learner can only access the input utterances and a phoneme language model estimated from a non-overlapping corpus. We propose a fully unsupervised learning algorithm that alternates between solving two sub-problems: (i) learning a phoneme classifier for a given set of phoneme segmentation boundaries, and (ii) refining the phoneme boundaries based on a given classifier. To solve the first sub-problem, we introduce a novel unsupervised cost function named Segmental Empirical Output Distribution Matching, which generalizes the work in (Liu et al., 2017) to segmental structures. For the second sub-problem, we develop an approximate MAP approach to refining the boundaries obtained from Wang et al. (2017). Experimental results on the TIMIT dataset demonstrate the success of this fully unsupervised phoneme recognition system, which achieves a phone error rate (PER) of 41.6%. Although it is still far away from the state-of-the-art supervised systems, we show that with oracle boundaries and a matching language model, the PER could be improved to 32.5%. This performance approaches the supervised system of the same model architecture, demonstrating the great potential of the proposed method.
Tasks	Language Modelling, Speech Recognition
Published	2018-12-23
URL	http://arxiv.org/abs/1812.09323v1
PDF	http://arxiv.org/pdf/1812.09323v1.pdf
PWC	https://paperswithcode.com/paper/unsupervised-speech-recognition-via-segmental
Repo
Framework
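
The abstract above reports results as phone error rate (PER). PER is conventionally the edit distance between the recognized and reference phone sequences divided by the reference length. The sketch below shows that conventional computation; it is not the paper's evaluation script, and the phone labels in the example are made up.

```python
def edit_distance(reference, hypothesis):
    """Levenshtein distance counting substitutions, insertions, deletions."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_phone in enumerate(reference, start=1):
        current = [i]
        for j, hyp_phone in enumerate(hypothesis, start=1):
            current.append(min(previous[j] + 1,        # deletion
                               current[j - 1] + 1,     # insertion
                               previous[j - 1] + (ref_phone != hyp_phone)))
        previous = current
    return previous[-1]


def phone_error_rate(reference, hypothesis):
    """Edit distance normalized by the number of reference phones."""
    if not reference:
        raise ValueError("reference sequence must be non-empty")
    return edit_distance(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    # One substitution and one insertion against a 3-phone reference.
    print(phone_error_rate(["k", "ae", "t"], ["k", "ih", "t", "s"]))
```

Because insertions count as errors, PER can exceed 100% when the hypothesis is much longer than the reference.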

Dreaming neural networks: rigorous results


Title	Dreaming neural networks: rigorous results
Authors	Elena Agliari, Francesco Alemanno, Adriano Barra, Alberto Fachechi
Abstract	Recently a daily routine for associative neural networks has been proposed: the network Hebbian-learns during the awake state (thus behaving as a standard Hopfield model), then, during its sleep state, optimizing information storage, it consolidates pure patterns and removes spurious ones: this forces the synaptic matrix to collapse to the projector one (ultimately approaching the Kanter-Sompolinsky model). This procedure keeps the learning Hebbian-based (a biological must) but, by taking advantage of a (properly stylized) sleep phase, still reaches the maximal critical capacity (for symmetric interactions). So far this emerging picture (as well as the bulk of papers on unlearning techniques) was supported solely by mathematically-challenging routes, e.g. mainly replica-trick analysis and numerical simulations: here we rely extensively on Guerra’s interpolation techniques developed for neural networks and, in particular, we extend the generalized stochastic stability approach to the case. Confining our description within the replica symmetric approximation (where the previous ones lie), the picture painted regarding this generalization (and the previously existing variations on theme) is here entirely confirmed. Further, still relying on Guerra’s schemes, we develop a systematic fluctuation analysis to check where ergodicity is broken (an analysis entirely absent in previous investigations). We find that, as long as the network is awake, ergodicity is bounded by the Amit-Gutfreund-Sompolinsky critical line (as it should), but, as the network sleeps, sleeping destroys spin glass states by extending both the retrieval as well as the ergodic region: after an entire sleeping session the only surviving regions are the retrieval and ergodic ones, and this allows the network to achieve the perfect retrieval regime (the number of storable patterns equals the number of neurons in the network).
Tasks
Published	2018-12-21
URL	http://arxiv.org/abs/1812.09077v1
PDF	http://arxiv.org/pdf/1812.09077v1.pdf
PWC	https://paperswithcode.com/paper/dreaming-neural-networks-rigorous-results
Repo
Framework

Several Tunable GMM Kernels


Title	Several Tunable GMM Kernels
Authors	Ping Li
Abstract	While tree methods have been popular in practice, researchers and practitioners are also looking for simple algorithms which can reach accuracy similar to that of trees. In 2010, (Ping Li UAI’10) developed the method of “abc-robust-logitboost” and compared it with other supervised learning methods on datasets used by the deep learning literature. In this study, we propose a series of “tunable GMM kernels” which are simple and perform largely comparably to tree methods on the same datasets. Note that “abc-robust-logitboost” substantially improved the original “GDBT” in that (a) it developed a tree-split formula based on second-order information of the derivatives of the loss function; (b) it developed a new set of derivatives for multi-class classification formulation. In the prior study in 2017, the “generalized min-max” (GMM) kernel was shown to have good performance compared to the “radial-basis function” (RBF) kernel. However, as demonstrated in this paper, the original GMM kernel is often not as competitive as tree methods on the datasets used in the deep learning literature. Since the original GMM kernel has no parameters, we propose tunable GMM kernels by adding tuning parameters in various ways. Three basic (i.e., with only one parameter) GMM kernels are the “$e$GMM kernel”, “$p$GMM kernel”, and “$\gamma$GMM kernel”, respectively. Extensive experiments show that they are able to produce good results for a large number of classification tasks. Furthermore, the basic kernels can be combined to boost the performance.
Tasks
Published	2018-05-08
URL	http://arxiv.org/abs/1805.02830v1
PDF	http://arxiv.org/pdf/1805.02830v1.pdf
PWC	https://paperswithcode.com/paper/several-tunable-gmm-kernels
Repo
Framework
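
The abstract above names the GMM kernel and its tunable variants without giving formulas. The sketch below follows the commonly cited definition of the generalized min-max kernel: each coordinate is split into its positive and negative parts, and the kernel is the ratio of summed coordinate-wise minima to summed maxima. The power variant shown, which raises each minimum and maximum to a power `p`, is an assumption about what the pGMM kernel looks like; check it against the paper before relying on it. With `p = 1` it reduces to the parameter-free GMM kernel.

```python
def to_nonnegative(x):
    """Split each coordinate into (positive part, negative part)."""
    u = []
    for value in x:
        u.append(float(value) if value > 0 else 0.0)
        u.append(float(-value) if value < 0 else 0.0)
    return u


def pgmm_kernel(x, y, p=1.0):
    """Assumed pGMM form: sum(min^p) / sum(max^p); p = 1 gives plain GMM."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same dimension")
    if p <= 0:
        raise ValueError("p must be positive")
    u, v = to_nonnegative(x), to_nonnegative(y)
    numerator = sum(min(a, b) ** p for a, b in zip(u, v))
    denominator = sum(max(a, b) ** p for a, b in zip(u, v))
    # Two all-zero vectors have no mass to compare; return 0 by convention.
    return numerator / denominator if denominator > 0 else 0.0


if __name__ == "__main__":
    print(pgmm_kernel([2, -1], [1, -3]))         # plain GMM kernel
    print(pgmm_kernel([2, -1], [1, -3], p=2.0))  # assumed power variant
```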

Virtuously Safe Reinforcement Learning


Title	Virtuously Safe Reinforcement Learning
Authors	Henrik Aslund, El Mahdi El Mhamdi, Rachid Guerraoui, Alexandre Maurer
Abstract	We show that when a third party, the adversary, steps into the two-party setting (agent and operator) of safely interruptible reinforcement learning, a trade-off has to be made between the probability of following the optimal policy in the limit, and the probability of escaping a dangerous situation created by the adversary. So far, the work on safely interruptible agents has assumed a perfect perception of the agent about its environment (no adversary), and therefore implicitly set the second probability to zero, by explicitly seeking a value of one for the first probability. We show that (1) agents can be made both interruptible and adversary-resilient, and (2) the interruptibility can be made safe in the sense that the agent itself will not seek to avoid it. We also solve the problem that arises when the agent does not go completely greedy, i.e. issues with safe exploration in the limit. Resilience to perturbed perception, safe exploration in the limit, and safe interruptibility are the three pillars of what we call virtuously safe reinforcement learning.
Tasks	Safe Exploration
Published	2018-05-29
URL	http://arxiv.org/abs/1805.11447v1
PDF	http://arxiv.org/pdf/1805.11447v1.pdf
PWC	https://paperswithcode.com/paper/virtuously-safe-reinforcement-learning
Repo
Framework