October 21, 2019


Paper Group AWR 65

3D-SIS: 3D Semantic Instance Segmentation of RGB-D Scans. End-to-end Multimodal Emotion and Gender Recognition with Dynamic Joint Loss Weights. Two-Player Games for Efficient Non-Convex Constrained Optimization. Automatic Article Commenting: the Task and Dataset. Conditional Generators of Words Definitions. SOLAR: Deep Structured Representations fo …

3D-SIS: 3D Semantic Instance Segmentation of RGB-D Scans

Title 3D-SIS: 3D Semantic Instance Segmentation of RGB-D Scans
Authors Ji Hou, Angela Dai, Matthias Nießner
Abstract We introduce 3D-SIS, a novel neural network architecture for 3D semantic instance segmentation in commodity RGB-D scans. The core idea of our method is to jointly learn from both geometric and color signal, thus enabling accurate instance predictions. Rather than operate solely on 2D frames, we observe that most computer vision applications have multi-view RGB-D input available, which we leverage to construct an approach for 3D instance segmentation that effectively fuses together these multi-modal inputs. Our network leverages high-resolution RGB input by associating 2D images with the volumetric grid based on the pose alignment of the 3D reconstruction. For each image, we first extract 2D features for each pixel with a series of 2D convolutions; we then backproject the resulting feature vector to the associated voxel in the 3D grid. This combination of 2D and 3D feature learning allows significantly higher accuracy object detection and instance segmentation than state-of-the-art alternatives. We show results on both synthetic and real-world public benchmarks, achieving an improvement in mAP of over 13 on real-world data.
Tasks 3D Instance Segmentation, 3D Reconstruction, 3D Semantic Instance Segmentation, Instance Segmentation, Object Detection, Semantic Segmentation
Published 2018-12-17
URL http://arxiv.org/abs/1812.07003v3
PDF http://arxiv.org/pdf/1812.07003v3.pdf
PWC https://paperswithcode.com/paper/3d-sis-3d-semantic-instance-segmentation-of
Repo https://github.com/Sekunde/3D-SIS
Framework pytorch
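
The key mechanic in the abstract is the backprojection of per-pixel 2D features into the voxel grid via the camera pose. Below is a rough NumPy sketch of that idea under simple pinhole-camera assumptions; the grid layout, variable names, and nearest-voxel assignment are illustrative and not taken from the authors' implementation.

```python
# Sketch (not the 3D-SIS implementation): project each voxel center into the RGB
# image using the camera pose and intrinsics, then copy the 2D feature vector at
# that pixel into the corresponding voxel of a 3D feature grid.
import numpy as np

def backproject_features(feat2d, K, world2cam, grid_origin, voxel_size, grid_dims):
    """feat2d: (C, H, W) 2D feature map; K: (3, 3) intrinsics;
    world2cam: (4, 4) extrinsics; returns a (C, X, Y, Z) voxel feature grid."""
    C, H, W = feat2d.shape
    X, Y, Z = grid_dims
    vol = np.zeros((C, X, Y, Z), dtype=feat2d.dtype)

    # World coordinates of every voxel center.
    ix, iy, iz = np.meshgrid(np.arange(X), np.arange(Y), np.arange(Z), indexing="ij")
    centers = grid_origin + (np.stack([ix, iy, iz], -1) + 0.5) * voxel_size   # (X,Y,Z,3)
    homog = np.concatenate([centers, np.ones((X, Y, Z, 1))], axis=-1)         # (X,Y,Z,4)

    cam = homog @ world2cam.T                 # voxel centers in camera coordinates
    depth = cam[..., 2]
    pix = cam[..., :3] @ K.T                  # pinhole projection
    u = np.round(pix[..., 0] / np.clip(depth, 1e-6, None)).astype(int)
    v = np.round(pix[..., 1] / np.clip(depth, 1e-6, None)).astype(int)

    valid = (depth > 0) & (u >= 0) & (u < W) & (v >= 0) & (v < H)
    vol[:, valid] = feat2d[:, v[valid], u[valid]]   # copy pixel features into voxels
    return vol
```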

End-to-end Multimodal Emotion and Gender Recognition with Dynamic Joint Loss Weights

Title End-to-end Multimodal Emotion and Gender Recognition with Dynamic Joint Loss Weights
Authors Myungsu Chae, Tae-Ho Kim, Young Hoon Shin, June-Woo Kim, Soo-Young Lee
Abstract Multi-task learning is a method for improving the generalizability of multiple tasks. In order to perform multiple classification tasks with one neural network model, the losses of the individual tasks must be combined. Previous studies have mostly trained models for multiple prediction tasks with a joint loss using static weights, choosing the weights between tasks with little consideration, e.g., by setting them uniformly or empirically. In this study, we propose a method to calculate the joint loss using dynamic weights to improve the total performance of the tasks rather than their individual performance. We apply this method to design an end-to-end multimodal emotion and gender recognition model using audio and video data. This approach provides proper weights for the loss of each task by the time the training process ends. In our experiments, emotion and gender recognition with the proposed method yielded a lower joint loss, computed as the negative log-likelihood, than using static weights for the joint loss. Moreover, our proposed model has better generalizability than other models. To the best of our knowledge, this research is the first to demonstrate the strength of using dynamic weights for the joint loss to maximize overall performance in emotion and gender recognition tasks.
Tasks Multi-Task Learning
Published 2018-09-04
URL http://arxiv.org/abs/1809.00758v3
PDF http://arxiv.org/pdf/1809.00758v3.pdf
PWC https://paperswithcode.com/paper/end-to-end-multimodal-emotion-and-gender
Repo https://github.com/MyungsuChae/IROS2018_ws
Framework pytorch
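
The abstract does not spell out the exact weighting rule, so the sketch below illustrates one simple dynamic scheme, renormalizing the weights from the detached per-task losses each step, as a stand-in. Shapes, class counts, and the update rule itself are assumptions, not necessarily the authors' method.

```python
# Sketch of dynamic joint-loss weighting for two tasks (emotion and gender).
import torch
import torch.nn.functional as F

def joint_loss(emotion_logits, gender_logits, emotion_y, gender_y, weights):
    losses = torch.stack([F.cross_entropy(emotion_logits, emotion_y),
                          F.cross_entropy(gender_logits, gender_y)])
    total = (weights * losses).sum()
    # Re-derive the weights from the detached per-task losses so the task that is
    # currently doing worse receives a larger share of the gradient signal.
    new_weights = (losses / losses.sum()).detach()
    return total, new_weights

# Usage in one training step (batch size and class counts are assumptions):
weights = torch.tensor([0.5, 0.5])
emotion_logits = torch.randn(8, 7, requires_grad=True)   # 7 emotion classes
gender_logits = torch.randn(8, 2, requires_grad=True)    # 2 gender classes
emotion_y, gender_y = torch.randint(0, 7, (8,)), torch.randint(0, 2, (8,))
loss, weights = joint_loss(emotion_logits, gender_logits, emotion_y, gender_y, weights)
loss.backward()
```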

Two-Player Games for Efficient Non-Convex Constrained Optimization

Title Two-Player Games for Efficient Non-Convex Constrained Optimization
Authors Andrew Cotter, Heinrich Jiang, Karthik Sridharan
Abstract In recent years, constrained optimization has become increasingly relevant to the machine learning community, with applications including Neyman-Pearson classification, robust optimization, and fair machine learning. A natural approach to constrained optimization is to optimize the Lagrangian, but this is not guaranteed to work in the non-convex setting, and, if using a first-order method, cannot cope with non-differentiable constraints (e.g. constraints on rates or proportions). The Lagrangian can be interpreted as a two-player game played between a player who seeks to optimize over the model parameters, and a player who wishes to maximize over the Lagrange multipliers. We propose a non-zero-sum variant of the Lagrangian formulation that can cope with non-differentiable–even discontinuous–constraints, which we call the “proxy-Lagrangian”. The first player minimizes external regret in terms of easy-to-optimize “proxy constraints”, while the second player enforces the original constraints by minimizing swap regret. For this new formulation, as for the Lagrangian in the non-convex setting, the result is a stochastic classifier. For both the proxy-Lagrangian and Lagrangian formulations, however, we prove that this classifier, instead of having unbounded size, can be taken to be a distribution over no more than m+1 models (where m is the number of constraints). This is a significant improvement in practical terms.
Tasks
Published 2018-04-17
URL http://arxiv.org/abs/1804.06500v2
PDF http://arxiv.org/pdf/1804.06500v2.pdf
PWC https://paperswithcode.com/paper/two-player-games-for-efficient-non-convex
Repo https://github.com/google-research/tensorflow_constrained_optimization
Framework tf
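
To make the two-player reading concrete, here is a toy gradient descent/ascent loop on a one-dimensional Lagrangian. It shows only the basic game, not the paper's proxy-Lagrangian with swap-regret updates or the resulting stochastic classifier, and the problem itself is an illustrative assumption.

```python
# Toy sketch of the Lagrangian as a two-player game: one player descends on the
# model parameter, the other takes projected ascent steps on the multiplier.
# Problem (assumed for illustration): minimize (x - 2)^2 subject to x <= 1.

x, lam = 0.0, 0.0            # model parameter and Lagrange multiplier
eta_x, eta_lam = 0.05, 0.05  # step sizes
for _ in range(2000):
    grad_x = 2.0 * (x - 2.0) + lam             # d/dx [ (x-2)^2 + lam * (x - 1) ]
    x -= eta_x * grad_x                        # player 1: minimize over x
    lam = max(0.0, lam + eta_lam * (x - 1.0))  # player 2: ascend on lam, keep lam >= 0

print(f"x ~ {x:.3f}, lambda ~ {lam:.3f}")      # expect x near the constrained optimum 1
```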

Automatic Article Commenting: the Task and Dataset

Title Automatic Article Commenting: the Task and Dataset
Authors Lianhui Qin, Lemao Liu, Victoria Bi, Yan Wang, Xiaojiang Liu, Zhiting Hu, Hai Zhao, Shuming Shi
Abstract Comments on online articles provide extended views and improve user engagement. Automatically generating comments thus becomes a valuable functionality for online forums, intelligent chatbots, etc. This paper proposes the new task of automatic article commenting, and introduces a large-scale Chinese dataset with millions of real comments and a human-annotated subset characterizing the comments’ varying quality. Incorporating human judgments of comment quality, we further develop automatic metrics that generalize a broad set of popular reference-based metrics and exhibit greatly improved correlations with human evaluations.
Tasks
Published 2018-05-09
URL http://arxiv.org/abs/1805.03668v2
PDF http://arxiv.org/pdf/1805.03668v2.pdf
PWC https://paperswithcode.com/paper/automatic-article-commenting-the-task-and
Repo https://github.com/buaaliuming/Resources-for-Scholarly-Big-Data
Framework none

Conditional Generators of Words Definitions

Title Conditional Generators of Words Definitions
Authors Artyom Gadetsky, Ilya Yakubovskiy, Dmitry Vetrov
Abstract We explore the recently introduced definition modeling technique, which provides a tool for evaluating different distributed vector representations of words by modeling their dictionary definitions. In this work, we study the problem of word ambiguity in definition modeling and propose a possible solution that employs latent variable modeling and soft attention mechanisms. Our quantitative and qualitative evaluation and analysis of the model show that taking word ambiguity and polysemy into account leads to performance improvements.
Tasks
Published 2018-06-26
URL http://arxiv.org/abs/1806.10090v1
PDF http://arxiv.org/pdf/1806.10090v1.pdf
PWC https://paperswithcode.com/paper/conditional-generators-of-words-definitions
Repo https://github.com/agadetsky/pytorch-definitions
Framework pytorch

SOLAR: Deep Structured Representations for Model-Based Reinforcement Learning

Title SOLAR: Deep Structured Representations for Model-Based Reinforcement Learning
Authors Marvin Zhang, Sharad Vikram, Laura Smith, Pieter Abbeel, Matthew J. Johnson, Sergey Levine
Abstract Model-based reinforcement learning (RL) has proven to be a data efficient approach for learning control tasks but is difficult to utilize in domains with complex observations such as images. In this paper, we present a method for learning representations that are suitable for iterative model-based policy improvement, even when the underlying dynamical system has complex dynamics and image observations, in that these representations are optimized for inferring simple dynamics and cost models given data from the current policy. This enables a model-based RL method based on the linear-quadratic regulator (LQR) to be used for systems with image observations. We evaluate our approach on a range of robotics tasks, including manipulation with a real-world robotic arm directly from images. We find that our method produces substantially better final performance than other model-based RL methods while being significantly more efficient than model-free RL.
Tasks
Published 2018-08-28
URL https://arxiv.org/abs/1808.09105v4
PDF https://arxiv.org/pdf/1808.09105v4.pdf
PWC https://paperswithcode.com/paper/solar-deep-structured-representations-for
Repo https://github.com/sharadmv/parasol
Framework none
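
SOLAR fits simple (locally linear) latent dynamics so that LQR-style control can be applied to image-based tasks. The sketch below shows only the standard finite-horizon Riccati recursion on assumed linear latent dynamics; the representation learning that would produce A and B from images is omitted, and the toy system is an assumption for illustration.

```python
# Sketch of the LQR backbone: given latent dynamics z' = A z + B u and quadratic
# cost z^T Q z + u^T R u, run the backward Riccati recursion for feedback gains.
import numpy as np

def lqr_gains(A, B, Q, R, horizon):
    """Returns per-step feedback gains K_t such that u_t = -K_t z_t."""
    P = Q.copy()
    gains = []
    for _ in range(horizon):
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ (A - B @ K)
        gains.append(K)
    return gains[::-1]

# Toy double-integrator latent dynamics (assumed for illustration).
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
Q, R = np.eye(2), 0.1 * np.eye(1)
K0 = lqr_gains(A, B, Q, R, horizon=50)[0]
print("first-step feedback gain:", K0)
```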

Building Language Models for Text with Named Entities

Title Building Language Models for Text with Named Entities
Authors Md Rizwan Parvez, Saikat Chakraborty, Baishakhi Ray, Kai-Wei Chang
Abstract Text in many domains involves a significant amount of named entities. Predicting the entity names is often challenging for a language model as they appear less frequently in the training corpus. In this paper, we propose a novel and effective approach to building a discriminative language model which can learn the entity names by leveraging their entity type information. We also introduce two benchmark datasets based on recipes and Java programming code, on which we evaluate the proposed model. Experimental results show that our model achieves 52.2% better perplexity in recipe generation and 22.06% better perplexity in code generation than state-of-the-art language models.
Tasks Code Generation, Language Modelling, Recipe Generation
Published 2018-05-13
URL http://arxiv.org/abs/1805.04836v1
PDF http://arxiv.org/pdf/1805.04836v1.pdf
PWC https://paperswithcode.com/paper/building-language-models-for-text-with-named
Repo https://github.com/uclanlp/NamedEntityLanguageModel
Framework pytorch
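
A minimal illustration of the type-based idea: the language model emits entity-type tokens, and a second step fills in a concrete entity name conditioned on the type. The tiny lexicon, token format, and uniform sampling below are illustrative assumptions, not the paper's model.

```python
# Sketch: realize type-level tokens (e.g. <INGREDIENT>) into concrete entity names.
import random

type_lexicon = {
    "<INGREDIENT>": ["butter", "flour", "sugar"],
    "<UTENSIL>": ["whisk", "saucepan"],
}

def realize(template_tokens, lexicon, rng=random.Random(0)):
    """Replace predicted entity-type tokens with concrete entity names."""
    out = []
    for tok in template_tokens:
        out.append(rng.choice(lexicon[tok]) if tok in lexicon else tok)
    return " ".join(out)

# Tokens as they might come out of the type-level language model (assumed).
print(realize(["melt", "the", "<INGREDIENT>", "in", "a", "<UTENSIL>"], type_lexicon))
```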

Memory-based Deep Reinforcement Learning for Obstacle Avoidance in UAV with Limited Environment Knowledge

Title Memory-based Deep Reinforcement Learning for Obstacle Avoidance in UAV with Limited Environment Knowledge
Authors Abhik Singla, Sindhu Padakandla, Shalabh Bhatnagar
Abstract This paper presents our method for enabling a UAV quadrotor, equipped with a monocular camera, to autonomously avoid collisions with obstacles in unstructured and unknown indoor environments. Compared to obstacle avoidance for ground vehicular robots, UAV navigation brings additional challenges because the UAV's motion is no longer constrained to a well-defined indoor ground or street environment. Horizontal structures in indoor and outdoor environments, such as decorative items, furnishings, ceiling fans, sign-boards, and tree branches, also become relevant obstacles, unlike for ground vehicular robots. Thus, methods of obstacle avoidance developed for ground robots are clearly inadequate for UAV navigation. Current control methods using monocular images for UAV obstacle avoidance are heavily dependent on environment information and do not fully retain and utilize the extensively available information about the ambient environment for decision making. We propose a deep reinforcement learning based method for UAV obstacle avoidance (OA) and autonomous exploration that does exactly this. The crucial idea in our method is the concept of partial observability and how UAVs can retain relevant information about the environment structure to make better future navigation decisions. Our OA technique uses recurrent neural networks with temporal attention and provides better results than prior works in terms of distance covered during navigation without collisions. In addition, our technique has a high inference rate (a key factor in robotic applications) and is energy-efficient, as it minimizes oscillatory motion of the UAV and reduces power wastage.
Tasks Decision Making
Published 2018-11-08
URL http://arxiv.org/abs/1811.03307v1
PDF http://arxiv.org/pdf/1811.03307v1.pdf
PWC https://paperswithcode.com/paper/memory-based-deep-reinforcement-learning-for
Repo https://github.com/hbzhang/AwesomeSelfDriving
Framework none
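
The core architectural idea is a recurrent Q-network with temporal attention over past hidden states, so the agent can retain information under partial observability. Below is a hedged PyTorch sketch of such a head; the layer sizes are assumptions, and the image encoder, replay buffer, and training loop are omitted.

```python
# Sketch of a recurrent Q-network with temporal attention over the history.
import torch
import torch.nn as nn

class RecurrentAttentionQNet(nn.Module):
    def __init__(self, feat_dim=128, hidden=64, n_actions=3):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.attn = nn.Linear(hidden, 1)       # scores each past time step
        self.q = nn.Linear(hidden, n_actions)

    def forward(self, frame_feats):             # (batch, time, feat_dim) frame features
        h, _ = self.lstm(frame_feats)            # (batch, time, hidden)
        w = torch.softmax(self.attn(h), dim=1)   # temporal attention weights
        context = (w * h).sum(dim=1)             # weighted summary of the history
        return self.q(context)                   # Q-values per action

q_net = RecurrentAttentionQNet()
print(q_net(torch.randn(2, 10, 128)).shape)      # torch.Size([2, 3])
```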

Belittling the Source: Trustworthiness Indicators to Obfuscate Fake News on the Web

Title Belittling the Source: Trustworthiness Indicators to Obfuscate Fake News on the Web
Authors Diego Esteves, Aniketh Janardhan Reddy, Piyush Chawla, Jens Lehmann
Abstract With the growth of the internet, the amount of fake news online has been proliferating every year. The consequences of this phenomenon are manifold, ranging from poor decision-making to episodes of bullying and violence. Therefore, fact-checking algorithms have become a valuable asset. To this aim, an important step in detecting fake news is to have access to a credibility score for a given information source. However, most of the widely used Web indicators have either been shut down to the public (e.g., Google PageRank) or are not free to use (Alexa Rank). Furthermore, existing databases are short, manually curated lists of online sources, which do not scale. Finally, most of the research on the topic is theoretical or explores confidential data in a restricted simulation environment. In this paper we review current research, highlight the challenges, and propose solutions to the problem of classifying websites on a credibility scale. The proposed model automatically extracts source reputation cues and computes a credibility factor, providing valuable insights which can help in belittling dubious unknown websites and confirming trustworthy ones. Experimental results outperform the state of the art in both the 2-class and 5-class settings.
Tasks Fake News Detection, Subjectivity Analysis, Web Credibility
Published 2018-09-03
URL http://arxiv.org/abs/1809.00494v1
PDF http://arxiv.org/pdf/1809.00494v1.pdf
PWC https://paperswithcode.com/paper/belittling-the-source-trustworthiness
Repo https://github.com/DeFacto/WebCredibility
Framework none

The jamming transition as a paradigm to understand the loss landscape of deep neural networks

Title The jamming transition as a paradigm to understand the loss landscape of deep neural networks
Authors Mario Geiger, Stefano Spigler, Stéphane d’Ascoli, Levent Sagun, Marco Baity-Jesi, Giulio Biroli, Matthieu Wyart
Abstract Deep learning has been immensely successful at a variety of tasks, ranging from classification to AI. Learning corresponds to fitting training data, which is implemented by descending a very high-dimensional loss function. Understanding under which conditions neural networks do not get stuck in poor minima of the loss, and how the landscape of that loss evolves as depth is increased, remains a challenge. Here we predict, and test empirically, an analogy between this landscape and the energy landscape of repulsive ellipses. We argue that in fully connected (FC) networks a phase transition delimits the over- and under-parametrized regimes where fitting can or cannot be achieved. In the vicinity of this transition, properties of the curvature of the minima of the loss are critical. This transition shares direct similarities with the jamming transition by which particles form a disordered solid as the density is increased, which also occurs in certain classes of computational optimization and learning problems such as the perceptron. Our analysis gives a simple explanation as to why poor minima of the loss cannot be encountered in the overparametrized regime, and puts forward the surprising result that the ability of fully connected networks to fit random data is independent of their depth. Our observations suggest that this independence also holds for real data. We also study a quantity $\Delta$ which characterizes how well ($\Delta<0$) or badly ($\Delta>0$) a datum is learned. At the critical point it is power-law distributed, $P_+(\Delta)\sim\Delta^\theta$ for $\Delta>0$ and $P_-(\Delta)\sim(-\Delta)^{-\gamma}$ for $\Delta<0$, with $\theta\approx0.3$ and $\gamma\approx0.2$. This observation suggests that near the transition the loss landscape has a hierarchical structure and that the learning dynamics are prone to avalanche-like events, with abrupt changes in the set of patterns that are learned.
Tasks
Published 2018-09-25
URL https://arxiv.org/abs/1809.09349v4
PDF https://arxiv.org/pdf/1809.09349v4.pdf
PWC https://paperswithcode.com/paper/the-jamming-transition-as-a-paradigm-to
Repo https://github.com/mariogeiger/nn_jamming
Framework pytorch

Lifting Layers: Analysis and Applications

Title Lifting Layers: Analysis and Applications
Authors Peter Ochs, Tim Meinhardt, Laura Leal-Taixe, Michael Moeller
Abstract The great advances of learning-based approaches in image processing and computer vision are largely based on deeply nested networks that compose linear transfer functions with suitable non-linearities. Interestingly, the most frequently used non-linearities in imaging applications (variants of the rectified linear unit) are uncommon in low dimensional approximation problems. In this paper we propose a novel non-linear transfer function, called lifting, which is motivated from a related technique in convex optimization. A lifting layer increases the dimensionality of the input, naturally yields a linear spline when combined with a fully connected layer, and therefore closes the gap between low and high dimensional approximation problems. Moreover, applying the lifting operation to the loss layer of the network allows us to handle non-convex and flat (zero-gradient) cost functions. We analyze the proposed lifting theoretically, exemplify interesting properties in synthetic experiments and demonstrate its effectiveness in deep learning approaches to image classification and denoising.
Tasks Denoising, Image Classification
Published 2018-03-23
URL http://arxiv.org/abs/1803.08660v1
PDF http://arxiv.org/pdf/1803.08660v1.pdf
PWC https://paperswithcode.com/paper/lifting-layers-analysis-and-applications
Repo https://github.com/michimoeller/liftingLayers
Framework none
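
To make the lifting non-linearity concrete: a scalar is mapped to its barycentric (hat-basis) coordinates over a fixed set of knots, so that a following linear layer evaluates a linear spline. The sketch below, including the knot placement and the clipping at the boundary, is an illustrative reading of the abstract rather than the authors' code.

```python
# Sketch of a 1D lifting layer: lift a scalar to hat-basis coordinates over knots,
# then a linear layer on the lifted vector realizes a piecewise-linear spline.
import numpy as np

def lift(x, knots):
    """Map a scalar to its barycentric coordinates with respect to `knots`."""
    x = float(np.clip(x, knots[0], knots[-1]))
    i = int(np.clip(np.searchsorted(knots, x, side="right") - 1, 0, len(knots) - 2))
    coords = np.zeros(len(knots))
    w = (x - knots[i]) / (knots[i + 1] - knots[i])
    coords[i], coords[i + 1] = 1.0 - w, w
    return coords

knots = np.linspace(-1.0, 1.0, 5)
weights = np.abs(knots)                    # a "linear layer" acting on the lifted vector
for x in (-0.3, 0.25, 0.9):
    print(x, lift(x, knots) @ weights)      # reproduces |x| as a linear spline
```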

Towards End-to-End Prosody Transfer for Expressive Speech Synthesis with Tacotron

Title Towards End-to-End Prosody Transfer for Expressive Speech Synthesis with Tacotron
Authors RJ Skerry-Ryan, Eric Battenberg, Ying Xiao, Yuxuan Wang, Daisy Stanton, Joel Shor, Ron J. Weiss, Rob Clark, Rif A. Saurous
Abstract We present an extension to the Tacotron speech synthesis architecture that learns a latent embedding space of prosody, derived from a reference acoustic representation containing the desired prosody. We show that conditioning Tacotron on this learned embedding space results in synthesized audio that matches the prosody of the reference signal with fine time detail even when the reference and synthesis speakers are different. Additionally, we show that a reference prosody embedding can be used to synthesize text that is different from that of the reference utterance. We define several quantitative and subjective metrics for evaluating prosody transfer, and report results with accompanying audio samples from single-speaker and 44-speaker Tacotron models on a prosody transfer task.
Tasks Speech Synthesis
Published 2018-03-24
URL http://arxiv.org/abs/1803.09047v1
PDF http://arxiv.org/pdf/1803.09047v1.pdf
PWC https://paperswithcode.com/paper/towards-end-to-end-prosody-transfer-for
Repo https://github.com/syang1993/gst-tacotron
Framework tf
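
A simplified sketch of the conditioning pattern described in the abstract: a reference encoder compresses a reference mel-spectrogram into a fixed prosody embedding, which is then broadcast onto the text-encoder outputs before decoding. The single-GRU encoder, the additive combination, and the layer sizes are assumptions; the paper's architecture is more elaborate.

```python
# Sketch of a reference encoder producing a prosody embedding for conditioning.
import torch
import torch.nn as nn

class ReferenceEncoder(nn.Module):
    def __init__(self, n_mels=80, embed_dim=128):
        super().__init__()
        self.rnn = nn.GRU(n_mels, embed_dim, batch_first=True)

    def forward(self, ref_mels):                 # (batch, frames, n_mels)
        _, h = self.rnn(ref_mels)
        return h[-1]                             # (batch, embed_dim) prosody embedding

def condition(text_enc, prosody):                # (batch, chars, 128), (batch, 128)
    return text_enc + prosody.unsqueeze(1)       # broadcast over the character axis

ref_enc = ReferenceEncoder()
prosody = ref_enc(torch.randn(2, 200, 80))
print(condition(torch.randn(2, 40, 128), prosody).shape)   # torch.Size([2, 40, 128])
```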

Global Convergence of Block Coordinate Descent in Deep Learning

Title Global Convergence of Block Coordinate Descent in Deep Learning
Authors Jinshan Zeng, Tim Tsz-Kit Lau, Shaobo Lin, Yuan Yao
Abstract Deep learning has aroused extensive attention due to its great empirical success. The efficiency of the block coordinate descent (BCD) methods has been recently demonstrated in deep neural network (DNN) training. However, theoretical studies on their convergence properties are limited due to the highly nonconvex nature of DNN training. In this paper, we aim at providing a general methodology for provable convergence guarantees for this type of methods. In particular, for most of the commonly used DNN training models involving both two- and three-splitting schemes, we establish the global convergence to a critical point at a rate of ${\cal O}(1/k)$, where $k$ is the number of iterations. The results extend to general loss functions which have Lipschitz continuous gradients and deep residual networks (ResNets). Our key development adds several new elements to the Kurdyka-{\L}ojasiewicz inequality framework that enables us to carry out the global convergence analysis of BCD in the general scenario of deep learning.
Tasks
Published 2018-03-01
URL https://arxiv.org/abs/1803.00225v4
PDF https://arxiv.org/pdf/1803.00225v4.pdf
PWC https://paperswithcode.com/paper/global-convergence-of-block-coordinate
Repo https://github.com/timlautk/BCD-for-DNNs-PyTorch
Framework pytorch
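
The alternating-update pattern of block coordinate descent, which the convergence analysis concerns, can be shown on a toy two-block quadratic; the two- and three-splitting DNN formulations analyzed in the paper are much richer than this assumed example.

```python
# Toy block coordinate descent: update each block in turn with the others fixed.
# Minimize f(u, v) = (u - 1)^2 + (v + 2)^2 + 0.5 * (u - v)^2 over the two blocks.
u, v = 5.0, 5.0
for _ in range(50):
    u = (2.0 + v) / 3.0    # exact minimizer of f over u with v fixed
    v = (u - 4.0) / 3.0    # exact minimizer of f over v with u fixed
print(u, v)                 # approaches the joint minimizer (0.25, -1.25)
```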

Blindfold Baselines for Embodied QA

Title Blindfold Baselines for Embodied QA
Authors Ankesh Anand, Eugene Belilovsky, Kyle Kastner, Hugo Larochelle, Aaron Courville
Abstract We explore blindfold (question-only) baselines for Embodied Question Answering. The EmbodiedQA task requires an agent to answer a question by intelligently navigating in a simulated environment, gathering necessary visual information only through first-person vision before finally answering. Consequently, a blindfold baseline which ignores the environment and visual information is a degenerate solution, yet we show through our experiments on the EQAv1 dataset that a simple question-only baseline achieves state-of-the-art results on the EmbodiedQA task in all cases except when the agent is spawned extremely close to the object.
Tasks Embodied Question Answering, Question Answering
Published 2018-11-12
URL http://arxiv.org/abs/1811.05013v1
PDF http://arxiv.org/pdf/1811.05013v1.pdf
PWC https://paperswithcode.com/paper/blindfold-baselines-for-embodied-qa
Repo https://github.com/ankeshanand/blindfold-baselines-eqa
Framework pytorch
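
A blindfold baseline is simply an answer classifier over the question text, with no visual input at all. The sketch below uses a bag-of-words encoder, a toy vocabulary, and a toy answer set as assumptions to show the shape of such a model.

```python
# Sketch of a question-only ("blindfold") baseline for EmbodiedQA-style answering.
import torch
import torch.nn as nn

vocab = {"what": 0, "color": 1, "is": 2, "the": 3, "sofa": 4, "table": 5}
answers = ["brown", "white", "blue"]

class QuestionOnlyBaseline(nn.Module):
    def __init__(self, vocab_size, n_answers, dim=16):
        super().__init__()
        self.embed = nn.EmbeddingBag(vocab_size, dim)   # averages word embeddings
        self.out = nn.Linear(dim, n_answers)

    def forward(self, token_ids):                        # (batch, seq_len)
        return self.out(self.embed(token_ids))

model = QuestionOnlyBaseline(len(vocab), len(answers))
q = torch.tensor([[vocab[w] for w in "what color is the sofa".split()]])
# Predicted answer without ever "seeing" the scene (untrained, so arbitrary here):
print(answers[model(q).argmax(dim=-1).item()])
```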

Improving Machine Reading Comprehension with General Reading Strategies

Title Improving Machine Reading Comprehension with General Reading Strategies
Authors Kai Sun, Dian Yu, Dong Yu, Claire Cardie
Abstract Reading strategies have been shown to improve comprehension levels, especially for readers lacking adequate prior knowledge. Just as the process of knowledge accumulation is time-consuming for human readers, it is resource-demanding to impart rich general domain knowledge into a deep language model via pre-training. Inspired by reading strategies identified in cognitive science, and given limited computational resources – just a pre-trained model and a fixed number of training instances – we propose three general strategies aimed to improve non-extractive machine reading comprehension (MRC): (i) BACK AND FORTH READING that considers both the original and reverse order of an input sequence, (ii) HIGHLIGHTING, which adds a trainable embedding to the text embedding of tokens that are relevant to the question and candidate answers, and (iii) SELF-ASSESSMENT that generates practice questions and candidate answers directly from the text in an unsupervised manner. By fine-tuning a pre-trained language model (Radford et al., 2018) with our proposed strategies on the largest general domain multiple-choice MRC dataset RACE, we obtain a 5.8% absolute increase in accuracy over the previous best result achieved by the same pre-trained model fine-tuned on RACE without the use of strategies. We further fine-tune the resulting model on a target MRC task, leading to an absolute improvement of 6.2% in average accuracy over previous state-of-the-art approaches on six representative non-extractive MRC datasets from different domains (i.e., ARC, OpenBookQA, MCTest, SemEval-2018 Task 11, ROCStories, and MultiRC). These results demonstrate the effectiveness of our proposed strategies and the versatility and general applicability of our fine-tuned models that incorporate these strategies. Core code is available at https://github.com/nlpdata/strategy/.
Tasks Language Modelling, Machine Reading Comprehension, Reading Comprehension
Published 2018-10-31
URL http://arxiv.org/abs/1810.13441v2
PDF http://arxiv.org/pdf/1810.13441v2.pdf
PWC https://paperswithcode.com/paper/improving-machine-reading-comprehension-with
Repo https://github.com/nlpdata/strategy
Framework tf
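
Of the three strategies, HIGHLIGHTING is the most mechanical: a single trainable vector is added to the embeddings of passage tokens that also occur in the question or candidate answers. The sketch below assumes a simplified token-overlap relevance rule and toy dimensions; it is not the authors' implementation.

```python
# Sketch of the HIGHLIGHTING strategy: add a trainable embedding to relevant tokens.
import torch
import torch.nn as nn

class Highlighter(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.highlight = nn.Parameter(torch.zeros(dim))   # trainable highlight embedding

    def forward(self, token_embeds, passage_tokens, query_tokens):
        relevant = {t.lower() for t in query_tokens}
        mask = torch.tensor([t.lower() in relevant for t in passage_tokens],
                            dtype=token_embeds.dtype)
        return token_embeds + mask.unsqueeze(-1) * self.highlight

dim = 8
passage = "the cat sat on the mat".split()
question = "where did the cat sit".split()
h = Highlighter(dim)
print(h(torch.randn(len(passage), dim), passage, question).shape)   # torch.Size([6, 8])
```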