Paper Group AWR 65
3D-SIS: 3D Semantic Instance Segmentation of RGB-D Scans
Title | 3D-SIS: 3D Semantic Instance Segmentation of RGB-D Scans |
Authors | Ji Hou, Angela Dai, Matthias Nießner |
Abstract | We introduce 3D-SIS, a novel neural network architecture for 3D semantic instance segmentation in commodity RGB-D scans. The core idea of our method is to jointly learn from both geometric and color signal, thus enabling accurate instance predictions. Rather than operate solely on 2D frames, we observe that most computer vision applications have multi-view RGB-D input available, which we leverage to construct an approach for 3D instance segmentation that effectively fuses together these multi-modal inputs. Our network leverages high-resolution RGB input by associating 2D images with the volumetric grid based on the pose alignment of the 3D reconstruction. For each image, we first extract 2D features for each pixel with a series of 2D convolutions; we then backproject the resulting feature vector to the associated voxel in the 3D grid. This combination of 2D and 3D feature learning allows significantly higher accuracy object detection and instance segmentation than state-of-the-art alternatives. We show results on both synthetic and real-world public benchmarks, achieving an improvement in mAP of over 13 on real-world data. |
Tasks | 3D Instance Segmentation, 3D Reconstruction, 3D Semantic Instance Segmentation, Instance Segmentation, Object Detection, Semantic Segmentation |
Published | 2018-12-17 |
URL | http://arxiv.org/abs/1812.07003v3 |
http://arxiv.org/pdf/1812.07003v3.pdf | |
PWC | https://paperswithcode.com/paper/3d-sis-3d-semantic-instance-segmentation-of |
Repo | https://github.com/Sekunde/3D-SIS |
Framework | pytorch |
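The key step described in the abstract is backprojecting per-pixel 2D features into the voxel grid using the depth image and the pose from the 3D reconstruction. Below is a minimal NumPy sketch of that projection step only; the shapes, voxel-grid parameters, and per-voxel max-pooling are assumptions for illustration, not the authors' implementation (see the repo above for that).

```python
# Hypothetical sketch of backprojecting per-pixel 2D features into a voxel grid.
# Shapes, voxel size, and the max-pooling choice are assumptions, not the paper's exact code.
import numpy as np

def backproject_features(feat2d, depth, K, cam_to_world, grid_origin, voxel_size, grid_dims):
    """feat2d: (H, W, C) 2D features; depth: (H, W) metric depth;
    K: (3, 3) intrinsics; cam_to_world: (4, 4) pose from the reconstruction."""
    H, W, C = feat2d.shape
    vol = np.zeros((*grid_dims, C), dtype=feat2d.dtype)

    u, v = np.meshgrid(np.arange(W), np.arange(H))
    z = depth
    valid = z > 0
    # Unproject pixels to camera space, then transform to world space with the pose.
    x = (u - K[0, 2]) * z / K[0, 0]
    y = (v - K[1, 2]) * z / K[1, 1]
    pts_cam = np.stack([x, y, z, np.ones_like(z)], axis=-1)          # (H, W, 4)
    pts_world = pts_cam @ cam_to_world.T                              # (H, W, 4)

    # Convert world coordinates to voxel indices and scatter features (max-pool per voxel).
    idx = np.floor((pts_world[..., :3] - grid_origin) / voxel_size).astype(int)
    inside = valid & np.all((idx >= 0) & (idx < np.array(grid_dims)), axis=-1)
    for (i, j, k), f in zip(idx[inside], feat2d[inside]):
        vol[i, j, k] = np.maximum(vol[i, j, k], f)
    return vol
```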
End-to-end Multimodal Emotion and Gender Recognition with Dynamic Joint Loss Weights
Title | End-to-end Multimodal Emotion and Gender Recognition with Dynamic Joint Loss Weights |
Authors | Myungsu Chae, Tae-Ho Kim, Young Hoon Shin, June-Woo Kim, Soo-Young Lee |
Abstract | Multi-task learning is a method for improving the generalizability of multiple tasks. In order to perform multiple classification tasks with one neural network model, the losses of each task should be combined. Previous studies have mostly focused on multiple prediction tasks using joint loss with static weights for training models, choosing the weights between tasks without making sufficient considerations by setting them uniformly or empirically. In this study, we propose a method to calculate joint loss using dynamic weights to improve the total performance, instead of the individual performance, of tasks. We apply this method to design an end-to-end multimodal emotion and gender recognition model using audio and video data. This approach provides proper weights for the loss of each task when the training process ends. In our experiments, emotion and gender recognition with the proposed method yielded a lower joint loss, which is computed as the negative log-likelihood, than using static weights for joint loss. Moreover, our proposed model has better generalizability than other models. To the best of our knowledge, this research is the first to demonstrate the strength of using dynamic weights for joint loss for maximizing overall performance in emotion and gender recognition tasks. |
Tasks | Multi-Task Learning |
Published | 2018-09-04 |
URL | http://arxiv.org/abs/1809.00758v3 |
http://arxiv.org/pdf/1809.00758v3.pdf | |
PWC | https://paperswithcode.com/paper/end-to-end-multimodal-emotion-and-gender |
Repo | https://github.com/MyungsuChae/IROS2018_ws |
Framework | pytorch |
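As an illustration of dynamic joint-loss weights, the sketch below uses learnable log-variance (homoscedastic-uncertainty) weighting in PyTorch. This is one common way to let the task weights adapt during training and is offered only as an assumption; the paper's exact weighting rule may differ (see the repo above).

```python
# Illustrative dynamic weighting of two task losses via learnable log-variances
# (Kendall et al., 2018); the paper's exact weighting scheme may differ.
import torch
import torch.nn as nn

class DynamicJointLoss(nn.Module):
    def __init__(self, num_tasks=2):
        super().__init__()
        self.log_vars = nn.Parameter(torch.zeros(num_tasks))  # one weight per task, learned jointly

    def forward(self, task_losses):
        total = 0.0
        for loss, log_var in zip(task_losses, self.log_vars):
            precision = torch.exp(-log_var)       # dynamic weight on this task's loss
            total = total + precision * loss + log_var
        return total

# Usage: combine the emotion and gender losses with learned weights.
joint = DynamicJointLoss(num_tasks=2)
emotion_loss = torch.tensor(1.2)   # placeholder values
gender_loss = torch.tensor(0.4)
loss = joint([emotion_loss, gender_loss])
```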
Two-Player Games for Efficient Non-Convex Constrained Optimization
Title | Two-Player Games for Efficient Non-Convex Constrained Optimization |
Authors | Andrew Cotter, Heinrich Jiang, Karthik Sridharan |
Abstract | In recent years, constrained optimization has become increasingly relevant to the machine learning community, with applications including Neyman-Pearson classification, robust optimization, and fair machine learning. A natural approach to constrained optimization is to optimize the Lagrangian, but this is not guaranteed to work in the non-convex setting, and, if using a first-order method, cannot cope with non-differentiable constraints (e.g. constraints on rates or proportions). The Lagrangian can be interpreted as a two-player game played between a player who seeks to optimize over the model parameters, and a player who wishes to maximize over the Lagrange multipliers. We propose a non-zero-sum variant of the Lagrangian formulation that can cope with non-differentiable–even discontinuous–constraints, which we call the “proxy-Lagrangian”. The first player minimizes external regret in terms of easy-to-optimize “proxy constraints”, while the second player enforces the original constraints by minimizing swap regret. For this new formulation, as for the Lagrangian in the non-convex setting, the result is a stochastic classifier. For both the proxy-Lagrangian and Lagrangian formulations, however, we prove that this classifier, instead of having unbounded size, can be taken to be a distribution over no more than m+1 models (where m is the number of constraints). This is a significant improvement in practical terms. |
Tasks | |
Published | 2018-04-17 |
URL | http://arxiv.org/abs/1804.06500v2 |
http://arxiv.org/pdf/1804.06500v2.pdf | |
PWC | https://paperswithcode.com/paper/two-player-games-for-efficient-non-convex |
Repo | https://github.com/google-research/tensorflow_constrained_optimization |
Framework | tf |
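A heavily simplified sketch of the two-player view on a toy problem: the model player descends on the objective plus multiplier-weighted differentiable proxy constraints, while the multiplier player performs projected ascent driven by the original, non-differentiable constraint. The paper's actual proxy-Lagrangian algorithm uses swap-regret minimization and returns a stochastic classifier over at most m+1 models; none of that is reproduced here.

```python
# Simplified two-player loop: the theta-player uses gradients of a differentiable
# proxy constraint, the lambda-player reacts to the original (non-differentiable)
# constraint. Plain projected ascent stands in for the paper's swap-regret player.
import numpy as np

def objective_grad(theta):
    return 2.0 * (theta - 3.0)              # minimize (theta - 3)^2

def proxy_constraint_grad(theta):
    return np.ones_like(theta)              # differentiable surrogate for theta <= 1

def original_violation(theta):
    return float(theta[0] > 1.0)            # non-differentiable, rate-style constraint

theta, lam = np.array([0.0]), np.array([0.0])
eta_theta, eta_lam = 0.05, 0.01
for _ in range(1000):
    theta -= eta_theta * (objective_grad(theta) + lam * proxy_constraint_grad(theta))
    lam = np.maximum(0.0, lam + eta_lam * original_violation(theta))  # projected ascent
print(theta, lam)   # theta converges to a feasible point close to the boundary theta = 1
```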
Automatic Article Commenting: the Task and Dataset
Title | Automatic Article Commenting: the Task and Dataset |
Authors | Lianhui Qin, Lemao Liu, Victoria Bi, Yan Wang, Xiaojiang Liu, Zhiting Hu, Hai Zhao, Shuming Shi |
Abstract | Comments of online articles provide extended views and improve user engagement. Automatically making comments thus become a valuable functionality for online forums, intelligent chatbots, etc. This paper proposes the new task of automatic article commenting, and introduces a large-scale Chinese dataset with millions of real comments and a human-annotated subset characterizing the comments’ varying quality. Incorporating the human bias of comment quality, we further develop automatic metrics that generalize a broad set of popular reference-based metrics and exhibit greatly improved correlations with human evaluations. |
Tasks | |
Published | 2018-05-09 |
URL | http://arxiv.org/abs/1805.03668v2 |
http://arxiv.org/pdf/1805.03668v2.pdf | |
PWC | https://paperswithcode.com/paper/automatic-article-commenting-the-task-and |
Repo | https://github.com/buaaliuming/Resources-for-Scholarly-Big-Data |
Framework | none |
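The proposed metrics weight reference comments by their annotated quality. As a toy illustration of that idea (not the paper's metric definition), the sketch below computes a quality-weighted token-overlap score against multiple references.

```python
# Toy quality-weighted reference overlap, illustrating the idea of biasing a
# reference-based metric by human quality annotations. The actual metrics in the
# paper generalize BLEU/METEOR/CIDEr-style scores; this is only a sketch.
def weighted_overlap(candidate, references):
    """references: list of (comment_tokens, quality_weight) pairs."""
    cand = set(candidate)
    num, den = 0.0, 0.0
    for ref_tokens, w in references:
        overlap = len(cand & set(ref_tokens)) / max(len(cand), 1)
        num += w * overlap
        den += w
    return num / den if den > 0 else 0.0

refs = [("great analysis of the match".split(), 2.0),
        ("first".split(), 0.2)]
print(weighted_overlap("interesting analysis of the match".split(), refs))
```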
Conditional Generators of Words Definitions
Title | Conditional Generators of Words Definitions |
Authors | Artyom Gadetsky, Ilya Yakubovskiy, Dmitry Vetrov |
Abstract | We explore recently introduced definition modeling technique that provided the tool for evaluation of different distributed vector representations of words through modeling dictionary definitions of words. In this work, we study the problem of word ambiguities in definition modeling and propose a possible solution by employing latent variable modeling and soft attention mechanisms. Our quantitative and qualitative evaluation and analysis of the model shows that taking into account words ambiguity and polysemy leads to performance improvement. |
Tasks | |
Published | 2018-06-26 |
URL | http://arxiv.org/abs/1806.10090v1 |
http://arxiv.org/pdf/1806.10090v1.pdf | |
PWC | https://paperswithcode.com/paper/conditional-generators-of-words-definitions |
Repo | https://github.com/agadetsky/pytorch-definitions |
Framework | pytorch |
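A minimal sketch of a conditional definition generator: an LSTM decoder conditioned at every step on the embedding of the word being defined. The paper additionally introduces a latent variable and soft attention to handle polysemy; both are omitted here, and all sizes are placeholders.

```python
# Minimal conditional definition decoder: an LSTM that generates a definition
# conditioned on the embedding of the word being defined. The paper further adds
# a latent variable and soft attention to handle polysemy; both are omitted here.
import torch
import torch.nn as nn

class DefinitionDecoder(nn.Module):
    def __init__(self, vocab_size, emb_dim=128, hidden=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim * 2, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab_size)

    def forward(self, word_vec, def_tokens):
        # word_vec: (B, emb_dim) embedding of the defined word,
        # def_tokens: (B, T) token ids of the (shifted) definition.
        tok = self.embed(def_tokens)                              # (B, T, emb_dim)
        cond = word_vec.unsqueeze(1).expand(-1, tok.size(1), -1)  # repeat condition per step
        h, _ = self.lstm(torch.cat([tok, cond], dim=-1))
        return self.out(h)                                        # (B, T, vocab_size) logits

model = DefinitionDecoder(vocab_size=1000)
logits = model(torch.randn(2, 128), torch.randint(0, 1000, (2, 7)))
```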
SOLAR: Deep Structured Representations for Model-Based Reinforcement Learning
Title | SOLAR: Deep Structured Representations for Model-Based Reinforcement Learning |
Authors | Marvin Zhang, Sharad Vikram, Laura Smith, Pieter Abbeel, Matthew J. Johnson, Sergey Levine |
Abstract | Model-based reinforcement learning (RL) has proven to be a data efficient approach for learning control tasks but is difficult to utilize in domains with complex observations such as images. In this paper, we present a method for learning representations that are suitable for iterative model-based policy improvement, even when the underlying dynamical system has complex dynamics and image observations, in that these representations are optimized for inferring simple dynamics and cost models given data from the current policy. This enables a model-based RL method based on the linear-quadratic regulator (LQR) to be used for systems with image observations. We evaluate our approach on a range of robotics tasks, including manipulation with a real-world robotic arm directly from images. We find that our method produces substantially better final performance than other model-based RL methods while being significantly more efficient than model-free RL. |
Tasks | |
Published | 2018-08-28 |
URL | https://arxiv.org/abs/1808.09105v4 |
https://arxiv.org/pdf/1808.09105v4.pdf | |
PWC | https://paperswithcode.com/paper/solar-deep-structured-representations-for |
Repo | https://github.com/sharadmv/parasol |
Framework | none |
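Once simple (locally) linear latent dynamics and a quadratic cost have been inferred, policy improvement can use a standard finite-horizon LQR backward pass. The sketch below shows only that backward recursion on placeholder matrices; the structured representation learning and model fitting that SOLAR actually contributes are not shown.

```python
# Finite-horizon LQR backward pass on (learned) linear latent dynamics
# x_{t+1} = A x_t + B u_t with quadratic cost x'Qx + u'Ru. In a method like SOLAR
# these matrices would come from the learned latent model; here they are placeholders.
import numpy as np

def lqr_backward(A, B, Q, R, horizon):
    P = Q.copy()
    gains = []
    for _ in range(horizon):
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)   # feedback gain: u_t = -K x_t
        P = Q + A.T @ P @ (A - B @ K)                        # Riccati recursion
        gains.append(K)
    return gains[::-1]                                       # time-ordered gains

rng = np.random.default_rng(0)
A, B = rng.normal(size=(4, 4)) * 0.3 + np.eye(4), rng.normal(size=(4, 2))
Q, R = np.eye(4), 0.1 * np.eye(2)
gains = lqr_backward(A, B, Q, R, horizon=20)
x = rng.normal(size=4)
u = -gains[0] @ x    # first control of the resulting policy
```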
Building Language Models for Text with Named Entities
Title | Building Language Models for Text with Named Entities |
Authors | Md Rizwan Parvez, Saikat Chakraborty, Baishakhi Ray, Kai-Wei Chang |
Abstract | Text in many domains involves a significant amount of named entities. Predicting the entity names is often challenging for a language model as they appear less frequent on the training corpus. In this paper, we propose a novel and effective approach to building a discriminative language model which can learn the entity names by leveraging their entity type information. We also introduce two benchmark datasets based on recipes and Java programming codes, on which we evaluate the proposed model. Experimental results show that our model achieves 52.2% better perplexity in recipe generation and 22.06% on code generation than the state-of-the-art language models. |
Tasks | Code Generation, Language Modelling, Recipe Generation |
Published | 2018-05-13 |
URL | http://arxiv.org/abs/1805.04836v1 |
http://arxiv.org/pdf/1805.04836v1.pdf | |
PWC | https://paperswithcode.com/paper/building-language-models-for-text-with-named |
Repo | https://github.com/uclanlp/NamedEntityLanguageModel |
Framework | pytorch |
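The core idea is to exploit entity type information, e.g. by factoring the probability of an entity token into p(type | context) · p(entity | type). The toy sketch below illustrates only that factorization with hand-written distributions; the paper's model is a trained discriminative class-based language model, not these placeholders.

```python
# Toy type-factored next-token probability: p(entity | context) is decomposed as
# p(type | context) * p(entity | type). The distributions below are hand-written
# placeholders standing in for a trained language model.
type_given_context = {"INGREDIENT": 0.7, "ACTION": 0.3}
entity_given_type = {
    "INGREDIENT": {"butter": 0.5, "flour": 0.5},
    "ACTION": {"whisk": 0.6, "fold": 0.4},
}

def entity_prob(entity):
    return sum(p_type * entity_given_type[t].get(entity, 0.0)
               for t, p_type in type_given_context.items())

print(entity_prob("butter"))   # 0.7 * 0.5 = 0.35
```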
Memory-based Deep Reinforcement Learning for Obstacle Avoidance in UAV with Limited Environment Knowledge
Title | Memory-based Deep Reinforcement Learning for Obstacle Avoidance in UAV with Limited Environment Knowledge |
Authors | Abhik Singla, Sindhu Padakandla, Shalabh Bhatnagar |
Abstract | This paper presents our method for enabling a UAV quadrotor, equipped with a monocular camera, to autonomously avoid collisions with obstacles in unstructured and unknown indoor environments. When compared to obstacle avoidance in ground vehicular robots, UAV navigation brings in additional challenges because the UAV motion is no more constrained to a well-defined indoor ground or street environment. Horizontal structures in indoor and outdoor environments like decorative items, furnishings, ceiling fans, sign-boards, tree branches etc., also become relevant obstacles unlike those for ground vehicular robots. Thus, methods of obstacle avoidance developed for ground robots are clearly inadequate for UAV navigation. Current control methods using monocular images for UAV obstacle avoidance are heavily dependent on environment information. These controllers do not fully retain and utilize the extensively available information about the ambient environment for decision making. We propose a deep reinforcement learning based method for UAV obstacle avoidance (OA) and autonomous exploration which is capable of doing exactly the same. The crucial idea in our method is the concept of partial observability and how UAVs can retain relevant information about the environment structure to make better future navigation decisions. Our OA technique uses recurrent neural networks with temporal attention and provides better results compared to prior works in terms of distance covered during navigation without collisions. In addition, our technique has a high inference rate (a key factor in robotic applications) and is energy-efficient as it minimizes oscillatory motion of UAV and reduces power wastage. |
Tasks | Decision Making |
Published | 2018-11-08 |
URL | http://arxiv.org/abs/1811.03307v1 |
http://arxiv.org/pdf/1811.03307v1.pdf | |
PWC | https://paperswithcode.com/paper/memory-based-deep-reinforcement-learning-for |
Repo | https://github.com/hbzhang/AwesomeSelfDriving |
Framework | none |
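A minimal sketch of the architectural idea: a recurrent Q-network with temporal attention over the recent observation history, assuming image features have already been extracted per frame. Layer sizes, the dot-product attention, and the action space are assumptions, not the paper's configuration.

```python
# Sketch of a recurrent Q-network with temporal attention: a GRU encodes the
# image-feature history, attention pools the hidden states, and a head outputs
# Q-values per action. Sizes and the dot-product attention are assumptions.
import torch
import torch.nn as nn

class AttentionDRQN(nn.Module):
    def __init__(self, feat_dim=256, hidden=256, num_actions=3):
        super().__init__()
        self.gru = nn.GRU(feat_dim, hidden, batch_first=True)
        self.query = nn.Linear(hidden, hidden)
        self.q_head = nn.Linear(hidden, num_actions)

    def forward(self, feats):
        # feats: (B, T, feat_dim) features of the last T monocular frames
        h, _ = self.gru(feats)                                          # (B, T, hidden)
        q = self.query(h[:, -1])                                        # query from the latest step
        attn = torch.softmax((h @ q.unsqueeze(-1)).squeeze(-1), dim=1)  # (B, T) attention weights
        context = (attn.unsqueeze(-1) * h).sum(dim=1)                   # attention-pooled memory
        return self.q_head(context)                                     # (B, num_actions)

q_values = AttentionDRQN()(torch.randn(4, 8, 256))
```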
Belittling the Source: Trustworthiness Indicators to Obfuscate Fake News on the Web
Title | Belittling the Source: Trustworthiness Indicators to Obfuscate Fake News on the Web |
Authors | Diego Esteves, Aniketh Janardhan Reddy, Piyush Chawla, Jens Lehmann |
Abstract | With the growth of the internet, the number of fake-news online has been proliferating every year. The consequences of such phenomena are manifold, ranging from lousy decision-making process to bullying and violence episodes. Therefore, fact-checking algorithms became a valuable asset. To this aim, an important step to detect fake-news is to have access to a credibility score for a given information source. However, most of the widely used Web indicators have either been shut-down to the public (e.g., Google PageRank) or are not free for use (Alexa Rank). Further existing databases are short-manually curated lists of online sources, which do not scale. Finally, most of the research on the topic is theoretical-based or explore confidential data in a restricted simulation environment. In this paper we explore current research, highlight the challenges and propose solutions to tackle the problem of classifying websites into a credibility scale. The proposed model automatically extracts source reputation cues and computes a credibility factor, providing valuable insights which can help in belittling dubious and confirming trustful unknown websites. Experimental results outperform state of the art in the 2-classes and 5-classes setting. |
Tasks | Fake News Detection, Subjectivity Analysis, Web Credibility |
Published | 2018-09-03 |
URL | http://arxiv.org/abs/1809.00494v1 |
http://arxiv.org/pdf/1809.00494v1.pdf | |
PWC | https://paperswithcode.com/paper/belittling-the-source-trustworthiness |
Repo | https://github.com/DeFacto/WebCredibility |
Framework | none |
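A hedged sketch of the overall pipeline: extract numeric source-reputation cues per website and train an off-the-shelf classifier over the credibility scale. The feature names and toy data below are invented for illustration and are not the paper's feature set.

```python
# Sketch of credibility classification from source features: numeric reputation
# cues -> off-the-shelf classifier over a credibility scale. The feature names
# and data are invented placeholders, not the paper's feature set or corpus.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# columns (hypothetical): [domain_age_years, has_https, outlink_count, spelling_error_rate]
X = np.array([[12.0, 1, 150, 0.01],
              [0.5, 0, 20, 0.12],
              [7.0, 1, 90, 0.03],
              [0.2, 0, 5, 0.20]])
y = np.array([4, 1, 3, 0])        # credibility labels on a 5-point scale

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
print(clf.predict([[3.0, 1, 60, 0.05]]))
```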
The jamming transition as a paradigm to understand the loss landscape of deep neural networks
Title | The jamming transition as a paradigm to understand the loss landscape of deep neural networks |
Authors | Mario Geiger, Stefano Spigler, Stéphane d’Ascoli, Levent Sagun, Marco Baity-Jesi, Giulio Biroli, Matthieu Wyart |
Abstract | Deep learning has been immensely successful at a variety of tasks, ranging from classification to AI. Learning corresponds to fitting training data, which is implemented by descending a very high-dimensional loss function. Understanding under which conditions neural networks do not get stuck in poor minima of the loss, and how the landscape of that loss evolves as depth is increased remains a challenge. Here we predict, and test empirically, an analogy between this landscape and the energy landscape of repulsive ellipses. We argue that in FC networks a phase transition delimits the over- and under-parametrized regimes where fitting can or cannot be achieved. In the vicinity of this transition, properties of the curvature of the minima of the loss are critical. This transition shares direct similarities with the jamming transition by which particles form a disordered solid as the density is increased, which also occurs in certain classes of computational optimization and learning problems such as the perceptron. Our analysis gives a simple explanation as to why poor minima of the loss cannot be encountered in the overparametrized regime, and puts forward the surprising result that the ability of fully connected networks to fit random data is independent of their depth. Our observations suggests that this independence also holds for real data. We also study a quantity $\Delta$ which characterizes how well ($\Delta<0$) or badly ($\Delta>0$) a datum is learned. At the critical point it is power-law distributed, $P_+(\Delta)\sim\Delta^\theta$ for $\Delta>0$ and $P_-(\Delta)\sim(-\Delta)^{-\gamma}$ for $\Delta<0$, with $\theta\approx0.3$ and $\gamma\approx0.2$. This observation suggests that near the transition the loss landscape has a hierarchical structure and that the learning dynamics is prone to avalanche-like dynamics, with abrupt changes in the set of patterns that are learned. |
Tasks | |
Published | 2018-09-25 |
URL | https://arxiv.org/abs/1809.09349v4 |
https://arxiv.org/pdf/1809.09349v4.pdf | |
PWC | https://paperswithcode.com/paper/the-jamming-transition-as-a-paradigm-to |
Repo | https://github.com/mariogeiger/nn_jamming |
Framework | pytorch |
Lifting Layers: Analysis and Applications
Title | Lifting Layers: Analysis and Applications |
Authors | Peter Ochs, Tim Meinhardt, Laura Leal-Taixe, Michael Moeller |
Abstract | The great advances of learning-based approaches in image processing and computer vision are largely based on deeply nested networks that compose linear transfer functions with suitable non-linearities. Interestingly, the most frequently used non-linearities in imaging applications (variants of the rectified linear unit) are uncommon in low dimensional approximation problems. In this paper we propose a novel non-linear transfer function, called lifting, which is motivated from a related technique in convex optimization. A lifting layer increases the dimensionality of the input, naturally yields a linear spline when combined with a fully connected layer, and therefore closes the gap between low and high dimensional approximation problems. Moreover, applying the lifting operation to the loss layer of the network allows us to handle non-convex and flat (zero-gradient) cost functions. We analyze the proposed lifting theoretically, exemplify interesting properties in synthetic experiments and demonstrate its effectiveness in deep learning approaches to image classification and denoising. |
Tasks | Denoising, Image Classification |
Published | 2018-03-23 |
URL | http://arxiv.org/abs/1803.08660v1 |
http://arxiv.org/pdf/1803.08660v1.pdf | |
PWC | https://paperswithcode.com/paper/lifting-layers-analysis-and-applications |
Repo | https://github.com/michimoeller/liftingLayers |
Framework | none |
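A minimal NumPy sketch of a 1-D lifting non-linearity: a scalar input is mapped to its hat-basis coefficients over a fixed set of knots, so that a subsequent linear (fully connected) layer realizes a linear spline of the input. Knot placement and the clipping to the knot range are assumptions for illustration.

```python
# Sketch of a 1-D lifting non-linearity: a scalar x in [t_i, t_{i+1}] is mapped
# to the vector of hat-basis coefficients at the knots t_1 < ... < t_L, so a
# following linear layer realizes a linear spline. Knot placement is an assumption.
import numpy as np

def lift(x, knots):
    """x: (N,) inputs clipped to [knots[0], knots[-1]]; returns (N, L) lifted features."""
    x = np.clip(x, knots[0], knots[-1])
    L = len(knots)
    out = np.zeros((x.shape[0], L))
    idx = np.clip(np.searchsorted(knots, x, side="right") - 1, 0, L - 2)
    t0, t1 = knots[idx], knots[idx + 1]
    w = (x - t0) / (t1 - t0)
    rows = np.arange(x.shape[0])
    out[rows, idx] = 1.0 - w       # weight on the left knot
    out[rows, idx + 1] = w         # weight on the right knot
    return out

knots = np.linspace(-1.0, 1.0, 5)
z = lift(np.array([-0.3, 0.0, 0.9]), knots)
# A linear layer on top of z (weights = function values at the knots) gives a linear spline of x.
```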
Towards End-to-End Prosody Transfer for Expressive Speech Synthesis with Tacotron
Title | Towards End-to-End Prosody Transfer for Expressive Speech Synthesis with Tacotron |
Authors | RJ Skerry-Ryan, Eric Battenberg, Ying Xiao, Yuxuan Wang, Daisy Stanton, Joel Shor, Ron J. Weiss, Rob Clark, Rif A. Saurous |
Abstract | We present an extension to the Tacotron speech synthesis architecture that learns a latent embedding space of prosody, derived from a reference acoustic representation containing the desired prosody. We show that conditioning Tacotron on this learned embedding space results in synthesized audio that matches the prosody of the reference signal with fine time detail even when the reference and synthesis speakers are different. Additionally, we show that a reference prosody embedding can be used to synthesize text that is different from that of the reference utterance. We define several quantitative and subjective metrics for evaluating prosody transfer, and report results with accompanying audio samples from single-speaker and 44-speaker Tacotron models on a prosody transfer task. |
Tasks | Speech Synthesis |
Published | 2018-03-24 |
URL | http://arxiv.org/abs/1803.09047v1 |
http://arxiv.org/pdf/1803.09047v1.pdf | |
PWC | https://paperswithcode.com/paper/towards-end-to-end-prosody-transfer-for |
Repo | https://github.com/syang1993/gst-tacotron |
Framework | tf |
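The learned prosody embedding is produced by a reference encoder that consumes an acoustic representation of the reference utterance. The PyTorch sketch below shows a typical convolution-plus-GRU reference encoder; channel counts, strides, and the embedding size are assumptions rather than the paper's exact configuration.

```python
# Sketch of a reference (prosody) encoder: 2-D convolutions over a reference
# mel spectrogram followed by a GRU whose final state is the prosody embedding
# used to condition the synthesizer. Channel counts and sizes are assumptions.
import torch
import torch.nn as nn

class ReferenceEncoder(nn.Module):
    def __init__(self, n_mels=80, embed_dim=128):
        super().__init__()
        self.convs = nn.Sequential(
            nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.gru = nn.GRU(64 * (n_mels // 4), embed_dim, batch_first=True)

    def forward(self, mel):
        # mel: (B, T, n_mels) reference spectrogram carrying the desired prosody
        x = self.convs(mel.unsqueeze(1))              # (B, 64, ~T/4, n_mels/4)
        x = x.permute(0, 2, 1, 3).flatten(2)          # (B, ~T/4, 64 * n_mels/4)
        _, h = self.gru(x)
        return h.squeeze(0)                           # (B, embed_dim) prosody embedding

emb = ReferenceEncoder()(torch.randn(2, 120, 80))     # -> (2, 128)
```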
Global Convergence of Block Coordinate Descent in Deep Learning
Title | Global Convergence of Block Coordinate Descent in Deep Learning |
Authors | Jinshan Zeng, Tim Tsz-Kit Lau, Shaobo Lin, Yuan Yao |
Abstract | Deep learning has aroused extensive attention due to its great empirical success. The efficiency of the block coordinate descent (BCD) methods has been recently demonstrated in deep neural network (DNN) training. However, theoretical studies on their convergence properties are limited due to the highly nonconvex nature of DNN training. In this paper, we aim at providing a general methodology for provable convergence guarantees for this type of methods. In particular, for most of the commonly used DNN training models involving both two- and three-splitting schemes, we establish the global convergence to a critical point at a rate of ${\cal O}(1/k)$, where $k$ is the number of iterations. The results extend to general loss functions which have Lipschitz continuous gradients and deep residual networks (ResNets). Our key development adds several new elements to the Kurdyka-{\L}ojasiewicz inequality framework that enables us to carry out the global convergence analysis of BCD in the general scenario of deep learning. |
Tasks | |
Published | 2018-03-01 |
URL | https://arxiv.org/abs/1803.00225v4 |
https://arxiv.org/pdf/1803.00225v4.pdf | |
PWC | https://paperswithcode.com/paper/global-convergence-of-block-coordinate |
Repo | https://github.com/timlautk/BCD-for-DNNs-PyTorch |
Framework | pytorch |
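For context, a commonly used three-splitting reformulation of DNN training, on which such BCD analyses operate, is sketched below; the notation and penalty form are illustrative and may differ from the paper's exact formulation.

```latex
% Illustrative three-splitting of DNN training for BCD (notation is assumed):
% auxiliary variables U_i (pre-activations) and V_i (activations) decouple the layers.
\min_{\{W_i\},\{U_i\},\{V_i\}}\;
  \ell(V_N, y)
  \;+\; \frac{\gamma}{2}\sum_{i=1}^{N}\bigl\|U_i - W_i V_{i-1}\bigr\|_F^2
  \;+\; \frac{\rho}{2}\sum_{i=1}^{N}\bigl\|V_i - \sigma(U_i)\bigr\|_F^2,
  \qquad V_0 := x.
```

BCD then cycles through the blocks {W_i}, {U_i}, {V_i}, minimizing over one block (often with a proximal term) while the others are held fixed.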
Blindfold Baselines for Embodied QA
Title | Blindfold Baselines for Embodied QA |
Authors | Ankesh Anand, Eugene Belilovsky, Kyle Kastner, Hugo Larochelle, Aaron Courville |
Abstract | We explore blindfold (question-only) baselines for Embodied Question Answering. The EmbodiedQA task requires an agent to answer a question by intelligently navigating in a simulated environment, gathering necessary visual information only through first-person vision before finally answering. Consequently, a blindfold baseline which ignores the environment and visual information is a degenerate solution, yet we show through our experiments on the EQAv1 dataset that a simple question-only baseline achieves state-of-the-art results on the EmbodiedQA task in all cases except when the agent is spawned extremely close to the object. |
Tasks | Embodied Question Answering, Question Answering |
Published | 2018-11-12 |
URL | http://arxiv.org/abs/1811.05013v1 |
http://arxiv.org/pdf/1811.05013v1.pdf | |
PWC | https://paperswithcode.com/paper/blindfold-baselines-for-embodied-qa |
Repo | https://github.com/ankeshanand/blindfold-baselines-eqa |
Framework | pytorch |
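A minimal sketch of a question-only ("blindfold") baseline: average the question token embeddings and classify the answer directly, ignoring the environment entirely. Vocabulary and answer-set sizes are placeholders; the paper's baselines may use different question encoders.

```python
# Sketch of a question-only ("blindfold") baseline: encode the question with a
# bag-of-words average and classify the answer, ignoring vision and navigation.
# Vocabulary and answer-set sizes are placeholders.
import torch
import torch.nn as nn

class QuestionOnlyBaseline(nn.Module):
    def __init__(self, vocab_size=2000, emb_dim=64, num_answers=50):
        super().__init__()
        self.embed = nn.EmbeddingBag(vocab_size, emb_dim, mode="mean")
        self.classify = nn.Linear(emb_dim, num_answers)

    def forward(self, question_tokens):
        # question_tokens: (B, T) token ids of the question only
        return self.classify(self.embed(question_tokens))

logits = QuestionOnlyBaseline()(torch.randint(0, 2000, (4, 10)))
```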
Improving Machine Reading Comprehension with General Reading Strategies
Title | Improving Machine Reading Comprehension with General Reading Strategies |
Authors | Kai Sun, Dian Yu, Dong Yu, Claire Cardie |
Abstract | Reading strategies have been shown to improve comprehension levels, especially for readers lacking adequate prior knowledge. Just as the process of knowledge accumulation is time-consuming for human readers, it is resource-demanding to impart rich general domain knowledge into a deep language model via pre-training. Inspired by reading strategies identified in cognitive science, and given limited computational resources – just a pre-trained model and a fixed number of training instances – we propose three general strategies aimed to improve non-extractive machine reading comprehension (MRC): (i) BACK AND FORTH READING that considers both the original and reverse order of an input sequence, (ii) HIGHLIGHTING, which adds a trainable embedding to the text embedding of tokens that are relevant to the question and candidate answers, and (iii) SELF-ASSESSMENT that generates practice questions and candidate answers directly from the text in an unsupervised manner. By fine-tuning a pre-trained language model (Radford et al., 2018) with our proposed strategies on the largest general domain multiple-choice MRC dataset RACE, we obtain a 5.8% absolute increase in accuracy over the previous best result achieved by the same pre-trained model fine-tuned on RACE without the use of strategies. We further fine-tune the resulting model on a target MRC task, leading to an absolute improvement of 6.2% in average accuracy over previous state-of-the-art approaches on six representative non-extractive MRC datasets from different domains (i.e., ARC, OpenBookQA, MCTest, SemEval-2018 Task 11, ROCStories, and MultiRC). These results demonstrate the effectiveness of our proposed strategies and the versatility and general applicability of our fine-tuned models that incorporate these strategies. Core code is available at https://github.com/nlpdata/strategy/. |
Tasks | Language Modelling, Machine Reading Comprehension, Reading Comprehension |
Published | 2018-10-31 |
URL | http://arxiv.org/abs/1810.13441v2 |
http://arxiv.org/pdf/1810.13441v2.pdf | |
PWC | https://paperswithcode.com/paper/improving-machine-reading-comprehension-with |
Repo | https://github.com/nlpdata/strategy |
Framework | tf |
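As one concrete piece, the HIGHLIGHTING strategy adds a trainable embedding to the text embeddings of passage tokens that are relevant to the question and candidate answers. The PyTorch sketch below shows only that additive highlight embedding; how relevance is determined and how it plugs into the pre-trained LM are left out, and the hidden size is an assumption.

```python
# Sketch of the HIGHLIGHTING idea: add a trainable "relevant" embedding to the
# token embeddings of passage tokens that also appear in the question or the
# candidate answers. Integration with the pre-trained LM is not shown.
import torch
import torch.nn as nn

class Highlighter(nn.Module):
    def __init__(self, hidden=768):
        super().__init__()
        self.highlight = nn.Parameter(torch.zeros(hidden))   # trainable highlight embedding

    def forward(self, token_embeddings, relevance_mask):
        # token_embeddings: (B, T, H); relevance_mask: (B, T), 1.0 where the passage
        # token occurs in the question or a candidate answer, else 0.0.
        return token_embeddings + relevance_mask.unsqueeze(-1) * self.highlight

out = Highlighter()(torch.randn(2, 16, 768), torch.randint(0, 2, (2, 16)).float())
```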