Paper Group ANR 572
Speaker-Sensitive Dual Memory Networks for Multi-Turn Slot Tagging
Title | Speaker-Sensitive Dual Memory Networks for Multi-Turn Slot Tagging |
Authors | Young-Bum Kim, Sungjin Lee, Ruhi Sarikaya |
Abstract | In multi-turn dialogs, natural language understanding models can introduce obvious errors by being blind to contextual information. To incorporate dialog history, we present a neural architecture with Speaker-Sensitive Dual Memory Networks, which encode utterances differently depending on the speaker. This addresses the different extents of information available to the system: the system knows only the surface form of user utterances, while it has the exact semantics of its own output. We performed experiments on real user data from Microsoft Cortana, a commercial personal assistant. The results showed a significant performance improvement over state-of-the-art slot tagging models that use contextual information. |
Tasks | |
Published | 2017-11-29 |
URL | http://arxiv.org/abs/1711.10705v1 |
http://arxiv.org/pdf/1711.10705v1.pdf | |
PWC | https://paperswithcode.com/paper/speaker-sensitive-dual-memory-networks-for |
Repo | |
Framework | |
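The core mechanism the abstract describes, reading from two speaker-specific memories, can be sketched with plain dot-product attention. This is a minimal, hypothetical illustration (vector sizes, softmax attention, and concatenation are my assumptions; the paper's architecture is more elaborate):

```python
import math

def attention(query, memory):
    """Dot-product attention: weight memory slots by similarity to the query,
    then return the weighted sum of slots. `memory` is a non-empty list of
    equal-length vectors."""
    scores = [sum(q * m for q, m in zip(query, slot)) for slot in memory]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]  # shift for numerical stability
    total = sum(exps)
    weights = [e / total for e in exps]
    return [sum(w * slot[i] for w, slot in zip(weights, memory))
            for i in range(len(memory[0]))]

def dual_memory_read(query, user_memory, system_memory):
    """Read from separate user and system memories (the 'dual' part) and
    concatenate the two reads, reflecting that user utterances and system
    output carry different kinds of information."""
    return attention(query, user_memory) + attention(query, system_memory)
```

With a single-slot system memory, the system read is just that slot; the user read is a softmax-weighted blend favoring the slot most similar to the query.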
Syllable-level Neural Language Model for Agglutinative Language
Title | Syllable-level Neural Language Model for Agglutinative Language |
Authors | Seunghak Yu, Nilesh Kulkarni, Haejun Lee, Jihie Kim |
Abstract | Language models for agglutinative languages have long been hindered by the myriad agglutinations that affixes make possible for any given word. We propose a method to diminish the out-of-vocabulary problem by introducing an embedding derived from syllables and morphemes, which leverages the agglutinative property. Our model outperforms character-level embedding in perplexity by 16.87 with 9.50M parameters. The proposed method achieves state-of-the-art performance over existing input prediction methods in terms of Key Stroke Saving and has been commercialized. |
Tasks | Language Modelling |
Published | 2017-08-18 |
URL | http://arxiv.org/abs/1708.05515v1 |
http://arxiv.org/pdf/1708.05515v1.pdf | |
PWC | https://paperswithcode.com/paper/syllable-level-neural-language-model-for |
Repo | |
Framework | |
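The key idea, composing word representations from sub-word units so unseen agglutinated forms still get a vector, can be sketched as follows. This is a simplified stand-in (random vectors and plain summation are my assumptions; the paper learns the syllable/morpheme composition):

```python
import random

random.seed(0)

class SyllableEmbedder:
    """Build word vectors by composing syllable vectors, so an unseen word
    still gets a representation as long as its syllables have been seen."""

    def __init__(self, dim=8):
        self.dim = dim
        self.table = {}  # syllable -> vector

    def _vec(self, syllable):
        # Lazily assign a vector to each new syllable (learned in practice).
        if syllable not in self.table:
            self.table[syllable] = [random.uniform(-1, 1) for _ in range(self.dim)]
        return self.table[syllable]

    def embed(self, syllables):
        # Sum the syllable vectors; the simplest possible composition.
        out = [0.0] * self.dim
        for s in syllables:
            out = [o + x for o, x in zip(out, self._vec(s))]
        return out
```

The vocabulary then grows with the (small, closed) syllable inventory rather than with the (open) set of agglutinated word forms.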
VINet: Visual-Inertial Odometry as a Sequence-to-Sequence Learning Problem
Title | VINet: Visual-Inertial Odometry as a Sequence-to-Sequence Learning Problem |
Authors | Ronald Clark, Sen Wang, Hongkai Wen, Andrew Markham, Niki Trigoni |
Abstract | In this paper we present an on-manifold sequence-to-sequence learning approach to motion estimation using visual and inertial sensors. It is to the best of our knowledge the first end-to-end trainable method for visual-inertial odometry which performs fusion of the data at an intermediate feature-representation level. Our method has numerous advantages over traditional approaches. Specifically, it eliminates the need for tedious manual synchronization of the camera and IMU as well as eliminating the need for manual calibration between the IMU and camera. A further advantage is that our model naturally and elegantly incorporates domain specific information which significantly mitigates drift. We show that our approach is competitive with state-of-the-art traditional methods when accurate calibration data is available and can be trained to outperform them in the presence of calibration and synchronization errors. |
Tasks | Calibration, Motion Estimation |
Published | 2017-01-29 |
URL | http://arxiv.org/abs/1701.08376v2 |
http://arxiv.org/pdf/1701.08376v2.pdf | |
PWC | https://paperswithcode.com/paper/vinet-visual-inertial-odometry-as-a-sequence |
Repo | |
Framework | |
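The "fusion at an intermediate feature-representation level" the abstract emphasizes sits between raw-data fusion and pose-level fusion: each modality is first encoded separately, and the feature vectors are then concatenated before pose regression. A toy structural sketch (the feature extractors here are trivial stand-ins for the paper's CNN and IMU LSTM):

```python
def visual_features(frame_pair):
    """Stand-in for a CNN over two consecutive frames: one mean intensity
    per frame (a real encoder would output a learned feature vector)."""
    return [sum(frame) / len(frame) for frame in frame_pair]

def inertial_features(imu_window):
    """Stand-in for an LSTM over the high-rate IMU samples between the two
    frames: per-axis means over the window."""
    n = len(imu_window)
    return [sum(sample[i] for sample in imu_window) / n
            for i in range(len(imu_window[0]))]

def fuse(frame_pair, imu_window):
    """Intermediate-level fusion: concatenate the per-modality feature
    vectors before the pose-regression stage."""
    return visual_features(frame_pair) + inertial_features(imu_window)
```

Because fusion happens on features rather than raw samples, the two streams need not be sample-aligned, which is what relaxes the usual camera/IMU synchronization requirement.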
Nuclear Discrepancy for Active Learning
Title | Nuclear Discrepancy for Active Learning |
Authors | Tom J. Viering, Jesse H. Krijthe, Marco Loog |
Abstract | Active learning algorithms propose which unlabeled objects should be queried for their labels to improve a predictive model the most. We study active learners that minimize generalization bounds and uncover relationships between these bounds that lead to an improved approach to active learning. In particular we show the relation between the bound of the state-of-the-art Maximum Mean Discrepancy (MMD) active learner, the bound of the Discrepancy, and a new and looser bound that we refer to as the Nuclear Discrepancy bound. We motivate this bound by a probabilistic argument: we show it considers situations which are more likely to occur. Our experiments indicate that active learning using the tightest Discrepancy bound performs the worst in terms of the squared loss. Overall, our proposed, loosest Nuclear Discrepancy generalization bound performs best. We confirm our probabilistic argument empirically: the other bounds focus on more pessimistic scenarios that are rarer in practice. We conclude that tightness of bounds is not always of main importance and that active learning methods should concentrate on realistic scenarios in order to improve performance. |
Tasks | Active Learning |
Published | 2017-06-08 |
URL | http://arxiv.org/abs/1706.02645v1 |
http://arxiv.org/pdf/1706.02645v1.pdf | |
PWC | https://paperswithcode.com/paper/nuclear-discrepancy-for-active-learning |
Repo | |
Framework | |
What is (missing or wrong) in the scene? A Hybrid Deep Boltzmann Machine For Contextualized Scene Modeling
Title | What is (missing or wrong) in the scene? A Hybrid Deep Boltzmann Machine For Contextualized Scene Modeling |
Authors | İlker Bozcan, Yağmur Oymak, İdil Zeynep Alemdar, Sinan Kalkan |
Abstract | Scene models allow robots to reason about what is in the scene, what else should be in it, and what should not be in it. In this paper, we propose a hybrid Boltzmann Machine (BM) for scene modeling in which relations between objects are integrated. To be able to do that, we extend the BM to include tri-way edges between visible (object) nodes and make the network share the relations across different objects. We evaluate our method against several baseline models (Deep Boltzmann Machines, and Restricted Boltzmann Machines) on a scene classification dataset, and show that it performs better in several scene reasoning tasks. |
Tasks | Scene Classification |
Published | 2017-10-16 |
URL | http://arxiv.org/abs/1710.05664v2 |
http://arxiv.org/pdf/1710.05664v2.pdf | |
PWC | https://paperswithcode.com/paper/what-is-missing-or-wrong-in-the-scene-a |
Repo | |
Framework | |
Sentiment Perception of Readers and Writers in Emoji use
Title | Sentiment Perception of Readers and Writers in Emoji use |
Authors | Jose Berengueres, Dani Castro |
Abstract | Previous research has traditionally analyzed emoji sentiment from the point of view of the reader of the content, not the author. Here, we analyze emoji sentiment from the point of view of the author and present an emoji sentiment benchmark built from an employee happiness dataset in which emoji happen to be annotated with the daily happiness of the comment's author. The data span 3 years and 4k employees of 56 companies based in Barcelona. We compare the sentiment of writers to that of readers. Results indicate an 82% agreement in how emoji sentiment is perceived by readers and writers. Finally, we report that when authors use emoji they report higher levels of happiness. Emoji use was not found to be correlated with differences in author moodiness. |
Tasks | |
Published | 2017-10-02 |
URL | http://arxiv.org/abs/1710.00888v2 |
http://arxiv.org/pdf/1710.00888v2.pdf | |
PWC | https://paperswithcode.com/paper/sentiment-perception-of-readers-and-writers |
Repo | |
Framework | |
Integrated Face Analytics Networks through Cross-Dataset Hybrid Training
Title | Integrated Face Analytics Networks through Cross-Dataset Hybrid Training |
Authors | Jianshu Li, Shengtao Xiao, Fang Zhao, Jian Zhao, Jianan Li, Jiashi Feng, Shuicheng Yan, Terence Sim |
Abstract | Face analytics benefits many multimedia applications. It consists of a number of tasks, such as facial emotion recognition and face parsing, and most existing approaches generally treat these tasks independently, which limits their deployment in real scenarios. In this paper we propose an integrated Face Analytics Network (iFAN), which is able to perform multiple tasks jointly for face analytics with a novel carefully designed network architecture to fully facilitate the informative interaction among different tasks. The proposed integrated network explicitly models the interactions between tasks so that the correlations between tasks can be fully exploited for performance boost. In addition, to solve the bottleneck of the absence of datasets with comprehensive training data for various tasks, we propose a novel cross-dataset hybrid training strategy. It allows “plug-in and play” of multiple datasets annotated for different tasks without the requirement of a fully labeled common dataset for all the tasks. We experimentally show that the proposed iFAN achieves state-of-the-art performance on multiple face analytics tasks using a single integrated model. Specifically, iFAN achieves an overall F-score of 91.15% on the Helen dataset for face parsing, a normalized mean error of 5.81% on the MTFL dataset for facial landmark localization and an accuracy of 45.73% on the BNU dataset for emotion recognition with a single model. |
Tasks | Emotion Recognition, Face Alignment |
Published | 2017-11-16 |
URL | http://arxiv.org/abs/1711.06055v1 |
http://arxiv.org/pdf/1711.06055v1.pdf | |
PWC | https://paperswithcode.com/paper/integrated-face-analytics-networks-through |
Repo | |
Framework | |
Data Dropout in Arbitrary Basis for Deep Network Regularization
Title | Data Dropout in Arbitrary Basis for Deep Network Regularization |
Authors | Mostafa Rahmani, George Atia |
Abstract | An important problem in training deep networks with high capacity is to ensure that the trained network works well when presented with new inputs outside the training dataset. Dropout is an effective regularization technique to boost the network generalization in which a random subset of the elements of the given data and the extracted features are set to zero during the training process. In this paper, a new randomized regularization technique in which we withhold a random part of the data without necessarily turning off the neurons/data-elements is proposed. In the proposed method, of which the conventional dropout is shown to be a special case, random data dropout is performed in an arbitrary basis, hence the designation Generalized Dropout. We also present a framework whereby the proposed technique can be applied efficiently to convolutional neural networks. The presented numerical experiments demonstrate that the proposed technique yields notable performance gain. Generalized Dropout provides new insight into the idea of dropout, shows that we can achieve different performance gains by using different basis matrices, and opens up a new research question as to how to choose optimal basis matrices that achieve maximal performance gain. |
Tasks | |
Published | 2017-12-04 |
URL | http://arxiv.org/abs/1712.00891v2 |
http://arxiv.org/pdf/1712.00891v2.pdf | |
PWC | https://paperswithcode.com/paper/data-dropout-in-arbitrary-basis-for-deep |
Repo | |
Framework | |
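The mechanics of dropping data in an arbitrary basis can be sketched directly from the abstract: project the input onto an orthonormal basis, zero random coefficients, and project back. With the identity basis this reduces to conventional dropout, which is the "special case" the abstract mentions. A minimal sketch (inverted-dropout scaling is my assumption):

```python
import random

random.seed(0)

def generalized_dropout(x, basis, p=0.5):
    """Drop random coefficients of x expressed in an orthonormal `basis`
    (rows are basis vectors). With the identity basis this is conventional
    dropout on the raw elements."""
    # Project x onto the basis: c = Q x.
    coeffs = [sum(b * v for b, v in zip(row, x)) for row in basis]
    # Randomly zero coefficients; scale survivors so the expectation is kept.
    kept = [c / (1 - p) if random.random() >= p else 0.0 for c in coeffs]
    # Map surviving coefficients back to the original space: Q^T c.
    return [sum(k * row[i] for k, row in zip(kept, basis))
            for i in range(len(x))]
```

With p=0 the transform is a round trip through the basis and returns the input exactly; with p>0, whole basis directions (rather than raw elements) are withheld.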
Lexical and Derivational Meaning in Vector-Based Models of Relativisation
Title | Lexical and Derivational Meaning in Vector-Based Models of Relativisation |
Authors | Michael Moortgat, Gijs Wijnholds |
Abstract | Sadrzadeh et al. (2013) present a compositional distributional analysis of relative clauses in English in terms of the Frobenius algebraic structure of finite dimensional vector spaces. The analysis relies on distinct type assignments and lexical recipes for subject vs object relativisation. The situation for Dutch is different: because of the verb-final nature of Dutch, relative clauses are ambiguous between a subject vs object relativisation reading. Using an extended version of Lambek calculus, we present a compositional distributional framework that accounts for this derivational ambiguity, and that allows us to give a single meaning recipe for the relative pronoun, reconciling the Frobenius semantics with the demands of Dutch derivational syntax. |
Tasks | |
Published | 2017-11-30 |
URL | http://arxiv.org/abs/1711.11513v2 |
http://arxiv.org/pdf/1711.11513v2.pdf | |
PWC | https://paperswithcode.com/paper/lexical-and-derivational-meaning-in-vector |
Repo | |
Framework | |
On Natural Language Generation of Formal Argumentation
Title | On Natural Language Generation of Formal Argumentation |
Authors | Federico Cerutti, Alice Toniolo, Timothy J. Norman |
Abstract | In this paper we provide a first analysis of the research questions that arise when dealing with the problem of communicating pieces of formal argumentation through natural language interfaces. It is a generally held opinion that formal models of argumentation naturally capture human argument, and some preliminary studies have focused on justifying this view. Unfortunately, the results are not only inconclusive, but seem to suggest that explaining formal argumentation to humans is a rather intricate task. Graphical models for expressing argumentation-based reasoning are appealing, but often humans require significant training to use these tools effectively. We claim that natural language interfaces to formal argumentation systems offer a real alternative, and may be the way forward for systems that capture human argument. |
Tasks | Text Generation |
Published | 2017-06-13 |
URL | http://arxiv.org/abs/1706.04033v1 |
http://arxiv.org/pdf/1706.04033v1.pdf | |
PWC | https://paperswithcode.com/paper/on-natural-language-generation-of-formal |
Repo | |
Framework | |
Intelligent Personal Assistant with Knowledge Navigation
Title | Intelligent Personal Assistant with Knowledge Navigation |
Authors | Amit Kumar, Rahul Dutta, Harbhajan Rai |
Abstract | An Intelligent Personal Agent (IPA) is an agent whose purpose is to help the user gain information from reliable resources, with the help of knowledge navigation techniques, while saving the time needed to search for the best content. The agent is also responsible for responding to chat-based queries with the help of a Conversation Corpus. We will be testing different methods for optimal query generation. To facilitate ease of use, the agent will be able to accept input through Text (Keyboard), Voice (Speech Recognition) and Server (Facebook) and output responses through the same channels. Existing chat bots reply by making changes to the input, but we will give responses based on multiple SRT files. The model will learn from a human-dialog dataset and will be able to respond in a human-like manner. Responses to queries about famous things (places, people, and words) can be provided using web scraping, which gives the bot its knowledge navigation features. The agent will even learn from its past experiences, supporting semi-supervised learning. |
Tasks | Speech Recognition |
Published | 2017-04-28 |
URL | http://arxiv.org/abs/1704.08950v1 |
http://arxiv.org/pdf/1704.08950v1.pdf | |
PWC | https://paperswithcode.com/paper/intelligent-personal-assistant-with-knowledge |
Repo | |
Framework | |
Mixed Low-precision Deep Learning Inference using Dynamic Fixed Point
Title | Mixed Low-precision Deep Learning Inference using Dynamic Fixed Point |
Authors | Naveen Mellempudi, Abhisek Kundu, Dipankar Das, Dheevatsa Mudigere, Bharat Kaul |
Abstract | We propose a cluster-based quantization method to convert pre-trained full precision weights into ternary weights with minimal impact on the accuracy. In addition, we also constrain the activations to 8-bits thus enabling sub 8-bit full integer inference pipeline. Our method uses smaller clusters of N filters with a common scaling factor to minimize the quantization loss, while also maximizing the number of ternary operations. We show that with a cluster size of N=4 on Resnet-101, we can achieve 71.8% TOP-1 accuracy, within 6% of the best full precision results, while replacing ~85% of all multiplications with 8-bit accumulations. Using the same method with 4-bit weights achieves 76.3% TOP-1 accuracy, which is within 2% of the full precision result. We also study the impact of cluster size on both performance and accuracy: larger clusters (N=64) can replace ~98% of the multiplications with ternary operations but introduce a significant drop in accuracy, which necessitates fine-tuning the parameters by retraining the network at lower precision. To address this we have also trained low-precision Resnet-50 with 8-bit activations and ternary weights by pre-initializing the network with full precision weights, and achieve 68.9% TOP-1 accuracy within 4 additional epochs. Our final quantized model can run on a full 8-bit compute pipeline, with a potential 16x improvement in performance compared to baseline full-precision models. |
Tasks | Quantization |
Published | 2017-01-31 |
URL | http://arxiv.org/abs/1701.08978v2 |
http://arxiv.org/pdf/1701.08978v2.pdf | |
PWC | https://paperswithcode.com/paper/mixed-low-precision-deep-learning-inference |
Repo | |
Framework | |
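The per-cluster ternarization step can be sketched as follows: a cluster of full-precision weights is mapped to a shared scale alpha times {-1, 0, +1}. The threshold-at-a-fraction-of-mean-magnitude recipe below is the common ternary-weight heuristic and a stand-in for the paper's cluster-wise optimization:

```python
def ternarize_cluster(weights, threshold_ratio=0.7):
    """Quantize a cluster of full-precision weights to alpha * {-1, 0, +1}
    with a single shared scale alpha. Weights below the threshold become 0;
    alpha is the mean magnitude of the surviving weights."""
    mags = [abs(w) for w in weights]
    delta = threshold_ratio * sum(mags) / len(mags)  # pruning threshold
    ternary = [0 if abs(w) <= delta else (1 if w > 0 else -1) for w in weights]
    kept = [abs(w) for w, t in zip(weights, ternary) if t != 0]
    alpha = sum(kept) / len(kept) if kept else 0.0
    return alpha, ternary
```

Smaller clusters mean more scaling factors and lower quantization loss; larger clusters mean more pure ternary operations, which is exactly the accuracy/performance trade-off the abstract reports between N=4 and N=64.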
Model-Based Policy Search for Automatic Tuning of Multivariate PID Controllers
Title | Model-Based Policy Search for Automatic Tuning of Multivariate PID Controllers |
Authors | Andreas Doerr, Duy Nguyen-Tuong, Alonso Marco, Stefan Schaal, Sebastian Trimpe |
Abstract | PID control architectures are widely used in industrial applications. Despite their low number of open parameters, tuning multiple, coupled PID controllers can become tedious in practice. In this paper, we extend PILCO, a model-based policy search framework, to automatically tune multivariate PID controllers purely based on data observed on an otherwise unknown system. The system’s state is extended appropriately to frame the PID policy as a static state feedback policy. This renders PID tuning possible as the solution of a finite horizon optimal control problem without further a priori knowledge. The framework is applied to the task of balancing an inverted pendulum on a seven degree-of-freedom robotic arm, thereby demonstrating its capabilities of fast and data-efficient policy learning, even on complex real world problems. |
Tasks | |
Published | 2017-03-08 |
URL | http://arxiv.org/abs/1703.02899v1 |
http://arxiv.org/pdf/1703.02899v1.pdf | |
PWC | https://paperswithcode.com/paper/model-based-policy-search-for-automatic |
Repo | |
Framework | |
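The state extension the abstract mentions, framing PID control as a static state feedback policy, amounts to stacking the error, its integral, and its derivative into one vector so the controller becomes a constant linear gain on that vector. A minimal sketch (discrete-time, backward-difference derivative; both are my assumptions):

```python
class PIDAsStateFeedback:
    """A PID controller written as static feedback u = K . z on the extended
    state z = [e, integral(e), de/dt]. Because the policy is now a constant
    gain vector K, policy search (as in PILCO) can tune the gains directly."""

    def __init__(self, kp, ki, kd, dt):
        self.K = (kp, ki, kd)
        self.dt = dt
        self.integral = 0.0
        self.prev_error = 0.0

    def step(self, error):
        # Build the extended state from the tracking error.
        self.integral += error * self.dt
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        z = (error, self.integral, derivative)
        # Static state feedback: a dot product with the gain vector.
        return sum(k * zi for k, zi in zip(self.K, z))
```

Tuning then becomes a finite-horizon optimal control problem over K alone, with no PID-specific machinery in the optimizer.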
Application of Deep Learning in Neuroradiology: Automated Detection of Basal Ganglia Hemorrhage using 2D-Convolutional Neural Networks
Title | Application of Deep Learning in Neuroradiology: Automated Detection of Basal Ganglia Hemorrhage using 2D-Convolutional Neural Networks |
Authors | Vishal Desai, Adam E. Flanders, Paras Lakhani |
Abstract | Background: Deep learning techniques have achieved high accuracy in image classification tasks, and there is interest in their applicability to neuroimaging critical findings. This study evaluates the efficacy of 2D deep convolutional neural networks (DCNNs) for detecting basal ganglia (BG) hemorrhage on noncontrast head CT. Materials and Methods: 170 unique de-identified HIPAA-compliant noncontrast head CTs were obtained, including cases with and without BG hemorrhage. 110 cases were held out for testing, and 60 were split into training (45) and validation (15), consisting of 20 right, 20 left, and 20 no BG hemorrhage. Data augmentation was performed to increase the size and variation of the training dataset by 48-fold. Two DCNNs were used to classify the images-AlexNet and GoogLeNet-using untrained networks and networks pre-trained on ImageNet. The area under the curve (AUC) for the receiver-operating characteristic (ROC) curves was calculated, using the DeLong method for statistical comparison of ROCs. Results: The best performing model was the pre-trained augmented GoogLeNet, which had an AUC of 1.00 in classification of hemorrhage. Preprocessing augmentation increased accuracy for all networks (p<0.001), and pretrained networks outperformed untrained ones (p<0.001) for the unaugmented models. The best performing GoogLeNet model (AUC 1.00) outperformed the best performing AlexNet model (AUC 0.95)(p=0.01). Conclusion: For this dataset, the best performing DCNN identified BG hemorrhage on noncontrast head CT with an AUC of 1.00. Pretrained networks and data augmentation increased classifier accuracy. Future prospective research would be important to determine if the accuracy can be maintained on a larger cohort of patients and for very small hemorrhages. |
Tasks | Data Augmentation, Image Classification |
Published | 2017-10-10 |
URL | http://arxiv.org/abs/1710.03823v2 |
http://arxiv.org/pdf/1710.03823v2.pdf | |
PWC | https://paperswithcode.com/paper/application-of-deep-learning-in |
Repo | |
Framework | |
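The geometric core of the study's augmentation can be sketched on a 2D slice: rotations and flips yield the 8 dihedral variants of each image. The study's 48-fold expansion necessarily adds further transforms (the abstract does not enumerate them), so this is only the flip/rotation part:

```python
def hflip(img):
    """Mirror a 2D image (list of rows) left-right."""
    return [row[::-1] for row in img]

def rot90(img):
    """Rotate a 2D image 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]

def augment(img):
    """Generate the 8 dihedral variants of a slice: 4 rotations, each with
    and without a horizontal flip."""
    variants = []
    current = img
    for _ in range(4):
        variants.append(current)
        variants.append(hflip(current))
        current = rot90(current)
    return variants
```

Because a hemorrhage is equally valid under these transforms, each labeled CT slice yields several training examples at no annotation cost, which is what lets 45 training cases support a deep network.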
Generating Video Descriptions with Topic Guidance
Title | Generating Video Descriptions with Topic Guidance |
Authors | Shizhe Chen, Jia Chen, Qin Jin |
Abstract | Generating video descriptions in natural language (a.k.a. video captioning) is a more challenging task than image captioning, as videos are intrinsically more complicated than images in two aspects. First, videos cover a broader range of topics, such as news, music, sports and so on. Second, multiple topics can coexist in the same video. In this paper, we propose a novel caption model, the topic-guided model (TGM), to generate topic-oriented descriptions for videos in the wild by exploiting topic information. In addition to predefined topics, i.e., category tags crawled from the web, we also mine topics in a data-driven way from training captions using an unsupervised topic mining model. We show that data-driven topics reflect a better topic schema than the predefined topics. For topic prediction on test videos, we treat the topic mining model as a teacher to train the student, the topic prediction model, utilizing all the modalities in the video, especially the speech modality. We propose a series of caption models to exploit topic guidance, including implicitly using the topics as input features to generate words related to the topic, and explicitly modifying the weights in the decoder with topics to function as an ensemble of topic-aware language decoders. Our comprehensive experimental results on the current largest video caption dataset, MSR-VTT, prove the effectiveness of our topic-guided model, which significantly surpasses the winning performance in the 2016 MSR video to language challenge. |
Tasks | Image Captioning, Video Captioning |
Published | 2017-08-31 |
URL | http://arxiv.org/abs/1708.09666v2 |
http://arxiv.org/pdf/1708.09666v2.pdf | |
PWC | https://paperswithcode.com/paper/generating-video-descriptions-with-topic |
Repo | |
Framework | |