Paper Group ANR 1142
Measuring Conversational Productivity in Child Forensic Interviews
Title | Measuring Conversational Productivity in Child Forensic Interviews |
Authors | Victor Ardulov, Manoj Kumar, Shanna Williams, Thomas Lyon, Shrikanth Narayanan |
Abstract | Child Forensic Interviewing (FI) presents a challenge for effective information retrieval and decision making. The high stakes associated with the process demand that expert legal interviewers are able to effectively establish a channel of communication and elicit substantive knowledge from the child-client while minimizing potential for experiencing trauma. As a first step toward computationally modeling and producing quality spoken interviewing strategies and a generalized understanding of interview dynamics, we propose a novel methodology to computationally model effectiveness criteria, by applying summarization and topic modeling techniques to objectively measure and rank the responsiveness and conversational productivity of a child during FI. We score information retrieval by constructing an agenda to represent general topics of interest and measuring alignment with a given response and leveraging lexical entrainment for responsiveness. For comparison, we present our methods along with traditional metrics of evaluation and discuss the use of prior information for generating situational awareness. |
Tasks | Decision Making, Information Retrieval |
Published | 2018-06-08 |
URL | http://arxiv.org/abs/1806.03357v1 |
http://arxiv.org/pdf/1806.03357v1.pdf | |
PWC | https://paperswithcode.com/paper/measuring-conversational-productivity-in |
Repo | |
Framework | |
Overcoming Language Priors in Visual Question Answering with Adversarial Regularization
Title | Overcoming Language Priors in Visual Question Answering with Adversarial Regularization |
Authors | Sainandan Ramakrishnan, Aishwarya Agrawal, Stefan Lee |
Abstract | Modern Visual Question Answering (VQA) models have been shown to rely heavily on superficial correlations between question and answer words learned during training such as overwhelmingly reporting the type of room as kitchen or the sport being played as tennis, irrespective of the image. Most alarmingly, this shortcoming is often not well reflected during evaluation because the same strong priors exist in test distributions; however, a VQA system that fails to ground questions in image content would likely perform poorly in real-world settings. In this work, we present a novel regularization scheme for VQA that reduces this effect. We introduce a question-only model that takes as input the question encoding from the VQA model and must leverage language biases in order to succeed. We then pose training as an adversarial game between the VQA model and this question-only adversary – discouraging the VQA model from capturing language biases in its question encoding. Further, we leverage this question-only model to estimate the increase in model confidence after considering the image, which we maximize explicitly to encourage visual grounding. Our approach is a model-agnostic training procedure and simple to implement. We show empirically that it can improve performance significantly on a bias-sensitive split of the VQA dataset for multiple base models – achieving state-of-the-art on this task. Further, on standard VQA tasks, our approach shows a significantly smaller drop in accuracy compared to existing bias-reducing VQA models. |
Tasks | Question Answering, Visual Question Answering |
Published | 2018-10-08 |
URL | http://arxiv.org/abs/1810.03649v2 |
http://arxiv.org/pdf/1810.03649v2.pdf | |
PWC | https://paperswithcode.com/paper/overcoming-language-priors-in-visual-question |
Repo | |
Framework | |
A Fast, Compact, Accurate Model for Language Identification of Codemixed Text
Title | A Fast, Compact, Accurate Model for Language Identification of Codemixed Text |
Authors | Yuan Zhang, Jason Riesa, Daniel Gillick, Anton Bakalov, Jason Baldridge, David Weiss |
Abstract | We address fine-grained multilingual language identification: providing a language code for every token in a sentence, including codemixed text containing multiple languages. Such text is prevalent online, in documents, social media, and message boards. We show that a feed-forward network with a simple globally constrained decoder can accurately and rapidly label both codemixed and monolingual text in 100 languages and 100 language pairs. This model outperforms previously published multilingual approaches in terms of both accuracy and speed, yielding an 800x speed-up and a 19.5% averaged absolute gain on three codemixed datasets. It furthermore outperforms several benchmark systems on monolingual language identification. |
Tasks | Language Identification |
Published | 2018-10-09 |
URL | http://arxiv.org/abs/1810.04142v1 |
http://arxiv.org/pdf/1810.04142v1.pdf | |
PWC | https://paperswithcode.com/paper/a-fast-compact-accurate-model-for-language |
Repo | |
Framework | |
Language Identification with Deep Bottleneck Features
Title | Language Identification with Deep Bottleneck Features |
Authors | Zhanyu Ma, Hong Yu |
Abstract | In this paper we propose an end-to-end speech language identification (SLD) approach for short utterances, based on a Long Short-Term Memory (LSTM) neural network, which is especially suitable for SLD applications in intelligent vehicles. The features used for LSTM learning are generated by a transfer learning method: bottleneck features of a deep neural network (DNN) trained for Mandarin acoustic-phonetic classification are used for LSTM training. To improve SLD accuracy on short utterances, a phase-vocoder-based time-scale modification (TSM) method is used to reduce and increase the speech rate of the test utterance. By splicing the normal, rate-reduced, and rate-increased utterances, we extend the length of the test utterances and so improve the performance of the SLD system. Experimental results on the AP17-OLR database show that the proposed methods improve SLD performance, especially on short utterances of 1 s and 3 s duration. |
Tasks | Language Identification, Transfer Learning |
Published | 2018-09-18 |
URL | https://arxiv.org/abs/1809.08909v2 |
https://arxiv.org/pdf/1809.08909v2.pdf | |
PWC | https://paperswithcode.com/paper/language-identification-with-deep-bottleneck |
Repo | |
Framework | |
Construction of Microdata from a Set of Differentially Private Low-dimensional Contingency Tables through Solving Linear Equations with Tikhonov Regularization
Title | Construction of Microdata from a Set of Differentially Private Low-dimensional Contingency Tables through Solving Linear Equations with Tikhonov Regularization |
Authors | Evercita C. Eugenio, Fang Liu |
Abstract | When individual-level data are shared for research and public use, they are often perturbed to provide some level of privacy protection. A simple way to perturb a high-dimensional data set where individual-level data can be easily generated with good utility is to sanitize the full contingency table or full-dimensional histogram. However, it can be costly from the data storage and memory perspective to work with full tables. In addition, most of the observed signals in the high-order interactions among all attributes are likely just sample randomness rather than being of statistical significance and rarely of interest to practitioners. We introduce a new algorithm, CIPHER, which can reproduce individual-level data from a set of meaningful differentially private low-dimensional contingency (LDC) tables constructed from the original high-dimensional data, through solving a set of linear equations with Tikhonov regularization. CIPHER is conceptually simple and requires no more than decomposing joint probabilities via basic probability rules to construct the equation set and subsequently solving linear equations. Compared to full table sanitization, the set of LDC tables that CIPHER works with has drastically lower requirements on data storage and memory. We run experiments to compare CIPHER with the full table sanitization and the multiplicative weighting exponential mechanism (MWEM), which can also be used to generate individual-level synthetic data given a set of LDC tables. The results demonstrate that CIPHER outperforms MWEM in preserving original information at the same privacy budget and converges to the full-table sanitization in utility as the sample data size or the privacy budget increases. |
Tasks | |
Published | 2018-12-12 |
URL | https://arxiv.org/abs/1812.05671v2 |
https://arxiv.org/pdf/1812.05671v2.pdf | |
PWC | https://paperswithcode.com/paper/cipher-construction-of-differentially-private |
Repo | |
Framework | |
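The CIPHER abstract above describes solving a set of linear equations with Tikhonov regularization. The following is a minimal sketch of that generic technique only, not of CIPHER itself: the toy equation set (two binary attributes whose four joint cell probabilities must match given marginals), the marginal values, and the names `solve_linear` and `tikhonov_solve` are all assumptions made for illustration. The toy system is rank-deficient, so the unregularized normal equations are singular, and the penalty term is what makes them solvable.

```python
from typing import List

Matrix = List[List[float]]


def solve_linear(m: Matrix, rhs: List[float]) -> List[float]:
    """Solve m x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(m)]
    for col in range(n):
        # Pick the row with the largest entry in this column as the pivot.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular to working precision")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def tikhonov_solve(a: Matrix, b: List[float], lam: float) -> List[float]:
    """Minimise ||a x - b||^2 + lam * ||x||^2 via (a^T a + lam I) x = a^T b."""
    rows, cols = len(a), len(a[0])
    ata = [
        [sum(a[k][i] * a[k][j] for k in range(rows)) + (lam if i == j else 0.0)
         for j in range(cols)]
        for i in range(cols)
    ]
    atb = [sum(a[k][i] * b[k] for k in range(rows)) for i in range(cols)]
    return solve_linear(ata, atb)


# Hypothetical toy problem: unknowns are the cells (p00, p01, p10, p11).
a_matrix = [
    [1.0, 1.0, 0.0, 0.0],  # p00 + p01 = P(X = 0)
    [0.0, 0.0, 1.0, 1.0],  # p10 + p11 = P(X = 1)
    [1.0, 0.0, 1.0, 0.0],  # p00 + p10 = P(Y = 0)
    [0.0, 1.0, 0.0, 1.0],  # p01 + p11 = P(Y = 1)
]
marginals = [0.6, 0.4, 0.7, 0.3]  # made-up values, not from the paper
cells = tikhonov_solve(a_matrix, marginals, lam=1e-6)
print([round(p, 4) for p in cells])
```

With a small `lam` the result stays close to the minimum-norm solution of the marginal constraints; a larger `lam` shrinks the cells toward zero, which is the usual trade-off when choosing the regularization strength.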
Federated Learning for Keyword Spotting
Title | Federated Learning for Keyword Spotting |
Authors | David Leroy, Alice Coucke, Thibaut Lavril, Thibault Gisselbrecht, Joseph Dureau |
Abstract | We propose a practical approach based on federated learning to solve out-of-domain issues with continuously running embedded speech-based models such as wake word detectors. We conduct an extensive empirical study of the federated averaging algorithm for the “Hey Snips” wake word based on a crowdsourced dataset that mimics a federation of wake word users. We empirically demonstrate that using an adaptive averaging strategy inspired by Adam in place of standard weighted model averaging greatly reduces the number of communication rounds required to reach our target performance. The associated upstream communication costs per user are estimated at 8 MB, which is reasonable in the context of smart home voice assistants. Additionally, the dataset used for these experiments is being open sourced with the aim of fostering further transparent research in the application of federated learning to speech data. |
Tasks | Keyword Spotting |
Published | 2018-10-09 |
URL | http://arxiv.org/abs/1810.05512v4 |
http://arxiv.org/pdf/1810.05512v4.pdf | |
PWC | https://paperswithcode.com/paper/federated-learning-for-keyword-spotting |
Repo | |
Framework | |
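The "standard weighted model averaging" that the abstract above names as its baseline can be sketched as follows. This is a minimal illustration of one federated averaging aggregation step, assuming each client reports its locally trained weights together with its number of training examples. The dictionary layout, the function name `federated_average`, and the toy numbers are hypothetical, and the Adam-inspired adaptive strategy from the paper is not reproduced here.

```python
from typing import Dict, List, Tuple

# Model weights as named flat lists of floats (an assumed, simplified layout).
Weights = Dict[str, List[float]]


def federated_average(updates: List[Tuple[Weights, int]]) -> Weights:
    """Average client weights, weighting each client by its example count."""
    if not updates:
        raise ValueError("need at least one client update")
    total = sum(n for _, n in updates)
    if total <= 0:
        raise ValueError("total example count must be positive")
    first_weights, _ = updates[0]
    averaged = {name: [0.0] * len(vals) for name, vals in first_weights.items()}
    for weights, n in updates:
        share = n / total  # this client's fraction of all training examples
        for name, vals in weights.items():
            for i, v in enumerate(vals):
                averaged[name][i] += share * v
    return averaged


# Two hypothetical clients: the second holds three times as much data.
clients = [
    ({"w": [1.0, 2.0], "b": [0.0]}, 10),
    ({"w": [3.0, 4.0], "b": [1.0]}, 30),
]
global_weights = federated_average(clients)
print(global_weights)
```

In a full training loop the server would send `global_weights` back to the clients and repeat; each such exchange is one of the communication rounds the abstract refers to.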
Monocular Depth Estimation by Learning from Heterogeneous Datasets
Title | Monocular Depth Estimation by Learning from Heterogeneous Datasets |
Authors | Akhil Gurram, Onay Urfalioglu, Ibrahim Halfaoui, Fahd Bouzaraa, Antonio M. Lopez |
Abstract | Depth estimation provides essential information to perform autonomous driving and driver assistance. Monocular Depth Estimation is especially interesting from a practical point of view, since using a single camera is cheaper than many other options and avoids the need for continuous calibration strategies as required by stereo-vision approaches. State-of-the-art methods for Monocular Depth Estimation are based on Convolutional Neural Networks (CNNs). A promising line of work consists of introducing additional semantic information about the traffic scene when training CNNs for depth estimation. In practice, this means that the depth data used for CNN training is complemented with images having pixel-wise semantic labels, which usually are difficult to annotate (e.g. crowded urban images). Moreover, so far it is common practice to assume that the same raw training data is associated with both types of ground truth, i.e., depth and semantic labels. The main contribution of this paper is to show that this hard constraint can be circumvented, i.e., that we can train CNNs for depth estimation by leveraging the depth and semantic information coming from heterogeneous datasets. In order to illustrate the benefits of our approach, we combine KITTI depth and Cityscapes semantic segmentation datasets, outperforming state-of-the-art results on Monocular Depth Estimation. |
Tasks | Autonomous Driving, Calibration, Depth Estimation, Monocular Depth Estimation, Semantic Segmentation |
Published | 2018-03-21 |
URL | http://arxiv.org/abs/1803.08018v2 |
http://arxiv.org/pdf/1803.08018v2.pdf | |
PWC | https://paperswithcode.com/paper/monocular-depth-estimation-by-learning-from |
Repo | |
Framework | |
Change Detection between Multimodal Remote Sensing Data Using Siamese CNN
Title | Change Detection between Multimodal Remote Sensing Data Using Siamese CNN |
Authors | Zhenchao Zhang, George Vosselman, Markus Gerke, Devis Tuia, Michael Ying Yang |
Abstract | Detecting topographic changes in the urban environment has always been an important task for urban planning and monitoring. In practice, remote sensing data are often available in different modalities and at different time epochs. Change detection between multimodal data can be very challenging since the data show different characteristics. Given 3D laser scanning point clouds and 2D imagery from different epochs, this paper presents a framework to detect building and tree changes. First, the 2D and 3D data are transformed to image patches, respectively. A Siamese CNN is then employed to detect candidate changes between the two epochs. Finally, the candidate patch-based changes are grouped and verified as individual object changes. Experiments on the urban data show that 86.4% of patch pairs can be correctly classified by the model. |
Tasks | |
Published | 2018-07-25 |
URL | http://arxiv.org/abs/1807.09562v1 |
http://arxiv.org/pdf/1807.09562v1.pdf | |
PWC | https://paperswithcode.com/paper/change-detection-between-multimodal-remote |
Repo | |
Framework | |
A Generative Model for Inverse Design of Metamaterials
Title | A Generative Model for Inverse Design of Metamaterials |
Authors | Zhaocheng Liu, Dayu Zhu, Sean P. Rodrigues, Kyu-Tae Lee, Wenshan Cai |
Abstract | The advent of two-dimensional metamaterials in recent years has ushered in a revolutionary means to manipulate the behavior of light on the nanoscale. The effective parameters of these architected materials render unprecedented control over the optical properties of light, thereby eliciting previously unattainable applications in flat lenses, holographic imaging, and emission control among others. The design of such structures, to date, has relied on the expertise of an optical scientist to guide a progression of electromagnetic simulations that iteratively solve Maxwell’s equations until a locally optimized solution can be attained. In this work, we identify a solution to circumvent this intuition-guided design by means of a deep learning architecture. When fed an input set of optical spectra, the constructed generative network assimilates a candidate pattern from a user-defined dataset of geometric structures in order to match the input spectra. The generated metamaterial patterns demonstrate high fidelity, yielding equivalent optical spectra at an average accuracy of about 0.9. This approach reveals an opportunity to expedite the discovery and design of metasurfaces for tailored optical responses in a systematic, inverse-design manner. |
Tasks | |
Published | 2018-05-25 |
URL | http://arxiv.org/abs/1805.10181v1 |
http://arxiv.org/pdf/1805.10181v1.pdf | |
PWC | https://paperswithcode.com/paper/a-generative-model-for-inverse-design-of |
Repo | |
Framework | |
Copenhagen at CoNLL–SIGMORPHON 2018: Multilingual Inflection in Context with Explicit Morphosyntactic Decoding
Title | Copenhagen at CoNLL–SIGMORPHON 2018: Multilingual Inflection in Context with Explicit Morphosyntactic Decoding |
Authors | Yova Kementchedjhieva, Johannes Bjerva, Isabelle Augenstein |
Abstract | This paper documents the Team Copenhagen system which placed first in the CoNLL–SIGMORPHON 2018 shared task on universal morphological reinflection, Task 2 with an overall accuracy of 49.87. Task 2 focuses on morphological inflection in context: generating an inflected word form, given the lemma of the word and the context it occurs in. Previous SIGMORPHON shared tasks have focused on context-agnostic inflection—the “inflection in context” task was introduced this year. We approach this with an encoder-decoder architecture over character sequences with three core innovations, all contributing to an improvement in performance: (1) a wide context window; (2) a multi-task learning approach with the auxiliary task of MSD prediction; (3) training models in a multilingual fashion. |
Tasks | Morphological Inflection, Multi-Task Learning |
Published | 2018-09-05 |
URL | http://arxiv.org/abs/1809.01541v1 |
http://arxiv.org/pdf/1809.01541v1.pdf | |
PWC | https://paperswithcode.com/paper/copenhagen-at-conll-sigmorphon-2018 |
Repo | |
Framework | |
Predictive Linguistic Features of Schizophrenia
Title | Predictive Linguistic Features of Schizophrenia |
Authors | Efsun Sarioglu Kayi, Mona Diab, Luca Pauselli, Michael Compton, Glen Coppersmith |
Abstract | Schizophrenia is one of the most disabling and difficult to treat of all human medical/health conditions, ranking in the top ten causes of disability worldwide. It has been a puzzle in part due to difficulty in identifying its basic, fundamental components. Several studies have shown that some manifestations of schizophrenia (e.g., the negative symptoms that include blunting of speech prosody, as well as the disorganization symptoms that lead to disordered language) can be understood from the perspective of linguistics. However, schizophrenia research has not kept pace with technologies in computational linguistics, especially in semantics and pragmatics. As such, we examine the writings of schizophrenia patients, analyzing their syntax, semantics and pragmatics. In addition, we analyze tweets of (self-proclaimed) schizophrenia patients who publicly discuss their diagnoses. For the writing samples dataset, syntactic features are found to be the most successful in classification, whereas for the less structured Twitter dataset, a combination of features performed the best. |
Tasks | |
Published | 2018-10-22 |
URL | http://arxiv.org/abs/1810.09377v1 |
http://arxiv.org/pdf/1810.09377v1.pdf | |
PWC | https://paperswithcode.com/paper/predictive-linguistic-features-of |
Repo | |
Framework | |
Opinion Dynamics Modeling for Movie Review Transcripts Classification with Hidden Conditional Random Fields
Title | Opinion Dynamics Modeling for Movie Review Transcripts Classification with Hidden Conditional Random Fields |
Authors | Valentin Barriere, Chloé Clavel, Slim Essid |
Abstract | In this paper, the main goal is to detect a movie reviewer’s opinion using hidden conditional random fields. This model allows us to capture the dynamics of the reviewer’s opinion in the transcripts of long unsegmented audio reviews that are analyzed by our system. High-level linguistic features are computed at the level of inter-pausal segments. The features include syntactic features, a statistical word embedding model and subjectivity lexicons. The proposed system is evaluated on the ICT-MMMO corpus. We obtain an F1-score of 82%, which is better than logistic regression and recurrent neural network approaches. We also offer a discussion that sheds some light on the capacity of our system to adapt the word embedding model learned from general written text data to spoken movie reviews and thus model the dynamics of the opinion. |
Tasks | |
Published | 2018-06-20 |
URL | http://arxiv.org/abs/1806.07787v1 |
http://arxiv.org/pdf/1806.07787v1.pdf | |
PWC | https://paperswithcode.com/paper/opinion-dynamics-modeling-for-movie-review |
Repo | |
Framework | |
Semantic Sentence Embeddings for Paraphrasing and Text Summarization
Title | Semantic Sentence Embeddings for Paraphrasing and Text Summarization |
Authors | Chi Zhang, Shagan Sah, Thang Nguyen, Dheeraj Peri, Alexander Loui, Carl Salvaggio, Raymond Ptucha |
Abstract | This paper introduces a sentence-to-vector encoding framework suitable for advanced natural language processing. Our latent representation is shown to encode sentences with common semantic information with similar vector representations. The vector representation is extracted from an encoder-decoder model which is trained on sentence paraphrase pairs. We demonstrate the application of the sentence representations for two different tasks – sentence paraphrasing and paragraph summarization, making it attractive for commonly used recurrent frameworks that process text. Experimental results help gain insight into how vector representations are suitable for advanced language embedding. |
Tasks | Sentence Embeddings, Text Summarization |
Published | 2018-09-26 |
URL | http://arxiv.org/abs/1809.10267v1 |
http://arxiv.org/pdf/1809.10267v1.pdf | |
PWC | https://paperswithcode.com/paper/semantic-sentence-embeddings-for-paraphrasing |
Repo | |
Framework | |
Joint Sub-bands Learning with Clique Structures for Wavelet Domain Super-Resolution
Title | Joint Sub-bands Learning with Clique Structures for Wavelet Domain Super-Resolution |
Authors | Zhisheng Zhong, Tiancheng Shen, Yibo Yang, Zhouchen Lin, Chao Zhang |
Abstract | Convolutional neural networks (CNNs) have recently achieved great success in single-image super-resolution (SISR). However, these methods tend to produce over-smoothed outputs and miss some textural details. To solve these problems, we propose the Super-Resolution CliqueNet (SRCliqueNet) to reconstruct the high resolution (HR) image with better textural details in the wavelet domain. The proposed SRCliqueNet first extracts a set of feature maps from the low resolution (LR) image by the clique blocks group. Then we send the set of feature maps to the clique up-sampling module to reconstruct the HR image. The clique up-sampling module consists of four sub-nets which predict the high resolution wavelet coefficients of four sub-bands. Since we consider the edge feature properties of four sub-bands, the four sub-nets are connected to the others so that they can learn the coefficients of four sub-bands jointly. Finally, we apply the inverse discrete wavelet transform (IDWT) to the output of the four sub-nets at the end of the clique up-sampling module to increase the resolution and reconstruct the HR image. Extensive quantitative and qualitative experiments on benchmark datasets show that our method achieves superior performance over the state-of-the-art methods. |
Tasks | Image Super-Resolution, Super-Resolution |
Published | 2018-09-12 |
URL | http://arxiv.org/abs/1809.04508v3 |
http://arxiv.org/pdf/1809.04508v3.pdf | |
PWC | https://paperswithcode.com/paper/joint-sub-bands-learning-with-clique |
Repo | |
Framework | |
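The final step described in the abstract above, applying an inverse discrete wavelet transform to four predicted sub-bands to obtain an image at twice the resolution, can be illustrated with a small sketch. It assumes a single-level orthonormal Haar wavelet, which the abstract does not specify, and the sub-band names `ll`, `lh`, `hl`, `hh` follow one common convention that may differ from the paper's. The forward transform is included only so the round trip can be checked.

```python
from typing import List, Tuple

Image = List[List[float]]


def haar_dwt2(img: Image) -> Tuple[Image, Image, Image, Image]:
    """Single-level 2D Haar transform of an image with even height and width."""
    h, w = len(img) // 2, len(img[0]) // 2
    ll = [[0.0] * w for _ in range(h)]
    lh = [[0.0] * w for _ in range(h)]
    hl = [[0.0] * w for _ in range(h)]
    hh = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            # The 2x2 pixel block that maps to coefficient (i, j).
            a, b = img[2 * i][2 * j], img[2 * i][2 * j + 1]
            c, d = img[2 * i + 1][2 * j], img[2 * i + 1][2 * j + 1]
            ll[i][j] = (a + b + c + d) / 2  # local average (approximation)
            lh[i][j] = (a - b + c - d) / 2  # left-right difference
            hl[i][j] = (a + b - c - d) / 2  # top-bottom difference
            hh[i][j] = (a - b - c + d) / 2  # diagonal difference
    return ll, lh, hl, hh


def haar_idwt2(ll: Image, lh: Image, hl: Image, hh: Image) -> Image:
    """Invert haar_dwt2: four h x w sub-bands become one 2h x 2w image."""
    h, w = len(ll), len(ll[0])
    out = [[0.0] * (2 * w) for _ in range(2 * h)]
    for i in range(h):
        for j in range(w):
            s, x, y, z = ll[i][j], lh[i][j], hl[i][j], hh[i][j]
            out[2 * i][2 * j] = (s + x + y + z) / 2
            out[2 * i][2 * j + 1] = (s - x + y - z) / 2
            out[2 * i + 1][2 * j] = (s + x - y - z) / 2
            out[2 * i + 1][2 * j + 1] = (s - x - y + z) / 2
    return out


# Round trip on a small made-up image: the IDWT doubles each sub-band dimension.
image = [
    [1.0, 2.0, 3.0, 4.0],
    [5.0, 6.0, 7.0, 8.0],
    [9.0, 8.0, 7.0, 6.0],
    [5.0, 4.0, 3.0, 2.0],
]
bands = haar_dwt2(image)
restored = haar_idwt2(*bands)
print(len(bands[0]), "x", len(bands[0][0]), "->", len(restored), "x", len(restored[0]))
```

In a super-resolution setting the four sub-bands would come from the network's predictions rather than from `haar_dwt2`, so the output of `haar_idwt2` is the reconstructed high-resolution image.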
Closed-form Inference and Prediction in Gaussian Process State-Space Models
Title | Closed-form Inference and Prediction in Gaussian Process State-Space Models |
Authors | Alessandro Davide Ialongo, Mark van der Wilk, Carl Edward Rasmussen |
Abstract | We examine an analytic variational inference scheme for the Gaussian Process State Space Model (GPSSM) - a probabilistic model for system identification and time-series modelling. Our approach performs variational inference over both the system states and the transition function. We exploit Markov structure in the true posterior, as well as an inducing point approximation to achieve linear time complexity in the length of the time series. Contrary to previous approaches, no Monte Carlo sampling is required: inference is cast as a deterministic optimisation problem. In a number of experiments, we demonstrate the ability to model non-linear dynamics in the presence of both process and observation noise as well as to impute missing information (e.g. velocities from raw positions through time), to de-noise, and to estimate the underlying dimensionality of the system. Finally, we also introduce a closed-form method for multi-step prediction, and a novel criterion for assessing the quality of our approximate posterior. |
Tasks | Time Series |
Published | 2018-12-10 |
URL | http://arxiv.org/abs/1812.03580v1 |
http://arxiv.org/pdf/1812.03580v1.pdf | |
PWC | https://paperswithcode.com/paper/closed-form-inference-and-prediction-in |
Repo | |
Framework | |