February 1, 2020


Paper Group AWR 262



Image search using multilingual texts: a cross-modal learning approach between image and text

Title Image search using multilingual texts: a cross-modal learning approach between image and text
Authors Maxime Portaz, Hicham Randrianarivo, Adrien Nivaggioli, Estelle Maudet, Christophe Servan, Sylvain Peyronnet
Abstract Multilingual (or cross-lingual) embeddings represent several languages in a single vector space. Using a common embedding space enables a shared semantics between words from different languages. In this paper, we propose to embed images and texts into a single distributional vector space, enabling image search using text queries that express information needs related to the (visual) content of images, as well as search by image similarity. Our framework forces the representation of an image to be similar to the representation of the text that describes it. Moreover, by using multilingual embeddings we ensure that words from two different languages have close descriptors and are thus attached to similar images. We provide experimental evidence of the efficiency of our approach on two datasets: Common Objects in COntext (COCO) [19] and Multi30K [7].
Tasks Image Retrieval
Published 2019-03-27
URL https://arxiv.org/abs/1903.11299v3
PDF https://arxiv.org/pdf/1903.11299v3.pdf
PWC https://paperswithcode.com/paper/image-search-using-multilingual-texts-a-cross
Repo https://github.com/QwantResearch/text-image-similarity
Framework pytorch
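
The abstract's core objective, forcing an image representation to be similar to the representation of the text that describes it, is commonly trained with a margin-based ranking loss. Below is a minimal PyTorch sketch under that assumption; the margin value, the bidirectional hinge, and all names are illustrative, not the authors' exact formulation.

```python
import torch

def triplet_ranking_loss(img_emb, txt_emb, margin=0.2):
    """img_emb, txt_emb: (batch, dim) L2-normalized embeddings, where
    row i of txt_emb is the (possibly non-English) caption of image i."""
    scores = img_emb @ txt_emb.t()    # scores[i, j] = sim(image i, caption j)
    pos = scores.diag().unsqueeze(1)  # similarities of matching pairs
    # rank the true caption above other captions for each image...
    cost_txt = (margin + scores - pos).clamp(min=0)
    # ...and the true image above other images for each caption
    cost_img = (margin + scores - pos.t()).clamp(min=0)
    mask = torch.eye(scores.size(0), dtype=torch.bool, device=scores.device)
    return cost_txt.masked_fill(mask, 0).sum() + cost_img.masked_fill(mask, 0).sum()
```

Because the multilingual text encoder maps captions from different languages near each other, the same loss ties images to captions in any of the covered languages.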

Show, Translate and Tell

Title Show, Translate and Tell
Authors Dheeraj Peri, Shagan Sah, Raymond Ptucha
Abstract Humans have an incredible ability to process and understand information from multiple sources such as images, video, text, and speech. The recent success of deep neural networks has enabled us to develop algorithms which give machines the ability to understand and interpret this information. There is a need both to broaden their applicability and to develop methods which correlate visual information with semantic content. We propose a unified model which jointly trains on images and captions, and learns to generate new captions given either an image or a caption query. We evaluate our model on three different tasks, namely cross-modal retrieval, image captioning, and sentence paraphrasing. Our model gains insight into cross-modal vector embeddings, generalizes well on multiple tasks, and is competitive with state-of-the-art methods on retrieval.
Tasks Cross-Modal Retrieval, Image Captioning
Published 2019-03-14
URL http://arxiv.org/abs/1903.06275v1
PDF http://arxiv.org/pdf/1903.06275v1.pdf
PWC https://paperswithcode.com/paper/show-translate-and-tell
Repo https://github.com/peri044/STT
Framework tf
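
The unified model in the abstract can be read as one decoder over a shared embedding space: a caption is generated the same way whether the query was an image or a sentence. A schematic sketch, with all module names as hypothetical stand-ins:

```python
def generate_caption(query, image_encoder, text_encoder, decoder, is_image):
    """Hypothetical interface: both encoders map into the same space,
    and a single decoder generates a caption from the shared embedding."""
    z = image_encoder(query) if is_image else text_encoder(query)
    return decoder(z)  # image -> caption, or caption -> paraphrase
```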

Phonetically-Oriented Word Error Alignment for Speech Recognition Error Analysis in Speech Translation

Title Phonetically-Oriented Word Error Alignment for Speech Recognition Error Analysis in Speech Translation
Authors Nicholas Ruiz, Marcello Federico
Abstract We propose a variation of the commonly used Word Error Rate (WER) metric for speech recognition evaluation which incorporates the alignment of phonemes, in the absence of time boundary information. After computing the Levenshtein alignment on words in the reference and hypothesis transcripts, spans of adjacent errors are converted into phonemes with word and syllable boundaries and a phonetic Levenshtein alignment is performed. The aligned phonemes are recombined into aligned words that adjust the word alignment labels in each error region. We demonstrate that our Phonetically-Oriented Word Error Rate (POWER) yields similar scores to WER with the added advantages of better word alignments and the ability to capture one-to-many word alignments corresponding to homophonic errors in speech recognition hypotheses. These improved alignments allow us to better trace the impact of Levenshtein error types on downstream tasks such as speech translation.
Tasks Speech Recognition, Word Alignment
Published 2019-04-24
URL http://arxiv.org/abs/1904.11024v1
PDF http://arxiv.org/pdf/1904.11024v1.pdf
PWC https://paperswithcode.com/paper/phonetically-oriented-word-error-alignment
Repo https://github.com/NickRuiz/power-asr
Framework none
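
The first pass described in the abstract is a standard word-level Levenshtein alignment; POWER then converts each contiguous error span to phonemes (with word and syllable boundaries) and re-aligns it phonetically. A minimal sketch of that first pass, with the phonetic second pass elided:

```python
def word_alignment(ref, hyp):
    """Return a list of (op, ref_word, hyp_word) edit operations, where
    op is C (correct), S (substitution), D (deletion), or I (insertion)."""
    m, n = len(ref), len(hyp)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    ops, i, j = [], m, n          # backtrace from the bottom-right cell
    while i > 0 or j > 0:
        if i > 0 and j > 0 and d[i][j] == d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1]):
            op = "C" if ref[i - 1] == hyp[j - 1] else "S"
            ops.append((op, ref[i - 1], hyp[j - 1])); i, j = i - 1, j - 1
        elif i > 0 and d[i][j] == d[i - 1][j] + 1:
            ops.append(("D", ref[i - 1], None)); i -= 1
        else:
            ops.append(("I", None, hyp[j - 1])); j -= 1
    return ops[::-1]
```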

The Convex Information Bottleneck Lagrangian

Title The Convex Information Bottleneck Lagrangian
Authors Borja Rodríguez Gálvez, Ragnar Thobaben, Mikael Skoglund
Abstract The information bottleneck (IB) problem tackles the issue of obtaining relevant compressed representations $T$ of some random variable $X$ for the task of predicting $Y$. It is defined as a constrained optimization problem which maximizes the information the representation has about the task, $I(T;Y)$, while ensuring that a certain level of compression $r$ is achieved (i.e., $I(X;T) \leq r$). For practical reasons, the problem is usually solved by maximizing the IB Lagrangian (i.e., $\mathcal{L}_{\text{IB}}(T;\beta) = I(T;Y) - \beta I(X;T)$) for many values of $\beta \in [0,1]$. Then, the curve of maximal $I(T;Y)$ for a given $I(X;T)$ is drawn and a representation with the desired predictability and compression is selected. It is known that when $Y$ is a deterministic function of $X$, the IB curve cannot be explored in this way, and another Lagrangian has been proposed to tackle this problem: the squared IB Lagrangian, $\mathcal{L}_{\text{sq-IB}}(T;\beta_{\text{sq}})=I(T;Y)-\beta_{\text{sq}}I(X;T)^2$. In this paper, we (i) present a general family of Lagrangians which allow for the exploration of the IB curve in all scenarios; (ii) provide the exact one-to-one mapping between the Lagrange multiplier and the desired compression rate $r$ for known IB curve shapes; and (iii) show that we can approximately obtain a specific compression level with the convex IB Lagrangian for both known and unknown IB curve shapes. This eliminates the burden of solving the optimization problem for many values of the Lagrange multiplier. That is, we prove that we can solve the original constrained problem with a single optimization.
Tasks
Published 2019-11-25
URL https://arxiv.org/abs/1911.11000v2
PDF https://arxiv.org/pdf/1911.11000v2.pdf
PWC https://paperswithcode.com/paper/the-convex-information-bottleneck-lagrangian-1
Repo https://github.com/burklight/convex-IB-Lagrangian-PyTorch
Framework pytorch
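
As a compact restatement of the family the abstract describes, with the squared IB Lagrangian as the special case $u(x) = x^2$ (the exact conditions on $u$ are spelled out in the paper; monotonicity and strict convexity are the essential ones):

```latex
\mathcal{L}_{u\text{-IB}}(T;\beta_u) \;=\; I(T;Y) \;-\; \beta_u\, u\!\left(I(X;T)\right),
\qquad u \text{ monotonically increasing and strictly convex}
```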

Counterfactual Story Reasoning and Generation

Title Counterfactual Story Reasoning and Generation
Authors Lianhui Qin, Antoine Bosselut, Ari Holtzman, Chandra Bhagavatula, Elizabeth Clark, Yejin Choi
Abstract Counterfactual reasoning requires predicting how alternative events, contrary to what actually happened, might have resulted in different outcomes. Despite being considered a necessary component of AI-complete systems, few resources have been developed for evaluating counterfactual reasoning in narratives. In this paper, we propose Counterfactual Story Rewriting: given an original story and an intervening counterfactual event, the task is to minimally revise the story to make it compatible with the given counterfactual event. Solving this task will require deep understanding of causal narrative chains and counterfactual invariance, and integration of such story reasoning capabilities into conditional language generation models. We present TimeTravel, a new dataset of 29,849 counterfactual rewritings, each with the original story, a counterfactual event, and a human-generated revision of the original story that is compatible with the counterfactual event. Additionally, we include 80,115 counterfactual “branches” without a rewritten storyline to support future work on semi- or unsupervised approaches to counterfactual story rewriting. Finally, we evaluate the counterfactual rewriting capacities of several competitive baselines based on pretrained language models, and assess whether common overlap and model-based automatic metrics for text generation correlate well with human scores for counterfactual rewriting.
Tasks Text Generation
Published 2019-09-09
URL https://arxiv.org/abs/1909.04076v2
PDF https://arxiv.org/pdf/1909.04076v2.pdf
PWC https://paperswithcode.com/paper/counterfactual-story-reasoning-and-generation
Repo https://github.com/qkaren/Counterfactual-StoryRW
Framework tf
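
For concreteness, one TimeTravel instance pairs an original story with a counterfactual event and a minimally revised ending. The field names and values below are hypothetical, meant only to show the shape of the data:

```python
example = {
    "premise": "Ana had her first job interview today.",       # shared opening
    "initial_event": "She arrived ten minutes early.",         # what actually happened
    "counterfactual_event": "Her bus broke down on the way.",  # the intervention
    "original_ending": "The interviewer was impressed...",
    "edited_ending": "She called ahead to apologize...",       # human minimal revision
}
```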

Improving Neural Story Generation by Targeted Common Sense Grounding

Title Improving Neural Story Generation by Targeted Common Sense Grounding
Authors Huanru Henry Mao, Bodhisattwa Prasad Majumder, Julian McAuley, Garrison W. Cottrell
Abstract Stories generated with neural language models have shown promise in grammatical and stylistic consistency. However, the generated stories are still lacking in common sense reasoning, e.g., they often contain sentences deprived of world knowledge. We propose a simple multi-task learning scheme to achieve quantitatively better common sense reasoning in language models by leveraging auxiliary training signals from datasets designed to provide common sense grounding. When combined with our two-stage fine-tuning pipeline, our method achieves improved common sense reasoning and state-of-the-art perplexity on the Writing Prompts (Fan et al., 2018) story generation dataset.
Tasks Common Sense Reasoning, Multi-Task Learning
Published 2019-08-26
URL https://arxiv.org/abs/1908.09451v2
PDF https://arxiv.org/pdf/1908.09451v2.pdf
PWC https://paperswithcode.com/paper/improving-neural-story-generation-by-targeted
Repo https://github.com/calclavia/story-generation
Framework pytorch
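
The multi-task scheme can be summarized as a weighted sum of the story language-modeling loss and an auxiliary loss from a common-sense grounding dataset. A schematic sketch; the mixing weight and the model interface are assumptions, not the authors' exact pipeline:

```python
def multitask_step(model, story_batch, commonsense_batch, lambda_aux=0.5):
    """One training step mixing the two signals (hypothetical interface)."""
    lm_loss = model.lm_loss(story_batch)                  # next-token prediction on stories
    aux_loss = model.commonsense_loss(commonsense_batch)  # auxiliary grounding signal
    return lm_loss + lambda_aux * aux_loss
```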

ViSiL: Fine-grained Spatio-Temporal Video Similarity Learning

Title ViSiL: Fine-grained Spatio-Temporal Video Similarity Learning
Authors Giorgos Kordopatis-Zilos, Symeon Papadopoulos, Ioannis Patras, Ioannis Kompatsiaris
Abstract In this paper we introduce ViSiL, a Video Similarity Learning architecture that considers fine-grained Spatio-Temporal relations between pairs of videos – such relations are typically lost in previous video retrieval approaches that embed the whole frame or even the whole video into a vector descriptor before the similarity estimation. By contrast, our Convolutional Neural Network (CNN)-based approach is trained to calculate video-to-video similarity from refined frame-to-frame similarity matrices, so as to consider both intra- and inter-frame relations. In the proposed method, pairwise frame similarity is estimated by applying Tensor Dot (TD) followed by Chamfer Similarity (CS) on regional CNN frame features - this avoids feature aggregation before the similarity calculation between frames. Subsequently, the similarity matrix between all video frames is fed to a four-layer CNN, and then summarized using Chamfer Similarity (CS) into a video-to-video similarity score – this avoids feature aggregation before the similarity calculation between videos and captures the temporal similarity patterns between matching frame sequences. We train the proposed network using a triplet loss scheme and evaluate it on five public benchmark datasets on four different video retrieval problems where we demonstrate large improvements in comparison to the state of the art. The implementation of ViSiL is publicly available.
Tasks Video Retrieval, Video Similarity
Published 2019-08-20
URL https://arxiv.org/abs/1908.07410v1
PDF https://arxiv.org/pdf/1908.07410v1.pdf
PWC https://paperswithcode.com/paper/visil-fine-grained-spatio-temporal-video
Repo https://github.com/MKLab-ITI/visil
Framework tf
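
The frame-level similarity the abstract describes, Tensor Dot followed by Chamfer Similarity over regional features, reduces to a max over one axis and a mean over the other. A minimal NumPy sketch, assuming L2-normalized regional features of shape (regions, dim) per frame:

```python
import numpy as np

def chamfer_frame_similarity(frame_a, frame_b):
    sim = frame_a @ frame_b.T      # region-to-region similarities (Tensor Dot)
    return sim.max(axis=1).mean()  # Chamfer Similarity

def frame_similarity_matrix(video_a, video_b):
    """video_*: lists of (regions, dim) arrays; returns the (Na, Nb) matrix
    that ViSiL would feed to its four-layer similarity CNN."""
    return np.array([[chamfer_frame_similarity(fa, fb)
                      for fb in video_b] for fa in video_a])
```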

Imbalance-XGBoost: Leveraging Weighted and Focal Losses for Binary Label-Imbalanced Classification with XGBoost

Title Imbalance-XGBoost: Leveraging Weighted and Focal Losses for Binary Label-Imbalanced Classification with XGBoost
Authors Chen Wang, Chengyuan Deng, Suzhen Wang
Abstract The paper presents Imbalance-XGBoost, a Python package that combines the powerful XGBoost software with weighted and focal losses to tackle binary label-imbalanced classification tasks. Though a small-scale program, the package is, to the best of the authors’ knowledge, the first of its kind to provide an integrated implementation of the two losses on XGBoost, and it brings a general-purpose extension of XGBoost to label-imbalanced scenarios. In this paper, the design and usage of the package are described with exemplar code listings, and the convenience of integrating it into Python-driven machine learning projects is illustrated. Furthermore, as the first- and second-order derivatives of the loss functions are essential for the implementation, their algebraic derivation is discussed and can be deemed a separate algorithmic contribution. The performance of the algorithms implemented in the package is empirically evaluated on a Parkinson’s disease classification data set, and multiple state-of-the-art performances have been observed. Given the scalable nature of XGBoost, the package has great potential to be applied to real-life binary classification tasks, which are usually large-scale and label-imbalanced.
Tasks
Published 2019-08-05
URL https://arxiv.org/abs/1908.01672v1
PDF https://arxiv.org/pdf/1908.01672v1.pdf
PWC https://paperswithcode.com/paper/imbalance-xgboost-leveraging-weighted-and
Repo https://github.com/jhwjhw0123/Imbalance-XGBoost
Framework none
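
As an illustration of the kind of objective the package implements, here is a weighted binary cross-entropy with the first- and second-order derivatives XGBoost expects from a custom objective. The class weight alpha and the exact form are a sketch, not necessarily the package's implementation:

```python
import numpy as np

def weighted_binary_obj(alpha=2.0):
    """Weighted cross-entropy L = -w * [y log p + (1-y) log(1-p)],
    with per-example weight w = alpha for positives and 1 for negatives."""
    def obj(preds, dtrain):
        y = dtrain.get_label()
        p = 1.0 / (1.0 + np.exp(-preds))  # sigmoid of raw margins
        w = alpha * y + (1.0 - y)         # per-example class weight
        grad = w * (p - y)                # d(loss)/d(margin)
        hess = w * p * (1.0 - p)          # d2(loss)/d(margin)2
        return grad, hess
    return obj
```

It plugs in via xgb.train(params, dtrain, obj=weighted_binary_obj(2.0)); the focal loss variant follows the same pattern with more involved derivatives.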

Conditional Generative ConvNets for Exemplar-based Texture Synthesis

Title Conditional Generative ConvNets for Exemplar-based Texture Synthesis
Authors Zi-Ming Wang, Meng-Han Li, Gui-Song Xia
Abstract The goal of exemplar-based texture synthesis is to generate texture images that are visually similar to a given exemplar. Recently, promising results have been reported by methods relying on convolutional neural networks (ConvNets) pretrained on large-scale image datasets. However, these methods have difficulty synthesizing image textures with non-local structures and extending to dynamic or sound textures. In this paper, we present a conditional generative ConvNet (cgCNN) model which combines deep statistics and the probabilistic framework of the generative ConvNet (gCNN) model. Given a texture exemplar, the cgCNN model defines a conditional distribution using deep statistics of a ConvNet, and synthesizes new textures by sampling from this conditional distribution. In contrast to previous deep texture models, the proposed cgCNN does not rely on pre-trained ConvNets but instead learns the weights of the ConvNet for each input exemplar. As a result, the cgCNN model can synthesize high-quality dynamic, sound, and image textures in a unified manner. We also explore the theoretical connections between our model and other texture models. Further investigations show that the cgCNN model can be easily generalized to texture expansion and inpainting. Extensive experiments demonstrate that our model achieves results that are better than or at least comparable to those of state-of-the-art methods.
Tasks Texture Synthesis
Published 2019-12-17
URL https://arxiv.org/abs/1912.07971v1
PDF https://arxiv.org/pdf/1912.07971v1.pdf
PWC https://paperswithcode.com/paper/conditional-generative-convnets-for-exemplar
Repo https://github.com/wzm2256/cgCNN
Framework tf
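
The "deep statistics" in texture models of this family are typically Gram matrices of ConvNet feature maps; cgCNN defines its conditional distribution with such statistics but learns the ConvNet weights per exemplar. A sketch of the statistic and a distance on it, as an assumption about the flavor of statistics involved:

```python
import torch

def gram(features):
    """features: (channels, H, W) feature map -> (channels, channels) Gram matrix."""
    c, h, w = features.shape
    f = features.reshape(c, h * w)
    return f @ f.t() / (h * w)

def statistic_distance(feats_synth, feats_exemplar):
    """Sum of squared Gram differences across corresponding feature maps."""
    return sum(((gram(a) - gram(b)) ** 2).sum()
               for a, b in zip(feats_synth, feats_exemplar))
```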

A Statistical Investigation of Long Memory in Language and Music

Title A Statistical Investigation of Long Memory in Language and Music
Authors Alexander Greaves-Tunnell, Zaid Harchaoui
Abstract Representation and learning of long-range dependencies is a central challenge confronted in modern applications of machine learning to sequence data. Yet despite the prominence of this issue, the basic problem of measuring long-range dependence, either in a given data source or as represented in a trained deep model, remains largely limited to heuristic tools. We contribute a statistical framework for investigating long-range dependence in current applications of deep sequence modeling, drawing on the well-developed theory of long memory stochastic processes. This framework yields testable implications concerning the relationship between long memory in real-world data and its learned representation in a deep learning architecture, which are explored through a semiparametric framework adapted to the high-dimensional setting.
Tasks Language Modelling, Time Series
Published 2019-04-08
URL https://arxiv.org/abs/1904.03834v2
PDF https://arxiv.org/pdf/1904.03834v2.pdf
PWC https://paperswithcode.com/paper/a-statistical-investigation-of-long-memory-in
Repo https://github.com/alecgt/RNN_long_memory
Framework pytorch
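
For a concrete taste of the long-memory toolkit the paper draws on, here is the classical Geweke-Porter-Hudak (GPH) semiparametric estimator of the long-memory parameter d, via a log-periodogram regression at low frequencies. This is standard background, not the authors' specific estimator:

```python
import numpy as np

def gph_estimate(x, n_freqs=None):
    """Estimate the long-memory parameter d of a 1-D series x."""
    n = len(x)
    if n_freqs is None:
        n_freqs = int(np.sqrt(n))  # a common bandwidth choice
    freqs = 2 * np.pi * np.arange(1, n_freqs + 1) / n
    spectrum = np.abs(np.fft.fft(x - x.mean())[1:n_freqs + 1]) ** 2 / (2 * np.pi * n)
    regressor = np.log(4 * np.sin(freqs / 2) ** 2)
    # log I(w_j) = c - d * log(4 sin^2(w_j / 2)) + noise, so d = -slope
    slope = np.polyfit(regressor, np.log(spectrum), 1)[0]
    return -slope
```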

OTEANN: Estimating the Transparency of Orthographies with an Artificial Neural Network

Title OTEANN: Estimating the Transparency of Orthographies with an Artificial Neural Network
Authors Xavier Marjou
Abstract To transcribe spoken language to a written medium, most alphabets enable an unambiguous sound-to-letter rule. However, some writing systems have distanced themselves from this simple concept, and little work exists on measuring that distance. In this study, we use an Artificial Neural Network (ANN) model to evaluate the transparency between written words and their pronunciation, hence its name: Orthographic Transparency Estimation with an ANN (OTEANN). Based on datasets derived from Wikimedia dictionaries, we trained and tested this model to score the percentage of false predictions in phoneme-to-grapheme and grapheme-to-phoneme translation tasks. The scores obtained on 15 orthographies were in line with the estimations of other studies. Interestingly, the model also provided insight into typical mistakes made by learners who only consider the phonemic rule in reading and writing.
Tasks
Published 2019-12-31
URL https://arxiv.org/abs/1912.13321v1
PDF https://arxiv.org/pdf/1912.13321v1.pdf
PWC https://paperswithcode.com/paper/oteann-estimating-the-transparency-of
Repo https://github.com/marxav/oteann
Framework none
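
The transparency score described in the abstract reduces to the percentage of false predictions of a trained grapheme-to-phoneme or phoneme-to-grapheme model on held-out pairs. A minimal sketch; model.predict is a hypothetical interface:

```python
def transparency_score(model, pairs):
    """pairs: list of (source, target) strings; returns % of wrong predictions."""
    wrong = sum(model.predict(src) != tgt for src, tgt in pairs)
    return 100.0 * wrong / len(pairs)
```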

Trick or TReAT: Thematic Reinforcement for Artistic Typography

Title Trick or TReAT: Thematic Reinforcement for Artistic Typography
Authors Purva Tendulkar, Kalpesh Krishna, Ramprasaath R. Selvaraju, Devi Parikh
Abstract An approach to make text visually appealing and memorable is semantic reinforcement - the use of visual cues alluding to the context or theme in which the word is being used to reinforce the message (e.g., Google Doodles). We present a computational approach for semantic reinforcement called TReAT - Thematic Reinforcement for Artistic Typography. Given an input word (e.g. exam) and a theme (e.g. education), the individual letters of the input word are replaced by cliparts relevant to the theme which visually resemble the letters - adding creative context to the potentially boring input word. We use an unsupervised approach to learn a latent space to represent letters and cliparts and compute similarities between the two. Human studies show that participants can reliably recognize the word as well as the theme in our outputs (TReATs) and find them more creative compared to meaningful baselines.
Tasks
Published 2019-03-19
URL http://arxiv.org/abs/1903.07820v1
PDF http://arxiv.org/pdf/1903.07820v1.pdf
PWC https://paperswithcode.com/paper/trick-or-treat-thematic-reinforcement-for
Repo https://github.com/purvaten/treat
Framework pytorch
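
The matching step in the abstract, once letters and cliparts live in a shared latent space, is a nearest-neighbor lookup. A sketch under that reading; the cosine metric and array layout are assumptions:

```python
import numpy as np

def pick_cliparts(letter_z, clipart_z):
    """letter_z: (L, d), clipart_z: (C, d) latent codes from the shared
    (unsupervised) embedding; returns the index of the best clipart per letter."""
    a = letter_z / np.linalg.norm(letter_z, axis=1, keepdims=True)
    b = clipart_z / np.linalg.norm(clipart_z, axis=1, keepdims=True)
    return (a @ b.T).argmax(axis=1)  # nearest clipart by cosine similarity
```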

Parametric Resynthesis with neural vocoders

Title Parametric Resynthesis with neural vocoders
Authors Soumi Maiti, Michael I Mandel
Abstract Noise suppression systems generally produce output speech with compromised quality. We propose to utilize the high quality speech generation capability of neural vocoders for noise suppression. We use a neural network to predict clean mel-spectrogram features from noisy speech and then compare two neural vocoders, WaveNet and WaveGlow, for synthesizing clean speech from the predicted mel spectrogram. Both WaveNet and WaveGlow achieve better subjective and objective quality scores than the source separation model Chimera++. Further, WaveNet and WaveGlow also achieve significantly better subjective quality ratings than the oracle Wiener mask. Moreover, we observe that between WaveNet and WaveGlow, WaveNet achieves the best subjective quality scores, although at the cost of much slower waveform generation.
Tasks
Published 2019-06-16
URL https://arxiv.org/abs/1906.06762v2
PDF https://arxiv.org/pdf/1906.06762v2.pdf
PWC https://paperswithcode.com/paper/parametric-resynthesis-with-neural-vocoders
Repo https://github.com/r9y9/wavenet_vocoder
Framework pytorch
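
The two-stage pipeline in the abstract, written out schematically: a prediction network maps noisy features to a clean mel-spectrogram, and a neural vocoder (WaveNet or WaveGlow) synthesizes the waveform from it. Both interfaces below are hypothetical stand-ins for the actual models:

```python
def parametric_resynthesis(noisy_features, prediction_net, vocoder):
    clean_mel = prediction_net(noisy_features)  # stage 1: denoise in the mel domain
    return vocoder.infer(clean_mel)             # stage 2: neural vocoding
```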

Stochastic Shared Embeddings: Data-driven Regularization of Embedding Layers

Title Stochastic Shared Embeddings: Data-driven Regularization of Embedding Layers
Authors Liwei Wu, Shuqing Li, Cho-Jui Hsieh, James Sharpnack
Abstract In deep neural nets, lower-level embedding layers account for a large portion of the total number of parameters. Tikhonov regularization, graph-based regularization, and hard parameter sharing are approaches that introduce explicit biases into training in the hope of reducing statistical complexity. Alternatively, we propose stochastically shared embeddings (SSE), a data-driven approach to regularizing embedding layers, which stochastically transitions between embeddings during stochastic gradient descent (SGD). Because SSE integrates seamlessly with existing SGD algorithms, it can be used with only minor modifications when training large-scale neural networks. We develop two versions of SSE: SSE-Graph, which uses knowledge graphs of embeddings, and SSE-SE, which uses no prior information. We provide theoretical guarantees for our method and show its empirical effectiveness on 6 distinct tasks, from simple neural networks with one hidden layer in recommender systems to the Transformer and BERT in natural language tasks. We find that when used along with widely used regularization methods such as weight decay and dropout, our proposed SSE can further reduce overfitting, which often leads to more favorable generalization results.
Tasks Knowledge Graphs, Recommendation Systems
Published 2019-05-25
URL https://arxiv.org/abs/1905.10630v2
PDF https://arxiv.org/pdf/1905.10630v2.pdf
PWC https://paperswithcode.com/paper/stochastic-shared-embeddings-data-driven
Repo https://github.com/wuliwei9278/SSE-PT
Framework tf
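
A minimal PyTorch sketch of the prior-free variant (SSE-SE), read from the abstract as stochastically replacing embedding indices during training; the swap probability and the uniform proposal are assumptions:

```python
import torch

def sse_se_lookup(embedding, ids, p=0.01, training=True):
    """embedding: nn.Embedding; ids: LongTensor of lookup indices.
    With probability p, each index is swapped for a uniformly random one,
    so distinct embeddings occasionally share gradient signal."""
    if training and p > 0:
        swap = torch.rand_like(ids, dtype=torch.float) < p
        random_ids = torch.randint_like(ids, high=embedding.num_embeddings)
        ids = torch.where(swap, random_ids, ids)
    return embedding(ids)
```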

Content based News Recommendation via Shortest Entity Distance over Knowledge Graphs

Title Content based News Recommendation via Shortest Entity Distance over Knowledge Graphs
Authors Kevin Joseph, Hui Jiang
Abstract Content-based news recommendation systems need to recommend news articles based on the topics and content of articles, without using user-specific information. Many news articles describe the occurrence of specific events and named entities, including people, places, or objects. In this paper, we propose a graph traversal algorithm as well as a novel weighting scheme for cold-start, content-based news recommendation utilizing these named entities. Seeking to create a higher degree of user-specific relevance, our algorithm computes the shortest distance between named entities, across news articles, over a large knowledge graph. Moreover, we have created a new human-annotated data set for evaluating content-based news recommendation systems. Experimental results show that our method is well suited to tackling the hard cold-start problem and that it produces stronger Pearson correlation with human similarity scores than other cold-start methods. Our method is also complementary to conventional cold-start recommendation methods, and combining them may yield significant performance gains. The dataset, CNRec, is available at: https://github.com/kevinj22/CNRec
Tasks Knowledge Graphs, Recommendation Systems
Published 2019-05-24
URL https://arxiv.org/abs/1905.13132v1
PDF https://arxiv.org/pdf/1905.13132v1.pdf
PWC https://paperswithcode.com/paper/190513132
Repo https://github.com/kevinj22/CNRec
Framework none
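
The core signal, the shortest distance between two named entities over a knowledge graph, is a plain breadth-first search. A sketch with the graph as an adjacency dict; the paper's novel weighting scheme sits on top of these distances:

```python
from collections import deque

def shortest_entity_distance(graph, source, target):
    """graph: dict mapping entity -> iterable of neighboring entities.
    Returns the number of hops between source and target, or None."""
    if source == target:
        return 0
    seen, frontier = {source}, deque([(source, 0)])
    while frontier:
        node, dist = frontier.popleft()
        for nxt in graph.get(node, ()):
            if nxt == target:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, dist + 1))
    return None  # entities not connected
```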