Paper Group AWR 262
Image search using multilingual texts: a cross-modal learning approach between image and text
Title | Image search using multilingual texts: a cross-modal learning approach between image and text |
Authors | Maxime Portaz, Hicham Randrianarivo, Adrien Nivaggioli, Estelle Maudet, Christophe Servan, Sylvain Peyronnet |
Abstract | Multilingual (or cross-lingual) embeddings represent several languages in a unique vector space. Using a common embedding space enables a shared semantics between words from different languages. In this paper, we propose to embed images and texts into a single distributional vector space, enabling images to be searched using text queries that express information needs related to the (visual) content of images, as well as through image similarity. Our framework forces the representation of an image to be similar to the representation of the text that describes it. Moreover, by using multilingual embeddings we ensure that words from two different languages have close descriptors and are thus attached to similar images. We provide experimental evidence of the efficiency of our approach by evaluating it on two datasets: Common Objects in COntext (COCO) [19] and Multi30K [7]. |
Tasks | Image Retrieval |
Published | 2019-03-27 |
URL | https://arxiv.org/abs/1903.11299v3 |
https://arxiv.org/pdf/1903.11299v3.pdf | |
PWC | https://paperswithcode.com/paper/image-search-using-multilingual-texts-a-cross |
Repo | https://github.com/QwantResearch/text-image-similarity |
Framework | pytorch |
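
A minimal sketch of the shared-embedding idea the abstract above describes: an image encoder and a text encoder project into one common space, and a triplet ranking loss over in-batch negatives pulls each image toward its caption. The encoder choices, dimensions, and margin here are illustrative assumptions, not the paper's exact architecture.

```python
# Hedged sketch: joint image-text embedding trained with a triplet ranking
# loss. Encoders, dimensions, and margin are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class JointEmbedding(nn.Module):
    def __init__(self, img_dim=2048, txt_dim=300, emb_dim=512):
        super().__init__()
        self.img_proj = nn.Linear(img_dim, emb_dim)                 # projects CNN image features
        self.txt_enc = nn.GRU(txt_dim, emb_dim, batch_first=True)   # encodes (multilingual) word embeddings

    def forward(self, img_feats, txt_embs):
        v = F.normalize(self.img_proj(img_feats), dim=-1)           # unit-norm image vectors
        _, h = self.txt_enc(txt_embs)                               # last hidden state as sentence vector
        t = F.normalize(h.squeeze(0), dim=-1)
        return v, t

def triplet_ranking_loss(v, t, margin=0.2):
    """Matching (image, caption) pairs must outscore every in-batch
    mismatched pair by at least `margin` (cosine similarity)."""
    scores = v @ t.T                                 # (B, B) similarity matrix
    pos = scores.diag().unsqueeze(1)
    cost_img = (margin + scores - pos).clamp(min=0)  # captions as negatives per image
    cost_txt = (margin + scores - pos.T).clamp(min=0)  # images as negatives per caption
    mask = torch.eye(scores.size(0), dtype=torch.bool, device=scores.device)
    return (cost_img.masked_fill(mask, 0).mean()
            + cost_txt.masked_fill(mask, 0).mean())
```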
Show, Translate and Tell
Title | Show, Translate and Tell |
Authors | Dheeraj Peri, Shagan Sah, Raymond Ptucha |
Abstract | Humans have an incredible ability to process and understand information from multiple sources such as images, video, text, and speech. The recent success of deep neural networks has enabled us to develop algorithms which give machines the ability to understand and interpret this information. There is a need both to broaden their applicability and to develop methods which correlate visual information with semantic content. We propose a unified model which jointly trains on images and captions, and learns to generate new captions given either an image or a caption query. We evaluate our model on three different tasks, namely cross-modal retrieval, image captioning, and sentence paraphrasing. Our model gains insight into cross-modal vector embeddings, generalizes well on multiple tasks, and is competitive with state-of-the-art methods on retrieval. |
Tasks | Cross-Modal Retrieval, Image Captioning |
Published | 2019-03-14 |
URL | http://arxiv.org/abs/1903.06275v1 |
http://arxiv.org/pdf/1903.06275v1.pdf | |
PWC | https://paperswithcode.com/paper/show-translate-and-tell |
Repo | https://github.com/peri044/STT |
Framework | tf |
Phonetically-Oriented Word Error Alignment for Speech Recognition Error Analysis in Speech Translation
Title | Phonetically-Oriented Word Error Alignment for Speech Recognition Error Analysis in Speech Translation |
Authors | Nicholas Ruiz, Marcello Federico |
Abstract | We propose a variation to the commonly used Word Error Rate (WER) metric for speech recognition evaluation which incorporates the alignment of phonemes, in the absence of time boundary information. After computing the Levenshtein alignment on words in the reference and hypothesis transcripts, spans of adjacent errors are converted into phonemes with word and syllable boundaries and a phonetic Levenshtein alignment is performed. The aligned phonemes are recombined into aligned words that adjust the word alignment labels in each error region. We demonstrate that our Phonetically-Oriented Word Error Rate (POWER) yields similar scores to WER with the added advantages of better word alignments and the ability to capture one-to-many word alignments corresponding to homophonic errors in speech recognition hypotheses. These improved alignments allow us to better trace the impact of Levenshtein error types on downstream tasks such as speech translation. |
Tasks | Speech Recognition, Word Alignment |
Published | 2019-04-24 |
URL | http://arxiv.org/abs/1904.11024v1 |
http://arxiv.org/pdf/1904.11024v1.pdf | |
PWC | https://paperswithcode.com/paper/phonetically-oriented-word-error-alignment |
Repo | https://github.com/NickRuiz/power-asr |
Framework | none |
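
The first stage the abstract describes is a plain word-level Levenshtein alignment between the reference and hypothesis transcripts. A minimal sketch of that stage follows; POWER's actual contribution, re-aligning the error spans at the phoneme level, is omitted, and the function name is ours.

```python
# Hedged sketch of POWER's first stage: word-level Levenshtein alignment.
# The phonetic re-alignment of error spans is omitted; names are illustrative.
def levenshtein_align(ref, hyp):
    """Return alignment labels: C(orrect), S(ubstitution), I(nsertion), D(eletion)."""
    m, n = len(ref), len(hyp)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            d[i][j] = min(d[i-1][j-1] + (ref[i-1] != hyp[j-1]),  # substitution/match
                          d[i-1][j] + 1,                          # deletion
                          d[i][j-1] + 1)                          # insertion
    labels, i, j = [], m, n           # backtrace to recover the label sequence
    while i > 0 or j > 0:
        if i > 0 and j > 0 and d[i][j] == d[i-1][j-1] + (ref[i-1] != hyp[j-1]):
            labels.append('C' if ref[i-1] == hyp[j-1] else 'S')
            i, j = i - 1, j - 1
        elif i > 0 and d[i][j] == d[i-1][j] + 1:
            labels.append('D'); i -= 1
        else:
            labels.append('I'); j -= 1
    return labels[::-1]

print(levenshtein_align("the cat sat".split(), "the cats at".split()))
# -> ['C', 'S', 'S']; POWER would re-align the "cat sat" / "cats at" error
#    region phonetically to expose the homophonic word boundary shift.
```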
The Convex Information Bottleneck Lagrangian
Title | The Convex Information Bottleneck Lagrangian |
Authors | Borja Rodríguez Gálvez, Ragnar Thobaben, Mikael Skoglund |
Abstract | The information bottleneck (IB) problem tackles the issue of obtaining relevant compressed representations $T$ of some random variable $X$ for the task of predicting $Y$. It is defined as a constrained optimization problem which maximizes the information the representation has about the task, $I(T;Y)$, while ensuring that a certain level of compression $r$ is achieved (i.e., $I(X;T) \leq r$). For practical reasons, the problem is usually solved by maximizing the IB Lagrangian (i.e., $\mathcal{L}_{\text{IB}}(T;\beta) = I(T;Y) - \beta I(X;T)$) for many values of $\beta \in [0,1]$. Then, the curve of maximal $I(T;Y)$ for a given $I(X;T)$ is drawn and a representation with the desired predictability and compression is selected. It is known that when $Y$ is a deterministic function of $X$, the IB curve cannot be explored, and another Lagrangian has been proposed to tackle this problem: the squared IB Lagrangian, $\mathcal{L}_{\text{sq-IB}}(T;\beta_{\text{sq}})=I(T;Y)-\beta_{\text{sq}}I(X;T)^2$. In this paper, we (i) present a general family of Lagrangians which allow for the exploration of the IB curve in all scenarios; (ii) provide the exact one-to-one mapping between the Lagrange multiplier and the desired compression rate $r$ for known IB curve shapes; and (iii) show that we can approximately obtain a specific compression level with the convex IB Lagrangian for both known and unknown IB curve shapes. This eliminates the burden of solving the optimization problem for many values of the Lagrange multiplier. That is, we prove that we can solve the original constrained problem with a single optimization. |
Tasks | |
Published | 2019-11-25 |
URL | https://arxiv.org/abs/1911.11000v2 |
https://arxiv.org/pdf/1911.11000v2.pdf | |
PWC | https://paperswithcode.com/paper/the-convex-information-bottleneck-lagrangian-1 |
Repo | https://github.com/burklight/convex-IB-Lagrangian-PyTorch |
Framework | pytorch |
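
From the abstract, the proposed family plausibly generalizes the squared IB Lagrangian by replacing the square with a convex function $u$; the notation below is inferred from that description, and the exact conditions on $u$ are stated in the paper.

```latex
% Inferred from the abstract: a convex IB Lagrangian family that recovers the
% squared variant with u(x) = x^2. We assume u is strictly convex and
% monotonically increasing; see the paper for the precise statement.
\mathcal{L}_{\text{IB},u}(T;\beta_u) \;=\; I(T;Y) \;-\; \beta_u\, u\big(I(X;T)\big)
```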
Counterfactual Story Reasoning and Generation
Title | Counterfactual Story Reasoning and Generation |
Authors | Lianhui Qin, Antoine Bosselut, Ari Holtzman, Chandra Bhagavatula, Elizabeth Clark, Yejin Choi |
Abstract | Counterfactual reasoning requires predicting how alternative events, contrary to what actually happened, might have resulted in different outcomes. Despite being considered a necessary component of AI-complete systems, few resources have been developed for evaluating counterfactual reasoning in narratives. In this paper, we propose Counterfactual Story Rewriting: given an original story and an intervening counterfactual event, the task is to minimally revise the story to make it compatible with the given counterfactual event. Solving this task will require deep understanding of causal narrative chains and counterfactual invariance, and integration of such story reasoning capabilities into conditional language generation models. We present TimeTravel, a new dataset of 29,849 counterfactual rewritings, each with the original story, a counterfactual event, and human-generated revision of the original story compatible with the counterfactual event. Additionally, we include 80,115 counterfactual “branches” without a rewritten storyline to support future work on semi- or un-supervised approaches to counterfactual story rewriting. Finally, we evaluate the counterfactual rewriting capacities of several competitive baselines based on pretrained language models, and assess whether common overlap and model-based automatic metrics for text generation correlate well with human scores for counterfactual rewriting. |
Tasks | Text Generation |
Published | 2019-09-09 |
URL | https://arxiv.org/abs/1909.04076v2 |
https://arxiv.org/pdf/1909.04076v2.pdf | |
PWC | https://paperswithcode.com/paper/counterfactual-story-reasoning-and-generation |
Repo | https://github.com/qkaren/Counterfactual-StoryRW |
Framework | tf |
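
To make the task format concrete, here is a hypothetical TimeTravel-style instance; the field names and the story itself are ours, not the dataset's actual schema or contents.

```python
# Hypothetical counterfactual-rewriting instance (illustrative only; not the
# dataset's real schema or data).
example = {
    "original_story": ("Ann planted tomatoes in spring. The summer was rainy. "
                       "By August her garden overflowed with ripe tomatoes."),
    "counterfactual_event": "The summer brought a severe drought.",
    "rewritten_ending": ("By August most plants had withered, and Ann salvaged "
                         "only a handful of small tomatoes."),
}
# Task: given original_story and counterfactual_event, minimally revise the
# ending so the story stays consistent with the counterfactual.
```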
Improving Neural Story Generation by Targeted Common Sense Grounding
Title | Improving Neural Story Generation by Targeted Common Sense Grounding |
Authors | Huanru Henry Mao, Bodhisattwa Prasad Majumder, Julian McAuley, Garrison W. Cottrell |
Abstract | Stories generated with neural language models have shown promise in grammatical and stylistic consistency. However, the generated stories are still lacking in common sense reasoning, e.g., they often contain sentences deprived of world knowledge. We propose a simple multi-task learning scheme to achieve quantitatively better common sense reasoning in language models by leveraging auxiliary training signals from datasets designed to provide common sense grounding. When combined with our two-stage fine-tuning pipeline, our method achieves improved common sense reasoning and state-of-the-art perplexity on the Writing Prompts (Fan et al., 2018) story generation dataset. |
Tasks | Common Sense Reasoning, Multi-Task Learning |
Published | 2019-08-26 |
URL | https://arxiv.org/abs/1908.09451v2 |
https://arxiv.org/pdf/1908.09451v2.pdf | |
PWC | https://paperswithcode.com/paper/improving-neural-story-generation-by-targeted |
Repo | https://github.com/calclavia/story-generation |
Framework | pytorch |
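
The multi-task scheme could be sketched as combining the language-modelling loss with an auxiliary objective drawn from a common-sense dataset. Everything below is an assumption for illustration: the model methods, the auxiliary task (scoring candidate endings), and the weighting are not taken from the paper.

```python
# Hedged sketch of multi-task training with an auxiliary common-sense signal.
# `model.encode`, `model.lm_head`, and `model.score_candidates` are
# hypothetical helpers; the 0.5 weight is likewise an assumption.
import torch.nn.functional as F

def multitask_step(model, lm_batch, cs_batch, aux_weight=0.5):
    # Standard next-token language-modelling loss on story data.
    logits = model.lm_head(model.encode(lm_batch["input_ids"]))
    lm_loss = F.cross_entropy(
        logits[:, :-1].reshape(-1, logits.size(-1)),
        lm_batch["input_ids"][:, 1:].reshape(-1),
    )
    # Auxiliary loss: prefer the common-sense-compatible candidate.
    cand_scores = model.score_candidates(cs_batch["candidates"])
    cs_loss = F.cross_entropy(cand_scores, cs_batch["label"])
    return lm_loss + aux_weight * cs_loss
```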
ViSiL: Fine-grained Spatio-Temporal Video Similarity Learning
Title | ViSiL: Fine-grained Spatio-Temporal Video Similarity Learning |
Authors | Giorgos Kordopatis-Zilos, Symeon Papadopoulos, Ioannis Patras, Ioannis Kompatsiaris |
Abstract | In this paper we introduce ViSiL, a Video Similarity Learning architecture that considers fine-grained spatio-temporal relations between pairs of videos – such relations are typically lost in previous video retrieval approaches that embed the whole frame or even the whole video into a vector descriptor before the similarity estimation. By contrast, our Convolutional Neural Network (CNN)-based approach is trained to calculate video-to-video similarity from refined frame-to-frame similarity matrices, so as to consider both intra- and inter-frame relations. In the proposed method, pairwise frame similarity is estimated by applying Tensor Dot (TD) followed by Chamfer Similarity (CS) on regional CNN frame features – this avoids feature aggregation before the similarity calculation between frames. Subsequently, the similarity matrix between all video frames is fed to a four-layer CNN and then summarized using Chamfer Similarity (CS) into a video-to-video similarity score – this avoids feature aggregation before the similarity calculation between videos and captures the temporal similarity patterns between matching frame sequences. We train the proposed network using a triplet loss scheme and evaluate it on five public benchmark datasets covering four different video retrieval problems, where we demonstrate large improvements over the state of the art. The implementation of ViSiL is publicly available. |
Tasks | Video Retrieval, Video Similarity |
Published | 2019-08-20 |
URL | https://arxiv.org/abs/1908.07410v1 |
https://arxiv.org/pdf/1908.07410v1.pdf | |
PWC | https://paperswithcode.com/paper/visil-fine-grained-spatio-temporal-video |
Repo | https://github.com/MKLab-ITI/visil |
Framework | tf |
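
The frame-level computation in the abstract (Tensor Dot over region features followed by Chamfer Similarity) can be sketched compactly; shapes are illustrative, and the four-layer refinement CNN between the two steps is omitted.

```python
# Hedged sketch of ViSiL's similarity computation. Shapes are illustrative;
# the intermediate four-layer CNN that refines the matrix is omitted.
import torch

def frame_similarity(x, y):
    """x: (Nx, R, D) region features for Nx frames; y: (Ny, R, D).
    Returns the (Nx, Ny) frame-to-frame similarity matrix."""
    sims = torch.einsum('ird,jsd->ijrs', x, y)   # Tensor Dot: all region pairs
    # Chamfer Similarity: best-matching region, averaged over regions.
    return sims.max(dim=3).values.mean(dim=2)

def video_similarity(sim_matrix):
    """Final Chamfer Similarity over the (refined) frame similarity matrix."""
    return sim_matrix.max(dim=1).values.mean()
```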
Imbalance-XGBoost: Leveraging Weighted and Focal Losses for Binary Label-Imbalanced Classification with XGBoost
Title | Imbalance-XGBoost: Leveraging Weighted and Focal Losses for Binary Label-Imbalanced Classification with XGBoost |
Authors | Chen Wang, Chengyuan Deng, Suzhen Wang |
Abstract | The paper presents Imbalance-XGBoost, a Python package that combines the powerful XGBoost software with weighted and focal losses to tackle binary label-imbalanced classification tasks. Though a small-scale program in terms of size, the package is, to the best of the authors’ knowledge, the first of its kind which provides an integrated implementation of the two losses on XGBoost and brings a general-purpose extension of XGBoost for label-imbalanced scenarios. In this paper, the design and usage of the package are described with exemplar code listings, and its convenience for integration into Python-driven machine learning projects is illustrated. Furthermore, as the first- and second-order derivatives of the loss functions are essential for the implementations, the algebraic derivation is discussed and can be deemed a separate algorithmic contribution. The performance of the algorithms implemented in the package is empirically evaluated on a Parkinson’s disease classification data set, and multiple state-of-the-art results have been observed. Given the scalable nature of XGBoost, the package has great potential to be applied to real-life binary classification tasks, which are usually large-scale and label-imbalanced. |
Tasks | |
Published | 2019-08-05 |
URL | https://arxiv.org/abs/1908.01672v1 |
https://arxiv.org/pdf/1908.01672v1.pdf | |
PWC | https://paperswithcode.com/paper/imbalance-xgboost-leveraging-weighted-and |
Repo | https://github.com/jhwjhw0123/Imbalance-XGBoost |
Framework | none |
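
To illustrate why those first- and second-order derivatives matter: stock XGBoost accepts a custom objective as a (gradient, Hessian) pair. Below is a generic weighted binary cross-entropy written that way; it shows the mechanism only and is not the Imbalance-XGBoost package's own API (see the repo for that).

```python
# Hedged sketch: weighted binary cross-entropy as a custom XGBoost objective.
# Generic XGBoost usage, not the Imbalance-XGBoost package interface.
import numpy as np
import xgboost as xgb

def weighted_logloss(alpha):
    """Positives are weighted by `alpha`. Returns the (gradient, hessian)
    of the loss w.r.t. the raw margin, as XGBoost's `obj` expects."""
    def objective(preds, dtrain):
        y = dtrain.get_label()
        p = 1.0 / (1.0 + np.exp(-preds))                # sigmoid of margin
        grad = (1 - y) * p - alpha * y * (1 - p)
        hess = p * (1 - p) * ((1 - y) + alpha * y)
        return grad, hess
    return objective

# Usage sketch: upweight a rare positive class by 4x.
# dtrain = xgb.DMatrix(X, label=y)
# booster = xgb.train({"max_depth": 4}, dtrain,
#                     num_boost_round=100, obj=weighted_logloss(4.0))
```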
Conditional Generative ConvNets for Exemplar-based Texture Synthesis
Title | Conditional Generative ConvNets for Exemplar-based Texture Synthesis |
Authors | Zi-Ming Wang, Meng-Han Li, Gui-Song Xia |
Abstract | The goal of exemplar-based texture synthesis is to generate texture images that are visually similar to a given exemplar. Recently, promising results have been reported by methods relying on convolutional neural networks (ConvNets) pretrained on large-scale image datasets. However, these methods have difficulties in synthesizing image textures with non-local structures and in extending to dynamic or sound textures. In this paper, we present a conditional generative ConvNet (cgCNN) model which combines deep statistics and the probabilistic framework of the generative ConvNet (gCNN) model. Given a texture exemplar, the cgCNN model defines a conditional distribution using deep statistics of a ConvNet, and synthesizes new textures by sampling from this conditional distribution. In contrast to previous deep texture models, the proposed cgCNN does not rely on pre-trained ConvNets but instead learns the ConvNet weights for each input exemplar. As a result, the cgCNN model can synthesize high-quality dynamic, sound, and image textures in a unified manner. We also explore the theoretical connections between our model and other texture models. Further investigations show that the cgCNN model can be easily generalized to texture expansion and inpainting. Extensive experiments demonstrate that our model achieves results better than, or at least comparable to, state-of-the-art methods. |
Tasks | Texture Synthesis |
Published | 2019-12-17 |
URL | https://arxiv.org/abs/1912.07971v1 |
https://arxiv.org/pdf/1912.07971v1.pdf | |
PWC | https://paperswithcode.com/paper/conditional-generative-convnets-for-exemplar |
Repo | https://github.com/wzm2256/cgCNN |
Framework | tf |
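
The abstract leaves the "deep statistics" unspecified; in neural texture models they are often Gram matrices of ConvNet feature maps (Gatys et al. style). The sketch below assumes that choice and a hypothetical `net.feature_maps` helper, so treat it as generic, not cgCNN's definition.

```python
# Hedged sketch of Gram-matrix "deep statistics" for texture matching.
# Whether cgCNN uses exactly these statistics is an assumption here;
# `net.feature_maps` is a hypothetical helper.
import torch

def gram_matrix(features):
    """features: (C, H, W) feature maps -> (C, C) channel co-occurrence stats."""
    c, h, w = features.shape
    f = features.reshape(c, h * w)
    return (f @ f.T) / (h * w)

def texture_statistics_loss(net, exemplar, sample):
    """Match the sample's deep statistics to the exemplar's, layer by layer."""
    loss = 0.0
    for fe, fs in zip(net.feature_maps(exemplar), net.feature_maps(sample)):
        loss = loss + (gram_matrix(fe) - gram_matrix(fs)).pow(2).mean()
    return loss  # minimized over `sample`, or used as an energy for sampling
```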
A Statistical Investigation of Long Memory in Language and Music
Title | A Statistical Investigation of Long Memory in Language and Music |
Authors | Alexander Greaves-Tunnell, Zaid Harchaoui |
Abstract | Representation and learning of long-range dependencies is a central challenge confronted in modern applications of machine learning to sequence data. Yet despite the prominence of this issue, the basic problem of measuring long-range dependence, either in a given data source or as represented in a trained deep model, remains largely limited to heuristic tools. We contribute a statistical framework for investigating long-range dependence in current applications of deep sequence modeling, drawing on the well-developed theory of long memory stochastic processes. This framework yields testable implications concerning the relationship between long memory in real-world data and its learned representation in a deep learning architecture, which are explored through a semiparametric framework adapted to the high-dimensional setting. |
Tasks | Language Modelling, Time Series |
Published | 2019-04-08 |
URL | https://arxiv.org/abs/1904.03834v2 |
https://arxiv.org/pdf/1904.03834v2.pdf | |
PWC | https://paperswithcode.com/paper/a-statistical-investigation-of-long-memory-in |
Repo | https://github.com/alecgt/RNN_long_memory |
Framework | pytorch |
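
For context on what "long memory" means operationally, here is the classical rescaled-range (R/S) estimate of the Hurst exponent. This heuristic is exactly the kind of tool the paper moves beyond (it develops a semiparametric estimator instead), so it is shown only to make the concept concrete.

```python
# Classical rescaled-range (R/S) Hurst estimate; H > 0.5 suggests long-range
# dependence. Shown for context only; the paper's semiparametric estimator
# is different.
import numpy as np

def rescaled_range_hurst(x, min_chunk=16):
    x = np.asarray(x, dtype=float)
    sizes, rs_values = [], []
    n = min_chunk
    while n <= len(x) // 2:
        rs = []
        for start in range(0, len(x) - n + 1, n):
            chunk = x[start:start + n]
            dev = np.cumsum(chunk - chunk.mean())   # cumulative deviations
            s = chunk.std()
            if s > 0:
                rs.append((dev.max() - dev.min()) / s)
        if rs:
            sizes.append(n)
            rs_values.append(np.mean(rs))
        n *= 2
    # The slope of log(R/S) against log(n) estimates H.
    return np.polyfit(np.log(sizes), np.log(rs_values), 1)[0]

print(rescaled_range_hurst(np.random.randn(4096)))  # ~0.5 for white noise
```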
OTEANN: Estimating the Transparency of Orthographies with an Artificial Neural Network
Title | OTEANN: Estimating the Transparency of Orthographies with an Artificial Neural Network |
Authors | Xavier Marjou |
Abstract | To transcribe spoken language to written medium, most alphabets enable an unambiguous sound-to-letter rule. However, some writing systems have distanced themselves from this simple concept and little work exists on measuring such distance. In this study, we use an Artificial Neural Network (ANN) model to evaluate the transparency between written words and their pronunciation, hence its name Orthographic Transparency Estimation with an ANN (OTEANN). Based on datasets derived from Wikimedia dictionaries, we trained and tested this model to score the percentage of false predictions in phoneme-to-grapheme and grapheme-to-phoneme translation tasks. The scores obtained on 15 orthographies were in line with the estimations of other studies. Interestingly, the model also provided insight into typical mistakes made by learners who only consider the phonemic rule in reading and writing. |
Tasks | |
Published | 2019-12-31 |
URL | https://arxiv.org/abs/1912.13321v1 |
https://arxiv.org/pdf/1912.13321v1.pdf | |
PWC | https://paperswithcode.com/paper/oteann-estimating-the-transparency-of |
Repo | https://github.com/marxav/oteann |
Framework | none |
Trick or TReAT: Thematic Reinforcement for Artistic Typography
Title | Trick or TReAT: Thematic Reinforcement for Artistic Typography |
Authors | Purva Tendulkar, Kalpesh Krishna, Ramprasaath R. Selvaraju, Devi Parikh |
Abstract | An approach to make text visually appealing and memorable is semantic reinforcement - the use of visual cues alluding to the context or theme in which the word is being used to reinforce the message (e.g., Google Doodles). We present a computational approach for semantic reinforcement called TReAT - Thematic Reinforcement for Artistic Typography. Given an input word (e.g. exam) and a theme (e.g. education), the individual letters of the input word are replaced by cliparts relevant to the theme which visually resemble the letters - adding creative context to the potentially boring input word. We use an unsupervised approach to learn a latent space to represent letters and cliparts and compute similarities between the two. Human studies show that participants can reliably recognize the word as well as the theme in our outputs (TReATs) and find them more creative compared to meaningful baselines. |
Tasks | |
Published | 2019-03-19 |
URL | http://arxiv.org/abs/1903.07820v1 |
http://arxiv.org/pdf/1903.07820v1.pdf | |
PWC | https://paperswithcode.com/paper/trick-or-treat-thematic-reinforcement-for |
Repo | https://github.com/purvaten/treat |
Framework | pytorch |
Parametric Resynthesis with neural vocoders
Title | Parametric Resynthesis with neural vocoders |
Authors | Soumi Maiti, Michael I Mandel |
Abstract | Noise suppression systems generally produce output speech with compromised quality. We propose to utilize the high quality speech generation capability of neural vocoders for noise suppression. We use a neural network to predict clean mel-spectrogram features from noisy speech and then compare two neural vocoders, WaveNet and WaveGlow, for synthesizing clean speech from the predicted mel spectrogram. Both WaveNet and WaveGlow achieve better subjective and objective quality scores than the source separation model Chimera++. Further, WaveNet and WaveGlow also achieve significantly better subjective quality ratings than the oracle Wiener mask. Moreover, we observe that between WaveNet and WaveGlow, WaveNet achieves the best subjective quality scores, although at the cost of much slower waveform generation. |
Tasks | |
Published | 2019-06-16 |
URL | https://arxiv.org/abs/1906.06762v2 |
https://arxiv.org/pdf/1906.06762v2.pdf | |
PWC | https://paperswithcode.com/paper/parametric-resynthesis-with-neural-vocoders |
Repo | https://github.com/r9y9/wavenet_vocoder |
Framework | pytorch |
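
The first stage of the pipeline (predicting clean mel-spectrogram frames from noisy speech) could look roughly like the sketch below; the architecture and loss are assumptions, and the vocoder stage (WaveNet/WaveGlow) is not shown.

```python
# Hedged sketch of the mel prediction stage; architecture and loss are
# illustrative assumptions. The neural vocoder stage is omitted.
import torch.nn as nn
import torch.nn.functional as F

class MelPredictor(nn.Module):
    def __init__(self, n_mels=80, hidden=256):
        super().__init__()
        self.rnn = nn.LSTM(n_mels, hidden, num_layers=2,
                           batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hidden, n_mels)

    def forward(self, noisy_mel):          # (batch, frames, n_mels)
        h, _ = self.rnn(noisy_mel)
        return self.out(h)                 # predicted clean mel frames

# Training sketch: loss = F.mse_loss(model(noisy_mel), clean_mel)
# The predicted mel is then fed to a neural vocoder to synthesize the waveform.
```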
Stochastic Shared Embeddings: Data-driven Regularization of Embedding Layers
Title | Stochastic Shared Embeddings: Data-driven Regularization of Embedding Layers |
Authors | Liwei Wu, Shuqing Li, Cho-Jui Hsieh, James Sharpnack |
Abstract | In deep neural nets, lower-level embedding layers account for a large portion of the total number of parameters. Tikhonov regularization, graph-based regularization, and hard parameter sharing are approaches that introduce explicit biases into training in the hope of reducing statistical complexity. Alternatively, we propose stochastically shared embeddings (SSE), a data-driven approach to regularizing embedding layers which stochastically transitions between embeddings during stochastic gradient descent (SGD). Because SSE integrates seamlessly with existing SGD algorithms, it can be used with only minor modifications when training large-scale neural networks. We develop two versions of SSE: SSE-Graph, which uses knowledge graphs of embeddings, and SSE-SE, which uses no prior information. We provide theoretical guarantees for our method and show its empirical effectiveness on 6 distinct tasks, from simple neural networks with one hidden layer in recommender systems to the Transformer and BERT in natural language tasks. We find that when used along with widely used regularization methods such as weight decay and dropout, our proposed SSE can further reduce overfitting, which often leads to more favorable generalization results. |
Tasks | Knowledge Graphs, Recommendation Systems |
Published | 2019-05-25 |
URL | https://arxiv.org/abs/1905.10630v2 |
https://arxiv.org/pdf/1905.10630v2.pdf | |
PWC | https://paperswithcode.com/paper/stochastic-shared-embeddings-data-driven |
Repo | https://github.com/wuliwei9278/SSE-PT |
Framework | tf |
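
SSE-SE, the prior-free variant, is simple enough to sketch directly: during training, each embedding lookup is replaced by a uniformly random index with a small probability. The wrapper design and replacement probability below are illustrative assumptions.

```python
# Hedged sketch of SSE-SE: stochastic index replacement in embedding lookups
# during training. Wrapper design and replacement probability are assumptions.
import torch
import torch.nn as nn

class SSEEmbedding(nn.Module):
    def __init__(self, num_embeddings, dim, p_replace=0.01):
        super().__init__()
        self.emb = nn.Embedding(num_embeddings, dim)
        self.p_replace = p_replace

    def forward(self, idx):
        if self.training and self.p_replace > 0:
            # With probability p_replace, swap each index for a random one.
            mask = torch.rand(idx.shape, device=idx.device) < self.p_replace
            random_idx = torch.randint_like(idx, self.emb.num_embeddings)
            idx = torch.where(mask, random_idx, idx)
        return self.emb(idx)
```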
Content based News Recommendation via Shortest Entity Distance over Knowledge Graphs
Title | Content based News Recommendation via Shortest Entity Distance over Knowledge Graphs |
Authors | Kevin Joseph, Hui Jiang |
Abstract | Content-based news recommendation systems need to recommend news articles based on the topics and content of articles, without using user-specific information. Many news articles describe the occurrence of specific events and named entities, including people, places, or objects. In this paper, we propose a graph traversal algorithm as well as a novel weighting scheme for cold-start content-based news recommendation utilizing these named entities. Seeking to create a higher degree of user-specific relevance, our algorithm computes the shortest distance between named entities, across news articles, over a large knowledge graph. Moreover, we have created a new human-annotated data set for evaluating content-based news recommendation systems. Experimental results show our method is suitable for tackling the hard cold-start problem, and it produces stronger Pearson correlation with human similarity scores than other cold-start methods. Our method is also complementary to conventional cold-start recommendation methods, and combining them may yield significant performance gains. The dataset, CNRec, is available at: https://github.com/kevinj22/CNRec |
Tasks | Knowledge Graphs, Recommendation Systems |
Published | 2019-05-24 |
URL | https://arxiv.org/abs/1905.13132v1 |
https://arxiv.org/pdf/1905.13132v1.pdf | |
PWC | https://paperswithcode.com/paper/190513132 |
Repo | https://github.com/kevinj22/CNRec |
Framework | none |
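
The core scoring idea (shortest entity distance over a knowledge graph) can be sketched with a generic graph library; the aggregation over entity pairs below is illustrative, not the paper's exact weighting scheme.

```python
# Hedged sketch: aggregate shortest-path distances between two articles'
# named-entity sets over a knowledge graph. The plain-average aggregation
# is an assumption; the paper proposes its own weighting scheme.
import networkx as nx

def entity_distance(kg, entities_a, entities_b):
    """Average shortest-path length between two articles' entity sets."""
    dists = []
    for a in entities_a:
        for b in entities_b:
            try:
                dists.append(nx.shortest_path_length(kg, a, b))
            except (nx.NodeNotFound, nx.NetworkXNoPath):
                pass  # unknown or unreachable entities contribute nothing
    return sum(dists) / len(dists) if dists else float("inf")

# Usage sketch: recommend candidate articles that minimize entity_distance
# to the article the user just read.
```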