Paper Group AWR 262
Image search using multilingual texts: a cross-modal learning approach between image and text
Title | Image search using multilingual texts: a cross-modal learning approach between image and text |
Authors | Maxime Portaz, Hicham Randrianarivo, Adrien Nivaggioli, Estelle Maudet, Christophe Servan, Sylvain Peyronnet |
Abstract | Multilingual (or cross-lingual) embeddings represent several languages in a unique vector space. Using a common embedding space enables a shared semantics between words from different languages. In this paper, we propose to embed images and texts into a single distributional vector space, enabling images to be searched using text queries that express information needs related to the (visual) content of images, as well as through image similarity. Our framework forces the representation of an image to be similar to the representation of the text that describes it. Moreover, by using multilingual embeddings we ensure that words from two different languages have close descriptors and are thus attached to similar images. We provide experimental evidence of the efficiency of our approach by evaluating it on two datasets: Common Objects in COntext (COCO) [19] and Multi30K [7]. |
Tasks | Image Retrieval |
Published | 2019-03-27 |
URL | https://arxiv.org/abs/1903.11299v3 |
https://arxiv.org/pdf/1903.11299v3.pdf | |
PWC | https://paperswithcode.com/paper/image-search-using-multilingual-texts-a-cross |
Repo | https://github.com/QwantResearch/text-image-similarity |
Framework | pytorch |
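
A minimal sketch of the shared-embedding idea the abstract above describes: an image encoder and a text encoder project into one common space, and a triplet ranking loss over in-batch negatives pulls each image toward its caption. The encoder choices, dimensions, and margin here are illustrative assumptions, not the paper's exact architecture.

```python
# Hedged sketch: joint image-text embedding trained with a triplet ranking
# loss. Encoders, dimensions, and margin are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class JointEmbedding(nn.Module):
    def __init__(self, img_dim=2048, txt_dim=300, emb_dim=512):
        super().__init__()
        self.img_proj = nn.Linear(img_dim, emb_dim)                 # projects CNN image features
        self.txt_enc = nn.GRU(txt_dim, emb_dim, batch_first=True)   # encodes (multilingual) word embeddings

    def forward(self, img_feats, txt_embs):
        v = F.normalize(self.img_proj(img_feats), dim=-1)           # unit-norm image vectors
        _, h = self.txt_enc(txt_embs)                               # last hidden state as sentence vector
        t = F.normalize(h.squeeze(0), dim=-1)
        return v, t

def triplet_ranking_loss(v, t, margin=0.2):
    """Matching (image, caption) pairs must outscore every in-batch
    mismatched pair by at least `margin` (cosine similarity)."""
    scores = v @ t.T                                 # (B, B) similarity matrix
    pos = scores.diag().unsqueeze(1)
    cost_img = (margin + scores - pos).clamp(min=0)  # captions as negatives per image
    cost_txt = (margin + scores - pos.T).clamp(min=0)  # images as negatives per caption
    mask = torch.eye(scores.size(0), dtype=torch.bool, device=scores.device)
    return (cost_img.masked_fill(mask, 0).mean()
            + cost_txt.masked_fill(mask, 0).mean())
```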
Show, Translate and Tell
Title | Show, Translate and Tell |
Authors | Dheeraj Peri, Shagan Sah, Raymond Ptucha |
Abstract | Humans have an incredible ability to process and understand information from multiple sources such as images, video, text, and speech. The recent success of deep neural networks has enabled us to develop algorithms which give machines the ability to understand and interpret this information. There is a need both to broaden their applicability and to develop methods which correlate visual information with semantic content. We propose a unified model which jointly trains on images and captions, and learns to generate new captions given either an image or a caption query. We evaluate our model on three different tasks, namely cross-modal retrieval, image captioning, and sentence paraphrasing. Our model gains insight into cross-modal vector embeddings, generalizes well on multiple tasks, and is competitive with state-of-the-art methods on retrieval. |
Tasks | Cross-Modal Retrieval, Image Captioning |
Published | 2019-03-14 |
URL | http://arxiv.org/abs/1903.06275v1 |
http://arxiv.org/pdf/1903.06275v1.pdf | |
PWC | https://paperswithcode.com/paper/show-translate-and-tell |
Repo | https://github.com/peri044/STT |
Framework | tf |
Phonetically-Oriented Word Error Alignment for Speech Recognition Error Analysis in Speech Translation
Title | Phonetically-Oriented Word Error Alignment for Speech Recognition Error Analysis in Speech Translation |
Authors | Nicholas Ruiz, Marcello Federico |
Abstract | We propose a variation to the commonly used Word Error Rate (WER) metric for speech recognition evaluation which incorporates the alignment of phonemes, in the absence of time boundary information. After computing the Levenshtein alignment on words in the reference and hypothesis transcripts, spans of adjacent errors are converted into phonemes with word and syllable boundaries and a phonetic Levenshtein alignment is performed. The aligned phonemes are recombined into aligned words that adjust the word alignment labels in each error region. We demonstrate that our Phonetically-Oriented Word Error Rate (POWER) yields similar scores to WER with the added advantages of better word alignments and the ability to capture one-to-many word alignments corresponding to homophonic errors in speech recognition hypotheses. These improved alignments allow us to better trace the impact of Levenshtein error types on downstream tasks such as speech translation. |
Tasks | Speech Recognition, Word Alignment |
Published | 2019-04-24 |
URL | http://arxiv.org/abs/1904.11024v1 |
http://arxiv.org/pdf/1904.11024v1.pdf | |
PWC | https://paperswithcode.com/paper/phonetically-oriented-word-error-alignment |
Repo | https://github.com/NickRuiz/power-asr |
Framework | none |
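
The first stage the abstract describes is a plain word-level Levenshtein alignment between the reference and hypothesis transcripts. A minimal sketch of that stage follows; POWER's actual contribution, re-aligning the error spans at the phoneme level, is omitted, and the function name is ours.

```python
# Hedged sketch of POWER's first stage: word-level Levenshtein alignment.
# The phonetic re-alignment of error spans is omitted; names are illustrative.
def levenshtein_align(ref, hyp):
    """Return alignment labels: C(orrect), S(ubstitution), I(nsertion), D(eletion)."""
    m, n = len(ref), len(hyp)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            d[i][j] = min(d[i-1][j-1] + (ref[i-1] != hyp[j-1]),  # substitution/match
                          d[i-1][j] + 1,                          # deletion
                          d[i][j-1] + 1)                          # insertion
    labels, i, j = [], m, n           # backtrace to recover the label sequence
    while i > 0 or j > 0:
        if i > 0 and j > 0 and d[i][j] == d[i-1][j-1] + (ref[i-1] != hyp[j-1]):
            labels.append('C' if ref[i-1] == hyp[j-1] else 'S')
            i, j = i - 1, j - 1
        elif i > 0 and d[i][j] == d[i-1][j] + 1:
            labels.append('D'); i -= 1
        else:
            labels.append('I'); j -= 1
    return labels[::-1]

print(levenshtein_align("the cat sat".split(), "the cats at".split()))
# -> ['C', 'S', 'S']; POWER would re-align the "cat sat" / "cats at" error
#    region phonetically to expose the homophonic word boundary shift.
```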
The Convex Information Bottleneck Lagrangian
Title | The Convex Information Bottleneck Lagrangian |
Authors | Borja Rodríguez Gálvez, Ragnar Thobaben, Mikael Skoglund |
Abstract | The information bottleneck (IB) problem tackles the issue of obtaining relevant compressed representations $T$ of some random variable $X$ for the task of predicting $Y$. It is defined as a constrained optimization problem which maximizes the information the representation has about the task, $I(T;Y)$, while ensuring that a certain level of compression $r$ is achieved (i.e., $I(X;T) \leq r$). For practical reasons, the problem is usually solved by maximizing the IB Lagrangian (i.e., $\mathcal{L}_{\text{IB}}(T;\beta) = I(T;Y) - \beta I(X;T)$) for many values of $\beta \in [0,1]$. Then, the curve of maximal $I(T;Y)$ for a given $I(X;T)$ is drawn and a representation with the desired predictability and compression is selected. It is known that when $Y$ is a deterministic function of $X$, the IB curve cannot be explored, and another Lagrangian has been proposed to tackle this problem: the squared IB Lagrangian, $\mathcal{L}_{\text{sq-IB}}(T;\beta_{\text{sq}})=I(T;Y)-\beta_{\text{sq}}I(X;T)^2$. In this paper, we (i) present a general family of Lagrangians which allow for the exploration of the IB curve in all scenarios; (ii) provide the exact one-to-one mapping between the Lagrange multiplier and the desired compression rate $r$ for known IB curve shapes; and (iii) show that we can approximately obtain a specific compression level with the convex IB Lagrangian for both known and unknown IB curve shapes. This eliminates the burden of solving the optimization problem for many values of the Lagrange multiplier. That is, we prove that we can solve the original constrained problem with a single optimization. |
Tasks | |
Published | 2019-11-25 |
URL | https://arxiv.org/abs/1911.11000v2 |
https://arxiv.org/pdf/1911.11000v2.pdf | |
PWC | https://paperswithcode.com/paper/the-convex-information-bottleneck-lagrangian-1 |
Repo | https://github.com/burklight/convex-IB-Lagrangian-PyTorch |
Framework | pytorch |
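
From the abstract, the proposed family plausibly generalizes the squared IB Lagrangian by replacing the square with a convex function $u$; the notation below is inferred from that description, and the exact conditions on $u$ are stated in the paper.

```latex
% Inferred from the abstract: a convex IB Lagrangian family that recovers the
% squared variant with u(x) = x^2. We assume u is strictly convex and
% monotonically increasing; see the paper for the precise statement.
\mathcal{L}_{\text{IB},u}(T;\beta_u) \;=\; I(T;Y) \;-\; \beta_u\, u\big(I(X;T)\big)
```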
Counterfactual Story Reasoning and Generation
Title | Counterfactual Story Reasoning and Generation |
Authors | Lianhui Qin, Antoine Bosselut, Ari Holtzman, Chandra Bhagavatula, Elizabeth Clark, Yejin Choi |
Abstract | Counterfactual reasoning requires predicting how alternative events, contrary to what actually happened, might have resulted in different outcomes. Despite being considered a necessary component of AI-complete systems, few resources have been developed for evaluating counterfactual reasoning in narratives. In this paper, we propose Counterfactual Story Rewriting: given an original story and an intervening counterfactual event, the task is to minimally revise the story to make it compatible with the given counterfactual event. Solving this task will require deep understanding of causal narrative chains and counterfactual invariance, and integration of such story reasoning capabilities into conditional language generation models. We present TimeTravel, a new dataset of 29,849 counterfactual rewritings, each with the original story, a counterfactual event, and human-generated revision of the original story compatible with the counterfactual event. Additionally, we include 80,115 counterfactual “branches” without a rewritten storyline to support future work on semi- or un-supervised approaches to counterfactual story rewriting. Finally, we evaluate the counterfactual rewriting capacities of several competitive baselines based on pretrained language models, and assess whether common overlap and model-based automatic metrics for text generation correlate well with human scores for counterfactual rewriting. |
Tasks | Text Generation |
Published | 2019-09-09 |
URL | https://arxiv.org/abs/1909.04076v2 |
https://arxiv.org/pdf/1909.04076v2.pdf | |
PWC | https://paperswithcode.com/paper/counterfactual-story-reasoning-and-generation |
Repo | https://github.com/qkaren/Counterfactual-StoryRW |
Framework | tf |
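
To make the task format concrete, here is a hypothetical TimeTravel-style instance; the field names and the story itself are ours, not the dataset's actual schema or contents.

```python
# Hypothetical counterfactual-rewriting instance (illustrative only; not the
# dataset's real schema or data).
example = {
    "original_story": ("Ann planted tomatoes in spring. The summer was rainy. "
                       "By August her garden overflowed with ripe tomatoes."),
    "counterfactual_event": "The summer brought a severe drought.",
    "rewritten_ending": ("By August most plants had withered, and Ann salvaged "
                         "only a handful of small tomatoes."),
}
# Task: given original_story and counterfactual_event, minimally revise the
# ending so the story stays consistent with the counterfactual.
```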
Improving Neural Story Generation by Targeted Common Sense Grounding
Title | Improving Neural Story Generation by Targeted Common Sense Grounding |
Authors | Huanru Henry Mao, Bodhisattwa Prasad Majumder, Julian McAuley, Garrison W. Cottrell |
Abstract | Stories generated with neural language models have shown promise in grammatical and stylistic consistency. However, the generated stories are still lacking in common sense reasoning, e.g., they often contain sentences deprived of world knowledge. We propose a simple multi-task learning scheme to achieve quantitatively better common sense reasoning in language models by leveraging auxiliary training signals from datasets designed to provide common sense grounding. When combined with our two-stage fine-tuning pipeline, our method achieves improved common sense reasoning and state-of-the-art perplexity on the Writing Prompts (Fan et al., 2018) story generation dataset. |
Tasks | Common Sense Reasoning, Multi-Task Learning |
Published | 2019-08-26 |
URL | https://arxiv.org/abs/1908.09451v2 |
https://arxiv.org/pdf/1908.09451v2.pdf | |
PWC | https://paperswithcode.com/paper/improving-neural-story-generation-by-targeted |
Repo | https://github.com/calclavia/story-generation |
Framework | pytorch |
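
The multi-task scheme could be sketched as combining the language-modelling loss with an auxiliary objective drawn from a common-sense dataset. Everything below is an assumption for illustration: the model methods, the auxiliary task (scoring candidate endings), and the weighting are not taken from the paper.

```python
# Hedged sketch of multi-task training with an auxiliary common-sense signal.
# `model.encode`, `model.lm_head`, and `model.score_candidates` are
# hypothetical helpers; the 0.5 weight is likewise an assumption.
import torch.nn.functional as F

def multitask_step(model, lm_batch, cs_batch, aux_weight=0.5):
    # Standard next-token language-modelling loss on story data.
    logits = model.lm_head(model.encode(lm_batch["input_ids"]))
    lm_loss = F.cross_entropy(
        logits[:, :-1].reshape(-1, logits.size(-1)),
        lm_batch["input_ids"][:, 1:].reshape(-1),
    )
    # Auxiliary loss: prefer the common-sense-compatible candidate.
    cand_scores = model.score_candidates(cs_batch["candidates"])
    cs_loss = F.cross_entropy(cand_scores, cs_batch["label"])
    return lm_loss + aux_weight * cs_loss
```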
ViSiL: Fine-grained Spatio-Temporal Video Similarity Learning
Title | ViSiL: Fine-grained Spatio-Temporal Video Similarity Learning |
Authors | Giorgos Kordopatis-Zilos, Symeon Papadopoulos, Ioannis Patras, Ioannis Kompatsiaris |
Abstract | In this paper we introduce ViSiL, a Video Similarity Learning architecture that considers fine-grained spatio-temporal relations between pairs of videos – such relations are typically lost in previous video retrieval approaches that embed the whole frame or even the whole video into a vector descriptor before the similarity estimation. By contrast, our Convolutional Neural Network (CNN)-based approach is trained to calculate video-to-video similarity from refined frame-to-frame similarity matrices, so as to consider both intra- and inter-frame relations. In the proposed method, pairwise frame similarity is estimated by applying Tensor Dot (TD) followed by Chamfer Similarity (CS) on regional CNN frame features – this avoids feature aggregation before the similarity calculation between frames. Subsequently, the similarity matrix between all video frames is fed to a four-layer CNN and then summarized using Chamfer Similarity (CS) into a video-to-video similarity score – this avoids feature aggregation before the similarity calculation between videos and captures the temporal similarity patterns between matching frame sequences. We train the proposed network using a triplet loss scheme and evaluate it on five public benchmark datasets covering four different video retrieval problems, where we demonstrate large improvements over the state of the art. The implementation of ViSiL is publicly available. |
Tasks | Video Retrieval, Video Similarity |
Published | 2019-08-20 |
URL | https://arxiv.org/abs/1908.07410v1 |
https://arxiv.org/pdf/1908.07410v1.pdf | |
PWC | https://paperswithcode.com/paper/visil-fine-grained-spatio-temporal-video |
Repo | https://github.com/MKLab-ITI/visil |
Framework | tf |
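
The frame-level computation in the abstract (Tensor Dot over region features followed by Chamfer Similarity) can be sketched compactly; shapes are illustrative, and the four-layer refinement CNN between the two steps is omitted.

```python
# Hedged sketch of ViSiL's similarity computation. Shapes are illustrative;
# the intermediate four-layer CNN that refines the matrix is omitted.
import torch

def frame_similarity(x, y):
    """x: (Nx, R, D) region features for Nx frames; y: (Ny, R, D).
    Returns the (Nx, Ny) frame-to-frame similarity matrix."""
    sims = torch.einsum('ird,jsd->ijrs', x, y)   # Tensor Dot: all region pairs
    # Chamfer Similarity: best-matching region, averaged over regions.
    return sims.max(dim=3).values.mean(dim=2)

def video_similarity(sim_matrix):
    """Final Chamfer Similarity over the (refined) frame similarity matrix."""
    return sim_matrix.max(dim=1).values.mean()
```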
Imbalance-XGBoost: Leveraging Weighted and Focal Losses for Binary Label-Imbalanced Classification with XGBoost
Title | Imbalance-XGBoost: Leveraging Weighted and Focal Losses for Binary Label-Imbalanced Classification with XGBoost |
Authors | Chen Wang, Chengyuan Deng, Suzhen Wang |
Abstract | The paper presents Imbalance-XGBoost, a Python package that combines the powerful XGBoost software with weighted and focal losses to tackle binary label-imbalanced classification tasks. Though a small-scale program in terms of size, the package is, to the best of the authors’ knowledge, the first of its kind which provides an integrated implementation of the two losses on XGBoost and brings a general-purpose extension of XGBoost for label-imbalanced scenarios. In this paper, the design and usage of the package are described with exemplar code listings, and its convenience for integration into Python-driven machine learning projects is illustrated. Furthermore, as the first- and second-order derivatives of the loss functions are essential for the implementations, the algebraic derivation is discussed and can be deemed a separate algorithmic contribution. The performance of the algorithms implemented in the package is empirically evaluated on a Parkinson’s disease classification data set, and multiple state-of-the-art results have been observed. Given the scalable nature of XGBoost, the package has great potential to be applied to real-life binary classification tasks, which are usually large-scale and label-imbalanced. |
Tasks | |
Published | 2019-08-05 |
URL | https://arxiv.org/abs/1908.01672v1 |
https://arxiv.org/pdf/1908.01672v1.pdf | |
PWC | https://paperswithcode.com/paper/imbalance-xgboost-leveraging-weighted-and |
Repo | https://github.com/jhwjhw0123/Imbalance-XGBoost |
Framework | none |
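
To illustrate why those first- and second-order derivatives matter: stock XGBoost accepts a custom objective as a (gradient, Hessian) pair. Below is a generic weighted binary cross-entropy written that way; it shows the mechanism only and is not the Imbalance-XGBoost package's own API (see the repo for that).

```python
# Hedged sketch: weighted binary cross-entropy as a custom XGBoost objective.
# Generic XGBoost usage, not the Imbalance-XGBoost package interface.
import numpy as np
import xgboost as xgb

def weighted_logloss(alpha):
    """Positives are weighted by `alpha`. Returns the (gradient, hessian)
    of the loss w.r.t. the raw margin, as XGBoost's `obj` expects."""
    def objective(preds, dtrain):
        y = dtrain.get_label()
        p = 1.0 / (1.0 + np.exp(-preds))                # sigmoid of margin
        grad = (1 - y) * p - alpha * y * (1 - p)
        hess = p * (1 - p) * ((1 - y) + alpha * y)
        return grad, hess
    return objective

# Usage sketch: upweight a rare positive class by 4x.
# dtrain = xgb.DMatrix(X, label=y)
# booster = xgb.train({"max_depth": 4}, dtrain,
#                     num_boost_round=100, obj=weighted_logloss(4.0))
```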
Conditional Generative ConvNets for Exemplar-based Texture Synthesis
Title | Conditional Generative ConvNets for Exemplar-based Texture Synthesis |
Authors | Zi-Ming Wang, Meng-Han Li, Gui-Song Xia |
Abstract | The goal of exemplar-based texture synthesis is to generate texture images that are visually similar to a given exemplar. Recently, promising results have been reported by methods relying on convolutional neural networks (ConvNets) pretrained on large-scale image datasets. However, these methods have difficulties in synthesizing image textures with non-local structures and in extending to dynamic or sound textures. In this paper, we present a conditional generative ConvNet (cgCNN) model which combines deep statistics and the probabilistic framework of the generative ConvNet (gCNN) model. Given a texture exemplar, the cgCNN model defines a conditional distribution using deep statistics of a ConvNet, and synthesizes new textures by sampling from this conditional distribution. In contrast to previous deep texture models, the proposed cgCNN does not rely on pre-trained ConvNets but instead learns the ConvNet weights for each input exemplar. As a result, the cgCNN model can synthesize high-quality dynamic, sound, and image textures in a unified manner. We also explore the theoretical connections between our model and other texture models. Further investigations show that the cgCNN model can be easily generalized to texture expansion and inpainting. Extensive experiments demonstrate that our model achieves results better than, or at least comparable to, state-of-the-art methods. |
Tasks | Texture Synthesis |
Published | 2019-12-17 |
URL | https://arxiv.org/abs/1912.07971v1 |
https://arxiv.org/pdf/1912.07971v1.pdf | |
PWC | https://paperswithcode.com/paper/conditional-generative-convnets-for-exemplar |
Repo | https://github.com/wzm2256/cgCNN |
Framework | tf |
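
The abstract leaves the "deep statistics" unspecified; in neural texture models they are often Gram matrices of ConvNet feature maps (Gatys et al. style). The sketch below assumes that choice and a hypothetical `net.feature_maps` helper, so treat it as generic, not cgCNN's definition.

```python
# Hedged sketch of Gram-matrix "deep statistics" for texture matching.
# Whether cgCNN uses exactly these statistics is an assumption here;
# `net.feature_maps` is a hypothetical helper.
import torch

def gram_matrix(features):
    """features: (C, H, W) feature maps -> (C, C) channel co-occurrence stats."""
    c, h, w = features.shape
    f = features.reshape(c, h * w)
    return (f @ f.T) / (h * w)

def texture_statistics_loss(net, exemplar, sample):
    """Match the sample's deep statistics to the exemplar's, layer by layer."""
    loss = 0.0
    for fe, fs in zip(net.feature_maps(exemplar), net.feature_maps(sample)):
        loss = loss + (gram_matrix(fe) - gram_matrix(fs)).pow(2).mean()
    return loss  # minimized over `sample`, or used as an energy for sampling
```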
A Statistical Investigation of Long Memory in Language and Music
Title | A Statistical Investigation of Long Memory in Language and Music |
Authors | Alexander Greaves-Tunnell, Zaid Harchaoui |
Abstract | Representation and learning of long-range dependencies is a central challenge confronted in modern applications of machine learning to sequence data. Yet despite the prominence of this issue, the basic problem of measuring long-range dependence, either in a given data source or as represented in a trained deep model, remains largely limited to heuristic tools. We contribute a statistical framework for investigating long-range dependence in current applications of deep sequence modeling, drawing on the well-developed theory of long memory stochastic processes. This framework yields testable implications concerning the relationship between long memory in real-world data and its learned representation in a deep learning architecture, which are explored through a semiparametric framework adapted to the high-dimensional setting. |
Tasks | Language Modelling, Time Series |
Published | 2019-04-08 |
URL | https://arxiv.org/abs/1904.03834v2 |
https://arxiv.org/pdf/1904.03834v2.pdf | |
PWC | https://paperswithcode.com/paper/a-statistical-investigation-of-long-memory-in |
Repo | https://github.com/alecgt/RNN_long_memory |
Framework | pytorch |
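
For context on what "long memory" means operationally, here is the classical rescaled-range (R/S) estimate of the Hurst exponent. This heuristic is exactly the kind of tool the paper moves beyond (it develops a semiparametric estimator instead), so it is shown only to make the concept concrete.

```python
# Classical rescaled-range (R/S) Hurst estimate; H > 0.5 suggests long-range
# dependence. Shown for context only; the paper's semiparametric estimator
# is different.
import numpy as np

def rescaled_range_hurst(x, min_chunk=16):
    x = np.asarray(x, dtype=float)
    sizes, rs_values = [], []
    n = min_chunk
    while n <= len(x) // 2:
        rs = []
        for start in range(0, len(x) - n + 1, n):
            chunk = x[start:start + n]
            dev = np.cumsum(chunk - chunk.mean())   # cumulative deviations
            s = chunk.std()
            if s > 0:
                rs.append((dev.max() - dev.min()) / s)
        if rs:
            sizes.append(n)
            rs_values.append(np.mean(rs))
        n *= 2
    # The slope of log(R/S) against log(n) estimates H.
    return np.polyfit(np.log(sizes), np.log(rs_values), 1)[0]

print(rescaled_range_hurst(np.random.randn(4096)))  # ~0.5 for white noise
```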
OTEANN: Estimating the Transparency of Orthographies with an Artificial Neural Network
Title | OTEANN: Estimating the Transparency of Orthographies with an Artificial Neural Network |
Authors | Xavier Marjou |
Abstract | To transcribe spoken language to written medium, most alphabets enable an unambiguous sound-to-letter rule. However, some writing systems have distanced themselves from this simple concept and little work exists on measuring such distance. In this study, we use an Artificial Neural Network (ANN) model to evaluate the transparency between written words and their pronunciation, hence its name Orthographic Transparency Estimation with an ANN (OTEANN). Based on datasets derived from Wikimedia dictionaries, we trained and tested this model to score the percentage of false predictions in phoneme-to-grapheme and grapheme-to-phoneme translation tasks. The scores obtained on 15 orthographies were in line with the estimations of other studies. Interestingly, the model also provided insight into typical mistakes made by learners who only consider the phonemic rule in reading and writing. |
Tasks | |
Published | 2019-12-31 |
URL | https://arxiv.org/abs/1912.13321v1 |
https://arxiv.org/pdf/1912.13321v1.pdf | |
PWC | https://paperswithcode.com/paper/oteann-estimating-the-transparency-of |
Repo | https://github.com/marxav/oteann |
Framework | none |
Trick or TReAT: Thematic Reinforcement for Artistic Typography
Title | Trick or TReAT: Thematic Reinforcement for Artistic Typography |
Authors | Purva Tendulkar, Kalpesh Krishna, Ramprasaath R. Selvaraju, Devi Parikh |
Abstract | An approach to make text visually appealing and memorable is semantic reinforcement - the use of visual cues alluding to the context or theme in which the word is being used to reinforce the message (e.g., Google Doodles). We present a computational approach for semantic reinforcement called TReAT - Thematic Reinforcement for Artistic Typography. Given an input word (e.g. exam) and a theme (e.g. education), the individual letters of the input word are replaced by cliparts relevant to the theme which visually resemble the letters - adding creative context to the potentially boring input word. We use an unsupervised approach to learn a latent space to represent letters and cliparts and compute similarities between the two. Human studies show that participants can reliably recognize the word as well as the theme in our outputs (TReATs) and find them more creative compared to meaningful baselines. |
Tasks | |
Published | 2019-03-19 |
URL | http://arxiv.org/abs/1903.07820v1 |
http://arxiv.org/pdf/1903.07820v1.pdf | |
PWC | https://paperswithcode.com/paper/trick-or-treat-thematic-reinforcement-for |
Repo | https://github.com/purvaten/treat |
Framework | pytorch |
Parametric Resynthesis with neural vocoders
Title | Parametric Resynthesis with neural vocoders |
Authors | Soumi Maiti, Michael I Mandel |
Abstract | Noise suppression systems generally produce output speech with compromised quality. We propose to utilize the high quality speech generation capability of neural vocoders for noise suppression. We use a neural network to predict clean mel-spectrogram features from noisy speech and then compare two neural vocoders, WaveNet and WaveGlow, for synthesizing clean speech from the predicted mel spectrogram. Both WaveNet and WaveGlow achieve better subjective and objective quality scores than the source separation model Chimera++. Further, WaveNet and WaveGlow also achieve significantly better subjective quality ratings than the oracle Wiener mask. Moreover, we observe that between WaveNet and WaveGlow, WaveNet achieves the best subjective quality scores, although at the cost of much slower waveform generation. |
Tasks | |
Published | 2019-06-16 |
URL | https://arxiv.org/abs/1906.06762v2 |
https://arxiv.org/pdf/1906.06762v2.pdf | |
PWC | https://paperswithcode.com/paper/parametric-resynthesis-with-neural-vocoders |
Repo | https://github.com/r9y9/wavenet_vocoder |
Framework | pytorch |
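
The first stage of the pipeline (predicting clean mel-spectrogram frames from noisy speech) could look roughly like the sketch below; the architecture and loss are assumptions, and the vocoder stage (WaveNet/WaveGlow) is not shown.

```python
# Hedged sketch of the mel prediction stage; architecture and loss are
# illustrative assumptions. The neural vocoder stage is omitted.
import torch.nn as nn
import torch.nn.functional as F

class MelPredictor(nn.Module):
    def __init__(self, n_mels=80, hidden=256):
        super().__init__()
        self.rnn = nn.LSTM(n_mels, hidden, num_layers=2,
                           batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hidden, n_mels)

    def forward(self, noisy_mel):          # (batch, frames, n_mels)
        h, _ = self.rnn(noisy_mel)
        return self.out(h)                 # predicted clean mel frames

# Training sketch: loss = F.mse_loss(model(noisy_mel), clean_mel)
# The predicted mel is then fed to a neural vocoder to synthesize the waveform.
```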
Stochastic Shared Embeddings: Data-driven Regularization of Embedding Layers
Title | Stochastic Shared Embeddings: Data-driven Regularization of Embedding Layers |
Authors | Liwei Wu, Shuqing Li, Cho-Jui Hsieh, James Sharpnack |
Abstract | In deep neural nets, lower-level embedding layers account for a large portion of the total number of parameters. Tikhonov regularization, graph-based regularization, and hard parameter sharing are approaches that introduce explicit biases into training in the hope of reducing statistical complexity. Alternatively, we propose stochastically shared embeddings (SSE), a data-driven approach to regularizing embedding layers which stochastically transitions between embeddings during stochastic gradient descent (SGD). Because SSE integrates seamlessly with existing SGD algorithms, it can be used with only minor modifications when training large-scale neural networks. We develop two versions of SSE: SSE-Graph, which uses knowledge graphs of embeddings, and SSE-SE, which uses no prior information. We provide theoretical guarantees for our method and show its empirical effectiveness on 6 distinct tasks, from simple neural networks with one hidden layer in recommender systems to the Transformer and BERT in natural language tasks. We find that when used along with widely used regularization methods such as weight decay and dropout, our proposed SSE can further reduce overfitting, which often leads to more favorable generalization results. |
Tasks | Knowledge Graphs, Recommendation Systems |
Published | 2019-05-25 |
URL | https://arxiv.org/abs/1905.10630v2 |
https://arxiv.org/pdf/1905.10630v2.pdf | |
PWC | https://paperswithcode.com/paper/stochastic-shared-embeddings-data-driven |
Repo | https://github.com/wuliwei9278/SSE-PT |
Framework | tf |
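
SSE-SE, the prior-free variant, is simple enough to sketch directly: during training, each embedding lookup is replaced by a uniformly random index with a small probability. The wrapper design and replacement probability below are illustrative assumptions.

```python
# Hedged sketch of SSE-SE: stochastic index replacement in embedding lookups
# during training. Wrapper design and replacement probability are assumptions.
import torch
import torch.nn as nn

class SSEEmbedding(nn.Module):
    def __init__(self, num_embeddings, dim, p_replace=0.01):
        super().__init__()
        self.emb = nn.Embedding(num_embeddings, dim)
        self.p_replace = p_replace

    def forward(self, idx):
        if self.training and self.p_replace > 0:
            # With probability p_replace, swap each index for a random one.
            mask = torch.rand(idx.shape, device=idx.device) < self.p_replace
            random_idx = torch.randint_like(idx, self.emb.num_embeddings)
            idx = torch.where(mask, random_idx, idx)
        return self.emb(idx)
```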
Content based News Recommendation via Shortest Entity Distance over Knowledge Graphs
Title | Content based News Recommendation via Shortest Entity Distance over Knowledge Graphs |
Authors | Kevin Joseph, Hui Jiang |
Abstract | Content-based news recommendation systems need to recommend news articles based on the topics and content of articles, without using user-specific information. Many news articles describe the occurrence of specific events and named entities, including people, places, or objects. In this paper, we propose a graph traversal algorithm as well as a novel weighting scheme for cold-start content-based news recommendation utilizing these named entities. Seeking to create a higher degree of user-specific relevance, our algorithm computes the shortest distance between named entities, across news articles, over a large knowledge graph. Moreover, we have created a new human-annotated data set for evaluating content-based news recommendation systems. Experimental results show our method is suitable for tackling the hard cold-start problem, and it produces stronger Pearson correlation with human similarity scores than other cold-start methods. Our method is also complementary to conventional cold-start recommendation methods, and combining them may yield significant performance gains. The dataset, CNRec, is available at: https://github.com/kevinj22/CNRec |
Tasks | Knowledge Graphs, Recommendation Systems |
Published | 2019-05-24 |
URL | https://arxiv.org/abs/1905.13132v1 |
https://arxiv.org/pdf/1905.13132v1.pdf | |
PWC | https://paperswithcode.com/paper/190513132 |
Repo | https://github.com/kevinj22/CNRec |
Framework | none |
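
The core scoring idea (shortest entity distance over a knowledge graph) can be sketched with a generic graph library; the aggregation over entity pairs below is illustrative, not the paper's exact weighting scheme.

```python
# Hedged sketch: aggregate shortest-path distances between two articles'
# named-entity sets over a knowledge graph. The plain-average aggregation
# is an assumption; the paper proposes its own weighting scheme.
import networkx as nx

def entity_distance(kg, entities_a, entities_b):
    """Average shortest-path length between two articles' entity sets."""
    dists = []
    for a in entities_a:
        for b in entities_b:
            try:
                dists.append(nx.shortest_path_length(kg, a, b))
            except (nx.NodeNotFound, nx.NetworkXNoPath):
                pass  # unknown or unreachable entities contribute nothing
    return sum(dists) / len(dists) if dists else float("inf")

# Usage sketch: recommend candidate articles that minimize entity_distance
# to the article the user just read.
```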