February 1, 2020

3124 words 15 mins read

Paper Group AWR 369

Phrase Grounding by Soft-Label Chain Conditional Random Field. A Novel Re-weighting Method for Connectionist Temporal Classification. A Hybrid Neural Network Model for Commonsense Reasoning. Knowledge Graph Embedding for Ecotoxicological Effect Prediction. Measurements of Three-Level Hierarchical Structure in the Outliers in the Spectrum of Deepnet …

Phrase Grounding by Soft-Label Chain Conditional Random Field

Title Phrase Grounding by Soft-Label Chain Conditional Random Field
Authors Jiacheng Liu, Julia Hockenmaier
Abstract The phrase grounding task aims to ground each entity mention in a given caption of an image to a corresponding region in that image. Although there are clear dependencies between how different mentions of the same caption should be grounded, previous structured prediction methods that aim to capture such dependencies need to resort to approximate inference or non-differentiable losses. In this paper, we formulate phrase grounding as a sequence labeling task where we treat candidate regions as potential labels, and use neural chain Conditional Random Fields (CRFs) to model dependencies among regions for adjacent mentions. In contrast to standard sequence labeling tasks, the phrase grounding task is defined such that there may be multiple correct candidate regions. To address this multiplicity of gold labels, we define so-called Soft-Label Chain CRFs, and present an algorithm that enables convenient end-to-end training. Our method establishes a new state-of-the-art on phrase grounding on the Flickr30k Entities dataset. Analysis shows that our model benefits both from the entity dependencies captured by the CRF and from the soft-label training regime. Our code is available at github.com/liujch1998/SoftLabelCCRF
Tasks Phrase Grounding, Structured Prediction
Published 2019-09-01
URL https://arxiv.org/abs/1909.00301v1
PDF https://arxiv.org/pdf/1909.00301v1.pdf
PWC https://paperswithcode.com/paper/phrase-grounding-by-soft-label-chain
Repo https://github.com/liujch1998/SoftLabelCCRF
Framework pytorch
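
To make the soft-label idea above concrete, here is a minimal PyTorch sketch of training a chain CRF against soft (distributional) targets: the loss is the log-partition minus the expected gold score, with per-position target distributions treated as independent. The names and the exact objective are illustrative, not necessarily the paper's formulation.

```python
import torch

def crf_soft_label_loss(unary, trans, soft_targets):
    """Chain-CRF negative log-likelihood generalized to soft targets.

    unary:        (T, K) per-mention scores for K candidate regions
    trans:        (K, K) transition scores between adjacent mentions
    soft_targets: (T, K) rows are probability distributions over regions
    """
    T, K = unary.shape
    # log-partition via the forward algorithm
    alpha = unary[0]
    for t in range(1, T):
        alpha = unary[t] + torch.logsumexp(alpha.unsqueeze(1) + trans, dim=0)
    log_z = torch.logsumexp(alpha, dim=0)

    # expected score of the "gold" sequence under the soft labels
    # (positions treated as independent for this sketch)
    expected_unary = (soft_targets * unary).sum()
    expected_trans = sum(
        (soft_targets[t - 1].unsqueeze(1) * soft_targets[t].unsqueeze(0) * trans).sum()
        for t in range(1, T)
    )
    return log_z - (expected_unary + expected_trans)
```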

A Novel Re-weighting Method for Connectionist Temporal Classification

Title A Novel Re-weighting Method for Connectionist Temporal Classification
Authors Hongzhu Li, Weiqiang Wang
Abstract The connectionist temporal classification (CTC) enables end-to-end sequence learning by maximizing the probability of correctly recognizing sequences during training. With an extra $blank$ class, the CTC implicitly converts recognizing a sequence into classifying each timestep within the sequence. But the CTC loss is not intuitive for such a classification task, so the class imbalance within each sequence, caused by the overwhelming number of $blank$ timesteps, is a knotty problem. In this paper, we define a piece-wise function as the pseudo ground-truth to reinterpret the CTC loss based on sequences as a cross-entropy loss based on timesteps. The cross-entropy form makes it easy to re-weight the CTC loss. Experiments on text recognition show that the weighted CTC loss solves the class imbalance problem and facilitates convergence, generally leading to better results than the plain CTC loss. Besides this, the reinterpretation of CTC offers a brand-new perspective that may be useful in other situations.
Tasks
Published 2019-04-24
URL http://arxiv.org/abs/1904.10619v1
PDF http://arxiv.org/pdf/1904.10619v1.pdf
PWC https://paperswithcode.com/paper/a-novel-re-weighting-method-for-connectionist
Repo https://github.com/vadimkantorov/ctc
Framework pytorch
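
The re-weighting idea, once CTC is read as a per-timestep cross entropy, can be sketched as follows; the paper's pseudo ground-truth is a specific piece-wise function, whereas here pseudo_targets is simply assumed given and blank_weight is an illustrative knob.

```python
import torch
import torch.nn.functional as F

def weighted_framewise_ce(log_probs, pseudo_targets, blank=0, blank_weight=0.1):
    """Per-timestep cross entropy against a pseudo ground truth, with the
    blank class down-weighted to counter its dominance.

    log_probs:      (T, C) network log-probabilities for one sequence
    pseudo_targets: (T,)   pseudo ground-truth class per timestep
                    (e.g. derived from a CTC alignment; hypothetical here)
    """
    weights = torch.ones(log_probs.size(1))
    weights[blank] = blank_weight          # re-weight the blank class
    return F.nll_loss(log_probs, pseudo_targets, weight=weights)
```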

A Hybrid Neural Network Model for Commonsense Reasoning

Title A Hybrid Neural Network Model for Commonsense Reasoning
Authors Pengcheng He, Xiaodong Liu, Weizhu Chen, Jianfeng Gao
Abstract This paper proposes a hybrid neural network (HNN) model for commonsense reasoning. An HNN consists of two component models, a masked language model and a semantic similarity model, which share a BERT-based contextual encoder but use different model-specific input and output layers. HNN obtains new state-of-the-art results on three classic commonsense reasoning tasks, pushing the WNLI benchmark to 89%, the Winograd Schema Challenge (WSC) benchmark to 75.1%, and the PDP60 benchmark to 90.0%. An ablation study shows that language models and semantic similarity models are complementary approaches to commonsense reasoning, and HNN effectively combines the strengths of both. The code and pre-trained models will be publicly available at https://github.com/namisan/mt-dnn.
Tasks Language Modelling, Semantic Similarity, Semantic Textual Similarity
Published 2019-07-27
URL https://arxiv.org/abs/1907.11983v1
PDF https://arxiv.org/pdf/1907.11983v1.pdf
PWC https://paperswithcode.com/paper/a-hybrid-neural-network-model-for-commonsense
Repo https://github.com/namisan/mt-dnn
Framework pytorch
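
A toy skeleton of the two-head arrangement described above (shared contextual encoder, separate masked-LM and semantic-similarity output layers); the real model uses a BERT encoder, and all sizes and names here are placeholders.

```python
import torch
import torch.nn as nn

class HybridHeads(nn.Module):
    """Shared contextual encoder feeding two task-specific heads: a
    masked-LM head and a semantic-similarity head (illustrative only)."""

    def __init__(self, vocab_size=30522, hidden=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=hidden, nhead=4, batch_first=True),
            num_layers=2,
        )
        self.mlm_head = nn.Linear(hidden, vocab_size)    # scores candidate fillers
        self.sim_head = nn.Linear(hidden, 1)             # similarity score

    def forward(self, token_ids):
        h = self.encoder(self.embed(token_ids))          # (B, T, H)
        return self.mlm_head(h), self.sim_head(h[:, 0])  # per-token LM logits, sequence-level score
```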

Knowledge Graph Embedding for Ecotoxicological Effect Prediction

Title Knowledge Graph Embedding for Ecotoxicological Effect Prediction
Authors Erik Bryhn Myklebust, Ernesto Jimenez-Ruiz, Jiaoyan Chen, Raoul Wolf, Knut Erik Tollefsen
Abstract Exploring the effects a chemical compound has on a species takes considerable experimental effort. Appropriate methods for estimating and suggesting new effects can dramatically reduce the work that needs to be done by a laboratory. In this paper we explore the suitability of using a knowledge graph embedding approach for ecotoxicological effect prediction. A knowledge graph has been constructed from publicly available data sets, including a species taxonomy and chemical classification and similarity. The publicly available effect data is integrated into the knowledge graph using ontology alignment techniques. Our experimental results show that the knowledge-graph-based approach improves on the selected baselines.
Tasks Graph Embedding, Knowledge Graph Embedding
Published 2019-07-02
URL https://arxiv.org/abs/1907.01328v3
PDF https://arxiv.org/pdf/1907.01328v3.pdf
PWC https://paperswithcode.com/paper/knowledge-graph-embedding-for
Repo https://github.com/Erik-BM/NIVAUC
Framework tf
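
As one concrete instance of knowledge graph embedding for this setting, a TransE-style scorer over (chemical, affects, species) triples could look like the sketch below; the paper evaluates embedding models on its constructed graph, and TransE is used here only as a familiar example.

```python
import torch
import torch.nn as nn

class TransE(nn.Module):
    """Minimal TransE-style triple scorer (one common embedding choice,
    not necessarily the model used in the paper)."""

    def __init__(self, n_entities, n_relations, dim=64):
        super().__init__()
        self.ent = nn.Embedding(n_entities, dim)
        self.rel = nn.Embedding(n_relations, dim)

    def score(self, head, rel, tail):
        # lower distance = more plausible triple, e.g. (chemical, affects, species)
        return (self.ent(head) + self.rel(rel) - self.ent(tail)).norm(p=2, dim=-1)
```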

Measurements of Three-Level Hierarchical Structure in the Outliers in the Spectrum of Deepnet Hessians

Title Measurements of Three-Level Hierarchical Structure in the Outliers in the Spectrum of Deepnet Hessians
Authors Vardan Papyan
Abstract We consider deep classifying neural networks. We expose a structure in the derivative of the logits with respect to the parameters of the model, which is used to explain the existence of outliers in the spectrum of the Hessian. Previous works decomposed the Hessian into two components, attributing the outliers to one of them, the so-called Covariance of gradients. We show this term is not a Covariance but a second moment matrix, i.e., it is influenced by means of gradients. These means possess an additive two-way structure that is the source of the outliers in the spectrum. This structure can be used to approximate the principal subspace of the Hessian using certain “averaging” operations, avoiding the need for high-dimensional eigenanalysis. We corroborate this claim across different datasets, architectures and sample sizes.
Tasks
Published 2019-01-24
URL http://arxiv.org/abs/1901.08244v1
PDF http://arxiv.org/pdf/1901.08244v1.pdf
PWC https://paperswithcode.com/paper/measurements-of-three-level-hierarchical
Repo https://github.com/deep-lab/DeepnetHessian
Framework pytorch
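
The central observation, that the "covariance of gradients" term is really a second moment matrix and therefore carries the gradient means, amounts to the identity E[gg^T] = Cov(g) + mean(g)mean(g)^T. A small numerical check of that decomposition on a matrix of per-example gradients:

```python
import torch

def second_moment_decomposition(grads):
    """Split the gradient second-moment matrix into covariance plus the
    outer product of the mean, illustrating the distinction the abstract
    draws (the means are what drive the spectrum outliers).

    grads: (N, P) matrix of per-example gradients (N samples, P parameters)
    """
    mean = grads.mean(dim=0)                             # (P,)
    second_moment = grads.t() @ grads / grads.size(0)    # E[g g^T]
    centered = grads - mean
    covariance = centered.t() @ centered / grads.size(0)
    # second_moment == covariance + outer(mean, mean), up to numerical error
    return second_moment, covariance, torch.outer(mean, mean)
```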

Dependency-Guided LSTM-CRF for Named Entity Recognition

Title Dependency-Guided LSTM-CRF for Named Entity Recognition
Authors Zhanming Jie, Wei Lu
Abstract Dependency tree structures capture long-distance and syntactic relationships between words in a sentence. The syntactic relations (e.g., nominal subject, object) can potentially infer the existence of certain named entities. In addition, the performance of a named entity recognizer could benefit from the long-distance dependencies between the words in dependency trees. In this work, we propose a simple yet effective dependency-guided LSTM-CRF model to encode the complete dependency trees and capture the above properties for the task of named entity recognition (NER). The data statistics show strong correlations between the entity types and dependency relations. We conduct extensive experiments on several standard datasets and demonstrate the effectiveness of the proposed model in improving NER and achieving state-of-the-art performance. Our analysis reveals that the significant improvements mainly result from the dependency relations and long-distance interactions provided by dependency trees.
Tasks Named Entity Recognition
Published 2019-09-23
URL https://arxiv.org/abs/1909.10148v1
PDF https://arxiv.org/pdf/1909.10148v1.pdf
PWC https://paperswithcode.com/paper/190910148
Repo https://github.com/IIITian-Chandan/Solar_Data_modeling
Framework tf
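
One simple way to realize dependency-guided features, sketched below, is to concatenate each word's embedding with its dependency head's embedding and a relation embedding before the BiLSTM; this illustrates the idea rather than the paper's exact architecture, and the CRF layer is omitted.

```python
import torch
import torch.nn as nn

class DepGuidedEncoder(nn.Module):
    """Each word is concatenated with the representation of its dependency
    head and of the dependency relation before a BiLSTM encoder."""

    def __init__(self, vocab_size, n_relations, dim=100):
        super().__init__()
        self.word = nn.Embedding(vocab_size, dim)
        self.rel = nn.Embedding(n_relations, dim)
        self.lstm = nn.LSTM(3 * dim, dim, bidirectional=True, batch_first=True)

    def forward(self, words, head_index, rel_ids):
        w = self.word(words)                                  # (B, T, D)
        # gather each word's dependency-head embedding
        heads = torch.gather(w, 1, head_index.unsqueeze(-1).expand_as(w))
        x = torch.cat([w, heads, self.rel(rel_ids)], dim=-1)  # (B, T, 3D)
        out, _ = self.lstm(x)
        return out                                            # features for a CRF tagger
```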

Towards Photographic Image Manipulation with Balanced Growing of Generative Autoencoders

Title Towards Photographic Image Manipulation with Balanced Growing of Generative Autoencoders
Authors Ari Heljakka, Arno Solin, Juho Kannala
Abstract We present a generative autoencoder that provides fast encoding, faithful reconstructions (e.g., retaining the identity of a face), sharp generated/reconstructed samples in high resolutions, and a well-structured latent space that supports semantic manipulation of the inputs. There are no current autoencoder or GAN models that satisfactorily achieve all of these. We build on the progressively growing autoencoder model PIONEER, for which we completely alter the training dynamics based on a careful analysis of recently introduced normalization schemes. We show significantly improved visual and quantitative results for face identity conservation in CelebA-HQ. Our model achieves state-of-the-art disentanglement of latent space, both quantitatively and via realistic image attribute manipulations. On the LSUN Bedrooms dataset, we improve the disentanglement performance of the vanilla PIONEER, despite having a simpler model. Overall, our results indicate that the PIONEER networks provide a way towards photorealistic face manipulation.
Tasks
Published 2019-04-12
URL https://arxiv.org/abs/1904.06145v2
PDF https://arxiv.org/pdf/1904.06145v2.pdf
PWC https://paperswithcode.com/paper/towards-photographic-image-manipulation-with
Repo https://github.com/AaltoVision/balanced-pioneer
Framework pytorch

Assessing the Lexico-Semantic Relational Knowledge Captured by Word and Concept Embeddings

Title Assessing the Lexico-Semantic Relational Knowledge Captured by Word and Concept Embeddings
Authors Ronald Denaux, Jose Manuel Gomez-Perez
Abstract Deep learning currently dominates the benchmarks for various NLP tasks and, at the basis of such systems, words are frequently represented as embeddings (vectors in a low-dimensional space) learned from large text corpora; various algorithms have been proposed to learn both word and concept embeddings. One of the claimed benefits of such embeddings is that they capture knowledge about semantic relations. Such embeddings are most often evaluated through tasks such as predicting human-rated similarity and analogy, which only test a few, often ill-defined, relations. In this paper, we propose a method for (i) reliably generating word and concept pair datasets for a wide range of relations by using a knowledge graph and (ii) evaluating to what extent pre-trained embeddings capture those relations. We evaluate the approach against a proprietary and a public knowledge graph and analyze the results, showing which lexico-semantic relational knowledge is captured by current embedding learning approaches.
Tasks
Published 2019-09-24
URL https://arxiv.org/abs/1909.11042v1
PDF https://arxiv.org/pdf/1909.11042v1.pdf
PWC https://paperswithcode.com/paper/assessing-the-lexico-semantic-relational
Repo https://github.com/rdenaux/embrelassess
Framework pytorch
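
The evaluation protocol can be pictured as fitting a simple probe on embedding pairs generated from the knowledge graph and reading its held-out accuracy as evidence that a relation is (or is not) captured. A hedged sketch with stand-in data:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def fit_relation_probe(emb_a, emb_b, labels):
    """Probe whether pre-trained embeddings capture a given relation.

    emb_a, emb_b: (N, D) pre-trained embeddings of the pair members
    labels:       (N,)   1 if the relation holds between a and b, else 0
                  (pairs would be generated from a knowledge graph; here
                  they are assumed given)
    """
    features = np.concatenate([emb_a, emb_b, emb_a - emb_b], axis=1)
    probe = LogisticRegression(max_iter=1000).fit(features, labels)
    return probe  # probe.score(...) on held-out pairs estimates how well
                  # the embeddings encode this relation
```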

A Modular Task-oriented Dialogue System Using a Neural Mixture-of-Experts

Title A Modular Task-oriented Dialogue System Using a Neural Mixture-of-Experts
Authors Jiahuan Pei, Pengjie Ren, Maarten de Rijke
Abstract End-to-end Task-oriented Dialogue Systems (TDSs) have attracted a lot of attention for their superiority (e.g., in terms of global optimization) over pipeline modularized TDSs. Previous studies on end-to-end TDSs use a single-module model to generate responses for complex dialogue contexts. However, no model consistently outperforms the others in all cases. We propose a neural Modular Task-oriented Dialogue System (MTDS) framework, in which a few expert bots are combined to generate the response for a given dialogue context. MTDS consists of a chair bot and several expert bots. Each expert bot is specialized for a particular situation, e.g., one domain, one type of system action, etc. The chair bot coordinates multiple expert bots and adaptively selects an expert bot to generate the appropriate response. We further propose a Token-level Mixture-of-Expert (TokenMoE) model to implement MTDS, where the expert bots predict multiple tokens at each timestamp and the chair bot determines the final generated token by fully taking into consideration the outputs of all expert bots. Both the chair bot and the expert bots are jointly trained in an end-to-end fashion. To verify the effectiveness of TokenMoE, we carry out extensive experiments on a benchmark dataset. Compared with the baseline using a single-module model, our TokenMoE improves inform rate by 8.1% and success rate by 0.8%.
Tasks Task-Oriented Dialogue Systems
Published 2019-07-10
URL https://arxiv.org/abs/1907.05346v1
PDF https://arxiv.org/pdf/1907.05346v1.pdf
PWC https://paperswithcode.com/paper/a-modular-task-oriented-dialogue-system-using
Repo https://github.com/budzianowski/multiwoz
Framework pytorch
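
The token-level mixing can be sketched as the chair bot softmax-weighting the experts' next-token distributions at each decoding step; the gating and training details in the paper are richer than this.

```python
import torch

def chair_combine(expert_logits, chair_weights):
    """Token-level mixture of experts (loose sketch of the TokenMoE idea).

    expert_logits: (E, V) next-token logits from E expert bots
    chair_weights: (E,)   chair scores for the experts at this step
    """
    expert_probs = torch.softmax(expert_logits, dim=-1)        # (E, V)
    gate = torch.softmax(chair_weights, dim=-1).unsqueeze(-1)  # (E, 1)
    mixed = (gate * expert_probs).sum(dim=0)                   # (V,)
    return mixed.argmax()                                      # final token id
```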

Incrementally Improving Graph WaveNet Performance on Traffic Prediction

Title Incrementally Improving Graph WaveNet Performance on Traffic Prediction
Authors Sam Shleifer, Clara McCreery, Vamsi Chitters
Abstract We present a series of modifications which improve upon Graph WaveNet’s previously state-of-the-art performance on the METR-LA traffic prediction task. The goal of this task is to predict the future speed of traffic at each sensor in a network using the past hour of sensor readings. Graph WaveNet (GWN) is a spatio-temporal graph neural network which interleaves graph convolution to aggregate information from nearby sensors and dilated convolutions to aggregate information from the past. We improve GWN by (1) using better hyperparameters, (2) adding connections that allow larger gradients to flow back to the early convolutional layers, and (3) pretraining on an easier short-term traffic prediction task. These modifications reduce the mean absolute error by 0.06 on the METR-LA task, nearly equal to GWN’s improvement over its predecessor. These improvements generalize to the PEMS-BAY dataset, with similar relative magnitude. We also show that ensembling separate models for short- and long-term predictions further improves performance. Code is available at https://github.com/sshleifer/Graph-WaveNet.
Tasks Traffic Prediction
Published 2019-12-11
URL https://arxiv.org/abs/1912.07390v1
PDF https://arxiv.org/pdf/1912.07390v1.pdf
PWC https://paperswithcode.com/paper/incrementally-improving-graph-wavenet
Repo https://github.com/nnzhan/Graph-WaveNet
Framework pytorch
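
Of the listed modifications, the short-/long-term ensembling is easy to illustrate: take early forecast steps from the model trained on short horizons and later steps from the long-horizon model. The split point below is an assumption, not a value from the paper.

```python
import torch

def horizon_ensemble(short_model, long_model, x, split_step=3):
    """Combine a short-horizon model and a long-horizon model into one
    forecast (one simple reading of the ensembling in the abstract).

    x: past sensor readings; both models are assumed to return
       (batch, horizon, n_sensors) forecasts.
    """
    short_pred = short_model(x)
    long_pred = long_model(x)
    # early steps from the short-term specialist, later steps from the other
    return torch.cat([short_pred[:, :split_step], long_pred[:, split_step:]], dim=1)
```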

Learning meters of Arabic and English poems with Recurrent Neural Networks: a step forward for language understanding and synthesis

Title Learning meters of Arabic and English poems with Recurrent Neural Networks: a step forward for language understanding and synthesis
Authors Waleed A. Yousef, Omar M. Ibrahime, Taha M. Madbouly, Moustafa A. Mahmoud
Abstract Recognizing a piece of writing as a poem or prose is usually easy for the majority of people; however, only specialists can determine which meter a poem belongs to. In this paper, we build Recurrent Neural Network (RNN) models that can classify poems according to their meters from plain text. The input text is encoded at the character level and directly fed to the models without feature handcrafting. This is a step forward for machine understanding and synthesis of languages in general, and the Arabic language in particular. Among the 16 poem meters of Arabic and the 4 meters of English, the networks were able to correctly classify poems with overall accuracies of 96.38% and 82.31%, respectively. The poem datasets used to conduct this research were massive, over 1.5 million verses, and were crawled from different nontechnical sources, mostly Arabic and English literature sites, in different heterogeneous and unstructured formats. These datasets are now made publicly available in a clean, structured, and documented format for future research. To the best of the authors’ knowledge, this research is the first to address classifying poem meters with a machine learning approach in general, and with a featureless RNN-based approach in particular. In addition, the dataset is the first publicly available dataset ready for the purpose of future computational research.
Tasks
Published 2019-05-07
URL https://arxiv.org/abs/1905.05700v1
PDF https://arxiv.org/pdf/1905.05700v1.pdf
PWC https://paperswithcode.com/paper/190505700
Repo https://github.com/DrWaleedAYousef/My-Stuff-To-Share
Framework none
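
The model family described above, a character-level RNN classifier over raw poem text, might look like this minimal sketch (layer sizes and the GRU choice are illustrative):

```python
import torch
import torch.nn as nn

class MeterClassifier(nn.Module):
    """Character-level RNN classifier: raw character indices in,
    poem meter class out."""

    def __init__(self, n_chars, n_meters, dim=128):
        super().__init__()
        self.embed = nn.Embedding(n_chars, dim)
        self.rnn = nn.GRU(dim, dim, batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * dim, n_meters)   # e.g. 16 Arabic + 4 English meters

    def forward(self, char_ids):                  # (B, T) character indices
        _, h = self.rnn(self.embed(char_ids))     # h: (2, B, D), last states of both directions
        return self.out(torch.cat([h[0], h[1]], dim=-1))  # (B, n_meters) logits
```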

Transfer Learning Using Ensemble Neural Networks for Organic Solar Cell Screening

Title Transfer Learning Using Ensemble Neural Networks for Organic Solar Cell Screening
Authors Arindam Paul, Dipendra Jha, Reda Al-Bahrani, Wei-keng Liao, Alok Choudhary, Ankit Agrawal
Abstract Organic solar cells are a promising technology for solving the clean energy crisis in the world. However, generating candidate chemical compounds for solar cells is a time-consuming process requiring thousands of hours of laboratory analysis. For a solar cell, the most important property is the power conversion efficiency, which is dependent on the highest occupied molecular orbital (HOMO) values of the donor molecules. Recently, machine learning techniques have proved to be very useful in building predictive models for HOMO values of donor structures of Organic Photovoltaic Cells (OPVs). Since experimental datasets are limited in size, current machine learning models are trained on data derived from calculations based on density functional theory (DFT). Molecular line notations such as SMILES or InChI are popular input representations for describing the molecular structure of donor molecules. The two types of line representations encode different information: for example, SMILES defines the bond types while InChI defines protonation. In this work, we present an ensemble deep neural network architecture, called SINet, which harnesses both the SMILES and InChI molecular representations to predict HOMO values, and leverages the potential of transfer learning from a sizeable DFT-computed dataset (Harvard CEP) to build more robust predictive models for the relatively smaller HOPV datasets. The Harvard CEP dataset contains molecular structures and properties for 2.3 million candidate donor structures for OPVs, while HOPV contains DFT-computed and experimental values for 350 and 243 molecules, respectively. Our results demonstrate significant performance improvement from the use of transfer learning and from leveraging both molecular representations.
Tasks Transfer Learning
Published 2019-03-07
URL https://arxiv.org/abs/1903.03178v4
PDF https://arxiv.org/pdf/1903.03178v4.pdf
PWC https://paperswithcode.com/paper/transfer-learning-using-ensemble-neural-nets
Repo https://github.com/paularindam/SINet
Framework tf
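
The transfer-learning recipe, pretrain on the large DFT-computed Harvard CEP data and then fine-tune on the small HOPV data, can be sketched generically; the loaders, epoch counts, and regression-target handling below are assumptions.

```python
import torch
import torch.nn as nn

def pretrain_then_finetune(model, big_loader, small_loader, lr=1e-3):
    """Generic transfer-learning loop: train a HOMO regressor on the large
    dataset first, then continue training the same weights on the small one.
    Loaders are assumed to yield (features, homo_value) batches."""
    loss_fn = nn.MSELoss()
    for loader, epochs in [(big_loader, 5), (small_loader, 50)]:
        opt = torch.optim.Adam(model.parameters(), lr=lr)
        for _ in range(epochs):
            for x, y in loader:
                opt.zero_grad()
                loss = loss_fn(model(x).squeeze(-1), y)
                loss.backward()
                opt.step()
    return model
```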

Beyond Vector Spaces: Compact Data Representation as Differentiable Weighted Graphs

Title Beyond Vector Spaces: Compact Data Representation as Differentiable Weighted Graphs
Authors Denis Mazur, Vage Egiazarian, Stanislav Morozov, Artem Babenko
Abstract Learning useful representations is a key ingredient to the success of modern machine learning. Currently, representation learning mostly relies on embedding data into Euclidean space. However, recent work has shown that data in some domains is better modeled by non-Euclidean metric spaces, and inappropriate geometry can result in inferior performance. In this paper, we aim to eliminate the inductive bias imposed by the embedding space geometry. Namely, we propose to map data into more general non-vector metric spaces: a weighted graph with a shortest path distance. By design, such graphs can model arbitrary geometry with a proper configuration of edges and weights. Our main contribution is PRODIGE: a method that learns a weighted graph representation of data end-to-end by gradient descent. Greater generality and fewer model assumptions make PRODIGE more powerful than existing embedding-based approaches. We confirm the superiority of our method via extensive experiments on a wide range of tasks, including classification, compression, and collaborative filtering.
Tasks Representation Learning
Published 2019-10-08
URL https://arxiv.org/abs/1910.03524v4
PDF https://arxiv.org/pdf/1910.03524v4.pdf
PWC https://paperswithcode.com/paper/beyond-vector-spaces-compact-data
Repo https://github.com/stanis-morozov/prodige
Framework pytorch
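
A minimal sketch of the "weighted graph learned by gradient descent" idea: find the shortest path on detached weights, then return the sum of the learnable weights along that path so gradients reach exactly the edges used. PRODIGE additionally learns probabilistic edge existence and sparsifies the graph, which this sketch ignores.

```python
import networkx as nx
import torch

class GraphDistance(torch.nn.Module):
    """Differentiable weighted-graph distance between two nodes: the path is
    found on detached weights, and the returned distance is the sum of the
    learnable weights along that path (a simplification of PRODIGE)."""

    def __init__(self, n_nodes):
        super().__init__()
        self.raw = torch.nn.Parameter(torch.randn(n_nodes, n_nodes))

    def forward(self, i, j):
        w = torch.nn.functional.softplus(self.raw)      # positive edge weights
        g = nx.from_numpy_array(w.detach().numpy())     # dense graph on current weights
        path = nx.shortest_path(g, i, j, weight="weight")
        # gradient flows only to the edges on the chosen path
        return sum(w[a, b] for a, b in zip(path[:-1], path[1:]))
```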

Multi-Channel Attention Selection GAN with Cascaded Semantic Guidance for Cross-View Image Translation

Title Multi-Channel Attention Selection GAN with Cascaded Semantic Guidance for Cross-View Image Translation
Authors Hao Tang, Dan Xu, Nicu Sebe, Yanzhi Wang, Jason J. Corso, Yan Yan
Abstract Cross-view image translation is challenging because it involves images with drastically different views and severe deformation. In this paper, we propose a novel approach named Multi-Channel Attention SelectionGAN (SelectionGAN) that makes it possible to generate images of natural scenes in arbitrary viewpoints, based on an image of the scene and a novel semantic map. The proposed SelectionGAN explicitly utilizes the semantic information and consists of two stages. In the first stage, the condition image and the target semantic map are fed into a cycled semantic-guided generation network to produce initial coarse results. In the second stage, we refine the initial results by using a multi-channel attention selection mechanism. Moreover, uncertainty maps automatically learned from attentions are used to guide the pixel loss for better network optimization. Extensive experiments on Dayton, CVUSA and Ego2Top datasets show that our model is able to generate significantly better results than the state-of-the-art methods. The source code, data and trained models are available at https://github.com/Ha0Tang/SelectionGAN.
Tasks Bird View Synthesis, Cross-View Image-to-Image Translation, Image-to-Image Translation
Published 2019-04-15
URL http://arxiv.org/abs/1904.06807v2
PDF http://arxiv.org/pdf/1904.06807v2.pdf
PWC https://paperswithcode.com/paper/multi-channel-attention-selection-gan-with
Repo https://github.com/Ha0Tang/HandGestureRecognition
Framework none
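
The second-stage selection mechanism can be pictured as fusing several candidate generations with softmax-normalized attention maps, as in the heavily simplified sketch below (the uncertainty maps and the cascaded semantic guidance are omitted).

```python
import torch

def attention_selection(candidates, attention_logits):
    """Fuse N candidate generations pixel-wise with softmax attention maps.

    candidates:       (B, N, 3, H, W) intermediate generated images
    attention_logits: (B, N, 1, H, W) one attention map per candidate
    """
    attn = torch.softmax(attention_logits, dim=1)   # normalize across the N channels
    return (attn * candidates).sum(dim=1)           # (B, 3, H, W) fused output
```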

Double Reinforcement Learning for Efficient Off-Policy Evaluation in Markov Decision Processes

Title Double Reinforcement Learning for Efficient Off-Policy Evaluation in Markov Decision Processes
Authors Nathan Kallus, Masatoshi Uehara
Abstract Off-policy evaluation (OPE) in reinforcement learning allows one to evaluate novel decision policies without needing to conduct exploration, which is often costly or otherwise infeasible. We consider for the first time the semiparametric efficiency limits of OPE in Markov decision processes (MDPs), where actions, rewards, and states are memoryless. We show existing OPE estimators may fail to be efficient in this setting. We develop a new estimator based on cross-fold estimation of $q$-functions and marginalized density ratios, which we term double reinforcement learning (DRL). We show that DRL is efficient when both components are estimated at fourth-root rates and is also doubly robust when only one component is consistent. We investigate these properties empirically and demonstrate the performance benefits due to harnessing memorylessness.
Tasks
Published 2019-08-22
URL https://arxiv.org/abs/1908.08526v2
PDF https://arxiv.org/pdf/1908.08526v2.pdf
PWC https://paperswithcode.com/paper/double-reinforcement-learning-for-efficient
Repo https://github.com/CausalML/DoubleReinforcementLearningMDP
Framework none
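
A hedged sketch of a doubly-robust-style estimate assembled from the three ingredients the abstract names (a q-function, its induced state value, and a marginalized density ratio); the exact DRL estimator and its cross-fold fitting procedure are in the paper, and this is only a rough illustration.

```python
def drl_estimate(trajs, q, v, ratio, gamma=0.99):
    """Rough doubly-robust-style off-policy value estimate.

    trajs: list of trajectories, each a list of (s, a, r, s_next) tuples
    q:     callable q(s, a), fitted action-value function (stand-in)
    v:     callable v(s), its induced state value under the target policy
    ratio: callable ratio(s, a), marginalized density ratio (stand-in)
    """
    total = 0.0
    for traj in trajs:
        s0 = traj[0][0]
        est = v(s0)                        # model-based baseline
        for t, (s, a, r, s_next) in enumerate(traj):
            correction = r + gamma * v(s_next) - q(s, a)
            est += (gamma ** t) * ratio(s, a) * correction
        total += est
    return total / len(trajs)
```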