February 1, 2020

3315 words 16 mins read

Paper Group AWR 261


A Survey on Document-level Machine Translation: Methods and Evaluation

Title A Survey on Document-level Machine Translation: Methods and Evaluation
Authors Sameen Maruf, Fahimeh Saleh, Gholamreza Haffari
Abstract Machine translation (MT) is an important task in natural language processing (NLP) as it automates the translation process and reduces the reliance on human translators. With the advent of neural networks, the translation quality surpasses that of the translations obtained using statistical techniques. Up until three years ago, all neural translation models translated sentences independently, without incorporating any extra-sentential information. The aim of this paper is to highlight the major works that have been undertaken in the space of document-level machine translation before and after the neural revolution so that researchers can recognise where we started from and which direction we are heading in. When talking about the literature in statistical machine translation (SMT), we focus on works which have tried to improve the translation of specific discourse phenomena, while in neural machine translation (NMT), we focus on works which use the wider context explicitly. In addition to this, we also cover the evaluation strategies that have been introduced to account for the improvements in this domain.
Tasks Machine Translation
Published 2019-12-18
URL https://arxiv.org/abs/1912.08494v1
PDF https://arxiv.org/pdf/1912.08494v1.pdf
PWC https://paperswithcode.com/paper/a-survey-on-document-level-machine
Repo https://github.com/SFFAI-AIKT/AIKT-Natural_Language_Processing
Framework none

Single-Channel Signal Separation and Deconvolution with Generative Adversarial Networks

Title Single-Channel Signal Separation and Deconvolution with Generative Adversarial Networks
Authors Qiuqiang Kong, Yong Xu, Wenwu Wang, Philip J. B. Jackson, Mark D. Plumbley
Abstract Single-channel signal separation and deconvolution aims to separate and deconvolve individual sources from a single-channel mixture and is a challenging problem in which no prior knowledge of the mixing filters is available. Both the individual sources and the mixing filters need to be estimated. In addition, a mixture may contain non-stationary noise which is unseen in the training set. We propose a synthesizing-decomposition (S-D) approach to solve the single-channel separation and deconvolution problem. In synthesizing, a generative model for sources is built using a generative adversarial network (GAN). In decomposition, both mixing filters and sources are optimized to minimize the reconstruction error of the mixture. The proposed S-D approach achieves a peak signal-to-noise ratio (PSNR) of 18.9 dB and 15.4 dB in image inpainting and completion, outperforming a convolutional neural network baseline with a PSNR of 15.3 dB and 12.2 dB, respectively, and achieves a PSNR of 13.2 dB in source separation together with deconvolution, outperforming a convolutive non-negative matrix factorization (NMF) baseline of 10.1 dB.
Tasks Image Inpainting
Published 2019-06-14
URL https://arxiv.org/abs/1906.07552v1
PDF https://arxiv.org/pdf/1906.07552v1.pdf
PWC https://paperswithcode.com/paper/single-channel-signal-separation-and
Repo https://github.com/qiuqiangkong/gan_separation_deconvolution
Framework pytorch
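
A minimal sketch of the decomposition stage described above, assuming a pre-trained 1-D source generator `G` (e.g. a DCGAN-style decoder) is already available; the filter length, learning rate and iteration count are illustrative choices, not the paper's settings.

```python
# Decomposition stage of the S-D approach (simplified, single source):
# jointly optimize a latent code and a mixing filter to reconstruct the mixture.
import torch
import torch.nn.functional as F

def decompose(mixture, G, latent_dim=100, filter_len=64, steps=500, lr=1e-2):
    """Estimate a source latent code and a mixing filter from a 1-D mixture (1, 1, T)."""
    for p in G.parameters():                                    # keep the generator fixed
        p.requires_grad_(False)
    z = torch.randn(1, latent_dim, requires_grad=True)          # source latent code
    h = torch.randn(1, 1, filter_len, requires_grad=True)       # mixing filter
    opt = torch.optim.Adam([z, h], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        source = G(z)                                            # (1, 1, T) synthesized source
        recon = F.conv1d(source, h, padding=filter_len // 2)     # convolve with mixing filter
        recon = recon[..., :mixture.shape[-1]]                   # match lengths
        loss = F.mse_loss(recon, mixture)                        # mixture reconstruction error
        loss.backward()
        opt.step()
    return G(z).detach(), h.detach()
```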

Exploiting BERT for End-to-End Aspect-based Sentiment Analysis

Title Exploiting BERT for End-to-End Aspect-based Sentiment Analysis
Authors Xin Li, Lidong Bing, Wenxuan Zhang, Wai Lam
Abstract In this paper, we investigate the modeling power of contextualized embeddings from pre-trained language models, e.g. BERT, on the E2E-ABSA task. Specifically, we build a series of simple yet insightful neural baselines to deal with E2E-ABSA. The experimental results show that even with a simple linear classification layer, our BERT-based architecture can outperform state-of-the-art works. Besides, we also standardize the comparative study by consistently utilizing a hold-out validation dataset for model selection, which is largely ignored by previous works. Therefore, our work can serve as a BERT-based benchmark for E2E-ABSA.
Tasks Aspect-Based Sentiment Analysis, Model Selection, Sentiment Analysis
Published 2019-10-02
URL https://arxiv.org/abs/1910.00883v2
PDF https://arxiv.org/pdf/1910.00883v2.pdf
PWC https://paperswithcode.com/paper/exploiting-bert-for-end-to-end-aspect-based
Repo https://github.com/lixin4ever/BERT-E2E-ABSA
Framework pytorch
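
A minimal sketch of the kind of BERT-plus-linear-layer baseline described above, treating E2E-ABSA as token-level tagging. The label scheme (a hypothetical B/I/O x sentiment set) and model name are assumptions made for illustration; the authors' repo is the reference implementation.

```python
# BERT encoder with a simple linear classification layer over tokens.
import torch.nn as nn
from transformers import BertModel, BertTokenizerFast

LABELS = ["O", "B-POS", "I-POS", "B-NEG", "I-NEG", "B-NEU", "I-NEU"]

class BertTagger(nn.Module):
    def __init__(self, model_name="bert-base-uncased", num_labels=len(LABELS)):
        super().__init__()
        self.bert = BertModel.from_pretrained(model_name)
        self.classifier = nn.Linear(self.bert.config.hidden_size, num_labels)

    def forward(self, input_ids, attention_mask):
        hidden = self.bert(input_ids=input_ids,
                           attention_mask=attention_mask).last_hidden_state
        return self.classifier(hidden)            # (batch, seq_len, num_labels)

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
enc = tokenizer("The sushi was great but the service was slow.", return_tensors="pt")
logits = BertTagger()(enc["input_ids"], enc["attention_mask"])
tags = [LABELS[i] for i in logits.argmax(-1)[0].tolist()]
```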

Model Development Process

Title Model Development Process
Authors Przemyslaw Biecek
Abstract Predictive modeling has an increasing number of applications in various fields. The high demand for predictive models drives the creation of tools that automate and support the work of data scientists during model development. To better understand what can be automated, we first need a description of the model life-cycle. In this paper we propose a generic Model Development Process (MDP). This process is inspired by the Rational Unified Process (RUP), which was designed for software development. There are other approaches to process description, such as CRISP-DM or ASUM-DM; in this paper we discuss the similarities and differences between these methodologies. We believe that the proposed open standard for model development will facilitate the creation of tools for automating model training, testing and maintenance.
Tasks
Published 2019-07-09
URL https://arxiv.org/abs/1907.04461v1
PDF https://arxiv.org/pdf/1907.04461v1.pdf
PWC https://paperswithcode.com/paper/model-development-process
Repo https://github.com/ModelOriented/ModelDevelopmentProcess
Framework none

Unsupervised Out-of-Distribution Detection by Maximum Classifier Discrepancy

Title Unsupervised Out-of-Distribution Detection by Maximum Classifier Discrepancy
Authors Qing Yu, Kiyoharu Aizawa
Abstract Since deep learning models have been implemented in many commercial applications, it is important to detect out-of-distribution (OOD) inputs correctly to maintain the performance of the models, ensure the quality of the collected data, and prevent the applications from being used for other-than-intended purposes. In this work, we propose a two-head deep convolutional neural network (CNN) and maximize the discrepancy between the two classifiers to detect OOD inputs. We train a two-head CNN consisting of one common feature extractor and two classifiers which have different decision boundaries but can classify in-distribution (ID) samples correctly. Unlike previous methods, we also utilize unlabeled data for unsupervised training, using it to maximize the discrepancy between the decision boundaries of the two classifiers so that OOD samples are pushed outside the manifold of the ID samples; this enables us to detect OOD samples that are far from the support of the ID samples. Overall, our approach significantly outperforms other state-of-the-art methods on several OOD detection benchmarks and two cases of real-world simulation.
Tasks Out-of-Distribution Detection
Published 2019-08-14
URL https://arxiv.org/abs/1908.04951v1
PDF https://arxiv.org/pdf/1908.04951v1.pdf
PWC https://paperswithcode.com/paper/unsupervised-out-of-distribution-detection-by
Repo https://github.com/Mephisto405/Unsupervised-Out-of-Distribution-Detection-by-Maximum-Classifier-Discrepancy
Framework pytorch
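
A minimal sketch of the two-head discrepancy idea described above: a shared feature extractor with two classifier heads, trained to agree on labelled in-distribution data while their disagreement is maximised on unlabelled data. The architecture sizes and the exact discrepancy definition (L1 between softmax outputs) are illustrative assumptions.

```python
import torch.nn as nn
import torch.nn.functional as F

class TwoHeadNet(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.head1 = nn.Linear(32, num_classes)
        self.head2 = nn.Linear(32, num_classes)

    def forward(self, x):
        f = self.features(x)
        return self.head1(f), self.head2(f)

def discrepancy(p1, p2):
    # L1 distance between the two heads' softmax outputs
    return (F.softmax(p1, dim=1) - F.softmax(p2, dim=1)).abs().mean()

def training_losses(model, x_labeled, y, x_unlabeled):
    p1, p2 = model(x_labeled)
    sup = F.cross_entropy(p1, y) + F.cross_entropy(p2, y)   # agree on ID data
    q1, q2 = model(x_unlabeled)
    unsup = -discrepancy(q1, q2)                            # maximise disagreement on unlabelled data
    return sup, unsup

# At test time, a large discrepancy between the two heads flags a likely OOD input.
```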

Semantically Tied Paired Cycle Consistency for Zero-Shot Sketch-based Image Retrieval

Title Semantically Tied Paired Cycle Consistency for Zero-Shot Sketch-based Image Retrieval
Authors Anjan Dutta, Zeynep Akata
Abstract Zero-shot sketch-based image retrieval (SBIR) is an emerging task in computer vision, allowing the retrieval of natural images relevant to sketch queries from categories that might not have been seen in the training phase. Existing works either require aligned sketch-image pairs or an inefficient memory fusion layer for mapping the visual information to a semantic space. In this work, we propose a semantically aligned paired cycle-consistent generative (SEM-PCYC) model for zero-shot SBIR, where each branch maps the visual information to a common semantic space via adversarial training. Each of these branches maintains a cycle consistency that only requires supervision at the category level, avoiding the need for costly aligned sketch-image pairs. A classification criterion on the generators’ outputs ensures that the visual-to-semantic mapping is discriminative. Furthermore, we propose to combine textual and hierarchical side information via a feature selection auto-encoder that selects discriminative side information within the same end-to-end model. Our results demonstrate a significant boost in zero-shot SBIR performance over the state of the art on the challenging Sketchy and TU-Berlin datasets.
Tasks Feature Selection, Image Retrieval, Sketch-Based Image Retrieval
Published 2019-03-08
URL http://arxiv.org/abs/1903.03372v1
PDF http://arxiv.org/pdf/1903.03372v1.pdf
PWC https://paperswithcode.com/paper/semantically-tied-paired-cycle-consistency
Repo https://github.com/AnjanDutta/sem-pcyc
Framework pytorch
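
A heavily simplified sketch of one branch of the cycle-consistent mapping described above: visual features are mapped to a semantic embedding, decoded back, and constrained by a cycle (reconstruction) loss plus a category-level classification loss. Feature and embedding sizes are placeholders; the adversarial losses and the side-information auto-encoder are omitted.

```python
import torch.nn as nn
import torch.nn.functional as F

class Branch(nn.Module):
    def __init__(self, vis_dim=512, sem_dim=300, num_classes=125):
        super().__init__()
        self.to_sem = nn.Sequential(nn.Linear(vis_dim, sem_dim), nn.ReLU(),
                                    nn.Linear(sem_dim, sem_dim))
        self.to_vis = nn.Sequential(nn.Linear(sem_dim, sem_dim), nn.ReLU(),
                                    nn.Linear(sem_dim, vis_dim))
        self.classifier = nn.Linear(sem_dim, num_classes)

    def forward(self, v, labels):
        s = self.to_sem(v)                        # visual -> common semantic space
        cycle = F.l1_loss(self.to_vis(s), v)      # cycle consistency, category-level supervision only
        cls = F.cross_entropy(self.classifier(s), labels)
        return s, cycle + cls
```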

DAVE: A Deep Audio-Visual Embedding for Dynamic Saliency Prediction

Title DAVE: A Deep Audio-Visual Embedding for Dynamic Saliency Prediction
Authors Hamed R. Tavakoli, Ali Borji, Esa Rahtu, Juho Kannala
Abstract This paper studies audio-visual deep saliency prediction. It introduces a conceptually simple and effective Deep Audio-Visual Embedding for dynamic saliency prediction, dubbed “DAVE”, in conjunction with our efforts towards building an Audio-Visual Eye-tracking corpus named “AVE”. Despite the strong relation between auditory and visual cues for guiding gaze during perception, video saliency models only consider visual cues and neglect the auditory information that is ubiquitous in dynamic scenes. Here, we investigate the applicability of audio cues in conjunction with visual ones in predicting saliency maps using deep neural networks. To this end, the proposed model is intentionally designed to be simple. Two baseline models are developed on the same architecture, which consists of an encoder-decoder: the encoder projects the input into a feature space, followed by a decoder that infers saliency. We conduct an extensive analysis of different modalities and various aspects of multi-modal dynamic saliency prediction. Our results suggest that (1) audio is a strong contributing cue for saliency prediction, (2) a salient, visible sound source is the natural cause of the superiority of our Audio-Visual model, (3) richer feature representations for the input space lead to more powerful predictions even in the absence of more sophisticated saliency decoders, and (4) the Audio-Visual model improves over the best Visual model (our baseline) on 53.54% of the frames. Our endeavour demonstrates that audio is an important cue that boosts dynamic video saliency prediction and helps models approach human performance. The code is available at https://github.com/hrtavakoli/DAVE
Tasks Eye Tracking, Saliency Prediction
Published 2019-05-25
URL https://arxiv.org/abs/1905.10693v2
PDF https://arxiv.org/pdf/1905.10693v2.pdf
PWC https://paperswithcode.com/paper/dave-a-deep-audio-visual-embedding-for
Repo https://github.com/hrtavakoli/DAVE
Framework none
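
A minimal sketch of the encoder-decoder structure described above: separate video and audio encoders whose features are fused and decoded into a saliency map. The small 3D/1D conv encoders, channel sizes and fusion-by-concatenation are illustrative assumptions, not the exact DAVE architecture.

```python
import torch
import torch.nn as nn

class AudioVisualSaliency(nn.Module):
    def __init__(self):
        super().__init__()
        self.video_enc = nn.Sequential(nn.Conv3d(3, 32, 3, padding=1), nn.ReLU(),
                                       nn.AdaptiveAvgPool3d((1, 28, 28)))
        self.audio_enc = nn.Sequential(nn.Conv1d(1, 32, 9, padding=4), nn.ReLU(),
                                       nn.AdaptiveAvgPool1d(1))
        self.decoder = nn.Sequential(nn.Conv2d(64, 32, 3, padding=1), nn.ReLU(),
                                     nn.Conv2d(32, 1, 1), nn.Sigmoid())

    def forward(self, frames, waveform):
        # frames: (B, 3, T, H, W), waveform: (B, 1, L)
        v = self.video_enc(frames).squeeze(2)                      # (B, 32, 28, 28) video features
        a = self.audio_enc(waveform)                               # (B, 32, 1) audio features
        a = a.unsqueeze(-1).expand(-1, -1, v.size(2), v.size(3))   # broadcast audio over space
        return self.decoder(torch.cat([v, a], dim=1))              # (B, 1, 28, 28) saliency map
```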

How is Gaze Influenced by Image Transformations? Dataset and Model

Title How is Gaze Influenced by Image Transformations? Dataset and Model
Authors Zhaohui Che, Ali Borji, Guangtao Zhai, Xiongkuo Min, Guodong Guo, Patrick Le Callet
Abstract Data size is the bottleneck for developing deep saliency models, because collecting eye-movement data is very time-consuming and expensive. Most current studies on human attention and saliency modeling have used high-quality stereotype stimuli. In the real world, however, captured images undergo various types of transformations. Can we use these transformations to augment existing saliency datasets? Here, we first create a novel saliency dataset including fixations of 10 observers over 1900 images degraded by 19 types of transformations. Second, by analyzing eye movements, we find that observers look at different locations over transformed versus original images. Third, we utilize the new data over transformed images, called data augmentation transformation (DAT), to train deep saliency models. We find that label-preserving DATs with negligible impact on human gaze boost saliency prediction, whereas some other DATs that severely impact human gaze degrade the performance. These label-preserving valid augmentation transformations provide a solution to enlarge existing saliency datasets. Finally, we introduce a novel saliency model based on a generative adversarial network (dubbed GazeGAN). A modified U-Net is proposed as the generator of GazeGAN, which combines classic skip connections with a novel center-surround connection (CSC) in order to leverage multi-level features. We also propose a histogram loss based on the Alternative Chi-Square Distance (ACS HistLoss) to refine the saliency map in terms of luminance distribution. Extensive experiments and comparisons over 3 datasets indicate that GazeGAN achieves the best performance in terms of popular saliency evaluation metrics and is more robust to various perturbations. Our code and data are available at: https://github.com/CZHQuality/Sal-CFS-GAN.
Tasks Data Augmentation, Saliency Prediction
Published 2019-05-16
URL https://arxiv.org/abs/1905.06803v4
PDF https://arxiv.org/pdf/1905.06803v4.pdf
PWC https://paperswithcode.com/paper/leverage-eye-movement-data-for-saliency
Repo https://github.com/CZHQuality/Sal-CFS-GAN
Framework tf
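
A minimal sketch of a chi-square-style histogram loss in the spirit of the ACS HistLoss described above, comparing the luminance distributions of a predicted and a ground-truth saliency map (values assumed in [0, 1]). The soft Gaussian binning, bin count and bandwidth are assumptions made to keep the loss differentiable; the paper's exact formulation may differ.

```python
import torch

def soft_histogram(x, bins=64, sigma=0.02):
    # differentiable histogram of values in [0, 1] via Gaussian soft binning
    centers = torch.linspace(0.0, 1.0, bins, device=x.device)
    weights = torch.exp(-0.5 * ((x.reshape(-1, 1) - centers) / sigma) ** 2)
    hist = weights.sum(dim=0)
    return hist / (hist.sum() + 1e-8)

def acs_hist_loss(pred, target, bins=64):
    hp = soft_histogram(pred, bins)
    ht = soft_histogram(target, bins)
    # chi-square-style distance between the two normalised histograms
    return 2.0 * torch.sum((hp - ht) ** 2 / (hp + ht + 1e-8))
```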

Simulation and Augmentation of Social Networks for Building Deep Learning Models

Title Simulation and Augmentation of Social Networks for Building Deep Learning Models
Authors Akanda Wahid -Ul- Ashraf, Marcin Budka, Katarzyna Musial
Abstract A limitation of Graph Convolutional Networks (GCNs) is the assumption that, at a particular $l^{th}$ layer of the model, only the $l^{th}$-order neighbourhood of a node in the social network is influential. Furthermore, the GCN has been evaluated on citation and knowledge graphs, but not extensively on friendship-based social graphs. The drawback associated with the dependency between layers and the order of the node neighbourhood can be more prevalent for friendship-based graphs. Evaluating the full potential of the GCN on friendship-based social networks requires openly available datasets in larger quantities. However, most available social network datasets are not complete. Also, the majority of the available social network datasets do not contain both features and ground-truth labels. In this work, firstly, we provide a guideline on simulating dynamic social networks, with ground-truth labels and features, both coupled with the topology. Secondly, we introduce an open-source Python-based simulation library. We argue that the topology of the network is driven by a set of latent variables, termed the social DNA (sDNA). We consider the sDNA as labels for the nodes. Finally, by evaluating on our simulated datasets, we propose four new variants of the GCN, mainly to overcome the limitation of the dependency between the order of the node neighbourhood and a particular layer of the model. We then evaluate the performance of all the models, and our results show that on 27 out of the 30 simulated datasets our proposed GCN variants outperform the original model.
Tasks Knowledge Graphs
Published 2019-05-22
URL https://arxiv.org/abs/1905.09087v3
PDF https://arxiv.org/pdf/1905.09087v3.pdf
PWC https://paperswithcode.com/paper/simulation-and-augmentation-of-social
Repo https://github.com/AkandaAshraf/VirtualSoc
Framework none
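
A minimal sketch of the standard GCN propagation rule that the layer-versus-neighbourhood-order limitation above refers to: each layer aggregates only one hop, so covering the l-th order neighbourhood requires l stacked layers. The two-layer depth and dimensions are illustrative; the proposed variants aim to relax this coupling.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def normalize_adjacency(adj):
    # A_hat = D^{-1/2} (A + I) D^{-1/2}
    a = adj + torch.eye(adj.size(0))
    d_inv_sqrt = a.sum(dim=1).pow(-0.5)
    return d_inv_sqrt.unsqueeze(1) * a * d_inv_sqrt.unsqueeze(0)

class TwoLayerGCN(nn.Module):
    def __init__(self, in_dim, hidden, num_classes):
        super().__init__()
        self.w1 = nn.Linear(in_dim, hidden)
        self.w2 = nn.Linear(hidden, num_classes)

    def forward(self, x, adj):
        a_hat = normalize_adjacency(adj)
        h = F.relu(a_hat @ self.w1(x))     # layer 1: first-order neighbourhood
        return a_hat @ self.w2(h)          # layer 2: second-order neighbourhood
```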

Shearlets as Feature Extractor for Semantic Edge Detection: The Model-Based and Data-Driven Realm

Title Shearlets as Feature Extractor for Semantic Edge Detection: The Model-Based and Data-Driven Realm
Authors Héctor Andrade-Loarca, Gitta Kutyniok, Ozan Öktem
Abstract Semantic edge detection has recently gained a lot of attention as an image processing task, mainly due to its wide range of real-world applications. This is based on the fact that edges in images contain most of the semantic information. Semantic edge detection involves two tasks, namely pure edge detection and edge classification. Those are in fact fundamentally distinct in terms of the level of abstraction that each task requires, which is known as the distracted supervision paradox and limits the possible performance of a supervised model in semantic edge detection. In this work, we present a novel hybrid method that avoids the distracted supervision paradox and achieves high performance in semantic edge detection. Our approach is based on a combination of the model-based concept of shearlets, which provides provably optimally sparse approximations of a model class of images, and the data-driven method of a suitably designed convolutional neural network. Finally, we present several applications, such as tomographic reconstruction, and show that our approach significantly outperforms former methods, thereby indicating the value of such hybrid methods for biomedical imaging.
Tasks Edge Detection
Published 2019-11-27
URL https://arxiv.org/abs/1911.12159v1
PDF https://arxiv.org/pdf/1911.12159v1.pdf
PWC https://paperswithcode.com/paper/shearlets-as-feature-extractor-for-semantic
Repo https://github.com/arsenal9971/shearlet_semantic_edge
Framework pytorch

AttentionGAN: Unpaired Image-to-Image Translation using Attention-Guided Generative Adversarial Networks

Title AttentionGAN: Unpaired Image-to-Image Translation using Attention-Guided Generative Adversarial Networks
Authors Hao Tang, Hong Liu, Dan Xu, Philip H. S. Torr, Nicu Sebe
Abstract State-of-the-art methods in unpaired image-to-image translation are capable of learning a mapping from a source domain to a target domain with unpaired image data. Though the existing methods have achieved promising results, they still produce unsatisfactory artifacts: they can convert low-level information but are limited in transforming the high-level semantics of input images. One possible reason is that generators do not have the ability to perceive the most discriminative semantic parts between the source and target domains, thus making the generated images low quality. In this paper, we propose a new Attention-Guided Generative Adversarial Network (AttentionGAN) for the unpaired image-to-image translation task. AttentionGAN can identify the most discriminative semantic objects and minimize changes to unwanted parts for semantic manipulation problems without using extra data and models. The attention-guided generators in AttentionGAN are able to produce attention masks via a built-in attention mechanism, and then fuse the generation output with the attention masks to obtain high-quality target images. Accordingly, we also design a novel attention-guided discriminator which only considers attended regions. Extensive experiments are conducted on several generative tasks, demonstrating that the proposed model is effective in generating sharper and more realistic images compared with existing competitive models. The source code for the proposed AttentionGAN is available at https://github.com/Ha0Tang/AttentionGAN.
Tasks Image-to-Image Translation
Published 2019-11-27
URL https://arxiv.org/abs/1911.11897v4
PDF https://arxiv.org/pdf/1911.11897v4.pdf
PWC https://paperswithcode.com/paper/attentiongan-unpaired-image-to-image
Repo https://github.com/Ha0Tang/AttentionGAN
Framework pytorch
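
A minimal sketch of the attention-guided fusion described above: the generator emits a content image and an attention mask, and the translated output keeps unattended regions of the input unchanged. A single foreground mask is used here for brevity; the actual AttentionGAN uses several masks and a more elaborate generator and discriminator.

```python
import torch.nn as nn

class AttentionGenerator(nn.Module):
    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(nn.Conv2d(3, 64, 7, padding=3), nn.ReLU(),
                                      nn.Conv2d(64, 64, 3, padding=1), nn.ReLU())
        self.content_head = nn.Sequential(nn.Conv2d(64, 3, 7, padding=3), nn.Tanh())
        self.attention_head = nn.Sequential(nn.Conv2d(64, 1, 7, padding=3), nn.Sigmoid())

    def forward(self, x):
        feats = self.backbone(x)
        content = self.content_head(feats)        # candidate translated image
        attention = self.attention_head(feats)    # where the translation should apply
        # fuse: edit attended regions, copy the rest of the input through unchanged
        return attention * content + (1.0 - attention) * x, attention
```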

Neural Decipherment via Minimum-Cost Flow: from Ugaritic to Linear B

Title Neural Decipherment via Minimum-Cost Flow: from Ugaritic to Linear B
Authors Jiaming Luo, Yuan Cao, Regina Barzilay
Abstract In this paper we propose a novel neural approach for automatic decipherment of lost languages. To compensate for the lack of strong supervision signal, our model design is informed by patterns in language change documented in historical linguistics. The model utilizes an expressive sequence-to-sequence model to capture character-level correspondences between cognates. To effectively train the model in an unsupervised manner, we innovate the training procedure by formalizing it as a minimum-cost flow problem. When applied to the decipherment of Ugaritic, we achieve a 5.5% absolute improvement over state-of-the-art results. We also report the first automatic results in deciphering Linear B, a syllabic language related to ancient Greek, where our model correctly translates 67.3% of cognates.
Tasks
Published 2019-06-16
URL https://arxiv.org/abs/1906.06718v1
PDF https://arxiv.org/pdf/1906.06718v1.pdf
PWC https://paperswithcode.com/paper/neural-decipherment-via-minimum-cost-flow
Repo https://github.com/j-luo93/NeuroDecipher
Framework pytorch
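
A minimal sketch of the flow-based matching step described above, reduced to a one-to-one assignment for brevity: given pairwise costs between lost-language words and candidate cognates (e.g. derived from a seq2seq model's character-level scores), pick the minimum-cost matching. SciPy's Hungarian solver stands in here for the paper's more general minimum-cost flow formulation; the vocabularies and cost matrix are toy placeholders.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_cognates(costs, lost_vocab, known_vocab):
    """costs[i, j] = estimated cost of pairing lost word i with known word j."""
    rows, cols = linear_sum_assignment(costs)     # minimum-cost one-to-one matching
    return [(lost_vocab[i], known_vocab[j]) for i, j in zip(rows, cols)]

# Toy usage with a hypothetical 3x3 cost matrix.
costs = np.array([[0.1, 0.9, 0.8],
                  [0.7, 0.2, 0.9],
                  [0.8, 0.6, 0.3]])
print(match_cognates(costs, ["u1", "u2", "u3"], ["g1", "g2", "g3"]))
```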

HowTo100M: Learning a Text-Video Embedding by Watching Hundred Million Narrated Video Clips

Title HowTo100M: Learning a Text-Video Embedding by Watching Hundred Million Narrated Video Clips
Authors Antoine Miech, Dimitri Zhukov, Jean-Baptiste Alayrac, Makarand Tapaswi, Ivan Laptev, Josef Sivic
Abstract Learning text-video embeddings usually requires a dataset of video clips with manually provided captions. However, such datasets are expensive and time-consuming to create and therefore difficult to obtain on a large scale. In this work, we propose instead to learn such embeddings from video data with readily available natural language annotations in the form of automatically transcribed narrations. The contributions of this work are three-fold. First, we introduce HowTo100M: a large-scale dataset of 136 million video clips sourced from 1.22M narrated instructional web videos depicting humans performing and describing over 23k different visual tasks. Our data collection procedure is fast, scalable and does not require any additional manual annotation. Second, we demonstrate that a text-video embedding trained on this data leads to state-of-the-art results for text-to-video retrieval and action localization on instructional video datasets such as YouCook2 or CrossTask. Finally, we show that this embedding transfers well to other domains: fine-tuning on generic YouTube videos (MSR-VTT dataset) and movies (LSMDC dataset) outperforms models trained on these datasets alone. Our dataset, code and models will be publicly available at: www.di.ens.fr/willow/research/howto100m/.
Tasks Action Localization, Video Retrieval
Published 2019-06-07
URL https://arxiv.org/abs/1906.03327v2
PDF https://arxiv.org/pdf/1906.03327v2.pdf
PWC https://paperswithcode.com/paper/howto100m-learning-a-text-video-embedding-by
Repo https://github.com/antoine77340/S3D_HowTo100M
Framework pytorch
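
A minimal sketch of a joint text-video embedding of the kind described above: two small projection heads map pre-extracted video and narration features into a shared space, trained with a max-margin ranking loss over in-batch negatives. The feature dimensions and margin are illustrative assumptions; the released models use stronger feature extractors and gated projections.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class JointEmbedding(nn.Module):
    def __init__(self, video_dim=4096, text_dim=300, embed_dim=512):
        super().__init__()
        self.video_proj = nn.Linear(video_dim, embed_dim)
        self.text_proj = nn.Linear(text_dim, embed_dim)

    def forward(self, video_feats, text_feats):
        v = F.normalize(self.video_proj(video_feats), dim=1)
        t = F.normalize(self.text_proj(text_feats), dim=1)
        return v, t

def max_margin_loss(v, t, margin=0.2):
    sims = v @ t.t()                               # (B, B) similarity matrix
    pos = sims.diag().unsqueeze(1)                 # similarities of matched clip/caption pairs
    cost = (margin + sims - pos).clamp(min=0) \
         + (margin + sims - pos.t()).clamp(min=0)  # hinge in both retrieval directions
    off_diag = 1.0 - torch.eye(sims.size(0), device=sims.device)
    return (cost * off_diag).mean()                # ignore the positive pairs themselves
```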

Latent ODEs for Irregularly-Sampled Time Series

Title Latent ODEs for Irregularly-Sampled Time Series
Authors Yulia Rubanova, Ricky T. Q. Chen, David Duvenaud
Abstract Time series with non-uniform intervals occur in many applications, and are difficult to model using standard recurrent neural networks (RNNs). We generalize RNNs to have continuous-time hidden dynamics defined by ordinary differential equations (ODEs), a model we call ODE-RNNs. Furthermore, we use ODE-RNNs to replace the recognition network of the recently-proposed Latent ODE model. Both ODE-RNNs and Latent ODEs can naturally handle arbitrary time gaps between observations, and can explicitly model the probability of observation times using Poisson processes. We show experimentally that these ODE-based models outperform their RNN-based counterparts on irregularly-sampled data.
Tasks Multivariate Time Series Forecasting, Multivariate Time Series Imputation, Time Series, Time Series Classification
Published 2019-07-08
URL https://arxiv.org/abs/1907.03907v1
PDF https://arxiv.org/pdf/1907.03907v1.pdf
PWC https://paperswithcode.com/paper/latent-odes-for-irregularly-sampled-time
Repo https://github.com/MeetGandhi/Reconstruction-of-Trajectory-recorded-with-Missing-Markers
Framework tf
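
A minimal sketch of the ODE-RNN idea described above: between observations the hidden state evolves under a learned ODE (integrated here with a few fixed Euler steps for simplicity), and at each observation time it is updated by a GRU cell. A faithful implementation would use an adaptive solver such as torchdiffeq's odeint; the sizes and step count here are illustrative.

```python
import torch
import torch.nn as nn

class ODERNN(nn.Module):
    def __init__(self, input_dim, hidden_dim=32, euler_steps=5):
        super().__init__()
        self.ode_func = nn.Sequential(nn.Linear(hidden_dim, hidden_dim), nn.Tanh(),
                                      nn.Linear(hidden_dim, hidden_dim))
        self.gru = nn.GRUCell(input_dim, hidden_dim)
        self.euler_steps = euler_steps

    def forward(self, xs, ts):
        # xs: (seq_len, batch, input_dim), ts: (seq_len,) strictly increasing observation times
        h = xs.new_zeros(xs.size(1), self.gru.hidden_size)
        prev_t = ts[0]
        hiddens = []
        for x, t in zip(xs, ts):
            dt = (t - prev_t) / self.euler_steps
            for _ in range(self.euler_steps):      # evolve h continuously over the time gap
                h = h + dt * self.ode_func(h)
            h = self.gru(x, h)                     # discrete update at the observation
            hiddens.append(h)
            prev_t = t
        return torch.stack(hiddens)
```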

CORE: Automatic Molecule Optimization Using Copy & Refine Strategy

Title CORE: Automatic Molecule Optimization Using Copy & Refine Strategy
Authors Tianfan Fu, Cao Xiao, Jimeng Sun
Abstract Molecule optimization is about generating a molecule $Y$ with more desirable properties based on an input molecule $X$. The state-of-the-art approaches partition the molecules into a large set of substructures $S$ and grow the new molecule structure by iteratively predicting which substructure from $S$ to add. However, since the set of available substructures $S$ is large, such an iterative prediction task is often inaccurate, especially for substructures that are infrequent in the training data. To address this challenge, we propose a new generating strategy called “Copy & Refine” (CORE), where at each step the generator first decides whether to copy an existing substructure from the input $X$ or to generate a new substructure; the most promising substructure is then added to the new molecule. Combined with scaffolding tree generation and adversarial training, CORE can significantly improve several recent molecule optimization methods in various measures including drug likeness (QED), dopamine receptor (DRD2) and penalized LogP. We tested CORE and baselines using the ZINC database, and CORE obtained up to 11% and 21% relative improvement over the baselines in success rate on the complete test set and the subset with infrequent substructures, respectively.
Tasks
Published 2019-11-23
URL https://arxiv.org/abs/1912.05910v1
PDF https://arxiv.org/pdf/1912.05910v1.pdf
PWC https://paperswithcode.com/paper/core-automatic-molecule-optimization-using
Repo https://github.com/futianfan/CORE
Framework pytorch
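
A minimal sketch of the "copy or generate" decision at the heart of CORE: at each growth step a gate mixes a generation distribution over the global substructure vocabulary with a copy distribution over the substructures of the input molecule, pointer-generator style. The dimensions and dot-product attention are illustrative assumptions; the full method additionally involves scaffolding tree generation and adversarial training.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CopyRefineStep(nn.Module):
    def __init__(self, hidden_dim=128, vocab_size=800):
        super().__init__()
        self.generate = nn.Linear(hidden_dim, vocab_size)    # scores over all substructures
        self.copy_gate = nn.Linear(hidden_dim, 1)             # probability of copying from input X

    def forward(self, state, input_substructs, input_ids):
        # state: (B, H) decoder state; input_substructs: (B, N, H) encodings of X's substructures
        # input_ids: (B, N) vocabulary indices of those substructures
        gen_dist = F.softmax(self.generate(state), dim=-1)               # (B, V) generate distribution
        copy_scores = torch.einsum("bh,bnh->bn", state, input_substructs)
        copy_dist = F.softmax(copy_scores, dim=-1)                       # (B, N) copy distribution
        p_copy = torch.sigmoid(self.copy_gate(state))                    # (B, 1) copy-vs-generate gate
        # scatter the copy distribution back onto the vocabulary and mix the two
        copied = torch.zeros_like(gen_dist).scatter_add(1, input_ids, copy_dist)
        return p_copy * copied + (1 - p_copy) * gen_dist                 # (B, V) next-substructure distribution
```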