Paper Group AWR 246
Shuffle-Then-Assemble: Learning Object-Agnostic Visual Relationship Features. Car Monitoring System in Apartment Garages by Small Autonomous Car using Deep Learning. Zoom-Net: Mining Deep Feature Interactions for Visual Relationship Recognition. tempoGAN: A Temporally Coherent, Volumetric GAN for Super-resolution Fluid Flow. Towards Automated Deep Learning: Efficient Joint Neural Architecture and Hyperparameter Search. Unsupervised Image Captioning. Attention Branch Network: Learning of Attention Mechanism for Visual Explanation. Boosting Handwriting Text Recognition in Small Databases with Transfer Learning. Learning Semantic Textual Similarity from Conversations. Adversarial Domain Adaptation for Duplicate Question Detection. Infinite-Horizon Gaussian Processes. Structure-Infused Copy Mechanisms for Abstractive Summarization. A Comparative Measurement Study of Deep Learning as a Service Framework. Yes, but Did It Work?: Evaluating Variational Inference. Disfluency Detection using Auto-Correlational Neural Networks.
Shuffle-Then-Assemble: Learning Object-Agnostic Visual Relationship Features
Title | Shuffle-Then-Assemble: Learning Object-Agnostic Visual Relationship Features |
Authors | Xu Yang, Hanwang Zhang, Jianfei Cai |
Abstract | Because it is prohibitively expensive to completely annotate visual relationships, i.e., the (obj1, rel, obj2) triplets, relationship models are inevitably biased to object classes of limited pairwise patterns, leading to poor generalization to rare or unseen object combinations. Therefore, we are interested in learning object-agnostic visual features for more generalizable relationship models. By “agnostic”, we mean that the feature is less likely to be biased to the classes of the paired objects. To alleviate the bias, we propose a novel Shuffle-Then-Assemble pre-training strategy. First, we discard all the triplet relationship annotations in an image, leaving two unpaired object domains without obj1-obj2 alignment. Then, our feature learning recovers possible obj1-obj2 pairs. In particular, we design a cycle of residual transformations between the two domains to capture shared but not object-specific visual patterns. Extensive experiments on two visual relationship benchmarks show that with our pre-trained features, naive relationship models are consistently improved and even outperform other state-of-the-art relationship models. Code has been made available at https://github.com/yangxuntu/vrd. |
Tasks | |
Published | 2018-08-01 |
URL | http://arxiv.org/abs/1808.00171v1 |
http://arxiv.org/pdf/1808.00171v1.pdf | |
PWC | https://paperswithcode.com/paper/shuffle-then-assemble-learning-object |
Repo | https://github.com/yangxuntu/vrd |
Framework | tf |
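A minimal PyTorch sketch of the cycle-of-residual-transformations idea described in the abstract, assuming 512-d ROI features; the module names and dimensions are illustrative assumptions, not the authors' released implementation:

```python
import torch
import torch.nn as nn

class ResidualTransform(nn.Module):
    """Residual mapping between the two unpaired object-feature domains."""
    def __init__(self, dim=512):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, x):
        # Residual form: keep shared visual patterns, learn only the domain shift.
        return x + self.mlp(x)

f = ResidualTransform()  # obj1 domain -> obj2 domain
g = ResidualTransform()  # obj2 domain -> obj1 domain

x1 = torch.randn(32, 512)  # shuffled ROI features, domain 1 (no obj1-obj2 alignment)
x2 = torch.randn(32, 512)  # shuffled ROI features, domain 2

# Cycle objective: a feature mapped to the other domain and back should be recovered.
cycle_loss = (g(f(x1)) - x1).abs().mean() + (f(g(x2)) - x2).abs().mean()
cycle_loss.backward()
```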
Car Monitoring System in Apartment Garages by Small Autonomous Car using Deep Learning
Title | Car Monitoring System in Apartment Garages by Small Autonomous Car using Deep Learning |
Authors | Leonardo León, Felipe Moreno-Vera, Renato Castro, José Navío, Marco Capcha |
Abstract | A growing number of Peruvian families live in apartments rather than houses because of the many advantages; however, this brings problems such as theft of goods left in parking lots and the entry of strangers who use tenants' parking spaces (the latter is sometimes linked to kidnappings or robberies in apartment buildings). To address these problems, we propose a self-driving mini-car that implements a deep-learning-based license plate monitoring system for a building's underground garage, recording vehicles and identifying whether their owners are tenants. In addition, the small robot carries its own localization system based on beacons, which identifies the parking spot assigned to each tenant as the mini-car drives its route. Finally, one objective of this work is to build a low-cost mini-robot that could replace expensive cameras, or work alongside them, to keep tenants' goods safe. |
Tasks | |
Published | 2018-09-01 |
URL | https://arxiv.org/abs/1809.00251v3 |
https://arxiv.org/pdf/1809.00251v3.pdf | |
PWC | https://paperswithcode.com/paper/car-monitoring-system-in-apartment-garages-by |
Repo | https://github.com/renatocastro33/Car-Monitoring-System-in-Apartment-Garages-by-Small-Autonomous-Car-using-Deep-Learning |
Framework | none |
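The abstract does not specify the beacon model; as a hedged illustration of how a nearest-beacon localization step can work, the sketch below uses a standard log-distance path-loss estimate. All constants, RSSI values, and slot names are hypothetical:

```python
def beacon_distance(rssi_dbm, tx_power_dbm=-59, path_loss_n=2.0):
    """Log-distance path-loss estimate in meters; constants are illustrative."""
    return 10 ** ((tx_power_dbm - rssi_dbm) / (10 * path_loss_n))

# The mini-car could tag each plate reading with the nearest beacon's parking slot.
readings = {"slot_12": -63, "slot_13": -78}  # hypothetical RSSI (dBm) per beacon
nearest = min(readings, key=lambda b: beacon_distance(readings[b]))
print(nearest)  # -> slot_12
```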
Zoom-Net: Mining Deep Feature Interactions for Visual Relationship Recognition
Title | Zoom-Net: Mining Deep Feature Interactions for Visual Relationship Recognition |
Authors | Guojun Yin, Lu Sheng, Bin Liu, Nenghai Yu, Xiaogang Wang, Jing Shao, Chen Change Loy |
Abstract | Recognizing visual relationships among any pair of localized objects is pivotal for image understanding. Previous studies have shown remarkable progress in exploiting linguistic priors or external textual information to improve the performance. In this work, we investigate an orthogonal perspective based on feature interactions. We show that by encouraging deep message propagation and interactions between local object features and global predicate features, one can achieve compelling performance in recognizing complex relationships without using any linguistic priors. To this end, we present two new pooling cells to encourage feature interactions: (i) a Contrastive ROI Pooling Cell, which has a unique deROI pooling that inversely pools local object features to the corresponding area of global predicate features, and (ii) a Pyramid ROI Pooling Cell, which broadcasts global predicate features to reinforce local object features. The two cells constitute a Spatiality-Context-Appearance Module (SCA-M), which can be further stacked consecutively to form our final Zoom-Net. We further shed light on how one can resolve ambiguous and noisy object and predicate annotations with Intra-Hierarchical trees (IH-trees). Extensive experiments conducted on the Visual Genome dataset demonstrate the effectiveness of our feature-oriented approach compared to state-of-the-art methods (Acc@1 11.42% from 8.16%) that depend on explicit modeling of linguistic interactions. We further show that SCA-M can be incorporated seamlessly into existing approaches to improve the performance by a large margin. The source code will be released at https://github.com/gjyin91/ZoomNet. |
Tasks | |
Published | 2018-07-13 |
URL | http://arxiv.org/abs/1807.04979v1 |
http://arxiv.org/pdf/1807.04979v1.pdf | |
PWC | https://paperswithcode.com/paper/zoom-net-mining-deep-feature-interactions-for |
Repo | https://github.com/gjyin91/ZoomNet |
Framework | none |
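A hedged sketch of the Pyramid ROI Pooling Cell's broadcast direction using torchvision's roi_align; the feature shapes, the 1/8 scale, and the additive fusion are assumptions for illustration, not the paper's exact cell:

```python
import torch
from torchvision.ops import roi_align

# Global predicate feature map for one image (C=256 on a 32x32 grid at 1/8 scale).
global_feat = torch.randn(1, 256, 32, 32)

# Subject/object boxes in image coordinates, each prefixed by its batch index.
boxes = torch.tensor([[0., 40., 40., 120., 160.],
                      [0., 140., 60., 220., 200.]])

local_feat = torch.randn(2, 256, 7, 7)  # stand-in for per-object ROI features

# Broadcast: pool the global predicate context at each object box and
# add it back onto the local object features to reinforce them.
context = roi_align(global_feat, boxes, output_size=(7, 7), spatial_scale=1 / 8)
reinforced = local_feat + context
```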
tempoGAN: A Temporally Coherent, Volumetric GAN for Super-resolution Fluid Flow
Title | tempoGAN: A Temporally Coherent, Volumetric GAN for Super-resolution Fluid Flow |
Authors | You Xie, Erik Franz, Mengyu Chu, Nils Thuerey |
Abstract | We propose a temporally coherent generative model addressing the super-resolution problem for fluid flows. Our work represents a first approach to synthesize four-dimensional physics fields with neural networks. Based on a conditional generative adversarial network that is designed for the inference of three-dimensional volumetric data, our model generates consistent and detailed results by using a novel temporal discriminator, in addition to the commonly used spatial one. Our experiments show that the generator is able to infer more realistic high-resolution details by using additional physical quantities, such as low-resolution velocities or vorticities. Besides improvements in the training process and in the generated outputs, these inputs offer means for artistic control as well. We additionally employ a physics-aware data augmentation step, which is crucial to avoid overfitting and to reduce memory requirements. In this way, our network learns to generate advected quantities with highly detailed, realistic, and temporally coherent features. Our method works instantaneously, using only a single time-step of low-resolution fluid data. We demonstrate the abilities of our method using a variety of complex inputs and applications in two and three dimensions. |
Tasks | Data Augmentation, Super-Resolution |
Published | 2018-01-29 |
URL | http://arxiv.org/abs/1801.09710v2 |
http://arxiv.org/pdf/1801.09710v2.pdf | |
PWC | https://paperswithcode.com/paper/tempogan-a-temporally-coherent-volumetric-gan |
Repo | https://github.com/thunil/tempoGAN |
Framework | tf |
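A minimal 2D sketch of the temporal-discriminator idea: realism is judged on a stack of consecutive frames rather than per frame, so flickering output is penalized. The actual model is volumetric (3D) and also conditions on the low-resolution input; the layer sizes below are illustrative only:

```python
import torch
import torch.nn as nn

class TemporalDiscriminator(nn.Module):
    """Scores three consecutive high-res frames jointly, so temporally
    incoherent (flickering) generator output is penalized."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(64, 1, 4))

    def forward(self, frames):  # frames: (B, 3, H, W) = t-1, t, t+1 as channels
        return self.net(frames)

d_t = TemporalDiscriminator()
triplet = torch.randn(4, 3, 64, 64)  # three consecutive 2D density slices
score = d_t(triplet)                 # patch scores; real/fake loss applied on top
```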
Towards Automated Deep Learning: Efficient Joint Neural Architecture and Hyperparameter Search
Title | Towards Automated Deep Learning: Efficient Joint Neural Architecture and Hyperparameter Search |
Authors | Arber Zela, Aaron Klein, Stefan Falkner, Frank Hutter |
Abstract | While existing work on neural architecture search (NAS) tunes hyperparameters in a separate post-processing step, we demonstrate that architectural choices and other hyperparameter settings interact in a way that can render this separation suboptimal. Likewise, we demonstrate that the common practice of using very few epochs during the main NAS and much larger numbers of epochs during a post-processing step is inefficient due to little correlation in the relative rankings for these two training regimes. To combat both of these problems, we propose to use a recent combination of Bayesian optimization and Hyperband for efficient joint neural architecture and hyperparameter search. |
Tasks | Neural Architecture Search |
Published | 2018-07-18 |
URL | http://arxiv.org/abs/1807.06906v1 |
http://arxiv.org/pdf/1807.06906v1.pdf | |
PWC | https://paperswithcode.com/paper/towards-automated-deep-learning-efficient |
Repo | https://github.com/arberzela/EfficientNAS |
Framework | pytorch |
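A toy sketch of the joint search idea: architectural choices and training hyperparameters live in one configuration space, and a Hyperband-style bracket prunes configurations under growing epoch budgets. The search space and the random stand-in for training are hypothetical; BOHB additionally replaces pure random sampling with a model-based proposal:

```python
import random

# Joint search space: architectural choices and training hyperparameters are
# sampled together instead of being tuned in a separate post-processing step.
def sample_config():
    return {"n_layers": random.choice([8, 14, 20]),
            "width": random.choice([32, 64, 128]),
            "lr": 10 ** random.uniform(-4, -1),
            "weight_decay": 10 ** random.uniform(-5, -2)}

def train_and_eval(cfg, epochs):
    # Stand-in for real training/validation; BOHB would also fit a density
    # model over good vs. bad configs to guide sampling instead of pure random.
    return random.random() / epochs

# One Hyperband-style bracket: evaluate many configs on a small epoch budget,
# keep the best third at each rung, and re-train them with a 3x larger budget.
configs, budget = [sample_config() for _ in range(27)], 1
while len(configs) > 1:
    ranked = sorted(configs, key=lambda c: train_and_eval(c, budget))
    configs, budget = ranked[: len(configs) // 3], budget * 3
print(configs[0])  # best joint (architecture, hyperparameter) configuration
```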
Unsupervised Image Captioning
Title | Unsupervised Image Captioning |
Authors | Yang Feng, Lin Ma, Wei Liu, Jiebo Luo |
Abstract | Deep neural networks have achieved great successes on the image captioning task. However, most of the existing models depend heavily on paired image-sentence datasets, which are very expensive to acquire. In this paper, we make the first attempt to train an image captioning model in an unsupervised manner. Instead of relying on manually labeled image-sentence pairs, our proposed model merely requires an image set, a sentence corpus, and an existing visual concept detector. The sentence corpus is used to teach the captioning model how to generate plausible sentences. Meanwhile, the knowledge in the visual concept detector is distilled into the captioning model to guide the model to recognize the visual concepts in an image. In order to further encourage the generated captions to be semantically consistent with the image, the image and caption are projected into a common latent space so that they can reconstruct each other. Given that existing sentence corpora are mainly designed for linguistic research and thus bear little reference to image content, we crawl a large-scale image description corpus of two million natural sentences to facilitate the unsupervised image captioning scenario. Experimental results show that our proposed model is able to produce quite promising results without any caption annotations. |
Tasks | Image Captioning |
Published | 2018-11-27 |
URL | http://arxiv.org/abs/1811.10787v2 |
http://arxiv.org/pdf/1811.10787v2.pdf | |
PWC | https://paperswithcode.com/paper/unsupervised-image-captioning |
Repo | https://github.com/fengyang0317/unsupervised_captioning |
Framework | tf |
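The full method combines adversarial sentence generation, concept distillation, and the shared latent space; the sketch below shows only the bidirectional-reconstruction term, with all encoders, decoders, and dimensions as stand-in assumptions:

```python
import torch
import torch.nn as nn

# All encoders/decoders and dimensions below are stand-ins for illustration.
img_enc = nn.Linear(2048, 512)  # image feature -> shared latent space
txt_enc = nn.Linear(300, 512)   # sentence feature -> shared latent space
img_dec = nn.Linear(512, 2048)  # latent -> reconstructed image feature
txt_dec = nn.Linear(512, 300)   # latent -> reconstructed sentence feature

img = torch.randn(16, 2048)     # CNN image features
txt = torch.randn(16, 300)      # pooled sentence features

# Cross-reconstruction: image and caption meet in one latent space, so each
# modality must carry enough semantics to rebuild the other.
loss = ((img_dec(txt_enc(txt)) - img) ** 2).mean() + \
       ((txt_dec(img_enc(img)) - txt) ** 2).mean()
loss.backward()
```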
Attention Branch Network: Learning of Attention Mechanism for Visual Explanation
Title | Attention Branch Network: Learning of Attention Mechanism for Visual Explanation |
Authors | Hiroshi Fukui, Tsubasa Hirakawa, Takayoshi Yamashita, Hironobu Fujiyoshi |
Abstract | Visual explanation enables humans to understand the decision making of a deep convolutional neural network (CNN), but by itself it does not improve performance. In this paper, we focus on the attention map for visual explanation, whose high-response values mark the regions important for image recognition. Attending to these regions can significantly improve a CNN's performance through an attention mechanism that focuses on specific regions of an image. In this work, we propose the Attention Branch Network (ABN), which extends a top-down visual explanation model with a branch structure that implements the attention mechanism. ABN is applicable to several image recognition tasks via this attention branch and is trainable end-to-end for both visual explanation and image recognition. We evaluate ABN on several image recognition tasks, including image classification, fine-grained recognition, and multiple facial attribute recognition. Experimental results show that ABN outperforms baseline models in accuracy on these tasks while generating an attention map for visual explanation. Our code is available at https://github.com/machine-perception-robotics-group/attention_branch_network. |
Tasks | Decision Making, Image Classification |
Published | 2018-12-25 |
URL | http://arxiv.org/abs/1812.10025v2 |
http://arxiv.org/pdf/1812.10025v2.pdf | |
PWC | https://paperswithcode.com/paper/attention-branch-network-learning-of |
Repo | https://github.com/machine-perception-robotics-group/ABN_CelebA |
Framework | none |
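A simplified PyTorch sketch of the core ABN mechanism: the attention branch emits a map that serves as the visual explanation and re-weights the features fed to the perception (classification) branch. The real network also trains the attention branch with its own classification loss, omitted here:

```python
import torch
import torch.nn as nn

class AttentionBranch(nn.Module):
    """Emits a one-channel attention map that doubles as the visual
    explanation and as a re-weighting of the downstream features."""
    def __init__(self, channels=512):
        super().__init__()
        self.conv = nn.Conv2d(channels, 1, kernel_size=1)

    def forward(self, feat):                   # feat: (B, C, H, W)
        attn = torch.sigmoid(self.conv(feat))  # attention map = the explanation
        return feat * (1 + attn), attn         # residual-style re-weighting

branch = AttentionBranch()
feat = torch.randn(2, 512, 14, 14)             # backbone feature map
weighted, attention_map = branch(feat)         # `weighted` feeds the classifier
```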
Boosting Handwriting Text Recognition in Small Databases with Transfer Learning
Title | Boosting Handwriting Text Recognition in Small Databases with Transfer Learning |
Authors | José Carlos Aradillas, Juan José Murillo-Fuentes, Pablo M. Olmos |
Abstract | In this paper we deal with the offline handwriting text recognition (HTR) problem with reduced training datasets. Recent HTR solutions based on artificial neural networks achieve remarkable results on reference databases. These deep neural networks combine convolutional (CNN) and long short-term memory (LSTM) recurrent units. In addition, connectionist temporal classification (CTC) is the key to avoiding segmentation at the character level, greatly facilitating the labeling task. One of the main drawbacks of the CNN-LSTM-CTC (CLC) solutions is that they need a considerable amount of transcribed text for every type of calligraphy, typically on the order of a few thousand lines. Furthermore, in some scenarios the text to transcribe is not that long, e.g., in the Washington database, and the CLC typically overfits for this reduced number of training samples. Our proposal is based on transfer learning (TL) of the parameters learned on a bigger database. We first investigate, for a reduced and fixed number of training samples (350 lines), how the learning from a large database, IAM, can be transferred to the learning of the CLC on a reduced database, Washington. We study which layers of the network can be left un-retrained. We conclude that the best solution is to re-train all the CLC parameters, initialized to the values obtained after training the CLC on the larger database. We also investigate results when the training size is further reduced. The difference in character error rate (CER) is most remarkable when training with just 350 lines: TL achieves a CER of 3.3%, versus 18.2% when training from scratch. As a byproduct, the learning times are greatly reduced. Similarly good results are obtained on the Parzival database when trained with this reduced number of lines and this new approach. |
Tasks | Transfer Learning |
Published | 2018-04-04 |
URL | http://arxiv.org/abs/1804.01527v1 |
http://arxiv.org/pdf/1804.01527v1.pdf | |
PWC | https://paperswithcode.com/paper/boosting-handwriting-text-recognition-in |
Repo | https://github.com/SamuelNguyen1998/Vietnamese_Handwriting_Recognition |
Framework | tf |
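A PyTorch sketch of the best-performing recipe reported above: initialize the whole model from weights trained on the larger IAM database, then re-train every layer on the small target set. The tiny stand-in model and checkpoint path are hypothetical:

```python
import torch
import torch.nn as nn

# Stand-in for the CNN-LSTM-CTC (CLC) model; the real architecture is larger.
def build_clc_model():
    return nn.Sequential(nn.Conv2d(1, 16, 3, padding=1), nn.Flatten(),
                         nn.Linear(16 * 32 * 32, 80))  # 80 = illustrative charset

# Initialize all parameters from the large-database training, then fine-tune
# everything on the small target set (e.g. 350 Washington lines); no freezing.
model = build_clc_model()
state = torch.load("clc_iam_pretrained.pt")  # hypothetical IAM checkpoint path
model.load_state_dict(state)

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)  # every layer trainable
```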
Learning Semantic Textual Similarity from Conversations
Title | Learning Semantic Textual Similarity from Conversations |
Authors | Yinfei Yang, Steve Yuan, Daniel Cer, Sheng-yi Kong, Noah Constant, Petr Pilar, Heming Ge, Yun-Hsuan Sung, Brian Strope, Ray Kurzweil |
Abstract | We present a novel approach to learn representations for sentence-level semantic similarity using conversational data. Our method trains an unsupervised model to predict conversational input-response pairs. The resulting sentence embeddings perform well on the semantic textual similarity (STS) benchmark and SemEval 2017’s Community Question Answering (CQA) question similarity subtask. Performance is further improved by introducing multitask training combining the conversational input-response prediction task and a natural language inference task. Extensive experiments show the proposed model achieves the best performance among all neural models on the STS benchmark and is competitive with the state-of-the-art feature engineered and mixed systems in both tasks. |
Tasks | Community Question Answering, Natural Language Inference, Question Answering, Question Similarity, Semantic Similarity, Semantic Textual Similarity, Sentence Embeddings |
Published | 2018-04-20 |
URL | http://arxiv.org/abs/1804.07754v1 |
http://arxiv.org/pdf/1804.07754v1.pdf | |
PWC | https://paperswithcode.com/paper/learning-semantic-textual-similarity-from |
Repo | https://github.com/nickyeolk/info_retrieve |
Framework | tf |
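A minimal sketch of the unsupervised input-response training signal with in-batch negatives; the sentence encoders are omitted and the dimensions are illustrative:

```python
import torch
import torch.nn.functional as F

# Input-response prediction with in-batch negatives: each conversational input
# should score its own response above every other response in the batch.
inputs = torch.randn(64, 512)     # encoded conversational inputs
responses = torch.randn(64, 512)  # encoded candidate responses

scores = inputs @ responses.T     # (64, 64) similarity matrix
labels = torch.arange(64)         # the diagonal holds the true pairs
loss = F.cross_entropy(scores, labels)
```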
Adversarial Domain Adaptation for Duplicate Question Detection
Title | Adversarial Domain Adaptation for Duplicate Question Detection |
Authors | Darsh J Shah, Tao Lei, Alessandro Moschitti, Salvatore Romeo, Preslav Nakov |
Abstract | We address the problem of detecting duplicate questions in forums, which is an important step towards automating the process of answering new questions. As finding and annotating such potential duplicates manually is very tedious and costly, automatic methods based on machine learning are a viable alternative. However, many forums do not have annotated data, i.e., questions labeled by experts as duplicates, and thus a promising solution is to use domain adaptation from another forum that has such annotations. Here we focus on adversarial domain adaptation, deriving important findings about when it performs well and what properties of the domains are important in this regard. Our experiments with StackExchange data show an average improvement of 5.6% over the best baseline across multiple pairs of domains. |
Tasks | Domain Adaptation, Question Similarity |
Published | 2018-09-07 |
URL | http://arxiv.org/abs/1809.02255v1 |
http://arxiv.org/pdf/1809.02255v1.pdf | |
PWC | https://paperswithcode.com/paper/adversarial-domain-adaptation-for-duplicate |
Repo | https://github.com/darsh10/qra_code |
Framework | pytorch |
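Adversarial domain adaptation of this kind is commonly implemented with a gradient reversal layer; a hedged PyTorch sketch of that mechanism follows (the paper's exact adversarial setup may differ):

```python
import torch

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass, negated gradient in the backward pass:
    the shared question encoder learns to fool the domain classifier,
    pushing it toward domain-invariant features."""
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_out):
        return -ctx.lam * grad_out, None

features = torch.randn(8, 256, requires_grad=True)  # shared encoder output
reversed_feat = GradReverse.apply(features, 1.0)    # feed to a domain classifier
```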
Infinite-Horizon Gaussian Processes
Title | Infinite-Horizon Gaussian Processes |
Authors | Arno Solin, James Hensman, Richard E. Turner |
Abstract | Gaussian processes provide a flexible framework for forecasting, removing noise, and interpreting long temporal datasets. State space modelling (Kalman filtering) enables these non-parametric models to be deployed on long datasets by reducing the complexity to linear in the number of data points. The complexity is still cubic in the state dimension $m$, which is an impediment to practical application. In certain special cases (Gaussian likelihood, regular spacing) the GP posterior will reach a steady state when the data record is very long. We leverage this and formulate an inference scheme for GPs with general likelihoods, where inference is based on single-sweep EP (assumed density filtering). The infinite-horizon model tackles the cubic cost in the state dimension $m$, reducing it to $\mathcal{O}(m^2)$ per data point. The model is extended to online learning of hyperparameters. We show examples for large finite-length modelling problems, and present how the method runs in real-time on a smartphone on a continuous data stream updated at 100 Hz. |
Tasks | Gaussian Processes |
Published | 2018-11-15 |
URL | http://arxiv.org/abs/1811.06588v1 |
http://arxiv.org/pdf/1811.06588v1.pdf | |
PWC | https://paperswithcode.com/paper/infinite-horizon-gaussian-processes |
Repo | https://github.com/AaltoML/IHGP |
Framework | none |
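A NumPy/SciPy sketch of the steady-state idea behind the infinite-horizon model: once the Kalman covariance has converged, a single Riccati solve yields a fixed gain and a cheap per-sample update. The state-space matrices below are illustrative, not a kernel from the paper:

```python
import numpy as np
from scipy.linalg import solve_discrete_are

# With a Gaussian likelihood and regular spacing, the Kalman covariance
# converges; solving the discrete algebraic Riccati equation once yields a
# fixed gain, so each update costs O(m^2) rather than O(m^3) in the state
# dimension m.
A = np.array([[1.0, 0.1], [0.0, 0.9]])  # state transition (illustrative)
H = np.array([[1.0, 0.0]])              # measurement model
Q = 0.01 * np.eye(2)                    # process noise covariance
R = np.array([[0.1]])                   # measurement noise covariance

P = solve_discrete_are(A.T, H.T, Q, R)        # steady-state predictive covariance
K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)  # fixed Kalman gain

m = np.zeros(2)
for y in np.sin(0.1 * np.arange(1000)):       # synthetic data stream
    m = A @ m                                 # predict
    m = m + K @ (y - H @ m)                   # correct, covariance-free update
```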
Structure-Infused Copy Mechanisms for Abstractive Summarization
Title | Structure-Infused Copy Mechanisms for Abstractive Summarization |
Authors | Kaiqiang Song, Lin Zhao, Fei Liu |
Abstract | Seq2seq learning has produced promising results on summarization. However, in many cases, system summaries still struggle to keep the meaning of the original intact. They may miss out important words or relations that play critical roles in the syntactic structure of source sentences. In this paper, we present structure-infused copy mechanisms to facilitate copying important words and relations from the source sentence to summary sentence. The approach naturally combines source dependency structure with the copy mechanism of an abstractive sentence summarizer. Experimental results demonstrate the effectiveness of incorporating source-side syntactic information in the system, and our proposed approach compares favorably to state-of-the-art methods. |
Tasks | Abstractive Text Summarization |
Published | 2018-06-14 |
URL | http://arxiv.org/abs/1806.05658v2 |
http://arxiv.org/pdf/1806.05658v2.pdf | |
PWC | https://paperswithcode.com/paper/structure-infused-copy-mechanisms-for |
Repo | https://github.com/KaiQiangSong/struct_infused_summ |
Framework | none |
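The copy mechanism at the heart of the approach mixes a vocabulary distribution with attention-based copying of source words; the sketch below shows the standard pointer-generator mixture with random stand-ins, leaving out the paper's structure-infused attention bias:

```python
import torch

# Final word distribution = p_gen * generate-from-vocabulary
#                         + (1 - p_gen) * copy-via-attention over source words.
vocab_size, src_len = 10000, 30
p_gen = torch.sigmoid(torch.randn(1))               # generate-vs-copy switch
p_vocab = torch.softmax(torch.randn(vocab_size), dim=0)
attn = torch.softmax(torch.randn(src_len), dim=0)   # attention over source tokens
src_ids = torch.randint(0, vocab_size, (src_len,))  # source token ids

p_final = p_gen * p_vocab
p_final = p_final.scatter_add(0, src_ids, (1 - p_gen) * attn)  # add copy mass
```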
A Comparative Measurement Study of Deep Learning as a Service Framework
Title | A Comparative Measurement Study of Deep Learning as a Service Framework |
Authors | Yanzhao Wu, Ling Liu, Calton Pu, Wenqi Cao, Semih Sahin, Wenqi Wei, Qi Zhang |
Abstract | Big-data-powered Deep Learning (DL) and its applications have blossomed in recent years, fueled by three technological trends: a large amount of openly accessible digitized data, a growing number of DL software frameworks in open-source and commercial markets, and a selection of affordable parallel computing hardware devices. However, no single DL framework, to date, dominates in terms of performance and accuracy, even for baseline classification tasks on standard datasets, making the selection of a DL framework an overwhelming task. This paper takes a holistic approach to an empirical comparison and analysis of four representative DL frameworks, with three unique contributions. First, given a selection of CPU-GPU configurations, we show that for a specific DL framework, different configurations of its hyper-parameters may have a significant impact on both the performance and accuracy of DL applications. Second, to the best of our knowledge, this study is the first to identify opportunities for improving the training-time performance and the accuracy of DL frameworks by configuring parallel computing libraries and tuning individual and multiple hyper-parameters. Third, we also conduct a comparative measurement study of the resource consumption patterns of the four DL frameworks and their performance and accuracy implications, including CPU and memory usage, and their correlations to varying settings of hyper-parameters under different combinations of hardware and parallel computing libraries. We argue that this measurement study provides an in-depth empirical comparison and analysis of four representative DL frameworks, and offers practical guidance for service providers deploying and delivering DL as a Service (DLaaS) and for application developers and DLaaS consumers selecting the right DL frameworks for the right DL workloads. |
Tasks | |
Published | 2018-10-29 |
URL | https://arxiv.org/abs/1810.12210v2 |
https://arxiv.org/pdf/1810.12210v2.pdf | |
PWC | https://paperswithcode.com/paper/a-comparative-measurement-study-of-deep |
Repo | https://github.com/git-disl/GTDLBench |
Framework | tf |
Yes, but Did It Work?: Evaluating Variational Inference
Title | Yes, but Did It Work?: Evaluating Variational Inference |
Authors | Yuling Yao, Aki Vehtari, Daniel Simpson, Andrew Gelman |
Abstract | While it’s always possible to compute a variational approximation to a posterior distribution, it can be difficult to discover problems with this approximation. We propose two diagnostic algorithms to alleviate this problem. The Pareto-smoothed importance sampling (PSIS) diagnostic gives a goodness of fit measurement for joint distributions, while simultaneously improving the error in the estimate. The variational simulation-based calibration (VSBC) assesses the average performance of point estimates. |
Tasks | Calibration |
Published | 2018-02-07 |
URL | http://arxiv.org/abs/1802.02538v2 |
http://arxiv.org/pdf/1802.02538v2.pdf | |
PWC | https://paperswithcode.com/paper/yes-but-did-it-work-evaluating-variational |
Repo | https://github.com/yao-yl/Evaluating-Variational-Inference |
Framework | none |
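A simplified sketch of the PSIS diagnostic's core step: fit a generalized Pareto distribution to the upper tail of the importance ratios and read off the shape estimate k-hat. The real diagnostic uses the Zhang-Stephens estimator rather than scipy's generic fit, and the densities below are random stand-ins:

```python
import numpy as np
from scipy.stats import genpareto

# PSIS-style check: fit a generalized Pareto to the upper tail of the
# importance ratios p(theta|y)/q(theta); a shape estimate k-hat below ~0.7
# suggests the variational fit is usable, while larger values flag trouble.
rng = np.random.default_rng(0)
log_p = rng.normal(size=4000)   # stand-in: log target density at the draws
log_q = rng.normal(size=4000)   # stand-in: log variational density at the draws
log_ratio = log_p - log_q

tail = np.sort(np.exp(log_ratio - log_ratio.max()))[-800:]  # largest 20%
k_hat, _, _ = genpareto.fit(tail - tail.min(), floc=0.0)
print("k-hat:", k_hat, "(ok)" if k_hat < 0.7 else "(warning)")
```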
Disfluency Detection using Auto-Correlational Neural Networks
Title | Disfluency Detection using Auto-Correlational Neural Networks |
Authors | Paria Jamshid Lou, Peter Anderson, Mark Johnson |
Abstract | In recent years, the natural language processing community has moved away from task-specific feature engineering, i.e., researchers discovering ad-hoc feature representations for various tasks, in favor of general-purpose methods that learn the input representation by themselves. However, state-of-the-art approaches to disfluency detection in spontaneous speech transcripts currently still depend on an array of hand-crafted features, and other representations derived from the output of pre-existing systems such as language models or dependency parsers. As an alternative, this paper proposes a simple yet effective model for automatic disfluency detection, called an auto-correlational neural network (ACNN). The model uses a convolutional neural network (CNN) and augments it with a new auto-correlation operator at the lowest layer that can capture the kinds of “rough copy” dependencies that are characteristic of repair disfluencies in speech. In experiments, the ACNN model outperforms the baseline CNN on a disfluency detection task with a 5% increase in f-score, which is close to the previous best result on this task. |
Tasks | Feature Engineering |
Published | 2018-08-28 |
URL | http://arxiv.org/abs/1808.09092v2 |
http://arxiv.org/pdf/1808.09092v2.pdf | |
PWC | https://paperswithcode.com/paper/disfluency-detection-using-auto-correlational |
Repo | https://github.com/pariajm/Deep-Disfluency-Detection-Model |
Framework | tf |
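A hedged sketch of what an auto-correlation operator over word embeddings can look like: within each sliding window, compute all pairwise inner products, so "rough copy" repetitions light up as high correlations. The window size, dimensions, and the plain dot product are illustrative assumptions, not the paper's exact operator:

```python
import torch

# Pairwise inner products within each sliding window over the embeddings;
# near-duplicate words (repair disfluencies) show up as high correlations.
seq_len, dim, win = 20, 64, 5
x = torch.randn(seq_len, dim)             # word embeddings for one utterance

windows = x.unfold(0, win, 1)             # (seq_len - win + 1, dim, win)
windows = windows.transpose(1, 2)         # (n_windows, win, dim)
corr = windows @ windows.transpose(1, 2)  # (n_windows, win, win) correlation maps
# `corr` is fed to ordinary convolutional layers alongside the raw embeddings.
```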