Paper Group AWR 246
Shuffle-Then-Assemble: Learning Object-Agnostic Visual Relationship Features. Car Monitoring System in Apartment Garages by Small Autonomous Car using Deep Learning. Zoom-Net: Mining Deep Feature Interactions for Visual Relationship Recognition. tempoGAN: A Temporally Coherent, Volumetric GAN for Super-resolution Fluid Flow. Towards Automated Deep Learning: Efficient Joint Neural Architecture and Hyperparameter Search. Unsupervised Image Captioning. Attention Branch Network: Learning of Attention Mechanism for Visual Explanation. Boosting Handwriting Text Recognition in Small Databases with Transfer Learning. Learning Semantic Textual Similarity from Conversations. Adversarial Domain Adaptation for Duplicate Question Detection. Infinite-Horizon Gaussian Processes. Structure-Infused Copy Mechanisms for Abstractive Summarization. A Comparative Measurement Study of Deep Learning as a Service Framework. Yes, but Did It Work?: Evaluating Variational Inference. Disfluency Detection using Auto-Correlational Neural Networks.
Shuffle-Then-Assemble: Learning Object-Agnostic Visual Relationship Features
Title | Shuffle-Then-Assemble: Learning Object-Agnostic Visual Relationship Features |
Authors | Xu Yang, Hanwang Zhang, Jianfei Cai |
Abstract | Because it is prohibitively expensive to completely annotate visual relationships, i.e., the (obj1, rel, obj2) triplets, relationship models are inevitably biased to object classes of limited pairwise patterns, leading to poor generalization to rare or unseen object combinations. Therefore, we are interested in learning object-agnostic visual features for more generalizable relationship models. By “agnostic”, we mean that the feature is less likely to be biased to the classes of the paired objects. To alleviate the bias, we propose a novel Shuffle-Then-Assemble pre-training strategy. First, we discard all the triplet relationship annotations in an image, leaving two unpaired object domains without obj1-obj2 alignment. Then, our feature learning recovers possible obj1-obj2 pairs. In particular, we design a cycle of residual transformations between the two domains to capture shared but not object-specific visual patterns. Extensive experiments on two visual relationship benchmarks show that with our pre-trained features, naive relationship models are consistently improved and even outperform other state-of-the-art relationship models. Code has been made available at https://github.com/yangxuntu/vrd. |
Tasks | |
Published | 2018-08-01 |
URL | http://arxiv.org/abs/1808.00171v1 |
http://arxiv.org/pdf/1808.00171v1.pdf | |
PWC | https://paperswithcode.com/paper/shuffle-then-assemble-learning-object |
Repo | https://github.com/yangxuntu/vrd |
Framework | tf |
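A minimal PyTorch sketch of the cycle-of-residual-transformations idea described in the abstract, assuming 512-d ROI features; the module names and dimensions are illustrative assumptions, not the authors' released implementation:

```python
import torch
import torch.nn as nn

class ResidualTransform(nn.Module):
    """Residual mapping between the two unpaired object-feature domains."""
    def __init__(self, dim=512):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, x):
        # Residual form: keep shared visual patterns, learn only the domain shift.
        return x + self.mlp(x)

f = ResidualTransform()  # obj1 domain -> obj2 domain
g = ResidualTransform()  # obj2 domain -> obj1 domain

x1 = torch.randn(32, 512)  # shuffled ROI features, domain 1 (no obj1-obj2 alignment)
x2 = torch.randn(32, 512)  # shuffled ROI features, domain 2

# Cycle objective: a feature mapped to the other domain and back should be recovered.
cycle_loss = (g(f(x1)) - x1).abs().mean() + (f(g(x2)) - x2).abs().mean()
cycle_loss.backward()
```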
Car Monitoring System in Apartment Garages by Small Autonomous Car using Deep Learning
Title | Car Monitoring System in Apartment Garages by Small Autonomous Car using Deep Learning |
Authors | Leonardo León, Felipe Moreno-Vera, Renato Castro, José Navío, Marco Capcha |
Abstract | A growing number of Peruvian families live in apartments rather than houses because of the many advantages; however, this brings problems such as theft of goods left in parking lots and the entry of strangers who use tenants' parking spaces (the latter is sometimes linked to kidnappings or robberies in apartment buildings). To address these problems, we propose a self-driving mini-car that implements a deep-learning-based license plate monitoring system for a building's underground garage, recording vehicles and identifying whether their owners are tenants. In addition, the small robot carries its own localization system based on beacons, which identifies the parking spot assigned to each tenant as the mini-car drives its route. Finally, one objective of this work is to build a low-cost mini-robot that could replace expensive cameras, or work alongside them, to keep tenants' goods safe. |
Tasks | |
Published | 2018-09-01 |
URL | https://arxiv.org/abs/1809.00251v3 |
https://arxiv.org/pdf/1809.00251v3.pdf | |
PWC | https://paperswithcode.com/paper/car-monitoring-system-in-apartment-garages-by |
Repo | https://github.com/renatocastro33/Car-Monitoring-System-in-Apartment-Garages-by-Small-Autonomous-Car-using-Deep-Learning |
Framework | none |
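The abstract does not specify the beacon model; as a hedged illustration of how a nearest-beacon localization step can work, the sketch below uses a standard log-distance path-loss estimate. All constants, RSSI values, and slot names are hypothetical:

```python
def beacon_distance(rssi_dbm, tx_power_dbm=-59, path_loss_n=2.0):
    """Log-distance path-loss estimate in meters; constants are illustrative."""
    return 10 ** ((tx_power_dbm - rssi_dbm) / (10 * path_loss_n))

# The mini-car could tag each plate reading with the nearest beacon's parking slot.
readings = {"slot_12": -63, "slot_13": -78}  # hypothetical RSSI (dBm) per beacon
nearest = min(readings, key=lambda b: beacon_distance(readings[b]))
print(nearest)  # -> slot_12
```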
Zoom-Net: Mining Deep Feature Interactions for Visual Relationship Recognition
Title | Zoom-Net: Mining Deep Feature Interactions for Visual Relationship Recognition |
Authors | Guojun Yin, Lu Sheng, Bin Liu, Nenghai Yu, Xiaogang Wang, Jing Shao, Chen Change Loy |
Abstract | Recognizing visual relationships among any pair of localized objects is pivotal for image understanding. Previous studies have shown remarkable progress in exploiting linguistic priors or external textual information to improve the performance. In this work, we investigate an orthogonal perspective based on feature interactions. We show that by encouraging deep message propagation and interactions between local object features and global predicate features, one can achieve compelling performance in recognizing complex relationships without using any linguistic priors. To this end, we present two new pooling cells to encourage feature interactions: (i) a Contrastive ROI Pooling Cell, which has a unique deROI pooling that inversely pools local object features to the corresponding area of global predicate features, and (ii) a Pyramid ROI Pooling Cell, which broadcasts global predicate features to reinforce local object features. The two cells constitute a Spatiality-Context-Appearance Module (SCA-M), which can be further stacked consecutively to form our final Zoom-Net. We further shed light on how one can resolve ambiguous and noisy object and predicate annotations with Intra-Hierarchical trees (IH-trees). Extensive experiments conducted on the Visual Genome dataset demonstrate the effectiveness of our feature-oriented approach compared to state-of-the-art methods (Acc@1 11.42% from 8.16%) that depend on explicit modeling of linguistic interactions. We further show that SCA-M can be incorporated seamlessly into existing approaches to improve the performance by a large margin. The source code will be released at https://github.com/gjyin91/ZoomNet. |
Tasks | |
Published | 2018-07-13 |
URL | http://arxiv.org/abs/1807.04979v1 |
http://arxiv.org/pdf/1807.04979v1.pdf | |
PWC | https://paperswithcode.com/paper/zoom-net-mining-deep-feature-interactions-for |
Repo | https://github.com/gjyin91/ZoomNet |
Framework | none |
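A hedged sketch of the Pyramid ROI Pooling Cell's broadcast direction using torchvision's roi_align; the feature shapes, the 1/8 scale, and the additive fusion are assumptions for illustration, not the paper's exact cell:

```python
import torch
from torchvision.ops import roi_align

# Global predicate feature map for one image (C=256 on a 32x32 grid at 1/8 scale).
global_feat = torch.randn(1, 256, 32, 32)

# Subject/object boxes in image coordinates, each prefixed by its batch index.
boxes = torch.tensor([[0., 40., 40., 120., 160.],
                      [0., 140., 60., 220., 200.]])

local_feat = torch.randn(2, 256, 7, 7)  # stand-in for per-object ROI features

# Broadcast: pool the global predicate context at each object box and
# add it back onto the local object features to reinforce them.
context = roi_align(global_feat, boxes, output_size=(7, 7), spatial_scale=1 / 8)
reinforced = local_feat + context
```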
tempoGAN: A Temporally Coherent, Volumetric GAN for Super-resolution Fluid Flow
Title | tempoGAN: A Temporally Coherent, Volumetric GAN for Super-resolution Fluid Flow |
Authors | You Xie, Erik Franz, Mengyu Chu, Nils Thuerey |
Abstract | We propose a temporally coherent generative model addressing the super-resolution problem for fluid flows. Our work represents a first approach to synthesize four-dimensional physics fields with neural networks. Based on a conditional generative adversarial network that is designed for the inference of three-dimensional volumetric data, our model generates consistent and detailed results by using a novel temporal discriminator, in addition to the commonly used spatial one. Our experiments show that the generator is able to infer more realistic high-resolution details by using additional physical quantities, such as low-resolution velocities or vorticities. Besides improvements in the training process and in the generated outputs, these inputs offer means for artistic control as well. We additionally employ a physics-aware data augmentation step, which is crucial to avoid overfitting and to reduce memory requirements. In this way, our network learns to generate advected quantities with highly detailed, realistic, and temporally coherent features. Our method works instantaneously, using only a single time-step of low-resolution fluid data. We demonstrate the abilities of our method using a variety of complex inputs and applications in two and three dimensions. |
Tasks | Data Augmentation, Super-Resolution |
Published | 2018-01-29 |
URL | http://arxiv.org/abs/1801.09710v2 |
http://arxiv.org/pdf/1801.09710v2.pdf | |
PWC | https://paperswithcode.com/paper/tempogan-a-temporally-coherent-volumetric-gan |
Repo | https://github.com/thunil/tempoGAN |
Framework | tf |
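A minimal 2D sketch of the temporal-discriminator idea: realism is judged on a stack of consecutive frames rather than per frame, so flickering output is penalized. The actual model is volumetric (3D) and also conditions on the low-resolution input; the layer sizes below are illustrative only:

```python
import torch
import torch.nn as nn

class TemporalDiscriminator(nn.Module):
    """Scores three consecutive high-res frames jointly, so temporally
    incoherent (flickering) generator output is penalized."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(64, 1, 4))

    def forward(self, frames):  # frames: (B, 3, H, W) = t-1, t, t+1 as channels
        return self.net(frames)

d_t = TemporalDiscriminator()
triplet = torch.randn(4, 3, 64, 64)  # three consecutive 2D density slices
score = d_t(triplet)                 # patch scores; real/fake loss applied on top
```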
Towards Automated Deep Learning: Efficient Joint Neural Architecture and Hyperparameter Search
Title | Towards Automated Deep Learning: Efficient Joint Neural Architecture and Hyperparameter Search |
Authors | Arber Zela, Aaron Klein, Stefan Falkner, Frank Hutter |
Abstract | While existing work on neural architecture search (NAS) tunes hyperparameters in a separate post-processing step, we demonstrate that architectural choices and other hyperparameter settings interact in a way that can render this separation suboptimal. Likewise, we demonstrate that the common practice of using very few epochs during the main NAS and much larger numbers of epochs during a post-processing step is inefficient due to little correlation in the relative rankings for these two training regimes. To combat both of these problems, we propose to use a recent combination of Bayesian optimization and Hyperband for efficient joint neural architecture and hyperparameter search. |
Tasks | Neural Architecture Search |
Published | 2018-07-18 |
URL | http://arxiv.org/abs/1807.06906v1 |
http://arxiv.org/pdf/1807.06906v1.pdf | |
PWC | https://paperswithcode.com/paper/towards-automated-deep-learning-efficient |
Repo | https://github.com/arberzela/EfficientNAS |
Framework | pytorch |
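A toy sketch of the joint search idea: architectural choices and training hyperparameters live in one configuration space, and a Hyperband-style bracket prunes configurations under growing epoch budgets. The search space and the random stand-in for training are hypothetical; BOHB additionally replaces pure random sampling with a model-based proposal:

```python
import random

# Joint search space: architectural choices and training hyperparameters are
# sampled together instead of being tuned in a separate post-processing step.
def sample_config():
    return {"n_layers": random.choice([8, 14, 20]),
            "width": random.choice([32, 64, 128]),
            "lr": 10 ** random.uniform(-4, -1),
            "weight_decay": 10 ** random.uniform(-5, -2)}

def train_and_eval(cfg, epochs):
    # Stand-in for real training/validation; BOHB would also fit a density
    # model over good vs. bad configs to guide sampling instead of pure random.
    return random.random() / epochs

# One Hyperband-style bracket: evaluate many configs on a small epoch budget,
# keep the best third at each rung, and re-train them with a 3x larger budget.
configs, budget = [sample_config() for _ in range(27)], 1
while len(configs) > 1:
    ranked = sorted(configs, key=lambda c: train_and_eval(c, budget))
    configs, budget = ranked[: len(configs) // 3], budget * 3
print(configs[0])  # best joint (architecture, hyperparameter) configuration
```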
Unsupervised Image Captioning
Title | Unsupervised Image Captioning |
Authors | Yang Feng, Lin Ma, Wei Liu, Jiebo Luo |
Abstract | Deep neural networks have achieved great successes on the image captioning task. However, most of the existing models depend heavily on paired image-sentence datasets, which are very expensive to acquire. In this paper, we make the first attempt to train an image captioning model in an unsupervised manner. Instead of relying on manually labeled image-sentence pairs, our proposed model merely requires an image set, a sentence corpus, and an existing visual concept detector. The sentence corpus is used to teach the captioning model how to generate plausible sentences. Meanwhile, the knowledge in the visual concept detector is distilled into the captioning model to guide the model to recognize the visual concepts in an image. In order to further encourage the generated captions to be semantically consistent with the image, the image and caption are projected into a common latent space so that they can reconstruct each other. Given that existing sentence corpora are mainly designed for linguistic research and thus bear little reference to image content, we crawl a large-scale image description corpus of two million natural sentences to facilitate the unsupervised image captioning scenario. Experimental results show that our proposed model is able to produce quite promising results without any caption annotations. |
Tasks | Image Captioning |
Published | 2018-11-27 |
URL | http://arxiv.org/abs/1811.10787v2 |
http://arxiv.org/pdf/1811.10787v2.pdf | |
PWC | https://paperswithcode.com/paper/unsupervised-image-captioning |
Repo | https://github.com/fengyang0317/unsupervised_captioning |
Framework | tf |
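The full method combines adversarial sentence generation, concept distillation, and the shared latent space; the sketch below shows only the bidirectional-reconstruction term, with all encoders, decoders, and dimensions as stand-in assumptions:

```python
import torch
import torch.nn as nn

# All encoders/decoders and dimensions below are stand-ins for illustration.
img_enc = nn.Linear(2048, 512)  # image feature -> shared latent space
txt_enc = nn.Linear(300, 512)   # sentence feature -> shared latent space
img_dec = nn.Linear(512, 2048)  # latent -> reconstructed image feature
txt_dec = nn.Linear(512, 300)   # latent -> reconstructed sentence feature

img = torch.randn(16, 2048)     # CNN image features
txt = torch.randn(16, 300)      # pooled sentence features

# Cross-reconstruction: image and caption meet in one latent space, so each
# modality must carry enough semantics to rebuild the other.
loss = ((img_dec(txt_enc(txt)) - img) ** 2).mean() + \
       ((txt_dec(img_enc(img)) - txt) ** 2).mean()
loss.backward()
```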
Attention Branch Network: Learning of Attention Mechanism for Visual Explanation
Title | Attention Branch Network: Learning of Attention Mechanism for Visual Explanation |
Authors | Hiroshi Fukui, Tsubasa Hirakawa, Takayoshi Yamashita, Hironobu Fujiyoshi |
Abstract | Visual explanation enables humans to understand the decision making of a deep convolutional neural network (CNN), but by itself it does not improve performance. In this paper, we focus on the attention map for visual explanation, whose high-response values mark the regions important for image recognition. Attending to these regions can significantly improve a CNN's performance through an attention mechanism that focuses on specific regions of an image. In this work, we propose the Attention Branch Network (ABN), which extends a top-down visual explanation model with a branch structure that implements the attention mechanism. ABN is applicable to several image recognition tasks via this attention branch and is trainable end-to-end for both visual explanation and image recognition. We evaluate ABN on several image recognition tasks, including image classification, fine-grained recognition, and multiple facial attribute recognition. Experimental results show that ABN outperforms baseline models in accuracy on these tasks while generating an attention map for visual explanation. Our code is available at https://github.com/machine-perception-robotics-group/attention_branch_network. |
Tasks | Decision Making, Image Classification |
Published | 2018-12-25 |
URL | http://arxiv.org/abs/1812.10025v2 |
http://arxiv.org/pdf/1812.10025v2.pdf | |
PWC | https://paperswithcode.com/paper/attention-branch-network-learning-of |
Repo | https://github.com/machine-perception-robotics-group/ABN_CelebA |
Framework | none |
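A simplified PyTorch sketch of the core ABN mechanism: the attention branch emits a map that serves as the visual explanation and re-weights the features fed to the perception (classification) branch. The real network also trains the attention branch with its own classification loss, omitted here:

```python
import torch
import torch.nn as nn

class AttentionBranch(nn.Module):
    """Emits a one-channel attention map that doubles as the visual
    explanation and as a re-weighting of the downstream features."""
    def __init__(self, channels=512):
        super().__init__()
        self.conv = nn.Conv2d(channels, 1, kernel_size=1)

    def forward(self, feat):                   # feat: (B, C, H, W)
        attn = torch.sigmoid(self.conv(feat))  # attention map = the explanation
        return feat * (1 + attn), attn         # residual-style re-weighting

branch = AttentionBranch()
feat = torch.randn(2, 512, 14, 14)             # backbone feature map
weighted, attention_map = branch(feat)         # `weighted` feeds the classifier
```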
Boosting Handwriting Text Recognition in Small Databases with Transfer Learning
Title | Boosting Handwriting Text Recognition in Small Databases with Transfer Learning |
Authors | José Carlos Aradillas, Juan José Murillo-Fuentes, Pablo M. Olmos |
Abstract | In this paper we deal with the offline handwriting text recognition (HTR) problem with reduced training datasets. Recent HTR solutions based on artificial neural networks achieve remarkable results on reference databases. These deep neural networks combine convolutional (CNN) and long short-term memory (LSTM) recurrent units. In addition, connectionist temporal classification (CTC) is the key to avoiding segmentation at the character level, greatly facilitating the labeling task. One of the main drawbacks of the CNN-LSTM-CTC (CLC) solutions is that they need a considerable amount of transcribed text for every type of calligraphy, typically on the order of a few thousand lines. Furthermore, in some scenarios the text to transcribe is not that long, e.g., in the Washington database, and the CLC typically overfits for this reduced number of training samples. Our proposal is based on transfer learning (TL) of the parameters learned on a bigger database. We first investigate, for a reduced and fixed number of training samples (350 lines), how the learning from a large database, IAM, can be transferred to the learning of the CLC on a reduced database, Washington. We study which layers of the network can be left un-retrained. We conclude that the best solution is to re-train all the CLC parameters, initialized to the values obtained after training the CLC on the larger database. We also investigate results when the training size is further reduced. The difference in character error rate (CER) is most remarkable when training with just 350 lines: TL achieves a CER of 3.3%, versus 18.2% when training from scratch. As a byproduct, the learning times are greatly reduced. Similarly good results are obtained on the Parzival database when trained with this reduced number of lines and this new approach. |
Tasks | Transfer Learning |
Published | 2018-04-04 |
URL | http://arxiv.org/abs/1804.01527v1 |
http://arxiv.org/pdf/1804.01527v1.pdf | |
PWC | https://paperswithcode.com/paper/boosting-handwriting-text-recognition-in |
Repo | https://github.com/SamuelNguyen1998/Vietnamese_Handwriting_Recognition |
Framework | tf |
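A PyTorch sketch of the best-performing recipe reported above: initialize the whole model from weights trained on the larger IAM database, then re-train every layer on the small target set. The tiny stand-in model and checkpoint path are hypothetical:

```python
import torch
import torch.nn as nn

# Stand-in for the CNN-LSTM-CTC (CLC) model; the real architecture is larger.
def build_clc_model():
    return nn.Sequential(nn.Conv2d(1, 16, 3, padding=1), nn.Flatten(),
                         nn.Linear(16 * 32 * 32, 80))  # 80 = illustrative charset

# Initialize all parameters from the large-database training, then fine-tune
# everything on the small target set (e.g. 350 Washington lines); no freezing.
model = build_clc_model()
state = torch.load("clc_iam_pretrained.pt")  # hypothetical IAM checkpoint path
model.load_state_dict(state)

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)  # every layer trainable
```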
Learning Semantic Textual Similarity from Conversations
Title | Learning Semantic Textual Similarity from Conversations |
Authors | Yinfei Yang, Steve Yuan, Daniel Cer, Sheng-yi Kong, Noah Constant, Petr Pilar, Heming Ge, Yun-Hsuan Sung, Brian Strope, Ray Kurzweil |
Abstract | We present a novel approach to learn representations for sentence-level semantic similarity using conversational data. Our method trains an unsupervised model to predict conversational input-response pairs. The resulting sentence embeddings perform well on the semantic textual similarity (STS) benchmark and SemEval 2017’s Community Question Answering (CQA) question similarity subtask. Performance is further improved by introducing multitask training combining the conversational input-response prediction task and a natural language inference task. Extensive experiments show the proposed model achieves the best performance among all neural models on the STS benchmark and is competitive with the state-of-the-art feature engineered and mixed systems in both tasks. |
Tasks | Community Question Answering, Natural Language Inference, Question Answering, Question Similarity, Semantic Similarity, Semantic Textual Similarity, Sentence Embeddings |
Published | 2018-04-20 |
URL | http://arxiv.org/abs/1804.07754v1 |
http://arxiv.org/pdf/1804.07754v1.pdf | |
PWC | https://paperswithcode.com/paper/learning-semantic-textual-similarity-from |
Repo | https://github.com/nickyeolk/info_retrieve |
Framework | tf |
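A minimal sketch of the unsupervised input-response training signal with in-batch negatives; the sentence encoders are omitted and the dimensions are illustrative:

```python
import torch
import torch.nn.functional as F

# Input-response prediction with in-batch negatives: each conversational input
# should score its own response above every other response in the batch.
inputs = torch.randn(64, 512)     # encoded conversational inputs
responses = torch.randn(64, 512)  # encoded candidate responses

scores = inputs @ responses.T     # (64, 64) similarity matrix
labels = torch.arange(64)         # the diagonal holds the true pairs
loss = F.cross_entropy(scores, labels)
```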
Adversarial Domain Adaptation for Duplicate Question Detection
Title | Adversarial Domain Adaptation for Duplicate Question Detection |
Authors | Darsh J Shah, Tao Lei, Alessandro Moschitti, Salvatore Romeo, Preslav Nakov |
Abstract | We address the problem of detecting duplicate questions in forums, which is an important step towards automating the process of answering new questions. As finding and annotating such potential duplicates manually is very tedious and costly, automatic methods based on machine learning are a viable alternative. However, many forums do not have annotated data, i.e., questions labeled by experts as duplicates, and thus a promising solution is to use domain adaptation from another forum that has such annotations. Here we focus on adversarial domain adaptation, deriving important findings about when it performs well and what properties of the domains are important in this regard. Our experiments with StackExchange data show an average improvement of 5.6% over the best baseline across multiple pairs of domains. |
Tasks | Domain Adaptation, Question Similarity |
Published | 2018-09-07 |
URL | http://arxiv.org/abs/1809.02255v1 |
http://arxiv.org/pdf/1809.02255v1.pdf | |
PWC | https://paperswithcode.com/paper/adversarial-domain-adaptation-for-duplicate |
Repo | https://github.com/darsh10/qra_code |
Framework | pytorch |
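Adversarial domain adaptation of this kind is commonly implemented with a gradient reversal layer; a hedged PyTorch sketch of that mechanism follows (the paper's exact adversarial setup may differ):

```python
import torch

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass, negated gradient in the backward pass:
    the shared question encoder learns to fool the domain classifier,
    pushing it toward domain-invariant features."""
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_out):
        return -ctx.lam * grad_out, None

features = torch.randn(8, 256, requires_grad=True)  # shared encoder output
reversed_feat = GradReverse.apply(features, 1.0)    # feed to a domain classifier
```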
Infinite-Horizon Gaussian Processes
Title | Infinite-Horizon Gaussian Processes |
Authors | Arno Solin, James Hensman, Richard E. Turner |
Abstract | Gaussian processes provide a flexible framework for forecasting, removing noise, and interpreting long temporal datasets. State space modelling (Kalman filtering) enables these non-parametric models to be deployed on long datasets by reducing the complexity to linear in the number of data points. The complexity is still cubic in the state dimension $m$, which is an impediment to practical application. In certain special cases (Gaussian likelihood, regular spacing) the GP posterior will reach a steady state when the data record is very long. We leverage this and formulate an inference scheme for GPs with general likelihoods, where inference is based on single-sweep EP (assumed density filtering). The infinite-horizon model tackles the cubic cost in the state dimension $m$, reducing it to $\mathcal{O}(m^2)$ per data point. The model is extended to online learning of hyperparameters. We show examples for large finite-length modelling problems, and present how the method runs in real-time on a smartphone on a continuous data stream updated at 100 Hz. |
Tasks | Gaussian Processes |
Published | 2018-11-15 |
URL | http://arxiv.org/abs/1811.06588v1 |
http://arxiv.org/pdf/1811.06588v1.pdf | |
PWC | https://paperswithcode.com/paper/infinite-horizon-gaussian-processes |
Repo | https://github.com/AaltoML/IHGP |
Framework | none |
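A NumPy/SciPy sketch of the steady-state idea behind the infinite-horizon model: once the Kalman covariance has converged, a single Riccati solve yields a fixed gain and a cheap per-sample update. The state-space matrices below are illustrative, not a kernel from the paper:

```python
import numpy as np
from scipy.linalg import solve_discrete_are

# With a Gaussian likelihood and regular spacing, the Kalman covariance
# converges; solving the discrete algebraic Riccati equation once yields a
# fixed gain, so each update costs O(m^2) rather than O(m^3) in the state
# dimension m.
A = np.array([[1.0, 0.1], [0.0, 0.9]])  # state transition (illustrative)
H = np.array([[1.0, 0.0]])              # measurement model
Q = 0.01 * np.eye(2)                    # process noise covariance
R = np.array([[0.1]])                   # measurement noise covariance

P = solve_discrete_are(A.T, H.T, Q, R)        # steady-state predictive covariance
K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)  # fixed Kalman gain

m = np.zeros(2)
for y in np.sin(0.1 * np.arange(1000)):       # synthetic data stream
    m = A @ m                                 # predict
    m = m + K @ (y - H @ m)                   # correct, covariance-free update
```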
Structure-Infused Copy Mechanisms for Abstractive Summarization
Title | Structure-Infused Copy Mechanisms for Abstractive Summarization |
Authors | Kaiqiang Song, Lin Zhao, Fei Liu |
Abstract | Seq2seq learning has produced promising results on summarization. However, in many cases, system summaries still struggle to keep the meaning of the original intact. They may miss out important words or relations that play critical roles in the syntactic structure of source sentences. In this paper, we present structure-infused copy mechanisms to facilitate copying important words and relations from the source sentence to summary sentence. The approach naturally combines source dependency structure with the copy mechanism of an abstractive sentence summarizer. Experimental results demonstrate the effectiveness of incorporating source-side syntactic information in the system, and our proposed approach compares favorably to state-of-the-art methods. |
Tasks | Abstractive Text Summarization |
Published | 2018-06-14 |
URL | http://arxiv.org/abs/1806.05658v2 |
http://arxiv.org/pdf/1806.05658v2.pdf | |
PWC | https://paperswithcode.com/paper/structure-infused-copy-mechanisms-for |
Repo | https://github.com/KaiQiangSong/struct_infused_summ |
Framework | none |
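The copy mechanism at the heart of the approach mixes a vocabulary distribution with attention-based copying of source words; the sketch below shows the standard pointer-generator mixture with random stand-ins, leaving out the paper's structure-infused attention bias:

```python
import torch

# Final word distribution = p_gen * generate-from-vocabulary
#                         + (1 - p_gen) * copy-via-attention over source words.
vocab_size, src_len = 10000, 30
p_gen = torch.sigmoid(torch.randn(1))               # generate-vs-copy switch
p_vocab = torch.softmax(torch.randn(vocab_size), dim=0)
attn = torch.softmax(torch.randn(src_len), dim=0)   # attention over source tokens
src_ids = torch.randint(0, vocab_size, (src_len,))  # source token ids

p_final = p_gen * p_vocab
p_final = p_final.scatter_add(0, src_ids, (1 - p_gen) * attn)  # add copy mass
```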
A Comparative Measurement Study of Deep Learning as a Service Framework
Title | A Comparative Measurement Study of Deep Learning as a Service Framework |
Authors | Yanzhao Wu, Ling Liu, Calton Pu, Wenqi Cao, Semih Sahin, Wenqi Wei, Qi Zhang |
Abstract | Big-data-powered Deep Learning (DL) and its applications have blossomed in recent years, fueled by three technological trends: a large amount of openly accessible digitized data, a growing number of DL software frameworks in open-source and commercial markets, and a selection of affordable parallel computing hardware devices. However, no single DL framework, to date, dominates in terms of performance and accuracy, even for baseline classification tasks on standard datasets, making the selection of a DL framework an overwhelming task. This paper takes a holistic approach to an empirical comparison and analysis of four representative DL frameworks, with three unique contributions. First, given a selection of CPU-GPU configurations, we show that for a specific DL framework, different configurations of its hyper-parameters may have a significant impact on both the performance and accuracy of DL applications. Second, to the best of our knowledge, this study is the first to identify opportunities for improving the training-time performance and the accuracy of DL frameworks by configuring parallel computing libraries and tuning individual and multiple hyper-parameters. Third, we also conduct a comparative measurement study of the resource consumption patterns of the four DL frameworks and their performance and accuracy implications, including CPU and memory usage, and their correlations to varying settings of hyper-parameters under different combinations of hardware and parallel computing libraries. We argue that this measurement study provides an in-depth empirical comparison and analysis of four representative DL frameworks, and offers practical guidance for service providers deploying and delivering DL as a Service (DLaaS) and for application developers and DLaaS consumers selecting the right DL frameworks for the right DL workloads. |
Tasks | |
Published | 2018-10-29 |
URL | https://arxiv.org/abs/1810.12210v2 |
https://arxiv.org/pdf/1810.12210v2.pdf | |
PWC | https://paperswithcode.com/paper/a-comparative-measurement-study-of-deep |
Repo | https://github.com/git-disl/GTDLBench |
Framework | tf |
Yes, but Did It Work?: Evaluating Variational Inference
Title | Yes, but Did It Work?: Evaluating Variational Inference |
Authors | Yuling Yao, Aki Vehtari, Daniel Simpson, Andrew Gelman |
Abstract | While it’s always possible to compute a variational approximation to a posterior distribution, it can be difficult to discover problems with this approximation. We propose two diagnostic algorithms to alleviate this problem. The Pareto-smoothed importance sampling (PSIS) diagnostic gives a goodness of fit measurement for joint distributions, while simultaneously improving the error in the estimate. The variational simulation-based calibration (VSBC) assesses the average performance of point estimates. |
Tasks | Calibration |
Published | 2018-02-07 |
URL | http://arxiv.org/abs/1802.02538v2 |
http://arxiv.org/pdf/1802.02538v2.pdf | |
PWC | https://paperswithcode.com/paper/yes-but-did-it-work-evaluating-variational |
Repo | https://github.com/yao-yl/Evaluating-Variational-Inference |
Framework | none |
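A simplified sketch of the PSIS diagnostic's core step: fit a generalized Pareto distribution to the upper tail of the importance ratios and read off the shape estimate k-hat. The real diagnostic uses the Zhang-Stephens estimator rather than scipy's generic fit, and the densities below are random stand-ins:

```python
import numpy as np
from scipy.stats import genpareto

# PSIS-style check: fit a generalized Pareto to the upper tail of the
# importance ratios p(theta|y)/q(theta); a shape estimate k-hat below ~0.7
# suggests the variational fit is usable, while larger values flag trouble.
rng = np.random.default_rng(0)
log_p = rng.normal(size=4000)   # stand-in: log target density at the draws
log_q = rng.normal(size=4000)   # stand-in: log variational density at the draws
log_ratio = log_p - log_q

tail = np.sort(np.exp(log_ratio - log_ratio.max()))[-800:]  # largest 20%
k_hat, _, _ = genpareto.fit(tail - tail.min(), floc=0.0)
print("k-hat:", k_hat, "(ok)" if k_hat < 0.7 else "(warning)")
```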
Disfluency Detection using Auto-Correlational Neural Networks
Title | Disfluency Detection using Auto-Correlational Neural Networks |
Authors | Paria Jamshid Lou, Peter Anderson, Mark Johnson |
Abstract | In recent years, the natural language processing community has moved away from task-specific feature engineering, i.e., researchers discovering ad-hoc feature representations for various tasks, in favor of general-purpose methods that learn the input representation by themselves. However, state-of-the-art approaches to disfluency detection in spontaneous speech transcripts currently still depend on an array of hand-crafted features, and other representations derived from the output of pre-existing systems such as language models or dependency parsers. As an alternative, this paper proposes a simple yet effective model for automatic disfluency detection, called an auto-correlational neural network (ACNN). The model uses a convolutional neural network (CNN) and augments it with a new auto-correlation operator at the lowest layer that can capture the kinds of “rough copy” dependencies that are characteristic of repair disfluencies in speech. In experiments, the ACNN model outperforms the baseline CNN on a disfluency detection task with a 5% increase in f-score, which is close to the previous best result on this task. |
Tasks | Feature Engineering |
Published | 2018-08-28 |
URL | http://arxiv.org/abs/1808.09092v2 |
http://arxiv.org/pdf/1808.09092v2.pdf | |
PWC | https://paperswithcode.com/paper/disfluency-detection-using-auto-correlational |
Repo | https://github.com/pariajm/Deep-Disfluency-Detection-Model |
Framework | tf |
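A hedged sketch of what an auto-correlation operator over word embeddings can look like: within each sliding window, compute all pairwise inner products, so "rough copy" repetitions light up as high correlations. The window size, dimensions, and the plain dot product are illustrative assumptions, not the paper's exact operator:

```python
import torch

# Pairwise inner products within each sliding window over the embeddings;
# near-duplicate words (repair disfluencies) show up as high correlations.
seq_len, dim, win = 20, 64, 5
x = torch.randn(seq_len, dim)             # word embeddings for one utterance

windows = x.unfold(0, win, 1)             # (seq_len - win + 1, dim, win)
windows = windows.transpose(1, 2)         # (n_windows, win, dim)
corr = windows @ windows.transpose(1, 2)  # (n_windows, win, win) correlation maps
# `corr` is fed to ordinary convolutional layers alongside the raw embeddings.
```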