February 1, 2020

3661 words 18 mins read

Paper Group AWR 335

Loss Landscape Sightseeing with Multi-Point Optimization. Cross-Domain Car Detection Using Unsupervised Image-to-Image Translation: From Day to Night. Estimating Pedestrian Moving State Based on Single 2D Body Pose. Proposal, Tracking and Segmentation (PTS): A Cascaded Network for Video Object Segmentation. DMM-Net: Differentiable Mask-Matching Net …

Loss Landscape Sightseeing with Multi-Point Optimization

Title Loss Landscape Sightseeing with Multi-Point Optimization
Authors Ivan Skorokhodov, Mikhail Burtsev
Abstract We present multi-point optimization: an optimization technique that trains several models simultaneously without the need to store the parameters of each one individually. The proposed method is used for a thorough empirical analysis of the loss landscape of neural networks. Through extensive experiments on the FashionMNIST and CIFAR10 datasets, we demonstrate two things: 1) the loss surface is surprisingly diverse and intricate in terms of the landscape patterns it contains, and 2) adding batch normalization makes it smoother. Source code to reproduce all the reported results is available on GitHub: https://github.com/universome/loss-patterns.
Tasks
Published 2019-10-09
URL https://arxiv.org/abs/1910.03867v2
PDF https://arxiv.org/pdf/1910.03867v2.pdf
PWC https://paperswithcode.com/paper/loss-surface-sightseeing-by-multi-point
Repo https://github.com/universome/loss-patterns
Framework pytorch
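
As a rough illustration of the multi-point idea above, the sketch below optimizes a whole 2D grid of linear classifiers parameterized by a single origin and two direction vectors in weight space, so only three tensors are stored regardless of grid size. The linear model, the grid layout, and all names (origin, right, up, grid_coords) are assumptions for illustration, not the authors' implementation (see their repo above for that).

```python
# Minimal sketch of multi-point optimization: a grid of models lives on a
# 2D plane in weight space, parameterized by three tensors only.
import torch
import torch.nn.functional as F

def flat_params(n):
    return torch.randn(n) * 0.01

n_weights = 7850                 # a linear classifier on 28x28 images (784*10 + 10)
origin = torch.nn.Parameter(flat_params(n_weights))
right  = torch.nn.Parameter(flat_params(n_weights))
up     = torch.nn.Parameter(flat_params(n_weights))

# Points on the plane where we want the loss to be low (a "pattern").
grid_coords = [(a, b) for a in (-1.0, 0.0, 1.0) for b in (-1.0, 0.0, 1.0)]

def model_loss(w, x, y):
    W, b = w[:7840].view(10, 784), w[7840:]
    return F.cross_entropy(x @ W.t() + b, y)

opt = torch.optim.Adam([origin, right, up], lr=1e-3)
x, y = torch.randn(64, 784), torch.randint(0, 10, (64,))  # stand-in batch
for step in range(100):
    # Every model on the grid shares origin/right/up; train them jointly.
    loss = sum(model_loss(origin + a * right + b * up, x, y)
               for a, b in grid_coords) / len(grid_coords)
    opt.zero_grad(); loss.backward(); opt.step()
```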

Cross-Domain Car Detection Using Unsupervised Image-to-Image Translation: From Day to Night

Title Cross-Domain Car Detection Using Unsupervised Image-to-Image Translation: From Day to Night
Authors Vinicius F. Arruda, Thiago M. Paixão, Rodrigo F. Berriel, Alberto F. De Souza, Claudine Badue, Nicu Sebe, Thiago Oliveira-Santos
Abstract Deep learning techniques have enabled the emergence of state-of-the-art models to address object detection tasks. However, these techniques are data-driven, making accuracy dependent on a training dataset that must resemble the images of the target task. Acquiring a dataset involves annotating images, an arduous and expensive process that generally requires time and manual effort. Thus, a challenging scenario arises when the target domain of application has no annotated dataset available, forcing such tasks to rely on a training dataset from a different domain. Object detection shares this issue and is a vital task for autonomous vehicles, where the large variety of driving scenarios yields several application domains, each requiring annotated data for training. In this work, a method for training a car detection system with annotated data from a source domain (day images), without requiring image annotations of the target domain (night images), is presented. To this end, a model based on Generative Adversarial Networks (GANs) is explored to enable the generation of an artificial dataset with its respective annotations. The artificial dataset (fake dataset) is created by translating images from the day-time domain to the night-time domain. The fake dataset, which comprises annotated images of only the target domain (night images), is then used to train the car detector model. Experimental results show that the proposed method achieves significant and consistent improvements, including an increase of more than 10% in detection performance when compared to training with only the available annotated data (i.e., day images).
Tasks Autonomous Vehicles, Image-to-Image Translation, Object Detection, Unsupervised Image-To-Image Translation
Published 2019-07-19
URL https://arxiv.org/abs/1907.08719v1
PDF https://arxiv.org/pdf/1907.08719v1.pdf
PWC https://paperswithcode.com/paper/cross-domain-car-detection-using-unsupervised
Repo https://github.com/LCAD-UFES/publications-arruda-ijcnn-2019
Framework tf
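
The pipeline in this abstract reduces to a simple recipe: translate annotated day images into the night domain and reuse the day annotations unchanged. A minimal sketch follows, with translate_day_to_night and train_detector as hypothetical callables standing in for the paper's GAN translator and detector.

```python
# Minimal sketch of the fake-dataset pipeline. The translation changes
# appearance but not geometry, so the day bounding boxes remain valid
# for the generated night images.
def build_fake_night_dataset(day_images, day_boxes, translate_day_to_night):
    fake_night = [translate_day_to_night(img) for img in day_images]
    return list(zip(fake_night, day_boxes))

def train_cross_domain_detector(day_images, day_boxes,
                                translate_day_to_night, train_detector):
    fake_dataset = build_fake_night_dataset(day_images, day_boxes,
                                            translate_day_to_night)
    # Train on the annotated fake-night images (optionally mixed with day data).
    return train_detector(fake_dataset)
```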

Estimating Pedestrian Moving State Based on Single 2D Body Pose

Title Estimating Pedestrian Moving State Based on Single 2D Body Pose
Authors Zixing Wang, Nikolaos Papanikolopoulos
Abstract The Crossing or Not-Crossing (C/NC) problem is important to autonomous vehicles (AVs) for safe vehicle/pedestrian interaction. However, this problem setup often ignores pedestrians walking along the direction of the vehicle’s movement (LONG). To enhance AVs’ awareness of pedestrian behavior, we take the first step towards extending C/NC to the C/NC/LONG problem and recognize all three states from a single body pose. In contrast, previous C/NC state classifiers depend on multiple poses or contextual information. Our proposed shallow neural network classifier aims to recognize these three states swiftly. We tested it on the JAAD dataset and report an average accuracy of 81.23%. Furthermore, this model can be integrated with different sensors and algorithms that provide 2D pedestrian body poses, so it is able to function across multiple lighting and weather conditions.
Tasks Autonomous Vehicles
Published 2019-07-09
URL https://arxiv.org/abs/1907.04361v3
PDF https://arxiv.org/pdf/1907.04361v3.pdf
PWC https://paperswithcode.com/paper/fast-estimating-pedestrian-moving-state-based
Repo https://github.com/zxwang96/Single_Pose_CNC
Framework pytorch
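
A shallow pose-based classifier like the one described could look like the sketch below; the 17-keypoint COCO-style input, the hidden size, and the class names are assumptions, not the authors' exact architecture.

```python
# Minimal sketch: a shallow MLP over one flattened 2D body pose,
# predicting crossing / not-crossing / LONG.
import torch.nn as nn

class MovingStateClassifier(nn.Module):
    def __init__(self, num_keypoints=17, hidden=64, num_classes=3):
        super().__init__()
        # Input: flattened (x, y) coordinates of a single pose.
        self.net = nn.Sequential(
            nn.Linear(num_keypoints * 2, hidden), nn.ReLU(),
            nn.Linear(hidden, num_classes),
        )

    def forward(self, pose_xy):          # pose_xy: (batch, num_keypoints * 2)
        return self.net(pose_xy)
```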

Proposal, Tracking and Segmentation (PTS): A Cascaded Network for Video Object Segmentation

Title Proposal, Tracking and Segmentation (PTS): A Cascaded Network for Video Object Segmentation
Authors Qiang Zhou, Zilong Huang, Lichao Huang, Yongchao Gong, Han Shen, Chang Huang, Wenyu Liu, Xinggang Wang
Abstract Video object segmentation (VOS) aims at pixel-level object tracking given only the annotations in the first frame. Due to the large visual variations of objects in video and the lack of training samples, it remains a difficult task despite the rapid development of deep learning. Toward solving the VOS problem, we bring several new insights through a unified framework consisting of object proposal, tracking, and segmentation components. The object proposal network transfers objectness information as generic knowledge into VOS; the tracking network identifies the target object from the proposals; and the segmentation network operates on the tracking results with a novel dynamic-reference-based model adaptation scheme. Extensive experiments have been conducted on the DAVIS’17 and YouTube-VOS datasets; our method achieves state-of-the-art performance on several video object segmentation benchmarks. We make the code publicly available at https://github.com/sydney0zq/PTSNet.
Tasks Object Tracking, Semantic Segmentation, Video Object Segmentation, Video Semantic Segmentation
Published 2019-07-02
URL https://arxiv.org/abs/1907.01203v2
PDF https://arxiv.org/pdf/1907.01203v2.pdf
PWC https://paperswithcode.com/paper/proposal-tracking-and-segmentation-pts-a
Repo https://github.com/sydney0zq/PTSNet
Framework pytorch
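
The cascade reads naturally as a per-frame loop. Below is a minimal sketch with the three networks as hypothetical callables; the dynamic-reference adaptation is reduced to updating the reference mask with each new segmentation result.

```python
# Minimal sketch of the proposal -> tracking -> segmentation cascade.
def pts_segment_video(frames, first_frame_mask, propose, track, segment):
    reference = first_frame_mask            # dynamic reference for adaptation
    masks = [first_frame_mask]
    for frame in frames[1:]:
        proposals = propose(frame)          # generic objectness proposals
        target_box = track(frame, proposals, reference)
        mask = segment(frame, target_box, reference)
        reference = mask                    # dynamic-reference model adaptation
        masks.append(mask)
    return masks
```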

DMM-Net: Differentiable Mask-Matching Network for Video Object Segmentation

Title DMM-Net: Differentiable Mask-Matching Network for Video Object Segmentation
Authors Xiaohui Zeng, Renjie Liao, Li Gu, Yuwen Xiong, Sanja Fidler, Raquel Urtasun
Abstract In this paper, we propose the differentiable mask-matching network (DMM-Net) for solving the video object segmentation problem where the initial object masks are provided. Relying on a Mask R-CNN backbone, we extract mask proposals per frame and formulate the matching between object templates and proposals at one time step as a linear assignment problem whose cost matrix is predicted by a CNN. We propose a differentiable matching layer by unrolling a projected gradient descent algorithm in which the projection exploits Dykstra’s algorithm. We prove that under mild conditions the matching is guaranteed to converge to the optimum. In practice, it performs similarly to the Hungarian algorithm during inference. Meanwhile, we can back-propagate through it to learn the cost matrix. After matching, a refinement head is leveraged to improve the quality of the matched mask. Our DMM-Net achieves competitive results on the largest video object segmentation dataset, YouTube-VOS. On DAVIS 2017, DMM-Net achieves the best performance without online learning on the first frames. Without any fine-tuning, DMM-Net performs comparably to state-of-the-art methods on the SegTrack v2 dataset. Finally, our matching layer is very simple to implement; we attach the PyTorch code ($<50$ lines) in the supplementary material. Our code is released at https://github.com/ZENGXH/DMM_Net.
Tasks Semantic Segmentation, Video Object Segmentation, Video Semantic Segmentation
Published 2019-09-27
URL https://arxiv.org/abs/1909.12471v1
PDF https://arxiv.org/pdf/1909.12471v1.pdf
PWC https://paperswithcode.com/paper/dmm-net-differentiable-mask-matching-network
Repo https://github.com/ZENGXH/DMM_Net
Framework pytorch
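
The differentiable matching layer can be sketched as an unrolled projected-gradient loop. The version below substitutes simple alternating row/column projections plus a clamp where the paper uses Dykstra's algorithm, so the projection is only approximate; the point it illustrates is that every step is differentiable, so gradients reach the cost matrix.

```python
# Minimal sketch: approximately minimize <C, X> over (near) doubly-stochastic X
# by unrolled projected gradient descent; everything is differentiable.
import torch

def project_rows(X):        # affine projection: rows sum to 1
    return X - (X.sum(dim=1, keepdim=True) - 1) / X.shape[1]

def project_cols(X):        # affine projection: columns sum to 1
    return X - (X.sum(dim=0, keepdim=True) - 1) / X.shape[0]

def soft_matching(C, lr=0.1, grad_steps=20, proj_steps=5):
    n = C.shape[0]
    X = torch.full_like(C, 1.0 / n)
    for _ in range(grad_steps):
        X = X - lr * C                    # gradient step on the linear cost
        for _ in range(proj_steps):       # approximate projection step
            X = project_rows(X)
            X = project_cols(X)
            X = X.clamp(min=0)
    return X

# Usage: C would come from a CNN; a loss on soft_matching(C) trains C.
C = torch.randn(5, 5, requires_grad=True)
X = soft_matching(C)
X.pow(2).sum().backward()   # gradients flow to C through the unrolled solver
```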

A Capsule Network for Recommendation and Explaining What You Like and Dislike

Title A Capsule Network for Recommendation and Explaining What You Like and Dislike
Authors Chenliang Li, Cong Quan, Li Peng, Yunwei Qi, Yuming Deng, Libing Wu
Abstract User reviews contain rich semantics about users’ preferences for the features of items. Recently, many deep learning based solutions that exploit reviews for recommendation have been proposed. The attention mechanism is mainly adopted in these works to identify words or aspects that are important for rating prediction. However, without examining the review details, it is still hard to understand whether a user likes or dislikes an aspect of an item, according to what viewpoint the user holds and to what extent. Here, we consider a pair of a viewpoint held by a user and an aspect of an item as a logic unit. Explaining a rating behavior by discovering the informative logic units from the reviews and resolving their corresponding sentiments could enable better rating prediction with explanation. To this end, in this paper we propose a capsule network based model for rating prediction with user reviews, named CARP. For each user-item pair, CARP is devised to extract the informative logic units from the reviews and infer their corresponding sentiments. The model first extracts the viewpoints and aspects from the user and item review documents respectively. Then we derive the representation of each logic unit based on its constituent viewpoint and aspect. A sentiment capsule architecture with a novel Routing by Bi-Agreement mechanism is proposed to identify the informative logic units and the sentiment-based representations at the user-item level for rating prediction. Extensive experiments are conducted over seven real-world datasets with diverse characteristics. Our results demonstrate that the proposed CARP obtains substantial performance gains over recently proposed state-of-the-art models in terms of prediction accuracy. Further analysis shows that our model can successfully discover interpretable reasons at a finer level of granularity.
Tasks
Published 2019-07-01
URL https://arxiv.org/abs/1907.00687v1
PDF https://arxiv.org/pdf/1907.00687v1.pdf
PWC https://paperswithcode.com/paper/a-capsule-network-for-recommendation-and
Repo https://github.com/WHUIR/CARP
Framework tf
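
One way to picture the logic units is as all (viewpoint, aspect) pairs fused into a joint representation. The sketch below shows only that pairing step, with a linear fusion as a placeholder; the sentiment capsules and the Routing by Bi-Agreement mechanism are not reproduced here.

```python
# Minimal sketch: form one logic unit per (user viewpoint, item aspect) pair.
import torch
import torch.nn as nn

class LogicUnits(nn.Module):
    def __init__(self, dim=32):
        super().__init__()
        self.fuse = nn.Linear(2 * dim, dim)   # placeholder fusion

    def forward(self, viewpoints, aspects):
        # viewpoints: (batch, V, dim) from user reviews
        # aspects:    (batch, A, dim) from item reviews
        B, V, D = viewpoints.shape
        A = aspects.shape[1]
        v = viewpoints.unsqueeze(2).expand(B, V, A, D)
        a = aspects.unsqueeze(1).expand(B, V, A, D)
        units = self.fuse(torch.cat([v, a], dim=-1))   # one unit per pair
        return units                                    # (batch, V, A, dim)
```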

Deep Predictive Motion Tracking in Magnetic Resonance Imaging: Application to Fetal Imaging

Title Deep Predictive Motion Tracking in Magnetic Resonance Imaging: Application to Fetal Imaging
Authors Ayush Singh, Seyed Sadegh Mohseni Salehi, Ali Gholipour
Abstract Fetal magnetic resonance imaging (MRI) is challenged by uncontrollable, large, and irregular fetal movements. Fetal MRI is performed in a fully interactive manner in which a technologist monitors motion to prescribe slices at right angles to the anatomy of interest. Current practice involves repeated acquisitions to ensure diagnostic-quality images are acquired, and the scans are retrospectively registered slice-by-slice to reconstruct 3D images. Nonetheless, manual monitoring of 3D fetal motion based on displayed 2D slices, and navigation at the level of stacks-of-slices (instead of slices), is sub-optimal and inefficient. The current process is highly operator-dependent, requires extensive training, and significantly increases the length of fetal MRI scans, which makes them difficult for pregnant women and costly. With that motivation, we present a new real-time, image-based motion tracking technique for MRI using deep learning that can significantly improve on the state of the art. Through a combination of spatial and temporal encoder-decoder networks, our system learns to predict the 3D pose of the fetal head based on dynamics of motion inferred directly from sequences of acquired slices. Compared to recent works that estimate the static 3D pose of the subject from slices, our method learns to predict the dynamics of 3D motion. We compared our trained network on held-out test sets (including data with different characteristics, e.g. different age ranges, and motion trajectories recorded from volunteer subjects) against networks designed for estimation as well as methods adapted to make predictions. The results of all estimation and prediction tasks show that we achieved reliable motion tracking in fetal MRI. This technique can be augmented with deep learning based fast anatomy detection, segmentation, and image registration techniques to build real-time motion tracking and navigation systems.
Tasks 3D Object Reconstruction, Image Registration, Motion Compensation, Motion Forecasting
Published 2019-09-25
URL https://arxiv.org/abs/1909.11625v1
PDF https://arxiv.org/pdf/1909.11625v1.pdf
PWC https://paperswithcode.com/paper/deep-predictive-motion-tracking-in-magnetic
Repo https://github.com/singhay/DeepPredictiveMotionTracking
Framework none
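
A minimal sketch of the spatial-plus-temporal idea: a small CNN encodes each acquired slice and an LSTM predicts the next 3D pose from the encoded sequence. The layer sizes and the 6-DoF pose output (3 rotations + 3 translations) are assumptions for illustration, not the authors' architecture.

```python
# Minimal sketch: per-slice CNN encoder + LSTM over the slice sequence,
# predicting the pose for the next time step.
import torch.nn as nn

class SliceToPosePredictor(nn.Module):
    def __init__(self, hidden=128):
        super().__init__()
        self.encoder = nn.Sequential(          # spatial encoder per slice
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.temporal = nn.LSTM(32, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 6)       # assumed 6-DoF pose output

    def forward(self, slices):                 # slices: (batch, time, 1, H, W)
        B, T = slices.shape[:2]
        feats = self.encoder(slices.flatten(0, 1)).view(B, T, -1)
        out, _ = self.temporal(feats)
        return self.head(out[:, -1])           # pose predicted for next step
```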

FastDVDnet: Towards Real-Time Video Denoising Without Explicit Motion Estimation

Title FastDVDnet: Towards Real-Time Video Denoising Without Explicit Motion Estimation
Authors Matias Tassano, Julie Delon, Thomas Veit
Abstract In this paper, we propose a state-of-the-art video denoising algorithm based on a convolutional neural network architecture. Until recently, video denoising with neural networks had been a largely underexplored domain, and existing methods could not compete with the performance of the best patch-based methods. The approach we introduce in this paper, called FastDVDnet, shows similar or better performance than other state-of-the-art competitors with significantly lower computing times. In contrast to other existing neural network denoisers, our algorithm exhibits several desirable properties, such as fast runtimes and the ability to handle a wide range of noise levels with a single network model. The characteristics of its architecture make it possible to avoid a costly motion compensation stage while achieving excellent performance. The combination of its denoising performance and lower computational load makes this algorithm attractive for practical denoising applications. We compare our method with different state-of-the-art algorithms, both visually and with respect to objective quality metrics.
Tasks Denoising, Motion Compensation, Motion Estimation, Video Denoising
Published 2019-07-01
URL https://arxiv.org/abs/1907.01361v1
PDF https://arxiv.org/pdf/1907.01361v1.pdf
PWC https://paperswithcode.com/paper/fastdvdnet-towards-real-time-video-denoising
Repo https://github.com/m-tassano/fastdvdnet
Framework pytorch
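
The architecture can be sketched as a two-step cascade over five consecutive frames: a first denoising block maps each of the three overlapping frame triplets to an intermediate frame, and a second block fuses the three intermediates into the output. The stand-in CNN blocks below replace the paper's U-Net-style blocks, and any noise-level conditioning is omitted.

```python
# Minimal sketch of a two-step frame-triplet cascade (no motion estimation).
import torch
import torch.nn as nn

def denoise_block(in_frames):
    # Stand-in CNN taking `in_frames` RGB frames stacked on channels.
    return nn.Sequential(
        nn.Conv2d(in_frames * 3, 32, 3, padding=1), nn.ReLU(),
        nn.Conv2d(32, 3, 3, padding=1),
    )

class FastDVDnetSketch(nn.Module):
    def __init__(self):
        super().__init__()
        self.block1 = denoise_block(3)    # shared across the three triplets
        self.block2 = denoise_block(3)

    def forward(self, frames):            # frames: (batch, 5, 3, H, W)
        triplets = [frames[:, i:i + 3].flatten(1, 2) for i in range(3)]
        inter = [self.block1(t) for t in triplets]      # three intermediates
        return self.block2(torch.cat(inter, dim=1))     # fused center frame
```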

MFQE 2.0: A New Approach for Multi-frame Quality Enhancement on Compressed Video

Title MFQE 2.0: A New Approach for Multi-frame Quality Enhancement on Compressed Video
Authors Qunliang Xing, Zhenyu Guan, Mai Xu, Ren Yang, Tie Liu, Zulin Wang
Abstract The past few years have witnessed great success in applying deep learning to enhance the quality of compressed images/video. The existing approaches mainly focus on enhancing the quality of a single frame, without considering the similarity between consecutive frames. Since, as investigated in this paper, quality fluctuates heavily across compressed video frames, frame similarity can be utilized to enhance low-quality frames given their neighboring high-quality frames. This task is Multi-Frame Quality Enhancement (MFQE). Accordingly, this paper proposes an MFQE approach for compressed video, as the first attempt in this direction. In our approach, we first develop a Bidirectional Long Short-Term Memory (BiLSTM) based detector to locate Peak Quality Frames (PQFs) in compressed video. Then, a novel Multi-Frame Convolutional Neural Network (MF-CNN) is designed to enhance the quality of compressed video, taking a non-PQF and its two nearest PQFs as input. In MF-CNN, motion between the non-PQF and the PQFs is compensated by a motion compensation subnet. Subsequently, a quality enhancement subnet fuses the non-PQF and compensated PQFs, and then reduces the compression artifacts of the non-PQF. PQF quality is also enhanced in the same way. Finally, experiments validate the effectiveness and generalization ability of our MFQE approach in advancing the state-of-the-art quality enhancement of compressed video. The code of our MFQE approach is available at https://github.com/RyanXingQL/MFQEv2.0.git.
Tasks Motion Compensation
Published 2019-02-26
URL https://arxiv.org/abs/1902.09707v4
PDF https://arxiv.org/pdf/1902.09707v4.pdf
PWC https://paperswithcode.com/paper/mfqe-20-a-new-approach-for-multi-frame
Repo https://github.com/RyanXingQL/MFQEv2.0
Framework tf
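
The MFQE control flow, detect PQFs and then enhance each non-PQF from its two nearest PQFs, can be sketched independently of the networks. Below, detect_pqf and enhance are hypothetical callables standing in for the paper's BiLSTM detector and motion-compensated MF-CNN.

```python
# Minimal sketch of the PQF-detect-then-enhance loop.
def mfqe_enhance(frames, detect_pqf, enhance):
    pqf_idx = [i for i, f in enumerate(frames) if detect_pqf(f)]
    out = list(frames)
    for i, frame in enumerate(frames):
        if i in pqf_idx:
            continue
        prev_pqf = max((j for j in pqf_idx if j < i), default=None)
        next_pqf = min((j for j in pqf_idx if j > i), default=None)
        if prev_pqf is None or next_pqf is None:
            continue                    # edge frames without two flanking PQFs
        out[i] = enhance(frames[prev_pqf], frame, frames[next_pqf])
    return out
```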

A Machine Learning Benchmark for Facies Classification

Title A Machine Learning Benchmark for Facies Classification
Authors Yazeed Alaudah, Patrycja Michalowicz, Motaz Alfarraj, Ghassan AlRegib
Abstract The recent interest in using deep learning for seismic interpretation tasks, such as facies classification, has been facing a significant obstacle, namely the absence of large publicly available annotated datasets for training and testing models. As a result, researchers have often resorted to annotating their own training and testing data. However, different researchers may annotate different classes or use different train and test splits. In addition, it is common for papers that apply machine learning to facies classification to omit quantitative results and instead rely solely on visual inspection. All of these practices have led to subjective results and have greatly hindered the ability to compare different machine learning models against each other and to understand the advantages and disadvantages of each approach. To address these issues, we open-source a fully annotated 3D geological model of the Netherlands F3 Block. This model is based on the study of the 3D seismic data in addition to 26 well logs, and is grounded in a careful study of the geology of the region. Furthermore, we propose two baseline models for facies classification based on a deconvolution network architecture and make their code publicly available. Finally, we propose a scheme for evaluating different models on this dataset, and we share the results of our baseline models. In addition to making the dataset and the code publicly available, this work helps advance research in this area by creating an objective benchmark for comparing the results of different machine learning approaches for facies classification.
Tasks Facies Classification, Seismic Interpretation
Published 2019-01-12
URL http://arxiv.org/abs/1901.07659v2
PDF http://arxiv.org/pdf/1901.07659v2.pdf
PWC https://paperswithcode.com/paper/a-machine-learning-benchmark-for-facies
Repo https://github.com/yalaudah/facies_classification_benchmark
Framework pytorch

DEEP-BO for Hyperparameter Optimization of Deep Networks

Title DEEP-BO for Hyperparameter Optimization of Deep Networks
Authors Hyunghun Cho, Yongjin Kim, Eunjung Lee, Daeyoung Choi, Yongjae Lee, Wonjong Rhee
Abstract The performance of deep neural networks (DNNs) is very sensitive to the particular choice of hyperparameters. To make matters worse, the shape of the learning curve can be significantly affected when a technique like batchnorm is used. As a result, hyperparameter optimization of deep networks can be much more challenging than for traditional machine learning models. In this work, we start from well-known Bayesian Optimization solutions and provide enhancement strategies specifically designed for hyperparameter optimization of deep networks. The resulting algorithm is named DEEP-BO (Diversified, Early-termination-Enabled, and Parallel Bayesian Optimization). When evaluated over six DNN benchmarks, DEEP-BO outperforms or shows comparable performance to well-known solutions including GP-Hedge, Hyperband, BOHB, Median Stopping Rule, and Learning Curve Extrapolation. The code used is made publicly available at https://github.com/snu-adsl/DEEP-BO.
Tasks Hyperparameter Optimization
Published 2019-05-23
URL https://arxiv.org/abs/1905.09680v1
PDF https://arxiv.org/pdf/1905.09680v1.pdf
PWC https://paperswithcode.com/paper/deep-bo-for-hyperparameter-optimization-of-1
Repo https://github.com/snu-adsl/DEEP-BO
Framework tf
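
A minimal sketch of the "diversified" ingredient: rather than committing to one Bayesian-optimization model, cycle through several. The round-robin schedule, the random candidate pool, and the three unit-interval hyperparameters below are illustrative assumptions; the paper's early termination and parallelism are omitted.

```python
# Minimal sketch of diversified sequential optimization over BO models.
import random

def diversified_bo(objective, optimizers, n_iters=50, pool_size=100):
    """optimizers: list of (fit, suggest) pairs; fit takes the history,
    suggest proposes one configuration from a candidate pool."""
    history = []                                   # (config, loss) pairs
    for t in range(n_iters):
        pool = [tuple(random.uniform(0, 1) for _ in range(3))
                for _ in range(pool_size)]         # 3 hyperparameters in [0,1]
        fit, suggest = optimizers[t % len(optimizers)]   # round-robin diversity
        fit(history)
        config = suggest(pool, history)
        history.append((config, objective(config)))
    return min(history, key=lambda cs: cs[1])      # best (config, loss)
```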

Quantifying contribution and propagation of error from computational steps, algorithms and hyperparameter choices in image classification pipelines

Title Quantifying contribution and propagation of error from computational steps, algorithms and hyperparameter choices in image classification pipelines
Authors Aritra Chowdhury, Malik Magdon-Ismail, Bulent Yener
Abstract Data science relies on pipelines that are organized in the form of interdependent computational steps. Each step consists of various candidate algorithms that may be used to perform a particular function, and each algorithm has several hyperparameters. Algorithms and hyperparameters must be optimized as a whole to produce the best performance. Typical machine learning pipelines consist of complex algorithms in each of the steps. Not only is the selection process combinatorial, but it is also important to interpret and understand the pipelines. We propose a method to quantify the importance of different components in the pipeline by computing an error contribution relative to an agnostic choice of computational steps, algorithms, and hyperparameters. We also propose a methodology to quantify the propagation of error from individual components of the pipeline with the help of a naive set of benchmark algorithms not involved in the pipeline. We demonstrate our methodology on image classification pipelines. The agnostic and naive methodologies quantify the error contribution and propagation, respectively, from the computational steps, algorithms, and hyperparameters of the image classification pipeline. We show that algorithm selection and hyperparameter optimization methods such as grid search, random search, and Bayesian optimization can be used to quantify the error contribution and propagation, and that random search quantifies them more accurately than Bayesian optimization. This methodology can be used by domain experts to understand machine learning and data analysis pipelines in terms of their individual components, which can help in prioritizing different components of the pipeline.
Tasks Hyperparameter Optimization, Image Classification
Published 2019-02-21
URL http://arxiv.org/abs/1903.00405v1
PDF http://arxiv.org/pdf/1903.00405v1.pdf
PWC https://paperswithcode.com/paper/quantifying-contribution-and-propagation-of
Repo https://github.com/AriChow/error_propagation
Framework none
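
The agnostic error contribution can be sketched as: evaluate the pipeline over a full grid of component choices, then compare the mean error with one component fixed to a specific choice against the mean over all configurations. The evaluate callable and the grid format below are illustrative, not the paper's exact formulation.

```python
# Minimal sketch of error contribution relative to an "agnostic" choice.
from itertools import product
from statistics import mean

def error_contributions(component_choices, evaluate):
    """component_choices: {component_name: [choice, ...]};
    evaluate: maps a dict of concrete choices to a pipeline error."""
    names = list(component_choices)
    grid = [dict(zip(names, combo))
            for combo in product(*(component_choices[n] for n in names))]
    errors = {tuple(cfg.items()): evaluate(cfg) for cfg in grid}
    agnostic = mean(errors.values())        # error of the agnostic choice
    contributions = {}
    for name in names:
        for choice in component_choices[name]:
            fixed = [e for cfg, e in errors.items()
                     if dict(cfg)[name] == choice]
            contributions[(name, choice)] = mean(fixed) - agnostic
    return contributions
```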

VideoNavQA: Bridging the Gap between Visual and Embodied Question Answering

Title VideoNavQA: Bridging the Gap between Visual and Embodied Question Answering
Authors Cătălina Cangea, Eugene Belilovsky, Pietro Liò, Aaron Courville
Abstract Embodied Question Answering (EQA) is a recently proposed task where an agent is placed in a rich 3D environment and must act based solely on its egocentric input to answer a given question. The desired outcome is that the agent learns to combine capabilities such as scene understanding, navigation, and language understanding in order to perform complex reasoning in the visual world. However, initial advancements combining standard vision and language methods with imitation and reinforcement learning algorithms have shown that EQA might be too complex and challenging for these techniques. In order to investigate the feasibility of EQA-type tasks, we build the VideoNavQA dataset, which contains pairs of questions and videos generated in the House3D environment. The goal of this dataset is to assess question-answering performance from nearly ideal navigation paths, while considering a much more complete variety of questions than current instantiations of the EQA task. We investigate several models, adapted from popular VQA methods, on this new benchmark. This establishes an initial understanding of how well VQA-style methods can perform within this novel EQA paradigm.
Tasks Embodied Question Answering, Question Answering, Scene Understanding, Visual Question Answering
Published 2019-08-14
URL https://arxiv.org/abs/1908.04950v1
PDF https://arxiv.org/pdf/1908.04950v1.pdf
PWC https://paperswithcode.com/paper/videonavqa-bridging-the-gap-between-visual
Repo https://github.com/catalina17/VideoNavQA
Framework pytorch

Fine-tuning Pre-Trained Transformer Language Models to Distantly Supervised Relation Extraction

Title Fine-tuning Pre-Trained Transformer Language Models to Distantly Supervised Relation Extraction
Authors Christoph Alt, Marc Hübner, Leonhard Hennig
Abstract Distantly supervised relation extraction is widely used to extract relational facts from text, but suffers from noisy labels. Current relation extraction methods try to alleviate the noise by multi-instance learning and by providing supporting linguistic and contextual information to more efficiently guide the relation classification. While achieving state-of-the-art results, we observed these models to be biased towards recognizing a limited set of relations with high precision, while ignoring those in the long tail. To address this gap, we utilize a pre-trained language model, the OpenAI Generative Pre-trained Transformer (GPT) [Radford et al., 2018]. The GPT and similar models have been shown to capture semantic and syntactic features, and also a notable amount of “common-sense” knowledge, which we hypothesize are important features for recognizing a more diverse set of relations. By extending the GPT to the distantly supervised setting, and fine-tuning it on the NYT10 dataset, we show that it predicts a larger set of distinct relation types with high confidence. Manual and automated evaluation of our model shows that it achieves a state-of-the-art AUC score of 0.422 on the NYT10 dataset, and performs especially well at higher recall levels.
Tasks Common Sense Reasoning, Language Modelling, Relation Classification, Relation Extraction
Published 2019-06-19
URL https://arxiv.org/abs/1906.08646v1
PDF https://arxiv.org/pdf/1906.08646v1.pdf
PWC https://paperswithcode.com/paper/fine-tuning-pre-trained-transformer-language-1
Repo https://github.com/DFKI-NLP/DISTRE
Framework pytorch
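
A minimal sketch of fine-tuning a pre-trained transformer language model for relation classification, using Hugging Face's GPT-2 sequence-classification head as a stand-in for the paper's original OpenAI GPT setup; the entity-marker input format, the example label, and the relation-label count are assumptions.

```python
# Minimal sketch: fine-tune a GPT-style LM with a classification head
# on relation-labeled sentences (GPT-2 stands in for the original GPT).
import torch
from transformers import GPT2TokenizerFast, GPT2ForSequenceClassification

num_relations = 53                       # assumed size of the relation inventory
tok = GPT2TokenizerFast.from_pretrained("gpt2")
tok.pad_token = tok.eos_token            # GPT-2 has no pad token by default
model = GPT2ForSequenceClassification.from_pretrained(
    "gpt2", num_labels=num_relations, pad_token_id=tok.pad_token_id)

# Hypothetical entity-marked input format.
sentence = "[E1] Barack Obama [/E1] was born in [E2] Hawaii [/E2] ."
batch = tok([sentence], return_tensors="pt", padding=True)
labels = torch.tensor([7])               # illustrative relation id
loss = model(**batch, labels=labels).loss
loss.backward()                          # then step any optimizer to fine-tune
```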

Attentional Encoder Network for Targeted Sentiment Classification

Title Attentional Encoder Network for Targeted Sentiment Classification
Authors Youwei Song, Jiahai Wang, Tao Jiang, Zhiyue Liu, Yanghui Rao
Abstract Targeted sentiment classification aims at determining the sentiment polarity towards specific targets. Most previous approaches model context and target words with RNNs and attention. However, RNNs are difficult to parallelize, and truncated backpropagation through time makes it difficult to remember long-term patterns. To address these issues, this paper proposes an Attentional Encoder Network (AEN) which eschews recurrence and employs attention-based encoders to model the interaction between context and target. We raise the label unreliability issue and introduce label smoothing regularization. We also apply pre-trained BERT to this task and obtain new state-of-the-art results. Experiments and analysis demonstrate the effectiveness and light weight of our model.
Tasks Aspect-Based Sentiment Analysis, Sentiment Analysis
Published 2019-02-25
URL http://arxiv.org/abs/1902.09314v2
PDF http://arxiv.org/pdf/1902.09314v2.pdf
PWC https://paperswithcode.com/paper/attentional-encoder-network-for-targeted
Repo https://github.com/songyouwei/ABSA-PyTorch
Framework pytorch
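
Of the pieces in this abstract, label smoothing regularization is the easiest to show compactly. A minimal sketch for a 3-way sentiment classifier follows; the smoothing weight eps is illustrative, and the attentional encoder itself is not reproduced.

```python
# Minimal sketch of label smoothing regularization on classifier logits.
import torch
import torch.nn.functional as F

def label_smoothing_loss(logits, targets, eps=0.2):
    n_classes = logits.shape[-1]
    log_probs = F.log_softmax(logits, dim=-1)
    # Smooth targets: (1 - eps) on the true class, eps spread uniformly.
    smoothed = torch.full_like(log_probs, eps / n_classes)
    smoothed.scatter_(1, targets.unsqueeze(1), 1 - eps + eps / n_classes)
    return -(smoothed * log_probs).sum(dim=-1).mean()

logits = torch.randn(4, 3, requires_grad=True)   # (batch, {neg, neu, pos})
loss = label_smoothing_loss(logits, torch.tensor([0, 2, 1, 2]))
loss.backward()
```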