February 1, 2020

Paper Group AWR 115

HetConv: Heterogeneous Kernel-Based Convolutions for Deep CNNs. Delving Deep Into Hybrid Annotations for 3D Human Recovery in the Wild. Establishing Strong Baselines for the New Decade: Sequence Tagging, Syntactic and Semantic Parsing with BERT. SCOPS: Self-Supervised Co-Part Segmentation. Interactive Matching Network for Multi-Turn Response Select …

HetConv: Heterogeneous Kernel-Based Convolutions for Deep CNNs


Title	HetConv: Heterogeneous Kernel-Based Convolutions for Deep CNNs
Authors	Pravendra Singh, Vinay Kumar Verma, Piyush Rai, Vinay P. Namboodiri
Abstract	We present a novel deep learning architecture in which the convolution operation leverages heterogeneous kernels. The proposed HetConv (Heterogeneous Kernel-Based Convolution) reduces the computation (FLOPs) and the number of parameters compared to the standard convolution operation while still maintaining representational efficiency. To show the effectiveness of our proposed convolution, we present extensive experimental results on standard convolutional neural network (CNN) architectures such as VGG and ResNet. We find that after replacing the standard convolutional filters in these architectures with our proposed HetConv filters, we achieve a 3X to 8X FLOPs-based improvement in speed while still maintaining (and sometimes improving) the accuracy. We also compare our proposed convolutions with group/depth-wise convolutions and show that HetConv achieves more FLOPs reduction with significantly higher accuracy.
Tasks
Published	2019-03-11
URL	http://arxiv.org/abs/1903.04120v2
PDF	http://arxiv.org/pdf/1903.04120v2.pdf
PWC	https://paperswithcode.com/paper/hetconv-heterogeneous-kernel-based
Repo	https://github.com/irvinxav/Efficient-HetConv-Heterogeneous-Kernel-Based-Convolutions
Framework	pytorch
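
The abstract does not spell out how the heterogeneous kernels are laid out. As a rough sketch, assume (this is an assumption, not something stated above) that each filter applies a K×K kernel to 1/P of its input channels and a 1×1 kernel to the remaining channels. The function names and the `part` parameter below are hypothetical. Counting multiply-accumulates under that assumption gives a feel for where a FLOPs reduction of this size could come from:

```python
def conv_macs(out_size, in_channels, out_channels, kernel):
    """Multiply-accumulates for a standard convolution layer."""
    return out_size * out_size * in_channels * out_channels * kernel * kernel


def hetconv_macs(out_size, in_channels, out_channels, kernel, part):
    """Multiply-accumulates when, by assumption, each filter uses a
    kernel x kernel window on in_channels // part channels and a
    1 x 1 window on all remaining channels."""
    large = in_channels // part      # channels seen through a K x K kernel
    small = in_channels - large      # channels seen through a 1 x 1 kernel
    per_filter = large * kernel * kernel + small
    return out_size * out_size * out_channels * per_filter


def reduction_factor(out_size, in_channels, out_channels, kernel, part):
    """How many times cheaper the heterogeneous layer is in this sketch."""
    standard = conv_macs(out_size, in_channels, out_channels, kernel)
    hetero = hetconv_macs(out_size, in_channels, out_channels, kernel, part)
    return standard / hetero
```

With `kernel=3` and `part=4` the sketch gives a 3x reduction. Larger values of `part` push the factor higher, approaching K² = 9 under these assumptions.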

Delving Deep Into Hybrid Annotations for 3D Human Recovery in the Wild


Title	Delving Deep Into Hybrid Annotations for 3D Human Recovery in the Wild
Authors	Yu Rong, Ziwei Liu, Cheng Li, Kaidi Cao, Chen Change Loy
Abstract	Though much progress has been achieved in single-image 3D human recovery, estimating a 3D model for in-the-wild images remains a formidable challenge. The reason lies in the fact that obtaining high-quality 3D annotations for in-the-wild images is an extremely hard task that consumes an enormous amount of resources and manpower. To tackle this problem, previous methods adopt a hybrid training strategy that exploits multiple heterogeneous types of annotations, including 3D and 2D, while leaving the efficacy of each annotation not thoroughly investigated. In this work, we aim to perform a comprehensive study on the cost and effectiveness trade-off between different annotations. Specifically, we focus on the challenging task of in-the-wild 3D human recovery from single images when paired 3D annotations are not fully available. Through extensive experiments, we obtain several observations: 1) 3D annotations are efficient, whereas traditional 2D annotations such as 2D keypoints and body part segmentation are less competent in guiding 3D human recovery. 2) Dense correspondence such as DensePose is effective. When there are no paired in-the-wild 3D annotations available, the model exploiting dense correspondence can achieve 92% of the performance compared to a model trained with paired 3D data. We show that incorporating dense correspondence into in-the-wild 3D human recovery is promising and competitive due to its high efficiency and relatively low annotating cost. Our model trained with dense correspondence can serve as a strong reference for future research.
Tasks
Published	2019-08-18
URL	https://arxiv.org/abs/1908.06442v2
PDF	https://arxiv.org/pdf/1908.06442v2.pdf
PWC	https://paperswithcode.com/paper/delving-deep-into-hybrid-annotations-for-3d
Repo	https://github.com/penincillin/DCT_ICCV-2019
Framework	pytorch

Establishing Strong Baselines for the New Decade: Sequence Tagging, Syntactic and Semantic Parsing with BERT


Title	Establishing Strong Baselines for the New Decade: Sequence Tagging, Syntactic and Semantic Parsing with BERT
Authors	Han He, Jinho D. Choi
Abstract	This paper presents new state-of-the-art models for three tasks, part-of-speech tagging, syntactic parsing, and semantic parsing, using the cutting-edge contextualized embedding framework known as BERT. For each task, we first replicate and simplify the current state-of-the-art approach to enhance its model efficiency. We then evaluate our simplified approaches on those three tasks using token embeddings generated by BERT. 12 datasets in both English and Chinese are used for our experiments. The BERT models outperform the previously best-performing models by 2.5% on average (7.5% for the most significant case). Moreover, an in-depth analysis on the impact of BERT embeddings is provided using self-attention, which helps understanding in this rich yet representation. All models and source codes are available in public so that researchers can improve upon and utilize them to establish strong baselines for the next decade.
Tasks	Part-Of-Speech Tagging, Semantic Parsing
Published	2019-08-14
URL	https://arxiv.org/abs/1908.04943v2
PDF	https://arxiv.org/pdf/1908.04943v2.pdf
PWC	https://paperswithcode.com/paper/establishing-strong-baselines-for-the-new
Repo	https://github.com/emorynlp/bert-2019
Framework	mxnet

SCOPS: Self-Supervised Co-Part Segmentation


Title	SCOPS: Self-Supervised Co-Part Segmentation
Authors	Wei-Chih Hung, Varun Jampani, Sifei Liu, Pavlo Molchanov, Ming-Hsuan Yang, Jan Kautz
Abstract	Parts provide a good intermediate representation of objects that is robust with respect to camera, pose and appearance variations. Existing work on part segmentation is dominated by supervised approaches that rely on large amounts of manual annotations and cannot generalize to unseen object categories. We propose a self-supervised deep learning approach for part segmentation, where we devise several loss functions that aid in predicting part segments that are geometrically concentrated, robust to object variations and semantically consistent across different object instances. Extensive experiments on different types of image collections demonstrate that our approach can produce part segments that adhere to object boundaries and are also more semantically consistent across object instances compared to existing self-supervised techniques.
Tasks
Published	2019-05-03
URL	https://arxiv.org/abs/1905.01298v1
PDF	https://arxiv.org/pdf/1905.01298v1.pdf
PWC	https://paperswithcode.com/paper/scops-self-supervised-co-part-segmentation
Repo	https://github.com/NVlabs/SCOPS
Framework	pytorch

Interactive Matching Network for Multi-Turn Response Selection in Retrieval-Based Chatbots


Title	Interactive Matching Network for Multi-Turn Response Selection in Retrieval-Based Chatbots
Authors	Jia-Chen Gu, Zhen-Hua Ling, Quan Liu
Abstract	In this paper, we propose an interactive matching network (IMN) for the multi-turn response selection task. First, IMN constructs word representations from three aspects to address the challenge of out-of-vocabulary (OOV) words. Second, an attentive hierarchical recurrent encoder (AHRE), which is capable of encoding sentences hierarchically and generating more descriptive representations by aggregating with an attention mechanism, is designed. Finally, the bidirectional interactions between whole multi-turn contexts and response candidates are calculated to derive the matching information between them. Experiments on four public datasets show that IMN outperforms the baseline models on all metrics, achieving a new state-of-the-art performance and demonstrating compatibility across domains for multi-turn response selection.
Tasks	Conversational Response Selection
Published	2019-01-07
URL	https://arxiv.org/abs/1901.01824v2
PDF	https://arxiv.org/pdf/1901.01824v2.pdf
PWC	https://paperswithcode.com/paper/interactive-matching-network-for-multi-turn
Repo	https://github.com/JasonForJoy/IMN
Framework	tf

On Metrics to Assess the Transferability of Machine Learning Models in Non-Intrusive Load Monitoring


Title	On Metrics to Assess the Transferability of Machine Learning Models in Non-Intrusive Load Monitoring
Authors	Christoph Klemenjak, Anthony Faustine, Stephen Makonin, Wilfried Elmenreich
Abstract	To assess the performance of load disaggregation algorithms it is common practice to train a candidate algorithm on data from one or multiple households and subsequently apply cross-validation by evaluating the classification and energy estimation performance on unseen portions of the dataset derived from the same households. With an emerging discussion of transferability in Non-Intrusive Load Monitoring (NILM), there is a need for domain-specific metrics to assess the performance of NILM algorithms on new test scenarios such as unseen buildings. In this paper, we discuss several metrics to assess the generalisation ability of NILM algorithms. These metrics target different aspects of performance evaluation in NILM and are meant to complement the traditional performance evaluation approach. We demonstrate how our metrics can be utilised to evaluate NILM algorithms by means of two case studies. We conduct our studies on several energy consumption datasets and take into consideration five state-of-the-art as well as four baseline NILM solutions. Finally, we formulate research challenges for future work.
Tasks	Non-Intrusive Load Monitoring
Published	2019-12-12
URL	https://arxiv.org/abs/1912.06200v1
PDF	https://arxiv.org/pdf/1912.06200v1.pdf
PWC	https://paperswithcode.com/paper/on-metrics-to-assess-the-transferability-of
Repo	https://github.com/klemenjak/nilm-transferability-metrics
Framework	none

Misleading Metadata Detection on YouTube


Title	Misleading Metadata Detection on YouTube
Authors	Priyank Palod, Ayush Patwari, Sudhanshu Bahety, Saurabh Bagchi, Pawan Goyal
Abstract	YouTube is the leading social media platform for sharing videos. As a result, it is plagued with misleading content that includes staged videos presented as real footage from an incident, videos with misrepresented context and videos where audio/video content is morphed. We tackle the problem of detecting such misleading videos as a supervised classification task. We develop UCNet, a deep network to detect fake videos, and perform our experiments on two datasets: VAVD, created by us, and the publicly available FVC. We achieve a macro-averaged F-score of 0.82 while training and testing on a 70:30 split of FVC, while the baseline model scores 0.36. We find that the proposed model generalizes well when trained on one dataset and tested on the other.
Tasks
Published	2019-01-25
URL	http://arxiv.org/abs/1901.08759v1
PDF	http://arxiv.org/pdf/1901.08759v1.pdf
PWC	https://paperswithcode.com/paper/misleading-metadata-detection-on-youtube
Repo	https://github.com/ucnet01/UCNet_Implementation
Framework	pytorch
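
The headline number in this abstract is a macro-averaged F-score. As a minimal sketch of how that metric is usually computed (per-class F1, then an unweighted mean over classes), with hypothetical function names and no connection to the authors' evaluation code:

```python
def f1_for_class(y_true, y_pred, label):
    """F1 score treating `label` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    if tp == 0:
        # No true positives: precision or recall is zero (or undefined),
        # so the score is taken to be zero by convention.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every label that appears."""
    labels = sorted(set(y_true) | set(y_pred))
    return sum(f1_for_class(y_true, y_pred, c) for c in labels) / len(labels)
```

Because every class counts equally, a classifier that ignores the minority class is penalised heavily, which is one reason the macro average is a common choice for imbalanced fake-versus-real data.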

Seamless Scene Segmentation


Title	Seamless Scene Segmentation
Authors	Lorenzo Porzi, Samuel Rota Bulò, Aleksander Colovic, Peter Kontschieder
Abstract	In this work we introduce a novel, CNN-based architecture that can be trained end-to-end to deliver seamless scene segmentation results. Our goal is to predict consistent semantic segmentation and detection results by means of a panoptic output format, going beyond the simple combination of independently trained segmentation and detection models. The proposed architecture takes advantage of a novel segmentation head that seamlessly integrates multi-scale features generated by a Feature Pyramid Network with contextual information conveyed by a light-weight DeepLab-like module. As an additional contribution, we review the panoptic metric and propose an alternative that overcomes its limitations when evaluating non-instance categories. Our proposed network architecture yields state-of-the-art results on three challenging street-level datasets, i.e. Cityscapes, Indian Driving Dataset and Mapillary Vistas.
Tasks	Panoptic Segmentation, Scene Segmentation, Semantic Segmentation
Published	2019-05-03
URL	https://arxiv.org/abs/1905.01220v1
PDF	https://arxiv.org/pdf/1905.01220v1.pdf
PWC	https://paperswithcode.com/paper/seamless-scene-segmentation
Repo	https://github.com/mapillary/seamseg
Framework	pytorch
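
For context on the "panoptic metric" the abstract reviews, here is a small sketch of the commonly used panoptic quality (PQ) score for a single class: the sum of IoUs over matched segment pairs divided by the number of matches plus half the unmatched predictions and half the unmatched ground-truth segments. This is the standard metric as generally defined, not the alternative the authors propose, and the function name and inputs are hypothetical:

```python
def panoptic_quality(matched_ious, num_false_positives, num_false_negatives):
    """Panoptic quality for one class.

    matched_ious: IoU of each predicted/ground-truth pair counted as a
        true positive (conventionally pairs with IoU > 0.5).
    num_false_positives: predicted segments with no match.
    num_false_negatives: ground-truth segments with no match.
    """
    denominator = (len(matched_ious)
                   + 0.5 * num_false_positives
                   + 0.5 * num_false_negatives)
    if denominator == 0:
        # The class is absent from both prediction and ground truth.
        return 0.0
    return sum(matched_ious) / denominator
```

A dataset-level score is typically the mean of this value over classes.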

Simitate: A Hybrid Imitation Learning Benchmark


Title	Simitate: A Hybrid Imitation Learning Benchmark
Authors	Raphael Memmesheimer, Ivanna Mykhalchyshyna, Viktor Seib, Dietrich Paulus
Abstract	We present Simitate, a hybrid benchmarking suite targeting the evaluation of approaches for imitation learning. A dataset containing 1938 sequences where humans perform daily activities in a realistic environment is presented. The dataset is strongly coupled with an integration into a simulator. RGB and depth streams with a resolution of 960×540 at 30Hz and accurate ground truth poses for the demonstrator’s hand, as well as the object in 6 DOF at 120Hz are provided. Along with our dataset we provide the 3D model of the used environment, labeled object images and pre-trained models. A benchmarking suite that aims at fostering comparability and reproducibility supports the development of imitation learning approaches. Further, we propose and integrate evaluation metrics on assessing the quality of effect and trajectory of the imitation performed in simulation. Simitate is available on our project website: https://agas.uni-koblenz.de/data/simitate/.
Tasks	Imitation Learning
Published	2019-05-15
URL	https://arxiv.org/abs/1905.06002v1
PDF	https://arxiv.org/pdf/1905.06002v1.pdf
PWC	https://paperswithcode.com/paper/simitate-a-hybrid-imitation-learning
Repo	https://github.com/airglow/simitate
Framework	pytorch

TDAM: a Topic-Dependent Attention Model for Sentiment Analysis


Title	TDAM: a Topic-Dependent Attention Model for Sentiment Analysis
Authors	Gabriele Pergola, Lin Gui, Yulan He
Abstract	We propose a topic-dependent attention model for sentiment classification and topic extraction. Our model assumes that a global topic embedding is shared across documents and employs an attention mechanism to derive local topic embedding for words and sentences. These are subsequently incorporated in a modified Gated Recurrent Unit (GRU) for sentiment classification and extraction of topics bearing different sentiment polarities. Those topics emerge from the words’ local topic embeddings learned by the internal attention of the GRU cells in the context of a multi-task learning framework. In this paper, we present the hierarchical architecture, the new GRU unit and the experiments conducted on users’ reviews which demonstrate classification performance on a par with the state-of-the-art methodologies for sentiment classification and topic coherence outperforming the current approaches for supervised topic extraction. In addition, our model is able to extract coherent aspect-sentiment clusters despite using no aspect-level annotations for training.
Tasks	Multi-Task Learning, Sentiment Analysis
Published	2019-08-18
URL	https://arxiv.org/abs/1908.06435v1
PDF	https://arxiv.org/pdf/1908.06435v1.pdf
PWC	https://paperswithcode.com/paper/tdam-a-topic-dependent-attention-model-for
Repo	https://github.com/gabrer/topic_dependent_attention_model
Framework	none

Learning Calibratable Policies using Programmatic Style-Consistency


Title	Learning Calibratable Policies using Programmatic Style-Consistency
Authors	Eric Zhan, Albert Tseng, Yisong Yue, Adith Swaminathan, Matthew Hausknecht
Abstract	We study the problem of controllable generation of long-term sequential behaviors. Solutions to this important problem would enable many applications, such as calibrating behaviors of AI agents in games or predicting player trajectories in sports. In contrast to the well-studied areas of controllable generation of images, text, and speech, there are two questions that pose significant challenges when generating long-term behaviors: how should we specify the factors of variation to control, and how can we ensure that the generated temporal behavior faithfully demonstrates diverse styles? In this paper, we leverage large amounts of raw behavioral data to learn policies that can be calibrated to generate a diverse range of behavior styles (e.g., aggressive versus passive play in sports). Inspired by recent work on leveraging programmatic labeling functions, we present a novel framework that combines imitation learning with data programming to learn style-calibratable policies. Our primary technical contribution is a formal notion of style-consistency as a learning objective, and its integration with conventional imitation learning approaches. We evaluate our framework using demonstrations from professional basketball players and agents in the MuJoCo physics environment, and show that our learned policies can be calibrated to generate interesting behavior styles in both domains.
Tasks	Imitation Learning
Published	2019-10-02
URL	https://arxiv.org/abs/1910.01179v2
PDF	https://arxiv.org/pdf/1910.01179v2.pdf
PWC	https://paperswithcode.com/paper/learning-calibratable-policies-using
Repo	https://github.com/ezhan94/calibratable-style-consistency
Framework	pytorch

Densely Connected Search Space for More Flexible Neural Architecture Search


Title	Densely Connected Search Space for More Flexible Neural Architecture Search
Authors	Jiemin Fang, Yuzhu Sun, Qian Zhang, Yuan Li, Wenyu Liu, Xinggang Wang
Abstract	Neural architecture search (NAS) has dramatically advanced the development of neural network design. We revisit the search space design in most previous NAS methods and find the number of blocks and the widths of blocks are set manually. However, block counts and block widths determine the network scale (depth and width) and have a great influence on both the accuracy and the model cost (FLOPs/latency). In this paper, we propose to search block counts and block widths by designing a densely connected search space, i.e., DenseNAS. The new search space is represented as a dense super network, which is built upon our designed routing blocks. In the super network, routing blocks are densely connected and we search for the best path between them to derive the final architecture. We further propose a chained cost estimation algorithm to approximate the model cost during the search. Both the accuracy and model cost are optimized in DenseNAS. For experiments on the MobileNetV2-based search space, DenseNAS achieves 75.3% top-1 accuracy on ImageNet with only 361M FLOPs and 17.9ms latency on a single TITAN-XP. The larger model searched by DenseNAS achieves 76.1% accuracy with only 479M FLOPs. DenseNAS further promotes the ImageNet classification accuracies of ResNet-18, -34 and -50-B by 1.5%, 0.5% and 0.3% with 200M, 600M and 680M FLOPs reduction respectively.
Tasks	Image Classification, Neural Architecture Search
Published	2019-06-23
URL	https://arxiv.org/abs/1906.09607v2
PDF	https://arxiv.org/pdf/1906.09607v2.pdf
PWC	https://paperswithcode.com/paper/densely-connected-search-space-for-more
Repo	https://github.com/JaminFong/DenseNAS
Framework	pytorch

Expressive Priors in Bayesian Neural Networks: Kernel Combinations and Periodic Functions


Title	Expressive Priors in Bayesian Neural Networks: Kernel Combinations and Periodic Functions
Authors	Tim Pearce, Russell Tsuchida, Mohamed Zaki, Alexandra Brintrup, Andy Neely
Abstract	A simple, flexible approach to creating expressive priors in Gaussian process (GP) models makes new kernels from a combination of basic kernels, e.g. summing a periodic and linear kernel can capture seasonal variation with a long term trend. Despite a well-studied link between GPs and Bayesian neural networks (BNNs), the BNN analogue of this has not yet been explored. This paper derives BNN architectures mirroring such kernel combinations. Furthermore, it shows how BNNs can produce periodic kernels, which are often useful in this context. These ideas provide a principled approach to designing BNNs that incorporate prior knowledge about a function. We showcase the practical value of these ideas with illustrative experiments in supervised and reinforcement learning settings.
Tasks
Published	2019-05-15
URL	https://arxiv.org/abs/1905.06076v2
PDF	https://arxiv.org/pdf/1905.06076v2.pdf
PWC	https://paperswithcode.com/paper/expressive-priors-in-bayesian-neural-networks
Repo	https://github.com/TeaPearce/Expressive_Priors_in_BNNs
Framework	tf
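
The abstract's example of summing a periodic and a linear kernel can be illustrated on the Gaussian process side with a few lines of plain Python. This is a sketch using common textbook forms of the two kernels. The parameter names and default values are assumptions, and it does not implement the BNN architectures the paper derives:

```python
import math


def periodic_kernel(x1, x2, period=1.0, length_scale=1.0):
    """Periodic (exp-sine-squared) kernel: repeats every `period`."""
    s = math.sin(math.pi * abs(x1 - x2) / period)
    return math.exp(-2.0 * s * s / (length_scale ** 2))


def linear_kernel(x1, x2, variance=1.0):
    """Linear kernel: models a straight-line trend through the origin."""
    return variance * x1 * x2


def combined_kernel(x1, x2, period=1.0, length_scale=1.0, variance=1.0):
    """Sum of the two kernels: seasonal variation plus a long-term trend."""
    return (periodic_kernel(x1, x2, period, length_scale)
            + linear_kernel(x1, x2, variance))
```

The sum of two valid kernels is itself a valid kernel, which is what makes this kind of composition a convenient way to encode prior knowledge.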

Learning Q-network for Active Information Acquisition


Title	Learning Q-network for Active Information Acquisition
Authors	Heejin Jeong, Brent Schlotfeldt, Hamed Hassani, Manfred Morari, Daniel D. Lee, George J. Pappas
Abstract	In this paper, we propose a novel Reinforcement Learning approach for solving the Active Information Acquisition problem, which requires an agent to choose a sequence of actions in order to acquire information about a process of interest using on-board sensors. The classic challenges in the information acquisition problem are the dependence of a planning algorithm on known models and the difficulty of computing information-theoretic cost functions over arbitrary distributions. In contrast, the proposed framework of reinforcement learning does not require any knowledge on models and alleviates the problems during an extended training stage. It results in policies that are efficient to execute online and applicable for real-time control of robotic systems. Furthermore, the state-of-the-art planning methods are typically restricted to short horizons, which may become problematic with local minima. Reinforcement learning naturally handles the issue of planning horizon in information problems as it maximizes a discounted sum of rewards over a long finite or infinite time horizon. We discuss the potential benefits of the proposed framework and compare the performance of the novel algorithm to an existing information acquisition method for multi-target tracking scenarios.
Tasks
Published	2019-10-23
URL	https://arxiv.org/abs/1910.10754v1
PDF	https://arxiv.org/pdf/1910.10754v1.pdf
PWC	https://paperswithcode.com/paper/learning-q-network-for-active-information
Repo	https://github.com/coco66/ttenv
Framework	none
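
The "discounted sum of rewards" the abstract refers to is the usual reinforcement learning return. A minimal sketch for a finite reward sequence, with a hypothetical function name:

```python
def discounted_return(rewards, gamma):
    """Return sum over t of gamma**t * rewards[t], computed backwards
    so that each step needs only one multiplication and one addition."""
    total = 0.0
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total
```

For example, three rewards of 1 with `gamma=0.5` give 1 + 0.5 + 0.25 = 1.75. A discount factor closer to 1 weights distant rewards more heavily, which corresponds to the longer planning horizon discussed above.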

Model-based Behavioral Cloning with Future Image Similarity Learning


Title	Model-based Behavioral Cloning with Future Image Similarity Learning
Authors	Alan Wu, AJ Piergiovanni, Michael S. Ryoo
Abstract	We present a visual imitation learning framework that enables learning of robot action policies solely based on expert samples without any robot trials. Robot exploration and on-policy trials in a real-world environment could often be expensive/dangerous. We present a new approach to address this problem by learning a future scene prediction model solely on a collection of expert trajectories consisting of unlabeled example videos and actions, and by enabling generalized action cloning using future image similarity. The robot learns to visually predict the consequences of taking an action, and obtains the policy by evaluating how similar the predicted future image is to an expert image. We develop a stochastic action-conditioned convolutional autoencoder, and present how we take advantage of future images for robot learning. We conduct experiments in simulated and real-life environments using a ground mobility robot with and without obstacles, and compare our models to multiple baseline methods.
Tasks	Imitation Learning
Published	2019-10-08
URL	https://arxiv.org/abs/1910.03157v1
PDF	https://arxiv.org/pdf/1910.03157v1.pdf
PWC	https://paperswithcode.com/paper/model-based-behavioral-cloning-with-future
Repo	https://github.com/anwu21/future-image-similarity
Framework	pytorch