Paper Group AWR 115
HetConv: Heterogeneous Kernel-Based Convolutions for Deep CNNs
Title | HetConv: Heterogeneous Kernel-Based Convolutions for Deep CNNs |
Authors | Pravendra Singh, Vinay Kumar Verma, Piyush Rai, Vinay P. Namboodiri |
Abstract | We present a novel deep learning architecture in which the convolution operation leverages heterogeneous kernels. The proposed HetConv (Heterogeneous Kernel-Based Convolution) reduces the computation (FLOPs) and the number of parameters compared to the standard convolution operation while still maintaining representational efficiency. To show the effectiveness of our proposed convolution, we present extensive experimental results on standard convolutional neural network (CNN) architectures such as VGG and ResNet. We find that after replacing the standard convolutional filters in these architectures with our proposed HetConv filters, we achieve a 3X to 8X FLOPs-based improvement in speed while still maintaining (and sometimes improving) the accuracy. We also compare our proposed convolutions with group/depthwise convolutions and show that HetConv achieves more FLOPs reduction with significantly higher accuracy. |
Tasks | |
Published | 2019-03-11 |
URL | http://arxiv.org/abs/1903.04120v2 |
http://arxiv.org/pdf/1903.04120v2.pdf | |
PWC | https://paperswithcode.com/paper/hetconv-heterogeneous-kernel-based |
Repo | https://github.com/irvinxav/Efficient-HetConv-Heterogeneous-Kernel-Based-Convolutions |
Framework | pytorch |
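The FLOPs reduction claimed in the abstract can be illustrated with a small back-of-the-envelope calculation. The sketch below is not taken from the abstract or the linked repository: it assumes that each filter keeps a fraction 1/P of its input-channel kernels at size K×K and replaces the rest with 1×1 kernels, and it counts one multiply-accumulate as one FLOP. The function names and the parameter `part` (standing for P) are hypothetical.

```python
def conv_flops(out_size, in_channels, out_channels, kernel_size):
    """Multiply-accumulates of a standard convolution on a square output map."""
    per_position = in_channels * kernel_size * kernel_size
    return out_size * out_size * out_channels * per_position


def hetconv_flops(out_size, in_channels, out_channels, kernel_size, part):
    """Multiply-accumulates when only in_channels // part kernels per filter
    are kernel_size x kernel_size and the remaining ones are 1x1.

    Assumption: this mixed-kernel layout is an illustrative model of a
    heterogeneous filter, not a statement of the authors' implementation.
    """
    large = in_channels // part        # kernels kept at full size
    small = in_channels - large        # kernels reduced to 1x1
    per_position = large * kernel_size * kernel_size + small
    return out_size * out_size * out_channels * per_position


if __name__ == "__main__":
    standard = conv_flops(32, 64, 64, 3)
    for part in (4, 64):
        reduced = hetconv_flops(32, 64, 64, 3, part)
        print(f"P={part}: {standard / reduced:.1f}x fewer FLOPs")
```

Under these assumptions, a 3×3 layer with 64 input channels gives a 3x reduction at P=4 and an 8x reduction at P=64, which is consistent with the range quoted in the abstract.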
Delving Deep Into Hybrid Annotations for 3D Human Recovery in the Wild
Title | Delving Deep Into Hybrid Annotations for 3D Human Recovery in the Wild |
Authors | Yu Rong, Ziwei Liu, Cheng Li, Kaidi Cao, Chen Change Loy |
Abstract | Though much progress has been achieved in single-image 3D human recovery, estimating a 3D model for in-the-wild images remains a formidable challenge. The reason lies in the fact that obtaining high-quality 3D annotations for in-the-wild images is an extremely hard task that consumes an enormous amount of resources and manpower. To tackle this problem, previous methods adopt a hybrid training strategy that exploits multiple heterogeneous types of annotations, including 3D and 2D, while leaving the efficacy of each annotation not thoroughly investigated. In this work, we aim to perform a comprehensive study of the cost and effectiveness trade-off between different annotations. Specifically, we focus on the challenging task of in-the-wild 3D human recovery from single images when paired 3D annotations are not fully available. Through extensive experiments, we obtain several observations: 1) 3D annotations are efficient, whereas traditional 2D annotations such as 2D keypoints and body part segmentation are less competent in guiding 3D human recovery. 2) Dense correspondence such as DensePose is effective. When there are no paired in-the-wild 3D annotations available, the model exploiting dense correspondence can achieve 92% of the performance of a model trained with paired 3D data. We show that incorporating dense correspondence into in-the-wild 3D human recovery is promising and competitive due to its high efficiency and relatively low annotation cost. Our model trained with dense correspondence can serve as a strong reference for future research. |
Tasks | |
Published | 2019-08-18 |
URL | https://arxiv.org/abs/1908.06442v2 |
https://arxiv.org/pdf/1908.06442v2.pdf | |
PWC | https://paperswithcode.com/paper/delving-deep-into-hybrid-annotations-for-3d |
Repo | https://github.com/penincillin/DCT_ICCV-2019 |
Framework | pytorch |
Establishing Strong Baselines for the New Decade: Sequence Tagging, Syntactic and Semantic Parsing with BERT
Title | Establishing Strong Baselines for the New Decade: Sequence Tagging, Syntactic and Semantic Parsing with BERT |
Authors | Han He, Jinho D. Choi |
Abstract | This paper presents new state-of-the-art models for three tasks, part-of-speech tagging, syntactic parsing, and semantic parsing, using the cutting-edge contextualized embedding framework known as BERT. For each task, we first replicate and simplify the current state-of-the-art approach to enhance its model efficiency. We then evaluate our simplified approaches on those three tasks using token embeddings generated by BERT. 12 datasets in both English and Chinese are used for our experiments. The BERT models outperform the previously best-performing models by 2.5% on average (7.5% for the most significant case). Moreover, an in-depth analysis of the impact of BERT embeddings is provided using self-attention, which helps in understanding this rich representation. All models and source code are publicly available so that researchers can improve upon and utilize them to establish strong baselines for the next decade. |
Tasks | Part-Of-Speech Tagging, Semantic Parsing |
Published | 2019-08-14 |
URL | https://arxiv.org/abs/1908.04943v2 |
https://arxiv.org/pdf/1908.04943v2.pdf | |
PWC | https://paperswithcode.com/paper/establishing-strong-baselines-for-the-new |
Repo | https://github.com/emorynlp/bert-2019 |
Framework | mxnet |
SCOPS: Self-Supervised Co-Part Segmentation
Title | SCOPS: Self-Supervised Co-Part Segmentation |
Authors | Wei-Chih Hung, Varun Jampani, Sifei Liu, Pavlo Molchanov, Ming-Hsuan Yang, Jan Kautz |
Abstract | Parts provide a good intermediate representation of objects that is robust with respect to camera, pose and appearance variations. Existing work on part segmentation is dominated by supervised approaches that rely on large amounts of manual annotations and cannot generalize to unseen object categories. We propose a self-supervised deep learning approach for part segmentation, where we devise several loss functions that aid in predicting part segments that are geometrically concentrated, robust to object variations and also semantically consistent across different object instances. Extensive experiments on different types of image collections demonstrate that our approach can produce part segments that adhere to object boundaries and are also more semantically consistent across object instances compared to existing self-supervised techniques. |
Tasks | |
Published | 2019-05-03 |
URL | https://arxiv.org/abs/1905.01298v1 |
https://arxiv.org/pdf/1905.01298v1.pdf | |
PWC | https://paperswithcode.com/paper/scops-self-supervised-co-part-segmentation |
Repo | https://github.com/NVlabs/SCOPS |
Framework | pytorch |
Interactive Matching Network for Multi-Turn Response Selection in Retrieval-Based Chatbots
Title | Interactive Matching Network for Multi-Turn Response Selection in Retrieval-Based Chatbots |
Authors | Jia-Chen Gu, Zhen-Hua Ling, Quan Liu |
Abstract | In this paper, we propose an interactive matching network (IMN) for the multi-turn response selection task. First, IMN constructs word representations from three aspects to address the challenge of out-of-vocabulary (OOV) words. Second, an attentive hierarchical recurrent encoder (AHRE), which is capable of encoding sentences hierarchically and generating more descriptive representations by aggregating with an attention mechanism, is designed. Finally, the bidirectional interactions between whole multi-turn contexts and response candidates are calculated to derive the matching information between them. Experiments on four public datasets show that IMN outperforms the baseline models on all metrics, achieving a new state-of-the-art performance and demonstrating compatibility across domains for multi-turn response selection. |
Tasks | Conversational Response Selection |
Published | 2019-01-07 |
URL | https://arxiv.org/abs/1901.01824v2 |
https://arxiv.org/pdf/1901.01824v2.pdf | |
PWC | https://paperswithcode.com/paper/interactive-matching-network-for-multi-turn |
Repo | https://github.com/JasonForJoy/IMN |
Framework | tf |
On Metrics to Assess the Transferability of Machine Learning Models in Non-Intrusive Load Monitoring
Title | On Metrics to Assess the Transferability of Machine Learning Models in Non-Intrusive Load Monitoring |
Authors | Christoph Klemenjak, Anthony Faustine, Stephen Makonin, Wilfried Elmenreich |
Abstract | To assess the performance of load disaggregation algorithms, it is common practise to train a candidate algorithm on data from one or multiple households and subsequently apply cross-validation by evaluating the classification and energy estimation performance on unseen portions of the dataset derived from the same households. With an emerging discussion of transferability in Non-Intrusive Load Monitoring (NILM), there is a need for domain-specific metrics to assess the performance of NILM algorithms on new test scenarios such as unseen buildings. In this paper, we discuss several metrics to assess the generalisation ability of NILM algorithms. These metrics target different aspects of performance evaluation in NILM and are meant to complement the traditional performance evaluation approach. We demonstrate how our metrics can be utilised to evaluate NILM algorithms by means of two case studies. We conduct our studies on several energy consumption datasets and take into consideration five state-of-the-art as well as four baseline NILM solutions. Finally, we formulate research challenges for future work. |
Tasks | Non-Intrusive Load Monitoring |
Published | 2019-12-12 |
URL | https://arxiv.org/abs/1912.06200v1 |
https://arxiv.org/pdf/1912.06200v1.pdf | |
PWC | https://paperswithcode.com/paper/on-metrics-to-assess-the-transferability-of |
Repo | https://github.com/klemenjak/nilm-transferability-metrics |
Framework | none |
Misleading Metadata Detection on YouTube
Title | Misleading Metadata Detection on YouTube |
Authors | Priyank Palod, Ayush Patwari, Sudhanshu Bahety, Saurabh Bagchi, Pawan Goyal |
Abstract | YouTube is the leading social media platform for sharing videos. As a result, it is plagued with misleading content that includes staged videos presented as real footage of an incident, videos with misrepresented context and videos where audio/video content is morphed. We tackle the problem of detecting such misleading videos as a supervised classification task. We develop UCNet, a deep network to detect fake videos, and perform our experiments on two datasets: VAVD, created by us, and the publicly available FVC [8]. We achieve a macro-averaged F-score of 0.82 when training and testing on a 70:30 split of FVC, while the baseline model scores 0.36. We find that the proposed model generalizes well when trained on one dataset and tested on the other. |
Tasks | |
Published | 2019-01-25 |
URL | http://arxiv.org/abs/1901.08759v1 |
http://arxiv.org/pdf/1901.08759v1.pdf | |
PWC | https://paperswithcode.com/paper/misleading-metadata-detection-on-youtube |
Repo | https://github.com/ucnet01/UCNet_Implementation |
Framework | pytorch |
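The macro-averaged F-score reported in the abstract averages the per-class F1 scores with equal weight, so a rare class counts as much as a frequent one. The following is a minimal sketch of that standard definition in plain Python; the function names and the toy labels are hypothetical and are not drawn from the paper or its repository.

```python
def f1_for_class(y_true, y_pred, label):
    """F1 score of one class, treating `label` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    if tp == 0:
        # No true positives: precision and recall are both zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    return sum(f1_for_class(y_true, y_pred, label) for label in labels) / len(labels)


if __name__ == "__main__":
    # Toy example: 1 = misleading video, 0 = genuine video (made-up labels).
    truth = [1, 1, 0, 0]
    predicted = [1, 0, 0, 0]
    print(round(macro_f1(truth, predicted), 4))
```

Because every class contributes equally, a classifier that ignores the minority class is penalised heavily, which is why macro averaging is a common choice for imbalanced detection tasks.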
Seamless Scene Segmentation
Title | Seamless Scene Segmentation |
Authors | Lorenzo Porzi, Samuel Rota Bulò, Aleksander Colovic, Peter Kontschieder |
Abstract | In this work we introduce a novel, CNN-based architecture that can be trained end-to-end to deliver seamless scene segmentation results. Our goal is to predict consistent semantic segmentation and detection results by means of a panoptic output format, going beyond the simple combination of independently trained segmentation and detection models. The proposed architecture takes advantage of a novel segmentation head that seamlessly integrates multi-scale features generated by a Feature Pyramid Network with contextual information conveyed by a light-weight DeepLab-like module. As an additional contribution, we review the panoptic metric and propose an alternative that overcomes its limitations when evaluating non-instance categories. Our proposed network architecture yields state-of-the-art results on three challenging street-level datasets, i.e. Cityscapes, Indian Driving Dataset and Mapillary Vistas. |
Tasks | Panoptic Segmentation, Scene Segmentation, Semantic Segmentation |
Published | 2019-05-03 |
URL | https://arxiv.org/abs/1905.01220v1 |
https://arxiv.org/pdf/1905.01220v1.pdf | |
PWC | https://paperswithcode.com/paper/seamless-scene-segmentation |
Repo | https://github.com/mapillary/seamseg |
Framework | pytorch |
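For readers unfamiliar with the panoptic metric that the abstract reviews, the sketch below computes the commonly used panoptic quality (PQ) for a single class: the sum of IoUs over matched segments divided by the number of true positives plus half the false positives and false negatives. This is the standard PQ definition as generally stated in the panoptic segmentation literature, not the alternative metric the authors propose; the function name and example numbers are hypothetical.

```python
def panoptic_quality(matched_ious, num_false_positives, num_false_negatives):
    """Panoptic quality for one class.

    matched_ious: IoU of every predicted segment matched to a ground-truth
    segment (a match conventionally requires IoU > 0.5).
    """
    num_true_positives = len(matched_ious)
    denominator = (num_true_positives
                   + 0.5 * num_false_positives
                   + 0.5 * num_false_negatives)
    if denominator == 0:
        # Class absent from both prediction and ground truth.
        return 0.0
    return sum(matched_ious) / denominator


if __name__ == "__main__":
    # Two matched segments, one spurious prediction, one missed segment.
    print(round(panoptic_quality([0.8, 0.6], 1, 1), 4))
```

For a non-instance ("stuff") category there is at most one segment per image, so each image contributes a single all-or-nothing match at the 0.5 IoU threshold; this is one way to see why the abstract singles out non-instance categories as a limitation.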
Simitate: A Hybrid Imitation Learning Benchmark
Title | Simitate: A Hybrid Imitation Learning Benchmark |
Authors | Raphael Memmesheimer, Ivanna Mykhalchyshyna, Viktor Seib, Dietrich Paulus |
Abstract | We present Simitate, a hybrid benchmarking suite targeting the evaluation of approaches for imitation learning. A dataset containing 1938 sequences where humans perform daily activities in a realistic environment is presented. The dataset is strongly coupled with an integration into a simulator. RGB and depth streams with a resolution of 960×540 at 30Hz and accurate ground truth poses for the demonstrator’s hand, as well as the object in 6 DOF at 120Hz are provided. Along with our dataset we provide the 3D model of the used environment, labeled object images and pre-trained models. A benchmarking suite that aims at fostering comparability and reproducibility supports the development of imitation learning approaches. Further, we propose and integrate evaluation metrics on assessing the quality of effect and trajectory of the imitation performed in simulation. Simitate is available on our project website: https://agas.uni-koblenz.de/data/simitate/. |
Tasks | Imitation Learning |
Published | 2019-05-15 |
URL | https://arxiv.org/abs/1905.06002v1 |
https://arxiv.org/pdf/1905.06002v1.pdf | |
PWC | https://paperswithcode.com/paper/simitate-a-hybrid-imitation-learning |
Repo | https://github.com/airglow/simitate |
Framework | pytorch |
TDAM: a Topic-Dependent Attention Model for Sentiment Analysis
Title | TDAM: a Topic-Dependent Attention Model for Sentiment Analysis |
Authors | Gabriele Pergola, Lin Gui, Yulan He |
Abstract | We propose a topic-dependent attention model for sentiment classification and topic extraction. Our model assumes that a global topic embedding is shared across documents and employs an attention mechanism to derive local topic embedding for words and sentences. These are subsequently incorporated in a modified Gated Recurrent Unit (GRU) for sentiment classification and extraction of topics bearing different sentiment polarities. Those topics emerge from the words’ local topic embeddings learned by the internal attention of the GRU cells in the context of a multi-task learning framework. In this paper, we present the hierarchical architecture, the new GRU unit and the experiments conducted on users’ reviews which demonstrate classification performance on a par with the state-of-the-art methodologies for sentiment classification and topic coherence outperforming the current approaches for supervised topic extraction. In addition, our model is able to extract coherent aspect-sentiment clusters despite using no aspect-level annotations for training. |
Tasks | Multi-Task Learning, Sentiment Analysis |
Published | 2019-08-18 |
URL | https://arxiv.org/abs/1908.06435v1 |
https://arxiv.org/pdf/1908.06435v1.pdf | |
PWC | https://paperswithcode.com/paper/tdam-a-topic-dependent-attention-model-for |
Repo | https://github.com/gabrer/topic_dependent_attention_model |
Framework | none |
Learning Calibratable Policies using Programmatic Style-Consistency
Title | Learning Calibratable Policies using Programmatic Style-Consistency |
Authors | Eric Zhan, Albert Tseng, Yisong Yue, Adith Swaminathan, Matthew Hausknecht |
Abstract | We study the problem of controllable generation of long-term sequential behaviors. Solutions to this important problem would enable many applications, such as calibrating behaviors of AI agents in games or predicting player trajectories in sports. In contrast to the well-studied areas of controllable generation of images, text, and speech, there are two questions that pose significant challenges when generating long-term behaviors: how should we specify the factors of variation to control, and how can we ensure that the generated temporal behavior faithfully demonstrates diverse styles? In this paper, we leverage large amounts of raw behavioral data to learn policies that can be calibrated to generate a diverse range of behavior styles (e.g., aggressive versus passive play in sports). Inspired by recent work on leveraging programmatic labeling functions, we present a novel framework that combines imitation learning with data programming to learn style-calibratable policies. Our primary technical contribution is a formal notion of style-consistency as a learning objective, and its integration with conventional imitation learning approaches. We evaluate our framework using demonstrations from professional basketball players and agents in the MuJoCo physics environment, and show that our learned policies can be calibrated to generate interesting behavior styles in both domains. |
Tasks | Imitation Learning |
Published | 2019-10-02 |
URL | https://arxiv.org/abs/1910.01179v2 |
https://arxiv.org/pdf/1910.01179v2.pdf | |
PWC | https://paperswithcode.com/paper/learning-calibratable-policies-using |
Repo | https://github.com/ezhan94/calibratable-style-consistency |
Framework | pytorch |
Densely Connected Search Space for More Flexible Neural Architecture Search
Title | Densely Connected Search Space for More Flexible Neural Architecture Search |
Authors | Jiemin Fang, Yuzhu Sun, Qian Zhang, Yuan Li, Wenyu Liu, Xinggang Wang |
Abstract | Neural architecture search (NAS) has dramatically advanced the development of neural network design. We revisit the search space design in most previous NAS methods and find that the number of blocks and the widths of blocks are set manually. However, block counts and block widths determine the network scale (depth and width) and have a great influence on both the accuracy and the model cost (FLOPs/latency). In this paper, we propose to search block counts and block widths by designing a densely connected search space, i.e., DenseNAS. The new search space is represented as a dense super network, which is built upon our designed routing blocks. In the super network, routing blocks are densely connected and we search for the best path between them to derive the final architecture. We further propose a chained cost estimation algorithm to approximate the model cost during the search. Both the accuracy and model cost are optimized in DenseNAS. For experiments on the MobileNetV2-based search space, DenseNAS achieves 75.3% top-1 accuracy on ImageNet with only 361M FLOPs and 17.9ms latency on a single TITAN-XP. The larger model searched by DenseNAS achieves 76.1% accuracy with only 479M FLOPs. DenseNAS further improves the ImageNet classification accuracies of ResNet-18, -34 and -50-B by 1.5%, 0.5% and 0.3% with 200M, 600M and 680M FLOPs reduction respectively. |
Tasks | Image Classification, Neural Architecture Search |
Published | 2019-06-23 |
URL | https://arxiv.org/abs/1906.09607v2 |
https://arxiv.org/pdf/1906.09607v2.pdf | |
PWC | https://paperswithcode.com/paper/densely-connected-search-space-for-more |
Repo | https://github.com/JaminFong/DenseNAS |
Framework | pytorch |
Expressive Priors in Bayesian Neural Networks: Kernel Combinations and Periodic Functions
Title | Expressive Priors in Bayesian Neural Networks: Kernel Combinations and Periodic Functions |
Authors | Tim Pearce, Russell Tsuchida, Mohamed Zaki, Alexandra Brintrup, Andy Neely |
Abstract | A simple, flexible approach to creating expressive priors in Gaussian process (GP) models makes new kernels from a combination of basic kernels, e.g. summing a periodic and linear kernel can capture seasonal variation with a long term trend. Despite a well-studied link between GPs and Bayesian neural networks (BNNs), the BNN analogue of this has not yet been explored. This paper derives BNN architectures mirroring such kernel combinations. Furthermore, it shows how BNNs can produce periodic kernels, which are often useful in this context. These ideas provide a principled approach to designing BNNs that incorporate prior knowledge about a function. We showcase the practical value of these ideas with illustrative experiments in supervised and reinforcement learning settings. |
Tasks | |
Published | 2019-05-15 |
URL | https://arxiv.org/abs/1905.06076v2 |
https://arxiv.org/pdf/1905.06076v2.pdf | |
PWC | https://paperswithcode.com/paper/expressive-priors-in-bayesian-neural-networks |
Repo | https://github.com/TeaPearce/Expressive_Priors_in_BNNs |
Framework | tf |
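The kernel-combination idea mentioned at the start of the abstract (summing a periodic and a linear kernel to capture seasonality plus a trend) can be sketched on the Gaussian process side in a few lines. This illustrates only the GP kernels, using their standard textbook forms; it is not the BNN construction the paper derives, and the function names and hyperparameter defaults are hypothetical.

```python
import math


def periodic_kernel(x1, x2, period=1.0, length_scale=1.0):
    """Standard periodic (exp-sine-squared) kernel for scalar inputs."""
    s = math.sin(math.pi * abs(x1 - x2) / period)
    return math.exp(-2.0 * s * s / length_scale ** 2)


def linear_kernel(x1, x2, variance=1.0):
    """Linear kernel, which yields linear trends under a GP prior."""
    return variance * x1 * x2


def periodic_plus_linear(x1, x2, period=1.0, length_scale=1.0, variance=1.0):
    """Sum of the two kernels; a sum of valid kernels is again a valid kernel."""
    return (periodic_kernel(x1, x2, period, length_scale)
            + linear_kernel(x1, x2, variance))


if __name__ == "__main__":
    # Points one full period apart are perfectly correlated under the
    # periodic part; the linear part adds a component that grows with x.
    print(periodic_kernel(0.3, 1.3))
    print(periodic_plus_linear(2.0, 3.0))
```

Functions drawn from a GP with the summed kernel are the sum of an independent periodic function and an independent linear function, which is what makes the combination suitable for seasonal data with a long-term trend.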
Learning Q-network for Active Information Acquisition
Title | Learning Q-network for Active Information Acquisition |
Authors | Heejin Jeong, Brent Schlotfeldt, Hamed Hassani, Manfred Morari, Daniel D. Lee, George J. Pappas |
Abstract | In this paper, we propose a novel Reinforcement Learning approach for solving the Active Information Acquisition problem, which requires an agent to choose a sequence of actions in order to acquire information about a process of interest using on-board sensors. The classic challenges in the information acquisition problem are the dependence of a planning algorithm on known models and the difficulty of computing information-theoretic cost functions over arbitrary distributions. In contrast, the proposed framework of reinforcement learning does not require any knowledge on models and alleviates the problems during an extended training stage. It results in policies that are efficient to execute online and applicable for real-time control of robotic systems. Furthermore, the state-of-the-art planning methods are typically restricted to short horizons, which may become problematic with local minima. Reinforcement learning naturally handles the issue of planning horizon in information problems as it maximizes a discounted sum of rewards over a long finite or infinite time horizon. We discuss the potential benefits of the proposed framework and compare the performance of the novel algorithm to an existing information acquisition method for multi-target tracking scenarios. |
Tasks | |
Published | 2019-10-23 |
URL | https://arxiv.org/abs/1910.10754v1 |
https://arxiv.org/pdf/1910.10754v1.pdf | |
PWC | https://paperswithcode.com/paper/learning-q-network-for-active-information |
Repo | https://github.com/coco66/ttenv |
Framework | none |
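The abstract notes that reinforcement learning maximizes a discounted sum of rewards over a long horizon. As a minimal sketch of that quantity, the function below computes the discounted return of a finite reward sequence; the function name, the example rewards and the discount factor are hypothetical and unrelated to the paper's experiments.

```python
def discounted_return(rewards, gamma):
    """Return sum_t gamma**t * rewards[t] for a finite reward sequence."""
    total = 0.0
    # Accumulate from the last step backwards: G_t = r_t + gamma * G_{t+1}.
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


if __name__ == "__main__":
    # Three steps of unit reward with gamma = 0.5: 1 + 0.5 + 0.25.
    print(discounted_return([1.0, 1.0, 1.0], 0.5))
```

With a discount factor below 1 the sum stays bounded even over an infinite horizon, which is why the same objective covers both the long finite and the infinite case mentioned in the abstract.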
Model-based Behavioral Cloning with Future Image Similarity Learning
Title | Model-based Behavioral Cloning with Future Image Similarity Learning |
Authors | Alan Wu, AJ Piergiovanni, Michael S. Ryoo |
Abstract | We present a visual imitation learning framework that enables learning of robot action policies solely based on expert samples without any robot trials. Robot exploration and on-policy trials in a real-world environment could often be expensive/dangerous. We present a new approach to address this problem by learning a future scene prediction model solely on a collection of expert trajectories consisting of unlabeled example videos and actions, and by enabling generalized action cloning using future image similarity. The robot learns to visually predict the consequences of taking an action, and obtains the policy by evaluating how similar the predicted future image is to an expert image. We develop a stochastic action-conditioned convolutional autoencoder, and present how we take advantage of future images for robot learning. We conduct experiments in simulated and real-life environments using a ground mobility robot with and without obstacles, and compare our models to multiple baseline methods. |
Tasks | Imitation Learning |
Published | 2019-10-08 |
URL | https://arxiv.org/abs/1910.03157v1 |
https://arxiv.org/pdf/1910.03157v1.pdf | |
PWC | https://paperswithcode.com/paper/model-based-behavioral-cloning-with-future |
Repo | https://github.com/anwu21/future-image-similarity |
Framework | pytorch |