January 31, 2020


Paper Group AWR 428

Joint Viewpoint and Keypoint Estimation with Real and Synthetic Data. Meaning guided video captioning. Topology of Learning in Artificial Neural Networks. Real-world Underwater Enhancement: Challenges, Benchmarks, and Solutions. Cognitively-inspired Agent-based Service Composition for Mobile & Pervasive Computing. Global-to-local Memory Pointer Net …

Joint Viewpoint and Keypoint Estimation with Real and Synthetic Data


Title	Joint Viewpoint and Keypoint Estimation with Real and Synthetic Data
Authors	Pau Panareda Busto, Juergen Gall
Abstract	The estimation of viewpoints and keypoints effectively enhances object detection methods by extracting valuable traits of the object instances. While the outputs of the two processes differ, i.e., angles vs. a list of characteristic points, they share the same focus on how the object is placed in the scene, which implies a certain level of correlation between them. Therefore, we propose a convolutional neural network that jointly computes the viewpoint and keypoints for different object categories. By training both tasks together, each task improves the accuracy of the other. Since the labelling of object keypoints is very time consuming for human annotators, we also introduce a new synthetic dataset with automatically generated viewpoint and keypoint annotations. Our proposed network can also be trained on datasets that contain both viewpoint and keypoint annotations or only one of them. The experiments show that the proposed approach successfully exploits this implicit correlation between the tasks and outperforms previous techniques that are trained independently.
Tasks	Object Detection
Published	2019-12-13
URL	https://arxiv.org/abs/1912.06274v1
PDF	https://arxiv.org/pdf/1912.06274v1.pdf
PWC	https://paperswithcode.com/paper/joint-viewpoint-and-keypoint-estimation-with
Repo	https://github.com/Heliot7/viewpoint-cnn-syn
Framework	none
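
The abstract above says the network can be trained on data that carries viewpoint labels, keypoint labels, or both. A minimal sketch of one way to combine two task losses under partial annotation is given below; it is not taken from the paper or its repository. The function name `joint_loss`, the per-sample masks `has_vp` and `has_kp`, and the `kp_weight` balance factor are all hypothetical.

```python
def joint_loss(vp_loss, kp_loss, has_vp, has_kp, kp_weight=1.0):
    """Average a combined loss over a batch, skipping missing annotations.

    vp_loss, kp_loss: per-sample loss values for the two tasks.
    has_vp, has_kp:   per-sample booleans, True if that label exists.
    kp_weight:        hypothetical factor balancing the two tasks.
    """
    total = 0.0
    for lv, lk, mv, mk in zip(vp_loss, kp_loss, has_vp, has_kp):
        if mv:  # viewpoint label available for this sample
            total += lv
        if mk:  # keypoint label available for this sample
            total += kp_weight * lk
    n = len(vp_loss)
    return total / n if n else 0.0


# Sample 0 has both labels, sample 1 has keypoints only.
batch_loss = joint_loss([1.0, 2.0], [0.5, 4.0], [True, False], [True, True])
```

Masking in this way lets a sample without one of the labels still contribute a training signal through the other task, which is the property the abstract describes.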

Meaning guided video captioning


Title	Meaning guided video captioning
Authors	Rushi J. Babariya, Toru Tamaki
Abstract	Current video captioning approaches often suffer from missing objects in the video to be described, while generating captions semantically similar to the ground truth sentences. In this paper, we propose a new approach to video captioning that can describe objects detected by object detection and generate captions whose meaning is similar to that of the correct captions. Our model relies on S2VT, a sequence-to-sequence model for video captioning. Given a sequence of video frames, the encoding RNN takes a frame as well as detected objects in the frame in order to incorporate the information of the objects in the scene. The following decoding RNN outputs are then fed into an attention layer and then to a decoder for generating captions. The caption is compared with the ground truth through metric learning so that vector representations of generated captions are semantically similar to those of the ground truth. Experimental results with the MSDV dataset demonstrate that the performance of the proposed approach is much better than the model without the proposed meaning-guided framework, showing the effectiveness of the proposed model. Code is publicly available at https://github.com/captanlevi/Meaning-guided-video-captioning-.
Tasks	Object Detection, Video Captioning
Published	2019-12-12
URL	https://arxiv.org/abs/1912.05730v1
PDF	https://arxiv.org/pdf/1912.05730v1.pdf
PWC	https://paperswithcode.com/paper/meaning-guided-video-captioning
Repo	https://github.com/captanlevi/Meaning-guided-video-captioning-
Framework	pytorch

Topology of Learning in Artificial Neural Networks


Title	Topology of Learning in Artificial Neural Networks
Authors	Maxime Gabella, Nitya Afambo
Abstract	Understanding how neural networks learn remains one of the central challenges in machine learning research. From random at the start of training, the weights of a neural network evolve in such a way as to be able to perform a variety of tasks, like classifying images. Here we study the emergence of structure in the weights by applying methods from topological data analysis. We train simple feedforward neural networks on the MNIST dataset and monitor the evolution of the weights. When initialized to zero, the weights follow trajectories that branch off recurrently, thus generating trees that describe the growth of the effective capacity of each layer. When initialized to tiny random values, the weights evolve smoothly along two-dimensional surfaces. We show that natural coordinates on these learning surfaces correspond to important factors of variation.
Tasks	Topological Data Analysis
Published	2019-02-21
URL	https://arxiv.org/abs/1902.08160v3
PDF	https://arxiv.org/pdf/1902.08160v3.pdf
PWC	https://paperswithcode.com/paper/topology-of-learning-in-artificial-neural
Repo	https://github.com/maximevictor/topo-learning
Framework	none

Real-world Underwater Enhancement: Challenges, Benchmarks, and Solutions


Title	Real-world Underwater Enhancement: Challenges, Benchmarks, and Solutions
Authors	Risheng Liu, Xin Fan, Ming Zhu, Minjun Hou, Zhongxuan Luo
Abstract	Underwater image enhancement is such an important low-level vision task with many applications that numerous algorithms have been proposed in recent years. These algorithms, developed upon various assumptions, demonstrate successes from various aspects using different data sets and different metrics. In this work, we set up an undersea image capturing system and construct a large-scale Real-world Underwater Image Enhancement (RUIE) data set divided into three subsets. The three subsets target three challenging aspects for enhancement, i.e., image visibility quality, color casts, and higher-level detection/classification, respectively. We conduct extensive and systematic experiments on RUIE to evaluate the effectiveness and limitations of various algorithms to enhance visibility and correct color casts on images with hierarchical categories of degradation. Moreover, underwater image enhancement in practice usually serves as a preprocessing step for mid-level and high-level vision tasks. We thus exploit the object detection performance on enhanced images as a brand new task-specific evaluation criterion. The findings from these evaluations not only confirm what is commonly believed, but also suggest promising solutions and new directions for visibility enhancement, color correction, and object detection on real-world underwater images.
Tasks	Image Enhancement, Object Detection
Published	2019-01-15
URL	http://arxiv.org/abs/1901.05320v2
PDF	http://arxiv.org/pdf/1901.05320v2.pdf
PWC	https://paperswithcode.com/paper/real-world-underwater-enhancement-challenges
Repo	https://github.com/dlut-dimt/Realworld-Underwater-Image-Enhancement-RUIE-Benchmark
Framework	none

Cognitively-inspired Agent-based Service Composition for Mobile & Pervasive Computing


Title	Cognitively-inspired Agent-based Service Composition for Mobile & Pervasive Computing
Authors	Oscar J. Romero
Abstract	Automatic service composition in mobile and pervasive computing faces many challenges due to the complex and highly dynamic nature of the environment. Common approaches consider service composition as a decision problem whose solution is usually addressed from optimization perspectives which are not feasible in practice due to the intractability of the problem, limited computational resources of smart devices, service host’s mobility, and time constraints to tailor composition plans. Thus, our main contribution is the development of a cognitively-inspired agent-based service composition model focused on bounded rationality rather than optimality, which allows the system to compensate for limited resources by selectively filtering out continuous streams of data. Our approach exhibits features such as distributedness, modularity, emergent global functionality, and robustness, which endow it with capabilities to perform decentralized service composition by orchestrating manifold service providers and conflicting goals from multiple users. The evaluation of our approach shows promising results when compared against state-of-the-art service composition models.
Tasks
Published	2019-05-29
URL	https://arxiv.org/abs/1905.12630v1
PDF	https://arxiv.org/pdf/1905.12630v1.pdf
PWC	https://paperswithcode.com/paper/cognitively-inspired-agent-based-service
Repo	https://github.com/ojrlopez27/copernic
Framework	none

Global-to-local Memory Pointer Networks for Task-Oriented Dialogue


Title	Global-to-local Memory Pointer Networks for Task-Oriented Dialogue
Authors	Chien-Sheng Wu, Richard Socher, Caiming Xiong
Abstract	End-to-end task-oriented dialogue is challenging since knowledge bases are usually large, dynamic and hard to incorporate into a learning framework. We propose the global-to-local memory pointer (GLMP) networks to address this issue. In our model, a global memory encoder and a local memory decoder are proposed to share external knowledge. The encoder encodes dialogue history, modifies global contextual representation, and generates a global memory pointer. The decoder first generates a sketch response with unfilled slots. Next, it passes the global memory pointer to filter the external knowledge for relevant information, then instantiates the slots via the local memory pointers. We empirically show that our model can improve copy accuracy and mitigate the common out-of-vocabulary problem. As a result, GLMP is able to improve over the previous state-of-the-art models on both the simulated bAbI Dialogue dataset and the human-human Stanford Multi-domain Dialogue dataset in automatic and human evaluation.
Tasks
Published	2019-01-15
URL	http://arxiv.org/abs/1901.04713v2
PDF	http://arxiv.org/pdf/1901.04713v2.pdf
PWC	https://paperswithcode.com/paper/global-to-local-memory-pointer-networks-for
Repo	https://github.com/jasonwu0731/GLMP
Framework	pytorch

ChID: A Large-scale Chinese IDiom Dataset for Cloze Test


Title	ChID: A Large-scale Chinese IDiom Dataset for Cloze Test
Authors	Chujie Zheng, Minlie Huang, Aixin Sun
Abstract	Cloze-style reading comprehension in Chinese is still limited due to the lack of diverse corpora. In this paper we propose a large-scale Chinese cloze test dataset, ChID, which studies the comprehension of idioms, a unique language phenomenon in Chinese. In this corpus, the idioms in a passage are replaced by blank symbols and the correct answer needs to be chosen from well-designed candidate idioms. We carefully study how the design of candidate idioms and the representation of idioms affect the performance of state-of-the-art models. Results show that machine accuracy is substantially worse than that of humans, indicating ample room for further research.
Tasks	Reading Comprehension
Published	2019-06-04
URL	https://arxiv.org/abs/1906.01265v3
PDF	https://arxiv.org/pdf/1906.01265v3.pdf
PWC	https://paperswithcode.com/paper/chid-a-large-scale-chinese-idiom-dataset-for
Repo	https://github.com/zhengcj1/ChID-Dataset
Framework	tf

Deep Active Learning with Adaptive Acquisition


Title	Deep Active Learning with Adaptive Acquisition
Authors	Manuel Haussmann, Fred A. Hamprecht, Melih Kandemir
Abstract	Model selection is treated as a standard performance boosting step in many machine learning applications. Once all other properties of a learning problem are fixed, the model is selected by grid search on a held-out validation set. This is strictly inapplicable to active learning. Within the standardized workflow, the acquisition function is chosen among available heuristics a priori, and its success is observed only after the labeling budget is already exhausted. More importantly, none of the earlier studies reports an acquisition heuristic that is consistently successful enough to stand out as the single best choice. We present a method to break this vicious circle by defining the acquisition function as a learning predictor and training it by reinforcement feedback collected from each labeling round. As active learning is a scarce data regime, we bootstrap from a well-known heuristic that filters the bulk of data points on which all heuristics would agree, and learn a policy to warp the top portion of this ranking in the most beneficial way for the character of a specific data distribution. Our system consists of a Bayesian neural net, the predictor, a bootstrap acquisition function, a probabilistic state definition, and another Bayesian policy network that can effectively incorporate this input distribution. We observe on three benchmark data sets that our method always manages to either invent a new superior acquisition function or to adapt itself to the a priori unknown best performing heuristic for each specific data set.
Tasks	Active Learning, Model Selection
Published	2019-06-27
URL	https://arxiv.org/abs/1906.11471v1
PDF	https://arxiv.org/pdf/1906.11471v1.pdf
PWC	https://paperswithcode.com/paper/deep-active-learning-with-adaptive
Repo	https://github.com/manuelhaussmann/ral
Framework	pytorch

Fast and Flexible Multi-Task Classification Using Conditional Neural Adaptive Processes


Title	Fast and Flexible Multi-Task Classification Using Conditional Neural Adaptive Processes
Authors	James Requeima, Jonathan Gordon, John Bronskill, Sebastian Nowozin, Richard E. Turner
Abstract	The goal of this paper is to design image classification systems that, after an initial multi-task training phase, can automatically adapt to new tasks encountered at test time. We introduce a conditional neural process based approach to the multi-task classification setting for this purpose, and establish connections to the meta-learning and few-shot learning literature. The resulting approach, called CNAPs, comprises a classifier whose parameters are modulated by an adaptation network that takes the current task’s dataset as input. We demonstrate that CNAPs achieves state-of-the-art results on the challenging Meta-Dataset benchmark indicating high-quality transfer-learning. We show that the approach is robust, avoiding both over-fitting in low-shot regimes and under-fitting in high-shot regimes. Timing experiments reveal that CNAPs is computationally efficient at test-time as it does not involve gradient based adaptation. Finally, we show that trained models are immediately deployable to continual learning and active learning where they can outperform existing approaches that do not leverage transfer learning.
Tasks	Active Learning, Continual Learning, Few-Shot Learning, Image Classification, Meta-Learning, Transfer Learning
Published	2019-06-18
URL	https://arxiv.org/abs/1906.07697v2
PDF	https://arxiv.org/pdf/1906.07697v2.pdf
PWC	https://paperswithcode.com/paper/fast-and-flexible-multi-task-classification
Repo	https://github.com/cambridge-mlg/cnaps
Framework	pytorch

Deep Active Learning for Anchor User Prediction


Title	Deep Active Learning for Anchor User Prediction
Authors	Anfeng Cheng, Chuan Zhou, Hong Yang, Jia Wu, Lei Li, Jianlong Tan, Li Guo
Abstract	Predicting pairs of anchor users plays an important role in cross-network analysis. Due to the expensive cost of labeling anchor users for training prediction models, we consider in this paper the problem of minimizing the number of user pairs across multiple networks that must be labeled in order to improve the accuracy of the prediction. To this end, we present a deep active learning model for anchor user prediction (DALAUP for short). However, active learning for anchor user sampling faces the challenges of non-i.i.d. user pair data caused by network structures and the correlation among anchor or non-anchor user pairs. To address these challenges, DALAUP uses a pair of neural networks with shared parameters to obtain the vector representations of user pairs, and ensembles three query strategies to select the most informative user pairs for labeling and model training. Experiments on real-world social network data demonstrate that DALAUP outperforms the state-of-the-art approaches.
Tasks	Active Learning
Published	2019-06-18
URL	https://arxiv.org/abs/1906.07318v3
PDF	https://arxiv.org/pdf/1906.07318v3.pdf
PWC	https://paperswithcode.com/paper/deep-active-learning-for-anchor-user
Repo	https://github.com/chengaf/DALAUP
Framework	pytorch

Conditional deep surrogate models for stochastic, high-dimensional, and multi-fidelity systems


Title	Conditional deep surrogate models for stochastic, high-dimensional, and multi-fidelity systems
Authors	Yibo Yang, Paris Perdikaris
Abstract	We present a probabilistic deep learning methodology that enables the construction of predictive data-driven surrogates for stochastic systems. Leveraging recent advances in variational inference with implicit distributions, we put forth a statistical inference framework that enables the end-to-end training of surrogate models on paired input-output observations that may be stochastic in nature, originate from different information sources of variable fidelity, or be corrupted by complex noise processes. The resulting surrogates can accommodate high-dimensional inputs and outputs and are able to return predictions with quantified uncertainty. The effectiveness of our approach is demonstrated through a series of canonical studies, including the regression of noisy data, multi-fidelity modeling of stochastic processes, and uncertainty propagation in high-dimensional dynamical systems.
Tasks
Published	2019-01-15
URL	http://arxiv.org/abs/1901.04878v1
PDF	http://arxiv.org/pdf/1901.04878v1.pdf
PWC	https://paperswithcode.com/paper/conditional-deep-surrogate-models-for
Repo	https://github.com/ybyangpku/CADGMs
Framework	tf

Application of Decision Rules for Handling Class Imbalance in Semantic Segmentation


Title	Application of Decision Rules for Handling Class Imbalance in Semantic Segmentation
Authors	Robin Chan, Matthias Rottmann, Fabian Hüger, Peter Schlicht, Hanno Gottschalk
Abstract	As part of autonomous car driving systems, semantic segmentation is an essential component to obtain a full understanding of the car’s environment. One difficulty that occurs while training neural networks for this purpose is class imbalance of training data. Consequently, a neural network trained on unbalanced data in combination with maximum a-posteriori classification may easily ignore classes that are rare in terms of their frequency in the dataset. However, these classes are often of highest interest. We approach such potential misclassifications by weighting the posterior class probabilities with the prior class probabilities which in our case are the inverse frequencies of the corresponding classes in the training dataset. More precisely, we adopt a localized method by computing the priors pixel-wise such that the impact can be analyzed at pixel level as well. In our experiments, we train one network from scratch using a proprietary dataset containing 20,000 annotated frames of video sequences recorded from street scenes. The evaluation on our test set shows an increase of average recall with regard to instances of pedestrians and info signs by $25\%$ and $23.4\%$, respectively. In addition, we significantly reduce the non-detection rate for instances of the same classes by $61\%$ and $38\%$.
Tasks	Semantic Segmentation
Published	2019-01-24
URL	http://arxiv.org/abs/1901.08394v1
PDF	http://arxiv.org/pdf/1901.08394v1.pdf
PWC	https://paperswithcode.com/paper/application-of-decision-rules-for-handling
Repo	https://github.com/robin-chan/decision-rules
Framework	tf
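
The decision rule described in the abstract above, weighting posterior class probabilities by inverse class frequencies, can be illustrated for a single pixel. The sketch below is an assumption-laden toy, not the authors' implementation: the function names are hypothetical, the priors are plain class frequencies passed in by the caller, and the pixel-wise estimation of priors mentioned in the abstract is not reproduced.

```python
def map_decision(posterior):
    """Standard maximum a-posteriori rule: pick the most probable class."""
    return max(range(len(posterior)), key=lambda k: posterior[k])


def prior_weighted_decision(posterior, prior):
    """Divide each posterior by its class frequency before taking the argmax.

    Dividing by the frequency is the same as multiplying by the inverse
    frequency, so rare classes receive a larger weight.
    """
    scores = [p / q for p, q in zip(posterior, prior)]
    return max(range(len(scores)), key=lambda k: scores[k])


# One pixel, two classes: class 0 is frequent (90%), class 1 is rare (10%).
posterior = [0.7, 0.3]
prior = [0.9, 0.1]
frequent_wins = map_decision(posterior)                # index 0
rare_wins = prior_weighted_decision(posterior, prior)  # index 1
```

In this toy case the rare class wins under the weighted rule because 0.3 / 0.1 exceeds 0.7 / 0.9, which shows how such a rule trades some precision on frequent classes for recall on rare ones.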

A Natural Language Corpus of Common Grounding under Continuous and Partially-Observable Context


Title	A Natural Language Corpus of Common Grounding under Continuous and Partially-Observable Context
Authors	Takuma Udagawa, Akiko Aizawa
Abstract	Common grounding is the process of creating, repairing and updating mutual understandings, which is a critical aspect of sophisticated human communication. However, traditional dialogue systems have limited capability of establishing common ground, and we also lack task formulations which introduce natural difficulty in terms of common grounding while enabling easy evaluation and analysis of complex models. In this paper, we propose a minimal dialogue task which requires advanced skills of common grounding under continuous and partially-observable context. Based on this task formulation, we collected a large-scale dataset of 6,760 dialogues which fulfills essential requirements of natural language corpora. Our analysis of the dataset revealed important phenomena related to common grounding that need to be considered. Finally, we evaluate and analyze baseline neural models on a simple subtask that requires recognition of the created common ground. We show that simple baseline models perform decently but leave room for further improvement. Overall, we show that our proposed task will be a fundamental testbed where we can train, evaluate, and analyze a dialogue system’s ability for sophisticated common grounding.
Tasks	Dialogue Understanding, Goal-Oriented Dialog, Language Acquisition
Published	2019-07-08
URL	https://arxiv.org/abs/1907.03399v1
PDF	https://arxiv.org/pdf/1907.03399v1.pdf
PWC	https://paperswithcode.com/paper/a-natural-language-corpus-of-common-grounding
Repo	https://github.com/Alab-NII/onecommon
Framework	none

Exploiting Unlabeled Data in CNNs by Self-supervised Learning to Rank


Title	Exploiting Unlabeled Data in CNNs by Self-supervised Learning to Rank
Authors	Xialei Liu, Joost van de Weijer, Andrew D. Bagdanov
Abstract	For many applications the collection of labeled data is expensive and laborious. Exploitation of unlabeled data during training is thus a long-pursued objective of machine learning. Self-supervised learning addresses this by positing an auxiliary task (different, but related to the supervised task) for which data is abundantly available. In this paper, we show how ranking can be used as a proxy task for some regression problems. As another contribution, we propose an efficient backpropagation technique for Siamese networks which prevents the redundant computation introduced by the multi-branch network architecture. We apply our framework to two regression problems: Image Quality Assessment (IQA) and Crowd Counting. For both we show how to automatically generate ranked image sets from unlabeled data. Our results show that networks trained to regress to the ground truth targets for labeled data and to simultaneously learn to rank unlabeled data obtain significantly better, state-of-the-art results for both IQA and crowd counting. In addition, we show that measuring network uncertainty on the self-supervised proxy task is a good measure of informativeness of unlabeled data. This can be used to drive an algorithm for active learning and we show that this reduces labeling effort by up to 50%.
Tasks	Active Learning, Crowd Counting, Image Quality Assessment, Learning-To-Rank
Published	2019-02-17
URL	http://arxiv.org/abs/1902.06285v1
PDF	http://arxiv.org/pdf/1902.06285v1.pdf
PWC	https://paperswithcode.com/paper/exploiting-unlabeled-data-in-cnns-by-self
Repo	https://github.com/xialeiliu/CrowdCountingCVPR18
Framework	none
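
Using ranking as a proxy task, as the abstract above describes, amounts to penalizing a network whenever it scores two images in the wrong order. The sketch below uses a generic pairwise hinge loss for this; the abstract does not state the exact loss, so the hinge form, the `margin` parameter, and the function names are assumptions made for illustration.

```python
def pairwise_ranking_loss(score_higher, score_lower, margin=1.0):
    """Hinge loss for one ranked pair.

    score_higher: network output for the image known to rank higher.
    score_lower:  network output for the image known to rank lower.
    The loss is zero once the correct order holds by at least `margin`.
    """
    return max(0.0, margin - (score_higher - score_lower))


def mean_ranking_loss(pairs, margin=1.0):
    """Average the pairwise loss over (higher, lower) score pairs."""
    if not pairs:
        return 0.0
    return sum(pairwise_ranking_loss(h, l, margin) for h, l in pairs) / len(pairs)


# First pair is ordered correctly with enough margin, second is inverted.
loss = mean_ranking_loss([(3.0, 1.0), (1.0, 3.0)])
```

No count or quality label is needed for such pairs, only their relative order. In crowd counting, for example, a crop of an image cannot contain more people than the full image, so the pair can be ranked without annotation.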

Acute Lymphoblastic Leukemia Classification from Microscopic Images using Convolutional Neural Networks


Title	Acute Lymphoblastic Leukemia Classification from Microscopic Images using Convolutional Neural Networks
Authors	Jonas Prellberg, Oliver Kramer
Abstract	Examining blood microscopic images for leukemia is necessary when expensive equipment for flow cytometry is unavailable. Automated systems can ease the burden on medical experts for performing this examination and may be especially helpful to quickly screen a large number of patients. We present a simple, yet effective classification approach using a ResNeXt convolutional neural network with Squeeze-and-Excitation modules. The approach was evaluated in the C-NMC online challenge and achieves a weighted F1-score of 88.91% on the test set. Code is available at https://github.com/jprellberg/isbi2019cancer
Tasks
Published	2019-06-21
URL	https://arxiv.org/abs/1906.09020v2
PDF	https://arxiv.org/pdf/1906.09020v2.pdf
PWC	https://paperswithcode.com/paper/acute-lymphoblastic-leukemia-classification
Repo	https://github.com/jprellberg/isbi2019cancer
Framework	pytorch
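
The weighted F1-score reported in the abstract above can be made concrete with a small computation. The sketch below assumes the common support-weighted definition, in which the per-class F1 is averaged with weights proportional to the number of true samples in each class. Whether the C-NMC challenge uses exactly this definition is an assumption, and the function name is hypothetical.

```python
def weighted_f1(y_true, y_pred):
    """Support-weighted average of per-class F1 scores."""
    total = len(y_true)
    score = 0.0
    for c in sorted(set(y_true)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        support = tp + fn  # number of true samples of class c
        score += f1 * support / total
    return score


# Toy example with three samples of class 1 and one of class 0.
y_true = [1, 1, 1, 0]
y_pred = [1, 1, 0, 0]
score = weighted_f1(y_true, y_pred)
```

Here class 1 has an F1 of 0.8 with support 3 and class 0 has an F1 of 2/3 with support 1, so the weighted score is 23/30, roughly 0.767.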