Paper Group AWR 24
Deterministic consensus maximization with biconvex programming
Title | Deterministic consensus maximization with biconvex programming |
Authors | Zhipeng Cai, Tat-Jun Chin, Huu Le, David Suter |
Abstract | Consensus maximization is one of the most widely used robust fitting paradigms in computer vision, and the development of algorithms for consensus maximization is an active research topic. In this paper, we propose an efficient deterministic optimization algorithm for consensus maximization. Given an initial solution, our method conducts a deterministic search that forcibly increases the consensus of the initial solution. We show how each iteration of the update can be formulated as an instance of biconvex programming, which we solve efficiently using a novel biconvex optimization algorithm. In contrast to our algorithm, previous consensus improvement techniques rely on random sampling or relaxations of the objective function, which reduce their ability to significantly improve the initial consensus. In fact, on challenging instances, the previous techniques may even return a solution worse than the initial one. Comprehensive experiments show that our algorithm can consistently and greatly improve the quality of the initial solution, without substantial cost. |
Tasks | |
Published | 2018-07-25 |
URL | http://arxiv.org/abs/1807.09436v3 |
http://arxiv.org/pdf/1807.09436v3.pdf | |
PWC | https://paperswithcode.com/paper/deterministic-consensus-maximization-with |
Repo | https://github.com/ZhipengCai/Demo---Deterministic-consensus-maximization-with-biconvex-programming |
Framework | none |
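As a rough illustration of the objective being maximized, here is a minimal NumPy sketch of consensus (inlier counting) for robust line fitting, refined by a naive accept-if-not-worse loop. The data, threshold, and refitting rule are hypothetical stand-ins; the paper's method solves a biconvex program at each iteration rather than refitting by least squares.

```python
import numpy as np

def consensus(X, y, theta, eps):
    """Count inliers of the linear model y ~ X @ theta within threshold eps."""
    return int(np.sum(np.abs(X @ theta - y) <= eps))

# Hypothetical 1D line-fitting data: 70% inliers, 30% gross outliers.
rng = np.random.default_rng(0)
n = 200
X = np.column_stack([rng.uniform(-1.0, 1.0, n), np.ones(n)])
theta_true = np.array([2.0, -0.5])
y = X @ theta_true + rng.normal(0.0, 0.01, n)
outlier_idx = rng.choice(n, 60, replace=False)
y[outlier_idx] += rng.uniform(-5.0, 5.0, 60)

# Naive deterministic refinement: refit on the current inlier set and accept
# the update only if consensus does not drop. This loop only illustrates the
# objective; the paper's inner step is a biconvex program, not least squares.
eps = 0.05
theta = np.linalg.lstsq(X, y, rcond=None)[0]   # outlier-contaminated initial fit
for _ in range(20):
    inliers = np.abs(X @ theta - y) <= eps
    if inliers.sum() >= 2:
        theta_new = np.linalg.lstsq(X[inliers], y[inliers], rcond=None)[0]
        if consensus(X, y, theta_new, eps) >= consensus(X, y, theta, eps):
            theta = theta_new
print(consensus(X, y, theta, eps), "inliers out of", n)
```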
Cross-Domain Weakly-Supervised Object Detection through Progressive Domain Adaptation
Title | Cross-Domain Weakly-Supervised Object Detection through Progressive Domain Adaptation |
Authors | Naoto Inoue, Ryosuke Furuta, Toshihiko Yamasaki, Kiyoharu Aizawa |
Abstract | Can we detect common objects in a variety of image domains without instance-level annotations? In this paper, we present a framework for a novel task, cross-domain weakly supervised object detection, which addresses this question. In our setting, we have access to images with instance-level annotations in a source domain (e.g., natural image) and images with image-level annotations in a target domain (e.g., watercolor). In addition, the classes to be detected in the target domain are all or a subset of those in the source domain. Starting from a fully supervised object detector, which is pre-trained on the source domain, we propose a two-step progressive domain adaptation technique by fine-tuning the detector on two types of artificially and automatically generated samples. We test our methods on our newly collected datasets containing three image domains, and achieve an improvement of approximately 5 to 20 percentage points in terms of mean average precision (mAP) compared to the best-performing baselines. |
Tasks | Domain Adaptation, Object Detection, Weakly Supervised Object Detection |
Published | 2018-03-30 |
URL | http://arxiv.org/abs/1803.11365v1 |
http://arxiv.org/pdf/1803.11365v1.pdf | |
PWC | https://paperswithcode.com/paper/cross-domain-weakly-supervised-object |
Repo | https://github.com/naoto0804/cross-domain-detection |
Framework | none |
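The second adaptation step relies on pseudo ground truth mined from image-level labels. Below is a hedged sketch of that step, assuming a torchvision-style detection output dict and keeping the single top-scoring box per image-level class; the helper name pseudo_label and the thresholds are illustrative, not the authors' code.

```python
import torch

def pseudo_label(detections, image_level_classes, score_thresh=0.5):
    """For each class in the image-level labels, keep the detector's single
    top-scoring box as an instance-level pseudo ground truth."""
    boxes, labels, scores = (detections["boxes"], detections["labels"],
                             detections["scores"])
    kept_boxes, kept_labels = [], []
    for c in image_level_classes:
        mask = (labels == c) & (scores >= score_thresh)
        if mask.any():
            best = scores.masked_fill(~mask, -1.0).argmax()
            kept_boxes.append(boxes[best])
            kept_labels.append(labels[best])
    if not kept_boxes:
        return {"boxes": torch.zeros(0, 4), "labels": torch.zeros(0, dtype=torch.long)}
    return {"boxes": torch.stack(kept_boxes), "labels": torch.stack(kept_labels)}

# Hypothetical detector output for a target-domain image labeled {dog, cat}.
dets = {"boxes": torch.tensor([[0., 0, 10, 10], [5., 5, 20, 20], [1., 1, 8, 8]]),
        "labels": torch.tensor([1, 1, 2]),        # 1 = dog, 2 = cat
        "scores": torch.tensor([0.9, 0.6, 0.7])}
print(pseudo_label(dets, image_level_classes=[1, 2]))
```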
Cross-Lingual Cross-Platform Rumor Verification Pivoting on Multimedia Content
Title | Cross-Lingual Cross-Platform Rumor Verification Pivoting on Multimedia Content |
Authors | Weiming Wen, Songwen Su, Zhou Yu |
Abstract | With the increasing popularity of smart devices, rumors with multimedia content have become more and more common on social networks. Multimedia information usually makes rumors look more convincing. Therefore, finding an automatic approach to verify rumors with multimedia content is a pressing task. Previous rumor verification research only utilizes multimedia as input features. We propose not to use the multimedia content itself but to find external information on other news platforms by pivoting on it. We introduce a new feature set, cross-lingual cross-platform features, that leverages the semantic similarity between the rumors and the external information. Machine learning methods utilizing these features achieve state-of-the-art rumor verification results. |
Tasks | Semantic Similarity, Semantic Textual Similarity |
Published | 2018-08-14 |
URL | http://arxiv.org/abs/1808.04911v2 |
http://arxiv.org/pdf/1808.04911v2.pdf | |
PWC | https://paperswithcode.com/paper/cross-lingual-cross-platform-rumor |
Repo | https://github.com/RakdosCC/CCRV |
Framework | pytorch |
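A minimal sketch of the kind of similarity features the approach builds, using TF-IDF cosine similarity as a stand-in for the paper's cross-lingual semantic similarity; the function similarity_features and the max/mean/min aggregation are assumptions for illustration.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def similarity_features(rumor_text, retrieved_docs):
    """Similarity between a rumor and external articles retrieved by pivoting
    on its image (TF-IDF stand-in; the paper uses cross-lingual semantics)."""
    vec = TfidfVectorizer().fit([rumor_text] + retrieved_docs)
    r = vec.transform([rumor_text])
    d = vec.transform(retrieved_docs)
    sims = cosine_similarity(r, d).ravel()
    return np.array([sims.max(), sims.mean(), sims.min()])

feats = similarity_features(
    "Shark swims down a flooded highway after the hurricane",
    ["Fact check: the viral shark photo is edited",
     "Hurricane floods coastal roads"],
)
print(feats)   # feature vector fed to a downstream classifier
```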
Parametrized Deep Q-Networks Learning: Reinforcement Learning with Discrete-Continuous Hybrid Action Space
Title | Parametrized Deep Q-Networks Learning: Reinforcement Learning with Discrete-Continuous Hybrid Action Space |
Authors | Jiechao Xiong, Qing Wang, Zhuoran Yang, Peng Sun, Lei Han, Yang Zheng, Haobo Fu, Tong Zhang, Ji Liu, Han Liu |
Abstract | Most existing deep reinforcement learning (DRL) frameworks consider either a discrete action space or a continuous action space, but not both. Motivated by applications in computer games, we consider the scenario of a discrete-continuous hybrid action space. To handle hybrid action spaces, previous works either approximate the hybrid space by discretization, or relax it into a continuous set. In this paper, we propose a parametrized deep Q-network (P-DQN) framework for the hybrid action space without approximation or relaxation. Our algorithm combines the spirit of both DQN (dealing with discrete action spaces) and DDPG (dealing with continuous action spaces) by seamlessly integrating them. Empirical results on a simulation example, on scoring a goal in simulated RoboCup soccer, and on the solo mode of the game King of Glory (KOG) validate the efficiency and effectiveness of our method. |
Tasks | |
Published | 2018-10-10 |
URL | http://arxiv.org/abs/1810.06394v1 |
http://arxiv.org/pdf/1810.06394v1.pdf | |
PWC | https://paperswithcode.com/paper/parametrized-deep-q-networks-learning |
Repo | https://github.com/cycraig/MP-DQN |
Framework | pytorch |
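A minimal PyTorch sketch of the P-DQN structure, with illustrative sizes: one network proposes continuous parameters for every discrete action, and a Q-network scores each discrete action given the state and all proposed parameters; the hybrid action is the argmax discrete action paired with its own parameters. This omits replay, target networks, and exploration from the full algorithm.

```python
import torch
import torch.nn as nn

class PDQN(nn.Module):
    """P-DQN sketch: a parameter network proposes continuous parameters for
    every discrete action; a Q-network scores each (action, parameters) pair."""
    def __init__(self, state_dim, n_discrete, param_dim):
        super().__init__()
        self.n_discrete, self.param_dim = n_discrete, param_dim
        self.param_net = nn.Sequential(        # continuous parameters per action
            nn.Linear(state_dim, 128), nn.ReLU(),
            nn.Linear(128, n_discrete * param_dim), nn.Tanh())
        self.q_net = nn.Sequential(            # one Q-value per discrete action
            nn.Linear(state_dim + n_discrete * param_dim, 128), nn.ReLU(),
            nn.Linear(128, n_discrete))

    def forward(self, state):
        params = self.param_net(state)                      # (B, K * param_dim)
        q = self.q_net(torch.cat([state, params], dim=-1))  # (B, K)
        return q, params.view(-1, self.n_discrete, self.param_dim)

net = PDQN(state_dim=8, n_discrete=3, param_dim=2)
q, params = net(torch.randn(1, 8))
k = q.argmax(dim=-1)                                   # greedy discrete action
hybrid_action = (k.item(), params[0, k].squeeze(0))    # (discrete id, its params)
print(hybrid_action)
```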
Language GANs Falling Short
Title | Language GANs Falling Short |
Authors | Massimo Caccia, Lucas Caccia, William Fedus, Hugo Larochelle, Joelle Pineau, Laurent Charlin |
Abstract | Generating high-quality text with sufficient diversity is essential for a wide range of Natural Language Generation (NLG) tasks. Maximum-Likelihood (MLE) models trained with teacher forcing have consistently been reported as weak baselines, where poor performance is attributed to exposure bias (Bengio et al., 2015; Ranzato et al., 2015); at inference time, the model is fed its own prediction instead of a ground-truth token, which can lead to accumulating errors and poor samples. This line of reasoning has led to an outbreak of adversarial approaches for NLG, on the grounds that GANs do not suffer from exposure bias. In this work, we make several surprising observations which contradict common beliefs. First, we revisit the canonical evaluation framework for NLG, and point out fundamental flaws with quality-only evaluation: we show that one can outperform such metrics using a simple, well-known temperature parameter to artificially reduce the entropy of the model’s conditional distributions. Second, we leverage the control over the quality/diversity trade-off given by this parameter to evaluate models over the whole quality-diversity spectrum, and find that MLE models consistently outperform the proposed GAN variants over the whole quality-diversity space. Our results have several implications: 1) the impact of exposure bias on sample quality is less severe than previously thought; 2) temperature tuning provides a better quality/diversity trade-off than adversarial training while being easier to train, easier to cross-validate, and less computationally expensive. Code to reproduce the experiments is available at github.com/pclucas14/GansFallingShort |
Tasks | Text Generation |
Published | 2018-11-06 |
URL | https://arxiv.org/abs/1811.02549v6 |
https://arxiv.org/pdf/1811.02549v6.pdf | |
PWC | https://paperswithcode.com/paper/language-gans-falling-short |
Repo | https://github.com/pclucas14/GansFallingShort |
Framework | pytorch |
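The temperature knob at the center of the evaluation is essentially one line: dividing the logits by T before the softmax lowers (T < 1) or raises (T > 1) the entropy of the conditional distribution, and sweeping T traces the quality/diversity curve. A minimal sketch with hypothetical logits:

```python
import torch
import torch.nn.functional as F

def sample_with_temperature(logits, temperature):
    """Sharpen (T < 1) or flatten (T > 1) the next-token distribution."""
    probs = F.softmax(logits / temperature, dim=-1)
    return torch.multinomial(probs, num_samples=1)

logits = torch.randn(1, 50000)          # hypothetical vocabulary logits
for T in (0.5, 1.0, 1.5):
    token = sample_with_temperature(logits, T)
    print(T, token.item())
```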
MeshNet: Mesh Neural Network for 3D Shape Representation
Title | MeshNet: Mesh Neural Network for 3D Shape Representation |
Authors | Yutong Feng, Yifan Feng, Haoxuan You, Xibin Zhao, Yue Gao |
Abstract | Mesh is an important and powerful type of data for 3D shapes and is widely studied in the fields of computer vision and computer graphics. Regarding the task of 3D shape representation, there have been extensive research efforts concentrating on how to represent 3D shapes well using volumetric grids, multiple views and point clouds. However, there has been little effort to use mesh data in recent years, due to the complexity and irregularity of mesh data. In this paper, we propose a mesh neural network, named MeshNet, to learn 3D shape representations from mesh data. In this method, face-unit and feature splitting are introduced, and a general architecture with available and effective blocks is proposed. In this way, MeshNet is able to handle the complexity and irregularity of mesh data and represent 3D shapes well. We have applied the proposed MeshNet method to the applications of 3D shape classification and retrieval. Experimental results and comparisons with state-of-the-art methods demonstrate that the proposed MeshNet can achieve satisfactory 3D shape classification and retrieval performance, which indicates the effectiveness of the proposed method for 3D shape representation. |
Tasks | 3D Shape Representation |
Published | 2018-11-28 |
URL | http://arxiv.org/abs/1811.11424v1 |
http://arxiv.org/pdf/1811.11424v1.pdf | |
PWC | https://paperswithcode.com/paper/meshnet-mesh-neural-network-for-3d-shape |
Repo | https://github.com/iMoonLab/MeshNet |
Framework | pytorch |
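A sketch of the per-face inputs the face-unit idea implies: each triangle contributes a spatial feature (its center) and structural features (corner offsets and the unit normal). The concatenated layout below is an assumption for illustration; the network's split into spatial and structural streams is not shown.

```python
import torch

def face_features(verts, faces):
    """Per-face inputs in the spirit of MeshNet's spatial/structural split:
    center (spatial), corner offsets and unit normal (structural)."""
    tri = verts[faces]                                   # (F, 3, 3) corner coords
    center = tri.mean(dim=1)                             # (F, 3)
    corners = (tri - center[:, None]).reshape(len(faces), 9)   # (F, 9)
    normal = torch.linalg.cross(tri[:, 1] - tri[:, 0],
                                tri[:, 2] - tri[:, 0], dim=1)
    normal = torch.nn.functional.normalize(normal, dim=1)      # (F, 3)
    return torch.cat([center, corners, normal], dim=1)         # (F, 15)

verts = torch.tensor([[0., 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]])
faces = torch.tensor([[0, 1, 2], [0, 1, 3]])
print(face_features(verts, faces).shape)                 # torch.Size([2, 15])
```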
Text2Scene: Generating Compositional Scenes from Textual Descriptions
Title | Text2Scene: Generating Compositional Scenes from Textual Descriptions |
Authors | Fuwen Tan, Song Feng, Vicente Ordonez |
Abstract | In this paper, we propose Text2Scene, a model that generates various forms of compositional scene representations from natural language descriptions. Unlike recent works, our method does NOT use Generative Adversarial Networks (GANs). Text2Scene instead learns to sequentially generate objects and their attributes (location, size, appearance, etc.) at every time step by attending to different parts of the input text and the current status of the generated scene. We show that with minor modifications, the proposed framework can handle the generation of different forms of scene representations, including cartoon-like scenes, object layouts corresponding to real images, and synthetic images. Our method is not only competitive with state-of-the-art GAN-based methods according to automatic metrics, and superior according to human judgments, but also has the advantage of producing interpretable results. |
Tasks | |
Published | 2018-09-04 |
URL | https://arxiv.org/abs/1809.01110v3 |
https://arxiv.org/pdf/1809.01110v3.pdf | |
PWC | https://paperswithcode.com/paper/text2scene-generating-compositional-scenes |
Repo | https://github.com/uvavision/Text2Scene |
Framework | pytorch |
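A hedged sketch of one decoding step in this sequential style, with illustrative dimensions: attend over the encoded text given the current canvas state, then predict what object to add and where. The class StepDecoder and all its sizes are assumptions, not the authors' architecture.

```python
import torch
import torch.nn as nn

class StepDecoder(nn.Module):
    """One 'what + where' step: text attention conditioned on the canvas."""
    def __init__(self, dim=64, n_objects=80, n_locations=64):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        self.what = nn.Linear(dim, n_objects)        # object-class logits
        self.where = nn.Linear(2 * dim, n_locations) # coarse location-bin logits

    def forward(self, canvas_state, text_tokens):
        # canvas_state: (B, 1, dim) summary of what has been drawn so far
        ctx, _ = self.attn(canvas_state, text_tokens, text_tokens)
        obj_logits = self.what(ctx.squeeze(1))
        loc_logits = self.where(torch.cat([ctx, canvas_state], -1).squeeze(1))
        return obj_logits, loc_logits

dec = StepDecoder()
obj, loc = dec(torch.randn(2, 1, 64), torch.randn(2, 10, 64))
print(obj.shape, loc.shape)
```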
Heterogeneous Multilayer Generalized Operational Perceptron
Title | Heterogeneous Multilayer Generalized Operational Perceptron |
Authors | Dat Thanh Tran, Serkan Kiranyaz, Moncef Gabbouj, Alexandros Iosifidis |
Abstract | The traditional Multilayer Perceptron (MLP), built on the McCulloch-Pitts neuron model, is inherently limited to a single set of neuronal activities, i.e., a linear weighted sum followed by a nonlinear thresholding step. Previously, the Generalized Operational Perceptron (GOP) was proposed to extend the conventional perceptron model by defining a diverse set of neuronal activities that imitate a generalized model of biological neurons. Together with GOP, the Progressive Operational Perceptron (POP) algorithm was proposed to optimize a pre-defined template of multiple homogeneous layers in a layerwise manner. In this paper, we propose an efficient algorithm to learn a compact, fully heterogeneous multilayer network that allows each individual neuron, regardless of the layer, to have distinct characteristics. Based on the complexity of the problem, the proposed algorithm operates in a progressive manner at the neuronal level, searching for a compact topology not only in terms of depth but also width, i.e., the number of neurons in each layer. The proposed algorithm is shown to outperform related learning methods in extensive experiments on several classification problems. |
Tasks | |
Published | 2018-04-13 |
URL | http://arxiv.org/abs/1804.05093v3 |
http://arxiv.org/pdf/1804.05093v3.pdf | |
PWC | https://paperswithcode.com/paper/heterogeneous-multilayer-generalized |
Repo | https://github.com/viebboy/PyGOP |
Framework | tf |
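A small NumPy sketch of the GOP neuron model: the fixed multiply-sum-threshold pipeline becomes a choice of nodal, pooling, and activation operators drawn from a library. The specific operator sets below are illustrative.

```python
import numpy as np

# A GOP neuron replaces "multiply, sum, threshold" with a choice of nodal,
# pooling, and activation operators; this sketch enumerates a tiny library.
NODAL = {"mult": lambda x, w: x * w,
         "exp":  lambda x, w: np.exp(x * w) - 1,
         "sine": lambda x, w: np.sin(x * w)}
POOL = {"sum": np.sum, "max": np.max, "median": np.median}
ACT = {"tanh": np.tanh, "relu": lambda v: np.maximum(v, 0)}

def gop_neuron(x, w, b, nodal="mult", pool="sum", act="tanh"):
    return ACT[act](POOL[pool](NODAL[nodal](x, w)) + b)

x, w = np.random.randn(16), np.random.randn(16)
# The standard McCulloch-Pitts neuron is the (mult, sum, tanh) special case:
print(gop_neuron(x, w, b=0.1))
print(gop_neuron(x, w, b=0.1, nodal="sine", pool="median", act="relu"))
```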
Relational inductive biases, deep learning, and graph networks
Title | Relational inductive biases, deep learning, and graph networks |
Authors | Peter W. Battaglia, Jessica B. Hamrick, Victor Bapst, Alvaro Sanchez-Gonzalez, Vinicius Zambaldi, Mateusz Malinowski, Andrea Tacchetti, David Raposo, Adam Santoro, Ryan Faulkner, Caglar Gulcehre, Francis Song, Andrew Ballard, Justin Gilmer, George Dahl, Ashish Vaswani, Kelsey Allen, Charles Nash, Victoria Langston, Chris Dyer, Nicolas Heess, Daan Wierstra, Pushmeet Kohli, Matt Botvinick, Oriol Vinyals, Yujia Li, Razvan Pascanu |
Abstract | Artificial intelligence (AI) has undergone a renaissance recently, making major progress in key domains such as vision, language, control, and decision-making. This has been due, in part, to cheap data and cheap compute resources, which have fit the natural strengths of deep learning. However, many defining characteristics of human intelligence, which developed under much different pressures, remain out of reach for current approaches. In particular, generalizing beyond one’s experiences–a hallmark of human intelligence from infancy–remains a formidable challenge for modern AI. The following is part position paper, part review, and part unification. We argue that combinatorial generalization must be a top priority for AI to achieve human-like abilities, and that structured representations and computations are key to realizing this objective. Just as biology uses nature and nurture cooperatively, we reject the false choice between “hand-engineering” and “end-to-end” learning, and instead advocate for an approach which benefits from their complementary strengths. We explore how using relational inductive biases within deep learning architectures can facilitate learning about entities, relations, and rules for composing them. We present a new building block for the AI toolkit with a strong relational inductive bias–the graph network–which generalizes and extends various approaches for neural networks that operate on graphs, and provides a straightforward interface for manipulating structured knowledge and producing structured behaviors. We discuss how graph networks can support relational reasoning and combinatorial generalization, laying the foundation for more sophisticated, interpretable, and flexible patterns of reasoning. As a companion to this paper, we have released an open-source software library for building graph networks, with demonstrations of how to use them in practice. |
Tasks | Decision Making, Relational Reasoning |
Published | 2018-06-04 |
URL | http://arxiv.org/abs/1806.01261v3 |
http://arxiv.org/pdf/1806.01261v3.pdf | |
PWC | https://paperswithcode.com/paper/relational-inductive-biases-deep-learning-and |
Repo | https://github.com/raphaelavalos/attention_tsp_graph_net |
Framework | tf |
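A compact PyTorch sketch of one full graph-network (GN) block in the paper's sense: an edge update conditioned on both endpoints and the global, sum-aggregation of incoming edges, a node update, and a global update. The update functions are plain linear layers here, and all sizes are illustrative.

```python
import torch
import torch.nn as nn

dv, de, du = 4, 3, 2                       # node, edge, global feature sizes
phi_e = nn.Linear(de + 2 * dv + du, de)    # edge update function
phi_v = nn.Linear(de + dv + du, dv)        # node update function
phi_u = nn.Linear(de + dv + du, du)        # global update function

def gn_block(nodes, edges, senders, receivers, u):
    # 1. Edge update: each edge sees its features, both endpoints, the global.
    edges = phi_e(torch.cat([edges, nodes[senders], nodes[receivers],
                             u.expand(len(edges), -1)], dim=-1))
    # 2. Aggregate updated edges at their receiver nodes (sum).
    agg = torch.zeros(len(nodes), edges.shape[-1]).index_add_(0, receivers, edges)
    # 3. Node update from aggregated edges, old node features, and the global.
    nodes = phi_v(torch.cat([agg, nodes, u.expand(len(nodes), -1)], dim=-1))
    # 4. Global update from pooled edges, pooled nodes, and the old global.
    u = phi_u(torch.cat([edges.mean(0, keepdim=True),
                         nodes.mean(0, keepdim=True), u], dim=-1))
    return nodes, edges, u

nodes, edges, u = torch.randn(5, dv), torch.randn(7, de), torch.randn(1, du)
senders, receivers = torch.randint(0, 5, (7,)), torch.randint(0, 5, (7,))
nodes, edges, u = gn_block(nodes, edges, senders, receivers, u)
```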
Technical Note on Transcription Factor Motif Discovery from Importance Scores (TF-MoDISco) version 0.5.1.1
Title | Technical Note on Transcription Factor Motif Discovery from Importance Scores (TF-MoDISco) version 0.5.1.1 |
Authors | Avanti Shrikumar, Katherine Tian, Žiga Avsec, Anna Shcherbina, Abhimanyu Banerjee, Mahfuza Sharmin, Surag Nair, Anshul Kundaje |
Abstract | TF-MoDISco (Transcription Factor Motif Discovery from Importance Scores) is an algorithm for identifying motifs from basepair-level importance scores computed on genomic sequence data. This technical note focuses on version v0.5.1.1. The implementation is available at https://github.com/kundajelab/tfmodisco/tree/v0.5.1.1 |
Tasks | |
Published | 2018-10-31 |
URL | https://arxiv.org/abs/1811.00416v4 |
https://arxiv.org/pdf/1811.00416v4.pdf | |
PWC | https://paperswithcode.com/paper/tf-modisco-v0422-alpha-technical-note |
Repo | https://github.com/kundajelab/tfmodisco |
Framework | tf |
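A very rough sketch of the first stage of the pipeline (candidate seqlet extraction from per-base importance scores), assuming a simple sliding-window percentile threshold; the actual implementation uses FDR-controlled thresholds plus clustering and alignment stages that are omitted here.

```python
import numpy as np

def extract_seqlets(scores, window=7, percentile=90):
    """Slide a window over per-base importance scores and keep high-total
    windows as candidate seqlets (a crude stand-in for TF-MoDISco's stage 1)."""
    totals = np.convolve(scores, np.ones(window), mode="valid")
    thresh = np.percentile(totals, percentile)
    return [(i, i + window) for i, t in enumerate(totals) if t >= thresh]

scores = np.random.rand(200)
scores[50:57] += 2.0                     # a planted high-importance motif
print(extract_seqlets(scores)[:5])       # (start, end) candidate intervals
```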
Improving Document Binarization via Adversarial Noise-Texture Augmentation
Title | Improving Document Binarization via Adversarial Noise-Texture Augmentation |
Authors | Ankan Kumar Bhunia, Ayan Kumar Bhunia, Aneeshan Sain, Partha Pratim Roy |
Abstract | Binarization of degraded document images is an elementary step in most problems in the document image analysis domain. This paper revisits the binarization problem by introducing an adversarial learning approach. We construct a Texture Augmentation Network that transfers the texture element of a degraded reference document image to a clean binary image. In this way, the network creates multiple versions of the same textual content with various noisy textures, thus enlarging the available document binarization datasets. Finally, the newly generated images are passed through a Binarization network to recover the clean version. By jointly training the two networks we can increase the adversarial robustness of our system. It is also noteworthy that our model can learn from unpaired data. Experimental results suggest that the proposed method achieves superior performance on the widely used DIBCO datasets. |
Tasks | Document Binarization, Domain Adaptation, Transfer Learning |
Published | 2018-10-25 |
URL | http://arxiv.org/abs/1810.11120v2 |
http://arxiv.org/pdf/1810.11120v2.pdf | |
PWC | https://paperswithcode.com/paper/improving-document-binarization-via |
Repo | https://github.com/ankanbhunia/AdverseBiNet |
Framework | tf |
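A crude stand-in for the texture-transfer idea, assuming grayscale images in [0, 1]: keep the clean strokes and replace the background with a degraded reference, yielding a (degraded, clean) pair for training a binarizer. The real Texture Augmentation Network learns this transfer adversarially rather than by pasting.

```python
import numpy as np

def naive_texture_augment(clean_binary, noisy_reference):
    """Paste the reference document's background texture behind the clean
    text strokes (hand-coded stand-in for the learned texture transfer)."""
    degraded = noisy_reference.copy()
    strokes = clean_binary < 0.5                 # text pixels in the clean image
    degraded[strokes] = clean_binary[strokes]    # keep strokes, noisy background
    return degraded

clean = np.ones((64, 64)); clean[20:24, 8:56] = 0.0   # a synthetic text line
noisy = 0.7 + 0.3 * np.random.rand(64, 64)            # stained background
pair = (naive_texture_augment(clean, noisy), clean)   # (input, target) for BiNet
```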
Strong-Weak Distribution Alignment for Adaptive Object Detection
Title | Strong-Weak Distribution Alignment for Adaptive Object Detection |
Authors | Kuniaki Saito, Yoshitaka Ushiku, Tatsuya Harada, Kate Saenko |
Abstract | We propose an approach for unsupervised adaptation of object detectors from label-rich to label-poor domains which can significantly reduce annotation costs associated with detection. Recently, approaches that align distributions of source and target images using an adversarial loss have been proven effective for adapting object classifiers. However, for object detection, fully matching the entire distributions of source and target images to each other at the global image level may fail, as domains could have distinct scene layouts and different combinations of objects. On the other hand, strong matching of local features such as texture and color makes sense, as it does not change category level semantics. This motivates us to propose a novel method for detector adaptation based on strong local alignment and weak global alignment. Our key contribution is the weak alignment model, which focuses the adversarial alignment loss on images that are globally similar and puts less emphasis on aligning images that are globally dissimilar. Additionally, we design the strong domain alignment model to only look at local receptive fields of the feature map. We empirically verify the effectiveness of our method on four datasets comprising both large and small domain shifts. Our code is available at \url{https://github.com/VisionLearningGroup/DA_Detection} |
Tasks | Object Detection, Unsupervised Domain Adaptation |
Published | 2018-12-12 |
URL | http://arxiv.org/abs/1812.04798v3 |
http://arxiv.org/pdf/1812.04798v3.pdf | |
PWC | https://paperswithcode.com/paper/strong-weak-distribution-alignment-for |
Repo | https://github.com/VisionLearningGroup/DA_Detection |
Framework | pytorch |
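The weak global alignment can be read as a focal-style reweighting of the domain-adversarial loss: examples whose domain is easy to classify (globally dissimilar images) are down-weighted by (1 - p)^gamma. A minimal sketch, with the sigmoid domain classifier and the gamma value as assumptions about the exact form used.

```python
import torch

def weak_global_alignment_loss(domain_logits, is_target, gamma=3.0):
    """Focal-style domain loss: easy (globally dissimilar) examples are
    down-weighted, so the adversarial signal concentrates on globally
    similar images, per the weak-alignment idea."""
    p = torch.sigmoid(domain_logits)
    p_correct = torch.where(is_target, p, 1 - p)   # prob of the true domain
    return (-(1 - p_correct) ** gamma * torch.log(p_correct + 1e-8)).mean()

logits = torch.randn(8)                            # global domain-classifier logits
is_target = torch.tensor([True, False] * 4)
print(weak_global_alignment_loss(logits, is_target))
```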
Unified Perceptual Parsing for Scene Understanding
Title | Unified Perceptual Parsing for Scene Understanding |
Authors | Tete Xiao, Yingcheng Liu, Bolei Zhou, Yuning Jiang, Jian Sun |
Abstract | Humans recognize the visual world at multiple levels: we effortlessly categorize scenes and detect objects inside, while also identifying the textures and surfaces of the objects along with their different compositional parts. In this paper, we study a new task called Unified Perceptual Parsing, which requires the machine vision systems to recognize as many visual concepts as possible from a given image. A multi-task framework called UPerNet and a training strategy are developed to learn from heterogeneous image annotations. We benchmark our framework on Unified Perceptual Parsing and show that it is able to effectively segment a wide range of concepts from images. The trained networks are further applied to discover visual knowledge in natural scenes. Models are available at \url{https://github.com/CSAILVision/unifiedparsing}. |
Tasks | Scene Understanding, Semantic Segmentation |
Published | 2018-07-26 |
URL | http://arxiv.org/abs/1807.10221v1 |
http://arxiv.org/pdf/1807.10221v1.pdf | |
PWC | https://paperswithcode.com/paper/unified-perceptual-parsing-for-scene |
Repo | https://github.com/CSAILVision/unifiedparsing |
Framework | pytorch |
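A minimal sketch of the multi-head structure the unified task implies: one shared feature map feeds separate heads for image-level (scene) and pixel-level (object, material) concepts, each trainable only on images annotated at its level. The class UnifiedHeads and all sizes are illustrative, not UPerNet's actual decoder.

```python
import torch
import torch.nn as nn

class UnifiedHeads(nn.Module):
    """Per-level prediction heads over one shared feature map."""
    def __init__(self, feat_dim, n_scene, n_object, n_material):
        super().__init__()
        self.scene = nn.Linear(feat_dim, n_scene)           # image-level label
        self.object = nn.Conv2d(feat_dim, n_object, 1)      # pixel-level map
        self.material = nn.Conv2d(feat_dim, n_material, 1)  # pixel-level map

    def forward(self, fmap):                                # (B, C, H, W)
        scene = self.scene(fmap.mean(dim=(2, 3)))           # global-pooled
        return scene, self.object(fmap), self.material(fmap)

heads = UnifiedHeads(feat_dim=256, n_scene=365, n_object=150, n_material=26)
scene, obj, mat = heads(torch.randn(2, 256, 32, 32))
print(scene.shape, obj.shape, mat.shape)
```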
Embedding Multimodal Relational Data for Knowledge Base Completion
Title | Embedding Multimodal Relational Data for Knowledge Base Completion |
Authors | Pouya Pezeshkpour, Liyan Chen, Sameer Singh |
Abstract | Representing entities and relations in an embedding space is a well-studied approach for machine learning on relational data. Existing approaches, however, primarily focus on simple link structure between a finite set of entities, ignoring the variety of data types that are often used in knowledge bases, such as text, images, and numerical values. In this paper, we propose multimodal knowledge base embeddings (MKBE) that use different neural encoders for this variety of observed data, and combine them with existing relational models to learn embeddings of the entities and multimodal data. Further, using these learned embeddings and different neural decoders, we introduce a novel multimodal imputation model to generate missing multimodal values, like text and images, from information in the knowledge base. We enrich existing relational datasets to create two novel benchmarks that contain additional information such as textual descriptions and images of the original entities. We demonstrate that our models utilize this additional information effectively to provide more accurate link prediction, achieving state-of-the-art results with a considerable gap of 5-7% over existing methods. Further, we evaluate the quality of our generated multimodal values via a user study. We have released the datasets and the open-source implementation of our models at https://github.com/pouyapez/mkbe |
Tasks | Imputation, Knowledge Base Completion, Link Prediction |
Published | 2018-09-05 |
URL | http://arxiv.org/abs/1809.01341v2 |
http://arxiv.org/pdf/1809.01341v2.pdf | |
PWC | https://paperswithcode.com/paper/embedding-multimodal-relational-data-for |
Repo | https://github.com/pouyapez/mkbe |
Framework | tf |
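A hedged PyTorch sketch of the core idea: per-modality tail encoders map entity IDs, token sequences, and numbers into one embedding space, scored by an existing relational model (DistMult here). The paper's encoders are stronger (e.g., CNNs for images, RNNs for text); everything below is illustrative.

```python
import torch
import torch.nn as nn

class MultimodalKB(nn.Module):
    """MKBE-style sketch: per-modality tail encoders + a DistMult score."""
    def __init__(self, n_entities, n_relations, dim=64, vocab=1000):
        super().__init__()
        self.ent = nn.Embedding(n_entities, dim)
        self.rel = nn.Embedding(n_relations, dim)
        self.tok = nn.Embedding(vocab, dim)   # text encoder: mean-pooled tokens
        self.num = nn.Linear(1, dim)          # numeric encoder

    def encode_tail(self, kind, value):
        if kind == "entity":
            return self.ent(value)
        if kind == "text":
            return self.tok(value).mean(dim=1)   # (B, L) token ids -> (B, dim)
        return self.num(value)                   # (B, 1) numbers -> (B, dim)

    def score(self, head, relation, kind, value):
        # DistMult: <e_h, r, e_t>; any tail modality plugs into the same score.
        return (self.ent(head) * self.rel(relation)
                * self.encode_tail(kind, value)).sum(-1)

kb = MultimodalKB(n_entities=100, n_relations=10)
print(kb.score(torch.tensor([3]), torch.tensor([1]), "number",
               torch.tensor([[27.0]])))
print(kb.score(torch.tensor([3]), torch.tensor([2]), "text",
               torch.randint(0, 1000, (1, 5))))
```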
Patient2Vec: A Personalized Interpretable Deep Representation of the Longitudinal Electronic Health Record
Title | Patient2Vec: A Personalized Interpretable Deep Representation of the Longitudinal Electronic Health Record |
Authors | Jinghe Zhang, Kamran Kowsari, James H. Harrison, Jennifer M. Lobo, Laura E. Barnes |
Abstract | The wide implementation of electronic health record (EHR) systems facilitates the collection of large-scale health data from real clinical settings. Despite the significant increase in adoption of EHR systems, this data remains largely unexplored, but presents a rich data source for knowledge discovery from patient health histories in tasks such as understanding disease correlations and predicting health outcomes. However, the heterogeneity, sparsity, noise, and bias in this data present many complex challenges. This complexity makes it difficult to translate potentially relevant information into machine learning algorithms. In this paper, we propose a computational framework, Patient2Vec, to learn an interpretable deep representation of longitudinal EHR data which is personalized for each patient. To evaluate this approach, we apply it to the prediction of future hospitalizations using real EHR data and compare its predictive performance with baseline methods. Patient2Vec produces a vector space with meaningful structure, achieving an AUC of approximately 0.799 and outperforming baseline methods. Finally, the learned feature importances can be visualized and interpreted at both the individual and population levels to provide clinical insight. |
Tasks | Feature Importance |
Published | 2018-10-10 |
URL | http://arxiv.org/abs/1810.04793v3 |
http://arxiv.org/pdf/1810.04793v3.pdf | |
PWC | https://paperswithcode.com/paper/patient2vec-a-personalized-interpretable-deep |
Repo | https://github.com/BarnesLab/Patient2Vec |
Framework | pytorch |
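A minimal sketch of the model shape the abstract describes, assuming integer-coded visits: embed the codes within each visit, run a GRU over the visit sequence, attend over the hidden states (the interpretable weights), and predict hospitalization. Patient2Vec's actual hierarchical attention is richer than this single level.

```python
import torch
import torch.nn as nn

class Patient2VecSketch(nn.Module):
    """Visit embedding -> GRU -> visit-level attention -> risk prediction."""
    def __init__(self, n_codes, dim=64):
        super().__init__()
        self.embed = nn.EmbeddingBag(n_codes, dim)   # pool codes into one visit
        self.gru = nn.GRU(dim, dim, batch_first=True)
        self.attn = nn.Linear(dim, 1)                # interpretable visit weights
        self.out = nn.Linear(dim, 1)

    def forward(self, visit_codes):                  # (B, T, codes_per_visit)
        B, T, L = visit_codes.shape
        visits = self.embed(visit_codes.view(B * T, L)).view(B, T, -1)
        h, _ = self.gru(visits)
        alpha = torch.softmax(self.attn(h), dim=1)   # attention over visits
        risk = torch.sigmoid(self.out((alpha * h).sum(dim=1)))
        return risk, alpha                           # prediction + explanation

model = Patient2VecSketch(n_codes=500)
prob, alpha = model(torch.randint(0, 500, (2, 6, 4)))
print(prob.shape, alpha.shape)
```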