Paper Group AWR 24
Deterministic consensus maximization with biconvex programming
Title | Deterministic consensus maximization with biconvex programming |
Authors | Zhipeng Cai, Tat-Jun Chin, Huu Le, David Suter |
Abstract | Consensus maximization is one of the most widely used robust fitting paradigms in computer vision, and the development of algorithms for consensus maximization is an active research topic. In this paper, we propose an efficient deterministic optimization algorithm for consensus maximization. Given an initial solution, our method conducts a deterministic search that forcibly increases the consensus of the initial solution. We show how each iteration of the update can be formulated as an instance of biconvex programming, which we solve efficiently using a novel biconvex optimization algorithm. In contrast to our algorithm, previous consensus improvement techniques rely on random sampling or relaxations of the objective function, which reduce their ability to significantly improve the initial consensus. In fact, on challenging instances, the previous techniques may even return a solution worse than the initial one. Comprehensive experiments show that our algorithm can consistently and greatly improve the quality of the initial solution, without substantial cost. |
Tasks | |
Published | 2018-07-25 |
URL | http://arxiv.org/abs/1807.09436v3 |
http://arxiv.org/pdf/1807.09436v3.pdf | |
PWC | https://paperswithcode.com/paper/deterministic-consensus-maximization-with |
Repo | https://github.com/ZhipengCai/Demo---Deterministic-consensus-maximization-with-biconvex-programming |
Framework | none |
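As a rough illustration of the objective being maximized, here is a minimal NumPy sketch of consensus (inlier counting) for robust line fitting, refined by a naive accept-if-not-worse loop. The data, threshold, and refitting rule are hypothetical stand-ins; the paper's method solves a biconvex program at each iteration rather than refitting by least squares.

```python
import numpy as np

def consensus(X, y, theta, eps):
    """Count inliers of the linear model y ~ X @ theta within threshold eps."""
    return int(np.sum(np.abs(X @ theta - y) <= eps))

# Hypothetical 1D line-fitting data: 70% inliers, 30% gross outliers.
rng = np.random.default_rng(0)
n = 200
X = np.column_stack([rng.uniform(-1.0, 1.0, n), np.ones(n)])
theta_true = np.array([2.0, -0.5])
y = X @ theta_true + rng.normal(0.0, 0.01, n)
outlier_idx = rng.choice(n, 60, replace=False)
y[outlier_idx] += rng.uniform(-5.0, 5.0, 60)

# Naive deterministic refinement: refit on the current inlier set and accept
# the update only if consensus does not drop. This loop only illustrates the
# objective; the paper's inner step is a biconvex program, not least squares.
eps = 0.05
theta = np.linalg.lstsq(X, y, rcond=None)[0]   # outlier-contaminated initial fit
for _ in range(20):
    inliers = np.abs(X @ theta - y) <= eps
    if inliers.sum() >= 2:
        theta_new = np.linalg.lstsq(X[inliers], y[inliers], rcond=None)[0]
        if consensus(X, y, theta_new, eps) >= consensus(X, y, theta, eps):
            theta = theta_new
print(consensus(X, y, theta, eps), "inliers out of", n)
```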
Cross-Domain Weakly-Supervised Object Detection through Progressive Domain Adaptation
Title | Cross-Domain Weakly-Supervised Object Detection through Progressive Domain Adaptation |
Authors | Naoto Inoue, Ryosuke Furuta, Toshihiko Yamasaki, Kiyoharu Aizawa |
Abstract | Can we detect common objects in a variety of image domains without instance-level annotations? In this paper, we present a framework for a novel task, cross-domain weakly supervised object detection, which addresses this question. In our setting, we have access to images with instance-level annotations in a source domain (e.g., natural image) and images with image-level annotations in a target domain (e.g., watercolor). In addition, the classes to be detected in the target domain are all or a subset of those in the source domain. Starting from a fully supervised object detector, which is pre-trained on the source domain, we propose a two-step progressive domain adaptation technique by fine-tuning the detector on two types of artificially and automatically generated samples. We test our methods on our newly collected datasets containing three image domains, and achieve an improvement of approximately 5 to 20 percentage points in terms of mean average precision (mAP) compared to the best-performing baselines. |
Tasks | Domain Adaptation, Object Detection, Weakly Supervised Object Detection |
Published | 2018-03-30 |
URL | http://arxiv.org/abs/1803.11365v1 |
http://arxiv.org/pdf/1803.11365v1.pdf | |
PWC | https://paperswithcode.com/paper/cross-domain-weakly-supervised-object |
Repo | https://github.com/naoto0804/cross-domain-detection |
Framework | none |
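The second adaptation step relies on pseudo ground truth mined from image-level labels. Below is a hedged sketch of that step, assuming a torchvision-style detection output dict and keeping the single top-scoring box per image-level class; the helper name pseudo_label and the thresholds are illustrative, not the authors' code.

```python
import torch

def pseudo_label(detections, image_level_classes, score_thresh=0.5):
    """For each class in the image-level labels, keep the detector's single
    top-scoring box as an instance-level pseudo ground truth."""
    boxes, labels, scores = (detections["boxes"], detections["labels"],
                             detections["scores"])
    kept_boxes, kept_labels = [], []
    for c in image_level_classes:
        mask = (labels == c) & (scores >= score_thresh)
        if mask.any():
            best = scores.masked_fill(~mask, -1.0).argmax()
            kept_boxes.append(boxes[best])
            kept_labels.append(labels[best])
    if not kept_boxes:
        return {"boxes": torch.zeros(0, 4), "labels": torch.zeros(0, dtype=torch.long)}
    return {"boxes": torch.stack(kept_boxes), "labels": torch.stack(kept_labels)}

# Hypothetical detector output for a target-domain image labeled {dog, cat}.
dets = {"boxes": torch.tensor([[0., 0, 10, 10], [5., 5, 20, 20], [1., 1, 8, 8]]),
        "labels": torch.tensor([1, 1, 2]),        # 1 = dog, 2 = cat
        "scores": torch.tensor([0.9, 0.6, 0.7])}
print(pseudo_label(dets, image_level_classes=[1, 2]))
```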
Cross-Lingual Cross-Platform Rumor Verification Pivoting on Multimedia Content
Title | Cross-Lingual Cross-Platform Rumor Verification Pivoting on Multimedia Content |
Authors | Weiming Wen, Songwen Su, Zhou Yu |
Abstract | With the increasing popularity of smart devices, rumors with multimedia content have become more and more common on social networks. Multimedia information usually makes rumors look more convincing. Therefore, finding an automatic approach to verify rumors with multimedia content is a pressing task. Previous rumor verification research only utilizes multimedia as input features. We propose not to use the multimedia content itself but to find external information on other news platforms by pivoting on it. We introduce a new feature set, cross-lingual cross-platform features, that leverages the semantic similarity between the rumors and the external information. Machine learning methods utilizing these features achieve state-of-the-art rumor verification results. |
Tasks | Semantic Similarity, Semantic Textual Similarity |
Published | 2018-08-14 |
URL | http://arxiv.org/abs/1808.04911v2 |
http://arxiv.org/pdf/1808.04911v2.pdf | |
PWC | https://paperswithcode.com/paper/cross-lingual-cross-platform-rumor |
Repo | https://github.com/RakdosCC/CCRV |
Framework | pytorch |
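A minimal sketch of the kind of similarity features the approach builds, using TF-IDF cosine similarity as a stand-in for the paper's cross-lingual semantic similarity; the function similarity_features and the max/mean/min aggregation are assumptions for illustration.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def similarity_features(rumor_text, retrieved_docs):
    """Similarity between a rumor and external articles retrieved by pivoting
    on its image (TF-IDF stand-in; the paper uses cross-lingual semantics)."""
    vec = TfidfVectorizer().fit([rumor_text] + retrieved_docs)
    r = vec.transform([rumor_text])
    d = vec.transform(retrieved_docs)
    sims = cosine_similarity(r, d).ravel()
    return np.array([sims.max(), sims.mean(), sims.min()])

feats = similarity_features(
    "Shark swims down a flooded highway after the hurricane",
    ["Fact check: the viral shark photo is edited",
     "Hurricane floods coastal roads"],
)
print(feats)   # feature vector fed to a downstream classifier
```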
Parametrized Deep Q-Networks Learning: Reinforcement Learning with Discrete-Continuous Hybrid Action Space
Title | Parametrized Deep Q-Networks Learning: Reinforcement Learning with Discrete-Continuous Hybrid Action Space |
Authors | Jiechao Xiong, Qing Wang, Zhuoran Yang, Peng Sun, Lei Han, Yang Zheng, Haobo Fu, Tong Zhang, Ji Liu, Han Liu |
Abstract | Most existing deep reinforcement learning (DRL) frameworks consider either a discrete action space or a continuous action space, but not both. Motivated by applications in computer games, we consider the scenario of a discrete-continuous hybrid action space. To handle hybrid action spaces, previous works either approximate the hybrid space by discretization, or relax it into a continuous set. In this paper, we propose a parametrized deep Q-network (P-DQN) framework for the hybrid action space without approximation or relaxation. Our algorithm combines the spirit of both DQN (dealing with discrete action spaces) and DDPG (dealing with continuous action spaces) by seamlessly integrating them. Empirical results on a simulation example, on scoring a goal in simulated RoboCup soccer, and on the solo mode of the game King of Glory (KOG) validate the efficiency and effectiveness of our method. |
Tasks | |
Published | 2018-10-10 |
URL | http://arxiv.org/abs/1810.06394v1 |
http://arxiv.org/pdf/1810.06394v1.pdf | |
PWC | https://paperswithcode.com/paper/parametrized-deep-q-networks-learning |
Repo | https://github.com/cycraig/MP-DQN |
Framework | pytorch |
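A minimal PyTorch sketch of the P-DQN structure, with illustrative sizes: one network proposes continuous parameters for every discrete action, and a Q-network scores each discrete action given the state and all proposed parameters; the hybrid action is the argmax discrete action paired with its own parameters. This omits replay, target networks, and exploration from the full algorithm.

```python
import torch
import torch.nn as nn

class PDQN(nn.Module):
    """P-DQN sketch: a parameter network proposes continuous parameters for
    every discrete action; a Q-network scores each (action, parameters) pair."""
    def __init__(self, state_dim, n_discrete, param_dim):
        super().__init__()
        self.n_discrete, self.param_dim = n_discrete, param_dim
        self.param_net = nn.Sequential(        # continuous parameters per action
            nn.Linear(state_dim, 128), nn.ReLU(),
            nn.Linear(128, n_discrete * param_dim), nn.Tanh())
        self.q_net = nn.Sequential(            # one Q-value per discrete action
            nn.Linear(state_dim + n_discrete * param_dim, 128), nn.ReLU(),
            nn.Linear(128, n_discrete))

    def forward(self, state):
        params = self.param_net(state)                      # (B, K * param_dim)
        q = self.q_net(torch.cat([state, params], dim=-1))  # (B, K)
        return q, params.view(-1, self.n_discrete, self.param_dim)

net = PDQN(state_dim=8, n_discrete=3, param_dim=2)
q, params = net(torch.randn(1, 8))
k = q.argmax(dim=-1)                                   # greedy discrete action
hybrid_action = (k.item(), params[0, k].squeeze(0))    # (discrete id, its params)
print(hybrid_action)
```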
Language GANs Falling Short
Title | Language GANs Falling Short |
Authors | Massimo Caccia, Lucas Caccia, William Fedus, Hugo Larochelle, Joelle Pineau, Laurent Charlin |
Abstract | Generating high-quality text with sufficient diversity is essential for a wide range of Natural Language Generation (NLG) tasks. Maximum-Likelihood (MLE) models trained with teacher forcing have consistently been reported as weak baselines, where poor performance is attributed to exposure bias (Bengio et al., 2015; Ranzato et al., 2015); at inference time, the model is fed its own prediction instead of a ground-truth token, which can lead to accumulating errors and poor samples. This line of reasoning has led to an outbreak of adversarial approaches for NLG, on the grounds that GANs do not suffer from exposure bias. In this work, we make several surprising observations which contradict common beliefs. First, we revisit the canonical evaluation framework for NLG, and point out fundamental flaws with quality-only evaluation: we show that one can outperform such metrics using a simple, well-known temperature parameter to artificially reduce the entropy of the model’s conditional distributions. Second, we leverage the control over the quality/diversity trade-off given by this parameter to evaluate models over the whole quality-diversity spectrum, and find that MLE models consistently outperform the proposed GAN variants over the whole quality-diversity space. Our results have several implications: 1) the impact of exposure bias on sample quality is less severe than previously thought; 2) temperature tuning provides a better quality/diversity trade-off than adversarial training while being easier to train, easier to cross-validate, and less computationally expensive. Code to reproduce the experiments is available at github.com/pclucas14/GansFallingShort |
Tasks | Text Generation |
Published | 2018-11-06 |
URL | https://arxiv.org/abs/1811.02549v6 |
https://arxiv.org/pdf/1811.02549v6.pdf | |
PWC | https://paperswithcode.com/paper/language-gans-falling-short |
Repo | https://github.com/pclucas14/GansFallingShort |
Framework | pytorch |
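The temperature knob at the center of the evaluation is essentially one line: dividing the logits by T before the softmax lowers (T < 1) or raises (T > 1) the entropy of the conditional distribution, and sweeping T traces the quality/diversity curve. A minimal sketch with hypothetical logits:

```python
import torch
import torch.nn.functional as F

def sample_with_temperature(logits, temperature):
    """Sharpen (T < 1) or flatten (T > 1) the next-token distribution."""
    probs = F.softmax(logits / temperature, dim=-1)
    return torch.multinomial(probs, num_samples=1)

logits = torch.randn(1, 50000)          # hypothetical vocabulary logits
for T in (0.5, 1.0, 1.5):
    token = sample_with_temperature(logits, T)
    print(T, token.item())
```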
MeshNet: Mesh Neural Network for 3D Shape Representation
Title | MeshNet: Mesh Neural Network for 3D Shape Representation |
Authors | Yutong Feng, Yifan Feng, Haoxuan You, Xibin Zhao, Yue Gao |
Abstract | Mesh is an important and powerful type of data for 3D shapes and is widely studied in the fields of computer vision and computer graphics. Regarding the task of 3D shape representation, there have been extensive research efforts concentrating on how to represent 3D shapes well using volumetric grids, multiple views and point clouds. However, there has been little effort to use mesh data in recent years, due to the complexity and irregularity of mesh data. In this paper, we propose a mesh neural network, named MeshNet, to learn 3D shape representations from mesh data. In this method, face-unit and feature splitting are introduced, and a general architecture with available and effective blocks is proposed. In this way, MeshNet is able to handle the complexity and irregularity of mesh data and represent 3D shapes well. We have applied the proposed MeshNet method to the applications of 3D shape classification and retrieval. Experimental results and comparisons with state-of-the-art methods demonstrate that the proposed MeshNet can achieve satisfactory 3D shape classification and retrieval performance, which indicates the effectiveness of the proposed method for 3D shape representation. |
Tasks | 3D Shape Representation |
Published | 2018-11-28 |
URL | http://arxiv.org/abs/1811.11424v1 |
http://arxiv.org/pdf/1811.11424v1.pdf | |
PWC | https://paperswithcode.com/paper/meshnet-mesh-neural-network-for-3d-shape |
Repo | https://github.com/iMoonLab/MeshNet |
Framework | pytorch |
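A sketch of the per-face inputs the face-unit idea implies: each triangle contributes a spatial feature (its center) and structural features (corner offsets and the unit normal). The concatenated layout below is an assumption for illustration; the network's split into spatial and structural streams is not shown.

```python
import torch

def face_features(verts, faces):
    """Per-face inputs in the spirit of MeshNet's spatial/structural split:
    center (spatial), corner offsets and unit normal (structural)."""
    tri = verts[faces]                                   # (F, 3, 3) corner coords
    center = tri.mean(dim=1)                             # (F, 3)
    corners = (tri - center[:, None]).reshape(len(faces), 9)   # (F, 9)
    normal = torch.linalg.cross(tri[:, 1] - tri[:, 0],
                                tri[:, 2] - tri[:, 0], dim=1)
    normal = torch.nn.functional.normalize(normal, dim=1)      # (F, 3)
    return torch.cat([center, corners, normal], dim=1)         # (F, 15)

verts = torch.tensor([[0., 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]])
faces = torch.tensor([[0, 1, 2], [0, 1, 3]])
print(face_features(verts, faces).shape)                 # torch.Size([2, 15])
```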
Text2Scene: Generating Compositional Scenes from Textual Descriptions
Title | Text2Scene: Generating Compositional Scenes from Textual Descriptions |
Authors | Fuwen Tan, Song Feng, Vicente Ordonez |
Abstract | In this paper, we propose Text2Scene, a model that generates various forms of compositional scene representations from natural language descriptions. Unlike recent works, our method does NOT use Generative Adversarial Networks (GANs). Text2Scene instead learns to sequentially generate objects and their attributes (location, size, appearance, etc.) at every time step by attending to different parts of the input text and the current status of the generated scene. We show that with minor modifications, the proposed framework can handle the generation of different forms of scene representations, including cartoon-like scenes, object layouts corresponding to real images, and synthetic images. Our method is not only competitive with state-of-the-art GAN-based methods according to automatic metrics, and superior according to human judgments, but also has the advantage of producing interpretable results. |
Tasks | |
Published | 2018-09-04 |
URL | https://arxiv.org/abs/1809.01110v3 |
https://arxiv.org/pdf/1809.01110v3.pdf | |
PWC | https://paperswithcode.com/paper/text2scene-generating-compositional-scenes |
Repo | https://github.com/uvavision/Text2Scene |
Framework | pytorch |
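A hedged sketch of one decoding step in this sequential style, with illustrative dimensions: attend over the encoded text given the current canvas state, then predict what object to add and where. The class StepDecoder and all its sizes are assumptions, not the authors' architecture.

```python
import torch
import torch.nn as nn

class StepDecoder(nn.Module):
    """One 'what + where' step: text attention conditioned on the canvas."""
    def __init__(self, dim=64, n_objects=80, n_locations=64):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        self.what = nn.Linear(dim, n_objects)        # object-class logits
        self.where = nn.Linear(2 * dim, n_locations) # coarse location-bin logits

    def forward(self, canvas_state, text_tokens):
        # canvas_state: (B, 1, dim) summary of what has been drawn so far
        ctx, _ = self.attn(canvas_state, text_tokens, text_tokens)
        obj_logits = self.what(ctx.squeeze(1))
        loc_logits = self.where(torch.cat([ctx, canvas_state], -1).squeeze(1))
        return obj_logits, loc_logits

dec = StepDecoder()
obj, loc = dec(torch.randn(2, 1, 64), torch.randn(2, 10, 64))
print(obj.shape, loc.shape)
```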
Heterogeneous Multilayer Generalized Operational Perceptron
Title | Heterogeneous Multilayer Generalized Operational Perceptron |
Authors | Dat Thanh Tran, Serkan Kiranyaz, Moncef Gabbouj, Alexandros Iosifidis |
Abstract | The traditional Multilayer Perceptron (MLP), built on the McCulloch-Pitts neuron model, is inherently limited to a single set of neuronal activities, i.e., a linear weighted sum followed by a nonlinear thresholding step. Previously, the Generalized Operational Perceptron (GOP) was proposed to extend the conventional perceptron model by defining a diverse set of neuronal activities that imitate a generalized model of biological neurons. Together with GOP, the Progressive Operational Perceptron (POP) algorithm was proposed to optimize a pre-defined template of multiple homogeneous layers in a layerwise manner. In this paper, we propose an efficient algorithm to learn a compact, fully heterogeneous multilayer network that allows each individual neuron, regardless of the layer, to have distinct characteristics. Based on the complexity of the problem, the proposed algorithm operates in a progressive manner at the neuronal level, searching for a compact topology not only in terms of depth but also width, i.e., the number of neurons in each layer. The proposed algorithm is shown to outperform related learning methods in extensive experiments on several classification problems. |
Tasks | |
Published | 2018-04-13 |
URL | http://arxiv.org/abs/1804.05093v3 |
http://arxiv.org/pdf/1804.05093v3.pdf | |
PWC | https://paperswithcode.com/paper/heterogeneous-multilayer-generalized |
Repo | https://github.com/viebboy/PyGOP |
Framework | tf |
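A small NumPy sketch of the GOP neuron model: the fixed multiply-sum-threshold pipeline becomes a choice of nodal, pooling, and activation operators drawn from a library. The specific operator sets below are illustrative.

```python
import numpy as np

# A GOP neuron replaces "multiply, sum, threshold" with a choice of nodal,
# pooling, and activation operators; this sketch enumerates a tiny library.
NODAL = {"mult": lambda x, w: x * w,
         "exp":  lambda x, w: np.exp(x * w) - 1,
         "sine": lambda x, w: np.sin(x * w)}
POOL = {"sum": np.sum, "max": np.max, "median": np.median}
ACT = {"tanh": np.tanh, "relu": lambda v: np.maximum(v, 0)}

def gop_neuron(x, w, b, nodal="mult", pool="sum", act="tanh"):
    return ACT[act](POOL[pool](NODAL[nodal](x, w)) + b)

x, w = np.random.randn(16), np.random.randn(16)
# The standard McCulloch-Pitts neuron is the (mult, sum, tanh) special case:
print(gop_neuron(x, w, b=0.1))
print(gop_neuron(x, w, b=0.1, nodal="sine", pool="median", act="relu"))
```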
Relational inductive biases, deep learning, and graph networks
Title | Relational inductive biases, deep learning, and graph networks |
Authors | Peter W. Battaglia, Jessica B. Hamrick, Victor Bapst, Alvaro Sanchez-Gonzalez, Vinicius Zambaldi, Mateusz Malinowski, Andrea Tacchetti, David Raposo, Adam Santoro, Ryan Faulkner, Caglar Gulcehre, Francis Song, Andrew Ballard, Justin Gilmer, George Dahl, Ashish Vaswani, Kelsey Allen, Charles Nash, Victoria Langston, Chris Dyer, Nicolas Heess, Daan Wierstra, Pushmeet Kohli, Matt Botvinick, Oriol Vinyals, Yujia Li, Razvan Pascanu |
Abstract | Artificial intelligence (AI) has undergone a renaissance recently, making major progress in key domains such as vision, language, control, and decision-making. This has been due, in part, to cheap data and cheap compute resources, which have fit the natural strengths of deep learning. However, many defining characteristics of human intelligence, which developed under much different pressures, remain out of reach for current approaches. In particular, generalizing beyond one’s experiences–a hallmark of human intelligence from infancy–remains a formidable challenge for modern AI. The following is part position paper, part review, and part unification. We argue that combinatorial generalization must be a top priority for AI to achieve human-like abilities, and that structured representations and computations are key to realizing this objective. Just as biology uses nature and nurture cooperatively, we reject the false choice between “hand-engineering” and “end-to-end” learning, and instead advocate for an approach which benefits from their complementary strengths. We explore how using relational inductive biases within deep learning architectures can facilitate learning about entities, relations, and rules for composing them. We present a new building block for the AI toolkit with a strong relational inductive bias–the graph network–which generalizes and extends various approaches for neural networks that operate on graphs, and provides a straightforward interface for manipulating structured knowledge and producing structured behaviors. We discuss how graph networks can support relational reasoning and combinatorial generalization, laying the foundation for more sophisticated, interpretable, and flexible patterns of reasoning. As a companion to this paper, we have released an open-source software library for building graph networks, with demonstrations of how to use them in practice. |
Tasks | Decision Making, Relational Reasoning |
Published | 2018-06-04 |
URL | http://arxiv.org/abs/1806.01261v3 |
http://arxiv.org/pdf/1806.01261v3.pdf | |
PWC | https://paperswithcode.com/paper/relational-inductive-biases-deep-learning-and |
Repo | https://github.com/raphaelavalos/attention_tsp_graph_net |
Framework | tf |
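A compact PyTorch sketch of one full graph-network (GN) block in the paper's sense: an edge update conditioned on both endpoints and the global, sum-aggregation of incoming edges, a node update, and a global update. The update functions are plain linear layers here, and all sizes are illustrative.

```python
import torch
import torch.nn as nn

dv, de, du = 4, 3, 2                       # node, edge, global feature sizes
phi_e = nn.Linear(de + 2 * dv + du, de)    # edge update function
phi_v = nn.Linear(de + dv + du, dv)        # node update function
phi_u = nn.Linear(de + dv + du, du)        # global update function

def gn_block(nodes, edges, senders, receivers, u):
    # 1. Edge update: each edge sees its features, both endpoints, the global.
    edges = phi_e(torch.cat([edges, nodes[senders], nodes[receivers],
                             u.expand(len(edges), -1)], dim=-1))
    # 2. Aggregate updated edges at their receiver nodes (sum).
    agg = torch.zeros(len(nodes), edges.shape[-1]).index_add_(0, receivers, edges)
    # 3. Node update from aggregated edges, old node features, and the global.
    nodes = phi_v(torch.cat([agg, nodes, u.expand(len(nodes), -1)], dim=-1))
    # 4. Global update from pooled edges, pooled nodes, and the old global.
    u = phi_u(torch.cat([edges.mean(0, keepdim=True),
                         nodes.mean(0, keepdim=True), u], dim=-1))
    return nodes, edges, u

nodes, edges, u = torch.randn(5, dv), torch.randn(7, de), torch.randn(1, du)
senders, receivers = torch.randint(0, 5, (7,)), torch.randint(0, 5, (7,))
nodes, edges, u = gn_block(nodes, edges, senders, receivers, u)
```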
Technical Note on Transcription Factor Motif Discovery from Importance Scores (TF-MoDISco) version 0.5.1.1
Title | Technical Note on Transcription Factor Motif Discovery from Importance Scores (TF-MoDISco) version 0.5.1.1 |
Authors | Avanti Shrikumar, Katherine Tian, Žiga Avsec, Anna Shcherbina, Abhimanyu Banerjee, Mahfuza Sharmin, Surag Nair, Anshul Kundaje |
Abstract | TF-MoDISco (Transcription Factor Motif Discovery from Importance Scores) is an algorithm for identifying motifs from basepair-level importance scores computed on genomic sequence data. This technical note focuses on version v0.5.1.1. The implementation is available at https://github.com/kundajelab/tfmodisco/tree/v0.5.1.1 |
Tasks | |
Published | 2018-10-31 |
URL | https://arxiv.org/abs/1811.00416v4 |
https://arxiv.org/pdf/1811.00416v4.pdf | |
PWC | https://paperswithcode.com/paper/tf-modisco-v0422-alpha-technical-note |
Repo | https://github.com/kundajelab/tfmodisco |
Framework | tf |
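A very rough sketch of the first stage of the pipeline (candidate seqlet extraction from per-base importance scores), assuming a simple sliding-window percentile threshold; the actual implementation uses FDR-controlled thresholds plus clustering and alignment stages that are omitted here.

```python
import numpy as np

def extract_seqlets(scores, window=7, percentile=90):
    """Slide a window over per-base importance scores and keep high-total
    windows as candidate seqlets (a crude stand-in for TF-MoDISco's stage 1)."""
    totals = np.convolve(scores, np.ones(window), mode="valid")
    thresh = np.percentile(totals, percentile)
    return [(i, i + window) for i, t in enumerate(totals) if t >= thresh]

scores = np.random.rand(200)
scores[50:57] += 2.0                     # a planted high-importance motif
print(extract_seqlets(scores)[:5])       # (start, end) candidate intervals
```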
Improving Document Binarization via Adversarial Noise-Texture Augmentation
Title | Improving Document Binarization via Adversarial Noise-Texture Augmentation |
Authors | Ankan Kumar Bhunia, Ayan Kumar Bhunia, Aneeshan Sain, Partha Pratim Roy |
Abstract | Binarization of degraded document images is an elementary step in most problems in the document image analysis domain. This paper revisits the binarization problem by introducing an adversarial learning approach. We construct a Texture Augmentation Network that transfers the texture element of a degraded reference document image to a clean binary image. In this way, the network creates multiple versions of the same textual content with various noisy textures, thus enlarging the available document binarization datasets. Finally, the newly generated images are passed through a Binarization network to recover the clean version. By jointly training the two networks we can increase the adversarial robustness of our system. It is also noteworthy that our model can learn from unpaired data. Experimental results suggest that the proposed method achieves superior performance on the widely used DIBCO datasets. |
Tasks | Document Binarization, Domain Adaptation, Transfer Learning |
Published | 2018-10-25 |
URL | http://arxiv.org/abs/1810.11120v2 |
http://arxiv.org/pdf/1810.11120v2.pdf | |
PWC | https://paperswithcode.com/paper/improving-document-binarization-via |
Repo | https://github.com/ankanbhunia/AdverseBiNet |
Framework | tf |
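A crude stand-in for the texture-transfer idea, assuming grayscale images in [0, 1]: keep the clean strokes and replace the background with a degraded reference, yielding a (degraded, clean) pair for training a binarizer. The real Texture Augmentation Network learns this transfer adversarially rather than by pasting.

```python
import numpy as np

def naive_texture_augment(clean_binary, noisy_reference):
    """Paste the reference document's background texture behind the clean
    text strokes (hand-coded stand-in for the learned texture transfer)."""
    degraded = noisy_reference.copy()
    strokes = clean_binary < 0.5                 # text pixels in the clean image
    degraded[strokes] = clean_binary[strokes]    # keep strokes, noisy background
    return degraded

clean = np.ones((64, 64)); clean[20:24, 8:56] = 0.0   # a synthetic text line
noisy = 0.7 + 0.3 * np.random.rand(64, 64)            # stained background
pair = (naive_texture_augment(clean, noisy), clean)   # (input, target) for BiNet
```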
Strong-Weak Distribution Alignment for Adaptive Object Detection
Title | Strong-Weak Distribution Alignment for Adaptive Object Detection |
Authors | Kuniaki Saito, Yoshitaka Ushiku, Tatsuya Harada, Kate Saenko |
Abstract | We propose an approach for unsupervised adaptation of object detectors from label-rich to label-poor domains which can significantly reduce annotation costs associated with detection. Recently, approaches that align distributions of source and target images using an adversarial loss have been proven effective for adapting object classifiers. However, for object detection, fully matching the entire distributions of source and target images to each other at the global image level may fail, as domains could have distinct scene layouts and different combinations of objects. On the other hand, strong matching of local features such as texture and color makes sense, as it does not change category level semantics. This motivates us to propose a novel method for detector adaptation based on strong local alignment and weak global alignment. Our key contribution is the weak alignment model, which focuses the adversarial alignment loss on images that are globally similar and puts less emphasis on aligning images that are globally dissimilar. Additionally, we design the strong domain alignment model to only look at local receptive fields of the feature map. We empirically verify the effectiveness of our method on four datasets comprising both large and small domain shifts. Our code is available at \url{https://github.com/VisionLearningGroup/DA_Detection} |
Tasks | Object Detection, Unsupervised Domain Adaptation |
Published | 2018-12-12 |
URL | http://arxiv.org/abs/1812.04798v3 |
http://arxiv.org/pdf/1812.04798v3.pdf | |
PWC | https://paperswithcode.com/paper/strong-weak-distribution-alignment-for |
Repo | https://github.com/VisionLearningGroup/DA_Detection |
Framework | pytorch |
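The weak global alignment can be read as a focal-style reweighting of the domain-adversarial loss: examples whose domain is easy to classify (globally dissimilar images) are down-weighted by (1 - p)^gamma. A minimal sketch, with the sigmoid domain classifier and the gamma value as assumptions about the exact form used.

```python
import torch

def weak_global_alignment_loss(domain_logits, is_target, gamma=3.0):
    """Focal-style domain loss: easy (globally dissimilar) examples are
    down-weighted, so the adversarial signal concentrates on globally
    similar images, per the weak-alignment idea."""
    p = torch.sigmoid(domain_logits)
    p_correct = torch.where(is_target, p, 1 - p)   # prob of the true domain
    return (-(1 - p_correct) ** gamma * torch.log(p_correct + 1e-8)).mean()

logits = torch.randn(8)                            # global domain-classifier logits
is_target = torch.tensor([True, False] * 4)
print(weak_global_alignment_loss(logits, is_target))
```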
Unified Perceptual Parsing for Scene Understanding
Title | Unified Perceptual Parsing for Scene Understanding |
Authors | Tete Xiao, Yingcheng Liu, Bolei Zhou, Yuning Jiang, Jian Sun |
Abstract | Humans recognize the visual world at multiple levels: we effortlessly categorize scenes and detect objects inside, while also identifying the textures and surfaces of the objects along with their different compositional parts. In this paper, we study a new task called Unified Perceptual Parsing, which requires the machine vision systems to recognize as many visual concepts as possible from a given image. A multi-task framework called UPerNet and a training strategy are developed to learn from heterogeneous image annotations. We benchmark our framework on Unified Perceptual Parsing and show that it is able to effectively segment a wide range of concepts from images. The trained networks are further applied to discover visual knowledge in natural scenes. Models are available at \url{https://github.com/CSAILVision/unifiedparsing}. |
Tasks | Scene Understanding, Semantic Segmentation |
Published | 2018-07-26 |
URL | http://arxiv.org/abs/1807.10221v1 |
http://arxiv.org/pdf/1807.10221v1.pdf | |
PWC | https://paperswithcode.com/paper/unified-perceptual-parsing-for-scene |
Repo | https://github.com/CSAILVision/unifiedparsing |
Framework | pytorch |
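A minimal sketch of the multi-head structure the unified task implies: one shared feature map feeds separate heads for image-level (scene) and pixel-level (object, material) concepts, each trainable only on images annotated at its level. The class UnifiedHeads and all sizes are illustrative, not UPerNet's actual decoder.

```python
import torch
import torch.nn as nn

class UnifiedHeads(nn.Module):
    """Per-level prediction heads over one shared feature map."""
    def __init__(self, feat_dim, n_scene, n_object, n_material):
        super().__init__()
        self.scene = nn.Linear(feat_dim, n_scene)           # image-level label
        self.object = nn.Conv2d(feat_dim, n_object, 1)      # pixel-level map
        self.material = nn.Conv2d(feat_dim, n_material, 1)  # pixel-level map

    def forward(self, fmap):                                # (B, C, H, W)
        scene = self.scene(fmap.mean(dim=(2, 3)))           # global-pooled
        return scene, self.object(fmap), self.material(fmap)

heads = UnifiedHeads(feat_dim=256, n_scene=365, n_object=150, n_material=26)
scene, obj, mat = heads(torch.randn(2, 256, 32, 32))
print(scene.shape, obj.shape, mat.shape)
```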
Embedding Multimodal Relational Data for Knowledge Base Completion
Title | Embedding Multimodal Relational Data for Knowledge Base Completion |
Authors | Pouya Pezeshkpour, Liyan Chen, Sameer Singh |
Abstract | Representing entities and relations in an embedding space is a well-studied approach for machine learning on relational data. Existing approaches, however, primarily focus on simple link structure between a finite set of entities, ignoring the variety of data types that are often used in knowledge bases, such as text, images, and numerical values. In this paper, we propose multimodal knowledge base embeddings (MKBE) that use different neural encoders for this variety of observed data, and combine them with existing relational models to learn embeddings of the entities and multimodal data. Further, using these learned embeddings and different neural decoders, we introduce a novel multimodal imputation model to generate missing multimodal values, like text and images, from information in the knowledge base. We enrich existing relational datasets to create two novel benchmarks that contain additional information such as textual descriptions and images of the original entities. We demonstrate that our models utilize this additional information effectively to provide more accurate link prediction, achieving state-of-the-art results with a considerable gap of 5-7% over existing methods. Further, we evaluate the quality of our generated multimodal values via a user study. We have released the datasets and the open-source implementation of our models at https://github.com/pouyapez/mkbe |
Tasks | Imputation, Knowledge Base Completion, Link Prediction |
Published | 2018-09-05 |
URL | http://arxiv.org/abs/1809.01341v2 |
http://arxiv.org/pdf/1809.01341v2.pdf | |
PWC | https://paperswithcode.com/paper/embedding-multimodal-relational-data-for |
Repo | https://github.com/pouyapez/mkbe |
Framework | tf |
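A hedged PyTorch sketch of the core idea: per-modality tail encoders map entity IDs, token sequences, and numbers into one embedding space, scored by an existing relational model (DistMult here). The paper's encoders are stronger (e.g., CNNs for images, RNNs for text); everything below is illustrative.

```python
import torch
import torch.nn as nn

class MultimodalKB(nn.Module):
    """MKBE-style sketch: per-modality tail encoders + a DistMult score."""
    def __init__(self, n_entities, n_relations, dim=64, vocab=1000):
        super().__init__()
        self.ent = nn.Embedding(n_entities, dim)
        self.rel = nn.Embedding(n_relations, dim)
        self.tok = nn.Embedding(vocab, dim)   # text encoder: mean-pooled tokens
        self.num = nn.Linear(1, dim)          # numeric encoder

    def encode_tail(self, kind, value):
        if kind == "entity":
            return self.ent(value)
        if kind == "text":
            return self.tok(value).mean(dim=1)   # (B, L) token ids -> (B, dim)
        return self.num(value)                   # (B, 1) numbers -> (B, dim)

    def score(self, head, relation, kind, value):
        # DistMult: <e_h, r, e_t>; any tail modality plugs into the same score.
        return (self.ent(head) * self.rel(relation)
                * self.encode_tail(kind, value)).sum(-1)

kb = MultimodalKB(n_entities=100, n_relations=10)
print(kb.score(torch.tensor([3]), torch.tensor([1]), "number",
               torch.tensor([[27.0]])))
print(kb.score(torch.tensor([3]), torch.tensor([2]), "text",
               torch.randint(0, 1000, (1, 5))))
```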
Patient2Vec: A Personalized Interpretable Deep Representation of the Longitudinal Electronic Health Record
Title | Patient2Vec: A Personalized Interpretable Deep Representation of the Longitudinal Electronic Health Record |
Authors | Jinghe Zhang, Kamran Kowsari, James H. Harrison, Jennifer M. Lobo, Laura E. Barnes |
Abstract | The wide implementation of electronic health record (EHR) systems facilitates the collection of large-scale health data from real clinical settings. Despite the significant increase in adoption of EHR systems, this data remains largely unexplored, but presents a rich data source for knowledge discovery from patient health histories in tasks such as understanding disease correlations and predicting health outcomes. However, the heterogeneity, sparsity, noise, and bias in this data present many complex challenges. This complexity makes it difficult to translate potentially relevant information into machine learning algorithms. In this paper, we propose a computational framework, Patient2Vec, to learn an interpretable deep representation of longitudinal EHR data which is personalized for each patient. To evaluate this approach, we apply it to the prediction of future hospitalizations using real EHR data and compare its predictive performance with baseline methods. Patient2Vec produces a vector space with meaningful structure, achieving an AUC of approximately 0.799 and outperforming baseline methods. Finally, the learned feature importances can be visualized and interpreted at both the individual and population levels to provide clinical insight. |
Tasks | Feature Importance |
Published | 2018-10-10 |
URL | http://arxiv.org/abs/1810.04793v3 |
http://arxiv.org/pdf/1810.04793v3.pdf | |
PWC | https://paperswithcode.com/paper/patient2vec-a-personalized-interpretable-deep |
Repo | https://github.com/BarnesLab/Patient2Vec |
Framework | pytorch |
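A minimal sketch of the model shape the abstract describes, assuming integer-coded visits: embed the codes within each visit, run a GRU over the visit sequence, attend over the hidden states (the interpretable weights), and predict hospitalization. Patient2Vec's actual hierarchical attention is richer than this single level.

```python
import torch
import torch.nn as nn

class Patient2VecSketch(nn.Module):
    """Visit embedding -> GRU -> visit-level attention -> risk prediction."""
    def __init__(self, n_codes, dim=64):
        super().__init__()
        self.embed = nn.EmbeddingBag(n_codes, dim)   # pool codes into one visit
        self.gru = nn.GRU(dim, dim, batch_first=True)
        self.attn = nn.Linear(dim, 1)                # interpretable visit weights
        self.out = nn.Linear(dim, 1)

    def forward(self, visit_codes):                  # (B, T, codes_per_visit)
        B, T, L = visit_codes.shape
        visits = self.embed(visit_codes.view(B * T, L)).view(B, T, -1)
        h, _ = self.gru(visits)
        alpha = torch.softmax(self.attn(h), dim=1)   # attention over visits
        risk = torch.sigmoid(self.out((alpha * h).sum(dim=1)))
        return risk, alpha                           # prediction + explanation

model = Patient2VecSketch(n_codes=500)
prob, alpha = model(torch.randint(0, 500, (2, 6, 4)))
print(prob.shape, alpha.shape)
```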