Paper Group ANR 885
Deep Metric Learning using Similarities from Nonlinear Rank Approximations
Title | Deep Metric Learning using Similarities from Nonlinear Rank Approximations |
Authors | Konstantin Schall, Kai Uwe Barthel, Nico Hezel, Klaus Jung |
Abstract | In recent years, deep metric learning has achieved promising results in learning high dimensional semantic feature embeddings where the spatial relationships of the feature vectors match the visual similarities of the images. Similarity search for images is performed by determining the vectors with the smallest distances to a query vector. However, high retrieval quality does not depend on the actual distances of the feature vectors, but rather on the ranking order of the feature vectors from similar images. In this paper, we introduce a metric learning algorithm that focuses on identifying and modifying those feature vectors that most strongly affect the retrieval quality. We compute normalized approximated ranks and convert them to similarities by applying a nonlinear transfer function. These similarities are used in a newly proposed loss function that better contracts similar and disperses dissimilar samples. Experiments demonstrate significant improvement over existing deep feature embedding methods on the CUB-200-2011, Cars196, and Stanford Online Products data sets for all embedding sizes. |
Tasks | Metric Learning |
Published | 2019-09-20 |
URL | https://arxiv.org/abs/1909.09427v2 |
PDF | https://arxiv.org/pdf/1909.09427v2.pdf |
PWC | https://paperswithcode.com/paper/deep-metric-learning-using-similarities-from |
Repo | |
Framework | |
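The abstract's point that retrieval quality depends on ranking order rather than on absolute distances can be illustrated with a small sketch. This is not the paper's loss or its rank approximation; the helper names (`rank_gallery`, `normalized_ranks`) and the toy vectors are assumptions made for illustration only.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def rank_gallery(query, gallery):
    """Return gallery indices sorted from nearest to farthest from the query."""
    return sorted(range(len(gallery)), key=lambda i: euclidean(query, gallery[i]))

def normalized_ranks(query, gallery):
    """Map each gallery item to rank / (n - 1): 0.0 is nearest, 1.0 is farthest."""
    order = rank_gallery(query, gallery)
    denom = max(len(order) - 1, 1)
    ranks = [0.0] * len(order)
    for position, idx in enumerate(order):
        ranks[idx] = position / denom
    return ranks

# Hypothetical 2-D embeddings; real embeddings would be much higher dimensional.
query = [0.0, 0.0]
gallery = [[3.0, 4.0], [1.0, 0.0], [0.0, 2.0]]
order = rank_gallery(query, gallery)
ranks = normalized_ranks(query, gallery)

# Scaling every gallery vector changes the distances but not the ranking.
scaled_gallery = [[10.0 * x for x in vec] for vec in gallery]
scaled_order = rank_gallery(query, scaled_gallery)
```

Because `order` and `scaled_order` agree, any retrieval metric computed from the ranking is unchanged by the rescaling, which is the motivation the abstract gives for working with ranks.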
Graph Neural Networks for Maximum Constraint Satisfaction
Title | Graph Neural Networks for Maximum Constraint Satisfaction |
Authors | Jan Toenshoff, Martin Ritzert, Hinrikus Wolf, Martin Grohe |
Abstract | Many combinatorial optimization problems can be phrased in the language of constraint satisfaction problems. We introduce a graph neural network architecture for solving such optimization problems. The architecture is generic; it works for all binary constraint satisfaction problems. Training is unsupervised, and it is sufficient to train on relatively small instances; the resulting networks perform well on much larger instances (at least 10-times larger). We experimentally evaluate our approach for a variety of problems, including Maximum Cut and Maximum Independent Set. Despite being generic, we show that our approach matches or surpasses most greedy and semi-definite programming based algorithms and sometimes even outperforms state-of-the-art heuristics for the specific problems. |
Tasks | Combinatorial Optimization |
Published | 2019-09-18 |
URL | https://arxiv.org/abs/1909.08387v3 |
PDF | https://arxiv.org/pdf/1909.08387v3.pdf |
PWC | https://paperswithcode.com/paper/run-csp-unsupervised-learning-of-message |
Repo | |
Framework | |
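To make the problem setting concrete, Maximum Cut can be read as a binary constraint satisfaction problem in which every edge carries the constraint "the two endpoints take different values". The sketch below counts satisfied constraints and runs a simple single-flip local search of the kind the abstract groups under greedy baselines. It is not the paper's graph neural network, and the function names are hypothetical.

```python
def cut_value(edges, side):
    """Number of edges whose endpoints lie on different sides (satisfied constraints)."""
    return sum(1 for u, v in edges if side[u] != side[v])

def greedy_local_search(num_nodes, edges):
    """Flip one node at a time, keeping a flip only if it strictly increases the cut."""
    side = [0] * num_nodes
    improved = True
    while improved:
        improved = False
        for node in range(num_nodes):
            before = cut_value(edges, side)
            side[node] ^= 1
            if cut_value(edges, side) > before:
                improved = True
            else:
                side[node] ^= 1  # revert the flip
    return side

# A 4-cycle: the optimal cut separates alternating nodes and cuts all 4 edges.
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
side = greedy_local_search(4, edges)
best = cut_value(edges, side)
```

The loop terminates because each accepted flip strictly increases an integer bounded by the number of edges. In general such a search only reaches a local optimum, which is why stronger heuristics are used as comparison points.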
Compiler-Level Matrix Multiplication Optimization for Deep Learning
Title | Compiler-Level Matrix Multiplication Optimization for Deep Learning |
Authors | Huaqing Zhang, Xiaolin Cheng, Hui Zang, Dae Hoon Park |
Abstract | An important linear algebra routine, GEneral Matrix Multiplication (GEMM), is a fundamental operator in deep learning. Compilers need to translate these routines into low-level code optimized for specific hardware. Compiler-level optimization of GEMM has significant performance impact on training and executing deep learning models. However, most deep learning frameworks rely on hardware-specific operator libraries in which GEMM optimization has been mostly achieved by manual tuning, which restricts the performance on different target hardware. In this paper, we propose two novel algorithms for GEMM optimization based on the TVM framework, a lightweight Greedy Best First Search (G-BFS) method based on heuristic search, and a Neighborhood Actor Advantage Critic (N-A2C) method based on reinforcement learning. Experimental results show significant performance improvement of the proposed methods, in both the optimality of the solution and the cost of search in terms of time and fraction of the search space explored. Specifically, the proposed methods achieve 24% and 40% savings in GEMM computation time over state-of-the-art XGBoost and RNN methods, respectively, while exploring only 0.1% of the search space. The proposed approaches have potential to be applied to other operator-level optimizations. |
Tasks | |
Published | 2019-09-23 |
URL | https://arxiv.org/abs/1909.10616v1 |
PDF | https://arxiv.org/pdf/1909.10616v1.pdf |
PWC | https://paperswithcode.com/paper/compiler-level-matrix-multiplication |
Repo | |
Framework | |
Ensemble-based kernel learning for a class of data assimilation problems with imperfect forward simulators
Title | Ensemble-based kernel learning for a class of data assimilation problems with imperfect forward simulators |
Authors | Xiaodong Luo |
Abstract | Simulator imperfection, often known as model error, is ubiquitous in practical data assimilation problems. Despite the enormous efforts dedicated to addressing this problem, properly handling simulator imperfection in data assimilation remains a challenging task. In this work, we propose an approach to dealing with simulator imperfection from a point of view of functional approximation that can be implemented through a certain machine learning method, such as kernel-based learning adopted in the current work. To this end, we start from considering a class of supervised learning problems, and then identify similarities between supervised learning and variational data assimilation. These similarities form the basis for us to develop an ensemble-based learning framework to tackle supervised learning problems, while achieving various advantages of ensemble-based methods over the variational ones. After establishing the ensemble-based learning framework, we proceed to investigate the integration of ensemble-based learning into an ensemble-based data assimilation framework to handle simulator imperfection. In the course of our investigations, we also develop a strategy to tackle the issue of multi-modality in supervised-learning problems, and transfer this strategy to data assimilation problems to help improve assimilation performance. For demonstration, we apply the ensemble-based learning framework and the integrated, ensemble-based data assimilation framework to a supervised learning problem and a data assimilation problem with an imperfect forward simulator, respectively. The experiment results indicate that both frameworks achieve good performance in relevant case studies, and that functional approximation through machine learning may serve as a viable way to account for simulator imperfection in data assimilation problems. |
Tasks | |
Published | 2019-01-30 |
URL | http://arxiv.org/abs/1901.10758v1 |
PDF | http://arxiv.org/pdf/1901.10758v1.pdf |
PWC | https://paperswithcode.com/paper/ensemble-based-kernel-learning-for-a-class-of |
Repo | |
Framework | |
A Conceptually Well-Founded Characterization of Iterated Admissibility Using an “All I Know” Operator
Title | A Conceptually Well-Founded Characterization of Iterated Admissibility Using an “All I Know” Operator |
Authors | Joseph Y. Halpern, Rafael Pass |
Abstract | Brandenburger, Friedenberg, and Keisler provide an epistemic characterization of iterated admissibility (IA), also known as iterated deletion of weakly dominated strategies, where uncertainty is represented using LPSs (lexicographic probability sequences). Their characterization holds in a rich structure called a complete structure, where all types are possible. In earlier work, we gave a characterization of iterated admissibility using an “all I know” operator, that captures the intuition that “all the agent knows” is that agents satisfy the appropriate rationality assumptions. That characterization did not need complete structures and used probability structures, not LPSs. However, that characterization did not deal with Samuelson’s conceptual concern regarding IA, namely, that at higher levels, players do not consider possible strategies that were used to justify their choice of strategy at lower levels. In this paper, we give a characterization of IA using the all I know operator that does deal with Samuelson’s concern. However, it uses LPSs. We then show how to modify the characterization using notions of “approximate belief” and “approximately all I know” so as to deal with Samuelson’s concern while still working with probability structures. |
Tasks | |
Published | 2019-07-22 |
URL | https://arxiv.org/abs/1907.09106v1 |
PDF | https://arxiv.org/pdf/1907.09106v1.pdf |
PWC | https://paperswithcode.com/paper/a-conceptually-well-founded-characterization |
Repo | |
Framework | |
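Iterated admissibility, described in the abstract as iterated deletion of weakly dominated strategies, can be sketched procedurally for a two-player game. The version below is a simplification under stated assumptions: it checks dominance by pure strategies only (the standard definition also allows domination by mixed strategies), and it says nothing about the epistemic characterization the paper studies. The function names and the example game are hypothetical.

```python
def weakly_dominated(payoff, own, others, candidate):
    """True if another strategy in `own` is never worse and sometimes strictly better.

    payoff[s][t] is this player's payoff for playing s against opponent strategy t.
    """
    for alt in own:
        if alt == candidate:
            continue
        diffs = [payoff[alt][t] - payoff[candidate][t] for t in others]
        if all(d >= 0 for d in diffs) and any(d > 0 for d in diffs):
            return True
    return False

def iterated_admissibility(row_payoff, col_payoff):
    """Delete all weakly dominated strategies of both players each round until stable."""
    rows = list(range(len(row_payoff)))
    cols = list(range(len(col_payoff)))
    while True:
        keep_rows = [r for r in rows if not weakly_dominated(row_payoff, rows, cols, r)]
        keep_cols = [c for c in cols if not weakly_dominated(col_payoff, cols, rows, c)]
        if keep_rows == rows and keep_cols == cols:
            return rows, cols
        rows, cols = keep_rows, keep_cols

# Row player: strategy 0 weakly dominates strategy 1.
row_payoff = [[1, 1],
              [1, 0]]
# Column player, indexed col_payoff[c][r]: neither strategy is dominated at first,
# but strategy 1 becomes dominated once row strategy 1 has been deleted.
col_payoff = [[1, 0],
              [0, 1]]
survivors = iterated_admissibility(row_payoff, col_payoff)
```

The example needs two rounds: the column player's deletion is justified only after the row player's dominated strategy is gone, which is the kind of level-by-level reasoning the abstract refers to.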
Neural Linguistic Steganography
Title | Neural Linguistic Steganography |
Authors | Zachary M. Ziegler, Yuntian Deng, Alexander M. Rush |
Abstract | Whereas traditional cryptography encrypts a secret message into an unintelligible form, steganography conceals that communication is taking place by encoding a secret message into a cover signal. Language is a particularly pragmatic cover signal due to its benign occurrence and independence from any one medium. Traditionally, linguistic steganography systems encode secret messages in existing text via synonym substitution or word order rearrangements. Advances in neural language models enable previously impractical generation-based techniques. We propose a steganography technique based on arithmetic coding with large-scale neural language models. We find that our approach can generate realistic looking cover sentences as evaluated by humans, while at the same time preserving security by matching the cover message distribution with the language model distribution. |
Tasks | Language Modelling |
Published | 2019-09-03 |
URL | https://arxiv.org/abs/1909.01496v1 |
PDF | https://arxiv.org/pdf/1909.01496v1.pdf |
PWC | https://paperswithcode.com/paper/neural-linguistic-steganography |
Repo | |
Framework | |
Distributed Policy Iteration for Scalable Approximation of Cooperative Multi-Agent Policies
Title | Distributed Policy Iteration for Scalable Approximation of Cooperative Multi-Agent Policies |
Authors | Thomy Phan, Kyrill Schmid, Lenz Belzner, Thomas Gabor, Sebastian Feld, Claudia Linnhoff-Popien |
Abstract | Decision making in multi-agent systems (MAS) is a great challenge due to enormous state and joint action spaces as well as uncertainty, making centralized control generally infeasible. Decentralized control offers better scalability and robustness but requires mechanisms to coordinate on joint tasks and to avoid conflicts. Common approaches to learn decentralized policies for cooperative MAS suffer from non-stationarity and lacking credit assignment, which can lead to unstable and uncoordinated behavior in complex environments. In this paper, we propose Strong Emergent Policy approximation (STEP), a scalable approach to learn strong decentralized policies for cooperative MAS with a distributed variant of policy iteration. For that, we use function approximation to learn from action recommendations of a decentralized multi-agent planning algorithm. STEP combines decentralized multi-agent planning with centralized learning, only requiring a generative model for distributed black box optimization. We experimentally evaluate STEP in two challenging and stochastic domains with large state and joint action spaces and show that STEP is able to learn stronger policies than standard multi-agent reinforcement learning algorithms, when combining multi-agent open-loop planning with centralized function approximation. The learned policies can be reintegrated into the multi-agent planning process to further improve performance. |
Tasks | Decision Making, Multi-agent Reinforcement Learning |
Published | 2019-01-25 |
URL | http://arxiv.org/abs/1901.08761v1 |
PDF | http://arxiv.org/pdf/1901.08761v1.pdf |
PWC | https://paperswithcode.com/paper/distributed-policy-iteration-for-scalable |
Repo | |
Framework | |
Data Selection for training Semantic Segmentation CNNs with cross-dataset weak supervision
Title | Data Selection for training Semantic Segmentation CNNs with cross-dataset weak supervision |
Authors | Panagiotis Meletis, Rob Romijnders, Gijs Dubbelman |
Abstract | Training convolutional networks for semantic segmentation with strong (per-pixel) and weak (per-bounding-box) supervision requires a large amount of weakly labeled data. We propose two methods for selecting the most relevant data with weak supervision. The first method is designed for finding visually similar images without the need of labels and is based on modeling image representations with a Gaussian Mixture Model (GMM). As a byproduct of GMM modeling, we present useful insights on characterizing the data generating distribution. The second method aims at finding images with high object diversity and requires only the bounding box labels. Both methods are developed in the context of automated driving and experimentation is conducted on Cityscapes and Open Images datasets. We demonstrate performance gains by reducing the amount of employed weakly labeled images up to 100 times for Open Images and up to 20 times for Cityscapes. |
Tasks | Semantic Segmentation |
Published | 2019-07-16 |
URL | https://arxiv.org/abs/1907.07023v1 |
PDF | https://arxiv.org/pdf/1907.07023v1.pdf |
PWC | https://paperswithcode.com/paper/data-selection-for-training-semantic |
Repo | |
Framework | |
Marginal Densities, Factor Graph Duality, and High-Temperature Series Expansions
Title | Marginal Densities, Factor Graph Duality, and High-Temperature Series Expansions |
Authors | Mehdi Molkaraie |
Abstract | We prove that the marginal densities of a global probability mass function in a primal normal factor graph and the corresponding marginal densities in the dual normal factor graph are related via local mappings. The mapping depends on the Fourier transform of the local factors of the models. Details of the mapping, including its fixed points, are derived for the Ising model, and then extended to the Potts model. By employing the mapping, we can transform simultaneously all the estimated marginal densities from one domain to the other, which is advantageous if estimating the marginals can be carried out more efficiently in the dual domain. An example of particular significance is the ferromagnetic Ising model in a positive external field, for which there is a rapidly mixing Markov chain (called the subgraphs-world process) to generate configurations in the dual normal factor graph of the model. Our numerical experiments illustrate that the proposed procedure can provide more accurate estimates of marginal densities in various settings. |
Tasks | |
Published | 2019-01-07 |
URL | https://arxiv.org/abs/1901.02733v2 |
PDF | https://arxiv.org/pdf/1901.02733v2.pdf |
PWC | https://paperswithcode.com/paper/marginal-densities-factor-graph-duality-and |
Repo | |
Framework | |
AI Assisted Annotator using Reinforcement Learning
Title | AI Assisted Annotator using Reinforcement Learning |
Authors | V. Ratna Saripalli, Gopal Avinash, Charles W. Anderson |
Abstract | Healthcare data suffers from both noise and lack of ground truth. The cost of data increases as it is cleaned and annotated in healthcare. Unlike other data sets, medical data annotation, which is critical to accurate ground truth, requires medical domain expertise for a better patient outcome. In this work, we report on the use of reinforcement learning to mimic the decision making process of annotators for medical events, to automate annotation and labelling. The reinforcement agent learns to annotate alarm data based on annotations done by an expert. Our method shows promising results on medical alarm data sets. We trained DQN and A2C agents using the data from monitoring devices annotated by an expert. Initial results from these RL agents learning the expert annotation behavior are promising. The A2C agent performs better in terms of learning the sparse events in a given state, thereby choosing more correct actions than the DQN agent. To the best of our knowledge, this is the first reinforcement learning application for the automation of medical events annotation, which has far-reaching practical use. |
Tasks | Decision Making |
Published | 2019-10-02 |
URL | https://arxiv.org/abs/1910.02052v2 |
PDF | https://arxiv.org/pdf/1910.02052v2.pdf |
PWC | https://paperswithcode.com/paper/ai-assisted-annotator-using-reinforcement |
Repo | |
Framework | |
Inverse Reinforcement Learning with Multiple Ranked Experts
Title | Inverse Reinforcement Learning with Multiple Ranked Experts |
Authors | Pablo Samuel Castro, Shijian Li, Daqing Zhang |
Abstract | We consider the problem of learning to behave optimally in a Markov Decision Process when a reward function is not specified, but instead we have access to a set of demonstrators of varying performance. We assume the demonstrators are classified into one of k ranks, and use ideas from ordinal regression to find a reward function that maximizes the margin between the different ranks. This approach is based on the idea that agents should not only learn how to behave from experts, but also how not to behave from non-experts. We show there are MDPs where important differences in the reward function would be hidden from existing algorithms by the behaviour of the expert. Our method is particularly useful for problems where we have access to a large set of agent behaviours with varying degrees of expertise (such as through GPS or cellphones). We highlight the differences between our approach and existing methods using a simple grid domain and demonstrate its efficacy on determining passenger-finding strategies for taxi drivers, using a large dataset of GPS trajectories. |
Tasks | |
Published | 2019-07-31 |
URL | https://arxiv.org/abs/1907.13411v1 |
PDF | https://arxiv.org/pdf/1907.13411v1.pdf |
PWC | https://paperswithcode.com/paper/inverse-reinforcement-learning-with-multiple |
Repo | |
Framework | |
Adaptive Transform Domain Image Super-resolution Via Orthogonally Regularized Deep Networks
Title | Adaptive Transform Domain Image Super-resolution Via Orthogonally Regularized Deep Networks |
Authors | Tiantong Guo, Hojjat S. Mousavi, Vishal Monga |
Abstract | Deep learning methods, in particular, trained Convolutional Neural Networks (CNN) have recently been shown to produce compelling results for single image Super-Resolution (SR). Invariably, a CNN is learned to map the Low Resolution (LR) image to its corresponding High Resolution (HR) version in the spatial domain. We propose a novel network structure for learning the SR mapping function in an image transform domain, specifically the Discrete Cosine Transform (DCT). As the first contribution, we show that DCT can be integrated into the network structure as a Convolutional DCT (CDCT) layer. With the CDCT layer, we construct the DCT Deep SR (DCT-DSR) network. We further extend the DCT-DSR to allow the CDCT layer to become trainable (i.e., optimizable). Because this layer represents an image transform, we enforce pairwise orthogonality constraints and newly formulated complexity order constraints on the individual basis functions/filters. This Orthogonally Regularized Deep SR network (ORDSR) simplifies the SR task by taking advantage of image transform domain while adapting the design of transform basis to the training image set. Experimental results show ORDSR achieves state-of-the-art SR image quality with fewer parameters than most of the deep CNN methods. A particular success of ORDSR is in overcoming the artifacts introduced by bicubic interpolation. A key burden of deep SR has been identified as the requirement of generous training LR and HR image pairs; ORDSR exhibits a much more graceful degradation as training size is reduced with significant benefits in the regime of limited training. Analysis of memory and computation requirements confirms that ORDSR can allow for a more efficient network with faster inference. |
Tasks | Image Super-Resolution, Super-Resolution |
Published | 2019-04-22 |
URL | http://arxiv.org/abs/1904.10082v1 |
PDF | http://arxiv.org/pdf/1904.10082v1.pdf |
PWC | https://paperswithcode.com/paper/adaptive-transform-domain-image-super |
Repo | |
Framework | |
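The pairwise orthogonality the abstract mentions can be checked numerically on the fixed transform the network starts from. The sketch below builds the orthonormal 1-D DCT-II basis and evaluates a sum of squared pairwise dot products. That penalty is one plausible way to measure departure from orthogonality and is an assumption here, not the paper's regularizer; the function names are also hypothetical, and the paper's filters are 2-D convolutional kernels rather than 1-D vectors.

```python
import math

def dct_basis(n):
    """Orthonormal DCT-II basis: row k is sampled at positions i = 0..n-1."""
    basis = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        basis.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                      for i in range(n)])
    return basis

def orthogonality_penalty(filters):
    """Sum of squared dot products over all distinct filter pairs (0 if orthogonal)."""
    penalty = 0.0
    for a in range(len(filters)):
        for b in range(a + 1, len(filters)):
            dot = sum(x * y for x, y in zip(filters[a], filters[b]))
            penalty += dot * dot
    return penalty

basis = dct_basis(8)
penalty = orthogonality_penalty(basis)
norms = [sum(x * x for x in row) for row in basis]

# A deliberately non-orthogonal pair for contrast.
skewed_penalty = orthogonality_penalty([[1.0, 0.0], [1.0, 1.0]])
```

For the DCT basis the penalty is zero up to floating-point error, so a trainable layer initialized this way starts on the constraint and is pulled back toward it whenever training drifts away.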
Gumbel-softmax Optimization: A Simple General Framework for Combinatorial Optimization Problems on Graphs
Title | Gumbel-softmax Optimization: A Simple General Framework for Combinatorial Optimization Problems on Graphs |
Authors | Jing Liu, Fei Gao, Jiang Zhang |
Abstract | Many problems in real life can be converted to combinatorial optimization problems (COPs) on graphs, that is, to find a best node state configuration or a network structure such that the designed objective function is optimized under some constraints. However, these problems are notorious for their hardness to solve because most of them are NP-hard or NP-complete. Although traditional general methods such as simulated annealing (SA), genetic algorithms (GA) and so forth have been devised for these hard problems, their accuracy and time consumption are not satisfying in practice. In this work, we propose a simple, fast, and general algorithm framework called Gumbel-softmax Optimization (GSO) for COPs. By introducing the Gumbel-softmax technique, which was developed in the machine learning community, we can optimize the objective function directly by gradient descent regardless of the discrete nature of variables. We test our algorithm on four different problems including the Sherrington-Kirkpatrick (SK) model, maximum independent set (MIS) problem, modularity optimization, and structural optimization problem. High-quality solutions can be obtained in much less time than with traditional approaches. |
Tasks | Combinatorial Optimization |
Published | 2019-09-16 |
URL | https://arxiv.org/abs/1909.07018v1 |
PDF | https://arxiv.org/pdf/1909.07018v1.pdf |
PWC | https://paperswithcode.com/paper/gumbel-softmax-optimization-a-simple-general |
Repo | |
Framework | |
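The Gumbel-softmax technique named in the abstract replaces a discrete categorical sample with a continuous relaxation: Gumbel noise is added to the logits and the result is passed through a temperature-scaled softmax. The sketch below shows only that sampling step in plain Python. It does not implement the GSO framework or any gradient computation, which would need an automatic differentiation library; the function name and the example logits are assumptions.

```python
import math
import random

def gumbel_softmax(logits, temperature, rng):
    """Draw one relaxed one-hot sample: softmax((logits + Gumbel noise) / temperature)."""
    noisy = []
    for logit in logits:
        # Clamp the uniform draw so that neither logarithm receives 0.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        gumbel = -math.log(-math.log(u))
        noisy.append((logit + gumbel) / temperature)
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(noisy)
    exps = [math.exp(v - peak) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]

rng = random.Random(0)  # fixed seed so the sketch is reproducible
sample = gumbel_softmax([0.0, 1.0, -1.0], temperature=0.5, rng=rng)
```

Lower temperatures push the sample toward a one-hot vector, while higher temperatures give smoother vectors. Because the output is a differentiable function of the logits, an objective evaluated on it can in principle be optimized by gradient descent even though the underlying variable is discrete.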
Penalizing small errors using an Adaptive Logarithmic Loss
Title | Penalizing small errors using an Adaptive Logarithmic Loss |
Authors | Chaitanya Kaul, Nick Pears, Suresh Manandhar |
Abstract | Loss functions are error metrics that quantify the difference between a prediction and its corresponding ground truth. Fundamentally, they define a functional landscape for traversal by gradient descent. Although numerous loss functions have been proposed to date in order to handle various machine learning problems, little attention has been given to enhancing these functions to better traverse the loss landscape. In this paper, we simultaneously and significantly mitigate two prominent problems in medical image segmentation namely: i) class imbalance between foreground and background pixels and ii) poor loss function convergence. To this end, we propose an adaptive logarithmic loss function. We compare this loss function with the existing state-of-the-art on the ISIC 2018 dataset, the nuclei segmentation dataset as well as the DRIVE retinal vessel segmentation dataset. We measure the performance of our methodology on benchmark metrics and demonstrate state-of-the-art performance. More generally, we show that our system can be used as a framework for better training of deep neural networks. |
Tasks | Medical Image Segmentation, Retinal Vessel Segmentation, Semantic Segmentation |
Published | 2019-10-22 |
URL | https://arxiv.org/abs/1910.09717v1 |
PDF | https://arxiv.org/pdf/1910.09717v1.pdf |
PWC | https://paperswithcode.com/paper/penalizing-small-errors-using-an-adaptive |
Repo | |
Framework | |
Intracranial Hemorrhage Segmentation Using Deep Convolutional Model
Title | Intracranial Hemorrhage Segmentation Using Deep Convolutional Model |
Authors | Murtadha D. Hssayeni, Muayad S. Croock, Aymen Al-Ani, Hassan Falah Al-khafaji, Zakaria A. Yahya, Behnaz Ghoraani |
Abstract | Traumatic brain injuries could cause intracranial hemorrhage (ICH). ICH could lead to disability or death if it is not accurately diagnosed and treated in a time-sensitive procedure. The current clinical protocol to diagnose ICH is examining Computerized Tomography (CT) scans by radiologists to detect ICH and localize its regions. However, this process relies heavily on the availability of an experienced radiologist. In this paper, we designed a study protocol to collect a dataset of 82 CT scans of subjects with traumatic brain injury. Later, the ICH regions were manually delineated in each slice by a consensus decision of two radiologists. Recently, fully convolutional networks (FCN) have been shown to be successful in medical image segmentation. We developed a deep FCN, called U-Net, to segment the ICH regions from the CT scans in a fully automated manner. The method achieved a Dice coefficient of 0.31 for the ICH segmentation based on 5-fold cross-validation. The dataset is publicly available online at the PhysioNet repository for future analysis and comparison. |
Tasks | Medical Image Segmentation, Semantic Segmentation |
Published | 2019-10-18 |
URL | https://arxiv.org/abs/1910.08643v2 |
PDF | https://arxiv.org/pdf/1910.08643v2.pdf |
PWC | https://paperswithcode.com/paper/intracranial-hemorrhage-segmentation-using |
Repo | |
Framework | |
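The Dice coefficient reported in the abstract measures overlap between a predicted mask and a ground-truth mask as twice the intersection divided by the sum of the two mask sizes. A minimal sketch on flattened binary masks follows; the function name, the toy masks, and the convention of returning 1.0 when both masks are empty are assumptions, and the paper's exact evaluation protocol may differ.

```python
def dice_coefficient(pred, truth):
    """Dice = 2 * |pred AND truth| / (|pred| + |truth|) for flattened binary masks."""
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    if total == 0:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return 2.0 * intersection / total

# Hypothetical 5-pixel masks: 2 pixels overlap, each mask has 3 foreground pixels.
pred = [1, 1, 0, 0, 1]
truth = [1, 0, 0, 1, 1]
score = dice_coefficient(pred, truth)  # 2 * 2 / (3 + 3)
```

The score ranges from 0 (no overlap) to 1 (identical masks), so it penalizes both missed and spurious foreground pixels.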