January 28, 2020

2839 words 14 mins read

Paper Group ANR 886

Batch Face Alignment using a Low-rank GAN

Title Batch Face Alignment using a Low-rank GAN
Authors Jiabo Huang, Xiaohua Xie, Wei-Shi Zheng
Abstract This paper studies the problem of aligning a set of face images of the same individual into a normalized image while removing outliers such as partial occlusion, extreme facial expression, and significant illumination variation. Our model seeks an optimal image domain transformation such that the matrix of misaligned images can be decomposed as the sum of a sparse matrix of noise and a rank-one matrix of aligned images. The image transformation is learned in an unsupervised manner, meaning that ground-truth aligned images are not required by our model. Specifically, we make use of the remarkable non-linear transforming ability of generative adversarial networks (GANs) and guide the network with a low-rank generation constraint as well as a sparse noise constraint to achieve face alignment. We verify the efficacy of the proposed model with extensive experiments on real-world face databases, demonstrating higher accuracy and efficiency than existing methods.
Tasks Face Alignment
Published 2019-10-21
URL https://arxiv.org/abs/1910.09244v1
PDF https://arxiv.org/pdf/1910.09244v1.pdf
PWC https://paperswithcode.com/paper/batch-face-alignment-using-a-low-rank-gan
Repo
Framework
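
For intuition, here is a minimal sketch of the decomposition the abstract describes, in the spirit of classical sparse-and-low-rank batch alignment: stack the vectorized, transformed images as the columns of $D \circ \tau$ and seek a rank-one aligned component $A$ plus a sparse error $E$ (the notation is illustrative, not taken from the paper):

$$
\min_{\tau,\, A,\, E} \;\; \lVert E \rVert_1
\quad \text{s.t.} \quad D \circ \tau = A + E, \;\; \operatorname{rank}(A) = 1 .
$$

The paper realizes the transformation with a GAN and guides its training with low-rank and sparse-noise constraints, rather than solving an optimization of this form directly.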

A Survey of Crowdsourcing in Medical Image Analysis

Title A Survey of Crowdsourcing in Medical Image Analysis
Authors Silas Ørting, Andrew Doyle, Arno van Hilten, Matthias Hirth, Oana Inel, Christopher R. Madan, Panagiotis Mavridis, Helen Spiers, Veronika Cheplygina
Abstract Rapid advances in image processing capabilities have been seen across many domains, fostered by the application of machine learning algorithms to “big-data”. However, within the realm of medical image analysis, advances have been curtailed, in part, due to the limited availability of large-scale, well-annotated datasets. One of the main reasons for this is the high cost often associated with producing large amounts of high-quality meta-data. Recently, there has been growing interest in the application of crowdsourcing for this purpose; a technique that has proven effective for creating large-scale datasets across a range of disciplines, from computer vision to astrophysics. Despite the growing popularity of this approach, there has not yet been a comprehensive literature review to provide guidance to researchers considering using crowdsourcing methodologies in their own medical imaging analysis. In this survey, we review studies applying crowdsourcing to the analysis of medical images, published prior to July 2018. We identify common approaches, challenges and considerations, providing guidance of utility to researchers adopting this approach. Finally, we discuss future opportunities for development within this emerging domain.
Tasks
Published 2019-02-25
URL https://arxiv.org/abs/1902.09159v2
PDF https://arxiv.org/pdf/1902.09159v2.pdf
PWC https://paperswithcode.com/paper/a-survey-of-crowdsourcing-in-medical-image
Repo
Framework

An evolutionary model that satisfies detailed balance

Title An evolutionary model that satisfies detailed balance
Authors Jüri Lember, Chris Watkins
Abstract We propose a class of evolutionary models that involves an arbitrary exchangeable process as the breeding process and different selection schemes. In these models, a new genome is born according to the breeding process, and a genome is then removed according to the selection scheme, which involves fitness; thus the population size remains constant. The process evolves according to a Markov chain, and, unlike in many other existing models, the stationary distribution – the so-called mutation-selection equilibrium – can easily be found and studied. The behaviour of the stationary distribution as the population size increases is our main object of interest. Several phase-transition theorems are proved.
Tasks
Published 2019-02-27
URL http://arxiv.org/abs/1902.10834v1
PDF http://arxiv.org/pdf/1902.10834v1.pdf
PWC https://paperswithcode.com/paper/an-evolutionary-model-that-satisfies-detailed
Repo
Framework
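
For reference, a Markov chain with transition kernel $P$ and stationary distribution $\pi$ satisfies detailed balance when, for every pair of states $x$ and $y$,

$$
\pi(x)\, P(x, y) = \pi(y)\, P(y, x) .
$$

Detailed balance makes it straightforward to verify a candidate stationary distribution, which is why the mutation-selection equilibrium of such models can be written down and studied directly.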

How to define co-occurrence in different domains of study?

Title How to define co-occurrence in different domains of study?
Authors Mathieu Roche
Abstract This position paper presents a comparative study of co-occurrences. Some similarities and differences in the definition exist depending on the research domain (e.g. linguistics, NLP, computer science). This paper discusses these points, and deals with the methodological aspects in order to identify co-occurrences in a multidisciplinary paradigm.
Tasks
Published 2019-04-16
URL http://arxiv.org/abs/1904.08010v1
PDF http://arxiv.org/pdf/1904.08010v1.pdf
PWC https://paperswithcode.com/paper/how-to-define-co-occurrence-in-different
Repo
Framework

3D Deep Affine-Invariant Shape Learning for Brain MR Image Segmentation

Title 3D Deep Affine-Invariant Shape Learning for Brain MR Image Segmentation
Authors Zhou He, Siqi Bao, Albert Chung
Abstract Recent advancements in medical image segmentation techniques have achieved compelling results. However, most of the widely used approaches do not take into account any prior knowledge about the shape of the biomedical structures being segmented. More recently, some works have presented approaches to incorporate shape information. However, many of them simply introduce more parameters into the segmentation network to learn general features, which any segmentation network is able to learn, rather than shape-specific features. In this paper, we present a novel approach that seamlessly integrates the shape information into the segmentation network. Experiments on human brain MRI segmentation demonstrate that our approach can achieve a lower Hausdorff distance and higher Dice coefficient than state-of-the-art approaches.
Tasks Medical Image Segmentation, Semantic Segmentation
Published 2019-09-14
URL https://arxiv.org/abs/1909.06629v2
PDF https://arxiv.org/pdf/1909.06629v2.pdf
PWC https://paperswithcode.com/paper/3d-deep-affine-invariant-shape-learning-for
Repo
Framework
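
For reference, the two evaluation metrics mentioned in the abstract are the Dice coefficient and the Hausdorff distance between a predicted segmentation $A$ and the ground truth $B$:

$$
\mathrm{Dice}(A, B) = \frac{2\,\lvert A \cap B \rvert}{\lvert A \rvert + \lvert B \rvert},
\qquad
d_H(A, B) = \max\Big\{ \sup_{a \in A} \inf_{b \in B} d(a, b),\; \sup_{b \in B} \inf_{a \in A} d(a, b) \Big\} .
$$

A higher Dice coefficient indicates better volumetric overlap, while a lower Hausdorff distance indicates closer agreement of the segmentation boundaries.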

Efficient Object Annotation via Speaking and Pointing

Title Efficient Object Annotation via Speaking and Pointing
Authors Michael Gygli, Vittorio Ferrari
Abstract Deep neural networks deliver state-of-the-art visual recognition, but they rely on large datasets, which are time-consuming to annotate. These datasets are typically annotated in two stages: (1) determining the presence of object classes at the image level and (2) marking the spatial extent for all objects of these classes. In this work we use speech, together with mouse inputs, to speed up this process. We first improve stage one, by letting annotators indicate object class presence via speech. We then combine the two stages: annotators draw an object bounding box via the mouse and simultaneously provide its class label via speech. Using speech has distinct advantages over relying on mouse inputs alone. First, it is fast and allows for direct access to the class name, by simply saying it. Second, annotators can simultaneously speak and mark an object location. Finally, speech-based interfaces can be kept extremely simple, hence using them requires less mouse movement compared to existing approaches. Through extensive experiments on the COCO and ILSVRC datasets we show that our approach yields high-quality annotations at significant speed gains. Stage one takes 2.3x - 14.9x less annotation time than existing methods based on a hierarchical organization of the classes to be annotated. Moreover, when combining the two stages, we find that object class labels come for free: annotating them at the same time as bounding boxes has zero additional cost. On COCO, this makes the overall process 1.9x faster than the two-stage approach.
Tasks
Published 2019-05-25
URL https://arxiv.org/abs/1905.10576v4
PDF https://arxiv.org/pdf/1905.10576v4.pdf
PWC https://paperswithcode.com/paper/efficient-object-annotation-via-speaking-and
Repo
Framework

Learning Multi-agent Communication under Limited-bandwidth Restriction for Internet Packet Routing

Title Learning Multi-agent Communication under Limited-bandwidth Restriction for Internet Packet Routing
Authors Hangyu Mao, Zhibo Gong, Zhengchao Zhang, Zhen Xiao, Yan Ni
Abstract Communication is an important factor in keeping the big multi-agent world organized and productive. Recently, the AI community has applied Deep Reinforcement Learning (DRL) to learn the communication strategy and the control policy for multiple agents. However, when implementing communication for real-world multi-agent applications, there is a more practical limited-bandwidth restriction, which has been largely ignored by existing DRL-based methods. Specifically, agents trained by most previous methods keep sending messages incessantly in every control cycle; because they emit too many messages, these methods are unsuitable for real-world systems that have only a limited bandwidth to transmit messages. To handle this problem, we propose a gating mechanism to adaptively prune unprofitable messages. Results show that the gating mechanism can prune more than 80% of messages with little damage to performance. Moreover, our method outperforms several state-of-the-art DRL-based and rule-based methods by a large margin in both real-world packet routing tasks and four benchmark tasks.
Tasks
Published 2019-02-26
URL http://arxiv.org/abs/1903.05561v1
PDF http://arxiv.org/pdf/1903.05561v1.pdf
PWC https://paperswithcode.com/paper/learning-multi-agent-communication-under
Repo
Framework
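
A rough, hypothetical sketch (not the paper's architecture) of the kind of message gating the abstract describes: each agent encodes its observation, produces a message, and a learned gate decides whether that message is actually transmitted. All module and parameter names below are invented for illustration.

```python
import torch
import torch.nn as nn

class GatedMessenger(nn.Module):
    """Illustrative message gate: transmit a message only when a learned gate opens."""
    def __init__(self, obs_dim, msg_dim, hidden_dim=64):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(obs_dim, hidden_dim), nn.ReLU())
        self.msg_head = nn.Linear(hidden_dim, msg_dim)   # message content
        self.gate_head = nn.Linear(hidden_dim, 1)        # probability of sending

    def forward(self, obs):
        h = self.encoder(obs)
        msg = self.msg_head(h)
        p_send = torch.sigmoid(self.gate_head(h))
        send = (p_send > 0.5).float()        # hard on/off decision at execution time
        return send * msg, p_send            # pruned message and the gate probability
```

Under such a scheme, a penalty on the gate probability (e.g. an L1 term added to the policy loss) would push agents toward sending fewer messages, which is the limited-bandwidth behaviour the paper targets; the paper's actual training objective may differ.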

From PAC to Instance-Optimal Sample Complexity in the Plackett-Luce Model

Title From PAC to Instance-Optimal Sample Complexity in the Plackett-Luce Model
Authors Aadirupa Saha, Aditya Gopalan
Abstract We consider PAC-learning a good item from $k$-subsetwise feedback information sampled from a Plackett-Luce probability model, with instance-dependent sample complexity performance. In the setting where subsets of a fixed size can be tested and top-ranked feedback is made available to the learner, we give an algorithm with optimal instance-dependent sample complexity, for PAC best arm identification, of $O\bigg(\frac{\theta_{[k]}}{k}\sum_{i = 2}^n\max\Big(1,\frac{1}{\Delta_i^2}\Big) \ln\frac{k}{\delta}\Big(\ln \frac{1}{\Delta_i}\Big)\bigg)$, $\Delta_i$ being the Plackett-Luce parameter gap between the best and the $i^{th}$ best item, and $\theta_{[k]}$ the sum of the Plackett-Luce parameters of the top-$k$ items. The algorithm is based on a wrapper around a PAC winner-finding algorithm with weaker performance guarantees, used to adapt to the hardness of the input instance. The sample complexity is also shown to be multiplicatively better depending on the length of the rank-ordered feedback available in each subset-wise play. We show the optimality of our algorithms with matching sample complexity lower bounds. We next address the winner-finding problem in Plackett-Luce models in the fixed-budget setting, with instance-dependent upper and lower bounds on the misidentification probability of $\Omega\left(\exp(-2 \tilde \Delta Q) \right)$ for a given budget $Q$, where $\tilde \Delta$ is an explicit instance-dependent problem complexity parameter. Numerical performance results are also reported.
Tasks
Published 2019-03-01
URL https://arxiv.org/abs/1903.00558v2
PDF https://arxiv.org/pdf/1903.00558v2.pdf
PWC https://paperswithcode.com/paper/from-pac-to-instance-optimal-sample
Repo
Framework
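
For context, in the Plackett-Luce model each item $i$ carries a positive parameter $\theta_i$, and the probability that item $i$ is ranked first in a played subset $S \ni i$ is

$$
\Pr(i \text{ wins } S) = \frac{\theta_i}{\sum_{j \in S} \theta_j} .
$$

The gaps $\Delta_i$ in the bound above measure how far these parameters separate the best item from the $i^{th}$ best, so instances with well-separated items require fewer samples.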

Document-Level $N$-ary Relation Extraction with Multiscale Representation Learning

Title Document-Level $N$-ary Relation Extraction with Multiscale Representation Learning
Authors Robin Jia, Cliff Wong, Hoifung Poon
Abstract Most information extraction methods focus on binary relations expressed within single sentences. In high-value domains, however, $n$-ary relations are in great demand (e.g., drug-gene-mutation interactions in precision oncology). Such relations often involve entity mentions that are far apart in the document, yet existing work on cross-sentence relation extraction is generally confined to small text spans (e.g., three consecutive sentences), which severely limits recall. In this paper, we propose a novel multiscale neural architecture for document-level $n$-ary relation extraction. Our system combines representations learned over various text spans throughout the document and across the subrelation hierarchy. Widening the system’s purview to the entire document maximizes potential recall. Moreover, by integrating weak signals across the document, multiscale modeling increases precision, even in the presence of noisy labels from distant supervision. Experiments on biomedical machine reading show that our approach substantially outperforms previous $n$-ary relation extraction methods.
Tasks Reading Comprehension, Relation Extraction, Representation Learning
Published 2019-04-04
URL https://arxiv.org/abs/1904.02347v3
PDF https://arxiv.org/pdf/1904.02347v3.pdf
PWC https://paperswithcode.com/paper/document-level-n-ary-relation-extraction-with
Repo
Framework

Personalizing ASR for Dysarthric and Accented Speech with Limited Data

Title Personalizing ASR for Dysarthric and Accented Speech with Limited Data
Authors Joel Shor, Dotan Emanuel, Oran Lang, Omry Tuval, Michael Brenner, Julie Cattiau, Fernando Vieira, Maeve McNally, Taylor Charbonneau, Melissa Nollstadt, Avinatan Hassidim, Yossi Matias
Abstract Automatic speech recognition (ASR) systems have dramatically improved over the last few years. ASR systems are most often trained from ‘typical’ speech, which means that underrepresented groups don’t experience the same level of improvement. In this paper, we present and evaluate finetuning techniques to improve ASR for users with non-standard speech. We focus on two types of non-standard speech: speech from people with amyotrophic lateral sclerosis (ALS) and accented speech. We train personalized models that achieve 62% and 35% relative WER improvement on these two groups, bringing the absolute WER for ALS speakers, on a test set of message bank phrases, down to 10% for mild dysarthria and 20% for more serious dysarthria. We show that 71% of the improvement comes from only 5 minutes of training data. Finetuning a particular subset of layers (with many fewer parameters) often gives better results than finetuning the entire model. This is the first step towards building state-of-the-art ASR models for dysarthric speech.
Tasks Speech Recognition
Published 2019-07-31
URL https://arxiv.org/abs/1907.13511v1
PDF https://arxiv.org/pdf/1907.13511v1.pdf
PWC https://paperswithcode.com/paper/personalizing-asr-for-dysarthric-and-accented
Repo
Framework

Using Word Embeddings for Visual Data Exploration with Ontodia and Wikidata

Title Using Word Embeddings for Visual Data Exploration with Ontodia and Wikidata
Authors Gerhard Wohlgenannt, Nikolay Klimov, Dmitry Mouromtsev, Daniil Razdyakonov, Dmitry Pavlov, Yury Emelyanov
Abstract One of the big challenges in Linked Data consumption is to create visual and natural language interfaces to the data usable for non-technical users. Ontodia provides support for diagrammatic data exploration, showcased in this publication in combination with the Wikidata dataset. We present improvements to the natural language interface regarding exploring and querying Linked Data entities. The method uses models of distributional semantics to find and rank entity properties related to user input in Ontodia. Various word embedding types and model settings are evaluated, and the results show that user experience in visual data exploration benefits from the proposed approach.
Tasks Word Embeddings
Published 2019-03-04
URL http://arxiv.org/abs/1903.01275v1
PDF http://arxiv.org/pdf/1903.01275v1.pdf
PWC https://paperswithcode.com/paper/using-word-embeddings-for-visual-data
Repo
Framework
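
A tiny, hypothetical sketch of the kind of embedding-based ranking the abstract describes: embed the user's input and each candidate property label (here by averaging pre-trained word vectors), then rank the properties by cosine similarity. Function names and the averaging scheme are assumptions for illustration, not details taken from the paper.

```python
import numpy as np

def embed(text, word_vectors):
    """Average pre-trained word vectors over the tokens of `text`; None if no token is covered."""
    vecs = [word_vectors[w] for w in text.lower().split() if w in word_vectors]
    return np.mean(vecs, axis=0) if vecs else None

def rank_properties(user_input, property_labels, word_vectors):
    """Rank candidate entity property labels by cosine similarity to the user's input."""
    q = embed(user_input, word_vectors)
    scored = []
    for label in property_labels:
        v = embed(label, word_vectors)
        if q is None or v is None:
            continue
        sim = float(np.dot(q, v) / (np.linalg.norm(q) * np.linalg.norm(v) + 1e-12))
        scored.append((label, sim))
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

In an Ontodia-style interface, the top-ranked properties could then be suggested to the user for expanding the diagram around the selected Wikidata entity.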

Factorized Inference in Deep Markov Models for Incomplete Multimodal Time Series

Title Factorized Inference in Deep Markov Models for Incomplete Multimodal Time Series
Authors Tan Zhi-Xuan, Harold Soh, Desmond C. Ong
Abstract Integrating deep learning with latent state space models has the potential to yield temporal models that are powerful, yet tractable and interpretable. Unfortunately, current models are not designed to handle missing data or multiple data modalities, which are both prevalent in real-world data. In this work, we introduce a factorized inference method for Multimodal Deep Markov Models (MDMMs), allowing us to filter and smooth in the presence of missing data, while also performing uncertainty-aware multimodal fusion. We derive this method by factorizing the posterior p(z|x) for non-linear state space models, and develop a variational backward-forward algorithm for inference. Because our method handles incompleteness over both time and modalities, it is capable of interpolation, extrapolation, conditional generation, label prediction, and weakly supervised learning of multimodal time series. We demonstrate these capabilities on both synthetic and real-world multimodal data under high levels of data deletion. Our method performs well even with more than 50% missing data, and outperforms existing deep approaches to inference in latent time series.
Tasks Time Series
Published 2019-05-30
URL https://arxiv.org/abs/1905.13570v3
PDF https://arxiv.org/pdf/1905.13570v3.pdf
PWC https://paperswithcode.com/paper/factorized-inference-in-deep-markov-models
Repo
Framework
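
As background, a deep Markov model factorizes the joint distribution over latent states $z_{1:T}$ and observations $x_{1:T}$ as

$$
p(z_{1:T}, x_{1:T}) = \prod_{t=1}^{T} p(z_t \mid z_{t-1})\; p(x_t \mid z_t),
$$

with the transition $p(z_t \mid z_{t-1})$ and emission $p(x_t \mid z_t)$ parameterized by neural networks (and $p(z_1 \mid z_0)$ read as the prior on the initial state). The paper's contribution is a factorized posterior over $z_{1:T}$ given whichever observations are present, which is what allows filtering and smoothing to proceed when modalities or time steps are missing.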

Tucker Decomposition Network: Expressive Power and Comparison

Title Tucker Decomposition Network: Expressive Power and Comparison
Authors Ye Liu, Junjun Pan, Michael Ng
Abstract Deep neural networks have achieved great success in solving many machine learning and computer vision problems. The main contribution of this paper is to develop a deep network based on Tucker tensor decomposition and to analyze its expressive power. It is shown that the expressiveness of the Tucker network is more powerful than that of a shallow network: in general, an exponential number of nodes is required in a shallow network in order to represent a Tucker network. Experimental results are also given to compare the performance of the proposed Tucker network with a hierarchical tensor network and a shallow network, and to demonstrate the usefulness of the Tucker network in image classification problems.
Tasks Image Classification
Published 2019-05-23
URL https://arxiv.org/abs/1905.09635v1
PDF https://arxiv.org/pdf/1905.09635v1.pdf
PWC https://paperswithcode.com/paper/tucker-decomposition-network-expressive-power
Repo
Framework
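
As a reference point for the decomposition the network builds on, a Tucker decomposition approximates a 3-way tensor by a small core multiplied along each mode by a factor matrix, $\mathcal{X} \approx \mathcal{G} \times_1 U^{(1)} \times_2 U^{(2)} \times_3 U^{(3)}$. Below is a minimal numpy sketch of the reconstruction, with dimensions chosen purely for illustration.

```python
import numpy as np

# Arbitrary dimensions for illustration only.
I, J, K = 6, 5, 4        # tensor sizes
R1, R2, R3 = 3, 2, 2     # Tucker ranks

rng = np.random.default_rng(0)
G = rng.standard_normal((R1, R2, R3))   # core tensor
U1 = rng.standard_normal((I, R1))       # mode-1 factor matrix
U2 = rng.standard_normal((J, R2))       # mode-2 factor matrix
U3 = rng.standard_normal((K, R3))       # mode-3 factor matrix

# X = G x_1 U1 x_2 U2 x_3 U3, written as a single einsum over the core indices.
X = np.einsum('abc,ia,jb,kc->ijk', G, U1, U2, U3)
print(X.shape)  # (6, 5, 4)
```

The paper's Tucker network builds a deep architecture around this decomposition; the code above only illustrates the underlying tensor algebra.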

An Enhanced Ad Event-Prediction Method Based on Feature Engineering

Title An Enhanced Ad Event-Prediction Method Based on Feature Engineering
Authors Saeid Soheily Khah, Yiming Wu
Abstract In digital advertising, Click-Through Rate (CTR) and Conversion Rate (CVR) are very important metrics for evaluating ad performance. As a result, ad event prediction systems are vital and widely used for sponsored search and display advertising as well as Real-Time Bidding (RTB). In this work, we introduce an enhanced method for ad event prediction (i.e. clicks, conversions) by proposing a new efficient feature engineering approach. A large real-world event-based dataset of a running marketing campaign is used to evaluate the efficiency of the proposed prediction algorithm. The results illustrate the benefits of the proposed ad event prediction approach, which significantly outperforms the alternative ones.
Tasks Feature Engineering
Published 2019-07-03
URL https://arxiv.org/abs/1907.01959v1
PDF https://arxiv.org/pdf/1907.01959v1.pdf
PWC https://paperswithcode.com/paper/an-enhanced-ad-event-prediction-method-based
Repo
Framework
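
For reference, the two metrics named in the abstract are commonly defined as

$$
\mathrm{CTR} = \frac{\#\,\mathrm{clicks}}{\#\,\mathrm{impressions}},
\qquad
\mathrm{CVR} = \frac{\#\,\mathrm{conversions}}{\#\,\mathrm{clicks}},
$$

though the exact denominator used for CVR varies between advertising setups; ad event prediction amounts to estimating the probability of a click or a conversion for each served impression.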

Resource Optimized Neural Architecture Search for 3D Medical Image Segmentation

Title Resource Optimized Neural Architecture Search for 3D Medical Image Segmentation
Authors Woong Bae, Seungho Lee, Yeha Lee, Beomhee Park, Minki Chung, Kyu-Hwan Jung
Abstract Neural Architecture Search (NAS), a framework which automates the task of designing neural networks, has recently been actively studied in the field of deep learning. However, there are only a few NAS methods suitable for 3D medical image segmentation. Medical 3D images are generally very large; thus it is difficult to apply previous NAS methods due to their GPU computational burden and long training time. We propose a resource-optimized neural architecture search method which can be applied to 3D medical segmentation tasks in a short training time (1.39 days for a 1GB dataset) using a small amount of computation power (one RTX 2080Ti, 10.8GB GPU memory). Excellent performance can also be achieved without retraining (fine-tuning), which is essential in most NAS methods. These advantages are achieved by using a reinforcement learning-based controller with parameter sharing and by focusing on the optimal search space configuration of macro search rather than micro search. Our experiments demonstrate that the proposed NAS method outperforms manually designed networks with state-of-the-art performance in 3D medical image segmentation.
Tasks Medical Image Segmentation, Neural Architecture Search, Semantic Segmentation
Published 2019-09-02
URL https://arxiv.org/abs/1909.00548v1
PDF https://arxiv.org/pdf/1909.00548v1.pdf
PWC https://paperswithcode.com/paper/resource-optimized-neural-architecture-search
Repo
Framework