January 29, 2020

Paper Group ANR 633

Zero Shot Learning for Multi-Modal Real Time Image Registration. Evaluation of Multidisciplinary Effects of Artificial Intelligence with Optimization Perspective. DeepSynth: Automata Synthesis for Automatic Task Segmentation in Deep Reinforcement Learning. Geometry of Deep Generative Models for Disentangled Representations. Towards Computing Infere …


Title	Zero Shot Learning for Multi-Modal Real Time Image Registration
Authors	Avinash Kori, Ganapathi Krishnamurthi
Abstract	In this report we present an unsupervised image registration framework, using a pre-trained deep neural network as a feature extractor. We refer to this as zero-shot learning, due to the non-overlap between training and testing datasets (none of the network modules in the processing pipeline were trained specifically for the task of medical image registration). Highlights of our technique are: (a) no requirement of a training dataset; (b) keypoints, i.e. locations of important features, are automatically estimated; (c) the number of keypoints in this model is fixed and can possibly be tuned as a hyperparameter; (d) uncertainty calculation of the proposed transformation estimates; (e) real-time registration of images. Our technique was evaluated on BraTS, ALBERT, and collaborative hospital brain MRI data. Results suggest that the method is robust for affine transformation models and that the results are practically instantaneous, irrespective of the size of the input image.
Tasks	Image Registration, Medical Image Registration, Zero-Shot Learning
Published	2019-08-17
URL	https://arxiv.org/abs/1908.06213v1
PDF	https://arxiv.org/pdf/1908.06213v1.pdf
PWC	https://paperswithcode.com/paper/zero-shot-learning-for-multi-modal-real-time
Repo
Framework
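
The abstract above mentions estimating affine transformation models from automatically detected keypoints. As a minimal sketch of that general idea (not the authors' pipeline, whose estimator is not described here), the following fits a 2D affine map to matched keypoint pairs by ordinary least squares. The function names `solve3`, `fit_affine`, and `apply_affine` are hypothetical, and the keypoint matches are assumed to be given.

```python
def solve3(matrix, rhs):
    """Solve a 3x3 linear system by Gauss-Jordan elimination with partial pivoting."""
    aug = [row[:] + [value] for row, value in zip(matrix, rhs)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("keypoints are collinear or too few to fit an affine map")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(3):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [aug[i][3] / aug[i][i] for i in range(3)]


def fit_affine(src, dst):
    """Least-squares 2D affine map taking src keypoints to dst keypoints.

    Returns [[a, b, tx], [c, d, ty]] so that
    x' = a*x + b*y + tx and y' = c*x + d*y + ty.
    """
    rows = [(x, y, 1.0) for x, y in src]
    # Normal equations: (A^T A) p = A^T b, solved once per output axis.
    normal = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    params = []
    for axis in range(2):
        rhs = [sum(r[i] * d[axis] for r, d in zip(rows, dst)) for i in range(3)]
        params.append(solve3(normal, rhs))
    return params


def apply_affine(params, point):
    """Apply the fitted affine map to a single (x, y) point."""
    x, y = point
    return tuple(p[0] * x + p[1] * y + p[2] for p in params)
```

At least three non-collinear keypoint pairs are needed; with more pairs the fit averages out localisation noise in the individual keypoints.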

Evaluation of Multidisciplinary Effects of Artificial Intelligence with Optimization Perspective


Title	Evaluation of Multidisciplinary Effects of Artificial Intelligence with Optimization Perspective
Authors	M. H. Calp
Abstract	Artificial Intelligence has an important place in the scientific community as a result of its successful outputs in different fields. Over time, the field of Artificial Intelligence has been divided into many sub-fields because of the increasing number of different solution approaches, methods, and techniques. Machine Learning has the most remarkable role with its ability to learn from samples from the environment. On the other hand, intelligent optimization inspired by nature and swarms has its own unique scientific literature, with effective solutions provided for optimization problems from different fields. Because intelligent optimization can be applied effectively in different fields, this study aims to provide a general discussion of the multidisciplinary effects of Artificial Intelligence by considering its optimization-oriented solutions. The study briefly covers the background of intelligent optimization and then gives application examples of intelligent optimization from a multidisciplinary perspective.
Tasks
Published	2019-02-04
URL	http://arxiv.org/abs/1902.01362v1
PDF	http://arxiv.org/pdf/1902.01362v1.pdf
PWC	https://paperswithcode.com/paper/evaluation-of-multidisciplinary-effects-of
Repo
Framework

DeepSynth: Automata Synthesis for Automatic Task Segmentation in Deep Reinforcement Learning


Title	DeepSynth: Automata Synthesis for Automatic Task Segmentation in Deep Reinforcement Learning
Authors	Mohammadhosein Hasanbeig, Natasha Yogananda Jeppu, Alessandro Abate, Tom Melham, Daniel Kroening
Abstract	We propose a method for effective training of deep Reinforcement Learning (RL) agents when the reward is sparse and non-Markovian, but at the same time progress towards the reward requires the attainment of an unknown sequence of high-level objectives. Our method employs a recently-published algorithm for synthesis of compact automata to uncover this sequential structure. We synthesise an automaton from trace data generated through exploration of the environment by the deep RL agent. A product construction is then used to enrich the state space of the environment so that generation of an optimal control policy by deep RL is guided by the discovered structure encoded in the automaton. Our experiments show that our method is able to achieve training results that are otherwise difficult with state-of-the-art RL techniques unaided by external guidance.
Tasks	Program Synthesis
Published	2019-11-22
URL	https://arxiv.org/abs/1911.10244v3
PDF	https://arxiv.org/pdf/1911.10244v3.pdf
PWC	https://paperswithcode.com/paper/deepsynth-program-synthesis-for-automatic
Repo
Framework
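
The abstract above describes a product construction that enriches the environment state with the state of a synthesised automaton. A minimal sketch of that generic construction follows; it is not the DeepSynth implementation. The corridor environment, the labelling function, and all names (`Automaton`, `product_step`, `corridor_step`, `corridor_label`, `KEY_THEN_DOOR`) are hypothetical and chosen only for illustration.

```python
class Automaton:
    """Deterministic automaton over high-level labels (toy stand-in for a synthesised one)."""

    def __init__(self, transitions, start):
        self.transitions = transitions  # {(automaton_state, label): next_automaton_state}
        self.start = start

    def step(self, q, label):
        # Labels with no outgoing edge leave the automaton state unchanged.
        return self.transitions.get((q, label), q)


def product_step(env_step, labeler, automaton, state, action):
    """Advance the product state (env_state, automaton_state) by one action."""
    env_state, q = state
    next_env = env_step(env_state, action)
    next_q = automaton.step(q, labeler(next_env))
    return (next_env, next_q)


# Toy corridor with cells 0..4: a key sits at cell 2, a door at cell 4.
def corridor_step(position, action):
    return max(0, min(4, position + action))


def corridor_label(position):
    return {2: "key", 4: "door"}.get(position, "none")


# The automaton only accepts "door" after "key" has been observed.
KEY_THEN_DOOR = Automaton({(0, "key"): 1, (1, "door"): 2}, start=0)
```

In this toy setting, reaching the door means something different depending on whether the key was collected first. The environment position alone cannot express that, while the product state can, which is why a policy over the product state can treat a history-dependent reward as Markovian.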

Geometry of Deep Generative Models for Disentangled Representations


Title	Geometry of Deep Generative Models for Disentangled Representations
Authors	Ankita Shukla, Shagun Uppal, Sarthak Bhagat, Saket Anand, Pavan Turaga
Abstract	Deep generative models like variational autoencoders approximate the intrinsic geometry of high-dimensional data manifolds by learning low-dimensional latent-space variables and an embedding function. The geometric properties of these latent spaces have been studied under the lens of Riemannian geometry, via analysis of the non-linearity of the generator function. In new developments, deep generative models have been used for learning semantically meaningful 'disentangled' representations that capture task-relevant attributes while being invariant to other attributes. In this work, we explore the geometry of popular generative models for disentangled representation learning. We use several metrics to compare the properties of latent spaces of disentangled representation models in terms of class separability and curvature of the latent space. The results we obtain establish that the class-distinguishable features in the disentangled latent space exhibit higher curvature than in a variational autoencoder. We evaluate and compare the geometry of three such models with a variational autoencoder on two different datasets. Further, our results show that distances and interpolation in the latent space are significantly improved with Riemannian metrics derived from the curvature of the space. We expect these results will have implications for understanding how deep networks can be made more robust, generalizable, and interpretable.
Tasks	Representation Learning
Published	2019-02-19
URL	http://arxiv.org/abs/1902.06964v1
PDF	http://arxiv.org/pdf/1902.06964v1.pdf
PWC	https://paperswithcode.com/paper/geometry-of-deep-generative-models-for
Repo
Framework

Towards Computing Inferences from English News Headlines


Title	Towards Computing Inferences from English News Headlines
Authors	Elizabeth Jasmi George, Radhika Mamidi
Abstract	Newspapers are a popular form of written discourse, read by many people, thanks to the novelty of the information provided by the news content in it. A headline is the most widely read part of any newspaper due to its appearance in a bigger font and sometimes in colour print. In this paper, we suggest and implement a method for computing inferences from English news headlines, excluding the information from the context in which the headlines appear. This method attempts to generate the possible assumptions a reader formulates in mind upon reading a fresh headline. The generated inferences could be useful for assessing the impact of the news headline on readers including children. The understandability of the current state of social affairs depends greatly on the assimilation of the headlines. As the inferences that are independent of the context depend mainly on the syntax of the headline, dependency trees of headlines are used in this approach, to find the syntactical structure of the headlines and to compute inferences out of them.
Tasks
Published	2019-10-18
URL	https://arxiv.org/abs/1910.08294v1
PDF	https://arxiv.org/pdf/1910.08294v1.pdf
PWC	https://paperswithcode.com/paper/towards-computing-inferences-from-english
Repo
Framework

Program synthesis performance constrained by non-linear spatial relations in Synthetic Visual Reasoning Test


Title	Program synthesis performance constrained by non-linear spatial relations in Synthetic Visual Reasoning Test
Authors	Lu Yihe, Scott C. Lowe, Penelope A. Lewis, Mark C. W. van Rossum
Abstract	Despite remarkable advances in automated visual recognition by machines, some visual tasks remain challenging for machines. Fleuret et al. (2011) introduced the Synthetic Visual Reasoning Test (SVRT) to highlight this point, which required classification of images consisting of randomly generated shapes based on hidden abstract rules using only a few examples. Ellis et al. (2015) demonstrated that a program synthesis approach could solve some of the SVRT problems with unsupervised, few-shot learning, whereas they remained challenging for several convolutional neural networks trained with thousands of examples. Here we re-considered the human and machine experiments, because they followed different protocols and yielded different statistics. We thus proposed a quantitative reinterpretation of the data between the protocols, so that we could make a fair comparison between human and machine performance. We improved the program synthesis classifier by correcting the image parsings, and compared the results to the performance of other machine agents and human subjects. We grouped the SVRT problems into different types by two aspects of the core characteristics for classification: shape specification and location relation. We found that the program synthesis classifier could not solve problems involving shape distances, because it relied on symbolic computation, which scales poorly with input dimension, and adding distances into such computation would increase the dimension combinatorially with the number of shapes in an image. Therefore, although the program synthesis classifier is capable of abstract reasoning, its performance is highly constrained by the accessible information in image parsings.
Tasks	Few-Shot Learning, Program Synthesis, Visual Reasoning
Published	2019-11-18
URL	https://arxiv.org/abs/1911.07721v2
PDF	https://arxiv.org/pdf/1911.07721v2.pdf
PWC	https://paperswithcode.com/paper/program-synthesis-performance-constrained-by
Repo
Framework

Cross-Lingual Syntactic Transfer through Unsupervised Adaptation of Invertible Projections


Title	Cross-Lingual Syntactic Transfer through Unsupervised Adaptation of Invertible Projections
Authors	Junxian He, Zhisong Zhang, Taylor Berg-Kirkpatrick, Graham Neubig
Abstract	Cross-lingual transfer is an effective way to build syntactic analysis tools in low-resource languages. However, transfer is difficult when transferring to typologically distant languages, especially when neither annotated target data nor parallel corpora are available. In this paper, we focus on methods for cross-lingual transfer to distant languages and propose to learn a generative model with a structured prior that utilizes labeled source data and unlabeled target data jointly. The parameters of source model and target model are softly shared through a regularized log likelihood objective. An invertible projection is employed to learn a new interlingual latent embedding space that compensates for imperfect cross-lingual word embedding input. We evaluate our method on two syntactic tasks: part-of-speech (POS) tagging and dependency parsing. On the Universal Dependency Treebanks, we use English as the only source corpus and transfer to a wide range of target languages. On the 10 languages in this dataset that are distant from English, our method yields an average of 5.2% absolute improvement on POS tagging and 8.3% absolute improvement on dependency parsing over a direct transfer method using state-of-the-art discriminative models.
Tasks	Cross-Lingual Transfer, Dependency Parsing, Part-Of-Speech Tagging
Published	2019-06-06
URL	https://arxiv.org/abs/1906.02656v3
PDF	https://arxiv.org/pdf/1906.02656v3.pdf
PWC	https://paperswithcode.com/paper/cross-lingual-syntactic-transfer-through
Repo
Framework

A Batched Multi-Armed Bandit Approach to News Headline Testing


Title	A Batched Multi-Armed Bandit Approach to News Headline Testing
Authors	Yizhi Mao, Miao Chen, Abhinav Wagle, Junwei Pan, Michael Natkovich, Don Matheson
Abstract	Optimizing news headlines is important for publishers and media sites. A compelling headline will increase readership, user engagement and social shares. At Yahoo Front Page, headline testing is carried out using a test-rollout strategy: we first allocate an equal proportion of the traffic to each headline variation for a defined testing period, and then shift all future traffic to the best-performing variation. In this paper, we introduce a multi-armed bandit (MAB) approach with batched Thompson Sampling (bTS) to dynamically test headlines for news articles. This method is able to gradually allocate traffic towards optimal headlines while testing. We evaluate the bTS method based on empirical impressions/clicks data and simulated user responses. The results show that the bTS method is robust, converges accurately and quickly to the optimal headline, and outperforms the test-rollout strategy by 3.69% in terms of clicks.
Tasks
Published	2019-08-17
URL	https://arxiv.org/abs/1908.06256v2
PDF	https://arxiv.org/pdf/1908.06256v2.pdf
PWC	https://paperswithcode.com/paper/a-batched-multi-armed-bandit-approach-to-news
Repo
Framework
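
The abstract above describes shifting traffic towards better headlines batch by batch with Thompson Sampling. The following is a minimal sketch of a batched Beta-Bernoulli Thompson Sampling loop under simulated clicks; it is not the paper's bTS implementation. The uniform Beta(1, 1) prior, the Monte Carlo allocation rule, the click-through rates, and the names `allocation` and `run_batched_ts` are all assumptions made for illustration.

```python
import random


def allocation(clicks, skips, rng, draws=2000):
    """Estimate P(arm is best) by sampling from each arm's Beta(1 + clicks, 1 + skips) posterior."""
    wins = [0] * len(clicks)
    for _ in range(draws):
        samples = [rng.betavariate(1 + c, 1 + s) for c, s in zip(clicks, skips)]
        wins[samples.index(max(samples))] += 1
    return [w / draws for w in wins]


def run_batched_ts(true_ctr, batches=20, batch_size=1000, seed=0):
    """Simulate headline testing where posteriors are updated only between batches."""
    rng = random.Random(seed)
    arms = range(len(true_ctr))
    clicks = [0] * len(true_ctr)
    skips = [0] * len(true_ctr)
    for _ in range(batches):
        # Traffic shares stay fixed for the whole batch.
        weights = allocation(clicks, skips, rng)
        batch_clicks = [0] * len(true_ctr)
        batch_skips = [0] * len(true_ctr)
        for _ in range(batch_size):
            arm = rng.choices(arms, weights=weights)[0]
            if rng.random() < true_ctr[arm]:
                batch_clicks[arm] += 1
            else:
                batch_skips[arm] += 1
        # Fold the batch results into the posterior counts.
        clicks = [c + b for c, b in zip(clicks, batch_clicks)]
        skips = [s + b for s, b in zip(skips, batch_skips)]
    return clicks, skips, allocation(clicks, skips, rng)
```

Calling `run_batched_ts([0.02, 0.05, 0.10])` simulates three headline variations. Unlike a test-rollout split, the weaker headlines receive progressively less traffic while the test is still running.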

LoGAN: Latent Graph Co-Attention Network for Weakly-Supervised Video Moment Retrieval


Title	LoGAN: Latent Graph Co-Attention Network for Weakly-Supervised Video Moment Retrieval
Authors	Reuben Tan, Huijuan Xu, Kate Saenko, Bryan A. Plummer
Abstract	The goal of weakly-supervised video moment retrieval is to localize the video segment most relevant to the given natural language query without access to temporal annotations during training. Prior strongly- and weakly-supervised approaches often leverage co-attention mechanisms to learn visual-semantic representations for localization. However, while such approaches tend to focus on identifying relationships between elements of the video and language modalities, there is less emphasis on modeling relational context between video frames given the semantic context of the query. Consequently, the above-mentioned visual-semantic representations, built upon local frame features, do not contain much contextual information. To address this limitation, we propose a Latent Graph Co-Attention Network (LoGAN) that exploits fine-grained frame-by-word interactions to reason about correspondences between all possible pairs of frames, given the semantic context of the query. Comprehensive experiments across two datasets, DiDeMo and Charades-Sta, demonstrate the effectiveness of our proposed latent co-attention model, which outperforms current state-of-the-art (SOTA) weakly-supervised approaches by a significant margin. Notably, it even achieves an 11% improvement in Recall@1 accuracy over strongly-supervised SOTA methods on DiDeMo.
Tasks
Published	2019-09-27
URL	https://arxiv.org/abs/1909.13784v2
PDF	https://arxiv.org/pdf/1909.13784v2.pdf
PWC	https://paperswithcode.com/paper/wman-weakly-supervised-moment-alignment-1
Repo
Framework

TriDepth: Triangular Patch-based Deep Depth Prediction


Title	TriDepth: Triangular Patch-based Deep Depth Prediction
Authors	Masaya Kaneko, Ken Sakurada, Kiyoharu Aizawa
Abstract	We propose a novel and efficient representation for single-view depth estimation using Convolutional Neural Networks (CNNs). Point clouds are generally used for CNN-based 3D scene reconstruction; however, they have some drawbacks: (1) they are redundant as a representation for planar surfaces, and (2) no spatial relationships between points are available (e.g., texture and surface). As a more efficient representation, we introduce a triangular-patch-cloud, which represents the surface of the 3D structure using a set of triangular patches, and propose a CNN framework for its 3D structure estimation. In our framework, we create it by separating all the faces in a 2D mesh, which are determined adaptively from the input image, and estimate depths and normals of all the faces. Using a common RGBD dataset, we show that our representation achieves better or comparable performance relative to the existing point-cloud-based methods, although it has far fewer parameters.
Tasks	3D Scene Reconstruction, Depth Estimation
Published	2019-05-03
URL	https://arxiv.org/abs/1905.01312v2
PDF	https://arxiv.org/pdf/1905.01312v2.pdf
PWC	https://paperswithcode.com/paper/meshdepth-disconnected-mesh-based-deep-depth
Repo
Framework

MOTH- Mobility-induced Outages in THz: A Beyond 5G (B5G) application


Title	MOTH- Mobility-induced Outages in THz: A Beyond 5G (B5G) application
Authors	Rohit Singh, Douglas Sicker, Kazi Mohammed Saidul Huq
Abstract	5G will enable the growing demand for Internet of Things (IoT), high-resolution video streaming, and low-latency wireless services. Demand for such services is expected to grow rapidly, which will require a search for Beyond 5G technological advancements in wireless communications. Part of these advancements is the need for additional spectrum, namely moving toward the terahertz (THz) range. To compensate for the high path loss in THz, narrow beamwidths are used to improve antenna gains. However, with narrow beamwidths, even minor fluctuations in device location (such as through body movement) can cause frequent link failures due to beam misalignment. In this paper, we provide a solution to these small-scale indoor movements that result in mobility-induced outages. Just as a moth flutters about randomly, Mobility-induced Outages in THz (MOTH) can be ephemeral in nature and hard to avoid. To deal with MOTH we propose two methods to predict these outage scenarios: (i) Align-After-Failure (AAF), which predicts based on fixed time margins, and (ii) Align-Before-Failure (ABF), which learns the time margins through user mobility patterns. In this paper, two different online classifiers were used to train the ABF model to predict whether a mobility-induced outage is going to occur, thereby significantly reducing the time spent in outage scenarios. Simulation results demonstrate a relationship between optimal beamwidth and human mobility patterns. Additionally, to cater to a future with dense deployment of Wireless Personal Area Networks (WPANs), it is necessary that we have efficient deployment of resources (e.g., THz-APs). One solution is to maximize the user coverage for a single AP, which might be dependent on multiple parameters. We identify these parameters and observe their tradeoffs for improving user coverage through a single THz-AP.
Tasks
Published	2019-11-13
URL	https://arxiv.org/abs/1911.05589v2
PDF	https://arxiv.org/pdf/1911.05589v2.pdf
PWC	https://paperswithcode.com/paper/moth-mobility-induced-outages-in-thz-a-beyond
Repo
Framework

Structured Prediction using cGANs with Fusion Discriminator


Title	Structured Prediction using cGANs with Fusion Discriminator
Authors	Faisal Mahmood, Wenhao Xu, Nicholas J. Durr, Jeremiah W. Johnson, Alan Yuille
Abstract	We propose the fusion discriminator, a single unified framework for incorporating conditional information into a generative adversarial network (GAN) for a variety of distinct structured prediction tasks, including image synthesis, semantic segmentation, and depth estimation. Much like commonly used convolutional neural network – conditional Markov random field (CNN-CRF) models, the proposed method is able to enforce higher-order consistency in the model, but without being limited to a very specific class of potentials. The method is conceptually simple and flexible, and our experimental results demonstrate improvement on several diverse structured prediction tasks.
Tasks	Depth Estimation, Image Generation, Semantic Segmentation, Structured Prediction
Published	2019-04-30
URL	http://arxiv.org/abs/1904.13358v1
PDF	http://arxiv.org/pdf/1904.13358v1.pdf
PWC	https://paperswithcode.com/paper/structured-prediction-using-cgans-with-fusion-1
Repo
Framework

Rethinking travel behavior modeling representations through embeddings


Title	Rethinking travel behavior modeling representations through embeddings
Authors	Francisco C. Pereira
Abstract	This paper introduces the concept of travel behavior embeddings, a method for re-representing discrete variables that are typically used in travel demand modeling, such as mode, trip purpose, education level, family type or occupation. This re-representation process essentially maps those variables into a latent space called the embedding space. The benefit of this is that such spaces allow for richer nuances than the typical transformations used for categorical variables (e.g. dummy encoding, contrasted encoding, principal components analysis). While the usage of latent variable representations is not new per se in travel demand modeling, the idea presented here brings several innovations: it is an entirely data-driven algorithm; it is informative and consistent, since the latent space can be visualized and interpreted based on distances between different categories; it preserves interpretability of coefficients, despite being based on Neural Network principles; and it is transferable, in that embeddings learned from one dataset can be reused for other ones, as long as travel behavior remains consistent between the datasets. The idea is strongly inspired by natural language processing techniques, namely the word2vec algorithm. Such algorithms are behind recent developments such as automatic translation and next-word prediction. Our method is demonstrated using a mode choice model, and shows improvements of up to 60% with respect to initial likelihood, and up to 20% with respect to likelihood of the corresponding traditional model (i.e. using dummy variables) in out-of-sample evaluation. We provide a new Python package, called PyTre (PYthon TRavel Embeddings), that others can straightforwardly use to replicate our results or improve their own models. Our experiments are themselves based on an open dataset (swissmetro).
Tasks
Published	2019-08-31
URL	https://arxiv.org/abs/1909.00154v1
PDF	https://arxiv.org/pdf/1909.00154v1.pdf
PWC	https://paperswithcode.com/paper/rethinking-travel-behavior-modeling
Repo
Framework

Toward estimating personal well-being using voice


Title	Toward estimating personal well-being using voice
Authors	Samuel Kim, Namhee Kwon, Henry O’Connell
Abstract	Estimating personal well-being draws increasing attention particularly from healthcare and pharmaceutical industries. We propose an approach to estimate personal well-being in terms of various measurements such as anxiety, sleep quality and mood using voice. With clinically validated questionnaires to score those measurements in a self-assessed way, we extract salient features from voice and train regression models with deep neural networks. Experiments with the collected database of 219 subjects show promising results in predicting the well-being related measurements; concordance correlation coefficients (CCC) between self-assessed scores and predicted scores are 0.41 for anxiety, 0.44 for sleep quality and 0.38 for mood.
Tasks	Sleep Quality
Published	2019-10-22
URL	https://arxiv.org/abs/1910.10082v1
PDF	https://arxiv.org/pdf/1910.10082v1.pdf
PWC	https://paperswithcode.com/paper/toward-estimating-personal-well-being-using
Repo
Framework
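
The abstract above reports results as concordance correlation coefficients (CCC) between self-assessed and predicted scores. For readers unfamiliar with the metric, here is a minimal sketch of Lin's CCC, 2·cov(x, y) / (var(x) + var(y) + (mean(x) − mean(y))²), using population (1/n) moments. The function name `concordance_cc` is hypothetical, and the paper's exact computation may differ in detail.

```python
from statistics import fmean


def concordance_cc(y_true, y_pred):
    """Lin's concordance correlation coefficient with population (1/n) moments."""
    if len(y_true) != len(y_pred) or len(y_true) < 2:
        raise ValueError("need two equally long sequences with at least two values")
    mean_t, mean_p = fmean(y_true), fmean(y_pred)
    var_t = fmean([(t - mean_t) ** 2 for t in y_true])
    var_p = fmean([(p - mean_p) ** 2 for p in y_pred])
    cov = fmean([(t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred)])
    denom = var_t + var_p + (mean_t - mean_p) ** 2
    if denom == 0:
        raise ValueError("both sequences are constant and identical; CCC is undefined")
    return 2 * cov / denom
```

Unlike Pearson correlation, CCC also penalises systematic offset and scale differences: predictions that track the true scores perfectly but are shifted by a constant score below 1.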

A Reproducible Analysis of RSSI Fingerprinting for Outdoor Localization Using Sigfox: Preprocessing and Hyperparameter Tuning


Title	A Reproducible Analysis of RSSI Fingerprinting for Outdoor Localization Using Sigfox: Preprocessing and Hyperparameter Tuning
Authors	Grigorios G. Anagnostopoulos, Alexandros Kalousis
Abstract	Fingerprinting techniques, which are a common method for indoor localization, have recently been applied with success to outdoor settings. In particular, the communication signals of Low Power Wide Area Networks (LPWAN), such as Sigfox, have been used for localization. In this rather recent field of study, few publicly available datasets exist so far that would facilitate the consistent comparison of different positioning systems. In the current study, a published dataset of RSSI measurements on a Sigfox network deployed in Antwerp, Belgium is used to analyse the appropriate selection of preprocessing steps and to tune the hyperparameters of a kNN fingerprinting method. Initially, the tuning of hyperparameter k for a variety of distance metrics, and the selection of efficient data transformation schemes proposed by relevant works, are presented. In addition, accuracy improvements are achieved in this study by a detailed examination of the appropriate adjustment of the parameters of the data transformation schemes tested, and of the handling of out-of-range values. With the appropriate tuning of these factors, the achieved mean localization error was 298 meters, and the median error was 109 meters. To facilitate the reproducibility of tests and comparability of results, the code and train/validation/test split used in this study are available.
Tasks
Published	2019-08-14
URL	https://arxiv.org/abs/1908.06851v1
PDF	https://arxiv.org/pdf/1908.06851v1.pdf
PWC	https://paperswithcode.com/paper/a-reproducible-analysis-of-rssi
Repo
Framework
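
The abstract above tunes a kNN fingerprinting method on RSSI vectors and reports localization error in meters. The following is a minimal sketch of plain kNN fingerprinting with a Euclidean metric and an unweighted centroid; it is not the study's released code. The placeholder `OUT_OF_RANGE` value of -200 dBm, the toy fingerprints, and the names `knn_locate` and `haversine_m` are assumptions made for illustration.

```python
import math

OUT_OF_RANGE = -200.0  # assumed placeholder RSSI (dBm) for base stations that heard nothing
EARTH_RADIUS_M = 6371000.0


def knn_locate(fingerprints, query, k=3):
    """Estimate (lat, lon) as the centroid of the k fingerprints closest in RSSI space.

    fingerprints: list of (rssi_vector, (lat, lon)) pairs from the training set.
    query: RSSI vector of the message to localize, same length as the stored vectors.
    """
    ranked = sorted(fingerprints, key=lambda fp: math.dist(fp[0], query))
    nearest = ranked[:k]
    lat = sum(location[0] for _, location in nearest) / len(nearest)
    lon = sum(location[1] for _, location in nearest) / len(nearest)
    return lat, lon


def haversine_m(a, b):
    """Great-circle distance in meters between two (lat, lon) points given in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))
```

The choice of k, the distance metric, and the value substituted for out-of-range readings are exactly the kind of factors the abstract describes tuning; this sketch fixes them arbitrarily.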