January 25, 2020


Paper Group ANR 1661

Triplet-Aware Scene Graph Embeddings. Machine Learning at Microsoft with ML .NET. Efficient Data Analytics on Augmented Similarity Triplets. Learning Sparse Mixture of Experts for Visual Question Answering. Decoupled Box Proposal and Featurization with Ultrafine-Grained Semantic Labels Improve Image Captioning and Visual Question Answering. A Multi …

Triplet-Aware Scene Graph Embeddings


Title	Triplet-Aware Scene Graph Embeddings
Authors	Brigit Schroeder, Subarna Tripathi, Hanlin Tang
Abstract	Scene graphs have become an important form of structured knowledge for tasks such as image generation, visual relation detection, visual question answering, and image retrieval. While visualizing and interpreting word embeddings is well understood, scene graph embeddings have not been fully explored. In this work, we train scene graph embeddings in a layout generation task with different forms of supervision, specifically introducing triplet supervision and data augmentation. We see a significant performance increase in both metrics that measure the goodness of layout prediction, mean intersection-over-union (mIoU) (52.3% vs. 49.2%) and relation score (61.7% vs. 54.1%), after the addition of triplet supervision and data augmentation. To understand how these different methods affect the scene graph representation, we apply several new visualization and evaluation methods to explore the evolution of the scene graph embedding. We find that triplet supervision significantly improves the embedding separability, which is highly correlated with the performance of the layout prediction model.
Tasks	Data Augmentation, Graph Embedding, Image Generation, Image Retrieval, Question Answering, Visual Question Answering, Word Embeddings
Published	2019-09-19
URL	https://arxiv.org/abs/1909.09256v1
PDF	https://arxiv.org/pdf/1909.09256v1.pdf
PWC	https://paperswithcode.com/paper/triplet-aware-scene-graph-embeddings
Repo
Framework
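The abstract above reports layout quality as mean intersection-over-union (mIoU). As a rough illustration of that metric only, here is a minimal sketch that assumes axis-aligned boxes given as `(x_min, y_min, x_max, y_max)` and a one-to-one pairing of predicted and ground-truth boxes. The function names `iou` and `mean_iou` are hypothetical, and the paper's exact evaluation protocol may differ.

```python
from typing import Sequence, Tuple

# Assumed box format: (x_min, y_min, x_max, y_max)
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    # Width and height of the overlap, clamped at zero when boxes are disjoint.
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def mean_iou(predicted: Sequence[Box], truth: Sequence[Box]) -> float:
    """Average IoU over paired predicted and ground-truth boxes."""
    if len(predicted) != len(truth):
        raise ValueError("predicted and truth must have the same length")
    if not predicted:
        return 0.0
    return sum(iou(p, t) for p, t in zip(predicted, truth)) / len(predicted)
```

A perfectly predicted layout gives a mean IoU of 1.0, and a layout whose boxes never overlap the ground truth gives 0.0.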

Machine Learning at Microsoft with ML .NET


Title	Machine Learning at Microsoft with ML .NET
Authors	Zeeshan Ahmed, Saeed Amizadeh, Mikhail Bilenko, Rogan Carr, Wei-Sheng Chin, Yael Dekel, Xavier Dupre, Vadim Eksarevskiy, Eric Erhardt, Costin Eseanu, Senja Filipi, Tom Finley, Abhishek Goswami, Monte Hoover, Scott Inglis, Matteo Interlandi, Shon Katzenberger, Najeeb Kazmi, Gleb Krivosheev, Pete Luferenko, Ivan Matantsev, Sergiy Matusevych, Shahab Moradi, Gani Nazirov, Justin Ormont, Gal Oshri, Artidoro Pagnoni, Jignesh Parmar, Prabhat Roy, Sarthak Shah, Mohammad Zeeshan Siddiqui, Markus Weimer, Shauheen Zahirazami, Yiwen Zhu
Abstract	Machine Learning is transitioning from an art and science into a technology available to every developer. In the near future, every application on every platform will incorporate trained models to encode data-based decisions that would be impossible for developers to author. This presents a significant engineering challenge, since currently data science and modeling are largely decoupled from standard software development processes. This separation makes incorporating machine learning capabilities inside applications unnecessarily costly and difficult, and furthermore discourages developers from embracing ML in the first place. In this paper we present ML .NET, a framework developed at Microsoft over the last decade in response to the challenge of making it easy to ship machine learning models in large software applications. We present its architecture, and illuminate the application demands that shaped it. Specifically, we introduce DataView, the core data abstraction of ML .NET which allows it to capture full predictive pipelines efficiently and consistently across training and inference lifecycles. We close the paper with a surprisingly favorable performance study of ML .NET compared to more recent entrants, and a discussion of some lessons learned.
Tasks
Published	2019-05-14
URL	https://arxiv.org/abs/1905.05715v2
PDF	https://arxiv.org/pdf/1905.05715v2.pdf
PWC	https://paperswithcode.com/paper/machine-learning-at-microsoft-with-mlnet
Repo
Framework

Efficient Data Analytics on Augmented Similarity Triplets


Title	Efficient Data Analytics on Augmented Similarity Triplets
Authors	Muhammad Ahmad, Muhammad Haroon Shakeel, Sarwan Ali, Imdadullah Khan, Arif Zaman, Asim Karim
Abstract	Many machine learning methods (classification, clustering, etc.) start with a known kernel that provides similarity or distance measure between two objects. Recent work has extended this to situations where the information about objects is limited to comparisons of distances between three objects (triplets). Humans find the comparison task much easier than the estimation of absolute similarities, so this kind of data can be easily obtained using crowd-sourcing. In this work, we give an efficient method of augmenting the triplets data, by utilizing additional implicit information inferred from the existing data. Triplets augmentation improves the quality of kernel-based and kernel-free data analytics tasks. Secondly, we also propose a novel set of algorithms for common supervised and unsupervised machine learning tasks based on triplets. These methods work directly with triplets, avoiding kernel evaluations. Experimental evaluation on real and synthetic datasets shows that our methods are more accurate than the current best-known techniques.
Tasks
Published	2019-12-27
URL	https://arxiv.org/abs/1912.12064v1
PDF	https://arxiv.org/pdf/1912.12064v1.pdf
PWC	https://paperswithcode.com/paper/efficient-data-analytics-on-augmented
Repo
Framework
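The abstract above mentions augmenting triplets with implicit information inferred from the existing data. One plausible source of such information is transitivity: if `a` is closer to `b` than to `c`, and closer to `c` than to `d`, then `a` is closer to `b` than to `d`. The sketch below assumes a triplet `(a, b, c)` means "a is closer to b than to c". The function name `augment_triplets` is hypothetical, and this is not claimed to be the paper's algorithm.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Set, Tuple

# Assumed meaning of (a, b, c): object a is closer to b than to c.
Triplet = Tuple[Hashable, Hashable, Hashable]


def augment_triplets(triplets: Iterable[Triplet]) -> Set[Triplet]:
    """Return the input triplets plus those implied by transitivity per anchor."""
    # For each anchor a, build a directed graph with an edge b -> c
    # whenever a is closer to b than to c.
    graphs: Dict[Hashable, Dict[Hashable, Set[Hashable]]] = defaultdict(
        lambda: defaultdict(set)
    )
    for a, b, c in triplets:
        graphs[a][b].add(c)

    augmented: Set[Triplet] = set()
    for anchor, graph in graphs.items():
        for start in list(graph):
            # Depth-first search: everything reachable from `start`
            # is farther from the anchor than `start` is.
            stack = list(graph[start])
            seen: Set[Hashable] = set()
            while stack:
                node = stack.pop()
                if node in seen:
                    continue
                seen.add(node)
                stack.extend(graph.get(node, ()))
            # Skip self-comparisons, which only arise from contradictory input.
            augmented.update(
                (anchor, start, far) for far in seen if far != start
            )
    return augmented
```

For example, the two triplets `("a", "b", "c")` and `("a", "c", "d")` yield one additional triplet, `("a", "b", "d")`.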

Learning Sparse Mixture of Experts for Visual Question Answering


Title	Learning Sparse Mixture of Experts for Visual Question Answering
Authors	Vardaan Pahuja, Jie Fu, Christopher J. Pal
Abstract	There has been rapid progress in the task of Visual Question Answering with improved model architectures. Unfortunately, these models are usually computationally intensive due to their sheer size, which poses a serious challenge for deployment. We aim to tackle this issue for the specific task of Visual Question Answering (VQA). A Convolutional Neural Network (CNN) is an integral part of the visual processing pipeline of a VQA model (assuming the CNN is trained along with the entire VQA model). In this project, we propose an efficient and modular neural architecture for the VQA task with a focus on the CNN module. Our experiments demonstrate that a sparsely activated CNN-based VQA model achieves comparable performance to a standard CNN-based VQA model architecture.
Tasks	Question Answering, Visual Question Answering
Published	2019-09-19
URL	https://arxiv.org/abs/1909.09192v1
PDF	https://arxiv.org/pdf/1909.09192v1.pdf
PWC	https://paperswithcode.com/paper/learning-sparse-mixture-of-experts-for-visual
Repo
Framework
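The abstract above describes a sparsely activated model built from a mixture of experts. The general idea of sparse gating can be sketched as follows: only the `k` highest-scoring experts are evaluated, and their outputs are combined with softmax weights. This is a generic top-k gate on scalar inputs, written under the assumption that gate scores are already available. The names `top_k_gate` and `sparse_moe` are hypothetical, and the paper's CNN-specific design is not reproduced here.

```python
import math
from typing import Callable, Dict, Sequence


def top_k_gate(scores: Sequence[float], k: int) -> Dict[int, float]:
    """Softmax weights over the k highest-scoring experts; the rest get no weight."""
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    # Subtract the maximum score for numerical stability.
    peak = max(scores[i] for i in top)
    exps = {i: math.exp(scores[i] - peak) for i in top}
    total = sum(exps.values())
    return {i: e / total for i, e in exps.items()}


def sparse_moe(
    x: float,
    experts: Sequence[Callable[[float], float]],
    scores: Sequence[float],
    k: int = 1,
) -> float:
    """Evaluate only the selected experts and mix their outputs."""
    weights = top_k_gate(scores, k)
    # Experts outside the top k are never called, which is where compute is saved.
    return sum(w * experts[i](x) for i, w in weights.items())
```

With `k=1` the result is simply the output of the single best-scoring expert, so the cost is that of one expert regardless of how many exist.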

Decoupled Box Proposal and Featurization with Ultrafine-Grained Semantic Labels Improve Image Captioning and Visual Question Answering


Title	Decoupled Box Proposal and Featurization with Ultrafine-Grained Semantic Labels Improve Image Captioning and Visual Question Answering
Authors	Soravit Changpinyo, Bo Pang, Piyush Sharma, Radu Soricut
Abstract	Object detection plays an important role in current solutions to vision and language tasks like image captioning and visual question answering. However, popular models like Faster R-CNN rely on a costly process of annotating ground-truths for both the bounding boxes and their corresponding semantic labels, making it less amenable as a primitive task for transfer learning. In this paper, we examine the effect of decoupling box proposal and featurization for down-stream tasks. The key insight is that this allows us to leverage a large amount of labeled annotations that were previously unavailable for standard object detection benchmarks. Empirically, we demonstrate that this leads to effective transfer learning and improved image captioning and visual question answering models, as measured on publicly available benchmarks.
Tasks	Image Captioning, Object Detection, Question Answering, Transfer Learning, Visual Question Answering
Published	2019-09-04
URL	https://arxiv.org/abs/1909.02097v1
PDF	https://arxiv.org/pdf/1909.02097v1.pdf
PWC	https://paperswithcode.com/paper/decoupled-box-proposal-and-featurization-with
Repo
Framework

A Multi-Task Learning Framework for Extracting Drugs and Their Interactions from Drug Labels


Title	A Multi-Task Learning Framework for Extracting Drugs and Their Interactions from Drug Labels
Authors	Tung Tran, Ramakanth Kavuluru, Halil Kilicoglu
Abstract	Preventable adverse drug reactions as a result of medical errors present a growing concern in modern medicine. As drug-drug interactions (DDIs) may cause adverse reactions, being able to extract DDIs from drug labels into machine-readable form is an important effort in effectively deploying drug safety information. The DDI track of TAC 2018 introduces two large hand-annotated test sets for the task of extracting DDIs from structured product labels with linkage to standard terminologies. Herein, we describe our approach to tackling tasks one and two of the DDI track, which correspond to named entity recognition (NER) and sentence-level relation extraction respectively. Namely, our approach resembles a multi-task learning framework designed to jointly model various sub-tasks including NER and interaction type and outcome prediction. On NER, our system ranked second (among eight teams) at 33.00% and 38.25% F1 on Test Sets 1 and 2 respectively. On relation extraction, our system ranked second (among four teams) at 21.59% and 23.55% on Test Sets 1 and 2 respectively.
Tasks	Multi-Task Learning, Named Entity Recognition, Relation Extraction
Published	2019-05-17
URL	https://arxiv.org/abs/1905.07464v1
PDF	https://arxiv.org/pdf/1905.07464v1.pdf
PWC	https://paperswithcode.com/paper/a-multi-task-learning-framework-for-1
Repo
Framework

Short-term Demand Forecasting for Online Car-hailing Services using Recurrent Neural Networks


Title	Short-term Demand Forecasting for Online Car-hailing Services using Recurrent Neural Networks
Authors	Alireza Nejadettehad, Hamid Mahini, Behnam Bahrak
Abstract	Short-term traffic flow prediction is one of the crucial issues in intelligent transportation systems, which are an important part of smart cities. Accurate predictions can enable both the drivers and the passengers to make better decisions about their travel route, departure time and travel origin selection, which can be helpful in traffic management. Multiple models and algorithms based on time series prediction and machine learning were applied to this issue and achieved acceptable results. Recently, the availability of sufficient data and computational power motivates us to improve the prediction accuracy via deep-learning approaches. Recurrent neural networks have become one of the most popular methods for time series forecasting; however, due to the variety of these networks, the question of which type is the most appropriate one for this task remains unsolved. In this paper, we use three kinds of recurrent neural networks, including simple RNN units, GRU and LSTM neural networks, to predict short-term traffic flow. The dataset from TAP30 Corporation is used for building the models and comparing RNNs with several well-known models, such as DEMA, LASSO and XGBoost. The results show that all three types of RNNs outperform the others; however, simpler RNNs such as simple recurrent units and GRU perform better than LSTM in terms of accuracy and training time.
Tasks	Time Series, Time Series Forecasting, Time Series Prediction
Published	2019-01-29
URL	http://arxiv.org/abs/1901.10821v1
PDF	http://arxiv.org/pdf/1901.10821v1.pdf
PWC	https://paperswithcode.com/paper/short-term-demand-forecasting-for-online-car
Repo
Framework
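The abstract above lists DEMA among the baseline models. Assuming DEMA stands for the double exponential moving average (the abstract does not expand the acronym), a minimal sketch of the smoother looks like this. The smoothing factor `alpha` and the function names are illustrative choices, not values taken from the paper.

```python
from typing import List, Sequence


def ema(series: Sequence[float], alpha: float) -> List[float]:
    """Exponential moving average, seeded with the first observation."""
    smoothed: List[float] = []
    for value in series:
        if not smoothed:
            smoothed.append(value)
        else:
            smoothed.append(alpha * value + (1 - alpha) * smoothed[-1])
    return smoothed


def dema(series: Sequence[float], alpha: float) -> List[float]:
    """Double exponential moving average: 2 * EMA - EMA(EMA)."""
    first = ema(series, alpha)
    second = ema(first, alpha)
    # Subtracting the twice-smoothed series reduces the lag of a plain EMA.
    return [2 * a - b for a, b in zip(first, second)]
```

The last value of `dema(demand_history, alpha)` could serve as a naive one-step-ahead demand estimate, where `demand_history` is a hypothetical list of past request counts.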

An Autonomous Spectrum Management Scheme for Unmanned Aerial Vehicle Networks in Disaster Relief Operations


Title	An Autonomous Spectrum Management Scheme for Unmanned Aerial Vehicle Networks in Disaster Relief Operations
Authors	Alireza Shamsoshoara, Fatemeh Afghah, Abolfazl Razi, Sajad Mousavi, Jonathan Ashdown, Kurt Turk
Abstract	This paper studies the problem of spectrum shortage in an unmanned aerial vehicle (UAV) network during critical missions such as wildfire monitoring, search and rescue, and disaster monitoring. Such applications involve a high demand for high-throughput data transmissions such as real-time video, image, and voice streaming, where the spectrum assigned to the UAV network may not be adequate to provide the desired Quality of Service (QoS). In these scenarios, the aerial network can borrow additional spectrum from the available terrestrial networks in exchange for a relaying service. We propose a spectrum sharing model in which the UAVs are grouped into two classes: relaying UAVs that service the spectrum owner, and sensing UAVs that perform the disaster relief mission using the obtained spectrum. The operation of the UAV network is managed by a hierarchical mechanism in which a central controller assigns the tasks of the UAVs based on their resources and determines their operation region based on the level of priority of impacted areas, and then the UAVs autonomously fine-tune their position using a model-free reinforcement learning algorithm to maximize the individual throughput and prolong their lifetime. We analyze the performance and the convergence of the proposed method analytically and with extensive simulations in different scenarios.
Tasks
Published	2019-11-26
URL	https://arxiv.org/abs/1911.11343v1
PDF	https://arxiv.org/pdf/1911.11343v1.pdf
PWC	https://paperswithcode.com/paper/an-autonomous-spectrum-management-scheme-for
Repo
Framework

An AGI with Time-Inconsistent Preferences


Title	An AGI with Time-Inconsistent Preferences
Authors	James D. Miller, Roman Yampolskiy
Abstract	This paper reveals a trap for artificial general intelligence (AGI) theorists who use economists’ standard method of discounting. This trap is implicitly and falsely assuming that a rational AGI would have time-consistent preferences. An agent with time-inconsistent preferences knows that its future self will disagree with its current self concerning intertemporal decision making. Such an agent cannot automatically trust its future self to carry out plans that its current self considers optimal.
Tasks	Decision Making
Published	2019-06-23
URL	https://arxiv.org/abs/1906.10536v1
PDF	https://arxiv.org/pdf/1906.10536v1.pdf
PWC	https://paperswithcode.com/paper/an-agi-with-time-inconsistent-preferences
Repo
Framework
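The abstract above contrasts standard discounting with time-inconsistent preferences. A small numeric sketch can make the distinction concrete. It compares exponential discounting with hyperbolic discounting, which is used here only as a common textbook example of a time-inconsistent scheme and is an assumption, not something the abstract names. The rewards, `delta`, and `k` are illustrative values.

```python
from typing import Callable


def exponential(delay: float, delta: float = 0.9) -> float:
    """Exponential discount factor: delta ** delay."""
    return delta ** delay


def hyperbolic(delay: float, k: float = 1.0) -> float:
    """Hyperbolic discount factor: 1 / (1 + k * delay)."""
    return 1.0 / (1.0 + k * delay)


def prefers_later(
    discount: Callable[[float], float],
    delay: float,
    small: float = 100.0,
    large: float = 110.0,
) -> bool:
    """True if `large` at delay + 1 is valued above `small` at delay."""
    return large * discount(delay + 1) > small * discount(delay)


# Exponential: the ranking does not depend on how far away the choice is.
exponential_far = prefers_later(exponential, 10)
exponential_near = prefers_later(exponential, 0)

# Hyperbolic: the agent plans to wait when the choice is distant,
# then reverses once the smaller reward becomes immediate.
hyperbolic_far = prefers_later(hyperbolic, 10)
hyperbolic_near = prefers_later(hyperbolic, 0)
```

Under these assumed numbers the exponential agent makes the same choice at both horizons, while the hyperbolic agent's current self and future self disagree, which is the kind of conflict the abstract describes.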

PlotQA: Reasoning over Scientific Plots


Title	PlotQA: Reasoning over Scientific Plots
Authors	Nitesh Methani, Pritha Ganguly, Mitesh M. Khapra, Pratyush Kumar
Abstract	Existing synthetic datasets (FigureQA, DVQA) for reasoning over plots do not contain variability in data labels, real-valued data, or complex reasoning questions. Consequently, proposed models for these datasets do not fully address the challenge of reasoning over plots. In particular, they assume that the answer comes either from a small fixed size vocabulary or from a bounding box within the image. However, in practice, this is an unrealistic assumption because many questions require reasoning and thus have real-valued answers which appear neither in a small fixed size vocabulary nor in the image. In this work, we aim to bridge this gap between existing datasets and real-world plots. Specifically, we propose PlotQA with 28.9 million question-answer pairs over 224,377 plots on data from real-world sources and questions based on crowd-sourced question templates. Further, 80.76% of the out-of-vocabulary (OOV) questions in PlotQA have answers that are not in a fixed vocabulary. Analysis of existing models on PlotQA reveals that they cannot deal with OOV questions: their overall accuracy on our dataset is in single digits. This is not surprising given that these models were not designed for such questions. As a step towards a more holistic model which can address fixed vocabulary as well as OOV questions, we propose a hybrid approach: Specific questions are answered by choosing the answer from a fixed vocabulary or by extracting it from a predicted bounding box in the plot, while other questions are answered with a table question-answering engine which is fed with a structured table generated by detecting visual elements from the image. On the existing DVQA dataset, our model has an accuracy of 58%, significantly improving on the highest reported accuracy of 46%. On PlotQA, our model has an accuracy of 22.52%, which is significantly better than state of the art models.
Tasks	Question Answering, Visual Question Answering
Published	2019-09-03
URL	https://arxiv.org/abs/1909.00997v3
PDF	https://arxiv.org/pdf/1909.00997v3.pdf
PWC	https://paperswithcode.com/paper/data-interpretation-over-plots
Repo
Framework

Sketch2Code: Transformation of Sketches to UI in Real-time Using Deep Neural Network


Title	Sketch2Code: Transformation of Sketches to UI in Real-time Using Deep Neural Network
Authors	Vanita Jain, Piyush Agrawal, Subham Banga, Rishabh Kapoor, Shashwat Gulyani
Abstract	User Interface (UI) prototyping is a necessary step in the early stages of application development. Transforming sketches of a Graphical User Interface (UI) into a coded UI application is an uninspired but time-consuming task performed by a UI designer. An automated system that can replace human efforts for straightforward implementation of UI designs will greatly speed up this procedure. The works that propose such a system primarily focus on using UI wireframes as input rather than hand-drawn sketches. In this paper, we put forward a novel approach wherein we employ a Deep Neural Network that is trained on our custom database of such sketches to detect UI elements in the input sketch. Detection of objects in sketches is a peculiar visual recognition task that requires a specific solution that our deep neural network model attempts to provide. The output from the network is a platform-independent UI representation object. The UI representation object is a dictionary of key-value pairs to represent the UI elements recognized along with their properties. This is further consumed by our UI parser which creates code for different platforms. The intrinsic platform-independence allows the model to create a UI prototype for multiple platforms with single training. This two-step approach without the need for two trained models improves over other methods giving time-efficient results (average time: 129 ms) with good accuracy.
Tasks
Published	2019-10-20
URL	https://arxiv.org/abs/1910.08930v1
PDF	https://arxiv.org/pdf/1910.08930v1.pdf
PWC	https://paperswithcode.com/paper/sketch2code-transformation-of-sketches-to-ui
Repo
Framework
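The abstract above describes a platform-independent UI representation, a dictionary of key-value pairs, that a parser turns into platform code. The sketch below illustrates that two-step idea with a single HTML backend. The dictionary schema (`elements`, `type`, `text`, `placeholder`) and the function `render_html` are hypothetical, since the abstract does not specify the actual format.

```python
from html import escape
from typing import Any, Dict, List

# Hypothetical detector output: one dictionary describing the recognized elements.
ui_representation: Dict[str, Any] = {
    "elements": [
        {"type": "input", "placeholder": "Email"},
        {"type": "button", "text": "Submit"},
    ]
}


def render_html(ui: Dict[str, Any]) -> str:
    """Translate the platform-independent description into HTML markup."""
    lines: List[str] = []
    for element in ui["elements"]:
        kind = element["type"]
        if kind == "button":
            lines.append(f"<button>{escape(element['text'])}</button>")
        elif kind == "input":
            lines.append(f'<input placeholder="{escape(element["placeholder"])}">')
        else:
            raise ValueError(f"unsupported element type: {kind}")
    return "\n".join(lines)


html_output = render_html(ui_representation)
```

Supporting another platform would mean writing a second renderer over the same dictionary, without retraining the detector.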

Scalable NAS with Factorizable Architectural Parameters


Title	Scalable NAS with Factorizable Architectural Parameters
Authors	Lanfei Wang, Lingxi Xie, Tianyi Zhang, Jun Guo, Qi Tian
Abstract	Neural architecture search (NAS) is an emerging topic in machine learning and computer vision. The fundamental ideology of NAS is using an automatic mechanism to replace manual designs for exploring powerful network architectures. One of the key factors of NAS is to scale-up the search space, e.g., increasing the number of operators, so that more possibilities are covered, but existing search algorithms often get lost in a large number of operators. This paper presents a scalable NAS algorithm by designing a factorizable set of architectural parameters, so that the size of the search space goes up quadratically while the burden of optimization increases linearly. As a practical example, we add a set of activation functions to the original set containing convolution, pooling and skip-connect, etc. With a marginal increase in search costs and no extra costs in retraining, we can find interesting architectures that were not explored before and achieve state-of-the-art performance in CIFAR10 and ImageNet, two standard image classification benchmarks.
Tasks	Image Classification, Neural Architecture Search
Published	2019-12-31
URL	https://arxiv.org/abs/1912.13256v1
PDF	https://arxiv.org/pdf/1912.13256v1.pdf
PWC	https://paperswithcode.com/paper/scalable-nas-with-factorizable-architectural
Repo
Framework
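The abstract above states that factorizing the architectural parameters lets the search space grow quadratically while the optimization burden grows linearly. One way to read this is sketched below: with `M` operators and `N` activation functions, keeping one parameter vector per factor needs `M + N` parameters yet assigns a weight to each of the `M * N` combinations. The product-of-softmaxes form and the name `joint_weights` are assumptions made for illustration, not the paper's confirmed formulation.

```python
import math
from typing import List, Sequence


def softmax(values: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def joint_weights(alpha: Sequence[float], beta: Sequence[float]) -> List[List[float]]:
    """Weight of every (operator, activation) pair from two factor vectors.

    alpha holds M operator parameters and beta holds N activation parameters,
    so M + N learnable values define weights over M * N combinations.
    """
    operator_probs = softmax(alpha)
    activation_probs = softmax(beta)
    return [[p * q for q in activation_probs] for p in operator_probs]
```

For instance, 3 operators and 2 activations need only 5 parameters to weight all 6 combinations, and the gap widens as either set grows.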

Visual Question Answering using Deep Learning: A Survey and Performance Analysis


Title	Visual Question Answering using Deep Learning: A Survey and Performance Analysis
Authors	Yash Srivastava, Vaishnav Murali, Shiv Ram Dubey, Snehasis Mukherjee
Abstract	The Visual Question Answering (VQA) task combines challenges for processing data with both Visual and Linguistic processing, to answer basic 'common sense' questions about given images. Given an image and a question in natural language, the VQA system tries to find the correct answer to it using visual elements of the image and inference gathered from textual questions. In this survey, we cover and discuss the recent datasets released in the VQA domain dealing with various types of question-formats and enabling robustness of the machine-learning models. Next, we discuss new deep learning models that have shown promising results over the VQA datasets. At the end, we present and discuss some of the results computed by us over the vanilla VQA models, Stacked Attention Network and the VQA Challenge 2017 winner model. We also provide a detailed analysis along with the challenges and future research directions.
Tasks	Common Sense Reasoning, Question Answering, Visual Question Answering
Published	2019-08-27
URL	https://arxiv.org/abs/1909.01860v1
PDF	https://arxiv.org/pdf/1909.01860v1.pdf
PWC	https://paperswithcode.com/paper/visual-question-answering-using-deep-learning
Repo
Framework

A Meta-MDP Approach to Exploration for Lifelong Reinforcement Learning


Title	A Meta-MDP Approach to Exploration for Lifelong Reinforcement Learning
Authors	Francisco M. Garcia, Philip S. Thomas
Abstract	In this paper we consider the problem of how a reinforcement learning agent that is tasked with solving a sequence of reinforcement learning problems (a sequence of Markov decision processes) can use knowledge acquired early in its lifetime to improve its ability to solve new problems. We argue that previous experience with similar problems can provide an agent with information about how it should explore when facing a new but related problem. We show that the search for an optimal exploration strategy can be formulated as a reinforcement learning problem itself and demonstrate that such a strategy can leverage patterns found in the structure of related problems. We conclude with experiments that show the benefits of optimizing an exploration strategy using our proposed approach.
Tasks
Published	2019-02-03
URL	http://arxiv.org/abs/1902.00843v1
PDF	http://arxiv.org/pdf/1902.00843v1.pdf
PWC	https://paperswithcode.com/paper/a-meta-mdp-approach-to-exploration-for
Repo
Framework

Modeling Neural Architecture Search Methods for Deep Networks


Title	Modeling Neural Architecture Search Methods for Deep Networks
Authors	Emad Malekhosseini, Mohsen Hajabdollahi, Nader Karimi, Shadrokh Samavi
Abstract	Many research works address the design of architectures for deep neural networks (DNNs); these are known as neural architecture search (NAS) methods. Although there are many automatic and manual techniques for NAS problems, there is no unifying model in which these NAS methods can be explored and compared. In this paper, we propose a general abstraction model for NAS methods. By using the proposed framework, it is possible to compare different design approaches for categorizing and identifying critical areas of interest in designing DNN architectures. Also, under this framework, different methods in the NAS area are summarized; hence a better view of their advantages and disadvantages is possible.
Tasks	Neural Architecture Search
Published	2019-12-31
URL	https://arxiv.org/abs/1912.13183v1
PDF	https://arxiv.org/pdf/1912.13183v1.pdf
PWC	https://paperswithcode.com/paper/modeling-neural-architecture-search-methods
Repo
Framework