October 19, 2019

2979 words 14 mins read

Paper Group ANR 312

Real-time Surgical Tools Recognition in Total Knee Arthroplasty Using Deep Neural Networks. Learning Restricted Boltzmann Machines via Influence Maximization. Semantic embeddings for program behavior patterns. Resource allocation under uncertainty: an algebraic and qualitative treatment. Selective Sampling and Mixture Models in Generative Adversari …

Real-time Surgical Tools Recognition in Total Knee Arthroplasty Using Deep Neural Networks

Title Real-time Surgical Tools Recognition in Total Knee Arthroplasty Using Deep Neural Networks
Authors Moazzem Hossain, Soichi Nishio, Takafumi Hiranaka, Syoji Kobashi
Abstract Total knee arthroplasty (TKA) is a commonly performed surgical procedure to mitigate knee pain and improve function for people with knee arthritis. The procedure is complicated by the different surgical tools used in the stages of surgery. Recognizing surgical tools in real time can simplify surgical procedures for the surgeon. Moreover, the presence and movement of tools during surgery are crucial information for recognizing the operational phase and identifying the surgical workflow. This research therefore proposes a real-time system for recognizing surgical tools during surgery using a convolutional neural network (CNN). Surgeons wearing smart glasses can see essential information about the tools during surgery, which may reduce the complexity of the procedures. To evaluate the performance of the proposed method, we calculated the mean average precision (mAP) and compared it with state-of-the-art methods, namely Fast R-CNN and deformable part models (DPM). We achieved 87.6% mAP, improving on the existing methods. With further improvements, the proposed method can serve as a point of reference and a baseline for operational-phase recognition.
Tasks
Published 2018-06-06
URL http://arxiv.org/abs/1806.02031v1
PDF http://arxiv.org/pdf/1806.02031v1.pdf
PWC https://paperswithcode.com/paper/real-time-surgical-tools-recognition-in-total
Repo
Framework
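
The headline number above is mean average precision over the tool classes. A minimal sketch of how AP for a single class can be computed from ranked detections (the toy confidences and counts are illustrative, not from the paper; mAP is then the mean of per-class APs):

```python
def average_precision(detections, num_gt):
    """Average precision for one class.

    detections: list of (confidence, is_true_positive) pairs.
    num_gt: number of ground-truth instances of the class.
    AP here is the mean of the precision values at each true-positive rank.
    """
    detections = sorted(detections, key=lambda d: d[0], reverse=True)
    tp, precisions = 0, []
    for rank, (_, is_tp) in enumerate(detections, start=1):
        if is_tp:
            tp += 1
            precisions.append(tp / rank)
    return sum(precisions) / num_gt if num_gt else 0.0

# Toy example: 3 ground-truth tools, 4 ranked detections.
ap = average_precision([(0.9, True), (0.8, False), (0.7, True), (0.6, True)], 3)
```

Detection benchmarks differ in how they interpolate the precision-recall curve; this uninterpolated version only shows the basic bookkeeping.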

Learning Restricted Boltzmann Machines via Influence Maximization

Title Learning Restricted Boltzmann Machines via Influence Maximization
Authors Guy Bresler, Frederic Koehler, Ankur Moitra, Elchanan Mossel
Abstract Graphical models are a rich language for describing high-dimensional distributions in terms of their dependence structure. While there are algorithms with provable guarantees for learning undirected graphical models in a variety of settings, there has been much less progress in the important scenario when there are latent variables. Here we study Restricted Boltzmann Machines (or RBMs), which are a popular model with wide-ranging applications in dimensionality reduction, collaborative filtering, topic modeling, feature extraction and deep learning. The main message of our paper is a strong dichotomy in the feasibility of learning RBMs, depending on the nature of the interactions between variables: ferromagnetic models can be learned efficiently, while general models cannot. In particular, we give a simple greedy algorithm based on influence maximization to learn ferromagnetic RBMs with bounded degree. In fact, we learn a description of the distribution on the observed variables as a Markov Random Field. Our analysis is based on tools from mathematical physics that were developed to show the concavity of magnetization. Our algorithm extends straightforwardly to general ferromagnetic Ising models with latent variables. Conversely, we show that even for a constant number of latent variables with constant degree, without ferromagneticity the problem is as hard as sparse parity with noise. This hardness result is based on a sharp and surprising characterization of the representational power of bounded degree RBMs: the distribution on their observed variables can simulate any bounded order MRF. This result is of independent interest since RBMs are the building blocks of deep belief networks.
Tasks Dimensionality Reduction
Published 2018-05-25
URL http://arxiv.org/abs/1805.10262v2
PDF http://arxiv.org/pdf/1805.10262v2.pdf
PWC https://paperswithcode.com/paper/learning-restricted-boltzmann-machines-via
Repo
Framework
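
To make the influence-maximization idea concrete, here is a toy greedy sketch that grows a candidate neighborhood for an observed spin by adding whichever variable most raises an empirical conditional mean. This mimics the flavor of the approach on synthetic data; it is not the authors' exact algorithm and carries none of its guarantees:

```python
def greedy_neighborhood(samples, u, max_size):
    """Greedily grow a candidate neighborhood S for variable u, at each
    step adding the variable whose conditioning most raises the empirical
    influence-style score E[x_u | x_S = +1].

    samples: sequence of +/-1 spin configurations; u: target index.
    """
    n = len(samples[0])

    def cond_mean(S):
        sel = [s[u] for s in samples if all(s[v] == 1 for v in S)]
        return sum(sel) / len(sel) if sel else float("-inf")

    S = []
    while len(S) < max_size:
        base = cond_mean(S)
        best, best_gain = None, 0.0
        for v in range(n):
            if v != u and v not in S:
                gain = cond_mean(S + [v]) - base
                if gain > best_gain:
                    best, best_gain = v, gain
        if best is None:   # no variable raises the score: stop early
            break
        S.append(best)
    return S

# Toy data: variable 1 tracks variable 0 exactly; variable 2 is noise.
samples = [(1, 1, -1), (1, 1, 1), (-1, -1, 1),
           (-1, -1, -1), (1, 1, -1), (-1, -1, 1)]
```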

Semantic embeddings for program behavior patterns

Title Semantic embeddings for program behavior patterns
Authors Alexander Chistyakov, Ekaterina Lobacheva, Arseny Kuznetsov, Alexey Romanenko
Abstract In this paper, we propose a new feature extraction technique for program execution logs. First, we automatically extract complex patterns from a program’s behavior graph. Then, we embed these patterns into a continuous space by training an autoencoder. We evaluate the proposed features on a real-world malicious software detection task. We also find that the embedding space captures interpretable structures in the space of pattern parts.
Tasks
Published 2018-04-10
URL http://arxiv.org/abs/1804.03635v1
PDF http://arxiv.org/pdf/1804.03635v1.pdf
PWC https://paperswithcode.com/paper/semantic-embeddings-for-program-behavior
Repo
Framework
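
The embedding step described above can be sketched with a tiny autoencoder trained by gradient descent. The binary "pattern" vectors below stand in for mined behavior-graph patterns; the architecture and sizes are illustrative assumptions, not the paper's model:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy binary pattern vectors standing in for extracted behavior patterns.
X = rng.integers(0, 2, size=(64, 20)).astype(float)

d_in, d_hid = X.shape[1], 5          # compress 20-dim patterns to 5 dims
W_enc = rng.normal(0, 0.1, (d_in, d_hid))
W_dec = rng.normal(0, 0.1, (d_hid, d_in))

def forward(X):
    H = np.tanh(X @ W_enc)           # continuous embedding of each pattern
    return H, H @ W_dec              # reconstruction of the input

losses, lr = [], 0.05
for _ in range(200):
    H, X_hat = forward(X)
    err = X_hat - X                  # squared-error residual
    grad_dec = H.T @ err / len(X)
    grad_enc = X.T @ ((err @ W_dec.T) * (1 - H**2)) / len(X)
    W_dec -= lr * grad_dec
    W_enc -= lr * grad_enc
    losses.append(float((err**2).mean()))
```

After training, rows of `forward(X)[0]` are the continuous embeddings that a downstream detector would consume.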

Resource allocation under uncertainty: an algebraic and qualitative treatment

Title Resource allocation under uncertainty: an algebraic and qualitative treatment
Authors Franklin Camacho, Gerardo Chacón, Ramón Pino Peréz
Abstract We use an algebraic viewpoint, namely a matrix framework, to deal with the problem of resource allocation under uncertainty in the context of a qualitative approach. Our basic qualitative data are a plausibility relation over the resources, a hierarchical relation over the agents and, of course, the preferences that the agents have over the resources. With this data we propose a qualitative binary relation $\unrhd$ between allocations such that $\mathcal{F}\unrhd \mathcal{G}$ has the following intended meaning: the allocation $\mathcal{F}$ produces social welfare greater than or equal to that of the allocation $\mathcal{G}$. We prove that there is a family of allocations which are maximal with respect to $\unrhd$. We prove also that there is a notion of simple deal such that optimal allocations can be reached by sequences of simple deals. Finally, we introduce a mechanism for discriminating optimal allocations.
Tasks
Published 2018-05-17
URL http://arxiv.org/abs/1805.06864v1
PDF http://arxiv.org/pdf/1805.06864v1.pdf
PWC https://paperswithcode.com/paper/resource-allocation-under-uncertainty-an
Repo
Framework
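
A toy rendering of the comparison $\mathcal{F}\unrhd\mathcal{G}$: the scoring below is a numeric proxy invented for illustration (the paper's relation is purely qualitative and relational), with agents, resources and values all hypothetical:

```python
def welfare(allocation, preference, plausibility):
    """Numeric proxy for social welfare: each agent contributes the
    preference value of the resource it receives, weighted by that
    resource's plausibility rank."""
    return sum(preference[agent][res] * plausibility[res]
               for agent, res in allocation.items())

def at_least_as_good(F, G, preference, plausibility):
    """F unrhd G: allocation F yields at least as much welfare as G."""
    return welfare(F, preference, plausibility) >= welfare(G, preference, plausibility)

preference = {"a1": {"r1": 2, "r2": 1}, "a2": {"r1": 1, "r2": 2}}
plausibility = {"r1": 1, "r2": 1}
F = {"a1": "r1", "a2": "r2"}   # each agent gets its preferred resource
G = {"a1": "r2", "a2": "r1"}
```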

Selective Sampling and Mixture Models in Generative Adversarial Networks

Title Selective Sampling and Mixture Models in Generative Adversarial Networks
Authors Karim Said Barsim, Lirong Yang, Bin Yang
Abstract In this paper, we propose a multi-generator extension to the adversarial training framework, in which the objective of each generator is to represent a unique component of a target mixture distribution. In the training phase, the generators cooperate to represent, as a mixture, the target distribution while maintaining distinct manifolds. As opposed to traditional generative models, inference from a particular generator after training resembles selective sampling from a unique component in the target distribution. We demonstrate the feasibility of the proposed architecture both analytically and with basic Multi-Layer Perceptron (MLP) models trained on the MNIST dataset.
Tasks
Published 2018-02-02
URL http://arxiv.org/abs/1802.01568v1
PDF http://arxiv.org/pdf/1802.01568v1.pdf
PWC https://paperswithcode.com/paper/selective-sampling-and-mixture-models-in
Repo
Framework
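
The key behavioral claim, that sampling from one trained generator amounts to selective sampling from one mixture component, can be illustrated with stand-in generators whose modes play the role of learned components (the 1-D Gaussian setup is an assumption for illustration, not the paper's MLP models):

```python
import random

class ToyGenerator:
    """Stand-in for one trained generator: emits points near its own
    mode, so each generator covers one component of the mixture."""
    def __init__(self, mode, spread=0.1):
        self.mode, self.spread = mode, spread

    def sample(self, rng):
        return self.mode + rng.gauss(0, self.spread)

generators = [ToyGenerator(-2.0), ToyGenerator(0.0), ToyGenerator(2.0)]
rng = random.Random(0)

# Mixture sampling: pick a generator at random, then sample from it.
mixture_samples = [rng.choice(generators).sample(rng) for _ in range(100)]

# Selective sampling: draw only from one chosen component.
selective_samples = [generators[2].sample(rng) for _ in range(100)]
```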

Trusted Multi-Party Computation and Verifiable Simulations: A Scalable Blockchain Approach

Title Trusted Multi-Party Computation and Verifiable Simulations: A Scalable Blockchain Approach
Authors Ravi Kiran Raman, Roman Vaculin, Michael Hind, Sekou L. Remy, Eleftheria K. Pissadaki, Nelson Kibichii Bore, Roozbeh Daneshvar, Biplav Srivastava, Kush R. Varshney
Abstract Large-scale computational experiments, often running over weeks and over large datasets, are used extensively in fields such as epidemiology, meteorology, computational biology, and healthcare to understand phenomena, and design high-stakes policies affecting everyday health and economy. For instance, the OpenMalaria framework is a computationally-intensive simulation used by various non-governmental and governmental agencies to understand malarial disease spread and effectiveness of intervention strategies, and subsequently design healthcare policies. Given that such shared results form the basis of inferences drawn, technological solutions designed, and day-to-day policies drafted, it is essential that the computations are validated and trusted. In particular, in a multi-agent environment involving several independent computing agents, a notion of trust in results generated by peers is critical in facilitating transparency, accountability, and collaboration. Using a novel combination of distributed validation of atomic computation blocks and a blockchain-based immutable audit mechanism, this work proposes a universal framework for distributed trust in computations. In particular, we address the scalability problem by reducing the storage and communication costs using a lossy compression scheme. This framework guarantees not only verifiability of final results, but also the validity of local computations, and its cost-benefit tradeoffs are studied using a synthetic example of training a neural network.
Tasks Epidemiology
Published 2018-09-22
URL http://arxiv.org/abs/1809.08438v1
PDF http://arxiv.org/pdf/1809.08438v1.pdf
PWC https://paperswithcode.com/paper/trusted-multi-party-computation-and
Repo
Framework
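
The two ingredients named in the abstract, a hash-chained immutable audit log and lossy compression of checkpoints, can be sketched as follows. The quantization step and block layout are illustrative assumptions, not the paper's scheme:

```python
import hashlib
import json

def quantize(values, step=0.1):
    """Lossy-compression stand-in: snap each value to a coarse grid,
    shrinking what must be stored and exchanged for validation."""
    return [round(v / step) * step for v in values]

class AuditChain:
    """Append-only, hash-chained log of computation checkpoints. Each
    block commits to the previous hash, so tampering with any recorded
    checkpoint invalidates every later hash."""
    def __init__(self):
        self.blocks = []
        self.prev_hash = "0" * 64

    def append(self, checkpoint):
        payload = json.dumps({"prev": self.prev_hash, "data": checkpoint})
        h = hashlib.sha256(payload.encode()).hexdigest()
        self.blocks.append((self.prev_hash, checkpoint, h))
        self.prev_hash = h
        return h

    def verify(self):
        prev = "0" * 64
        for stored_prev, checkpoint, h in self.blocks:
            payload = json.dumps({"prev": prev, "data": checkpoint})
            if stored_prev != prev or hashlib.sha256(payload.encode()).hexdigest() != h:
                return False
            prev = h
        return True

chain = AuditChain()
chain.append(quantize([0.123, 0.456]))
chain.append(quantize([0.789]))
```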

Face Recognition via Centralized Coordinate Learning

Title Face Recognition via Centralized Coordinate Learning
Authors Xianbiao Qi, Lei Zhang
Abstract Owing to the rapid development of deep neural network (DNN) techniques and the emergence of large-scale face databases, face recognition has achieved great success in recent years. During the training process of a DNN, the face features and classification vectors to be learned interact with each other, and the distribution of face features largely affects the convergence of the network and the face similarity computation in the test stage. In this work, we jointly formulate the learning of face features and classification vectors, and propose a simple yet effective centralized coordinate learning (CCL) method, which enforces the features to be dispersedly spanned in the coordinate space while ensuring the classification vectors lie on a hypersphere. An adaptive angular margin is further proposed to enhance the discrimination capability of face features. Extensive experiments are conducted on six face benchmarks, including those with large age gaps and hard negative samples. Trained only on the small-scale CASIA Webface dataset with 460K face images from about 10K subjects, our CCL model demonstrates high effectiveness and generality, showing consistently competitive performance across all six benchmark databases.
Tasks Face Recognition
Published 2018-01-17
URL http://arxiv.org/abs/1801.05678v1
PDF http://arxiv.org/pdf/1801.05678v1.pdf
PWC https://paperswithcode.com/paper/face-recognition-via-centralized-coordinate
Repo
Framework
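
The two geometric constraints named above, dispersing features around the origin and keeping classification vectors on a hypersphere, reduce to simple tensor operations. This sketch omits the adaptive angular margin and uses made-up dimensions:

```python
import numpy as np

rng = np.random.default_rng(1)

features = rng.normal(3.0, 1.0, size=(200, 8))   # raw, off-center face features
W = rng.normal(0, 1.0, size=(8, 5))              # classification vectors for 5 identities

# Centralize the features so they span the coordinate space around the origin.
centered = features - features.mean(axis=0)

# Constrain each classification vector to the unit hypersphere.
W_sphere = W / np.linalg.norm(W, axis=0, keepdims=True)

# Logits then depend mainly on the angle between a feature and a class vector.
logits = centered @ W_sphere
```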

The 2017 AIBIRDS Competition

Title The 2017 AIBIRDS Competition
Authors Matthew Stephenson, Jochen Renz, Xiaoyu Ge, Peng Zhang
Abstract This paper presents an overview of the sixth AIBIRDS competition, held at the 26th International Joint Conference on Artificial Intelligence. This competition tasked participants with developing an intelligent agent which can play the physics-based puzzle game Angry Birds. This game uses a sophisticated physics engine that requires agents to reason and predict the outcome of actions with only limited environmental information. Agents entered into this competition were required to solve a wide assortment of previously unseen levels within a set time limit. The physical reasoning and planning required to solve these levels are very similar to those of many real-world problems. This year’s competition featured some of the best agents developed so far and even included several new AI techniques such as deep reinforcement learning. Within this paper we describe the framework, rules, submitted agents and results for this competition. We also provide some background information on related work and other video game AI competitions, as well as discussing some potential ideas for future AIBIRDS competitions and agent improvements.
Tasks
Published 2018-03-14
URL http://arxiv.org/abs/1803.05156v1
PDF http://arxiv.org/pdf/1803.05156v1.pdf
PWC https://paperswithcode.com/paper/the-2017-aibirds-competition
Repo
Framework

Unsupervised Representation Learning of Speech for Dialect Identification

Title Unsupervised Representation Learning of Speech for Dialect Identification
Authors Suwon Shon, Wei-Ning Hsu, James Glass
Abstract In this paper, we explore the use of a factorized hierarchical variational autoencoder (FHVAE) model to learn an unsupervised latent representation for dialect identification (DID). An FHVAE can learn a latent space that separates the more static attributes within an utterance from the more dynamic attributes by encoding them into two different sets of latent variables. Useful factors for dialect identification, such as phonetic or linguistic content, are encoded by a segmental latent variable, while irrelevant factors that are relatively constant within a sequence, such as channel or speaker information, are encoded by a sequential latent variable. The disentanglement property makes the segmental latent variable less susceptible to channel and speaker variation, and thus reduces degradation from channel domain mismatch. We demonstrate that on fully-supervised DID tasks, an end-to-end model trained on the features extracted from the FHVAE model achieves the best performance, compared to the same model trained on conventional acoustic features and an i-vector based system. Moreover, we also show that the proposed approach can leverage a large amount of unlabeled data for FHVAE training to learn domain-invariant features for DID, and significantly improve the performance in a low-resource condition, where the labels for the in-domain data are not available.
Tasks Representation Learning, Unsupervised Representation Learning
Published 2018-09-12
URL http://arxiv.org/abs/1809.04458v1
PDF http://arxiv.org/pdf/1809.04458v1.pdf
PWC https://paperswithcode.com/paper/unsupervised-representation-learning-of
Repo
Framework

Semi-Supervised Cross-Modal Retrieval with Label Prediction

Title Semi-Supervised Cross-Modal Retrieval with Label Prediction
Authors Devraj Mandal, Pramod Rao, Soma Biswas
Abstract Due to the abundance of data from multiple modalities, cross-modal retrieval tasks with image-text, audio-image, etc. are gaining increasing importance. Among the different approaches proposed, supervised methods usually give a significant improvement over their unsupervised counterparts, at the additional cost of labeling or annotating the training data. Semi-supervised methods are becoming popular as they provide an elegant framework to balance the conflicting requirements of labeling cost and accuracy. In this work, we propose a novel deep semi-supervised framework which can seamlessly handle both labeled and unlabeled data. The network has two important components: (a) a label-prediction component predicts the labels for the unlabeled portion of the data, and then (b) a common modality-invariant representation is learned for cross-modal retrieval. The two parts of the network are trained sequentially, one after the other. Extensive experiments on three standard benchmark datasets, Wiki, Pascal VOC and NUS-WIDE, demonstrate that the proposed framework outperforms the state of the art in both supervised and semi-supervised settings.
Tasks Cross-Modal Retrieval
Published 2018-12-04
URL https://arxiv.org/abs/1812.01391v2
PDF https://arxiv.org/pdf/1812.01391v2.pdf
PWC https://paperswithcode.com/paper/a-deep-learning-framework-for-semi-supervised
Repo
Framework
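
Stage (a), predicting labels for the unlabeled portion, is often implemented as confidence-thresholded pseudo-labeling. A miniature version, where the per-class scores are assumed to come from the label-prediction network:

```python
def pseudo_label(unlabeled_scores, threshold=0.8):
    """Assign a label to each unlabeled item whose top class score
    clears the confidence threshold; leave the rest unlabeled (None)."""
    labels = []
    for scores in unlabeled_scores:
        best = max(range(len(scores)), key=scores.__getitem__)
        labels.append(best if scores[best] >= threshold else None)
    return labels

# Three unlabeled items, two classes; the middle one is too uncertain.
labels = pseudo_label([[0.9, 0.1], [0.55, 0.45], [0.2, 0.8]])
```

The confident pseudo-labels would then supervise learning of the modality-invariant representation in stage (b).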

Chart-Text: A Fully Automated Chart Image Descriptor

Title Chart-Text: A Fully Automated Chart Image Descriptor
Authors Abhijit Balaji, Thuvaarakkesh Ramanathan, Venkateshwarlu Sonathi
Abstract Images greatly help in understanding, interpreting and visualizing data. Adding textual descriptions to images is the first and foremost principle of web accessibility. Visually impaired users using screen readers rely on these textual descriptions to better understand images in digital content. In this paper, we propose Chart-Text, a novel fully automated system that creates textual descriptions of chart images. Given a PNG image of a chart, our Chart-Text system creates a complete textual description of it. First, the system classifies the type of chart, and then it detects and classifies the labels and texts in the chart. Finally, it uses specific image processing algorithms to extract relevant information from the chart images. Our proposed system achieves an accuracy of 99.72% in classifying the charts and an accuracy of 78.9% in extracting the data and creating the corresponding textual description.
Tasks
Published 2018-12-27
URL http://arxiv.org/abs/1812.10636v1
PDF http://arxiv.org/pdf/1812.10636v1.pdf
PWC https://paperswithcode.com/paper/chart-text-a-fully-automated-chart-image
Repo
Framework
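
The final stage of such a pipeline, turning extracted chart data into screen-reader-friendly text, might look like the sketch below. The function, its inputs, and the sentence templates are hypothetical; the classification and text-detection stages are assumed to have already produced the values:

```python
def describe_bar_chart(title, categories, values, unit=""):
    """Render extracted bar-chart data as a plain-text description."""
    lines = [f"Bar chart: {title}."]
    for cat, val in zip(categories, values):
        lines.append(f"{cat}: {val}{unit}.")
    top = categories[values.index(max(values))]
    lines.append(f"The largest bar is {top}.")
    return " ".join(lines)

text = describe_bar_chart("Sales by region", ["North", "South"], [120, 80], unit=" units")
```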

Ad-Net: Audio-Visual Convolutional Neural Network for Advertisement Detection In Videos

Title Ad-Net: Audio-Visual Convolutional Neural Network for Advertisement Detection In Videos
Authors Shervin Minaee, Imed Bouazizi, Prakash Kolan, Hossein Najafzadeh
Abstract Personalized advertisement is a crucial task for many online businesses and video broadcasters. Many of today’s broadcasters use the same commercial for all customers, but different viewers have different interests, and it seems reasonable to show customized commercials to different groups of people, chosen based on their demographic features and history. In this project, we propose a framework which takes broadcast videos, analyzes them, detects the commercials, and replaces them with more suitable ones. We propose a two-stream audio-visual convolutional neural network, in which one branch analyzes the visual information and the other the audio information; the audio and visual embeddings are then fused together and used for commercial detection and content categorization. We show that using both the visual and audio content of the videos significantly improves the model performance for video analysis. The network is trained on a dataset of more than 50k regular video and commercial shots, and achieves much better performance than models based on hand-crafted features.
Tasks
Published 2018-06-22
URL http://arxiv.org/abs/1806.08612v1
PDF http://arxiv.org/pdf/1806.08612v1.pdf
PWC https://paperswithcode.com/paper/ad-net-audio-visual-convolutional-neural
Repo
Framework
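
The fusion step above reduces to combining the per-shot embeddings from the two streams. Concatenation is one common choice, sketched here with made-up embedding sizes (the paper's exact fusion operator may differ):

```python
import numpy as np

def fuse(visual_emb, audio_emb):
    """Late fusion: concatenate per-shot visual and audio embeddings
    into one vector for the shared classification head."""
    return np.concatenate([visual_emb, audio_emb], axis=-1)

rng = np.random.default_rng(0)
visual = rng.normal(size=(4, 128))   # 4 shots, 128-dim visual embeddings
audio = rng.normal(size=(4, 32))     # 4 shots, 32-dim audio embeddings
fused = fuse(visual, audio)
```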

Improved GQ-CNN: Deep Learning Model for Planning Robust Grasps

Title Improved GQ-CNN: Deep Learning Model for Planning Robust Grasps
Authors Maciej Jaśkowski, Jakub Świątkowski, Michał Zając, Maciej Klimek, Jarek Potiuk, Piotr Rybicki, Piotr Polatowski, Przemysław Walczyk, Kacper Nowicki, Marek Cygan
Abstract Recent developments in the field of robot grasping have shown great improvements in the grasp success rates when dealing with unknown objects. In this work we improve on one of the most promising approaches, the Grasp Quality Convolutional Neural Network (GQ-CNN) trained on the DexNet 2.0 dataset. We propose a new architecture for the GQ-CNN and describe practical improvements that increase the model validation accuracy from 92.2% to 95.8% and from 85.9% to 88.0% on respectively image-wise and object-wise training and validation splits.
Tasks
Published 2018-02-16
URL http://arxiv.org/abs/1802.05992v1
PDF http://arxiv.org/pdf/1802.05992v1.pdf
PWC https://paperswithcode.com/paper/improved-gq-cnn-deep-learning-model-for
Repo
Framework

WorldTree: A Corpus of Explanation Graphs for Elementary Science Questions supporting Multi-Hop Inference

Title WorldTree: A Corpus of Explanation Graphs for Elementary Science Questions supporting Multi-Hop Inference
Authors Peter A. Jansen, Elizabeth Wainwright, Steven Marmorstein, Clayton T. Morrison
Abstract Developing methods of automated inference that are able to provide users with compelling human-readable justifications for why the answer to a question is correct is critical for domains such as science and medicine, where user trust and detecting costly errors are limiting factors to adoption. One of the central barriers to training question answering models on explainable inference tasks is the lack of gold explanations to serve as training data. In this paper we present a corpus of explanations for standardized science exams, a recent challenge task for question answering. We manually construct a corpus of detailed explanations for nearly all publicly available standardized elementary science questions (approximately 1,680 3rd through 5th grade questions) and represent these as “explanation graphs” – sets of lexically overlapping sentences that describe how to arrive at the correct answer to a question through a combination of domain and world knowledge. We also provide an explanation-centered tablestore, a collection of semi-structured tables that contain the knowledge to construct these elementary science explanations. Together, these two knowledge resources map out a substantial portion of the knowledge required for answering and explaining elementary science exams, and provide both structured and free-text training data for the explainable inference task.
Tasks Question Answering
Published 2018-02-08
URL http://arxiv.org/abs/1802.03052v1
PDF http://arxiv.org/pdf/1802.03052v1.pdf
PWC https://paperswithcode.com/paper/worldtree-a-corpus-of-explanation-graphs-for
Repo
Framework
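
The "sets of lexically overlapping sentences" idea can be sketched as a graph whose edges link sentences sharing a content word. The tokenization and stopword list below are deliberately naive stand-ins for the corpus's actual annotation process:

```python
def explanation_graph(sentences, stopwords=frozenset({"the", "a", "of", "is", "to"})):
    """Link any two sentences that share at least one content word,
    mirroring how explanation graphs connect lexically overlapping
    sentences. Returns edges as (i, j) index pairs with i < j."""
    def content_words(s):
        return {w for w in s.lower().split() if w not in stopwords}

    edges = set()
    for i in range(len(sentences)):
        for j in range(i + 1, len(sentences)):
            if content_words(sentences[i]) & content_words(sentences[j]):
                edges.add((i, j))
    return edges

edges = explanation_graph([
    "sunlight provides energy",
    "plants use energy to grow",
    "a rock is a kind of object",
])
```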

Learning Region Features for Object Detection

Title Learning Region Features for Object Detection
Authors Jiayuan Gu, Han Hu, Liwei Wang, Yichen Wei, Jifeng Dai
Abstract While most steps in the modern object detection methods are learnable, the region feature extraction step remains largely hand-crafted, featured by RoI pooling methods. This work proposes a general viewpoint that unifies existing region feature extraction methods and a novel method that is end-to-end learnable. The proposed method removes most heuristic choices and outperforms its RoI pooling counterparts. It moves further towards fully learnable object detection.
Tasks Object Detection
Published 2018-03-19
URL http://arxiv.org/abs/1803.07066v1
PDF http://arxiv.org/pdf/1803.07066v1.pdf
PWC https://paperswithcode.com/paper/learning-region-features-for-object-detection
Repo
Framework
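
The unifying viewpoint can be illustrated concretely: a region feature is a weighted sum of the feature map over spatial positions, and RoI average pooling is the special case of uniform weights inside the RoI. In the paper the weights are learned end-to-end; here they are fixed for illustration:

```python
import numpy as np

def region_feature(feature_map, weights):
    """Aggregate a (H, W, C) feature map into a single (C,) region
    feature as a weighted sum over all spatial positions."""
    H, W, C = feature_map.shape
    flat = feature_map.reshape(H * W, C)
    return weights.reshape(-1) @ flat

fmap = np.arange(2 * 2 * 3, dtype=float).reshape(2, 2, 3)
uniform = np.full((2, 2), 0.25)   # uniform weights reproduce average pooling
pooled = region_feature(fmap, uniform)
```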