January 28, 2020

Paper Group ANR 915

Rethinking Classification and Localization for Cascade R-CNN


Title	Rethinking Classification and Localization for Cascade R-CNN
Authors	Ang Li, Xue Yang, Chongyang Zhang
Abstract	We extend the state-of-the-art Cascade R-CNN with a simple feature sharing mechanism. Our approach focuses on a key problem this detector suffers from: performance increases at high IoU thresholds but decreases at low IoU thresholds. Feature sharing is extremely helpful: our results show that with this mechanism embedded into all stages, we can easily narrow the gap between the last stage and preceding stages on low IoU thresholds using the network itself, without resorting to the commonly used testing ensemble. We also observe obvious improvements on all IoU thresholds from feature sharing, and the resulting cascade structure can easily match or exceed its counterparts with only negligible extra parameters. To push the envelope, we demonstrate 43.2 AP on COCO object detection without any bells and whistles, including testing ensemble, surpassing the previous Cascade R-CNN by a large margin. Our framework is easy to implement and we hope it can serve as a general and strong baseline for future research.
Tasks	Object Detection
Published	2019-07-27
URL	https://arxiv.org/abs/1907.11914v1
PDF	https://arxiv.org/pdf/1907.11914v1.pdf
PWC	https://paperswithcode.com/paper/rethinking-classification-and-localization
Repo
Framework
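
The abstract above hinges on IoU thresholds, so a small illustration of the quantity may help. The following is a minimal sketch, not the authors' code: the `(x1, y1, x2, y2)` box format, the function names, and the thresholds 0.5 and 0.7 are assumptions chosen for illustration.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap region (zero when the boxes are disjoint).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def is_positive(pred_box, gt_box, threshold):
    """A prediction counts as a match only if its IoU reaches the threshold."""
    return iou(pred_box, gt_box) >= threshold


pred = (0.0, 0.0, 10.0, 10.0)
gt = (2.0, 0.0, 12.0, 10.0)
print(iou(pred, gt))               # overlap 80, union 120
print(is_positive(pred, gt, 0.5))  # matches at a loose threshold
print(is_positive(pred, gt, 0.7))  # rejected at a strict threshold
```

The same prediction can therefore count as correct at a low threshold and incorrect at a high one, which is why results are reported per threshold.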

Efficient Bitmap-based Indexing and Retrieval of Similarity Search Image Queries


Title	Efficient Bitmap-based Indexing and Retrieval of Similarity Search Image Queries
Authors	Omid Jafari, Parth Nagarkar, Jonathan Montaño
Abstract	Finding similar images is a necessary operation in many multimedia applications. Images are often represented and stored as a set of high-dimensional features, which are extracted using localized feature extraction algorithms. Locality Sensitive Hashing is one of the most popular approximate processing techniques for finding similar points in high-dimensional spaces. Locality Sensitive Hashing (LSH) and its variants are designed to find similar points, but they are not designed to find objects (such as images, which are made up of a collection of points) efficiently. In this paper, we propose an index structure, Bitmap-Image LSH (bImageLSH), for efficient processing of high-dimensional images. Using a real dataset, we experimentally show the performance benefit of our novel design while keeping the accuracy of the image results high.
Tasks
Published	2019-12-15
URL	https://arxiv.org/abs/1912.07101v1
PDF	https://arxiv.org/pdf/1912.07101v1.pdf
PWC	https://paperswithcode.com/paper/efficient-bitmap-based-indexing-and-retrieval
Repo
Framework
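
For readers unfamiliar with Locality Sensitive Hashing, here is a minimal sketch of one common point-level variant (sign random projections). It is not bImageLSH, whose index structure the abstract does not detail; the function names, the number of bits, and the seed are all assumptions made for illustration.

```python
import random


def make_hyperplanes(num_bits, dim, seed=0):
    """Draw num_bits random hyperplane normals in dim dimensions (hypothetical helper)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]


def lsh_signature(vector, hyperplanes):
    """One bit per hyperplane: which side of the plane the vector falls on."""
    bits = []
    for plane in hyperplanes:
        dot = sum(p * v for p, v in zip(plane, vector))
        bits.append(1 if dot >= 0 else 0)
    return tuple(bits)


def build_index(points, hyperplanes):
    """Group point ids by signature; points in the same bucket are candidate neighbours."""
    buckets = {}
    for point_id, vec in points.items():
        buckets.setdefault(lsh_signature(vec, hyperplanes), []).append(point_id)
    return buckets


planes = make_hyperplanes(num_bits=8, dim=4, seed=1)
features = {
    "a": [0.3, -1.2, 0.5, 2.0],
    "b": [0.6, -2.4, 1.0, 4.0],   # same direction as "a", so same signature
    "c": [-0.3, 1.2, -0.5, -2.0], # opposite direction
}
index = build_index(features, planes)
print(index)
```

Vectors pointing in similar directions tend to share a bucket, so a query only needs to be compared against the points in its own bucket. An image represented by many feature vectors would touch many buckets, which is the object-level cost the abstract is concerned with.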

A Simple Text Mining Approach for Ranking Pairwise Associations in Biomedical Applications


Title	A Simple Text Mining Approach for Ranking Pairwise Associations in Biomedical Applications
Authors	Finn Kuusisto, John Steill, Zhaobin Kuang, James Thomson, David Page, Ron Stewart
Abstract	We present a simple text mining method that is easy to implement, requires minimal data collection and preparation, and is easy to use for proposing ranked associations between a list of target terms and a key phrase. We call this method KinderMiner, and apply it to two biomedical applications. The first application is to identify relevant transcription factors for cell reprogramming, and the second is to identify potential drugs for investigation in drug repositioning. We compare the results from our algorithm to existing data and state-of-the-art algorithms, demonstrating compelling results for both application areas. While we apply the algorithm here for biomedical applications, we argue that the method is generalizable to any available corpus of sufficient size.
Tasks
Published	2019-06-12
URL	https://arxiv.org/abs/1906.05255v1
PDF	https://arxiv.org/pdf/1906.05255v1.pdf
PWC	https://paperswithcode.com/paper/a-simple-text-mining-approach-for-ranking
Repo
Framework
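
The abstract describes ranking target terms by their association with a key phrase but does not state the scoring rule. The sketch below assumes a simple co-occurrence ratio over a corpus of documents and naive substring matching; both are illustrative assumptions, not the KinderMiner method itself, and the function name and toy corpus are hypothetical.

```python
def rank_associations(documents, target_terms, key_phrase):
    """Rank target terms by the fraction of their documents that also mention key_phrase.

    Assumption: case-insensitive substring matching stands in for real text search.
    Returns rows of (term, docs_with_both, docs_with_term, ratio), best first.
    """
    key = key_phrase.lower()
    lowered = [doc.lower() for doc in documents]
    rows = []
    for term in target_terms:
        with_term = [doc for doc in lowered if term.lower() in doc]
        both = sum(1 for doc in with_term if key in doc)
        ratio = both / len(with_term) if with_term else 0.0
        rows.append((term, both, len(with_term), ratio))
    # Sort by ratio, breaking ties by the raw co-occurrence count.
    return sorted(rows, key=lambda row: (row[3], row[1]), reverse=True)


corpus = [
    "Aspirin reduces fever",
    "Aspirin and headache",
    "Ibuprofen reduces fever",
    "Ibuprofen reduces fever quickly",
    "Caffeine keeps you awake",
]
for row in rank_associations(corpus, ["aspirin", "ibuprofen", "caffeine"], "fever"):
    print(row)
```

A practical version would likely add a significance filter so that terms with very few documents do not dominate the ranking.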

Addressing Objects and Their Relations: The Conversational Entity Dialogue Model


Title	Addressing Objects and Their Relations: The Conversational Entity Dialogue Model
Authors	Stefan Ultes, Paweł Budzianowski, Iñigo Casanueva, Lina Rojas-Barahona, Bo-Hsiang Tseng, Yen-Chen Wu, Steve Young, Milica Gašić
Abstract	Statistical spoken dialogue systems usually rely on a single- or multi-domain dialogue model that is restricted in its capabilities of modelling complex dialogue structures, e.g., relations. In this work, we propose a novel dialogue model that is centred around entities and is able to model relations as well as multiple entities of the same type. We demonstrate in a prototype implementation benefits of relation modelling on the dialogue level and show that a trained policy using these relations outperforms the multi-domain baseline. Furthermore, we show that by modelling the relations on the dialogue level, the system is capable of processing relations present in the user input and even learns to address them in the system response.
Tasks	Spoken Dialogue Systems
Published	2019-01-05
URL	http://arxiv.org/abs/1901.01466v1
PDF	http://arxiv.org/pdf/1901.01466v1.pdf
PWC	https://paperswithcode.com/paper/addressing-objects-and-their-relations-the
Repo
Framework

Learning Continuous Occupancy Maps with the Ising Process Model


Title	Learning Continuous Occupancy Maps with the Ising Process Model
Authors	Nicholas O’Dell, Christopher Renton, Adrian Wills
Abstract	We present a new method of learning a continuous occupancy field for use in robot navigation. Occupancy grid maps, or variants thereof, are possibly the most widely used and accepted method of building a map of a robot’s environment. Various methods have been developed to learn continuous occupancy maps and have successfully resolved many of the shortcomings of grid mapping, namely, a priori discretisation and spatial correlation. However, most methods for producing a continuous occupancy field remain computationally expensive or heuristic in nature. Our method explores a generalisation of the so-called Ising model as a suitable candidate for modelling an occupancy field. We also present a unique kernel for use within our method that models range measurements. The method is quite attractive as it requires only a small number of hyperparameters to be trained, and is computationally efficient. The small number of hyperparameters can be quickly learned by maximising a pseudo likelihood. The technique is demonstrated on both a small simulated indoor environment with known ground truth as well as large indoor and outdoor areas, using two common real data sets.
Tasks	Robot Navigation
Published	2019-10-18
URL	https://arxiv.org/abs/1910.08225v1
PDF	https://arxiv.org/pdf/1910.08225v1.pdf
PWC	https://paperswithcode.com/paper/learning-continuous-occupancy-maps-with-the
Repo
Framework

Iterative Matching Point


Title	Iterative Matching Point
Authors	Jiahao Li, Changhao Zhang
Abstract	In this paper, we propose a neural network-based point cloud registration method named Iterative Matching Point (IMP). Our model iteratively matches features of points from two point clouds and solves the rigid body motion by minimizing the distance between the matching points. The idea is similar to Iterative Closest Point (ICP), but our model determines correspondences by comparing geometric features instead of just finding the closest point. Thus it does not suffer from the local minima problem and can handle point clouds with large rotation angles. Furthermore, the robustness of the feature extraction network allows IMP to register partial and noisy point clouds. Experiments on the ModelNet40 dataset show that our method outperforms existing point cloud registration methods by a large margin, especially when the initial rotation angle is large. Also, its capability generalizes to real-world 2.5D data without training on them.
Tasks	Point Cloud Registration
Published	2019-10-23
URL	https://arxiv.org/abs/1910.10328v1
PDF	https://arxiv.org/pdf/1910.10328v1.pdf
PWC	https://paperswithcode.com/paper/iterative-matching-point
Repo
Framework
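
The step of solving the rigid body motion from matched points has a closed form once correspondences are fixed. The sketch below shows it in 2D for brevity, whereas the paper works with 3D point clouds; it assumes the correspondences are already given (IMP obtains them from learned features, which is not modelled here). The function name is hypothetical.

```python
import math


def fit_rigid_2d(source, target):
    """Least-squares rotation angle and translation mapping source[i] onto target[i]."""
    n = len(source)
    src_cx = sum(p[0] for p in source) / n
    src_cy = sum(p[1] for p in source) / n
    tgt_cx = sum(q[0] for q in target) / n
    tgt_cy = sum(q[1] for q in target) / n

    sum_dot = 0.0
    sum_cross = 0.0
    for (px, py), (qx, qy) in zip(source, target):
        ax, ay = px - src_cx, py - src_cy  # centred source point
        bx, by = qx - tgt_cx, qy - tgt_cy  # centred target point
        sum_dot += ax * bx + ay * by
        sum_cross += ax * by - ay * bx

    # The optimal angle maximises sum_dot*cos(theta) + sum_cross*sin(theta).
    theta = math.atan2(sum_cross, sum_dot)
    c, s = math.cos(theta), math.sin(theta)
    # Translation maps the rotated source centroid onto the target centroid.
    tx = tgt_cx - (c * src_cx - s * src_cy)
    ty = tgt_cy - (s * src_cx + c * src_cy)
    return theta, (tx, ty)


# Apply a large rotation (2.5 rad) and a shift, then recover them.
true_theta, true_t = 2.5, (3.0, -1.0)
cloud = [(0.0, 0.0), (1.0, 0.0), (1.0, 2.0), (-0.5, 1.5)]
moved = [
    (math.cos(true_theta) * x - math.sin(true_theta) * y + true_t[0],
     math.sin(true_theta) * x + math.cos(true_theta) * y + true_t[1])
    for x, y in cloud
]
print(fit_rigid_2d(cloud, moved))
```

With correct correspondences the solve is exact regardless of rotation size; ICP's difficulty with large rotations comes from the nearest-point matching, not from this step.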

We Are Not Your Real Parents: Telling Causal from Confounded using MDL


Title	We Are Not Your Real Parents: Telling Causal from Confounded using MDL
Authors	David Kaltenpoth, Jilles Vreeken
Abstract	Given data over variables $(X_1,\ldots,X_m, Y)$ we consider the problem of finding out whether $X$ jointly causes $Y$ or whether they are all confounded by an unobserved latent variable $Z$. To do so, we take an information-theoretic approach based on Kolmogorov complexity. In a nutshell, we follow the postulate that first encoding the true cause, and then the effects given that cause, results in a shorter description than any other encoding of the observed variables. The ideal score is not computable, and hence we have to approximate it. We propose to do so using the Minimum Description Length (MDL) principle. We compare the MDL scores under the models where $X$ causes $Y$ and where there exists a latent variable $Z$ confounding both $X$ and $Y$ and show our scores are consistent. To find potential confounders we propose using latent factor modeling, in particular, probabilistic PCA (PPCA). Empirical evaluation on both synthetic and real-world data shows that our method, CoCa, performs very well – even when the true generating process of the data is far from the assumptions made by the models we use. Moreover, it is robust as its accuracy goes hand in hand with its confidence.
Tasks
Published	2019-01-21
URL	http://arxiv.org/abs/1901.06950v1
PDF	http://arxiv.org/pdf/1901.06950v1.pdf
PWC	https://paperswithcode.com/paper/we-are-not-your-real-parents-telling-causal
Repo
Framework

Approximation and Non-parametric Estimation of ResNet-type Convolutional Neural Networks


Title	Approximation and Non-parametric Estimation of ResNet-type Convolutional Neural Networks
Authors	Kenta Oono, Taiji Suzuki
Abstract	Convolutional neural networks (CNNs) have been shown to achieve optimal approximation and estimation error rates (in the minimax sense) in several function classes. However, previously analyzed optimal CNNs are unrealistically wide and difficult to obtain via optimization due to sparse constraints in important function classes, including the Hölder class. We show a ResNet-type CNN can attain the minimax optimal error rates in these classes in more plausible situations – it can be dense, and its width, channel size, and filter size are constant with respect to sample size. The key idea is that we can replicate the learning ability of fully-connected neural networks (FNNs) by tailored CNNs, as long as the FNNs have block-sparse structures. Our theory is general in the sense that we can automatically translate any approximation rate achieved by block-sparse FNNs into that by CNNs. As an application, we derive approximation and estimation error rates of the aforementioned type of CNNs for the Barron and Hölder classes with the same strategy.
Tasks
Published	2019-03-24
URL	https://arxiv.org/abs/1903.10047v2
PDF	https://arxiv.org/pdf/1903.10047v2.pdf
PWC	https://paperswithcode.com/paper/approximation-and-non-parametric-estimation-1
Repo
Framework

Fast and Automatic Periacetabular Osteotomy Fragment Pose Estimation Using Intraoperatively Implanted Fiducials and Single-View Fluoroscopy


Title	Fast and Automatic Periacetabular Osteotomy Fragment Pose Estimation Using Intraoperatively Implanted Fiducials and Single-View Fluoroscopy
Authors	Robert Grupp, Ryan Murphy, Rachel Hegeman, Clayton Alexander, Mathias Unberath, Yoshito Otake, Benjamin McArthur, Mehran Armand, Russell Taylor
Abstract	Accurate and consistent mental interpretation of fluoroscopy to determine the position and orientation of acetabular bone fragments in 3D space is difficult. We propose a computer assisted approach that uses a single fluoroscopic view and quickly reports the pose of an acetabular fragment without any user input or initialization. Intraoperatively, but prior to any osteotomies, two constellations of metallic ball-bearings (BBs) are injected into the wing of a patient’s ilium and lateral superior pubic ramus. One constellation is located on the expected acetabular fragment, and the other is located on the remaining, larger, pelvis fragment. The 3D locations of each BB are reconstructed using three fluoroscopic views and 2D/3D registrations to a preoperative CT scan of the pelvis. The relative pose of the fragment is established by estimating the movement of the two BB constellations using a single fluoroscopic view taken after osteotomy and fragment relocation. BB detection and inter-view correspondences are automatically computed throughout the processing pipeline. The proposed method was evaluated on a multitude of fluoroscopic images collected from six cadaveric surgeries performed bilaterally on three specimens. Mean fragment rotation error was 2.4 +/- 1.0 degrees, mean translation error was 2.1 +/- 0.6 mm, and mean 3D lateral center edge angle error was 1.0 +/- 0.5 degrees. The average runtime of the single-view pose estimation was 0.7 +/- 0.2 seconds. The proposed method demonstrates accuracy similar to other state of the art systems which require optical tracking systems or multiple-view 2D/3D registrations with manual input. The errors reported on fragment poses and lateral center edge angles are within the margins required for accurate intraoperative evaluation of femoral head coverage.
Tasks	Pose Estimation
Published	2019-10-22
URL	https://arxiv.org/abs/1910.10187v2
PDF	https://arxiv.org/pdf/1910.10187v2.pdf
PWC	https://paperswithcode.com/paper/fast-and-automatic-periacetabular-osteotomy
Repo
Framework

Continual Learning for Robotics: Definition, Framework, Learning Strategies, Opportunities and Challenges


Title	Continual Learning for Robotics: Definition, Framework, Learning Strategies, Opportunities and Challenges
Authors	Timothée Lesort, Vincenzo Lomonaco, Andrei Stoian, Davide Maltoni, David Filliat, Natalia Díaz-Rodríguez
Abstract	Continual learning (CL) is a particular machine learning paradigm where the data distribution and learning objective change through time, or where all the training data and objective criteria are never available at once. The evolution of the learning process is modeled by a sequence of learning experiences where the goal is to be able to learn new skills all along the sequence without forgetting what has been previously learned. At the same time, continual learning aims at optimizing the memory, the computation power and the speed during the learning process. An important challenge for machine learning is not necessarily finding solutions that work in the real world but rather finding stable algorithms that can learn in the real world. Hence, the ideal approach would be tackling the real world in an embodied platform: an autonomous agent. Continual learning would then be effective in an autonomous agent or robot, which would learn autonomously through time about the external world, and incrementally develop a set of complex skills and knowledge. Robotic agents have to learn to adapt and interact with their environment using a continuous stream of observations. Some recent approaches aim at tackling continual learning for robotics, but most recent papers on continual learning only evaluate approaches in simulation or with static datasets. Unfortunately, the evaluation of those algorithms does not provide insights on whether their solutions may help continual learning in the context of robotics. This paper aims at reviewing the existing state of the art of continual learning, summarizing existing benchmarks and metrics, and proposing a framework for presenting and evaluating both robotics and non-robotics approaches in a way that makes transfer between both fields easier.
Tasks	Continual Learning
Published	2019-06-29
URL	https://arxiv.org/abs/1907.00182v3
PDF	https://arxiv.org/pdf/1907.00182v3.pdf
PWC	https://paperswithcode.com/paper/continual-learning-for-robotics
Repo
Framework

Topology-Preserving Deep Image Segmentation


Title	Topology-Preserving Deep Image Segmentation
Authors	Xiaoling Hu, Li Fuxin, Dimitris Samaras, Chao Chen
Abstract	Segmentation algorithms are prone to make topological errors on fine-scale structures, e.g., broken connections. We propose a novel method that learns to segment with correct topology. In particular, we design a continuous-valued loss function that enforces a segmentation to have the same topology as the ground truth, i.e., having the same Betti number. The proposed topology-preserving loss function is differentiable and we incorporate it into end-to-end training of a deep neural network. Our method achieves much better performance on the Betti number error, which directly accounts for the topological correctness. It also performs superiorly on other topology-relevant metrics, e.g., the Adjusted Rand Index and the Variation of Information. We illustrate the effectiveness of the proposed method on a broad spectrum of natural and biomedical datasets.
Tasks	Semantic Segmentation
Published	2019-06-12
URL	https://arxiv.org/abs/1906.05404v1
PDF	https://arxiv.org/pdf/1906.05404v1.pdf
PWC	https://paperswithcode.com/paper/topology-preserving-deep-image-segmentation
Repo
Framework
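
To make the Betti number error concrete, the sketch below counts connected components (the zeroth Betti number) of a binary mask and compares prediction with ground truth. It is an evaluation-side illustration only, not the differentiable loss the paper proposes, and it ignores higher Betti numbers such as loops. The 4-connectivity choice and the function names are assumptions.

```python
def betti0(mask):
    """Number of 4-connected foreground components in a binary mask (list of rows)."""
    rows = len(mask)
    cols = len(mask[0]) if rows else 0
    seen = set()
    components = 0
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or (r, c) in seen:
                continue
            # New component found: flood-fill everything reachable from it.
            components += 1
            seen.add((r, c))
            stack = [(r, c)]
            while stack:
                y, x = stack.pop()
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] and (ny, nx) not in seen):
                        seen.add((ny, nx))
                        stack.append((ny, nx))
    return components


def betti0_error(pred_mask, gt_mask):
    """Absolute difference in component counts between prediction and ground truth."""
    return abs(betti0(pred_mask) - betti0(gt_mask))


ground_truth = [[0, 0, 0, 0, 0],
                [1, 1, 1, 1, 1],
                [0, 0, 0, 0, 0]]
prediction = [[0, 0, 0, 0, 0],
              [1, 1, 0, 1, 1],   # one missing pixel breaks the connection
              [0, 0, 0, 0, 0]]
print(betti0(ground_truth), betti0(prediction), betti0_error(prediction, ground_truth))
```

The prediction differs from the ground truth by a single pixel, so a per-pixel metric barely changes, yet the topology is wrong; this is the kind of error the abstract targets.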

Initial investigation of an encoder-decoder end-to-end TTS framework using marginalization of monotonic hard latent alignments


Title	Initial investigation of an encoder-decoder end-to-end TTS framework using marginalization of monotonic hard latent alignments
Authors	Yusuke Yasuda, Xin Wang, Junichi Yamagishi
Abstract	End-to-end text-to-speech (TTS) synthesis is a method that directly converts input text to output acoustic features using a single network. A recent advance of end-to-end TTS is due to a key technique called attention mechanisms, and all successful methods proposed so far have been based on soft attention mechanisms. However, although network structures are becoming increasingly complex, end-to-end TTS systems with soft attention mechanisms may still fail to learn and to predict accurate alignment between the input and output. This may be because the soft attention mechanisms are too flexible. Therefore, we propose an approach that has more explicit but natural constraints suitable for speech signals to make alignment learning and prediction of end-to-end TTS systems more robust. The proposed system, with the constrained alignment scheme borrowed from segment-to-segment neural transduction (SSNT), directly calculates the joint probability of acoustic features and alignment given an input text. The alignment is designed to be hard and monotonically increase by considering the speech nature, and it is treated as a latent variable and marginalized during training. During prediction, both the alignment and acoustic features can be generated from the probabilistic distributions. The advantages of our approach are that we can simplify many modules for the soft attention and that we can train the end-to-end TTS model using a single likelihood function. As far as we know, our approach is the first end-to-end TTS without a soft attention mechanism.
Tasks
Published	2019-08-30
URL	https://arxiv.org/abs/1908.11535v1
PDF	https://arxiv.org/pdf/1908.11535v1.pdf
PWC	https://paperswithcode.com/paper/initial-investigation-of-an-encoder-decoder
Repo
Framework
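
Marginalizing a hard, monotonic alignment can be done with a forward recursion rather than by enumerating every path. The sketch below is a toy version under stated assumptions: the alignment starts at the first input position, at each output step it either stays or advances by one, and it must end at the last input position. The emission and move probabilities are made-up numbers standing in for network outputs; none of the names come from the paper.

```python
def marginal_likelihood(emit, move):
    """Sum the joint probability over all monotonic alignments.

    emit[t][j]: probability of the observed output frame t given input position j.
    move[j]:    probability of advancing from position j to j + 1
                (length = number of positions - 1; the last position always stays).
    """
    num_steps = len(emit)
    num_pos = len(emit[0])
    # alpha[j] = probability of frames so far with the alignment currently at j.
    alpha = [0.0] * num_pos
    alpha[0] = emit[0][0]
    for t in range(1, num_steps):
        new_alpha = [0.0] * num_pos
        for j in range(num_pos):
            stay_prob = 1.0 - move[j] if j < num_pos - 1 else 1.0
            total = alpha[j] * stay_prob
            if j > 0:
                total += alpha[j - 1] * move[j - 1]
            new_alpha[j] = emit[t][j] * total
        alpha = new_alpha
    # Assumption: a valid alignment must have reached the final input position.
    return alpha[num_pos - 1]


emit = [[0.5, 0.1],
        [0.4, 0.3],
        [0.2, 0.6]]
move = [0.3]
print(marginal_likelihood(emit, move))
```

With three output frames and two input positions there are only two valid paths, (0, 0, 1) and (0, 1, 1), so the result can be checked by hand. A real implementation would work in log space to avoid underflow on long utterances.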

VASTA: A Vision and Language-assisted Smartphone Task Automation System


Title	VASTA: A Vision and Language-assisted Smartphone Task Automation System
Authors	Alborz Rezazadeh Sereshkeh, Gary Leung, Krish Perumal, Caleb Phillips, Minfan Zhang, Afsaneh Fazly, Iqbal Mohomed
Abstract	We present VASTA, a novel vision and language-assisted Programming By Demonstration (PBD) system for smartphone task automation. Development of a robust PBD automation system requires overcoming three key challenges: first, how to make a particular demonstration robust to positional and visual changes in the user interface (UI) elements; secondly, how to recognize changes in the automation parameters to make the demonstration as generalizable as possible; and thirdly, how to recognize from the user utterance what automation the user wishes to carry out. To address the first challenge, VASTA leverages state-of-the-art computer vision techniques, including object detection and optical character recognition, to accurately label interactions demonstrated by a user, without relying on the underlying UI structures. To address the second and third challenges, VASTA takes advantage of advanced natural language understanding algorithms for analyzing the user utterance to trigger the VASTA automation scripts, and to determine the automation parameters for generalization. We run an initial user study that demonstrates the effectiveness of VASTA at clustering user utterances, understanding changes in the automation parameters, detecting desired UI elements, and, most importantly, automating various tasks. A demo video of the system is available here: http://y2u.be/kr2xE-FixjI
Tasks	Object Detection, Optical Character Recognition
Published	2019-11-04
URL	https://arxiv.org/abs/1911.01474v1
PDF	https://arxiv.org/pdf/1911.01474v1.pdf
PWC	https://paperswithcode.com/paper/vasta-a-vision-and-language-assisted
Repo
Framework

Graph Pattern Entity Ranking Model for Knowledge Graph Completion


Title	Graph Pattern Entity Ranking Model for Knowledge Graph Completion
Authors	Takuma Ebisu, Ryutaro Ichise
Abstract	Knowledge graphs have evolved rapidly in recent years and their usefulness has been demonstrated in many artificial intelligence tasks. However, knowledge graphs often have lots of missing facts. To solve this problem, many knowledge graph embedding models have been developed to populate knowledge graphs and these have shown outstanding performance. However, knowledge graph embedding models are so-called black boxes, and the user does not know how the information in a knowledge graph is processed and the models can be difficult to interpret. In this paper, we utilize graph patterns in a knowledge graph to overcome such problems. Our proposed model, the graph pattern entity ranking model (GRank), constructs an entity ranking system for each graph pattern and evaluates them using a ranking measure. By doing so, we can find graph patterns which are useful for predicting facts. Then, we perform link prediction tasks on standard datasets to evaluate our GRank method. We show that our approach outperforms other state-of-the-art approaches such as ComplEx and TorusE for standard metrics such as HITS@n and MRR. Moreover, our model is easily interpretable because the output facts are described by graph patterns.
Tasks	Graph Embedding, Knowledge Graph Completion, Knowledge Graph Embedding, Knowledge Graphs, Link Prediction
Published	2019-04-05
URL	http://arxiv.org/abs/1904.02856v1
PDF	http://arxiv.org/pdf/1904.02856v1.pdf
PWC	https://paperswithcode.com/paper/graph-pattern-entity-ranking-model-for
Repo
Framework
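
The two metrics named in the abstract are simple functions of the rank assigned to each correct entity. A minimal sketch follows; the function names and the example ranks are illustrative, not results from the paper.

```python
def mean_reciprocal_rank(ranks):
    """Average of 1/rank over all test queries (ranks are 1-based)."""
    return sum(1.0 / rank for rank in ranks) / len(ranks)


def hits_at_n(ranks, n):
    """Fraction of test queries whose correct entity is ranked in the top n."""
    return sum(1 for rank in ranks if rank <= n) / len(ranks)


# Rank of the correct entity for four hypothetical link-prediction queries.
example_ranks = [1, 3, 2, 10]
print(mean_reciprocal_rank(example_ranks))
print(hits_at_n(example_ranks, 1))
print(hits_at_n(example_ranks, 3))
```

MRR rewards placing the correct entity near the top of the list, while HITS@n only checks whether it falls within the first n positions.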

GAN-Knowledge Distillation for one-stage Object Detection


Title	GAN-Knowledge Distillation for one-stage Object Detection
Authors	Wei Hong, Jin ke Yu Fan Zong
Abstract	Convolutional neural networks have significantly improved the accuracy of object detection. As convolutional neural networks become deeper, detection accuracy improves further, but more floating-point calculations are needed. Many researchers use knowledge distillation in object detection to improve the accuracy of a small student network by transferring knowledge from a deeper and larger teacher network. Most knowledge distillation methods require complex, hand-designed cost functions and are aimed at two-stage object detection algorithms. This paper proposes a clean and effective knowledge distillation method for one-stage object detection. The feature maps generated by the teacher network and the student network are used as true samples and fake samples respectively, and adversarial training on both is used to improve the performance of the student network in one-stage object detection.
Tasks	Object Detection
Published	2019-06-20
URL	https://arxiv.org/abs/1906.08467v4
PDF	https://arxiv.org/pdf/1906.08467v4.pdf
PWC	https://paperswithcode.com/paper/gan-knowledge-distillation-for-one-stage
Repo
Framework