Paper Group ANR 599
Analyzing Learned Convnet Features with Dirichlet Process Gaussian Mixture Models
Title | Analyzing Learned Convnet Features with Dirichlet Process Gaussian Mixture Models |
Authors | David Malmgren-Hansen, Allan Aasbjerg Nielsen, Rasmus Engholm |
Abstract | Convolutional Neural Networks (Convnets) have achieved good results in a range of computer vision tasks in recent years. Though given a lot of attention, visualizing the learned representations to interpret Convnets still remains a challenging task. The high dimensionality of internal representations and the high abstractions of deep layers are the main challenges when visualizing Convnet functionality. We present in this paper a technique based on clustering internal Convnet representations with a Dirichlet Process Gaussian Mixture Model, for visualization of learned representations in Convnets. Our method copes with the high dimensionality of a Convnet by clustering representations across all nodes of each layer. We will discuss how this application is useful when considering transfer learning, i.e. transferring a model trained on one dataset to solve a task on a different one. |
Tasks | Transfer Learning |
Published | 2017-02-23 |
URL | http://arxiv.org/abs/1702.07189v1 |
http://arxiv.org/pdf/1702.07189v1.pdf | |
PWC | https://paperswithcode.com/paper/analyzing-learned-convnet-features-with |
Repo | |
Framework | |
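A minimal sketch of the clustering step described above, using scikit-learn's Dirichlet Process variant of `BayesianGaussianMixture` on layer activations; the layer shape, truncation level, and random stand-in data are assumptions, not the paper's setup.

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

# Stand-in for one layer's activations over N images: shape (N, H, W, channels).
activations = np.random.rand(200, 7, 7, 64)

# Flatten spatial positions so every location becomes one sample described
# by its channel responses, i.e. clustering across all nodes of the layer.
samples = activations.reshape(-1, activations.shape[-1])

# The Dirichlet Process prior lets the data decide how many of the
# n_components truncation slots carry non-trivial weight.
dpgmm = BayesianGaussianMixture(
    n_components=20,
    weight_concentration_prior_type="dirichlet_process",
    covariance_type="diag",
    max_iter=200,
    random_state=0,
)
labels = dpgmm.fit_predict(samples)
print("clusters with non-trivial weight:", np.sum(dpgmm.weights_ > 1e-2))
```

In practice the activations would come from a forward pass of the trained Convnet over a held-out image set, one mixture fit per layer.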
BKTreebank: Building a Vietnamese Dependency Treebank
Title | BKTreebank: Building a Vietnamese Dependency Treebank |
Authors | Kiem-Hieu Nguyen |
Abstract | A dependency treebank is an important resource for any language. In this paper, we present our work on building BKTreebank, a dependency treebank for Vietnamese. Important points on designing the POS tagset, dependency relations, and annotation guidelines are discussed. We describe experiments on POS tagging and dependency parsing on the treebank. Experimental results show that the treebank is a useful resource for Vietnamese language processing. |
Tasks | Dependency Parsing |
Published | 2017-10-16 |
URL | http://arxiv.org/abs/1710.05519v2 |
http://arxiv.org/pdf/1710.05519v2.pdf | |
PWC | https://paperswithcode.com/paper/bktreebank-building-a-vietnamese-dependency |
Repo | |
Framework | |
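Dependency treebanks are commonly distributed in CoNLL-U format; the sketch below reads such a file and tallies dependency relations. The file name and the assumption that BKTreebank follows the standard CoNLL-U column layout are hypothetical.

```python
from collections import Counter

def read_conllu(path):
    """Yield sentences as lists of (form, upos, head, deprel) tuples."""
    sentence = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:                 # blank line ends a sentence
                if sentence:
                    yield sentence
                    sentence = []
                continue
            if line.startswith("#"):     # sentence-level comments
                continue
            cols = line.split("\t")
            if "-" in cols[0] or "." in cols[0]:   # skip multiword/empty tokens
                continue
            sentence.append((cols[1], cols[3], int(cols[6]), cols[7]))
    if sentence:
        yield sentence

relations = Counter()
for sent in read_conllu("bktreebank-sample.conllu"):   # hypothetical file name
    relations.update(deprel for _, _, _, deprel in sent)
print(relations.most_common(10))
```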
Semantic Instance Labeling Leveraging Hierarchical Segmentation
Title | Semantic Instance Labeling Leveraging Hierarchical Segmentation |
Authors | Steven Hickson, Irfan Essa, Henrik Christensen |
Abstract | Most of the approaches for indoor RGBD semantic labeling focus on using pixels or superpixels to train a classifier. In this paper, we implement a higher level segmentation using a hierarchy of superpixels to obtain a better segmentation for training our classifier. By focusing on meaningful segments that conform more directly to objects, regardless of size, we train a random forest of decision trees as a classifier using simple features such as the 3D size, LAB color histogram, width, height, and shape as specified by a histogram of surface normals. We test our method on the NYU V2 depth dataset, a challenging dataset of cluttered indoor environments. Our experiments using the NYU V2 depth dataset show that our method achieves state-of-the-art results on both a general semantic labeling introduced by the dataset (floor, structure, furniture, and objects) and a more object-specific semantic labeling. We show that training a classifier on a segmentation from a hierarchy of superpixels yields better results than training directly on superpixels, patches, or pixels as in previous work. |
Tasks | |
Published | 2017-08-02 |
URL | http://arxiv.org/abs/1708.00946v1 |
http://arxiv.org/pdf/1708.00946v1.pdf | |
PWC | https://paperswithcode.com/paper/semantic-instance-labeling-leveraging |
Repo | |
Framework | |
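An illustrative sketch (not the authors' code) of the classification stage: a random forest trained on per-segment feature vectors such as 3D size, LAB color histogram, width, height, and a surface-normal histogram. Feature dimensions and data are placeholders.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_segments = 1000
# Hypothetical per-segment feature vector:
#   1 (3D size) + 24 (LAB histogram) + 2 (width, height) + 16 (normal histogram) = 43
X = rng.random((n_segments, 43))
y = rng.integers(0, 4, size=n_segments)   # floor, structure, furniture, objects

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)
print("segment-level accuracy:", clf.score(X_test, y_test))
```

The point of the paper is that these feature vectors are computed per hierarchical segment rather than per pixel, superpixel, or patch; the classifier itself is standard.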
Symbol detection in online handwritten graphics using Faster R-CNN
Title | Symbol detection in online handwritten graphics using Faster R-CNN |
Authors | Frank D. Julca-Aguilar, Nina S. T. Hirata |
Abstract | Symbol detection techniques in online handwritten graphics (e.g. diagrams and mathematical expressions) consist of methods specifically designed for a single graphic type. In this work, we evaluate the Faster R-CNN object detection algorithm as a general method for detecting symbols in handwritten graphics. We evaluate different configurations of the Faster R-CNN method, and point out issues related to the handwritten nature of the data. Considering the online recognition context, we evaluate the efficiency and accuracy trade-offs of using deep neural networks of different complexities as feature extractors. We evaluate the method on publicly available flowchart and mathematical expression (CROHME-2016) datasets. Results show that Faster R-CNN can be effectively used on both datasets, opening the possibility of developing general methods for symbol detection and, furthermore, general graphics understanding methods built on top of the algorithm. |
Tasks | Object Detection |
Published | 2017-12-13 |
URL | http://arxiv.org/abs/1712.04833v1 |
http://arxiv.org/pdf/1712.04833v1.pdf | |
PWC | https://paperswithcode.com/paper/symbol-detection-in-online-handwritten |
Repo | |
Framework | |
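A hedged sketch of adapting an off-the-shelf Faster R-CNN (torchvision detection API) to a symbol label set; the number of symbol classes and the use of rendered stroke images are assumptions, not details from the paper.

```python
import torch
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

num_classes = 7 + 1   # hypothetical number of symbol classes + background

# Start from a COCO-pretrained detector and swap in a new box predictor head.
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
in_features = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)

model.eval()
# A rendered image of the online strokes would be passed in as a 3xHxW tensor.
dummy_image = [torch.rand(3, 480, 640)]
with torch.no_grad():
    detections = model(dummy_image)
print(detections[0]["boxes"].shape, detections[0]["labels"].shape)
```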
Automatic Music Highlight Extraction using Convolutional Recurrent Attention Networks
Title | Automatic Music Highlight Extraction using Convolutional Recurrent Attention Networks |
Authors | Jung-Woo Ha, Adrian Kim, Chanju Kim, Jangyeon Park, Sunghun Kim |
Abstract | Music highlights are valuable content for music services. Most existing methods have focused on low-level signal features. We propose a method for extracting highlights using high-level features from convolutional recurrent attention networks (CRAN). CRAN utilizes convolutional and recurrent layers for sequential learning with an attention mechanism. The attention allows CRAN to capture significant snippets for distinguishing between genres, and is thus used as a high-level feature. CRAN was evaluated on over 32,000 popular tracks in Korea for two months. Experimental results show our method outperforms three baseline methods in quantitative and qualitative evaluations. We also analyze the effects of attention and sequence information on performance. |
Tasks | |
Published | 2017-12-16 |
URL | http://arxiv.org/abs/1712.05901v1 |
http://arxiv.org/pdf/1712.05901v1.pdf | |
PWC | https://paperswithcode.com/paper/automatic-music-highlight-extraction-using |
Repo | |
Framework | |
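A minimal PyTorch sketch of a convolutional recurrent attention network over spectrogram input, where the attention weights over time can be read off as highlight scores; layer sizes and the input format are illustrative assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn

class CRANSketch(nn.Module):
    def __init__(self, hidden=64, n_genres=10):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d((2, 1)),                        # pool over frequency only
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((1, None)),             # collapse the frequency axis
        )
        self.gru = nn.GRU(32, hidden, batch_first=True)
        self.attn = nn.Linear(hidden, 1)                 # scalar score per time step
        self.head = nn.Linear(hidden, n_genres)

    def forward(self, spec):                             # spec: (batch, 1, mels, time)
        h = self.conv(spec).squeeze(2).transpose(1, 2)   # (batch, time, 32)
        h, _ = self.gru(h)                               # (batch, time, hidden)
        w = torch.softmax(self.attn(h).squeeze(-1), -1)  # attention over time
        context = (w.unsqueeze(-1) * h).sum(1)           # attention-weighted summary
        return self.head(context), w                     # genre logits, per-step weights

logits, weights = CRANSketch()(torch.rand(2, 1, 128, 400))
print(logits.shape, weights.shape)   # the highlight is read from the largest weights
```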
A Comparative Study of Word Embeddings for Reading Comprehension
Title | A Comparative Study of Word Embeddings for Reading Comprehension |
Authors | Bhuwan Dhingra, Hanxiao Liu, Ruslan Salakhutdinov, William W. Cohen |
Abstract | The focus of past machine learning research for Reading Comprehension tasks has been primarily on the design of novel deep learning architectures. Here we show that seemingly minor choices made on (1) the use of pre-trained word embeddings, and (2) the representation of out-of-vocabulary tokens at test time, can turn out to have a larger impact than architectural choices on the final performance. We systematically explore several options for these choices, and provide recommendations to researchers working in this area. |
Tasks | Reading Comprehension, Word Embeddings |
Published | 2017-03-02 |
URL | http://arxiv.org/abs/1703.00993v1 |
http://arxiv.org/pdf/1703.00993v1.pdf | |
PWC | https://paperswithcode.com/paper/a-comparative-study-of-word-embeddings-for |
Repo | |
Framework | |
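A small sketch of the two choices the paper studies: building an embedding matrix from pre-trained vectors and deciding how to represent out-of-vocabulary (OOV) tokens at test time. The toy vocabulary and vectors are placeholders for GloVe-style embeddings.

```python
import numpy as np

dim = 50
pretrained = {"the": np.ones(dim), "cat": np.full(dim, 0.5)}   # stand-in for GloVe
vocab = ["<pad>", "<unk>", "the", "cat", "flumph"]             # "flumph" is OOV

rng = np.random.default_rng(0)
emb = np.zeros((len(vocab), dim), dtype=np.float32)
for i, token in enumerate(vocab):
    if token in pretrained:
        emb[i] = pretrained[token]        # choice (1): reuse the pre-trained vector
    elif token != "<pad>":
        # Choice (2): how to initialize OOV tokens; small random vectors are one
        # option, all-zeros another, and the paper finds this kind of choice can
        # matter more than the architecture itself.
        emb[i] = rng.normal(scale=0.1, size=dim)

def lookup(token):
    return emb[vocab.index(token) if token in vocab else vocab.index("<unk>")]

print(lookup("flumph")[:3], lookup("never-seen")[:3])
```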
RoboJam: A Musical Mixture Density Network for Collaborative Touchscreen Interaction
Title | RoboJam: A Musical Mixture Density Network for Collaborative Touchscreen Interaction |
Authors | Charles P. Martin, Jim Torresen |
Abstract | RoboJam is a machine-learning system for generating music that assists users of a touchscreen music app by performing responses to their short improvisations. This system uses a recurrent artificial neural network to generate sequences of touchscreen interactions and absolute timings, rather than high-level musical notes. To accomplish this, RoboJam’s network uses a mixture density layer to predict appropriate touch interaction locations in space and time. In this paper, we describe the design and implementation of RoboJam’s network and how it has been integrated into a touchscreen music app. A preliminary evaluation analyses the system in terms of training, musical generation and user interaction. |
Tasks | |
Published | 2017-11-29 |
URL | http://arxiv.org/abs/1711.10746v1 |
http://arxiv.org/pdf/1711.10746v1.pdf | |
PWC | https://paperswithcode.com/paper/robojam-a-musical-mixture-density-network-for |
Repo | |
Framework | |
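A hedged sketch of a recurrent network with a mixture density output over touch events (x, y, time delta), loosely in the spirit of RoboJam; the layer sizes, number of mixture components, and diagonal-covariance Gaussians are assumptions rather than the paper's design.

```python
import math
import torch
import torch.nn as nn

K, D = 5, 3                                  # mixture components; dims = (x, y, dt)

class TouchMDN(nn.Module):
    def __init__(self, hidden=64):
        super().__init__()
        self.rnn = nn.LSTM(D, hidden, batch_first=True)
        self.params = nn.Linear(hidden, K + 2 * K * D)   # logits, means, log-sigmas

    def forward(self, seq):                              # seq: (batch, steps, 3)
        h, _ = self.rnn(seq)
        p = self.params(h[:, -1])                        # parameters of the next event
        logits, mu, log_sigma = p.split([K, K * D, K * D], dim=-1)
        return logits, mu.view(-1, K, D), log_sigma.view(-1, K, D)

def mdn_nll(logits, mu, log_sigma, target):              # target: (batch, 3)
    t = target.unsqueeze(1)                               # broadcast against components
    comp = (-0.5 * ((t - mu) / log_sigma.exp()) ** 2
            - log_sigma - 0.5 * math.log(2 * math.pi)).sum(-1)
    return -torch.logsumexp(torch.log_softmax(logits, -1) + comp, dim=-1).mean()

model = TouchMDN()
history, next_touch = torch.rand(8, 20, 3), torch.rand(8, 3)
loss = mdn_nll(*model(history), next_touch)
loss.backward()
print(loss.item())
```

At generation time one would sample a component from the logits and then a touch event from that Gaussian, feeding it back in to produce a response sequence.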
Learning Continuous User Representations through Hybrid Filtering with doc2vec
Title | Learning Continuous User Representations through Hybrid Filtering with doc2vec |
Authors | Simon Stiebellehner, Jun Wang, Shuai Yuan |
Abstract | Players in the online ad ecosystem are struggling to acquire the user data required for precise targeting. Audience look-alike modeling has the potential to alleviate this issue, but models’ performance strongly depends on quantity and quality of available data. In order to maximize the predictive performance of our look-alike modeling algorithms, we propose two novel hybrid filtering techniques that utilize the recent neural probabilistic language model algorithm doc2vec. We apply these methods to data from a large mobile ad exchange and additional app metadata acquired from the Apple App store and Google Play store. First, we model mobile app users through their app usage histories and app descriptions (user2vec). Second, we introduce context awareness to that model by incorporating additional user and app-related metadata in model training (context2vec). Our findings are threefold: (1) the quality of recommendations provided by user2vec is notably higher than current state-of-the-art techniques. (2) User representations generated through hybrid filtering using doc2vec prove to be highly valuable features in supervised machine learning models for look-alike modeling. This represents the first application of hybrid filtering user models using neural probabilistic language models, specifically doc2vec, in look-alike modeling. (3) Incorporating context metadata in the doc2vec model training process to introduce context awareness has positive effects on performance and is superior to directly including the data as features in the downstream supervised models. |
Tasks | Language Modelling |
Published | 2017-12-31 |
URL | http://arxiv.org/abs/1801.00215v1 |
http://arxiv.org/pdf/1801.00215v1.pdf | |
PWC | https://paperswithcode.com/paper/learning-continuous-user-representations |
Repo | |
Framework | |
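An illustrative user2vec-style sketch with gensim's Doc2Vec (4.x API): each "document" is a user's app-usage and app-description token stream, and the learned document vector is the continuous user representation. The toy data and hyperparameters are assumptions, not the paper's setup.

```python
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

user_histories = {
    "user_1": "chess puzzle strategy board offline".split(),
    "user_2": "fitness running tracker calories heart".split(),
    "user_3": "chess tactics openings puzzle trainer".split(),
}
corpus = [TaggedDocument(words=tokens, tags=[user])
          for user, tokens in user_histories.items()]

model = Doc2Vec(corpus, vector_size=32, min_count=1, epochs=50, seed=0)

# The learned document vectors serve as continuous user representations,
# e.g. as features for a downstream look-alike classifier.
print(model.dv.most_similar("user_1", topn=2))
```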
Sparse-then-Dense Alignment based 3D Map Reconstruction Method for Endoscopic Capsule Robots
Title | Sparse-then-Dense Alignment based 3D Map Reconstruction Method for Endoscopic Capsule Robots |
Authors | Mehmet Turan, Yusuf Yigit Pilavci, Ipek Ganiyusufoglu, Helder Araujo, Ender Konukoglu, Metin Sitti |
Abstract | Since the development of capsule endoscopy technology, substantial progress has been made in converting passive capsule endoscopes to robotic active capsule endoscopes that can be controlled by the doctor. However, robotic capsule endoscopy still has some challenges. In particular, the use of such devices to generate a precise and globally consistent three-dimensional (3D) map of the entire inner organ remains an unsolved problem. Such global 3D maps of inner organs would help doctors to detect the location and size of diseased areas more accurately, precisely, and intuitively, thus permitting more accurate and intuitive diagnoses. The proposed 3D reconstruction system is built in a modular fashion including preprocessing, frame stitching, and shading-based 3D reconstruction modules. We propose an efficient scheme to automatically select the key frames out of the huge quantity of raw endoscopic images. Together with a bundle fusion approach that aligns all the selected key frames jointly in a globally consistent way, a significant improvement in mosaic and 3D map accuracy was reached. To the best of our knowledge, this framework is the first complete pipeline for endoscopic capsule robot based 3D map reconstruction containing all of the necessary steps for a reliable and accurate endoscopic 3D map. For the qualitative evaluations, a real pig stomach is employed. Moreover, for the first time in the literature, a detailed and comprehensive quantitative analysis of each proposed pipeline module is performed using a non-rigid esophagus gastroduodenoscopy simulator, four different endoscopic cameras, a magnetically activated soft capsule robot (MASCE), a sub-millimeter precise optical motion tracker, and a fine-scale 3D optical scanner. |
Tasks | 3D Reconstruction |
Published | 2017-08-29 |
URL | http://arxiv.org/abs/1708.09740v1 |
http://arxiv.org/pdf/1708.09740v1.pdf | |
PWC | https://paperswithcode.com/paper/sparse-then-dense-alignment-based-3d-map |
Repo | |
Framework | |
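A rough sketch of one pipeline stage only, key-frame selection: keep a frame when it differs enough from the last selected key frame. The difference measure and threshold are assumptions, not the paper's actual selection criterion.

```python
import numpy as np

def select_keyframes(frames, threshold=0.15):
    """frames: iterable of grayscale images in [0, 1]; returns key-frame indices."""
    keyframes, last = [], None
    for i, frame in enumerate(frames):
        if last is None or np.mean(np.abs(frame - last)) > threshold:
            keyframes.append(i)
            last = frame
    return keyframes

# Synthetic example: a slowly brightening, slightly noisy image sequence.
rng = np.random.default_rng(0)
base = rng.random((64, 64)) * 0.5
frames = [np.clip(base + 0.02 * t + 0.01 * rng.random((64, 64)), 0, 1)
          for t in range(40)]
print(select_keyframes(frames))
```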
Where to Play: Retrieval of Video Segments using Natural-Language Queries
Title | Where to Play: Retrieval of Video Segments using Natural-Language Queries |
Authors | Sangkuk Lee, Daesik Kim, Myunggi Lee, Jihye Hwang, Nojun Kwak |
Abstract | In this paper, we propose a new approach for the retrieval of video segments using natural language queries. Unlike most previous approaches, such as concept-based methods or rule-based structured models, the proposed method uses an image captioning model to construct sentential queries for visual information. In detail, our approach exploits multiple captions generated from the visual features of each image with ‘Densecap’. Then, the similarities between captions of adjacent images are calculated and used to track semantically similar captions over multiple frames. Besides introducing this novel idea of ‘tracking by captioning’, the proposed method is one of the first approaches that uses a language generation model learned by neural networks to construct semantic queries describing the relations and properties of visual information. To evaluate the effectiveness of our approach, we have created a new evaluation dataset, which contains about 348 segments of scenes in 20 movie trailers. Through quantitative and qualitative evaluation, we show that our method is effective for the retrieval of video segments using natural language queries. |
Tasks | Image Captioning, Text Generation |
Published | 2017-07-02 |
URL | http://arxiv.org/abs/1707.00251v1 |
http://arxiv.org/pdf/1707.00251v1.pdf | |
PWC | https://paperswithcode.com/paper/where-to-play-retrieval-of-video-segments |
Repo | |
Framework | |
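A sketch of the ‘tracking by captioning’ idea: adjacent frames are linked into a segment while their generated captions stay similar. Bag-of-words cosine similarity and the threshold below stand in for whatever similarity measure the paper actually uses.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

frame_captions = [
    "a man rides a horse in a field",
    "a man riding a horse near trees",
    "a car drives down a city street",
    "a car driving on a busy street",
]

vectors = CountVectorizer().fit_transform(frame_captions)
segments, start = [], 0
for i in range(1, len(frame_captions)):
    similarity = cosine_similarity(vectors[i - 1], vectors[i])[0, 0]
    if similarity < 0.3:          # assumed threshold: captions diverge, new segment
        segments.append((start, i - 1))
        start = i
segments.append((start, len(frame_captions) - 1))
print(segments)                   # expected: [(0, 1), (2, 3)]
```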
On the Complexity of Robust Stable Marriage
Title | On the Complexity of Robust Stable Marriage |
Authors | Begum Genc, Mohamed Siala, Gilles Simonin, Barry O’Sullivan |
Abstract | Robust Stable Marriage (RSM) is a variant of the classical Stable Marriage problem, where the robustness of a given stable matching is measured by the number of modifications required for repairing it in case an unforeseen event occurs. We focus on the complexity of finding an (a,b)-supermatch. An (a,b)-supermatch is defined as a stable matching in which if any ‘a’ (non-fixed) men/women break up it is possible to find another stable matching by changing the partners of those ‘a’ men/women and also the partners of at most ‘b’ other couples. In order to show that deciding whether there exists an (a,b)-supermatch is NP-Complete, we first introduce a SAT formulation that is NP-Complete by Schaefer’s Dichotomy Theorem. Then, we show the equivalence between the SAT formulation and finding a (1,1)-supermatch on a specific family of instances. |
Tasks | |
Published | 2017-09-18 |
URL | http://arxiv.org/abs/1709.06172v2 |
http://arxiv.org/pdf/1709.06172v2.pdf | |
PWC | https://paperswithcode.com/paper/on-the-complexity-of-robust-stable-marriage |
Repo | |
Framework | |
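For background only, a sketch of the Gale-Shapley procedure that produces the stable matchings whose (a,b)-supermatch robustness the paper studies; this is not the paper's SAT construction or supermatch algorithm.

```python
def gale_shapley(men_prefs, women_prefs):
    """men_prefs/women_prefs: dict name -> preference list (most preferred first)."""
    free = list(men_prefs)
    next_choice = {m: 0 for m in men_prefs}
    engaged = {}                                   # woman -> man
    rank = {w: {m: i for i, m in enumerate(p)} for w, p in women_prefs.items()}
    while free:
        m = free.pop(0)
        w = men_prefs[m][next_choice[m]]           # m proposes to his next choice
        next_choice[m] += 1
        if w not in engaged:
            engaged[w] = m
        elif rank[w][m] < rank[w][engaged[w]]:     # w prefers m over current partner
            free.append(engaged[w])
            engaged[w] = m
        else:
            free.append(m)                         # w rejects m
    return {m: w for w, m in engaged.items()}

men = {"m1": ["w1", "w2"], "m2": ["w1", "w2"]}
women = {"w1": ["m2", "m1"], "w2": ["m1", "m2"]}
print(gale_shapley(men, women))                    # {'m1': 'w2', 'm2': 'w1'}
```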
Segmentation of 3D High-frequency Ultrasound Images of Human Lymph Nodes Using Graph Cut with Energy Functional Adapted to Local Intensity Distribution
Title | Segmentation of 3D High-frequency Ultrasound Images of Human Lymph Nodes Using Graph Cut with Energy Functional Adapted to Local Intensity Distribution |
Authors | Jen-wei Kuo, Jonathan Mamou, Yao Wang, Emi Saegusa-Beecroft, Junji Machi, Ernest J. Feleppa |
Abstract | Previous studies by our group have shown that three-dimensional high-frequency quantitative ultrasound methods have the potential to differentiate metastatic lymph nodes from cancer-free lymph nodes dissected from human cancer patients. To successfully perform these methods inside the lymph node parenchyma, an automatic segmentation method is highly desired to exclude the surrounding thin layer of fat from quantitative ultrasound processing and accurately correct for ultrasound attenuation. In high-frequency ultrasound images of lymph nodes, the intensity distribution of lymph node parenchyma and fat varies spatially because of acoustic attenuation and focusing effects. Thus, the intensity contrast between two object regions (e.g., lymph node parenchyma and fat) is also spatially varying. In our previous work, nested graph cut demonstrated its ability to simultaneously segment lymph node parenchyma, fat, and the outer phosphate-buffered saline bath even when some boundaries are lost because of acoustic attenuation and focusing effects. This paper describes a novel approach called graph cut with locally adaptive energy to further deal with spatially varying distributions of lymph node parenchyma and fat caused by inhomogeneous acoustic attenuation. The proposed method achieved Dice similarity coefficients of 0.937 ± 0.035 when compared to expert manual segmentation on a representative dataset consisting of 115 three-dimensional lymph node images obtained from colorectal cancer patients. |
Tasks | |
Published | 2017-05-19 |
URL | http://arxiv.org/abs/1705.07015v1 |
http://arxiv.org/pdf/1705.07015v1.pdf | |
PWC | https://paperswithcode.com/paper/segmentation-of-3d-high-frequency-ultrasound |
Repo | |
Framework | |
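A rough binary graph-cut sketch (PyMaxflow) whose data terms are derived from a local mean intensity, echoing the locally adaptive energy idea; the window size, smoothness weight, and toy 2D image are assumptions, not the paper's energy functional or 3D setting.

```python
import numpy as np
import maxflow                            # pip install PyMaxflow
from scipy.ndimage import uniform_filter

rng = np.random.default_rng(0)
img = rng.normal(0.3, 0.05, (64, 64))
img[20:44, 20:44] += 0.4                  # bright toy "object" on a darker background

# Locally adaptive data terms: compare each pixel to its local mean intensity
# instead of a single global model, so spatial intensity variation is absorbed.
local_mean = uniform_filter(img, size=31)
object_cost = np.maximum(local_mean - img, 0)       # penalty for an "object" label
background_cost = np.maximum(img - local_mean, 0)   # penalty for a "background" label

g = maxflow.Graph[float]()
nodes = g.add_grid_nodes(img.shape)
g.add_grid_edges(nodes, 0.1)              # smoothness term between grid neighbors
g.add_grid_tedges(nodes, object_cost, background_cost)
g.maxflow()
segmentation = g.get_grid_segments(nodes) # boolean label per pixel from the min cut
print("fraction labelled as one class:", segmentation.mean())
```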
Application of machine learning for hematological diagnosis
Title | Application of machine learning for hematological diagnosis |
Authors | Gregor Gunčar, Matjaž Kukar, Mateja Notar, Miran Brvar, Peter Černelč, Manca Notar, Marko Notar |
Abstract | Quick and accurate medical diagnosis is crucial for the successful treatment of a disease. Using machine learning algorithms, we have built two models to predict a hematologic disease, based on laboratory blood test results. In one predictive model, we used all available blood test parameters and in the other a reduced set, which is usually measured upon patient admittance. Both models produced good results, with a prediction accuracy of 0.88 and 0.86 when considering the list of five most probable diseases, and 0.59 and 0.57 when considering only the most probable disease. The models did not differ significantly from each other, which indicates that a reduced set of parameters contains a relevant fingerprint of a disease, expanding the utility of the model for general practitioners’ use and indicating that there is more information in the blood test results than physicians recognize. In the clinical test we showed that the accuracy of our predictive models was on a par with the ability of hematology specialists. Our study is the first to show that a machine learning predictive model based on blood tests alone can be successfully applied to predict hematologic diseases and could open up unprecedented possibilities in medical diagnosis. |
Tasks | Medical Diagnosis |
Published | 2017-08-01 |
URL | http://arxiv.org/abs/1708.00253v1 |
http://arxiv.org/pdf/1708.00253v1.pdf | |
PWC | https://paperswithcode.com/paper/application-of-machine-learning-for |
Repo | |
Framework | |
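A sketch of the evaluation idea: a classifier over blood-test parameters scored both on the single most probable diagnosis and on a top-5 list. The synthetic data, number of diagnoses, and choice of classifier are assumptions, not the paper's model.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, top_k_accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(600, 30))            # 30 blood-test parameters (synthetic)
y = rng.integers(0, 8, size=600)          # 8 hypothetical hematologic diagnoses

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)

proba = clf.predict_proba(X_te)
print("top-1 accuracy:", accuracy_score(y_te, clf.classes_[proba.argmax(1)]))
print("top-5 accuracy:", top_k_accuracy_score(y_te, proba, k=5, labels=clf.classes_))
```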
Training Triplet Networks with GAN
Title | Training Triplet Networks with GAN |
Authors | Maciej Zieba, Lei Wang |
Abstract | Triplet networks are widely used models that are characterized by good performance in classification and retrieval tasks. In this work we propose to train a triplet network by using it as the discriminator in Generative Adversarial Nets (GANs). We make use of the discriminator’s strong representation learning capability to increase the predictive quality of the model. We evaluated our approach on the Cifar10 and MNIST datasets and observed a significant improvement in classification performance using a simple k-nn method. |
Tasks | Representation Learning |
Published | 2017-04-06 |
URL | http://arxiv.org/abs/1704.02227v1 |
http://arxiv.org/pdf/1704.02227v1.pdf | |
PWC | https://paperswithcode.com/paper/training-triplet-networks-with-gan |
Repo | |
Framework | |
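A hedged sketch of the triplet component only: an embedding network trained with a triplet margin loss. Wiring it up as the GAN discriminator (the paper's contribution) is omitted, and the architecture is an assumption.

```python
import torch
import torch.nn as nn

class EmbeddingNet(nn.Module):
    def __init__(self, dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
            nn.Flatten(), nn.Linear(64, dim),
        )

    def forward(self, x):
        return nn.functional.normalize(self.net(x), dim=1)   # unit-norm embeddings

net = EmbeddingNet()
loss_fn = nn.TripletMarginLoss(margin=0.2)
anchor, positive, negative = (torch.rand(16, 1, 28, 28) for _ in range(3))
loss = loss_fn(net(anchor), net(positive), net(negative))
loss.backward()
print(loss.item())
# In the paper's setting this embedding network would double as the GAN
# discriminator, so adversarial and triplet supervision share the same features,
# and the embeddings are classified downstream with k-nn.
```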
MatchZoo: A Toolkit for Deep Text Matching
Title | MatchZoo: A Toolkit for Deep Text Matching |
Authors | Yixing Fan, Liang Pang, JianPeng Hou, Jiafeng Guo, Yanyan Lan, Xueqi Cheng |
Abstract | In recent years, deep neural models have been widely adopted for text matching tasks, such as question answering and information retrieval, showing improved performance as compared with previous methods. In this paper, we introduce the MatchZoo toolkit that aims to facilitate the designing, comparing and sharing of deep text matching models. Specifically, the toolkit provides a unified data preparation module for different text matching problems, a flexible layer-based model construction process, and a variety of training objectives and evaluation metrics. In addition, the toolkit has implemented two schools of representative deep text matching models, namely representation-focused models and interaction-focused models. Finally, users can easily modify existing models, create and share their own models for text matching in MatchZoo. |
Tasks | Ad-Hoc Information Retrieval, Information Retrieval, Question Answering, Text Matching |
Published | 2017-07-23 |
URL | http://arxiv.org/abs/1707.07270v1 |
http://arxiv.org/pdf/1707.07270v1.pdf | |
PWC | https://paperswithcode.com/paper/matchzoo-a-toolkit-for-deep-text-matching |
Repo | |
Framework | |
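A generic sketch (not the MatchZoo API) contrasting the two model families the toolkit implements: a representation-focused model encodes each text separately and compares the two vectors, while an interaction-focused model scores a word-by-word interaction matrix. Vocabulary size, dimensions, and sequence lengths are illustrative.

```python
import torch
import torch.nn as nn

class RepresentationFocused(nn.Module):
    """Encode query and document separately, then compare the two vectors."""
    def __init__(self, vocab=1000, dim=32):
        super().__init__()
        self.emb = nn.Embedding(vocab, dim)

    def forward(self, q, d):
        q_vec, d_vec = self.emb(q).mean(1), self.emb(d).mean(1)
        return nn.functional.cosine_similarity(q_vec, d_vec)

class InteractionFocused(nn.Module):
    """Build a word-by-word interaction matrix first, then score it."""
    def __init__(self, vocab=1000, dim=32, q_len=20, d_len=20):
        super().__init__()
        self.emb = nn.Embedding(vocab, dim)
        self.score = nn.Linear(q_len * d_len, 1)

    def forward(self, q, d):
        inter = torch.einsum("bqe,bde->bqd", self.emb(q), self.emb(d))
        return self.score(inter.flatten(1)).squeeze(-1)

q = torch.randint(0, 1000, (4, 20))       # toy query token ids
d = torch.randint(0, 1000, (4, 20))       # toy document token ids
print(RepresentationFocused()(q, d).shape, InteractionFocused()(q, d).shape)
```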