Paper Group ANR 599
Analyzing Learned Convnet Features with Dirichlet Process Gaussian Mixture Models
Title | Analyzing Learned Convnet Features with Dirichlet Process Gaussian Mixture Models |
Authors | David Malmgren-Hansen, Allan Aasbjerg Nielsen, Rasmus Engholm |
Abstract | Convolutional Neural Networks (Convnets) have achieved good results in a range of computer vision tasks in recent years. Though given a lot of attention, visualizing the learned representations to interpret Convnets still remains a challenging task. The high dimensionality of internal representations and the high abstractions of deep layers are the main challenges when visualizing Convnet functionality. We present in this paper a technique based on clustering internal Convnet representations with a Dirichlet Process Gaussian Mixture Model, for visualization of learned representations in Convnets. Our method copes with the high dimensionality of a Convnet by clustering representations across all nodes of each layer. We will discuss how this application is useful when considering transfer learning, i.e. transferring a model trained on one dataset to solve a task on a different one. |
Tasks | Transfer Learning |
Published | 2017-02-23 |
URL | http://arxiv.org/abs/1702.07189v1 |
http://arxiv.org/pdf/1702.07189v1.pdf | |
PWC | https://paperswithcode.com/paper/analyzing-learned-convnet-features-with |
Repo | |
Framework | |
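A minimal sketch of the clustering step described above, using scikit-learn's Dirichlet Process variant of `BayesianGaussianMixture` on layer activations; the layer shape, truncation level, and random stand-in data are assumptions, not the paper's setup.

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

# Stand-in for one layer's activations over N images: shape (N, H, W, channels).
activations = np.random.rand(200, 7, 7, 64)

# Flatten spatial positions so every location becomes one sample described
# by its channel responses, i.e. clustering across all nodes of the layer.
samples = activations.reshape(-1, activations.shape[-1])

# The Dirichlet Process prior lets the data decide how many of the
# n_components truncation slots carry non-trivial weight.
dpgmm = BayesianGaussianMixture(
    n_components=20,
    weight_concentration_prior_type="dirichlet_process",
    covariance_type="diag",
    max_iter=200,
    random_state=0,
)
labels = dpgmm.fit_predict(samples)
print("clusters with non-trivial weight:", np.sum(dpgmm.weights_ > 1e-2))
```

In practice the activations would come from a forward pass of the trained Convnet over a held-out image set, one mixture fit per layer.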
BKTreebank: Building a Vietnamese Dependency Treebank
Title | BKTreebank: Building a Vietnamese Dependency Treebank |
Authors | Kiem-Hieu Nguyen |
Abstract | A dependency treebank is an important resource for any language. In this paper, we present our work on building BKTreebank, a dependency treebank for Vietnamese. Important points on designing the POS tagset, dependency relations, and annotation guidelines are discussed. We describe experiments on POS tagging and dependency parsing on the treebank. Experimental results show that the treebank is a useful resource for Vietnamese language processing. |
Tasks | Dependency Parsing |
Published | 2017-10-16 |
URL | http://arxiv.org/abs/1710.05519v2 |
http://arxiv.org/pdf/1710.05519v2.pdf | |
PWC | https://paperswithcode.com/paper/bktreebank-building-a-vietnamese-dependency |
Repo | |
Framework | |
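Dependency treebanks are commonly distributed in CoNLL-U format; the sketch below reads such a file and tallies dependency relations. The file name and the assumption that BKTreebank follows the standard CoNLL-U column layout are hypothetical.

```python
from collections import Counter

def read_conllu(path):
    """Yield sentences as lists of (form, upos, head, deprel) tuples."""
    sentence = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:                 # blank line ends a sentence
                if sentence:
                    yield sentence
                    sentence = []
                continue
            if line.startswith("#"):     # sentence-level comments
                continue
            cols = line.split("\t")
            if "-" in cols[0] or "." in cols[0]:   # skip multiword/empty tokens
                continue
            sentence.append((cols[1], cols[3], int(cols[6]), cols[7]))
    if sentence:
        yield sentence

relations = Counter()
for sent in read_conllu("bktreebank-sample.conllu"):   # hypothetical file name
    relations.update(deprel for _, _, _, deprel in sent)
print(relations.most_common(10))
```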
Semantic Instance Labeling Leveraging Hierarchical Segmentation
Title | Semantic Instance Labeling Leveraging Hierarchical Segmentation |
Authors | Steven Hickson, Irfan Essa, Henrik Christensen |
Abstract | Most of the approaches for indoor RGBD semantic labeling focus on using pixels or superpixels to train a classifier. In this paper, we implement a higher level segmentation using a hierarchy of superpixels to obtain a better segmentation for training our classifier. By focusing on meaningful segments that conform more directly to objects, regardless of size, we train a random forest of decision trees as a classifier using simple features such as the 3D size, LAB color histogram, width, height, and shape as specified by a histogram of surface normals. We test our method on the NYU V2 depth dataset, a challenging dataset of cluttered indoor environments. Our experiments using the NYU V2 depth dataset show that our method achieves state-of-the-art results on both a general semantic labeling introduced by the dataset (floor, structure, furniture, and objects) and a more object-specific semantic labeling. We show that training a classifier on a segmentation from a hierarchy of superpixels yields better results than training directly on superpixels, patches, or pixels as in previous work. |
Tasks | |
Published | 2017-08-02 |
URL | http://arxiv.org/abs/1708.00946v1 |
http://arxiv.org/pdf/1708.00946v1.pdf | |
PWC | https://paperswithcode.com/paper/semantic-instance-labeling-leveraging |
Repo | |
Framework | |
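An illustrative sketch (not the authors' code) of the classification stage: a random forest trained on per-segment feature vectors such as 3D size, LAB color histogram, width, height, and a surface-normal histogram. Feature dimensions and data are placeholders.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_segments = 1000
# Hypothetical per-segment feature vector:
#   1 (3D size) + 24 (LAB histogram) + 2 (width, height) + 16 (normal histogram) = 43
X = rng.random((n_segments, 43))
y = rng.integers(0, 4, size=n_segments)   # floor, structure, furniture, objects

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)
print("segment-level accuracy:", clf.score(X_test, y_test))
```

The point of the paper is that these feature vectors are computed per hierarchical segment rather than per pixel, superpixel, or patch; the classifier itself is standard.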
Symbol detection in online handwritten graphics using Faster R-CNN
Title | Symbol detection in online handwritten graphics using Faster R-CNN |
Authors | Frank D. Julca-Aguilar, Nina S. T. Hirata |
Abstract | Symbol detection techniques in online handwritten graphics (e.g. diagrams and mathematical expressions) consist of methods specifically designed for a single graphic type. In this work, we evaluate the Faster R-CNN object detection algorithm as a general method for detecting symbols in handwritten graphics. We evaluate different configurations of the Faster R-CNN method, and point out issues related to the handwritten nature of the data. Considering the online recognition context, we evaluate the efficiency and accuracy trade-offs of using deep neural networks of different complexities as feature extractors. We evaluate the method on publicly available flowchart and mathematical expression (CROHME-2016) datasets. Results show that Faster R-CNN can be effectively used on both datasets, opening the possibility of developing general methods for symbol detection and, furthermore, general graphics understanding methods built on top of the algorithm. |
Tasks | Object Detection |
Published | 2017-12-13 |
URL | http://arxiv.org/abs/1712.04833v1 |
http://arxiv.org/pdf/1712.04833v1.pdf | |
PWC | https://paperswithcode.com/paper/symbol-detection-in-online-handwritten |
Repo | |
Framework | |
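A hedged sketch of adapting an off-the-shelf Faster R-CNN (torchvision detection API) to a symbol label set; the number of symbol classes and the use of rendered stroke images are assumptions, not details from the paper.

```python
import torch
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

num_classes = 7 + 1   # hypothetical number of symbol classes + background

# Start from a COCO-pretrained detector and swap in a new box predictor head.
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
in_features = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)

model.eval()
# A rendered image of the online strokes would be passed in as a 3xHxW tensor.
dummy_image = [torch.rand(3, 480, 640)]
with torch.no_grad():
    detections = model(dummy_image)
print(detections[0]["boxes"].shape, detections[0]["labels"].shape)
```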
Automatic Music Highlight Extraction using Convolutional Recurrent Attention Networks
Title | Automatic Music Highlight Extraction using Convolutional Recurrent Attention Networks |
Authors | Jung-Woo Ha, Adrian Kim, Chanju Kim, Jangyeon Park, Sunghun Kim |
Abstract | Music highlights are valuable content for music services. Most existing methods have focused on low-level signal features. We propose a method for extracting highlights using high-level features from convolutional recurrent attention networks (CRAN). CRAN utilizes convolutional and recurrent layers for sequential learning with an attention mechanism. The attention allows CRAN to capture significant snippets for distinguishing between genres, and is thus used as a high-level feature. CRAN was evaluated on over 32,000 popular tracks in Korea for two months. Experimental results show our method outperforms three baseline methods in quantitative and qualitative evaluations. We also analyze the effects of attention and sequence information on performance. |
Tasks | |
Published | 2017-12-16 |
URL | http://arxiv.org/abs/1712.05901v1 |
http://arxiv.org/pdf/1712.05901v1.pdf | |
PWC | https://paperswithcode.com/paper/automatic-music-highlight-extraction-using |
Repo | |
Framework | |
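A minimal PyTorch sketch of a convolutional recurrent attention network over spectrogram input, where the attention weights over time can be read off as highlight scores; layer sizes and the input format are illustrative assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn

class CRANSketch(nn.Module):
    def __init__(self, hidden=64, n_genres=10):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d((2, 1)),                        # pool over frequency only
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((1, None)),             # collapse the frequency axis
        )
        self.gru = nn.GRU(32, hidden, batch_first=True)
        self.attn = nn.Linear(hidden, 1)                 # scalar score per time step
        self.head = nn.Linear(hidden, n_genres)

    def forward(self, spec):                             # spec: (batch, 1, mels, time)
        h = self.conv(spec).squeeze(2).transpose(1, 2)   # (batch, time, 32)
        h, _ = self.gru(h)                               # (batch, time, hidden)
        w = torch.softmax(self.attn(h).squeeze(-1), -1)  # attention over time
        context = (w.unsqueeze(-1) * h).sum(1)           # attention-weighted summary
        return self.head(context), w                     # genre logits, per-step weights

logits, weights = CRANSketch()(torch.rand(2, 1, 128, 400))
print(logits.shape, weights.shape)   # the highlight is read from the largest weights
```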
A Comparative Study of Word Embeddings for Reading Comprehension
Title | A Comparative Study of Word Embeddings for Reading Comprehension |
Authors | Bhuwan Dhingra, Hanxiao Liu, Ruslan Salakhutdinov, William W. Cohen |
Abstract | The focus of past machine learning research for Reading Comprehension tasks has been primarily on the design of novel deep learning architectures. Here we show that seemingly minor choices made on (1) the use of pre-trained word embeddings, and (2) the representation of out-of-vocabulary tokens at test time, can turn out to have a larger impact than architectural choices on the final performance. We systematically explore several options for these choices, and provide recommendations to researchers working in this area. |
Tasks | Reading Comprehension, Word Embeddings |
Published | 2017-03-02 |
URL | http://arxiv.org/abs/1703.00993v1 |
http://arxiv.org/pdf/1703.00993v1.pdf | |
PWC | https://paperswithcode.com/paper/a-comparative-study-of-word-embeddings-for |
Repo | |
Framework | |
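A small sketch of the two choices the paper studies: building an embedding matrix from pre-trained vectors and deciding how to represent out-of-vocabulary (OOV) tokens at test time. The toy vocabulary and vectors are placeholders for GloVe-style embeddings.

```python
import numpy as np

dim = 50
pretrained = {"the": np.ones(dim), "cat": np.full(dim, 0.5)}   # stand-in for GloVe
vocab = ["<pad>", "<unk>", "the", "cat", "flumph"]             # "flumph" is OOV

rng = np.random.default_rng(0)
emb = np.zeros((len(vocab), dim), dtype=np.float32)
for i, token in enumerate(vocab):
    if token in pretrained:
        emb[i] = pretrained[token]        # choice (1): reuse the pre-trained vector
    elif token != "<pad>":
        # Choice (2): how to initialize OOV tokens; small random vectors are one
        # option, all-zeros another, and the paper finds this kind of choice can
        # matter more than the architecture itself.
        emb[i] = rng.normal(scale=0.1, size=dim)

def lookup(token):
    return emb[vocab.index(token) if token in vocab else vocab.index("<unk>")]

print(lookup("flumph")[:3], lookup("never-seen")[:3])
```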
RoboJam: A Musical Mixture Density Network for Collaborative Touchscreen Interaction
Title | RoboJam: A Musical Mixture Density Network for Collaborative Touchscreen Interaction |
Authors | Charles P. Martin, Jim Torresen |
Abstract | RoboJam is a machine-learning system for generating music that assists users of a touchscreen music app by performing responses to their short improvisations. This system uses a recurrent artificial neural network to generate sequences of touchscreen interactions and absolute timings, rather than high-level musical notes. To accomplish this, RoboJam’s network uses a mixture density layer to predict appropriate touch interaction locations in space and time. In this paper, we describe the design and implementation of RoboJam’s network and how it has been integrated into a touchscreen music app. A preliminary evaluation analyses the system in terms of training, musical generation and user interaction. |
Tasks | |
Published | 2017-11-29 |
URL | http://arxiv.org/abs/1711.10746v1 |
http://arxiv.org/pdf/1711.10746v1.pdf | |
PWC | https://paperswithcode.com/paper/robojam-a-musical-mixture-density-network-for |
Repo | |
Framework | |
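A hedged sketch of a recurrent network with a mixture density output over touch events (x, y, time delta), loosely in the spirit of RoboJam; the layer sizes, number of mixture components, and diagonal-covariance Gaussians are assumptions rather than the paper's design.

```python
import math
import torch
import torch.nn as nn

K, D = 5, 3                                  # mixture components; dims = (x, y, dt)

class TouchMDN(nn.Module):
    def __init__(self, hidden=64):
        super().__init__()
        self.rnn = nn.LSTM(D, hidden, batch_first=True)
        self.params = nn.Linear(hidden, K + 2 * K * D)   # logits, means, log-sigmas

    def forward(self, seq):                              # seq: (batch, steps, 3)
        h, _ = self.rnn(seq)
        p = self.params(h[:, -1])                        # parameters of the next event
        logits, mu, log_sigma = p.split([K, K * D, K * D], dim=-1)
        return logits, mu.view(-1, K, D), log_sigma.view(-1, K, D)

def mdn_nll(logits, mu, log_sigma, target):              # target: (batch, 3)
    t = target.unsqueeze(1)                               # broadcast against components
    comp = (-0.5 * ((t - mu) / log_sigma.exp()) ** 2
            - log_sigma - 0.5 * math.log(2 * math.pi)).sum(-1)
    return -torch.logsumexp(torch.log_softmax(logits, -1) + comp, dim=-1).mean()

model = TouchMDN()
history, next_touch = torch.rand(8, 20, 3), torch.rand(8, 3)
loss = mdn_nll(*model(history), next_touch)
loss.backward()
print(loss.item())
```

At generation time one would sample a component from the logits and then a touch event from that Gaussian, feeding it back in to produce a response sequence.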
Learning Continuous User Representations through Hybrid Filtering with doc2vec
Title | Learning Continuous User Representations through Hybrid Filtering with doc2vec |
Authors | Simon Stiebellehner, Jun Wang, Shuai Yuan |
Abstract | Players in the online ad ecosystem are struggling to acquire the user data required for precise targeting. Audience look-alike modeling has the potential to alleviate this issue, but models’ performance strongly depends on quantity and quality of available data. In order to maximize the predictive performance of our look-alike modeling algorithms, we propose two novel hybrid filtering techniques that utilize the recent neural probabilistic language model algorithm doc2vec. We apply these methods to data from a large mobile ad exchange and additional app metadata acquired from the Apple App store and Google Play store. First, we model mobile app users through their app usage histories and app descriptions (user2vec). Second, we introduce context awareness to that model by incorporating additional user and app-related metadata in model training (context2vec). Our findings are threefold: (1) the quality of recommendations provided by user2vec is notably higher than current state-of-the-art techniques. (2) User representations generated through hybrid filtering using doc2vec prove to be highly valuable features in supervised machine learning models for look-alike modeling. This represents the first application of hybrid filtering user models using neural probabilistic language models, specifically doc2vec, in look-alike modeling. (3) Incorporating context metadata in the doc2vec model training process to introduce context awareness has positive effects on performance and is superior to directly including the data as features in the downstream supervised models. |
Tasks | Language Modelling |
Published | 2017-12-31 |
URL | http://arxiv.org/abs/1801.00215v1 |
http://arxiv.org/pdf/1801.00215v1.pdf | |
PWC | https://paperswithcode.com/paper/learning-continuous-user-representations |
Repo | |
Framework | |
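An illustrative user2vec-style sketch with gensim's Doc2Vec (4.x API): each "document" is a user's app-usage and app-description token stream, and the learned document vector is the continuous user representation. The toy data and hyperparameters are assumptions, not the paper's setup.

```python
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

user_histories = {
    "user_1": "chess puzzle strategy board offline".split(),
    "user_2": "fitness running tracker calories heart".split(),
    "user_3": "chess tactics openings puzzle trainer".split(),
}
corpus = [TaggedDocument(words=tokens, tags=[user])
          for user, tokens in user_histories.items()]

model = Doc2Vec(corpus, vector_size=32, min_count=1, epochs=50, seed=0)

# The learned document vectors serve as continuous user representations,
# e.g. as features for a downstream look-alike classifier.
print(model.dv.most_similar("user_1", topn=2))
```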
Sparse-then-Dense Alignment based 3D Map Reconstruction Method for Endoscopic Capsule Robots
Title | Sparse-then-Dense Alignment based 3D Map Reconstruction Method for Endoscopic Capsule Robots |
Authors | Mehmet Turan, Yusuf Yigit Pilavci, Ipek Ganiyusufoglu, Helder Araujo, Ender Konukoglu, Metin Sitti |
Abstract | Since the development of capsule endoscopy technology, substantial progress has been made in converting passive capsule endoscopes to robotic active capsule endoscopes that can be controlled by the doctor. However, robotic capsule endoscopy still has some challenges. In particular, the use of such devices to generate a precise and globally consistent three-dimensional (3D) map of the entire inner organ remains an unsolved problem. Such global 3D maps of inner organs would help doctors to detect the location and size of diseased areas more accurately, precisely, and intuitively, thus permitting more accurate and intuitive diagnoses. The proposed 3D reconstruction system is built in a modular fashion including preprocessing, frame stitching, and shading-based 3D reconstruction modules. We propose an efficient scheme to automatically select the key frames out of the huge quantity of raw endoscopic images. Together with a bundle fusion approach that aligns all the selected key frames jointly in a globally consistent way, a significant improvement in mosaic and 3D map accuracy was reached. To the best of our knowledge, this framework is the first complete pipeline for endoscopic capsule robot based 3D map reconstruction containing all of the necessary steps for a reliable and accurate endoscopic 3D map. For the qualitative evaluations, a real pig stomach is employed. Moreover, for the first time in the literature, a detailed and comprehensive quantitative analysis of each proposed pipeline module is performed using a non-rigid esophagus gastroduodenoscopy simulator, four different endoscopic cameras, a magnetically activated soft capsule robot (MASCE), a sub-millimeter precise optical motion tracker, and a fine-scale 3D optical scanner. |
Tasks | 3D Reconstruction |
Published | 2017-08-29 |
URL | http://arxiv.org/abs/1708.09740v1 |
http://arxiv.org/pdf/1708.09740v1.pdf | |
PWC | https://paperswithcode.com/paper/sparse-then-dense-alignment-based-3d-map |
Repo | |
Framework | |
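A rough sketch of one pipeline stage only, key-frame selection: keep a frame when it differs enough from the last selected key frame. The difference measure and threshold are assumptions, not the paper's actual selection criterion.

```python
import numpy as np

def select_keyframes(frames, threshold=0.15):
    """frames: iterable of grayscale images in [0, 1]; returns key-frame indices."""
    keyframes, last = [], None
    for i, frame in enumerate(frames):
        if last is None or np.mean(np.abs(frame - last)) > threshold:
            keyframes.append(i)
            last = frame
    return keyframes

# Synthetic example: a slowly brightening, slightly noisy image sequence.
rng = np.random.default_rng(0)
base = rng.random((64, 64)) * 0.5
frames = [np.clip(base + 0.02 * t + 0.01 * rng.random((64, 64)), 0, 1)
          for t in range(40)]
print(select_keyframes(frames))
```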
Where to Play: Retrieval of Video Segments using Natural-Language Queries
Title | Where to Play: Retrieval of Video Segments using Natural-Language Queries |
Authors | Sangkuk Lee, Daesik Kim, Myunggi Lee, Jihye Hwang, Nojun Kwak |
Abstract | In this paper, we propose a new approach for the retrieval of video segments using natural language queries. Unlike most previous approaches, such as concept-based methods or rule-based structured models, the proposed method uses an image captioning model to construct sentential queries for visual information. In detail, our approach exploits multiple captions generated from the visual features of each image with ‘Densecap’. Then, the similarities between captions of adjacent images are calculated and used to track semantically similar captions over multiple frames. Besides introducing this novel idea of ‘tracking by captioning’, the proposed method is one of the first approaches that uses a language generation model learned by neural networks to construct semantic queries describing the relations and properties of visual information. To evaluate the effectiveness of our approach, we have created a new evaluation dataset, which contains about 348 segments of scenes in 20 movie trailers. Through quantitative and qualitative evaluation, we show that our method is effective for the retrieval of video segments using natural language queries. |
Tasks | Image Captioning, Text Generation |
Published | 2017-07-02 |
URL | http://arxiv.org/abs/1707.00251v1 |
http://arxiv.org/pdf/1707.00251v1.pdf | |
PWC | https://paperswithcode.com/paper/where-to-play-retrieval-of-video-segments |
Repo | |
Framework | |
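A sketch of the ‘tracking by captioning’ idea: adjacent frames are linked into a segment while their generated captions stay similar. Bag-of-words cosine similarity and the threshold below stand in for whatever similarity measure the paper actually uses.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

frame_captions = [
    "a man rides a horse in a field",
    "a man riding a horse near trees",
    "a car drives down a city street",
    "a car driving on a busy street",
]

vectors = CountVectorizer().fit_transform(frame_captions)
segments, start = [], 0
for i in range(1, len(frame_captions)):
    similarity = cosine_similarity(vectors[i - 1], vectors[i])[0, 0]
    if similarity < 0.3:          # assumed threshold: captions diverge, new segment
        segments.append((start, i - 1))
        start = i
segments.append((start, len(frame_captions) - 1))
print(segments)                   # expected: [(0, 1), (2, 3)]
```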
On the Complexity of Robust Stable Marriage
Title | On the Complexity of Robust Stable Marriage |
Authors | Begum Genc, Mohamed Siala, Gilles Simonin, Barry O’Sullivan |
Abstract | Robust Stable Marriage (RSM) is a variant of the classical Stable Marriage problem, where the robustness of a given stable matching is measured by the number of modifications required for repairing it in case an unforeseen event occurs. We focus on the complexity of finding an (a,b)-supermatch. An (a,b)-supermatch is defined as a stable matching in which if any ‘a’ (non-fixed) men/women break up it is possible to find another stable matching by changing the partners of those ‘a’ men/women and also the partners of at most ‘b’ other couples. In order to show that deciding whether there exists an (a,b)-supermatch is NP-Complete, we first introduce a SAT formulation that is NP-Complete by Schaefer’s Dichotomy Theorem. Then, we show the equivalence between the SAT formulation and finding a (1,1)-supermatch on a specific family of instances. |
Tasks | |
Published | 2017-09-18 |
URL | http://arxiv.org/abs/1709.06172v2 |
http://arxiv.org/pdf/1709.06172v2.pdf | |
PWC | https://paperswithcode.com/paper/on-the-complexity-of-robust-stable-marriage |
Repo | |
Framework | |
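For background only, a sketch of the Gale-Shapley procedure that produces the stable matchings whose (a,b)-supermatch robustness the paper studies; this is not the paper's SAT construction or supermatch algorithm.

```python
def gale_shapley(men_prefs, women_prefs):
    """men_prefs/women_prefs: dict name -> preference list (most preferred first)."""
    free = list(men_prefs)
    next_choice = {m: 0 for m in men_prefs}
    engaged = {}                                   # woman -> man
    rank = {w: {m: i for i, m in enumerate(p)} for w, p in women_prefs.items()}
    while free:
        m = free.pop(0)
        w = men_prefs[m][next_choice[m]]           # m proposes to his next choice
        next_choice[m] += 1
        if w not in engaged:
            engaged[w] = m
        elif rank[w][m] < rank[w][engaged[w]]:     # w prefers m over current partner
            free.append(engaged[w])
            engaged[w] = m
        else:
            free.append(m)                         # w rejects m
    return {m: w for w, m in engaged.items()}

men = {"m1": ["w1", "w2"], "m2": ["w1", "w2"]}
women = {"w1": ["m2", "m1"], "w2": ["m1", "m2"]}
print(gale_shapley(men, women))                    # {'m1': 'w2', 'm2': 'w1'}
```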
Segmentation of 3D High-frequency Ultrasound Images of Human Lymph Nodes Using Graph Cut with Energy Functional Adapted to Local Intensity Distribution
Title | Segmentation of 3D High-frequency Ultrasound Images of Human Lymph Nodes Using Graph Cut with Energy Functional Adapted to Local Intensity Distribution |
Authors | Jen-wei Kuo, Jonathan Mamou, Yao Wang, Emi Saegusa-Beecroft, Junji Machi, Ernest J. Feleppa |
Abstract | Previous studies by our group have shown that three-dimensional high-frequency quantitative ultrasound methods have the potential to differentiate metastatic lymph nodes from cancer-free lymph nodes dissected from human cancer patients. To successfully perform these methods inside the lymph node parenchyma, an automatic segmentation method is highly desired to exclude the surrounding thin layer of fat from quantitative ultrasound processing and accurately correct for ultrasound attenuation. In high-frequency ultrasound images of lymph nodes, the intensity distribution of lymph node parenchyma and fat varies spatially because of acoustic attenuation and focusing effects. Thus, the intensity contrast between two object regions (e.g., lymph node parenchyma and fat) is also spatially varying. In our previous work, nested graph cut demonstrated its ability to simultaneously segment lymph node parenchyma, fat, and the outer phosphate-buffered saline bath even when some boundaries are lost because of acoustic attenuation and focusing effects. This paper describes a novel approach called graph cut with locally adaptive energy to further deal with spatially varying distributions of lymph node parenchyma and fat caused by inhomogeneous acoustic attenuation. The proposed method achieved Dice similarity coefficients of 0.937 ± 0.035 when compared to expert manual segmentation on a representative dataset consisting of 115 three-dimensional lymph node images obtained from colorectal cancer patients. |
Tasks | |
Published | 2017-05-19 |
URL | http://arxiv.org/abs/1705.07015v1 |
http://arxiv.org/pdf/1705.07015v1.pdf | |
PWC | https://paperswithcode.com/paper/segmentation-of-3d-high-frequency-ultrasound |
Repo | |
Framework | |
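A rough binary graph-cut sketch (PyMaxflow) whose data terms are derived from a local mean intensity, echoing the locally adaptive energy idea; the window size, smoothness weight, and toy 2D image are assumptions, not the paper's energy functional or 3D setting.

```python
import numpy as np
import maxflow                            # pip install PyMaxflow
from scipy.ndimage import uniform_filter

rng = np.random.default_rng(0)
img = rng.normal(0.3, 0.05, (64, 64))
img[20:44, 20:44] += 0.4                  # bright toy "object" on a darker background

# Locally adaptive data terms: compare each pixel to its local mean intensity
# instead of a single global model, so spatial intensity variation is absorbed.
local_mean = uniform_filter(img, size=31)
object_cost = np.maximum(local_mean - img, 0)       # penalty for an "object" label
background_cost = np.maximum(img - local_mean, 0)   # penalty for a "background" label

g = maxflow.Graph[float]()
nodes = g.add_grid_nodes(img.shape)
g.add_grid_edges(nodes, 0.1)              # smoothness term between grid neighbors
g.add_grid_tedges(nodes, object_cost, background_cost)
g.maxflow()
segmentation = g.get_grid_segments(nodes) # boolean label per pixel from the min cut
print("fraction labelled as one class:", segmentation.mean())
```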
Application of machine learning for hematological diagnosis
Title | Application of machine learning for hematological diagnosis |
Authors | Gregor Gunčar, Matjaž Kukar, Mateja Notar, Miran Brvar, Peter Černelč, Manca Notar, Marko Notar |
Abstract | Quick and accurate medical diagnosis is crucial for the successful treatment of a disease. Using machine learning algorithms, we have built two models to predict a hematologic disease, based on laboratory blood test results. In one predictive model, we used all available blood test parameters and in the other a reduced set, which is usually measured upon patient admittance. Both models produced good results, with a prediction accuracy of 0.88 and 0.86 when considering the list of five most probable diseases, and 0.59 and 0.57 when considering only the most probable disease. The models did not differ significantly from each other, which indicates that a reduced set of parameters contains a relevant fingerprint of a disease, expanding the utility of the model for general practitioners’ use and indicating that there is more information in the blood test results than physicians recognize. In the clinical test we showed that the accuracy of our predictive models was on a par with the ability of hematology specialists. Our study is the first to show that a machine learning predictive model based on blood tests alone can be successfully applied to predict hematologic diseases and could open up unprecedented possibilities in medical diagnosis. |
Tasks | Medical Diagnosis |
Published | 2017-08-01 |
URL | http://arxiv.org/abs/1708.00253v1 |
http://arxiv.org/pdf/1708.00253v1.pdf | |
PWC | https://paperswithcode.com/paper/application-of-machine-learning-for |
Repo | |
Framework | |
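A sketch of the evaluation idea: a classifier over blood-test parameters scored both on the single most probable diagnosis and on a top-5 list. The synthetic data, number of diagnoses, and choice of classifier are assumptions, not the paper's model.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, top_k_accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(600, 30))            # 30 blood-test parameters (synthetic)
y = rng.integers(0, 8, size=600)          # 8 hypothetical hematologic diagnoses

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)

proba = clf.predict_proba(X_te)
print("top-1 accuracy:", accuracy_score(y_te, clf.classes_[proba.argmax(1)]))
print("top-5 accuracy:", top_k_accuracy_score(y_te, proba, k=5, labels=clf.classes_))
```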
Training Triplet Networks with GAN
Title | Training Triplet Networks with GAN |
Authors | Maciej Zieba, Lei Wang |
Abstract | Triplet networks are widely used models that are characterized by good performance in classification and retrieval tasks. In this work we propose to train a triplet network by using it as the discriminator in Generative Adversarial Nets (GANs). We make use of the discriminator’s strong representation learning capability to increase the predictive quality of the model. We evaluated our approach on the Cifar10 and MNIST datasets and observed a significant improvement in classification performance using a simple k-nn method. |
Tasks | Representation Learning |
Published | 2017-04-06 |
URL | http://arxiv.org/abs/1704.02227v1 |
http://arxiv.org/pdf/1704.02227v1.pdf | |
PWC | https://paperswithcode.com/paper/training-triplet-networks-with-gan |
Repo | |
Framework | |
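A hedged sketch of the triplet component only: an embedding network trained with a triplet margin loss. Wiring it up as the GAN discriminator (the paper's contribution) is omitted, and the architecture is an assumption.

```python
import torch
import torch.nn as nn

class EmbeddingNet(nn.Module):
    def __init__(self, dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
            nn.Flatten(), nn.Linear(64, dim),
        )

    def forward(self, x):
        return nn.functional.normalize(self.net(x), dim=1)   # unit-norm embeddings

net = EmbeddingNet()
loss_fn = nn.TripletMarginLoss(margin=0.2)
anchor, positive, negative = (torch.rand(16, 1, 28, 28) for _ in range(3))
loss = loss_fn(net(anchor), net(positive), net(negative))
loss.backward()
print(loss.item())
# In the paper's setting this embedding network would double as the GAN
# discriminator, so adversarial and triplet supervision share the same features,
# and the embeddings are classified downstream with k-nn.
```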
MatchZoo: A Toolkit for Deep Text Matching
Title | MatchZoo: A Toolkit for Deep Text Matching |
Authors | Yixing Fan, Liang Pang, JianPeng Hou, Jiafeng Guo, Yanyan Lan, Xueqi Cheng |
Abstract | In recent years, deep neural models have been widely adopted for text matching tasks, such as question answering and information retrieval, showing improved performance as compared with previous methods. In this paper, we introduce the MatchZoo toolkit that aims to facilitate the designing, comparing and sharing of deep text matching models. Specifically, the toolkit provides a unified data preparation module for different text matching problems, a flexible layer-based model construction process, and a variety of training objectives and evaluation metrics. In addition, the toolkit has implemented two schools of representative deep text matching models, namely representation-focused models and interaction-focused models. Finally, users can easily modify existing models, create and share their own models for text matching in MatchZoo. |
Tasks | Ad-Hoc Information Retrieval, Information Retrieval, Question Answering, Text Matching |
Published | 2017-07-23 |
URL | http://arxiv.org/abs/1707.07270v1 |
http://arxiv.org/pdf/1707.07270v1.pdf | |
PWC | https://paperswithcode.com/paper/matchzoo-a-toolkit-for-deep-text-matching |
Repo | |
Framework | |
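A generic sketch (not the MatchZoo API) contrasting the two model families the toolkit implements: a representation-focused model encodes each text separately and compares the two vectors, while an interaction-focused model scores a word-by-word interaction matrix. Vocabulary size, dimensions, and sequence lengths are illustrative.

```python
import torch
import torch.nn as nn

class RepresentationFocused(nn.Module):
    """Encode query and document separately, then compare the two vectors."""
    def __init__(self, vocab=1000, dim=32):
        super().__init__()
        self.emb = nn.Embedding(vocab, dim)

    def forward(self, q, d):
        q_vec, d_vec = self.emb(q).mean(1), self.emb(d).mean(1)
        return nn.functional.cosine_similarity(q_vec, d_vec)

class InteractionFocused(nn.Module):
    """Build a word-by-word interaction matrix first, then score it."""
    def __init__(self, vocab=1000, dim=32, q_len=20, d_len=20):
        super().__init__()
        self.emb = nn.Embedding(vocab, dim)
        self.score = nn.Linear(q_len * d_len, 1)

    def forward(self, q, d):
        inter = torch.einsum("bqe,bde->bqd", self.emb(q), self.emb(d))
        return self.score(inter.flatten(1)).squeeze(-1)

q = torch.randint(0, 1000, (4, 20))       # toy query token ids
d = torch.randint(0, 1000, (4, 20))       # toy document token ids
print(RepresentationFocused()(q, d).shape, InteractionFocused()(q, d).shape)
```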