July 28, 2019


Paper Group ANR 169


Fixing the Infix: Unsupervised Discovery of Root-and-Pattern Morphology


Title	Fixing the Infix: Unsupervised Discovery of Root-and-Pattern Morphology
Authors	Tarek Sakakini, Suma Bhat, Pramod Viswanath
Abstract	We present an unsupervised and language-agnostic method for learning root-and-pattern morphology in Semitic languages. This form of morphology, abundant in Semitic languages, has not been handled by prior unsupervised approaches. We harness the syntactico-semantic information in distributed word representations to solve the long-standing problem of root-and-pattern discovery in Semitic languages. Moreover, we construct an unsupervised root extractor based on the learned rules. We demonstrate the validity of the learned rules across Arabic, Hebrew, and Amharic, and show that our root extractor compares favorably with ISRI, a widely used, carefully engineered root extractor.
Tasks
Published	2017-02-07
URL	http://arxiv.org/abs/1702.02211v2
PDF	http://arxiv.org/pdf/1702.02211v2.pdf
PWC	https://paperswithcode.com/paper/fixing-the-infix-unsupervised-discovery-of
Repo
Framework
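
Root-and-pattern morphology interleaves a consonantal root with a template. The toy sketch below is not the authors' unsupervised method; it only illustrates what a pattern does once it is known, and how a root falls out when a word matches a pattern. It assumes Latin transliteration and a hypothetical template notation in which `C` marks a root-consonant slot (e.g. the Arabic root k-t-b with the template `maCCuuC`).

```python
def apply_pattern(root: str, pattern: str, slot: str = "C") -> str:
    """Interdigitate root consonants into the slot positions of a template."""
    if pattern.count(slot) != len(root):
        raise ValueError("pattern slots must match the number of root consonants")
    consonants = iter(root)
    return "".join(next(consonants) if ch == slot else ch for ch in pattern)


def extract_root(word: str, pattern: str, slot: str = "C"):
    """Return the root if `word` instantiates `pattern`, otherwise None."""
    if len(word) != len(pattern):
        return None
    root = []
    for w, p in zip(word, pattern):
        if p == slot:
            root.append(w)   # slot position: keep the root consonant
        elif w != p:
            return None      # a fixed template letter does not match
    return "".join(root)


print(apply_pattern("ktb", "CaCaCa"))       # kataba
print(extract_root("maktuub", "maCCuuC"))   # ktb
```

A rule-based root extractor in this spirit could try each known pattern against a word and keep the first match; how the paper ranks competing rules is not described here.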

Training and Testing Object Detectors with Virtual Images


Title	Training and Testing Object Detectors with Virtual Images
Authors	Yonglin Tian, Xuan Li, Kunfeng Wang, Fei-Yue Wang
Abstract	In the area of computer vision, deep learning has produced a variety of state-of-the-art models that rely on massive labeled data. However, collecting and annotating real-world images demands considerable labor and money, and offers little control when building datasets with specific characteristics, such as small object areas and high occlusion levels. Under the framework of Parallel Vision, this paper presents a purposeful way to design artificial scenes and automatically generate virtual images with precise annotations. A virtual dataset named ParallelEye is built, which can be used for several computer vision tasks. Then, by training the DPM (Deformable Parts Model) and Faster R-CNN detectors, we show that model performance can be significantly improved by combining ParallelEye with publicly available real-world datasets during the training phase. In addition, we investigate the potential of testing the trained models from specific aspects using intentionally designed virtual datasets, in order to discover their flaws. From the experimental results, we conclude that our virtual dataset is viable for training and testing object detectors.
Tasks
Published	2017-12-22
URL	http://arxiv.org/abs/1712.08470v1
PDF	http://arxiv.org/pdf/1712.08470v1.pdf
PWC	https://paperswithcode.com/paper/training-and-testing-object-detectors-with
Repo
Framework

Learning Word Embeddings from Speech


Title	Learning Word Embeddings from Speech
Authors	Yu-An Chung, James Glass
Abstract	In this paper, we propose a novel deep neural network architecture, Sequence-to-Sequence Audio2Vec, for unsupervised learning of fixed-length vector representations of audio segments excised from a speech corpus, where the vectors contain semantic information pertaining to the segments, and are close to other vectors in the embedding space if their corresponding segments are semantically similar. The design of the proposed model is based on the RNN Encoder-Decoder framework, and borrows the methodology of continuous skip-grams for training. The learned vector representations are evaluated on 13 widely used word similarity benchmarks and achieve results competitive with those of GloVe. The biggest advantage of the proposed model is its ability to extract semantic information from audio segments taken directly from raw speech, without relying on other modalities such as text or images, which are challenging and expensive to collect and annotate.
Tasks	Learning Word Embeddings, Word Embeddings
Published	2017-11-05
URL	http://arxiv.org/abs/1711.01515v1
PDF	http://arxiv.org/pdf/1711.01515v1.pdf
PWC	https://paperswithcode.com/paper/learning-word-embeddings-from-speech
Repo
Framework
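
The abstract says Audio2Vec borrows continuous skip-gram training. The sketch below only shows how skip-gram (center, context) training pairs are formed over a sequence; each audio segment is stood in for by a string label, which is an assumption for readability, since the actual model consumes acoustic features through an RNN encoder-decoder.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def skipgram_pairs(segments: Sequence[T], window: int = 2) -> List[Tuple[T, T]]:
    """Pair each segment with every neighbour within `window` positions."""
    pairs = []
    for i, center in enumerate(segments):
        lo = max(0, i - window)
        hi = min(len(segments), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, segments[j]))
    return pairs


# Each string stands in for one audio segment cut from the same utterance.
utterance = ["the", "cat", "sat", "down"]
print(skipgram_pairs(utterance, window=1))
```

In a skip-gram-style setup, the model is trained so that the representation of the center item is predictive of each of its context items.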

Integration of Machine Learning Techniques to Evaluate Dynamic Customer Segmentation Analysis for Mobile Customers


Title	Integration of Machine Learning Techniques to Evaluate Dynamic Customer Segmentation Analysis for Mobile Customers
Authors	Cormac Dullaghan, Eleni Rozaki
Abstract	The telecommunications industry is highly competitive, so mobile providers need a business intelligence model that keeps churn at an optimal level while minimizing the cost of marketing activities. Machine learning applications can provide guidance on marketing strategies, and data mining techniques can be used in the process of customer segmentation. The purpose of this paper is to provide a detailed analysis of the C.5 algorithm, within naive Bayesian modelling, for segmenting telecommunication customers into behavioural profiles according to their billing and socio-demographic attributes. The approach has been implemented and evaluated experimentally.
Tasks
Published	2017-01-31
URL	http://arxiv.org/abs/1702.02215v1
PDF	http://arxiv.org/pdf/1702.02215v1.pdf
PWC	https://paperswithcode.com/paper/integration-of-machine-learning-techniques-to
Repo
Framework
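
As a hedged illustration of the naive Bayes half of the approach only (the C.5 decision-tree part is not shown), here is a minimal categorical naive Bayes classifier with Laplace smoothing. The features (billing band, age group) and the `churn`/`stay` labels are hypothetical and not taken from the paper's data.

```python
import math
from collections import Counter, defaultdict


def train_nb(rows, labels):
    """Count class frequencies and per-class value frequencies of each feature."""
    class_counts = Counter(labels)
    feat_counts = defaultdict(Counter)   # (feature index, class) -> value counts
    values = defaultdict(set)            # feature index -> distinct values seen
    for row, y in zip(rows, labels):
        for i, v in enumerate(row):
            feat_counts[(i, y)][v] += 1
            values[i].add(v)
    return class_counts, feat_counts, values


def predict_nb(model, row, alpha=1.0):
    """Return the class with the highest Laplace-smoothed log posterior."""
    class_counts, feat_counts, values = model
    total = sum(class_counts.values())
    best, best_score = None, -math.inf
    for y, n_y in class_counts.items():
        score = math.log(n_y / total)    # log prior
        for i, v in enumerate(row):
            count = feat_counts[(i, y)][v]
            score += math.log((count + alpha) / (n_y + alpha * len(values[i])))
        if score > best_score:
            best, best_score = y, score
    return best


# Hypothetical customers: (billing band, age group) -> churn label.
rows = [("high", "young"), ("high", "young"), ("low", "senior"), ("low", "senior")]
labels = ["churn", "churn", "stay", "stay"]
model = train_nb(rows, labels)
print(predict_nb(model, ("high", "young")))   # churn
```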

Kernel k-Groups via Hartigan’s Method


Title	Kernel k-Groups via Hartigan’s Method
Authors	Guilherme França, Maria L. Rizzo, Joshua T. Vogelstein
Abstract	Energy statistics was proposed by Székely in the 1980s, inspired by Newton’s gravitational potential in classical mechanics, and it provides a model-free hypothesis test for equality of distributions. In its original form, energy statistics was formulated in Euclidean spaces. More recently, it was generalized to metric spaces of negative type. In this paper, we consider a formulation of the clustering problem using a weighted version of energy statistics in spaces of negative type. We show that this approach leads to a quadratically constrained quadratic program in the associated kernel space, establishing connections with graph partitioning problems and kernel methods in machine learning. To find local solutions of this optimization problem, we propose kernel k-groups, an extension of Hartigan’s method to kernel spaces. Kernel k-groups is cheaper than spectral clustering and has the same computational cost as kernel k-means (which is based on Lloyd’s heuristic), but our numerical results show improved performance, especially in higher dimensions. Moreover, we verify the efficiency of kernel k-groups for community detection in sparse stochastic block models, which have fascinating applications in several areas of science.
Tasks	Community Detection, graph partitioning
Published	2017-10-26
URL	https://arxiv.org/abs/1710.09859v3
PDF	https://arxiv.org/pdf/1710.09859v3.pdf
PWC	https://paperswithcode.com/paper/kernel-k-groups-via-hartigans-method
Repo
Framework
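
Hartigan's method differs from Lloyd's heuristic in that it moves one point at a time and accepts a move only if it improves the objective. Below is a brute-force sketch of that idea for a kernel objective of the same form as kernel k-means, $\sum_j \frac{1}{n_j}\sum_{x,y\in C_j} K(x,y)$. Assumptions: the Gaussian kernel and its `gamma` are illustrative choices rather than the energy-statistics kernel of the paper, and the objective is recomputed from scratch for every candidate move instead of using cheaper incremental updates.

```python
import numpy as np


def rbf_kernel(X, gamma=1.0):
    """Gaussian kernel matrix K[i, j] = exp(-gamma * ||x_i - x_j||^2)."""
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return np.exp(-gamma * sq)


def objective(K, labels, k):
    """Sum over groups of (1 / n_j) * (sum of kernel entries inside group j)."""
    total = 0.0
    for j in range(k):
        idx = labels == j
        n_j = idx.sum()
        if n_j:
            total += K[np.ix_(idx, idx)].sum() / n_j
    return total


def kernel_k_groups(K, labels, k, max_sweeps=100):
    """Hartigan-style single-point moves that greedily increase the objective."""
    labels = np.array(labels)
    for _ in range(max_sweeps):
        moved = False
        for i in range(len(labels)):
            a = labels[i]
            if np.sum(labels == a) == 1:
                continue                      # never empty a group
            best_j, best_val = a, objective(K, labels, k)
            for j in range(k):
                if j == a:
                    continue
                labels[i] = j                 # tentatively move point i to group j
                val = objective(K, labels, k)
                if val > best_val + 1e-12:
                    best_j, best_val = j, val
            labels[i] = best_j                # keep the best move, or stay put
            if best_j != a:
                moved = True
        if not moved:
            break
    return labels


X = np.array([[0.0], [0.1], [0.2], [5.0], [5.1], [5.2]])
result = kernel_k_groups(rbf_kernel(X), [0, 1, 0, 1, 0, 1], k=2)
print(result)
```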

Evolutionary Acyclic Graph Partitioning


Title	Evolutionary Acyclic Graph Partitioning
Authors	Orlando Moreira, Merten Popp, Christian Schulz
Abstract	Directed graphs are widely used to model data flow and execution dependencies in streaming applications. This enables the utilization of graph partitioning algorithms for the problem of parallelizing computation for multiprocessor architectures. However, due to resource restrictions, an acyclicity constraint on the partition is necessary when mapping streaming applications to an embedded multiprocessor. Here, we contribute a multi-level algorithm for the acyclic graph partitioning problem. Based on this, we engineer an evolutionary algorithm to further reduce communication cost, as well as to improve load balancing and the scheduling makespan on embedded multiprocessor architectures.
Tasks	graph partitioning
Published	2017-09-25
URL	http://arxiv.org/abs/1709.08563v1
PDF	http://arxiv.org/pdf/1709.08563v1.pdf
PWC	https://paperswithcode.com/paper/evolutionary-acyclic-graph-partitioning
Repo
Framework
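
In acyclic graph partitioning, a partition is feasible only if its quotient graph (one node per block, with an edge wherever some edge crosses from one block to another) is itself acyclic. The sketch below checks that constraint with Kahn's topological sort and reports the edge cut as a simple stand-in for communication cost; counting edges without weights is an assumption, since the paper's cost model may weight them.

```python
from collections import defaultdict, deque


def quotient_is_acyclic(edges, block):
    """True if the graph of blocks induced by `block` (node -> block id) is a DAG."""
    succ = defaultdict(set)
    indeg = defaultdict(int)
    blocks = set(block.values())
    for u, v in edges:
        bu, bv = block[u], block[v]
        if bu != bv and bv not in succ[bu]:
            succ[bu].add(bv)
            indeg[bv] += 1
    queue = deque(b for b in blocks if indeg[b] == 0)
    seen = 0
    while queue:                      # Kahn's algorithm on the quotient graph
        b = queue.popleft()
        seen += 1
        for c in succ[b]:
            indeg[c] -= 1
            if indeg[c] == 0:
                queue.append(c)
    return seen == len(blocks)        # every block ordered -> no cycle


def edge_cut(edges, block):
    """Number of edges whose endpoints lie in different blocks."""
    return sum(1 for u, v in edges if block[u] != block[v])


edges = [("a", "b"), ("b", "c"), ("a", "c")]
print(quotient_is_acyclic(edges, {"a": 0, "b": 0, "c": 1}))  # True
print(quotient_is_acyclic(edges, {"a": 0, "b": 1, "c": 0}))  # False
```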

On the definition of Shape Parts: a Dominant Sets Approach


Title	On the definition of Shape Parts: a Dominant Sets Approach
Authors	Foteini Fotopoulou, George Economou
Abstract	In this paper, a novel graph-based approach to the shape decomposition problem is presented. The shape is transformed into a visibility graph enriched with local neighborhood information. A two-step diffusion process is then applied to the visibility graph that efficiently enhances the information it carries, leading to a more robust and meaningful graph construction. Inspired by the notion of a clique as a strict cluster definition, the dominant sets algorithm is applied, slightly modified to suit the specific problem of defining shape parts. The cluster cohesiveness and a node participation vector are two important outputs of the proposed graph partitioning method. As opposed to most existing techniques, the final number of clusters is determined automatically, by estimating the cluster cohesiveness on a random network generation process. Experimental results on several shape databases show the effectiveness of our framework for graph-based shape decomposition.
Tasks	graph construction, graph partitioning
Published	2017-09-11
URL	http://arxiv.org/abs/1709.03588v1
PDF	http://arxiv.org/pdf/1709.03588v1.pdf
PWC	https://paperswithcode.com/paper/on-the-definition-of-shape-parts-a-dominant
Repo
Framework
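
In the dominant sets framework, a single cluster is commonly extracted by running discrete-time replicator dynamics on a symmetric, non-negative affinity matrix with zero diagonal: the support of the converged vector is the cluster, its entries act as node participation values, and $x^\top A x$ measures cohesiveness. The sketch below shows only that standard extraction step; the paper's visibility graph, two-step diffusion, and modifications are not reproduced, and the support threshold `support_eps` is an arbitrary choice.

```python
import numpy as np


def dominant_set(A, tol=1e-8, max_iter=1000, support_eps=1e-4):
    """Replicator dynamics x_i <- x_i (Ax)_i / (x^T A x) on affinity matrix A."""
    n = A.shape[0]
    x = np.full(n, 1.0 / n)              # start at the barycenter of the simplex
    for _ in range(max_iter):
        Ax = A @ x
        new_x = x * Ax / (x @ Ax)
        converged = np.abs(new_x - x).sum() < tol
        x = new_x
        if converged:
            break
    cohesiveness = float(x @ A @ x)      # weighted average affinity in the cluster
    members = np.flatnonzero(x > support_eps)
    return members, x, cohesiveness


A = np.array([[0.0, 1.0, 1.0, 0.1],
              [1.0, 0.0, 1.0, 0.1],
              [1.0, 1.0, 0.0, 0.1],
              [0.1, 0.1, 0.1, 0.0]])
members, weights, coh = dominant_set(A)
print(members, round(coh, 3))
```

Peeling off the extracted members and repeating on the remaining nodes is one common way to obtain several clusters.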

3D Cell Nuclei Segmentation with Balanced Graph Partitioning


Title	3D Cell Nuclei Segmentation with Balanced Graph Partitioning
Authors	Julian Arz, Peter Sanders, Johannes Stegmaier, Ralf Mikut
Abstract	Cell nuclei segmentation is one of the most important tasks in the analysis of biomedical images. With ever-growing sizes and amounts of three-dimensional images to be processed, there is a need for better and faster segmentation methods. Graph-based image segmentation has gained popularity in recent years, but is considered computationally very costly. We propose a new segmentation algorithm that overcomes these limitations. Our method uses recursive balanced graph partitioning to segment the foreground components produced by a fast and efficient binarization. We construct a model of the cell nuclei to guide the partitioning process. Our algorithm is compared with other state-of-the-art segmentation algorithms in an experimental evaluation on two sets of realistically simulated inputs. Our method is faster, has similar or better quality, and has an acceptable memory overhead.
Tasks	graph partitioning, Semantic Segmentation
Published	2017-02-17
URL	http://arxiv.org/abs/1702.05413v1
PDF	http://arxiv.org/pdf/1702.05413v1.pdf
PWC	https://paperswithcode.com/paper/3d-cell-nuclei-segmentation-with-balanced
Repo
Framework

Semantic Web Today: From Oil Rigs to Panama Papers


Title	Semantic Web Today: From Oil Rigs to Panama Papers
Authors	Rivindu Perera, Parma Nand, Boris Bacic, Wen-Hsin Yang, Kazuhiro Seki, Radek Burget
Abstract	The next leap of the internet, the Semantic Web, has already begun. At its core, the Semantic Web transforms the document-oriented web into a data-oriented web enriched with semantics embedded as metadata. This change in perspective offers numerous benefits to the many data-intensive industries that are bound to the web and its related applications. These industries are diverse, ranging from oil and gas exploration to investigative journalism, and everything in between. This paper discusses eight different industries that currently reap the benefits of the Semantic Web. It also offers a future outlook on Semantic Web applications and discusses the areas in which the Semantic Web will play a key role.
Tasks
Published	2017-11-05
URL	http://arxiv.org/abs/1711.01518v1
PDF	http://arxiv.org/pdf/1711.01518v1.pdf
PWC	https://paperswithcode.com/paper/semantic-web-today-from-oil-rigs-to-panama
Repo
Framework

Weighted Low Rank Approximation for Background Estimation Problems


Title	Weighted Low Rank Approximation for Background Estimation Problems
Authors	Aritra Dutta, Xin Li
Abstract	Classical principal component analysis (PCA) is not robust to the presence of sparse outliers in the data. The use of the $\ell_1$ norm in the Robust PCA (RPCA) method successfully eliminates the weakness of PCA in separating the sparse outliers. In this paper, by attaching a simple weight to the Frobenius norm, we propose a weighted low rank (WLR) method that avoids the often computationally expensive algorithms relying on the $\ell_1$ norm. As a proof of concept, a background estimation model is presented and compared with two $\ell_1$ norm minimization algorithms. We show that, as long as a simple weight matrix is inferred from the data, one can use the weighted Frobenius norm and achieve the same or better performance.
Tasks
Published	2017-07-04
URL	http://arxiv.org/abs/1707.01753v1
PDF	http://arxiv.org/pdf/1707.01753v1.pdf
PWC	https://paperswithcode.com/paper/weighted-low-rank-approximation-for
Repo
Framework
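
A weighted Frobenius objective such as $\|W \odot (X - L)\|_F^2$ with $\operatorname{rank}(L) \le r$ can be minimized in several ways. The sketch below uses plain alternating weighted least squares, a generic technique and not necessarily the authors' algorithm. Two assumptions to flag: the weighting is elementwise here (the paper's weighting may be structured differently), and `W` is supplied by hand, whereas the paper infers it from the data. A zero weight tells the fit to ignore that entry, which is how a known outlier is excluded in the usage example.

```python
import numpy as np


def weighted_low_rank(X, W, rank, n_iter=50, reg=1e-6, seed=0):
    """Minimize ||W * (X - U @ V.T)||_F^2 by alternating weighted least squares."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    U = rng.standard_normal((m, rank))
    V = rng.standard_normal((n, rank))
    W2 = W ** 2
    I = reg * np.eye(rank)                 # small ridge term keeps solves stable
    for _ in range(n_iter):
        for i in range(m):                 # solve for row i of U with V fixed
            Vw = V * W2[i][:, None]
            U[i] = np.linalg.solve(V.T @ Vw + I, Vw.T @ X[i])
        for j in range(n):                 # solve for row j of V with U fixed
            Uw = U * W2[:, j][:, None]
            V[j] = np.linalg.solve(U.T @ Uw + I, Uw.T @ X[:, j])
    return U @ V.T


# Rank-1 "background" plus one large sparse outlier that gets weight zero.
background = np.outer(np.ones(5), np.arange(1.0, 7.0))
X = background.copy()
X[2, 3] += 100.0
W = np.ones_like(X)
W[2, 3] = 0.0
L = weighted_low_rank(X, W, rank=1)
print(np.round(L, 2))
```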

Intelligent Fault Analysis in Electrical Power Grids


Title	Intelligent Fault Analysis in Electrical Power Grids
Authors	Biswarup Bhattacharya, Abhishek Sinha
Abstract	Power grids are among the most important components of infrastructure in today’s world. Every nation depends on the security and stability of its power grid to supply electricity to households and industries. A malfunction in even a small part of a power grid can cause loss of productivity, revenue, and in some cases even life. It is therefore imperative to design a system that can monitor the health of the power grid and take protective measures before a serious anomaly occurs. To this end, we set out to create an artificially intelligent system that can analyze grid information at any given time and determine the health of the grid using sophisticated formal models and machine learning techniques such as recurrent neural networks. Our system simulates grid conditions, including stimuli such as faults, generator output fluctuations, and load fluctuations, using Siemens PSS/E software; classifiers such as SVM and LSTM are then trained and tested on this simulated data. Our methods achieve very high accuracy on this data, and the model can be scaled to handle larger and more complex grid architectures.
Tasks
Published	2017-11-08
URL	http://arxiv.org/abs/1711.03026v1
PDF	http://arxiv.org/pdf/1711.03026v1.pdf
PWC	https://paperswithcode.com/paper/intelligent-fault-analysis-in-electrical
Repo
Framework

Recursive Exponential Weighting for Online Non-convex Optimization


Title	Recursive Exponential Weighting for Online Non-convex Optimization
Authors	Lin Yang, Cheng Tan, Wing Shing Wong
Abstract	In this paper, we investigate the online non-convex optimization problem, which generalizes the classic online convex optimization problem by relaxing the convexity assumption on the cost function. For this type of problem, the classic exponential weighting online algorithm has recently been shown to attain a sub-linear regret of $O(\sqrt{T\log T})$. Here, we introduce a novel recursive structure to the online algorithm to define a recursive exponential weighting algorithm that attains a regret of $O(\sqrt{T})$, matching the well-known regret lower bound. To the best of our knowledge, this is the first online algorithm with provable $O(\sqrt{T})$ regret for the online non-convex optimization problem.
Tasks
Published	2017-09-13
URL	http://arxiv.org/abs/1709.04136v1
PDF	http://arxiv.org/pdf/1709.04136v1.pdf
PWC	https://paperswithcode.com/paper/recursive-exponential-weighting-for-online
Repo
Framework
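
For context, the classic exponential weighting (Hedge) algorithm that the paper builds on can be sketched over a finite grid of candidate decisions. This is the non-recursive baseline, not the paper's recursive algorithm, and discretizing the decision set to a grid is an assumption made to keep the example finite. Instead of sampling a point each round, the sketch accumulates the learner's expected loss, which makes the standard guarantee deterministic: for losses in $[0,1]$, regret against the best grid point is at most $\ln N/\eta + \eta T/8$.

```python
import math


def exponential_weighting(grid, losses, eta):
    """Hedge over a fixed grid; returns (learner expected loss, best fixed-point loss)."""
    cum = [0.0] * len(grid)              # cumulative loss of each grid point
    learner = 0.0
    for f in losses:
        m = min(cum)
        w = [math.exp(-eta * (c - m)) for c in cum]   # shift by min for stability
        z = sum(w)
        values = [f(x) for x in grid]
        learner += sum(wk * v for wk, v in zip(w, values)) / z
        cum = [c + v for c, v in zip(cum, values)]
    return learner, min(cum)


# A non-convex loss in [0, 1] that drifts over T rounds.
T = 400
grid = [k / 20 for k in range(21)]
losses = [lambda x, t=t: (1 + math.sin(5 * x + 0.1 * t)) / 2 for t in range(T)]
eta = math.sqrt(8 * math.log(len(grid)) / T)
learner, best = exponential_weighting(grid, losses, eta)
print(round(learner - best, 3))          # regret against the best grid point
```

With this choice of `eta` the bound above equals $\sqrt{T \ln N / 2}$, so regret grows like $\sqrt{T}$ for a fixed grid size $N$.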

A Multiscale Patch Based Convolutional Network for Brain Tumor Segmentation


Title	A Multiscale Patch Based Convolutional Network for Brain Tumor Segmentation
Authors	Jean Stawiaski
Abstract	This article presents a multiscale patch-based convolutional neural network for the automatic segmentation of brain tumors in multi-modality 3D MR images. We train the network with multiscale inputs and multiscale deep supervision. We evaluate the effectiveness of the proposed approach on the BRATS 2017 segmentation challenge, where we obtained Dice scores of 0.755, 0.900, and 0.782 and 95% Hausdorff distances of 3.63 mm, 4.10 mm, and 6.81 mm for the enhancing tumor, whole tumor, and tumor core, respectively.
Tasks	Brain Tumor Segmentation
Published	2017-10-06
URL	http://arxiv.org/abs/1710.02316v1
PDF	http://arxiv.org/pdf/1710.02316v1.pdf
PWC	https://paperswithcode.com/paper/a-multiscale-patch-based-convolutional
Repo
Framework
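
The Dice scores and 95% Hausdorff distances quoted above are standard overlap and boundary metrics for BraTS-style evaluation. A minimal NumPy sketch of both follows. The Hausdorff helper takes arbitrary point sets (for instance boundary-voxel coordinates already scaled to millimetres), which is an assumption; evaluation tools typically extract surface voxels first and may differ in percentile conventions. Returning 1.0 when both masks are empty is also a convention chosen here.

```python
import numpy as np


def dice(pred, truth):
    """Dice coefficient 2|P ∩ T| / (|P| + |T|) for binary masks."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    denom = pred.sum() + truth.sum()
    if denom == 0:
        return 1.0                       # both empty: treat as perfect agreement
    return float(2.0 * np.logical_and(pred, truth).sum() / denom)


def hausdorff95(a_points, b_points):
    """Symmetric 95th-percentile Hausdorff distance between two point sets."""
    a = np.asarray(a_points, dtype=float)
    b = np.asarray(b_points, dtype=float)
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)  # pairwise distances
    return float(max(np.percentile(d.min(axis=1), 95),
                     np.percentile(d.min(axis=0), 95)))


print(dice([1, 1, 0, 0], [1, 0, 0, 0]))      # ≈ 0.667
print(hausdorff95([[0, 0]], [[3, 4]]))       # 5.0
```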

Data Sets: Word Embeddings Learned from Tweets and General Data


Title	Data Sets: Word Embeddings Learned from Tweets and General Data
Authors	Quanzhi Li, Sameena Shah, Xiaomo Liu, Armineh Nourbakhsh
Abstract	A word embedding is a low-dimensional, dense and real-valued vector representation of a word. Word embeddings have been used in many NLP tasks. They are usually generated from a large text corpus. The embedding of a word captures both its syntactic and semantic aspects. Tweets are short, noisy and have unique lexical and semantic features that are different from other types of text. Therefore, it is necessary to have word embeddings learned specifically from tweets. In this paper, we present ten word embedding data sets. In addition to the data sets learned from just tweet data, we also built embedding sets from the general data and the combination of tweets with the general data. The general data consist of news articles, Wikipedia data and other web data. These ten embedding models were learned from about 400 million tweets and 7 billion words from the general text. In this paper, we also present two experiments demonstrating how to use the data sets in some NLP tasks, such as tweet sentiment analysis and tweet topic classification tasks.
Tasks	Sentiment Analysis, Word Embeddings
Published	2017-08-14
URL	http://arxiv.org/abs/1708.03994v1
PDF	http://arxiv.org/pdf/1708.03994v1.pdf
PWC	https://paperswithcode.com/paper/data-sets-word-embeddings-learned-from-tweets
Repo
Framework
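
Once an embedding set like these is loaded as a word-to-vector mapping, the usual way to probe it is cosine similarity. The sketch below ranks nearest neighbours; the toy 3-dimensional vectors are made up for illustration, and loading the released files (whose format is not described here) is left out.

```python
import numpy as np


def most_similar(embeddings, word, topn=3):
    """Rank the other words by cosine similarity to `word`."""
    query = embeddings[word]
    query = query / np.linalg.norm(query)
    scores = []
    for other, vec in embeddings.items():
        if other == word:
            continue
        scores.append((other, float(vec @ query / np.linalg.norm(vec))))
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:topn]


# Made-up vectors standing in for tweet-trained embeddings.
toy = {
    "lol": np.array([0.9, 0.1, 0.0]),
    "haha": np.array([0.8, 0.2, 0.1]),
    "stock": np.array([0.0, 0.1, 0.9]),
    "market": np.array([0.1, 0.0, 0.8]),
}
print(most_similar(toy, "lol", topn=2))
```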

Aspect-Based Sentiment Analysis Using a Two-Step Neural Network Architecture


Title	Aspect-Based Sentiment Analysis Using a Two-Step Neural Network Architecture
Authors	Soufian Jebbara, Philipp Cimiano
Abstract	The World Wide Web holds a wealth of information in the form of unstructured texts such as customer reviews for products, events and more. By extracting and analyzing the expressed opinions in customer reviews in a fine-grained way, valuable opportunities and insights for customers and businesses can be gained. We propose a neural network based system to address the task of Aspect-Based Sentiment Analysis to compete in Task 2 of the ESWC-2016 Challenge on Semantic Sentiment Analysis. Our proposed architecture divides the task into two subtasks: aspect term extraction and aspect-specific sentiment extraction. This approach is flexible in that it allows each subtask to be addressed independently. As a first step, a recurrent neural network is used to extract aspects from a text by framing the problem as a sequence labeling task. In a second step, a recurrent network processes each extracted aspect with respect to its context and predicts a sentiment label. The system uses pretrained semantic word embedding features, which we experimentally enhance with semantic knowledge extracted from WordNet. Further features extracted from SenticNet prove to be beneficial for the extraction of sentiment labels. As the best-performing system in its category, our proposed system proves to be an effective approach for Aspect-Based Sentiment Analysis.
Tasks	Aspect-Based Sentiment Analysis, Sentiment Analysis
Published	2017-09-19
URL	http://arxiv.org/abs/1709.06311v1
PDF	http://arxiv.org/pdf/1709.06311v1.pdf
PWC	https://paperswithcode.com/paper/aspect-based-sentiment-analysis-using-a-two
Repo
Framework
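
The first step frames aspect extraction as sequence labeling. A common encoding for such labels is the BIO scheme (B = beginning of an aspect, I = inside, O = outside). Assuming that scheme, which the abstract does not specify, the sketch below turns a predicted tag sequence into aspect terms that a second-step sentiment model could then classify one by one.

```python
from typing import List


def bio_to_spans(tokens: List[str], tags: List[str]) -> List[str]:
    """Collect the token spans marked B (begin) followed by any I (inside) tags."""
    spans, current = [], []
    for tok, tag in zip(tokens, tags):
        if tag == "B":
            if current:
                spans.append(" ".join(current))
            current = [tok]               # start a new aspect term
        elif tag == "I" and current:
            current.append(tok)           # continue the open aspect term
        else:
            if current:
                spans.append(" ".join(current))
            current = []                  # O (or a stray I) closes any open term
    if current:
        spans.append(" ".join(current))
    return spans


tokens = "The battery life is great but the screen is dim".split()
tags = ["O", "B", "I", "O", "O", "O", "O", "B", "O", "O"]
print(bio_to_spans(tokens, tags))   # ['battery life', 'screen']
```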