Paper Group ANR 282
Learning Domain-Specific Word Embeddings from Sparse Cybersecurity Texts. MetaPAD: Meta Pattern Discovery from Massive Text Corpora. Complexity of Scheduling Charging in the Smart Grid. Threshold Constraints with Guarantees for Parity Objectives in Markov Decision Processes. Common Representation Learning Using Step-based Correlation Multi-Modal CNN. Group Scissor: Scaling Neuromorphic Computing Design to Large Neural Networks. Fault in your stars: An Analysis of Android App Reviews. Depth Estimation using Modified Cost Function for Occlusion Handling. Fusion of Heterogeneous Data in Convolutional Networks for Urban Semantic Labeling (Invited Paper). Gabor Convolutional Networks. The ParallelEye Dataset: Constructing Large-Scale Artificial Scenes for Traffic Vision Research. An embedded segmental K-means model for unsupervised segmentation and clustering of speech. A Survey of Parallel A*. Macro Grammars and Holistic Triggering for Efficient Semantic Parsing. Automatic Extrinsic Calibration for Lidar-Stereo Vehicle Sensor Setups.
Learning Domain-Specific Word Embeddings from Sparse Cybersecurity Texts
Title | Learning Domain-Specific Word Embeddings from Sparse Cybersecurity Texts |
Authors | Arpita Roy, Youngja Park, Shimei Pan |
Abstract | Word embedding is a Natural Language Processing (NLP) technique that automatically maps words from a vocabulary to vectors of real numbers in an embedding space. It has been widely used in recent years to boost the performance of a variety of NLP tasks such as Named Entity Recognition, Syntactic Parsing and Sentiment Analysis. Classic word embedding methods such as Word2Vec and GloVe work well when they are given a large text corpus. When the input texts are sparse as in many specialized domains (e.g., cybersecurity), these methods often fail to produce high-quality vectors. In this paper, we describe a novel method to train domain-specific word embeddings from sparse texts. In addition to domain texts, our method also leverages diverse types of domain knowledge such as domain vocabulary and semantic relations. Specifically, we first propose a general framework to encode diverse types of domain knowledge as text annotations. Then we develop a novel Word Annotation Embedding (WAE) algorithm to incorporate diverse types of text annotations in word embedding. We have evaluated our method on two cybersecurity text corpora: a malware description corpus and a Common Vulnerability and Exposure (CVE) corpus. Our evaluation results have demonstrated the effectiveness of our method in learning domain-specific word embeddings. |
Tasks | Named Entity Recognition, Sentiment Analysis, Word Embeddings |
Published | 2017-09-21 |
URL | http://arxiv.org/abs/1709.07470v1 |
http://arxiv.org/pdf/1709.07470v1.pdf | |
PWC | https://paperswithcode.com/paper/learning-domain-specific-word-embeddings-from |
Repo | |
Framework | |
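As a rough illustration of the annotation idea in this paper, the sketch below folds domain annotations into an ordinary skip-gram model by injecting annotation labels as extra context tokens before training. This is only one plausible way to use annotations as a training signal, not the paper's actual WAE update rule; the tiny corpus, the `annotate` helper, and the example entity labels are hypothetical.

```python
# Minimal sketch (assumption): treat domain annotations as extra context tokens
# and train a standard skip-gram model on the annotated token streams.
# This is NOT the paper's WAE algorithm, only an approximation of the idea.
from gensim.models import Word2Vec

def annotate(tokens, annotations):
    """Interleave each token with its annotation labels (e.g. entity types)."""
    out = []
    for tok in tokens:
        out.append(tok)
        out.extend(annotations.get(tok, []))   # e.g. {"heartbleed": ["VULNERABILITY"]}
    return out

corpus = [["heartbleed", "affects", "openssl"],
          ["wannacry", "encrypts", "files"]]
domain_knowledge = {"heartbleed": ["VULNERABILITY"], "wannacry": ["MALWARE"]}

sentences = [annotate(s, domain_knowledge) for s in corpus]
model = Word2Vec(sentences, vector_size=100, window=5, min_count=1, sg=1, epochs=50)
print(model.wv.most_similar("heartbleed", topn=3))
```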
MetaPAD: Meta Pattern Discovery from Massive Text Corpora
Title | MetaPAD: Meta Pattern Discovery from Massive Text Corpora |
Authors | Meng Jiang, Jingbo Shang, Taylor Cassidy, Xiang Ren, Lance M. Kaplan, Timothy P. Hanratty, Jiawei Han |
Abstract | Mining textual patterns in news, tweets, papers, and many other kinds of text corpora has been an active theme in text mining and NLP research. Previous studies adopt a dependency parsing-based pattern discovery approach. However, the parsing results lose rich context around entities in the patterns, and the process is costly for a corpus of large scale. In this study, we propose a novel typed textual pattern structure, called meta pattern, which is extended to a frequent, informative, and precise subsequence pattern in a certain context. We propose an efficient framework, called MetaPAD, which discovers meta patterns from massive corpora with three techniques: (1) it develops a context-aware segmentation method to carefully determine the boundaries of patterns with a learnt pattern quality assessment function, which avoids costly dependency parsing and generates high-quality patterns; (2) it identifies and groups synonymous meta patterns from multiple facets—their types, contexts, and extractions; and (3) it examines type distributions of entities in the instances extracted by each group of patterns, and looks for appropriate type levels to make discovered patterns precise. Experiments demonstrate that our proposed framework discovers high-quality typed textual patterns efficiently from different genres of massive corpora and facilitates information extraction. |
Tasks | Dependency Parsing |
Published | 2017-03-13 |
URL | http://arxiv.org/abs/1703.04213v2 |
http://arxiv.org/pdf/1703.04213v2.pdf | |
PWC | https://paperswithcode.com/paper/metapad-meta-pattern-discovery-from-massive |
Repo | |
Framework | |
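To make the "typed textual pattern" notion above concrete, the toy sketch below replaces typed entities with type placeholders and counts frequent contiguous n-grams containing a type slot as candidate meta patterns. MetaPAD itself uses context-aware segmentation with a learned pattern-quality function rather than raw n-gram counts; the function names and example data here are illustrative assumptions.

```python
# Toy sketch (assumption): frequency-based candidate generation for typed patterns.
# MetaPAD replaces this with a learned quality function and segmentation.
from collections import Counter

def typed_tokens(tokens, entity_types):
    return [entity_types.get(t, t) for t in tokens]

def candidate_patterns(sentences, entity_types, n=4, min_count=2):
    counts = Counter()
    for sent in sentences:
        toks = typed_tokens(sent, entity_types)
        for i in range(len(toks) - n + 1):
            gram = tuple(toks[i:i + n])
            if any(g.startswith("$") for g in gram):   # keep patterns with a type slot
                counts[gram] += 1
    return [(g, c) for g, c in counts.items() if c >= min_count]

sents = [["barack", "obama", "president", "of", "united", "states"],
         ["emmanuel", "macron", "president", "of", "france"]]
types = {"barack": "$PERSON", "obama": "$PERSON", "emmanuel": "$PERSON",
         "macron": "$PERSON", "united": "$COUNTRY", "states": "$COUNTRY", "france": "$COUNTRY"}
print(candidate_patterns(sents, types))   # e.g. ($PERSON, president, of, $COUNTRY)
```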
Complexity of Scheduling Charging in the Smart Grid
Title | Complexity of Scheduling Charging in the Smart Grid |
Authors | Mathijs de Weerdt, Michael Albert, Vincent Conitzer |
Abstract | In the smart grid, the intent is to use flexibility in demand, both to balance demand and supply as well as to resolve potential congestion. A first prominent example of such flexible demand is the charging of electric vehicles, which do not necessarily need to be charged as soon as they are plugged in. The problem of optimally scheduling the charging demand of electric vehicles within the constraints of the electricity infrastructure is called the charge scheduling problem. The models of the charging speed, horizon, and charging demand determine the computational complexity of the charge scheduling problem. For about 20 variants, we show, using a dynamic programming approach, that the problem is either in P or weakly NP-hard. We also show that about 10 variants of the problem are strongly NP-hard, presenting a potentially significant obstacle to their use in practical situations of scale. |
Tasks | |
Published | 2017-09-21 |
URL | http://arxiv.org/abs/1709.07480v1 |
http://arxiv.org/pdf/1709.07480v1.pdf | |
PWC | https://paperswithcode.com/paper/complexity-of-scheduling-charging-in-the |
Repo | |
Framework | |
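The weakly NP-hard variants mentioned above admit pseudo-polynomial dynamic programs. As a hedged illustration, the sketch below solves one heavily simplified variant: choosing which charging requests to accept within a total energy budget, via a knapsack-style DP. The paper's variants involve per-slot capacities, deadlines, and charging speeds; this is only meant to show why a DP over integer demands yields a pseudo-polynomial algorithm.

```python
# Toy pseudo-polynomial DP (assumption: knapsack-style relaxation of charge scheduling).
def select_requests(demands, values, capacity):
    # dp[c] = best total value achievable using at most c units of energy
    dp = [0] * (capacity + 1)
    for d, v in zip(demands, values):
        for c in range(capacity, d - 1, -1):   # iterate downwards: each request used once
            dp[c] = max(dp[c], dp[c - d] + v)
    return dp[capacity]

print(select_requests(demands=[4, 3, 2], values=[10, 7, 4], capacity=5))  # -> 11
```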
Threshold Constraints with Guarantees for Parity Objectives in Markov Decision Processes
Title | Threshold Constraints with Guarantees for Parity Objectives in Markov Decision Processes |
Authors | Raphaël Berthon, Mickael Randour, Jean-François Raskin |
Abstract | The beyond worst-case synthesis problem was introduced recently by Bruyère et al. [BFRR14]: it aims at building system controllers that provide strict worst-case performance guarantees against an antagonistic environment while ensuring higher expected performance against a stochastic model of the environment. Our work extends the framework of [BFRR14] and follow-up papers, which focused on quantitative objectives, by addressing the case of $\omega$-regular conditions encoded as parity objectives, a natural way to represent functional requirements of systems. We build strategies that satisfy a main parity objective on all plays, while ensuring a secondary one with sufficient probability. This setting raises new challenges in comparison to quantitative objectives, as one cannot easily mix different strategies without endangering the functional properties of the system. We establish that, for all variants of this problem, deciding the existence of a strategy lies in ${\sf NP} \cap {\sf coNP}$, the same complexity class as classical parity games. Hence, our framework provides additional modeling power while staying in the same complexity class. [BFRR14] Véronique Bruyère, Emmanuel Filiot, Mickael Randour, and Jean-François Raskin. Meet your expectations with guarantees: Beyond worst-case synthesis in quantitative games. In Ernst W. Mayr and Natacha Portier, editors, 31st International Symposium on Theoretical Aspects of Computer Science, STACS 2014, March 5-8, 2014, Lyon, France, volume 25 of LIPIcs, pages 199-213. Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik, 2014. |
Tasks | |
Published | 2017-02-17 |
URL | http://arxiv.org/abs/1702.05472v2 |
http://arxiv.org/pdf/1702.05472v2.pdf | |
PWC | https://paperswithcode.com/paper/threshold-constraints-with-guarantees-for |
Repo | |
Framework | |
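A compact restatement of the decision problem, as read from the abstract, may help: find a single strategy that satisfies the main parity objective on every play while achieving the secondary one with at least a given probability. The notation below is chosen here (it is not necessarily the paper's), and the block assumes amsmath for the underbrace labels.

```latex
% Hedged restatement (notation mine): given an MDP M, a main parity objective
% \Phi_1, a secondary parity objective \Phi_2, and a threshold \alpha \in [0,1],
% decide whether
\[
  \exists\, \sigma \;:\;
  \underbrace{\forall \pi \in \mathrm{Plays}_M(\sigma) :\ \pi \models \Phi_1}_{\text{surely (worst case)}}
  \;\wedge\;
  \underbrace{\Pr\nolimits_{M}^{\sigma}\!\left[\Phi_2\right] \ge \alpha}_{\text{with sufficient probability}}
\]
```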
Common Representation Learning Using Step-based Correlation Multi-Modal CNN
Title | Common Representation Learning Using Step-based Correlation Multi-Modal CNN |
Authors | Gaurav Bhatt, Piyush Jha, Balasubramanian Raman |
Abstract | Deep learning techniques have been successfully used in learning a common representation for multi-view data, wherein the different modalities are projected onto a common subspace. In a broader perspective, the techniques used to investigate common representation learning fall under the categories of canonical correlation-based approaches and autoencoder-based approaches. In this paper, we investigate the performance of deep autoencoder-based methods on multi-view data. We propose a novel step-based correlation multi-modal CNN (CorrMCNN) which reconstructs one view of the data given the other while increasing the interaction between the representations at each hidden layer or every intermediate step. Finally, we evaluate the performance of the proposed model on two benchmark datasets - MNIST and XRMB. Through extensive experiments, we find that the proposed model achieves better performance than the current state-of-the-art techniques on joint common representation learning and transfer learning tasks. |
Tasks | Representation Learning, Transfer Learning |
Published | 2017-10-31 |
URL | http://arxiv.org/abs/1711.00003v1 |
http://arxiv.org/pdf/1711.00003v1.pdf | |
PWC | https://paperswithcode.com/paper/common-representation-learning-using-step |
Repo | |
Framework | |
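A rough PyTorch sketch of the step-based idea follows: two branches encode each view, every intermediate step contributes a correlation-style term between the two hidden codes, and one view is reconstructed from the other's code. Layer sizes, the simple squared-difference surrogate for correlation, and the loss weight are assumptions, not the configuration reported in the paper.

```python
# Sketch (assumptions: layer sizes, correlation surrogate, loss weights).
import torch
import torch.nn as nn

class TwoViewNet(nn.Module):
    def __init__(self, d1, d2, h=256, z=50):
        super().__init__()
        self.enc1 = nn.ModuleList([nn.Linear(d1, h), nn.Linear(h, z)])
        self.enc2 = nn.ModuleList([nn.Linear(d2, h), nn.Linear(h, z)])
        self.dec = nn.Sequential(nn.Linear(z, h), nn.ReLU(), nn.Linear(h, d2))

    def forward(self, x1, x2):
        corr_loss = 0.0
        h1, h2 = x1, x2
        for l1, l2 in zip(self.enc1, self.enc2):
            h1, h2 = torch.relu(l1(h1)), torch.relu(l2(h2))
            # step-wise interaction: drive the two hidden codes together
            corr_loss = corr_loss + torch.mean((h1 - h2) ** 2)
        recon = self.dec(h1)               # reconstruct view 2 from view 1's code
        return recon, corr_loss

net = TwoViewNet(d1=784, d2=784)
x1, x2 = torch.randn(8, 784), torch.randn(8, 784)
recon, corr = net(x1, x2)
loss = nn.functional.mse_loss(recon, x2) + 0.1 * corr
loss.backward()
```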
Group Scissor: Scaling Neuromorphic Computing Design to Large Neural Networks
Title | Group Scissor: Scaling Neuromorphic Computing Design to Large Neural Networks |
Authors | Yandan Wang, Wei Wen, Beiye Liu, Donald Chiarulli, Hai Li |
Abstract | Synapse crossbar is an elementary structure in Neuromorphic Computing Systems (NCS). However, the limited size of crossbars and heavy routing congestion impede the NCS implementations of big neural networks. In this paper, we propose a two-step framework (namely, group scissor) to scale NCS designs to big neural networks. The first step is rank clipping, which integrates low-rank approximation into the training to reduce total crossbar area. The second step is group connection deletion, which structurally prunes connections to reduce routing congestion between crossbars. Tested on convolutional neural networks of LeNet on MNIST database and ConvNet on CIFAR-10 database, our experiments show significant reduction of crossbar area and routing area in NCS designs. Without accuracy loss, rank clipping reduces total crossbar area to 13.62% and 51.81% in the NCS designs of LeNet and ConvNet, respectively. Following rank clipping, group connection deletion further reduces the routing area of LeNet and ConvNet to 8.1% and 52.06%, respectively. |
Tasks | |
Published | 2017-02-11 |
URL | http://arxiv.org/abs/1702.03443v2 |
http://arxiv.org/pdf/1702.03443v2.pdf | |
PWC | https://paperswithcode.com/paper/group-scissor-scaling-neuromorphic-computing |
Repo | |
Framework | |
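The area saving from the low-rank step can be illustrated with a post-hoc truncated SVD: a weight matrix mapped onto crossbars is replaced by two smaller factors, which together need fewer crossbar cells. Note that the paper performs rank clipping inside training rather than after it; the sketch below is only the post-hoc analogue, with arbitrary sizes.

```python
# Sketch (assumption: post-hoc truncated SVD stands in for in-training rank clipping).
import numpy as np

def low_rank_factors(W, rank):
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :rank] * s[:rank]          # (m, rank)
    B = Vt[:rank, :]                    # (rank, n)
    return A, B

W = np.random.randn(512, 512)
A, B = low_rank_factors(W, rank=64)
orig_area = W.size                      # crossbar cells needed for W
clipped_area = A.size + B.size          # cells needed for the two factors
print(orig_area, clipped_area, np.linalg.norm(W - A @ B) / np.linalg.norm(W))
```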
Fault in your stars: An Analysis of Android App Reviews
Title | Fault in your stars: An Analysis of Android App Reviews |
Authors | Rahul Aralikatte, Giriprasad Sridhara, Neelamadhav Gantayat, Senthil Mani |
Abstract | Mobile app distribution platforms such as Google Play Store allow users to share their feedback about downloaded apps in the form of a review comment and a corresponding star rating. Typically, the star rating ranges from one to five stars, with one star denoting a high sense of dissatisfaction with the app and five stars denoting a high sense of satisfaction. Unfortunately, due to a variety of reasons, often the star rating provided by a user is inconsistent with the opinion expressed in the review. For example, consider the following review for the Facebook App on Android: “Awesome App”. One would reasonably expect the rating for this review to be five stars, but the actual rating is one star! Such inconsistent ratings can lead to a deflated (or inflated) overall average rating of an app which can affect user downloads, as typically users look at the average star ratings while making a decision on downloading an app. Also, the app developers receive a biased feedback about the application that does not represent ground reality. This is especially significant for small apps with a few thousand downloads as even a small number of mismatched reviews can bring down the average rating drastically. In this paper, we conducted a study on this review-rating mismatch problem. We manually examined 8600 reviews from 10 popular Android apps and found that 20% of the ratings in our dataset were inconsistent with the review. Further, we developed three systems, two of which were based on traditional machine learning and one on deep learning, to automatically identify reviews whose rating did not match with the opinion expressed in the review. Our deep learning system performed the best and had an accuracy of 92% in identifying the correct star rating to be associated with a given review. |
Tasks | |
Published | 2017-08-16 |
URL | http://arxiv.org/abs/1708.04968v2 |
http://arxiv.org/pdf/1708.04968v2.pdf | |
PWC | https://paperswithcode.com/paper/fault-in-your-stars-an-analysis-of-android |
Repo | |
Framework | |
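An illustrative baseline for the mismatch task is to predict a star rating from the review text and flag reviews whose given rating disagrees with the prediction. The TF-IDF plus logistic-regression pipeline below merely stands in for the paper's traditional-ML systems (their best system was a deep model); the toy data and the `mismatch` helper are assumptions.

```python
# Sketch (assumption: TF-IDF + logistic regression as a stand-in baseline).
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

reviews = ["awesome app", "crashes all the time", "love it", "terrible update", "works great"]
stars = [5, 1, 5, 1, 5]

clf = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
clf.fit(reviews, stars)

def mismatch(review, given_star, tolerance=1):
    predicted = clf.predict([review])[0]
    return abs(predicted - given_star) > tolerance   # flag large rating/text disagreement

print(mismatch("awesome app", given_star=1))   # likely True: the text reads positive
```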
Depth Estimation using Modified Cost Function for Occlusion Handling
Title | Depth Estimation using Modified Cost Function for Occlusion Handling |
Authors | Krzysztof Wegner, Olgierd Stankiewicz, Marek Domanski |
Abstract | The paper presents a novel approach to the occlusion handling problem in depth estimation using three views. A solution based on modification of the similarity cost function is proposed. During depth estimation via optimization algorithms like Graph Cut, the similarity metric is constantly updated so that only non-occluded fragments in the side views are considered. At each iteration of the algorithm, non-occluded fragments are detected based on side-view virtual depth maps synthesized from the best currently estimated depth map of the center view. Then the similarity metric is updated for correspondence search only in non-occluded regions of the side views. The experimental results, obtained on well-known 3D video test sequences, show that the depth maps estimated with the proposed approach provide about 1.25 dB of virtual view quality improvement in comparison to virtual views synthesized from depth maps generated by the state-of-the-art MPEG Depth Estimation Reference Software. |
Tasks | Depth Estimation |
Published | 2017-03-02 |
URL | http://arxiv.org/abs/1703.00919v2 |
http://arxiv.org/pdf/1703.00919v2.pdf | |
PWC | https://paperswithcode.com/paper/depth-estimation-using-modified-cost-function |
Repo | |
Framework | |
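The modified cost can be pictured as averaging per-pixel similarity only over side views currently marked non-occluded. In the sketch below the occlusion masks are simply given (in the paper they would come from virtual depth maps synthesized from the current centre-view estimate), and the function and variable names are illustrative assumptions.

```python
# Sketch (assumption: precomputed occlusion masks; absolute difference as similarity).
import numpy as np

def occlusion_aware_cost(center_patch, side_patches, occlusion_masks):
    """center_patch: (H, W); side_patches, occlusion_masks: (V, H, W), mask=1 if visible."""
    diffs = np.abs(side_patches - center_patch[None])      # per-view matching error
    visible = occlusion_masks.sum(axis=0)                   # how many views see each pixel
    cost = (diffs * occlusion_masks).sum(axis=0) / np.maximum(visible, 1)
    cost[visible == 0] = np.inf                              # no evidence -> reject hypothesis
    return cost

center = np.random.rand(5, 5)
sides = np.random.rand(2, 5, 5)
masks = np.ones((2, 5, 5))
masks[1, :, :2] = 0                                          # left columns occluded in view 2
print(occlusion_aware_cost(center, sides, masks).shape)      # (5, 5)
```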
Fusion of Heterogeneous Data in Convolutional Networks for Urban Semantic Labeling (Invited Paper)
Title | Fusion of Heterogeneous Data in Convolutional Networks for Urban Semantic Labeling (Invited Paper) |
Authors | Nicolas Audebert, Bertrand Le Saux, Sébastien Lefèvre |
Abstract | In this work, we present a novel module to perform fusion of heterogeneous data using fully convolutional networks for semantic labeling. We introduce residual correction as a way to learn how to fuse predictions coming out of a dual stream architecture. Especially, we perform fusion of DSM and IRRG optical data on the ISPRS Vaihingen dataset over an urban area and obtain new state-of-the-art results. |
Tasks | |
Published | 2017-01-20 |
URL | http://arxiv.org/abs/1701.05818v1 |
http://arxiv.org/pdf/1701.05818v1.pdf | |
PWC | https://paperswithcode.com/paper/fusion-of-heterogeneous-data-in-convolutional |
Repo | |
Framework | |
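One way to read "residual correction" is: average the two streams' score maps and let a small convolutional module predict a residual from their concatenation. The PyTorch sketch below follows that reading; channel counts, kernel sizes, and the averaging step are assumptions rather than the architecture reported in the paper.

```python
# Sketch (assumptions: layer sizes, naive averaging as the base fusion).
import torch
import torch.nn as nn

class ResidualCorrection(nn.Module):
    def __init__(self, n_classes):
        super().__init__()
        self.correct = nn.Sequential(
            nn.Conv2d(2 * n_classes, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, n_classes, 3, padding=1))

    def forward(self, scores_irrg, scores_dsm):
        fused = 0.5 * (scores_irrg + scores_dsm)                 # naive average fusion
        residual = self.correct(torch.cat([scores_irrg, scores_dsm], dim=1))
        return fused + residual                                   # learned correction term

fusion = ResidualCorrection(n_classes=6)
a, b = torch.randn(1, 6, 64, 64), torch.randn(1, 6, 64, 64)
print(fusion(a, b).shape)                                         # torch.Size([1, 6, 64, 64])
```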
Gabor Convolutional Networks
Title | Gabor Convolutional Networks |
Authors | Shangzhen Luan, Baochang Zhang, Chen Chen, Xianbin Cao, Jungong Han, Jianzhuang Liu |
Abstract | Steerable properties dominate the design of traditional filters, e.g., Gabor filters, and endow features with the capability of dealing with spatial transformations. However, such excellent properties have not been well explored in the popular deep convolutional neural networks (DCNNs). In this paper, we propose a new deep model, termed Gabor Convolutional Networks (GCNs or Gabor CNNs), which incorporates Gabor filters into DCNNs to enhance the resistance of deep learned features to orientation and scale changes. By only manipulating the basic element of DCNNs based on Gabor filters, i.e., the convolution operator, GCNs can be easily implemented and are compatible with any popular deep learning architecture. Experimental results demonstrate the strong capability of our algorithm in recognizing objects where scale and rotation changes occur frequently. The proposed GCNs have much fewer learnable network parameters, and are thus easier to train with an end-to-end pipeline. |
Tasks | |
Published | 2017-05-03 |
URL | http://arxiv.org/abs/1705.01450v3 |
http://arxiv.org/pdf/1705.01450v3.pdf | |
PWC | https://paperswithcode.com/paper/gabor-convolutional-networks |
Repo | |
Framework | |
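The core manipulation can be sketched as building a small bank of oriented Gabor kernels and multiplying learned convolution weights element-wise with each orientation, so every learned filter spawns orientation-enhanced copies. The exact modulation and training scheme in the paper may differ; kernel parameters below are assumptions.

```python
# Sketch (assumptions: Gabor parameters, element-wise modulation of learned filters).
import numpy as np

def gabor_kernel(size=3, theta=0.0, sigma=1.0, lambd=2.0, gamma=0.5):
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    x_t = x * np.cos(theta) + y * np.sin(theta)
    y_t = -x * np.sin(theta) + y * np.cos(theta)
    return np.exp(-(x_t**2 + (gamma * y_t)**2) / (2 * sigma**2)) * np.cos(2 * np.pi * x_t / lambd)

orientations = [k * np.pi / 4 for k in range(4)]
bank = np.stack([gabor_kernel(theta=t) for t in orientations])   # (4, 3, 3) filter bank

learned = np.random.randn(16, 3, 3)                               # 16 learned 3x3 filters
modulated = learned[:, None] * bank[None]                          # (16, 4, 3, 3) oriented copies
print(modulated.shape)
```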
The ParallelEye Dataset: Constructing Large-Scale Artificial Scenes for Traffic Vision Research
Title | The ParallelEye Dataset: Constructing Large-Scale Artificial Scenes for Traffic Vision Research |
Authors | Xuan Li, Kunfeng Wang, Yonglin Tian, Lan Yan, Fei-Yue Wang |
Abstract | Video image datasets are playing an essential role in design and evaluation of traffic vision algorithms. Nevertheless, a longstanding inconvenience concerning image datasets is that manually collecting and annotating large-scale diversified datasets from real scenes is time-consuming and prone to error. For that reason, virtual datasets have begun to function as a proxy for real datasets. In this paper, we propose to construct large-scale artificial scenes for traffic vision research and generate a new virtual dataset called “ParallelEye”. First of all, the street map data is used to build a 3D scene model of Zhongguancun Area, Beijing. Then, computer graphics, virtual reality, and rule-modeling technologies are utilized to synthesize large-scale, realistic virtual urban traffic scenes, in which the fidelity and geography match the real world well. Furthermore, the Unity3D platform is used to render the artificial scenes and generate accurate ground-truth labels, e.g., semantic/instance segmentation, object bounding box, object tracking, optical flow, and depth. The environmental conditions in artificial scenes can be controlled completely. As a result, we present a viable implementation pipeline for constructing large-scale artificial scenes for traffic vision research. The experimental results demonstrate that this pipeline is able to generate photorealistic virtual datasets with low modeling time and high-accuracy labeling. |
Tasks | Instance Segmentation, Object Tracking, Optical Flow Estimation, Semantic Segmentation |
Published | 2017-12-22 |
URL | http://arxiv.org/abs/1712.08394v1 |
http://arxiv.org/pdf/1712.08394v1.pdf | |
PWC | https://paperswithcode.com/paper/the-paralleleye-dataset-constructing-large |
Repo | |
Framework | |
An embedded segmental K-means model for unsupervised segmentation and clustering of speech
Title | An embedded segmental K-means model for unsupervised segmentation and clustering of speech |
Authors | Herman Kamper, Karen Livescu, Sharon Goldwater |
Abstract | Unsupervised segmentation and clustering of unlabelled speech are core problems in zero-resource speech processing. Most approaches lie at methodological extremes: some use probabilistic Bayesian models with convergence guarantees, while others opt for more efficient heuristic techniques. Despite competitive performance in previous work, the full Bayesian approach is difficult to scale to large speech corpora. We introduce an approximation to a recent Bayesian model that still has a clear objective function but improves efficiency by using hard clustering and segmentation rather than full Bayesian inference. Like its Bayesian counterpart, this embedded segmental K-means model (ES-KMeans) represents arbitrary-length word segments as fixed-dimensional acoustic word embeddings. We first compare ES-KMeans to previous approaches on common English and Xitsonga data sets (5 and 2.5 hours of speech): ES-KMeans outperforms a leading heuristic method in word segmentation, giving similar scores to the Bayesian model while being 5 times faster with fewer hyperparameters. However, its clusters are less pure than those of the other models. We then show that ES-KMeans scales to larger corpora by applying it to the 5 languages of the Zero Resource Speech Challenge 2017 (up to 45 hours), where it performs competitively compared to the challenge baseline. |
Tasks | Bayesian Inference, Word Embeddings |
Published | 2017-03-23 |
URL | http://arxiv.org/abs/1703.08135v2 |
http://arxiv.org/pdf/1703.08135v2.pdf | |
PWC | https://paperswithcode.com/paper/an-embedded-segmental-k-means-model-for |
Repo | |
Framework | |
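The alternating hard-clustering loop of ES-KMeans can be sketched as follows: segments are embedded as fixed-dimensional vectors (here simply the mean frame, rather than the paper's downsampling embedding), a K-means step updates centroids, and a dynamic-programming step re-segments each utterance so that segments lie close to their nearest centroid. The segment cost, helper names, and data are assumptions, not the paper's exact formulation.

```python
# Toy sketch (assumptions: mean-frame embeddings, unweighted segment cost).
import numpy as np

def embed(frames, i, j):
    return frames[i:j].mean(axis=0)                  # crude acoustic word embedding

def resegment(frames, centroids, max_len=10):
    n = len(frames)
    best = np.full(n + 1, np.inf)
    best[0] = 0.0
    back = np.zeros(n + 1, dtype=int)
    for j in range(1, n + 1):
        for i in range(max(0, j - max_len), j):
            cost = best[i] + np.min(np.linalg.norm(centroids - embed(frames, i, j), axis=1))
            if cost < best[j]:
                best[j], back[j] = cost, i
    bounds, j = [], n
    while j > 0:                                      # trace back the chosen boundaries
        bounds.append((back[j], j))
        j = back[j]
    return bounds[::-1]

rng = np.random.default_rng(0)
frames = rng.normal(size=(40, 13))                    # one utterance of MFCC-like frames
centroids = rng.normal(size=(5, 13))                  # K = 5 word clusters
for _ in range(3):                                    # alternate segmentation and K-means update
    segs = resegment(frames, centroids)
    embs = np.stack([embed(frames, i, j) for i, j in segs])
    assign = np.argmin(np.linalg.norm(embs[:, None] - centroids[None], axis=2), axis=1)
    for k in range(len(centroids)):
        if np.any(assign == k):
            centroids[k] = embs[assign == k].mean(axis=0)
print(segs)
```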
A Survey of Parallel A*
Title | A Survey of Parallel A* |
Authors | Alex Fukunaga, Adi Botea, Yuu Jinnai, Akihiro Kishimoto |
Abstract | A* is a best-first search algorithm for finding optimal-cost paths in graphs. A* benefits significantly from parallelism because, in many applications, A* is limited by memory usage; distributed-memory implementations of A* that use the aggregate memory of a cluster can solve problems that are out of reach for serial, single-machine implementations. We survey approaches to parallel A*, focusing on decentralized approaches to A* which partition the state space among processors. We also survey approaches to parallel, limited-memory variants of A* such as parallel IDA*. |
Tasks | |
Published | 2017-08-16 |
URL | http://arxiv.org/abs/1708.05296v1 |
http://arxiv.org/pdf/1708.05296v1.pdf | |
PWC | https://paperswithcode.com/paper/a-survey-of-parallel-a |
Repo | |
Framework | |
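The decentralized partitioning the survey focuses on (as in hash-distributed A*, HDA*) assigns each generated state to an owner process by hashing, so every process maintains only its own open and closed lists and forwards successors to their owners. The sketch below shows only this routing step; communication and termination detection are omitted, and the state representation is a hypothetical example.

```python
# Sketch (assumption: md5 of repr(state) as the partitioning hash).
import hashlib

def owner(state, num_procs):
    digest = hashlib.md5(repr(state).encode()).hexdigest()
    return int(digest, 16) % num_procs

def route_successors(successors, num_procs):
    outboxes = {p: [] for p in range(num_procs)}
    for state, g, h in successors:
        outboxes[owner(state, num_procs)].append((g + h, g, state))   # priority = f = g + h
    return outboxes

succ = [((3, 4), 5, 7), ((2, 1), 6, 3), ((0, 0), 4, 9)]
print(route_successors(succ, num_procs=4))
```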
Macro Grammars and Holistic Triggering for Efficient Semantic Parsing
Title | Macro Grammars and Holistic Triggering for Efficient Semantic Parsing |
Authors | Yuchen Zhang, Panupong Pasupat, Percy Liang |
Abstract | To learn a semantic parser from denotations, a learning algorithm must search over a combinatorially large space of logical forms for ones consistent with the annotated denotations. We propose a new online learning algorithm that searches faster as training progresses. The two key ideas are using macro grammars to cache the abstract patterns of useful logical forms found thus far, and holistic triggering to efficiently retrieve the most relevant patterns based on sentence similarity. On the WikiTableQuestions dataset, we first expand the search space of an existing model to improve the state-of-the-art accuracy from 38.7% to 42.7%, and then use macro grammars and holistic triggering to achieve an 11x speedup and an accuracy of 43.7%. |
Tasks | Semantic Parsing |
Published | 2017-07-25 |
URL | http://arxiv.org/abs/1707.07806v2 |
http://arxiv.org/pdf/1707.07806v2.pdf | |
PWC | https://paperswithcode.com/paper/macro-grammars-and-holistic-triggering-for |
Repo | |
Framework | |
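As a toy rendering of the two ideas above: a "macro" caches the shape of a useful logical form with its constants abstracted away, and holistic triggering retrieves the macros whose training sentences most resemble the new utterance (here, plain token overlap). Logical forms are strings for illustration only; the real system operates over a typed grammar of logical forms, and the example data is made up.

```python
# Sketch (assumptions: string logical forms, token-overlap similarity).
def to_macro(logical_form, constants):
    for i, c in enumerate(constants):
        logical_form = logical_form.replace(c, f"${i}")   # abstract constants to slots
    return logical_form

def trigger(sentence, macro_bank, top_k=2):
    toks = set(sentence.lower().split())
    scored = sorted(macro_bank,
                    key=lambda m: -len(toks & set(m["sentence"].lower().split())))
    return scored[:top_k]                                   # most similar cached macros

macro_bank = [
    {"sentence": "which nation won the most gold medals",
     "macro": to_macro("argmax(Nation, count(Gold))", ["Nation", "Gold"])},
    {"sentence": "how many athletes are from italy",
     "macro": to_macro("count(Athlete.from(Italy))", ["Athlete", "Italy"])},
]
print(trigger("which country won the most silver medals", macro_bank, top_k=1))
```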
Automatic Extrinsic Calibration for Lidar-Stereo Vehicle Sensor Setups
Title | Automatic Extrinsic Calibration for Lidar-Stereo Vehicle Sensor Setups |
Authors | Carlos Guindel, Jorge Beltrán, David Martín, Fernando García |
Abstract | Sensor setups consisting of a combination of 3D range scanner lasers and stereo vision systems are becoming a popular choice for on-board perception systems in vehicles; however, the combined use of both sources of information implies a tedious calibration process. We present a method for extrinsic calibration of lidar-stereo camera pairs without user intervention. Our calibration approach is designed to cope with the constraints commonly found in automotive setups, such as low resolution and specific sensor poses. To demonstrate the performance of our method, we also introduce a novel approach for the quantitative assessment of the calibration results, based on a simulation environment. Tests using real devices have been conducted as well, proving the usability of the system and the improvement over the existing approaches. Code is available at http://wiki.ros.org/velo2cam_calibration |
Tasks | Calibration |
Published | 2017-05-11 |
URL | http://arxiv.org/abs/1705.04085v3 |
http://arxiv.org/pdf/1705.04085v3.pdf | |
PWC | https://paperswithcode.com/paper/automatic-extrinsic-calibration-for-lidar |
Repo | |
Framework | |
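For readers new to extrinsic calibration, a common building block once corresponding 3D points have been detected in both sensors (e.g., calibration-target keypoints in the lidar cloud and in the stereo reconstruction) is the closed-form Kabsch/SVD solution for the rigid transform that aligns them. This is a standard technique sketched under that assumption, not necessarily the specific algorithm of this paper; the released velo2cam_calibration code is the authoritative reference.

```python
# Generic Kabsch/SVD rigid alignment (assumption: point correspondences already found).
import numpy as np

def rigid_transform(P_lidar, P_stereo):
    """Return R, t such that R @ P_lidar[i] + t ~= P_stereo[i]."""
    cl, cs = P_lidar.mean(axis=0), P_stereo.mean(axis=0)
    H = (P_lidar - cl).T @ (P_stereo - cs)            # cross-covariance of centered points
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:                           # fix a possible reflection
        Vt[-1] *= -1
        R = Vt.T @ U.T
    t = cs - R @ cl
    return R, t

pts_lidar = np.random.rand(10, 3)
R_true, t_true = np.eye(3), np.array([0.5, -0.2, 1.0])
pts_stereo = pts_lidar @ R_true.T + t_true
R, t = rigid_transform(pts_lidar, pts_stereo)
print(np.allclose(R, R_true, atol=1e-6), np.allclose(t, t_true, atol=1e-6))
```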