Paper Group ANR 538
VoxML: A Visualization Modeling Language. Robust Moving Objects Detection in Lidar Data Exploiting Visual Cues. Brain4Cars: Car That Knows Before You Do via Sensory-Fusion Deep Learning Architecture. An Image Dataset of Text Patches in Everyday Scenes. Robust Image Descriptors for Real-Time Inter-Examination Retargeting in Gastrointestinal Endoscop …
VoxML: A Visualization Modeling Language
Title | VoxML: A Visualization Modeling Language |
Authors | James Pustejovsky, Nikhil Krishnaswamy |
Abstract | We present the specification for a modeling language, VoxML, which encodes semantic knowledge of real-world objects represented as three-dimensional models, and of events and attributes related to and enacted over these objects. VoxML is intended to overcome the limitations of existing 3D visual markup languages by allowing for the encoding of a broad range of semantic knowledge that can be exploited by a variety of systems and platforms, leading to multimodal simulations of real-world scenarios using conceptual objects that represent their semantic values. |
Tasks | |
Published | 2016-10-05 |
URL | http://arxiv.org/abs/1610.01508v1 |
PDF | http://arxiv.org/pdf/1610.01508v1.pdf |
PWC | https://paperswithcode.com/paper/voxml-a-visualization-modeling-language |
Repo | |
Framework | |
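The abstract describes encoding semantic knowledge about 3D objects and the events they take part in. The sketch below shows one way a simulation system might hold such an object record in memory. The class name `Voxeme`, its fields, and the affordance-string format are hypothetical simplifications chosen for illustration. They are not the normative VoxML schema.

```python
from dataclasses import dataclass, field


@dataclass
class Voxeme:
    """Hypothetical, simplified record for one VoxML-style object.

    Field names are illustrative assumptions, not the VoxML specification.
    """
    pred: str                                        # lexical predicate, e.g. "cup"
    head: str                                        # coarse geometric type
    components: list = field(default_factory=list)   # named sub-parts
    concavity: str = "flat"                          # "concave" if it can contain things
    affordances: list = field(default_factory=list)  # events the object enables, as strings
    movable: bool = True

    def affords(self, event: str) -> bool:
        """True if some affordance string has `event` as its predicate name."""
        return any(a.split("(")[0] == event for a in self.affordances)


cup = Voxeme(
    pred="cup",
    head="cylindroid",
    components=["surface", "interior"],
    concavity="concave",
    affordances=["grasp(agent, cup)", "contain(cup, liquid)"],
)
print(cup.affords("contain"), cup.affords("roll"))  # True False
```

Storing affordances alongside geometry lets a planner ask what an object is for, as well as what it looks like. That is the kind of semantic knowledge the abstract says existing 3D markup languages lack.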
Robust Moving Objects Detection in Lidar Data Exploiting Visual Cues
Title | Robust Moving Objects Detection in Lidar Data Exploiting Visual Cues |
Authors | Gheorghii Postica, Andrea Romanoni, Matteo Matteucci |
Abstract | Detecting moving objects in dynamic scenes from sequences of lidar scans is an important task in object tracking, mapping, localization, and navigation. Many works focus on change detection in previously observed scenes, while very little literature addresses moving object detection. The state-of-the-art method exploits Dempster-Shafer Theory to evaluate the occupancy of a lidar scan and to discriminate points belonging to the static scene from moving ones. In this paper we improve both the speed and the accuracy of this method by discretizing the occupancy representation and by removing false positives through visual cues. Many false positives lying on the ground plane are also removed by a novel ground plane removal algorithm. Efficiency is improved through an octree indexing strategy. Experimental evaluation on the public KITTI dataset shows the effectiveness of our approach, both qualitatively and quantitatively, with respect to the state of the art. |
Tasks | Object Tracking |
Published | 2016-09-29 |
URL | http://arxiv.org/abs/1609.09267v1 |
PDF | http://arxiv.org/pdf/1609.09267v1.pdf |
PWC | https://paperswithcode.com/paper/robust-moving-objects-detection-in-lidar-data |
Repo | |
Framework | |
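The method builds on Dempster-Shafer Theory to separate static points from moving ones. The sketch below applies Dempster's rule of combination to a single cell. It assumes a frame of two hypotheses (free, occupied), with leftover mass assigned to "unknown". The mass values and the conflict threshold are hypothetical. The paper's discretized occupancy representation, visual cues, ground-plane removal, and octree indexing are not shown.

```python
def combine(m1, m2):
    """Dempster's rule for two mass assignments (free, occupied, unknown).

    'unknown' is the mass on the whole frame {free, occupied}.
    Returns the fused masses and the conflict K between the two sources.
    """
    f1, o1, u1 = m1
    f2, o2, u2 = m2
    conflict = f1 * o2 + o1 * f2              # one source says free, the other occupied
    if conflict >= 1.0:
        raise ValueError("total conflict: Dempster's rule is undefined")
    norm = 1.0 - conflict
    free = (f1 * f2 + f1 * u2 + u1 * f2) / norm
    occupied = (o1 * o2 + o1 * u2 + u1 * o2) / norm
    unknown = (u1 * u2) / norm
    return (free, occupied, unknown), conflict


# Hypothetical masses for the same cell as seen in two scans.
seen_occupied = (0.0, 0.8, 0.2)
seen_free = (0.7, 0.0, 0.3)
fused, k = combine(seen_occupied, seen_free)
CONFLICT_THRESHOLD = 0.5                      # hypothetical tuning parameter
print(round(k, 2), k > CONFLICT_THRESHOLD)    # 0.56 True -> candidate moving object
```

A cell that one scan sees as occupied and another sees as free produces high conflict. Such a cell is a natural candidate for a moving object, and the false-positive filtering described in the abstract would then be applied to candidates like it.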
Brain4Cars: Car That Knows Before You Do via Sensory-Fusion Deep Learning Architecture
Title | Brain4Cars: Car That Knows Before You Do via Sensory-Fusion Deep Learning Architecture |
Authors | Ashesh Jain, Hema S Koppula, Shane Soh, Bharad Raghavan, Avi Singh, Ashutosh Saxena |
Abstract | Advanced Driver Assistance Systems (ADAS) have made driving safer over the last decade. They prepare vehicles for unsafe road conditions and alert drivers if they perform a dangerous maneuver. However, many accidents are unavoidable because by the time drivers are alerted, it is already too late. Anticipating maneuvers beforehand can alert drivers before they perform the maneuver and also give ADAS more time to avoid or prepare for the danger. In this work we propose a vehicular sensor-rich platform and learning algorithms for maneuver anticipation. For this purpose we equip a car with cameras, Global Positioning System (GPS), and a computing device to capture the driving context from both inside and outside of the car. In order to anticipate maneuvers, we propose a sensory-fusion deep learning architecture which jointly learns to anticipate and fuse multiple sensory streams. Our architecture consists of Recurrent Neural Networks (RNNs) that use Long Short-Term Memory (LSTM) units to capture long temporal dependencies. We propose a novel training procedure which allows the network to predict the future given only a partial temporal context. We introduce a diverse data set with 1180 miles of natural freeway and city driving, and show that we can anticipate maneuvers 3.5 seconds before they occur in real-time with a precision and recall of 90.5% and 87.4% respectively. |
Tasks | |
Published | 2016-01-05 |
URL | http://arxiv.org/abs/1601.00740v1 |
PDF | http://arxiv.org/pdf/1601.00740v1.pdf |
PWC | https://paperswithcode.com/paper/brain4cars-car-that-knows-before-you-do-via |
Repo | |
Framework | |
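The abstract mentions a training procedure that lets the network predict a maneuver from only a partial temporal context. One way to encourage this is to apply the loss at every time step and weight later steps more heavily, so early mistakes are tolerated but still penalized. The sketch below assumes exponential weights. It is not necessarily the paper's exact loss.

```python
import math


def anticipation_loss(p_true):
    """Time-weighted negative log-likelihood for one driving sequence.

    p_true[t] is the predicted probability of the correct maneuver after
    seeing frames 0..t. Step t gets weight exp(-(T-1-t)), so the final
    step counts fully and earlier (partial-context) steps still contribute.
    The exponential weighting is an assumption for illustration.
    """
    T = len(p_true)
    loss = 0.0
    for t, p in enumerate(p_true):
        if not 0.0 < p <= 1.0:
            raise ValueError("probabilities must lie in (0, 1]")
        loss -= math.exp(-(T - 1 - t)) * math.log(p)
    return loss


late = anticipation_loss([0.2, 0.3, 0.9])    # confident only at the end
early = anticipation_loss([0.7, 0.8, 0.9])   # confident from partial context
print(early < late)  # True
```

A model that commits to the right maneuver earlier receives a lower loss. This matches the goal of alerting the driver seconds before the maneuver happens.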
An Image Dataset of Text Patches in Everyday Scenes
Title | An Image Dataset of Text Patches in Everyday Scenes |
Authors | Ahmed Ibrahim, A. Lynn Abbott, Mohamed E. Hussein |
Abstract | This paper describes a dataset containing small images of text from everyday scenes. The purpose of the dataset is to support the development of new automated systems that can detect and analyze text. Although much research has been devoted to text detection and recognition in scanned documents, relatively little attention has been given to text detection in other types of images, such as photographs that are posted on social-media sites. This new dataset, known as COCO-Text-Patch, contains approximately 354,000 small images that are each labeled as “text” or “non-text”. This dataset particularly addresses the problem of text verification, which is an essential stage in the end-to-end text detection and recognition pipeline. To evaluate the utility of this dataset, it has been used to train two deep convolutional neural networks to distinguish text from non-text. One network is inspired by the GoogLeNet architecture, and the second is based on CaffeNet. Accuracy levels of 90.2% and 90.9% were obtained using the two networks, respectively. All of the images, source code, and trained deep-learning models described in this paper will be made publicly available. |
Tasks | |
Published | 2016-10-20 |
URL | http://arxiv.org/abs/1610.06494v1 |
PDF | http://arxiv.org/pdf/1610.06494v1.pdf |
PWC | https://paperswithcode.com/paper/an-image-dataset-of-text-patches-in-everyday |
Repo | |
Framework | |
Robust Image Descriptors for Real-Time Inter-Examination Retargeting in Gastrointestinal Endoscopy
Title | Robust Image Descriptors for Real-Time Inter-Examination Retargeting in Gastrointestinal Endoscopy |
Authors | Menglong Ye, Edward Johns, Benjamin Walter, Alexander Meining, Guang-Zhong Yang |
Abstract | For early diagnosis of malignancies in the gastrointestinal tract, surveillance endoscopy is increasingly used to monitor abnormal tissue changes in serial examinations of the same patient. Despite successes with optical biopsy for in vivo and in situ tissue characterisation, biopsy retargeting for serial examinations is challenging because tissue may change in appearance between examinations. In this paper, we propose an inter-examination retargeting framework for optical biopsy, based on an image descriptor designed for matching between endoscopic scenes over significant time intervals. Each scene is described by a hierarchy of regional intensity comparisons at various scales, offering tolerance to long-term change in tissue appearance whilst remaining discriminative. Binary coding is then used to compress the descriptor via a novel random forests approach, providing fast comparisons in Hamming space and real-time retargeting. Extensive validation conducted on 13 in vivo gastrointestinal videos, collected from six patients, shows that our approach outperforms state-of-the-art methods. |
Tasks | |
Published | 2016-05-18 |
URL | http://arxiv.org/abs/1605.05757v2 |
PDF | http://arxiv.org/pdf/1605.05757v2.pdf |
PWC | https://paperswithcode.com/paper/robust-image-descriptors-for-real-time-inter |
Repo | |
Framework | |
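The descriptor in this abstract is built from regional intensity comparisons at several scales and is compared in Hamming space. The sketch below illustrates that idea under three assumptions: equal-size grid regions, the mean intensity of each region, and all pairwise comparisons at each scale. The paper's actual region layout may differ, and its random-forest binary coding step is not reproduced.

```python
def region_means(img, s):
    """Mean intensity of each cell in an s x s grid over a 2-D list `img`."""
    h, w = len(img), len(img[0])
    rh, rw = h // s, w // s
    means = []
    for gy in range(s):
        for gx in range(s):
            block = [img[y][x]
                     for y in range(gy * rh, (gy + 1) * rh)
                     for x in range(gx * rw, (gx + 1) * rw)]
            means.append(sum(block) / len(block))
    return means


def describe(img, scales=(2, 4)):
    """Binary descriptor: one bit per pair of regions at each scale (bit = 'i brighter than j')."""
    bits, n = 0, 0
    for s in scales:
        m = region_means(img, s)
        for i in range(len(m)):
            for j in range(i + 1, len(m)):
                bits |= int(m[i] > m[j]) << n
                n += 1
    return bits, n


def hamming(a, b):
    """Number of differing bits between two descriptors."""
    return bin(a ^ b).count("1")


img = [[x + 8 * y for x in range(8)] for y in range(8)]
brighter = [[2 * v + 10 for v in row] for row in img]   # global intensity change
flipped = [row[::-1] for row in img]                     # genuinely different layout
d0, n = describe(img)
print(n, hamming(d0, describe(brighter)[0]), hamming(d0, describe(flipped)[0]) > 0)
# 126 0 True
```

Each bit records only which of two regions is brighter. A global monotonic intensity change therefore leaves the descriptor unchanged, which is one way such comparisons can tolerate changes in appearance.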
Character-Level Incremental Speech Recognition with Recurrent Neural Networks
Title | Character-Level Incremental Speech Recognition with Recurrent Neural Networks |
Authors | Kyuyeon Hwang, Wonyong Sung |
Abstract | In real-time speech recognition applications, latency is an important issue. We have developed a character-level incremental speech recognition (ISR) system that responds quickly even while the user is still speaking, gradually improving its hypotheses as the speech proceeds. The algorithm employs a speech-to-character unidirectional recurrent neural network (RNN), which is trained end-to-end with connectionist temporal classification (CTC), and an RNN-based character-level language model (LM). The output values of the CTC-trained RNN are character-level probabilities, which are processed by beam search decoding. The RNN LM augments the decoding by providing long-term dependency information. We propose tree-based online beam search with additional depth-pruning, which enables the system to process infinitely long input speech with low latency. This system not only responds quickly to speech but can also dictate out-of-vocabulary (OOV) words according to their pronunciation. The proposed model achieves a word error rate (WER) of 8.90% on the Wall Street Journal (WSJ) Nov’92 20K evaluation set when trained on the WSJ SI-284 training set. |
Tasks | Language Modelling, Speech Recognition |
Published | 2016-01-25 |
URL | http://arxiv.org/abs/1601.06581v2 |
PDF | http://arxiv.org/pdf/1601.06581v2.pdf |
PWC | https://paperswithcode.com/paper/character-level-incremental-speech |
Repo | |
Framework | |
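The system decodes CTC outputs with tree-based beam search guided by an RNN language model. The sketch below shows only the simpler best-path (greedy) rule that underlies reading a CTC output: take the most likely symbol in each frame, merge repeats, and drop blanks. It is not the paper's decoder. The alphabet and frame probabilities are hypothetical.

```python
def ctc_greedy_decode(frame_probs, alphabet, blank=0):
    """Best-path CTC decoding: take the argmax per frame, merge repeats, drop blanks."""
    best = [max(range(len(p)), key=p.__getitem__) for p in frame_probs]
    out, prev = [], None
    for k in best:
        if k != prev and k != blank:
            out.append(alphabet[k])
        prev = k
    return "".join(out)


alphabet = ["_", "a", "b"]   # index 0 is the CTC blank
frames = [[0.1, 0.8, 0.1], [0.2, 0.7, 0.1], [0.9, 0.05, 0.05],
          [0.1, 0.8, 0.1], [0.1, 0.1, 0.8], [0.2, 0.1, 0.7]]
print(ctc_greedy_decode(frames, alphabet))  # aab
```

The blank frame between the two "a" runs is what allows a doubled letter to survive the merge step. A beam search keeps several such prefixes alive and rescores them with the language model instead of committing to one argmax per frame.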
Learning the Semantics of Structured Data Sources
Title | Learning the Semantics of Structured Data Sources |
Authors | Mohsen Taheriyan, Craig A. Knoblock, Pedro Szekely, Jose Luis Ambite |
Abstract | Information sources such as relational databases, spreadsheets, XML, JSON, and Web APIs contain a tremendous amount of structured data that can be leveraged to build and augment knowledge graphs. However, they rarely provide a semantic model to describe their contents. Semantic models of data sources represent the implicit meaning of the data by specifying the concepts and the relationships within the data. Such models are the key ingredients to automatically publish the data into knowledge graphs. Manually modeling the semantics of data sources requires significant effort and expertise, and although desirable, building these models automatically is a challenging problem. Most of the related work focuses on semantic annotation of the data fields (source attributes). However, constructing a semantic model that explicitly describes the relationships between the attributes in addition to their semantic types is critical. We present a novel approach that exploits the knowledge from a domain ontology and the semantic models of previously modeled sources to automatically learn a rich semantic model for a new source. This model represents the semantics of the new source in terms of the concepts and relationships defined by the domain ontology. Given some sample data from the new source, we leverage the knowledge in the domain ontology and the known semantic models to construct a weighted graph that represents the space of plausible semantic models for the new source. Then, we compute the top k candidate semantic models and suggest to the user a ranked list of the semantic models for the new source. The approach takes into account user corrections to learn more accurate semantic models on future data sources. Our evaluation shows that our method generates expressive semantic models for data sources and services with minimal user input. … |
Tasks | Knowledge Graphs |
Published | 2016-01-16 |
URL | http://arxiv.org/abs/1601.04105v1 |
PDF | http://arxiv.org/pdf/1601.04105v1.pdf |
PWC | https://paperswithcode.com/paper/learning-the-semantics-of-structured-data |
Repo | |
Framework | |
Recognizing Implicit Discourse Relations via Repeated Reading: Neural Networks with Multi-Level Attention
Title | Recognizing Implicit Discourse Relations via Repeated Reading: Neural Networks with Multi-Level Attention |
Authors | Yang Liu, Sujian Li |
Abstract | Recognizing implicit discourse relations is a challenging but important task in the field of Natural Language Processing. For such a complex text processing task, and unlike previous studies, we argue that it is necessary to repeatedly read the arguments and dynamically exploit the features useful for recognizing discourse relations. To mimic this repeated reading strategy, we propose neural networks with multi-level attention (NNMA), which combine the attention mechanism with external memories to gradually focus attention on the specific words that help in judging the discourse relation. Experiments on the PDTB dataset show that our proposed method achieves state-of-the-art results. Visualizing the attention weights also illustrates how our model observes the arguments at each level and progressively locates the important words. |
Tasks | |
Published | 2016-09-20 |
URL | http://arxiv.org/abs/1609.06380v1 |
PDF | http://arxiv.org/pdf/1609.06380v1.pdf |
PWC | https://paperswithcode.com/paper/recognizing-implicit-discourse-relations-via |
Repo | |
Framework | |
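The "repeated reading" idea in this abstract can be pictured as attention applied several times, where each level refines a memory vector that steers the next level. The sketch below is a minimal version that assumes plain dot-product scoring and additive memory updates. The paper's learned parameters, encoders, and gating are not modeled. The word vectors and query are hypothetical.

```python
import math


def softmax(xs):
    mx = max(xs)
    es = [math.exp(x - mx) for x in xs]
    total = sum(es)
    return [e / total for e in es]


def multi_level_attention(words, query, levels=3):
    """Attend over word vectors `levels` times, refining a memory vector each time.

    Each level scores every word by a dot product with the current memory, then
    adds the attention-weighted summary of the words back into the memory.
    """
    memory = list(query)
    history = []
    for _ in range(levels):
        alpha = softmax([sum(w_d * m_d for w_d, m_d in zip(w, memory)) for w in words])
        summary = [sum(a * w[d] for a, w in zip(alpha, words)) for d in range(len(memory))]
        memory = [m + s for m, s in zip(memory, summary)]
        history.append(alpha)
    return memory, history


# Hypothetical 2-d word vectors for a three-word argument, and an initial query.
words = [[1.0, 0.0], [0.0, 1.0], [3.0, 0.0]]
memory, history = multi_level_attention(words, query=[1.0, 0.0])
for level, alpha in enumerate(history, 1):
    print(level, [round(a, 3) for a in alpha])   # weight on the third word grows per level
```

Because the memory absorbs what the previous level attended to, attention sharpens on the most relevant word from level to level. This is the qualitative behavior the abstract reports from its attention visualizations.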
Using Non-invertible Data Transformations to Build Adversarial-Robust Neural Networks
Title | Using Non-invertible Data Transformations to Build Adversarial-Robust Neural Networks |
Authors | Qinglong Wang, Wenbo Guo, Alexander G. Ororbia II, Xinyu Xing, Lin Lin, C. Lee Giles, Xue Liu, Peng Liu, Gang Xiong |
Abstract | Deep neural networks have proven to be quite effective in a wide variety of machine learning tasks, ranging from improved speech recognition systems to advancing the development of autonomous vehicles. However, despite their superior performance in many applications, these models have recently been shown to be susceptible to a particular type of attack that relies on generating synthetic examples referred to as adversarial samples. These samples are constructed by manipulating real examples from the training data distribution in order to “fool” the original neural model, resulting in misclassification (with high confidence) of previously correctly classified samples. Addressing this weakness is of utmost importance if deep neural architectures are to be applied to critical applications, such as those in the domain of cybersecurity. In this paper, we present an analysis of this fundamental flaw lurking in all neural architectures to uncover limitations of previously proposed defense mechanisms. More importantly, we present a unifying framework for protecting deep neural models using a non-invertible data transformation, developing two adversary-resilient architectures that utilize linear and nonlinear dimensionality reduction, respectively. Empirical results indicate that our framework provides better robustness than state-of-the-art solutions with negligible degradation in accuracy. |
Tasks | Autonomous Vehicles, Dimensionality Reduction, Speech Recognition |
Published | 2016-10-06 |
URL | http://arxiv.org/abs/1610.01934v5 |
PDF | http://arxiv.org/pdf/1610.01934v5.pdf |
PWC | https://paperswithcode.com/paper/using-non-invertible-data-transformations-to |
Repo | |
Framework | |
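The key property of a non-invertible transformation is that many inputs share one representation. A perturbation along a discarded direction therefore never reaches the downstream model. The sketch below shows this for a linear reduction and assumes an orthonormal basis, such as top principal components. The basis and inputs are hypothetical, and the sketch says nothing about how well the paper's architectures resist actual attacks.

```python
import math


def project(x, basis):
    """Map a d-dim input to k coordinates using k orthonormal rows (k < d)."""
    return [sum(b_i * x_i for b_i, x_i in zip(b, x)) for b in basis]


r = 1 / math.sqrt(2)
basis = [[r, r, 0.0],        # hypothetical basis, e.g. the top-2 principal directions
         [0.0, 0.0, 1.0]]
x = [1.0, 2.0, 3.0]
x_adv = [1.5, 1.5, 3.0]      # x moved along (1, -1, 0), a direction the basis discards
z, z_adv = project(x, basis), project(x_adv, basis)
print(all(math.isclose(a, b) for a, b in zip(z, z_adv)))  # True
```

The two inputs differ, but a classifier that sees only `z` cannot tell them apart (up to floating-point rounding). An attacker therefore has fewer input directions available that can change the model's output.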
Matrix Neural Networks
Title | Matrix Neural Networks |
Authors | Junbin Gao, Yi Guo, Zhiyong Wang |
Abstract | Traditional neural networks assume vectorial inputs because the network is arranged in layers, each a single line of computing units called neurons. This structure requires non-vectorial inputs such as matrices to be converted into vectors, which can be problematic. First, the spatial information among elements of the data may be lost during vectorisation. Second, the solution space becomes very large, which demands special treatment of the network parameters and incurs a high computational cost. To address these issues, we propose matrix neural networks (MatNet), which take matrices directly as inputs. Each neuron senses summarised information through a bilinear mapping from lower-layer units, in exactly the same way as in classic feed-forward neural networks. Under this structure, backpropagation combined with gradient descent can be used to obtain the network parameters efficiently. Furthermore, MatNet can be conveniently extended to multimodal inputs. We apply MatNet to MNIST handwritten digit classification and to image super-resolution to show its effectiveness. Without much tuning, MatNet achieves performance comparable to state-of-the-art methods in both tasks with considerably reduced complexity. |
Tasks | Image Super-Resolution, Super-Resolution |
Published | 2016-01-15 |
URL | http://arxiv.org/abs/1601.03805v2 |
PDF | http://arxiv.org/pdf/1601.03805v2.pdf |
PWC | https://paperswithcode.com/paper/matrix-neural-networks |
Repo | |
Framework | |
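The abstract says each MatNet neuron receives information through a bilinear mapping of a matrix input. The sketch below assumes the common form act(U X Vᵀ + bias) for one layer and compares its parameter count with a fully connected layer on vectorised input. The layer sizes are hypothetical, and the paper's exact parameterization should be checked against the full text.

```python
import math


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def transpose(A):
    return [list(row) for row in zip(*A)]


def bilinear_layer(X, U, V, bias, act=math.tanh):
    """act(U X V^T + bias): X is m x n, U is p x m, V is q x n, bias is p x q."""
    Z = matmul(matmul(U, X), transpose(V))
    return [[act(z + b) for z, b in zip(zr, br)] for zr, br in zip(Z, bias)]


# Parameter count for a 28x28 input mapped to a 10x10 hidden layer:
m = n = 28
p = q = 10
print(p * m + q * n + p * q, (p * q) * (m * n) + p * q)  # 660 78500
```

The bilinear layer needs two small matrices instead of one weight per (input pixel, hidden unit) pair. This is one concrete reading of the "considerably reduced complexity" claimed in the abstract.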
Team-maxmin equilibrium: efficiency bounds and algorithms
Title | Team-maxmin equilibrium: efficiency bounds and algorithms |
Authors | Nicola Basilico, Andrea Celli, Giuseppe De Nittis, Nicola Gatti |
Abstract | The Team-maxmin equilibrium prescribes the optimal strategies for a team of rational players who share the same goal but cannot correlate their strategies, in strategic games against an adversary. This solution concept can capture situations in which an agent controls multiple resources (corresponding to the team members) that cannot communicate. It is known that such an equilibrium always exists and is unique (barring degeneracy), and these properties make it a credible solution concept for real-world applications, especially in security scenarios. Nevertheless, to the best of our knowledge, the Team-maxmin equilibrium is almost completely unexplored in the literature. In this paper, we investigate bounds on the (in)efficiency of the Team-maxmin equilibrium w.r.t. the Nash equilibria and w.r.t. the Maxmin equilibrium when the team members can play correlated strategies. Furthermore, we study a number of algorithms to find and/or approximate an equilibrium, discussing their theoretical guarantees and evaluating their performance on a standard testbed of game instances. |
Tasks | |
Published | 2016-11-18 |
URL | http://arxiv.org/abs/1611.06134v1 |
PDF | http://arxiv.org/pdf/1611.06134v1.pdf |
PWC | https://paperswithcode.com/paper/team-maxmin-equilibrium-efficiency-bounds-and |
Repo | |
Framework | |
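The inefficiency studied in the paper arises because uncorrelated team members can lose value compared with members who correlate their play. The sketch below computes a Team-maxmin value by brute-force grid search over each member's mixed strategy in a tiny, hypothetical game. It then compares that value with one correlated strategy. Grid search is used here for illustration only and is not one of the algorithms the paper studies.

```python
from itertools import product


def utility(a1, a2, b):
    # Hypothetical game: the team scores 1 when both members pick the same action
    # and the adversary picks a different one.
    return 1.0 if a1 == a2 and a2 != b else 0.0


def team_maxmin_grid(steps=100):
    """Max over independent mixed strategies of the adversary's best-response value."""
    best = (-1.0, None)
    for i in range(steps + 1):
        for j in range(steps + 1):
            p, q = i / steps, j / steps          # P(member 1 plays 0), P(member 2 plays 0)
            P1, P2 = (p, 1 - p), (q, 1 - q)
            worst = min(sum(P1[a1] * P2[a2] * utility(a1, a2, b)
                            for a1, a2 in product((0, 1), repeat=2))
                        for b in (0, 1))
            if worst > best[0]:
                best = (worst, (p, q))
    return best


def correlated_value(joint):
    """Worst-case value of a correlated team strategy given as {(a1, a2): prob}."""
    return min(sum(pr * utility(a1, a2, b) for (a1, a2), pr in joint.items())
               for b in (0, 1))


print(team_maxmin_grid(), correlated_value({(0, 0): 0.5, (1, 1): 0.5}))
# (0.25, (0.5, 0.5)) 0.5
```

In this game, correlating on "both play 0" or "both play 1" doubles the guaranteed value. Gaps of this kind are what efficiency bounds against the correlated Maxmin equilibrium measure.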
High-speed real-time single-pixel microscopy based on Fourier sampling
Title | High-speed real-time single-pixel microscopy based on Fourier sampling |
Authors | Qiang Guo, Hongwei Chen, Yuxi Wang, Yong Guo, Peng Liu, Xiurui Zhu, Zheng Cheng, Zhenming Yu, Minghua Chen, Sigang Yang, Shizhong Xie |
Abstract | Single-pixel cameras based on the concepts of compressed sensing (CS) leverage the inherent structure of images to retrieve them with far fewer measurements and operate efficiently over a significantly broader spectral range than conventional silicon-based cameras. Recently, the photonic time-stretch (PTS) technique has enabled high-speed single-pixel cameras, and this breakthrough in imaging speed allows single-pixel cameras to observe fast dynamic phenomena. However, according to CS theory, image reconstruction is an iterative process that consumes enormous amounts of computational time and cannot be performed in real time. To address this challenge, we propose a novel single-pixel imaging technique that can produce high-quality images through rapid acquisition of their effective spatial Fourier spectrum. We employ phase-shifting sinusoidal structured illumination instead of random illumination for spectrum acquisition and apply an inverse Fourier transform to the obtained spectrum for image restoration. We evaluate the performance of our prototype system by recognizing quick response (QR) codes and by flow cytometric screening of cells. A frame rate of 625 kHz and a compression ratio of 10% are experimentally demonstrated in accordance with the recognition rate of the QR code. An imaging flow cytometer enabling high-content screening with an unprecedented throughput of 100,000 cells/s is also demonstrated. For real-time imaging applications, the proposed single-pixel microscope can reduce the time required for image reconstruction by two orders of magnitude and can be widely applied in industrial quality control and label-free biomedical imaging. |
Tasks | Image Reconstruction, Image Restoration |
Published | 2016-06-15 |
URL | http://arxiv.org/abs/1606.05200v1 |
PDF | http://arxiv.org/pdf/1606.05200v1.pdf |
PWC | https://paperswithcode.com/paper/high-speed-real-time-single-pixel-microscopy |
Repo | |
Framework | |
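The acquisition scheme in this abstract (phase-shifting sinusoidal illumination followed by an inverse Fourier transform) can be simulated in a few lines. The sketch below assumes the common four-step phase-shifting scheme (phases 0, π/2, π, 3π/2) and a noiseless simulated detector. The paper's actual number of phase steps, its pattern design, and the time-stretch hardware are not modeled. The test scene is hypothetical.

```python
import cmath
import math


def pattern(fx, fy, phi, M, N, a=0.5, b=0.5):
    """Sinusoidal illumination a + b*cos(2*pi*(fx*x/M + fy*y/N) + phi), values in [0, 1]."""
    return [[a + b * math.cos(2 * math.pi * (fx * x / M + fy * y / N) + phi)
             for x in range(M)] for y in range(N)]


def measure(img, pat):
    """Single-pixel detector reading: total light = sum of scene * illumination."""
    return sum(i * p for ri, rp in zip(img, pat) for i, p in zip(ri, rp))


def fourier_coeff(img, fx, fy, b=0.5):
    """Four-step phase shifting: (D0 - Dpi) + j(Dpi/2 - D3pi/2) = 2b * DFT(fx, fy)."""
    M, N = len(img[0]), len(img)
    D = [measure(img, pattern(fx, fy, phi, M, N, b=b))
         for phi in (0.0, math.pi / 2, math.pi, 3 * math.pi / 2)]
    return ((D[0] - D[2]) + 1j * (D[1] - D[3])) / (2 * b)


def reconstruct(img, freqs):
    """Acquire only `freqs` (simulated here from `img`), zero the rest, inverse DFT."""
    M, N = len(img[0]), len(img)
    F = {f: fourier_coeff(img, *f) for f in freqs}
    return [[(sum(c * cmath.exp(2j * math.pi * (fx * x / M + fy * y / N))
                  for (fx, fy), c in F.items()) / (M * N)).real
             for x in range(M)] for y in range(N)]


scene = [[0, 1, 1, 0], [1, 0, 0, 1], [1, 0, 0, 1], [0, 1, 1, 0]]
full = reconstruct(scene, [(fx, fy) for fx in range(4) for fy in range(4)])
print(max(abs(r - s) for rr, sr in zip(full, scene) for r, s in zip(rr, sr)) < 1e-9)  # True
```

Reconstruction here is a single inverse transform rather than an iterative solve, which is the source of the speed advantage the abstract describes. Passing only a subset of low frequencies to `reconstruct` mimics compressed acquisition.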
LazySVD: Even Faster SVD Decomposition Yet Without Agonizing Pain
Title | LazySVD: Even Faster SVD Decomposition Yet Without Agonizing Pain |
Authors | Zeyuan Allen-Zhu, Yuanzhi Li |
Abstract | We study $k$-SVD, that is, computing the first $k$ singular vectors of a matrix $A$. Recently, several breakthroughs have been made on $k$-SVD: Musco and Musco [1] proved the first gap-free convergence result using the block Krylov method, Shamir [2] discovered the first variance-reduction stochastic method, and Bhojanapalli et al. [3] provided the fastest $O(\mathsf{nnz}(A) + \mathsf{poly}(1/\varepsilon))$-time algorithm using alternating minimization. In this paper, we put forward a new and simple LazySVD framework that improves on these breakthroughs. This framework leads to a faster gap-free method outperforming [1], and the first accelerated and stochastic method outperforming [2]. In the $O(\mathsf{nnz}(A) + \mathsf{poly}(1/\varepsilon))$ running-time regime, LazySVD outperforms [3] in certain parameter regimes without even using alternating minimization. |
Tasks | |
Published | 2016-07-12 |
URL | http://arxiv.org/abs/1607.03463v2 |
PDF | http://arxiv.org/pdf/1607.03463v2.pdf |
PWC | https://paperswithcode.com/paper/lazysvd-even-faster-svd-decomposition-yet |
Repo | |
Framework | |
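As we understand the framework, LazySVD finds the $k$ vectors one at a time. At each step it computes an approximate top vector with the already-found directions projected away, then repeats. The paper plugs accelerated and stochastic 1-PCA solvers into this loop. The sketch below substitutes plain power iteration, so it carries none of the paper's running-time guarantees. The test matrix is hypothetical.

```python
import math
import random


def matvec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def rmatvec(A, u):
    """A^T u."""
    return [sum(A[i][j] * u[i] for i in range(len(A))) for j in range(len(A[0]))]


def project_out(v, basis):
    """Remove the components of v along each (orthonormal) vector in basis."""
    for b in basis:
        d = sum(x * y for x, y in zip(v, b))
        v = [x - d * y for x, y in zip(v, b)]
    return v


def normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def lazy_k_svd(A, k, iters=200, seed=0):
    """Find k right singular vectors one at a time, deflating after each.

    Plain power iteration stands in for the paper's accelerated 1-PCA solvers.
    """
    rng = random.Random(seed)
    n = len(A[0])
    V = []
    for _ in range(k):
        v = normalize(project_out([rng.gauss(0.0, 1.0) for _ in range(n)], V))
        for _ in range(iters):
            # power step on A^T A restricted to the complement of span(V)
            v = normalize(project_out(rmatvec(A, matvec(A, v)), V))
        V.append(v)
    sigmas = [math.sqrt(sum(x * x for x in matvec(A, v))) for v in V]
    return sigmas, V


A = [[3.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 1.0]]
sigmas, V = lazy_k_svd(A, k=2)
print([round(s, 6) for s in sigmas])  # [3.0, 2.0]
```

The "lazy" part is that each vector is computed on its own, without maintaining a block of $k$ vectors at once. The inner 1-PCA solver is therefore the only component that needs to be fast.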
Multicuts and Perturb & MAP for Probabilistic Graph Clustering
Title | Multicuts and Perturb & MAP for Probabilistic Graph Clustering |
Authors | Jörg Hendrik Kappes, Paul Swoboda, Bogdan Savchynskyy, Tamir Hazan, Christoph Schnörr |
Abstract | We present a probabilistic graphical model formulation for the graph clustering problem. This makes it possible to represent the uncertainty of image partitions locally, by approximate marginal distributions, in a mathematically substantiated way, and to rectify local data term cues so as to close contours and obtain valid partitions. We exploit recent progress on globally optimal MAP inference by integer programming and on perturbation-based approximations of the log-partition function in order to sample clusterings and to estimate marginal distributions of node pairs both more accurately and more efficiently than state-of-the-art methods. Our approach works for any graphically represented problem instance. This is demonstrated for image segmentation and social network cluster analysis. Our mathematical ansatz should also be relevant to other combinatorial problems. |
Tasks | Graph Clustering, Semantic Segmentation |
Published | 2016-01-09 |
URL | http://arxiv.org/abs/1601.02088v1 |
PDF | http://arxiv.org/pdf/1601.02088v1.pdf |
PWC | https://paperswithcode.com/paper/multicuts-and-perturb-map-for-probabilistic |
Repo | |
Framework | |
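The abstract's recipe is to sample clusterings by solving perturbed MAP (multicut) problems, then count how often each node pair ends up in the same cluster. The toy sketch below makes two assumptions. First, Gumbel-distributed noise is added to each edge cost, a common Perturb & MAP choice. Second, an exhaustive search over all partitions stands in for the integer-programming solver the paper relies on, so it only works for a handful of nodes. The edge costs are hypothetical; a positive cost means the endpoints prefer to stay together.

```python
import math
import random
from itertools import combinations


def set_partitions(items):
    """Yield all partitions of `items` as lists of blocks (Bell-number many; tiny graphs only)."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in set_partitions(rest):
        for i in range(len(part)):
            yield part[:i] + [[first] + part[i]] + part[i + 1:]
        yield [[first]] + part


def multicut_map(nodes, costs):
    """Brute-force MAP: the partition minimising the summed cost of cut edges."""
    best, best_val = None, math.inf
    for part in set_partitions(list(nodes)):
        label = {v: i for i, block in enumerate(part) for v in block}
        val = sum(c for (u, v), c in costs.items() if label[u] != label[v])
        if val < best_val:
            best, best_val = label, val
    return best


def gumbel(rng):
    u = rng.random()
    while u == 0.0:                      # avoid log(0)
        u = rng.random()
    return -math.log(-math.log(u))


def perturb_and_map_marginals(nodes, costs, samples=500, seed=0):
    """Estimate P(u and v share a cluster) from MAP solutions of perturbed costs."""
    rng = random.Random(seed)
    together = {pair: 0 for pair in combinations(nodes, 2)}
    for _ in range(samples):
        noisy = {e: c + gumbel(rng) for e, c in costs.items()}
        label = multicut_map(nodes, noisy)
        for u, v in together:
            together[(u, v)] += label[u] == label[v]
    return {pair: cnt / samples for pair, cnt in together.items()}


nodes = [0, 1, 2, 3]
costs = {(0, 1): 4.0, (1, 2): -1.0, (2, 3): 4.0, (0, 3): -1.0}   # a 4-cycle
marginals = perturb_and_map_marginals(nodes, costs)
print({pair: round(m, 2) for pair, m in marginals.items()})
```

Strongly attractive pairs such as (0, 1) end up together in almost every sample. Weakly repulsive pairs such as (1, 2) share a cluster only when the noise happens to outweigh their cost, which gives the node-pair marginals the abstract refers to.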
Recommender systems inspired by the structure of quantum theory
Title | Recommender systems inspired by the structure of quantum theory |
Authors | Cyril Stark |
Abstract | Physicists use quantum models to describe the behavior of physical systems. Quantum models owe their success to their interpretability, to their relation to probabilistic models (quantization of classical models), and to their high predictive power. Beyond physics, these properties are valuable in general data science. This motivates the use of quantum models to analyze general nonphysical datasets. Here we provide both empirical and theoretical insights into the application of quantum models in data science. In the theoretical part of this paper, we first show that quantum models can be exponentially more efficient than probabilistic models, because there exist datasets that admit low-dimensional quantum models but only exponentially high-dimensional probabilistic models. Second, we explain in what sense quantum models realize a useful relaxation of compressed probabilistic models. Third, we show that sparse datasets admit low-dimensional quantum models. Finally, we introduce a method to compute hierarchical orderings of properties of users (e.g., personality traits) and items (e.g., genres of movies). In the empirical part of the paper, we evaluate quantum models in item recommendation and observe that the predictive power of quantum-inspired recommender systems can compete with state-of-the-art recommender systems like SVD++ and PureSVD. Furthermore, we make use of the interpretability of quantum models by computing hierarchical orderings of properties of users and items. This work establishes a connection between data science (item recommendation), information theory (communication complexity), mathematical programming (positive semidefinite factorizations), and physics (quantum models). |
Tasks | Quantization, Recommendation Systems |
Published | 2016-01-22 |
URL | http://arxiv.org/abs/1601.06035v1 |
PDF | http://arxiv.org/pdf/1601.06035v1.pdf |
PWC | https://paperswithcode.com/paper/recommender-systems-inspired-by-the-structure |
Repo | |
Framework | |
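In the usual quantum-probability setup, a user is a density matrix ρ and an item property is a positive semidefinite measurement operator E. The predicted probability that the user "accepts" the item is then given by the Born rule, tr(ρE). A ratings matrix with entries of this form is what a positive semidefinite factorization describes. The sketch below assumes this setup with small real matrices; whether the paper uses exactly this parameterization should be checked against the full text. All states and operators are hypothetical.

```python
import math


def outer(v):
    """|v><v| for a real unit vector v."""
    return [[a * b for b in v] for a in v]


def born_probability(rho, E):
    """tr(rho E): probability that a user in state rho 'accepts' measurement E."""
    n = len(rho)
    return sum(rho[i][k] * E[k][i] for i in range(n) for k in range(n))


user = outer([1.0, 0.0])                 # hypothetical pure user state
s = 1 / math.sqrt(2)
item_aligned = outer([1.0, 0.0])         # item property aligned with the user
item_diagonal = outer([s, s])            # item property at 45 degrees
mixed_user = [[0.5, 0.0], [0.0, 0.5]]    # maximally uncertain user (mixed state)
print(born_probability(user, item_aligned),
      round(born_probability(user, item_diagonal), 6),
      round(born_probability(mixed_user, item_diagonal), 6))  # 1.0 0.5 0.5
```

A low-dimensional state space can still express graded preferences through the angle between the user state and the item operator. This is part of why the abstract can speak of low-dimensional quantum models for datasets that need much larger classical ones.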