Paper Group ANR 494
3D Trajectory Reconstruction of Dynamic Objects Using Planarity Constraints
Title | 3D Trajectory Reconstruction of Dynamic Objects Using Planarity Constraints |
Authors | Sebastian Bullinger, Christoph Bodensteiner, Michael Arens, Rainer Stiefelhagen |
Abstract | We present a method to reconstruct the three-dimensional trajectory of a moving instance of a known object category in monocular video data. We track the two-dimensional shape of objects at the pixel level by exploiting instance-aware semantic segmentation techniques and optical flow cues. We apply Structure from Motion techniques to object and background images to determine, for each frame, camera poses relative to object instances and background structures. By combining object and background camera pose information, we restrict the object trajectory to a one-parameter family of possible solutions. We compute a ground representation by fusing background structures and corresponding semantic segmentations. This allows us to determine an object trajectory that is consistent with the image observations and the reconstructed environment model. Our method is robust to occlusion and handles temporarily stationary objects. We show qualitative results using drone imagery. Due to the lack of suitable benchmark datasets, we present a new dataset to evaluate the quality of reconstructed three-dimensional object trajectories. The video sequences contain vehicles in urban areas and are rendered using the path-tracing render engine Cycles to achieve realistic results. We perform a quantitative evaluation of the presented approach using this dataset. Our algorithm achieves an average reconstruction-to-ground-truth distance of 0.31 meters. |
Tasks | Optical Flow Estimation, Semantic Segmentation |
Published | 2017-11-16 |
URL | http://arxiv.org/abs/1711.06136v1 |
http://arxiv.org/pdf/1711.06136v1.pdf | |
PWC | https://paperswithcode.com/paper/3d-trajectory-reconstruction-of-dynamic |
Repo | |
Framework | |
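The abstract above restricts the object trajectory to a one-parameter family and then fixes the remaining degree of freedom using a reconstructed ground representation. The sketch below is a minimal illustration of that last step, assuming the family is parameterised by a single scale s along per-frame direction vectors and that the ground is a plane; the names `origins`, `directions`, `plane_normal` and `plane_d` are hypothetical stand-ins for quantities the paper derives from object/background SfM and the fused ground model, and occlusion handling is omitted.

```python
import numpy as np

def fit_trajectory_scale(origins, directions, plane_normal, plane_d):
    """Pick the global scale s of the one-parameter trajectory family
    T_i(s) = origins[i] + s * directions[i] so that the trajectory lies as
    close as possible (least squares) to the ground plane n.x + d = 0."""
    n = plane_normal / np.linalg.norm(plane_normal)
    a = directions @ n                 # signed distance of T_i(s) to the plane is a_i*s + b_i
    b = origins @ n + plane_d
    s = -(a @ b) / (a @ a)             # least-squares minimiser of sum_i (a_i*s + b_i)^2
    return s, origins + s * directions

# Toy example: the object sits 1 unit below the ground plane z = 0 and the
# per-frame directions point upwards, so the correct scale is s = 2.
origins = np.stack([np.linspace(0, 5, 20), np.zeros(20), -np.ones(20)], axis=1)
directions = np.tile(np.array([0.0, 0.0, 0.5]), (20, 1))
s, trajectory = fit_trajectory_scale(origins, directions, np.array([0.0, 0.0, 1.0]), 0.0)
print(s)                               # 2.0
print(trajectory[:3])                  # trajectory points now lie on z = 0
```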
ByRDiE: Byzantine-resilient distributed coordinate descent for decentralized learning
Title | ByRDiE: Byzantine-resilient distributed coordinate descent for decentralized learning |
Authors | Zhixiong Yang, Waheed U. Bajwa |
Abstract | Distributed machine learning algorithms enable learning of models from datasets that are distributed over a network without gathering the data at a centralized location. While efficient distributed algorithms have been developed under the assumption of faultless networks, failures that can render these algorithms nonfunctional occur frequently in the real world. This paper focuses on the problem of Byzantine failures, which are the hardest to safeguard against in distributed algorithms. While Byzantine fault tolerance has a rich history, existing work does not translate into efficient and practical algorithms for high-dimensional learning in fully distributed (also known as decentralized) settings. In this paper, an algorithm termed Byzantine-resilient distributed coordinate descent (ByRDiE) is developed and analyzed that enables distributed learning in the presence of Byzantine failures. Theoretical analysis (convex settings) and numerical experiments (convex and nonconvex settings) highlight its usefulness for high-dimensional distributed learning in the presence of Byzantine failures. |
Tasks | |
Published | 2017-08-28 |
URL | https://arxiv.org/abs/1708.08155v4 |
https://arxiv.org/pdf/1708.08155v4.pdf | |
PWC | https://paperswithcode.com/paper/byrdie-byzantine-resilient-distributed |
Repo | |
Framework | |
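As a rough illustration of the idea behind ByRDiE (coordinate-wise updates in which each node screens the values it receives from neighbours before using them), here is a sketch where the screening is a coordinate-wise trimmed mean, a common Byzantine-resilience primitive. The exact screening rule, step sizes and convergence guarantees of ByRDiE follow the paper; everything below (network, losses, learning rate) is an illustrative assumption.

```python
import numpy as np

def trimmed_mean(values, f):
    """Drop the f largest and f smallest values, then average the rest."""
    v = np.sort(np.asarray(values))
    return v[f:len(v) - f].mean()

def byzantine_resilient_cd(grads, neighbors, x0, f, steps=100, lr=0.1):
    """Sketch of Byzantine-screened distributed coordinate descent.

    grads[i](x)  -> local gradient of node i's loss at x
    neighbors[i] -> ids of nodes whose iterates node i receives
    f            -> assumed bound on the number of Byzantine neighbours
    Honest nodes update one coordinate at a time, replacing plain averaging of
    neighbour values with a coordinate-wise trimmed mean (the screening step).
    """
    n, d = len(grads), len(x0)
    x = np.tile(np.asarray(x0, dtype=float), (n, 1))
    for t in range(steps):
        k = t % d                                    # coordinate being updated this round
        new_xk = np.empty(n)
        for i in range(n):
            received = [x[j, k] for j in neighbors[i]] + [x[i, k]]
            screened = trimmed_mean(received, f)     # Byzantine screening
            new_xk[i] = screened - lr * grads[i](x[i])[k]
        x[:, k] = new_xk
    return x

# Toy problem: five honest nodes with quadratic losses ||x - c_i||^2 on a fully
# connected network; the iterates drift toward the consensus minimiser.
centers = [np.array([1.0, 2.0]) + 0.1 * i for i in range(5)]
grads = [lambda x, c=c: 2 * (x - c) for c in centers]
neighbors = [[j for j in range(5) if j != i] for i in range(5)]
x_final = byzantine_resilient_cd(grads, neighbors, x0=np.zeros(2), f=1)
print(x_final.mean(axis=0))
```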
Hyperprior on symmetric Dirichlet distribution
Title | Hyperprior on symmetric Dirichlet distribution |
Authors | Jun Lu |
Abstract | In this article we show how to place a vague hyperprior on the symmetric Dirichlet distribution and how to update its parameter by adaptive rejection sampling (ARS). Finally, we analyze this hyperprior in an over-fitted mixture model through synthetic experiments. |
Tasks | |
Published | 2017-08-28 |
URL | http://arxiv.org/abs/1708.08177v1 |
http://arxiv.org/pdf/1708.08177v1.pdf | |
PWC | https://paperswithcode.com/paper/hyperprior-on-symmetric-dirichlet |
Repo | |
Framework | |
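The conditional density of the symmetric Dirichlet parameter under a vague Gamma hyperprior is easy to write down, which is what makes an ARS update possible. The sketch below assumes a Gamma(a, b) hyperprior on the concentration α of a K-dimensional symmetric Dirichlet and, for simplicity, updates α with a random-walk Metropolis step on log α rather than ARS; it illustrates the conditional being sampled, not the paper's sampler.

```python
import numpy as np
from math import lgamma, log

def log_post_alpha(alpha, log_pi, a=1.0, b=1.0):
    """Log conditional density of the symmetric Dirichlet concentration alpha,
    given mixture weights pi (through their logs), under a Gamma(a, b) hyperprior."""
    K = len(log_pi)
    return ((a - 1) * log(alpha) - b * alpha
            + lgamma(K * alpha) - K * lgamma(alpha)
            + (alpha - 1) * sum(log_pi))

def sample_alpha(log_pi, alpha0=1.0, n_iter=2000, step=0.3, seed=0):
    """Random-walk Metropolis on log(alpha); the paper instead uses ARS, which
    exploits the log-concavity of this conditional."""
    rng = np.random.default_rng(seed)
    alpha, samples = alpha0, []
    lp = log_post_alpha(alpha, log_pi)
    for _ in range(n_iter):
        prop = alpha * np.exp(step * rng.normal())       # multiplicative proposal
        lp_prop = log_post_alpha(prop, log_pi)
        # log acceptance ratio includes the log-scale Jacobian log(prop/alpha)
        if np.log(rng.uniform()) < lp_prop - lp + log(prop) - log(alpha):
            alpha, lp = prop, lp_prop
        samples.append(alpha)
    return np.array(samples)

# Toy run: weights drawn from a fairly flat Dirichlet over K = 10 components.
rng = np.random.default_rng(1)
pi = rng.dirichlet(np.full(10, 2.0))
draws = sample_alpha(np.log(pi))
print(draws[500:].mean())     # posterior mean of alpha given these weights
```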
FPGA based Parallelized Architecture of Efficient Graph based Image Segmentation Algorithm
Title | FPGA based Parallelized Architecture of Efficient Graph based Image Segmentation Algorithm |
Authors | Roopal Nahar, Akanksha Baranwal, K. Madhava Krishna |
Abstract | Efficient and real-time segmentation of color images is important in many fields of computer vision, such as image compression, medical imaging, mapping and autonomous navigation. Being one of the most computationally expensive operations, it is usually done through software implementations on high-performance processors. In robotic systems, however, with constrained platform dimensions, the need for portability and low power consumption, and the simultaneous need for real-time image segmentation, we envision hardware parallelism as the way forward to achieve higher acceleration. Field-programmable gate arrays (FPGAs) are among the best suited for this task as they provide high computing power in a small physical area. They exceed the computing speed of software-based implementations by breaking the paradigm of sequential execution and accomplishing more operations per clock cycle through hardware-level parallelization at the architectural level. In this paper, we propose three novel architectures for the well-known Efficient Graph-Based Image Segmentation algorithm. The proposed implementations optimize time and power consumption when compared to software implementations. The proposed hybrid design notably furthers acceleration, delivering at least a 2X speed gain over the other implementations, which allows real-time image segmentation deployable on mobile robotic systems. |
Tasks | Autonomous Navigation, Image Compression, Semantic Segmentation |
Published | 2017-10-06 |
URL | http://arxiv.org/abs/1710.02260v1 |
http://arxiv.org/pdf/1710.02260v1.pdf | |
PWC | https://paperswithcode.com/paper/fpga-based-parallelized-architecture-of |
Repo | |
Framework | |
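The algorithm being accelerated is Felzenszwalb and Huttenlocher's Efficient Graph-Based Image Segmentation. A compact software reference of its core (sort edges by weight, then merge components with a union-find whenever the edge weight is below both components' adaptive thresholds) is sketched below as a baseline for comparison; it is not the proposed FPGA architecture, and the parameter k and the 4-connected grayscale graph are illustrative choices.

```python
import numpy as np

def segment_graph(num_nodes, edges, k=300.0):
    """edges: list of (weight, u, v). Returns a component label per node.
    Two components merge when the connecting edge weight is no larger than the
    smaller of their internal thresholds Int(C) + k/|C|."""
    parent = list(range(num_nodes))
    size = [1] * num_nodes
    threshold = [k] * num_nodes            # Int(C) + k/|C|, with Int(C) = 0 initially

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for w, u, v in sorted(edges):          # edges in non-decreasing weight order
        ru, rv = find(u), find(v)
        if ru != rv and w <= min(threshold[ru], threshold[rv]):
            parent[rv] = ru
            size[ru] += size[rv]
            threshold[ru] = w + k / size[ru]
    return [find(i) for i in range(num_nodes)]

def grid_edges(img):
    """4-connected grid graph over a grayscale image; weight = intensity difference."""
    h, w = img.shape
    idx = lambda y, x: y * w + x
    edges = []
    for y in range(h):
        for x in range(w):
            if x + 1 < w:
                edges.append((abs(float(img[y, x]) - float(img[y, x + 1])), idx(y, x), idx(y, x + 1)))
            if y + 1 < h:
                edges.append((abs(float(img[y, x]) - float(img[y + 1, x])), idx(y, x), idx(y + 1, x)))
    return edges

img = np.zeros((20, 20))
img[:, 10:] = 200                          # two flat regions
labels = segment_graph(img.size, grid_edges(img), k=300.0)
print(len(set(labels)))                    # 2 segments
```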
Debiased distributed learning for sparse partial linear models in high dimensions
Title | Debiased distributed learning for sparse partial linear models in high dimensions |
Authors | Shaogao Lv, Heng Lian |
Abstract | Although various distributed machine learning schemes have been proposed recently for pure linear models and fully nonparametric models, little attention has been paid to distributed optimization for semi-parametric models with multiple-level structures (e.g. sparsity, linearity and nonlinearity). To address these issues, the current paper proposes a new communication-efficient distributed learning algorithm for sparse partially linear models with an increasing number of features. The proposed method is based on the classical divide-and-conquer strategy for handling big data, and each sub-method, defined on a subsample, consists of a debiased estimation of the double-regularized least squares approach. With the proposed method, we theoretically prove that our global parametric estimator can achieve the optimal parametric rate in our semi-parametric model given an appropriate partition of the total data. Specifically, the choice of data partition relies on the underlying smoothness of the nonparametric component, but it is adaptive to the sparsity parameter. Even under the non-distributed setting, we develop a new and easy-to-follow proof for optimal estimation of the parametric error in the high-dimensional partial linear model. Finally, several simulation experiments are implemented to demonstrate the comparable empirical performance of our debiased technique under the distributed setting. |
Tasks | Distributed Optimization |
Published | 2017-08-18 |
URL | https://arxiv.org/abs/1708.05487v2 |
https://arxiv.org/pdf/1708.05487v2.pdf | |
PWC | https://paperswithcode.com/paper/a-debiased-distributed-estimation-for-sparse |
Repo | |
Framework | |
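A minimal sketch of the divide-and-conquer pattern described above, restricted to the parametric (sparse linear) part: each machine fits an ℓ1-regularized estimate on its subsample, applies a one-step debiasing correction, and the debiased estimates are averaged. The nonparametric component and the paper's double regularization are omitted, and the ridge-regularized inverse Gram matrix used for debiasing is a simple stand-in for a nodewise-lasso precision estimate.

```python
import numpy as np
from sklearn.linear_model import Lasso

def debiased_lasso(X, y, lam, ridge=0.1):
    """One-step debiased lasso on a single subsample."""
    n, p = X.shape
    beta = Lasso(alpha=lam, fit_intercept=False).fit(X, y).coef_
    sigma_hat = X.T @ X / n
    theta = np.linalg.inv(sigma_hat + ridge * np.eye(p))     # crude precision estimate
    return beta + theta @ X.T @ (y - X @ beta) / n

def distributed_debiased_lasso(X, y, n_machines, lam):
    """Divide and conquer: debias locally on each subsample, then average."""
    estimates = [debiased_lasso(Xs, ys, lam)
                 for Xs, ys in zip(np.array_split(X, n_machines),
                                   np.array_split(y, n_machines))]
    return np.mean(estimates, axis=0)

# Toy data: n = 2000, p = 50, three non-zero coefficients, four machines.
rng = np.random.default_rng(0)
n, p = 2000, 50
X = rng.normal(size=(n, p))
beta_true = np.zeros(p)
beta_true[:3] = [2.0, -1.5, 1.0]
y = X @ beta_true + 0.5 * rng.normal(size=n)
beta_hat = distributed_debiased_lasso(X, y, n_machines=4, lam=0.05)
print(np.round(beta_hat[:5], 2))   # approximately [2.0, -1.5, 1.0, 0.0, 0.0]
```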
Web-STAR: Towards a Visual Web-Based IDE for a Story Comprehension System
Title | Web-STAR: Towards a Visual Web-Based IDE for a Story Comprehension System |
Authors | Christos Rodosthenous, Loizos Michael |
Abstract | In this work, we present Web-STAR, an online platform for story understanding built on top of the STAR (STory comprehension through ARgumentation) reasoning engine. This platform includes a web-based IDE, integration with the STAR system and a web service infrastructure to support integration with other systems that rely on story understanding functionality to complete their tasks. The platform also delivers a number of “social” features like public story sharing with a built-in commenting system, a public repository for sharing stories with the community and collaboration tools that can be used both by project team members for development and by educators for teaching. Moreover, we discuss the ongoing work on adding new features and functionality to this platform. |
Tasks | |
Published | 2017-06-20 |
URL | http://arxiv.org/abs/1706.06954v1 |
http://arxiv.org/pdf/1706.06954v1.pdf | |
PWC | https://paperswithcode.com/paper/web-star-towards-a-visual-web-based-ide-for-a |
Repo | |
Framework | |
English Conversational Telephone Speech Recognition by Humans and Machines
Title | English Conversational Telephone Speech Recognition by Humans and Machines |
Authors | George Saon, Gakuto Kurata, Tom Sercu, Kartik Audhkhasi, Samuel Thomas, Dimitrios Dimitriadis, Xiaodong Cui, Bhuvana Ramabhadran, Michael Picheny, Lynn-Li Lim, Bergul Roomi, Phil Hall |
Abstract | One of the most difficult speech recognition tasks is accurate recognition of human to human communication. Advances in deep learning over the last few years have produced major speech recognition improvements on the representative Switchboard conversational corpus. Word error rates that just a few years ago were 14% have dropped to 8.0%, then 6.6% and most recently 5.8%, and are now believed to be within striking range of human performance. This then raises two issues - what IS human performance, and how far down can we still drive speech recognition error rates? A recent paper by Microsoft suggests that we have already achieved human performance. In trying to verify this statement, we performed an independent set of human performance measurements on two conversational tasks and found that human performance may be considerably better than what was earlier reported, giving the community a significantly harder goal to achieve. We also report on our own efforts in this area, presenting a set of acoustic and language modeling techniques that lowered the word error rate of our own English conversational telephone LVCSR system to the level of 5.5%/10.3% on the Switchboard/CallHome subsets of the Hub5 2000 evaluation, which - at least at the writing of this paper - is a new performance milestone (albeit not at what we measure to be human performance!). On the acoustic side, we use a score fusion of three models: one LSTM with multiple feature inputs, a second LSTM trained with speaker-adversarial multi-task learning and a third residual net (ResNet) with 25 convolutional layers and time-dilated convolutions. On the language modeling side, we use word and character LSTMs and convolutional WaveNet-style language models. |
Tasks | Language Modelling, Large Vocabulary Continuous Speech Recognition, Multi-Task Learning, Speech Recognition |
Published | 2017-03-06 |
URL | http://arxiv.org/abs/1703.02136v1 |
http://arxiv.org/pdf/1703.02136v1.pdf | |
PWC | https://paperswithcode.com/paper/english-conversational-telephone-speech |
Repo | |
Framework | |
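The acoustic-model combination mentioned in the abstract can be illustrated with a simple frame-level score fusion: a weighted sum of each model's log-posteriors, with weights tuned on held-out data. This is a generic fusion recipe, not necessarily the exact scheme used in the paper.

```python
import numpy as np

def fuse_scores(log_posteriors, weights):
    """Weighted sum of per-frame log-posteriors from several acoustic models.

    log_posteriors: list of arrays, each of shape (num_frames, num_classes)
    weights:        one scalar per model (e.g. tuned on held-out data)
    Returns the fused scores and the per-frame argmax class."""
    fused = sum(w * lp for w, lp in zip(weights, log_posteriors))
    return fused, fused.argmax(axis=1)

# Toy example: three "models" scoring 10 frames over 40 output classes.
rng = np.random.default_rng(0)
models = [np.log(rng.dirichlet(np.ones(40), size=10)) for _ in range(3)]
fused, best = fuse_scores(models, weights=[0.4, 0.35, 0.25])
print(best)
```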
Denoising Autoencoders for Overgeneralization in Neural Networks
Title | Denoising Autoencoders for Overgeneralization in Neural Networks |
Authors | Giacomo Spigler |
Abstract | Despite the recent developments that allowed neural networks to achieve impressive performance on a variety of applications, these models are intrinsically affected by the problem of overgeneralization, due to their partitioning of the full input space into the fixed set of target classes used during training. Thus it is possible for novel inputs belonging to categories unknown during training or even completely unrecognizable to humans to fool the system into classifying them as one of the known classes, even with a high degree of confidence. Solving this problem may help improve the security of such systems in critical applications, and may further lead to applications in the context of open set recognition and 1-class recognition. This paper presents a novel way to compute a confidence score using denoising autoencoders and shows that such a confidence score can correctly identify the regions of the input space close to the training distribution by approximately identifying its local maxima. |
Tasks | Denoising, Open Set Learning |
Published | 2017-09-14 |
URL | http://arxiv.org/abs/1709.04762v3 |
http://arxiv.org/pdf/1709.04762v3.pdf | |
PWC | https://paperswithcode.com/paper/denoising-autoencoders-for-overgeneralization |
Repo | |
Framework | |
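A small sketch of the reconstruction-based confidence idea: inputs that the autoencoder can rebuild well (i.e., inputs near the training distribution) get a score close to 1, while inputs far from it get a score near 0. For brevity, a PCA-based linear autoencoder stands in for a trained denoising autoencoder, and the Gaussian form of the score is an illustrative choice rather than the paper's exact construction.

```python
import numpy as np

def fit_linear_autoencoder(X_train, n_components=2):
    """PCA-based linear autoencoder used as a stand-in for a trained denoising
    autoencoder: reconstruct by projecting onto the top principal components."""
    mean = X_train.mean(axis=0)
    _, _, Vt = np.linalg.svd(X_train - mean, full_matrices=False)
    V = Vt[:n_components].T
    return lambda X: mean + (X - mean) @ V @ V.T

def confidence(X, reconstruct, sigma=1.0):
    """Confidence from the reconstruction residual: near 1 close to the training
    distribution, decaying for inputs the autoencoder cannot rebuild."""
    residual = np.linalg.norm(X - reconstruct(X), axis=1)
    return np.exp(-residual ** 2 / (2 * sigma ** 2))

# Training data lives near a 2-D plane embedded in 10-D space.
rng = np.random.default_rng(0)
basis = rng.normal(size=(2, 10))
X_train = rng.normal(size=(500, 2)) @ basis + 0.05 * rng.normal(size=(500, 10))
reconstruct = fit_linear_autoencoder(X_train, n_components=2)

in_dist = rng.normal(size=(5, 2)) @ basis       # on the training manifold
far_out = rng.normal(size=(5, 10)) * 5.0        # nowhere near it
print(confidence(in_dist, reconstruct))         # close to 1
print(confidence(far_out, reconstruct))         # close to 0
```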
Deep metric learning for multi-labelled radiographs
Title | Deep metric learning for multi-labelled radiographs |
Authors | Mauro Annarumma, Giovanni Montana |
Abstract | Many radiological studies can reveal the presence of several co-existing abnormalities, each one represented by a distinct visual pattern. In this article we address the problem of learning a distance metric for plain radiographs that captures a notion of “radiological similarity”: two chest radiographs are considered to be similar if they share similar abnormalities. Deep convolutional neural networks (DCNs) are used to learn a low-dimensional embedding for the radiographs that is equipped with the desired metric. Two loss functions are proposed to deal with multi-labelled images and potentially noisy labels. We report on a large-scale study involving over 745,000 chest radiographs whose labels were automatically extracted from free-text radiological reports through a natural language processing system. Using 4,500 validated exams, we demonstrate that the methodology performs satisfactorily on clustering and image retrieval tasks. Remarkably, the learned metric separates normal exams from those having radiological abnormalities. |
Tasks | Image Retrieval, Metric Learning |
Published | 2017-12-11 |
URL | http://arxiv.org/abs/1712.07682v1 |
http://arxiv.org/pdf/1712.07682v1.pdf | |
PWC | https://paperswithcode.com/paper/deep-metric-learning-for-multi-labelled |
Repo | |
Framework | |
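One plausible instantiation of a metric-learning loss for multi-labelled radiographs is sketched below: the target similarity between two images is the Jaccard overlap of their label sets, and embeddings are pulled together or pushed apart accordingly. The paper proposes two specific loss functions that also address label noise; this sketch only shows the general shape of such a loss.

```python
import torch

def multilabel_contrastive_loss(embeddings, labels, margin=1.0):
    """Pairwise loss for multi-labelled images.

    embeddings: (B, D) float tensor from the CNN
    labels:     (B, L) binary multi-hot tensor
    Pairs with overlapping label sets are pulled together in proportion to their
    Jaccard overlap; disjoint pairs are pushed at least `margin` apart."""
    diff = embeddings.unsqueeze(1) - embeddings.unsqueeze(0)
    sq_dists = diff.pow(2).sum(-1)                       # (B, B) squared distances
    dists = (sq_dists + 1e-12).sqrt()
    inter = labels @ labels.T                            # |A ∩ B|
    union = labels.sum(1, keepdim=True) + labels.sum(1) - inter
    jaccard = inter / union.clamp(min=1)                 # target similarity in [0, 1]
    pull = jaccard * sq_dists
    push = (1 - jaccard) * (margin - dists).clamp(min=0).pow(2)
    mask = 1 - torch.eye(len(embeddings))                # ignore self-pairs
    return ((pull + push) * mask).sum() / mask.sum()

# Toy usage with random embeddings and three possible abnormality labels.
emb = torch.randn(8, 16, requires_grad=True)
lab = torch.randint(0, 2, (8, 3)).float()
loss = multilabel_contrastive_loss(emb, lab)
loss.backward()
print(loss.item(), emb.grad.shape)
```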
Simultaneous Multiple Surface Segmentation Using Deep Learning
Title | Simultaneous Multiple Surface Segmentation Using Deep Learning |
Authors | Abhay Shah, Michael Abramoff, Xiaodong Wu |
Abstract | The task of automatically segmenting 3-D surfaces representing boundaries of objects is important for quantitative analysis of volumetric images, and plays a vital role in biomedical image analysis. Recently, graph-based methods with a global optimization property have been developed and optimized for various medical imaging applications. Despite their widespread use, these methods require human experts to design transformations, image features and surface smoothness priors, and to re-design them for a different tissue, organ or imaging modality. Here, we propose a Deep Learning based approach for segmentation of the surfaces in volumetric medical images, by learning the essential features and transformations from training data, without any human expert intervention. We employ a regional approach to learn the local surface profiles. The proposed approach was evaluated on simultaneous intraretinal layer segmentation of optical coherence tomography (OCT) images of normal retinas and retinas affected by age-related macular degeneration (AMD). The proposed approach was validated on 40 retina OCT volumes including 20 normal and 20 AMD subjects. The experiments showed statistically significant improvement in accuracy for our approach compared to state-of-the-art graph-based optimal surface segmentation with convex priors (G-OSC). A single Convolutional Neural Network (CNN) was used to learn the surfaces for both normal and diseased images. The mean unsigned surface positioning error obtained by the G-OSC method, 2.31 voxels (95% CI 2.02-2.60 voxels), was improved to 1.27 voxels (95% CI 1.14-1.40 voxels) using our new approach. On average, our approach takes 94.34 s and requires 95.35 MB of memory, which is much faster than the 2837.46 s and 6.87 GB of memory required by the G-OSC method on the same computer system. |
Tasks | |
Published | 2017-05-19 |
URL | http://arxiv.org/abs/1705.07142v1 |
http://arxiv.org/pdf/1705.07142v1.pdf | |
PWC | https://paperswithcode.com/paper/simultaneous-multiple-surface-segmentation |
Repo | |
Framework | |
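The regional, learning-based formulation in the abstract can be pictured as a patch-based CNN that regresses the row positions of several surfaces for every column of an OCT patch. The network below is an illustrative sketch of that pattern; the layer sizes, patch dimensions and number of surfaces are assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class SurfaceRegressor(nn.Module):
    """Regresses `n_surfaces` boundary positions for each of the `width` columns
    of an OCT patch."""
    def __init__(self, height=64, width=32, n_surfaces=3):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
        )
        feat_dim = 32 * (height // 4) * (width // 4)
        self.head = nn.Linear(feat_dim, n_surfaces * width)
        self.n_surfaces, self.width = n_surfaces, width

    def forward(self, x):                          # x: (B, 1, height, width)
        z = self.features(x).flatten(1)
        return self.head(z).view(-1, self.n_surfaces, self.width)

model = SurfaceRegressor()
patch = torch.randn(4, 1, 64, 32)                  # batch of OCT patches
target = torch.rand(4, 3, 32) * 64                 # surface rows per column (in voxels)
pred = model(patch)
loss = nn.functional.mse_loss(pred, target)
loss.backward()
print(pred.shape, loss.item())
```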
Automated Audio Captioning with Recurrent Neural Networks
Title | Automated Audio Captioning with Recurrent Neural Networks |
Authors | Konstantinos Drossos, Sharath Adavanne, Tuomas Virtanen |
Abstract | We present the first approach to automated audio captioning. We employ an encoder-decoder scheme with an alignment model in between. The input to the encoder is a sequence of log mel-band energies calculated from an audio file, while the output is a sequence of words, i.e. a caption. The encoder is a multi-layered, bi-directional gated recurrent unit (GRU) and the decoder a multi-layered GRU with a classification layer connected to the last GRU of the decoder. The classification layer and the alignment model are fully connected layers with shared weights between timesteps. The proposed method is evaluated using data drawn from a commercial sound effects library, ProSound Effects. The resulting captions were rated through metrics utilized in machine translation and image captioning fields. Results from metrics show that the proposed method can predict words appearing in the original caption, but not always correctly ordered. |
Tasks | Image Captioning, Machine Translation |
Published | 2017-06-30 |
URL | http://arxiv.org/abs/1706.10006v2 |
http://arxiv.org/pdf/1706.10006v2.pdf | |
PWC | https://paperswithcode.com/paper/automated-audio-captioning-with-recurrent |
Repo | |
Framework | |
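The architecture described in the abstract (bi-directional GRU encoder over log mel-band energies, an alignment model, and a GRU decoder with a shared classification layer) maps fairly directly onto the sketch below. Layer sizes, vocabulary size and the additive form of the alignment model are illustrative assumptions.

```python
import torch
import torch.nn as nn

class AudioCaptioner(nn.Module):
    """Encoder-decoder sketch: bi-directional GRU encoder over log mel-band
    energies, an additive alignment (attention) model, and a GRU decoder with a
    word classification layer shared across timesteps."""
    def __init__(self, n_mels=64, vocab=1000, hid=128):
        super().__init__()
        self.encoder = nn.GRU(n_mels, hid, num_layers=2, bidirectional=True, batch_first=True)
        self.embed = nn.Embedding(vocab, hid)
        self.align = nn.Linear(2 * hid + hid, 1)        # additive alignment model
        self.decoder = nn.GRUCell(2 * hid + hid, hid)
        self.classify = nn.Linear(hid, vocab)           # shared classification layer

    def forward(self, mel, captions):
        enc, _ = self.encoder(mel)                      # (B, T, 2*hid)
        h = torch.zeros(mel.size(0), self.decoder.hidden_size, device=mel.device)
        logits = []
        for t in range(captions.size(1)):               # teacher forcing over caption words
            w = self.embed(captions[:, t])              # (B, hid)
            scores = self.align(torch.cat([enc, h.unsqueeze(1).expand(-1, enc.size(1), -1)], dim=-1))
            context = (scores.softmax(dim=1) * enc).sum(dim=1)    # (B, 2*hid)
            h = self.decoder(torch.cat([w, context], dim=-1), h)
            logits.append(self.classify(h))
        return torch.stack(logits, dim=1)               # (B, caption_len, vocab)

model = AudioCaptioner()
mel = torch.randn(2, 200, 64)                  # 2 clips, 200 frames of 64 log mel bands
caption = torch.randint(0, 1000, (2, 12))      # 12-word captions (teacher-forcing inputs)
logits = model(mel, caption)
print(logits.shape)                            # torch.Size([2, 12, 1000])
```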
Development and evaluation of a deep learning model for protein-ligand binding affinity prediction
Title | Development and evaluation of a deep learning model for protein-ligand binding affinity prediction |
Authors | Marta M. Stepniewska-Dziubinska, Piotr Zielenkiewicz, Pawel Siedlecki |
Abstract | Structure based ligand discovery is one of the most successful approaches for augmenting the drug discovery process. Currently, there is a notable shift towards machine learning (ML) methodologies to aid such procedures. Deep learning has recently gained considerable attention as it allows the model to “learn” to extract features that are relevant for the task at hand. We have developed a novel deep neural network estimating the binding affinity of ligand-receptor complexes. The complex is represented with a 3D grid, and the model utilizes a 3D convolution to produce a feature map of this representation, treating the atoms of both proteins and ligands in the same manner. Our network was tested on the CASF “scoring power” benchmark and Astex Diverse Set and outperformed classical scoring functions. The model, together with usage instructions and examples, is available as a git repository at http://gitlab.com/cheminfIBB/pafnucy |
Tasks | Drug Discovery |
Published | 2017-12-19 |
URL | http://arxiv.org/abs/1712.07042v2 |
http://arxiv.org/pdf/1712.07042v2.pdf | |
PWC | https://paperswithcode.com/paper/development-and-evaluation-of-a-deep-learning |
Repo | |
Framework | |
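A sketch of the model family the abstract describes: the voxelised protein-ligand complex, with per-atom feature channels, passes through 3D convolutions and a dense head that regresses a single affinity value. Channel counts, grid size and layer sizes below are illustrative and do not reproduce the published Pafnucy configuration (available at the linked repository).

```python
import torch
import torch.nn as nn

class AffinityNet(nn.Module):
    """3D-CNN sketch: a voxelised protein-ligand complex goes through 3D
    convolutions and a dense head that regresses one binding-affinity value."""
    def __init__(self, in_channels=19, grid=20):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv3d(in_channels, 32, kernel_size=5, padding=2), nn.ReLU(),
            nn.MaxPool3d(2),
            nn.Conv3d(32, 64, kernel_size=5, padding=2), nn.ReLU(),
            nn.MaxPool3d(2),
        )
        self.head = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * (grid // 4) ** 3, 256), nn.ReLU(),
            nn.Linear(256, 1),                      # predicted affinity (e.g. pKd)
        )

    def forward(self, x):                           # x: (B, channels, grid, grid, grid)
        return self.head(self.conv(x)).squeeze(-1)

model = AffinityNet()
complex_grid = torch.randn(2, 19, 20, 20, 20)       # atom-feature channels on a 20^3 grid
print(model(complex_grid).shape)                    # torch.Size([2])
```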
Temporal Context Network for Activity Localization in Videos
Title | Temporal Context Network for Activity Localization in Videos |
Authors | Xiyang Dai, Bharat Singh, Guyue Zhang, Larry S. Davis, Yan Qiu Chen |
Abstract | We present a Temporal Context Network (TCN) for precise temporal localization of human activities. Similar to the Faster-RCNN architecture, proposals are placed at equal intervals in a video which span multiple temporal scales. We propose a novel representation for ranking these proposals. Since pooling features only inside a segment is not sufficient to predict activity boundaries, we construct a representation which explicitly captures context around a proposal for ranking it. For each temporal segment inside a proposal, features are uniformly sampled at a pair of scales and are input to a temporal convolutional neural network for classification. After ranking proposals, non-maximum suppression is applied and classification is performed to obtain final detections. TCN outperforms state-of-the-art methods on the ActivityNet dataset and the THUMOS14 dataset. |
Tasks | Temporal Localization |
Published | 2017-08-08 |
URL | http://arxiv.org/abs/1708.02349v1 |
http://arxiv.org/pdf/1708.02349v1.pdf | |
PWC | https://paperswithcode.com/paper/temporal-context-network-for-activity |
Repo | |
Framework | |
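The proposal-and-rank pipeline can be sketched as follows: place proposals at equal intervals across several temporal scales, score them, and apply temporal non-maximum suppression to the ranked list. The scores below are random placeholders standing in for the context-aware ranking network; scales, stride and IoU threshold are illustrative.

```python
import numpy as np

def multiscale_proposals(num_frames, scales=(32, 64, 128), stride_frac=0.5):
    """Place (start, end) proposals at equal intervals for several temporal scales."""
    props = []
    for scale in scales:
        stride = max(1, int(scale * stride_frac))
        for start in range(0, max(1, num_frames - scale + 1), stride):
            props.append((start, min(start + scale, num_frames)))
    return np.array(props)

def temporal_iou(a, b):
    inter = max(0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0

def temporal_nms(proposals, scores, iou_thresh=0.6):
    """Keep the highest-scoring proposals, dropping ones that overlap a kept one."""
    order = np.argsort(scores)[::-1]
    keep = []
    for i in order:
        if all(temporal_iou(proposals[i], proposals[j]) < iou_thresh for j in keep):
            keep.append(i)
    return proposals[keep], scores[np.array(keep)]

# Placeholder scores stand in for the ranking network described in the paper.
props = multiscale_proposals(num_frames=600)
rng = np.random.default_rng(0)
scores = rng.uniform(size=len(props))
detections, det_scores = temporal_nms(props, scores)
print(len(props), "proposals ->", len(detections), "after NMS")
```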
Optimised Maintenance of Datalog Materialisations
Title | Optimised Maintenance of Datalog Materialisations |
Authors | Pan Hu, Boris Motik, Ian Horrocks |
Abstract | To efficiently answer queries, datalog systems often materialise all consequences of a datalog program, so the materialisation must be updated whenever the input facts change. Several solutions to the materialisation update problem have been proposed. The Delete/Rederive (DRed) and the Backward/Forward (B/F) algorithms solve this problem for general datalog, but both contain steps that evaluate rules ‘backwards’ by matching their heads to a fact and evaluating the partially instantiated rule bodies as queries. We show that this can be a considerable source of overhead even on very small updates. In contrast, the Counting algorithm does not evaluate the rules ‘backwards’, but it can handle only nonrecursive rules. We present two hybrid approaches that combine DRed and B/F with Counting so as to reduce or even eliminate ‘backward’ rule evaluation while still handling arbitrary datalog programs. We show empirically that our hybrid algorithms are usually significantly faster than existing approaches, sometimes by orders of magnitude. |
Tasks | |
Published | 2017-11-10 |
URL | http://arxiv.org/abs/1711.03987v2 |
http://arxiv.org/pdf/1711.03987v2.pdf | |
PWC | https://paperswithcode.com/paper/optimised-maintenance-of-datalog |
Repo | |
Framework | |
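The Counting idea that the hybrid algorithms build on can be shown on a single hypothetical nonrecursive rule T(x, z) :- R(x, y), S(y, z): each derived fact stores its number of derivations, so deletions only decrement counters instead of evaluating the rule 'backwards'. The DRed/B/F/Counting hybrids in the paper handle arbitrary datalog programs; this sketch does not.

```python
from collections import defaultdict

class CountingMaterialisation:
    """Counting-based maintenance for the single nonrecursive rule
    T(x, z) :- R(x, y), S(y, z).  Each derived T-fact stores its number of
    derivations; insertions increment and deletions decrement the counters."""
    def __init__(self):
        self.R, self.S = set(), set()
        self.counts = defaultdict(int)          # (x, z) -> number of derivations

    def T(self):
        return {t for t, c in self.counts.items() if c > 0}

    def insert_R(self, x, y):
        if (x, y) not in self.R:
            self.R.add((x, y))
            for (y2, z) in self.S:
                if y2 == y:
                    self.counts[(x, z)] += 1

    def delete_R(self, x, y):
        if (x, y) in self.R:
            self.R.remove((x, y))
            for (y2, z) in self.S:
                if y2 == y:
                    self.counts[(x, z)] -= 1

    def insert_S(self, y, z):
        if (y, z) not in self.S:
            self.S.add((y, z))
            for (x, y2) in self.R:
                if y2 == y:
                    self.counts[(x, z)] += 1

m = CountingMaterialisation()
m.insert_R("a", "b")
m.insert_S("b", "c")
m.insert_S("b", "d")
print(sorted(m.T()))            # [('a', 'c'), ('a', 'd')]
m.delete_R("a", "b")
print(sorted(m.T()))            # []
```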
An IoT Real-Time Biometric Authentication System Based on ECG Fiducial Extracted Features Using Discrete Cosine Transform
Title | An IoT Real-Time Biometric Authentication System Based on ECG Fiducial Extracted Features Using Discrete Cosine Transform |
Authors | Ahmed F. Hussein, Abbas K. AlZubaidi, Ali Al-Bayaty, Qais A. Habash |
Abstract | Conventional authentication technologies, like RFID tags and authentication cards/badges, suffer from various weaknesses, and should therefore be promptly replaced by biometric methods of authentication. Biometrics, such as fingerprints, voices, and ECG signals, are unique human characteristics that can be used for authentication processing. In this work, we present an IoT real-time authentication system that uses extracted ECG features to identify unknown persons. The Discrete Cosine Transform (DCT) is used for ECG feature extraction, as it has favorable characteristics for real-time system implementation. A substantial number of studies report high authentication accuracy, but most of them ignore the real-time capability of authenticating individuals. With an accuracy of 97.78% and a processing time of around 1.21 seconds, the proposed system is well suited for applications that demand fast and reliable authentication. |
Tasks | |
Published | 2017-08-28 |
URL | http://arxiv.org/abs/1708.08189v1 |
http://arxiv.org/pdf/1708.08189v1.pdf | |
PWC | https://paperswithcode.com/paper/an-iot-real-time-biometric-authentication |
Repo | |
Framework | |
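A minimal sketch of the DCT-based matching pipeline the abstract outlines: normalise a fiducially segmented heartbeat, keep the leading DCT-II coefficients as the template, and authenticate by nearest-neighbour distance to enrolled templates. The number of coefficients, the distance threshold and the synthetic "beats" are illustrative assumptions, not the paper's tuned values.

```python
import numpy as np
from scipy.fft import dct

def dct_features(beat, n_coeffs=20):
    """Normalise a single heartbeat segment and keep the first few DCT-II
    coefficients as the biometric template."""
    beat = (beat - beat.mean()) / (beat.std() + 1e-8)
    return dct(beat, type=2, norm='ortho')[:n_coeffs]

def authenticate(probe_beat, enrolled_templates, threshold=2.0):
    """Nearest-neighbour matching against enrolled templates."""
    probe = dct_features(probe_beat)
    dists = {pid: np.linalg.norm(probe - tmpl) for pid, tmpl in enrolled_templates.items()}
    pid, d = min(dists.items(), key=lambda kv: kv[1])
    return (pid, d) if d < threshold else (None, d)

# Synthetic "beats": one clean waveform per person, enrolled once, probed with noise.
rng = np.random.default_rng(0)
t = np.linspace(0, 1, 200)
persons = {f"person_{i}": np.sin(2 * np.pi * (3 + i) * t) * np.exp(-4 * t) for i in range(3)}
enrolled = {pid: dct_features(sig) for pid, sig in persons.items()}
probe = persons["person_1"] + 0.05 * rng.normal(size=200)
print(authenticate(probe, enrolled))     # identifies person_1
```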