Paper Group ANR 872
Dynamic Temporal Alignment of Speech to Lips
Title | Dynamic Temporal Alignment of Speech to Lips |
Authors | Tavi Halperin, Ariel Ephrat, Shmuel Peleg |
Abstract | Many speech segments in movies are re-recorded in a studio during postproduction, to compensate for poor sound quality as recorded on location. Manual alignment of the newly-recorded speech with the original lip movements is a tedious task. We present an audio-to-video alignment method for automating speech-to-lips alignment, stretching and compressing the audio signal to match the lip movements. This alignment is based on deep audio-visual features, mapping the lips video and the speech signal to a shared representation. Using this shared representation, we compute the lip-sync error between every short speech period and every video frame, followed by the determination of the optimal corresponding frame for each short sound period over the entire video clip. We demonstrate successful alignment both quantitatively, using a human perception-inspired metric, as well as qualitatively. The strongest advantage of our audio-to-video approach is in cases where the original voice is unclear, and where a constant shift of the sound cannot give a perfect alignment. In these cases state-of-the-art methods will fail. |
Tasks | Video Alignment |
Published | 2018-08-19 |
URL | http://arxiv.org/abs/1808.06250v1 |
http://arxiv.org/pdf/1808.06250v1.pdf | |
PWC | https://paperswithcode.com/paper/dynamic-temporal-alignment-of-speech-to-lips |
Repo | |
Framework | |
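The alignment stage this abstract describes — scoring every (speech period, video frame) pair, then picking the best monotone correspondence over the whole clip — is in essence dynamic time warping. A minimal sketch, assuming a precomputed lip-sync error matrix `dist` (the paper derives the real one from learned audio-visual embeddings):

```python
import numpy as np

def dtw_align(dist):
    """Monotone alignment minimizing cumulative lip-sync error.

    dist[i, j] is a hypothetical error between audio period i and
    video frame j.  Returns the optimal (audio, video) index path.
    """
    n, m = dist.shape
    acc = np.full((n, m), np.inf)
    acc[0, 0] = dist[0, 0]
    for i in range(n):
        for j in range(m):
            if i == 0 and j == 0:
                continue
            best = min(acc[i - 1, j] if i else np.inf,            # advance audio only
                       acc[i, j - 1] if j else np.inf,            # advance video only
                       acc[i - 1, j - 1] if i and j else np.inf)  # advance both
            acc[i, j] = dist[i, j] + best
    # Backtrack from the end of the clip to recover the path.
    path, i, j = [(n - 1, m - 1)], n - 1, m - 1
    while i or j:
        moves = []
        if i: moves.append((acc[i - 1, j], i - 1, j))
        if j: moves.append((acc[i, j - 1], i, j - 1))
        if i and j: moves.append((acc[i - 1, j - 1], i - 1, j - 1))
        _, i, j = min(moves)
        path.append((i, j))
    return path[::-1]
```

With a distance matrix that is zero on the diagonal, the recovered path is the identity alignment, i.e. no stretching or compression is needed.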
Perceive Your Users in Depth: Learning Universal User Representations from Multiple E-commerce Tasks
Title | Perceive Your Users in Depth: Learning Universal User Representations from Multiple E-commerce Tasks |
Authors | Yabo Ni, Dan Ou, Shichen Liu, Xiang Li, Wenwu Ou, Anxiang Zeng, Luo Si |
Abstract | Tasks such as search and recommendation have become increasingly important for E-commerce to deal with the information overload problem. To meet the diverse needs of different users, personalization plays an important role. In many large portals such as Taobao and Amazon, there are a bunch of different types of search and recommendation tasks operating simultaneously for personalization. However, most current techniques address each task separately. This is suboptimal, as no information about users is shared across different tasks. In this work, we propose to learn universal user representations across multiple tasks for more effective personalization. In particular, user behavior sequences (e.g., click, bookmark or purchase of products) are modeled by LSTM and attention mechanism by integrating all the corresponding content, behavior and temporal information. User representations are shared and learned in an end-to-end setting across multiple tasks. Benefiting from better information utilization of multiple tasks, the user representations are more effective to reflect their interests and are more general to be transferred to new tasks. We refer to this work as Deep User Perception Network (DUPN) and conduct an extensive set of offline and online experiments. Across all tested five different tasks, our DUPN consistently achieves better results by giving more effective user representations. Moreover, we deploy DUPN in large-scale operational tasks in Taobao. Detailed implementations, e.g., incremental model updating, are also provided to address the practical issues for real-world applications. |
Tasks | |
Published | 2018-05-28 |
URL | http://arxiv.org/abs/1805.10727v1 |
http://arxiv.org/pdf/1805.10727v1.pdf | |
PWC | https://paperswithcode.com/paper/perceive-your-users-in-depth-learning |
Repo | |
Framework | |
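The attention step DUPN applies on top of the LSTM can be sketched in a few lines of numpy; `H` (one hidden state per behavior step) and the query `q` are hypothetical stand-ins for the learned quantities:

```python
import numpy as np

def attention_pool(H, q):
    """Attention pooling over behavior-step states H (T x d): a
    softmax over the scores H @ q weights each step's contribution
    to the final user representation."""
    scores = H @ q
    w = np.exp(scores - scores.max())   # stable softmax
    w = w / w.sum()
    return w @ H
```

When the query strongly matches one behavior step, the pooled vector is dominated by that step's state, which is the intended selectivity.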
An integration of fast alignment and maximum-likelihood methods for electron subtomogram averaging and classification
Title | An integration of fast alignment and maximum-likelihood methods for electron subtomogram averaging and classification |
Authors | Yixiu Zhao, Xiangrui Zeng, Qiang Guo, Min Xu |
Abstract | Motivation: Cellular Electron CryoTomography (CECT) is an emerging 3D imaging technique that visualizes subcellular organization of single cells at submolecular resolution and in near-native state. CECT captures large numbers of macromolecular complexes of highly diverse structures and abundances. However, the structural complexity and imaging limits complicate the systematic de novo structural recovery and recognition of these macromolecular complexes. Efficient and accurate reference-free subtomogram averaging and classification represent the most critical tasks for such analysis. Existing subtomogram alignment based methods are prone to the missing wedge effects and low signal-to-noise ratio (SNR). Moreover, existing maximum-likelihood based methods rely on integration operations, which are in principle computationally infeasible for accurate calculation. Results: Built on existing works, we propose an integrated method, Fast Alignment Maximum Likelihood method (FAML), which uses fast subtomogram alignment to sample sub-optimal rigid transformations. The transformations are then used to approximate integrals for maximum-likelihood update of subtomogram averages through expectation-maximization algorithm. Our tests on simulated and experimental subtomograms showed that, compared to our previously developed fast alignment method (FA), FAML is significantly more robust to noise and missing wedge effects with moderate increases of computation cost. Besides, FAML performs well with significantly fewer input subtomograms when the FA method fails. Therefore, FAML can serve as a key component for improved construction of initial structural models from macromolecules captured by CECT. |
Tasks | |
Published | 2018-04-04 |
URL | http://arxiv.org/abs/1804.01203v1 |
http://arxiv.org/pdf/1804.01203v1.pdf | |
PWC | https://paperswithcode.com/paper/an-integration-of-fast-alignment-and-maximum |
Repo | |
Framework | |
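FAML's key move — replacing the intractable integral over rigid transformations with a sum over transformations sampled by fast alignment — reduces the averaging update to a likelihood-weighted mean. A sketch, with aligned volumes and per-sample log-likelihoods as made-up inputs:

```python
import numpy as np

def ml_average(aligned_subtomos, log_liks):
    """Likelihood-weighted average over fast-alignment samples: a
    numerically stable softmax of the log-likelihoods stands in for
    the integral over all rigid transformations."""
    w = np.exp(log_liks - np.max(log_liks))
    w = w / w.sum()
    return np.tensordot(w, aligned_subtomos, axes=1)
```

Equal log-likelihoods reduce this to a plain average, and one dominant sample reduces it to hard alignment — the two regimes the expectation-maximization update interpolates between.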
MARL-FWC: Optimal Coordination of Freeway Traffic Control Measures
Title | MARL-FWC: Optimal Coordination of Freeway Traffic Control Measures |
Authors | Ahmed Fares, Walid Gomaa, Mohamed A. Khamis |
Abstract | The objective of this article is to optimize the overall traffic flow on freeways using multiple ramp metering controls together with complementary Dynamic Speed Limits (DSLs). An optimal freeway operation can be reached when minimizing the difference between the freeway density and the critical ratio for maximum traffic flow. In this article, a Multi-Agent Reinforcement Learning for Freeways Control (MARL-FWC) system for ramp metering and DSLs is proposed. MARL-FWC introduces a new microscopic framework at the network level based on collaborative Markov Decision Process modeling (Markov game) and an associated cooperative Q-learning algorithm. The technique incorporates payoff propagation (Max-Plus algorithm) under the coordination graphs framework, particularly suited for optimal control purposes. MARL-FWC provides three control designs: fully independent, fully distributed, and centralized; suited for different network architectures. MARL-FWC was extensively tested in order to assess the proposed model of the joint payoff, as well as the global payoff. Experiments are conducted with heavy traffic flow under the renowned VISSIM traffic simulator to evaluate MARL-FWC. The experimental results show a significant decrease in the total travel time and an increase in the average speed (when compared with the base case) while maintaining an optimal traffic flow. |
Tasks | Multi-agent Reinforcement Learning, Q-Learning |
Published | 2018-08-27 |
URL | http://arxiv.org/abs/1808.09806v1 |
http://arxiv.org/pdf/1808.09806v1.pdf | |
PWC | https://paperswithcode.com/paper/marl-fwc-optimal-coordination-of-freeway |
Repo | |
Framework | |
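In the fully independent design, each ramp agent runs plain tabular Q-learning; a sketch of one update step (the state and action names are illustrative, and the distributed variant's max-plus message passing is not shown):

```python
from collections import defaultdict

def q_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.95):
    """One tabular Q-learning step for a single ramp agent: move
    Q(s, a) toward the reward plus the discounted best next value."""
    best_next = max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
    return Q
```

Using a `defaultdict` makes unseen state-action pairs start at zero, so the first update from an empty table moves the entry by `alpha * r`.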
SufiSent - Universal Sentence Representations Using Suffix Encodings
Title | SufiSent - Universal Sentence Representations Using Suffix Encodings |
Authors | Siddhartha Brahma |
Abstract | Computing universal distributed representations of sentences is a fundamental task in natural language processing. We propose a method to learn such representations by encoding the suffixes of word sequences in a sentence and training on the Stanford Natural Language Inference (SNLI) dataset. We demonstrate the effectiveness of our approach by evaluating it on the SentEval benchmark, improving on existing approaches on several transfer tasks. |
Tasks | Natural Language Inference |
Published | 2018-02-20 |
URL | http://arxiv.org/abs/1802.07370v1 |
http://arxiv.org/pdf/1802.07370v1.pdf | |
PWC | https://paperswithcode.com/paper/sufisent-universal-sentence-representations |
Repo | |
Framework | |
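A toy version of the suffix idea: build one state per suffix of the word sequence with a backward recurrence, then pool over all suffix states. The paper trains a learned RNN on SNLI; the decay constant and max pooling here are illustrative stand-ins:

```python
import numpy as np

def suffix_encode(word_vecs, decay=0.5):
    """Toy suffix encodings: h_i summarizes words i..n via a backward
    exponential-decay recurrence, and the sentence vector max-pools
    over all suffix states."""
    h = np.zeros_like(word_vecs[0])
    suffix_states = []
    for v in word_vecs[::-1]:       # walk the sentence from the end
        h = decay * h + v
        suffix_states.append(h)
    return np.max(suffix_states, axis=0)
```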
Towards Image Understanding from Deep Compression without Decoding
Title | Towards Image Understanding from Deep Compression without Decoding |
Authors | Robert Torfason, Fabian Mentzer, Eirikur Agustsson, Michael Tschannen, Radu Timofte, Luc Van Gool |
Abstract | Motivated by recent work on deep neural network (DNN)-based image compression methods showing potential improvements in image quality, savings in storage, and bandwidth reduction, we propose to perform image understanding tasks such as classification and segmentation directly on the compressed representations produced by these compression methods. Since the encoders and decoders in DNN-based compression methods are neural networks with feature-maps as internal representations of the images, we directly integrate these with architectures for image understanding. This bypasses decoding of the compressed representation into RGB space and reduces computational cost. Our study shows that accuracies comparable to networks that operate on compressed RGB images can be achieved while reducing the computational complexity up to $2\times$. Furthermore, we show that synergies are obtained by jointly training compression networks with classification networks on the compressed representations, improving image quality, classification accuracy, and segmentation performance. We find that inference from compressed representations is particularly advantageous compared to inference from compressed RGB images for aggressive compression rates. |
Tasks | Image Compression |
Published | 2018-03-16 |
URL | http://arxiv.org/abs/1803.06131v1 |
http://arxiv.org/pdf/1803.06131v1.pdf | |
PWC | https://paperswithcode.com/paper/towards-image-understanding-from-deep |
Repo | |
Framework | |
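The core pipeline change — classifying from the encoder's feature maps instead of decoded RGB — can be sketched as a pooling head reading a compressed representation directly; `W` and `b` are hypothetical classifier parameters:

```python
import numpy as np

def classify_compressed(feature_map, W, b):
    """Classify straight from a compressed feature map (C x H x W),
    skipping RGB decoding: global average pool, then a linear head."""
    pooled = feature_map.mean(axis=(1, 2))      # (C,)
    return int(np.argmax(W @ pooled + b))
```

The computational saving comes from never running the decoder; the classifier consumes whatever channel statistics the compression encoder already produced.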
Discovering Latent Patterns of Urban Cultural Interactions in WeChat for Modern City Planning
Title | Discovering Latent Patterns of Urban Cultural Interactions in WeChat for Modern City Planning |
Authors | Xiao Zhou, Anastasios Noulas, Cecilia Mascolo, Zhongxiang Zhao |
Abstract | Cultural activity is an inherent aspect of urban life and the success of a modern city is largely determined by its capacity to offer generous cultural entertainment to its citizens. To this end, the optimal allocation of cultural establishments and related resources across urban regions becomes of vital importance, as it can reduce financial costs in terms of planning and improve quality of life in the city, more generally. In this paper, we make use of a large longitudinal dataset of user location check-ins from the online social network WeChat to develop a data-driven framework for cultural planning in the city of Beijing. We exploit rich spatio-temporal representations on user activity at cultural venues and use a novel extended version of the traditional latent Dirichlet allocation model that incorporates temporal information to identify latent patterns of urban cultural interactions. Using the characteristic typologies of mobile user cultural activities emitted by the model, we determine the levels of demand for different types of cultural resources across urban areas. We then compare those with the corresponding levels of supply as driven by the presence and spatial reach of cultural venues in local areas to obtain high resolution maps that indicate urban regions with lack of cultural resources, and thus give suggestions for further urban cultural planning and investment optimisation. |
Tasks | |
Published | 2018-06-14 |
URL | http://arxiv.org/abs/1806.05694v1 |
http://arxiv.org/pdf/1806.05694v1.pdf | |
PWC | https://paperswithcode.com/paper/discovering-latent-patterns-of-urban-cultural |
Repo | |
Framework | |
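The temporal extension of LDA can be caricatured as multiplying the usual word-topic factor by an analogous hour-topic factor when resampling a check-in's topic. This is a simplified stand-in for the paper's extended model, with illustrative counts and prior:

```python
import numpy as np

def topic_posterior(word, hour, n_zw, n_zh, beta=0.01):
    """Per-check-in topic distribution for an LDA variant that also
    conditions on time of day: the standard smoothed word-topic
    factor is multiplied by an hour-topic factor of the same form."""
    V, H = n_zw.shape[1], n_zh.shape[1]
    p = ((n_zw[:, word] + beta) / (n_zw.sum(axis=1) + V * beta)
         * (n_zh[:, hour] + beta) / (n_zh.sum(axis=1) + H * beta))
    return p / p.sum()
```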
Pixel-level Semantics Guided Image Colorization
Title | Pixel-level Semantics Guided Image Colorization |
Authors | Jiaojiao Zhao, Li Liu, Cees G. M. Snoek, Jungong Han, Ling Shao |
Abstract | While many image colorization algorithms have recently shown the capability of producing plausible color versions from gray-scale photographs, they still suffer from the problems of context confusion and edge color bleeding. To address context confusion, we propose to incorporate the pixel-level object semantics to guide the image colorization. The rationale is that human beings perceive and distinguish colors based on the object’s semantic categories. We propose a hierarchical neural network with two branches. One branch learns what the object is while the other branch learns the object’s colors. The network jointly optimizes a semantic segmentation loss and a colorization loss. To attack edge color bleeding, we generate more continuous color maps with sharp edges by adopting a joint bilateral upsampling layer at inference. Our network is trained on PASCAL VOC2012 and COCO-stuff with semantic segmentation labels and it produces more realistic and finer results compared to the colorization state-of-the-art. |
Tasks | Colorization, Semantic Segmentation |
Published | 2018-08-05 |
URL | http://arxiv.org/abs/1808.01597v1 |
http://arxiv.org/pdf/1808.01597v1.pdf | |
PWC | https://paperswithcode.com/paper/pixel-level-semantics-guided-image |
Repo | |
Framework | |
Object Detection in Video with Spatiotemporal Sampling Networks
Title | Object Detection in Video with Spatiotemporal Sampling Networks |
Authors | Gedas Bertasius, Lorenzo Torresani, Jianbo Shi |
Abstract | We propose a Spatiotemporal Sampling Network (STSN) that uses deformable convolutions across time for object detection in videos. Our STSN performs object detection in a video frame by learning to spatially sample features from the adjacent frames. This naturally renders the approach robust to occlusion or motion blur in individual frames. Our framework does not require additional supervision, as it optimizes sampling locations directly with respect to object detection performance. Our STSN outperforms the state-of-the-art on the ImageNet VID dataset and, compared to prior video object detection methods, uses a simpler design and does not require optical flow data for training. |
Tasks | Object Detection, Optical Flow Estimation, Video Object Detection |
Published | 2018-03-15 |
URL | http://arxiv.org/abs/1803.05549v2 |
http://arxiv.org/pdf/1803.05549v2.pdf | |
PWC | https://paperswithcode.com/paper/object-detection-in-video-with-spatiotemporal |
Repo | |
Framework | |
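The sampling operation at STSN's heart — reading support-frame features at learned offsets from each reference location — looks roughly like this nearest-neighbour sketch (real deformable convolution uses bilinear interpolation and learns the offsets end-to-end):

```python
import numpy as np

def sample_support(frame, offsets, ys, xs):
    """Gather features from a support frame at deformed locations:
    each (y, x) reference position is shifted by its predicted
    offset, rounded, and clipped to the frame bounds."""
    H, W = frame.shape[:2]
    yy = np.clip(np.round(ys + offsets[..., 0]).astype(int), 0, H - 1)
    xx = np.clip(np.round(xs + offsets[..., 1]).astype(int), 0, W - 1)
    return frame[yy, xx]
```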
Machine Learning in Astronomy: A Case Study in Quasar-Star Classification
Title | Machine Learning in Astronomy: A Case Study in Quasar-Star Classification |
Authors | Mohammed Viquar, Suryoday Basak, Ariruna Dasgupta, Surbhi Agrawal, Snehanshu Saha |
Abstract | We present the results of various automated classification methods, based on machine learning (ML), of objects from data releases 6 and 7 (DR6 and DR7) of the Sloan Digital Sky Survey (SDSS), primarily distinguishing stars from quasars. We provide a careful scrutiny of approaches available in the literature and have highlighted the pitfalls in those approaches based on the nature of data used for the study. The aim is to investigate the appropriateness of the application of certain ML methods. The manuscript argues convincingly in favor of the efficacy of asymmetric AdaBoost to classify photometric data. The paper presents a critical review of existing studies and puts forward an application of asymmetric AdaBoost as an offspring of that exercise. |
Tasks | |
Published | 2018-04-13 |
URL | http://arxiv.org/abs/1804.05051v1 |
http://arxiv.org/pdf/1804.05051v1.pdf | |
PWC | https://paperswithcode.com/paper/machine-learning-in-astronomy-a-case-study-in |
Repo | |
Framework | |
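Asymmetric AdaBoost differs from the standard algorithm mainly in its reweighting step, which charges more for mistakes on the rare class. A sketch of one such step (the asymmetry factor `k` and the {-1, +1} labels are illustrative):

```python
import numpy as np

def asym_weight_update(w, y, pred, alpha, k=2.0):
    """One reweighting step of asymmetric AdaBoost: mistakes on the
    positive (rare) class are up-weighted k times more strongly than
    mistakes on the majority class, then weights are renormalized."""
    cost = np.where(y == 1, k, 1.0)
    w = w * np.exp(alpha * cost * (pred != y))
    return w / w.sum()
```

With `k > 1` the next weak learner is pushed toward fixing errors on the rare class, which is the point of the asymmetric variant for imbalanced star/quasar data.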
Assessment of electrical and infrastructure recovery in Puerto Rico following hurricane Maria using a multisource time series of satellite imagery
Title | Assessment of electrical and infrastructure recovery in Puerto Rico following hurricane Maria using a multisource time series of satellite imagery |
Authors | Jacob Shermeyer |
Abstract | Puerto Rico suffered severe damage from the category 5 hurricane (Maria) in September 2017. Total monetary damages are estimated to be ~92 billion USD, the third most costly tropical cyclone in US history. The response to this damage has been tempered and slow moving, with recent estimates placing 45% of the population without power three months after the storm. Consequently, we developed a unique data-fusion mapping approach called the Urban Development Index (UDI) and new open source tool, Comet Time Series (CometTS), to analyze the recovery of electricity and infrastructure in Puerto Rico. Our approach incorporates a combination of time series visualizations and change detection mapping to create depictions of power or infrastructure loss. It also provides a unique independent assessment of areas that are still struggling to recover. For this workflow, our time series approach combines nighttime imagery from the Suomi National Polar-orbiting Partnership Visible Infrared Imaging Radiometer Suite (NPP VIIRS), multispectral imagery from two Landsat satellites, US Census data, and crowd-sourced building footprint labels. Based upon our approach we can identify and evaluate: 1) the recovery of electrical power compared to pre-storm levels, 2) the location of potentially damaged infrastructure that has yet to recover from the storm, and 3) the number of persons without power over time. As of May 31, 2018, declined levels of observed brightness across the island indicate that 13.9% +/- ~5.6% of persons still lack power and/or that 13.2% +/- ~5.3% of infrastructure has been lost. In comparison, the Puerto Rico Electric Power Authority states that less than 1% of their customers still are without power. |
Tasks | Time Series |
Published | 2018-07-16 |
URL | http://arxiv.org/abs/1807.05854v1 |
http://arxiv.org/pdf/1807.05854v1.pdf | |
PWC | https://paperswithcode.com/paper/assessment-of-electrical-and-infrastructure |
Repo | |
Framework | |
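The brightness-based estimate reduces to a per-region fractional decline between pre- and post-storm composites; a minimal sketch with made-up radiance values (a CometTS-style workflow would compute these per census area from the VIIRS time series):

```python
import numpy as np

def brightness_decline(pre, post):
    """Fractional decline in nighttime brightness per region, a proxy
    for the share of customers without power.  Increases in brightness
    are clipped to zero decline."""
    pre, post = np.asarray(pre, float), np.asarray(post, float)
    return np.clip((pre - post) / pre, 0.0, 1.0)
```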
The Viterbi process, decay-convexity and parallelized maximum a-posteriori estimation
Title | The Viterbi process, decay-convexity and parallelized maximum a-posteriori estimation |
Authors | Nick Whiteley, Matt W. Jones, Aleks P. F. Domanski |
Abstract | The Viterbi process is the limiting maximum a-posteriori estimate of the unobserved path in a hidden Markov model as the length of the time horizon grows. The existence of such a process suggests that approximate estimation using optimization algorithms which process data segments in parallel may be accurate. For models on state-space $\mathbb{R}^{d}$ satisfying a new “decay-convexity” condition, we develop an approach to existence of the Viterbi process via fixed points of ordinary differential equations in a certain infinite dimensional Hilbert space. Bounds on the distance to the Viterbi process show that approximate estimation via parallelization can indeed be accurate and scaleable to high-dimensional problems because the rate of convergence to the Viterbi process does not necessarily depend on $d$. The results are applied to a factor model with stochastic volatility and a model of neural population activity. |
Tasks | |
Published | 2018-10-08 |
URL | https://arxiv.org/abs/1810.04115v4 |
https://arxiv.org/pdf/1810.04115v4.pdf | |
PWC | https://paperswithcode.com/paper/the-viterbi-process-decay-convexity-and |
Repo | |
Framework | |
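For a discrete HMM, the finite-horizon MAP path whose long-horizon limit the paper studies is computed by the classic Viterbi recursion; a log-space sketch:

```python
import numpy as np

def viterbi(log_pi, log_A, log_obs):
    """MAP state path of a discrete HMM.  log_pi: initial log-probs
    (K,), log_A: transition log-probs (K x K), log_obs: per-time
    observation log-likelihoods (T x K)."""
    T, K = log_obs.shape
    delta = log_pi + log_obs[0]
    back = np.zeros((T, K), dtype=int)
    for t in range(1, T):
        scores = delta[:, None] + log_A      # (prev state, next state)
        back[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + log_obs[t]
    path = [int(delta.argmax())]
    for t in range(T - 1, 0, -1):            # follow backpointers
        path.append(int(back[t, path[-1]]))
    return path[::-1]
```

The parallelized estimation the paper analyzes would run such a computation on overlapping data segments and rely on the decay of dependence along the path to stitch the pieces together.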
One-shot Learning for iEEG Seizure Detection Using End-to-end Binary Operations: Local Binary Patterns with Hyperdimensional Computing
Title | One-shot Learning for iEEG Seizure Detection Using End-to-end Binary Operations: Local Binary Patterns with Hyperdimensional Computing |
Authors | Alessio Burrello, Kaspar Schindler, Luca Benini, Abbas Rahimi |
Abstract | This paper presents an efficient binarized algorithm for both learning and classification of human epileptic seizures from intracranial electroencephalography (iEEG). The algorithm combines local binary patterns with brain-inspired hyperdimensional computing to enable end-to-end learning and inference with binary operations. The algorithm first transforms iEEG time series from each electrode into local binary pattern codes. Then atomic high-dimensional binary vectors are used to construct composite representations of seizures across all electrodes. For the majority of our patients (10 out of 16), the algorithm quickly learns from one or two seizures (i.e., one-/few-shot learning) and perfectly generalizes on 27 further seizures. For other patients, the algorithm requires three to six seizures for learning. Overall, our algorithm surpasses the state-of-the-art methods for detecting 65 novel seizures with higher specificity and sensitivity, and lower memory footprint. |
Tasks | Few-Shot Learning, One-Shot Learning, Seizure Detection, Time Series |
Published | 2018-09-06 |
URL | http://arxiv.org/abs/1809.01926v1 |
http://arxiv.org/pdf/1809.01926v1.pdf | |
PWC | https://paperswithcode.com/paper/one-shot-learning-for-ieeg-seizure-detection |
Repo | |
Framework | |
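The end-to-end binary pipeline can be sketched with numpy: pack each short window's up/down pattern into a local binary pattern code, assign each code a random atomic hypervector, and bundle codes by bitwise majority. The hypervector dimension and 6-bit code length here are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def lbp_codes(signal, bits=6):
    """Local binary pattern of a 1-D series: 1 where the next sample
    increases, packed into overlapping `bits`-sample codes."""
    up = (np.diff(signal) > 0).astype(int)
    return [int("".join(map(str, up[i:i + bits])), 2)
            for i in range(len(up) - bits + 1)]

# Atomic binary hypervectors, one per possible LBP code.
dim = 1000
item_memory = {c: rng.integers(0, 2, dim) for c in range(64)}

def bundle(codes, item_memory):
    """Bundle a window's codes into one composite hypervector by
    bitwise majority vote (ties round up)."""
    acc = np.sum([item_memory[c] for c in codes], axis=0)
    return (acc * 2 >= len(codes)).astype(int)

def hamming(a, b):
    """Normalized Hamming distance, the binary similarity measure
    used to compare composite vectors against learned prototypes."""
    return np.mean(a != b)
```

Classification then amounts to comparing a window's bundled vector against per-class prototype vectors and picking the nearest in Hamming distance.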
Low complexity convolutional neural network for vessel segmentation in portable retinal diagnostic devices
Title | Low complexity convolutional neural network for vessel segmentation in portable retinal diagnostic devices |
Authors | M. Hajabdollahi, R. Esfandiarpoor, S. M. R. Soroushmehr, N. Karimi, S. Samavi, K. Najarian |
Abstract | Retinal vessel information is helpful in retinal disease screening and diagnosis. Retinal vessel segmentation provides useful information about vessels and can be used by physicians during intraocular surgery and retinal diagnostic operations. Convolutional neural networks (CNNs) are powerful tools for classification and segmentation of medical images. Complexity of CNNs makes it difficult to implement them in portable devices such as binocular indirect ophthalmoscopes. In this paper a simplification approach is proposed for CNNs based on combination of quantization and pruning. Fully connected layers are quantized and convolutional layers are pruned to have a simple and efficient network structure. Experiments on images of the STARE dataset show that our simplified network is able to segment retinal vessels with acceptable accuracy and low complexity. |
Tasks | Quantization, Retinal Vessel Segmentation |
Published | 2018-02-21 |
URL | http://arxiv.org/abs/1802.07804v1 |
http://arxiv.org/pdf/1802.07804v1.pdf | |
PWC | https://paperswithcode.com/paper/low-complexity-convolutional-neural-network |
Repo | |
Framework | |
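The two simplifications combine pruning (for convolutional layers) with quantization (for fully connected layers); minimal numpy sketches of both, with illustrative sparsity and bit-width:

```python
import numpy as np

def prune(weights, sparsity=0.5):
    """Magnitude pruning: zero the smallest-magnitude fraction of the
    weights, keeping the rest unchanged."""
    thresh = np.quantile(np.abs(weights), sparsity)
    return np.where(np.abs(weights) < thresh, 0.0, weights)

def quantize(weights, bits=4):
    """Uniform quantization to 2**bits levels spanning the weight
    range, mapped back to real values for inference."""
    lo, hi = weights.min(), weights.max()
    levels = 2 ** bits - 1
    q = np.round((weights - lo) / (hi - lo) * levels)
    return q / levels * (hi - lo) + lo
```

Pruned convolutions skip multiplies by zero, and low-bit fully connected weights shrink the memory footprint — the two costs that dominate on a portable device.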
Early Seizure Detection with an Energy-Efficient Convolutional Neural Network on an Implantable Microcontroller
Title | Early Seizure Detection with an Energy-Efficient Convolutional Neural Network on an Implantable Microcontroller |
Authors | Maria Hügle, Simon Heller, Manuel Watter, Manuel Blum, Farrokh Manzouri, Matthias Dümpelmann, Andreas Schulze-Bonhage, Peter Woias, Joschka Boedecker |
Abstract | Implantable, closed-loop devices for automated early detection and stimulation of epileptic seizures are promising treatment options for patients with severe epilepsy that cannot be treated with traditional means. Most approaches for early seizure detection in the literature are, however, not optimized for implementation on ultra-low power microcontrollers required for long-term implantation. In this paper we present a convolutional neural network for the early detection of seizures from intracranial EEG signals, designed specifically for this purpose. In addition, we investigate approximations to comply with hardware limits while preserving accuracy. We compare our approach to three previously proposed convolutional neural networks and a feature-based SVM classifier with respect to detection accuracy, latency and computational needs. Evaluation is based on a comprehensive database with long-term EEG recordings. The proposed method outperforms the other detectors with a median sensitivity of 0.96, false detection rate of 10.1 per hour and median detection delay of 3.7 seconds, while being the only approach suited to be realized on a low power microcontroller due to its parsimonious use of computational and memory resources. |
Tasks | EEG, Seizure Detection |
Published | 2018-06-12 |
URL | http://arxiv.org/abs/1806.04549v1 |
http://arxiv.org/pdf/1806.04549v1.pdf | |
PWC | https://paperswithcode.com/paper/early-seizure-detection-with-an-energy |
Repo | |
Framework | |