October 17, 2019

3354 words 16 mins read

Paper Group ANR 952

Ego-Downward and Ambient Video based Person Location Association. Cooking State Recognition from Images Using Inception Architecture. RUSSE’2018: A Shared Task on Word Sense Induction for the Russian Language. A generalized financial time series forecasting model based on automatic feature engineering using genetic algorithms and support vector mac …

Ego-Downward and Ambient Video based Person Location Association

Title Ego-Downward and Ambient Video based Person Location Association
Authors Liang Yang, Hao Jiang, Jizhong Xiao, Zhouyuan Huo
Abstract Localization and tracking with an egocentric camera is in high demand for urban navigation and indoor assistive systems when GPS is unavailable or not accurate enough. Traditional hand-designed feature tracking and estimation approaches fail without visible features. Recently, several works have explored using context features for localization; however, all of them suffer severe accuracy loss when no visual context information is available. To provide a possible solution to this problem, this paper proposes a camera system with both an ego-downward and a third-static view to perform localization and tracking with a learning approach. We also propose a novel action and motion verification model for cross-view verification and localization. We performed comparative experiments on our collected dataset, which covers same-dressing, gender, and background diversity. Results indicate that the proposed model achieves an $18.32%$ improvement in accuracy. Finally, we tested the model in multi-person scenarios and obtained an average accuracy of $67.767%$.
Tasks
Published 2018-12-02
URL http://arxiv.org/abs/1812.00477v1
PDF http://arxiv.org/pdf/1812.00477v1.pdf
PWC https://paperswithcode.com/paper/ego-downward-and-ambient-video-based-person
Repo
Framework

Cooking State Recognition from Images Using Inception Architecture

Title Cooking State Recognition from Images Using Inception Architecture
Authors Md Sirajus Salekin, Ahmad Babaeian Jelodar, Rafsanjany Kushol
Abstract A kitchen robot needs to properly understand the cooking environment to carry out cooking activities, but object state detection has not been studied as thoroughly as object detection. In this paper, we propose a deep learning approach to identify different cooking states from images for a kitchen robot. In particular, we investigate the performance of the Inception architecture and propose a modified Inception-based architecture to classify different cooking states. The model is analyzed robustly in terms of different layers and optimizers. Experimental results on a cooking dataset demonstrate that the proposed model can be a potential solution to the cooking state recognition problem.
Tasks Object Detection
Published 2018-05-25
URL http://arxiv.org/abs/1805.09967v2
PDF http://arxiv.org/pdf/1805.09967v2.pdf
PWC https://paperswithcode.com/paper/cooking-state-recognition-from-images-using
Repo
Framework

RUSSE’2018: A Shared Task on Word Sense Induction for the Russian Language

Title RUSSE’2018: A Shared Task on Word Sense Induction for the Russian Language
Authors Alexander Panchenko, Anastasiya Lopukhina, Dmitry Ustalov, Konstantin Lopukhin, Nikolay Arefyev, Alexey Leontyev, Natalia Loukachevitch
Abstract The paper describes the results of the first shared task on word sense induction (WSI) for the Russian language. While similar shared tasks were conducted in the past for some Romance and Germanic languages, we explore the performance of sense induction and disambiguation methods for a Slavic language that shares many features with other Slavic languages, such as rich morphology and virtually free word order. The participants were asked to group contexts of a given word in accordance with its senses, which were not provided beforehand. For instance, given the word “bank” and a set of contexts for this word, e.g. “bank is a financial institution that accepts deposits” and “river bank is a slope beside a body of water”, a participant was asked to cluster these contexts into a number of clusters, unknown in advance, corresponding in this case to the “company” and the “area” senses of the word “bank”. For the purpose of this evaluation campaign, we developed three new evaluation datasets based on sense inventories with different sense granularity. The contexts in these datasets were sampled from Wikipedia texts, the academic corpus of Russian, and an explanatory dictionary of Russian. Overall, 18 teams participated in the competition, submitting 383 models. Multiple teams managed to substantially outperform competitive state-of-the-art baselines from previous years based on sense embeddings.
Tasks Word Sense Induction
Published 2018-03-15
URL http://arxiv.org/abs/1803.05795v3
PDF http://arxiv.org/pdf/1803.05795v3.pdf
PWC https://paperswithcode.com/paper/russe2018-a-shared-task-on-word-sense
Repo
Framework
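As a toy illustration of the WSI setup described above, one can cluster bag-of-words context vectors with plain k-means. This is a naive baseline sketch, not any participant's system; the vocabulary and contexts below are invented:

```python
import numpy as np

def kmeans(X, k, iters=50, seed=0):
    """Plain k-means; returns a cluster label for each row of X."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # Assign each context vector to its nearest center.
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d.argmin(1)
        # Recompute centers; keep the old center if a cluster empties.
        for j in range(k):
            if (labels == j).any():
                centers[j] = X[labels == j].mean(0)
    return labels

# Toy contexts for the ambiguous word "bank", as bag-of-words counts
# over the vocabulary [money, deposit, river, water].
contexts = np.array([
    [2.0, 1.0, 0.0, 0.0],   # financial sense
    [1.0, 2.0, 0.0, 0.0],   # financial sense
    [0.0, 0.0, 2.0, 1.0],   # river sense
    [0.0, 0.0, 1.0, 2.0],   # river sense
])
labels = kmeans(contexts, k=2)
# The two financial contexts should land in one cluster,
# the two river contexts in the other.
print(labels)
```

Real submissions used far richer context representations (e.g. sense embeddings), but the clustering-with-unknown-sense-inventory structure of the task is the same.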

A generalized financial time series forecasting model based on automatic feature engineering using genetic algorithms and support vector machine

Title A generalized financial time series forecasting model based on automatic feature engineering using genetic algorithms and support vector machine
Authors Norberto Ritzmann Junior, Julio Cesar Nievola
Abstract We propose an embedded genetic algorithm (GA) for time window optimization, which optimizes the time window (TW) of the attributes using feature selection and a support vector machine. The GA is evolved using the results of a trading simulation and determines the best TW for each technical indicator. An appropriate evaluation was conducted using a walk-forward trading simulation, and the trained model was verified to generalize to forecasting other stock data. The results show that using the GA to determine the TW can improve the rate of return, leading to better prediction models than those obtained with the default TW.
Tasks Feature Engineering, Feature Selection, Time Series, Time Series Forecasting
Published 2018-09-18
URL http://arxiv.org/abs/1809.06775v1
PDF http://arxiv.org/pdf/1809.06775v1.pdf
PWC https://paperswithcode.com/paper/a-generalized-financial-time-series
Repo
Framework
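The embedded-GA idea can be sketched as follows: a tiny GA evolves an integer time window for a moving-average predictor on a synthetic price series. The fitness function is a stand-in for the paper's trading simulation, and all parameters (population size, mutation range, the "default" TW of 10) are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic price series: a slow trend plus noise.
t = np.arange(300)
prices = np.sin(t / 20.0) + 0.05 * rng.standard_normal(300)

def fitness(w):
    """Score a time window w: negative error of predicting each price
    from the mean of the previous w prices (a stand-in for the paper's
    trading-simulation fitness)."""
    preds = np.array([prices[i - w:i].mean() for i in range(w, len(prices))])
    return -np.mean((preds - prices[w:]) ** 2)

# Tiny GA over integer window sizes in [2, 50].
pop = list(rng.integers(2, 51, size=8))
pop[0] = 10  # include a "default" TW so the GA can only match or beat it
for _ in range(20):
    scored = sorted(pop, key=fitness, reverse=True)
    parents = scored[:4]                     # selection: keep the best half
    children = [max(2, min(50, p + int(rng.integers(-3, 4))))
                for p in parents]            # mutation: jitter each window
    pop = parents + children                 # elitism keeps the best so far
best = max(pop, key=fitness)
print(best)
```

Because the elitist step always carries the best individual forward, the evolved window can never score worse than the default one it started from, mirroring the paper's claim that GA-chosen TWs improve on defaults.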

Correlated Time Series Forecasting using Deep Neural Networks: A Summary of Results

Title Correlated Time Series Forecasting using Deep Neural Networks: A Summary of Results
Authors Razvan-Gabriel Cirstea, Darius-Valer Micu, Gabriel-Marcel Muresan, Chenjuan Guo, Bin Yang
Abstract Cyber-physical systems often consist of entities that interact with each other over time. Meanwhile, as part of the continued digitization of industrial processes, various sensor technologies are deployed that enable us to record time-varying attributes (a.k.a., time series) of such entities, thus producing correlated time series. To enable accurate forecasting on such correlated time series, this paper proposes two models that combine convolutional neural networks (CNNs) and recurrent neural networks (RNNs). The first model employs a CNN on each individual time series, combines the convolved features, and then applies an RNN on top of them to enable forecasting. The second model adds auto-encoders to the individual CNNs, making it a multi-task learning model that provides accurate and robust forecasting. Experiments on two real-world correlated time series data sets suggest that the two proposed models are effective and outperform baselines in most settings. This report extends the paper “Correlated Time Series Forecasting using Multi-Task Deep Neural Networks,” to appear in ACM CIKM 2018, by providing additional experimental results.
Tasks Multi-Task Learning, Time Series, Time Series Forecasting
Published 2018-08-29
URL http://arxiv.org/abs/1808.09794v2
PDF http://arxiv.org/pdf/1808.09794v2.pdf
PWC https://paperswithcode.com/paper/correlated-time-series-forecasting-using-deep
Repo
Framework

Information geometry for approximate Bayesian computation

Title Information geometry for approximate Bayesian computation
Authors Konstantinos Spiliopoulos
Abstract The goal of this paper is to explore the basic Approximate Bayesian Computation (ABC) algorithm through the lens of information theory. ABC is a widely used algorithm in cases where the likelihood of the data is hard to work with or intractable but one can simulate from it. We use relative entropy ideas to analyze the behavior of the algorithm as a function of the threshold parameter and of the size of the data. Relative entropy here is data driven, as it depends on the values of the observed statistics. Relative entropy also allows us to explore the effect of the distance metric and sets up a mathematical framework for sensitivity analysis, allowing us to find important directions that could lead to lower computational cost of the algorithm for the same level of accuracy. In addition, we investigate the bias of the estimators for generic observables as a function of both the threshold parameter and the size of the data. Our analysis provides error bounds on performance for positive tolerances and finite sample sizes. Simulation studies complement and illustrate the theoretical results.
Tasks
Published 2018-12-05
URL https://arxiv.org/abs/1812.02127v2
PDF https://arxiv.org/pdf/1812.02127v2.pdf
PWC https://paperswithcode.com/paper/information-geometry-for-approximate-bayesian
Repo
Framework
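The basic ABC algorithm the paper analyzes can be sketched as rejection sampling; the model, prior, summary statistic, and threshold below are all illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)

# "Observed" data from a normal with unknown mean theta* = 2.0.
theta_true = 2.0
observed = rng.normal(theta_true, 1.0, size=200)
s_obs = observed.mean()  # summary statistic

def abc_rejection(n_draws, eps):
    """Basic rejection ABC: draw theta from the prior, simulate data,
    and keep theta when the simulated summary falls within eps of the
    observed one."""
    accepted = []
    for _ in range(n_draws):
        theta = rng.uniform(-10.0, 10.0)          # flat prior
        sim = rng.normal(theta, 1.0, size=200)    # simulate from the model
        if abs(sim.mean() - s_obs) < eps:         # distance on summaries
            accepted.append(theta)
    return np.array(accepted)

post = abc_rejection(n_draws=20000, eps=0.2)
# The accepted draws approximate the posterior over theta; shrinking eps
# tightens the approximation at the cost of a lower acceptance rate --
# exactly the threshold/accuracy trade-off the paper studies.
print(len(post), post.mean())
```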

Molecular Structure Extraction From Documents Using Deep Learning

Title Molecular Structure Extraction From Documents Using Deep Learning
Authors Joshua Staker, Kyle Marshall, Robert Abel, Carolyn McQuaw
Abstract Chemical structure extraction from documents remains a hard problem due to both false positive identification of structures during segmentation and errors in the predicted structures. Current approaches rely on handcrafted rules and subroutines that perform reasonably well generally, but still routinely encounter situations where recognition rates are not yet satisfactory and systematic improvement is challenging. Complications impacting performance of current approaches include the diversity in visual styles used by various software to render structures, the frequent use of ad hoc annotations, and other challenges related to image quality, including resolution and noise. We here present end-to-end deep learning solutions for both segmenting molecular structures from documents and for predicting chemical structures from these segmented images. This deep learning-based approach does not require any handcrafted features, is learned directly from data, and is robust against variations in image quality and style. Using the deep-learning approach described herein we show that it is possible to perform well on both segmentation and prediction of low resolution images containing moderately sized molecules found in journal articles and patents.
Tasks
Published 2018-02-14
URL http://arxiv.org/abs/1802.04903v1
PDF http://arxiv.org/pdf/1802.04903v1.pdf
PWC https://paperswithcode.com/paper/molecular-structure-extraction-from-documents
Repo
Framework

Combining time-series and textual data for taxi demand prediction in event areas: a deep learning approach

Title Combining time-series and textual data for taxi demand prediction in event areas: a deep learning approach
Authors Filipe Rodrigues, Ioulia Markou, Francisco Pereira
Abstract Accurate time-series forecasting is vital for numerous areas of application such as transportation, energy, finance, economics, etc. However, while modern techniques are able to explore large sets of temporal data to build forecasting models, they typically neglect valuable information that is often available in the form of unstructured text. Although this data is in a radically different format, it often contains contextual explanations for many of the patterns that are observed in the temporal data. In this paper, we propose two deep learning architectures that leverage word embeddings, convolutional layers and attention mechanisms for combining text information with time-series data. We apply these approaches to the problem of taxi demand forecasting in event areas. Using publicly available taxi data from New York, we empirically show that by fusing these two complementary cross-modal sources of information, the proposed models are able to significantly reduce the error in the forecasts.
Tasks Time Series, Time Series Forecasting, Word Embeddings
Published 2018-08-16
URL http://arxiv.org/abs/1808.05535v1
PDF http://arxiv.org/pdf/1808.05535v1.pdf
PWC https://paperswithcode.com/paper/combining-time-series-and-textual-data-for
Repo
Framework
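A minimal sketch of the cross-modal fusion idea, assuming toy word embeddings and lagged demand features; the paper's architectures use convolutional layers and attention rather than this simple mean-and-concatenate:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy word embeddings (in the paper these come from pre-trained vectors).
vocab = {"concert": 0, "stadium": 1, "parade": 2, "rain": 3}
emb = rng.standard_normal((len(vocab), 4))

def fuse(demand_history, event_text, n_lags=3):
    """Naive cross-modal fusion: concatenate the last n_lags demand
    values with the mean embedding of the event description."""
    lags = np.asarray(demand_history[-n_lags:], dtype=float)
    words = [w for w in event_text.lower().split() if w in vocab]
    text_vec = (emb[[vocab[w] for w in words]].mean(0)
                if words else np.zeros(emb.shape[1]))
    return np.concatenate([lags, text_vec])

x = fuse([120, 140, 180, 230], "Concert at the stadium")
print(x.shape)   # 3 lags + 4 embedding dims
```

The fused vector would then feed a downstream forecasting model; the point of the paper is that the text half explains demand spikes the lag features alone cannot.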

A proof that artificial neural networks overcome the curse of dimensionality in the numerical approximation of Black-Scholes partial differential equations

Title A proof that artificial neural networks overcome the curse of dimensionality in the numerical approximation of Black-Scholes partial differential equations
Authors Philipp Grohs, Fabian Hornung, Arnulf Jentzen, Philippe von Wurstemberger
Abstract Artificial neural networks (ANNs) have very successfully been used in numerical simulations for a series of computational problems ranging from image classification/image recognition, speech recognition, time series analysis, game intelligence, and computational advertising to numerical approximations of partial differential equations (PDEs). Such numerical simulations suggest that ANNs have the capacity to very efficiently approximate high-dimensional functions and, especially, such numerical simulations indicate that ANNs seem to admit the fundamental power to overcome the curse of dimensionality when approximating the high-dimensional functions appearing in the above named computational problems. There are also a series of rigorous mathematical approximation results for ANNs in the scientific literature. Some of these mathematical results prove convergence without convergence rates and some of these mathematical results even rigorously establish convergence rates but there are only a few special cases where mathematical results can rigorously explain the empirical success of ANNs when approximating high-dimensional functions. The key contribution of this article is to disclose that ANNs can efficiently approximate high-dimensional functions in the case of numerical approximations of Black-Scholes PDEs. More precisely, this work reveals that the number of required parameters of an ANN to approximate the solution of the Black-Scholes PDE grows at most polynomially in both the reciprocal of the prescribed approximation accuracy $\varepsilon > 0$ and the PDE dimension $d \in \mathbb{N}$ and we thereby prove, for the first time, that ANNs do indeed overcome the curse of dimensionality in the numerical approximation of Black-Scholes PDEs.
Tasks Image Classification, Speech Recognition, Time Series, Time Series Analysis
Published 2018-09-07
URL http://arxiv.org/abs/1809.02362v1
PDF http://arxiv.org/pdf/1809.02362v1.pdf
PWC https://paperswithcode.com/paper/a-proof-that-artificial-neural-networks
Repo
Framework
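For reference, the one-dimensional Black-Scholes PDE for the option value $u(t,x)$; the notation (interest rate $r$, volatility $\sigma$) follows textbook convention rather than the paper itself, which treats the $d$-dimensional analogue:

```latex
% Black-Scholes PDE for the option value u(t, x), with interest rate r
% and volatility \sigma; the paper's result concerns the d-dimensional
% analogue of this equation.
\frac{\partial u}{\partial t}
  + \frac{1}{2}\sigma^2 x^2 \frac{\partial^2 u}{\partial x^2}
  + r x \frac{\partial u}{\partial x}
  - r u = 0
```

The paper's contribution is that an ANN approximating the solution to accuracy $\varepsilon$ needs only polynomially many parameters in $1/\varepsilon$ and $d$, rather than exponentially many in $d$.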

A Greedy Approach to $\ell_{0,\infty}$ Based Convolutional Sparse Coding

Title A Greedy Approach to $\ell_{0,\infty}$ Based Convolutional Sparse Coding
Authors Elad Plaut, Raja Giryes
Abstract Sparse coding techniques for image processing traditionally rely on a processing of small overlapping patches separately followed by averaging. This has the disadvantage that the reconstructed image no longer obeys the sparsity prior used in the processing. For this purpose convolutional sparse coding has been introduced, where a shift-invariant dictionary is used and the sparsity of the recovered image is maintained. Most such strategies target the $\ell_0$ “norm” or the $\ell_1$ norm of the whole image, which may create an imbalanced sparsity across various regions in the image. In order to face this challenge, the $\ell_{0,\infty}$ “norm” has been proposed as an alternative that “operates locally while thinking globally”. The approaches taken for tackling the non-convexity of these optimization problems have been either using a convex relaxation or local pursuit algorithms. In this paper, we present an efficient greedy method for sparse coding and dictionary learning, which is specifically tailored to $\ell_{0,\infty}$, and is based on matching pursuit. We demonstrate the usage of our approach in salt-and-pepper noise removal and image inpainting. A code package which reproduces the experiments presented in this work is available at https://web.eng.tau.ac.il/~raja
Tasks Dictionary Learning, Image Inpainting, Salt-And-Pepper Noise Removal
Published 2018-12-26
URL http://arxiv.org/abs/1812.10538v1
PDF http://arxiv.org/pdf/1812.10538v1.pdf
PWC https://paperswithcode.com/paper/a-greedy-approach-to-ell_0infty-based
Repo
Framework
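The greedy core of matching pursuit, which the proposed method builds on, can be sketched as follows. This is the textbook algorithm without the paper's $\ell_{0,\infty}$ local-sparsity constraint, and the dictionary and signal are synthetic:

```python
import numpy as np

def matching_pursuit(D, y, n_atoms):
    """Plain matching pursuit: greedily pick the dictionary atom most
    correlated with the current residual and peel off its contribution.
    Assumes the columns of D have unit norm."""
    residual = y.astype(float).copy()
    x = np.zeros(D.shape[1])
    for _ in range(n_atoms):
        corr = D.T @ residual
        k = np.argmax(np.abs(corr))       # best-matching atom
        x[k] += corr[k]                   # update its coefficient
        residual -= corr[k] * D[:, k]     # remove its contribution
    return x, residual

rng = np.random.default_rng(0)
D = rng.standard_normal((20, 50))
D /= np.linalg.norm(D, axis=0)            # normalize atoms
x_true = np.zeros(50)
x_true[[3, 17]] = [1.5, -2.0]             # 2-sparse ground truth
y = D @ x_true
x_hat, res = matching_pursuit(D, y, n_atoms=10)
# With a well-conditioned random dictionary, the residual shrinks
# substantially after a few greedy steps.
print(np.linalg.norm(res), np.linalg.norm(y))
```

The paper's convolutional variant applies the same greedy logic with a shift-invariant dictionary while bounding the number of nonzeros in every local stripe of the code.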

Dual Reweighted Lp-Norm Minimization for Salt-and-pepper Noise Removal

Title Dual Reweighted Lp-Norm Minimization for Salt-and-pepper Noise Removal
Authors Huiwen Dong, Jing Yu, Chuangbai Xiao
Abstract The robust principal component analysis (RPCA), which aims to estimate underlying low-rank and sparse structures from degraded observation data, has found wide applications in computer vision. It is usually replaced by the principal component pursuit (PCP) model in order to obtain convexity, leading to the undesirable overshrinking problem. In this paper, we propose a dual weighted lp-norm (DWLP) model with a more reasonable weighting rule and weaker powers, which greatly generalizes previous work and provides a better approximation to the rank minimization problem for the original matrix, as well as to the l0-norm minimization problem for the sparse data. Moreover, an approximate closed-form solution is introduced to solve the lp-norm minimization, which is more stable in the nonconvex optimization and provides a more accurate estimate for the low-rank and sparse matrix recovery problem. We then apply the DWLP model to remove salt-and-pepper noise by exploiting image nonlocal self-similarity. Both qualitative and quantitative experiments demonstrate that the proposed method outperforms other state-of-the-art methods. In terms of PSNR, our DWLP achieves improvements of about 7.188 dB, 5.078 dB, 3.854 dB, 2.536 dB, and 0.158 dB over the current WSNM-RPCA under 10% to 50% salt-and-pepper noise in 10% increments, respectively.
Tasks Salt-And-Pepper Noise Removal
Published 2018-11-22
URL https://arxiv.org/abs/1811.09173v3
PDF https://arxiv.org/pdf/1811.09173v3.pdf
PWC https://paperswithcode.com/paper/dual-reweighted-lp-norm-minimization-for-salt
Repo
Framework

Materials for Masses: SVBRDF Acquisition with a Single Mobile Phone Image

Title Materials for Masses: SVBRDF Acquisition with a Single Mobile Phone Image
Authors Zhengqin Li, Kalyan Sunkavalli, Manmohan Chandraker
Abstract We propose a material acquisition approach to recover the spatially-varying BRDF and normal map of a near-planar surface from a single image captured by a handheld mobile phone camera. Our method images the surface under arbitrary environment lighting with the flash turned on, thereby avoiding shadows while simultaneously capturing high-frequency specular highlights. We train a CNN to regress an SVBRDF and surface normals from this image. Our network is trained using a large-scale SVBRDF dataset and designed to incorporate physical insights for material estimation, including an in-network rendering layer to model appearance and a material classifier to provide additional supervision during training. We refine the results from the network using a dense CRF module whose terms are designed specifically for our task. The framework is trained end-to-end and produces high quality results for a variety of materials. We provide extensive ablation studies to evaluate our network on both synthetic and real data, while demonstrating significant improvements in comparisons with prior works.
Tasks
Published 2018-04-16
URL http://arxiv.org/abs/1804.05790v1
PDF http://arxiv.org/pdf/1804.05790v1.pdf
PWC https://paperswithcode.com/paper/materials-for-masses-svbrdf-acquisition-with
Repo
Framework

An Empirical Comparison of Syllabuses for Curriculum Learning

Title An Empirical Comparison of Syllabuses for Curriculum Learning
Authors Mark Collier, Joeran Beel
Abstract Syllabuses for curriculum learning have been developed on an ad-hoc, per-task basis, and little is known about the relative performance of different syllabuses. We identify a number of syllabuses used in the literature and compare them based on their effect on the speed of learning and generalization ability of an LSTM network on three sequential learning tasks. We find that the choice of syllabus has limited effect on the generalization ability of a trained network. In terms of speed of learning, our results demonstrate that the best syllabus is task dependent, but that a recently proposed automated curriculum learning approach, Predictive Gain, performs very competitively against all identified hand-crafted syllabuses. The best-performing hand-crafted syllabus, which we term Look Back and Forward, combines a syllabus that steps through tasks in order of their difficulty with a uniform distribution over all tasks. Our experimental results provide an empirical basis for the choice of syllabus on a new problem that could benefit from curriculum learning. Additionally, insights derived from our results shed light on how to successfully design new syllabuses.
Tasks
Published 2018-09-27
URL http://arxiv.org/abs/1809.10789v2
PDF http://arxiv.org/pdf/1809.10789v2.pdf
PWC https://paperswithcode.com/paper/an-empirical-comparison-of-syllabuses-for
Repo
Framework
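The Look Back and Forward syllabus can be sketched as a mixture between the current curriculum step and a uniform distribution over all tasks; the mixing probability and stepping rule below are illustrative assumptions, not the paper's exact settings:

```python
import random

def look_back_and_forward(n_tasks, current, p_uniform=0.2, rng=random):
    """Sketch of the 'Look Back and Forward' syllabus: with probability
    p_uniform sample any task uniformly (revisit easier tasks or preview
    harder ones), otherwise train on the current curriculum task."""
    if rng.random() < p_uniform:
        return rng.randrange(n_tasks)   # uniform over all tasks
    return current                      # current difficulty step

random.seed(0)
n_tasks, current = 10, 3
draws = [look_back_and_forward(n_tasks, current) for _ in range(1000)]
frac_current = draws.count(current) / len(draws)
# Most draws stay on the current task; the uniform component mixes in
# the rest of the curriculum.
print(frac_current)
```

In a full training loop, `current` would advance to the next task once performance on it crosses a threshold, which is the "steps through tasks in order of difficulty" half of the syllabus.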

Deep Multimodal Learning: An Effective Method for Video Classification

Title Deep Multimodal Learning: An Effective Method for Video Classification
Authors Tianqi Zhao
Abstract Videos have become ubiquitous on the Internet, and video analysis can provide a great deal of information for detecting and recognizing objects, as well as help people understand human actions and interactions with the real world. However, with data at the terabyte scale, effective methods are required. The recurrent neural network (RNN) architecture has been widely used on many sequential learning problems, such as language modelling and time-series analysis. In this paper, we propose variations of RNNs, such as a stacked bidirectional LSTM/GRU network with an attention mechanism, to categorize large-scale video data, and we explore different multimodal fusion methods. Our model combines visual and audio information at both the video and frame level and achieved strong results; ensemble methods are also applied. Because of its multimodal character, we call this method Deep Multimodal Learning (DML). Our DML-based model was trained on Google Cloud and our own server and was tested in a well-known video classification competition on Kaggle held by Google.
Tasks Language Modelling, Time Series, Time Series Analysis, Video Classification
Published 2018-11-30
URL http://arxiv.org/abs/1811.12563v1
PDF http://arxiv.org/pdf/1811.12563v1.pdf
PWC https://paperswithcode.com/paper/deep-multimodal-learning-an-effective-method
Repo
Framework
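One ingredient mentioned above, attention over frame-level features, can be sketched as softmax-weighted pooling; the scoring vector here is random rather than learned, purely for illustration:

```python
import numpy as np

def attention_pool(frames, w):
    """Attention-weighted pooling of per-frame features into a single
    video vector: score each frame, softmax the scores, and take the
    weighted sum. A simplified stand-in for the learned attention the
    abstract mentions."""
    scores = frames @ w                              # one scalar per frame
    scores = scores - scores.max()                   # numerical stability
    weights = np.exp(scores) / np.exp(scores).sum()  # softmax over frames
    return weights @ frames                          # weighted sum of frames

rng = np.random.default_rng(0)
frames = rng.standard_normal((16, 8))   # 16 frames, 8-dim features
w = rng.standard_normal(8)              # scoring vector (learned in practice)
video_vec = attention_pool(frames, w)
print(video_vec.shape)
```

In the full model this pooled vector would sit on top of the stacked bidirectional LSTM/GRU outputs before the classifier, with an analogous pooling for the audio stream.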

CodeSLAM - Learning a Compact, Optimisable Representation for Dense Visual SLAM

Title CodeSLAM - Learning a Compact, Optimisable Representation for Dense Visual SLAM
Authors Michael Bloesch, Jan Czarnowski, Ronald Clark, Stefan Leutenegger, Andrew J. Davison
Abstract The representation of geometry in real-time 3D perception systems continues to be a critical research issue. Dense maps capture complete surface shape and can be augmented with semantic labels, but their high dimensionality makes them computationally costly to store and process, and unsuitable for rigorous probabilistic inference. Sparse feature-based representations avoid these problems, but capture only partial scene information and are mainly useful for localisation only. We present a new compact but dense representation of scene geometry which is conditioned on the intensity data from a single image and generated from a code consisting of a small number of parameters. We are inspired by work both on learned depth from images, and auto-encoders. Our approach is suitable for use in a keyframe-based monocular dense SLAM system: While each keyframe with a code can produce a depth map, the code can be optimised efficiently jointly with pose variables and together with the codes of overlapping keyframes to attain global consistency. Conditioning the depth map on the image allows the code to only represent aspects of the local geometry which cannot directly be predicted from the image. We explain how to learn our code representation, and demonstrate its advantageous properties in monocular SLAM.
Tasks
Published 2018-04-03
URL http://arxiv.org/abs/1804.00874v2
PDF http://arxiv.org/pdf/1804.00874v2.pdf
PWC https://paperswithcode.com/paper/codeslam-learning-a-compact-optimisable
Repo
Framework