Paper Group ANR 1254
Current Challenges in Spoken Dialogue Systems and Why They Are Critical for Those Living with Dementia
Title | Current Challenges in Spoken Dialogue Systems and Why They Are Critical for Those Living with Dementia |
Authors | Angus Addlesee, Arash Eshghi, Ioannis Konstas |
Abstract | Dialogue technologies such as Amazon’s Alexa have the potential to transform the healthcare industry. However, current systems are not yet naturally interactive: they are often turn-based, have naive end-of-turn detection and completely ignore many types of verbal and visual feedback - such as backchannels, hesitation markers, filled pauses, gaze, brow furrows and disfluencies - that are crucial in guiding and managing the conversational process. This is especially important in the healthcare industry, as target users of Spoken Dialogue Systems (SDSs) are likely to be frail, older, distracted or suffering from cognitive decline, which impacts their ability to make effective use of current systems. In this paper, we outline some of the challenges that are in urgent need of further research, including Incremental Speech Recognition and a systematic study of the interactional patterns in conversation that are potentially diagnostic of dementia, and how these might inform research on and the design of the next generation of SDSs. |
Tasks | Speech Recognition, Spoken Dialogue Systems |
Published | 2019-09-14 |
URL | https://arxiv.org/abs/1909.06644v1 |
https://arxiv.org/pdf/1909.06644v1.pdf | |
PWC | https://paperswithcode.com/paper/current-challenges-in-spoken-dialogue-systems |
Repo | |
Framework | |
Optimising the Input Image to Improve Visual Relationship Detection
Title | Optimising the Input Image to Improve Visual Relationship Detection |
Authors | Noel Mizzi, Adrian Muscat |
Abstract | Visual Relationship Detection is the task of predicting the correct relation between a subject and an object in a given image. To improve the visual part of this difficult problem, ten preprocessing methods were tested to determine whether the widely used Union method yields the optimal results. The comparison therefore focuses solely on predicate prediction; neither object detection nor linguistic knowledge was used, to prevent them from affecting the comparison results. Once fine-tuned, the Visual Geometry Group models were evaluated using Recall@1, per-predicate recall, activation maximisations, class activation maps, and error analysis. This research found that preprocessing methods such as Union-Without-Background-and-with-Binary-mask (Union-WB-and-B) yield significantly better results than the widely used Union method since, as designed, they enable the Convolutional Neural Network to identify the subject and object already in the convolutional layers instead of solely in the fully-connected layers. |
Tasks | Object Detection |
Published | 2019-03-26 |
URL | http://arxiv.org/abs/1903.11029v1 |
http://arxiv.org/pdf/1903.11029v1.pdf | |
PWC | https://paperswithcode.com/paper/optimising-the-input-image-to-improve-visual |
Repo | |
Framework | |
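The abstract's key idea (crop the union of the subject and object boxes, zero out the background, and attach a binary channel distinguishing the two regions) can be sketched in plain Python. The exact encoding used in the paper is not reproduced here, so the mask values and the `(x0, y0, x1, y1)` box convention below are illustrative assumptions:

```python
def union_wb_and_b(image, subj_box, obj_box):
    """Sketch of a Union-Without-Background-and-with-Binary-mask style input:
    crop the union of the two boxes, zero pixels outside both boxes, and build
    a binary mask channel marking subject vs. object. Boxes are (x0, y0, x1, y1),
    exclusive on the right/bottom; the encoding is an assumption, not the
    paper's exact recipe."""
    def inside(x, y, box):
        x0, y0, x1, y1 = box
        return x0 <= x < x1 and y0 <= y < y1

    # Enclosing (union) box of subject and object.
    ux0 = min(subj_box[0], obj_box[0]); uy0 = min(subj_box[1], obj_box[1])
    ux1 = max(subj_box[2], obj_box[2]); uy1 = max(subj_box[3], obj_box[3])

    crop, mask = [], []
    for y in range(uy0, uy1):
        crop_row, mask_row = [], []
        for x in range(ux0, ux1):
            in_subj = inside(x, y, subj_box)
            in_obj = inside(x, y, obj_box)
            # Without-background: keep only pixels inside either box.
            crop_row.append(image[y][x] if (in_subj or in_obj) else 0)
            # Mask channel: 1 for subject, -1 for object, 0 elsewhere.
            mask_row.append(1 if in_subj else (-1 if in_obj else 0))
        crop.append(crop_row); mask.append(mask_row)
    return crop, mask
```

Feeding the mask as an extra input channel is what lets the convolutional layers, rather than only the fully-connected layers, tell the subject apart from the object.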
Deep Learning for Stock Selection Based on High Frequency Price-Volume Data
Title | Deep Learning for Stock Selection Based on High Frequency Price-Volume Data |
Authors | Junming Yang, Yaoqi Li, Xuanyu Chen, Jiahang Cao, Kangkang Jiang |
Abstract | Training a practical and effective model for stock selection has long been a problem of great interest in the field of artificial intelligence. Even though some models from previous works have achieved good performance in the U.S. market using low-frequency data and features, training a suitable model with high-frequency stock data is still a problem worth exploring. Based on the high-frequency price data of the past several days, we construct two separate models, a Convolutional Neural Network and a Long Short-Term Memory network, which predict the expected return rate of stocks on the current day and select the stocks with the highest expected yield at the opening to maximize the total return. In our CNN model, we propose improvements on the CNNpred model presented by E. Hoseinzade and S. Haratizadeh in their paper, which deals with low-frequency features. These improvements enable our CNN model to exploit the convolutional layers' ability to extract high-level factors while avoiding excessive loss of the original information. Our LSTM model utilizes Recurrent Neural Networks' advantages in handling time series data. Despite considerable transaction fees due to the daily changes of our stock position, the annualized net rate of return is 62.27% for our CNN model and 50.31% for our LSTM model. |
Tasks | Time Series |
Published | 2019-11-06 |
URL | https://arxiv.org/abs/1911.02502v1 |
https://arxiv.org/pdf/1911.02502v1.pdf | |
PWC | https://paperswithcode.com/paper/deep-learning-for-stock-selection-based-on |
Repo | |
Framework | |
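The selection rule the abstract describes (buy the stocks with the highest predicted return at the open, rebalancing daily and paying transaction fees) reduces to a simple top-k ranking plus fee-adjusted compounding. The helper names and the fee value below are hypothetical illustrations, not the paper's actual code:

```python
def select_portfolio(expected_returns, k):
    """Pick the k tickers with the highest model-predicted return at the open.
    expected_returns: dict mapping ticker -> predicted return rate.
    (Hypothetical helper; the paper's CNN/LSTM models supply the predictions.)"""
    ranked = sorted(expected_returns.items(), key=lambda kv: kv[1], reverse=True)
    return [ticker for ticker, _ in ranked[:k]]


def net_return(daily_returns, fee=0.001):
    """Compound daily portfolio returns net of a per-rebalance transaction fee.
    The 0.1% fee is an illustrative assumption, not the paper's cost model."""
    total = 1.0
    for r in daily_returns:
        total *= (1 + r) * (1 - fee)  # full rebalance each day incurs the fee
    return total - 1
```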
Combining nonparametric spatial context priors with nonparametric shape priors for dendritic spine segmentation in 2-photon microscopy images
Title | Combining nonparametric spatial context priors with nonparametric shape priors for dendritic spine segmentation in 2-photon microscopy images |
Authors | Ertunc Erdil, Ali Ozgur Argunsah, Tolga Tasdizen, Devrim Unay, Mujdat Cetin |
Abstract | Data driven segmentation is an important initial step of shape prior-based segmentation methods since it is assumed that the data term brings a curve to a plausible level so that shape and data terms can then work together to produce better segmentations. When purely data driven segmentation produces poor results, the final segmentation is generally affected adversely. One challenge faced by many existing data terms is due to the fact that they consider only pixel intensities to decide whether to assign a pixel to the foreground or to the background region. When the distributions of the foreground and background pixel intensities have significant overlap, such data terms become ineffective, as they produce uncertain results for many pixels in a test image. In such cases, using prior information about the spatial context of the object to be segmented together with the data term can bring a curve to a plausible stage, which would then serve as a good initial point to launch shape-based segmentation. In this paper, we propose a new segmentation approach that combines nonparametric context priors with a learned-intensity-based data term and nonparametric shape priors. We perform experiments for dendritic spine segmentation in both 2D and 3D 2-photon microscopy images. The experimental results demonstrate that using spatial context priors leads to significant improvements. |
Tasks | |
Published | 2019-01-08 |
URL | http://arxiv.org/abs/1901.02513v2 |
http://arxiv.org/pdf/1901.02513v2.pdf | |
PWC | https://paperswithcode.com/paper/combining-nonparametric-spatial-context |
Repo | |
Framework | |
Sales Demand Forecast in E-commerce using a Long Short-Term Memory Neural Network Methodology
Title | Sales Demand Forecast in E-commerce using a Long Short-Term Memory Neural Network Methodology |
Authors | Kasun Bandara, Peibei Shi, Christoph Bergmeir, Hansika Hewamalage, Quoc Tran, Brian Seaman |
Abstract | Generating accurate and reliable sales forecasts is crucial in the E-commerce business. The current state-of-the-art techniques are typically univariate methods, which produce forecasts considering only the historical sales data of a single product. However, in a situation where large quantities of related time series are available, conditioning the forecast of an individual time series on past behaviour of similar, related time series can be beneficial. Since the product assortment hierarchy in an E-commerce platform contains large numbers of related products, in which the sales demand patterns can be correlated, our attempt is to incorporate this cross-series information in a unified model. We achieve this by globally training a Long Short-Term Memory network (LSTM) that exploits the non-linear demand relationships available in an E-commerce product assortment hierarchy. Aside from the forecasting framework, we also propose a systematic pre-processing framework to overcome the challenges in the E-commerce business. We also introduce several product grouping strategies to supplement the LSTM learning schemes, in situations where sales patterns in a product portfolio are disparate. We empirically evaluate the proposed forecasting framework on a real-world online marketplace dataset from Walmart.com. Our method achieves competitive results on category level and super-departmental level datasets, outperforming state-of-the-art techniques. |
Tasks | Time Series |
Published | 2019-01-13 |
URL | https://arxiv.org/abs/1901.04028v2 |
https://arxiv.org/pdf/1901.04028v2.pdf | |
PWC | https://paperswithcode.com/paper/sales-demand-forecast-in-e-commerce-using-a |
Repo | |
Framework | |
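The cross-series ("global") training scheme the abstract describes amounts to sliding a fixed-size window over every product's sales series and pooling all (input, target) pairs into one training set for a single LSTM. The window and horizon sizes below are illustrative, not the paper's preprocessing choices:

```python
def global_training_windows(series_by_product, input_len, horizon=1):
    """Sketch of global (cross-series) window pooling: every product in the
    assortment contributes its sliding windows to one shared training set,
    so a single model learns demand patterns across related series."""
    samples = []
    for product, series in series_by_product.items():
        for t in range(len(series) - input_len - horizon + 1):
            x = series[t:t + input_len]                       # model input
            y = series[t + input_len:t + input_len + horizon] # forecast target
            samples.append((product, x, y))
    return samples
```

The product identifier is kept with each sample so that grouping strategies (training one model per group of products with similar sales patterns) can partition this pooled set.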
Contrasting Exploration in Parameter and Action Space: A Zeroth-Order Optimization Perspective
Title | Contrasting Exploration in Parameter and Action Space: A Zeroth-Order Optimization Perspective |
Authors | Anirudh Vemula, Wen Sun, J. Andrew Bagnell |
Abstract | Black-box optimizers that explore in parameter space have often been shown to outperform more sophisticated action space exploration methods developed specifically for the reinforcement learning problem. We examine these black-box methods closely to identify situations in which they are worse than action space exploration methods and those in which they are superior. Through simple theoretical analyses, we prove that complexity of exploration in parameter space depends on the dimensionality of parameter space, while complexity of exploration in action space depends on both the dimensionality of action space and horizon length. This is also demonstrated empirically by comparing simple exploration methods on several model problems, including Contextual Bandit, Linear Regression and Reinforcement Learning in continuous control. |
Tasks | Continuous Control |
Published | 2019-01-31 |
URL | http://arxiv.org/abs/1901.11503v1 |
http://arxiv.org/pdf/1901.11503v1.pdf | |
PWC | https://paperswithcode.com/paper/contrasting-exploration-in-parameter-and |
Repo | |
Framework | |
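The parameter-space exploration the abstract analyzes can be sketched as a generic zeroth-order update: perturb the policy parameters with Gaussian noise, query only the black-box total reward, and move along a noise-weighted reward estimate. This is a sketch of the method family, with illustrative hyperparameters, not the paper's exact algorithm:

```python
import random

def zeroth_order_step(theta, reward, sigma=0.1, lr=0.05, n_samples=20, rng=None):
    """One zeroth-order (parameter-space) optimization step using antithetic
    Gaussian perturbations of the parameters. Only reward values are queried;
    no gradient information is used."""
    rng = rng or random.Random(0)
    grad = [0.0] * len(theta)
    for _ in range(n_samples):
        eps = [rng.gauss(0, 1) for _ in theta]
        r_plus = reward([t + sigma * e for t, e in zip(theta, eps)])
        r_minus = reward([t - sigma * e for t, e in zip(theta, eps)])
        # Antithetic estimator: (r+ - r-) / (2 sigma) weights direction eps.
        scale = (r_plus - r_minus) / (2 * sigma * n_samples)
        grad = [g + scale * e for g, e in zip(grad, eps)]
    return [t + lr * g for t, g in zip(theta, grad)]
```

Note the dependence the paper proves: the estimator's complexity grows with `len(theta)` (parameter dimensionality) but is oblivious to the horizon, unlike action-space exploration.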
Learning Depth from Monocular Videos Using Synthetic Data: A Temporally-Consistent Domain Adaptation Approach
Title | Learning Depth from Monocular Videos Using Synthetic Data: A Temporally-Consistent Domain Adaptation Approach |
Authors | Yipeng Mou, Mingming Gong, Huan Fu, Kayhan Batmanghelich, Kun Zhang, Dacheng Tao |
Abstract | The majority of state-of-the-art monocular depth estimation methods are supervised learning approaches. The success of such approaches heavily depends on high-quality depth labels, which are expensive to obtain. Some recent methods try to learn depth networks by leveraging unsupervised cues from monocular videos, which are easier to acquire but less reliable. In this paper, we propose to resolve this dilemma by transferring knowledge from synthetic videos with easily obtainable ground-truth depth labels. Due to the stylistic difference between synthetic and real images, we propose a temporally-consistent domain adaptation (TCDA) approach that simultaneously exploits labels in the synthetic domain and temporal constraints in the videos to improve style transfer and depth prediction. Furthermore, we make use of the ground-truth optical flow and pose information in the synthetic data to learn moving-mask and pose prediction networks. The learned moving masks can filter out moving regions that produce erroneous temporal constraints, and the estimated poses provide better initializations for estimating temporal constraints. Experimental results demonstrate the effectiveness of our method and performance comparable to the state of the art. |
Tasks | Depth Estimation, Domain Adaptation, Monocular Depth Estimation, Optical Flow Estimation, Pose Prediction, Style Transfer |
Published | 2019-07-16 |
URL | https://arxiv.org/abs/1907.06882v2 |
https://arxiv.org/pdf/1907.06882v2.pdf | |
PWC | https://paperswithcode.com/paper/learning-depth-from-monocular-videos-using-2 |
Repo | |
Framework | |
On the Duality between Network Flows and Network Lasso
Title | On the Duality between Network Flows and Network Lasso |
Authors | Alexander Jung |
Abstract | Many applications generate data with an intrinsic network structure such as time series data, image data or social network data. The network Lasso (nLasso) has been proposed recently as a method for joint clustering and optimization of machine learning models for networked data. The nLasso extends the Lasso from sparse linear models to clustered graph signals. This paper explores the duality of nLasso and network flow optimization. We show that, in a very precise sense, nLasso is equivalent to a minimum-cost flow problem on the data network structure. Our main technical result is a concise characterization of nLasso solutions via existence of certain network flows. The main conceptual result is a useful link between nLasso methods and basic graph algorithms such as clustering or maximum flow. |
Tasks | Time Series |
Published | 2019-10-04 |
URL | https://arxiv.org/abs/1910.01805v2 |
https://arxiv.org/pdf/1910.01805v2.pdf | |
PWC | https://paperswithcode.com/paper/on-the-duality-between-network-flows-and |
Repo | |
Framework | |
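For reference, the nLasso objective the abstract refers to is typically written as follows (notation assumed: empirical graph $\mathcal{G} = (\mathcal{V}, \mathcal{E})$ with edge weights $A_{ij}$, a local loss $L_i$ at each node, and local model parameters $\mathbf{w}_i$):

```latex
\min_{\mathbf{w}} \;\; \sum_{i \in \mathcal{V}} L_i(\mathbf{w}_i)
\;+\; \lambda \sum_{\{i,j\} \in \mathcal{E}} A_{ij}\, \bigl\lVert \mathbf{w}_i - \mathbf{w}_j \bigr\rVert_2
```

The second, total-variation-style term couples the node-wise models along edges and clusters the graph signal into regions of (nearly) constant $\mathbf{w}_i$; it is this term whose convex dual the paper connects to minimum-cost network flows.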
Learning Some Popular Gaussian Graphical Models without Condition Number Bounds
Title | Learning Some Popular Gaussian Graphical Models without Condition Number Bounds |
Authors | Jonathan Kelner, Frederic Koehler, Raghu Meka, Ankur Moitra |
Abstract | Gaussian Graphical Models (GGMs) have wide-ranging applications in machine learning and the natural and social sciences. In most of the settings in which they are applied, the number of observed samples is much smaller than the dimension and they are assumed to be sparse. While there are a variety of algorithms (e.g. Graphical Lasso, CLIME) that provably recover the graph structure with a logarithmic number of samples, they assume various conditions that require the precision matrix to be in some sense well-conditioned. Here we give the first polynomial-time algorithms for learning attractive GGMs and walk-summable GGMs with a logarithmic number of samples without any such assumptions. In particular, our algorithms can tolerate strong dependencies among the variables. Our result for structure recovery in walk-summable GGMs is derived from a more general result for efficient sparse linear regression in walk-summable models without any norm dependencies. We complement our results with experiments showing that many existing algorithms fail even in some simple settings where there are long dependency chains, whereas ours do not. |
Tasks | |
Published | 2019-05-03 |
URL | https://arxiv.org/abs/1905.01282v3 |
https://arxiv.org/pdf/1905.01282v3.pdf | |
PWC | https://paperswithcode.com/paper/learning-some-popular-gaussian-graphical |
Repo | |
Framework | |
B-Spline CNNs on Lie Groups
Title | B-Spline CNNs on Lie Groups |
Authors | Erik J Bekkers |
Abstract | Group convolutional neural networks (G-CNNs) can be used to improve classical CNNs by equipping them with the geometric structure of groups. Central in the success of G-CNNs is the lifting of feature maps to higher dimensional disentangled representations, in which data characteristics are effectively learned, geometric data-augmentations are made obsolete, and predictable behavior under geometric transformations (equivariance) is guaranteed via group theory. Currently, however, the practical implementations of G-CNNs are limited to either discrete groups (that leave the grid intact) or continuous compact groups such as rotations (that enable the use of Fourier theory). In this paper we lift these limitations and propose a modular framework for the design and implementation of G-CNNs for arbitrary Lie groups. In our approach the differential structure of Lie groups is used to expand convolution kernels in a generic basis of B-splines that is defined on the Lie algebra. This leads to a flexible framework that enables localized, atrous, and deformable convolutions in G-CNNs by means of respectively localized, sparse and non-uniform B-spline expansions. The impact and potential of our approach is studied on two benchmark datasets: cancer detection in histopathology slides in which rotation equivariance plays a key role and facial landmark localization in which scale equivariance is important. In both cases, G-CNN architectures outperform their classical 2D counterparts and the added value of atrous and localized group convolutions is studied in detail. |
Tasks | Face Alignment |
Published | 2019-09-26 |
URL | https://arxiv.org/abs/1909.12057v3 |
https://arxiv.org/pdf/1909.12057v3.pdf | |
PWC | https://paperswithcode.com/paper/b-spline-cnns-on-lie-groups-1 |
Repo | |
Framework | |
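The lifting group convolution at the heart of G-CNNs, together with the B-spline kernel expansion the abstract describes, can be summarized as follows (notation assumed here: $G$ a Lie group with Haar measure, $B^{n}$ a B-spline of degree $n$, and $x_i$ basis-center points in the Lie algebra reached via the logarithmic map):

```latex
(k \star f)(g) \;=\; \int_{G} k\bigl(g^{-1} h\bigr)\, f(h)\, \mathrm{d}h,
\qquad
k(g) \;=\; \sum_{i} c_i \, B^{n}\bigl(\log(g) - x_i\bigr)
```

Because the learnable coefficients $c_i$ sit on a basis defined on the Lie algebra rather than on a fixed grid, the same construction works for arbitrary Lie groups, and choosing localized, sparse, or non-uniform center sets $\{x_i\}$ yields localized, atrous, and deformable group convolutions respectively.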
Learning Representations from Imperfect Time Series Data via Tensor Rank Regularization
Title | Learning Representations from Imperfect Time Series Data via Tensor Rank Regularization |
Authors | Paul Pu Liang, Zhun Liu, Yao-Hung Hubert Tsai, Qibin Zhao, Ruslan Salakhutdinov, Louis-Philippe Morency |
Abstract | There has been an increased interest in multimodal language processing including multimodal dialog, question answering, sentiment analysis, and speech recognition. However, naturally occurring multimodal data is often imperfect as a result of imperfect modalities, missing entries or noise corruption. To address these concerns, we present a regularization method based on tensor rank minimization. Our method is based on the observation that high-dimensional multimodal time series data often exhibit correlations across time and modalities which leads to low-rank tensor representations. However, the presence of noise or incomplete values breaks these correlations and results in tensor representations of higher rank. We design a model to learn such tensor representations and effectively regularize their rank. Experiments on multimodal language data show that our model achieves good results across various levels of imperfection. |
Tasks | Question Answering, Sentiment Analysis, Speech Recognition, Time Series |
Published | 2019-07-01 |
URL | https://arxiv.org/abs/1907.01011v1 |
https://arxiv.org/pdf/1907.01011v1.pdf | |
PWC | https://paperswithcode.com/paper/learning-representations-from-imperfect-time |
Repo | |
Framework | |
Edge SLAM: Edge Points Based Monocular Visual SLAM
Title | Edge SLAM: Edge Points Based Monocular Visual SLAM |
Authors | Soumyadip Maity, Arindam Saha, Brojeshwar Bhowmick |
Abstract | Visual SLAM has shown significant progress in recent years due to high attention from the vision community, but challenges remain for low-textured environments. Feature-based visual SLAMs do not produce reliable camera and structure estimates due to insufficient features in a low-textured environment. Moreover, existing visual SLAMs produce partial reconstructions when the number of 3D-2D correspondences is insufficient for incremental camera estimation using bundle adjustment. This paper presents Edge SLAM, a feature-based monocular visual SLAM that mitigates the above-mentioned problems. Our proposed Edge SLAM pipeline detects edge points from images and tracks them using optical flow for point correspondence. We further refine these point correspondences using the geometrical relationship among three views. Owing to our edge-point tracking, we use a robust method for two-view initialization for bundle adjustment. Our proposed SLAM also identifies the potential situations where estimating a new camera into the existing reconstruction becomes unreliable, and we adopt a novel method to estimate the new camera reliably using a local optimization technique. We present an extensive evaluation of our proposed SLAM pipeline on the most popular open datasets and compare it with the state of the art. Experimental results indicate that our Edge SLAM is robust and works reliably well for both textured and less-textured environments in comparison to existing state-of-the-art SLAMs. |
Tasks | Optical Flow Estimation |
Published | 2019-01-14 |
URL | http://arxiv.org/abs/1901.04210v1 |
http://arxiv.org/pdf/1901.04210v1.pdf | |
PWC | https://paperswithcode.com/paper/edge-slam-edge-points-based-monocular-visual |
Repo | |
Framework | |
On the Performance of Differential Evolution for Hyperparameter Tuning
Title | On the Performance of Differential Evolution for Hyperparameter Tuning |
Authors | Mischa Schmidt, Shahd Safarani, Julia Gastinger, Tobias Jacobs, Sebastien Nicolas, Anett Schülke |
Abstract | Automated hyperparameter tuning aspires to facilitate the application of machine learning for non-experts. In the literature, different optimization approaches are applied for that purpose. This paper investigates the performance of Differential Evolution for tuning hyperparameters of supervised learning algorithms for classification tasks. This empirical study involves a range of different machine learning algorithms and datasets with various characteristics to compare the performance of Differential Evolution with Sequential Model-based Algorithm Configuration (SMAC), a reference Bayesian Optimization approach. The results indicate that Differential Evolution outperforms SMAC for most datasets when tuning a given machine learning algorithm, particularly when breaking ties in a first-to-report fashion. Only for the tightest computational budgets does SMAC perform better. On small datasets, Differential Evolution outperforms SMAC by 19% (37% after tie-breaking). In a second experiment across a range of representative datasets taken from the literature, Differential Evolution scores 15% (23% after tie-breaking) more wins than SMAC. |
Tasks | |
Published | 2019-04-15 |
URL | http://arxiv.org/abs/1904.06960v1 |
http://arxiv.org/pdf/1904.06960v1.pdf | |
PWC | https://paperswithcode.com/paper/on-the-performance-of-differential-evolution |
Repo | |
Framework | |
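For readers unfamiliar with the optimizer being compared, a minimal DE/rand/1/bin scheme (the variant typically used for continuous hyperparameter tuning) looks like this. The population size, mutation factor, and crossover rate below are common illustrative defaults, not the settings from the paper:

```python
import random

def differential_evolution(objective, bounds, pop_size=15, mutation=0.8,
                           crossover=0.7, iters=100, seed=0):
    """Minimal DE/rand/1/bin minimizer. `bounds` is a list of (low, high)
    pairs, one per (continuous) hyperparameter; `objective` would be e.g.
    validation error of a model trained with those hyperparameters."""
    rng = random.Random(seed)
    dim = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    scores = [objective(ind) for ind in pop]
    for _ in range(iters):
        for i in range(pop_size):
            # rand/1 mutation: combine three distinct partners.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            j_rand = rng.randrange(dim)  # guarantee at least one mutated gene
            trial = []
            for j, (lo, hi) in enumerate(bounds):
                if rng.random() < crossover or j == j_rand:
                    v = pop[a][j] + mutation * (pop[b][j] - pop[c][j])
                    trial.append(min(max(v, lo), hi))  # clip to bounds
                else:
                    trial.append(pop[i][j])
            s = objective(trial)
            if s <= scores[i]:  # greedy one-to-one selection
                pop[i], scores[i] = trial, s
    best = min(range(pop_size), key=scores.__getitem__)
    return pop[best], scores[best]
```

In a real tuning run, each `objective` call retrains the model, so the population size times the iteration count is the computational budget the paper's comparison varies.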
TAB-VCR: Tags and Attributes based Visual Commonsense Reasoning Baselines
Title | TAB-VCR: Tags and Attributes based Visual Commonsense Reasoning Baselines |
Authors | Jingxiang Lin, Unnat Jain, Alexander G. Schwing |
Abstract | Reasoning is an important ability that we learn from a very early age, yet it is extremely hard for algorithms. Despite impressive recent progress on tasks that necessitate reasoning, such as visual question answering and visual dialog, models often exploit biases in datasets. To develop models with better reasoning abilities, the new visual commonsense reasoning (VCR) task has recently been introduced. Not only do models have to answer questions, they also have to provide a reason for the given answer. The proposed baseline achieved compelling results, leveraging a meticulously designed model composed of LSTM modules and attention nets. Here we show that a much simpler model, obtained by ablating and pruning the existing intricate baseline, can perform better with half the number of trainable parameters. By associating visual features with attribute information and better text-to-image grounding, we obtain further improvements for our simple and effective baseline, TAB-VCR. We show that this approach results in a 5.3%, 4.4% and 6.5% absolute improvement over the previous state of the art on question answering, answer justification and holistic VCR. |
Tasks | Question Answering, Visual Commonsense Reasoning, Visual Dialog, Visual Question Answering |
Published | 2019-10-31 |
URL | https://arxiv.org/abs/1910.14671v2 |
https://arxiv.org/pdf/1910.14671v2.pdf | |
PWC | https://paperswithcode.com/paper/tab-vcr-tags-and-attributes-based-vcr |
Repo | |
Framework | |
Cross-Scale Residual Network for Multiple Tasks: Image Super-resolution, Denoising, and Deblocking
Title | Cross-Scale Residual Network for Multiple Tasks: Image Super-resolution, Denoising, and Deblocking |
Authors | Yuan Zhou, Xiaoting Du, Yeda Zhang, Sun-Yuan Kung |
Abstract | In general, image restoration involves mapping low-quality images to their high-quality counterparts. Such an optimal mapping is usually non-linear and learnable by machine learning. Recently, deep convolutional neural networks have proven promising for such learning. It is desirable for an image processing network to perform well on three vital tasks, namely super-resolution, denoising, and deblocking. It is commonly recognized that these tasks have strong correlations, so it is imperative to harness the inter-task correlations. To this end, we propose the cross-scale residual network to exploit scale-related features and the inter-task correlations among the three tasks. The proposed network can extract features at multiple spatial scales and establish multiple paths for temporal feature reuse. Our experiments show that the proposed approach outperforms state-of-the-art methods in both quantitative and qualitative evaluations for multiple image restoration tasks. |
Tasks | Denoising, Image Restoration, Super-Resolution |
Published | 2019-11-04 |
URL | https://arxiv.org/abs/1911.01257v1 |
https://arxiv.org/pdf/1911.01257v1.pdf | |
PWC | https://paperswithcode.com/paper/cross-scale-residual-network-for-multiple |
Repo | |
Framework | |