January 31, 2020

3085 words 15 mins read

Paper Group ANR 57


A comprehensive study of speech separation: spectrogram vs waveform separation

Title A comprehensive study of speech separation: spectrogram vs waveform separation
Authors Fahimeh Bahmaninezhad, Jian Wu, Rongzhi Gu, Shi-Xiong Zhang, Yong Xu, Meng Yu, Dong Yu
Abstract Speech separation has been studied widely for single-channel close-talk microphone recordings over the past few years; the developed solutions are mostly in the frequency domain. Recently, a raw audio waveform separation network (TasNet) was introduced for single-channel data, achieving high Si-SNR (scale-invariant source-to-noise ratio) and SDR (source-to-distortion ratio) compared with state-of-the-art frequency-domain solutions. In this study, we incorporate effective components of TasNet into a frequency-domain separation method and compare both approaches across alternative scenarios. We introduce a solution for directly optimizing the separation criterion in frequency-domain networks. In addition to objective and subjective speech separation measurements, we also evaluate separation performance on a speech recognition task. We study the speech separation problem for far-field data (more similar to naturalistic audio streams) and develop multi-channel solutions for both frequency- and time-domain separators, utilizing spectral, spatial, and speaker location information. For our experiments, we simulated a multi-channel, spatialized, reverberant WSJ0-2mix dataset. Our experimental results show that spectrogram separation can achieve competitive performance with better network design. The multi-channel framework is also shown to improve single-channel performance by up to 35.5% and 46% relative in terms of WER and SDR, respectively.
Tasks Speech Recognition, Speech Separation
Published 2019-05-17
URL https://arxiv.org/abs/1905.07497v2
PDF https://arxiv.org/pdf/1905.07497v2.pdf
PWC https://paperswithcode.com/paper/a-comprehensive-study-of-speech-separation
Repo
Framework
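
The abstract above reports gains in Si-SNR, the scale-invariant SNR metric commonly used to score separation quality. As a point of reference, here is a minimal NumPy sketch of that metric; the toy signals and variable names are illustrative, not from the paper:

```python
import numpy as np

def si_snr(estimate, target, eps=1e-8):
    """Scale-invariant SNR between an estimated and a reference signal (in dB)."""
    # Zero-mean both signals so the metric ignores DC offsets.
    estimate = estimate - estimate.mean()
    target = target - target.mean()
    # Project the estimate onto the target to obtain the "clean" component.
    s_target = np.dot(estimate, target) / (np.dot(target, target) + eps) * target
    # Everything orthogonal to the target counts as distortion/noise.
    e_noise = estimate - s_target
    return 10 * np.log10(np.dot(s_target, s_target) / (np.dot(e_noise, e_noise) + eps))

# Toy usage: a noisy copy of a sine wave.
t = np.linspace(0, 1, 8000)
clean = np.sin(2 * np.pi * 440 * t)
noisy = clean + 0.1 * np.random.randn(len(t))
print(f"Si-SNR: {si_snr(noisy, clean):.2f} dB")
```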

How model accuracy and explanation fidelity influence user trust

Title How model accuracy and explanation fidelity influence user trust
Authors Andrea Papenmeier, Gwenn Englebienne, Christin Seifert
Abstract Machine learning systems have become popular in fields such as marketing, finance, and data mining. While they are highly accurate, complex machine learning systems pose challenges for engineers and users. Their inherent complexity makes it impossible to easily judge their fairness and the correctness of statistically learned relations between variables and classes. Explainable AI aims to solve this challenge by modelling explanations alongside the classifiers, potentially improving user trust and acceptance. However, users should not be fooled by persuasive, yet untruthful, explanations. We therefore conduct a user study in which we investigate the effects of model accuracy and explanation fidelity, i.e. how truthfully the explanation represents the underlying model, on user trust. Our findings show that accuracy is more important for user trust than explainability. Adding an explanation for a classification result can potentially harm trust, e.g. when adding nonsensical explanations. We also found that users cannot be tricked by high-fidelity explanations into trusting a bad classifier. Furthermore, we found a mismatch between observed (implicit) and self-reported (explicit) trust.
Tasks
Published 2019-07-26
URL https://arxiv.org/abs/1907.12652v1
PDF https://arxiv.org/pdf/1907.12652v1.pdf
PWC https://paperswithcode.com/paper/how-model-accuracy-and-explanation-fidelity
Repo
Framework

Machine learning approach for segmenting glands in colon histology images using local intensity and texture features

Title Machine learning approach for segmenting glands in colon histology images using local intensity and texture features
Authors Rupali Khatun, Soumick Chatterjee
Abstract Colon cancer is one of the most common types of cancer. Treatment is planned depending on the grade or stage of the cancer, and one of the preconditions for grading colon cancer is segmenting the glandular structures of the tissue. Manual segmentation is very time-consuming, and the resulting delays put patients at risk. The principal objective of this project is to assist pathologists in the accurate detection of colon cancer. In this paper, the authors propose an algorithm for automatic segmentation of glands in colon histology using local intensity and texture features. The dataset images are cropped into patches with different window sizes, and intensity and texture-based features are computed for each patch. A random forest classifier is then used to assign these patches to different labels; a multilevel, hierarchical random forest technique is proposed. The solution is fast, accurate, and readily applicable in a clinical setup.
Tasks
Published 2019-05-15
URL https://arxiv.org/abs/1905.08611v1
PDF https://arxiv.org/pdf/1905.08611v1.pdf
PWC https://paperswithcode.com/paper/190508611
Repo
Framework
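
A minimal sketch of the patch-plus-random-forest idea described in the abstract above, using scikit-learn and scikit-image; the window size, the specific intensity/LBP features, and the majority-vote patch labels are illustrative assumptions, not the authors' exact pipeline:

```python
import numpy as np
from skimage.feature import local_binary_pattern
from sklearn.ensemble import RandomForestClassifier

def patch_features(patch):
    """Concatenate simple intensity and texture descriptors for one patch."""
    intensity = [patch.mean(), patch.std()]
    lbp = local_binary_pattern((patch * 255).astype("uint8"), P=8, R=1, method="uniform")
    texture, _ = np.histogram(lbp, bins=10, range=(0, 10), density=True)
    return np.concatenate([intensity, texture])

def extract_patches(image, mask, window=32, stride=32):
    """Slide a window over the image; label each patch by the mask's majority vote."""
    X, y = [], []
    for r in range(0, image.shape[0] - window + 1, stride):
        for c in range(0, image.shape[1] - window + 1, stride):
            patch = image[r:r + window, c:c + window]
            X.append(patch_features(patch))
            y.append(int(mask[r:r + window, c:c + window].mean() > 0.5))
    return np.array(X), np.array(y)

# Toy usage with random data standing in for a histology image and a gland mask.
rng = np.random.default_rng(0)
image = rng.random((256, 256))
mask = (rng.random((256, 256)) > 0.5).astype(float)
X, y = extract_patches(image, mask)
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
print(clf.predict(X[:5]))
```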

Iterative Path Reconstruction for Large-Scale Inertial Navigation on Smartphones

Title Iterative Path Reconstruction for Large-Scale Inertial Navigation on Smartphones
Authors Santiago Cortés Reina, Yuxin Hou, Juho Kannala, Arno Solin
Abstract Modern smartphones have all the sensing capabilities required for accurate and robust navigation and tracking. In specific environments some data streams may be absent, less reliable, or flat out wrong. In particular, the GNSS signal can become flawed or silent inside buildings or in streets with tall buildings. In this application paper, we aim to advance the current state-of-the-art in motion estimation using inertial measurements in combination with partial GNSS data on standard smartphones. We show how iterative estimation methods help refine the positioning path estimates in retrospective use cases that can cover both fixed-interval and fixed-lag scenarios. We compare estimation results provided by global iterated Kalman filtering methods to those of a visual-inertial tracking scheme (Apple ARKit). The practical applicability is demonstrated on real-world use cases on empirical data acquired from both smartphones and tablet devices.
Tasks Motion Estimation
Published 2019-06-02
URL https://arxiv.org/abs/1906.00360v1
PDF https://arxiv.org/pdf/1906.00360v1.pdf
PWC https://paperswithcode.com/paper/190600360
Repo
Framework
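
The retrospective, fixed-interval refinement mentioned in the abstract above is conventionally done with a forward Kalman filter followed by a backward smoothing pass. Below is a minimal 1-D constant-velocity sketch of that filter-then-smooth pattern; the motion model and noise levels are assumptions for illustration, not the paper's full inertial/GNSS model:

```python
import numpy as np

dt = 1.0
F = np.array([[1.0, dt], [0.0, 1.0]])      # constant-velocity motion model
H = np.array([[1.0, 0.0]])                 # we observe position only
Q = 0.01 * np.eye(2)                       # process noise (assumed)
R = np.array([[0.5]])                      # measurement noise (assumed)

def kalman_forward(zs, x0, P0):
    xs, Ps, x_pred, P_pred = [], [], [], []
    x, P = x0, P0
    for z in zs:
        # Predict.
        xp, Pp = F @ x, F @ P @ F.T + Q
        x_pred.append(xp); P_pred.append(Pp)
        # Update with the position measurement.
        K = Pp @ H.T @ np.linalg.inv(H @ Pp @ H.T + R)
        x = xp + K @ (z - H @ xp)
        P = (np.eye(2) - K @ H) @ Pp
        xs.append(x); Ps.append(P)
    return xs, Ps, x_pred, P_pred

def rts_smoother(xs, Ps, x_pred, P_pred):
    """Rauch-Tung-Striebel backward pass: fixed-interval smoothing."""
    xs_s, Ps_s = xs[:], Ps[:]
    for k in range(len(xs) - 2, -1, -1):
        C = Ps[k] @ F.T @ np.linalg.inv(P_pred[k + 1])
        xs_s[k] = xs[k] + C @ (xs_s[k + 1] - x_pred[k + 1])
        Ps_s[k] = Ps[k] + C @ (Ps_s[k + 1] - P_pred[k + 1]) @ C.T
    return xs_s, Ps_s

# Toy usage: noisy observations of a target moving at unit velocity.
truth = np.arange(20.0)
zs = [np.array([p + np.random.randn() * 0.7]) for p in truth]
xs, Ps, xp, Pp = kalman_forward(zs, np.zeros(2), np.eye(2))
xs_s, _ = rts_smoother(xs, Ps, xp, Pp)
print("filtered final position:", xs[-1][0], "smoothed:", xs_s[-1][0])
```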

Pose Guided Attention for Multi-label Fashion Image Classification

Title Pose Guided Attention for Multi-label Fashion Image Classification
Authors Beatriz Quintino Ferreira, João P. Costeira, Ricardo G. Sousa, Liang-Yan Gui, João P. Gomes
Abstract We propose a compact framework with guided attention for multi-label classification in the fashion domain. Our visual semantic attention model (VSAM) is supervised by automatic pose extraction creating a discriminative feature space. VSAM outperforms the state of the art for an in-house dataset and performs on par with previous works on the DeepFashion dataset, even without using any landmark annotations. Additionally, we show that our semantic attention module brings robustness to large quantities of wrong annotations and provides more interpretable results.
Tasks Image Classification, Multi-Label Classification
Published 2019-11-12
URL https://arxiv.org/abs/1911.05024v1
PDF https://arxiv.org/pdf/1911.05024v1.pdf
PWC https://paperswithcode.com/paper/pose-guided-attention-for-multi-label-fashion
Repo
Framework
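
A minimal PyTorch sketch of the general idea in the entry above: a spatial attention map is supervised by pose-derived heatmaps and used to reweigh CNN features before multi-label classification. The layer sizes, pooling, and loss weighting are illustrative assumptions rather than the actual VSAM architecture:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PoseGuidedAttention(nn.Module):
    def __init__(self, in_channels=256, num_labels=20):
        super().__init__()
        self.attn_head = nn.Conv2d(in_channels, 1, kernel_size=1)  # predicts a spatial attention map
        self.classifier = nn.Linear(in_channels, num_labels)

    def forward(self, feats):
        # feats: (B, C, H, W) backbone features.
        attn = torch.sigmoid(self.attn_head(feats))                # (B, 1, H, W)
        pooled = (feats * attn).sum(dim=(2, 3)) / (attn.sum(dim=(2, 3)) + 1e-6)
        return self.classifier(pooled), attn

# Toy usage: multi-label loss plus a pose-supervision term on the attention map.
model = PoseGuidedAttention()
feats = torch.randn(4, 256, 14, 14)
labels = torch.randint(0, 2, (4, 20)).float()
pose_heatmap = torch.rand(4, 1, 14, 14)          # stand-in for extracted keypoint heatmaps
logits, attn = model(feats)
loss = F.binary_cross_entropy_with_logits(logits, labels) + F.mse_loss(attn, pose_heatmap)
loss.backward()
print(loss.item())
```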

TITAN: A Spatiotemporal Feature Learning Framework for Traffic Incident Duration Prediction

Title TITAN: A Spatiotemporal Feature Learning Framework for Traffic Incident Duration Prediction
Authors Kaiqun Fu, Taoran Ji, Liang Zhao, Chang-Tien Lu
Abstract Identification of critical incident stages and reasonable prediction of traffic incident duration are essential in traffic incident management. In this paper, we propose a traffic incident duration prediction model that simultaneously predicts the impact of the traffic incidents and identifies the critical groups of temporal features via a multi-task learning framework. First, we formulate a sparsity optimization problem that extracts low-level temporal features based on traffic speed readings and then generalizes higher-level features as phases of traffic incidents. Second, we propose novel constraints on feature similarity, exploiting prior knowledge about the spatial connectivity of the road network, to predict the incident duration. The proposed problem is challenging to solve due to the orthogonality constraints, the non-convex objective, and the non-smooth penalties. We develop an algorithm based on the alternating direction method of multipliers (ADMM) framework to solve the proposed formulation. Extensive experiments and comparisons to other models on real-world traffic data and traffic incident records justify the efficacy of our model.
Tasks Multi-Task Learning
Published 2019-11-20
URL https://arxiv.org/abs/1911.08684v1
PDF https://arxiv.org/pdf/1911.08684v1.pdf
PWC https://paperswithcode.com/paper/titan-a-spatiotemporal-feature-learning
Repo
Framework
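
The abstract above solves its formulation with ADMM. As a much simpler illustration of the same split-and-alternate update pattern (on the textbook lasso problem, not the paper's objective), here is a minimal sketch:

```python
import numpy as np

def admm_lasso(A, b, lam=0.1, rho=1.0, n_iter=200):
    """Minimize 0.5*||Ax - b||^2 + lam*||x||_1 via ADMM with the splitting x = z."""
    n = A.shape[1]
    x = z = u = np.zeros(n)
    AtA, Atb = A.T @ A, A.T @ b
    inv = np.linalg.inv(AtA + rho * np.eye(n))     # cached for the x-update
    for _ in range(n_iter):
        x = inv @ (Atb + rho * (z - u))            # quadratic subproblem
        z = np.sign(x + u) * np.maximum(np.abs(x + u) - lam / rho, 0.0)  # soft-threshold
        u = u + x - z                              # scaled dual update
    return z

# Toy usage: recover a sparse vector from noisy linear measurements.
rng = np.random.default_rng(0)
A = rng.standard_normal((100, 30))
x_true = np.zeros(30); x_true[[2, 7, 15]] = [1.5, -2.0, 0.8]
b = A @ x_true + 0.05 * rng.standard_normal(100)
print(np.round(admm_lasso(A, b, lam=1.0), 2))
```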

Image-to-Image Translation with Multi-Path Consistency Regularization

Title Image-to-Image Translation with Multi-Path Consistency Regularization
Authors Jianxin Lin, Yingce Xia, Yijun Wang, Tao Qin, Zhibo Chen
Abstract Image translation across different domains has attracted much attention in both machine learning and computer vision communities. Taking the translation from source domain $\mathcal{D}_s$ to target domain $\mathcal{D}_t$ as an example, existing algorithms mainly rely on two kinds of loss for training: One is the discrimination loss, which is used to differentiate images generated by the models and natural images; the other is the reconstruction loss, which measures the difference between an original image and the reconstructed version through $\mathcal{D}_s\to\mathcal{D}_t\to\mathcal{D}_s$ translation. In this work, we introduce a new kind of loss, multi-path consistency loss, which evaluates the differences between direct translation $\mathcal{D}_s\to\mathcal{D}_t$ and indirect translation $\mathcal{D}_s\to\mathcal{D}_a\to\mathcal{D}_t$ with $\mathcal{D}_a$ as an auxiliary domain, to regularize training. For multi-domain translation (with at least three domains), which focuses on building translation models between any two domains, at each training iteration we randomly select three domains, set them respectively as the source, auxiliary, and target domains, build the multi-path consistency loss, and optimize the network. For two-domain translation, we need to introduce an additional auxiliary domain and construct the multi-path consistency loss. We conduct various experiments to demonstrate the effectiveness of our proposed methods, including face-to-face translation, paint-to-photo translation, and de-raining/de-noising translation.
Tasks Face to Face Translation, Image-to-Image Translation
Published 2019-05-29
URL https://arxiv.org/abs/1905.12498v1
PDF https://arxiv.org/pdf/1905.12498v1.pdf
PWC https://paperswithcode.com/paper/image-to-image-translation-with-multi-path
Repo
Framework
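
A minimal sketch of the multi-path consistency term described above: given translators among domains s, a, and t, the direct translation s→t is compared to the indirect translation s→a→t. The generators below are placeholder convolutions; only the construction of the consistency loss mirrors the paper's idea, and the full adversarial/reconstruction training loop is omitted:

```python
import torch
import torch.nn as nn

# Placeholder generators; in practice these are full image-to-image translation networks.
G_st = nn.Conv2d(3, 3, kernel_size=3, padding=1)   # source -> target
G_sa = nn.Conv2d(3, 3, kernel_size=3, padding=1)   # source -> auxiliary
G_at = nn.Conv2d(3, 3, kernel_size=3, padding=1)   # auxiliary -> target

def multi_path_consistency_loss(x_s):
    direct = G_st(x_s)            # D_s -> D_t
    indirect = G_at(G_sa(x_s))    # D_s -> D_a -> D_t
    return torch.mean(torch.abs(direct - indirect))   # L1 gap between the two paths

# Toy usage: this term would be added to the usual discrimination and reconstruction losses.
x_s = torch.randn(2, 3, 64, 64)
print(multi_path_consistency_loss(x_s).item())
```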

Domain Translation with Conditional GANs: from Depth to RGB Face-to-Face

Title Domain Translation with Conditional GANs: from Depth to RGB Face-to-Face
Authors Matteo Fabbri, Guido Borghi, Fabio Lanzi, Roberto Vezzani, Simone Calderara, Rita Cucchiara
Abstract Can faces acquired by low-cost depth sensors be useful to catch some characteristic details of the face? Typically the answer is no. However, new deep architectures can generate RGB images from data acquired in a different modality, such as depth data. In this paper, we propose a new Deterministic Conditional GAN, trained on annotated RGB-D face datasets, effective for a face-to-face translation from depth to RGB. Although the network cannot reconstruct the exact somatic features of unknown individual faces, it is capable of reconstructing plausible faces; their appearance is accurate enough to be used in many pattern recognition tasks. In fact, we test the network's capability to hallucinate with some Perceptual Probes, such as face aspect classification and landmark detection. Depth faces can be used in place of the corresponding RGB images, which are often unavailable due to difficult lighting conditions. Experimental results are very promising and are far better than previously proposed approaches: this domain translation can constitute a new way to exploit depth data in future applications.
Tasks Face to Face Translation
Published 2019-01-23
URL http://arxiv.org/abs/1901.08101v1
PDF http://arxiv.org/pdf/1901.08101v1.pdf
PWC https://paperswithcode.com/paper/domain-translation-with-conditional-gans-from
Repo
Framework
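
A minimal sketch of the conditional (pix2pix-style) objective that generally underlies this kind of depth-to-RGB translation; the networks are trivial placeholders, the L1 weight is an assumption, and the deterministic-GAN specifics of the paper are not reproduced:

```python
import torch
import torch.nn as nn

G = nn.Conv2d(1, 3, 3, padding=1)   # depth (1 channel) -> RGB (3 channels), placeholder generator
D = nn.Sequential(nn.Conv2d(4, 1, 3, padding=1), nn.AdaptiveAvgPool2d(1))  # conditioned on depth + RGB
bce = nn.BCEWithLogitsLoss()

depth = torch.randn(2, 1, 64, 64)
rgb_real = torch.randn(2, 3, 64, 64)

fake = G(depth)
# Generator loss: fool the discriminator while staying close to the ground-truth RGB.
pred_fake = D(torch.cat([depth, fake], dim=1)).view(-1)
g_loss = bce(pred_fake, torch.ones_like(pred_fake)) + 100 * nn.functional.l1_loss(fake, rgb_real)
g_loss.backward()
print(g_loss.item())
```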

Rectified deep neural networks overcome the curse of dimensionality for nonsmooth value functions in zero-sum games of nonlinear stiff systems

Title Rectified deep neural networks overcome the curse of dimensionality for nonsmooth value functions in zero-sum games of nonlinear stiff systems
Authors Christoph Reisinger, Yufei Zhang
Abstract In this paper, we establish that for a wide class of controlled stochastic differential equations (SDEs) with stiff coefficients, the value functions of corresponding zero-sum games can be represented by a deep artificial neural network (DNN), whose complexity grows at most polynomially in both the dimension of the state equation and the reciprocal of the required accuracy. Such nonlinear stiff systems may arise, for example, from Galerkin approximations of controlled stochastic partial differential equations (SPDEs), or controlled PDEs with uncertain initial conditions and source terms. This implies that DNNs can break the curse of dimensionality in numerical approximations and optimal control of PDEs and SPDEs. The main ingredient of our proof is to construct a suitable discrete-time system to effectively approximate the evolution of the underlying stochastic dynamics. Similar ideas can also be applied to obtain expression rates of DNNs for value functions induced by stiff systems with regime switching coefficients and driven by general Lévy noise.
Tasks
Published 2019-03-15
URL http://arxiv.org/abs/1903.06652v1
PDF http://arxiv.org/pdf/1903.06652v1.pdf
PWC https://paperswithcode.com/paper/rectified-deep-neural-networks-overcome-the
Repo
Framework

Adaptive Exact Learning of Decision Trees from Membership Queries

Title Adaptive Exact Learning of Decision Trees from Membership Queries
Authors Nader H. Bshouty, Catherine A. Haddad-Zaknoon
Abstract In this paper we study the adaptive learnability of decision trees of depth at most $d$ from membership queries. This has many applications in automated scientific discovery such as drugs development and software update problem. Feldman solves the problem in a randomized polynomial time algorithm that asks $\tilde O(2^{2d})\log n$ queries and Kushilevitz-Mansour in a deterministic polynomial time algorithm that asks $ 2^{18d+o(d)}\log n$ queries. We improve the query complexity of both algorithms. We give a randomized polynomial time algorithm that asks $\tilde O(2^{2d}) + 2^{d}\log n$ queries and a deterministic polynomial time algorithm that asks $2^{5.83d}+2^{2d+o(d)}\log n$ queries.
Tasks
Published 2019-01-23
URL http://arxiv.org/abs/1901.07750v1
PDF http://arxiv.org/pdf/1901.07750v1.pdf
PWC https://paperswithcode.com/paper/adaptive-exact-learning-of-decision-trees
Repo
Framework

Training without training data: Improving the generalizability of automated medical abbreviation disambiguation

Title Training without training data: Improving the generalizability of automated medical abbreviation disambiguation
Authors Marta Skreta, Aryan Arbabi, Jixuan Wang, Michael Brudno
Abstract Abbreviation disambiguation is important for automated clinical note processing due to the frequent use of abbreviations in clinical settings. Current models for automated abbreviation disambiguation are restricted by the scarcity and imbalance of labeled training data, decreasing their generalizability to orthogonal sources. In this work we propose a novel data augmentation technique that utilizes information from related medical concepts, which improves our model’s ability to generalize. Furthermore, we show that incorporating the global context information within the whole medical note (in addition to the traditional local context window), can significantly improve the model’s representation for abbreviations. We train our model on a public dataset (MIMIC III) and test its performance on datasets from different sources (CASI, i2b2). Together, these two techniques boost the accuracy of abbreviation disambiguation by almost 14% on the CASI dataset and 4% on i2b2.
Tasks Data Augmentation
Published 2019-12-12
URL https://arxiv.org/abs/1912.06174v1
PDF https://arxiv.org/pdf/1912.06174v1.pdf
PWC https://paperswithcode.com/paper/training-without-training-data-improving-the
Repo
Framework
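
A minimal sketch of the local-plus-global context idea from the abstract above: the abbreviation's surrounding window and the whole note are encoded separately and concatenated before classification. The TF-IDF encoders, the logistic-regression classifier, and the toy notes are simple stand-ins, not the paper's model or data:

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

notes = [
    "patient with relapsing ms presented with optic neuritis and fatigue",
    "echocardiogram revealed severe ms with elevated transmitral gradient",
]
labels = ["multiple sclerosis", "mitral stenosis"]   # toy expansions of the abbreviation "ms"

def local_window(note, abbrev="ms", width=3):
    toks = note.split()
    i = toks.index(abbrev)
    return " ".join(toks[max(0, i - width): i + width + 1])

vec_local = TfidfVectorizer().fit([local_window(n) for n in notes])
vec_global = TfidfVectorizer().fit(notes)

def featurize(note):
    # Concatenate the local-window representation with the whole-note representation.
    local = vec_local.transform([local_window(note)]).toarray()
    glob = vec_global.transform([note]).toarray()
    return np.hstack([local, glob])

X = np.vstack([featurize(n) for n in notes])
clf = LogisticRegression().fit(X, labels)
print(clf.predict(featurize(notes[0])))
```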

Intelligence via ultrafilters: structural properties of some intelligence comparators of deterministic Legg-Hutter agents

Title Intelligence via ultrafilters: structural properties of some intelligence comparators of deterministic Legg-Hutter agents
Authors Samuel Allen Alexander
Abstract Legg and Hutter, as well as subsequent authors, considered intelligent agents through the lens of interaction with reward-giving environments, attempting to assign numeric intelligence measures to such agents, with the guiding principle that a more intelligent agent should gain higher rewards from environments in some aggregate sense. In this paper, we consider a related question: rather than measure the numeric intelligence of one Legg-Hutter agent, how can we compare the relative intelligence of two Legg-Hutter agents? We propose an elegant answer based on the following insight: we can view Legg-Hutter agents as candidates in an election, whose voters are environments, letting each environment vote (via its rewards) which agent (if either) is more intelligent. This leads to an abstract family of comparators simple enough that we can prove some structural theorems about them. It is an open question whether these structural theorems apply to more practical intelligence measures.
Tasks
Published 2019-10-22
URL https://arxiv.org/abs/1910.09721v2
PDF https://arxiv.org/pdf/1910.09721v2.pdf
PWC https://paperswithcode.com/paper/intelligence-via-ultrafilters-structural
Repo
Framework
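
A toy sketch of the election metaphor described above: each environment casts a vote for whichever agent earns it more reward, and a comparator aggregates the votes. The paper aggregates over environments via an ultrafilter; the finite majority vote and the toy environment below are only a simplification to illustrate the intuition:

```python
def total_reward(agent, env, horizon=100):
    """Run a deterministic agent in a deterministic environment and sum rewards."""
    obs, total = env.reset(), 0.0
    for _ in range(horizon):
        obs, reward = env.step(agent(obs))
        total += reward
    return total

def compare(agent_a, agent_b, environments):
    """Each environment votes for the agent it rewards more; ties abstain."""
    votes_a = votes_b = 0
    for env in environments:
        ra, rb = total_reward(agent_a, env), total_reward(agent_b, env)
        if ra > rb:
            votes_a += 1
        elif rb > ra:
            votes_b += 1
    if votes_a > votes_b:
        return "A is more intelligent (by majority vote)"
    if votes_b > votes_a:
        return "B is more intelligent (by majority vote)"
    return "incomparable under this toy comparator"

class GuessEnv:
    """Toy environment: reward 1 when the agent's action equals the hidden target."""
    def __init__(self, target):
        self.target = target
    def reset(self):
        return 0
    def step(self, action):
        return 0, 1.0 if action == self.target else 0.0

agent_zero = lambda obs: 0
agent_one = lambda obs: 1
envs = [GuessEnv(t) for t in (0, 0, 1)]
print(compare(agent_zero, agent_one, envs))
```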

BowTie - A deep learning feedforward neural network for sentiment analysis

Title BowTie - A deep learning feedforward neural network for sentiment analysis
Authors Apostol Vassilev
Abstract How to model and encode the semantics of human-written text and select the type of neural network to process it are not settled issues in sentiment analysis. Accuracy and transferability are critical issues in machine learning in general. These properties are closely related to the loss estimates for the trained model. I present a computationally-efficient and accurate feedforward neural network for sentiment prediction capable of maintaining low losses. When coupled with an effective semantics model of the text, it provides highly accurate models with low losses. Experimental results on representative benchmark datasets and comparisons to other methods show the advantages of the new approach.
Tasks Sentiment Analysis
Published 2019-04-18
URL http://arxiv.org/abs/1904.12624v1
PDF http://arxiv.org/pdf/1904.12624v1.pdf
PWC https://paperswithcode.com/paper/190412624
Repo
Framework
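
A minimal sketch of the general pattern the abstract above describes: an encoded text representation fed to a feedforward classifier. The bag-of-words encoding and the tiny scikit-learn MLP are illustrative stand-ins, not the BowTie architecture or its semantics model:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.neural_network import MLPClassifier

texts = [
    "an absolutely wonderful and moving film",
    "terrible pacing and a boring plot",
    "great performances, I loved it",
    "dull, predictable, and a waste of time",
]
labels = [1, 0, 1, 0]   # 1 = positive, 0 = negative

vec = CountVectorizer()
X = vec.fit_transform(texts)

# A small feedforward network on top of the bag-of-words encoding.
clf = MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=0).fit(X, labels)
print(clf.predict(vec.transform(["what a wonderful, moving experience"])))
```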

Privacy Preserving Gaze Estimation using Synthetic Images via a Randomized Encoding Based Framework

Title Privacy Preserving Gaze Estimation using Synthetic Images via a Randomized Encoding Based Framework
Authors Efe Bozkir, Ali Burak Ünal, Mete Akgün, Enkelejda Kasneci, Nico Pfeifer
Abstract Eye tracking is regarded as one of the key technologies for applications that assess and evaluate human attention, behavior, and biometrics, especially using gaze, pupillary, and blink behaviors. One of the main challenges for the social acceptance of eye-tracking technology is, however, preserving sensitive and personal information. To tackle this challenge, we employed a privacy-preserving framework based on randomized encoding to privately train a Support Vector Regression model on synthetic eye images for human gaze estimation. During the computation, none of the parties learns about the data or the result that any other party has. Furthermore, the party that trains the model cannot reconstruct pupil, blink, or visual scanpath information. The experimental results showed that our privacy-preserving framework is capable of working in real time, is as accurate as a non-private version, and could be extended to other eye-tracking related problems.
Tasks Eye Tracking, Gaze Estimation
Published 2019-11-06
URL https://arxiv.org/abs/1911.07936v2
PDF https://arxiv.org/pdf/1911.07936v2.pdf
PWC https://paperswithcode.com/paper/privacy-preserving-gaze-estimation-using
Repo
Framework
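
A minimal sketch of the underlying (non-private) learning step referenced above: support vector regression mapping eye-image features to gaze angles. The randomized-encoding protocol that provides the paper's privacy guarantees is deliberately omitted, and the random data is a stand-in for synthetic eye images:

```python
import numpy as np
from sklearn.multioutput import MultiOutputRegressor
from sklearn.svm import SVR

# Stand-in for synthetic eye images: flattened pixel features and (yaw, pitch) gaze labels.
rng = np.random.default_rng(0)
X = rng.random((200, 15 * 9))                            # 200 synthetic 15x9 eye patches, flattened
true_w = rng.standard_normal((15 * 9, 2))
y = X @ true_w + 0.01 * rng.standard_normal((200, 2))    # yaw and pitch in arbitrary units

# One SVR per output dimension (SVR itself is single-output).
model = MultiOutputRegressor(SVR(kernel="rbf", C=10.0)).fit(X, y)
print(model.predict(X[:3]))
```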

Overcoming Catastrophic Interference in Online Reinforcement Learning with Dynamic Self-Organizing Maps

Title Overcoming Catastrophic Interference in Online Reinforcement Learning with Dynamic Self-Organizing Maps
Authors Yat Long Lo, Sina Ghiassian
Abstract Using neural networks in the reinforcement learning (RL) framework has achieved notable successes. Yet, neural networks tend to forget what they learned in the past, especially when they learn online and fully incrementally, a setting in which the weights are updated after each sample is received and the sample is then discarded. Under this setting, an update can lead to overly global generalization by changing too many weights. The global generalization interferes with what was previously learned and deteriorates performance, a phenomenon known as catastrophic interference. Many previous works use mechanisms such as experience replay (ER) buffers to mitigate interference by performing minibatch updates, ensuring the data distribution is approximately independent-and-identically-distributed (i.i.d.). But using ER becomes infeasible in terms of memory as problem complexity increases. Thus, it is crucial to look for more memory-efficient alternatives. Interference can be averted if we replace global updates with more local ones, so only weights responsible for the observed data sample are updated. In this work, we propose the use of a dynamic self-organizing map (DSOM) with neural networks to induce such locality in the updates without ER buffers. Our method learns a DSOM to produce a mask that reweighs each hidden unit’s output, modulating its degree of use. It prevents interference by replacing global updates with local ones, conditioned on the agent’s state. We validate our method on standard RL benchmarks including Mountain Car and Lunar Lander, where existing methods often fail to learn without ER. Empirically, we show that our online and fully incremental method is on par with, and in some cases better than, the state of the art in terms of final performance and learning speed. We provide visualizations and quantitative measures to show that our method indeed mitigates interference.
Tasks
Published 2019-10-29
URL https://arxiv.org/abs/1910.13213v1
PDF https://arxiv.org/pdf/1910.13213v1.pdf
PWC https://paperswithcode.com/paper/191013213
Repo
Framework
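
A minimal NumPy sketch of the masking idea described above: a small self-organizing map clusters the agent's state, and the resulting soft assignment is turned into a mask that reweighs hidden units so updates stay local to the units relevant for that state. The map size, the Gaussian neighborhood, and the equal-slice gating are illustrative assumptions, not the paper's exact DSOM update:

```python
import numpy as np

class SimpleSOM:
    """A tiny self-organizing map over agent states."""
    def __init__(self, n_nodes, state_dim, lr=0.1, seed=0):
        rng = np.random.default_rng(seed)
        self.w = rng.random((n_nodes, state_dim))
        self.lr = lr

    def update(self, state):
        # Move the best-matching node (and, more weakly, the others) toward the state.
        dist = np.linalg.norm(self.w - state, axis=1)
        bmu = dist.argmin()
        influence = np.exp(-dist ** 2)          # soft neighborhood, illustrative
        self.w += self.lr * influence[:, None] * (state - self.w)
        return bmu

    def mask(self, state, hidden_size):
        """Turn node activations into a per-hidden-unit mask in [0, 1]."""
        dist = np.linalg.norm(self.w - state, axis=1)
        act = np.exp(-dist ** 2)
        # Each SOM node gates an equal slice of the hidden units.
        return np.repeat(act / (act.max() + 1e-8), hidden_size // len(self.w))

# Toy usage: the mask modulates hidden activations, so gradients for unrelated
# states barely touch the units that are switched off for the current state.
som = SimpleSOM(n_nodes=4, state_dim=2)
state = np.array([0.2, 0.9])
som.update(state)
hidden = np.random.randn(16)                    # stand-in for a hidden layer's output
print(hidden * som.mask(state, hidden_size=16))
```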