Paper Group ANR 1200
Learning to Find Correlated Features by Maximizing Information Flow in Convolutional Neural Networks
Title | Learning to Find Correlated Features by Maximizing Information Flow in Convolutional Neural Networks |
Authors | Wei Shen, Fei Li, Rujie Liu |
Abstract | Training convolutional neural networks for image classification tasks usually causes information loss. Although most of the time the information lost is redundant with respect to the target task, there are still cases where discriminative information is also discarded. For example, if the samples that belong to the same category have multiple correlated features, the model may only learn a subset of the features and ignore the rest. This may not be a problem unless the classification in the test set highly depends on the ignored features. We argue that discarding correlated discriminative information is partially caused by the fact that minimizing the classification loss does not ensure that the model learns all of the discriminative information, but only the most discriminative information. To address this problem, we propose an information flow maximization (IFM) loss as a regularization term to find the discriminative correlated features. With less information loss, the classifier can make predictions based on more informative features. We validate our method on the shiftedMNIST dataset and show the effectiveness of the IFM loss in learning representative and discriminative features. |
Tasks | Image Classification |
Published | 2019-06-30 |
URL | https://arxiv.org/abs/1907.00348v1 |
https://arxiv.org/pdf/1907.00348v1.pdf | |
PWC | https://paperswithcode.com/paper/learning-to-find-correlated-features-by |
Repo | |
Framework | |
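The abstract above does not give the exact form of the IFM loss, so the following is only a hedged sketch of the general pattern it describes: a classification loss combined with a regularizer that discourages information loss in intermediate features. The network shape, the entropy-style surrogate regularizer, and the `ifm_weight` coefficient are assumptions for illustration, not the authors' formulation.

```python
# Hypothetical sketch: cross-entropy plus an information-preserving
# regularizer on intermediate features. Not the paper's exact IFM loss.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SmallCNN(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4),
        )
        self.classifier = nn.Linear(32 * 4 * 4, num_classes)

    def forward(self, x):
        feats = self.features(x)              # intermediate representation
        logits = self.classifier(feats.flatten(1))
        return logits, feats

def activation_entropy(feats, eps=1e-8):
    # Treat each sample's feature map as a distribution over units and compute
    # its entropy; higher entropy ~ information spread over more feature units.
    p = feats.flatten(1).abs() + eps
    p = p / p.sum(dim=1, keepdim=True)
    return -(p * p.log()).sum(dim=1).mean()

def ifm_style_loss(logits, feats, targets, ifm_weight=0.1):
    # Task loss plus a penalty that rewards keeping information distributed
    # across feature units (an assumed surrogate for "information flow").
    ce = F.cross_entropy(logits, targets)
    return ce - ifm_weight * activation_entropy(feats)

model = SmallCNN()
x = torch.randn(8, 1, 28, 28)                 # dummy MNIST-like batch
y = torch.randint(0, 10, (8,))
logits, feats = model(x)
loss = ifm_style_loss(logits, feats, y)
loss.backward()
```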
Enhanced Human-Machine Interaction by Combining Proximity Sensing with Global Perception
Title | Enhanced Human-Machine Interaction by Combining Proximity Sensing with Global Perception |
Authors | Christoph Heindl, Markus Ikeda, Gernot Stübl, Andreas Pichler, Josef Scharinger |
Abstract | The rise of collaborative robotics has led to a wide range of sensor technologies for detecting human-machine interactions: at short distances, proximity sensors detect non-tactile gestures virtually occlusion-free, while at medium distances, active depth sensors are frequently used to infer human intentions. We describe an optical system for large workspaces that captures human pose from a single panoramic color camera. Despite the two-dimensional input, our system is able to predict metric 3D pose information over a larger field of view than would be possible with active depth measurement cameras. We merge posture context with proximity perception to reduce occlusions and improve accuracy at long distances. We demonstrate the capabilities of our system in two use cases involving multiple humans and robots. |
Tasks | |
Published | 2019-10-06 |
URL | https://arxiv.org/abs/1910.02445v3 |
https://arxiv.org/pdf/1910.02445v3.pdf | |
PWC | https://paperswithcode.com/paper/enhanced-human-machine-interaction-by |
Repo | |
Framework | |
Neural Network-Based Dynamic Threshold Detection for Non-Volatile Memories
Title | Neural Network-Based Dynamic Threshold Detection for Non-Volatile Memories |
Authors | Zhen Mei, Kui Cai, Xingwei Zhong |
Abstract | The unknown channel offset induced by memory physics is a critical and difficult issue to tackle for many non-volatile memories (NVMs). In this paper, we first propose novel neural network (NN) detectors based on the multilayer perceptron (MLP) network and the recurrent neural network (RNN), which can effectively handle the unknown offset of the channel. However, compared with the conventional threshold detector, the NN detectors incur a significant increase in read latency and power consumption. Therefore, we further propose a novel dynamic threshold detector (DTD), whose detection threshold is derived from the outputs of the proposed NN detectors. In this way, the NN-based detection only needs to be invoked when the error correction code (ECC) decoder fails, or periodically when the system is in the idle state. Thereafter, the threshold detector is still adopted, using the adjusted detection threshold derived from the outputs of the NN detector, until a further adjustment of the detection threshold is needed. Simulation results demonstrate that the proposed DTD based on RNN detection can achieve the error performance of the optimum detector, without prior knowledge of the channel. |
Tasks | |
Published | 2019-02-17 |
URL | http://arxiv.org/abs/1902.06289v1 |
http://arxiv.org/pdf/1902.06289v1.pdf | |
PWC | https://paperswithcode.com/paper/neural-network-based-dynamic-threshold |
Repo | |
Framework | |
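As a rough illustration of the dynamic-threshold idea in the abstract above, the sketch below trains a small MLP on noisy cell readings affected by an unknown offset and then derives a scalar detection threshold from the MLP's decision boundary, so that subsequent reads can use a cheap comparison instead of the network. The channel model, noise level, and threshold-derivation rule are assumptions, not the paper's.

```python
# Illustrative dynamic threshold detector: derive a scalar threshold from a
# trained NN detector's decision boundary (assumed setup, toy channel model).
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
offset = 0.35                                   # unknown drift of the channel
bits = rng.integers(0, 2, size=2000)
readings = np.where(bits == 0, 1.0, 2.0) + offset + rng.normal(0, 0.25, size=bits.shape)

mlp = MLPClassifier(hidden_layer_sizes=(16,), max_iter=1000, random_state=0)
mlp.fit(readings.reshape(-1, 1), bits)

# Derive the dynamic threshold: scan the reading range and take the point
# where the MLP's decision flips from bit 0 to bit 1.
grid = np.linspace(readings.min(), readings.max(), 1000)
pred = mlp.predict(grid.reshape(-1, 1))
threshold = grid[np.argmax(pred == 1)]
print(f"derived threshold: {threshold:.3f}")

# Fast path: plain threshold detection with the adjusted threshold.
test_bits = rng.integers(0, 2, size=10)
test_readings = np.where(test_bits == 0, 1.0, 2.0) + offset + rng.normal(0, 0.25, 10)
decisions = (test_readings > threshold).astype(int)
```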
Understanding Urban Dynamics via Context-aware Tensor Factorization with Neighboring Regularization
Title | Understanding Urban Dynamics via Context-aware Tensor Factorization with Neighboring Regularization |
Authors | Jingyuan Wang, Junjie Wu, Ze Wang, Fei Gao, Zhang Xiong |
Abstract | Recent years have witnessed the worldwide emergence of mega-metropolises with enormous populations. Understanding residents' mobility patterns, or urban dynamics, thus becomes crucial for building modern smart cities. In this paper, we propose a Neighbor-Regularized and context-aware Non-negative Tensor Factorization model (NR-cNTF) to discover interpretable urban dynamics from urban heterogeneous data. Different from many existing studies concerned with prediction tasks via tensor completion, NR-cNTF focuses on gaining urban managerial insights from spatial, temporal, and spatio-temporal patterns. This is enabled by high-quality Tucker factorizations regularized by both POI-based urban contexts and geographically neighboring relations. NR-cNTF is also capable of unveiling long-term evolutions of urban dynamics via a pipeline initialization approach. We apply NR-cNTF to a real-life data set containing rich taxi GPS trajectories and POI records of Beijing. The results indicate: 1) NR-cNTF accurately captures four kinds of city rhythms and seventeen spatial communities; 2) the rapid development of Beijing, epitomized by the CBD area, indeed intensifies the job-housing imbalance; 3) the southern areas with recent government investments have shown a healthier development tendency. Finally, NR-cNTF is compared with some baselines on traffic prediction, which further justifies the importance of urban context awareness and neighboring regularization. |
Tasks | Traffic Prediction |
Published | 2019-04-25 |
URL | https://arxiv.org/abs/1905.00702v2 |
https://arxiv.org/pdf/1905.00702v2.pdf | |
PWC | https://paperswithcode.com/paper/190500702 |
Repo | |
Framework | |
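To make the Tucker structure behind a model like NR-cNTF concrete, here is a minimal, hedged sketch: a (region x region x time) traffic tensor is approximated by a core tensor and three non-negative factor matrices, and a neighboring-regularization term penalizes differences between factors of adjacent regions. The ranks, the toy adjacency matrix, and the loss weights are placeholders, not the paper's actual objective or POI-context term.

```python
# Minimal Tucker reconstruction plus a neighbor-regularization penalty
# (illustrative only; not the NR-cNTF optimization algorithm).
import numpy as np

rng = np.random.default_rng(0)
R, T = 20, 24                        # regions and time slots
X = rng.random((R, R, T))            # toy origin-destination-time tensor

r1, r2, r3 = 4, 4, 3                 # Tucker ranks
G = rng.random((r1, r2, r3))         # core tensor
A = rng.random((R, r1))              # origin-region factors
B = rng.random((R, r2))              # destination-region factors
C = rng.random((T, r3))              # time factors
adj = (rng.random((R, R)) < 0.1).astype(float)    # toy neighbor graph

def reconstruct(G, A, B, C):
    # X_hat[i, j, t] = sum_{p,q,s} G[p,q,s] * A[i,p] * B[j,q] * C[t,s]
    return np.einsum('pqs,ip,jq,ts->ijt', G, A, B, C)

def neighbor_penalty(F, adj):
    # Encourage geographically adjacent regions to have similar factors.
    diffs = F[:, None, :] - F[None, :, :]
    return np.sum(adj[:, :, None] * diffs ** 2)

X_hat = reconstruct(G, A, B, C)
loss = np.sum((X - X_hat) ** 2) + 0.1 * (neighbor_penalty(A, adj) + neighbor_penalty(B, adj))
print(f"toy objective value: {loss:.2f}")
```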
Online Hierarchical Clustering Approximations
Title | Online Hierarchical Clustering Approximations |
Authors | Aditya Krishna Menon, Anand Rajagopalan, Baris Sumengen, Gui Citovsky, Qin Cao, Sanjiv Kumar |
Abstract | Hierarchical clustering is a widely used approach for clustering datasets at multiple levels of granularity. Despite its popularity, existing algorithms such as hierarchical agglomerative clustering (HAC) are limited to the offline setting, and thus require the entire dataset to be available. This prohibits their use on large datasets commonly encountered in modern learning applications. In this paper, we consider hierarchical clustering in the online setting, where points arrive one at a time. We propose two algorithms that seek to optimize the Moseley and Wang (MW) revenue function, a variant of the Dasgupta cost. These algorithms offer different tradeoffs between efficiency and MW revenue performance. The first algorithm, OTD, is a highly efficient Online Top Down algorithm which provably achieves a 1/3-approximation to the MW revenue under a data separation assumption. The second algorithm, OHAC, is an online counterpart to offline HAC, which is known to yield a 1/3-approximation to the MW revenue, and produce good quality clusters in practice. We show that OHAC approximates offline HAC by leveraging a novel split-merge procedure. We empirically show that OTD and OHAC offer significant efficiency and cluster quality gains respectively over baselines. |
Tasks | |
Published | 2019-09-20 |
URL | https://arxiv.org/abs/1909.09667v1 |
https://arxiv.org/pdf/1909.09667v1.pdf | |
PWC | https://paperswithcode.com/paper/190909667 |
Repo | |
Framework | |
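The sketch below illustrates the online top-down insertion pattern that the abstract above describes: each arriving point is routed down a binary cluster tree toward the child whose centroid is closer, and a new leaf is split off at the bottom. It shows only the online insertion mechanics; the specific split rules and separation assumptions behind OTD's 1/3-approximation to the MW revenue are not reproduced here.

```python
# Hedged sketch of online top-down hierarchical clustering (insertion only).
import numpy as np

class Node:
    def __init__(self, point=None):
        self.centroid = None if point is None else np.array(point, dtype=float)
        self.count = 0 if point is None else 1
        self.left = None
        self.right = None

    def is_leaf(self):
        return self.left is None and self.right is None

def insert(root, point):
    point = np.array(point, dtype=float)
    if root is None:
        return Node(point)
    node = root
    while not node.is_leaf():
        # Update the running centroid of every internal node on the way down.
        node.centroid = (node.centroid * node.count + point) / (node.count + 1)
        node.count += 1
        d_left = np.linalg.norm(point - node.left.centroid)
        d_right = np.linalg.norm(point - node.right.centroid)
        node = node.left if d_left <= d_right else node.right
    # Split the reached leaf: the old point and the new point become siblings.
    node.left = Node(node.centroid.copy())
    node.right = Node(point)
    node.count += 1
    node.centroid = (node.left.centroid + node.right.centroid) / 2
    return root

tree = None
stream = [[0.0, 0.1], [0.2, 0.0], [5.0, 5.1], [5.2, 4.9], [0.1, 0.2]]
for p in stream:
    tree = insert(tree, p)
```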
Assessing Post Deletion in Sina Weibo: Multi-modal Classification of Hot Topics
Title | Assessing Post Deletion in Sina Weibo: Multi-modal Classification of Hot Topics |
Authors | Meisam Navaki Arefi, Rajkumar Pandi, Michael Carl Tschantz, Jedidiah R. Crandall, King-wa Fu, Dahlia Qiu Shi, Miao Sha |
Abstract | Popular Chinese social media applications such as Weibo are widely known for monitoring and deleting posts to conform to Chinese government requirements. In this paper, we focus on analyzing a dataset of censored and uncensored posts in Weibo. Unlike previous work that only considers the text content of posts, we take a multi-modal approach that takes into account both text and image content. We categorize this dataset into 14 categories that have the potential to be censored on Weibo, and seek to quantify censorship by topic. Specifically, we investigate how different factors interact to affect censorship. We also investigate how consistently and how quickly different topics are censored. To this end, we have assembled an image dataset with 18,966 images, as well as a text dataset with 994 posts from 14 categories. We then utilized deep learning, CNN localization, and NLP techniques to analyze the dataset and extract categories, for further analysis to better understand censorship mechanisms in Weibo. We found that sentiment is the only indicator of censorship that is consistent across the variety of topics we identified. Our finding matches recently leaked logs from Sina Weibo. We also discovered that most categories, such as those related to anti-government actions (e.g., protest) or to politicians (e.g., Xi Jinping), are often censored, whereas some categories, such as crisis-related ones (e.g., rainstorm), are less frequently censored. We also found that censored posts across all categories are deleted within three hours on average. |
Tasks | |
Published | 2019-06-26 |
URL | https://arxiv.org/abs/1906.10861v2 |
https://arxiv.org/pdf/1906.10861v2.pdf | |
PWC | https://paperswithcode.com/paper/assessing-post-deletion-in-sina-weibo-multi |
Repo | |
Framework | |
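For readers unfamiliar with the multi-modal setup mentioned above, here is a hedged, minimal sketch of the general pattern: image features (e.g., CNN embeddings) and text features are concatenated and fed to a single classifier that predicts a topic category. The feature extractors, dimensions, and classifier are placeholders, not the authors' actual pipeline.

```python
# Toy late-fusion topic classifier over concatenated image and text features.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_posts, img_dim, txt_dim, n_categories = 500, 128, 64, 14
image_feats = rng.normal(size=(n_posts, img_dim))   # e.g. CNN embeddings
text_feats = rng.normal(size=(n_posts, txt_dim))    # e.g. text embeddings
labels = rng.integers(0, n_categories, size=n_posts)

fused = np.hstack([image_feats, text_feats])        # simple late fusion
clf = LogisticRegression(max_iter=1000)
clf.fit(fused, labels)
print("train accuracy on toy data:", clf.score(fused, labels))
```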
Evolution-based Fine-tuning of CNNs for Prostate Cancer Detection
Title | Evolution-based Fine-tuning of CNNs for Prostate Cancer Detection |
Authors | Khashayar Namdar, Isha Gujrathi, Masoom A. Haider, Farzad Khalvati |
Abstract | Convolutional Neural Networks (CNNs) have been used for the automated detection of prostate cancer, where the Area Under the Receiver Operating Characteristic (ROC) curve (AUC) is usually used as the performance metric. Given that AUC is not differentiable, common practice is to train the CNN using a loss function based on another performance metric, such as cross entropy, while monitoring AUC to select the best model. In this work, we propose to fine-tune a trained CNN for prostate cancer detection using a Genetic Algorithm to achieve a higher AUC. Our dataset contained 6-channel Diffusion-Weighted MRI slices of the prostate. On a cohort of 2,955 training, 1,417 validation, and 1,334 test slices, we reached a test AUC of 0.773, a 9.3% improvement compared to the base CNN model. |
Tasks | |
Published | 2019-11-04 |
URL | https://arxiv.org/abs/1911.01477v1 |
https://arxiv.org/pdf/1911.01477v1.pdf | |
PWC | https://paperswithcode.com/paper/evolution-based-fine-tuning-of-cnns-for |
Repo | |
Framework | |
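Since AUC is not differentiable, a genetic algorithm only needs to evaluate it, not take gradients through it. The hedged sketch below shows that pattern: candidate weight vectors for a simple linear scorer are mutated and selected by validation AUC using sklearn's `roc_auc_score`. The model, mutation scheme, and population settings are simplified stand-ins for the paper's CNN + GA setup.

```python
# Toy evolution-based fine-tuning loop that selects candidates by AUC.
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n_val, n_features = 400, 20
X_val = rng.normal(size=(n_val, n_features))
true_w = rng.normal(size=n_features)
y_val = (X_val @ true_w + rng.normal(0, 1.0, n_val) > 0).astype(int)

def fitness(w):
    # AUC is not differentiable, but the GA only needs to evaluate it.
    return roc_auc_score(y_val, X_val @ w)

pop_size, n_generations, sigma = 20, 30, 0.1
population = [rng.normal(size=n_features) for _ in range(pop_size)]

for gen in range(n_generations):
    scored = sorted(population, key=fitness, reverse=True)
    parents = scored[: pop_size // 4]                    # elitist selection
    children = [p + rng.normal(0, sigma, n_features)     # Gaussian mutation
                for p in parents for _ in range(3)]
    population = parents + children

best = max(population, key=fitness)
print(f"best validation AUC: {fitness(best):.3f}")
```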
Recent Advances in Imitation Learning from Observation
Title | Recent Advances in Imitation Learning from Observation |
Authors | Faraz Torabi, Garrett Warnell, Peter Stone |
Abstract | Imitation learning is the process by which one agent tries to learn how to perform a certain task using information generated by another, often more-expert agent performing that same task. Conventionally, the imitator has access to both state and action information generated by an expert performing the task (e.g., the expert may provide a kinesthetic demonstration of object placement using a robotic arm). However, requiring the action information prevents imitation learning from a large number of existing valuable learning resources such as online videos of humans performing tasks. To overcome this issue, the specific problem of imitation from observation (IfO) has recently garnered a great deal of attention, in which the imitator only has access to the state information (e.g., video frames) generated by the expert. In this paper, we provide a literature review of methods developed for IfO, and then point out some open research problems and potential future work. |
Tasks | Imitation Learning |
Published | 2019-05-30 |
URL | https://arxiv.org/abs/1905.13566v2 |
https://arxiv.org/pdf/1905.13566v2.pdf | |
PWC | https://paperswithcode.com/paper/recent-advances-in-imitation-learning-from |
Repo | |
Framework | |
Speech Replay Detection with x-Vector Attack Embeddings and Spectral Features
Title | Speech Replay Detection with x-Vector Attack Embeddings and Spectral Features |
Authors | Jennifer Williams, Joanna Rownicka |
Abstract | We present our system submission to the ASVspoof 2019 Challenge Physical Access (PA) task. The objective of this challenge was to develop a countermeasure that identifies speech audio as either bona fide or intercepted and replayed. The target prediction was a value indicating whether a speech segment was bona fide (positive values) or “spoofed” (negative values). Our system used convolutional neural networks (CNNs) and a representation of the speech audio that combined x-vector attack embeddings with signal processing features. The x-vector attack embeddings were created from mel-frequency cepstral coefficients (MFCCs) using a time-delay neural network (TDNN). These embeddings jointly modeled 27 different environments and 9 types of attacks from the labeled data. We also used sub-band spectral centroid magnitude coefficients (SCMCs) as features. We included an additive Gaussian noise layer during training as a way to augment the data and make our system more robust to previously unseen attack examples. We report system performance using the tandem detection cost function (tDCF) and equal error rate (EER). Our approach performed better than both of the challenge baselines. Our results suggest that x-vector attack embeddings can help regularize the CNN predictions even when environments or attacks are more challenging. |
Tasks | |
Published | 2019-09-23 |
URL | https://arxiv.org/abs/1909.10324v1 |
https://arxiv.org/pdf/1909.10324v1.pdf | |
PWC | https://paperswithcode.com/paper/190910324 |
Repo | |
Framework | |
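A hedged sketch of the input pipeline described above: an x-vector-style attack embedding is concatenated with CNN features extracted from spectral inputs, and additive Gaussian noise is applied to the spectral features during training as data augmentation. The dimensions, noise level, and network shape are assumptions, not the submission's actual configuration.

```python
# Toy replay-detection model: CNN over spectral features + x-vector embedding,
# with an additive Gaussian noise layer active only during training.
import torch
import torch.nn as nn

class ReplayDetector(nn.Module):
    def __init__(self, embed_dim=512, noise_std=0.05):
        super().__init__()
        self.noise_std = noise_std
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(8, 16, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(4),
            nn.Flatten(),
        )
        self.head = nn.Linear(16 * 4 * 4 + embed_dim, 1)    # single spoof score

    def forward(self, spectral, embedding):
        if self.training:
            # Additive Gaussian noise augmentation, used only during training.
            spectral = spectral + torch.randn_like(spectral) * self.noise_std
        feats = self.cnn(spectral.unsqueeze(1))              # (B, 1, bins, frames)
        return self.head(torch.cat([feats, embedding], dim=1)).squeeze(1)

model = ReplayDetector()
spectral = torch.randn(4, 40, 100)       # e.g. SCMC features per utterance
embedding = torch.randn(4, 512)          # e.g. x-vector attack embedding
scores = model(spectral, embedding)      # positive ~ bona fide, negative ~ spoofed
```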
From Semi-supervised to Almost-unsupervised Speech Recognition with Very-low Resource by Jointly Learning Phonetic Structures from Audio and Text Embeddings
Title | From Semi-supervised to Almost-unsupervised Speech Recognition with Very-low Resource by Jointly Learning Phonetic Structures from Audio and Text Embeddings |
Authors | Yi-Chen Chen, Sung-Feng Huang, Hung-yi Lee, Lin-shan Lee |
Abstract | Producing a large amount of annotated speech data for training ASR systems remains difficult for the more than 95% of the world's languages that are low-resourced. However, we note that human babies start to learn a language from the sounds (or phonetic structures) of a small number of exemplar words, and “generalize” such knowledge to other words without hearing a large amount of data. We initiate some preliminary work in this direction. Audio Word2Vec is used to learn the phonetic structures from spoken words (signal segments), while another autoencoder is used to learn the phonetic structures from text words. The relationship between the two can be learned jointly, or separately after both are well trained. This relationship can be used for speech recognition with very low resources. In initial experiments on the TIMIT dataset, only 2.1 hours of speech data (in which 2,500 spoken words were annotated and the rest unlabeled) gave a word error rate of 44.6%, and this number was reduced to 34.2% when 4.1 hours of speech data (in which 20,000 spoken words were annotated) were given. These results are not satisfactory, but they are a good starting point. |
Tasks | Speech Recognition |
Published | 2019-04-10 |
URL | http://arxiv.org/abs/1904.05078v1 |
http://arxiv.org/pdf/1904.05078v1.pdf | |
PWC | https://paperswithcode.com/paper/from-semi-supervised-to-almost-unsupervised |
Repo | |
Framework | |
Deep Learning-Based Classification Of the Defective Pistachios Via Deep Autoencoder Neural Networks
Title | Deep Learning-Based Classification Of the Defective Pistachios Via Deep Autoencoder Neural Networks |
Authors | Mehdi Abbaszadeh, Aliakbar Rahimifard, Mohammadali Eftekhari, Hossein Ghayoumi Zadeh, Ali Fayazi, Ali Dini, Mostafa Danaeian |
Abstract | Pistachio nuts are mainly consumed raw, salted, or roasted because of their high nutritional value and favorable taste. Pistachio nuts with shell and kernel defects, besides not being acceptable to consumers, are also prone to insect damage, mold decay, and aflatoxin contamination. In this research, a deep learning-based imaging algorithm was developed to improve the sorting of nuts with shell and kernel defects that indicate a risk of aflatoxin contamination, such as dark stains, oily stains, adhering hull, fungal decay, and Aspergillus molds. This paper presents an unsupervised learning method to classify defective and unpleasant pistachios based on deep autoencoder neural networks. Testing the designed neural network on a validation dataset showed that nuts with dark stains, oily stains, or adhering hulls can be distinguished from normal nuts with an accuracy of 80.3%. Given the limited memory available on the university's HPC cluster, the results are reasonable and justifiable. |
Tasks | |
Published | 2019-06-10 |
URL | https://arxiv.org/abs/1906.11878v1 |
https://arxiv.org/pdf/1906.11878v1.pdf | |
PWC | https://paperswithcode.com/paper/deep-learning-based-classification-of-the |
Repo | |
Framework | |
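The standard autoencoder-based screening pattern the abstract relies on can be sketched briefly: train a small autoencoder to reconstruct images of normal nuts, then flag samples with high reconstruction error as defective. The architecture, image size, and threshold rule below are placeholders, not the paper's network.

```python
# Hedged sketch: autoencoder trained on "normal" samples, defects flagged by
# reconstruction error above a percentile threshold.
import torch
import torch.nn as nn

class DenseAutoencoder(nn.Module):
    def __init__(self, dim=32 * 32, latent=32):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(dim, 256), nn.ReLU(), nn.Linear(256, latent))
        self.decoder = nn.Sequential(nn.Linear(latent, 256), nn.ReLU(), nn.Linear(256, dim))

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = DenseAutoencoder()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
normal_images = torch.rand(256, 32 * 32)         # stand-in for normal-nut images

for _ in range(50):                               # short training loop
    recon = model(normal_images)
    loss = nn.functional.mse_loss(recon, normal_images)
    opt.zero_grad()
    loss.backward()
    opt.step()

# Flag test samples whose reconstruction error exceeds a percentile threshold.
with torch.no_grad():
    errors = ((model(normal_images) - normal_images) ** 2).mean(dim=1)
    threshold = errors.quantile(0.95)
    test = torch.rand(10, 32 * 32)
    is_defective = ((model(test) - test) ** 2).mean(dim=1) > threshold
```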
Emotionally-Aware Chatbots: A Survey
Title | Emotionally-Aware Chatbots: A Survey |
Authors | Endang Wahyu Pamungkas |
Abstract | The development of textual conversational agents, or chatbots, has gathered tremendous traction from both academia and industry in recent years. Nowadays, chatbots are widely used as agents to communicate with humans in services such as booking assistance, customer service, and personal companionship. The biggest challenge in building a chatbot is to humanize the machine in order to improve user engagement. Some studies show that emotion is an important aspect of humanizing machines, including chatbots. In this paper, we provide a systematic review of approaches to building an emotionally-aware chatbot (EAC). To the best of our knowledge, there is still no work focusing on this area. We propose three research questions regarding EAC studies. We start with the history and evolution of EAC, then cover several approaches to building EAC proposed by previous studies, and finally some available resources for building EAC. Based on our investigation, we found that early EAC exploited simple rule-based approaches, while most current EAC use neural-based approaches. We also notice that most EAC contain an emotion classifier in their architecture, which utilizes several available affective resources. We predict that the development of EAC will continue to gain more and more attention from scholars, as evidenced by recent studies proposing new datasets for building EAC in various languages. |
Tasks | Chatbot |
Published | 2019-06-24 |
URL | https://arxiv.org/abs/1906.09774v1 |
https://arxiv.org/pdf/1906.09774v1.pdf | |
PWC | https://paperswithcode.com/paper/emotionally-aware-chatbots-a-survey |
Repo | |
Framework | |
Designing the Next Generation of Intelligent Personal Robotic Assistants for the Physically Impaired
Title | Designing the Next Generation of Intelligent Personal Robotic Assistants for the Physically Impaired |
Authors | Basit Ayantunde, Jane Odum, Fadlullah Olawumi, Joshua Olalekan |
Abstract | The physically impaired commonly have difficulty performing simple routine tasks without relying on other individuals, who are not always readily available, which makes them strive for independence. While their impaired abilities can in many cases be augmented (to certain degrees) with the use of assistive technologies, little attention has been paid to their application in embodied AI combined with assistive technologies. This paper presents the modular framework, architecture, and design of the mid-fidelity prototype of MARVIN: an artificial-intelligence-powered robotic assistant designed to help the physically impaired perform simple day-to-day tasks. The prototype features a trivial locomotion unit and utilizes various state-of-the-art neural network architectures for specific modular components of the system. These components perform specialized functions such as automatic speech recognition, object detection, natural language understanding, and speech synthesis, among others. We also discuss the constraints, challenges encountered, and potential future applications and improvements towards succeeding prototypes. |
Tasks | Object Detection, Speech Recognition, Speech Synthesis |
Published | 2019-11-28 |
URL | https://arxiv.org/abs/1911.12482v1 |
https://arxiv.org/pdf/1911.12482v1.pdf | |
PWC | https://paperswithcode.com/paper/designing-the-next-generation-of-intelligent |
Repo | |
Framework | |
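The modular architecture described above can be illustrated with a hedged sketch in which speech recognition, language understanding, and task execution sit behind small, swappable interfaces. All class and method names here are invented for illustration, not MARVIN's actual API.

```python
# Toy modular assistant pipeline with swappable components.
from dataclasses import dataclass
from typing import Protocol

@dataclass
class Intent:
    name: str
    slots: dict

class SpeechRecognizer(Protocol):
    def transcribe(self, audio: bytes) -> str: ...

class LanguageUnderstander(Protocol):
    def parse(self, text: str) -> Intent: ...

class EchoRecognizer:
    def transcribe(self, audio: bytes) -> str:
        return audio.decode("utf-8")             # toy stand-in for a real ASR model

class KeywordNLU:
    def parse(self, text: str) -> Intent:
        if "light" in text:
            return Intent("toggle_light", {"room": "unknown"})
        return Intent("unknown", {})

class Assistant:
    def __init__(self, asr: SpeechRecognizer, nlu: LanguageUnderstander):
        self.asr, self.nlu = asr, nlu

    def handle(self, audio: bytes) -> str:
        intent = self.nlu.parse(self.asr.transcribe(audio))
        return f"executing intent: {intent.name}"

assistant = Assistant(EchoRecognizer(), KeywordNLU())
print(assistant.handle(b"please turn on the light"))
```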
Concept Tree: High-Level Representation of Variables for More Interpretable Surrogate Decision Trees
Title | Concept Tree: High-Level Representation of Variables for More Interpretable Surrogate Decision Trees |
Authors | Xavier Renard, Nicolas Woloszko, Jonathan Aigrain, Marcin Detyniecki |
Abstract | Interpretable surrogates of black-box predictors trained on high-dimensional tabular datasets can struggle to generate comprehensible explanations in the presence of correlated variables. We propose a model-agnostic interpretable surrogate that provides global and local explanations of black-box classifiers to address this issue. We introduce the idea of concepts as intuitive groupings of variables that are either defined by a domain expert or automatically discovered using correlation coefficients. Concepts are embedded in a surrogate decision tree to enhance its comprehensibility. First experiments on FRED-MD, a macroeconomic database with 134 variables, show improvement in human-interpretability while accuracy and fidelity of the surrogate model are preserved. |
Tasks | |
Published | 2019-06-04 |
URL | https://arxiv.org/abs/1906.01297v1 |
https://arxiv.org/pdf/1906.01297v1.pdf | |
PWC | https://paperswithcode.com/paper/concept-tree-high-level-representation-of |
Repo | |
Framework | |
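A hedged sketch of the concept idea described above: correlated variables are grouped into "concepts" using pairwise correlation, each concept is summarized by the mean of its standardized members, and a shallow decision tree is fit on the concept features to mimic a black-box model's predictions. The grouping rule, thresholds, and choice of black box are illustrative, not the paper's.

```python
# Toy surrogate tree over correlation-based "concept" features.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
n, d = 500, 12
X = rng.normal(size=(n, d))
X[:, 1] = X[:, 0] + 0.1 * rng.normal(size=n)      # make some variables correlated
X[:, 5] = X[:, 4] + 0.1 * rng.normal(size=n)
y = (X[:, 0] + X[:, 4] > 0).astype(int)

black_box = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Greedy grouping: variables whose absolute correlation exceeds a threshold
# join the same concept.
corr = np.abs(np.corrcoef(X, rowvar=False))
concepts, assigned = [], set()
for i in range(d):
    if i in assigned:
        continue
    group = [j for j in range(d) if j not in assigned and corr[i, j] > 0.7]
    concepts.append(group)
    assigned.update(group)

Z = np.column_stack([
    ((X[:, g] - X[:, g].mean(0)) / X[:, g].std(0)).mean(axis=1) for g in concepts
])

# Surrogate tree trained on the black box's predictions over concept features.
surrogate = DecisionTreeClassifier(max_depth=3, random_state=0)
surrogate.fit(Z, black_box.predict(X))
print("fidelity to black box:", surrogate.score(Z, black_box.predict(X)))
```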
Exploiting Persona Information for Diverse Generation of Conversational Responses
Title | Exploiting Persona Information for Diverse Generation of Conversational Responses |
Authors | Haoyu Song, Wei-Nan Zhang, Yiming Cui, Dong Wang, Ting Liu |
Abstract | In human conversations, because people have their own personalities in mind, they can easily carry out and maintain a conversation. Given conversational context together with persona information, how a chatbot should exploit that information to generate diverse and sustainable conversations is still a non-trivial task. Previous work on persona-based conversational models successfully makes use of predefined persona information and has shown great promise in delivering more realistic responses. However, these models all learn under the assumption that, given a source input, there is only one target response, whereas in human conversations there are many appropriate responses to a given input message. In this paper, we propose a memory-augmented architecture that exploits persona information from context, combined with a conditional variational autoencoder, to generate diverse and sustainable conversations. We evaluate the proposed model on a benchmark persona-chat dataset. Both automatic and human evaluations show that our model can deliver more diverse and more engaging persona-based responses than baseline approaches. |
Tasks | Chatbot |
Published | 2019-05-29 |
URL | https://arxiv.org/abs/1905.12188v1 |
https://arxiv.org/pdf/1905.12188v1.pdf | |
PWC | https://paperswithcode.com/paper/exploiting-persona-information-for-diverse |
Repo | |
Framework | |
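To show the conditional-VAE core that models of this kind build on, here is a minimal, hedged sketch: a latent code is sampled with the reparameterization trick from a posterior conditioned on the input and a persona vector, and the loss is reconstruction plus a KL term. Toy vectors stand in for encoded utterances; the memory augmentation and the sequence decoder described in the abstract are omitted.

```python
# Toy conditional VAE conditioned on a persona vector (illustrative only).
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyCVAE(nn.Module):
    def __init__(self, x_dim=64, persona_dim=16, z_dim=8):
        super().__init__()
        self.enc = nn.Linear(x_dim + persona_dim, 2 * z_dim)     # outputs mu and logvar
        self.dec = nn.Linear(z_dim + persona_dim, x_dim)

    def forward(self, x, persona):
        mu, logvar = self.enc(torch.cat([x, persona], dim=1)).chunk(2, dim=1)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization
        recon = self.dec(torch.cat([z, persona], dim=1))
        return recon, mu, logvar

def cvae_loss(recon, x, mu, logvar):
    recon_loss = F.mse_loss(recon, x, reduction='mean')
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    return recon_loss + kl

model = ToyCVAE()
x = torch.randn(32, 64)             # stand-in for encoded response representations
persona = torch.randn(32, 16)       # stand-in for persona embeddings
recon, mu, logvar = model(x, persona)
loss = cvae_loss(recon, x, mu, logvar)
loss.backward()
# Sampling different z at generation time yields diverse candidate responses.
```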