Paper Group ANR 1200
Learning to Find Correlated Features by Maximizing Information Flow in Convolutional Neural Networks
Title | Learning to Find Correlated Features by Maximizing Information Flow in Convolutional Neural Networks |
Authors | Wei Shen, Fei Li, Rujie Liu |
Abstract | Training convolutional neural networks for image classification tasks usually causes information loss. Although most of the time the information lost is redundant with respect to the target task, there are still cases where discriminative information is also discarded. For example, if the samples that belong to the same category have multiple correlated features, the model may only learn a subset of the features and ignore the rest. This may not be a problem unless the classification in the test set highly depends on the ignored features. We argue that discarding correlated discriminative information is partially caused by the fact that minimizing the classification loss does not ensure that the model learns all of the discriminative information, but only the most discriminative information. To address this problem, we propose an information flow maximization (IFM) loss as a regularization term to find the discriminative correlated features. With less information loss, the classifier can make predictions based on more informative features. We validate our method on the shiftedMNIST dataset and show the effectiveness of the IFM loss in learning representative and discriminative features. |
Tasks | Image Classification |
Published | 2019-06-30 |
URL | https://arxiv.org/abs/1907.00348v1 |
https://arxiv.org/pdf/1907.00348v1.pdf | |
PWC | https://paperswithcode.com/paper/learning-to-find-correlated-features-by |
Repo | |
Framework | |
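The abstract above does not give the exact form of the IFM loss, so the following is only a hedged sketch of the general pattern it describes: a classification loss combined with a regularizer that discourages information loss in intermediate features. The network shape, the entropy-style surrogate regularizer, and the `ifm_weight` coefficient are assumptions for illustration, not the authors' formulation.

```python
# Hypothetical sketch: cross-entropy plus an information-preserving
# regularizer on intermediate features. Not the paper's exact IFM loss.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SmallCNN(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4),
        )
        self.classifier = nn.Linear(32 * 4 * 4, num_classes)

    def forward(self, x):
        feats = self.features(x)              # intermediate representation
        logits = self.classifier(feats.flatten(1))
        return logits, feats

def activation_entropy(feats, eps=1e-8):
    # Treat each sample's feature map as a distribution over units and compute
    # its entropy; higher entropy ~ information spread over more feature units.
    p = feats.flatten(1).abs() + eps
    p = p / p.sum(dim=1, keepdim=True)
    return -(p * p.log()).sum(dim=1).mean()

def ifm_style_loss(logits, feats, targets, ifm_weight=0.1):
    # Task loss plus a penalty that rewards keeping information distributed
    # across feature units (an assumed surrogate for "information flow").
    ce = F.cross_entropy(logits, targets)
    return ce - ifm_weight * activation_entropy(feats)

model = SmallCNN()
x = torch.randn(8, 1, 28, 28)                 # dummy MNIST-like batch
y = torch.randint(0, 10, (8,))
logits, feats = model(x)
loss = ifm_style_loss(logits, feats, y)
loss.backward()
```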
Enhanced Human-Machine Interaction by Combining Proximity Sensing with Global Perception
Title | Enhanced Human-Machine Interaction by Combining Proximity Sensing with Global Perception |
Authors | Christoph Heindl, Markus Ikeda, Gernot Stübl, Andreas Pichler, Josef Scharinger |
Abstract | The rise of collaborative robotics has led to a wide range of sensor technologies for detecting human-machine interactions: at short distances, proximity sensors detect non-tactile gestures virtually occlusion-free, while at medium distances, active depth sensors are frequently used to infer human intentions. We describe an optical system for large workspaces that captures human pose from a single panoramic color camera. Despite the two-dimensional input, our system is able to predict metric 3D pose information over a larger field of view than would be possible with active depth measurement cameras. We merge posture context with proximity perception to reduce occlusions and improve accuracy at long distances. We demonstrate the capabilities of our system in two use cases involving multiple humans and robots. |
Tasks | |
Published | 2019-10-06 |
URL | https://arxiv.org/abs/1910.02445v3 |
https://arxiv.org/pdf/1910.02445v3.pdf | |
PWC | https://paperswithcode.com/paper/enhanced-human-machine-interaction-by |
Repo | |
Framework | |
Neural Network-Based Dynamic Threshold Detection for Non-Volatile Memories
Title | Neural Network-Based Dynamic Threshold Detection for Non-Volatile Memories |
Authors | Zhen Mei, Kui Cai, Xingwei Zhong |
Abstract | The unknown channel offset induced by memory physics is a critical and difficult issue to tackle for many non-volatile memories (NVMs). In this paper, we first propose novel neural network (NN) detectors based on the multilayer perceptron (MLP) network and the recurrent neural network (RNN), which can effectively handle the unknown offset of the channel. However, compared with the conventional threshold detector, the NN detectors incur a significant increase in read latency and power consumption. Therefore, we further propose a novel dynamic threshold detector (DTD), whose detection threshold is derived from the outputs of the proposed NN detectors. In this way, the NN-based detection only needs to be invoked when the error correction code (ECC) decoder fails, or periodically when the system is in the idle state. Thereafter, the threshold detector is still adopted, using the adjusted detection threshold derived from the outputs of the NN detector, until a further adjustment of the detection threshold is needed. Simulation results demonstrate that the proposed DTD based on RNN detection can achieve the error performance of the optimum detector, without prior knowledge of the channel. |
Tasks | |
Published | 2019-02-17 |
URL | http://arxiv.org/abs/1902.06289v1 |
http://arxiv.org/pdf/1902.06289v1.pdf | |
PWC | https://paperswithcode.com/paper/neural-network-based-dynamic-threshold |
Repo | |
Framework | |
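As a rough illustration of the dynamic-threshold idea in the abstract above, the sketch below trains a small MLP on noisy cell readings affected by an unknown offset and then derives a scalar detection threshold from the MLP's decision boundary, so that subsequent reads can use a cheap comparison instead of the network. The channel model, noise level, and threshold-derivation rule are assumptions, not the paper's.

```python
# Illustrative dynamic threshold detector: derive a scalar threshold from a
# trained NN detector's decision boundary (assumed setup, toy channel model).
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
offset = 0.35                                   # unknown drift of the channel
bits = rng.integers(0, 2, size=2000)
readings = np.where(bits == 0, 1.0, 2.0) + offset + rng.normal(0, 0.25, size=bits.shape)

mlp = MLPClassifier(hidden_layer_sizes=(16,), max_iter=1000, random_state=0)
mlp.fit(readings.reshape(-1, 1), bits)

# Derive the dynamic threshold: scan the reading range and take the point
# where the MLP's decision flips from bit 0 to bit 1.
grid = np.linspace(readings.min(), readings.max(), 1000)
pred = mlp.predict(grid.reshape(-1, 1))
threshold = grid[np.argmax(pred == 1)]
print(f"derived threshold: {threshold:.3f}")

# Fast path: plain threshold detection with the adjusted threshold.
test_bits = rng.integers(0, 2, size=10)
test_readings = np.where(test_bits == 0, 1.0, 2.0) + offset + rng.normal(0, 0.25, 10)
decisions = (test_readings > threshold).astype(int)
```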
Understanding Urban Dynamics via Context-aware Tensor Factorization with Neighboring Regularization
Title | Understanding Urban Dynamics via Context-aware Tensor Factorization with Neighboring Regularization |
Authors | Jingyuan Wang, Junjie Wu, Ze Wang, Fei Gao, Zhang Xiong |
Abstract | Recent years have witnessed the worldwide emergence of mega-metropolises with enormous populations. Understanding residents' mobility patterns, or urban dynamics, thus becomes crucial for building modern smart cities. In this paper, we propose a Neighbor-Regularized and context-aware Non-negative Tensor Factorization model (NR-cNTF) to discover interpretable urban dynamics from urban heterogeneous data. Different from many existing studies concerned with prediction tasks via tensor completion, NR-cNTF focuses on gaining urban managerial insights from spatial, temporal, and spatio-temporal patterns. This is enabled by high-quality Tucker factorizations regularized by both POI-based urban contexts and geographically neighboring relations. NR-cNTF is also capable of unveiling long-term evolutions of urban dynamics via a pipeline initialization approach. We apply NR-cNTF to a real-life data set containing rich taxi GPS trajectories and POI records of Beijing. The results indicate: 1) NR-cNTF accurately captures four kinds of city rhythms and seventeen spatial communities; 2) the rapid development of Beijing, epitomized by the CBD area, indeed intensifies the job-housing imbalance; 3) the southern areas with recent government investments have shown a healthier development tendency. Finally, NR-cNTF is compared with some baselines on traffic prediction, which further justifies the importance of urban context awareness and neighboring regularization. |
Tasks | Traffic Prediction |
Published | 2019-04-25 |
URL | https://arxiv.org/abs/1905.00702v2 |
https://arxiv.org/pdf/1905.00702v2.pdf | |
PWC | https://paperswithcode.com/paper/190500702 |
Repo | |
Framework | |
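To make the Tucker structure behind a model like NR-cNTF concrete, here is a minimal, hedged sketch: a (region x region x time) traffic tensor is approximated by a core tensor and three non-negative factor matrices, and a neighboring-regularization term penalizes differences between factors of adjacent regions. The ranks, the toy adjacency matrix, and the loss weights are placeholders, not the paper's actual objective or POI-context term.

```python
# Minimal Tucker reconstruction plus a neighbor-regularization penalty
# (illustrative only; not the NR-cNTF optimization algorithm).
import numpy as np

rng = np.random.default_rng(0)
R, T = 20, 24                        # regions and time slots
X = rng.random((R, R, T))            # toy origin-destination-time tensor

r1, r2, r3 = 4, 4, 3                 # Tucker ranks
G = rng.random((r1, r2, r3))         # core tensor
A = rng.random((R, r1))              # origin-region factors
B = rng.random((R, r2))              # destination-region factors
C = rng.random((T, r3))              # time factors
adj = (rng.random((R, R)) < 0.1).astype(float)    # toy neighbor graph

def reconstruct(G, A, B, C):
    # X_hat[i, j, t] = sum_{p,q,s} G[p,q,s] * A[i,p] * B[j,q] * C[t,s]
    return np.einsum('pqs,ip,jq,ts->ijt', G, A, B, C)

def neighbor_penalty(F, adj):
    # Encourage geographically adjacent regions to have similar factors.
    diffs = F[:, None, :] - F[None, :, :]
    return np.sum(adj[:, :, None] * diffs ** 2)

X_hat = reconstruct(G, A, B, C)
loss = np.sum((X - X_hat) ** 2) + 0.1 * (neighbor_penalty(A, adj) + neighbor_penalty(B, adj))
print(f"toy objective value: {loss:.2f}")
```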
Online Hierarchical Clustering Approximations
Title | Online Hierarchical Clustering Approximations |
Authors | Aditya Krishna Menon, Anand Rajagopalan, Baris Sumengen, Gui Citovsky, Qin Cao, Sanjiv Kumar |
Abstract | Hierarchical clustering is a widely used approach for clustering datasets at multiple levels of granularity. Despite its popularity, existing algorithms such as hierarchical agglomerative clustering (HAC) are limited to the offline setting, and thus require the entire dataset to be available. This prohibits their use on large datasets commonly encountered in modern learning applications. In this paper, we consider hierarchical clustering in the online setting, where points arrive one at a time. We propose two algorithms that seek to optimize the Moseley and Wang (MW) revenue function, a variant of the Dasgupta cost. These algorithms offer different tradeoffs between efficiency and MW revenue performance. The first algorithm, OTD, is a highly efficient Online Top Down algorithm which provably achieves a 1/3-approximation to the MW revenue under a data separation assumption. The second algorithm, OHAC, is an online counterpart to offline HAC, which is known to yield a 1/3-approximation to the MW revenue, and produce good quality clusters in practice. We show that OHAC approximates offline HAC by leveraging a novel split-merge procedure. We empirically show that OTD and OHAC offer significant efficiency and cluster quality gains respectively over baselines. |
Tasks | |
Published | 2019-09-20 |
URL | https://arxiv.org/abs/1909.09667v1 |
https://arxiv.org/pdf/1909.09667v1.pdf | |
PWC | https://paperswithcode.com/paper/190909667 |
Repo | |
Framework | |
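The sketch below illustrates the online top-down insertion pattern that the abstract above describes: each arriving point is routed down a binary cluster tree toward the child whose centroid is closer, and a new leaf is split off at the bottom. It shows only the online insertion mechanics; the specific split rules and separation assumptions behind OTD's 1/3-approximation to the MW revenue are not reproduced here.

```python
# Hedged sketch of online top-down hierarchical clustering (insertion only).
import numpy as np

class Node:
    def __init__(self, point=None):
        self.centroid = None if point is None else np.array(point, dtype=float)
        self.count = 0 if point is None else 1
        self.left = None
        self.right = None

    def is_leaf(self):
        return self.left is None and self.right is None

def insert(root, point):
    point = np.array(point, dtype=float)
    if root is None:
        return Node(point)
    node = root
    while not node.is_leaf():
        # Update the running centroid of every internal node on the way down.
        node.centroid = (node.centroid * node.count + point) / (node.count + 1)
        node.count += 1
        d_left = np.linalg.norm(point - node.left.centroid)
        d_right = np.linalg.norm(point - node.right.centroid)
        node = node.left if d_left <= d_right else node.right
    # Split the reached leaf: the old point and the new point become siblings.
    node.left = Node(node.centroid.copy())
    node.right = Node(point)
    node.count += 1
    node.centroid = (node.left.centroid + node.right.centroid) / 2
    return root

tree = None
stream = [[0.0, 0.1], [0.2, 0.0], [5.0, 5.1], [5.2, 4.9], [0.1, 0.2]]
for p in stream:
    tree = insert(tree, p)
```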
Assessing Post Deletion in Sina Weibo: Multi-modal Classification of Hot Topics
Title | Assessing Post Deletion in Sina Weibo: Multi-modal Classification of Hot Topics |
Authors | Meisam Navaki Arefi, Rajkumar Pandi, Michael Carl Tschantz, Jedidiah R. Crandall, King-wa Fu, Dahlia Qiu Shi, Miao Sha |
Abstract | Popular Chinese social media applications such as Weibo are widely known for monitoring and deleting posts to conform to Chinese government requirements. In this paper, we focus on analyzing a dataset of censored and uncensored posts in Weibo. Unlike previous work that only considers the text content of posts, we take a multi-modal approach that takes into account both text and image content. We categorize this dataset into 14 categories that have the potential to be censored on Weibo, and seek to quantify censorship by topic. Specifically, we investigate how different factors interact to affect censorship. We also investigate how consistently and how quickly different topics are censored. To this end, we have assembled an image dataset with 18,966 images, as well as a text dataset with 994 posts from 14 categories. We then utilized deep learning, CNN localization, and NLP techniques to analyze the dataset and extract categories, for further analysis to better understand censorship mechanisms in Weibo. We found that sentiment is the only indicator of censorship that is consistent across the variety of topics we identified. Our finding matches recently leaked logs from Sina Weibo. We also discovered that most categories, such as those related to anti-government actions (e.g., protest) or to politicians (e.g., Xi Jinping), are often censored, whereas some categories, such as crisis-related ones (e.g., rainstorm), are less frequently censored. We also found that censored posts across all categories are deleted within three hours on average. |
Tasks | |
Published | 2019-06-26 |
URL | https://arxiv.org/abs/1906.10861v2 |
https://arxiv.org/pdf/1906.10861v2.pdf | |
PWC | https://paperswithcode.com/paper/assessing-post-deletion-in-sina-weibo-multi |
Repo | |
Framework | |
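For readers unfamiliar with the multi-modal setup mentioned above, here is a hedged, minimal sketch of the general pattern: image features (e.g., CNN embeddings) and text features are concatenated and fed to a single classifier that predicts a topic category. The feature extractors, dimensions, and classifier are placeholders, not the authors' actual pipeline.

```python
# Toy late-fusion topic classifier over concatenated image and text features.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_posts, img_dim, txt_dim, n_categories = 500, 128, 64, 14
image_feats = rng.normal(size=(n_posts, img_dim))   # e.g. CNN embeddings
text_feats = rng.normal(size=(n_posts, txt_dim))    # e.g. text embeddings
labels = rng.integers(0, n_categories, size=n_posts)

fused = np.hstack([image_feats, text_feats])        # simple late fusion
clf = LogisticRegression(max_iter=1000)
clf.fit(fused, labels)
print("train accuracy on toy data:", clf.score(fused, labels))
```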
Evolution-based Fine-tuning of CNNs for Prostate Cancer Detection
Title | Evolution-based Fine-tuning of CNNs for Prostate Cancer Detection |
Authors | Khashayar Namdar, Isha Gujrathi, Masoom A. Haider, Farzad Khalvati |
Abstract | Convolutional Neural Networks (CNNs) have been used for the automated detection of prostate cancer, where the Area Under the Receiver Operating Characteristic (ROC) curve (AUC) is usually used as the performance metric. Given that AUC is not differentiable, common practice is to train the CNN using a loss function based on another performance metric, such as cross entropy, while monitoring AUC to select the best model. In this work, we propose to fine-tune a trained CNN for prostate cancer detection using a Genetic Algorithm to achieve a higher AUC. Our dataset contained 6-channel Diffusion-Weighted MRI slices of the prostate. On a cohort of 2,955 training, 1,417 validation, and 1,334 test slices, we reached a test AUC of 0.773, a 9.3% improvement compared to the base CNN model. |
Tasks | |
Published | 2019-11-04 |
URL | https://arxiv.org/abs/1911.01477v1 |
https://arxiv.org/pdf/1911.01477v1.pdf | |
PWC | https://paperswithcode.com/paper/evolution-based-fine-tuning-of-cnns-for |
Repo | |
Framework | |
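Since AUC is not differentiable, a genetic algorithm only needs to evaluate it, not take gradients through it. The hedged sketch below shows that pattern: candidate weight vectors for a simple linear scorer are mutated and selected by validation AUC using sklearn's `roc_auc_score`. The model, mutation scheme, and population settings are simplified stand-ins for the paper's CNN + GA setup.

```python
# Toy evolution-based fine-tuning loop that selects candidates by AUC.
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n_val, n_features = 400, 20
X_val = rng.normal(size=(n_val, n_features))
true_w = rng.normal(size=n_features)
y_val = (X_val @ true_w + rng.normal(0, 1.0, n_val) > 0).astype(int)

def fitness(w):
    # AUC is not differentiable, but the GA only needs to evaluate it.
    return roc_auc_score(y_val, X_val @ w)

pop_size, n_generations, sigma = 20, 30, 0.1
population = [rng.normal(size=n_features) for _ in range(pop_size)]

for gen in range(n_generations):
    scored = sorted(population, key=fitness, reverse=True)
    parents = scored[: pop_size // 4]                    # elitist selection
    children = [p + rng.normal(0, sigma, n_features)     # Gaussian mutation
                for p in parents for _ in range(3)]
    population = parents + children

best = max(population, key=fitness)
print(f"best validation AUC: {fitness(best):.3f}")
```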
Recent Advances in Imitation Learning from Observation
Title | Recent Advances in Imitation Learning from Observation |
Authors | Faraz Torabi, Garrett Warnell, Peter Stone |
Abstract | Imitation learning is the process by which one agent tries to learn how to perform a certain task using information generated by another, often more-expert agent performing that same task. Conventionally, the imitator has access to both state and action information generated by an expert performing the task (e.g., the expert may provide a kinesthetic demonstration of object placement using a robotic arm). However, requiring the action information prevents imitation learning from a large number of existing valuable learning resources such as online videos of humans performing tasks. To overcome this issue, the specific problem of imitation from observation (IfO) has recently garnered a great deal of attention, in which the imitator only has access to the state information (e.g., video frames) generated by the expert. In this paper, we provide a literature review of methods developed for IfO, and then point out some open research problems and potential future work. |
Tasks | Imitation Learning |
Published | 2019-05-30 |
URL | https://arxiv.org/abs/1905.13566v2 |
https://arxiv.org/pdf/1905.13566v2.pdf | |
PWC | https://paperswithcode.com/paper/recent-advances-in-imitation-learning-from |
Repo | |
Framework | |
Speech Replay Detection with x-Vector Attack Embeddings and Spectral Features
Title | Speech Replay Detection with x-Vector Attack Embeddings and Spectral Features |
Authors | Jennifer Williams, Joanna Rownicka |
Abstract | We present our system submission to the ASVspoof 2019 Challenge Physical Access (PA) task. The objective of this challenge was to develop a countermeasure that identifies speech audio as either bona fide or intercepted and replayed. The target prediction was a value indicating whether a speech segment was bona fide (positive values) or “spoofed” (negative values). Our system used convolutional neural networks (CNNs) and a representation of the speech audio that combined x-vector attack embeddings with signal processing features. The x-vector attack embeddings were created from mel-frequency cepstral coefficients (MFCCs) using a time-delay neural network (TDNN). These embeddings jointly modeled 27 different environments and 9 types of attacks from the labeled data. We also used sub-band spectral centroid magnitude coefficients (SCMCs) as features. We included an additive Gaussian noise layer during training as a way to augment the data and make our system more robust to previously unseen attack examples. We report system performance using the tandem detection cost function (tDCF) and equal error rate (EER). Our approach performed better than both of the challenge baselines. Our results suggest that x-vector attack embeddings can help regularize the CNN predictions even when environments or attacks are more challenging. |
Tasks | |
Published | 2019-09-23 |
URL | https://arxiv.org/abs/1909.10324v1 |
https://arxiv.org/pdf/1909.10324v1.pdf | |
PWC | https://paperswithcode.com/paper/190910324 |
Repo | |
Framework | |
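A hedged sketch of the input pipeline described above: an x-vector-style attack embedding is concatenated with CNN features extracted from spectral inputs, and additive Gaussian noise is applied to the spectral features during training as data augmentation. The dimensions, noise level, and network shape are assumptions, not the submission's actual configuration.

```python
# Toy replay-detection model: CNN over spectral features + x-vector embedding,
# with an additive Gaussian noise layer active only during training.
import torch
import torch.nn as nn

class ReplayDetector(nn.Module):
    def __init__(self, embed_dim=512, noise_std=0.05):
        super().__init__()
        self.noise_std = noise_std
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(8, 16, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(4),
            nn.Flatten(),
        )
        self.head = nn.Linear(16 * 4 * 4 + embed_dim, 1)    # single spoof score

    def forward(self, spectral, embedding):
        if self.training:
            # Additive Gaussian noise augmentation, used only during training.
            spectral = spectral + torch.randn_like(spectral) * self.noise_std
        feats = self.cnn(spectral.unsqueeze(1))              # (B, 1, bins, frames)
        return self.head(torch.cat([feats, embedding], dim=1)).squeeze(1)

model = ReplayDetector()
spectral = torch.randn(4, 40, 100)       # e.g. SCMC features per utterance
embedding = torch.randn(4, 512)          # e.g. x-vector attack embedding
scores = model(spectral, embedding)      # positive ~ bona fide, negative ~ spoofed
```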
From Semi-supervised to Almost-unsupervised Speech Recognition with Very-low Resource by Jointly Learning Phonetic Structures from Audio and Text Embeddings
Title | From Semi-supervised to Almost-unsupervised Speech Recognition with Very-low Resource by Jointly Learning Phonetic Structures from Audio and Text Embeddings |
Authors | Yi-Chen Chen, Sung-Feng Huang, Hung-yi Lee, Lin-shan Lee |
Abstract | Producing a large amount of annotated speech data for training ASR systems remains difficult for the more than 95% of the world's languages that are low-resourced. However, we note that human babies start to learn a language from the sounds (or phonetic structures) of a small number of exemplar words, and “generalize” such knowledge to other words without hearing a large amount of data. We initiate some preliminary work in this direction. Audio Word2Vec is used to learn the phonetic structures from spoken words (signal segments), while another autoencoder is used to learn the phonetic structures from text words. The relationship between the two can be learned jointly, or separately after both are well trained. This relationship can be used for speech recognition with very low resources. In initial experiments on the TIMIT dataset, only 2.1 hours of speech data (in which 2,500 spoken words were annotated and the rest unlabeled) gave a word error rate of 44.6%, and this number was reduced to 34.2% when 4.1 hours of speech data (in which 20,000 spoken words were annotated) were given. These results are not satisfactory, but they are a good starting point. |
Tasks | Speech Recognition |
Published | 2019-04-10 |
URL | http://arxiv.org/abs/1904.05078v1 |
http://arxiv.org/pdf/1904.05078v1.pdf | |
PWC | https://paperswithcode.com/paper/from-semi-supervised-to-almost-unsupervised |
Repo | |
Framework | |
Deep Learning-Based Classification Of the Defective Pistachios Via Deep Autoencoder Neural Networks
Title | Deep Learning-Based Classification Of the Defective Pistachios Via Deep Autoencoder Neural Networks |
Authors | Mehdi Abbaszadeh, Aliakbar Rahimifard, Mohammadali Eftekhari, Hossein Ghayoumi Zadeh, Ali Fayazi, Ali Dini, Mostafa Danaeian |
Abstract | Pistachio nuts are mainly consumed raw, salted, or roasted because of their high nutritional value and favorable taste. Pistachio nuts with shell and kernel defects, besides not being acceptable to consumers, are also prone to insect damage, mold decay, and aflatoxin contamination. In this research, a deep learning-based imaging algorithm was developed to improve the sorting of nuts with shell and kernel defects that indicate a risk of aflatoxin contamination, such as dark stains, oily stains, adhering hull, fungal decay, and Aspergillus molds. This paper presents an unsupervised learning method to classify defective and unpleasant pistachios based on deep autoencoder neural networks. Testing the designed neural network on a validation dataset showed that nuts with dark stains, oily stains, or adhering hulls can be distinguished from normal nuts with an accuracy of 80.3%. Given the limited memory available on the university's HPC cluster, the results are reasonable and justifiable. |
Tasks | |
Published | 2019-06-10 |
URL | https://arxiv.org/abs/1906.11878v1 |
https://arxiv.org/pdf/1906.11878v1.pdf | |
PWC | https://paperswithcode.com/paper/deep-learning-based-classification-of-the |
Repo | |
Framework | |
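The standard autoencoder-based screening pattern the abstract relies on can be sketched briefly: train a small autoencoder to reconstruct images of normal nuts, then flag samples with high reconstruction error as defective. The architecture, image size, and threshold rule below are placeholders, not the paper's network.

```python
# Hedged sketch: autoencoder trained on "normal" samples, defects flagged by
# reconstruction error above a percentile threshold.
import torch
import torch.nn as nn

class DenseAutoencoder(nn.Module):
    def __init__(self, dim=32 * 32, latent=32):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(dim, 256), nn.ReLU(), nn.Linear(256, latent))
        self.decoder = nn.Sequential(nn.Linear(latent, 256), nn.ReLU(), nn.Linear(256, dim))

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = DenseAutoencoder()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
normal_images = torch.rand(256, 32 * 32)         # stand-in for normal-nut images

for _ in range(50):                               # short training loop
    recon = model(normal_images)
    loss = nn.functional.mse_loss(recon, normal_images)
    opt.zero_grad()
    loss.backward()
    opt.step()

# Flag test samples whose reconstruction error exceeds a percentile threshold.
with torch.no_grad():
    errors = ((model(normal_images) - normal_images) ** 2).mean(dim=1)
    threshold = errors.quantile(0.95)
    test = torch.rand(10, 32 * 32)
    is_defective = ((model(test) - test) ** 2).mean(dim=1) > threshold
```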
Emotionally-Aware Chatbots: A Survey
Title | Emotionally-Aware Chatbots: A Survey |
Authors | Endang Wahyu Pamungkas |
Abstract | The development of textual conversational agents, or chatbots, has gathered tremendous traction from both academia and industry in recent years. Nowadays, chatbots are widely used as agents to communicate with humans in services such as booking assistance, customer service, and personal companionship. The biggest challenge in building a chatbot is to humanize the machine in order to improve user engagement. Some studies show that emotion is an important aspect of humanizing machines, including chatbots. In this paper, we provide a systematic review of approaches to building an emotionally-aware chatbot (EAC). To the best of our knowledge, there is still no work focusing on this area. We propose three research questions regarding EAC studies. We start with the history and evolution of EAC, then cover several approaches to building EAC proposed by previous studies, and finally some available resources for building EAC. Based on our investigation, we found that early EAC exploited simple rule-based approaches, while most current EAC use neural-based approaches. We also notice that most EAC contain an emotion classifier in their architecture, which utilizes several available affective resources. We predict that the development of EAC will continue to gain more and more attention from scholars, as evidenced by recent studies proposing new datasets for building EAC in various languages. |
Tasks | Chatbot |
Published | 2019-06-24 |
URL | https://arxiv.org/abs/1906.09774v1 |
https://arxiv.org/pdf/1906.09774v1.pdf | |
PWC | https://paperswithcode.com/paper/emotionally-aware-chatbots-a-survey |
Repo | |
Framework | |
Designing the Next Generation of Intelligent Personal Robotic Assistants for the Physically Impaired
Title | Designing the Next Generation of Intelligent Personal Robotic Assistants for the Physically Impaired |
Authors | Basit Ayantunde, Jane Odum, Fadlullah Olawumi, Joshua Olalekan |
Abstract | The physically impaired commonly have difficulty performing simple routine tasks without relying on other individuals, who are not always readily available, which makes them strive for independence. While their impaired abilities can in many cases be augmented (to certain degrees) with the use of assistive technologies, little attention has been paid to their application in embodied AI combined with assistive technologies. This paper presents the modular framework, architecture, and design of the mid-fidelity prototype of MARVIN: an artificial-intelligence-powered robotic assistant designed to help the physically impaired perform simple day-to-day tasks. The prototype features a trivial locomotion unit and utilizes various state-of-the-art neural network architectures for specific modular components of the system. These components perform specialized functions such as automatic speech recognition, object detection, natural language understanding, and speech synthesis, among others. We also discuss the constraints, challenges encountered, and potential future applications and improvements towards succeeding prototypes. |
Tasks | Object Detection, Speech Recognition, Speech Synthesis |
Published | 2019-11-28 |
URL | https://arxiv.org/abs/1911.12482v1 |
https://arxiv.org/pdf/1911.12482v1.pdf | |
PWC | https://paperswithcode.com/paper/designing-the-next-generation-of-intelligent |
Repo | |
Framework | |
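The modular architecture described above can be illustrated with a hedged sketch in which speech recognition, language understanding, and task execution sit behind small, swappable interfaces. All class and method names here are invented for illustration, not MARVIN's actual API.

```python
# Toy modular assistant pipeline with swappable components.
from dataclasses import dataclass
from typing import Protocol

@dataclass
class Intent:
    name: str
    slots: dict

class SpeechRecognizer(Protocol):
    def transcribe(self, audio: bytes) -> str: ...

class LanguageUnderstander(Protocol):
    def parse(self, text: str) -> Intent: ...

class EchoRecognizer:
    def transcribe(self, audio: bytes) -> str:
        return audio.decode("utf-8")             # toy stand-in for a real ASR model

class KeywordNLU:
    def parse(self, text: str) -> Intent:
        if "light" in text:
            return Intent("toggle_light", {"room": "unknown"})
        return Intent("unknown", {})

class Assistant:
    def __init__(self, asr: SpeechRecognizer, nlu: LanguageUnderstander):
        self.asr, self.nlu = asr, nlu

    def handle(self, audio: bytes) -> str:
        intent = self.nlu.parse(self.asr.transcribe(audio))
        return f"executing intent: {intent.name}"

assistant = Assistant(EchoRecognizer(), KeywordNLU())
print(assistant.handle(b"please turn on the light"))
```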
Concept Tree: High-Level Representation of Variables for More Interpretable Surrogate Decision Trees
Title | Concept Tree: High-Level Representation of Variables for More Interpretable Surrogate Decision Trees |
Authors | Xavier Renard, Nicolas Woloszko, Jonathan Aigrain, Marcin Detyniecki |
Abstract | Interpretable surrogates of black-box predictors trained on high-dimensional tabular datasets can struggle to generate comprehensible explanations in the presence of correlated variables. We propose a model-agnostic interpretable surrogate that provides global and local explanations of black-box classifiers to address this issue. We introduce the idea of concepts as intuitive groupings of variables that are either defined by a domain expert or automatically discovered using correlation coefficients. Concepts are embedded in a surrogate decision tree to enhance its comprehensibility. First experiments on FRED-MD, a macroeconomic database with 134 variables, show improvement in human-interpretability while accuracy and fidelity of the surrogate model are preserved. |
Tasks | |
Published | 2019-06-04 |
URL | https://arxiv.org/abs/1906.01297v1 |
https://arxiv.org/pdf/1906.01297v1.pdf | |
PWC | https://paperswithcode.com/paper/concept-tree-high-level-representation-of |
Repo | |
Framework | |
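A hedged sketch of the concept idea described above: correlated variables are grouped into "concepts" using pairwise correlation, each concept is summarized by the mean of its standardized members, and a shallow decision tree is fit on the concept features to mimic a black-box model's predictions. The grouping rule, thresholds, and choice of black box are illustrative, not the paper's.

```python
# Toy surrogate tree over correlation-based "concept" features.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
n, d = 500, 12
X = rng.normal(size=(n, d))
X[:, 1] = X[:, 0] + 0.1 * rng.normal(size=n)      # make some variables correlated
X[:, 5] = X[:, 4] + 0.1 * rng.normal(size=n)
y = (X[:, 0] + X[:, 4] > 0).astype(int)

black_box = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Greedy grouping: variables whose absolute correlation exceeds a threshold
# join the same concept.
corr = np.abs(np.corrcoef(X, rowvar=False))
concepts, assigned = [], set()
for i in range(d):
    if i in assigned:
        continue
    group = [j for j in range(d) if j not in assigned and corr[i, j] > 0.7]
    concepts.append(group)
    assigned.update(group)

Z = np.column_stack([
    ((X[:, g] - X[:, g].mean(0)) / X[:, g].std(0)).mean(axis=1) for g in concepts
])

# Surrogate tree trained on the black box's predictions over concept features.
surrogate = DecisionTreeClassifier(max_depth=3, random_state=0)
surrogate.fit(Z, black_box.predict(X))
print("fidelity to black box:", surrogate.score(Z, black_box.predict(X)))
```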
Exploiting Persona Information for Diverse Generation of Conversational Responses
Title | Exploiting Persona Information for Diverse Generation of Conversational Responses |
Authors | Haoyu Song, Wei-Nan Zhang, Yiming Cui, Dong Wang, Ting Liu |
Abstract | In human conversations, because people have their own personalities in mind, they can easily carry out and maintain a conversation. Given conversational context together with persona information, how a chatbot should exploit that information to generate diverse and sustainable conversations is still a non-trivial task. Previous work on persona-based conversational models successfully makes use of predefined persona information and has shown great promise in delivering more realistic responses. However, these models all learn under the assumption that, given a source input, there is only one target response, whereas in human conversations there are many appropriate responses to a given input message. In this paper, we propose a memory-augmented architecture that exploits persona information from context, combined with a conditional variational autoencoder, to generate diverse and sustainable conversations. We evaluate the proposed model on a benchmark persona-chat dataset. Both automatic and human evaluations show that our model can deliver more diverse and more engaging persona-based responses than baseline approaches. |
Tasks | Chatbot |
Published | 2019-05-29 |
URL | https://arxiv.org/abs/1905.12188v1 |
https://arxiv.org/pdf/1905.12188v1.pdf | |
PWC | https://paperswithcode.com/paper/exploiting-persona-information-for-diverse |
Repo | |
Framework | |
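To show the conditional-VAE core that models of this kind build on, here is a minimal, hedged sketch: a latent code is sampled with the reparameterization trick from a posterior conditioned on the input and a persona vector, and the loss is reconstruction plus a KL term. Toy vectors stand in for encoded utterances; the memory augmentation and the sequence decoder described in the abstract are omitted.

```python
# Toy conditional VAE conditioned on a persona vector (illustrative only).
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyCVAE(nn.Module):
    def __init__(self, x_dim=64, persona_dim=16, z_dim=8):
        super().__init__()
        self.enc = nn.Linear(x_dim + persona_dim, 2 * z_dim)     # outputs mu and logvar
        self.dec = nn.Linear(z_dim + persona_dim, x_dim)

    def forward(self, x, persona):
        mu, logvar = self.enc(torch.cat([x, persona], dim=1)).chunk(2, dim=1)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization
        recon = self.dec(torch.cat([z, persona], dim=1))
        return recon, mu, logvar

def cvae_loss(recon, x, mu, logvar):
    recon_loss = F.mse_loss(recon, x, reduction='mean')
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    return recon_loss + kl

model = ToyCVAE()
x = torch.randn(32, 64)             # stand-in for encoded response representations
persona = torch.randn(32, 16)       # stand-in for persona embeddings
recon, mu, logvar = model(x, persona)
loss = cvae_loss(recon, x, mu, logvar)
loss.backward()
# Sampling different z at generation time yields diverse candidate responses.
```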