Paper Group ANR 429
Speech Emotion Recognition Based on Multi-feature and Multi-lingual Fusion
Title | Speech Emotion Recognition Based on Multi-feature and Multi-lingual Fusion |
Authors | Chunyi Wang |
Abstract | A speech emotion recognition algorithm based on multi-feature and multi-lingual fusion is proposed in order to resolve low recognition accuracy caused by the lack of large speech datasets and the low robustness of acoustic features in the recognition of speech emotion. First, handcrafted and deep automatic features are extracted from existing data in Chinese and English speech emotions. Then, the various features are fused respectively. Finally, the fused features of different languages are fused again and trained in a classification model. Comparing the fused features with the unfused ones, the results show that the fused features significantly enhance the accuracy of the speech emotion recognition algorithm. The proposed solution is evaluated on two Chinese corpora and two English corpora, and is shown to provide more accurate predictions compared to the original solution. As a result of this study, the multi-feature and multi-lingual fusion algorithm can significantly improve the speech emotion recognition accuracy when the dataset is small. |
Tasks | Emotion Recognition, Speech Emotion Recognition |
Published | 2020-01-16 |
URL | https://arxiv.org/abs/2001.05908v1 |
https://arxiv.org/pdf/2001.05908v1.pdf | |
PWC | https://paperswithcode.com/paper/speech-emotion-recognition-based-on-multi |
Repo | |
Framework | |
Deep Representation Learning in Speech Processing: Challenges, Recent Advances, and Future Trends
Title | Deep Representation Learning in Speech Processing: Challenges, Recent Advances, and Future Trends |
Authors | Siddique Latif, Rajib Rana, Sara Khalifa, Raja Jurdak, Junaid Qadir, Björn W. Schuller |
Abstract | Research on speech processing has traditionally considered the task of designing hand-engineered acoustic features (feature engineering) as a separate, distinct problem from the task of designing efficient machine learning (ML) models to make prediction and classification decisions. There are two main drawbacks to this approach: firstly, the feature engineering being manual is cumbersome and requires human knowledge; and secondly, the designed features might not be best for the objective at hand. This has motivated a recent trend in the speech community towards the utilisation of representation learning techniques, which can automatically learn an intermediate representation of the input signal that better suits the task at hand and hence leads to improved performance. The significance of representation learning has increased with advances in deep learning (DL), where the representations are more useful and less dependent on human knowledge, making it very conducive for tasks like classification, prediction, etc. The main contribution of this paper is to present an up-to-date and comprehensive survey on different techniques of speech representation learning by bringing together the scattered research across three distinct research areas including Automatic Speech Recognition (ASR), Speaker Recognition (SR), and Speaker Emotion Recognition (SER). Recent reviews in speech have been conducted for ASR, SR, and SER; however, none of these has focused on representation learning from speech, a gap that our survey aims to bridge. |
Tasks | Emotion Recognition, Feature Engineering, Representation Learning, Speaker Recognition, Speech Recognition |
Published | 2020-01-02 |
URL | https://arxiv.org/abs/2001.00378v1 |
https://arxiv.org/pdf/2001.00378v1.pdf | |
PWC | https://paperswithcode.com/paper/deep-representation-learning-in-speech |
Repo | |
Framework | |
Weakly-supervised land classification for coastal zone based on deep convolutional neural networks by incorporating dual-polarimetric characteristics into training dataset
Title | Weakly-supervised land classification for coastal zone based on deep convolutional neural networks by incorporating dual-polarimetric characteristics into training dataset |
Authors | Sheng Sun, Armando Marino, Wenze Shui, Zhongwen Hu |
Abstract | In this work we explore the performance of DCNNs on semantic segmentation using spaceborne polarimetric synthetic aperture radar (PolSAR) datasets. The semantic segmentation task using PolSAR data can be categorized as weakly supervised learning when the characteristics of SAR data and data annotating procedures are factored in. Datasets are initially analyzed for selecting feasible pre-training images. Then the differences between spaceborne and airborne datasets are examined in terms of spatial resolution and viewing geometry. In this study we used two dual-polarimetric images acquired by TerraSAR-X DLR. A novel method to produce a training dataset with more supervised information is developed. Specifically, a series of typical classified images as well as intensity images serve as training datasets. A field survey is conducted for an area of about 20 square kilometers to obtain a ground truth dataset used for accuracy evaluation. Several transfer learning strategies are devised for the aforementioned training datasets, which will be combined in a practicable order. Three DCNN models, including SegNet, U-Net, and LinkNet, are implemented next. |
Tasks | Semantic Segmentation, Transfer Learning |
Published | 2020-03-30 |
URL | https://arxiv.org/abs/2003.13648v1 |
https://arxiv.org/pdf/2003.13648v1.pdf | |
PWC | https://paperswithcode.com/paper/weakly-supervised-land-classification-for |
Repo | |
Framework | |
Neural Network Tomography
Title | Neural Network Tomography |
Authors | Liang Ma, Ziyao Zhang, Mudhakar Srivatsa |
Abstract | Network tomography, a classic research problem in the realm of network monitoring, refers to the methodology of inferring unmeasured network attributes using selected end-to-end path measurements. In the research community, network tomography is generally investigated under the assumptions of known network topology, correlated path measurements, bounded number of faulty nodes/links, or even special network protocol support. The applicability of network tomography is considerably constrained by these strong assumptions, which therefore frequently position it in the theoretical world. In this regard, we revisit network tomography from the practical perspective by establishing a generic framework that does not rely on any of these assumptions or the types of performance metrics. Given only the end-to-end path performance metrics of sampled node pairs, the proposed framework, NeuTomography, utilizes a deep neural network and data augmentation to predict the unmeasured performance metrics via learning non-linear relationships between node pairs and underlying unknown topological/routing properties. In addition, NeuTomography can be employed to reconstruct the original network topology, which is critical to most network planning tasks. Extensive experiments using real network data show that, compared to baseline solutions, NeuTomography can predict network characteristics and reconstruct network topologies with significantly higher accuracy and robustness using only limited measurement data. |
Tasks | Data Augmentation |
Published | 2020-01-09 |
URL | https://arxiv.org/abs/2001.02942v1 |
https://arxiv.org/pdf/2001.02942v1.pdf | |
PWC | https://paperswithcode.com/paper/neural-network-tomography |
Repo | |
Framework | |
3D-CariGAN: An End-to-End Solution to 3D Caricature Generation from Face Photos
Title | 3D-CariGAN: An End-to-End Solution to 3D Caricature Generation from Face Photos |
Authors | Zipeng Ye, Ran Yi, Minjing Yu, Juyong Zhang, Yu-Kun Lai, Yong-jin Liu |
Abstract | Caricature is a kind of artistic style of human faces that attracts considerable research in computer vision. So far all existing 3D caricature generation methods require some information related to caricature as input, e.g., a caricature sketch or 2D caricature. However, this kind of input is difficult for non-professional users to provide. In this paper, we propose an end-to-end deep neural network model to generate high-quality 3D caricature with a simple face photo as input. The most challenging issue in our system is that the source domain of face photos (characterized by 2D normal faces) is significantly different from the target domain of 3D caricatures (characterized by 3D exaggerated face shapes and texture). To address this challenge, we (1) build a large dataset of 6,100 3D caricature meshes and use it to establish a PCA model in the 3D caricature shape space and (2) detect landmarks in the input face photo and use them to set up correspondence between 2D caricature and 3D caricature shape. Our system can automatically generate high-quality 3D caricatures. In many situations, users want to control the output in a simple and intuitive way, so we further introduce a simple-to-use interactive control with three horizontal lines and one vertical line. Experiments and user studies show that our system is easy to use and can generate high-quality 3D caricatures. |
Tasks | Caricature |
Published | 2020-03-15 |
URL | https://arxiv.org/abs/2003.06841v1 |
https://arxiv.org/pdf/2003.06841v1.pdf | |
PWC | https://paperswithcode.com/paper/3d-carigan-an-end-to-end-solution-to-3d |
Repo | |
Framework | |
Accelerating Generalized Benders Decomposition for Wireless Resource Allocation
Title | Accelerating Generalized Benders Decomposition for Wireless Resource Allocation |
Authors | Mengyuan Lee, Ning Ma, Guanding Yu, Huaiyu Dai |
Abstract | Generalized Benders decomposition (GBD) is a globally optimal algorithm for mixed integer nonlinear programming (MINLP) problems, which are NP-hard and can be widely found in the area of wireless resource allocation. The main idea of GBD is decomposing an MINLP problem into a primal problem and a master problem, which are iteratively solved until their solutions converge. However, a direct implementation of GBD is time- and memory-consuming. The main bottleneck is the high complexity of the master problem, which increases over the iterations. Therefore, we propose to leverage machine learning (ML) techniques to accelerate GBD aiming at decreasing the complexity of the master problem. Specifically, we utilize two different ML techniques, classification and regression, to deal with this acceleration task. In this way, a cut classifier and a cut regressor are learned, respectively, to distinguish between useful and useless cuts. Only useful cuts are added to the master problem and thus the complexity of the master problem is reduced. By using a resource allocation problem in device-to-device communication networks as an example, we validate that the proposed method can reduce the computational complexity of GBD without loss of optimality and has strong generalization ability. The proposed method is applicable for solving various MINLP problems in wireless networks since the designs are invariant for different problems. |
Tasks | |
Published | 2020-03-03 |
URL | https://arxiv.org/abs/2003.01294v1 |
https://arxiv.org/pdf/2003.01294v1.pdf | |
PWC | https://paperswithcode.com/paper/accelerating-generalized-benders |
Repo | |
Framework | |
Mining customer product reviews for product development: A summarization process
Title | Mining customer product reviews for product development: A summarization process |
Authors | Tianjun Hou, Bernard Yannou, Yann Leroy, Emilie Poirson |
Abstract | This research set out to identify and structure from online reviews the words and expressions related to customers’ likes and dislikes to guide product development. Previous methods were mainly focused on product features. However, reviewers express their preferences about more than product features alone. In this paper, based on an extensive literature review in design science, the authors propose a summarization model containing multiple aspects of user preference, such as product affordances, emotions, and usage conditions. Meanwhile, the linguistic patterns describing these aspects of preference are discovered and drafted as annotation guidelines. A case study demonstrates that with the proposed model and the annotation guidelines, human annotators can structure the online reviews with high inter-agreement. As high inter-agreement human annotation results are essential for automating the online review summarization process with natural language processing, this study provides materials for the future study of automation. |
Tasks | |
Published | 2020-01-13 |
URL | https://arxiv.org/abs/2001.04200v1 |
https://arxiv.org/pdf/2001.04200v1.pdf | |
PWC | https://paperswithcode.com/paper/mining-customer-product-reviews-for-product |
Repo | |
Framework | |
Offensive Language Identification in Greek
Title | Offensive Language Identification in Greek |
Authors | Zeses Pitenis, Marcos Zampieri, Tharindu Ranasinghe |
Abstract | As offensive language has become a rising issue for online communities and social media platforms, researchers have been investigating ways of coping with abusive content and developing systems to detect its different types: cyberbullying, hate speech, aggression, etc. With a few notable exceptions, most research on this topic so far has dealt with English. This is mostly due to the availability of language resources for English. To address this shortcoming, this paper presents the first Greek annotated dataset for offensive language identification: the Offensive Greek Tweet Dataset (OGTD). OGTD is a manually annotated dataset containing 4,779 posts from Twitter annotated as offensive and not offensive. Along with a detailed description of the dataset, we evaluate several computational models trained and tested on this data. |
Tasks | Language Identification |
Published | 2020-03-16 |
URL | https://arxiv.org/abs/2003.07459v2 |
https://arxiv.org/pdf/2003.07459v2.pdf | |
PWC | https://paperswithcode.com/paper/offensive-language-identification-in-greek |
Repo | |
Framework | |
clDice – a Topology-Preserving Loss Function for Tubular Structure Segmentation
Title | clDice – a Topology-Preserving Loss Function for Tubular Structure Segmentation |
Authors | Suprosanna Shit, Johannes C. Paetzold, Anjany Sekuboyina, Andrey Zhylka, Ivan Ezhov, Alexander Unger, Josien P. W. Pluim, Giles Tetteh, Bjoern H. Menze |
Abstract | Accurate segmentation of tubular, network-like structures, such as vessels, neurons, or roads, is relevant to many fields of research. For such structures, the topology is their most important characteristic, e.g. preserving connectedness: in case of vascular networks, missing a connected vessel entirely alters the blood-flow dynamics. We introduce a novel similarity measure termed clDice, which is calculated on the intersection of the segmentation masks and their (morphological) skeletons. Crucially, we theoretically prove that clDice guarantees topological correctness for binary 2D and 3D segmentation. Extending this, we propose a computationally efficient, differentiable soft-clDice as a loss function for training arbitrary neural segmentation networks. We benchmark the soft-clDice loss for segmentation on four public datasets (2D and 3D). Training on soft-clDice leads to segmentation with more accurate connectivity information, higher graph similarity, and better volumetric scores. |
Tasks | Graph Similarity |
Published | 2020-03-16 |
URL | https://arxiv.org/abs/2003.07311v3 |
https://arxiv.org/pdf/2003.07311v3.pdf | |
PWC | https://paperswithcode.com/paper/cldice-a-topology-preserving-loss-function |
Repo | |
Framework | |
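The clDice abstract above describes a measure computed on the intersection of segmentation masks and their skeletons. A minimal sketch of that idea follows, assuming the harmonic-mean form (topology precision and topology sensitivity), which the abstract itself does not spell out. The function names, the set-of-coordinates representation, and the hand-written skeletons are illustrative assumptions; a real pipeline would compute skeletons with an image-processing library.

```python
def dice(pred_mask, true_mask):
    """Plain Dice overlap between two sets of pixel coordinates."""
    return 2 * len(pred_mask & true_mask) / (len(pred_mask) + len(true_mask))


def cl_dice(pred_mask, true_mask, pred_skeleton, true_skeleton):
    """Harmonic mean of topology precision and topology sensitivity.

    All arguments are sets of pixel coordinates. The skeletons are
    assumed to be precomputed by the caller (hypothetical inputs here).
    """
    if not pred_skeleton or not true_skeleton:
        return 0.0
    # Fraction of the predicted skeleton lying inside the true mask.
    t_prec = len(pred_skeleton & true_mask) / len(pred_skeleton)
    # Fraction of the true skeleton lying inside the predicted mask.
    t_sens = len(true_skeleton & pred_mask) / len(true_skeleton)
    if t_prec + t_sens == 0:
        return 0.0
    return 2 * t_prec * t_sens / (t_prec + t_sens)


# Toy ground truth: a horizontal vessel, 3 pixels thick and 10 pixels long.
true_mask = {(r, c) for r in range(3) for c in range(10)}
true_skeleton = {(1, c) for c in range(10)}

# Prediction A: too thin, but connected along the whole vessel.
pred_a = {(1, c) for c in range(10)}
skel_a = set(pred_a)

# Prediction B: full thickness, but a 2-pixel gap breaks the vessel.
pred_b = {(r, c) for r in range(3) for c in range(10) if c not in (4, 5)}
skel_b = {(1, c) for c in range(10) if c not in (4, 5)}

score_a = cl_dice(pred_a, true_mask, skel_a, true_skeleton)
score_b = cl_dice(pred_b, true_mask, skel_b, true_skeleton)
dice_a = dice(pred_a, true_mask)
dice_b = dice(pred_b, true_mask)
```

In this toy case plain Dice ranks the broken prediction B above the thin but connected prediction A, while the skeleton-based score ranks A above B, which is the connectivity-preserving behaviour the abstract motivates.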
An Efficient Architecture for Predicting the Case of Characters using Sequence Models
Title | An Efficient Architecture for Predicting the Case of Characters using Sequence Models |
Authors | Gopi Ramena, Divija Nagaraju, Sukumar Moharana, Debi Prasanna Mohanty, Naresh Purre |
Abstract | The dearth of clean textual data often acts as a bottleneck in several natural language processing applications. The data available often lacks proper case (uppercase or lowercase) information. This often comes up when text is obtained from social media, messaging applications and other online platforms. This paper attempts to solve this problem by restoring the correct case of characters, commonly known as Truecasing. Doing so improves the accuracy of several processing tasks further down in the NLP pipeline. Our proposed architecture uses a combination of convolutional neural networks (CNN), bi-directional long short-term memory networks (LSTM) and conditional random fields (CRF), which work at a character level without any explicit feature engineering. In this study we compare our approach to previous statistical and deep learning based approaches. Our method shows an increment of 0.83 in F1 score over the current state of the art. Since truecasing acts as a preprocessing step in several applications, every increment in the F1 score leads to a significant improvement in the language processing tasks. |
Tasks | Feature Engineering |
Published | 2020-01-30 |
URL | https://arxiv.org/abs/2002.00738v1 |
https://arxiv.org/pdf/2002.00738v1.pdf | |
PWC | https://paperswithcode.com/paper/an-efficient-architecture-for-predicting-the |
Repo | |
Framework | |
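The truecasing abstract above frames case restoration as a character-level sequence task. One common way to set that up, sketched here as an assumption rather than as the authors' exact labelling scheme, is to lowercase the text and keep one binary label per character. The function names and the "U"/"L" label set are hypothetical, and the sketch assumes ASCII-like text where lowercasing preserves length.

```python
def to_training_pair(text):
    """Split cased text into lowercase characters and per-character labels.

    Label "U" marks a character that was uppercase in the original text;
    "L" marks everything else (lowercase letters, digits, punctuation).
    """
    chars = [ch.lower() for ch in text]
    labels = ["U" if ch.isupper() else "L" for ch in text]
    return chars, labels


def apply_labels(chars, labels):
    """Rebuild cased text from lowercase characters and predicted labels."""
    return "".join(
        ch.upper() if label == "U" else ch
        for ch, label in zip(chars, labels)
    )


chars, labels = to_training_pair("Hello NLP")
restored = apply_labels(chars, labels)
```

A sequence model such as the CNN, bi-directional LSTM and CRF stack named in the abstract would then be trained to predict `labels` from `chars`, and `apply_labels` would turn its predictions back into cased text.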
Graph Similarity Using PageRank and Persistent Homology
Title | Graph Similarity Using PageRank and Persistent Homology |
Authors | Mustafa Hajij, Elizabeth Munch, Paul Rosen |
Abstract | The PageRank of a graph is a scalar function defined on the node set of the graph which encodes node centrality information of the graph. In this work, we utilize the PageRank function on the lower-star filtration of the graph as input to persistent homology to study the problem of graph similarity. By representing each graph as a persistence diagram, we can then compare outputs using the bottleneck distance. We show the effectiveness of our method by utilizing it on two shape mesh datasets. |
Tasks | Graph Similarity |
Published | 2020-02-12 |
URL | https://arxiv.org/abs/2002.05158v1 |
https://arxiv.org/pdf/2002.05158v1.pdf | |
PWC | https://paperswithcode.com/paper/graph-similarity-using-pagerank-and |
Repo | |
Framework | |
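The pipeline in the graph-similarity abstract above (PageRank as a node function, then persistent homology of its lower-star filtration) can be illustrated in dimension 0 with a short sketch. This is a simplified illustration under stated assumptions, not the authors' implementation: it handles only connected-component (H0) persistence, assumes an undirected graph with no isolated nodes, and omits the bottleneck distance. The function names and the toy path graph are hypothetical.

```python
def pagerank(adj, damping=0.85, iters=100):
    """Power iteration for PageRank on an undirected graph.

    adj maps each node to a list of neighbours; every node is assumed
    to have at least one neighbour, so there are no dangling nodes.
    """
    n = len(adj)
    rank = {v: 1.0 / n for v in adj}
    for _ in range(iters):
        new = {v: (1.0 - damping) / n for v in adj}
        for v, nbrs in adj.items():
            share = damping * rank[v] / len(nbrs)
            for u in nbrs:
                new[u] += share
        rank = new
    return rank


def lower_star_h0(adj, f):
    """0-dimensional persistence pairs of the lower-star filtration of f.

    Nodes enter in increasing order of f, and an edge enters with its
    later endpoint. When two components merge, the one born later dies
    (elder rule). Surviving components get death = inf.
    """
    parent, birth, pairs = {}, {}, []

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for v in sorted(adj, key=lambda x: f[x]):
        parent[v] = v
        birth[v] = f[v]
        for u in adj[v]:
            if u not in parent:
                continue  # neighbour has not entered the filtration yet
            ru, rv = find(u), find(v)
            if ru == rv:
                continue
            old, young = (ru, rv) if birth[ru] <= birth[rv] else (rv, ru)
            if birth[young] < f[v]:  # skip zero-persistence pairs
                pairs.append((birth[young], f[v]))
            parent[young] = old
    roots = {find(v) for v in adj}
    pairs.extend((birth[r], float("inf")) for r in roots)
    return sorted(pairs)


# Toy graph: a path a - b - c - d - e.
path = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c", "e"], "e": ["d"]}
ranks = pagerank(path)
pairs = lower_star_h0(path, ranks)
```

The resulting list of (birth, death) pairs is the 0-dimensional persistence diagram for this graph. Comparing two graphs as the abstract describes would additionally require a bottleneck-distance routine, typically taken from a topological data analysis library.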
Say As You Wish: Fine-grained Control of Image Caption Generation with Abstract Scene Graphs
Title | Say As You Wish: Fine-grained Control of Image Caption Generation with Abstract Scene Graphs |
Authors | Shizhe Chen, Qin Jin, Peng Wang, Qi Wu |
Abstract | Humans are able to describe image contents with coarse to fine details as they wish. However, most image captioning models are intention-agnostic and cannot actively generate diverse descriptions according to different user intentions. In this work, we propose the Abstract Scene Graph (ASG) structure to represent user intention at a fine-grained level and control what the generated description should cover and how detailed it should be. The ASG is a directed graph consisting of three types of abstract nodes (object, attribute, relationship) grounded in the image without any concrete semantic labels. Thus it is easy to obtain either manually or automatically. From the ASG, we propose a novel ASG2Caption model, which is able to recognise user intentions and semantics in the graph, and therefore generate desired captions according to the graph structure. Our model achieves better controllability conditioning on ASGs than carefully designed baselines on both VisualGenome and MSCOCO datasets. It also significantly improves the caption diversity via automatically sampling diverse ASGs as control signals. |
Tasks | Image Captioning |
Published | 2020-03-01 |
URL | https://arxiv.org/abs/2003.00387v1 |
https://arxiv.org/pdf/2003.00387v1.pdf | |
PWC | https://paperswithcode.com/paper/say-as-you-wish-fine-grained-control-of-image |
Repo | |
Framework | |
Multi-lane Detection Using Instance Segmentation and Attentive Voting
Title | Multi-lane Detection Using Instance Segmentation and Attentive Voting |
Authors | Donghoon Chang, Vinjohn Chirakkal, Shubham Goswami, Munawar Hasan, Taekwon Jung, Jinkeon Kang, Seok-Cheol Kee, Dongkyu Lee, Ajit Pratap Singh |
Abstract | Autonomous driving is becoming one of the leading industrial research areas. Therefore many automobile companies are coming up with semi to fully autonomous driving solutions. Among these solutions, lane detection is one of the vital driver-assist features that play a crucial role in the decision-making process of the autonomous vehicle. A variety of solutions have been proposed to detect lanes on the road, ranging from hand-crafted features to state-of-the-art end-to-end trainable deep learning architectures. Most of these architectures are trained in a traffic-constrained environment. In this paper, we propose a novel solution to multi-lane detection, which outperforms state-of-the-art methods in terms of both accuracy and speed. To achieve this, we also offer a dataset with a more intuitive labeling scheme as compared to other benchmark datasets. Using our approach, we are able to obtain a lane segmentation accuracy of 99.87% running at 54.53 fps (average). |
Tasks | Autonomous Driving, Decision Making, Instance Segmentation, Lane Detection, Semantic Segmentation |
Published | 2020-01-01 |
URL | https://arxiv.org/abs/2001.00236v1 |
https://arxiv.org/pdf/2001.00236v1.pdf | |
PWC | https://paperswithcode.com/paper/multi-lane-detection-using-instance |
Repo | |
Framework | |
Towards Interpretable Deep Neural Networks: An Exact Transformation to Multi-Class Multivariate Decision Trees
Title | Towards Interpretable Deep Neural Networks: An Exact Transformation to Multi-Class Multivariate Decision Trees |
Authors | Tung D. Nguyen, Kathryn E. Kasmarik, Hussein A. Abbass |
Abstract | Deep neural networks (DNNs) are commonly labelled as black boxes lacking interpretability, thus hindering human understanding of DNNs’ behaviors. A need exists to generate a meaningful sequential logic for the production of a specific output. Decision trees exhibit better interpretability and expressive power due to their representation language and the existence of efficient algorithms to generate rules. Growing a decision tree based on the available data could produce larger than necessary trees or trees that do not generalise well. In this paper, we introduce two novel multivariate decision tree (MDT) algorithms for rule extraction from a DNN: an Exact-Convertible Decision Tree (EC-DT) and a Deep C-Net algorithm to transform a neural network with Rectified Linear Unit activation functions into a representative tree which can be used to extract multivariate rules for reasoning. While the EC-DT translates the DNN in a layer-wise manner to represent exactly the decision boundaries implicitly learned by the hidden layers of the network, the Deep C-Net inherits the decompositional approach from EC-DT and combines it with a C5 tree learning algorithm to construct the decision rules. The results suggest that while EC-DT is superior in preserving the structure and the accuracy of the DNN, C-Net generates the most compact and highly effective trees from the DNN. Both proposed MDT algorithms generate rules including combinations of multiple attributes for precise interpretation of decision-making processes. |
Tasks | Decision Making |
Published | 2020-03-10 |
URL | https://arxiv.org/abs/2003.04675v2 |
https://arxiv.org/pdf/2003.04675v2.pdf | |
PWC | https://paperswithcode.com/paper/an-exact-transformation-from-deep-neural |
Repo | |
Framework | |
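The exact-conversion claim in the abstract above rests on a general property of ReLU networks: once the on/off pattern of the hidden units is fixed, the network computes a single affine function of its input. The sketch below illustrates that property on a hypothetical one-hidden-layer network with made-up weights. It is not the authors' EC-DT algorithm; one way to read the connection is that each activation pattern plays the role of a tree path whose tests are the multivariate conditions `w . x + b > 0`.

```python
def preactivations(x, W1, b1):
    """Hidden-layer pre-activations w . x + b for each hidden unit."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(W1, b1)]


def forward(x, W1, b1, W2, b2):
    """Forward pass of a one-hidden-layer ReLU network (plain lists)."""
    hidden = [max(0.0, z) for z in preactivations(x, W1, b1)]
    return [sum(w * h for w, h in zip(row, hidden)) + b for row, b in zip(W2, b2)]


def activation_pattern(x, W1, b1):
    """1 where a hidden unit is active (pre-activation > 0), else 0."""
    return [1 if z > 0 else 0 for z in preactivations(x, W1, b1)]


def affine_for_pattern(pattern, W1, b1, W2, b2):
    """Affine map (A, c) the network computes wherever this pattern holds."""
    units = range(len(pattern))
    n_in = len(W1[0])
    A = [[sum(row[j] * pattern[j] * W1[j][i] for j in units) for i in range(n_in)]
         for row in W2]
    c = [sum(row[j] * pattern[j] * b1[j] for j in units) + b
         for row, b in zip(W2, b2)]
    return A, c


def apply_affine(A, c, x):
    """Evaluate A x + c."""
    return [sum(a * xi for a, xi in zip(row, x)) + ci for row, ci in zip(A, c)]


# Hypothetical weights: 2 inputs, 2 hidden ReLU units, 1 output.
W1, b1 = [[1.0, -1.0], [0.5, 1.0]], [0.0, -1.0]
W2, b2 = [[2.0, -1.0]], [0.5]

x = [1.0, 2.0]
pattern = activation_pattern(x, W1, b1)          # which units fire for x
A, c = affine_for_pattern(pattern, W1, b1, W2, b2)
same = apply_affine(A, c, x) == forward(x, W1, b1, W2, b2)
```

Enumerating the reachable patterns and attaching the corresponding affine map to each would give a rule set that reproduces the network exactly, which is the sense of "exact" this sketch is meant to convey.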
Context-aware Non-linear and Neural Attentive Knowledge-based Models for Grade Prediction
Title | Context-aware Non-linear and Neural Attentive Knowledge-based Models for Grade Prediction |
Authors | Sara Morsy, George Karypis |
Abstract | Grade prediction for future courses not yet taken by students is important as it can help them and their advisers during the process of course selection as well as for designing personalized degree plans and modifying them based on their performance. One of the successful approaches for accurately predicting a student’s grades in future courses is Cumulative Knowledge-based Regression Models (CKRM). CKRM learns shallow linear models that predict a student’s grades as the similarity between his/her knowledge state and the target course. However, prior courses taken by a student can have different contributions when estimating a student’s knowledge state and towards each target course, which cannot be captured by linear models. Moreover, CKRM and other grade prediction methods ignore the effect of concurrently-taken courses on a student’s performance in a target course. In this paper, we propose context-aware non-linear and neural attentive models that can potentially better estimate a student’s knowledge state from his/her prior course information, as well as model the interactions between a target course and concurrent courses. Compared to the competing methods, our experiments on a large real-world dataset consisting of more than 1.5M grades show the effectiveness of the proposed models in accurately predicting students’ grades. Moreover, the attention weights learned by the neural attentive model can be helpful in better designing their degree plans. |
Tasks | |
Published | 2020-03-09 |
URL | https://arxiv.org/abs/2003.05063v1 |
https://arxiv.org/pdf/2003.05063v1.pdf | |
PWC | https://paperswithcode.com/paper/context-aware-non-linear-and-neural-attentive |
Repo | |
Framework | |