Paper Group ANR 951
A Framework for Automated Pop-song Melody Generation with Piano Accompaniment Arrangement. Quality-Net: An End-to-End Non-intrusive Speech Quality Assessment Model based on BLSTM. Understanding Deep Convolutional Networks through Gestalt Theory. Binary Image Features Proposed to Empower Computer Vision. Improving Classification Rate of Schizophrenia Using a Multimodal Multi-Layer Perceptron Model with Structural and Functional MR. Intriguing Properties of Randomly Weighted Networks: Generalizing While Learning Next to Nothing. Exploiting Partially Annotated Data for Temporal Relation Extraction. Twitter-based traffic information system based on vector representations for words. Efficient Graph-based Word Sense Induction by Distributional Inclusion Vector Embeddings. Facial Landmark Machines: A Backbone-Branches Architecture with Progressive Representation Learning. A Descriptive Study of Variable Discretization and Cost-Sensitive Logistic Regression on Imbalanced Credit Data. Enhanced Optimization with Composite Objectives and Novelty Selection. Precipitation Nowcasting: Leveraging bidirectional LSTM and 1D CNN. Multi-task Learning for Financial Forecasting. Sequential Experiment Design for Hypothesis Verification.
A Framework for Automated Pop-song Melody Generation with Piano Accompaniment Arrangement
Title | A Framework for Automated Pop-song Melody Generation with Piano Accompaniment Arrangement |
Authors | Ziyu Wang, Gus Xia |
Abstract | We contribute a pop-song automation framework for lead melody generation and accompaniment arrangement. The framework reflects the major procedures of human music composition, generating both lead melody and piano accompaniment by a unified strategy. Specifically, we take a chord progression as input and propose three models to generate a structured melody with piano accompaniment textures. First, the harmony alternation model transforms a raw input chord progression into an altered one that better fits the specified music style. Second, the melody generation model generates the lead melody and other voices (melody lines) of the accompaniment using seasonal ARMA (Autoregressive Moving Average) processes. Third, the melody integration model integrates the melody lines (voices) into the final piano accompaniment. We evaluate the proposed framework using subjective listening tests. Experimental results show that the generated melodies are rated significantly higher than those generated by a bi-directional LSTM, and our accompaniment arrangement result is comparable with that of the state-of-the-art commercial software Band in a Box. |
Tasks | |
Published | 2018-12-28 |
URL | http://arxiv.org/abs/1812.10906v1 |
http://arxiv.org/pdf/1812.10906v1.pdf | |
PWC | https://paperswithcode.com/paper/a-framework-for-automated-pop-song-melody |
Repo | |
Framework | |
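No repository is linked above, so here is a minimal sketch, not the authors' model, of the seasonal-ARMA idea the abstract describes: a seasonal autoregressive process (the MA terms are omitted for brevity) is simulated in plain NumPy and quantized onto a pitch set. The coefficients, season length, and C-major MIDI mapping are all illustrative assumptions.

```python
import numpy as np

def seasonal_ar_melody(n_notes=32, season=8, phi=0.5, phi_s=0.4, sigma=1.0, seed=0):
    """Simulate x_t = phi*x_{t-1} + phi_s*x_{t-season} + eps_t and map it to
    MIDI pitches -- a toy stand-in for the paper's seasonal ARMA melody model."""
    rng = np.random.default_rng(seed)
    x = np.zeros(n_notes + season)
    for t in range(1, len(x)):
        lagged = x[t - season] if t >= season else 0.0
        x[t] = phi * x[t - 1] + phi_s * lagged + rng.normal(0.0, sigma)
    x = x[season:]                                       # drop the warm-up samples
    scale = np.array([60, 62, 64, 65, 67, 69, 71, 72])   # C-major around middle C
    idx = np.round((x - x.mean()) / (x.std() + 1e-9) * 2 + 4)
    return scale[np.clip(idx, 0, len(scale) - 1).astype(int)]

if __name__ == "__main__":
    print(seasonal_ar_melody())   # one season-aligned melody line as MIDI pitches
```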
Quality-Net: An End-to-End Non-intrusive Speech Quality Assessment Model based on BLSTM
Title | Quality-Net: An End-to-End Non-intrusive Speech Quality Assessment Model based on BLSTM |
Authors | Szu-Wei Fu, Yu Tsao, Hsin-Te Hwang, Hsin-Min Wang |
Abstract | Nowadays, most of the objective speech quality assessment tools (e.g., perceptual evaluation of speech quality (PESQ)) are based on the comparison of the degraded/processed speech with its clean counterpart. The need for a “golden” reference considerably restricts the practicality of such assessment tools in real-world scenarios since the clean reference usually cannot be accessed. On the other hand, human beings can readily evaluate the speech quality without any reference (e.g., mean opinion score (MOS) tests), implying the existence of an objective and non-intrusive (no clean reference needed) quality assessment mechanism. In this study, we propose a novel end-to-end, non-intrusive speech quality evaluation model, termed Quality-Net, based on bidirectional long short-term memory. The evaluation of utterance-level quality in Quality-Net is based on the frame-level assessment. Frame constraints and sensible initializations of forget gate biases are applied to learn meaningful frame-level quality assessment from the utterance-level quality label. Experimental results show that Quality-Net can yield a high correlation with PESQ (0.9 for noisy speech and 0.84 for speech processed by speech enhancement). We believe that Quality-Net has the potential to be used in a wide variety of speech signal processing applications. |
Tasks | Speech Enhancement |
Published | 2018-08-16 |
URL | http://arxiv.org/abs/1808.05344v2 |
http://arxiv.org/pdf/1808.05344v2.pdf | |
PWC | https://paperswithcode.com/paper/quality-net-an-end-to-end-non-intrusive |
Repo | |
Framework | |
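The abstract gives the essential shape of Quality-Net: a BLSTM scores every frame, and the utterance score is the average of the frame scores. A minimal PyTorch sketch under those assumptions follows; the layer sizes, input features, and exact form of the frame-constraint loss are illustrative guesses, not the paper's specification.

```python
import torch
import torch.nn as nn

class QualityNetSketch(nn.Module):
    """BLSTM frame-level quality scorer; utterance score = mean of frame scores."""

    def __init__(self, n_feats=80, hidden=100):
        super().__init__()
        self.blstm = nn.LSTM(n_feats, hidden, batch_first=True, bidirectional=True)
        self.frame_head = nn.Sequential(
            nn.Linear(2 * hidden, 50), nn.ReLU(), nn.Linear(50, 1))

    def forward(self, feats):                      # feats: (batch, frames, n_feats)
        h, _ = self.blstm(feats)
        frame_q = self.frame_head(h).squeeze(-1)   # (batch, frames)
        return frame_q.mean(dim=1), frame_q        # utterance score, frame scores

def quality_loss(utt_pred, frame_pred, utt_label, alpha=1.0):
    """Utterance-level MSE plus a frame constraint pulling every frame score
    toward the utterance label -- one plausible reading of the abstract."""
    utt_term = (utt_pred - utt_label) ** 2
    frame_term = ((frame_pred - utt_label.unsqueeze(1)) ** 2).mean(dim=1)
    return (utt_term + alpha * frame_term).mean()
```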
Understanding Deep Convolutional Networks through Gestalt Theory
Title | Understanding Deep Convolutional Networks through Gestalt Theory |
Authors | Angelos Amanatiadis, Vasileios Kaburlasos, Elias Kosmatopoulos |
Abstract | The superior performance of deep convolutional networks on high-dimensional problems has made them very popular for several applications. Despite their wide adoption, their underlying mechanisms remain unclear, and their improvement still relies mainly on a trial-and-error process. We introduce a novel sensitivity analysis based on Gestalt theory to give insights into the classifier function and intermediate layers. Since Gestalt psychology stipulates that perception can be a product of complex interactions among several elements, we perform an ablation study based on this concept to discover which principles and image context contribute significantly to the network's classification. Our results reveal that ConvNets follow most of the visual cortical perceptual mechanisms defined by the Gestalt principles at several levels. The proposed framework stimulates specific feature maps in classification problems and reveals important network attributes that can produce more explainable network models. |
Tasks | |
Published | 2018-10-19 |
URL | http://arxiv.org/abs/1810.08697v1 |
http://arxiv.org/pdf/1810.08697v1.pdf | |
PWC | https://paperswithcode.com/paper/understanding-deep-convolutional-networks-1 |
Repo | |
Framework | |
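The paper's protocol is Gestalt-specific, but its mechanical core, ablating parts of the input and measuring how the classification responds, can be sketched generically. In the sketch below, `predict` is any callable mapping an image to class probabilities; the patch size, stride, and fill value are arbitrary choices.

```python
import numpy as np

def occlusion_sensitivity(predict, image, target_class, patch=16, stride=16, fill=0.0):
    """Ablation-style sensitivity map: occlude each patch and record the drop
    in the target-class score. A generic stand-in for the paper's
    Gestalt-principle ablation study."""
    h, w = image.shape[:2]
    base = predict(image)[target_class]
    rows = (h - patch) // stride + 1
    cols = (w - patch) // stride + 1
    heat = np.zeros((rows, cols))
    for i in range(rows):
        for j in range(cols):
            occluded = image.copy()
            y, x = i * stride, j * stride
            occluded[y:y + patch, x:x + patch] = fill
            heat[i, j] = base - predict(occluded)[target_class]
    return heat   # large values mark regions the classifier depends on
```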
Binary Image Features Proposed to Empower Computer Vision
Title | Binary Image Features Proposed to Empower Computer Vision |
Authors | Soumi Ray, Vinod Kumar |
Abstract | This paper proposes three fast, easily computable image features to improve computer vision by offering more human-like visual power. The features are based neither on the absolute or relative intensity of image pixels nor on shape or colour, so no complex pixel-by-pixel calculation is required. For human eyes, pixel-by-pixel examination is like viewing an image at maximum zoom, which is done only when a higher level of detail is required. Normally, we first look at an image to get an overall idea of it and decide whether it deserves further investigation. This capacity for getting an idea at a glance is analysed, and three basic features are proposed to empower computer vision. The potential of the proposed features is tested and established on several medical datasets. The classification accuracy achieved demonstrates the possibilities and potential of the proposed features in image processing. |
Tasks | |
Published | 2018-08-14 |
URL | http://arxiv.org/abs/1808.08275v1 |
http://arxiv.org/pdf/1808.08275v1.pdf | |
PWC | https://paperswithcode.com/paper/binary-image-features-proposed-to-empower |
Repo | |
Framework | |
Improving Classification Rate of Schizophrenia Using a Multimodal Multi-Layer Perceptron Model with Structural and Functional MR
Title | Improving Classification Rate of Schizophrenia Using a Multimodal Multi-Layer Perceptron Model with Structural and Functional MR |
Authors | Alvaro Ulloa, Sergey Plis, Vince Calhoun |
Abstract | The wide variety of brain imaging technologies allows us to exploit information inherent to different data modalities. The richness of multimodal datasets may increase predictive power and reveal latent variables that otherwise would not have been found. However, the analysis of multimodal data is often conducted by assuming linear interactions, which impacts the accuracy of the results. We propose the use of a multimodal multi-layer perceptron model to enhance the predictive power of structural and functional magnetic resonance imaging (sMRI and fMRI) combined. We also use a synthetic data generator to pre-train each modality's input layers, alleviating the effects of the small sample sizes typical of brain imaging modalities. The proposed model improved the average area under the ROC curve and its uncertainty to 0.850 ± 0.051, compared to the best results on the individual modalities (0.741 ± 0.075 for sMRI and 0.833 ± 0.050 for fMRI). |
Tasks | |
Published | 2018-04-04 |
URL | http://arxiv.org/abs/1804.04591v1 |
http://arxiv.org/pdf/1804.04591v1.pdf | |
PWC | https://paperswithcode.com/paper/improving-classification-rate-of |
Repo | |
Framework | |
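A minimal PyTorch sketch of the two-branch idea in the abstract: one input branch per modality, merged into a shared head. Layer sizes are illustrative, and the synthetic-data pre-training of the input layers that the paper relies on is omitted here.

```python
import torch
import torch.nn as nn

class MultimodalMLP(nn.Module):
    """sMRI and fMRI branches concatenated into a shared classifier head."""

    def __init__(self, smri_dim, fmri_dim, hidden=64):
        super().__init__()
        self.smri_branch = nn.Sequential(nn.Linear(smri_dim, hidden), nn.ReLU())
        self.fmri_branch = nn.Sequential(nn.Linear(fmri_dim, hidden), nn.ReLU())
        self.head = nn.Sequential(
            nn.Linear(2 * hidden, hidden), nn.ReLU(), nn.Linear(hidden, 1))

    def forward(self, smri, fmri):
        z = torch.cat([self.smri_branch(smri), self.fmri_branch(fmri)], dim=1)
        return torch.sigmoid(self.head(z)).squeeze(-1)   # class probability
```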
Intriguing Properties of Randomly Weighted Networks: Generalizing While Learning Next to Nothing
Title | Intriguing Properties of Randomly Weighted Networks: Generalizing While Learning Next to Nothing |
Authors | Amir Rosenfeld, John K. Tsotsos |
Abstract | Training deep neural networks results in strong learned representations that show good generalization capabilities. In most cases, training involves iterative modification of all weights inside the network via back-propagation. In Extreme Learning Machines, it has been suggested to set the first layer of a network to fixed random values instead of learning it. In this paper, we propose to take this approach a step further and fix almost all layers of a deep convolutional neural network, allowing only a small portion of the weights to be learned. As our experiments show, fixing even the majority of the parameters of the network often results in performance which is on par with the performance of learning all of them. The implications of this intriguing property of deep neural networks are discussed and we suggest ways to harness it to create more robust representations. |
Tasks | |
Published | 2018-02-02 |
URL | http://arxiv.org/abs/1802.00844v1 |
http://arxiv.org/pdf/1802.00844v1.pdf | |
PWC | https://paperswithcode.com/paper/intriguing-properties-of-randomly-weighted |
Repo | |
Framework | |
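The regime the abstract describes, fixing almost all weights at their random initialization and learning only a small subset, is easy to reproduce in PyTorch. The sketch below freezes everything except the final classifier of a randomly initialized ResNet-18; the paper explores several other choices of trainable subset.

```python
import torch.nn as nn
from torchvision.models import resnet18

def freeze_all_but_last(model: nn.Module) -> nn.Module:
    """Keep almost all weights fixed at their random values; train only `fc`."""
    for p in model.parameters():
        p.requires_grad = False
    for p in model.fc.parameters():          # final layer of a torchvision ResNet
        p.requires_grad = True
    return model

model = freeze_all_but_last(resnet18(weights=None))   # random init, no pre-training
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"training {trainable / total:.2%} of the parameters")
```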
Exploiting Partially Annotated Data for Temporal Relation Extraction
Title | Exploiting Partially Annotated Data for Temporal Relation Extraction |
Authors | Qiang Ning, Zhongzhi Yu, Chuchu Fan, Dan Roth |
Abstract | Annotating temporal relations (TempRel) between events described in natural language is known to be labor intensive, partly because the total number of TempRels is quadratic in the number of events. As a result, only a small number of documents are typically annotated, limiting the coverage of various lexical/semantic phenomena. In order to improve existing approaches, one possibility is to make use of the readily available, partially annotated data (P as in partial) that cover more documents. However, missing annotations in P are known to hurt, rather than help, existing systems. This work is a case study in exploring various usages of P for TempRel extraction. Results show that despite missing annotations, P is still a useful supervision signal for this task within a constrained bootstrapping learning framework. The system described in this paper is publicly available. |
Tasks | Relation Extraction |
Published | 2018-04-18 |
URL | http://arxiv.org/abs/1804.08420v2 |
http://arxiv.org/pdf/1804.08420v2.pdf | |
PWC | https://paperswithcode.com/paper/exploiting-partially-annotated-data-for |
Repo | |
Framework | |
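The paper's constrained bootstrapping is task-specific, but the general pattern, treating existing partial annotations as hard constraints while self-training on confident predictions, can be sketched with scikit-learn. The feature matrices, the `-1` convention for unannotated pairs, and the confidence threshold below are all assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def bootstrap_with_partial(X_gold, y_gold, X_part, y_part, rounds=3, tau=0.9):
    """Self-training over partially annotated data (y_part == -1 means
    unannotated). Existing partial labels are never overridden; only
    confident predictions on unannotated pairs are added as supervision."""
    clf = LogisticRegression(max_iter=1000)
    annotated = y_part != -1
    X_train, y_train = X_gold, y_gold
    for _ in range(rounds):
        clf.fit(X_train, y_train)
        proba = clf.predict_proba(X_part)
        conf = proba.max(axis=1)
        pred = clf.classes_[proba.argmax(axis=1)]
        keep = ~annotated & (conf >= tau)            # constraint: keep gold labels
        X_train = np.vstack([X_gold, X_part[annotated], X_part[keep]])
        y_train = np.concatenate([y_gold, y_part[annotated], pred[keep]])
    return clf
```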
Twitter-based traffic information system based on vector representations for words
Title | Twitter-based traffic information system based on vector representations for words |
Authors | Sina Dabiri, Kevin Heaslip |
Abstract | Recently, researchers have shown an increased interest in harnessing Twitter data for dynamic monitoring of traffic conditions. The bag-of-words representation is a common method in the literature for tweet modeling and retrieving traffic information, yet it suffers from the curse of dimensionality and sparsity. To address these issues, our specific objective is to propose a simple and robust framework on top of word embeddings for distinguishing traffic-related tweets from non-traffic-related ones. In our proposed model, a tweet is classified as traffic-related if the semantic similarity between its words and a small set of traffic keywords exceeds a threshold value. Semantic similarity between words is captured by means of word-embedding models, an unsupervised learning tool. The proposed model is as simple as having only one trainable parameter, and its merits are demonstrated through several evaluation steps. It achieves a state-of-the-art test accuracy of 95.9%. |
Tasks | Semantic Similarity, Semantic Textual Similarity |
Published | 2018-12-04 |
URL | http://arxiv.org/abs/1812.01199v1 |
http://arxiv.org/pdf/1812.01199v1.pdf | |
PWC | https://paperswithcode.com/paper/twitter-based-traffic-information-system |
Repo | |
Framework | |
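The abstract specifies the model almost completely: a tweet is traffic-related if the similarity between its words and a small traffic keyword set exceeds a threshold, the single trainable parameter. A plain-NumPy sketch follows; `embed` stands for any pre-trained word-to-vector mapping (e.g., word2vec), and the keyword list and threshold value are illustrative.

```python
import numpy as np

TRAFFIC_KEYWORDS = ["traffic", "accident", "congestion", "road", "crash"]

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-9))

def is_traffic_tweet(tokens, embed, threshold=0.6):
    """True if any tweet word is similar enough to a traffic keyword;
    `threshold` is the model's only trainable parameter."""
    keys = [embed[k] for k in TRAFFIC_KEYWORDS if k in embed]
    return any(
        cosine(embed[tok], key) >= threshold
        for tok in tokens if tok in embed
        for key in keys)
```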
Efficient Graph-based Word Sense Induction by Distributional Inclusion Vector Embeddings
Title | Efficient Graph-based Word Sense Induction by Distributional Inclusion Vector Embeddings |
Authors | Haw-Shiuan Chang, Amol Agrawal, Ananya Ganesh, Anirudha Desai, Vinayak Mathur, Alfred Hough, Andrew McCallum |
Abstract | Word sense induction (WSI), which addresses polysemy by unsupervised discovery of multiple word senses, resolves ambiguities for downstream NLP tasks and also makes word representations more interpretable. This paper proposes an accurate and efficient graph-based method for WSI that builds a global non-negative vector embedding basis (whose components are interpretable like topics) and clusters the basis indexes in the ego network of each polysemous word. By adopting distributional inclusion vector embeddings as our basis formation model, we avoid the expensive step of nearest neighbor search that plagues other graph-based methods without sacrificing the quality of sense clusters. Experiments on three datasets show that our proposed method produces similar or better sense clusters and embeddings compared with previous state-of-the-art methods while being significantly more efficient. |
Tasks | Word Sense Induction |
Published | 2018-04-09 |
URL | http://arxiv.org/abs/1804.03257v2 |
http://arxiv.org/pdf/1804.03257v2.pdf | |
PWC | https://paperswithcode.com/paper/efficient-graph-based-word-sense-induction-by |
Repo | |
Framework | |
Facial Landmark Machines: A Backbone-Branches Architecture with Progressive Representation Learning
Title | Facial Landmark Machines: A Backbone-Branches Architecture with Progressive Representation Learning |
Authors | Lingbo Liu, Guanbin Li, Yuan Xie, Yizhou Yu, Qing Wang, Liang Lin |
Abstract | Facial landmark localization plays a critical role in face recognition and analysis. In this paper, we propose a novel cascaded backbone-branches fully convolutional neural network (BB-FCN) for rapidly and accurately localizing facial landmarks in unconstrained and cluttered settings. Our proposed BB-FCN generates facial landmark response maps directly from raw images without any preprocessing. BB-FCN follows a coarse-to-fine cascaded pipeline, which consists of a backbone network for roughly detecting the locations of all facial landmarks and one branch network per type of detected landmark to further refine its location. Furthermore, to facilitate facial landmark localization under unconstrained settings, we propose a large-scale benchmark named SYSU16K, which contains 16,000 faces with large variations in pose, expression, illumination and resolution. Extensive experimental evaluations demonstrate that our proposed BB-FCN can significantly outperform the state-of-the-art under both constrained (i.e., within detected facial regions only) and unconstrained settings. We further confirm that high-quality facial landmarks localized with our proposed network can also improve the precision and recall of face detection. |
Tasks | Face Alignment, Face Detection, Face Recognition, Representation Learning |
Published | 2018-12-10 |
URL | http://arxiv.org/abs/1812.03887v1 |
http://arxiv.org/pdf/1812.03887v1.pdf | |
PWC | https://paperswithcode.com/paper/facial-landmark-machines-a-backbone-branches |
Repo | |
Framework | |
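The coarse-to-fine pipeline in the abstract, a backbone that proposes response maps for all landmarks plus one branch per landmark type that refines its map, can be sketched in a few lines of PyTorch. Channel counts and depths are illustrative, not the paper's architecture.

```python
import torch
import torch.nn as nn

class BackboneBranchesFCN(nn.Module):
    """Backbone emits coarse landmark response maps; one branch per landmark
    type refines its own map (coarse-to-fine, as in the abstract)."""

    def __init__(self, n_landmarks=5):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, n_landmarks, 1))           # one coarse map per landmark
        self.branches = nn.ModuleList(
            nn.Sequential(nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
                          nn.Conv2d(16, 1, 1))
            for _ in range(n_landmarks))

    def forward(self, img):                           # img: (B, 3, H, W)
        coarse = self.backbone(img)                   # (B, K, H/2, W/2)
        fine = [b(coarse[:, k:k + 1]) for k, b in enumerate(self.branches)]
        return coarse, torch.cat(fine, dim=1)         # coarse and refined maps
```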
A Descriptive Study of Variable Discretization and Cost-Sensitive Logistic Regression on Imbalanced Credit Data
Title | A Descriptive Study of Variable Discretization and Cost-Sensitive Logistic Regression on Imbalanced Credit Data |
Authors | Lili Zhang, Herman Ray, Jennifer Priestley, Soon Tan |
Abstract | Training classification models on imbalanced data tends to result in bias towards the majority class. In this paper, we demonstrate how variable discretization and cost-sensitive logistic regression help mitigate this bias on an imbalanced credit scoring dataset, and further show the application of the variable discretization technique on data from other domains, demonstrating its potential as a generic technique for classifying imbalanced data beyond credit scoring. The performance measurements include ROC curves, area under the ROC curve (AUC), Type I error, Type II error, accuracy, and F1 score. The results show that proper variable discretization and cost-sensitive logistic regression with the best class weights can reduce the model bias and/or variance. From the perspective of the algorithm, cost-sensitive logistic regression is beneficial for increasing the value of predictors even when they are not in their optimized forms, while maintaining monotonicity. From the perspective of predictors, variable discretization performs better than cost-sensitive logistic regression, provides more reasonable coefficient estimates for predictors that have nonlinear relationships with their empirical logit, and is robust to penalty weights on misclassifications of events and non-events determined by their a priori proportions. |
Tasks | |
Published | 2018-12-28 |
URL | https://arxiv.org/abs/1812.10857v2 |
https://arxiv.org/pdf/1812.10857v2.pdf | |
PWC | https://paperswithcode.com/paper/a-descriptive-study-of-variable |
Repo | |
Framework | |
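Both ingredients of the study map directly onto standard tooling. The sketch below pairs quantile-based discretization (one common choice; the abstract does not prescribe a specific binning) with scikit-learn's class-weighted logistic regression, where the event-class weight realizes the misclassification cost.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression

def discretize(df, cols, n_bins=10):
    """Quantile-bin each variable and expand the bins into dummy columns."""
    return pd.concat(
        [pd.get_dummies(pd.qcut(df[c], n_bins, duplicates="drop"), prefix=c)
         for c in cols],
        axis=1).astype(float)

def fit_cost_sensitive(X, y, event_weight=10.0):
    """Cost-sensitive logistic regression: misclassified minority-class
    (event) examples are penalized `event_weight` times more heavily."""
    clf = LogisticRegression(max_iter=1000,
                             class_weight={0: 1.0, 1: event_weight})
    return clf.fit(X, y)
```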
Enhanced Optimization with Composite Objectives and Novelty Selection
Title | Enhanced Optimization with Composite Objectives and Novelty Selection |
Authors | Hormoz Shahrzad, Daniel Fink, Risto Miikkulainen |
Abstract | An important benefit of multi-objective search is that it maintains a diverse population of candidates, which helps in deceptive problems in particular. Not all diversity is useful, however: candidates that optimize only one objective while ignoring others are rarely helpful. This paper proposes a solution: The original objectives are replaced by their linear combinations, thus focusing the search on the most useful tradeoffs between objectives. To compensate for the loss of diversity, this transformation is accompanied by a selection mechanism that favors novelty. In the highly deceptive problem of discovering minimal sorting networks, this approach finds better solutions, and finds them faster and more consistently than standard methods. It is therefore a promising approach to solving deceptive problems through multi-objective optimization. |
Tasks | |
Published | 2018-03-10 |
URL | http://arxiv.org/abs/1803.03744v2 |
http://arxiv.org/pdf/1803.03744v2.pdf | |
PWC | https://paperswithcode.com/paper/enhanced-optimization-with-composite |
Repo | |
Framework | |
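The abstract describes two moves: replace the original objectives with linear combinations, and compensate for the lost diversity with novelty-based selection. A minimal NumPy sketch of one selection step follows; the weight vector, rank blend, and k-nearest novelty measure are illustrative choices, not the paper's exact algorithm.

```python
import numpy as np

def composite_novelty_select(pop, objectives, weights, archive, elite=20, k=5):
    """Select `elite` candidates by blending composite fitness with novelty.

    pop:        (n, d) candidate array
    objectives: callable mapping pop -> (n, m) objective values (higher = better)
    weights:    (m,) linear combination replacing the original objectives
    archive:    (a, d) previously seen candidates, for the novelty measure
    """
    fitness = objectives(pop) @ np.asarray(weights)        # composite objective
    if len(archive):
        d = np.linalg.norm(pop[:, None, :] - archive[None, :, :], axis=2)
        novelty = np.sort(d, axis=1)[:, :k].mean(axis=1)   # distance to k nearest
    else:
        novelty = np.zeros(len(pop))
    rank = lambda v: v.argsort().argsort()                 # rank-normalize scales
    return pop[np.argsort(-(rank(fitness) + rank(novelty)))[:elite]]
```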
Precipitation Nowcasting: Leveraging bidirectional LSTM and 1D CNN
Title | Precipitation Nowcasting: Leveraging bidirectional LSTM and 1D CNN |
Authors | Maitreya Patel, Anery Patel, Dr. Ranendu Ghosh |
Abstract | Short-term rainfall forecasting, also known as precipitation nowcasting, has become a potentially fundamental technology impacting significant real-world applications ranging from flight safety and rainstorm alerts to farm irrigation timing. Since weather forecasting involves identifying the underlying structure in a huge amount of data, deep-learning-based precipitation nowcasting has intuitively outperformed traditional linear extrapolation methods. Our work applies recent advances in deep learning to nowcasting, a multi-variable time series forecasting problem. Specifically, we leverage a bidirectional LSTM (Long Short-Term Memory) neural network architecture, which remarkably captures the temporal features and long-term dependencies of historical data. To further our studies, we compare the bidirectional LSTM network with a 1D CNN model to demonstrate the capabilities of sequence models over feed-forward neural architectures in forecasting problems. |
Tasks | Time Series, Time Series Forecasting, Weather Forecasting |
Published | 2018-10-24 |
URL | http://arxiv.org/abs/1810.10485v1 |
http://arxiv.org/pdf/1810.10485v1.pdf | |
PWC | https://paperswithcode.com/paper/precipitation-nowcasting-leveraging |
Repo | |
Framework | |
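The two models being compared are standard enough to sketch directly. A Keras version follows (the Framework field above is empty, so the library choice here is an assumption, as are all layer sizes).

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_bilstm(n_steps, n_features):
    """Bidirectional LSTM for multi-variable precipitation nowcasting."""
    return keras.Sequential([
        keras.Input(shape=(n_steps, n_features)),
        layers.Bidirectional(layers.LSTM(64)),
        layers.Dense(1),                         # next-step rainfall estimate
    ])

def build_cnn1d(n_steps, n_features):
    """1D CNN baseline the paper compares the sequence model against."""
    return keras.Sequential([
        keras.Input(shape=(n_steps, n_features)),
        layers.Conv1D(64, kernel_size=3, activation="relu"),
        layers.GlobalAveragePooling1D(),
        layers.Dense(1),
    ])

model = build_bilstm(n_steps=24, n_features=8)   # e.g. 24 hourly observations
model.compile(optimizer="adam", loss="mse")
```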
Multi-task Learning for Financial Forecasting
Title | Multi-task Learning for Financial Forecasting |
Authors | Tao Ma |
Abstract | Financial forecasting is challenging and attractive in machine learning. Many classic solutions, as well as deep-learning-based methods, have been proposed to deal with it, yielding encouraging performance. Stock time series forecasting is the most representative problem in financial forecasting. Due to the strong connections among stocks, the information valuable for forecasting is included not only in individual stocks but also in the stocks related to them. However, most previous works focus on one single stock and thereby easily ignore the valuable information in others. To leverage more information, in this paper we propose a joint forecasting approach that processes multiple time series of related stocks simultaneously using a multi-task learning framework. Compared to previous works, we use multiple networks to forecast multiple related stocks, exploiting their shared and private information simultaneously through multi-task learning. Moreover, we propose an attention method that learns an optimized weighted combination of shared and private information, based on the idea of the Capital Asset Pricing Model (CAPM), to help forecasting. Experimental results on various data show improved forecasting performance over baseline methods. |
Tasks | Multi-Task Learning, Time Series, Time Series Forecasting |
Published | 2018-09-27 |
URL | http://arxiv.org/abs/1809.10336v3 |
http://arxiv.org/pdf/1809.10336v3.pdf | |
PWC | https://paperswithcode.com/paper/multi-task-learning-for-financial-forecasting |
Repo | |
Framework | |
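A minimal PyTorch sketch of the shared/private structure in the abstract: one shared encoder for all stocks, one private encoder and head per stock, with a learned per-stock gate blending the two representations. The paper's CAPM-inspired attention is replaced by this simple gate for brevity, and all sizes are illustrative.

```python
import torch
import torch.nn as nn

class MultiTaskForecaster(nn.Module):
    """Shared + private encoders per stock, blended by a learned gate."""

    def __init__(self, n_stocks, n_features, hidden=32):
        super().__init__()
        self.shared = nn.GRU(n_features, hidden, batch_first=True)
        self.private = nn.ModuleList(
            nn.GRU(n_features, hidden, batch_first=True) for _ in range(n_stocks))
        self.gates = nn.Parameter(torch.zeros(n_stocks))   # per-stock blend weight
        self.heads = nn.ModuleList(nn.Linear(hidden, 1) for _ in range(n_stocks))

    def forward(self, series):          # series: list of (batch, time, features)
        preds = []
        for i, x in enumerate(series):
            _, hs = self.shared(x)      # information shared across stocks
            _, hp = self.private[i](x)  # information private to stock i
            g = torch.sigmoid(self.gates[i])
            preds.append(self.heads[i](g * hs[0] + (1 - g) * hp[0]).squeeze(-1))
        return preds                    # one next-step forecast per stock
```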
Sequential Experiment Design for Hypothesis Verification
Title | Sequential Experiment Design for Hypothesis Verification |
Authors | Dhruva Kartik, Ashutosh Nayyar, Urbashi Mitra |
Abstract | Hypothesis testing is an important problem with applications in target localization, clinical trials, etc. Many active hypothesis testing strategies operate in two phases: an exploration phase and a verification phase. In the exploration phase, experiments are selected so that a moderate level of confidence on the true hypothesis is achieved. Subsequent experiment design aims at improving the confidence level on this hypothesis to the desired level. In this paper, the focus is on the verification phase. A confidence measure is defined and active hypothesis testing is formulated as a confidence maximization problem in an infinite-horizon average-reward Partially Observable Markov Decision Process (POMDP) setting. The problem of maximizing confidence conditioned on a particular hypothesis is referred to as the hypothesis verification problem. The relationship between the hypothesis testing and verification problems is established. The verification problem can be formulated as a Markov Decision Process (MDP). Optimal solutions for the verification MDP are characterized, and a simple heuristic adaptive strategy for verification is proposed based on a zero-sum game interpretation of Kullback-Leibler divergences. Numerical experiments demonstrate that the heuristic performs better in some scenarios than existing methods in the literature. |
Tasks | |
Published | 2018-12-04 |
URL | http://arxiv.org/abs/1812.01137v1 |
http://arxiv.org/pdf/1812.01137v1.pdf | |
PWC | https://paperswithcode.com/paper/sequential-experiment-design-for-hypothesis |
Repo | |
Framework | |
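The zero-sum-game heuristic in the abstract has a compact reading: to verify hypothesis h*, pick the experiment whose observation distribution under h* is hardest for the best adversarial alternative to mimic. The NumPy sketch below implements that max-min KL rule; the data layout is an assumption, and the paper's full POMDP machinery is omitted.

```python
import numpy as np

def kl(p, q):
    """KL(p || q) for discrete distributions (q > 0 wherever p > 0 assumed)."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    m = p > 0
    return float(np.sum(p[m] * np.log(p[m] / q[m])))

def pick_verification_experiment(obs_dists, h_star):
    """Choose argmax_e min_{h != h*} KL(P_e^{h*} || P_e^h).

    obs_dists[e][h] is the observation distribution of experiment e
    under hypothesis h (a list of lists of probability vectors)."""
    best_e, best_val = None, -np.inf
    for e, dists in enumerate(obs_dists):
        val = min(kl(dists[h_star], dists[h])
                  for h in range(len(dists)) if h != h_star)
        if val > best_val:
            best_e, best_val = e, val
    return best_e
```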