Paper Group ANR 303
AGM-Style Revision of Beliefs and Intentions from a Database Perspective (Preliminary Version)
Title | AGM-Style Revision of Beliefs and Intentions from a Database Perspective (Preliminary Version) |
Authors | Marc van Zee, Dragan Doder |
Abstract | We introduce a logic for temporal beliefs and intentions based on Shoham’s database perspective. We separate strong beliefs from weak beliefs. Strong beliefs are independent from intentions, while weak beliefs are obtained by adding intentions to strong beliefs and everything that follows from that. We formalize coherence conditions on strong beliefs and intentions. We provide AGM-style postulates for the revision of strong beliefs and intentions. We show in a representation theorem that a revision operator satisfying our postulates can be represented by a pre-order on interpretations of the beliefs, together with a selection function for the intentions. |
Tasks | |
Published | 2016-04-25 |
URL | http://arxiv.org/abs/1604.07183v2 |
http://arxiv.org/pdf/1604.07183v2.pdf | |
PWC | https://paperswithcode.com/paper/agm-style-revision-of-beliefs-and-intentions |
Repo | |
Framework | |
ACD: Action Concept Discovery from Image-Sentence Corpora
Title | ACD: Action Concept Discovery from Image-Sentence Corpora |
Authors | Jiyang Gao, Chen Sun, Ram Nevatia |
Abstract | Action classification in still images is an important task in computer vision. It is challenging as the appearances of actions may vary depending on their context (e.g. associated objects). Manual labeling of context information would be time-consuming and difficult to scale up. To address this challenge, we propose a method to automatically discover and cluster action concepts, and learn their classifiers from weakly supervised image-sentence corpora. It obtains candidate action concepts by extracting verb-object pairs from sentences and verifies their visualness with the associated images. Candidate action concepts are then clustered by using a multi-modal representation with image embeddings from deep convolutional networks and text embeddings from word2vec. More than one hundred human action concept classifiers are learned from the Flickr30k dataset with no additional human effort and promising classification results are obtained. We further apply the AdaBoost algorithm to automatically select and combine relevant action concepts given an action query. Promising results have been shown on the PASCAL VOC 2012 action classification benchmark, which has zero overlap with Flickr30k. |
Tasks | Action Classification |
Published | 2016-04-16 |
URL | http://arxiv.org/abs/1604.04784v1 |
http://arxiv.org/pdf/1604.04784v1.pdf | |
PWC | https://paperswithcode.com/paper/acd-action-concept-discovery-from-image |
Repo | |
Framework | |
Unifying Geometric Features and Facial Action Units for Improved Performance of Facial Expression Analysis
Title | Unifying Geometric Features and Facial Action Units for Improved Performance of Facial Expression Analysis |
Authors | Mehdi Ghayoumi, Arvind K Bansal |
Abstract | Previous approaches to modeling and analyzing facial expressions use three different techniques: facial action units, geometric features and graph-based modelling. However, previous approaches have treated these techniques separately. There is an interrelationship between these techniques. Facial expression analysis is significantly improved by utilizing the mappings between the major geometric features involved in facial expressions and the subset of facial action units whose presence or absence is unique to a facial expression. This paper combines dimension reduction techniques and image classification with search space pruning, which is achieved through this unique subset of facial action units. The performance results on a publicly available facial expression database show a 70% improvement in performance with respect to time while maintaining the emotion recognition correctness. |
Tasks | Dimensionality Reduction, Emotion Recognition, Image Classification |
Published | 2016-06-02 |
URL | http://arxiv.org/abs/1606.00822v1 |
http://arxiv.org/pdf/1606.00822v1.pdf | |
PWC | https://paperswithcode.com/paper/unifying-geometric-features-and-facial-action |
Repo | |
Framework | |
Segmental Recurrent Neural Networks for End-to-end Speech Recognition
Title | Segmental Recurrent Neural Networks for End-to-end Speech Recognition |
Authors | Liang Lu, Lingpeng Kong, Chris Dyer, Noah A. Smith, Steve Renals |
Abstract | We study the segmental recurrent neural network for end-to-end acoustic modelling. This model connects the segmental conditional random field (CRF) with a recurrent neural network (RNN) used for feature extraction. Compared to most previous CRF-based acoustic models, it does not rely on an external system to provide features or segmentation boundaries. Instead, this model marginalises out all the possible segmentations, and features are extracted from the RNN trained together with the segmental CRF. In essence, this model is self-contained and can be trained end-to-end. In this paper, we discuss practical training and decoding issues as well as the method to speed up the training in the context of speech recognition. We performed experiments on the TIMIT dataset. We achieved a 17.3% phone error rate (PER) from first-pass decoding, the best reported result using CRFs, even though we used only a zeroth-order CRF and no language model. |
Tasks | Acoustic Modelling, End-To-End Speech Recognition, Language Modelling, Speech Recognition |
Published | 2016-03-01 |
URL | http://arxiv.org/abs/1603.00223v2 |
http://arxiv.org/pdf/1603.00223v2.pdf | |
PWC | https://paperswithcode.com/paper/segmental-recurrent-neural-networks-for-end |
Repo | |
Framework | |
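The abstract above says the model marginalises out all possible segmentations. A minimal sketch of how such a sum can be computed with dynamic programming for a zeroth-order segmental model is given below. It is an illustration only, not the authors' implementation: the function names, the `max_len` cap on segment length, and `toy_score` (a stand-in for the segment scores that the paper obtains from an RNN) are assumptions.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def log_partition(T, labels, score, max_len):
    """Log of the sum over all labelled segmentations of frames 0..T-1.

    score(start, end, label) is the score of a segment covering frames
    [start, end) with the given label. Zeroth-order: no dependence on
    the previous segment's label.
    """
    alpha = [0.0] + [float("-inf")] * T  # alpha[t]: log-sum over prefixes ending at t
    for t in range(1, T + 1):
        terms = [
            alpha[t - d] + score(t - d, t, y)
            for d in range(1, min(t, max_len) + 1)
            for y in labels
        ]
        alpha[t] = logsumexp(terms)
    return alpha[T]


def brute_force_log_partition(T, labels, score, max_len):
    """Same quantity by explicit enumeration (exponential; for checking only)."""
    totals = []

    def extend(start, acc):
        if start == T:
            totals.append(acc)
            return
        for d in range(1, min(T - start, max_len) + 1):
            for y in labels:
                extend(start + d, acc + score(start, start + d, y))

    extend(0, 0.0)
    return logsumexp(totals)


def toy_score(start, end, label):
    """Hypothetical segment score; a real system would compute this from RNN features."""
    return 0.1 * (end - start) * (label + 1) - 0.05 * start
```

The recursion visits each end frame once and looks back at most `max_len` frames, so the cost grows linearly in the sequence length rather than exponentially in the number of segmentations.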
Machine Learning meets Data-Driven Journalism: Boosting International Understanding and Transparency in News Coverage
Title | Machine Learning meets Data-Driven Journalism: Boosting International Understanding and Transparency in News Coverage |
Authors | Elena Erdmann, Karin Boczek, Lars Koppers, Gerret von Nordheim, Christian Pölitz, Alejandro Molina, Katharina Morik, Henrik Müller, Jörg Rahnenführer, Kristian Kersting |
Abstract | Migration crisis, climate change or tax havens: Global challenges need global solutions. But agreeing on a joint approach is difficult without a common ground for discussion. Public spheres are highly segmented because news is mainly produced and received on a national level. Gaining a global view on international debates about important issues is hindered by the enormous quantity of news and by language barriers. Media analysis usually focuses only on qualitative research. In this position statement, we argue that it is imperative to pool methods from machine learning, journalism studies and statistics to help bridge the segmented data of the international public sphere, using the Transatlantic Trade and Investment Partnership (TTIP) as a case study. |
Tasks | |
Published | 2016-06-16 |
URL | http://arxiv.org/abs/1606.05110v1 |
http://arxiv.org/pdf/1606.05110v1.pdf | |
PWC | https://paperswithcode.com/paper/machine-learning-meets-data-driven-journalism |
Repo | |
Framework | |
Transfer Learning based Dynamic Multiobjective Optimization Algorithms
Title | Transfer Learning based Dynamic Multiobjective Optimization Algorithms |
Authors | Min Jiang, Zhongqiang Huang, Liming Qiu, Wenzhen Huang, Gary G. Yen |
Abstract | One of the major distinguishing features of dynamic multiobjective optimization problems (DMOPs) is that the optimization objectives change over time, so tracking the varying Pareto-optimal front becomes a challenge. One of the promising solutions is reusing past “experiences” to construct a prediction model via statistical machine learning approaches. However, most of the existing methods ignore the non-independent and identically distributed nature of the data used to construct the prediction model. In this paper, we propose an algorithmic framework, called Tr-DMOEA, which integrates transfer learning and population-based evolutionary algorithms for solving DMOPs. This approach takes the transfer learning method as a tool to help reuse past experience for speeding up the evolutionary process, and at the same time, any population-based multiobjective algorithm can benefit from this integration without any extensive modifications. To verify this, we incorporate the proposed approach into the development of three well-known algorithms, nondominated sorting genetic algorithm II (NSGA-II), multiobjective particle swarm optimization (MOPSO), and the regularity model-based multiobjective estimation of distribution algorithm (RM-MEDA), and then employ twelve benchmark functions to test these algorithms as well as compare them with some chosen state-of-the-art designs. The experimental results confirm the effectiveness of the proposed method through exploiting machine learning technology. |
Tasks | Multiobjective Optimization, Transfer Learning |
Published | 2016-12-19 |
URL | http://arxiv.org/abs/1612.06093v2 |
http://arxiv.org/pdf/1612.06093v2.pdf | |
PWC | https://paperswithcode.com/paper/transfer-learning-based-dynamic |
Repo | |
Framework | |
Swarm Intelligence for Multiobjective Optimization of Extraction Process
Title | Swarm Intelligence for Multiobjective Optimization of Extraction Process |
Authors | T. Ganesan, I. Elamvazuthi, P. Vasant |
Abstract | Multi-objective (MO) optimization is an emerging field which is increasingly being implemented in many industries globally. In this work, the MO optimization of the extraction process of bioactive compounds from the Gardenia Jasminoides Ellis fruit was solved. Three swarm-based algorithms have been applied in conjunction with the normal-boundary intersection (NBI) method to solve this MO problem. The gravitational search algorithm (GSA) and the particle swarm optimization (PSO) technique were implemented in this work. In addition, a novel Hopfield-enhanced particle swarm optimization was developed and applied to the extraction problem. By measuring the levels of dominance, the optimality of the approximate Pareto frontiers produced by all the algorithms was gauged and compared. Furthermore, by measuring the levels of convergence of the frontier, some understanding regarding the structure of the objective space in terms of its relation to the level of frontier dominance is uncovered. Detailed comparative studies were conducted on all the algorithms employed and developed in this work. |
Tasks | Multiobjective Optimization |
Published | 2016-09-30 |
URL | http://arxiv.org/abs/1611.06086v1 |
http://arxiv.org/pdf/1611.06086v1.pdf | |
PWC | https://paperswithcode.com/paper/swarm-intelligence-for-multiobjective |
Repo | |
Framework | |
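The abstract above compares approximate Pareto frontiers by their levels of dominance. As a reminder of the underlying relation, the following minimal sketch filters a set of objective vectors down to its non-dominated subset. It assumes every objective is minimized and uses hypothetical function names; the specific dominance metric used in the paper is not given in the abstract and is not reproduced here.

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and strictly better in one.

    Assumes all objectives are to be minimized.
    """
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(points):
    """Return the points that no other point dominates, in input order."""
    return [p for p in points if not any(dominates(q, p) for q in points)]
```

For example, among `(1, 4)`, `(2, 2)`, `(3, 3)` and `(4, 1)`, only `(3, 3)` is dominated (by `(2, 2)`), so the other three form the front.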
Classification with the pot-pot plot
Title | Classification with the pot-pot plot |
Authors | Oleksii Pokotylo, Karl Mosler |
Abstract | We propose a procedure for supervised classification that is based on potential functions. The potential of a class is defined as a kernel density estimate multiplied by the class’s prior probability. The method transforms the data to a potential-potential (pot-pot) plot, where each data point is mapped to a vector of potentials. Separation of the classes, as well as classification of new data points, is performed on this plot. For this, either the $\alpha$-procedure ($\alpha$-P) or the $k$-nearest neighbors ($k$-NN) method is employed. For data that are generated from continuous distributions, these classifiers prove to be strongly Bayes-consistent. The potentials depend on the kernel and its bandwidth used in the density estimate. We investigate several variants of bandwidth selection, including joint and separate pre-scaling and a bandwidth regression approach. The new method is applied to benchmark data from the literature, including simulated data sets as well as 50 sets of real data. It compares favorably to known classification methods such as LDA, QDA, max kernel density estimates, $k$-NN, and $DD$-plot classification using depth functions. |
Tasks | |
Published | 2016-08-09 |
URL | http://arxiv.org/abs/1608.02861v1 |
http://arxiv.org/pdf/1608.02861v1.pdf | |
PWC | https://paperswithcode.com/paper/classification-with-the-pot-pot-plot |
Repo | |
Framework | |
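The pipeline described in the abstract above (potential = prior times kernel density estimate, then classification in the space of potentials) can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' code: it uses an isotropic Gaussian kernel with one fixed `bandwidth`, estimates priors from class frequencies, and applies a plain majority-vote $k$-NN; the bandwidth selection variants and the $\alpha$-procedure from the paper are not covered. All function names are hypothetical.

```python
import numpy as np


def gaussian_kde(train, query, bandwidth):
    """Isotropic Gaussian kernel density estimate of `train`, evaluated at `query`."""
    d = train.shape[1]
    sq_dists = np.sum((query[:, None, :] - train[None, :, :]) ** 2, axis=2)
    norm = (2.0 * np.pi * bandwidth ** 2) ** (d / 2.0)
    return np.exp(-sq_dists / (2.0 * bandwidth ** 2)).mean(axis=1) / norm


def potentials(X_by_class, query, bandwidth):
    """Map each query point to its vector of class potentials (prior * density)."""
    n_total = sum(len(X) for X in X_by_class)
    columns = [
        (len(X) / n_total) * gaussian_kde(X, query, bandwidth) for X in X_by_class
    ]
    return np.column_stack(columns)


def knn_predict(train_pots, train_labels, query_pots, k):
    """Majority-vote k-NN performed on the pot-pot plot, not in the original space."""
    dists = np.linalg.norm(query_pots[:, None, :] - train_pots[None, :, :], axis=2)
    nearest = np.argsort(dists, axis=1)[:, :k]
    votes = train_labels[nearest]
    return np.array([np.bincount(row).argmax() for row in votes])
```

A typical use is to compute `potentials` for the training points themselves, then for new points, and pass both to `knn_predict` with integer class labels starting at 0.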
Efficient approaches for escaping higher order saddle points in non-convex optimization
Title | Efficient approaches for escaping higher order saddle points in non-convex optimization |
Authors | Anima Anandkumar, Rong Ge |
Abstract | Local search heuristics for non-convex optimization are popular in applied machine learning. However, in general it is hard to guarantee that such algorithms even converge to a local minimum, due to the existence of complicated saddle point structures in high dimensions. Many functions have degenerate saddle points such that the first and second order derivatives cannot distinguish them from local optima. In this paper we use higher order derivatives to escape these saddle points: we design the first efficient algorithm guaranteed to converge to a third order local optimum (while existing techniques are at most second order). We also show that it is NP-hard to extend this further to finding fourth order local optima. |
Tasks | |
Published | 2016-02-18 |
URL | http://arxiv.org/abs/1602.05908v1 |
http://arxiv.org/pdf/1602.05908v1.pdf | |
PWC | https://paperswithcode.com/paper/efficient-approaches-for-escaping-higher |
Repo | |
Framework | |
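To make the notion of a degenerate saddle point in the abstract above concrete, the sketch below uses the function $f(x, y) = x^2 + y^3$. This example is chosen here for illustration and is not taken from the paper. At the origin the gradient vanishes and the Hessian is positive semidefinite, so first- and second-order tests cannot rule out a local minimum, yet the nonzero third derivative along $y$ reveals a descent direction.

```python
def f(x, y):
    """Illustrative function with a degenerate saddle point at the origin."""
    return x ** 2 + y ** 3


def gradient(x, y):
    """Analytic gradient of f."""
    return (2.0 * x, 3.0 * y ** 2)


def hessian(x, y):
    """Analytic Hessian of f; it is diagonal, so its eigenvalues are the diagonal entries."""
    return ((2.0, 0.0), (0.0, 6.0 * y))


def third_derivative_along_y():
    """d^3 f / dy^3, which is constant for this f."""
    return 6.0


def passes_second_order_test(x, y, tol=1e-12):
    """Zero gradient and positive semidefinite Hessian (valid here because the Hessian is diagonal)."""
    gx, gy = gradient(x, y)
    (hxx, _), (_, hyy) = hessian(x, y)
    return abs(gx) <= tol and abs(gy) <= tol and hxx >= -tol and hyy >= -tol
```

The origin passes `passes_second_order_test`, but moving a small step in the negative $y$ direction lowers the function value, because the Hessian has a zero eigenvalue along $y$ and the third derivative in that direction is nonzero.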
Sar image despeckling based on nonlocal similarity sparse decomposition
Title | Sar image despeckling based on nonlocal similarity sparse decomposition |
Authors | Chengwei Sang, Hong Sun, Quisong Xia |
Abstract | This letter presents a method of synthetic aperture radar (SAR) image despeckling that aims to preserve detail information while suppressing speckle noise. This method combines the nonlocal self-similarity partition and a proposed modified sparse decomposition. The nonlocal partition method groups a series of structure-similarity data sets. Each data set has good sparsity for learning an over-complete dictionary in sparse representation. In the sparse decomposition, we propose a novel method to identify principal atoms from the over-complete dictionary to form a principal dictionary. Despeckling is performed on each data set over the principal dictionary with principal atoms. Experimental results demonstrate that the proposed method can achieve high performance in terms of both speckle noise reduction and structural detail preservation. |
Tasks | Sar Image Despeckling |
Published | 2016-11-22 |
URL | http://arxiv.org/abs/1611.07559v1 |
http://arxiv.org/pdf/1611.07559v1.pdf | |
PWC | https://paperswithcode.com/paper/sar-image-despeckling-based-on-nonlocal |
Repo | |
Framework | |
Actionness Estimation Using Hybrid Fully Convolutional Networks
Title | Actionness Estimation Using Hybrid Fully Convolutional Networks |
Authors | Limin Wang, Yu Qiao, Xiaoou Tang, Luc Van Gool |
Abstract | Actionness was introduced to quantify the likelihood of containing a generic action instance at a specific location. Accurate and efficient estimation of actionness is important in video analysis and may benefit other relevant tasks such as action recognition and action detection. This paper presents a new deep architecture for actionness estimation, called hybrid fully convolutional network (H-FCN), which is composed of appearance FCN (A-FCN) and motion FCN (M-FCN). These two FCNs leverage the strong capacity of deep models to estimate actionness maps from the perspectives of static appearance and dynamic motion, respectively. In addition, the fully convolutional nature of H-FCN allows it to efficiently process videos with arbitrary sizes. Experiments on the challenging Stanford40, UCF Sports, and JHMDB datasets verify the effectiveness of H-FCN for actionness estimation and demonstrate that our method outperforms previous ones. Moreover, we apply the estimated actionness maps to action proposal generation and action detection. Our actionness maps advance the current state-of-the-art performance of these tasks substantially. |
Tasks | Action Detection, Temporal Action Localization |
Published | 2016-04-25 |
URL | http://arxiv.org/abs/1604.07279v1 |
http://arxiv.org/pdf/1604.07279v1.pdf | |
PWC | https://paperswithcode.com/paper/actionness-estimation-using-hybrid-fully |
Repo | |
Framework | |
Sub-cortical brain structure segmentation using F-CNN’s
Title | Sub-cortical brain structure segmentation using F-CNN’s |
Authors | Mahsa Shakeri, Stavros Tsogkas, Enzo Ferrante, Sarah Lippe, Samuel Kadoury, Nikos Paragios, Iasonas Kokkinos |
Abstract | In this paper we propose a deep learning approach for segmenting sub-cortical structures of the human brain in Magnetic Resonance (MR) image data. We draw inspiration from a state-of-the-art Fully-Convolutional Neural Network (F-CNN) architecture for semantic segmentation of objects in natural images, and adapt it to our task. Unlike previous CNN-based methods that operate on image patches, our model is applied to a full-blown 2D image, without any alignment or registration steps at testing time. We further improve segmentation results by interpreting the CNN output as potentials of a Markov Random Field (MRF), whose topology corresponds to a volumetric grid. Alpha-expansion is used to perform approximate inference, imposing spatial volumetric homogeneity on the CNN priors. We compare the performance of the proposed pipeline with a similar system using Random Forest-based priors, as well as state-of-the-art segmentation algorithms, and show promising results on two different brain MRI datasets. |
Tasks | Semantic Segmentation |
Published | 2016-02-05 |
URL | http://arxiv.org/abs/1602.02130v1 |
http://arxiv.org/pdf/1602.02130v1.pdf | |
PWC | https://paperswithcode.com/paper/sub-cortical-brain-structure-segmentation |
Repo | |
Framework | |
Learning deep representation of multityped objects and tasks
Title | Learning deep representation of multityped objects and tasks |
Authors | Truyen Tran, Dinh Phung, Svetha Venkatesh |
Abstract | We introduce a deep multitask architecture to integrate multityped representations of multimodal objects. This multitype exposition is less abstract than the multimodal characterization, but more machine-friendly, and thus is more precise to model. For example, an image can be described by multiple visual views, which can be in the form of bag-of-words (counts) or color/texture histograms (real-valued). At the same time, the image may have several social tags, which are best described using a sparse binary vector. Our deep model takes as input multiple type-specific features, narrows the cross-modality semantic gaps, learns cross-type correlation, and produces a high-level homogeneous representation. At the same time, the model supports heterogeneously typed tasks. We demonstrate the capacity of the model on two applications: social image retrieval and multiple concept prediction. The deep architecture produces a more compact representation, naturally integrates multiviews and multimodalities, better exploits side information, and most importantly, performs competitively against baselines. |
Tasks | Image Retrieval |
Published | 2016-03-04 |
URL | http://arxiv.org/abs/1603.01359v1 |
http://arxiv.org/pdf/1603.01359v1.pdf | |
PWC | https://paperswithcode.com/paper/learning-deep-representation-of-multityped |
Repo | |
Framework | |
Systholic Boolean Orthonormalizer Network in Wavelet Domain for SAR Image Despeckling
Title | Systholic Boolean Orthonormalizer Network in Wavelet Domain for SAR Image Despeckling |
Authors | Mario Mastriani |
Abstract | We describe a novel method for removing speckle (in the wavelet domain) of unknown variance from SAR images. The method is based on the following procedure: 1) we apply the Bidimensional Discrete Wavelet Transform (DWT-2D) to the speckled image; 2) we apply scaling and rounding to the coefficients of the highest subbands (to obtain integer and positive coefficients); 3) we apply bit-slicing to the new highest subbands (to obtain bit-planes); 4) we apply the Systholic Boolean Orthonormalizer Network (SBON) to the input bit-plane set to obtain two orthonormal output bit-plane sets (in a Boolean sense), and project one set on the other by means of an AND operation; 5) we apply re-assembling; and 6) we apply re-scaling. Finally, 7) we apply the Inverse DWT-2D and reconstruct a SAR image from the modified wavelet coefficients. Despeckling results compare favorably to most methods currently in use. |
Tasks | Sar Image Despeckling |
Published | 2016-07-11 |
URL | http://arxiv.org/abs/1607.03105v1 |
http://arxiv.org/pdf/1607.03105v1.pdf | |
PWC | https://paperswithcode.com/paper/systholic-boolean-orthonormalizer-network-in |
Repo | |
Framework | |
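Steps 2, 3 and 5 of the procedure in the abstract above (scaling and rounding, bit-slicing, re-assembling) can be sketched with NumPy as follows. This is an illustration of generic bit-plane handling only: the SBON itself and the wavelet transform are not implemented, and the function names and the min-max scaling rule are assumptions rather than the author's choices.

```python
import numpy as np


def scale_and_round(coeffs, n_bits):
    """Map real coefficients to non-negative integers in [0, 2**n_bits - 1].

    Returns the integers plus the offset and scale needed to undo the mapping.
    """
    lo, hi = float(coeffs.min()), float(coeffs.max())
    scale = (2 ** n_bits - 1) / (hi - lo) if hi > lo else 1.0
    ints = np.rint((coeffs - lo) * scale).astype(np.int64)
    return ints, lo, scale


def to_bit_planes(ints, n_bits):
    """Slice a non-negative integer array into n_bits binary planes (least significant first)."""
    return np.stack([(ints >> b) & 1 for b in range(n_bits)]).astype(np.uint8)


def from_bit_planes(planes):
    """Re-assemble integers from bit-planes produced by to_bit_planes."""
    weights = 2 ** np.arange(planes.shape[0], dtype=np.int64)
    return np.tensordot(weights, planes.astype(np.int64), axes=1)
```

Slicing followed by re-assembling is lossless for integers below `2**n_bits`, so any change to the reconstructed coefficients comes from whatever Boolean processing is applied to the planes in between.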
Do You See What I Mean? Visual Resolution of Linguistic Ambiguities
Title | Do You See What I Mean? Visual Resolution of Linguistic Ambiguities |
Authors | Yevgeni Berzak, Andrei Barbu, Daniel Harari, Boris Katz, Shimon Ullman |
Abstract | Understanding language goes hand in hand with the ability to integrate complex contextual information obtained via perception. In this work, we present a novel task for grounded language understanding: disambiguating a sentence given a visual scene which depicts one of the possible interpretations of that sentence. To this end, we introduce a new multimodal corpus containing ambiguous sentences, representing a wide range of syntactic, semantic and discourse ambiguities, coupled with videos that visualize the different interpretations for each sentence. We address this task by extending a vision model which determines if a sentence is depicted by a video. We demonstrate how such a model can be adjusted to recognize different interpretations of the same underlying sentence, allowing sentences to be disambiguated in a unified fashion across the different ambiguity types. |
Tasks | |
Published | 2016-03-26 |
URL | http://arxiv.org/abs/1603.08079v1 |
http://arxiv.org/pdf/1603.08079v1.pdf | |
PWC | https://paperswithcode.com/paper/do-you-see-what-i-mean-visual-resolution-of |
Repo | |
Framework | |