Paper Group ANR 612
GlobalTrait: Personality Alignment of Multilingual Word Embeddings
Title | GlobalTrait: Personality Alignment of Multilingual Word Embeddings |
Authors | Farhad Bin Siddique, Dario Bertero, Pascale Fung |
Abstract | We propose a multilingual model to recognize Big Five personality traits from text data in four different languages: English, Spanish, Dutch and Italian. Our analysis shows that words having a similar semantic meaning in different languages do not necessarily correspond to the same personality traits. Therefore, we propose a personality alignment method, GlobalTrait, which has a mapping for each trait from the source language to the target language (English), such that words that correlate positively with each trait are close together in the multilingual vector space. Using these aligned embeddings for training, we can transfer personality-related training features from high-resource languages such as English to other low-resource languages, and obtain better multilingual results than with simple monolingual and unaligned multilingual embeddings. We achieve an average F-score increase (across the three non-English languages) from 65 to 73.4 (+8.4) when comparing our monolingual model to the multilingual CNN trained on personality-aligned embeddings. We also show relatively good performance on the regression tasks, and better classification results when evaluating our model on a separate Chinese dataset. |
Tasks | Multilingual Word Embeddings, Word Embeddings |
Published | 2018-11-01 |
URL | http://arxiv.org/abs/1811.00240v2 |
PDF | http://arxiv.org/pdf/1811.00240v2.pdf |
PWC | https://paperswithcode.com/paper/globaltrait-personality-alignment-of |
Repo | |
Framework | |
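As a rough illustration of the cross-lingual embedding alignment that GlobalTrait builds on, the sketch below fits a least-squares linear map from source-language vectors to English vectors over paired anchor words. It is a generic alignment sketch with random placeholder vectors, not the paper's per-trait mapping.

```python
# Generic linear alignment sketch (not the per-trait GlobalTrait procedure): learn W so
# that source-language vectors mapped by W land near their English counterparts.
import numpy as np

def fit_alignment(src_vecs: np.ndarray, tgt_vecs: np.ndarray) -> np.ndarray:
    """Least-squares map W with src_vecs @ W ~= tgt_vecs (rows are paired anchor words)."""
    W, *_ = np.linalg.lstsq(src_vecs, tgt_vecs, rcond=None)
    return W

rng = np.random.default_rng(0)
src = rng.normal(size=(200, 100))   # placeholder Spanish vectors for 200 anchor words
tgt = rng.normal(size=(200, 100))   # placeholder English vectors for the same words
W = fit_alignment(src, tgt)
aligned = src @ W                   # source embeddings mapped into the English space
```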
A Stronger Baseline for Multilingual Word Embeddings
Title | A Stronger Baseline for Multilingual Word Embeddings |
Authors | Philipp Dufter, Hinrich Schütze |
Abstract | Levy, Søgaard and Goldberg’s (2017) S-ID (sentence ID) method applies word2vec on tuples containing a sentence ID and a word from the sentence. It has been shown to be a strong baseline for learning multilingual embeddings. Inspired by recent work on concept-based embedding learning, we propose SC-ID, an extension to S-ID: given a sentence-aligned corpus, we use sampling to extract concepts that are then processed in the same manner as S-IDs. We perform experiments on the Parallel Bible Corpus across 1000+ languages and show that SC-ID yields up to 6% performance increase in a word translation task. In addition, we provide evidence that SC-ID is easily and widely applicable by reporting competitive results across 8 tasks on a EuroParl-based corpus. |
Tasks | Multilingual Word Embeddings, Word Embeddings |
Published | 2018-11-01 |
URL | http://arxiv.org/abs/1811.00586v1 |
PDF | http://arxiv.org/pdf/1811.00586v1.pdf |
PWC | https://paperswithcode.com/paper/a-stronger-baseline-for-multilingual-word |
Repo | |
Framework | |
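For context, the S-ID baseline that SC-ID extends can be sketched in a few lines: word2vec is run over (sentence ID, word) tuples so that words sharing aligned sentence IDs across languages end up near each other. The toy parallel corpus below is an assumption; SC-ID would add sampled concept IDs processed in the same way.

```python
# Minimal S-ID-style sketch with gensim; the two-sentence "parallel corpus" is a toy stand-in.
from gensim.models import Word2Vec

parallel = {
    "S1": {"en": ["in", "the", "beginning"], "de": ["am", "anfang"]},
    "S2": {"en": ["and", "god", "said"],     "de": ["und", "gott", "sprach"]},
}

pairs = []
for sid, versions in parallel.items():
    for lang, words in versions.items():
        for w in words:
            pairs.append([f"SID:{sid}", f"{lang}:{w}"])  # one (sentence ID, word) tuple

model = Word2Vec(pairs, vector_size=50, window=1, min_count=1, sg=1, epochs=50)
print(model.wv.most_similar("en:god", topn=3))
```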
Generalised framework for multi-criteria method selection
Title | Generalised framework for multi-criteria method selection |
Authors | Jarosław Wątróbski, Jarosław Jankowski, Paweł Ziemba, Artur Karczmarczyk, Magdalena Zioło |
Abstract | Multi-Criteria Decision Analysis (MCDA) methods are widely used in various fields and disciplines. While most of the research has focused on the development and improvement of new MCDA methods, relatively little attention has been paid to their appropriate selection for a given decision problem. Their improper application decreases the quality of recommendations, as different MCDA methods deliver inconsistent results. The current paper presents a methodological and practical framework for selecting suitable MCDA methods for a particular decision situation. A set of 56 available MCDA methods was analyzed and, based on that analysis, a hierarchical set of method characteristics and a rule base were obtained. This analysis, the rules, and the modelling of uncertainty in the decision-problem description made it possible to build a framework supporting the selection of an MCDA method for a given decision-making situation. The practical studies indicate consistency between the methods recommended by the proposed approach and those used by the experts in reference cases. The results of the research also showed that the proposed approach can be used as a general framework for selecting an appropriate MCDA method for a given area of decision support, even in cases of data gaps in the decision-making problem description. The proposed framework was implemented within a web platform available for public use at www.mcda.it. |
Tasks | Decision Making |
Published | 2018-10-25 |
URL | http://arxiv.org/abs/1810.11078v1 |
PDF | http://arxiv.org/pdf/1810.11078v1.pdf |
PWC | https://paperswithcode.com/paper/generalised-framework-for-multi-criteria |
Repo | |
Framework | |
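The core idea of matching decision-problem characteristics against a rule base can be illustrated with a deliberately tiny sketch; the characteristics, rules and method lists below are hypothetical placeholders, not the paper's 56-method knowledge base.

```python
# Toy rule base: each rule maps a predicate over problem characteristics to candidate methods.
problem = {
    "weights": "precise",               # hypothetical characteristic values
    "preference_scale": "quantitative",
    "ranking_type": "complete",
}

rules = [
    (lambda p: p["weights"] == "precise" and p["ranking_type"] == "complete",
     ["AHP", "TOPSIS"]),
    (lambda p: p["weights"] == "fuzzy",
     ["Fuzzy AHP", "Fuzzy TOPSIS"]),
    (lambda p: p["ranking_type"] == "partial",
     ["ELECTRE III", "PROMETHEE I"]),
]

candidates = sorted({m for cond, methods in rules if cond(problem) for m in methods})
print("Suggested MCDA methods:", candidates)
```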
Dancing in the Dark: Private Multi-Party Machine Learning in an Untrusted Setting
Title | Dancing in the Dark: Private Multi-Party Machine Learning in an Untrusted Setting |
Authors | Clement Fung, Jamie Koerner, Stewart Grant, Ivan Beschastnikh |
Abstract | Distributed machine learning (ML) systems today use an unsophisticated threat model: data sources must trust a central ML process. We propose a brokered learning abstraction that allows data sources to contribute towards a globally-shared model with provable privacy guarantees in an untrusted setting. We realize this abstraction by building on federated learning, the state of the art in multi-party ML, to construct TorMentor: an anonymous hidden service that supports private multi-party ML. We define a new threat model by characterizing, developing and evaluating new attacks in the brokered learning setting, along with new defenses for these attacks. We show that TorMentor effectively protects data providers against known ML attacks while providing them with a tunable trade-off between model accuracy and privacy. We evaluate TorMentor with local and geo-distributed deployments on Azure/Tor. In an experiment with 200 clients and 14 MB of data per client, our prototype trained a logistic regression model using stochastic gradient descent in 65s. Code is available at: https://github.com/DistributedML/TorML |
Tasks | |
Published | 2018-11-23 |
URL | http://arxiv.org/abs/1811.09712v2 |
PDF | http://arxiv.org/pdf/1811.09712v2.pdf |
PWC | https://paperswithcode.com/paper/dancing-in-the-dark-private-multi-party |
Repo | |
Framework | |
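The kind of private training TorMentor coordinates can be gestured at with a clipped, noise-perturbed SGD update for logistic regression; this is a generic differential-privacy-style sketch on synthetic data, not the paper's brokered-learning protocol, broker, or Tor transport.

```python
# Noisy SGD sketch: per-example gradients are clipped and perturbed before each update.
import numpy as np

def noisy_sgd_logreg(X, y, epochs=20, lr=0.1, clip=1.0, noise_scale=0.5, seed=0):
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for i in rng.permutation(len(X)):
            p = 1.0 / (1.0 + np.exp(-X[i] @ w))
            g = (p - y[i]) * X[i]                                    # per-example gradient
            g *= min(1.0, clip / (np.linalg.norm(g) + 1e-12))        # clip its norm
            g += rng.normal(scale=noise_scale * clip, size=g.shape)  # add Gaussian noise
            w -= lr * g
    return w

X = np.random.default_rng(1).normal(size=(200, 5))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(float)
w = noisy_sgd_logreg(X, y)
```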
Smart Surveillance as an Edge Network Service: from Harr-Cascade, SVM to a Lightweight CNN
Title | Smart Surveillance as an Edge Network Service: from Harr-Cascade, SVM to a Lightweight CNN |
Authors | Seyed Yahya Nikouei, Yu Chen, Sejun Song, Ronghua Xu, Baek-Young Choi, Timothy R. Faughnan |
Abstract | Edge computing efficiently extends the realm of information technology beyond the boundary defined by the cloud computing paradigm. By performing computation near the source and destination, edge computing is promising for addressing the challenges in many delay-sensitive applications, like real-time human surveillance. Leveraging ubiquitously connected cameras and smart mobile devices, it enables video analytics at the edge. In recent years, many smart video surveillance approaches have been proposed for object detection and tracking using Artificial Intelligence (AI) and Machine Learning (ML) algorithms. This work explores the feasibility of two popular human-object detection schemes, Harr-Cascade and HOG feature extraction with an SVM classifier, at the edge, and introduces a lightweight Convolutional Neural Network (L-CNN) for human detection that leverages depthwise separable convolution for less computation. Single-board computers (SBCs) are used as edge devices for tests, and the algorithms are validated using real-world campus surveillance video streams and open data sets. The experimental results are promising: the final algorithm is able to track humans with decent accuracy, in real time, at a resource consumption affordable by edge devices. |
Tasks | Human Detection, Object Detection |
Published | 2018-04-24 |
URL | http://arxiv.org/abs/1805.00331v2 |
PDF | http://arxiv.org/pdf/1805.00331v2.pdf |
PWC | https://paperswithcode.com/paper/smart-surveillance-as-an-edge-network-service |
Repo | |
Framework | |
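The two baseline detectors the abstract compares against the L-CNN are available off the shelf in OpenCV; the snippet below runs both on a single frame. The input file name is an assumption, and the L-CNN itself is not reproduced here (a depthwise-separable block is sketched under the companion paper further down).

```python
# Haar-cascade and HOG+SVM pedestrian detection with stock OpenCV models.
import cv2

frame = cv2.imread("frame.jpg")                      # assumed surveillance frame on disk
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

cascade = cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_fullbody.xml")
haar_boxes = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=4)

hog = cv2.HOGDescriptor()
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())
hog_boxes, _weights = hog.detectMultiScale(gray, winStride=(8, 8))

for (x, y, w, h) in list(haar_boxes) + list(hog_boxes):
    cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
cv2.imwrite("detections.jpg", frame)
```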
Nonlinear Prediction of Multidimensional Signals via Deep Regression with Applications to Image Coding
Title | Nonlinear Prediction of Multidimensional Signals via Deep Regression with Applications to Image Coding |
Authors | Xi Zhang, Xiaolin Wu |
Abstract | Deep convolutional neural networks (DCNN) have enjoyed great successes in many signal processing applications because they can learn complex, non-linear causal relationships from input to output. In this light, DCNNs are well suited for the task of sequential prediction of multidimensional signals, such as images, and have the potential of improving the performance of traditional linear predictors. In this research we investigate how far DCNNs can push the envelope in terms of prediction precision. We propose, in a case study, a two-stage deep regression DCNN framework for nonlinear prediction of two-dimensional image signals. In the first-stage regression, the proposed deep prediction network (PredNet) takes the causal context as input and emits a prediction of the present pixel. Three PredNets are trained with the regression objectives of minimizing $\ell_1$, $\ell_2$ and $\ell_\infty$ norms of prediction residuals, respectively. The second-stage regression combines the outputs of the three PredNets to generate an even more precise and robust prediction. The proposed deep regression model is applied to lossless predictive image coding, and it outperforms the state-of-the-art linear predictors by an appreciable margin. |
Tasks | |
Published | 2018-10-30 |
URL | http://arxiv.org/abs/1810.12568v1 |
PDF | http://arxiv.org/pdf/1810.12568v1.pdf |
PWC | https://paperswithcode.com/paper/nonlinear-prediction-of-multidimensional |
Repo | |
Framework | |
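The three first-stage objectives named in the abstract and a simple learned second-stage combiner can be written down directly; the PredNet architecture and its causal-context input are not reproduced, and the softmax-weighted blend below is only one plausible way to combine the three predictions.

```python
# The l1 / l2 / l_inf regression objectives, plus a toy learned combiner for stage two.
import torch
import torch.nn.functional as F

def l1_loss(pred, target):   return F.l1_loss(pred, target)
def l2_loss(pred, target):   return F.mse_loss(pred, target)
def linf_loss(pred, target): return (pred - target).abs().amax()

class Combiner(torch.nn.Module):
    """Blend the three PredNet outputs with learned softmax weights."""
    def __init__(self):
        super().__init__()
        self.logits = torch.nn.Parameter(torch.zeros(3))
    def forward(self, p1, p2, pinf):
        w = torch.softmax(self.logits, dim=0)
        return w[0] * p1 + w[1] * p2 + w[2] * pinf

preds = [torch.rand(16, 1) for _ in range(3)]   # stand-ins for the three PredNet outputs
target = torch.rand(16, 1)
loss = l2_loss(Combiner()(*preds), target)
loss.backward()
```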
Pushing the boundaries of audiovisual word recognition using Residual Networks and LSTMs
Title | Pushing the boundaries of audiovisual word recognition using Residual Networks and LSTMs |
Authors | Themos Stafylakis, Muhammad Haris Khan, Georgios Tzimiropoulos |
Abstract | Visual and audiovisual speech recognition are witnessing a renaissance which is largely due to the advent of deep learning methods. In this paper, we present a deep learning architecture for lipreading and audiovisual word recognition, which combines Residual Networks equipped with spatiotemporal input layers and Bidirectional LSTMs. The lipreading architecture attains an 11.92% misclassification rate on the challenging Lipreading-In-The-Wild database, which is composed of excerpts from BBC-TV, each containing one of the 500 target words. Audiovisual experiments are performed using both intermediate and late integration, as well as several types and levels of environmental noise, and notable improvements over the audio-only network are reported, even in the case of clean speech. A further analysis on the utility of target word boundaries is provided, as well as on the capacity of the network to model the linguistic context of the target word. Finally, we examine difficult word pairs and discuss how visual information helps towards attaining higher recognition accuracy. |
Tasks | Lipreading, Speech Recognition |
Published | 2018-11-03 |
URL | http://arxiv.org/abs/1811.01194v1 |
PDF | http://arxiv.org/pdf/1811.01194v1.pdf |
PWC | https://paperswithcode.com/paper/pushing-the-boundaries-of-audiovisual-word |
Repo | |
Framework | |
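A stripped-down skeleton of the described lipreading branch (spatiotemporal front-end, bidirectional LSTM, 500-way word classifier) is sketched below; the ResNet trunk, the audio branch and all hyperparameters are omitted or replaced with placeholder sizes.

```python
# Minimal lipreading skeleton: 3D-conv front-end -> BiLSTM over frames -> word classifier.
import torch
import torch.nn as nn

class LipreadNet(nn.Module):
    def __init__(self, num_words=500, feat=128):
        super().__init__()
        self.frontend = nn.Sequential(
            nn.Conv3d(1, feat, kernel_size=(5, 7, 7), stride=(1, 2, 2), padding=(2, 3, 3)),
            nn.BatchNorm3d(feat), nn.ReLU(),
            nn.AdaptiveAvgPool3d((None, 1, 1)),     # pool away space, keep the time axis
        )
        self.lstm = nn.LSTM(feat, 256, num_layers=2, bidirectional=True, batch_first=True)
        self.head = nn.Linear(2 * 256, num_words)

    def forward(self, x):                           # x: (batch, 1, time, H, W)
        f = self.frontend(x).squeeze(-1).squeeze(-1).transpose(1, 2)  # (batch, time, feat)
        out, _ = self.lstm(f)
        return self.head(out.mean(dim=1))           # average over time, then classify

logits = LipreadNet()(torch.randn(2, 1, 29, 88, 88))   # 29 grayscale mouth-ROI frames
print(logits.shape)                                     # torch.Size([2, 500])
```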
Predicting the Argumenthood of English Prepositional Phrases
Title | Predicting the Argumenthood of English Prepositional Phrases |
Authors | Najoung Kim, Kyle Rawlins, Benjamin Van Durme, Paul Smolensky |
Abstract | Distinguishing between arguments and adjuncts of a verb is a longstanding, nontrivial problem. In natural language processing, argumenthood information is important in tasks such as semantic role labeling (SRL) and prepositional phrase (PP) attachment disambiguation. In theoretical linguistics, many diagnostic tests for argumenthood exist but they often yield conflicting and potentially gradient results. This is especially the case for syntactically oblique items such as PPs. We propose two PP argumenthood prediction tasks branching from these two motivations: (1) binary argument-adjunct classification of PPs in VerbNet, and (2) gradient argumenthood prediction using human judgments as gold standard, and report results from prediction models that use pretrained word embeddings and other linguistically informed features. Our best results on each task are (1) $acc.=0.955$, $F_1=0.954$ (ELMo+BiLSTM) and (2) Pearson’s $r=0.624$ (word2vec+MLP). Furthermore, we demonstrate the utility of argumenthood prediction in improving sentence representations via performance gains on SRL when a sentence encoder is pretrained with our tasks. |
Tasks | Semantic Role Labeling, Word Embeddings |
Published | 2018-09-20 |
URL | http://arxiv.org/abs/1809.07889v4 |
PDF | http://arxiv.org/pdf/1809.07889v4.pdf |
PWC | https://paperswithcode.com/paper/predicting-the-argumenthood-of-english |
Repo | |
Framework | |
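A toy version of the binary argument/adjunct task can be set up with word-vector features and a linear classifier; the features and random data below are hypothetical stand-ins, not the ELMo+BiLSTM system whose scores are quoted in the abstract.

```python
# Toy PP-argumenthood classifier over placeholder verb/PP embedding features.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def featurize(verb_vec, pp_vec, pp_len):
    return np.concatenate([verb_vec, pp_vec, [pp_len, float(verb_vec @ pp_vec)]])

X = np.stack([featurize(rng.normal(size=50), rng.normal(size=50), rng.integers(2, 8))
              for _ in range(300)])
y = rng.integers(0, 2, size=300)            # 1 = argument, 0 = adjunct (fake labels)

clf = LogisticRegression(max_iter=1000).fit(X[:250], y[:250])
print("held-out accuracy:", clf.score(X[250:], y[250:]))
```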
The speaker-independent lipreading play-off; a survey of lipreading machines
Title | The speaker-independent lipreading play-off; a survey of lipreading machines |
Authors | Jake Burton, David Frank, Madhi Saleh, Nassir Navab, Helen L. Bear |
Abstract | Lipreading is a difficult gesture classification task. One problem in computer lipreading is speaker independence: achieving the same accuracy on test speakers not included in the training set as on speakers within the training set. The current literature on speaker-independent lipreading is limited, and the few independent test-speaker accuracy scores are usually aggregated with dependent test-speaker accuracies into an averaged performance figure, which leads to unclear independent results. Here we undertake a systematic survey of experiments with the TCD-TIMIT dataset using both conventional approaches and deep learning methods to provide a series of wholly speaker-independent benchmarks, and show that the best speaker-independent machine scores 69.58% accuracy with CNN features and an SVM classifier. This is lower than state-of-the-art speaker-dependent lipreading machines, but greater than previously reported in independence experiments. |
Tasks | Lipreading |
Published | 2018-10-24 |
URL | http://arxiv.org/abs/1810.10597v1 |
PDF | http://arxiv.org/pdf/1810.10597v1.pdf |
PWC | https://paperswithcode.com/paper/the-speaker-independent-lipreading-play-off-a |
Repo | |
Framework | |
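The best configuration reported here (CNN features fed to an SVM) maps onto a few lines of scikit-learn; random vectors stand in for the extracted CNN features, and the one-speaker hold-out below only illustrates what "speaker-independent" evaluation means.

```python
# CNN-features + SVM sketch with a wholly held-out test speaker.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
feats = rng.normal(size=(400, 512))         # placeholder CNN features, one row per sample
labels = rng.integers(0, 12, size=400)      # toy class IDs
speakers = rng.integers(0, 8, size=400)     # speaker ID per sample

train, test = speakers != 7, speakers == 7  # speaker 7 never seen during training
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
clf.fit(feats[train], labels[train])
print("speaker-independent accuracy:", clf.score(feats[test], labels[test]))
```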
Measuring and Characterizing Generalization in Deep Reinforcement Learning
Title | Measuring and Characterizing Generalization in Deep Reinforcement Learning |
Authors | Sam Witty, Jun Ki Lee, Emma Tosch, Akanksha Atrey, Michael Littman, David Jensen |
Abstract | Deep reinforcement-learning methods have achieved remarkable performance on challenging control tasks. Observations of the resulting behavior give the impression that the agent has constructed a generalized representation that supports insightful action decisions. We re-examine what is meant by generalization in RL, and propose several definitions based on an agent’s performance in on-policy, off-policy, and unreachable states. We propose a set of practical methods for evaluating agents with these definitions of generalization. We demonstrate these techniques on a common benchmark task for deep RL, and we show that the learned networks make poor decisions for states that differ only slightly from on-policy states, even though those states are not selected adversarially. Taken together, these results call into question the extent to which deep Q-networks learn generalized representations, and suggest that more experimentation and analysis is necessary before claims of representation learning can be supported. |
Tasks | Representation Learning |
Published | 2018-12-07 |
URL | http://arxiv.org/abs/1812.02868v2 |
PDF | http://arxiv.org/pdf/1812.02868v2.pdf |
PWC | https://paperswithcode.com/paper/measuring-and-characterizing-generalization |
Repo | |
Framework | |
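One cheap way to probe the "states that differ only slightly from on-policy states" finding is to measure how often a Q-network's greedy action flips under a small state perturbation; the network and states below are random placeholders rather than the paper's benchmark agent or its proposed evaluation protocol.

```python
# Greedy-action agreement between visited states and slightly perturbed copies of them.
import torch

q_net = torch.nn.Sequential(torch.nn.Linear(4, 64), torch.nn.ReLU(), torch.nn.Linear(64, 2))

def action_agreement(q_net, states, eps=0.05):
    with torch.no_grad():
        a_on  = q_net(states).argmax(dim=1)
        a_off = q_net(states + eps * torch.randn_like(states)).argmax(dim=1)
    return (a_on == a_off).float().mean().item()

on_policy_states = torch.randn(1000, 4)     # stand-in for states visited by the policy
print("greedy-action agreement:", action_agreement(q_net, on_policy_states))
```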
Are Automatic Methods for Cognate Detection Good Enough for Phylogenetic Reconstruction in Historical Linguistics?
Title | Are Automatic Methods for Cognate Detection Good Enough for Phylogenetic Reconstruction in Historical Linguistics? |
Authors | Taraka Rama, Johann-Mattis List, Johannes Wahle, Gerhard Jäger |
Abstract | We evaluate the performance of state-of-the-art algorithms for automatic cognate detection by comparing how useful automatically inferred cognates are for the task of phylogenetic inference compared to classical manually annotated cognate sets. Our findings suggest that phylogenies inferred from automated cognate sets come close to phylogenies inferred from expert-annotated ones, although on average, the latter are still superior. We conclude that future work on phylogenetic reconstruction can profit greatly from automatic cognate detection. Especially where scholars are merely interested in exploring the bigger picture of a language family’s phylogeny, algorithms for automatic cognate detection are a useful complement to current research on language phylogenies. |
Tasks | |
Published | 2018-04-15 |
URL | http://arxiv.org/abs/1804.05416v1 |
PDF | http://arxiv.org/pdf/1804.05416v1.pdf |
PWC | https://paperswithcode.com/paper/are-automatic-methods-for-cognate-detection |
Repo | |
Framework | |
Compressive Single-pixel Fourier Transform Imaging using Structured Illumination
Title | Compressive Single-pixel Fourier Transform Imaging using Structured Illumination |
Authors | Amirafshar Moshtaghpour, José M. Bioucas-Dias, Laurent Jacques |
Abstract | Single Pixel (SP) imaging is now a reality in many applications, e.g., biomedical ultrathin endoscopes and fluorescent spectroscopy. In this context, many schemes exist to improve the light throughput of these devices, e.g., using structured illumination driven by compressive sensing theory. In this work, we consider the combination of SP imaging with Fourier Transform Interferometry (SP-FTI) to reach high-resolution HyperSpectral (HS) imaging, as desirable, e.g., in fluorescent spectroscopy. While this association is not new, we here focus on optimizing the spatial illumination, structured as Hadamard patterns, during the optical path progression. We follow a variable-density sampling strategy for space-time coding of the light illumination, and show theoretically and numerically that this scheme allows us to reduce the number of measurements and the light exposure of the observed object compared to conventional compressive SP-FTI. |
Tasks | Compressive Sensing |
Published | 2018-10-31 |
URL | http://arxiv.org/abs/1810.13200v2 |
PDF | http://arxiv.org/pdf/1810.13200v2.pdf |
PWC | https://paperswithcode.com/paper/compressive-single-pixel-fourier-transform |
Repo | |
Framework | |
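Variable-density selection of Hadamard illumination patterns can be sketched as follows; the decaying density profile and the 1-D "scene" are arbitrary assumptions, not the optimized space-time coding derived in the paper.

```python
# Pick Hadamard rows with a non-uniform density, then simulate single-pixel measurements.
import numpy as np
from scipy.linalg import hadamard

n = 64                                    # pattern length (power of two)
H = hadamard(n)                           # rows are +/-1 illumination patterns
density = 1.0 / (1.0 + np.arange(n))      # assumed profile favouring low-index rows
density /= density.sum()

rng = np.random.default_rng(0)
m = 16                                    # number of measurements (4x compression)
rows = rng.choice(n, size=m, replace=False, p=density)
patterns = H[rows]

scene = rng.normal(size=n)                # toy 1-D object
measurements = patterns @ scene           # one scalar reading per illumination pattern
```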
Single-channel Speech Dereverberation via Generative Adversarial Training
Title | Single-channel Speech Dereverberation via Generative Adversarial Training |
Authors | Chenxing Li, Tieqiang Wang, Shuang Xu, Bo Xu |
Abstract | In this paper, we propose a single-channel speech dereverberation system (DeReGAT) based on a convolutional, bidirectional long short-term memory and deep feed-forward neural network (CBLDNN) with generative adversarial training (GAT). In order to obtain better speech quality instead of only minimizing a mean square error (MSE), GAT is employed to make the dereverberated speech indistinguishable from the clean samples. Besides, our system can deal with a wide range of reverberation conditions and adapts well to varying environments. The experimental results show that the proposed model outperforms weighted prediction error (WPE) and deep neural network-based systems. In addition, DeReGAT is extended to an online speech dereverberation scenario, which achieves performance comparable to the offline case. |
Tasks | |
Published | 2018-06-25 |
URL | http://arxiv.org/abs/1806.09325v1 |
PDF | http://arxiv.org/pdf/1806.09325v1.pdf |
PWC | https://paperswithcode.com/paper/single-channel-speech-dereverberation-via |
Repo | |
Framework | |
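The generative adversarial training idea, MSE to the clean target plus a "fool the discriminator" term, can be reduced to a single toy update step; the two small MLPs, the spectrogram shapes and the 0.01 weighting are placeholders, not the paper's CBLDNN system.

```python
# One adversarial-training step for a toy dereverberation network.
import torch
import torch.nn as nn

G = nn.Sequential(nn.Linear(257, 512), nn.ReLU(), nn.Linear(512, 257))  # dereverberator
D = nn.Sequential(nn.Linear(257, 256), nn.ReLU(), nn.Linear(256, 1))    # discriminator
opt_g = torch.optim.Adam(G.parameters(), lr=1e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-4)
bce = nn.BCEWithLogitsLoss()

reverb, clean = torch.randn(32, 257), torch.randn(32, 257)   # toy spectrogram frames

# Discriminator step: clean frames -> 1, dereverberated frames -> 0.
d_loss = bce(D(clean), torch.ones(32, 1)) + bce(D(G(reverb).detach()), torch.zeros(32, 1))
opt_d.zero_grad(); d_loss.backward(); opt_d.step()

# Generator step: MSE to the clean target plus the adversarial term.
enhanced = G(reverb)
g_loss = nn.functional.mse_loss(enhanced, clean) + 0.01 * bce(D(enhanced), torch.ones(32, 1))
opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```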
Real-Time Human Detection as an Edge Service Enabled by a Lightweight CNN
Title | Real-Time Human Detection as an Edge Service Enabled by a Lightweight CNN |
Authors | Seyed Yahya Nikouei, Yu Chen, Sejun Song, Ronghua Xu, Baek-Young Choi, Timothy R. Faughnan |
Abstract | Edge computing allows more computing tasks to take place on decentralized nodes at the edge of networks. Today many delay-sensitive, mission-critical applications can leverage these edge devices to reduce the time delay or even to enable real-time, online decision making thanks to their on-site presence. Human object detection, behavior recognition and prediction in smart surveillance fall into that category, where transmitting a huge volume of video streaming data can take valuable time and place heavy pressure on communication networks. It is widely recognized that video processing and object detection are computing-intensive and too expensive to be handled by resource-limited edge devices. Inspired by depthwise separable convolution and the Single Shot Multi-Box Detector (SSD), a lightweight Convolutional Neural Network (L-CNN) is introduced in this paper. By narrowing down the classifier’s searching space to focus on human objects in surveillance video frames, the proposed L-CNN algorithm is able to detect pedestrians with a computation workload affordable to an edge device. A prototype has been implemented on an edge node (Raspberry Pi 3) using OpenCV libraries, and satisfactory performance is achieved using real-world surveillance video streams. The experimental study has validated the design of the L-CNN and shown that it is a promising approach for computing-intensive applications at the edge. |
Tasks | Decision Making, Human Detection, Object Detection |
Published | 2018-04-24 |
URL | http://arxiv.org/abs/1805.00330v1 |
PDF | http://arxiv.org/pdf/1805.00330v1.pdf |
PWC | https://paperswithcode.com/paper/real-time-human-detection-as-an-edge-service |
Repo | |
Framework | |
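The depthwise separable convolution the L-CNN builds on factorizes a standard k×k convolution into a per-channel k×k step and a 1×1 pointwise step, reducing the cost from k²·Cin·Cout to k²·Cin + Cin·Cout multiply-adds per output position. The block below is a generic sketch with illustrative channel sizes, not the paper's exact layer configuration.

```python
# Depthwise separable convolution block: per-channel 3x3 conv, then 1x1 pointwise conv.
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        self.depthwise = nn.Conv2d(in_ch, in_ch, 3, stride=stride, padding=1,
                                   groups=in_ch, bias=False)      # one filter per channel
        self.pointwise = nn.Conv2d(in_ch, out_ch, 1, bias=False)  # mix channels with 1x1
        self.bn1, self.bn2 = nn.BatchNorm2d(in_ch), nn.BatchNorm2d(out_ch)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        x = self.act(self.bn1(self.depthwise(x)))
        return self.act(self.bn2(self.pointwise(x)))

block = DepthwiseSeparableConv(32, 64, stride=2)
print(block(torch.randn(1, 32, 128, 128)).shape)   # torch.Size([1, 64, 64, 64])
```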
CaTDet: Cascaded Tracked Detector for Efficient Object Detection from Video
Title | CaTDet: Cascaded Tracked Detector for Efficient Object Detection from Video |
Authors | Huizi Mao, Taeyoung Kong, William J. Dally |
Abstract | Detecting objects in a video is a compute-intensive task. In this paper we propose CaTDet, a system to speed up object detection by leveraging the temporal correlation in video. CaTDet consists of two DNN models that form a cascaded detector, and an additional tracker to predict regions of interest based on historic detections. We also propose a new metric, mean Delay (mD), which is designed for latency-critical video applications. Experiments on the KITTI dataset show that CaTDet reduces the operation count by 5.1-8.7x with the same mean Average Precision (mAP) as the single-model Faster R-CNN detector, while incurring an additional delay of 0.3 frames. On the CityPersons dataset, CaTDet achieves a 13.0x reduction in operations with a 0.8% mAP loss. |
Tasks | Object Detection |
Published | 2018-09-30 |
URL | http://arxiv.org/abs/1810.00434v2 |
PDF | http://arxiv.org/pdf/1810.00434v2.pdf |
PWC | https://paperswithcode.com/paper/catdet-cascaded-tracked-detector-for |
Repo | |
Framework | |
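The tracker's role of proposing regions of interest from historic detections can be illustrated with a constant-velocity extrapolation of last-frame boxes; the box format and padding margin are simplifying assumptions, not CaTDet's actual tracker or cascade.

```python
# Extrapolate matched boxes (x1, y1, x2, y2) one frame ahead and pad them into ROIs.
import numpy as np

def predict_rois(prev_boxes, prev_prev_boxes, margin=0.2):
    boxes = 2 * prev_boxes - prev_prev_boxes        # constant-velocity extrapolation
    w = boxes[:, 2] - boxes[:, 0]
    h = boxes[:, 3] - boxes[:, 1]
    pad = margin * np.stack([-w, -h, w, h], axis=1)
    return boxes + pad                              # expensive detector runs only here

prev_prev = np.array([[100., 100., 150., 200.]])
prev      = np.array([[110., 100., 160., 200.]])    # object moving right ~10 px/frame
print(predict_rois(prev, prev_prev))                # [[110.  80. 180. 220.]]
```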