Paper Group ANR 401
Variational Autoencoders for Opponent Modeling in Multi-Agent Systems. Show, Recall, and Tell: Image Captioning with Recall Mechanism. Sample Complexity of Incentivized Exploration. Explainable Artificial Intelligence and Machine Learning: A reality rooted perspective. Unsupervised deep clustering for predictive texture pattern discovery in medical …
Variational Autoencoders for Opponent Modeling in Multi-Agent Systems
Title | Variational Autoencoders for Opponent Modeling in Multi-Agent Systems |
Authors | Georgios Papoudakis, Stefano V. Albrecht |
Abstract | Multi-agent systems exhibit complex behaviors that emanate from the interactions of multiple agents in a shared environment. In this work, we are interested in controlling one agent in a multi-agent system and learning to interact successfully with the other agents, which have fixed policies. Modeling the behavior of other agents (opponents) is essential in understanding the interactions of the agents in the system. By taking advantage of recent advances in unsupervised learning, we propose modeling opponents using variational autoencoders. Additionally, many existing methods in the literature assume that the opponent models have access to the opponent's observations and actions during both training and execution. To eliminate this assumption, we propose a modification that attempts to identify the underlying opponent model using only local information of our agent, such as its observations, actions, and rewards. The experiments indicate that our opponent modeling methods achieve equal or greater episodic returns in reinforcement learning tasks compared with another modeling method. |
Tasks | |
Published | 2020-01-29 |
URL | https://arxiv.org/abs/2001.10829v1 |
https://arxiv.org/pdf/2001.10829v1.pdf | |
PWC | https://paperswithcode.com/paper/variational-autoencoders-for-opponent-1 |
Repo | |
Framework | |
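The paper's core idea is to encode an opponent's behavior into a latent variable with a variational autoencoder. Below is a minimal sketch of the VAE encoding step only (Gaussian latent plus reparameterization); the `encode` stand-in, weight shapes, and all names are hypothetical, not the authors' architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(trajectory, W_mu, W_logvar):
    """Map a flattened opponent trajectory to a Gaussian latent (mean, log-variance)."""
    h = np.tanh(trajectory)  # stand-in for a learned recurrent encoder
    return h @ W_mu, h @ W_logvar

def reparameterize(mu, logvar, rng):
    """Sample z = mu + sigma * eps so gradients can flow through mu and logvar."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * logvar) * eps

obs_dim, latent_dim = 8, 2
W_mu = rng.standard_normal((obs_dim, latent_dim)) * 0.1
W_logvar = rng.standard_normal((obs_dim, latent_dim)) * 0.1

trajectory = rng.standard_normal(obs_dim)  # one opponent observation-action window
mu, logvar = encode(trajectory, W_mu, W_logvar)
z = reparameterize(mu, logvar, rng)        # latent opponent representation fed to the policy
```

In the paper's second variant, the encoder input would be the controlled agent's local observations, actions, and rewards rather than the opponent's trajectory.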
Show, Recall, and Tell: Image Captioning with Recall Mechanism
Title | Show, Recall, and Tell: Image Captioning with Recall Mechanism |
Authors | Li Wang, Zechen Bai, Yonghua Zhang, Hongtao Lu |
Abstract | Generating natural and accurate descriptions in image captioning has always been a challenge. In this paper, we propose a novel recall mechanism to imitate the way humans conduct captioning. There are three parts in our recall mechanism: a recall unit, a semantic guide (SG), and recalled-word slots (RWS). The recall unit is a text-retrieval module designed to retrieve recalled words for images. SG and RWS are designed to make the best use of recalled words. The SG branch can generate a recalled context, which guides the process of generating the caption. The RWS branch is responsible for copying recalled words into the caption. Inspired by the pointing mechanism in text summarization, we adopt a soft switch to balance the generated-word probabilities between SG and RWS. In the CIDEr optimization step, we also introduce an individual recalled-word reward (WR) to boost training. Our proposed methods (SG+RWS+WR) achieve BLEU-4 / CIDEr / SPICE scores of 36.6 / 116.9 / 21.3 with cross-entropy loss and 38.7 / 129.1 / 22.4 with CIDEr optimization on the MSCOCO Karpathy test split, surpassing the results of other state-of-the-art methods. |
Tasks | Image Captioning, Text Summarization |
Published | 2020-01-15 |
URL | https://arxiv.org/abs/2001.05876v1 |
https://arxiv.org/pdf/2001.05876v1.pdf | |
PWC | https://paperswithcode.com/paper/show-recall-and-tell-image-captioning-with |
Repo | |
Framework | |
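The soft switch borrowed from pointing mechanisms blends two word distributions: one from the generator (SG) and one that copies retrieved words (RWS). A minimal sketch of that mixing step, with a hypothetical 4-word vocabulary:

```python
import numpy as np

def mix_word_probs(p_sg, p_rws, switch):
    """Blend the semantic-guide and recalled-word-slot distributions
    with a soft switch in [0, 1]; the result remains a valid distribution."""
    p_sg = np.asarray(p_sg, dtype=float)
    p_rws = np.asarray(p_rws, dtype=float)
    return switch * p_sg + (1.0 - switch) * p_rws

# Tiny vocabulary: the switch decides how much to trust retrieved (recalled) words.
p_sg = [0.7, 0.1, 0.1, 0.1]     # generator's distribution
p_rws = [0.0, 0.9, 0.05, 0.05]  # copy distribution over recalled words
p = mix_word_probs(p_sg, p_rws, switch=0.4)
```

In the paper the switch value itself is predicted per decoding step; here it is fixed only for illustration.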
Sample Complexity of Incentivized Exploration
Title | Sample Complexity of Incentivized Exploration |
Authors | Mark Sellke, Aleksandrs Slivkins |
Abstract | We consider incentivized exploration: a version of multi-armed bandits where the choice of actions is controlled by self-interested agents, and the algorithm can only issue recommendations. The algorithm controls the flow of information, and the information asymmetry can incentivize the agents to explore. Prior work matches the optimal regret rates for bandits up to “constant” multiplicative factors determined by the Bayesian prior. However, the dependence on the prior in prior work could be arbitrarily large, and the dependence on the number of arms K could be exponential. The optimal dependence on the prior and K is very unclear. We make progress on these issues. Our first result is that Thompson sampling is incentive-compatible if initialized with enough data points. Thus, we reduce the problem of designing incentive-compatible algorithms to that of sample complexity: (i) How many data points are needed to incentivize Thompson sampling? (ii) How many rounds does it take to collect these samples? We address both questions, providing upper bounds on sample complexity that are typically polynomial in K and lower bounds that are polynomially matching. |
Tasks | Multi-Armed Bandits |
Published | 2020-02-03 |
URL | https://arxiv.org/abs/2002.00558v1 |
https://arxiv.org/pdf/2002.00558v1.pdf | |
PWC | https://paperswithcode.com/paper/sample-complexity-of-incentivized-exploration |
Repo | |
Framework | |
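The paper's first result is that Thompson sampling becomes incentive-compatible once seeded with enough bootstrap data. A minimal Bernoulli-bandit sketch of Thompson sampling initialized with pre-collected success/failure counts (the counts and arm setup are illustrative, not from the paper):

```python
import random

def thompson_step(successes, failures, rng):
    """Sample a mean-reward estimate per arm from its Beta posterior
    and recommend the arm with the highest sample."""
    samples = [rng.betavariate(1 + s, 1 + f) for s, f in zip(successes, failures)]
    return max(range(len(samples)), key=samples.__getitem__)

rng = random.Random(0)
# Bootstrap data collected before recommendations begin: arm 1 looks clearly better.
successes = [2, 40]
failures = [8, 10]
picks = [thompson_step(successes, failures, rng) for _ in range(200)]
frac_arm1 = sum(p == 1 for p in picks) / len(picks)
```

With enough initial samples the posterior concentrates, so following the recommendation is (informally) in each agent's interest, which is the incentive-compatibility intuition the paper formalizes.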
Explainable Artificial Intelligence and Machine Learning: A reality rooted perspective
Title | Explainable Artificial Intelligence and Machine Learning: A reality rooted perspective |
Authors | Frank Emmert-Streib, Olli Yli-Harja, Matthias Dehmer |
Abstract | We are used to the availability of big data generated in nearly all fields of science as a consequence of technological progress. However, the analysis of such data poses vast challenges. One of these relates to the explainability of artificial intelligence (AI) or machine learning methods. Currently, many such methods are non-transparent with respect to their working mechanism and for this reason are called black box models, most notably deep learning methods. However, it has been realized that this creates severe problems for a number of fields, including the health sciences and criminal justice, and arguments have been brought forward in favor of an explainable AI. In this paper, we do not assume the usual perspective, presenting explainable AI as it should be, but rather we provide a discussion of what explainable AI can be. The difference is that we do not present wishful thinking but reality-grounded properties in relation to a scientific theory beyond physics. |
Tasks | |
Published | 2020-01-26 |
URL | https://arxiv.org/abs/2001.09464v1 |
https://arxiv.org/pdf/2001.09464v1.pdf | |
PWC | https://paperswithcode.com/paper/explainable-artificial-intelligence-and |
Repo | |
Framework | |
Unsupervised deep clustering for predictive texture pattern discovery in medical images
Title | Unsupervised deep clustering for predictive texture pattern discovery in medical images |
Authors | Matthias Perkonigg, Daniel Sobotka, Ahmed Ba-Ssalamah, Georg Langs |
Abstract | Predictive marker patterns in imaging data are a means to quantify disease and progression, but their identification is challenging if the underlying biology is poorly understood. Here, we present a method to identify predictive texture patterns in medical images in an unsupervised way. Based on deep clustering networks, we simultaneously encode and cluster medical image patches in a low-dimensional latent space. The resulting clusters serve as features for disease staging, linking them to the underlying disease. We evaluate the method on 70 T1-weighted magnetic resonance images of patients with different stages of liver steatosis. The deep clustering approach is able to find predictive clusters with a stable ranking, differentiating between low and high steatosis with an F1-Score of 0.78. |
Tasks | |
Published | 2020-01-31 |
URL | https://arxiv.org/abs/2002.03721v1 |
https://arxiv.org/pdf/2002.03721v1.pdf | |
PWC | https://paperswithcode.com/paper/unsupervised-deep-clustering-for-predictive |
Repo | |
Framework | |
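The method's two ingredients are an encoder that maps image patches to a low-dimensional latent space and a clustering step on those latents. A toy sketch below: the "encoder" is a fixed random projection rather than a trained deep network, the clustering is plain Lloyd's k-means with farthest-point initialization, and the synthetic "texture" populations are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(patches, W):
    """Stand-in encoder: project image patches into a low-dimensional latent space."""
    return np.tanh(patches @ W)

def kmeans(z, k, iters=20):
    """Plain Lloyd's k-means with farthest-point initialization."""
    centers = [z[0]]
    for _ in range(k - 1):
        d = np.min([((z - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(z[int(np.argmax(d))])
    centers = np.array(centers)
    labels = np.zeros(len(z), dtype=int)
    for _ in range(iters):
        labels = np.argmin(((z[:, None, :] - centers[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = z[labels == j].mean(axis=0)
    return labels

# Two synthetic "texture" populations of 16-pixel patches.
patches = np.vstack([rng.normal(0, 0.1, (30, 16)), rng.normal(3, 0.1, (30, 16))])
W = rng.standard_normal((16, 3)) * 0.5
labels = kmeans(encode(patches, W), k=2)
```

In the actual deep clustering networks, encoder and cluster assignments are optimized jointly; this sketch only separates the two steps to show the pipeline shape.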
CNN-based InSAR Coherence Classification
Title | CNN-based InSAR Coherence Classification |
Authors | Subhayan Mukherjee, Aaron Zimmer, Xinyao Sun, Parwant Ghuman, Irene Cheng |
Abstract | Interferometric Synthetic Aperture Radar (InSAR) imagery based on microwaves reflected off ground targets is becoming increasingly important in remote sensing for ground movement estimation. However, the reflections are contaminated by noise, which distorts the signal’s wrapped phase. Demarcation of image regions based on degree of contamination (“coherence”) is an important component of the InSAR processing pipeline. We introduce Convolutional Neural Networks (CNNs) to this problem domain and show their effectiveness in improving coherence-based demarcation and reducing misclassifications in completely incoherent regions through intelligent preprocessing of training data. Quantitative and qualitative comparisons prove superiority of proposed method over three established methods. |
Tasks | |
Published | 2020-01-20 |
URL | https://arxiv.org/abs/2001.06956v1 |
https://arxiv.org/pdf/2001.06956v1.pdf | |
PWC | https://paperswithcode.com/paper/cnn-based-insar-coherence-classification |
Repo | |
Framework | |
Fully Automated Hand Hygiene Monitoring in Operating Room using 3D Convolutional Neural Network
Title | Fully Automated Hand Hygiene Monitoring in Operating Room using 3D Convolutional Neural Network |
Authors | Minjee Kim, Joonmyeong Choi, Namkug Kim |
Abstract | Hand hygiene is one of the most significant factors in preventing hospital-acquired infections (HAI), which are often transmitted by medical staff in contact with patients in the operating room (OR). Hand hygiene monitoring could be important to investigate and reduce the outbreak of infections within the OR. However, an effective monitoring tool for hand hygiene compliance is difficult to develop due to the visual complexity of the OR scene. Recent progress in video understanding with convolutional neural networks (CNNs) has increased the application of recognition and detection of human actions. Leveraging this progress, we propose a fully automated hand hygiene monitoring tool for the alcohol-based hand rubbing action of anesthesiologists in OR video, using spatio-temporal features with a 3D CNN. First, regions of interest (ROIs) of the anesthesiologists' upper bodies were detected and cropped. A temporal smoothing filter was applied to the ROIs. Then, the ROIs were given to a 3D CNN and classified into two classes: rubbing hands or other actions. We observed that transfer learning from Kinetics-400 is beneficial and that the optical flow stream was not helpful on our dataset. The final accuracy, precision, recall, and F1 score in testing are 0.76, 0.85, 0.65, and 0.74, respectively. |
Tasks | Optical Flow Estimation, Transfer Learning, Video Understanding |
Published | 2020-03-20 |
URL | https://arxiv.org/abs/2003.09087v1 |
https://arxiv.org/pdf/2003.09087v1.pdf | |
PWC | https://paperswithcode.com/paper/fully-automated-hand-hygiene-monitoringin |
Repo | |
Framework | |
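The pipeline applies a temporal smoothing filter to the detected ROIs before cropping clips for the 3D CNN. The paper does not specify the filter; a plausible minimal version is a moving average over per-frame box coordinates, sketched here on an invented jittery detection track:

```python
import numpy as np

def smooth_rois(boxes, window=5):
    """Moving-average temporal smoothing of per-frame ROI boxes (x, y, w, h)
    to suppress detector jitter before cropping clips for the 3D CNN."""
    boxes = np.asarray(boxes, dtype=float)
    half = window // 2
    out = np.empty_like(boxes)
    for t in range(len(boxes)):
        lo, hi = max(0, t - half), min(len(boxes), t + half + 1)
        out[t] = boxes[lo:hi].mean(axis=0)  # truncated window at sequence edges
    return out

# A jittery track: a constant box whose x-coordinate alternates +/-3 pixels.
track = np.tile([100.0, 50.0, 64.0, 64.0], (10, 1))
track[:, 0] += np.where(np.arange(10) % 2 == 0, 3.0, -3.0)
smoothed = smooth_rois(track, window=5)
```

Smoothing keeps the crop window stable across frames so the 3D CNN sees consistent spatial context rather than detector noise.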
ASR Error Correction and Domain Adaptation Using Machine Translation
Title | ASR Error Correction and Domain Adaptation Using Machine Translation |
Authors | Anirudh Mani, Shruti Palaskar, Nimshi Venkat Meripo, Sandeep Konam, Florian Metze |
Abstract | Off-the-shelf pre-trained Automatic Speech Recognition (ASR) systems are an increasingly viable service for companies of any size building speech-based products. While these ASR systems are trained on large amounts of data, domain mismatch is still an issue for many parties that want to use this service as-is, leading to suboptimal results for their task. We propose a simple technique to perform domain adaptation for ASR error correction via machine translation. The machine translation model is a strong candidate to learn a mapping from out-of-domain ASR errors to in-domain terms in the corresponding reference files. We use two off-the-shelf ASR systems in this work: Google ASR (commercial) and the ASPIRE model (open-source). We observe a 7% absolute improvement in word error rate and a 4-point absolute improvement in BLEU score in Google ASR output via our proposed method. We also evaluate ASR error correction via the downstream task of Speaker Diarization, which captures the speaker style, syntax, structure, and semantic improvements we obtain via ASR correction. |
Tasks | Domain Adaptation, Machine Translation, Speaker Diarization, Speech Recognition |
Published | 2020-03-13 |
URL | https://arxiv.org/abs/2003.07692v1 |
https://arxiv.org/pdf/2003.07692v1.pdf | |
PWC | https://paperswithcode.com/paper/asr-error-correction-and-domain-adaptation |
Repo | |
Framework | |
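The headline metric here is word error rate (WER), the edit distance between reference and hypothesis transcripts normalized by reference length. A self-contained implementation (the example sentences are invented):

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + insertions + deletions) / reference length,
    computed with a standard dynamic-programming edit distance over words."""
    ref, hyp = reference.split(), hypothesis.split()
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # deleting all reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # inserting all hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,      # deletion
                          d[i][j - 1] + 1,      # insertion
                          d[i - 1][j - 1] + cost)  # match or substitution
    return d[-1][-1] / len(ref)

wer = word_error_rate("the patient has a fever", "the patient has fever")
```

A "7% absolute improvement" means the post-correction hypothesis drops the WER by 0.07 (e.g. from 0.27 to 0.20), not by 7% relative.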
A Close Look at Deep Learning with Small Data
Title | A Close Look at Deep Learning with Small Data |
Authors | L. Brigato, L. Iocchi |
Abstract | In this work, we perform a wide variety of experiments with different deep learning architectures in small data conditions. We show that model complexity is a critical factor when only a few samples per class are available. In contrast to the literature, we improve the state of the art using low-complexity models. We show that standard convolutional neural networks with relatively few parameters are effective in this scenario. In many of our experiments, low-complexity models outperform state-of-the-art architectures. Moreover, we propose a novel network that uses an unsupervised loss to regularize its training. Such an architecture either improves the results or performs comparably to low-capacity networks. Surprisingly, experiments show that a dynamic data augmentation pipeline is not beneficial in this particular domain. Statically augmenting the dataset might be a promising research direction, while dropout maintains its role as a good regularizer. |
Tasks | Data Augmentation |
Published | 2020-03-28 |
URL | https://arxiv.org/abs/2003.12843v2 |
https://arxiv.org/pdf/2003.12843v2.pdf | |
PWC | https://paperswithcode.com/paper/a-close-look-at-deep-learning-with-small-data |
Repo | |
Framework | |
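The distinction the abstract draws is between dynamic augmentation (random transforms sampled on the fly each epoch) and static augmentation (materializing the augmented copies once, enlarging the dataset). A minimal sketch of the static variant with a single transform (horizontal flip); the toy arrays are illustrative:

```python
import numpy as np

def static_augment(images, labels):
    """Pre-materialize horizontally flipped copies once, instead of sampling
    random augmentations on the fly at every epoch."""
    flipped = images[:, :, ::-1]  # flip each image along its width axis
    return np.concatenate([images, flipped]), np.concatenate([labels, labels])

rng = np.random.default_rng(0)
images = rng.random((10, 8, 8))  # 10 toy grayscale images
labels = np.arange(10) % 2
aug_images, aug_labels = static_augment(images, labels)
```

With static augmentation every epoch sees the same fixed, enlarged dataset, which is the regime the authors suggest exploring for small-data training.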
4D Association Graph for Realtime Multi-person Motion Capture Using Multiple Video Cameras
Title | 4D Association Graph for Realtime Multi-person Motion Capture Using Multiple Video Cameras |
Authors | Yuxiang Zhang, Liang An, Tao Yu, Xiu Li, Kun Li, Yebin Liu |
Abstract | This paper contributes a novel realtime multi-person motion capture algorithm using multiview video inputs. Due to the heavy occlusions in each view, joint optimization on the multiview images and multiple temporal frames is indispensable, which brings up the essential challenge of realtime efficiency. To this end, for the first time, we unify per-view parsing, cross-view matching, and temporal tracking into a single optimization framework, i.e., a 4D association graph in which each dimension (image space, viewpoint, and time) can be treated equally and simultaneously. To solve the 4D association graph efficiently, we further contribute the idea of 4D limb bundle parsing based on heuristic searching, followed by limb bundle assembling using a proposed bundle Kruskal's algorithm. Our method enables a realtime online motion capture system running at 30 fps using 5 cameras on a 5-person scene. Benefiting from the unified parsing, matching, and tracking constraints, our method is robust to noisy detections and achieves high-quality online pose reconstruction. The proposed method outperforms the state-of-the-art method quantitatively without using high-level appearance information. We also contribute a multiview video dataset synchronized with a marker-based motion capture system for scientific evaluation. |
Tasks | Motion Capture |
Published | 2020-02-28 |
URL | https://arxiv.org/abs/2002.12625v1 |
https://arxiv.org/pdf/2002.12625v1.pdf | |
PWC | https://paperswithcode.com/paper/4d-association-graph-for-realtime-multi |
Repo | |
Framework | |
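The "bundle Kruskal's algorithm" builds on classic Kruskal: greedily add the cheapest association that does not merge already-connected components. The paper's bundle-specific scoring is not reproduced here; this sketch shows only the underlying greedy step with union-find, on an invented toy graph:

```python
def kruskal(n, edges):
    """Plain Kruskal's MST over n nodes: sort edges by weight and union
    components greedily, skipping edges that would close a cycle."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    tree, total = [], 0.0
    for w, u, v in sorted(edges):
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            tree.append((u, v))
            total += w
    return tree, total

# Toy association graph: (weight, node_u, node_v).
edges = [(1.0, 0, 1), (2.0, 1, 2), (3.0, 0, 2), (0.5, 2, 3)]
tree, total = kruskal(4, edges)
```

In the paper's setting the "nodes" would be limb bundles across views and frames, and the edge weights association costs; the greedy cycle-free growth is what keeps the assembly realtime.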
Estimating Human Teleoperator Posture Using Only a Haptic-Input Device
Title | Estimating Human Teleoperator Posture Using Only a Haptic-Input Device |
Authors | Amir Yazdani, Roya Sabbagh Novin, Andrew Merryweather, Tucker Hermans |
Abstract | Ergonomic analysis of human posture plays a vital role in understanding long-term, work-related safety and health. Current analysis is often hindered due to difficulties in estimating human posture. We introduce a new approach to the problem of human posture estimation for teleoperation tasks which relies solely on a haptic-input device for generating observations. We model the human upper body using a redundant, partially observable dynamical system. This allows us to naturally formulate the estimation problem as probabilistic inference and solve the inference problem using a standard particle filter. We show that our approach accurately estimates the posture of different human users without knowing their specific segment lengths. We evaluate our posture estimation approach from a haptic-input device by comparing it with the human posture estimates from a commercial motion capture system. Our results show that the proposed algorithm successfully estimates human posture based only on the trajectory of the haptic-input device stylus. We additionally show that ergonomic risk estimates derived from our posture estimation approach are comparable to those estimates from gold-standard, motion-capture based pose estimates. |
Tasks | Motion Capture |
Published | 2020-02-24 |
URL | https://arxiv.org/abs/2002.10586v1 |
https://arxiv.org/pdf/2002.10586v1.pdf | |
PWC | https://paperswithcode.com/paper/estimating-human-teleoperator-posture-using |
Repo | |
Framework | |
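The estimation problem is posed as probabilistic inference over a partially observable system solved with a standard particle filter. A minimal bootstrap particle filter for a 1-D latent state (a stand-in for a joint angle) observed through noisy stylus-like measurements; the dimensions, noise levels, and dynamics are invented for illustration, not the paper's upper-body model:

```python
import numpy as np

rng = np.random.default_rng(1)

def particle_filter(observations, n_particles=500, proc_std=0.02, obs_std=0.2):
    """Bootstrap particle filter: propagate particles through the process model,
    weight by the observation likelihood, resample, and report the posterior mean."""
    particles = rng.normal(0.0, 1.0, n_particles)  # prior over the latent state
    estimates = []
    for y in observations:
        particles = particles + rng.normal(0.0, proc_std, n_particles)  # propagate
        w = np.exp(-0.5 * ((y - particles) / obs_std) ** 2)             # likelihood
        w /= w.sum()
        idx = rng.choice(n_particles, size=n_particles, p=w)            # resample
        particles = particles[idx]
        estimates.append(particles.mean())
    return estimates

true_state = 0.7
obs = true_state + rng.normal(0.0, 0.2, 30)  # noisy haptic-device measurements
est = particle_filter(obs)
```

The real system is redundant (many postures map to one stylus pose), which is exactly why a particle representation of the posterior is preferred over a point estimate.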
The Internet of Things as a Deep Neural Network
Title | The Internet of Things as a Deep Neural Network |
Authors | Rong Du, Sindri Magnússon, Carlo Fischione |
Abstract | An important task in the Internet of Things (IoT) is field monitoring, where multiple IoT nodes take measurements and communicate them to the base station or the cloud for processing, inference, and analysis. This communication becomes costly when the measurements are high-dimensional (e.g., videos or time-series data). IoT networks with limited bandwidth and low-power devices may not be able to support such frequent transmissions with high data rates. To ensure communication efficiency, this article proposes to model the measurement compression at the IoT nodes and the inference at the base station or cloud as a deep neural network (DNN). We propose a new framework in which the data transmitted from the nodes are the intermediate outputs of a layer of the DNN. We show how to learn the model parameters of the DNN and study the trade-off between the communication rate and the inference accuracy. The experimental results show that we can save approximately 96% of transmissions with only a 2.5% degradation in inference accuracy. Our findings have the potential to enable many new IoT data analysis applications that generate large amounts of measurements. |
Tasks | Time Series |
Published | 2020-03-23 |
URL | https://arxiv.org/abs/2003.10538v1 |
https://arxiv.org/pdf/2003.10538v1.pdf | |
PWC | https://paperswithcode.com/paper/the-internet-of-things-as-a-deep-neural |
Repo | |
Framework | |
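The framework splits one network across the radio link: the node runs the early layers (acting as a learned compressor) and transmits only the intermediate activation, which the cloud feeds through the remaining layers. A toy two-layer sketch of this split inference (shapes and weights are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# A toy 2-layer network: the first layer runs on the IoT node, the second in the cloud.
W1 = rng.standard_normal((16, 4)) * 0.5  # node side: compresses a 16-dim reading to 4 values
W2 = rng.standard_normal((4, 3)) * 0.5   # cloud side: inference head

def node_forward(x):
    """Runs on the device: only the 4-dim intermediate activation is transmitted."""
    return np.tanh(x @ W1)

def cloud_forward(h):
    """Runs at the base station / cloud on the received activation."""
    return h @ W2

x = rng.standard_normal(16)    # a high-dimensional sensor measurement
transmitted = node_forward(x)  # 4 floats over the air instead of 16
y = cloud_forward(transmitted)
```

The communication/accuracy trade-off is governed by where the cut is placed and how narrow the cut layer is; splitting changes nothing about the prediction itself, since the composed halves equal the monolithic network.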
Time-Varying Graph Learning with Constraints on Graph Temporal Variation
Title | Time-Varying Graph Learning with Constraints on Graph Temporal Variation |
Authors | Koki Yamada, Yuichi Tanaka, Antonio Ortega |
Abstract | We propose a novel framework for learning time-varying graphs from spatiotemporal measurements. Given an appropriate prior on the temporal behavior of signals, our proposed method can estimate time-varying graphs from a small number of available measurements. To achieve this, we introduce two regularization terms in convex optimization problems that constrain the sparseness of temporal variations of the time-varying networks. Moreover, a computationally scalable algorithm is introduced to solve the optimization problem efficiently. The experimental results with synthetic and real datasets (point cloud and temperature data) demonstrate that our proposed method outperforms the existing state-of-the-art methods. |
Tasks | |
Published | 2020-01-10 |
URL | https://arxiv.org/abs/2001.03346v1 |
https://arxiv.org/pdf/2001.03346v1.pdf | |
PWC | https://paperswithcode.com/paper/time-varying-graph-learning-with-constraints |
Repo | |
Framework | |
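The key ingredient is a regularizer on how much the graph changes between consecutive time steps. The paper's exact two penalty terms are not reproduced here; the sketch below shows the general shape of such a term, sum over t of a norm of W_t - W_{t-1}, where an l1 norm favors sparse edge changes and a squared l2 norm favors smooth drift. The toy matrices are invented:

```python
import numpy as np

def temporal_variation_penalty(weights, norm="l1"):
    """Regularizer on a sequence of graph weight matrices W_1..W_T:
    sum_t ||W_t - W_{t-1}|| penalizes edges changing between time steps,
    sparsely (l1) or smoothly (squared l2)."""
    total = 0.0
    for prev, cur in zip(weights, weights[1:]):
        diff = cur - prev
        total += np.abs(diff).sum() if norm == "l1" else (diff ** 2).sum()
    return total

W1 = np.array([[0.0, 1.0], [1.0, 0.0]])
W2 = np.array([[0.0, 1.0], [1.0, 0.0]])  # unchanged graph: costs nothing
W3 = np.array([[0.0, 0.5], [0.5, 0.0]])  # one edge weakens
penalty = temporal_variation_penalty([W1, W2, W3])
```

In the full method this term is added to a graph-learning data-fidelity objective and the whole problem is solved with a scalable convex solver.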
Fine tuning U-Net for ultrasound image segmentation: which layers?
Title | Fine tuning U-Net for ultrasound image segmentation: which layers? |
Authors | Mina Amiri, Rupert Brooks, Hassan Rivaz |
Abstract | Fine-tuning a network which has been trained on a large dataset is an alternative to full training in order to overcome the problem of scarce and expensive data in medical applications. While the shallow layers of the network are usually kept unchanged, deeper layers are modified according to the new dataset. This approach may not work for ultrasound images due to their drastically different appearance. In this study, we investigated the effect of fine-tuning different layers of a U-Net, which was trained on segmentation of natural images, for breast ultrasound image segmentation. Tuning the contracting part and fixing the expanding part resulted in substantially better results compared to fixing the contracting part and tuning the expanding part. Furthermore, we showed that starting to fine-tune the U-Net from the shallow layers and gradually including more layers leads to better performance than fine-tuning the network from the deep layers moving back to the shallow layers. We did not observe the same results on segmentation of X-ray images, which have different salient features compared to ultrasound; for ultrasound, it may therefore be more appropriate to fine-tune the shallow layers rather than the deep layers. Shallow layers learn lower-level features (including the speckle pattern, and probably the noise and artifact properties), which are critical for automatic segmentation in this modality. |
Tasks | Semantic Segmentation |
Published | 2020-02-19 |
URL | https://arxiv.org/abs/2002.08438v1 |
https://arxiv.org/pdf/2002.08438v1.pdf | |
PWC | https://paperswithcode.com/paper/fine-tuning-u-net-for-ultrasound-image |
Repo | |
Framework | |
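The study's shallow-first strategy amounts to a gradual unfreezing schedule: at each fine-tuning stage, one more group of layers (starting from the contracting path) becomes trainable. A framework-agnostic sketch of such a schedule; the block names and the three-stage split are hypothetical, not the paper's exact configuration:

```python
def gradual_unfreeze_schedule(layers, stages):
    """Return, for each fine-tuning stage, the list of trainable layers when
    unfreezing shallow-first: stage k trains the first k groups of layers,
    and the final stage trains everything."""
    per_stage = max(1, len(layers) // stages)
    schedule = []
    for k in range(1, stages + 1):
        cutoff = min(len(layers), k * per_stage) if k < stages else len(layers)
        schedule.append(layers[:cutoff])
    return schedule

# Hypothetical U-Net block names, shallow (contracting) layers first.
unet_blocks = ["enc1", "enc2", "enc3", "bottleneck", "dec3", "dec2", "dec1"]
schedule = gradual_unfreeze_schedule(unet_blocks, stages=3)
```

In a training framework, each stage would set the listed layers' parameters trainable (e.g. enabling gradients) and keep the rest frozen; the paper's finding is that this shallow-first ordering beats the conventional deep-first one for ultrasound.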
Tackling real noisy reverberant meetings with all-neural source separation, counting, and diarization system
Title | Tackling real noisy reverberant meetings with all-neural source separation, counting, and diarization system |
Authors | Keisuke Kinoshita, Marc Delcroix, Shoko Araki, Tomohiro Nakatani |
Abstract | Automatic meeting analysis is an essential fundamental technology required to let, e.g., smart devices follow and respond to our conversations. To achieve optimal automatic meeting analysis, we previously proposed an all-neural approach that jointly solves the source separation, speaker diarization, and source counting problems in an optimal way (in the sense that all three tasks can be jointly optimized through error back-propagation). It was shown that the method could handle simulated clean (noiseless and anechoic) dialog-like data well, and achieved very good performance in comparison with several conventional methods. However, it was not clear whether such an all-neural approach would generalize successfully to more complicated real meeting data containing more spontaneously-speaking speakers, severe noise, and reverberation, or how it performs in comparison with the state-of-the-art systems in such scenarios. In this paper, we first consider practical issues required for improving the robustness of the all-neural approach, and then experimentally show that, even in real meeting scenarios, the all-neural approach can perform effective speech enhancement and simultaneously outperform state-of-the-art systems. |
Tasks | Speaker Diarization, Speech Enhancement |
Published | 2020-03-09 |
URL | https://arxiv.org/abs/2003.03987v1 |
https://arxiv.org/pdf/2003.03987v1.pdf | |
PWC | https://paperswithcode.com/paper/tackling-real-noisy-reverberant-meetings-with |
Repo | |
Framework | |