Paper Group ANR 898
Transfer Learning with Deep CNNs for Gender Recognition and Age Estimation. GlymphVIS: Visualizing Glymphatic Transport Pathways Using Regularized Optimal Transport. A Fully Automated System for Sizing Nasal PAP Masks Using Facial Photographs. Joint Recognition of Handwritten Text and Named Entities with a Neural End-to-end Model. Supervised classification for object identification in urban areas using satellite imagery. …
Transfer Learning with Deep CNNs for Gender Recognition and Age Estimation
Title | Transfer Learning with Deep CNNs for Gender Recognition and Age Estimation |
Authors | Philip Smith, Cuixian Chen |
Abstract | In this project, competition-winning deep neural networks with pretrained weights are used for image-based gender recognition and age estimation. Transfer learning is explored using both VGG19 and VGGFace pretrained models by testing the effects of changes in various design schemes and training parameters in order to improve prediction accuracy. Training techniques such as input standardization, data augmentation, and label distribution age encoding are compared. Finally, a hierarchy of deep CNNs is tested that first classifies subjects by gender, and then uses separate male and female age models to predict age. A gender recognition accuracy of 98.7% and an MAE of 4.1 years are achieved. This paper shows that, with proper training techniques, good results can be obtained by retasking existing convolutional filters towards a new purpose. |
Tasks | Age Estimation, Data Augmentation, Transfer Learning |
Published | 2018-11-18 |
URL | http://arxiv.org/abs/1811.07344v1 |
http://arxiv.org/pdf/1811.07344v1.pdf | |
PWC | https://paperswithcode.com/paper/transfer-learning-with-deep-cnns-for-gender |
Repo | |
Framework | |
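The gender-then-age hierarchy described in the abstract above is easy to express in code. Below is a minimal PyTorch sketch, assuming a recent torchvision (for the `VGG19_Weights` API); the head sizes, the two gender-specific age regressors, and the routing logic are illustrative stand-ins, not the authors' released implementation.

```python
# Hypothetical sketch of a gender -> age CNN hierarchy with a pretrained VGG19
# backbone. Weights download on first use; all sizes are illustrative.
import torch
import torch.nn as nn
from torchvision import models

def make_head(out_dim):
    # Reuse the pretrained convolutional filters; swap only the last FC layer.
    net = models.vgg19(weights=models.VGG19_Weights.IMAGENET1K_V1)
    net.classifier[6] = nn.Linear(4096, out_dim)
    return net.eval()

gender_net = make_head(2)                                   # male / female classifier
age_nets = {"male": make_head(1), "female": make_head(1)}   # per-gender age regressors

def predict(images):
    with torch.no_grad():
        gender = gender_net(images).argmax(dim=1)           # 0 = male, 1 = female
        ages = torch.empty(len(images))
        for g, key in enumerate(["male", "female"]):
            idx = (gender == g).nonzero(as_tuple=True)[0]
            if len(idx):
                ages[idx] = age_nets[key](images[idx]).squeeze(1)
    return gender, ages

g, a = predict(torch.randn(4, 3, 224, 224))
print(g.shape, a.shape)    # torch.Size([4]) torch.Size([4])
```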
GlymphVIS: Visualizing Glymphatic Transport Pathways Using Regularized Optimal Transport
Title | GlymphVIS: Visualizing Glymphatic Transport Pathways Using Regularized Optimal Transport |
Authors | Rena Elkin, Saad Nadeem, Eldad Haber, Klara Steklova, Hedok Lee, Helene Benveniste, Allen Tannenbaum |
Abstract | The glymphatic system (GS) is a transit passage that facilitates brain metabolic waste removal and its dysfunction has been associated with neurodegenerative diseases such as Alzheimer’s disease. The GS has been studied by acquiring temporal contrast-enhanced magnetic resonance imaging (MRI) sequences of a rodent brain, and tracking the contrast agent injected into the cerebrospinal fluid as it flows through the GS. We present here a novel visualization framework, GlymphVIS, which uses regularized optimal transport (OT) to study the flow behavior between time points at which the images are taken. Using this regularized OT approach, we can incorporate diffusion, handle noise, and accurately capture and visualize the time-varying dynamics in GS transport. Moreover, we are able to reduce the registration mean-squared and infinity-norm error across time points by up to a factor of 5 as compared to the current state-of-the-art method. Our visualization pipeline yields flow patterns that align well with experts’ current findings of the glymphatic system. |
Tasks | |
Published | 2018-08-24 |
URL | http://arxiv.org/abs/1808.08304v2 |
http://arxiv.org/pdf/1808.08304v2.pdf | |
PWC | https://paperswithcode.com/paper/glymphvis-visualizing-glymphatic-transport |
Repo | |
Framework | |
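As a point of reference for the regularized optimal transport machinery mentioned above, the sketch below solves a generic entropy-regularized OT problem with Sinkhorn iterations. It is not the paper's advection-diffusion formulation; it only conveys the idea of computing a regularized transport plan between mass distributions at consecutive time points, with toy 1-D profiles standing in for image intensities.

```python
# Generic entropic OT (Sinkhorn) between two 1-D intensity profiles; the
# regularization strength eps and the toy profiles are arbitrary choices.
import numpy as np

def sinkhorn(mu, nu, cost, eps=0.05, n_iter=500):
    """Entropy-regularized transport plan between histograms mu and nu."""
    K = np.exp(-cost / eps)                 # Gibbs kernel from the cost matrix
    u = np.ones_like(mu)
    for _ in range(n_iter):
        v = nu / (K.T @ u)                  # alternating scaling updates
        u = mu / (K @ v)
    return u[:, None] * K * v[None, :]      # transport plan

x = np.linspace(0.0, 1.0, 50)
mu = np.exp(-(x - 0.3) ** 2 / 0.01); mu /= mu.sum()
nu = np.exp(-(x - 0.7) ** 2 / 0.01); nu /= nu.sum()
cost = (x[:, None] - x[None, :]) ** 2
plan = sinkhorn(mu, nu, cost)
print(round(plan.sum(), 4), plan.shape)     # total mass ~1.0, (50, 50)
```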
A Fully Automated System for Sizing Nasal PAP Masks Using Facial Photographs
Title | A Fully Automated System for Sizing Nasal PAP Masks Using Facial Photographs |
Authors | Benjamin Johnston, Philip de Chazal |
Abstract | We present a fully automated system for sizing nasal Positive Airway Pressure (PAP) masks. The system comprises a mix of HOG object detectors and multiple convolutional neural network stages for facial landmark detection. The models were trained using samples from the publicly available PUT and MUCT datasets, while transfer learning was also employed to improve the performance of the models on facial photographs of actual PAP mask users. The fully automated system demonstrated an overall accuracy of 64.71% in correctly selecting the appropriate mask size and 86.1% accuracy for sizing within 1 mask size. |
Tasks | Facial Landmark Detection, Transfer Learning |
Published | 2018-11-09 |
URL | http://arxiv.org/abs/1811.03773v1 |
http://arxiv.org/pdf/1811.03773v1.pdf | |
PWC | https://paperswithcode.com/paper/a-fully-automated-system-for-sizing-nasal-pap |
Repo | |
Framework | |
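The last step of the pipeline above, turning detected landmarks into a discrete mask size, can be illustrated with a toy rule. The landmark names, pixel-to-millimetre scale, and size thresholds below are invented for illustration and are not the values used in the paper.

```python
# Hypothetical landmark-to-mask-size rule; all numbers are placeholders.
import numpy as np

def nose_width_mm(landmarks, px_per_mm):
    left = np.asarray(landmarks["nose_left"], dtype=float)
    right = np.asarray(landmarks["nose_right"], dtype=float)
    return np.linalg.norm(right - left) / px_per_mm

def mask_size(width_mm, thresholds=(34.0, 38.0)):
    if width_mm < thresholds[0]:
        return "small"
    return "medium" if width_mm < thresholds[1] else "large"

landmarks = {"nose_left": (412, 530), "nose_right": (468, 532)}   # detected points (px)
print(mask_size(nose_width_mm(landmarks, px_per_mm=1.6)))         # -> "medium"
```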
Joint Recognition of Handwritten Text and Named Entities with a Neural End-to-end Model
Title | Joint Recognition of Handwritten Text and Named Entities with a Neural End-to-end Model |
Authors | Manuel Carbonell, Mauricio Villegas, Alicia Fornés, Josep Lladós |
Abstract | When extracting information from handwritten documents, text transcription and named entity recognition are usually faced as separate subsequent tasks. This has the disadvantage that errors in the first module heavily affect the performance of the second module. In this work we propose to do both tasks jointly, using a single neural network with an architecture commonly used for plain text recognition. Experimentally, the work has been tested on a collection of historical marriage records. Results of experiments are presented to show the effect on the performance for different configurations: different ways of encoding the information, whether or not transfer learning is used, and processing at text-line or multi-line region level. The results are comparable to the state of the art reported in the ICDAR 2017 Information Extraction competition, even though the proposed technique does not use any dictionaries, language modeling or post-processing. |
Tasks | Language Modelling, Named Entity Recognition, Transfer Learning |
Published | 2018-03-16 |
URL | http://arxiv.org/abs/1803.06252v2 |
http://arxiv.org/pdf/1803.06252v2.pdf | |
PWC | https://paperswithcode.com/paper/joint-recognition-of-handwritten-text-and |
Repo | |
Framework | |
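One way to read the joint formulation above is that entity tags become extra symbols in the transcription target, so a single sequence model emits both text and entities. The encoding below is a minimal illustration; the actual tag set and the encoding variants compared in the paper may differ.

```python
# Toy target encoding: entity tags are injected into the character sequence.
def encode(words, tags):
    symbols = []
    for word, tag in zip(words, tags):
        if tag != "O":
            symbols.append(f"<{tag}>")      # opening tag as a single output symbol
        symbols.extend(word)                # the word's characters
        if tag != "O":
            symbols.append(f"</{tag}>")
        symbols.append(" ")
    return symbols[:-1]

print(encode(["Maria", "married", "Josep"], ["NAME", "O", "NAME"]))
```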
Supervised classification for object identification in urban areas using satellite imagery
Title | Supervised classification for object identification in urban areas using satellite imagery |
Authors | Hazrat Ali, Adnan Ali Awan, Sanaullah Khan, Omer Shafique, Atiq ur Rahman, Shahid Khan |
Abstract | This paper presents a useful method to achieve classification in satellite imagery. The approach is based on a pixel-level study employing various features such as correlation, homogeneity, energy, and contrast. In this study, gray-scale images are used for training the classification model. For supervised classification, two classification techniques are employed, namely the Support Vector Machine (SVM) and Naive Bayes. With textural features used for gray-scale images, Naive Bayes performs better with an overall accuracy of 76% compared to 68% achieved by SVM. The computational time is evaluated while performing the experiment with two different window sizes i.e., 50x50 and 70x70. The required computational time on a single image is found to be 27 seconds for a window size of 70x70 and 45 seconds for a window size of 50x50. |
Tasks | |
Published | 2018-08-02 |
URL | http://arxiv.org/abs/1808.00878v1 |
http://arxiv.org/pdf/1808.00878v1.pdf | |
PWC | https://paperswithcode.com/paper/supervised-classification-for-object |
Repo | |
Framework | |
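The features named in the abstract (correlation, homogeneity, energy, contrast) correspond to standard gray-level co-occurrence matrix (GLCM) texture statistics. The sketch below, assuming scikit-image ≥ 0.19 and scikit-learn, computes those statistics per window and fits Naive Bayes and SVM classifiers on toy data; the window size and GLCM settings are assumptions, not the paper's exact configuration.

```python
# GLCM texture features per gray-scale window, then NB / SVM classification.
import numpy as np
from skimage.feature import graycomatrix, graycoprops
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC

def window_features(patch):
    glcm = graycomatrix(patch, distances=[1], angles=[0],
                        levels=256, symmetric=True, normed=True)
    props = ("contrast", "homogeneity", "energy", "correlation")
    return [graycoprops(glcm, p)[0, 0] for p in props]

rng = np.random.default_rng(0)                       # toy 50x50 windows, fake labels
X = np.array([window_features(rng.integers(0, 256, (50, 50), dtype=np.uint8))
              for _ in range(40)])
y = rng.integers(0, 2, 40)
print(GaussianNB().fit(X, y).score(X, y), SVC().fit(X, y).score(X, y))
```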
Log-sum-exp neural networks and posynomial models for convex and log-log-convex data
Title | Log-sum-exp neural networks and posynomial models for convex and log-log-convex data |
Authors | Giuseppe C. Calafiore, Stephane Gaubert, Corrado Possieri |
Abstract | We show in this paper that a one-layer feedforward neural network with exponential activation functions in the inner layer and logarithmic activation in the output neuron is a universal approximator of convex functions. Such a network represents a family of scaled log-sum exponential functions, here named LSET. Under a suitable exponential transformation, the class of LSET functions maps to a family of generalized posynomials GPOST, which we similarly show to be universal approximators for log-log-convex functions. A key feature of an LSET network is that, once it is trained on data, the resulting model is convex in the variables, which makes it readily amenable to efficient design based on convex optimization. Similarly, once a GPOST model is trained on data, it yields a posynomial model that can be efficiently optimized with respect to its variables by using geometric programming (GP). The proposed methodology is illustrated by two numerical examples, in which, first, models are constructed from simulation data of the two physical processes (namely, the level of vibration in a vehicle suspension system, and the peak power generated by the combustion of propane), and then optimization-based design is performed on these models. |
Tasks | |
Published | 2018-06-20 |
URL | http://arxiv.org/abs/1806.07850v2 |
http://arxiv.org/pdf/1806.07850v2.pdf | |
PWC | https://paperswithcode.com/paper/log-sum-exp-neural-networks-and-posynomial |
Repo | |
Framework | |
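The architecture above has a closed form: with exponential hidden units and a logarithmic output, the network computes a (scaled) log-sum-exp of affine functions, which is convex in its input. A small numerical check with arbitrary parameters is sketched below; it illustrates the formula, not the paper's training procedure.

```python
# LSE network f(x) = T * log(sum_k exp((a_k . x + b_k) / T)); convex for T > 0.
import numpy as np

def lse_net(x, A, b, T=1.0):
    # x: (n, d) inputs, A: (K, d) inner-layer weights, b: (K,) biases
    z = (x @ A.T + b) / T
    return T * np.log(np.exp(z).sum(axis=1))

rng = np.random.default_rng(1)
A, b = rng.normal(size=(8, 2)), rng.normal(size=8)
x1, x2 = rng.normal(size=(1, 2)), rng.normal(size=(1, 2))
f = lse_net(np.vstack([x1, x2, (x1 + x2) / 2]), A, b)
print(f[2] <= 0.5 * (f[0] + f[1]))      # midpoint convexity holds (True)
```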
Sample Efficient Algorithms for Learning Quantum Channels in PAC Model and the Approximate State Discrimination Problem
Title | Sample Efficient Algorithms for Learning Quantum Channels in PAC Model and the Approximate State Discrimination Problem |
Authors | Kai-Min Chung, Han-Hsuan Lin |
Abstract | We generalize the PAC (probably approximately correct) learning model to the quantum world by generalizing the concepts from classical functions to quantum processes, defining the problem of \emph{PAC learning quantum process}, and study its sample complexity. In the problem of PAC learning quantum process, we want to learn an $\epsilon$-approximation of an unknown quantum process $c^*$ from a known finite concept class $C$ with probability $1-\delta$ using samples $\{(x_1,c^*(x_1)),(x_2,c^*(x_2)),\dots\}$, where $\{x_1,x_2,\dots\}$ are computational basis states sampled from an unknown distribution $D$ and $\{c^*(x_1),c^*(x_2),\dots\}$ are the (possibly mixed) quantum states outputted by $c^*$. The special case of PAC-learning quantum process under constant input reduces to a natural problem which we name approximate state discrimination, where we are given copies of an unknown quantum state $c^*$ from a known finite set $C$, and we want to learn with probability $1-\delta$ an $\epsilon$-approximation of $c^*$ with as few copies of $c^*$ as possible. We show that the problem of PAC learning quantum process can be solved with $$O\left(\frac{\log|C| + \log(1/\delta)}{\epsilon^2}\right)$$ samples when the outputs are pure states and $$O\left(\frac{\log^3|C|(\log|C|+\log(1/\delta))}{\epsilon^2}\right)$$ samples if the outputs can be mixed. Some implications of our results are that we can PAC-learn a polynomial-sized quantum circuit in polynomial samples and that approximate state discrimination can be solved in polynomial samples even when the concept class size $|C|$ is exponential in the number of qubits, an exponential improvement over full state tomography. |
Tasks | |
Published | 2018-10-25 |
URL | https://arxiv.org/abs/1810.10938v2 |
https://arxiv.org/pdf/1810.10938v2.pdf | |
PWC | https://paperswithcode.com/paper/on-the-sample-complexity-of-pac-learning |
Repo | |
Framework | |
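To get a feel for the bounds quoted above, the snippet below plugs numbers into the two sample-complexity expressions. Constants are omitted, so the outputs are order-of-magnitude illustrations only, not guarantees from the paper.

```python
# Evaluate the quoted sample-complexity expressions (constants dropped).
import math

def pure_bound(C, eps, delta):
    return (math.log(C) + math.log(1 / delta)) / eps ** 2

def mixed_bound(C, eps, delta):
    return math.log(C) ** 3 * (math.log(C) + math.log(1 / delta)) / eps ** 2

C = 2 ** 100     # a concept class exponential in the number of qubits
print(f"{pure_bound(C, 0.1, 0.01):.3g}", f"{mixed_bound(C, 0.1, 0.01):.3g}")
```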
Sampling and Inference for Beta Neutral-to-the-Left Models of Sparse Networks
Title | Sampling and Inference for Beta Neutral-to-the-Left Models of Sparse Networks |
Authors | Benjamin Bloem-Reddy, Adam Foster, Emile Mathieu, Yee Whye Teh |
Abstract | Empirical evidence suggests that heavy-tailed degree distributions occurring in many real networks are well-approximated by power laws with exponents $\eta$ that may take values either less than or greater than two. Models based on various forms of exchangeability are able to capture power laws with $\eta < 2$, and admit tractable inference algorithms; we draw on previous results to show that $\eta > 2$ cannot be generated by the forms of exchangeability used in existing random graph models. Preferential attachment models generate power law exponents greater than two, but have been of limited use as statistical models due to the inherent difficulty of performing inference in non-exchangeable models. Motivated by this gap, we design and implement inference algorithms for a recently proposed class of models that generates $\eta$ of all possible values. We show that although they are not exchangeable, these models have probabilistic structure amenable to inference. Our methods make a large class of previously intractable models useful for statistical inference. |
Tasks | |
Published | 2018-07-09 |
URL | http://arxiv.org/abs/1807.03113v1 |
http://arxiv.org/pdf/1807.03113v1.pdf | |
PWC | https://paperswithcode.com/paper/sampling-and-inference-for-beta-neutral-to |
Repo | |
Framework | |
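For readers less familiar with the degree behaviour discussed above, the toy simulator below generates a preferential-attachment-style degree sequence with a heavy tail. It is included only as background intuition; it is not the Beta Neutral-to-the-Left model or the inference algorithms developed in the paper.

```python
# Toy preferential attachment: each new edge attaches proportionally to degree.
import numpy as np

def preferential_attachment(n_edges, alpha=1.0, seed=0):
    rng = np.random.default_rng(seed)
    degrees = [1, 1]                              # start from a single edge
    for _ in range(n_edges - 1):
        probs = np.array(degrees, dtype=float) ** alpha
        target = rng.choice(len(degrees), p=probs / probs.sum())
        degrees[target] += 1
        degrees.append(1)                         # each edge also adds a new node
    return np.array(degrees)

deg = preferential_attachment(5000)
print(deg.max(), deg.mean().round(2))             # a few hubs dominate the tail
```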
What Makes Natural Scene Memorable?
Title | What Makes Natural Scene Memorable? |
Authors | Jiaxin Lu, Mai Xu, Ren Yang, Zulin Wang |
Abstract | Recent studies on image memorability have shed light on the visual features that make generic images, object images or face photographs memorable. However, a clear understanding and reliable estimation of natural scene memorability remain elusive. In this paper, we attempt to answer the question: “what exactly makes a natural scene memorable?”. Specifically, we first build LNSIM, a large-scale natural scene image memorability database (containing 2,632 images and memorability annotations). Then, we mine our database to investigate how low-, middle- and high-level handcrafted features affect the memorability of natural scenes. In particular, we find that the high-level feature of scene category is closely correlated with natural scene memorability. Thus, we propose a deep neural network-based natural scene memorability (DeepNSM) predictor, which takes advantage of scene category. Finally, the experimental results validate the effectiveness of DeepNSM. |
Tasks | |
Published | 2018-08-27 |
URL | http://arxiv.org/abs/1808.08754v1 |
http://arxiv.org/pdf/1808.08754v1.pdf | |
PWC | https://paperswithcode.com/paper/what-makes-natural-scene-memorable |
Repo | |
Framework | |
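A DeepNSM-style predictor that fuses generic CNN features with a scene-category representation can be sketched as below. The backbone, the 365-way scene-category vector (Places-style), and the fusion head are assumptions for illustration; the paper's exact architecture may differ.

```python
# Illustrative memorability regressor: CNN features + scene-category probabilities.
import torch
import torch.nn as nn
from torchvision import models

class MemorabilityNet(nn.Module):
    def __init__(self, n_scene_categories=365):
        super().__init__()
        backbone = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
        self.features = nn.Sequential(*list(backbone.children())[:-1])  # 512-d pooled features
        self.head = nn.Linear(512 + n_scene_categories, 1)

    def forward(self, image, scene_probs):
        f = self.features(image).flatten(1)
        return torch.sigmoid(self.head(torch.cat([f, scene_probs], dim=1)))

net = MemorabilityNet().eval()
score = net(torch.randn(2, 3, 224, 224), torch.rand(2, 365))
print(score.shape)       # (2, 1) memorability scores in [0, 1]
```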
Machine Friendly Machine Learning: Interpretation of Computed Tomography Without Image Reconstruction
Title | Machine Friendly Machine Learning: Interpretation of Computed Tomography Without Image Reconstruction |
Authors | Hyunkwang Lee, Chao Huang, Sehyo Yune, Shahein H. Tajmir, Myeongchan Kim, Synho Do |
Abstract | Recent advancements in deep learning for automated image processing and classification have accelerated many new applications for medical image analysis. However, most deep learning applications have been developed using reconstructed, human-interpretable medical images. While image reconstruction from raw sensor data is required for the creation of medical images, the reconstruction process only uses a partial representation of all the data acquired. Here we report the development of a system to directly process raw computed tomography (CT) data in sinogram-space, bypassing the intermediary step of image reconstruction. Two classification tasks were evaluated for their feasibility for sinogram-space machine learning: body region identification and intracranial hemorrhage (ICH) detection. Our proposed SinoNet performed favorably compared to conventional reconstructed image-space-based systems for both tasks, regardless of scanning geometries in terms of projections or detectors. Further, SinoNet performed significantly better when using sparsely sampled sinograms than conventional networks operating in image-space. As a result, sinogram-space algorithms could be used in field settings for binary diagnosis testing, triage, and in clinical settings where low radiation dose is desired. These findings also demonstrate another strength of deep learning where it can analyze and interpret sinograms that are virtually impossible for human experts. |
Tasks | Computed Tomography (CT), Image Reconstruction |
Published | 2018-12-03 |
URL | http://arxiv.org/abs/1812.01068v1 |
http://arxiv.org/pdf/1812.01068v1.pdf | |
PWC | https://paperswithcode.com/paper/machine-friendly-machine-learning |
Repo | |
Framework | |
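Sinogram-space learning as described above can be prototyped with a simulated Radon transform and a small CNN applied directly to the projections. SinoNet's actual architecture is not reproduced here; the network below is a stand-in, and scikit-image's `radon` is used only to fabricate a toy sinogram.

```python
# Classify a simulated sinogram directly, without image reconstruction.
import numpy as np
import torch
import torch.nn as nn
from skimage.transform import radon

phantom = np.zeros((128, 128), dtype=np.float32)
phantom[40:90, 50:80] = 1.0                           # toy "anatomy"
theta = np.linspace(0.0, 180.0, 60, endpoint=False)   # sparsely sampled views
sino = radon(phantom, theta=theta)                    # (detector bins, projections)

cnn = nn.Sequential(nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(),
                    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 2))
logits = cnn(torch.from_numpy(sino).float()[None, None])
print(sino.shape, logits.shape)                       # (128, 60) (1, 2)
```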
Scalable Graph Learning for Anti-Money Laundering: A First Look
Title | Scalable Graph Learning for Anti-Money Laundering: A First Look |
Authors | Mark Weber, Jie Chen, Toyotaro Suzumura, Aldo Pareja, Tengfei Ma, Hiroki Kanezashi, Tim Kaler, Charles E. Leiserson, Tao B. Schardl |
Abstract | Organized crime inflicts human suffering on a genocidal scale: the Mexican drug cartels have murdered 150,000 people since 2006, and upwards of 700,000 people per year are “exported” in a human trafficking industry enslaving an estimated 40 million people. These nefarious industries rely on sophisticated money laundering schemes to operate. Despite tremendous resources dedicated to anti-money laundering (AML), only a tiny fraction of illicit activity is prevented. The research community can help. In this brief paper, we map the structural and behavioral dynamics driving the technical challenge. We review AML methods, current and emergent. We provide a first look at scalable graph convolutional neural networks for forensic analysis of financial data, which is massive, dense, and dynamic. We report preliminary experimental results using a large synthetic graph (1M nodes, 9M edges) generated by a data simulator we created called AMLSim. We consider opportunities for high performance efficiency, in terms of computation and memory, and we share results from a simple graph compression experiment. Our results support our working hypothesis that graph deep learning for AML bears great promise in the fight against criminal financial activity. |
Tasks | |
Published | 2018-11-30 |
URL | http://arxiv.org/abs/1812.00076v1 |
http://arxiv.org/pdf/1812.00076v1.pdf | |
PWC | https://paperswithcode.com/paper/scalable-graph-learning-for-anti-money |
Repo | |
Framework | |
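The graph convolutional networks mentioned above build on the standard normalized graph convolution H' = ReLU(D^-1/2 (A + I) D^-1/2 H W). A single-layer toy version on a random graph is sketched below; the adjacency, feature sizes, and dimensions are placeholders rather than AMLSim data.

```python
# One normalized graph-convolution layer over a toy transaction graph.
import torch
import torch.nn as nn

def gcn_layer(A, H, W):
    A_hat = A + torch.eye(A.size(0))                   # add self-loops
    d_inv_sqrt = torch.diag(A_hat.sum(dim=1).pow(-0.5))
    return torch.relu(d_inv_sqrt @ A_hat @ d_inv_sqrt @ H @ W)

n_nodes, n_feats, n_hidden = 6, 4, 8
A = (torch.rand(n_nodes, n_nodes) < 0.3).float()
A = ((A + A.T) > 0).float()                            # symmetric toy adjacency
H = torch.randn(n_nodes, n_feats)                      # per-account features
W = nn.Parameter(torch.randn(n_feats, n_hidden))
print(gcn_layer(A, H, W).shape)                        # (6, 8) node embeddings
```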
Restricted Deformable Convolution based Road Scene Semantic Segmentation Using Surround View Cameras
Title | Restricted Deformable Convolution based Road Scene Semantic Segmentation Using Surround View Cameras |
Authors | Liuyuan Deng, Ming Yang, Hao Li, Tianyi Li, Bing Hu, Chunxiang Wang |
Abstract | Understanding the surrounding environment of the vehicle is still one of the challenges for autonomous driving. This paper addresses 360-degree road scene semantic segmentation using surround view cameras, which are widely equipped in existing production cars. First, in order to address the large distortion problem in fisheye images, Restricted Deformable Convolution (RDC) is proposed for semantic segmentation, which can effectively model geometric transformations by learning the shapes of convolutional filters conditioned on the input feature map. Second, in order to obtain a large-scale training set of surround view images, a novel method called zoom augmentation is proposed to transform conventional images to fisheye images. Finally, an RDC based semantic segmentation model is built; the model is trained for real-world surround view images through a multi-task learning architecture by combining real-world images with transformed images. Experiments demonstrate the effectiveness of the RDC to handle images with large distortions, and that the proposed approach shows good performance using surround view cameras with the help of the transformed images. |
Tasks | Autonomous Driving, Multi-Task Learning, Semantic Segmentation |
Published | 2018-01-02 |
URL | https://arxiv.org/abs/1801.00708v3 |
https://arxiv.org/pdf/1801.00708v3.pdf | |
PWC | https://paperswithcode.com/paper/restricted-deformable-convolution-based-road |
Repo | |
Framework | |
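Deformable convolution, which RDC restricts, lets each kernel sampling point shift by a learned offset. The sketch below uses torchvision's `deform_conv2d` and simply zeroes the central offset as a rough stand-in for the restriction; the paper's actual restriction scheme may differ.

```python
# Deformable 3x3 convolution with the centre sampling point held fixed.
import torch
from torchvision.ops import deform_conv2d

x = torch.randn(1, 3, 32, 32)
weight = torch.randn(8, 3, 3, 3)
offset = torch.randn(1, 2 * 3 * 3, 32, 32) * 0.5   # (dy, dx) per kernel position; learned in practice
offset[:, 8:10] = 0.0                              # channels 8-9 belong to the kernel centre
y = deform_conv2d(x, offset, weight, padding=1)
print(y.shape)                                     # (1, 8, 32, 32)
```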
Fast Rates of ERM and Stochastic Approximation: Adaptive to Error Bound Conditions
Title | Fast Rates of ERM and Stochastic Approximation: Adaptive to Error Bound Conditions |
Authors | Mingrui Liu, Xiaoxuan Zhang, Lijun Zhang, Rong Jin, Tianbao Yang |
Abstract | Error bound conditions (EBC) are properties that characterize the growth of an objective function when a point is moved away from the optimal set. They have recently received increasing attention in the field of optimization for developing optimization algorithms with fast convergence. However, the studies of EBC in statistical learning are hitherto still limited. The main contributions of this paper are two-fold. First, we develop fast and intermediate rates of empirical risk minimization (ERM) under EBC for risk minimization with Lipschitz continuous, and smooth convex random functions. Second, we establish fast and intermediate rates of an efficient stochastic approximation (SA) algorithm for risk minimization with Lipschitz continuous random functions, which requires only one pass of $n$ samples and adapts to EBC. For both approaches, the convergence rates span a full spectrum between $\widetilde O(1/\sqrt{n})$ and $\widetilde O(1/n)$ depending on the power constant in EBC, and could be even faster than $O(1/n)$ in special cases for ERM. Moreover, these convergence rates are automatically adaptive without using any knowledge of EBC. Overall, this work not only strengthens the understanding of ERM for statistical learning but also brings new fast stochastic algorithms for solving a broad range of statistical learning problems. |
Tasks | |
Published | 2018-05-11 |
URL | http://arxiv.org/abs/1805.04577v1 |
http://arxiv.org/pdf/1805.04577v1.pdf | |
PWC | https://paperswithcode.com/paper/fast-rates-of-erm-and-stochastic |
Repo | |
Framework | |
Do Normalization Layers in a Deep ConvNet Really Need to Be Distinct?
Title | Do Normalization Layers in a Deep ConvNet Really Need to Be Distinct? |
Authors | Ping Luo, Zhanglin Peng, Jiamin Ren, Ruimao Zhang |
Abstract | Yes, they do. This work investigates a perspective for deep learning: whether different normalization layers in a ConvNet require different normalizers. This is the first step towards understanding this phenomenon. We allow each convolutional layer to be stacked before a switchable normalization (SN) that learns to choose a normalizer from a pool of normalization methods. Through systematic experiments in ImageNet, COCO, Cityscapes, and ADE20K, we answer three questions: (a) Is it useful to allow each normalization layer to select its own normalizer? (b) What impacts the choices of normalizers? (c) Do different tasks and datasets prefer different normalizers? Our results suggest that (1) using distinct normalizers improves both learning and generalization of a ConvNet; (2) the choices of normalizers are more related to depth and batch size, but less relevant to parameter initialization, learning rate decay, and solver; (3) different tasks and datasets have different behaviors when learning to select normalizers. |
Tasks | |
Published | 2018-11-19 |
URL | http://arxiv.org/abs/1811.07727v1 |
http://arxiv.org/pdf/1811.07727v1.pdf | |
PWC | https://paperswithcode.com/paper/do-normalization-layers-in-a-deep-convnet |
Repo | |
Framework | |
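Switchable normalization, referred to above as SN, computes softmax-weighted mixtures of instance-, layer-, and batch-norm statistics. The condensed sketch below follows that published idea but omits running statistics and other training details, so it is illustrative rather than a faithful reimplementation.

```python
# Condensed switchable normalization: blend IN / LN / BN statistics per layer.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SwitchableNorm2d(nn.Module):
    def __init__(self, channels, eps=1e-5):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(1, channels, 1, 1))
        self.bias = nn.Parameter(torch.zeros(1, channels, 1, 1))
        self.mean_w = nn.Parameter(torch.ones(3))   # weights over {IN, LN, BN} means
        self.var_w = nn.Parameter(torch.ones(3))    # weights over {IN, LN, BN} variances
        self.eps = eps

    def forward(self, x):
        dims = ((2, 3), (1, 2, 3), (0, 2, 3))       # IN, LN, BN reduction axes
        means = [x.mean(d, keepdim=True) for d in dims]
        vars_ = [x.var(d, keepdim=True) for d in dims]
        mw, vw = F.softmax(self.mean_w, 0), F.softmax(self.var_w, 0)
        mean = sum(w * m for w, m in zip(mw, means))
        var = sum(w * v for w, v in zip(vw, vars_))
        return self.weight * (x - mean) / torch.sqrt(var + self.eps) + self.bias

print(SwitchableNorm2d(16)(torch.randn(4, 16, 8, 8)).shape)   # (4, 16, 8, 8)
```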
Teacher’s Perception in the Classroom
Title | Teacher’s Perception in the Classroom |
Authors | Ömer Sümer, Patricia Goldberg, Kathleen Stürmer, Tina Seidel, Peter Gerjets, Ulrich Trautwein, Enkelejda Kasneci |
Abstract | The ability for a teacher to engage all students in active learning processes in the classroom constitutes a crucial prerequisite for enhancing students’ achievement. Teachers’ attentional processes provide important insights into teachers’ ability to focus their attention on relevant information in the complexity of classroom interaction and distribute their attention across students in order to recognize the relevant needs for learning. In this context, mobile eye tracking is an innovative approach within teaching effectiveness research to capture teachers’ attentional processes while teaching. However, analyzing mobile eye-tracking data by hand is time-consuming and still limited. In this paper, we introduce a new approach to enhance the impact of mobile eye tracking by connecting it with computer vision. In mobile eye tracking videos from an educational study using a standardized small group situation, we apply a state-of-the-art face detector, create face tracklets, and introduce a novel method to cluster faces into identities. Subsequently, teachers’ attentional focus is calculated per student during a teaching unit by associating eye tracking fixations and face tracklets. To the best of our knowledge, this is the first work to combine computer vision and mobile eye tracking to model teachers’ attention while instructing. |
Tasks | Active Learning, Eye Tracking |
Published | 2018-05-22 |
URL | http://arxiv.org/abs/1805.08897v1 |
http://arxiv.org/pdf/1805.08897v1.pdf | |
PWC | https://paperswithcode.com/paper/teachers-perception-in-the-classroom |
Repo | |
Framework | |
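The fixation-to-student association step mentioned above can be illustrated with a toy calculation: each gaze fixation is attributed to the tracked face whose bounding box is nearest to the gaze point, and per-identity attention time is accumulated. The box format (x1, y1, x2, y2), coordinates, and durations below are invented for illustration.

```python
# Assign gaze fixations to tracked face boxes and accumulate attention time.
import numpy as np

def nearest_face(gaze, boxes):
    cx = np.clip(gaze[0], boxes[:, 0], boxes[:, 2])   # closest point of each box to the gaze
    cy = np.clip(gaze[1], boxes[:, 1], boxes[:, 3])
    return int(np.argmin(np.hypot(cx - gaze[0], cy - gaze[1])))

boxes = np.array([[100, 80, 180, 170], [300, 90, 380, 185]], dtype=float)
fixations = [((130, 120), 0.24), ((350, 150), 0.31), ((310, 100), 0.12)]   # (point, seconds)

attention = {}
for gaze, duration in fixations:
    student = nearest_face(np.asarray(gaze, dtype=float), boxes)
    attention[student] = attention.get(student, 0.0) + duration
print(attention)   # seconds of attention per tracked identity, e.g. {0: 0.24, 1: 0.43}
```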