Search: "HDBSCAN"

Showing results 1 - 5 of 6 theses containing the word HDBSCAN.

  1. Text mining Twitter social media for Covid-19: Comparing latent semantic analysis and latent Dirichlet allocation

    Bachelor's thesis, Högskolan i Gävle/Avdelningen för datavetenskap och samhällsbyggnad

    Author: Hassan Sheikha; [2020]
    Keywords: Data mining; Text mining; artificial intelligence; Natural language processing; Latent Semantic Analysis; Latent Dirichlet Allocation; KMeans; HDBSCAN; Dimension reduction;

    Abstract: In this thesis, Twitter is mined for information about the Covid-19 outbreak during the month of March, from the 3rd to the 31st. 100,000 tweets were collected from Harvard's open-source data and recreated using Hydrate.

  2. Evaluation of the correlation between test cases dependency and their semantic text similarity

    Bachelor's thesis, Mälardalens högskola/Akademin för innovation, design och teknik

    Author: Filip Andersson; [2020]
    Keywords: Software Testing; Test optimization; NLP; Dependency; Semantic Similarity; Clustering; Cosine Similarity; HDBSCAN;

    Abstract: An important step in developing software is to test the system thoroughly. Testing software requires generating test cases, which can be numerous and must be executed in the correct order. Certain information that is critical for scheduling the test cases in the correct order isn't always available.

  3. Cluster analysis on sparse customer data on purchase of insurance products

    Master's thesis, KTH/Matematisk statistik

    Author: Michel Alexander Postigo Smura; [2019]
    Keywords: (none listed)

    Abstract: This thesis aims to perform a cluster analysis on customer data for insurance products. Three clustering algorithms are investigated: K-means (center-based clustering), Two-Level clustering (SOM and hierarchical clustering), and HDBSCAN (density-based clustering).

  4. Duplicate Detection and Text Classification on Simplified Technical English

    Master's thesis, Linköpings universitet/Institutionen för datavetenskap

    Author: Max Lund; [2019]
    Keywords: NLP; CNL; transformer models; LSTM; BERT; document embeddings; word embeddings; text classification; text clustering; transfer learning; machine learning;

    Abstract: This thesis investigates the most effective way of performing classification of text labels and clustering of duplicate texts in technical documentation written in Simplified Technical English. Pre-trained language models from transformers (BERT) were tested against traditional methods such as tf-idf with cosine similarity (kNN) and SVMs on the classification task.

  5. Pattern analysis of the user behaviour in a mobile application using unsupervised machine learning

    Master's thesis, KTH/Skolan för elektroteknik och datavetenskap (EECS)

    Author: Dusan Viktor Hrstic; [2019]
    Keywords: Clustering; HDBSCAN; K-medoids; data preprocessing; user behaviour; mobile application;

    Abstract: The continuously increasing amount of logged data increases the possibilities of making new discoveries about how users interact with the application for which the data is logged. Traces in the data may reveal specific user behavioural patterns, which can show how to improve the development of the application by revealing the ways in which it is used.