Comparative Analysis of Language Models: Hallucinations in ChatGPT: Prompt Study

This is a Bachelor's thesis from Linnéuniversitetet/Institutionen för datavetenskap och medieteknik (DM) (Linnaeus University, Department of Computer Science and Media Technology)

Abstract: This thesis examines the percentage of hallucinations in the output of two large language models (LLMs), ChatGPT 3.5 and ChatGPT 4, for a set of prompts. The work was motivated by two factors. First, OpenAI released ChatGPT 4 and claimed it to be much more potent than its predecessor, ChatGPT 3.5, which raised questions about the capabilities of the LLM. Second, ChatGPT 3.5 exhibited hallucinations (creating material that is factually wrong, deceptive, or untrue) in response to different prompts, as shown by other studies. The intended audience was members of the computer science community, such as researchers, software developers, and policymakers. The aim was to highlight large language models' potential capabilities and provide insights into their dependability. The study used a quasi-experimental design and a systematic literature review. Our hypothesis predicted that the percentage of hallucinations would be higher in ChatGPT 3.5 than in ChatGPT 4. We based this prediction on the fact that OpenAI trained ChatGPT 4 on more material than ChatGPT 3.5. We experimented on both LLMs, and our findings supported the hypothesis. Furthermore, we reviewed the literature and found studies that also agree that ChatGPT 4 is better than ChatGPT 3.5. The research concluded with suggestions for future work, such as using extensive datasets and comparing the performance of different models, not only ChatGPT 3.5 and ChatGPT 4.
