Finish seminar 1 notes
BIN wed13thaugust/headlines/team9_slideshow.pdf (new file; binary file not shown)
BIN wed13thaugust/headlines/team9_summary.pdf (new file; binary file not shown)
BIN wed13thaugust/turing test/team3_slides.pdf (new file; binary file not shown)
BIN wed13thaugust/turing test/team3_summary.pdf (new file; binary file not shown)
wed13thaugust/wed13thaugust.md (new file, 16 lines)
@@ -0,0 +1,16 @@
## AI headlines
This paper talks about two studies that were conducted to test the effect of labels on people's perceptions of news headlines. It found, with seemingly strong evidence, that putting an AI label on a news headline reduced people's trust in the article and their willingness to share it, regardless of what the headline itself was. I think the observation that people incorrectly assume full AI involvement because of the label, and have lower trust in AI, is something that will change rapidly over the next few years. As AI gets stronger, articles will be fully generated autonomously and could be stronger and more truthful than their human-made counterparts.
My questions that I would have liked to ask are:
- For the AI news headlines, did they just take the first headline generated, or choose from the top three or so? (I imagine that in practice choosing from the top few options is more realistic.)
- Did any part of the study test people's ability to predict whether a news headline was AI-generated? The results suggest people either can't tell or have no opinion, but explicit testing would be interesting.
- I imagine different political groups have different views of AI; was there a noticeable difference in how the groups felt about AI?
## Turing test
This study conducts a text-based Turing test. It runs two experiments: one using remote workers from Prolific and another using a smaller set of US undergraduate students. In both sets of experiments the AI performed at a level similar to humans, with roughly a two-thirds detection rate. Assuming this study is rigorous enough, I would consider this sufficient evidence that the simple text-based Turing test (i.e. sitting down for a shortish conversation) is saturated and no longer a useful test. I think the next steps have to involve significantly longer conversations (i.e. weeks to months) and more modalities, like voice.
Some questions that I would like to ask are:
- Were there any tests that used the same AI systems to predict whether they were speaking with a human or an AI?
- What caused a conversation to end, i.e. was it a hard timer or a soft timer?
- Did the TIKTOK prompt (and the others) specifically instruct the model to be deceptive and trick the human?
|