How did ChatGPT's "Monday" learn Taiwanese Mandarin? It read all of PTT, Dcard, and Giddens Ko (Nine Knives).

ChatGPT's style module "Monday" blurts out plenty of Taiwanese phrases and idioms because it ingested a large amount of material from Taiwan's internet. (Synopsis: ChatGPT launched "Monday", a world-weary female voice whose lazy, jaded persona became popular online.) (Background: Taiwan's Intellectual Property Office responded to the wave of "ChatGPT Ghibli-style imitations": AI imitation is not in itself illegal; it depends on the case.)

When you open the ChatGPT voice style module "Monday", you will find that it is a bit sardonic and world-weary, that it picks up on your accent, and that it replies in "Taiwanese Mandarin". Why does it sound so Taiwanese? The answer: OpenAI has acknowledged that its models were trained on a large amount of data from the Taiwanese web.

What is "Monday Mode"?

First, to be clear: "Monday" is not a new GPT model, nor an upgraded GPT-5. It is a conversational persona OpenAI built by style-tuning on the GPT-4 architecture. Put simply, it is the same AI changing its tone, like wearing different outfits for the office and the weekend. Monday mode is relaxed, a little sardonic, polite but not verbose; it feels like someone who has just clocked in on a Monday morning and is thoroughly melancholy.

A crawler stuffed with Taiwanese data

The way OpenAI trains GPT is actually very old-school but extremely effective: crawl the entire public web. That includes news sites, Wikipedia, Chinese-language books, social forums, blogs, PDFs, even the embarrassing posts you wrote on Wretch (無名小站) back in the day. As long as a page is public and reachable by a crawler, it is likely to be thrown into the training corpus.

Cross-comparing major open-source corpora with GPT's behavior, we found that these Taiwanese media outlets were read by ChatGPT: United Daily News (udn), ETtoday, China Times, The Storm Media, NOWnews...
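The gatekeeping described above, where only public, crawlable pages enter a corpus, can be sketched in a few lines. This is a minimal illustration, not OpenAI's actual pipeline: the domain, the robots.txt content, and the `corpus-bot` user agent are all made up for the example, and it uses only Python's standard `urllib.robotparser`.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a news site; a real crawler would fetch
# this from the site itself before crawling anything.
ROBOTS_TXT = """\
User-agent: *
Disallow: /member/
Disallow: /paywall/
"""

def allowed_urls(candidates, robots_txt, agent="corpus-bot"):
    """Return only the URLs a polite corpus crawler may fetch."""
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return [u for u in candidates if rp.can_fetch(agent, u)]

urls = [
    "https://example-news.tw/story/12345",     # public article: kept
    "https://example-news.tw/paywall/67890",   # behind a paywall: skipped
    "https://example-news.tw/member/profile",  # member-only: skipped
]
print(allowed_urls(urls, ROBOTS_TXT))
# → ['https://example-news.tw/story/12345']
```

The point of the sketch is the asymmetry the article describes: sites with clean, open structures pass this filter wholesale, while member walls and paywalls never make it into the pile.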
These media outlets have one thing in common: no paywall, good Google indexing, and a clean, easily crawled site structure. Conversely, outlets like CommonWealth Magazine (天下), The Reporter (報導者), and Business Weekly (商業周刊), which sit behind paywalls or membership walls, have a very low chance of ending up in the training data.

GPT has really read Taiwanese writers

GPT is very good at imitating the rhythm of dialogue in Giddens Ko's (Nine Knives, 九把刀) novels, can produce sentimental lines in the style of Wu Nien-jen, and even has some command of the tone of Lung Ying-tai's "Big River, Big Sea". What does this mean? It actually read these works, or at least saw reposted excerpts. Most likely the works were copied and pasted wholesale onto PTT, blogs, or content-scraping sites; Giddens Ko's early works were even published directly on PTT's story boards, and were later scooped up by models as training material.

But ask it about the details of a Chang Ta-chun or Lo Yi-chin novel and GPT usually starts making things up, because those literary works are rarely discussed or quoted online, have no public electronic editions, and are not reposted on the internet; even where copies exist, they cannot easily be crawled.

PTT is GPT's teacher of Taiwanese flavor

This much is almost certain: GPT understands PTT users' memes. It knows what a "push" (推), a "boo" (噓), and an "old driver" (老司機) are, and it can even reproduce the world-weary tone of the Tech_Job board, talking convincingly like a Hsinchu Science Park engineer. Why? Because PTT's data has long been compiled by academics into trainable corpora and publicly released, often in JSON format. It is heaven for a model.

By contrast, although Dcard is very popular, its anti-crawling measures have worked well in recent years. Aside from early articles and viral posts that were reposted elsewhere, Dcard articles from the past two years are probably not something ChatGPT has mastered.

The "soul" behind Monday was, in fact, learned from all the words you have left on the internet over the past decade.
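The claim above is that PTT posts circulate as structured JSON, ready to be flattened into training text. Here is a minimal sketch of that flattening step. The record schema (board, title, content, pushes) is an assumption made up for illustration; actual academic PTT releases vary in format.

```python
import json

# A tiny, made-up record mimicking how PTT dumps are often structured;
# the real corpus schema differs by release -- this shape is illustrative.
RAW = json.dumps({
    "board": "Tech_Job",
    "title": "[討論] 竹科加班日常",
    "content": "又是被 on-call 支配的一週...",
    "pushes": [
        {"tag": "推", "user": "engineerA", "text": "老司機帶帶我"},
        {"tag": "噓", "user": "hater01", "text": "又在抱怨"},
    ],
})

def to_training_lines(record_json):
    """Flatten one PTT-style post into plain-text lines for a corpus."""
    post = json.loads(record_json)
    lines = [post["title"], post["content"]]
    # Push ("推") and boo ("噓") comments carry the colloquial tone a
    # model picks up, so they go into the corpus alongside the body.
    lines += [p["text"] for p in post["pushes"]]
    return lines

print(to_training_lines(RAW))
```

Because the push comments survive this flattening, slang like 老司機 lands in the training text verbatim, which is exactly how a model ends up sounding like a Tech_Job regular.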
That's right: everything you ever said, it remembers a little of. The next time you talk to ChatGPT, ask yourself, "Wait, could it have actually seen that push comment I left on PTT ten years ago?" Most likely it has.

Related Stories
GPT-5 postponed! OpenAI pushes o3 and o4-mini first; Sam Altman admits integration is harder than imagined
OpenAI's strengthened GPT-4o surges to second place! Sam Altman: better at understanding people and writing code, with greatly increased creativity
OpenAI announces: the open Agents SDK supports MCP, another key step toward connecting everything

("How did ChatGPT's 'Monday' learn Taiwanese Mandarin? It read all of PTT, Dcard, and Giddens Ko" was first published in BlockTempo, "the most influential blockchain news media".)
