Twitter posted a thread this morning, detailing that they have now updated their source code in the public GitHub repository ‘the-algorithm’. Twitter first open-sourced the code for how it ranks posts back on April 1st this year, delivering on Elon’s promise to release the code to the world.
If we review the Insights tab on the repo, it shows that during the past month, there have been 195 Pull Requests (code changes) opened by 162 people. There have been 24 commits to the code and 644 issues closed out, while 132 new issues were raised.
In this release, the code gets an update which Twitter details in their thread, teasing that it also includes a preview of what’s next.
Twitter says that user signals are the most important data source for candidate sourcing algorithms. Understanding these factors may help you modify the way you use the platform for maximum effect.
These signals are:
- Author Follow: The accounts the user explicitly follows.
- Author Unfollow: The accounts the user recently unfollowed.
- Author Mute: The accounts the user has muted.
- Author Block: The accounts the user has blocked.
- Tweet Favorite: The tweets on which the user clicked the like button.
- Tweet Unfavorite: The tweets on which the user clicked the unlike button.
- Retweet: The tweets the user retweeted.
- Quote Tweet: The tweets the user retweeted with comments.
- Tweet Reply: The tweets the user replied to.
- Tweet Share: The tweets on which the user clicked the share button.
- Tweet Bookmark: The tweets on which the user clicked the bookmark button.
- Tweet Click: The tweets the user clicked to view the tweet detail page.
- Tweet Video Watch: The video tweets the user watched for a certain number of seconds or a certain percentage.
- Tweet Don’t like: The tweets on which the user clicked the “Not interested in this tweet” button.
- Tweet Report: The tweets on which the user clicked the “Report Tweet” button.
- Notification Open: The push notification tweets the user opened.
- Ntab click: The tweets the user clicked on the Notifications page.
- User AddressBook: The account identifiers of authors in the user’s address book.
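To make the list more concrete, here is a minimal Python sketch of how a few of these signals could be modelled as typed events and split into positive and negative feedback. It is not taken from Twitter's repository: the names `Signal`, `SignalEvent` and `split_signals` are hypothetical, only a subset of the signals is shown, and the choice of which signals count as negative is an assumption based on their descriptions.

```python
from dataclasses import dataclass
from enum import Enum


class Signal(Enum):
    """A subset of the signal names from the list above."""
    AUTHOR_FOLLOW = "author_follow"
    AUTHOR_UNFOLLOW = "author_unfollow"
    AUTHOR_MUTE = "author_mute"
    AUTHOR_BLOCK = "author_block"
    TWEET_FAVORITE = "tweet_favorite"
    TWEET_DONT_LIKE = "tweet_dont_like"
    TWEET_REPORT = "tweet_report"


# Assumption: this grouping is our reading of the descriptions, not Twitter's.
NEGATIVE_SIGNALS = {
    Signal.AUTHOR_UNFOLLOW,
    Signal.AUTHOR_MUTE,
    Signal.AUTHOR_BLOCK,
    Signal.TWEET_DONT_LIKE,
    Signal.TWEET_REPORT,
}


@dataclass(frozen=True)
class SignalEvent:
    user_id: int     # the user who performed the action
    signal: Signal   # which kind of action it was
    target_id: int   # the author or tweet the action was aimed at


def split_signals(events):
    """Partition events into (positive, negative) lists, preserving order."""
    positive = [e for e in events if e.signal not in NEGATIVE_SIGNALS]
    negative = [e for e in events if e.signal in NEGATIVE_SIGNALS]
    return positive, negative
```

A candidate sourcing step could, for example, draw on the positive list for accounts and tweets to explore and use the negative list to filter results, though how Twitter actually weighs each signal is not something this sketch claims to show.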
Twitter also highlighted its Aggregation Framework, a config-driven, Summingbird-based framework for generating real-time and batch aggregate features to be consumed by ML models. Summingbird is a library that lets you write MapReduce programs that look like native Scala or Java collection transformations and execute them on a number of well-known distributed MapReduce platforms, including Storm and Scalding.
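For readers who have not met the MapReduce style, the idea can be sketched in a few lines of plain Python. This is not the Summingbird API and not Twitter's code; the event format and the function names `to_count`, `merge` and `aggregate` are made up for illustration. It counts engagements per (user, signal) key with a map step and an associative merge step, which is the property that lets batch and real-time results be combined.

```python
from collections import Counter
from functools import reduce


def to_count(event):
    """Map step: turn one (user_id, signal) event into a count of 1 for its key."""
    user_id, signal = event
    return Counter({(user_id, signal): 1})


def merge(left, right):
    """Reduce step: add counts key by key. Addition is associative,
    so partial results can be merged in any grouping."""
    return left + right


def aggregate(events):
    """Aggregate a collection of events into per-key counts."""
    return reduce(merge, map(to_count, events), Counter())


# Hypothetical data: one batch of older events and one stream of fresh events.
batch_events = [(1, "favorite"), (1, "favorite"), (2, "reply")]
live_events = [(1, "favorite")]

# Merging the two partial aggregates gives the same result as
# aggregating all events in one pass.
combined = merge(aggregate(batch_events), aggregate(live_events))
```

In this example `combined[(1, "favorite")]` is 3: two from the batch and one from the live stream.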
What’s interesting about this is a comment on line 21 of the file BCELabelTransformFromUUADataRecord.scala. The comment below shows that Twitter is using dwell time as an indicator of engagement, much like how TikTok monitors how long you stay on a particular video and then fills your feed with similar videos. These engagement indicators are not likes or retweets; they measure how users interact with content on the platform, which should, hopefully, increase the quality of the For You page.
/**
 * To transfrom BCE events UUA data records that contain only continuous dwell time to datarecords that contain corresponding binary label features
 * The UUA datarecords inputted would have USER_ID, SOURCE_TWEET_ID,TIMESTAMP and
 * 0 or one of (TWEET_DETAIL_DWELL_TIME_MS, PROFILE_DWELL_TIME_MS, FULLSCREEN_VIDEO_DWELL_TIME_MS) features.
 * We will use the different engagement TIME_MS to differentiate different engagements,
 * and then re-use the function in EngagementTypeConverte to add the binary label to the datarecord.
 **/
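The transformation the comment describes, turning a continuous dwell time into a binary label, can be sketched roughly as follows. This is a Python illustration, not a port of the Scala file: the three dwell-time feature names come from the comment, but the label names, the `add_binary_labels` function and the 15-second threshold are all assumptions made for the example. The actual cut-offs are not stated in the comment.

```python
# Dwell-time feature names are taken from the quoted comment.
# The label names on the right are hypothetical.
DWELL_FEATURE_TO_LABEL = {
    "TWEET_DETAIL_DWELL_TIME_MS": "tweet_detail_dwelled",
    "PROFILE_DWELL_TIME_MS": "profile_dwelled",
    "FULLSCREEN_VIDEO_DWELL_TIME_MS": "fullscreen_video_dwelled",
}

# Assumption: an invented threshold, purely for illustration.
DEFAULT_THRESHOLD_MS = 15_000


def add_binary_labels(record, threshold_ms=DEFAULT_THRESHOLD_MS):
    """Return a copy of the record with a True/False label added for
    whichever dwell-time feature it contains (if any)."""
    labelled = dict(record)  # leave the input record untouched
    for feature, label in DWELL_FEATURE_TO_LABEL.items():
        if feature in record:
            labelled[label] = record[feature] >= threshold_ms
    return labelled


# Hypothetical record with one dwell-time feature, as the comment describes.
record = {
    "USER_ID": 1,
    "SOURCE_TWEET_ID": 42,
    "TIMESTAMP": 1_700_000_000,
    "TWEET_DETAIL_DWELL_TIME_MS": 20_000,
}
labelled = add_binary_labels(record)
```

Here the user stayed on the tweet detail page for 20 seconds, so `labelled["tweet_detail_dwelled"]` is `True` under the assumed threshold. A record with no dwell-time feature comes back unchanged.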
For those playing at home and remembering the controversy when Twitter first open-sourced the algorithm, I can confirm that the references to Elon Musk, Democrats and Republicans have all been removed from the code.