“In the United States, there is no suspense in doing open-source or general-purpose large models, and the required investment is also clear. In China, however, it is not yet settled who can build the best large model. Everyone has a chance to compete for it, and it will not necessarily be limited to big companies,” said Wang Xiaochuan, CEO of Baichuan Intelligence, at a media conference on August 8th.
According to the “Research Report on China’s Artificial Intelligence Large Model Map,” as of May 28th, at least 79 large foundation models with over one billion parameters had been released domestically. Tracing back to Google’s release of the Transformer architecture in 2017, large model technologies of various forms have emerged globally and been applied across different scenarios within five years.
On the afternoon of August 8th, Baichuan announced its third large model product, Baichuan-53B, and opened the first round of internal testing. At the same time, Wang Xiaochuan gave interviews to media outlets including Jiemian News.
Earlier, on July 11th, Baichuan had released two versions of its 13-billion-parameter general-purpose large language model: the base model Baichuan-13B-Base and the chat model Baichuan-13B-Chat. The latest release means that in just four months since its founding, Baichuan has launched three large model products at a remarkable pace.
Although the names of all three models begin with “Baichuan,” Wang Xiaochuan specifically pointed out that they are not positioned as end products for consumers; rather, they primarily serve business (B-end) customers.
On the afternoon of the 8th, Baichuan’s third model, Baichuan-53B, opened its first batch of internal testing. In hands-on tests, Jiemian News reporters found that the product showed strong logical reasoning when answering up-to-date and moderately challenging questions.
According to Wang Xiaochuan, Baichuan-53B’s greater strength lies in understanding the implicit meaning behind language. The model shows strong abilities in abstraction, analogy, and association, at a level comparable to human performance in liberal arts subjects, and can organically connect different concepts. “Our model is at the forefront in liberal-arts capabilities,” said Wang Xiaochuan.
In fact, a large model with strong language abilities embodies Wang Xiaochuan’s technological aesthetics. In an interview at the start of his venture in early April, he said that logic itself is not the advanced part; higher-level human wisdom lies in analogy and abstraction, such as classification and categorization, which ChatGPT does quite well.
But whether it is the language expertise accumulated by the former Sogou team or the new product’s strong performance in grammar, rhetoric, and logic, Baichuan Intelligence’s models are not aimed at the C-end. Although the team has also planned super applications, including C-end ones, beyond B-end scenarios, Wang Xiaochuan emphasized that the current open-interface testing is meant to help people in their work and is not specifically optimized for C-end scenarios. “Whether the earlier 7B and 13B or the 53B, it is more about preparing for B-end industries.” Next month, Baichuan-53B will open its API, and related components will be released gradually.
This nuanced positioning has led to some outside confusion about how B-end and C-end orientations can coexist within the same company.
Just recently, a prominent VC investor told reporters that the primary market is currently not optimistic about models targeting vertical B-end sectors, because it is difficult to build barriers on data alone. In response, Wang Xiaochuan told Jiemian News that while the ceiling for B-end large models may not be high, there is more certainty. Many enterprises have B-end demands, but integration is complex and R&D costs are high; each enterprise holds its own proprietary data, so building effective intermediate-layer connections is crucial, and without a good model both sides suffer. He also sketched a business model for B-end large models: “With natural, real scenarios on the B-end, an intermediate layer providing enterprise services, and companies like ours building models in the background, I understand it as such a three-tier structure,” Wang Xiaochuan said.
But he also said that after establishing the B-end business, Baichuan will begin filling in its C-end layout; the company will not focus on one direction alone.
The current emphasis on B-end positioning also explains Wang Xiaochuan’s stance on open source versus closed source. He said that large models themselves do not necessarily serve the C-end, so unlike Android versus iOS, no either-or choice is required; from the B-end perspective, both open source and closed source are needed.
According to media reports, after the large model wave surged in March this year, Wang Xiaochuan decided within two weeks to start a large model venture. By that point, a few leading large model companies such as Zhipu AI and MiniMax had already made names for themselves.
Wang Xiaochuan admitted that compared with large model companies like Zhipu AI and MiniMax, which already had some market influence, Baichuan entered as a latecomer, so open source is a way to demonstrate technical strength. “We believe technology will develop very fast in the future; as long as there is continuous technical iteration, a business model will emerge on its own.” Wang Xiaochuan values what open source brings. He believes that 80% of companies will use open-source models in the future, because closed-source models cannot be optimally adapted to many scenarios.
Since March this year, ChatGPT-style large models have emerged in China at a dizzying pace, making them hard to tell apart. Alongside this has come the construction of evaluation systems. In July, IDC surveyed 14 mainstream large model vendors in the Chinese market, examined more than 10 indicators, and released the “AI Large Model Technology Capability Assessment Report 2023,” which sparked heated discussion. Since then, more research institutions have invested resources in publishing their own evaluation standards.
Wang Xiaochuan believes that among the various rankings, SuperCLUE and the evaluation benchmark launched by Fudan University are relatively fair and can reflect model quality. According to him, the English ability of Baichuan’s second large model, the 13B, is on par with Meta’s open-source model LLaMA, while its Chinese ability leads domestically, thanks to iteration enabled by open source.
In late July, Sogou’s former CMO Hong Tao joined Baichuan to head its commercialization business. With that, Sogou’s former CEO Wang Xiaochuan, former COO Ru Liyun, and former CMO Hong Tao were reunited at Baichuan. At the August 8th media conference, another member of the old Sogou team appeared: Chen Weipeng, former general manager of Sogou Search. He is a key figure in Baichuan’s technology work and played an indispensable role in shipping three large model products within four months.
Wang Xiaochuan reflected that among the old Sogou team, everyone trusts one another and made rejoining the team their first choice. “Weipeng, Hong Tao, Liyun, and Ma Zhao are all from the old team,” Wang Xiaochuan said.
Baichuan currently has 103 employees, 70-80% of whom are technical staff. Chen Weipeng, the co-founder in charge of technology, told Jiemian News that the best talent from Sogou’s various business lines has now gathered at Baichuan, and that the company is also recruiting from domestic giants, startups, and Silicon Valley. He has found that in the AI 2.0 era, the skills required for roles such as product manager differ significantly from those of the AI 1.0 era.
As for the criteria for selecting technical talent amid fierce competition, Chen Weipeng said Baichuan favors two types: people with strong skills at solving complex problems and a good aesthetic sense for algorithmic systems, and people with solid foundations across technologies and a strong desire to build large models themselves.
On financing, Baichuan was reported at its founding in early April to have secured $50 million in seed funding from Wang Xiaochuan himself and friends in the industry. Wang Xiaochuan also revealed that Baichuan’s first-round valuation exceeded $500 million, and its next-round valuation will exceed $1 billion. The new round of financing is currently progressing smoothly.
Wang Xiaochuan and Wang Huiwen started their ventures at around the same time, and as leading model companies of the Zhuyuan system they enjoyed a slight first-mover advantage. When internet veterans like Wang Xiaochuan announced their ventures, capital immediately showed strong recognition for these star AGI entrepreneurs striking out on their own. Entering July, undercurrents stirred in the primary market: some investors moved first, and events began brewing as AI giants joined the fray.
Facing potentially fiercer competition, Wang Xiaochuan believes a company needs a soul. He said that many participants in today’s venture capital industry misunderstand the technology, just as their earlier understanding of search clearly contained misjudgments. “Whether (the outside world) hopes for technology-driven or content-driven, at least from my 20 years of work experience, I think their interpretations are still relatively shallow,” Wang Xiaochuan said, pointing to his view of the essence of search: “During AI’s earlier development, everyone slowly forgot that search is also AI, and today there are many similarities between building large models and doing search.”