In the AI era, "shared compute" is the new shared bike for rookie developers.
Source: Geek Park
Written by: Shan Xu
“Token costs are plummeting.”
Had this been said two years ago, it would have thrilled every AI entrepreneur. From 2023 to 2025, AI inference costs fell by 99.7%. To put that in perspective: when GPT-4 launched, the cost per million tokens was $37.50; by 2025 the figure had fallen to $0.14. Given that trend, compute costs shouldn't be a problem for entrepreneurs anymore, right?
But reality is exactly the opposite.
During the same period, global enterprise spending on AI cloud services surged from $11.5 billion to $37 billion, more than tripling. Once AI entered the A2A era, dozens of agents interacting with one another over and over caused token calls to explode exponentially. Hence the paradox: even as the unit price of tokens fell, the number of tokens consumed per task skyrocketed.
Clearly, compute is becoming the strangest resource of this era: its unit price keeps falling, yet the total you spend on it keeps rising.
For big players, the problem can be solved by building their own compute centers. Most startups, however, can only buy in the public compute market, accept cloud vendors' pricing, and watch their compute bills climb month after month with no leverage to negotiate.
What Fu Zhi, founder of Gongji Technology, saw was a business opportunity created by this market mismatch.
In his view, cutting compute costs doesn't have to mean waiting for prices to fall on their own. Changing how compute is consumed can drive costs down too: make compute as flexible as electricity, available on demand and billed as you go, and reactivate the vast amount of compute that now sits idle and wasted.
Recently, Gongji Technology completed its Pre-A financing round at a post-money valuation of 350 million RMB, and it plans to kick off its A round soon. In 2025, with the compute sector under widespread pressure, this company, which applies AI methods to resource-scheduling problems, quietly reached tens of millions of RMB in revenue with a customer retention rate close to 100%.
Gongji Technology is turning compute scheduling into a real business.
Gongji Technology founder Fu Zhi | Image source: Gongji Technology
01 When an AI company takes off, the compute-cost books need a new algorithm
On the eve of its product launch, Remy's team barely slept, bracing for anything.
Then the company's website drew 500,000 users in 48 hours. For an AI startup just moving from internal testing to public launch, that meant scaling its entire infrastructure by dozens of times within days. Remy had prepared: before launch, the team had tested multiple cloud platforms, including UCloud, Alibaba Cloud, and Huawei Cloud. Yet when the flood of traffic actually hit, the provider it ended up relying on was Gongji Technology.
In simple terms, Gongji Technology takes idle compute and allocates it on demand to AI enterprises with elastic needs. A machine idling overnight in an internet cafe, a 4090 in an individual's home, spare capacity in a small server room: any of these can join Gongji's schedulable compute pool. If a customer's allocation isn't enough, Gongji can route more from the pool at any time; take it when you need it, release it when you're done.
During those 48 hours, Gongji Technology urgently provisioned nearly 1,900 GPUs for Remy. Each user request opened a new order, and the moment the computation finished, the order closed. That day, the platform processed more than one million orders.
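To make the mechanics concrete, here is a minimal sketch in Python of the pool-and-order model described above. Everything in it, from class names to node names, is invented for illustration; it is a toy, not Gongji's actual system.

```python
import itertools
import time

class ComputePool:
    """Toy model of a schedulable pool of idle, heterogeneous nodes."""

    _order_ids = itertools.count(1)

    def __init__(self):
        self.idle_nodes = []   # nodes currently available for scheduling
        self.active = {}       # order_id -> node serving that order

    def register(self, node):
        # An internet-cafe machine idling overnight, a personal 4090, or
        # spare server-room capacity all enter the pool the same way.
        self.idle_nodes.append(node)

    def open_order(self):
        # Each user request creates a new order against an idle node.
        if not self.idle_nodes:
            raise RuntimeError("pool exhausted; route in more supply")
        order_id = next(self._order_ids)
        self.active[order_id] = self.idle_nodes.pop()
        return order_id

    def close_order(self, order_id):
        # The moment the computation completes, the order closes and the
        # node becomes schedulable again.
        self.idle_nodes.append(self.active.pop(order_id))

pool = ComputePool()
for name in ["cafe-rig-01", "home-4090-17", "rack-b-03"]:  # invented names
    pool.register(name)

oid = pool.open_order()   # a user request arrives
time.sleep(0.001)         # ...the computation runs...
pool.close_order(oid)     # done; the node returns to the pool
```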
“At peak moments, it’s already very hard for a typical compute service provider to temporarily spin up 20 cards. In more cases, enterprises have to wait—yet waiting also means traffic loss, which enterprises absolutely don’t want to see.” Fu Zhi said that after this incident, the vast majority of the compute used by Remy came from Gongji Technology.
Remy’s compute needs are actually straightforward: whenever traffic surges, user clicks need to be responded to promptly. Compute calls must be fast and timely, and the cost must be low. These are the most basic compute requirements for AI startups just getting started.
By contrast, there’s another category of AI application customers whose compute needs are more niche—but also more realistic.
During last year's Spring Festival, a company offering AI outfit-changing photo sessions in a scenic area came to Gongji Technology. It knew perfectly well when traffic would spike, yet still found it hard to budget compute costs accurately.
Their AI devices sit in the scenic area. During holidays the place is packed and compute demand surges; once the holiday ends, demand drops to nearly zero. "Spring Festival is the biggest peak of the whole year. For most of the rest of the year, there aren't many visitors," they told Fu Zhi.
Fluctuations like these mean that renting compute for peak capacity amounts to burning money to keep cards idle 90% of the time, while renting for the average guarantees the service will buckle under Spring Festival demand and hurt the user experience. "With demand like this, it's hard to get a fitting answer from traditional compute services, because such an extreme peak-to-valley gap simply has no corresponding pricing logic in standard products," Fu Zhi said.
But this kind of scenario is exactly where Gongji Technology’s compute-sharing platform fits well.
That month, 1,963 personal computers served as the service nodes. Through the entire Spring Festival there wasn't a single stability incident. "Compared with them provisioning for peak capacity on their own, we saved them nearly 70% of the cost," Fu Zhi added.
Demand with such time-based fluctuations doesn’t only appear in some niche vertical scenarios—it’s also common for many new AI companies.
Liblib is one of China's largest AI image-generation platforms by user base. It used to rent large numbers of GPUs from cloud vendors, but a careful analysis found that, averaged across those cards, overall utilization was only 45%.
In other words, more than half of that card time was burning money every day and delivering nothing.
According to Fu Zhi, companies like Liblib aren't rare; almost every AI application tool whose core users are office workers faces this problem. Users are dense during the day and thin out sharply at night. Provision for the peak and the nighttime idle rate soars; provision for the average and daytime users go underserved.
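A back-of-the-envelope sketch shows why neither provisioning strategy works for a day/night demand pattern like this. All numbers below are assumed for illustration; none are Liblib's actual figures.

```python
# Hypothetical day for an office-hours AI tool, measured in "cards needed".
daytime_demand, nighttime_demand = 100, 10   # assumed demand levels
hours_day, hours_night = 12, 12

card_hours_used = daytime_demand * hours_day + nighttime_demand * hours_night

# Provision for the peak: every user is served, but utilization is low.
peak_fleet = daytime_demand
utilization = card_hours_used / (peak_fleet * 24)
print(f"peak provisioning: {utilization:.0%} utilization")        # -> 55%

# Provision for the average: a cheaper fleet, but midday demand is unmet.
avg_fleet = card_hours_used // 24                                 # 55 cards
print(f"average provisioning: {daytime_demand - avg_fleet} cards short at midday")
```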
The AI race looks exciting, but compute costs can choke a company's lifeline. Some companies overestimate their compute needs and let the bills strangle their cash flow. Others underestimate demand, so when usage peaks the service crashes and users leave, never to come back.
“AI application traffic is naturally volatile, and the pricing logic in the compute market is designed for stable demand. The way compute costs are allocated has also remained relatively traditional,” Fu Zhi said. That’s why when an AI company truly takes off, the books on compute costs need a new algorithm.
Traditional compute services have mainly relied on long-term rental contracts: a company prepays for a year of resources whether it uses them or not, and bears the cost of any idle compute itself. What Gongji Technology does, in essence, is move that cost elsewhere, onto parties who already own compute they can't keep busy, such as individual users and internet cafes. That compute was being wasted anyway, so scheduling it creates no new capacity cost while reactivating idle resources that already exist.
“More compute isn’t necessarily better,” Fu Zhi said. “What you need is compute that can flow—available anytime you can call it. That’s the key.”
02 The elastic compute business tests energy-scheduling capabilities
For Fu Zhi, the opportunity to start a compute-scheduling business actually arrived by accident.
During a holiday in May 2023, just as the AI wave was beginning to stir, Fu Zhi posted a message in an AI entrepreneurs' community. It was simple: I have an A100; the shorter the rental, the cheaper the price; come find me if you need it.
His expectations weren't high. After all, he had only one card. Unexpectedly, 30 people ended up inquiring, and all of them paid promptly.
“I’ll serve whoever pays fastest,” he said. He ultimately selected five people to serve. One card, five customers. It validated a judgment he had been thinking about for a long time: ordinary people are starting to need compute.
But he was also clear that this business could only have come together at that point in time. It wasn't that he got lucky; before then, the conditions simply didn't exist.
After all, as early as 1999, volunteer projects were already pooling compute: SETI@home, later generalized into the BOINC platform, drew contributions from hundreds of thousands of people. But those were public-interest scientific computing platforms, free for anyone to use. Later, during the Bitcoin boom, some people considered piggybacking on the mining craze to schedule idle compute, but that ran afoul of the law.
The idea had always been there, but the “soil” wasn’t.
After all, among ordinary users, the people who own truly high-performance GPUs are mostly those born in the 1990s and 2000s; before this generation, a personal computer configured with something like a 4090 was rare. WSL, which lets a personal Windows machine safely run a Linux environment, only reached its official 1.0.0 release in 2022. And the NAT-traversal technology that makes remote calls to personal devices scattered across different locations practical only matured around 2021.
Only when the supply side, demand side, and technical conditions are all in place can this business become possible today.
But Fu Zhi believes the real signal that “the timing is right” wasn’t DeepSeek or an all-in-one machine—it was AI consumer use cases, which were accelerating from niche tools into everyday entertainment for ordinary people.
“Once this process speeds up, demand for compute won’t just be a few big companies purchasing. It will need to be scheduled and distributed at large scale, across nodes, like electricity,” Fu Zhi said.
That's also why Gongji Technology is working to negotiate cooperation with national compute centers. So far it has helped build provincial-level compute-scheduling platforms in the Beijing-Tianjin-Hebei region, the Yangtze River Delta, Shenzhen, and Qinghai, contributing technically to each region's scheduling system.
However, “compute scheduling” is far more difficult than it looks.
Compute scheduling is not the same as compute management, and Fu Zhi draws the distinction sharply: what the big cloud vendors do is management, putting a fleet of machines into one unified system and knowing who is using what and what sits idle; dynamic allocation across regions and devices is another matter.
Compute scheduling means filling one place's peak demand with idle compute from somewhere else. Computer engineering has no ready-made solution for this; it's an old problem from the energy domain. "Peak shaving and valley filling" is originally a power-systems term.
Fu Zhi studied Building Environment and Energy Applications Engineering as an undergraduate at Tsinghua University, under a supervisor who is an academician in the energy field. He transplanted energy-scheduling algorithms to solve the same underlying problem in its compute form. That is Gongji's core moat.
Of course, the engineering of this cross-region scheduling system brings plenty of challenges of its own. Personal computers connected to the pool can be reclaimed at any moment: if the owner launches a game, that machine has to drop out, yet downstream customers demand uninterrupted service.
Fu Zhi's answer is hot standby plus prediction: reserve redundant nodes for every task in advance, while using accumulated history to predict each supplier's online patterns and dynamically adjust the backup ratio. The more data accumulates, the more accurate the prediction and the lower the cost. "I used to need two machines to back you up; as usage accumulates, now I only need one." The network layer is unreliable too, so Gongji connects to three top cloud vendors at once. "It's impossible for all of them to fail at the same time," Fu Zhi noted.
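As a rough illustration of how prediction can drive the backup ratio down, here is a sketch that treats node dropouts as independent and sizes the redundant fleet so enough nodes survive with high confidence. The probabilities, confidence target, and fleet sizes are all invented; Gongji's actual predictor is surely more sophisticated.

```python
from math import comb

def p_at_least(n, k, p):
    """P(at least k of n nodes stay online), each online independently w.p. p."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def backup_ratio(k, p, target=0.999):
    """Smallest fleet n >= k keeping k nodes alive w.p. >= target, as n/k."""
    n = k
    while p_at_least(n, k, p) < target:
        n += 1
    return n / k   # 2.0 would mean "back every node with one more"

# As history accumulates, the estimated online probability p gets sharper
# (and typically higher), so the required redundancy falls:
# "I used to need two machines to back you up; now I only need one."
for p in (0.70, 0.90, 0.99):
    print(f"online prob {p:.2f}: fleet ratio {backup_ratio(10, p):.1f}x")
```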
Then why don’t cloud vendors do elastic compute?
Fu Zhi's explanation is that the big vendors have seen it, but their elastic compute differs in product positioning and pricing strategy; Gongji's edge is price and scheduling efficiency.
The core contradiction of elastic compute is that "compute you can call on at any time" must be prepared in advance, and when nobody is using it, those resources are pure idle cost. Typically, elastic scaling from compute providers costs about five times the standard price, or requires the customer to sign a one-year contract, shifting the idle-compute risk onto the customer.
Gongji can offer truly elastic compute because the resources it draws on are idle to begin with. They weren't purchased ahead of time to lock in capacity; they were already sitting there. That's why Gongji can price more aggressively.
By Fu Zhi's analysis, 80% of market demand for compute goes to the big vendors' long-term rental packages; the remaining 20% is the elastic portion. He has no plan to fight for the 80%; he is focused on the 20%, a slice that will only grow as AI applications keep expanding. "Over there, the longer you rent, the cheaper it gets; with me, the shorter you rent, the cheaper it gets," Fu Zhi added. Today, Gongji Technology's shared compute platform, suanli.cn, lets ordinary consumers rent compute billed by the millisecond.
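To illustrate why billing granularity matters for bursty workloads, a hedged sketch follows; the rates and job sizes are invented and bear no relation to suanli.cn's actual pricing.

```python
HOURLY_RATE = 2.0  # assumed price per card-hour, identical in both models

def long_term_cost(committed_hours=24 * 30):
    # Long-term rental: pay for the whole committed block, used or not.
    return HOURLY_RATE * committed_hours

def metered_cost(job_ms):
    # Per-millisecond metering: pay only for the time actually consumed.
    return job_ms * HOURLY_RATE / 3_600_000  # ms -> hours

burst_ms = 90_000  # a 90-second inference burst
print(f"month-long rental: {long_term_cost():.2f}")        # 1440.00
print(f"metered burst:     {metered_cost(burst_ms):.4f}")  # 0.0500
```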
Gongji Technology team photo | Image source: Gongji Technology
This kind of shared business model has actually been validated in other industries long ago.
Fu Zhi compares the essence of this business to Airbnb: when a city hosts a large exhibition, nearby hotels sell out, and Airbnb matches residents' idle rooms with attendees who have nowhere to stay. The compute version follows the same path. At launch moments and traffic surges, AI applications need far more compute than their everyday baseline, while the compute owned by individual users, internet cafes, and small server rooms sits heavily idle at night and on workdays. Connecting the two sides is what Gongji does.
It’s just that what’s shared isn’t rooms—it’s compute.
03 Compute and energy scheduling—“software-defined infrastructure” in the AI era
Others overseas have walked this road too. RunPod, for example, provides elastic inference services on spare compute; in 2024 it raised a $20 million seed round co-led by Intel Capital and Dell Technologies Capital. Its customers include Cursor, OpenAI, and Perplexity.
But in Fu Zhi’s view, doing this in the United States versus doing it in China are completely different matters.
AWS has offered elastic compute since its founding, promising on-demand usage from day one and serving a mature market with high-priced elastic services. Chinese cloud providers, by contrast, lean toward long-term rental models, and their incentive policies reinforce that bias. They pay less attention to elastic services, and users' willingness to pay for elastic compute is far lower than in the U.S. Transplant RunPod's logic into China and the pricing simply doesn't work.
But Fu Zhi believes compute scheduling is more than a compute-rental business. "Shared compute may just be a foot in the door," he said without hesitation. By his judgment, the window for this business is about two or three years: the gap will persist as long as the compute supply-demand mismatch does, but it won't last forever.
This kind of clear-headedness isn’t common among entrepreneurs. But precisely because of that, he started thinking about something more fundamental very early: where will the next truly explosive AI application emerge from? This judgment will directly determine the trajectory of compute demand. Fu Zhi has two forward-looking views.
First, based on his analysis, China’s super-apps won’t emerge from PC-based productivity tools. The real opportunities are in mobile social entertainment, cross-border hardware combined with the supply chain, and AI applications that can be embedded into real-life scenarios.
China's internet never went through a long era of PC-based productivity tools; users jumped straight from the feature-phone era to the mobile internet. The AI documents, AI slide decks, and AI coding assistants that work in the U.S. rely on tens of millions of users who are accustomed to working on PCs and willing to pay for SaaS tools. China isn't like that. "Do more than 100 million people across China really need to write Word documents? I don't think so." What complicates things further is that even where the demand exists, the big vendors will quickly turn those functions into free plug-ins.
Instead, he has seen high growth in social-entertainment scenarios. He has talked with many practitioners in short dramas and film and television, asking why they embrace AI so eagerly. Their answers gave him a new perspective: "I have nothing left to lose. No one watches movies or TV dramas anymore; we're basically going to die." These people are among the most active AI adopters in the Chinese market, not because they understand the technology best, but because they have no way back.
And when it comes to the development of AI hardware, he also has a different take.
In the past few years, the mainstream approach for AI hardware has been “everything plus a chat window”—equipping every device with a conversation interface. Fu Zhi thinks this direction isn’t right. “Consumers don’t need a refrigerator that can write poetry.”
Truly viable AI hardware is AI that moves into high-frequency scenarios users already have, silently doing the work in the background rather than pulling users to sit down and chat with it.
It’s like how a pet camera should be able to automatically recognize whether a cat is sick, or how a scenic spot camera should complete outfit-changing photo sessions automatically. Users don’t need to change anything; the AI just finishes the job. “If this kind of hardware can be deployed using open-source models, then in the moment when traffic surges, it would also become a customer for elastic compute,” Fu Zhi believes. This is also one of the growth points for Gongji Technology in the future.
Fu Zhi's second judgment runs deeper. It had taken shape by the end of 2024, but only this year did he get the chance to validate it.
He believes that having humans talk to AI directly is itself an inefficient waste. Human information input and output have a ceiling: you can ask only one question at a time, wait for the answer, then ask the next. AI, meanwhile, can run tens of thousands of threads simultaneously, completing machine-to-machine information transfer within milliseconds. "Driving AI with humans means using the slowest link to drag down the speed of the entire system."
What should really happen is AI collaborating directly with AI, or A2A: a task is issued, it triggers a chain of operations across a group of AIs, and humans only define the goal without touching every intermediate step. That, Fu Zhi argues, is why OpenClaw matters today: not for the product itself, but for what it proves. AIs can form communities among themselves, people are willing to pay for A2A, and the direction is feasible.
Once the A2A model becomes mainstream, compute consumption will be several times, even dozens of times, what it is today. At GTC 2026, Jensen Huang said that with the explosion of agentic AI and inference, the compute required today is at least 100 times what was expected a year ago, and that this is just the beginning. By then compute really will be like electricity: the question won't be how many cards you have stockpiled, but whether the entire "compute grid" can dispatch resources on demand. Compute will have moved from the era of management into the era of scheduling.
When A2A truly arrives, compute will become the underlying infrastructure behind every person, every task, and every AI node—just like electricity. Then whoever can precisely schedule compute across regions, across devices, and across time slots will truly control the operational capability of that network.
In Fu Zhi’s view, what Gongji Technology is doing now is preparing for that moment—using this two- or three-year window to build scheduling capability, node networking, and customer relationships. When A2A demand truly explodes, this system will be Gongji Technology’s real moat.
He recently sent a line to the whole company, and near the end of the interview he repeated it:
“Even so, all of this is still just getting started.”
In the context of elastic compute, the line may be nothing more than an entrepreneur's optimistic read on the market. But in the context of A2A, the "start" he means may not be the start of this business at all. It may be the moment when the proposition of compute becoming infrastructure truly begins.