Money-Saving Tips! Save Like No One Else Taught You

💰 Money-Saving Tips! How to Save in Ways Others Won’t Tell You

Use Different Models for AI Planning and Execution

Example 1:

Take Cline as an example. The screenshot below shows the usage history of an unsuspecting user. Cline includes more than 10,000 system prompt tokens in every prompt, and AI coding also requires feeding your code into the model, so each prompt can easily consume tens of thousands of tokens. With the most powerful model, that token usage makes costs climb quickly and your quota runs out fast. Frustrating to use, and not smooth at all! Here is how to solve it.

You can configure the planning mode to use a more expensive but more powerful 480B model, and the execution mode to use a cheaper 30B model that still has solid coding capability (it is distilled from the 480B model). Planning usually takes only one step while execution involves many, so most prompts go to the cheaper model, and this setup can save up to roughly 80% of the cost! The same applies to tools like Claude Code and Qwen Code.
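To see where a saving of that size can come from, here is a rough back-of-the-envelope sketch. Every number in it (tokens per prompt, both prices, the step counts) is a hypothetical placeholder chosen for illustration, not a real price or a measurement, and it counts input tokens only.

```python
# All numbers below are hypothetical placeholders, not real prices or measurements.
TOKENS_PER_PROMPT = 30_000  # system prompt + code context per request (assumed)
PRICE_LARGE = 2.00          # USD per million input tokens, large planning model (assumed)
PRICE_SMALL = 0.30          # USD per million input tokens, small execution model (assumed)


def session_cost(plan_steps, exec_steps, plan_price, exec_price,
                 tokens_per_prompt=TOKENS_PER_PROMPT):
    """Return the input-token cost in USD for one coding session."""
    plan_cost = plan_steps * tokens_per_prompt * plan_price / 1_000_000
    exec_cost = exec_steps * tokens_per_prompt * exec_price / 1_000_000
    return plan_cost + exec_cost


# One planning step followed by twenty execution steps (assumed workload).
single = session_cost(1, 20, PRICE_LARGE, PRICE_LARGE)  # large model for everything
split = session_cost(1, 20, PRICE_LARGE, PRICE_SMALL)   # large plans, small executes
saving = 1 - split / single

print(f"large model only: ${single:.2f}")
print(f"split models:     ${split:.2f}")
print(f"saving:           {saving:.0%}")
```

With these placeholder values the split setup costs about $0.24 instead of $1.26, a saving of roughly 81%. The actual figure depends on your real prices and on how many execution steps follow each plan.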

Use Ultra-Low-Cost Models for Simple Repetitive Tasks

Example 1:

Take ChatBox as an example. AI clients run many simple, repetitive tasks, such as topic naming, article summarization, translation (for standard use cases that don’t require high specialization), and embedding. Using our ultra-low-cost model for these tasks instead of a more expensive large model can significantly reduce expenses.
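If you call the API from your own scripts rather than through a client, the same idea can be sketched as a small routing function. The model names and task labels below are hypothetical placeholders, so substitute whatever your provider actually lists. Embedding is left out of the sketch because it typically goes to a dedicated embedding model rather than a chat model.

```python
# Model names are hypothetical placeholders, not real model identifiers.
CHEAP_MODEL = "ultra-low-cost-model"
STRONG_MODEL = "large-model"

# Simple, repetitive task types taken from the paragraph above (labels are assumed).
SIMPLE_TASKS = {"topic_naming", "summarization", "translation"}


def pick_model(task_type: str) -> str:
    """Route simple repetitive tasks to the cheap model, everything else to the strong one."""
    return CHEAP_MODEL if task_type in SIMPLE_TASKS else STRONG_MODEL


for task in ("topic_naming", "translation", "code_review"):
    print(f"{task} -> {pick_model(task)}")
```

Anything not explicitly listed as simple falls back to the stronger model, so an unrecognized task never silently gets the weaker one.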

4All API - One-stop AI model API aggregation platform
Official site: https://4allapi.com
API Base: https://api.4allapi.com