Code Infilling API and Parameters

Code Infilling

General Usage

Given the surrounding code context, the model predicts the segment needed to fill the gap marked by the <FILL_ME> tag. In practice this is used in a development environment (IDE) to auto-complete missing or unfinished sections of code.
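
To make the prompt format concrete, here is a minimal sketch that assembles the inputs string from the code before and after the gap (the names prefix and suffix are illustrative only; the resulting string matches the inputs field used in the curl example below).

# Illustrative sketch: build an infilling prompt by joining the code before
# and after the gap around a single <FILL_ME> tag.
prefix = 'def remove_non_ascii(s: str) -> str:\n    """ '
suffix = '\n    return result\n'
inputs = prefix + "<FILL_ME>" + suffix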

Notes
  • Currently the Qwen2.5-coder-7B and Meta-CodeLlama model families support Code Infilling. If the model you use does not support it, the API returns an error message.
  • If the input contains more than one <FILL_ME> tag, the API returns an error message (see the validation sketch below).
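
Because a request with multiple <FILL_ME> tags is rejected, a quick client-side check can save a round trip. A minimal sketch follows; requiring exactly one tag is an assumption on top of the documented multiple-tag error.

# Minimal sketch: multiple <FILL_ME> tags cause an API error, so require
# exactly one before sending the request (the exactly-one rule is assumed).
def validate_prompt(prompt: str) -> None:
    if prompt.count("<FILL_ME>") != 1:
        raise ValueError("inputs must contain exactly one <FILL_ME> tag")
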
export API_KEY={API_KEY}
export API_URL={API_URL}

curl "${API_URL}/models/text_infilling" \
-H "X-API-KEY:${API_KEY}" \
-H "content-type: application/json" \
-d '{"model":"qwen2.5-coder-7b-32k",
"inputs":"def remove_non_ascii(s: str) -> str:\n \"\"\" <FILL_ME>\n return result\n",
"parameters":{
"max_new_tokens":43,
"temperature":0.1,
"top_k":50,
"top_p":1,
"frequence_penalty":1}}'

Output: the code snippet that replaces <FILL_ME>, the token counts, and the elapsed time in seconds.

{
  "generated_text": "Remove non-ASCII characters from a string. \"\"\"\n result = \"\"\n for c in s:\n if ord(c) < 128:\n result += c\n ",
  "function_call": null,
  "details": null,
  "total_time_taken": "0.99 sec",
  "prompt_tokens": 27,
  "generated_tokens": 43,
  "total_tokens": 70,
  "finish_reason": "length"
}
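
In this response, generated_tokens equals the requested max_new_tokens (43) and finish_reason is "length", i.e. generation stopped at the token limit, so the fill may be cut off. A minimal sketch of checking for this (the helper name and the resp argument are illustrative):

def check_truncation(resp: dict) -> None:
    # finish_reason == "length" means max_new_tokens was reached and the
    # fill may be incomplete (resp: the decoded JSON response shown above)
    if resp.get("finish_reason") == "length":
        print("warning: fill may be truncated; consider raising max_new_tokens")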

Python example
import json
import requests

MODEL_NAME = "qwen2.5-coder-7b-32k"
API_KEY = "{API_KEY}"
API_URL = "{API_URL}"

# parameters
max_new_tokens = 43
temperature = 0.1
top_k = 50
top_p = 1.0
frequence_penalty = 1.0

def text_infilling(prompt):
    headers = {
        "content-type": "application/json",
        "X-API-Key": API_KEY}
    data = {
        "model": MODEL_NAME,
        "inputs": prompt,
        "parameters": {
            "max_new_tokens": max_new_tokens,
            "temperature": temperature,
            "top_k": top_k,
            "top_p": top_p,
            "frequence_penalty": frequence_penalty
        }
    }

    result = ''
    try:
        response = requests.post(
            API_URL + "/models/text_infilling", json=data, headers=headers)
        if response.status_code == 200:
            result = json.loads(response.text, strict=False)['generated_text']
        else:
            print("error: HTTP {}".format(response.status_code))
    except requests.RequestException as e:
        print("error: {}".format(e))
    # remove the end-of-text marker if present; str.strip would remove
    # individual characters (<, E, O, T, >), not the token as a whole
    return result.replace("<EOT>", "")

text = '''def remove_non_ascii(s: str) -> str:
    """ <FILL_ME>
    return result
'''
result = text_infilling(text)
# str.replace avoids re.sub interpreting backslashes in the generated code
print(text.replace("<FILL_ME>", result))

Output:
def remove_non_ascii(s: str) -> str:
    """ Remove non-ascii characters from a string. """
    result = ""
    for c in s:
        if ord(c) < 128:
            result += c

    return result

Using Stream Mode

Server-sent events (SSE): the server pushes data to the client proactively. Once the connection is established, data is sent to the client as each piece of text is generated, unlike the one-shot reply above, which improves the user experience. If you need to output a large number of tokens, be sure to use Stream mode to avoid timeouts.

export API_KEY={API_KEY}
export API_URL={API_URL}

# model: qwen2.5-coder-7b-32k
curl "${API_URL}/models/text_infilling" \
-H "X-API-KEY:${API_KEY}" \
-H "content-type: application/json" \
-d '{"model":"qwen2.5-coder-7b-32k",
"inputs":"def compute_gcd(x, y):\n <FILL_ME>\n return result\n",
"stream":true,
"parameters":{
"max_new_tokens":50,
"temperature":0.5,
"top_k":50,
"top_p":1,
"frequence_penalty":1}}'

Output: one record is emitted per token; the final record also returns the total token counts, the finish_reason (length), and the elapsed time in seconds.

data: {"generated_text": "result", "function_call": null, "details": null, "total_time_taken": null, "prompt_tokens": 0, "generated_tokens": 0, "total_tokens": 0, "finish_reason": null}
data: {"generated_text": " =", "function_call": null, "details": null, "total_time_taken": null, "prompt_tokens": 0, "generated_tokens": 0, "total_tokens": 0, "finish_reason": null}
data: {"generated_text": " ", "function_call": null, "details": null, "total_time_taken": null, "prompt_tokens": 0, "generated_tokens": 0, "total_tokens": 0, "finish_reason": null}
data: {"generated_text": "1", "function_call": null, "details": null, "total_time_taken": null, "prompt_tokens": 0, "generated_tokens": 0, "total_tokens": 0, "finish_reason": null}
data: {"generated_text": "\n", "function_call": null, "details": null, "total_time_taken": null, "prompt_tokens": 0, "generated_tokens": 0, "total_tokens": 0, "finish_reason": null}
data: {"generated_text": " ", "function_call": null, "details": null, "total_time_taken": null, "prompt_tokens": 0, "generated_tokens": 0, "total_tokens": 0, "finish_reason": null}
data: {"generated_text": " while", "function_call": null, "details": null, "total_time_taken": null, "prompt_tokens": 0, "generated_tokens": 0, "total_tokens": 0, "finish_reason": null}
data: {"generated_text": " (", "function_call": null, "details": null, "total_time_taken": null, "prompt_tokens": 0, "generated_tokens": 0, "total_tokens": 0, "finish_reason": null}
data: {"generated_text": "x", "function_call": null, "details": null, "total_time_taken": null, "prompt_tokens": 0, "generated_tokens": 0, "total_tokens": 0, "finish_reason": null}
data: {"generated_text": " !=", "function_call": null, "details": null, "total_time_taken": null, "prompt_tokens": 0, "generated_tokens": 0, "total_tokens": 0, "finish_reason": null}
data: {"generated_text": " ", "function_call": null, "details": null, "total_time_taken": null, "prompt_tokens": 0, "generated_tokens": 0, "total_tokens": 0, "finish_reason": null}
data: {"generated_text": "0", "function_call": null, "details": null, "total_time_taken": null, "prompt_tokens": 0, "generated_tokens": 0, "total_tokens": 0, "finish_reason": null}
data: {"generated_text": ")", "function_call": null, "details": null, "total_time_taken": null, "prompt_tokens": 0, "generated_tokens": 0, "total_tokens": 0, "finish_reason": null}
data: {"generated_text": " and", "function_call": null, "details": null, "total_time_taken": null, "prompt_tokens": 0, "generated_tokens": 0, "total_tokens": 0, "finish_reason": null}
data: {"generated_text": " (", "function_call": null, "details": null, "total_time_taken": null, "prompt_tokens": 0, "generated_tokens": 0, "total_tokens": 0, "finish_reason": null}
data: {"generated_text": "y", "function_call": null, "details": null, "total_time_taken": null, "prompt_tokens": 0, "generated_tokens": 0, "total_tokens": 0, "finish_reason": null}
data: {"generated_text": " !=", "function_call": null, "details": null, "total_time_taken": null, "prompt_tokens": 0, "generated_tokens": 0, "total_tokens": 0, "finish_reason": null}
data: {"generated_text": " ", "function_call": null, "details": null, "total_time_taken": null, "prompt_tokens": 0, "generated_tokens": 0, "total_tokens": 0, "finish_reason": null}
data: {"generated_text": "0", "function_call": null, "details": null, "total_time_taken": null, "prompt_tokens": 0, "generated_tokens": 0, "total_tokens": 0, "finish_reason": null}
data: {"generated_text": "):", "function_call": null, "details": null, "total_time_taken": null, "prompt_tokens": 0, "generated_tokens": 0, "total_tokens": 0, "finish_reason": null}
data: {"generated_text": "\n", "function_call": null, "details": null, "total_time_taken": null, "prompt_tokens": 0, "generated_tokens": 0, "total_tokens": 0, "finish_reason": null}
data: {"generated_text": " ", "function_call": null, "details": null, "total_time_taken": null, "prompt_tokens": 0, "generated_tokens": 0, "total_tokens": 0, "finish_reason": null}
data: {"generated_text": " if", "function_call": null, "details": null, "total_time_taken": null, "prompt_tokens": 0, "generated_tokens": 0, "total_tokens": 0, "finish_reason": null}
data: {"generated_text": " x", "function_call": null, "details": null, "total_time_taken": null, "prompt_tokens": 0, "generated_tokens": 0, "total_tokens": 0, "finish_reason": null}
data: {"generated_text": " >", "function_call": null, "details": null, "total_time_taken": null, "prompt_tokens": 0, "generated_tokens": 0, "total_tokens": 0, "finish_reason": null}
data: {"generated_text": " y", "function_call": null, "details": null, "total_time_taken": null, "prompt_tokens": 0, "generated_tokens": 0, "total_tokens": 0, "finish_reason": null}
data: {"generated_text": ":", "function_call": null, "details": null, "total_time_taken": null, "prompt_tokens": 0, "generated_tokens": 0, "total_tokens": 0, "finish_reason": null}
data: {"generated_text": "\n", "function_call": null, "details": null, "total_time_taken": null, "prompt_tokens": 0, "generated_tokens": 0, "total_tokens": 0, "finish_reason": null}
data: {"generated_text": " ", "function_call": null, "details": null, "total_time_taken": null, "prompt_tokens": 0, "generated_tokens": 0, "total_tokens": 0, "finish_reason": null}
data: {"generated_text": " x", "function_call": null, "details": null, "total_time_taken": null, "prompt_tokens": 0, "generated_tokens": 0, "total_tokens": 0, "finish_reason": null}
data: {"generated_text": " =", "function_call": null, "details": null, "total_time_taken": null, "prompt_tokens": 0, "generated_tokens": 0, "total_tokens": 0, "finish_reason": null}
data: {"generated_text": " x", "function_call": null, "details": null, "total_time_taken": null, "prompt_tokens": 0, "generated_tokens": 0, "total_tokens": 0, "finish_reason": null}
data: {"generated_text": " %", "function_call": null, "details": null, "total_time_taken": null, "prompt_tokens": 0, "generated_tokens": 0, "total_tokens": 0, "finish_reason": null}
data: {"generated_text": " y", "function_call": null, "details": null, "total_time_taken": null, "prompt_tokens": 0, "generated_tokens": 0, "total_tokens": 0, "finish_reason": null}
data: {"generated_text": "\n", "function_call": null, "details": null, "total_time_taken": null, "prompt_tokens": 0, "generated_tokens": 0, "total_tokens": 0, "finish_reason": null}
data: {"generated_text": " ", "function_call": null, "details": null, "total_time_taken": null, "prompt_tokens": 0, "generated_tokens": 0, "total_tokens": 0, "finish_reason": null}
data: {"generated_text": " else", "function_call": null, "details": null, "total_time_taken": null, "prompt_tokens": 0, "generated_tokens": 0, "total_tokens": 0, "finish_reason": null}
data: {"generated_text": ":", "function_call": null, "details": null, "total_time_taken": null, "prompt_tokens": 0, "generated_tokens": 0, "total_tokens": 0, "finish_reason": null}
data: {"generated_text": "\n", "function_call": null, "details": null, "total_time_taken": null, "prompt_tokens": 0, "generated_tokens": 0, "total_tokens": 0, "finish_reason": null}
data: {"generated_text": " ", "function_call": null, "details": null, "total_time_taken": null, "prompt_tokens": 0, "generated_tokens": 0, "total_tokens": 0, "finish_reason": null}
data: {"generated_text": " y", "function_call": null, "details": null, "total_time_taken": null, "prompt_tokens": 0, "generated_tokens": 0, "total_tokens": 0, "finish_reason": null}
data: {"generated_text": " =", "function_call": null, "details": null, "total_time_taken": null, "prompt_tokens": 0, "generated_tokens": 0, "total_tokens": 0, "finish_reason": null}
data: {"generated_text": " y", "function_call": null, "details": null, "total_time_taken": null, "prompt_tokens": 0, "generated_tokens": 0, "total_tokens": 0, "finish_reason": null}
data: {"generated_text": " %", "function_call": null, "details": null, "total_time_taken": null, "prompt_tokens": 0, "generated_tokens": 0, "total_tokens": 0, "finish_reason": null}
data: {"generated_text": " x", "function_call": null, "details": null, "total_time_taken": null, "prompt_tokens": 0, "generated_tokens": 0, "total_tokens": 0, "finish_reason": null}
data: {"generated_text": "\n", "function_call": null, "details": null, "total_time_taken": null, "prompt_tokens": 0, "generated_tokens": 0, "total_tokens": 0, "finish_reason": null}
data: {"generated_text": " ", "function_call": null, "details": null, "total_time_taken": null, "prompt_tokens": 0, "generated_tokens": 0, "total_tokens": 0, "finish_reason": null}
data: {"generated_text": " result", "function_call": null, "details": null, "total_time_taken": null, "prompt_tokens": 0, "generated_tokens": 0, "total_tokens": 0, "finish_reason": null}
data: {"generated_text": " =", "function_call": null, "details": null, "total_time_taken": null, "prompt_tokens": 0, "generated_tokens": 0, "total_tokens": 0, "finish_reason": null}
data: {"generated_text": " x", "function_call": null, "details": null, "total_time_taken": "0.80 sec", "prompt_tokens": 20, "generated_tokens": 50, "total_tokens": 70, "finish_reason": "length"}

Notes
  1. A single token cannot always be decoded into proper text. When that happens, the generated_text field of that record is an empty string, and the token is combined with the following record(s) for decoding until it can be displayed.
  2. This service uses sse-starlette, which sends a ping event roughly every 15 seconds during SSE. If the connection stays open longer than that, you will receive the following message (not in JSON format), which must be handled during data processing; the sketch and the Python example below already include this handling.

    event: ping
    data: 2023-09-26 04:25:08.978531
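
A minimal sketch of that handling (the helper name parse_sse_payload is illustrative): keep only data: records and skip blank lines and ping events, which are not JSON.

import json

# Illustrative helper: decode a "data: {...}" record; return None for blank
# lines, "event: ping" lines, and other non-JSON payloads.
def parse_sse_payload(line: str):
    if not line.startswith("data: "):
        return None
    try:
        return json.loads(line[len("data: "):], strict=False)
    except json.JSONDecodeError:
        return None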

Python example
import json
import requests

MODEL_NAME = "qwen2.5-coder-7b-32k"
API_KEY = "{API_KEY}"
API_URL = "{API_URL}"

# parameters
max_new_tokens = 50
temperature = 0.5
top_k = 50
top_p = 1.0
frequence_penalty = 1.0

def text_infilling(prompt):
    headers = {
        "content-type": "application/json",
        "X-API-Key": API_KEY}
    data = {
        "model": MODEL_NAME,
        "inputs": prompt,
        "parameters": {
            "max_new_tokens": max_new_tokens,
            "temperature": temperature,
            "top_k": top_k,
            "top_p": top_p,
            "frequence_penalty": frequence_penalty
        },
        "stream": True
    }

    messages = []
    result = ""
    try:
        response = requests.post(
            API_URL + "/models/text_infilling", json=data, headers=headers, stream=True)
        if response.status_code == 200:
            for chunk in response.iter_lines():
                chunk = chunk.decode('utf-8')
                if chunk == "":
                    continue

                # only process records of the form => data: ${JSON_FORMAT}
                try:
                    record = json.loads(chunk[5:], strict=False)
                    if "status_code" in record:
                        print("{:d}, {}".format(record["status_code"], record["error"]))
                        break
                    elif "finish_reason" in record and record["finish_reason"] is not None:
                        # final record: append it, then join all the pieces
                        message = record["generated_text"]
                        messages.append(message)
                        print(">>> " + message)
                        result = ''.join(messages)
                        break
                    elif record["generated_text"] is not None:
                        message = record["generated_text"]
                        messages.append(message)
                        print(">>> " + message)
                    else:
                        print("error")
                        break
                except (json.JSONDecodeError, TypeError, KeyError):
                    # skip "event: ping" lines and other non-JSON payloads
                    pass
        else:
            print("error: HTTP {}".format(response.status_code))
    except requests.RequestException as e:
        print("error: {}".format(e))
    # remove the end-of-text marker if present
    return result.replace("<EOT>", "")

text = """def compute_gcd(x, y):
    <FILL_ME>
    return result
"""
result = text_infilling(text)
# str.replace avoids re.sub interpreting backslashes in the generated code
print(text.replace("<FILL_ME>", result))

Output:
>>> result
>>> =
>>>
>>> 1
>>>

>>>
>>> while
>>> (
>>> x
>>> !=
>>>
>>> 0
>>> )
>>> and
>>> (
>>> y
>>> !=
>>>
>>> 0
>>> ):
>>>

>>>
>>> if
>>> x
>>> >
>>> y
>>> :
>>>

>>>
>>> x
>>> =
>>> x
>>> %
>>> y
>>>

>>>
>>> else
>>> :
>>>

>>>
>>> y
>>> =
>>> y
>>> %
>>> x
>>>

>>>
>>> result
>>> =
>>> x
def compute_gcd(x, y):
    result = 1
    while (x != 0) and (y != 0):
        if x > y:
            x = x % y
        else:
            y = y % x
        result = x
    return result