Code Infilling API and Parameter Description
Code Infilling
General usage
Based on the code context before and after a given position, the model predicts the segment to fill in, with the <FILL_ME> tag marking the part to be completed. A typical application is in a development environment (IDE), auto-completing missing or unfinished code segments.
- Currently the Qwen2.5-coder-7B and Meta-CodeLlama model families support Code Infilling; if the model you use does not support it, the API returns an error message.
- If the input contains more than one <FILL_ME>, the API returns an error message (a simple client-side check is sketched below).
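Because a prompt with zero or multiple <FILL_ME> tags is rejected, validating it before spending a request can save a round trip. A minimal sketch, using a hypothetical helper validate_infilling_prompt that is our own convention, not part of the API:
def validate_infilling_prompt(prompt: str) -> None:
    # The API accepts exactly one <FILL_ME> tag per request.
    count = prompt.count("<FILL_ME>")
    if count != 1:
        raise ValueError(f"expected exactly one <FILL_ME> tag, found {count}")

validate_infilling_prompt("def f(x):\n    <FILL_ME>\n    return result\n")  # passes silently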
export API_KEY={API_KEY}
export API_URL={API_URL}
curl "${API_URL}/models/text_infilling" \
-H "X-API-KEY:${API_KEY}" \
-H "content-type: application/json" \
-d '{"model":"qwen2.5-coder-7b-32k",
"inputs":"def remove_non_ascii(s: str) -> str:\n \"\"\" <FILL_ME>\n return result\n",
"parameters":{
"max_new_tokens":43,
"temperature":0.1,
"top_k":50,
"top_p":1,
"frequence_penalty":1}}'
Output: the code snippet that replaces <FILL_ME>, the token counts, and the elapsed time in seconds.
{
"generated_text": "Remove non-ASCII characters from a string. \"\"\"\n result = \"\"\n for c in s:\n if ord(c) < 128:\n result += c\n ",
"function_call": null,
"details": null,
"total_time_taken": "0.99 sec",
"prompt_tokens": 27,
"generated_tokens": 43,
"total_tokens": 70,
"finish_reason": "length"
}
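Note that finish_reason is "length" here: generation stopped because max_new_tokens was reached rather than at a natural end point, so the snippet may be cut off. A minimal sketch of detecting this on the client side (the retry advice is our own suggestion, not API behavior):
# 'resp' stands for the parsed JSON response shown above.
resp = {"generated_text": "...", "generated_tokens": 43, "finish_reason": "length"}
if resp["finish_reason"] == "length":
    # Generation hit the max_new_tokens limit; consider raising it and
    # re-issuing the request if the returned snippet looks incomplete.
    print("truncated after", resp["generated_tokens"], "tokens")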
Python example
import json

import requests

MODEL_NAME = "qwen2.5-coder-7b-32k"
API_KEY = "{API_KEY}"
API_URL = "{API_URL}"

# parameters
max_new_tokens = 43
temperature = 0.1
top_k = 50
top_p = 1.0
frequence_penalty = 1.0

def text_infilling(prompt):
    headers = {
        "content-type": "application/json",
        "X-API-Key": API_KEY}
    data = {
        "model": MODEL_NAME,
        "inputs": prompt,
        "parameters": {
            "max_new_tokens": max_new_tokens,
            "temperature": temperature,
            "top_k": top_k,
            "top_p": top_p,
            "frequence_penalty": frequence_penalty
        }
    }
    result = ''
    try:
        response = requests.post(
            API_URL + "/models/text_infilling", json=data, headers=headers)
        if response.status_code == 200:
            result = json.loads(response.text, strict=False)['generated_text']
        else:
            print("error:", response.status_code, response.text)
    except Exception as e:
        print("error:", e)
    # str.replace removes the whole "<EOT>" token; str.strip("<EOT>") would
    # instead strip the individual characters <, E, O, T, > from both ends.
    return result.replace("<EOT>", "")

text = '''def remove_non_ascii(s: str) -> str:
    """ <FILL_ME>
    return result
'''
result = text_infilling(text)
# Plain str.replace avoids re.sub, whose replacement argument treats
# backslashes in generated code specially.
print(text.replace("<FILL_ME>", result))
Output:
def remove_non_ascii(s: str) -> str:
    """ Remove non-ascii characters from a string. """
    result = ""
    for c in s:
        if ord(c) < 128:
            result += c
    return result
Using Stream mode
Server-sent events (SSE): the server pushes data to the client. Once the connection is established, data is streamed to the client as the text is generated, unlike the one-shot reply above, which improves the user experience. If you need to output a large number of tokens, be sure to use Stream mode to avoid timeouts.
export API_KEY={API_KEY}
export API_URL={API_URL}
# model: qwen2.5-coder-7b-32k
curl "${API_URL}/models/text_infilling" \
-H "X-API-KEY:${API_KEY}" \
-H "content-type: application/json" \
-d '{"model":"qwen2.5-coder-7b-32k",
"inputs":"def compute_gcd(x, y):\n <FILL_ME>\n return result\n",
"stream":true,
"parameters":{
"max_new_tokens":50,
"temperature":0.5,
"top_k":50,
"top_p":1,
"frequence_penalty":1}}'
Output: one record is emitted per token; the final record also returns the total token counts, the finish reason finish_reason (length), and the elapsed time in seconds.
data: {"generated_text": "result", "function_call": null, "details": null, "total_time_taken": null, "prompt_tokens": 0, "generated_tokens": 0, "total_tokens": 0, "finish_reason": null}
data: {"generated_text": " =", "function_call": null, "details": null, "total_time_taken": null, "prompt_tokens": 0, "generated_tokens": 0, "total_tokens": 0, "finish_reason": null}
data: {"generated_text": " ", "function_call": null, "details": null, "total_time_taken": null, "prompt_tokens": 0, "generated_tokens": 0, "total_tokens": 0, "finish_reason": null}
data: {"generated_text": "1", "function_call": null, "details": null, "total_time_taken": null, "prompt_tokens": 0, "generated_tokens": 0, "total_tokens": 0, "finish_reason": null}
data: {"generated_text": "\n", "function_call": null, "details": null, "total_time_taken": null, "prompt_tokens": 0, "generated_tokens": 0, "total_tokens": 0, "finish_reason": null}
data: {"generated_text": " ", "function_call": null, "details": null, "total_time_taken": null, "prompt_tokens": 0, "generated_tokens": 0, "total_tokens": 0, "finish_reason": null}
data: {"generated_text": " while", "function_call": null, "details": null, "total_time_taken": null, "prompt_tokens": 0, "generated_tokens": 0, "total_tokens": 0, "finish_reason": null}
data: {"generated_text": " (", "function_call": null, "details": null, "total_time_taken": null, "prompt_tokens": 0, "generated_tokens": 0, "total_tokens": 0, "finish_reason": null}
data: {"generated_text": "x", "function_call": null, "details": null, "total_time_taken": null, "prompt_tokens": 0, "generated_tokens": 0, "total_tokens": 0, "finish_reason": null}
data: {"generated_text": " !=", "function_call": null, "details": null, "total_time_taken": null, "prompt_tokens": 0, "generated_tokens": 0, "total_tokens": 0, "finish_reason": null}
data: {"generated_text": " ", "function_call": null, "details": null, "total_time_taken": null, "prompt_tokens": 0, "generated_tokens": 0, "total_tokens": 0, "finish_reason": null}
data: {"generated_text": "0", "function_call": null, "details": null, "total_time_taken": null, "prompt_tokens": 0, "generated_tokens": 0, "total_tokens": 0, "finish_reason": null}
data: {"generated_text": ")", "function_call": null, "details": null, "total_time_taken": null, "prompt_tokens": 0, "generated_tokens": 0, "total_tokens": 0, "finish_reason": null}
data: {"generated_text": " and", "function_call": null, "details": null, "total_time_taken": null, "prompt_tokens": 0, "generated_tokens": 0, "total_tokens": 0, "finish_reason": null}
data: {"generated_text": " (", "function_call": null, "details": null, "total_time_taken": null, "prompt_tokens": 0, "generated_tokens": 0, "total_tokens": 0, "finish_reason": null}
data: {"generated_text": "y", "function_call": null, "details": null, "total_time_taken": null, "prompt_tokens": 0, "generated_tokens": 0, "total_tokens": 0, "finish_reason": null}
data: {"generated_text": " !=", "function_call": null, "details": null, "total_time_taken": null, "prompt_tokens": 0, "generated_tokens": 0, "total_tokens": 0, "finish_reason": null}
data: {"generated_text": " ", "function_call": null, "details": null, "total_time_taken": null, "prompt_tokens": 0, "generated_tokens": 0, "total_tokens": 0, "finish_reason": null}
data: {"generated_text": "0", "function_call": null, "details": null, "total_time_taken": null, "prompt_tokens": 0, "generated_tokens": 0, "total_tokens": 0, "finish_reason": null}
data: {"generated_text": "):", "function_call": null, "details": null, "total_time_taken": null, "prompt_tokens": 0, "generated_tokens": 0, "total_tokens": 0, "finish_reason": null}
data: {"generated_text": "\n", "function_call": null, "details": null, "total_time_taken": null, "prompt_tokens": 0, "generated_tokens": 0, "total_tokens": 0, "finish_reason": null}
data: {"generated_text": " ", "function_call": null, "details": null, "total_time_taken": null, "prompt_tokens": 0, "generated_tokens": 0, "total_tokens": 0, "finish_reason": null}
data: {"generated_text": " if", "function_call": null, "details": null, "total_time_taken": null, "prompt_tokens": 0, "generated_tokens": 0, "total_tokens": 0, "finish_reason": null}
data: {"generated_text": " x", "function_call": null, "details": null, "total_time_taken": null, "prompt_tokens": 0, "generated_tokens": 0, "total_tokens": 0, "finish_reason": null}
data: {"generated_text": " >", "function_call": null, "details": null, "total_time_taken": null, "prompt_tokens": 0, "generated_tokens": 0, "total_tokens": 0, "finish_reason": null}
data: {"generated_text": " y", "function_call": null, "details": null, "total_time_taken": null, "prompt_tokens": 0, "generated_tokens": 0, "total_tokens": 0, "finish_reason": null}
data: {"generated_text": ":", "function_call": null, "details": null, "total_time_taken": null, "prompt_tokens": 0, "generated_tokens": 0, "total_tokens": 0, "finish_reason": null}
data: {"generated_text": "\n", "function_call": null, "details": null, "total_time_taken": null, "prompt_tokens": 0, "generated_tokens": 0, "total_tokens": 0, "finish_reason": null}
data: {"generated_text": " ", "function_call": null, "details": null, "total_time_taken": null, "prompt_tokens": 0, "generated_tokens": 0, "total_tokens": 0, "finish_reason": null}
data: {"generated_text": " x", "function_call": null, "details": null, "total_time_taken": null, "prompt_tokens": 0, "generated_tokens": 0, "total_tokens": 0, "finish_reason": null}
data: {"generated_text": " =", "function_call": null, "details": null, "total_time_taken": null, "prompt_tokens": 0, "generated_tokens": 0, "total_tokens": 0, "finish_reason": null}
data: {"generated_text": " x", "function_call": null, "details": null, "total_time_taken": null, "prompt_tokens": 0, "generated_tokens": 0, "total_tokens": 0, "finish_reason": null}
data: {"generated_text": " %", "function_call": null, "details": null, "total_time_taken": null, "prompt_tokens": 0, "generated_tokens": 0, "total_tokens": 0, "finish_reason": null}
data: {"generated_text": " y", "function_call": null, "details": null, "total_time_taken": null, "prompt_tokens": 0, "generated_tokens": 0, "total_tokens": 0, "finish_reason": null}
data: {"generated_text": "\n", "function_call": null, "details": null, "total_time_taken": null, "prompt_tokens": 0, "generated_tokens": 0, "total_tokens": 0, "finish_reason": null}
data: {"generated_text": " ", "function_call": null, "details": null, "total_time_taken": null, "prompt_tokens": 0, "generated_tokens": 0, "total_tokens": 0, "finish_reason": null}
data: {"generated_text": " else", "function_call": null, "details": null, "total_time_taken": null, "prompt_tokens": 0, "generated_tokens": 0, "total_tokens": 0, "finish_reason": null}
data: {"generated_text": ":", "function_call": null, "details": null, "total_time_taken": null, "prompt_tokens": 0, "generated_tokens": 0, "total_tokens": 0, "finish_reason": null}
data: {"generated_text": "\n", "function_call": null, "details": null, "total_time_taken": null, "prompt_tokens": 0, "generated_tokens": 0, "total_tokens": 0, "finish_reason": null}
data: {"generated_text": " ", "function_call": null, "details": null, "total_time_taken": null, "prompt_tokens": 0, "generated_tokens": 0, "total_tokens": 0, "finish_reason": null}
data: {"generated_text": " y", "function_call": null, "details": null, "total_time_taken": null, "prompt_tokens": 0, "generated_tokens": 0, "total_tokens": 0, "finish_reason": null}
data: {"generated_text": " =", "function_call": null, "details": null, "total_time_taken": null, "prompt_tokens": 0, "generated_tokens": 0, "total_tokens": 0, "finish_reason": null}
data: {"generated_text": " y", "function_call": null, "details": null, "total_time_taken": null, "prompt_tokens": 0, "generated_tokens": 0, "total_tokens": 0, "finish_reason": null}
data: {"generated_text": " %", "function_call": null, "details": null, "total_time_taken": null, "prompt_tokens": 0, "generated_tokens": 0, "total_tokens": 0, "finish_reason": null}
data: {"generated_text": " x", "function_call": null, "details": null, "total_time_taken": null, "prompt_tokens": 0, "generated_tokens": 0, "total_tokens": 0, "finish_reason": null}
data: {"generated_text": "\n", "function_call": null, "details": null, "total_time_taken": null, "prompt_tokens": 0, "generated_tokens": 0, "total_tokens": 0, "finish_reason": null}
data: {"generated_text": " ", "function_call": null, "details": null, "total_time_taken": null, "prompt_tokens": 0, "generated_tokens": 0, "total_tokens": 0, "finish_reason": null}
data: {"generated_text": " result", "function_call": null, "details": null, "total_time_taken": null, "prompt_tokens": 0, "generated_tokens": 0, "total_tokens": 0, "finish_reason": null}
data: {"generated_text": " =", "function_call": null, "details": null, "total_time_taken": null, "prompt_tokens": 0, "generated_tokens": 0, "total_tokens": 0, "finish_reason": null}
data: {"generated_text": " x", "function_call": null, "details": null, "total_time_taken": "0.80 sec", "prompt_tokens": 20, "generated_tokens": 50, "total_tokens": 70, "finish_reason": "length"}
- A single token cannot always be decoded into displayable text. When that happens, the generated_text field of that record is an empty string, and the token is combined with the next record for decoding, until it can be displayed.
- This service is built on sse-starlette; during an SSE session a ping event arrives roughly every 15 seconds, so a connection held open longer than that will receive the following message (not in JSON format). Take care when processing the data; the Python example below already includes this handling.
event: ping
data: 2023-09-26 04:25:08.978531
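In isolation, filtering these ping events looks like the following. A minimal sketch, using a hypothetical helper parse_sse_line (our own convention, not part of the API) that returns a parsed record for well-formed data: lines and None for everything else:
import json

def parse_sse_line(line: str):
    # Token records arrive as "data: {JSON}". "event: ping" lines and the
    # timestamp payload that follows them are not valid JSON, so they are
    # dropped here.
    if not line.startswith("data:"):
        return None
    try:
        return json.loads(line[5:], strict=False)
    except json.JSONDecodeError:
        return None  # e.g. 'data: 2023-09-26 04:25:08.978531' from a ping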
Python example
import json

import requests

MODEL_NAME = "qwen2.5-coder-7b-32k"
API_KEY = "{API_KEY}"
API_URL = "{API_URL}"

# parameters
max_new_tokens = 50
temperature = 0.5
top_k = 50
top_p = 1.0
frequence_penalty = 1.0

def text_infilling(prompt):
    headers = {
        "content-type": "application/json",
        "X-API-Key": API_KEY}
    data = {
        "model": MODEL_NAME,
        "inputs": prompt,
        "parameters": {
            "max_new_tokens": max_new_tokens,
            "temperature": temperature,
            "top_k": top_k,
            "top_p": top_p,
            "frequence_penalty": frequence_penalty
        },
        "stream": True
    }
    messages = []
    result = ""
    try:
        response = requests.post(API_URL + "/models/text_infilling", json=data, headers=headers, stream=True)
        if response.status_code == 200:
            for chunk in response.iter_lines():
                chunk = chunk.decode('utf-8')
                if chunk == "":
                    continue
                # only handle the format => data: ${JSON_FORMAT};
                # ping events are not valid JSON and are skipped below
                try:
                    record = json.loads(chunk[5:], strict=False)
                    if "status_code" in record:
                        print("{:d}, {}".format(record["status_code"], record["error"]))
                        break
                    elif "finish_reason" in record and record["finish_reason"] is not None:
                        # final record: collect it, then assemble the full result
                        message = record["generated_text"]
                        messages.append(message)
                        print(">>> " + message)
                        result = ''.join(messages)
                        break
                    elif record["generated_text"] is not None:
                        message = record["generated_text"]
                        messages.append(message)
                        print(">>> " + message)
                    else:
                        print("error")
                        break
                except json.JSONDecodeError:
                    pass
        else:
            print("error:", response.status_code, response.text)
    except Exception as e:
        print("error:", e)
    # str.replace removes the whole "<EOT>" token; str.strip("<EOT>") would
    # instead strip the individual characters from both ends.
    return result.replace("<EOT>", "")

text = """def compute_gcd(x, y):
    <FILL_ME>
    return result
"""
result = text_infilling(text)
print(text.replace("<FILL_ME>", result))
Output:
>>> result
>>> =
>>>
>>> 1
>>>
>>>
>>> while
>>> (
>>> x
>>> !=
>>>
>>> 0
>>> )
>>> and
>>> (
>>> y
>>> !=
>>>
>>> 0
>>> ):
>>>
>>>
>>> if
>>> x
>>> >
>>> y
>>> :
>>>
>>>
>>> x
>>> =
>>> x
>>> %
>>> y
>>>
>>>
>>> else
>>> :
>>>
>>>
>>> y
>>> =
>>> y
>>> %
>>> x
>>>
>>>
>>> result
>>> =
>>> x
def compute_gcd(x, y):
    result = 1
    while (x != 0) and (y != 0):
        if x > y:
            x = x % y
        else:
            y = y % x
        result = x
    return result