WorFBenchの解析
gold trajectory の構造
以下は、gold_traj/ ディレクトリ内に収められているデータセットの構造の概要。LLaMA-Factoryを参照する。toolbenchのみcategoryのキーがない。
- “source”
- “conversations”
- “role”: “system”
- “content”
- “role”: “user”
- “content”
- “role”: “assistant”
- “content”
- “target”
- “id”
- “category”
“role”: “system”はシステムプロンプトを意味し、すべてのデータセットが同じ値を持つ。
alfworld
"source": "alfworld",
"conversations": [
{
"role": "system",
"content": "You are a helpful task planner.\nFirst I will give you the task description and actions list you can take to finish the task, and your task is to plan a sequence of nodes which are subtasks to achieve the task. Each node of workflow is a subtask, and you can take actions to complete the subtask.\nAfter you construct the plan, since some subtasks may not be dependent on each other, while others may have dependencies, please convert these nodes of subtasks into a topology diagram based on the task relevance in the workflow. The Graph should start with START node, and end with END node. \nKeep your output in foramt:\nNode:\n1.{subtask_1}\n2.{subtask_2}\n...\nEdges:\n(START,1)\n...\n(n,END))"
},
{
"role": "user",
"content": "You are in the middle of a room. Looking quickly around you, you see a armchair 1, a diningtable 1, a drawer 4, a drawer 3, a drawer 2, a drawer 1, a garbagecan 1, a shelf 1, a sidetable 3, a sidetable 2, a sidetable 1, and a sofa 1.\n\nYour task is to: find two newspaper and put them in armchair.The action list you can take: 1. go to {recep}\n2. task {obj} from {recep}\n3. put {obj} in/on {recep}\n4. open {recep}\n5. close {recep}\n6. toggle {obj} {recep}\n7. clean {obj} with {recep}\n8. heat {obj} with {recep}\n9. cool {obj} with {recep}\nwhere {obj} and {recep} correspond to objects and receptacles."
},
{
"role": "assistant",
"content": "Node:\n1: go to locations where newspapers may be placed\n2: take newspaper from location\n3: go to armchair\n4: put newspaper in/on armchair\n5: repeat process for second newspaper.\nEdge: (START,1) (1,2) (2,3) (3,4) (4,5) (5,END)"
}
],
"target": "graph",
"id": "alfworld_196",
"category": "eval"
intercodesql
"source": "intercodesql",
"conversations": [
{
"role": "system",
"content": "You are a helpful task planner.\nFirst I will give you the task description and tools list you can use to finish the task, and your task is to plan a sequence of nodes which are subtasks to achieve the task. Each node of workflow is a subtask, and you can use the tools to complete the subtask.\nAfter you construct the plan, since some subtasks may not be dependent on each other, while others may have dependencies. Please convert these nodes of subtasks into a topology diagram based on the task relevance in the workflow. The Graph should start with START node, and end with END node. \nKeep your output in foramt:\nNode:\n1.{subtask_1}\n2.{subtask_2}\n...\nEdges:\n(START,1)\n...\n(n,END))"
},
{
"role": "user",
"content": "Task: What is the average GNP and total population in all nations whose government is US territory?\nThe database tables and corresponding fields you can operate are: {'city': ['ID', 'Name', 'CountryCode', 'District', 'Population'], 'sqlite_sequence': ['name', 'seq'], 'country': ['Code', 'Name', 'Continent', 'Region', 'SurfaceArea', 'IndepYear', 'Population', 'LifeExpectancy', 'GNP', 'GNPOld', 'LocalName', 'GovernmentForm', 'HeadOfState', 'Capital', 'Code2'], 'countrylanguage': ['CountryCode', 'Language', 'IsOfficial', 'Percentage']}\n"
},
{
"role": "assistant",
"content": "Node:\n1: Filter records where GovernmentForm is \"US Territory\" in the country table\n2: Calculate the average GNP and sum of the population for these filtered records\nEdge: (START,1) (1,2) (2,END)"
}
],
"target": "graph",
"id": "intercodesql_1",
"category": "eval"
lumos
"source": "lumos",
"conversations": [
{
"role": "system",
"content": "You are a helpful task planner.\nFirst I will give you the task description and tools list you can use to finish the task, and your task is to plan a sequence of nodes which are subtasks to achieve the task. Each node of workflow is a subtask, and you can use the tools to complete the subtask.\nAfter you construct the plan, since some subtasks may not be dependent on each other, while others may have dependencies. Please convert these nodes of subtasks into a topology diagram based on the task relevance in the workflow. The Graph should start with START node, and end with END node. \nKeep your output in foramt:\nNode:\n1.{subtask_1}\n2.{subtask_2}\n...\nEdges:\n(START,1)\n...\n(n,END))"
},
{
"role": "user",
"content": "Task: What region/continent is likely to appear here? There're some choices: A. australia, B. arctic, C. africa, D. asia. The image shows the following contents: The image features a group of three polar bears standing on a rocky beach near the water. They appear to be walking and exploring the area. In the background, there is a boat visible, possibly a tourist boat or a fishing vessel. There are several people in the scene, some of them standing near the boat, while others are scattered around the area. They seem to be observing the polar bears from a safe distance, possibly enjoying the sight of these magnificent creatures in their natural habitat.\nThe api list you can use: {'VQA(image_context, query)': 'Given image context, answer the given query', 'QA(context, query)': 'Given context, answer the given query.'}"
},
{
"role": "assistant",
"content": "Node:\n1: Identify the animals in the image.\n2: Answer which region/continent polar bears are native to.\n3: Select the choice from the four answer choices.\nEdge: (START,1) (1,2) (2,3) (3,END)"
}
],
"target": "graph",
"id": "lumos_40163",
"category": "eval"
os
"source": "os",
"conversations": [
{
"role": "system",
"content": "You are a helpful task planner.\nFirst I will give you the task description and actions list you can take to finish the task, and your task is to plan a sequence of nodes which are subtasks to achieve the task. Each node of workflow is a subtask, and you can take actions to complete the subtask.\nAfter you construct the plan, since some subtasks may not be dependent on each other, while others may have dependencies, please convert these nodes of subtasks into a topology diagram based on the task relevance in the workflow. The Graph should start with START node, and end with END node. \nKeep your output in foramt:\nNode:\n1.{subtask_1}\n2.{subtask_2}\n...\nEdges:\n(START,1)\n...\n(n,END))"
},
{
"role": "user",
"content": "Now, I will start a new problem in a new OS. My problem is:\n\nLocate in your home directory all the text files (.txt) that contain the string \"Linux\", then count the total number of occurrences for the string across all these files.The action list you can take: bash: execute some bash code, take bash action\nfinish: have finished the task, take finish action\nanswer: have got the answer to the question, take answer action"
},
{
"role": "assistant",
"content": "Node:\n1: execute bash code to find all .txt files containing \"Linux\" in the home directory\n2: execute bash code to count occurrences of \"Linux\" in the first file\n3: execute bash code to count occurrences of \"Linux\" in the second file\n4: execute bash code to count occurrences of \"Linux\" in the third file\n5: execute bash code to count occurrences of \"Linux\" in the fourth file\n6: take answer action\nEdge: (START,1) (1,2) (2,3) (3,4) (4,5) (5,6) (6,END)"
}
],
"target": "graph",
"id": "os_92",
"category": "eval"
seal_tools
"source": "seal_tools",
"conversations": [
{
"role": "system",
"content": "You are a helpful task planner.\nFirst I will give you the task description and tools list you can use to finish the task, and your task is to plan a sequence of nodes which are subtasks to achieve the task. Each node of workflow is a subtask, and you can use the tools to complete the subtask.\nAfter you construct the plan, since some subtasks may not be dependent on each other, while others may have dependencies. Please convert these nodes of subtasks into a topology diagram based on the task relevance in the workflow. The Graph should start with START node, and end with END node. \nKeep your output in foramt:\nNode:\n1.{subtask_1}\n2.{subtask_2}\n...\nEdges:\n(START,1)\n...\n(n,END))"
},
{
"role": "user",
"content": "Now it's your turn.\nTask: Please perform the following tasks to ensure the security of our endpoint. First, move the mouse cursor to the specified position with X-coordinate 41 and Y-coordinate 35. Next, scan the endpoint with the IP address '192.168.95.226' for potential security threats. Finally, encrypt the sensitive data 'Social Security Numbers' using the AES cryptographic algorithm and the encryption key 'qwerty123'.\"\nOutput:\n\nThe tool list you can select from: [{'api_name': 'encryptData', 'api_description': 'Encrypt sensitive data using a cryptographic algorithm', 'field': 'Technology/Cybersecurity', 'parameters': {'data': {'type': 'str', 'description': 'The sensitive data to be encrypted'}, 'encryption_algorithm': {'type': 'str', 'description': 'The cryptographic algorithm to be used for encryption (e.g., AES, RSA)'}, 'encryption_key': {'type': 'str', 'description': 'The encryption key required for encryption'}}, 'required': ['data', 'encryption_algorithm', 'encryption_key'], 'responses': {'encrypted_data': {'type': 'str', 'description': 'The encrypted data'}}}, {'api_name': 'encryptText', 'api_description': 'Encrypt a given text using a specified encryption algorithm', 'field': 'Engineering/Computer engineering', 'parameters': {'text': {'type': 'str', 'description': \"The text to encrypt (e.g., 'hello world')\"}, 'algorithm': {'type': 'str', 'description': \"The encryption algorithm to use (e.g., 'AES', 'DES', 'RSA')\"}}, 'required': ['text', 'algorithm'], 'responses': {'encrypted_text': {'type': 'str', 'description': 'The encrypted text'}}}, {'api_name': 'decryptResource', 'api_description': 'Decrypt an encrypted digital resource or document in a library', 'field': 'Library Science/Library Digital Security', 'parameters': {'encrypted_resource_url': {'type': 'str', 'description': 'The URL or path to the encrypted digital resource'}, 'decryption_key': {'type': 'str', 'description': 'The decryption key for accessing the resource'}}, 'required': ['encrypted_resource_url', 'decryption_key'], 'responses': {'decrypted_resource_url': {'type': 'str', 'description': 'The URL or path to the decrypted digital resource'}}}, {'api_name': 'moveMouseTo', 'api_description': 'Move mouse cursor to specified position', 'field': 'Software/Human-computer interaction', 'parameters': {'x': {'type': 'int', 'description': 'The X-coordinate to move the mouse cursor to'}, 'y': {'type': 'int', 'description': 'The Y-coordinate to move the mouse cursor to'}}, 'required': ['x', 'y'], 'responses': {'success': {'type': 'bool', 'description': 'Indicates whether the mouse cursor was successfully moved'}}}, {'api_name': 'scanEndpoint', 'api_description': 'Scan a specific endpoint for potential security threats', 'field': 'Cybersecurity/Endpoint Security', 'parameters': {'endpoint': {'type': 'str', 'description': 'The IP address or hostname of the endpoint to scan'}}, 'required': ['endpoint'], 'responses': {'scan_result': {'type': 'str', 'description': 'The result of the scan indicating potential security threats'}}}]\n"
},
{
"role": "assistant",
"content": "Node:\n1: Move mouse cursor to position X=41, Y=35\n2: Scan the endpoint with IP '192.168.95.226' for security threats\n3: Encrypt 'Social Security Numbers' using AES algorithm with key 'qwerty123'\nEdge: (START,1) (1,2) (2,3) (3,END)"
}
],
"target": "graph",
"id": "seal_tools_1",
"category": "eval"
toolalpaca
"source": "toolalpaca",
"conversations": [
{
"role": "system",
"content": "You are a helpful task planner.\nFirst I will give you the task description and tools list you can use to finish the task, and your task is to plan a sequence of nodes which are subtasks to achieve the task. Each node of workflow is a subtask, and you can use the tools to complete the subtask.\nAfter you construct the plan, since some subtasks may not be dependent on each other, while others may have dependencies. Please convert these nodes of subtasks into a topology diagram based on the task relevance in the workflow. The Graph should start with START node, and end with END node. \nKeep your output in foramt:\nNode:\n1.{subtask_1}\n2.{subtask_2}\n...\nEdges:\n(START,1)\n...\n(n,END))"
},
{
"role": "user",
"content": "Task: I want to compare the performance of Coca-Cola and PepsiCo stocks. Can you get me the real-time market data for both of them? Also, I'd like to see their historical market data for the past year, with monthly intervals.\nThe api list you can use: [{'name': 'getRealTimeMarketData', 'description': 'Retrieve real-time market data for multiple exchanges around the world.', 'parameters': {'symbols': 'Required. string. Comma-separated list of stock symbols or indices.', 'exchange': 'string. The stock exchange to retrieve data from.', 'limit': 'integer. The number of results to return per request.'}, 'output': 'An array of objects containing real-time market data for the specified symbols.', 'format': 'application/json', 'path': '/realtime', 'method': 'get'}, {'name': 'getHistoricalMarketData', 'description': 'Retrieve historical market data for a set of specific stocks or indices.', 'parameters': {'symbols': 'Required. string. Comma-separated list of stock symbols or indices.', 'date_from': 'Required. string. The start date for the historical data in YYYY-MM-DD format.', 'date_to': 'Required. string. The end date for the historical data in YYYY-MM-DD format.', 'interval': \"string. The time interval for the historical data (e.g., '1min', '1hour', '1day').\"}, 'output': 'An array of objects containing historical market data for the specified symbols and date range.', 'format': 'application/json', 'path': '/historical', 'method': 'get'}, {'name': 'getTechnicalIndicators', 'description': 'Retrieve a variety of technical indicators to help analyze market trends.', 'parameters': {'symbol': 'Required. string. The stock symbol or index to analyze.', 'indicators': \"Required. string. Comma-separated list of technical indicators to retrieve (e.g., 'SMA', 'EMA', 'RSI').\", 'interval': \"string. The time interval for the technical indicators (e.g., '1min', '1hour', '1day').\"}, 'output': 'An array of objects containing the specified technical indicators for the given symbol and time interval.', 'format': 'application/json', 'path': '/indicators', 'method': 'get'}, {'name': 'searchStockSymbols', 'description': 'Search for stock symbols based on a given query.', 'parameters': {'query': 'Required. string. The search query to find matching stock symbols.', 'exchange': 'string. The stock exchange to search within.', 'limit': 'integer. The number of results to return per request.'}, 'output': 'An array of objects containing matching stock symbols.', 'format': 'application/json', 'path': '/search', 'method': 'get'}, {'name': 'listExchanges', 'description': 'Retrieve a list of supported stock exchanges.', 'parameters': {'limit': 'integer. The number of results to return per request.'}, 'output': 'An array of objects containing information about supported stock exchanges.', 'format': 'application/json', 'path': '/exchanges', 'method': 'get'}]"
},
{
"role": "assistant",
"content": "Node:\n1: Retrieve the real-time market data and historical market data with monthly intervals for the past year for Coca-Cola and PepsiCo stocks from the NYSE.\n2: Retrieve the historical market data for Coca-Cola and PepsiCo stocks for the past year, using monthly intervals, to compare their performance over time.\nEdge: (START,1) (1,2) (2,END)"
}
],
"target": "graph",
"id": "toolalpaca_314",
"category": "eval"
toolbench
"source": "toolbench",
"conversations": [
{
"role": "system",
"content": "You are a helpful task planner.\nFirst I will give you the task description and tools list you can use to finish the task, and your task is to plan a sequence of nodes which are subtasks to achieve the task. Each node of workflow is a subtask, and you can use the tools to complete the subtask.\nAfter you construct the plan, since some subtasks may not be dependent on each other, while others may have dependencies. Please convert these nodes of subtasks into a topology diagram based on the task relevance in the workflow. The Graph should start with START node, and end with END node. \nKeep your output in foramt:\nNode:\n1.{subtask_1}\n2.{subtask_2}\n...\nEdges:\n(START,1)\n...\n(n,END))"
},
{
"role": "user",
"content": "Task: Check if the domain 'example.com' is available for creating branded short links. If available, create a new short link with the destination URL 'https://example.com'.\napi_list:[{'name': 'create_a_new_link_for_url_link_shortener', 'description': 'This is the subfunction for tool \"url_link_shortener\", you can use this tool.The description of this function is: \"create_a_new_link_for_url_link_shortener\"', 'parameters': {'type': 'object', 'properties': {'destination': {'type': 'string', 'description': 'The destination URL you want your branded short link to point to', 'example_value': 'https://rapidapi.com/'}, 'domain_id': {'type': 'string', 'description': 'The unique id of the branded domain. If not specified, rebrand.ly is used'}, 'slashtag': {'type': 'string', 'description': 'The keyword portion of your branded short link'}, 'domain_fullname': {'type': 'string', 'description': 'The unique name of the branded domain, to be used in place of domain[id] in special cases. Precedence will be given to domain[id] value.'}}, 'required': ['destination'], 'optional': ['domain_id', 'slashtag', 'domain_fullname']}}, {'name': 'get_a_list_of_domains_for_url_link_shortener', 'description': 'This is the subfunction for tool \"url_link_shortener\", you can use this tool.The description of this function is: \"get_a_list_of_domains_for_url_link_shortener\"', 'parameters': {'type': 'object', 'properties': {}, 'required': [], 'optional': []}}, {'name': 'Finish', 'description': 'If you believe that you have obtained a result that can answer the task, please call this function to provide the final answer. Alternatively, if you recognize that you are unable to proceed with the task in the current state, call this function to restart. Remember: you must ALWAYS call this function at the end of your attempt, and the only part that will be shown to the user is the final answer, so it should contain sufficient information.', 'parameters': {'type': 'object', 'properties': {'return_type': {'type': 'string', 'enum': ['give_answer', 'give_up_and_restart']}, 'final_answer': {'type': 'string', 'description': 'The final answer you want to give the user. You should have this field if \"return_type\"==\"give_answer\"'}}, 'required': ['return_type']}}]"
},
{
"role": "assistant",
"content": "Node:\n1: Check the availability of the domain 'example.com' for creating branded short links using the URL link shortener service.\n2: Attempt to create a new short link with the domain 'example.com' to check its availability for branding.\nGraph:\nEdge: (START,1) (1,2) (2,END)"
}
],
"target": "graph",
"id": "toolbench_1"
webshop
"source": "webshop",
"conversations": [
{
"role": "system",
"content": "You are a helpful task planner.\nFirst I will give you the task description and actions list you can take to finish the task, and your task is to plan a sequence of nodes which are subtasks to achieve the task. Each node of workflow is a subtask, and you can take actions to complete the subtask.\nAfter you construct the plan, since some subtasks may not be dependent on each other, while others may have dependencies, please convert these nodes of subtasks into a topology diagram based on the task relevance in the workflow. The Graph should start with START node, and end with END node. \nKeep your output in foramt:\nNode:\n1.{subtask_1}\n2.{subtask_2}\n...\nEdges:\n(START,1)\n...\n(n,END))"
},
{
"role": "user",
"content": "WebShop [SEP] Instruction: [SEP] i am looking for pink, high quality massage sheets that are oil resistant, and price lower than 100.00 dollars [SEP] SearchThe action list you can take: search[keywords]\nclick[value]."
},
{
"role": "assistant",
"content": "Node:\n1: Correctly search for \"pink, high quality massage sheets that are oil resistant, under $100\"\n2: review the search results for the items that match the specification\n3: select the massage sheets that best meet the criteria\n4: click on the product for more details\n5: verify price and features like oil resistance\n6: click on \"Buy Now\" button to purchase.\nEdge: (START,1) (1,2) (2,3) (3,4) (4,5) (5,6) (6,END)"
}
],
"target": "graph",
"id": "webshop_1129",
"category": "eval"
wikihow
"source": "wikihow",
"conversations": [
{
"role": "system",
"content": "You are a helpful task planner.\nFirst I will give you the task description and actions list you can take to finish the task, and your task is to plan a sequence of nodes which are subtasks to achieve the task. Each node of workflow is a subtask, and you can take actions to complete the subtask.\nAfter you construct the plan, since some subtasks may not be dependent on each other, while others may have dependencies, please convert these nodes of subtasks into a topology diagram based on the task relevance in the workflow. The Graph should start with START node, and end with END node. \nKeep your output in foramt:\nNode:\n1.{subtask_1}\n2.{subtask_2}\n...\nEdges:\n(START,1)\n...\n(n,END))"
},
{
"role": "user",
"content": "Task: How to Clean a Jade Roller\nThe action list you can take: ['Take some lukewarm water, wash the brushes and let it dry.\\n', 'Rinse the brush and towel dry it.\\n', 'Place the roller on a soft towel to let it air dry.\\n', 'Clean off massage oil with nail polish remover in the cotton pad.\\n', 'Rub the jade piece with a dry cloth to remove excess soap.\\n', 'Use a rubber cleaning stone to get rid of any remaining dirt.\\n', 'Clean the heated curler using rubbing alcohol.\\n', 'Gently scrub any gems, glass beads, or metal elements.\\n', 'Heat your jade roller at 105 to 110 °F (41 to 43 °C) in your oven for powerful drying.', 'Heat your jade roller at 105 to 110 °F (41 to 43 °C) in your oven for powerful drying.\\n', 'Rub the jade piece with a dry cloth to remove excess soap.', 'Place the roller on a soft towel to let it air dry.', 'Dip a soft cleaning cloth into olive oil.\\n']"
},
{
"role": "assistant",
"content": "Node:\n1: Rub the jade piece with a dry cloth to remove excess soap.\n2: Place the roller on a soft towel to let it air dry.\n3: Heat your jade roller at 105 to 110 °F (41 to 43 °C) in your oven for powerful drying.\nEdge: (START,1) (1,2) (2,3) (3,END)"
}
],
"target": "graph",
"id": "wikihow_1",
"category": "eval"