A file storage data source allows you to upload PDFs, Word documents, spreadsheets, and other file formats to use in your knowledge base. You can add metadata to files to give your knowledge base additional clues and context, which helps the large language model (LLM) retrieve the most relevant files.
For example, if you have added installation and user guides for several different products, you could add metadata to indicate which product each file relates to, and whether it's an installation or user guide. The model can use this metadata to help it find the most relevant files in the data source.
Each file can be up to 50 MB in size. You can upload as many files as you need.
File storage data sources support:
- Text files such as .txt and .text
- Markdown (.md) files
- HTML and similar formats such as .htm, .html, .shtml, and .ehtml
- Word documents such as .doc and .docx
- Spreadsheet files such as .xls, .xlsx, and .csv
- PDFs
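If you prepare files in bulk, a local check before uploading can catch problems early. The following is a minimal sketch, not part of the product: the function name `check_file` is hypothetical, the extension set contains only the extensions named in the list above (the product may accept others), and the size limit is interpreted as 50 × 1024 × 1024 bytes, which is an assumption.

```python
from pathlib import Path

# Extensions named in the list above. This set is an assumption and may
# not be exhaustive: the list says "such as" for several formats.
SUPPORTED_EXTENSIONS = {
    ".txt", ".text", ".md",
    ".htm", ".html", ".shtml", ".ehtml",
    ".doc", ".docx",
    ".xls", ".xlsx", ".csv",
    ".pdf",
}

# Assumption: the per-file limit is 50 * 1024 * 1024 bytes.
MAX_FILE_SIZE_BYTES = 50 * 1024 * 1024


def check_file(name: str, size_bytes: int) -> list[str]:
    """Return a list of problems found for one file (empty if none)."""
    problems = []
    extension = Path(name).suffix.lower()  # compare case-insensitively
    if extension not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported extension: {extension or '(none)'}")
    if size_bytes > MAX_FILE_SIZE_BYTES:
        problems.append("file is larger than the size limit")
    return problems
```

For example, `check_file("install-guide.PDF", 2_000_000)` returns an empty list, while a 60 MB `.zip` file would be reported for both its extension and its size.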
Once uploaded, your files are available in the uploaded files list. Use the search field to filter the list of files by filename (including the file extension), metadata key name, or metadata value.
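To illustrate what it means to search across filename, metadata key name, and metadata value, here is a small sketch of that kind of filter. It is not the product's implementation: the function names are hypothetical, and the case-insensitive substring matching is an assumption about how the search field behaves.

```python
def matches(query: str, filename: str, metadata: dict[str, object]) -> bool:
    """Return True if the query appears in the filename (including its
    extension), in any metadata key name, or in any metadata value."""
    needle = query.casefold()
    if needle in filename.casefold():
        return True
    for key, value in metadata.items():
        # Values may be text, booleans, or numbers, so compare as strings.
        if needle in key.casefold() or needle in str(value).casefold():
            return True
    return False


def filter_files(query: str, files: dict[str, dict[str, object]]) -> list[str]:
    """Return the filenames whose name or metadata matches the query."""
    return [name for name, metadata in files.items()
            if matches(query, name, metadata)]
```

With this sketch, a query such as `.pdf` would match on the file extension, while a query such as `product` would match any file that has a metadata key with that name.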
Changes to data sources are listed in versions as ready to publish, and you must publish them before they take effect in your chatbot. Sync your data source first so that you can test your changes thoroughly in TestBot before you publish them.
You can:
- Create a data source
- Upload files to the data source
- Add metadata to files
- Download a file
- Delete a file
- Sync a data source
- View a data source's history
- Delete a data source
File metadata
You can add metadata to your files to provide additional information, such as the original source or subject category.
Each piece of metadata is defined as a key with a corresponding value. The value can be text, a boolean (true or false), or a number.
You can define as many metadata fields as you need, but your data source will work best if you use them consistently: don't use the same key name to mean different things. You can add metadata to files individually or bulk-copy metadata across multiple files.
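One way to check consistency before you add metadata is to confirm that each key always carries the same kind of value. The sketch below is illustrative only: the function names, the key names `product` and `version`, and the idea of using value type as a proxy for "means the same thing" are all assumptions, not product behavior.

```python
def value_kind(value: object) -> str:
    """Classify a metadata value as text, boolean, or number."""
    # Check bool before numbers: in Python, bool is a subclass of int.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "text"
    raise TypeError(f"unsupported metadata value: {value!r}")


def inconsistent_keys(files: dict[str, dict[str, object]]) -> dict[str, set[str]]:
    """Return the keys that are used with more than one kind of value,
    mapped to the set of kinds found across all files."""
    kinds_by_key: dict[str, set[str]] = {}
    for metadata in files.values():
        for key, value in metadata.items():
            kinds_by_key.setdefault(key, set()).add(value_kind(value))
    return {key: kinds for key, kinds in kinds_by_key.items() if len(kinds) > 1}
```

For instance, if one file has `version` set to the number `2` and another has it set to the text `"2.0"`, the key `version` is flagged so that you can pick one form and apply it to every file.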