Stage Table Function
Databend supports using standard SQL to query data files located in an internal stage or named external stage.
The schema is automatically detected, same as infer_schema.
Syntax
SELECT <columns> FROM
{@<stage_name>[/<path>] | '<uri>'} [(
  [ PATTERN => '<regex_pattern>']
  [ FILE_FORMAT => '<format_name>']
  [ FILES => ( 'file_name' [ , 'file_name' ... ] ) ]
  [ ENDPOINT_URL => <'url'> ]
  [ AWS_KEY_ID => <'aws_key_id'> ]
  [ AWS_KEY_SECRET => <'aws_key_secret'> ]
  [ ACCESS_KEY_ID => <'access_key_id'> ]
  [ ACCESS_KEY_SECRET => <'access_key_secret'> ]
  [ SECRET_ACCESS_KEY => <'secret_access_key'> ]
  [ SESSION_TOKEN => <'session_token'> ]
  [ REGION => <'region'> ]
  [ ENABLE_VIRTUAL_HOST_STYLE => true|false ]
)]
The function parameters are as follows:
FILE_FORMAT = '<format_name>'
<format_name> should be one of the following:
- A built-in file format (see Input & Output File Formats.
- A named file format created by CREATE FILE FORMAT.
If not specified for named stages, the format of the stage should be used.
Only parquet file format is currently supported.
PATTERN = '<regex_pattern>'
A PCRE2-based regular expression pattern string, enclosed in single quotes, specifying the file names to match. Click here to see an example. For PCRE2 syntax, see http://www.pcre.org/current/doc/html/pcre2syntax.html.
FILES = ( 'file1' [ , 'file2' ... ] )
Specifies a list of one or more files names (separated by commas) to be read.
Connection Options for <uri> only
These include:
- ENDPOINT_URL
- AWS_KEY_ID
- AWS_SECRET_KEY
- ACCESS_KEY_ID
- ACCESS_KEY_SECRET
- SECRET_ACCESS_KEY
- SESSION_TOKEN
- REGION
- ENABLE_VIRTUAL_HOST_STYLE
They are explained Create Stage.
Query Examples
Select from Named Stage
-- New stage.
create stage lake;
    
-- Stage the data file into internal stage.
copy into @lake from (select * from numbers(10)) file_format = (type = PARQUET);
-- Show files in the internal stage.
list @lake;
+-------------------------------------------------------+------+------------------------------------+-------------------------------+---------+
| name                                                  | size | md5                                | last_modified                 | creator |
+-------------------------------------------------------+------+------------------------------------+-------------------------------+---------+
| data_8f414f66-5a94-42ad-ad52-a9076541799e_0_0.parquet |  258 | "7DCC9FFE04EA1F6882AED2CF9640D3D4" | 2023-02-24 09:55:46.000 +0000 | NULL    |
+-------------------------------------------------------+------+------------------------------------+-------------------------------+---------+
-- Query.
select min(number), max(number) from @lake (pattern => '.*parquet');
+-------------+-------------+
| min(number) | max(number) |
+-------------+-------------+
|           0 |           9 |
+-------------+-------------+
Select from URI
The following options are available to select from for the URI: s3, azblob , gcs, https, and ipfs.
file_format must be specified.
select *  from 's3://bucket/test.parquet' 
( access_key_id => 'your-access-key-id', 
  secret_access_key => 'your-secret-access-key',
  endpoint_url => 'your-object-storage-endpoint',
  file_format => 'parquet');  
select count(*), author from 'https://datafuse-1253727613.cos.ap-hongkong.myqcloud.com/data/books.parquet' (file_format => 'parquet')
group by author;
+----------+---------------------+
| count(*) | author              |
+----------+---------------------+
|        1 | Jim Gray            |
|        1 | Michael Stonebraker |
+----------+---------------------+