This document describes how to prepare a CSV or JSONL file that can be read by Batch Translation, and how to read the resulting file back into your database.
All examples below show id, column1
,
column2
as the example columns. Replace them with the
column containing the text to be translated and the columns necessary to
understand the per-row context. Make sure the context columns have
human-readable names. To this end, you can alias the columns while
selecting:
-- if you include a column named `id`, it's used as a unique identifier
select comment_id as id,
-- the name of the column being translated is not fed to the LLM
comment_tx,-- make sure the LLM knows that 5 means 5/10, not 5/5
as rating_out_of_10
rating -- this is the title of the blog post, not the title of the comment
as blog_post_title,
ps_title from table_name
Exporting to CSV is easiest if you use PostgreSQL or SQLite, as both databases have native support for RFC 4180 CSV, the specific CSV format Batch Translation uses. Pandas is a Python package allows you to export compatible CSVs from other databases. If you are using neither PostgreSQL nor SQLite and you prefer not to use Pandas, you’ll probably want to use the JSONL format. You may use the following LLM prompt:
In [programming language] using [ORM], write a script to export [object] to a JSONL file. This is the model definition:
[ORM model]
[relevant code]
JSONL is a format in which each line is a valid JSON object. The easiest way to perform a single translation is to upload a one-line JSON object. This is useful when you have one large document you’d like to translate:
"text": "Your untranslated text!\n\nMore text!"} {
The examples below show how to export CSVs from SQLite and PostgreSQL. The final example shows how to export CSVs from other databases using Pandas.
sqlite3 -header -csv database.db "SELECT id, column1, column2 FROM table_name" > output.csv
sqlite3 database.db ".mode csv" ".import --csv --nulls '<NA>' output.csv new_table_name"
psql -h hostname -U username -d database_name -c "\copy (SELECT id, column1, column2 FROM table_name) TO '/path/to/filename.csv' WITH CSV HEADER"
psql -h hostname -U username -d database_name -c "\copy translated_table_name(id, column1, column2) FROM '/path/to/filename.csv' WITH CSV HEADER NULL AS '<NA>'"
Requires pandas
and sqlalchemy
.
import pandas
= pandas.read_sql('select id, column1, column2 from table_name', sqlalchemy_connection)
df 'file.csv', index=False) df.to_csv(
import pandas
= pandas.read_csv('file.csv', na_values='<NA>')
df 'translations', sqlalchemy_connection, if_exists='replace') df.to_sql(