Batch Translation: Database Imports and Exports

< Return to job submission

This document describes how to prepare a CSV or JSONL file that can be read by Batch Translation, and how to read the resulting file back into your database.

All examples below show id, column1, column2 as the example columns. Replace them with the column containing the text to be translated and the columns necessary to understand the per-row context. Make sure the context columns have human-readable names. To this end, you can alias the columns while selecting:

 -- if you include a column named `id`, it's used as a unique identifier
select comment_id as id,
-- the name of the column being translated is not fed to the LLM
comment_tx,
-- make sure the LLM knows that 5 means 5/10, not 5/5
rating as rating_out_of_10
-- this is the title of the blog post, not the title of the comment
ps_title as blog_post_title,
from table_name

Exporting to CSV is easiest if you use PostgreSQL or SQLite, as both databases have native support for RFC 4180 CSV, the specific CSV format Batch Translation uses. Pandas is a Python package allows you to export compatible CSVs from other databases. If you are using neither PostgreSQL nor SQLite and you prefer not to use Pandas, you’ll probably want to use the JSONL format. You may use the following LLM prompt:

In [programming language] using [ORM], write a script to export [object] to a JSONL file. This is the model definition:

[ORM model]

[relevant code]

JSONL is a format in which each line is a valid JSON object. The easiest way to perform a single translation is to upload a one-line JSON object. This is useful when you have one large document you’d like to translate:

{"text": "Your untranslated text!\n\nMore text!"}

The examples below show how to export CSVs from SQLite and PostgreSQL. The final example shows how to export CSVs from other databases using Pandas.

SQLite

Export

sqlite3 -header -csv database.db "SELECT id, column1, column2 FROM table_name" > output.csv

Import

sqlite3 database.db ".mode csv" ".import --csv --nulls '<NA>' output.csv new_table_name"

PostgreSQL

Export

psql -h hostname -U username -d database_name -c "\copy (SELECT id, column1, column2 FROM table_name) TO '/path/to/filename.csv' WITH CSV HEADER"

Import

psql -h hostname -U username -d database_name -c "\copy translated_table_name(id, column1, column2) FROM '/path/to/filename.csv' WITH CSV HEADER NULL AS '<NA>'"

Python

Requires pandas and sqlalchemy.

Export

import pandas
df = pandas.read_sql('select id, column1, column2 from table_name', sqlalchemy_connection)
df.to_csv('file.csv', index=False)

Import

import pandas
df = pandas.read_csv('file.csv', na_values='<NA>')
df.to_sql('translations', sqlalchemy_connection, if_exists='replace')