CSV - how hledger reads CSV data, and the CSV rules file format
hledger can read CSV (Comma Separated Value/Character Separated Value) files as if they were journal files, automatically converting each CSV record into a transaction. (To learn about writing CSV, see CSV output.)
We describe each CSV file’s format with a corresponding rules file. By
default this is named like the CSV file with a
.rules extension added.
Eg when reading
FILE.csv, hledger also looks for
the same directory. You can specify a different rules file with the
--rules-file option. If a rules file is not found, hledger will create
a sample rules file, which you’ll need to adjust.
This file contains rules describing the CSV data (header line, fields layout, date format etc.), and how to construct hledger journal entries (transactions) from it. Often there will also be a list of conditional rules for categorising transactions based on their descriptions. Here’s an overview of the CSV rules; these are described more fully below, after the examples:
||skip one or more header lines or matched CSV records|
||name CSV fields, assign them to hledger fields|
|field assignment||assign a value to one hledger field, with interpolation|
||a custom field separator|
||apply some rules to matched CSV records|
||skip the remaining CSV records|
||describe the format of CSV dates|
||disambiguate record order when there's only one date|
||inline another CSV rules file|
Note, for best error messages when reading CSV files, use a
.ssv file extension or file prefix - see File
There’s an introductory Convert CSV files tutorial on hledger.org.
Here are some sample hledger CSV rules files. See also the full
At minimum, the rules file must identify the date and amount fields, and often it also specifies the date format and how many header lines there are. Here’s a simple CSV file and a rules file for it:
Date, Description, Id, Amount 12/11/2019, Foo, 123, 10.23
# basic.csv.rules skip 1 fields date, description, _, amount date-format %d/%m/%Y
$ hledger print -f basic.csv 2019-11-12 Foo expenses:unknown 10.23 income:unknown -10.23
Default account names are chosen, since we didn’t set them.
Bank of Ireland¶
Here’s a CSV with two amount fields (Debit and Credit), and a balance field, which we can use to add balance assertions, which is not necessary but provides extra error checking:
Date,Details,Debit,Credit,Balance 07/12/2012,LODGMENT 529898,,10.0,131.21 07/12/2012,PAYMENT,5,,126
# bankofireland-checking.csv.rules # skip the header line skip # name the csv fields, and assign some of them as journal entry fields fields date, description, amount-out, amount-in, balance # We generate balance assertions by assigning to "balance" # above, but you may sometimes need to remove these because: # # - the CSV balance differs from the true balance, # by up to 0.0000000000005 in my experience # # - it is sometimes calculated based on non-chronological ordering, # eg when multiple transactions clear on the same day # date is in UK/Ireland format date-format %d/%m/%Y # set the currency currency EUR # set the base account for all txns account1 assets:bank:boi:checking
$ hledger -f bankofireland-checking.csv print 2012-12-07 LODGMENT 529898 assets:bank:boi:checking EUR10.0 = EUR131.2 income:unknown EUR-10.0 2012-12-07 PAYMENT assets:bank:boi:checking EUR-5.0 = EUR126.0 expenses:unknown EUR5.0
The balance assertions don’t raise an error above, because we’re reading directly from CSV, but they will be checked if these entries are imported into a journal file.
Here we convert amazon.com order history, and use an if block to generate a third posting if there’s a fee. (In practice you’d probably get this data from your bank instead, but it’s an example.)
"Date","Type","To/From","Name","Status","Amount","Fees","Transaction ID" "Jul 29, 2012","Payment","To","Foo.","Completed","$20.00","$0.00","16000000000000DGLNJPI1P9B8DKPVHL" "Jul 30, 2012","Payment","To","Adapteva, Inc.","Completed","$25.00","$1.00","17LA58JSKRD4HDGLNJPI1P9B8DKPVHL"
# amazon-orders.csv.rules # skip one header line skip 1 # name the csv fields, and assign the transaction's date, amount and code. # Avoided the "status" and "amount" hledger field names to prevent confusion. fields date, _, toorfrom, name, amzstatus, amzamount, fees, code # how to parse the date date-format %b %-d, %Y # combine two fields to make the description description %toorfrom %name # save the status as a tag comment status:%amzstatus # set the base account for all transactions account1 assets:amazon # leave amount1 blank so it can balance the other(s). # I'm assuming amzamount excludes the fees, don't remember # set a generic account2 account2 expenses:misc amount2 %amzamount # and maybe refine it further: #include categorisation.rules # add a third posting for fees, but only if they are non-zero. if %fees [1-9] account3 expenses:fees amount3 %fees
$ hledger -f amazon-orders.csv print 2012-07-29 (16000000000000DGLNJPI1P9B8DKPVHL) To Foo. ; status:Completed assets:amazon expenses:misc $20.00 2012-07-30 (17LA58JSKRD4HDGLNJPI1P9B8DKPVHL) To Adapteva, Inc. ; status:Completed assets:amazon expenses:misc $25.00 expenses:fees $1.00
Here’s a real-world rules file for (customised) Paypal CSV, with some Paypal-specific rules, and a second rules file included:
"Date","Time","TimeZone","Name","Type","Status","Currency","Gross","Fee","Net","From Email Address","To Email Address","Transaction ID","Item Title","Item ID","Reference Txn ID","Receipt ID","Balance","Note" "10/01/2019","03:46:20","PDT","Calm Radio","Subscription Payment","Completed","USD","-6.99","0.00","-6.99","email@example.com","firstname.lastname@example.org","60P57143A8206782E","MONTHLY - $1 for the first 2 Months: Me - Order 99309. Item total: $1.00 USD first 2 months, then $6.99 / Month","","I-R8YLY094FJYR","","-6.99","" "10/01/2019","03:46:20","PDT","","Bank Deposit to PP Account ","Pending","USD","6.99","0.00","6.99","","email@example.com","0TU1544T080463733","","","60P57143A8206782E","","0.00","" "10/01/2019","08:57:01","PDT","Patreon","PreApproved Payment Bill User Payment","Completed","USD","-7.00","0.00","-7.00","firstname.lastname@example.org","email@example.com","2722394R5F586712G","Patreon* Membership","","B-0PG93074E7M86381M","","-7.00","" "10/01/2019","08:57:01","PDT","","Bank Deposit to PP Account ","Pending","USD","7.00","0.00","7.00","","firstname.lastname@example.org","71854087RG994194F","Patreon* Membership","","2722394R5F586712G","","0.00","" "10/19/2019","03:02:12","PDT","Wikimedia Foundation, Inc.","Subscription Payment","Completed","USD","-2.00","0.00","-2.00","email@example.com","firstname.lastname@example.org","K9U43044RY432050M","Monthly donation to the Wikimedia Foundation","","I-R5C3YUS3285L","","-2.00","" "10/19/2019","03:02:12","PDT","","Bank Deposit to PP Account ","Pending","USD","2.00","0.00","2.00","","email@example.com","3XJ107139A851061F","","","K9U43044RY432050M","","0.00","" "10/22/2019","05:07:06","PDT","Noble Benefactor","Subscription Payment","Completed","USD","10.00","-0.59","9.41","firstname.lastname@example.org","email@example.com","6L8L1662YP1334033","Joyful Systems","","I-KC9VBGY2GWDB","","9.41",""
# paypal-custom.csv.rules # Tips: # Export from Activity -> Statements -> Custom -> Activity download # Suggested transaction type: "Balance affecting" # Paypal's default fields in 2018 were: # "Date","Time","TimeZone","Name","Type","Status","Currency","Gross","Fee","Net","From Email Address","To Email Address","Transaction ID","Shipping Address","Address Status","Item Title","Item ID","Shipping and Handling Amount","Insurance Amount","Sales Tax","Option 1 Name","Option 1 Value","Option 2 Name","Option 2 Value","Reference Txn ID","Invoice Number","Custom Number","Quantity","Receipt ID","Balance","Address Line 1","Address Line 2/District/Neighborhood","Town/City","State/Province/Region/County/Territory/Prefecture/Republic","Zip/Postal Code","Country","Contact Phone Number","Subject","Note","Country Code","Balance Impact" # This rules file assumes the following more detailed fields, configured in "Customize report fields": # "Date","Time","TimeZone","Name","Type","Status","Currency","Gross","Fee","Net","From Email Address","To Email Address","Transaction ID","Item Title","Item ID","Reference Txn ID","Receipt ID","Balance","Note" fields date, time, timezone, description_, type, status_, currency, grossamount, feeamount, netamount, fromemail, toemail, code, itemtitle, itemid, referencetxnid, receiptid, balance, note skip 1 date-format %-m/%-d/%Y # ignore some paypal events if In Progress Temporary Hold Update to skip # add more fields to the description description %description_ %itemtitle # save some other fields as tags comment itemid:%itemid, fromemail:%fromemail, toemail:%toemail, time:%time, type:%type, status:%status_ # convert to short currency symbols if %currency USD currency $ if %currency EUR currency E if %currency GBP currency P # generate postings # the first posting will be the money leaving/entering my paypal account # (negative means leaving my account, in all amount fields) account1 assets:online:paypal amount1 %netamount # the second posting will be money sent to/received from other party # (account2 is set below) amount2 -%grossamount # if there's a fee, add a third posting for the money taken by paypal. if %feeamount [1-9] account3 expenses:banking:paypal amount3 -%feeamount comment3 business: # choose an account for the second posting # override the default account names: # if the amount is positive, it's income (a debit) if %grossamount ^[^-] account2 income:unknown # if negative, it's an expense (a credit) if %grossamount ^- account2 expenses:unknown # apply common rules for setting account2 & other tweaks include common.rules # apply some overrides specific to this csv # Transfers from/to bank. These are usually marked Pending, # which can be disregarded in this case. if Bank Account Bank Deposit to PP Account description %type for %referencetxnid %itemtitle account2 assets:bank:wf:pchecking account1 assets:online:paypal # Currency conversions if Currency Conversion account2 equity:currency conversion
# common.rules if darcs noble benefactor account2 revenues:foss donations:darcshub comment2 business: if Calm Radio account2 expenses:online:apps if electronic frontier foundation Patreon wikimedia Advent of Code account2 expenses:dues if Google account2 expenses:online:apps description google | music
$ hledger -f paypal-custom.csv print 2019-10-01 (60P57143A8206782E) Calm Radio MONTHLY - $1 for the first 2 Months: Me - Order 99309. Item total: $1.00 USD first 2 months, then $6.99 / Month ; itemid:, fromemail:firstname.lastname@example.org, toemail:email@example.com, time:03:46:20, type:Subscription Payment, status:Completed assets:online:paypal $-6.99 = $-6.99 expenses:online:apps $6.99 2019-10-01 (0TU1544T080463733) Bank Deposit to PP Account for 60P57143A8206782E ; itemid:, fromemail:, toemail:firstname.lastname@example.org, time:03:46:20, type:Bank Deposit to PP Account, status:Pending assets:online:paypal $6.99 = $0.00 assets:bank:wf:pchecking $-6.99 2019-10-01 (2722394R5F586712G) Patreon Patreon* Membership ; itemid:, fromemail:email@example.com, toemail:firstname.lastname@example.org, time:08:57:01, type:PreApproved Payment Bill User Payment, status:Completed assets:online:paypal $-7.00 = $-7.00 expenses:dues $7.00 2019-10-01 (71854087RG994194F) Bank Deposit to PP Account for 2722394R5F586712G Patreon* Membership ; itemid:, fromemail:, toemail:email@example.com, time:08:57:01, type:Bank Deposit to PP Account, status:Pending assets:online:paypal $7.00 = $0.00 assets:bank:wf:pchecking $-7.00 2019-10-19 (K9U43044RY432050M) Wikimedia Foundation, Inc. Monthly donation to the Wikimedia Foundation ; itemid:, fromemail:firstname.lastname@example.org, toemail:email@example.com, time:03:02:12, type:Subscription Payment, status:Completed assets:online:paypal $-2.00 = $-2.00 expenses:dues $2.00 expenses:banking:paypal ; business: 2019-10-19 (3XJ107139A851061F) Bank Deposit to PP Account for K9U43044RY432050M ; itemid:, fromemail:, toemail:firstname.lastname@example.org, time:03:02:12, type:Bank Deposit to PP Account, status:Pending assets:online:paypal $2.00 = $0.00 assets:bank:wf:pchecking $-2.00 2019-10-22 (6L8L1662YP1334033) Noble Benefactor Joyful Systems ; itemid:, fromemail:email@example.com, toemail:firstname.lastname@example.org, time:05:07:06, type:Subscription Payment, status:Completed assets:online:paypal $9.41 = $9.41 revenues:foss donations:darcshub $-10.00 ; business: expenses:banking:paypal $0.59 ; business:
The following kinds of rule can appear in the rules file, in any order.
Blank lines and lines beginning with
; are ignored.
The word “skip” followed by a number (or no number, meaning 1) tells hledger to ignore this many non-empty lines preceding the CSV data. (Empty/blank lines are skipped automatically.) You’ll need this whenever your CSV data contains header lines.
It also has a second purpose: it can be used inside if blocks to ignore certain CSV records (described below).
fields FIELDNAME1, FIELDNAME2, ...
A fields list (the word “fields” followed by comma-separated field names) is the quick way to assign CSV field values to hledger fields. It does two things:
it names the CSV fields. This is optional, but can be convenient later for interpolating them.
when you use a standard hledger field name, it assigns the CSV value to that part of the hledger transaction.
Here’s an example that says “use the 1st, 2nd and 4th fields as the transaction’s date, description and amount; name the last two fields for later reference; and ignore the others”:
fields date, description, , amount, , , somefield, anotherfield
Field names may not contain whitespace. Fields you don’t care about can be left unnamed. Currently there must be least two items (there must be at least one comma).
Note, always use comma in the fields list, even if your CSV uses another separator character.
Here are the standard hledger field/pseudo-field names. For more about the transaction parts they refer to, see the manual for hledger’s journal format.
Transaction field names¶
comment can be used
to form the transaction’s first line.
Posting field names¶
accountN, where N is 1 to 9, generates a
posting, with that account name. Most often
there are two postings, so you’ll want to set
If a posting’s account name is left unset but its amount is set, a
default account name will be chosen (like expenses:unknown or
amountN sets posting N’s amount. Or,
amount with no N sets posting
1’s. If the CSV has debits and credits in separate fields, use
amountN-out instead. Or
with no N for posting 1.
For convenience and backwards compatibility, if you set the amount of posting 1 only, a second posting with the negative amount will be generated automatically. (Unless the account name is parenthesised indicating an unbalanced posting.)
If the CSV has the currency symbol in a separate field, you can use
currencyN to prepend it to posting N’s amount.
currency with no N
affects ALL postings.
See TIPS below for more about setting amounts and currency.
Instead of or in addition to a fields list, you can use a
“field assignment” rule to set the value of a single hledger field, by
writing its name (any of the standard hledger field names above)
followed by a text value. The value may contain interpolated CSV fields,
referenced by their 1-based position in the CSV record (
%N), or by the
name they were given in the fields list (
# set the amount to the 4th CSV field, with " USD" appended amount %4 USD # combine three fields to make a comment, containing note: and date: tags comment note: %somefield - %anotherfield, date: %1
Interpolation strips outer whitespace (so a CSV value like
" 1 "
1 when interpolated)
TIPS below for more about referencing other fields.
You can use the
separator directive to read other kinds of
character-separated data. Eg to read SSV (Semicolon Separated Values),
The separator directive accepts exactly one single byte character as a
separator. To specify whitespace characters, you may use the special
SPACE. Eg to read TSV (Tab Separated Values), use:
See also: File Extension.
if MATCHER RULE if MATCHER MATCHER MATCHER RULE RULE
Conditional blocks (“if blocks”) are a block of rules that are applied only to CSV records which match certain patterns. They are often used for customising account names based on transaction descriptions.
Each MATCHER can be a record matcher, which looks like this:
REGEX is a case-insensitive regular expression which tries to match
anywhere within the CSV record. It is a POSIX extended regular
expressions with some additions (see Regular
the hledger manual). Note: the “CSV record” it is matched against is not
the original record, but a synthetic one, with enclosing double quotes
or whitespace removed, and always comma-separated. (Eg, an SSV record
2020-01-01; "Acme, Inc."; 1,000 appears to REGEX as
Or, MATCHER can be a field matcher, like this:
which matches just the content of a particular CSV field. CSVFIELD is a
percent sign followed by the field’s name or column number, like
A single matcher can be written on the same line as the “if”; or multiple matchers can be written on the following lines, non-indented. Multiple matchers are OR’d (any one of them can match).
After the patterns there should be one or more rules to apply, all indented by at least one space. Three kinds of rule are allowed in conditional blocks:
field assignments (to set a hledger field)
skip (to skip the matched CSV record)
end (to skip all remaining CSV records).
# if the CSV record contains "groceries", set account2 to "expenses:groceries" if groceries account2 expenses:groceries
# if the CSV record contains any of these patterns, set account2 and comment as shown if monthly service fee atm transaction fee banking thru software account2 expenses:business:banking comment XXX deductible ? check it
This rule can be used inside if blocks (only), to make hledger stop reading this CSV file and move on to the next input file, or to command execution. Eg:
# ignore everything following the first empty record if ,,,, end
This is a helper for the
date2) fields. If your CSV dates
are not formatted like
you’ll need to add a date-format rule describing them with a strptime
date parsing pattern, which must parse the CSV date value completely.
# MM/DD/YY date-format %m/%d/%y
# D/M/YYYY # The - makes leading zeros optional. date-format %-d/%-m/%Y
# YYYY-Mmm-DD date-format %Y-%h-%d
# M/D/YYYY HH:MM AM some other junk # Note the time and junk must be fully parsed, though only the date is used. date-format %-m/%-d/%Y %l:%M %p some other junk
For the supported strptime syntax, see:
hledger always sorts the generated transactions by date. Transactions on the same date should appear in the same order as their CSV records, as hledger can usually auto-detect whether the CSV’s normal order is oldest first or newest first. But if all of the following are true:
the CSV might sometimes contain just one day of data (all records having the same date)
the CSV records are normally in reverse chronological order (newest at the top)
and you care about preserving the order of same-day transactions
then, you should add the
newest-first rule as a hint. Eg:
# tell hledger explicitly that the CSV is normally newest first newest-first
This includes the contents of another CSV rules file at this point.
RULESFILE is an absolute file path or a path relative to the current
file’s directory. This can be useful for sharing common rules between
several rules files, eg:
# someaccount.csv.rules ## someaccount-specific rules fields date,description,amount account1 assets:someaccount account2 expenses:misc ## common rules include categorisation.rules
Balance assertions generated by assigning to
balanceN are of the simple
= type by default,
which is a
assertion. You may find the subaccount-including variants more useful,
eg if you have created some virtual subaccounts of checking to help with
budgeting. You can select a different type of assertion with the
# balance assertions will consider all commodities and all subaccounts balance-type ==*
Here are the balance assertion types for quick reference:
= single commodity, exclude subaccounts =* single commodity, include subaccounts == multi commodity, exclude subaccounts ==* multi commodity, include subaccounts
It’s a good idea to get rapid feedback while creating/troubleshooting CSV rules. Here’s a good way, using entr from http://eradman.com/entrproject :
$ ls foo.csv* | entr bash -c 'echo ----; hledger -f foo.csv print desc:SOMEDESC'
A desc: query (eg) is used to select just one, or a few, transactions of interest. “bash -c” is used to run multiple commands, so we can echo a separator each time the command re-runs, making it easier to read the output.
hledger accepts CSV conforming to RFC 4180. When CSV values are enclosed in quotes, note:
they must be double quotes (not single quotes)
spaces outside the quotes are not allowed
CSV (“Character Separated Values”) files should be named with one of
these filename extensions:
.tsv. Or, the file path
should be prefixed with one of
tsv:. This helps
hledger identify the format and show the right error messages. For
$ hledger -f foo.ssv print
$ cat foo | hledger -f ssv:- foo
More about this: Input files in the hledger manual.
Reading multiple CSV files¶
If you use multiple
-f options to read multiple CSV files at once,
hledger will look for a correspondingly-named rules file for each CSV
file. But if you use the
--rules-file option, that rules file will be
used for all the CSV files.
After reading a CSV file, hledger post-processes and validates the generated journal entries as it would for a journal file - balancing them, applying balance assignments, and canonicalising amount styles. Any errors at this stage will be reported in the usual way, displaying the problem entry.
There is one exception: balance assertions, if you have generated them, will not be checked, since normally these will work only when the CSV data is part of the main journal. If you do need to check balance assertions generated from CSV right away, pipe into another hledger:
$ hledger -f file.csv print | hledger -f- print
When you download a CSV file periodically, eg to get your latest bank transactions, the new file may overlap with the old one, containing some of the same records.
The import command will (a) detect the new
transactions, and (b) append just those transactions to your main
journal. It is idempotent, so you don’t have to remember how many times
you ran it or with which version of the CSV. (It keeps state in a hidden
.latest.FILE.csv file.) This is the easiest way to import CSV data.
# download the latest CSV files, then run this command. # Note, no -f flags needed here. $ hledger import *.csv [--dry]
This method works for most CSV files. (Where records have a stable chronological order, and new records appear only at the new end.)
A number of other tools and workflows, hledger-specific and otherwise, exist for converting, deduplicating, classifying and managing CSV data. See:
A posting amount can be set in one of these ways:
by assigning (with a fields list or field assigment) to
amountN(posting N’s amount) or
amount(posting 1’s amount)
by assigning to
amount-out). For each CSV record, whichever of these has a non-zero value will be used, with appropriate sign. If both contain a non-zero value, this may not work.
by assigning to
balance) instead of the above, setting the amount indirectly via a balance assignment. If you do this the default account name may be wrong, so you should set that explicitly.
There is some special handling for an amount’s sign:
If an amount value is parenthesised, it will be de-parenthesised and sign-flipped.
If an amount value begins with a double minus sign, those cancel out and are removed.
If an amount value begins with a plus sign, that will be removed
If the currency/commodity symbol is included in the CSV’s amount field(s), you don’t have to do anything special.
If the currency is provided as a separate CSV field, you can either:
assign that to
currency, which adds it to all posting amounts. The symbol will prepended to the amount quantity (on the left side). If you write a trailing space after the symbol, there will be a space between symbol and amount (an exception to the usual whitespace stripping).
or assign it to
currencyNwhich adds it to posting N’s amount only.
or for more control, construct the amount from symbol and quantity using field assignment, eg:
fields date,description,currency,quantity # add currency symbol on the right: amount %quantity %currency
Referencing other fields¶
In field assignments, you can interpolate only CSV fields, not hledger fields. In the example below, there’s both a CSV field and a hledger field named amount1, but %amount1 always means the CSV field, not the hledger field:
# Name the third CSV field "amount1" fields date,description,amount1 # Set hledger's amount1 to the CSV amount1 field followed by USD amount1 %amount1 USD # Set comment to the CSV amount1 (not the amount1 assigned above) comment %amount1
Here, since there’s no CSV amount1 field, %amount1 will produce a literal “amount1”:
fields date,description,csvamount amount1 %csvamount USD # Can't interpolate amount1 here comment %amount1
When there are multiple field assignments to the same hledger field, only the last one takes effect. Here, comment’s value will be be B, or C if “something” is matched, but never A:
comment A comment B if something comment C
How CSV rules are evaluated¶
Here’s how to think of CSV rules being evaluated (if you really need to). First,
include- all includes are inlined, from top to bottom, depth first. (At each include point the file is inlined and scanned for further includes, recursively, before proceeding.)
Then “global” rules are evaluated, top to bottom. If a rule is repeated, the last one wins:
skip(at top level)
fields- names the CSV fields, optionally sets up initial assignments to hledger fields
Then for each CSV record in turn:
ifblocks. If any of them contain a
endrule, skip all remaining CSV records. Otherwise if any of them contain a
skiprule, skip that many CSV records. If there are multiple matched
skiprules, the first one wins.
collect all field assignments at top level and in matched
ifblocks. When there are multiple assignments for a field, keep only the last one.
compute a value for each hledger field - either the one that was assigned to it (and interpolate the %CSVFIELDNAME references), or a default
generate a synthetic hledger transaction from these values.
This is all part of the CSV reader, one of several readers hledger can use to parse input files. When all files have been read successfully, the transactions are passed as input to whichever hledger command the user specified.