Direct Connect: Pulling transactions as CSV files from banks via Plaid directly
Have you ever wondered if there’s a service that connects to all the major financial institutions and imports transactions as Beancount files for you automatically? It is the dream of many plaintext accounting book users. And yes, we made that possible with our BeanHub Connect features based on Plaid integration a while back. Please read our article BeanHub Connect - one giant leap with fully automatic bank transactions import from 12,000+ financial institutions in 17 countries for all Beancount users! to learn more about it.
However, what if you don’t want to use BeanHub to host your accounting books? Or, as a ledger, hledger, or GnuCash user, you are not even using Beancount, but you still want to have a fully automatic workflow of importing transactions. What do you do? We heard you! We are glad to announce our new feature, BeanHub Direct Connect – which allows you to pull bank transaction data from Plaid via BeanHub directly into your local computer with or without hosting your Beancount books on BeanHub!
Once you have it established, you only need to run our new BeanHub-CLI dump command like this:
bh connect dump --sync
It will automatically sync all your bank data from Plaid, download the CSV files, and unpack them into your current folder.
$ tree
.
└── import-data
    └── connect
        ├── Chase
        │   ├── Checking 1234
        │   │   ├── 2022.csv
        │   │   ├── 2023.csv
        │   │   ├── 2024.csv
        │   │   └── 2025.csv
        │   └── Saving 5678
        │       ├── 2022.csv
        │       ├── 2023.csv
        │       └── 2024.csv
        ├── Citibank Online
        │   ├── Checking 0000
        │   │   ├── 2022.csv
        │   │   ├── 2023.csv
        │   │   ├── 2024.csv
        │   │   └── 2025.csv
        │   └── Saving 1111
        │       ├── 2022.csv
        │       ├── 2023.csv
        │       └── 2024.csv
        └── Wells Fargo
            ├── Checking 5555
            │   ├── 2022.csv
            │   ├── 2023.csv
            │   ├── 2024.csv
            │   └── 2025.csv
            └── Saving 6666
                ├── 2022.csv
                ├── 2023.csv
                └── 2024.csv
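If you write your own importing tool, the first job is usually to find these files. Here is a minimal Python sketch, assuming the layout is always import-data/connect/<institution>/<account>/<year>.csv as in the listing above. The names DumpFile and iter_dump_files are made up for this illustration and are not part of BeanHub-CLI.

```python
from pathlib import Path
from typing import Iterator, NamedTuple


class DumpFile(NamedTuple):
    """One dumped CSV file (hypothetical helper type, not a BeanHub-CLI API)."""

    institution: str  # e.g. "Chase"
    account: str      # e.g. "Checking 1234"
    year: int         # taken from the file name, e.g. 2024
    path: Path


def iter_dump_files(root: Path) -> Iterator[DumpFile]:
    """Yield one DumpFile per <institution>/<account>/<year>.csv under root."""
    for path in sorted(root.glob("*/*/*.csv")):
        if not path.stem.isdigit():
            continue  # skip files that are not named after a year
        account_dir = path.parent
        yield DumpFile(
            institution=account_dir.parent.name,
            account=account_dir.name,
            year=int(path.stem),
            path=path,
        )
```

Calling iter_dump_files(Path("import-data/connect")) from the folder where you ran the dump should then give you one entry per account and year.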
The bank transactions will be in a well-defined CSV format like this:
date,name,amount,pending,website,datetime,logo_url,account_id,category_id,check_number,account_owner,merchant_name,transaction_id,authorized_date,payment_channel,transaction_code,transaction_type,iso_currency_code,merchant_entity_id,authorized_datetime,pending_transaction_id,unofficial_currency_code,personal_finance_category_icon_url,counterparties__name,counterparties__type,counterparties__website,counterparties__logo_url,counterparties__entity_id,counterparties__phone_number,counterparties__confidence_level,personal_finance_category__primary,personal_finance_category__detailed,personal_finance_category__confidence_level
2022-08-29,SparkFun,89.4,False,,,,a8x7Joej3BCeNv4aolwlSLoxerDBrbFZmLogG,13005000,,,FUN,oGpyZx5ALJfo5r1J4LnLTDbgA1dpRdfoVgEv3,2022-08-28,in store,,place,USD,,,,,https://plaid-category-icons.plaid.com/PFC_ENTERTAINMENT.png,FUN,merchant,,,,,LOW,ENTERTAINMENT,ENTERTAINMENT_SPORTING_EVENTS_AMUSEMENT_PARKS_AND_MUSEUMS,LOW
2022-08-30,McDonald's,12.0,False,mcdonalds.com,,https://plaid-merchant-logos.plaid.com/mcdonalds_619.png,a8x7Joej3BCeNv4aolwlSLoxerDBrbFZmLogG,13005032,,,McDonald's,LrEPGgV8b4CDg59RwZ8ZsvklJP8pw8FkW1w3n,2022-08-30,in store,,place,USD,vzWXDWBjB06j5BJoD3Jo84OJZg7JJzmqOZA22,,,,https://plaid-category-icons.plaid.com/PFC_FOOD_AND_DRINK.png,McDonald's,merchant,mcdonalds.com,https://plaid-merchant-logos.plaid.com/mcdonalds_619.png,vzWXDWBjB06j5BJoD3Jo84OJZg7JJzmqOZA22,,VERY_HIGH,FOOD_AND_DRINK,FOOD_AND_DRINK_FAST_FOOD,VERY_HIGH
2022-08-30,Starbucks,4.33,False,starbucks.com,,https://plaid-merchant-logos.plaid.com/starbucks_956.png,a8x7Joej3BCeNv4aolwlSLoxerDBrbFZmLogG,13005043,,,Starbucks,pyoAba5GXPFeJzD94bZbS7xgkmZdPZCpQBaXZ,2022-08-30,in store,,place,USD,NZAJQ5wYdo1W1p39X5q5gpb15OMe39pj4pJBb,,,,https://plaid-category-icons.plaid.com/PFC_FOOD_AND_DRINK.png,Starbucks,merchant,starbucks.com,https://plaid-merchant-logos.plaid.com/starbucks_956.png,NZAJQ5wYdo1W1p39X5q5gpb15OMe39pj4pJBb,,VERY_HIGH,FOOD_AND_DRINK,FOOD_AND_DRINK_COFFEE,VERY_HIGH
...
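Because the columns are named, the files are easy to consume from a few lines of code. The following Python sketch totals the amounts per primary category using only the column names shown in the header above. The function name spending_by_category is made up for this illustration, and treating a pending value of "True" as a row to skip is an assumption based on the "False" values in the sample.

```python
import csv
from collections import defaultdict
from decimal import Decimal
from typing import Dict, TextIO


def spending_by_category(csv_file: TextIO) -> Dict[str, Decimal]:
    """Sum the `amount` column per `personal_finance_category__primary`."""
    totals: Dict[str, Decimal] = defaultdict(Decimal)
    for row in csv.DictReader(csv_file):
        # Assumption: pending rows are written as "True" and may still change.
        if row["pending"] == "True":
            continue
        category = row["personal_finance_category__primary"] or "UNCATEGORIZED"
        # Decimal avoids binary floating-point rounding on money values.
        totals[category] += Decimal(row["amount"])
    return dict(totals)
```

For example, you could open import-data/connect/Chase/Checking 1234/2022.csv with newline="" and pass the file object to this function.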
You can further process these CSV files with our open-source tool, beanhub-import, or with your own importing tool. With beanhub-import, it looks like this:
bh import
And boom! Here you go. You can see that it imports data from the local CSV file based on your pre-defined rules.
Many people may ask: why push out a new feature that makes users depend less on your service? Isn’t more vendor lock-in better for business? Well, yes, indeed. But we are firm believers in free software. Everyone should be able to choose whatever tool they feel comfortable with, and they should have full control of their data. Making it easy to pull CSV files directly opens the door for the whole plaintext accounting community, not just BeanHub or Beancount users. Be it ledger, hledger, or GnuCash, as long as a program can consume CSV files and convert them into transactions automatically, you can have a fully automatic workflow. We hope this brings more people into the plaintext double-entry accounting and open-source accounting software communities.
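To give a rough idea of what that could look like outside Beancount, here is a small Python sketch that turns rows of a dumped CSV file into journal entries in the plain-text format that ledger and hledger read. It is only an illustration under stated assumptions: the function name to_journal_entries and the account names are made up, every row goes to a single placeholder counter account, and it relies on our understanding that Plaid reports money leaving the account as a positive amount.

```python
import csv
from typing import Iterator, TextIO


def to_journal_entries(
    csv_file: TextIO,
    bank_account: str,
    counter_account: str = "Expenses:Unknown",  # hypothetical placeholder account
) -> Iterator[str]:
    """Yield one ledger/hledger-style journal entry per settled CSV row."""
    for row in csv.DictReader(csv_file):
        if row["pending"] == "True":  # assumption: pending rows are "True"
            continue
        # Assumption: a positive amount means money left the bank account, so
        # it is posted to the counter account as-is. The bank posting has no
        # amount, which lets ledger/hledger infer the balancing value.
        yield (
            f"{row['date']} {row['name']}\n"
            f"    {counter_account}  {row['amount']} {row['iso_currency_code']}\n"
            f"    {bank_account}\n"
        )
```

A real importer would pick the counter account per transaction, for example from the personal_finance_category__primary column, which is exactly the kind of rule that beanhub-import or your own tool can encode.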
How do you set up a Connect repository?
We made the whole process as easy as possible. First, you need to sign up for an account and become a paid member.
We provide a 30-day risk-free trial period for the paid membership.
So, if you don’t find it helpful, you can cancel at any time easily, no questions asked. Once you have created an account and signed in, the next step is to create a Connect Repository.
Now, open the repository list page and click the Create Connect Repository button.
Once you’ve created the Connect Repository, you visit the repository page by clicking the link on the repositories page.
Then click the Connected Banks link in the left menu:
On the Connected Banks page, click the Connect Bank button.
It should open a Plaid dialog asking which bank you would like to connect to and walk you through the authentication and authorization process.
Once this is done, you have a connected bank.
We run sync periodically for you with a default schedule.
You can change the schedule or settings of this bank connection by clicking the Edit button.
Install BeanHub-CLI
Next, to pull the transaction data into your local environment, you need to install BeanHub-CLI (https://beanhub-cli-docs.beanhub.io/) first. It’s an open-source command line tool we built, providing many useful features for BeanHub users and all Beancount users. You need Python 3.11 or later installed. Then, you can run:
pip install "beanhub-cli[connect]>=2.0.0"
Please note that the connect commands require some extra optional Python dependencies. Therefore, when installing the BeanHub-CLI Python package, please include connect as the extra dependency.
Login
Next, you need to log into your BeanHub account in your local environment. It’s very easy to log in to your BeanHub account with BeanHub-CLI. Simply run:
bh login
It will open up an Access Token creation page on the BeanHub website in your browser like this:
It may ask you to log into your BeanHub account in the browser first if you haven’t logged in yet. If your local environment doesn’t allow BeanHub-CLI to open the page in the browser automatically, you can copy the shown URL and open the page manually. Once the page is open, please compare the authentication code shown by the command line tool with the one shown on the website to ensure they are identical before granting access. You can change the scope of the access grant for your login session when creating it, or change it later on the BeanHub Access Token management page. For more information about the login command, please read our document here.
Sync
Next, to ensure that all the transaction data in BeanHub’s database is up-to-date, you can run a sync command to make BeanHub update transactions from the banks for you immediately instead of waiting until the scheduled time arrives. To run the sync command, simply type:
bh connect sync
Dump
The final step. The dump command collects all the transactions we synced from Plaid previously with either the sync command or scheduled sync operations and dumps them into your local file system as CSV files. To run the command, simply type:
bh connect dump
Since many users may want to run a sync before the dump, we also provide a --sync argument that runs the sync first and then the dump, ensuring that your transaction data is up-to-date. Like this:
bh connect dump --sync
That’s it! Congratulations! Now you have the CSV files ready locally in your current folder, and you can run the beanhub-import command with:
bh import
You can also run your own importing tool. It’s up to you! For more information about the sync and dump commands, please read our document here.
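If you want to run the whole workflow on a schedule, for example from cron, you can chain the two commands in a small script. Below is a minimal Python sketch that runs the two commands shown above in order and stops if either of them fails. The function name refresh_books and the injectable run parameter are made up for this illustration. It assumes bh is on your PATH, that you have already run bh login, and that the script runs in the folder where you want the files.

```python
import subprocess
from typing import Callable, Optional, Sequence

# The two commands from this article, in the order they should run.
DUMP_CMD = ["bh", "connect", "dump", "--sync"]
IMPORT_CMD = ["bh", "import"]


def refresh_books(run: Optional[Callable[[Sequence[str]], object]] = None) -> None:
    """Sync and dump the CSV files, then import them into your books."""
    if run is None:
        # check=True raises CalledProcessError on a non-zero exit code, so a
        # failed dump never falls through to the import step.
        def run(cmd: Sequence[str]) -> object:
            return subprocess.run(cmd, check=True)

    for cmd in (DUMP_CMD, IMPORT_CMD):
        run(cmd)
```

Calling refresh_books() with no arguments runs the real commands. The run parameter exists only so that the order of the commands can be checked without touching your account.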
Data security
Many people refuse to use anything on the cloud. We understand the concerns due to big tech companies abusing people’s trust in the past. Unfortunately, even before you realize it, there’s one kind of cloud you cannot avoid, and you’re already using it, regardless – your bank accounts. Like it or not, once you open a bank account, they have all your data in their computer system. If the bank’s system is compromised, so is your financial data.
Given that in a modern society there’s no escaping your data being part of someone else’s systems, whether you like it or not, we believe the right approach is to understand the risk and manage it properly. As we have mentioned many times, we take data security seriously. We built BeanHub as if it were a payment processor. We adopt modern hardware-based encryption and industry security standards to protect your access token and transaction data from Plaid. But talk is cheap, and you would rather we show you, right? Well, yes! 😉
We want to be transparent with our users about the potential security risks they face when signing up for the service. Here’s how we protect your Plaid data. First, you need to know that we protect two types of sensitive data in our database: the access token and your transaction data. The access token is provided by Plaid and lets us pull your data from the Plaid API. With that token plus our Plaid API key (which is also heavily protected, of course), anyone can pull your bank transactions from the Plaid API servers. If those two values are compromised, it means giving out access to all your bank transactions. Here’s the diagram showing how the encryption is done to protect the access token:
We use AWS KMS as our HSM (Hardware Security Module). When we get the access token after you’ve connected to a bank via Plaid, we make a GenerateDataKey API call to generate a symmetric encryption key. We encrypt the access token with it and immediately drop the original plain-text value. We then keep the encrypted access token in our database. We set very restrictive permissions for the encryption key generated from the GenerateDataKey API call, so only a particular type of worker in our infrastructure has permission to decrypt the access token. These workers are dedicated to processing BeanHub Connect tasks. We call them the Connect Workers. We designed and built Connect Workers to be simple and robust. Since they don’t process any other tasks or requests, they sit behind firewalls, making them harder to compromise.
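This pattern is often called envelope encryption, and a toy Python sketch may make the flow easier to follow. It is not our production code. FakeKMS is an in-memory stand-in for a key service whose data-key call returns both a plain-text key and an encrypted copy of it, as GenerateDataKey does. The _toy_cipher function is a deliberately simple placeholder for a real authenticated cipher such as AES-GCM and must not be used to protect real data. All names here are made up for the illustration.

```python
import hashlib
import secrets
from dataclasses import dataclass
from typing import Tuple


def _toy_cipher(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """XOR data with a SHA-256-derived keystream (the same call decrypts).

    Illustration only, NOT secure: it offers no integrity protection.
    """
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


class FakeKMS:
    """Hypothetical in-memory stand-in for a key management service."""

    def __init__(self) -> None:
        self._master_key = secrets.token_bytes(32)  # never leaves the "HSM"

    def generate_data_key(self) -> Tuple[bytes, bytes]:
        """Return (plain-text data key, data key encrypted under the master key)."""
        data_key = secrets.token_bytes(32)
        nonce = secrets.token_bytes(16)
        return data_key, nonce + _toy_cipher(self._master_key, nonce, data_key)

    def decrypt(self, encrypted_key: bytes) -> bytes:
        """Recover the plain-text data key; only trusted workers may call this."""
        nonce, body = encrypted_key[:16], encrypted_key[16:]
        return _toy_cipher(self._master_key, nonce, body)


@dataclass
class StoredSecret:
    """What ends up in the database: no plain-text key, no plain-text token."""

    encrypted_key: bytes
    nonce: bytes
    ciphertext: bytes


def seal(kms: FakeKMS, plaintext: bytes) -> StoredSecret:
    """Encrypt plaintext under a fresh data key and keep only encrypted values."""
    data_key, encrypted_key = kms.generate_data_key()
    nonce = secrets.token_bytes(16)
    ciphertext = _toy_cipher(data_key, nonce, plaintext)
    return StoredSecret(encrypted_key, nonce, ciphertext)  # data_key is dropped


def unseal(kms: FakeKMS, stored: StoredSecret) -> bytes:
    """Ask the key service for the data key, then decrypt the stored value."""
    data_key = kms.decrypt(stored.encrypted_key)
    return _toy_cipher(data_key, stored.nonce, stored.ciphertext)
```

The point of the design is that the database only ever holds a StoredSecret, so a leaked database is useless without permission to call decrypt on the key service.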
Now, we have the access token stored securely. Next, here’s how we pull your transaction data and encrypt it:
Whenever we need to sync data from Plaid API, we create a new task for the Connect Worker to pick up and process.
It will decrypt the encrypted access token and then use it to pull transaction updates from the Plaid API server.
We generate another data key via the GenerateDataKey API call and encrypt your transactions from the same bank.
We create a corresponding dump request task when you run the bh connect dump command.
A Connect Worker will pick up the task, decrypt the transactions, and encrypt the transaction CSV files using the public key generated by BeanHub-CLI before uploading them to Amazon S3.
Finally, the BeanHub-CLI dump command will download the file from the URL returned by the dump request and decrypt it with the private key that corresponds to the public key it generated earlier.
Frequently asked questions
Q: Do you sell my data to a third party?
A: No. We only sell BeanHub as a service to you.
Q: Do you use my data to train machine learning models?
A: No. We don’t have a plan for it, but if there’s anything like that in the future, we will ask for your explicit consent and adopt proper anonymization.
Q: Can I delete my Plaid access token and transaction data from BeanHub?
A: Yes.
You can delete any connected bank from the Connected Banks page in your repository, which also deletes its corresponding Plaid access token and transaction data from our database.
Some encrypted data might still remain in our database backups, and it will stay there until it falls out of the backup window.
Please note that if you have Beancount transactions generated from Plaid transactions already committed in BeanHub’s Git repository, the already committed Beancount files will remain in the Git history.
Q: With your security measures, does it mean BeanHub will never get hacked?
A: No. There’s always a risk of a computer system getting hacked in a non-air-gapped environment (fun fact: recent security research shows that data can leak even from an air-gapped environment by using memory as an antenna). We design and build the system to be as secure as possible. Usually, attackers weigh the benefits (typically financial incentives) of breaking a system against the cost. Even if BeanHub gets hacked, unless the attackers can also compromise our Connect Workers, our data encryption makes it hard for them to extract your data. With that in mind, we believe your bank transaction data from Plaid in BeanHub is secure.
Q: What if AWS gets hacked?
A: We cannot speak on AWS’s behalf about their security measures. Please read AWS’s security page to learn more. However, if an attacker can compromise AWS, BeanHub will probably be the last target they are interested in.
Q: What if Plaid gets hacked?
A: We cannot speak on Plaid’s behalf about their security measures. Please read Plaid’s safety page to learn more. However, based on our experience in fintech, a company like Plaid usually adopts multiple layers of security measures and has a dedicated security team looking into security risks.
Q: What if my BeanHub account gets compromised?
A: We provide both virtual and hardware-based two-factor authentication to all of our users, and we highly recommend that all users adopt a hardware-based token, such as a YubiKey, as the second factor. That way, you can significantly reduce the possibility of getting hacked. Currently, we only require second-factor authentication during the login process. However, we plan to extend optional second-factor authentication to critical operations such as dump requests. That way, even if someone steals your password, they won’t be able to do anything meaningful without your second-factor token, and your account will stay safe with us. While we have a plan, we usually wait until an actual user asks for a feature before building it. If this stops you from using BeanHub, please get in touch with us at support@beanhub.io, and we will prioritize implementing it.
Q: What if Plaid shuts down?
A: Plaid is a unicorn fintech company with a valuation of $13.4 billion at the time of writing, providing the API for BeanHub to fetch transaction data from all major financial institutions. Countless fintech startups rely on it to fetch transaction data. While we don’t want to jinx ourselves, the possibility of BeanHub shutting down might be higher than that of Plaid shutting down. If Plaid does shut down and we are still around, we will find an alternative provider to integrate with. The good news is that it won’t affect the transactions already ingested into your plaintext accounting books. The bad news is that you may need to adjust your importing rules if there’s a change in the CSV format. But regardless, you will still keep your data locally on your computer or in BeanHub’s Git repository.
Q: What if BeanHub shuts down?
A: In the worst case, say BeanHub shuts down without notice (very unlikely). Given the nature of plaintext accounting, if you use a Connect Repository, you should already have a local copy of all your accounting books on your computer. You can move on from there and look for an alternative transaction data source. If it’s a proven market at that time, other service providers will soon fill our position.
Q: Why don’t I sign up for a Plaid account and access their API directly?
A: We build BeanHub to make it easier for both technical and non-technical users to enjoy the benefits of plaintext accounting. Connecting to Plaid yourself is doable if you have the required technical skills, and we highly encourage it if that meets your needs. Our beanhub-import tool is open-source. The import engine will work regardless of where the CSV files come from, as long as you provide them in the same well-defined format. There will be an up-front cost of time and effort in learning how the API works and how to implement it correctly. Beyond the development cost, some banks require you to pass a security audit before they grant you access to the transaction data. That cost may be too high for individual users to justify if they need to access banks that require a security audit.
Q: What banks do you support?
A: Plaid determines what banks we support. You can use this tool to find out if it includes your bank. However, please note that we currently cannot justify the cost of supporting many banks in the EU due to different pricing structures. But we are working hard to make that happen. Please let us know by contacting support@beanhub.io if you want to use BeanHub, but your bank is from the EU, and it stops you from using BeanHub.
Q: Why can’t BeanHub encrypt Plaid’s access token and transaction data with my encryption keys instead?
A: We can. Doing so would make BeanHub’s system more secure and trustless: even if our entire system were hacked, the attacker wouldn’t be able to decrypt your data without your secret key. However, it also means every operation would need authorization from your end, and if you lose the key, there is no way to recover the data. User experience would suffer. While we are also big fans of zero-trust architecture, the fact is that we need to keep the Plaid API key a secret, so there’s no way to achieve complete zero trust. We always listen to our users before we build something. Please let us know by contacting support@beanhub.io if you want to see a security improvement like this.
Q: Can I convert a Connect Repository into a BeanHub Git repository later?
A: Yes. While we have not implemented this feature yet, we plan to make it possible to convert your Connect Repository into a Git repository. If you need this feature, please get in touch with our support team via support@beanhub.io, and we will prioritize your request.
Q: Can I convert a BeanHub Git repository into a Connect Repository?
A: Yes. While we have not implemented this feature yet, we plan to make it possible to convert your Git repository into a Connect Repository. However, please note that it means you will lose all your Git history with the Git repository itself. If you need this feature, please contact our support team via support@beanhub.io, and we will prioritize your request.
Q: Is BeanHub open source?
A: Part of it. As a for-profit business, we try our best to open source as much as possible. So far, we have already open-sourced many projects. For example, the most important ones:
- BeanHub CLI - The command line tool for BeanHub and Beancount users
- BeanHub Import - The rule-based import engine
- BeanHub Extract - The library for extracting transaction data from CSV files
- BeanHub Forms - The form library to make forms for generating Beancount transactions
- Beancount Parser - Standalone parser for Beancount files
- Beancount Black - An opinionated formatter library for Beancount files
Please see our open-source list to learn more about the open-source projects we built for BeanHub.