Lego Insights with AI
What is Bedrock?
I love Lego, but spending a couple of hours combing through a website with an old, obscure set from my childhood in mind gets tedious. So I am going to create and deploy an AWS Bedrock agent that will help me find those obscure Lego sets from my childhood, using data from Rebrickable. AWS Bedrock is a fully managed generative AI platform that lets AWS partners and customers easily architect and deploy generative AI applications. This includes automatic embedding of data with integrated embedding models, as well as fully managed vector storage.
Steps to Download Data
There are a few different ways to download data from Rebrickable. They provide an API we could use, but that requires an API key from Rebrickable. Fortunately, Rebrickable also publishes .csv files of all their website data, publicly available at this Link. Each file is a table from Rebrickable's database, and the tables are interconnected.
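If you'd rather script the download than click through the page, something like the following works. The CDN URL pattern and the list of table names here are assumptions drawn from the Rebrickable downloads page (and this is only a subset of the tables), so double-check the links on the page itself:

```shell
# Download and decompress a few of the Rebrickable CSV dumps.
# NOTE: the cdn.rebrickable.com URL pattern is an assumption --
# verify it against the downloads page before relying on it.
for table in sets themes colors parts inventories inventory_parts; do
  curl -sO "https://cdn.rebrickable.com/media/downloads/${table}.csv.gz"
  gunzip -f "${table}.csv.gz"
done
```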
Uploading CloudFormation Template
We're going to use some demo code that I created to build the agent and its underlying resources. See the end of the blog post for a link to the repository. For now, let's deploy the CloudFormation template to AWS with this AWS CLI command:
aws cloudformation deploy \
  --template-file ./src/bedrock-demo.yml \
  --stack-name bedrock-demo \
  --profile your-profile \
  --region us-east-1 \
  --capabilities CAPABILITY_NAMED_IAM
The CAPABILITY_NAMED_IAM capability is needed because the stack creates IAM roles with custom names; AWS requires you to explicitly acknowledge this before the stack will start creating.
Uploading Data
We must split up inventory_parts.csv — this file is too big for the knowledge base to process in one go. We can use the split command in the terminal:
split -l 180000 inventory_parts.csv inventory_parts_
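One caveat: split leaves the CSV header row only in the first chunk, so every later chunk is headerless. A small wrapper can copy the header into each piece so that every chunk is a valid CSV on its own — a sketch (the split_csv helper name is ours):

```shell
# split_csv: split a large CSV into N-line chunks, copying the header
# row into every chunk so each piece parses as a standalone CSV.
split_csv() {
  file=$1; lines=$2; prefix=$3
  header=$(head -n 1 "$file")
  # split the body (everything after the header) into chunks
  tail -n +2 "$file" | split -l "$lines" - "$prefix"
  # prepend the header to each chunk
  for f in "$prefix"*; do
    { printf '%s\n' "$header"; cat "$f"; } > "$f.tmp" && mv "$f.tmp" "$f"
  done
}

# For the real data set:
if [ -f inventory_parts.csv ]; then
  split_csv inventory_parts.csv 180000 inventory_parts_
fi
```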
The knowledge base will eventually have to process every one of these lines, so the sync later will take a while. For now, use this command to upload your folder with all the data to S3:
aws --profile your-profile s3 cp ./your-local-folder s3://your-s3-bucket/ --recursive
After the data is uploaded, the CloudFormation stack should still be in the CREATE_IN_PROGRESS state. Wait for it to reach CREATE_COMPLETE.
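Rather than refreshing the console, the AWS CLI can block until the stack settles:

```shell
# Waits until the stack reaches CREATE_COMPLETE; exits non-zero
# if creation fails or the stack rolls back.
aws cloudformation wait stack-create-complete \
  --stack-name bedrock-demo \
  --profile your-profile \
  --region us-east-1
```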
Sync Knowledge Base
Syncing the knowledge base will take some time, as the embedding model has to go through all of the files and embed all of the data into vectors. The embedding model can automatically determine how the different .csv files connect to each other. There are steps we could take with the structure of our data to optimize this process, but for a quick demo we'll leave things as is and hope for a good-enough result. We can also start the synchronization programmatically with the StartIngestionJob API.
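From the CLI, StartIngestionJob is exposed as bedrock-agent start-ingestion-job. The KB_ID and DS_ID values below are placeholders — you can look up the real IDs with aws bedrock-agent list-knowledge-bases and aws bedrock-agent list-data-sources:

```shell
# Kick off a sync of the S3 data source into the knowledge base.
# KB_ID and DS_ID are placeholders for your real resource IDs.
aws bedrock-agent start-ingestion-job \
  --knowledge-base-id KB_ID \
  --data-source-id DS_ID \
  --profile your-profile \
  --region us-east-1
```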
Prepare the AI
Once the knowledge base has finished syncing, move over to the Agents console in the Bedrock sidebar. Find your agent and click Prepare on it. Preparing the agent shouldn't take long. It's also possible to prepare a Bedrock agent automatically with AutoPrepare; however, for this demo we've kept the manual step.
Asking Questions
Once the agent has completed the preparation process, we are ready to ask it a question to make sure everything is working end to end. For example:
Question:
What was Lego Mindstorms?
Answer:
Lego Mindstorms was a line of Lego sets that allowed users to build and program robots using Lego bricks. The first generation, Lego Mindstorms Robotics Invention System (RIS), was released in 1998 and included the RCX brick, which was a programmable controller that could be programmed using a computer. Later versions, such as the Lego Mindstorms NXT and Lego Mindstorms EV3, introduced more advanced features and programming capabilities. The sets included various sensors, motors, and building instructions to create different robot designs.
Question:
What is the Lego set 6059-1?
Answer:
The Lego set 6059-1 is called "Knight's Stronghold" and was released in 1990. It contains 224 pieces.
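The same questions can also be asked from the CLI via invoke-agent. The AGENT_ID and ALIAS_ID values below are placeholders for the IDs shown in your own Bedrock Agents console; the response stream is written to the output file at the end:

```shell
# Ask the prepared agent a question from the command line.
# AGENT_ID and ALIAS_ID are placeholders for your real IDs.
aws bedrock-agent-runtime invoke-agent \
  --agent-id AGENT_ID \
  --agent-alias-id ALIAS_ID \
  --session-id demo-session \
  --input-text "What is the Lego set 6059-1?" \
  --profile your-profile \
  --region us-east-1 \
  answer.json
```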
Was It Right?
The AI was correct but could use some more training with different data sources. It was unable to accurately answer how many Lego sets were produced in a given year. As an example, when I asked how many sets were produced in the year 2000, it answered 2, which is incorrect, as Lego produced around 400 sets in 2000. If there were some articles to train the AI on, this might improve the outputs from the AI.
Conclusion
We wanted to demonstrate that with a little bit of CloudFormation and some raw data sources, you can expect a reasonable level of accuracy from a conventional Bedrock architecture with very little effort. From here, Bedrock users can quickly and incrementally improve upon the quality of their agents by performing simple curation of their data sources. If you'd like to try things out for yourself, you can find a copy of the code used in this blog post here.