Lego Insights with AI
What is Bedrock?
I love Lego, but spending a couple of hours combing through a website with an old, obscure set from my childhood in mind gets tedious. So I am going to create and deploy an AWS Bedrock agent that will help me find those obscure Lego sets from my childhood, using data from Rebrickable. AWS Bedrock is a fully managed generative AI platform that lets AWS partners and customers easily architect and deploy generative AI applications. This includes automatic embedding of data with integrated embedding models, as well as fully managed vector storage.
Steps to Download Data
There are a few different ways to download data from Rebrickable. They provide an API we could use, but that requires an API key from Rebrickable. Fortunately, Rebrickable also publishes .csv files of all their website data, publicly available at this Link. Each file is a table from Rebrickable's database, and the tables are interconnected.
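If you'd rather script the download than click through the page, something like the following works. The CDN URL pattern and the list of table names here are assumptions drawn from the Rebrickable downloads page (and this is only a subset of the tables), so double-check the links on the page itself:

```shell
# Download and decompress a few of the Rebrickable CSV dumps.
# NOTE: the cdn.rebrickable.com URL pattern is an assumption --
# verify it against the downloads page before relying on it.
for table in sets themes colors parts inventories inventory_parts; do
  curl -sO "https://cdn.rebrickable.com/media/downloads/${table}.csv.gz"
  gunzip -f "${table}.csv.gz"
done
```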
Uploading CloudFormation Template
We're going to use some demo code that I created to build the agent and its underlying resources. See the end of the blog post for a link to the repository. For now, let's deploy the CloudFormation template to AWS with this AWS CLI command:
aws cloudformation deploy \
  --template-file ./src/bedrock-demo.yml \
  --stack-name bedrock-demo \
  --profile your-profile \
  --region us-east-1 \
  --capabilities CAPABILITY_NAMED_IAM
The CAPABILITY_NAMED_IAM capability is needed because the stack creates IAM roles with custom names; AWS requires you to explicitly acknowledge this before the stack will start creating.
Uploading Data
We must split up inventory_parts.csv — this file is too big for the knowledge base to process in one go. We can use the split command in the terminal:
split -l 180000 inventory_parts.csv inventory_parts_
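One caveat: split leaves the CSV header row only in the first chunk, so every later chunk is headerless. A small wrapper can copy the header into each piece so that every chunk is a valid CSV on its own — a sketch (the split_csv helper name is ours):

```shell
# split_csv: split a large CSV into N-line chunks, copying the header
# row into every chunk so each piece parses as a standalone CSV.
split_csv() {
  file=$1; lines=$2; prefix=$3
  header=$(head -n 1 "$file")
  # split the body (everything after the header) into chunks
  tail -n +2 "$file" | split -l "$lines" - "$prefix"
  # prepend the header to each chunk
  for f in "$prefix"*; do
    { printf '%s\n' "$header"; cat "$f"; } > "$f.tmp" && mv "$f.tmp" "$f"
  done
}

# For the real data set:
if [ -f inventory_parts.csv ]; then
  split_csv inventory_parts.csv 180000 inventory_parts_
fi
```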
The knowledge base will eventually have to process every one of these lines, so the sync later will take a while. For now, use this command to upload your folder with all the data to S3:
aws --profile your-profile s3 cp ./your-local-folder s3://your-s3-bucket/ --recursive
After the data is uploaded, the CloudFormation stack should still be in the CREATE_IN_PROGRESS state. Wait for it to reach CREATE_COMPLETE.
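Rather than refreshing the console, the AWS CLI can block until the stack settles:

```shell
# Waits until the stack reaches CREATE_COMPLETE; exits non-zero
# if creation fails or the stack rolls back.
aws cloudformation wait stack-create-complete \
  --stack-name bedrock-demo \
  --profile your-profile \
  --region us-east-1
```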
Sync Knowledge Base
Syncing the knowledge base will take some time, as the embedding model has to go through all of the files and embed all of the data into vectors. The embedding model can automatically determine how the different .csv files connect to each other. There are steps we could take with the structure of our data to optimize this process, but for a quick demo we'll leave things as is and hope for a good-enough result. We can also start the synchronization programmatically with the StartIngestionJob API.
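From the CLI, StartIngestionJob is exposed as bedrock-agent start-ingestion-job. The KB_ID and DS_ID values below are placeholders — you can look up the real IDs with aws bedrock-agent list-knowledge-bases and aws bedrock-agent list-data-sources:

```shell
# Kick off a sync of the S3 data source into the knowledge base.
# KB_ID and DS_ID are placeholders for your real resource IDs.
aws bedrock-agent start-ingestion-job \
  --knowledge-base-id KB_ID \
  --data-source-id DS_ID \
  --profile your-profile \
  --region us-east-1
```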
Prepare the AI
Once the knowledge base has finished syncing, move over to the Agents console in the Bedrock sidebar. Find your agent and click Prepare on it. Preparing the agent shouldn't take long. It's also possible to prepare a Bedrock agent automatically with AutoPrepare; however, for this demo we've kept the manual step.
Asking Questions
Once the agent has completed the preparation process, we are ready to ask it a question to make sure everything is working end to end. For example:
Question:
What was Lego Mindstorms?
Answer:
Lego Mindstorms was a line of Lego sets that allowed users to build and program robots using Lego bricks. The first generation, Lego Mindstorms Robotics Invention System (RIS), was released in 1998 and included the RCX brick, which was a programmable controller that could be programmed using a computer. Later versions, such as the Lego Mindstorms NXT and Lego Mindstorms EV3, introduced more advanced features and programming capabilities. The sets included various sensors, motors, and building instructions to create different robot designs.
Question:
What is the Lego set 6059-1?
Answer:
The Lego set 6059-1 is called "Knight's Stronghold" and was released in 1990. It contains 224 pieces.
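The same questions can also be asked from the CLI via invoke-agent. The AGENT_ID and ALIAS_ID values below are placeholders for the IDs shown in your own Bedrock Agents console; the response stream is written to the output file at the end:

```shell
# Ask the prepared agent a question from the command line.
# AGENT_ID and ALIAS_ID are placeholders for your real IDs.
aws bedrock-agent-runtime invoke-agent \
  --agent-id AGENT_ID \
  --agent-alias-id ALIAS_ID \
  --session-id demo-session \
  --input-text "What is the Lego set 6059-1?" \
  --profile your-profile \
  --region us-east-1 \
  answer.json
```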
Was It Right?
The AI was correct but could use some more training with different data sources. It was unable to accurately answer how many Lego sets were produced in a given year. As an example, when I asked how many sets were produced in the year 2000, it answered 2, which is incorrect, as Lego produced around 400 sets in 2000. If there were some articles to train the AI on, this might improve the outputs from the AI.
Conclusion
We wanted to demonstrate that with a little bit of CloudFormation and some raw data sources, you can expect a reasonable level of accuracy from a conventional Bedrock architecture with very little effort. From here, Bedrock users can quickly and incrementally improve upon the quality of their agents by performing simple curation of their data sources. If you'd like to try things out for yourself, you can find a copy of the code used in this blog post here.