How I Got One Million Pages Ranked in Google | by Peter Avritch | DataDrivenInvestor


This is a case study detailing the careful steps I took through trial and error to get over a million pages ranked in Google for an e-commerce website.

I had previously written two popular articles on extreme SEO techniques. The first was The Secret to My Long Tail SEO Success and the other was Using Machine Learning to Improve Upholstery Fabric Discovery. This is the next in the series where I dive deep into how I leveraged software development and automation to build my pages and keep them fresh.

Because of the sheer amount of material to cover, I’m dividing this article into multiple parts. In this first part, I’ll cover the business case, the problems needing to be solved, and the general steps toward an effective solution.

In Part 2 I’ll begin to discuss the specific tools and techniques I developed to work with hundreds of thousands of products, make them discoverable, and importantly, get them ranked in Google so I’d receive a steady stream of free organic traffic which generated millions of dollars in annual sales.

Did I really get a million pages ranked? Yes.

Was it spammy? Absolutely not.

Straight to the point — to get Google to give you this kind of love, you need to have a lot to say. And if you’re reading this, you already know that content is king and Google can spot spam from miles away. But, as I was quick to point out in my previous article, be good to Google and Google will be good to you.

In my case, and what I’ll talk about here, I was previously a cofounder of an e-commerce company named Inside Stores, selling upholstery fabric and wallpaper. We sold over 500,000 patterns of drop-ship products, with nearly all of our website traffic coming from organic SEO — we rarely paid for clicks.

Needless to say, with that many core products, then adding in the variants and discovery collections, it wasn’t so hard to get to a million. The trick, of course, was to generate quality pages that Google would see as valuable and worthy enough to include in their search results, and that’s what I’ll spill on below.

I’ll be the first to admit, our business model lent itself very nicely to the automation techniques I’m about to describe. This won’t be the case for everyone, but if you find yourself needing to solve some of the same kinds of problems I faced, then hopefully this article will provide some valuable insight that just might translate into your space.

One of the first things people learn when building large sites is that Google rarely indexes all of their pages. If you want to be loved by Google, you’ve got to earn it. Google can’t love everyone, so they guard their resources carefully and dole out their love only to the extent it serves them.

But how do you earn Google’s love? No doubt, it took me quite some time to figure that out through lots of trial and error, but once I worked out the magic formula, I soon found Google scanning over 700,000 pages per day from our fabric shop. And why? Because they decided our content was important.

Now, although I’ve long since moved on from Inside Stores, the techniques I devised during those founding years are still very much relevant and viable today when implemented correctly. They’re evergreen. In fact, I’m using most of them all over again in my new startup in the beauty space, Hello Gloss.

Next, it’s hard work — really hard. It was my full-time job for several years running as I bootstrapped the company and put all these pieces together.

In my view, this automated SEO approach was a better long-term investment than paying Google for one-off clicks; in fact, the business model probably wouldn’t have been viable if we had to buy most of our traffic.

And now that I have all this knowledge and code to fall back on, I’m happy to share what I’ve learned.

Moreover, as people reading my earlier posts started contacting me for help, it became clear that this is quite a niche area that isn’t readily understood by most e-commerce companies. What started out as me simply posting about some fun things I’ve done along the way has turned into a whole new gig, as I now advise companies on how to put these techniques into action.

That said — I do consulting and speaking on these topics and I welcome being contacted to discuss opportunities.

Lastly, although my techniques are for the most part industry agnostic, I have a wealth of plug-n-play code modules all ready to go for fabric and wallpaper sellers. If this is you, I encourage you to get in touch and see if I might just have what you need to get a jump start on this essential automation.

This article wouldn’t make sense to most people without me first describing the business case and the problems I was looking to solve. Here, I’m going to talk about my specific experience with Inside Stores selling fabric and wallpaper, but the problems are generally global in nature for any e-commerce company with tens of thousands of products.

First off, why fabric and wallpaper? Straight up — it was all drop ship, the products were soft and lightweight and didn’t break in the mail, returns were rare, and importantly, it wasn’t a space Amazon cared about for the longest time. Orders were typically over $500 and much larger sales were quite common.

And as a bonus, shoppers were quick to pay $7 for swatches which were mostly free to us from the manufacturers in the early years. People in a decorating mood would purchase so many of them that we had to cap them out at 20 per order. The swatches also created a perfect gateway for building our mailing list and starting customers on drip campaigns toward a bigger purchase.

But there were a few other key factors that made this space especially appealing to me, starting with the sheer scale of things, combined with the need to deal with a variety of interesting technical challenges, including ingesting wonky data from dozens of technology-challenged suppliers who strangely resisted any kind of automation. Apparently I love puzzles.

And then there was the aspect of visual search which allowed me to start playing with machine learning for the first time and craft a search experience for shoppers which kept them engaged for insane amounts of time.

I quickly realized that if I could overcome these key friction points that I’d be leagues ahead of the competition. And it turned out I was right. See the screen shot at the bottom of this post — we were doing millions of dollars in annual sales at an average 35% gross margin without needing to pay for traffic.

Crucially, all the factors which made the project fun for me were huge impediments for most competitors, and for the most part, they still are.

Even wildly popular shopping platforms like Shopify (which is truly an awesome platform) still can’t satisfy the scale, data ingestion and visual discovery needs of this space for a top tier player. And if anyone was in fact set on using Shopify for their front end in this space, they’d still need most of the back-end tooling I’ll be describing later in this article. There’s just no escape.

In sum, few, if any, of our competitors had the understanding and developer talent needed to overcome these high hurdles. And they couldn’t just get something off the shelf — nothing of the sort existed. This was our moat.

Yet there was one more thing which separated us from the herd — our search and discovery experience was blindingly fast, powered by $10,000 Dell servers with solid-state drives and 96GB of memory. Everything was cached. But this too was only possible because of our custom tooling specific to this space.

Success always starts by understanding the problems and formulating a plan.

Here, I had over 500K products from 50 suppliers in a space without any data standards; what’s more, the products were visual in nature, and I needed to figure out how to make them discoverable at scale.

Think about it — these manufacturers never needed to create descriptions for their products because for centuries customers just visited a store and browsed the shelves or flipped through books of fabric samples.

But the biggest hurdle was figuring out how to get the pages for all these products ranked in Google when so many of them would be strikingly similar. Crucially, the success of this business required a strong organic footprint in Google because the math simply wouldn’t work if we had to pay for clicks.

Now that I’ve loosely defined the problem space, I’ll next discuss some of the factors that I took into consideration as I pieced together a solution.

Simply stated, if you don’t love your pages, and give your pages love, nobody else will either. And that includes shoppers as well as Google. Website pages are like fine art — we all know crap when we see it.

Tending to a website takes hard work and lots of time. Sometimes you can spend hours just working on a single page. You’ve surely spent enough time online to know the caliber of pages needed to rank above the pack. It takes a lotta love if you’re going to be competitive in today’s world.

Now, try loving a million pages — every day! And that’s the crux of the problem I set out to solve.

Be good to Google and Google will be good to you. I’m not going to spend too much time here on this because I previously dedicated an entire article to this topic. But trust me, these are words to live by.

Key here is that Google’s mission is to give searchers a great experience. Their hope is to surface the very best pages for every search. Moreover, they have enough data and experience to know what kinds of pages are meaningful, which ones are weak, or worse, plainly spam.

Google uses a variety of metrics to look for love. These include page design, internal linking strategies, on-page text, HTML errors, and how much time visitors spend on your pages before clicking away; and, importantly, who else is showing you love by linking to your site.

In short, Google can tell if you and others love your pages, and this includes watching to see if you’re regularly tending to them by making changes here and there to keep things fresh.

If you’ve got great fresh content that Google thinks people will enjoy seeing, you’ll be rewarded with lots of free traffic. Your goal should be to help Google see that your pages are deserving of being highly ranked above others across the board.

View this as a win-win partnership. Google gets to provide searchers with a best-in-class experience, and you get tons of free traffic for the assist.

Most importantly, don’t get trapped into thinking that Google sends lots of free traffic to pretty much everyone. They don’t. There’s a huge difference between occasional clicks and enough traffic to support a company with employees, infrastructure, and all the other costs of running a profitable business.

In my view, this is all just table stakes for being in the game — if you cheap out, you’ll lose out. Be clear minded about what it’s going to take to win.

At this scale, everything needs to be automated with custom programming. This includes ingestion, stock status, data cleansing, linking strategies, collections and curation, search, recommendations, product descriptions, tagging, URL structure, metadata, server caching, personalized drip campaigns, and more.
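To make that list concrete, here is a minimal sketch of how such tasks might be orchestrated as a nightly pipeline. The stage names and functions are illustrative placeholders, not the actual Inside Stores tooling.

```python
# Illustrative sketch only -- not the actual Inside Stores tooling.
# Each stage stands in for one of the automated tasks listed above.
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Stage:
    name: str
    run: Callable[[], None]

def ingest_vendor_feeds() -> None:
    """Pull new products, stock status, and price changes from every supplier."""

def normalize_catalog() -> None:
    """Map vendor-specific fields and values onto one common schema."""

def rebuild_collections() -> None:
    """Regenerate curated collections, galleries, and similar-product lists."""

def refresh_seo_artifacts() -> None:
    """Recreate titles, meta descriptions, internal links, and sitemaps."""

PIPELINE: List[Stage] = [
    Stage("ingest", ingest_vendor_feeds),
    Stage("normalize", normalize_catalog),
    Stage("collections", rebuild_collections),
    Stage("seo", refresh_seo_artifacts),
]

def run_nightly() -> None:
    # Run every stage in order; a real pipeline would add logging and retries.
    for stage in PIPELINE:
        print(f"running stage: {stage.name}")
        stage.run()

if __name__ == "__main__":
    run_nightly()
```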

As a software developer, I naturally slant this way in my thinking because it would be virtually impossible to manually touch a million pages to keep them fresh, ranked, and discoverable.

At the top of the article I noted that this was my full time job for several years. It’s a serious investment — but it’s also the key to building a moat and fending off competition. My obvious challenge was to approach this in ways my competition would never consider, or have the technical means to pull off.

Of course, the most significant benefit of complete automation is that onboarding new suppliers with tens of thousands of products becomes quite simple, and going forward, it doesn’t take any real effort to keep things fresh.

Below is a screenshot of the tooling developed to keep our products updated.

I’ll dig into the specific kinds of automation I deployed later in Part 2 of this article, after I’ve finished describing the rest of the essential puzzle pieces needed to effect a successful solution to the overall problem space.

We all know from Alice in Wonderland that rabbit holes are magical and enchanting. Done right, they’re inviting and need to be explored. And what’s better than a rabbit hole — lots of rabbit holes, of course!

Consequently, one of the most important aspects of my role was to constantly be adding new ways (rabbit holes) for shoppers to search for and discover products. I knew my customers didn’t always know right off what they were looking for, so I needed to make it fun to poke around and keep exploring until they finally came upon exactly the right fabric for their new sofa, or drapes.

It takes a lot of investment and effort to get shoppers to your website. Don’t let them leave — at least not because they simply ran out of things to see. That back button in the browser is your worst enemy, and you’re always just one tap away from visitors giving up and going back to Google.

Here’s another way to look at it — if your potential customer is on a particular page on your site which isn’t yet the product that tickles them, then you better offer them some more enchanting rabbit holes to explore, else face sudden death by back button.

The take-away: become a digger.

Dig as many enchanting rabbit holes as you can dream up. Use automation and programming to spin curated collections and recommend similar products.
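As a rough illustration of what that kind of digging can look like in code, the sketch below spins up curated collection pages by grouping products on shared attributes. The product fields and values are assumptions for the example, not data from the actual catalog.

```python
# Illustrative sketch only: generate curated "collection" pages by grouping
# products on shared attributes. The product fields below are assumptions.
from collections import defaultdict
from typing import Dict, List

PRODUCTS = [
    {"sku": "F1001", "name": "Coastal Linen", "color": "blue", "style": "coastal"},
    {"sku": "F1002", "name": "Harbor Stripe", "color": "blue", "style": "nautical"},
    {"sku": "F1003", "name": "Meadow Floral", "color": "green", "style": "floral"},
]

def build_collections(items: List[dict], attrs: List[str]) -> Dict[str, List[dict]]:
    """Return one collection per (attribute, value) pair, e.g. 'color-blue'."""
    collections: Dict[str, List[dict]] = defaultdict(list)
    for item in items:
        for attr in attrs:
            value = item.get(attr)
            if value:
                collections[f"{attr}-{value}"].append(item)
    return collections

# Every slug below becomes its own landing page -- another rabbit hole.
for slug, members in build_collections(PRODUCTS, ["color", "style"]).items():
    print(slug, [m["sku"] for m in members])
```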

A final note on rabbit holes — they work big time! Once I implemented the machine learning discovery and curation strategies for fabric and wallpaper, the time customers spent on our sites skyrocketed, and so did sales. Customers loved browsing the different patterns and requesting insane numbers of $7 swatch samples — the key was making the journey fun for them as they decorated their homes.

Below are a few simple examples of what I did for upholstery fabric. There’s also a link to a full article discussing how I approached the discovery problem by using machine learning to curate lists and make recommendations.

One of the big lessons I learned early in my SEO journey was that far better than having your top few pages come up number one in Google is having thousands of your pages come up number one, or at least on the first page, across a multitude of search terms.

I call this fishing with a net. And frankly, it’s the only way to achieve scale.

Fishing with a net isn’t just about having lots of product pages. It’s about ancillary curation strategies that can be crafted through automation. I’m constantly asking myself “how can I spin up more pages?”

Another key factor in this strategy is that shoppers frequently don’t truly know what they’re looking for — they’ll know it when they see it. And this is certainly the case for fabric and wallpaper. This means that the information being pushed out to Google should be designed to attract visitors to your store irrespective of what inspired their initial search.

My general perspective is that when you have 500K products, as I did, the likelihood of somebody purchasing the exact product that they clicked on to come to your site is pretty much nil — so focus on catching them in your net and getting them into your store rather than selling them specifically whatever they clicked on first.

Organic traffic is the traffic to your site that comes from free sources like Google search results and inbound links, as opposed to clicks you buy through pay-per-click (PPC) advertising.

While buying clicks is always easy, it gets expensive real fast. And frequently, if you need to pay for all of your traffic, the math on acquisition and conversion costs goes negative and the business isn’t profitable. That was exactly the situation I had with Inside Stores. Our success was premised on spending dollars on development and automation, not PPC.

One of the other big reasons to prefer SEO over PPC is that well implemented SEO strategies are evergreen — they live on for a very long time, as opposed to PPC which is just a one-off click that may or may not lead to a sale.

The other problem with PPC for sites that publish tens of thousands of pages is that you cannot possibly manually create meaningful campaigns for all those topics — so you would need to automate everything. And once you need to do that, you might as well go all in on SEO and reap the rewards.

In the SEO world, long tail is a term typically used to describe search phrases that are three or more words long. And when you have hundreds of thousands of pages, you can’t label them all as just “fabric” or even “blue fabric” because Google will cherry-pick a few of them and skip over the rest.

Every page needs to be unique and have its own purpose for being — this is what Google wants to see if you’re going to get them to index and rank most of your pages.

And of course, the big benefit of lots of pages targeting these multi-word searches is that you have a bigger footprint in Google and have a better chance of getting a visit.
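For illustration, here is a minimal sketch of how unique long-tail titles and meta descriptions could be composed from normalized product attributes, so no two pages read the same. The field names and the sample product are invented for the example.

```python
# Illustrative sketch only: build unique long-tail titles and descriptions
# from normalized attributes. Field names and the sample product are invented.
def make_title(p: dict) -> str:
    # e.g. "Example Mills Chenille Upholstery Fabric In Indigo Blue"
    return f'{p["brand"]} {p["material"]} {p["category"]} in {p["color"]}'.title()

def make_meta_description(p: dict) -> str:
    return (
        f'Shop {p["name"]} by {p["brand"]}: {p["color"]} {p["material"]} '
        f'{p["category"]}, {p["width"]} inches wide and sold by the yard. '
        f'Order a swatch or buy online today.'
    )

product = {
    "name": "Indigo Tide",          # invented product for illustration
    "brand": "Example Mills",
    "material": "chenille",
    "category": "upholstery fabric",
    "color": "indigo blue",
    "width": 54,
}

print(make_title(product))
print(make_meta_description(product))
```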

I won’t go any deeper on this topic here because I’ve already written a separate article which gets deep into the weeds of this incredibly powerful strategy (see below).

One of the core benefits of extreme automation is that onboarding new vendors is quite easy, even when most have over 10,000 products. In my case, I had to write some initial integration logic to ingest their data in a form that made sense for my needs, but after that, all the rest of our tooling pipeline kicked in without any further changes.

Specific to my use case in fabric and wallpaper, for each vendor, I’d create modules to learn about new products, watch for content updates, and query in batch and on demand for stock status and price changes. Due to the sheer scale of our data, this granularity was needed to prioritize workloads.
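A per-vendor integration layer along those lines might be sketched as a small interface like the one below. The method names mirror the tasks just described, but this is an assumed design, not the actual Inside Stores modules.

```python
# Illustrative sketch of a per-vendor integration interface (assumed design).
from abc import ABC, abstractmethod
from typing import Dict, Iterable

class VendorFeed(ABC):
    @abstractmethod
    def discover_new_products(self) -> Iterable[dict]:
        """Yield products the vendor has added since the last run."""

    @abstractmethod
    def fetch_content_updates(self) -> Iterable[dict]:
        """Yield products whose images, specs, or descriptions changed."""

    @abstractmethod
    def check_stock(self, skus: Iterable[str]) -> Dict[str, bool]:
        """Return in-stock status, in batch or on demand, for the given SKUs."""

class ExampleVendorFeed(VendorFeed):
    """One concrete adapter per supplier hides that supplier's quirks."""
    def discover_new_products(self):
        return []          # e.g. parse this vendor's CSV or FTP drop here

    def fetch_content_updates(self):
        return []

    def check_stock(self, skus):
        return {sku: True for sku in skus}

feed = ExampleVendorFeed()
print(feed.check_stock(["AB-102", "AB-103"]))   # {'AB-102': True, 'AB-103': True}
```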

Cleaning up dirty data isn’t particularly glamorous, but it’s an essential component to building best-in-class search and discovery experiences. By normalizing data, specifications, category names, abbreviations, and descriptions into a well-defined schema and taxonomy across all vendors, it just makes everything else that much easier.

Once everything in the database is normalized into a common format, your output content, from titles to meta descriptions, will all be consistent, irrespective of the frequently wonky information provided by vendors.
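Here is a minimal sketch of that kind of normalization: mapping each vendor's wonky field names and values onto one canonical schema and taxonomy. The mapping tables are invented for the example.

```python
# Illustrative sketch only: normalize wonky vendor values into a common
# taxonomy. The mapping tables below are invented for the example.
CANONICAL_COLORS = {
    "nvy": "navy", "navy blue": "navy",
    "sage grn": "sage green", "off wht": "off white",
}

CANONICAL_CATEGORIES = {
    "uph": "upholstery fabric",
    "wallcovering": "wallpaper",
    "drapery fab": "drapery fabric",
}

def normalize_record(raw: dict) -> dict:
    """Map one raw vendor record onto the shared schema used site-wide."""
    color = raw.get("colour", raw.get("color", "")).strip().lower()
    category = raw.get("cat", "").strip().lower()
    return {
        "sku": raw["sku"].upper(),
        "color": CANONICAL_COLORS.get(color, color),
        "category": CANONICAL_CATEGORIES.get(category, category),
        "width_inches": float(raw.get("width", 0) or 0),
    }

print(normalize_record({"sku": "ab-102", "colour": "Nvy", "cat": "UPH", "width": "54"}))
# {'sku': 'AB-102', 'color': 'navy', 'category': 'upholstery fabric', 'width_inches': 54.0}
```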

It doesn’t matter if your site has one page, one hundred, or one million. If you want Google to pay attention, you’ve got to keep things fresh. At the very least, this includes pricing and stock status, but content and slight layout changes will train Google to realize that they better scan often to keep up with your latest changes.

Conversely, if Google sees your pages are rarely touched, they’ll soon conclude these pages aren’t all that important to you and lower their value and ranking in their search results; and worse, scan less frequently.

In short, freshness is a key factor Google uses to see if you’re love’n your pages. You must keep it fresh. And in my case, that meant getting the latest data on all 500K products nearly every day and then recreating all of the collections and product galleries to include the latest changes.

In the drop-ship fabric and wallpaper space, one of the core issues we faced was that although suppliers might offer between 10,000 and 20,000 SKUs each, at any one time 20 percent of them might be temporarily out of stock; or we’d suddenly find that hundreds of products had been discontinued one day without any advance notice. This meant we had to update our on-site search, discovery, and machine-curated collections daily to ensure we maintained a quality user experience — we only surfaced products shoppers could buy right then and there.
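A daily refresh along those lines might look something like the sketch below, which applies a vendor stock feed and then keeps only sellable SKUs eligible for search and curated collections. The data shapes are assumptions for the example.

```python
# Illustrative sketch only: apply a daily stock feed and drop unavailable
# items from search and curated collections. Data shapes are assumptions.
from typing import Dict, List

def apply_stock_feed(catalog: Dict[str, dict], feed: Dict[str, bool]) -> None:
    """Mark each SKU in/out of stock; SKUs missing from the feed are treated as discontinued."""
    for sku, product in catalog.items():
        if sku not in feed:
            product["status"] = "discontinued"
        else:
            product["status"] = "in_stock" if feed[sku] else "out_of_stock"

def sellable(catalog: Dict[str, dict]) -> List[str]:
    """SKUs eligible for search, galleries, and machine-curated collections."""
    return [sku for sku, p in catalog.items() if p["status"] == "in_stock"]

catalog = {"F1001": {}, "F1002": {}, "F1003": {}}
apply_stock_feed(catalog, {"F1001": True, "F1002": False})   # F1003 missing -> discontinued
print(sellable(catalog))   # ['F1001']
```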

Googlebot is Google’s web crawler that scans websites for the information needed to power its search services. I like to say that this is how Google shows its love back to you — by scanning your website with purpose and frequency.

Google is always scanning the web, but not every site every day. That would be nearly impossible. So instead, they pick and choose who to scan, how deep, and how often. And if you’ve got a sizable site, you know they truly love you when they scan your entire site every day.

The screenshot below is from a tool I created to show how many of my page views were coming from Googlebot, and I was absolutely astonished to see Google scanning what amounted to about 700,000 pages per day on one of my sites.
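For a rough idea of how such a tool can work, the sketch below counts daily Googlebot page views from a standard web server access log. It matches on the user-agent string only, which can be spoofed; a production version would also verify the crawler (for example, via reverse DNS), and none of this is the actual tool from the screenshot.

```python
# Illustrative sketch only: count daily Googlebot page views from an access
# log. Matching the user-agent alone is naive; verification is omitted here.
import re
from collections import Counter

LOG_LINE = re.compile(r'\[(?P<day>[^:\]]+)[^\]]*\].*"(?P<agent>[^"]*)"$')

def googlebot_views_per_day(lines):
    counts = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if match and "Googlebot" in match.group("agent"):
            counts[match.group("day")] += 1
    return counts

sample = [
    '1.2.3.4 - - [21/Apr/2022:12:16:51 +0000] "GET /fabric/indigo-tide HTTP/1.1" 200 5123 '
    '"-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
]
print(googlebot_views_per_day(sample))   # Counter({'21/Apr/2022': 1})
```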

Interestingly, Google was reading 10 times more pages than our regular website visitors. But that was perfectly fine — feeding the monster was all calculated into our bandwidth and server metrics, and I was overjoyed to see it all working. And to be sure, this cost only a fraction of what buying clicks would cost.

What’s important is to understand why Google was scanning so frequently. It was because they learned over time that the content on our pages changed often, and that included layout, pricing, stock status, internal linking, and lists of similar or recommended alternative products.

Notably, I trained Google to see that if they didn’t come by often enough, they’d be serving up stale information. Moreover, they learned our content was important enough that it was sensible for them to spend the money to do so.

But here’s the big take-away: knowing that Google doesn’t do this for everyone, it’s clear they were doing it here because they intended to surface these pages high in their search results and wanted to be sure the information they presented was indeed fully up to date.

Think about it — this takes a lot of resources for Google to keep doing this. That’s over 250 million page reads per year. They’re not going to do this for pages that are so far down in the search results that nobody will ever see them.

Finally, whenever I start on big projects like this that will take many months, if not years to complete, I always try to think about alternative uses for the same code modules right up front — because having some of those use cases in mind typically shapes my design decisions.

For example, the programming for what I’ve been describing here started out only for upholstery fabric, but was then quickly repurposed for wallpaper, and eventually area rugs. You might notice that all these genres share most of the same problems (scale, freshness, visual discovery, etc.).

Another interesting use I envisioned for this same code was for being the back end of a fabric or wallpaper discovery platform, and possibly also for quilting. When I left Inside Stores, I thought long and hard about building a suite of these discovery platforms since I had all the code needed for a huge jump start and my agreement expressly allowed me to do so, but I ultimately chose another direction for the near term.

In Part 2 I’ll start getting a bit more technical as I dive into the specific techniques I devised along with the tooling developed to automate the entire process.


Serial founder and software architect. CTO at Hello Gloss. Love to build and code — started with Lego.