Programming for Search Engines

Programming for Search Engines 101. An area for avid PHP and .NET developers to chat about programming techniques and how to make better use of search engines.


Programming for Search Engines

Postby admin@techwyse.com » Wed Nov 22, 2006 7:22 am

I am hoping that one person will step forward and do some research on Google into programming, and how to program so that search engines will spider program-heavy websites!

Please let me know if anyone would like to take on this initiative. :D
admin@techwyse.com
Site Admin
 
Posts: 20
Joined: Thu May 04, 2006 1:27 pm

Research on how codes can be crawled

Postby jay » Wed Nov 22, 2006 8:05 am

Hi,

I will research how dynamic/programmed sites can be crawled, and will submit a report on Saturday.

:)

jay
jay@techwyseintl.com
jay
 
Posts: 475
Joined: Wed Nov 22, 2006 12:05 am
Location: Cochin, India.

Postby admin@techwyse.com » Thu Nov 23, 2006 9:19 am

FANTASTIC!!! :D
admin@techwyse.com
Site Admin
 
Posts: 20
Joined: Thu May 04, 2006 1:27 pm

Tips on How to Make Dynamic Sites SEO Friendly!

Postby jay » Sat Nov 25, 2006 3:42 am

A great advantage of using dynamic pages as opposed to static pages is the ability to create content that is constantly changing and updated in real time. RSS headlines, rotating content and other automatically refreshed content can boost your rankings in Google and many other engines.

Another advantage of using PHP is that you can make simple modifications to many scripts to create relevant and fresh page titles. Since the title is the most important on-page factor in SEO, special attention should be given to creating title tags that accurately reflect each page's current content.
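As a rough sketch of the idea (build_title() and the $article array are made up for illustration, not part of any existing script), a fresh title might be generated like this:

```php
<?php
// Sketch: build a unique, descriptive <title> for a dynamic page.
// The $article array stands in for a record fetched from your database.
function build_title($article)
{
    // Escape the raw title so characters like & are safe in HTML
    return htmlspecialchars($article['title']) . ' | Example Site';
}

$article = array('title' => 'Blue Widgets & Accessories');
echo '<title>' . build_title($article) . "</title>\n";
// prints: <title>Blue Widgets &amp; Accessories | Example Site</title>
```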


PHP is used as the example language for the following points:

1. Faster execution of programs is more important than download time.

While page size does affect load time, spiders run on servers connected to high-bandwidth networks, so download time matters less than the latency of the PHP script's execution. If a search engine spider follows a link on your site and is forced to wait too long for the server to process the PHP behind that page, it may label the page as unresponsive.


2. Name the columns in the query instead of using *, and use the EXPLAIN statement.

The biggest delays in a PHP script are typically in database calls and loop code. Avoid SELECT * queries; instead, explicitly name all the columns you want to retrieve, and if you are using MySQL, test your queries using the EXPLAIN statement. To optimise loops, consider unrolling loops that don't repeat very many times into duplicated code, and hoist static values, such as count($array), out of the loop by computing them once beforehand. For more details on the EXPLAIN statement, see the MySQL documentation.
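For instance (the articles table and its columns are hypothetical, used only to illustrate the shape of the queries):

```sql
-- Name only the columns you actually need instead of SELECT *
SELECT id, title, summary
FROM articles
WHERE category_id = 7;

-- In MySQL, prefix the same query with EXPLAIN to see how it will run:
-- which indexes are used and roughly how many rows will be examined
EXPLAIN SELECT id, title, summary
FROM articles
WHERE category_id = 7;
```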


3. Use fewer GET variables (three at most).

Most search engines will be able to follow a link with three or fewer GET variables (a good rule of thumb is to keep the number of GET variables passed in the URL to three or under), but with any more than three you will run into problems. Try using fewer GET variables and make them more relevant: rather than opaque id numbers, use titles and other keyword-rich bits of text. This is an example of a better URL:
Page.php?var=category&var2=topic


4. Disable the trans-sid feature.

Possibly the biggest cause of webmaster frustration when optimising PHP pages is PHP's tendency to add session id numbers to links if cookies are rejected by the browser (search engine spiders reject cookies). This happens by default if your PHP installation was compiled with the --enable-trans-sid option (and this is the default from version 4.2 onward), and it creates links with an additional, long, nonsense-looking GET variable. Besides making the links clunky, this gives spiders different URLs with the same content, which makes them less likely to treat the pages individually, and possibly not index them at all. A quick fix, if you have access to php.ini, is to disable the feature by setting session.use_trans_sid to 0.
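If you do have access to php.ini, the relevant directives (as a sketch) look like this:

```ini
; php.ini -- stop PHP from appending session ids to URLs when the
; client (such as a search engine spider) rejects cookies
session.use_trans_sid = 0
; optionally insist that sessions are carried by cookies only
session.use_only_cookies = 1
```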

5. A question mark in the URL can delay indexing.

The mere presence of a question mark in the URL won't keep a page out of Google, but Google will crawl such pages more slowly than pages without question marks, purely because they have identified themselves as dynamic and Google doesn't want to bring anyone's site down by hitting the database too hard.
Small sites will not need to worry much about this delay, as it just means your server is hit every few minutes rather than a few times a second, but for larger sites it can slow down your site's inclusion into the index.

6. Structuring Database.

Structuring your database is an important step in the process and must be carefully thought about; however, it may be that your database is well enough structured already - in which case, well done. Keeping your data tidy not only helps you but also automatically makes for a more SEO-friendly (Search Engine Optimised) system. We are going to use the example of a single-tier categorisation system. By that, I mean that products are sorted into the single category that fits them best. We will have two tables, one for categories and one for products.
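A minimal sketch of those two tables (the names and columns are illustrative, not a prescribed schema):

```sql
-- Single-tier categorisation: each product belongs to exactly one category
CREATE TABLE categories (
    id   INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    name VARCHAR(100) NOT NULL
);

CREATE TABLE products (
    id          INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    category_id INT UNSIGNED NOT NULL,
    name        VARCHAR(200) NOT NULL,
    description TEXT,
    FOREIGN KEY (category_id) REFERENCES categories (id)
);
```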

7. Enable mod_rewrite in Apache.

The Apache server has a rewrite module (mod_rewrite), available for Apache 1.2 and beyond, that converts requested URLs on the fly. You can rewrite URLs containing query strings into URLs that can be indexed by search engines. The module isn't enabled on every installation, so find out from your Web hosting firm whether it's available on your server.
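As a sketch (page.php and its id parameter are made up for illustration), a rule in an .htaccess file might map a clean URL onto the real query string:

```apache
# .htaccess -- map /category/123/ onto the real dynamic script
RewriteEngine On
RewriteRule ^category/([0-9]+)/?$ /page.php?id=$1 [L]
```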

I look forward to more points being added by others! :idea:
jay
 
Posts: 475
Joined: Wed Nov 22, 2006 12:05 am
Location: Cochin, India.

Postby admin@techwyse.com » Sat Nov 25, 2006 3:56 am

This is fantastic! Can we ensure we use all of these, and also develop an SOP that includes them when programming?

Let's continue to add to this, ok?

ALSO - are there any notes for programming in ASP.net?

Jay --- this is just the type of leadership we need amongst our programmers! Learning to program with search engines in mind is extremely important in bringing us to the next step! :D :D :D
admin@techwyse.com
Site Admin
 
Posts: 20
Joined: Thu May 04, 2006 1:27 pm

Essential for optimising the site

Postby prasanth » Sun Nov 26, 2006 3:02 am

Hi Jay

These points are not only important, they also make a lot of sense. We need to include them in the SOP so that they become the standard used while coding modules for dynamic sites.

Are PMs listening? :)

Best Regards
Prasanth
prasanth
 
Posts: 304
Joined: Fri Nov 24, 2006 1:52 am

Postby admin@techwyse.com » Tue Nov 28, 2006 1:49 am

Yes! It is very important!
admin@techwyse.com
Site Admin
 
Posts: 20
Joined: Thu May 04, 2006 1:27 pm

SEO in asp.net

Postby jay » Wed Nov 29, 2006 1:59 am

I will prepare a note on how to make ASP.NET search engine friendly, and will submit it by Saturday.
jay
 
Posts: 475
Joined: Wed Nov 22, 2006 12:05 am
Location: Cochin, India.

Making asp pages SEO friendly

Postby jay » Fri Dec 01, 2006 12:15 pm

Here are some points to take care of before developing ASP.NET pages/projects, to make them SEO friendly.

1. Avoid Postbacks

Your biggest gain in the search engine world will come from avoiding the use of postbacks.

For example, say you have content within an ASP panel, and in order to display that content you use a button, capture its click event in the code-behind, and set the panel's Visible property to true once the button is clicked. This will not work with spiders, since they don't "click buttons", so to speak.

The way to write the page so the spider will work with it is to use a link that passes a parameter via the URL (this could be a link back to the same page if you want), and then, in the Page_Load event, check the parameter's value to determine which panel or content to display on your page.
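A sketch of that pattern in the code-behind (pnlDetails and the show parameter are hypothetical names, not from an existing project):

```csharp
// The page links to itself with a GET parameter instead of a button:
//   <a href="mypage.aspx?show=details">Details</a>
// Page_Load then decides what to display, so a spider following the
// link sees the same content a clicking user would.
protected void Page_Load(object sender, EventArgs e)
{
    pnlDetails.Visible = (Request.QueryString["show"] == "details");
}
```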

2. Use Friendly URLs.

Another thing to look into is using a URL rewriter to create spider-friendly URLs. There are many examples on the Web of how to build one. What a URL rewriter does is translate the parameters into a directory-like structure.

For example, mypage.aspx?param1=1&param2=2 becomes something like /mypage/1/2/default.aspx. This will improve the spider's efficiency in crawling your site and potentially increase the frequency of deep crawls through it.

3. Always Change Titles and Meta Tags

Another issue arises when generating dynamic pages from the same physical page, whether posting back or linking back via a URL.

Be sure to change the title of the page and the meta description. If you do not, Google is going to "think" it is the same page, and the results will not be displayed as high as you would like. It will definitely affect your search results.

One way of tackling this is to convert both the title and meta description tags to HTML controls and then change their inner text, or dynamically generate the text for the tags when displaying different content.
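A sketch in the code-behind (currentArticle is a hypothetical record; metaDescription assumes a <meta runat="server" id="metaDescription"> HtmlMeta control in the page head):

```csharp
// Give each dynamically generated page its own title and description
protected void Page_Load(object sender, EventArgs e)
{
    Page.Title = currentArticle.Title + " | Example Site";
    metaDescription.Content = currentArticle.Summary;
}
```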

4. Avoid using 'Viewstate'

Viewstate can be another thing that adversely affects the indexing of your site.

For example, if you view the source of an ASP.NET application you may see something like the following:

<input type="hidden" name="__VIEWSTATE" value="dDwtMjA3MTUw...=" />

And the value of this field can go on for a very long time. The problem this causes with search engines is that many of them rank your page partly on where a keyword occurs in the document.

For example, say someone searches on ASP.NET, and your page has 100k of viewstate before the keyword first appears within the HTML document. This could hurt how your page ranks for that keyword, since many search algorithms base relevance on where the keyword appears, or how close to the top of the document it is.
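Where a page or control doesn't actually need it, viewstate can simply be switched off (the control name here is illustrative):

```aspx
<%@ Page Language="C#" EnableViewState="false" %>

<!-- or per control, leaving the rest of the page untouched -->
<asp:DataGrid id="grid1" runat="server" EnableViewState="false" />
```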
jay
 
Posts: 475
Joined: Wed Nov 22, 2006 12:05 am
Location: Cochin, India.

Rewriting SEO Friendly URLs - For PHP Programmers

Postby jay » Tue Feb 05, 2008 6:12 am

On today's Internet, database-driven or dynamic sites are very popular. Unfortunately, the easiest way to pass information between your pages is with a query string. In case you don't know what a query string is, it's a string of information tacked onto the end of a URL after a question mark.

So, what's the problem with that? Well, many search engines (with a few exceptions - namely Google) will not index pages that have a question mark or another special character (like an ampersand or equals sign) in the URL. So many of those popular dynamic sites out there aren't being indexed - and what good is a site if no one can find it?

The solution? Search engine friendly URLs. There are a few popular ways to pass information to your pages without the use of a query string, so that search engines will still index those individual pages.

SEO Friendly URL Example:

http://www.techwyse.com/search.php/999/12

Apache has a "look back" feature that scans backwards along the URL when it doesn't find what it's looking for. In this case there is no directory or file called "12", so it looks for "999". It finds that there is no directory or file called "999" either, so Apache continues down the URL and sees "search.php". That file does exist, so Apache calls up the script. The part of the URL to the right of the script name is then made available to PHP, so in the example we've been using, $_SERVER['PATH_INFO'] will contain /999/12.

So, you wonder, how do I query my database using /999/12? First you have to split it into variables you can use, and you can do that with PHP's explode function:

$var_array = explode("/", $_SERVER['PATH_INFO']);

Once you do that, you'll have the following information:

$var_array[0] = "" (empty, because the string starts with a slash)

$var_array[1] = "999"

$var_array[2] = "12"

So you can assign $var_array[1] to $article and $var_array[2] to $page_num and query your database.
jay
 
Posts: 475
Joined: Wed Nov 22, 2006 12:05 am
Location: Cochin, India.

Re: Caution: While Programming Dynamic URLs

Postby jay » Sun Jul 11, 2010 12:27 pm

What if we keep changing the URLs whenever a record/post is edited from the admin? Here are the consequences:

1. It will generate 404 errors for the pages search engines have already indexed.

2. Page rankings will go down.

3. Leads/conversions, i.e. sales, will be lost.

4. Users will get a bad impression of the site, and it affects the reputation of the company.

5. The issue may be identified only at a later stage, because the internal site links will work perfectly while the indexed pages are the ones actually affected.

Steps to avoid such issues:

1. Consult with the SEO team if the program involves dynamic URLs.

2. Ask for a complete and clear document covering the URL functionality, such as:

a) The structure of the URL, e.g. category/post-title or date/post-title.
b) Whether the admin should have the option to edit the URL, or whether it is created automatically.
c) Whether the URL should change when the post title is edited.
d) How the separators (hyphen and slash) should be used, etc.

3. Don't start work if you are unclear about the requirements or if no docs are provided.

4. Once the work is finished, do a thorough check, explain to the SEO team how it was done, and ask them to verify adding/editing posts, making sure it doesn't affect the indexed pages or the redirections done in .htaccess.

Suggestions welcome !
Jay M
Write Less, Do More
jay
 
Posts: 475
Joined: Wed Nov 22, 2006 12:05 am
Location: Cochin, India.

Re: Caution - While Programming Dynamic URLs

Postby douglas » Mon Jul 12, 2010 2:54 am

Hi Programmers and SEO guys,

Please read all the posts in this thread, and see how long ago DJ, Jay and Prasanth started these discussions on dynamic URLs.

Though we planned for these things more than three years ago and came up with solutions, we are still having issues with dynamic URLs, simply because we treat the work as too simple. It is quite simple, but a small mistake can cause a huge loss. This is where QA and verification come in.

Last week a client lost around 1,500 visits because the work was not properly verified by those concerned (doer and receiver) and was not sent for QA.

So hereafter, please take care when doing critical work like this, and ensure QA by the QA team and SEO team is carried out in full.

It's also important to check how the URL is reflected when adding and editing an item.
douglas
 
Posts: 282
Joined: Fri Feb 20, 2009 3:56 am

Re: Programming for Search Engines

Postby DJ » Tue Jul 13, 2010 11:10 am

SEO friendly URLs are certainly a very big component in delivering a neatly organized site and helping SEs understand what each page does!
The Deej
|
DJ
Site Admin
 
Posts: 1022
Joined: Thu May 04, 2006 4:47 pm

