Friday, January 30, 2015

Finding zeros in an array

A common thing to want to do is to find zero values in an array. This is obviously necessary for string length. So we'll start out with a test harness and a simple implementation:

#include "timing.h"

unsigned int len(char* array)
{
  unsigned int length = 0;
  while( array[length] ) { length++; }
  return length;
}

#define COUNT 100000
void main()
{
  char array[ COUNT ];
  for (int i=1; i<COUNT; i++)
  {
    array[i-1] = 'a';
    array[i] = 0;
    if ( i != len(array) ) { printf( "Error at %i\n", i ); }
  }
  starttime();
  for (int i=1; i<COUNT; i++)
  {
    array[i-1] = 'a';
    array[i] = 0;
    len(array);
  }
  endtime(COUNT);
}

A chap called Alan Mycroft came up with a very neat algorithm to simultaneously examine multiple bytes and determine whether there is a zero in them. His algorithm starts off with the idea that there are two conditions that need to be true if a byte contains the value zero. First of all the upper bit of the byte must be zero, this is true for zero and all values less than 128, so on its own it is not sufficient. The second characteristic is that if one is subtracted from the value, then the upper bit must be one. This is true for zero and all values greater than 128. Although both conditions are individually satisfied by multiple values, the only value that satisfies both conditions is zero.

The following code uses the Mycroft test for a string length implementation. The code contains a pre-loop to get to an eight byte aligned address.

unsigned int len2(char* array)
{
  unsigned int length = 0;
  // Handle misaligned data
  while ( ( (unsigned long long) & array[length] ) &7 )
  {
    if ( array[length] == 0 ) { return length; }
    length++;
  }

  unsigned long long * p = (unsigned long long *) & array[length];
  unsigned long long v8, v7;
  do
  {
    v8 = *p;
    v7 = v8 - 0x0101010101010101;
    v7 = (v7 & ~v8) & 0x8080808080808080;
    p++;
  }
  while ( !v7 );
  length = (char*)p - array-8;
  while ( array[length] ) { length++; }
  return length;
}

The algorithm has one weak point. It does not always report exactly which byte is zero, just that there is a zero byte somewhere. Hence the final loop where we work out exactly which byte is zero.

It is a trivial extension to use this to search for a byte of any value. If we XOR the input vector with a vector of bytes containing the target value, then we get a zero byte where the target value occurs, and a non-zero byte everywhere else.

It is also easy to extend the code to search for other zero bit patterns. For example, if we want to find zero nibbles (ie 4 bit values), then we can change the constants to be 0x1111111111111111 and 0x8888888888888888.

19 comments:

  1. mp3 juice com is very demanded, and everybody would like to get such platform. Here you can indeed found better-chosen files.

    ReplyDelete
  2. Your style is unique compared to other people from whom I have read material. Thanks a lot for posting when I get a chance, guess I'll bookmark this site. Is a visa required for Turkey. Yes, you may need a visa for Turkey before entering Turkey. The Turkey evisa process is fast and efficient. you can apply for a Turkey visa online.

    ReplyDelete
  3. This is a wonderful inspiring article. I am practically satisfied with your great work. You have really put together extremely helpful data. Keep it up ..Continue this... kenya visa for US citizens, The process of e-Visa application is very simple and straightforward. Applicant can complete the process within 5-10 minutes from anywhere in the world.

    ReplyDelete
  4. Great blog. I appreciate your work. Travelers who are not aware about Indian tourist visa fees, should check before applying for a visa.

    ReplyDelete
  5. I think this is an informative post and it is very useful and informative. So, I would like to thank you for the effort you put into writing this article. Apply e visa Indian online & check Indian e visa photo requirements via eta Indian e visa website.

    ReplyDelete
  6. Wow, I value your efforts. Great articles. Do you have any idea about the Indian evisa cost . Indian evisa cost is the processing fee is the amount of money that the applicant is required to pay for the visa at the time of filling the online Indian visa application.

    ReplyDelete
  7. I'm glad to be visiting your blog again after months of not doing so. It's good to see this article that I've been waiting so long for. Travelers who are planning to visit Turkey. It is important for them to be aware of the Turkish visa requirements to avoid an inconvenience during the visa application process.

    ReplyDelete
  8. It's amazing! Your way of explaining is superb. I would like to grab your attention towards our service which provides online software and web services using the latest technology which helps you to grow. We are available all over the world. You can check the above link to see more details.

    ReplyDelete