Testing the “Duplicated Shuffle” Odds in SQL Server

I recently became aware of the (kinda mind-blowing) fact that the odds are essentially zero that the order of the deck of cards you just shuffled has ever occurred before in the history of Earth (we’ll set other planets aside for now, since that variable isn’t quite known yet). According to this fascinating thread on Math.StackExchange.com, the odds of getting the same shuffle are roughly (3×10^14) / 52!

That 52! (the number of possible shuffles) may look innocuous, but it of course means 52 × 51 × 50 × … × 2 × 1, which works out to roughly 8.07×10^67: a very, very large number. The estimated 3×10^14 shuffles performed to date (give or take) are barely a drop in the bucket.
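If you want SQL Server itself to do that arithmetic, here’s a minimal sketch: 52! overflows every exact numeric type, so it approximates the factorial as a FLOAT by exponentiating a sum of logarithms (the column names are mine):

WITH n(i) AS
(
    /* Recursive CTE counting 1 through 52 (well under the default MAXRECURSION of 100) */
    SELECT 1
    UNION ALL
    SELECT i + 1 FROM n WHERE i < 52
)
SELECT  EXP(SUM(LOG(CAST(i AS FLOAT)))) AS approx52Factorial ,        /* ~8.07E+67 */
        3E14 / EXP(SUM(LOG(CAST(i AS FLOAT)))) AS approxRepeatOdds    /* ~3.7E-54  */
FROM    n;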

Inspired by (and borrowing from) this fantastic post by Brad Schulz about teaching SQL Server to “play” poker, wanting to sharpen my own SQL Server skills, and knowing that long mathematical odds don’t mean something can’t happen in real life, I thought I’d write a little code to simulate shuffles and see if I could come up with a duplicate.

Here’s the code:

/*
Shuffle simulator/tester
by Eric Selje

Inspired by
"Playing Poker with T-SQL
by Brad Schulz (brad@stockciphering.com), Apr2010"

Please note that you will NOT be able to see the Suit Symbols (♠,♦,♣,♥) in 
the Grid or Text Results Windows unless you set them to use a font that 
supports all Unicode characters, like Arial (Grid) or Courier New (Text).

By default, SSMS is installed with MS Sans Serif as the Grid Results font.
That font DOES NOT support all Unicode characters.

Change the Grid font via:
Tools -> Options -> Environment -> Fonts and Colors -> Grid Results

*/
SET NOCOUNT ON;
DECLARE @shuffle INT = 0;
DECLARE @totalShuffles INT = 10000000; /* or whatever you want to try (it takes my PC about 2 minutes per million shuffles) */
DECLARE @hand NVARCHAR(108); /* 48 two-character cards + 4 three-character "10" cards = 108 characters */
DECLARE @Deals TABLE ( deal NCHAR(108), hsh VARBINARY(20) ); /* SHA-1 output is 20 bytes */

DECLARE @DeckOfCards TABLE
    (
      SpotValue INT ,
      SpotSymbol NVARCHAR(2),
      SuitSymbol NVARCHAR(2)
    );
INSERT INTO @DeckOfCards ( SpotValue, SpotSymbol, SuitSymbol )
        SELECT  SpotValue ,
                SpotSymbol ,
                SuitSymbol
        FROM    ( VALUES ( '2', 2), ( '3', 3), ( '4', 4), ( '5', 5), ( '6', 6),
                         ( '7', 7), ( '8', 8), ( '9', 9), ( '10', 10),
                         ( 'J', 11), ( 'Q', 12), ( 'K', 13), ( 'A', 14) ) Spots ( SpotSymbol, SpotValue )
        CROSS JOIN ( VALUES ( N'♠'), ( N'♦'), ( N'♣'), ( N'♥') ) Suits ( SuitSymbol );

WHILE @shuffle < @totalShuffles
    BEGIN
        SET @hand = '';
        /* Sorting by NEWID() assigns each card a random GUID, which
           effectively shuffles the deck; the variable assignment below
           then concatenates the 52 cards into a single string. */
        WITH    ShuffleAndDeal
                  AS ( SELECT TOP 52
                                CardName = SpotSymbol + SuitSymbol
                       FROM     @DeckOfCards
                       ORDER BY NEWID()
                     )
            SELECT  @hand = COALESCE(@hand, '') + ShuffleAndDeal.CardName
            FROM    ShuffleAndDeal;
        INSERT  INTO @Deals
                ( deal, hsh )
        VALUES  ( @hand, HASHBYTES('SHA1',@hand)  );

        SET @shuffle = @shuffle + 1;
    END;
/* Thinking it might be easier for SQL Server to work with the 20-byte hash
   than the NCHAR(108) deal string. COUNT(*) OVER (PARTITION BY hsh) tallies
   how many times each shuffle occurred, so anything over 1 is a duplicate. */
WITH allDeals AS
    (SELECT deal, hsh, COUNT(*) OVER (PARTITION BY hsh) AS occurrences FROM @Deals)
SELECT deal, occurrences FROM allDeals WHERE occurrences > 1;

Essentially what this does is create a deck of cards the same way Brad did, shuffle it by sorting on the NEWID() function, gather the cards up into one row via COALESCE, and then hash that sequence of cards. My thinking there is that it’s easier for SQL Server to spot duplicates in that smaller 20-byte hash than in the full NVARCHAR(108) string. Let me test that out.
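As an aside, if you’re on SQL Server 2017 or later (an assumption; adjust for your version), STRING_AGG can replace the variable-concatenation trick. Here’s a sketch of a single shuffle, reusing the @DeckOfCards table variable from the script above (table variables are batch-scoped, so this would have to run in the same batch); @hand2 and rnd are names I made up:

DECLARE @hand2 NVARCHAR(108);
/* NEWID() is materialized per card in the derived table, and WITHIN GROUP
   then concatenates the cards in that random order */
SELECT  @hand2 = STRING_AGG(CardName, N'') WITHIN GROUP (ORDER BY rnd)
FROM    ( SELECT SpotSymbol + SuitSymbol AS CardName, NEWID() AS rnd
          FROM   @DeckOfCards ) AS d;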

Lastly it inserts each instance of the shuffle into a table variable and, after it has done the required number of shuffles, checks to see whether any of them are duplicated. It doesn’t output all of the shuffles (anymore) because that actually takes longer than calculating them in the first place! It should output any duplicated shuffle. So far I’ve tried it with a ten-million-shuffle run and there were no duplicates in it, but then again ten million shuffles is nothing: by the birthday-problem bound, the chance of any repeat among ten million shuffles is at most (10^7 × (10^7 − 1) / 2) / 52!, which is roughly 6×10^-55.
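If you’d rather skip the window function entirely, a plain GROUP BY on the hash does the same duplicate check; a minimal sketch:

/* Any hash that appears more than once is a repeated shuffle */
SELECT  MIN(deal) AS deal ,
        COUNT(*) AS occurrences
FROM    @Deals
GROUP BY hsh
HAVING  COUNT(*) > 1;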

Since it takes my PC about 2 minutes to do a million shuffles, a ten-million-shuffle trial was all I had the patience for today. Maybe I’ll let it run overnight sometime and see what that does. This is certainly not a perfect way to test the duplicated-shuffle theory, but it sure beats shuffling the cards by hand!

This was all for my fun and learning. It reminds me of the math problems we had in high school, the ones that made us wonder, “When will we ever use this stuff?!”
I’d love for anyone to point out where I could improve my code, because I’m sure it can be improved!

Thanks for reading.

