Find the missing number in a data stream


14

We are given a stream of $n-1$ pairwise distinct numbers from the set $\{1,\dots,n\}$.

How can I determine the missing number with an algorithm that reads the stream once and uses only $O(\log_2 n)$ bits of memory?

Answers:


7

You know that $\sum_{i=1}^{n} i = \frac{n(n+1)}{2}$, and because $S = \frac{n(n+1)}{2}$ can be encoded in $O(\log n)$ bits, this can be done in $O(\log n)$ memory and in one pass (just compute $S - \mathrm{currentSum}$; that is the missing number).
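The sum argument can be sketched in C++ as follows. This is a minimal illustration, not code from the answer: the function name `missing_by_sum` is made up here, the stream is modelled as a `std::vector` for convenience, and a fixed 64-bit accumulator stands in for the $O(\log n)$-bit counter (so it assumes $n(n+1)/2$ fits in 64 bits).

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper: returns the one value of {1..n} that is absent from
// `stream`, where n = stream.size() + 1. Assumes the input really consists of
// n-1 pairwise distinct values from {1..n} and that n(n+1)/2 fits in 64 bits.
std::uint64_t missing_by_sum(const std::vector<std::uint64_t>& stream) {
    const std::uint64_t n = stream.size() + 1;

    // S = n(n+1)/2; halve whichever factor is even before multiplying.
    const std::uint64_t expected =
        (n % 2 == 0) ? (n / 2) * (n + 1) : n * ((n + 1) / 2);

    // One pass over the stream, keeping only the running sum.
    std::uint64_t current_sum = 0;
    for (std::uint64_t y : stream) {
        current_sum += y;
    }

    return expected - current_sum;  // S - currentSum
}
```

For example, calling `missing_by_sum({1, 2, 4, 5})` gives 3, since $S = 15$ and the stream sums to 12.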

But this problem can be solved in the general case (for constant $k$): we have $k$ missing numbers and want to find all of them. In this case, instead of computing just the sum of the $y_i$, compute the sum of the $j$-th powers of the $x_i$ for all $1 \le j \le k$ (I assume the $x_i$ are the missing numbers and the $y_i$ are the input numbers):

$$\sum_{i=1}^{k} x_i = S_1,\quad \sum_{i=1}^{k} x_i^2 = S_2,\quad \dots,\quad \sum_{i=1}^{k} x_i^k = S_k \qquad (1)$$

Remember that you can compute $S_1, \dots, S_k$ simply, because $S_1 = S - \sum y_i$, $S_2 = \sum i^2 - \sum y_i^2$, ...

Now, to find the missing numbers, you should solve $(1)$ to find all $x_i$.

You can compute:

$P_1 = \sum x_i$, $P_2 = \sum_{i<j} x_i \cdot x_j$, ..., $P_k = \prod x_i$ $(2)$.

For this, remember that $P_1 = S_1$, $P_2 = \frac{S_1^2 - S_2}{2}$, ...

But the $P_i$ are (up to sign) the coefficients of $P = (x - x_1)(x - x_2) \cdots (x - x_k)$, and $P$ can be factored uniquely, so you can find the missing numbers.
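To make the steps concrete, here is a sketch for the special case $k = 2$ only. The function name `two_missing` is hypothetical, the stream is modelled as a `std::vector`, and fixed 64-bit integers are assumed to be wide enough for $\sum i^2$. With two missing numbers, $P = x^2 - P_1 x + P_2$, so factoring reduces to solving a quadratic.

```cpp
#include <cmath>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical helper for k = 2: returns the two values of {1..n} missing from
// `stream` (n = stream.size() + 2), smaller one first. Assumes the input is
// n-2 pairwise distinct values from {1..n} and that sums of squares fit in
// a signed 64-bit integer.
std::pair<std::int64_t, std::int64_t>
two_missing(const std::vector<std::int64_t>& stream) {
    const std::int64_t n = static_cast<std::int64_t>(stream.size()) + 2;

    // Start from sum(i) and sum(i^2) over 1..n, then subtract the stream.
    std::int64_t s1 = 0, s2 = 0;
    for (std::int64_t i = 1; i <= n; ++i) { s1 += i; s2 += i * i; }
    for (std::int64_t y : stream)         { s1 -= y; s2 -= y * y; }
    // Now s1 = x1 + x2 and s2 = x1^2 + x2^2, as in (1).

    const std::int64_t p1 = s1;                   // P1 = x1 + x2
    const std::int64_t p2 = (s1 * s1 - s2) / 2;   // P2 = x1 * x2

    // Roots of x^2 - P1*x + P2; the discriminant equals (x1 - x2)^2.
    const std::int64_t disc = p1 * p1 - 4 * p2;
    std::int64_t root = std::llround(std::sqrt(static_cast<double>(disc)));
    while (root * root > disc) --root;                 // guard against
    while ((root + 1) * (root + 1) <= disc) ++root;    // floating-point error

    return { (p1 - root) / 2, (p1 + root) / 2 };
}
```

For $n = 5$ and the stream $1, 3, 5$ this gives $S_1 = 6$, $S_2 = 20$, $P_2 = 8$, and $x^2 - 6x + 8 = (x-2)(x-4)$, so the missing numbers are 2 and 4. Only a constant number of counters is kept, each bounded by a polynomial in $n$, which matches the $O(\log n)$-bit budget for constant $k$.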

These are not my own ideas; read this.


1
I don't get (2). Maybe if you added in the sums' details? Does $P_k$ miss a $\sum$?
Raphael

@Raphael, the $P_i$ come from Newton's identities. I think if you take a look at my referenced wiki page you can get the idea of the calculation: each $P_i$ can be calculated from the previous $P$s and the $S_j$. Remember the simple formula $2 x_1 x_2 = (x_1 + x_2)^2 - (x_1^2 + x_2^2)$; you can apply a similar approach to all powers. Also, as I wrote, $P_i$ is a sigma of something, but $P_k$ doesn't have any $\Sigma$, because there is just one $\Pi$.

Be that as it may, answers should be self-contained to a reasonable degree. You give some formulae, so why not make them complete?
Raphael

11

From the comment above:

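Before processing the stream, allocate $\log_2 n$ bits, in which you write $x := \bigoplus_{i=1}^{n} \mathrm{bin}(i)$ ($\mathrm{bin}(i)$ is the binary representation of $i$ and $\oplus$ is pointwise exclusive-or). Naively, this takes $O(n)$ time.

Upon processing the stream, whenever one reads a number $j$, compute $x := x \oplus \mathrm{bin}(j)$. Let $k$ be the single number from $\{1,\dots,n\}$ that is not included in the stream. After having read the whole stream, we have

$$x = \left(\bigoplus_{i=1}^{n} \mathrm{bin}(i)\right) \oplus \left(\bigoplus_{i \neq k} \mathrm{bin}(i)\right) = \mathrm{bin}(k) \oplus \bigoplus_{i \neq k} \bigl(\mathrm{bin}(i) \oplus \mathrm{bin}(i)\bigr) = \mathrm{bin}(k),$$
yielding the desired result.

Hence, we used $O(\log n)$ space, and have an overall runtime of $O(n)$.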
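As an aside that is not part of the answer above: the initialisation does not have to be done naively, because the XOR of $1, \dots, n$ follows a pattern with period 4 in $n$. The sketch below uses that closed form. The names `xor_upto` and `missing_by_xor` are made up for illustration, and a 64-bit word stands in for the $\log_2 n$-bit register.

```cpp
#include <cstdint>
#include <vector>

// XOR of all integers in 1..n, using the period-4 pattern:
// n % 4 == 0 -> n, 1 -> 1, 2 -> n + 1, 3 -> 0.
std::uint64_t xor_upto(std::uint64_t n) {
    switch (n % 4) {
        case 0:  return n;
        case 1:  return 1;
        case 2:  return n + 1;
        default: return 0;
    }
}

// Hypothetical helper: returns the value of {1..n} absent from `stream`,
// where n = stream.size() + 1 and the stream holds n-1 distinct values.
std::uint64_t missing_by_xor(const std::vector<std::uint64_t>& stream) {
    std::uint64_t x = xor_upto(stream.size() + 1);  // x := XOR of bin(1..n)
    for (std::uint64_t j : stream) {
        x ^= j;                                     // x := x XOR bin(j)
    }
    return x;                                       // bin(k)
}
```

The overall running time is still $O(n)$, since every stream element has to be read once; only the setup step becomes constant time.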


3
may I suggest an easy optimization that makes this a true streaming single-pass algorithm: at time step i, xor x with bin(i) and with the input bin(j) that has arrived on the stream. this has the added benefit that you can make it work even if n is not known ahead of time: just start with a single bit allocated for x and "grow" the allocated space as necessary.
Sasho Nikolov
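The suggestion in this comment could look roughly like the sketch below. It is an interpretation, not the commenter's code: the name `missing_streaming` is hypothetical, the stream is read as whitespace-separated numbers from a `std::istream`, and a fixed 64-bit word replaces the register that "grows" as needed. Since a stream of $n-1$ items only supplies the step indices $1, \dots, n-1$, the final index $n$ is folded in once the input ends.

```cpp
#include <cstdint>
#include <istream>
#include <sstream>  // only needed for string-backed streams, as in the usage note

// Hypothetical helper: reads numbers until end of input without knowing n in
// advance. At step i (1-based) it XORs in both i and the value just read.
// After m values the range must have been {1..m+1}, so m+1 is XORed in last.
std::uint64_t missing_streaming(std::istream& in) {
    std::uint64_t x = 0;  // running XOR
    std::uint64_t i = 0;  // number of items read so far
    std::uint64_t j = 0;  // current input value
    while (in >> j) {
        ++i;
        x ^= i ^ j;
    }
    return x ^ (i + 1);
}
```

With `std::istringstream in("3 1 5 2");`, calling `missing_streaming(in)` returns 4, because four items imply $n = 5$.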

0

HdM's solution works. I coded it in C++ to test it. I can't limit the value to $O(\log_2 n)$ bits, but I'm sure you can easily show how only that number of bits is actually set.

For those that want pseudo code, using a simple fold operation with exclusive or ($\oplus$):

$$\mathrm{Missing} = \mathrm{fold}(\oplus, \{1,\dots,N\} \cup \mathrm{InputStream})$$

Hand-wavey proof: A $\oplus$ never requires more bits than its input, so it follows that no intermediate result in the above requires more than the maximum bits of the input (so $O(\log_2 n)$ bits). $\oplus$ is commutative, and $x \oplus x = 0$, thus if you expand the above and pair off all data present in the stream you'll be left only with a single un-matched value, the missing number.

#include <algorithm>
#include <cstdlib>
#include <iostream>
#include <random>
#include <vector>

using namespace std;

void find_missing( int const * stream, int len );

int main( int argc, char ** argv )
{
    if( argc < 2 )
    {
        cerr << "Syntax: " << argv[0] << " N" << endl;
        return 1;
    }
    int n = atoi( argv[1] );
    if( n < 1 )
    {
        cerr << "N must be a positive integer" << endl;
        return 1;
    }

    //construct sequence 1..n
    vector<int> seq;
    for( int i=1; i <= n; ++i )
        seq.push_back( i );

    //remove a random number and remember it
    mt19937 rng( random_device{}() );
    int remove = uniform_int_distribution<int>( 1, n )( rng );
    seq.erase( seq.begin() + (remove - 1) );
    cout << "Removed: " << remove << endl;

    //give the stream a random order (std::random_shuffle was removed in C++17)
    shuffle( seq.begin(), seq.end(), rng );

    find_missing( seq.data(), int(seq.size()) );
}

//HdM's solution
void find_missing( int const * stream, int len )
{
    //create initial value of n sequence xor'ed (n == len+1)
    int value = 0;
    for( int i=0; i < (len+1); ++i )
        value = value ^ (i+1);

    //xor all items in stream
    for( int i=0; i < len; ++i, ++stream )
        value = value ^ *stream;

    //what's left is the missing number
    cout << "Found: " << value << endl;
}

3
Please post readable (pseudo) code of only the algorithm instead (skip main). Also, a correctness proof/argument at some level should be included.
Raphael

4
@edA-qamort-ora-y Your answer assumes that the reader knows C++. To someone who is not familiar with this language, there is nothing to see: both finding the relevant passage and understanding what it's doing are a challenge. Readable pseudocode would make this a better answer. The C++ is not really useful on a computer science site.
Gilles 'SO- stop being evil'

3
If my answer proves not to be useful people don't need to vote for it.
edA-qa mort-ora-y

2
+1 for actually taking the time to write C++ code and test it out. Unfortunately as others pointed out, it's not SO. Still you put effort into this !
Julien Lebot

9
I don't get the point of this answer: you take someone else's solution, which is very simple and obviously very efficient, and "test" it. Why is testing necessary? This is like testing that your computer adds numbers correctly. And there is nothing nontrivial about your code either.
Sasho Nikolov
Licensed under cc by-sa 3.0 with attribution required.