CPython 2.7.5 中使用 join 寫法來做字串加法比較快?還是使用 += ?

如果你在 windows 下執行下面的程式碼，你會發現用 += 會比 join 寫法還快

a='''
def test_join():
    data = []
    for i in range(100):
        data.append('ad')
    return ''.join(data)
test_join()
'''
b = '''
def test_connect():
    data = ""
    for i in range(100):
        data += 'ad'
    return data
test_connect()
'''

import timeit
print timeit.timeit(a, number=10)
print timeit.timeit(b, number=10)

如果你稍微改一下程式，再執行下面的程式碼，你會發現用 join 會比 += 寫法快

a='''
def test_join():
    data = []
    for i in range(10000000):
        data.append('adsssssssssssssssssssssssssssssssssssssssssssssssssssss')
    return ''.join(data)
test_join()
'''
b = '''
def test_connect():
    data = ""
    for i in range(10000000):
        data += 'adsssssssssssssssssssssssssssssssssssssssssssssssssssss'
    return data
test_connect()
'''

import timeit
print timeit.timeit(a, number=10)
print timeit.timeit(b, number=10)

可是你如果把上面那兩個程式拿到 linux 環境下跑，你會發現在 linux 環境下， += 跑的比 join 的寫法快。怎麼一回事呢?

來看一段 CPython 2.7.5 的執行字串相加的程式碼

static PyObject *

string_concatenate(PyObject *v, PyObject *w,
                   PyFrameObject *f, unsigned char *next_instr)
{
    /* This function implements 'variable += expr' when both arguments
       are strings. */
    Py_ssize_t v_len = PyString_GET_SIZE(v);
    Py_ssize_t w_len = PyString_GET_SIZE(w);
    Py_ssize_t new_len = v_len + w_len;
    if (new_len < 0) {
        PyErr_SetString(PyExc_OverflowError,
                        "strings are too large to concat");
        return NULL;
    }
 
    if (v->ob_refcnt == 2) {
        /* In the common case, there are 2 references to the value
         * stored in 'variable' when the += is performed: one on the
         * value stack (in 'v') and one still stored in the
         * 'variable'.  We try to delete the variable now to reduce
         * the refcnt to 1.
         */
        switch (*next_instr) {
        case STORE_FAST:
        {
            int oparg = PEEKARG();
            PyObject **fastlocals = f->f_localsplus;
            if (GETLOCAL(oparg) == v)
                SETLOCAL(oparg, NULL);
            break;
        }
        case STORE_DEREF:
        {
            PyObject **freevars = (f->f_localsplus +
                                   f->f_code->co_nlocals);
            PyObject *c = freevars[PEEKARG()];
            if (PyCell_GET(c) == v)
                PyCell_Set(c, NULL);
            break;
        }
        case STORE_NAME:
        {
            PyObject *names = f->f_code->co_names;
            PyObject *name = GETITEM(names, PEEKARG());
            PyObject *locals = f->f_locals;
            if (PyDict_CheckExact(locals) &&
                PyDict_GetItem(locals, name) == v) {
                if (PyDict_DelItem(locals, name) != 0) {
                    PyErr_Clear();
                }
            }
            break;
        }
        }
    }
 
    if (v->ob_refcnt == 1 && !PyString_CHECK_INTERNED(v)) {
        /* Now we own the last reference to 'v', so we can resize it
         * in-place.
         */
        if (_PyString_Resize(&v, new_len) != 0) {
            /* XXX if _PyString_Resize() fails, 'v' has been
             * deallocated so it cannot be put back into
             * 'variable'.  The MemoryError is raised when there
             * is no value in 'variable', which might (very
             * remotely) be a cause of incompatibilities.
             */
            return NULL;
        }
        /* copy 'w' into the newly allocated area of 'v' */
        memcpy(PyString_AS_STRING(v) + v_len,
               PyString_AS_STRING(w), w_len);
        return v;
    }
    else {
        /* When in-place resizing is not an option. */
        PyString_Concat(&v, w);
        return v;
    }
}

PyString_Concat 的作用是，接受兩個字串，然後把那個新字串相加，再放回 v 變數。因此，使用 PyString_Concat 這個 function 每次都會產生新字串。為了避免一直產生新字串，因此上面這段程式碼會偵測是否是做類似 a += b的字串相加，是的話，他會嘗試使用 _PyString_Resize 把 a 字串所佔用的空間加大，再把 b 資料複製到 a 後面。而 _PyString_Resize 是直接呼叫 C 的 realloc 。
因此+= 速度快不快取決於 realloc。當 realloc 要調整一塊記憶體時，有兩種情形1.原本地方空間夠大，因此只要原地擴張，這個效能最好2.原本地方空間不夠大，所以要搬去更大的地方，這個情形很花時間，因為他需要把資料複製過去。 += 一開始很快，因為 realloc 時，需要搬家的情形很少，因此速度比 join 快。當字串越來越大，每次 += 時 realloc 都要搬新家(搬家意味著產生新字串 => 分配新空間，複製資料，釋放舊空間)，這個時候效能就會低落了。

list 的 CALL_FUNCTION 比字串做 BINARY_ADD 慢很多，因此在少量資料時， += 會優於 join 寫法。當資料量大時，每次 += 字串都要搬新家。此時， join 的優勢就會出來。

python 的 list 相當於 c++ 的 vector。因此，當 list 空間不夠時，他也會執行 realloc。大資料時， join 寫法會表現的比 += 快，理由有二 1. 雖然 list 搬家時，也要複製資料，但是因為 list 只存指標，所以複製量不會太大(字串相加是整個字串含內容都要複製過去)。2. 每次 realloc 時，他都會要很多空間。不像字串，只會要他所需空間。

至於 += 在真實情形下有沒有比較好，不是跑簡單測試的程式就能說明的。實際上要看你 realloc 會不會時常搬家。我想這就是為何 pep8 裡會有這段的原因。

For example, do not rely on CPython's efficient implementation of in-place string concatenation for statements in the form a += b or a = a + b. This optimization is fragile even in CPython (it only works for some types) and isn't present at all in implementations that don't use refcounting. In performance sensitive parts of the library, the ''.join() form should be used instead. This will ensure that concatenation occurs in linear time across various implementations.

有個不知道實不實用的技巧，為了減少有些函數會產生暫時物件，而導致記憶體空間碎裂，我們可以把那些函數丟給其他 process 處理，這樣也有個好處。新建的 process 不會有記憶體空間碎裂的問題。

程式的窩

2013年11月1日星期五